Putting the Star Schema Benchmark to work with SQL Server 2008R2

July 3, 2011 by neilrobbins

So, having described a bit about the Star Schema Benchmark, let’s put it into action with SQL Server 2008R2. Whilst I could run it locally, I thought it would be more interesting to run it on an Amazon EC2 SQL Server instance. Obviously, in terms of disk performance & all-round I/O this is not an ideal platform for this kind of thing, but the point of this post is just to provide a bootstrap for anyone looking to use this benchmark & its data generation tool with SQL Server.

First things first: a quick trip to my EC2 console, and in less than 2 minutes I had a nice Windows instance up and running (though of course I had to wait more than 15 minutes to get the password; such are the pains of Windows on EC2 instead of Linux, but ah well). I plumped for an m1.xlarge instance, which gives me 4 virtual cores & 15GB of RAM on Windows Server 2008R2 Datacenter edition, with SQL Server 2008R2 Standard edition also installed and ready. Not only that, but since the cluster instance types are not available for Windows, it also gives me the best I/O of the selection on offer. With the instance started, the administrator password retrieved, and me logged in, I took care of some basic tasks: firstly, getting git installed courtesy of msysgit; secondly, cloning my repo for this project from GitHub; and thirdly, starting up the SQL Server service.

With that all done, I was ready to generate some files. From the command prompt, with the dbgen tool & the dists.dss file in a new folder ready to hold the generated files, I ran the command:

dbgen -s 2 -T a

to generate all of the tables with a scale factor of 2. With that taking less time than it took me to flip back to my laptop & send some tweets, I drew confidence and ran the same command again, this time with a scale factor of 10, which gave me 59,986,214 rows in my fact table. Not a big database by any standards, but big enough to get going with, I figure. Generating the data took about 10 minutes (I didn’t time it, so that is an estimate), and I then ran my script of bulk insert statements to load all of the data into the database (I’d created the database & tables using my scripts whilst the data was being generated).

During the load into SQL Server I took the opportunity to look at the resource monitor and, unsurprisingly, it showed that whilst memory & CPU were hardly being touched, disk I/O was going at full pelt as it read from the lineorder data file and pushed the data into tempdb. Unfortunately, during the load of the LineOrder table I also ran out of disk space for tempdb to use. Lesson learned: use a bigger HDD.

So, I created a new 25GB EBS volume for tempdb & another of 50GB for the database, both of which I then attached to the instance before restarting it. With the instance now showing the new drive, I remapped the tempdb data and log files to it, and detached the StarSchemaBenchmark database so that I could move its files to the new drive & reattach them there:

ALTER DATABASE tempdb MODIFY FILE (NAME = 'tempdev', FILENAME = 'D:\SqlData\tempdb.mdf');

ALTER DATABASE tempdb MODIFY FILE (NAME = 'templog', FILENAME = 'D:\SqlData\tempdb.ldf');
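
The statements above only cover tempdb, and that change takes effect the next time the SQL Server service starts. For the detach, move, and reattach of the benchmark database, something like the following sketch should work. The file names and the D:\SqlData location are assumptions made for illustration, so substitute whatever your own data and log files are called.

```sql
-- Sketch only: the file names and paths below are assumed, not taken from the repo.
USE master;
GO

-- Drop any open connections so the detach is not blocked.
ALTER DATABASE StarSchemaBenchmark SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
GO

EXEC sp_detach_db @dbname = N'StarSchemaBenchmark';
GO

-- Move the .mdf and .ldf files to the new drive (Explorer or the move command),
-- then attach the database from its new location.
CREATE DATABASE StarSchemaBenchmark
ON (FILENAME = N'D:\SqlData\StarSchemaBenchmark.mdf'),
   (FILENAME = N'D:\SqlData\StarSchemaBenchmark_log.ldf')
FOR ATTACH;
GO

-- Make sure the database is back in multi-user mode after the attach.
ALTER DATABASE StarSchemaBenchmark SET MULTI_USER;
GO
```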

Now I could run the import of the LineOrder table again.
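
For anyone without the scripts to hand, a single load statement for the fact table might look roughly like this. It is a sketch rather than the script from the repo: the table name dbo.lineorder, the file path, and the terminators are all assumptions. dbgen writes pipe-delimited .tbl files, and the row terminator here assumes each row ends with a trailing pipe; if yours do not, use '\n' instead.

```sql
-- Sketch only: table name, file path and terminators are assumptions.
BULK INSERT dbo.lineorder
FROM 'D:\ssb\lineorder.tbl'
WITH (
    FIELDTERMINATOR = '|',
    ROWTERMINATOR   = '|\n',    -- assumes a trailing '|' at the end of every row
    TABLOCK,                    -- take a table-level lock for the duration of the load
    BATCHSIZE       = 1000000   -- commit every million rows rather than in one transaction
);

-- Quick sanity check that the whole file was loaded.
SELECT COUNT_BIG(*) AS lineorder_rows
FROM dbo.lineorder;
```

At scale factor 10 the count should match the 59,986,214 rows that dbgen reported for the fact table.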

This time it was successful. So I ran the standard queries as a batch, which, in case anyone cares, took 2 minutes 5 seconds to complete. A bit of index tuning later and I had that down to 21 seconds. But don’t pay attention to the results; the point of this article (& the series it’s a part of) is to help people use the Star Schema Benchmark & its dbgen tool.

From here I could, and will, create cubes out of the same data, compare different approaches to writing SQL to see where they might carry performance benefits, investigate indexing, all sorts really. I’ll probably also set up a server I bought off eBay a while back (amazing what £100 will get you in terms of hardware) and put a 500 million row dataset on it to see how that works out. That will also let me play around more with the effect of moving things around different drives, partitioning, etc., not to mention running the Enterprise edition. And of course there is the whole, much larger world that isn’t MS SQL Server, probably starting with Greenplum CE, Postgres, & MonetDB.
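
As an illustration of the kind of index tuning mentioned above, here is one hypothetical covering index. It is not the index set actually used for the 21 second result. It assumes the fact table is called dbo.lineorder and uses the column names from the Star Schema Benchmark specification, which may differ from the names in your own scripts. It targets queries that join to the date dimension on the order date, filter on discount and quantity, and sum the extended price.

```sql
-- Hypothetical example: table and column names follow the SSB specification
-- and may not match the tables created by any particular script.
CREATE NONCLUSTERED INDEX IX_lineorder_orderdate
ON dbo.lineorder (lo_orderdate)
INCLUDE (lo_discount, lo_quantity, lo_extendedprice);
```

Because the included columns cover everything such a query reads from the fact table, the optimiser can answer it from the index alone instead of scanning the full table. Whether that helps in practice depends on the queries and the data, so compare timings before and after.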

The scripts I used for this are all available from my GitHub account here.

Nick Haslam has blogged about working with the TPC-H standard & SQL Server here.

Responses

on October 9, 2011 at 22:21 | Felipe

Do you know where I can download the Star Schema Benchmark database?
- on October 9, 2011 at 23:39 | neilrobbins
  
  The Star Schema Benchmark doesn’t have a database as such. Rather, it describes how to create a database that conforms to the benchmark, and provides a tool (dbgen) which will generate the data. At the end of the post I give a link to a git repo of mine where I have placed both a copy of the dbgen tool and a copy of the scripts I used to create this database in SQL Server 2008. The scripts ought to be broadly similar for most RDBMSs out there, while the tool will generate the data files for you.
on July 28, 2013 at 12:19 | Dan Howard

dbgen just crashes for me. What else do I need to run it?
on July 28, 2013 at 12:24 | Dan Howard

Sorry, figured it out. GitHub downloaded the DSS file as an HTML document. I viewed the raw file and copy/pasted it into a new file.
- on July 28, 2013 at 17:39 | neilrobbins
  
  Hi Dan, glad to hear you’ve found the issue. Neil

Semeosis
