It seems self-evident that to be relevant a benchmark must speak to a particular class of problems. For example, it should be obvious that a benchmark designed to measure the performance of transaction processing will not be a good fit when assessing the appropriateness of systems intended soley for use in reporting. The specificity argument can though of course lead to ever more specific benchmarks. In ‘The Set Query Benchmark‘, a paper which forms a part of ‘The Benchmark Handbook‘, a volume edited by Jim Gray on Benchmarking databases, Gray presents 4 criteria which a domain-specific benchmark must meet if it is to be useful:
The Star Schema Benchmark is described in a 2007 paper of which the lead author, Pat O’Neil, is also the author of The Set Query Benchmark discussed above. It describes a domain-specific benchmark that has been specifically designed to enable the comparison of star schema performance across different products. The benchmark itself is a derivative of the TPC-H standard but where the structure of the database has been transformed into a star schema, also dropping columns, for instance, text columns from the fact table, and in other ways described at length in the paper reworking the database so that it aligns with the advise and practices considered optimal by Kimball. Following the The Set Query Benchmark paper, this paper also contains a consideration of the functional and selectivity coverage aimed for by this benchmark. In terms of the functional coverage, the benchmark provides a relatively small number of queries, each exploring different numbers of predicates on dimension, and fact, columns. From this point of view a number of common star schema query scenarios are not provided, or possible, including, for example, where degenerate dimensions, junk dimensions, factless fact tables, fact dimensions, or joins from the fact table to non-leaf level dimension attributes are present. Selectivity coverage is provided for by varying across the queries the number of rows from the fact table which must be fetched in order to provide the results. The queries then are split into four ‘flights’ of queries, where Flights 1 – 4 each have involve restrictions on the corresponding number of dimensions (so Flight one has a restriction on dimension, while Flight 4 has a restriction on 4 dimensions).
Another thing which the Star Schema Benchmark brings to the table, and that I have found useful, is a tool which will generate synthetic datasets with consistent cardinalities between the fact table & dimension tables, and distributions within these tables. In the course of my work, whether in selecting a database platform, or in proving the effectiveness of different approaches to a query design, having access to a suitably modeled & distributed dataset is very helpful. An option which I have used previously & seen being used is the use, sanitised of course, of a ‘real world’ (aka client) dataset. Beyond the sanitisation required such an approach also brings a number of other complicating factors. DeWitt in his paper, The Wisconsin Benchmark: Past, Present, and Future, explains why in the creation of this benchmark he & his team opted for synthetic rather than empirical (so pre-existing real world) data. The arguments he puts forward are that:
- Empirical databases are hard to scale;
- The values in empirical databases make it more difficult to systematically benchmark a system. E.g. creating queries that allow for precise levels of selectivity;
- Empirical databases don’t tend to have uniformly distributed values;
- Through the use of a synthetic database the simplicity of the structure and distributions of attribute values could be ensured enabling those using the benchmark to quickly understand the database and to design new queries for it.
Whilst if, through the investment of effort, these difficulties are overcome such a dataset may result in a benchmark which concerns a more focused, and therefore perhaps, relevant domain the cost of this effort should not be underestimated (though in IT of course, it probably will be). Also, depending on the questions which must be answered, the usefulness of such a specific dataset may not be any greater. In Doing Your Own Benchmark (again part of the same Jim Gray volume) Sawyer suggests that before, & whilst, undertaking a benchmarking exercise (or adventure as he puts it) it is necessary to ask three interacting questions:
- What do you want to learn?
- How much are you prepared to invest?
- What are you prepared to give up?
A lot of the time I suspect, when faced with the answer to 2. and given the alternative of generating an otherwise fit for purpose synthetic dataset, the answer to 3. will include giving up an empirical dataset, along perhaps with some of the more specific questions it might have additionally answered, but which probably are not of core importance.
The dbgen tool which was created for The Star Schema Benchmark is derived from the TPC-H dbgen tool and enables its user to generate files containing pipe separated records which match the tables required by this benchmark. Furthermore, in using this tool a scale-factor value is provided which acts as a multiplier on the number of rows which will be generated. So that with a scale-fact of 1 nearly 6,000,000 fact table rows are generated with various smaller amounts of rows for each of the dimension tables depending on their cardinality relative to the fact table, but given a scale factor of 10 the figure is 60 million with the rows for the dimension tables being scaled appropriately (so for dates not at all).
The Benchmark Handbook can be found freely available online here, courtesy of Microsoft Research. Together with links to pdfs for each of the chapters from it that I have cited here.
There are two Star Schema Benchmark papers that I have been able to access & use. The 2007 paper here, and the 2009 revision (the 3rd revision apparently) of this paper here.