Program for TPC-H Data Generation with Skew
Last published: April 26, 2016.
The schema and queries of the TPC-H (formerly TPC-D) benchmark are widely used in the database community. One requirement of the benchmark is that the data for each column be generated from a uniform distribution. Because real-world data distributions are often non-uniform, this requirement makes it hard for users to draw conclusions about the robustness and effectiveness of their systems. We have therefore created a new data generation program for TPC-H that can generate a database whose columns have non-uniform (skewed) data distributions. In particular, the program can generate data from a Zipfian distribution. The Zipf value (z) controls the degree of skew and is passed to the program as a parameter. The program can also generate a database with a “mixed” data distribution, in which the skew of each column is chosen at random from the Zipf values {0, 1, 2, 3, 4}. Note that our changes do not affect the total number of rows in the tables or the total database size.
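The program's own source is not reproduced on this page. The Python sketch below only illustrates what the Zipf value z means for a single column. It makes two assumptions. First, a column has n distinct values ranked 1..n. Second, rank k is drawn with probability proportional to 1/k^z. Under these assumptions, z = 0 reduces to the uniform distribution that standard TPC-H requires, and larger z concentrates more rows on the top-ranked values. The function names (`zipf_weights`, `zipf_probabilities`, `sample_zipf`, `pick_mixed_skew`) are hypothetical and are not part of the download. The “mixed” helper is likewise only a guess at how a per-column z might be chosen from {0, 1, 2, 3, 4}.

```python
import random
from typing import Dict, List, Sequence

MIXED_Z_VALUES = [0, 1, 2, 3, 4]  # Zipf values used by the "mixed" mode described above


def zipf_weights(n: int, z: float) -> List[float]:
    """Unnormalized Zipf weights for ranks 1..n: weight(k) = 1 / k**z."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [1.0 / (k ** z) for k in range(1, n + 1)]


def zipf_probabilities(n: int, z: float) -> List[float]:
    """Normalize the weights so they sum to 1 (z = 0 gives a uniform distribution)."""
    weights = zipf_weights(n, z)
    total = sum(weights)
    return [w / total for w in weights]


def sample_zipf(values: Sequence, z: float, count: int, rng: random.Random) -> list:
    """Draw `count` items from `values`; values[0] is rank 1, the most frequent item."""
    return rng.choices(list(values), weights=zipf_weights(len(values), z), k=count)


def pick_mixed_skew(columns: Sequence[str], rng: random.Random) -> Dict[str, int]:
    """Hypothetical 'mixed' mode: give each column a z chosen at random from {0,...,4}."""
    return {col: rng.choice(MIXED_Z_VALUES) for col in columns}


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    print(zipf_probabilities(4, 0))  # uniform case
    print(zipf_probabilities(4, 2))  # skewed case: rank 1 dominates
    skew = pick_mixed_skew(["l_quantity", "l_discount"], rng)
    print(skew)
    print(sample_zipf(range(1, 51), skew["l_quantity"], 10, rng))
```

Drawing from a finite set of ranked values keeps the column's domain fixed while the frequencies change. This matches the page's statement that row counts and database size are unaffected; only the distribution of values within each column changes.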
Status: Live. This download is still available on microsoft.com. Downloads come directly from the Microsoft Download Center.
System Requirements
- Operating systems: Windows 7, Windows 8, or Windows 10
Installation Instructions
- Click Download and follow the instructions.