HammerDB for Managers

This post answers the questions most often asked by non-technical managers who want to get up to speed on what HammerDB is, what it isn’t, and how it can benefit their organization.

What is HammerDB?

HammerDB is a software application for database benchmarking.  It enables the user to measure database performance and make comparative judgements about database hardware and software.

HammerDB has graphical and command line interfaces for the Windows and Linux operating systems.

Why HammerDB was developed

Databases are highly sophisticated software, and designing and running a fair benchmark workload is a complex undertaking. The Transaction Processing Performance Council (TPC) was founded to bring standards to database benchmarking, and the history of the TPC can be found here. The TPC designed benchmarks for transaction processing (OLTP) and analytics (OLAP), and anyone can run these benchmarks, have them audited by the TPC and have the results published in the official benchmark rankings.

However, although these results are the gold standard of database benchmarking, producing them requires time, expertise and not insignificant cost. Additionally, many databases contain a license clause, colloquially known as a DeWitt clause, that prevents the publication of non-approved benchmarks.

These factors meant that often when looking for database performance information, the results for a particular combination of software and hardware were not available.

As the TPC makes its specifications available for free, the need was seen for an open source benchmarking application that could leverage the standards written by the TPC yet be implemented quickly, easily and at low cost by anyone. Also, when testing a database with a DeWitt clause, it became possible to produce your own results without having to rely on a particular vendor to publish results of interest.

The HammerDB name

Originally, HammerDB was named Hammerora because the first database the application supported was Oracle. Ora was a common prefix/suffix for Oracle-related software, and the name was inspired by a genre of classic films and the characters portrayed, in particular for testing Oracle RAC.  As more databases were added, the original name became less appropriate and the name HammerDB was suggested and adopted.

HammerDB development

HammerDB was originally developed by Steve Shaw as an employer-approved, own-time, own-materials project to implement workloads derived from TPC specifications in a user-accessible way.  An important concept was to run simulated database users, called Virtual Users, in parallel (rather than merely concurrently) to accurately simulate a real database workload with multiple users running from separate systems. This requires an application that simulates separate users in multiple threads that do not block or pause each other’s workloads. Programming languages such as Java, Lua and Python were considered unable to support this requirement; the Python GIL, for example, only permits one user to run at any one time. The design decisions made are discussed in the following post.

What programming languages does HammerDB use and why does it matter?

As a result, HammerDB was able to scale beyond other database workload applications as the following post illustrates.

Why Tcl is 700% faster than Python for database benchmarking

Adoption by the TPC

Given the increasing importance of open source and the widespread use of HammerDB, the TPC adopted HammerDB in 2019 and now hosts the project on GitHub. Today, the TPC-OSS subcommittee oversees development and approves all modifications made through pull requests.

Usage and industry adoption

HammerDB maintains a web page under the stats link where the number of downloads can be tracked. HammerDB is used globally: the map below shows the areas of use in blue, with a darker color showing higher levels of use, and the most popular destinations are the USA and China.

The results of HammerDB workloads have been published by all leading cloud vendors, database software vendors and systems suppliers. HammerDB maintains a collated list of these publications but does not vet or audit results before inclusion.

HammerDB Licensing

HammerDB is open source software licensed under the GNU General Public License v3.0 (GPLv3), and a quick guide to GPLv3 can be viewed here.  HammerDB has dependencies on external open source software and can be built from source.

How to build HammerDB from source

HammerDB is Free software and consequently engineers should consider not only how they can benefit from using the software but also how they can contribute to the community with code and documentation.

Businesses that depend on open source should consider sponsorship of open source projects or financial support to ensure that the open source they depend on remains freely available.

Supported Databases

HammerDB supports the most popular databases on the db-engines ranking, namely Oracle Database, Microsoft SQL Server, IBM Db2, TimesTen, MySQL, MariaDB, PostgreSQL, Greenplum, Postgres Plus Advanced Server, Citus Data, Amazon Aurora and Amazon Redshift.  HammerDB supports these databases running in the cloud and in the enterprise, and will also run workloads against databases derived from the most popular open source databases MySQL and PostgreSQL.

The wide range of database support gives HammerDB an advantage over other database benchmarking tools that only implement workloads against one or two databases, which limits their ability to compare database engines and assess relative performance.

Derived Workloads

When testing database performance there are 2 distinct types of workload: transactional, or OLTP, and analytic (data warehouse, decision support), or OLAP.  HammerDB supports 2 workloads derived from TPC specifications to test these different requirements, namely TPROC-C derived from TPC-C for OLTP and TPROC-H derived from TPC-H for OLAP.

It is important to note that TPC-C and TPC-H are registered trademarks of the TPC. Using the names TPC-C or TPC-H, or official metrics such as tpmC or QphH, in a non-audited publication is considered a trademark violation, so these terms should not be used.

An additional specification called TPC-CH for hybrid transactional/analytical processing (HTAP) is under research and development for inclusion in a future release as TPROC-CH.

In addition to the TPC-C specification for OLTP workloads, the TPC has also developed the TPC-E specification.  HammerDB will consider developing a TPC-E derived workload when official benchmark publications are made from at least 3 of the supported databases to ensure a fair representation in an open source version.

The NOPM Metric

When reporting TPROC-C workloads the key metric is known as NOPM or New Orders per Minute.  This metric measures the same value as the tpmC metric in an official TPC-C publication; however, as noted previously, the use of official terminology in derived workloads is not permitted and therefore HammerDB uses NOPM as a derived metric.  HammerDB also reports TPM as an engineering metric. There is no requirement for a fixed relationship between NOPM and TPM across databases, and therefore NOPM can be used to compare performance between databases, whereas TPM is for analysing a particular database engine.
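
As an illustration of how the two metrics differ, the following minimal Python sketch derives both rates from counters sampled at the start and end of a test. The Snapshot class, its field names and the numbers are hypothetical and are not HammerDB's implementation; the sketch assumes NOPM comes from a count of completed new orders and TPM from a database-specific transaction counter.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Counters read at one point in time (all field names are illustrative)."""
    seconds: float        # time of the reading, in seconds
    new_orders: int       # cumulative count of completed new orders
    db_transactions: int  # cumulative database-specific transaction counter


def per_minute(elapsed_seconds: float, count_delta: int) -> float:
    """Convert a counter increase over an interval into a per-minute rate."""
    if elapsed_seconds <= 0:
        raise ValueError("the end snapshot must be later than the start snapshot")
    return count_delta / (elapsed_seconds / 60.0)


def nopm(before: Snapshot, after: Snapshot) -> float:
    """New Orders per Minute: comparable between database engines."""
    return per_minute(after.seconds - before.seconds,
                      after.new_orders - before.new_orders)


def tpm(before: Snapshot, after: Snapshot) -> float:
    """Transactions per minute: meaningful only within one database engine."""
    return per_minute(after.seconds - before.seconds,
                      after.db_transactions - before.db_transactions)


# Hypothetical readings taken five minutes apart.
before = Snapshot(seconds=0.0, new_orders=1_000, db_transactions=5_000)
after = Snapshot(seconds=300.0, new_orders=251_000, db_transactions=585_000)

print(f"NOPM: {nopm(before, after):.0f}")
print(f"TPM:  {tpm(before, after):.0f}")
```

Because each database engine counts transactions in its own way, the ratio between the two printed values would be expected to differ from one engine to another, which is why only NOPM is suitable for cross-database comparison.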

Cached vs Scaled Workloads

The official TPC-C OLTP workload is what is known as a scaled workload. A key difference in the HammerDB design, however, was to implement by default a smaller, more efficient cached workload based on the same specification, one that could still give an indication of comparative performance in the same way that the scaled workload could.

A key difference between cached and scaled workloads is the implementation of keying and thinking time to introduce a pause between transactions. In a scaled workload derived from TPC-C that allows for this keying and thinking time, one Virtual User will complete approximately 1 New Order per Minute, and therefore, for example, 10,000 database sessions will run at approximately 10,000 NOPM and 100,000 sessions at 100,000 NOPM.  The scaled workload also outputs the data from the Virtual Users by simulating individual terminals.
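
The arithmetic above can be sketched in a few lines of Python. The function names are illustrative, and the rate of 1 New Order per Minute per session is the approximation given above rather than a measured value.

```python
import math

# Approximate rate stated in the text for a scaled workload
# with keying and thinking time.
NOPM_PER_SCALED_SESSION = 1.0


def scaled_nopm(sessions: int) -> float:
    """Approximate NOPM delivered by a given number of database sessions."""
    return sessions * NOPM_PER_SCALED_SESSION


def sessions_for_target(target_nopm: float) -> int:
    """Approximate number of sessions needed to reach a target NOPM."""
    return math.ceil(target_nopm / NOPM_PER_SCALED_SESSION)


for sessions in (10_000, 100_000):
    print(f"{sessions:>7} sessions -> about {scaled_nopm(sessions):,.0f} NOPM")

print(f"sessions needed for 1,000,000 NOPM: {sessions_for_target(1_000_000):,}")
```

Throughput in a scaled workload therefore grows only by adding sessions, so a high NOPM target implies a very large number of database connections.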

Note that HammerDB can also implement a scaled workload with a feature called event-driven scaling.  However, this requires a large data set and middleware to manage the large database session count.  Instead, most users prefer to implement a cached workload.

When HammerDB was designed, it was clear that where the database software was scalable (initially Oracle, for example), CPU performance at full utilization was the key determining factor for database performance. Therefore, a perfectly scaled implementation provided the levels of memory and I/O or disk capacity needed to reach full CPU utilization.

However, prior to the advent of high performance Solid State Disks (SSDs), implementing such a configuration required a large number of hard disk drives (HDDs) in one or more fibre attached storage arrays at considerable expense.

Instead, HammerDB implemented a cached workload by eliminating the keying and thinking time and the requirement for terminals. Now individual Virtual Users don’t pause between transactions and can run at tens of thousands of NOPM each.  The data set each Virtual User requires is much smaller, which reduces the I/O footprint and means that most of the workload is cached in memory. This means we can reach full CPU utilization (with scalable database software) more quickly and without requiring middleware.

This approach gives us an indication of what an optimally configured scaled configuration with a high I/O and memory capacity can achieve. However, be aware that the result measures the CPU’s database potential: in a production environment you will still need sufficient memory, I/O and scalable database software to make full use of this potential.
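
A simplified model shows why removing the pause matters. The Python sketch below assumes each Virtual User repeats a cycle of database work followed by an optional pause, and that each cycle completes one new order. The function name and the timings (2 milliseconds of work, a 60 second pause) are hypothetical values chosen for illustration, not measurements.

```python
def nopm_per_user(work_seconds: float, pause_seconds: float = 0.0) -> float:
    """New orders per minute for one Virtual User in this simplified model."""
    cycle_seconds = work_seconds + pause_seconds
    if cycle_seconds <= 0:
        raise ValueError("cycle time must be positive")
    return 60.0 / cycle_seconds


WORK_SECONDS = 0.002   # hypothetical database time needed per new order
PAUSE_SECONDS = 60.0   # hypothetical keying and thinking time per cycle

scaled_rate = nopm_per_user(WORK_SECONDS, PAUSE_SECONDS)  # user pauses
cached_rate = nopm_per_user(WORK_SECONDS)                 # user never pauses

print(f"with pause:    {scaled_rate:.2f} NOPM per Virtual User")
print(f"without pause: {cached_rate:.0f} NOPM per Virtual User")
```

With these assumed timings the paused Virtual User completes roughly 1 new order per minute, while the unpaused Virtual User completes roughly 30,000, so a handful of Virtual Users can generate the load that would otherwise need many thousands of sessions.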

Summary

1. HammerDB is a software application for database benchmarking.

2. HammerDB was developed to allow anyone to run database benchmarks quickly, easily and at low cost.

3. HammerDB was designed to scale and is developed in a language that is not restricted by a Global Interpreter Lock (GIL), which would limit workloads to being single-threaded; instead, Virtual Users run in parallel.

4. The HammerDB name was inspired by a genre of classic films.

5. HammerDB is hosted by the Transaction Processing Performance Council (TPC), which oversees development.

6. HammerDB is used globally and provides statistics on downloads and publications.

7. HammerDB is open source software licensed under the GNU General Public License v3.0 (GPLv3).

8. HammerDB supports the most popular relational databases on the db-engines ranking.

9. HammerDB runs workloads called TPROC-C and TPROC-H, derived from the TPC specifications TPC-C and TPC-H respectively, with NOPM as the key metric for measuring transactional performance. Using TPC terminology for non-audited benchmarks violates TPC trademarks.

10. By default HammerDB implements a cached rather than a scaled workload, but it can implement both types of benchmark.

 
