HammerDB v4.1 New Features Pt3: Step Workloads

 

HammerDB Variable or Step Workloads are an advanced testing feature that enables you to automatically vary the load on the database over a period of time. When taking this approach you would not focus on the test result but instead monitor the databases ability to cope with the variation in demand and transaction response times.

Step Workloads

To implement Step Workloads, HammerDB v4.1 introduces a CLI only command called steprun combined with a new XML configuration file called steps.xml.   the steprun command reads the XML configuration file and creates primary and replica instances of HammerDB per step with the replica instances starting at pre-defined time intervals automatically connecting back to the primary instance of HammerDB.

Primary and Replicas

When defining the workload it is best to think of the configuration as defining a pyramid with the primary at the base and the replicas sitting above it. Each replica must finish at the same time or earlier than the primary.  The primary running time continues to be defined by the standard settings. For example for configuring an Oracle workload the following commands set the rampup and duration running times respectively. Therefore in this case the workload woud run for 12 minutes with 2 minutes of rampup and 10 minutes of duration.

diset tpcc rampup 2 
diset tpcc duration 10

The replicas are defined in steps.xml in the config directory to determine how many virtual users each replica will configure, how soon after the previous instance has started the replica should start and how long it will run for.

<steps>
<replica1>
<start_after_prev>2</start_after_prev>
<duration>8</duration>
<virtual_users>2</virtual_users>
</replica1>
<replica2>
<start_after_prev>2</start_after_prev>
<duration>6</duration>
<virtual_users>2</virtual_users>
</replica2>
<replica3>
<start_after_prev>2</start_after_prev>
<duration>4</duration>
<virtual_users>2</virtual_users>
</replica3>
<replica4>
<start_after_prev>2</start_after_prev>
<duration>2</duration>
<virtual_users>2</virtual_users>
</replica4>
</steps>

If the configuration is incorrect HammerDB will report the error and fail to start the Step Workload.  In the following example we have set the default 2 minutes of rampup and 5 minutes of test.

diset tpcc rampup 2 
diset tpcc duration 5

In this case the workload errors because the replica running times exceed the primary.

Value 2 for tpcc:rampup is the same as existing value 2, no change made
Value 5 for tpcc:duration is the same as existing value 5, no change made
primary starts immediately, runs rampup for 2 minutes then runs test for 5 minutes with 2 Active VU
replica1 starts 2 minutes after rampup completes and runs test for 8 minutes with 2 Active VU
Error: replica1 is set to complete after 12 minutes and is longer than the Primary running time of 7 minutes
replica2 starts 2 minutes after previous replica starts and runs test for 6 minutes with 2 Active VU
Error: replica2 is set to complete after 12 minutes and is longer than the Primary running time of 7 minutes
replica3 starts 2 minutes after previous replica starts and runs test for 4 minutes with 2 Active VU
Error: replica3 is set to complete after 12 minutes and is longer than the Primary running time of 7 minutes
replica4 starts 2 minutes after previous replica starts and runs test for 2 minutes with 2 Active VU
Error: replica4 is set to complete after 12 minutes and is longer than the Primary running time of 7 minutes
Error: Step workload primary running time must exceed the running time of all replicas

Instead the following script defines the rampup of 2 minutes and duration of 10 correctly so that the primary provides an adequate base for the pyramid, starting before and ending at the same time or after all of the replicas. The workload is started with the steprun command as the last command.  No commands should follow steprun as the command will intentionally exit all replicas and the primary when the workload is complete.

dbset db ora
dbset bm TPC-C
diset connection system_user system
diset connection system_password oracle
diset connection instance RAZPDB1
diset tpcc tpcc_user tpcc
diset tpcc tpcc_pass tpcc
diset tpcc total_iterations 10000000
diset tpcc ora_driver timed
diset tpcc rampup 2
diset tpcc duration 10
vuset logtotemp 1
vuset vu 2
steprun

Running this script it can be seen that without further intervention the primary and replicas are created with the replicas automatically connecting back to the primary,  the replicas are then started at the time interval given in the steps.xml file.

The primary sets the rampup in the replicas to zero (as the rampup has completed in the primary) and then sends the individual duration times to the replicas. Time profiling is also disabled in the replicas.  When complete replicas will call exit from the primary and when the final replica has completed the primary will also exit.

Note that it is expected for Virtual User 1 of the replicas to end immediately with the message.

Vuser 1:FINISHED SUCCESS

This is because Virtual User 1 is the monitor Virtual User but this Virtual User does not run in the replica meaning it ends immediately. When the replicas are started the message sending “run_virtual” is recorded.

Sending "run_virtual" ....

The following output shows the previously defined step workload running against an Oracle database.

hammerdb>source runstepora.tcl
Database set to Oracle
Benchmark set to TPC-C for Oracle
Value system for connection:system_user is the same as existing value system, no change made
Changed connection:system_password from manager to oracle for Oracle
Changed connection:instance from oracle to RAZPDB1 for Oracle
Value tpcc for tpcc:tpcc_user is the same as existing value tpcc, no change made
Value tpcc for tpcc:tpcc_pass is the same as existing value tpcc, no change made
Changed tpcc:total_iterations from 1000000 to 10000000 for Oracle
Clearing Script, reload script to activate new setting
Script cleared
Changed tpcc:ora_driver from test to timed for Oracle
Value 2 for tpcc:rampup is the same as existing value 2, no change made
Changed tpcc:duration from 5 to 10 for Oracle
primary starts immediately, runs rampup for 2 minutes then runs test for 10 minutes with 2 Active VU
replica1 starts 2 minutes after rampup completes and runs test for 8 minutes with 2 Active VU
replica2 starts 2 minutes after previous replica starts and runs test for 6 minutes with 2 Active VU
replica3 starts 2 minutes after previous replica starts and runs test for 4 minutes with 2 Active VU
replica4 starts 2 minutes after previous replica starts and runs test for 2 minutes with 2 Active VU
Switch from Local
to Primary mode?
Enter yes or no: replied yes
Setting Primary Mode at id : 20166, hostname : razorbill.home
Primary Mode active at id : 20166, hostname : razorbill.home
Starting 1 replica HammerDB instance
Starting 2 replica HammerDB instance
HammerDB CLI v4.1
Copyright (C) 2003-2021 Steve Shaw
Type "help" for a list of commands
HammerDB CLI v4.1
Copyright (C) 2003-2021 Steve Shaw
Type "help" for a list of commands
Starting 3 replica HammerDB instance
Starting 4 replica HammerDB instance
Doing wait to connnect ....
Primary waiting for all replicas to connect .... 0 out of 4 are connected
HammerDB CLI v4.1
Copyright (C) 2003-2021 Steve Shaw
Type "help" for a list of commands
HammerDB CLI v4.1
Copyright (C) 2003-2021 Steve Shaw
Type "help" for a list of commands
The xml is well-formed, applying configuration
The xml is well-formed, applying configuration
The xml is well-formed, applying configuration
Switch from Local
to Replica mode?
Enter yes or no: replied yes
Switch from Local
to Replica mode?
Enter yes or no: replied yes
Setting Replica Mode at id : 20182, hostname : razorbill.home
Replica connecting to localhost 20166 : Connection succeeded
Received a new replica connection from host ::1
Setting Replica Mode at id : 20181, hostname : razorbill.home
Replica connecting to localhost 20166 : Connection succeeded
New replica joined : {20182 razorbill.home}
The xml is well-formed, applying configuration
Received a new replica connection from host ::1
New replica joined : {20182 razorbill.home} {20181 razorbill.home}
Switch from Local
to Replica mode?
Primary call back successful
Switched to Replica mode via callback
Enter yes or no: replied yes
Primary call back successful
Switched to Replica mode via callback
Setting Replica Mode at id : 20183, hostname : razorbill.home
Received a new replica connection from host ::1
Replica connecting to localhost 20166 : Connection succeeded
New replica joined : {20182 razorbill.home} {20181 razorbill.home} {20183 razorbill.home}
Primary call back successful
Switched to Replica mode via callback
Switch from Local
to Replica mode?
Enter yes or no: replied yes
Setting Replica Mode at id : 20184, hostname : razorbill.home
Received a new replica connection from host ::1
Replica connecting to localhost 20166 : Connection succeeded
New replica joined : {20182 razorbill.home} {20181 razorbill.home} {20183 razorbill.home} {20184 razorbill.home}
Primary call back successful
Switched to Replica mode via callback
Primary waiting for all replicas to connect .... {20182 razorbill.home} {20181 razorbill.home} {20183 razorbill.home} {20184 razorbill.home} out of 4 are connected
Primary Received all replica connections {20182 razorbill.home} {20181 razorbill.home} {20183 razorbill.home} {20184 razorbill.home}
Database set to Oracle
Database set to Oracle
Database set to Oracle
Setting primary to run 2 virtual users for 10 duration
Database set to Oracle
Database set to Oracle
Value 10 for tpcc:duration is the same as existing value 10, no change made
Sending dbset all to 20182 razorbill.home
Setting replica1 to start after 2 duration 8 VU count 2, Replica instance is 20182 razorbill.home
Sending "diset tpcc ora_timeprofile false" to 20182 razorbill.home
Value false for tpcc:ora_timeprofile is the same as existing value false, no change made
Sending "diset tpcc rampup 0" to 20182 razorbill.home
Changed tpcc:rampup from 2 to 0 for Oracle
Sending "diset tpcc duration 8" to 20182 razorbill.home
Changed tpcc:duration from 10 to 8 for Oracle
Sending "vuset vu 2" to 20182 razorbill.home
Sending dbset all to 20181 razorbill.home
Setting replica2 to start after 2 duration 6 VU count 2, Replica instance is 20181 razorbill.home
Sending "diset tpcc ora_timeprofile false" to 20181 razorbill.home
Value false for tpcc:ora_timeprofile is the same as existing value false, no change made
Sending "diset tpcc rampup 0" to 20181 razorbill.home
Changed tpcc:rampup from 2 to 0 for Oracle
Sending "diset tpcc duration 6" to 20181 razorbill.home
Changed tpcc:duration from 10 to 6 for Oracle
Sending "vuset vu 2" to 20181 razorbill.home
Sending dbset all to 20183 razorbill.home
Setting replica3 to start after 2 duration 4 VU count 2, Replica instance is 20183 razorbill.home
Sending "diset tpcc ora_timeprofile false" to 20183 razorbill.home
Value false for tpcc:ora_timeprofile is the same as existing value false, no change made
Sending "diset tpcc rampup 0" to 20183 razorbill.home
Changed tpcc:rampup from 2 to 0 for Oracle
Sending "diset tpcc duration 4" to 20183 razorbill.home
Changed tpcc:duration from 10 to 4 for Oracle
Sending "vuset vu 2" to 20183 razorbill.home
Sending dbset all to 20184 razorbill.home
Setting replica4 to start after 2 duration 2 VU count 2, Replica instance is 20184 razorbill.home
Sending "diset tpcc ora_timeprofile false" to 20184 razorbill.home
Value false for tpcc:ora_timeprofile is the same as existing value false, no change made
Sending "diset tpcc rampup 0" to 20184 razorbill.home
Changed tpcc:rampup from 2 to 0 for Oracle
Sending "diset tpcc duration 2" to 20184 razorbill.home
Changed tpcc:duration from 10 to 2 for Oracle
Sending "vuset vu 2" to 20184 razorbill.home
Script loaded, Type "print script" to view
Script loaded, Type "print script" to view
Script loaded, Type "print script" to view
Script loaded, Type "print script" to view
Script loaded, Type "print script" to view
Vuser 1 created MONITOR - WAIT IDLE
Vuser 2 created - WAIT IDLE
Vuser 3 created - WAIT IDLE
Vuser 1 created MONITOR - WAIT IDLE
Vuser 2 created - WAIT IDLE
Vuser 1 created MONITOR - WAIT IDLE
Vuser 3 created - WAIT IDLE
Vuser 2 created - WAIT IDLE
3 Virtual Users Created with Monitor VU
Vuser 3 created - WAIT IDLE
3 Virtual Users Created with Monitor VU
Vuser 1 created MONITOR - WAIT IDLE
Vuser 2 created - WAIT IDLE
Vuser 3 created - WAIT IDLE
3 Virtual Users Created with Monitor VU
Vuser 1 created MONITOR - WAIT IDLE
Vuser 2 created - WAIT IDLE
Vuser 3 created - WAIT IDLE
3 Virtual Users Created with Monitor VU
Logging activated
to /tmp/hammerdb.log
3 Virtual Users Created with Monitor VU
Starting Primary VUs
Vuser 1:RUNNING
Vuser 1:Beginning rampup time of 2 minutes
Vuser 2:RUNNING
Vuser 2:Processing 10000000 transactions with output suppressed...
Vuser 3:RUNNING
Vuser 3:Processing 10000000 transactions with output suppressed...
Delaying Start of Replicas to rampup 2 replica1 2 replica2 2 replica3 2 replica4 2
Delaying replica1 for 4 minutes.
Delaying replica2 for 6 minutes.
Delaying replica3 for 8 minutes.
Delaying replica4 for 10 minutes.
Primary entering loop waiting for vucomplete
Vuser 1:Rampup 1 minutes complete ...
Vuser 1:Rampup 2 minutes complete ...
Vuser 1:Rampup complete, Taking start AWR snapshot.
Vuser 1:Start Snapshot 18 taken at 10 MAY 2021 09:07 of instance RAZCDB1 (1) of database RAZCDB1 (171153594)
Vuser 1:Timing test period of 10 in minutes
Vuser 1:1 ...,
Sending "run_virtual" to 20182 razorbill.home
Vuser 1:RUNNING
Vuser 1:Operating in Replica Mode, No Snapshots taken...
Vuser 1:FINISHED SUCCESS
Vuser 2:RUNNING
Vuser 2:Processing 10000000 transactions with output suppressed...
Vuser 3:RUNNING
Vuser 3:Processing 10000000 transactions with output suppressed...
Vuser 1:2 ...,
Vuser 1:3 ...,
Sending "run_virtual" to 20181 razorbill.home
Vuser 1:RUNNING
Vuser 1:Operating in Replica Mode, No Snapshots taken...
Vuser 1:FINISHED SUCCESS
Vuser 2:RUNNING
Vuser 2:Processing 10000000 transactions with output suppressed...
Vuser 3:RUNNING
Vuser 3:Processing 10000000 transactions with output suppressed...
Vuser 1:4 ...,
Vuser 1:5 ...,
Sending "run_virtual" to 20183 razorbill.home
Vuser 1:RUNNING
Vuser 1:Operating in Replica Mode, No Snapshots taken...
Vuser 1:FINISHED SUCCESS
Vuser 2:RUNNING
Vuser 2:Processing 10000000 transactions with output suppressed...
Vuser 3:RUNNING
Vuser 3:Processing 10000000 transactions with output suppressed...
Vuser 1:6 ...,
Vuser 1:7 ...,
Sending "run_virtual" to 20184 razorbill.home
Vuser 1:RUNNING
Vuser 1:Operating in Replica Mode, No Snapshots taken...
Vuser 1:FINISHED SUCCESS
Vuser 2:RUNNING
Vuser 2:Processing 10000000 transactions with output suppressed...
Vuser 3:RUNNING
Vuser 3:Processing 10000000 transactions with output suppressed...
Vuser 1:8 ...,
Vuser 1:9 ...,
Vuser 1:10 ...,
Vuser 1:Test complete, Taking end AWR snapshot.
Vuser 1:End Snapshot 19 taken at 10 MAY 2021 09:17 of instance RAZCDB1 (1) of database RAZCDB1 (171153594)
Vuser 1:Test complete: view report from SNAPID 18 to 19
Vuser 1:2 Active Virtual Users configured
Vuser 1:TEST RESULT : System achieved 13607 NOPM from 28559 Oracle TPM
Vuser 1:FINISHED SUCCESS
Vuser 2:FINISHED SUCCESS
Vuser 3:FINISHED SUCCESS
Vuser 2:FINISHED SUCCESS
Vuser 3:FINISHED SUCCESS
Vuser 3:FINISHED SUCCESS
ALL VIRTUAL USERS COMPLETE
Vuser 2:FINISHED SUCCESS
Replica workload complete and calling exit from primary
Lost connection to : 20182 razorbill.home because target application died or connection lost
Vuser 3:FINISHED SUCCESS
ALL VIRTUAL USERS COMPLETE
Vuser 3:FINISHED SUCCESS
ALL VIRTUAL USERS COMPLETE
Vuser 2:FINISHED SUCCESS
ALL VIRTUAL USERS COMPLETE
Vuser 2:FINISHED SUCCESS
ALL VIRTUAL USERS COMPLETE
Replica workload complete and calling exit from primary
Lost connection to : 20181 razorbill.home because target application died or connection lost
Replica workload complete and calling exit from primary
Lost connection to : 20183 razorbill.home because target application died or connection lost
Replica workload complete and calling exit from primary
Lost connection to : 20184 razorbill.home because target application died or connection lost
Primary complete
deleting port_file /tmp/hdbcallback.tcl
Step workload complete

Monitoring the workload enables you to see the variation and the impact of starting additional instances against the same database over time.

Oracle Transaction Counter

Step workloads enable you to configure complex Virtual User configurations to see how your database responds to changes in load over time.

Author