PostgreSQL extensions are great! Simply by adding an extension, one transforms an otherwise vanilla general-purpose database management system into one capable of processing specialized workloads in a highly optimized fashion. Some extensions, like pg_repack, simplify and enhance existing features, while other extensions, such as PostGIS and pgvector, add completely new capabilities.

I’d like to introduce to you the PostgreSQL extension TimescaleDB. As the name implies, TimescaleDB is all about data recorded against time. In database systems, time series data typically consists of relatively small records: a timestamp paired with one or more measured dimensions. A good example is the gathering of weather data, i.e., time versus air pressure, UV index, wind speed, temperature, humidity, etc.

Time series data processing can be taxing to any database system:

  • Data INSERTs are coming in at a furious rate.
  • Data retention requirements can be in the hundreds of terabytes, even petabytes.
  • Data analysis and report generation are often required for both archived data and data arriving in real time.

While PostgreSQL is quite capable of ingesting, managing, and analyzing time series data, there are other solutions, many of them proprietary, that can perform data ingestion and generate actionable insights at a faster rate.

Which brings us back to TimescaleDB.

TimescaleDB was created to improve the rate of ingestion of time series data into PostgreSQL by simplifying and automating many of the steps an experienced DBA would use to optimize and maintain the health of the database over its lifecycle.

Installing and enabling the TimescaleDB extension

The best way to appreciate the possibilities is, of course, to install the extension and try it out!

Step one: Create the PostgreSQL file repository configuration

Because my Linux machine runs Ubuntu, the following commands are used to install the extension:
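The original commands aren't shown here, so the following is a sketch based on the Timescale documentation; the packagecloud repository URL and key handling are assumptions that may differ for your release:

```shell
# prerequisites for adding a third-party apt repository
sudo apt install -y gnupg apt-transport-https lsb-release wget

# add the Timescale packagecloud repository (URL per the Timescale docs; verify for your release)
echo "deb https://packagecloud.io/timescale/timescaledb/ubuntu/ $(lsb_release -cs) main" | \
    sudo tee /etc/apt/sources.list.d/timescaledb.list

# import the repository signing key
wget --quiet -O - https://packagecloud.io/timescale/timescaledb/gpgkey | \
    sudo gpg --dearmor -o /etc/apt/trusted.gpg.d/timescaledb.gpg
```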

Step two: Get the extension

As I’m using Ubuntu “Jammy,” I have the following available packages:
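One way to list what the repository offers (the exact package list will vary by release):

```shell
# refresh the package index, then search for TimescaleDB packages
sudo apt update
apt-cache search timescaledb
```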

Step three: Install TimescaleDB packages

The available packages include:
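A sketch of the installation; the PostgreSQL major version (15 here) is an assumption to adjust to your cluster:

```shell
# the extension itself plus timescaledb-tune (shipped in timescaledb-tools)
sudo apt install -y timescaledb-2-postgresql-15 timescaledb-tools
```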

Step four: Tune the data-cluster

The CLI utility timescaledb-tune is great as it can be used to generate an optimized set of Postgres runtime parameters. However, it assumes that Postgres is the only major process running on the host and is not competing with other processes for RAM or CPU.

Suggested test invocation:
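A dry run prints the suggested settings without touching any configuration file; the pg_config path is an assumption for PostgreSQL 15 on Ubuntu:

```shell
timescaledb-tune --dry-run --pg-config /usr/lib/postgresql/15/bin/pg_config
```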

And here’s the invocation saving the results in postgresql.auto.conf:
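One possible approach is to capture the dry-run output and append it to the file; the data-directory path is an assumption for a default PostgreSQL 15 cluster on Ubuntu. Note that timescaledb-tune also takes care of adding timescaledb to shared_preload_libraries:

```shell
# append the suggested settings to postgresql.auto.conf in the data directory
timescaledb-tune --quiet --yes --dry-run | \
    sudo tee -a /var/lib/postgresql/15/main/postgresql.auto.conf
```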

The service can now be restarted:
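On Ubuntu this is simply:

```shell
sudo systemctl restart postgresql
```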

Step five: Create database and extension

Pretty standard stuff, create your database and extension:
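Something along these lines; the database name db01 is arbitrary:

```sql
CREATE DATABASE db01;
\c db01
CREATE EXTENSION timescaledb;
```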

Once created, the TimescaleDB extension installs a number of functions, views, and tables in the database:
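A quick way to inspect them from psql:

```sql
\dx timescaledb                                    -- installed extension and version
\dn                                                -- note the _timescaledb_* schemas
SELECT * FROM timescaledb_information.hypertables; -- initially empty
```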

Working with Timescale

Now that everything has been set up, we’re ready to explore.

Scenario one

Creating the tables

Let’s create two sets of tables: one is the standard heap table found in PostgreSQL, while the second type will, of course, be TimescaleDB’s hypertable.

A TimescaleDB hypertable is partitioned on a column of a date/time type found in the table. In this case, the partitioning key is column t_stamp, of datatype timestamptz (timestamp with time zone).

This is the command required to create the hypertable:
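A minimal sketch; only the t_stamp column and its type come from the text above, the other columns are assumptions for illustration:

```sql
CREATE TABLE t_timescale (
    id      bigint,
    t_stamp timestamptz NOT NULL DEFAULT now(),
    value   numeric
);

-- convert the plain table into a hypertable partitioned on t_stamp
SELECT create_hypertable('t_timescale', 't_stamp');
```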

The ordinary heap table is created thusly:
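Using the same (assumed) column layout:

```sql
CREATE TABLE t_standard (
    id      bigint,
    t_stamp timestamptz NOT NULL DEFAULT now(),
    value   numeric
);
```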

Increasing/decreasing chunk size

TimescaleDB partitions its tables into chunks. The default chunk interval is seven days, but it can be changed to any time range desired.

For the purposes of this blog, and because I’m using small data sets, the chunk interval for table t_timescale is set at 10 minutes:
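The interval is changed with set_chunk_time_interval(); the new setting applies to chunks created from this point on:

```sql
SELECT set_chunk_time_interval('t_timescale', INTERVAL '10 minutes');
```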

Chunk size best practices:

  • Never make a chunk larger than the PostgreSQL shared buffers and free RAM.
  • When in doubt, start with chunks known to be smaller than the shared buffers. The reasoning is that it’s easier to make small chunks bigger than to make large chunks smaller in order to fit the available RAM.

Populating the tables

Prior to populating the tables, the psql meta-command \timing is invoked:
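Here is \timing together with an illustrative bulk load; the 50-minute window and one-row-per-second rate are assumptions chosen so the data spans several 10-minute chunks:

```sql
\timing on

INSERT INTO t_standard (t_stamp, value)
SELECT g, random() * 100
FROM generate_series(now() - INTERVAL '50 minutes', now(), INTERVAL '1 second') AS g;

INSERT INTO t_timescale (t_stamp, value)
SELECT g, random() * 100
FROM generate_series(now() - INTERVAL '50 minutes', now(), INTERVAL '1 second') AS g;
```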

While the standard table didn’t have any index at creation, which accelerates data population, the hypertable’s index was created along with the hypertable itself. Please note: these numbers will vary wildly according to the hardware used, i.e., mine is relatively low-end.

Taking a closer look at the t_timescale schema definition, one sees that five partitioned child tables have been created. Recalling the chunk interval of 10 minutes set prior to data population, a new partition was created each time the time interval advanced by 10 minutes:
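Two ways to see the chunks from psql:

```sql
\d+ t_timescale                  -- child tables live in the _timescaledb_internal schema
SELECT show_chunks('t_timescale');
```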

Examination of one of the chunks confirms the partition is set at 10 minutes:
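The chunk boundaries can be read from the timescaledb_information.chunks view; each range_start/range_end pair should be 10 minutes apart:

```sql
SELECT chunk_name, range_start, range_end
FROM timescaledb_information.chunks
WHERE hypertable_name = 't_timescale';
```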

The inconsistent sizes, as indicated in the results below, are because the number of records varied within the assigned 10-minute interval:
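Per-chunk sizes come from chunks_detailed_size():

```sql
SELECT chunk_name, pg_size_pretty(total_bytes) AS total
FROM chunks_detailed_size('t_timescale')
ORDER BY chunk_name;
```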

You’ll notice the TimescaleDB indexes are significantly larger than the B-tree index created for table t_standard. Evidently, they contain more information:
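One way to compare them; the index names below follow PostgreSQL’s default naming convention and are assumptions:

```sql
-- B-tree index added to the heap table after the load
CREATE INDEX t_standard_t_stamp_idx ON t_standard (t_stamp);

SELECT pg_size_pretty(pg_relation_size('t_standard_t_stamp_idx'));
SELECT pg_size_pretty(hypertable_index_size('t_timescale_t_stamp_idx'));
```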

Administering hypertable chunks

The following focuses solely on simple chunk administration.

Chunk, general purpose function calls

These TimescaleDB functions are used to administer the chunks:
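A representative sample; the chunk name passed to reorder_chunk() is hypothetical — use show_chunks() to find the real ones:

```sql
SELECT show_chunks('t_timescale');
SELECT show_chunks('t_timescale', older_than => INTERVAL '30 minutes');

-- physically re-sort a chunk along an index
SELECT reorder_chunk('_timescaledb_internal._hyper_1_1_chunk',
                     't_timescale_t_stamp_idx');

-- delete old chunks wholesale instead of row-by-row DELETEs
SELECT drop_chunks('t_timescale', older_than => INTERVAL '1 hour');
```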

Chunk, compression function calls

These TimescaleDB functions are used to administer a table’s compression:
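In sketch form; compression must be enabled on the hypertable before any of the other calls will work:

```sql
-- enable compression on the hypertable
ALTER TABLE t_timescale SET (timescaledb.compress,
                             timescaledb.compress_orderby = 't_stamp');

-- compress and decompress chunks by hand
SELECT compress_chunk(c)                          FROM show_chunks('t_timescale') AS c;
SELECT decompress_chunk(c, if_compressed => true) FROM show_chunks('t_timescale') AS c;

-- or let a background policy do it
SELECT add_compression_policy('t_timescale', compress_after => INTERVAL '30 minutes');
SELECT remove_compression_policy('t_timescale');
```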

Timescale chunk runtime parameters

Similar to the general collection of Postgres runtime parameters, some of these timescale parameters operate across the data cluster while others can be more fine-tuned for specific relations and even sessions.
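For example (parameter names vary between TimescaleDB versions, so treat these as illustrative):

```sql
-- cluster-wide, via postgresql.conf or ALTER SYSTEM (reload/restart required)
ALTER SYSTEM SET timescaledb.max_background_workers = 8;

-- per-relation, as storage parameters on the hypertable
ALTER TABLE t_timescale SET (timescaledb.compress,
                             timescaledb.compress_segmentby = 'id');
```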

Scenario two

Taking a small subset of the aforementioned functions and runtime parameters, this next scenario demonstrates how one can compress everything from individual chunks to setting a comprehensive policy for a table based on the chunk age that has been created under normal production conditions.

Compressing a chunk

Table compression is one of those features best described as a killer feature. What’s especially cool is that one can not only reduce space consumption but also query and perform DML operations on a compressed hypertable.
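Compressing a single chunk might look like this:

```sql
-- compression must first be enabled on the hypertable
ALTER TABLE t_timescale SET (timescaledb.compress);

-- compress one chunk
SELECT compress_chunk(c) FROM show_chunks('t_timescale') AS c LIMIT 1;
```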

Notice how the chunk’s size is now reported as zero. In order to get the true size of the now-compressed relation, you’ll need to use a function call; see below for an example invocation.

This function gives a complete set of metrics describing the now compressed chunk:
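chunk_compression_stats() reports per-chunk before/after sizes, for instance:

```sql
SELECT chunk_name, compression_status,
       pg_size_pretty(before_compression_total_bytes) AS before_size,
       pg_size_pretty(after_compression_total_bytes)  AS after_size
FROM chunk_compression_stats('t_timescale');
```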

Decompressing a chunk

It’s just as easy to decompress a chunk as it is to compress one. The typical reason will be to maximize performance when it comes to the various DML operations that must be performed.
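For example, decompressing every compressed chunk of the hypertable:

```sql
SELECT decompress_chunk(c, if_compressed => true)
FROM show_chunks('t_timescale') AS c;
```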

Setting a chunk compression policy

As previously demonstrated, setting the CHUNK interval (time) is a fairly straightforward process. So too is setting the compression policy. One merely chooses the hypertable and how long after a chunk’s creation it should be compressed. Having a time-based argument for compression recognizes the need to process the most recent data in the shortest amount of time:

This compression policy compresses chunks that are older than 30 minutes:
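Expressed as a call to add_compression_policy():

```sql
SELECT add_compression_policy('t_timescale', compress_after => INTERVAL '30 minutes');
```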

Testing the newly set compression policy is accomplished by inserting new records:

Note the partition size differences between the old and new ones:

Disabling the compression policy is accomplished thusly:
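The matching call is remove_compression_policy():

```sql
SELECT remove_compression_policy('t_timescale');
```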

Caveat

As with all technologies, especially new ones, there’s always some form of limitation. TimescaleDB is no exception:

  • Foreign key constraints referencing a hypertable are not supported.
  • Time dimensions (columns) used for partitioning cannot have NULL values.
  • UNIQUE indexes must include all columns that are partitioning dimensions.
  • UPDATE statements that move values between partitions (chunks) are not supported.
  • Horizontal scaling and multi-node support are no longer supported. Instead, it is recommended that a distributed network file system, such as Ceph, be used for scaling purposes.

References

    https://docs.timescale.com/ 
    https://docs.timescale.com/api/latest/ 
    https://docs.timescale.com/api/latest/compression/alter_table_compression/ 
    https://docs.timescale.com/use-timescale/latest/compression/about-compression/ 
