Huge pages can make PostgreSQL faster; can we use them in Kubernetes? Modern servers operate with terabytes of RAM, and by default, the processor translates virtual memory addresses for every 4KB page. The OS maintains a huge list of allocated and free pages to perform this slow but reliable translation from virtual to physical addresses.

Please check out the “Why Linux HugePages are Super Important for Database Servers: A Case with PostgreSQL” blog post for more information.

Setup

I recommend starting with 2MB huge pages because they are trivial to set up. Unfortunately, benchmark performance is almost the same as with 4KB pages. Kubernetes worker nodes should be configured with GRUB_CMDLINE_LINUX or the sysctl vm.nr_hugepages=N setting: https://kubernetes.io/docs/tasks/manage-hugepages/scheduling-hugepages/
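For example, on each worker node (a sketch; the page count is illustrative and should match your database shared memory needs):

    # Allocate 1024 x 2MB huge pages at runtime
    sudo sysctl -w vm.nr_hugepages=1024
    # Persist the setting across reboots
    echo 'vm.nr_hugepages = 1024' | sudo tee /etc/sysctl.d/90-hugepages.conf
    # Alternatively, set kernel boot parameters in /etc/default/grub and reboot:
    # GRUB_CMDLINE_LINux="hugepagesz=2M hugepages=1024"

The kubelet detects pre-allocated huge pages and advertises them as a node resource; a kubelet restart may be needed if the pages are allocated after it starts.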

This step can be hard with managed Kubernetes services, like GKE, but is easy for kubeadm, kubespray, k3d, and kind installations.

kubectl helps to check the number of huge pages available:
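For example (the node name and values are illustrative; both the Capacity and Allocatable sections list huge page resources):

    $ kubectl describe node k3d-cluster1-server-0 | grep -i hugepages
      hugepages-1Gi      0
      hugepages-2Mi      2Gi
      hugepages-1Gi      0
      hugepages-2Mi      2Gi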

The tool reports only 2MB page availability in the above output. During the deployment procedure, at the custom resource apply stage, Percona Operator for PostgreSQL 2.2.0 is not able to start the database on such nodes:

The logs are confusing:

By default, PostgreSQL is configured to use huge pages, but Kubernetes needs to allow it first: .spec.instances.resources.limits should be modified to mention huge pages. PG pods are not able to start without proper limits on a node with huge pages enabled.

hugepages-2Mi works only in combination with the memory parameter; you can’t specify just the huge pages limit.
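A minimal sketch of the relevant cr.yaml fragment (the instance name is illustrative; the sizes match the values used in the walkthrough below):

    spec:
      instances:
        - name: instance1
          resources:
            limits:
              memory: 1Gi            # required alongside the huge pages limit
              hugepages-2Mi: 1024Mi  # 2MB huge pages reserved for the pod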

Finally, let’s verify huge pages usage in the postmaster memory map:
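A sketch of the check, assuming the postmaster runs as PID 1 inside the database container (the pod and container names are illustrative):

    kubectl exec <postgres-pod> -c database -- grep -i hugetlb /proc/1/smaps_rollup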

Both the Shared_Hugetlb and Private_Hugetlb fields are non-zero (18432 kB and 264192 kB). This confirms that PostgreSQL can use huge pages.

Don’t set huge pages to the exact value of shared_buffers, as shared memory could also be consumed by extensions and many internal structures.

pg_stat_statements and pg_stat_monitor could introduce a significant difference for small shared_buffers values. Thus, you may need hugepages-2Mi: 512Mi for shared_buffers: 128MB.
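On PostgreSQL 15 and newer, the server can report how much shared memory it maps and how many huge pages that requires (a sketch; run it in psql):

    -- Total shared memory allocated at server start
    SHOW shared_memory_size;
    -- Number of huge pages needed for that allocation
    SHOW shared_memory_size_in_huge_pages;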

Now you know all the caveats and may want to repeat the configuration.

It’s easy with anydbver and k3d. Allocate 2MB huge pages:
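For example, on the host (the page count is illustrative):

    sudo sysctl -w vm.nr_hugepages=1024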

Verify huge pages availability:
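On the host, /proc/meminfo shows the totals:

    grep -i hugepages /proc/meminfo

HugePages_Total and HugePages_Free should reflect the allocation.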

  1. Install and configure anydbver.

  2. Start the k3d cluster and install Percona Operator for PostgreSQL 2.2.0 (see the sketch after this list):

  3. The command hangs at the cluster deployment stage, and a second terminal shows the CrashLoopBackOff state:

  4. Change data/k8s/percona-postgresql-operator/deploy/cr.yaml
    Uncomment .spec.instances[0].resources.limits and set memory: 1Gi, hugepages-2Mi: 1024Mi
  5. Apply the CR again (see the sketch after this list):
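A hedged sketch of steps 2-5 using plain k3d and kubectl instead of anydbver (the cluster name and manifest paths are illustrative; bundle.yaml and cr.yaml come from the percona-postgresql-operator repository):

    # Step 2: create a cluster, install the operator, and apply the database CR
    k3d cluster create cluster1
    kubectl apply --server-side -f deploy/bundle.yaml
    kubectl apply -f deploy/cr.yaml

    # Step 3: in a second terminal, watch the database pods crash-loop
    kubectl get pods --watch

    # Steps 4-5: uncomment resources.limits in deploy/cr.yaml
    # (memory: 1Gi, hugepages-2Mi: 1024Mi), then apply the CR again
    kubectl apply -f deploy/cr.yaml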

In summary:

  • Huge pages are not supported out of the box in public clouds
  • Database crashes (bus errors) can occur if huge pages allocation fails
  • Huge pages are not a silver bullet.
    • Without frequent CPU context switches and highly random access to a large shared buffer pool, default 4KB pages show comparable results.
    • Workloads with fewer than 4,000-5,000 transactions per second are fine even without huge pages

 

Learn more about Percona Operator for PostgreSQL
