
Commit f1a8f5a

committed

PGO will now turn "huge_pages" to "try" or "off" based on whether huge pages have been requested in the resource spec.
[sc-17766]

1 parent 8b14582

File tree

4 files changed: +237 additions, -0 deletions

docs/content/guides/huge-pages.md

Lines changed: 71 additions & 0 deletions

---
title: "Huge Pages"
date:
draft: false
weight: 100
---

# Huge Pages

Huge Pages, a.k.a. "Super Pages" or "Large Pages", are larger chunks of memory that can speed up your system. Normally, the chunks of memory, or "pages", used by the CPU are 4kB in size. The more memory a process needs, the more pages the CPU needs to manage. By using larger pages, the CPU can manage fewer pages and increase its efficiency. For this reason, it is generally recommended to use Huge Pages with your Postgres databases.
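
To make that difference concrete, here is a small illustrative sketch (the numbers are examples, not from this commit) of how many pages the CPU must track for the same amount of memory at each page size:

```go
package main

import "fmt"

func main() {
	const memory = 4 << 30 // 4GiB used by a hypothetical process

	// With default 4kB pages the CPU tracks over a million pages;
	// with 2MB Huge Pages, only a couple of thousand.
	fmt.Println("4kB pages:", memory/(4<<10))      // 1048576 pages
	fmt.Println("2MB huge pages:", memory/(2<<20)) // 2048 pages
}
```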

# Configuring Huge Pages with PGO

To turn Huge Pages on with PGO, you first need to have Huge Pages turned on at the OS level. This means having them enabled, and a specific number of pages preallocated, on the node(s) where you plan to schedule your pods. All processes that run on a given node and request Huge Pages share this pool of pages, so it is important to allocate enough pages for all the different processes to get what they need. This system/kube-level configuration is outside the scope of this document, since the way that Huge Pages are configured at the OS/node level depends on your Kube environment. Consult your Kube environment documentation and any IT support you have for assistance with this step.

When you enable Huge Pages in your Kube cluster, it is important to keep a few things in mind during the rest of the configuration process:

1. What size of Huge Pages are enabled? If there are multiple sizes enabled, which one is the default? Which one do you want Postgres to use?
2. How many pages were preallocated? Are there any other applications or processes that will be using these pages?
3. Which nodes have Huge Pages enabled? Is it possible that more nodes will be added to the cluster? If so, will they also have Huge Pages enabled?

Once Huge Pages are enabled on one or more nodes in your Kubernetes cluster, you can tell Postgres to start using them by adding some configuration to your PostgresCluster spec (warning: setting/changing this setting will cause your database to restart):

```yaml
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-14.6-2
  postgresVersion: 14
  instances:
    - name: instance1
      resources:
        limits:
          hugepages-2Mi: 16Mi
          memory: 4Gi
```

This is where it is important to know the size and the number of Huge Pages available. In the spec above, the `hugepages-2Mi` line indicates that we want to use 2MB sized pages. If your system only has 1GB sized pages available, then you will want to use `hugepages-1Gi` as the setting instead. The value after it, `16Mi` in our example, determines the amount of Huge Page memory to be allocated to this Postgres instance. If you have multiple instances, you will need to enable/allocate Huge Pages on an instance-by-instance basis. Keep in mind that if you have a "Highly Available" cluster, meaning you have multiple replicas, each replica will also request Huge Pages. You therefore need to be cognizant of the total number of Huge Pages available on the node(s) and the amount your cluster is requesting. If you request more pages than are available, some replicas/instances may fail to start.
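
As a rough illustration of that arithmetic (the type and numbers below are hypothetical stand-ins, not part of the PostgresCluster API), the total the cluster requests is the per-pod limit multiplied across all replicas of every instance set:

```go
package main

import "fmt"

// instanceSet is a hypothetical stand-in for an entry in spec.instances,
// keeping only the fields that matter for this calculation.
type instanceSet struct {
	name        string
	replicas    int
	hugePagesMi int // hugepages-2Mi limit per pod, in MiB
}

// totalHugePagesMi sums the huge-page memory every pod in the cluster
// will request; this total must fit in the nodes' preallocated pool.
func totalHugePagesMi(sets []instanceSet) int {
	total := 0
	for _, s := range sets {
		total += s.replicas * s.hugePagesMi
	}
	return total
}

func main() {
	// An HA cluster: one instance set with three replicas, 16Mi each.
	sets := []instanceSet{{name: "instance1", replicas: 3, hugePagesMi: 16}}
	total := totalHugePagesMi(sets)
	fmt.Printf("%dMi requested = %d two-MiB pages\n", total, total/2) // 48Mi = 24 pages
}
```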

Note: In the `instances.#.resources` spec, there are `limits` and `requests`. If a request value is not specified (as in the example above), it is presumed to be equal to the limit value. For Huge Pages, the request value must always equal the limit value; it is therefore perfectly acceptable to specify Huge Pages in the `limits` section only.

Note: Postgres uses the system default Huge Page size by default. This means that if there are multiple sizes of Huge Pages available on the node(s) and you attempt to use a size in your PostgresCluster that is not the system default, it will fail. To use a non-default size you will need to tell Postgres the size to use with the `huge_page_size` parameter, which can be set via dynamic configuration (warning: setting/changing this parameter will cause your database to restart):

```yaml
patroni:
  dynamicConfiguration:
    postgresql:
      parameters:
        huge_page_size: 1GB
```

# The Kubernetes Issue

There is an issue in Kubernetes where, if Huge Pages are available on a node, the processes running in the pods on that node are told that Huge Pages are available even if the pod has not actually requested any. This is a problem because, by default, Postgres is set to "try" to use Huge Pages. When Postgres is led to believe that Huge Pages are available and attempts to use them, only to find that the pod doesn't actually have any Huge Pages allocated since they were never requested, Postgres will fail.

We have worked around this issue by setting `huge_pages = off` in our newest Crunchy Postgres images. PGO will automatically turn `huge_pages` back to `try` whenever Huge Pages are requested in the resources spec. Those who were already happily using Huge Pages will be unaffected, and those who were not using Huge Pages, but were attempting to run their Postgres containers on nodes that have Huge Pages enabled, will no longer see their databases crash.
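
A simplified sketch of that decision follows (the real implementation is the `SetHugePages` function in `internal/postgres/huge_pages.go` in this commit; the map type here is a stand-in for the Kubernetes resource list):

```go
package main

import (
	"fmt"
	"strings"
)

// resourceList is a simplified stand-in for the Kubernetes type: it maps
// resource names (e.g. "hugepages-2Mi") to quantities in bytes.
type resourceList map[string]int64

// hugePagesRequested reports whether any instance's limits include a
// hugepages-* resource with a value greater than zero.
func hugePagesRequested(instanceLimits []resourceList) bool {
	for _, limits := range instanceLimits {
		for name, quantity := range limits {
			if strings.HasPrefix(name, "hugepages-") && quantity > 0 {
				return true
			}
		}
	}
	return false
}

// hugePagesSetting mirrors the flip between "try" and "off".
func hugePagesSetting(instanceLimits []resourceList) string {
	if hugePagesRequested(instanceLimits) {
		return "try"
	}
	return "off"
}

func main() {
	fmt.Println(hugePagesSetting([]resourceList{{"memory": 4 << 30}}))                            // off
	fmt.Println(hugePagesSetting([]resourceList{{"hugepages-2Mi": 16 << 20, "memory": 4 << 30}})) // try
}
```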

The only dilemma that remains is that those whose PostgresClusters are not using Huge Pages, but are running on nodes that have Huge Pages enabled, will see their `shared_buffers` set to their lowest possible setting. This is due to the way that Postgres' `initdb` works when bootstrapping a database. There are a few ways to work around this issue:

1. Use Huge Pages! You're already running your Postgres containers on nodes that have Huge Pages enabled, so why not use them in Postgres?
2. Create nodes in your Kubernetes cluster that don't have Huge Pages enabled, and put your Postgres containers on those nodes.
3. If for some reason you cannot use Huge Pages in Postgres, but you must run your Postgres containers on nodes that have Huge Pages enabled, you can manually set the `shared_buffers` parameter back to a good setting using dynamic configuration (warning: setting/changing this parameter will cause your database to restart):

```yaml
patroni:
  dynamicConfiguration:
    postgresql:
      parameters:
        shared_buffers: 128MB
```

internal/controller/postgrescluster/controller.go

Lines changed: 3 additions & 0 deletions

```diff
@@ -216,6 +216,9 @@ func (r *Reconciler) Reconcile(
 	pgbackrest.PostgreSQL(cluster, &pgParameters)
 	pgmonitor.PostgreSQLParameters(cluster, &pgParameters)
 
+	// Set huge_pages = try if a hugepages resource limit > 0, otherwise set "off"
+	postgres.SetHugePages(cluster, &pgParameters)
+
 	if err == nil {
 		rootCA, err = r.reconcileRootCertificate(ctx, cluster)
 	}
```

internal/postgres/huge_pages.go

Lines changed: 54 additions & 0 deletions

```go
/*
Copyright 2021 - 2023 Crunchy Data Solutions, Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package postgres

import (
	"strings"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"

	"github.com/crunchydata/postgres-operator/pkg/apis/postgres-operator.crunchydata.com/v1beta1"
)

// SetHugePages looks for a valid huge pages resource limit. If it finds one,
// it sets the PostgreSQL parameter "huge_pages" to "try". If it doesn't find
// one, it sets "huge_pages" to "off".
func SetHugePages(cluster *v1beta1.PostgresCluster, pgParameters *Parameters) {
	if hugePagesRequested(cluster) {
		pgParameters.Default.Add("huge_pages", "try")
	} else {
		pgParameters.Default.Add("huge_pages", "off")
	}
}

// hugePagesRequested checks whether a huge pages value greater than zero has
// been set in any of the PostgresCluster's instances' resource specs.
func hugePagesRequested(cluster *v1beta1.PostgresCluster) bool {
	for _, instance := range cluster.Spec.InstanceSets {
		for resourceName := range instance.Resources.Limits {
			if strings.HasPrefix(resourceName.String(), corev1.ResourceHugePagesPrefix) {
				resourceQuantity := instance.Resources.Limits.Name(resourceName, resource.BinarySI)

				if resourceQuantity != nil && resourceQuantity.Value() > 0 {
					return true
				}
			}
		}
	}

	return false
}
```

Lines changed: 109 additions & 0 deletions

```go
/*
Copyright 2021 - 2023 Crunchy Data Solutions, Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package postgres

import (
	"testing"

	"gotest.tools/v3/assert"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"

	"github.com/crunchydata/postgres-operator/internal/initialize"
	"github.com/crunchydata/postgres-operator/pkg/apis/postgres-operator.crunchydata.com/v1beta1"
)

func TestSetHugePages(t *testing.T) {
	t.Run("hugepages not set at all", func(t *testing.T) {
		cluster := new(v1beta1.PostgresCluster)

		cluster.Spec.InstanceSets = []v1beta1.PostgresInstanceSetSpec{{
			Name:     "test-instance1",
			Replicas: initialize.Int32(1),
			Resources: corev1.ResourceRequirements{
				Limits: corev1.ResourceList{},
			},
		}}

		pgParameters := NewParameters()
		SetHugePages(cluster, &pgParameters)

		assert.Equal(t, pgParameters.Default.Has("huge_pages"), true)
		assert.Equal(t, pgParameters.Default.Value("huge_pages"), "off")
	})

	t.Run("hugepages quantity not set", func(t *testing.T) {
		cluster := new(v1beta1.PostgresCluster)

		emptyQuantity, _ := resource.ParseQuantity("")
		cluster.Spec.InstanceSets = []v1beta1.PostgresInstanceSetSpec{{
			Name:     "test-instance1",
			Replicas: initialize.Int32(1),
			Resources: corev1.ResourceRequirements{
				Limits: corev1.ResourceList{
					corev1.ResourceHugePagesPrefix + "2Mi": emptyQuantity,
				},
			},
		}}

		pgParameters := NewParameters()
		SetHugePages(cluster, &pgParameters)

		assert.Equal(t, pgParameters.Default.Has("huge_pages"), true)
		assert.Equal(t, pgParameters.Default.Value("huge_pages"), "off")
	})

	t.Run("hugepages set to zero", func(t *testing.T) {
		cluster := new(v1beta1.PostgresCluster)

		cluster.Spec.InstanceSets = []v1beta1.PostgresInstanceSetSpec{{
			Name:     "test-instance1",
			Replicas: initialize.Int32(1),
			Resources: corev1.ResourceRequirements{
				Limits: corev1.ResourceList{
					corev1.ResourceHugePagesPrefix + "2Mi": resource.MustParse("0Mi"),
				},
			},
		}}

		pgParameters := NewParameters()
		SetHugePages(cluster, &pgParameters)

		assert.Equal(t, pgParameters.Default.Has("huge_pages"), true)
		assert.Equal(t, pgParameters.Default.Value("huge_pages"), "off")
	})

	t.Run("hugepages set correctly", func(t *testing.T) {
		cluster := new(v1beta1.PostgresCluster)

		cluster.Spec.InstanceSets = []v1beta1.PostgresInstanceSetSpec{{
			Name:     "test-instance1",
			Replicas: initialize.Int32(1),
			Resources: corev1.ResourceRequirements{
				Limits: corev1.ResourceList{
					corev1.ResourceHugePagesPrefix + "2Mi": resource.MustParse("16Mi"),
				},
			},
		}}

		pgParameters := NewParameters()
		SetHugePages(cluster, &pgParameters)

		assert.Equal(t, pgParameters.Default.Has("huge_pages"), true)
		assert.Equal(t, pgParameters.Default.Value("huge_pages"), "try")
	})
}
```
