
Testing CloudNativePG Preferred Data Durability

This is the third post about running Jepsen against CloudNativePG. Earlier posts:

First: shout out to whoever first came up with Oracle Data Guard Protection Modes. Designing it to be explained as a choice between performance, availability and protection was a great idea.

Yesterday’s blog post described how copies of the data are the core of all data safety, and why efficient architectures matter for meeting data safety requirements.

With Postgres, three-node clusters ensure the highest level of availability if one host fails. But two-node clusters are often worth the cost savings in exchange for a few seconds of unavailability during cluster reconfigurations. Similar to Oracle, Postgres two-node clusters can be configured to maximize performance, availability, or protection.

| Oracle Data Guard mode | Behavior | Patroni configuration | CloudNativePG configuration |
|---|---|---|---|
| Max Performance (Oracle default) | Async; fastest commits; possible data loss on failover | Patroni default | CNPG default |
| Max Availability (NOAFFIRM) | Sync when standby available; acknowledge after standby write (not flush); if none available, don’t block | `synchronous_mode: true`; `synchronous_commit: remote_write` | `method: any`; `number: 1`; `dataDurability: preferred`; `synchronous_commit: remote_write` |
| Max Availability (AFFIRM) | Sync when standby available; acknowledge after standby flush; if none available, don’t block | `synchronous_mode: true` | `method: any`; `number: 1`; `dataDurability: preferred` |
| Max Protection | Always sync; if no sync standby, block commits (no data loss) | `synchronous_mode: true`; `synchronous_mode_strict: true` | `method: any`; `number: 1` |
Automated failovers can involve a small amount of data loss with maximum performance and maximum availability configurations. With Oracle Fast-Start Failover, the FastStartFailoverLagLimit configuration property indicates the maximum amount of data loss that is permissible in order for an automatic failover to occur.

The previous blog post in this series compared CloudNativePG Max Performance and Max Protection modes. Now I want to take a look at Max Availability. In CloudNativePG, the key setting here is spec.postgresql.synchronous.dataDurability. When dataDurability is set to preferred, the required number of synchronous instances adjusts based on the number of available standbys. PostgreSQL will attempt to replicate WAL records to the designated number of synchronous standbys, but write operations will continue even if fewer than the requested number of standbys are available.
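To make this concrete, here is a minimal sketch of the relevant part of a CloudNativePG `Cluster` manifest with preferred data durability. The cluster name and storage size are placeholders I chose for illustration; check the CNPG documentation for your version before using it.

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-example        # placeholder name
spec:
  instances: 2
  postgresql:
    synchronous:
      method: any              # quorum-based synchronous replication
      number: 1                # request one synchronous standby
      dataDurability: preferred  # keep accepting commits if no standby is available
  storage:
    size: 1Gi                  # placeholder size
```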

All of these experiments were executed on my HP EliteBook (Ryzen Pro 5) with two CNPG Lab VMs running under Hyper-V. The tests ran in a loop for 12–24 hours to aggregate failure rates across many runs.

Experiment 1

I used the same test harness as before to induce rapid failures. The harness waits for all replicas to be READY (per Kubernetes) and then immediately kills the writer.

Hypothesis: in max protection mode we won’t see any data loss, but we will see data loss in max availability mode. Adding a third node to the cluster should reduce the likelihood of data loss.

| dataDurability | instances | runs showing data loss |
|---|---|---|
| required | 2 | 0% [results] |
| preferred | 2 | 48% [results] |
| preferred | 3 | 4% [results] |

Findings: Setting dataDurability: preferred in CloudNativePG allows for higher availability but can result in data loss during failover, especially in smaller clusters. I was surprised how much the third node helped.

Experiment 2

Hypothesis A: I was seeing a high failure rate specifically because the rapid failures were triggering a failover before CloudNativePG had enough time to restart synchronous replication after the last failure. If there are 60 seconds between each failure, then we shouldn’t see any data loss.

Hypothesis B: CloudNativePG has a failoverDelay setting which can inject a delay before the CNPG reconciliation loop triggers a failover when the primary is unhealthy. If we set this to 60 seconds then we shouldn’t see any data loss.
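As a sketch, the delay from Hypothesis B is set at the top level of the `Cluster` spec. The value shown matches the 60 seconds used in the hypothesis; the cluster name is a placeholder.

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-example   # placeholder name
spec:
  instances: 2
  failoverDelay: 60       # seconds to wait before failing over an unhealthy primary
```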

N.B. I also switched to running the latest development build from the trunk of CloudNativePG. (Separately, I wanted to test some code that was checked in the day before I ran these tests.)

| seconds between kills | failoverDelay | runs showing data loss |
|---|---|---|
| 0 | 0 | 40% [results] |
| 60 | 0 | 4% [results] |
| 0 | 60 | 0% [results] |

Findings: Introducing a delay – either by spacing out failures or by configuring failoverDelay – dramatically reduced or eliminated data loss in preferred mode. When failures occurred back-to-back with no delay, data loss was frequent. However, waiting 60 seconds between failures, or setting a 60-second failoverDelay, allowed CloudNativePG enough time to reestablish synchronous replication, resulting in little or no data loss.

What this means

CloudNativePG’s preferred data durability mode offers data safety and high availability with lower-cost two-node clusters by allowing commits to proceed even if the synchronous standby is temporarily unavailable. However, this flexibility comes with a small risk of data loss during failover, especially when failures happen in rapid succession. Introducing delays via the failoverDelay setting minimizes risk. For environments where data durability is paramount, three-node clusters in required mode remain the safest choice, but for those willing to trade a small risk of data loss for improved availability, two-node clusters in preferred mode can be a practical option. Consider setting failoverDelay alongside preferred durability for extra safety.
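Putting these together, a two-node cluster configured along the lines suggested above might look roughly like this. This is a sketch rather than a complete manifest; the name and storage size are placeholders.

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: ha-two-node            # placeholder name
spec:
  instances: 2
  failoverDelay: 60            # give sync replication time to re-form before failing over
  postgresql:
    synchronous:
      method: any
      number: 1
      dataDurability: preferred  # stay available if the standby goes away
  storage:
    size: 1Gi                  # placeholder size
```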


About Jeremy

Building and running reliable data platforms that scale and perform. about.me/jeremy_schneider


Disclaimer

This is my personal website. The views expressed here are mine alone and may not reflect the views of my employer.

contact: 312-725-9249 or schneider @ ardentperf.com

