
PostgreSQL Logical Replicas and Snapshots: Proceed Carefully

Earlier this year, Christos Christoudias – an engineer at Instacart – published an article on Medium about Creating a Logical Replica from a Snapshot in RDS Postgres. Recently I’ve seen this article discussed a few times, in a few different places. Is it safe? Is it ok to do this?

A little under a year before publishing this article (just before the pandemic completely shut everything down), Christos presented at the San Francisco Bay Area PostgreSQL Users Group. The YouTube recording is well worth watching. Christos started off his talk by saying “this is a novel not a textbook” which I think is great framing for technical discussions like this. I decided to post a few thoughts here on my own blog as another chapter in that novel.

I’ll start with my colleague’s answer to the question: “Is it ok to do this?” … he said, “Whenever I’ve talked to the author, they said to ‘be careful with that blog’”

I wasn’t in that conversation between my colleague and Christos (I don’t know if Christos really said that) but it perfectly reflects my own sentiment. Be careful with this blog. Let’s dig a little deeper.

First, some background:

Logical replication always consists of two distinct steps:

  1. the initial sync/copy
  2. starting the stream of changes

On the surface, this sounds simple enough. But when you look closer, it’s more complicated than you think – and you can lose data if you’re not careful.

The safest way to do logical replication is to use a single integrated solution that handles both parts, where other engineers have already worked through the data-loss corner cases. The best example of this is the CREATE PUBLICATION and CREATE SUBSCRIPTION SQL commands in PostgreSQL, which correctly handle both the initial copy and the start of the change stream.
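To make that concrete, the integrated approach looks something like this (the object names and connection details here are illustrative placeholders, not anything from the Instacart article):

```sql
-- On the publisher (source) database:
CREATE PUBLICATION my_pub FOR ALL TABLES;

-- On the subscriber (target) database, with matching empty tables in place.
-- With copy_data = true (the default), PostgreSQL performs the initial table
-- copy and then starts streaming changes from a consistent point – the tricky
-- coordination between the two steps is handled internally.
CREATE SUBSCRIPTION my_sub
    CONNECTION 'host=source-db dbname=mydb user=replicator'
    PUBLICATION my_pub
    WITH (copy_data = true);
```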

That said, it’s entirely possible to separate them – and many people do this for exactly the same reason as Instacart: to do the copy faster.

James Coleman from Braintree left a comment on the Instacart article describing the main risk I’m aware of. His description is well articulated:

If transactions are in flight at the LSN snapshot point, then are those transactions going to logically replicate? They won’t be visible in the physical snapshot restore (since there’d be no COMMIT record in the WAL for them yet), but I also wouldn’t have expected logical decoding to go back in time for all transactions in flight around that LSN. In fact, the point of the confirmed_flush_lsn (as I understand it) is so that logical decoding knows it definitely does not need to read WAL records before that point.

So unless you pause traffic and get a snapshot during that pause in writes, isn’t it possible this won’t capture in-flight transactions?
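For reference, the confirmed_flush_lsn that James mentions can be inspected directly on the source database through a standard catalog view:

```sql
-- Each logical slot tracks how far its consumer has confirmed replay.
-- Logical decoding will never re-send changes from before confirmed_flush_lsn.
SELECT slot_name, plugin, restart_lsn, confirmed_flush_lsn
  FROM pg_replication_slots
 WHERE slot_type = 'logical';
```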

So if lots of people are doing the copy separately from starting change streaming, then why do we tell people to be careful with the Instacart blog post?

The part of the Instacart blog that’s potentially problematic is the step-by-step instructions for moving the LSN forward to a specific value obtained from a specific log entry. It’s a simple process that someone could easily copy and paste without fully understanding what it’s doing. The general concept of snapshot + replication is no different from anything else that separates the initial copy from the change streaming. You could also use pg_dump, promote a standby, use pg_basebackup, or create a snapshot with your storage array. You can use this shortcut, but as I heard a coworker say: shortcuts are dangerous and fraught with peril if you don’t understand them!
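For illustration, moving the replay position forward on a subscriber is typically done with pg_replication_origin_advance – the origin name and LSN below are placeholders, and the whole point of the warning is that a wrong LSN here silently drops committed transactions:

```sql
-- On the subscriber: stop applying changes before touching the origin.
ALTER SUBSCRIPTION my_sub DISABLE;

-- The internal origin name can be found in pg_replication_origin_status;
-- the LSN is the value obtained from the snapshot/log entry.
-- Everything before this LSN will be skipped, never replayed.
SELECT pg_replication_origin_advance('pg_16389', '0/1234ABCD'::pg_lsn);

ALTER SUBSCRIPTION my_sub ENABLE;
```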

The main risk that comes to mind for me is data loss from mis-synchronization between the initial sync and the change streaming. So how could someone do this safely? Personally, I would leave lots of margin – replay a bunch of changes that I believe should already be on the target system. I’d keep an eye on the timestamp for the LSN used to start replication, I’d monitor in-flight transactions (pg_stat_activity), and I’d wait to take the snapshot until I was sure that no transactions remained which had started before the replication LSN – plus some buffer for safety. Perhaps most important, and this should go without saying, I’d perform this during a period of very low activity; if possible, I’d actually take a short downtime window and properly pause the workload.
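As a sketch of that in-flight-transaction check, something like this on the source would show whether any open transaction predates the chosen point:

```sql
-- Current WAL position, for comparison against the replication start LSN.
SELECT pg_current_wal_lsn();

-- Any transactions still open? Wait until nothing listed here started
-- before the moment corresponding to the replication LSN (plus margin).
SELECT pid, xact_start, state, left(query, 60) AS query
  FROM pg_stat_activity
 WHERE xact_start IS NOT NULL
 ORDER BY xact_start;
```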

It might be more complicated than just looking at in-flight transactions. For example, I think logical decoding has to do some extra work to handle catalog synchronization; I don’t know whether this can impact the synchronization between the initial sync and the change streaming. This might be risky territory – especially on a system with a heavy workload. Another risk is whether replaying transactions already contained in the snapshot could cause constraint violations and break replication.

I know a few people at AWS who are knowledgeable about PostgreSQL and I could easily see one of them saying something like this:

It’s not possible to do this safely unless you have someone at the steering wheel who understands PostgreSQL well.

The blog post said experts at AWS told them this was not possible at all; maybe the message got over-simplified as it was relayed, or maybe someone at AWS wasn’t fully informed. Regardless, this is just another chapter in the novel. I think we can iterate on these ideas and hopefully arrive at a process that we all agree is foolproof.

.

Post-script… The YouTube video from SFPUG isn’t related to logical replication but it’s an awesome PostgreSQL battle story. I think everyone operating PostgreSQL in production should watch it. A few of my own notes about that video:

  • The recording said this is an 11TB transactional, operational database. He mentioned that their downtime window was 2 hours – between the last delivery in Hawaii and the first delivery on the east coast. If there’s anyone left who thinks nobody is using relational databases for critical production workloads, or at new businesses that scale up quickly, this seems to be a good counterpoint. Another one is Amazon’s Fulfillment Centers, and there are plenty of public case studies with additional examples. By no means am I suggesting anything negative about other approaches to data storage and processing; I’m just saying that relational isn’t disappearing as other technologies grow. From what I see in places like the db-engines rankings, PostgreSQL in particular seems to be growing at a very healthy pace.
  • The SF PostgreSQL User Group recording also mentioned someone from AWS telling Instacart that their database was the largest RDS PostgreSQL database. That raised my eyebrows, so I checked with someone at RDS and confirmed it was inaccurate. Apologies guys – whoever told you that was evidently a bit misinformed… 🙂
  • The recording mentioned an RDS “documentation mistake” with min/max_wal_size, and listeners might walk away thinking they should just disregard RDS documentation. I remember back when this happened, and the recording is correct that it happened – but I think there is a little additional backstory that’s worth adding to the account. The upstream open source PostgreSQL community made a change between version 9.6 and version 10 that seemed innocent enough… changing the default units for this parameter. In RDS, numeric parameters always use default units. Instacart is right in saying that RDS missed updating their informational description of the parameter at the time of initial release (it’s since been fixed). Anyone who was setting a custom value for this parameter could have easily copied the value from their 9.6 database to their v10+ database, having no idea what they’d done. This problem applied to any PostgreSQL installation using default units – not just RDS. Documentation is important and impactful; I don’t want to downplay that – Instacart’s war story at the SFPUG meetup is a strong testimony of this. A certain amount of “trust but verify” is important, and I think RDS deserved to lose a little trust because of the doc mistake. I’m glad RDS fixed it. But I myself do still think the RDS documentation has a ton of value and I hope nobody walks away with a message of disregarding it!
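To illustrate the units trap described above (the specific numbers are made up, but the mechanics are real):

```sql
-- PostgreSQL 9.6: min/max_wal_size were expressed in 16MB WAL segments.
--   max_wal_size = 64       meant 64 segments = 1GB of WAL
--
-- PostgreSQL 10+: the default unit changed to megabytes.
--   max_wal_size = 64       now means just 64MB – a 16x reduction!
--
-- Copying a bare number from a 9.6 config into v10+ silently shrinks the
-- setting. Always confirm the effective value after an upgrade:
SHOW max_wal_size;
```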

.

This is my personal blog. The views expressed here are mine alone and may not reflect the views of my employer.

About Jeremy

Building and running reliable data platforms that scale and perform. about.me/jeremy_schneider

