Our platform team has a regular meeting where we often use ops issues as a springboard to dig into Postgres internals. Great meeting today – we ended up talking about the internal architecture of Postgres replication. Sharing a few high-quality links from our discussion:
Alexander Kukushkin’s conference talk earlier this year, which includes a great explanation of how replication works
- https://posetteconf.com/speakers/alexander-kukushkin/
- https://www.youtube.com/watch?v=PFn9qRGzTMc
- https://www.postgresql.eu/events/pgconfde2025/sessions/session/6559-myths-and-truths-about-synchronous-replication-in-postgresql/
- https://www.postgresql.eu/events/pgconfde2025/sessions/session/6559/slides/663/Myths%20and%20Truths%20about%20Synchronous%20Replication%20in%20PostgreSQL.pdf
Alexander’s interview on PostgresTV with Nik Samokhvalov
PostgresFM episode about synchronous_commit
Postgres documentation for the pg_stat_replication view (the most important source of replication monitoring data)
CloudNativePG source code that translates pg_stat_replication data into Prometheus metrics
- https://github.com/cloudnative-pg/cloudnative-pg/blob/main/config/manager/default-monitoring.yaml#L385
Chapter about streaming replication in Hironobu Suzuki’s book, Internals of PostgreSQL
Here is a very helpful diagram from Alexander’s slide deck, which we referenced heavily during our discussion.
Can you identify exactly where in this diagram the three lag metrics come from? (write lag, flush lag and replay lag)
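As a warm-up for that question, it can help to look at the position columns that sit next to the lag columns. Alongside the time-based write_lag, flush_lag and replay_lag, pg_stat_replication also reports sent_lsn, write_lsn, flush_lsn and replay_lsn, and the byte gap between two WAL positions is the kind of number pg_wal_lsn_diff() returns. Here is a minimal Python sketch of that arithmetic. The helper names (parse_lsn, replication_byte_gaps) and the sample LSN values are made up for illustration; they are not taken from the talk or from CloudNativePG.

```python
# Sketch only: helper names and sample values are hypothetical.

def parse_lsn(lsn: str) -> int:
    """Convert a textual LSN such as '0/5000000' into a byte position.

    An LSN is printed as two hexadecimal numbers separated by a slash:
    the high 32 bits and the low 32 bits of a 64-bit WAL position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replication_byte_gaps(primary_lsn: str, row: dict) -> dict:
    """Return how many bytes each standby position trails the primary.

    `row` stands in for one row of pg_stat_replication, reduced to its
    four *_lsn columns. `primary_lsn` stands in for the primary's
    current WAL position (for example from pg_current_wal_lsn()).
    """
    current = parse_lsn(primary_lsn)
    columns = ("sent_lsn", "write_lsn", "flush_lsn", "replay_lsn")
    return {name: current - parse_lsn(row[name]) for name in columns}


if __name__ == "__main__":
    # Made-up sample row for a single standby.
    sample_row = {
        "sent_lsn": "0/5000000",
        "write_lsn": "0/4FFFF00",
        "flush_lsn": "0/4FFF000",
        "replay_lsn": "0/4F00000",
    }
    for name, gap in replication_byte_gaps("0/5000000", sample_row).items():
        print(f"{name}: {gap} bytes behind")
```

The byte gaps say how far each stage trails, while the three lag columns say how long it took; pinning down where each of those is measured is exactly what the diagram is for.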



