Postgres database-level “synchronous replication” does not actually mean the replication is synchronous. It’s a bit of a lie really. The replication is actually – always – asynchronous. What it actually means is “when the client issues a COMMIT then pause until we know the transaction is replicated.” In fact the primary writer database doesn’t need to wait for the replicas to catch up UNTIL the client issues a COMMIT …and even then it’s only a single individual connection which waits. This has many interesting properties.
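To illustrate (a sketch — the standby name and table are placeholders), the wait is governed per session by the synchronous_commit parameter, and nothing blocks until the COMMIT itself:

```sql
-- postgresql.conf on the primary; "standby1" is a placeholder name
-- synchronous_standby_names = 'standby1'

SET synchronous_commit = on;   -- this session's COMMITs wait for the standby
BEGIN;
INSERT INTO t VALUES (1);      -- no waiting here; WAL streams asynchronously
COMMIT;                        -- only now does this one connection pause

-- a different session can even opt out of the wait for its own transactions
SET synchronous_commit = local;
```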
One benefit is throughput and performance. It means that much of the database workload is actually asynchronous – which tends to work pretty well. The replication stream operates in parallel to the primary workload.
But an interesting drawback is that the primary can race quite far ahead of the replica before that COMMIT statement hits, and then the specific client who issued the COMMIT will need to sit and wait for a while. It also means that bulk operations like pg_repack or VACUUM FULL or REFRESH MATERIALIZED VIEW or COPY do not have anything to throttle them. They will generate WAL basically as fast as it can be written to the local disk. In the meantime, everybody else on the system will see their COMMIT operations start to exhibit dramatic hangs and apparent sudden performance drops – while they wait for their commit record to eventually get replicated by a lagging replication stream. It can be non-obvious that this performance degradation is completely unrelated to the queries that appear to be slowing down. This is the infamous IPC:SyncRep wait event.
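When those hangs hit, a quick look at pg_stat_activity shows exactly which sessions are stuck in that wait (a sketch; you'll need superuser or a role with pg_monitor to see other users' queries):

```sql
-- sessions currently waiting on synchronous replication at COMMIT time
SELECT pid, usename, now() - query_start AS waiting_for,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE wait_event_type = 'IPC'
  AND wait_event = 'SyncRep';
```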
Another drawback: as the replication stream begins to lag, the amount of disk needed for WAL storage balloons. This makes it challenging to predict the required size of a dedicated volume for WAL. A system might seem to have lots of headroom, and then a pg_repack on a large table might fill the WAL volume without warning.
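A simple way to watch the lag (and with it the retained WAL) grow is to compare the primary's current LSN against each standby's progress in pg_stat_replication:

```sql
-- bytes of WAL each standby is behind the primary
SELECT application_name, sync_state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn)   AS send_lag_bytes,
       pg_wal_lsn_diff(pg_current_wal_lsn(), flush_lsn)  AS flush_lag_bytes,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
```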
This is a bit different from storage-level synchronous replication. With storage-level replication, each IO operation performing a write to the disk needs to be replicated. Postgres has a single WAL stream – so if any connection issues a COMMIT then postgres will immediately fsync the entire WAL stream up to that point – including all of the WAL for the bulk operation. In this way, the fsync works a little bit like the IPC:SyncRep wait – however I have a sense that fsync somehow introduces more backpressure into the system as a whole and likely provides at least a small amount of healthy throttling for large bulk operations.
When your workload consists ONLY of small short transactions, Postgres database-level replication can work really well and there’s back-pressure that keeps the database system in equilibrium. The database won’t lag much, because each individual transaction’s COMMIT pauses. The problem is when you start injecting those big bulk operations with no back-pressure to throttle them.
This is also the reason why autovacuum_vacuum_cost_delay of zero can cause chaos and is a bad idea; it unleashes a vacuum running at full speed and generates massive & bursty amounts of WAL for large busy tables, as fast as it can write to the disk.
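If the throttling was disabled, a safer direction is usually to keep a small delay and raise the cost limit instead (values below are illustrative, not recommendations):

```
# postgresql.conf sketch
autovacuum_vacuum_cost_delay = 2ms   # the PG 12+ default; 0 disables throttling
autovacuum_vacuum_cost_limit = 500   # raise this for faster vacuums, rather than zeroing the delay
```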
If you’re seeing the IPC:SyncRep wait event then one of the first things you should do is analyze your WAL activity. Something along these lines might be useful, if you’re debugging in real time (or add something similar to your monitoring system):
# one-time setup: pg_walinspect ships with PostgreSQL 15+
psql --csv -Xtc "create extension if not exists pg_walinspect"
# seed the file with a starting timestamp and LSN
psql --csv -Xtc "select now(),pg_current_wal_lsn()" >>wal-data.csv
while true; do
  # the last recorded LSN becomes the start of the next measurement window
  NEXTWAL=$(tail -1 wal-data.csv | cut -d, -f2)
  psql --csv -Xtc "SELECT now(),pg_current_wal_lsn(),*
    FROM pg_get_wal_stats('$NEXTWAL', pg_current_wal_lsn())" >>wal-data.csv
  echo "$(date) - $NEXTWAL"
  sleep 1
done
One potential idea for fixing this would be to add code into Postgres VACUUM, REFRESH MATERIALIZED VIEW, pg_repack and COPY which checks the value of the synchronous_commit parameter and performs periodic pauses according to how it’s set. This is a bit like the idea of doing “batch commits” during large bulk data loads, but we don’t need a real commit – we just need to periodically wait for the remote LSN to catch up, according to the value of synchronous_commit. This would provide a bit more healthy back-pressure to throttle those bulk operations, and might protect the rest of the system from such dramatic negative impact.
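Until something like that exists in core, an external bulk job can approximate it between batches. A rough sketch (the 64 MB threshold is an arbitrary assumption, and this only checks the first synchronous standby):

```sql
-- pause until the synchronous standby's flushed WAL is within 64 MB of the primary
DO $$
BEGIN
  WHILE coalesce((SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), flush_lsn)
                  FROM pg_stat_replication
                  WHERE sync_state = 'sync'
                  LIMIT 1), 0) > 64 * 1024 * 1024
  LOOP
    PERFORM pg_sleep(1);
  END LOOP;
END $$;
```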
It might also be good to come up with some monitoring queries which can make it clear when a single connection is flooding the WAL stream with one bulk operation, versus an aggregate total across many write-heavy connections.
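As a starting point, pg_stat_statements (PostgreSQL 13 and later) records WAL generated per normalized statement, which helps separate one flooding bulk operation from many small writers (a sketch assuming the extension is installed):

```sql
-- top WAL generators by normalized statement
SELECT queryid, calls, wal_records, wal_fpi,
       pg_size_pretty(wal_bytes) AS wal_generated,
       left(query, 60) AS query
FROM pg_stat_statements
ORDER BY wal_bytes DESC
LIMIT 10;
```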