At the Seattle PostgreSQL User Group meetup this past Tuesday, we got onto the topic of invalid pages in PostgreSQL. It was a fun discussion and it made me realize that it’d be worth writing down a bunch of the stuff we talked about – it might be interesting to a few more people too!
You see an error message that looks like this:
ERROR: invalid page in block 1226710 of relation base/16750/27244
First and foremost – what does this error mean? I like to think of PostgreSQL as having a fairly strong “boundary” between (1) the database itself and (2) the operating system [and by extension everything else… firmware, disks, network, remote storage, etc]. PostgreSQL maintains page validity primarily on the way in and out of its buffer cache.
What does this mean in practice? Suppose there’s a physical memory failure and somehow the ECC parity is unable to detect it. This means that a little bit of physical memory on the server now has incorrect garbage and the correct data from that memory is lost.
PostgreSQL has two main “validity checks” that it performs on pages. You can read the code in the function PageIsVerified() but I’ll summarize here. You can tell from your error message which validity check failed. It depends on whether you see a second additional WARNING right before the ERROR. The warning would look like this:Continue reading