<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Postgres on Redowan&#39;s Reflections</title>
    <link>https://rednafi.com/tags/postgres/</link>
    <description>Recent content in Postgres on Redowan&#39;s Reflections</description>
    <image>
      <title>Redowan&#39;s Reflections</title>
      <url>https://blob.rednafi.com/static/images/home/cover.png</url>
      <link>https://rednafi.com/</link>
    </image>
    <generator>Hugo -- 0.162.1</generator>
    <language>en</language>
    <lastBuildDate>Sat, 13 Jun 2026 21:11:14 +0200</lastBuildDate>
    <atom:link href="https://rednafi.com/tags/postgres/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Reading your own writes with WAIT FOR LSN in Postgres 19</title>
      <link>https://rednafi.com/system/wait-for-lsn/</link>
      <pubDate>Sat, 13 Jun 2026 00:00:00 +0000</pubDate>
      <guid>https://rednafi.com/system/wait-for-lsn/</guid>
      <description>PostgreSQL 19&amp;#39;s new WAIT FOR LSN command lets a replica block until it has replayed your write. The read-after-write problem it solves, the workarounds it replaces, and what the timeout, status, and mode options are actually for.</description>
      <content:encoded>&lt;p&gt;Postgres 19 finally gives us a clean way to do read-after-write across replicas. Without it,
here&amp;rsquo;s the problem:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;you write a row to the primary&lt;/li&gt;
&lt;li&gt;then you immediately read it back and that query goes to a replica&lt;/li&gt;
&lt;li&gt;but the replica hasn&amp;rsquo;t replayed the write yet, so you get stale data or nothing at all&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The usual workarounds are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;sleep after the write and hope the replica caught up (terrible, don&amp;rsquo;t do it)&lt;/li&gt;
&lt;li&gt;pin the user to the primary for a few seconds&lt;/li&gt;
&lt;li&gt;or poll the replica until the write shows up&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The new &lt;a href=&#34;https://www.postgresql.org/docs/19/sql-wait-for.html&#34; rel=&#34;noopener noreferrer&#34; target=&#34;_blank&#34;&gt;WAIT FOR LSN&lt;/a&gt; command replaces all of that. It lets the replica block until it has
replayed up to your write. MySQL has had this for ages with &lt;a href=&#34;https://dev.mysql.com/doc/refman/8.4/en/replication-functions-synchronization.html&#34; rel=&#34;noopener noreferrer&#34; target=&#34;_blank&#34;&gt;SOURCE_POS_WAIT()&lt;/a&gt;; Postgres
never did, so I&amp;rsquo;m glad to see it land in &lt;a href=&#34;https://www.postgresql.org/about/news/postgresql-19-beta-1-released-3313/&#34; rel=&#34;noopener noreferrer&#34; target=&#34;_blank&#34;&gt;Postgres 19 beta 1&lt;/a&gt;. Of everything in this
release, it&amp;rsquo;s the feature I&amp;rsquo;m most excited about.&lt;/p&gt;
&lt;h2 id=&#34;reproducing-the-stale-read&#34;&gt;Reproducing the stale read &lt;a class=&#34;anchor&#34; href=&#34;#reproducing-the-stale-read&#34;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;A stale read only shows up when there&amp;rsquo;s real replication lag, so reproducing one needs real
replication. The setup here is two Postgres containers running locally from the
&lt;code&gt;postgres:19beta1&lt;/code&gt; image: a primary, and a &lt;code&gt;pg_basebackup&lt;/code&gt; replica streaming from it.
Postgres calls a streaming replica like this a standby, and that&amp;rsquo;s the word you&amp;rsquo;ll see in
its config settings and its error messages.&lt;/p&gt;
&lt;p&gt;Replication is asynchronous. The primary commits the moment a write reaches its write-ahead
log (WAL). The replica replays that WAL slightly later. In production that gap is usually a
few milliseconds, and with enough traffic some read lands inside it. Locally, running one query
at a time, you would never catch it. To create that gap on purpose, the replica runs with
&lt;a href=&#34;https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-RECOVERY-MIN-APPLY-DELAY&#34; rel=&#34;noopener noreferrer&#34; target=&#34;_blank&#34;&gt;recovery_min_apply_delay&lt;/a&gt; set to &lt;code&gt;10s&lt;/code&gt;, a standby setting that holds each commit for a
fixed delay before applying it. The replica still receives every WAL record the instant the
primary sends it, but waits ten seconds before applying each commit.&lt;/p&gt;
&lt;p&gt;All of it is wired up in &lt;a href=&#34;https://gist.github.com/rednafi/6812c61fd022715e1d94989d49077324&#34; rel=&#34;noopener noreferrer&#34; target=&#34;_blank&#34;&gt;a small script&lt;/a&gt;; pipe it to your shell, and pass a longer delay
like &lt;code&gt;sh -s 30s&lt;/code&gt; if you want a wider window:&lt;/p&gt;
&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sh&#34; data-lang=&#34;sh&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;curl -fsSL https://gist.githubusercontent.com/rednafi/6812c61fd022715e1d94989d49077324/raw/wait_for_lsn_lab.sh &lt;span class=&#34;p&#34;&gt;|&lt;/span&gt; sh&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;That leaves two containers running. Open a psql session to each in separate terminals;
that&amp;rsquo;s where the &lt;code&gt;-- on the primary&lt;/code&gt; and &lt;code&gt;-- on the replica&lt;/code&gt; snippets below run:&lt;/p&gt;
&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sh&#34; data-lang=&#34;sh&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;docker &lt;span class=&#34;nb&#34;&gt;exec&lt;/span&gt; -it pg-primary psql -U postgres   &lt;span class=&#34;c1&#34;&gt;# terminal 1: primary&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;docker &lt;span class=&#34;nb&#34;&gt;exec&lt;/span&gt; -it pg-replica psql -U postgres   &lt;span class=&#34;c1&#34;&gt;# terminal 2: replica, 10s behind&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Create a table on the primary and seed it with a row:&lt;/p&gt;
&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;-- on the primary
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;CREATE&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;TABLE&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;users&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;id&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nb&#34;&gt;serial&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;PRIMARY&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;KEY&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;name&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nb&#34;&gt;text&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;INSERT&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;INTO&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;users&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;name&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;VALUES&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;alice&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The replica runs ten seconds behind, so right after that it hasn&amp;rsquo;t applied the
&lt;code&gt;CREATE TABLE&lt;/code&gt; yet. Query the replica immediately and the table isn&amp;rsquo;t even there:&lt;/p&gt;
&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;-- on the replica
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;SELECT&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;*&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;FROM&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;users&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-txt&#34; data-lang=&#34;txt&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ERROR:  relation &amp;#34;users&amp;#34; does not exist
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;LINE 1: SELECT * FROM users;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                      ^&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Ten seconds on, the commit lands and the read works:&lt;/p&gt;
&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;-- on the replica
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;SELECT&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;*&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;FROM&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;users&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-txt&#34; data-lang=&#34;txt&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt; id | name
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;----+-------
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  1 | alice
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;(1 row)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;So the replica does keep up; it just runs a fixed ten seconds behind the primary. Any read
that lands inside that ten-second window comes back stale.&lt;/p&gt;
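&lt;p&gt;To see that gap as numbers rather than as a missing table, you can compare what the replica
has received with what it has replayed. This is a minimal sketch using the built-in
&lt;code&gt;pg_last_wal_receive_lsn()&lt;/code&gt;, &lt;code&gt;pg_last_wal_replay_lsn()&lt;/code&gt;, &lt;code&gt;pg_wal_lsn_diff()&lt;/code&gt;, and
&lt;code&gt;pg_last_xact_replay_timestamp()&lt;/code&gt; functions; the column aliases are made up for
illustration:&lt;/p&gt;
&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;-- on the replica
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;SELECT
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    pg_last_wal_receive_lsn() AS received,
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    pg_last_wal_replay_lsn()  AS replayed,
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    -- bytes of WAL received but not yet replayed
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()) AS replay_gap_bytes,
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    -- commit time of the last transaction the replica replayed
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    pg_last_xact_replay_timestamp() AS last_replayed_commit;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Run it within ten seconds of a write on the primary and &lt;code&gt;replay_gap_bytes&lt;/code&gt; should be above
zero; once replay has caught up with everything received, it drops back to 0.&lt;/p&gt;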
&lt;h2 id=&#34;the-usual-workarounds&#34;&gt;The usual workarounds &lt;a class=&#34;anchor&#34; href=&#34;#the-usual-workarounds&#34;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The crudest is to sleep before the read. Drop a 100 ms pause between the write and the read
and hope the lag stays under it. But lag isn&amp;rsquo;t a constant. The pause comes out too long for
the common case and too short for the bad one, and every request pays it even when the
replica had already caught up.&lt;/p&gt;
&lt;p&gt;Sticky reads are a step up. After a user writes, route their reads to the primary for a few
seconds. Plenty of routing layers do exactly this. But now something has to track who wrote
recently, the primary serves reads the replicas were meant to absorb, and &amp;ldquo;a few seconds&amp;rdquo; is
still a guess.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s also the heavyweight option: make every write wait. Set &lt;a href=&#34;https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-SYNCHRONOUS-COMMIT&#34; rel=&#34;noopener noreferrer&#34; target=&#34;_blank&#34;&gt;synchronous_commit&lt;/a&gt; to
&lt;code&gt;remote_apply&lt;/code&gt;, and commits on the primary block until synchronous standbys have applied the
WAL.&lt;/p&gt;
&lt;p&gt;Reads from those standbys are always fresh, but the cost lands in the wrong place. Every commit pays for the
replication round trip, including the ones nobody is ever going to read back. And a single
struggling standby drags down every writer in the cluster.&lt;/p&gt;
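&lt;p&gt;For a feel of what that wait costs, here is a rough sketch that opts a single transaction
into &lt;code&gt;remote_apply&lt;/code&gt; with &lt;code&gt;SET LOCAL&lt;/code&gt;. It assumes the replica is already listed in
&lt;code&gt;synchronous_standby_names&lt;/code&gt; on the primary, which the lab script may not do; without that,
&lt;code&gt;remote_apply&lt;/code&gt; behaves like an ordinary local commit. The &lt;code&gt;dave&lt;/code&gt; row is hypothetical and
will shift the outputs shown in the rest of the walkthrough if you actually run it:&lt;/p&gt;
&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;-- on the primary
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;-- assumption: synchronous_standby_names already names the replica
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;BEGIN;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;SET LOCAL synchronous_commit = remote_apply;  -- applies to this transaction only
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;INSERT INTO users (name) VALUES (&amp;#39;dave&amp;#39;);     -- hypothetical row
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;COMMIT;  -- returns only after the synchronous standby has applied the commit&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;With &lt;code&gt;recovery_min_apply_delay&lt;/code&gt; at &lt;code&gt;10s&lt;/code&gt;, that &lt;code&gt;COMMIT&lt;/code&gt; would sit there for roughly the full
delay, which is the writer paying for a read that may never happen.&lt;/p&gt;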
&lt;p&gt;If you want to do it properly, you poll the replica. Right after the write, you grab the
commit&amp;rsquo;s log sequence number (LSN), which is just a byte position in the WAL, with
&lt;a href=&#34;https://www.postgresql.org/docs/current/functions-admin.html&#34; rel=&#34;noopener noreferrer&#34; target=&#34;_blank&#34;&gt;pg_current_wal_insert_lsn()&lt;/a&gt;. Then you ask the replica, over and over, whether its replay
has reached that point yet:&lt;/p&gt;
&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;-- on the replica
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;SELECT&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;pg_last_wal_replay_lsn&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;0/0307B038&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;pg_lsn&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-txt&#34; data-lang=&#34;txt&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt; ?column?
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;----------
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;(1 row)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;You loop until it flips to &lt;code&gt;t&lt;/code&gt;, and only then do the read. It works, but now you own a
busy-wait: you have to pick a poll interval, set a deadline, and decide what happens when
that deadline passes. Every codebase that does this ends up with its own slightly different
version of the loop.&lt;/p&gt;
&lt;p&gt;Polling is the right idea with bad ergonomics. The server knows the exact moment replay
passes an LSN. There was just no way to ask it to block until that happened.&lt;/p&gt;
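&lt;p&gt;For comparison, here is roughly what that loop looks like when written as a PL/pgSQL block
on the replica. It&amp;rsquo;s a sketch, not a recommendation: the 50 ms poll interval and the
15-second deadline are arbitrary picks, and the LSN is the example value from above.&lt;/p&gt;
&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;-- on the replica
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;DO $$
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;DECLARE
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    target   pg_lsn      := &amp;#39;0/0307B038&amp;#39;;  -- LSN captured on the primary
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    deadline timestamptz := clock_timestamp() + interval &amp;#39;15 seconds&amp;#39;;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;BEGIN
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    -- keep checking until replay has reached the target LSN
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    WHILE pg_last_wal_replay_lsn() &amp;lt; target LOOP
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        IF clock_timestamp() &amp;gt; deadline THEN
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;            RAISE EXCEPTION &amp;#39;replay did not reach % before the deadline&amp;#39;, target;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        END IF;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        PERFORM pg_sleep(0.05);  -- poll interval
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    END LOOP;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;END
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;$$;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;All three knobs from the paragraph above are in there: the interval, the deadline, and the
decision to raise when the deadline passes.&lt;/p&gt;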
&lt;h2 id=&#34;wait-for-lsn&#34;&gt;WAIT FOR LSN &lt;a class=&#34;anchor&#34; href=&#34;#wait-for-lsn&#34;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;WAIT FOR LSN&lt;/code&gt; blocks until the replica has replayed up to an LSN you give it. The server
wakes the session the moment replay passes the mark, so there&amp;rsquo;s no loop to write and no
busy-wait to own. The &lt;a href=&#34;https://www.postgresql.org/docs/19/sql-wait-for.html&#34; rel=&#34;noopener noreferrer&#34; target=&#34;_blank&#34;&gt;synopsis&lt;/a&gt;:&lt;/p&gt;
&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;WAIT&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;FOR&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;LSN&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;lsn&amp;#39;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;WITH&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;option&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;...]&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;It takes three options: &lt;code&gt;MODE&lt;/code&gt;, &lt;code&gt;TIMEOUT&lt;/code&gt;, and &lt;code&gt;NO_THROW&lt;/code&gt;. Leave them off and you get the
defaults: the &lt;code&gt;standby_replay&lt;/code&gt; mode, no timeout, and a thrown error if the wait can&amp;rsquo;t be
met.&lt;/p&gt;
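&lt;p&gt;As a rough illustration, here are all three options in one statement. The LSN is a
placeholder, and the quoted form of the &lt;code&gt;MODE&lt;/code&gt; value is my assumption, modeled on how
&lt;code&gt;TIMEOUT&lt;/code&gt; takes its value; check the synopsis page before relying on it:&lt;/p&gt;
&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;-- on the replica
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;-- MODE spells out the default; its quoting here is an assumption
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;WAIT FOR LSN &amp;#39;0/0307B038&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    WITH (MODE &amp;#39;standby_replay&amp;#39;, TIMEOUT &amp;#39;2s&amp;#39;, NO_THROW);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;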
&lt;p&gt;End to end, a correct read-after-write is four steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;write on the primary&lt;/li&gt;
&lt;li&gt;on that same connection, right after the commit, grab the LSN with
&lt;code&gt;pg_current_wal_insert_lsn()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;carry that LSN to wherever the read happens: a session store, a cookie, whatever the
client echoes back&lt;/li&gt;
&lt;li&gt;on the replica, run &lt;code&gt;WAIT FOR LSN&lt;/code&gt; with that LSN, then do the read&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Run it on a real write. On the primary, insert a row and grab its LSN on the same
connection, which covers steps 1 and 2:&lt;/p&gt;
&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;-- on the primary
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;INSERT&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;INTO&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;users&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;name&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;VALUES&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;bob&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;SELECT&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;pg_current_wal_insert_lsn&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-txt&#34; data-lang=&#34;txt&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;INSERT 0 1
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt; pg_current_wal_insert_lsn
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;---------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt; 0/0307B038
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;(1 row)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Your own value will differ, so use whatever your primary prints. bob is committed now, but
the replica is still ten seconds behind, so a read over there can&amp;rsquo;t see him yet. That&amp;rsquo;s why,
before the read, you wait on the LSN you just captured (step 4):&lt;/p&gt;
&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;-- on the replica
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;WAIT&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;FOR&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;LSN&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;0/0307B038&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;It blocks for the rest of the ten-second delay, then returns the instant replay reaches the
LSN:&lt;/p&gt;
&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-txt&#34; data-lang=&#34;txt&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt; status
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;---------
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt; success
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;(1 row)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;And now the read is guaranteed to see bob:&lt;/p&gt;
&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;-- on the replica
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;SELECT&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;*&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;FROM&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;users&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-txt&#34; data-lang=&#34;txt&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt; id | name
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;----+-------
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  1 | alice
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  2 | bob
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;(2 rows)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;div class=&#34;alert alert-warning&#34;&gt;
  &lt;p class=&#34;alert-title&#34;&gt;Warning&lt;/p&gt;
  &lt;p&gt;Grab the LSN after you commit, not while the transaction is still open. If you call
&lt;code&gt;pg_current_wal_insert_lsn()&lt;/code&gt; before the commit, the commit record doesn&amp;rsquo;t exist yet, so
your real commit lands at a higher LSN than the one you captured. Wait on that lower
number and it can return before your row is actually visible.&lt;/p&gt;
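  &lt;p&gt;A minimal sketch of the ordering with an explicit transaction; the &lt;code&gt;erin&lt;/code&gt; row is
hypothetical and only there to give the transaction something to commit:&lt;/p&gt;
&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;-- on the primary
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;BEGIN;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;INSERT INTO users (name) VALUES (&amp;#39;erin&amp;#39;);  -- hypothetical row
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;-- too early: an LSN captured here sits below the commit record
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;COMMIT;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;-- safe: the commit record is already in the WAL, at or before this position
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;SELECT pg_current_wal_insert_lsn();&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;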
&lt;/div&gt;&lt;h2 id=&#34;timeouts-and-statuses&#34;&gt;Timeouts and statuses &lt;a class=&#34;anchor&#34; href=&#34;#timeouts-and-statuses&#34;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;By default the wait has no deadline. As the docs put it:&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;If no timeout is specified or it is set to zero, this command waits indefinitely.&lt;/p&gt;

&lt;/blockquote&gt;&lt;p&gt;A replica that&amp;rsquo;s wedged or hours behind would leave your request hanging for exactly that
long, so in practice you bound it with &lt;code&gt;TIMEOUT&lt;/code&gt;. To watch one fire, commit a third row on
the primary and grab its LSN:&lt;/p&gt;
&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;-- on the primary
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;INSERT&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;INTO&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;users&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;name&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;VALUES&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;carol&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;SELECT&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;pg_current_wal_insert_lsn&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-txt&#34; data-lang=&#34;txt&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;INSERT 0 1
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt; pg_current_wal_insert_lsn
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;---------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt; 0/0307B120
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;(1 row)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Then wait on it inside the ten-second window, but give it only two seconds:&lt;/p&gt;
&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;-- on the replica
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;WAIT&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;FOR&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;LSN&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;0/0307B120&amp;#39;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;WITH&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;TIMEOUT&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;2s&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-txt&#34; data-lang=&#34;txt&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ERROR:  timed out while waiting for target LSN 0/0307B120 to be replayed; current standby_replay LSN 0/0307B0F8&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;By default, a wait that doesn&amp;rsquo;t succeed raises an error, so your application has to catch
it. Add &lt;code&gt;NO_THROW&lt;/code&gt; and it returns a status instead of raising:&lt;/p&gt;
&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;-- on the replica
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;WAIT&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;FOR&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;LSN&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;0/0307B120&amp;#39;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;WITH&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;TIMEOUT&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;2s&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;NO_THROW&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-txt&#34; data-lang=&#34;txt&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt; status
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;---------
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt; timeout
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;(1 row)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The status is one of &lt;code&gt;success&lt;/code&gt;, &lt;code&gt;timeout&lt;/code&gt;, or &lt;code&gt;not in recovery&lt;/code&gt;. That last one means the
server you asked isn&amp;rsquo;t a standby. You&amp;rsquo;d see it if you ran the command on the primary by
mistake, which without &lt;code&gt;NO_THROW&lt;/code&gt; is an error:&lt;/p&gt;
&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;-- on the primary
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;WAIT&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;FOR&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;LSN&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;0/0307B120&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-txt&#34; data-lang=&#34;txt&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ERROR:  recovery is not in progress
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;HINT:  Waiting for the standby_replay LSN can only be executed during recovery.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;-- on the primary
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;WAIT&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;FOR&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;LSN&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;0/0307B120&amp;#39;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;WITH&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;NO_THROW&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-txt&#34; data-lang=&#34;txt&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;     status
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;-----------------
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt; not in recovery
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;(1 row)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;You&amp;rsquo;d also see it if the standby gets promoted while a wait is in flight. Promote it while a
session is waiting on a far-off LSN without &lt;code&gt;NO_THROW&lt;/code&gt;, and that session comes back with:&lt;/p&gt;
&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-txt&#34; data-lang=&#34;txt&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ERROR:  recovery is not in progress
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;DETAIL:  Recovery ended before target LSN 99/00000000 was replayed; last standby_replay LSN 0/0307B240.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Promotion starts a new timeline, so the docs say to re-evaluate whether the LSN you&amp;rsquo;re
holding still means anything.&lt;/p&gt;
&lt;p&gt;That leaves the read path with three branches:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;success&lt;/code&gt;: read from the replica&lt;/li&gt;
&lt;li&gt;&lt;code&gt;timeout&lt;/code&gt;: fall back to the primary, or knowingly serve stale data&lt;/li&gt;
&lt;li&gt;&lt;code&gt;not in recovery&lt;/code&gt;: the topology changed, so re-check which server is the primary before
retrying&lt;/li&gt;
&lt;/ul&gt;
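&lt;p&gt;Those three branches can be sketched in Python. This is a minimal sketch under stated assumptions, not a driver integration: &lt;code&gt;run_on_replica&lt;/code&gt; is a hypothetical callable that runs one SQL statement on the replica and returns its rows, and the function names and returned labels are made up for illustration. The LSN and timeout are validated and inlined as literals, on the assumption that a utility command like this won&amp;rsquo;t accept bind parameters.&lt;/p&gt;

```python
import re
from typing import Callable, Sequence

# Hypothetical helper type: runs one SQL statement and returns its rows.
# In a real application this would wrap whatever driver you use.
RunSQL = Callable[[str], Sequence[tuple]]

# An LSN prints as two hexadecimal numbers separated by a slash.
LSN_RE = re.compile(r"[0-9A-Fa-f]{1,8}/[0-9A-Fa-f]{1,8}")
# Assumption: only simple timeouts such as '500ms' or '2s' are allowed here.
TIMEOUT_RE = re.compile(r"[0-9]+(ms|s)")


def wait_status(run_on_replica: RunSQL, lsn: str, timeout: str = "2s") -> str:
    """Run WAIT FOR LSN with NO_THROW on the replica and return its status."""
    if not LSN_RE.fullmatch(lsn):
        raise ValueError(f"not an LSN: {lsn!r}")
    if not TIMEOUT_RE.fullmatch(timeout):
        raise ValueError(f"unsupported timeout: {timeout!r}")
    sql = f"WAIT FOR LSN '{lsn}' WITH (TIMEOUT '{timeout}', NO_THROW);"
    return run_on_replica(sql)[0][0]


def route_read(status: str) -> str:
    """Map a wait status to one of the three branches of the read path."""
    if status == "success":
        return "replica"  # the replica has replayed the write
    if status == "timeout":
        return "primary"  # fall back, or knowingly serve stale data
    if status == "not in recovery":
        return "rediscover"  # topology changed: find the primary again
    raise ValueError(f"unexpected status: {status!r}")
```

&lt;p&gt;A caller would run something like &lt;code&gt;route_read(wait_status(run_on_replica, lsn))&lt;/code&gt; and branch on the label it gets back.&lt;/p&gt;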
&lt;h2 id=&#34;the-other-modes&#34;&gt;The other modes &lt;a class=&#34;anchor&#34; href=&#34;#the-other-modes&#34;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;div class=&#34;alert alert-tip&#34;&gt;
  &lt;p class=&#34;alert-title&#34;&gt;Tip&lt;/p&gt;
  &lt;p&gt;Everything so far has used the default mode, &lt;code&gt;standby_replay&lt;/code&gt;. For read-after-write it&amp;rsquo;s
the one you want, and probably the only one you&amp;rsquo;ll ever touch. The other three are about
durability, not read visibility.&lt;/p&gt;
&lt;/div&gt;&lt;p&gt;&lt;code&gt;MODE&lt;/code&gt; picks which milestone the wait blocks on. Besides the default it takes &lt;a href=&#34;https://postgr.es/c/49a181b5d&#34; rel=&#34;noopener noreferrer&#34; target=&#34;_blank&#34;&gt;three more
values&lt;/a&gt;. A committed write reaches a standby in three stages, and there&amp;rsquo;s a mode for each:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;standby_write&lt;/code&gt;: the WAL is written to the standby&amp;rsquo;s OS, though it may still sit in OS
buffers&lt;/li&gt;
&lt;li&gt;&lt;code&gt;standby_flush&lt;/code&gt;: the WAL is flushed to the standby&amp;rsquo;s disk&lt;/li&gt;
&lt;li&gt;&lt;code&gt;standby_replay&lt;/code&gt;: the WAL is applied, so a &lt;code&gt;SELECT&lt;/code&gt; on the standby can see it&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;standby_replay&lt;/code&gt; is the default and the only one of the three about visibility. The other
two stop earlier, once the WAL has reached the standby but before it&amp;rsquo;s applied. Streaming
and replay are separate steps, and the apply delay only slows replay, so carol&amp;rsquo;s WAL is
already written and flushed on the standby even though its replay is still ten seconds out.
The LSN that timed out under &lt;code&gt;standby_replay&lt;/code&gt; a moment ago comes back instantly under both:&lt;/p&gt;
&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;-- on the replica, still inside the apply delay
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;WAIT&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;FOR&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;LSN&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;0/0307B120&amp;#39;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;WITH&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;MODE&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;standby_write&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-txt&#34; data-lang=&#34;txt&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt; status
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;---------
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt; success
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;(1 row)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;-- on the replica, still inside the apply delay
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;WAIT&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;FOR&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;LSN&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;0/0307B120&amp;#39;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;WITH&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;MODE&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;standby_flush&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-txt&#34; data-lang=&#34;txt&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt; status
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;---------
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt; success
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;(1 row)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;That looks useless until you remember asynchronous replication can lose a write for good:
the primary acknowledges a commit, then crashes before any standby has the WAL, and the row
goes with it.&lt;/p&gt;
&lt;p&gt;For a write that can&amp;rsquo;t tolerate that, say it moves money, you commit it on the primary, grab
its LSN, and then wait for a standby to confirm it has the WAL before you report success.
&lt;code&gt;standby_flush&lt;/code&gt; is the mode to reach for: once it returns, the WAL is on the standby&amp;rsquo;s disk,
so the row survives a crash on the primary and the failover to that standby.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;standby_write&lt;/code&gt; is cheaper and weaker, since the bytes may still be in OS buffers when it
returns. Either way the cost falls only on the writes that ask for it, instead of switching
the whole cluster to synchronous replication.&lt;/p&gt;
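&lt;p&gt;The commit, grab, and wait sequence could look roughly like this. It&amp;rsquo;s a sketch under assumptions, not a tested recipe: &lt;code&gt;run_on_primary&lt;/code&gt; and &lt;code&gt;run_on_standby&lt;/code&gt; are hypothetical callables that run one statement in autocommit mode and return rows, &lt;code&gt;pg_current_wal_insert_lsn()&lt;/code&gt; is just one way to get an LSN at or past the commit, and combining &lt;code&gt;MODE&lt;/code&gt;, &lt;code&gt;TIMEOUT&lt;/code&gt;, and &lt;code&gt;NO_THROW&lt;/code&gt; in one option list is assumed to work the way the separate examples suggest.&lt;/p&gt;

```python
from typing import Callable, Sequence

# Hypothetical helper type: runs one SQL statement and returns its rows.
RunSQL = Callable[[str], Sequence[tuple]]


def write_then_confirm(
    run_on_primary: RunSQL,
    run_on_standby: RunSQL,
    write_sql: str,
    timeout: str = "5s",
) -> bool:
    """Commit a write, then wait until a standby has flushed its WAL."""
    # Assumption: run_on_primary executes in autocommit mode, so the write
    # is committed by the time this call returns.
    run_on_primary(write_sql)
    # Any LSN at or past the commit record works as a target.
    lsn = run_on_primary("SELECT pg_current_wal_insert_lsn();")[0][0]
    wait_sql = (
        f"WAIT FOR LSN '{lsn}' "
        f"WITH (MODE 'standby_flush', TIMEOUT '{timeout}', NO_THROW);"
    )
    status = run_on_standby(wait_sql)[0][0]
    # Only report success once the standby confirms the WAL is on its disk.
    return status == "success"
```

&lt;p&gt;A &lt;code&gt;False&lt;/code&gt; here doesn&amp;rsquo;t mean the write is gone. It&amp;rsquo;s committed on the primary, just not yet confirmed on the standby, so the caller has to decide whether to wait again or report the uncertainty.&lt;/p&gt;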
&lt;p&gt;&lt;code&gt;primary_flush&lt;/code&gt; is the odd one out. It runs on the primary rather than a standby, and waits
for the primary&amp;rsquo;s own WAL to reach disk:&lt;/p&gt;
&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;-- on the primary
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;WAIT&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;FOR&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;LSN&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;0/0307B120&amp;#39;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;WITH&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;MODE&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;primary_flush&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-txt&#34; data-lang=&#34;txt&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt; status
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;---------
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt; success
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;(1 row)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;It came back at once here because that WAL was flushed long ago. Where it pays off is under
&lt;code&gt;synchronous_commit = off&lt;/code&gt;, where a &lt;code&gt;COMMIT&lt;/code&gt; returns before its WAL is fsynced, so a crash
can drop the last few rows you thought were committed. Fire a batch of those fast commits,
and a single &lt;code&gt;primary_flush&lt;/code&gt; wait on the latest LSN then confirms everything up to it is on disk.
Once it returns, &lt;code&gt;pg_current_wal_flush_lsn()&lt;/code&gt; has reached or passed your target.&lt;/p&gt;
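&lt;p&gt;&amp;ldquo;Reached or passed&amp;rdquo; is a numeric comparison, not a string one. An LSN prints as two hexadecimal numbers separated by a slash, the high and low 32 bits of a 64-bit position. Inside Postgres the &lt;code&gt;pg_lsn&lt;/code&gt; type compares them for you. If you ever need to compare two LSN strings on the application side, a small sketch like this would do it (the function names are made up for illustration):&lt;/p&gt;

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN such as '0/0307B120' into a 64-bit integer position."""
    high, low = text.split("/")
    # The part before the slash is the high 32 bits, the part after the low 32.
    return int(high, 16) * 2**32 + int(low, 16)


def has_reached(current: str, target: str) -> bool:
    """True once the current LSN has reached or passed the target."""
    return parse_lsn(current) >= parse_lsn(target)
```

&lt;p&gt;With the values from the timeout error earlier, &lt;code&gt;has_reached(&#39;0/0307B0F8&#39;, &#39;0/0307B120&#39;)&lt;/code&gt; is false: replay was still short of the target.&lt;/p&gt;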
&lt;h2 id=&#34;restrictions&#34;&gt;Restrictions &lt;a class=&#34;anchor&#34; href=&#34;#restrictions&#34;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;WAIT FOR LSN&lt;/code&gt; only runs as a top-level statement, and only when no snapshot is open. Wrap
it in a function or a &lt;code&gt;DO&lt;/code&gt; block and Postgres refuses:&lt;/p&gt;
&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;DO&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;err&#34;&gt;$$&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;BEGIN&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;EXECUTE&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;WAIT FOR LSN &amp;#39;&amp;#39;0/0307B120&amp;#39;&amp;#39;&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;END&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;err&#34;&gt;$$&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-txt&#34; data-lang=&#34;txt&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ERROR:  WAIT FOR can only be executed as a top-level statement
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;DETAIL:  WAIT FOR cannot be used within a function, procedure, or DO block.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;CONTEXT:  SQL statement &amp;#34;WAIT FOR LSN &amp;#39;0/0307B120&amp;#39;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;PL/pgSQL function inline_code_block line 3 at EXECUTE&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Once a transaction above &lt;code&gt;READ COMMITTED&lt;/code&gt; has taken a snapshot, it holds that snapshot for
the rest of the transaction, so the wait is refused there too:&lt;/p&gt;
&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-sql&#34; data-lang=&#34;sql&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;BEGIN&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;ISOLATION&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;LEVEL&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;REPEATABLE&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;READ&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;SELECT&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;1&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;WAIT&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;FOR&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;LSN&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;0/0307B120&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;ROLLBACK&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;div class=&#34;codeblock&#34;&gt;
  &lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-txt&#34; data-lang=&#34;txt&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;BEGIN
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt; ?column?
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;----------
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;        1
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;(1 row)
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ERROR:  WAIT FOR must be called without an active or registered snapshot
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;DETAIL:  WAIT FOR cannot be executed within a transaction with an isolation level higher than READ COMMITTED.
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ROLLBACK&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;At &lt;code&gt;READ COMMITTED&lt;/code&gt; you can still run the wait inside a transaction, even after other
queries, since the snapshot is dropped between statements. It&amp;rsquo;s only blocked once a snapshot
is pinned: by &lt;code&gt;REPEATABLE READ&lt;/code&gt; or higher, an open cursor, or a surrounding function.&lt;/p&gt;
&lt;div class=&#34;alert alert-tip&#34;&gt;
  &lt;p class=&#34;alert-title&#34;&gt;Tip&lt;/p&gt;
  &lt;p&gt;The simplest habit is to issue &lt;code&gt;WAIT FOR LSN&lt;/code&gt; on its own, right before the read it&amp;rsquo;s
guarding.&lt;/p&gt;
&lt;/div&gt;&lt;p&gt;Two more things the docs call out:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;a wait on a standby can be interrupted by recovery conflicts, so the command can fail for
reasons unrelated to your LSN, and retrying is on you&lt;/li&gt;
&lt;li&gt;LSNs know nothing about timelines, so after a failover the same number can mean a
different history, which is exactly what the &lt;code&gt;not in recovery&lt;/code&gt; status is warning you about&lt;/li&gt;
&lt;/ul&gt;
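&lt;p&gt;Since retrying is on you, a bounded retry wrapper is one way to handle the first point. This is only a sketch: &lt;code&gt;RecoveryConflict&lt;/code&gt; is a hypothetical stand-in for whatever error your driver raises when the standby cancels the statement, and &lt;code&gt;do_wait&lt;/code&gt; is a hypothetical callable that issues the wait and returns its status.&lt;/p&gt;

```python
import time
from typing import Callable


class RecoveryConflict(Exception):
    """Hypothetical stand-in for the driver error raised on a recovery conflict."""


def wait_with_retry(
    do_wait: Callable[[], str],
    attempts: int = 3,
    backoff_s: float = 0.05,
) -> str:
    """Call do_wait, retrying a bounded number of times on RecoveryConflict."""
    total = max(1, attempts)
    for attempt in range(total):
        try:
            return do_wait()
        except RecoveryConflict:
            if attempt + 1 == total:
                raise  # out of attempts: surface the last conflict
            time.sleep(backoff_s * (attempt + 1))  # linear backoff
    raise AssertionError("unreachable")
```

&lt;p&gt;Only the conflict error is retried. A &lt;code&gt;timeout&lt;/code&gt; or &lt;code&gt;not in recovery&lt;/code&gt; status comes back as a normal return value and still goes through the branches described earlier.&lt;/p&gt;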
&lt;h2 id=&#34;why-it-took-three-tries&#34;&gt;Why it took three tries &lt;a class=&#34;anchor&#34; href=&#34;#why-it-took-three-tries&#34;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Postgres tried to ship this twice before 19 and pulled it back both times. A stored
procedure called &lt;code&gt;pg_wal_replay_wait()&lt;/code&gt; was committed during the Postgres 17 cycle and
reverted before release. It was &lt;a href=&#34;https://pgpedia.info/p/pg_wal_replay_wait.html&#34; rel=&#34;noopener noreferrer&#34; target=&#34;_blank&#34;&gt;committed again for 18, and reverted again&lt;/a&gt;. No released
version ever shipped it.&lt;/p&gt;
&lt;p&gt;The blocker is the snapshot rule from the restrictions above. A query on a standby holds a
snapshot, and while it&amp;rsquo;s alive Postgres can&amp;rsquo;t discard the row versions that snapshot might
still need. That collides with replication: replaying WAL sometimes removes old row
versions, say after a vacuum on the primary, so replay stalls behind any query holding an
older snapshot.&lt;/p&gt;
&lt;p&gt;Now picture the session waiting for replay also holding a snapshot. It&amp;rsquo;s waiting for replay
to advance while replay waits for its snapshot to go away. Deadlock. The &lt;a href=&#34;https://postgr.es/c/447aae13b&#34; rel=&#34;noopener noreferrer&#34; target=&#34;_blank&#34;&gt;commit that landed
in 19&lt;/a&gt;
 cites exactly this as why a function can&amp;rsquo;t do the job: a function always runs inside
a query, and that query holds a snapshot. The stored-procedure version kept hitting the same
wall, so 19 made it a top-level command instead.&lt;/p&gt;
&lt;p&gt;And it&amp;rsquo;s still beta 1. GA should land around September or October. Given the history above I
wouldn&amp;rsquo;t wire this into anything important yet, but for the first time it feels like it&amp;rsquo;ll
actually stick.&lt;/p&gt;
</content:encoded>
      <category>Database</category>
      <category>SQL</category>
      <category>Distributed Systems</category>
      <category>Postgres</category>
    </item>
  </channel>
</rss>

