Jason Owen - postgresql (https://jasonaowen.net/blog/)
Setting up PostgreSQL on Debian
Jason Owen, 2017-07-29
<p>From time to time I need to set up PostgreSQL on a Debian machine. It's fairly
straightforward, but I frequently need to look something up, so this time I am
writing down my notes.</p>
<p><a href="https://packages.debian.org/stretch/postgresql">Debian packages PostgreSQL</a>,
and if you don't care about what version of PostgreSQL you use, that's the
easiest way. If you do care about what version, <a href="https://www.postgresql.org/download/linux/debian/">the PostgreSQL project
packages all the supported versions of
PostgreSQL</a> - this allows
you to install old (supported) versions, and will allow you to easily install
PostgreSQL 10 once it is released. The PostgreSQL Debian page includes
instructions on how to add their apt repository.</p>
<p>Whether you're installing the Debian-packaged or PostgreSQL-packaged server,
installation of the current version of the server is the same:</p>
<div class="highlight"><pre><span></span>$ sudo apt install postgresql-9.6
</pre></div>
<p>This will install the server and the client, and create and start a cluster.</p>
<p>The default, installer-created cluster does not have <a href="http://paquier.xyz/postgresql-2/postgres-9-3-feature-highlight-data-checksums/">data
checksums</a>
enabled. Data checksums trade performance for safety; since I am not using
PostgreSQL in a particularly performance-sensitive environment, I would prefer
safety. To enable data checksums, we will need to recreate the cluster.</p>
<p>Additionally, I prefer to have PostgreSQL store its data files in
<code>/srv/postgresql</code> instead of <code>/var/lib/postgresql/</code>. Since we're recreating the
cluster anyway, we don't need to move any files, and can instead simply specify
the new location at creation time.</p>
<p>First, drop the old cluster:</p>
<div class="highlight"><pre><span></span>$ sudo -u postgres pg_dropcluster --stop <span class="m">9</span>.6 main
</pre></div>
<p>Then, create the new cluster:</p>
<div class="highlight"><pre><span></span>$ sudo -u postgres pg_createcluster <span class="se">\</span>
--datadir<span class="o">=</span>/srv/postgresql/9.6/main <span class="se">\</span>
--start <span class="se">\</span>
<span class="m">9</span>.6 <span class="se">\</span>
main <span class="se">\</span>
-- <span class="se">\</span>
--data-checksums
</pre></div>
<p>Note the version and cluster name in the data directory argument. Debian
contributors have created wrapper scripts (like <code>pg_createcluster</code>) to allow
multiple versions and multiple instances of PostgreSQL to run on the same
system side-by-side; including the version and cluster name in the path supports
that.</p>
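<p>To make the naming convention concrete, here is a minimal sketch that builds a data directory path from a version and a cluster name. The <code>cluster_datadir</code> helper and the <code>reporting</code> cluster name are made up for illustration, and the <code>/srv/postgresql</code> base simply mirrors the location chosen above; the script only prints paths and does not touch any cluster.</p>

```shell
# Hypothetical helper: build a per-version, per-cluster data directory path.
# The /srv/postgresql base mirrors the location chosen in this post.
cluster_datadir() {
    version="$1"
    name="$2"
    printf '%s\n' "/srv/postgresql/${version}/${name}"
}

cluster_datadir 9.6 main       # /srv/postgresql/9.6/main
cluster_datadir 10 reporting   # /srv/postgresql/10/reporting
```

<p>Each version and cluster pair maps to its own directory, which is what lets several clusters coexist. On a real system, <code>pg_lsclusters</code> lists the clusters along with their data directories.</p>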
<p>You can check that data checksums are enabled by running this query in a psql
session:</p>
<div class="highlight"><pre><span></span>$ <span class="nv">sudo</span> <span class="o">-</span><span class="nv">u</span> <span class="nv">postgres</span> <span class="nv">psql</span>
<span class="nv">postgres</span><span class="o">=</span># <span class="k">show</span> <span class="nv">data_checksums</span><span class="c1">;</span>
<span class="nv">data_checksums</span>
<span class="o">----------------</span>
<span class="nv">on</span>
<span class="ss">(</span><span class="mi">1</span> <span class="nv">row</span><span class="ss">)</span>
</pre></div>
<p>Once you've created a cluster, you'll probably want a user and a database:</p>
<div class="highlight"><pre><span></span>$ sudo -u postgres createuser <span class="si">${</span><span class="nv">USER</span><span class="si">}</span>
$ sudo -u postgres createdb --owner<span class="o">=</span><span class="si">${</span><span class="nv">USER</span><span class="si">}</span> <span class="si">${</span><span class="nv">USER</span><span class="si">}</span>
</pre></div>
<p>Now you should be able to log in with <code>psql</code>!</p>
Benchmarking UUIDs, v2
Jason Owen, 2017-04-13 (updated 2017-04-14)
<h1>Correction</h1>
<p>Shortly after I published <a href="https://jasonaowen.net/blog/2017/Apr/13/benchmarking-uuids/">Benchmarking
UUIDs</a>, Per Wigren emailed me
with a correction. It turns out the approach Jonathan and I used to time how
long PostgreSQL takes to generate a million UUIDs is mostly timing how long it
takes to generate a million queries:</p>
<div class="highlight"><pre><span></span><span class="k">DO</span> <span class="s">$$</span>
<span class="k">BEGIN</span>
<span class="k">FOR</span> <span class="n">i</span> <span class="k">IN</span> <span class="mf">0..1000000</span> <span class="k">LOOP</span>
<span class="k">PERFORM</span> <span class="mf">1</span><span class="p">;</span>
<span class="k">END</span> <span class="k">LOOP</span><span class="p">;</span>
<span class="k">RETURN</span><span class="p">;</span>
<span class="k">END</span><span class="p">;</span>
<span class="s">$$</span><span class="p">;</span>
</pre></div>
<p>They pointed out a better way to test:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span> <span class="n">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span>
<span class="k">FROM</span> <span class="p">(</span>
<span class="k">SELECT</span> <span class="mf">1</span> <span class="k">FROM</span> <span class="n">generate_series</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span> <span class="mf">1000000</span><span class="p">)</span>
<span class="p">)</span> <span class="k">AS</span> <span class="n">x</span><span class="p">;</span>
</pre></div>
<p>This results in a roughly order-of-magnitude difference in test times, just in
overhead.</p>
<p>When we take this insight and apply it to the two UUID generator functions,
we find that PostgreSQL can be faster at this task than nodejs:</p>
<div class="highlight"><pre><span></span><span class="k">SELECT</span> <span class="n">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span>
<span class="k">FROM</span> <span class="p">(</span>
<span class="k">SELECT</span> <span class="n">uuid_generate_v4</span><span class="p">()</span> <span class="k">FROM</span> <span class="n">generate_series</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span> <span class="mf">1000000</span><span class="p">)</span>
<span class="p">)</span> <span class="k">AS</span> <span class="n">x</span><span class="p">;</span>
</pre></div>
<div class="highlight"><pre><span></span><span class="k">SELECT</span> <span class="n">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span>
<span class="k">FROM</span> <span class="p">(</span>
<span class="k">SELECT</span> <span class="n">gen_random_uuid</span><span class="p">()</span> <span class="k">FROM</span> <span class="n">generate_series</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span> <span class="mf">1000000</span><span class="p">)</span>
<span class="p">)</span> <span class="k">AS</span> <span class="n">x</span><span class="p">;</span>
</pre></div>
<p>On my machine, I see a big difference between the two functions, more than 5x:</p>
<table>
<thead>
<tr>
<th>uuid_generate_v4 (uuid-ossp)</th>
<th>gen_random_uuid (pgcrypto)</th>
<th>nodejs</th>
</tr>
</thead>
<tbody>
<tr>
<td>6484.110 ms</td>
<td>1166.969 ms</td>
<td>2886.117 ms</td>
</tr>
<tr>
<td>6451.433 ms</td>
<td>1169.010 ms</td>
<td>2822.078 ms</td>
</tr>
<tr>
<td>6285.573 ms</td>
<td>1161.001 ms</td>
<td>2829.395 ms</td>
</tr>
</tbody>
</table>
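<p>The ratio can be checked from the table with a little arithmetic. This is a rough sketch that assumes <code>awk</code> is available; the <code>average</code> and <code>ratio</code> helper names are made up for illustration, and the inputs are the timings from the table above.</p>

```shell
# Average a list of timings (in ms) passed as arguments.
average() {
    printf '%s\n' "$@" | LC_ALL=C awk '{ sum += $1 } END { printf "%.3f\n", sum / NR }'
}

# Ratio of two numbers, rounded to two decimal places.
ratio() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

ossp=$(average 6484.110 6451.433 6285.573)
pgcrypto=$(average 1166.969 1169.010 1161.001)

# How many times slower uuid-ossp was than pgcrypto, on average.
ratio "$ossp" "$pgcrypto"
```

<p>With these numbers the ratio comes out at roughly 5.5, consistent with "more than 5x".</p>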
<p>Interestingly, on Per Wigren's machine, running macOS Sierra with PostgreSQL
9.6.2 installed from Homebrew, the two functions were approximately equally
fast, with <code>uuid_generate_v4</code> slightly edging out <code>gen_random_uuid</code>. Both were
faster than the nodejs version.</p>
<h1>Conclusion</h1>
<ul>
<li>Writing benchmarks is tricky!</li>
<li>Using this updated methodology, on my machine, PostgreSQL with pgcrypto is
faster at generating UUIDs than nodejs, which in turn is faster than
PostgreSQL with uuid-ossp.</li>
</ul>
<p>Thankfully, I don't think the flaw in my original measurements undermines the
conclusion I drew: the difference between these methods is vanishingly small,
and the likelihood that generating UUIDs is the bottleneck in your system is
low. Better to focus your optimization efforts elsewhere!</p>
<hr>
<p><em>Many thanks to Per Wigren for the feedback!</em></p>
<p>2017-04-14: Updated to credit Per Wigren and clarify the table of new
measurements by adding the extension to the title of the columns and including
the nodejs measurements from the previous post.</p>
Benchmarking UUIDs
Jason Owen, 2017-04-13
<p>UPDATE: The test methodology is flawed! PostgreSQL can be faster than nodejs.
See the <a href="https://jasonaowen.net/blog/2017/Apr/13/benchmarking-uuids-v2/">follow-up article</a>.</p>
<hr>
<p>Jonathan New wrote an interesting article on <a href="http://blog.jonnew.com/posts/uuid-postgres-node">UUID creation in Postgres vs
Node</a>. In it, he described the
performance tradeoff of generating a
<a href="https://en.wikipedia.org/wiki/Universally_unique_identifier">UUID</a> in the
database vs in the application. It's not very long, go read it!</p>
<p>I've used PostgreSQL to generate UUIDs before, but I hadn't seen the function
<code>uuid_generate_v4()</code>. It turns out to come from the <a href="https://www.postgresql.org/docs/9.6/static/uuid-ossp.html">uuid-ossp
extension</a>, which
also supports other UUID generation methods. Previously, I've used the
<a href="https://www.postgresql.org/docs/9.6/static/pgcrypto.html">pgcrypto extension</a>,
which provides the <code>gen_random_uuid()</code> function.</p>
<p>How do they compare? On my machine, using the <a href="https://wiki.postgresql.org/wiki/Apt">PostgreSQL package for
Ubuntu</a> (as opposed to the <a href="http://packages.ubuntu.com/xenial/postgresql">Ubuntu
package for PostgreSQL</a>...), the
pgcrypto version is more than <strong>twice as fast</strong> as the uuid-ossp version.</p>
<p>How does this compare with nodejs? Using Jonathan's approach, nodejs is about
1.5 times as fast as PostgreSQL with pgcrypto!</p>
<table>
<thead>
<tr>
<th>uuid-ossp</th>
<th>pgcrypto</th>
<th>nodejs</th>
</tr>
</thead>
<tbody>
<tr>
<td>10942.376 ms</td>
<td>4173.924 ms</td>
<td>2886.117 ms</td>
</tr>
<tr>
<td>11235.807 ms</td>
<td>4341.270 ms</td>
<td>2822.078 ms</td>
</tr>
<tr>
<td>10764.468 ms</td>
<td>4265.632 ms</td>
<td>2829.395 ms</td>
</tr>
</tbody>
</table>
<p>What does this mean? I argue: very little! The slowest method takes ~11 seconds
to generate one million UUIDs, and the fastest takes ~3 seconds. That's 3 - 11
microseconds per UUID! If this is the bottleneck in your application, I think
you've done a very good job of optimizing - and you might have a pretty unusual
use case.</p>
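<p>The per-UUID figure is just the total time divided by the number of UUIDs. A small sketch of that arithmetic, assuming <code>awk</code> is available (the <code>us_per_op</code> name is hypothetical):</p>

```shell
# Convert a total time in milliseconds for N operations into
# microseconds per operation: (ms * 1000) / N.
us_per_op() {
    LC_ALL=C awk -v ms="$1" -v n="$2" 'BEGIN { printf "%.2f\n", ms * 1000 / n }'
}

us_per_op 11235.807 1000000   # slowest uuid-ossp run: 11.24
us_per_op 2822.078 1000000    # fastest nodejs run: 2.82
```

<p>Both ends of the range land in the low microseconds, which is why the choice of generator is unlikely to matter for most applications.</p>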
<p>PS: the <a href="https://www.postgresql.org/docs/9.6/static/sql-insert.html"><code>RETURNING</code>
clause</a>, not
mentioned in Jonathan's post, is really cool:</p>
<div class="highlight"><pre><span></span><span class="o">></span> <span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">example</span> <span class="p">(</span>
<span class="n">example_id</span> <span class="nb">UUID</span> <span class="k">PRIMARY</span> <span class="k">KEY</span> <span class="k">DEFAULT</span> <span class="n">gen_random_uuid</span><span class="p">(),</span>
<span class="n">number</span> <span class="nb">INTEGER</span> <span class="k">NOT</span> <span class="k">NULL</span>
<span class="p">);</span>
<span class="go">CREATE TABLE</span>
<span class="go">> INSERT INTO example (number)</span>
<span class="go"> VALUES (1)</span>
<span class="go"> RETURNING example_id;</span>
<span class="go"> example_id</span>
<span class="go">--------------------------------------</span>
<span class="go"> 045857b4-6125-4746-94b8-a2e58f342b86</span>
<span class="go">(1 row)</span>
<span class="go">INSERT 0 1</span>
</pre></div>
<h1>Methodology</h1>
<p>This was a very unscientific benchmark! I'm not controlling for other programs
running on my machine, and this is not a server, it's just a laptop.</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/BSUMBBFjxrY" frameborder="0" allowfullscreen></iframe>
<p>In the interest of writing things down, here's how I came up with the numbers
above.</p>
<h2>Environment</h2>
<p>According to <code>/proc/cpuinfo</code>, I am running on an Intel(R) Core(TM) i7-3520M CPU
@ 2.90GHz. My operating system is Ubuntu 16.04.2.</p>
<div class="highlight"><pre><span></span><span class="o">></span> <span class="k">select</span> <span class="k">version</span><span class="p">();</span>
<span class="go"> PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.3.1-14ubuntu2) 5.3.1 20160413, 64-bit</span>
</pre></div>
<div class="highlight"><pre><span></span>$ nodejs --version
v6.10.2
</pre></div>
<h2>Tests</h2>
<h3>nodejs</h3>
<div class="highlight"><pre><span></span>$ <span class="nb">cd</span> /tmp
$ npm install uuid
/tmp
└── uuid@3.0.1
$ nodejs
</pre></div>
<div class="highlight"><pre><span></span><span class="kr">const</span> <span class="nx">uuidV4</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'uuid/v4'</span><span class="p">);</span>
<span class="nx">test</span> <span class="o">=</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">time</span><span class="p">(</span><span class="s2">"uuid"</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">let</span> <span class="nx">i</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o"><</span> <span class="mi">1000000</span><span class="p">;</span> <span class="o">++</span><span class="nx">i</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">uuidV4</span><span class="p">();</span>
<span class="p">}</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">timeEnd</span><span class="p">(</span><span class="s2">"uuid"</span><span class="p">);</span>
<span class="p">}</span>
<span class="nx">test</span><span class="p">()</span>
<span class="nx">test</span><span class="p">()</span>
<span class="nx">test</span><span class="p">()</span>
</pre></div>
<h3>pgcrypto</h3>
<div class="highlight"><pre><span></span><span class="k">CREATE</span> <span class="k">EXTENSION</span> <span class="n">pgcrypto</span><span class="p">;</span>
<span class="k">CREATE</span> <span class="k">FUNCTION</span> <span class="n">loop_gen_random_uuid</span><span class="p">()</span> <span class="k">RETURNS</span> <span class="nb">void</span>
<span class="k">LANGUAGE</span> <span class="n">plpgsql</span>
<span class="k">AS</span> <span class="s">$$</span>
<span class="k">BEGIN</span>
<span class="k">FOR</span> <span class="n">i</span> <span class="k">IN</span> <span class="mf">0..1000000</span> <span class="k">LOOP</span>
<span class="k">PERFORM</span> <span class="n">gen_random_uuid</span><span class="p">();</span>
<span class="k">END</span> <span class="k">LOOP</span><span class="p">;</span>
<span class="k">RETURN</span><span class="p">;</span>
<span class="k">END</span><span class="p">;</span>
<span class="s">$$</span><span class="p">;</span>
<span class="err">\</span><span class="n">timing</span> <span class="k">on</span>
<span class="k">SELECT</span> <span class="n">loop_gen_random_uuid</span><span class="p">();</span>
<span class="k">SELECT</span> <span class="n">loop_gen_random_uuid</span><span class="p">();</span>
<span class="k">SELECT</span> <span class="n">loop_gen_random_uuid</span><span class="p">();</span>
</pre></div>
<h3>uuid-ossp</h3>
<div class="highlight"><pre><span></span><span class="k">CREATE</span> <span class="k">EXTENSION</span> <span class="s s-Name">"uuid-ossp"</span><span class="p">;</span>
<span class="k">CREATE</span> <span class="k">FUNCTION</span> <span class="n">loop_uuid_generate_v4</span><span class="p">()</span> <span class="k">RETURNS</span> <span class="nb">void</span>
<span class="k">LANGUAGE</span> <span class="n">plpgsql</span>
<span class="k">AS</span> <span class="s">$$</span>
<span class="k">BEGIN</span>
<span class="k">FOR</span> <span class="n">i</span> <span class="k">IN</span> <span class="mf">0..1000000</span> <span class="k">LOOP</span>
<span class="k">PERFORM</span> <span class="n">uuid_generate_v4</span><span class="p">();</span>
<span class="k">END</span> <span class="k">LOOP</span><span class="p">;</span>
<span class="k">RETURN</span><span class="p">;</span>
<span class="k">END</span><span class="p">;</span>
<span class="s">$$</span><span class="p">;</span>
<span class="err">\</span><span class="n">timing</span> <span class="k">on</span>
<span class="k">SELECT</span> <span class="n">loop_uuid_generate_v4</span><span class="p">();</span>
<span class="k">SELECT</span> <span class="n">loop_uuid_generate_v4</span><span class="p">();</span>
<span class="k">SELECT</span> <span class="n">loop_uuid_generate_v4</span><span class="p">();</span>
</pre></div>
<h4>Background on <code>uuid-ossp</code></h4>
<p>The <code>uuid-ossp</code> extension
<a href="https://www.postgresql.org/docs/9.6/static/uuid-ossp.html#AEN184550">builds</a>
upon some underlying library: <code>libc</code> on BSDs, <code>libuuid</code> from e2fs, or <code>ossp</code>,
the original library from which the extension takes its name. It appears that
<a href="https://postgresapp.com/">Postgres.app</a> uses <code>libuuid</code>, according to <a href="https://github.com/PostgresApp/PostgresApp/blob/122a60e975368038d3fe003b09d3979888d66ea2/src/makefile#L81">its
Makefile</a>
(note the <code>--with-uuid=e2fs</code>). The <a href="https://wiki.postgresql.org/wiki/Apt">PostgreSQL package for
Ubuntu</a> (as opposed to the <a href="http://packages.ubuntu.com/xenial/postgresql">Ubuntu
package for PostgreSQL</a>...) uses
the same library:</p>
<div class="highlight"><pre><span></span>$ ldd /usr/lib/postgresql/9.6/lib/uuid-ossp.so <span class="p">|</span> grep uuid
libuuid.so.1 <span class="o">=</span>> /lib/x86_64-linux-gnu/libuuid.so.1 <span class="o">(</span>0x00007fa4de773000<span class="o">)</span>
</pre></div>
<p>So, I think Jonathan and I are using the same underlying library, and while we
have different machines, I think this is a reasonably apples-to-apples
comparison.</p>
AWS PostgreSQL RDS Passwords
Jason Owen, 2017-02-09
<p>In order to set a strong password for the PostgreSQL database I provisioned on
Amazon RDS, I looked up the limits. In my case, there are two sources of
constraints:</p>
<ol>
<li><a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Limits.html">Amazon RDS
limits</a><ul>
<li>Must contain 8 to 128 characters</li>
<li>The password for the master database user can be any printable ASCII
character except <code>/</code>, <code>"</code>, or <code>@</code>.</li>
</ul>
</li>
<li>Characters allowed in AWS Lambda environment variables<ul>
<li>Member must satisfy regular expression pattern: <code>[^,]*</code> (I cannot find
documentation for this, except the error message when you try to save a
value that has a comma in it.)</li>
</ul>
</li>
</ol>
<p>We can generate a password that meets these restrictions with
<a href="https://tracker.debian.org/pkg/makepasswd">makepasswd(1)</a>:</p>
<div class="highlight"><pre><span></span>$ makepasswd --chars<span class="o">=</span><span class="m">128</span> <span class="se">\</span>
--string <span class="s1">'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789`~!#$%^&*()-_=+[]{}\|;:<>.?'</span><span class="se">\'</span>
</pre></div>
<p>Note the <code>'\'</code> at the end: that means "close the single-quoted string, and
append an escaped single-quote."</p>
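<p>The quoting trick is easy to try in isolation. A minimal sketch with a short stand-in alphabet (the variable names are illustrative only):</p>

```shell
# 'xyz' is a complete single-quoted string; the \' that follows
# appends one literal single quote to it.
alphabet='xyz'\'
printf '%s\n' "$alphabet"      # prints: xyz'

# The same value, spelled by wrapping the quote in double quotes instead.
alternative='xyz'"'"
printf '%s\n' "$alternative"   # prints: xyz'
```

<p>Either spelling works because the shell concatenates adjacent quoted and unquoted pieces into a single word.</p>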
<p>You can then save this to your
<a href="https://www.postgresql.org/docs/current/static/libpq-pgpass.html"><code>~/.pgpass</code></a>
file, being sure to escape <code>\</code> and <code>:</code> characters:</p>
<div class="highlight"><pre><span></span>$ sed -e <span class="s1">'s/\\/\\\\/g;s/:/\\:/g'</span>
</pre></div>
Variable names in PostgreSQL stored procedures
Jason Owen, 2017-02-08
<p>I am building a web application that delegates authentication to a third party.
Once the third party authenticates the user, the app creates a session for the
user - and maybe creates the user, too, if they don't already exist!</p>
<p>My first draft of this had all the SQL queries in the code. The logic is
something like:</p>
<div class="highlight"><pre><span></span><span class="nv">does</span> <span class="nv">user</span> <span class="nv">exist</span>?
<span class="nv">yes</span>: <span class="nv">create</span> <span class="nv">session</span>
<span class="nv">no</span>: <span class="nv">create</span> <span class="nv">user</span>, <span class="k">then</span> <span class="nv">create</span> <span class="nv">session</span>
</pre></div>
<p>I wasn't very happy with the code, for a couple of reasons. First, the SQL
queries were rather ugly string constants. Multi-line strings aren't really
great in any language, and embedding SQL in another language's source file
makes it harder for editors to do syntax highlighting. Second, handling errors
and encoding the (fairly simple) logic above was obscuring my intent.</p>
<p><a href="https://en.wikipedia.org/wiki/Stored_procedure">Stored procedures</a> are a way
to keep database logic in the database. Among other benefits, this can
dramatically simplify the calling code.</p>
<p>I ended up with a function like the following:</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span> <span class="k">FUNCTION</span> <span class="n">create_session</span><span class="p">(</span>
<span class="n">external_user_id</span> <span class="nb">bigint</span><span class="p">,</span>
<span class="n">external_user_name</span> <span class="nb">text</span><span class="p">,</span>
<span class="k">OUT</span> <span class="n">session_id</span> <span class="nb">uuid</span><span class="p">,</span>
<span class="k">OUT</span> <span class="n">session_expiration</span> <span class="nb">TIMESTAMP</span> <span class="nb">WITH TIME ZONE</span>
<span class="p">)</span> <span class="k">AS</span> <span class="s">$$</span>
<span class="k">DECLARE</span>
<span class="n">existing_user_id</span> <span class="nb">INTEGER</span><span class="p">;</span>
<span class="n">new_user_id</span> <span class="nb">INTEGER</span><span class="p">;</span>
<span class="k">BEGIN</span>
<span class="k">SELECT</span> <span class="k">INTO</span> <span class="n">existing_user_id</span> <span class="n">user_id</span>
<span class="k">FROM</span> <span class="n">users_external</span>
<span class="k">WHERE</span> <span class="n">users_external</span><span class="mf">.</span><span class="n">external_user_id</span> <span class="o">=</span> <span class="n">external_user_id</span><span class="p">;</span>
<span class="k">IF</span> <span class="n">existing_user_id</span> <span class="k">IS</span> <span class="k">NULL</span> <span class="k">THEN</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">users_external</span> <span class="p">(</span><span class="n">external_user_id</span><span class="p">,</span> <span class="n">external_user_name</span><span class="p">)</span>
<span class="k">VALUES</span> <span class="p">(</span><span class="n">external_user_id</span><span class="p">,</span> <span class="n">external_user_name</span><span class="p">)</span>
<span class="k">RETURNING</span> <span class="n">user_id</span>
<span class="k">INTO</span> <span class="n">new_user_id</span><span class="p">;</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">sessions</span> <span class="p">(</span><span class="n">user_id</span><span class="p">)</span>
<span class="k">VALUES</span> <span class="p">(</span><span class="n">new_user_id</span><span class="p">)</span>
<span class="k">RETURNING</span> <span class="n">session_id</span><span class="p">,</span> <span class="n">session_expiration</span>
<span class="k">INTO</span> <span class="n">session_id</span><span class="p">,</span> <span class="n">session_expiration</span><span class="p">;</span>
<span class="k">ELSE</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">sessions</span> <span class="p">(</span><span class="n">user_id</span><span class="p">)</span>
<span class="k">VALUES</span> <span class="p">(</span><span class="n">existing_user_id</span><span class="p">)</span>
<span class="k">RETURNING</span> <span class="n">session_id</span><span class="p">,</span> <span class="n">expires</span>
<span class="k">INTO</span> <span class="n">session_id</span><span class="p">,</span> <span class="n">session_expiration</span><span class="p">;</span>
<span class="k">END</span> <span class="k">IF</span><span class="p">;</span>
<span class="k">END</span><span class="p">;</span>
<span class="s">$$</span> <span class="k">LANGUAGE</span> <span class="n">plpgsql</span><span class="p">;</span>
</pre></div>
<p>This is a syntactically correct function and will be accepted by PostgreSQL,
but fails at runtime:</p>
<div class="highlight"><pre><span></span><span class="o">></span> <span class="k">select</span> <span class="o">*</span> <span class="k">from</span> <span class="n">create_session</span><span class="p">(</span><span class="mi">12345</span><span class="p">,</span> <span class="s1">'example'</span><span class="p">);</span>
<span class="n">ERROR</span><span class="p">:</span> <span class="k">column</span> <span class="n">reference</span> <span class="ss">"external_user_id"</span> <span class="k">is</span> <span class="n">ambiguous</span>
<span class="n">LINE</span> <span class="mi">3</span><span class="p">:</span> <span class="k">WHERE</span> <span class="n">users_external</span><span class="p">.</span><span class="n">external_user_id</span> <span class="o">=</span> <span class="n">external_user_id</span>
<span class="o">^</span>
<span class="n">DETAIL</span><span class="p">:</span> <span class="n">It</span> <span class="n">could</span> <span class="n">refer</span> <span class="k">to</span> <span class="n">either</span> <span class="n">a</span> <span class="n">PL</span><span class="o">/</span><span class="n">pgSQL</span> <span class="k">variable</span> <span class="k">or</span> <span class="n">a</span> <span class="k">table</span> <span class="k">column</span><span class="p">.</span>
<span class="n">QUERY</span><span class="p">:</span> <span class="k">SELECT</span> <span class="n">user_id</span>
<span class="k">FROM</span> <span class="n">users_external</span>
<span class="k">WHERE</span> <span class="n">users_external</span><span class="p">.</span><span class="n">external_user_id</span> <span class="o">=</span> <span class="n">external_user_id</span>
<span class="n">CONTEXT</span><span class="p">:</span> <span class="n">PL</span><span class="o">/</span><span class="n">pgSQL</span> <span class="k">function</span> <span class="n">create_session</span><span class="p">(</span><span class="nb">bigint</span><span class="p">,</span><span class="nb">text</span><span class="p">)</span> <span class="n">line</span> <span class="mi">6</span> <span class="k">at</span> <span class="k">SQL</span> <span class="k">statement</span>
</pre></div>
<h1>How do you disambiguate a parameter from a column name?</h1>
<p>There is some useful documentation about <a href="https://www.postgresql.org/docs/current/static/plpgsql-implementation.html">how variable substitution works in
PL/pgSQL</a>.
In particular, it mentions that you can disambiguate column names from variable
names by labelling the declaring block:</p>
<div class="highlight"><pre><span></span><span class="o"><<</span><span class="n">block</span><span class="o">>></span>
<span class="k">DECLARE</span>
<span class="n">foo</span> <span class="nb">int</span><span class="p">;</span>
<span class="k">BEGIN</span>
<span class="n">foo</span> <span class="p">:</span><span class="o">=</span> <span class="p">...;</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">dest</span> <span class="p">(</span><span class="n">col</span><span class="p">)</span> <span class="k">SELECT</span> <span class="n">block</span><span class="p">.</span><span class="n">foo</span> <span class="o">+</span> <span class="n">bar</span> <span class="k">FROM</span> <span class="n">src</span><span class="p">;</span>
</pre></div>
<p>This is tangentially related, but it does not cover the issue I was having.
However, it links to <a href="https://www.postgresql.org/docs/current/static/plpgsql-structure.html">Structure of
PL/pgSQL</a>,
which includes a note near the bottom:</p>
<blockquote>
<p>There is actually a hidden "outer block" surrounding the body of any PL/pgSQL
function. This block provides the declarations of the function's parameters
(if any), as well as some special variables such as FOUND (see Section
41.5.5). The outer block is labeled with the function's name, meaning that
parameters and special variables can be qualified with the function's name.</p>
</blockquote>
<p>That was the key piece I was missing! <strong>You can disambiguate a parameter from a
column by prefixing the parameter with the function name.</strong> Here's what we
needed to change to get the example to work:</p>
<div class="highlight"><pre><span></span><span class="gu">@@ -13 +13 @@</span>
<span class="gd">- WHERE users_external.external_user_id = external_user_id;</span>
<span class="gi">+ WHERE users_external.external_user_id = create_session.external_user_id;</span>
<span class="gu">@@ -17 +17,4 @@</span>
<span class="gd">- VALUES (external_user_id, external_user_name)</span>
<span class="gi">+ VALUES (</span>
<span class="gi">+ create_session.external_user_id,</span>
<span class="gi">+ create_session.external_user_name</span>
<span class="gi">+ )</span>
<span class="gu">@@ -22,2 +25,4 @@</span>
<span class="gd">- RETURNING session_id, session_expiration</span>
<span class="gd">- INTO session_id, session_expiration;</span>
<span class="gi">+ RETURNING sessions.session_id,</span>
<span class="gi">+ sessions.session_expiration</span>
<span class="gi">+ INTO create_session.session_id,</span>
<span class="gi">+ create_session.session_expiration;</span>
<span class="gu">@@ -27,2 +32,4 @@</span>
<span class="gd">- RETURNING session_id, expires</span>
<span class="gd">- INTO session_id, session_expiration;</span>
<span class="gi">+ RETURNING sessions.session_id,</span>
<span class="gi">+ sessions.session_expiration</span>
<span class="gi">+ INTO create_session.session_id,</span>
<span class="gi">+ create_session.session_expiration;</span>
</pre></div>
<p>This simplifies the calling code to a single query!</p>
<div class="highlight"><pre><span></span><span class="o">></span> <span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">create_session</span><span class="p">(</span><span class="mi">12345</span><span class="p">,</span> <span class="s1">'example'</span><span class="p">);</span>
<span class="n">session_id</span> <span class="o">|</span> <span class="n">session_expiration</span>
<span class="c1">--------------------------------------+-------------------------------</span>
<span class="mi">7</span><span class="n">be20fa5</span><span class="o">-</span><span class="mi">63</span><span class="n">ec</span><span class="o">-</span><span class="mi">4937</span><span class="o">-</span><span class="mi">8</span><span class="n">a02</span><span class="o">-</span><span class="mi">3417</span><span class="n">df54571b</span> <span class="o">|</span> <span class="mi">2017</span><span class="o">-</span><span class="mi">02</span><span class="o">-</span><span class="mi">15</span> <span class="mi">18</span><span class="p">:</span><span class="mi">44</span><span class="p">:</span><span class="mi">29</span><span class="p">.</span><span class="mi">653136</span><span class="o">-</span><span class="mi">05</span>
<span class="p">(</span><span class="mi">1</span> <span class="k">row</span><span class="p">)</span>
</pre></div>
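<p>The same technique in a smaller form, as a sketch: the table
<code>widgets</code> and the function <code>widget_name</code> below are
hypothetical and exist only to show a parameter and an <code>OUT</code>
parameter that share their names with columns.</p>
<div class="highlight"><pre>-- Hypothetical table and function, for illustration only.
CREATE TEMPORARY TABLE widgets (widget_id int, name text);
INSERT INTO widgets (widget_id, name) VALUES (1, 'sprocket'), (2, 'gear');

CREATE FUNCTION widget_name(widget_id int, OUT name text) AS $$
BEGIN
    -- Columns are qualified with the table name; parameters are qualified
    -- with the function name, which labels the hidden outer block.
    SELECT widgets.name
      INTO widget_name.name
      FROM widgets
     WHERE widgets.widget_id = widget_name.widget_id;
END;
$$ LANGUAGE plpgsql;

SELECT widget_name(2);
</pre></div>
<p>Without the <code>widget_name.</code> prefix, the bare
<code>widget_id</code> in the <code>WHERE</code> clause could refer to either
the column or the parameter, and PL/pgSQL's default behaviour is to reject
such a reference as ambiguous.</p>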
<hr>
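<p>Here's the schema I'm using for this example. <code>gen_random_uuid()</code> is built in as of PostgreSQL 13; on earlier versions it comes from the <code>pgcrypto</code> extension, which has to be created first with <code>CREATE EXTENSION pgcrypto;</code>.</p>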
<div class="highlight"><pre><span></span><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">users_external</span><span class="p">(</span>
<span class="n">user_id</span> <span class="nb">SERIAL</span> <span class="k">PRIMARY</span> <span class="k">KEY</span><span class="p">,</span>
<span class="n">external_user_id</span> <span class="nb">BIGINT</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">external_user_name</span> <span class="nb">TEXT</span> <span class="k">NOT</span> <span class="k">NULL</span>
<span class="p">);</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">sessions</span><span class="p">(</span>
<span class="n">session_id</span> <span class="n">UUID</span> <span class="k">PRIMARY</span> <span class="k">KEY</span> <span class="k">DEFAULT</span> <span class="n">gen_random_uuid</span><span class="p">(),</span>
<span class="n">user_id</span> <span class="nb">INTEGER</span> <span class="k">NOT</span> <span class="k">NULL</span> <span class="k">REFERENCES</span> <span class="n">users_external</span> <span class="p">(</span><span class="n">user_id</span><span class="p">),</span>
<span class="n">session_expiration</span> <span class="k">TIMESTAMP</span> <span class="k">WITH</span> <span class="n">TIME</span> <span class="k">ZONE</span> <span class="k">NOT</span> <span class="k">NULL</span> <span class="k">DEFAULT</span> <span class="n">NOW</span><span class="p">()</span> <span class="o">+</span> <span class="s1">'1 week'</span>
<span class="p">);</span>
</pre></div>
<p>And the complete, rewritten, working function:</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span> <span class="k">FUNCTION</span> <span class="n">create_session</span><span class="p">(</span>
<span class="n">external_user_id</span> <span class="nb">bigint</span><span class="p">,</span>
<span class="n">external_user_name</span> <span class="nb">text</span><span class="p">,</span>
<span class="k">OUT</span> <span class="n">session_id</span> <span class="nb">uuid</span><span class="p">,</span>
<span class="k">OUT</span> <span class="n">session_expiration</span> <span class="nb">TIMESTAMP</span> <span class="nb">WITH TIME ZONE</span>
<span class="p">)</span> <span class="k">AS</span> <span class="s">$$</span>
<span class="k">DECLARE</span>
<span class="n">existing_user_id</span> <span class="nb">INTEGER</span><span class="p">;</span>
<span class="n">new_user_id</span> <span class="nb">INTEGER</span><span class="p">;</span>
<span class="k">BEGIN</span>
<span class="k">SELECT</span> <span class="k">INTO</span> <span class="n">existing_user_id</span> <span class="n">user_id</span>
<span class="k">FROM</span> <span class="n">users_external</span>
<span class="k">WHERE</span> <span class="n">users_external</span><span class="mf">.</span><span class="n">external_user_id</span> <span class="o">=</span> <span class="n">create_session</span><span class="mf">.</span><span class="n">external_user_id</span><span class="p">;</span>
<span class="k">IF</span> <span class="n">existing_user_id</span> <span class="k">IS</span> <span class="k">NULL</span> <span class="k">THEN</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">users_external</span> <span class="p">(</span><span class="n">external_user_id</span><span class="p">,</span> <span class="n">external_user_name</span><span class="p">)</span>
<span class="k">VALUES</span> <span class="p">(</span>
<span class="n">create_session</span><span class="mf">.</span><span class="n">external_user_id</span><span class="p">,</span>
<span class="n">create_session</span><span class="mf">.</span><span class="n">external_user_name</span>
<span class="p">)</span>
<span class="k">RETURNING</span> <span class="n">user_id</span>
<span class="k">INTO</span> <span class="n">new_user_id</span><span class="p">;</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">sessions</span> <span class="p">(</span><span class="n">user_id</span><span class="p">)</span>
<span class="k">VALUES</span> <span class="p">(</span><span class="n">new_user_id</span><span class="p">)</span>
<span class="k">RETURNING</span> <span class="n">sessions</span><span class="mf">.</span><span class="n">session_id</span><span class="p">,</span>
<span class="n">sessions</span><span class="mf">.</span><span class="n">session_expiration</span>
<span class="k">INTO</span> <span class="n">create_session</span><span class="mf">.</span><span class="n">session_id</span><span class="p">,</span>
<span class="n">create_session</span><span class="mf">.</span><span class="n">session_expiration</span><span class="p">;</span>
<span class="k">ELSE</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">sessions</span> <span class="p">(</span><span class="n">user_id</span><span class="p">)</span>
<span class="k">VALUES</span> <span class="p">(</span><span class="n">existing_user_id</span><span class="p">)</span>
<span class="k">RETURNING</span> <span class="n">sessions</span><span class="mf">.</span><span class="n">session_id</span><span class="p">,</span>
<span class="n">sessions</span><span class="mf">.</span><span class="n">session_expiration</span>
<span class="k">INTO</span> <span class="n">create_session</span><span class="mf">.</span><span class="n">session_id</span><span class="p">,</span>
<span class="n">create_session</span><span class="mf">.</span><span class="n">session_expiration</span><span class="p">;</span>
<span class="k">END</span> <span class="k">IF</span><span class="p">;</span>
<span class="k">END</span><span class="p">;</span>
<span class="s">$$</span> <span class="k">LANGUAGE</span> <span class="n">plpgsql</span><span class="p">;</span>
</pre></div>