PostgreSQL 9.6 snap and feedback

Fri Sep 2 11:29:17 UTC 2016

One of my colleagues wrote regarding his experimental postgres-9-6 RC1
snap, and I'm replying to the list with permission, because I think he
raises a lot of good points and perspectives and questions that folks
here would have comments on.

On 31/08/16 13:08, Stuart Bishop wrote:
>
> PostgreSQL is well behaved and well polished, so snapping was not
> problematic. Unlike what most people are recommending, I started
> directly with strict confinement. My gut feeling is that it is better
> to trip over issues one at a time when they come up rather than try to
> deal with them in bulk once you have something working in devmode.
> That said, I didn't trip over many.

I think there should be a meme-generator for this. "I don't often make
snaps but when I do I start strictly".

>
> The biggest problem I had was dropping privileges to a non-root user.
> There is currently no way to do this (nor to create such a
> non-privileged user when the snap is installed). PostgreSQL upstream
> block the root user from running these tools, with deliberately no way
> to disable it since they traditionally took flak from poorly managed
> installs (and they still bitch about Debian using non-standard paths).
> I needed to patch several places in the code, and there may be some
> more lurking in there not triggered by the PostgreSQL test suite.

OK, we definitely want to support this.

I think the scaffolding we need is:

 * such user names should be managed globally (i.e. assigned in an
assertion)
 * such user names should by design not conflict with real users

I think this requirement has come up before and what we decided was to
pre-reserve a name in Ubuntu which would be in /etc/passwd up front.
Perhaps this was for LXD, I forget. That's not a scalable solution but
it might work for Pg.

> Usability is currently poor. PostgreSQL is cli heavy, so having all
> the well known commands with munged names like
> 'postgresql-9-6.pg-dump' instead of 'pg_dump' would stop adoption all
> by itself. I understand that this is being addressed, and I will be
> able to present the dozen or so commands with their preferred names.
> But this will also cause different issues, when I snap postgresql-10.
> postgresql-10 will have the same set of tools. So I'll have two sets
> on my path, each only able to deal with their own confinement. I think
> the search path needs a defined ordering (eg. alphabetically), and
> tools need to be available in both their prefixed and unprefixed form.
> I will need to have both postgresql-9-6 and postgresql-10 snaps
> installed in the same container, as this is the only migration path I
> can come up with (allowing postgresql-10.pg_upgrade access to the
> postgresql-9-6 containment via the content interface).

Yes, these are very useful practical items of feedback for the command
work. Let's promise to take that up in a sprint, when it's close to the
top of our priority list. As a straw man it seems that we want groups of
snaps (your pg versions) to be able to overlap in command names, but
have only one of them get that top-level space at a time. That's a
little bit like update-alternatives but with a snap as the level of
granularity. If pg-10 is your default then all the top-level commands
are pg-10, others are pg-X.command.
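
As a rough sketch of that straw man (Python, purely illustrative; the
function name and data layout are my own assumptions, not anything snapd
does today), the command table could be built like this:

```python
def resolve_commands(snaps, default):
    """Map every exposed command name to a (snap, tool) pair.

    snaps:   dict of snap name -> list of tool names that snap ships
    default: the snap that currently owns the unprefixed names
    (Hypothetical model of the straw man, not a snapd API.)
    """
    table = {}
    for snap, tools in snaps.items():
        for tool in tools:
            # The prefixed form is always available, for every snap.
            table[f"{snap}.{tool}"] = (snap, tool)
            # Only the default snap gets the top-level name.
            if snap == default:
                table[tool] = (snap, tool)
    return table


if __name__ == "__main__":
    installed = {
        "postgresql-9-6": ["pg_dump"],
        "postgresql-10": ["pg_dump", "pg_upgrade"],
    }
    for name, target in sorted(resolve_commands(installed, "postgresql-10").items()):
        print(name, "->", target)
```

Switching the default would then be a matter of rebuilding the table with
a different 'default', much as update-alternatives repoints its symlinks.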

>
> PostgreSQL is very extensible, and I haven't worked out the best way
> to handle it. This includes adding arbitrary 3rd party Python, Perl
> and tcl libraries for use from the built in stored procedure
> languages. This includes building, linking and installing entirely new
> stored procedure languages such as Ruby and all their dependencies and
> extensions. This includes building, linking and installing C stored
> procedures. This includes building other PostgreSQL extensions and
> installing them where they can be installed using 'CREATE EXTENSION',
> such as Citus for massive scale out and sharding or BDR for multi
> master replication. The ecosystem is too vast to package everything,
> and a lot of people are using bespoke and proprietary tools.

Yes, the general question about plugins and extensions is super
interesting. Snaps offer tight and deterministic binaries on demand, and
that slightly conflicts with extensible ecosystems, but we must explore
the boundaries of the known world to better understand where we're going.

Simplistically, a snap author could make an ecosystem of things they
bring into their snap. But what you want is a little less controlled
than that.

>
> Launchpad was invaluable, as my attention span doesn't cover uploading
> 100MB++ blobs over my awful 3rd world ADSL connection. Awful hacks
> were used to get test suites running as part of the build process from
> the static snapcraft.yaml, which in hindsight would have been done
> better as a custom plugin.
>

Yeah, a custom plugin is a much easier way to solve build issues than I
expected too. We could do a much better job of helping people write such
plugins, for example by documenting the BasePlugin much better, perhaps
as a skeleton that 'snapcraft init-plugin' gives you.

> I needed to write my own log rotation daemon. There seems to be no way
> to schedule regular operations.

System logging locally via syslog and systemd journal should be
straightforward interfaces (I think the latter is in-progress by Jamie).
The system should handle rotation of *those* logs. I like the idea of
having log rotation as a capability for logs in $SNAP_DATA and cousins.
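
To make the idea concrete, here is a minimal sketch (Python) of what a
retention-based rotation pass over a log directory might look like. This
is not Stuart's daemon; the function name, the '*.log' pattern and the
retention parameter are assumptions for illustration only.

```python
import time
from pathlib import Path


def prune_old_logs(log_dir, retention_days, now=None):
    """Delete *.log files in log_dir older than retention_days.

    Returns the names of the files removed. 'now' can be passed in
    (seconds since the epoch) to make the function easy to test.
    """
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400
    removed = []
    for path in sorted(Path(log_dir).glob("*.log")):
        # Compare each file's modification time against the cutoff.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed
```

With no scheduler available inside the snap, something like this would
have to run from a long-lived service that sleeps between passes, which
is presumably why a daemon was needed in the first place.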

> locales was tricky to get right, and I don't know if I got it right. I
> need *all* the locales available, as you can declare indexes and
> columns in particular locales to get particular collation orders. I
> ended up pulling in locales-all from universe as a stage-package, and
> my wrappers set LOCPATH to the snapped locales location.

Yes, I think numerous folks have run into this. I think an official
solution for locales-in-snap and locales-from-system would be most useful.
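
For reference, the wrapper approach described above amounts to something
like the following sketch (Python). The 'usr/lib/locale' subdirectory is
my assumption about where the staged locales end up inside the snap, and
the function name is hypothetical.

```python
import os


def locale_env(snap_root, base_env=None):
    """Return a copy of the environment with LOCPATH pointing into the snap.

    snap_root: the snap's install directory (the value of $SNAP).
    The 'usr/lib/locale' suffix is an assumed location for the staged
    locales, not something confirmed by the snap itself.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LOCPATH"] = os.path.join(snap_root, "usr", "lib", "locale")
    return env
```

A wrapper would then launch the real binary with this environment, for
example via subprocess.run([...], env=locale_env(os.environ["SNAP"])).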

> Access to dotfiles in $HOME is necessary, as it is for most of the CLI
> tools I'm interested in snapping. I suspect the use case for a home
> interface that blocks 'hidden' files is small, possibly non-existent,
> and it might be better to just remove the restriction than add a new
> interface.

Noted. I'd like to preserve the current plan, which is to extend the
home interface for parameterised access to known dotfiles, but this is
another good data point to consider.

>
> Storage is going to be a problem. I can't use the snap in the
> PostgreSQL juju charm, as I need to store WAL and data files on
> partitions provided by Juju, outside of containment. Non-juju
> production installs will also need to split WAL and datafiles onto
> separate partitions (even with pure SSD you can want separate channels
> for the increased bandwidth). And another potentially huge issue is
> that the database gets destroyed when the snap is uninstalled. I get
> the feeling that I need to be able to define arbitrary paths on
> the host to be accessible from inside containment.

Yes, we will need to look again at storage for really serious
data-persistence software like Postgres.

> And on opening up paths on the host, unix sockets. By default,
> PostgreSQL clients try to connect to the server using sockets stored
> in a well known directory (/var/run/postgresql on Debian). Tools
> compiled against the standard PostgreSQL packages will not find the
> listening socket in /var/run/postgresql, but need to be told to look
> in /var/snap/postgresql-9-6/common/run. If the server running inside
> containment could write to /var/run/postgresql, then everyone could
> play together happily.
>

This should be handled pretty cleanly by an interface, and I would
suggest going ahead and making one for Postgres right away as that will
allow other snaps to use Postgres if they want it.
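
Until such an interface exists, clients built on libpq can be pointed at
the snap's socket directory through the PGHOST environment variable,
which libpq treats as a Unix socket directory when the value starts with
a slash. A minimal sketch (Python; the helper name is hypothetical, the
path is the one quoted above):

```python
import os

# Socket directory used by the snapped server, as quoted above.
SNAP_SOCKET_DIR = "/var/snap/postgresql-9-6/common/run"


def client_env(socket_dir=SNAP_SOCKET_DIR, base_env=None):
    """Return an environment that makes libpq clients use socket_dir.

    libpq interprets a PGHOST value beginning with '/' as the directory
    holding the server's Unix-domain socket.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PGHOST"] = socket_dir
    return env
```

For example, subprocess.run(["psql", "-c", "select 1"], env=client_env())
would connect over the snap's socket rather than /var/run/postgresql.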

> Upgrades will be slow, as data (possibly terabytes) will need to be
> copied from the postgresql-9-6 container into the postgresql-10
> container. Non-snap installs get to hardlink the datafiles into the
> new location and migrate them quickly. This is another use case for
> allowing access to areas of the host.

The content interface should enable this - a snap could offer the data
spaces up to different versions of postgres. I'm not sure if the content
interface is yet ready for this particular use-case, but it's definitely
in progress.
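
For context, the fast path on non-snap installs is pg_upgrade's link
mode. The sketch below (Python) only assembles the command line; the
helper name is hypothetical and any directories passed in are
placeholders, since where the binaries and data would live across two
snaps is exactly the open question.

```python
def pg_upgrade_argv(old_bin, new_bin, old_data, new_data, link=True):
    """Build a pg_upgrade command line, optionally in hardlink mode."""
    argv = [
        "pg_upgrade",
        "--old-bindir", old_bin,
        "--new-bindir", new_bin,
        "--old-datadir", old_data,
        "--new-datadir", new_data,
    ]
    if link:
        # Hardlink data files into the new cluster instead of copying.
        argv.append("--link")
    return argv
```

Hardlinks only work within a single filesystem, so the old and new data
directories would both need to be reachable on the same filesystem from
inside the new snap's confinement for this to help.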

> Where is the best place to store data? I'm using $SNAP_COMMON because
> I can't let the data be rolled back (it might be a good idea on a
> standalone system, but rollback would desynchronize and break a
> replicated system).
>

For now that's probably best, until we have the content interface settled.

> How to customize the snap? PostgreSQL can be customized by sudo vi
> /var/snap/postgresql-9-6/common/data/postgresql.conf, but for snap
> configuration I went with /var/snap/postgresql-9-6/common/snap.ini
> (currently it just defines the log retention time).

For now I think an editable config file in a known writable location is
best. Later we'll have a standardised mechanism which your snap can use
to hook into bigger systems.

> I have lots and lots and lots of man pages not on my MANPATH. Not
> urgent, but nice to expose them.
>

Noted and I think in-progress.

> For general adoption, the bar is really high. PostgreSQL running in a
> machine container is exactly what people want (and what we have been
> pushing on the Cloud side of things). Even if we can address all the
> above issues, a PostgreSQL snap will still be more cumbersome than
> running PostgreSQL in an lxd container (or a docker container for that
> matter). Operators get to control and grow the environment they need,
> rather than being stuck with the static one I provide and the
> interfaces, plugs and canned scripts that customize it in preapproved
> ways. I've been wondering if the solution is adding a new 'machine'
> confinement, where the snap software is installed and updated in a
> real lxd machine container on the host. Maybe the best of both worlds?

Let's start with a more focused use case: postgres as a storage engine
for IoT. In that case the level of customization is much reduced, people
are looking for some highly predictable behavior more than every
possible degree of freedom in customization. Remember they can also
build postgres INTO their own snap (and it actually might be worth
exporting a shared pg part so people can do that trivially).

Mark