Dumping a PostgreSQL database is more or less straight forward, loading the dump back is a bit trickier compared to that.

Dump the data

To get a dump, I usually only specify the parameter -Fc since I prefer the dumps in the custom format:

pg_dump -Fc $DATABASE_URL > db.dump

Loading it back

Loading it back into a developer's database instance is tricky compared to the dumping operation. I ended up using pg_restore in the following way by default:

pg_restore -c --if-exists -1 -x -O -d my_dev_db db.dump

The parameters do the following:

-c --if-exists

Clean up existing items before attempting to create them.

Using --if-exists makes it a bit relaxed, so that it will not fail if a new table has been added but does not yet exist locally. This is useful when regularly refreshing.

-1

Run inside of a single transaction.

I usually prefer this, since I'd like to know if something unexpected happens.

-x -O

Avoids that privileges or the owner are set.

This is useful when loading a dump into a developer's database server, you would typically not want this in other scenarios.

-d
This just tells it into which database to load the dump.

Use case

So far I found this most useful for the case of a developer running a local PostgreSQL instance when I wanted to load a specific set of data.

In other cases there is most probably a better way to automate things. E.g. on a CI server that it is usually easier to automate in a way where first an new empty test database is created, then data is imported. In this case you would not want -c --if-exists and it may be that you want privileges and owner data to be applied as well.

One thought on loading production data

It may be tempting to distribute a production dump by default, in some cases it might even be needed.

My suggestion is to avoid this as much as possible. It will create in many cases complications since data protection concerns would have to be met. If needed, set up something to automatically scramble the personal data and try to get to a reasonable subset. Ideally, solve it by providing a minimal fixture for development which has example data for all relevant cases.

So far I only found a real need to have a production dump at hands, when something around database migrations has to be done. Here I prefer to have a test run against a copy of the database before applying this in production.

Other approaches

Got a plain text SQL dump? psql should work.


Comments

comments powered by Disqus