Working with a database sooner or later involves changing the schema. Sometimes existing data has to be migrated as well; I refer to this case as a data migration.

Alembic is my preferred tool for database migrations when using SQLAlchemy as the ORM. It has very good support for schema migrations and allows migrating data as well. One approach which I often find good enough is to write the migration as a plain SQL statement. This post shows the pattern I have found most useful for this case so far.

Prepare a new migration

Alembic comes with a command-line tool that helps with the common tasks. In this case the goal is to prepare an empty migration as a starting point, which can be achieved with the following command:

alembic revision -m "Example data migration"

Add the needed SQL

The new migration contains an empty function called upgrade(), which looks like the following code excerpt:

def upgrade():
    pass

This is the starting point to get busy. For quick tests I usually have a dump of relevant example data at hand and can quickly load it into my development database again.

After adding the missing bits, the migration should look like this:

import logging

import sqlalchemy as sa
from alembic import op

log = logging.getLogger(__name__)


def upgrade():
    connection = op.get_bind()

    # Run the data change as one plain SQL statement. RETURNING hands the
    # affected rows back so they can be logged below.
    result = connection.execute(sa.text("""

    UPDATE users
       SET active = TRUE
     WHERE login = 'test@example.com'
    RETURNING id, login

    """))

    for user in result:
        log.info('Activated user. id=%s login=%s', *user)
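
If the migration should also be reversible, downgrade() can follow the same pattern. The following is only a sketch, assuming that deactivating the same user again is the correct inverse of the change:

def downgrade():
    connection = op.get_bind()

    # Inverse of upgrade(): deactivate the same user again.
    result = connection.execute(sa.text("""

    UPDATE users
       SET active = FALSE
     WHERE login = 'test@example.com'
    RETURNING id, login

    """))

    for user in result:
        log.info('Deactivated user. id=%s login=%s', *user)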

Suggestions

I always try to leave a trace of what I did in the logging system. When using plain SQL statements in data migrations, the RETURNING clause is usually all that is needed to get the relevant information back for logging.

There can be situations where this is not a good idea, usually depending on the number of records you are going to change, but in my experience it is practical most of the time. When a large number of rows is affected, one option is to drop RETURNING and log only the row count, as in the sketch below.
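
A minimal sketch of that variant, reusing the imports from above; the last_seen column in the WHERE clause is only a hypothetical example of a bulk condition:

def upgrade():
    connection = op.get_bind()

    # Bulk update: skip RETURNING and log only how many rows were changed.
    # The last_seen condition is a hypothetical example.
    result = connection.execute(sa.text("""
    UPDATE users
       SET active = TRUE
     WHERE last_seen > now() - interval '30 days'
    """))

    log.info('Activated users. count=%s', result.rowcount)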

Conclusion

Quite often a good old plain SQL statement is good enough to get the job done. It even allows me to work in the native database client first and then freeze a copy of the final statement into a migration. This way it is documented in the version control history and can go through the regular deployment pipeline into the production environment.

There are downsides though: in particular, if you support multiple database systems, this approach reaches its limits as soon as specific extensions of a database's dialect are used. In such cases it might be better to move to a higher abstraction level and freeze the needed parts of your ORM models as part of the migration.
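
A minimal sketch of that higher-level approach, freezing a lightweight copy of the table into the migration instead of raw SQL (the columns are the same as in the example above):

from sqlalchemy.sql import table, column
import sqlalchemy as sa
from alembic import op


def upgrade():
    # A lightweight copy of the users table, frozen inside the migration so
    # it does not depend on the current state of the ORM models.
    users = table(
        'users',
        column('active', sa.Boolean),
        column('login', sa.String),
    )

    op.execute(
        users.update()
        .where(users.c.login == 'test@example.com')
        .values(active=True)
    )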
