Backwards compatible database changes
Written by Jeroen van Veen on 5th March 2015
In this document, when I talk about backwards compatible database changes, I mean that the current version of your project’s code can run against the database schema of your next version without too many problems. Why would you want this?
Tables and rows
There are different ways to look at a database. As a DevOps engineer I often work with its tables and rows. I see the database as separate from the running code and am used to storing and retrieving information. Usually there is more than one database server (or web server, for that matter) and we work to keep the information stored in the tables in sync.
If you’re a coder at a company like Spindle, you’ll probably work with a framework like Django. It has a built-in ORM. You can instantiate objects and it takes care of their storage in the database. Of course you know they’re stored in the rows of some table, but you don’t have to think about that every time. If you change your models, there is usually some tool available (South in our case) to make the database ready to work with them. Lots of companies leave the deployment solely to devops, meaning that as a programmer you can get away with this way of thinking. Spindle, however, is not one of them, and for this article I must ask you to take a look from my perspective.
The drawback of not worrying about the database schema
The drawback of not worrying about the database schema is the following: at any point the database is compatible with only one version of your code. If, for instance, you remove a column from your model and run the migration, your old code will crash from that moment on. Since you can’t upgrade your web server(s) and your database schema at exactly the same time, this means you have to go into maintenance mode and stop accepting clients at your website. But it gets even worse: testing is never perfect, and you might find yourself in a situation where you need to do a rollback. If your old code is incompatible with the current database schema, the only way to do this is to restore a database backup. That can mean a long downtime and even the loss of data. Yes, in some cases you could write a backwards migration, but if you need to restore many thousands of values, this becomes impractical or even impossible.
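To make this failure mode concrete, here is a minimal sketch, using Python’s built-in sqlite3 module as a stand-in for the real database server; the table and column names are illustrative, and the two INSERT statements play the role of the old and new application code:

```python
import sqlite3

# An in-memory SQLite database stands in for the production database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE systemuser (id INTEGER PRIMARY KEY, system_name TEXT)")

# "Old" application code writes the system_name column.
conn.execute("INSERT INTO systemuser (system_name) VALUES ('host-1')")

# The migration drops the column (via a table rebuild, the portable way).
conn.executescript("""
    CREATE TABLE systemuser_new (id INTEGER PRIMARY KEY);
    INSERT INTO systemuser_new (id) SELECT id FROM systemuser;
    DROP TABLE systemuser;
    ALTER TABLE systemuser_new RENAME TO systemuser;
""")

# The old code is still live on other web servers -- and now it crashes.
try:
    conn.execute("INSERT INTO systemuser (system_name) VALUES ('host-2')")
except sqlite3.OperationalError as exc:
    print("old code fails:", exc)
```

Any web server still running the old code hits that error on every write until the deployment finishes, which is exactly why the schema change forces a maintenance window.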
If your database is compatible with both the old and the new version of your codebase, you only need to set your previous release live and your website is up again, letting you fix any problems without breaking a sweat. If you have a setup of redundant web servers behind a load balancer (you really should), the benefits grow even bigger; in fact, in that setup you MUST make your schema migrations backwards compatible. You can take a web server out of the load balancer, deploy your update and test it by host-filing to it. You can then switch the load balancer, directing traffic to the newly deployed hosts and away from the hosts that still run the old code, keeping the inactive web servers in reserve. If there turns out to be a problem with the new code after all, you can switch back by inverting the active hosts on the load balancer again.
Backwards compatible database changes?
So how do we make these backwards compatible database changes? It basically comes down to planning. If you will need a new column next week, you can run the database migration now (technically you’re creating a forward compatible database schema here). And if you have something like this in your migration:
```python
class Migration(SchemaMigration):

    def forwards(self, orm):
        db.delete_column('permission_systemuser', 'system_name')
```
then there is no need to run it right at the time of the migration. Just make sure that the column has a default value: the new code, which no longer knows about the column, can then still insert rows, while the old code keeps reading and writing it. You can delete (or repurpose!) the column at the next deployment.
It’s important not to rename a column: it makes things complicated and there is no technical necessity. If you absolutely have to, you can perhaps work something out using the db_column property of your model field. The same goes for changing the type of a column. Just add a new column and remove the old one some other time (at the next deployment, for instance).
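A type change done the add-a-column way might look like this sketch (sqlite3 again standing in for the database; the product table and the price/price_cents columns are hypothetical names, not from the article):

```python
import sqlite3

# Change a column's type by adding a new column next to it instead of
# altering it in place: 'price' (TEXT) is the old column, 'price_cents'
# (INTEGER) the new one. Old code keeps writing 'price'; new code writes
# both; the old column is dropped at a later deployment.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, price TEXT)")

# Migration: add the new column, leaving the old one untouched.
conn.execute("ALTER TABLE product ADD COLUMN price_cents INTEGER")

# Old code, still live on some web servers:
conn.execute("INSERT INTO product (price) VALUES ('12.50')")
# New code writes both columns during the transition:
conn.execute("INSERT INTO product (price, price_cents) VALUES ('3.99', 399)")

print(conn.execute("SELECT price, price_cents FROM product").fetchall())
# -> [('12.50', None), ('3.99', 399)]
```

Rows written by the old code end up with an empty new column, which is what the incremental data migration below the next heading is for.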
Schema migrations and data migrations
So far we have mostly discussed schema migrations. We also need to think about data migrations. As you probably guessed, these involve moving data: from one column to another, or calculating a value from data in several columns. When you do this, make sure you put the new-style values in a new column, leaving the old column and its data intact. Also make sure to run these migrations incrementally, looping over a limited number of records at a time until all data is migrated.
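An incremental data migration of that kind can be sketched like this (sqlite3 as the stand-in database, hypothetical product/price/price_cents names; a real Django version would batch over a queryset the same way):

```python
import sqlite3

# Parse the old TEXT 'price' column into the new INTEGER 'price_cents'
# column a batch at a time, so transactions stay short and the old
# column and its data remain intact.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE product (
    id INTEGER PRIMARY KEY, price TEXT, price_cents INTEGER)""")
conn.executemany("INSERT INTO product (price) VALUES (?)",
                 [(f"{n}.25",) for n in range(10)])

BATCH = 3
while True:
    rows = conn.execute(
        "SELECT id, price FROM product WHERE price_cents IS NULL LIMIT ?",
        (BATCH,)).fetchall()
    if not rows:
        break  # all rows migrated
    conn.executemany(
        "UPDATE product SET price_cents = ? WHERE id = ?",
        [(int(round(float(price) * 100)), pk) for pk, price in rows])
    conn.commit()  # committing per batch keeps locks short

print(conn.execute(
    "SELECT COUNT(*) FROM product WHERE price_cents IS NULL").fetchone())
# -> (0,)
```

Because the loop only touches rows where the new column is still empty, it is safe to interrupt and rerun, and it can run while both code versions are live.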
I realize this post is far from complete. But the main thing is not to adhere exactly to some guide: just think about a code upgrade and database migrations as a process, not as something that happens automatically.
Do you have ideas on doing database migrations in a high availability (Django) setup? Please let me know!