2016-02-10T15:21:43Z
Resolving Database Schema Conflicts
If you work on a project that uses database migrations with other developers, it is likely that you have experienced migration conflicts at some point. These occur when two or more developers are merging unrelated features to the master source control branch at around the same time, with each feature requiring different changes to the database.
In this article, I'm going to describe the problem and its solution in detail, using an actual example based on my Flask-Migrate extension. While I will be using commands that are specific to Flask-Migrate and Alembic, the solution to the problem that I present here can be adapted to other database migration frameworks.
The Problem
Let's say we have a team of two developers working on different features of an application. Mary, for example, needs to implement the user authentication subsystem, while David needs to add avatars for each user. The two developers start working on their features at about the same time, each on a freshly cloned copy of the team's git repository.
At the time Mary and David clone the project to start their work, the project has a User model that looks like this:
class User(db.Model):
id = db.Column(db.Integer, primary_key=True)
name = db.Column(db.String(128))
They are using Flask-SQLAlchemy as ORM, and Flask-Migrate to track migrations to the database. The database migration history at the time both developers begin working on their features includes a single migration:
$ python app.py db history
<base> -> 279ebc64991a (head), add user table
Now Mary and David get to work on their features. Mary needs to add password hashes to the User model, so one of the very first things she does is to edit the model as follows:
class User(db.Model):
id = db.Column(db.Integer, primary_key=True)
name = db.Column(db.String(128))
password_hash = db.Column(db.String(64))
Immediately after editing the model, she generates and applies a database migration, to make this change on her development database:
$ python app.py db migrate -m "add password hashes to users"
INFO [alembic.autogenerate.compare] Detected added column 'user.password_hash'
Generating migrations/versions/d3868407e935_add_password_hashes_to_users.py ... done
$ python app.py db upgrade
INFO [alembic.runtime.migration] Running upgrade 279ebc64991a -> d3868407e935, add password hashes to users
She then checks the database migration history to make sure the change was applied:
$ python app.py db history
279ebc64991a -> d3868407e935 (head), add password hashes to users
<base> -> 279ebc64991a, add user table
So Mary happily goes off to work on adding password hashes.
Meanwhile, David needs to add avatar URLs for all the users, so starting from the same User model as Mary, he makes the following change on his development environment:
class User(db.Model):
id = db.Column(db.Integer, primary_key=True)
name = db.Column(db.String(128))
avatar = db.Column(db.String(256))
And he also creates a database migration for this change:
$ python app.py db migrate -m "add user avatars"
INFO [alembic.autogenerate.compare] Detected added column 'user.avatar'
Generating migrations/versions/678d339a120f_add_user_avatars.py ... done
$ python app.py db history
279ebc64991a -> 678d339a120f (head), add user avatars
<base> -> 279ebc64991a, add user table
Let's assume that David finishes his work first, and pushes the user avatar changes to the upstream repository. This commit triggers an automatic deployment to a staging server, where David goes and checks that user avatars are working fine. He finds no problems, so he goes to find other work.
When Mary completes her work on user authentication, she tries to push to master and gets an error that tells her that her source tree is out of date. So she does a git pull --rebase, and then tries to push again. This time the push succeeds, so then she anxiously waits for the staging server to update so that she can check her work before moving on. But something bad happened, the deployment to the staging server failed horribly.
Can you see why? When Mary goes to check the logs of the failed deployment, this is what she finds:
$ python app.py db upgrade
Traceback (most recent call last):
...
alembic.script.revision.MultipleHeads: Multiple heads are present for given argument 'head';
678d339a120f, d3868407e935
And sure enough, she did not realize this, but even her own source tree got into a weird state after she pulled in David's change:
$ python app.py db history
279ebc64991a -> 678d339a120f (head), add user avatars
279ebc64991a -> d3868407e935 (head), add password hashes to users
<base> -> 279ebc64991a (branchpoint), add user table
So what happened here? As you see above, the migration history is not linear anymore. Migration 279ebc64991a was the starting migration for both Mary and David, so both their migrations are based on this change, creating a branched migration history with two heads.
It's interesting to note that when Mary pulled David's change git did not complain. These schema migrations are stored each in its own file, so git did not see conflicts in Mary's change and allowed her change to be merged, causing the project to break. Not nice, right?
How to Detect Schema Conflicts Before They are Committed
Before I tell you how to untangle this schema mess, let's think about Mary's actions. Could she or her team have done anything different to prevent conflicts like this from ever appearing in the team's repository?
There are certainly ways to prevent this type of conflicts. A migration history test could be written to find this and other problems with migrations. This is a test that creates an empty database, and simply applies all the migrations in order, to ensure that they all run fine. The test can then downgrade the database all the way back to its initial state, to also test downgrades, something that very few people check. A migration history test can be included as part of the application's unit test suite, or as a source control pre-commit check.
How to Resolve a Schema Conflict with a Merge
While detecting these conflicts before they are pushed to the shared repository is important, let's not forget about Mary's situation. Because she applied her migration before pulling in David's changes, her database is aligned with one of the two branched heads in the migration history. David's migration is in the other branch, so it can't be applied through an upgrade.
There are a couple of ways to unlock Mary's database. Recent releases of Alembic and Flask-Migrate support the merge command, which creates yet another migration that joins these multiple heads, creating a diamond shape. To get everything back in order with a merge, you need to run this command:
$ python app.py db merge -m "merge migrations from mary and david" 678d339a120f d3868407e935
Generating migrations/versions/66fd23fca675_merge_migrations_from_mary_and_david.py ... done
And then after that, doing an upgrade works fine:
$ python app.py db upgrade
INFO [alembic.runtime.migration] Running upgrade 279ebc64991a -> 678d339a120f, add user avatars
INFO [alembic.runtime.migration] Running upgrade 678d339a120f, d3868407e935 -> 66fd23fca675, merge migrations from mary and david
If you look at the updated migration history, you will notice that the branches remain in the history behind the new merge migration:
$ python app.py db history
678d339a120f, d3868407e935 -> 66fd23fca675 (head) (mergepoint), merge migrations from mary and david
279ebc64991a -> 678d339a120f, add user avatars
279ebc64991a -> d3868407e935, add password hashes to users
<base> -> 279ebc64991a (branchpoint), add user table
Committing this new migration and pushing it to the team's git repository will address the problem and allow the deployment script to apply the migrations correctly.
How to Resolve a Schema Conflict Without Alembic Merges
I honestly find these diamond shaped merges confusing, I prefer to always have a linear migration history, so my approach to fixing schema conflicts does not involve Alembic's merge command.
Let's look at the state of the migration history on Mary's environment when she found she had a conflict:
$ python app.py db history
279ebc64991a -> 678d339a120f (head), add user avatars
279ebc64991a -> d3868407e935 (head), add password hashes to users
<base> -> 279ebc64991a (branchpoint), add user table
Her database is synced to migration d3868407e935, which is one of the two heads:
$ python app.py db current
d3868407e935 (head)
To unlock the migration history, all that needs to be done is to alter the order of the migrations, so that these two migrations by David and Mary happen one after the other. Because David was first to commit his migration, the correct thing to do is to move Mary's migration after David's.
The first step to reorder the migrations is to move the database back one migration, so that it isn't inside Mary's branch anymore:
$ python app.py db downgrade
INFO [alembic.runtime.migration] Running downgrade d3868407e935 -> 279ebc64991a, add password hashes to users
With this command, the changes that Mary made to her database are removed. The next step involves manual editing of Mary's migration script, which is in file migrations/versions/d3868407e935_add_password_hashes_to_users.py. The first few lines of this script are:
"""add password hashes to users
Revision ID: d3868407e935
Revises: 279ebc64991a
Create Date: 2016-02-09 22:13:25.135581
"""
# revision identifiers, used by Alembic.
revision = 'd3868407e935'
down_revision = '279ebc64991a'
# ...
The important part here is the revision that is set as the down_revision, which is the migration that comes right before it in the history. To move this migration script after David's, all that needs to be done is to replace revision 279ebc64991a with David's revision code, which is 678d339a120f. After you edit the two occurrences of the previous migration, the migration script should look like this:
"""add password hashes to users
Revision ID: d3868407e935
Revises: 678d339a120f
Create Date: 2016-02-09 22:13:25.135581
"""
# revision identifiers, used by Alembic.
revision = 'd3868407e935'
down_revision = '678d339a120f'
# ...
And with that change saved, the migration history has been reordered and is again nice and sequential:
$ python app.py db history
678d339a120f -> d3868407e935 (head), add password hashes to users
279ebc64991a -> 678d339a120f, add user avatars
<base> -> 279ebc64991a, add user table
And now an upgrade cleanly applies David's migration and then Mary's:
$ python app.py db upgrade
INFO [alembic.runtime.migration] Running upgrade 279ebc64991a -> 678d339a120f, add user avatars
INFO [alembic.runtime.migration] Running upgrade 678d339a120f -> d3868407e935, add password hashes to users
At this point, the schema migrations are back in order and can be pushed to the team's repository to address the breakage.
Final Words
How do you deal with schema migration conflicts? If you have a different technique and want to share it, please write below in the comments!


#1 Igor said 2016-02-11T10:01:30Z
#2 Miguel Grinberg said 2016-02-11T17:19:45Z
#3 Cecil said 2016-02-11T22:41:38Z
#4 Igor said 2016-02-12T09:11:22Z
#5 Jack Grahl said 2016-02-12T13:33:37Z
#6 Miguel Grinberg said 2016-02-14T20:01:40Z
#7 Frank said 2016-04-05T00:36:31Z
#8 Miguel Grinberg said 2016-04-05T02:08:37Z
#9 yuhan cheng said 2018-07-16T06:53:30Z
#10 Miguel Grinberg said 2018-07-16T17:36:49Z
#11 Kiptoo Magutt said 2018-07-28T13:52:51Z
#12 Adithya said 2020-07-27T10:54:44Z
#13 Miguel Grinberg said 2020-07-27T14:53:20Z