At DoctorCare we’re a believer in the simplicity and value of continuous delivery. We try and avoid long lived branches. All pull requests go to master and master always ships.
In fact, there’s no way not to ship master once a PR is merged (outside of stopping the build mind you).
For the most part, we’re OK at this. Not great. But ok. We’re running a django application using Bitbucket Pipelines for deployments into AWS Beanstalk with a Postgresql RDS backend. That’s a pretty vanilla deployment if ever there was one and we’re happy to keep it simple.
This past friday however, we managed to break it, badly. Here’s how.
We dropped a constraint on one of more commonly used tables. The migration that called for the dropping of the constraint failed to run cleanly in production and the build failed. The big sin here - of which there are plenty honestly - is that I failed to appreciate that dropping a constraint requires an ACCESS EXCLUSIVE lock on a table in Postgresql. I knew that the addition of a constraint requires one but I simply didn’t consider the dropping of a constraint to require it. Shameful.
The reason that the lock failed to acquire is one I never got to the bottom of. Obviously it was waiting on other locks to release on the table. We have some users that occasionally connect into production to read some data and I believe one of their queries had been running for a long time.
Marco Slot from Citusdata explains it nicely from his blog post
One other thing to be aware of is that Postgres uses lock queues. If you run an ALTER TABLE command then that command goes into the queue and blocks until all queries on that table are finished, but any SELECT that comes immediately after will be blocked until the ALTER TABLE is finished even if the ALTER TABLE is not yet running.
So our ALTER TABLE command fails to acquire the lock, this puts it into the queue and pretty much anything after that is going to queue up behind it and wait.
Solutions to this aren’t that hard honestly and there are great people in the community that have solved this by writing another postgresql backend for django application. This is one example.