Blog of Rob Galanakis (@robgalanakis)

Don’t bother with migrations for performance-based indices

I used to write database migrations for performance-based indices, like adding an index to a foreign key column. I don’t do it anymore, and do not recommend the practice. Instead, write your SQL against a production-like database (same schema and data), and run it against production directly as part of your “operations” rather than as part of your development workflow.

Caveats: This doesn’t apply to migrations part of the initial schema (may as well do that at once), and it doesn’t apply to indices part of a constraint, like a unique constraint. Also, you may have totally valid special-case exceptions, I’m only speaking as a rule of thumb on project sizes that you’d take advice on my blog from, like from 1-100 devs. When you get past a certain size, you probably already have standards for all these things, so you do you.

Writing formal migrations for performance-based indices doesn’t really have any upside.

  • You don’t need them in non-production-like environments because they only matter when there is enough data and traffic to be worthwhile in the first place.
  • They do not change the correctness of the code so are not relevant to the code.
  • They are impossible to develop locally. You need to develop and test them in a production-like environment anyway.
  • They’ll come along with database restores anyway, so your production-like staging or wherever that is based on production data will still get them.

The process I see everyone follow ends up being:

  • Run SQL in a prod-like environment to find something that works
  • Copy it to code
  • Go through code review, merging, and deployment
  • CD runs migration on production
  • Verify performance has changed

It seems like you can have a much more iterative and faster process to avoid the migration and CD hoops. That said, you still do need a rigorous process for indices- unwinding from “we added too many indices and it hid performance issues” is almost impossible.

So the process I suggest is:

  • Iterate on the migration in a prod-like environment
  • Get another set of eyes to review the migration and effects
  • Document the rationale and performance results (like a GitHub issues comment with the before/after EXPLAIN)
  • Run the migration in prod directly
  • Verify performance has changed

I mean, performance index migrations in code isn’t the end of the world, but I feel like removing the friction on this encourages us to fix issues faster and explore more. Like, maybe you want to replace a btree with a gin index- it seems silly to have to write a migration (and then another to potentially undo the change). So I encourage teams to avoid the rigamarole of following the code-change path for performance indices and be more open to experimentation
by exercising the same rigor but with a shorter process.

Leave a Reply