DevOps Lessons from Real Production Systems
Hard-won lessons from years of keeping client platforms alive: boring deploys, honest monitoring, and why your rollback plan is the real pipeline.
Tanjil Ahmed
Lead Software Engineer · Notionhive
Everything I know about DevOps I learned from systems that were already in production, serving real users, making real money — where 'let's just try it' is not an experiment but an outage. These lessons cost incidents to learn. They're cheaper to read.
Deploys should be boring
The goal of a CI/CD pipeline is not speed — it's the removal of adrenaline. A good deploy is indistinguishable from no deploy: tests gate the merge, the build is reproducible, migrations run safely, the release switches atomically, and nobody watches the graphs afterward because the graphs never move. If deploys are exciting, the pipeline is incomplete.
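The "release switches atomically" step is often done with a releases directory and a `current` symlink. Each build lands in its own directory, and going live (or going back) is a single rename of the link. Below is a minimal Python sketch of that pattern. It assumes a POSIX filesystem. The `releases/<id>` layout and the function names `activate_release` and `active_release` are hypothetical and do not come from any particular deploy tool.

```python
import os
import tempfile
from pathlib import Path


def activate_release(root: Path, release_id: str) -> Path:
    """Point root/current at root/releases/<release_id> in one atomic step.

    Hypothetical layout: root/releases/<release_id>/ holds a complete build.
    """
    target = root / "releases" / release_id
    if not target.is_dir():
        raise FileNotFoundError(f"release {release_id!r} does not exist")

    current = root / "current"
    # Build the new symlink under a temporary name first...
    tmp_link = root / f".current-{release_id}.tmp"
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    os.symlink(target, tmp_link)
    # ...then rename it over 'current'. On POSIX, rename() replaces the old
    # link atomically, so readers see either the old release or the new one.
    os.replace(tmp_link, current)
    return target


def active_release(root: Path) -> str:
    """Return the release id that root/current points at."""
    return Path(os.readlink(root / "current")).name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        for rid in ("v41", "v42"):
            (root / "releases" / rid).mkdir(parents=True)
        activate_release(root, "v42")  # deploy
        activate_release(root, "v41")  # rollback: same operation, older id
        print(active_release(root))
```

Rollback is not a separate code path here. It is the same call with an older release id, and that keeps it cheap to test.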
The rollback is the pipeline
- Every deploy needs a tested way back — not a theoretical one.
- Migrations must be backwards-compatible for at least one release; add columns before you depend on them, drop them a release after nothing does.
- Feature flags separate shipping code from launching features — deploy at 3 PM, launch when ready.
- If restoring last night's backup has never been rehearsed, you don't have backups. You have hope.
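The backwards-compatible migration rule is commonly called expand/contract. This is a minimal sketch using Python's built-in `sqlite3`. It assumes a hypothetical `users` table in which a new `display_name` column replaces `full_name`. The migration only adds a nullable column, and the new release writes both columns. The old column is dropped only in a later release.

```python
import sqlite3


def migrate_expand(conn: sqlite3.Connection) -> None:
    """Release N+1 migration: add the column (nullable) before any code depends on it."""
    conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")


def read_name_release_n(conn: sqlite3.Connection, user_id: int) -> str:
    """Release N code: predates display_name and must keep working after the migration."""
    row = conn.execute("SELECT full_name FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0]


def read_name_release_n1(conn: sqlite3.Connection, user_id: int) -> str:
    """Release N+1 code: prefers the new column, tolerates rows not yet written."""
    row = conn.execute(
        "SELECT COALESCE(display_name, full_name) FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0]


def write_name_release_n1(conn: sqlite3.Connection, user_id: int, name: str) -> None:
    """Release N+1 writes both columns, so rolling back to release N loses nothing."""
    conn.execute(
        "UPDATE users SET full_name = ?, display_name = ? WHERE id = ?",
        (name, name, user_id),
    )


# Contract step, shipped a release after nothing reads full_name any more:
#   ALTER TABLE users DROP COLUMN full_name   -- requires SQLite 3.35 or newer

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL)")
    conn.execute("INSERT INTO users (id, full_name) VALUES (1, 'Ada')")
    migrate_expand(conn)                      # deploy N+1's migration
    write_name_release_n1(conn, 1, "Ada L.")  # N+1 is live and writing
    print(read_name_release_n(conn, 1))       # roll back to N: still reads correctly
```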
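A feature flag can be as simple as a lookup that the code consults at request time. The sketch below assumes a hypothetical in-process `FeatureFlags` class with percentage rollouts. In a real setup, the percentages would live somewhere you can edit without a deploy, such as a config service or a database table. Users are bucketed with `hashlib.sha256` rather than Python's `hash()`. String hashes are randomized per process, so with `hash()` a user could flip between code paths from one server to the next.

```python
import hashlib
from typing import Dict


class FeatureFlags:
    """Hypothetical flag store: flag name -> rollout percentage (0 to 100)."""

    def __init__(self, rollout: Dict[str, int]):
        self._rollout = dict(rollout)

    def set_rollout(self, flag: str, percent: int) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._rollout[flag] = percent

    def is_enabled(self, flag: str, user_id: str) -> bool:
        percent = self._rollout.get(flag, 0)  # unknown flags are off
        # Hash flag + user so each user lands in a stable bucket 0..99.
        digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
        bucket = int.from_bytes(digest[:8], "big") % 100
        return bucket < percent


def checkout_page(flags: FeatureFlags, user_id: str) -> str:
    # Both code paths ship in the same deploy; the flag decides which one runs.
    if flags.is_enabled("new_checkout", user_id):
        return "new checkout"
    return "old checkout"


if __name__ == "__main__":
    flags = FeatureFlags({"new_checkout": 0})   # shipped at 3 PM, dark
    print(checkout_page(flags, "user-17"))
    flags.set_rollout("new_checkout", 10)       # launch to 10% when ready
    print(checkout_page(flags, "user-17"))
```

Turning the feature off is a flag change, not a deploy. That makes it the fastest rollback available.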
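You can automate a restore rehearsal so that it runs on a schedule rather than during an incident. The sketch below uses SQLite's online backup API (`sqlite3.Connection.backup`) as a stand-in for whatever database you actually run. The row-count checks in `expected_tables` are hypothetical examples of assertions that would catch an empty or truncated backup.

```python
import sqlite3
import tempfile
from pathlib import Path
from typing import Dict, List


def take_backup(live: sqlite3.Connection, backup_path: Path) -> None:
    """Copy the live database to a file using SQLite's online backup API."""
    dest = sqlite3.connect(backup_path)
    try:
        live.backup(dest)
    finally:
        dest.close()


def rehearse_restore(backup_path: Path, expected_tables: Dict[str, int]) -> List[str]:
    """Restore the backup into a scratch database and check that it is usable.

    expected_tables maps table name -> minimum row count (trusted config, not
    user input). Returns a list of problems; an empty list means it passed.
    """
    problems: List[str] = []
    scratch = sqlite3.connect(":memory:")
    source = sqlite3.connect(backup_path)
    try:
        source.backup(scratch)  # the "restore": copy the backup into a fresh DB
        status = scratch.execute("PRAGMA integrity_check").fetchone()[0]
        if status != "ok":
            problems.append(f"integrity_check: {status}")
        for table, min_rows in expected_tables.items():
            try:
                (count,) = scratch.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
            except sqlite3.OperationalError as exc:  # e.g. table missing
                problems.append(f"{table}: {exc}")
                continue
            if count < min_rows:
                problems.append(f"{table}: {count} rows, expected at least {min_rows}")
    finally:
        source.close()
        scratch.close()
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        live = sqlite3.connect(":memory:")
        live.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
        live.executemany("INSERT INTO orders (total) VALUES (?)", [(10.0,), (20.0,)])
        live.commit()
        path = Path(d) / "nightly.db"
        take_backup(live, path)
        print(rehearse_restore(path, {"orders": 1}) or "restore rehearsal passed")
        live.close()
```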
Cache with a plan, not a prayer
Caching rescued more launches than any hardware upgrade — and caused more mysterious bugs than any other layer. The rule that survived: every cache needs an owner, a TTL, and a documented invalidation story. 'We cache it in Redis' is not a strategy; 'product pages cache for an hour and bust on update' is.
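The "owner, TTL, invalidation story" rule might look like this in code: a cache-aside wrapper in which both expiry and busting are explicit. This is a sketch that uses an in-process dict as a stand-in for Redis. With Redis, the same shape maps to `SET` with an `EX` expiry and `DEL` on update. `ProductPageCache` and its render callback are hypothetical names.

```python
import time
from typing import Callable, Dict, Tuple


class ProductPageCache:
    """Cache-aside for rendered product pages.

    Owner: catalog team (hypothetical). TTL: one hour by default.
    Invalidation: invalidate() is called from the product-update path.
    """

    def __init__(self, render: Callable[[int], str], ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._render = render  # slow path: builds the page from the database
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[int, Tuple[float, str]] = {}  # id -> (expires_at, html)

    def get(self, product_id: int) -> str:
        now = self._clock()
        entry = self._entries.get(product_id)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit
        html = self._render(product_id)  # miss or expired: rebuild and store
        self._entries[product_id] = (now + self._ttl, html)
        return html

    def invalidate(self, product_id: int) -> None:
        """Bust on update: the next get() re-renders from the source of truth."""
        self._entries.pop(product_id, None)


if __name__ == "__main__":
    cache = ProductPageCache(lambda pid: f"<h1>Product {pid}</h1>")
    print(cache.get(7))  # miss: rendered
    print(cache.get(7))  # hit: served from cache
    cache.invalidate(7)  # e.g. called after the product is edited
```

Injecting the clock lets you test the TTL behaviour without sleeping for an hour.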
Production doesn't care how elegant the architecture diagram is. It cares whether Tuesday's deploy can be undone by Tuesday's on-call.