2017-05-22 to 2017-05-24
Living #opslife makes us keenly aware of the cavernous gap between lofty ideals and 3am reality. In a perfect world, everyone would be devopsing sans effort. In the real world, sharing oncall is not as easy as giving devs prod AWS creds, adding them to the rotation, and saying “good luck! have fun!”
Multi-team oncall means caring enough to let go, while not letting your beloved co-workers fall (and fail) unsupported. Distributing understanding requires building trust. Start by letting go of control, acknowledging the tribal knowledge you need to externalize, and untangling the tightly coupled bits. Feature teams have their own parts to play, such as writing health checks that aren’t aspirational. (Don’t return 200 OK when broken, unless your site is “This is fine” dog as a service.)
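For the health check point, here is a minimal sketch of a non-aspirational endpoint, assuming a Python service and using only the standard library. The names `evaluate_health`, `make_handler`, and the `/healthz` path are hypothetical, chosen for illustration; the probes are placeholders for whatever your service really depends on.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Callable, Dict, Tuple

# Hypothetical probe type: each callable returns True only if the
# dependency actually answered (a DB ping, a cache round trip, etc.).
Check = Callable[[], bool]


def evaluate_health(checks: Dict[str, Check]) -> Tuple[int, str]:
    """Run every probe and report honestly: 200 only if all of them pass."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A probe that blows up counts as a failure, not as "fine".
            results[name] = False
    healthy = all(results.values())
    status = 200 if healthy else 503  # 503 Service Unavailable when broken
    body = json.dumps({"healthy": healthy, "checks": results}, sort_keys=True)
    return status, body


def make_handler(checks: Dict[str, Check]):
    """Build a request handler class that serves the (assumed) /healthz path."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/healthz":
                self.send_error(404)
                return
            status, body = evaluate_health(checks)
            payload = body.encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return HealthHandler
```

To try it, pass `make_handler({...})` to `http.server.HTTPServer(("127.0.0.1", 8080), ...)` and call `serve_forever()`. The design choice worth copying is that the status code is derived from real probe results, so a load balancer or pager sees 503 the moment a dependency stops answering.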
Architectural considerations such as microservices can really save your hipster soy-based breakfast strips, ensuring oncall doesn’t have to be an all-or-nothing scenario across your entire org. Containers and functions as a service are implementation details, as are features of your particular cloud, but in general, go for self-healing and redundancy. Treating infra as an armada, not a yacht, will keep you shipping.
From tightly guarded fiefdoms to “of course all the devs are on call” to carefully negotiated compromises, I’ve lived this movie enough times to see what works (and what definitely doesn’t). I spent 1999 to 2015 on call for production infrastructure and made mistakes so you don’t have to! Spoiler alert: instead of volunteering as tribute to the vagaries of the pager, volunteer to invest in your architecture and your co-workers; you’ll sleep better at night.