My On-Call Investigation Tool

I’ve been on-call, and the thing that always got to me wasn’t the incidents themselves; it was the context-switching. You get paged, you stop what you’re doing, you open six different tools, each with its own auth and query language, and you start piecing together what’s broken. By the time you’ve gathered enough signal to have a theory, twenty minutes have passed, and half of that went to just logging in to things. ...

Standardizing Docker and CI across Microservices

The company I currently work at runs around 30 Node.js microservices. At that scale, it doesn’t take long for each service to start developing its own “personality.” At first, it’s subtle. One service uses a slightly different base image; another has a bespoke CI script because of a weird test dependency. But eventually, you wake up and realize no two services work the same way. On paper, they all do the same thing: install dependencies, build TypeScript, run tests, and deploy. In practice, every repository has its own “flavor” of that workflow. ...