Measuring Engineering Productivity in the Age of AI
When you’re running an engineering team, your north star metric should be developer productivity. The challenge is that developer output is non-linear. You can’t just say “this engineer produced 500 lines of code today, so they were productive”. Some days a two-line change takes as much effort as a 1,000-line change; it depends on the complexity. Other days, the entire day is taken up by a big incident with P0 impact on customers.
Other professions can measure productivity in a much more linear way. Sales can measure the number of meetings booked and deals closed. Product can track external KPIs. Ops can roughly measure projects implemented.
The best metrics are actually relative: how much is this developer contributing compared to others? The way I see it, there are a few main ways to measure developer productivity:
Story points
Lines of code
Squashed commits
Far and away, I think story points are the best metric, because they force the entire team to agree on how hard a given task is. Every strong engineering team should be measuring story points because it’s really the only way to compare developers’ progress against each other. What I like to do is hold an hour-long session every two weeks where the team reviews tickets and scores them on a kanban board. This makes it easy to track team velocity over time.
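To make "team velocity over time" concrete, here is a minimal sketch of how it could be computed from scored tickets. The ticket fields (`sprint`, `points`, `done`) and the sample data are hypothetical, not the export format of any particular tracker; the idea is only to sum the points of completed tickets per two-week sprint.

```python
from collections import defaultdict


def velocity_by_sprint(tickets):
    """Sum the story points of completed tickets for each sprint."""
    totals = defaultdict(int)
    for ticket in tickets:
        # Only finished work counts toward velocity.
        if ticket["done"]:
            totals[ticket["sprint"]] += ticket["points"]
    # Sort by sprint label so the trend reads in chronological order.
    return dict(sorted(totals.items()))


# Hypothetical tickets scored in the biweekly review.
tickets = [
    {"sprint": "2024-S1", "points": 3, "done": True},
    {"sprint": "2024-S1", "points": 2, "done": True},
    {"sprint": "2024-S1", "points": 5, "done": False},
    {"sprint": "2024-S2", "points": 1, "done": True},
    {"sprint": "2024-S2", "points": 8, "done": True},
]

print(velocity_by_sprint(tickets))  # {'2024-S1': 5, '2024-S2': 9}
```

Grouping the same tickets by assignee instead of sprint would give the per-developer comparison, under the same assumptions.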
The problem with story points is that a story point a year ago !== a story point today, given the advancements in AI. A good example is a new endpoint in my app. A year ago, without AI, that probably would have been a 2 or a 3, but now I can just prompt my way to something that works, so really it’s a 1 in terms of overall effort. With AI, teams have become more efficient, which reduces the effort of certain tasks and increases overall velocity. The catch is that some really tricky tasks (bug fixes that require lots of reasoning) aren’t assisted as easily by AI.
Story points are really the only way you should be evaluating individual engineers. A lot of engineering work isn’t captured by the number of lines of code added or removed. Sometimes an engineer spends all of their time on incidents, and that should be tracked and reflected too.
This is why I think that, to truly measure velocity, you also need to measure one of two things: lines of code (across the team) or commits on main. Either is a good way to measure the net velocity of your team (not individual engineers). With AI, teams are likely going from contributing 100k lines a quarter to 200-300k lines total. This should be measured when tracking the overall velocity of the team. Squashed commits or commits on main are also a good proxy. At previous jobs, we tracked the number of PRs per engineer for promos. It turned out that the people we agreed were the most productive also put up the most PRs. The more PRs your team puts up, the faster you’re shipping.
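As a rough illustration of the team-level numbers above, the sketch below counts commits on main and lines added or removed over a period by parsing `git log --numstat` output. It assumes a squash-merge workflow where each commit on main corresponds to one PR; the function names, the `@` marker in the format string, and the default branch name and time window are choices made for this example.

```python
import subprocess


def parse_numstat(log_text):
    """Return (commits, lines_added, lines_removed) from the output of
    `git log --numstat --format=@%H`."""
    commits = added = removed = 0
    for line in log_text.splitlines():
        if line.startswith("@"):
            # One "@<hash>" header line per commit.
            commits += 1
            continue
        parts = line.split("\t")
        # Numstat rows look like "added<TAB>removed<TAB>path";
        # binary files show "-" for both counts and are skipped.
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            added += int(parts[0])
            removed += int(parts[1])
    return commits, added, removed


def main_branch_stats(repo_path, since="3 months ago", branch="main"):
    """Run git in repo_path and summarize commits landed on the branch."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", branch, "--first-parent",
         f"--since={since}", "--numstat", "--format=@%H"],
        capture_output=True, text=True, check=True,
    )
    return parse_numstat(result.stdout)


# Hypothetical log output for two squashed commits.
sample = "@abc123\n\n10\t2\tsrc/api.py\n-\t-\tlogo.png\n@def456\n\n5\t0\tREADME.md\n"
print(parse_numstat(sample))  # (2, 15, 2)
```

Calling `main_branch_stats(".")` inside a repository would give the same three numbers for the last three months. These are team-level totals, consistent with the point that line counts should not be used to judge individual engineers.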
I will acknowledge this is more of an art than a science. It takes experienced leaders to track this correctly, but when you do, all of a sudden you know when to speed up or pump the brakes on engineering velocity.