John Hayes
Senior Product Marketing Manager, SquaredUp
DORA is an entry point. It is not the end goal.
A friend of mine once had an annual appraisal where his manager blithely declared that his target for the next year was "to exceed his targets". Rather than spend the next year screaming silently, trapped inside an M. C. Escher-esque cycle of infinite recursion, my friend politely demurred and requested a more achievable goal, such as building a time machine out of jellybeans. This is, admittedly, an extreme case, but it serves as a reminder that "targets" can take on a life of their own, often far removed from the real-life phenomena they are meant to represent.
In the world of engineering, we are engaged with systems amenable to empirical observation. In theory, then, we can evaluate performance without recourse to woolly formulations such as "exceeding your targets". At the same time, we must never underestimate the ability of Goodhart's famous law ("when a measure becomes a target, it ceases to be a good measure") to assert itself, however unfavourable the circumstances might be.
There can be few measures in the world of IT that have achieved such broad adoption in such a short space of time as DORA metrics. They were first defined in the 2014 State of DevOps Report and later expanded upon in the book Accelerate, which has established itself as a classic of DevOps literature.
Just to recap, the four measures which comprise the DORA metrics set are:
- Deployment Frequency – how often code is deployed to production
- Lead Time for Changes – the time elapsed between code being committed and code being deployed to production
- Mean Time to Recovery (MTTR) – how long it takes to restore service after an incident
- Change Failure Rate – the percentage of deployments that cause a failure in production
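For illustration only, here is a minimal sketch of how three of the four measures might be computed from raw deployment records. Everything here is invented for the example: the data model, the field names and the numbers. MTTR is omitted because it is derived from incident records rather than deployment records.

```python
from datetime import datetime, timedelta

# Hypothetical deployment log: (commit time, deploy time, did it cause a failure?)
deployments = [
    (datetime(2024, 5, 1, 9, 0),  datetime(2024, 5, 1, 14, 0), False),
    (datetime(2024, 5, 2, 10, 0), datetime(2024, 5, 3, 16, 0), True),
    (datetime(2024, 5, 3, 11, 0), datetime(2024, 5, 4, 9, 0),  False),
]
window_days = 7  # the observation window the log covers

# Deployment Frequency: deployments per day over the window
deployment_frequency = len(deployments) / window_days

# Lead Time for Changes: mean time from commit to production deploy
lead_times = [deploy - commit for commit, deploy, _ in deployments]
mean_lead_time = sum(lead_times, timedelta()) / len(lead_times)

# Change Failure Rate: share of deployments that caused a production failure
change_failure_rate = sum(failed for _, _, failed in deployments) / len(deployments)

print(f"Deployment Frequency: {deployment_frequency:.2f} per day")
print(f"Lead Time for Changes (mean): {mean_lead_time}")
print(f"Change Failure Rate: {change_failure_rate:.0%}")
```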
In explaining the rationale for these measures, the authors were keen to stress that they are not just an end in themselves. They discuss the measures in the context of alleviating developer pain: the often tortuous, fragile and manual processes that developers faced in getting their code deployed. So the metrics were not conceived as a whip to make developers more productive; they were actually about removing friction and improving the developer experience. Even though the DORA team have become highly influential, they were not the only, or even the first, advocates of this school of thought. As far back as 2000, Joel Spolsky was extolling the value of one-click builds as an indicator of process maturity.
Even though these measures have achieved a great deal of traction, they need to be used contextually and as just one instrument in a wider toolkit. They are a localized X-ray rather than a full health check. Being able to deploy code commits frequently is proof of fluidity in your release process; it does not necessarily add business value. A better measure might be the number of features delivered. Even then, those features have no value if they are not being used, so ideally we would also gather data on feature usage.
Equally, the seemingly uncontroversial MTTR metric has its critics. Being able to fix incidents quickly in trivial services does not compensate for slow remediation times in critical services with thousands or millions of users. Experts such as Martin Mao argue that in the cloud age, P99 and P95 may be more useful metrics.
Even those measures, though, are not watertight. The 1% of outliers excluded from your P99 reckoning may represent the severest incidents, the ones with the greatest business impact. If you do decide to think outside the P99 box, you are in good company: prioritising incidents on the basis of their impact on the bottom line is also the approach taken by Google, who, after all, wrote the book on SRE. Having said that, it now appears that some of those good practices were "aspirational" rather than a reflection of actual practice.
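To make the trade-off concrete, here is a toy sketch with invented numbers, using a simple nearest-rank percentile rather than any particular monitoring vendor's implementation:

```python
import math
import statistics

# Hypothetical time-to-restore values for 20 incidents, in minutes;
# the last one is a day-long outage in a critical service
restore_times = [10, 12, 14, 15, 16, 18, 20, 22, 24, 25,
                 28, 30, 32, 35, 38, 40, 45, 55, 80, 1440]

def nearest_rank(values, pct):
    """Nearest-rank percentile: the value at rank ceil(pct% * N)."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]

mttr = statistics.mean(restore_times)  # the mean is dragged up by the one outage
p95 = nearest_rank(restore_times, 95)  # excludes the outage entirely
p99 = nearest_rank(restore_times, 99)  # with only 20 samples, still lands on it

print(f"MTTR: {mttr:.0f} min | P95: {p95} min | P99: {p99} min")
# MTTR: 100 min | P95: 80 min | P99: 1440 min
```

With a larger sample, the day-long outage would fall beyond the P99 cut-off as well, which is exactly the blind spot described above.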
There is also an argument for taking a broader view of the Lead Time for Changes metric, which is often used to measure the time elapsed between code being committed and code being deployed to production. In some ways, this could be seen as quite a narrow definition of "Lead Time". Mik Kersten's Flow Framework, for example, looks at the entire value stream from end to end. One of the metrics in the Flow Framework is Flow Time, which measures the time it takes for a Flow Item to move through the entire value stream, from idea to delivery.
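To make the contrast concrete, here is a toy sketch with invented timestamps. Flow Time is simplified here to a single elapsed interval; the full framework breaks it down further, for example into time spent waiting versus time spent working.

```python
from datetime import datetime

# Hypothetical timeline for one piece of work (all dates invented)
idea_logged    = datetime(2018, 3, 1)       # request enters the demand pipeline
work_started   = datetime(2024, 1, 8)       # a team finally picks it up
code_committed = datetime(2024, 1, 22, 10)  # last commit for the feature
deployed       = datetime(2024, 1, 23, 15)  # running in production

# DORA Lead Time for Changes: commit -> production
dora_lead_time = deployed - code_committed

# Flow Time: the whole value stream, idea -> delivery
flow_time = deployed - idea_logged

# Most of the elapsed time was spent waiting in the backlog, not working
wait_in_backlog = work_started - idea_logged

print(f"Lead Time for Changes: {dora_lead_time}")  # 1 day, 5:00:00
print(f"Flow Time: {flow_time}")                   # 2154 days, 15:00:00
print(f"Time waiting in backlog: {wait_in_backlog}")
```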
I remember many years ago working at a company where a team were given a project which was deemed "urgent" and needed to be delivered within four weeks. This was after it had been rattling around inside the company's demand management system for six years. Pretty much every developer can tell a similar story, and it is amazing how often the clock only starts ticking on a deliverable once it is handed over to the development team. Before that point, time is never of the essence; it is a kind of gloop which stretches and warps like clocks in a Dali painting.
Ultimately, as I said earlier, the spirit of DevOps is one of continual self-monitoring and improvement. Although DORA metrics may provide a good foundation, you will doubtless notice other pain points and blockers which reduce your team's velocity. In my own experience, for example, giving developers self-service access to resource permissions is a must-have for improving flow: DevOps teams don't have to spend time dealing with helpdesk tickets, and developers are not left blocked while they wait. The overall point is that each organisation has its own distinct set of environments and configurations, which will throw up their own unique bumps in the road, and you will need to define your own measures for dealing with them.
Does this mean I think you should junk your DORA metrics? No, absolutely not. It is simply a reminder that measures only have value within a context. If you are only deploying to prod once every few months, there may well be a problem. At the same time, a company that is deploying to prod 1,000 times a day is not ten times better than a company deploying to prod 100 times a day. Having impressive numbers is no substitute for continuous communication between developers and DevOps/Platform engineers, and those conversations will reveal whether the numbers are meaningful and whether you really have high-velocity, low-friction processes. Maybe we could conclude that your target is to understand your targets.