Josh Chessman, Senior Analyst at Gartner, spoke at SquaredUp Live 2021 on the future of IT operations (IT Ops) monitoring and how monitoring teams need to change to get there. Here are some of the highlights from his talk.
In IT Operations monitoring, the only things that will matter are business outcomes and customer experience.
Currently, we simply monitor reactively. We ask, ‘Are things good or bad?’ But we rarely, if ever, predict or proactively fix issues. Monitoring today isn’t set up for that, and that needs to change.
“By 2023, monitoring efforts not aligned to customer experience initiatives will have >50% risk of having their budget cut.” Gartner
The current state of IT Ops monitoring
Monitoring within IT has focused on measuring the performance and availability of applications, transactions, and infrastructure (systems, hardware, networks, servers, storage), whether physical or virtual. But there are several significant issues that, if not addressed, will put monitoring at risk of becoming unimportant to the business and seeing a substantial reduction in funding.
Today, in IT Ops monitoring:
- IT teams are often out of sync with business goals
- Too many expensive overlapping tools and technical debt
- Limited ability to embrace automation as a way to enhance its agility
- The use of AI and ML technologies is immature
- Visibility gaps from containers and cloud migration
This needs to change. IT Ops monitoring needs to mature.
There are five levels of IT monitoring maturity:
- Availability monitoring
- Diagnostics
- Service monitoring
- Business insights
- Self-driving ITOM
Most businesses are currently achieving diagnostics and service monitoring. Some are still only doing availability monitoring. But, according to Gartner research, only 5-10% are achieving business insights. Self-driving ITOM hasn’t yet been reached but will come in a few years.
However, only being able to identify whether the server or network is up or not isn’t good enough anymore. Monitoring conditions is the bare minimum, not the end goal.
Current monitoring metrics, while great, don’t mean anything to a business audience. The business doesn’t care about mean time to repair or round-trip time. They care whether the service was delivered to the end-user or not. They care whether the sales are made or not. And end-users have come to expect a lot from the customer experience too – they use the likes of Netflix day in and day out.
Business outcomes and customer experience are what matter.
Shift to focus on business metrics and customer experience
In March 2020, Microsoft had a certificate that expired. Nobody noticed, and that meant that Microsoft Teams was down for two hours in many parts of the world just when everyone was relying on video conferencing for working from home in the middle of a pandemic.
Costco went down for almost 16 hours over Black Friday in 2019 and lost approximately $11m because their internal servers were unable to handle the unusually high load that was being generated from all their sites.
Apple’s shares fell 2% after a 12-hour global iTunes outage a couple of days after the Apple Watch launch due to an internal DNS error.
When monitoring is only reactive, this is the scale of business problem an issue can cause if it isn’t caught in time. The worst scenario in SaaS is discovering your application is down via frustrated end-users on Twitter.
If business outcomes matter so much, how do we support them in monitoring?
We need to move from answering, ‘Is it up or down?’ to ‘Why and how?’ How can we fix problems? How fast can we get things better? How agile can we be? And can we move at the speed of business?
This requires a very fine-grained level of monitoring so we can make the shift to proactive monitoring.
We need to know what’s going to happen, when it’s going to happen, before it happens.
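An expired certificate, like the one in the Microsoft Teams example above, is one of the few failures whose date is known in advance, which makes it a natural first proactive check. The following is a minimal sketch using only the Python standard library. The function names, the 30-day warning threshold, and the three status labels are assumptions made for illustration; they were not part of the talk.

```python
import socket
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the notAfter field of the certificate a live host presents.

    If the certificate has already expired, the TLS handshake itself raises
    an error, which is a signal worth alerting on in its own right.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days remaining before the notAfter timestamp (negative if past)."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / SECONDS_PER_DAY


def expiry_status(not_after: str, warn_days: int = 30,
                  now: Optional[float] = None) -> str:
    """Classify a certificate as 'expired', 'warning', or 'ok'."""
    remaining = days_until_expiry(not_after, now)
    if remaining <= 0:
        return "expired"
    if remaining <= warn_days:
        return "warning"
    return "ok"
```

Run on a schedule, `expiry_status(fetch_not_after("example.com"))` would raise a warning weeks before users see an outage, rather than after.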
A big problem in the way of achieving this is tool sprawl. You do need some point solutions, but organizations have accumulated numerous monitoring tools, and some have more than three hundred.
But you need a way to view all the data in context and relate it all to the business objectives you’re trying to achieve and how it’s helping. If you’re not helping, you’re a hindrance. IT needs to be a team player.
This is why there’s a shift toward a single tool for monitoring everything and a need for advanced intelligence in IT monitoring.
Where IT Ops monitoring is headed
The future of IT Operations monitoring will be unified, holistic, agile, intelligent, and most importantly, business-focused.
Gartner suggested that, by 2025, 50% of new cloud-native application monitoring will use open-source instrumentation instead of vendor-specific agents for improved interoperability, up from 5% in 2019.
We’re seeing a major shift away from vendor-specific proprietary agents to gather data. The move is towards open-source agents because they provide more flexibility and allow applications to be instrumented from the inside out.
This need for a unified and holistic oversight of all the business objects and applications has brought about the rise of ‘observability’ as a bit of a buzzword. Though the term is used quite broadly, its original intention is more focused on the application.
Observability is the movement or change from having to instrument an application to the application being made observable.
Observability is no longer about collecting but connecting.
The process of observability is a shift away from deploying a proprietary agent on your application or server to building open-source instrumentation into the application itself. And that’s going to provide data to everybody, not just the vendor you’re using.
This isn’t being seen everywhere yet, but there is a move toward this unification of data.
The future of IT Operations monitoring is that, instead of the agent asking, ‘Is everything okay?’ the apps and infrastructure are going to self-report, ‘Here’s what’s going on.’
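To make the idea of self-reporting concrete, here is a rough sketch of an application describing its own work in a vendor-neutral format (plain JSON) instead of waiting for an external agent to poll it. The names `emit` and `observed`, the event fields, and the `sink` parameter are all hypothetical. A real deployment would more likely use an open instrumentation standard such as OpenTelemetry; its API is deliberately not used here.

```python
import json
import time
from contextlib import contextmanager


def emit(event: dict, sink=print) -> None:
    """Serialize an event as JSON and hand it to any consumer (the sink)."""
    sink(json.dumps(event, sort_keys=True))


@contextmanager
def observed(operation: str, sink=print, **attributes):
    """Wrap a unit of work so the application reports on it by itself."""
    start = time.perf_counter()
    outcome = "ok"
    try:
        yield
    except Exception as exc:
        # Record the failure type, then let the error propagate unchanged.
        outcome = "error:" + type(exc).__name__
        raise
    finally:
        emit(
            {
                "operation": operation,
                "outcome": outcome,
                "duration_ms": round((time.perf_counter() - start) * 1000, 3),
                "attributes": attributes,
            },
            sink,
        )


# Usage: the business context travels with the technical measurement.
with observed("checkout", region="eu", basket_value=42.5):
    pass  # the application's real work would go here
```

Because the output is plain JSON, any tool can consume it, which is the point of moving away from a single vendor's agent.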
The objective is to identify the unknown unknowns.
The promise of AIOps
Ultimately, we expect to see AIOps emerge as the answer to monitoring with a business outcome focus. It promises to deliver a consolidated analysis of performance across domains, including application performance management (APM), IT infrastructure monitoring (ITIM), network performance management and diagnostics (NPMD), and digital experience monitoring (DEM).
AIOps will bring in the switch from reactive monitoring to proactive by delivering useful, actionable insights quickly that augment human intelligence. AIOps will find current and potential future problems by detecting change as the cause of the issue and then recommend solutions.
The foundation you need to lay in IT Ops monitoring is a good configuration management database (CMDB) because you need to know everything in your environment, and integrate it with your ITSM. When everything is integrated, you get a better view.
Following AIOps is the addition of automation. Automation in monitoring will increase agility because, once AIOps has identified the problem, you can make changes faster, more efficiently, and reliably. For example, if you need to make 50 changes to 100 routers, that’s a lot of manual labor and the error potential goes up exponentially. But if ML automation can do that for you, you regain a huge amount of time and will see significantly fewer errors.
Plus, automation will come with simulation so you can simulate your changes before you implement them. Eventually, we’ll see closed-loop automation.
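The router example and the idea of simulating before implementing can be sketched together. In the illustration below, a dry run produces the full list of steps (50 changes across 100 routers is 5,000 individual steps) without touching any device, and the same plan is then applied. The `Router` class, the dictionary-based configuration, and both function names are hypothetical stand-ins; real network automation would talk to devices through a vendor or open-source tool.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Hypothetical stand-in for a managed device and its configuration."""
    name: str
    config: dict = field(default_factory=dict)


def plan_changes(routers, changes):
    """List every (router, key, value) step that would alter a config."""
    return [
        (router.name, key, value)
        for router in routers
        for key, value in changes.items()
        if router.config.get(key) != value
    ]


def apply_changes(routers, changes, dry_run=True):
    """Simulate by default; only modify routers when dry_run is False."""
    steps = plan_changes(routers, changes)
    if not dry_run:
        by_name = {router.name: router for router in routers}
        for name, key, value in steps:
            by_name[name].config[key] = value
    return steps


routers = [Router(f"router-{i}") for i in range(100)]
changes = {f"setting_{j}": j for j in range(50)}

simulated = apply_changes(routers, changes, dry_run=True)   # nothing modified
applied = apply_changes(routers, changes, dry_run=False)    # same steps, now real
```

Reviewing the simulated plan before applying it is where errors get caught, instead of on the hundredth manual login.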
What should enterprises do to prepare for the new IT monitoring environment?
In light of all we’ve covered about the changing face of IT Ops monitoring and the promises of the AIOps and automation future, how can you start to pivot your IT Ops team to align with the demands of the business?
First, reorient your perspective. Monitor from the top down.
Monitor by first asking: what are my business processes and how are we supporting and achieving the processes and goals? Are we able to achieve what we’re trying to do?
This requires you to push your boundaries, become more agile, and figure out what’s going on beyond what’s up and what’s down.
If application latency is up 10% following a release that went out just 10 minutes ago, you might not think that’s so bad. But unless you’re monitoring sales alongside it, you won’t notice that the latency increase coincided with a drop in sales. Seen from a business outcomes-first perspective rather than an IT-first perspective, the issue is suddenly much bigger.
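As a rough illustration of watching a technical metric and a business metric side by side, the sketch below compares both before and after a release. The function names, the sample numbers, and the 5% sales-drop threshold are assumptions chosen for the example, and a coincidence in timing does not by itself prove the release caused the drop.

```python
from statistics import mean


def pct_change(before, after):
    """Percentage change of the average value between two windows."""
    base = mean(before)
    return 100 * (mean(after) - base) / base


def release_impact(latency_before, latency_after,
                   sales_before, sales_after,
                   sales_drop_threshold=-5.0):
    """Summarize a release in both technical and business terms."""
    latency_delta = pct_change(latency_before, latency_after)
    sales_delta = pct_change(sales_before, sales_after)
    return {
        "latency_change_pct": latency_delta,
        "sales_change_pct": sales_delta,
        # Flag only when slower responses coincide with fewer sales.
        "business_impact": latency_delta > 0
        and sales_delta <= sales_drop_threshold,
    }


# Hypothetical per-minute samples around a release.
report = release_impact(
    latency_before=[200, 200, 200],   # milliseconds
    latency_after=[220, 220, 220],
    sales_before=[50, 50, 50],        # orders per minute
    sales_after=[40, 40, 40],
)
```

With these sample numbers, a 10% latency increase that looks tolerable on its own sits next to a 20% drop in orders, so the release is flagged for attention.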
You need to evolve through the five levels of monitoring maturity.
Most organizations today are at level two or three, but we need to move to level four – business insights. Level five will come in a few years’ time.
But how do you level up? Here’s a step-by-step roadmap for the next five years.
2021: Adopt a product-centric focus aligned with business outcomes. Use AIOps. Collaborate with adjacent teams (ITSM, DevOps, SecOps).
2022: Broaden IT Ops skill sets. Shift from the single-stack specialist to a generalist.
2023: Support digital business product monitoring.
2024+: AI-driven closed-loop automation.
We want to become the creator, curator, analyst, utilizer, and distributor of business insights and IT Operations wisdom. If we can’t get there, IT Ops will continue to struggle because we’re not able to help business outcomes or customer experiences.
Everything needs to feed up to the business level. IT is the bedrock of businesses, so prove it.
And remember: the only things that matter are business outcomes and customer experience.