The benefits and challenges of a single pane of glass
Practical benefits and implementation challenges
Editor’s Note: This is part 2 of a 2-part series exploring the results of our recent survey on the concept of a single pane of glass. Check out the 1st part of our analysis here.
In part 1 of our single pane of glass survey analysis, we landed on a couple of key takeaways, which we’ll share again here:
- There is a distinct need for a single pane of glass, and a high degree of consensus on its required components.
- The pain points felt by modern IT teams extend beyond traditional infrastructure monitoring and into application development and broader observability.
With consensus around the definitional attributes of a single pane of glass and the primary pain points outlined, we wanted to turn our attention to the tangible benefits a single pane of glass could provide, as well as what kinds of practical problems it could help solve.
Finally, we’ll conclude with a brief analysis of the factors that make this solution so difficult to implement, despite all the agreement about the what, why, and how of a single pane of glass.
Practical benefits a single pane of glass could provide
The following benefits identified by our survey respondents rely upon effective and thoughtful implementations, but they represent what might be possible with a single pane of glass.
Faster incident response
The hard truth is, outages occur, and they hurt.
System infrastructure, application build pipelines, customer-facing UIs—all of them can (and likely will) fail at some point, and without an effective plan for response and restoration, costs (and frustration) can mount quickly.
Incident response time is one of the primary measures of system reliability. The faster teams and organizations can move a broken component from flashing red to steady green, the more likely they are to hit their SLOs and SLAs, reduce toil and limit alert fatigue, provide value to the business, and focus their attention on building more resilient systems and shipping new features.
“If all the data was together in a single pane, then it would have drastically improved time to resolution.”
How a single pane of glass could improve incident response
Our survey respondents identified a few ways in which an effective single pane of glass could lead to faster incident response.
Reduce time taken to identify where an issue is occurring
Without a high-level view across the distributed components of a system, identifying the exact node (or even the exact service) that’s failing can consist of frantic response calls, wrangling a dozen different UIs, and quite a bit of anxiety.
Route issues more efficiently
The quicker monitoring and observability teams can identify where an issue is occurring, the easier and quicker it becomes to find the team that owns the failing component and initiate a fix.
Build more efficient incident response processes
With a single pane of glass, teams and organizations can have the confidence to develop and implement more proactive and efficient incident response policies. Implementing a layer of visibility that helps identify where an issue is occurring and who needs to respond to it can lay the foundation for building better processes over time.
Proactive monitoring and observability
As with most things data, pattern recognition can be an essential step towards a better signal-to-noise ratio. The same can be said for infrastructure monitoring—and, by extension, broader IT observability.
Reacting and responding to incidents is one thing, but an effective single pane of glass could help teams move beyond reactive approaches to monitoring and observability and towards proactive strategies.
“Having all the information together would potentially allow for proactive monitoring of our assets to prevent future outages.”
Addressing a single pane of glass’s role in proactive monitoring
Recently, the Shift Left concept has gained traction, signaling a need for more proactive approaches to building reliable systems. The central idea is to take a process that’s normally reserved for the final stages of a given lifecycle and move it earlier. In other words, it means moving from a reactive system to a more proactive one.
Our survey respondents signaled the potential of a single pane of glass to enable this shift.
Recognize issue patterns more easily over time
If one component of a system repeatedly fails or requires patchwork fixes, a unified view makes it easier for infrastructure and application teams to spot the pattern and identify its root cause.
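As a rough illustration of what recognizing an issue pattern could look like in practice, here is a minimal Python sketch that counts incidents per component and flags the repeat offenders. The incident records, component names, and the `recurring_components` helper are all hypothetical; real data would come from whatever alerting or ticketing tools a team already uses.

```python
from collections import Counter

# Hypothetical incident records, invented for illustration only.
incidents = [
    {"component": "checkout-api", "cause": "timeout"},
    {"component": "build-pipeline", "cause": "flaky test"},
    {"component": "checkout-api", "cause": "timeout"},
    {"component": "auth-service", "cause": "expired certificate"},
    {"component": "checkout-api", "cause": "out of memory"},
    {"component": "build-pipeline", "cause": "disk full"},
]


def recurring_components(records, threshold=3):
    """Return (component, count) pairs with at least `threshold` incidents,
    ordered from most to least frequent."""
    counts = Counter(record["component"] for record in records)
    return [(name, n) for name, n in counts.most_common() if n >= threshold]


if __name__ == "__main__":
    for name, n in recurring_components(incidents):
        print(f"{name}: {n} incidents")
```

A simple count like this only works when incidents from every tool are visible in one place, which is the point the survey respondents were making.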
Build and maintain more resilient systems
With evidence of a recognizable issue pattern, teams can more readily justify time spent building resilient systems and workflows.
Bake monitoring and observability practices into team culture
A single pane of glass, if implemented well, could also serve as an impetus to re-examine cultural practices around monitoring and observability. For instance, a more widely accessible and actionable entry point into a distributed system could empower individual teams to take ownership of the reliability of the components they manage, while also limiting additional reporting overhead.
Map data correlations and dependencies
In order to build, monitor, and observe distributed systems in more proactive ways, teams need to better understand how the components they manage correlate to others they don’t.
Within this framework, mapping the dependencies between elements of a system would require special consideration. Implicit in many of the pain points we identified earlier is the reality that, in highly complex and distributed systems, changes in one component can ripple out and cause unintended consequences for other components, teams, or even an entire service.
“Correlate data between different tools to get a better overview of monitoring and performance. Seeing this data side-by-side would be valuable for the different teams across the organization.”
On the value of a single pane of glass that correlates and maps distributed data
An effectively implemented single pane of glass could provide the context and dependency maps necessary to quickly drill down to the right node or service that’s resulting in component- or service-level failures.
Better understand upstream and downstream dependencies
Mapping dependencies isn’t just valuable for individual teams—it can also serve as a way for key stakeholders, on-call responders, and others to get an at-a-glance view of system- and service-level health and status.
More efficiently drill down to examine/determine root causes
After the dust has settled with a given incident, it can be helpful to conduct a more robust root cause analysis. But if data isn’t correlated and dependencies aren’t effectively mapped, root cause analysis becomes much more time-intensive than it might be with a single pane of glass with at-a-glance and drilldown capabilities.
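To make the idea of dependency mapping and drilldown a little more concrete, here is a minimal Python sketch, assuming a dependency map and a health status per service are already available. The service names, the `DEPENDS_ON` and `HEALTHY` tables, and both helper functions are hypothetical and exist only for illustration.

```python
# Hypothetical dependency map: each service lists the services it depends on.
DEPENDS_ON = {
    "web-ui": ["checkout-api", "auth-service"],
    "checkout-api": ["payments-db", "auth-service"],
    "auth-service": ["users-db"],
    "payments-db": [],
    "users-db": [],
}

# Hypothetical health status, as a monitoring tool might report it.
HEALTHY = {
    "web-ui": False,
    "checkout-api": False,
    "auth-service": True,
    "payments-db": False,
    "users-db": True,
}


def root_cause_candidates(service, depends_on, healthy):
    """Drill down from an unhealthy service through its unhealthy
    dependencies and return the unhealthy services that have no unhealthy
    dependencies of their own."""
    candidates, seen, stack = [], set(), [service]
    while stack:
        current = stack.pop()
        # Services missing from the status table are assumed healthy.
        if current in seen or healthy.get(current, True):
            continue
        seen.add(current)
        unhealthy_deps = [
            dep for dep in depends_on.get(current, [])
            if not healthy.get(dep, True)
        ]
        if unhealthy_deps:
            stack.extend(unhealthy_deps)
        else:
            candidates.append(current)
    return sorted(candidates)


def dependents_of(service, depends_on):
    """Return every service that depends on `service`, directly or indirectly."""
    impacted, frontier = set(), [service]
    while frontier:
        current = frontier.pop()
        for name, deps in depends_on.items():
            if current in deps and name not in impacted:
                impacted.add(name)
                frontier.append(name)
    return sorted(impacted)


if __name__ == "__main__":
    print(root_cause_candidates("web-ui", DEPENDS_ON, HEALTHY))
    print(dependents_of("users-db", DEPENDS_ON))
```

The first function mirrors the drilldown described above, and the second gives the at-a-glance view of what else an issue could affect. Neither is possible unless the dependencies have been mapped somewhere first.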
Communicate more effectively with stakeholders
So far, we’ve explored responses to our survey that put the concept of a single pane of glass in fairly technical terms—how it could improve existing tools and aggregate data sources, how it could improve key metrics like incident response times, and how it could create rich maps of infrastructure and application dependencies.
But there’s a bit of an elephant-in-the-room dynamic when we focus primarily on the technical specifics. No matter what tooling is in place or how sophisticated a hypothetical approach is, it’s still humans who will ultimately act on the information they have and respond to the organizational incentives that are in place.
“Having one central single pane of glass where everyone could view and check all systems and their mapped dependencies...would help with saving time and spreading the load across more team members rather than relying on just a few...Seeing this data side-by-side would be valuable for the different teams across the organization.”
On the value of a single pane of glass that correlates and maps distributed data
Put another way, if individual teams aren’t encouraged and incentivized to prioritize the reliability and observability of the services they manage, then...they probably won’t.
While not a silver bullet, a well-implemented single pane of glass could be a key element in fostering a better culture around monitoring and observability.
Establish a “source of truth” for monitoring and observability
A unified point of entry that provides both at-a-glance and drilldown views of system status would increase visibility across a larger organization. Executives could quickly review system-level health, while individual teams could more readily understand and communicate the performance and reliability of the services they manage.
More easily publish metrics and KPIs to stakeholders
Management and leadership teams don’t necessarily need to know how the sausage is made, but they do need to ensure that systems are running smoothly and that customers aren’t experiencing major disruptions or outages. An effective single pane of glass would need to serve as a way for teams to share high-level progress on their KPIs without dragging management down into the weeds of server and application telemetry data.
Share knowledge internally
Survey respondents reported a tendency for individual component or service teams to act as data or knowledge silos. Breaking down those silos—or at least creating pathways between them—is another tangible benefit a well-implemented single pane of glass could provide.
Why is a single pane of glass so difficult to achieve?
Tada! We did it, right?
With the help of a bunch of industry practitioners, we’ve identified a clear and persistent need, detailed current pain points in monitoring and observability, and identified the tangible ways a single pane of glass could address those pain points. So in theory, we should be ready to go forth and implement a solution.
But as you might have already guessed, implementing an effective single pane of glass is much harder in the real world. Existing approaches struggle to fulfill the requirements and solve the pain points we’ve outlined, especially at enterprise scale.
Traditionally, creating a single pane of glass from disparate data sources has started with centralizing that data. From this centralized database, data can be aggregated, queried, manipulated, and visualized—theoretically as a unified, big-picture view of an overall system.
In practice, however, there are at least a few issues with this classic approach:
“Yet another database”
Modern IT and application architectures are often built using a wide range of different databases. Individual microservices often have their own databases, and the tools teams use often do, as well. Adding yet another database in a scaling org might be an increasingly difficult sell.
Maintaining data vs deriving value from it
Mass ingestion and collection ensure that you’re capturing the immense range of data that’s being generated, but they don’t help you separate the signal from the noise, act on the data you have, or respond to incidents more efficiently.
Spiraling costs (both time and money)
Underneath the two issues detailed above is the cost of centralized data ingestion and collection. Storage costs can spiral with scale, and the time costs of maintaining, optimizing, querying, and otherwise administering a centralized database are certainly not trivial.
Building a modern single pane of glass at SquaredUp
So if more effectively using data—not simply collecting it—is the primary challenge scaling enterprises face, then what constitutes a more effective approach?
As we embark on a mission to create a modern single pane of glass, we’re betting on a few key principles:
Observability data extends beyond the “three pillars”
Here we’re referring specifically to logs, metrics, and traces (and we could also include “events” in this equation). While these data types are crucial monitoring and observability signals, they alone do not constitute a strategy or approach. Organizational context, business KPIs, and many other idiosyncrasies can affect how organizations take raw signals and turn them into actionable insights.
Connect data, don’t collect it
Instead of collecting and copying all data into a central store or database, organizations would be better served by building a system that allows them to connect, visualize, and report on the data they need, when they need it. We prefer an in-place analytics approach to data connection.
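One way to picture the difference between connecting and collecting is the following minimal Python sketch. It is an illustration of the general idea only, not a description of how any particular product works. The connector functions, their return values, and `render_view` are hypothetical; in a real system each connector would call the API of the tool that owns the data.

```python
from typing import Callable, Dict


def fetch_open_incidents() -> int:
    """Stand-in for a live query against a ticketing tool (assumed value)."""
    return 4


def fetch_error_rate() -> float:
    """Stand-in for a live query against a metrics tool (assumed value)."""
    return 0.02


# Each connector knows how to ask its own source for an answer on demand.
CONNECTORS: Dict[str, Callable[[], object]] = {
    "open_incidents": fetch_open_incidents,
    "error_rate": fetch_error_rate,
}


def render_view(connectors: Dict[str, Callable[[], object]]) -> Dict[str, object]:
    """Query every source at view time and return the combined result.
    Nothing is copied into a central store; the data stays where it lives."""
    return {name: query() for name, query in connectors.items()}


if __name__ == "__main__":
    print(render_view(CONNECTORS))
```

The trade-off in a design like this is that the view is only as fresh and as available as its sources, but there is no additional database to store, maintain, or pay for.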
Embrace data sprawl, don’t manage it
Data sprawl isn’t something to recoil from. In fact, it’s what enables the rapid development and iteration of modern IT organizations. We believe all teams—from DevOps to frontend and everything in between—should work with the tools and systems they prefer, while also being able to seamlessly roll up and publish KPIs to their stakeholders.
Enterprise observability is, at its core, a knowledge problem
One of the primary threats of data sprawl and decentralized data storage/connection is the creation of knowledge or data silos. How systems work, what the dependencies are between their components, and other key pieces of knowledge are often lost in translation. IT organizations will need to develop systems that enable each team to capture and share important info about the services they manage.
Using our learnings from this survey as a springboard, we’ll be addressing in more depth the problems with existing approaches and the principles we believe are essential to a single pane of glass for the modern enterprise—both through the product we’re building and by sharing our unique perspective with the larger community.
We look forward to meeting this profound challenge head on, and we hope you’ll join us for the ride!