A million moving parts
Applications are built and run by many people and made of many components: infrastructure, code pipelines, and end users, to name a few. Understanding the status of those components and teams is never straightforward.
In this blog post, we will unpack a problem faced by most organizations and look at how SquaredUp can give you and your organization status visibility across teams, components, and services, all in one view.
The reality: siloed teams with no visibility of complete status
I used to work for a company that made its revenue from selling subscriptions to its world-famous online publication service. Any outage could impact our company’s bottom line. The service was developed and maintained by four Engineering teams and one DevOps team, each responsible for a specific area of the service (authentication, front-end, back-end, content delivery, subscription management).
- Metrics and statuses were available across different tools, but each tool was monitored only by the team responsible for it, which limited the ability to get the big picture for everyone else.
- Engineers cared about the experience of the end users; the CTO cared about renewal rates, especially if those were impacted by a poor-performing component of the application. But there was no central place for anyone to check that everything was OK.
- When something broke, there were multiple places to go. One day, our users started to complain that they could not log in. After chasing the problem across many teams and tools, we found the root cause: a small, buggy release had been deployed at the same time the outage began. It would have been blatantly obvious with the right status dashboards.
The solution: unified status dashboards in one place for all to explore
SquaredUp enables an organization to plug into any tool and surface real-time status. With 50+ plugins, each team can capture and share important information about their own component – what it is, what its dependencies are, and what the status is. No data is centralized and stored; every plugin streams its data, meaning teams can keep the tools they love.
SquaredUp plays nicely across teams – centralized visibility that keeps ownership decentralized. DevOps and Engineering teams can choose to publish any data they need from the tools that they use: key metrics, logs, alerts, documentation, pipeline releases – anything that other teams might need to quickly determine the root cause and impact of an outage.
Let’s take a closer look at how it could help my company. The dashboard below gives an overall status of the application. In this example, every stakeholder can see that:
- the daily sign-ups are off target
- there is a potential issue with the subscription management service
- there is definitely an issue with the AWS Lambdas
Because every team has its own dashboards, with its own metrics and its own monitors, it is possible to drill down further and explore the status of sub-components. In this example, we can see that there was a recent release that failed, but more importantly, that the AWS Lambdas underpinning the subscription management service are erroring out.
The DevOps team can now take immediate action and work with the Engineering team to first roll back the deployment and then fix the code.
When individual teams roll up their health and key metrics, SquaredUp automatically creates the relationships between components. SLOs and KPIs can then be derived and aggregated to add more context and insights to these status dashboards.
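To make the roll-up idea concrete, here is a minimal Python sketch. It is not SquaredUp's implementation or API: the `Health` states, the `roll_up` function, and the component names are all hypothetical, and it assumes a simple "worst state wins" rule in which a component is only as healthy as the least healthy thing it depends on.

```python
from enum import IntEnum


class Health(IntEnum):
    # Ordered so that max() picks the worst state (assumed rule: worst wins).
    SUCCESS = 0
    WARNING = 1
    ERROR = 2


def roll_up(component, own_health, depends_on, _seen=None):
    """Return the worst health among a component and everything it depends on."""
    seen = set() if _seen is None else _seen
    if component in seen:
        # Already counted on this walk (shared dependency or a cycle).
        return Health.SUCCESS
    seen.add(component)
    worst = own_health.get(component, Health.SUCCESS)
    for dependency in depends_on.get(component, []):
        worst = max(worst, roll_up(dependency, own_health, depends_on, seen))
    return worst


# Hypothetical data modeled on the example in this post.
own_health = {
    "application": Health.SUCCESS,
    "authentication": Health.SUCCESS,
    "subscription-management": Health.WARNING,
    "aws-lambdas": Health.ERROR,
}
depends_on = {
    "application": ["authentication", "subscription-management"],
    "subscription-management": ["aws-lambdas"],
}

for name in own_health:
    print(name, roll_up(name, own_health, depends_on).name)
```

In this sketch the application itself reports no problem, yet its rolled-up status is an error because the Lambdas two levels down are failing. That is the behavior that lets a stakeholder start at the top-level view and drill down to the failing component.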
What can SquaredUp do for me?
To see for yourself what SquaredUp can do for your environment, get started today for free with our out-of-the-box dashboards that will have you up and running in minutes.
Or, to find out more: