Bruce Cullen
Director of Products, SquaredUp
I started my career in IT support back in 2003, when VMs were something that was “coming” rather than mainstream, so I had the privilege of witnessing the birth of VMs first hand when my company made the switch from bare metal to VMware GSX running on top of Windows Server 2003. It was amazing to be able to get greater density on the same hardware resources, though after that initial elation came the support calls about application performance issues, and so came my introduction to monitoring VMs and application performance with Big Brother and later MOM 2005.
Fast forward over 20 years and those early lessons still ring true today with Azure, though thankfully the hypervisor-shaped pain is taken care of by cloud hosting. Another change over the last 20 years: I am no longer in IT support. I am now Director of Products at SquaredUp, where I manage the data sources for our cloud product, among other things. When I first started exploring how to get the most out of Azure VMs, I quickly realized the importance of delivering immediate value to users. It's crucial for customers to see benefits right out of the box, to understand the potential of our product quickly, and to gain insights on where to focus their time within their Azure estate. Nobody enjoys staring at a blank sheet of paper, trying to imagine possibilities from scratch. It's much easier to start with something concrete and then tailor it to your needs.
With this in mind, I have taken the lessons I learnt in my early career in IT support and built them into the out-of-box dashboards that come with SquaredUp Cloud’s Azure data source, so you don’t have to go through the same learning curve to figure out what is important and what is simply noise.
We had out-of-box dashboards in the Azure data source before my recent tinkering, but while they were pretty, they were also pretty useless. They focused on showcasing what is possible with the product rather than on actually delivering value to our customers. Great for a demo, not great for the real world.
If you have been in the business for as long as I have, you will know the most important things to focus on are CPU, disk, memory and network IO, so this is what our dashboards focus on too, showing you your busiest boxes and where you can optimize and save resources and cost.
Many dashboards come with the Azure data source. I focus on the VMs one in this blog, but I strongly recommend you check them all out.
This dashboard is designed to provide a comprehensive view of your VM landscape, helping you identify performance issues, optimize resource usage, and maintain overall health.
Gone are the days when dashboards were just pretty pictures, meaning little to anyone aside from their author. Our out-of-box VM dashboard is scoped to one or more subscriptions, which you choose from a dropdown driven by dashboard variables. This lets you select only the subscriptions that are important to you and saves you worrying over an unhealthy VM from a non-dev subscription. Here's a closer look at the key features and how they can help you see where you should be taking action now:
High CPU usage can be an indicator of performance bottlenecks. The Virtual Machines dashboard highlights the top 5 VMs with the highest CPU usage. This allows you to quickly identify which VMs are under heavy load, potentially affecting application performance. By pinpointing these VMs, you can take proactive measures to optimize their performance or allocate additional resources where needed.
On the flip side, the dashboard also shows the 5 VMs with the lowest CPU usage. These VMs, often referred to as "zombie VMs," consume resources without doing meaningful work. Identifying these underutilized VMs helps with resource optimization and cost management, a topic we will cover in another blog. With visibility into these zombie boxes, you can do something about them (delete, kill, destroy).
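The top 5 and lowest 5 views described above boil down to a simple ranking. As a rough illustration only (this is not how the dashboard tiles are implemented, and the VM names and CPU figures below are invented), the same idea can be sketched in Python:

```python
from heapq import nlargest, nsmallest

# Hypothetical sample data: average CPU utilisation (%) per VM over some
# time window. Names and values are made up purely for illustration.
avg_cpu_percent = {
    "vm-web-01": 91.5,
    "vm-web-02": 78.2,
    "vm-sql-01": 88.0,
    "vm-app-01": 45.3,
    "vm-app-02": 52.9,
    "vm-batch-01": 97.1,
    "vm-test-01": 1.2,
    "vm-test-02": 0.4,
    "vm-old-01": 2.8,
    "vm-jump-01": 3.5,
    "vm-build-01": 64.0,
    "vm-demo-01": 0.9,
}


def busiest(metrics, n=5):
    """Return the n (name, value) pairs with the highest values, highest first."""
    return nlargest(n, metrics.items(), key=lambda item: item[1])


def quietest(metrics, n=5):
    """Return the n (name, value) pairs with the lowest values, lowest first."""
    return nsmallest(n, metrics.items(), key=lambda item: item[1])


top_cpu = busiest(avg_cpu_percent)
bottom_cpu = quietest(avg_cpu_percent)

print("Busiest VMs (candidates for more resources):")
for name, value in top_cpu:
    print(f"  {name}: {value:.1f}%")

print("Quietest VMs (possible zombies):")
for name, value in bottom_cpu:
    print(f"  {name}: {value:.1f}%")
```

The same two helper functions would work for any of the other metrics on the dashboard, since they only need a mapping of VM name to a number.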
Memory usage is another critical metric for VM performance. The dashboard lists the 5 VMs with the lowest available memory, which can indicate inefficient resource allocation and can leave applications grinding to a halt. By monitoring these VMs, you can better understand their memory requirements and adjust allocations or change VM type to improve overall performance.
Similarly, the 5 VMs with the most free memory can point to overprovisioned VMs that could be rightsized to save money, or to more zombie boxes. You may need the money you save here to give more resources to the VMs with the least available memory.
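Putting the CPU and memory views together gives a simple triage. The sketch below is only one way to express that reasoning: the thresholds, class name, and sample VMs are all assumptions made for illustration, not values the dashboard uses, and you would want to tune them to your own estate.

```python
from dataclasses import dataclass


@dataclass
class VmSample:
    name: str
    avg_cpu_percent: float           # average CPU utilisation over the window
    available_memory_percent: float  # share of memory still free


# Hypothetical thresholds, chosen only to make the example concrete.
LOW_MEMORY_PERCENT = 10.0        # below this, the VM is short of memory
HIGH_FREE_MEMORY_PERCENT = 80.0  # above this, memory looks overprovisioned
IDLE_CPU_PERCENT = 5.0           # below this, the VM is doing almost nothing


def triage(vm: VmSample) -> str:
    """Classify a VM using the CPU and memory reasoning described above."""
    if vm.available_memory_percent < LOW_MEMORY_PERCENT:
        return "needs more memory"
    if vm.available_memory_percent > HIGH_FREE_MEMORY_PERCENT:
        # Lots of free memory and almost no CPU suggests nothing is running.
        if vm.avg_cpu_percent < IDLE_CPU_PERCENT:
            return "possible zombie"
        return "rightsizing candidate"
    return "ok"


# Invented sample VMs for illustration.
samples = [
    VmSample("vm-sql-01", avg_cpu_percent=88.0, available_memory_percent=4.0),
    VmSample("vm-app-01", avg_cpu_percent=45.3, available_memory_percent=86.0),
    VmSample("vm-old-01", avg_cpu_percent=2.8, available_memory_percent=93.0),
    VmSample("vm-web-01", avg_cpu_percent=61.0, available_memory_percent=35.0),
]

results = {vm.name: triage(vm) for vm in samples}
for name, verdict in results.items():
    print(f"{name}: {verdict}")
```

Checking memory pressure first is a deliberate choice in this sketch: a VM that is starving for memory is hurting users now, whereas an overprovisioned one is only costing money.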
Network traffic is crucial for VM performance, especially for applications that rely heavily on data transfer. The dashboard shows the top 5 VMs by network in traffic and the top 5 by network out traffic. Monitoring these VMs helps you confirm that they are appropriately specified, that network resources are available where needed, and that network-related issues are diagnosed promptly.
Disk I/O performance can significantly impact the responsiveness of your applications. The dashboard tracks the top 5 VMs with the highest disk writes per second and disk reads per second. This information helps in identifying VMs with intense I/O operations, allowing you to optimize disk performance and avoid potential bottlenecks.
The Virtual Machines dashboard provides a list of unhealthy VMs, as reported by Azure itself, helping you quickly spot and address any issues. Additionally, the alerts section ensures that you are notified of any critical events that require immediate attention, allowing you to maintain high availability and reliability.
One of my favorite features of this dashboard is the integration with Azure Service Health, showing you Azure-declared service outages applicable to your subscription. Nothing is more frustrating than spending hours chasing down an issue only to discover it was caused by an Azure outage. With this feature, you can see Azure service outages directly within the dashboard, saving valuable troubleshooting time.
Almost all of the tiles on this dashboard are driven by KQL, which is great because you can see and edit the KQL behind each tile, saving significant time compared with writing KQL from scratch yourself.
To conclude, understanding VM health and performance comes down to CPU, disk, memory and network IO, which is what our new out-of-box dashboards capture, allowing you to see which VMs need more resources to deliver a good experience for their workloads and where you can save resources and cut costs by rightsizing. Whether you are an IT administrator or a cloud engineer, this dashboard is designed to empower you with the information you need to manage your VMs effectively.