Sameer Mhaisekar
Developer Advocate, SquaredUp
Recently I've been working with an MSP on a couple of their use cases. One of the services they're designing for their customers is cost optimization for IT infrastructure. Alongside their in-house services and tools, they needed a tool to help them identify and quantify underutilized resources. This is something SquaredUp is very well suited to do, and so we got to work!
We had the perfect set of data sources to create a proof of concept. We would use the VMware, Azure and AWS data sources to identify underutilized cloud resources, and the M365 data source to identify inactive licenses that could be revoked to save cost. In this article, we will take a look at the VMware use case. We will cover the other data sources in the following articles.
Based on this, we came up with a few dashboards. The idea is to identify either resources that are turned off, or resources where the key metric for that resource type shows low utilization.
To simplify the process, we used dashboard variables wherever we could, so that we could see both combined and individual resource consumption.
For VMware, we chose three key components – hosts, datastores and guests. This way we could identify the wasted resources at various levels for a fuller picture.
The VMware plugin lists loads of metrics at different levels (L1-L4). All you need to do is identify the KPIs for underutilization and display the top or bottom X resources (depending on what the metric is) for each KPI.
To start, we mapped a couple of key metrics for each host. These include the classic CPU, disk and memory usage, as well as network throughput. I also extended it a little by adding a tile for host properties, and a tile that uses SQL analytics to run a query that roughly calculates how much cost could be saved by rightsizing based on these metrics.
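To give a feel for what a rightsizing estimate like this can look like, here is a minimal sketch. It is not the query from the dashboard: the `hosts` table, its columns, the sample figures and the 70% target utilization are all assumptions made up for illustration, and the SQL runs against an in-memory SQLite database rather than SquaredUp's SQL analytics. The idea is simply that a host whose busiest metric sits well below the target could be scaled down proportionally.

```python
import sqlite3

# Assumed headroom target; not a value taken from the article.
TARGET_UTILIZATION_PCT = 70.0

# Hypothetical rows: (host, avg CPU %, avg memory %, monthly cost).
HOSTS = [
    ("esx-01", 20.0, 35.0, 1000.0),
    ("esx-02", 65.0, 80.0, 1000.0),
    ("esx-03", 10.0, 14.0, 800.0),
]

# For each host below the target, estimate the share of its cost that
# could be saved if it were sized so its busiest metric hit the target.
QUERY = """
SELECT host,
       ROUND(monthly_cost * (1 - MAX(cpu_pct, mem_pct) / :target), 2) AS est_saving
FROM hosts
WHERE MAX(cpu_pct, mem_pct) < :target
ORDER BY est_saving DESC
"""


def estimate_savings(rows, target=TARGET_UTILIZATION_PCT):
    """Return (host, estimated monthly saving) pairs, largest saving first."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE hosts (host TEXT, cpu_pct REAL, mem_pct REAL, monthly_cost REAL)"
        )
        conn.executemany("INSERT INTO hosts VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY, {"target": target}).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    savings = estimate_savings(HOSTS)
    for host, saving in savings:
        print(f"{host}: roughly {saving:.2f} per month")
    print(f"Total: roughly {sum(s for _, s in savings):.2f} per month")
```

Treat the output as a conversation starter rather than a quote: real rightsizing also has to account for peaks, licensing and failover capacity, none of which this sketch models.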
As you can see, by default the dashboard is showing the aggregate of all the items in my variable collection. That can be easily changed – select the specific item you want to see and the tiles will change data accordingly. Easy.
We also plotted similar KPIs for the guest VMs, along with a couple of other metrics such as the count of all the VMs that are turned off. The tile at the top is always scoped to the 3 VMs with the lowest CPU usage, so you can quickly identify the underutilized ones, even if new VMs are added later.
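The logic behind those two tiles is simple enough to sketch outside the dashboard. The snippet below is only an illustration: the `VM` record, its fields and the sample values are hypothetical, and it assumes that powered-off VMs are left out of the lowest-CPU ranking and counted separately.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    """Hypothetical VM record; field names are illustrative only."""
    name: str
    powered_on: bool
    cpu_pct: float


# Made-up sample data.
VMS = [
    VM("web-01", True, 42.0),
    VM("web-02", True, 3.5),
    VM("db-01", True, 61.0),
    VM("test-01", False, 0.0),
    VM("batch-01", True, 1.2),
    VM("old-app", False, 0.0),
    VM("build-01", True, 8.9),
]


def lowest_cpu(vms, n=3):
    """Return the n powered-on VMs with the lowest CPU usage, lowest first."""
    running = [vm for vm in vms if vm.powered_on]
    return heapq.nsmallest(n, running, key=lambda vm: vm.cpu_pct)


def powered_off_count(vms):
    """Count the VMs that are currently turned off."""
    return sum(1 for vm in vms if not vm.powered_on)


if __name__ == "__main__":
    for vm in lowest_cpu(VMS):
        print(f"{vm.name}: {vm.cpu_pct}% CPU")
    print(f"Powered-off VMs: {powered_off_count(VMS)}")
```

Because the ranking is recomputed from whatever VMs are in the list, a newly added idle VM shows up automatically, which mirrors how the tile behaves when it is scoped to the "bottom 3" rather than to fixed objects.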
Of course, you can select a specific VM or a group of them via dashboard variables as well.
Lastly, we have the datastores. Here we only have three tiles, all related to the same metric: disk utilization. One of them shows the total disk capacity of each datastore, and the one below it shows the used disk space. For the third, we got a bit creative. I wanted to plot the percentage utilization for each of these datastores and could not find a direct metric for that. So I wrote a query to do just that using SQL analytics.
And just like that we have the percentage too!
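If you want to try the same calculation yourself, it boils down to dividing used space by capacity. Here is a minimal sketch of that idea; the `datastores` table, its column names and the sample numbers are assumptions for illustration, not the schema SquaredUp exposes, and the query runs against an in-memory SQLite database so it can be tested anywhere.

```python
import sqlite3

# Hypothetical sample data: (datastore name, capacity in GB, used space in GB).
DATASTORES = [
    ("ds-prod-01", 2000.0, 1500.0),
    ("ds-dev-01", 1000.0, 120.0),
    ("ds-backup-01", 4000.0, 1000.0),
]

QUERY = """
SELECT name,
       ROUND(100.0 * used_gb / capacity_gb, 1) AS used_pct
FROM datastores
WHERE capacity_gb > 0          -- skip rows that would divide by zero
ORDER BY used_pct ASC          -- least-used datastores first
"""


def datastore_utilization(rows):
    """Return (name, used_pct) pairs, least utilized datastore first."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE datastores (name TEXT, capacity_gb REAL, used_gb REAL)"
        )
        conn.executemany("INSERT INTO datastores VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for name, pct in datastore_utilization(DATASTORES):
        print(f"{name}: {pct}% used")
```

Sorting ascending puts the emptiest datastores at the top, which is usually what you want when the goal is to spot wasted capacity.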
Using these dashboards, you can already see the resources at various levels of utilization and take appropriate action proactively.
That’s all for now – happy dashboarding!