New Health Explorer for Azure VMs
Yup – we've now got a health explorer for Azure VMs! Health Explorer… now why does that sound familiar? You’ve probably guessed it by now, but here’s another hint:
Hmm…now where have I seen that before? Right – in SCOM!
What is the Health Explorer and what does it do?
We finally have a ‘proper’ health model and also a proper health explorer for Azure VMs. Until now, you’ve only observed the health of a VM changing based on its service health. But now, a health model similar to SCOM’s has been rolled out in preview and I must say I’m excited!
The health of the server now changes based on the health of the ‘monitors’ that each watch some performance aspect. Just like in SCOM, you have unit monitors and aggregate monitors. The unit monitors are the actual workflows that check a value against a threshold and change state when it’s crossed. The health of the unit monitors then rolls up to the aggregate monitor, which in turn rolls up to the server level. Very SCOM-y.
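To make the rollup idea concrete, here is a minimal Python sketch of it. This is not how Azure implements it internally; the class names, the monitor names, the threshold values and the "worst state wins" policy are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Health(IntEnum):
    # Ordered so that a larger value means a less healthy state.
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2


@dataclass
class UnitMonitor:
    """A unit monitor: compares one sampled value against thresholds."""
    name: str
    warning: float   # hypothetical threshold, not a real Azure default
    critical: float  # hypothetical threshold, not a real Azure default

    def evaluate(self, value: float) -> Health:
        if value >= self.critical:
            return Health.CRITICAL
        if value >= self.warning:
            return Health.WARNING
        return Health.HEALTHY


def roll_up(states):
    """Assumed 'worst-of' policy: the parent takes the least healthy child state."""
    return max(states, default=Health.HEALTHY)


cpu = UnitMonitor("CPU utilization (%)", warning=80, critical=95)
disk = UnitMonitor("Disk space used (%)", warning=85, critical=95)

# Each aggregate monitor rolls up its unit monitors...
cpu_aggregate = roll_up([cpu.evaluate(72.0)])
disk_aggregate = roll_up([disk.evaluate(91.0)])

# ...and the VM rolls up its aggregate monitors.
vm_health = roll_up([cpu_aggregate, disk_aggregate])

print(vm_health.name)  # WARNING
```

With the sample values above, CPU stays healthy while disk usage crosses its warning threshold, so the warning state bubbles all the way up to the VM.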
How to use Azure VM Health Explorer
Alright, let’s get in the portal and see how we can actually use this in real life.
When you jump into the portal and navigate to Azure Monitor > Insights > Virtual Machines, you’ll now see something like this:
This is a list of all the virtual machines that match the applied filters. You may notice a purple rocket-shaped icon next to some or all of the VMs, which means the agent on them needs upgrading. You can upgrade the agents one by one, or in bulk by selecting multiple VMs.
Here’s one of the machines that I’ve upgraded the agent on:
You’ll notice that there’s a new tab called ‘Health (Preview)’. This is where you’ll see the new health explorer, with all the aggregate and unit monitors for that VM.
As you click on the monitors, a new blade will appear that gives you more details about the monitor.
It also gives you the option to see the configuration of that monitor – what the threshold is, health status, severity level, etc.
In the monitor configuration, you can also set different thresholds for different VMs by browsing into their respective health explorers. You can also do it in bulk using DCRs (Data Collection Rules).
Another great thing is that the guest health feature itself has no direct cost, although you do pay for the ingestion and storage of health state data in the Log Analytics workspace. All of that data is stored in the HealthStateChangeEvent table, which means you can also query the Log Analytics workspace with KQL and integrate the results into your dashboards and reports.
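As a rough illustration of what such a query could look like, here is a small Python helper that builds a KQL string. The only things it relies on are the HealthStateChangeEvent table named above and TimeGenerated, the standard timestamp column on Log Analytics tables. The function name and its parameters are made up for this sketch, and no other columns are assumed.

```python
def recent_health_events_query(hours: int = 24, limit: int = 50) -> str:
    """Build a KQL query for the most recent health state change events.

    Hypothetical helper for illustration; it only builds the query text
    and does not talk to Azure.
    """
    if hours <= 0 or limit <= 0:
        raise ValueError("hours and limit must be positive")
    return (
        "HealthStateChangeEvent\n"
        f"| where TimeGenerated > ago({hours}h)\n"
        "| sort by TimeGenerated desc\n"
        f"| take {limit}"
    )


print(recent_health_events_query())
```

You can paste the printed query straight into the Logs blade of the workspace, or run it from code with a client library for Log Analytics, and then filter or summarize further once you’ve looked at the columns the table actually returns.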
Here’s a quick VM health dashboard I made in SquaredUp with a very simple KQL query.
Looks pretty awesome! 😊
Alright, that ends this quick tour. If you’re interested in finding out more about this feature and how to use it, I highly recommend going through the Microsoft Docs here. Since this is still in preview at the time of writing, I’m expecting a lot more improvements to come. And, as a SCOM guy, I can’t wait to find out how it compares. 😉