Visualizing VMInsights data in SquaredUp

Kevin HoodSquaredUp, Senior Test Engineer

Following the release of Azure Monitor for VMs in January 2020, some changes have been made to how its performance data is stored in Log Analytics.   The most significant of these changes is that this data is now stored in the InsightsMetrics table rather than in the Perf table, as we discussed in our ‘What is Azure VM Insights?’ article.  According to Microsoft, this change was made because the schema of the InsightsMetrics table is more flexible than that of the Perf table, and consequently they’re able to store a more diverse range of data in it than would have been possible with the Perf table.  The InsightsMetrics table is also where Azure Monitor for containers stores its data, so there is an element of consolidation to this change as well.  There are some potential downsides to this as performance data could be duplicated in both the Perf table and the InsightsMetrics table.  If customers have not disabled Azure Monitor for VMs performance counters that were writing to the Perf table then the cost of the workspace could be higher than it needs to be. 

The good news is that if you are using Azure Monitor for VMs to collect performance counter data, SquaredUp can help surface this data whether it’s being stored in the Perf table, or the InsightsMetrics table, or both. It is, however, worth noting that the structure of the InsightsMetrics table differs slightly from the Perf table, and this can make it a little more challenging to write KQL queries to get the information you want. 

The InsightsMetrics table structure contains columns for Computer, Name and Namespace where Name is equivalent to the ObjectName column and Namespace is equivalent to the CounterName column in the Perf table.  The one column we seem to be missing in InsightsMetrics is the Instance column.  This information is stored in the Tags column, which is a string in JSON format.  To demonstrate a way to get at this information, I will configure a Line Graph tile showing Free Disk Space on a per-instance basis. 

InsightsMetrics
| where Namespace == "LogicalDisk" and Name == "FreeSpaceMB"
| where split(Computer, ".")[0] in ({{scope.map(item => '"'+item.name+'"').join(',')}})
| extend Instance = extractjson("$['vm.azm.ms/mountId']", Tags)
| extend ComputerInstance = strcat(split(Computer, ".")[0], " | ", Instance)
| summarize avg(Val / 1024) by ComputerInstance, bin(TimeGenerated, 1d)

Figure 1. The SquaredUp Logs tile query field

Stepping through the query line-by-line:

  1. From the InsightsMetrics table…
  2. …retrieve the rows where Namespace is LogicalDisk and Name is FreeSpaceMB
  3. Filter out any rows where the value of the Computer does not match any of the resources in the tile scope. This is where the query gets a bit of SquaredUp magic injected into it.  I have scoped the tile to Virtual Machines in a single Resource Group using the tile’s Scope panel, so I can use properties from the objects in scope (like the name of the VM) to filter the query results.  The mustache template I am using is a pre-canned one that is available from the insert dynamic value button {{}} in the query panel and, whilst it looks quite scary, all it is doing is dumping the name of every single azure object in the scope into a comma-separated list.  This mustache template functionality is also available in Logs tiles created on SquaredUp perspectives.  The benefit of this is that it makes it possible to create a Logs tile on one VM with a KQL query that uses a dynamic mustache template for the computer name, and then that same tile will work for other VMs without any additional work because the computer name in the query will be dynamically replaced with the name of VM you’ve drilled down into.  Take a look at SquaredUp perspectives on our Knowledge Base for more information.

    Figure 2. The SquaredUp Scope panel

    You may have noticed that I am using the Kusto split function on the Computer field.  This is because the names of our VMs in Azure have a DNS suffix but in Log Analytics the Computer value is the NetBIOS name. To get them to match, I am splitting the fully qualified name into segments using the “.” character and only taking the first segment (the NetBIOS name).

  4. For the FreeSpaceMB metric, the Tags column contains a string that is in a JSON-like format. I initially used the todynamic() function to convert it to JSON, which worked.  However I noticed that the todynamic() pageactually states that if you want a single value from the JSON then extractjson() is more efficient.  The extractjson() page then told me that using a Regular Expression is even more efficient.  Personally, I dislike RegEx, and it is worth demonstrating that there are a few different ways to get to the information, so I have left the use of extractjson() in this particular example.  The output from this function gives us the partition e.g C: or D: and assigns it to a new Instance column. 

    Whilst my puny intellect is terrified of RegEx, I have rather painfully worked out a RegEx alternative for line 4 of the query using the extract() function anyway, because it sounds like it is the preferred way of doing things in this case:

    | extend Instance = extract("(\":\")(.*)(\",\")", 2, Tags)

    What this line is saying is “Match this pattern in the Tags field => “:”[any characters]”,” .  Split it into 3 capture groups (denoted by the parentheses in the RegEx string), then extract the 2nd capture group.”  Looking at figure 3 below, this 2nd capture group would be the disk partition name.   

    Figure 3.  An example of an InsightsMetrics Tags field
  5. The final use of extend is to create a new column which will contain the label values for our Line Graph. The strcat() function concatenates strings.  The result of this will be a column called ComputerInstance which combines the Computer and Instance columns to give values like Server1 | C: or Server2 | D:. 
  6. Finally, the summarize line will split all the rows from the query result into bins of 1 day using the TimeGenerated column value, and it will do this for each unique ComputerInstance value. For each of the resulting bins it will divide the counter value column (Val) by 1024 to turn each value into gigabytes and then obtain an average value.  The result is a per-partition time-series graph for free disk space with a nicely formatted label for each line:

    Figure 4.  The results of the Free Disk Space KQL query in a Logs Line Graph tile

    In this instance I did not set a timeframe in the KQL query.  This is because I set it to last 30 days in the tile configuration instead.  This makes it easy to change timeframes later.  However, if you do set a timeframe in the query, then you should set the tile timeframe to all otherwise your query results will be constrained by whatever timeframe is set in the tile.  It is also worth pointing out that the bin(TimeGenerated, 1d) line in the query results in a single average value for an entire day.  I am asking for data for the last 30 days, so having it return an average value  for, say, every 5-minute period rather than every day would result in quite a bit of data being pulled down from Log Analytics.

    Figure 5.  The timeframe panel in a SquaredUp Logs tile

    This example is a relatively complex query because of the nature of the FreeSpaceMB metric in the InsightsMetrics table.  However, simple queries can pull back useful information too.  For example, the CPU Line Graph below uses this query to obtain its data:

    InsightsMetrics
    | where Namespace == "Processor"
    | where Name == "UtilizationPercentage"
    | summarize avg(Val) by Computer, bin(TimeGenerated, 5m)

    Figure 6. A SquaredUp Logs Line Graph tile showing CPU usage from a simple KQL query

    The Logs Donut tile can also be useful to display Insights data.  In this case I have created a column called DiskSpaceStatus.  Using the iff() function, If the value of the FreeSpacePercentage metric is less then or equal to 25% then the value returned for it will be “Low Disk Space” and if it is higher then it will be “Disk Space OK”. The first summarize line with the arg_max() function ensures I only get the latest record for each distinct Computer by selecting the row with the most recent TimeGenerated value (it will also include any other columns’ values from this row that I want – in this case I want DiskSpaceStatus) and the second summarize line then counts up the number of instances of the two possible DiskSpaceStatus values in the data that was passed to it from the previous summarize line.

    InsightsMetrics
    | where Namespace == "LogicalDisk" and Name == "FreeSpacePercentage"
    | where split(Computer, ".")[0] in ({{scope.map(item => '\"'+item.name+'\"').join(',')}})
    | extend Instance = extract("(\":\")(.*)(\")", 2, Tags)
    | extend ComputerInstance = strcat(split(Computer, ".")[0], " | ", Instance)
    | extend DiskSpaceStatus = iff(round(Val) <= 25, "Low Disk Space", "Disk Space OK")
    | summarize arg_max(TimeGenerated, DiskSpaceStatus) by ComputerInstance
    | summarize count() by DiskSpaceStatus

    Figure 7. A SquaredUp Donut tile using InsightsMetrics data to show a Disk Space Summary

    We probably need the name of the partitions that have low disk space to do anything about it though, so the query could be modified slightly and added to a Logs Grid tile on the same dashboard as the Donut above:

    InsightsMetrics
    | where Namespace == "LogicalDisk" and Name == "FreeSpacePercentage"
    | where split(Computer, ".")[0] in ({{scope.map(item => '\"'+item.name+'\"').join(',')}})
    | extend Instance = extract("(\":\")(.*)(\")", 2, Tags)
    | extend ComputerInstance = strcat(split(Computer, ".")[0], " | ", Instance)
    | extend DiskSpaceStatus = iff(round(Val) <= 25, "Low Disk Space", "Disk Space OK")
    | summarize arg_max(TimeGenerated, DiskSpaceStatus, round(Val)) by ComputerInstance

    Figure 8. A Donut and Grid tile showing summarised and detailed disk space information using KQL

    To be successful with the Logs Line Graph, Logs Donut and Logs Bar tiles it is important to be aware of what data they are expecting:

    Line Graph:  your query should return at least 2 columns – one being a DateTime value and the other being a numeric value (int, long, real).  In scenarios where the line graph will have more than a single line, you also need a String column for use as the label value for each line.

    Donut and Bar: your query should return at least 2 columns – one being a String column for the legend labels and the other being a numeric value (int, long, real).

    For the Logs Grid tile your query simply needs to return one or more columns, and the Logs Scalar tile requires a single numeric value.

    Hopefully, this has been a useful demonstration of how to extract and display InsightsMetrics data in SquaredUp.  As always, our Support and Community pages can lend a helping hand if needed!