Kusto 101 – An introductory KQL guide
This blog post is for anyone needing a jumpstart into the world of Kusto. (For those more advanced, look at using Kusto to aggregate and visualize data).
Perhaps you’ve heard about Kusto and are just curious. Maybe you’re just starting to use Azure Monitor for your application monitoring. You might even be getting skilled up in anticipation of the new SquaredUp for Azure release that will have KQL at its heart.
Whatever your reason, set aside the next 10 minutes and we'll get you up to speed with KQL.
Ready? Let's go!
What is Kusto and what is KQL?
KQL stands for Kusto Query Language. It’s the language used to query the Azure log databases: Azure Monitor Logs, Azure Monitor Application Insights and others.
You won't be using Kusto databases for your ERP or CRM, but they’re perfect for massive amounts of streamed data like application logs.
Kusto was the original codename for the Azure Application Insights platform that Azure Monitor is now based on. If you're wondering where the name comes from, it's named after Jacques Cousteau – a French undersea explorer – and you’ll see some cheeky references to Jacques in the Kusto documentation.
KQL sounds like it's related to SQL – is it?
It sure is! SQL, as you probably know, stands for Structured Query Language and is used to query traditional relational databases like Microsoft SQL Server.
The languages look similar too.
If you know what this line of SQL does:
SELECT * FROM ConferenceSessions WHERE conference = 'Ignite 2018'
then you'll find KQL pretty easy to pick up.
Kusto databases do actually support SQL queries. But that just wouldn’t be as much fun as learning KQL...
So can you give me a simple KQL example?
OK, the SQL query above would look like this in KQL:
ConferenceSessions
| where conference == 'Ignite 2018'
That | character looks very familiar – is that PowerShell?
Aha! Microsoft aligning all products on a single language! Unfortunately not.
Although it looks similar, KQL isn't PowerShell. And it's not the M query language you use in PowerBI either. But that '|' character (called a pipe) is very much the same thing as in PowerShell: it passes (or 'pipes') the data through into the next command.
Using pipes makes it much easier to grasp what’s going on when you read a query, and it helps structure things when you're authoring a query.
So the query above takes all the rows in ConferenceSessions and pipes them through to the where statement, which filters to the rows where conference is 'Ignite 2018'.
You can chain multiple statements together by piping the data through to further statements:
ConferenceSessions
| where conference == 'Ignite 2018'
| order by session_title
So it's like SQL with pipes. That’s easy. Are we done?
Patience, young padawan.
For a start, you might have noticed that equality operators use double ==.
conference == 'Ignite 2018'
But that’s not all.
Case-insensitive comparisons on strings use =~.
conference =~ 'ignite 2018'
And string contains (case-insensitive) can be done with contains:
conference contains 'ignite'
DateTimes are different and more powerful too. For example you can refer to d, h, m, and s for days, hours, minutes and seconds, and use them in queries:
starttime > now(-7d)
You can see that KQL is quite expressive and there is a lot more to learn. Don’t worry, though, you’re well on your way to becoming a KQL Jedi!
OK, what other shortcuts can I take to learn Kusto Query Language?
If you're familiar with SQL or PowerShell, there are other similarities that you can use.
Let’s take a look at the KQL keywords count, project and extend. These are three very useful keywords you’ll use often.
I can guess what count is used for. How do I use it?
You guessed right, the keyword count gives you the count of rows. It's like SUM in SQL and measure.Count() in PowerShell.
To use it, simply pipe your data into the count statement.
So this SQL:
SELECT SUM(*) FROM ConferenceSessions
Or this PowerShell:
Get-ConferenceSessions | measure
Becomes this KQL:
ConferenceSessions
| count
How about project? I’ve not heard of that before.
The keyword project is like SELECT in SQL and select in PowerShell. It allows you to pick the columns (or properties) you're interested in.
Like with other keywords, you pipe the data into the project statement to select the columns.
So this SQL:
SELECT session_title, level FROM ConferenceSessions
Or this PowerShell:
Get-ConferenceSessions | select session_title, level
Becomes this KQL:
ConferenceSessions
| project session_title, level
Can I calculate new columns based on other values?
If you need to calculate columns, use extend. It adds new columns to your data.
ConferenceSessions
| extend duration_hours = duration / 60.0
| project session_title, level, duration_hours
That's all pretty intuitive. What about sorting?
The KQL keyword sort by is straightforward to use. It's like ORDER BY in SQL and sort in PowerShell:
ConferenceSessions
| sort by starttime
If you want to truncate your results, the keyword take is like TOP in SQL and select-first in PowerShell:
ConferenceSessions
| take 10
Often, you'll want to combine sort by and take to give you the top N results. KQL has the top command especially for this:
ConferenceSessions
| top 10 by starttime
It's also worth knowing that KQL commands have aliases (i.e. there can be more than one word for the same command). For example, the keywords top and limit do the same thing.
How about joining tables? Not inner, outer, cross – just a basic join.
No surprises here. It's much the same as in SQL:
ConferenceSessions
| join ConferenceStats on sessionid
This is all very familiar. Is there anything in KQL that's totally different?
There are two commands that are absolutely essential if you’re exploring the data interactively: summarize and render.
The keyword summarize is closest to GROUP BY in SQL. It is typically used to count the number of rows in different categories. It works by grouping together rows using some comparison, and then performing an aggregation.
For example, if we want to count how many sessions are at each conference:
ConferenceSessions
| summarize count() by conference
And the render keyword is a real 'wow!' command if you’re using a KQL client like Azure Data Explorer:
ConferenceSessions
| summarize count() by conference
| render piechart
Ta da!
Awesome! This Azure Data Explorer, where do I find it?
Ah! I had been keeping this from you until now.
Azure Data Explorer is like a Jedi's lightsabre. It can do extraordinary things but only in the hands of someone who has the right powers.
After getting this far, you should have the right KQL powers!
https://dataexplorer.azure.com/clusters/help/databases/Samples
You can paste in any of the examples we’ve looked at so far and see them for real (apart from the join example).
Top tip: use shift-enter to execute a query.
Kapow! What about trying this out on some real data?
The KQL client in the Azure portal is very similar to the Azure Data Explorer.
Try browsing to a Virtual Machine in the Azure portal and select Logs under the Monitoring section.
Then try the following KQL queries. These should be familiar to you now.
SecurityEvent
| count
SecurityEvent
| take 1
SecurityEvent
| summarize count() by Activity
Perf
| where ObjectName == "Processor"
| project TimeGenerated, CounterValue
| render timechart
These should get you started, but try out your own queries!
Thanks. I'm really starting to get the hang of KQL now. What more can you teach me?
We also have a KQL guide on aggregating and visualizing data with Kusto. Learn how to use aggregation functions, visualize query results, put your data into context, and more!
As always, share your questions and tips on SquaredUp's Community Answers - let's join forces and achieve everything we could ever wish for!
Good luck, young KQL Jedi!