AKS Monitoring Deep Dive — Part 2
In Part 1 of this series we looked at the native monitoring options that Azure Kubernetes Service (AKS) provides in combination with Log Analytics, Azure Monitor and Container Insights. We also saw how to enable the Azure Monitor integration and how to configure the Diagnostic settings.
The following diagram from docs.microsoft.com explains the various sources and streams of logs and metrics in an AKS setup pretty well:
It shows metrics and logs from the various layers of your AKS infrastructure: the operating system of your worker nodes, the Kubernetes service itself, and data from and about the containers and their workloads.
The built-in integration of AKS and Azure Monitor deploys a containerized version of the Log Analytics agent onto all worker nodes in your cluster. This agent then automatically collects metrics and logs and stores them in your Log Analytics workspace.
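To check that the agent on every node is actually reporting, a query along these lines lists the last time each node sent performance data. This is a sketch that assumes Container Insights writes node-level counters to the Perf table with ObjectName "K8SNode"; the one-hour window is an arbitrary choice:

```kql
// Last time each worker node reported node-level performance counters
// (assumes Container Insights node counters use ObjectName "K8SNode")
Perf
| where TimeGenerated > ago(1h)
| where ObjectName == "K8SNode"
| summarize LastReported = max(TimeGenerated) by Computer
| sort by Computer asc
```

A node that is missing from the result, or whose LastReported value is noticeably older than the others, is a hint that its agent is not sending data.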
Querying data in Log Analytics
Now that we have configured the initial integration and looked at the built-in dashboards, let us take a closer look at Log Analytics itself. How can we query data directly in Log Analytics?
On the left side you can see all the tables that show up in our Log Analytics workspace after enabling the Diagnostic settings (LogManagement) and Container Insights.
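Besides browsing the table list, you can ask the workspace which tables actually received data recently. A minimal sketch using the search operator and its built-in $table column; scanning every table can be slow on large workspaces, so keep the time window small:

```kql
// Number of records per table over the last 24 hours
search *
| where TimeGenerated > ago(24h)
| summarize Records = count() by $table
| sort by Records desc
```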
Our AKS diagnostic logs are stored in the AzureDiagnostics table (under LogManagement); platform metrics exported through the Diagnostic settings end up in the separate AzureMetrics table. Querying AzureDiagnostics shows the same categories we already saw while configuring the Diagnostic settings in Part 1 of this series.
Here is an example KQL (Kusto Query Language) query that shows all available categories and their number of events over the last 24 hours:
AzureDiagnostics
| where TimeGenerated > ago(24h)
| summarize count() by Category
| sort by Category
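To see how the event volume of each category develops over the day, the same data can be bucketed by time and rendered as a chart. A sketch; the one-hour bin size is an arbitrary choice:

```kql
// Events per diagnostic category in one-hour buckets, one series per category
AzureDiagnostics
| where TimeGenerated > ago(24h)
| summarize Events = count() by Category, bin(TimeGenerated, 1h)
| render timechart
```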
If you want to dig deeper into a specific category, for example kube-audit, you can build more advanced KQL queries, such as:
AzureDiagnostics
| where Category == "kube-audit"
| extend logs = parse_json(log_s)
| where logs.verb == "delete"
| project TimeGenerated, logs.kind, logs.level, logs.verb, logs
This returns a list of all delete operations (verb), including the other relevant details such as the user, source IPs and timestamps, which are contained in the logs column.
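Instead of projecting the whole JSON document, you can pull the interesting fields of the delete query into their own columns. This is a sketch that assumes the entries in log_s follow the Kubernetes audit.k8s.io/v1 Event schema (user.username, sourceIPs, objectRef):

```kql
// Who deleted what, and from where, in the last 24 hours
// (field names assume the Kubernetes audit Event schema)
AzureDiagnostics
| where Category == "kube-audit"
| where TimeGenerated > ago(24h)
| extend logs = parse_json(log_s)
| where tostring(logs.verb) == "delete"
| project TimeGenerated,
    User = tostring(logs.user.username),
    SourceIPs = tostring(logs.sourceIPs),
    Resource = tostring(logs.objectRef.resource),
    Namespace = tostring(logs.objectRef.namespace),
    Name = tostring(logs.objectRef.name)
| sort by TimeGenerated desc
```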
You can also use these queries to render charts. Here, for example, is the distribution of verbs used within the last 24 hours (admittedly not the most useful example):
AzureDiagnostics
| where Category == "kube-audit"
| where TimeGenerated > ago(24h)
| extend verb_ = tostring(parse_json(log_s).verb)
| summarize count() by verb_
| project verb_, count_
| render piechart
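A variation on the same idea shows which users or service accounts generate the most audit events. Again this assumes the user.username field of the Kubernetes audit Event schema; the limit of 10 is arbitrary:

```kql
// Top 10 identities by number of kube-audit events in the last 24 hours
AzureDiagnostics
| where Category == "kube-audit"
| where TimeGenerated > ago(24h)
| extend User = tostring(parse_json(log_s).user.username)
| summarize Events = count() by User
| top 10 by Events
| render barchart
```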
The same is true for Container Insights data stored in Log Analytics. As we saw at the beginning, Container Insights adds a couple more tables, such as KubeEvents, KubePodInventory and KubeNodeInventory.
To query for warnings in a specific namespace, for example kube-system, you can use:
KubeEvents
| where TimeGenerated > ago(24h)
| where Namespace == "kube-system"
| where KubeEventType == "Warning"
| project TimeGenerated, Namespace, Name, Reason, Message
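To get an overview rather than individual events, warnings can also be aggregated per namespace and reason across the whole cluster. A sketch that assumes the KubeEventType column, which Container Insights uses to mark an event as Normal or Warning:

```kql
// Most frequent warning reasons per namespace in the last 24 hours
KubeEvents
| where TimeGenerated > ago(24h)
| where KubeEventType == "Warning"
| summarize Events = count() by Namespace, Reason
| sort by Events desc
```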
These were of course only a few examples of what is stored in Log Analytics when you configure Diagnostic settings and Container Insights, and of how powerful KQL is for querying that data. You can find many more example queries in the Azure Monitor Community repository on GitHub.
In Part 3 of this series we will take a closer look at the additional capabilities that Prometheus support in Azure Monitor has to offer.