AKS Monitoring Deep Dive — Part 3

Heyko Oelrichs
Feb 12, 2021

In Part 3 of this series we are going to take a deeper look into the Prometheus support in Azure Monitor for Containers. If you would like to learn more about Azure Monitor and Log Analytics, please take a look at the previous parts of this series.

Before we start, what is Prometheus? Prometheus is a popular open-source monitoring and alerting toolkit and a graduated project of the Cloud Native Computing Foundation. Typically, Prometheus is set up as a dedicated server hosted on top of Kubernetes that scrapes and stores time series data.

Azure Monitor provides built-in support for collecting Prometheus metrics, without the need for a Prometheus server. You just need to expose the Prometheus metrics, which we will see in more detail below, and Azure Monitor for containers can scrape them for you and store them in Log Analytics.

The screenshot below, from docs.microsoft.com, shows you how the containerized Log Analytics agent scrapes the data from various endpoints.

Overview of Prometheus scraping in Azure Monitor

To enable and configure Prometheus scraping, we have to apply a Kubernetes configmap to our AKS cluster. The configmap is available here, and you will find a couple more details and notes on how to configure it here.

For the sake of time, we are applying this configmap directly from its location on GitHub:

kubectl apply -f https://aka.ms/container-azm-ms-agentconfig

Under normal circumstances you would download the file first, make your modifications locally and then apply it to your cluster.
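If you want to follow that route, the workflow could look roughly like this (a sketch; the filename is simply what the link above resolves to):

curl -L https://aka.ms/container-azm-ms-agentconfig -o container-azm-ms-agentconfig.yaml
# make your modifications, e.g. enable the Prometheus scrape settings
vi container-azm-ms-agentconfig.yaml
# apply the modified configmap to the cluster
kubectl apply -f container-azm-ms-agentconfig.yaml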

After applying the configmap, we are using kubectl edit to make our modifications. But let us first take a look at which endpoints are available to scrape metrics from.

Monitor the managed Kubernetes control plane

The Kubernetes apiserver is part of the Microsoft-managed control plane in Azure Kubernetes Service. As it is a managed, PaaS-like service, the responsibility for it is on the Microsoft side, but it might nevertheless be a good idea to monitor some metrics that are relevant for your workload.

The Kubernetes apiserver provides its metrics via an endpoint that is available within the cluster and that we can scrape via Azure Monitor.

Let us take a look at this endpoint first, before we configure Azure Monitor to scrape the metrics for us. We are accessing the endpoint from within the cluster, using one of the Log Analytics agent pods:

kubectl get pods -n kube-system | grep oms
<.. list of omsagent pods ..>
kubectl exec -n kube-system -it omsagent-hr279 -- /bin/bash

Accessing the endpoint is only possible via HTTPS (with a self-signed certificate, so we need to allow insecure connections) and with authentication via a bearer token. The required bearer token is already available in our OMSAgent pod.

TOKEN=`cat /var/run/secrets/kubernetes.io/serviceaccount/token`
curl https://kubernetes.default.svc/metrics -k -H "Authorization: Bearer ${TOKEN}"

This will now return a long list of available metrics:

curl returning the metrics provided by the apiserver
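If you are only interested in a specific metric family, you can of course filter the output. A quick sketch, using apiserver_request_total as just one example of a metric recent Kubernetes versions expose:

# filter the scrape output for a single metric family
curl -s https://kubernetes.default.svc/metrics -k -H "Authorization: Bearer ${TOKEN}" | grep ^apiserver_request_total | head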

To now enable the OMSAgent (aka the Log Analytics agent) to scrape this endpoint, we have to modify the configmap we applied earlier:

kubectl edit cm/container-azm-ms-agentconfig -n kube-system

Further down in the configmap, we have a section called [prometheus_data_collection_settings.cluster]. In this section we define cluster-level scrape endpoints. Look for kubernetes_services, add the endpoint URL and make sure that the line is not commented out.

kubernetes_services = ["https://kubernetes.default.svc/metrics"]
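For context, the surrounding part of the configmap then looks roughly like this. This is a sketch based on the default configmap; the interval shown is the default value, adjust it to your needs:

[prometheus_data_collection_settings.cluster]
    # scrape interval for cluster-level endpoints
    interval = "1m"
    # in-cluster Kubernetes services to scrape, here the managed apiserver
    kubernetes_services = ["https://kubernetes.default.svc/metrics"]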

Saving your changes will automatically update the OMSAgents, and Azure Monitor will start to scrape metrics from the apiserver’s metrics endpoint. A couple of minutes later, you should see new metrics in Log Analytics:

InsightsMetrics
| where TimeGenerated > ago(24h)
| extend scrapeUrl = tostring(parse_json(Tags).scrapeUrl)
| where scrapeUrl == "https://kubernetes.default.svc/metrics"
| summarize count() by Name

And we will see that we have around 120 new metrics from the kubernetes.default.svc/metrics endpoint available in the InsightsMetrics table in Log Analytics.

List of metrics from the apiserver available in Log Analytics
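From here you can also look at individual metrics. A hypothetical follow-up query, assuming you are interested in the apiserver_request_total counter mentioned above; the Prometheus labels end up in the Tags column:

InsightsMetrics
| where TimeGenerated > ago(1h)
| where Name == "apiserver_request_total"
| project TimeGenerated, Name, Val, Tags
| take 10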

Gathering metrics from Kubelet

The same thing we have seen for the apiserver in the previous step is true for the kubelet. The kubelet is the primary “node agent” that runs on each node in a Kubernetes cluster.

As the kubelet is not a cluster-wide service like the apiserver, there is a kubelet metrics endpoint on each node in our cluster. Here is an example of how to retrieve data from one of them. Look up the Internal-IP of one of your cluster nodes:

kubectl get nodes -o wide
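If you prefer to grab the Internal-IP directly, a jsonpath query works as well (a sketch; the index 0 simply picks the first node in the list):

# print the InternalIP address of the first node
kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}'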

Exec again into one of the OMSAgent pods in the kube-system namespace:

kubectl exec -n kube-system -it omsagent-hr279 -- /bin/bash

Load the bearer token and curl the metrics endpoint of one of the kubelets:

TOKEN=`cat /var/run/secrets/kubernetes.io/serviceaccount/token`
curl https://10.1.4.4:10250/metrics -k -H "Authorization: Bearer ${TOKEN}"

The result is pretty similar to what we have seen with the apiserver; the metrics themselves are slightly different. To let the OMSAgent scrape the metrics from all our kubelets, we have to modify our configmap again:

kubectl edit cm/container-azm-ms-agentconfig -n kube-system

This time we are looking for the [prometheus_data_collection_settings.node] section, which contains the node-level scrape endpoints. For our kubelets, a single URL using $NODE_IP is enough to let the OMSAgent scrape this endpoint on all nodes in our cluster:

urls = ["https://$NODE_IP:10250/metrics"]
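As before, the surrounding part of the configmap then looks roughly like this (a sketch based on the default configmap; interval and port are the defaults):

[prometheus_data_collection_settings.node]
    # scrape interval for node-level endpoints
    interval = "1m"
    # $NODE_IP is resolved per node, so every kubelet gets scraped
    urls = ["https://$NODE_IP:10250/metrics"]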

Getting more advanced infrastructure metrics

Last but not least on the infrastructure side, there might be cases where we need more advanced and detailed metrics for the nodes in our cluster. To gather this data, there is the prometheus-node-exporter, a pod that is deployed as a DaemonSet to all our nodes.

Go to the prometheus/node_exporter GitHub repository to learn more about it. It is also available as a helm chart for deployments on a Kubernetes cluster here.
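If you go the Helm route, the installation could look roughly like this (a sketch; release name and namespace are arbitrary choices):

# add the prometheus-community chart repository and install the node exporter as a DaemonSet
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install node-exporter prometheus-community/prometheus-node-exporter --namespace monitoring --create-namespace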

The node exporter can be set up either using NodePort (like the kubelet) or using ClusterIP (like the apiserver). Depending on that choice, you have to add it to either the .node or the .cluster section of your configmap; an example for both variants follows below.
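Assuming the defaults of the Helm chart above (the exporter listening on port 9100 on each node, plus a ClusterIP service whose name is derived from the release name; both are assumptions, not fixed values), the two variants could look roughly like this in the configmap:

# node-level scraping: the node exporter's port 9100 on every node
urls = ["https://$NODE_IP:10250/metrics", "http://$NODE_IP:9100/metrics"]

# cluster-level alternative: scrape via the exporter's ClusterIP service
# (replace service name and namespace with the ones from your deployment)
kubernetes_services = ["https://kubernetes.default.svc/metrics", "http://node-exporter-prometheus-node-exporter.monitoring.svc:9100/metrics"]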

Before we close Part 3 of this series, one last piece of advice: when storing data in Log Analytics, make sure that you configure proper retention times and perhaps also a daily cap to keep costs under control. See Manage usage and costs for Azure Monitor Logs — Azure Monitor | Microsoft Docs for more information.
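Both settings can also be configured via the Azure CLI. A sketch, assuming a 30-day retention and a 5 GB daily cap; resource group and workspace name are placeholders, and if the --quota parameter is not available in your CLI version, the daily cap can be set in the portal instead:

# set the data retention to 30 days and cap daily ingestion at 5 GB
az monitor log-analytics workspace update \
  --resource-group myResourceGroup \
  --workspace-name myWorkspace \
  --retention-time 30 \
  --quota 5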
