Greetings, this is the third and last article in the series describing how to monitor your Kubernetes cluster with the Grafana Cloud free tier.
In part one, I described how to set up a free tier Grafana Cloud account and how to install the Grafana Agent to collect and store metrics.
In part two, I configured alerts and notifications to a Slack channel.
This last post describes how to collect logs from the containers running in your cluster and store them in the Loki instance included in your Grafana Cloud free tier account. Remember that the free tier includes 50 GB of logs; after that, you pay for the additional data you store.
Grafana Agent or Promtail?
The collection of logs for Loki normally happens through the Promtail agent, but the Grafana Agent provides the same capability. Unfortunately, the Grafana Agent for metrics that was set up with the Kubernetes integration is managed through a Kubernetes Deployment object, while an agent that scrapes logs must run as a DaemonSet so that it can reach the log files on every node. You therefore have to deploy a second agent dedicated to logs. I decided to stick with the Grafana Agent, but this is definitely something Grafana Labs should improve or clarify in the future.
Some insights about this situation can be found in this post on the Grafana Cloud community forum.
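If you want to see the difference for yourself, you can list the grafana-agent workloads in your cluster; once this article's daemonset is deployed you will see both forms side by side (object names may differ depending on your setup):
# The metrics agent runs as a Deployment, the logs agent as a DaemonSet
kubectl get deployments,daemonsets -A | grep grafana-agent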
Create Grafana Cloud API key
Log in to the Grafana Cloud portal with your Grafana Cloud account. Select the organization for which you want to create a new API key; if you're using a free tier account, you should only have one available.
Select API keys from the SECURITY section on the left, then click +Add API key.
Give it a meaningful name and assign the role MetricsPublisher.
Copy the value of the API key (it is only displayed once, at creation) and store it safely in a password manager.
Click Close.
Configure the Loki datasource
Still in the Grafana Cloud portal, go back to the home page and click the Send Logs button in the Loki tile.
Keep the page open and log in to your Grafana Cloud instance in another tab of your browser.
Follow the instructions to add the Loki datasource.
PS: I did not have to do this because the setup done in the first article already took care of it.
Deploy and configure a new Grafana Agent for logs
Get the sample agent-loki.yaml Kubernetes manifest from the Grafana Agent GitHub repository.
Replace all occurrences of YOUR_NAMESPACE with the name of the namespace where you want to deploy the daemonset.
Apply the manifest by running kubectl apply -f agent-loki.yaml.
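You can already check that the daemonset exists (add -n <namespace> if you did not deploy it to the default namespace):
# The daemonset schedules one agent pod per node
kubectl get daemonset grafana-agent-logs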
Now, as this manifest does not contain a configmap with the configuration of the agent, the container will not start.
Create the configmap by using (and tweaking if needed) the sample configmap available in the Grafana Agent documentation.
Tip: you can find some good explanations about what the scrape configs do in the jsonnet templates inside the tanka folder of the Grafana Agent GitHub repository. For log scraping configs, have a look at this file.
The standard configmap is pretty extensive, as it scrapes the logs of the pods that fulfill one of the following requirements:
- the pod has a name label
- the pod has an app label
- the pod has a direct controller like a StatefulSet
- the pod has an indirect controller like a Deployment that creates ReplicaSets that create pods
- the pod is part of the static control-plane pods
Using this scraping configuration could be risky if you wish to stay within the limits of the free tier (max 50 GB of log data). I personally use a restricted version of it that only scrapes pods that have a label app.kubernetes.io/name and an annotation logging=true.
apiVersion: v1
data:
  agent.yaml: |
    loki:
      configs:
      - clients:
        - url: <LOKI_ENDPOINT>
          basic_auth:
            username: <USERNAME>
            password: <API KEY>
        name: default
        scrape_configs:
        - job_name: kubernetes-pods-logging
          kubernetes_sd_configs:
          - role: pod
          pipeline_stages:
          - docker: {}
          relabel_configs:
          - source_labels:
            - __meta_kubernetes_pod_label_app_kubernetes_io_name
            target_label: __service__
          - source_labels:
            - __meta_kubernetes_pod_node_name
            target_label: __host__
          - action: drop
            regex: ""
            source_labels:
            - __service__
          - action: keep
            regex: "true"
            source_labels:
            - __meta_kubernetes_pod_annotation_logging
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - action: replace
            replacement: $1
            separator: /
            source_labels:
            - __meta_kubernetes_namespace
            - __service__
            target_label: job
          - action: replace
            source_labels:
            - __meta_kubernetes_namespace
            target_label: namespace
          - action: replace
            source_labels:
            - __meta_kubernetes_pod_name
            target_label: pod
          - action: replace
            source_labels:
            - __meta_kubernetes_pod_container_name
            target_label: container
          - replacement: /var/log/pods/*$1/*.log
            separator: /
            source_labels:
            - __meta_kubernetes_pod_uid
            - __meta_kubernetes_pod_container_name
            target_label: __path__
      positions_directory: /tmp/positions
    server:
      http_listen_port: 8080
      log_level: info
kind: ConfigMap
metadata:
  name: grafana-agent-logs
  namespace: default
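The three placeholders come from the Send Logs page opened earlier and from the API key created above: <LOKI_ENDPOINT> is the push URL of your hosted Loki instance, <USERNAME> is your numeric Loki user ID and <API KEY> is the MetricsPublisher key. As an illustration only (the exact endpoint depends on the region of your Grafana Cloud stack), the clients section could look like this:
clients:
- url: https://logs-prod-us-central1.grafana.net/loki/api/v1/push
  basic_auth:
    username: 123456      # numeric Loki user ID shown in the portal
    password: <API KEY>   # the key created in the first section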
Apply the configmap and restart the grafana-agent-logs daemonset:
# Apply configmap
kubectl apply -f agent-loki-configmap.yaml
# Restart grafana-agent-logs daemonset
kubectl rollout restart ds/grafana-agent-logs
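You can then wait for the restart to complete and check that the agent pods are healthy (again assuming the default namespace used in the configmap above):
# Wait until the daemonset rollout is finished
kubectl rollout status ds/grafana-agent-logs
# All grafana-agent-logs pods should be Running
kubectl get pods | grep grafana-agent-logs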
Configure a pod for logging
Check which pods in your cluster already have the label app.kubernetes.io/name:
kubectl get pods -A -l app.kubernetes.io/name
NAMESPACE NAME READY STATUS RESTARTS AGE
blog seblab-blog-deployment-848b7dd499-jddjl 1/1 Running 0 45h
gitea gitea-0 1/1 Running 0 25d
kube-system nginx-ingress-hb4n4 1/1 Running 1 246d
Select nginx-ingress (it generates a fair amount of logs) and add the annotation logging=true to its pod.
kubectl annotate pods nginx-ingress-hb4n4 -n kube-system logging=true
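If you want to confirm the annotation, or later stop shipping these logs, the following two commands are enough:
# Verify the annotation was applied
kubectl get pod nginx-ingress-hb4n4 -n kube-system -o jsonpath='{.metadata.annotations.logging}'
# Remove the annotation to stop scraping this pod
kubectl annotate pods nginx-ingress-hb4n4 -n kube-system logging-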
Check whether the grafana-agent-logs pod hosted on the same node as the nginx-ingress pod starts to scrape the logs of nginx-ingress.
# Check logs of the agent to see if the configuration is working
kubectl logs ds/grafana-agent-logs
ts=2021-08-30T06:33:57.279112071Z level=info msg="server configuration changed, restarting server"
ts=2021-08-30T06:33:57.279893392Z level=info caller=server.go:245 http=[::]:8080 grpc=[::]:9095 msg="server listening on addresses"
ts=2021-08-30T06:35:37.276091151Z level=info component=loki component=loki loki_config=default msg="Adding target" key="{app_kubernetes_io_name=\"ingress-nginx\", app_kubernetes_io_part_of=\"ingress-nginx\", container=\"nginx-ingress-controller\", controller_revision_hash=\"5b97cd775d\", job=\"kube-system/ingress-nginx\", logging=\"true\", namespace=\"kube-system\", pod=\"nginx-ingress-hb4n4\", pod_template_generation=\"1\"}"
Visualize logs in Grafana Cloud
The last step is to check that the logs are visible in the Grafana UI. Log in to your Grafana Cloud instance and choose Explore.
Select the Loki datasource and enter the following query (the relabeling turned the dots and slashes of the pod label into underscores):
{app_kubernetes_io_name="ingress-nginx"}
If everything is fine, you should receive some logs in Grafana.
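From there you can refine the query with the usual LogQL operators. For example, to keep only the lines containing "error" from the ingress controller (label values taken from the target shown in the agent log above):
{app_kubernetes_io_name="ingress-nginx"} |= "error"
Or to display everything scraped from the kube-system namespace:
{namespace="kube-system"}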
Conclusion
This post concludes the series dedicated to setting up the Grafana Cloud free tier to monitor the metrics and logs of a Kubernetes cluster (here, a cluster hosted at Scaleway).
I like the services provided by Grafana Cloud. It is still maturing, as we have seen with the Grafana Agent deployment patterns, and the integrations are a nice idea to quickly onboard enthusiasts who do not want to dive into the intricacies of Prometheus relabeling. It also lets you avoid deploying a full monitoring stack inside your cluster, keeping its compute power for your applications only. I suppose it all depends on the level of enthusiasm you want to dedicate to the management of your cluster.
Take care and see you soon.