
How to Monitor Kapsule Cluster With Grafana Cloud - Part 1: Metrics

Welcome, cloud traveller! Take a seat and enjoy the wonderful story of how to monitor your Kapsule cluster with Grafana Cloud.

In older articles on this blog, I have already explored some ways to monitor my Kubernetes Kapsule cluster hosted at Scaleway. As I’m running a very small managed cluster, a full Prometheus installation consumes a significant share of its resources, so I was looking for an alternative.

I had already tested the deployment of the OpenTelemetry Collector Agent in combination with AWS Managed Prometheus, and that was working well. Nonetheless, I still had to deploy and configure Grafana for visualization. Furthermore, that solution only covers metric storage; it offers nothing for logs and traces.

With that in mind, I decided to give Grafana Cloud a try. It is a combination of the following managed services:

  • Grafana for dashboarding / visualization / alerting.
  • Prometheus at scale (Cortex).
  • Logs management (with Loki).
  • Distributed traces management (with Tempo).

All these components are managed by Grafana Labs and are available in a free (forever) tier.

In this article, I will focus on the metrics part. There will be 2 additional articles for logs and traces.

Grafana Cloud setup

The concept is very simple.

First you create a free tier account.

After a few seconds (one minute at most), your environment will be ready. You will be logged into a hosted Grafana instance that contains some additional content compared with the open source version.

Click onboarding -> walkthrough to display a page detailing the available integrations.

As I wanted to monitor my Kubernetes cluster, I chose the Kubernetes tile and then clicked the Configure service button at the bottom of the page.

Follow the 4 steps described on the page. Don’t forget to have a look at the scripts before executing them (we’re never cautious enough). These steps deploy a Grafana Agent in your cluster, configured to scrape metrics from cadvisor and the kubelet and send them to the Prometheus instance of your Grafana Cloud stack.

Note: the generated scripts contain the client id and secret used to connect to your Grafana Cloud instance, so handle them with care.

When the resources are deployed in your cluster, click the Test integration and finish installation button at the bottom of the page. It should take at most one minute before it succeeds and voilà… metrics are flowing into Grafana Cloud and you have some standard dashboards ready to use.
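If you want to double-check from the cluster side before clicking the test button, a couple of plain kubectl commands are enough (the deployment name comes from the generated manifests; adjust the namespace if you deployed it elsewhere):

# List the agent pod(s) created by the integration scripts
kubectl get pods | grep grafana-agent

# Tail the logs to spot authentication or scrape errors early
kubectl logs deployment/grafana-agent --tail=50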

Grafana Agent standard configuration for Kubernetes integration

Below you’ll find the content of the Grafana Agent configmap deployed by the Kubernetes integration.

server:
  http_listen_port: 12345
prometheus:
  wal_directory: /tmp/grafana-agent-wal
  global:
    scrape_interval: 1m
    external_labels:
      cluster: <some cluster name>
  configs:
  - name: integrations
    remote_write:
    - url: https://prometheus-blocks-prod-us-central1.grafana.net/api/prom/push
      basic_auth:
        username: <your Grafana Cloud user name>
        password: <your Grafana Cloud password>
    scrape_configs:
    - job_name: integrations/kubernetes/cadvisor
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
        - role: node
      metric_relabel_configs:
        - action: drop
          regex: container_([a-z_[]+);
          source_labels:
            - __name__
            - image
        - action: drop
          regex: container_(network_tcp_usage_total|network_udp_usage_total|tasks_state|cpu_load_average_10s)
          source_labels:
            - __name__
      relabel_configs:
        - replacement: kubernetes.default.svc.cluster.local:443
          target_label: __address__
        - regex: (.+)
          replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
          source_labels:
            - __meta_kubernetes_node_name
          target_label: __metrics_path__
      scheme: https
      tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: false
          server_name: kubernetes
    - job_name: integrations/kubernetes/kubelet
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
        - role: node
      relabel_configs:
        - replacement: kubernetes.default.svc.cluster.local:443
          target_label: __address__
        - regex: (.+)
          replacement: /api/v1/nodes/${1}/proxy/metrics
          source_labels:
            - __meta_kubernetes_node_name
          target_label: __metrics_path__
      scheme: https
      tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: false
          server_name: kubernetes
    
integrations:
  prometheus_remote_write:
  - url: https://prometheus-blocks-prod-us-central1.grafana.net/api/prom/push
    basic_auth:
      username: <your Grafana Cloud user name>
      password: <your Grafana Cloud password>

As you can see, only cadvisor and kubelet metrics are currently scraped every minute (as advertised in the documentation).

Warning: I modified the scrape interval so as not to generate too many active series and to stay within the limits of the free tier.

Let’s see if we can improve that.

Modifying the agent configuration

I wanted to add the following metrics:

  • Postgres metrics from the Postgres database deployed in my cluster.
  • Gitea metrics exposed by the service / pod.

The Grafana Agent configuration file supports the following types of configuration (a rough skeleton of the resulting file follows the list):

  • integrations (as configured by the default deployment): an integration bundles an embedded exporter together with the scraping of its metrics.
  • Prometheus-like configurations (if you are experienced with Prometheus or want to reuse some existing configurations)
  • Loki configurations (for log scraping)
  • Tempo configurations (for traces collection)
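To give you an idea of how these map onto the configuration file, here is a rough skeleton of the top-level sections (the exact key names depend on the agent version; the agent deployed by the integration uses the ones below):

server:
  http_listen_port: 12345
prometheus:       # Prometheus-style scrape configs and remote_write
  configs: []
loki:             # log scraping, covered in part 2
  configs: []
tempo:            # trace collection, covered in part 3
  configs: []
integrations:     # built-in exporters plus their scraping
  prometheus_remote_write: []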

Remark: I later found out that integrations are fine as a starting point to get up and running quickly with Grafana Cloud, but the best monitoring experience still comes from the Prometheus approach.

Scraping Postgres metrics

Luckily enough, there is an integration available to collect metrics from a Postgres instance.

From an older article, I already had a Postgres exporter deployed in my cluster. I decided to delete it to avoid wasting resources.

helm uninstall prometheus-postgres-exporter -n postgres
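A quick sanity check that nothing from the old exporter is left behind (everything lived in the postgres namespace in my case):

kubectl get all -n postgres | grep -i exporter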

Afterwards, I modified the Grafana Agent configuration file (in the configmap) to add the postgres_exporter integration to the integrations: section.

postgres_exporter:
  enabled: true
  data_source_names:
  - postgresql://<username>:<password>@postgres-svc.postgres.svc.cluster.local:5432/?sslmode=disable
  # Disables collection of metrics from pg_settings, which would otherwise generate too many series.
  disable_settings_metrics: true

I have disabled the gathering of the Postgres settings because I don’t plan to use these metrics in any dashboards and don’t want to “waste” free metrics for these.
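For context, the integrations: section of the configmap then looks roughly like this (same remote_write block as before, credentials elided):

integrations:
  prometheus_remote_write:
  - url: https://prometheus-blocks-prod-us-central1.grafana.net/api/prom/push
    basic_auth:
      username: <your Grafana Cloud user name>
      password: <your Grafana Cloud password>
  postgres_exporter:
    enabled: true
    data_source_names:
    - postgresql://<username>:<password>@postgres-svc.postgres.svc.cluster.local:5432/?sslmode=disable
    disable_settings_metrics: true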

Apply the new configmap and restart the grafana-agent pod.

# Restart the Grafana Agent after configmap change
kubectl rollout restart deployment/grafana-agent
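If you keep the agent configmap as a local manifest, re-applying it before the restart might look like this (the file name is just an assumption, use whatever you saved it as):

# Hypothetical file name; adjust to your local copy of the configmap manifest
kubectl apply -f grafana-agent-configmap.yaml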

After a few seconds, a new dashboard related to Postgres is available in Grafana Cloud and pg metrics are visible too.

Scraping Gitea metrics

This time, we will use a standard Prometheus configuration. In fact, we will enable a “standard” Kubernetes pod scraping configuration used in many other product deployments, plus some tweaks to avoid metrics overload. This configuration will automatically scrape metrics from pods that satisfy the following conditions:

  • the pod must have an annotation prometheus.io/scrape=true.
  • the pod must have an annotation grafana/cloud=true.
  • the pod must be running.

Add the following code to the prometheus -> configs section of the Grafana Agent configmap.

prometheus:
  wal_directory: /tmp/grafana-agent-wal
  global:
    scrape_interval: 1m
    external_labels:
      cluster: <some cluster name>
  configs:
  - name: kubernetes-pods
    remote_write:
    - url: https://prometheus-blocks-prod-us-central1.grafana.net/api/prom/push
      basic_auth:
        username: <your Grafana Cloud user name>
        password: <your Grafana Cloud password>
    scrape_configs:
    - job_name: prometheus/kubernetes/pods
      kubernetes_sd_configs:
        - role: pod
      relabel_configs:
        - action: keep
          regex: "true"
          source_labels:
            - __meta_kubernetes_pod_annotation_prometheus_io_scrape
        - action: replace
          regex: (https?)
          replacement: $1
          source_labels:
            - __meta_kubernetes_pod_annotation_prometheus_io_scheme
          target_label: __scheme__
        - action: replace
          regex: (.+)
          replacement: $1
          source_labels:
            - __meta_kubernetes_pod_annotation_prometheus_io_path
          target_label: __metrics_path__
        - action: replace
          regex: (.+?)(\:\d+)?;(\d+)
          replacement: $1:$3
          source_labels:
            - __address__
            - __meta_kubernetes_pod_annotation_prometheus_io_port
          target_label: __address__
        - action: keep
          regex: "true"
          source_labels:
            - __meta_kubernetes_pod_annotation_grafana_cloud
        - action: replace
          replacement: $1
          separator: /
          source_labels:
            - __meta_kubernetes_namespace
            - __meta_kubernetes_pod_label_name
          target_label: job
        - action: replace
          source_labels:
            - __meta_kubernetes_namespace
          target_label: namespace
        - action: replace
          source_labels:
            - __meta_kubernetes_pod_name
          target_label: pod
        - action: replace
          source_labels:
            - __meta_kubernetes_pod_container_name
          target_label: container
        - action: replace
          separator: ':'
          source_labels:
            - __meta_kubernetes_pod_name
            - __meta_kubernetes_pod_container_name
            - __meta_kubernetes_pod_container_port_name
          target_label: instance
        - action: labelmap
          regex: __meta_kubernetes_pod_annotation_prometheus_io_param_(.+)
          replacement: __param_$1
        - action: drop
          regex: Succeeded|Failed
          source_labels:
            - __meta_kubernetes_pod_phase

Restart the grafana-agent deployment.

kubectl rollout restart deployment/grafana-agent

Check that the Grafana Agent is restarting successfully.
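For example, the following commands (relying on the same deployment name as above) wait for the rollout to complete and show the fresh logs:

# Wait for the rollout to complete
kubectl rollout status deployment/grafana-agent

# Look for configuration errors in the freshly started pod
kubectl logs deployment/grafana-agent --tail=50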

As Gitea is deployed using a Helm chart, modify the gitea-values.yaml file to add the required annotations for scraping.

gitea:
  podAnnotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "3000"
    grafana/cloud: "true"

Remark: I combined this with an update of the Gitea version.

# Update Helm repositories
helm repo update

# Upgrade the Gitea installation
helm upgrade --install -f gitea-values.yaml gitea gitea-charts/gitea -n gitea
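Once the upgrade has rolled out, you can verify that the annotations actually landed on the new pod (plain kubectl, nothing chart-specific):

kubectl get pods -n gitea -o jsonpath='{.items[*].metadata.annotations}'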

The metrics should start flowing into your Grafana Cloud Prometheus instance shortly.

I can now build my own Gitea dashboard (or import an existing one).

Troubleshooting the Grafana Agent

Not everything will run fine on the first try, and there will certainly be times when you want to take a look at what the agent is doing. In addition to the documentation, you should be aware of the Grafana Agent API.

The API endpoints I found the most useful are:

GET /agent/api/v1/instances

and

GET /agent/api/v1/targets

The latter is particularly interesting when you want to understand what the agent is scraping and which labels are applied.

To run those API calls, simply configure a port-forward to the Grafana Agent pod on port 12345 and run curl from your workstation. Pipe the result through jq to receive a nicely formatted response.
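The port-forward itself is a single command, using the same deployment name as before:

kubectl port-forward deployment/grafana-agent 12345:12345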

curl -s http://localhost:12345/agent/api/v1/targets | jq

Billing and usage monitoring

As I’m planning to stay within the free tier of the service, it’s important to keep an eye on the current usage of the resources available in that tier. This is possible through a dedicated dashboard that is automatically deployed in your Grafana Cloud instance.

Go to Dashboards -> Manage. Type billing in the search bar and click on the Grafana Cloud Billing/Usage dashboard.

Conclusion

This is the end of the first part of monitoring your Scaleway Kapsule cluster with Grafana Cloud.

Next time we’ll see how to get logs from some containers and store them in Grafana Cloud’s Loki store.
