
Use OpenTelemetry Collector to send metrics to Amazon Managed Service for Prometheus

Today I refine the configuration described in my past article, where I tested sending metrics from a non-AWS-hosted Kubernetes cluster to Amazon Managed Service for Prometheus (aka AMP). At that time, I set up a Prometheus instance in the Kubernetes cluster and configured it to send metrics to AMP through a remote write configuration.

Here, I set up the OpenTelemetry Collector to replace the Prometheus instance. The idea is to use a lighter and more versatile process to collect metrics and send them to a remote backend. The OpenTelemetry project offers all the components needed to create and manage telemetry data such as metrics, traces and logs. It is a CNCF sandbox project, vendor agnostic, and being adopted by numerous cloud and observability services (Datadog, New Relic, AWS, Azure, etc.).

The architecture

diagram 01

I will deploy the OpenTelemetry Collector inside the postgres namespace to scrape metrics from the postgres-exporter pod and send them to an AMP workspace.

Set up the AMP workspace

Run the following Terraform code to deploy an AMP workspace and an IAM user allowed to query it and write to it.

terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      version = "3.37.0"
    }
  }
}

provider "aws" {
  # Configuration options
  region = "eu-west-1"
}

resource "aws_prometheus_workspace" "amp" {
  alias = "otel-test"
}

# Create a user for access to AMP workspace
resource "aws_iam_user" "amp-user" {
  name = "amp-user-tf"

  tags = {
    terraform = "true"
  }
}

# Create an access key and a secret
resource "aws_iam_access_key" "amp" {
  user = aws_iam_user.amp-user.name
}

# Attach some standard policies to the user to query and ingest data
# This could be split between different users for more security
resource "aws_iam_user_policy_attachment" "amp-attach-qaccess" {
  user       = aws_iam_user.amp-user.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonPrometheusQueryAccess"
}

resource "aws_iam_user_policy_attachment" "amp-attach-remotewriteaccess" {
  user       = aws_iam_user.amp-user.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess"
}

# Output some data
output "AMP_endpoint" {
  value = "AMP endpoint: ${aws_prometheus_workspace.amp.prometheus_endpoint}"
}

output "AMP_keys" {
  value = "Access Key: ${aws_iam_access_key.amp.id} / Secret key: ${aws_iam_access_key.amp.secret}"
}

Take note of the AMP endpoint and of the new user's access key ID and secret access key. These are needed later on.
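
A minimal way to apply this code and read the outputs back later, assuming it is saved in a .tf file in the current directory:

terraform init
terraform apply

# Re-print the outputs whenever needed
terraform output AMP_endpoint
terraform output AMP_keys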

Deploy OpenTelemetry Collector in Kubernetes

The first thing to do is to choose a deployment model for the OpenTelemetry Collector. Several models are possible, and the right choice depends on how the collector will be used.

To help with the choice, have a look at the deployment documentation of the OpenTelemetry Collector.

Besides that documentation, also take into account the receiver type to use. As I will only use the Prometheus receiver, I need to take its limitations into account, which are:

  • The collector cannot yet auto-scale the scraping when multiple replicas of the collector are run.
  • When running multiple replicas of the collector with the same config, it will scrape the targets multiple times.
  • Users need to configure each replica with a different scraping configuration if they want to manually shard the scraping.
  • The Prometheus receiver is a stateful component.

To avoid having metrics scraped multiple times, I will deploy the OpenTelemetry Collector as a Kubernetes deployment with only one replica inside the namespace containing the resources to monitor (postgres).

The next thing to take care of is the choice of the OpenTelemetry Collector image to deploy. There are indeed multiple official container images of the collector. In this case, I use the opentelemetry-collector-contrib image, as it contains contributor-maintained components like the awsprometheusremotewrite exporter, which makes it possible to send metrics from the collector to an AMP workspace without the AWS SigV4 proxy sidecar container.

Reminder: all requests sent to an AMP workspace must be signed with AWS SigV4! The standard OpenTelemetry Collector ships a Prometheus remote write exporter, but it cannot sign its requests with AWS SigV4.
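
To see this requirement in action, an unsigned request to the workspace is simply rejected. A quick check with curl, using the AMP endpoint from the Terraform output (expect an HTTP 403 in return):

# Unsigned request: AMP answers with a 403 error
curl -s -o /dev/null -w "%{http_code}\n" \
  "<your AMP endpoint>api/v1/query?query=up"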

Let’s create the stuff now!

Create a service account otel-collector-sa in namespace postgres.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: otel-collector-sa
  namespace: postgres

Create a cluster role otel-collector-cr that allows access to pod information for scraping.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app: opentelemetry
    component: otel-collector
  name: otel-collector-cr
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses
  verbs:
  - get
  - list
  - watch
- nonResourceURLs:
  - /metrics
  - /metrics/cadvisor
  verbs:
  - get

Finally create a cluster role binding to bind the role otel-collector-cr to the service account otel-collector-sa in namespace postgres.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app: opentelemetry
    component: otel-collector
  name: otel-collector-crb
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: otel-collector-cr
subjects:
- kind: ServiceAccount
  name: otel-collector-sa
  namespace: postgres
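
Assuming the three manifests above are saved in separate files (the file names below are just an example), apply them with kubectl:

kubectl apply -f otel-collector-sa.yaml
kubectl apply -f otel-collector-cr.yaml
kubectl apply -f otel-collector-crb.yaml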

Without those three objects, I got the following error in the logs of the collector:

E0427 11:14:59.157773       1 reflector.go:138] 
pkg/mod/k8s.io/client-go@v…/tools/cache/reflector.go:167: 
Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: 
User "system:serviceaccount:postgres:default" cannot list resource 
"pods" in API group "" at the cluster scope

Create a Kubernetes Secret otel-collector-aws-creds in namespace postgres that contains your AWS credentials.

kubectl create secret generic otel-collector-aws-creds \
  --from-literal=AWS_ACCESS_KEY_ID=<your AWS Access Key> \
  --from-literal=AWS_SECRET_ACCESS_KEY=<your AWS Secret Key> \
  -n postgres

The secret's keys are exposed to the collector as environment variables (see the envFrom section of the deployment below) so that they can be picked up by the awsprometheusremotewrite exporter. This exporter uses the AWS default credential provider chain to find AWS credentials.
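
A quick sanity check that the secret exists and holds both keys (without displaying their values):

kubectl describe secret otel-collector-aws-creds -n postgres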

Create the OpenTelemetry Collector’s ConfigMap.

apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-conf
  namespace: postgres
  labels:
    app: opentelemetry
    component: otel-collector-conf
data:
  config.yaml: |-
    receivers:
      prometheus:
        config:
          scrape_configs:

            - job_name: 'otel-collector'
              scrape_interval: 5s
              static_configs:
                - targets: ['0.0.0.0:8888']

            - job_name: kubernetes-pods
              kubernetes_sd_configs:
              - role: pod
                namespaces:
                  names:
                    - postgres
              relabel_configs:
              - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
                regex: true
                action: keep
              - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
                action: replace
                regex: (https?)
                target_label: __scheme__
              - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
                action: replace
                target_label: __metrics_path__
                regex: (.+)
              # Use the prometheus.io/port annotation to build the scrape address
              - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
                action: replace
                regex: ([^:]+)(?::\d+)?;(\d+)
                # Note: some collector versions expand environment variables in the
                # config, in which case the $ signs must be escaped as $$1:$$2
                replacement: $1:$2
                target_label: __address__
              - action: labelmap
                regex: __meta_kubernetes_pod_label_(.+)     
              - source_labels: [__meta_kubernetes_namespace]
                action: replace
                target_label: kubernetes_namespace
              - source_labels: [__meta_kubernetes_pod_name]
                action: replace
                target_label: kubernetes_pod_name
              - source_labels: [__meta_kubernetes_pod_phase]
                regex: Pending|Succeeded|Failed
                action: drop

    exporters:
      awsprometheusremotewrite:
        retry_on_failure:
          enabled: true
          initial_interval: 10s
          max_interval: 60s
          max_elapsed_time: 10m
        endpoint: "https://aps-workspaces.eu-west-1.amazonaws.com/workspaces/ws-54770de3-1591-4782-8b7f-fe867e858c79/api/v1/remote_write"
        aws_auth:
          service: "aps"
          region: "eu-west-1"

    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          exporters: [awsprometheusremotewrite]    

The configuration of the OpenTelemetry Collector is very simple.

The receivers: part describes the entry points for data. Here I only defined one receiver of type prometheus. This receiver accepts the same configuration syntax as a standard Prometheus instance, which makes it very easy to configure.

There are 2 scrape jobs configured.

  • Job otel-collector scrapes metrics from the OpenTelemetry Collector itself.
  • Job kubernetes-pods scrapes metrics from pods that have the annotation prometheus.io/scrape = true.

The following annotations are used by the kubernetes-pods scrape job to control the scraping; a usage example follows the list.

  • prometheus.io/scrape: if set to "true", the pod will be scraped by the OpenTelemetry Collector.
  • prometheus.io/path: used to specify a path different from /metrics to scrape metrics.
  • prometheus.io/port: used to specify the port to scrape metrics on.
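
As an illustration only: the label selector and the port below are assumptions (9187 is the usual postgres-exporter port), so adapt them to your own exporter. In practice it is better to put these annotations in the pod template of the exporter's Deployment so they survive pod restarts.

# Hypothetical example: mark the postgres-exporter pods for scraping
kubectl annotate pods -n postgres -l app=postgres-exporter \
  prometheus.io/scrape=true \
  prometheus.io/port=9187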

There is no processors: section in my configuration as I do not process data coming in through receivers.

The exporters: section defines where the collected data must be sent. In this case, I use a contributor exporter named awsprometheusremotewrite that is not present in the standard OpenTelemetry Collector (that's the reason why I used the otel/opentelemetry-collector-contrib container image). Apart from the endpoint parameter, you also need to specify the following section:

aws_auth:
  service: "aps"
  region: "eu-west-1"

Of course, the region must match the AWS region where the AMP workspace is deployed. Without these parameters, I received a 403 Forbidden error when sending metrics to the workspace.

I do not create a Kubernetes service because the OpenTelemetry Collector only scrapes metrics and does not receive pushed data. Keep this in mind if the collector is also used to collect logs and traces: in that case, other pods must be able to send data to it and a service becomes mandatory.
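
As a purely hypothetical sketch: if an OTLP receiver were added to the configuration, the collector could be exposed to other pods with a simple service, for example on the default OTLP gRPC port 4317.

# Only needed if the collector also receives pushed data (traces, logs, OTLP metrics)
kubectl expose deployment otel-collector -n postgres \
  --name=otel-collector \
  --port=4317 --target-port=4317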

Create a deployment in namespace postgres.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
  namespace: postgres
  labels:
    app: opentelemetry
    component: otel-collector
spec:
  replicas: 1
  selector:
    matchLabels:
      app: opentelemetry
      component: otel-collector
  template:
    metadata:
      labels:
        app: opentelemetry
        component: otel-collector
    spec:
      serviceAccountName: otel-collector-sa
      containers:
      - name: otel-collector
        image: otel/opentelemetry-collector-contrib:0.24.0
        envFrom:
          - secretRef:
              name: otel-collector-aws-creds
        ports:
        - containerPort: 8888     
        volumeMounts:
        - name: config
          mountPath: /etc/otel/config.yaml
          subPath: config.yaml
          readOnly: true
      volumes:
        - name: config
          configMap:
            name: otel-collector-conf
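
Apply the ConfigMap and the deployment (again, the file names are just an example):

kubectl apply -f otel-collector-configmap.yaml
kubectl apply -f otel-collector-deployment.yaml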

Check the deployment.

kubectl get all -n postgres --selector=app=opentelemetry
NAME                                  READY   STATUS    RESTARTS   AGE
pod/otel-collector-86cc7fb7d7-h4ngw   1/1     Running   0          110m

NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/otel-collector   1/1     1            1           110m

NAME                                        DESIRED   CURRENT   READY   AGE
replicaset.apps/otel-collector-86cc7fb7d7   1         1         1       110m
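
The collector also exposes its own telemetry on port 8888 (this is what the otel-collector scrape job reads). A quick way to take a look at it from the workstation:

kubectl port-forward deployment/otel-collector 8888:8888 -n postgres &
sleep 2 && curl -s http://localhost:8888/metrics | head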

Check the logs of the collector.

kubectl logs pod/otel-collector-86cc7fb7d7-h4ngw -n postgres
2021-04-28T12:34:33.064Z        info    service/application.go:280      Starting OpenTelemetry Contrib Collector...    {"Version": "v0.24.0", "GitHash": "f564404d", "NumCPU": 3}
2021-04-28T12:34:33.064Z        info    service/application.go:215      Setting up own telemetry...
2021-04-28T12:34:33.066Z        info    service/telemetry.go:98 Serving Prometheus metrics      {"address": ":8888", "level": 0, "service.instance.id": "d6c38367-1773-4c92-815d-db4a3fb48d2a"}
2021-04-28T12:34:33.066Z        info    service/application.go:250      Loading configuration...
2021-04-28T12:34:33.070Z        info    service/application.go:257      Applying configuration...
2021-04-28T12:34:33.071Z        info    builder/exporters_builder.go:274        Exporter was built.     {"kind": "exporter", "exporter": "awsprometheusremotewrite"}
2021-04-28T12:34:33.071Z        info    builder/pipelines_builder.go:204        Pipeline was built.     {"pipeline_name": "metrics", "pipeline_datatype": "metrics"}
2021-04-28T12:34:33.071Z        info    builder/receivers_builder.go:230        Receiver was built.     {"kind": "receiver", "name": "prometheus", "datatype": "metrics"}
2021-04-28T12:34:33.071Z        info    service/service.go:155  Starting extensions...
2021-04-28T12:34:33.071Z        info    service/service.go:200  Starting exporters...
2021-04-28T12:34:33.071Z        info    builder/exporters_builder.go:92 Exporter is starting... {"kind": "exporter", "name": "awsprometheusremotewrite"}
2021-04-28T12:34:33.071Z        info    builder/exporters_builder.go:97 Exporter started.       {"kind": "exporter", "name": "awsprometheusremotewrite"}
2021-04-28T12:34:33.071Z        info    service/service.go:205  Starting processors...
2021-04-28T12:34:33.071Z        info    builder/pipelines_builder.go:51 Pipeline is starting... {"pipeline_name": "metrics", "pipeline_datatype": "metrics"}
2021-04-28T12:34:33.071Z        info    builder/pipelines_builder.go:62 Pipeline is started.    {"pipeline_name": "metrics", "pipeline_datatype": "metrics"}
2021-04-28T12:34:33.071Z        info    service/service.go:210  Starting receivers...
2021-04-28T12:34:33.071Z        info    builder/receivers_builder.go:70 Receiver is starting... {"kind": "receiver", "name": "prometheus"}
2021-04-28T12:34:33.072Z        info    kubernetes/kubernetes.go:264    Using pod service account via in-cluster config{"kind": "receiver", "name": "prometheus", "discovery": "kubernetes"}
2021-04-28T12:34:33.085Z        info    builder/receivers_builder.go:75 Receiver started.       {"kind": "receiver", "name": "prometheus"}
2021-04-28T12:34:33.086Z        info    service/application.go:227      Everything is ready. Begin running and processing data.
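
Optionally, before setting up Grafana, the AMP query API can be queried directly to confirm that samples reach the workspace. A sketch assuming the awscurl tool is installed and the access keys created earlier are exported in the environment:

export AWS_ACCESS_KEY_ID=<your AWS Access Key>
export AWS_SECRET_ACCESS_KEY=<your AWS Secret Key>

# Signed query against the workspace query endpoint (the AMP endpoint from the Terraform output)
awscurl --service aps --region eu-west-1 \
  "<your AMP endpoint>api/v1/query?query=up"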

As everything seems all right, let's configure a Grafana instance to see if metrics are flowing into the AMP workspace.

Grafana setup

Warning: this Grafana setup is only meant for testing, not for production!

I started Grafana as a container on my workstation to quickly test the configuration.

docker run -d \
  -p 3000:3000 \
  --name=amp-grafana \
  -e "GF_AUTH_SIGV4_AUTH_ENABLED=true" \
  -e "AWS_SDK_LOAD_CONFIG=true"\
  grafana/grafana
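
Depending on how the data source is configured, the container itself may also need AWS credentials. One option, an assumption not covered in my past article, is to pass the access keys as environment variables:

# Hypothetical alternative: provide the AWS keys directly to the container
docker run -d \
  -p 3000:3000 \
  --name=amp-grafana \
  -e "GF_AUTH_SIGV4_AUTH_ENABLED=true" \
  -e "AWS_SDK_LOAD_CONFIG=true" \
  -e "AWS_ACCESS_KEY_ID=<your AWS Access Key>" \
  -e "AWS_SECRET_ACCESS_KEY=<your AWS Secret Key>" \
  grafana/grafana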

After setting up the data source (see my past article), I quickly played with a dashboard to visualize the OpenTelemetry Collector metrics.

fig 1

Conclusion

The OpenTelemetry Collector is a really nice alternative to deploying a Prometheus instance if you only use it as a “relay” to send metrics to a remote backend. It consumes fewer resources and accepts the same scraping configuration. It is vendor agnostic and makes it possible to change the remote backend without too much hassle. It is also very versatile, as it can be used to collect logs and traces as well, and it offers a solid base to set up observability in your Kubernetes cluster.
