This tutorial was originally published on Datawire.io in 2017. As a result, some of the tools mentioned may no longer be actively maintained. Please join our Slack if you have any questions.

Monitoring Envoy and Ambassador on Kubernetes with the Prometheus Operator

In the Kubernetes ecosystem, one of the emerging themes is how applications can best take advantage of the various capabilities of Kubernetes. The Kubernetes community has also introduced new concepts such as Custom Resources to make it easier to build Kubernetes-native software.

In late 2016, CoreOS introduced the Operator pattern and released the Prometheus Operator as a working example of the pattern. The Prometheus Operator automatically creates and manages Prometheus monitoring instances.

The operator model is especially powerful for cloud-native organizations deploying multiple services. In this model, each team can deploy their own Prometheus instance as necessary, instead of relying on a central SRE team to implement monitoring.

Envoy, Ambassador, and Prometheus

In this tutorial, we'll show how the Prometheus Operator can be used to monitor an Envoy proxy deployed at the edge. Envoy is an open source L7 proxy. One of the (many) reasons for Envoy's growing popularity is its emphasis on observability. Envoy emits its statistics in statsd format.

Instead of using Envoy directly, we'll use Ambassador. Ambassador is a Kubernetes-native API Gateway built on Envoy. Similar to the Prometheus Operator, Ambassador configures and manages Envoy instances in Kubernetes, so that the end user doesn't need to do that work directly.

Prerequisites

This tutorial assumes you're running Kubernetes 1.8 or later, with RBAC enabled.

Note: If you're running on Google Kubernetes Engine, you'll need to grant cluster-admin privileges to the account that will be installing Prometheus and Ambassador. You can do this with the commands below:

Terminal
$ gcloud info | grep Account
Account: [username@example.org]
$ kubectl create clusterrolebinding my-cluster-admin-binding --clusterrole=cluster-admin --user=username@example.org
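
If you'd rather not copy the account name by hand, the two commands above can be chained. The helper below is a minimal sketch: `gcloud_account` is a name made up for this example, and it assumes `gcloud info` prints the account in the `Account: [name]` form shown above.

```shell
# Extract the account name from `gcloud info` output read on stdin.
# Assumes the "Account: [name]" line format shown in the terminal output above.
gcloud_account() {
  sed -n 's/^ *Account: \[\(.*\)].*$/\1/p'
}

# Demo on the sample line from this tutorial (no gcloud needed).
printf 'Account: [username@example.org]\n' | gcloud_account
```

On a real cluster, you could then run kubectl create clusterrolebinding my-cluster-admin-binding --clusterrole=cluster-admin --user="$(gcloud info | gcloud_account)".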

Deploy the Prometheus Operator

The Prometheus Operator runs as a Kubernetes Deployment, so we'll deploy it first, along with the ServiceAccount and RBAC rules it needs.

shell
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-operator
subjects:
- kind: ServiceAccount
  name: prometheus-operator
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus-operator
rules:
- apiGroups:
  - extensions
  resources:
  - thirdpartyresources
  verbs:
  - "*"
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - "*"
- apiGroups:
  - monitoring.coreos.com
  resources:
  - alertmanagers
  - prometheuses
  - servicemonitors
  verbs:
  - "*"
- apiGroups:
  - apps
  resources:
  - statefulsets
  verbs: ["*"]
- apiGroups: [""]
  resources:
  - configmaps
  - secrets
  verbs: ["*"]
- apiGroups: [""]
  resources:
  - pods
  verbs: ["list", "delete"]
- apiGroups: [""]
  resources:
  - services
  - endpoints
  verbs: ["get", "create", "update"]
- apiGroups: [""]
  resources:
  - nodes
  verbs: ["list", "watch"]
- apiGroups: [""]
  resources:
  - namespaces
  verbs: ["list"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-operator
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    k8s-app: prometheus-operator
  name: prometheus-operator
spec:
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: prometheus-operator
    spec:
      containers:
      - args:
        - --kubelet-service=kube-system/kubelet
        - --config-reloader-image=quay.io/coreos/configmap-reload:v0.0.1
        image: quay.io/coreos/prometheus-operator:v0.15.0
        name: prometheus-operator
        ports:
        - containerPort: 8080
          name: http
        resources:
          limits:
            cpu: 200m
            memory: 100Mi
          requests:
            cpu: 100m
            memory: 50Mi
      serviceAccountName: prometheus-operator

kubectl apply -f prom-operator.yaml

We'll also want to create an additional ServiceAccount, with its own ClusterRole and ClusterRoleBinding, for the actual Prometheus instances.

shell
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default

kubectl apply -f prom-rbac.yaml

The Operator functions as your virtual SRE. At all times, the Prometheus Operator ensures that you have a set of Prometheus servers running with the appropriate configuration.

Deploy Ambassador

Ambassador also functions as your virtual SRE. At all times, Ambassador ensures that you have a set of Envoy proxies running with the appropriate configuration.

We're going to deploy Ambassador into Kubernetes. On each Ambassador pod, we'll also deploy an additional container that runs the Prometheus statsd exporter. The exporter collects the statsd metrics that Envoy emits over UDP and exposes them over TCP in the Prometheus metrics format, so that Prometheus can scrape them.

shell
---
apiVersion: v1
kind: Service
metadata:
  labels:
    service: ambassador-admin
  name: ambassador-admin
spec:
  type: NodePort
  ports:
  - name: ambassador-admin
    port: 8877
    targetPort: 8877
  selector:
    service: ambassador
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: ambassador
rules:
- apiGroups: [""]
  resources:
  - services
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["create", "update", "patch", "get", "list", "watch"]
- apiGroups: [""]
  resources:
  - secrets
  verbs: ["get", "list", "watch"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ambassador
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: ambassador
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: ambassador
subjects:
- kind: ServiceAccount
  name: ambassador
  namespace: default
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: ambassador
spec:
  replicas: 1
  template:
    metadata:
      labels:
        service: ambassador
    spec:
      serviceAccountName: ambassador
      containers:
      - name: ambassador
        image: datawire/ambassador:0.21.0
        imagePullPolicy: Always
        resources:
          limits:
            cpu: 1
            memory: 400Mi
          requests:
            cpu: 200m
            memory: 100Mi
        env:
        - name: AMBASSADOR_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        livenessProbe:
          httpGet:
            path: /ambassador/v0/check_alive
            port: 8877
          initialDelaySeconds: 3
          periodSeconds: 3
        readinessProbe:
          httpGet:
            path: /ambassador/v0/check_ready
            port: 8877
          initialDelaySeconds: 3
          periodSeconds: 3
      - name: statsd-sink
        image: datawire/prom-statsd-exporter:0.6.0
      restartPolicy: Always

kubectl apply -f ambassador-rbac.yaml
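
To make the statsd-sink container's job more concrete, here is a rough sketch of the translation it performs. A statsd line has the form name:value|type, while Prometheus expects name value with no dots in the name. The function `statsd_to_prom` and the metric name are made up for illustration, and this is not the exporter's actual implementation: the real exporter also aggregates values over time and emits type metadata.

```shell
# Convert one statsd line ("name:value|type") into a Prometheus-style
# sample line ("name value"). Simplified: the real exporter also
# aggregates values and emits "# TYPE" metadata.
statsd_to_prom() {
  awk -F'[:|]' '{
    name = $1
    gsub(/[^a-zA-Z0-9_]/, "_", name)   # dots and similar become underscores
    print name, $2
  }'
}

# Hypothetical metric name, for illustration only.
printf 'envoy.cluster.httpbin.upstream_rq_total:1|c\n' | statsd_to_prom
```

The dotted statsd name comes out as a single underscore-separated Prometheus metric name, which is roughly the shape of the metric names you will later see in the Prometheus UI.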

Ambassador is typically deployed as an API Gateway at the edge of your network. We'll create a Kubernetes service that routes external traffic to the Ambassador deployment. Note: if you're not on AWS or GKE, you'll need to update the service below to be a NodePort instead of a LoadBalancer.

shell
---
apiVersion: v1
kind: Service
metadata:
  labels:
    service: ambassador
  name: ambassador
spec:
  type: LoadBalancer
  ports:
  - name: ambassador
    port: 80
    targetPort: 80
  selector:
    service: ambassador

kubectl apply -f ambassador.yaml

You should now have a working Ambassador that is accessible from outside your cluster, with a StatsD/Prometheus exporter running alongside it in each pod.

Configure Prometheus

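We now have Ambassador/Envoy running, along with the Prometheus Operator. How do we hook this all together? Logically, the metrics data flows from Envoy to Prometheus in the following way: Envoy sends statsd metrics to the StatsD exporter, a Kubernetes service points to the exporter, a ServiceMonitor registers that service as a target, and Prometheus scrapes the target.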

So far, we've deployed Envoy and the StatsD exporter, so now it's time to deploy the other components of this flow.

We'll first create a Kubernetes service that points to the StatsD exporter. We'll then create a ServiceMonitor that tells Prometheus to add the service as a target.

shell
---
apiVersion: v1
kind: Service
metadata:
  name: ambassador-monitor
  labels:
    service: ambassador-monitor
spec:
  selector:
    service: ambassador
  type: ClusterIP
  clusterIP: None
  ports:
  - name: prometheus-metrics
    port: 9102
    targetPort: 9102
    protocol: TCP
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ambassador-monitor
  labels:
    ambassador: monitoring
spec:
  selector:
    matchLabels:
      service: ambassador-monitor
  endpoints:
  - port: prometheus-metrics

kubectl apply -f statsd-sink-svc.yaml

Next, we need to tell the Prometheus Operator to create a Prometheus cluster for us. The Prometheus cluster is configured to collect data from any ServiceMonitor with the ambassador:monitoring label.

shell
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      ambassador: monitoring
  resources:
    requests:
      memory: 400Mi

kubectl apply -f prometheus.yaml
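
The whole setup hangs on three label matches: the ambassador-monitor service selects the Ambassador pods, the ServiceMonitor selects that service, and Prometheus selects the ServiceMonitor. If any one of them is misspelled, no metrics arrive. The sketch below spells the chain out; `has_label` is a hypothetical helper written for this example, and it only handles simple key=value equality, not the full Kubernetes selector syntax.

```shell
# Succeed when the comma-separated label list in $2 (the format printed by
# `kubectl get ... --show-labels`) contains the key=value pair in $1.
has_label() {
  case ",$2," in
    *",$1,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# The three links in the chain, using the labels from the manifests above.
# The pod-template-hash value is made up for illustration.
has_label "service=ambassador" "pod-template-hash=12345,service=ambassador" \
  && echo "Service ambassador-monitor selects the Ambassador pods"
has_label "service=ambassador-monitor" "service=ambassador-monitor" \
  && echo "ServiceMonitor selects the ambassador-monitor Service"
has_label "ambassador=monitoring" "ambassador=monitoring" \
  && echo "Prometheus selects the ServiceMonitor"
```

On a live cluster, kubectl get pods --show-labels and kubectl get servicemonitors -l ambassador=monitoring let you check the same matches against the real objects.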

Finally, we can create a service to expose Prometheus to the rest of the world. The manifest below uses a NodePort, which works on any cluster; on AWS or GKE, you can change the type to LoadBalancer to get an external IP address.

shell
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  type: NodePort
  ports:
  - name: web
    port: 9090
    protocol: TCP
    targetPort: web
  selector:
    prometheus: prometheus

kubectl apply -f prom-svc.yaml

Testing

We've configured Prometheus to monitor Envoy, so let's test it out. First, get the external IP address for Prometheus.

Terminal
$ kubectl get services
NAME                  CLUSTER-IP      EXTERNAL-IP      PORT(S)          AGE
ambassador            10.11.255.93    35.221.115.102   80:32079/TCP     3h
ambassador-admin      10.11.246.117   <nodes>          8877:30366/TCP   3h
ambassador-monitor    None            <none>           9102/TCP         3h
kubernetes            10.11.240.1     <none>           443/TCP          3h
prometheus            10.11.254.180   35.191.39.173    9090:32134/TCP   3h
prometheus-operated   None            <none>           9090/TCP         3h

In the example above, this is 35.191.39.173. Now, go to http://$PROM_IP:9090, substituting that address for $PROM_IP, to see the Prometheus UI. You should see a number of metrics automatically populate in Prometheus.
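
If you want to script this step, the address can be pulled out of the service listing instead of read by eye. This is a small sketch, not the only way to do it; `external_ip` is a name invented for this example, and the sample rows are copied from the output above.

```shell
# Print the EXTERNAL-IP column for the service named $1, reading
# `kubectl get services` output on stdin. The column is located by its
# header name rather than by position.
external_ip() {
  awk -v svc="$1" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "EXTERNAL-IP") col = i; next }
    $1 == svc { print $col }
  '
}

# Demo using rows from the sample output in this tutorial.
sample='NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ambassador 10.11.255.93 35.221.115.102 80:32079/TCP 3h
prometheus 10.11.254.180 35.191.39.173 9090:32134/TCP 3h'
PROM_IP=$(printf '%s\n' "$sample" | external_ip prometheus)
echo "http://$PROM_IP:9090"
```

On a live cluster, the equivalent would be PROM_IP=$(kubectl get services | external_ip prometheus). Looking the column up by its header keeps the function working if your kubectl version prints additional columns.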

Troubleshooting

If the above doesn't work, there are a few things to investigate:

  • Make sure all your pods are running (kubectl get pods)
  • Check the logs on the Prometheus cluster (kubectl logs $PROM_POD -c prometheus, where $PROM_POD is the name of the Prometheus pod)
  • Check Ambassador diagnostics to verify Ambassador is working correctly
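
For the first check, a small filter can pick out the pods that need attention. The following is a sketch under the assumption that kubectl get pods prints a header row with a STATUS column; `not_running` and the pod names in the demo are made up for illustration.

```shell
# List pods whose STATUS column is not "Running", reading
# `kubectl get pods` output on stdin.
not_running() {
  awk '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "STATUS") col = i; next }
    $col != "Running" { print $1, $col }
  '
}

# Demo with made-up pod names and statuses.
sample='NAME READY STATUS RESTARTS AGE
ambassador-5c8d9b7f4-abcde 2/2 Running 0 3h
prometheus-prometheus-0 1/2 CrashLoopBackOff 5 3h'
printf '%s\n' "$sample" | not_running
```

On a live cluster, run kubectl get pods | not_running; an empty result means every pod reports Running, and any pod it does print is a good candidate for the log check in the second bullet.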

Get a service running in Envoy

The metrics so far haven't been very interesting, since we haven't routed any traffic through Envoy. We'll use Ambassador to set up a route from Envoy to the httpbin service. Ambassador is configured using Kubernetes annotations, so we'll do that here.

shell
apiVersion: v1
kind: Service
metadata:
  name: httpbin
  annotations:
    getambassador.io/config: |
      ---
      apiVersion: ambassador/v0
      kind: Mapping
      name: httpbin_mapping
      prefix: /httpbin/
      service: httpbin.org:80
      host_rewrite: httpbin.org
spec:
  ports:
  - port: 80

kubectl apply -f httpbin.yaml

Now, if we get the external IP address of Ambassador, we can route requests through Ambassador to the httpbin service:

Terminal
$ kubectl get services
NAME                  CLUSTER-IP      EXTERNAL-IP      PORT(S)          AGE
ambassador            10.11.255.93    35.221.115.102   80:32079/TCP     3h
ambassador-admin      10.11.246.117   <nodes>          8877:30366/TCP   3h
ambassador-monitor    None            <none>           9102/TCP         3h
kubernetes            10.11.240.1     <none>           443/TCP          3h
prometheus            10.11.254.180   35.191.39.173    9090:32134/TCP   3h
prometheus-operated   None            <none>           9090/TCP         3h
$ curl http://35.221.115.102/httpbin/ip
{
  "origin": "35.214.10.110"
}

Run a curl command a few times, as shown above. Going back to the Prometheus dashboard, you'll see that a bevy of new metrics that contain httpbin have appeared. Pick any of these metrics to explore further. For more information on Envoy stats, Matt Klein has written a detailed overview of Envoy's stats architecture. If you are interested in setting up a Grafana dashboard, Alex Gervais has published a sample Grafana/Ambassador dashboard.
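
You can also look for the new metrics from the command line, by filtering the exporter's text output for names that mention httpbin. This is only a sketch: `httpbin_metrics` is a helper invented for this example, and the metric names in the demo are hypothetical, so the names you actually see will depend on your Envoy and exporter versions.

```shell
# Keep only sample lines that mention httpbin, dropping the
# "# HELP" and "# TYPE" comment lines, from Prometheus text-format input.
httpbin_metrics() {
  grep -v '^#' | grep 'httpbin'
}

# Demo with hypothetical metric names.
sample='# TYPE envoy_cluster_httpbin_upstream_rq_total counter
envoy_cluster_httpbin_upstream_rq_total 5
envoy_server_uptime 120'
printf '%s\n' "$sample" | httpbin_metrics
```

Assuming you have forwarded the exporter's port to your machine (for example with kubectl port-forward against an Ambassador pod and port 9102), curl -s http://localhost:9102/metrics | httpbin_metrics would apply the same filter to the live data.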

Conclusion

Microservices, as you know, are distributed systems. The key to scaling distributed systems is creating loose coupling between each of the components. In a microservices architecture, the most painful source of coupling is actually organizational and not architectural. Design patterns such as the Prometheus Operator enable teams to be more self-sufficient, and reduce organizational coupling, enabling teams to code faster.

Next Steps