Ambassador Edge Stack collects many statistics internally, and makes it easy to
direct this information to a statistics and monitoring tool of your
choice. As an example, for a given service
usersvc, here are some
interesting statistics to investigate:
envoy.cluster.usersvc.upstream_rq_totalis the total number of requests that
usersvchas received via Ambassador Edge Stack. The rate of change of this value is one basic measure of service utilization, i.e. requests per second.
envoy.cluster.usersvc.upstream_rq_2xxis the total number of requests to which
usersvcresponded with an HTTP response indicating success. This value divided by the prior one, taken on an rolling window basis, represents the recent success rate of the service. There are corresponding
5xxcounters that can help clarify the nature of unsuccessful requests.
envoy.cluster.usersvc.upstream_rq_timeis a StatsD timer that tracks the latency in milliseconds of
usersvcfrom Ambassador Edge Stack's perspective. StatsD timers include information about means, standard deviations, and decile values.
stats_name element of every CRD that references a service (
TracingService) can override the name under which cluster statistics
are logged (
usersvc above). If not set, the default is the
service value, with non-alphanumeric characters replaced by
service: foowill just use
service: foo:8080will use
service: http://foo:8080will use
service: foo.othernamespacewill use
The last example is worth special mention: a resource in a different namespace than the one in which Ambassador Edge Stack is running will automatically be qualified with the namespace of the resource itself. So, for example, if Ambassador Edge Stack is running in the
ambassador namespace, and this
Mapping is present in the
apiVersion: getambassador.io/v3alpha1kind: Mappingmetadata:name: default-mappingnamespace: defaultspec:prefix: /default/service: default-service
service will be qualified to
default-service.default, so the
stats_name will be
default_service_default rather than simply
default_service. To change this behavior, set
There are several ways to get different statistics out of Ambassador Edge Stack:
:8877/metricsendpoint can be polled for aggregated statistics (in a Prometheus-compatible format). This is our recommended method.
- Ambassador Edge Stack can push Envoy statistics over the StatsD or DogStatsD protocol.
- Ambassador Edge Stack can push RateLimiting statistics over the StatsD protocol.
The Four Golden Signals are four generally-accepted metrics that are important to monitor for good information about service health:
The time it takes to service a request.
cluster.$name.upstream_rq_time is a histogram of time taken by individual requests, which can be an effective latency metric.
The amount of demand being placed on your system.
cluster.$name.upstream_rq_active is a gauge that shows the number of active outstanding requests, which can be a good proxy for traffic.
The number of failing requests. Some errors (e.g. a request succeeds, but gives the wrong answer) can only be detected by application-specific monitoring; however, many errors can be spotted simply by looking at the HTTP status code of requests.
cluster.$name.upstream_rq_5xx is a counter of HTTP
5xx responses, so monitoring it over time can show error rates. (Likewise,
The hardest metric to measure, saturation describes how much of the total capability of the system to respond to requests is being used. Fully measuring saturation often requires application-specific monitoring, but looking at the 99th percentile of latency over a short window - perhaps a a minute - can often give an early indication of saturation problems.