Alerting Routes
This guide assumes you already have a basic understanding of the Prometheus Operator and have gone through the Getting Started guide. We’re also expecting you to know how to run an Alertmanager instance.
In this guide, we’ll explore the various methods for managing Alertmanager configurations within your Kubernetes cluster.
Prometheus’ configuration also includes “rule files”, which contain the alerting rules. When an alerting rule fires, Prometheus sends the alert to all Alertmanager instances on every rule evaluation interval. The Alertmanager instances communicate with each other about which notifications have already been sent. For more information on this system design, see the High Availability page.
By default, the Alertmanager instances start with a minimal configuration that isn’t useful because it doesn’t send any notifications when receiving alerts.
You have several options to provide the Alertmanager configuration:
- Using a native Alertmanager configuration file stored in a Kubernetes Secret.
- Using spec.alertmanagerConfiguration to reference an AlertmanagerConfig object in the same namespace which defines the main Alertmanager configuration.
- Using spec.alertmanagerConfigSelector and spec.alertmanagerConfigNamespaceSelector to tell the operator which AlertmanagerConfig objects should be selected and merged with the main Alertmanager configuration.
Using a Kubernetes Secret
The following native Alertmanager configuration sends notifications to a fictitious webhook service:
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'webhook'
receivers:
- name: 'webhook'
  webhook_configs:
  - url: 'http://example.com/'
Save the above configuration in a file called alertmanager.yaml in the local directory and create a Secret from it:
kubectl create secret generic alertmanager-example --from-file=alertmanager.yaml
The Prometheus Operator requires the Secret to be named like alertmanager-{ALERTMANAGER_NAME}. In the previous example, the name of the Alertmanager is example, so the Secret name must be alertmanager-example.
The name of the key holding the configuration data in the Secret has to be alertmanager.yaml.
Note: if you want to use a different Secret name, you can specify it with the spec.configSecret field in the Alertmanager resource.
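As a minimal sketch of the note above, the following Alertmanager resource points at a Secret with a non-default name. The name my-alertmanager-config is a hypothetical placeholder that does not appear elsewhere in this guide. The Secret is still expected to hold the configuration under the alertmanager.yaml key.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: example
spec:
  replicas: 3
  # Hypothetical Secret name; replace with the Secret you actually created.
  configSecret: my-alertmanager-config
```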
The Alertmanager configuration may reference custom templates or password files on disk. These can be added to the Secret along with the alertmanager.yaml configuration file. For example, provided that we have the following Secret:
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-example
data:
  alertmanager.yaml: {BASE64_CONFIG}
  template_1.tmpl: {BASE64_TEMPLATE_1}
  template_2.tmpl: {BASE64_TEMPLATE_2}
Templates will be accessible to the Alertmanager container under the /etc/alertmanager/config directory. The Alertmanager configuration can reference them like this:
templates:
- '/etc/alertmanager/config/*.tmpl'
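If base64-encoding each file by hand is inconvenient, the same Secret can be sketched with the standard Kubernetes stringData field, which accepts plain text. The template name example.title and its body are hypothetical and only show where a template file would go; no receiver in this sketch references it.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-example
stringData:
  # Main Alertmanager configuration, stored under the required key.
  alertmanager.yaml: |
    route:
      receiver: 'webhook'
    receivers:
    - name: 'webhook'
      webhook_configs:
      - url: 'http://example.com/'
    templates:
    - '/etc/alertmanager/config/*.tmpl'
  # Hypothetical template file, mounted next to alertmanager.yaml.
  template_1.tmpl: |
    {{ define "example.title" }}[{{ .Status }}] {{ .CommonLabels.alertname }}{{ end }}
```

Kubernetes encodes stringData values into data when the Secret is stored, so the operator sees the same keys as in the base64 variant.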
Using AlertmanagerConfig Resources
The following example configuration creates an AlertmanagerConfig resource that sends notifications to a fictitious webhook service.
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: config-example
  labels:
    alertmanagerConfig: example
spec:
  route:
    groupBy: ['job']
    groupWait: 30s
    groupInterval: 5m
    repeatInterval: 12h
    receiver: 'webhook'
  receivers:
  - name: 'webhook'
    webhookConfigs:
    - url: 'http://example.com/'
Create the AlertmanagerConfig resource in your cluster:
curl -sL https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/example/user-guides/alerting/alertmanager-config-example.yaml | kubectl create -f -
The spec.alertmanagerConfigSelector field in the Alertmanager resource needs to be updated so the operator selects AlertmanagerConfig resources. In the previous example, the label alertmanagerConfig: example is added, so the Alertmanager object should be updated like this:
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: example
spec:
  replicas: 3
  alertmanagerConfigSelector:
    matchLabels:
      alertmanagerConfig: example
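The spec.alertmanagerConfigNamespaceSelector field mentioned at the start of this guide can be combined with the label selector. The sketch below assumes that the relevant namespaces carry a team: frontend label, which is a hypothetical label chosen for illustration. To our understanding, when the namespace selector is left unset, only the Alertmanager's own namespace is searched.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: example
spec:
  replicas: 3
  # Select AlertmanagerConfig objects carrying this label...
  alertmanagerConfigSelector:
    matchLabels:
      alertmanagerConfig: example
  # ...but only from namespaces with this (hypothetical) label.
  alertmanagerConfigNamespaceSelector:
    matchLabels:
      team: frontend
```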
Using AlertmanagerConfig for global configuration
The following example configuration creates an Alertmanager resource that takes its main configuration from an AlertmanagerConfig resource instead of the alertmanager-example Secret.
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: example
  namespace: default
spec:
  replicas: 3
  alertmanagerConfiguration:
    name: config-example
The AlertmanagerConfig resource named config-example in namespace default will be a global AlertmanagerConfig. When the operator generates the Alertmanager configuration from it, the namespace label will not be enforced for routes and inhibition rules.
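To illustrate what namespace enforcement means, the sketch below approximates the route the operator might generate for a non-global AlertmanagerConfig named config-example in the default namespace. The receiver naming and the exact set of generated fields are assumptions and may differ between operator versions, so treat this as an illustration and not as real operator output.

```yaml
# Approximate generated configuration for a NON-global AlertmanagerConfig.
route:
  receiver: 'null'
  routes:
  # Assumed naming scheme: <namespace>/<name>/<receiver>.
  - receiver: 'default/config-example/webhook'
    group_by: ['job']
    matchers:
    # Enforced matcher: only alerts labelled with this namespace are routed.
    - 'namespace="default"'
    continue: true
receivers:
- name: 'null'
- name: 'default/config-example/webhook'
  webhook_configs:
  - url: 'http://example.com/'
```

For a global AlertmanagerConfig, the namespace matcher is not added, so its routes can match alerts from any namespace.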
Deploying Prometheus Rules
The PrometheusRule CRD allows you to define alerting and recording rules. The operator knows which PrometheusRule objects to select for a given Prometheus based on the spec.ruleSelector field.
Note: by default, spec.ruleSelector is nil, meaning that the operator picks up no rules.
By default, the Prometheus resource discovers only PrometheusRule resources in the same namespace. This can be refined with the ruleNamespaceSelector field:
- To discover rules from all namespaces, pass an empty dict (ruleNamespaceSelector: {}).
- To discover rules from all namespaces matching a certain label, use the matchLabels field.
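As a minimal sketch of the first option, the following Prometheus resource picks up matching rules from every namespace. The rule label role: alert-rules mirrors the example used elsewhere in this guide; the remaining fields are trimmed for brevity and would need to be completed for a real deployment.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example
spec:
  serviceAccountName: prometheus
  # Only PrometheusRule objects with this label are selected...
  ruleSelector:
    matchLabels:
      role: alert-rules
  # ...and the empty selector means "look in all namespaces".
  ruleNamespaceSelector: {}
```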
The following example discovers PrometheusRule resources with the role=alert-rules and prometheus=example labels from all namespaces with the team=frontend label:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example
spec:
  serviceAccountName: prometheus
  replicas: 2
  alerting:
    alertmanagers:
    - namespace: default
      name: alertmanager-example
      port: web
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  ruleSelector:
    matchLabels:
      role: alert-rules
      prometheus: example
  ruleNamespaceSelector:
    matchLabels:
      team: frontend
If you want to select individual namespaces by name, you can use the kubernetes.io/metadata.name label, which gets populated automatically with the NamespaceDefaultLabelName feature gate.
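Selecting namespaces by name could look like the following fragment of the Prometheus spec. The namespace names monitoring and frontend are hypothetical, and the fragment assumes the kubernetes.io/metadata.name label is present on the namespaces in your cluster.

```yaml
# Fragment of a Prometheus resource; goes under "spec:".
ruleNamespaceSelector:
  matchExpressions:
  - key: kubernetes.io/metadata.name
    operator: In
    values:
    - monitoring   # hypothetical namespace name
    - frontend     # hypothetical namespace name
```

A matchExpressions entry with the In operator matches any namespace whose label value appears in the list, so more names can be added without further selectors.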
Create a PrometheusRule object from the following manifest. Note that the object’s labels match the spec.ruleSelector of the Prometheus object.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: example
    role: alert-rules
  name: prometheus-example-rules
spec:
  groups:
  - name: ./example.rules
    rules:
    - alert: ExampleAlert
      expr: vector(1)
For demonstration purposes, the PrometheusRule object always fires the ExampleAlert alert. To validate that everything is working properly, you can open the Prometheus web interface again and go to the Alerts page.
Next, open the Alertmanager web interface and check that it shows one active alert.
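Once the always-firing demonstration alert has been verified, a rule closer to real use might look like the sketch below. The object name, group name, alert name, severity label and threshold are all hypothetical choices; only the labels under metadata are taken from this guide, so that the rule is still matched by the spec.ruleSelector shown earlier.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: example
    role: alert-rules
  name: prometheus-example-target-rules   # hypothetical name
spec:
  groups:
  - name: example-targets.rules           # hypothetical group name
    rules:
    - alert: ExampleTargetDown            # hypothetical alert name
      # "up" is 0 when Prometheus fails to scrape a target.
      expr: up == 0
      # Only fire after the condition has held for 5 minutes.
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: 'Target {{ $labels.instance }} of job {{ $labels.job }} is down'
```

Because the resulting alerts carry a job label, they would be grouped by the groupBy: ['job'] setting used in the routing examples above.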