ScrapeConfig CRD
This document aims at creating a lower-level ScrapeConfig Custom Resource Definition that defines additional scrape configurations the Kubernetes way.
Why
prometheus-operator lacks a way to scrape external targets using a CRD. Users have either been abusing the Probe CRD (#3447) or using `additionalScrapeConfigs` to do so. Multiple use cases have been reported:
- A user reported in the contributor office hours that their team serves Prometheus as a service to several teams across multiple regions. These teams have tens of thousands of exporters running outside Kubernetes. To scrape them, each team has to involve this user's team, which makes it a bottleneck. While there is CI in place, errors happen and can render Prometheus' configuration invalid. They would like a CRD so that each team can self-serve when adding scrape configurations.
- @auligh in #2787 reported needing a specific scrape configuration to scrape services running on the Kubernetes nodes that don't have a Service attached to them. This user wants to deploy these scrape configurations alongside the applications rather than in a centralized way with `additionalScrapeConfigs`.
- @bgagnon in #2787 mentions that `additionalScrapeConfigs` is error-prone because, over time, multiple unrelated scrape configurations end up bundled together.
Furthermore, there is currently a lot of code duplication because the operator supports several CRDs that generate scrape configurations. With the new ScrapeConfig CRD, it would be possible to consolidate some of that logic: the other *Monitor CRDs could be migrated so that they create a ScrapeConfig resource that the operator would ultimately use to generate the scrape configuration.
Pitfalls of the current solution
Using `additionalScrapeConfigs` comes with drawbacks:
- Teams have to build infrastructure to add scrape configurations in a centralized manner, which creates a bottleneck since a single team becomes responsible for the configuration
- There is no input validation, which can lead to an invalid Prometheus configuration
Goals
- Provide a way for users to self-service adding scrape targets
- Consolidate the scrape configuration generation logic in a central point for other resources to use
Audience
- Users who serve Prometheus as a service and want to have their customers autonomous in defining scrape configs
- Users who want to manage scrape configs the same way as for services running within the Kubernetes cluster
- Users who want a supported Kubernetes way of scraping targets outside the Kubernetes cluster
Non-Goals
- This proposal doesn't aim at covering all the fields in `<scrape_config>`. Specifically, it focuses first on `static_configs`, `file_sd_configs` and `http_sd_configs`.
- Refactoring of the other CRDs is not in scope for the first version
How
As described by @aulig in #2787, we will create a new ScrapeConfig CRD. This new CRD will act the same as the other CRDs and append scrape configurations to the Prometheus configuration. Usage of ScrapeConfig doesn't exclude the use of the other CRDs; they are not mutually exclusive. ScrapeConfig will allow for any scraping configuration, while the other CRDs provide sane defaults. This will allow for isolated testing of the new ScrapeConfig CRD.
```mermaid
graph TD;
    ServiceMonitor --> prometheusConfig
    PodMonitor --> prometheusConfig
    ScrapeConfig --> prometheusConfig
    prometheusConfig
```
Using a pseudo custom resource definition, we should have the following:
```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: my-scrape-config
  namespace: system-monitoring
  labels:
    test: value
spec:
  staticConfigs:
  - <staticConfig>[] # new resource
  fileSDConfigs:
  - <fileSDConfig>[] # new resource
  httpSDConfigs:
  - <httpSDConfig>[] # new resource
  relabelings: # relabel_configs
  - <RelabelConfig>[] # https://github.com/prometheus-operator/prometheus-operator/blob/e4e27052f57040f073c6c1e4aedaecaaec77d170/pkg/apis/monitoring/v1/types.go#L1150
  metricsPath: /metrics
```
with the following new resources:
`staticConfig`:

```yaml
targets:
- target:9100
labels:
  labelA: placeholder
```
`fileSDConfig`:

```yaml
files:
# Files here are strings referencing files that already exist in the Prometheus Pod. prometheus-operator is not
# responsible for these SD files. The operator should use Prometheus.ConfigMaps to mount these files in the pods and
# have them usable by ScrapeConfig. No validation of the content of the SD files is expected from prometheus-operator.
- /etc/prometheus/configmaps/inventory/file.json
refreshInterval: 5m
```
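As a reference for the SD files above, the Prometheus CRD already exposes a `configMaps` field that mounts ConfigMaps under `/etc/prometheus/configmaps/<configmap-name>`. A minimal sketch of how the `inventory` file could be made available to the Pod follows; the Prometheus object name and namespace are hypothetical, and the ConfigMap name is inferred from the example path:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example          # hypothetical Prometheus instance
  namespace: monitoring  # hypothetical namespace
spec:
  # Mounts the "inventory" ConfigMap at /etc/prometheus/configmaps/inventory/,
  # which makes file.json readable by the file SD configuration declared in the ScrapeConfig.
  configMaps:
  - inventory
```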
`httpSDConfig`:

```yaml
url: http://localhost:1234
refreshInterval: 60s
```
This example doesn't list all the fields offered by Prometheus. Fields will be implemented iteratively and, as such, the expectation is not for all of them to be available in the first version.
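For illustration only, the ScrapeConfig example above could be rendered by the operator into a Prometheus scrape configuration roughly like the following; the `job_name` naming scheme is an assumption, not something defined by this proposal:

```yaml
scrape_configs:
- job_name: scrapeConfig/system-monitoring/my-scrape-config  # hypothetical naming scheme
  metrics_path: /metrics
  relabel_configs: []                                         # from spec.relabelings
  static_configs:
  - targets:
    - target:9100
    labels:
      labelA: placeholder
  file_sd_configs:
  - files:
    - /etc/prometheus/configmaps/inventory/file.json
    refresh_interval: 5m
  http_sd_configs:
  - url: http://localhost:1234
    refresh_interval: 60s
```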
Also, to help select ScrapeConfig objects, new fields will be added to the Prometheus CRD, as already exists for ServiceMonitor, PodMonitor and Probe objects:
```yaml
[...]
spec:
  scrapeConfigSelector: ...
  scrapeConfigNamespaceSelector: ...
```
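A minimal sketch of how these selectors could pick up the `my-scrape-config` object from the earlier example, assuming they behave like the existing `serviceMonitorSelector` and `serviceMonitorNamespaceSelector` label selectors; the namespace label is hypothetical:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example  # hypothetical Prometheus instance
spec:
  # Select ScrapeConfig objects labelled test=value ...
  scrapeConfigSelector:
    matchLabels:
      test: value
  # ... in namespaces carrying this (hypothetical) label.
  scrapeConfigNamespaceSelector:
    matchLabels:
      monitoring: enabled
```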
Once the CRD is released, we will start refactoring the other CRDs. Since ScrapeConfig will allow for any scrape configuration, it can also be used to generate the scrape configuration of the other CRDs.
```mermaid
graph TD;
    ServiceMonitor --> ScrapeConfig
    PodMonitor --> ScrapeConfig
    ScrapeConfig --> prometheusConfig
    prometheusConfig
```
Alternatives
- Use `additionalScrapeConfigs` secrets, with the pitfalls described earlier
Action Plan
- Create the ScrapeConfig CRD, covering `file_sd_configs`, `static_configs` and `http_sd_configs`. Not every field of each service discovery mechanism has to be supported at first; the expectation is not for all of them to be implemented.
- Once released, refactor the configuration generation logic to reuse ScrapeConfig. In parallel, add other service discovery mechanisms to the CRD and complete the implementation of `file_sd_configs`, `static_configs` and `http_sd_configs`.