Decomposing OpenTelemetry Collector Configuration for Maintainability

When your collector configuration grows beyond a few hundred lines, you start feeling the friction. Pull request reviews become exercises in scrolling. Testing a single processor change means deploying an entire pipeline. Environment-specific variations creep in through copy-paste, and suddenly you have three nearly-identical YAML files that drift apart over time. The monolithic collector configuration that worked well during initial deployment becomes a liability at scale.
The OpenTelemetry Collector provides configuration providers that enable modular configurations, but the documentation tends to focus on individual features rather than composition patterns. This post examines practical strategies for decomposing collector configurations into maintainable, testable units.
The monolith problem
Consider a typical production collector configuration. It starts innocently enough: OTLP receiver, batch processor, OTLP exporter. Then you add Kubernetes metadata enrichment, tail sampling for traces, filtering for noisy health check spans, resource detection for cloud provider attributes, and suddenly you have a 500-line YAML file that handles traces, metrics, and logs across multiple pipelines.
The problems compound in predictable ways. When someone submits a pull request to modify the tail sampling policy, the reviewer must mentally parse the entire configuration to understand context. When a team wants to test a new transform processor statement, they cannot easily isolate that piece from the rest of the pipeline. When you deploy to staging versus production, environment-specific values get mixed with structural configuration, making it difficult to identify what actually differs between deployments.
The collector's configuration merging behavior and provider system offer a path forward, but the patterns for using them effectively are not immediately obvious from the documentation.
Configuration providers and merging
The collector supports multiple configuration sources through providers. The most commonly used are the file provider (file:), environment provider (env:), HTTP provider (http://, https://), and YAML provider (yaml:). Each provider resolves a URI to configuration content, and the collector merges configurations from multiple sources in the order specified.
otelcol --config file:base.yaml --config file:overrides.yaml
When the collector receives multiple configuration sources, it performs a deep merge. Keys from later sources override keys from earlier sources at each level of the hierarchy. This merge behavior is the foundation for decomposition: you can split configuration by concern and let the merge operation assemble the final result.
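The merge rule is easy to state precisely in code. The sketch below is an illustration of the behavior, not the collector's actual implementation (the confmap package implements this in Go): nested maps merge key by key, while any other value, including arrays, is replaced wholesale by the later source.

```python
def deep_merge(base: dict, overlay: dict) -> dict:
    """Merge overlay into base the way the collector merges --config sources:
    maps merge recursively, key by key; any other value, including lists,
    is replaced wholesale by the later source."""
    result = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result

base = {"processors": {"batch": {"timeout": "1s", "send_batch_size": 1024}}}
overlay = {"processors": {"batch": {"timeout": "5s"}}}
merged = deep_merge(base, overlay)
# merged["processors"]["batch"] is {"timeout": "5s", "send_batch_size": 1024}
```

Note that the untouched send_batch_size key survives the merge; only the keys the overlay actually sets are overridden.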
The environment provider substitutes environment variable values within configuration files using the ${env:VAR_NAME} syntax; the bare ${VAR_NAME} shorthand is also accepted, though the explicit env: form is the recommended spelling.
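For instance, an exporter endpoint can be deferred to deployment time (OTLP_ENDPOINT and TENANT_ID here are hypothetical variable names):

```yaml
exporters:
  otlp:
    # Substituted from the environment when the configuration is resolved
    endpoint: ${env:OTLP_ENDPOINT}
    headers:
      # Falls back to the default when TENANT_ID is unset
      x-tenant: ${env:TENANT_ID:-default-tenant}
```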
The file provider supports recursive inclusion through the ${file:path} syntax within configuration files. This enables configuration fragments to reference other fragments, building up complex configurations from smaller pieces.
Decomposition strategies
Three primary patterns emerge for organizing collector configurations: splitting by component type, splitting by signal pipeline, and layering environment-specific overlays. Each serves different organizational needs, and they can be combined.
Splitting by component type
The first pattern separates receivers, processors, and exporters into distinct files. This works well when teams own different parts of the telemetry pipeline. The platform team might own receiver configurations, while the observability team owns processor logic, and the SRE team manages exporter destinations.
collector/
  base.yaml        # service section, extensions
  receivers.yaml   # all receiver definitions
  processors.yaml  # all processor definitions
  exporters.yaml   # all exporter definitions
The base configuration defines the service section and references components by name:
# base.yaml
extensions:
  health_check:
    endpoint: 0.0.0.0:13133
service:
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, k8sattributes, batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp, prometheus]
      processors: [memory_limiter, batch]
      exporters: [otlp]
Component files define the actual configurations:
# receivers.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  prometheus:
    config:
      scrape_configs:
        - job_name: kubernetes-pods
          kubernetes_sd_configs:
            - role: pod

# processors.yaml
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: ${env:MEMORY_LIMIT_MIB:-512}
    spike_limit_mib: ${env:SPIKE_LIMIT_MIB:-128}
  k8sattributes:
    auth_type: serviceAccount
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.pod.name
        - k8s.deployment.name
  batch:
    timeout: 1s
    send_batch_size: 1024
The collector assembles these with multiple --config flags:
otelcol --config file:base.yaml \
  --config file:receivers.yaml \
  --config file:processors.yaml \
  --config file:exporters.yaml
This pattern makes pull requests smaller and more focused. A change to the batch processor configuration only touches processors.yaml, and reviewers can evaluate it in isolation.
Splitting by signal pipeline
When different teams own different telemetry signals, splitting by pipeline makes more sense. The traces team iterates on sampling policies while the metrics team focuses on aggregation rules. Each signal gets its own configuration file containing receivers, processors, and exporters relevant to that signal.
collector/
  common.yaml   # shared extensions, telemetry settings
  traces.yaml   # trace pipeline: receivers, processors, exporters, service.pipelines.traces
  metrics.yaml  # metrics pipeline: receivers, processors, exporters, service.pipelines.metrics
  logs.yaml     # logs pipeline: receivers, processors, exporters, service.pipelines.logs
The common file contains shared infrastructure:
# common.yaml
extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  pprof:
    endpoint: localhost:1777
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: ${env:MEMORY_LIMIT_MIB:-512}
service:
  extensions: [health_check, pprof]
  telemetry:
    logs:
      level: ${env:LOG_LEVEL:-info}
      encoding: json
    metrics:
      level: detailed
      readers:
        - pull:
            exporter:
              prometheus:
                host: 0.0.0.0
                port: 8888
Each signal file is self-contained for its domain:
# traces.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  tail_sampling:
    decision_wait: 30s
    num_traces: 50000
    policies:
      - name: errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: slow-requests
        type: latency
        latency:
          threshold_ms: 2000
      - name: baseline
        type: probabilistic
        probabilistic:
          sampling_percentage: 5
  batch:
    timeout: 1s
    send_batch_size: 512
exporters:
  otlp:
    endpoint: ${env:TRACES_BACKEND_ENDPOINT}
    tls:
      insecure: false
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, tail_sampling, batch]
      exporters: [otlp]
The merge operation combines the service.pipelines sections from each file, resulting in a complete configuration with all three signal pipelines.
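Assuming metrics.yaml and logs.yaml mirror the shape of traces.yaml, the resolved service section looks roughly like this (a sketch of the merge result, not output copied from the collector):

```yaml
service:
  extensions: [health_check, pprof]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, tail_sampling, batch]
      exporters: [otlp]
    metrics:                       # assumed from a metrics.yaml like the traces file
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
    logs:                          # assumed from a logs.yaml like the traces file
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
```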
Environment-specific overlays
Production, staging, and development environments differ in endpoints, resource limits, and sometimes pipeline structure. The overlay pattern uses a shared base with environment-specific files that override particular values.
collector/
  base.yaml
  env/
    production.yaml
    staging.yaml
    development.yaml
The base file defines the complete structure with placeholders or defaults:
# base.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128
  batch:
    timeout: 1s
    send_batch_size: 1024
exporters:
  otlp:
    endpoint: localhost:4317
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
Environment files override specific values:
# env/production.yaml
processors:
  memory_limiter:
    limit_mib: 2048
    spike_limit_mib: 512
exporters:
  otlp:
    endpoint: ${env:BACKEND_ENDPOINT}
    tls:
      insecure: false
      ca_file: /etc/ssl/certs/ca-bundle.crt
    retry_on_failure:
      enabled: true
      max_elapsed_time: 300s
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 5000
service:
  telemetry:
    logs:
      level: info
      encoding: json

# env/development.yaml
processors:
  memory_limiter:
    limit_mib: 256
exporters:
  otlp:
    endpoint: localhost:4317
    tls:
      insecure: true
  debug:
    verbosity: detailed
service:
  pipelines:
    traces:
      exporters: [otlp, debug]
  telemetry:
    logs:
      level: debug
Deployment selects the appropriate overlay:
# Production
otelcol --config file:base.yaml --config file:env/production.yaml
# Development
otelcol --config file:base.yaml --config file:env/development.yaml
Nested file inclusion
For deeply modular configurations, the file provider supports nested inclusion. This is particularly useful for complex processor configurations like tail sampling policies, where individual policies might be maintained by different teams.
# processors/tail_sampling.yaml
processors:
  tail_sampling:
    decision_wait: 30s
    num_traces: 50000
    policies:
      - ${file:policies/errors.yaml}
      - ${file:policies/slo-violations.yaml}
      - ${file:policies/baseline.yaml}
Each policy file contains a single policy definition:
# policies/errors.yaml
name: errors
type: status_code
status_code:
  status_codes: [ERROR]

# policies/slo-violations.yaml
name: slo-violations
type: and
and:
  and_sub_policy:
    - name: latency-threshold
      type: latency
      latency:
        threshold_ms: 2000
    - name: high-priority-services
      type: string_attribute
      string_attribute:
        key: service.tier
        values: [critical, high]
This granularity enables teams to own individual policies, submit focused pull requests, and test policies in isolation before integration.
Testing decomposed configurations
The collector's validate command accepts the same configuration sources as runtime, enabling validation of decomposed configurations:
# Validate merged configuration
otelcol validate --config file:base.yaml --config file:env/production.yaml
# Validate with environment variables set
BACKEND_ENDPOINT=backend:4317 otelcol validate --config file:base.yaml --config file:env/production.yaml
For more complex validation, the print-config command outputs the fully resolved configuration after merging and environment variable substitution:
otelcol print-config --config file:base.yaml --config file:env/production.yaml
This output shows exactly what the collector would receive, which is useful for debugging merge issues or unexpected environment variable values.
Individual component files can be validated in isolation by wrapping them in minimal configurations. For a processor file to validate independently, it needs at least one receiver, exporter, and pipeline that uses the processor:
# test-harness.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: localhost:4317
exporters:
  debug:
    verbosity: basic
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling]  # processor under test
      exporters: [debug]
otelcol validate --config file:test-harness.yaml --config file:processors/tail_sampling.yaml
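This validation step slots naturally into CI. Below is a sketch of a GitHub Actions job (the checkout action is real; the install script and repository paths are illustrative assumptions) that validates every environment overlay against the shared base on each pull request:

```yaml
# .github/workflows/validate-collector.yml (illustrative)
name: validate-collector-config
on: [pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install otelcol
        # Illustrative: install the collector binary however your org prefers
        run: ./scripts/install-otelcol.sh
      - name: Validate every environment overlay
        run: |
          for env in production staging development; do
            BACKEND_ENDPOINT=placeholder:4317 \
              otelcol validate \
                --config file:collector/base.yaml \
                --config "file:collector/env/${env}.yaml"
          done
```

The placeholder endpoint exists only to satisfy required environment variables during validation; it is never dialed.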
Practical considerations
The merge operation has limitations worth understanding. Arrays are not merged; the later source completely replaces the earlier source's array. This affects pipeline definitions: if base.yaml defines processors: [a, b, c] and an overlay defines processors: [a, b], the result is [a, b], not a combination. Plan your decomposition accordingly, keeping arrays that need to vary together in the same file.
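To make that concrete, consider two hypothetical fragments that both set the traces processor list:

```yaml
# base.yaml
service:
  pipelines:
    traces:
      processors: [memory_limiter, k8sattributes, batch]

# overlay.yaml
service:
  pipelines:
    traces:
      processors: [memory_limiter, batch]

# Merged result: the overlay's array replaces the base array wholesale.
# k8sattributes is dropped even though the overlay never mentioned it:
#   processors: [memory_limiter, batch]
```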
File paths in nested inclusions are relative to the collector's working directory, not to the file containing the inclusion. A ${file:policies/errors.yaml} reference resolves relative to where the collector process runs, regardless of which configuration file contains the reference. This can be surprising when configurations are organized in subdirectories; consider using absolute paths or ensuring the working directory is set appropriately.
Environment variable defaults (${env:VAR:-default}) only apply when the variable is unset. An empty string is not the same as unset; if VAR="" is exported, the default is not used. For required variables without reasonable defaults, validate externally before starting the collector:
: ${BACKEND_ENDPOINT:?BACKEND_ENDPOINT must be set}
otelcol --config file:config.yaml
Custom collector distributions built with ocb need the appropriate providers in the manifest. When the manifest omits the providers list entirely, ocb includes the default set (file, env, yaml, http, https); once you specify a providers list, only the listed providers are built in. A missing provider causes cryptic errors when the configuration uses its URI scheme:
# builder.yaml
providers:
  - gomod: go.opentelemetry.io/collector/confmap/provider/fileprovider v1.57.0
  - gomod: go.opentelemetry.io/collector/confmap/provider/envprovider v1.57.0
  - gomod: go.opentelemetry.io/collector/confmap/provider/yamlprovider v1.57.0
When not to decompose
Decomposition adds indirection. A single-file configuration that fits on a screen is easier to understand than multiple files that must be mentally merged. Small teams with straightforward pipelines may find decomposition overhead exceeds its benefits.
The patterns described here target configurations that have grown painful to maintain. If your configuration is not yet painful, keeping it simple might be the right choice. The collector's configuration system supports decomposition when you need it; you are not required to use it from the start.
Summary
The OpenTelemetry Collector's configuration merging and provider system enable modular configurations that scale with organizational complexity. Splitting by component type aligns with team ownership of pipeline stages. Splitting by signal pipeline aligns with team ownership of telemetry domains. Environment overlays separate deployment concerns from structural configuration.
The key insight is that the collector's deep merge behavior lets you compose configurations from independent pieces. Each piece can be reviewed, tested, and modified in isolation. When combined with validation in CI pipelines, decomposed configurations become easier to maintain than their monolithic alternatives.
Start with the decomposition pattern that matches your organizational boundaries. If platform and observability teams have clear ownership, split by component type. If traces, metrics, and logs teams operate independently, split by signal. Layer environment overlays on either approach. The collector's configuration system is flexible enough to support the structure that works for your team.
