
    Decomposing OpenTelemetry Collector Configuration for Maintainability

    Juraci Paixão Kröhling

    When your collector configuration grows beyond a few hundred lines, you start feeling the friction. Pull request reviews become exercises in scrolling. Testing a single processor change means deploying an entire pipeline. Environment-specific variations creep in through copy-paste, and suddenly you have three nearly-identical YAML files that drift apart over time. The monolithic collector configuration that worked well during initial deployment becomes a liability at scale.

    The OpenTelemetry Collector provides configuration providers that enable modular configurations, but the documentation tends to focus on individual features rather than composition patterns. This post examines practical strategies for decomposing collector configurations into maintainable, testable units.

    The monolith problem

    Consider a typical production collector configuration. It starts innocently enough: OTLP receiver, batch processor, OTLP exporter. Then you add Kubernetes metadata enrichment, tail sampling for traces, filtering for noisy health check spans, resource detection for cloud provider attributes, and suddenly you have a 500-line YAML file that handles traces, metrics, and logs across multiple pipelines.

    The problems compound in predictable ways. When someone submits a pull request to modify the tail sampling policy, the reviewer must mentally parse the entire configuration to understand context. When a team wants to test a new transform processor statement, they cannot easily isolate that piece from the rest of the pipeline. When you deploy to staging versus production, environment-specific values get mixed with structural configuration, making it difficult to identify what actually differs between deployments.

    The collector's configuration merging behavior and provider system offer a path forward, but the patterns for using them effectively are not immediately obvious from the documentation.

    Configuration providers and merging

    The collector supports multiple configuration sources through providers. The most commonly used are the file provider (file:), environment provider (env:), HTTP provider (http://, https://), and YAML provider (yaml:). Each provider resolves a URI to configuration content, and the collector merges configurations from multiple sources in the order specified.

    otelcol --config file:base.yaml --config file:overrides.yaml
    

    When the collector receives multiple configuration sources, it performs a deep merge. Keys from later sources override keys from earlier sources at each level of the hierarchy. This merge behavior is the foundation for decomposition: you can split configuration by concern and let the merge operation assemble the final result.
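    As a minimal illustration (contents hypothetical), the merge keeps everything from the first source and overrides only the keys the second source sets:

```yaml
# base.yaml
exporters:
  otlp:
    endpoint: localhost:4317
    tls:
      insecure: true

# overrides.yaml
exporters:
  otlp:
    endpoint: backend:4317

# Merged result: endpoint comes from overrides.yaml,
# while tls.insecure survives untouched from base.yaml.
```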

    The environment provider substitutes environment variable values within configuration files using ${env:VAR_NAME} or the shorthand ${VAR_NAME} syntax.

    The file provider supports recursive inclusion through the ${file:path} syntax within configuration files. This enables configuration fragments to reference other fragments, building up complex configurations from smaller pieces.
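    Both substitution forms can appear inside a single fragment; in this sketch the endpoint comes from the environment and the TLS block is pulled from a separate file (the variable name and file path are hypothetical):

```yaml
exporters:
  otlp:
    endpoint: ${env:BACKEND_ENDPOINT}
    tls: ${file:fragments/tls-settings.yaml}
```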

    Decomposition strategies

    Three primary patterns emerge for organizing collector configurations: splitting by component type, splitting by signal pipeline, and layering environment-specific overlays. Each serves different organizational needs, and they can be combined.

    Splitting by component type

    The first pattern separates receivers, processors, and exporters into distinct files. This works well when teams own different parts of the telemetry pipeline. The platform team might own receiver configurations, while the observability team owns processor logic, and the SRE team manages exporter destinations.

    collector/
      base.yaml           # service section, extensions
      receivers.yaml      # all receiver definitions
      processors.yaml     # all processor definitions
      exporters.yaml      # all exporter definitions
    

    The base configuration defines the service section and references components by name:

    # base.yaml
    extensions:
      health_check:
        endpoint: 0.0.0.0:13133
    
    service:
      extensions: [health_check]
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, k8sattributes, batch]
          exporters: [otlp]
        metrics:
          receivers: [otlp, prometheus]
          processors: [memory_limiter, batch]
          exporters: [otlp]
    

    Component files define the actual configurations:

    # receivers.yaml
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
      prometheus:
        config:
          scrape_configs:
            - job_name: kubernetes-pods
              kubernetes_sd_configs:
                - role: pod
    
    # processors.yaml
    processors:
      memory_limiter:
        check_interval: 1s
        limit_mib: ${env:MEMORY_LIMIT_MIB:-512}
        spike_limit_mib: ${env:SPIKE_LIMIT_MIB:-128}
    
      k8sattributes:
        auth_type: serviceAccount
        extract:
          metadata:
            - k8s.namespace.name
            - k8s.pod.name
            - k8s.deployment.name
    
      batch:
        timeout: 1s
        send_batch_size: 1024
    

    The collector assembles these with multiple --config flags:

    otelcol --config file:base.yaml \
            --config file:receivers.yaml \
            --config file:processors.yaml \
            --config file:exporters.yaml
    

    This pattern makes pull requests smaller and more focused. A change to the batch processor configuration only touches processors.yaml, and reviewers can evaluate it in isolation.
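    In Kubernetes, the same assembly typically happens in the container args, with the fragments mounted from a ConfigMap. A sketch (volume and ConfigMap wiring omitted; image and paths hypothetical):

```yaml
containers:
  - name: otelcol
    image: otel/opentelemetry-collector-contrib
    args:
      - --config=file:/conf/base.yaml
      - --config=file:/conf/receivers.yaml
      - --config=file:/conf/processors.yaml
      - --config=file:/conf/exporters.yaml
```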

    Splitting by signal pipeline

    When different teams own different telemetry signals, splitting by pipeline makes more sense. The traces team iterates on sampling policies while the metrics team focuses on aggregation rules. Each signal gets its own configuration file containing receivers, processors, and exporters relevant to that signal.

    collector/
      common.yaml         # shared extensions, telemetry settings
      traces.yaml         # trace pipeline: receivers, processors, exporters, service.pipelines.traces
      metrics.yaml        # metrics pipeline: receivers, processors, exporters, service.pipelines.metrics
      logs.yaml           # logs pipeline: receivers, processors, exporters, service.pipelines.logs
    

    The common file contains shared infrastructure:

    # common.yaml
    extensions:
      health_check:
        endpoint: 0.0.0.0:13133
      pprof:
        endpoint: localhost:1777
    
    processors:
      memory_limiter:
        check_interval: 1s
        limit_mib: ${env:MEMORY_LIMIT_MIB:-512}
    
    service:
      extensions: [health_check, pprof]
      telemetry:
        logs:
          level: ${env:LOG_LEVEL:-info}
          encoding: json
        metrics:
          level: detailed
          readers:
            - pull:
                exporter:
                  prometheus:
                    host: 0.0.0.0
                    port: 8888
    

    Each signal file is self-contained for its domain:

    # traces.yaml
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
    
    processors:
      tail_sampling:
        decision_wait: 30s
        num_traces: 50000
        policies:
          - name: errors
            type: status_code
            status_code:
              status_codes: [ERROR]
          - name: slow-requests
            type: latency
            latency:
              threshold_ms: 2000
          - name: baseline
            type: probabilistic
            probabilistic:
              sampling_percentage: 5
    
      batch:
        timeout: 1s
        send_batch_size: 512
    
    exporters:
      otlp:
        endpoint: ${env:TRACES_BACKEND_ENDPOINT}
        tls:
          insecure: false
    
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, tail_sampling, batch]
          exporters: [otlp]
    

    The merge operation combines the service.pipelines sections from each file, resulting in a complete configuration with all three signal pipelines.
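    Assuming metrics.yaml and logs.yaml follow the same shape as traces.yaml above, the resolved service section would look roughly like this (the metrics and logs processor lists are illustrative, since those files are not shown):

```yaml
service:
  extensions: [health_check, pprof]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, tail_sampling, batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
```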

    Environment-specific overlays

    Production, staging, and development environments differ in endpoints, resource limits, and sometimes pipeline structure. The overlay pattern uses a shared base with environment-specific files that override particular values.

    collector/
      base.yaml
      env/
        production.yaml
        staging.yaml
        development.yaml
    

    The base file defines the complete structure with placeholders or defaults:

    # base.yaml
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
    
    processors:
      memory_limiter:
        check_interval: 1s
        limit_mib: 512
        spike_limit_mib: 128
      batch:
        timeout: 1s
        send_batch_size: 1024
    
    exporters:
      otlp:
        endpoint: localhost:4317
        tls:
          insecure: true
    
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [otlp]
    

    Environment files override specific values:

    # env/production.yaml
    processors:
      memory_limiter:
        limit_mib: 2048
        spike_limit_mib: 512
    
    exporters:
      otlp:
        endpoint: ${env:BACKEND_ENDPOINT}
        tls:
          insecure: false
          ca_file: /etc/ssl/certs/ca-bundle.crt
        retry_on_failure:
          enabled: true
          max_elapsed_time: 300s
        sending_queue:
          enabled: true
          num_consumers: 10
          queue_size: 5000
    
    service:
      telemetry:
        logs:
          level: info
          encoding: json
    
    # env/development.yaml
    processors:
      memory_limiter:
        limit_mib: 256
    
    exporters:
      otlp:
        endpoint: localhost:4317
        tls:
          insecure: true
    
      debug:
        verbosity: detailed
    
    service:
      pipelines:
        traces:
          exporters: [otlp, debug]
      telemetry:
        logs:
          level: debug
    

    Deployment selects the appropriate overlay:

    # Production
    otelcol --config file:base.yaml --config file:env/production.yaml
    
    # Development
    otelcol --config file:base.yaml --config file:env/development.yaml
    
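    A small launcher script can select the overlay from an environment variable instead of hard-coding it per deployment. A sketch, assuming an ENVIRONMENT variable set by the deployment platform (here the command is echoed; a real entrypoint would exec it):

```shell
#!/bin/sh
# Hypothetical entrypoint: pick the overlay matching $ENVIRONMENT,
# falling back to the development overlay when it is unset.
ENV="${ENVIRONMENT:-development}"
CONFIG_ARGS="--config file:base.yaml --config file:env/${ENV}.yaml"
echo "otelcol ${CONFIG_ARGS}"
# exec otelcol ${CONFIG_ARGS}
```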

    Nested file inclusion

    For deeply modular configurations, the file provider supports nested inclusion. This is particularly useful for complex processor configurations like tail sampling policies, where individual policies might be maintained by different teams.

    # processors/tail_sampling.yaml
    processors:
      tail_sampling:
        decision_wait: 30s
        num_traces: 50000
        policies:
          - ${file:policies/errors.yaml}
          - ${file:policies/slo-violations.yaml}
          - ${file:policies/baseline.yaml}
    

    Each policy file contains a single policy definition:

    # policies/errors.yaml
    name: errors
    type: status_code
    status_code:
      status_codes: [ERROR]
    
    # policies/slo-violations.yaml
    name: slo-violations
    type: and
    and:
      and_sub_policy:
        - name: latency-threshold
          type: latency
          latency:
            threshold_ms: 2000
        - name: high-priority-services
          type: string_attribute
          string_attribute:
            key: service.tier
            values: [critical, high]
    

    This granularity enables teams to own individual policies, submit focused pull requests, and test policies in isolation before integration.

    Testing decomposed configurations

    The collector's validate command accepts the same configuration sources as runtime, enabling validation of decomposed configurations:

    # Validate merged configuration
    otelcol validate --config file:base.yaml --config file:env/production.yaml
    
    # Validate with environment variables set
    BACKEND_ENDPOINT=backend:4317 otelcol validate --config file:base.yaml --config file:env/production.yaml
    

    For more complex cases, recent collector releases include the print-initial-config command (gated behind the otelcol.printInitialConfig feature gate), which outputs the fully resolved configuration after merging and environment variable substitution:

    otelcol print-initial-config --feature-gates=otelcol.printInitialConfig --config file:base.yaml --config file:env/production.yaml
    

    This output shows exactly what the collector would receive, which is useful for debugging merge issues or unexpected environment variable values.
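    These checks slot naturally into CI. A sketch of a step that validates every environment overlay against the base (commands are echoed here; a real pipeline would run them, with otelcol on PATH):

```shell
#!/bin/sh
# Hypothetical CI step: validate each overlay merged with the base config.
checked=0
for env in production staging development; do
  echo "otelcol validate --config file:base.yaml --config file:env/${env}.yaml"
  checked=$((checked + 1))
done
```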

    Individual component files can be validated in isolation by wrapping them in minimal configurations. For a processor file to validate independently, it needs at least one receiver, exporter, and pipeline that uses the processor:

    # test-harness.yaml
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: localhost:4317
    
    exporters:
      debug:
        verbosity: basic
    
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [tail_sampling]  # processor under test
          exporters: [debug]
    
    otelcol validate --config file:test-harness.yaml --config file:processors/tail_sampling.yaml
    

    Practical considerations

    The merge operation has limitations worth understanding. Arrays are not merged; the array from the later source completely replaces the one from the earlier source. This affects pipeline definitions: if base.yaml defines processors: [a, b, c] and an overlay defines processors: [a, b], the result is [a, b], not a combination of the two. Plan your decomposition accordingly, keeping arrays that need to vary together in the same file.
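    The development overlay earlier in this post relies on exactly this behavior. Reduced to the relevant keys:

```yaml
# base.yaml
service:
  pipelines:
    traces:
      exporters: [otlp]

# env/development.yaml
service:
  pipelines:
    traces:
      exporters: [otlp, debug]

# Merged: exporters is [otlp, debug]. The overlay list replaces the
# base list wholesale, so it must repeat otlp rather than only add debug.
```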

    File paths in nested inclusions are resolved relative to the collector's working directory, not to the file containing the inclusion. A ${file:policies/errors.yaml} reference resolves relative to wherever the collector process runs, regardless of which configuration file contains it. This can be surprising when configurations are organized in subdirectories; use absolute paths or ensure the process starts with the expected working directory.

    Environment variable defaults (${env:VAR:-default}) only apply when the variable is unset. An empty string is not the same as unset; if VAR="" is exported, the default is not used. For required variables without reasonable defaults, validate externally before starting the collector:

    : ${BACKEND_ENDPOINT:?BACKEND_ENDPOINT must be set}
    otelcol --config file:config.yaml
    

    Custom collector distributions built with ocb need the appropriate providers included in the manifest. When the manifest specifies a providers list, only the listed providers are compiled in; the default set (file, env, http, https, and yaml) applies only when the list is omitted entirely. A missing provider surfaces as a cryptic error when the configuration uses its URI scheme:

    # builder.yaml
    providers:
      - gomod: go.opentelemetry.io/collector/confmap/provider/fileprovider v1.57.0
      - gomod: go.opentelemetry.io/collector/confmap/provider/envprovider v1.57.0
      - gomod: go.opentelemetry.io/collector/confmap/provider/yamlprovider v1.57.0
    

    When not to decompose

    Decomposition adds indirection. A single-file configuration that fits on a screen is easier to understand than multiple files that must be mentally merged. Small teams with straightforward pipelines may find decomposition overhead exceeds its benefits.

    The patterns described here target configurations that have grown painful to maintain. If your configuration is not yet painful, keeping it simple might be the right choice. The collector's configuration system supports decomposition when you need it; you are not required to use it from the start.

    Summary

    The OpenTelemetry Collector's configuration merging and provider system enable modular configurations that scale with organizational complexity. Splitting by component type aligns with team ownership of pipeline stages. Splitting by signal pipeline aligns with team ownership of telemetry domains. Environment overlays separate deployment concerns from structural configuration.

    The key insight is that the collector's deep merge behavior lets you compose configurations from independent pieces. Each piece can be reviewed, tested, and modified in isolation. When combined with validation in CI pipelines, decomposed configurations become easier to maintain than their monolithic alternatives.

    Start with the decomposition pattern that matches your organizational boundaries. If platform and observability teams have clear ownership, split by component type. If traces, metrics, and logs teams operate independently, split by signal. Layer environment overlays on either approach. The collector's configuration system is flexible enough to support the structure that works for your team.
