OllyGarden at KubeCon EU 2026

KubeCon EU 2026 in Amsterdam was a big week for OllyGarden. Our team delivered five recorded talks across Observability Day and the main conference, co-chaired the Observability Day program committee, ran a Prometheus Contribfest session, and announced the general availability of Rose from booth 1343. Here is what we presented and why it matters.
Observability Day
Observability Day kicked off the week on Monday, March 23, as a co-located event before the main conference. Juraci Paixão Kröhling served as program committee co-chair alongside Austin Parker and Eduardo Silva, helping shape the day's program.
In the opening remarks, Juraci framed the day around a simple observation: the systems we build today are beyond the ability of any single individual to fully understand. Microservices, service meshes, serverless functions, GPU clusters running inference workloads. A request comes in, touches multiple services belonging to multiple teams on multiple clusters across multiple clouds. Observability exists not as a buzzword or product category, but as a fundamental engineering practice, because if we cannot understand our systems, we cannot fix them, improve them, or trust them.
But adoption does not mean the problems are solved. The economics of observability are still broken. Some companies spend six to nine figures on observability tooling alone and still have to turn off tracing or reduce log retention to stay within budget. The choice between visibility and affordability should not be normal. And perhaps the biggest challenge is the human one: teams of five platform engineers supporting organizations with thousands of developers, most of whom start from zero when it comes to good telemetry practices. No single tool solves that. These are structural problems, and the day's program was built around people facing them in production.
Two OllyGarden talks ran in parallel tracks that afternoon.
OpenTelemetry gateways: enforce, transform, route
Juraci Paixão Kröhling (OllyGarden) & Natalie Ujuk (IG Group) | Watch on YouTube
This talk makes the case for the OpenTelemetry Collector gateway as a centralized control point in the telemetry pipeline, using IG Group's production deployment as the running example. IG Group is a financial services company with over 800 engineers across four offices, a mix of on-premises and multi-cloud infrastructure, and thousands of services including legacy systems that cannot be modified. They operate over 10,000 edge collectors that forward telemetry through Kafka into a gateway layer.
The core argument is that scaling observability by teaching every team to instrument correctly does not work. Developers are busy delivering features, teams are reluctant to touch legacy code, and documentation does not get read. A centralized gateway solves this by enforcing standards, transforming data, and routing telemetry without requiring changes to application code.
The talk organizes gateway capabilities into three categories. Enforcement covers mandatory resource attributes (service.name, service.namespace, deployment.environment), service identity validation against a Configuration Management Database (CMDB), and Personally Identifiable Information (PII) protection using OpenTelemetry Transformation Language (OTTL) regex patterns. Transformation covers filling in missing attributes, normalizing inconsistent naming conventions, and enriching telemetry with external data sources like GeoIP databases. Routing covers directing PII-containing services through resource-intensive redaction pipelines while others take a fast path, aggregating Java Virtual Machine (JVM) metrics to reduce time series counts by 97.5%, and distributing traces across sampling instances using trace ID for load balancing.
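The three categories map naturally onto Collector components. Below is a minimal gateway sketch along those lines. The transform processor, OTTL functions, and routing connector are real Collector components, but the attribute names (such as pii.scrubbing), endpoints, and pipeline names are invented for illustration, and exact configuration fields vary between Collector versions.

```yaml
receivers:
  otlp:
    protocols:
      grpc: {}

processors:
  transform/enforce:
    error_mode: ignore
    trace_statements:
      - context: resource
        statements:
          # Enforcement: flag a missing mandatory attribute instead of
          # silently passing it through
          - set(attributes["service.namespace"], "unknown") where attributes["service.namespace"] == nil
      - context: span
        statements:
          # Transformation: redact a PII-looking value in place rather
          # than deleting the whole attribute
          - replace_pattern(attributes["url.full"], "token=[^&]*", "token=REDACTED")

connectors:
  routing:
    default_pipelines: [traces/fast]
    table:
      # Routing: "pii.scrubbing" is an assumed resource attribute marking
      # services that must take the heavier redaction path
      - statement: route() where attributes["pii.scrubbing"] == "required"
        pipelines: [traces/redact]

exporters:
  otlphttp:
    endpoint: https://telemetry-backend.example.com

service:
  pipelines:
    traces/in:
      receivers: [otlp]
      processors: [transform/enforce]
      exporters: [routing]
    traces/fast:
      receivers: [routing]
      exporters: [otlphttp]
    traces/redact:
      receivers: [routing]
      exporters: [otlphttp]  # heavier redaction processors would sit here
```

The shape mirrors the talk's structure: one ingest pipeline that enforces and transforms, then a routing decision that sends only PII-carrying services through the expensive path.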
The speakers also address the trade-offs honestly. The gateway becomes critical infrastructure that requires high availability planning. It is tempting to decompose the pipeline into too many layers, so teams should start simple and add complexity only as needed. During Q&A, an audience member raised the challenge of opacity when gateways drop or transform telemetry. The response: redact attribute values rather than removing entire attributes, tag telemetry to indicate when attributes have been dropped, and use OTel Weaver to shift enforcement left to CI time.
Let me be your OpenTelemetry champ
Pavol Loffay (Red Hat) & Nicolas Worner (OllyGarden) | Watch on YouTube
Nicolas and Pavol tested how well AI coding agents handle OpenTelemetry instrumentation and Collector configuration. The setup is a practical Friday-afternoon scenario: a developer and an SRE need to instrument a multi-service demo application (browser, Go backend, Python service calling a Large Language Model) and deploy a Collector in Kubernetes before the weekend.
The first attempt, using a coding agent with a simple prompt, produced a working pull request in five minutes. But closer inspection revealed problems. The agent used an outdated Python SDK version from 2024. Context propagation failed, leaving each service with disconnected traces instead of a single correlated trace. Span names used raw function names instead of the semantic convention pattern of method and route.
The talk then introduces "agentic skills" as a mechanism for improving agent performance. An SDK version skill maintains a table of the latest version for each language SDK, updated automatically on new releases. A semantic conventions skill provides a script that fetches only the relevant subset of the conventions registry. With these skills loaded, the agent produced correct span names, proper semantic convention attributes, working context propagation, and functioning log correlation with traces.
On the Collector side, the agent consistently struggled. It hallucinated deprecated configuration fields, produced invalid OTTL statements, suggested extensions that exist only in the repository and not in any standard distribution, and generated different configurations each time the same prompt ran. These findings led to concrete proposals: a collection of agentic skills for common OpenTelemetry use cases (Kubernetes monitoring, PII removal, cost reduction) and a Model Context Protocol (MCP) server for the Collector that would teach agents exactly which components exist in a given distribution, what their configuration looks like for a specific version, and how they map to the deployment.
Main conference
OpenTelemetry project update and ask the experts
Pablo Baeyens (Datadog), Juraci Paixão Kröhling (OllyGarden), Marylia Gutierrez (Grafana Labs) & Severin Neumann (Causely) | Watch on YouTube
Four OpenTelemetry Governance Committee members presented progress since the previous project update in Atlanta. Three deep-dive topics anchored the session.
Declarative configuration reached stable status. This feature replaces the fragmented environment variable approach with a single YAML file that works across all SDKs (Java, JavaScript, Python, Go, and others) as well as the Collector. Users can configure resources, propagators, tracer providers, meter providers, and logger providers in one place.
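As a sketch, a declarative configuration file looks roughly like the following. The structure follows the opentelemetry-configuration schema, but the service name and endpoints are illustrative, and some field names differ between schema versions, so check the version your SDK supports.

```yaml
file_format: "1.0"
resource:
  attributes:
    - name: service.name
      value: checkout  # illustrative
propagator:
  composite:
    - tracecontext
    - baggage
tracer_provider:
  processors:
    - batch:
        exporter:
          otlp_http:
            endpoint: http://collector.internal:4318/v1/traces
meter_provider:
  readers:
    - periodic:
        exporter:
          otlp_http:
            endpoint: http://collector.internal:4318/v1/metrics
logger_provider:
  processors:
    - batch:
        exporter:
          otlp_http:
            endpoint: http://collector.internal:4318/v1/logs
```

SDKs have historically picked this file up through the OTEL_EXPERIMENTAL_CONFIG_FILE environment variable; consult your SDK's documentation for the mechanism now that the feature is stable.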
Juraci presented OTel Weaver, a tool for defining, validating, and evolving telemetry conventions within an organization. Teams create a registry of YAML files that define metrics, attributes, instrument types, stability levels, and required fields. Weaver resolves these into a schema and can generate code from it. Custom Rego policies enforce naming conventions and attribute patterns. A live check mode acts as an OpenTelemetry Protocol (OTLP) server that validates incoming telemetry against the registry in real time and reports policy violations as JSON output.
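A registry entry is sketched below, modeled on the semantic conventions YAML format that Weaver consumes. The payments domain, attribute, and metric are invented for illustration, and field details may differ across Weaver versions.

```yaml
groups:
  # Attribute definitions for an invented "payments" domain
  - id: registry.payments
    type: attribute_group
    brief: Attributes describing payment operations.
    attributes:
      - id: payments.tenant
        type: string
        stability: development
        brief: Tenant that initiated the operation.
        examples: ["acme"]

  # A metric definition that references the attribute above
  - id: metric.payments.checkout.duration
    type: metric
    metric_name: payments.checkout.duration
    brief: Duration of checkout operations.
    instrument: histogram
    unit: ms
    stability: development
    attributes:
      - ref: payments.tenant
        requirement_level: required
```

From a registry like this, Weaver can generate code, run policy checks, and validate live OTLP traffic against the declared schema.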
Severin Neumann introduced the OpenTelemetry Injector, which uses LD_PRELOAD to automatically inject the correct auto-instrumentation into applications without code changes. It supports Java, Node.js, .NET, and Python, detecting the application runtime and setting the appropriate environment variables before the process starts.
Rapid updates covered 4x to 30x performance improvements in the Go SDK metrics path, the deprecation of the span event API in favor of log-based events, profiling reaching alpha in OTLP version 1.11, the extended Berkeley Packet Filter (eBPF) instrumentation project targeting a 1.0 release candidate, Kubernetes semantic conventions reaching release candidate status, and new Special Interest Groups (SIGs) for Collector MCP, Browser, Zig, PHP, Kotlin, and the Ecosystem Explorer.
How manual OTel instrumentation saves more than just money
Juliano Costa (Datadog) & Yuri Oliveira (OllyGarden) | Watch on YouTube
Many companies that have reached a mature observability practice continue to rely on auto-instrumentation without realizing it is quietly inflating their budgets. Yuri and Juliano show how manual instrumentation can deliver leaner and more business-oriented telemetry. By comparing traces produced through automatic and manual instrumentation, the talk demonstrates how controlling spans and their attributes can cut data volume by up to 60%, reducing resource overhead and storage costs while improving signal-to-noise ratio.
The talk covers the do's and don'ts of span and attribute management, how to identify unnecessary telemetry metadata, and practical steps to achieve higher observability efficiency. The key insight is that the problem may lie not in how much you collect, but in what you collect. At scale, the difference between thoughtful manual instrumentation and default auto-instrumentation translates directly into cost savings, cleaner traces, and faster debugging.
Day-2 reality check: taming wasteful telemetry
Juraci Paixão Kröhling (OllyGarden) & Elena Kovalenko (Delivery Hero) | Watch on YouTube
This talk examines the sources of telemetry waste that accumulate after the initial instrumentation push. The speakers identify three root causes: the convenience trap of auto-instrumentation at day zero, lack of telemetry governance in large organizations (especially those formed through acquisitions, like Delivery Hero), and telemetry hoarding where teams keep data "just in case."
The real-world examples are striking. Elena describes a security incident where over 700 customer phone numbers leaked into a telemetry backend because neither the application nor the ingestion pipeline masked PII. She also shows "traces of doom," where auto-instrumentation on a pub/sub system produced a single trace running for 7 days and 12 hours with no useful information in its spans. A slide of actual service names found in production includes database calls, broadcast IP addresses, "net http," AWS S3 endpoints, and "Java SDK," none of which are valid service names.
For immediate relief, the talk demonstrates several OpenTelemetry Collector processors: the transform processor for masking PII and deleting obsolete attributes, and the log deduplication processor for aggregating repeated log entries. For long-term improvement, the speakers advocate source-level governance through OTel Weaver for schema enforcement, SDK wrappers that provide opinionated defaults, and agentic tooling that reviews pull requests for cardinality issues and suggests instrumentation improvements.
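As an illustration of the immediate-relief side, a Collector configuration fragment combining the two processors might look like this. Both processors ship in the Collector contrib distribution, but the regex, attribute name, and interval here are illustrative rather than recommendations.

```yaml
processors:
  transform/scrub:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          # Mask phone-number-shaped digit runs in log bodies (the regex is
          # deliberately naive; tune it for your data)
          - replace_pattern(body, "[0-9]{9,14}", "[REDACTED]")
          # Drop an attribute nobody queries anymore (name is invented)
          - delete_key(attributes, "legacy.debug.payload")
  logdedup:
    interval: 60s
    log_count_attribute: log_count
```

Processors like these buy time at the pipeline level while the source-level governance work proceeds.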
Beyond the stage
Contribfest: Prometheus new contributor introduction
Ben Kochie (Reddit), Bryan Boreham (Grafana Labs), Saswata Mukherjee (Red Hat) & Arianna Vespri (OllyGarden)
Arianna co-led a hands-on Contribfest session guiding newcomers through their first contributions to the Prometheus project. Contribfest sessions are not recorded, but they are one of the most effective ways to grow the contributor base for open source projects.
Rose general availability
At KubeCon, OllyGarden announced the general availability of Rose, an AI agent that fixes bad OpenTelemetry instrumentation. The announcement came from booth 1343, where the team spent the week talking with platform teams and observability engineers about the challenges covered in the talks above: telemetry waste, governance at scale, and the gap between auto-instrumentation defaults and production-quality observability.
A common thread
A theme runs through all five talks: the hard problems in observability are no longer about getting started with instrumentation. They are about what happens after. Telemetry waste accumulates silently. Governance does not scale through documentation and mandates. Auto-instrumentation provides a starting point, not an ending point. AI coding agents can accelerate instrumentation work, but they need structured knowledge to produce correct results. And centralized gateways offer a practical path to enforcing standards without requiring every team to become an observability expert.
These are the problems OllyGarden works on every day, and KubeCon EU 2026 was an opportunity to share what we have learned with the community.
