The Variability Principle: How to Decide What Deserves a Span

Every team discovers OpenTelemetry the same way. First, excitement—finally, visibility into distributed systems! Then comes the instrumentation party. Spans everywhere. Every function. Every validation. Every calculation gets its own span because "more data is better," right?
Three months later, you're staring at a trace with 500 spans trying to figure out why a simple API call took 3 seconds. Your observability bill has grown 10x. And your engineers have given up on traces entirely because they're impossible to read.
There's a better way.
The Problem: Span Explosion
Most teams create spans like this:
func ProcessPayment(ctx context.Context, payment Payment) error {
	ctx, span := tracer.Start(ctx, "process payment")
	defer span.End()
	validateAmount(ctx, payment.Amount)   // Another span
	validateCard(ctx, payment.CardNumber) // Another span
	calculateFees(ctx, payment.Amount)    // Another span
	formatCurrency(ctx, payment.Total)    // Another span
	// ... 10 more spans for trivial operations
	return nil
}
At 10,000 requests per minute with 15 spans each, you're generating about 6.5 billion spans per 30-day month. At $0.20 per million spans, that's roughly $1,300 a month just for payment processing traces.
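The arithmetic behind those figures is easy to check. The sketch below assumes a 30-day month and a flat per-million price; the function names `monthlySpans` and `monthlyCostUSD` are made up for illustration, and real vendor pricing is usually tiered, so treat the result as a rough estimate.

```go
package main

import "fmt"

// monthlySpans returns the number of spans emitted in a 30-day month.
func monthlySpans(requestsPerMinute, spansPerRequest int64) int64 {
	const minutesPerMonth = 60 * 24 * 30 // 43,200 minutes in 30 days (assumption)
	return requestsPerMinute * spansPerRequest * minutesPerMonth
}

// monthlyCostUSD prices a span count at a flat per-million rate (assumption).
func monthlyCostUSD(spans int64, usdPerMillion float64) float64 {
	return float64(spans) / 1e6 * usdPerMillion
}

func main() {
	spans := monthlySpans(10_000, 15)
	cost := monthlyCostUSD(spans, 0.20)
	// Prints: 6480000000 spans/month, $1296.00/month
	fmt.Printf("%d spans/month, $%.2f/month\n", spans, cost)
}
```

Plug in your own request rate and span count to see how quickly per-function spans add up.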
But cost isn't the real problem. The real problem is that your traces become unreadable. When everything has a span, nothing stands out. Signal drowns in noise.
The Variability Principle: Your New Mental Model
Here's the principle that changed everything for us:
"Is this operation unpredictable?"
If yes, create a span. If no, don't.
This simple question cuts through all the complexity. It's not about operation importance or business value—it's about performance predictability.
Unpredictable = Create a Span
Operations with unpredictable performance need spans:
Database queries: Could take 5ms or 5 seconds depending on locks, data size, indexes
HTTP calls: Network latency, retries, timeouts are all variable
External APIs: You don't control their performance
Message queues: Depends on queue depth, consumer availability
Cache operations: Network round-trip to Redis/Memcached
File I/O: Disk performance varies, especially with network storage
These operations can surprise you. When they're slow, you need to know.
Predictable = Skip the Span
Operations with predictable performance don't need spans:
Validation logic: Checking if a string contains "@" is always microseconds
Math calculations: Simple arithmetic such as fee calculation takes about the same time on every call
Data transformation: Mapping a bounded number of objects in memory has no I/O to wait on
String formatting: Always fast, never the problem
Getters/setters: Not worth measuring
These operations can't surprise you. They're never the bottleneck.
The Pattern in Practice
Let's refactor that payment processing:
// Needs: errors, go.opentelemetry.io/otel/attribute, go.opentelemetry.io/otel/codes
func ProcessPayment(ctx context.Context, payment Payment) error {
	ctx, span := tracer.Start(ctx, "process payment")
	defer span.End()
	// Add context as attributes, not spans
	span.SetAttributes(
		attribute.Float64("payment.amount", payment.Amount),
		attribute.String("payment.currency", payment.Currency),
	)
	// Validation is predictable - no span needed
	if payment.Amount <= 0 || !isValidCard(payment.CardNumber) {
		err := errors.New("invalid payment")
		span.RecordError(err)
		span.SetStatus(codes.Error, err.Error()) // RecordError alone does not mark the span as failed
		return err
	}
	// Database operation is unpredictable - needs a span.
	// Keep the child context separate so "charge card" stays a sibling of this span, not its child.
	dbCtx, dbSpan := tracer.Start(ctx, "INSERT payments")
	dbSpan.SetAttributes(
		attribute.String("db.system.name", "postgresql"),
		attribute.String("db.collection.name", "payments"),
		attribute.String("db.operation.name", "INSERT"),
	)
	err := db.SavePayment(dbCtx, payment) // assumes SavePayment returns an error
	dbSpan.End()
	if err != nil {
		return err
	}
	// External API is unpredictable - needs a span
	chargeCtx, chargeSpan := tracer.Start(ctx, "charge card")
	err = paymentGateway.Charge(chargeCtx, payment) // assumes Charge returns an error
	chargeSpan.End()
	return err
}
Result: 3 spans instead of 15. Traces are readable. Engineers can actually find problems.
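To put a number on that, here is a small sketch that reuses the traffic and pricing figures from earlier (10,000 requests per minute, $0.20 per million spans) and assumes a 30-day month. The `spanBudget` type and `estimate` function are hypothetical helpers, not part of any library.

```go
package main

import "fmt"

// spanBudget is a rough monthly estimate for a single endpoint.
type spanBudget struct {
	Spans   int64
	CostUSD float64
}

// estimate uses the article's example traffic and a flat price (assumptions).
func estimate(spansPerRequest int64) spanBudget {
	const (
		requestsPerMinute = 10_000
		minutesPerMonth   = 60 * 24 * 30 // 30-day month
		usdPerMillion     = 0.20
	)
	spans := requestsPerMinute * minutesPerMonth * spansPerRequest
	return spanBudget{Spans: spans, CostUSD: float64(spans) / 1e6 * usdPerMillion}
}

func main() {
	before, after := estimate(15), estimate(3)
	reduction := 100 * (1 - float64(after.Spans)/float64(before.Spans))
	// Prints: before: $1296.00/month, after: $259.20/month (80% fewer spans)
	fmt.Printf("before: $%.2f/month, after: $%.2f/month (%.0f%% fewer spans)\n",
		before.CostUSD, after.CostUSD, reduction)
}
```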
What to Use Instead of Spans
When you skip creating a span, you still need to capture information. That's where attributes and events come in.
Attributes: Context Without Cost
Attributes add metadata to existing spans. They're perfect for:
Request/response data (user ID, order total, currency)
Configuration values (retry count, timeout settings)
Business context (customer tier, feature flags)
span.SetAttributes(
	attribute.String("user.id", userID),
	attribute.Float64("order.total", 157.46),
	attribute.Bool("cache.hit", true),
)
Most tracing backends index attributes, so you can search and filter traces by them without creating separate spans.
Events: Milestones in Time
Events mark important moments within a span's lifecycle. They're perfect for:
Validation checkpoints
State transitions
Progress markers in loops
// Mark validation completion
span.AddEvent("validation completed")
// Track calculation results
span.AddEvent("total calculated",
	trace.WithAttributes(
		attribute.Int("line_items.count", 4),
		attribute.Float64("total", 157.46),
	))
// Record state changes
span.AddEvent("payment saved")
// Track retry attempts
span.AddEvent("retry attempt",
	trace.WithAttributes(
		attribute.Int("attempt", 3),
		attribute.String("reason", "timeout"),
	))
Events show you when something happened and provide rich context without the overhead of a full span. When debugging, they help you see the timeline of operations within your parent span.
The Decision Framework
Before creating any span, ask one question:
"Is this operation unpredictable?"
Yes → Create a span
No → Use attributes or events
That's it. This single question replaces complex decision trees. In the payment example above, it cut the span count from 15 to 3, an 80% reduction.
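If it helps in code review, the question can be written down as a tiny function. This is only a sketch: the `Operation` fields (`Network`, `Disk`, `Contended`) are hypothetical stand-ins for the sources of variability listed earlier, and `Decide` is not part of OpenTelemetry.

```go
package main

import "fmt"

// Signal is the kind of telemetry to emit for an operation.
type Signal string

const (
	Span             Signal = "span"
	AttributeOrEvent Signal = "attribute or event"
)

// Operation describes one step in a request. The flags are illustrative
// assumptions about what makes latency unpredictable.
type Operation struct {
	Name      string
	Network   bool // crosses the network (HTTP, database, cache, queue)
	Disk      bool // touches disk or network storage
	Contended bool // can wait on locks, queue depth, or other shared resources
}

// Decide answers "is this operation unpredictable?" and maps the answer
// to a span or to an attribute/event on the parent span.
func Decide(op Operation) Signal {
	if op.Network || op.Disk || op.Contended {
		return Span
	}
	return AttributeOrEvent
}

func main() {
	steps := []Operation{
		{Name: "validate amount"},
		{Name: "INSERT payments", Network: true, Contended: true},
		{Name: "charge card", Network: true},
		{Name: "format currency"},
	}
	for _, op := range steps {
		fmt.Printf("%-16s -> %s\n", op.Name, Decide(op))
	}
}
```

Run against the payment flow, this keeps spans for the database insert and the card charge and sends validation and formatting to attributes or events.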
Remember This
Your traces should tell a story, not document every CPU cycle. Each span costs money, performance, and clarity.
Create spans only for operations that could surprise you. For everything else, there are attributes and events.
The best observability isn't about having all the data—it's about having the right data.
