Tail-Based Sampling
Smart log sampling that decides after trace completion based on errors, latency, status codes, and custom business logic. Achieve 40-90% cost reduction while maintaining 100% error coverage.
Overview
Traditional (head-based) sampling decides immediately whether to log:
typescript
// ❌ Head-based sampling - might miss errors
if (Math.random() < 0.1) { // 10% sample rate
logger.info('Request started')
// ... process request ...
logger.error('Request failed!') // If not sampled, this error is lost!
}Tail-based sampling buffers logs and decides after the trace completes:
typescript
// ✅ Tail-based sampling - never miss errors
logger.info('Request started') // Buffered
// ... process request ...
logger.error('Request failed!') // Buffered
// On trace complete:
// - Has error? → Keep 100%
// - Slow (>1s)? → Keep 100%
// - Success? → Keep 1%Benefits:
- ✅ 100% error coverage (never miss issues)
- ✅ 100% slow request coverage (catch performance problems)
- ✅ 40-90% cost reduction (drop uninteresting logs)
- ✅ Smart decisions (based on full trace context)
- ✅ Budget-aware (adaptive sampling)
Basic Usage
typescript
import { createLogger, tailSamplingPlugin, tracingPlugin } from '@sylphx/cat'
const logger = createLogger({
plugins: [
tracingPlugin(), // Required for trace correlation
tailSamplingPlugin() // Default rules
]
})
// Errors are kept (100%)
logger.error('Payment failed', { orderId: 'ORD-123' })
// Success logs are sampled (1%)
logger.info('Order processed', { orderId: 'ORD-124' })Default Rules
typescript
[
{ name: 'errors', condition: hasError, sampleRate: 1.0 }, // Keep all errors
{ name: 'slow', condition: duration > 1000ms, sampleRate: 1.0 }, // Keep slow requests
{ name: '5xx', condition: statusCode >= 500, sampleRate: 1.0 }, // Keep server errors
{ name: '4xx', condition: statusCode >= 400, sampleRate: 0.5 }, // Sample client errors 50%
{ name: 'warnings', condition: hasWarning, sampleRate: 0.2 }, // Sample warnings 20%
{ name: 'default', condition: true, sampleRate: 0.01 } // Sample success 1%
]Custom Rules
Define rules based on your business logic:
typescript
import { tailSamplingPlugin, type SamplingRule } from '@sylphx/cat'
const rules: SamplingRule[] = [
// Rule 1: Keep all payment errors (highest priority)
{
name: 'payment-errors',
priority: 100,
condition: (trace) => {
return trace.metadata.hasError && trace.metadata.customFields.category === 'payment'
},
sampleRate: 1.0
},
// Rule 2: Keep VIP user actions (50%)
{
name: 'vip-users',
priority: 90,
condition: (trace) => trace.metadata.customFields.userTier === 'vip',
sampleRate: 0.5
},
// Rule 3: Keep slow checkouts (>3s)
{
name: 'slow-checkout',
priority: 80,
condition: (trace) => {
return (
trace.metadata.customFields.action === 'checkout' &&
(trace.metadata.maxDuration || 0) > 3000
)
},
sampleRate: 1.0
},
// Rule 4: Sample regular users at low rate
{
name: 'default',
priority: 0,
condition: () => true,
sampleRate: 0.01 // 1%
}
]
const logger = createLogger({
plugins: [
tracingPlugin(),
tailSamplingPlugin({ rules })
]
})Rule Structure
typescript
interface SamplingRule {
name: string // Rule identifier
priority?: number // Higher = evaluated first (default: 0)
condition: (trace: TraceBuffer) => boolean // Keep if true
sampleRate: number // 0.0 to 1.0 (0% to 100%)
}Trace Metadata
typescript
interface TraceMetadata {
hasError: boolean // Has error or fatal level
hasWarning: boolean // Has warn level
statusCode?: number // HTTP status code
maxDuration?: number // Max duration in ms
customFields: Record<string, unknown> // Your custom data
}Adaptive Sampling
Automatically adjust sample rates based on budget:
typescript
const logger = createLogger({
plugins: [
tracingPlugin(),
tailSamplingPlugin({
adaptive: true,
monthlyBudget: 10 * 1024 * 1024 * 1024, // 10 GB/month
onBudgetUpdate: (stats) => {
console.log(`Budget: ${stats.usedPercent}% used`)
console.log(`Current sample rate: ${stats.currentSampleRate}`)
}
})
]
})How it works:
- Monitor log volume in real-time
- If approaching budget (>80%), reduce sample rates
- Prioritize errors and slow requests
- Never drop errors (always 100%)
Example:
- 0-50% budget: Sample success at 5%
- 50-80% budget: Sample success at 2%
- 80-95% budget: Sample success at 0.5%
- 95-100% budget: Sample success at 0.1%
Configuration
typescript
tailSamplingPlugin({
// Sampling rules
rules?: SamplingRule[]
// Enable adaptive sampling
adaptive?: boolean // default: false
// Monthly budget in bytes
monthlyBudget?: number // default: undefined
// Max buffer size (traces)
maxBufferSize?: number // default: 1000
// Max trace duration (ms)
maxTraceDuration?: number // default: 60000 (1 minute)
// Callback when trace is flushed
onFlush?: (trace: TraceBuffer, kept: boolean) => void
// Callback when budget is updated
onBudgetUpdate?: (stats: BudgetStats) => void
})Real-World Examples
E-Commerce API
typescript
const rules: SamplingRule[] = [
// Critical: All errors
{ name: 'errors', condition: (t) => t.metadata.hasError, sampleRate: 1.0 },
// Critical: All 5xx errors
{ name: '5xx', condition: (t) => (t.metadata.statusCode || 0) >= 500, sampleRate: 1.0 },
// Important: Slow requests (>2s)
{ name: 'slow', condition: (t) => (t.metadata.maxDuration || 0) > 2000, sampleRate: 1.0 },
// Important: Authentication issues
{ name: 'auth', condition: (t) => {
const code = t.metadata.statusCode
return code === 401 || code === 403
}, sampleRate: 0.5 },
// Medium: 4xx errors
{ name: '4xx', condition: (t) => {
const code = t.metadata.statusCode || 0
return code >= 400 && code < 500 && code !== 401 && code !== 403
}, sampleRate: 0.1 },
// Low: Success
{ name: 'success', condition: () => true, sampleRate: 0.01 } // 1%
]
const logger = createLogger({
plugins: [
tracingPlugin(),
tailSamplingPlugin({
rules,
adaptive: true,
monthlyBudget: 50 * 1024 * 1024 * 1024 // 50 GB
})
]
})Result: 40-90% cost reduction with 100% error coverage
Gaming Backend
typescript
const rules: SamplingRule[] = [
// Critical: Crashes
{ name: 'crashes', condition: (t) => t.metadata.customFields.event === 'crash', sampleRate: 1.0 },
// Critical: Cheat detection
{ name: 'cheat', condition: (t) => t.metadata.customFields.cheatDetected === true, sampleRate: 1.0 },
// Important: High latency matchmaking
{ name: 'slow-match', condition: (t) => {
return t.metadata.customFields.event === 'matchmaking' &&
(t.metadata.maxDuration || 0) > 500
}, sampleRate: 1.0 },
// Low: Regular gameplay
{ name: 'gameplay', condition: () => true, sampleRate: 0.001 } // 0.1%
]Financial API
typescript
const rules: SamplingRule[] = [
// Critical: Transaction errors
{ name: 'tx-errors', condition: (t) => {
return t.metadata.hasError && t.metadata.customFields.category === 'transaction'
}, sampleRate: 1.0 },
// Critical: Fraud detection
{ name: 'fraud', condition: (t) => t.metadata.customFields.fraudScore !== undefined, sampleRate: 1.0 },
// Critical: High-value transactions
{ name: 'high-value', condition: (t) => {
const amount = t.metadata.customFields.amount as number
return amount !== undefined && amount > 10000
}, sampleRate: 1.0 },
// Important: Audit events (compliance)
{ name: 'audit', condition: (t) => t.metadata.customFields.auditRequired === true, sampleRate: 1.0 },
// Low: Regular API calls
{ name: 'default', condition: () => true, sampleRate: 0.05 } // 5%
]Distributed Tracing
Tail-based sampling works across microservices:
typescript
// Service A (API Gateway)
const gatewayLogger = createLogger({
plugins: [
tracingPlugin(), // Generates traceId
tailSamplingPlugin()
]
})
gatewayLogger.info('Request received')
// Propagate traceId to Service B
const headers = tracingPlugin.toHeaders(traceContext)
await fetch('http://service-b', { headers })
// Service B
const authLogger = createLogger({
plugins: [
tracingPlugin({ getTraceContext: () => traceContext }), // Same traceId
tailSamplingPlugin()
]
})
authLogger.error('Authentication failed')
// Both services' logs are kept (error detected)Performance
Memory Usage
typescript
// Configure buffer limits
tailSamplingPlugin({
maxBufferSize: 1000, // Max 1000 traces in memory
maxTraceDuration: 60000 // Auto-flush after 1 minute
})Memory estimate:
- ~1 KB per log entry
- 10 logs per trace average
- 1000 traces max = ~10 MB
Throughput
Tail-based sampling adds minimal overhead:
- Buffer write: ~1-2% CPU
- Rule evaluation: ~0.5% CPU
- Total overhead: ❤️% CPU
Monitoring
Track sampling statistics:
typescript
const logger = createLogger({
plugins: [
tracingPlugin(),
tailSamplingPlugin({
onFlush: (trace, kept) => {
if (kept) {
console.log(`✅ Kept trace ${trace.traceId}:`,
`errors=${trace.metadata.hasError},`,
`duration=${trace.metadata.maxDuration}ms`)
} else {
// Optionally log discarded traces
}
},
onBudgetUpdate: (stats) => {
console.log(`Budget: ${stats.usedPercent.toFixed(1)}% used`)
console.log(`Sample rate: ${stats.currentSampleRate}`)
}
})
]
})Testing
typescript
import { tailSamplingPlugin } from '@sylphx/cat'
describe('Tail-based sampling', () => {
it('keeps errors', () => {
const plugin = tailSamplingPlugin()
const errorEntry = {
level: 'error' as const,
timestamp: Date.now(),
message: 'Failed',
traceId: 'trace-123',
data: {}
}
plugin.onLog!(errorEntry)
plugin.flush!('trace-123')
// Verify error was kept
})
it('samples success at low rate', () => {
const plugin = tailSamplingPlugin()
let kept = 0
for (let i = 0; i < 1000; i++) {
const entry = {
level: 'info' as const,
timestamp: Date.now(),
message: 'Success',
traceId: `trace-${i}`,
data: { statusCode: 200 }
}
plugin.onLog!(entry)
plugin.flush!(`trace-${i}`)
}
// Expect ~1% kept (10 out of 1000)
expect(kept).toBeLessThan(50)
})
})Best Practices
Always Keep Errors
typescript
// ✅ Good - never miss errors
{ name: 'errors', condition: (t) => t.metadata.hasError, sampleRate: 1.0 }
// ❌ Bad - might miss critical issues
{ name: 'errors', condition: (t) => t.metadata.hasError, sampleRate: 0.5 }Prioritize Rules
typescript
// ✅ Good - high priority for important events
{ name: 'payment-errors', priority: 100, ... }
{ name: 'vip-users', priority: 90, ... }
{ name: 'default', priority: 0, ... }
// ❌ Bad - no priorities, order mattersUse Adaptive Sampling
typescript
// ✅ Good - automatically manages budget
adaptive: true,
monthlyBudget: 10 * 1024 * 1024 * 1024
// ❌ Bad - fixed rates might exceed budget
adaptive: falseCost Savings
Real-world impact (100,000 requests/second):
| Strategy | Logs Kept | Error Coverage | Monthly Cost |
|---|---|---|---|
| No sampling | 100,000/s | 100% | $10,000 |
| Head-based (10%) | 10,000/s | ~10% ❌ | $1,000 |
| Tail-based | 5,000/s | 100% ✅ | $500 |
Tail-based achieves:
- 50% cost vs. head-based
- 95% cost savings vs. no sampling
- 100% error coverage
See Also
- Tracing Guide - W3C Trace Context
- Plugins - Plugin system
- Best Practices - Production patterns
- Examples - Code examples