Observability and Monitoring in Intent
Overview
Observability is a critical aspect of the Intent architecture, providing insights into the system's behavior, performance, and health. The system implements distributed tracing using OpenTelemetry, allowing for end-to-end visibility across the various components and services.
Core Components
OpenTelemetry Integration
The system uses OpenTelemetry for distributed tracing, with a simple but effective implementation:
// src/infra/observability/otel-trace-span.ts import { trace } from '@opentelemetry/api'; const tracer = trace.getTracer('infra'); export async function traceSpan<T>( name: string, attributes: Record<string, any>, fn: () => Promise<T> ): Promise<T> { console.log(`[otel-traceSpan] Starting span: ${name}`, attributes); return tracer.startActiveSpan(name, { attributes }, async (span) => { try { return await fn(); } catch (error) { if (error instanceof Error) { span.recordException(error); } else { span.recordException({ message: String(error) }); } throw error; } finally { span.end(); } }); }
This implementation provides:
- A simple API for creating spans with names and attributes
- Automatic error recording for exceptions
- Proper span lifecycle management (ensuring spans are always ended)
Temporal Workflow Observability
Temporal workflows are instrumented for observability using a dedicated signal and activity:
// From src/infra/temporal/workflows/processCommand.ts const obsTraceSignal = defineSignal<[{ span: string; data?: Record<string, any> }]>('obs.trace'); setHandler(obsTraceSignal, async ({span, data}) => { await emitObservabilitySpan(span, data); });
The emitObservabilitySpan
activity creates a span without executing any code:
// From src/infra/temporal/activities/observabilityActivities.ts export async function emitObservabilitySpan(span: string, data?: Record<string, any>) { await traceSpan(span, data || {}, async () => {}); }
This pattern allows:
- External systems to send signals to workflows to create spans
- Workflows to emit spans at key points without modifying the workflow logic
- Correlation of workflow execution with other system components
Projection Observability
Projections are instrumented to track their performance and errors:
// From src/infra/projections/projectEvents.ts for (const event of events) { for (const h of handlers) { if (!h.supportsEvent(event)) continue; try { await traceSpan(`projection.handle.${event.type}`, { event }, () => h.on(event), ); } catch (err) { console.warn('Projection failed', { eventType: event.type, error: err }); } } }
This provides:
- Spans for each projection handler execution
- Correlation of projections with the events they process
- Error tracking for failed projections
Testing Observability
The system includes a test tracer for verifying observability instrumentation:
// From src/infra/observability/otel-test-tracer.ts import { InMemorySpanExporter, SimpleSpanProcessor } from '@opentelemetry/sdk-trace-base'; export const memoryExporter = new InMemorySpanExporter(); const provider = new NodeTracerProvider({ spanProcessors: [new SimpleSpanProcessor(memoryExporter)], }); provider.register({ contextManager });
Integration tests verify that spans are created correctly:
// From src/infra/integration-tests/otel.test.ts it('emits a projection.handle span', async () => { memoryExporter.reset(); const evt: Event = { id: randomUUID(), type: 'testExecuted', // ... other fields }; await projectEvents([evt], pool); const spans = memoryExporter.getFinishedSpans(); expect(spans.length).toBeGreaterThan(0); expect(spans[0].name).toBe('projection.handle.testExecuted'); });
Observability Patterns
Span Naming Conventions
The system uses consistent span naming conventions:
projection.handle.{eventType}
for projection handlers- Custom span names for workflow activities and other operations
Attribute Enrichment
Spans are enriched with relevant attributes:
- Event data for projection spans
- Command data for workflow spans
- Error information for exception spans
Error Tracking
Errors are automatically recorded in spans:
- Exceptions are caught and recorded with stack traces
- Failed projections are logged with event type and error information
Integration with Other Patterns
Observability in Intent integrates with several other patterns:
- Event Sourcing: Events are tracked through the system with spans
- CQRS: Projections are instrumented to track read model updates
- Temporal Workflows: Workflows emit spans for key activities
- Multi-tenancy: Spans can include tenant information for tenant-specific monitoring
Benefits of the Observability Approach
- End-to-End Visibility: Traces span across system boundaries
- Performance Monitoring: Span durations provide insights into performance bottlenecks
- Error Detection: Exceptions are automatically recorded in spans
- Debugging Support: Detailed traces help with debugging complex issues
- Operational Insights: Patterns in traces can reveal operational issues
Challenges and Considerations
- Overhead: Tracing adds some performance overhead
- Data Volume: High-traffic systems generate large volumes of trace data
- Privacy Concerns: Care must be taken not to include sensitive data in spans
- Sampling Strategy: Determining the right sampling rate for traces
- Integration Complexity: Ensuring consistent tracing across all system components
Future Enhancements
Potential improvements to the observability system could include:
- Metrics Collection: Adding metrics for key system indicators
- Structured Logging: Integrating structured logging with trace context
- Alerting Integration: Connecting traces to alerting systems
- Visualization Tools: Integrating with trace visualization tools
- Correlation IDs: Enhancing correlation across system boundaries