Observability & Monitoring
This document provides a deep dive into how Intent implements observability beyond the basics, covering the technical details of tracing, logging, and monitoring throughout the system.
OpenTelemetry Integration
Intent uses OpenTelemetry as its observability framework, providing distributed tracing capabilities across all components of the system. The core of the implementation is a single helper function:
    // src/infra/observability/otel-trace-span.ts
    import { trace } from '@opentelemetry/api';

    const tracer = trace.getTracer('infra');

    export async function traceSpan<T>(
      name: string,
      attributes: Record<string, any>,
      fn: () => Promise<T>
    ): Promise<T> {
      console.log(`[otel-traceSpan] Starting span: ${name}`, attributes);
      return tracer.startActiveSpan(name, { attributes }, async (span) => {
        try {
          return await fn();
        } catch (error) {
          if (error instanceof Error) {
            span.recordException(error);
          } else {
            span.recordException({ message: String(error) });
          }
          throw error;
        } finally {
          span.end();
        }
      });
    }
The traceSpan helper is used throughout the codebase to wrap operations in trace spans. It provides:
- Automatic Error Recording: Any exceptions thrown during the operation are automatically recorded in the span
- Proper Span Lifecycle: Spans are always ended, even if an error occurs
- Attribute Enrichment: Operations can include relevant attributes for context
- Simplified API: A clean, consistent interface for creating spans
Usage Examples
The traceSpan helper is used in various parts of the system:
    // Command handling
    await traceSpan('command.handle', { command }, async () => {
      // Command handling logic
    });

    // Event processing
    await traceSpan('event.process', { event }, async () => {
      // Event processing logic
    });

    // Database operations
    await traceSpan('db.query', { query, params }, async () => {
      // Database query execution
    });
This consistent approach ensures that all major flows in the system are properly instrumented, providing end-to-end traceability.
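One caveat about the attribute objects passed in these examples: the OpenTelemetry specification limits attribute values to strings, numbers, booleans, and homogeneous arrays of those, so a nested object such as a whole command or event may be dropped or ignored by the SDK. The following is a minimal sketch of one way to handle this, assuming a hypothetical flattenAttributes helper (not part of Intent) that turns nested objects into dot-separated primitive attributes:

```typescript
// Hypothetical helper, not taken from the Intent codebase.
// Converts nested objects into flat keys with OpenTelemetry-compatible values.
type AttributeValue = string | number | boolean | string[] | number[] | boolean[];

function isPrimitive(v: unknown): v is string | number | boolean {
  return typeof v === 'string' || typeof v === 'number' || typeof v === 'boolean';
}

export function flattenAttributes(
  input: Record<string, unknown>,
  prefix = '',
): Record<string, AttributeValue> {
  const out: Record<string, AttributeValue> = {};
  for (const [key, value] of Object.entries(input)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value === null || value === undefined) continue; // nothing to record
    if (isPrimitive(value)) {
      out[path] = value;
    } else if (Array.isArray(value)) {
      // Homogeneous primitive arrays pass through; anything else is serialized.
      const first = typeof value[0];
      const homogeneous =
        value.length > 0 && value.every((v) => isPrimitive(v) && typeof v === first);
      out[path] = homogeneous
        ? (value as string[] | number[] | boolean[])
        : JSON.stringify(value);
    } else if (typeof value === 'object') {
      // Nested plain objects are flattened recursively under the current path.
      Object.assign(out, flattenAttributes(value as Record<string, unknown>, path));
    } else {
      out[path] = String(value); // bigint, symbol, function
    }
  }
  return out;
}
```

With such a helper, a call like traceSpan('command.handle', flattenAttributes({ command }), ...) would record command.id, command.type, and so on as separate attributes that can be queried individually.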
Workflow Tracing
One of the most innovative aspects of Intent's observability is how it handles tracing in the workflow engine. Since workflow code must be deterministic (the same inputs must produce the same outputs, including when a workflow is replayed), calling a tracer directly from workflow code is problematic.
Intent solves this with a clever signal-based approach:
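    // From src/infra/temporal/workflows/processCommand.ts
    import { defineSignal, setHandler } from '@temporalio/workflow';

    const obsTraceSignal = defineSignal<[{ span: string; data?: Record<string, any> }]>('obs.trace');

    // Registered inside the workflow function; emitObservabilitySpan is the activity shown below
    setHandler(obsTraceSignal, async ({ span, data }) => {
      await emitObservabilitySpan(span, data);
    });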
The emitObservabilitySpan activity emits an empty span: it wraps a no-op function in traceSpan, so the span acts purely as a marker carrying the given name and data:
    // From src/infra/temporal/activities/observabilityActivities.ts
    export async function emitObservabilitySpan(span: string, data?: Record<string, any>) {
      // The wrapped function is a no-op; only the span itself is of interest
      await traceSpan(span, data || {}, async () => {});
    }
This pattern allows:
- Non-intrusive Tracing: Workflows can be traced without modifying their deterministic logic
- External Observability: External systems can send signals to workflows to create spans
- Workflow Correlation: Traces can be correlated with workflow execution
- Long-running Process Visibility: Even workflows that run for days or weeks can emit trace markers
Workflow Tracing in Action
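A running workflow can receive an obs.trace signal to create a span. The example below uses signalWithStart, which signals the workflow and starts it first if it is not already running: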
    // Example of sending a trace signal to a workflow
    await client.workflow.signalWithStart(processCommand, {
      taskQueue: 'intent-tasks',
      workflowId,
      signal: obsTraceSignal,
      signalArgs: [{
        span: 'workflow.milestone.reached',
        data: { milestone: 'payment-processed' },
      }],
      args: [command],
    });
This creates a trace span without affecting the deterministic execution of the workflow, providing visibility into the workflow's progress.
Projection Tracing
Projections are a critical part of the CQRS pattern in Intent, and they are fully instrumented for observability:
    // From src/infra/projections/projectEvents.ts
    for (const event of events) {
      for (const h of handlers) {
        if (!h.supportsEvent(event)) continue;
        try {
          await traceSpan(
            `projection.handle.${event.type}`,
            { event },
            () => h.on(event),
          );
        } catch (err) {
          // A failing handler is logged and does not stop the remaining handlers
          console.warn('Projection failed', { eventType: event.type, error: err });
        }
      }
    }
This provides:
- Per-Event Tracing: Each projection handler execution is wrapped in a span
- Event Context: The span includes the event data for context
- Error Tracking: Failed projections are logged with detailed error information
- Performance Monitoring: Span durations reveal how long each projection takes to process events
This level of detail is invaluable for debugging projection issues and understanding the performance characteristics of the read model updates.
Logging
While this document focuses primarily on tracing, Intent also includes a structured logging system:
    // Example of structured logging with context
    logger.info('Command processed', {
      commandId: command.id,
      commandType: command.type,
      tenantId: command.tenant_id,
      correlationId: command.metadata?.correlationId,
      duration: performance.now() - startTime,
    });
Key aspects of the logging system:
- Structured Format: Logs are structured (typically JSON) for easy parsing and analysis
- Context Enrichment: Logs include relevant context like tenant IDs and correlation IDs
- Log Levels: Different log levels (debug, info, warn, error) for appropriate verbosity
- Correlation with Traces: Logs include trace identifiers for correlation with distributed traces
The LoggerPort interface provides a consistent logging API across the system, which can be implemented by various logging backends (e.g., pino, winston).
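The shape of LoggerPort is not shown in this document, so the following is only a sketch of what such a port and a simple JSON implementation might look like. The method names, the child method, and the createJsonLogger function are assumptions made for illustration; the output sink is passed in as a function so that the example does not depend on any particular backend:

```typescript
// Assumed shape; the real LoggerPort in Intent may differ.
type LogLevel = 'debug' | 'info' | 'warn' | 'error';
type LogContext = Record<string, unknown>;

interface LoggerPort {
  debug(msg: string, ctx?: LogContext): void;
  info(msg: string, ctx?: LogContext): void;
  warn(msg: string, ctx?: LogContext): void;
  error(msg: string, ctx?: LogContext): void;
  /** Returns a logger that adds the given context to every entry. */
  child(ctx: LogContext): LoggerPort;
}

const LEVEL_ORDER: Record<LogLevel, number> = { debug: 10, info: 20, warn: 30, error: 40 };

// Hypothetical adapter: writes one JSON object per log entry to `write`.
function createJsonLogger(
  minLevel: LogLevel,
  write: (line: string) => void,
  base: LogContext = {},
): LoggerPort {
  const log = (level: LogLevel) => (msg: string, ctx: LogContext = {}): void => {
    // Entries below the configured level are discarded.
    if (LEVEL_ORDER[level] < LEVEL_ORDER[minLevel]) return;
    write(JSON.stringify({ level, msg, ...base, ...ctx }));
  };
  return {
    debug: log('debug'),
    info: log('info'),
    warn: log('warn'),
    error: log('error'),
    child: (ctx) => createJsonLogger(minLevel, write, { ...base, ...ctx }),
  };
}
```

A caller might create a request-scoped logger with createJsonLogger('info', (line) => console.log(line)).child({ correlationId }) so that every entry carries the correlation ID without repeating it at each call site.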
Testing Observability
Intent takes the unusual but valuable step of testing its observability instrumentation. This ensures that the observability features themselves are working correctly:
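    // From src/infra/observability/otel-test-tracer.ts
    import { InMemorySpanExporter, SimpleSpanProcessor } from '@opentelemetry/sdk-trace-base';
    import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';

    // Collects finished spans in memory so tests can inspect them
    export const memoryExporter = new InMemorySpanExporter();

    const provider = new NodeTracerProvider({
      spanProcessors: [new SimpleSpanProcessor(memoryExporter)],
    });

    // contextManager is not defined in this excerpt
    provider.register({ contextManager });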
Integration tests verify that spans are created correctly:
    // From src/infra/integration-tests/otel.test.ts
    it('emits a projection.handle span', async () => {
      memoryExporter.reset();
      const evt: Event = {
        id: randomUUID(),
        type: 'testExecuted',
        // ... other fields
      };
      await projectEvents([evt], pool);
      const spans = memoryExporter.getFinishedSpans();
      expect(spans.length).toBeGreaterThan(0);
      expect(spans[0].name).toBe('projection.handle.testExecuted');
    });
This approach:
- Verifies Instrumentation: Ensures that spans are created as expected
- Prevents Regressions: Catches changes that might break observability
- Documents Expected Behavior: Shows what spans should be created in different scenarios
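Asserting on spans[0] works while the code under test emits a single span, but it can become brittle once other spans (for example database spans) are exported first. A small lookup helper is one way to make such tests independent of span order. This is a sketch only; spansNamed and expectSingleSpan are hypothetical helpers, and FinishedSpan is a minimal structural type covering just the name and attributes fields that finished spans expose:

```typescript
// Minimal structural view of a finished span, declared locally so the
// sketch does not depend on the OpenTelemetry SDK types.
interface FinishedSpan {
  name: string;
  attributes: Record<string, unknown>;
}

// Hypothetical helper: returns all spans with the given name.
function spansNamed<T extends FinishedSpan>(spans: readonly T[], name: string): T[] {
  return spans.filter((s) => s.name === name);
}

// Hypothetical helper: returns the only span with the given name, or throws
// with a message listing the span names that were actually recorded.
function expectSingleSpan<T extends FinishedSpan>(spans: readonly T[], name: string): T {
  const matches = spansNamed(spans, name);
  if (matches.length !== 1) {
    const seen = spans.map((s) => s.name).join(', ') || '(none)';
    throw new Error(`expected exactly one "${name}" span, found ${matches.length}; saw: ${seen}`);
  }
  return matches[0];
}
```

In the integration test above, the final assertions could then be replaced by a single call such as expectSingleSpan(memoryExporter.getFinishedSpans(), 'projection.handle.testExecuted').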
Observability Patterns
Intent follows several key patterns for effective observability:
Span Naming Conventions
Consistent span naming makes it easier to understand and query traces:
- projection.handle.{eventType} for projection handlers
- command.handle.{commandType} for command handlers
- workflow.{workflowName}.{activity} for workflow activities
- db.{operation} for database operations
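The conventions above could be centralized in a small set of builder functions so that span names are not assembled by hand at each call site. This is a sketch only; the spanNames object is a hypothetical helper, not something the Intent codebase is known to contain:

```typescript
// Hypothetical builders that encode the naming conventions in one place.
const spanNames = {
  projection: (eventType: string): string => `projection.handle.${eventType}`,
  command: (commandType: string): string => `command.handle.${commandType}`,
  workflow: (workflowName: string, activity: string): string =>
    `workflow.${workflowName}.${activity}`,
  db: (operation: string): string => `db.${operation}`,
} as const;
```

A call site would then read traceSpan(spanNames.projection(event.type), ...), and a change to a convention only needs to be made in one place.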
Attribute Enrichment
Spans are enriched with relevant attributes to provide context:
    await traceSpan('command.handle', {
      commandId: command.id,
      commandType: command.type,
      tenantId: command.tenant_id,
      aggregateId: command.payload.aggregateId,
      aggregateType: command.payload.aggregateType,
    }, async () => {
      // Command handling logic
    });
These attributes make it possible to filter and analyze traces based on various dimensions.
Error Tracking
Errors are automatically recorded in spans, providing valuable debugging information:
    try {
      return await fn();
    } catch (error) {
      if (error instanceof Error) {
        span.recordException(error);
      } else {
        span.recordException({ message: String(error) });
      }
      throw error;
    }
This ensures that when something goes wrong, the trace contains detailed error information.
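Note that in OpenTelemetry, recordException adds an exception event to the span but does not change the span's status, and many backends rely on the status to flag a span as failed. A possible extension, sketched here under assumptions, also sets the status to ERROR. SpanLike is a local structural subset of the OpenTelemetry Span interface so that the example runs without the SDK; runInSpan and STATUS_ERROR are hypothetical names, and real code would use SpanStatusCode.ERROR from @opentelemetry/api instead of the numeric constant:

```typescript
// Local structural subset of the OpenTelemetry Span interface.
interface SpanLike {
  recordException(exception: Error | { message: string }): void;
  setStatus(status: { code: number; message?: string }): void;
  end(): void;
}

// Numeric value of SpanStatusCode.ERROR in @opentelemetry/api.
const STATUS_ERROR = 2;

// Hypothetical variant of the error handling in traceSpan: records the
// exception, marks the span as failed, rethrows, and always ends the span.
async function runInSpan<T>(span: SpanLike, fn: () => Promise<T>): Promise<T> {
  try {
    return await fn();
  } catch (error) {
    const err = error instanceof Error ? error : new Error(String(error));
    span.recordException(err);
    span.setStatus({ code: STATUS_ERROR, message: err.message });
    throw error; // preserve the original thrown value for the caller
  } finally {
    span.end();
  }
}
```

The original thrown value is rethrown unchanged, so callers see the same error they would see without tracing.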
Correlation IDs
Every workflow and request carries a correlationId (from Metadata) which is included in logs and traces:
    // Example of propagating correlation IDs
    const correlationId = command.metadata?.correlationId || randomUUID();
    const childCommand = {
      // ...command properties
      metadata: {
        ...command.metadata,
        correlationId,
        causationId: command.id,
      },
    };
This allows for tracking related operations across system boundaries.
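To illustrate how the two identifiers behave across a chain of commands, the propagation rule can be written as a small function: the correlationId stays constant for the whole chain, while the causationId always points at the immediate parent. This is a sketch under assumptions; the Metadata and CommandLike types and the childMetadata function are hypothetical, and the ID generator is passed in as a parameter (in Intent's example it is randomUUID):

```typescript
// Assumed minimal shapes; the real types in Intent may carry more fields.
interface Metadata {
  correlationId?: string;
  causationId?: string;
  [key: string]: unknown;
}

interface CommandLike {
  id: string;
  type: string;
  metadata?: Metadata;
}

// Hypothetical helper: derives the metadata for a command caused by `parent`.
function childMetadata(parent: CommandLike, newId: () => string): Metadata {
  return {
    ...parent.metadata,
    // Reuse the parent's correlation ID, or start a new chain if it has none.
    correlationId: parent.metadata?.correlationId || newId(),
    // The direct cause is always the parent command itself.
    causationId: parent.id,
  };
}
```

For a chain A → B → C, both B and C end up with the same correlationId, while B's causationId is A's ID and C's causationId is B's ID.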
Benefits of Intent's Observability
The comprehensive observability in Intent provides several key benefits:
- Debugging: When issues occur, traces provide a detailed timeline of what happened, making it easier to identify the root cause.
- Performance Tuning: Span durations reveal performance bottlenecks, showing which operations are taking the most time.
- System Understanding: Traces provide a visual representation of how the system works, helping new developers understand the flow of operations.
- Operational Visibility: Patterns in traces can reveal operational issues, such as increased latency or error rates.
- Cross-Component Correlation: Traces span across system boundaries, showing how different components interact.
Extending Observability
To add observability to new components in Intent:
- Use the traceSpan Helper: Wrap operations in traceSpan calls to create spans.
- Follow Naming Conventions: Use consistent span names that follow the established patterns.
- Include Relevant Attributes: Add attributes that provide context for the operation.
- Propagate Context: Ensure that trace context is propagated across async boundaries.
- Test Instrumentation: Write tests that verify spans are created as expected.
By following these guidelines, new components will maintain the same level of observability as the rest of the system.
Observability Configuration
Intent's observability can be configured through environment variables:
# Observability configuration
LOG_LEVEL=info # Log level (debug, info, warn, error)
LOG_ERRORS_TO_STDERR=false # Whether to log errors to stderr
OTEL_EXPORTER_OTLP_ENDPOINT= # OpenTelemetry collector endpoint
OTEL_RESOURCE_ATTRIBUTES= # Resource attributes for spans
This allows for tuning the verbosity and destination of logs and traces based on the environment (development, staging, production).
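As an illustration of how these variables might be read into a typed configuration object, here is a minimal sketch. The ObservabilityConfig type, the loadObservabilityConfig function, and the fallback values ('info' and false, mirroring the listing above) are assumptions, not Intent's actual loader. The OTEL_RESOURCE_ATTRIBUTES value is parsed as comma-separated key=value pairs, which is the format defined by OpenTelemetry; in practice the OpenTelemetry SDK usually reads the OTEL_* variables itself, so parsing them here is only for illustration:

```typescript
type LogLevel = 'debug' | 'info' | 'warn' | 'error';

// Hypothetical config shape for the variables listed above.
interface ObservabilityConfig {
  logLevel: LogLevel;
  logErrorsToStderr: boolean;
  otlpEndpoint?: string;
  resourceAttributes: Record<string, string>;
}

const LOG_LEVELS: readonly string[] = ['debug', 'info', 'warn', 'error'];

// Parses "key1=value1,key2=value2" into an object; malformed pairs are skipped.
function parseResourceAttributes(raw: string | undefined): Record<string, string> {
  const out: Record<string, string> = {};
  for (const pair of (raw ?? '').split(',')) {
    const idx = pair.indexOf('=');
    if (idx <= 0) continue; // no "=" or empty key
    out[pair.slice(0, idx).trim()] = pair.slice(idx + 1).trim();
  }
  return out;
}

// The environment is passed in explicitly so the function is easy to test.
function loadObservabilityConfig(env: Record<string, string | undefined>): ObservabilityConfig {
  const level = (env.LOG_LEVEL ?? 'info').toLowerCase(); // assumed default
  if (!LOG_LEVELS.includes(level)) {
    throw new Error(`Invalid LOG_LEVEL: ${level}`);
  }
  return {
    logLevel: level as LogLevel,
    logErrorsToStderr: env.LOG_ERRORS_TO_STDERR === 'true', // assumed default: false
    otlpEndpoint: env.OTEL_EXPORTER_OTLP_ENDPOINT || undefined,
    resourceAttributes: parseResourceAttributes(env.OTEL_RESOURCE_ATTRIBUTES),
  };
}
```

At startup this would be called as loadObservabilityConfig(process.env), failing fast on an invalid log level rather than silently falling back.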