Observability & Monitoring

This document describes how Intent implements observability, covering the technical details of tracing, logging, and monitoring throughout the system.

OpenTelemetry Integration

Intent uses OpenTelemetry as its observability framework, providing distributed tracing across all components of the system. The core of the implementation is a single helper function:

// src/infra/observability/otel-trace-span.ts
import { trace } from '@opentelemetry/api';

const tracer = trace.getTracer('infra');

export async function traceSpan<T>(
    name: string,
    attributes: Record<string, any>,
    fn: () => Promise<T>
): Promise<T> {
    console.log(`[otel-traceSpan] Starting span: ${name}`, attributes);
    return tracer.startActiveSpan(name, { attributes }, async (span) => {
        try {
            return await fn();
        } catch (error) {
            if (error instanceof Error) {
                span.recordException(error);
            } else {
                span.recordException({ message: String(error) });
            }
            throw error;
        } finally {
            span.end();
        }
    });
}

This traceSpan helper function is used throughout the codebase to wrap operations in trace spans. It provides:

Automatic Error Recording: Any exceptions thrown during the operation are automatically recorded in the span
Proper Span Lifecycle: Spans are always ended, even if an error occurs
Attribute Enrichment: Operations can include relevant attributes for context
Simplified API: A clean, consistent interface for creating spans
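
As excerpted, the helper records exceptions but does not set the span status, and tracing backends typically rely on the status to flag a span as failed. The following is a minimal sketch of doing both, not the code used in Intent. `SpanLike` is a local stand-in for the subset of the OpenTelemetry `Span` API used here, `markSpanFailed` is a hypothetical name, and the value 2 is assumed to mirror `SpanStatusCode.ERROR` from `@opentelemetry/api`.

```typescript
// Local stand-in for the parts of the OpenTelemetry Span API used below
// (an assumption, so the sketch runs without the SDK installed).
interface SpanLike {
    recordException(exception: Error | { message: string }): void;
    setStatus(status: { code: number; message?: string }): void;
}

// Assumed to mirror SpanStatusCode.ERROR from '@opentelemetry/api'.
const STATUS_ERROR = 2;

// Hypothetical helper: records the exception and marks the span as failed.
function markSpanFailed(span: SpanLike, error: unknown): void {
    const message = error instanceof Error ? error.message : String(error);
    span.recordException(error instanceof Error ? error : { message });
    span.setStatus({ code: STATUS_ERROR, message });
}
```

A helper like this could be called from the catch block of traceSpan in place of the two recordException branches.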

Usage Examples

The traceSpan helper is used in various parts of the system:

// Command handling
await traceSpan('command.handle', { command }, async () => {
    // Command handling logic
});

// Event processing
await traceSpan('event.process', { event }, async () => {
    // Event processing logic
});

// Database operations
await traceSpan('db.query', { query, params }, async () => {
    // Database query execution
});

This consistent approach ensures that all major flows in the system are properly instrumented, providing end-to-end traceability.
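
The examples above pass whole objects ({ command }, { event }) as attributes. OpenTelemetry attribute values are limited to strings, numbers, booleans, and arrays of those, so nested objects usually need flattening before they show up usefully on a span. The sketch below shows one way to do that. `flattenAttributes` is a hypothetical helper, not part of Intent, and it assumes plain, acyclic data.

```typescript
type Primitive = string | number | boolean;
type FlatAttributes = Record<string, Primitive>;

// Hypothetical helper: turns nested data into dotted-key primitive attributes.
function flattenAttributes(
    input: Record<string, unknown>,
    prefix = '',
    out: FlatAttributes = {},
): FlatAttributes {
    for (const [key, value] of Object.entries(input)) {
        const path = prefix ? `${prefix}.${key}` : key;
        if (value === null || value === undefined) {
            continue; // nothing useful to record
        }
        if (typeof value === 'string' || typeof value === 'number' || typeof value === 'boolean') {
            out[path] = value;
        } else if (Array.isArray(value)) {
            out[path] = JSON.stringify(value); // keep arrays as one JSON string
        } else if (typeof value === 'object') {
            flattenAttributes(value as Record<string, unknown>, path, out);
        } else {
            out[path] = String(value); // bigint, symbol, function
        }
    }
    return out;
}

const attrs = flattenAttributes({
    command: { id: 'c-1', type: 'createOrder', payload: { total: 42 } },
});
// attrs: { 'command.id': 'c-1', 'command.type': 'createOrder', 'command.payload.total': 42 }
```

Flattening also makes it easier to leave out large or sensitive payload fields, since each attribute becomes an explicit key.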

Workflow Tracing

Tracing inside the workflow engine needs special handling. Since workflows must be deterministic (the same inputs must produce the same outputs), side effects such as exporting spans cannot run directly in workflow code, which makes traditional tracing challenging.

Intent solves this with a signal-based approach:

// From src/infra/temporal/workflows/processCommand.ts
const obsTraceSignal = defineSignal<[{ span: string; data?: Record<string, any> }]>('obs.trace');

setHandler(obsTraceSignal, async ({span, data}) => {
    await emitObservabilitySpan(span, data);
});

The emitObservabilitySpan activity creates a marker span that wraps an empty callback, so the span records a name and attributes but no work:

// From src/infra/temporal/activities/observabilityActivities.ts
export async function emitObservabilitySpan(span: string, data?: Record<string, any>) {
    await traceSpan(span, data || {}, async () => {});
}

This pattern allows:

Non-intrusive Tracing: Workflows can be traced without modifying their deterministic logic
External Observability: External systems can send signals to workflows to create spans
Workflow Correlation: Traces can be correlated with workflow execution
Long-running Process Visibility: Even workflows that run for days or weeks can emit trace markers

Workflow Tracing in Action

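A running workflow can receive an obs.trace signal to create a span. The example below uses signalWithStart, which delivers the signal to the workflow with the given ID and starts that workflow first if it is not already running:

// Send a trace signal, starting the workflow if it is not already running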
await client.workflow.signalWithStart(processCommand, {
    taskQueue: 'intent-tasks',
    workflowId,
    signal: obsTraceSignal,
    signalArgs: [{ span: 'workflow.milestone.reached', data: { milestone: 'payment-processed' } }],
    args: [command],
});

This creates a trace span without affecting the deterministic execution of the workflow, providing visibility into the workflow's progress.

Projection Tracing

Projections are a critical part of the CQRS pattern in Intent, and they are fully instrumented for observability:

// From src/infra/projections/projectEvents.ts
for (const event of events) {
  for (const h of handlers) {
    if (!h.supportsEvent(event)) continue;

    try {
      await traceSpan(`projection.handle.${event.type}`, { event }, () =>
          h.on(event),
      );
    } catch (err) {
      console.warn('Projection failed', { eventType: event.type, error: err });
    }
  }
}

This provides:

Per-Event Tracing: Each projection handler execution is wrapped in a span
Event Context: The span includes the event data for context
Error Tracking: Failed projections are caught and logged as warnings with the event type and error, so one failing handler does not stop the others
Performance Monitoring: Span durations reveal how long each projection takes to process events

This level of detail is invaluable for debugging projection issues and understanding the performance characteristics of the read model updates.
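
The loop above logs a failed projection and moves on. A variation on the same control flow returns the failures to the caller, which could then retry them or raise an alert. This is a sketch under assumptions, not Intent's implementation: `ProjEvent`, `ProjHandler`, `TraceFn`, and `projectCollectingFailures` are hypothetical names, and `TraceFn` has the same shape as traceSpan so the real helper could be passed in.

```typescript
// Hypothetical stand-ins for Intent's event and handler types.
interface ProjEvent { id: string; type: string }
interface ProjHandler {
    name: string;
    supportsEvent(event: ProjEvent): boolean;
    on(event: ProjEvent): Promise<void>;
}
// Same shape as traceSpan, injected so the sketch runs without OpenTelemetry.
type TraceFn = <T>(name: string, attrs: Record<string, string>, fn: () => Promise<T>) => Promise<T>;
interface ProjectionFailure { handler: string; eventId: string; error: unknown }

async function projectCollectingFailures(
    events: ProjEvent[],
    handlers: ProjHandler[],
    trace: TraceFn,
): Promise<ProjectionFailure[]> {
    const failures: ProjectionFailure[] = [];
    for (const event of events) {
        for (const h of handlers) {
            if (!h.supportsEvent(event)) continue;
            try {
                await trace(`projection.handle.${event.type}`, { 'event.id': event.id }, () => h.on(event));
            } catch (error) {
                // Record the failure and keep going with the remaining handlers.
                failures.push({ handler: h.name, eventId: event.id, error });
            }
        }
    }
    return failures;
}
```

Each handler still gets its own span, and a failing handler does not prevent later handlers from seeing the same event.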

Logging

While this document focuses primarily on tracing, Intent also includes a structured logging system:

// Example of structured logging with context
logger.info('Command processed', {
    commandId: command.id,
    commandType: command.type,
    tenantId: command.tenant_id,
    correlationId: command.metadata?.correlationId,
    duration: performance.now() - startTime,
});

Key aspects of the logging system:

Structured Format: Logs are structured (typically JSON) for easy parsing and analysis
Context Enrichment: Logs include relevant context like tenant IDs and correlation IDs
Log Levels: Different log levels (debug, info, warn, error) for appropriate verbosity
Correlation with Traces: Logs include trace identifiers for correlation with distributed traces

The LoggerPort interface provides a consistent logging API across the system, which can be implemented by various logging backends (e.g., pino, winston).
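
This document does not show the shape of LoggerPort, so the following is only a sketch of what such a port and a JSON-emitting implementation might look like. The interface members, `createJsonLogger`, and the numeric level ordering are all assumptions made for illustration.

```typescript
type LogLevel = 'debug' | 'info' | 'warn' | 'error';

// Hypothetical shape; the real LoggerPort in Intent may differ.
interface LoggerPort {
    debug(msg: string, ctx?: Record<string, unknown>): void;
    info(msg: string, ctx?: Record<string, unknown>): void;
    warn(msg: string, ctx?: Record<string, unknown>): void;
    error(msg: string, ctx?: Record<string, unknown>): void;
}

const LEVEL_ORDER: Record<LogLevel, number> = { debug: 10, info: 20, warn: 30, error: 40 };

// Builds a logger that writes one JSON object per entry to the given sink.
function createJsonLogger(minLevel: LogLevel, sink: (line: string) => void): LoggerPort {
    const write = (level: LogLevel, msg: string, ctx: Record<string, unknown> = {}): void => {
        if (LEVEL_ORDER[level] < LEVEL_ORDER[minLevel]) return; // below the configured verbosity
        // Context goes first so it cannot overwrite level, msg, or time.
        sink(JSON.stringify({ ...ctx, level, msg, time: new Date().toISOString() }));
    };
    return {
        debug: (msg, ctx) => write('debug', msg, ctx),
        info: (msg, ctx) => write('info', msg, ctx),
        warn: (msg, ctx) => write('warn', msg, ctx),
        error: (msg, ctx) => write('error', msg, ctx),
    };
}
```

Passing console.log as the sink writes one JSON object per line. A pino- or winston-backed adapter would implement the same interface and leave callers unchanged.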

Testing Observability

Intent also tests its observability instrumentation, so that missing or broken spans are caught like any other regression:

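// From src/infra/observability/otel-test-tracer.ts
import { AsyncLocalStorageContextManager } from '@opentelemetry/context-async-hooks';
import { InMemorySpanExporter, SimpleSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';

// Collects finished spans in memory so tests can inspect them
export const memoryExporter = new InMemorySpanExporter();

// Assumption: the excerpt does not show how contextManager is created;
// an AsyncLocalStorage-based manager is a common choice on Node.js
const contextManager = new AsyncLocalStorageContextManager().enable();

// SimpleSpanProcessor exports each span as soon as it ends
const provider = new NodeTracerProvider({
    spanProcessors: [new SimpleSpanProcessor(memoryExporter)],
});

provider.register({ contextManager });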

Integration tests verify that spans are created correctly:

// From src/infra/integration-tests/otel.test.ts
it('emits a projection.handle span', async () => {
    memoryExporter.reset();

    const evt: Event = {
        id: randomUUID(),
        type: 'testExecuted',
        // ... other fields
    };

    await projectEvents([evt], pool);

    const spans = memoryExporter.getFinishedSpans();
    expect(spans.length).toBeGreaterThan(0);
    expect(spans[0].name).toBe('projection.handle.testExecuted');
});

This approach:

Verifies Instrumentation: Ensures that spans are created as expected
Prevents Regressions: Catches changes that might break observability
Documents Expected Behavior: Shows what spans should be created in different scenarios
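
The test above asserts on spans[0], which depends on the order in which spans finish. If the code under test later emits other spans (for example around database calls), looking a span up by name is less brittle. The sketch below assumes only that each finished span has a `name` property. `NamedSpan`, `spansNamed`, and `expectSingleSpan` are hypothetical helpers.

```typescript
// Minimal stand-in for the finished span objects returned by the exporter.
interface NamedSpan { name: string }

// Returns every finished span with the given name.
function spansNamed<S extends NamedSpan>(spans: readonly S[], name: string): S[] {
    return spans.filter((s) => s.name === name);
}

// Returns the only span with the given name, or throws a descriptive error.
function expectSingleSpan<S extends NamedSpan>(spans: readonly S[], name: string): S {
    const matches = spansNamed(spans, name);
    if (matches.length !== 1) {
        const seen = spans.map((s) => s.name).join(', ') || '(none)';
        throw new Error(`Expected exactly one span named "${name}", found ${matches.length}. Seen: ${seen}`);
    }
    return matches[0];
}
```

In a test this might read expectSingleSpan(memoryExporter.getFinishedSpans(), 'projection.handle.testExecuted'), and the error message lists the span names that were actually emitted.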

Observability Patterns

Intent follows several key patterns for effective observability:

Span Naming Conventions

Consistent span naming makes it easier to understand and query traces:

projection.handle.{eventType} for projection handlers
command.handle.{commandType} for command handlers
workflow.{workflowName}.{activity} for workflow activities
db.{operation} for database operations
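
One way to keep these conventions from drifting is to build span names through small functions instead of writing string literals at each call site. The sketch below is hypothetical: `spanNames` and `segment` are not part of Intent. It also assumes that individual segments should not contain dots, so the hierarchy stays unambiguous.

```typescript
// Assumption: a segment is letters, digits, underscores, or hyphens (no dots).
const SEGMENT = /^[A-Za-z0-9_-]+$/;

function segment(value: string): string {
    if (!SEGMENT.test(value)) {
        throw new Error(`Invalid span name segment: "${value}"`);
    }
    return value;
}

// One builder per convention listed above.
const spanNames = {
    projection: (eventType: string): string => `projection.handle.${segment(eventType)}`,
    command: (commandType: string): string => `command.handle.${segment(commandType)}`,
    workflow: (workflowName: string, activity: string): string =>
        `workflow.${segment(workflowName)}.${segment(activity)}`,
    db: (operation: string): string => `db.${segment(operation)}`,
};

const example = spanNames.projection('testExecuted');
// example: 'projection.handle.testExecuted'
```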

Attribute Enrichment

Spans are enriched with relevant attributes to provide context:

await traceSpan('command.handle', {
    commandId: command.id,
    commandType: command.type,
    tenantId: command.tenant_id,
    aggregateId: command.payload.aggregateId,
    aggregateType: command.payload.aggregateType,
}, async () => {
    // Command handling logic
});

These attributes make it possible to filter and analyze traces based on various dimensions.

Error Tracking

Errors are automatically recorded in spans, providing valuable debugging information:

try {
    return await fn();
} catch (error) {
    if (error instanceof Error) {
        span.recordException(error);
    } else {
        span.recordException({ message: String(error) });
    }
    throw error;
}

This ensures that when something goes wrong, the trace contains detailed error information.

Correlation IDs

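Every workflow and request carries a correlationId (from Metadata), which is included in logs and traces. When a command triggers a follow-up command, the correlationId is copied forward and the parent command's id becomes the causationId: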

// Example of propagating correlation IDs
const correlationId = command.metadata?.correlationId || randomUUID();
const childCommand = {
    // ...command properties
    metadata: {
        ...command.metadata,
        correlationId,
        causationId: command.id,
    },
};

This allows for tracking related operations across system boundaries.

Benefits of Intent's Observability

The comprehensive observability in Intent provides several key benefits:

Debugging: When issues occur, traces provide a detailed timeline of what happened, making it easier to identify the root cause.
Performance Tuning: Span durations reveal performance bottlenecks, showing which operations are taking the most time.
System Understanding: Traces provide a visual representation of how the system works, helping new developers understand the flow of operations.
Operational Visibility: Patterns in traces can reveal operational issues, such as increased latency or error rates.
Cross-Component Correlation: Traces span across system boundaries, showing how different components interact.

Extending Observability

To add observability to new components in Intent:

Use the traceSpan Helper: Wrap operations in traceSpan calls to create spans.
Follow Naming Conventions: Use consistent span names that follow the established patterns.
Include Relevant Attributes: Add attributes that provide context for the operation.
Propagate Context: Ensure that trace context is propagated across async boundaries.
Test Instrumentation: Write tests that verify spans are created as expected.

By following these guidelines, new components will maintain the same level of observability as the rest of the system.

Observability Configuration

Intent's observability can be configured through environment variables:

# Observability configuration
LOG_LEVEL=info                 # Log level (debug, info, warn, error)
LOG_ERRORS_TO_STDERR=false     # Whether to log errors to stderr
OTEL_EXPORTER_OTLP_ENDPOINT=   # OpenTelemetry collector endpoint
OTEL_RESOURCE_ATTRIBUTES=      # Resource attributes for spans

This allows for tuning the verbosity and destination of logs and traces based on the environment (development, staging, production).
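
These variables could be read into a typed configuration object at startup. The sketch below is one possible approach and not Intent's actual loader. `ObservabilityConfig` and `loadObservabilityConfig` are hypothetical names, the defaults (info, false) are assumed from the sample values above, and OTEL_RESOURCE_ATTRIBUTES is parsed as the comma-separated key=value list defined by OpenTelemetry, without percent-decoding.

```typescript
const LOG_LEVELS = ['debug', 'info', 'warn', 'error'] as const;
type LogLevel = (typeof LOG_LEVELS)[number];

interface ObservabilityConfig {
    logLevel: LogLevel;
    logErrorsToStderr: boolean;
    otlpEndpoint?: string;
    resourceAttributes: Record<string, string>;
}

function isLogLevel(value: string): value is LogLevel {
    return (LOG_LEVELS as readonly string[]).includes(value);
}

// Parses "key1=value1,key2=value2"; malformed pairs are skipped.
function parseResourceAttributes(raw: string): Record<string, string> {
    const result: Record<string, string> = {};
    for (const pair of raw.split(',')) {
        const idx = pair.indexOf('=');
        if (idx <= 0) continue; // empty entry or missing key
        const key = pair.slice(0, idx).trim();
        if (key) result[key] = pair.slice(idx + 1).trim();
    }
    return result;
}

// The environment is passed in explicitly so the function is easy to test.
function loadObservabilityConfig(env: Record<string, string | undefined>): ObservabilityConfig {
    const level = (env.LOG_LEVEL || 'info').toLowerCase(); // assumed default
    if (!isLogLevel(level)) {
        throw new Error(`Unsupported LOG_LEVEL: ${level}`);
    }
    return {
        logLevel: level,
        logErrorsToStderr: (env.LOG_ERRORS_TO_STDERR || 'false').toLowerCase() === 'true',
        otlpEndpoint: env.OTEL_EXPORTER_OTLP_ENDPOINT || undefined, // empty means unset
        resourceAttributes: parseResourceAttributes(env.OTEL_RESOURCE_ATTRIBUTES || ''),
    };
}
```

In a Node.js process this would typically be called once as loadObservabilityConfig(process.env), so an invalid LOG_LEVEL fails at startup.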