Observability and Monitoring in Intent

Overview

Observability is a critical aspect of the Intent architecture, providing insights into the system's behavior, performance, and health. The system implements distributed tracing using OpenTelemetry, allowing for end-to-end visibility across the various components and services.

Core Components

OpenTelemetry Integration

The system uses OpenTelemetry for distributed tracing through a single helper, traceSpan, which runs an async function inside an active span:

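// src/infra/observability/otel-trace-span.ts
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('infra');

// Runs fn inside an active span, records any failure on the span, and always ends it.
export async function traceSpan<T>(
    name: string,
    attributes: Record<string, any>,
    fn: () => Promise<T>
): Promise<T> {
    console.log(`[otel-traceSpan] Starting span: ${name}`, attributes);
    return tracer.startActiveSpan(name, { attributes }, async (span) => {
        try {
            return await fn();
        } catch (error) {
            // Record the failure as a span event; non-Error values are wrapped in a message.
            if (error instanceof Error) {
                span.recordException(error);
            } else {
                span.recordException({ message: String(error) });
            }
            // recordException does not change the span status, so mark it as failed explicitly.
            span.setStatus({ code: SpanStatusCode.ERROR });
            throw error;
        } finally {
            span.end();
        }
    });
}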

This implementation provides:

A simple API for creating spans with names and attributes
Automatic error recording for exceptions
Proper span lifecycle management (ensuring spans are always ended)
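
The lifecycle guarantee can be illustrated without an OpenTelemetry SDK. The sketch below mirrors the try/catch/finally shape of traceSpan against a hypothetical in-memory FakeSpan (a stand-in invented for this example, not part of Intent or OpenTelemetry) and shows that the span is ended on both the success path and the failure path, and that the error still reaches the caller.

```typescript
// Hypothetical stand-in for an OpenTelemetry span; it tracks only what this sketch needs.
class FakeSpan {
    ended = false;
    exceptions: string[] = [];

    recordException(error: unknown): void {
        this.exceptions.push(error instanceof Error ? error.message : String(error));
    }

    end(): void {
        this.ended = true;
    }
}

// Same control flow as traceSpan: record the failure, rethrow it, always end the span.
async function withFakeSpan<T>(span: FakeSpan, fn: () => Promise<T>): Promise<T> {
    try {
        return await fn();
    } catch (error) {
        span.recordException(error);
        throw error;
    } finally {
        span.end();
    }
}

// Exercises one successful and one failing callback and returns both spans for inspection.
async function demo(): Promise<{ ok: FakeSpan; failed: FakeSpan; rethrown: boolean }> {
    const ok = new FakeSpan();
    await withFakeSpan(ok, async () => 'done');

    const failed = new FakeSpan();
    let rethrown = false;
    try {
        await withFakeSpan(failed, async () => {
            throw new Error('boom');
        });
    } catch {
        rethrown = true;
    }
    return { ok, failed, rethrown };
}
```

Rethrowing after recording matters: callers keep their normal error handling, and the span only observes the failure.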

Temporal Workflow Observability

Temporal workflows are instrumented for observability using a dedicated signal and activity:

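// From src/infra/temporal/workflows/processCommand.ts
import { defineSignal, setHandler } from '@temporalio/workflow';

const obsTraceSignal = defineSignal<[{ span: string; data?: Record<string, any> }]>('obs.trace');

// Registered inside the workflow function. emitObservabilitySpan is assumed to be the
// workflow-side proxy for the activity shown below (the proxy setup is not part of this excerpt).
setHandler(obsTraceSignal, async ({ span, data }) => {
    await emitObservabilitySpan(span, data);
});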

The emitObservabilitySpan activity wraps an empty callback in traceSpan, so the resulting span carries only its name, attributes, and timestamps:

// From src/infra/temporal/activities/observabilityActivities.ts
export async function emitObservabilitySpan(span: string, data?: Record<string, any>) {
    await traceSpan(span, data || {}, async () => {});
}

This pattern allows:

External systems to send signals to workflows to create spans
Workflows to emit spans at key points without modifying the workflow logic
Correlation of workflow execution with other system components
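
As a rough illustration of the first point, the sketch below shows how an external caller might build and send the obs.trace payload. SignalTarget is a hypothetical minimal interface introduced here so the example runs without the Temporal SDK; Temporal's client-side workflow handle exposes a signal method of this general shape, but how Intent obtains that handle is not shown in the source.

```typescript
// Hypothetical minimal view of something that can receive a named signal.
interface SignalTarget {
    signal(signalName: string, ...args: unknown[]): Promise<void>;
}

// Payload shape taken from the obs.trace signal definition in the workflow.
interface ObsTracePayload {
    span: string;
    data?: Record<string, unknown>;
}

// Asks the workflow to emit a span with the given name and optional attributes.
async function requestSpan(
    target: SignalTarget,
    span: string,
    data?: Record<string, unknown>,
): Promise<void> {
    // Leave `data` out entirely when the caller has nothing to attach.
    const payload: ObsTracePayload = data === undefined ? { span } : { span, data };
    await target.signal('obs.trace', payload);
}
```

Keeping the signal name and payload shape in one helper avoids each caller retyping the 'obs.trace' string.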

Projection Observability

Projections are instrumented to track their performance and errors:

// From src/infra/projections/projectEvents.ts
for (const event of events) {
  for (const h of handlers) {
    if (!h.supportsEvent(event)) continue;

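    try {
      // Span attribute values must be primitives (or arrays of primitives), so pass
      // identifying fields rather than the whole event object.
      await traceSpan(
          `projection.handle.${event.type}`,
          { 'event.id': event.id, 'event.type': event.type },
          () => h.on(event),
      );
    } catch (err) {
      // A failing handler is logged and skipped so the remaining handlers still run.
      console.warn('Projection failed', { eventType: event.type, error: err });
    }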
  }
}

This provides:

Spans for each projection handler execution
Correlation of projections with the events they process
Error tracking for failed projections
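
OpenTelemetry attribute values are limited to strings, numbers, booleans, and arrays of those, so an event object has to be flattened before it can be attached to a span. The helper below is a minimal sketch of one way to do that. The DomainEvent shape and its payload field are assumptions made for the example (only id and type appear in this document), and the attribute key names are illustrative.

```typescript
// Value types accepted as span attributes: primitives or arrays of primitives.
type AttributeValue = string | number | boolean | string[] | number[] | boolean[];

// Hypothetical event shape; only `id` and `type` are taken from the document.
interface DomainEvent {
    id: string;
    type: string;
    payload?: Record<string, unknown>;
}

// Builds a flat attribute map from an event, keeping only primitive payload fields.
function eventAttributes(event: DomainEvent): Record<string, AttributeValue> {
    const attrs: Record<string, AttributeValue> = {
        'event.id': event.id,
        'event.type': event.type,
    };
    for (const [key, value] of Object.entries(event.payload ?? {})) {
        if (typeof value === 'string' || typeof value === 'number' || typeof value === 'boolean') {
            attrs[`event.payload.${key}`] = value;
        }
        // Nested objects are skipped rather than serialized.
    }
    return attrs;
}
```

Skipping nested values keeps attribute cardinality and size predictable; serializing them to JSON strings is an alternative when the detail is needed.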

Testing Observability

The system includes a test tracer for verifying observability instrumentation:

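// From src/infra/observability/otel-test-tracer.ts
import { AsyncLocalStorageContextManager } from '@opentelemetry/context-async-hooks';
import { InMemorySpanExporter, SimpleSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';

// Finished spans are collected in memory so tests can inspect them.
export const memoryExporter = new InMemorySpanExporter();

// Assumption: this excerpt does not show how contextManager is created; an
// AsyncLocalStorageContextManager is a typical choice on Node.js.
const contextManager = new AsyncLocalStorageContextManager().enable();

// SimpleSpanProcessor hands each span to the exporter as soon as it ends (no batching).
const provider = new NodeTracerProvider({
    spanProcessors: [new SimpleSpanProcessor(memoryExporter)],
});

provider.register({ contextManager });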

Integration tests verify that spans are created correctly:

// From src/infra/integration-tests/otel.test.ts
it('emits a projection.handle span', async () => {
    memoryExporter.reset();

    const evt: Event = {
        id: randomUUID(),
        type: 'testExecuted',
        // ... other fields
    };

    await projectEvents([evt], pool);

    const spans = memoryExporter.getFinishedSpans();
    expect(spans.length).toBeGreaterThan(0);
    expect(spans[0].name).toBe('projection.handle.testExecuted');
});
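
Asserting on spans[0] depends on the order in which spans finish, which can change once other instrumented code runs in the same test. A small lookup helper, sketched below, makes such assertions independent of ordering. FinishedSpan is a hypothetical minimal shape used so the example is self-contained; the spans returned by the in-memory exporter carry at least these two fields.

```typescript
// Minimal subset of a finished span; stands in for the SDK's span type in this sketch.
interface FinishedSpan {
    name: string;
    attributes: Record<string, unknown>;
}

// Returns every finished span with the given name, regardless of position.
function spansNamed(spans: readonly FinishedSpan[], name: string): FinishedSpan[] {
    return spans.filter((span) => span.name === name);
}
```

In a test this could be used as expect(spansNamed(memoryExporter.getFinishedSpans(), 'projection.handle.testExecuted')).toHaveLength(1), assuming exactly one handler supports the event.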

Observability Patterns

Span Naming Conventions

Span names follow these conventions:

projection.handle.{eventType} for projection handlers
Caller-chosen names for everything else: workflow spans take their name from the obs.trace signal payload, and other callers pass it to traceSpan directly
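
One way to keep the projection naming convention from drifting is to build and recognize the names in a single place. The helpers below are a hypothetical sketch of that idea, not code from the Intent repository.

```typescript
// Shared prefix for all projection handler spans.
const PROJECTION_SPAN_PREFIX = 'projection.handle.';

// Builds the span name for a projection handling the given event type.
function projectionSpanName(eventType: string): string {
    return `${PROJECTION_SPAN_PREFIX}${eventType}`;
}

// Tells whether a span name belongs to a projection handler.
function isProjectionSpan(name: string): boolean {
    return name.startsWith(PROJECTION_SPAN_PREFIX);
}
```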

Attribute Enrichment

Spans are enriched with relevant attributes:

Event data for projection spans
Command data for workflow spans
Exception details, recorded as span events, for operations that fail

Error Tracking

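Errors are recorded at two levels:

traceSpan catches exceptions, records them on the active span (with the stack trace when the thrown value is an Error), and rethrows them
Failed projections are logged with the event type and error, and processing continues with the next handler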

Integration with Other Patterns

Observability in Intent integrates with several other patterns:

Event Sourcing: Events are tracked through the system with spans
CQRS: Projections are instrumented to track read model updates
Temporal Workflows: Workflows emit spans for key activities
Multi-tenancy: Spans can include tenant information for tenant-specific monitoring
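
For the multi-tenancy point, tenant information can travel as an ordinary span attribute. The sketch below assumes a hypothetical tenant.id attribute key and a string tenant identifier; neither is confirmed by the source.

```typescript
type TenantAttrs = Record<string, string | number | boolean>;

// Returns a copy of the attributes with the tenant identifier added.
// The 'tenant.id' key is an illustrative choice, not an established Intent convention.
function withTenant(attrs: TenantAttrs, tenantId: string): TenantAttrs {
    return { ...attrs, 'tenant.id': tenantId };
}
```

A caller could then pass withTenant({ 'event.type': event.type }, tenantId) as the attributes argument of traceSpan, which makes it possible to filter traces per tenant in a tracing backend.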

Benefits of the Observability Approach

End-to-End Visibility: Traces span across system boundaries
Performance Monitoring: Span durations provide insights into performance bottlenecks
Error Detection: Exceptions are automatically recorded in spans
Debugging Support: Detailed traces help with debugging complex issues
Operational Insights: Patterns in traces can reveal operational issues

Challenges and Considerations

Overhead: Tracing adds some performance overhead
Data Volume: High-traffic systems generate large volumes of trace data
Privacy Concerns: Care must be taken not to include sensitive data in spans
Sampling Strategy: The sampling rate has to balance trace completeness against storage and processing cost
Integration Complexity: Ensuring consistent tracing across all system components
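
Two of these considerations, privacy and sampling, can be sketched in a few lines. The code below is illustrative only: the deny-list keys are hypothetical, and shouldSample is a simplified stand-in that shows the idea of deterministic ratio sampling rather than the sampler shipped with any OpenTelemetry SDK.

```typescript
type SpanAttrs = Record<string, string | number | boolean>;

// Hypothetical deny-list; the real set of sensitive keys depends on the domain.
const SENSITIVE_KEYS: ReadonlySet<string> = new Set(['user.email', 'auth.token']);

// Replaces the values of sensitive keys before the attributes reach a span.
function redactAttributes(
    attrs: SpanAttrs,
    sensitive: ReadonlySet<string> = SENSITIVE_KEYS,
): SpanAttrs {
    const out: SpanAttrs = {};
    for (const [key, value] of Object.entries(attrs)) {
        out[key] = sensitive.has(key) ? '[REDACTED]' : value;
    }
    return out;
}

// Deterministic ratio sampling: the same trace ID always yields the same decision,
// so every component that sees the trace agrees on whether to keep it.
function shouldSample(traceId: string, ratio: number): boolean {
    if (ratio <= 0) return false;
    if (ratio >= 1) return true;
    // Map the first 8 hex characters of the trace ID to a value in [0, 1).
    const bucket = parseInt(traceId.slice(0, 8), 16) / 0x100000000;
    return bucket < ratio;
}
```

Deriving the decision from the trace ID, rather than from a random number per span, avoids traces in which only some spans were kept.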

Future Enhancements

Potential improvements to the observability system could include:

Metrics Collection: Adding metrics for key system indicators
Structured Logging: Integrating structured logging with trace context
Alerting Integration: Connecting traces to alerting systems
Visualization Tools: Integrating with trace visualization tools
Correlation IDs: Enhancing correlation across system boundaries