Calendar-MCP uses OpenTelemetry for comprehensive observability across all components: structured logging, distributed tracing, and metrics collection.
- Structured Logging: Consistent log formatting with context
- Distributed Tracing: End-to-end request tracking across components
- Metrics Collection: Performance monitoring and usage analytics
- Multiple Exporters: Support for various backends (Console, OTLP, Jaeger, Prometheus, Azure Monitor)
- Privacy-First: Built-in redaction of sensitive data
See Configuration for complete examples.
Minimal configuration (console output with Debug logging):

```json
{
  "telemetry": {
    "enabled": true,
    "serviceName": "calendar-mcp",
    "console": {
      "enabled": true,
      "logLevel": "Debug"
    }
  }
}
```

A fuller configuration with OTLP export, 10% trace sampling, and redaction:

```json
{
  "telemetry": {
    "enabled": true,
    "serviceName": "calendar-mcp",
    "serviceVersion": "1.0.0",
    "otlp": {
      "enabled": true,
      "endpoint": "http://collector:4317",
      "protocol": "grpc"
    },
    "sampling": {
      "samplingRate": 0.1
    },
    "redaction": {
      "enabled": true,
      "redactEmailContent": true,
      "redactTokens": true,
      "redactPii": true
    }
  }
}
```

Shared activity source:

```csharp
using System.Diagnostics;

public static class Telemetry
{
    // One source for all Calendar-MCP spans; listeners and exporters subscribe to it by name.
    public static readonly ActivitySource ActivitySource =
        new ActivitySource("CalendarMcp", "1.0.0");
}
```

MCP server tool instrumentation:

```csharp
// Assumes: using System.Diagnostics; using OpenTelemetry.Trace; (for RecordException)
public async Task<IEnumerable<EmailMessage>> GetEmailsAsync(
    string? accountId,
    int count,
    bool unreadOnly)
{
    using var activity = Telemetry.ActivitySource.StartActivity("MCP.GetEmails");
    // Tags are set before the work runs, so they are present even if it throws.
    activity?.SetTag("mcp.tool", "get_emails");
    activity?.SetTag("mcp.account_id", accountId ?? "all");
    activity?.SetTag("mcp.count", count);
    activity?.SetTag("mcp.unread_only", unreadOnly);

    try
    {
        var emails = await ExecuteAsync(accountId, count, unreadOnly);
        activity?.SetTag("mcp.result_count", emails.Count());
        return emails;
    }
    catch (Exception ex)
    {
        activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
        activity?.RecordException(ex);
        throw;
    }
}
```

Smart Router instrumentation:

```csharp
public async Task<RoutingDecision> RouteAsync(RoutingRequest request)
{
    using var activity = Telemetry.ActivitySource.StartActivity("SmartRouter.Route");
    activity?.SetTag("router.tool", request.ToolName);
    activity?.SetTag("router.query_length", request.UserQuery.Length);

    var stopwatch = Stopwatch.StartNew();
    var decision = await ExecuteRoutingLogicAsync(request);
    stopwatch.Stop();

    activity?.SetTag("router.strategy", decision.RoutingStrategy);
    activity?.SetTag("router.confidence", decision.Confidence);
    activity?.SetTag("router.account_count", decision.AccountIds.Count);
    activity?.SetTag("router.latency_ms", stopwatch.ElapsedMilliseconds);
    return decision;
}
```

Provider instrumentation (Microsoft 365):

```csharp
public async Task<IEnumerable<EmailMessage>> GetEmailsAsync(
    string accountId,
    int count,
    bool unreadOnly)
{
    using var activity = Telemetry.ActivitySource.StartActivity("M365Provider.GetEmails");
    activity?.SetTag("provider.type", "microsoft365");
    activity?.SetTag("provider.account_id", accountId);

    try
    {
        var token = await AcquireTokenAsync(accountId);
        activity?.SetTag("auth.cache_hit", token.Source == TokenSource.Cache);

        var emails = await CallGraphApiAsync(token, count, unreadOnly);
        activity?.SetTag("provider.result_count", emails.Count());
        return emails;
    }
    catch (MsalException ex)
    {
        activity?.SetStatus(ActivityStatusCode.Error, "Authentication failed");
        activity?.SetTag("error.type", "auth_failure");
        activity?.RecordException(ex);
        throw;
    }
}
```

Structured logging examples:

```csharp
// Message templates keep placeholders such as {AccountId} as named, queryable properties.
_logger.LogInformation(
    "Account {AccountId} initialized successfully. Provider: {Provider}, Domains: {Domains}",
    account.Id,
    account.Provider,
    string.Join(", ", account.Domains));

_logger.LogError(
    ex,
    "Failed to refresh token for account {AccountId}. Provider: {Provider}",
    accountId,
    provider);

_logger.LogWarning(
    "Router backend {Backend} took {LatencyMs}ms, exceeding target of {TargetMs}ms",
    backend,
    latency,
    targetLatency);
```

Decision Latency:
- `router.decision_latency_ms`: Time to make a routing decision
- Tags: `backend`, `strategy`, `tool_name`

Decision Accuracy (requires user feedback):
- `router.decision_accuracy`: Percentage of correct routing decisions
- Tags: `backend`, `strategy`

Backend Performance:
- `router.backend_latency_ms`: LLM backend response time
- `router.backend_errors`: Backend failure count
- Tags: `backend`, `model`

API Response Times:
- `provider.api_latency_ms`: Time to complete an API call
- Tags: `provider_type`, `operation`, `account_id`

Authentication Metrics:
- `provider.auth_success_count`: Successful authentications
- `provider.auth_failure_count`: Failed authentications
- `provider.token_cache_hit_rate`: Token cache hit percentage
- Tags: `provider_type`, `account_id`

API Call Volumes:
- `provider.api_calls_total`: Total API calls
- Tags: `provider_type`, `operation`, `account_id`

Tool Execution:
- `mcp.tool_execution_count`: Tool invocation count
- `mcp.tool_execution_latency_ms`: Tool execution time
- Tags: `tool_name`, `account_id`

Error Rates:
- `mcp.tool_error_count`: Tool execution failures
- Tags: `tool_name`, `error_type`

Parallel Query Performance:
- `aggregation.parallel_query_latency_ms`: Time to query all accounts
- `aggregation.account_query_latency_ms`: Per-account query time
- `aggregation.result_count`: Results per account
- Tags: `tool_name`, `account_count`

Merge/Dedupe Performance:
- `aggregation.merge_latency_ms`: Time to merge results
- `aggregation.dedupe_count`: Duplicates removed
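The instruments above can be recorded with the built-in `System.Diagnostics.Metrics` API. The sketch below is an illustration, not Calendar-MCP's code. The meter name `CalendarMcp` and the tag values `example-backend` and `example-strategy` are assumptions. An in-process `MeterListener` stands in for an exporter, so the example runs without the OpenTelemetry SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Assumed meter name; the document does not state which one Calendar-MCP registers.
using var meter = new Meter("CalendarMcp", "1.0.0");

// Instrument names and tag keys taken from the metric list above.
Histogram<double> decisionLatency = meter.CreateHistogram<double>("router.decision_latency_ms", unit: "ms");
Counter<long> toolErrors = meter.CreateCounter<long>("mcp.tool_error_count");

// Collect measurements in memory (stand-in for an exporter).
var recorded = new List<(string Name, double Value, Dictionary<string, object?> Tags)>();
using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    if (instrument.Meter.Name == "CalendarMcp") l.EnableMeasurementEvents(instrument);
};
listener.SetMeasurementEventCallback<double>((inst, value, tags, _) => Capture(inst, value, tags));
listener.SetMeasurementEventCallback<long>((inst, value, tags, _) => Capture(inst, value, tags));
listener.Start();

// Record one routing decision and one tool failure with the documented tag keys.
decisionLatency.Record(42.5,
    new KeyValuePair<string, object?>("backend", "example-backend"),
    new KeyValuePair<string, object?>("strategy", "example-strategy"),
    new KeyValuePair<string, object?>("tool_name", "get_emails"));
toolErrors.Add(1,
    new KeyValuePair<string, object?>("tool_name", "get_emails"),
    new KeyValuePair<string, object?>("error_type", "auth_failure"));

foreach (var m in recorded)
    Console.WriteLine($"{m.Name} = {m.Value} [{string.Join(", ", m.Tags)}]");

// Copies the tag span into a dictionary, since spans cannot be stored.
void Capture(Instrument inst, double value, ReadOnlySpan<KeyValuePair<string, object?>> tags)
{
    var dict = new Dictionary<string, object?>();
    foreach (var tag in tags) dict[tag.Key] = tag.Value;
    recorded.Add((inst.Name, value, dict));
}
```

In a real deployment, the OpenTelemetry SDK would subscribe to the meter by name (for example through `AddMeter`) instead of using a hand-written listener.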
Traces flow through the entire request lifecycle:
```
AI Assistant
→ MCP Server (StartActivity: "MCP.GetEmails")
→ Smart Router (StartActivity: "SmartRouter.Route")
→ LLM Backend (StartActivity: "Router.LLM.CallApi")
→ Provider Service Factory (StartActivity: "Factory.ResolveProvider")
→ M365 Provider Service (StartActivity: "M365Provider.GetEmails")
→ Authentication (StartActivity: "M365Provider.AcquireToken")
→ Graph API Call (StartActivity: "M365Provider.CallGraphApi")
```
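Parent/child links in this flow come from `Activity.Current`, which flows across `await` boundaries. A span started inside a called method therefore becomes a child of the caller's span automatically. For illustration, the sketch below assumes that the MCP server calls the router and then the provider. The method bodies are hypothetical stand-ins, and a plain `ActivityListener` replaces the OpenTelemetry SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

// Same source name and version as the Telemetry class shown earlier.
var source = new ActivitySource("CalendarMcp", "1.0.0");
var finished = new List<Activity>();

// Without a listener, StartActivity returns null; this one records every span.
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "CalendarMcp",
    Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllDataAndRecorded,
    ActivityStopped = finished.Add,
};
ActivitySource.AddActivityListener(listener);

await McpGetEmailsAsync();

// Spans stop innermost-first; all of them share the root span's TraceId.
foreach (var a in finished)
    Console.WriteLine($"{a.OperationName,-24} span={a.SpanId} parent={a.ParentSpanId}");

// Hypothetical stand-ins for the MCP server, Smart Router, and provider methods.
async Task McpGetEmailsAsync()
{
    using var activity = source.StartActivity("MCP.GetEmails");
    await RouteAsync();
    await ProviderGetEmailsAsync();
}

async Task RouteAsync()
{
    using var activity = source.StartActivity("SmartRouter.Route");
    await Task.Delay(1);
}

async Task ProviderGetEmailsAsync()
{
    using var activity = source.StartActivity("M365Provider.GetEmails");
    await Task.Delay(1);
}
```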
Each span includes:
- Unique span ID
- Parent span ID (for hierarchy)
- Start/end timestamps
- Tags (metadata)
- Status (OK, Error)
- Exceptions (if any)
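These fields map onto `System.Diagnostics.Activity` properties: `SpanId`, `ParentSpanId`, `StartTimeUtc`, `Duration`, `TagObjects`, `Status`, and `Events`. The sketch below prints them for one span from a hypothetical token-acquisition failure. It records the exception by hand as an `exception` event, similar to the event OpenTelemetry's `RecordException` adds, so it runs with only the base class library.

```csharp
using System;
using System.Diagnostics;

var source = new ActivitySource("CalendarMcp", "1.0.0");
Activity? captured = null;

using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "CalendarMcp",
    Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllDataAndRecorded,
    ActivityStopped = a => captured = a,
};
ActivitySource.AddActivityListener(listener);

using (var activity = source.StartActivity("M365Provider.AcquireToken"))
{
    activity?.SetTag("provider.account_id", "work-account");
    try
    {
        throw new InvalidOperationException("token refresh failed"); // simulated failure
    }
    catch (InvalidOperationException ex)
    {
        activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
        // Hand-written "exception" event with semantic-convention attribute names.
        activity?.AddEvent(new ActivityEvent("exception", tags: new ActivityTagsCollection
        {
            ["exception.type"] = ex.GetType().FullName,
            ["exception.message"] = ex.Message,
        }));
    }
}

var span = captured!;
Console.WriteLine($"SpanId:       {span.SpanId}");
Console.WriteLine($"ParentSpanId: {span.ParentSpanId} (all zeros for a root span)");
Console.WriteLine($"Start:        {span.StartTimeUtc:O}, duration {span.Duration.TotalMilliseconds} ms");
foreach (var tag in span.TagObjects) Console.WriteLine($"Tag:          {tag.Key} = {tag.Value}");
Console.WriteLine($"Status:       {span.Status} ({span.StatusDescription})");
foreach (var e in span.Events) Console.WriteLine($"Event:        {e.Name}");
```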
Example console output:

```
[14:32:45 INF] Activity: MCP.GetEmails
  Tags:
    mcp.tool: get_emails
    mcp.account_id: work-account
    mcp.count: 20
    mcp.result_count: 15
  Duration: 234ms
```
OTLP exporter (use case: send to an OpenTelemetry Collector for routing to multiple backends):

```json
{
  "telemetry": {
    "otlp": {
      "enabled": true,
      "endpoint": "http://localhost:4317",
      "protocol": "grpc"
    }
  }
}
```

Jaeger exporter (use case: visualize trace spans in the Jaeger UI):

```json
{
  "telemetry": {
    "jaeger": {
      "enabled": true,
      "agentHost": "localhost",
      "agentPort": 6831
    }
  }
}
```

Jaeger UI: http://localhost:16686

Prometheus exporter (use case: time-series metrics and alerting):

```json
{
  "telemetry": {
    "prometheus": {
      "enabled": true,
      "port": 9090
    }
  }
}
```

Metrics endpoint: http://localhost:9090/metrics

Azure Monitor exporter (use case: enterprise monitoring with Azure):

```json
{
  "telemetry": {
    "azureMonitor": {
      "enabled": true,
      "connectionString": "InstrumentationKey=...;IngestionEndpoint=..."
    }
  }
}
```

Always Redacted (cannot be disabled):
- Access tokens
- Refresh tokens
- Client secrets
- API keys
- Authorization headers
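Because these values are redacted unconditionally, a scrubber for them should have no configuration switch. The following is a minimal sketch under assumed conventions, not the project's implementation. The key fragments and the `Bearer` pattern are illustrative, and the placeholder matches the `[REDACTED_TOKEN]` string used by `TelemetryRedactor`.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative key fragments for the always-redacted categories listed above.
string[] sensitiveKeyFragments =
    { "access_token", "refresh_token", "client_secret", "api_key", "authorization" };

// Applied unconditionally: there is deliberately no configuration switch.
string ScrubTagValue(string key, string value)
{
    foreach (var fragment in sensitiveKeyFragments)
    {
        if (key.Contains(fragment, StringComparison.OrdinalIgnoreCase))
            return "[REDACTED_TOKEN]";
    }
    // Also catch bearer tokens embedded in free-form values, such as a logged header line.
    return Regex.Replace(value, @"Bearer\s+[A-Za-z0-9\-._~+/]+=*", "Bearer [REDACTED_TOKEN]");
}

Console.WriteLine(ScrubTagValue("http.request.header.authorization", "Bearer eyJ0eXAi")); // [REDACTED_TOKEN]
Console.WriteLine(ScrubTagValue("mcp.account_id", "work-account"));                     // work-account
Console.WriteLine(ScrubTagValue("log.message", "retrying with Bearer abc.def.ghi"));     // retrying with Bearer [REDACTED_TOKEN]
```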
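Configurable Redaction:

```json
{
  "telemetry": {
    "redaction": {
      "enabled": true,
      "redactEmailContent": true,
      "redactTokens": true,
      "redactPii": true
    }
  }
}
```

```csharp
// Hypothetical settings type for the "telemetry:redaction" section; the real one may differ.
public sealed class RedactionSettings
{
    public bool RedactEmailContent { get; set; } = true;
    public bool RedactPii { get; set; } = true;
}

public static class TelemetryRedactor
{
    // Set from the bound "telemetry:redaction" section at startup (wiring omitted).
    private static RedactionSettings _config = new();

    public static string RedactEmailAddress(string email)
    {
        if (!_config.RedactPii) return email;
        // Keep only the domain (safe for telemetry); redact malformed input entirely.
        var at = email.LastIndexOf('@');
        return at >= 0 ? $"***@{email.Substring(at + 1)}" : "***";
    }

    public static string RedactEmailContent(string content)
    {
        return _config.RedactEmailContent ? "[REDACTED]" : content;
    }

    // Tokens are always redacted, regardless of configuration.
    public static string RedactToken(string token)
    {
        return "[REDACTED_TOKEN]";
    }
}
```

Never redacted (safe for telemetry):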
- Account IDs (e.g., "work-account")
- Provider types (e.g., "microsoft365")
- Email domains (e.g., "example.com")
- Message counts
- Timestamps
- Status codes
- Latency measurements
To reduce telemetry volume, sample a fraction of traces (`"samplingRate": 0.1` keeps 10% of traces):

```json
{
  "telemetry": {
    "sampling": {
      "alwaysSample": false,
      "samplingRate": 0.1
    }
  }
}
```

Sample based on request characteristics:
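```csharp
var sampler = new CustomSampler();
sampler.AlwaysSampleErrors();              // Always trace errors
sampler.AlwaysSampleSlow(threshold: 1000); // Always trace slow requests (>1s)
sampler.SampleNormal(rate: 0.1);           // 10% for normal requests
```

Rules based on errors or latency depend on information that is only available once a span ends. They therefore typically require tail-based sampling (for example, the OpenTelemetry Collector's `tail_sampling` processor) rather than a head sampler that decides when the span starts.

Find slow router decisions:

```
service="calendar-mcp"
operation="SmartRouter.Route"
duration>500ms
```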
Find authentication failures:

```
service="calendar-mcp"
tag["error.type"]="auth_failure"
```
Router performance by backend:

```kql
customMetrics
| where name == "router.decision_latency_ms"
| summarize avg(value), percentiles(value, 50, 95, 99) by tostring(customDimensions.backend)
```

Failed API calls by provider:

```kql
dependencies
| where type == "HTTP"
| where success == false
| summarize count() by tostring(customDimensions.provider_type), resultCode
```

Best practices:

- Start spans early: Create the activity at method entry
- Set tags before operations: Tag before executing so the metadata is present even if an exception is thrown
- Always record exceptions: Use `activity?.RecordException(ex)`
- Use structured logging: Include context in log messages
- Tag with account context: Always include accountId for filtering
- Measure latency: Use Stopwatch for timing
- Set meaningful status codes: OK for success, Error for failures
- Keep tag names consistent: Use namespaced tags (e.g., `mcp.*`, `router.*`, `provider.*`)
- Redact by default: Privacy first; opt in to more verbose telemetry
- Sample in production: Don't trace 100% of requests at scale
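For the head-sampling part (the `samplingRate` setting), the decision can be derived from the trace ID, so every span in a trace is kept or dropped together. This is similar in spirit to OpenTelemetry's `ParentBasedSampler` wrapping a `TraceIdRatioBasedSampler`. The sketch below uses a bare `ActivityListener` and is not Calendar-MCP's sampler; the 10% rate mirrors the configuration example.

```csharp
using System;
using System.Buffers.Binary;
using System.Diagnostics;

const double samplingRate = 0.1; // mirrors "samplingRate": 0.1
const int total = 10_000;

var source = new ActivitySource("CalendarMcp", "1.0.0");
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "CalendarMcp",
    Sample = SampleByTraceId,
};
ActivitySource.AddActivityListener(listener);

int sampled = 0;
for (int i = 0; i < total; i++)
{
    // Each iteration starts a new root span; unsampled ones are created but not recorded.
    using var activity = source.StartActivity("MCP.GetEmails");
    if (activity is { Recorded: true }) sampled++;
}
Console.WriteLine($"Recorded {sampled} of {total} root spans (expected about {samplingRate:P0})");

ActivitySamplingResult SampleByTraceId(ref ActivityCreationOptions<ActivityContext> options)
{
    // Child spans follow the parent's decision, so a trace is never half-sampled.
    if (options.Parent != default)
    {
        return (options.Parent.TraceFlags & ActivityTraceFlags.Recorded) != 0
            ? ActivitySamplingResult.AllDataAndRecorded
            : ActivitySamplingResult.PropagationData;
    }

    // Root spans: map the low 8 bytes of the trace ID onto [0, 1] and compare with the rate.
    Span<byte> bytes = stackalloc byte[16];
    options.TraceId.CopyTo(bytes);
    ulong low = BinaryPrimitives.ReadUInt64BigEndian(bytes.Slice(8));
    return low / (double)ulong.MaxValue < samplingRate
        ? ActivitySamplingResult.AllDataAndRecorded
        : ActivitySamplingResult.PropagationData;
}
```

Returning `PropagationData` rather than `None` keeps the trace context flowing to downstream calls, so they can honor the same decision.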