Paulo/cli-telemetry#3163

Open
rossirpaulo wants to merge 4 commits into canary from paulo/cli-telemetry

Conversation

@rossirpaulo
Contributor

@rossirpaulo rossirpaulo commented Feb 23, 2026

Pull Request Template

Added CLI telemetry to baml-cli

Summary by CodeRabbit

Release Notes

  • New Features

    • Added telemetry collection to the Engine CLI. Telemetry can be disabled via the BAML_CLI_DISABLE_TELEMETRY environment variable.
  • Documentation

    • Added reference documentation describing telemetry behavior, captured data, privacy guarantees, and disablement instructions.

@vercel

vercel bot commented Feb 23, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| beps | Ready | Preview, Comment | Feb 23, 2026 5:31pm |
| promptfiddle | Ready | Preview, Comment | Feb 23, 2026 5:31pm |

@coderabbitai
Contributor

coderabbitai bot commented Feb 23, 2026

Walkthrough

A PostHog-based telemetry system is added to the Engine CLI that captures command-start events with minimal data (command name, CLI version, caller info, CI context, project hash, machine ID). Events are sent asynchronously without affecting command execution and can be disabled via the BAML_CLI_DISABLE_TELEMETRY environment variable.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Telemetry Implementation: `engine/cli/src/lib.rs`, `engine/cli/src/telemetry.rs` | New telemetry module added with command-start event capture. Implements project hash computation, CI provider detection, machine ID persistence, and async PostHog payload sending. Includes comprehensive tests for disablement, LSP command suppression, CI detection, and payload field validation. Module integrated into CLI run path with argv cloning and event invocation before command execution. |
| Telemetry Documentation: `docs/engine-cli-telemetry-implementation-plan.md`, `fern/03-reference/baml-cli/telemetry.mdx`, `fern/docs.yml` | Implementation plan document outlining module structure and integration points. User-facing reference documentation describing captured data, exclusions, disablement mechanism, and privacy guarantees. Navigation entry added to `docs.yml` for the new telemetry reference section. |

Sequence Diagram

sequenceDiagram
    participant CLI as Engine CLI
    participant Telemetry as telemetry module
    participant ProjectHash as Project Hash<br/>Computation
    participant CIDetect as CI Provider<br/>Detection
    participant MachineID as Machine ID<br/>Persistence
    participant PostHog as PostHog API

    CLI->>Telemetry: capture_command_started(argv, command, caller_type)
    
    alt Telemetry disabled or LSP command
        Telemetry-->>CLI: return early
    else Proceed with event capture
        Telemetry->>ProjectHash: compute_project_hash(--from, cwd)
        ProjectHash-->>Telemetry: project_hash, source
        
        Telemetry->>CIDetect: detect_ci_provider()
        CIDetect-->>Telemetry: ci_provider variant
        
        Telemetry->>MachineID: read_or_create_machine_id()
        MachineID-->>Telemetry: machine_id
        
        Telemetry->>Telemetry: build TelemetryEvent payload
        Note over Telemetry: Includes: command, subcommand, version,<br/>caller, CI info, project hash,<br/>machine/session IDs, OS/arch, argv length
        
        Telemetry->>PostHog: spawn_send_event (async HTTP POST)<br/>timeout: ~5s, fire-and-forget
        PostHog-->>Telemetry: event sent
        
        Telemetry-->>CLI: return (event sent async)
    end
    
    CLI->>CLI: continue command execution

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 23.81%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title 'Paulo/cli-telemetry' is a branch name, not a descriptive pull request title. It does not clearly summarize the main change of implementing CLI telemetry for the baml-cli. | Rename the title to clearly describe the main change, such as 'Add CLI telemetry to baml-cli' or 'Implement PostHog-based telemetry for Engine CLI'. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5


ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6133e3f and d172216.

📒 Files selected for processing (5)
  • docs/engine-cli-telemetry-implementation-plan.md
  • engine/cli/src/lib.rs
  • engine/cli/src/telemetry.rs
  • fern/03-reference/baml-cli/telemetry.mdx
  • fern/docs.yml

Comment on lines +1 to +450
# Engine CLI Telemetry Implementation Plan

Status: Proposed
Scope: Engine CLI only (`/engine/cli`), loosely coupled, no VSCode refactor
Last updated: 2026-02-16

## 1. Executive Summary

This document defines the exact implementation plan for adding PostHog telemetry to the current Engine CLI path used by existing BAML users.

The implementation is intentionally minimal and low-entropy:

1. Add one telemetry module in `engine/cli`.
2. Add one call site in `run_cli(...)`.
3. Emit one event per command invocation start.
4. Exclude `lsp` command.
5. Support only one env switch: `BAML_CLI_DISABLE_TELEMETRY`.
6. Keep VSCode telemetry untouched.

No shared abstraction across Rust and TypeScript is introduced.
No VSCode telemetry API/events/settings are modified.

## 2. Background and Context

### 2.1 Why this exists

Telemetry exists for the VSCode extension today, but not for CLI invocations. As more BAML usage is initiated by non-human workflows and wrappers, CLI visibility is needed to understand adoption and behavior.

### 2.2 Current telemetry situation

1. VSCode telemetry exists in:
- `typescript/apps/vscode-ext/src/telemetryReporter.ts`
2. Engine CLI currently has no PostHog capture.
3. Engine CLI is the real runtime path for current users (native and wrapper invocations).

### 2.3 CLI entrypoint convergence

All current-user CLI surfaces converge into:

- `engine/cli/src/lib.rs` -> `run_cli(argv, caller_type)`

Callers include:

1. Native binary:
- `engine/cli/src/main.rs`
2. Python:
- `engine/language_client_python/src/lib.rs`
3. TypeScript:
- `engine/language_client_typescript/src/lib.rs`
4. Ruby:
- `engine/language_client_ruby/ext/ruby_ffi/src/lib.rs`
5. CFFI/Go path:
- `engine/language_client_cffi/src/ffi/runtime.rs`
- `baml-cli/main.go`

This makes one module + one call site sufficient for broad CLI coverage.

## 3. Scope and Non-Goals

### 3.1 In scope

1. `engine/cli` telemetry only.
2. Non-`lsp` command start event capture.
3. Fire-and-forget transport.
4. One opt-out env var.
5. Unit tests for telemetry module behavior.
6. Documentation update for CLI telemetry behavior and env control.

### 3.2 Out of scope

1. Any refactor to VSCode telemetry architecture.
2. Any changes under `baml_language`.
3. Completion/success/failure lifecycle telemetry.
4. New CLI flags for telemetry.
5. Multi-env telemetry configuration surface (`host`, `mode`, `event`, etc.).

## 4. Design Constraints

1. Loose coupling: telemetry logic isolated inside one CLI module.
2. No user-impact risk: telemetry must never change command behavior or exit code.
3. Privacy-safe defaults:
- no raw filesystem paths
- no raw argv values
- no arbitrary env value export
4. Minimal surface area:
- one env var only
5. Operational simplicity:
- fixed event name
- fixed host/endpoint
- fixed timeout

## 5. Event Contract (Fixed)

### 5.1 Event name

`baml.engine_cli.command.started`

### 5.2 Endpoint

- Host: `https://us.i.posthog.com`
- Path: `/i/v0/e`

### 5.3 API key

Default: same PostHog project key currently used in VSCode telemetry code.
This is data-plane alignment only. There is no code-level coupling to VSCode telemetry runtime.

### 5.4 Payload shape

Top-level JSON payload:

```json
{
  "api_key": "<posthog_project_key>",
  "event": "baml.engine_cli.command.started",
  "distinct_id": "<machine_id>",
  "$process_person_profile": false,
  "properties": {
    "surface": "engine_cli",
    "schema_version": 1,
    "cli_version": "0.0.0",
    "command": "generate",
    "subcommand": null,
    "caller_output_type": "typescript",
    "caller_runtime": "typescript",
    "ci": false,
    "ci_provider": "none",
    "project_hash": "ab12cd34",
    "project_hash_source": "from_arg",
    "machine_id": "baml_machine_....",
    "session_id": "uuid-v4",
    "argv_len": 3,
    "feature_flags_count": 1,
    "os_platform": "macos",
    "os_arch": "aarch64",
    "stdout_is_tty": true,
    "stderr_is_tty": true
  }
}
```

## 6. Configuration Surface (Final)

### 6.1 Supported env var

`BAML_CLI_DISABLE_TELEMETRY`

Behavior:

1. Truthy values disable telemetry unconditionally.
2. Falsy/unset values allow telemetry.

Truthy parser set:

- `1`
- `true`
- `yes`
- `on`

Case-insensitive, surrounding whitespace trimmed.
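
A minimal sketch of the truthy parser described above, using only the standard library. The function names `is_truthy` and `telemetry_disabled` are illustrative assumptions and may not match the signatures in `telemetry.rs`; treating an unset or non-UTF-8 variable as "not disabled" is also an assumption.

```rust
use std::env;

/// Returns true when `value` is one of the truthy spellings listed above
/// (`1`, `true`, `yes`, `on`), ignoring ASCII case and surrounding whitespace.
fn is_truthy(value: &str) -> bool {
    matches!(
        value.trim().to_ascii_lowercase().as_str(),
        "1" | "true" | "yes" | "on"
    )
}

/// Hypothetical wrapper: reads the single supported switch from the environment.
/// Assumption: an unset or unreadable variable leaves telemetry enabled.
fn telemetry_disabled() -> bool {
    env::var("BAML_CLI_DISABLE_TELEMETRY")
        .map(|v| is_truthy(&v))
        .unwrap_or(false)
}
```

Keeping the string check separate from the environment read makes the parser testable without mutating process-wide state.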

### 6.2 Explicitly not added

The following are intentionally not implemented:

1. `BAML_CLI_TELEMETRY_MODE`
2. `BAML_CLI_TELEMETRY_API_KEY`
3. `BAML_CLI_TELEMETRY_HOST`
4. `BAML_CLI_TELEMETRY_EVENT_NAME`
5. `BAML_CLI_TELEMETRY_TIMEOUT_MS`
6. `BAML_CLI_TELEMETRY_PROCESS_PERSON_PROFILE`
7. `BAML_CLI_TELEMETRY_DEBUG`

Reason: keep loose coupling and reduce entropy of configuration behavior.

## 7. Data Collection Rules

### 7.1 Command capture

Capture only the start event, for every command except:

1. `lsp` (always excluded)

### 7.2 Command and subcommand mapping

Map from `commands::Commands`:

1. `Init` -> `init`
2. `Generate` -> `generate`
3. `Check` -> `check`
4. `Serve` -> `serve`
5. `Dev` -> `dev`
6. `Auth(Login|Token)` -> `auth` with `subcommand=login|token`
7. `Login` -> `login`
8. `Deploy` -> `deploy`
9. `Format` -> `fmt`
10. `Test` -> `test`
11. `DumpHIR` -> `dump_hir`
12. `DumpBytecode` -> `dump_bytecode`
13. `Repl` -> `repl`
14. `Optimize` -> `optimize`
15. `LanguageServer` -> excluded

### 7.3 Caller mapping

Derive `caller_output_type` from `RuntimeCliDefaults.output_type`.
Derive `caller_runtime` from output type mapping:

1. `OpenApi` -> `native`
2. `PythonPydantic` / `PythonPydanticV1` -> `python`
3. `Typescript` / `TypescriptReact` -> `typescript`
4. `RubySorbet` -> `ruby`
5. `Go` -> `go`
6. fallback -> `unknown`

### 7.4 Project hash derivation

Path selection order:

1. `--from <path>` from raw argv if present.
2. `cwd/baml_src` if it exists.
3. `cwd`.

Then:

1. normalize path string
2. SHA-256 hash
3. truncate to 8 hex chars

Only hash is sent, never raw path.
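
The path selection order above could be sketched as follows. This covers only the selection step; the SHA-256 hashing and truncation would need a hashing crate and are left out. The variant names of `ProjectHashSource`, the `select_project_path` name, and the choice to handle only the space-separated `--from <path>` form are assumptions made for illustration.

```rust
use std::path::{Path, PathBuf};

/// Which rule produced the path (variant names are assumed for this sketch).
#[derive(Debug, PartialEq)]
enum ProjectHashSource {
    FromArg,
    CwdBamlSrc,
    Cwd,
}

/// Hypothetical helper: picks the path to hash, in the order listed above.
fn select_project_path(argv: &[String], cwd: &Path) -> (PathBuf, ProjectHashSource) {
    // 1. `--from <path>` taken from the raw argv, if present.
    if let Some(pos) = argv.iter().position(|arg| arg == "--from") {
        if let Some(path) = argv.get(pos + 1) {
            return (PathBuf::from(path), ProjectHashSource::FromArg);
        }
    }
    // 2. `cwd/baml_src` if it exists.
    let baml_src = cwd.join("baml_src");
    if baml_src.exists() {
        return (baml_src, ProjectHashSource::CwdBamlSrc);
    }
    // 3. Fall back to `cwd` itself.
    (cwd.to_path_buf(), ProjectHashSource::Cwd)
}
```

Passing `cwd` in as a parameter, rather than reading it inside the function, keeps the fallback rules testable without changing the process working directory.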

### 7.5 CI metadata

`ci` is true when `CI` env is truthy.
`ci_provider` inferred by first match:

1. `GITHUB_ACTIONS` -> `github_actions`
2. `GITLAB_CI` -> `gitlab`
3. `CIRCLECI` -> `circleci`
4. `BUILDKITE` -> `buildkite`
5. `JENKINS_URL` -> `jenkins`
6. `TF_BUILD` -> `azure_pipelines`
7. otherwise -> `none`
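
As a rough illustration of the first-match rule, the sketch below walks the marker variables in the listed order. The environment lookup is injected as a closure so the logic can be exercised without touching real environment variables; that parameter, the string return type, and the rule that an empty value does not count as a match are assumptions of this sketch, not necessarily how `telemetry.rs` does it.

```rust
/// Hypothetical helper: returns the provider name for the first marker
/// variable that `lookup` reports as set to a non-empty value.
fn detect_ci_provider(lookup: impl Fn(&str) -> Option<String>) -> &'static str {
    // Order matters: the first matching marker wins.
    const MARKERS: [(&str, &str); 6] = [
        ("GITHUB_ACTIONS", "github_actions"),
        ("GITLAB_CI", "gitlab"),
        ("CIRCLECI", "circleci"),
        ("BUILDKITE", "buildkite"),
        ("JENKINS_URL", "jenkins"),
        ("TF_BUILD", "azure_pipelines"),
    ];
    MARKERS
        .iter()
        .find(|&&(var, _)| lookup(var).map_or(false, |value| !value.is_empty()))
        .map_or("none", |&(_, name)| name)
}
```

In production code the closure would typically be `|key| std::env::var(key).ok()`.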

## 8. Reliability and Failure Semantics

### 8.1 Delivery strategy

Fire-and-forget only:

1. Spawn detached thread.
2. In thread, create short-lived tokio runtime.
3. Use `reqwest` POST to PostHog endpoint.
4. Apply fixed timeout: 800ms.
5. Ignore all errors.
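
The detached-thread part of this strategy can be sketched with the standard library alone. The actual HTTP call (tokio runtime plus `reqwest` with a timeout) is represented here by an arbitrary closure, and `spawn_fire_and_forget` is a hypothetical name, so this shows only the spawn-and-ignore pattern rather than the real transport.

```rust
use std::thread;

/// Hypothetical helper: runs `send` on a detached thread and discards its outcome.
fn spawn_fire_and_forget<F, E>(send: F)
where
    F: FnOnce() -> Result<(), E> + Send + 'static,
{
    // `Builder::spawn` returns an error instead of panicking if the OS
    // refuses to create a thread; that error is ignored as well.
    let _ = thread::Builder::new()
        .name("telemetry".to_string())
        .spawn(move || {
            // Ignore all delivery errors.
            let _ = send();
        });
    // The JoinHandle is dropped here, which detaches the thread.
}
```

One consequence of never joining the thread is that a command which exits quickly can terminate the process before the request completes, so some events may be lost by design.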

### 8.2 Guarantees

1. Telemetry failure cannot alter stdout/stderr contract.
2. Telemetry failure cannot alter exit code.
3. Telemetry code must not panic on the main process path.

## 9. Privacy and Security

1. No raw argv values are exported.
2. No raw filesystem path values are exported.
3. No sensitive env values are exported.
4. Distinct ID is anonymous machine ID.
5. `$process_person_profile=false`.

## 10. File-Level Implementation Plan

### 10.1 New file

`engine/cli/src/telemetry.rs`

Contents:

1. Config parsing (`BAML_CLI_DISABLE_TELEMETRY`).
2. Command mapping utilities.
3. Project hash selection and hashing.
4. Machine ID persistence logic.
5. CI detection utilities.
6. Payload construction structs.
7. Async HTTP capture helper.
8. Public entry function called by `run_cli(...)`.

### 10.2 Existing file edit

`engine/cli/src/lib.rs`

Changes:

1. `mod telemetry;`
2. After `parse_from_smart(argv.clone())` and before command execution, call:
- `telemetry::capture_command_started(&argv, &cli.command, caller_type);`
3. No changes to command behavior flow.

### 10.3 Docs file add

`fern/03-reference/baml-cli/telemetry.mdx`

Contents:

1. What is captured.
2. What is not captured.
3. How to disable: `BAML_CLI_DISABLE_TELEMETRY`.
4. Privacy guarantees.

### 10.4 Docs navigation update

`fern/docs.yml`

Add telemetry page under `baml-cli` section.

## 11. Proposed Internal API (telemetry.rs)

Suggested shape:

```rust
pub(crate) fn capture_command_started(
    argv: &[String],
    command: &crate::commands::Commands,
    caller_type: baml_runtime::RuntimeCliDefaults,
)
```

Supporting internal items:

1. `struct TelemetryEvent`
2. `struct TelemetryProperties`
3. `enum ProjectHashSource`
4. `enum CiProvider`
5. `fn env_truthy(key: &str) -> bool`
6. `fn telemetry_disabled() -> bool`
7. `fn is_lsp_command(command: &Commands) -> bool`
8. `fn map_command(command: &Commands) -> (&'static str, Option<&'static str>)`
9. `fn map_caller_runtime(...) -> &'static str`
10. `fn compute_project_hash(argv: &[String]) -> (String, ProjectHashSource)`
11. `fn get_or_create_machine_id() -> String`
12. `async fn send_event(payload: &TelemetryEvent) -> anyhow::Result<()>`

## 12. Machine ID Persistence Plan

Use app config strategy similar to existing CLI credential storage patterns.

Directory strategy:

1. top-level-domain: `com`
2. author: `boundaryml`
3. app-name: `baml-cli`

File:

`<config_dir>/telemetry_machine_id`

Behavior:

1. Read if exists and non-empty.
2. If missing/unreadable/empty:
- generate UUID v4
- attempt write
- use generated value even if write fails
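
The read-or-create behavior above might look roughly like this. To stay dependency-free, the UUID generator is passed in as a closure (the plan calls for UUID v4, which would come from a crate such as `uuid`); the function name `machine_id_from_file` and its signature are assumptions for illustration.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical helper: returns the stored ID, or generates and persists a new one.
fn machine_id_from_file(path: &Path, generate: impl FnOnce() -> String) -> String {
    // 1. Reuse the stored ID when the file is readable and non-empty.
    if let Ok(contents) = fs::read_to_string(path) {
        let existing = contents.trim();
        if !existing.is_empty() {
            return existing.to_string();
        }
    }
    // 2. Otherwise generate a fresh ID and attempt to persist it.
    let id = generate();
    if let Some(parent) = path.parent() {
        let _ = fs::create_dir_all(parent);
    }
    let _ = fs::write(path, &id);
    // Use the generated value even if the write failed.
    id
}
```

Because write errors are swallowed, a read-only config directory simply yields a new ID on every invocation, which matches the ephemeral fallback described in the risk register.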

## 13. Test Plan

Unit tests in `engine/cli/src/telemetry.rs` (or companion test module):

1. `env_truthy` parser:
- truthy set passes
- falsy/unset fails
2. disable behavior:
- disabled env suppresses send
3. lsp exclusion:
- `LanguageServer` command never sends
4. command mapping:
- all commands map to expected `command`/`subcommand`
5. project hash source fallback:
- `--from` path
- `cwd/baml_src`
- `cwd`
6. CI provider mapping:
- each provider env produces expected enum/name
7. payload shape:
- required fields populated
- no raw path/argv in serialized payload

Non-goal for tests:

1. No live network integration test with PostHog endpoint.

## 14. Acceptance Criteria

1. Any non-`lsp` CLI invocation emits one start event when not disabled.
2. `baml-cli lsp` emits no telemetry.
3. `BAML_CLI_DISABLE_TELEMETRY=1` disables all CLI telemetry.
4. Network failures/timeouts do not alter command behavior or exit code.
5. Payload never includes raw path or argv strings.
6. VSCode telemetry behavior remains unchanged.

## 15. Rollout and Verification

### 15.1 Rollout

Single-release rollout with no feature flag beyond env disable switch.

### 15.2 Verification checklist

1. Manual run:
- `baml-cli generate`
- confirm send path executed
2. Manual run:
- `BAML_CLI_DISABLE_TELEMETRY=1 baml-cli generate`
- confirm no send path
3. Manual run:
- `baml-cli lsp`
- confirm no send path
4. Confirm VSCode extension telemetry code unchanged in diff.

## 16. Risk Register

1. Risk: unintentional latency from network call.
- Mitigation: detached thread + short timeout + no await in main path.
2. Risk: accidental data leakage.
- Mitigation: explicit allowlist-only payload construction.
3. Risk: storage failures for machine id.
- Mitigation: fallback to ephemeral UUID.
4. Risk: future config sprawl.
- Mitigation: single env var contract; reject extra knobs for this iteration.

## 17. Explicit Change Approval List

The implementation of this plan introduces exactly these repo-level changes:

1. Add `engine/cli/src/telemetry.rs`
2. Edit `engine/cli/src/lib.rs`
3. Add `fern/03-reference/baml-cli/telemetry.mdx`
4. Edit `fern/docs.yml`

No other files are required for the planned implementation.

## 18. Appendix: Coupling Clarification

This plan is loosely coupled by design:

1. CLI telemetry is implemented entirely in Rust under `engine/cli`.
2. VSCode telemetry remains separate in TypeScript.
3. No shared runtime interfaces between them.
4. Shared PostHog project key is only a data destination choice, not code coupling.

Contributor

@coderabbitai coderabbitai bot Feb 23, 2026

🧹 Nitpick | 🔵 Trivial

Thorough implementation plan — well-structured and aligned with the code.

The plan clearly documents scope, non-goals, event contract, privacy rules, and testing strategy. All described behaviors match the actual implementation in telemetry.rs.

One consideration: since this is committed as a file, it may drift from the implementation over time. Consider adding a note that the source of truth is the code itself, or moving this to a wiki/ADR with a link in the codebase.

🧰 Tools
🪛 LanguageTool

[style] ~261-~261: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...etry failure cannot alter exit code. 3. Telemetry cannot panic process path. ## 9. Priva...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~267-~267: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...filesystem path values are exported. 3. No sensitive env values are exported. 4. D...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

Contributor Author

will delete it instead

Contributor

@rossirpaulo, sounds good! Removing the implementation plan document works too.

Comment on lines +201 to +210
fn map_caller_runtime(caller_type: RuntimeCliDefaults) -> &'static str {
    match caller_type.output_type {
        GeneratorOutputType::OpenApi => "native",
        GeneratorOutputType::PythonPydantic | GeneratorOutputType::PythonPydanticV1 => "python",
        GeneratorOutputType::Typescript | GeneratorOutputType::TypescriptReact => "typescript",
        GeneratorOutputType::RubySorbet => "ruby",
        GeneratorOutputType::Go => "go",
        _ => "unknown",
    }
}
Contributor

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find the GeneratorOutputType enum definition
rg -n 'enum GeneratorOutputType' --type rust -A 50

Repository: BoundaryML/baml

Length of output: 4055


Add mapping for the Rust generator output type.

The map_caller_runtime function handles 7 of 8 variants from the GeneratorOutputType enum. The Rust variant is missing and will silently map to "unknown" in telemetry, reducing data accuracy. Additionally, the catch-all arm _ => "unknown" masks this issue: any enum variants added in the future will also fall through silently.

Add an explicit arm for Rust (e.g., => "rust") and, if the enum definition allows it, remove the wildcard so the match is exhaustive and a missing variant becomes a compile error. Note that marking the enum #[non_exhaustive] would not help here, because that attribute forces downstream crates to keep a wildcard arm.
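
A minimal sketch of the exhaustive form of this mapping. `OutputType` is a local stand-in for `GeneratorOutputType` with the eight variants mentioned in this comment; the real enum lives elsewhere in the codebase and may carry attributes that change what is possible.

```rust
/// Stand-in for `GeneratorOutputType` (assumed variant list, for illustration only).
#[derive(Debug, Clone, Copy)]
enum OutputType {
    OpenApi,
    PythonPydantic,
    PythonPydanticV1,
    Typescript,
    TypescriptReact,
    RubySorbet,
    Go,
    Rust,
}

fn map_caller_runtime(output_type: OutputType) -> &'static str {
    // No `_` arm: adding a variant to `OutputType` without extending this
    // match is a compile error rather than a silent "unknown".
    match output_type {
        OutputType::OpenApi => "native",
        OutputType::PythonPydantic | OutputType::PythonPydanticV1 => "python",
        OutputType::Typescript | OutputType::TypescriptReact => "typescript",
        OutputType::RubySorbet => "ruby",
        OutputType::Go => "go",
        OutputType::Rust => "rust",
    }
}
```

If the real enum is defined in another crate and marked `#[non_exhaustive]`, the wildcard arm has to stay, and only the explicit `Rust` arm can be added.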

Comment on lines +641 to +642
let machine_path = temp_root.join("telemetry_machine_id");
set_machine_id_path_override(Some(machine_path));
Contributor

🧹 Nitpick | 🔵 Trivial

Unnecessary set_machine_id_path_override in this test.

build_command_started_event takes machine_id as a parameter and does not call get_or_create_machine_id, so the override on Line 642 is never exercised. Consider removing it to avoid misleading future readers.

Proposed cleanup
         let _cwd_guard = CwdGuard::set(&temp_root);
 
-        let machine_path = temp_root.join("telemetry_machine_id");
-        set_machine_id_path_override(Some(machine_path));
-
         let sensitive_path = temp_root.join("private").join("sensitive").join("baml_src");
         );
-        set_machine_id_path_override(None);
 
         let value = serde_json::to_value(&payload).expect("payload should serialize");

Also applies to: 666-666

Comment on lines +1 to +2
`baml-cli` emits one telemetry event when a command starts, except for `lsp`.

Contributor

⚠️ Potential issue | 🟡 Minor

Missing required frontmatter.

This .mdx file must start with frontmatter containing title and subtitle. As per coding guidelines: "Every .mdx file must start with frontmatter containing title and subtitle in the specified format."

Proposed fix
+---
+title: Telemetry
+subtitle: Learn how to manage CLI telemetry in baml-cli
+---
+
 `baml-cli` emits one telemetry event when a command starts, except for `lsp`.

@@ -0,0 +1,51 @@
`baml-cli` emits one telemetry event when a command starts, except for `lsp`.

## Event Behavior
Contributor

⚠️ Potential issue | 🟡 Minor

Section headings should use sentence case.

Per coding guidelines, titles should have "the first word capitalized followed by lowercase unless it is a name." The following headings should be updated:

  • `## Event Behavior` → `## Event behavior`
  • `## What Is Captured` → `## What is captured`
  • `## What Is Not Captured` → `## What is not captured`
  • `## Disable Telemetry` → `## Disable telemetry`

As per coding guidelines: "Titles must always start with an uppercase letter, followed by lowercase letters unless it is a name."

Also applies to: 12-12, 25-25, 31-31
