BOR-518: add raw line ingestion fallback #1
Open
bdclaw2026 wants to merge 5 commits into
* refactor: restructure CLI (analyze includes ingest pipeline, move ingest under debug)
  - `analyze` now runs the full ingest pipeline (Drain + semantic labeling + DuckDB storage) before launching the AI agent
  - Move `ingest` command under `debug ingest` for step-by-step debugging
  - Extract shared pipeline helpers into `cmd/lapp/pipeline.go`
  - Remove top-level `templates` command
  - Add workspace path constraint to analyzer system prompt to prevent the agent from scanning files outside the workspace directory
  - Add Langfuse tracing support with docker-compose for local dev
  - Update CLAUDE.md with new CLI structure and code style notes
* feat: add OpenTelemetry distributed tracing with Jaeger backend
  Instrument the entire pipeline with OTel spans: CLI commands, multiline merge, Drain parsing, semantic labeling, DuckDB storage, and analyzer. HTTP clients for LLM calls use otelhttp transport for deep request traces.
  - Add `pkg/tracing/otel.go` with OTLP HTTP exporter (env-gated via `OTEL_TRACING_ENABLED`)
  - Add Jaeger service to docker-compose.yml (UI on port 16686)
  - Wire `InitOTel` in main.go with graceful shutdown
  - Add ctx parameter to `DrainParser.Feed`/`Templates` and `multiline.Merge`/`MergeSlice`
  - Wrap eino OpenRouter HTTP clients with `otelhttp.NewTransport`
* fix: reuse Drain templates in analyzer to keep IDs consistent with DB
  Extract `AnalyzeWithTemplates()`, which accepts pre-computed templates, so the analyze command passes the same DrainParser output to both DuckDB storage and the workspace builder. Previously, `Analyze()` created a second DrainParser with fresh UUIDs, causing template IDs in the workspace to diverge from those in the database.
* feat: replace flat workspace with structured file-based layout for AI agents
  Replace all old CLI commands (analyze, debug *) with a new `workspace` command group (create, add-log, analyze) that builds a structured directory with pattern directories named by LLM-generated semantic IDs.
  Workspace structure: `logs/`, `patterns/<semantic-id>/{pattern.md,samples.log}`, `patterns/unmatched/`, `notes/{summary.md,errors.md}`, and `AGENTS.md`.
  Closes STRRL#17
* fix: address PR review (sanitize dir names, deterministic order, unique stdin names)
  - Sanitize semantic IDs with a `[a-z0-9-]` regex before using them as directory names, to prevent path traversal from LLM output
  - Sort filenames before iterating in `mergeAllLogs` for deterministic rebuild output across runs
  - Use `UnixNano` instead of `Unix` for stdin log filenames to avoid collisions within the same second
* feat: replace positional dir args with --topic flag and auto-resolve workspace paths
  Workspace commands now accept a `--topic` flag instead of a raw directory path. Topics are sanitized to lower-kebab-case and resolved to `~/.lapp/workspaces/<topic>/`, giving users a simpler CLI interface without needing to manage paths directly.
* feat: add workspace list command and show available workspaces on errors
  Add a `workspace list` subcommand to enumerate existing workspaces. Include available workspace names in error messages when a workspace is not found, helping users discover valid `--topic` values.
* feat: integrate ACP providers for workspace analyze
* refactor: remove Gemini provider, update eino-acp to 829a6c3
  Drop Gemini ACP support, keeping only Claude and Codex providers. Update the eino-acp dependency and use its command builders instead of hardcoded command slices.

---------

Co-authored-by: bdclaw2026 <262853276+bdclaw2026@users.noreply.github.com>
Matches Loki's default. Higher depth means the prefix tree routes more precisely, reducing the number of candidate clusters that need similarity comparison at leaf nodes. This improves both accuracy and performance for pattern detection.
* feat: define v1 event schema fixtures
* fix: make event schema protobuf canonical

---------

Co-authored-by: bdclaw2026 <262853276+bdclaw2026@users.noreply.github.com>
Mirror of upstream PR STRRL#27 for Symphony workflow metadata.
Upstream PR: STRRL#27
Linear issue: https://linear.app/boringdesign/issue/BOR-518/phase-2ingestion-foundation-build-raw-log-line-ingestion-with-text
This fork PR is not the intended merge target; it exists because the upstream repository does not expose the required `symphony` label to this account.