
Conversation

@avirajsingh7 (Collaborator) commented Nov 27, 2025

Summary

Target issue is #438
This PR introduces Langfuse observability into the LLM provider execution flow by wrapping provider_instance.execute with a configurable decorator. This allows every LLM call to automatically generate:

  • A Langfuse trace
  • A generation event
  • Success/failure metadata
  • Token usage reporting
  • Optional session grouping via conversation_id

This enables unified tracing, debugging, and analytics across all LLM providers.
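
For a concrete picture of the pattern, here is a minimal, self-contained sketch. The name observe_llm_execution, the (completion_config, query) arguments, and the (response, error) return contract come from this PR; the stub execute function and the example values are illustrative only, not the actual implementation.

from collections.abc import Callable
from functools import wraps


def observe_llm_execution(session_id: str | None = None,
                          credentials: dict | None = None) -> Callable:
    """Decorator factory: optionally wraps an LLM execute() call with Langfuse tracing."""
    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(completion_config, query, **kwargs):
            if not credentials:
                # No Langfuse credentials configured: call straight through, no tracing.
                return func(completion_config, query, **kwargs)
            # Real implementation: initialize the Langfuse client, open a trace and a
            # generation, run func, record output/usage/model, flush, and return the
            # result unchanged.
            return func(completion_config, query, **kwargs)
        return wrapper
    return decorator


# Stub standing in for provider_instance.execute (illustrative only).
def execute(completion_config, query, **kwargs):
    return {"output": "hello"}, None  # (response, error) contract


decorated_execute = observe_llm_execution(
    session_id="conversation-123",   # e.g. conversation_id extracted from the request query
    credentials=None,                # or {"public_key": ..., "secret_key": ..., "host": ...}
)(execute)

response, error = decorated_execute({"provider": "openai"}, {"input": "Hi"})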

Checklist

Before submitting a pull request, please ensure that you complete these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested the changes.
  • If you've fixed a bug or added code, it is tested and has test cases.

Notes

Please add here if any other information is required for the reviewer.

Summary by CodeRabbit

  • New Features
    • Added LLM execution observability with automatic tracing of LLM interactions.
    • Collects telemetry per execution, including model info, usage metrics, and output details.
    • Supports session and conversation tracking to correlate related requests.
    • Observability is optional and gracefully bypasses tracing when credentials or client initialization are unavailable.


@avirajsingh7 self-assigned this on Nov 27, 2025
@avirajsingh7 added the enhancement (New feature or request) label on Nov 27, 2025
coderabbitai bot commented Nov 27, 2025

Walkthrough

Adds a decorator factory observe_llm_execution that, when provided credentials, initializes a Langfuse client and wraps an LLM provider’s execute function to emit traces and generation data (output, usage, model). Integrates this decorator into execute_job to apply observability when credentials exist.

Changes

Cohort / File(s) | Change Summary
  • Langfuse tracing helper — backend/app/core/langfuse/langfuse.py: Adds `observe_llm_execution(session_id: str | None = None, credentials: dict | None = None)`, a decorator factory that initializes a Langfuse client from the supplied credentials and wraps an LLM provider's execute function to emit traces and generation data (output, usage, model).
  • LLM job execution integration — backend/app/services/llm/jobs.py: Fetches Langfuse provider credentials via get_provider_credential, extracts conversation_id from the request query, wraps provider_instance.execute with observe_llm_execution, and calls the decorated execute function.

Sequence Diagram(s)

sequenceDiagram
    participant Job as execute_job
    participant Decorator as observe_llm_execution (wrapper)
    participant Langfuse as Langfuse Client
    participant Provider as LLM Provider

    Job->>Decorator: call decorated_execute(query, ...)
    alt credentials available & client init OK
        Decorator->>Langfuse: init client (credentials)
        Decorator->>Langfuse: create trace (session_id, conversation_id)
        Decorator->>Langfuse: create generation
        Decorator->>Provider: execute(query)
        alt success
            Provider-->>Decorator: result + usage
            Decorator->>Langfuse: update generation (output, usage, model)
            Decorator->>Langfuse: update trace & flush
            Decorator-->>Job: return result
        else failure
            Provider-->>Decorator: exception
            Decorator->>Langfuse: flush data
            Decorator-->>Job: propagate exception
        end
    else credentials missing or client init failed
        Decorator->>Provider: execute(query) (bypass)
        Provider-->>Decorator: result
        Decorator-->>Job: return result
    end
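
In code terms, the wrapper body roughly follows the branches in this diagram. The sketch below is conceptual: func, credentials, and session_id are closure variables from the decorator, the attribute paths on the response come from the review comments further down, and the Langfuse client method names (trace, generation, end, flush) follow the v2 SDK referenced in this PR — verify them against the installed version.

def wrapper(completion_config, query, **kwargs):
    try:
        client = Langfuse(**credentials)                 # public_key / secret_key / host
    except Exception:
        # Client init failed: bypass tracing entirely.
        return func(completion_config, query, **kwargs)

    trace = client.trace(
        name="unified-llm-call",
        session_id=session_id,
        input=query.input,
    )
    generation = trace.generation(
        name=f"{completion_config.provider}-completion",
        input=query.input,
    )
    try:
        response, error = func(completion_config, query, **kwargs)  # actual LLM call
        if response:
            generation.end(
                output=response.response.output.text,
                usage_details={"input": response.usage.input_tokens,
                               "output": response.usage.output_tokens},
                model=response.response.model,
            )
        client.flush()
        return response, error
    except Exception:
        client.flush()   # flush whatever was recorded, then propagate
        raise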

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Files to inspect closely:
    • backend/app/core/langfuse/langfuse.py — trace/generation lifecycle, error handling, and client init logic.
    • backend/app/services/llm/jobs.py — correct credential fetching, conversation_id extraction, and decorated execute call signature compatibility.

Poem

🐰
I nibble on traces in moonlit code,
Wrapping each call down the winding road.
With credentials snug, I stitch each run,
Flushing bright threads when the work is done.
Hop—observability's neatly sowed.

Pre-merge checks

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 40.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed — Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The title 'Add Langfuse observability to Unified API' accurately summarizes the main change: introducing Langfuse observability into the LLM provider execution flow across all providers.


codecov bot commented Nov 27, 2025

Codecov Report

❌ Patch coverage is 84.78261% with 7 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Missing
  • backend/app/core/langfuse/langfuse.py — 84.21% (6 lines missing ⚠️)
  • backend/app/services/llm/jobs.py — 87.50% (1 line missing ⚠️)


coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
backend/app/core/langfuse/langfuse.py (2)

3-3: Optional: modernize typing imports and Dict usage to match Ruff hints.

Ruff is flagging Callable and Dict from typing; in Python 3.11+ you can simplify by importing Callable from collections.abc and using builtin dict[...] instead of Dict[...]. This is purely stylistic but will keep the module aligned with current best practices and avoid future deprecation noise.

Example (conceptual only):

-from typing import Any, Callable, Dict, Optional
+from collections.abc import Callable
+from typing import Any, Optional
...
-    input: Dict[str, Any],
-    metadata: Optional[Dict[str, Any]] = None,
+    input: dict[str, Any],
+    metadata: Optional[dict[str, Any]] = None,

Also applies to: 55-61, 73-78, 88-92


114-218: Tighten type hints on observe_llm_execution and its wrapper.

The decorator logic looks sound and preserves the original (response, error) contract, including graceful fallback when credentials are missing or client init fails. To better leverage type checking (and per project guidelines on type hints), consider adding explicit return types for the decorator and wrapper:

-def observe_llm_execution(
-    session_id: str | None = None,
-    credentials: dict | None = None,
-):
+def observe_llm_execution(
+    session_id: str | None = None,
+    credentials: dict | None = None,
+) -> Callable:
@@
-    def decorator(func: Callable) -> Callable:
+    def decorator(func: Callable) -> Callable:
@@
-        def wrapper(completion_config: CompletionConfig, query: QueryParams, **kwargs):
+        def wrapper(
+            completion_config: CompletionConfig,
+            query: QueryParams,
+            **kwargs,
+        ) -> tuple[LLMCallResponse | None, str | None]:

You can later narrow the Callable annotations if you want stronger guarantees, but even this minimal change makes the behavior clearer to tooling without affecting runtime.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1821b84 and 4c6a07b.

📒 Files selected for processing (2)
  • backend/app/core/langfuse/langfuse.py (2 hunks)
  • backend/app/services/llm/jobs.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/core/langfuse/langfuse.py
  • backend/app/services/llm/jobs.py
backend/app/core/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Place core functionality (config, DB session, security, exceptions, middleware) in backend/app/core/

Files:

  • backend/app/core/langfuse/langfuse.py
backend/app/core/langfuse/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Place Langfuse observability integration under backend/app/core/langfuse/

Files:

  • backend/app/core/langfuse/langfuse.py
backend/app/services/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement business logic services under backend/app/services/

Files:

  • backend/app/services/llm/jobs.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/langfuse/**/*.py : Place Langfuse observability integration under backend/app/core/langfuse/
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/langfuse/**/*.py : Place Langfuse observability integration under backend/app/core/langfuse/

Applied to files:

  • backend/app/core/langfuse/langfuse.py
  • backend/app/services/llm/jobs.py
🧬 Code graph analysis (2)
backend/app/core/langfuse/langfuse.py (3)
backend/app/models/llm/request.py (2)
  • CompletionConfig (49-58)
  • QueryParams (35-46)
backend/app/models/llm/response.py (1)
  • LLMCallResponse (42-52)
backend/app/tests/services/llm/providers/test_openai.py (2)
  • completion_config (32-37)
  • provider (27-29)
backend/app/services/llm/jobs.py (3)
backend/app/crud/credentials.py (1)
  • get_provider_credential (121-159)
backend/app/core/langfuse/langfuse.py (1)
  • observe_llm_execution (114-218)
backend/app/services/llm/providers/base.py (1)
  • execute (35-55)
🪛 Ruff (0.14.6)
backend/app/core/langfuse/langfuse.py

3-3: Import from collections.abc instead: Callable

Import from collections.abc

(UP035)


3-3: typing.Dict is deprecated, use dict instead

(UP035)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (2)
backend/app/services/llm/jobs.py (2)

187-193: Confirm get_provider_credential supports provider="langfuse".

This call assumes the credentials CRUD/validation layer recognizes "langfuse" as a valid provider; otherwise validate_provider inside get_provider_credential will raise and short‑circuit the LLM job before the actual provider executes.

Please double‑check that:

  • "langfuse" is included wherever provider names are validated, and
  • Langfuse credentials are stored with the expected shape so that decrypt_credentials returns the public_key / secret_key / host fields used in observe_llm_execution.
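
One way to make that expectation explicit in jobs.py or a test (purely illustrative; the key names come from the bullets above):

REQUIRED_LANGFUSE_KEYS = {"public_key", "secret_key", "host"}


def has_valid_langfuse_credentials(credentials: dict | None) -> bool:
    """True only when every key observe_llm_execution relies on is present."""
    return bool(credentials) and REQUIRED_LANGFUSE_KEYS.issubset(credentials)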

194-205: Verify provider/session lifetime and note clean fallback when Langfuse is absent.

decorated_execute is created and invoked after the with Session(engine) as session block has exited. That’s fine as long as:

  • get_llm_provider only uses the DB session during provider construction (e.g., to fetch credentials/config), and
  • provider_instance.execute does not depend on the original Session remaining open.

If any provider still uses the passed session during execute, it should instead manage its own short‑lived sessions internally, or decorated_execute should be moved back inside the with Session(...) block.

On the positive side, the decorator is correctly wired:

  • When langfuse_credentials is None or invalid, observe_llm_execution will call through to provider_instance.execute unchanged.
  • When credentials are valid, you get tracing without altering the external (response, error) behavior.
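
If the session lifetime does turn out to matter, a sketch of the alternative mentioned above — building and invoking the decorated call inside the session scope — could look like this (simplified; the exact get_llm_provider and get_provider_credential signatures are assumed, not taken from the diff):

with Session(engine) as session:
    provider_instance = get_llm_provider(session, completion_config)    # assumed signature
    langfuse_credentials = get_provider_credential(                      # assumed arguments
        session=session, provider="langfuse"
    )
    decorated_execute = observe_llm_execution(
        session_id=conversation_id,
        credentials=langfuse_credentials,
    )(provider_instance.execute)
    response, error = decorated_execute(completion_config, query)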

@avirajsingh7 linked an issue on Nov 27, 2025 that may be closed by this pull request
coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
backend/app/core/langfuse/langfuse.py (2)

173-175: Simplify variable declaration and assignment.

The separate type hint declarations on lines 173-174 followed by assignment on line 175 are unnecessary. Python's type inference combined with the function's return annotation provides sufficient typing.

Apply this diff:

-                # Execute the actual LLM call
-                response: LLMCallResponse | None
-                error: str | None
-                response, error = func(completion_config, query, **kwargs)
+                # Execute the actual LLM call
+                response, error = func(completion_config, query, **kwargs)

114-220: Consider leveraging the existing LangfuseTracer class to reduce duplication.

The decorator reimplements logic similar to LangfuseTracer (lines 14-111), including trace/generation creation, error handling, and flushing. Refactoring the decorator to use LangfuseTracer internally would improve maintainability and eliminate the duplicate error-handling blocks (lines 198-203 vs. 209-214).

Example refactor:

def observe_llm_execution(
    session_id: str | None = None,
    credentials: dict | None = None,
):
    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(completion_config: CompletionConfig, query: QueryParams, **kwargs):
            if not credentials or not all(
                key in credentials for key in ["public_key", "secret_key", "host"]
            ):
                logger.info("[Langfuse] No credentials - skipping observability")
                return func(completion_config, query, **kwargs)

            tracer = LangfuseTracer(credentials=credentials, session_id=session_id)
            
            # Use tracer methods for trace/generation lifecycle
            tracer.start_trace(
                name="unified-llm-call",
                input={"query": query.input},
                metadata={"provider": completion_config.provider},
                tags=[completion_config.provider],
            )
            tracer.start_generation(
                name=f"{completion_config.provider}-completion",
                input={"query": query.input},
            )
            
            try:
                response, error = func(completion_config, query, **kwargs)
                if response:
                    tracer.end_generation(
                        output={"status": "success", "output": response.response.output.text},
                        usage={"input": response.usage.input_tokens, "output": response.usage.output_tokens},
                        model=response.response.model,
                    )
                    tracer.update_trace(
                        tags=[completion_config.provider],
                        output={"status": "success", "output": response.response.output.text},
                    )
                else:
                    tracer.log_error(error or "Unknown error")
                
                tracer.flush()
                return response, error
            except Exception as e:
                tracer.log_error(str(e))
                tracer.flush()
                raise
        
        return wrapper
    return decorator
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c6a07b and 5748d69.

📒 Files selected for processing (1)
  • backend/app/core/langfuse/langfuse.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/core/langfuse/langfuse.py
backend/app/core/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Place core functionality (config, DB session, security, exceptions, middleware) in backend/app/core/

Files:

  • backend/app/core/langfuse/langfuse.py
backend/app/core/langfuse/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Place Langfuse observability integration under backend/app/core/langfuse/

Files:

  • backend/app/core/langfuse/langfuse.py
🧠 Learnings (1)
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/langfuse/**/*.py : Place Langfuse observability integration under backend/app/core/langfuse/

Applied to files:

  • backend/app/core/langfuse/langfuse.py
🧬 Code graph analysis (1)
backend/app/core/langfuse/langfuse.py (2)
backend/app/models/llm/request.py (2)
  • CompletionConfig (49-58)
  • QueryParams (35-46)
backend/app/models/llm/response.py (1)
  • LLMCallResponse (42-52)
🪛 Ruff (0.14.6)
backend/app/core/langfuse/langfuse.py

3-3: Import from collections.abc instead: Callable

Import from collections.abc

(UP035)


3-3: typing.Dict is deprecated, use dict instead

(UP035)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (1)
backend/app/core/langfuse/langfuse.py (1)

183-186: The review comment is incorrect. usage_details is the correct and preferred parameter for Langfuse 2.60.3.

Based on verification:

  • Langfuse version 2.60.3 uses usage_details as the current v2/v3 standard parameter for generation.end()
  • The format {"input": ..., "output": ...} matches the expected generic-style structure
  • The usage parameter at line 95 is legacy/v1 style but remains backward-compatible
  • Both approaches work; usage_details is actually more modern and correct

No changes are needed. The code at lines 183-186 is properly implemented.
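
For reference, the call shape being discussed looks roughly like this (values are placeholders; the snippet assumes the generation handle and response object from the surrounding code, and should be double-checked against the installed Langfuse version):

generation.end(
    output={"status": "success", "output": response.response.output.text},
    usage_details={
        "input": response.usage.input_tokens,    # prompt tokens
        "output": response.usage.output_tokens,  # completion tokens
    },
    model=response.response.model,
)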

name="unified-llm-call",
input=query.input,
metadata=trace_metadata,
tags=[completion_config.provider],
A collaborator commented:

Why is the provider detail being repeated in both metadata and tags?

session_id=session_id,
)

langfuse.flush()
A collaborator commented:

Maybe you can add a function for marking the status and error, and then use that function here.

avirajsingh7 (author) replied:

Right now, keeping it simple for extensibility.


Labels

enhancement (New feature or request), ready-for-review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Integrate Observability with Unified API

3 participants