
Conversation

@avirajsingh7 (Collaborator) commented Nov 27, 2025

Summary

Target issue is #438
This PR introduces Langfuse observability into the LLM provider execution flow by wrapping provider_instance.execute with a configurable decorator. This allows every LLM call to automatically generate:

  • A Langfuse trace
  • A generation event
  • Success/failure metadata
  • Token usage reporting
  • Optional session grouping via conversation_id

This enables unified tracing, debugging, and analytics across all LLM providers.
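
For a concrete picture of the pattern, here is a minimal, self-contained sketch. The name observe_llm_execution, the (completion_config, query) arguments, and the (response, error) return contract come from this PR; the stub execute function and the example values are illustrative only, not the actual implementation.

from collections.abc import Callable
from functools import wraps


def observe_llm_execution(session_id: str | None = None,
                          credentials: dict | None = None) -> Callable:
    """Decorator factory: optionally wraps an LLM execute() call with Langfuse tracing."""
    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(completion_config, query, **kwargs):
            if not credentials:
                # No Langfuse credentials configured: call straight through, no tracing.
                return func(completion_config, query, **kwargs)
            # Real implementation: initialize the Langfuse client, open a trace and a
            # generation, run func, record output/usage/model, flush, and return the
            # result unchanged.
            return func(completion_config, query, **kwargs)
        return wrapper
    return decorator


# Stub standing in for provider_instance.execute (illustrative only).
def execute(completion_config, query, **kwargs):
    return {"output": "hello"}, None  # (response, error) contract


decorated_execute = observe_llm_execution(
    session_id="conversation-123",   # e.g. conversation_id extracted from the request query
    credentials=None,                # or {"public_key": ..., "secret_key": ..., "host": ...}
)(execute)

response, error = decorated_execute({"provider": "openai"}, {"input": "Hi"})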

Checklist

Before submitting a pull request, please ensure that you complete these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested the changes.
  • If you've fixed a bug or added code, it is tested and has test cases.

Notes

Please add here if any other information is required for the reviewer.

Summary by CodeRabbit

  • New Features
    • Added LLM execution observability with automatic tracing of LLM interactions.
    • Collects telemetry per execution, including model info, usage metrics, and output details.
    • Supports session and conversation tracking to correlate related requests.
    • Observability is optional and gracefully bypasses tracing when credentials or client initialization are unavailable.


@avirajsingh7 self-assigned this on Nov 27, 2025
@avirajsingh7 added the enhancement (New feature or request) label on Nov 27, 2025
coderabbitai bot commented Nov 27, 2025

Walkthrough

Adds a decorator factory observe_llm_execution that, when provided credentials, initializes a Langfuse client and wraps an LLM provider’s execute function to emit traces and generation data (output, usage, model). Integrates this decorator into execute_job to apply observability when credentials exist.

Changes

Cohort / File(s) | Change Summary
  • Langfuse tracing helper — backend/app/core/langfuse/langfuse.py: Adds `observe_llm_execution(session_id: str | None = None, credentials: dict | None = None)`, a decorator factory that initializes a Langfuse client from the supplied credentials and wraps an LLM provider's execute function to emit traces and generation data (output, usage, model).
  • LLM job execution integration — backend/app/services/llm/jobs.py: Fetches Langfuse provider credentials via get_provider_credential, extracts conversation_id from the request query, wraps provider_instance.execute with observe_llm_execution, and calls the decorated execute function.

Sequence Diagram(s)

sequenceDiagram
    participant Job as execute_job
    participant Decorator as observe_llm_execution (wrapper)
    participant Langfuse as Langfuse Client
    participant Provider as LLM Provider

    Job->>Decorator: call decorated_execute(query, ...)
    alt credentials available & client init OK
        Decorator->>Langfuse: init client (credentials)
        Decorator->>Langfuse: create trace (session_id, conversation_id)
        Decorator->>Langfuse: create generation
        Decorator->>Provider: execute(query)
        alt success
            Provider-->>Decorator: result + usage
            Decorator->>Langfuse: update generation (output, usage, model)
            Decorator->>Langfuse: update trace & flush
            Decorator-->>Job: return result
        else failure
            Provider-->>Decorator: exception
            Decorator->>Langfuse: flush data
            Decorator-->>Job: propagate exception
        end
    else credentials missing or client init failed
        Decorator->>Provider: execute(query) (bypass)
        Provider-->>Decorator: result
        Decorator-->>Job: return result
    end
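
In code terms, the wrapper body roughly follows the branches in this diagram. The sketch below is conceptual: func, credentials, and session_id are closure variables from the decorator, the attribute paths on the response come from the review comments further down, and the Langfuse client method names (trace, generation, end, flush) follow the v2 SDK referenced in this PR — verify them against the installed version.

def wrapper(completion_config, query, **kwargs):
    try:
        client = Langfuse(**credentials)                 # public_key / secret_key / host
    except Exception:
        # Client init failed: bypass tracing entirely.
        return func(completion_config, query, **kwargs)

    trace = client.trace(
        name="unified-llm-call",
        session_id=session_id,
        input=query.input,
    )
    generation = trace.generation(
        name=f"{completion_config.provider}-completion",
        input=query.input,
    )
    try:
        response, error = func(completion_config, query, **kwargs)  # actual LLM call
        if response:
            generation.end(
                output=response.response.output.text,
                usage_details={"input": response.usage.input_tokens,
                               "output": response.usage.output_tokens},
                model=response.response.model,
            )
        client.flush()
        return response, error
    except Exception:
        client.flush()   # flush whatever was recorded, then propagate
        raise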

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Files to inspect closely:
    • backend/app/core/langfuse/langfuse.py — trace/generation lifecycle, error handling, and client init logic.
    • backend/app/services/llm/jobs.py — correct credential fetching, conversation_id extraction, and decorated execute call signature compatibility.

Poem

🐰
I nibble on traces in moonlit code,
Wrapping each call down the winding road.
With credentials snug, I stitch each run,
Flushing bright threads when the work is done.
Hop—observability's neatly sowed.

Pre-merge checks

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 40.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed — Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The title 'Add Langfuse observability to Unified API' accurately summarizes the main change: introducing Langfuse observability into the LLM provider execution flow across all providers.


codecov bot commented Nov 27, 2025

Codecov Report

❌ Patch coverage is 84.78261% with 7 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Missing
  • backend/app/core/langfuse/langfuse.py — 84.21% (6 lines missing ⚠️)
  • backend/app/services/llm/jobs.py — 87.50% (1 line missing ⚠️)


coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
backend/app/core/langfuse/langfuse.py (2)

3-3: Optional: modernize typing imports and Dict usage to match Ruff hints.

Ruff is flagging Callable and Dict from typing; in Python 3.11+ you can simplify by importing Callable from collections.abc and using builtin dict[...] instead of Dict[...]. This is purely stylistic but will keep the module aligned with current best practices and avoid future deprecation noise.

Example (conceptual only):

-from typing import Any, Callable, Dict, Optional
+from collections.abc import Callable
+from typing import Any, Optional
...
-    input: Dict[str, Any],
-    metadata: Optional[Dict[str, Any]] = None,
+    input: dict[str, Any],
+    metadata: Optional[dict[str, Any]] = None,

Also applies to: 55-61, 73-78, 88-92


114-218: Tighten type hints on observe_llm_execution and its wrapper.

The decorator logic looks sound and preserves the original (response, error) contract, including graceful fallback when credentials are missing or client init fails. To better leverage type checking (and per project guidelines on type hints), consider adding explicit return types for the decorator and wrapper:

-def observe_llm_execution(
-    session_id: str | None = None,
-    credentials: dict | None = None,
-):
+def observe_llm_execution(
+    session_id: str | None = None,
+    credentials: dict | None = None,
+) -> Callable:
@@
-    def decorator(func: Callable) -> Callable:
+    def decorator(func: Callable) -> Callable:
@@
-        def wrapper(completion_config: CompletionConfig, query: QueryParams, **kwargs):
+        def wrapper(
+            completion_config: CompletionConfig,
+            query: QueryParams,
+            **kwargs,
+        ) -> tuple[LLMCallResponse | None, str | None]:

You can later narrow the Callable annotations if you want stronger guarantees, but even this minimal change makes the behavior clearer to tooling without affecting runtime.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1821b84 and 4c6a07b.

📒 Files selected for processing (2)
  • backend/app/core/langfuse/langfuse.py (2 hunks)
  • backend/app/services/llm/jobs.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/core/langfuse/langfuse.py
  • backend/app/services/llm/jobs.py
backend/app/core/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Place core functionality (config, DB session, security, exceptions, middleware) in backend/app/core/

Files:

  • backend/app/core/langfuse/langfuse.py
backend/app/core/langfuse/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Place Langfuse observability integration under backend/app/core/langfuse/

Files:

  • backend/app/core/langfuse/langfuse.py
backend/app/services/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement business logic services under backend/app/services/

Files:

  • backend/app/services/llm/jobs.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/langfuse/**/*.py : Place Langfuse observability integration under backend/app/core/langfuse/
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/langfuse/**/*.py : Place Langfuse observability integration under backend/app/core/langfuse/

Applied to files:

  • backend/app/core/langfuse/langfuse.py
  • backend/app/services/llm/jobs.py
🧬 Code graph analysis (2)
backend/app/core/langfuse/langfuse.py (3)
backend/app/models/llm/request.py (2)
  • CompletionConfig (49-58)
  • QueryParams (35-46)
backend/app/models/llm/response.py (1)
  • LLMCallResponse (42-52)
backend/app/tests/services/llm/providers/test_openai.py (2)
  • completion_config (32-37)
  • provider (27-29)
backend/app/services/llm/jobs.py (3)
backend/app/crud/credentials.py (1)
  • get_provider_credential (121-159)
backend/app/core/langfuse/langfuse.py (1)
  • observe_llm_execution (114-218)
backend/app/services/llm/providers/base.py (1)
  • execute (35-55)
🪛 Ruff (0.14.6)
backend/app/core/langfuse/langfuse.py

3-3: Import from collections.abc instead: Callable

Import from collections.abc

(UP035)


3-3: typing.Dict is deprecated, use dict instead

(UP035)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (2)
backend/app/services/llm/jobs.py (2)

187-193: Confirm get_provider_credential supports provider="langfuse".

This call assumes the credentials CRUD/validation layer recognizes "langfuse" as a valid provider; otherwise validate_provider inside get_provider_credential will raise and short‑circuit the LLM job before the actual provider executes.

Please double‑check that:

  • "langfuse" is included wherever provider names are validated, and
  • Langfuse credentials are stored with the expected shape so that decrypt_credentials returns the public_key / secret_key / host fields used in observe_llm_execution.
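
One way to make that expectation explicit in jobs.py or a test (purely illustrative; the key names come from the bullets above):

REQUIRED_LANGFUSE_KEYS = {"public_key", "secret_key", "host"}


def has_valid_langfuse_credentials(credentials: dict | None) -> bool:
    """True only when every key observe_llm_execution relies on is present."""
    return bool(credentials) and REQUIRED_LANGFUSE_KEYS.issubset(credentials)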

194-205: Verify provider/session lifetime and note clean fallback when Langfuse is absent.

decorated_execute is created and invoked after the with Session(engine) as session block has exited. That’s fine as long as:

  • get_llm_provider only uses the DB session during provider construction (e.g., to fetch credentials/config), and
  • provider_instance.execute does not depend on the original Session remaining open.

If any provider still uses the passed session during execute, it should instead manage its own short‑lived sessions internally, or decorated_execute should be moved back inside the with Session(...) block.

On the positive side, the decorator is correctly wired:

  • When langfuse_credentials is None or invalid, observe_llm_execution will call through to provider_instance.execute unchanged.
  • When credentials are valid, you get tracing without altering the external (response, error) behavior.
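
If the session lifetime does turn out to matter, a sketch of the alternative mentioned above — building and invoking the decorated call inside the session scope — could look like this (simplified; the exact get_llm_provider and get_provider_credential signatures are assumed, not taken from the diff):

with Session(engine) as session:
    provider_instance = get_llm_provider(session, completion_config)    # assumed signature
    langfuse_credentials = get_provider_credential(                      # assumed arguments
        session=session, provider="langfuse"
    )
    decorated_execute = observe_llm_execution(
        session_id=conversation_id,
        credentials=langfuse_credentials,
    )(provider_instance.execute)
    response, error = decorated_execute(completion_config, query)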

@avirajsingh7 linked an issue on Nov 27, 2025 that may be closed by this pull request
coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
backend/app/core/langfuse/langfuse.py (2)

173-175: Simplify variable declaration and assignment.

The separate type hint declarations on lines 173-174 followed by assignment on line 175 are unnecessary. Python's type inference combined with the function's return annotation provides sufficient typing.

Apply this diff:

-                # Execute the actual LLM call
-                response: LLMCallResponse | None
-                error: str | None
-                response, error = func(completion_config, query, **kwargs)
+                # Execute the actual LLM call
+                response, error = func(completion_config, query, **kwargs)

114-220: Consider leveraging the existing LangfuseTracer class to reduce duplication.

The decorator reimplements logic similar to LangfuseTracer (lines 14-111), including trace/generation creation, error handling, and flushing. Refactoring the decorator to use LangfuseTracer internally would improve maintainability and eliminate the duplicate error-handling blocks (lines 198-203 vs. 209-214).

Example refactor:

def observe_llm_execution(
    session_id: str | None = None,
    credentials: dict | None = None,
):
    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(completion_config: CompletionConfig, query: QueryParams, **kwargs):
            if not credentials or not all(
                key in credentials for key in ["public_key", "secret_key", "host"]
            ):
                logger.info("[Langfuse] No credentials - skipping observability")
                return func(completion_config, query, **kwargs)

            tracer = LangfuseTracer(credentials=credentials, session_id=session_id)
            
            # Use tracer methods for trace/generation lifecycle
            tracer.start_trace(
                name="unified-llm-call",
                input={"query": query.input},
                metadata={"provider": completion_config.provider},
                tags=[completion_config.provider],
            )
            tracer.start_generation(
                name=f"{completion_config.provider}-completion",
                input={"query": query.input},
            )
            
            try:
                response, error = func(completion_config, query, **kwargs)
                if response:
                    tracer.end_generation(
                        output={"status": "success", "output": response.response.output.text},
                        usage={"input": response.usage.input_tokens, "output": response.usage.output_tokens},
                        model=response.response.model,
                    )
                    tracer.update_trace(
                        tags=[completion_config.provider],
                        output={"status": "success", "output": response.response.output.text},
                    )
                else:
                    tracer.log_error(error or "Unknown error")
                
                tracer.flush()
                return response, error
            except Exception as e:
                tracer.log_error(str(e))
                tracer.flush()
                raise
        
        return wrapper
    return decorator
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c6a07b and 5748d69.

📒 Files selected for processing (1)
  • backend/app/core/langfuse/langfuse.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/core/langfuse/langfuse.py
backend/app/core/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Place core functionality (config, DB session, security, exceptions, middleware) in backend/app/core/

Files:

  • backend/app/core/langfuse/langfuse.py
backend/app/core/langfuse/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Place Langfuse observability integration under backend/app/core/langfuse/

Files:

  • backend/app/core/langfuse/langfuse.py
🧠 Learnings (1)
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/langfuse/**/*.py : Place Langfuse observability integration under backend/app/core/langfuse/

Applied to files:

  • backend/app/core/langfuse/langfuse.py
🧬 Code graph analysis (1)
backend/app/core/langfuse/langfuse.py (2)
backend/app/models/llm/request.py (2)
  • CompletionConfig (49-58)
  • QueryParams (35-46)
backend/app/models/llm/response.py (1)
  • LLMCallResponse (42-52)
🪛 Ruff (0.14.6)
backend/app/core/langfuse/langfuse.py

3-3: Import from collections.abc instead: Callable

Import from collections.abc

(UP035)


3-3: typing.Dict is deprecated, use dict instead

(UP035)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (1)
backend/app/core/langfuse/langfuse.py (1)

183-186: The review comment is incorrect. usage_details is the correct and preferred parameter for Langfuse 2.60.3.

Based on verification:

  • Langfuse version 2.60.3 uses usage_details as the current v2/v3 standard parameter for generation.end()
  • The format {"input": ..., "output": ...} matches the expected generic-style structure
  • The usage parameter at line 95 is legacy/v1 style but remains backward-compatible
  • Both approaches work; usage_details is actually more modern and correct

No changes are needed. The code at lines 183-186 is properly implemented.
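
For reference, the call shape being discussed looks roughly like this (values are placeholders; the snippet assumes the generation handle and response object from the surrounding code, and should be double-checked against the installed Langfuse version):

generation.end(
    output={"status": "success", "output": response.response.output.text},
    usage_details={
        "input": response.usage.input_tokens,    # prompt tokens
        "output": response.usage.output_tokens,  # completion tokens
    },
    model=response.response.model,
)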

name="unified-llm-call",
input=query.input,
metadata=trace_metadata,
tags=[completion_config.provider],
A collaborator commented:

Why is the provider detail being repeated in both metadata and tags?

session_id=session_id,
)

langfuse.flush()
A collaborator commented:

Maybe you can add a function for marking the status and error, and then use that function here.

avirajsingh7 (author) replied:

Right now, keeping it simple for extensibility.


Labels

enhancement (New feature or request), ready-for-review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Integrate Observability with Unified API

3 participants