
Conversation

@AkhileshNegi
Collaborator

@AkhileshNegi AkhileshNegi commented Nov 24, 2025

Summary

Target issue is #449

Checklist

Before submitting a pull request, please ensure that you mark these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested.
  • If you've fixed a bug or added code, ensure it is tested and has test cases.

Notes

  • Upload flow now accepts pre-parsed item lists (no raw CSV parsing), with case-insensitive header handling, relaxed validation, and clearer errors.
  • Per-item processing with intermediate and final flushes improves upload reliability and reduces race conditions.
  • Improved persistence error handling with safer rollback on failures.
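
The per-item-flush upload flow described above can be sketched roughly as follows. This is an illustrative stand-in, not the PR's actual code: `StubLangfuseClient` is a hypothetical stub that only counts calls, though the method names mirror the `create_dataset_item`/`flush` pattern discussed in the review.

```python
class StubLangfuseClient:
    """Hypothetical stand-in for the Langfuse client; only records call counts."""

    def __init__(self) -> None:
        self.items_created = 0
        self.flush_calls = 0

    def create_dataset_item(
        self, dataset_name: str, input: dict, expected_output: dict
    ) -> None:
        self.items_created += 1

    def flush(self) -> None:
        self.flush_calls += 1


def upload_items(
    client: StubLangfuseClient,
    dataset_name: str,
    items: list[dict[str, str]],
    duplication_factor: int,
) -> int:
    """Upload pre-parsed items, flushing after each original item and once at the end."""
    total_uploaded = 0
    for item in items:
        for _ in range(duplication_factor):
            client.create_dataset_item(
                dataset_name,
                input={"question": item["question"]},
                expected_output={"answer": item["answer"]},
            )
            total_uploaded += 1
        # Per-original-item flush keeps each Q&A pair's duplicates from
        # interleaving with the next item's batch.
        client.flush()
    client.flush()  # final safety-net flush
    return total_uploaded
```

With 3 items and a duplication factor of 2, this sketch produces 6 uploads and 4 flushes (3 per-item plus 1 final), which matches the flush-count assertions quoted in the review comments below.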

Summary by CodeRabbit

  • New Features

    • CSV dataset uploads now support case-insensitive column headers for greater flexibility.
  • Improvements

    • Enhanced error handling during dataset uploads with improved rollback mechanisms.
  • Tests

    • Updated test coverage for flexible CSV header matching and upload scenarios.


@coderabbitai

coderabbitai bot commented Nov 24, 2025

Walkthrough

Removes the async CSV upload function from backend/app/crud/evaluations/core.py, renames and reworks the Langfuse upload to accept pre-parsed item dicts with per-original-item flushes, wraps DB commit/refresh calls in try/except with rollback, normalizes CSV headers case-insensitively in the API and maps columns, updates public exports, and adjusts tests to use the renamed upload function and items input shape.

Changes

Cohort / File(s) Summary
Core CRUD changes
backend/app/crud/evaluations/core.py
Deleted the public async CSV upload function. Updated CRUD functions (create_evaluation_run, list_evaluation_runs, get_evaluation_run_by_id, update_evaluation_run) to wrap commit/refresh in try/except with rollback and improved error handling; removed imports/types tied to the deleted upload function.
Langfuse upload behavior
backend/app/crud/evaluations/langfuse.py
Renamed upload_dataset_to_langfuse_from_csv → upload_dataset_to_langfuse; signature now accepts items: list[dict[str, str]] instead of CSV bytes. Removed CSV parsing/validation and iterates provided items, duplicating per duplication_factor, performing a per-original-item flush inside the loop and a final flush after all items; adjusted logs and errors accordingly.
API CSV handling & responses
backend/app/api/routes/evaluation.py
Import updated to new upload function. Added _dataset_to_response(dataset) -> DatasetUploadResponse. CSV header handling broadened: validates presence of headers, normalizes headers case-insensitively, derives actual question_col/answer_col, extracts rows using those names, tracks original_items_count and total_items_count, and passes pre-parsed items to Langfuse upload. List/get endpoints now use _dataset_to_response.
Public exports
backend/app/crud/evaluations/__init__.py
Updated exported symbol name from upload_dataset_to_langfuse_from_csv to upload_dataset_to_langfuse in __all__.
Tests updated for items API & flush behavior
backend/app/tests/crud/evaluations/test_langfuse.py, backend/app/tests/api/routes/test_evaluation.py
Renamed tests, fixtures, and patches to align with upload_dataset_to_langfuse and items input; replaced CSV inputs with pre-parsed item dicts; adjusted assertions for per-original-item flush plus final flush and updated mocks/patch targets across API tests.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API as Evaluation API
    participant Lang as Langfuse Adapter/SDK
    participant DB as Database

    Client->>API: POST upload (CSV file)
    API->>API: parse CSV -> normalize headers (case-insensitive), map question/answer cols, build items list
    API->>Lang: upload_dataset_to_langfuse(items, name, duplication_factor)
    loop for each original item
        Lang->>Lang: create duplicated dataset items
        rect rgba(200,230,200,0.18)
            Lang->>Lang: per-original-item flush
        end
    end
    Lang->>Lang: final flush after all items
    API->>DB: commit & refresh evaluation run
    alt commit succeeds
        DB-->>API: ok
    else commit fails
        DB--x API: error
        API->>DB: rollback
    end
    API-->>Client: response with dataset/run details

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

  • Focus areas:
    • Verify all call sites and imports updated to the renamed function.
    • Inspect per-original-item flush placement and final flush for correctness and performance implications.
    • Validate CSV header normalization and column mapping edge cases.
    • Confirm DB commit/rollback paths and error logging.
    • Ensure tests correctly mock/patch the new symbol and reflect flush counting.

Suggested labels

ready-for-review

Suggested reviewers

  • avirajsingh7
  • Prajna1999
  • kartpop

Poem

🐰 I hopped through rows and mapped each name,
I flushed each batch to tame the racey game.
Headers folded, case made light,
I saved the run and rolled back tight.
A carrot toast to items done just right.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Evaluation: Upload Dataset Improvements' clearly relates to the main changes in the PR, which focus on improving the dataset upload flow for evaluations through refactoring CSV parsing, adding case-insensitive headers, and improving error handling.
Docstring Coverage ✅ Passed Docstring coverage is 87.50% which is sufficient. The required threshold is 80.00%.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 69ed744 and edb3a5f.

📒 Files selected for processing (1)
  • backend/app/tests/api/routes/test_evaluation.py (6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/tests/api/routes/test_evaluation.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/langfuse/**/*.py : Place Langfuse observability integration under backend/app/core/langfuse/
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (1)
backend/app/tests/api/routes/test_evaluation.py (1)

63-65: Patch targets correctly updated to renamed upload_dataset_to_langfuse function

All affected tests now patch app.api.routes.evaluation.upload_dataset_to_langfuse, keeping the mock at the route layer and aligned with the new public API name. Expectations (return_value, assert_called_once) remain valid and don’t need further adjustment here.

Also applies to: 147-149, 191-193, 230-232, 269-271, 359-361



@AkhileshNegi AkhileshNegi linked an issue Nov 24, 2025 that may be closed by this pull request
@AkhileshNegi AkhileshNegi marked this pull request as ready for review November 24, 2025 15:36
@codecov

codecov bot commented Nov 24, 2025

Codecov Report

❌ Patch coverage is 67.24138% with 19 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
backend/app/crud/evaluations/core.py 6.25% 15 Missing ⚠️
backend/app/api/routes/evaluation.py 69.23% 4 Missing ⚠️



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
backend/app/crud/evaluations/langfuse.py (1)

318-323: Per-item flush strategy looks good.

The per-item flush after processing each original item's duplicates, combined with a final flush, addresses potential batching issues in the Langfuse SDK. This pattern ensures proper separation of Q&A pairs.

Note: If datasets grow significantly beyond the current 1MB limit, consider batching flushes (e.g., every N items) to balance reliability and performance.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7e4f7d3 and 25b7e68.

📒 Files selected for processing (3)
  • backend/app/crud/evaluations/core.py (7 hunks)
  • backend/app/crud/evaluations/langfuse.py (1 hunks)
  • backend/app/tests/crud/evaluations/test_langfuse.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/crud/evaluations/langfuse.py
  • backend/app/tests/crud/evaluations/test_langfuse.py
  • backend/app/crud/evaluations/core.py
backend/app/crud/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement database access operations in backend/app/crud/

Files:

  • backend/app/crud/evaluations/langfuse.py
  • backend/app/crud/evaluations/core.py
🧠 Learnings (1)
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/langfuse/**/*.py : Place Langfuse observability integration under backend/app/core/langfuse/

Applied to files:

  • backend/app/crud/evaluations/langfuse.py
🧬 Code graph analysis (3)
backend/app/crud/evaluations/langfuse.py (1)
backend/app/core/langfuse/langfuse.py (1)
  • flush (108-109)
backend/app/tests/crud/evaluations/test_langfuse.py (1)
backend/app/core/langfuse/langfuse.py (1)
  • flush (108-109)
backend/app/crud/evaluations/core.py (2)
backend/app/core/langfuse/langfuse.py (1)
  • flush (108-109)
backend/app/tests/crud/collections/collection/test_crud_collection_read_all.py (1)
  • refresh (32-34)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (9)
backend/app/tests/crud/evaluations/test_langfuse.py (3)

419-420: Test assertions correctly updated for per-item + final flush pattern.

The flush count assertion (4 = 3 items + 1 final) aligns with the implementation changes in backend/app/crud/evaluations/langfuse.py.


492-493: LGTM: Flush count reflects valid items only.

Correctly expects 3 flushes (2 valid items + 1 final), as invalid rows are skipped before the per-item flush.


545-546: LGTM: Flush count consistent with duplication_factor=1 scenario.

The assertion (4 = 3 items + 1 final) correctly verifies the flush pattern when duplication_factor is 1.

backend/app/crud/evaluations/core.py (6)

38-46: Duplication factor validation is well-implemented.

The bounds check (1-100) with descriptive error messages provides good guardrails for users.


48-56: CSV size limit appropriately enforced.

The 1MB limit with clear error messaging helps prevent resource exhaustion. The size calculation and formatting are correct.


139-144: Per-item flush + final flush pattern is consistent.

The implementation matches the pattern in backend/app/crud/evaluations/langfuse.py and addresses batching concerns mentioned in the comments.


206-212: Robust error handling with rollback added.

The try/except wrapper with explicit rollback ensures database consistency when commit fails. This is a best practice for transactional operations.
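
The commit/rollback pattern praised here generally takes the following shape. This is a generic SQLAlchemy-style sketch: the `save_evaluation_run` name and the session interface are illustrative assumptions, not the PR's actual function.

```python
import logging

logger = logging.getLogger(__name__)


def save_evaluation_run(session, run) -> None:
    """Commit with rollback on failure so a broken transaction never leaks."""
    session.add(run)
    try:
        session.commit()
        session.refresh(run)
    except Exception:
        # Restore the session to a usable state before surfacing the error.
        session.rollback()
        logger.error("Failed to persist evaluation run", exc_info=True)
        raise  # re-raise so callers can handle or report the failure
```

Re-raising after rollback is the key detail: the caller still sees the failure, but the session is left clean for subsequent operations.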


343-349: LGTM: Database error handling with rollback.

Consistent with the pattern in create_evaluation_run. The error handling properly maintains database integrity.


16-16: All callers have been correctly updated for the sync conversion.

Verification confirms that upload_dataset_to_langfuse and its wrapper upload_dataset_to_langfuse_from_csv are both properly defined as sync functions (using def not async def). The call site in evaluation.py:240 correctly omits the await keyword, and no remaining await calls to either function were found in the codebase. The conversion has been completed correctly.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
backend/app/crud/evaluations/core.py (1)

67-98: Duplicate header detection correctly implemented.

The single-pass normalization with duplicate detection addresses the previous review concern. The implementation correctly:

  • Maps normalized names to lists of originals to detect conflicts
  • Returns a descriptive error showing which headers conflict
  • Uses the first occurrence (originals[0]) for the clean mapping

This prevents silent data loss from case-variant duplicates like "Question" and "question".

🧹 Nitpick comments (1)
backend/app/crud/evaluations/core.py (1)

48-56: Consider reusing the constant for consistency.

The size limit logic is correct. Minor suggestion: reuse max_size_bytes in the calculation to avoid duplicating the magic number.

     # Validate CSV file size (max 1MB)
     max_size_bytes = 1_048_576  # 1MB
     if len(csv_content) > max_size_bytes:
-        size_mb = len(csv_content) / 1_048_576
+        size_mb = len(csv_content) / max_size_bytes
         return (
             False,
             None,
             f"CSV file too large ({size_mb:.2f}MB). Maximum allowed is 1MB",
         )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 25b7e68 and 30792a4.

📒 Files selected for processing (1)
  • backend/app/crud/evaluations/core.py (7 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/crud/evaluations/core.py
backend/app/crud/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement database access operations in backend/app/crud/

Files:

  • backend/app/crud/evaluations/core.py
🧬 Code graph analysis (1)
backend/app/crud/evaluations/core.py (2)
backend/app/models/collection.py (1)
  • norm (92-98)
backend/app/core/langfuse/langfuse.py (1)
  • flush (108-109)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (6)
backend/app/crud/evaluations/core.py (6)

38-46: LGTM on duplication_factor validation.

The validation correctly enforces the 1–100 range per PR objectives. Minor nit: the docstring on line 31 mentions "default 5" but there's no default value in the signature—consider updating the docstring to say "Number of times to duplicate each item (1-100)".


101-134: LGTM on case-insensitive header matching.

The approach correctly validates using normalized names while accessing row data with original field names—DictReader keys match the original headers exactly, so row.get(golden_question) works correctly.


148-174: Per-item flush trade-off is acceptable given the constraints.

The flush-per-original-item approach correctly prevents batching race conditions as noted in the PR objectives. Given the 1MB file limit and max 100× duplication, the performance overhead is reasonable.

Note that partial uploads are possible if individual create_dataset_item calls fail—the function logs errors and continues, returning total_uploaded which may be less than expected. This is a reasonable graceful-degradation strategy, though callers should be aware.


236-242: LGTM on database rollback handling.

The try/except pattern with rollback, logging (including stack trace), and re-raise correctly ensures transactional integrity while maintaining visibility into failures.


373-379: Consistent rollback pattern applied.

Matches the create_evaluation_run pattern—maintains consistency across CRUD operations.


189-189: Verify id attribute presence on Langfuse Dataset objects.

The web search indicates create_dataset() returns a langfuse.api.Dataset object with attributes including name, description, and metadata, but does not definitively confirm whether id is present. The defensive hasattr check appears reasonable given incomplete SDK documentation. If the Langfuse SDK always includes an id attribute on successfully created datasets, direct access would be cleaner; however, this requires manual verification against the actual SDK implementation or official API documentation to confirm the guaranteed contract.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
backend/app/crud/evaluations/core.py (2)

140-165: Per-item flushing addresses race conditions but has performance implications.

The per-item flush after each original item's duplicates (line 162) plus the final flush (line 165) aligns with the PR objective of preventing race conditions in Langfuse SDK's batching. However, for CSVs with many rows, this could significantly increase upload time due to frequent network round-trips.

Consider documenting this trade-off or adding a comment explaining why per-item flushing is necessary for correctness.


58-63: Consider more specific error handling for CSV decoding failures.

If the CSV file isn't UTF-8 encoded, the decode("utf-8") call will raise UnicodeDecodeError, which gets caught by the generic exception handler at line 185. The resulting error message may not clearly indicate the encoding issue to users.

     try:
-        csv_text = csv_content.decode("utf-8")
+        try:
+            csv_text = csv_content.decode("utf-8")
+        except UnicodeDecodeError:
+            return False, None, "CSV file must be UTF-8 encoded"
+
         csv_reader = csv.DictReader(io.StringIO(csv_text))
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 30792a4 and 5f6f8bf.

📒 Files selected for processing (1)
  • backend/app/crud/evaluations/core.py (7 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/crud/evaluations/core.py
backend/app/crud/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement database access operations in backend/app/crud/

Files:

  • backend/app/crud/evaluations/core.py
🧬 Code graph analysis (1)
backend/app/crud/evaluations/core.py (1)
backend/app/core/langfuse/langfuse.py (1)
  • flush (108-109)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (6)
backend/app/crud/evaluations/core.py (6)

16-23: Function signature is well-typed and appropriately synchronous.

The conversion from async to sync aligns with the PR objective of preventing race conditions. Type hints are comprehensive.


38-46: Duplication factor validation looks good.

The range validation (1–100) is correctly implemented with clear error messages.


48-56: CSV file size validation is well-implemented.

The 1MB limit with a clear error message showing the actual file size provides good user feedback.


65-102: Header normalization and duplicate detection are well-implemented.

The approach correctly identifies all original headers that normalize to the same lowercase key and provides a descriptive error message. The required header validation (question/answer) uses case-insensitive matching as intended.


227-233: Database error handling with rollback is correctly implemented.

The try/except pattern properly rolls back on failure, logs the error with stack trace, and re-raises to allow caller handling.


364-370: Consistent error handling pattern for database updates.

The rollback-on-failure pattern matches create_evaluation_run, maintaining consistency across CRUD operations.

@AkhileshNegi AkhileshNegi self-assigned this Nov 25, 2025
@AkhileshNegi AkhileshNegi added the enhancement New feature or request label Nov 25, 2025
@AkhileshNegi AkhileshNegi changed the title Evaluation: Dataset Improvements Evaluation: Upload Dataset Improvements Nov 25, 2025
csv_reader = csv.DictReader(io.StringIO(csv_text))

if not csv_reader.fieldnames:
return False, None, "CSV file has no headers"

Shouldn't this error be raised instead of returned?

return False, None, "CSV file has no headers"

normalized_to_originals = {}
for field in csv_reader.fieldnames:

Won't a tuple instead of a hashmap do here if we want to remove duplicate headers?

if normalized not in normalized_to_originals:
normalized_to_originals[normalized] = []
normalized_to_originals[normalized].append(field)


LOC 62 to 103 could be extracted into a CSV sanitizing and validation util.

coderabbitai[bot]

This comment was marked as outdated.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
backend/app/api/routes/evaluation.py (2)

222-224: Preserve explicit HTTPException validation errors.

As noted in a previous review, the broad except Exception catches HTTPException raised by header validation (lines 182, 191, 210) and re-wraps it, mangling the message. Let HTTPException propagate unchanged:

+    except HTTPException:
+        # Let explicit validation errors propagate as-is
+        raise
     except Exception as e:
         logger.error(f"[upload_dataset] Failed to parse CSV | {e}", exc_info=True)
-        raise HTTPException(status_code=422, detail=f"Invalid CSV file: {e}")
+        raise HTTPException(status_code=422, detail="Invalid CSV file")

203-207: Row values may be None - .strip() can fail.

As noted in a previous review, csv.DictReader can return None for column values (e.g., when a row has fewer columns than headers). The current code will raise AttributeError: 'NoneType' object has no attribute 'strip' in that case.

Apply this fix:

         for row in csv_reader:
-            question = row.get(question_col, "").strip()
-            answer = row.get(answer_col, "").strip()
+            raw_question = row.get(question_col)
+            raw_answer = row.get(answer_col)
+            question = (raw_question or "").strip()
+            answer = (raw_answer or "").strip()
             if question and answer:
                 original_items.append({"question": question, "answer": answer})
🧹 Nitpick comments (1)
backend/app/crud/evaluations/langfuse.py (1)

252-271: Consider defensive key access for item dictionaries.

The docstring states items are "already validated," but if a malformed item dict is passed (missing question or answer key), this would raise a KeyError at lines 258-259 or 261. Since this is an internal function, the current approach may be acceptable, but you could add defensive access:

-                        input={"question": item["question"]},
-                        expected_output={"answer": item["answer"]},
+                        input={"question": item.get("question", "")},
+                        expected_output={"answer": item.get("answer", "")},

Alternatively, ensure the caller always validates items before calling this function.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bcd37e6 and 69ed744.

📒 Files selected for processing (4)
  • backend/app/api/routes/evaluation.py (6 hunks)
  • backend/app/crud/evaluations/__init__.py (2 hunks)
  • backend/app/crud/evaluations/langfuse.py (2 hunks)
  • backend/app/tests/crud/evaluations/test_langfuse.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/crud/evaluations/langfuse.py
  • backend/app/tests/crud/evaluations/test_langfuse.py
  • backend/app/crud/evaluations/__init__.py
  • backend/app/api/routes/evaluation.py
backend/app/crud/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement database access operations in backend/app/crud/

Files:

  • backend/app/crud/evaluations/langfuse.py
  • backend/app/crud/evaluations/__init__.py
backend/app/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Expose FastAPI REST endpoints under backend/app/api/ organized by domain

Files:

  • backend/app/api/routes/evaluation.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/langfuse/**/*.py : Place Langfuse observability integration under backend/app/core/langfuse/
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/langfuse/**/*.py : Place Langfuse observability integration under backend/app/core/langfuse/

Applied to files:

  • backend/app/crud/evaluations/langfuse.py
  • backend/app/tests/crud/evaluations/test_langfuse.py
  • backend/app/crud/evaluations/__init__.py
🧬 Code graph analysis (4)
backend/app/crud/evaluations/langfuse.py (1)
backend/app/core/langfuse/langfuse.py (1)
  • flush (108-109)
backend/app/tests/crud/evaluations/test_langfuse.py (2)
backend/app/crud/evaluations/langfuse.py (1)
  • upload_dataset_to_langfuse (220-295)
backend/app/core/langfuse/langfuse.py (1)
  • flush (108-109)
backend/app/crud/evaluations/__init__.py (1)
backend/app/crud/evaluations/langfuse.py (1)
  • upload_dataset_to_langfuse (220-295)
backend/app/api/routes/evaluation.py (2)
backend/app/crud/evaluations/langfuse.py (1)
  • upload_dataset_to_langfuse (220-295)
backend/app/models/evaluation.py (1)
  • DatasetUploadResponse (25-44)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (11)
backend/app/crud/evaluations/langfuse.py (2)

220-225: LGTM! Clean refactor to item-based workflow.

The function signature change from csv_content: bytes to items: list[dict[str, str]] is a good separation of concerns, moving CSV parsing responsibility to the API layer.


273-278: Per-item flush addresses race conditions but may impact performance.

The per-item flush strategy is a reasonable mitigation for Langfuse SDK batching issues. Note that for large datasets, this synchronous flushing pattern may increase upload latency. The final flush on line 278 is technically redundant when items exist but serves as a safety net for edge cases.

backend/app/crud/evaluations/__init__.py (1)

25-29: LGTM! Public API export correctly updated.

The import and __all__ export are properly aligned with the renamed function in langfuse.py.

backend/app/tests/crud/evaluations/test_langfuse.py (3)

386-393: LGTM! Test fixture correctly updated for item-based workflow.

The valid_items fixture properly provides the expected data structure with question and answer keys.


418-419: LGTM! Flush count assertions correctly match implementation.

The test correctly expects 3 items + 1 final = 4 flush calls, accurately reflecting the per-item flush pattern in the implementation.


489-511: LGTM! Error handling test correctly validates partial success scenario.

The test properly validates that item creation errors are logged but don't stop processing, and the returned total_items reflects only successful uploads.

backend/app/api/routes/evaluation.py (5)

44-54: LGTM! Good DRY improvement with response helper.

The _dataset_to_response helper eliminates duplication across the list and get endpoints. The type hint is present as required by coding guidelines.


184-199: LGTM! Clean implementation of case-insensitive header matching.

The approach of building a lowercase-to-original mapping (clean_headers) and then using the original column names for row access is correct and maintains compatibility with csv.DictReader.
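
That lowercase-to-original mapping can be sketched like this. It is an illustrative helper, not the route's actual code: csv.DictReader keys rows by the file's original headers, so rows must be read with the original column names while matching is done on the normalized ones. The None-guard before strip() reflects the fix suggested in the duplicate comments above.

```python
import csv
import io


def parse_qa_rows(csv_text: str) -> list[dict[str, str]]:
    """Parse question/answer rows with case-insensitive header matching."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Map normalized header -> original header as written in the file.
    headers = {h.strip().lower(): h for h in (reader.fieldnames or [])}
    question_col = headers.get("question")
    answer_col = headers.get("answer")
    if question_col is None or answer_col is None:
        raise ValueError("CSV must contain 'question' and 'answer' headers")

    items: list[dict[str, str]] = []
    for row in reader:
        # Row values can be None for short rows, so guard before strip().
        question = (row.get(question_col) or "").strip()
        answer = (row.get(answer_col) or "").strip()
        if question and answer:
            items.append({"question": question, "answer": answer})
    return items
```

A file with headers "Question,ANSWER" matches both required columns, and rows with a missing question or answer are skipped rather than failing the whole upload.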


118-123: Verify duplication factor range: code says 1-5, PR says 1-100.

The PR description states the duplication factor range should be "1–100", but the implementation constrains it to ge=1, le=5 (max 5). Please verify the intended range and update either the code or the PR description.

If the intended max is 100:

     duplication_factor: int = Form(
         default=5,
         ge=1,
-        le=5,
-        description="Number of times to duplicate each item (min: 1, max: 5)",
+        le=100,
+        description="Number of times to duplicate each item (min: 1, max: 100)",
     ),

261-266: LGTM! Correct usage of refactored upload function.

The call to upload_dataset_to_langfuse correctly passes the pre-parsed original_items list, aligning with the new item-based signature.


340-340: LGTM! Consistent response formatting across endpoints.

Both the list and get endpoints now use _dataset_to_response, ensuring consistent response structure.

Also applies to: 371-371


@Prajna1999 Prajna1999 left a comment


LGTM functionality-wise. Maybe we can take up the nitpicks/refactors later.

@AkhileshNegi AkhileshNegi merged commit f3b8f4d into main Nov 28, 2025
2 checks passed
@AkhileshNegi AkhileshNegi deleted the enhancement/evaluation-upload-dataset branch November 28, 2025 07:03

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Evaluation: Inconsistent Golden Q&As in Langfuse

4 participants