Conversation

@Vattikuti-Manideep-Sitaram commented Oct 10, 2025

Description

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Code refactoring
  • Performance improvement
  • Other (please specify):

Screenshots/Videos (if applicable)

Pre-submission Checklist

  • I have tested my changes thoroughly before submitting this PR
  • This PR contains minimal changes necessary to address the issue/feature
  • My code follows the project's coding standards and style guidelines
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if applicable)
  • All new and existing tests pass
  • I have searched existing PRs to ensure this change hasn't been submitted already
  • I have linked any relevant issues in the description
  • My commits have clear and descriptive messages

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.


Please make sure all the checkboxes are checked:

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have added end-to-end and unit tests (if applicable).
  • I have updated the documentation and README.md file (if necessary).
  • I have removed unnecessary code and debug statements.
  • PR title is clear and follows the convention.
  • I have tagged reviewers or team members for feedback.

Contributor

coderabbitai bot commented Oct 10, 2025

Walkthrough

Adds a new ontology graph ingestion pipeline module implementing async RDF/OWL parsing into triple dicts, mapping triples to OntologyEntity DataPoint objects with deduplicated edges/metadata, and providing async task + runner to persist datapoints into a Cognee dataset while streaming status updates.

Changes

Cohort / File(s): Ontology ingestion pipeline (cognee/api/v1/cognify/ontology_graph_pipeline.py)
Summary: New module adding load_ontology_data (async RDF/OWL parse → list[dict] triples), OntologyEntity (DataPoint subclass with name, uri, related_to, metadata), ontology_to_datapoints (builds graph of OntologyEntity objects and deduplicated edges), ontology_ingestion_task (task to load/convert/persist datapoints), and run_ontology_pipeline (env reset, dataset creation, run task and stream async status).
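For orientation, a minimal sketch of the load step as the walkthrough describes it (a sketch assuming rdflib; the actual signature in the PR evolves over the review below):

from rdflib import Graph

async def load_ontology_data(ontology_file, format: str) -> list[dict]:
    # Parse the OWL/RDF source and flatten each (subject, predicate, object)
    # triple into a plain dict of strings.
    g = Graph()
    g.parse(ontology_file, format=format)  # e.g. format="xml" for RDF/XML OWL files
    return [
        {"subject": str(s), "predicate": str(p), "object": str(o)}
        for s, p, o in g
    ]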

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor Client
    participant Runner as run_ontology_pipeline
    participant Loader as load_ontology_data
    participant Converter as ontology_to_datapoints
    participant Task as ontology_ingestion_task
    participant Store as add_data_points

    Client->>Runner: run_ontology_pipeline(file, format, dataset_name)
    activate Runner
    Runner->>Runner: prune/reset env, init Cognee, create dataset
    Runner->>Loader: load_ontology_data(file, format)
    Loader-->>Runner: triples (list[dict])
    Runner->>Converter: ontology_to_datapoints(triples)
    Converter-->>Runner: list[OntologyEntity] datapoints
    Runner->>Task: ontology_ingestion_task(datapoints, format)
    Task->>Store: add_data_points(datapoints)
    Store-->>Task: ack/result
    loop stream statuses
        Task-->>Runner: status update
        Runner-->>Client: yield status update
    end
    deactivate Runner

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

run-checks

Poem

I nibble at triples, hop from URI to name,
stitching edges gently into a datapoint frame.
Async I tumble, dedupe, and play—
datasets awaken where graphs and rabbits stay. 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Description Check: ⚠️ Warning. The PR's Description section is entirely wrapped in HTML comments, so no human-readable summary is rendered, leaving the required Description field empty despite the other sections being filled out. Resolution: remove the HTML comment markers around the Description section and provide a clear, human-written summary of the changes, including motivation, approach, and key behaviors introduced by this ontology pipeline.

✅ Passed checks (2 passed)

Title Check: ✅ Passed. The title succinctly and accurately describes the primary change of adding a new pipeline for direct OWL file ingestion, aligning well with the implemented ontology_graph_pipeline features.

Docstring Coverage: ✅ Passed. Docstring coverage is 80.00%, which meets the required threshold of 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment


@github-actions bot left a comment

Hello @Vattikuti-Manideep-Sitaram, thank you for submitting a PR! We will respond as soon as possible.

Contributor

@coderabbitai bot left a comment

Actionable comments posted: 5

🧹 Nitpick comments (2)
cognee/api/v1/cognify/ontology_graph_pipeline.py (2)

50-50: Consider preserving RDF triple structure in text representation.

The simple concatenation f"{t['subject']} {t['predicate']} {t['object']}" creates a space-separated string that loses semantic boundaries and makes URIs hard to distinguish. For better downstream processing and readability, consider using N-Triples format or a more structured representation.

Use N-Triples format for clarity:

-        text = f"{t['subject']} {t['predicate']} {t['object']}"
+        # N-Triples format: <subject> <predicate> <object> .
+        text = f"<{t['subject']}> <{t['predicate']}> <{t['object']}> ."

Or preserve structure with delimiters:

-        text = f"{t['subject']} {t['predicate']} {t['object']}"
+        text = f"Subject: {t['subject']} | Predicate: {t['predicate']} | Object: {t['object']}"

69-70: Move imports to module level.

Importing modules inside a function is a code smell that can cause issues with:

  • IDE autocomplete and type checking
  • Import time side effects
  • Code readability

Move these imports to the top of the file:

+from cognee.low_level import setup
+import cognee
+
 from cognee.modules.pipelines import run_tasks
 from cognee.modules.pipelines.tasks.task import Task

And remove them from the function:

 async def run_ontology_pipeline(ontology_file, dataset_name: str = "ontology_dataset"):
     """
     Run the ontology ingestion pipeline directly from a file object (no file path).
     """
-    from cognee.low_level import setup
-    import cognee
-
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 760a9de and 6dd70b9.

📒 Files selected for processing (1)
  • cognee/api/v1/cognify/ontology_graph_pipeline.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
cognee/api/v1/cognify/ontology_graph_pipeline.py (5)
cognee/modules/pipelines/operations/run_tasks.py (1)
  • run_tasks (62-336)
cognee/modules/pipelines/tasks/task.py (1)
  • Task (5-97)
cognee/modules/users/methods/get_default_user.py (1)
  • get_default_user (13-41)
cognee/modules/data/methods/create_dataset.py (1)
  • create_dataset (10-32)
cognee/infrastructure/databases/relational/get_relational_engine.py (1)
  • get_relational_engine (5-21)
🔇 Additional comments (2)
cognee/api/v1/cognify/ontology_graph_pipeline.py (2)

83-84: Pass format parameter through the pipeline.

If you implement format auto-detection or configurability in load_ontology_data, the format should be passed through to convert_triples_to_chunks for correct MIME type assignment.

After implementing format configurability, update the pipeline to pass the format:

+async def run_ontology_pipeline(
+    ontology_file, 
+    dataset_name: str = "ontology_dataset",
+    format: Optional[str] = None
+):
     ...
     # ✅ Process ontology file directly
-    triples = await load_ontology_data(ontology_file)
-    chunks = convert_triples_to_chunks(triples)
+    triples = await load_ontology_data(ontology_file, format=format)
+    chunks = convert_triples_to_chunks(triples, format=format or "xml")

87-94: LGTM! Task configuration is appropriate.

The task definitions for extract_graph_from_data and add_data_points with batch size 20 are well-structured and follow the existing Task patterns from the codebase.

Comment on lines 69 to 74
from cognee.low_level import setup
import cognee

await cognee.prune.prune_data()
await cognee.prune.prune_system(metadata=True)
await setup()
Contributor

@coderabbitai bot commented Oct 10, 2025

⚠️ Potential issue | 🔴 Critical

Remove destructive prune operations or make them optional.

The prune_data() and prune_system() calls delete all existing data and system metadata without warning. This is unexpected behavior for a data ingestion pipeline and could result in data loss if a user runs this on a populated instance.

Either:

  1. Remove these calls entirely and let users manage cleanup separately, or
  2. Make them opt-in via a parameter:
-async def run_ontology_pipeline(ontology_file, dataset_name: str = "ontology_dataset"):
+async def run_ontology_pipeline(
+    ontology_file, 
+    dataset_name: str = "ontology_dataset",
+    clear_existing: bool = False
+):
     """
     Run the ontology ingestion pipeline directly from a file object (no file path).
+    
+    Args:
+        ontology_file: File-like object containing RDF data
+        dataset_name: Name for the dataset
+        clear_existing: If True, prune existing data and system metadata before ingestion
     """
     from cognee.low_level import setup
     import cognee
 
-    await cognee.prune.prune_data()
-    await cognee.prune.prune_system(metadata=True)
+    if clear_existing:
+        await cognee.prune.prune_data()
+        await cognee.prune.prune_system(metadata=True)
     await setup()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:

from cognee.low_level import setup
import cognee

await cognee.prune.prune_data()
await cognee.prune.prune_system(metadata=True)
await setup()

After:

async def run_ontology_pipeline(
    ontology_file,
    dataset_name: str = "ontology_dataset",
    clear_existing: bool = False
):
    """
    Run the ontology ingestion pipeline directly from a file object (no file path).

    Args:
        ontology_file: File-like object containing RDF data
        dataset_name: Name for the dataset
        clear_existing: If True, prune existing data and system metadata before ingestion
    """
    from cognee.low_level import setup
    import cognee

    if clear_existing:
        await cognee.prune.prune_data()
        await cognee.prune.prune_system(metadata=True)
    await setup()
🤖 Prompt for AI Agents
In cognee/api/v1/cognify/ontology_graph_pipeline.py around lines 69 to 74, the
code calls cognee.prune.prune_data() and
cognee.prune.prune_system(metadata=True) which perform destructive deletions;
remove these calls or make them opt‑in by adding a parameter (e.g.,
allow_prune=False) to the surrounding function or entry point, wrap the prune
calls in an if allow_prune: block, add a clear warning log before performing
pruning, and ensure the default behavior is non‑destructive (no pruning) so
users must explicitly enable pruning to run those operations.


I followed the same pattern as the code_graph_pipeline code. I would love to get some input from the reviewer before moving ahead.

Contributor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find the code_graph_pipeline file
fd -t f "code_graph_pipeline"

# Search for files containing code_graph_pipeline
rg -l "code_graph_pipeline"

Length of output: 210


🏁 Script executed:

#!/bin/bash
# Examine the code_graph_pipeline.py file, particularly looking for prune operations
cat cognee/api/v1/cognify/code_graph_pipeline.py

Length of output: 4106


@Vattikuti-Manideep-Sitaram, you're absolutely right! I can confirm that code_graph_pipeline.py has the exact same pattern:

await cognee.prune.prune_data()
await cognee.prune.prune_system(metadata=True)
await setup()

This appears to be an established pattern for pipeline operations in the codebase. My concern about the destructive nature of these operations applies equally to both pipelines.

Given that you've followed the existing convention, here are some thoughts for the reviewers/maintainers:

  1. If this is intentional behavior: Consider adding clear documentation in both pipeline functions warning users that running these pipelines will clear all existing data and system metadata. For example:
async def run_ontology_pipeline(ontology_file, dataset_name: str = "ontology_dataset"):
    """
    Run the ontology ingestion pipeline directly from a file object.
    
    ⚠️ WARNING: This pipeline clears all existing data and system metadata 
    before processing. Ensure you have backups if needed.
    
    Args:
        ontology_file: File-like object containing RDF data
        dataset_name: Name for the dataset
    """
  2. Alternative approach: If the prune operations are meant for testing/example purposes, consider extracting them to a setup utility that both pipelines can optionally call, or making it configurable across all pipelines consistently.

For the current PR: Since you're following the established pattern, this implementation is consistent with the codebase. However, I'd recommend waiting for maintainer input on whether they want to:

  • Keep the current pattern as-is with better documentation
  • Refactor both pipelines to make pruning optional
  • Keep it as-is without changes

What are your thoughts, reviewers?


✏️ Learnings added
Learnt from: Vattikuti-Manideep-Sitaram
PR: topoteretes/cognee#1529
File: cognee/api/v1/cognify/ontology_graph_pipeline.py:69-74
Timestamp: 2025-10-11T04:18:24.585Z
Learning: The code_graph_pipeline.py and ontology_graph_pipeline.py both follow an established pattern of calling cognee.prune.prune_data() and cognee.prune.prune_system(metadata=True) at the start of pipeline execution. This appears to be intentional behavior for pipeline operations in the cognee codebase.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Contributor

@coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (5)
cognee/api/v1/cognify/ontology_graph_pipeline.py (5)

20-20: Add type hint for the ontology_file parameter.

The ontology_file parameter lacks a type hint. Consider adding a type annotation to improve code clarity and IDE support.

Apply this diff:

-async def load_ontology_data(ontology_file) -> list[dict]:
+async def load_ontology_data(ontology_file: Union[str, bytes, "BinaryIO"]) -> list[dict]:

Note: You may need to import BinaryIO from typing if using that type hint.


54-54: Chunk size calculation may be inaccurate for URIs.

The len(text.split()) calculation splits on whitespace, which is a poor measure for RDF triples containing long URIs. A triple like http://example.org/subject http://example.org/predicate http://example.org/object always counts as exactly 3 "words", no matter how much semantic content each URI carries.

Consider using a more accurate calculation method for RDF triples.

Apply this diff:

-        chunk_size=len(text.split()),
+        chunk_size=3,  # Each triple has exactly 3 components (subject, predicate, object)

46-46: Hardcoded MIME type doesn't reflect actual input format.

The mime_type is hardcoded to "application/rdf+xml", but the input ontology file could be in Turtle, N3, or other RDF formats (especially if the suggested format parameter is added to load_ontology_data).

Consider parameterizing the MIME type or deriving it from the actual input format.

If you add a format parameter to load_ontology_data, pass it through to this function:

def convert_triples_to_chunks(triples: list[dict], format: str = "xml") -> list[DocumentChunk]:
    # Map format to MIME type
    mime_type_map = {
        "xml": "application/rdf+xml",
        "turtle": "text/turtle",
        "n3": "text/n3",
        "nt": "application/n-triples",
        # ... add more as needed
    }
    mime_type = mime_type_map.get(format, "application/rdf+xml")
    
    ontology_doc = Document(
        # ...
        mime_type=mime_type
    )

69-70: Consider moving imports to module level.

Imports are placed inside the function, which is unconventional. While this can be useful for avoiding circular imports or lazy loading, it's generally better to place imports at the module level for clarity and consistency.

Apply this diff to move imports to the top of the file:

+from cognee.low_level import setup
+import cognee
+
 # ---------- Step 3: Run Ontology Pipeline ----------
 async def run_ontology_pipeline(ontology_file, dataset_name: str = "ontology_dataset"):
     """
     Run the ontology ingestion pipeline directly from a file object (no file path).
     """
-    from cognee.low_level import setup
-    import cognee
-
     await cognee.prune.prune_data()

Only keep imports inside the function if there's a specific reason (e.g., circular import issues).


87-94: Consider parameterizing task batch size.

The batch_size is hardcoded to 20 for both tasks. While this may be a reasonable default, different ontologies may benefit from different batch sizes based on their size and complexity.

Apply this diff:

-async def run_ontology_pipeline(ontology_file, dataset_name: str = "ontology_dataset"):
+async def run_ontology_pipeline(
+    ontology_file, 
+    dataset_name: str = "ontology_dataset",
+    batch_size: int = 20
+):
     """
     Run the ontology ingestion pipeline directly from a file object (no file path).
+    
+    Args:
+        ontology_file: File-like object containing RDF/OWL ontology data
+        dataset_name: Name for the dataset to create
+        batch_size: Number of chunks to process per batch
     """
     # ... existing code ...
     
     # Define pipeline tasks
     tasks = [
         Task(
             extract_graph_from_data,
             graph_model=KnowledgeGraph,
-            task_config={"batch_size": 20},
+            task_config={"batch_size": batch_size},
         ),
-        Task(add_data_points, task_config={"batch_size": 20}),
+        Task(add_data_points, task_config={"batch_size": batch_size}),
     ]
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 760a9de and 6dd70b9.

📒 Files selected for processing (1)
  • cognee/api/v1/cognify/ontology_graph_pipeline.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
cognee/api/v1/cognify/ontology_graph_pipeline.py (5)
cognee/modules/pipelines/operations/run_tasks.py (1)
  • run_tasks (62-336)
cognee/modules/pipelines/tasks/task.py (1)
  • Task (5-97)
cognee/modules/users/methods/get_default_user.py (1)
  • get_default_user (13-41)
cognee/modules/data/methods/create_dataset.py (1)
  • create_dataset (10-32)
cognee/infrastructure/databases/relational/get_relational_engine.py (1)
  • get_relational_engine (5-21)
🔇 Additional comments (1)
cognee/api/v1/cognify/ontology_graph_pipeline.py (1)

1-16: LGTM!

The imports are appropriate for the ontology ingestion pipeline functionality.

Contributor

@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
cognee/api/v1/cognify/ontology_graph_pipeline.py (4)

1-2: Remove unused imports.

The imports asyncio and Optional are not used in this file.

Apply this diff:

-import asyncio
-from typing import List, Union, Optional
+from typing import List, Union

27-40: Consider validating non-empty graph.

The function doesn't check if the parsed graph contains any triples. An empty or invalid ontology file would return an empty list without warning, potentially causing confusion downstream.

Consider adding a check after parsing:

     g = Graph()
     g.parse(ontology_file, format="xml")  # Works for standard OWL/RDF/XML
 
+    if len(g) == 0:
+        raise ValueError("Ontology file contains no triples")
+
     triples = []

44-71: Add input validation.

The function doesn't validate that the triples list is non-empty. An empty list would create an in-memory Document but no chunks, which may not be the intended behavior.

Consider adding validation:

 def convert_triples_to_chunks(triples: list[dict]) -> list[DocumentChunk]:
     """
     Convert ontology triples into Cognee-compatible DocumentChunk objects.
     """
+    if not triples:
+        raise ValueError("Cannot convert empty triples list to chunks")
+    
     chunks = []

107-107: Consider making pipeline name configurable.

The pipeline name is hardcoded as "ontology_pipeline". For flexibility, consider adding it as an optional parameter.

Apply this diff:

-async def run_ontology_pipeline(ontology_file, dataset_name: str = "ontology_dataset"):
+async def run_ontology_pipeline(
+    ontology_file, 
+    dataset_name: str = "ontology_dataset",
+    pipeline_name: str = "ontology_pipeline"
+):
     """
     Run the ontology ingestion pipeline directly from a file object (no file path).
     """
     # ... rest of function ...
 
     # Run tasks with chunks
-    async for run_status in run_tasks(tasks, dataset.id, chunks, user, "ontology_pipeline"):
+    async for run_status in run_tasks(tasks, dataset.id, chunks, user, pipeline_name):
         yield run_status
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6dd70b9 and 2f71025.

📒 Files selected for processing (1)
  • cognee/api/v1/cognify/ontology_graph_pipeline.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-11T04:18:24.585Z
Learnt from: Vattikuti-Manideep-Sitaram
PR: topoteretes/cognee#1529
File: cognee/api/v1/cognify/ontology_graph_pipeline.py:69-74
Timestamp: 2025-10-11T04:18:24.585Z
Learning: The code_graph_pipeline.py and ontology_graph_pipeline.py both follow an established pattern of calling cognee.prune.prune_data() and cognee.prune.prune_system(metadata=True) at the start of pipeline execution. This appears to be intentional behavior for pipeline operations in the cognee codebase.

Applied to files:

  • cognee/api/v1/cognify/ontology_graph_pipeline.py
🧬 Code graph analysis (1)
cognee/api/v1/cognify/ontology_graph_pipeline.py (5)
cognee/modules/pipelines/operations/run_tasks.py (1)
  • run_tasks (62-336)
cognee/modules/pipelines/tasks/task.py (1)
  • Task (5-97)
cognee/modules/users/methods/get_default_user.py (1)
  • get_default_user (13-41)
cognee/modules/data/methods/create_dataset.py (1)
  • create_dataset (10-32)
cognee/infrastructure/databases/relational/get_relational_engine.py (1)
  • get_relational_engine (5-21)

Contributor

@coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
cognee/api/v1/cognify/ontology_graph_pipeline.py (1)

91-124: Consider adding error handling for better diagnostics.

While not critical, wrapping the pipeline in a try-except block would improve error messages and make debugging easier.

Apply this diff to add error handling:

 async def run_ontology_pipeline(
     ontology_file, 
     dataset_name: str = "ontology_dataset",
     format: Optional[str] = None
 ):
     """
     Run the ontology ingestion pipeline directly from a file object (no file path).
     
     Args:
         ontology_file: File-like object containing RDF/OWL data
         dataset_name: Name for the dataset to create
         format: RDF serialization format (xml, turtle, n3, json-ld, etc.).
                 If None, rdflib will attempt auto-detection.
     """
+    try:
         from cognee.low_level import setup
         import cognee
 
         await cognee.prune.prune_data()
         await cognee.prune.prune_system(metadata=True)
         await setup()
 
         user = await get_default_user()
         db_engine = get_relational_engine()
 
         async with db_engine.get_async_session() as session:
             dataset = await create_dataset(dataset_name, user, session)
 
         # ✅ Process ontology file directly
         triples = await load_ontology_data(ontology_file, format=format)
+        
+        if not triples:
+            raise ValueError("No triples found in ontology file")
+        
         chunks = convert_triples_to_chunks(triples, format=format or "xml")
 
         # Define pipeline tasks
         tasks = [
             Task(
                 extract_graph_from_data,
                 graph_model=KnowledgeGraph,
                 task_config={"batch_size": 20},
             ),
             Task(add_data_points, task_config={"batch_size": 20}),
         ]
 
         # Run tasks with chunks
         async for run_status in run_tasks(tasks, dataset.id, chunks, user, "ontology_pipeline"):
             yield run_status
+    except Exception as e:
+        import logging
+        logging.error(f"Ontology pipeline failed: {str(e)}")
+        raise RuntimeError(f"Failed to process ontology file: {str(e)}") from e
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2f71025 and ee78dd8.

📒 Files selected for processing (1)
  • cognee/api/v1/cognify/ontology_graph_pipeline.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-11T04:18:24.585Z
Learnt from: Vattikuti-Manideep-Sitaram
PR: topoteretes/cognee#1529
File: cognee/api/v1/cognify/ontology_graph_pipeline.py:69-74
Timestamp: 2025-10-11T04:18:24.585Z
Learning: The code_graph_pipeline.py and ontology_graph_pipeline.py both follow an established pattern of calling cognee.prune.prune_data() and cognee.prune.prune_system(metadata=True) at the start of pipeline execution. This appears to be intentional behavior for pipeline operations in the cognee codebase.

Applied to files:

  • cognee/api/v1/cognify/ontology_graph_pipeline.py
🧬 Code graph analysis (1)
cognee/api/v1/cognify/ontology_graph_pipeline.py (5)
cognee/modules/pipelines/operations/run_tasks.py (1)
  • run_tasks (62-336)
cognee/modules/pipelines/tasks/task.py (1)
  • Task (5-97)
cognee/modules/users/methods/get_default_user.py (1)
  • get_default_user (13-41)
cognee/modules/data/methods/create_dataset.py (1)
  • create_dataset (10-32)
cognee/infrastructure/databases/relational/get_relational_engine.py (1)
  • get_relational_engine (5-21)
🔇 Additional comments (2)
cognee/api/v1/cognify/ontology_graph_pipeline.py (2)

51-87: LGTM! Well-structured triple-to-chunk conversion.

The MIME type mapping is comprehensive, and the chunk creation logic correctly preserves triple metadata. The format parameter enables proper content-type handling.


98-100: Prune operations follow established codebase pattern.

The prune calls are consistent with other pipelines (e.g., code_graph_pipeline.py) and appear to be intentional for pipeline initialization. Based on learnings.

Comment on lines 20 to 33
async def load_ontology_data(ontology_file: Union[str, bytes, "IOBase"]) -> list[dict]:
    """
    Loads OWL/RDF ontology directly from a file-like object and extracts RDF triples.

    Args:
        ontology_file: File-like object or path to RDF data
        format: RDF serialization format (xml, turtle, n3, json-ld, etc.).
                If None, rdflib will attempt auto-detection.
    """
    g = Graph()
    try:
        g.parse(ontology_file, format=format)
    except Exception as e:
        raise ValueError(f"Failed to parse ontology file: {str(e)}") from e
Contributor

@coderabbitai bot commented Oct 11, 2025

⚠️ Potential issue | 🔴 Critical

CRITICAL: Missing format parameter in function signature.

The function body uses format on line 31 but it's not declared as a parameter on line 20. This will cause a NameError at runtime. The docstring (lines 26-27) documents the parameter, but it's missing from the actual signature.

Apply this diff to add the missing parameter:

-async def load_ontology_data(ontology_file: Union[str, bytes, "IOBase"]) -> list[dict]:
+async def load_ontology_data(ontology_file: Union[str, bytes, "IOBase"], format: Optional[str] = None) -> list[dict]:
     """
     Loads OWL/RDF ontology directly from a file-like object and extracts RDF triples.
     
     Args:
         ontology_file: File-like object or path to RDF data
         format: RDF serialization format (xml, turtle, n3, json-ld, etc.). 
                 If None, rdflib will attempt auto-detection.
     """

Based on learnings: rdflib 7.2.1 supports auto-detection when format is None.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:

async def load_ontology_data(ontology_file: Union[str, bytes, "IOBase"]) -> list[dict]:
    """
    Loads OWL/RDF ontology directly from a file-like object and extracts RDF triples.

    Args:
        ontology_file: File-like object or path to RDF data
        format: RDF serialization format (xml, turtle, n3, json-ld, etc.).
                If None, rdflib will attempt auto-detection.
    """
    g = Graph()
    try:
        g.parse(ontology_file, format=format)
    except Exception as e:
        raise ValueError(f"Failed to parse ontology file: {str(e)}") from e

After:

async def load_ontology_data(
    ontology_file: Union[str, bytes, "IOBase"],
    format: Optional[str] = None
) -> list[dict]:
    """
    Loads OWL/RDF ontology directly from a file-like object and extracts RDF triples.

    Args:
        ontology_file: File-like object or path to RDF data
        format: RDF serialization format (xml, turtle, n3, json-ld, etc.).
                If None, rdflib will attempt auto-detection.
    """
    g = Graph()
    try:
        g.parse(ontology_file, format=format)
    except Exception as e:
        raise ValueError(f"Failed to parse ontology file: {str(e)}") from e
🤖 Prompt for AI Agents
In cognee/api/v1/cognify/ontology_graph_pipeline.py around lines 20 to 33, the
function load_ontology_data uses a variable named format but the parameter is
missing from the signature; add a new parameter format: Optional[str] = None to
the function signature (and import typing.Optional if not already imported) so
rdflib can auto-detect when None, and ensure the docstring matches the new
parameter description; keep the default None to preserve rdflib auto-detection
and raise the same ValueError on parse failure.


We definitely need format to parse the graph, right? I don't think it should be an optional field. What is your take on this?

Contributor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find all calls to load_ontology_data in the codebase
rg -n -A3 -B3 'load_ontology_data\(' --type=py

Length of output: 1270


🏁 Script executed:

#!/bin/bash
# Check how run_ontology_pipeline calls load_ontology_data
ast-grep --pattern $'async def run_ontology_pipeline($$$) {
  $$$
  load_ontology_data($$$)
  $$$
}'

Length of output: 0


🌐 Web query:

rdflib Graph.parse format parameter: is format auto-detection reliable? Should format be required or optional?

💡 Result:

Short answer: auto-detection is convenient but not fully reliable — prefer to pass format when you can.

Why:

  • rdflib will try to "guess" formats (e.g. by file extension or using rdflib.util.guess_format) and the html parser can auto-detect RDFa/HTurtle/Microdata. This makes format optional in many simple cases. [1][2]
  • However, detection is best‑effort and brittle: guess is based on filename/extension or HTTP Content-Type heuristics, and remote content‑type/auto‑detection is explicitly called unreliable in rdflib issues and docs. For JSON‑LD / other formats there are known pitfalls when parsing from URLs vs. strings. [3][4]

Recommendation:

  • If you control the input (file, API, or content-type), pass format explicitly (format="turtle" or a mime type) — that is robust and inexpensive. [1][2]
  • If you must auto-detect: use rdflib.util.guess_format() first and fall back to trying parsers in a defined order, and handle parse errors (try/except) so you can retry with an explicit format or provide a clear error to the caller. [1][3]
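A minimal sketch of that fallback pattern (a hypothetical helper, assuming only that rdflib is installed):

from rdflib import Graph
from rdflib.util import guess_format

def parse_with_fallback(path: str) -> Graph:
    fmt = guess_format(path)  # best-effort guess from the file extension, may be None
    candidates = ([fmt] if fmt else []) + ["xml", "turtle", "nt", "json-ld"]
    last_error = None
    for candidate in candidates:
        g = Graph()  # fresh graph per attempt so a failed parse leaves no partial state
        try:
            g.parse(path, format=candidate)
            return g
        except Exception as e:
            last_error = e  # remember the failure and retry with the next format
    raise ValueError(f"Could not parse {path} with any known RDF format") from last_error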

Sources

  1. rdflib plugin parsers / docs (format names, guess_format, html auto-detect). [1]
  2. rdflib docs (parsers/plugins). [2]
  3. rdflib parser implementation / Accept-header and behavior when format not given. [4]
  4. GitHub issue discussing auto-detection limits (JSON-LD, URL vs string). [3]

@Vattikuti-Manideep-Sitaram, you're absolutely right! Format should be required, not optional.

Here's why:

  1. Reliability: According to rdflib documentation, format auto-detection is "best-effort and brittle" - it relies on file extensions or HTTP Content-Type headers, which can be unreliable, especially for formats like JSON-LD.

  2. Explicit is better: When you control the input (which you do in this pipeline), explicitly specifying the format is the robust approach and prevents ambiguous parse failures.

  3. Your code is already correct: Looking at the actual file, line 20 already has format: str as a required parameter:

    async def load_ontology_data(ontology_file: Union[str, bytes, "IOBase"],format: str) -> list[dict]:

    The only minor issue is a missing space after the comma, but the parameter is there and required.

My original suggestion to make it Optional[str] = None was incorrect. The current implementation is the right approach - requiring callers to explicitly specify the ontology format ensures reliable parsing and clear error messages when something goes wrong.

The only cosmetic fix needed is spacing:

async def load_ontology_data(ontology_file: Union[str, bytes, "IOBase"], format: str) -> list[dict]:

✏️ Learnings added
Learnt from: Vattikuti-Manideep-Sitaram
PR: topoteretes/cognee#1529
File: cognee/api/v1/cognify/ontology_graph_pipeline.py:0-0
Timestamp: 2025-10-13T13:18:46.294Z
Learning: In cognee/api/v1/cognify/ontology_graph_pipeline.py, the `format` parameter in `load_ontology_data` function must be required (not optional) to ensure reliable RDF parsing, as rdflib's auto-detection is unreliable.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Signed-off-by: Vattikuti-Manideep-Sitaram <[email protected]>
Contributor

@coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
cognee/api/v1/cognify/ontology_graph_pipeline.py (1)

110-110: Thread the format parameter through to maintain consistency.

Line 110 calls convert_triples_to_chunks(triples) without passing the format parameter. This means if a user specifies format="turtle", the chunks will be created with the default mime_type="application/rdf+xml" instead of mime_type="text/turtle", resulting in inconsistent metadata.

Apply this diff:

     triples = await load_ontology_data(ontology_file,format)
-    chunks = convert_triples_to_chunks(triples)
+    chunks = convert_triples_to_chunks(triples, format)

Also fix the spacing on line 109:

-    triples = await load_ontology_data(ontology_file,format)
+    triples = await load_ontology_data(ontology_file, format)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ee78dd8 and be1d41a.

📒 Files selected for processing (1)
  • cognee/api/v1/cognify/ontology_graph_pipeline.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-11T04:18:24.585Z
Learnt from: Vattikuti-Manideep-Sitaram
PR: topoteretes/cognee#1529
File: cognee/api/v1/cognify/ontology_graph_pipeline.py:69-74
Timestamp: 2025-10-11T04:18:24.585Z
Learning: The code_graph_pipeline.py and ontology_graph_pipeline.py both follow an established pattern of calling cognee.prune.prune_data() and cognee.prune.prune_system(metadata=True) at the start of pipeline execution. This appears to be intentional behavior for pipeline operations in the cognee codebase.

Applied to files:

  • cognee/api/v1/cognify/ontology_graph_pipeline.py
🧬 Code graph analysis (1)
cognee/api/v1/cognify/ontology_graph_pipeline.py (5)
cognee/modules/pipelines/operations/run_tasks.py (1)
  • run_tasks (62-336)
cognee/modules/pipelines/tasks/task.py (1)
  • Task (5-97)
cognee/modules/users/methods/get_default_user.py (1)
  • get_default_user (13-41)
cognee/modules/data/methods/create_dataset.py (1)
  • create_dataset (10-32)
cognee/infrastructure/databases/relational/get_relational_engine.py (1)
  • get_relational_engine (5-21)
🔇 Additional comments (3)
cognee/api/v1/cognify/ontology_graph_pipeline.py (3)

35-47: LGTM! RDF term type information is preserved.

The implementation correctly preserves semantic information like datatypes, language tags, and term types for downstream processing. This addresses the concern from previous reviews about losing RDF metadata.


51-87: LGTM! Clean implementation with proper MIME type mapping.

The function correctly:

  • Maps RDF formats to their appropriate MIME types
  • Creates an in-memory Document representation
  • Converts each triple into a DocumentChunk with preserved metadata
  • Includes a sensible fallback for unknown formats

98-99: Prune operations follow established codebase pattern.

The prune_data() and prune_system() calls follow the same pattern used in code_graph_pipeline.py. This appears to be intentional behavior for pipeline operations in the cognee codebase.

Based on learnings.

@pazone self-requested a review October 13, 2025 12:46

Contributor

pazone commented Oct 13, 2025

@Vattikuti-Manideep-Sitaram Thanks for the PR. Could you please review the CodeRabbit suggestions?

I'd appreciate it if you could add the PR description. For example:

  • An example of how this new pipeline is supposed to be used. Please refer to examples/python, which is where the demo files are stored. You can also create an owl_pipeline_demo.py for that.
  • Question: Can a custom Task that reads the owl files be an alternative solution for this concept?

Signed-off-by: Vattikuti-Manideep-Sitaram <[email protected]>
@Vattikuti-Manideep-Sitaram
Author

Hi @pazone, thanks for the suggestion. I have made the changes that CodeRabbit suggested. There are a few things I would like to confirm with you; this is the sample code I am using to test the function:

import asyncio
from io import BytesIO
from cognee.api.v1.cognify.ontology_graph_pipeline import run_ontology_pipeline
from cognee.api.v1.search import search, SearchType


async def main():
    # Path to your updated ontology file
    ONTOLOGY_FILE_PATH = "sample_ontology.owl"

    # Read ontology in-memory as bytes
    with open(ONTOLOGY_FILE_PATH, "rb") as f:
        ontology_bytes = BytesIO(f.read())

    print("\n🚀 Starting Ontology Ingestion Pipeline...\n")

    # Run the pipeline using the in-memory ontology file
    async for status in run_ontology_pipeline(ontology_bytes, format="xml"):
        print(f"📦 Pipeline Run ID: {status.pipeline_run_id}, Status: {status.status}")

    print("\n✅ Ontology ingestion complete. Running validation searches...\n")

    # Perform multiple searches to verify the ontology knowledge graph
    queries = [
        "Which disease has symptom Fever?",
        "List all classes defined in the ontology.",
        "What are the object properties in this ontology?"
    ]

    for query in queries:
        print(f"🔍 Query: {query}")
        try:
            results = await search(query, query_type=SearchType.GRAPH_COMPLETION)
            if not results:
                print("   ⚠️  No results found.")
            else:
                for r in results:
                    print("   ➜", r)
        except Exception as e:
            print(f"   ❌ Search failed: {e}")
        print()  # spacing between queries

    print("✅ Ontology graph successfully indexed and searchable!\n")


if __name__ == "__main__":
    asyncio.run(main())

Contributor

@coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between be1d41a and dc60446.

📒 Files selected for processing (1)
  • cognee/api/v1/cognify/ontology_graph_pipeline.py (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-13T13:18:46.294Z
Learnt from: Vattikuti-Manideep-Sitaram
PR: topoteretes/cognee#1529
File: cognee/api/v1/cognify/ontology_graph_pipeline.py:0-0
Timestamp: 2025-10-13T13:18:46.294Z
Learning: In cognee/api/v1/cognify/ontology_graph_pipeline.py, the `format` parameter in `load_ontology_data` function must be required (not optional) to ensure reliable RDF parsing, as rdflib's auto-detection is unreliable.

Applied to files:

  • cognee/api/v1/cognify/ontology_graph_pipeline.py
📚 Learning: 2025-10-11T04:18:24.585Z
Learnt from: Vattikuti-Manideep-Sitaram
PR: topoteretes/cognee#1529
File: cognee/api/v1/cognify/ontology_graph_pipeline.py:69-74
Timestamp: 2025-10-11T04:18:24.585Z
Learning: The code_graph_pipeline.py and ontology_graph_pipeline.py both follow an established pattern of calling cognee.prune.prune_data() and cognee.prune.prune_system(metadata=True) at the start of pipeline execution. This appears to be intentional behavior for pipeline operations in the cognee codebase.

Applied to files:

  • cognee/api/v1/cognify/ontology_graph_pipeline.py
🧬 Code graph analysis (1)
cognee/api/v1/cognify/ontology_graph_pipeline.py (5)
cognee/modules/pipelines/operations/run_tasks.py (1)
  • run_tasks (62-336)
cognee/modules/pipelines/tasks/task.py (1)
  • Task (5-97)
cognee/modules/users/methods/get_default_user.py (1)
  • get_default_user (13-41)
cognee/modules/data/methods/create_dataset.py (1)
  • create_dataset (10-32)
cognee/infrastructure/databases/relational/get_relational_engine.py (1)
  • get_relational_engine (5-21)

Contributor

pazone commented Oct 13, 2025


I think it's a great example and it deserves to be added to examples/python. WDYT?

Contributor

pazone commented Oct 13, 2025

@Vattikuti-Manideep-Sitaram I'd recommend considering implementing a custom task: https://docs.cognee.ai/core-concepts/building-blocks/tasks. It is much easier. Please let me know if you need any assistance with the task implementation.

@Vattikuti-Manideep-Sitaram
Author


Thanks a lot for the feedback! @pazone
I agree this example nicely demonstrates ontology indexing and querying in a clean, runnable way.
Adding it under examples/python sounds perfect to me. I can prepare a PR for it if everyone’s okay with that. 👍

text = f"{t['subject']} {t['predicate']} {t['object']}"
chunk = DocumentChunk(
    id=uuid4(),
    text=text,
Contributor

Placing triplets as plain text won't have the desired effect. The relations (edges) between the entities will be lost. We need to keep the structure using proper DataPoints, and to do that we need to extract the proper types.
It's not an easy task. Please take a look at the example in Cognee-starter (custom-model).

…p triples to DataPoint graph edges

Signed-off-by: Vattikuti-Manideep-Sitaram <[email protected]>
…p triples to DataPoint graph edges

Signed-off-by: Vattikuti-Manideep-Sitaram <[email protected]>
Contributor

@coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
cognee/api/v1/cognify/ontology_graph_pipeline.py (1)

133-140: Consider expanding the docstring.

The function docstring is minimal and doesn't document the parameters or the destructive prune operations. Consider adding parameter documentation and a warning about the data clearing behavior.

Example improvement:

async def run_ontology_pipeline(ontology_file: Union[str, bytes, IOBase], format: str = "xml"):
    """
    Run the ontology ingestion pipeline directly from a file object.
    
    WARNING: This pipeline clears all existing data and system metadata before processing.
    
    Args:
        ontology_file: File-like object, bytes, or path containing RDF/OWL ontology data
        format: RDF serialization format (e.g., "xml", "turtle", "n3", "json-ld")
    
    Yields:
        Pipeline status updates as the ingestion progresses
    """

Note: The prune operations are intentional and follow the established pattern from code_graph_pipeline.py. Based on learnings.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dc60446 and ccf41c3.

📒 Files selected for processing (1)
  • cognee/api/v1/cognify/ontology_graph_pipeline.py (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-13T13:18:46.294Z
Learnt from: Vattikuti-Manideep-Sitaram
PR: topoteretes/cognee#1529
File: cognee/api/v1/cognify/ontology_graph_pipeline.py:0-0
Timestamp: 2025-10-13T13:18:46.294Z
Learning: In cognee/api/v1/cognify/ontology_graph_pipeline.py, the `format` parameter in `load_ontology_data` function must be required (not optional) to ensure reliable RDF parsing, as rdflib's auto-detection is unreliable.

Applied to files:

  • cognee/api/v1/cognify/ontology_graph_pipeline.py
📚 Learning: 2025-10-11T04:18:24.585Z
Learnt from: Vattikuti-Manideep-Sitaram
PR: topoteretes/cognee#1529
File: cognee/api/v1/cognify/ontology_graph_pipeline.py:69-74
Timestamp: 2025-10-11T04:18:24.585Z
Learning: The code_graph_pipeline.py and ontology_graph_pipeline.py both follow an established pattern of calling cognee.prune.prune_data() and cognee.prune.prune_system(metadata=True) at the start of pipeline execution. This appears to be intentional behavior for pipeline operations in the cognee codebase.

Applied to files:

  • cognee/api/v1/cognify/ontology_graph_pipeline.py
🧬 Code graph analysis (1)
cognee/api/v1/cognify/ontology_graph_pipeline.py (6)
cognee/infrastructure/engine/models/DataPoint.py (1)
  • DataPoint (20-220)
cognee/modules/users/methods/get_default_user.py (1)
  • get_default_user (13-41)
cognee/modules/data/methods/create_dataset.py (1)
  • create_dataset (10-32)
cognee/infrastructure/databases/relational/get_relational_engine.py (1)
  • get_relational_engine (5-21)
cognee/modules/pipelines/operations/run_tasks.py (1)
  • run_tasks (62-336)
cognee/modules/pipelines/tasks/task.py (1)
  • Task (5-97)
🪛 Pylint (4.0.0)
cognee/api/v1/cognify/ontology_graph_pipeline.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)

🔇 Additional comments (3)
cognee/api/v1/cognify/ontology_graph_pipeline.py (3)

50-63: LGTM! Well-structured DataPoint implementation.

The OntologyEntity class correctly extends DataPoint and properly represents ontology resources with relationships. The use of List[Tuple[Edge, "OntologyEntity"]] for related_to preserves the graph structure, and model_rebuild() correctly handles forward references.
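For readers without the diff open, a self-contained sketch of that shape; Edge and DataPoint here are pydantic stand-ins for the cognee types, whose exact definitions are not shown in this thread:

from typing import List, Tuple
from pydantic import BaseModel

class Edge(BaseModel):  # stand-in for cognee's edge type
    relationship_type: str

class DataPoint(BaseModel):  # stand-in for cognee.infrastructure.engine.models.DataPoint
    pass

class OntologyEntity(DataPoint):
    name: str
    uri: str
    # Each relation is an (edge, target) pair; the string annotation is a forward reference.
    related_to: List[Tuple[Edge, "OntologyEntity"]] = []

OntologyEntity.model_rebuild()  # resolves the "OntologyEntity" forward reference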


75-113: LGTM! Solid triple-to-DataPoint conversion.

The function correctly converts RDF triples into a graph of OntologyEntity instances:

  • Deduplicates entities by URI
  • Properly constructs edges with relationship types
  • Prevents duplicate relationships (lines 107-110)
  • Returns all entities for complete graph representation
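A hedged sketch of that deduplication pattern, reusing the stand-in classes from the sketch above:

entities: dict[str, OntologyEntity] = {}

def get_entity(uri: str) -> OntologyEntity:
    # One OntologyEntity per URI; repeated URIs reuse the existing node.
    if uri not in entities:
        entities[uri] = OntologyEntity(name=uri.rsplit("/", 1)[-1], uri=uri)
    return entities[uri]

def add_relation(subj_uri: str, rel: str, obj_uri: str) -> None:
    subject, target = get_entity(subj_uri), get_entity(obj_uri)
    # Skip duplicates: the same relationship type to the same target is added once.
    if not any(e.relationship_type == rel and t.uri == obj_uri for e, t in subject.related_to):
        subject.related_to.append((Edge(relationship_type=rel), target))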

119-127: LGTM! Clean task implementation.

The custom task function correctly orchestrates the ontology ingestion flow:

  1. Extracts the ontology file from inputs
  2. Parses it into triples with the specified format
  3. Converts triples to DataPoints
  4. Persists DataPoints via add_data_points
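Put together, a sketch consistent with that four-step flow (helper names follow this review thread; treat it as illustrative, not the exact file contents):

async def ontology_ingestion_task(inputs: list, format: str):
    ontology_file = inputs[0]                                  # 1. extract the ontology file
    triples = await load_ontology_data(ontology_file, format)  # 2. parse into triple dicts
    datapoints = await ontology_to_datapoints(triples)         # 3. convert to DataPoints
    await add_data_points(datapoints)                          # 4. persist via add_data_points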

Signed-off-by: Vattikuti-Manideep-Sitaram <[email protected]>
Contributor

@coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
cognee/api/v1/cognify/ontology_graph_pipeline.py (1)

38-43: Preserve RDF term semantics—distinguish Literals from URIs.

Converting all RDF terms to plain strings loses critical semantic information and causes downstream issues:

  1. Literal datatypes are lost: Values like "42"^^xsd:integer or "2024-01-01"^^xsd:date become plain strings
  2. Language tags are lost: "hello"@en becomes just "hello"
  3. Term type distinction is lost: Cannot differentiate between URIRef (resource) and Literal (value)

This causes ontology_to_datapoints (lines 95-100) to incorrectly create OntologyEntity nodes for literals, producing semantically invalid ontology graphs where values like "42" become entity nodes.

Consider preserving term type information:

     triples = []
     for s, p, o in g:
-        triples.append({
-            "subject": str(s),
-            "predicate": str(p),
-            "object": str(o)
-        })
+        triple = {
+            "subject": str(s),
+            "predicate": str(p),
+            "object": str(o),
+            "object_type": type(o).__name__,  # 'URIRef', 'Literal', 'BNode'
+        }
+        if hasattr(o, 'datatype') and o.datatype:
+            triple["object_datatype"] = str(o.datatype)
+        if hasattr(o, 'language') and o.language:
+            triple["object_language"] = o.language
+        triples.append(triple)
     return triples

Then update ontology_to_datapoints to only create entities for URIRef objects, and handle Literals as edge properties or separate literal value nodes.
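A minimal sketch of that split, consuming the enriched triple dicts from the diff above (where "object_type" records the rdflib term class):

def split_triples(triples: list[dict]) -> tuple[list[dict], list[dict]]:
    # Edges: the object is a resource (URIRef), so it becomes a node plus a relation.
    edges = [t for t in triples if t.get("object_type") == "URIRef"]
    # Annotations: the object is a Literal, kept as a property rather than a node.
    annotations = [t for t in triples if t.get("object_type") == "Literal"]
    return edges, annotations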

🧹 Nitpick comments (2)
cognee/api/v1/cognify/ontology_graph_pipeline.py (2)

75-113: Add error handling and validation.

The function lacks validation for empty or malformed input and has no error handling for potential issues during entity/edge creation.

Add input validation and error handling:

 async def ontology_to_datapoints(triples: list[dict]) -> list[DataPoint]:
     """
     Converts parsed triples into Cognee DataPoints (entities + relations).
     This preserves the ontology's structure as a graph.
     """
+    if not triples:
+        raise ValueError("No triples provided for conversion")
+    
     entities: Dict[str, OntologyEntity] = {}
 
+    try:
         for t in triples:
             subj = t["subject"]
             pred = t["predicate"]
             obj = t["object"]
             # ... rest of logic
+    except KeyError as e:
+        raise ValueError(f"Malformed triple missing required field: {e}") from e
+    except Exception as e:
+        raise RuntimeError(f"Failed to convert triples to datapoints: {e}") from e
 
     return list(entities.values())

119-127: Validate task inputs to prevent runtime errors.

The function assumes inputs[0] exists and is an ontology file without validation, which will raise an IndexError if inputs is empty.

Add input validation:

 async def ontology_ingestion_task(inputs: list, format: str):
     """
     Custom Cognee Task: Ingest OWL/RDF ontology and store as structured DataPoints.
     """
+    if not inputs:
+        raise ValueError("No inputs provided to ontology ingestion task")
+    
     ontology_file = inputs[0]
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ccf41c3 and 937d716.

📒 Files selected for processing (1)
  • cognee/api/v1/cognify/ontology_graph_pipeline.py (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-13T13:18:46.294Z
Learnt from: Vattikuti-Manideep-Sitaram
PR: topoteretes/cognee#1529
File: cognee/api/v1/cognify/ontology_graph_pipeline.py:0-0
Timestamp: 2025-10-13T13:18:46.294Z
Learning: In cognee/api/v1/cognify/ontology_graph_pipeline.py, the `format` parameter in `load_ontology_data` function must be required (not optional) to ensure reliable RDF parsing, as rdflib's auto-detection is unreliable.

Applied to files:

  • cognee/api/v1/cognify/ontology_graph_pipeline.py
📚 Learning: 2025-10-11T04:18:24.585Z
Learnt from: Vattikuti-Manideep-Sitaram
PR: topoteretes/cognee#1529
File: cognee/api/v1/cognify/ontology_graph_pipeline.py:69-74
Timestamp: 2025-10-11T04:18:24.585Z
Learning: The code_graph_pipeline.py and ontology_graph_pipeline.py both follow an established pattern of calling cognee.prune.prune_data() and cognee.prune.prune_system(metadata=True) at the start of pipeline execution. This appears to be intentional behavior for pipeline operations in the cognee codebase.

Applied to files:

  • cognee/api/v1/cognify/ontology_graph_pipeline.py
🧬 Code graph analysis (1)
cognee/api/v1/cognify/ontology_graph_pipeline.py (6)
cognee/infrastructure/engine/models/DataPoint.py (1)
  • DataPoint (20-220)
cognee/modules/users/methods/get_default_user.py (1)
  • get_default_user (13-41)
cognee/modules/data/methods/create_dataset.py (1)
  • create_dataset (10-32)
cognee/infrastructure/databases/relational/get_relational_engine.py (1)
  • get_relational_engine (5-21)
cognee/modules/pipelines/operations/run_tasks.py (1)
  • run_tasks (62-336)
cognee/modules/pipelines/tasks/task.py (1)
  • Task (5-97)
🪛 Pylint (4.0.0)
cognee/api/v1/cognify/ontology_graph_pipeline.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)

🔇 Additional comments (5)
cognee/api/v1/cognify/ontology_graph_pipeline.py (5)

1-16: LGTM!

All necessary imports are present and correctly organized for the ontology ingestion pipeline.


50-63: LGTM!

The OntologyEntity DataPoint structure appropriately represents ontology resources with graph relationships, and model_rebuild() correctly resolves the forward reference.


66-72: LGTM!

The label extraction logic handles common RDF URI patterns (fragment identifiers and path components) with a sensible fallback.
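
The helper itself is not quoted in this conversation; the behavior described corresponds to something like the following paraphrase (not necessarily the PR's exact code):

def _extract_label(uri: str) -> str:
    """Derive a readable label from a URI fragment or its last path segment."""
    if "#" in uri:
        return uri.rsplit("#", 1)[-1]  # http://example.org/onto#Person -> "Person"
    if "/" in uri:
        return uri.rstrip("/").rsplit("/", 1)[-1]  # last path component
    return uri  # fallback: the URI as-is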


133-133: Consider requiring the format parameter.

Per project learnings, the format parameter should be required (not optional) to ensure reliable RDF parsing, as rdflib's auto-detection is unreliable. While providing a default "xml" may improve user experience, it could lead to parse failures if the actual format differs.

Based on learnings

Consider either:

  1. Remove the default to force explicit format specification:
-async def run_ontology_pipeline(ontology_file: Union[str, bytes, IOBase], format: str = "xml"):
+async def run_ontology_pipeline(ontology_file: Union[str, bytes, IOBase], format: str):
  2. Or keep the default but add a warning in the docstring:
async def run_ontology_pipeline(ontology_file: Union[str, bytes, IOBase], format: str = "xml"):
    """
    Run the ontology ingestion pipeline.
    
    Args:
        ontology_file: File-like object or path to RDF data
        format: RDF serialization format (xml, turtle, n3, json-ld, etc.)
                Defaults to "xml" but explicit specification is recommended
                for reliability.
    """

What is your preference for balancing usability vs. reliability?


138-140: LGTM—prune operations follow established pattern.

The prune operations at the start of the pipeline match the pattern used in code_graph_pipeline.py and are intentional for pipeline operations in the cognee codebase.

Based on learnings

Comment on lines 89 to 90
            entities[subj] = OntologyEntity(
                id=uuid4(),
Contributor

@coderabbitai coderabbitai bot Oct 15, 2025


⚠️ Potential issue | 🟠 Major

Use stable IDs derived from URIs instead of uuid4().

Generating new UUIDs with uuid4() on every pipeline run breaks referential stability—the same ontology resource will have different IDs across runs, preventing:

  • Incremental updates
  • Cross-run queries and linking
  • Idempotent pipeline execution

Use the URI as the ID or generate a stable hash:

         if subj not in entities:
             entities[subj] = OntologyEntity(
-                id=uuid4(),
+                id=UUID(hashlib.md5(subj.encode()).hexdigest()),
                 name=_extract_label(subj),
                 uri=subj,
             )

Add the import at the top:

import hashlib
from uuid import UUID

Apply the same fix at lines 96-97 for object entities.

🤖 Prompt for AI Agents
In cognee/api/v1/cognify/ontology_graph_pipeline.py around lines 89-90 (and also
apply the same change at lines 96-97 for object entities), replace the unstable
uuid4() generation with a stable ID derived from the entity URI: compute a
deterministic hash from the URI (e.g., MD5, whose 32-character hex digest fits the
UUID constructor) and construct a UUID from that hex digest (or use uuid5 with a
fixed namespace and the URI) so the same URI yields the same ID across runs; also
add the suggested imports at the top (import
hashlib and from uuid import UUID) and ensure the OntologyEntity id field is
assigned the deterministic UUID instead of uuid4().


Can I use the same logic for the object UUID, or should I just use it for the subject?

Contributor


@Vattikuti-Manideep-Sitaram Yes, absolutely use the same stable UUID generation logic for both subject and object entities. This is critical because:

  1. The same URI can appear as both subject and object in different triples. For example, a class definition might have a URI as the subject, but that same URI could appear as the object in another triple (e.g., when defining relationships between classes).

  2. Referential integrity: If you generate different UUIDs for the same URI depending on whether it appears as a subject or object, you'll create duplicate entities in your knowledge graph instead of properly linking them together.

  3. Consistency: The stable ID generation ensures that http://example.org/Person always maps to the same UUID, regardless of whether it appears as a subject or object in any triple.

Here's what the logic should look like for both:

import hashlib
from uuid import UUID

# For subjects (around line 89-90)
if subj not in entities:
    entities[subj] = OntologyEntity(
        id=UUID(hashlib.md5(subj.encode()).hexdigest()),
        name=_extract_label(subj),
        uri=subj,
    )

# For objects (around line 96-97) - same pattern
if obj not in entities:
    entities[obj] = OntologyEntity(
        id=UUID(hashlib.md5(obj.encode()).hexdigest()),
        name=_extract_label(obj),
        uri=obj,
    )

This ensures that any given URI will always resolve to the same entity ID across your entire ontology graph, enabling proper relationship modeling and idempotent pipeline runs.
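
As the AI-agent prompt above already hints, uuid5 with a fixed namespace gives the same determinism more idiomatically than hashing by hand — a sketch, not a requirement:

from uuid import uuid5, NAMESPACE_URL

# Same URI -> same UUID on every run, for subjects and objects alike.
entity_id = uuid5(NAMESPACE_URL, subj)

Either approach works as long as it is applied identically to subjects and objects.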

Signed-off-by: Vattikuti-Manideep-Sitaram <[email protected]>
Signed-off-by: Vattikuti-Manideep-Sitaram <[email protected]>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (2)
cognee/api/v1/cognify/ontology_graph_pipeline.py (2)

39-46: CRITICAL: Preserve RDF term type information to fix broken literal detection.

Converting all RDF terms to plain strings loses critical semantic information that the code relies on:

  • Line 99 in ontology_to_datapoints checks t.get("object_type") == "URIRef" to distinguish entities from literals
  • This check will always fail because object_type is never set, causing all objects to be skipped (line 107)
  • Result: The pipeline will crash on line 118 when trying to access entities[obj] for a skipped literal object

Apply this diff to preserve term types:

     triples = []
     for s, p, o in g:
-        triples.append({
-            "subject": str(s),
-            "predicate": str(p),
-            "object": str(o)
-        })
+        triple = {
+            "subject": str(s),
+            "predicate": str(p),
+            "object": str(o),
+            "object_type": type(o).__name__,  # 'URIRef', 'Literal', 'BNode'
+        }
+        if hasattr(o, 'datatype') and o.datatype:
+            triple["object_datatype"] = str(o.datatype)
+        if hasattr(o, 'language') and o.language:
+            triple["object_language"] = o.language
+        triples.append(triple)
     return triples

Based on learnings


92-92: Use stable IDs consistently for all entities.

Subject entities use stable MD5-based UUIDs (line 92), but object entities use random uuid4() (line 101). This inconsistency breaks referential stability—the same ontology resource will have different IDs depending on whether it appears as a subject or object, and different IDs across pipeline runs.

Apply this diff to use stable IDs for object entities:

         if t.get("object_type") == "URIRef":
             entities[obj] = OntologyEntity(
-                id=uuid4(),
+                id=UUID(hashlib.md5(obj.encode()).hexdigest()),
                 name=_extract_label(obj),
                 uri=obj,
             )
🧹 Nitpick comments (3)
cognee/api/v1/cognify/ontology_graph_pipeline.py (3)

1-1: Remove unused import.

The asyncio module is imported but never used in this file.

Apply this diff:

-import asyncio
 from typing import Union, Dict, List, Tuple

126-134: Add input validation for robustness.

The function accesses inputs[0] without checking if inputs is non-empty. While run_tasks should always pass data, defensive validation improves robustness.

Apply this diff:

 async def ontology_ingestion_task(inputs: list, format: str):
     """
     Custom Cognee Task: Ingest OWL/RDF ontology and store as structured DataPoints.
     """
+    if not inputs:
+        raise ValueError("ontology_ingestion_task requires at least one input file")
+    
     ontology_file = inputs[0]
     triples = await load_ontology_data(ontology_file, format)
     datapoints = await ontology_to_datapoints(triples)
     await add_data_points(datapoints)
     return datapoints

157-157: Fix spacing in Task instantiation.

Missing space after the comma before format parameter.

Apply this diff:

-        Task(ontology_ingestion_task,format=format, task_config={"batch_size": 50}),
+        Task(ontology_ingestion_task, format=format, task_config={"batch_size": 50}),
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 937d716 and b52e473.

📒 Files selected for processing (1)
  • cognee/api/v1/cognify/ontology_graph_pipeline.py (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-13T13:18:46.294Z
Learnt from: Vattikuti-Manideep-Sitaram
PR: topoteretes/cognee#1529
File: cognee/api/v1/cognify/ontology_graph_pipeline.py:0-0
Timestamp: 2025-10-13T13:18:46.294Z
Learning: In cognee/api/v1/cognify/ontology_graph_pipeline.py, the `format` parameter in `load_ontology_data` function must be required (not optional) to ensure reliable RDF parsing, as rdflib's auto-detection is unreliable.

Applied to files:

  • cognee/api/v1/cognify/ontology_graph_pipeline.py
📚 Learning: 2025-10-11T04:18:24.585Z
Learnt from: Vattikuti-Manideep-Sitaram
PR: topoteretes/cognee#1529
File: cognee/api/v1/cognify/ontology_graph_pipeline.py:69-74
Timestamp: 2025-10-11T04:18:24.585Z
Learning: The code_graph_pipeline.py and ontology_graph_pipeline.py both follow an established pattern of calling cognee.prune.prune_data() and cognee.prune.prune_system(metadata=True) at the start of pipeline execution. This appears to be intentional behavior for pipeline operations in the cognee codebase.

Applied to files:

  • cognee/api/v1/cognify/ontology_graph_pipeline.py
🧬 Code graph analysis (1)
cognee/api/v1/cognify/ontology_graph_pipeline.py (6)
cognee/infrastructure/engine/models/DataPoint.py (1)
  • DataPoint (20-220)
cognee/modules/users/methods/get_default_user.py (1)
  • get_default_user (13-41)
cognee/modules/data/methods/create_dataset.py (1)
  • create_dataset (10-32)
cognee/infrastructure/databases/relational/get_relational_engine.py (1)
  • get_relational_engine (5-21)
cognee/modules/pipelines/operations/run_tasks.py (1)
  • run_tasks (62-336)
cognee/modules/pipelines/tasks/task.py (1)
  • Task (5-97)
🪛 Pylint (4.0.0)
cognee/api/v1/cognify/ontology_graph_pipeline.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)

🔇 Additional comments (1)
cognee/api/v1/cognify/ontology_graph_pipeline.py (1)

145-146: LGTM: Prune operations follow established pattern.

These prune operations are consistent with the established pattern used in code_graph_pipeline.py and other pipelines in the Cognee codebase.

Based on learnings

Comment on lines +114 to +118
        if not any(
            existing_edge.relationship_type == predicate_label and target.uri == obj
            for existing_edge, target in entities[subj].related_to
        ):
            entities[subj].related_to.append((edge, entities[obj]))
Contributor


⚠️ Potential issue | 🔴 Critical

CRITICAL: Guard against accessing skipped literal objects.

When an object is a Literal (not a URIRef), the code skips creating an entity (line 107). However, line 118 unconditionally tries to access entities[obj], which will raise a KeyError for skipped literals, crashing the pipeline.

Apply this diff to only create edges for URI objects:

         predicate_label = _extract_label(pred)
         edge = Edge(
             relationship_type=predicate_label,
             properties={"uri": pred},
         )
+        # Only create edges between entity nodes (URIs), not to literals
+        if obj not in entities:
+            continue
+        
         if not any(
             existing_edge.relationship_type == predicate_label and target.uri == obj
             for existing_edge, target in entities[subj].related_to
         ):
             entities[subj].related_to.append((edge, entities[obj]))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-        if not any(
-            existing_edge.relationship_type == predicate_label and target.uri == obj
-            for existing_edge, target in entities[subj].related_to
-        ):
-            entities[subj].related_to.append((edge, entities[obj]))
+        predicate_label = _extract_label(pred)
+        edge = Edge(
+            relationship_type=predicate_label,
+            properties={"uri": pred},
+        )
+        # Only create edges between entity nodes (URIs), not to literals
+        if obj not in entities:
+            continue
+
+        if not any(
+            existing_edge.relationship_type == predicate_label and target.uri == obj
+            for existing_edge, target in entities[subj].related_to
+        ):
+            entities[subj].related_to.append((edge, entities[obj]))
🤖 Prompt for AI Agents
In cognee/api/v1/cognify/ontology_graph_pipeline.py around lines 114 to 118, the
code unconditionally accesses entities[obj] when appending an edge which raises
a KeyError for literal objects that were intentionally skipped; change the logic
to only consider and append edges when the object is a URI entity present in
entities (e.g., guard with if obj in entities before performing the any(...)
check and before appending), so edges are only created for URI objects that
exist in the entities mapping.

# -------------------------------------
# STEP 4: Build and run the pipeline
# -------------------------------------
async def run_ontology_pipeline(ontology_file: Union[str, bytes, IOBase], format: str = "xml"):
Contributor


⚠️ Potential issue | 🟠 Major

Remove the default value for the format parameter.

Based on learnings: The format parameter must be required (not optional) to ensure reliable RDF parsing, as rdflib's auto-detection is unreliable. The current default value of "xml" could lead to parse failures if the actual file format differs.

Apply this diff:

-async def run_ontology_pipeline(ontology_file: Union[str, bytes, IOBase], format: str = "xml"):
+async def run_ontology_pipeline(ontology_file: Union[str, bytes, IOBase], format: str):

Based on learnings

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-async def run_ontology_pipeline(ontology_file: Union[str, bytes, IOBase], format: str = "xml"):
+async def run_ontology_pipeline(ontology_file: Union[str, bytes, IOBase], format: str):
🤖 Prompt for AI Agents
In cognee/api/v1/cognify/ontology_graph_pipeline.py around line 140, the
function signature defines format: str = "xml" which makes format optional;
remove the default so the parameter is required (change to format: str) and then
update any internal usages/call sites and tests to pass an explicit format
string; ensure the docstring/type hints reflect that format is mandatory and run
tests to catch callers that need updating.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
cognee/api/v1/cognify/ontology_graph_pipeline.py (1)

135-135: Remove the default value for the format parameter.

The parameter currently has a default value of "xml", but this contradicts the established requirement that format must be explicitly specified for reliable RDF parsing. Auto-detection is unreliable, and defaulting to XML may cause parse failures if files are in other formats (Turtle, N3, JSON-LD, etc.).

Based on learnings

Apply this diff:

-async def run_ontology_pipeline(ontology_file: Union[str, bytes, IOBase], format: str = "xml"):
+async def run_ontology_pipeline(ontology_file: Union[str, bytes, IOBase], format: str):
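
With the default removed, call sites would have to name the serialization explicitly. Hypothetical invocations (file names are assumptions; the format strings are rdflib's standard identifiers):

await run_ontology_pipeline("ontology.owl", format="xml")     # RDF/XML
await run_ontology_pipeline("ontology.ttl", format="turtle")  # Turtle
await run_ontology_pipeline("ontology.nt", format="nt")       # N-Triples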
🧹 Nitpick comments (2)
cognee/api/v1/cognify/ontology_graph_pipeline.py (2)

77-115: Advisory: Current design treats all RDF objects as entities.

The implementation creates OntologyEntity instances for all objects (lines 97-102), including values that may be RDF Literals (e.g., "42"^^xsd:integer, "text"@en) rather than URI references. This can produce semantically incorrect graphs where data values become entity nodes.

Based on past review discussion, an attempt was made to distinguish URIs from Literals and handle them differently, but that approach caused relationship loss. The current simpler string-based approach was retained.

If downstream queries or semantic correctness issues arise from treating literal values as entities, you may need to revisit the term-type preservation approach with additional testing to ensure relationships are maintained.
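
A two-triple example makes the concern concrete (self-contained; runnable against rdflib directly):

from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix ex: <http://example.org/> .
ex:alice ex:knows ex:bob .
ex:alice ex:age 42 .
""", format="turtle")

for s, p, o in g:
    print(type(o).__name__, str(o))
# URIRef  http://example.org/bob  -> legitimately an entity node
# Literal 42                      -> becomes a spurious "42" entity under the current design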


152-152: Fix formatting: add space after comma.

Apply this diff:

-        Task(ontology_ingestion_task,format=format, task_config={"batch_size": 50}),
+        Task(ontology_ingestion_task, format=format, task_config={"batch_size": 50}),
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b52e473 and 702a019.

📒 Files selected for processing (1)
  • cognee/api/v1/cognify/ontology_graph_pipeline.py (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-13T13:18:46.294Z
Learnt from: Vattikuti-Manideep-Sitaram
PR: topoteretes/cognee#1529
File: cognee/api/v1/cognify/ontology_graph_pipeline.py:0-0
Timestamp: 2025-10-13T13:18:46.294Z
Learning: In cognee/api/v1/cognify/ontology_graph_pipeline.py, the `format` parameter in `load_ontology_data` function must be required (not optional) to ensure reliable RDF parsing, as rdflib's auto-detection is unreliable.

Applied to files:

  • cognee/api/v1/cognify/ontology_graph_pipeline.py
📚 Learning: 2025-10-11T04:18:24.585Z
Learnt from: Vattikuti-Manideep-Sitaram
PR: topoteretes/cognee#1529
File: cognee/api/v1/cognify/ontology_graph_pipeline.py:69-74
Timestamp: 2025-10-11T04:18:24.585Z
Learning: The code_graph_pipeline.py and ontology_graph_pipeline.py both follow an established pattern of calling cognee.prune.prune_data() and cognee.prune.prune_system(metadata=True) at the start of pipeline execution. This appears to be intentional behavior for pipeline operations in the cognee codebase.

Applied to files:

  • cognee/api/v1/cognify/ontology_graph_pipeline.py
🧬 Code graph analysis (1)
cognee/api/v1/cognify/ontology_graph_pipeline.py (6)
cognee/infrastructure/engine/models/DataPoint.py (1)
  • DataPoint (20-220)
cognee/modules/users/methods/get_default_user.py (1)
  • get_default_user (13-41)
cognee/modules/data/methods/create_dataset.py (1)
  • create_dataset (10-32)
cognee/infrastructure/databases/relational/get_relational_engine.py (1)
  • get_relational_engine (5-21)
cognee/modules/pipelines/operations/run_tasks.py (1)
  • run_tasks (62-336)
cognee/modules/pipelines/tasks/task.py (1)
  • Task (5-97)
🪛 Pylint (4.0.0)
cognee/api/v1/cognify/ontology_graph_pipeline.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)
