Cocoindex update #656
I am new to CocoIndex and I am trying to build a knowledge graph by ingesting a single PDF file containing 30 pages. I have created a flow function in a Python file and I am trying to run `cocoindex update main.py:PdfToKnowledgeGraph`. I am running Neo4j, Postgres, and Ollama with llama3.1:8b in a Docker container with CUDA enabled and specific memory allotments. (I am hoping the error below is not a memory issue.) The command has been stuck at the same point for hours. Is there anything wrong that I am doing?
Replies: 1 comment
Thanks for describing the problem in detail! Sorry for the inconvenience. This is most likely caused by an Ollama issue: ollama/ollama#8200. Also, 30 pages is long: most models won't perform well on input of that size, and it makes the Ollama bug more likely to trigger. You likely need to split the document into chunks first. Our docs_to_knowledge_graph example had an earlier version that split the input into chunks (we simplified the example so there's no splitting step, but large docs should still be split first). If it's still stuck after splitting, you may try switching to a different LLM API. OpenAI and Google Gemini are usually quite stable, and LiteLLM provides a proxy that integrates with a variety of LLM APIs. I also created #658 to make such issues easier to debug in the future, and #659 to provide vLLM integration, which is potentially a more stable alternative to Ollama.
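To illustrate what "split it into chunks first" means, here is a minimal stdlib-only sketch of breaking a long extracted document into overlapping chunks before sending each one to the LLM. The function name, the paragraph-based heuristic, and the size/overlap defaults are all my own illustrative assumptions, not CocoIndex's actual splitting implementation:

```python
def split_into_chunks(text: str, max_chars: int = 2000, overlap: int = 200) -> list[str]:
    """Greedy splitter: pack paragraphs into chunks of at most max_chars,
    carrying a small character overlap between consecutive chunks for context.
    (Illustrative sketch only; not CocoIndex's real splitter.)"""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        # A single paragraph longer than the limit gets hard-split.
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars - overlap:]  # keep `overlap` chars of context
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = current[-overlap:]  # seed the next chunk with the tail
        current = (current + "\n\n" + para) if current else para
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be run through the extraction prompt independently, which keeps every Ollama request small regardless of how many pages the PDF has.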