Creating or deleting documents gives errors but graph still created #1092

Open
michela opened this issue Feb 17, 2025 · 7 comments
Labels
bug Something isn't working

Comments

@michela

michela commented Feb 17, 2025

Tested with commit 593de6e and with the latest tag (v0.7).

Neo4j EE 5.20.0

Affected actions: adding a document by URL (Wikipedia URL or website URL) and deleting a document.

Repro:

  • Add Wikipedia URL https://en.wikipedia.org/wiki/Albert_Einstein
  • Select the document
  • Process

Delete log

2025-02-17 16:45:05 backend   | 2025-02-17 05:45:05,261 - Unable to delete document ["Albert_Einstein"]:{code: Neo.ClientError.Statement.SyntaxError} {message: Invalid input '(': expected "{" (line 5, column 18 (offset: 194))
2025-02-17 16:45:05 backend   | "            CALL (documents) {"
2025-02-17 16:45:05 backend   |                   ^}
2025-02-17 16:45:05 backend   | Traceback (most recent call last):
2025-02-17 16:45:05 backend   |   File "/code/score.py", line 640, in delete_document_and_entities
2025-02-17 16:45:05 backend   |     files_list_size = await asyncio.to_thread(graphDb_data_Access.delete_file_from_graph, filenames, source_types, deleteEntities, MERGED_DIR, uri)
2025-02-17 16:45:05 backend   |   File "/usr/local/lib/python3.10/asyncio/threads.py", line 25, in to_thread
2025-02-17 16:45:05 backend   |     return await loop.run_in_executor(None, func_call)
2025-02-17 16:45:05 backend   |   File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
2025-02-17 16:45:05 backend   |     result = self.fn(*self.args, **self.kwargs)
2025-02-17 16:45:05 backend   |   File "/code/src/graphDB_dataAccess.py", line 338, in delete_file_from_graph
2025-02-17 16:45:05 backend   |     result = self.execute_query(query_to_delete_document_and_entities, param)
2025-02-17 16:45:05 backend   |   File "/code/src/graphDB_dataAccess.py", line 260, in execute_query
2025-02-17 16:45:05 backend   |     return self.graph.query(query, param)
2025-02-17 16:45:05 backend   |   File "/usr/local/lib/python3.10/site-packages/langchain_neo4j/graphs/neo4j_graph.py", line 447, in query
2025-02-17 16:45:05 backend   |     data, _, _ = self._driver.execute_query(
2025-02-17 16:45:05 backend   |   File "/usr/local/lib/python3.10/site-packages/neo4j/_sync/driver.py", line 970, in execute_query
2025-02-17 16:45:05 backend   |     return session._run_transaction(
2025-02-17 16:45:05 backend   |   File "/usr/local/lib/python3.10/site-packages/neo4j/_sync/work/session.py", line 583, in _run_transaction
2025-02-17 16:45:05 backend   |     result = transaction_function(tx, *args, **kwargs)
2025-02-17 16:45:05 backend   |   File "/usr/local/lib/python3.10/site-packages/neo4j/_work/query.py", line 144, in wrapped
2025-02-17 16:45:05 backend   |     return f(*args, **kwargs)
2025-02-17 16:45:05 backend   |   File "/usr/local/lib/python3.10/site-packages/neo4j/_sync/driver.py", line 1306, in _work
2025-02-17 16:45:05 backend   |     res = tx.run(query, parameters)
2025-02-17 16:45:05 backend   |   File "/usr/local/lib/python3.10/site-packages/neo4j/_sync/work/transaction.py", line 206, in run
2025-02-17 16:45:05 backend   |     result._tx_ready_run(query, parameters)
2025-02-17 16:45:05 backend   |   File "/usr/local/lib/python3.10/site-packages/neo4j/_sync/work/result.py", line 177, in _tx_ready_run
2025-02-17 16:45:05 backend   |     self._run(query, parameters, None, None, None, None, None, None)
2025-02-17 16:45:05 backend   |   File "/usr/local/lib/python3.10/site-packages/neo4j/_sync/work/result.py", line 236, in _run
2025-02-17 16:45:05 backend   |     self._attach()
2025-02-17 16:45:05 backend   |   File "/usr/local/lib/python3.10/site-packages/neo4j/_sync/work/result.py", line 430, in _attach
2025-02-17 16:45:05 backend   |     self._connection.fetch_message()
2025-02-17 16:45:05 backend   |   File "/usr/local/lib/python3.10/site-packages/neo4j/_sync/io/_common.py", line 184, in inner
2025-02-17 16:45:05 backend   |     func(*args, **kwargs)
2025-02-17 16:45:05 backend   |   File "/usr/local/lib/python3.10/site-packages/neo4j/_sync/io/_bolt.py", line 864, in fetch_message
2025-02-17 16:45:05 backend   |     res = self._process_message(tag, fields)
2025-02-17 16:45:05 backend   |   File "/usr/local/lib/python3.10/site-packages/neo4j/_sync/io/_bolt5.py", line 500, in _process_message
2025-02-17 16:45:05 backend   |     response.on_failure(summary_metadata or {})
2025-02-17 16:45:05 backend   |   File "/usr/local/lib/python3.10/site-packages/neo4j/_sync/io/_common.py", line 254, in on_failure
2025-02-17 16:45:05 backend   |     raise self._hydrate_error(metadata)
2025-02-17 16:45:05 backend   | neo4j.exceptions.CypherSyntaxError: {code: Neo.ClientError.Statement.SyntaxError} {message: Invalid input '(': expected "{" (line 5, column 18 (offset: 194))
2025-02-17 16:45:05 backend   | "            CALL (documents) {"
2025-02-17 16:45:05 backend   |                   ^}
michela added the bug label Feb 17, 2025
@prakriti-solankey
Collaborator

Hi @michela, it's working fine for us both with main deployed via Docker and with the cloud deployment of main.

Please help us understand how to reproduce this issue.

@genggengchen

genggengchen commented Feb 17, 2025

I also encountered this issue today. The problem lies in an incorrect usage of the Cypher CALL clause. The syntax CALL (documents) is invalid.

The incorrect code can be found in src/shared/constants.py and src/graphDB_dataAccess.py.

One of the failing queries:

CALL (documents) {
    UNWIND documents AS d
    OPTIONAL MATCH (d)<-[:PART_OF]-(c:Chunk)
    OPTIONAL MATCH (c:Chunk)-[:HAS_ENTITY]->(e)
    WITH d, c, e, documents
    WHERE NOT EXISTS {
        MATCH (e)<-[:HAS_ENTITY]-(c2)-[:PART_OF]->(d2:Document)
        WHERE NOT d2 IN documents
    }
    WITH d, COLLECT(c) AS chunks, COLLECT(e) AS entities
    FOREACH (chunk IN chunks | DETACH DELETE chunk)
    FOREACH (entity IN entities | DETACH DELETE entity)
    DETACH DELETE d
} IN TRANSACTIONS OF 1 ROWS

Runnable version:

CALL {
    WITH documents
    UNWIND documents AS d
    OPTIONAL MATCH (d)<-[:PART_OF]-(c:Chunk)
    OPTIONAL MATCH (c)-[:HAS_ENTITY]->(e)
    WITH d, c, e, documents
    WHERE NOT EXISTS {
        MATCH (e)<-[:HAS_ENTITY]-(c2)-[:PART_OF]->(d2:Document)
        WHERE NOT d2 IN documents
    }
    WITH d, COLLECT(c) AS chunks, COLLECT(e) AS entities
    FOREACH (chunk IN chunks | DETACH DELETE chunk)
    FOREACH (entity IN entities | DETACH DELETE entity)
    DETACH DELETE d
} IN TRANSACTIONS OF 1 ROWS
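Whether the scoped form parses appears to depend on the server version: to my knowledge the variable scope clause `CALL (documents) { ... }` was introduced in Neo4j 5.23, which would explain why Neo4j EE 5.20.0 rejects it while newer deployments accept it. Below is a minimal sketch, assuming that 5.23 cutoff (and assuming calendar-versioned 2025.x releases support it too), of choosing the subquery form from the server's version string. `supports_scoped_call` and `delete_subquery` are hypothetical helpers, not part of the project; the subquery body is the one from the comment above.

```python
import re
from typing import Tuple

# Assumption: the variable scope clause `CALL (x) { ... }` first appeared in Neo4j 5.23.
# Calendar-versioned releases (2025.01 onwards) compare as larger tuples, so they pass too.
SCOPED_CALL_MIN_VERSION = (5, 23)

# Subquery body from the thread; `documents` stays in the first WITH so the
# EXISTS filter can still reference it.
_DELETE_BODY = """
    UNWIND documents AS d
    OPTIONAL MATCH (d)<-[:PART_OF]-(c:Chunk)
    OPTIONAL MATCH (c)-[:HAS_ENTITY]->(e)
    WITH d, c, e, documents
    WHERE NOT EXISTS {
        MATCH (e)<-[:HAS_ENTITY]-(c2)-[:PART_OF]->(d2:Document)
        WHERE NOT d2 IN documents
    }
    WITH d, COLLECT(c) AS chunks, COLLECT(e) AS entities
    FOREACH (chunk IN chunks | DETACH DELETE chunk)
    FOREACH (entity IN entities | DETACH DELETE entity)
    DETACH DELETE d
} IN TRANSACTIONS OF 1 ROWS"""


def parse_version(version: str) -> Tuple[int, int]:
    """Return (major, minor) from strings such as '5.20.0' or '2025.01.0'."""
    match = re.match(r"(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognised Neo4j version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_scoped_call(version: str) -> bool:
    """True if the server is assumed to accept `CALL (vars) { ... }`."""
    return parse_version(version) >= SCOPED_CALL_MIN_VERSION


def delete_subquery(version: str) -> str:
    """Scoped form on newer servers, importing-WITH form on older ones."""
    if supports_scoped_call(version):
        return "CALL (documents) {" + _DELETE_BODY
    return "CALL {\n    WITH documents" + _DELETE_BODY


if __name__ == "__main__":
    print(delete_subquery("5.20.0"))
```

The version string could be read with `CALL dbms.components() YIELD versions RETURN versions[0] AS version`. Gating on the version would keep the non-deprecated syntax on newer servers while still running on 5.20.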

@michela
Author

michela commented Feb 17, 2025

Below is a redacted log for the following repro case on 7461aa1:

  • Connect
  • Drag and drop credentials file for Neo4j EE 5.20.0
  • Click Web Sources
  • Click Wikipedia Source
  • Paste default URL https://en.wikipedia.org/wiki/Albert_Einstein
  • Click Submit
  • Click Close (top right)
  • Switch the model from Diffbot to OpenAI (bottom left of the frontend)
  • <Status - Processing>
  • <Status - Failed>

2025-02-17 20:50:56 2025-02-17 09:50:56,458 - Use pytorch device_name: cpu
2025-02-17 20:50:56 2025-02-17 09:50:56,458 - Load pretrained SentenceTransformer: all-MiniLM-L6-v2
2025-02-17 20:50:59 2025-02-17 09:50:59,402 - Embedding: Using Langchain HuggingFaceEmbeddings , Dimension:384
2025-02-17 20:50:59 2025-02-17 09:50:59,403 - embedding model:model_name='all-MiniLM-L6-v2' cache_folder=None model_kwargs={} encode_kwargs={} multi_process=False show_progress=False and dimesion:384
2025-02-17 20:50:59 2025-02-17 09:50:59,455 - GDS is available in the database.
2025-02-17 20:50:59 2025-02-17 09:50:59,512 - Checking access for database: major-minor
2025-02-17 20:50:59 2025-02-17 09:50:59,556 - Read access count: 0
2025-02-17 20:50:59 2025-02-17 09:50:59,556 - The account has write access.
2025-02-17 20:50:59 2025-02-17 09:50:59,581 - Get existing files list from graph
2025-02-17 20:50:59 2025-02-17 09:50:59,859 - closing connection for sources_list api
2025-02-17 20:50:59 2025-02-17 09:50:59,860 - Get existing files list from graph
2025-02-17 20:51:20 2025-02-17 09:51:20,126 - incoming URL: None
2025-02-17 20:51:20 2025-02-17 09:51:20,126 - wikipedia query id = Albert_Einstein
2025-02-17 20:51:20 2025-02-17 09:51:20,126 - Creating source node for Albert_Einstein, en
2025-02-17 20:51:39 2025-02-17 09:51:39,329 - creating source node if does not exist
2025-02-17 20:54:57 2025-02-17 09:54:57,753 - Total Pages from Wikipedia = 24
2025-02-17 20:54:58 2025-02-17 09:54:58,871 - Time taken database connection: 1.12 seconds
2025-02-17 20:54:58 2025-02-17 09:54:58,932 - Index already exist,Skipping creation. Time taken: 0.06 seconds
2025-02-17 20:54:58 2025-02-17 09:54:58,932 - Break down file into chunks
2025-02-17 20:54:58 2025-02-17 09:54:58,932 - Split file into smaller chunks
2025-02-17 20:54:59 2025-02-17 09:54:59,220 - creating FIRST_CHUNK and NEXT_CHUNK relationships between chunks
2025-02-17 20:54:59 2025-02-17 09:54:59,690 - Time taken to create list chunkids with chunk document: 0.76 seconds
2025-02-17 20:54:59 2025-02-17 09:54:59,730 - Time taken to get the current status of document node: 0.04 seconds
2025-02-17 20:54:59 2025-02-17 09:54:59,730 - Albert_Einstein
2025-02-17 20:54:59 2025-02-17 09:54:59,730 - <src.entities.source_node.sourceNode object at 0x7f721e077340>
2025-02-17 20:54:59 2025-02-17 09:54:59,730 - Base Param value 1 : {'props': {'fileName': 'Albert_Einstein', 'status': 'Processing', 'model': 'openai_gpt_4o', 'total_chunks': 100, 'processed_chunk': 0}}
2025-02-17 20:54:59 2025-02-17 09:54:59,730 - Update source node properties
2025-02-17 20:54:59 2025-02-17 09:54:59,796 - updating node and relationship count
2025-02-17 20:55:00 2025-02-17 09:55:00,110 -  SSE Client disconnected
2025-02-17 20:55:00 2025-02-17 09:55:00,452 - update KNN graph
2025-02-17 20:55:00 2025-02-17 09:55:00,545 - Updated KNN Graph
2025-02-17 20:55:00 2025-02-17 09:55:00,548 - Use pytorch device_name: cpu
2025-02-17 20:55:00 2025-02-17 09:55:00,548 - Load pretrained SentenceTransformer: all-MiniLM-L6-v2
2025-02-17 20:55:03 2025-02-17 09:55:03,118 - Embedding: Using Langchain HuggingFaceEmbeddings , Dimension:384
2025-02-17 20:55:03 2025-02-17 09:55:03,118 - Starting the process of creating full-text indexes.
2025-02-17 20:55:03 2025-02-17 09:55:03,118 - Attempting to connect to the Neo4j database at <REDACTED>
2025-02-17 20:55:03 2025-02-17 09:55:03,118 - Connection successful
2025-02-17 20:55:03 2025-02-17 09:55:03,180 - Database connectivity verified.
2025-02-17 20:55:03 2025-02-17 09:55:03,180 - Creating a full-text index for type 'entities'.
2025-02-17 20:55:03 2025-02-17 09:55:03,201 - Dropped existing index (if any) in 0.02 seconds.
2025-02-17 20:55:03 2025-02-17 09:55:03,223 - Full text index is not created as labels are empty
2025-02-17 20:55:03 2025-02-17 09:55:03,223 - Process completed in 0.04 seconds.
2025-02-17 20:55:03 2025-02-17 09:55:03,224 - Full-text index for type 'entities' created successfully.
2025-02-17 20:55:03 2025-02-17 09:55:03,224 - Creating a full-text index for type 'hybrid'.
2025-02-17 20:55:03 2025-02-17 09:55:03,245 - Dropped existing index (if any) in 0.02 seconds.
2025-02-17 20:55:03 2025-02-17 09:55:03,330 - Created full-text index in 0.09 seconds.
2025-02-17 20:55:03 2025-02-17 09:55:03,334 - Process completed in 0.11 seconds.
2025-02-17 20:55:03 2025-02-17 09:55:03,334 - Full-text index for type 'hybrid' created successfully.
2025-02-17 20:55:03 2025-02-17 09:55:03,334 - Creating a vector index for type 'vector'.
2025-02-17 20:55:03 2025-02-17 09:55:03,334 - Starting the process to create vector index.
2025-02-17 20:55:03 2025-02-17 09:55:03,369 - Dropped existing index (if any) in 0.03 seconds.
2025-02-17 20:55:03 2025-02-17 09:55:03,391 - Created vector index in 0.02 seconds.
2025-02-17 20:55:03 2025-02-17 09:55:03,394 - Vector index for chunk created successfully.
2025-02-17 20:55:03 2025-02-17 09:55:03,394 - Driver closed successfully.
2025-02-17 20:55:03 2025-02-17 09:55:03,394 - Full-text and vector index creation process completed.
2025-02-17 20:55:03 2025-02-17 09:55:03,394 - Full Text index created
2025-02-17 20:55:03 2025-02-17 09:55:03,454 - Entity Embeddings created
2025-02-17 20:55:03 2025-02-17 09:55:03,695 - Successfully created GDS driver.
2025-02-17 20:55:03 2025-02-17 09:55:03,695 - Starting to clear communities.
2025-02-17 20:55:03 2025-02-17 09:55:03,695 - Dropping communities...
2025-02-17 20:55:03 2025-02-17 09:55:03,744 - Communities dropped successfully
2025-02-17 20:55:03 2025-02-17 09:55:03,744 - Dropping community property from entities...
2025-02-17 20:55:03 2025-02-17 09:55:03,794 - Community property dropped successfully
2025-02-17 20:55:03 2025-02-17 09:55:03,856 - Creating new graph project 'communities'.
2025-02-17 20:55:03 2025-02-17 09:55:03,932 - Graph projection 'None' created successfully with None nodes and None relationships.
2025-02-17 20:55:03 2025-02-17 09:55:03,991 - Failed to create community graph project: No projected graph named 'None' exists in current database 'major-minor'
2025-02-17 20:55:03 2025-02-17 09:55:03,991 - Failed to create communities: No projected graph named 'None' exists in current database 'major-minor'
2025-02-17 20:55:03 2025-02-17 09:55:03,992 - created communities
2025-02-17 20:55:04 2025-02-17 09:55:04,062 - updating node and relationship count

@praveshkumar1988
Collaborator

praveshkumar1988 commented Feb 18, 2025

@michela the delete query is working fine; you can test it on the deployed version https://llm-graph-builder.neo4jlabs.com/

By the way, the suggested query uses syntax that will be deprecated in a future version. Use CALL (documents) { ... } instead.


@michela
Author

michela commented Feb 19, 2025

The issue (both the "Failed" processing status and the inability to delete) still occurs in 39bb3ef.

Repro:

  • fresh database
  • pull on Windows 11
  • docker-compose up --build
  • add Wikipedia URL as above

Which commit is currently deployed?

Could it be something in the config? The configuration docs are very confusing.

There are new errors from docker-compose, but the app does run:

 => ERROR [frontend build 6/6] RUN VITE_BACKEND_API_URL=http://localhost:8000     VITE_REACT_APP_SOURCES=local,youtube,wiki,s3,web     VITE_GOOGLE_CLIENT_ID=     6.2s ------
 > [frontend build 6/6] RUN VITE_BACKEND_API_URL=http://localhost:8000     VITE_REACT_APP_SOURCES=local,youtube,wiki,s3,web     VITE_GOOGLE_CLIENT_ID=     VITE_BLOOM_URL=https://workspace-preview.neo4j.io/workspace/explore?connectURL={CONNECT_URL}&search=Show+me+a+graph&featureGenAISuggestions=true&featureGenAISuggestionsInternal=true     VITE_CHUNK_SIZE=5242880     VITE_TIME_PER_PAGE=50     VITE_ENV=DEV     VITE_LARGE_FILE_SIZE=5242880     VITE_CHAT_MODES=     VITE_BATCH_SIZE=2     VITE_LLM_MODELS=diffbot,openai_gpt_4o     VITE_LLM_MODELS_PROD=openai_gpt_4o,openai_gpt_4o_mini,diffbot,gemini_1.5_flash     VITE_AUTH0_CLIENT_ID=     VITE_AUTH0_DOMAIN=     VITE_SKIP_AUTH=true     yarn run build:
0.342 yarn run v1.22.22
0.368 $ tsc && vite build
5.876 vite v4.5.3 building for production...
5.903 transforming...
6.157 Browserslist: caniuse-lite is outdated. Please run:
6.157   npx update-browserslist-db@latest
6.157   Why you should do it regularly: https://github.com/browserslist/update-db#readme
6.191 ✓ 8 modules transformed.
6.191 ✓ built in 315ms
6.191 [vite:css] [postcss] It looks like you're trying to use `tailwindcss` directly as a PostCSS plugin. The PostCSS plugin has moved to a separate package, so to continue using Tailwind CSS with PostCSS you'll need to install `@tailwindcss/postcss` and update your PostCSS configuration.
6.191 file: /app/src/index.css:undefined:undefined
6.192 error during build:
6.192 Error: [postcss] It looks like you're trying to use `tailwindcss` directly as a PostCSS plugin. The PostCSS plugin has moved to a separate package, so to continue using Tailwind CSS with PostCSS you'll need to install `@tailwindcss/postcss` and update your PostCSS configuration.
6.192     at Se (/app/node_modules/tailwindcss/dist/lib.js:33:1716)
6.192     at LazyResult.runOnRoot (/app/node_modules/postcss/lib/lazy-result.js:329:16)
6.192     at LazyResult.runAsync (/app/node_modules/postcss/lib/lazy-result.js:258:26)
6.192     at LazyResult.async (/app/node_modules/postcss/lib/lazy-result.js:160:30)
6.192     at LazyResult.then (/app/node_modules/postcss/lib/lazy-result.js:404:17)
6.209 error Command failed with exit code 1.
6.209 info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
------
failed to solve: process "/bin/sh -c VITE_BACKEND_API_URL=$VITE_BACKEND_API_URL     VITE_REACT_APP_SOURCES=$VITE_REACT_APP_SOURCES     VITE_GOOGLE_CLIENT_ID=$VITE_GOOGLE_CLIENT_ID     VITE_BLOOM_URL=$VITE_BLOOM_URL     VITE_CHUNK_SIZE=$VITE_CHUNK_SIZE     VITE_TIME_PER_PAGE=$VITE_TIME_PER_PAGE     VITE_ENV=$VITE_ENV     VITE_LARGE_FILE_SIZE=${VITE_LARGE_FILE_SIZE}     VITE_CHAT_MODES=$VITE_CHAT_MODES     VITE_BATCH_SIZE=$VITE_BATCH_SIZE     VITE_LLM_MODELS=$VITE_LLM_MODELS     VITE_LLM_MODELS_PROD=$VITE_LLM_MODELS_PROD     VITE_AUTH0_CLIENT_ID=$VITE_AUTH0_CLIENT_ID     VITE_AUTH0_DOMAIN=$VITE_AUTH0_DOMAIN     VITE_SKIP_AUTH=$VITE_SKIP_AUTH     yarn run build" did not complete successfully: exit code: 1

@michela
Author

michela commented Feb 20, 2025

Unresolved as of f056d99

The Generate Graph error has changed to:

2025-02-21 10:07:16 backend   | neo4j.exceptions.CypherSyntaxError: {code: Neo.ClientError.Statement.SyntaxError} {message: Invalid input '(': expected "{" (line 37, column 6 (offset: 1636))
2025-02-21 10:07:16 backend   | "CALL (entities) {"
2025-02-21 10:07:16 backend   |       ^}

The Delete Files error is:

Error: {"error":"{code: Neo.ClientError.Statement.SyntaxError} {message: Invalid input '(': expected \"{\" (line 5, column 18 (offset: 194))\n\"            CALL (documents) {\"\n                  ^}","message":"Unable to delete document [\"Albert_Einstein\"]"}
    at io (index-62bdc5fe.js:1392:19144)

Tested with Edge only, as the frontend is broken in Chrome (#1107).
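The Generate Graph and Delete Files errors both point at the same construct (`CALL (entities) {` and `CALL (documents) {`), so one rewrite could in principle cover every occurrence. Below is a minimal sketch, assuming the target server predates the variable scope clause, that converts each scoped subquery header into the legacy importing-`WITH` form. `downgrade_scoped_calls` is a hypothetical helper, not project code, and a regex rewrite like this does not understand Cypher string literals or comments.

```python
import re

# Matches the variable scope clause, e.g. `CALL (documents) {` or `CALL () {`.
# Procedure calls such as `CALL db.labels()` do not match, because a name sits
# between CALL and the opening parenthesis.
_SCOPED_CALL = re.compile(r"\bCALL\s*\(([^)]*)\)\s*\{", re.IGNORECASE)


def _to_importing_with(match) -> str:
    variables = match.group(1).strip()
    # An empty scope imports nothing, which in the legacy form is a plain `CALL {`.
    if not variables:
        return "CALL {"
    # The legacy form imports variables with a WITH as the subquery's first clause.
    return "CALL { WITH " + variables


def downgrade_scoped_calls(query: str) -> str:
    """Rewrite every `CALL (vars) {` into `CALL { WITH vars` (string-level, not a parser)."""
    return _SCOPED_CALL.sub(_to_importing_with, query)


if __name__ == "__main__":
    print(downgrade_scoped_calls("CALL (entities) {\n    UNWIND entities AS e\n    RETURN count(e) AS n\n}\nRETURN n"))
```

Applied to the query strings before execution, this would let the same code run on 5.20, at the cost of emitting the importing-`WITH` syntax that the maintainers say is being deprecated on newer servers.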

@prakriti-solankey
Collaborator

Hi @michela, we have yet to deploy our latest changes to the deployed branch. We will update you once that's done.

Projects: None yet
Development: No branches or pull requests
4 participants