ImportDocument: prevent retry loops on duplicate-result entity lookups during PDF ingestion #6482

Closed
Copilot wants to merge 4 commits into master from copilot/fix-connector-looping-pdf

Contributor

Copilot AI commented May 21, 2026

ImportDocument can get stuck reprocessing some PDFs when entity resolution fails with the backend error "Id loading expect only one response" (notably on the Vulnerability and Attack-Pattern lookup paths). This change hardens the lookup so that the connector keeps processing instead of failing the message and looping.

  • Lookup hardening for parsed entities

    • Added DUPLICATE_ID_ERROR_MESSAGE and _read_entity_with_fallback(...) in reportimporter/core.py.
    • Vulnerability (Vulnerability.name) and MITRE technique (Attack-Pattern.x_mitre_id) resolution now use this helper.
  • Controlled fallback behavior

    • Keep read(filters=...) as the normal path.
    • Fall back to list(getAll=True, filters=...) only for the known duplicate-result backend error.
    • Re-raise unexpected exceptions (with a contextual warning) to avoid masking unrelated failures.
  • Deterministic and safe fallback selection

    • Handle list() returning None.
    • Drop list results that lack an id (with a warning).
    • Deterministically select the first entity after sorting by id.
    • Emit explicit warnings for multi-match and empty-usable-result cases.
# core idea introduced in ImportDocument entity resolution
try:
    entity = api_entity.read(filters=filters)
except Exception as e:
    if DUPLICATE_ID_ERROR_MESSAGE not in str(e):
        raise
    entities = api_entity.list(getAll=True, filters=filters) or []
    entities = [x for x in entities if x.get("id")]
    entity = sorted(entities, key=lambda x: x["id"])[0] if entities else None
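
For illustration, the bullets above (None handling, dropping results without an id, warnings, re-raising unrelated errors) could be combined roughly as follows. This is a minimal sketch, not the code in reportimporter/core.py: the function name read_with_duplicate_fallback, its label parameter, the logger, and the value assigned to DUPLICATE_ID_ERROR_MESSAGE are all assumptions made for the example. The only thing expected of api_entity is that it exposes read(filters=...) and list(getAll=True, filters=...), as in the snippet above.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)

# Assumed value: the PR names this constant but does not show what it contains.
DUPLICATE_ID_ERROR_MESSAGE = "Id loading expect only one response"


def read_with_duplicate_fallback(
    api_entity: Any, filters: dict, label: str
) -> Optional[dict]:
    """Hypothetical helper: read one entity, falling back to list() on duplicates."""
    try:
        # Normal path: a single read with the given filters.
        return api_entity.read(filters=filters)
    except Exception as exc:
        if DUPLICATE_ID_ERROR_MESSAGE not in str(exc):
            # Unrelated failure: add context, then re-raise unchanged.
            logger.warning("Unexpected error resolving %s: %s", label, exc)
            raise

    # Fallback path, reached only after the known duplicate-result error.
    candidates = api_entity.list(getAll=True, filters=filters) or []
    usable = [c for c in candidates if isinstance(c, dict) and c.get("id")]

    if len(usable) < len(candidates):
        logger.warning(
            "Dropped %d result(s) without id for %s",
            len(candidates) - len(usable),
            label,
        )
    if not usable:
        logger.warning("No usable result for %s after fallback", label)
        return None
    if len(usable) > 1:
        logger.warning(
            "%d matches for %s; selecting the lowest id", len(usable), label
        )

    # Sorting by id makes the choice stable across runs.
    return sorted(usable, key=lambda c: c["id"])[0]
```

Sorting by id does not pick the "right" duplicate in any semantic sense. It only ensures that repeated imports of the same document resolve to the same entity.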

Copilot AI changed the title from "[WIP] Fix connector's looping with specific .pdf" to "ImportDocument: prevent retry loops on duplicate-result entity lookups during PDF ingestion" May 21, 2026
Copilot AI requested a review from SamuelHassine May 21, 2026 21:56
Development

Successfully merging this pull request may close these issues.

[ImportDocument] Connector's looping with a specific .pdf

2 participants