[Feature] Improvement of Prompt Configuration for rag_demo Graph Index generation #185

Open
3 tasks done
Kryst4lDem0ni4s opened this issue Mar 1, 2025 · 1 comment
Labels
enhancement New feature or request

Comments

@Kryst4lDem0ni4s
Contributor

Kryst4lDem0ni4s commented Mar 1, 2025

Search before asking

  • I searched the existing feature requests and found no similar request.

Feature Description

Issue:
File Path: .\incubator-hugegraph-ai\hugegraph-llm\src\hugegraph_llm\resources\demo\config_prompt.yaml
I have tested the current prompt against the following Large Language Models:

API:

  • GPT 4o mini
  • o3-mini

Local/Ollama:

  • tom_himanen/deepseek-r1-roo-cline-tools:1.5b
  • tom_himanen/deepseek-r1-roo-cline-tools:7b
  • deepseek-r1:7b
  • qwen2.5-coder:1.5b-base
  • deepseek-coder-v2:16b
  • deepseek-r1:14b

The results after testing were very poor.
The prompt does not clearly define the output format requirements as per Apache Gremlin's documentation. It can be improved through further testing and more prompt engineering.

Example of outputs generated by the current prompt file:

{
  "vertices": [
    {
      "id": "1:person",
      "label": "person",
      "type": "vertex",
      "properties": {
        "name": "Sarah",
        "age": "30",
        "occupation": "attorney"
      }
    },
    {
      "id": "1:webpage",
      "label": "webpage",
      "type": "vertex",
      "properties": {
        "name": "www.sarahsplace.com",
        "url": "None"
      }
    }
  ],
  "edges": [
    {
      "label": "roommate",
      "type": "edge",
      "outV": "1:person",
      "outVLabel": "person",
      "inV": "1:webpage",
      "inVLabel": "webpage",
      "properties": {
        "date": "2010"
      }
    }
  ]
}

Why are these results incorrect (after numerous tests)?
Repeated tests produced errors such as missing keys ("vertices", "edges", "edgelabels", "vertexlabels", "propertykeys"), missing IDs, incorrect ID sequencing, missing "source_label" and "target_label" fields, and other syntax errors.
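
The error classes listed above could be caught mechanically before a reply is accepted. Here is a minimal sketch, assuming the model's reply is a JSON string laid out like the reference example in this issue. `find_format_errors` and the required-key tuples are hypothetical names chosen for illustration and are not part of hugegraph-llm.

```python
import json

# Hypothetical required keys, read off the reference example in this issue.
REQUIRED_TOP_LEVEL = ("vertices", "edges", "schema")
REQUIRED_SCHEMA = ("vertexlabels", "edgelabels", "propertykeys")
REQUIRED_EDGELABEL = ("name", "source_label", "target_label")


def find_format_errors(raw: str) -> list:
    """Return human-readable problems found in one LLM reply (empty if none)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]

    errors = [f"missing key: {key}" for key in REQUIRED_TOP_LEVEL if key not in data]

    schema = data.get("schema")
    if not isinstance(schema, dict):
        schema = {}
    errors += [f"missing schema key: {key}" for key in REQUIRED_SCHEMA if key not in schema]

    # Every vertex and edge entry is expected to carry an "id".
    for section in ("vertices", "edges"):
        for item in data.get(section) or []:
            if isinstance(item, dict) and "id" not in item:
                errors.append(f"{section} entry with label {item.get('label')!r} has no id")

    # Every edge label is expected to name its source and target vertex labels.
    for edge_label in schema.get("edgelabels") or []:
        for key in REQUIRED_EDGELABEL:
            if isinstance(edge_label, dict) and key not in edge_label:
                errors.append(f"edgelabel {edge_label.get('name')!r} is missing {key}")
    return errors
```

A check like this could be run over the replies from each tested model to count how often each error class occurs, which would give a baseline for comparing prompt revisions.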

Expected syntax example for reference:

{
  "vertices": [
    {
      "id": "2:lop",
      "label": "software",
      "type": "vertex",
      "properties": {
        "name": "lop",
        "lang": "java",
        "price": 328
      }
    },
    {
      "id": "1:josh",
      "label": "person",
      "type": "vertex",
      "properties": {
        "name": "josh",
        "age": 32,
        "city": "Beijing"
      }
    },
    {
      "id": "1:marko",
      "label": "person",
      "type": "vertex",
      "properties": {
        "name": "marko",
        "age": 29,
        "city": "Beijing"
      }
    },
    {
      "id": "1:peter",
      "label": "person",
      "type": "vertex",
      "properties": {
        "name": "peter",
        "age": 35,
        "city": "Shanghai"
      }
    },
    {
      "id": "1:vadas",
      "label": "person",
      "type": "vertex",
      "properties": {
        "name": "vadas",
        "age": 27,
        "city": "Hongkong"
      }
    },
    {
      "id": "2:ripple",
      "label": "software",
      "type": "vertex",
      "properties": {
        "name": "ripple",
        "lang": "java",
        "price": 199
      }
    }
  ],
  "edges": [
    {
      "id": "S1:josh>2>2>>S2:lop",
      "label": "created",
      "type": "edge",
      "outV": "1:josh",
      "outVLabel": "person",
      "inV": "2:lop",
      "inVLabel": "software",
      "properties": {
        "weight": 0.4,
        "date": "20091111"
      }
    },
    {
      "id": "S1:josh>2>2>>S2:ripple",
      "label": "created",
      "type": "edge",
      "outV": "1:josh",
      "outVLabel": "person",
      "inV": "2:ripple",
      "inVLabel": "software",
      "properties": {
        "weight": 1,
        "date": "20171210"
      }
    },
    {
      "id": "S1:marko>1>1>>S1:josh",
      "label": "knows",
      "type": "edge",
      "outV": "1:marko",
      "outVLabel": "person",
      "inV": "1:josh",
      "inVLabel": "person",
      "properties": {
        "weight": 1,
        "date": "20130220"
      }
    },
    {
      "id": "S1:marko>1>1>>S1:vadas",
      "label": "knows",
      "type": "edge",
      "outV": "1:marko",
      "outVLabel": "person",
      "inV": "1:vadas",
      "inVLabel": "person",
      "properties": {
        "weight": 0.5,
        "date": "20160110"
      }
    },
    {
      "id": "S1:marko>2>2>>S2:lop",
      "label": "created",
      "type": "edge",
      "outV": "1:marko",
      "outVLabel": "person",
      "inV": "2:lop",
      "inVLabel": "software",
      "properties": {
        "weight": 0.4,
        "date": "20171210"
      }
    },
    {
      "id": "S1:peter>2>2>>S2:lop",
      "label": "created",
      "type": "edge",
      "outV": "1:peter",
      "outVLabel": "person",
      "inV": "2:lop",
      "inVLabel": "software",
      "properties": {
        "weight": 0.2,
        "date": "20170324"
      }
    }
  ],
  "schema": {
    "vertexlabels": [
      {
        "id": 1,
        "name": "person",
        "id_strategy": "PRIMARY_KEY",
        "primary_keys": [
          "name"
        ],
        "properties": [
          "name",
          "age",
          "occupation"
        ],
        "nullable_keys": [
          "age",
          "occupation"
        ]
      },
      {
        "id": 2,
        "name": "webpage",
        "id_strategy": "PRIMARY_KEY",
        "primary_keys": [
          "name"
        ],
        "properties": [
          "name",
          "url"
        ],
        "nullable_keys": [
          "url"
        ]
      }
    ],
    "edgelabels": [
      {
        "id": 1,
        "name": "roommate",
        "source_label": "person",
        "target_label": "person",
        "properties": [
          "date"
        ]
      },
      {
        "id": 2,
        "name": "link",
        "source_label": "webpage",
        "target_label": "person",
        "properties": []
      }
    ],
    "propertykeys": [
      {
        "name": "name",
        "data_type": "TEXT",
        "cardinality": "SINGLE"
      },
      {
        "name": "age",
        "data_type": "TEXT",
        "cardinality": "SINGLE"
      },
      {
        "name": "occupation",
        "data_type": "TEXT",
        "cardinality": "SINGLE"
      },
      {
        "name": "url",
        "data_type": "TEXT",
        "cardinality": "SINGLE"
      },
      {
        "name": "date",
        "data_type": "TEXT",
        "cardinality": "SINGLE"
      }
    ]
  }
}
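
One property the vertices and edges in this reference output share is that every edge's "outV"/"inV" refers to a vertex that exists, and its "outVLabel"/"inVLabel" matches that vertex's label. The sketch below checks this, assuming the parsed reply is a dictionary with the layout shown above. `check_edge_endpoints` is a hypothetical helper name, not an existing hugegraph-llm function.

```python
def check_edge_endpoints(graph: dict) -> list:
    """Report edges whose endpoints do not match any generated vertex."""
    # Map each vertex id to its label, e.g. "1:josh" -> "person".
    labels_by_id = {v["id"]: v["label"] for v in graph.get("vertices", [])}

    problems = []
    for edge in graph.get("edges", []):
        for id_key, label_key in (("outV", "outVLabel"), ("inV", "inVLabel")):
            vertex_id = edge.get(id_key)
            if vertex_id not in labels_by_id:
                problems.append(
                    f"edge {edge.get('label')!r}: {id_key} {vertex_id!r} is not a known vertex id"
                )
            elif edge.get(label_key) != labels_by_id[vertex_id]:
                problems.append(
                    f"edge {edge.get('label')!r}: {label_key} {edge.get(label_key)!r} "
                    f"does not match vertex label {labels_by_id[vertex_id]!r}"
                )
    return problems
```

Because the check only needs the "vertices" and "edges" lists, it could run before any schema is derived from them.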

Note:
The improvement of this process can be made in two iterations.

  1. Improving the prompts.
  2. Using a two-step sequence (a multi-agent system) for the complete JSON generation:
    - First step: generate vertices.
    - Second step: generate edges.
    Why?
    This reduces the load on a single agent and decreases generalization, especially when handling a large context window.
    This all assumes that the schema and property keys are added automatically once the vertices and edges are generated correctly.
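
The two-step sequence could be sketched roughly as follows. This is only an illustration under stated assumptions: `llm` stands for any callable that takes a prompt string and returns the model's reply text, `build_graph` is a hypothetical name, and the two prompt templates are placeholders rather than the wording that would go into config_prompt.yaml.

```python
import json
from typing import Callable

# Placeholder prompts (assumptions); the real wording would need prompt engineering.
VERTEX_PROMPT = "Extract the vertices from the text as a JSON list.\nText: {text}"
EDGE_PROMPT = (
    "Given these vertices:\n{vertices}\n"
    "Extract the edges between them from the text as a JSON list.\nText: {text}"
)


def build_graph(text: str, llm: Callable[[str], str]) -> dict:
    """Generate vertices first, then edges that refer to those vertices."""
    # Step 1: ask only for vertices.
    vertices = json.loads(llm(VERTEX_PROMPT.format(text=text)))

    # Step 2: ask only for edges, passing the vertices from step 1 as context
    # so the model can reuse their ids instead of inventing new ones.
    edge_prompt = EDGE_PROMPT.format(vertices=json.dumps(vertices), text=text)
    edges = json.loads(llm(edge_prompt))

    return {"vertices": vertices, "edges": edges}
```

Each call handles a smaller task with a narrower expected output, which is the motivation given above for splitting the generation.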

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!


@Kryst4lDem0ni4s Kryst4lDem0ni4s changed the title from "[Feature] Improvement of" to "[Feature] Improvement of Prompt Configuration for rag_demo Graph Index generation" Mar 1, 2025
@dosubot dosubot bot added the enhancement New feature or request label Mar 1, 2025
@chiruu12

chiruu12 commented Mar 1, 2025

@Kryst4lDem0ni4s Hey, I have quite a bit of experience with prompt engineering. Let me know if you need any help with the PR.

Kryst4lDem0ni4s added a commit to Kryst4lDem0ni4s/incubator-hugegraph-ai that referenced this issue Mar 2, 2025