feat(llm):improve some RAG function UT(tests) #192

Open
wants to merge 4 commits into
base: main

Conversation

yanchaomei

fix #167

Comprehensive Test Suite Implementation for HugeGraph-LLM

This PR adds a test suite for the HugeGraph-LLM project that covers its major components, with the goal of improving code quality and reliability.

Summary of Test Implementation

1. Test Infrastructure

  • Created run_tests.py script for easy test execution
  • Implemented conftest.py with test configuration and fixtures
  • Added test utilities in test_utils.py for common testing functions
  • Set up test data directories with sample documents, schemas, and prompts

2. Document Processing Tests

  • test_document.py: Tests for document module imports and basic functionality
  • test_document_splitter.py: Tests for document chunking in different languages
  • test_text_loader.py: Tests for loading text files with various encodings
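
As a rough illustration of what an encoding test along these lines could look like, here is a minimal sketch. `load_text` is a hypothetical stand-in written for this example, not the project's actual loader API, and the chosen encodings are assumptions.

```python
import tempfile
from pathlib import Path


def load_text(path, encoding="utf-8"):
    """Hypothetical stand-in for the project's text loader (not its real API)."""
    return Path(path).read_text(encoding=encoding)


def test_load_text_round_trips_encodings():
    # Mixed ASCII and Chinese text, so the encoding actually matters.
    sample = "HugeGraph 图数据库\nsecond line"
    with tempfile.TemporaryDirectory() as tmp:
        for encoding in ("utf-8", "utf-16", "gbk"):
            path = Path(tmp) / f"sample_{encoding}.txt"
            # Write raw bytes so the on-disk encoding is known exactly.
            path.write_bytes(sample.encode(encoding))
            assert load_text(path, encoding=encoding) == sample
```

Writing the file as bytes keeps the test independent of the platform's default encoding.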

3. Integration Tests

  • test_graph_rag_pipeline.py: End-to-end tests for graph-based RAG pipeline
  • test_kg_construction.py: Tests for knowledge graph construction from documents
  • test_rag_pipeline.py: Tests for standard RAG pipeline functionality

4. Middleware Tests

  • test_middleware.py: Tests for FastAPI middleware components

5. Model Tests

  • LLM Tests:
    • test_openai_client.py: Tests for OpenAI API integration
    • test_qianfan_client.py: Tests for Baidu Qianfan API integration
    • test_ollama_client.py: Tests for Ollama local model integration
  • Embedding Tests:
    • test_openai_embedding.py: Tests for OpenAI embedding functionality
    • test_ollama_embedding.py: Tests for Ollama embedding functionality
  • Reranker Tests:
    • test_cohere_reranker.py: Tests for Cohere reranking API
    • test_siliconflow_reranker.py: Tests for SiliconFlow reranking API
    • test_init_reranker.py: Tests for reranker initialization

6. Operator Tests

  • Common Operations:
    • test_check_schema.py: Tests for schema validation
    • test_merge_dedup_rerank.py: Tests for result merging and reranking
    • test_nltk_helper.py: Tests for NLP utilities
    • test_print_result.py: Tests for result output formatting
  • Document Operations:
    • test_chunk_split.py: Tests for document chunking strategies
    • test_word_extract.py: Tests for keyword extraction
  • HugeGraph Operations:
    • test_commit_to_hugegraph.py: Tests for graph data writing
    • test_fetch_graph_data.py: Tests for graph data retrieval
    • test_graph_rag_query.py: Tests for graph-based RAG queries
    • test_schema_manager.py: Tests for graph schema management
  • Index Operations:
    • test_build_gremlin_example_index.py: Tests for Gremlin example indexing
    • test_build_semantic_index.py: Tests for semantic indexing
    • test_build_vector_index.py: Tests for vector index construction
    • test_gremlin_example_index_query.py: Tests for querying Gremlin examples
    • test_semantic_id_query.py: Tests for semantic ID queries
    • test_vector_index_query.py: Tests for vector index queries
  • LLM Operations:
    • test_gremlin_generate.py: Tests for Gremlin query generation
    • test_keyword_extract.py: Tests for LLM-based keyword extraction
    • test_property_graph_extract.py: Tests for property graph extraction

Testing Approach

The test suite employs several testing strategies:

  1. Unit Tests: Testing individual components in isolation
  2. Integration Tests: Testing interactions between components
  3. Mock Testing: Using mocks to simulate external dependencies
  4. Parametrized Tests: Testing with various input combinations
  5. Exception Testing: Verifying proper error handling
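
To make the mock and exception strategies above concrete, here is a small sketch using only the standard library. `AnswerService` and its `generate` call are hypothetical names invented for this example; they are not taken from the HugeGraph-LLM code.

```python
from unittest.mock import MagicMock


class AnswerService:
    """Hypothetical component that depends on an external LLM client."""

    def __init__(self, client):
        self.client = client

    def answer(self, question):
        if not question or not question.strip():
            raise ValueError("question must not be empty")
        return self.client.generate(prompt=question.strip())


def test_answer_uses_mocked_client():
    # The mock replaces the external service, so no network call is made.
    client = MagicMock()
    client.generate.return_value = "mocked answer"
    service = AnswerService(client)
    assert service.answer("  What is HugeGraph? ") == "mocked answer"
    client.generate.assert_called_once_with(prompt="What is HugeGraph?")


def test_answer_rejects_blank_questions():
    # Table of invalid inputs; each one must raise ValueError.
    service = AnswerService(MagicMock())
    for bad in ("", "   ", None):
        try:
            service.answer(bad)
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for {bad!r}")
```

Under pytest, the input loop would more typically be written with `pytest.mark.parametrize` and the exception check with `pytest.raises`; the plain-Python form is used here only to keep the sketch dependency-free.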

Key Features

  • Comprehensive Coverage: Tests for all major modules and components
  • External Service Handling: Tests can skip external service dependencies when needed
  • Mock Implementations: Provides mock implementations for external services
  • Test Data: Includes sample data for consistent test execution
  • Isolation: Tests are designed to run independently without side effects
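
One possible way to implement the external-service skip is sketched below. The `SKIP_EXTERNAL_SERVICES` variable name comes from the commands in this PR; how the real suite reads it, the accepted truthy values, and the test class shown here are assumptions.

```python
import os
import unittest


def external_services_skipped(environ=None):
    """Return True when SKIP_EXTERNAL_SERVICES holds a truthy value.

    The accepted values ("1", "true", "yes") are an assumption for this sketch.
    """
    env = os.environ if environ is None else environ
    value = env.get("SKIP_EXTERNAL_SERVICES", "")
    return value.strip().lower() in {"1", "true", "yes"}


class LiveServiceTest(unittest.TestCase):
    """Hypothetical test case that needs a running external service."""

    def test_live_query(self):
        if external_services_skipped():
            self.skipTest("SKIP_EXTERNAL_SERVICES is set")
        # A call against the real service would go here.
```

Passing the environment as an optional argument lets the helper itself be tested without mutating `os.environ`.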

Results

All tests pass. The suite gives future development a baseline for catching regressions and maintaining code quality as the project evolves.

@github-actions github-actions bot added the llm label Mar 5, 2025
@dosubot dosubot bot added the size:XXL (This PR changes 1000+ lines, ignoring generated files) and enhancement (New feature or request) labels Mar 5, 2025
@@ -0,0 +1,106 @@
#!/usr/bin/env python3
Member

Seems we don't need this?

Also, please check the other CI checks, thanks~

Member

@imbajin imbajin Mar 5, 2025

We should also enable the tests in the related CI file so they run automatically,
e.g. by adding a .github/workflows/graph_rag.yml?

You could refer to:

Author

Got it~ I will do it soon.

export PYTHONPATH=$(pwd)/hugegraph-llm/src
export SKIP_EXTERNAL_SERVICES=true
cd hugegraph-llm
python -m pytest src/tests/integration/test_graph_rag_pipeline.py -v
Member

@imbajin imbajin Mar 6, 2025

Note: each file should end with a newline (EOF line). You can configure this in your IDE's settings.

The same applies to the other files.

Member

@imbajin imbajin Mar 6, 2025

https://github.com/apache/incubator-hugegraph-ai/actions/runs/13693587346/job/38291894859?pr=192

You can check the CI status there. (You could also submit a PR in your own repo, selecting the upstream branch like
yanchaomei:main, to test it separately.)

Also, it's better not to use main/master as your default working branch. Keep it clean so it can sync with upstream
easily (one click). If you want to modify some code, check out a new branch from main, like dev-xx. This avoids many potential conflicts and inconsistencies in the future and keeps your Git usage clear.

Labels
enhancement (New feature or request), llm, size:XXL (This PR changes 1000+ lines, ignoring generated files)
Development

Successfully merging this pull request may close these issues.

improve some RAG function UT(tests)
2 participants