LLM-based Solr & Lucene Technical Support Assistant

An LLM-based technical assistant for Solr that pulls in data from Solr mailing lists, GitHub PRs, and the official Solr documentation as context for answering technical user queries.

How to run

Set up a virtual environment and install the dependencies:
python3 -m venv venv
source venv/bin/activate  # or 'venv\Scripts\activate' on Windows
pip install -r requirements.txt
python3 generate_project_folder_structure.py
The tool currently ingests GitHub issues from the apache/solr and apache/lucene-solr projects as its dataset. User mailing lists and documentation are yet to be added.

You'll need a GitHub Personal Access Token (PAT) to crawl the GitHub issues (refer to https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens#creating-a-fine-grained-personal-access-token). In the project folder, create a .env file with your PAT and OpenAI API key:
GITHUB_TOKEN=<Your PAT>
OPENAI_API_KEY=<Your OpenAI API key>
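
As a quick sanity check before running the scripts, the two settings above can be validated with a few lines of Python. This is a minimal, stdlib-only sketch and not the project's own loading code (which is not shown here); the helper names parse_env_text and load_env_file are hypothetical.

```python
import os
from pathlib import Path

# The two keys named in the .env example above.
REQUIRED_KEYS = ("GITHUB_TOKEN", "OPENAI_API_KEY")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Read the .env file, export its values, and fail early if a key is missing."""
    env_path = Path(path)
    values = parse_env_text(env_path.read_text(encoding="utf-8")) if env_path.exists() else {}
    missing = [k for k in REQUIRED_KEYS if not values.get(k) and not os.environ.get(k)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    for key, value in values.items():
        # Values already present in the real environment take precedence.
        os.environ.setdefault(key, value)
    return values
```

Calling load_env_file() from the project folder raises a RuntimeError naming whichever key is still unset, which is easier to diagnose than an authentication failure halfway through a crawl.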

The initial setup is now complete.

Scripts to be run from the scripts/ folder in the given sequence

fetch_github_issues.py ==> Fetches GitHub issues from apache/solr and apache/lucene-solr and stores them in the data/github_issues folder
chunk_issues.py ==> Splits each issue's title+body and PR comments into chunks of 300 tokens, with a 20% overlap between consecutive chunks for better context retention during vector generation
index_chunks.py ==> Generates embeddings using an mpnet model (with a 384-512 token context window) and indexes them into ChromaDB
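
The chunking step can be illustrated with a sliding window: with 300-token chunks and a 20% overlap, each chunk shares its first 60 tokens with the end of the previous one, so the window advances 240 tokens at a time. The sketch below is an assumption about how such a splitter might look, not the contents of chunk_issues.py; the function name chunk_tokens is hypothetical, and it works on any pre-tokenized list rather than on a specific tokenizer's output.

```python
def chunk_tokens(tokens, chunk_size=300, overlap_ratio=0.2):
    """Split a token list into overlapping windows.

    Consecutive chunks share round(chunk_size * overlap_ratio) tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")

    overlap = round(chunk_size * overlap_ratio)
    # Guard against a zero stride for very small chunk sizes.
    stride = max(1, chunk_size - overlap)

    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Usage: whitespace splitting stands in for a real tokenizer here.
words = "Solr replica stuck in recovery after restart".split()
small_chunks = chunk_tokens(words, chunk_size=4, overlap_ratio=0.25)
```

Keeping the chunk size at 300 tokens leaves headroom below the 384-512 token window mentioned for the embedding model, so a chunk is less likely to be truncated when it is embedded.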

Phases completed

Phase 1: Ingestion

Goal: Ingest core data sources for retrieval.
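
For the GitHub part of ingestion, a crawl boils down to paging through the REST API's "list repository issues" endpoint with the PAT in the Authorization header. The following is a stdlib-only sketch under that assumption and not the actual fetch_github_issues.py; the function names are hypothetical, and rate limiting, retries, and writing to data/github_issues are left out.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.github.com"


def build_issues_request(repo, token, page=1, per_page=100):
    """Build an authenticated request for one page of a repository's issues."""
    query = urllib.parse.urlencode({"state": "all", "per_page": per_page, "page": page})
    url = f"{API_ROOT}/repos/{repo}/issues?{query}"
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    return urllib.request.Request(url, headers=headers)


def fetch_all_issues(repo, token):
    """Page through the issues endpoint until an empty page is returned."""
    issues, page = [], 1
    while True:
        with urllib.request.urlopen(build_issues_request(repo, token, page)) as resp:
            batch = json.load(resp)
        if not batch:
            return issues
        issues.extend(batch)
        page += 1
```

Note that this endpoint returns pull requests alongside issues; entries that are pull requests carry a "pull_request" key, which a crawler can use to tell the two apart.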

Phase 2: Semantic search

Goal: Generate and index embeddings in ChromaDB to support semantic search for RAG.
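
To make "semantic search" concrete: a vector store ranks stored chunks by how close their embeddings are to the query's embedding. The toy sketch below shows that ranking idea with plain cosine similarity over hand-written vectors. It is an illustration only, does not use ChromaDB or the mpnet model, and the names cosine_similarity and top_k are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, indexed, k=2):
    """Return the ids of the k indexed vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), chunk_id) for chunk_id, vec in indexed.items()]
    scored.sort(reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]


# Toy 2-dimensional "embeddings"; real ones have hundreds of dimensions.
toy_index = {
    "chunk-a": [1.0, 0.0],
    "chunk-b": [0.0, 1.0],
    "chunk-c": [0.9, 0.1],
}
nearest = top_k([1.0, 0.0], toy_index, k=2)
```

In the real pipeline the vectors come from the embedding model and the nearest-neighbour lookup is delegated to the vector database rather than computed by a linear scan like this one.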

Phase 3: Retrieval-Augmented Generation (RAG)

Goal: Combine retrieval with LLMs to produce grounded answers.
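
One common way to ground an answer is to paste the retrieved chunks into the prompt and instruct the model to rely on them. The sketch below shows only that prompt-assembly step, under the assumption that retrieval has already produced a ranked list of chunk texts; the function name build_rag_prompt, the prompt wording, and the max_chars budget are all hypothetical and not taken from the project.

```python
def build_rag_prompt(question, chunks, max_chars=6000):
    """Assemble a prompt that places numbered context chunks before the question.

    Chunks are added in ranked order until the character budget is used up.
    """
    context_parts, used = [], 0
    for i, chunk in enumerate(chunks, start=1):
        entry = f"[{i}] {chunk.strip()}"
        if used + len(entry) > max_chars:
            break
        context_parts.append(entry)
        used += len(entry)

    context = "\n\n".join(context_parts)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

The returned string would then be sent to the LLM as the user message. Numbering the chunks makes it possible to ask the model to cite which chunk supports each claim.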

Phase 4: Agentic Tooling

Goal: Evolve RAG into a modular, multi-step agent pipeline.

To-Do (WIP)

Phase 5: MCP refactor and Server Deployment

Goal: Turn the agent pipeline into a production-ready service.

rahulgoswami/LLM-based-solr-support-assistant