🧠 Local RAG with Ollama + ChromaDB

This project implements a fully local Retrieval-Augmented Generation (RAG) pipeline using:

  • 🦙 Ollama — for local embeddings and reasoning with LLMs
  • 🧩 ChromaDB — as a vector database
  • 📄 LangChain — to load, split, and process PDF documents

The goal is to transform a PDF into a semantic vector knowledge base that can later be queried by an LLM such as llama2 for contextual answers.


⚙️ Installation

1️⃣ Clone the repository

git clone https://github.com/javsan77/Local-RAG-with-Chroma-and-Ollama.git
cd Local-RAG-with-Chroma-and-Ollama

2️⃣ Create a virtual environment

python3 -m venv venv
source venv/bin/activate   # Linux/Mac
venv\Scripts\activate      # Windows

3️⃣ Install dependencies

pip install -r requirements.txt

4️⃣ Install and run Ollama

Download Ollama from ollama.com/download and start the local server:

ollama serve

Then pull the required models:

# Embedding model
ollama pull nomic-embed-text

# LLM for reasoning and Q&A
ollama pull llama2
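
You can confirm that both models were pulled successfully by listing the locally available models:

ollama list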

🧩 Generate the vector database

Place your PDF file as documento.pdf in the project root and run:

python rag_setup.py

📁 This will create a local Chroma database inside chroma_db/, where each document chunk is stored as a semantic vector.
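
For reference, the indexing step usually boils down to a few LangChain calls. The sketch below is an assumed outline, not the exact contents of rag_setup.py; the chunk size, overlap, and other settings may differ in the actual script:

# Assumed outline of the indexing step; see rag_setup.py for the real implementation.
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_ollama import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

# Load the PDF from the project root.
docs = PyPDFLoader("documento.pdf").load()

# Split pages into overlapping chunks so each embedding covers a focused passage.
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(docs)

# Embed each chunk with the local Ollama model and persist the vectors in chroma_db/.
Chroma.from_documents(
    chunks,
    embedding=OllamaEmbeddings(model="nomic-embed-text"),
    persist_directory="chroma_db",
)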


🧠 Test Ollama with Llama2

Before integrating the model into your app, test it manually:

ollama run llama2

Then type something like:

Hello, what can you do?

To exit interactive mode, type /bye (or press Ctrl + D).

This confirms that Ollama and the llama2 model are working correctly before you connect them to your RAG pipeline.
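
Once both models respond, you can query the persisted database from Python. The following is only a minimal sketch, assuming the database was built as described above; the question text and prompt wording are placeholders:

# Hypothetical query example: retrieve relevant chunks from chroma_db/ and let llama2 answer with that context.
from langchain_community.vectorstores import Chroma
from langchain_ollama import OllamaEmbeddings, OllamaLLM

# Reopen the persisted database with the same embedding model used at index time.
db = Chroma(
    persist_directory="chroma_db",
    embedding_function=OllamaEmbeddings(model="nomic-embed-text"),
)

question = "What is this document about?"  # placeholder question

# Retrieve the most similar chunks and pass them to llama2 as context.
context = "\n\n".join(doc.page_content for doc in db.similarity_search(question, k=4))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(OllamaLLM(model="llama2").invoke(prompt))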


🧰 Requirements

  • Python 3.10+
  • Ollama running locally (ollama serve)
  • Downloaded models: nomic-embed-text, llama2
  • Python dependencies from requirements.txt

Example requirements.txt:

langchain
langchain-community
langchain-ollama
pypdf
chromadb

👨‍💻 Author

Javier Sanchez
Backend Developer | AI & Data Enthusiast
🔗 GitHub


🧾 License

MIT License © 2025
