A comprehensive study repository for exploring LangChain, RAG (Retrieval-Augmented Generation), embeddings, and semantic vector search techniques with practical implementations.
This repository contains hands-on experiments and production-ready implementations of modern AI techniques, focusing on:
- LangChain Framework: Building sophisticated LLM applications
- RAG (Retrieval-Augmented Generation): Enhancing LLM responses with relevant context
- Vector Embeddings: Converting text into semantic representations
- Semantic Search: Finding relevant documents using meaning, not just keywords
- Document Processing: PDF parsing, chunking, and vectorization strategies
- PDF Processing: Extract and process text from PDF documents
- Vector Embeddings: Convert text chunks into semantic vectors using OpenAI
- Similarity Search: Find relevant content using semantic similarity (see the sketch after this list)
- Interactive Chat: Build a conversational interface with context-aware responses
- Structured Data: Work with JSON-based FAQ datasets
- Multi-Category Search: Handle different types of questions (product, service, technical)
- Production-Ready Chatbot: Implement a robust FAQ answering system
- Context Retrieval: Smart document retrieval for accurate responses
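To make the core idea concrete, here is a minimal sketch of embeddings plus similarity search with LangChain and OpenAI. The FAQ snippets and in-memory vector store are purely illustrative (this repository persists its vectors in PostgreSQL), and import paths may vary with your LangChain version:

```typescript
import { OpenAIEmbeddings } from "@langchain/openai";
import { MemoryVectorStore } from "langchain/vectorstores/memory";

// Turn a few text chunks into vectors and search them by meaning.
async function main() {
  const embeddings = new OpenAIEmbeddings(); // reads OPENAI_API_KEY from env

  const store = await MemoryVectorStore.fromTexts(
    [
      "Our product ships within 3 business days.",
      "Refunds are processed in 5 to 7 days.",
      "The API rate limit is 100 requests per minute.",
    ],
    [{ category: "product" }, { category: "service" }, { category: "technical" }],
    embeddings
  );

  // Semantic similarity search: matches by meaning, not by keywords.
  const results = await store.similaritySearch("How long does delivery take?", 1);
  console.log(results[0].pageContent);
}

main().catch(console.error);
```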
- TypeScript - Type-safe JavaScript development
- LangChain - Framework for building LLM applications
- OpenAI GPT-4 - Advanced language model for text generation
- OpenAI Embeddings - Text-to-vector conversion
- PostgreSQL - Vector database for storing embeddings
- Drizzle ORM - Type-safe database operations
- PDF-Parse - PDF document processing
- Node.js 18+
- PostgreSQL database
- OpenAI API key
- Clone the repository
  - `git clone https://github.com/Natanaelvich/langchain-rag-embeddings-study.git`
  - `cd langchain-rag-embeddings-study`
- Install dependencies
  - `npm install`
- Set up environment variables
  - `cp .env.example .env`
  - Edit `.env` with your OpenAI API key and database credentials
- Run the examples (a sketch of the underlying pipeline follows these steps)
  - PDF Processing & Chat: `npm run dev src/01-introduction/gpt-embeddings-pdf.ts`
  - FAQ Chatbot: `npm run dev src/02-real-world-faq/chat-faq.ts`
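Both examples follow the same load → chunk → embed pipeline. The sketch below shows roughly what the PDF path looks like; the file name, chunk sizes, and import paths are illustrative assumptions, not the exact code in `src/01-introduction/`:

```typescript
import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { OpenAIEmbeddings } from "@langchain/openai";

// Load a PDF, split it into overlapping chunks, and embed each chunk.
async function loadAndEmbedPdf(path: string) {
  const docs = await new PDFLoader(path).load();

  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1000,   // characters per chunk (illustrative values)
    chunkOverlap: 200, // overlap preserves context across chunk boundaries
  });
  const chunks = await splitter.splitDocuments(docs);

  const embeddings = new OpenAIEmbeddings();
  const vectors = await embeddings.embedDocuments(
    chunks.map((chunk) => chunk.pageContent)
  );

  return { chunks, vectors }; // ready to be written to the vector store
}

// "example.pdf" is a placeholder; drop your own PDF into tmp/pdf/.
loadAndEmbedPdf("tmp/pdf/example.pdf").then(({ chunks }) =>
  console.log(`Embedded ${chunks.length} chunks`)
);
```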
src/
├── 01-introduction/             # Basic embeddings and PDF processing
│   ├── gpt-embeddings-pdf.ts    # Interactive chat with PDF content
│   ├── load-embeddings-pdf.ts   # PDF loading and vectorization
│   └── search-embeddings-pdf.ts # Vector search implementation
├── 02-real-world-faq/           # Production FAQ system
│   ├── chat-faq.ts              # Interactive FAQ chatbot
│   ├── load-faq-data.ts         # FAQ data loading and processing
│   └── search-faq.ts            # FAQ-specific search logic
└── schema.ts                    # Database schema definitions
tmp/
├── agents-data/                 # Sample data for agents
├── faq-data/                    # FAQ datasets (product, service, technical)
└── pdf/                         # PDF documents for processing
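The `schema.ts` file defines how chunks and their vectors are persisted in PostgreSQL. As a rough sketch of what such a table can look like with Drizzle ORM and pgvector (assuming a recent `drizzle-orm` with pgvector support; the actual table and column names in this repo may differ):

```typescript
import { pgTable, serial, text, vector } from "drizzle-orm/pg-core";

// Hypothetical table: one row per text chunk, with its 1536-dimension
// OpenAI embedding stored in a pgvector column.
export const embeddings = pgTable("embeddings", {
  id: serial("id").primaryKey(),
  content: text("content").notNull(),
  embedding: vector("embedding", { dimensions: 1536 }).notNull(),
});
```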
- `npm run dev` - Start development server with hot reload
- `npm run build` - Build TypeScript to JavaScript
- `npm run start` - Run built application
- `npm run test` - Run test suite
- `npm run lint` - Check code quality
- `npm run format` - Format code with Prettier
- `npm run studio` - Open Drizzle Studio for database management
- Start with `01-introduction/` to understand basic concepts
- Learn about embeddings and vector search
- Build your first RAG application (a minimal end-to-end sketch follows this list)
- Explore `02-real-world-faq/` for production patterns
- Understand structured data processing
- Implement multi-category search
- Customize the implementations for your use case
- Add new data sources and processing pipelines
- Optimize performance and accuracy
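For that first RAG application, the end-to-end flow is retrieve → augment → generate. A minimal sketch, assuming a vector store already populated as in the earlier examples (the model name and prompt wording are illustrative):

```typescript
import { ChatOpenAI } from "@langchain/openai";
import { MemoryVectorStore } from "langchain/vectorstores/memory";

// Retrieval-Augmented Generation in three steps:
// 1. retrieve the chunks most similar to the question,
// 2. stuff them into the prompt as context,
// 3. ask the model to answer using only that context.
async function answer(question: string, store: MemoryVectorStore) {
  const docs = await store.similaritySearch(question, 3);
  const context = docs.map((doc) => doc.pageContent).join("\n---\n");

  const llm = new ChatOpenAI({ model: "gpt-4o-mini" }); // illustrative model name
  const response = await llm.invoke(
    `Answer using only the context below.\n\n` +
      `Context:\n${context}\n\n` +
      `Question: ${question}`
  );
  return response.content;
}
```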
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the MIT License - see the LICENSE file for details.
Star this repository if you found it helpful for your AI/ML journey!