🚀 Comprehensive study on LangChain, RAG, and vector embeddings with practical examples including PDF processing and FAQ chatbot implementation

Natanaelvich/ai-rag-embeddings-langchain

🚀 LangChain RAG Embeddings Study

A comprehensive study repository for exploring LangChain, RAG (Retrieval-Augmented Generation), embeddings, and semantic vector search techniques with practical implementations.

📋 Overview

This repository contains hands-on experiments and production-ready implementations of modern AI techniques, focusing on:

  • 🔗 LangChain Framework: Building sophisticated LLM applications
  • 🔍 RAG (Retrieval-Augmented Generation): Enhancing LLM responses with relevant context
  • 🧠 Vector Embeddings: Converting text into semantic representations
  • 🔎 Semantic Search: Finding relevant documents using meaning, not just keywords
  • 📄 Document Processing: PDF parsing, chunking, and vectorization strategies
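
To make the semantic search idea above concrete, here is a minimal sketch of ranking stored chunks by cosine similarity to a query embedding. It is an illustration only: `EmbeddedDoc`, `cosineSimilarity`, and `topK` are hypothetical names, the vectors are assumed to come from an embedding model, and the repository itself may delegate this ranking to the database or to LangChain.

```typescript
// Cosine similarity between two equal-length vectors: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as dissimilar to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical shape for a stored text chunk and its embedding.
interface EmbeddedDoc {
  text: string;
  embedding: number[];
}

// Return the k documents most similar to the query embedding, best match first.
function topK(query: number[], docs: EmbeddedDoc[], k: number): EmbeddedDoc[] {
  return docs
    .map((doc) => ({ doc, score: cosineSimilarity(query, doc.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.doc);
}
```

An in-memory scan like this is only practical for small collections. With larger ones, the same ranking is usually pushed down into the vector store.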

🎯 What You'll Learn

01 - Introduction to Embeddings

  • PDF Processing: Extract and process text from PDF documents
  • Vector Embeddings: Convert text chunks into semantic vectors using OpenAI
  • Similarity Search: Find relevant content using semantic similarity
  • Interactive Chat: Build a conversational interface with context-aware responses
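
Before text from a PDF can be embedded, it is normally split into chunks. The sketch below shows one simple way to do that, using fixed-size character windows with overlap. The function name `chunkText` and the character-based strategy are assumptions for illustration. The repository may instead use a LangChain text splitter with different boundaries.

```typescript
// Split text into chunks of at most `chunkSize` characters. Each chunk after
// the first starts `overlap` characters before the previous chunk ended, so
// sentences that straddle a boundary appear in both neighbours.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be at least 0 and smaller than chunkSize");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

For example, `chunkText("abcdefghij", 4, 1)` yields `["abcd", "defg", "ghij"]`. Larger overlaps preserve more context across boundaries at the cost of storing and embedding more text.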

02 - Real-World FAQ System

  • Structured Data: Work with JSON-based FAQ datasets
  • Multi-Category Search: Handle different types of questions (product, service, technical)
  • Production-Ready Chatbot: Implement a robust FAQ answering system
  • Context Retrieval: Smart document retrieval for accurate responses
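
The FAQ flow above can be sketched as three small steps: turn each JSON entry into text to embed, narrow the candidates by category, and assemble a prompt from the retrieved entries. Everything here is hypothetical, including the `FaqEntry` shape, the function names, and the prompt wording. The repository's JSON files and prompts may be structured differently.

```typescript
// Hypothetical FAQ entry shape; the categories mirror those named in this README.
type FaqCategory = "product" | "service" | "technical";

interface FaqEntry {
  category: FaqCategory;
  question: string;
  answer: string;
}

// Turn one FAQ entry into the text that would be embedded and stored.
function faqToDocument(entry: FaqEntry): string {
  return `Category: ${entry.category}\nQ: ${entry.question}\nA: ${entry.answer}`;
}

// Keep only entries in the requested category, or all entries when none is given.
function filterByCategory(entries: FaqEntry[], category?: FaqCategory): FaqEntry[] {
  return category ? entries.filter((e) => e.category === category) : entries;
}

// Assemble a prompt that asks the model to answer from the retrieved context only.
function buildPrompt(question: string, retrieved: FaqEntry[]): string {
  const context = retrieved.map(faqToDocument).join("\n---\n");
  return [
    "Answer the question using only the context below.",
    "If the context does not contain the answer, say you do not know.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Keeping the category in the embedded text as well as in a separate field lets the search work either by pre-filtering or by semantic similarity alone.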

🛠️ Technologies Used

  • TypeScript - Type-safe JavaScript development
  • LangChain - Framework for building LLM applications
  • OpenAI GPT-4 - Advanced language model for text generation
  • OpenAI Embeddings - Text-to-vector conversion
  • PostgreSQL - Database used as the vector store for embeddings
  • Drizzle ORM - Type-safe database operations
  • PDF-Parse - PDF document processing

🚀 Quick Start

Prerequisites

  • Node.js 18+
  • PostgreSQL database
  • OpenAI API key

Installation

  1. Clone the repository
git clone https://github.com/Natanaelvich/langchain-rag-embeddings-study.git
cd langchain-rag-embeddings-study
  2. Install dependencies
npm install
  3. Set up environment variables
cp .env.example .env
# Edit .env with your OpenAI API key and database credentials
  4. Run the examples

PDF Processing & Chat:

npm run dev src/01-introduction/gpt-embeddings-pdf.ts

FAQ Chatbot:

npm run dev src/02-real-world-faq/chat-faq.ts
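
Both examples depend on the values set in `.env` during step 3. A small startup check can report missing settings before any API or database call is made. The sketch below is an assumption-laden illustration: the variable names `OPENAI_API_KEY` and `DATABASE_URL` are guesses, so check `.env.example` for the names the project actually uses.

```typescript
// Return the names of required variables that are missing or blank.
function missingEnvVars(
  env: Record<string, string | undefined>,
  required: string[],
): string[] {
  return required.filter((name) => (env[name] ?? "").trim() === "");
}

// Assumed variable names; the real ones are listed in .env.example.
const REQUIRED_VARS = ["OPENAI_API_KEY", "DATABASE_URL"];
```

In a Node.js script this would typically be called as `missingEnvVars(process.env, REQUIRED_VARS)`, exiting with a clear message if the returned list is not empty.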

📁 Project Structure

src/
├── 01-introduction/             # Basic embeddings and PDF processing
│   ├── gpt-embeddings-pdf.ts    # Interactive chat with PDF content
│   ├── load-embeddings-pdf.ts   # PDF loading and vectorization
│   └── search-embeddings-pdf.ts # Vector search implementation
├── 02-real-world-faq/           # Production FAQ system
│   ├── chat-faq.ts              # Interactive FAQ chatbot
│   ├── load-faq-data.ts         # FAQ data loading and processing
│   └── search-faq.ts            # FAQ-specific search logic
└── schema.ts                    # Database schema definitions

tmp/
├── agents-data/                 # Sample data for agents
├── faq-data/                    # FAQ datasets (product, service, technical)
└── pdf/                         # PDF documents for processing

🔧 Available Scripts

  • npm run dev - Start development server with hot reload
  • npm run build - Build TypeScript to JavaScript
  • npm run start - Run built application
  • npm run test - Run test suite
  • npm run lint - Check code quality
  • npm run format - Format code with Prettier
  • npm run studio - Open Drizzle Studio for database management

🎓 Learning Path

Beginner Level

  1. Start with 01-introduction/ to understand basic concepts
  2. Learn about embeddings and vector search
  3. Build your first RAG application

Intermediate Level

  1. Explore 02-real-world-faq/ for production patterns
  2. Understand structured data processing
  3. Implement multi-category search

Advanced Level

  1. Customize the implementations for your use case
  2. Add new data sources and processing pipelines
  3. Optimize performance and accuracy

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🔗 Related Resources


⭐ Star this repository if you found it helpful for your AI/ML journey!
