A Next.js application that enables users to upload PDF documents and chat with their content using Couchbase vector search and OpenAI embeddings, built with the Mastra framework.
- Node.js 22+ and npm/yarn/pnpm
- Couchbase Capella account or local Couchbase cluster
- OpenAI API key for embeddings and chat
- Clone and install dependencies
```bash
git clone <repository-url>
cd couchbase-mastra-rag
npm install
```
- Environment Configuration
Create a `.env` file with these required variables:
```env
# Couchbase Vector Store Configuration
COUCHBASE_CONNECTION_STRING=couchbase://localhost
COUCHBASE_USERNAME=Administrator
COUCHBASE_PASSWORD=your_password
COUCHBASE_BUCKET_NAME=your_bucket
COUCHBASE_SCOPE_NAME=your_scope
COUCHBASE_COLLECTION_NAME=your_collection

# Embedding Configuration
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_DIMENSION=1536
EMBEDDING_BATCH_SIZE=100

# Chunking Configuration
CHUNK_SIZE=1000
CHUNK_OVERLAP=200

# Vector Index Configuration
VECTOR_INDEX_NAME=document-embeddings
VECTOR_INDEX_METRIC=cosine

# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key
```
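As a minimal sketch of how startup validation of these variables might look (the file name `config.ts` and the `loadConfig` helper are illustrative, not the project's actual code):

```ts
// config.ts (illustrative sketch) — fail fast if a required variable is missing.
const REQUIRED = [
  "COUCHBASE_CONNECTION_STRING",
  "COUCHBASE_USERNAME",
  "COUCHBASE_PASSWORD",
  "COUCHBASE_BUCKET_NAME",
  "COUCHBASE_SCOPE_NAME",
  "COUCHBASE_COLLECTION_NAME",
  "OPENAI_API_KEY",
] as const;

export function loadConfig() {
  const missing = REQUIRED.filter((name) => !process.env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return {
    connectionString: process.env.COUCHBASE_CONNECTION_STRING!,
    // Optional settings fall back to the defaults documented above.
    chunkSize: Number(process.env.CHUNK_SIZE ?? 1000),
    chunkOverlap: Number(process.env.CHUNK_OVERLAP ?? 200),
    embeddingModel: process.env.EMBEDDING_MODEL ?? "text-embedding-3-small",
  };
}
```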
- Couchbase Setup
  - Create a Couchbase Capella account or local cluster
  - Create a bucket, scope, and collection for document storage
  - Get the connection credentials and add them to your environment variables
- OpenAI Setup
  - Get an API key from the OpenAI Platform
  - Add it to your environment variables
```bash
# Development mode
npm run dev

# Production build
npm run build
npm start
```
Open http://localhost:3000 to access the application.
- Upload PDF: Drag and drop or select a PDF file (max 100MB)
- Processing: The app extracts text, creates embeddings, and stores them in Couchbase
- Chat: Navigate to the chat interface to ask questions about your document
- Search: The system uses vector similarity search to find relevant content
The application automatically validates all required environment variables on startup. Key configurations:
- Embedding Model: Uses OpenAI's `text-embedding-3-small` by default
- Chunking: Documents are split into 1000-character chunks with a 200-character overlap, matching the `CHUNK_SIZE` and `CHUNK_OVERLAP` defaults above (see the chunker sketch below)
- Vector Search: Cosine similarity for semantic search
- File Storage: PDFs stored in the `public/assets/` directory
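A minimal chunker along these lines (an illustrative sketch; the app's actual chunking logic may differ):

```ts
// Naive character-based chunker (illustrative sketch). Splits text into
// fixed-size windows that overlap so context isn't lost at chunk boundaries.
export function chunkText(text: string, size = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}
```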
The application follows a modern RAG (Retrieval-Augmented Generation) pattern with clear separation between frontend, backend, and data layers.
- Framework: Next.js 15 with React 19
- Components:
  - `PDFUploader`: Drag-and-drop interface using react-dropzone (sketched after this list)
  - `InfoCard`: Application information and instructions
  - `chatPage`: Chat interface for document interaction
- Styling: Tailwind CSS for responsive design
- File Handling: Client-side PDF validation and FormData submission
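For orientation, a `PDFUploader` in this style might look like the sketch below; the exact props and markup are assumptions, and only the endpoint and the react-dropzone usage follow the description above:

```tsx
"use client";
import { useCallback } from "react";
import { useDropzone } from "react-dropzone";

// Illustrative sketch of a PDFUploader: accept a single PDF (max 100 MB)
// and POST it to the ingestion endpoint as FormData.
export function PDFUploader() {
  const onDrop = useCallback(async (accepted: File[]) => {
    const file = accepted[0];
    if (!file) return;
    const body = new FormData();
    body.append("file", file);
    await fetch("/api/ingestPdf", { method: "POST", body });
  }, []);

  const { getRootProps, getInputProps } = useDropzone({
    onDrop,
    accept: { "application/pdf": [".pdf"] },
    maxSize: 100 * 1024 * 1024,
    multiple: false,
  });

  return (
    <div {...getRootProps()}>
      <input {...getInputProps()} />
      <p>Drag and drop a PDF here, or click to select one</p>
    </div>
  );
}
```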
- API Routes:
  - `/api/ingestPdf`: Handles PDF upload, text extraction, chunking, and vector storage (a handler sketch follows this list)
  - `/api/chat`: Chat endpoint for conversational AI functionality
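A minimal App Router handler for the ingestion route could look like this sketch; `chunkText`, `embedChunks`, and `storeEmbeddings` are hypothetical stand-ins for the project's internals:

```ts
// app/api/ingestPdf/route.ts (illustrative sketch)
import { NextResponse } from "next/server";
import pdf from "pdf-parse";

export async function POST(req: Request) {
  const form = await req.formData();
  const file = form.get("file");
  if (!(file instanceof File)) {
    return NextResponse.json({ error: "No PDF provided" }, { status: 400 });
  }

  // Extract raw text from the uploaded PDF.
  const buffer = Buffer.from(await file.arrayBuffer());
  const { text } = await pdf(buffer);

  // chunkText, embedChunks, storeEmbeddings: hypothetical helpers standing in
  // for the project's chunking, embedding, and Couchbase persistence steps.
  const chunks = chunkText(text);
  const embeddings = await embedChunks(chunks);
  await storeEmbeddings(chunks, embeddings);

  return NextResponse.json({ chunks: chunks.length });
}
```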
- Document Processing:
  - PDF text extraction using `pdf-parse`
  - Text chunking with configurable size and overlap
  - Embedding generation via OpenAI's `text-embedding-3-small` (a batching sketch follows this list)
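Assuming the OpenAI Node SDK, batched embedding generation might look like this sketch; `embedChunks` is an illustrative name, and the batch size mirrors `EMBEDDING_BATCH_SIZE`:

```ts
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Illustrative sketch: embed chunks in batches to stay under request limits.
export async function embedChunks(
  chunks: string[],
  batchSize = 100
): Promise<number[][]> {
  const vectors: number[][] = [];
  for (let i = 0; i < chunks.length; i += batchSize) {
    const batch = chunks.slice(i, i + batchSize);
    const res = await openai.embeddings.create({
      model: "text-embedding-3-small", // 1536-dimensional vectors
      input: batch,
    });
    vectors.push(...res.data.map((d) => d.embedding));
  }
  return vectors;
}
```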
- Vector Database: Couchbase for high-performance vector search
  - Stores document embeddings with metadata
  - Supports cosine similarity search (see the query sketch after this list)
  - Auto-creates vector indexes for semantic search
- File Storage: Local filesystem (`public/assets/`) for uploaded PDFs
- Embedding Model: OpenAI text-embedding-3-small (1536 dimensions)
- Agent Framework: Mastra for AI agent orchestration
- Vector Search: Semantic similarity matching for relevant content retrieval
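A hedged sketch of a cosine-similarity query using the Couchbase Node SDK's vector search API; the field name `embedding` is an assumption, and the app may instead go through Mastra's vector store:

```ts
import { connect, SearchRequest, VectorQuery, VectorSearch } from "couchbase";

// Illustrative sketch: top-k semantic search against the configured vector
// index. Connecting per call is shown for brevity; a real app would reuse
// one cluster connection.
export async function searchSimilar(queryVector: number[], k = 5) {
  const cluster = await connect(process.env.COUCHBASE_CONNECTION_STRING!, {
    username: process.env.COUCHBASE_USERNAME!,
    password: process.env.COUCHBASE_PASSWORD!,
  });
  const scope = cluster
    .bucket(process.env.COUCHBASE_BUCKET_NAME!)
    .scope(process.env.COUCHBASE_SCOPE_NAME!);

  // "embedding" is the assumed name of the vector field in stored documents.
  const request = SearchRequest.create(
    VectorSearch.fromVectorQuery(
      VectorQuery.create("embedding", queryVector).numCandidates(k)
    )
  );
  return scope.search(process.env.VECTOR_INDEX_NAME!, request);
}
```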
- Upload: User uploads PDF → stored locally + FormData sent to API
- Processing: PDF text extracted → chunked → embeddings generated → stored in Couchbase
- Query: User chat input → embedded → vector search → relevant chunks retrieved → LLM response
- Response: Generated answer returned to user interface
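To make the Query and Response steps concrete, here is an illustrative sketch of the answer-generation stage; the model choice and prompt wording are assumptions, and the app may route this through a Mastra agent:

```ts
import OpenAI from "openai";

const openai = new OpenAI();

// Illustrative sketch: answer a question from the retrieved chunks.
export async function answerQuestion(question: string, chunks: string[]) {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini", // assumed model; the app may use another
    messages: [
      {
        role: "system",
        content: "Answer using only the provided document excerpts.",
      },
      {
        role: "user",
        content: `Excerpts:\n${chunks.join("\n---\n")}\n\nQuestion: ${question}`,
      },
    ],
  });
  return completion.choices[0].message.content;
}
```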
- Environment-based configuration with validation
- Automatic index creation and management
- Error handling with graceful fallbacks