A Next.js application that enables users to upload PDF documents and chat with their content using Couchbase vector search and OpenAI embeddings, built with the Mastra framework.
- Node.js 22+ and npm/yarn/pnpm
- Couchbase Capella account or local Couchbase cluster
- OpenAI API key for embeddings and chat
- Clone and install dependencies
```bash
git clone <repository-url>
cd couchbase-mastra-rag
npm install
```
- Environment Configuration
Create a `.env` file with these required variables:
```env
# Couchbase Vector Store Configuration
COUCHBASE_CONNECTION_STRING=couchbase://localhost
COUCHBASE_USERNAME=Administrator
COUCHBASE_PASSWORD=your_password
COUCHBASE_BUCKET_NAME=your_bucket
COUCHBASE_SCOPE_NAME=your_scope
COUCHBASE_COLLECTION_NAME=your_collection

# Embedding Configuration
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_DIMENSION=1536
EMBEDDING_BATCH_SIZE=100

# Chunking Configuration
CHUNK_SIZE=1000
CHUNK_OVERLAP=200

# Vector Index Configuration
VECTOR_INDEX_NAME=document-embeddings
VECTOR_INDEX_METRIC=cosine

# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key
```
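As a minimal sketch of how startup validation of these variables might look (the file name `config.ts` and the `loadConfig` helper are illustrative, not the project's actual code):

```ts
// config.ts (illustrative sketch) — fail fast if a required variable is missing.
const REQUIRED = [
  "COUCHBASE_CONNECTION_STRING",
  "COUCHBASE_USERNAME",
  "COUCHBASE_PASSWORD",
  "COUCHBASE_BUCKET_NAME",
  "COUCHBASE_SCOPE_NAME",
  "COUCHBASE_COLLECTION_NAME",
  "OPENAI_API_KEY",
] as const;

export function loadConfig() {
  const missing = REQUIRED.filter((name) => !process.env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return {
    connectionString: process.env.COUCHBASE_CONNECTION_STRING!,
    // Optional settings fall back to the defaults documented above.
    chunkSize: Number(process.env.CHUNK_SIZE ?? 1000),
    chunkOverlap: Number(process.env.CHUNK_OVERLAP ?? 200),
    embeddingModel: process.env.EMBEDDING_MODEL ?? "text-embedding-3-small",
  };
}
```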
- Couchbase Setup
  - Create a Couchbase Capella account or local cluster
  - Create a bucket, scope, and collection for document storage
  - Get the connection credentials and add them to your environment variables
- OpenAI Setup
  - Get an API key from the OpenAI Platform
  - Add it to your environment variables
```bash
# Development mode
npm run dev

# Production build
npm run build
npm start
```
Open http://localhost:3000 to access the application.
- Upload PDF: Drag and drop or select a PDF file (max 100MB)
- Processing: The app extracts text, creates embeddings, and stores them in Couchbase
- Chat: Navigate to the chat interface to ask questions about your document
- Search: The system uses vector similarity search to find relevant content
The application automatically validates all required environment variables on startup. Key configurations:
- Embedding Model: Uses OpenAI's `text-embedding-3-small` by default
- Chunking: Documents are split into 1000-character chunks with a 200-character overlap, matching the `CHUNK_SIZE` and `CHUNK_OVERLAP` defaults above (see the chunker sketch below)
- Vector Search: Cosine similarity for semantic search
- File Storage: PDFs stored in the `public/assets/` directory
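A minimal chunker along these lines (an illustrative sketch; the app's actual chunking logic may differ):

```ts
// Naive character-based chunker (illustrative sketch). Splits text into
// fixed-size windows that overlap so context isn't lost at chunk boundaries.
export function chunkText(text: string, size = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}
```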
The application follows a modern RAG (Retrieval-Augmented Generation) pattern with clear separation between frontend, backend, and data layers.
- Framework: Next.js 15 with React 19
- Components:
  - `PDFUploader`: Drag-and-drop interface using react-dropzone (sketched after this list)
  - `InfoCard`: Application information and instructions
  - `chatPage`: Chat interface for document interaction
- Styling: Tailwind CSS for responsive design
- File Handling: Client-side PDF validation and FormData submission
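For orientation, a `PDFUploader` in this style might look like the sketch below; the exact props and markup are assumptions, and only the endpoint and the react-dropzone usage follow the description above:

```tsx
"use client";
import { useCallback } from "react";
import { useDropzone } from "react-dropzone";

// Illustrative sketch of a PDFUploader: accept a single PDF (max 100 MB)
// and POST it to the ingestion endpoint as FormData.
export function PDFUploader() {
  const onDrop = useCallback(async (accepted: File[]) => {
    const file = accepted[0];
    if (!file) return;
    const body = new FormData();
    body.append("file", file);
    await fetch("/api/ingestPdf", { method: "POST", body });
  }, []);

  const { getRootProps, getInputProps } = useDropzone({
    onDrop,
    accept: { "application/pdf": [".pdf"] },
    maxSize: 100 * 1024 * 1024,
    multiple: false,
  });

  return (
    <div {...getRootProps()}>
      <input {...getInputProps()} />
      <p>Drag and drop a PDF here, or click to select one</p>
    </div>
  );
}
```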
- API Routes:
  - `/api/ingestPdf`: Handles PDF upload, text extraction, chunking, and vector storage (a handler sketch follows this list)
  - `/api/chat`: Chat endpoint for conversational AI functionality
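A minimal App Router handler for the ingestion route could look like this sketch; `chunkText`, `embedChunks`, and `storeEmbeddings` are hypothetical stand-ins for the project's internals:

```ts
// app/api/ingestPdf/route.ts (illustrative sketch)
import { NextResponse } from "next/server";
import pdf from "pdf-parse";

export async function POST(req: Request) {
  const form = await req.formData();
  const file = form.get("file");
  if (!(file instanceof File)) {
    return NextResponse.json({ error: "No PDF provided" }, { status: 400 });
  }

  // Extract raw text from the uploaded PDF.
  const buffer = Buffer.from(await file.arrayBuffer());
  const { text } = await pdf(buffer);

  // chunkText, embedChunks, storeEmbeddings: hypothetical helpers standing in
  // for the project's chunking, embedding, and Couchbase persistence steps.
  const chunks = chunkText(text);
  const embeddings = await embedChunks(chunks);
  await storeEmbeddings(chunks, embeddings);

  return NextResponse.json({ chunks: chunks.length });
}
```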
- Document Processing:
  - PDF text extraction using `pdf-parse`
  - Text chunking with configurable size and overlap
  - Embedding generation via OpenAI's `text-embedding-3-small` (a batching sketch follows this list)
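Assuming the OpenAI Node SDK, batched embedding generation might look like this sketch; `embedChunks` is an illustrative name, and the batch size mirrors `EMBEDDING_BATCH_SIZE`:

```ts
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Illustrative sketch: embed chunks in batches to stay under request limits.
export async function embedChunks(
  chunks: string[],
  batchSize = 100
): Promise<number[][]> {
  const vectors: number[][] = [];
  for (let i = 0; i < chunks.length; i += batchSize) {
    const batch = chunks.slice(i, i + batchSize);
    const res = await openai.embeddings.create({
      model: "text-embedding-3-small", // 1536-dimensional vectors
      input: batch,
    });
    vectors.push(...res.data.map((d) => d.embedding));
  }
  return vectors;
}
```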
- Vector Database: Couchbase for high-performance vector search
  - Stores document embeddings with metadata
  - Supports cosine similarity search (see the query sketch after this list)
  - Auto-creates vector indexes for semantic search
- File Storage: Local filesystem (`public/assets/`) for uploaded PDFs
- Embedding Model: OpenAI text-embedding-3-small (1536 dimensions)
- Agent Framework: Mastra for AI agent orchestration
- Vector Search: Semantic similarity matching for relevant content retrieval
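A hedged sketch of a cosine-similarity query using the Couchbase Node SDK's vector search API; the field name `embedding` is an assumption, and the app may instead go through Mastra's vector store:

```ts
import { connect, SearchRequest, VectorQuery, VectorSearch } from "couchbase";

// Illustrative sketch: top-k semantic search against the configured vector
// index. Connecting per call is shown for brevity; a real app would reuse
// one cluster connection.
export async function searchSimilar(queryVector: number[], k = 5) {
  const cluster = await connect(process.env.COUCHBASE_CONNECTION_STRING!, {
    username: process.env.COUCHBASE_USERNAME!,
    password: process.env.COUCHBASE_PASSWORD!,
  });
  const scope = cluster
    .bucket(process.env.COUCHBASE_BUCKET_NAME!)
    .scope(process.env.COUCHBASE_SCOPE_NAME!);

  // "embedding" is the assumed name of the vector field in stored documents.
  const request = SearchRequest.create(
    VectorSearch.fromVectorQuery(
      VectorQuery.create("embedding", queryVector).numCandidates(k)
    )
  );
  return scope.search(process.env.VECTOR_INDEX_NAME!, request);
}
```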
- Upload: User uploads PDF → stored locally + FormData sent to API
- Processing: PDF text extracted → chunked → embeddings generated → stored in Couchbase
- Query: User chat input → embedded → vector search → relevant chunks retrieved → LLM response
- Response: Generated answer returned to user interface
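To make the Query and Response steps concrete, here is an illustrative sketch of the answer-generation stage; the model choice and prompt wording are assumptions, and the app may route this through a Mastra agent:

```ts
import OpenAI from "openai";

const openai = new OpenAI();

// Illustrative sketch: answer a question from the retrieved chunks.
export async function answerQuestion(question: string, chunks: string[]) {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini", // assumed model; the app may use another
    messages: [
      {
        role: "system",
        content: "Answer using only the provided document excerpts.",
      },
      {
        role: "user",
        content: `Excerpts:\n${chunks.join("\n---\n")}\n\nQuestion: ${question}`,
      },
    ],
  });
  return completion.choices[0].message.content;
}
```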
- Environment-based configuration with validation
- Automatic index creation and management
- Error handling with graceful fallbacks