🧠 Python Documentation RAG Chatbot

An intelligent, context-aware chatbot that helps users improve their coding skills through interactive conversations. Powered by Retrieval-Augmented Generation (RAG), the system pulls information from official Python documentation to provide accurate and relevant responses.

📌 About the Project

Our project is an intelligent chatbot designed to help users improve their Python coding skills through conversation-based learning. The chatbot uses Retrieval-Augmented Generation (RAG) to fetch information from the official Python documentation and provide accurate, grounded, and context-aware answers.

The chatbot retrieves its answers from the Python 3.13 documentation (the language model itself is not retrained) and enables:

Conversational search
Programming guidance
Real-time assistance using language models and document retrieval

🎯 Goal

Our goal is to deliver the following core features to support users in everyday learning:

Interactive Learning
Engage users in real-world conversations that enhance learning.
Library Assistance
Suggest Python libraries based on user needs and queries.
Programming Correction
Help users debug, understand, or improve their code using relevant documentation.

🔍 Scope

🖥️ Cleaner UI
Build a clean, modern UI using Next.js for a better user experience.
📚 One-Stop Shop for Python Libraries
The bot will support multiple Python libraries and fetch accurate documentation in response to user questions.
⚙️ Check Program Efficiency
- Retrieve and rank the top 3 most relevant documents from the vector database.
- Provide suggestions with context and explanations.
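
The top-3 ranking step can be illustrated without a vector database. The sketch below is a simplified stand-in for what FAISS does at scale: it scores toy vectors by cosine similarity and keeps the three best. The function names, file names, and vectors are hypothetical and exist only for illustration; real embeddings have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return (doc_id, score) pairs for the k most similar documents, best first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors standing in for real embeddings (hypothetical data).
docs = {
    "library/zipfile.txt": [0.9, 0.1, 0.0],
    "library/pathlib.txt": [0.7, 0.3, 0.1],
    "tutorial/classes.txt": [0.0, 0.2, 0.9],
    "library/os.txt": [0.6, 0.4, 0.2],
}
query = [1.0, 0.0, 0.0]

results = top_k(query, docs, k=3)
for doc_id, score in results:
    print(f"{score:.3f}  {doc_id}")
```

The retrieved chunks, together with their scores, are what the language model would receive as context when composing a suggestion.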

🏗️ Tech Stack

Component	Tech
Frontend (Planned)	Next.js
Backend	LangChain, FastAPI
LLM	OpenAI GPT-3.5 / GPT-4
Embeddings	OpenAI Embeddings
Vector Store	FAISS (local) or Pinecone/Chroma (cloud)
Data Source	Python 3.13 Documentation (Plain Text)
Tools	Jupyter, Git, Databricks (for data prep)

👨‍👩‍👧‍👦 Team Members

We will collaborate to implement the RAG pipeline and improve answer accuracy.
We will use vector databases to support semantic document search.
We will divide data collection, aggregation, and cleaning among team members.
We will use Databricks for large-scale processing and feature exploration.
We will iterate to handle real-world issues like ambiguity, noise, or conflicting documentation.

🧪 What the Code Does / Steps to Run

🔄 Pipeline Breakdown

1. Unzip Python 3.13 documentation
   The plain-text .zip file is extracted into a local folder using Python's zipfile.
2. Load all .txt files
   Reads every file in the extracted folder into memory using LangChain's Document class.
3. Split documents into chunks
   Uses RecursiveCharacterTextSplitter to break content into small, overlapping pieces for better semantic matching.
4. Generate embeddings
   OpenAI Embeddings are created for each chunk and stored in a FAISS vector store for fast retrieval.
5. Run Retrieval-Augmented Generation
   A LangChain RetrievalQA or ConversationalRetrievalChain is used to answer user questions based on retrieved content.
6. Interactive chat
   A loop runs in the notebook where the user can ask questions about Python, and the bot responds using official docs.
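
The first three steps (unzip, load, split) can be sketched with the standard library alone. This is not the notebook's code: the function names are hypothetical, plain strings stand in for LangChain Document objects, and `chunk_text` is a deliberately simple fixed-window splitter rather than RecursiveCharacterTextSplitter, which also tries to break on paragraph and sentence boundaries. The default chunk size and overlap are assumptions chosen for illustration.

```python
import zipfile
from pathlib import Path


def extract_docs(zip_path, out_dir):
    """Extract the documentation archive into out_dir and return that folder."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)
    return Path(out_dir)


def load_text_files(root):
    """Read every .txt file under root into a {relative_path: text} mapping."""
    root = Path(root)
    docs = {}
    for path in sorted(root.rglob("*.txt")):
        key = path.relative_to(root).as_posix()
        docs[key] = path.read_text(encoding="utf-8", errors="replace")
    return docs


def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into fixed-size windows that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
        start += step
    return chunks


# Example usage (paths are assumptions; adjust to your checkout):
# root = extract_docs("python-3.13-docs-text.zip", "python_docs")
# docs = load_text_files(root)
# chunks = {name: chunk_text(text) for name, text in docs.items()}
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which is the reason the pipeline uses overlapping pieces.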

🔐 How to Get OpenAI API Key & Run the Bot

To use OpenAI’s GPT models for answering questions, you'll need to generate an API key from your OpenAI account.

🧾 Step-by-Step Instructions

Create/Open an OpenAI account
Go to: https://platform.openai.com/signup
(Sign up or log in with your existing credentials)
Generate your API key
- Go to: https://platform.openai.com/account/api-keys
- Click “Create new secret key”
- Copy and save the key somewhere safe (you won’t be able to see it again!)

Set the API key in your code
You can pass it securely using Python:

import os
from getpass import getpass

os.environ["OPENAI_API_KEY"] = getpass("🔐 Enter your OpenAI API key: ")
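
When the notebook is re-run often, it can be convenient to reuse a key that is already set in the environment and prompt only when it is missing. The helper below is a minimal sketch of that idea; `ensure_openai_key` and its `ask` parameter are hypothetical names introduced here, not part of the project code.

```python
import os
from getpass import getpass


def ensure_openai_key(ask=getpass):
    """Return the OpenAI API key, prompting only if it is not already set.

    `ask` is any callable that takes a prompt string and returns the key;
    it defaults to getpass so the key is not echoed to the screen.
    """
    key = os.environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        key = ask("Enter your OpenAI API key: ").strip()
        if not key:
            raise ValueError("No API key provided")
        os.environ["OPENAI_API_KEY"] = key
    return key
```

Calling `ensure_openai_key()` once at the top of the notebook leaves `OPENAI_API_KEY` set for the rest of the session.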

🚀 Setup Instructions

Install Jupyter Notebook if you have not already: https://jupyter.org/
Get an OpenAI API key: https://platform.openai.com/settings/organization/api-keys
Download the Python documentation (plain-text archive): https://docs.python.org/3/download.html

Clone the repository:

git clone https://github.com/yourusername/cs6320_project.git
cd cs6320_project
