Individual Project 03
Machine Learning with Advanced Python
Infotech College of Business & IT
A Retrieval-Augmented Generation (RAG) powered chatbot built for Infotech College Website, enabling students and visitors to ask questions about courses, admissions, campus life, and more.
This project showcases practical use of Large Language Models (LLMs), Vector Databases, and Retrieval-Augmented Generation to build a domain-specific assistant.
- ✅ Document Processing: Loads and splits college PDFs into retrievable knowledge chunks
- ✅ Vector Search: Embeddings stored in ChromaDB for fast retrieval
- ✅ LLM Integration: Powered by Ollama with DeepSeek-r1 (1.5B) model
- ✅ RAG Pipeline: Retrieval + Prompting + Context-aware responses
- ✅ Interactive CLI: Ask questions directly from terminal
- ✅ Streamlit UI: Simple front-end interface for users
- ✅ Clean Responses: Custom function strips unnecessary “thinking” text from LLM outputs
ML_Projects_ChatBot_for_Infotech_College/
│── 1_data/ # PDF documents
|
│── 2_src/
│ ├── 1_chatbot/
│ │ ├── doc_load.py # Load PDFs
│ │ ├── doc_split.py # Split into chunks
│ │ ├── get_vectorstore.py # Create & save embeddings
│ │ ├── integrate_llm.py # Connect to Ollama LLM
│ │ ├── build_rag.py # RAG pipeline
│ │ ├── clean.py # Strip unwanted LLM text
│ │ └── chat_interactive.py # CLI chatbot
│ │
│ └── 2_frontend/
│ └── frontend.py # Streamlit interface (optional)
│
│── open_chatbot.py # Final chatbot entrypoint
|
│── .git/
│── chromadb/ # Vector database
│
│── requirements.txt # Dependencies
│── README.md # Project documentation
└── cover_image.png
- Language: Python 3.11
- RAG pipeline & integrations: LangChain
- Vector database: ChromaDB
- Local LLM integration: Ollama + DeepSeek-r1 : 1.5B model
- Embeddings: HuggingFace Sentence Transformers
all-MiniLM-L6-v2
- UI: Streamlit
- IDE: Visual Studio Code + Anaconda Distribution
git clone https://github.com/ThilinaPerera-DataAnalytics/ML_Projects_ChatBot_for_Infotech_College.git
cd ML_Projects_ChatBot_for_Infotech_College
conda create -n infochatbot python=3.11
conda activate infochatbot
pip install -r requirements.txt
Install Ollama → [ollama.com](https://ollama.com/)
ollama pull deepseek-r1:1.5b
# Load documents
python 2_src/1_chatbot/doc_load.py
# Split into chunks
python 2_src/1_chatbot/doc_split.py
# Create vectorstore
python 2_src/1_chatbot/get_vectorstore.py
# Run chatbot interactively
python 2_src/1_chatbot/chat_interactive.py
# Run final chatbot
python open_chatbot.py
streamlit run 2_src/2_frontend/frontend.py
Then open: http://localhost:8501
Streamlit Chat UI
Q: How can I enroll at Infotech College?
A: Visit Enrollment Page. Payment options include monthly installments, full payment, or smaller installment fees.
Q: What courses are offered?
A: Foundation in Information Technology (FIT) program, recognized by the University of Colombo School of Computing (UCSC), plus other IT-focused offerings.
Q: Can I study medicine there?
A: Infotech College specializes in IT & AI programs, not medicine.
(Lenovo T530 i7 quad core + Samsung SSD EVO 870 512GB + Kingston DDR3 8GB x 2 + NVIDIA NVS 5400M)
- Mistral (7.3B) → ~200 sec response time
- DeepSeek-r1 (1.5B) → ~60 sec response time (chosen for final chatbot)
- Vector store size → 63 embeddings across 39 documents
- Practical application of RAG pipelines for domain-specific QA
- Trade-offs between different LLM sizes (speed vs detail)
- Building both CLI and Web-based chatbot interfaces
The current version of the Infotech College Virtual Assistant delivers core RAG-based Q&A with document retrieval and a clean Streamlit interface. Future improvements will focus on expanding both functionality and user experience, including:
-
Multimodal Support: Allow the chatbot to handle not just text but also images, PDFs, and video transcripts for richer responses.
-
Chat History & Memory: Persist conversations across sessions, enabling follow-up questions and more natural dialog.
-
Admin Dashboard: Provide college staff with insights on frequently asked questions, student trends, and analytics.
-
Multi-Language Support: Integrate Sinhala and Tamil responses alongside English for accessibility.
-
Deployment at Scale: Host the chatbot on a secure cloud platform (AWS / Azure / GCP) with API endpoints for web and mobile.
-
Voice Interface: Enable speech-to-text and text-to-speech for students who prefer a voice-driven guide.
-
Integration with College Systems: Connect to CRM/ERP for real-time admission updates, course schedules, and event announcements.
- Lecturer Mr. Akshan Bandara for guidance & Infotech College staff for resource material.
- The open-source community behind tools like LangChain, Streamlit, ChromaDB, and Ollama which power the backbone of this chatbot.
- Hugging Face for providing accessible state-of-the-art embedding models.
Thilina Perera/ Data with TP
📌 Data Science/ Data Analytics
📌 Machine Learning, Deep Learning, LLM/LMM, NLP, and Automated Data Pipelines Explorer
🔍 LinkedIn | 🔍 GitHub | 🔍 YouTube
This project is licensed under the MIT License – free to use and extend.
Contributions are welcome! Please fork the repo and submit a pull request with improvements.
✨ If you like this project, don’t forget to star ⭐ the repo..!