This is an end-to-end project that builds a "Text-to-SQL" application using Google's Gemini Pro for free. The application allows non-technical users to ask complex questions in English, which are then converted into precise SQL queries, executed against a soccer database, and the results are returned in a user-friendly interface.
-
Project Overview
-
Objectives
-
Tech Stack
-
Project Workflow
-
Key Findings & Insights
-
Repository Structure
-
How to Run This Project
-
Important Note & Future Directions
This project demonstrates the power of Large Language Models (LLMs) by creating an intelligent natural language interface for a complex database, which bridges the gap between intricate data structures and users who lack SQL expertise. The application uses a sophisticated prompt engineering strategy to guide the Gemini 2.5 Pro model, ensuring it generates accurate, efficient, and secure SQL queries tailored specifically for a SQLite environment. The entire application is wrapped in an interactive web interface built with Streamlit.
In many organizations, valuable data is stored in complex databases, accessible only to data analysts or developers proficient in SQL. This creates a bottleneck, slowing down decision-making for business users, coaches, or scouts who need quick answers from the data.
The primary objectives of this project are:
- Eliminate Technical Barriers: Enable users to query a database using natural English without writing any code.
- Accelerate Data-Driven Insights: Provide an interactive tool for real-time data exploration and analysis.
- Ensure Query Accuracy: Leverage advanced prompt engineering to guide the LLM into generating syntactically correct and context-aware SQL queries.
- Build a Scalable Solution: Create a modular application that can be adapted to different databases by dynamically reading and understanding the database schema.
- Language: Python
- LLM: Google Gemini 2.5 Pro
- Database: SQLite
- Web Framework: Streamlit
- Environment Management: python-dotenv
The application follows a streamlined, multi-step process to convert a user's question into a data-driven answer.
- Dynamic Schema Ingestion: The application first connects to the SQLite database and automatically reads its schema. It identifies all tables and their respective columns to understand the database structure.
- Prompt Engineering: A detailed system prompt is constructed dynamically. This prompt instructs the Gemini Pro model on its role as an expert SQL developer, provides the complete database schema, and lists critical rules for generating valid, optimized SQLite queries.
- LLM-Powered Query Generation: The user's question is combined with the system prompt and sent to the Gemini Pro API. The model processes the request and returns a precise SQL query string as output.
- Secure Database Execution: The application takes the SQL query generated by the LLM and executes it against the SQLite database to fetch the requested data.
- Interactive Visualization: The retrieved data is displayed to the user in a clean, tabular format within the Streamlit web application.
- Natural Language Querying: Allows users to ask complex questions like, "Who is the tallest player with the highest jumping rating?" and get immediate, accurate answers.
- Dynamic Schema Awareness: Automatically detects the database schema, making the application adaptable to other databases with minimal code changes.
- SQLite-Specific Optimization: The prompt includes specific instructions for SQLite syntax (e.g., || for concatenation, COALESCE() for NULLs), ensuring the generated queries are highly optimized and error-free.
- Robust Error Handling: The prompt guides the LLM to handle ambiguous questions or queries that cannot be answered with the available data, improving the user experience.
- Interactive Web UI: A clean and simple user interface built with Streamlit allows for easy interaction and data visualization.
├── README.md ├── SQL.py # Database Ingestion ├── APP.py # Core Streamlit application logic ├── requirements.txt # Python dependencies ├── Results │ ├── SQL Query │ └── Application Output └── 📦 Database(source link).txt # The soccer database (download separately)
Follow these steps to set up and run the project on your local machine.
Prerequisites
- Python 3.9+
- An active Google API Key with the Gemini API enabled.
Steps 1. Clone the repository:
git clone https://github.com/nipunchauhan/Text-to-SQL-LLM-Application-with-Google-Gemini-Pro.git cd Text-to-SQL-LLM-Application-with-Google-Gemini-Pro
2. Create and activate a virtual environment:
# For macOS/Linux python3 -m venv venv source venv/bin/activate # For Windows python -m venv venv .\venv\Scripts\activate
3. Install the required libraries:
pip install -r requirements.txt
4. Download the Database:
Download the SQLite database from the Dataset
Place the database.sqlite file in the root directory of the project.
5. Set up your Google API Key:
To get an API key go to Google and get the free access.
Create a file named .env in the root directory.
Add your Google API key to this file as follows: GOOGLE_API_KEY="YOUR_API_KEY_HERE"
7. Run the Streamlit application:
streamlit run APP.py
Open your web browser and navigate to the local URL provided by Streamlit
This project serves as a powerful proof-of-concept and a foundational first version. While highly capable, it is not a production-ready system and still has room for growth.
Known Limitations:
- The LLM can occasionally generate an incorrect or inefficient SQL query, especially for highly complex or ambiguous questions.
- The application's performance is tied to the local machine's capabilities and the size of the SQLite database.
There are many exciting directions to take this project:
-
Cloud Integration: Scale the application by migrating the database to a cloud service like Google Cloud SQL or Amazon RDS to handle much larger datasets and higher traffic.
-
Advanced Chat: Implement a more conversational, chat-like interface that remembers the context of previous questions.
-
Support for Multiple Databases: Extend the system to connect to other SQL databases like PostgreSQL or MySQL.
-
Enhanced UI: Add features for data visualization, allowing users to generate charts and graphs directly from their queries.
Nipun Chauhan
Email: [email protected]
Phone: (469) 386-8456