This project provides automated code analysis for Java-based repositories. It performs the following key tasks:
- Clone a GitHub repository
  - If the repository is not already cloned, it is cloned into a specified directory.
- Read and split Java code
  - Loads Java files from the repository and splits them into smaller chunks to stay within token limits.
- Build a vector store
  - Uses FAISS (Facebook AI Similarity Search) to build a vector store from the code chunks for efficient semantic search.
- Query the vector store
  - Accepts a user query, searches the vector store, and retrieves the most relevant code chunks.
- Analyze code with LLM
  - Sends the top-matching code chunks to Google's Generative AI model and extracts structured information in JSON format.
- Save the output
  - Saves the structured summary to a JSON file.
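
To make the splitting step concrete, here is a minimal sketch of chunking with overlap. It is an illustration only: the function name `split_into_chunks` and the `chunk_size` and `overlap` defaults are assumptions, it counts characters rather than tokens, and the project itself may rely on a LangChain text splitter instead.

```python
def split_into_chunks(text, chunk_size=1000, overlap=100):
    """Split text into overlapping character windows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    java_source = "public class Customer { private int id; }"
    for chunk in split_into_chunks(java_source, chunk_size=20, overlap=5):
        print(repr(chunk))
```

The overlap keeps a little shared context between neighbouring chunks, so a method that straddles a boundary is less likely to be cut off from its signature.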
First, clone this project to your local machine:
git clone https://github.com/your-repository/Code-Analysis-Project.git
cd Code-Analysis-Project
Ensure you have Python 3.8 or higher installed.
Then install the required dependencies:
pip install -r requirements.txt
Dependencies include:
langchain
langchain-google-genai
sentence-transformers
faiss-cpu
python-dotenv
numpy
You will need a Google API key to access Gemini models.
Create a `.env` file in the root directory and add:
GOOGLE_API_KEY=your_google_api_key
Replace `your_google_api_key` with your actual API key.
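
As a rough sketch of how the key might be read at startup, the snippet below loads the `.env` file with `load_dotenv` from python-dotenv (when that package is installed) and fails early if the key is missing or still set to the placeholder. The helper name `get_google_api_key` is hypothetical and not taken from the project.

```python
import os

try:
    from dotenv import load_dotenv  # provided by the python-dotenv package
except ImportError:  # fall back to plain environment variables
    load_dotenv = None


def get_google_api_key(env=None):
    """Return GOOGLE_API_KEY, or raise if it is missing (hypothetical helper)."""
    if env is None:
        if load_dotenv is not None:
            load_dotenv()  # copies entries from .env into os.environ
        env = os.environ
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key or key == "your_google_api_key":
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to your .env file")
    return key


if __name__ == "__main__":
    try:
        get_google_api_key()
        print("API key found")
    except RuntimeError as err:
        print(err)
```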
Run the following command:
python main.py
This will perform the following:
- Clone the Sakila repository (if not already cloned).
- Load and split Java files.
- Build the FAISS vector store.
- Accept a user query.
- Analyze the matching code using Gemini LLM.
- Save the output to a JSON file.
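
The "if not already cloned" check in the first step could look roughly like the sketch below. It assumes the `git` command-line tool is on the `PATH`; the function name `ensure_cloned` and its `run` parameter are hypothetical, and the project's `clone_repo.py` may well do this differently (for example with a Git library).

```python
import os
import subprocess


def ensure_cloned(repo_url, target_dir, run=subprocess.run):
    """Clone repo_url into target_dir unless a Git checkout is already there.

    Returns True if a clone was started, False if it was skipped.
    The `run` parameter exists only so the command can be swapped out in tests.
    """
    if os.path.isdir(os.path.join(target_dir, ".git")):
        return False  # already cloned, nothing to do
    run(["git", "clone", repo_url, target_dir], check=True)
    return True
```

Checking for the `.git` directory rather than for the target directory itself avoids treating an empty or half-created folder as a finished clone.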
When you run the project, you will see:
What would you like to know about the SakilaProject?
Example queries you can ask:
- "What are the key classes in the SakilaProject?"
- "What does the method getCustomer do?"
- "Explain the database connection part."
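
Behind each query, the vector store compares an embedding of the question with the embeddings of the code chunks and returns the closest ones. The sketch below shows that idea as a brute-force cosine-similarity search in plain Python. It is not FAISS and not the project's code: the function names are hypothetical, the vectors are hand-written toy values rather than real embeddings, and FAISS may be configured with a different distance metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, chunks, k=3):
    """Return the k chunks whose vectors are most similar to query_vec."""
    scored = [
        (cosine_similarity(query_vec, vec), chunk)
        for vec, chunk in zip(chunk_vecs, chunks)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


if __name__ == "__main__":
    chunks = ["class Customer", "class Film", "getCustomer()"]
    vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy embeddings
    print(top_k([1.0, 0.1], vectors, chunks, k=2))
```

A real index avoids scoring every chunk on every query, which is the main reason to use FAISS once the repository grows beyond a few hundred chunks.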
File Name | Purpose |
---|---|
clone_repo.py | Contains the function to clone a GitHub repository. |
load_split.py | Loads Java files and splits code into manageable chunks. |
buildvector_faiss.py | Builds the FAISS vector store and handles semantic search operations. |
analyze_code.py | Sends code to LLM and processes the analysis response. |
save_output.py | Saves the final JSON output. |
main.py | Main entry point integrating all functionalities. |
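
For the last two stages, a minimal sketch of turning the model's reply into JSON and writing it to disk might look like this. It assumes the reply contains a single JSON object, possibly surrounded by extra text; the names `extract_json` and `save_summary` are hypothetical and are not the actual functions in `analyze_code.py` or `save_output.py`.

```python
import json


def extract_json(response_text):
    """Parse the first-to-last brace span of an LLM reply as a JSON object."""
    start = response_text.find("{")
    end = response_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in response")
    return json.loads(response_text[start:end + 1])


def save_summary(summary, path):
    """Write the structured summary to path as indented UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(summary, f, indent=2, ensure_ascii=False)
```

Slicing between the outermost braces tolerates replies that wrap the JSON in prose or a Markdown fence, but it will still raise if the model returns malformed JSON, which is usually preferable to saving a broken file silently.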
This project is licensed under the MIT License.
See the LICENSE file for details.