title | emoji | colorFrom | colorTo | sdk | sdk_version | app_file | pinned |
---|---|---|---|---|---|---|---|
Insights | 📈 | gray | yellow | streamlit | 1.33.0 | app.py | false |
Insights is a data analysis tool that uses the Gemini-Pro large language model (LLM) to automate the data analysis process. It aims to perform end-to-end data analysis tasks, saving time and cost while matching or exceeding the performance of junior data analysts.
In today's data-driven world, robust data analysis tools are crucial for informed decision-making and strategic planning. Traditional data analysis methods often face challenges such as time-consuming processes, potential for errors, and the need for specialized expertise. Insights addresses these issues by utilizing AI to streamline and enhance the data analysis process.
- Automated Data Analysis: Perform data collection, visualization, and analysis with minimal human intervention.
- Advanced Summarization: Generate detailed summaries and potential questions for datasets.
- Exploratory Data Analysis (EDA): Tools for statistical summaries, distribution plots, and correlation matrices.
- Data Cleaning and Transformation: Functions for handling missing values, outlier detection, normalization, and feature engineering.
- Machine Learning Toolkit: Automates model selection, training, hyperparameter tuning, and evaluation.
- Query Answering Module: Generate Python code to answer user queries and produce visualizations.
The Insights tool is built on the Gemini platform and consists of three main components:
- Summary Module: Extracts essential details about the dataset and generates a comprehensive summary along with potential questions for further exploration.
- QA Module: Handles user queries related to the dataset, generating Python code to answer the queries and produce visualizations.
- Code Execution and Analysis Generation: Executes the generated Python code offline to ensure data security, producing detailed responses and visualizations.
The Summary Module works in three steps:
- Information Extraction: Extracts critical information from the dataset.
- Prompting Gemini: Constructs a detailed prompt for Gemini to generate summaries and questions.
- Summary and Question Generation: Generates a summary and potential questions for user review.
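The extraction and prompting steps above can be sketched as follows. This is a minimal illustration using only the standard library, not the tool's actual implementation: the function names `extract_dataset_info` and `build_summary_prompt`, the choice of fields to extract, and the prompt wording are all assumptions. The call to Gemini itself is left out.

```python
import csv
import io


def _is_number(value):
    """Return True if the string can be parsed as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def extract_dataset_info(csv_text, sample_rows=3):
    """Hypothetical helper: collect row count, rough column types,
    missing-value counts, and a few sample rows from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []
    info = {"n_rows": len(rows), "columns": {}, "sample": rows[:sample_rows]}
    for col in columns:
        values = [row[col] for row in rows]
        present = [v for v in values if v not in ("", None)]
        is_numeric = bool(present) and all(_is_number(v) for v in present)
        info["columns"][col] = {
            "type": "numeric" if is_numeric else "text",
            "missing": len(values) - len(present),
        }
    return info


def build_summary_prompt(info):
    """Hypothetical helper: turn the extracted details into a text prompt."""
    lines = [f"The dataset has {info['n_rows']} rows and the following columns:"]
    for name, meta in info["columns"].items():
        lines.append(f"- {name} ({meta['type']}, {meta['missing']} missing)")
    lines.append("Sample rows:")
    lines.extend(str(row) for row in info["sample"])
    lines.append(
        "Write a short summary of this dataset and suggest questions worth exploring."
    )
    return "\n".join(lines)


# Tiny made-up dataset, used only to exercise the helpers.
CSV_TEXT = "city,temp\nOslo,3.5\nCairo,\nLima,19\n"
info = extract_dataset_info(CSV_TEXT)
prompt = build_summary_prompt(info)
print(prompt)
```

Sending only column metadata and a few sample rows keeps the prompt short; how much of the dataset the real tool includes is not stated here.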
The data analysis toolkit includes tools for EDA, data cleaning, and data transformation.
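As a rough illustration of the cleaning and transformation steps such a toolkit covers, the sketch below handles missing values, flags outliers, and normalizes a numeric column. The specific techniques (median imputation, the 1.5 × IQR rule, min-max scaling) and all function names are assumptions chosen for the example; the tool may use different methods.

```python
import statistics


def fill_missing_with_median(values):
    """Replace None entries with the median of the observed values."""
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    return [median if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def min_max_normalize(values):
    """Scale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up column with one missing value and one extreme value.
raw = [10, 12, None, 11, 13, 95]
filled = fill_missing_with_median(raw)
outliers = iqr_outliers(filled)
normalized = min_max_normalize(filled)
print(filled, outliers, normalized)
```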
The Machine Learning Toolkit facilitates the creation and evaluation of machine learning models on the dataset.
The Query Answering Module allows users to query the dataset and receive answers along with visualizations. The process involves:
- Accepting user queries.
- Combining queries with dataset information.
- Generating and executing Python code offline.
- Producing visualizations and textual data.
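The query flow above can be sketched as shown below. This is an illustration under stated assumptions, not the tool's code: `build_code_prompt` and `run_generated_code` are hypothetical names, the prompt wording is invented, and `GENERATED_CODE` is a hard-coded stand-in for whatever the model would return. Only the local execution step is actually run.

```python
import contextlib
import io


def build_code_prompt(query, dataset_info):
    """Hypothetical helper: combine the user query with dataset information."""
    return (
        "You are given a dataset described as follows:\n"
        f"{dataset_info}\n"
        f"Write Python code that answers this question: {query}\n"
        "Store the answer in a variable named `result`."
    )


def run_generated_code(code, data):
    """Run generated code locally, with the dataset exposed as `data`.

    Returns the value of `result` (if the code set one) and anything printed.
    """
    namespace = {"data": data}
    stdout = io.StringIO()
    with contextlib.redirect_stdout(stdout):
        exec(code, namespace)
    return namespace.get("result"), stdout.getvalue()


# Stand-in for model output; a real run would get this text from the LLM.
GENERATED_CODE = """
result = sum(row["temp"] for row in data) / len(data)
print(f"Average temperature: {result:.1f}")
"""

data = [{"city": "Oslo", "temp": 3.0}, {"city": "Lima", "temp": 19.0}]
prompt = build_code_prompt(
    "What is the average temperature?",
    "columns: city (text), temp (numeric)",
)
result, printed = run_generated_code(GENERATED_CODE, data)
print(result, printed)
```

Running the code locally means the dataset rows never leave the machine; only the prompt text is sent to the model. Note that `exec` is not a sandbox, so generated code should be reviewed or isolated before it is run on untrusted input.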
The analysis generation step processes the output from code execution to create concise and insightful responses.
- Initialize the Tool:
streamlit run app.py
- Load Dataset: Upload your dataset when prompted.
- Generate Summary: The tool will automatically generate a summary and potential questions.
- Exploratory Data Analysis: Use the EDA tools to explore your dataset.
- Query the Dataset: Enter your queries to receive answers and visualizations.
- Analyze Results: Review the detailed analysis generated by the tool.
- Install the required packages:
The project's dependencies are listed in the 'requirements.txt' file. You can install all of them using pip:
pip install -r requirements.txt
- Run the application:
Now, you're ready to run the application. Use the following command to start the Streamlit server:
streamlit run app.py
- Build the Docker image with
docker build -t insights .
- Run the Docker container with
docker run -p 8501:8501 -e GOOGLE_API_KEY=<your-api-key> insights
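The `-e GOOGLE_API_KEY=...` flag makes the key available inside the container as an environment variable. A minimal sketch of how an application could read and validate it is shown below; the helper name `get_api_key` is an assumption, and the actual lookup in `app.py` may differ.

```python
import os


def get_api_key(env=os.environ):
    """Hypothetical helper: read GOOGLE_API_KEY and fail early if it is absent."""
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GOOGLE_API_KEY is not set; pass it with `docker run -e GOOGLE_API_KEY=...`"
        )
    return key


# Demonstration with an explicit mapping instead of the real environment.
demo_key = get_api_key({"GOOGLE_API_KEY": "example-key"})
print(demo_key)
```

Failing at startup with a clear message is easier to diagnose than an authentication error on the first model call.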