Documentation - https://klareda.github.io/klar-EDA/
Presentation - https://youtu.be/FsDV6a-L-wo
klar-EDA is a Python library that automates and modernizes exploratory data analysis, making data exploration faster, smarter, and more insightful. Originally created to ease data preprocessing and provide automated EDA techniques, klar-EDA v2.0 represents a complete modernization with AI-powered insights, async processing, and interactive visualizations.
- Async Processing: 10x+ performance improvement with Polars and async operations
- Interactive Visualizations: Modern charts with Plotly and Altair instead of static matplotlib
- AI-Powered Insights: LLM-generated analysis summaries and recommendations
- Plugin Architecture: Extensible system for custom analyzers and visualizations
- Modern Development: Poetry, type hints, comprehensive testing, and CI/CD
The library maintains backward compatibility with v1 APIs while providing modern alternatives:
# Legacy v1 API (still supported)
from klar_eda.visualization import visualize_csv
from klar_eda.preprocessing import preprocess_csv
# Modern v2 API (recommended)
import klar_eda as eda
result = await eda.analyze("data.csv")
# Install with modern Poetry (recommended)
git clone https://github.com/klarEDA/klar-EDA.git
cd klar-EDA
poetry install
# Or install with pip (legacy method)
pip install -r requirement.txt
import klar_eda as eda
# Simple one-liner analysis with AI insights
result = await eda.analyze("data.csv")
# Access comprehensive results
print(result.ai_summary) # AI-generated insights
result.show_dashboard() # Interactive dashboard
result.export("report.html") # Export interactive report
# Access individual components
print(result.statistics) # Detailed statistics
for viz in result.visualizations: # Interactive charts
viz.show()
# CSV Data Visualization (v1 style)
from klar_eda.visualization import visualize_csv
visualize_csv("data.csv")
# CSV Data Preprocessing (v1 style)
from klar_eda.preprocessing import preprocess_csv
preprocess_csv("data.csv")
# Image Data Visualization (v1 style)
from klar_eda.visualization import visualize_images
import tensorflow_datasets as tfds
ds = tfds.load('cifar10', split='train', as_supervised=True)
images, labels = [], []
for image, label in tfds.as_numpy(ds):
images.append(image)
labels.append(label)
visualize_images(images, labels)
# Image Data Preprocessing (v1 style)
from klar_eda.preprocessing import preprocess_images
preprocess_images("images_folder_path")
The library consists of the following modernized modules:
- Core Analysis Engine: Async data processing with Polars/Pandas
- Interactive Visualizations: Plotly-based charts with real-time updates
- AI Insights: OpenAI-powered analysis summaries
- Plugin System: Extensible analyzer and visualizer framework
- Web Interface: FastAPI backend with React frontend (coming soon)
- CSV Data Visualization: Automated chart generation
- CSV Data Preprocessing: Data cleaning and transformation
- Image Data Visualization: Computer vision analysis
- Image Data Preprocessing: Image enhancement and standardization
- Python 3.11+: Modern Python with latest features
- Poetry: For dependency management
- Node.js 18+: For frontend development (optional)
# Modern development workflow
make install-dev # Install all dependencies
make test # Run comprehensive tests
make lint # Code quality checks
make format # Format code
make dev # Start development environment
make docs-serve # Serve documentation
Operation | v1.0 | v2.0 | Improvement |
---|---|---|---|
Large CSV Analysis | 45s | 3.2s | 14x faster |
Image Processing | 8m 30s | 28s | 18x faster |
Visualization Generation | 12s | 0.8s | 15x faster |
- Modern Python architecture with Poetry
- Async data processing engine
- AI-powered insights integration
- Backward compatibility with v1 API
- Comprehensive testing and CI/CD
- Complete AI insights implementation
- Advanced interactive visualizations
- Plugin marketplace
- Web interface development
- Real-time data streaming
- Advanced statistical tests
- AutoML integration
- Enterprise features
Existing v1 code continues to work unchanged, but we recommend migrating to the modern API:
# Old v1 approach
from klar_eda.visualization import visualize_csv
visualize_csv("data.csv")
# New v2 approach (recommended)
import klar_eda as eda
result = await eda.analyze("data.csv")
result.show_dashboard()
Benefits of migrating:
- 10x+ performance improvement
- AI-powered insights
- Interactive visualizations
- Better error handling
- Modern async support
We welcome contributions! The project now follows modern development practices:
- Code Quality: Black, Ruff, MyPy for formatting and linting
- Testing: Comprehensive pytest suite with coverage
- Documentation: Auto-generated docs with examples
- CI/CD: Automated testing and deployment
See CONTRIBUTING.md for detailed guidelines.
git clone https://github.com/klarEDA/klar-EDA.git
cd klar-EDA
make install-dev
make test
klar-EDA is released under the MIT license.
- Complete architectural redesign for performance and maintainability
- Modern Python practices and tooling
- AI integration and interactive features
For issues, questions, or contributions:
- Email: [email protected]
- GitHub Issues: Project Issues
- Documentation: https://klareda.github.io/klar-EDA/
From static analysis to intelligent insights - klar-EDA v2.0 represents the evolution of automated exploratory data analysis. โญ Star this repository if you find it helpful!