🚀 High-Performance Polars Backend + Simplified Text API

🎯 Key Features

⚡ NEW: High-Performance Polars Backend (2-100x faster!)

Dual DataFrame Support: Choose between pandas (default) or Polars backends
Zero Code Changes: Add backend='polars' to any operation for instant speedups
Comprehensive Coverage: All data types (arrays, text, files) work with both backends
Smart Type Preservation: DataFrames maintain their type when no backend specified
Global Configuration: Set default backend preference with set_dataframe_backend('polars')
Cross-Backend Conversion: Seamlessly convert between pandas and Polars DataFrames

📊 Performance Gains with Polars

Array Processing: 2-100x faster conversion for large datasets
Text Embeddings: 3-10x faster document processing
Memory Efficiency: 30-70% reduction in memory usage
Parallel Processing: Built-in multi-core optimization

🎨 Simplified Text Model API (80% reduction in verbosity)

Simple String Format: {'model': 'all-MiniLM-L6-v2'} now works everywhere
Automatic Normalization: All model formats converted to unified dict internally
List Support: Lists of models work with simplified format
Full Backward Compatibility: All existing verbose syntax continues working

📋 Quick Start Examples

High-Performance Processing

import datawrangler as dw
import numpy as np

# Large dataset example
large_array = np.random.rand(50000, 20)

# Traditional pandas backend
pandas_df = dw.wrangle(large_array)  # Default

# High-performance Polars backend (2-100x faster!)
polars_df = dw.wrangle(large_array, backend='polars')

# Set global preference
from datawrangler.core.configurator import set_dataframe_backend
set_dataframe_backend('polars')  # All operations now use Polars

Simplified Text Processing

# Before v0.4.0 (verbose)
text_kwargs = {
    'model': {
        'model': 'all-MiniLM-L6-v2',
        'args': [],
        'kwargs': {}
    }
}

# After v0.4.0 (simplified!)
text_kwargs = {'model': 'all-MiniLM-L6-v2'}

# Works with Polars for 3-10x faster text processing
fast_embeddings = dw.wrangle(texts, text_kwargs=text_kwargs, backend='polars')

🔧 Additional Improvements

- Google Colab Fix: Eliminated installation warning popup
- Cleaner Dependencies: Removed redundant configparser
- Enhanced Documentation: All examples updated for both backends
- API Consistency: Fixed all docstring examples to use public API

📈 When to Use Each Backend

- Use pandas for: Small datasets, complex index operations, maximum ecosystem compatibility
- Use Polars for: Large datasets, performance-critical applications, memory efficiency

🚀 Installation

pip install --upgrade pydata-wrangler

# For full ML capabilities including sentence-transformers
pip install --upgrade "pydata-wrangler[hf]"

🧪 Verified Quality

- ✅ All 45 tests passing
- ✅ Documentation builds successfully
- ✅ Full backward compatibility maintained
- ✅ Comprehensive API examples tested

This release maintains full backward compatibility while delivering significant performance improvements and API simplification. Upgrade today to experience the power of high-performance data wrangling!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

v0.4.0 (June, 2025)

🚀 High-Performance Polars Backend + Simplified Text API

🎯 Key Features

⚡ NEW: High-Performance Polars Backend (2-100x faster!)

📊 Performance Gains with Polars

🎨 Simplified Text Model API (80% reduction in verbosity)

📋 Quick Start Examples

High-Performance Processing

Uh oh!