3.1.0
What's Changed
Sub-task
[SYSTEMDS-2411] - Performance codegen kmeans mnist80m w/ compression
[SYSTEMDS-3088] - Prefetch instruction
[SYSTEMDS-3098] - Broadcast instruction
[SYSTEMDS-3256] - Create apply functions for cleaning primitives
[SYSTEMDS-3286] - LogicalEnumerator change with transitions concept and cleanups
[SYSTEMDS-3347] - Flatten the nested loop for parallel pipelines execution
[SYSTEMDS-3376] - Adding apply_pipeline() builtin for cleaning pipelines API
[SYSTEMDS-3401] - Release docker images with GitHub actions
[SYSTEMDS-3422] - Add monitoring tool testing workflows
[SYSTEMDS-3443] - Asynchronous Execution and Persist Spark Transformations
[SYSTEMDS-3466] - Future-based asynchronous execution of Spark actions
[SYSTEMDS-3469] - New operator linearization order to maximize inter-operator parallelism
[SYSTEMDS-3470] - Lineage-based reuse of Spark actions
[SYSTEMDS-3473] - Push down rmvar instructions for asynchronous instructions
[SYSTEMDS-3474] - Lineage-based reuse of asynchronous operators
[SYSTEMDS-3479] - Persist and reuse Spark RDDs
[SYSTEMDS-3497] - Refactor to add LOP rewrite step in compilation
Bug
[SYSTEMDS-1026] - Fix memory configuration in sparkDML.sh
[SYSTEMDS-1281] - OOM Error On Binary Write
[SYSTEMDS-1283] - Out of memory error
[SYSTEMDS-2948] - CLA Improved Run estimation
[SYSTEMDS-3045] - AttributeError: Function definition not found
[SYSTEMDS-3272] - applySchema built-in to set the schema of frame from DML
[SYSTEMDS-3339] - CSR TSMM left with filled rows bug
[SYSTEMDS-3353] - Sparse TSMM dense row blocks CSR
[SYSTEMDS-3354] - py4j.Py4JException: Method exceptionString([class org.apache.spark.SparkConf]) does not exist
[SYSTEMDS-3355] - MatrixBlock size using CSR when allowed
[SYSTEMDS-3379] - Federated NaN Values
[SYSTEMDS-3390] - countDistinctApprox() operation in AggregateUnaryCPInstruction is inefficient for row/col aggregations
[SYSTEMDS-3391] - Correct the release artifact generation date
[SYSTEMDS-3394] - Log4j incompatible dependencies
[SYSTEMDS-3396] - ConcurrentModificationException in federated execution
[SYSTEMDS-3398] - Jackson Core missing for json writing and reading in reduced binary
[SYSTEMDS-3400] - Fix Java doc warnings
[SYSTEMDS-3408] - Enqueue output not UTF-8 python
[SYSTEMDS-3409] - Read CSV directly without mtd python
[SYSTEMDS-3411] - Python configuration not loading defaults
[SYSTEMDS-3412] - Matrix Multiplication crash in Spark
[SYSTEMDS-3414] - Pipelines failing in Hybrid execution
[SYSTEMDS-3415] - Built-in tests failure in GitHub actions
[SYSTEMDS-3416] - Cleaning Pipelines failed with No space left on device
[SYSTEMDS-3417] - IndexOutOfBounds due to int overflow on replace
[SYSTEMDS-3418] - Cleaning Pipelines: Replace function failure in hybrid execution
[SYSTEMDS-3419] - Cleaning Pipelines: Block Sizes mismatch
[SYSTEMDS-3420] - Cleaning Pipelines in hybrid mode: Invalid block dimensions error
[SYSTEMDS-3424] - Federated Statistics print in non federated scenario
[SYSTEMDS-3425] - Spark Aggregate Binary operations parse to Fed instruction
[SYSTEMDS-3432] - FederationUtils.bindResponses causes out of memory because of sparse matrices.
[SYSTEMDS-3433] - Python IDE test Docs fail
[SYSTEMDS-3435] - MSVM robustness for non-existing classes
[SYSTEMDS-3436] - CLA ArrayOutOfBounds in sample
[SYSTEMDS-3437] - CLA Invalid Unique estimate DDC
[SYSTEMDS-3439] - Federated read cache cannot be disabled
[SYSTEMDS-3442] - Monitoring Heavy hitters not always correct list
[SYSTEMDS-3451] - Slow Federated Mlogreg on Criteo (dummy-coded)
[SYSTEMDS-3452] - Incorrect warning when reading scalars
[SYSTEMDS-3476] - Spark with default settings
[SYSTEMDS-3477] - Cleaning Pipelines: Task Parallel Experiments failing in spark mode
[SYSTEMDS-3498] - Unique() crashes with iterator EOF on vectors with >1K distinct items
[SYSTEMDS-3500] - Perftest: Mlogreg on 1M_1k_dense w/ unnecessary spark jobs
[SYSTEMDS-3501] - Perftest: lmDS on 1M_1k_dense with unnecessary spark tsmm
[SYSTEMDS-3503] - Java doc warnings
Epic
[SYSTEMDS-450] - Extended spark interfaces
[SYSTEMDS-3445] - Combining compression schemes together
[SYSTEMDS-3459] - Reorganization and cleanup of the internal representation of FrameBlocks
New Feature
[SYSTEMDS-2551] - Federated Compression Instruction
[SYSTEMDS-2699] - CLA IO Compressed Matrices
[SYSTEMDS-2754] - Compressed Max/Min Index support.
[SYSTEMDS-2830] - Functional Compression
[SYSTEMDS-3280] - Homomorphic Encryption for Federated Parameter Servers
[SYSTEMDS-3303] - NN Builtin: Attention Layer
[SYSTEMDS-3325] - Multi-threaded tokenization
[SYSTEMDS-3337] - CLA TSMM direct multiplication
[SYSTEMDS-3360] - Federated async compression
[SYSTEMDS-3361] - Federated Workload-aware Compression
[SYSTEMDS-3369] - Timeout setting for all federated tests
[SYSTEMDS-3374] - Federated primitive for transferring a local data object to a federated representation
[SYSTEMDS-3404] - Synchronous with backup workers mode for Parameter Servers
[SYSTEMDS-3405] - Federated Write at site
[SYSTEMDS-3438] - CLA RowSlice compressed return
[SYSTEMDS-3478] - Bitset array for Frames
[SYSTEMDS-3481] - Frame from MatrixBlock improvement
[SYSTEMDS-3493] - Python windows Install
[SYSTEMDS-3494] - Python 3.9 support
[SYSTEMDS-3495] - Parallel Compressed Encode
Story
[SYSTEMDS-2783] - Lineage-based reuse in federated execution
[SYSTEMDS-3087] - Memory management and lazy evaluation in dynamic environments
[SYSTEMDS-3463] - Add unique() built-in function
Improvement
[SYSTEMDS-295] - Unexpected order when print !boolean
[SYSTEMDS-1169] - Clean Up and Automate Python Tests
[SYSTEMDS-1406] - Fix whitespace issues in main algorithms
[SYSTEMDS-1532] - Introduce Python scripts to launch SystemML from the Command Line
[SYSTEMDS-2513] - Improve the Development and user experience on Windows
[SYSTEMDS-2897] - CLA decompressing write
[SYSTEMDS-3185] - Federated Multi Tenant Backend
[SYSTEMDS-3187] - Add documentation for the release scripts
[SYSTEMDS-3192] - Test large dense block compression
[SYSTEMDS-3254] - CountDistinct Col and Row & Unique
[SYSTEMDS-3282] - Upper bound for number of decoders
[SYSTEMDS-3283] - Multi-threaded ctable instruction
[SYSTEMDS-3319] - CLA Generalize Bin Packing
[SYSTEMDS-3323] - CLA move combine of empty to estim
[SYSTEMDS-3328] - Federated transform for equi-height
[SYSTEMDS-3359] - Sample-based Recode Map Size Estimation
[SYSTEMDS-3386] - Refactor runtime replacement of CP or SP instructions with FED instructions
[SYSTEMDS-3393] - Use Java (JDK17) SIMD Implementation
[SYSTEMDS-3413] - Add row/col aggregation support to countDistinct() builtin function
[SYSTEMDS-3429] - Use Local Level of Parallelism when Transformencoding in Federated Mode
[SYSTEMDS-3440] - Federated Requests Coordinator Hostname
[SYSTEMDS-3444] - Spark Write CLA
[SYSTEMDS-3446] - DDC Append
[SYSTEMDS-3447] - SDC Append
[SYSTEMDS-3448] - Uncompressed Append
[SYSTEMDS-3449] - Const/Empty append
[SYSTEMDS-3450] - DDCFOR Append
[SYSTEMDS-3453] - Offsets Append
[SYSTEMDS-3454] - CLA Scheme primitive
[SYSTEMDS-3455] - Improved multi-threaded unary operations
[SYSTEMDS-3456] - MatrixBlock equals
[SYSTEMDS-3457] - MatrixBlock equals Sparse Specialization
[SYSTEMDS-3458] - Add support for Spark backend to countDistinct() builtin function
[SYSTEMDS-3460] - Move FrameBlock out of MatrixBlock path.
[SYSTEMDS-3461] - FrameBlock Arrays separation
[SYSTEMDS-3462] - FrameBlock Iterators Factory Pattern
[SYSTEMDS-3464] - Python Combine Write
[SYSTEMDS-3465] - Typed return on CacheBlock Interface Slice
[SYSTEMDS-3467] - Add MULTI_BLOCK Spark backend support for countDistinct()
[SYSTEMDS-3471] - Enable multi-threaded transformencode/apply
[SYSTEMDS-3472] - Spark Append Frame Bug
[SYSTEMDS-3475] - Spark update version 3.3.1
[SYSTEMDS-3480] - Verify release scripts with github workflows
[SYSTEMDS-3482] - Parallel Hadoop IO startup
[SYSTEMDS-3484] - FrameAppend optimization
[SYSTEMDS-3485] - Precompile detect type patterns
[SYSTEMDS-3486] - Character Array Type
[SYSTEMDS-3487] - Array primitives with null
[SYSTEMDS-3488] - Compressed Frame Write
[SYSTEMDS-3489] - CLA Compress NaN
[SYSTEMDS-3490] - Compressed Transform Encode
[SYSTEMDS-3491] - CLA Specialized Column Indexes
Test
[SYSTEMDS-3395] - ColGroup Equivalence Tests
[SYSTEMDS-3397] - Python NN testExample
Wish
[SYSTEMDS-3171] - GIO - Mapping from binary data
Task
[SYSTEMDS-209] - Algorithm wrappers (ml pipelines, ml context)
[SYSTEMDS-563] - MR operations over frames
[SYSTEMDS-3148] - Federated Performance Tests
[SYSTEMDS-3228] - Builtin for k nearest neighbor graph construction
[SYSTEMDS-3229] - WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform
[SYSTEMDS-3241] - Federated quantile
[SYSTEMDS-3291] - Apply builtin for mice
[SYSTEMDS-3348] - Federated Monitoring Tool
[SYSTEMDS-3496] - New builtin function auc (area under ROC curve)
Dependency upgrade
[SYSTEMDS-3375] - CUDA11 / CUDNN8 support
Documentation
[SYSTEMDS-3407] - GMM is missing docs for seed and verbose
[SYSTEMDS-3434] - Python API does not include params for all kwargs
New Contributors
- @fathollahzadeh made their first contribution in #1469
- @morf1us made their first contribution in #1520
Full Changelog: 2.2.0-rc1...3.0.0-rc2