v0.4.0

@lance-community released this 11 May 19:54

What's Changed

New Features 🎉

  • feat: support DROP INDEX DDL by @LuciferYang in #371
  • feat: add Map type support by @summaryzb in #379
  • feat: support float16 vector type by @wombatu-kun in #378
  • feat: support alter table set/unset properties by @wombatu-kun in #358
  • feat: support alter table set unenforced primary key by @wombatu-kun in #359
  • feat: support rename table by @wombatu-kun in #352
  • feat: propagate enable_stable_row_ids through spark write path by @ivscheianu in #351
  • feat(benchmark): add --file-format-version option to TPC-DS data generator by @xuzha in #411
  • feat: add zonemap-based fragment pruning and storage-partitioned join (SPJ) support by @beinan in #396
  • feat: add use_large_var_types option to avoid 2GB Arrow vector overflow by @beinan in #413
  • feat(benchmark): add TPC-DS btree index creation for fragment pruning by @summaryzb in #433
  • feat: support compression config via Spark TBLPROPERTIES by @ivscheianu in #428
  • feat: add byte-based batch flushing to prevent OOM on large rows by @beinan in #420
  • feat: expose blob_pack_file_size_threshold write option by @hamersaw in #447
  • feat: support reading non-microsecond Arrow timestamp columns by @summaryzb in #444
  • feat: require clustered distribution on write for SPJ by @beinan in #445
  • feat: support Lance index metadata in Spark indexing by @jackye1995 in #481
  • feat: add param rows_per_range for range-based btree index build by @fangbo in #439
  • feat: add custom Lance metrics to trace read-path scan performance by @summaryzb in #460
  • feat: preserve Arrow Date(MILLISECOND) columns through Spark roundtrip by @summaryzb in #464
  • fix: widen pruned nested struct schemas to preserve Arrow child ordinals by @butnaruandrei in #442
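For readers unfamiliar with zonemap-based fragment pruning (#396): the general idea is to keep a min/max summary of a column for each fragment and skip fragments whose range cannot satisfy a filter. These notes do not describe lance-spark's actual implementation, so the following is only a conceptual sketch; `FragmentZone`, `may_match`, and `prune_fragments` are hypothetical names, not lance-spark APIs, and only integer columns with simple comparisons are handled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FragmentZone:
    """Hypothetical per-fragment zone map: min/max of a single integer column."""
    fragment_id: int
    min_value: int
    max_value: int


def may_match(zone: FragmentZone, op: str, value: int) -> bool:
    """Return False only when the zone map proves no row can satisfy `col <op> value`."""
    if op == "=":
        return zone.min_value <= value <= zone.max_value
    if op == ">":
        return zone.max_value > value
    if op == ">=":
        return zone.max_value >= value
    if op == "<":
        return zone.min_value < value
    if op == "<=":
        return zone.min_value <= value
    # Unknown operator: be conservative and keep the fragment.
    return True


def prune_fragments(zones: List[FragmentZone], op: str, value: int) -> List[int]:
    """Keep the ids of fragments that might contain matching rows."""
    return [z.fragment_id for z in zones if may_match(z, op, value)]
```

With zones [1, 100], [101, 200], and [201, 300], `prune_fragments(zones, "=", 150)` keeps only fragment 1. Unknown operators keep every fragment, because pruning must never drop a fragment that could contain matches.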

Bug Fixes 🐛

  • fix: strip quotes from visitStringLiteral and fail explicitly on unrecognized build_mode by @puchengy in #375
  • fix: escape single quotes in filter pushdown SQL compilation by @LuciferYang in #377
  • fix: prevent resource leaks in read path close/error handling by @LuciferYang in #376
  • fix: decouple benchmark module from lance-spark version dependency by @summaryzb in #370
  • fix: rename NamedArgument to LanceNamedArgument to avoid Iceberg classpath collision by @LuciferYang in #383
  • fix: add --fail flag to curl downloads in docker/Dockerfile by @wombatu-kun in #405
  • fix: update columns concurrent write conflict issue by @jerryjch in #345
  • fix: pass namespace and storage parameters for add/update column by @bryanck in #422
  • fix: add clean-bundle target for reliable source change detection by @ivscheianu in #426
  • fix(benchmark): accurately materialize tpcds query by @summaryzb in #415
  • fix: implement equals/hashCode on LanceScan to enable ReusedExchange by @LuciferYang in #427
  • fix: intercept Spark 4.0+ native CREATE INDEX to prevent NPE by @beinan in #412
  • fix: report post-pruning statistics to enable BroadcastHashJoin with SPJ by @beinan in #425
  • fix: race condition in QueuedArrowBatchWriteBuffer losing final batch by @hamersaw in #431
  • fix: preserve use_large_var_types on staged commit path by @beinan in #443
  • fix: remove Array filter pushdown workaround (upstream lance#… by @summaryzb in #441
  • fix: exclude netty from bundle jars to prevent split-package conflicts by @hamersaw in #458
  • fix: roll fragments on partition-value transitions by @hamersaw in #463
  • fix: no such method error in lance arrow util due to transitive json4s usage by @ivscheianu in #465
  • fix: propagate index_details from distributed index creation by @LuciferYang in #475
  • fix(spark): gss initiate failed on hms executors; spark.sql.catalog read options not applied by @xiaguanglei in #476
  • fix: reject DECIMAL256 columns at schema resolution time with actiona… by @summaryzb in #492
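#377 fixes single-quote escaping when Spark filters are compiled into SQL filter strings for pushdown. As a minimal sketch of the general technique, assuming the standard SQL convention that a single quote inside a string literal is written as two single quotes, a string value can be rendered as below. `quote_string_literal` and `compile_equal_to` are hypothetical helpers, not lance-spark code, and column-name quoting is deliberately left out.

```python
def quote_string_literal(value: str) -> str:
    """Render a string as a SQL literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def compile_equal_to(column: str, value: object) -> str:
    """Compile a hypothetical `column = value` pushdown filter into a SQL predicate."""
    if isinstance(value, str):
        return f"{column} = {quote_string_literal(value)}"
    # Check bool before int/float: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return f"{column} = {'true' if value else 'false'}"
    if isinstance(value, (int, float)):
        return f"{column} = {value}"
    raise TypeError(f"unsupported literal type: {type(value).__name__}")
```

`compile_equal_to("name", "O'Brien")` yields `name = 'O''Brien'`. Without the doubling, the embedded quote would end the literal early, and the pushed-down filter would either fail to parse or match the wrong rows.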

Documentation 📚

  • docs: add Spark 4.1 to supported versions by @hamersaw in #391
  • docs: add use_large_var_types write option documentation by @beinan in #424
  • docs: fixed repo name from lancedb/lance-spark to lance-format/lance-spark by @wombatu-kun in #438
  • docs: add Lance Spark Glue/S3 agent skill by @jackye1995 in #489

Performance Improvements 🚀

  • perf: optimize LIMIT pushdown by pruning splits using fragment row counts by @beinan in #395
  • refactor: remove Java-side dataset cache, rely on Rust-side Session by @LuciferYang in #353
  • perf: report projection-aware stats so BroadcastHashJoin fires on pruned scans by @LuciferYang in #435
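#395 prunes splits for LIMIT queries using fragment row counts. A simplified sketch of that idea, assuming an unfiltered scan and exact per-fragment counts of live rows: take fragments in scan order until their combined row count reaches the limit. A pushed-down filter, or counts that include deleted rows, would break this assumption. `prune_for_limit` is a hypothetical helper, not lance-spark's implementation.

```python
from typing import List, Tuple


def prune_for_limit(fragment_rows: List[Tuple[int, int]], limit: int) -> List[int]:
    """Return the fragment ids needed to satisfy `LIMIT limit` on an unfiltered scan.

    `fragment_rows` holds (fragment_id, row_count) pairs in scan order.
    Assumes row counts are exact and no filter is applied, since a filter
    could discard rows and leave fewer than `limit` results.
    """
    if limit <= 0:
        return []
    selected: List[int] = []
    covered = 0
    for fragment_id, row_count in fragment_rows:
        selected.append(fragment_id)
        covered += row_count
        if covered >= limit:
            # Enough rows are guaranteed; later fragments can be skipped.
            break
    return selected
```

For three fragments of 100 rows each, `LIMIT 150` needs only fragments 0 and 1. Spark still applies the LIMIT on top of the scan, so reading a few extra rows from the last selected fragment is harmless.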

Other Changes

  • refactor: move integration tests to top-level directory by @hamersaw in #393
  • refactor: remove dead SchemaConverter JsonArrow code by @LuciferYang in #409
  • refactor: consolidate dataset-open logic into Utils.OpenDatasetBuilder by @LuciferYang in #384
  • refactor: refactor how vector data is exposed to the Rust side to improve performance and prevent OOM by @fangbo in #467

New Contributors

Full Changelog: v0.3.0...v0.4.0