Releases: HyukjinKwon/spark-connect-ruby

v0.3.0

github-actions released this 15 Jun 11:00

Full Changelog: v0.2.1...v0.3.0

v0.2.0

HyukjinKwon released this 10 Jun 10:38

Adds Structured Streaming and Declarative Pipelines (Spark 4.1+), plus
temporary views, the catalog create_table family, new_session / interrupts /
operation tags, and assorted DataFrame additions (with_watermark,
repartition_by_range, checkpoint, col_regex, to_json, ...). Regenerated
against the Spark Connect 4.1.0 protocol.

Published to RubyGems: https://rubygems.org/gems/spark-connect/versions/0.2.0

See CHANGELOG.md for the full list. Not yet supported: UDFs, foreach/foreachBatch
streaming sinks, and MLlib-over-Connect.

Full Changelog: v0.1.0...v0.2.0

v0.1.0

HyukjinKwon released this 10 Jun 09:35

First release of spark-connect, a pure-Ruby client for Apache Spark Connect.

Highlights:

  • PySpark-style DataFrame API (select/filter/join/group_by/agg/window/SQL/...)
  • Column expressions and a broad function library (SparkConnect::F)
  • DataFrameReader/Writer (CSV, JSON, Parquet, ORC, JDBC, tables) + v2 writer
  • Catalog, runtime config, observations, full Spark SQL type system
  • Apache Arrow result decoding over a resilient gRPC client
  • Targets the Spark Connect 4.0 protocol (works with 3.5+ servers)

Documentation: https://hyukjinkwon.github.io/spark-connect-ruby/

See CHANGELOG.md for the full list.

Full Changelog: https://github.com/HyukjinKwon/spark-connect-ruby/commits/v0.1.0