Releases: spotify/scio
v0.14.5
What's changed
Includes Beam 2.56.0 support.
warning: This release transitively pulls newer avro and jackson versions. If you need to depend on previous versions, you'll need to pin them in your build.
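If you need to stay on older avro or jackson versions, one possible way to pin them in an sbt build is `dependencyOverrides`. The fragment below is only a sketch: the version numbers are placeholders chosen for illustration, not versions recommended by this release, and the exact set of jackson modules to pin depends on your dependency tree.

```scala
// build.sbt (sketch, not taken from the Scio build)
// Placeholder versions: replace with the versions your project actually requires.
val pinnedAvroVersion = "1.8.2"
val pinnedJacksonVersion = "2.14.1"

// dependencyOverrides forces these versions during dependency resolution
// without adding the modules as direct dependencies.
dependencyOverrides ++= Seq(
  "org.apache.avro" % "avro" % pinnedAvroVersion,
  "com.fasterxml.jackson.core" % "jackson-core" % pinnedJacksonVersion,
  "com.fasterxml.jackson.core" % "jackson-databind" % pinnedJacksonVersion,
  "com.fasterxml.jackson.core" % "jackson-annotations" % pinnedJacksonVersion
)
```

After changing the overrides, inspecting the resolved dependency tree is a reasonable way to confirm which versions end up on the classpath.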
🚀 Enhancements
- (#5296) Support Parquet predicates/projections in tests (#5309) @clairemcginty
- Add support for Zstd coders (#5321) @kellen
- [scio-core](feature) Sample SCollection with max weight (#5352) @RustedBones
- [scio-core](feature) Add readFiles and readFilesWithPath apis (#5350) @RustedBones
🐛 Bug Fixes
- Fix sparkey loading issue for unknown keys (#5387) @RustedBones
- (fix)[avro] Use correct DatumWriter constructor (#5371) @RustedBones
- fix FixQuery scalafix rule for macro-generated classes (#5356) @clairemcginty
📜 Scalafix Migrations
- Refactor FixAvroCoder case application (#5353) @clairemcginty
- Update FixAvroCoder to match transformed Avro SCollections (#5351) @clairemcginty
📗 Documentation
- Update release table for 0.14.5 (#5392) @RustedBones
- Prefer waitUntilDone over waitUntilFinish in test and examples (#5349) @RustedBones
🧪 Test Improvements
- Drop slow test tag (#5391) @RustedBones
🏗️ Build Improvements
- Update scalac-options to 0.1.5 (#5385) @scala-steward
- Fix assembly conflict (#5382) @RustedBones
- Update munit to 1.0.0 (#5378) @scala-steward
- [avro] Fix CI test for avro latest (#5372) @RustedBones
- Update sbt, sbt-dependency-tree to 1.10.0 (#5365) @scala-steward
- Update sbt-typelevel to 0.7.1 (#5366) @scala-steward
- Update socco-ng to 0.11.12 (#5368) @RustedBones
- Update sbt-scalafix to 0.12.1 (#5357) @scala-steward
- Update sbt-scoverage to 2.0.12 (#5363) @scala-steward
- (chore) Update to codecov-action (#5354) @RustedBones
- Enable mima for test artifacts (#5343) @RustedBones
- Update scalacheck to 1.18.0 (#5344) @scala-steward
🌱 Dependency Updates
- Sync flink artifact (#5393) @RustedBones
- Update elasticsearch-java to 7.17.21 (#5390) @RustedBones
- Update elasticsearch-java to 8.14.0 (#5388) @scala-steward
- Update neo4j-java-driver to 4.4.17 (#5384) @scala-steward
- Update algebra, cats-core, cats-kernel to 2.12.0 (#5383) @scala-steward
- Update magnolify to 0.7.3 (#5381) @RustedBones
- Update mssql-jdbc to 12.6.2.jre11 (#5380) @scala-steward
- Update jedis to 5.1.3 (#5376) @scala-steward
- Update shapeless to 2.3.12 (#5374) @scala-steward
- Update cloud-sql-connector-jdbc-sqlserver, ... to 1.18.1 (#5369) @scala-steward
- Update scala-compiler, scala-library, ... to 2.13.14 (#5362) @scala-steward
- Update mysql-connector-j to 8.4.0 (#5359) @scala-steward
- Update beam to 2.56 (#5346) @RustedBones
- Update circe-core, circe-generic, ... to 0.14.7 (#5360) @scala-steward
v0.14.4
Includes Beam 2.55.1 support.
🚀 Enhancements
- Robust throwable kryo coder (#5318) @RustedBones
- Use JacksonJsonpMapper as default for Elasticsearch (#5306) @kellen
- Allow String key type to transform SMB sources with CharSequence key (#5297) @clairemcginty
🐛 Bug Fixes
- [scio-avro] fix: allow conversions field in record (#5332) @RustedBones
- Add all avro logical type conversions to model (#5301) @RustedBones
- fix: safer implementation of distinctBy (#5299) @RustedBones
📜 Scalafix Migrations
- scalafix: Rule to migrate deprecated query to queryRaw (#5302) @RustedBones
📗 Documentation
- Prepare release v0.14.4 (#5342) @RustedBones
- [doc] Fix scaladoc link to class declared in package objects (#5316) @RustedBones
- Add staging dir to REPL docs (#5308) @kellen
🧪 Test Improvements
- Move test code in respective projects (#5330) @RustedBones
- [integration] Use literals for fromSchemaFile integration test (#5334) @RustedBones
- [integration] Fix avro integration test (#5333) @RustedBones
🏗️ Build Improvements
- Update sbt-site, sbt-site-paradox to 1.7.0 (#5324) @scala-steward
- Update sbt-paradox to 0.10.7 (#5325) @scala-steward
- Update scalafmt-core to 3.8.1 (#5320) @scala-steward
- Update sbt-buildinfo to 0.12.0 (#5310) @scala-steward
- Update sbt-assembly to 2.2.0 (#5304) @scala-steward
🌱 Dependency Updates
- Update scalacheck to 1.17.1 (#5341) @scala-steward
- Update cassandra-all to 3.11.17 (#5340) @scala-steward
- Update cloud-sql-connector-jdbc-sqlserver, ... to 1.18.0 (#5339) @scala-steward
- Update pprint to 0.9.0 (#5336) @scala-steward
- Update scala-collection-compat to 2.12.0 (#5337) @scala-steward
- Update beam to 2.55.1 (#5322) @RustedBones
- Update elasticsearch-java to 8.13.2 (#5323) @scala-steward
- Update neo4j-java-driver to 4.4.15 (#5328) @scala-steward
- Update magnolify-avro, magnolify-bigtable, ... to 0.7.2 (#5319) @scala-steward
- Update elasticsearch v7 to 7.17.19 (#5317) @RustedBones
- Update beam to 2.55 (#5307) @RustedBones
- Update sbt-site, sbt-site-paradox to 1.6.0 (#5311) @scala-steward
- Update voyager to 2.0.6 (#5313) @scala-steward
v0.14.3
🐛 Bug Fixes
- fix: Execute SmbIO output assertions (#5289) @RustedBones
- (fix #5290) Support empty input in TransformOverride.ofSource (#5293) @clairemcginty
- (Fix #5285) Allow String key type to read SMB sources written with CharSequence key type (#5291) @clairemcginty
🌱 Dependency Updates
- Update jedis to 5.1.2 (#5292) @scala-steward
- Update neo4j-java-driver to 4.4.14 (#5288) @scala-steward
🏗️ Build Improvements
- Update sbt-scalafix to 0.12.0 (#5287) @scala-steward
v0.14.2
Contains a bugfix for duplicated SMB Transforms.
🐛 Bug Fixes
- (fix) Remove duplicated transform application (#5283) @RustedBones
🚀 Enhancements
- Make checkVersion more dynamic (#5282) @RustedBones
- Improve implicit coder not found message (#5281) @RustedBones
📗 Documentation
- (doc) fix 0.14 migration guide broken links (#5278) @RustedBones
🏗️ Build Improvements
- Update scala to 2.13.13 and 2.12.19 (#5276) @RustedBones
v0.14.1
Includes Beam 2.54.0 support.
🚀 Enhancements
- Add layer for low priority coder conflict (#5274) @RustedBones
- Add support of BIGNUMERIC in BigQueryIO (#5225) @shnapz
- Change dataflow runner check for parquet splittable do fn (#5264) @RustedBones
- Make sure Coder.gen produces informative error (#5258) @RustedBones
- Add default coder for ResourceId (#5244) @clairemcginty
- Add type bound for parquet-avro SCollection ops (#5230) @RustedBones
🐛 Bug Fixes
- (fix) Update SmbIO to support absolute path in test (#5277) @RustedBones
- Simplify BucketedInput serialization (#5270) @clairemcginty
- Fix byte[] equality issue for MockByteArraySparkeyReader (#5262) @clairemcginty
- Fix scio-repl assembly after beam 2.54 upgrade (#5263) @RustedBones
- Always set Parquet-Avro projection (#5234) @clairemcginty
- Fix SmbIO's testId naming (#5228) @clairemcginty
- Sort Path IDs in SmbIO (#5242) @clairemcginty
- Fix SmbIO side input support (fix #5240) (#5241) @clairemcginty
📜 Scalafix Migrations
- Make FixLogicalTypeSupplier more permissive (#5233) @clairemcginty
- Fix MatchError for FixAvroCoder (#5269) @clairemcginty
- Broaden use cases to FixAvroCoder (#5267) @clairemcginty
- Update FixAvroCoder to include SMB Avro reads (#5245) @clairemcginty
- Fix Scalafix rule for Avro package import (#5239) @clairemcginty
- Add Scalafix rule for SMB CharSequence key (#5236) @clairemcginty
- Scalafix: Add Avro coder import for SpecificRecord JobTest IOs (#5237) @clairemcginty
- Check options in scalafix (#5231) @kellen
- Inline scalafix (#5229) @kellen
📗 Documentation
- Update copyright year for scio site (#5261) @RustedBones
🧪 Test Improvements
- Fix populate it data (#5260) @RustedBones
- Add SortMergeBucketParityIT case for sortMergeTransform (#5224) @clairemcginty
🏗️ Build Improvements
- Update sbt-typelevel to 0.6.7 (#5275) @RustedBones
- Update sbt, sbt-dependency-tree to 1.9.9 (#5272) @scala-steward
- Update sbt-scoverage to 2.0.11 (#5273) @scala-steward
- Use released 0.14.0 scio version in scalafix (#5259) @RustedBones
- Update sbt-paradox-material-theme to 0.7.0 (#5251) @scala-steward
- Update scalafmt-core to 3.8.0 (#5254) @scala-steward
- Update scalactic to 3.2.18 (#5253) @scala-steward
- Update scalatest to 3.2.18 (#5255) @scala-steward
- Update sbt-scoverage to 2.0.10 (#5256) @scala-steward
- Cache TensorFlow Metadata proto files (#5246) @Duhemm
- Fix cache step for scalafix GHA (#5238) @RustedBones
- Bump release-drafter/release-drafter from 5 to 6 (#5226) @dependabot
🌱 Dependency Updates
- Update beam to 2.54.0 (#5235) @RustedBones
- Update elasticsearch-java to 8.12.2 (#5271) @scala-steward
- Update mssql-jdbc to 12.6.1.jre11 (#5268) @scala-steward
- Update cloud-sql-connector-jdbc-sqlserver, ... to 1.16.0 (#5249) @scala-steward
- Update testcontainers-scala-elasticsearch, ... to 0.41.3 (#5248) @scala-steward
Contributors to this release
@Duhemm, @RustedBones, @clairemcginty, @dependabot, @dependabot[bot], @kellen, @scala-steward and @shnapz
v0.14.0
What's Changed
Includes Beam 2.53.0 support.
Breaking Changes
- avro removed from scio-core. Scalafix rules helping: FixAvroCoder, FixAvroSchemasPackage, FixDynamicAvro
- some avro API changes. Scalafix rules helping: FixGenericAvro
- fallback kryo coder requires explicit import
- use of official tensorflow metadata
- BigQuery error-info and result handling API change
- scio-smb module not pulling implementation dependencies
- scio-smb in JobTest expecting SmbIO test input/output
See the Migration Guide for more information.
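As an illustration of the "fallback kryo coder requires explicit import" item, the sketch below shows what opting back in to the Kryo fallback might look like. The import path `com.spotify.scio.coders.kryo._` is stated here as an assumption to check against the Migration Guide, and `LegacyEvent` and `KryoFallbackExample` are hypothetical names invented for this example.

```scala
import com.spotify.scio.ContextAndArgs
// Assumed import path for the opt-in Kryo fallback coder; verify in the Migration Guide.
import com.spotify.scio.coders.kryo._

// Hypothetical plain class (not a case class), so no coder can be derived for it
// and the fallback coder is the one expected to be picked up.
class LegacyEvent(val id: String) extends Serializable

object KryoFallbackExample {
  def main(cmdlineArgs: Array[String]): Unit = {
    val (sc, _) = ContextAndArgs(cmdlineArgs)

    sc.parallelize(Seq("a", "b"))
      .map(new LegacyEvent(_)) // needs a Coder[LegacyEvent]
      .map(_.id)

    sc.run().waitUntilDone()
  }
}
```

Under these assumptions, removing the Kryo import would be expected to turn the missing coder into a compile-time error, which is the behavior the breaking change describes.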
🚀 Enhancements
- improve testing framework by @RustedBones in #4962
- fs.gs.inputstream.fadvise defaults to SEQUENTIAL by @farzad-sedghi in #5132
- (scio-smb) Support mixed FileOperations per BucketedInput by @clairemcginty in #5064
- Support Tap for SMB writes (addresses #5080) by @clairemcginty in #5144
- feat: add save dynamic csv by @klDen in #5130
- Make Sparkey testable by @kellen in #5128
- Integrate avro datum factory in scio-avro by @RustedBones in #5152
- Handle BQ write result as ClosedTap side output by @RustedBones in #5172
- Support projection in ParquetAvroSortedBucketIO by @clairemcginty in #5173
- Expose bigtable read maxBufferElementCount option by @RustedBones in #5026
- Integrate datum-factory in smb-avro by @RustedBones in #5181
- Require import for kryo implicit fallback coder by @RustedBones in #5199
- Add kryo serializer for GAX api exceptions by @RustedBones in #5198
- Initial Iceberg bucket support by @regadas in #5205
- Add scala enumeration implicit coder by @RustedBones in #5213
- Add SortedBucketTransform counter for records written by @clairemcginty in #5220
- Add PipelineTestUtils helper for Taps by @clairemcginty in #5216
- Add counter for SMB Predicate filtering by @clairemcginty in #5221
🐛 Bug Fixes
- Fix race in sparkey write by @kellen in #4937
- Support null CharSequence keys by @RustedBones in #5113
- Add location to BigQuery LoadOps by @f-loris in #5106
- Fix bigtable option conversion for bulk API by @RustedBones in #5167
- Fix multi-line DML statement detection by @RustedBones in #5169
- (fix #5147) Fix Materialize for elements that match compression encoding signature by @clairemcginty in #5148
- In SMB and ParquetAvroIOTap set GenericDataSupplier and read schemas by @shnapz in #5121
- (bugfix) Set metadata in AvroSortedBucketIO by @clairemcginty in #5184
- (fixes #5193) Serialize BucketMetadata#hashType as String by @clairemcginty in #5194
- Close client in async DoFn by @RustedBones in #5206
- Use non-deprecated version of murmur3_32 by @regadas in #5204
- Use fork-join common pool for async DoFn callbacks by @RustedBones in #5209
📜 Scalafix Migrations
- Add 0.14 scalafix migration for saveAsAvroFile and update avro coder one by @RustedBones in #5215
- Add Scalafix rule for LogicalTypeSupplier removal by @clairemcginty in #5178
📗 Documentation
- Add docs about SMB secondary keys by @kellen in #5095
- Remove references to Spotify FOSS Slack by @BalestraPatrick in #5149
- Convert SortMergeBucketExample to Parquet + update tests by @clairemcginty in #5191
- Fix list in Builtin.md by @kellen in #5214
- Scio 0.14 migration guide by @RustedBones in #5212
- Update scio and beam release table by @RustedBones in #5222
🧪 Test Improvements
- Fix flaky SCollectionTest by @kellen in #5098
- Drop deprecated sbt IntegrationTest configuration by @RustedBones in #4971
- Move populate test data to compile scope by @RustedBones in #5138
🏗️ Build Improvements
- Fix compiler warnings by @RustedBones in #4934
- Update sbt-assembly to 2.1.5 by @scala-steward in #5093
- Update sbt-bloop to 1.5.12 by @scala-steward in #5097
- Use sbt-typelevel for build by @RustedBones in #5107
- Ack or fix deprecation warnings by @RustedBones in #5124
- Ack expected non-exhaustive pattern match by @RustedBones in #5126
- Update sbt-jmh to 0.4.7 by @scala-steward in #5166
- Update sbt-typelevel to 0.6.5 by @scala-steward in #5164
- Update sbt-avro to 3.4.4 by @scala-steward in #5157
- Update sbt-mdoc to 2.5.2 by @scala-steward in #5163
- Update sbt, sbt-dependency-tree to 1.9.8 by @scala-steward in #5162
- Build integration test in PRs originating from repo by @RustedBones in #5143
- Remove duplicated scalac option by @RustedBones in #5171
- Handle unused warning as error by @RustedBones in #5180
- Skip dependency check if compile is skipped by @RustedBones in #5188
- Update sbt-protoc to 1.0.7 by @scala-steward in #5196
- Update sbt-paradox to 0.10.6 by @scala-steward in #5210
- Fix jar signing by @RustedBones in #5223
🔧 Refactorings
- Remove avro from scio-core, implement binary file source by @kellen in #4913
- Tensorflow metadata by @RustedBones in #4944
- Use avro builder API by @RustedBones in #5119
- Cleanup deprecated API by @RustedBones in #5134
- Move BQ typed from query to queryRaw by @RustedBones in #5137
- Rename scala SortedBucketIO class to SmbIO by @clairemcginty in #5140
- Rename SortedBucketIOTest to match SmbIO by @clairemcginty in #5146
- Relax versioning regex by @RustedBones in #5139
- Change scio-smb to depend on provided scio io modules by @RustedBones in #5004
- Drop collection compat shim in favor of official scala-collection-compat by @RustedBones in #5069
🌱 Dependency Updates
- Update algebra, cats-core, cats-kernel to 2.10.0 by @scala-steward in #4952
- Update jedis to 5.1.0 by @scala-steward in #5094
- Update metrics-core to 4.2.23 by @scala-steward in #5110
- Update hadoop to v3.2.4 by @RustedBones in #5135
- Update beam to v2.53 by @RustedBones in #5133
- Update scalac-compat-annotation, ... to 0.1.4 by @scala-steward in #5165
- Update neo4j-java-driver to 4.4.13 by @scala-steward in #5161
- Update jna to 5.14.0 by @scala-steward in #5160
- Update magnolify to 0.7.0 by @RustedBones in #5155
- Upgrade Parquet to 1.13.1 by @clairemcginty in #5175
- Update elasticsearch-java to 8.12.0 by @scala-steward in #5185
- Update mysql-connector-j to 8.3.0 by @scala-steward in #5174
- Update cloud-sql-connector-jdbc-sqlserver to 1.15.2 by @scala-steward in #5186
- Update mysql-socket-factory-connector-j-8 to 1.15.2 by @scala-steward in #5187
- Use new vendored Guava version by @clairemcginty in #5195
- Update metrics-core to 4.2.25 by @scala-steward in https://git...
v0.13.6
Includes Beam 2.52 support.
🚀 Enhancements
🐛 Bug Fixes
- Don't overwrite Configured projection in scio-smb (#5083) @clairemcginty
- Allow None for JDBC password (#5081) @kellen
- Fix sending empty request when batch is empty (#5060) @senegalo
- Exclude deprecated dropwizard artifact (#5052) @RustedBones
🏗️ Build improvements
- Reworked excluded libs (#5091) @RustedBones
- Cleanup mima filters (#5090) @RustedBones
- Enable jUnitSettings for scio-smb (#5088) @clairemcginty
- Increase BQ read timeout for integration test (#5087) @RustedBones
- Update testing dataset (#5086) @RustedBones
- Enable header plugin in IntegrationTest (#5074) @RustedBones
- Allow manual release (#5072) @RustedBones
- Update scalafmt-core to 3.7.17 (#5085) @scala-steward
- Update sbt-mdoc to 2.5.1 (#5065) @scala-steward
- Update sbt-assembly to 2.1.4 (#5051) @scala-steward
- Downgrade socco-ng to 0.1.8. (#5047) @RustedBones
- Downgrade scala version to last supported socco-ng (#5046) @RustedBones
🌱 Dependency Updates
- Update beam to 2.52 (#5054) @RustedBones
- Update mysql-socket-factory-connector-j-8 to 1.15.0 (#5071) @scala-steward
- Update elasticsearch-java to 8.11.1 (#5070) @scala-steward
- Update elasticsearch to 7.17.14 (#5059) @RustedBones
- Update metrics-core to 4.2.22 (#5057) @scala-steward
- Update elasticsearch-java to 8.11.0 (#5056) @scala-steward
- Update magnolify-avro, magnolify-bigtable, ... to 0.6.4 (#5049) @scala-steward
- Update mysql-connector-j to 8.2.0 (#5048) @scala-steward
Contributors to this release
@RustedBones, @avandel, @clairemcginty, @kellen, @scala-steward and @senegalo
v0.13.5
Updates newly introduced scio-extra Voyager experimental API to v2
🏗️ Build improvements
- Update scalafmt-core to 3.7.15 (#5044) @scala-steward
- Update sbt, sbt-dependency-tree to 1.9.7 (#5043) @scala-steward
🌱 Dependency Updates
- Update voyager to 2.0.2 (#5030) @scala-steward
- Update jakarta.json-api to 2.1.3 (#5041) @scala-steward
- Update jedis to 4.4.6 (#5042) @scala-steward
v0.13.4
Includes Beam 2.51 support.
🚀 Enhancements
- Voyager support in Scio (#4996) @patrickwmcgee
- Support a transform() API for ScioContext (#5035) @clairemcginty
- Use readNextFilteredRowGroup instead of readNextRowGroup (#5025) @RustedBones
- Simplify ParquetBucketMetadata (#5024) @RustedBones
- Adds GrpcBatchDoFn (#4977) @senegalo
🐛 Bug Fixes
- Make macro print debug logs if bigquery.types.debug enabled (#5033) @RustedBones
- Pass avro reader schema for SMB operation (#5032) @RustedBones
- Remove TF model from the resource cache before closing (#5011) @jrglee
- Move log4j-over-slf4j to compile scope (#5018) @RustedBones
- Restrict implicit dynamic extensions for generic avro SCollection (#5021) @RustedBones
- Propagate ParquetOutputFormat options to ParquetWriter (#4980) @clairemcginty
- Route all logging to slf4j (#4981) @RustedBones
📗 Documentation
- fix some typos (#5039) @vuittont60
- Fix scaladoc links (#5022) @kellen
- Update dev docs (#5003) @RustedBones
🏗️ Build improvements
- Update sbt-mdoc to 2.4.0 (#5038) @scala-steward
- Set outputStrategy for run scope (#5019) @RustedBones
- Update sbt, sbt-dependency-tree to 1.9.6 (#5005) @scala-steward
- Update sbt-assembly to 2.1.3 (#5007) @scala-steward
- Update sbt-scalafix to 0.11.1 (#4997) @scala-steward
- Update sbt-jmh to 0.4.6 (#4992) @scala-steward
- Update sbt-scalafmt to 2.5.2 (#4989) @scala-steward
- Update sbt-bloop to 1.5.11 (#4982) @scala-steward
- Update testcontainers-scala-elasticsearch, ... to 0.41.0 (#4985) @scala-steward
- Update scalafmt-core to 3.7.14 (#4990) @scala-steward
- Update scalatest to 3.2.17 (#4991) @scala-steward
- Update sbt-scoverage to 2.0.9 (#4994) @scala-steward
- Bump actions/checkout from 3 to 4 (#4978) @dependabot
🌱 Dependency Updates
- Update beam to version 2.51 (#5023) @RustedBones
- Update elasticsearch-java to 8.10.4 (#5036) @scala-steward
- Update mysql-socket-factory-connector-j-8 to 1.14.1 (#5029) @scala-steward
- Update jedis to 4.4.5 (#5016) @scala-steward
- Update elasticsearch to v7.17.13 (#5008) @RustedBones
- Update magnolify-avro, magnolify-bigtable, ... to 0.6.3 (#5000) @scala-steward
- Update scala-compiler, scala-library, ... to 2.13.12 (#5001) @scala-steward
- Update cassandra-driver-core to 3.11.5 (#4984) @scala-steward
- Update scalactic to 3.2.17 (#4988) @scala-steward
- Update circe-core, circe-generic, ... to 0.14.6 (#4987) @scala-steward
Contributors to this release
@RustedBones, @clairemcginty, @dependabot, @dependabot[bot], @jrglee, @kellen, @patrickwmcgee, @scala-steward, @senegalo and @vuittont60
v0.13.3
Includes Beam 2.50.0 support.
🚀 Enhancements
- (fix #4970) Default to Parquet-SplittableDoFn if RunnerV2 is enabled (#4973) @clairemcginty
🐛 Bug Fixes
- Patch datum factory for specific data in IOs (#4975) @RustedBones
- Fix typeValidation in TransformOverride (#4967) @RustedBones
- Exclude logger implementation (#4969) @RustedBones
📗 Documentation
- Fix broken link (#4974) @saveriogzz
- Update jdbc doc to new API (#4954) @RustedBones
🌱 Dependency Updates
- Update beam to v2.50.0 (#4968) @RustedBones
- Update sbt, sbt-dependency-tree to 1.9.4 (#4965) @scala-steward
- Update sbt-avro to 3.4.3 (#4961) @scala-steward
Contributors to this release
@RustedBones, @clairemcginty, @saveriogzz and @scala-steward