Releases: apache/beam
Beam 2.23.0 release
We are happy to present the new 2.23.0 release of Apache Beam. This release includes both improvements and new functionality.
See the download page for this release.
For more information on changes in 2.23.0, check out the detailed release notes.
Highlights
I/Os
- Support for reading from Snowflake added (Java) (BEAM-9722).
- Support for writing to Splunk added (Java) (BEAM-8596).
- Support for assume role added (Java) (BEAM-10335).
- A new transform to read from BigQuery has been added: apache_beam.io.gcp.bigquery.ReadFromBigQuery. This transform is experimental. It reads data from BigQuery by exporting data to Avro files, and reading those files. It also supports reading data by exporting to JSON files. This has small differences in behavior for Time and Date-related fields. See Pydoc for more information.
- Add dispositions for SnowflakeIO.write (BEAM-10343)
New Features / Improvements
- Update Snowflake JDBC dependency and add application=beam to connection URL (BEAM-10383).
Breaking Changes
- RowJson.RowJsonDeserializer, JsonToRow, and PubsubJsonTableProvider now accept "implicit nulls" by default when deserializing JSON (Java) (BEAM-10220). Previously nulls could only be represented with explicit null values, as in {"foo": "bar", "baz": null}, whereas an implicit null like {"foo": "bar"} would raise an exception. Now both JSON strings will yield the same result by default. This behavior can be overridden with RowJson.RowJsonDeserializer#withNullBehavior.
- Fixed a bug in the GroupIntoBatches experimental transform in Python to actually group batches by key. This changes the output type for this transform (BEAM-6696).
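The implicit-null change to JSON deserialization described above can be illustrated with a minimal plain-Python sketch. It does not use Beam's Java RowJson API; the helper `deserialize_row` and its `accept_implicit_nulls` flag are hypothetical names introduced only to show the difference between the two behaviors.

```python
import json


# Hypothetical helper for illustration only; this is not Beam's RowJson API.
def deserialize_row(json_text, field_names, accept_implicit_nulls=True):
    """Map a JSON object onto a fixed list of field names.

    A field that is absent from the JSON object is an "implicit null".
    A field that is present with the value null is an "explicit null".
    """
    obj = json.loads(json_text)
    row = {}
    for name in field_names:
        if name in obj:
            # Present in the JSON text; this covers explicit nulls (None).
            row[name] = obj[name]
        elif accept_implicit_nulls:
            # Missing field is treated as null.
            row[name] = None
        else:
            # Stricter behavior: a missing field is an error.
            raise ValueError(f"field {name!r} is missing and implicit nulls are disabled")
    return row


fields = ["foo", "baz"]
explicit = deserialize_row('{"foo": "bar", "baz": null}', fields)
implicit = deserialize_row('{"foo": "bar"}', fields)
```

With implicit nulls accepted, `explicit` and `implicit` hold the same row. Passing `accept_implicit_nulls=False` to the second call raises an error instead, which roughly mirrors the earlier behavior.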
Deprecations
- Remove Gearpump runner. (BEAM-9999)
- Remove Apex runner. (BEAM-9999)
- RedisIO.readAll() is deprecated and will be removed in 2 versions; users must use RedisIO.readKeyPatterns() as a replacement (BEAM-9747).
Known Issues
List of Contributors
According to git shortlog, the following people contributed to the 2.23.0 release. Thank you to all contributors!
Aaron, Abhishek Yadav, Ahmet Altay, aiyangar, Aizhamal Nurmamat kyzy, Ajo Thomas, Akshay-Iyangar, Alan Pryor, Alex Amato, Alexey Romanenko, Allen Pradeep Xavier, Andrew Crites, Andrew Pilloud, Ankur Goenka, Anna Qin, Ashwin Ramaswami, bntnam, Borzoo Esmailloo, Boyuan Zhang, Brian Hulette, Brian Michalski, brucearctor, Chamikara Jayalath, chi-chi weng, Chuck Yang, Chun Yang, Colm O hEigeartaigh, Corvin Deboeser, Craig Chambers, Damian Gadomski, Damon Douglas, Daniel Oliveira, Dariusz Aniszewski, darshanj, darshan jani, David Cavazos, David Moravek, David Yan, Esun Kim, Etienne Chauchot, Filipe Regadas, fuyuwei, Graeme Morgan, Hannah-Jiang, Harch Vardhan, Heejong Lee, Henry Suryawirawan, InigoSJ, Ismaël Mejía, Israel Herraiz, Jacob Ferriero, Jan Lukavský, Jie Fan, John Mora, Jozef Vilcek, Julien Phalip, Justine Koa, Kamil Gabryjelski, Kamil Wasilewski, Kasia Kucharczyk, Kenneth Jung, Kenneth Knowles, kevingg, Kevin Sijo Puthusseri, kshivvy, Kyle Weaver, Kyoungha Min, Kyungwon Jo, Luke Cwik, Mark Liu, Mark-Zeng, Matthias Baetens, Maximilian Michels, Michal Walenia, Mikhail Gryzykhin, Nam Bui, Nathan Fisher, Niel Markwick, Ning Kang, Omar Ismail, Pablo Estrada, paul fisher, Pawel Pasterz, perkss, Piotr Szuberski, pulasthi, purbanow, Rahul Patwari, Rajat Mittal, Rehman, Rehman Murad Ali, Reuben van Ammers, Reuven Lax, Reza Rokni, Rion Williams, Robert Bradshaw, Robert Burke, Rui Wang, Ruoyun Huang, sabhyankar, Sam Rohde, Sam Whittle, sclukas77, Sebastian Graca, Shoaib Zafar, Sruthi Sree Kumar, Stephen O'Kennedy, Steve Koonce, Steve Niemitz, Steven van Rossum, Ted Romer, Tesio, Thinh Ha, Thomas Weise, Tobias Kaymak, tobiaslieber-cognitedata, Tobiasz Kędzierski, Tomo Suzuki, Tudor Marian, tvs, Tyson Hamilton, Udi Meiri, Valentyn Tymofieiev, Vasu Nori, xuelianhan, Yichi Zhang, Yifan Zou, Yixing Zhang, yoshiki.obata, Yueyang Qiu, Yu Feng, Yuwei Fu, Zhuo Peng, ZijieSong946.
Beam 2.22.0 release
We are happy to present the new 2.22.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release. For more information on changes in 2.22.0, check out the detailed release notes.
I/Os
- Basic Kafka read/write support for DataflowRunner (Python) (BEAM-8019).
- Sources and sinks for Google Healthcare APIs (Java) (BEAM-9468).
New Features / Improvements
- The --workerCacheMB flag is supported in Dataflow streaming pipelines (BEAM-9964).
- --direct_num_workers=0 is supported for the FnApi runner. It sets the number of threads/subprocesses to the number of cores of the machine executing the pipeline (BEAM-9443).
- Python SDK now has experimental support for SqlTransform (BEAM-8603).
- Add OnWindowExpiration method to Stateful DoFn (BEAM-1589).
- Added PTransforms for Google Cloud DLP (Data Loss Prevention) services integration (BEAM-9723):
- Inspection of data,
- Deidentification of data,
- Reidentification of data.
- Add a more complete I/O support matrix in the documentation site (BEAM-9916).
- Upgrade Sphinx to 3.0.3 for building PyDoc.
- Added a PTransform for image annotation using Google Cloud AI image processing service (BEAM-9646).
Breaking Changes
- The Python SDK now requires --job_endpoint to be set when using --runner=PortableRunner (BEAM-9860). Users seeking the old default behavior should set --runner=FlinkRunner instead.
List of Contributors
According to git shortlog, the following people contributed to the 2.22.0 release. Thank you to all contributors!
Ahmet Altay, aiyangar, Ajo Thomas, Akshay-Iyangar, Alan Pryor, Alexey Romanenko, Allen Pradeep Xavier, amaliujia, Andrew Pilloud, Ankur Goenka, Ashwin Ramaswami, bntnam, Borzoo Esmailloo, Boyuan Zhang, Brian Hulette, Chamikara Jayalath, Colm O hEigeartaigh, Craig Chambers, Damon Douglas, Daniel Oliveira, David Cavazos, David Moravek, Esun Kim, Etienne Chauchot, Filipe Regadas, Graeme Morgan, Hannah Jiang, Hannah-Jiang, Harch Vardhan, Heejong Lee, Henry Suryawirawan, Ismaël Mejía, Israel Herraiz, Jacob Ferriero, Jan Lukavský, John Mora, Kamil Wasilewski, Kenneth Jung, Kenneth Knowles, kevingg, Kyle Weaver, Kyoungha Min, Kyungwon Jo, Luke Cwik, Mark Liu, Matthias Baetens, Maximilian Michels, Michal Walenia, Mikhail Gryzykhin, Nam Bui, Niel Markwick, Ning Kang, Omar Ismail, omarismail94, Pablo Estrada, paul fisher, pawelpasterz, Pawel Pasterz, Piotr Szuberski, Rahul Patwari, rarokni, Rehman, Rehman Murad Ali, Reuven Lax, Robert Bradshaw, Robert Burke, Rui Wang, Ruoyun Huang, Sam Rohde, Sam Whittle, Sebastian Graca, Shoaib Zafar, Sruthi Sree Kumar, Stephen O'Kennedy, Steve Koonce, Steve Niemitz, Steven van Rossum, Tesio, Thomas Weise, tobiaslieber-cognitedata, Tomo Suzuki, Tudor Marian, tvalentyn, Tyson Hamilton, Udi Meiri, Valentyn Tymofieiev, Vasu Nori, xuelianhan, Yichi Zhang, Yifan Zou, yoshiki.obata, Yueyang Qiu, Zhuo Peng
Beam 2.21.0 release
We are happy to present the new 2.21.0 release of Beam. This release includes both improvements and new functionality.
See the download page for this release.
For more information on changes in 2.21.0, check out the detailed release notes.
I/Os
- Python: Deprecated module apache_beam.io.gcp.datastore.v1 has been removed as the client it uses is out of date and does not support Python 3 (BEAM-9529). Please migrate your code to use apache_beam.io.gcp.datastore.v1new. See the updated datastore_wordcount for example usage.
- Python SDK: Added integration tests and updated batch write functionality for Google Cloud Spanner transform (BEAM-8949).
New Features / Improvements
- Python SDK will now use Python 3 type annotations as pipeline type hints (#10717). If you suspect that this feature is causing your pipeline to fail, calling apache_beam.typehints.disable_type_annotations() before pipeline creation will disable it completely, and decorating specific functions (such as process()) with @apache_beam.typehints.no_annotations will disable it for that function. More details will be in Ensuring Python Type Safety and an upcoming blog post.
- Java SDK: Introducing the concept of options in Beam Schemas. These options add extra context to fields and schemas. This replaces the current Beam metadata that is present in a FieldType only; options are available in fields and row schemas. Schema options are fully typed and can contain complex rows. Remark: Schema aware is still experimental. (BEAM-9035)
- Java SDK: The protobuf extension is fully schema aware and also includes protobuf option conversion to Beam schema options. Remark: Schema aware is still experimental. (BEAM-9044)
- Added ability to write to BigQuery via Avro file loads (Python) (BEAM-8841). By default, file loads will be done using JSON, but it is possible to specify the temp_file_format parameter to perform file exports with AVRO. AVRO-based file loads work by exporting Python types into Avro types, so to switch to Avro-based loads, you will need to change your data types from JSON-compatible types (string-type dates and timestamps, long numeric values as strings) into Python native types that are written to Avro (Python's date, datetime types, decimal, etc). For more information see https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro#avro_conversions.
- Added integration of Java SDK with Google Cloud AI VideoIntelligence service (BEAM-9147).
- Added integration of Java SDK with Google Cloud AI natural language processing API (BEAM-9634).
- The docker-pull-licenses tag was introduced. Licenses/notices of third party dependencies will be added to the docker images when docker-pull-licenses is set. The files are added to /opt/apache/beam/third_party_licenses/. By default, no licenses/notices are added to the docker images. (BEAM-9136)
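For the Avro file loads entry above, the data-type change it asks for might look roughly like the following plain-Python sketch. The row and its field names are hypothetical, and the sketch does not call Beam or BigQuery. It only shows JSON-compatible string values being converted into the Python native types the note mentions (date, datetime, decimal).

```python
import datetime
import decimal

# Hypothetical row in the JSON-compatible form: dates and timestamps as
# strings, and a long numeric value as a string.
json_style_row = {
    "name": "widget",
    "sold_on": "2020-05-27",
    "sold_at": "2020-05-27T14:30:00",
    "price": "1234567890.123456789",
}


def to_native_types(row):
    """Convert string-encoded fields into Python native types."""
    return {
        "name": row["name"],
        "sold_on": datetime.date.fromisoformat(row["sold_on"]),
        "sold_at": datetime.datetime.fromisoformat(row["sold_at"]),
        "price": decimal.Decimal(row["price"]),
    }


native_row = to_native_types(json_style_row)
```

A conversion step like this would run before the write, so that the rows reaching the Avro-based load already carry native values. The Avro conversions page linked above describes how those values map to BigQuery column types.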
Breaking Changes
- Dataflow runner now requires the --region option to be set, unless a default value is set in the environment (BEAM-9199). See here for more details.
- HBaseIO.ReadAll now requires a PCollection of HBaseIO.Read objects instead of HBaseQuery objects (BEAM-9279).
- ProcessContext.updateWatermark has been removed in favor of using a WatermarkEstimator (BEAM-9430).
- Coder inference for PCollection of Row objects has been disabled (BEAM-9569).
- Go SDK docker images are no longer released until further notice.
Deprecations
- Java SDK: Beam Schema FieldType.getMetadata is now deprecated and is replaced by the Beam Schema Options; it will be removed in version 2.23.0. (BEAM-9704)
- The --zone option in the Dataflow runner is now deprecated. Please use --worker_zone instead. (BEAM-9716)
List of Contributors
According to git shortlog, the following people contributed to the 2.21.0 release. Thank you to all contributors!
Aaron Meihm, Adrian Eka, Ahmet Altay, AldairCoronel, Alex Van Boxel, Alexey Romanenko, Andrew Crites, Andrew Pilloud, Ankur Goenka, Badrul (Taki) Chowdhury, Bartok Jozsef, Boyuan Zhang, Brian Hulette, brucearctor, bumblebee-coming, Chad Dombrova, Chamikara Jayalath, Chie Hayashida, Chris Gorgolewski, Chuck Yang, Colm O hEigeartaigh, Curtis "Fjord" Hawthorne, Daniel Mills, Daniel Oliveira, David Yan, Elias Djurfeldt, Emiliano Capoccia, Etienne Chauchot, Fernando Diaz, Filipe Regadas, Gleb Kanterov, Hai Lu, Hannah Jiang, Harch Vardhan, Heejong Lee, Henry Suryawirawan, Hk-tang, Ismaël Mejía, Jacoby, Jan Lukavský, Jeroen Van Goey, jfarr, Jozef Vilcek, Kai Jiang, Kamil Wasilewski, Kenneth Knowles, KevinGG, Kyle Weaver, Kyoungha Min, Luke Cwik, Maximilian Michels, Michal Walenia, Ning Kang, Pablo Estrada, paul fisher, Piotr Szuberski, Reuven Lax, Robert Bradshaw, Robert Burke, Rose Nguyen, Rui Wang, Sam Rohde, Sam Whittle, Spoorti Kundargi, Steve Koonce, sunjincheng121, Ted Yun, Tesio, Thomas Weise, Tomo Suzuki, Udi Meiri, Valentyn Tymofieiev, Vasu Nori, Yichi Zhang, yoshiki.obata, Yueyang Qiu
v2.21.0-RC1
2.21.0 release candidate #1.
v2.16.0
Apache Beam 2.16.0 release