TensorFlow Data Validation 0.30.0
Major Features and Improvements
- This version is the last version before TFDV 1.0. Starting with 1.0, all
  TFDV public APIs (i.e. symbols in the root __init__.py) will be subject to
  semantic versioning. We are deprecating some public APIs in this version
  and they will be removed in 1.0.
- The sketch-based top-k/unique stats generator is now able to detect invalid
  UTF-8 sequences and large texts and replace them with a placeholder.
  It does not suffer from the memory issues usually caused by image / large
  text features in the data. Note that this generator is not yet used by
  default.
- Added StatsOptions.experimental_use_sketch_based_topk_uniques, which
  enables the sketch-based top-k/unique stats generator.
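
As a rough illustration of opting in to the sketch-based generator, the sketch below passes the new flag to StatsOptions. It assumes TFDV 0.30.x and pandas are installed and that the flag is accepted as a constructor keyword; the function name generate_stats_with_sketches is hypothetical and only used for this example.

```python
def generate_stats_with_sketches(dataframe):
    """Compute statistics with the sketch-based top-k/unique generator enabled.

    The import is done lazily so that defining this helper does not require
    TFDV to be installed (assumption: TFDV 0.30.x at call time).
    """
    import tensorflow_data_validation as tfdv

    # The generator is not used by default in 0.30.0, so it is enabled explicitly.
    options = tfdv.StatsOptions(experimental_use_sketch_based_topk_uniques=True)
    return tfdv.generate_statistics_from_dataframe(dataframe, stats_options=options)
```

Calling the helper with a pandas DataFrame should return a statistics proto in the usual way; only the top-k/unique computation path changes.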
Bug Fixes and Other Changes
- Fixed bug in display_schema that caused domains not to be displayed.
- Modified how get_schema_dataframe outputs numeric domains.
- Anomalies previously (un)classified as UNKNOWN_TYPE now trigger more specific
  anomaly types: INVALID_DOMAIN_SPECIFICATION and MULTIPLE_REASONS.
- Depends on tensorflow-metadata>=0.30,<0.31.
- Depends on tfx-bsl>=0.30,<0.31.
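
The two dependency pins above admit only 0.30.x releases. A minimal standard-library sketch of that range check follows; the helper names are hypothetical, and pre-release or local version suffixes are deliberately not handled.

```python
def release_tuple(version):
    """Turn a plain dotted version such as '0.30.1' into (0, 30, 1)."""
    return tuple(int(part) for part in version.split("."))


def satisfies_0_30_pin(version):
    """Return True when version lies in the >=0.30,<0.31 range quoted above."""
    return (0, 30) <= release_tuple(version) < (0, 31)
```

In practice the installer enforces these constraints; the helper only makes the accepted range explicit.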
Known Issues
- N/A
Breaking Changes
- N/A
Deprecations
- tfdv.LiftStatsGenerator is going to be removed in the next version from
  the public API. To enable that generator, supply StatsOptions.label_feature.
- tfdv.NonStreamingCustomStatsGenerator is going to be removed in the next
  version from the public API. You may continue to import it from TFDV
  but it will not be subject to compatibility guarantees.
- tfdv.validate_instance is going to be removed in the next
  version from the public API. You may continue to import it from TFDV
  but it will not be subject to compatibility guarantees.
- Removed tfdv.DecodeCSV, tfdv.DecodeTFExample (deprecated in 0.27).
- Removed feature_whitelist in tfdv.StatsOptions (deprecated in 0.28).
  Use feature_allowlist instead.
- tfdv.get_feature_value_slicer is deprecated.
  tfdv.experimental_get_feature_value_slicer is introduced as a replacement.
  TFDV is likely to have a different slicing functionality post 1.0, which
  may not be compatible with the current slicers.
- StatsOptions.slicing_functions is deprecated.
  StatsOptions.experimental_slicing_functions is introduced as a
  replacement.
- tfdv.WriteStatisticsToText is removed (deprecated in 0.25.0).
- Parameter compression_type in tfdv.generate_statistics_from_tfrecord
  is deprecated. The compression type is currently automatically determined.
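
A migration along the lines of the deprecations above might look like the following sketch. It assumes TFDV 0.30.x is installed and that the slicer accepts a dict mapping a feature name to None (meaning all values of that feature); the function names and parameters here are hypothetical and exist only for illustration.

```python
def build_migrated_options(feature_names, slice_feature):
    """Build StatsOptions using the replacement names listed above."""
    import tensorflow_data_validation as tfdv  # lazy import; assumes TFDV 0.30.x

    # Replacement for the deprecated tfdv.get_feature_value_slicer.
    # Assumption: mapping a feature to None slices on every value it takes.
    slicer = tfdv.experimental_get_feature_value_slicer({slice_feature: None})

    return tfdv.StatsOptions(
        feature_allowlist=list(feature_names),      # replaces feature_whitelist
        experimental_slicing_functions=[slicer],    # replaces slicing_functions
    )


def stats_from_tfrecord(data_location, options):
    """Generate statistics without passing the deprecated compression_type."""
    import tensorflow_data_validation as tfdv

    # The compression type is determined automatically, so it is not passed.
    return tfdv.generate_statistics_from_tfrecord(
        data_location, stats_options=options
    )
```

Since the release notes warn that slicing may change after 1.0, code built on the experimental slicer should be treated as provisional.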