TensorFlow Data Validation 0.30.0

jay90099 released this 21 Apr 22:08
d615611

Major Features and Improvements

  • This is the last version before TFDV 1.0. Starting with 1.0, all TFDV
    public APIs (i.e. symbols in the root __init__.py) will be subject to
    semantic versioning. Some public APIs are deprecated in this version
    and will be removed in 1.0.

  • The sketch-based top-k/unique stats generator can now detect invalid
    UTF-8 sequences and large texts and replace them with a placeholder.
    As a result, it does not suffer from the memory issues usually caused by
    image or large-text features in the data. Note that this generator is
    not yet used by default.

  • Added StatsOptions.experimental_use_sketch_based_topk_uniques which
    enables the sketch-based top-k/unique stats generator.
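
The placeholder behaviour described above can be illustrated with a small
pure-Python sketch. It is not TFDV's implementation: the placeholder strings,
the size limit, and the helper names sanitize_value and top_k are all
hypothetical, and an exact Counter stands in for the sketch data structure.

```python
from collections import Counter

# Hypothetical placeholders and limit; TFDV's actual values may differ.
INVALID_UTF8_PLACEHOLDER = "__INVALID_UTF8__"
LARGE_TEXT_PLACEHOLDER = "__LARGE_TEXT__"
MAX_TEXT_BYTES = 1024


def sanitize_value(raw: bytes) -> str:
    """Map oversized or undecodable byte strings to a fixed placeholder."""
    if len(raw) > MAX_TEXT_BYTES:
        return LARGE_TEXT_PLACEHOLDER
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return INVALID_UTF8_PLACEHOLDER


def top_k(values, k):
    """Count sanitized values, so large blobs never become counter keys."""
    counts = Counter(sanitize_value(v) for v in values)
    return counts.most_common(k)
```

Because every image-like or oversized value collapses into one short key, the
memory used for counting no longer grows with the size of those values. In
TFDV itself, the generator is opted into by passing
experimental_use_sketch_based_topk_uniques=True to tfdv.StatsOptions.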

Bug Fixes and Other Changes

  • Fixed bug in display_schema that caused domains not to be displayed.
  • Modified how get_schema_dataframe outputs numeric domains.
  • Anomalies previously (un)classified as UNKNOWN_TYPE now trigger more specific
    anomaly types: INVALID_DOMAIN_SPECIFICATION and MULTIPLE_REASONS.
  • Depends on tensorflow-metadata>=0.30,<0.31.
  • Depends on tfx-bsl>=0.30,<0.31.
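
The dependency ranges above can be checked against an installed version with
a few lines of Python. This is a minimal sketch: the helper names
parse_version and in_range are hypothetical, and the parser only handles plain
dotted numeric versions (no rc or dev suffixes).

```python
def parse_version(text):
    """Turn a dotted numeric version such as '0.30.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def in_range(version, lower, upper):
    """Return True if lower <= version < upper, as in '>=0.30,<0.31'."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


# Example: would tensorflow-metadata 0.30.2 satisfy '>=0.30,<0.31'?
ok = in_range("0.30.2", "0.30", "0.31")
```

For real projects, a full version parser that follows PEP 440 is the safer
choice, since it also understands pre-release and post-release tags.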

Known Issues

  • N/A

Breaking Changes

  • N/A

Deprecations

  • tfdv.LiftStatsGenerator will be removed from the public API in the next
    version. To enable that generator, supply StatsOptions.label_feature.
  • tfdv.NonStreamingCustomStatsGenerator will be removed from the public API
    in the next version. You may continue to import it from TFDV,
    but it will not be subject to compatibility guarantees.
  • tfdv.validate_instance will be removed from the public API in the next
    version. You may continue to import it from TFDV,
    but it will not be subject to compatibility guarantees.
  • Removed tfdv.DecodeCSV, tfdv.DecodeTFExample (deprecated in 0.27).
  • Removed feature_whitelist in tfdv.StatsOptions (deprecated in 0.28).
    Use feature_allowlist instead.
  • tfdv.get_feature_value_slicer is deprecated.
    tfdv.experimental_get_feature_value_slicer is introduced as a replacement.
    TFDV is likely to have a different slicing functionality post 1.0, which
    may not be compatible with the current slicers.
  • StatsOptions.slicing_functions is deprecated.
    StatsOptions.experimental_slicing_functions is introduced as a
    replacement.
  • tfdv.WriteStatisticsToText is removed (deprecated in 0.25.0).
  • Parameter compression_type in tfdv.generate_statistics_from_tfrecord
    is deprecated. The compression type is currently automatically determined.
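
The StatsOptions renames listed above can be applied mechanically to existing
keyword arguments. The following is a sketch only: the helper
migrate_stats_options_kwargs is hypothetical and not part of TFDV, and it
works on a plain dict so it runs without TFDV installed.

```python
# Old name -> new name, taken from the deprecation notes above.
STATS_OPTIONS_RENAMES = {
    "feature_whitelist": "feature_allowlist",
    "slicing_functions": "experimental_slicing_functions",
}


def migrate_stats_options_kwargs(kwargs):
    """Return a copy of kwargs with deprecated StatsOptions names replaced."""
    migrated = {}
    for name, value in kwargs.items():
        new_name = STATS_OPTIONS_RENAMES.get(name, name)
        if new_name in migrated:
            # Both spellings were supplied; refuse to pick one silently.
            raise ValueError(f"duplicate setting for {new_name!r}")
        migrated[new_name] = value
    return migrated


old_kwargs = {"feature_whitelist": ["age"], "num_top_values": 10}
new_kwargs = migrate_stats_options_kwargs(old_kwargs)
```

The resulting dict could then be passed as tfdv.StatsOptions(**new_kwargs).
Calls to tfdv.get_feature_value_slicer need a separate change to
tfdv.experimental_get_feature_value_slicer, which this helper does not cover.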