Update SchemaGen Executor to natively handle SequenceExample #5689

Open
wants to merge 6 commits into
base: master
6 changes: 6 additions & 0 deletions tfx/components/schema_gen/executor.py
@@ -25,6 +25,7 @@
from tfx.types import standard_component_specs
from tfx.utils import io_utils
from tfx.utils import json_utils
from tfx_bsl.tfxio import tensor_representation_util


# Default file name for generated schema file.
@@ -89,5 +90,10 @@ def Do(self, input_dict: Dict[str, List[types.Artifact]],
artifact_utils.get_single_uri(
output_dict[standard_component_specs.SCHEMA_KEY]),
DEFAULT_FILE_NAME)

# Add tensor representations to handle SequenceExamples downstream. Still need correct Payload Format.
tensor_representations = tensor_representation_util.InferTensorRepresentationsFromSchema(schema)
tensor_representation_util.SetTensorRepresentationsInSchema(schema, tensor_representations)
Member

@lego0901 Feb 10, 2023

Can you format this with the Pyink Python formatter for consistency? (https://github.com/google/pyink)
E.g., the <=80 columns rule.

Contributor Author

I have tried to do this, although perhaps the imports need changing for the same reason? I am unsure what you would recommend if the updated state is still not acceptable, since I did not come up with the module names or the function names.
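
For reference, one way the two added calls might fit within 80 columns without renaming any module or function is to wrap them in parentheses, which is roughly the style Black-derived formatters such as Pyink tend to produce. This is only a sketch: the exact output and indentation Pyink would emit for this file are assumptions, and `tensor_representation_util` and `schema` below are hypothetical stand-ins so the snippet runs without TFX installed.

```python
from types import SimpleNamespace

# Hypothetical stand-in for tfx_bsl.tfxio.tensor_representation_util, so the
# sketch runs without TFX installed. The two function names are the ones used
# in the diff; their bodies here are placeholders, not the real behavior.
tensor_representation_util = SimpleNamespace(
    InferTensorRepresentationsFromSchema=lambda schema: {
        name: 'placeholder' for name in schema['features']
    },
    SetTensorRepresentationsInSchema=lambda schema, reps: schema.update(
        tensor_representations=reps
    ),
)

schema = {'features': ['f1', 'f2']}  # placeholder, not a real Schema proto

# Add tensor representations to handle SequenceExamples downstream.
# Still need correct Payload Format.
tensor_representations = (
    tensor_representation_util.InferTensorRepresentationsFromSchema(schema)
)
tensor_representation_util.SetTensorRepresentationsInSchema(
    schema, tensor_representations
)
```

The long names stay as they are; only the line breaks change, so the import line itself should not need to be altered for the column limit.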


io_utils.write_pbtxt_file(output_uri, schema)
logging.info('Schema written to %s.', output_uri)