Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Utils to support added type casting capabilities #5906

Merged

Conversation

AlexanderLavelle
Copy link
Contributor

@AlexanderLavelle AlexanderLavelle commented May 11, 2023

To facilitate @1025KB / jyzhao's TODO, separate functionality to support extended capabilities.

Also enables more helpful error message to be clear on which key(s) are causing problems.
F-strings work in both supported Python versions on PyPi (3.8, 3.9)

Recommended expanded types handling:
Decimal - to int and float lists (BigQuery sometimes encodes numbers as Decimal)
Dictionaries - get to lists or list-of-lists
Datetimes - to UNIX / epoch time probably

Further discussion but possible and plausible types:
List-of-Lists - for Sequence Examples*

*This utils function could/would/should produce SequenceExamples. In my estimation per PR 5689, I posit all TFX ExampleGens should produce SequenceExamples as this is forward-looking (in terms of supporting NLP applications etc) and because a SequenceExample with empty sequence_features seems not only permissible but also is not world breaking with SchemaGen, unlike the situation currently.

Passing a SequenceExample to SchemaGen without PR 5689 means there has to be some other piece of code used to make TF Transform work -- what average user should be burdened with figuring that out?

@gbaned gbaned self-assigned this May 12, 2023
@gbaned gbaned requested a review from briron May 12, 2023 18:04
@briron briron requested review from AnuarTB and removed request for briron May 13, 2023 03:10
@AnuarTB
Copy link
Contributor

AnuarTB commented May 17, 2023

Started looking at a PR, but we have some error with Basel. Currently, it is known issue and we are investigating it.

Copy link
Contributor

@AnuarTB AnuarTB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. Made a comment.

@gbaned gbaned added the ready to pull Ready to pull for internal review/testing. label May 24, 2023
@AlexanderLavelle
Copy link
Contributor Author

@gbaned Updated with a simplified version by casting single types to a list to reduce two functions to one

@gbaned gbaned requested a review from AnuarTB May 24, 2023 15:43
@gbaned gbaned added ready to pull Ready to pull for internal review/testing. and removed ready to pull Ready to pull for internal review/testing. labels May 25, 2023
@@ -101,36 +121,16 @@ def dict_to_example(instance: Dict[str, Any]) -> example_pb2.Example:
if isinstance(pyval, bytes):
pyval = pyval.decode(_DEFAULT_ENCODING)

if pyval is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this case was not handled. So I revived if pyval is None conditional. Thanks!

copybara-service bot pushed a commit that referenced this pull request Jun 6, 2023
@copybara-service copybara-service bot merged commit 43c8df6 into tensorflow:master Jun 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready to pull Ready to pull for internal review/testing.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants