Refactor Utils to support added type casting capabilities #5906
Started looking at the PR, but we are hitting an error with Bazel. It is a known issue and we are investigating it.
Thanks for the PR. Made a comment.
@gbaned Updated with a simplified version: single values are now cast to a list, which reduces the two functions to one.
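For readers following along, the simplification described above can be sketched roughly as follows. This is only an illustration of the idea, not the code in the PR; `as_list` is a hypothetical helper name.

```python
from typing import Any, List


def as_list(pyval: Any) -> List[Any]:
    """Wraps a single value in a list; lists pass through unchanged.

    Hypothetical helper: once every value is a list, one code path can
    handle both the scalar case and the list case.
    """
    if isinstance(pyval, list):
        return pyval
    return [pyval]


print(as_list(3))
print(as_list([1, 2]))
print(as_list("ab"))
```

Note that a string is wrapped whole rather than split into characters, because only `list` is treated as already being a sequence.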
@@ -101,36 +121,16 @@ def dict_to_example(instance: Dict[str, Any]) -> example_pb2.Example:
    if isinstance(pyval, bytes):
      pyval = pyval.decode(_DEFAULT_ENCODING)

    if pyval is None:
Actually, this case was not handled, so I restored the if pyval is None conditional. Thanks!
PiperOrigin-RevId: 536904657
To facilitate @1025KB / jyzhao's TODO, separate functionality to support extended capabilities. Also enables more helpful error messages that make clear which key(s) are causing problems.
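As a rough illustration of the "which key(s)" idea, here is a minimal sketch that collects every offending key before raising, so the message names all of them at once. The helper names and the set of supported types are assumptions made for the example, not the PR's actual code.

```python
from typing import Any, Dict, List

# Assumption for this sketch: the value types a converter would accept.
_SUPPORTED_TYPES = (int, float, str, bytes)


def find_unsupported_keys(instance: Dict[str, Any]) -> List[str]:
    """Returns the keys whose values contain an unsupported type (hypothetical)."""
    bad_keys = []
    for key, value in instance.items():
        values = value if isinstance(value, list) else [value]
        # None is allowed here and would map to an empty feature.
        if any(v is not None and not isinstance(v, _SUPPORTED_TYPES) for v in values):
            bad_keys.append(key)
    return bad_keys


def validate_instance(instance: Dict[str, Any]) -> None:
    """Raises a TypeError naming every offending key and its value type."""
    bad_keys = find_unsupported_keys(instance)
    if bad_keys:
        details = ", ".join(
            f"{key!r} ({type(instance[key]).__name__})" for key in bad_keys
        )
        raise TypeError(f"Unsupported value type for key(s): {details}")
```

Collecting all keys first, rather than failing on the first one, saves the user from fixing one column, rerunning, and hitting the next.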
F-strings work in both Python versions supported on PyPI (3.8, 3.9).
Recommended expanded type handling:
Decimal - to int and float lists (BigQuery sometimes encodes numbers as Decimal)
Dictionaries - convert to lists or list-of-lists
Datetimes - probably to UNIX / epoch time
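To make the suggestions above concrete, here is a minimal sketch of what such conversions could look like using only the standard library. `normalize_value` is a hypothetical name, and the specific choices (integral Decimals become int, naive datetimes are assumed to be UTC, dictionaries contribute their values) are assumptions for discussion rather than settled behavior.

```python
import datetime
from decimal import Decimal
from typing import Any, List


def normalize_value(pyval: Any) -> List[Any]:
    """Converts extended Python types to a list of simpler values (hypothetical)."""
    if isinstance(pyval, Decimal):
        # Integral, finite Decimals become int; everything else becomes float,
        # which may lose precision for very large or very precise values.
        if pyval.is_finite() and pyval == pyval.to_integral_value():
            return [int(pyval)]
        return [float(pyval)]
    if isinstance(pyval, datetime.datetime):
        # Assumption: naive datetimes are treated as UTC, because timestamp()
        # would otherwise interpret them in the local time zone.
        if pyval.tzinfo is None:
            pyval = pyval.replace(tzinfo=datetime.timezone.utc)
        return [pyval.timestamp()]
    if isinstance(pyval, dict):
        # One possible reading: keep the values, which yields a list, or a
        # list-of-lists when the values are themselves lists.
        return list(pyval.values())
    if isinstance(pyval, list):
        return pyval
    return [pyval]
```

Whether a Decimal should map to int or float for a whole column, rather than per value as done here, is a design question this sketch does not settle.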
Needs further discussion, but possible and plausible types:
List-of-Lists - for SequenceExamples*
*This utils function could/would/should produce SequenceExamples. In my estimation per PR 5689, I posit all TFX ExampleGens should produce SequenceExamples as this is forward-looking (in terms of supporting NLP applications etc) and because a SequenceExample with empty sequence_features seems not only permissible but also is not world breaking with SchemaGen, unlike the situation currently.
Passing a SequenceExample to SchemaGen without PR 5689 means there has to be some other piece of code used to make TF Transform work -- what average user should be burdened with figuring that out?