-
Notifications
You must be signed in to change notification settings - Fork 389
Update functionality to download demos #2705
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…sing download_demo (#2696)
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2705 +/- ##
==========================================
- Coverage 98.15% 98.10% -0.06%
==========================================
Files 74 74
Lines 7826 7989 +163
==========================================
+ Hits 7682 7838 +156
- Misses 144 151 +7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
sdv/datasets/demo.py
Outdated
_validate_modalities(modality) | ||
if output_filepath is not None and not str(output_filepath).endswith('.txt'): | ||
fname = (filename or '').lower() | ||
file_type = 'README' if 'readme' in fname else 'source' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the else 'source'
be else 'SOURCE'
? Source: https://docs.sdv.dev/datacebo-internal/product/demo-datasets/organization#source.txt-file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the S3 folder they are lower case. @amontanez24 Which one should we follow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you post on team engineering? The docs do say capital, but we would have to run a script to rename all the files. Let's see what Neha thinks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pvk-developer @amontanez24 The code is already case insensitive. As for the comment above, it's just a matter of what error message we want to show. I'll keep as is if that's alright.
sdv/datasets/demo.py
Outdated
try: | ||
raw = _get_data_from_bucket(yaml_key) | ||
info = yaml.safe_load(raw) or {} | ||
|
||
size_mb_val = info.get('dataset-size-mb') | ||
try: | ||
size_mb = float(size_mb_val) if size_mb_val is not None else np.nan | ||
except (ValueError, TypeError): | ||
LOGGER.info( | ||
f'Invalid dataset-size-mb {size_mb_val} for dataset ' | ||
f'{dataset_name}; defaulting to NaN.' | ||
) | ||
size_mb = np.nan | ||
|
||
num_tables_val = info.get('num-tables', np.nan) | ||
if isinstance(num_tables_val, str): | ||
try: | ||
num_tables_val = float(num_tables_val) | ||
except (ValueError, TypeError): | ||
LOGGER.info( | ||
f'Could not cast num_tables_val {num_tables_val} to float for ' | ||
f'dataset {dataset_name}; defaulting to NaN.' | ||
) | ||
num_tables_val = np.nan | ||
|
||
try: | ||
num_tables = int(num_tables_val) if not pd.isna(num_tables_val) else np.nan | ||
except (ValueError, TypeError): | ||
LOGGER.info( | ||
f'Invalid num-tables {num_tables_val} for ' | ||
f'dataset {dataset_name} when parsing as int.' | ||
) | ||
num_tables = np.nan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be broken down into smaller functions that do their own thing and in a similar fashion as the download
.
Also, avoid the try
/except
in a try
/except
and specially in such large portions. What if tables_info['size_MB']
list was not defined ? Would we know which thing failed in this block of code ?
raw = _get_data_from_bucket(yaml_key)
info = yaml.safe_load(raw) or {}
size_mb_val = info.get('dataset-size-mb')
......
tables_info['dataset_name'].append(dataset_name)
tables_info['size_MB'].append(size_mb)
tables_info['num_tables'].append(num_tables)
Co-authored-by: Plamen Valentinov Kolev <[email protected]>
80cfbdb
to
78fee6b
Compare
CU-86b6fugf3, Resolve #2661
CU-86b6fugf3, Resolve #2663
CU-86b6xkdcf, Resolve #2686
CU-86b6xjuk7, Resolve #2685
CU-86b6xjugh, Resolve #2684
CU-86b6xrcb1, Resolve #2689
CU-86b6xrqf0, Resolve #2691
CU-86b6xnx26, Resolve #2687
CU-86b6xp7a0, Resolve #2688
CU-86b6xrcah, Resolve #2690