Closed
Description
We already have some tests for data preprocessing. However, those are integration tests that capture the behaviour of the tool as a whole rather than unit tests for specific functions.
To test the different preprocessing functionalities efficiently, we need to add smaller-scale unit tests. Those should not rely on real data, but on sample input values that can be generated from scratch.
Here are the classes / functions that should be covered (from the implementation in the protein_prediction branch):
reader.py:
- DataReader: to_data()
- ChemDataReader: _read_data()
- DeepChemDataReader: _read_data()
- SelfiesReader: _read_data()
- ProteinDataReader: _read_data()
collate.py:
- DefaultCollator: __call__()
- RaggedCollator: __call__(), process_label_rows()
datasets/base.py:
- XYBaseDataModule: _filter_labels()
- DynamicDataset: get_test_split(), get_train_val_splits_given_test()
datasets/chebi.py:
- _ChEBIDataExtractor: _extract_class_hierarchy(), _graph_to_raw_dataset(), _load_dict(), _setup_pruned_test_set()
- ChEBIOverX: select_classes()
- ChEBIOverXPartial: extract_class_hierarchy() term_callback()
datasets/go_uniprot.py:
- _GOUniprotDataExtractor: _extract_class_hierarchy(), term_callback(), _graph_to_raw_dataset(), _get_swiss_to_go_mapping(), _load_dict()
- _GoUniProtOverX: select_classes()
datasets/tox21.py:
- Tox21MolNet: setup_processed(), _load_data_from_file()
- Tox21Challenge: setup_processed(), _load_data_from_file(), _load_dict()
For some functions, it is necessary to read from / write to files. Instead of real files, I would suggest using mock objects (see e.g. this comment).
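To illustrate the mock-object idea, here is a minimal sketch using Python's standard `unittest.mock`. Note that `read_smiles_labels` and the tab-separated sample content are hypothetical stand-ins invented for this example; they are not the project's actual `_read_data()` or `_load_data_from_file()` implementations, whose signatures and file formats may differ.

```python
import unittest
from unittest.mock import mock_open, patch


def read_smiles_labels(path):
    """Hypothetical loader (an assumption for this sketch, not project code).

    Reads tab-separated lines of the form '<smiles>\t<0|1>\t<0|1>...'.
    """
    rows = []
    with open(path, "r") as handle:
        for line in handle.read().splitlines():
            if not line.strip():
                continue  # skip blank lines
            smiles, *labels = line.split("\t")
            rows.append({"features": smiles, "labels": [lab == "1" for lab in labels]})
    return rows


# Sample input generated from scratch, so no real dataset is needed.
SAMPLE_FILE = "CCO\t1\t0\nc1ccccc1\t0\t1\n"


class TestReadSmilesLabels(unittest.TestCase):
    def test_parses_rows_without_touching_disk(self):
        # Replace the built-in open() so the loader sees SAMPLE_FILE
        # instead of a file on disk.
        with patch("builtins.open", mock_open(read_data=SAMPLE_FILE)) as mocked:
            rows = read_smiles_labels("does/not/exist.tsv")

        mocked.assert_called_once_with("does/not/exist.tsv", "r")
        self.assertEqual(len(rows), 2)
        self.assertEqual(rows[0], {"features": "CCO", "labels": [True, False]})
        self.assertEqual(rows[1], {"features": "c1ccccc1", "labels": [False, True]})
```

Saved in a test module, this can be run with `python -m unittest`. The same pattern should carry over to functions that write files: the handle returned by `mock_open()` records calls such as `write()`, which can then be asserted on instead of inspecting an output file.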