Add metadata property to OpenMLBenchmarkSuite for LaTeX export #1515
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Metadata
metadataproperty toOpenMLBenchmarkSuitefor exporting suite information to LaTeX"Details
This PR implements the
metadataproperty on theOpenMLBenchmarkSuiteclass as requested in issue #1126. The property returns a pandas DataFrame containing comprehensive metadata for all tasks in the suite, combining both task-level information (task ID, estimation procedure, target feature) and dataset-level information (dataset ID, version, uploader, number of instances, features, classes, etc.).Key changes:
metadataproperty toOpenMLBenchmarkSuiteclass inopenml/study/study.py_list_tasksandlist_datasets) to minimize network overheadCurrently, researchers using OpenML benchmark suites must manually aggregate metadata from individual tasks and datasets to create the standard "dataset characteristics" tables required for academic publications. This process is:
This implementation solves these issues by:
to_latex()methodBefore (manual approach):
After (with this PR):
Testing:
Performance: The implementation uses batch API calls, reducing network overhead from O(N) to O(1) relative to suite size. For a suite with 72 tasks, this reduces API calls from ~144 to 2.
Caching: The property implements lazy loading with caching. The first access triggers API calls, but subsequent accesses return the cached DataFrame instantly.
Error Handling: Comprehensive error handling with informative
RuntimeErrormessages that help users debug issues.Backward Compatibility: This is a purely additive change. No existing functionality is modified, ensuring full backward compatibility.
Documentation:
examples/Advanced/suite_metadata_latex_export.pyCode Quality:
Files Changed:
openml/study/study.py: Added imports, cache initialization, andmetadatapropertytests/test_study/test_benchmark_suite_metadata.py: New test file with 7 unit testsexamples/Advanced/suite_metadata_latex_export.py: New example script demonstrating usage