# FAIRshake
FAIRshake (Sample Handling and Analysis Kit for Experiments) is a comprehensive data processing pipeline designed for efficient benchmarking and processing of datasets, particularly in diffraction data analysis. It includes modules for benchmarking, data loading, preprocessing, integration, and exporting.
## Features
- **Benchmarking Modules**: Assess the performance of data processing workflows.
- **Data Loading**: Efficient handling of large-scale datasets.
- **Preprocessing**: Data cleaning, normalization, and noise reduction.
- **Integration**: Combine data from various formats and sources seamlessly.
- **Exporting**: Output processed data in multiple formats for further analysis.
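The modules above are composed as a linear flow: load, preprocess, integrate, export. As an illustration only (the function names below are hypothetical stand-ins, not FAIRshake's actual API), the flow can be sketched as:

```python
# Illustrative sketch of the load -> preprocess -> integrate -> export flow.
# These functions are hypothetical stand-ins, not FAIRshake's actual API.

def load(frames):
    # Data loading: bring raw frame values into memory.
    return list(frames)

def preprocess(frames, dark=0.0):
    # Preprocessing: e.g. dark-field subtraction, clipped at zero.
    return [max(f - dark, 0.0) for f in frames]

def integrate(frames):
    # Integration: reduce the frames to a single summary value (here, the mean).
    return [sum(frames) / len(frames)] if frames else []

def export(values):
    # Exporting: format the reduced values for downstream analysis.
    return [f"{v:.2f}" for v in values]

result = export(integrate(preprocess(load([1.0, 2.0, 3.0]), dark=0.5)))
print(result)  # ['1.50']
```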
## Installation
### Requirements
- Python 3.11 or higher
### From PyPI

```bash
pip install FAIRshake
```

### From Source

Clone the repository and install FAIRshake locally:

```bash
git clone https://github.com/cwru-sdle/FAIRshake.git
cd FAIRshake
pip install .
```

## Usage

FAIRshake provides command-line tools and modules for data processing, benchmarking, and integration of diffraction data.

After installation, the `fairshake` command is available. Use `fairshake --help` to see the available commands:

```bash
fairshake --help
```

To run the data processing pipeline on your dataset:

```bash
fairshake process --config <config-file> --data-dir <data-directory> --output-dir <output-directory>
```

### Configuration

Create a configuration file (e.g., `config.json`) specifying parameters for preprocessing, integration, and exporting:
```json
{
  "preprocessing": {
    "dark_field_path": "path/to/dark_field.ge2",
    "mask_file_path": "path/to/mask.edf",
    "invert_mask": true,
    "min_intensity": 0.0,
    "max_intensity": null
  },
  "integration": {
    "poni_file_path": "calibration_files/det0.poni",
    "npt_radial": 500,
    "unit": "2th_deg",
    "do_solid_angle": false,
    "error_model": "poisson",
    "radial_range": [3, 13],
    "azimuth_range": [-180, 180],
    "polarization_factor": 0.99,
    "method": ["full", "histogram", "cython"]
  },
  "exporting": {
    "output_directory": "path/to/output",
    "naming_convention": "{GE_filenumber}_{iter}",
    "options": {
      "do_remove_nan": true,
      "unit": "2th_deg"
    },
    "file_format": "fxye"
  }
}
```

### Benchmarking

To benchmark the performance of the data processing pipeline:

```bash
fairshake benchmark --data-dir <data-directory> \
                    --iterations <iterations> \
                    --batch-size <batch-size> \
                    --files-per-dataset <files-per-dataset>
```

Example:
```bash
fairshake benchmark --data-dir data/benchmark_files \
                    --iterations 1 \
                    --batch-size 5 \
                    --files-per-dataset 10
```

### Python API

You can use FAIRshake modules directly in your Python scripts:
```python
from FAIRshake.execution_pipeline.pipeline import ExecutionPipeline

# Configuration parameters
input_base_dir = 'path/to/input'
output_base_dir = 'path/to/output'

# Preprocessing configuration
preprocessing_config = {
    "dark_field_path": "path/to/dark_field.ge2",
    "mask_file_path": "path/to/mask.edf",
    "invert_mask": True,
    "min_intensity": 0.0,
    "max_intensity": None,
}

# Integration configuration
integration_config = {
    "poni_file_path": "calibration_files/det0.poni",
    "npt_radial": 500,
    "unit": "2th_deg",
    "do_solid_angle": False,
    "error_model": "poisson",
    "radial_range": (3, 13),
    "azimuth_range": [-180, 180],
    "polarization_factor": 0.99,
    "method": ["full", "histogram", "cython"],
}

# Exporting configuration
exporting_config = {
    "output_directory": output_base_dir,
    "naming_convention": "{GE_filenumber}_{iter}",
    "options": {
        "do_remove_nan": True,
        "unit": "2th_deg",
    },
    "file_format": "fxye",
}

# Pipeline parameters
pipeline_params = {
    "input_base_dir": input_base_dir,
    "output_base_dir": output_base_dir,
    "batch_size": 10,
    "data_file_types": ['.ge2', '.tif', '.edf', '.cbf', '.mar3450', '.h5', '.png'],
    "metadata_file_types": ['.json', '.poni', '.instprm', '.geom', '.spline'],
    "require_metadata": True,
    "load_metadata_files": True,
    "load_detector_metadata": False,
    "require_all_formats": False,
    "average_frames": False,
    "enable_profiling": True,
    "tf_data_debug_mode": False,
    "pattern": '*/*/*',
    "preprocessing_config": preprocessing_config,
    "enable_preprocessing": True,
    "enable_integration": True,
    "integration_config": integration_config,
    "enable_exporting": True,
    "exporting_config": exporting_config,
    "log_level": "ERROR",
}

# Initialize the execution pipeline
pipeline = ExecutionPipeline(**pipeline_params)

# Run the pipeline
pipeline.run()
```

Ensure that you define `preprocessing_config`, `integration_config`, and `exporting_config` according to your requirements.
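Before launching a long run, a JSON configuration like the one shown earlier can be sanity-checked with the standard library alone. This is a minimal sketch, not part of FAIRshake: the required section names and the type check are assumptions taken from the example config above.

```python
import json

# Minimal sanity check for a FAIRshake-style JSON config.
# The required section names are assumptions based on the example config;
# adjust them if your pipeline uses different ones.
REQUIRED_SECTIONS = ("preprocessing", "integration", "exporting")

def check_config(text):
    config = json.loads(text)
    missing = [s for s in REQUIRED_SECTIONS if s not in config]
    if missing:
        raise ValueError(f"config is missing sections: {missing}")
    # Spot-check one value type as an example.
    if not isinstance(config["integration"].get("npt_radial"), int):
        raise ValueError("integration.npt_radial must be an integer")
    return config

example = '{"preprocessing": {}, "integration": {"npt_radial": 500}, "exporting": {}}'
config = check_config(example)
print(sorted(config))  # ['exporting', 'integration', 'preprocessing']
```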
For detailed usage and options, use the help commands:

```bash
fairshake process --help
fairshake benchmark --help
```

## Contributing

Contributions are welcome. Please fork the repository and submit a pull request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository.
- Create a new branch (`git checkout -b feature-branch`).
- Make your changes.
- Commit your changes (`git commit -m 'Add some feature'`).
- Push to the branch (`git push origin feature-branch`).
- Open a pull request.
## License

This project is licensed under the BSD 3-Clause License. See the LICENSE.txt file for details.
## Contact

For support or inquiries:
- Author: Finley Holt
- Email: [email protected]
- GitHub: FinleyHolt