This repository contains the code for reproducing the experiments that use the pyECT package.
Python version 3.10 should be installed on your system (or in your
virtual environment).
Base requirements can be installed by running
pip install -r requirements.txt from the root of this repository.
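Before installing the requirements, it can help to confirm that the active interpreter is in the 3.10 series stated above. The following is a minimal sketch; the helper name `is_supported` is hypothetical and is not part of this repository.

```python
import sys

REQUIRED = (3, 10)  # major, minor version stated in this README


def is_supported(version_info=sys.version_info, required=REQUIRED):
    """Return True when the interpreter's major.minor equals `required`.

    Hypothetical helper for illustration; not part of the repository.
    """
    return tuple(version_info[:2]) == tuple(required)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    if is_supported():
        print(f"Python {found} matches the expected 3.10 series.")
    else:
        print(f"Python {found} found; this repository expects Python 3.10.")
```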
Acknowledgements:
Computational efforts were performed on the Tempest High Performance Computing System, operated and supported by University Information Technology Research Cyberinfrastructure (RRID:SCR_026229) at Montana State University.
Fashion-MNIST:
The Fashion-MNIST dataset may be downloaded
here.
Then, the training set CSV file should be moved into the
data/fashionmnist/ directory, and can be preprocessed by
running data/fashionmnist/preprocess-fashionmnist.py.
ImageNet:
The ImageNet sample dataset can be downloaded
here.
The images should be moved to a data/imagenet/images directory within this
repository, and then preprocessed by running the data/imagenet/preprocess-imagenet.py
script.
3DCCs:
The 3DCCs dataset can be generated by running the
data/generate_3d_cubical_complexes.py script.
Stanford Mesh Datasets:
The Stanford mesh datasets can be downloaded as PLY files from
here.
They should be moved to the data/stanford directory and preprocessed by running
the data/stanford/preprocess-stanford.py script.
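The behaviour of preprocess-stanford.py is not described in this README. Since the experiments read 3D meshes from .obj files, one plausible preprocessing step is a PLY-to-OBJ conversion. The sketch below illustrates that idea only, under loud assumptions: the input is an ASCII PLY (binary PLY files are not handled), the vertex element precedes the face element, and x, y, z are the first three vertex properties. The function name `ply_to_obj_text` is hypothetical and not taken from the repository.

```python
def ply_to_obj_text(ply_text):
    """Convert the text of an ASCII PLY mesh to Wavefront OBJ text.

    Illustration only: assumes vertices come before faces, x/y/z are the
    first three vertex properties, and each face line has the form
    "<count> <i0> <i1> ...".
    """
    lines = ply_text.splitlines()
    if not lines or lines[0].strip() != "ply":
        raise ValueError("not a PLY file")

    n_vertices = n_faces = 0
    body_start = None
    for index, line in enumerate(lines):
        tokens = line.split()
        if tokens[:1] == ["format"] and tokens[1:2] != ["ascii"]:
            raise ValueError("only ASCII PLY is handled in this sketch")
        if tokens[:2] == ["element", "vertex"]:
            n_vertices = int(tokens[2])
        elif tokens[:2] == ["element", "face"]:
            n_faces = int(tokens[2])
        elif tokens[:1] == ["end_header"]:
            body_start = index + 1
            break
    if body_start is None:
        raise ValueError("missing end_header")

    # Skip blank lines so the vertex and face counts line up with the body.
    body = [ln for ln in lines[body_start:] if ln.strip()]
    out = []
    for line in body[:n_vertices]:
        x, y, z = line.split()[:3]
        out.append(f"v {x} {y} {z}")
    for line in body[n_vertices:n_vertices + n_faces]:
        count, *indices = line.split()
        # PLY indices are 0-based; OBJ indices are 1-based.
        face = [str(int(i) + 1) for i in indices[:int(count)]]
        out.append("f " + " ".join(face))
    return "\n".join(out) + "\n"
```

The returned string can be written to a file with a .obj extension and passed as data_path with data_type set to '3d_mesh'.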
Base requirements for running the package can be installed using
pip install -r requirements.txt.
Then, additional packages need to be installed using the following steps.
PyECT:
PyECT should be installed from the anonymous repository linked in the paper.
From the root of that repository, run pip install . (the trailing dot is part of the command).
Eucalc: Eucalc should be installed following the instructions here.
FastTopology: FastTopology may be installed and compiled from here.
To run the experiments, run the experiments/main.py script,
specifying values for the following command line arguments:
- data_path (str, required)
  Path to the dataset file. This may be:
  - a compressed NumPy .npz file containing image data, or
  - an .obj file containing 3D mesh data.
  The file will be loaded and processed depending on the selected data_type.
- data_type (str, required)
  Specifies the format and structure of the input data. Must be one of the predefined DATA_TYPES. Examples include:
  - 'image': 2D image data stored in .npz format
  - '3d_mesh': 3D triangular mesh in .obj format
  - '3d_cubical_complex': voxel-based or grid-based 3D data
- invariant (str, required)
  The topological invariant to compute. Must be one of INVARIANT_TYPES. Examples:
  - 'wect': Weighted Euler Characteristic Transform
  - 'ecf': Euler Characteristic Field
- implementation_name (str, required)
  A descriptive name for the experiment or implementation variant being executed.
  Used for logging, result organization, and reproducibility.
- output_path (str, required)
  Destination path for the output .csv file where computed invariants or experiment results will be saved.
- num_directions (int, required)
  Number of directions for directional transforms (e.g., for WECT).
  Higher values produce more detailed signatures but increase computation time.
- num_timesteps (int, required)
  Number of filtration time steps used when computing invariants.
  Controls the resolution of the transform.
- device ("cpu" | "cuda" | "mps", optional; default: "cpu")
  Which hardware backend to use:
  - "cpu" for standard CPU execution
  - "cuda" for NVIDIA GPU acceleration
  - "mps" for Apple Silicon GPU acceleration
- batch_size (int, optional; default: 1)
  Number of images to process simultaneously.
  Useful for GPU acceleration when processing many images.
- cores (int, optional; default: 1)
  Number of CPU cores to utilize for parallelized implementations.
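The argument list above can be summarised as an argparse declaration. This is a sketch of how such a parser might look, not the parser actually used in experiments/main.py. The `DATA_TYPES` and `INVARIANT_TYPES` lists here contain only the example values named in this README (the repository's lists may be longer), and the function name `build_parser` is hypothetical.

```python
import argparse

# Only the example values named in this README; the real DATA_TYPES and
# INVARIANT_TYPES in the repository may contain more entries.
DATA_TYPES = ["image", "3d_mesh", "3d_cubical_complex"]
INVARIANT_TYPES = ["wect", "ecf"]


def build_parser():
    """Build a parser mirroring the documented arguments (illustrative)."""
    parser = argparse.ArgumentParser(description="Run one experiment configuration.")
    # Required arguments.
    parser.add_argument("--data_path", type=str, required=True,
                        help="Path to a .npz (images) or .obj (mesh) file.")
    parser.add_argument("--data_type", type=str, required=True, choices=DATA_TYPES)
    parser.add_argument("--invariant", type=str, required=True, choices=INVARIANT_TYPES)
    parser.add_argument("--implementation_name", type=str, required=True)
    parser.add_argument("--output_path", type=str, required=True,
                        help="Destination .csv file for the results.")
    parser.add_argument("--num_directions", type=int, required=True)
    parser.add_argument("--num_timesteps", type=int, required=True)
    # Optional arguments with the documented defaults.
    parser.add_argument("--device", type=str, default="cpu",
                        choices=["cpu", "cuda", "mps"])
    parser.add_argument("--batch_size", type=int, default=1)
    parser.add_argument("--cores", type=int, default=1)
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Declaring `choices` makes an unsupported data_type, invariant, or device fail at parse time rather than partway through a run.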
As an example, you may consider this bash script:
INVARIANT="wect"
IMPLEMENTATION_NAME="dect"
DEVICE="cpu"
DATASET="armadillo"
DIRECTIONS=(1 2)
TIMESTEPS=(10 100)
for NUM_DIRECTIONS in "${DIRECTIONS[@]}"; do
  for NUM_TIMESTEPS in "${TIMESTEPS[@]}"; do
    OUTPUT_PATH="results/${DATASET}/${INVARIANT}/${IMPLEMENTATION_NAME}/${NUM_DIRECTIONS}_dirs_${NUM_TIMESTEPS}_timesteps_${DEVICE}.csv"
    # ensure directory exists
    DIR_NAME=$(dirname "$OUTPUT_PATH")
    mkdir -p "$DIR_NAME"
    echo "Running: directions=$NUM_DIRECTIONS, timesteps=$NUM_TIMESTEPS"
    python experiments/main.py \
      --data_path="data/${DATASET}.obj" \
      --data_type="3d_mesh" \
      --invariant="${INVARIANT}" \
      --implementation_name="${IMPLEMENTATION_NAME}" \
      --output_path="${OUTPUT_PATH}" \
      --num_directions="${NUM_DIRECTIONS}" \
      --num_timesteps="${NUM_TIMESTEPS}" \
      --device="${DEVICE}" \
      --batch_size=1 \
      --cores=1
  done
done

Figures that plot the results can be created using the notebooks in the
figures/ directory. Note that you may need to change path names based
on the name of your results directory.
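The parameter sweep from the bash example can also be driven from Python, which may be convenient on systems without bash. The sketch below mirrors that script's values and output path layout. The function names `build_command` and `run_sweep` are hypothetical, and `dry_run=True` only builds the command lines without executing them.

```python
import itertools
import subprocess
import sys
from pathlib import Path


def build_command(dataset, invariant, implementation_name, device,
                  num_directions, num_timesteps):
    """Return (output_path, argv) for one run, mirroring the bash example."""
    output_path = (
        Path("results") / dataset / invariant / implementation_name
        / f"{num_directions}_dirs_{num_timesteps}_timesteps_{device}.csv"
    )
    argv = [
        sys.executable, "experiments/main.py",
        f"--data_path=data/{dataset}.obj",
        "--data_type=3d_mesh",
        f"--invariant={invariant}",
        f"--implementation_name={implementation_name}",
        f"--output_path={output_path.as_posix()}",
        f"--num_directions={num_directions}",
        f"--num_timesteps={num_timesteps}",
        f"--device={device}",
        "--batch_size=1",
        "--cores=1",
    ]
    return output_path, argv


def run_sweep(directions=(1, 2), timesteps=(10, 100), dry_run=True):
    """Build (and optionally run) one command per (directions, timesteps) pair."""
    commands = []
    for num_directions, num_timesteps in itertools.product(directions, timesteps):
        output_path, argv = build_command(
            "armadillo", "wect", "dect", "cpu", num_directions, num_timesteps
        )
        commands.append(argv)
        if not dry_run:
            # Ensure the results directory exists before launching the run.
            output_path.parent.mkdir(parents=True, exist_ok=True)
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    for argv in run_sweep(dry_run=True):
        print(" ".join(argv))
```

Calling `run_sweep(dry_run=False)` from the repository root would launch the four runs in the same order as the nested bash loops.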