An object-oriented Python package for species distribution modelling (SDM) using deep neural networks (DNNs). The package provides an intuitive, high-level interface for exploring biodiversity patterns by modelling species' environmental preferences across a large number of abiotic and biotic variables.
Traditional SDM approaches such as MaxEnt have well-known limitations in handling correlated input features and incorporating species interactions. sdmdl addresses these limitations by using deep learning, which can learn complex, non-linear relationships between environmental predictors and species occurrences. The package trains binary classification DNNs with dropout regularisation, evaluates model performance using AUC and bootstrapped confidence intervals, and estimates variable importance using SHAP values.
- Data preparation — Automatically creates presence maps, raster stacks, pseudo-absence samples, band statistics, and training/prediction datasets from occurrence tables and environmental raster layers.
- Model training — Trains multiple DNN replicates per species and retains the best-performing model (highest AUC). Computes feature importance via SHAP.
- Prediction — Generates global species distribution maps as GeoTIFF rasters and PNG visualisations.
- Configuration — All key hyper-parameters (batch size, epochs, layer sizes, dropout, random
seed, pseudo-absence sample size) are controlled through a single
config.ymlfile.
GDAL must be installed separately as a system dependency, including the GDAL Python bindings. On Ubuntu/Debian:
sudo add-apt-repository ppa:ubuntugis/ubuntugis-unstable
sudo apt-get update
sudo apt-get install gdal-bin libgdal-dev
pip install GDAL==$(gdal-config --version)Clone the repository and install in your Python environment:
git clone https://github.com/naturalis/sdmdl.git
cd sdmdl
pip install .The Python dependencies listed in requirements.txt will be installed
automatically.
from sdmdl.sdmdl_main import sdmdl
# Step 1: Create an sdmdl object pointing to the repository root
model = sdmdl('/path/to/sdmdl')
# Step 2: Prepare data (presence maps, raster stack, pseudo-absences, training & prediction data)
model.prep()
# Step 3: Train deep neural network models for each species
model.train()
# Step 4: Predict global species distributions
model.predict()
# Step 5: Remove temporary intermediate files
model.clean()-
Environmental raster layers (
.tif) placed in:data/gis/layers/scaled/— layers that require standardisation during preprocessing.data/gis/layers/non-scaled/— layers that are already normalised or categorical.
All raster layers (including the bundled
empty_land_map.tif) must share the same affine transformation and resolution. -
Occurrence tables (
.csvor.xls/.xlsx) placed indata/occurrences/. Each table must contain columns nameddecimalLatitude(ordecimallatitude) anddecimalLongitude(ordecimallongitude) with WGS 84 coordinates.
A config.yml file is generated automatically on first use in the data/ directory. It controls:
| Parameter | Type | Description |
|---|---|---|
random_seed |
int | Seed for reproducibility |
pseudo_freq |
int | Number of pseudo-absence samples |
batchsize |
int | Training batch size |
epoch |
int | Number of training epochs |
model_layers |
list | Nodes per hidden layer (length sets network depth) |
model_dropout |
list | Dropout rate per hidden layer |
verbose |
bool | If True, display progress bars |
Changes to config.yml take effect the next time an sdmdl object is created.
- Performance metrics —
results/_DNN_performance/DNN_eval.txtwith per-species accuracy, loss, AUC, TPR, and confidence intervals. - Trained models — Per-species
.h5(weights) and.json(architecture) files underresults/<species_name>/. - Feature importance — SHAP-based feature impact plots (
.png) per species. - Prediction maps — GeoTIFF rasters and colour-mapped PNG visualisations of predicted distributions per species.
Full documentation is available on Read the Docs and in
docs/index.rst.
This package implements the deep-learning SDM approach described in the following preprint:
Rademaker, M., Hogeweg, L., & Vos, R. (2019). Modelling the niches of wild and domesticated Ungulate species using deep learning. bioRxiv, 744441. https://doi.org/10.1101/744441
A copy of the preprint is included in this repository at docs/744441.full.pdf.
- Comparative analysis of abiotic niches in Ungulates by E. Hendrix — the MaxEnt-based comparative analysis that preceded this work.
- Ecological Niche Modelling Using Deep Learning by M. Rademaker — the original proof-of-concept DL-SDM implementation.
The raw results of a case study on domesticated crops and their wild progenitors are available on Zenodo:
Contributions are welcome! Please read CONTRIBUTING.md for guidelines on submitting bug reports, feature requests, and pull requests.
This project is licensed under the MIT License.
Copyright © 2019 Naturalis Biodiversity Center.