Research papers on temporal ML models for wildfire smoke detection and related topics.
Prerequisites:

- uv for Python dependency management
- AWS credentials configured for access to the S3 bucket (`s3://pyro-survey-research/dvc/`)
```bash
git clone <repo-url>
cd papers
make install   # installs DVC and dependencies from uv.lock
make pull      # downloads PDFs and notes from S3
```

To add a new paper:

- Drop the PDF into `pdfs/` using the naming convention `Year-Short-Title-Author.pdf`.
- Add a row to `papers.csv`.
- Write reading notes in `notes/` as `year-short-title.md`.
- Update `SUMMARY.md` with a description.
- Track and push the changes:

```bash
uv run dvc add pdfs/ notes/
make push
git add pdfs.dvc notes.dvc papers.csv SUMMARY.md README.md
git commit -m "Add paper: <title>"
```

Makefile targets:

| Command | Description |
|---|---|
| `make install` | Install dependencies from `uv.lock` |
| `make pull` | Pull PDF data and notes from S3 via DVC |
| `make push` | Push PDF data and notes to S3 via DVC |
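Because the PDF name and the notes name follow related conventions, a small check can catch papers that were added without reading notes. The following is a minimal sketch, not a script shipped with this repository: `PDF_NAME`, `note_name` and `missing_notes` are hypothetical names, and it assumes the author part of `Year-Short-Title-Author.pdf` is a single hyphen-free token and that the notes name is the year and title lowercased with the author dropped.

```python
from __future__ import annotations

import re
from pathlib import Path

# Hypothetical regex for the `Year-Short-Title-Author.pdf` convention.
# Assumes: a 4-digit year, one or more hyphen-separated title tokens,
# and a single trailing author token (letters only).
PDF_NAME = re.compile(
    r"^(?P<year>\d{4})-(?P<title>[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*)-(?P<author>[A-Za-z]+)\.pdf$"
)


def note_name(pdf_name: str) -> str:
    """Map `Year-Short-Title-Author.pdf` to `year-short-title.md`."""
    match = PDF_NAME.match(pdf_name)
    if match is None:
        raise ValueError(f"{pdf_name!r} does not follow Year-Short-Title-Author.pdf")
    # Drop the author token and lowercase the title (assumed notes/ convention).
    return f"{match['year']}-{match['title'].lower()}.md"


def missing_notes(pdf_dir: Path, notes_dir: Path) -> list[str]:
    """Return the PDFs in pdf_dir that have no matching file in notes_dir."""
    missing = []
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        if not (notes_dir / note_name(pdf.name)).exists():
            missing.append(pdf.name)
    return missing


if __name__ == "__main__":
    # Run from the repository root after `make pull`.
    for name in missing_notes(Path("pdfs"), Path("notes")):
        print(f"missing notes for {name}")
```

A PDF whose name does not match the convention raises `ValueError` rather than being skipped, so misnamed files surface early instead of silently passing the check.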
```
vision-rd/
├── README.md          # This file
├── SUMMARY.md         # Narrative summary grouped by theme
├── papers.csv         # Structured metadata for all papers
├── pdfs/              # PDF files (DVC-tracked)
├── notes/             # Per-paper reading notes (DVC-tracked)
├── Makefile           # install / pull / push
├── pyproject.toml     # Python dependencies (dvc[s3])
└── uv.lock            # Lockfile
```
| Year | Paper | Category | Architecture / Focus | Notes |
|---|---|---|---|---|
| 2020 | Lightweight Student LSTM (Jeong et al.) | Temporal | YOLOv3 + LSTM, teacher-student distillation | notes |
| 2020 | ELASTIC-YOLOv3 + Fire-Tube (Park & Ko) | Temporal | YOLOv3 + fire-tube + BoF + random forest | notes |
| 2021 | TimeSformer (Bertasius et al.) | Video Foundation | Divided space-time attention | notes |
| 2021 | ViViT (Arnab et al.) | Video Foundation | Video Vision Transformer, 4 factorizations | notes |
| 2021 | LSTR (Xu et al.) | Online Detection | Long Short-Term Transformer (long- and short-term memory) | notes |
| 2022 | Nemo / DETR (Yazdi et al.) | Spatial | DETR for wildfire smoke, open-source benchmark | notes |
| 2022 | SlowFastMTB (Choi et al.) | Temporal | SlowFast + MTB bounding box algorithm | notes |
| 2022 | SmokeyNet (Dewangan et al.) | Temporal | CNN (ResNet34) + LSTM + ViT on tiled frames | notes |
| 2022 | TeSTra (Zhao & Krähenbühl) | Online Detection | Temporal smoothing kernels, O(1) per frame | notes |
| 2022 | VideoMAE (Tong et al.) | Video Foundation | Masked video autoencoder, data-efficient | notes |
| 2023 | VideoMAE V2 (Wang et al.) | Video Foundation | Dual masking, billion-scale, progressive training | notes |
| 2024 | Beyond Few-Shot OD Survey (Li et al.) | Few-Shot | 5 categories of few-shot detection | notes |
| 2024 | FLAME (Gragnaniello et al.) | Temporal | DNN + GMM background subtraction + tracking FSM | notes |
| 2024 | MATR (Song et al.) | Online Detection | Memory-augmented Transformer for streaming | notes |
| 2024 | PyroNear2025 Dataset (Lostanlen et al.) | Dataset | 150k annotations, 50k images, 640 wildfires | notes |
| 2024 | Smoke-DETR (Sun & Cheng) | Spatial | RT-DETR + ECPConv + EMA + MFFPN | notes |
| 2024 | SmokeBench (Qi et al.) | Benchmark | Multimodal LLM evaluation on FIgLib | notes |
| 2024 | Ultra-lightweight (Chaturvedi et al.) | Spatial | Conv-Transformer, 0.6M params, edge deployment | notes |
| 2024 | Video Anomaly Survey (Liu et al.) | Survey | 10-year survey, reconstruction + MIL methods | notes |
| 2024 | YOLOv10 (Wang et al.) | General | NMS-free YOLO, edge-friendly | notes |
| 2025 | CCPE Swin (Wang et al.) | Spatial | Swin + Cross Contrast Patch Embedding | notes |
| 2025 | Comprehensive DL Review (Elhanashi et al.) | Survey | CNNs, RNNs, YOLO, transformers, spatiotemporal | notes |
| 2025 | Datasets 20-Year Review (Haeri Boroujeni et al.) | Survey | 29 fire/smoke datasets across modalities | notes |
| 2025 | Few-Shot Remote Sensing (Zhang et al.) | Few-Shot | Domain adaptation with limited labels | notes |
| 2025 | RT-DETR-Smoke (Wang et al.) | Spatial | RT-DETR + CoordAtt + WShapeIoU, 445 FPS | notes |
| 2025 | Small Object Detection Survey | Survey | Multi-scale, super-resolution, attention | notes |
| 2025 | ViT on the Edge Survey | Survey | Pruning, quantization, knowledge distillation | notes |
| 2026 | ViT + 3D-CNN (Lilhore et al.) | Temporal | ViT + 3D-CNN + Transformer encoder | notes |
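The same metadata lives in `papers.csv`, so the table can be sliced programmatically, for example to see how the collection is spread across categories. This is a minimal sketch under an assumed schema: it supposes `papers.csv` has a header row with at least `Year`, `Paper` and `Category` columns mirroring the table above, which may not match the real file; `count_by_category` and `papers_in` are hypothetical helpers.

```python
from __future__ import annotations

import csv
import io
from collections import Counter


def _rows(csv_text: str) -> list[dict[str, str]]:
    # Assumed schema: a header row with at least `Year`, `Paper`, `Category`.
    return list(csv.DictReader(io.StringIO(csv_text)))


def count_by_category(csv_text: str) -> Counter[str]:
    """Count papers per category."""
    return Counter(row["Category"].strip() for row in _rows(csv_text))


def papers_in(csv_text: str, category: str) -> list[tuple[int, str]]:
    """List (year, paper) pairs for one category, oldest first."""
    return sorted(
        (int(row["Year"]), row["Paper"].strip())
        for row in _rows(csv_text)
        if row["Category"].strip() == category
    )
```

With `pathlib.Path`, a typical call would be `count_by_category(Path("papers.csv").read_text(encoding="utf-8"))`. Taking the CSV as text rather than a path keeps the helpers easy to test without touching the DVC-managed files.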
- `time-wildfire`: temporal smoke detection with EfficientNet, 3D ResNet, VideoMAE, ViViT, and CNN+Transformer backbones, plus SAM3 tracking