Commit 927143a

Merge pull request #208 from RangiLyu/refactor

[Refactor] Switch to new training code based on pytorch lightning

2 parents fb82fbb + 2ca0f48

15 files changed: +437 −226 lines

README.md (+33 −27)
````diff
@@ -20,8 +20,7 @@
 * [2021.02.03] Support [EfficientNet-Lite](https://github.com/RangiLyu/EfficientNet-Lite) and [Rep-VGG](https://github.com/DingXiaoH/RepVGG) backbone. Please check the [config folder](config/). Download models in [Model Zoo](#model-zoo)
 
 * [2021.01.10] **NanoDet-g** with lower memory access cost, which designed for edge NPU or GPU, is now available!
-Check [config/nanodet-g.yml](config/nanodet-g.yml) and download:
-[COCO pre-trained model(Google Drive)](https://drive.google.com/file/d/10uW7oqZKw231l_tr4C1bJWkbCXgBf7av/view?usp=sharing) | [(BaiduDisk百度网盘)](https://pan.baidu.com/s/1IJLdtLBvmQVOmzzNY_Ci5A) code:otcd
+Check [config/nanodet-g.yml](config/nanodet-g.yml) and download in [Model Zoo](#model-zoo).
 
 <details>
 <summary>More...</summary>
@@ -93,9 +92,8 @@ Inference using [Alibaba's MNN framework](https://github.com/alibaba/MNN) is in
 ### Pytorch demo
 
 First, install requirements and setup NanoDet following installation guide. Then download COCO pretrain weight from here
-👉[COCO pretrain weight for torch>=1.6(Google Drive)](https://drive.google.com/file/d/1EhMqGozKfqEfw8y9ftbi1jhYu86XoW62/view?usp=sharing) | [(百度网盘)](https://pan.baidu.com/s/1LCnmj2Pqhv0tsDX__1j2gg) code:6au1
 
-👉[COCO pretrain weight for torch<=1.5(Google Drive)](https://drive.google.com/file/d/10h-0qLMCgYvWQvKULqbkLvmirFR-w9NN/view?usp=sharing) | [(百度云盘)](https://pan.baidu.com/s/1OTcPiajCcqKLg3Q0vwho3A) code:topw
+👉[COCO pretrain weight (Google Drive)](https://drive.google.com/file/d/1ZkYucuLusJrCb_i63Lid0kYyyLvEiGN3/view?usp=sharing)
 
 * Inference images
 
@@ -141,13 +139,13 @@ Besides, We provide a notebook [here](./demo/demo-inference-with-pytorch.ipynb)
 2. Install pytorch
 
 ```shell script
-conda install pytorch torchvision cudatoolkit=11.0 -c pytorch
+conda install pytorch torchvision cudatoolkit=11.1 -c pytorch
 ```
 
 3. Install requirements
 
 ```shell script
-pip install Cython termcolor numpy tensorboard pycocotools matplotlib pyaml opencv-python tqdm
+pip install Cython termcolor numpy tensorboard pycocotools matplotlib pyaml opencv-python tqdm pytorch-lightning torchmetrics
 ```
 
 4. Setup NanoDet
@@ -166,14 +164,14 @@ NanoDet supports variety of backbones. Go to the [***config*** folder](config/)
 
 Model | Backbone |Resolution|COCO mAP| FLOPS |Params | Pre-train weight |
 :--------------------:|:------------------:|:--------:|:------:|:-----:|:-----:|:-----:|
-NanoDet-m | ShuffleNetV2 1.0x | 320*320 | 20.6 | 0.72B | 0.95M | [Download](https://drive.google.com/file/d/10h-0qLMCgYvWQvKULqbkLvmirFR-w9NN/view?usp=sharing) |
-NanoDet-m-416 | ShuffleNetV2 1.0x | 416*416 | 23.5 | 1.2B | 0.95M | [Download](https://drive.google.com/file/d/1h6TBy1tx4faIBKHnYeg0QwzFF6wlFBEd/view?usp=sharing)|
-NanoDet-t (***NEW***) | ShuffleNetV2 1.0x | 320*320 | 21.7 | 0.96B | 1.36M | [Download](https://drive.google.com/file/d/1O2iz-aaDiQHJNfocInpFrY8ZFMrT3M1r/view?usp=sharing) |
-NanoDet-g | Custom CSP Net | 416*416 | 22.9 | 4.2B | 3.81M | [Download](https://drive.google.com/file/d/10uW7oqZKw231l_tr4C1bJWkbCXgBf7av/view?usp=sharing)|
-NanoDet-EfficientLite | EfficientNet-Lite0 | 320*320 | 24.7 | 1.72B | 3.11M | [Download](https://drive.google.com/file/d/1u_t9L0jqjH858gCR-vpzWzu9FexQOSmJ/view?usp=sharing)|
-NanoDet-EfficientLite | EfficientNet-Lite1 | 416*416 | 30.3 | 4.06B | 4.01M | [Download](https://drive.google.com/file/d/1y9z7BToAZOQ1pKbOjNjf79YMuFuDTvfq/view?usp=sharing) |
-NanoDet-EfficientLite | EfficientNet-Lite2 | 512*512 | 32.6 | 7.12B | 4.71M | [Download](https://drive.google.com/file/d/1UMXJJxRkRzgTvN1iRKeDZqGpkLxK3X4K/view?usp=sharing) |
-NanoDet-RepVGG | RepVGG-A0 | 416*416 | 27.8 | 11.3B | 6.75M | [Download](https://drive.google.com/file/d/1bsT9Ksxws2O3g_IUuUwp0QwZcJlqJw3S/view?usp=sharing) |
+NanoDet-m | ShuffleNetV2 1.0x | 320*320 | 20.6 | 0.72B | 0.95M | [Download](https://drive.google.com/file/d/1ZkYucuLusJrCb_i63Lid0kYyyLvEiGN3/view?usp=sharing) |
+NanoDet-m-416 | ShuffleNetV2 1.0x | 416*416 | 23.5 | 1.2B | 0.95M | [Download](https://drive.google.com/file/d/1jY-Um2VDDEhuVhluP9lE70rG83eXQYhV/view?usp=sharing)|
+NanoDet-t (***NEW***) | ShuffleNetV2 1.0x | 320*320 | 21.7 | 0.96B | 1.36M | [Download](https://drive.google.com/file/d/1TqRGZeOKVCb98ehTaE0gJEuND6jxwaqN/view?usp=sharing) |
+NanoDet-g | Custom CSP Net | 416*416 | 22.9 | 4.2B | 3.81M | [Download](https://drive.google.com/file/d/1f2lH7Ae1AY04g20zTZY7JS_dKKP37hvE/view?usp=sharing)|
+NanoDet-EfficientLite | EfficientNet-Lite0 | 320*320 | 24.7 | 1.72B | 3.11M | [Download](https://drive.google.com/file/d/1Dj1nBFc78GHDI9Wn8b3X4MTiIV2el8qP/view?usp=sharing)|
+NanoDet-EfficientLite | EfficientNet-Lite1 | 416*416 | 30.3 | 4.06B | 4.01M | [Download](https://drive.google.com/file/d/1ernkb_XhnKMPdCBBtUEdwxIIBF6UVnXq/view?usp=sharing) |
+NanoDet-EfficientLite | EfficientNet-Lite2 | 512*512 | 32.6 | 7.12B | 4.71M | [Download](https://drive.google.com/file/d/11V20AxXe6bTHyw3aMkgsZVzLOB31seoc/view?usp=sharing) |
+NanoDet-RepVGG | RepVGG-A0 | 416*416 | 27.8 | 11.3B | 6.75M | [Download](https://drive.google.com/file/d/1nWZZ1qXb1HuIXwPSYpEyFHHqX05GaFer/view?usp=sharing) |
 
 
 ****
@@ -194,35 +192,43 @@ NanoDet-RepVGG | RepVGG-A0 | 416*416 | 27.8 | 11.3B | 6.75M |
 
 Change ***num_classes*** in ***model->arch->head***.
 
-Change image path and annotation path in both ***data->train data->val***.
+Change image path and annotation path in both ***data->train*** and ***data->val***.
 
-Set gpu, workers and batch size in ***device*** to fit your device.
+Set gpu ids, num workers and batch size in ***device*** to fit your device.
 
 Set ***total_epochs***, ***lr*** and ***lr_schedule*** according to your dataset and batchsize.
 
 If you want to modify network, data augmentation or other things, please refer to [Config File Detail](docs/config_file_detail.md)
 
 3. **Start training**
 
-For single GPU, run
+NanoDet is now using [pytorch lightning](https://github.com/PyTorchLightning/pytorch-lightning) for training.
+
+For both single-GPU or multiple-GPUs, run:
+
+```shell script
+python tools/train.py CONFIG_FILE_PATH
+```
+
+Old training script is deprecated and will be deleted in next version. If you still want to use,
+
+<details>
+<summary>follow this...</summary>
+
+For single GPU, run
 
 ```shell script
-python tools/train.py CONFIG_PATH
+python tools/deprecated/train.py CONFIG_FILE_PATH
 ```
 
 For multi-GPU, NanoDet using distributed training. (Notice: Windows not support distributed training before pytorch1.7) Please run
 
 ```shell script
-python -m torch.distributed.launch --nproc_per_node=GPU_NUM --master_port 29501 tools/train.py CONFIG_PATH
+python -m torch.distributed.launch --nproc_per_node=GPU_NUM --master_port 29501 tools/deprecated/train.py CONFIG_FILE_PATH
 ```
 
-**Experimental**:
-
-Training with [pytorch lightning](https://github.com/PyTorchLightning/pytorch-lightning), no matter single or multi GPU just run:
-
-```shell script
-python tools/train_pl.py CONFIG_PATH
-```
+</details>
+
 
 4. **Visualize Logs**
 
@@ -232,7 +238,7 @@ NanoDet-RepVGG | RepVGG-A0 | 416*416 | 27.8 | 11.3B | 6.75M |
 
 ```shell script
 cd <YOUR_SAVE_DIR>
-tensorboard --logdir ./logs
+tensorboard --logdir ./
 ```
 
 ****
````
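
For reference, the single `tools/train.py` entry point above maps onto the lightning `TrainingTask` introduced in `nanodet/trainer/task.py` below. A minimal sketch of the wiring, assuming the dataloaders and evaluator are built from the config as before (those helpers and the exact `cfg` keys are assumptions, not part of this diff):

```python
# Hedged sketch of the lightning-based training flow; only
# TrainingTask(cfg, evaluator) and the nanodet.util imports are taken from
# this commit. Dataloaders and evaluator are assumed prepared from cfg.
import pytorch_lightning as pl

from nanodet.trainer.task import TrainingTask
from nanodet.util import cfg, load_config

load_config(cfg, 'config/nanodet-m.yml')        # placeholder config path

task = TrainingTask(cfg, evaluator)             # note: no logger argument anymore

trainer = pl.Trainer(max_epochs=cfg.schedule.total_epochs,  # assumed cfg key
                     gpus=cfg.device.gpu_ids)               # assumed cfg key
trainer.fit(task, train_dataloader, val_dataloader)         # assumed in scope
```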

docs/config_file_detail.md (+1 −1)
````diff
@@ -15,7 +15,7 @@ Change save_dir to where you want to save logs and models. If path not exist, Na
 ```yaml
 model:
   arch:
-    name: xxx
+    name: OneStageDetector
     backbone: xxx
     fpn: xxx
     head: xxx
````

nanodet/evaluator/coco_detection.py (+1 −2)
````diff
@@ -48,7 +48,7 @@ def results2json(self, results):
             json_results.append(detection)
         return json_results
 
-    def evaluate(self, results, save_dir, epoch, logger, rank=-1):
+    def evaluate(self, results, save_dir, rank=-1):
         results_json = self.results2json(results)
         json_path = os.path.join(save_dir, 'results{}.json'.format(rank))
         json.dump(results_json, open(json_path, 'w'))
@@ -61,5 +61,4 @@ def evaluate(self, results, save_dir, epoch, logger, rank=-1):
         eval_results = {}
         for k, v in zip(self.metric_names, aps):
             eval_results[k] = v
-            logger.scalar_summary('Val_coco_bbox/' + k, 'val', v, epoch)
         return eval_results
````
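
With `epoch` and `logger` dropped from `evaluate`, writing the per-metric scalars becomes the caller's responsibility. A sketch of the new call pattern, mirroring the `nanodet/trainer/trainer.py` change below (`evaluator`, `results`, `save_dir`, `logger` and `epoch` assumed in scope):

```python
# New call pattern after this commit: evaluate() only computes and returns
# the metric dict; the caller logs it, as trainer.py now does.
eval_results = evaluator.evaluate(results, save_dir, rank=-1)
for k, v in eval_results.items():
    logger.scalar_summary('Val_metrics/' + k, 'val', v, epoch)
```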

nanodet/trainer/task.py (+67 −14)
````diff
@@ -15,6 +15,7 @@
 import copy
 import os
 import warnings
+import json
 import torch
 import logging
 from pytorch_lightning import LightningModule
@@ -27,25 +28,20 @@
 class TrainingTask(LightningModule):
     """
     Pytorch Lightning module of a general training task.
+    Including training, evaluating and testing.
+    Args:
+        cfg: Training configurations
+        evaluator: Evaluator for evaluating the model performance.
     """
 
-    def __init__(self, cfg, evaluator=None, logger=None):
-        """
-
-        Args:
-            cfg: Training configurations
-            evaluator:
-            logger:
-        """
+    def __init__(self, cfg, evaluator=None):
         super(TrainingTask, self).__init__()
         self.cfg = cfg
         self.model = build_model(cfg.model)
         self.evaluator = evaluator
-        self._logger = logger
         self.save_flag = -10
         self.log_style = 'NanoDet'  # Log style. Choose between 'NanoDet' or 'Lightning'
         # TODO: use callback to log
-        # TODO: remove _logger
         # TODO: batch eval
         # TODO: support old checkpoint
 
@@ -54,7 +50,7 @@ def forward(self, x):
         return x
 
     @torch.no_grad()
-    def predict(self, batch, batch_idx, dataloader_idx):
+    def predict(self, batch, batch_idx=None, dataloader_idx=None):
         preds = self.forward(batch['img'])
         results = self.model.head.post_process(preds, batch)
         return results
@@ -103,11 +99,17 @@ def validation_step(self, batch, batch_idx):
         return res
 
     def validation_epoch_end(self, validation_step_outputs):
+        """
+        Called at the end of the validation epoch with the outputs of all validation steps.
+        Evaluating results and save best model.
+        Args:
+            validation_step_outputs: A list of val outputs
+
+        """
         results = {}
         for res in validation_step_outputs:
             results.update(res)
-        eval_results = self.evaluator.evaluate(results, self.cfg.save_dir, self.current_epoch+1,
-                                               self._logger, rank=self.local_rank)
+        eval_results = self.evaluator.evaluate(results, self.cfg.save_dir, rank=self.local_rank)
         metric = eval_results[self.cfg.evaluator.save_key]
         # save best model
         if metric > self.save_flag:
@@ -125,9 +127,39 @@ def validation_epoch_end(self, validation_step_outputs):
             warnings.warn('Warning! Save_key is not in eval results! Only save model last!')
         if self.log_style == 'Lightning':
             for k, v in eval_results.items():
-                self.log('Val/' + k, v, on_step=False, on_epoch=True, prog_bar=False, sync_dist=True)
+                self.log('Val_metrics/' + k, v, on_step=False, on_epoch=True, prog_bar=False, sync_dist=True)
+        elif self.log_style == 'NanoDet':
+            for k, v in eval_results.items():
+                self.scalar_summary('Val_metrics/' + k, 'Val', v, self.current_epoch+1)
+
+    def test_step(self, batch, batch_idx):
+        dets = self.predict(batch, batch_idx)
+        res = {batch['img_info']['id'].cpu().numpy()[0]: dets}
+        return res
+
+    def test_epoch_end(self, test_step_outputs):
+        results = {}
+        for res in test_step_outputs:
+            results.update(res)
+        res_json = self.evaluator.results2json(results)
+        json_path = os.path.join(self.cfg.save_dir, 'results.json')
+        json.dump(res_json, open(json_path, 'w'))
+
+        if self.cfg.test_mode == 'val':
+            eval_results = self.evaluator.evaluate(results, self.cfg.save_dir, rank=self.local_rank)
+            txt_path = os.path.join(self.cfg.save_dir, "eval_results.txt")
+            with open(txt_path, "a") as f:
+                for k, v in eval_results.items():
+                    f.write("{}: {}\n".format(k, v))
 
     def configure_optimizers(self):
+        """
+        Prepare optimizer and learning-rate scheduler
+        to use in optimization.
+
+        Returns:
+            optimizer
+        """
         optimizer_cfg = copy.deepcopy(self.cfg.schedule.optimizer)
         name = optimizer_cfg.pop('name')
         build_optimizer = getattr(torch.optim, name)
@@ -153,6 +185,18 @@ def optimizer_step(self,
                        on_tpu=None,
                        using_native_amp=None,
                        using_lbfgs=None):
+        """
+        Performs a single optimization step (parameter update).
+        Args:
+            epoch: Current epoch
+            batch_idx: Index of current batch
+            optimizer: A PyTorch optimizer
+            optimizer_idx: If you used multiple optimizers this indexes into that list.
+            optimizer_closure: closure for all optimizers
+            on_tpu: true if TPU backward is required
+            using_native_amp: True if using native amp
+            using_lbfgs: True if the matching optimizer is lbfgs
+        """
         # warm up lr
         if self.trainer.global_step <= self.cfg.schedule.warmup.steps:
             if self.cfg.schedule.warmup.name == 'constant':
@@ -180,6 +224,15 @@ def get_progress_bar_dict(self):
         return items
 
     def scalar_summary(self, tag, phase, value, step):
+        """
+        Write Tensorboard scalar summary log.
+        Args:
+            tag: Name for the tag
+            phase: 'Train' or 'Val'
+            value: Value to record
+            step: Step value to record
+
+        """
         if self.local_rank < 1:
             self.logger.experiment.add_scalars(tag, {phase: value}, step)
 
````
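The new `test_step`/`test_epoch_end` hooks also make the task usable with lightning's test loop. A hedged sketch, assuming `task` and a test dataloader are built the same way as for training:

```python
# Sketch of the new test path introduced above; `task` and `test_dataloader`
# are assumed built as for training. Detections are written to
# {cfg.save_dir}/results.json; with cfg.test_mode == 'val', metrics are also
# appended to {cfg.save_dir}/eval_results.txt.
import pytorch_lightning as pl

trainer = pl.Trainer(gpus=1)
trainer.test(task, test_dataloader)
```
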
nanodet/trainer/trainer.py (+3 −1)
````diff
@@ -145,7 +145,9 @@ def run(self, train_loader, val_loader, evaluator):
                 results, val_loss_dict = self.run_epoch(self.epoch, val_loader, mode='val')
                 for k, v in val_loss_dict.items():
                     self.logger.scalar_summary('Epoch_loss/' + k, 'val', v, epoch)
-                eval_results = evaluator.evaluate(results, self.cfg.save_dir, epoch, self.logger, rank=self.rank)
+                eval_results = evaluator.evaluate(results, self.cfg.save_dir, rank=self.rank)
+                for k, v in eval_results.items():
+                    self.logger.scalar_summary('Val_metrics/' + k, 'val', v, epoch)
                 if self.cfg.evaluator.save_key in eval_results:
                     metric = eval_results[self.cfg.evaluator.save_key]
                     if metric > save_flag:
````

nanodet/util/__init__.py (+1 −1)
````diff
@@ -3,7 +3,7 @@
 from .logger import Logger, MovingAverage, AverageMeter
 from .data_parallel import DataParallel
 from .distributed_data_parallel import DDP
-from .check_point import load_model_weight, save_model
+from .check_point import load_model_weight, save_model, convert_old_model
 from .config import cfg, load_config
 from .box_transform import *
 from .util_mixins import NiceRepr
````

nanodet/util/check_point.py (+29 −0)
````diff
@@ -1,11 +1,16 @@
 import torch
+import pytorch_lightning as pl
+from collections import OrderedDict
 from .rank_filter import rank_filter
 
+
 def load_model_weight(model, checkpoint, logger):
     state_dict = checkpoint['state_dict']
     # strip prefix of state_dict
     if list(state_dict.keys())[0].startswith('module.'):
         state_dict = {k[7:]: v for k, v in checkpoint['state_dict'].items()}
+    if list(state_dict.keys())[0].startswith('model.'):
+        state_dict = {k[6:]: v for k, v in checkpoint['state_dict'].items()}
 
     model_state_dict = model.module.state_dict() if hasattr(model, 'module') else model.state_dict()
 
@@ -35,3 +40,27 @@ def save_model(model, path, epoch, iter, optimizer=None):
         data['optimizer'] = optimizer.state_dict()
 
     torch.save(data, path)
+
+
+def convert_old_model(old_model_dict):
+    if 'pytorch-lightning_version' in old_model_dict:
+        raise ValueError('This model is not old format. No need to convert!')
+    version = pl.__version__
+    epoch = old_model_dict['epoch']
+    global_step = old_model_dict['iter']
+    state_dict = old_model_dict['state_dict']
+    new_state_dict = OrderedDict()
+    for name, value in state_dict.items():
+        new_state_dict['model.' + name] = value
+
+    new_checkpoint = {'epoch': epoch,
+                      'global_step': global_step,
+                      'pytorch-lightning_version': version,
+                      'state_dict': new_state_dict,
+                      'lr_schedulers': []}
+
+    if 'optimizer' in old_model_dict:
+        optimizer_states = [old_model_dict['optimizer']]
+        new_checkpoint['optimizer_states'] = optimizer_states
+
+    return new_checkpoint
````
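
The new `convert_old_model` helper rewrites checkpoints saved by the deprecated trainer (`epoch`/`iter`/`state_dict` keys) into lightning's format, prefixing weights with `model.` so they match the `TrainingTask` wrapper; `load_model_weight` strips that prefix again on load. A small self-contained example (the output path is a placeholder):

```python
import torch
from collections import OrderedDict

from nanodet.util import convert_old_model  # re-exported in this commit

# Synthetic "old format" checkpoint with the keys convert_old_model reads.
old_ckpt = {'epoch': 10,
            'iter': 5000,
            'state_dict': OrderedDict({'backbone.conv.weight': torch.zeros(1)})}

new_ckpt = convert_old_model(old_ckpt)
assert 'pytorch-lightning_version' in new_ckpt
assert list(new_ckpt['state_dict'])[0] == 'model.backbone.conv.weight'

torch.save(new_ckpt, 'model_lightning.ckpt')  # placeholder output path
```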

setup.py (+1 −1)
````diff
@@ -1,7 +1,7 @@
 #!/usr/bin/env python
 from setuptools import find_packages, setup
 
-__version__ = "0.2.1"
+__version__ = "0.3.0"
 
 if __name__ == '__main__':
     setup(
````
