Commit 75f26c8

Czm369, triple-Mu, q3394101, and ZwwWayne authored
[Feature] Add COCO, LVIS and VOC dataset download script based on PR open-mmlab#6715 (open-mmlab#7015)
* Add coco dataset download script

  On Windows, use "python tools/download.py --win --unzip" to download the COCO dataset; on Linux, use "python tools/download.py --unzip".
* Add coco dataset download script
* Add coco dataset download script
* Add coco dataset download script
* add some notes and modify dataset urls
* add some notes and modify dataset urls
* remove some useless lines and modify urls list to dict
* add urls of lvis and voc, and delete --win
* add parse_args()
* Add documentation of this tool in docs/en/1_exist_data_model.md, docs/zh_cn/1_exist_data_model.md and docs/en/useful_tools.md.
* add a link
* Download files regardless of system.
* Use get() of dict
* add empty line above the code block
* Update useful_tools.md

Co-authored-by: q3394101 <[email protected]>
Co-authored-by: q3394101 <[email protected]>
Co-authored-by: Wenwei Zhang <[email protected]>
1 parent 794a87c commit 75f26c8

4 files changed

+117
-0
lines changed

docs/en/1_exist_data_model.md

+4
@@ -174,6 +174,10 @@ Public datasets like [Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC/index.h
 It is recommended to download and extract the dataset somewhere outside the project directory and symlink the dataset root to `$MMDETECTION/data` as below.
 If your folder structure is different, you may need to change the corresponding paths in config files.
 
+We provide a script to download datasets such as COCO. You can run `python tools/misc/download_dataset.py --dataset-name coco2017` to download the COCO dataset.
+
+For more usage, please refer to [dataset-download](https://github.com/open-mmlab/mmdetection/tree/master/docs/en/useful_tools.md#dataset-download).
+
 ```text
 mmdetection
 ├── mmdet

docs/en/useful_tools.md

+10
@@ -377,6 +377,16 @@ python tools/dataset_converters/cityscapes.py ${CITYSCAPES_PATH} [-h] [--img-dir
 python tools/dataset_converters/pascal_voc.py ${DEVKIT_PATH} [-h] [-o ${OUT_DIR}]
 ```
 
+## Dataset Download
+
+`tools/misc/download_dataset.py` supports downloading datasets such as COCO, VOC, and LVIS.
+
+```shell
+python tools/misc/download_dataset.py --dataset-name coco2017
+python tools/misc/download_dataset.py --dataset-name voc2007
+python tools/misc/download_dataset.py --dataset-name lvis
+```
+
 ## Benchmark
 
 ### Robust Detection Benchmark
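
The documented commands pass only `--dataset-name`, while the script added in this commit also defines `--save-dir`, `--unzip`, `--delete` and `--threads`. The following is a minimal sketch of how those options parse, not the script itself: `build_parser` is a hypothetical helper that copies the option names and defaults from `parse_args()` in `tools/misc/download_dataset.py`, and `data/VOCdevkit` is an assumed target directory chosen only for illustration.

```python
import argparse


def build_parser():
    """Hypothetical helper mirroring the options defined in parse_args()."""
    parser = argparse.ArgumentParser(
        description='Download datasets for training')
    parser.add_argument('--dataset-name', type=str, default='coco2017')
    parser.add_argument('--save-dir', type=str, default='data/coco')
    parser.add_argument('--unzip', action='store_true')
    parser.add_argument('--delete', action='store_true')
    parser.add_argument('--threads', type=int, default=4)
    return parser


# Same arguments as:
#   python tools/misc/download_dataset.py --dataset-name voc2007
voc_defaults = build_parser().parse_args(['--dataset-name', 'voc2007'])
# The --save-dir default is fixed and does not follow the dataset name.
print(voc_defaults.save_dir)

# Passing --save-dir explicitly; 'data/VOCdevkit' is an assumed path.
voc_explicit = build_parser().parse_args(
    ['--dataset-name', 'voc2007', '--save-dir', 'data/VOCdevkit', '--unzip'])
print(voc_explicit.save_dir, voc_explicit.unzip, voc_explicit.delete)
```

Because `--save-dir` defaults to `data/coco` whatever the dataset, passing it explicitly for VOC or LVIS is probably the safer choice. Archives are only extracted when `--unzip` is given, and in the script the `--delete` branch sits inside the unzip branch, so `--delete` appears to take effect only together with `--unzip`.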

docs/zh_cn/1_exist_data_model.md

+1
@@ -172,6 +172,7 @@ asyncio.run(main())
 Note: for detection tasks, Pascal VOC 2012 is a non-overlapping extension of Pascal VOC 2007, and the two are usually used together.
 We recommend downloading the dataset, extracting it to a folder outside the project, and then symlinking the dataset root into the `$MMDETECTION/data` folder, laid out as shown below.
 If your folder structure differs from the one below, you need to change the corresponding paths in the config files.
+We provide a script for downloading datasets such as COCO. You can run `python tools/misc/download_dataset.py --dataset-name coco2017` to download the COCO dataset.
 
 ```plain
 mmdetection

tools/misc/download_dataset.py

+102
@@ -0,0 +1,102 @@
+import argparse
+from itertools import repeat
+from multiprocessing.pool import ThreadPool
+from pathlib import Path
+from tarfile import TarFile
+from zipfile import ZipFile
+
+import torch
+
+
+def parse_args():
+    parser = argparse.ArgumentParser(
+        description='Download datasets for training')
+    parser.add_argument(
+        '--dataset-name', type=str, help='dataset name', default='coco2017')
+    parser.add_argument(
+        '--save-dir',
+        type=str,
+        help='the dir to save dataset',
+        default='data/coco')
+    parser.add_argument(
+        '--unzip',
+        action='store_true',
+        help='whether to unzip the dataset; the zipped files are kept')
+    parser.add_argument(
+        '--delete',
+        action='store_true',
+        help='delete the downloaded zipped files')
+    parser.add_argument(
+        '--threads', type=int, help='number of threads', default=4)
+    args = parser.parse_args()
+    return args
+
+
+def download(url, dir, unzip=True, delete=False, threads=1):
+
+    def download_one(url, dir):
+        f = dir / Path(url).name
+        if Path(url).is_file():
+            Path(url).rename(f)
+        elif not f.exists():
+            print('Downloading {} to {}'.format(url, f))
+            torch.hub.download_url_to_file(url, f, progress=True)
+        if unzip and f.suffix in ('.zip', '.tar'):
+            print('Unzipping {}'.format(f.name))
+            if f.suffix == '.zip':
+                ZipFile(f).extractall(path=dir)
+            elif f.suffix == '.tar':
+                TarFile(f).extractall(path=dir)
+            if delete:
+                f.unlink()
+                print('Delete {}'.format(f))
+
+    dir = Path(dir)
+    if threads > 1:
+        pool = ThreadPool(threads)
+        pool.imap(lambda x: download_one(*x), zip(url, repeat(dir)))
+        pool.close()
+        pool.join()
+    else:
+        for u in [url] if isinstance(url, (str, Path)) else url:
+            download_one(u, dir)
+
+
+def main():
+    args = parse_args()
+    path = Path(args.save_dir)
+    if not path.exists():
+        path.mkdir(parents=True, exist_ok=True)
+    data2url = dict(
+        # TODO: Support for downloading Panoptic Segmentation of COCO
+        coco2017=[
+            'http://images.cocodataset.org/zips/train2017.zip',
+            'http://images.cocodataset.org/zips/val2017.zip',
+            'http://images.cocodataset.org/zips/test2017.zip',
+            'http://images.cocodataset.org/annotations/' +
+            'annotations_trainval2017.zip'
+        ],
+        lvis=[
+            'https://s3-us-west-2.amazonaws.com/dl.fbaipublicfiles.com/LVIS/lvis_v1_train.json.zip',  # noqa
+            'https://s3-us-west-2.amazonaws.com/dl.fbaipublicfiles.com/LVIS/lvis_v1_val.json.zip',  # noqa
+        ],
+        voc2007=[
+            'http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar',  # noqa
+            'http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar',  # noqa
+            'http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCdevkit_08-Jun-2007.tar',  # noqa
+        ],
+    )
+    url = data2url.get(args.dataset_name, None)
+    if url is None:
+        print('Only support COCO, VOC, and LVIS now!')
+        return
+    download(
+        url,
+        dir=path,
+        unzip=args.unzip,
+        delete=args.delete,
+        threads=args.threads)
+
+
+if __name__ == '__main__':
+    main()
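
In `download()`, the iterator returned by `pool.imap` is never consumed, so an exception raised inside a worker thread would, as far as I can tell, go unreported. The sketch below shows the same fan-out pattern with the results drained so that failures surface. It is an illustration under stated assumptions rather than the script's behaviour: `fake_download` and `fan_out` are hypothetical names, and `fake_download` only computes the target path instead of touching the network.

```python
from itertools import repeat
from multiprocessing.pool import ThreadPool
from pathlib import Path


def fake_download(url, out_dir):
    """Stand-in for download_one(): no network access, returns the target path."""
    if not url.startswith(('http://', 'https://')):
        raise ValueError('unsupported URL: {}'.format(url))
    return out_dir / Path(url).name


def fan_out(urls, out_dir, threads=4):
    """Run fake_download over urls in a thread pool and collect the results."""
    out_dir = Path(out_dir)
    with ThreadPool(threads) as pool:
        # list() drains the iterator, so a worker exception is re-raised here
        # instead of being dropped.
        return list(
            pool.imap(lambda x: fake_download(*x), zip(urls, repeat(out_dir))))


urls = [
    'http://images.cocodataset.org/zips/val2017.zip',
    'http://images.cocodataset.org/annotations/annotations_trainval2017.zip',
]
targets = fan_out(urls, 'data/coco', threads=2)
print([t.name for t in targets])
```

`imap` yields results in input order, so `targets` lines up with `urls`. Draining inside the `with` block matters because leaving the block terminates the pool.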
