
Commit 21b0208

Update README.md (#64)
1 parent df332dc commit 21b0208

1 file changed: experiments/README.md (+120 -1)
@@ -16,7 +16,7 @@ wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip

## Folder structure of experimental data

```
experiments_data/tmp
experiments_data/tmp/sam_coco_mask_center_cache
experiments_data/tmp/sam_eval_masks_out
@@ -25,3 +25,122 @@ experiments_data/datasets/coco2017
experiments_data/datasets/coco2017/val2017
experiments_data/datasets/coco2017/annotations
experiments_data/checkpoints
```

## Environment details

### Hardware
These experiments were run on an Amazon p4d.24xlarge instance. See the product details on the EC2 website for the exact specifications. A few key highlights:

- 8 A100 GPUs with 40960 MiB of memory each, running at 400W
- 96 vCPUs
- 1152 GiB of RAM

### Versions

- PyTorch nightly and Python 3.10
- The https://github.com/cpuhrsch/segment-anything fork of https://github.com/facebookresearch/segment-anything with additional commits, if you want to reproduce the baseline and the first few experiments
- This repository: https://github.com/pytorch-labs/segment-anything-fast

### Installation instructions

```
$ conda create -n nightly20231023py310
$ conda activate nightly20231023py310
$ conda install python=3.10
$ pip install https://download.pytorch.org/whl/nightly/cu121/torch-2.2.0.dev20231023%2Bcu121-cp310-cp310-linux_x86_64.whl
$ pip install https://download.pytorch.org/whl/nightly/cu121/torchvision-0.17.0.dev20231023%2Bcu121-cp310-cp310-linux_x86_64.whl
$ cd /scratch/cpuhrsch/dev
$ git clone https://github.com/cpuhrsch/segment-anything.git
$ cd segment-anything
$ pip install -e .
$ cd /scratch/cpuhrsch/dev
$ git clone https://github.com/pytorch-labs/segment-anything-fast.git
$ cd segment-anything-fast
$ pip install -e .
```

If you plan to run the experiment scripts from segment-anything-fast, it is important to install the segment-anything fork in editable mode so that the scripts can switch between different commits of the fork automatically.

### How to run experiments

```
$ python run_experiments.py 16 vit_b <pytorch_github> <segment-anything_github> <path_to_experiments_data> --run-experiments --num-workers 32
```

If at any point you run into issues, note that you can increase verbosity by adding `--capture_output False` to the above command. Also, please don't hesitate to open an issue.

### Data
We use the COCO2017 validation (Val images) dataset. It serves as a somewhat realistic distribution of input images, on which we aim to measure a) accuracy and b) performance.

#### Measurement

##### Accuracy
Our main goal is to verify that our performance optimizations do not degrade the accuracy of the model. We do not aim to reproduce any paper results or to make statements about the accuracy of this model on the dataset. This measurement serves as an additional integration test, in conjunction with numerous unit and other separate integration tests.

We calculate the center points of the mask annotations using a rudimentary version of the procedure in https://arxiv.org/pdf/2304.02643.pdf, section D.1, Point Sampling ([code](https://github.com/pytorch-labs/segment-anything-fast/blob/67d5c894569e99b9fdba55cfcf2f724be9f68994/experiments/data.py#L10-L120)). These center points serve as annotations per image. Note that the number of masks, and thus the number of annotations per image, varies.
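
As a rough illustration only (the actual logic lives in the linked data.py; the function name and the use of scipy here are assumptions), one way to pick such a center point is the mask pixel farthest from the mask boundary, via a Euclidean distance transform:

```
import numpy as np
from scipy.ndimage import distance_transform_edt

def mask_center_point(mask: np.ndarray) -> tuple[int, int]:
    # mask: boolean (H, W) array for a single annotation.
    # The distance transform gives, for every foreground pixel, the distance
    # to the nearest background pixel; its argmax is the most "interior" point.
    dist = distance_transform_edt(mask)
    y, x = np.unravel_index(np.argmax(dist), dist.shape)
    return int(x), int(y)  # (x, y) point prompt for the predictor
```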

These images and annotations are given to the predict_torch method of an instance of SamPredictor to predict masks. These are then compared to the ground truth masks using the Intersection over Union (IoU) metric ([code](https://github.com/pytorch-labs/segment-anything-fast/blob/67d5c894569e99b9fdba55cfcf2f724be9f68994/experiments/metrics.py#L4-L22)). We calculate the mean IoU (mIoU) over all 5000 images of the validation dataset to track accuracy.
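
For reference, a minimal sketch of IoU between a predicted and a ground-truth mask (this is not the linked metrics code, just the standard formula):

```
import torch

def mask_iou(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    # pred, gt: boolean mask tensors of identical shape, e.g. (H, W).
    pred, gt = pred.bool(), gt.bool()
    intersection = (pred & gt).sum()
    union = (pred | gt).sum()
    # Guard against an empty union to avoid division by zero.
    return intersection.float() / union.float().clamp(min=1)
```
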
##### Performance
Our goal is to measure the runtime of the PyTorch models. We purposefully exclude data movement and the calculation of the metrics. Specifically, we measure the GPU execution time of running the image encoder (e.g. vit_h) and SamPredictor.predict_torch ([code](https://github.com/pytorch-labs/segment-anything-fast/blob/67d5c894569e99b9fdba55cfcf2f724be9f68994/experiments/eval_combo.py#L127-L165), [code](https://github.com/pytorch-labs/segment-anything-fast/blob/67d5c894569e99b9fdba55cfcf2f724be9f68994/experiments/eval_combo.py#L68-L99)).

Each experiment is run in a separate Python process created from scratch. We run three batches of warmup before each experiment. This also implies that we are excluding compilation time from benchmarking.

We measure the execution time and calculate the number of images that can be processed per second (img/s). We also measure the maximum amount of memory allocated at the end of the process using torch.cuda.max_memory_allocated.
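
A rough sketch of this kind of measurement (the helper name and arguments are illustrative, not the repository's eval_combo.py code):

```
import torch

def time_gpu(fn, *args, warmup=3, **kwargs):
    # Warmup batches keep one-off costs (e.g. torch.compile compilation)
    # out of the measured region.
    for _ in range(warmup):
        fn(*args, **kwargs)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    result = fn(*args, **kwargs)
    end.record()
    torch.cuda.synchronize()
    elapsed_s = start.elapsed_time(end) / 1000.0  # elapsed_time returns milliseconds
    return result, elapsed_s

# img/s is then num_images / elapsed_s, and peak memory is
# torch.cuda.max_memory_allocated().
```
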
##### Tracing

We collect kernel and memory traces using PyTorch's native tooling and analyze them with [Perfetto UI](https://perfetto.dev/). When collecting these traces and profiles we typically limit ourselves to a few batches. Otherwise the files can become very large and difficult to load.

### Kernel traces

One can write a simple wrapper that runs a function under the tracer context and writes out the result to a compressed JSON file. The resulting Chrome trace can then be analyzed with Perfetto UI.

```
import torch

def profiler_runner(path, fn, *args, **kwargs):
    # Trace both CPU-side operator execution and CUDA kernels.
    with torch.profiler.profile(
            activities=[torch.profiler.ProfilerActivity.CPU,
                        torch.profiler.ProfilerActivity.CUDA],
            record_shapes=True) as prof:
        result = fn(*args, **kwargs)
    # Export a Chrome trace that can be loaded in Perfetto UI.
    prof.export_chrome_trace(path)
    return result
```
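
A hypothetical usage might look like this (run_eval, predictor and batched_images are placeholder names, not from the repository):

```
# Write the trace next to the experiment outputs and open it in Perfetto UI.
result = profiler_runner("sam_kernel_trace.json.gz", run_eval, predictor, batched_images)
```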

It can be very useful to annotate certain regions in these traces to map (pieces of) the code to the overall traces. For this we frequently use record_function. Consider the following as an example.

```
with torch.autograd.profiler.record_function("timed region"):
    with torch.autograd.profiler.record_function("image encoder"):
        features_batch = encoder(input_image_batch)
        features_batch = features_batch[:orig_input_image_batch_size]

    with torch.autograd.profiler.record_function("nt predict_torch"):
        predictor.reset_image()
        [...]
```

### Memory profiles

We record the memory history and use torch/cuda/_memory_viz.py to convert the result into a human-readable HTML file.

```
import pickle
import torch

def memory_runner(path, fn, *args, **kwargs):
    print("Start memory recording")
    torch.cuda.synchronize()
    # Record CUDA allocation history, including the call context of each allocation.
    torch.cuda.memory._record_memory_history(
        True,
        trace_alloc_max_entries=100000,
        trace_alloc_record_context=True
    )
    result = fn(*args, **kwargs)
    torch.cuda.synchronize()
    snapshot = torch.cuda.memory._snapshot()
    print("Finish memory recording")
    with open(path, 'wb') as f:
        pickle.dump(snapshot, f)
    # Use to convert pickle file into html
    # python torch/cuda/_memory_viz.py trace_plot <snapshot>.pickle -o <snapshot>.html
    return result
```
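
A hypothetical usage (again, run_eval, predictor and batched_images are placeholder names):

```
# Produces memory_snapshot.pickle, which can then be converted to HTML with
# python torch/cuda/_memory_viz.py trace_plot memory_snapshot.pickle -o memory_snapshot.html
result = memory_runner("memory_snapshot.pickle", run_eval, predictor, batched_images)
```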
