Commit 32a186e

Francisc Bungiu authored and facebook-github-bot committed
Add ODS logging to all runners
Summary:
Pull Request resolved: #5050
X-link: facebookresearch/d2go#606

Allow attaching a monitoring service to the training loop.

Reviewed By: miqueljubert
Differential Revision: D47595332
fbshipit-source-id: 49d770207aeea56113c008fcd29ad7b545cec849
1 parent 57bdb21 commit 32a186e

1 file changed: +4 -2 lines changed

detectron2/engine/train_loop.py

Lines changed: 4 additions & 2 deletions
@@ -388,14 +388,16 @@ def write_metrics(
         metrics_dict = {k: v.detach().cpu().item() for k, v in loss_dict.items()}
         metrics_dict["data_time"] = data_time
 
+        storage = get_event_storage()
+        # Keep track of data time per rank
+        storage.put_scalar("rank_data_time", data_time, cur_iter=cur_iter)
+
         # Gather metrics among all workers for logging
         # This assumes we do DDP-style training, which is currently the only
         # supported method in detectron2.
         all_metrics_dict = comm.gather(metrics_dict)
 
         if comm.is_main_process():
-            storage = get_event_storage()
-
             # data_time among workers can have high variance. The actual latency
             # caused by data_time is the maximum among workers.
             data_time = np.max([x.pop("data_time") for x in all_metrics_dict])
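
The change records each rank's own data_time before the gather, while the main process still reduces the gathered values to their maximum. A minimal pure-Python sketch of that reduction follows. It does not use detectron2: `reduce_rank_metrics` is a hypothetical helper, `gathered` stands in for the result of comm.gather(metrics_dict), the numbers are made up, and averaging the remaining metrics is an assumption about code outside this hunk.

```python
from statistics import mean
from typing import Dict, List


def reduce_rank_metrics(all_metrics: List[Dict[str, float]]) -> Dict[str, float]:
    """Combine per-rank metric dicts on the main process (illustrative only)."""
    # Copy each dict so pop() does not mutate the caller's data.
    per_rank = [dict(m) for m in all_metrics]
    # The slowest data loader bounds the step latency, so take the maximum.
    data_time = max(m.pop("data_time") for m in per_rank)
    # Assumption: every other metric is averaged across ranks.
    reduced = {k: mean(m[k] for m in per_rank) for k in per_rank[0]}
    reduced["data_time"] = data_time
    return reduced


# Hypothetical values for a two-rank job.
gathered = [
    {"loss_cls": 0.5, "data_time": 0.02},
    {"loss_cls": 1.5, "data_time": 0.35},
]

# One value per rank: the information the new "rank_data_time" scalar keeps.
per_rank_data_time = [m["data_time"] for m in gathered]

# One value for the whole job: what the main process logs as "data_time".
reduced = reduce_rank_metrics(gathered)
```

In this sketch the reduced data_time hides that only the second rank is slow, whereas the per-rank list shows which worker is the bottleneck.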
