Logging metrics at datapoint level #16573
GregorySech asked this question in code help: CV · Unanswered · 0 replies
I'm trying to log some metrics at the datapoint (image/dataset row/sample) level.
For example, in a semantic segmentation task I would like to log, for each image, its intersection over union (IoU) and its loss value.
What I've done so far is to call the LightningModule.log method with a separate log key for each image.
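Roughly, the scheme looks like the sketch below. The `make_metric_key` helper and the metric/ID names are illustrative, not actual Lightning API or my exact code; the point is that every image gets its own TensorBoard tag.

```python
# Simplified sketch of the per-image logging scheme.
# make_metric_key is an illustrative helper, not part of Lightning's API.

def make_metric_key(stage: str, metric: str, image_id: str) -> str:
    """Build a unique TensorBoard tag per image, e.g. 'val/iou/img_0042'."""
    return f"{stage}/{metric}/{image_id}"

# Inside training_step_end / validation_step_end I then log each image's
# metrics individually, roughly like:
#
#   for image_id, iou, loss in zip(batch_ids, ious, losses):
#       self.log(make_metric_key(stage, "iou", image_id), iou)
#       self.log(make_metric_key(stage, "loss", image_id), loss)

print(make_metric_key("val", "iou", "img_0042"))  # → val/iou/img_0042
```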
This method is then called from both training_step_end and validation_step_end with the appropriate stage string. However, I've run into an unexpected issue: 50 steps after the first log, I find the same value logged again on TensorBoard.
I should add that I'm using DDP; however, this behaviour happens regardless of the number of devices. The screenshot refers to a run with a single GPU.
The Dataset uses the filename as the "image_id", so there are no duplicates (and the number of images adds up to the correct dataset size).
I was wondering if this behaviour is to be expected and if there is a smarter way of logging at this level of granularity.