cwagaman

This new simulation (and its corresponding visualization) attempts to show how quickly a "best caption" rises to the top of the rankings. We produce two visualizations.

  1. The graph titled "# Captions within 95% CI of Current Funniest" shows how soon a caption (not necessarily the true funniest caption) can plausibly be identified as the funniest. First, the average user-provided rating is computed for each caption. Then a 95% confidence interval (CI) is computed around each of these averages, using the normal approximation from the central limit theorem. The graph displays the number of captions whose 95% CI intersects the 95% CI of the caption with the highest average user-provided rating.
  2. The graph titled "# Captions with Simulated Rating Higher than True Funniest" shows how quickly the funniest caption can be correctly identified. Recall that we have access to the ground truth for which caption is funniest. This graph displays how many captions, after a given number of queries, have received an average user-provided rating higher than that of the true funniest caption.
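
For reviewers, the two quantities plotted above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names, the `ratings_by_caption` dictionary, the fixed 1.96 quantile, and the choice to exclude the leading caption from its own count are all assumptions. It also assumes every caption has at least one rating.

```python
import math
from statistics import mean, stdev

Z_95 = 1.96  # two-sided 95% quantile of the standard normal (assumed)


def confidence_interval(ratings):
    """CLT-based 95% CI (low, high) for the mean of a list of ratings."""
    n = len(ratings)
    m = mean(ratings)
    if n < 2:
        # Too few ratings to estimate spread: treat the interval as unbounded.
        return (-math.inf, math.inf)
    half_width = Z_95 * stdev(ratings) / math.sqrt(n)
    return (m - half_width, m + half_width)


def count_within_ci_of_leader(ratings_by_caption):
    """Count captions whose CI intersects the CI of the current leader.

    The leader (highest average rating) is not counted (an assumption).
    """
    means = {c: mean(r) for c, r in ratings_by_caption.items()}
    leader = max(means, key=means.get)
    leader_low, leader_high = confidence_interval(ratings_by_caption[leader])
    count = 0
    for caption, ratings in ratings_by_caption.items():
        if caption == leader:
            continue
        low, high = confidence_interval(ratings)
        if low <= leader_high and high >= leader_low:
            count += 1
    return count


def count_rated_above_true_best(ratings_by_caption, true_best):
    """Count captions whose average rating exceeds that of the true funniest."""
    target = mean(ratings_by_caption[true_best])
    return sum(
        1
        for caption, ratings in ratings_by_caption.items()
        if caption != true_best and mean(ratings) > target
    )
```

Evaluating both functions after each batch of simulated queries would give one curve per learning strategy.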

Each visualization is performed for three different learning strategies.

  1. "Random" randomly selects captions for users to rate.
  2. "Active" adaptively chooses captions for users to rate according to the upper confidence bound strategy described in https://arxiv.org/abs/1312.7308.
  3. "lil_KLUCB" adaptively chooses captions for users to rate according to the upper confidence bound strategy described in https://arxiv.org/abs/1709.03570.
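
To make the contrast between random and adaptive selection concrete, here is a rough sketch. The adaptive rule uses a generic UCB1-style bonus as a stand-in; it is not the confidence bound from either linked paper. The function names and the assumption that ratings are scaled to [0, 1] are hypothetical.

```python
import math
import random


def choose_random(counts, rng):
    """Pick a caption index uniformly at random."""
    return rng.randrange(len(counts))


def choose_ucb(counts, sums, total_pulls):
    """Pick the caption with the largest upper confidence bound.

    counts[i] is how many ratings caption i has received and sums[i] is the
    sum of those ratings. The bonus term is the textbook UCB1 bonus, used
    here only as a placeholder for the bounds in the cited papers.
    """
    # Rate every caption at least once before comparing bounds.
    for i, n in enumerate(counts):
        if n == 0:
            return i

    def upper_bound(i):
        average = sums[i] / counts[i]
        bonus = math.sqrt(2.0 * math.log(total_pulls) / counts[i])
        return average + bonus

    return max(range(len(counts)), key=upper_bound)
```

The adaptive rule spends more queries on captions that are either highly rated so far or still poorly sampled, which is why it can be expected to separate the top captions with fewer ratings than uniform sampling.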

Each line plots the mean over 10 samples. The shaded region around each line shows the standard deviation.
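
The aggregation behind each line and its shaded band might look like the sketch below. It assumes each sample is a list of metric values recorded at the same query checkpoints, and it uses the population standard deviation; the helper name and both of those choices are assumptions, not taken from this PR.

```python
from statistics import mean, pstdev


def summarize_runs(runs):
    """Return per-checkpoint (means, stds) across equal-length runs."""
    # zip(*runs) groups the values recorded at the same checkpoint.
    columns = list(zip(*runs))
    means = [mean(column) for column in columns]
    stds = [pstdev(column) for column in columns]
    return means, stds
```

The band would then span `means[i] - stds[i]` to `means[i] + stds[i]` at each checkpoint.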
