diff --git a/lab3/solutions/Lab3_Bias_And_Uncertainty.ipynb b/lab3/solutions/Lab3_Bias_And_Uncertainty.ipynb index 33cbff92..a210f0f0 100644 --- a/lab3/solutions/Lab3_Bias_And_Uncertainty.ipynb +++ b/lab3/solutions/Lab3_Bias_And_Uncertainty.ipynb @@ -2,142 +2,73 @@ "cells": [ { "cell_type": "markdown", - "metadata": { - "id": "IgYKebt871EK" - }, "source": [ - "# Laboratory 3: Detecting and mitigating bias and uncertainty in Facial Detection Systems\n", - "In this lab, we'll continue to explore how to mitigate algorithmic bias in facial recognition systems. In addition, we'll explore the notion of *uncertainty* in datasets, and learn how to reduce both data-based and model-based uncertainty.\n", - "\n", - "As we've seen in lecture 5, bias and uncertainty underlie many common issues with machine learning models today, and these are not just limited to classification tasks. Automatically detecting and mitigating uncertainty is crucial to deploying fair and safe models. \n", - "\n", - "In this lab, we'll be using [CAPSA](https://github.com/themis-ai/capsa/), a software package developed by [Themis AI](https://themisai.io/), which automatically *wraps* models to make them risk-aware and plugs into training workflows. 
We'll explore how we can use CAPSA to diagnose uncertainties, and then develop methods for automatically mitigating them.\n", - "\n", - "\n", - "Run the next code block for a short video from Google that explores how and why it's important to consider bias when thinking about machine learning:" - ] + "<table align=\"center\">\n", + " <td align=\"center\"><a target=\"_blank\" href=\"http://introtodeeplearning.com\">\n", + " <img src=\"https://i.ibb.co/Jr88sn2/mit.png\" style=\"padding-bottom:5px;\" />\n", + " Visit MIT Deep Learning</a></td>\n", + " <td align=\"center\"><a target=\"_blank\" href=\"https://colab.research.google.com/github/aamini/introtodeeplearning/blob/2023/lab3/solutions/Lab3_Bias_And_Uncertainty.ipynb\">\n", + " <img src=\"https://i.ibb.co/2P3SLwK/colab.png\" style=\"padding-bottom:5px;\" />Run in Google Colab</a></td>\n", + " <td align=\"center\"><a target=\"_blank\" href=\"https://github.com/aamini/introtodeeplearning/blob/2023/lab3/solutions/Lab3_Bias_And_Uncertainty.ipynb\">\n", + " <img src=\"https://i.ibb.co/xfJbPmL/github.png\" height=\"70px\" style=\"padding-bottom:5px;\" />View Source on GitHub</a></td>\n", + "</table>\n", + "\n", + "# Copyright Information" + ], + "metadata": { + "id": "Kxl9-zNYhxlQ" + } }, { "cell_type": "code", "source": [ - "!git clone https://github.com/slolla/capsa-intro-deep-learning.git\n", - "!cd capsa-intro-deep-learning/ && git checkout HistogramVAEWrapper\n" + "# Copyright 2023 MIT Introduction to Deep Learning. All Rights Reserved.\n", + "# \n", + "# Licensed under the MIT License. You may not use this file except in compliance\n", + "# with the License. 
Use and/or modification of this code outside of MIT Introduction\n", + "# to Deep Learning must reference:\n", + "#\n", + "# © MIT Introduction to Deep Learning\n", + "# http://introtodeeplearning.com\n", + "#" ], "metadata": { - "id": "5Ll7uZ8q72hm", - "outputId": "56b3117b-e344-481b-a9fc-2798b76d7a60", - "colab": { - "base_uri": "https://localhost:8080/" - } + "id": "aAcJJN3Xh3S1" }, - "execution_count": 1, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "fatal: destination path 'capsa-intro-deep-learning' already exists and is not an empty directory.\n", - "Already on 'HistogramVAEWrapper'\n", - "Your branch is up to date with 'origin/HistogramVAEWrapper'.\n" - ] - } - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", "metadata": { - "id": "6JTRoM7E71EU" + "id": "IgYKebt871EK" }, "source": [ - "Let's get started by installing the relevant dependencies:" + "# Laboratory 3: Debiasing, Uncertainty, and Robustness\n", + "\n", + "# Part 2: Mitigating Bias and Uncertainty in Facial Detection Systems\n", + "\n", + "In Lab 2, we defined a semi-supervised VAE (SS-VAE) to diagnose feature representation disparities and biases in facial detection systems. In Lab 3 Part 1, we gained experience with [Capsa](https://github.com/themis-ai/capsa/) and its ability to build risk-aware models automatically through wrapping. Now in this lab, we will put these two together: using Capsa to build systems that can *automatically* uncover and mitigate bias and uncertainty in facial detection systems.\n", + "\n", + "As we have seen, automatically detecting and mitigating bias and uncertainty is crucial to deploying fair and safe models. Building off our foundation with Capsa, developed by [Themis AI](https://themisai.io/), we will now use Capsa for the facial detection problem, in order to diagnose risks in facial detection models. 
You will then design and create strategies to mitigate these risks, with the goal of improving model performance across the entire facial detection dataset.\n", + "\n", + "**Your goal in this lab -- and the associated competition -- is to design a strategic solution for bias and uncertainty mitigation, using Capsa.** The approaches and solutions with outstanding performance will be recognized with outstanding prizes! Details on the submission process are at the end of this lab.\n", + "\n", + "" ] }, { - "cell_type": "code", - "source": [ - "%cd capsa-intro-deep-learning/\n", - "%pip install -e .\n", - "%cd .." - ], + "cell_type": "markdown", "metadata": { - "id": "SjAn-WZK9lOv", - "outputId": "35e24600-85b4-4320-c436-061856e56861", - "colab": { - "base_uri": "https://localhost:8080/" - } + "id": "6JTRoM7E71EU" }, - "execution_count": 2, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "/content/capsa-intro-deep-learning\n", - "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n", - "Obtaining file:///content/capsa-intro-deep-learning\n", - " Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - "Installing collected packages: capsa\n", - " Attempting uninstall: capsa\n", - " Found existing installation: capsa 0.1.2\n", - " Can't uninstall 'capsa'. No files were found to uninstall.\n", - " Running setup.py develop for capsa\n", - "Successfully installed capsa-0.1.2\n", - "/content\n" - ] - } - ] - }, - { - "cell_type": "code", "source": [ - "!git clone https://github.com/aamini/introtodeeplearning.git\n", - "!cd introtodeeplearning/ && git checkout 2023\n", - "%cd introtodeeplearning/\n", - "%pip install -e .\n", - "%cd .." 
- ], - "metadata": { - "id": "3pzGVPrh-4LQ", - "outputId": "f4588f12-d290-4746-d819-501a0e3ba390", - "colab": { - "base_uri": "https://localhost:8080/" - } - }, - "execution_count": 3, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "fatal: destination path 'introtodeeplearning' already exists and is not an empty directory.\n", - "Already on '2023'\n", - "Your branch is up to date with 'origin/2023'.\n", - "/content/introtodeeplearning\n", - "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n", - "Obtaining file:///content/introtodeeplearning\n", - " Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - "Requirement already satisfied: numpy in /usr/local/lib/python3.8/dist-packages (from mitdeeplearning==0.3.0) (1.21.6)\n", - "Requirement already satisfied: regex in /usr/local/lib/python3.8/dist-packages (from mitdeeplearning==0.3.0) (2022.6.2)\n", - "Requirement already satisfied: tqdm in /usr/local/lib/python3.8/dist-packages (from mitdeeplearning==0.3.0) (4.64.1)\n", - "Requirement already satisfied: gym in /usr/local/lib/python3.8/dist-packages (from mitdeeplearning==0.3.0) (0.25.2)\n", - "Requirement already satisfied: importlib-metadata>=4.8.0 in /usr/local/lib/python3.8/dist-packages (from gym->mitdeeplearning==0.3.0) (5.2.0)\n", - "Requirement already satisfied: gym-notices>=0.0.4 in /usr/local/lib/python3.8/dist-packages (from gym->mitdeeplearning==0.3.0) (0.0.8)\n", - "Requirement already satisfied: cloudpickle>=1.2.0 in /usr/local/lib/python3.8/dist-packages (from gym->mitdeeplearning==0.3.0) (1.5.0)\n", - "Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.8/dist-packages (from importlib-metadata>=4.8.0->gym->mitdeeplearning==0.3.0) (3.11.0)\n", - "Installing collected packages: mitdeeplearning\n", - " Attempting uninstall: mitdeeplearning\n", - " Found existing installation: mitdeeplearning 0.3.0\n", - " Can't uninstall 'mitdeeplearning'. 
No files were found to uninstall.\n", - " Running setup.py develop for mitdeeplearning\n", - "Successfully installed mitdeeplearning-0.3.0\n", - "/content\n" - ] - } + "Let's get started by installing the necessary dependencies:" ] }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "metadata": { "id": "2PdAhs1371EU" }, @@ -152,11 +83,14 @@ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "from tqdm import tqdm\n", - "from capsa import *\n", + "\n", "# Download and import the MIT 6.S191 package\n", - "from mitdeeplearning import lab3 \n", + "!pip install git+https://github.com/aamini/introtodeeplearning.git@2023\n", + "import mitdeeplearning as mdl\n", + "\n", "# Download and import capsa\n", - "#!pip install capsa\n" + "!pip install capsa\n", + "import capsa" ] }, { @@ -165,49 +99,33 @@ "id": "6VKVqLb371EV" }, "source": [ - "## 3.1 Datasets\n", + "# 3.1 Datasets\n", "\n", - "We'll be using the same datasets from lab 2 in this lab. Note that in this dataset, we've intentionally perturbed some of the samples in some ways (it's up to you to figure out how!) that are not necessarily present in the actual dataset. \n", + "Since we are again focusing on the facial detection problem, we will use the same datasets from Lab 2. To remind you, we have a dataset of positive examples (i.e., of faces) and a dataset of negative examples (i.e., of things that are not faces).\n", "\n", - "1. **Positive training data**: [CelebA Dataset](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html). A large-scale (over 200K images) of celebrity faces. \n", - "2. **Negative training data**: [ImageNet](http://www.image-net.org/). Many images across many different categories. We'll take negative examples from a variety of non-human categories. \n", - "[Fitzpatrick Scale](https://en.wikipedia.org/wiki/Fitzpatrick_scale) skin type classification system, with each image labeled as \"Lighter'' or \"Darker''.\n", + "1. 
**Positive training data**: [CelebA Dataset](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html). A large-scale dataset (over 200K images) of celebrity faces. \n", + "2. **Negative training data**: [ImageNet](http://www.image-net.org/). A large-scale dataset with many images across many different categories. We will take negative examples from a variety of non-human categories.\n", "\n", - "Like before, let's begin by importing these datasets. We've written a class that does a bit of data pre-processing to import the training data in a usable format.\n", + "We will evaluate trained models on an independent test dataset of face images to diagnose and mitigate potential issues with *bias, fairness, and confidence*. This will be a larger test dataset for evaluation purposes.\n", "\n", - "Also note that in this lab, we'll be using a much larger test dataset for evaluation purposes." + "We begin by importing these datasets. We have defined a `DatasetLoader` class that does a bit of data pre-processing to import the training data in a usable format." 
] }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "metadata": { - "id": "HIA6EA1D71EW", - "outputId": "df98738c-00d5-4987-bd58-938dd17c8ef4", - "colab": { - "base_uri": "https://localhost:8080/" - } + "id": "HIA6EA1D71EW" }, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "Opening /root/.keras/datasets/train_face_2023_v2.h5\n", - "Loading data into memory...\n", - "Opening /root/.keras/datasets/train_face_2023_v2.h5\n", - "Loading data into memory...\n" - ] - } - ], + "outputs": [], "source": [ "batch_size = 32\n", "\n", "# Get the training data: both images from CelebA and ImageNet\n", "path_to_training_data = tf.keras.utils.get_file('train_face_2023_v2.h5', 'https://www.dropbox.com/s/b5z1cd317y5u1tr/train_face_2023_v2.h5?dl=1')\n", "# Instantiate a DatasetLoader using the downloaded dataset\n", - "train_loader = lab3.DatasetLoader(path_to_training_data, training=True, batch_size= batch_size)\n", - "test_loader = lab3.DatasetLoader(path_to_training_data, training=False, batch_size = batch_size)" + "train_loader = mdl.lab3.DatasetLoader(path_to_training_data, training=True, batch_size=batch_size)\n", + "test_loader = mdl.lab3.DatasetLoader(path_to_training_data, training=False, batch_size=batch_size)" ] }, { @@ -216,13 +134,11 @@ "id": "cREmhMWJ71EX" }, "source": [ - "### Recap: Thinking about bias and uncertainty\n", + "### Building robustness to bias and uncertainty\n", "\n", - "Remember that we'll be training our facial detection classifiers on the large, well-curated CelebA dataset (and ImageNet), and then evaluating their accuracy by testing them on an independent test dataset. Our goal is to build a model that trains on CelebA *and* achieves high classification accuracy on the the test dataset across all demographics, and to thus show that this model does not suffer from any hidden bias. 
\n", + "Remember that we'll be training our facial detection classifiers on the large, well-curated CelebA dataset (and ImageNet), and then evaluating their accuracy by testing them on an independent test dataset. We want to mitigate the effects of unwanted bias and uncertainty on the model's predictions and performance. Your goal is to build the best-performing, most robust model, one that achieves high classification accuracy across the entire test dataset.\n", "\n", - "In addition to thinking about bias, we want to detect areas of high *aleatoric* uncertainty in the dataset, which is defined as data noise: in the context of facial detection, this means that we may have very similar inputs with different labels-- think about the scenario where one face is labeled correctly as a positive, and another face is labeled incorrectly as a negative. \n", - "\n", - "Finally, we want to look at samples with high *epistemic*, or predictive, uncertainty. These may be samples that are anomalous or out of distribution, samples that contain adversarial noise, or samples that are \"harder\" to learn in some way. Importantly, epistemic uncertainty is not the same as bias! We may have well-represented samples that still have high epistemic uncertainty. " + "To achieve this, you may want to consider the three metrics introduced with Capsa: (1) representation bias, (2) data or aleatoric uncertainty, and (3) model or epistemic uncertainty. Note that all three of these metrics are different! For example, we can have well-represented examples that still have high epistemic uncertainty. Think about how you may use these metrics to improve the performance of your model." 
] }, { @@ -231,34 +147,27 @@ "id": "1NhotGiT71EY" }, "source": [ - "# 3.2 Bias\n", + "# 3.2 Risk-aware facial detection with Capsa\n", "\n", - "In the previous lab, we used a variational autoencoder (VAE) to automatically learn the latent structure of our database, and we developed a scoring mechanism for samples to determine their bias. In this lab, we'll show that we can use CAPSA to do the same thing in one line! Then, our goal will be to continue our implementation of the DB-VAE and use the latent variables learned via a VAE to adaptively re-sample the CelebA data during training. Specifically, we will alter the probability that a given image is used during training based on how often its latent features appear in the dataset. So, faces with rarer features (like dark skin, sunglasses, or hats) should become more likely to be sampled during training, while the sampling probability for faces with features that are over-represented in the training dataset should decrease (relative to uniform random sampling across the training data)." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "niy4he0m71EZ" - }, - "source": [ - "Just like the last lab, let's define a standard classifier that we'll use as the base encoder of our network." + "In Lab 2, we built a semi-supervised variational autoencoder (SS-VAE) to learn the latent structure of our database and to uncover feature representation disparities, inspired by an approach to [uncover hidden biases](http://introtodeeplearning.com/AAAI_MitigatingAlgorithmicBias.pdf). In this lab, we'll show that we can use Capsa to build the same VAE in one line!\n", + "\n", + "This sets the foundation for quantifying a key risk metric -- representation bias -- for the facial detection problem. 
In working to improve your model's performance, you will want to consider representation bias carefully and think about how you could mitigate the effect of representation bias.\n", + "\n", + "Just like in Lab 2, we begin by defining a standard CNN-based classifier. We will then use Capsa to wrap the model and build the risk-aware VAE variant." ] }, { "cell_type": "code", - "execution_count": 8, + "execution_count": null, "metadata": { "id": "5hQb75Vm71EZ" }, "outputs": [], "source": [ - "### Define the CNN model ###\n", - "\n", - "n_filters = 12 # base number of convolutional filters\n", + "### Define the CNN classifier model ###\n", "\n", "'''Function to define a standard CNN model'''\n", - "def make_standard_classifier(n_outputs=1):\n", + "def make_standard_classifier(n_outputs=1, n_filters=12):\n", " Conv2D = functools.partial(tf.keras.layers.Conv2D, padding='same', activation='relu')\n", " BatchNormalization = tf.keras.layers.BatchNormalization\n", " Flatten = tf.keras.layers.Flatten\n", @@ -278,6 +187,9 @@ " Conv2D(filters=6*n_filters, kernel_size=3, strides=2),\n", " BatchNormalization(),\n", "\n", + " Conv2D(filters=8*n_filters, kernel_size=3, strides=2),\n", + " BatchNormalization(),\n", + "\n", " Flatten(),\n", " Dense(512),\n", " Dense(n_outputs, activation=None),\n", @@ -291,29 +203,35 @@ "id": "LgTG6buf71Ea" }, "source": [ - "Let's use CAPSA's `HistogramVAEWrapper` to analyze the latent space distribution as we did previously. The `HistogramVAEWrapper` constructs a histogram with `num_bins` bins across every dimension of the latent space, and then calculates the joint probability of every sample according to the histograms. The samples with the lowest joint probability have the lowest bias, and we want to oversample these. Conversely, we want to undersample the areas of the dataset with the highest bias." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "FivHOdGE71Ea" - }, - "source": [ - "The `HistogramVAEWrapper` class takes in a number of arguments: namely, the number of bins we want to discretize our distribution into, the number of samples we want to track at any given point, and whether we're using the output of a hidden layer (good for higher-dimensional data) or the input data itself (good for lower-dimensional data). Since this is a variational autoencoder, we need to also pass in a decoder. Let's define the same decoder as the previous lab:" + "### Capsa's `HistogramVAEWrapper`\n", + "\n", + "With our base classifier defined, Capsa allows us to automatically define a VAE built on that base classifier. Capsa's [`HistogramVAEWrapper`](https://themisai.io/capsa/api_documentation/HistogramVAEWrapper.html) builds this VAE to analyze the latent space distribution, just as we did in Lab 2. \n", + "\n", + "Specifically, `capsa.HistogramVAEWrapper` constructs a histogram with `num_bins` bins across every dimension of the latent space, and then calculates the joint probability of every sample according to the constructed histograms. The samples with the lowest joint probability have the lowest representation; the samples with the highest joint probability have the highest representation.\n", + "\n", + "`capsa.HistogramVAEWrapper` takes in a number of arguments including:\n", + "1. `base_model`: the model to be transformed into the risk-aware variant.\n", + "2. `num_bins`: the number of bins we want to discretize our distribution into. \n", + "3. `queue_size`: the number of samples we want to track at any given point.\n", + "4. 
`decoder`: the decoder architecture for the VAE.\n", + "\n", + "We define the same decoder as in Lab 2:" ] }, { "cell_type": "code", - "execution_count": 9, + "execution_count": null, "metadata": { "id": "zTat3K8E71Eb" }, "outputs": [], "source": [ + "### Define the decoder architecture for the facial detection VAE ###\n", + "\n", "def make_face_decoder_network(n_filters=12):\n", " # Functionally define the different layer types we will use\n", - " Conv2DTranspose = functools.partial(tf.keras.layers.Conv2DTranspose, padding='same', activation='relu')\n", + " Conv2DTranspose = functools.partial(tf.keras.layers.Conv2DTranspose, \n", + " padding='same', activation='relu')\n", " BatchNormalization = tf.keras.layers.BatchNormalization\n", " Flatten = tf.keras.layers.Flatten\n", " Dense = functools.partial(tf.keras.layers.Dense, activation='relu')\n", @@ -322,10 +240,11 @@ " # Build the decoder network using the Sequential API\n", " decoder = tf.keras.Sequential([\n", " # Transform to pre-convolutional generation\n", - " Dense(units=4*4*6*n_filters), # 4x4 feature maps (with 6N occurances)\n", - " Reshape(target_shape=(4, 4, 6*n_filters)),\n", + " Dense(units=2*2*8*n_filters), # 2x2 feature maps (with 8N occurrences)\n", + " Reshape(target_shape=(2, 2, 8*n_filters)),\n", "\n", " # Upscaling convolutions (inverse of encoder)\n", + " Conv2DTranspose(filters=6*n_filters, kernel_size=3, strides=2),\n", " Conv2DTranspose(filters=4*n_filters, kernel_size=3, strides=2),\n", " Conv2DTranspose(filters=2*n_filters, kernel_size=3, strides=2),\n", " Conv2DTranspose(filters=1*n_filters, kernel_size=5, strides=2),\n", @@ -335,162 +254,93 @@ " return decoder" ] }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": { - "id": "i4JmvmMA71Ec" - }, - "outputs": [], - "source": [ - "standard_classifier = make_standard_classifier()\n", - "wrapped_classifier = HistogramVAEWrapper(standard_classifier, num_bins=5, queue_size=20000, latent_dim = 100, 
decoder=make_face_decoder_network())" - ] - }, { "cell_type": "markdown", "metadata": { - "id": "valYm5LH71Ec" - }, - "source": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "A527wdyV71Ec" + "id": "SzFGcrhv71Ed" }, "source": [ - "Now, let's train the wrapped classifier! As we did in the previous lab, in addition to updating the weights of the model, the wrapped classifier also tracks feature distributions. We can use the joint probabilities of these feature distributions to determine the bias of a given sample in this dataset. We'll make use of the `Model.fit` API here, but note that we can achieve the same behavior with a custom training loop as well." + "We are ready to create the wrapped model using `capsa.HistogramVAEWrapper` by passing in the relevant arguments!\n", + "\n", + "Just as with the wrappers in the Introduction to Capsa lab, we can take our standard CNN classifier, wrap it with `capsa.HistogramVAEWrapper`, and build the wrapped model. The wrapper then enables semi-supervised training for the facial detection task. As the wrapped model trains, the classifier weights are updated, and the VAE-wrapped model learns to track feature distributions over the latent space. More details of the `HistogramVAEWrapper` and how it can be used are [available here](https://themisai.io/capsa/api_documentation/HistogramVAEWrapper.html).\n", + "\n", + "We can then evaluate the representation bias of the classifier on the test dataset. By calling the `wrapped_model` on our test data, we can automatically generate representation bias and uncertainty scores that are normally manually calculated. 
Let's wrap our base CNN classifier using Capsa, train and build the resulting model, and start to process the test data: " ] }, { "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NmshVdLM71Ed", - "outputId": "48155283-4767-46e7-e84b-dfd3ac8c1917", - "colab": { - "base_uri": "https://localhost:8080/" - } - }, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "Epoch 1/6\n" - ] - }, - { - "output_type": "stream", - "name": "stderr", - "text": [ - "WARNING:tensorflow:Gradients do not exist for variables ['dense_1/kernel:0', 'dense_1/bias:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss`argument?\n", - "WARNING:tensorflow:Gradients do not exist for variables ['dense_1/kernel:0', 'dense_1/bias:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss`argument?\n" - ] - }, - { - "output_type": "stream", - "name": "stdout", - "text": [ - " 102/2404 [>.............................] 
- ETA: 5:58 - vae_compiled_loss: 0.8147 - vae_compiled_binary_accuracy: 0.4792 - vae_wrapper_loss: 3385.2124" - ] - } - ], "source": [ - "learning_rate = 1e-5\n", + "### Estimating representation bias with Capsa HistogramVAEWrapper ###\n", "\n", - "# compile model using desired optimizers and losses\n", - "wrapped_classifier.compile(\n", - " optimizer=tf.keras.optimizers.Adam(learning_rate),\n", - " loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),\n", - " metrics=[tf.keras.metrics.BinaryAccuracy()],\n", + "model = make_standard_classifier()\n", + "# Wrap the CNN classifier for latent encoding with a VAE wrapper\n", + "wrapped_model = capsa.HistogramVAEWrapper(model, num_bins=5, queue_size=20000, \n", + " latent_dim = 32, decoder=make_face_decoder_network())\n", + "\n", + "# Build the model for classification, defining the loss function, optimizer, and metrics\n", + "wrapped_model.compile(\n", + " optimizer=tf.keras.optimizers.Adam(learning_rate=5e-4),\n", + " loss=tf.keras.losses.BinaryCrossentropy(from_logits=True), # for classification\n", + " metrics=[tf.keras.metrics.BinaryAccuracy()], # for classification\n", " run_eagerly=True\n", ")\n", "\n", - "# fit the model to our training data\n", - "history = wrapped_classifier.fit(\n", + "# Train the wrapped model for 6 epochs by fitting to the training data\n", + "history = wrapped_model.fit(\n", " train_loader,\n", " epochs=6,\n", " batch_size=batch_size,\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "SzFGcrhv71Ed" - }, - "source": [ - "Let's see what the bias looks like on our test dataset! Note that in this lab, we're using a much larger test dataset than the one in Lab 2. By calling the `wrapped_classifier` on our test set, we can automatically generate the same bias scores that we manually calculated in the last lab. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, + " )\n", + "\n", + "## Evaluation\n", + "\n", + "# Get all faces from the testing dataset\n", + "test_imgs = test_loader.get_all_faces()\n", + "\n", + "# Call the Capsa-wrapped classifier to generate outputs: predictions, uncertainty, and bias!\n", + "predictions, uncertainty, bias = wrapped_model.predict(test_imgs, batch_size=512)" + ], "metadata": { - "id": "1dCqvPFH71Ed" + "id": "YqsBHBf3yUlm" }, - "outputs": [], - "source": [ - "test_imgs = test_loader.get_all_faces() # Get all faces from the testing dataset\n", - "predictions, _, bias = wrapped_classifier.predict(test_imgs) # use CAPSA-wrapped classifier to obtain estimates for bias and the output" - ] - }, - { - "cell_type": "code", "execution_count": null, - "metadata": { - "id": "Pt7_FlRW71Ee" - }, - "outputs": [], - "source": [ - "tf.config.list_physical_devices('GPU')" - ] + "outputs": [] }, { "cell_type": "markdown", - "metadata": { - "id": "Xtc0kjE471Ee" - }, - "source": [ - "Now, we have an estimate for the bias score! Let's visualize what the samples with the highest bias and those with the lowest bias look like. Before you run the next code block, which faces would you expect to be underrepresented in the dataset? Which ones do you think will be overrepresented?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "OYMRqq5E71Ee" - }, - "outputs": [], "source": [ - "indices = np.argsort(bias, axis=None) \n", - "sorted_images = test_imgs[indices] # sort images from lowest to highest bias\n", - "sorted_biases = bias[indices]\n", - "sorted_preds = predictions[indices]" - ] - }, - { - "cell_type": "code", - "execution_count": null, + "# 3.3 Analyzing representation bias with Capsa\n", + "\n", + "From the above output, we have an estimate for the representation bias score! We can analyze the representation scores to start to think about manifestations of bias in the facial detection dataset. 
Before you run the next code block, which faces would you expect to be underrepresented in the dataset? Which ones do you think will be overrepresented?" + ], "metadata": { - "id": "UAYaFUj-71Ee" - }, - "outputs": [], - "source": [ - "lab3.plot_k(sorted_images[:20]) # These are the samples with the lowest representation (least bias) in our test dataset" - ] + "id": "629ng-_H6WOk" + } }, { "cell_type": "code", "execution_count": null, "metadata": { - "id": "CnbR3qAF71Ef" + "id": "OYMRqq5E71Ee" }, "outputs": [], "source": [ - "lab3.plot_k(sorted_images[-20:]) # These are the samples with the highest representation (most bias) in our test dataset" + "### Analyzing representation bias scores ###\n", + "\n", + "# Sort according to lowest to highest representation scores\n", + "indices = np.argsort(bias, axis=None) # sort the score values themselves\n", + "sorted_images = test_imgs[indices] # sort images from lowest to highest representations\n", + "sorted_biases = bias[indices] # order the representation bias scores\n", + "sorted_preds = predictions[indices] # order the prediction values\n", + "\n", + "\n", + "# Visualize the 20 images with the lowest and highest representation in the test dataset\n", + "fig, ax = plt.subplots(1, 2, figsize=(16, 8))\n", + "ax[0].imshow(mdl.util.create_grid_of_images(sorted_images[-20:], (4, 5)))\n", + "ax[0].set_title(\"Over-represented\")\n", + "\n", + "ax[1].imshow(mdl.util.create_grid_of_images(sorted_images[:20], (4, 5)))\n", + "ax[1].set_title(\"Under-represented\");" ] }, { @@ -499,7 +349,7 @@ "id": "-JYmGMJF71Ef" }, "source": [ - "Now, we'll spend some time looking at the bias by *percentile* in our dataset. First, let's plot the accuracy as the bias increases. Remember that we use bias to quantify the level of representation in our dataset, so increasing bias means increasing representation. How do you expect the accuracy to change?" 
+ "We can also quantify how the representation density relates to the classification accuracy by plotting the two against each other:" ] }, { @@ -510,7 +360,10 @@ }, "outputs": [], "source": [ - "averaged_imgs = lab3.plot_accuracy_vs_risk(sorted_images, sorted_biases, sorted_preds, \"Bias vs. Accuracy\")" + "# Plot the representation density vs. the accuracy\n", + "plt.xlabel(\"Density (Representation)\")\n", + "plt.ylabel(\"Accuracy\")\n", + "averaged_imgs = mdl.lab3.plot_accuracy_vs_risk(sorted_images, sorted_biases, sorted_preds, \"Bias vs. Accuracy\")" ] }, { @@ -519,19 +372,20 @@ "id": "i8ERzg2-71Ef" }, "source": [ - "Now, for a super interesting visualization, let's look at the *percentiles* of bias: what does the average face in the 10th percentile of bias look like? What about the 90th percentile? What changes across these faces?" + "These representation scores relate back to data examples, so we can visualize what the average face looks like for a given *percentile* of representation density:" ] }, { "cell_type": "code", - "execution_count": null, + "source": [ + "fig, ax = plt.subplots(figsize=(15,5))\n", + "ax.imshow(mdl.util.create_grid_of_images(averaged_imgs, (1,10)))" + ], "metadata": { - "id": "1cd590UP71Ef" + "id": "kn9IpPKYSECg" }, - "outputs": [], - "source": [ - "lab3.plot_percentile(averaged_imgs)" - ] + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", @@ -539,7 +393,13 @@ "id": "cRNV-3SU71Eg" }, "source": [ - "Now that we know what the bias in our dataset looks like, let's adaptively resample from our dataset! Since we can calculate this score on-the-fly *during training*, we can adjust the probability of samples being chosen. But first, let's also take a look at the *epistemic* uncertainty of this dataset" + "#### **TODO: Scoring representation densities with Capsa**\n", + "\n", + "Write short answers to the questions below to complete the `TODO`s:\n", + "\n", + "1. How does accuracy relate to the representation score? 
From this relationship, what can you determine about the bias underlying the dataset?\n", + "2. What does the average face in the 10th percentile of representation density look like (i.e., the face for which 10% of the data have lower probability of occurring)? What about the 90th percentile? What changes across these faces?\n", + "3. What could be potential limitations of the `HistogramVAEWrapper` approach as it is implemented now?" ] }, { @@ -548,51 +408,52 @@ "id": "ww5lx7ue71Eg" }, "source": [ - "# 3.3 Epistemic Uncertainty\n", + "# 3.4 Analyzing epistemic uncertainty with Capsa\n", "\n", - "Recall from lecture that *epistemic* uncertainty, or a model's uncertainty in its prediction, can arise from out of distribution data, or samples that are harder to learn. This does not necessarily correlate with bias! Imagine the scenario of training an object detector for self-driving cars: even if the model is presented with many cluttered scenes, these samples still may be harder to learn than scenes with very few objects in them. In this part of the lab, we'll analyze the epistemic uncertainty of the VAE that we've trained on this dataset. \n", + "Recall that *epistemic* uncertainty, or a model's uncertainty in its prediction, can arise from out-of-distribution data, missing data, or samples that are harder to learn. This does not necessarily correlate with representation bias! Imagine the scenario of training an object detector for self-driving cars: even if the model is presented with many cluttered scenes, these samples still may be harder to learn than scenes with very few objects in them.\n", "\n", - "From lecture 6, we saw that most methods of estimating epistemic uncertainty are *sampling-based*, but we can also use *reconstruction-based* methods. If a model is unable to provide a good reconstruction for a given data point, it has not learned that area of the underlying data distribution well, and therefore has high epistemic uncertainty. 
\n", + "We will now use our VAE-wrapped facial detection classifier to analyze and estimate the epistemic uncertainty of the model trained on the facial detection task.\n", "\n", - "Since we've already used a VAE to calculate the histograms for bias quantification, we can use the same VAE to shed insight into epistemic uncertainty! CAPSA helps us do exactly that: when call the model, we get the bias, reconstruction loss, and prediction for every sample." + "While most methods of estimating epistemic uncertainty are *sampling-based*, we can also use ***reconstruction-based*** methods -- like using VAEs -- to estimate epistemic uncertainty. If a model is unable to provide a good reconstruction for a given data point, it has not learned that area of the underlying data distribution well, and therefore has high epistemic uncertainty.\n", + "\n" ] }, { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "AwGPvdZm71Eg" - }, - "outputs": [], + "cell_type": "markdown", "source": [ - "predictions, reconstruction_loss, bias = wrapped_classifier.predict(test_imgs) # note that we're estimating both bias and uncertainty in a single shot!\n", + "Since we've already used the `HistogramVAEWrapper` to calculate the histograms for representation bias quantification, we can use the exact same VAE wrapper to shed insight into epistemic uncertainty! Capsa helps us do exactly that. 
When we called the model, we returned the classification prediction, uncertainty, and bias for every sample:\n", + "`predictions, uncertainty, bias = wrapped_model.predict(test_imgs, batch_size=512)`.\n", "\n", - "epistemic_indices = np.argsort(reconstruction_loss, axis=None) \n", - "epistemic_images = test_imgs[epistemic_indices] # sort images by reconstruction loss this time!\n", - "sorted_epistemic = reconstruction_loss[epistemic_indices]\n", - "sorted_epistemic_preds = predictions[epistemic_indices]" - ] - }, - { - "cell_type": "code", - "execution_count": null, + "Let's analyze these estimated uncertainties:" + ], "metadata": { - "id": "kB8Iqrfb71Eg" - }, - "outputs": [], - "source": [ - "lab3.plot_k(epistemic_images[:20]) # samples with the LEAST epistemic uncertainty" - ] + "id": "NEfeWo2p7wKm" + } }, { "cell_type": "code", "execution_count": null, "metadata": { - "id": "miu5h2Pc71Eh" + "id": "AwGPvdZm71Eg" }, "outputs": [], "source": [ - "lab3.plot_k(epistemic_images[-20:]) # samples with the MOST epistemic uncertainty" + "### Analyzing epistemic uncertainty estimates ###\n", + "\n", + "# Sort according to epistemic uncertainty estimates\n", + "epistemic_indices = np.argsort(uncertainty, axis=None) # sort the uncertainty values\n", + "epistemic_images = test_imgs[epistemic_indices] # sort images from lowest to highest uncertainty\n", + "sorted_epistemic = uncertainty[epistemic_indices] # order the uncertainty scores\n", + "sorted_epistemic_preds = predictions[epistemic_indices] # order the prediction values\n", + "\n", + "\n", + "# Visualize the 20 images with the LEAST and MOST epistemic uncertainty\n", + "fig, ax = plt.subplots(1, 2, figsize=(16, 8))\n", + "ax[0].imshow(mdl.util.create_grid_of_images(epistemic_images[:20], (4, 5)))\n", + "ax[0].set_title(\"Least Uncertain\");\n", + "\n", + "ax[1].imshow(mdl.util.create_grid_of_images(epistemic_images[-20:], (4, 5)))\n", + "ax[1].set_title(\"Most Uncertain\");" ] }, { @@ -601,7 +462,7 @@ "id": 
"L0dA8EyX71Eh" }, "source": [ - "Let's run the same analysis: check how the accuracy varies with epistemic uncertainty!" + "We quantify how the epistemic uncertainty relates to the classification accuracy by plotting the two against each other:" ] }, { @@ -612,7 +473,10 @@ }, "outputs": [], "source": [ - "_ = lab3.plot_accuracy_vs_risk(epistemic_images, sorted_epistemic, sorted_epistemic_preds, \"Epistemic Uncertainty vs. Accuracy\")" + "# Plot epistemic uncertainty vs. classification accuracy\n", + "plt.xlabel(\"Epistemic Uncertainty\")\n", + "plt.ylabel(\"Accuracy\")\n", + "_ = mdl.lab3.plot_accuracy_vs_risk(epistemic_images, sorted_epistemic, sorted_epistemic_preds, \"Epistemic Uncertainty vs. Accuracy\")" ] }, { @@ -621,7 +485,13 @@ "id": "iyn0IE6x71Eh" }, "source": [ - "How do these compare to the bias plots? Was this expected or unexpected?" + "#### **TODO: Estimating epistemic uncertainties with Capsa**\n", + "\n", + "Write short answers to the questions below to complete the `TODO`s:\n", + "\n", + "1. How does accuracy relate to the epistemic uncertainty?\n", + "2. How do the results for epistemic uncertainty compare to the results for representation bias? Was this expected or unexpted? Why?\n", + "3. What may be instances in the facial detection task that could have high representation density but also high uncertainty? " ] }, { @@ -632,67 +502,15 @@ "source": [ "# 3.4 Resampling based on risk metrics\n", "\n", - "Finally, let's use both the bias score and the reconstruction loss to adaptively resample from our dataset. Since we can calculate this score on-the-fly *during training*, we can adjust the probability of samples being chosen. 
\n", + "Finally, we will use the risk metrics just computed to actually *mitigate* the issues of bias and uncertainty in the facial detection classifier.\n", "\n", - "Note that we want to debias and amplify only the *positive* samples in the dataset, so we're going to only adjust probabilities and calculate scores for these samples. \n", + "Specifically, we will use the latent variables learned via the VAE to adaptively re-sample the face (CelebA) data during training, following the approach of [recent work](http://introtodeeplearning.com/AAAI_MitigatingAlgorithmicBias.pdf). We will alter the probability that a given image is used during training based on how often its latent features appear in the dataset. So, faces with rarer features (like dark skin, sunglasses, or hats) should become more likely to be sampled during training, while the sampling probability for faces with features that are over-represented in the training dataset should decrease (relative to uniform random sampling across the training data).\n", "\n", - "We want to *amplify*, or increase the probability of sampling, of images with high epistemic uncertainty, since these data points come from areas of the latent distribution that the model hasn't learned very well yet. We also want to amplify images with very low representation bias, since otherwise, the model won't see enough of these samples during training. Let's define two functions below to do this:" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "hRL5nUBs71Ei" - }, - "source": [ - "First, let's do this for the bias. We have a smoothing parameter `alpha` that we can tune: as `alpha` increases, the probabilities will tend towards a uniform distribution, and as `alpha` decreases, the probabilities will correlate more directly with the bias. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "0wR2bMw571Ei" - }, - "outputs": [], - "source": [ - "def score_to_probability_bias(score, alpha):\n", - " score = score + alpha\n", - " probabilities = 1/score\n", - " probabilities = probabilities/sum(probabilities)\n", - " return probabilities" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "TUs-0O_v71Ei" - }, - "source": [ - "Let's now define a similar function for the epistemic probabilities: note that in this case, we want high epistemic uncertainty to correlate with a higher probability!" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "hLWGKvc971Ei" - }, - "outputs": [], - "source": [ - "def score_to_probability_epistemic(score, beta):\n", - " score = score + beta\n", - " probabilities = score/sum(score)\n", - " return probabilities" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "meZxtxFS71Ei" - }, - "source": [ - "Now, let's redefine and re-train our debiasing model!" + "Note that we want to debias and amplify only the *positive* samples in the dataset -- the faces -- so we are going to only adjust probabilities and calculate scores for these samples. We focus on using the representation bias scores to implement this adaptive resampling to achieve model debiasing.\n", + "\n", + "We re-define the wrapped model with `HistogramVAEWrapper`, and then define the adaptive resampling operation for training. At each training epoch, we compute the predictions, uncertainties, and representation bias scores, then recompute the data sampling probabilities according to the *inverse* of the representation bias score. That is, samples with higher representation densities will end up with lower re-sampling probabilities; samples with lower representations will end up with higher re-sampling probabilities.\n", + "\n", + "Let's do all this below!" 
] }, { @@ -703,11 +521,19 @@ }, "outputs": [], "source": [ - "standard_classifier = make_standard_classifier()\n", - "dbvae = HistogramVAEWrapper(standard_classifier, latent_dim=100, num_bins=5, queue_size=2000, decoder=make_face_decoder_network())\n", - "dbvae.compile(optimizer=tf.keras.optimizers.Adam(1e-4),\n", - " loss=tf.keras.losses.BinaryCrossentropy(),\n", - " metrics=[tf.keras.metrics.BinaryAccuracy()])\n", + "### Define the standard CNN classifier and wrap with HistogramVAE ###\n", + "\n", + "classifier = make_standard_classifier()\n", + "# Wrap with HistogramVAE\n", + "wrapper = capsa.HistogramVAEWrapper(classifier, latent_dim=32, num_bins=5, \n", + " queue_size=2000, decoder=make_face_decoder_network())\n", + "\n", + "# Build the wrapped model for the classification task\n", + "wrapper.compile(optimizer=tf.keras.optimizers.Adam(5e-4),\n", + " loss=tf.keras.losses.BinaryCrossentropy(),\n", + " metrics=[tf.keras.metrics.BinaryAccuracy()])\n", + "\n", + "# Load training data\n", "train_imgs = train_loader.get_all_faces()" ] }, @@ -719,26 +545,32 @@ }, "outputs": [], "source": [ - "# The training loop -- outer loop iterates over the number of epochs\n", - "for i in range(6):\n", + "### Debiasing via resampling based on risk metrics ###\n", "\n", - " print(\"Starting epoch {}/{}\".format(i+1, 6))\n", + "# The training loop -- outer loop iterates over the number of epochs\n", + "num_epochs = 6\n", + "for i in range(num_epochs):\n", + " print(\"Starting epoch {}/{}\".format(i+1, num_epochs))\n", " \n", - " # get a batch of training data and compute the training step\n", + " # Get a batch of training data and compute the training step\n", " for step, data in enumerate(train_loader):\n", - " metrics = dbvae.train_step(data)\n", + " metrics = wrapper.train_step(data)\n", " if step % 100 == 0:\n", " print(step)\n", - " _, recon_loss, bias_scores = dbvae(train_imgs)\n", - " recon_loss = np.squeeze(recon_loss)\n", - "\n", - " # Recompute data sampling 
proabilities\n", - " p_faces = score_to_probability_bias(bias_scores.numpy(), 1e-7)\n", - " p_recon = score_to_probability_epistemic(recon_loss, 1e-7)\n", - " p_final = (p_faces + p_recon)/2\n", - " p_final /= sum(p_final)\n", - " \n", - " train_loader.p_pos = p_final" + "\n", + " # After the epoch is done, recompute data sampling probabilities \n", + " # according to the inverse of the bias\n", + " pred, unc, bias = wrapper(train_imgs)\n", + "\n", + " # Increase the probability of sampling under-represented datapoints by setting \n", + " # the probability to the **inverse** of the biases\n", + " inverse_bias = 1.0 / (bias.numpy() + 1e-7)\n", + "\n", + " # Normalize the inverse biases in order to convert them to probabilities\n", + " p_faces = inverse_bias / np.sum(inverse_bias)\n", + "\n", + " # Update the training data loader to sample according to this new distribution\n", + " train_loader.p_pos = p_faces" ] }, { @@ -747,18 +579,11 @@ "id": "SwXrAeBo71Ej" }, "source": [ - "Now, we should have a debiased model that also mitigates some forms of uncertainty! Let's see how well our model does:" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "MXiB-DMH71Ej" - }, - "source": [ - "# 3.5 Evaluation\n", + "That's it! We should have a debiased model (we hope!). Let's see how the model does.\n", "\n", - "Let's run the same analyses as before, and plot the accuracy vs. the bias and accuracy vs. epistemic uncertainty. We want the model to do better on less biased and more uncertain samples than it did previously\n" + "### Evaluation\n", + "\n", + "Let's run the same analyses as before, and plot the classification accuracy vs. the representation bias and classification accuracy vs. epistemic uncertainty. 
We want the model to do better across the data samples, achieving higher accuracies on the under-represented and more uncertain samples compared to previously.\n" ] }, { @@ -769,37 +594,21 @@ }, "outputs": [], "source": [ - "predictions, reconstruction_loss, bias = dbvae.predict(test_imgs)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "zCXVIsaJ71Ej" - }, - "outputs": [], - "source": [ + "### Evaluation of debiased model ###\n", + "\n", + "# Get classification predictions, uncertainties, and representation bias scores\n", + "pred, unc, bias = wrapper.predict(test_imgs)\n", + "\n", + "# Sort according to lowest to highest representation scores\n", "indices = np.argsort(bias, axis=None)\n", - "bias_images = test_imgs[indices]\n", - "sorted_bias = bias[indices]\n", - "sorted_bias_preds = predictions[indices]\n", - "_ = lab3.plot_accuracy_vs_risk(bias_images, sorted_bias, sorted_bias_preds, \"Bias vs. Accuracy\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "P6p2j_xa71Ej" - }, - "outputs": [], - "source": [ - "indices = np.argsort(reconstruction_loss, axis=None)\n", - "epistemic_images = test_imgs[indices]\n", - "sorted_epistemic = bias[indices]\n", - "sorted_epistemic_preds = predictions[indices]\n", - "_ = lab3.plot_accuracy_vs_risk(epistemic_images, sorted_epistemic, sorted_epistemic_preds, \"Epistemic Uncertainty vs. Accuracy\")" + "bias_images = test_imgs[indices] # sort the images\n", + "sorted_bias = bias[indices] # sort the representation bias scores\n", + "sorted_bias_preds = pred[indices] # sort the predictions\n", + "\n", + "# Plot the representation bias vs. the accuracy\n", + "plt.xlabel(\"Density (Representation)\")\n", + "plt.ylabel(\"Accuracy\")\n", + "_ = mdl.lab3.plot_accuracy_vs_risk(bias_images, sorted_bias, sorted_bias_preds, \"Bias vs. 
Accuracy\")" ] }, { @@ -808,26 +617,45 @@ "id": "d1cEEnII71Ej" }, "source": [ - "# 3.6 Conclusion\n", + "# 3.5 Competition!\n", + "\n", + "Now, you are well equipped to submit to the competition to dig deeper into deep learning models, uncover their deficiencies with Capsa, address those deficiencies, and submit your findings!\n", + "\n", + "**Below are some potential areas to start investigating -- the goal of the competition is to develop creative and innovative solutions to address bias and uncertainty, and to improve the overall performance of deep learning models.**\n", + "\n", + "We encourage you to identify other questions that could be solved with Capsa and use those as the basis of your submission. But, to help get you started, here are some interesting questions that you might look into solving with these new tools and knowledge that you've built up: \n", + "\n", + "1. In this lab, you learned how to build a wrapper that can estimate the bias within the training data, and take the results from this wrapper to adaptively re-sample during training to encourage learning on under-represented data. \n", + " * Can we apply a similar approach to mitigate epistemic uncertainty in the model? \n", + " * Can this approach be combined with your original bias mitigation approach to achieve robustness across both bias *and* uncertainty? \n", + "\n", + "2. In this lab, you focused on the `HistogramVAEWrapper`. \n", + " * How can you use other methods of uncertainty in Capsa to strengthen your uncertainty estimates? Check out the [Capsa documentation](https://themisai.io/capsa/api_documentation/index.html) for a list of all wrappers, and ask for help if you run into trouble applying them to your model!\n", + " * Can you combine uncertainty estimates from different wrappers to achieve greater robustness in your estimates? \n", + "\n", + "3. So far in this part of the lab, we focused only on bias and epistemic uncertainty. What about aleatoric uncertainty? 
\n", + " * We've curated a dataset (available at [this URL](https://www.dropbox.com/s/wsdyma8a340k8lw/train_face_2023_perturbed_large.h5?dl=0)) of faces with greater amounts of aleatoric uncertainty -- can you use Capsa to wrap your model, estimate aleatoric uncertainty, and remove it from the dataset? \n", + " * Does removing aleatoric uncertainty help improve your training accuracy on this new dataset? \n", + " * Can you develop an approach to incorporate this aleatoric uncertainty estimation into the predictive training pipeline in order to improve accuracy? You may find some surprising results!!\n", "\n", - "We encourage you to think about and maybe even address some questions raised by the approach and results outlined here:\n", + "4. How can the performance of the classifier above be improved even further? We purposely did not optimize hyperparameters to leave this up to you!\n", "\n", - "* We did not analyze the *aleatoric* uncertainty of the above dataset. Try to develop a similar approach (assigning probabilities based on aleatoric uncertainty) and incorporate this as well! You may find some surprising results :)\n", + "5. Are there other applications that you think Capsa and bias/uncertainty estimation would be helpful in? \n", + " * Try integrating Capsa into another domain or dataset and submit your findings!\n", + " * Are there applications where you may *not* want to debias your model? \n", "\n", - "* How can the performance of the classifier above be improved even further? We purposely did not optimize hyperparameters to leave this up to you!\n", "\n", - "* How can you use other methods of uncertainty in CAPSA to strengthen your uncertainty estimates?\n", + "**To enter the competition, please upload the following to the [lab submission site](https://www.dropbox.com/request/TTYz3Ikx5wIgOITmm5i2):**\n", "\n", - "* In which applications (either related to facial detection or not!) would debiasing in this way be desired? 
Are there applications where you may not want to debias your model?\n", + "* Written short-answer responses to `TODO`s from Lab 2, Part 2 on Facial Detection.\n", + "* Description of the wrappers, algorithms, and approach you used. What was your strategy? What wrappers did you implement? What debiasing or mitigation strategies did you try? How and why did these modifications affect performance? Describe *any* modifications or implementations you made to the template code, and what their effects were. Written text, visual diagram, and plots welcome!\n", + "* Jupyter notebook with the code you used to generate your results (along with all plots/visuals generated).\n", "\n", - "* Try to optimize your model to achieve improved performance. MIT students and affiliates will be eligible for prizes during the IAP offering. To enter the competition, MIT students and affiliates should upload the following to the course Canvas:\n", + "**Name your file in the following format: `[FirstName]_[LastName]_Face`, followed by the file format (.zip, .ipynb, .pdf, etc).** ZIP files are preferred over individual files. If you submit individual files, you must name the individual files according to the above nomenclature (e.g., `[FirstName]_[LastName]_Face_TODO.pdf`, `[FirstName]_[LastName]_Face_Report.pdf`, etc.). 
**Submit your files [here](https://www.dropbox.com/request/TTYz3Ikx5wIgOITmm5i2).**\n", "\n", - "* Jupyter notebook with the code you used to generate your results;\n", - "copy of the line plots from section 3.5 showing the performance of your model;\n", - "* a description and/or diagram of the architecture and hyperparameters you used -- if there are any additional or interesting modifications you made to the template code, please include these in your description;\n", - "* discussion of why these modifications helped improve performance.\n", + "We encourage you to think about and maybe even address some questions raised by this lab and dig into any questions that you may have about the risks inherent to neural networks and their data. \n", "\n", - "Hopefully this lab has shed some light on a few concepts, from vision based tasks, to VAEs, to algorithmic bias. We like to think it has, but we're biased ;)." + "<img src=\"https://i.ibb.co/BjLSRMM/ezgif-2-253dfd3f9097.gif\" />" ] } ], diff --git a/lab3/solutions/Lab3_Part_1_Introduction_to_CAPSA.ipynb b/lab3/solutions/Lab3_Part_1_Introduction_to_CAPSA.ipynb index c3d6d3fd..09b73c8b 100644 --- a/lab3/solutions/Lab3_Part_1_Introduction_to_CAPSA.ipynb +++ b/lab3/solutions/Lab3_Part_1_Introduction_to_CAPSA.ipynb @@ -2,37 +2,77 @@ "cells": [ { "cell_type": "markdown", + "source": [ + "<table align=\"center\">\n", + " <td align=\"center\"><a target=\"_blank\" href=\"http://introtodeeplearning.com\">\n", + " <img src=\"https://i.ibb.co/Jr88sn2/mit.png\" style=\"padding-bottom:5px;\" />\n", + " Visit MIT Deep Learning</a></td>\n", + " <td align=\"center\"><a target=\"_blank\" href=\"https://colab.research.google.com/github/aamini/introtodeeplearning/blob/2023/lab3/solutions/Lab3_Part_1_Introduction_to_CAPSA.ipynb\">\n", + " <img src=\"https://i.ibb.co/2P3SLwK/colab.png\" style=\"padding-bottom:5px;\" />Run in Google Colab</a></td>\n", + " <td align=\"center\"><a target=\"_blank\" 
href=\"https://github.com/aamini/introtodeeplearning/blob/2023/lab3/solutions/Lab3_Part_1_Introduction_to_CAPSA.ipynb\">\n", + " <img src=\"https://i.ibb.co/xfJbPmL/github.png\" height=\"70px\" style=\"padding-bottom:5px;\" />View Source on GitHub</a></td>\n", + "</table>\n", + "\n", + "# Copyright Information" + ], "metadata": { - "id": "ckzz5Hus-hJB" - }, + "id": "SWa-rLfIlTaf" + } + }, + { + "cell_type": "code", "source": [ - "## Part 1: Introduction to CAPSA" - ] + "# Copyright 2023 MIT Introduction to Deep Learning. All Rights Reserved.\n", + "# \n", + "# Licensed under the MIT License. You may not use this file except in compliance\n", + "# with the License. Use and/or modification of this code outside of MIT Introduction\n", + "# to Deep Learning must reference:\n", + "#\n", + "# © MIT Introduction to Deep Learning\n", + "# http://introtodeeplearning.com\n", + "#" + ], + "metadata": { + "id": "-LohleBMlahL" + }, + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", "metadata": { - "id": "gTpt_Hj5j-FZ" + "id": "ckzz5Hus-hJB" }, "source": [ - "As we saw in lecture 6, it is critical to be able to estimate bias and uncertainty robustly: we need benchmarks that uniformly measure how uncertain a given model is, and we need principled ways of measuring bias and uncertainty. To that end, in this lab, we'll utilize [CAPSA](https://github.com/themis-ai/capsa), a risk-estimation wrapping library developed by [Themis AI](https://themisai.io/). CAPSA supports the estimation of three different types of *risk*, defined as measures of how trustworthy our model is. These are:\n", - "1. Representation bias: using a histogram estimation approach, CAPSA calculates how likely combinations of features are to appear in a given dataset. Often, certain combinations of features are severely underrepresented in datasets, which means models learn them less well. 
Since evaluation metrics are often also biased in the same manner, these biases are not caught through traditional validation pipelines.\n", - "2. Aleatoric uncertainty: we can estimate the uncertainty in *data* by learning a layer that predicts a standard deviation for every input. This is useful to determine when sensors have noise, classes in datasets have low separations, and generally when very similar inputs lead to drastically different outputs.\n", - "3. Epistemic uncertainty: also known as predictive or model uncertainty, epistemic uncertainty captures the areas of our underlying data distribution that the model has not yet learned. Areas of high epistemic uncertainty can be due to out of distribution (OOD) samples or data that is harder to learn.\n" + "# Laboratory 3: Debiasing, Uncertainty, and Robustness\n", + "\n", + "# Part 1: Introduction to Capsa\n", + "\n", + "In this lab, we'll explore different ways to make deep learning models more **robust** and **trustworthy**.\n", + "\n", + "To achieve this it is critical to be able to identify and diagnose issues of bias and uncertainty in deep learning models, as we explored in the Facial Detection Lab 2. We need benchmarks that uniformly measure how uncertain a given model is, and we need principled ways of measuring bias and uncertainty. To that end, in this lab, we'll utilize [Capsa](https://github.com/themis-ai/capsa), a risk-estimation wrapping library developed by [Themis AI](https://themisai.io/). Capsa supports the estimation of three different types of ***risk***, defined as measures of how robust and trustworthy our model is. These are:\n", + "1. **Representation bias**: reflects how likely combinations of features are to appear in a given dataset. Often, certain combinations of features are severely under-represented in datasets, which means models learn them less well and can thus lead to unwanted bias.\n", + "2. 
**Data uncertainty**: reflects noise in the data, for example when sensors have noisy measurements, classes in datasets have low separations, and generally when very similar inputs lead to drastically different outputs. Also known as *aleatoric* uncertainty. \n", + "3. **Model uncertainty**: captures the areas of our underlying data distribution that the model has not yet learned or has difficulty learning. Areas of high model uncertainty can be due to out-of-distribution (OOD) samples or data that is harder to learn. Also known as *epistemic* uncertainty." ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "id": "o02MyoDrnNqP" }, "source": [ - "The core ideology behind CAPSA is that models can be *wrapped* in a way that makes them *risk-aware*. \n", + "## CAPSA overview\n", + "\n", + "This lab introduces Capsa and its functionalities, to next build automated tools that use Capsa to mitigate the underlying issues of bias and uncertainty.\n", + "\n", + "The core idea behind [Capsa](https://themisai.io/capsa/) is that any deep learning model of interest can be ***wrapped*** -- just like wrapping a gift -- to be made ***aware of its own risks***. Risk is captured in representation bias, data uncertainty, and model uncertainty.\n", "\n", "\n", "\n", - "This means that CAPSA augments or modifies the user's original model minimally to create a risk-aware variant while preserving the model's underlying structure and training pipeline. CAPSA is a one-line addition to any training workflow in Tensorflow. In this part of the lab, we'll apply CAPSA's risk estimation methods to a toy regression task to further explore the notions of bias and uncertainty. " + "This means that Capsa takes the user's original model as input, and modifies it minimally to create a risk-aware variant while preserving the model's underlying structure and training pipeline. Capsa is a one-line addition to any training workflow in TensorFlow. 
In this part of the lab, we'll apply Capsa's risk estimation methods to a simple regression problem to further explore the notions of bias and uncertainty. \n", + "\n", + "Please refer to [Capsa's documentation](https://themisai.io/capsa/) for additional details." ] }, { @@ -41,51 +81,34 @@ "id": "hF0uSqk-nwmA" }, "source": [ - "Let's first install necessary dependencies:" + "Let's get started by installing the necessary dependencies:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "NdXF4Reyj6yy", - "outputId": "e21a92b6-cb80-4da3-9b25-f447bf28482b" + "id": "NdXF4Reyj6yy" }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n", - "Requirement already satisfied: capsa in /usr/local/lib/python3.8/dist-packages (0.1.2)\n", - "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n", - "Requirement already satisfied: mitdeeplearning in /usr/local/lib/python3.8/dist-packages (0.2.0)\n", - "Requirement already satisfied: gym in /usr/local/lib/python3.8/dist-packages (from mitdeeplearning) (0.25.2)\n", - "Requirement already satisfied: numpy in /usr/local/lib/python3.8/dist-packages (from mitdeeplearning) (1.21.6)\n", - "Requirement already satisfied: regex in /usr/local/lib/python3.8/dist-packages (from mitdeeplearning) (2022.6.2)\n", - "Requirement already satisfied: tqdm in /usr/local/lib/python3.8/dist-packages (from mitdeeplearning) (4.64.1)\n", - "Requirement already satisfied: importlib-metadata>=4.8.0 in /usr/local/lib/python3.8/dist-packages (from gym->mitdeeplearning) (5.2.0)\n", - "Requirement already satisfied: gym-notices>=0.0.4 in /usr/local/lib/python3.8/dist-packages (from gym->mitdeeplearning) (0.0.8)\n", - "Requirement already satisfied: cloudpickle>=1.2.0 in /usr/local/lib/python3.8/dist-packages 
(from gym->mitdeeplearning) (1.5.0)\n", - "Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.8/dist-packages (from importlib-metadata>=4.8.0->gym->mitdeeplearning) (3.11.0)\n" - ] - } - ], + "outputs": [], "source": [ + "# Import Tensorflow 2.0\n", + "%tensorflow_version 2.x\n", "import tensorflow as tf\n", - "import numpy as np\n", - "import matplotlib.pyplot as plt\n", - "!pip install capsa\n", "\n", - "from capsa import *\n", - "from helper import gen_data_regression\n", + "import IPython\n", + "import functools\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from tqdm import tqdm\n", "\n", + "# Download and import the MIT Introduction to Deep Learning package\n", "!pip install mitdeeplearning\n", "import mitdeeplearning as mdl\n", - "import tqdm" + "\n", + "# Download and import Capsa\n", + "!pip install capsa\n", + "import capsa" ] }, { @@ -94,61 +117,70 @@ "id": "xzEcxjKHn8gc" }, "source": [ - "### 1.1 Datasets \n", - "Next, let's construct a dataset that we'll analyze. As shown in lecture, we'll look at the curve `y = x^3` with epistemic and aleatoric noise added to certain parts of the dataset. The blue points below are the test data: note that there are regions where we have no train data but we have test data! Do you expect these areas to have higher or lower uncertainty? What type of uncertainty?" + "## 1.1 Dataset\n", + "\n", + "We will build understanding of bias and uncertainty by training a neural network for a simple 2D regression task: modeling the function $y = x^3$. We will use Capsa to analyze this dataset and the performance of the model. 
Noise and missing-ness will be injected into the dataset.\n", + "\n", + "Let's generate the dataset and visualize it:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 265 - }, - "id": "fH40EhC1j9dH", - "outputId": "c6936767-2162-4b6c-b430-e5717c70bb75" + "id": "fH40EhC1j9dH" }, - "outputs": [ - { - "data": { - "image/png": "", - "text/plain": [ - "<Figure size 432x288 with 1 Axes>" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], + "outputs": [], "source": [ + "# Get the data for the cubic function, injected with noise and missing-ness\n", + "# This is just a toy dataset that we can use to test some of the wrappers on\n", "def gen_data(x_min, x_max, n, train=True):\n", + " if train: \n", " x = np.random.triangular(x_min, 2, x_max, size=(n, 1))\n", + " else: \n", + " x = np.linspace(x_min, x_max, n).reshape(n, 1)\n", "\n", - " sigma = np.exp(-(x+1)**2/1) + 0.2 if train else np.zeros_like(x)\n", - " y = x**3/6 + np.random.normal(0, sigma).astype(np.float32)\n", + " sigma = 2*np.exp(-(x+1)**2/1) + 0.2 if train else np.zeros_like(x)\n", + " y = x**3/6 + np.random.normal(0, sigma).astype(np.float32)\n", "\n", - " return x, y\n", + " return x, y\n", "\n", - "x, y = gen_data(-4, 4, 2000)\n", - "x_val, y_val = gen_data(-6, 6, 500)\n", - "plt.scatter(x_val,y_val, s=1.5, label='test data')\n", - "plt.scatter(x,y, s=1.5, label='train data')\n", + "# Plot the dataset and visualize the train and test datapoints\n", + "x_train, y_train = gen_data(-4, 4, 2000, train=True) # train data\n", + "x_test, y_test = gen_data(-6, 6, 500, train=False) # test data\n", "\n", - "plt.legend()\n", - "plt.show()" + "plt.figure(figsize=(10, 6))\n", + "plt.plot(x_test, y_test, c='r', zorder=-1, label='ground truth')\n", + "plt.scatter(x_train, y_train, s=1.5, label='train data')\n", + "plt.legend()" ] }, + { + "cell_type": "markdown", + "source": [ + "In the plot 
above, the blue points are the training data, which will be used as inputs to train the neural network model. The red line is the ground truth data, which will be used to evaluate the performance of the model.\n", + "\n", + "#### **TODO: Inspecting the 2D regression dataset**\n", + "\n", + " Write short (~1 sentence) answers to the questions below to complete the `TODO`s:\n", + "\n", + "1. What are your observations about where the train data and test data lie relative to each other?\n", + "2. What, if any, areas do you expect to have high/low aleatoric (data) uncertainty?\n", + "3. What, if any, areas do you expect to have high/low epistemic (model) uncertainty?" + ], + "metadata": { + "id": "Fz3UxT8vuN95" + } + }, { "cell_type": "markdown", "metadata": { "id": "mXMOYRHnv8tF" }, "source": [ - "### 1.2 Vanilla regression\n", - "Let's define a small model that can predict `y` given `x`: this is a classical regression task!" + "## 1.2 Regression on cubic dataset\n", + "\n", + "Next we will define a small dense neural network model that can predict `y` given `x`: this is a classical regression task! We will build the model and use the [`model.fit()`](https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit) function to train the model -- normally, without any risk-awareness -- using the train dataset that we visualized above." 
] }, { @@ -159,17 +191,30 @@ }, "outputs": [], "source": [ - "def create_standard_classifier():\n", + "### Define and train a dense NN model for the regression task###\n", + "\n", + "'''Function to define a small dense NN'''\n", + "def create_dense_NN():\n", " return tf.keras.Sequential(\n", " [\n", " tf.keras.Input(shape=(1,)),\n", - " tf.keras.layers.Dense(8, \"relu\"),\n", - " tf.keras.layers.Dense(8, \"relu\"),\n", + " tf.keras.layers.Dense(32, \"relu\"),\n", + " tf.keras.layers.Dense(32, \"relu\"),\n", + " tf.keras.layers.Dense(32, \"relu\"),\n", " tf.keras.layers.Dense(1),\n", " ]\n", " )\n", "\n", - "standard_classifier = create_standard_classifier()" + "dense_NN = create_dense_NN()\n", + "\n", + "# Build the model for regression, defining the loss function and optimizer\n", + "dense_NN.compile(\n", + " optimizer=tf.keras.optimizers.Adam(learning_rate=5e-3),\n", + " loss=tf.keras.losses.MeanSquaredError(), # MSE loss for the regression task\n", + ")\n", + "\n", + "# Train the model for 30 epochs using model.fit().\n", + "loss_history = dense_NN.fit(x_train, y_train, epochs=30)" ] }, { @@ -178,96 +223,44 @@ "id": "ovwYBUG3wTDv" }, "source": [ - "Let's first train this model normally, without any wrapping. Which areas would you expect the model to do well in? Which areas should it do worse in?" + "Now, we are ready to evaluate our neural network. We use the test data to assess performance on the regression task, and visualize the predicted values against the true values.\n", + "\n", + "Given your observation of the data in the previous plot, where do you expect the model to perform well? 
Let's test the model and see:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "oPNxsGBRwaNA", - "outputId": "0598cef9-350c-4785-a7a9-51ed3b54fd4b" + "id": "fb-EklZywR4D" }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Epoch 1/10\n", - "63/63 [==============================] - 1s 2ms/step - loss: 5.5708\n", - "Epoch 2/10\n", - "63/63 [==============================] - 0s 2ms/step - loss: 4.3687\n", - "Epoch 3/10\n", - "63/63 [==============================] - 0s 2ms/step - loss: 3.9064\n", - "Epoch 4/10\n", - "63/63 [==============================] - 0s 2ms/step - loss: 3.1653\n", - "Epoch 5/10\n", - "63/63 [==============================] - 0s 4ms/step - loss: 2.1027\n", - "Epoch 6/10\n", - "63/63 [==============================] - 0s 3ms/step - loss: 1.6488\n", - "Epoch 7/10\n", - "63/63 [==============================] - 0s 2ms/step - loss: 1.3093\n", - "Epoch 8/10\n", - "63/63 [==============================] - 0s 2ms/step - loss: 1.1078\n", - "Epoch 9/10\n", - "63/63 [==============================] - 0s 2ms/step - loss: 0.9919\n", - "Epoch 10/10\n", - "63/63 [==============================] - 0s 3ms/step - loss: 0.8937\n" - ] - } - ], + "outputs": [], "source": [ - "standard_classifier.compile(\n", - " optimizer=tf.keras.optimizers.Adam(learning_rate=2e-3),\n", - " loss=tf.keras.losses.MeanSquaredError(),\n", - ")\n", + "# Pass the test data through the network and predict the y values\n", + "y_predicted = dense_NN.predict(x_test)\n", "\n", - "history = standard_classifier.fit(x, y, epochs=10)\n" + "# Visualize the true (x, y) pairs for the test data vs. 
the predicted values\n", + "plt.figure(figsize=(10, 6))\n", + "plt.scatter(x_train, y_train, s=1.5, label='train data')\n", + "plt.plot(x_test, y_test, c='r', zorder=-1, label='ground truth')\n", + "plt.plot(x_test, y_predicted, c='b', zorder=0, label='predicted')\n", + "plt.legend()" ] }, { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 283 - }, - "id": "fb-EklZywR4D", - "outputId": "1f913c81-fbef-43dd-a391-b7dc209055fa" - }, - "outputs": [ - { - "data": { - "text/plain": [ - "<matplotlib.legend.Legend at 0x7fe11cd8e3a0>" - ] - }, - "execution_count": 107, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "", - "text/plain": [ - "<Figure size 432x288 with 1 Axes>" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], + "cell_type": "markdown", "source": [ - "plt.scatter(x_val, y_val, s=0.5, label='truth')\n", - "plt.scatter(x_val, standard_classifier(x_val), s=0.5, label='predictions')\n", - "plt.legend()" - ] + "\n", + "#### **TODO: Analyzing the performance of standard regression model**\n", + "\n", + "Write short (~1 sentence) answers to the questions below to complete the `TODO`s:\n", + "\n", + "1. Where does the model perform well?\n", + "2. Where does the model perform poorly?" + ], + "metadata": { + "id": "7Vktjwfu0ReH" + } }, { "cell_type": "markdown", @@ -275,8 +268,13 @@ "id": "7MzvM48JyZMO" }, "source": [ - "### 1.3 Bias Identification\n", - "Now that we've seen what the predictions from this model look like, let's see what the uncertainty and bias look like! To do this, we'll wrap a model first with a `HistogramWrapper`. For low-dimensional data, the HistogramWrapper bins the input directly into discrete categories and measures the density. 
" + "## 1.3 Evaluating bias\n", + "\n", + "Now that we've seen what the predictions from this model look like, we will identify and quantify bias and uncertainty in this problem. We first consider bias.\n", + "\n", + "Recall that *representation bias* reflects how likely combinations of features are to appear in a given dataset. Capsa calculates how likely combinations of features are by using a histogram estimation approach: the `capsa.HistogramWrapper`. For low-dimensional data, the `capsa.HistogramWrapper` bins the input directly into discrete categories and measures the density. More details of the `HistogramWrapper` and how it can be used are [available here](https://themisai.io/capsa/api_documentation/HistogramWrapper.html).\n", + "\n", + "We start by taking our `dense_NN` and wrapping it with the `capsa.HistogramWrapper`:" ] }, { @@ -287,10 +285,15 @@ }, "outputs": [], "source": [ - "standard_classifier = create_standard_classifier()\n", - "bias_wrapped_classifier = HistogramWrapper(standard_classifier, \n", - " queue_size=2000, # how many samples to track\n", - " target_hidden_layer=False) # for low-dimensional data, we can estimate densities directly from data\n" + "### Wrap the dense network for bias estimation ###\n", + "\n", + "standard_dense_NN = create_dense_NN()\n", + "bias_wrapped_dense_NN = capsa.HistogramWrapper(\n", + " standard_dense_NN, # the original model\n", + " num_bins=20,\n", + " queue_size=2000, # how many samples to track\n", + " target_hidden_layer=False # for low-dimensional data (like this dataset), we can estimate biases directly from data\n", + ")" ] }, { @@ -299,108 +302,29 @@ "id": "UFHO7LKcz8uP" }, "source": [ - "Now that we've wrapped the classifier, let's re-train it to update the biases as we train. We can use the exact same training pipeline as above to accomplish this!" + "Now that we've wrapped the classifier, let's re-train it to update the bias estimates as we train. 
We can use the exact same training pipeline, using `compile` to build the model and `model.fit()` to train the model:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "SkyD3rsqy2ff", - "outputId": "7cd6b5fa-c61a-4306-faed-02a5b9d6a3e3" + "id": "SkyD3rsqy2ff" }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Epoch 1/30\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "WARNING:tensorflow:Gradients do not exist for variables ['dense_47/kernel:0', 'dense_47/bias:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss`argument?\n", - "WARNING:tensorflow:Gradients do not exist for variables ['dense_47/kernel:0', 'dense_47/bias:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss`argument?\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "63/63 [==============================] - 1s 2ms/step - histogram_compiled_loss: 3.9734 - histogram_wrapper_loss: 7.2317\n", - "Epoch 2/30\n", - "63/63 [==============================] - 0s 2ms/step - histogram_compiled_loss: 2.0948 - histogram_wrapper_loss: 4.1785\n", - "Epoch 3/30\n", - "63/63 [==============================] - 0s 6ms/step - histogram_compiled_loss: 1.6633 - histogram_wrapper_loss: 3.2580\n", - "Epoch 4/30\n", - "63/63 [==============================] - 0s 4ms/step - histogram_compiled_loss: 1.5208 - histogram_wrapper_loss: 2.8870\n", - "Epoch 5/30\n", - "63/63 [==============================] - 0s 4ms/step - histogram_compiled_loss: 1.2064 - histogram_wrapper_loss: 2.5474\n", - "Epoch 6/30\n", - "63/63 [==============================] - 0s 4ms/step - histogram_compiled_loss: 1.1682 - histogram_wrapper_loss: 2.3297\n", - "Epoch 7/30\n", - "63/63 [==============================] - 0s 3ms/step - histogram_compiled_loss: 1.0387 - histogram_wrapper_loss: 2.0440\n", 
- "Epoch 8/30\n", - "63/63 [==============================] - 0s 3ms/step - histogram_compiled_loss: 0.9051 - histogram_wrapper_loss: 1.8478\n", - "Epoch 9/30\n", - "63/63 [==============================] - 0s 3ms/step - histogram_compiled_loss: 0.8954 - histogram_wrapper_loss: 1.6332\n", - "Epoch 10/30\n", - "63/63 [==============================] - 0s 5ms/step - histogram_compiled_loss: 0.7636 - histogram_wrapper_loss: 1.5712\n", - "Epoch 11/30\n", - "63/63 [==============================] - 0s 4ms/step - histogram_compiled_loss: 0.6725 - histogram_wrapper_loss: 1.3582\n", - "Epoch 12/30\n", - "63/63 [==============================] - 0s 5ms/step - histogram_compiled_loss: 0.6783 - histogram_wrapper_loss: 1.2359\n", - "Epoch 13/30\n", - "63/63 [==============================] - 0s 4ms/step - histogram_compiled_loss: 0.6118 - histogram_wrapper_loss: 1.1157\n", - "Epoch 14/30\n", - "63/63 [==============================] - 0s 5ms/step - histogram_compiled_loss: 0.5462 - histogram_wrapper_loss: 1.0705\n", - "Epoch 15/30\n", - "63/63 [==============================] - 0s 4ms/step - histogram_compiled_loss: 0.4946 - histogram_wrapper_loss: 0.9810\n", - "Epoch 16/30\n", - "63/63 [==============================] - 0s 4ms/step - histogram_compiled_loss: 0.4712 - histogram_wrapper_loss: 0.9213\n", - "Epoch 17/30\n", - "63/63 [==============================] - 0s 4ms/step - histogram_compiled_loss: 0.4449 - histogram_wrapper_loss: 0.8751\n", - "Epoch 18/30\n", - "63/63 [==============================] - 0s 4ms/step - histogram_compiled_loss: 0.4146 - histogram_wrapper_loss: 0.8342\n", - "Epoch 19/30\n", - "63/63 [==============================] - 0s 5ms/step - histogram_compiled_loss: 0.4441 - histogram_wrapper_loss: 0.8335\n", - "Epoch 20/30\n", - "63/63 [==============================] - 0s 5ms/step - histogram_compiled_loss: 0.4050 - histogram_wrapper_loss: 0.7910\n", - "Epoch 21/30\n", - "63/63 [==============================] - 0s 4ms/step - histogram_compiled_loss: 
0.4113 - histogram_wrapper_loss: 0.7864\n", - "Epoch 22/30\n", - "63/63 [==============================] - 0s 5ms/step - histogram_compiled_loss: 0.3650 - histogram_wrapper_loss: 0.7556\n", - "Epoch 23/30\n", - "63/63 [==============================] - 0s 4ms/step - histogram_compiled_loss: 0.3521 - histogram_wrapper_loss: 0.7350\n", - "Epoch 24/30\n", - "63/63 [==============================] - 0s 4ms/step - histogram_compiled_loss: 0.3672 - histogram_wrapper_loss: 0.7575\n", - "Epoch 25/30\n", - "63/63 [==============================] - 0s 3ms/step - histogram_compiled_loss: 0.3608 - histogram_wrapper_loss: 0.7124\n", - "Epoch 26/30\n", - "63/63 [==============================] - 0s 4ms/step - histogram_compiled_loss: 0.3740 - histogram_wrapper_loss: 0.7006\n", - "Epoch 27/30\n", - "63/63 [==============================] - 0s 4ms/step - histogram_compiled_loss: 0.3691 - histogram_wrapper_loss: 0.6984\n", - "Epoch 28/30\n", - "63/63 [==============================] - 0s 5ms/step - histogram_compiled_loss: 0.3733 - histogram_wrapper_loss: 0.6849\n", - "Epoch 29/30\n", - "63/63 [==============================] - 0s 3ms/step - histogram_compiled_loss: 0.3372 - histogram_wrapper_loss: 0.6665\n", - "Epoch 30/30\n", - "63/63 [==============================] - 0s 5ms/step - histogram_compiled_loss: 0.3604 - histogram_wrapper_loss: 0.6658\n" - ] - } - ], + "outputs": [], "source": [ - "bias_wrapped_classifier.compile(\n", + "### Compile and train the wrapped model! 
###\n", + "\n", + "# Build the model for regression, defining the loss function and optimizer\n", + "bias_wrapped_dense_NN.compile(\n", " optimizer=tf.keras.optimizers.Adam(learning_rate=2e-3),\n", - " loss=tf.keras.losses.MeanSquaredError(),\n", + " loss=tf.keras.losses.MeanSquaredError(), # MSE loss for the regression task\n", ")\n", "\n", - "history = bias_wrapped_classifier.fit(x, y, epochs=30)" + "# Train the wrapped model for 30 epochs.\n", + "loss_history_bias_wrap = bias_wrapped_dense_NN.fit(x_train, y_train, epochs=30)\n", + "\n", + "print(\"Done training model with Bias Wrapper!\")" ] }, { @@ -409,213 +333,117 @@ "id": "_6iVeeqq0f_H" }, "source": [ - "To access the bias for a given testing input, we can simply call the method as we would normally. In addition to outputting the prediction, this risk-aware model now also outputs an additional bias score per output." + "We can now use our wrapped model to assess the bias for a given test input. With the wrapping capability, Capsa neatly allows us to output a *bias score* along with the predicted target value. This bias score reflects the density of data surrounding an input point -- the higher the score, the greater the data representation and density. 
The wrapped, risk-aware model outputs the predicted target and bias score after it is called!\n", + "\n", + "Let's see how it is done:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 287 - }, - "id": "tZ17eCbP0YM4", - "outputId": "4da00423-1115-4bf2-95e6-966b8697b5b6" + "id": "tZ17eCbP0YM4" }, - "outputs": [ - { - "data": { - "text/plain": [ - "<matplotlib.legend.Legend at 0x7fe11cd97190>" - ] - }, - "execution_count": 110, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "", - "text/plain": [ - "<Figure size 432x288 with 1 Axes>" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], + "outputs": [], "source": [ - "predictions, bias = bias_wrapped_classifier(np.sort(x_val))\n", - "plt.scatter(np.sort(x_val), bias, label='bias', s=0.5)\n", - "plt.legend()" + "### Generate and visualize bias scores for data in test set ###\n", + "\n", + "# Call the risk-aware model to generate scores\n", + "predictions, bias = bias_wrapped_dense_NN(x_test)\n", + "\n", + "# Visualize the relationship between the input data x and the bias\n", + "fig, ax = plt.subplots(2, 1, figsize=(8,6))\n", + "ax[0].plot(x_test, bias, label='bias')\n", + "ax[0].set_ylabel('Estimated Bias')\n", + "ax[0].legend()\n", + "\n", + "# Let's compare against the ground truth density distribution\n", + "# should roughly align with our estimated bias in this toy example\n", + "ax[1].hist(x_train, 50, label='ground truth')\n", + "ax[1].set_xlim(-6, 6)\n", + "ax[1].set_ylabel('True Density')\n", + "ax[1].legend();" ] }, { "cell_type": "markdown", - "metadata": { - "id": "PvS8xR_q27Ec" - }, "source": [ - "## 1.3 Aleatoric Estimation\n", - "Now, let's do the same thing but for aleatoric estimation! The method we use here is Mean and Variance Estimation (MVE) since we're trying to estimate both mean and variance for every input. 
As presented in lecture 5, we measure the accuracy of these predictions negative likelihood loss in addition to mean squared error. However, capsa *automatically* does this for us, so we only have to specify the loss function that we want to use for evaluating the predictions, not the uncertainty." - ] + "#### **TODO: Evaluating bias with wrapped regression model**\n", + "\n", + "Write short (~1 sentence) answers to the questions below to complete the `TODO`s:\n", + "\n", + "1. How does the bias score relate to the train/test data density from the first plot?\n", + "2. What is one limitation of the Histogram approach that simply bins the data based on frequency?" + ], + "metadata": { + "id": "HpDMT_1FERQE" + } }, { - "cell_type": "code", - "execution_count": null, + "cell_type": "markdown", "metadata": { - "id": "sxmm-2sd3G9u" + "id": "PvS8xR_q27Ec" }, - "outputs": [], "source": [ - "standard_classifier = create_standard_classifier()\n", - "mve_wrapped_classifier = MVEWrapper(standard_classifier)\n" + "## 1.4 Estimating data uncertainty\n", + "\n", + "Next we turn our attention to uncertainty, first focusing on the uncertainty in the data -- the aleatoric uncertainty.\n", + "\n", + "As introduced in Lecture 5 on Robust & Trustworthy Deep Learning, in regression we can estimate aleatoric uncertainty by training the model to predict both a target value and a variance for every input. Because we estimate both a mean and variance for every input, this method is called Mean Variance Estimation (MVE). MVE involves modifying the output layer to predict both the mean and variance, and changing the loss to reflect the prediction likelihood.\n", + "\n", + "Capsa automatically implements these changes for us: we can wrap a given model using `capsa.MVEWrapper` to use MVE to estimate aleatoric uncertainty. All we have to do is define the model and the loss function to evaluate its predictions! 
More details of the `MVEWrapper` and how it can be used are [available here](https://themisai.io/capsa/api_documentation/MVEWrapper.html).\n", + "\n", + "Let's take our standard network, wrap it with `capsa.MVEWrapper`, build the wrapped model, and then train it for the regression task. Finally, we evaluate performance of the resulting model by quantifying the aleatoric uncertainty across the data space: " ] }, { "cell_type": "code", "execution_count": null, "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "Yr0yIJEc26yM", - "outputId": "5dc23258-613b-4a83-a672-70da11ebbe81" + "id": "sxmm-2sd3G9u" }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Epoch 1/30\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "WARNING:tensorflow:Gradients do not exist for variables ['dense_62/kernel:0', 'dense_62/bias:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss`argument?\n", - "WARNING:tensorflow:Gradients do not exist for variables ['dense_62/kernel:0', 'dense_62/bias:0'] when minimizing the loss. 
If you're using `model.compile()`, did you forget to provide a `loss`argument?\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "63/63 [==============================] - 1s 2ms/step - mve_compiled_loss: 5.2917 - mve_wrapper_loss: 8.2990\n", - "Epoch 2/30\n", - "63/63 [==============================] - 0s 2ms/step - mve_compiled_loss: 1.5322 - mve_wrapper_loss: 2.2633\n", - "Epoch 3/30\n", - "63/63 [==============================] - 0s 2ms/step - mve_compiled_loss: 0.6495 - mve_wrapper_loss: 0.4517\n", - "Epoch 4/30\n", - "63/63 [==============================] - 0s 2ms/step - mve_compiled_loss: 0.4826 - mve_wrapper_loss: -0.0290\n", - "Epoch 5/30\n", - "63/63 [==============================] - 0s 2ms/step - mve_compiled_loss: 0.4571 - mve_wrapper_loss: -0.2686\n", - "Epoch 6/30\n", - "63/63 [==============================] - 0s 2ms/step - mve_compiled_loss: 0.4070 - mve_wrapper_loss: -0.3623\n", - "Epoch 7/30\n", - "63/63 [==============================] - 0s 2ms/step - mve_compiled_loss: 0.3092 - mve_wrapper_loss: -0.4281\n", - "Epoch 8/30\n", - "63/63 [==============================] - 0s 3ms/step - mve_compiled_loss: 0.3229 - mve_wrapper_loss: -0.4038\n", - "Epoch 9/30\n", - "63/63 [==============================] - 0s 2ms/step - mve_compiled_loss: 0.3236 - mve_wrapper_loss: -0.5268\n", - "Epoch 10/30\n", - "63/63 [==============================] - 0s 2ms/step - mve_compiled_loss: 0.3022 - mve_wrapper_loss: -0.5458\n", - "Epoch 11/30\n", - "63/63 [==============================] - 0s 2ms/step - mve_compiled_loss: 0.3035 - mve_wrapper_loss: -0.6220\n", - "Epoch 12/30\n", - "63/63 [==============================] - 0s 2ms/step - mve_compiled_loss: 0.3110 - mve_wrapper_loss: -0.5680\n", - "Epoch 13/30\n", - "63/63 [==============================] - 0s 2ms/step - mve_compiled_loss: 0.2720 - mve_wrapper_loss: -0.4468\n", - "Epoch 14/30\n", - "63/63 [==============================] - 0s 2ms/step - mve_compiled_loss: 0.2848 - 
mve_wrapper_loss: -0.5656\n", - "Epoch 15/30\n", - "63/63 [==============================] - 0s 2ms/step - mve_compiled_loss: 0.3078 - mve_wrapper_loss: -0.6007\n", - "Epoch 16/30\n", - "63/63 [==============================] - 0s 2ms/step - mve_compiled_loss: 0.2827 - mve_wrapper_loss: -0.6292\n", - "Epoch 17/30\n", - "63/63 [==============================] - 0s 2ms/step - mve_compiled_loss: 0.3168 - mve_wrapper_loss: -0.6420\n", - "Epoch 18/30\n", - "63/63 [==============================] - 0s 2ms/step - mve_compiled_loss: 0.2910 - mve_wrapper_loss: -0.6672\n", - "Epoch 19/30\n", - "63/63 [==============================] - 0s 2ms/step - mve_compiled_loss: 0.3076 - mve_wrapper_loss: -0.5917\n", - "Epoch 20/30\n", - "63/63 [==============================] - 0s 2ms/step - mve_compiled_loss: 0.3097 - mve_wrapper_loss: -0.6985\n", - "Epoch 21/30\n", - "63/63 [==============================] - 0s 2ms/step - mve_compiled_loss: 0.2982 - mve_wrapper_loss: -0.5248\n", - "Epoch 22/30\n", - "63/63 [==============================] - 0s 2ms/step - mve_compiled_loss: 0.2912 - mve_wrapper_loss: -0.5999\n", - "Epoch 23/30\n", - "63/63 [==============================] - 0s 2ms/step - mve_compiled_loss: 0.3003 - mve_wrapper_loss: -0.5714\n", - "Epoch 24/30\n", - "63/63 [==============================] - 0s 2ms/step - mve_compiled_loss: 0.3314 - mve_wrapper_loss: -0.6600\n", - "Epoch 25/30\n", - "63/63 [==============================] - 0s 2ms/step - mve_compiled_loss: 0.2974 - mve_wrapper_loss: -0.5685\n", - "Epoch 26/30\n", - "63/63 [==============================] - 0s 2ms/step - mve_compiled_loss: 0.3157 - mve_wrapper_loss: -0.6695\n", - "Epoch 27/30\n", - "63/63 [==============================] - 0s 2ms/step - mve_compiled_loss: 0.2832 - mve_wrapper_loss: -0.6686\n", - "Epoch 28/30\n", - "63/63 [==============================] - 0s 2ms/step - mve_compiled_loss: 0.3066 - mve_wrapper_loss: -0.6361\n", - "Epoch 29/30\n", - "63/63 [==============================] - 0s 2ms/step - 
mve_compiled_loss: 0.2880 - mve_wrapper_loss: -0.6316\n", - "Epoch 30/30\n", - "63/63 [==============================] - 0s 2ms/step - mve_compiled_loss: 0.3322 - mve_wrapper_loss: -0.5759\n" - ] - } - ], + "outputs": [], "source": [ - "mve_wrapped_classifier.compile(\n", + "### Estimating data uncertainty with Capsa wrapping ###\n", + "\n", + "standard_dense_NN = create_dense_NN()\n", + "# Wrap the dense network for aleatoric uncertainty estimation\n", + "mve_wrapped_NN = capsa.MVEWrapper(standard_dense_NN)\n", + "\n", + "# Build the model for regression, defining the loss function and optimizer\n", + "mve_wrapped_NN.compile(\n", " optimizer=tf.keras.optimizers.Adam(learning_rate=1e-2),\n", - " loss=tf.keras.losses.MeanSquaredError(),\n", + " loss=tf.keras.losses.MeanSquaredError(), # MSE loss for the regression task\n", ")\n", "\n", - "history = mve_wrapped_classifier.fit(x, y, epochs=30)" + "# Train the wrapped model for 30 epochs.\n", + "loss_history_mve_wrap = mve_wrapped_NN.fit(x_train, y_train, epochs=30)\n", + "\n", + "# Call the uncertainty-aware model to generate outputs for the test data\n", + "x_test_clipped = np.clip(x_test, x_train.min(), x_train.max())\n", + "prediction = mve_wrapped_NN(x_test_clipped)" ] }, { "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 283 - }, - "id": "k_m_7H4P1ADv", - "outputId": "c215f212-1bb6-45aa-beab-0565ed20362b" - }, - "outputs": [ - { - "data": { - "text/plain": [ - "<matplotlib.legend.Legend at 0x7fe12a1e10a0>" - ] - }, - "execution_count": 126, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "", - "text/plain": [ - "<Figure size 432x288 with 1 Axes>" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], "source": [ - "outputs = mve_wrapped_classifier(x_val)\n", - "plt.scatter(x_val, outputs.aleatoric, label='aleatoric uncertainty', s=0.5)\n", + "# 
Capsa makes the aleatoric uncertainty an attribute of the prediction!\n", + "pred = np.array(prediction.y_hat).flatten()\n", + "unc = np.sqrt(prediction.aleatoric).flatten() # prediction.aleatoric is the predicted variance\n", + "\n", + "# Visualize the aleatoric uncertainty across the data space\n", + "plt.figure(figsize=(10, 6))\n", + "plt.scatter(x_train, y_train, s=1.5, label='train data')\n", + "plt.plot(x_test, y_test, c='r', zorder=-1, label='ground truth')\n", + "plt.fill_between(x_test_clipped.flatten(), pred-2*unc, pred+2*unc, \n", + " color='b', alpha=0.2, label='aleatoric')\n", "plt.legend()" - ] + ], + "metadata": { + "id": "dT2Rx8JCg3NR" + }, + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", @@ -623,7 +451,12 @@ "id": "ZFeArgRX9U9s" }, "source": [ - "We can see that in the areas of high label noise-- where small changes in the input lead to large changes in the output-- aleatoric uncertainty spikes!" + "#### **TODO: Estimating aleatoric uncertainty**\n", + "\n", + "Write short (~1 sentence) answers to the questions below to complete the `TODO`s:\n", + "\n", + "1. For what values of $x$ is the aleatoric uncertainty high or increasing suddenly?\n", + "2. How does your answer in (1) relate to how the $x$ values are distributed?" ] }, { @@ -632,172 +465,100 @@ "id": "6FC5WPRT5lAb" }, "source": [ - "## 1.4 Epistemic Estimation\n", - "Finally, let's do the same thing but for epistemic estimation! In this example, we'll use ensembles, which essentially copy the model `N` times and average predictions across all runs for a more robust prediction, and also calculate the variance of the `N` runs. Feel free to play around with any of the epistemic methods shown in the github repository! Which methods perform the best? Why do you think this is?" + "## 1.5 Estimating model uncertainty\n", + "\n", + "Finally, we use Capsa for estimating the uncertainty underlying the model predictions -- the epistemic uncertainty. 
In this example, we'll use ensembles, which essentially copy the model `N` times and average predictions across all runs for a more robust prediction, and also calculate the variance of the `N` runs to estimate the uncertainty.\n", + "\n", + "Capsa provides a neat wrapper, `capsa.EnsembleWrapper`, to make an ensemble from an input model. Just like with aleatoric estimation, we can take our standard dense network model, wrap it with `capsa.EnsembleWrapper`, build the wrapped model, and then train it for the regression task. More details of the `EnsembleWrapper` and how it can be used are [available here](https://themisai.io/capsa/api_documentation/EnsembleWrapper.html).\n", + "\n", + "Finally, we evaluate the resulting model by quantifying the epistemic uncertainty on the test data:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "SuRlhq2c5Fob", - "outputId": "b1f81f5a-69da-4e40-af2a-2a908a1da639" + "id": "SuRlhq2c5Fob" }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Epoch 1/30\n", - "63/63 [==============================] - 2s 3ms/step - usermodel_0_compiled_loss: 6.5601 - usermodel_1_compiled_loss: 4.9589 - usermodel_2_compiled_loss: 4.8135 - usermodel_3_compiled_loss: 4.3547 - usermodel_4_compiled_loss: 6.3809\n", - "Epoch 2/30\n", - "63/63 [==============================] - 0s 3ms/step - usermodel_0_compiled_loss: 3.7732 - usermodel_1_compiled_loss: 4.3968 - usermodel_2_compiled_loss: 2.8938 - usermodel_3_compiled_loss: 3.2580 - usermodel_4_compiled_loss: 4.0894\n", - "Epoch 3/30\n", - "63/63 [==============================] - 0s 4ms/step - usermodel_0_compiled_loss: 2.8896 - usermodel_1_compiled_loss: 4.1322 - usermodel_2_compiled_loss: 2.3051 - usermodel_3_compiled_loss: 2.6397 - usermodel_4_compiled_loss: 3.0872\n", - "Epoch 4/30\n", - "63/63 [==============================] - 0s 3ms/step - usermodel_0_compiled_loss: 2.4637 - 
usermodel_1_compiled_loss: 4.0084 - usermodel_2_compiled_loss: 2.0035 - usermodel_3_compiled_loss: 2.2897 - usermodel_4_compiled_loss: 2.5871\n", - "Epoch 5/30\n", - "63/63 [==============================] - 0s 3ms/step - usermodel_0_compiled_loss: 2.1638 - usermodel_1_compiled_loss: 3.8279 - usermodel_2_compiled_loss: 1.7763 - usermodel_3_compiled_loss: 2.0321 - usermodel_4_compiled_loss: 2.2477\n", - "Epoch 6/30\n", - "63/63 [==============================] - 0s 3ms/step - usermodel_0_compiled_loss: 1.9428 - usermodel_1_compiled_loss: 3.6973 - usermodel_2_compiled_loss: 1.6039 - usermodel_3_compiled_loss: 1.8292 - usermodel_4_compiled_loss: 2.0054\n", - "Epoch 7/30\n", - "63/63 [==============================] - 0s 3ms/step - usermodel_0_compiled_loss: 1.7664 - usermodel_1_compiled_loss: 3.5628 - usermodel_2_compiled_loss: 1.4630 - usermodel_3_compiled_loss: 1.6634 - usermodel_4_compiled_loss: 1.8182\n", - "Epoch 8/30\n", - "63/63 [==============================] - 0s 3ms/step - usermodel_0_compiled_loss: 1.6222 - usermodel_1_compiled_loss: 3.4514 - usermodel_2_compiled_loss: 1.3476 - usermodel_3_compiled_loss: 1.5290 - usermodel_4_compiled_loss: 1.6662\n", - "Epoch 9/30\n", - "63/63 [==============================] - 0s 3ms/step - usermodel_0_compiled_loss: 1.4997 - usermodel_1_compiled_loss: 3.3267 - usermodel_2_compiled_loss: 1.2495 - usermodel_3_compiled_loss: 1.4147 - usermodel_4_compiled_loss: 1.5371\n", - "Epoch 10/30\n", - "63/63 [==============================] - 0s 3ms/step - usermodel_0_compiled_loss: 1.3961 - usermodel_1_compiled_loss: 3.2211 - usermodel_2_compiled_loss: 1.1664 - usermodel_3_compiled_loss: 1.3165 - usermodel_4_compiled_loss: 1.4279\n", - "Epoch 11/30\n", - "63/63 [==============================] - 0s 3ms/step - usermodel_0_compiled_loss: 1.3066 - usermodel_1_compiled_loss: 3.1169 - usermodel_2_compiled_loss: 1.0947 - usermodel_3_compiled_loss: 1.2315 - usermodel_4_compiled_loss: 1.3337\n", - "Epoch 12/30\n", - "63/63 
[==============================] - 0s 3ms/step - usermodel_0_compiled_loss: 1.2304 - usermodel_1_compiled_loss: 3.0270 - usermodel_2_compiled_loss: 1.0334 - usermodel_3_compiled_loss: 1.1599 - usermodel_4_compiled_loss: 1.2545\n", - "Epoch 13/30\n", - "63/63 [==============================] - 0s 3ms/step - usermodel_0_compiled_loss: 1.1635 - usermodel_1_compiled_loss: 2.9340 - usermodel_2_compiled_loss: 0.9794 - usermodel_3_compiled_loss: 1.0972 - usermodel_4_compiled_loss: 1.1850\n", - "Epoch 14/30\n", - "63/63 [==============================] - 0s 3ms/step - usermodel_0_compiled_loss: 1.1056 - usermodel_1_compiled_loss: 2.8478 - usermodel_2_compiled_loss: 0.9327 - usermodel_3_compiled_loss: 1.0430 - usermodel_4_compiled_loss: 1.1251\n", - "Epoch 15/30\n", - "63/63 [==============================] - 0s 3ms/step - usermodel_0_compiled_loss: 1.0553 - usermodel_1_compiled_loss: 2.7690 - usermodel_2_compiled_loss: 0.8924 - usermodel_3_compiled_loss: 0.9959 - usermodel_4_compiled_loss: 1.0732\n", - "Epoch 16/30\n", - "63/63 [==============================] - 0s 3ms/step - usermodel_0_compiled_loss: 1.0106 - usermodel_1_compiled_loss: 2.6925 - usermodel_2_compiled_loss: 0.8563 - usermodel_3_compiled_loss: 0.9544 - usermodel_4_compiled_loss: 1.0276\n", - "Epoch 17/30\n", - "63/63 [==============================] - 0s 3ms/step - usermodel_0_compiled_loss: 0.9704 - usermodel_1_compiled_loss: 2.6262 - usermodel_2_compiled_loss: 0.8240 - usermodel_3_compiled_loss: 0.9167 - usermodel_4_compiled_loss: 0.9862\n", - "Epoch 18/30\n", - "63/63 [==============================] - 0s 3ms/step - usermodel_0_compiled_loss: 0.9343 - usermodel_1_compiled_loss: 2.5600 - usermodel_2_compiled_loss: 0.7951 - usermodel_3_compiled_loss: 0.8834 - usermodel_4_compiled_loss: 0.9494\n", - "Epoch 19/30\n", - "63/63 [==============================] - 0s 3ms/step - usermodel_0_compiled_loss: 0.9023 - usermodel_1_compiled_loss: 2.4966 - usermodel_2_compiled_loss: 0.7695 - usermodel_3_compiled_loss: 
0.8541 - usermodel_4_compiled_loss: 0.9171\n", - "Epoch 20/30\n", - "63/63 [==============================] - 0s 3ms/step - usermodel_0_compiled_loss: 0.8727 - usermodel_1_compiled_loss: 2.4440 - usermodel_2_compiled_loss: 0.7459 - usermodel_3_compiled_loss: 0.8270 - usermodel_4_compiled_loss: 0.8870\n", - "Epoch 21/30\n", - "63/63 [==============================] - 0s 3ms/step - usermodel_0_compiled_loss: 0.8455 - usermodel_1_compiled_loss: 2.3854 - usermodel_2_compiled_loss: 0.7243 - usermodel_3_compiled_loss: 0.8023 - usermodel_4_compiled_loss: 0.8594\n", - "Epoch 22/30\n", - "63/63 [==============================] - 0s 3ms/step - usermodel_0_compiled_loss: 0.8213 - usermodel_1_compiled_loss: 2.3344 - usermodel_2_compiled_loss: 0.7052 - usermodel_3_compiled_loss: 0.7805 - usermodel_4_compiled_loss: 0.8349\n", - "Epoch 23/30\n", - "63/63 [==============================] - 0s 3ms/step - usermodel_0_compiled_loss: 0.7981 - usermodel_1_compiled_loss: 2.2851 - usermodel_2_compiled_loss: 0.6867 - usermodel_3_compiled_loss: 0.7593 - usermodel_4_compiled_loss: 0.8112\n", - "Epoch 24/30\n", - "63/63 [==============================] - 0s 3ms/step - usermodel_0_compiled_loss: 0.7778 - usermodel_1_compiled_loss: 2.2358 - usermodel_2_compiled_loss: 0.6707 - usermodel_3_compiled_loss: 0.7409 - usermodel_4_compiled_loss: 0.7906\n", - "Epoch 25/30\n", - "63/63 [==============================] - 0s 3ms/step - usermodel_0_compiled_loss: 0.7586 - usermodel_1_compiled_loss: 2.1888 - usermodel_2_compiled_loss: 0.6555 - usermodel_3_compiled_loss: 0.7235 - usermodel_4_compiled_loss: 0.7711\n", - "Epoch 26/30\n", - "63/63 [==============================] - 0s 3ms/step - usermodel_0_compiled_loss: 0.7409 - usermodel_1_compiled_loss: 2.1461 - usermodel_2_compiled_loss: 0.6415 - usermodel_3_compiled_loss: 0.7074 - usermodel_4_compiled_loss: 0.7532\n", - "Epoch 27/30\n", - "63/63 [==============================] - 0s 3ms/step - usermodel_0_compiled_loss: 0.7247 - usermodel_1_compiled_loss: 
2.1043 - usermodel_2_compiled_loss: 0.6288 - usermodel_3_compiled_loss: 0.6927 - usermodel_4_compiled_loss: 0.7367\n", - "Epoch 28/30\n", - "63/63 [==============================] - 0s 4ms/step - usermodel_0_compiled_loss: 0.7098 - usermodel_1_compiled_loss: 2.0641 - usermodel_2_compiled_loss: 0.6171 - usermodel_3_compiled_loss: 0.6792 - usermodel_4_compiled_loss: 0.7216\n", - "Epoch 29/30\n", - "63/63 [==============================] - 0s 3ms/step - usermodel_0_compiled_loss: 0.6950 - usermodel_1_compiled_loss: 2.0264 - usermodel_2_compiled_loss: 0.6053 - usermodel_3_compiled_loss: 0.6658 - usermodel_4_compiled_loss: 0.7067\n", - "Epoch 30/30\n", - "63/63 [==============================] - 0s 3ms/step - usermodel_0_compiled_loss: 0.6819 - usermodel_1_compiled_loss: 1.9897 - usermodel_2_compiled_loss: 0.5950 - usermodel_3_compiled_loss: 0.6539 - usermodel_4_compiled_loss: 0.6934\n" - ] - } - ], + "outputs": [], "source": [ - "standard_classifier = create_standard_classifier()\n", - "ensemble_wrapper = EnsembleWrapper(standard_classifier, num_members=5)\n", + "### Estimating model uncertainty with Capsa wrapping ###\n", + "\n", + "standard_dense_NN = create_dense_NN()\n", + "# Wrap the dense network for epistemic uncertainty estimation with an Ensemble\n", + "ensemble_NN = capsa.EnsembleWrapper(standard_dense_NN)\n", "\n", - "ensemble_wrapper.compile(\n", + "# Build the model for regression, defining the loss function and optimizer\n", + "ensemble_NN.compile(\n", " optimizer=tf.keras.optimizers.Adam(learning_rate=3e-3),\n", - " loss=tf.keras.losses.MeanSquaredError(),\n", + " loss=tf.keras.losses.MeanSquaredError(), # MSE loss for the regression task\n", ")\n", "\n", - "history = ensemble_wrapper.fit(x, y, epochs=30)" + "# Train the wrapped model for 30 epochs.\n", + "loss_history_ensemble = ensemble_NN.fit(x_train, y_train, epochs=30)\n", + "\n", + "# Call the uncertainty-aware model to generate outputs for the test data\n", + "prediction = ensemble_NN(x_test)" ] 
}, { "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 283 - }, - "id": "HfnPqf8T6TVw", - "outputId": "4a9fa19d-ae27-477c-bc47-f84d62a3444f" - }, - "outputs": [ - { - "data": { - "text/plain": [ - "<matplotlib.legend.Legend at 0x7fe127f9f7f0>" - ] - }, - "execution_count": 130, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "", - "text/plain": [ - "<Figure size 432x288 with 1 Axes>" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], "source": [ - "outputs = ensemble_wrapper(x_val)\n", - "plt.scatter(x_val, outputs.epistemic, label='epistemic uncertainty', s=0.5)\n", + "# Capsa makes the epistemic uncertainty an attribute of the prediction!\n", + "pred = np.array(prediction.y_hat).flatten()\n", + "unc = np.array(prediction.epistemic).flatten()\n", + "\n", + "# Visualize the epistemic uncertainty across the data space\n", + "plt.figure(figsize=(10, 6))\n", + "plt.scatter(x_train, y_train, s=1.5, label='train data')\n", + "plt.plot(x_test, y_test, c='r', zorder=-1, label='ground truth')\n", + "plt.fill_between(x_test.flatten(), pred-20*unc, pred+20*unc, color='b', alpha=0.2, label='epistemic')\n", "plt.legend()" - ] + ], + "metadata": { + "id": "eauNoKDOj_ZT" + }, + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", - "metadata": { - "id": "VU6eMpYX9m9N" - }, "source": [ - "## Conclusion\n", - "As expected, areas where there is no training data have very high epistemic uncertainty, since all of the testing data is OOD. If our training data contained more samples from this region, would you expect the epistemic uncertainty to decrease?" - ] + "#### **TODO: Estimating epistemic uncertainty**\n", + "\n", + "Write short (~1 sentence) answers to the questions below to complete the `TODO`s:\n", + "\n", + "1. 
For what values of $x$ is the epistemic uncertainty high or increasing suddenly?\n", + "2. How does your answer in (1) relate to how the $x$ values are distributed (refer back to the original plot)? Think about both the train and test data.\n", + "3. How could you reduce the epistemic uncertainty in regions where it is high?" + ], + "metadata": { + "id": "N4LMn2tLPBdg" + } }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "id": "CkpvkOL06jRd" }, "source": [ + "# 1.6 Conclusion\n", "\n", - "You've just analyzed the bias, aleatoric uncertainty, and epistemic uncertainty for your first risk-aware model! This is a task that data scientists do constantly to determine methods of improving their models and datasets. In the next part, you'll continue to build off of these concepts to *mitigate* these risks, in addition to diagnosing them!\n", + "You've just analyzed the bias, aleatoric uncertainty, and epistemic uncertainty for your first risk-aware model! This is a task that data scientists do constantly to determine methods of improving their models and datasets.\n", + "\n", + "In the next part of the lab, you'll continue to build on these concepts and study them in the context of facial detection systems: not only diagnosing issues of bias and uncertainty, but also developing solutions to *mitigate* these risks.\n", "\n", "" ] }, { "cell_type": "code", - "execution_count": null, + "source": [], "metadata": { - "id": "bs4mAQ5c6cMY" + "id": "nIpfPcpjlsKK" }, - "outputs": [], - "source": [] + "execution_count": null, + "outputs": [] } ], "metadata": { @@ -814,4 +575,4 @@ }, "nbformat": 4, "nbformat_minor": 0 -} +} \ No newline at end of file diff --git a/mitdeeplearning/lab3.py b/mitdeeplearning/lab3.py index a33f4886..ee7b9213 100644 --- a/mitdeeplearning/lab3.py +++ b/mitdeeplearning/lab3.py @@ -76,6 +76,7 @@ def __init__(self, data_path, batch_size, training=True): self.train_inds = np.concatenate((self.pos_train_inds, self.neg_train_inds)) self.batch_size 
= batch_size self.p_pos = np.ones(self.pos_train_inds.shape) / len(self.pos_train_inds) + self.p_neg = np.ones(self.neg_train_inds.shape) / len(self.neg_train_inds) def get_train_size(self): return self.pos_train_inds.shape[0] + self.neg_train_inds.shape[0] @@ -88,7 +89,7 @@ def __getitem__(self, index): self.pos_train_inds, size=self.batch_size // 2, replace=False, p=self.p_pos ) selected_neg_inds = np.random.choice( - self.neg_train_inds, size=self.batch_size // 2, replace=False + self.neg_train_inds, size=self.batch_size // 2, replace=False, p=self.p_neg ) selected_inds = np.concatenate((selected_pos_inds, selected_neg_inds)) diff --git a/setup.py b/setup.py index 90d0df82..9f2ffd0b 100644 --- a/setup.py +++ b/setup.py @@ -22,13 +22,13 @@ def get_dist(pkgname): setup( name = 'mitdeeplearning', # How you named your package folder (MyLib) packages = ['mitdeeplearning'], # Chose the same as "name" - version = '0.3.0', # Start with a small number and increase it with every change you make + version = '0.4.0', # Start with a small number and increase it with every change you make license='MIT', # Chose a license from here: https://help.github.com/articles/licensing-a-repository description = 'Official software labs for MIT Introduction to Deep Learning (http://introtodeeplearning.com)', # Give a short description about your library author = 'Alexander Amini', # Type in your name author_email = 'introtodeeplearning-staff@mit.edu', # Type in your E-Mail url = 'http://introtodeeplearning.com', # Provide either the link to your github or to your website - download_url = 'https://github.com/aamini/introtodeeplearning/archive/v0.3.0.tar.gz', # I explain this later on + download_url = 'https://github.com/aamini/introtodeeplearning/archive/v0.4.0.tar.gz', # I explain this later on keywords = ['deep learning', 'neural networks', 'tensorflow', 'introduction'], # Keywords that define your package best install_requires=install_deps, classifiers=[