<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>secML</title>
<link>https://secml.github.io/index.xml</link>
<description>Recent content on secML</description>
<generator>Hugo -- gohugo.io</generator>
<language>en-us</language>
<lastBuildDate>Fri, 23 Mar 2018 00:00:00 +0000</lastBuildDate>
<atom:link href="https://secml.github.io/index.xml" rel="self" type="application/rss+xml" />
<item>
<title>Class 8: Testing of Deep Networks</title>
<link>https://secml.github.io/class8/</link>
<pubDate>Fri, 23 Mar 2018 00:00:00 +0000</pubDate>
<guid>https://secml.github.io/class8/</guid>
<description>
<h2 id="deepxplore-automated-whitebox-testing-of-deep-learning-systems">DeepXplore: Automated Whitebox Testing of Deep Learning Systems</h2>
<blockquote>
<p>Kexin Pei, Yinzhi Cao, Junfeng Yang, Suman Jana. 2017. <em>DeepXplore: Automated Whitebox Testing of Deep Learning Systems</em>. In Proceedings of ACM Symposium on Operating Systems Principles (SOSP ’17). ACM, New York, NY, USA, 18 pages. <a href="https://arxiv.org/pdf/1705.06640.pdf">[PDF]</a></p>
</blockquote>
<p>As deep learning is increasingly applied to security-critical domains, having high confidence in the accuracy of a model&rsquo;s predictions is vital. Just as in traditional software development, confidence in the correctness of a model&rsquo;s behavior stems from rigorous testing across a wide variety of possible scenarios. However, unlike in traditional software development, the logic of deep learning systems is learned through the training process, which opens the door to many possible causes of unexpected behavior, like biases in the training data, overfitting, and underfitting. As this logic does not exist as actual lines of code, deep learning models are extremely difficult to test, and those who do test them are faced with two key challenges:</p>
<ol>
<li>How can all (or at least most) of the model&rsquo;s logic be triggered so as to discover incorrect behavior?</li>
<li>How can such incorrect behavior be identified without manual inspection?</li>
</ol>
<p>To address these challenges, the authors of this paper first introduce <em>neuron coverage</em> as a measure of how much of a model&rsquo;s logic is activated by the test cases. To avoid manually inspecting output behavior for correctness, other DL systems designed for the same purpose are compared across the same set of test inputs, following the logic that if the models disagree, then at least one model&rsquo;s output must be incorrect. These two solutions are then reformulated into a joint optimization problem, which is implemented in the whitebox DL-testing framework DeepXplore.</p>
<h3 id="limitations-of-current-testing">Limitations of Current Testing</h3>
<p>The motivation for the DeepXplore framework is the inability of current methods to thoroughly test deep neural networks. Most existing techniques to identify incorrect behavior require human effort to manually label samples with the correct output, which quickly becomes prohibitively expensive for large datasets. Additionally, the input space of these models is so large that test inputs cover only a small fraction of cases, leaving many corner cases untested. Recent work has shown that these untested cases near model decision boundaries leave DNNs vulnerable to adversarial evasion attacks, in which small perturbations to the input cause a misclassification. And even when these adversarial examples are used to retrain the model and improve accuracy, they still do not have enough model coverage to prevent future evasion attacks.</p>
<h3 id="neuron-coverage">Neuron Coverage</h3>
<p>To measure the area of the input space covered by tests, the authors define what they call &ldquo;neuron coverage,&rdquo; a metric analogous to code coverage in traditional software testing. As seen in the figure below, neuron coverage measures the percentage of nodes a given test input activates in the DNN, analogous to the percentage of the source code executed in code coverage metrics. This is believed to be a better measure of the robustness of test inputs because the logic of a DNN is learned, not programmed, and exists primarily in the layers of nodes that compose the model, not in the source code.</p>
<p align="center">
<img src="https://secml.github.io/images/class8/neuron_coverage.png" width="500">
<div class="caption"> <a href="https://arxiv.org/pdf/1705.06640.pdf"><em>Source</em></a> </div>
</p>
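<p>Neuron coverage can be sketched in a few lines: a neuron counts as covered if at least one test input pushes its activation past a threshold, and coverage is the fraction of neurons covered across all layers. The snippet below is a minimal illustration assuming per-layer activations are already available as arrays; it is not the paper&rsquo;s implementation:</p>

```python
import numpy as np

def neuron_coverage(layer_activations, threshold=0.0):
    """Fraction of neurons activated (above threshold) by at least one input.

    layer_activations: list of arrays, one per layer, each of shape
    (num_inputs, num_neurons) -- hypothetical per-layer outputs.
    """
    activated = 0
    total = 0
    for acts in layer_activations:
        # A neuron counts as covered if any test input pushes it past the threshold.
        covered = (acts > threshold).any(axis=0)
        activated += covered.sum()
        total += covered.size
    return activated / total

# Toy example: two layers, three test inputs.
rng = np.random.default_rng(0)
acts = [rng.normal(size=(3, 5)), rng.normal(size=(3, 4))]
cov = neuron_coverage(acts, threshold=0.0)
```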
<h3 id="cross-referencing-oracles">Cross-referencing Oracles</h3>
<p>To eliminate the need for expensive human effort to check output correctness, multiple DL models are tested on the same inputs and their behavior compared. If different DNNs designed for the same application produce different outputs on the same input, at least one of them should be incorrect, which identifies potential model inaccuracies without manual labeling.</p>
<p align="center">
<img src="https://secml.github.io/images/class8/oracle.png" width="500">
<div class="caption"> <a href="https://arxiv.org/pdf/1705.06640.pdf"><em>Source</em></a> </div>
</p>
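<p>The cross-referencing idea can be sketched as a simple differential test: run the same input through several models trained for the same task and flag the input when their predicted labels differ. The one-dimensional &ldquo;models&rdquo; below are hypothetical stand-ins for real DNNs:</p>

```python
def differential_oracle(models, x):
    """Return the set of labels the models predict for input x.

    If the set has more than one element, the models disagree, so x is a
    potential error-inducing input (at least one model must be wrong).
    """
    return {m(x) for m in models}

# Hypothetical stand-ins for three DNNs with slightly different decision boundaries.
m1 = lambda x: int(x > 0.5)
m2 = lambda x: int(x > 0.5)
m3 = lambda x: int(x > 0.4)

agree = differential_oracle([m1, m2, m3], 0.9)      # all three predict 1
disagree = differential_oracle([m1, m2, m3], 0.45)  # m3 alone predicts 1
```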
<h3 id="method">Method</h3>
<p>The primary objective for the test generation process is to maximize the neuron coverage and differential behaviors observed across models. This is formulated as a joint optimization problem with domain-specific constraints (i.e., to ensure that a test case discovered is still a valid input), which is then solved using a gradient ascent algorithm.</p>
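<p>The joint optimization can be illustrated on a toy problem: gradient ascent on the difference between two linear &ldquo;models&rdquo; (stand-ins for DNN class scores), with a clipping step enforcing the domain constraint that features stay in a valid range. This is a sketch of the idea only; DeepXplore&rsquo;s actual objective also includes a neuron-coverage term and operates on real DNN gradients:</p>

```python
import numpy as np

# Toy differentiable "models": linear scores standing in for DNN outputs.
w1 = np.array([0.9, -0.2])
w2 = np.array([0.7, 0.4])

x = np.array([0.5, 0.5])  # seed input
step = 0.1
for _ in range(50):
    # Gradient of the differential-behavior objective w1.x - w2.x:
    # push model 1's score up while pushing model 2's score down.
    grad = w1 - w2
    x = x + step * grad
    # Domain constraint: keep features valid (e.g. pixels in [0, 1]).
    x = np.clip(x, 0.0, 1.0)
```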
<h3 id="experimental-set-up">Experimental Set-up</h3>
<p>The DeepXplore framework was used to test three DNNs for each of five well-known public datasets that span multiple domains: MNIST, ImageNet, Driving, Contagio/Virustotal, and Drebin. Because these datasets include images, video frames, PDF malware, and Android malware, different domain-specific constraints were incorporated into DeepXplore for each dataset (e.g. pixel values for images need to remain between 0 and 255, PDF malware should still be a valid PDF file, etc.). The details of the chosen DNNs and datasets can be seen in the table below.</p>
<p align="center">
<img src="https://secml.github.io/images/class8/datasets.png" width="700">
<div class="caption"> <a href="https://arxiv.org/pdf/1705.06640.pdf"><em>Source</em></a> </div>
</p>
<p>For each dataset, 2,000 random samples were selected as seed inputs, which were then manipulated to search for erroneous behaviors in the test DNNs. For example, in the image datasets, the lighting conditions were modified to find inputs on which the DNNs disagreed. This is shown in the photos below from the Driving dataset for self-driving cars.
<p align="center">
<img src="https://secml.github.io/images/class8/driving.png" width="400">
<div class="caption"> <a href="https://arxiv.org/pdf/1705.06640.pdf"><em>Source</em></a> </div>
</p>
The top row has the original images with arrows indicating that all three DNNs agreed on the decision, and the bottom row has the images with modified lighting conditions with arrows showing that at least one of the models made a different decision than the other two.</p>
<h2 id="deep-k-nearest-neighbors-towards-confident-interpretable-and-robust-deep-learning">Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning</h2>
<blockquote>
<p>Nicolas Papernot, Patrick McDaniel. 2018. <em>Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning</em>. <a href="https://arxiv.org/pdf/1803.04765.pdf">[PDF]</a></p>
</blockquote>
<h2 id="k-nearest-neighbors">k-Nearest Neighbors</h2>
<p>Deep learning is ubiquitous. Deep neural networks achieve good performance on challenging tasks like machine translation, diagnosing medical conditions, malware detection, and image classification. In this work, the authors identify three well-known criticisms of deep learning that are directly relevant to security: the lack of reliable confidence estimates, the lack of model interpretability, and the lack of robustness. To address them, the authors introduce the Deep k-Nearest Neighbors (DkNN) classification algorithm, which enforces conformity between the predictions a DNN makes on test data and the model&rsquo;s training data. For each layer in the neural network, the DkNN performs a nearest neighbor search to find training points for which the layer&rsquo;s output is closest to the layer&rsquo;s output on the test input. It then analyzes the labels assigned to these neighboring points to ensure that each intermediate layer&rsquo;s computation remains conformal with the model&rsquo;s final prediction.</p>
<p align="center">
<img src="https://secml.github.io/images/class8/Dknn.png">
<br> <b>Figure:</b> Intuition behind the Deep k-Nearest Neighbors (DkNN)
</p>
<p>Consider a deep neural network (in the left of the figure), representations output by each layer (in the middle of the figure) and the nearest neighbors found at each layer in the training data (in the right of the figure). Drawings of pandas and school buses indicate training points. We can observe that confidence is high when there is homogeneity among the nearest neighbors&rsquo; labels. Interpretability of the outcome of each layer is provided by the nearest neighbors. Robustness stems from detecting nonconformal predictions from nearest neighbor labels found for out-of-distribution inputs across different layers.</p>
<h3 id="algorithm">Algorithm</h3>
<p>The pseudo-code for the Deep k-Nearest Neighbors (DkNN) algorithm the authors introduced, which ensures that the intermediate layers&rsquo; computations remain conformal with respect to the final model&rsquo;s prediction, is given below:</p>
<p align="center">
<img src="https://secml.github.io/images/class8/algorithm_dknn.png" width="400">
<br> <b>Figure:</b> Code snippet of Deep k-Nearest Neighbor
</p>
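<p>The core of the algorithm can be sketched as a brute-force nearest neighbor search over each layer&rsquo;s representation, collecting the labels of the neighbors found at every layer. This is a minimal illustration assuming precomputed per-layer representations; the paper uses locality-sensitive hashing to make the search efficient:</p>

```python
import numpy as np
from collections import Counter

def dknn_neighbor_labels(train_reps, train_labels, test_reps, k=2):
    """Collect labels of the k nearest training points at every layer.

    train_reps: list of (n_train, d_l) arrays, one per layer.
    test_reps:  list of (d_l,) arrays for a single test input.
    A homogeneous label multiset supports (is conformal with) the prediction.
    """
    labels = []
    for layer_train, layer_test in zip(train_reps, test_reps):
        dists = np.linalg.norm(layer_train - layer_test, axis=1)
        for i in np.argsort(dists)[:k]:
            labels.append(train_labels[i])
    return Counter(labels)

# Toy data: two layers, four training points in two well-separated classes.
train_reps = [np.array([[0.0], [0.1], [5.0], [5.1]])] * 2
train_labels = [0, 0, 1, 1]
test_reps = [np.array([0.05])] * 2
votes = dknn_neighbor_labels(train_reps, train_labels, test_reps, k=2)
```

Here every neighbor at every layer carries label 0, so the prediction "0" would be conformal with high credibility.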
<h2 id="basis-for-evaluation">Basis for Evaluation</h2>
<p>The bases for evaluating robustness, interpretability, and confidence are discussed below:
<p align="center">
<img src="https://secml.github.io/images/class8/basis_evalution.png" width="500">
<br> <b>Figure:</b> Basis for Evaluation
</p></p>
<h2 id="evaluation-of-confidence-credibility">Evaluation of Confidence/ Credibility</h2>
<p>In their experiments, the authors observed that credibility varies across both in- and out-of-distribution samples, and they tailored their evaluation to demonstrate that credibility is well calibrated. They performed experiments on both benign and adversarial examples.</p>
<h2 id="classification-accuracy">Classification Accuracy</h2>
<p>In their experiments, they used three datasets: the MNIST handwritten digit recognition dataset, the SVHN street view house numbers dataset, and the GTSRB traffic sign dataset. The following figure compares the accuracy of the DNN and DkNN models on the three datasets.</p>
<p align="center">
<img src="https://secml.github.io/images/class8/classification_accuracy.png" width="400">
<br> <b>Figure:</b> Classification accuracy of the DNN and DkNN: the DkNN has a limited impact on or improves performance.
</p>
<h2 id="credibility-on-in-distribution-samples">Credibility on in-distribution samples</h2>
<p>Reliability diagrams are plotted for the three different datasets (MNIST, SVHN and GTSRB) below:
<p align="center">
<img src="https://secml.github.io/images/class8/evaluation_confidence_dknn.png" width="500">
<br> <b>Figure:</b> Reliability diagrams of DNN softmax confidence
(left) and DkNN credibility (right) on test data—bars (left
axis) indicate the mean accuracy of predictions binned by
credibility; the red line (right axis) illustrates data density
across bins. The softmax outputs high confidence on most of
the data while DkNN credibility spreads across the value range.<br />
</p></p>
<p>On the left, they visualized the confidence output by the DNN softmax, calculated as the probability \(\arg\max_{j} f_j(x)\). On the right, they plotted the credibility of DkNN predictions. From the graph, it may appear that the softmax is better calibrated than the corresponding DkNN, because its reliability diagrams are closer to a linear relation between accuracy and confidence. But if the distribution of DkNN credibility values is considered, it surfaces that the softmax is almost always very confident on test data, with confidence above 0.8, while the DkNN uses the full range of possible credibility values for datasets like SVHN (whose test set contains a larger number of inputs that are difficult to classify).</p>
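<p>The two quantities being compared can be sketched directly: the softmax confidence is the largest softmax probability, while DkNN credibility is a conformal-style empirical p-value, the fraction of held-out calibration nonconformity scores at least as large as the test input&rsquo;s score. The snippet below is a simplified sketch of these definitions, not the paper&rsquo;s exact procedure:</p>

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def softmax_confidence(logits):
    # DNN confidence: the largest softmax probability, max_j f_j(x).
    return softmax(np.asarray(logits, dtype=float)).max()

def credibility(alpha_test, alpha_calibration):
    # Conformal-style credibility: fraction of calibration nonconformity
    # scores at least as large as the test input's score.
    return (np.asarray(alpha_calibration) >= alpha_test).mean()
```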
<!-- ## Mislabeled Inputs
<p align="center">
<img src="https://secml.github.io/images/class8/mislabeled.png" width="600">
<br> <b>Figure:</b> Mislabeled inputs from the MNIST (top) and SVHN
(bottom) test sets: we found these points by searching for
inputs that are classified with strong credibility by the DkNN
in a class that is different than the label found in the dataset.
</p> -->
<h2 id="credibility-on-out-of-distribution-samples">Credibility on out-of-distribution samples</h2>
<p>Images from NotMNIST have the same format as MNIST, but the classes are non-overlapping. For MNIST, the first set of out-of-distribution samples contains images from the NotMNIST dataset. For SVHN, the out-of-distribution samples contain images from the CIFAR-10 dataset; again, they have the same format, but there is no overlap between SVHN and CIFAR-10. For both the MNIST and SVHN datasets, they rotated all test inputs by an angle of 45 degrees to generate a second set of out-of-distribution samples.</p>
<p align="center">
<img src="https://secml.github.io/images/class8/dknn-on-test-data.png" width="600">
<br> <b>Figure:</b> DkNN credibility vs. softmax confidence on out-of-distribution
test data: the lower credibility of DkNN
predictions (solid lines) compared to the softmax confidence
(dotted lines) is desirable here because test inputs are not part
of the distribution on which the model was trained—they are
from another dataset or created by rotating inputs.
</p>
<p>In the above figure, the credibility of the DkNN on the out-of-distribution samples is compared with the DNN softmax on MNIST (left) and SVHN (right). The DkNN algorithm assigns an average credibility of 6% and 9% to inputs from the NotMNIST and rotated MNIST test sets respectively, compared to 33% and 31% for the softmax probabilities. The same observation holds for the SVHN model: here, the DkNN assigns an average credibility of 15% and 18% to CIFAR-10 and rotated SVHN inputs, compared to 52% and 33% for the softmax probabilities.</p>
<h2 id="evaluation-of-the-interpretability">Evaluation of the interpretability</h2>
<p>Here, the authors considered a model biased with respect to a person&rsquo;s skin color. In a recent study, Stock and Cisse demonstrated how a ResNet model misclassifies an image of former US president Barack Obama throwing an American football in a stadium. The authors reproduced this experiment and applied the DkNN algorithm to the model, plotting the 10 nearest neighbors from the training data, computed from the representation output by the last hidden layer of the ResNet model.</p>
<p align="center">
<img src="https://secml.github.io/images/class8/evaluation-interpretability.png" width="600">
<br> <b>Figure:</b> Debugging ResNet model biases—This illustrates how
the DkNN algorithm helps to understand a bias identified by
Stock and Cisse [105] in the ResNet model for ImageNet. The
image at the bottom of each column is the test input presented
to the DkNN. Each test input is cropped slightly differently to
include (left) or exclude (right) the football. Images shown at
the top are nearest neighbors in the predicted class according
to the representation output by the last hidden layer. This
comparison suggests that the “basketball” prediction may have
been a consequence of the ball being in the picture. Also
note how the white apparel color and general arm positions of
players often match the test image of Barack Obama.
</p>
<p>On the left side of the figure above, the test image processed by the DNN is the same one used by Stock and Cisse. Its nearest neighbors contain 7 black and 3 white basketball players, who are similar in color to the test image and are also located in the air. The authors hypothesized that the ball plays an important role in the prediction, so they ran another experiment with the same image cropped to remove the ball. The model then predicted that the person is playing with a racket. The neighbors in this training class are white players, and the images share certain characteristics, such as a green background and most of the people wearing white and holding their hands in the air. In this example, besides skin color, the position and appearance of the ball also contributed to the model&rsquo;s prediction.</p>
<h2 id="evaluation-of-robustness">Evaluation of Robustness</h2>
<p>DkNN is a step towards correctly handling malicious inputs like adversarial examples because it:</p>
<ul>
<li>outputs more reliable confidence estimates on adversarial examples than the softmax;</li>
<li>provides insights into why adversarial examples affect undefended DNNs;</li>
<li>is robust to the adaptive attacks the authors considered.</li>
</ul>
<h2 id="accuracy-on-adversarial-examples">Accuracy on Adversarial Examples</h2>
<p>They crafted adversarial examples using three algorithms: the Fast Gradient Sign Method (FGSM), the Basic Iterative Method (BIM), and the Carlini-Wagner \(\ell_2\) attack (CW).</p>
<p>All their test results are shown in the following table, which includes the accuracy of both the undefended DNN and the DkNN. From the table, they concluded that even though the attacks were successful in evading the undefended DNN, when the model is integrated with the DkNN, some accuracy on adversarial examples is recovered.</p>
<p align="center">
<img src="https://secml.github.io/images/class8/evaluation-robustness.png" width="500">
<br> <b>Figure:</b> Adversarial example classification accuracy for
the DNN and DkNN: attack parameters are chosen according
to prior work. All input features were clipped to remain in their
range. Note that most wrong predictions made by the DkNN
are assigned low credibility (see Figure 6 and the Appendix).
</p>
<p>They plotted reliability diagrams comparing the DkNN credibility on GTSRB adversarial examples with the softmax probabilities output by the DNN. For the DkNN, credibility is low across all attacks; the number of points in each bin is reflected by the red line. The DkNN outputs a credibility below 0.5 for most of the inputs. This indicates a sharp departure from the softmax probabilities, which classified most adversarial examples in the wrong class with a high confidence, above 0.9, for the FGSM and BIM attacks. They also observed that the BIM attack is more successful at introducing perturbations than the FGSM or CW attacks.</p>
<p align="center">
<img src="https://secml.github.io/images/class8/evaluation_confidence_dknn_adversarial.png" width="600">
<br> <b>Figure:</b> Reliability Diagrams on Adversarial Examples—
The DkNN’s credibility is better calibrated (i.e., it assigns low
confidence to adversarial examples) than probabilities output by
the softmax of an undefended DNN. All diagrams are plotted
with GTSRB test data. Similar graphs for the MNIST and
SVHN datasets are in the Appendix.
</p>
<h2 id="explanation-of-dnn-mispredictions">Explanation of DNN Mispredictions</h2>
<p align="center">
<img src="https://secml.github.io/images/class8/misprediction_dknn.png" width="600">
<br> <b>Figure:</b> Number of Candidate Labels among k = 75 Nearest
Neighboring Representations—Shown for GTSRB with clean
and adversarial data across the layers of the DNN underlying
the DkNN. Points are centered according to the number of
labels found in the neighbors; while the area of points is
proportional to the number of neighbors whose label matches
the DNN prediction. Representations output by lower layers
of the DNN are less ambiguous for clean data than adversarial
examples (nearest neighbors are more homogeneously labeled).
</p>
<p>In the above figure, for both clean and adversarial examples, we can observe that the number of candidate labels decreases as we move up the neural network from its input layer all the way to its output layer. Number of candidate labels (in this case k = 75 nearest neighboring training representations) that match the final prediction made by the DNN is smaller for some attacks. For CW attack, the true label of adversarial examples that it produces is often recovered by the DkNN. Again, the lack of conformity between neighboring training representations at different layers of the model characterizes the weak support for the model’s prediction.</p>
<!-- ## Robustness of DKNN to Adaptive Attacks
<p align="center">
<img src="https://secml.github.io/images/class8/robustness_dknn_adaptive.png" width="600">
<br> <b>Figure:</b> Feature Adversarial Examples against our DkNN
algorithm—Shown for SVHN (see Appendix for MNIST).
Adversarial examples are organized according to their original
label (rows) and the DkNN’s prediction (columns).
</p> -->
<!-- ## Conclusion -->
<h3 id="comparison-to-lid">Comparison to LID</h3>
<p>We discussed similarities between the DkNN approach and Local Intrinsic Dimensionality [<a href="https://openreview.net/forum?id=B1gJ1L2aW">ICLR 2018</a>]. There are important differences between the approaches, but given the results on LID reported in <em>Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples</em> (discussed in <a href="https://secml.github.io/class3/">Class 3</a>), it is worth investigating how robust the DkNN is to the same attacks. (Note that neither of these defenses really obfuscates gradients, so the attack strategy reported in that paper is to just use high-confidence adversarial examples.)</p>
<h2 id="the-secret-sharer-measuring-unintended-neural-network-memorization-extracting-secrets">The Secret Sharer: Measuring Unintended Neural Network Memorization &amp; Extracting Secrets</h2>
<blockquote>
<p>Nicholas Carlini, Chang Liu, Jernej Kos, Ulfar Erlingsson, Dawn Song. 2018. The Secret Sharer: Measuring Unintended Neural Network Memorization &amp; Extracting Secrets. arXiv:1802.08232. <a href="https://arxiv.org/pdf/1802.08232.pdf">[PDF]</a></p>
</blockquote>
<p>This paper focuses on an adversary targeting &ldquo;secret&rdquo; user information stored in a deep neural network. Sensitive or &ldquo;secret&rdquo; user information can be included in the datasets used to train deep machine learning models. For example, if a model is trained on a dataset of emails, some of which contain credit card numbers, there is a high probability that a credit card number can be extracted from the model, according to this paper.</p>
<h2 id="introduction">Introduction</h2>
<p>Rapid adoption of machine learning techniques has resulted in models trained on sensitive user information or &ldquo;secrets&rdquo;. The &ldquo;secrets&rdquo; may include “personal messages, location histories, or medical information.” The potential for machine learning models to memorize or store secret information could reveal sensitive user information to an adversary. Even black-box models were found to be susceptible to leaking secret user information. As more deep learning models are deployed, we need to be mindful of a model&rsquo;s ability to store information, and to shield models from revealing secrets.</p>
<h2 id="contributions">Contributions</h2>
<p>The exposure metric defined in this paper measures the ability of a model to memorize a secret: the higher the exposure, the more likely the model is memorizing secret user information. The paper uses this metric to compare the memorization of different models with different hyper-parameters. An important observation is that secrets are memorized early in training rather than during the period of over-fitting, a property the authors found consistent across different models and hyper-parameters. The models in the paper are deep generative text models; convolutional neural networks were also tested, but they generally perform worse on text-based data. Extraction of secret information from the model was tested with varying hyper-parameters and conditions.</p>
<h2 id="perplexity-and-exposure">Perplexity and Exposure</h2>
<p>Perplexity is a measurement of how well a probability distribution predicts a sample. A model will have completely memorized the randomness of the training set if the log-perplexity of a secret is the absolute smallest.</p>
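<p>Concretely, the log-perplexity of a sequence is the negative sum of the log-probabilities the model assigns to each token given its prefix. A small sketch, with hypothetical per-token probabilities standing in for a real language model:</p>

```python
import numpy as np

def log_perplexity(token_probs):
    """Negative sum of log2 token probabilities: lower values mean the
    model finds the sequence less surprising, suggesting memorization."""
    return -np.sum(np.log2(token_probs))

# A sequence the model predicts confidently vs. one it finds surprising:
memorized = log_perplexity([0.9, 0.95, 0.9])
unseen = log_perplexity([0.1, 0.2, 0.05])
```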
<p>Log-perplexity suggests memorization but does not yield general information about the extent of memorization in the model. Thus the authors define rank:</p>
<p align="center">
<img src="https://secml.github.io/images/class8/secsha_risk.png" width="400">
<br> <b>Figure:</b> Definition of rank. Requires computing log-perplexity of all possible secrets.
</p>
<p>To compute the rank, you must iterate over all possible secrets. To avoid this heavy computation, the authors define exposure, a value related to the rank of a secret, along with numerical methods to approximate it.</p>
<p align="center">
<img src="https://secml.github.io/images/class8/secsha_exposure.png" width="400">
<br> <b>Figure:</b> Definition of exposure. More simply, the negative log-rank of a secret, s[r].
</p>
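<p>Numerically, the exposure of a secret with rank \(r\) among \(|R|\) possible secrets is \(\log_2 |R| - \log_2 r\), so a secret ranked first (the model&rsquo;s most likely candidate) achieves the maximum exposure \(\log_2 |R|\):</p>

```python
import math

def exposure(rank, num_candidates):
    """Exposure of a secret: log2 of the size of the randomness space
    minus log2 of the secret's rank among all candidate secrets."""
    return math.log2(num_candidates) - math.log2(rank)

# A secret that is the single most likely candidate out of 2^20
# possibilities is maximally exposed (fully memorized):
max_exposure = exposure(rank=1, num_candidates=2**20)  # 20.0
```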
<h2 id="secret-extraction-methods">Secret Extraction Methods</h2>
<p>The authors identify four methods for extracting secrets from a black-box model.</p>
<ol>
<li>Brute Force</li>
<li>Generative Sampling</li>
<li>Beam Search</li>
<li>Shortest Path Search</li>
</ol>
<p>Brute force was determined to be too computationally expensive, as the randomness space is too large. Generative sampling and beam search both fail to guarantee the optimal solution. The authors used shortest-path search to guarantee the lowest log-perplexity solution. Their approach is based on Dijkstra&rsquo;s graph algorithm and is explained in the paper. The figure below demonstrates the process of minimizing the log-perplexity to find the secret.</p>
<p align="center">
<img src="https://secml.github.io/images/class8/secsha_dsk.png" width="500">
<br> <b>Figure:</b> Shortest path search algorithm. Blue path points to the location of smallest log-perplexity, the location of the secret.
</p>
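<p>The shortest-path idea can be sketched with a priority queue: treat each partial sequence as a node, with edge cost \(-\log_2 p\) for appending a token of probability \(p\), so the first complete sequence popped off the queue has the lowest log-perplexity. The toy model interface below is hypothetical, standing in for queries to a trained model:</p>

```python
import heapq
import math

def shortest_path_secret(next_token_prob, alphabet, length):
    """Dijkstra-style search for the lowest log-perplexity sequence.

    next_token_prob(prefix, token) -> probability (hypothetical model
    interface). Edge cost is -log2 p, so the accumulated path cost
    equals the log-perplexity of the prefix.
    """
    heap = [(0.0, ())]  # (accumulated log-perplexity, prefix)
    while heap:
        cost, prefix = heapq.heappop(heap)
        if len(prefix) == length:
            return cost, prefix  # first completed path is optimal
        for tok in alphabet:
            p = next_token_prob(prefix, tok)
            if p > 0:
                heapq.heappush(heap, (cost - math.log2(p), prefix + (tok,)))

# Toy model that strongly prefers the digit 7 at every position:
probs = lambda prefix, tok: 0.8 if tok == 7 else 0.2 / 9
cost, secret = shortest_path_secret(probs, range(10), length=3)
```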
<h2 id="characterizing-memorization-of-secrets">Characterizing Memorization of Secrets</h2>
<p>In order to better understand model memorization the authors tested different numbers of iterations, various model architectures, multiple training strategies, changes in secret formats and context, and memorization across multiple simultaneous secrets.</p>
<h4 id="iterations">Iterations</h4>
<p>As shown in the figure below, the authors collected data relating to the exposure of the model throughout training. The exposure of the secret can clearly be seen rising until the test error starts to rise. This increase in exposure reflects the model&rsquo;s memorization of the training set early in the training process.
<p align="center">
<img src="https://secml.github.io/images/class8/secsha_epoch.png" width="500">
<br> <b>Figure:</b> Estimated exposure of secret and loss during training.
</p></p>
<h4 id="model-architectures">Model architectures</h4>
<p>The authors observed, through their exposure metric, that memorization was a shared property among recurrent and convolutional neural networks. Data relating to the different model architectures and exposure is shown in the figure below.
<p align="center">
<img src="https://secml.github.io/images/class8/secsha_modelComparison.png" width="500">
<br> <b>Table:</b> Model Exposure Comparison
</p></p>
<h4 id="training-strategies">Training strategies</h4>
<p>As seen in the figure below, smaller batch sizes resulted in lower levels of memorization. Although smaller batch sizes slow down distributed training, the results suggest that reducing the batch size is an effective way to reduce memorization.
<p align="center">
<img src="https://secml.github.io/images/class8/secsha_batchsize.png" width="400">
<br> <b>Table:</b> Estimated exposure of various model and batch sizes.
</p></p>
<h4 id="secret-formats-and-context">Secret formats and context</h4>
<p>The table below interestingly suggests that the context surrounding a secret significantly impacts whether an adversary can detect memorization. The more characters or contextual information associated with the secret the adversary knows, the easier it is to extract the randomness.
<p align="center">
<img src="https://secml.github.io/images/class8/secsha_secret.png" width="400">
<br> <b>Table:</b> Estimated exposure with different contexts.
</p></p>
<h4 id="memorization-across-multiple-simultaneous-secrets">Memorization across multiple simultaneous secrets</h4>
<p>The table below shows the effect of inserting multiple secrets into the dataset. As the number of insertions increases, the model becomes more likely to memorize the inserted secrets.
<p align="center">
<img src="https://secml.github.io/images/class8/secsha_multiplephrases.png" width="500">
<br> <b>Table:</b> Percentage of phrases that can be extracted from the model
</p></p>
<h2 id="evaluating-word-level-models">Evaluating word level models</h2>
<p>An interesting observation was that the larger word-level model produced better results than the smaller character-level model. By applying the exposure metric to a word-level model, the authors found that the representation of numbers mattered heavily in the memorization of information. When they replaced &ldquo;1&rdquo; with &ldquo;one,&rdquo; the exposure dropped by more than half, from 25 to 12, for the large word-level model. Because the word-level model was more than 80 times the size of the character-level model, the authors found it surprising that &ldquo;[the large model] has sufficient capacity to memorize the training data completely, but it actually memorizes less.&rdquo;</p>
<h2 id="conclusion">Conclusion</h2>
<p>This paper examines the extent to which memorization occurs in different types of models and how sensitive user information can be revealed through this memorization. The metrics used in the paper can easily be transferred to existing models that have a well-defined notion of perplexity, and the authors demonstrate that these metrics can be used to extract secret user information from black-box models.</p>
<p>&mdash; Team Panda: Christopher Geier, Faysal Hossain Shezan, Helen Simecek, Lawrence Hook, Nishant Jha</p>
<h3 id="sources">Sources</h3>
<blockquote>
<p>Kexin Pei, Yinzhi Cao, Junfeng Yang, Suman Jana. 2017. DeepXplore: Automated Whitebox Testing of Deep Learning Systems. In Proceedings of ACM Symposium on Operating Systems Principles (SOSP ’17). ACM, New York, NY, USA, 18 pages. https://doi.org/10.1145/3132747.3132785 <a href="https://arxiv.org/pdf/1705.06640.pdf">[PDF]</a></p>
<p>Nicholas Carlini, Chang Liu, Jernej Kos, Ulfar Erlingsson, Dawn Song. 2018. The Secret Sharer: Measuring Unintended Neural Network Memorization &amp; Extracting Secrets. arXiv:1802.08232. Retrieved from <a href="https://arxiv.org/pdf/1802.08232.pdf">https://arxiv.org/pdf/1802.08232.pdf</a> <a href="https://arxiv.org/pdf/1802.08232.pdf">[PDF]</a></p>
<p>Nicolas Papernot, Patrick McDaniel. 2018. Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning. arXiv:1803.04765. Retrieved from <a href="https://arxiv.org/pdf/1803.04765.pdf">https://arxiv.org/pdf/1803.04765.pdf</a> <a href="https://arxiv.org/pdf/1803.04765.pdf">[PDF]</a></p>
</blockquote>
</description>
</item>
<item>
<title>Class 7: Biases in ML, Discriminatory Advertising</title>
<link>https://secml.github.io/class7/</link>
<pubDate>Tue, 20 Mar 2018 00:00:00 +0000</pubDate>
<guid>https://secml.github.io/class7/</guid>
<description>
<h2 id="motivation">Motivation</h2>
<p>Machine learning algorithms are playing increasingly important roles in many critical decision-making tasks. However, studies reveal that machine learning models are subject to biases, some of which stem from historical biases in the human world that are captured in training data. Understanding potential bias, and identifying and fixing existing bias, can help people design more objective and reliable decision-making systems based on machine learning models.</p>
<h2 id="ad-transparency">Ad Transparency</h2>
<blockquote>
<p>Athanasios Andreou, Giridhari Venkatadri, Oana Goga, Krishna P. Gummadi, Patrick Loiseau, Alan Mislove. <em>Investigating Ad Transparency Mechanisms in Social Media: A Case Study of Facebook’s Explanations</em>. NDSS, 2018. <a href="http://wp.internetsociety.org/ndss/wp-content/uploads/sites/25/2018/02/ndss2018_10-1_Andreou_paper.pdf">[PDF]</a> (All images below are taken from this paper.)</p>
</blockquote>
<h3 id="what-are-ad-transparency-mechanisms-in-social-media">What are ad transparency mechanisms in social media?</h3>
<p>Transparency mechanisms are a response to the many privacy complaints from users and regulators: users have little understanding of what data advertising platforms have about them or why they are shown particular ads. The transparency mechanisms provided by Facebook are (1) the &ldquo;Why am I seeing this?&rdquo; button, which gives users an explanation of why they were shown a particular ad (<em>ad explanations</em>), and (2) the Ad Preferences Page, which gives users a list of attributes Facebook has inferred about them and how (<em>data explanations</em>). Twitter offers similar mechanisms, including its own &ldquo;Why am I seeing this?&rdquo; button.</p>
<h3 id="what-did-this-paper-do">What did this paper do?</h3>
<p>This paper reports on an investigation and analysis of ad explanations (why users see a certain ad) and data explanations (how data is inferred about a user), the two transparency mechanisms at the heart of Facebook’s ad platform. The paper first describes how Facebook’s ad platform works and then evaluates the two transparency mechanisms introduced above against a set of desirable properties.</p>
<h3 id="the-ad-platform-in-facebook">The Ad platform in Facebook</h3>
<p>There are three main processes in the Facebook ad platform: a) the data inference process; b) the audience selection process; c) the user-ad matching process.</p>
<p>(a) The data inference process is the process that allows the advertising platform to learn the users’ attributes. It has three parts: (1) the raw user data (the inputs), containing the information the
advertising platform collects about a user either online or offline; (2) the data inference algorithm (the mapping function between inputs and outputs), covering the algorithm the advertising platform uses to translate input user data to targeting attributes; (3) the resulting targeting attributes (the outputs) of each user that advertisers can specify to select different groups of users.</p>
<p><center><a href="https://secml.github.io/images/class7/a.png"><img src="https://secml.github.io/images/class7/a.png" width=80%></a></center></p>
<p>(b) The audience selection process is the interface that allows advertisers to express who should receive their ads. Advertisers create audiences by specifying the set of targeting attributes the audience needs to satisfy. Later, to launch an ad campaign, advertisers also need to specify a bid price and an optimization criterion.</p>
<p><center><a href="https://secml.github.io/images/class7/b.png"><img src="https://secml.github.io/images/class7/b.png" width=80%></a></center></p>
<p>&#40;c) The user-ad matching process takes place whenever someone is eligible to see an ad. It examines all the ad campaigns placed by different advertisers in a particular time interval, their bids, and runs an auction to determine which ads are selected.</p>
<p><center><a href="https://secml.github.io/images/class7/c.png"><img src="https://secml.github.io/images/class7/c.png" width=80%></a></center></p>
<h3 id="ad-explanations-and-the-experiments-on-this-transparency-mechanism">Ad Explanations and the experiments on this transparency mechanism</h3>
<p><center><a href="https://secml.github.io/images/class7/WhySeeingThis.png"><img src="https://secml.github.io/images/class7/WhySeeingThis.png" width=80%></a></center></p>
<p>As you can see in the picture, the explanation lists both attributes and potential attributes.</p>
<p>This paper used five different properties to evaluate the quality of ad explanations:</p>
<ol>
<li>Correctness: Every attribute listed was used by the advertiser</li>
<li>Personalization: The attributes listed are unique to the individual</li>
<li>Completeness: If all relevant attributes are included in the explanation</li>
<li>Consistency: Users with the same attributes see the same explanations</li>
<li>Determinism: A user would see the same explanation for ads based on the same target attributes</li>
</ol>
<p>The paper evaluated ad explanations by using a Chrome browser extension to record the ads users saw and the explanations provided, collecting data from 35 users over five months. The same experiment also evaluated data explanations. The paper reports summary statistics for the explanations (see the following figure) and then presents the results for each property.</p>
<p><center><a href="https://secml.github.io/images/class7/stat.png"><img src="https://secml.github.io/images/class7/stat.png" width=95%></a></center></p>
<h3 id="data-explanations-experiments">Data Explanations Experiments</h3>
<p>Data explanations appear in the &ldquo;Your interests&rdquo; section of the Ad Preferences Page, as shown in the following picture.</p>
<p><center><a href="https://secml.github.io/images/class7/like.png"><img src="https://secml.github.io/images/class7/like.png" width=80%></a></center></p>
<p>The properties here are not the same as those used for ad explanations. There are three new properties:</p>
<ol>
<li>Specificity: A data explanation is specific if it shows the precise activities that were used to infer an attribute about a user.</li>
<li>Snapshot completeness: A data explanation is snapshot complete if it shows all the inferred attributes about the user that Facebook makes available.</li>
<li>Temporal completeness: A data explanation is temporally complete if the platform shows all attributes inferred over a specified period of time.</li>
</ol>
<p>The results for each property in this experiment are shown below:</p>
<p><center><a href="https://secml.github.io/images/class7/DataRes.png"><img src="https://secml.github.io/images/class7/DataRes.png" width="80%"></a></center></p>
<h3 id="conclusion">Conclusion</h3>
<p>While the Ad Preferences Page does bring some transparency to the different attributes users can be targeted with,
the provided explanations are incomplete and often vague. Facebook does not provide information about data broker-provided attributes in its data explanations or in its ad explanations.</p>
<h2 id="discrimination-in-online-targeted-advertising">Discrimination in Online Targeted Advertising</h2>
<blockquote>
<p>Till Speicher, Muhammad Ali, Giridhari Venkatadri, Filipe Nunes Ribeiro, George Arvanitakis, Fabr&iacute;cio Benevenuto, Krishna P. Gummadi, Patrick Loiseau, Alan Mislove. <em>Potential for Discrimination in Online Targeted Advertising</em>. Proceedings of the 1st Conference on Fairness, Accountability and Transparency, PMLR 81:5-19, 2018. <a href="http://proceedings.mlr.press/v81/speicher18a/speicher18a.pdf">[PDF]</a></p>
</blockquote>
<p>Recently, online targeted advertising platforms like Facebook have received intense criticism for allowing advertisers to discriminate against users belonging to protected groups.</p>
<p>Facebook, in particular, is facing a civil rights lawsuit for allowing advertisers to target ads using an attribute called &ldquo;ethnic affinity.&rdquo; Facebook has clarified that &ldquo;ethnic affinity&rdquo; does not represent ethnicity, but rather represents a user’s affinity for content related to different ethnic communities. Facebook has agreed to rename the attribute to &ldquo;multicultural affinity&rdquo; and to disallow using this attribute to target ads related to housing, employment, and financial services.</p>
<p>However, Facebook offers many different ways to describe a set of targeted users, so it is not adequate simply to disallow targeting on certain attributes. In this paper, the authors develop a framework for quantifying ad discrimination and show the potential for discriminatory advertising using the three different targeting methods on Facebook’s advertising platform: personally identifiable information (PII)-based targeting, attribute-based targeting, and look-alike audience targeting.</p>
<h3 id="quantifying-ad-discrimination">Quantifying Ad Discrimination</h3>
<p>The authors identify three potential approaches to quantifying discrimination.</p>
<p><strong>Based on advertiser’s intent:</strong> The authors reject this approach since it is hard to measure and it does not capture unintentionally discriminatory ads.</p>
<p><strong>Based on ad targeting process:</strong> This category includes existing anti-discrimination measures, like disallowing use of sensitive attributes when defining a target population. The authors reject this approach since it breaks down when there exist several methods of targeting a population.</p>
<p><strong>Based on targeted audience (outcomes):</strong> This approach takes into account only which users are targeted, not how they are targeted. The authors use this approach to quantify ad discrimination since outcome-based analyses generalize independent of targeting methods.</p>
<p>The authors formalize outcome-based discrimination as follows:</p>
<p><blockquote>
Let \(\mathbf{D} = (u_i)_{i=1,\ldots,n}\) be a database of user records \(u_i\).<br />
Let \(u_i \in \mathbb{B}^m\) be a vector of \(m\) boolean attributes.<br />
Let \(s \in \{1, \ldots, m\}\) be the sensitive attribute we are considering.<br />
Let \(u_s\) be the value of sensitive attribute \(s\) for user \(u\).<br />
Let \(\mathbf{S} = \{u \in \mathbf{D} | u_s = 1\}\) be the set of all users having sensitive attribute \(s\).
</blockquote></p>
<p>The authors define a metric for how discriminatory an advertiser’s targeting is, inspired by the disparate impact measure used for recruiting candidates from a pool.</p>
<p><blockquote>
Let \(\mathbf{TA}\) (target audience) be the set of users selected by the targeting process.<br />
Let \(\mathbf{RA}\) (relevant audience) be the set of all users in the database \(\mathbf{D}\) who would find the ad useful and interesting.<br />
</blockquote></p>
<p>Define the representation ratio measure to capture how much more likely a user is to be targeted when having the sensitive attribute than if the user did not have the attribute:</p>
<p>$$\text{rep\_ratio}_s(\mathbf{TA}, \mathbf{RA}) = \dfrac{|\mathbf{TA} \cap \mathbf{RA}_s|/|\mathbf{RA}_s|}{|\mathbf{TA} \cap \mathbf{RA}_{\neg s}|/|\mathbf{RA}_{\neg s}|}$$</p>
<p>where \(\mathbf{RA}_s = \{u \in \mathbf{RA} | u_s = 1 \}\) is the subset of the relevant audience with the sensitive attribute and \(\mathbf{RA}_{\neg s} = \{u \in \mathbf{RA} | u_s = 0\}\) is the complementary subset of the relevant audience without the sensitive attribute.</p>
<p>Define the disparity in targeting measure to capture both over- and under-representation of a sensitive attribute in a target audience:</p>
<p>$$\text{disparity}_s(\mathbf{TA}, \mathbf{RA}) = \max\left(\text{rep\_ratio}_s(\mathbf{TA}, \mathbf{RA}), \dfrac{1}{\text{rep\_ratio}_s(\mathbf{TA}, \mathbf{RA})}\right)$$</p>
<p>Disparity must be computed based on the relevant audience \(\mathbf{RA}\) because \(\mathbf{RA}\) may have a different distribution of the sensitive attribute than the whole database \(\mathbf{D}\). The authors assume that sensitive attributes considered have the same distributions in the relevant audience as the global population, and therefore high disparity in targeting is evidence of discrimination. Following the &ldquo;80%&rdquo; disparate impact rule, a reasonable disparity threshold for a group to be over- or under-represented may be \(\max(0.8, 1/0.8) = 1.25\).</p>
<p>The recall of an ad quantifies how many of the relevant users with the sensitive attribute the ad targets or excludes:</p>
<p>$$\text{recall}(\mathbf{TA}, \mathbf{RA}') = \dfrac{|\mathbf{TA} \cap \mathbf{RA}'|}{|\mathbf{RA}'|}$$</p>
<p>where \(\mathbf{RA}'\) is one of \(\mathbf{RA}_s\) or \(\mathbf{RA}_{\neg s}\), depending on whether we’re considering the inclusion or exclusion of \(\mathbf{S}\).</p>
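<p>Under the definitions above, the three measures are straightforward to compute from sets of user IDs. A minimal sketch (illustrative, not the authors&rsquo; code):</p>

```python
def rep_ratio(TA, RA, RA_s):
    """Representation ratio: how much more likely a relevant user with
    the sensitive attribute is to be targeted than one without it."""
    RA_not_s = RA - RA_s
    p_s = len(TA & RA_s) / len(RA_s)
    p_not_s = len(TA & RA_not_s) / len(RA_not_s)
    return p_s / p_not_s

def disparity(TA, RA, RA_s):
    """Disparity: over- or under-representation, whichever is larger."""
    r = rep_ratio(TA, RA, RA_s)
    return max(r, 1 / r)

def recall(TA, RA_prime):
    """Fraction of the relevant subgroup RA' that the ad reaches."""
    return len(TA & RA_prime) / len(RA_prime)
```

<p>With the 80% rule's threshold of 1.25, a target audience whose disparity exceeds that value would be flagged as over- or under-representing the sensitive group.</p>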
<h3 id="pii-based-targeting">PII-Based Targeting</h3>
<p>PII-based targeting on the Facebook advertising platform allows advertisers to select a target audience using unique identifiers, like phone numbers, email addresses, and combinations of name with other attributes (e.g. birthday or zip code). The authors show that public data sources, such as voter records and criminal history records, contain sufficient PII to construct a discriminatory target audience for a sensitive attribute without explicitly targeting that attribute.</p>
<p>The authors constructed datasets to show that they could implicitly target gender, race, and age using North Carolina voter records. Each of these attributes is listed in voting records, and the remaining fields together uniquely identify the voter (i.e., last name, first name, city, state, zip code, phone number, and country). The authors uploaded datasets targeting values of each attribute and recorded Facebook’s estimated audience size.</p>
<p><center><a href="https://secml.github.io/images/class7/table1.jpg"><img src="https://secml.github.io/images/class7/table1.jpg" width=95%></img></a></center></p>
<p>The Voter Records column shows the distribution of attribute values in the voter records data set. For a given attribute, the Facebook Users column shows how many of the 10,000 people in the dataset constructed for that attribute are actually targetable on Facebook (as reported by the Facebook advertising platform). The final column shows the portion of the targetable users who actually match the targeted attribute, found by restricting the target audience using Facebook’s records of the sensitive attribute. High targetable percentages show that Facebook users overlap significantly with the voter records data set. High validation percentages show that the auxiliary PII was highly accurate at describing particular users with the targeted attribute. Note that there are some low validation percentages, which the authors attribute to Facebook’s inaccurate or incomplete records of some data (for example, Facebook does not know race, only &ldquo;multicultural affinity&rdquo;).</p>
<h3 id="attribute-based-targeting">Attribute-Based Targeting</h3>
<p>Attribute-based targeting allows advertisers to select a target audience by specifying that targeted users should have some attribute or combination of attributes. The authors group these attributes into two categories: curated attributes and free-form attributes. Curated attributes are well-defined binary attributes spanning demographics, behaviors, and interests — Facebook tracks a list of over 1,100 of these. Free-form attributes describe users’ inferred interest in entities such as websites and apps as well as topics such as food preferences or niche interests. The authors estimate that there are at least hundreds of thousands of free-form attributes.</p>
<p>The authors demonstrate that many curated attributes are correlated with sensitive attributes like race, and can therefore be used for discriminatory audience creation. The following table shows experimental results obtained by uploading sets of voter records filtered to contain only a single race and measuring Facebook’s reported size of the subaudiences for each curated attribute. The figures in parentheses are the recall and representation ratio for a population from North Carolina. The top three most inclusive and exclusive attributes per ethnicity are listed. Note the high representation ratios for the &ldquo;Most inclusive&rdquo; column and the low representation ratios for the &ldquo;Most exclusive&rdquo; column.</p>
<p><img src="https://secml.github.io/images/class7/table2.jpg" alt="" /></p>
<p>The authors similarly demonstrated that free-form attributes could be used in a discriminatory manner. For example, targeting a vulnerable audience could be made possible by targeting the free-form attributes &ldquo;Addicted,&rdquo; &ldquo;REHAB,&rdquo; &ldquo;AA,&rdquo; or &ldquo;Support group.&rdquo; The authors also showed how Facebook’s attribute suggestions feature could be used to discover new highly-discriminatory free-form attributes. For example, starting a search with &ldquo;Fox&rdquo; (37% conservative audience on Facebook) and following a chain of suggestions leads to &ldquo;The Sean Hannity Show&rdquo; (95% conservative audience on Facebook).</p>
<h3 id="look-alike-audience-targeting">Look-Alike Audience Targeting</h3>
<p>Look-alike audience targeting allows advertisers to generate a new target audience that looks similar to an existing set of users (the fans of one of their Facebook pages or an uploaded PII data set). The authors show that this feature can be used to scale a biased audience to a much larger population. Experimental results suggest that Facebook attempts to determine the attributes that distinguish the base target audience from the general population and propagates these biases to the look-alike audience. The authors show that this bias propagation can amplify both intentionally created and unintentionally overlooked biases in source audiences.</p>
<h2 id="algorithmic-transparency-via-quantitative-input-influence">Algorithmic Transparency via Quantitative Input Influence</h2>
<blockquote>
<p>Anupam Datta, Shayak Sen, Yair Zick. <em>Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems</em>. 2016 IEEE Symposium on Security and Privacy (&ldquo;Oakland&rdquo;). <a href="https://www.andrew.cmu.edu/user/danupam/datta-sen-zick-oakland16.pdf">[PDF]</a></p>
</blockquote>
<p>Machine learning systems are increasingly being used to make important societal decisions, in sectors including healthcare, education, and insurance.
For instance, an ML model may help a bank decide if a client is eligible for a loan, and both parties may want to know critical details about how the model works.
A rejected client will likely want to know why they were rejected: would they have been accepted if their income was higher?
The answer would be especially important if their reported income was lower than their actual income;
more generally, the client can ensure that their input data contained no errors.</p>
<p>Conversely, the model&rsquo;s user may want to ensure that the model does not discriminate based on sensitive inputs, such as the legally-restricted features of race and gender.
Simply ignoring those features may not be sufficient to prevent discrimination; e.g., ZIP code can be used as a proxy for race.
This paper proposes a method to solve these problems by making the model&rsquo;s behavior more transparent: a quantitative measure of the effect of a particular feature (or set of features) on the model&rsquo;s decision for an individual.
The paper offers several approaches suited for various circumstances, but they all fall under the umbrella of &ldquo;quantitative input influence&rdquo;, or QII.</p>
<h3 id="unary-qii">Unary QII</h3>
<p>The simplest quantitative measure presented is unary QII, which measures the influence of one attribute on a quantity of interest \(Q_\mathcal{A}\) for some subset of the sample space \(X\).
Formally, unary QII is determined as</p>
<p><center><a href="https://secml.github.io/images/class7/unaryQII.png"><img src="https://secml.github.io/images/class7/unaryQII.png" width=40%></img></a></center></p>
<p>where the first term is the actual expected value of \(Q_\mathcal{A}\) for this subset, and the second term is the expected value if the feature \(i\) were randomized.</p>
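<p>In practice, the two expectations can be estimated by Monte-Carlo sampling: score the subset as-is, then score it again with feature \(i\) replaced by draws from its marginal distribution. The sketch below is illustrative only; <code>model</code> and <code>marginal_pool</code> are hypothetical stand-ins for the classifier's quantity of interest and the population distribution of feature \(i\):</p>

```python
import random

def unary_qii(model, X, i, marginal_pool, n_samples=2000, seed=0):
    """Monte-Carlo estimate of unary QII for feature i:
    E[Q(x)] over the subset X, minus the same expectation after
    feature i is resampled from its marginal distribution."""
    rng = random.Random(seed)
    baseline = sum(model(x) for x in X) / len(X)
    total = 0.0
    for _ in range(n_samples):
        x = list(rng.choice(X))           # random individual from X
        x[i] = rng.choice(marginal_pool)  # intervene on feature i only
        total += model(x)
    return baseline - total / n_samples
```

<p>A feature whose randomization barely moves the expectation gets a QII near zero; a feature the model leans on heavily gets a large value.</p>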
<p>For example, consider the rejected bank client from above.
If they restrict \(X\) to only contain their feature vector, and they set \(Q_\mathcal{A}\) to output the model&rsquo;s probability of rejection,
then unary QII tells how much any individual feature impacted their loan application.
If the unary QII for a feature is large, changing the value of that feature would likely increase their odds of being accepted;
conversely, changing the value of a feature with low unary QII would make little difference.</p>
<p>The paper presents a concrete example: Mr. X has been classified as a low-income individual, and he would like to know why.
Since only 2.1% of people with income above $10k are classified as low-income, Mr. X suspects racial bias.
In actuality, the transparency report shows that neither his race nor country of origin were significant;
rather, his marital status and education were far more influential in his classification.</p>
<p><center>
<a href="https://secml.github.io/images/class7/mrxprofile.png"><img src="https://secml.github.io/images/class7/mrxprofile.png" width="40%"></a><Br>
<A href="https://secml.github.io/images/class7/mrxreport.png"><img src="https://secml.github.io/images/class7/mrxreport.png" width=50%></a>
</center></p>
<p>The sample space \(X\) can also be broadened to include an entire class of people.
For instance, suppose \(X\) is restricted to include people of just one gender, and \(Q_\mathcal{A}\) is set to output the model&rsquo;s probability of acceptance.
Here, unary QII would reveal the influence of a feature \(i\) on men and on women.
A disparity between the two measures may then indicate that the model is biased:
specifically, the feature \(i\) can be identified as a proxy variable, used by the model to distinguish between men and women (even if gender is omitted as an input feature).</p>
<p><center><a href="https://secml.github.io/images/class7/unarygraph.png"><img src="https://secml.github.io/images/class7/unarygraph.png" width=50%></img></a></center></p>
<p>However, unary QII is often insufficient to explain a model&rsquo;s behavior on an individual or class of individuals.
This histogram shows the paper&rsquo;s results for their &ldquo;adult&rdquo; dataset:
for each individual, the feature that created the highest unary QII was found, and the unary QII value was plotted in the histogram.
Most individuals could not be explained by any particular feature, and most features had little influence by themselves.</p>
<h3 id="set-and-marginal-qii">Set and Marginal QII</h3>
<p>Thankfully, unary QII can easily be generalized to incorporate multiple features at once.
Set QII is defined as
<center><a href="https://secml.github.io/images/class7/setQII.png"><img src="https://secml.github.io/images/class7/setQII.png" width=35%></img></a></center></p>
<p>where \(S\) is a set of features (as opposed to a single feature, like \(i\) in unary QII).
The paper also defines marginal QII
<center><a href="https://secml.github.io/images/class7/marginalQII.png"><img src="https://secml.github.io/images/class7/marginalQII.png" width=55%></img></a></center></p>
<p>which measures the influence of a feature \(i\) after controlling for the features in \(S\).
These two quantitative measures have different use cases, but both are more general (and thus more useful) than unary QII.</p>
<p>Marginal QII can measure the influence of a single feature \(i\), like unary QII, but only for a specific choice of \(S\),
and the amount of influence can vary wildly depending on the choice of \(S\).
To account for this, the paper defines the <em>aggregate influence</em> of \(i\), which measures the expected influence of \(i\) for random choices of \(S\).</p>
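<p>Both quantities can be estimated by the same kind of resampling as unary QII. The sketch below is illustrative only: it resamples the features in \(S\) jointly from rows of an empirical dataset (a simplification of the paper&rsquo;s intervention distribution), and <code>model</code> and <code>data</code> are hypothetical stand-ins:</p>

```python
import random

def set_qii(model, x, S, data, n=500, seed=0):
    """Set QII: drop in the quantity of interest for individual x when
    the features in S are jointly resampled.  As a simplification, the
    intervention draws the features in S from a random row of `data`."""
    rng = random.Random(seed)
    base = model(x)
    total = 0.0
    for _ in range(n):
        x2 = list(x)
        donor = rng.choice(data)
        for j in S:
            x2[j] = donor[j]  # replace only the features in S
        total += model(x2)
    return base - total / n

def marginal_qii(model, x, i, S, data, **kw):
    """Marginal QII: influence of feature i after controlling for the
    features already in S."""
    return (set_qii(model, x, set(S) | {i}, data, **kw)
            - set_qii(model, x, set(S), data, **kw))
```

<p>Aggregate influence would then average <code>marginal_qii</code> over random choices of \(S\), in the spirit of a Shapley value.</p>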
<h3 id="conclusion-1">Conclusion</h3>
<p>The above variants of QII can be used to provide transparency reports, offering insight into how an ML model makes decisions about an individual.
Malicious actors may seek to abuse such a system, carefully crafting their input vector to glean someone else&rsquo;s private information.
However, these QII measures are shown to have low sensitivity, so differential privacy can be added with small amounts of noise.</p>
<p>These QII measures are useful only if the input features have well-defined semantics.
This is not true in domains such as image or speech recognition, yet transparency is still desirable there.
The authors assert that designing transparency mechanisms in these domains is an important future goal.
Nevertheless, these QII measures are remarkably effective on real datasets, both for understanding individual outcomes and for finding biases in ML models.</p>
<h2 id="language-corpus-bias">Language Corpus Bias</h2>
<blockquote>
<p>Caliskan, A., Bryson, J., &amp; Narayanan, A. (2017). <em>Semantics derived automatically from language corpora contain human-like biases</em>. Science, 356(6334), 183-186. doi:10.1126/science.aal4230 <a href="http://science.sciencemag.org/content/sci/356/6334/183.full.pdf">[PDF]</a> [<a href="http://opus.bath.ac.uk/55288/4/CaliskanEtAl_authors_full.pdf">Author&rsquo;s Full Version PDF</a>]</p>
</blockquote>
<p>The focus of this paper is how machine learning can learn from the biases and stereotypes in humans. The main contributions of the authors are:</p>
<ol>
<li>Using word embeddings to extract associations in text</li>
<li>Replicate human bias to reveal prejudice behavior in humans</li>
<li>Show that cultural stereotypes propagate to widely used AI today</li>
</ol>
<h3 id="uncovering-biases-in-ml">Uncovering Biases in ML</h3>
<p>The authors began by replicating inoffensive biases using their original Word-Embedding Association Test (WEAT). A word embedding represents each word as a vector, here a 300-dimensional one, and WEAT measures association by the distances (cosine similarities) between these vectors. Using WEAT, the authors demonstrated that flowers have pleasant associations and insects unpleasant ones, and that instruments are more pleasant than weapons. The word embeddings capture these properties of flowers or weapons even though the model has no direct experience with them!</p>
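<p>The WEAT statistic itself is simple: for each target word, compute its mean cosine similarity to one attribute set minus its mean similarity to the other, then compare the two target sets and normalize by the pooled standard deviation. A toy sketch, with two-dimensional vectors standing in for real 300-dimensional embeddings:</p>

```python
import math

def cos(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def assoc(w, A, B):
    """s(w, A, B): mean similarity of w to attribute set A minus
    mean similarity of w to attribute set B."""
    return (sum(cos(w, a) for a in A) / len(A)
            - sum(cos(w, b) for b in B) / len(B))

def weat_effect_size(X, Y, A, B):
    """Difference of mean associations of the two target sets X and Y,
    normalized by the pooled standard deviation."""
    sx = [assoc(x, A, B) for x in X]
    sy = [assoc(y, A, B) for y in Y]
    both = sx + sy
    mean = sum(both) / len(both)
    sd = math.sqrt(sum((s - mean) ** 2 for s in both) / (len(both) - 1))
    return (sum(sx) / len(sx) - sum(sy) / len(sy)) / sd
```

<p>With real embeddings, the same computation over flower/insect targets and pleasant/unpleasant attribute words reproduces the associations described above.</p>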
<p>After showing that WEAT works, they used the technique to show that machine learning absorbs stereotyped biases. In a study by Bertrand and Mullainathan, 5,000 otherwise-identical resumes, varying only the names, were sent in response to 1,300 job ads; applicants with European American names were 50% more likely to be offered an interview. Based on this study, the authors used WEAT to test the pleasantness associations of the names from Bertrand and Mullainathan’s work and found that European American names were more strongly associated with pleasantness than African American names.</p>
<p>They then turned to studying gender biases. Female names were associated with family, as opposed to male names, which were associated with career. They also showed that women/girls were associated more with the arts than with math, compared to men. These observations correlated with real data from the labor force, as shown in the figure below:</p>
<p><center><a href="https://secml.github.io/images/class7/gender_bias.PNG"><img src="https://secml.github.io/images/class7/gender_bias.PNG" width=50%></img></a></center></p>
<p>The authors then applied another method of their own creation, the Word-Embedding Factual Association Test (WEFAT), to show that these embeddings correlate strongly with the occupations women actually hold. The word vectors come from GloVe, which measures similarity between a pair of vectors; similarity between vectors is related to the probability that the words co-occur with similar context words. GloVe obtains this by performing dimensionality reduction to amplify the signal in co-occurrence probabilities.</p>
<p>The embeddings were trained on a web crawl of 840 billion words, with each word assigned a 300-dimensional vector derived from counts of other words occurring with it in a 10-word window. WEFAT allowed the authors to examine further how embeddings capture empirical information, predicting real-world properties from the word vectors alone.</p>
<h3 id="conclusion-2">Conclusion</h3>
<p>So what does their work mean? The results show that word embeddings offer a way to reveal implicit associations: they encode not only stereotyped biases but also innocuous knowledge, such as that flowers are pleasant. The results also bear on the origins of prejudice in humans, suggesting that group identity can be transmitted through language itself, before any institution or individual decision comes into play. There are implications for AI and ML, because deployed technology could perpetuate cultural stereotypes: what if a model responsible for screening resumes absorbed them? It is important to keep this in mind and to be cautious going forward.</p>
<h2 id="men-also-like-shopping-reducing-bias-amplification">Men Also Like Shopping: Reducing Bias Amplification</h2>
<blockquote>
<p>Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, Kai-Wei Chang. <em>Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints</em>. 2017 Conference on Empirical Methods in Natural Language Processing. <a href="https://arxiv.org/pdf/1707.09457.pdf">arXiv preprint arXiv:1707.09457</a>. July 2017.</p>
</blockquote>
<p>Language is increasingly used to define rich visual recognition tasks, and structured prediction models are widely applied to these tasks to take advantage of correlations between co-occurring labels and visual inputs. However, social biases can inadvertently be encoded through the model training procedure, which may magnify stereotypes and poses a challenge for the fairness of machine learning decision making.</p>
<p>The researchers found that datasets for these tasks contain significant gender bias, and that models trained on these biased datasets further amplify the existing bias. As an example from the paper, the activity &ldquo;cooking&rdquo; is over 33% more likely to refer to females than males in the training set, and a model trained on this dataset further amplifies the disparity to 68% at test time. To tackle the problem, the authors propose corpus-level constraints for calibrating existing structured prediction models. Specifically, the gender bias of the model&rsquo;s predictions is allowed to deviate by only a small amount from the bias in the original training data.</p>
<h3 id="problem-formulation">Problem Formulation</h3>
<p>The problem is then defined as maximizing the test-time inference likelihood while also satisfying the corpus-level constraint. A bias score for an output \(o\) with respect to demographic variable \(g\) is defined as:
$$b(o,g) = \frac{c(o,g)}{\sum_{g' \in G}c(o,g')}$$
where \(c(o,g)\) counts the co-occurrences of \(o\) and \(g\) in a corpus. Bias toward \(g\) is exhibited if \(b(o,g) &gt; 1/\|G\|\). The mean bias amplification of a model relative to the bias on the training dataset (i.e., \(b^{*}(o,g)\)) is defined as:</p>
<p align="center">
<img src="https://secml.github.io/images/class7/mean_bias_amp.png" width="350" >
<br>
</p>
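<p>To make the definitions concrete, here is a small sketch (ours, not the authors&rsquo; code) computing the bias score from raw co-occurrence counts, and the mean bias amplification over the output&ndash;gender pairs that are biased in training:</p>

```python
def bias_score(counts, o, g, genders):
    # b(o, g) = c(o, g) / sum over g' of c(o, g')
    total = sum(counts.get((o, gp), 0) for gp in genders)
    return counts.get((o, g), 0) / total if total else 0.0

def mean_bias_amplification(train_counts, pred_counts, outputs, genders):
    # Average increase of the prediction bias over the training bias,
    # taken over (o, g) pairs biased in training, i.e. b*(o, g) > 1/|G|.
    diffs = [bias_score(pred_counts, o, g, genders)
             - bias_score(train_counts, o, g, genders)
             for o in outputs for g in genders
             if bias_score(train_counts, o, g, genders) > 1.0 / len(genders)]
    return sum(diffs) / len(diffs) if diffs else 0.0
```

<p>For instance, if &ldquo;cooking&rdquo; co-occurs with woman 2 times and man 1 time in training (\(b^{*} = 2/3\)) but the model predicts woman 4 times and man 1 time (\(b = 0.8\)), the amplification for that pair is about 0.13.</p>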
<p>With these terms defined, the authors propose a calibration algorithm, \(\textbf{R}educing~\textbf{B}ias~\textbf{A}mplification\) (RBA). Intuitively, the algorithm injects constraints to ensure that the model&rsquo;s predictions follow the gender distribution observed in the training data, with only a small allowable deviation.</p>
<h4 id="structured-output-prediction">Structured Output Prediction</h4>
<p>Given a test instance, the inference problem at test time is defined as:
$$\underset{y\in Y}{\operatorname{argmax}}f_{\theta}(y,i)$$</p>
<p>where \(f_{\theta}(y,i)\) is a scoring function based on a model \(\theta\) learned from training data. The inference problem is hence to find the structured output \(y\) that maximizes the scoring function. The corpus-level constraint is expressed as:</p>
<p align="center">
<img src="https://secml.github.io/images/class7/corpus_constraint.png" width="450" >
<br>
</p>
<p>With the given constraint, the problem is formulated as:
<p align="center">
<img src="https://secml.github.io/images/class7/final_form.png" width="300" >
<br>
</p></p>
<p>where \(i\) refers to instance \(i\) in the test dataset.</p>
<p>The corpus-level constraint is represented by \(A\sum_{i}y^{i}-b \leq 0\), where the matrix \(A\in R^{l \times K}\) holds the coefficients of the constraints (one row per constraint) and \(b \in R^{l}\). Note that the formulation above can then be solved individually for each instance \(i\).</p>
<h4 id="lagrangian-relaxation">Lagrangian Relaxation</h4>
<p>The final optimization problem is a mixed integer programming problem, and solving it with an off-the-shelf solver is inefficient for large-scale datasets. Hence, the authors propose to solve the problem with the <a href="https://www.jair.org/media/3680/live-3680-6584-jair.pdf">Lagrangian relaxation technique</a>. With a Lagrange multiplier introduced, the Lagrangian is
<p align="center">
<img src="https://secml.github.io/images/class7/lagrangian_form.png" width="350" >
<br>
</p></p>
<p>where \(\lambda_{j} \geq 0, \forall j \in \{1,\dots,l\}\). The solution to the problem is then obtained by iteratively optimizing with respect to \(y^{i}\) and \(\lambda\). Specifically, each optimization iteration has two steps:<br />
1) At iteration \(t\), first obtain the output solution for each instance \(i\)
<p align="center">
<img src="https://secml.github.io/images/class7/lagrangian_opt_1.png" width="300" >
<br>
</p>
<br />
2) Next, update the Lagrange multipliers
<p align="center">
<img src="https://secml.github.io/images/class7/lagrangian_opt_2.png" width="300" >
<br>
</p></p>
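<p>The two-step iteration can be sketched as follows (our simplified illustration, not the authors&rsquo; implementation: each instance&rsquo;s candidate outputs are enumerated explicitly rather than produced by structured inference, and <code>eta</code> is an assumed step size):</p>

```python
import numpy as np

def lagrangian_calibrate(scores, candidates, A, b, eta=0.1, iters=100):
    # scores[i][k] is f_theta(candidate k, instance i);
    # candidates is a (K, d) array of output encodings y;
    # the corpus-level constraint is A @ sum_i y_i - b <= 0.
    lam = np.zeros(A.shape[0])
    choice = [0] * len(scores)
    for _ in range(iters):
        # Step 1: per-instance argmax of the Lagrangian,
        #   y^i = argmax_y f(y, i) - lam^T (A y)
        for i, f in enumerate(scores):
            penalized = [f[k] - lam @ (A @ candidates[k])
                         for k in range(len(candidates))]
            choice[i] = int(np.argmax(penalized))
        # Step 2: projected subgradient update of the multipliers,
        #   lam = max(0, lam + eta * (A sum_i y^i - b))
        total = sum(candidates[c] for c in choice)
        lam = np.maximum(0.0, lam + eta * (A @ total - b))
    return choice, lam
```

<p>With a constraint such as &ldquo;at most one instance may receive label 0,&rdquo; the multipliers grow until the lower-scoring instance switches to its second-best label.</p>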
<h3 id="experimental-setup">Experimental Setup</h3>
<p>The approach is evaluated on two visual recognition tasks: visual semantic role labeling (vSRL) and multi-label classification (MLC). The authors focus on gender bias, where \(G = \{man, woman\}\); they consider the agent role in vSRL, and in MLC any gender word occurring in the text associated with the images.</p>
<h4 id="dataset-and-model">Dataset and Model</h4>
<p>The experiment of vSRL is conducted on <a href="https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Yatskar_Situation_Recognition_Visual_CVPR_2016_paper.pdf">imSitu</a> where activity classes are drawn from verbs and roles in <a href="http://delivery.acm.org/10.1145/990000/980860/p86-baker.pdf?ip=128.143.69.35&amp;id=980860&amp;acc=OPEN&amp;key=B33240AC40EC9E30%2E95F2ACB8D94EAE2C%2E4D4702B0C3E38B35%2E6D218144511F3437&amp;__acm__=1521592411_637f2c8599c29ab1aa3d8bf2818f6140">FrameNet</a> and noun categories are drawn from <a href="https://academic.oup.com/ijl/article-abstract/3/4/235/923280">WordNet</a>. The model is built on the baseline <a href="https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Yatskar_Situation_Recognition_Visual_CVPR_2016_paper.pdf">CRF</a> released with the data, which has been shown effective compared to a non-structured prediction baseline [2]. The experiment of MLC is conducted on <a href="https://link.springer.com/chapter/10.1007/978-3-319-10602-1_48">MS-COCO</a>. The model is a similar model as CRF that is used for vSRL.</p>
<h4 id="result-analysis">Result analysis</h4>
<p>For both the vSRL and MLC tasks, the training data is biased, as illustrated in Figure 2, where the y-axis denotes the percentage of male agents and the x-axis represents the gender bias in the training data. Many verbs are clearly biased in the training set, and when a model is trained on these biased datasets the gender bias is further amplified. Figure 2(a) demonstrates the gender bias in the vSRL task and Figure 2(b) shows the gender bias in the MLC task. Seemingly neutral words like &ldquo;microwaving&rdquo; and &ldquo;washing&rdquo; are heavily biased towards female, while other words like &ldquo;driving&rdquo; are heavily biased towards male.</p>
<p align="center">
<img src="https://secml.github.io/images/class7/bias_fig.png" width="500" >
<br>
</p>
<p><div class="caption">
Source: <a href="https://arxiv.org/pdf/1707.09457.pdf"><em>Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints</em></a> [1]
</div></p>
<p>Calibration results using the RBA method are summarized in the table below. The experimental results show that this calibration significantly reduces the amplification of gender bias.<br />
<p align="center">
<img src="https://secml.github.io/images/class7/calib_result.png" width="30%">
<br>
</p></p>
<p><div class="caption">
Source: <a href="https://arxiv.org/pdf/1707.09457.pdf"><em>Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints</em></a> [1]
</div></p>
<hr />
<p>&ndash; Team Gibbon:
Austin Chen, Jin Ding, Ethan Lowman, Aditi Narvekar, Suya</p>
<h4 id="references">References</h4>
<p><a href="http://proceedings.mlr.press/v81/speicher18a/speicher18a.pdf">[1]</a> Till Speicher, Muhammad Ali, Giridhari Venkatadri, Filipe Nunes Ribeiro, George Arvanitakis, Fabr&iacute;cio Benevenuto, Krishna P. Gummadi, Patrick Loiseau, Alan Mislove. &ldquo;Potential for Discrimination in Online Targeted Advertising.&rdquo; Proceedings of the 1st Conference on Fairness, Accountability and Transparency, PMLR 81:5-19, 2018.</p>
<p><a href="https://www.andrew.cmu.edu/user/danupam/datta-sen-zick-oakland16.pdf">[2]</a> Anupam Datta, Shayak Sen, Yair Zick. &ldquo;Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems.&rdquo; 2016 IEEE Symposium on Security and Privacy (SP), 2016.</p>
<p><a href="http://science.sciencemag.org/content/sci/356/6334/183.full.pdf">[3]</a> Aylin Caliskan, Joanna J. Bryson, Arvind Narayanan. &ldquo;Semantics derived automatically from language corpora contain human-like biases.&rdquo; Science Magazine, 2017.</p>
<p><a href="https://www.jair.org/media/3680/live-3680-6584-jair.pdf">[4]</a> Rush, Alexander M., and Michael Collins. &ldquo;A tutorial on dual decomposition and Lagrangian relaxation for inference in natural language processing.&rdquo; Journal of Artificial Intelligence Research (2012).</p>
<p><a href="https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Yatskar_Situation_Recognition_Visual_CVPR_2016_paper.pdf">[5]</a> Yatskar, Mark, Luke Zettlemoyer, and Ali Farhadi. &ldquo;Situation recognition: Visual semantic role labeling for image understanding.&rdquo; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.</p>
<p><a href="http://delivery.acm.org/10.1145/990000/980860/p86-baker.pdf?ip=128.143.69.35&amp;id=980860&amp;acc=OPEN&amp;key=B33240AC40EC9E30%2E95F2ACB8D94EAE2C%2E4D4702B0C3E38B35%2E6D218144511F3437&amp;__acm__=1521592411_637f2c8599c29ab1aa3d8bf2818f6140">[6]</a> Baker, Collin F., Charles J. Fillmore, and John B. Lowe. &ldquo;The berkeley framenet project.&rdquo; Proceedings of the 17th international conference on Computational linguistics-Volume 1. Association for Computational Linguistics, 1998.</p>
<p><a href="https://academic.oup.com/ijl/article-abstract/3/4/235/923280">[7]</a> Miller, George A., et al. &ldquo;Introduction to WordNet: An on-line lexical database.&rdquo; International journal of lexicography 3.4 (1990): 235-244.</p>
<p><a href="https://link.springer.com/chapter/10.1007/978-3-319-10602-1_48">[8]</a> Lin, Tsung-Yi, et al. &ldquo;Microsoft coco: Common objects in context.&rdquo; European conference on computer vision. Springer, Cham, 2014.</p>
<p><a href="http://wp.internetsociety.org/ndss/wp-content/uploads/sites/25/2018/02/ndss2018_10-1_Andreou_paper.pdf">[9]</a> Andreou, Athanasios, et al. &ldquo;Investigating ad transparency mechanisms in social media: A case study of Facebook’s explanations.&rdquo; NDSS, 2018.</p>
</description>
</item>
<item>
<title>Class 6: Measuring Robustness of ML Models</title>
<link>https://secml.github.io/class6/</link>
<pubDate>Fri, 02 Mar 2018 00:00:00 +0000</pubDate>
<guid>https://secml.github.io/class6/</guid>
<description>
<h2 id="motivation">Motivation</h2>
<p>In what seems to be an endless back-and-forth between new adversarial attacks and new defenses against those attacks, we would like a means of formally verifying the robustness of machine learning algorithms to adversarial attacks.
In the privacy domain, there is the idea of a differential privacy budget, which quantifies privacy over all possible attacks. In the following three papers, we see attempts at deriving an equivalent benchmark for security, one that will allow the evaluation of defenses against all possible attacks instead of just a specific one.</p>
<h2 id="provably-minimally-distorted-adversarial-examples">Provably Minimally Distorted Adversarial Examples</h2>
<blockquote>
<p>Nicholas Carlini, Guy Katz, Clark Barrett, David L. Dill. <em>Provably Minimally-Distorted Adversarial Examples</em>. <a href="https://arxiv.org/abs/1709.10207">arXiv preprint arXiv:1709.10207</a>. September 2017.</p>
<p>Guy Katz, Clark Barrett, David Dill, Kyle Julian, Mykel Kochenderfer. <em>Reluplex: An efficient SMT solver for verifying deep neural networks</em>. International Conference on Computer Aided Verification. 2017. <a href="https://arxiv.org/pdf/1702.01135.pdf">[PDF]</a></p>
</blockquote>
<p>There have been many attack techniques against deep learning models that can effectively generate adversarial examples, such as FGSM, JSMA, DeepFool, and the Carlini &amp; Wagner attacks. But most of them cannot verify the absence of adversarial examples even when they fail to find one.</p>
<p>Researchers have started to borrow ideas and techniques from the program verification field to verify the robustness of neural networks. To verify the correctness of a program, we can encode it into satisfiability-modulo-theories (SMT) formulas and use an off-the-shelf solver (e.g., Microsoft Z3) to check a correctness property. An SMT solver generates sound and complete results: either it tells you the program never violates the property, or it gives you specific counter-examples. However, we may not be able to get results for some large programs in our lifetime, because the underlying problem is NP-complete.</p>
<p>Similarly, the current neural network verification techniques cannot yet handle deep learning models of arbitrary size, but one prototype, named Reluplex, has produced promising results on the MNIST dataset.</p>
<p>Reluplex is an extension of the Simplex algorithm. It introduces a new domain theory solver to handle the ReLU activation function, because Simplex only deals with linear real arithmetic. You can find more details about Reluplex in:</p>
<blockquote>
<p>Guy Katz, Clark Barrett, David Dill, Kyle Julian, Mykel Kochenderfer. <em>Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks</em>. <a href="https://arxiv.org/abs/1702.01135">Arxiv</a>. 2017.</p>
</blockquote>
<p>The paper we discuss here uses Reluplex to verify the <em>local adversarial robustness</em> of two MNIST classification models. A model is \(\delta\)-locally-robust at input \(x\) if for every \(x'\) such that \( \|x-x'\|_p \le \delta \), the model predicts the same label for \(x\) and \(x'\). Local robustness is certified for individual inputs, which is substantially different from <em>global robustness</em>, which certifies the whole input space. The paper performs binary search on \(\delta\) to find the minimal distortion at a certain precision, where each candidate \(\delta\) corresponds to one execution of Reluplex. The paper only considers the \(L_\infty\) and \(L_1\) norms because these constraints are easier to encode with Reluplex; for example, the \(L_1\) norm can be encoded as
\( |x| = \max(x, -x) = \mathrm{ReLU}(2x) - x \).</p>
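<p>This encoding identity is easy to sanity-check numerically (our quick check, not from the paper):</p>

```python
import numpy as np

def relu(z):
    # Elementwise ReLU.
    return np.maximum(z, 0.0)

# Each |x| term is encoded with a single extra ReLU:
# |x| = max(x, -x) = ReLU(2x) - x
x = np.linspace(-3.0, 3.0, 13)
assert np.allclose(relu(2.0 * x) - x, np.abs(x))
```

<p>The identity holds because \(\mathrm{ReLU}(2x) - x\) equals \(2x - x = x\) when \(x \ge 0\) and \(0 - x = -x\) when \(x &lt; 0\).</p>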
<p>The evaluation is conducted on a fully-connected, 3-layer network that has 20k weights and fewer than 100 hidden neurons. The testing accuracy of the model is 97%. The model is smaller than most of the state-of-the-art models and has inferior accuracy, but should be good enough for the model verification prototype. The authors arbitrarily selected 10 source images with known labels from the MNIST test set, which produced 90 targeted attack instances in total.</p>
<p>Even though the Reluplex method is faster than most existing general SMT solvers, it is not fast enough for verifying the MNIST classification models: for every configuration of target model and \(L_p\) norm constraint, Reluplex timed out on some of the 90 instances. The experiments with the \(L_1\) constraint were generally slower than those with the
\(L_\infty\) constraint because the \(L_1\) encoding introduces more ReLU components. Still, the successful verification instances yielded some interesting results.</p>
<p>The paper compared the minimally-distorted adversarial examples found by Reluplex with those generated by the CW attack and concluded that iterative optimization-based attacks are effective because the CW attack produced adversarial examples within 12% of optimal on the specific models.</p>
<p>The paper also evaluated a particular defense proposed by Madry et al., an adversarial training method that uses the PGD attack and enlarges the model capacity. The paper concluded that PGD-based adversarial training increased the minimal distortion of adversarial examples by 4.2x on the examined samples. Although this result does not guarantee efficacy at larger scale, it shows that, on these samples, the defense increases robustness against all possible attacks even though it was trained only with the PGD attack.</p>
<hr />
<h2 id="evaluating-the-robustness-of-neural-networks">Evaluating the Robustness of Neural Networks</h2>
<blockquote>
<p>Tsui-Wei Weng, Huan Zhang, Pin-Yu Chen, Jinfeng Yi, Dong Su, Yupeng Gao, Cho-Jui Hsieh, Luca Daniel. <em>Evaluating the Robustness of Neural Networks: An Extreme Value Theory Approach</em>. ICLR 2018. January 2018 <a href="https://arxiv.org/pdf/1801.10578.pdf">[PDF]</a></p>
</blockquote>
<p>Little work has been done toward developing a comprehensive measure of robustness for neural networks, primarily because they grow mathematically complex as the number of layers increases. The authors contribute a lower bound on the minimal perturbation needed to generate an adversarial example, which they estimate by applying extreme value theory to the local Lipschitz constant around a given sample.</p>
<p>The work is motivated by the success of a particular attack by Carlini and Wagner against several previous defense strategies such as defensive distillation, adversarial training, and model ensembles <a href="https://arxiv.org/pdf/1608.04644.pdf">[2]</a>. This highlights the need for a means of evaluating a defense&rsquo;s effectiveness against <em>all</em> attacks rather than just the ones it was tested against.</p>
<p>Previous attempts at deriving such a lower bound have shortcomings of their own. Szegedy et al. compute the product of the global Lipschitz constant of each layer of the network to derive a metric of instability of a deep neural network&rsquo;s output; however the global Lipschitz constant is a loose bound <a href="https://arxiv.org/pdf/1312.6199.pdf">[3]</a>.</p>
<p>Hein and Andriushchenko derived a closed-form bound using the local Lipschitz constant, but such a bound is only feasible for networks with one hidden layer. Several other approaches used linear programming to verify properties of neural networks, but they are also infeasible for large networks <a href="https://arxiv.org/pdf/1705.08475.pdf">[4]</a>.</p>
<p>The most similar work uses a linear programming formulation to derive an <em>upper</em> bound on the minimum distortion needed, which is not as useful as a universal robustness metric, and also is infeasible for large networks due to the computational complexity of linear programming.</p>
<h3 id="robustness-metric">Robustness metric</h3>
<p>In order to derive a formal robustness guarantee, the authors formulate a lower bound on the minimum adversarial distortion needed for a perturbed example to succeed (success here meaning fooling the classifier, thus becoming adversarial). They accomplish this by relating the p-norm of the perturbation \(\lVert \delta \rVert_p \) to the local Lipschitz constant \( L_q^j \), where \(q\) is the dual norm of \(p\) and &ldquo;local&rdquo; means defined over a ball around the input. They use properties of Lipschitz continuous functions to prove the following:</p>
<p>$$ \lVert \delta \rVert \le \min_{j \ne c} \frac{f_c(x_0) - f_j(x_0)}{L_q^j} $$</p>
<p>That is, the classifier&rsquo;s prediction on the input \(x_0\) cannot change as long as the p-norm of the perturbation stays below the smallest score gap between the predicted class \(c\) and any other class \(j\), divided by the corresponding local Lipschitz constant \(L_q^j\).</p>
<p>The authors go on to provide a similar guarantee for networks whose activation function is not differentiable, for example a ReLU network. In such cases, the local Lipschitz constant can be replaced with the supremum of the directional derivatives over each direction approaching the non-differentiable point.</p>
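<p>As an illustration (our sketch; the function and variable names are ours), the bound can be evaluated directly given the class scores at \(x_0\) and per-class local Lipschitz constants:</p>

```python
def robustness_lower_bound(logits, lipschitz, c):
    # Lower bound on the minimal adversarial distortion:
    #   min over j != c of (f_c(x0) - f_j(x0)) / L_q^j
    return min((logits[c] - logits[j]) / lipschitz[j]
               for j in range(len(logits)) if j != c)
```

<p>For example, with scores \((2.0, 1.0, 0.0)\) for predicted class \(c=0\) and constants \((1.0, 2.0, 1.0)\), the bound is \(\min(1/2,\ 2/1) = 0.5\): no perturbation of norm below 0.5 can change the prediction.</p>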
<h3 id="clever-robustness-metric">CLEVER Robustness Metric</h3>
<p>Since the above formulations of the lower bound are difficult to compute exactly, the authors present a technique for estimating it that is much more feasible computationally. They make use of extreme value theory, which says that the maximum of a large number of samples of a random variable follows one of a few known extreme value distributions. In this case, the random variable is the norm of the gradient at a randomly sampled point near the input. The authors assume a (non-degenerate) Weibull-class distribution and verify empirically that this is a reasonable assumption.</p>
<p>To apply the theory, they generate N samples in a ball around a given input and calculate the gradient norm at each sample. Using the maximum gradient norm over each batch of samples, they apply maximum likelihood estimation to fit the distribution parameters that best explain those maxima. It should be noted that this approach assumes that random sampling in the ball is sufficient to observe gradients close to the true local maximum.</p>
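<p>A rough sketch of this sampling step (ours, not the authors&rsquo; code; for simplicity we take the largest batch maximum directly rather than fitting a Weibull distribution and using its location parameter, as the paper does):</p>

```python
import numpy as np

def estimate_local_lipschitz(grad_fn, x0, radius, n_batches=50, batch=100, rng=None):
    # CLEVER-style estimate: sample points uniformly in a ball around x0,
    # record the maximum gradient norm per batch, and use the largest
    # observed maximum as a crude estimate of the local Lipschitz constant.
    if rng is None:
        rng = np.random.default_rng(0)
    maxima = []
    for _ in range(n_batches):
        # Uniform sampling in the ball: random direction, radius scaled
        # by u^(1/d) so points are uniform in volume.
        d = rng.normal(size=(batch, x0.size))
        d = d / np.linalg.norm(d, axis=1, keepdims=True)
        r = radius * rng.random((batch, 1)) ** (1.0 / x0.size)
        pts = x0 + r * d
        maxima.append(max(np.linalg.norm(grad_fn(p)) for p in pts))
    return max(maxima)

# For a linear margin g(x) = w @ x, the gradient is constant, so the
# estimate recovers ||w|| exactly.
w = np.array([3.0, 4.0])
L = estimate_local_lipschitz(lambda p: w, np.zeros(2), radius=0.5)
assert abs(L - 5.0) < 1e-9
```
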
<p>The resulting CLEVER score is an approximate lower bound on the distortion needed for an attack to succeed.</p>
<h3 id="experiments-and-results">Experiments and Results</h3>
<p>To evaluate how effective an indicator of robustness the CLEVER score is, the authors conducted experiments using the CIFAR-10, MNIST, and ImageNet data sets, pairing each data set with a set of neural network architectures and corresponding popular defenses for those networks.</p>
<p>They estimate the Weibull distribution parameter and conduct a goodness-of-fit test to verify that the distribution fits the data empirically.</p>
<p>They then apply two recent state-of-the-art attacks to their models: the Carlini and Wagner attack <a href="https://arxiv.org/pdf/1608.04644.pdf">[2]</a>, covered in a previous blog post, and I-FGSM <a href="https://arxiv.org/pdf/1412.6572.pdf">[3]</a>.</p>
<p>To evaluate the CLEVER score&rsquo;s effectiveness, the authors compare it with the average \( L_2 \) and \( L_\infty \) distortions of the adversarial examples generated by each type of attack. Since the score is an estimate of the lower bound on the minimal distortion needed, an accurate estimate implies that no attack should succeed with a distortion below it. Their results show that this holds under both distortion metrics.</p>
<p>These results also show that the Carlini Wagner attack produces adversarial examples with distortion much closer to minimal than I-FGSM does, which demonstrates the CLEVER score&rsquo;s ability to evaluate the strength of attacks themselves. The score also serves as a metric for the effectiveness of defenses against adversarial attacks, since it was shown to increase for defensively distilled networks. More generally, the authors contribute a means of providing theoretical guarantees about neural networks that is not limited by the size of the network.</p>
<p>However, there does not seem to be a correlation between the CLEVER score and the distortion needed by the Carlini Wagner attack. If the score were to truly indicate how hard it is to generate adversarial examples, then we would expect that networks with a higher CLEVER score to have higher average distortions than networks with lower scores, but this was not always the case.</p>
<hr />
<h2 id="lower-bounds-on-the-robustness-to-adversarial-perturbations">Lower bounds on the Robustness to Adversarial Perturbations</h2>
<blockquote>
<p>Jonathan Peck, Joris Roels, Bart Goossens, Yvan Saeys <a href="https://papers.nips.cc/paper/6682-lower-bounds-on-the-robustness-to-adversarial-perturbations.pdf">Lower bounds on the robustness to adversarial perturbations</a>. NIPS 2017.</p>
</blockquote>
<p>In this paper, the authors propose theoretical lower bounds on the adversarial
perturbations on different types of layers in neural networks. They then combine
these theoretical lower bounds derived layer-by-layer, and apply it to the
entire network to calculate the theoretical lower bound for adversarial
perturbations for the network. In contrast to previous work on this topic, the
authors derive the lower bounds directly in terms of the model parameters.
This is useful for applying the bounds on real-world DNN models.</p>
<h3 id="lower-bounds-on-different-classifiers">Lower bounds on different classifiers</h3>
<p>The authors use a modular approach to find the lower bound of a feedforward
neural network. In this approach, they derive the lower bound at a particular
layer by working their way backward starting from the output layer, and
gradually towards the input layer. More precisely, given a layer that takes as
input \(y\) and computes a function \(h\) on the input, if we know the
robustness bound of the following layer \(k\), then the goal at the current
layer is to find the perturbation \(r\) such that,</p>
<p>$$||h(y+r)|| = ||h(y)|| + k$$</p>
<p>Any adversarial perturbation to that layer, \(q\) is guaranteed to satisfy
\(||q|| \geq ||r||\).</p>
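<p>As a simplified illustration of this backward, modular composition (ours, and much cruder than the paper&rsquo;s layer-specific bounds): if everything after the current layer tolerates perturbations of norm \(k\), then a linear layer \(h(y) = Wy\) tolerates input perturbations of norm \(k/\|W\|_2\), since \(\|Wr\| \le \|W\|_2 \|r\|\). Composing this backward over the layers gives a bound for the whole network:</p>

```python
import numpy as np

def propagate_bound_linear(W, k):
    # If the following layers are robust to perturbations of norm k,
    # a linear layer h(y) = W y is robust to input perturbations of
    # norm k / ||W||_2 (spectral norm).
    return k / np.linalg.norm(W, 2)

def network_lower_bound(weights, k_out):
    # Compose per-layer bounds backward from the output layer
    # toward the input layer.
    k = k_out
    for W in reversed(weights):
        k = propagate_bound_linear(W, k)
    return k
```

<p>For instance, two layers that each double norms (\(\|W\|_2 = 2\)) turn an output-side tolerance of 8 into an input-side lower bound of 2.</p>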
<p>The lower bounds for different types of layers, as proposed in the paper, is
shown below. The proofs can be found in the <a href="https://papers.nips.cc/paper/6682-lower-bounds-on-the-robustness-to-adversarial-perturbations-supplemental.zip">supplemental materials provided by
the authors of the paper</a>.</p>
<h4 id="softmax-output-layers">Softmax output layers</h4>
<p>Let \(r\) be the smallest perturbations to the input \(x\) of a softmax
layer such that \(f(x+r) \neq f(x)\). The authors have shown that this
condition is then satisfied:
<p align="center">
<img src="https://secml.github.io/images/class6/softmax.png" width="500" >
<br>
</p></p>
<h4 id="fully-connected-layers">Fully connected layers</h4>
<p>Here, the assumption is that the next layer has a robustness of \(k\). In that
case, the authors have shown that the minimum perturbation \(r\) satisfies
this condition:
<p align="center">
<img src="https://secml.github.io/images/class6/fulllayers.png" width="500" >
<br>
</p></p>
<p>Here, \(J(x)\) is the Jacobian matrix of \(h_L\) at \(x\), and \(M\) is the
bound of the second order derivative of \(h_L\).</p>
<h4 id="convolutional-layers">Convolutional layers</h4>
<p>For a convolutional layer with filter tensor \(W \in R^{k \times c \times q
\times q}\), and input tensor \(X\), the adversarial perturbation \(R\)
satisfies this condition:
<p align="center">
<img src="https://secml.github.io/images/class6/convolutional.png" width="200" >
<br>
</p></p>
<h4 id="pooling-layers">Pooling layers</h4>
<p>For a MAX or average pooling layer, the adversarial perturbation \(R\)
satisfies:
<p align="center">
<img src="https://secml.github.io/images/class6/maxpooling.png" width="200" >
<br>
</p></p>
<p>And for \(L_p\) pooling:
<p align="center">
<img src="https://secml.github.io/images/class6/lppooling.png" width="200" >
<br>
</p></p>
<h3 id="results">Results</h3>
<p>The authors conducted their experiments using the MNIST and CIFAR-10 datasets on
the LeNet network. For generating the adversarial perturbations for testing the
theoretical bounds, they used the fast gradient sign method (FGSM). They used a
binary search to find the smallest perturbation parameter of FGSM that resulted
in misclassification of a sample.</p>
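<p>The binary search over the FGSM step size can be sketched as follows (our illustration; <code>predict</code> and the toy decision rule in the usage example are hypothetical):</p>

```python
def smallest_fgsm_eps(predict, x, grad_sign, true_label, lo=0.0, hi=1.0, tol=1e-4):
    # Binary search for the smallest FGSM step size eps such that
    #   x_adv = x + eps * sign(grad of loss)
    # is misclassified. Assumes predict is correct at lo and wrong at hi.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if predict(x + mid * grad_sign) != true_label:
            hi = mid  # misclassified: a smaller step may suffice
        else:
            lo = mid  # still correct: need a larger step
    return hi
```

<p>With a one-dimensional toy classifier whose decision boundary sits at 0.3, the search converges to a step size of about 0.3.</p>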
<p>In their experiments, the authors did not find any adversarial sample that
violated the theoretical lower bounds. The experimental and theoretical
perturbation norms can be seen in the following two tables:</p>
<p align="center">
<img src="https://secml.github.io/images/class6/fgsm.png" width="500" >
<br> <b>Figure:</b> Summary of norms of adversarial perturbations found by FGSM
</p>
<p align="center">
<img src="https://secml.github.io/images/class6/theoretical.png" width="500" >
<br> <b>Figure:</b> Summary of theoretical bound of adversarial perturbations
</p>
<p>It can be seen that the mean theoretical perturbation is much lower than the
experimental one for both the MNIST and CIFAR-10 datasets. The authors suggest this
is because FGSM is not the most efficient attack technique.</p>
<p>In conclusion, the authors suggest it is still unclear how tight these
theoretical bounds are. It will be interesting to verify how close they are to