Commit 5e06c2a

add attention and deep set references

1 parent: dac6550
2 files changed (+22 -3)

bibliography.bib (+20 -1)
@@ -354,4 +354,23 @@ @article{pooladian2023multisample
 author={Pooladian, Aram-Alexandre and Ben-Hamu, Heli and Domingo-Enrich, Carles and Amos, Brandon and Lipman, Yaron and Chen, Ricky TQ},
 journal={arXiv preprint arXiv:2304.14772},
 year={2023}
-}
+}
+
+@article{vaswani2017attention,
+title={Attention is all you need},
+author={Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, {\L}ukasz and Polosukhin, Illia},
+journal={Advances in Neural Information Processing Systems},
+volume={30},
+year={2017}
+}
+
+
+@inproceedings{zaheer_deep_2017,
+title={Deep {Sets}},
+volume={30},
+abstract={We study the problem of designing models for machine learning tasks defined on sets. In contrast to the traditional approach of operating on fixed dimensional vectors, we consider objective functions defined on sets that are invariant to permutations. Such problems are widespread, ranging from the estimation of population statistics, to anomaly detection in piezometer data of embankment dams, to cosmology. Our main theorem characterizes the permutation invariant objective functions and provides a family of functions to which any permutation invariant objective function must belong. This family of functions has a special structure which enables us to design a deep network architecture that can operate on sets and which can be deployed on a variety of scenarios including both unsupervised and supervised learning tasks. We demonstrate the applicability of our method on population statistic estimation, point cloud classification, set expansion, and outlier detection.},
+booktitle={Advances in Neural Information Processing Systems},
+author={Zaheer, Manzil and Kottur, Satwik and Ravanbakhsh, Siamak and Poczos, Barnabas and Salakhutdinov, Russ R and Smola, Alexander J},
+year={2017},
+file={Full Text PDF:/Users/simonkucharsky/Zotero/storage/LPWEVIE9/Zaheer et al. - 2017 - Deep Sets.pdf:application/pdf},
+}

slides/deep-learning.qmd (+2 -2)
@@ -756,7 +756,7 @@ $\rightarrow$ leverage properties of data to our advantage by building networks
 
 ![[Source: Christopher Olah's blog](https://colah.github.io/posts/2015-08-Understanding-LSTMs/)](https://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-chain.png){fig-align="center"}
 
-## Attention mechanism
+## Attention [@vaswani2017attention]
 
 - Sequential updating is slow
 - Limited memory (even for LSTM)
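
The newly cited mechanism is scaled dot-product attention [@vaswani2017attention]: $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(QK^\top / \sqrt{d_k}\right) V$. Every query position attends to every key position in a single matrix product, so there is no sequential recurrence to unroll. A minimal NumPy sketch; the shapes and variable names below are illustrative assumptions, not taken from the slides:

```python
# Scaled dot-product attention, minimal sketch:
# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # shift for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)  # pairwise query-key similarity
    weights = softmax(scores, axis=-1)              # each query's weights sum to 1
    return weights @ V                              # weighted average of the values

# Toy example (illustrative shapes): 5 queries attend over 7 key/value pairs, d_k = 4.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(5, 4)), rng.normal(size=(7, 4)), rng.normal(size=(7, 4))
out = attention(Q, K, V)  # shape (5, 4): one updated representation per query
```

Because all queries are processed in one batched matrix product, attention avoids the slow sequential updating and limited memory that the slide attributes to recurrent models.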
@@ -813,7 +813,7 @@ $$
 - Permutation invariant
 - Interactions between elements
 
-## Deep Set
+## Deep Set [@zaheer_deep_2017]
 
 ::::{.columns}
 
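The newly cited Deep Sets result [@zaheer_deep_2017] characterizes permutation-invariant functions on sets as $f(X) = \rho\left(\sum_i \phi(x_i)\right)$: embed each element with $\phi$, pool with a sum, then transform with $\rho$. A toy NumPy sketch; the random affine maps below are hypothetical stand-ins for learned networks:

```python
# Deep Sets construction, minimal sketch:
# f(X) = rho(sum_i phi(x_i)); sum pooling makes f permutation invariant.
import numpy as np

rng = np.random.default_rng(0)
W_phi = rng.normal(size=(3, 8))  # illustrative weights for the element-wise map
W_rho = rng.normal(size=(8, 2))  # illustrative weights for the post-pooling map

def phi(X):
    return np.tanh(X @ W_phi)    # embed each set element independently

def rho(z):
    return np.tanh(z @ W_rho)    # transform the pooled representation

def deep_set(X):                 # X: (n_elements, n_features); row order is irrelevant
    return rho(phi(X).sum(axis=0))

X = rng.normal(size=(5, 3))      # a "set" of 5 elements with 3 features each
perm = rng.permutation(5)
assert np.allclose(deep_set(X), deep_set(X[perm]))  # shuffling the set changes nothing
```

The sum over elements is what yields the permutation invariance listed on the slide; mean or max pooling preserves the property, while attention over set elements additionally models the interactions between them.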