Commit

Built site for gh-pages
Quarto GHA Workflow Runner committed Jan 24, 2024
1 parent d4945df commit d10a671
Showing 10 changed files with 69 additions and 56 deletions.
2 changes: 1 addition & 1 deletion .nojekyll
Original file line number Diff line number Diff line change
@@ -1 +1 @@
79f6e15c
3389a51e
8 changes: 4 additions & 4 deletions index.html
@@ -143,7 +143,7 @@

<div class="quarto-listing quarto-listing-container-grid" id="listing-listing">
<div class="list grid quarto-listing-cols-3">
<div class="g-col-1" data-index="0" data-listing-date-sort="1705968000000" data-listing-file-modified-sort="1706071160679" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="4">
<div class="g-col-1" data-index="0" data-listing-date-sort="1705968000000" data-listing-file-modified-sort="1706114882598" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="4">
<a href="./posts/dreamy.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<p class="card-img-top"><img src="posts/dream_wow.png" style="height: 150px;" class="thumbnail-image card-img"/></p>
@@ -166,7 +166,7 @@ <h5 class="no-anchor card-title listing-title">
</div>
</a>
</div>
<div class="g-col-1" data-index="1" data-listing-date-sort="1705104000000" data-listing-file-modified-sort="1706071160651" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="25">
<div class="g-col-1" data-index="1" data-listing-date-sort="1705104000000" data-listing-file-modified-sort="1706114882566" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="25">
<a href="./posts/TDC2023.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<p class="card-img-top"><img src="posts/TDC2023-sample-instances.png" style="height: 150px;" class="thumbnail-image card-img"/></p>
@@ -189,7 +189,7 @@ <h5 class="no-anchor card-title listing-title">
</div>
</a>
</div>
<div class="g-col-1" data-index="2" data-listing-date-sort="1701302400000" data-listing-file-modified-sort="1706071160679" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="7">
<div class="g-col-1" data-index="2" data-listing-date-sort="1701302400000" data-listing-file-modified-sort="1706114882598" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="7">
<a href="./posts/fight_the_illusion.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<div class="listing-item-img-placeholder card-img-top" style="height: 150px;">&nbsp;</div>
@@ -212,7 +212,7 @@ <h5 class="no-anchor card-title listing-title">
</div>
</a>
</div>
<div class="g-col-1" data-index="3" data-listing-date-sort="1687651200000" data-listing-file-modified-sort="1706071160671" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="7">
<div class="g-col-1" data-index="3" data-listing-date-sort="1687651200000" data-listing-file-modified-sort="1706114882586" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="7">
<a href="./posts/catalog.html" class="quarto-grid-link">
<div class="quarto-grid-item card h-100 card-left">
<p class="card-img-top"><img src="posts/catalog_files/figure-html/cell-9-output-1.png" style="height: 150px;" class="thumbnail-image card-img"/></p>
4 changes: 2 additions & 2 deletions posts/TDC2023.html
@@ -352,7 +352,7 @@ <h4 class="anchored" data-anchor-id="although-we-struggled-to-use-activation-eng
<p>to compare/rank different sequences of tokens. Since <span class="math inline">\(u_i\)</span> is now a scalar for each x, given a collection of such x’s we can construct a z-score for our dataset as <span class="math inline">\((u_i - mean(u_i))/std(u_i)\)</span>, and rank them.</p>
<div class="quarto-figure quarto-figure-left">
<figure class="figure">
<p><a href="TDC2023-sample-instances.png" class="lightbox" data-gallery="quarto-lightbox-gallery-1" title="The Z-scores of activation vector similarity for the provided sample instances"><img src="TDC2023-sample-instances.png" class="img-fluid figure-img" style="width:60.0%"></a></p>
<p><a href="TDC2023-sample-instances.png" class="lightbox" title="The Z-scores of activation vector similarity for the provided sample instances" data-gallery="quarto-lightbox-gallery-1"><img src="TDC2023-sample-instances.png" class="img-fluid figure-img" style="width:60.0%"></a></p>
<figcaption class="figure-caption">The Z-scores of activation vector similarity for the provided sample instances</figcaption>
</figure>
</div>
@@ -722,7 +722,7 @@ <h4 class="anchored" data-anchor-id="trojan-recovery">Trojan recovery:</h4>
});
</script>
</div> <!-- /content -->
<script>var lightboxQuarto = GLightbox({"closeEffect":"zoom","loop":true,"selector":".lightbox","openEffect":"zoom","descPosition":"bottom"});</script>
<script>var lightboxQuarto = GLightbox({"descPosition":"bottom","selector":".lightbox","openEffect":"zoom","closeEffect":"zoom","loop":true});</script>



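Note on the z-score ranking described in the posts/TDC2023.html hunk above: the formula (u_i - mean(u_i))/std(u_i) can be sketched in a few lines. This is only an illustration, not the post's code. It assumes the per-sequence scores u_i are already available as plain floats and that the population standard deviation is meant (the post does not say which). The name `zscore_rank` is hypothetical.

```python
from statistics import mean, pstdev


def zscore_rank(scores):
    """Return (z_scores, order) for a list of scalar scores u_i.

    z_i = (u_i - mean(u)) / std(u), using the population standard
    deviation (an assumption; the post does not specify).
    `order` lists indices from highest to lowest z-score.
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores identical: every z-score is defined as 0 here.
        z = [0.0 for _ in scores]
    else:
        z = [(u - mu) / sigma for u in scores]
    order = sorted(range(len(scores)), key=lambda i: z[i], reverse=True)
    return z, order


z, order = zscore_rank([1.0, 3.0, 2.0])
```

With the sample-standard-deviation convention, `statistics.stdev` would replace `pstdev`; the ranking itself is unchanged because the rescaling is monotone.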
2 changes: 1 addition & 1 deletion posts/catalog.html
@@ -823,7 +823,7 @@ <h2 class="anchored" data-anchor-id="github">GitHub</h2>
});
</script>
</div> <!-- /content -->
<script>var lightboxQuarto = GLightbox({"selector":".lightbox","openEffect":"zoom","descPosition":"bottom","loop":true,"closeEffect":"zoom"});</script>
<script>var lightboxQuarto = GLightbox({"selector":".lightbox","closeEffect":"zoom","loop":true,"descPosition":"bottom","openEffect":"zoom"});</script>



10 changes: 5 additions & 5 deletions posts/catalog.out.ipynb
@@ -297,7 +297,7 @@
"Pythia-12B is miscalibrated on 20% of the bigrams and 45% of the\n",
"trigrams when we ask for prediction of $p \\geq 0.45$."
],
"id": "071e2969-0251-470d-811a-ef3e58e076f4"
"id": "512e3ec4-e0f8-44c4-9f5d-436c8e3bac3d"
},
{
"cell_type": "code",
@@ -313,7 +313,7 @@
}
],
"source": [],
"id": "6577a415-333d-40a7-b621-3cd5f0e7904b"
"id": "db43a0e6-1b41-4fe1-994e-a7af1aae7a34"
},
{
"cell_type": "markdown",
@@ -377,7 +377,7 @@
"The dataset is available on Huggingface:\n",
"[pile_scan_4](https://huggingface.co/datasets/Confirm-Labs/pile_scan_4)"
],
"id": "cc7f5c8f-63b9-4dcc-ae4e-213496fa15d8"
"id": "b5bd2598-bfa6-4373-a25d-2f1434ce4cb3"
},
{
"cell_type": "code",
@@ -391,7 +391,7 @@
}
],
"source": [],
"id": "dea2970d-4d42-4b00-ae00-be6b007066ab"
"id": "1b39cf33-0f3e-4d04-9c75-5a7e6ad679df"
},
{
"cell_type": "markdown",
@@ -423,7 +423,7 @@
"Computational Linguistics, May 2022, pp. 95–136. doi:\n",
"[10.18653/v1/2022.bigscience-1.9](https://doi.org/10.18653/v1/2022.bigscience-1.9).</span>"
],
"id": "a2ddc192-ddf8-4075-b06a-819568172e74"
"id": "06602fea-6654-4858-8914-d23032b89619"
}
],
"nbformat": 4,
15 changes: 10 additions & 5 deletions posts/dreamy.html
@@ -668,7 +668,7 @@ <h1 class="title">Fluent dreaming for language models</h1>
</div>
</div>
</div>
<p>Dreaming is the process of maximizing some internal or output feature of a neural network by iteratively tweaking the input to the network. The most well-known example is DeepDream <span class="citation" data-cites="mordvintsev-2015"><a href="#ref-mordvintsev-2015" role="doc-biblioref">[1]</a></span>. Besides making pretty images, dreaming is useful for interpreting the purpose of the internal components of a neural network <span class="citation" data-cites="cammarata2020thread olah2017feature yosinski2015understanding"><a href="#ref-cammarata2020thread" role="doc-biblioref">[2]</a><a href="#ref-yosinski2015understanding" role="doc-biblioref">[4]</a></span>. Dreaming has only been applied to vision models because the input space to a vision model is approximately continuous and algorithms like gradient descent work well. For language models, the input space is discrete and very different algorithms are needed. Extending work in the adversarial attacks literature <span class="citation" data-cites="zou2023universal"><a href="#ref-zou2023universal" role="doc-biblioref">[5]</a></span>, in the paper, we introduce the Evolutionary Prompt Optimization (EPO) algorithm for dreaming with language models.</p>
<p>Dreaming is the process of maximizing some internal or output feature of a neural network by iteratively tweaking the input to the network. The most well-known example is DeepDream <span class="citation" data-cites="mordvintsev-2015"><a href="#ref-mordvintsev-2015" role="doc-biblioref">[1]</a></span>. Besides making pretty images, dreaming is useful for interpreting the purpose of the internal components of a neural network <span class="citation" data-cites="cammarata2020thread olah2017feature yosinski2015understanding"><a href="#ref-cammarata2020thread" role="doc-biblioref">[2]</a><a href="#ref-yosinski2015understanding" role="doc-biblioref">[4]</a></span>. To our knowledge, dreaming has previously only been applied to vision models because the input space to a vision model is approximately continuous and algorithms like gradient descent work well. For language models, the input space is discrete and very different algorithms are needed. Extending work in the adversarial attacks literature <span class="citation" data-cites="zou2023universal"><a href="#ref-zou2023universal" role="doc-biblioref">[5]</a></span>, in the paper, we introduce the Evolutionary Prompt Optimization (EPO) algorithm for dreaming with language models.</p>
<p>On this page, we demonstrate running the EPO algorithm for a neuron in Phi-2. There is also a Colab notebook version of this page available.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
@@ -796,9 +796,9 @@ <h2 class="anchored" data-anchor-id="the-pareto-frontier">The Pareto frontier</h
</div>
</div>
<p>We also plot the evolution of the Pareto frontier over the course of the optimization run.</p>
<div class="cell" data-execution_count="7">
<div class="cell" data-execution_count="20">
<div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>linestyles <span class="op">=</span> [<span class="st">'k--o'</span>, <span class="st">'k:o'</span>, <span class="st">'k--*'</span>, <span class="st">'k:*'</span>]</span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> i, n <span class="kw">in</span> <span class="bu">enumerate</span>([<span class="dv">20</span>, <span class="dv">40</span>, <span class="dv">100</span>, <span class="dv">299</span>]):</span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> i, n <span class="kw">in</span> <span class="bu">enumerate</span>([<span class="dv">20</span>, <span class="dv">40</span>, <span class="dv">100</span>, <span class="dv">300</span>]):</span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a> pareto <span class="op">=</span> build_pareto_frontier(tokenizer, history.subset(<span class="bu">slice</span>(<span class="dv">0</span>, n)))</span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a> ordering <span class="op">=</span> np.argsort(pareto.xentropy)</span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a> plt.plot(pareto.full_xentropy, pareto.full_target, linestyles[i <span class="op">%</span> <span class="bu">len</span>(linestyles)], label<span class="op">=</span><span class="ss">f"</span><span class="sc">{</span>n<span class="sc">}</span><span class="ss"> iterations"</span>)</span>
@@ -853,7 +853,12 @@ <h2 class="anchored" data-anchor-id="thresholding-cross-entropy">Thresholding cr
</section>
<section id="causal-token-attribution" class="level2">
<h2 class="anchored" data-anchor-id="causal-token-attribution">Causal token attribution</h2>
<p>The visualizations below show the sensitivity to each token in the prompts. We first filter to the 32 “best” alternative tokens based on backpropagated token gradients. Then, amongst those 32 tokens, we calculate two sensitivities: - the drop in activation from swapping the token to the next most likely token. In the visualization, we show this in the height of the token bars. - the drop in activation from swapping the token to the least likely token. In the visualization, we show this with the color of the tokens. Darker reds indicate a larger drop in activation.</p>
<p>The visualizations below show the sensitivity to each token in the prompts. We first filter to the 32 “best” alternative tokens based on backpropagated token gradients. Then, amongst those 32 tokens, we calculate two sensitivities:</p>
<ul>
<li>the drop in activation from swapping the token to the next highest activation alternative token. In the visualization, we show this in the height of the token bars.</li>
<li>the drop in activation from swapping the token to the lowest activation alternative token. In the visualization, we show this with the color of the tokens. Darker reds indicate a larger drop in activation.</li>
</ul>
<p>The visualizations are interactive. Hover over each token to see a tooltip with the top-3 highest activation alternative tokens and the single lowest alternative token.</p>
<p>We show attribution visualizations for each prompt on the Pareto frontier. For all the prompts, swapping the last token can reduce the neuron activation to zero. Swapping other tokens reduces the activation much less. The comma in the second-to-last position is also important and often has no viable substitute, which is indicated by its tall bar.</p>
<div class="cell" data-execution_count="17">
<div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> i <span class="kw">in</span> <span class="bu">range</span>(<span class="bu">len</span>(ordering)):</span>
@@ -5206,7 +5211,7 @@ <h2 class="anchored" data-anchor-id="causal-token-attribution">Causal token attr
});
</script>
</div> <!-- /content -->
<script>var lightboxQuarto = GLightbox({"descPosition":"bottom","selector":".lightbox","openEffect":"zoom","loop":true,"closeEffect":"zoom"});</script>
<script>var lightboxQuarto = GLightbox({"closeEffect":"zoom","descPosition":"bottom","selector":".lightbox","openEffect":"zoom","loop":true});</script>



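Note on the Pareto frontier plotted in the posts/dreamy.html hunk above: the post calls `build_pareto_frontier(tokenizer, history.subset(...))`, whose implementation is not shown in this diff. The sketch below is not that function. It is a generic two-objective frontier, assuming each candidate prompt is summarized as a `(xentropy, target)` pair, that lower cross-entropy is better, and that higher target activation is better. The name `pareto_frontier` is hypothetical.

```python
def pareto_frontier(points):
    """Return the non-dominated (xentropy, target) pairs.

    A point is kept when no other point has cross-entropy that is
    lower or equal AND a target that is higher or equal (exact
    duplicates are reported once). Assumes lower xentropy and higher
    target are both preferred.
    """
    # Sort by ascending cross-entropy; break ties by descending target
    # so the best point at a given cross-entropy is seen first.
    ordered = sorted(points, key=lambda p: (p[0], -p[1]))
    frontier = []
    best_target = float("-inf")
    for xentropy, target in ordered:
        # Keep a point only if it beats every target seen so far at
        # lower-or-equal cross-entropy.
        if target > best_target:
            frontier.append((xentropy, target))
            best_target = target
    return frontier


candidates = [(3.0, 1.0), (4.0, 2.5), (5.0, 2.0), (6.0, 4.0), (3.0, 0.5)]
frontier = pareto_frontier(candidates)
```

Calling this on a growing prefix of an optimization history (for example the first 20, 40, 100, and 300 candidates) would give one frontier per prefix, which is the kind of comparison the plotted curves make.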
68 changes: 38 additions & 30 deletions posts/dreamy.out.ipynb

Large diffs are not rendered by default.

Binary file modified posts/dreamy_files/figure-html/cell-9-output-1.png
6 changes: 3 additions & 3 deletions search.json

Large diffs are not rendered by default.

10 changes: 5 additions & 5 deletions sitemap.xml
@@ -2,22 +2,22 @@
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://confirmlabs.org/posts/catalog.html</loc>
<lastmod>2024-01-24T04:39:37.651Z</lastmod>
<lastmod>2024-01-24T16:48:21.134Z</lastmod>
</url>
<url>
<loc>https://confirmlabs.org/posts/dreamy.html</loc>
<lastmod>2024-01-24T04:39:34.395Z</lastmod>
<lastmod>2024-01-24T16:48:17.946Z</lastmod>
</url>
<url>
<loc>https://confirmlabs.org/index.html</loc>
<lastmod>2024-01-24T04:39:31.495Z</lastmod>
<lastmod>2024-01-24T16:48:15.070Z</lastmod>
</url>
<url>
<loc>https://confirmlabs.org/posts/TDC2023.html</loc>
<lastmod>2024-01-24T04:39:32.975Z</lastmod>
<lastmod>2024-01-24T16:48:16.510Z</lastmod>
</url>
<url>
<loc>https://confirmlabs.org/posts/fight_the_illusion.html</loc>
<lastmod>2024-01-24T04:39:35.139Z</lastmod>
<lastmod>2024-01-24T16:48:18.650Z</lastmod>
</url>
</urlset>
