Add `check_csrf` option to socket transport options #5952

tanguilp · 2024-10-13T16:31:55Z

This PR proposes adding a check_csrf option to Phoenix's socket transports, in order to enable caching of LiveView's static rendering.

Context

After releasing plug_http_cache, I was asked whether we could use it to cache LiveViews.

After a few experiments, I came up with the following PR last year: #5667. Note that this PR does not follow the approach proposed in the discussion there (I prefer being honest here 🙈).

This year I took more time to study the different ways to implement it, and described them in a two-part blog post, referred to below as Part 1 and Part 2.

We're talking about caching LiveViews with the session enabled. When the session is disabled, caching works out of the box.

Key takeaways are:

- there are 2 mechanisms to prevent cross-site WebSocket hijacking (CSWSH): checking the origin and checking a CSRF token
- each mechanism on its own protects against CSWSH. As a consequence, using only one of them is safe
- by default, Phoenix uses both for the websocket & longpoll transports. The origin check can be disabled; the CSRF check cannot
- we can't cache LiveViews because the generated pages contain a CSRF token, which is user-specific
- we propose instead to keep the origin check enabled and disable the CSRF check
- the second article also discusses:
  - how to selectively cache some LiveViews (opt-in caching)
  - a pattern to avoid accidentally caching private data (assign_private, similar to assign_async)
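To make the two CSWSH defenses concrete, here is a minimal Elixir sketch of each check in isolation. It is not Phoenix's code: the module `CswshSketch` and its functions are hypothetical, the origin check is simplified to a host comparison (Phoenix's `check_origin` also supports scheme and port matching, wildcard subdomains and MFAs), and the CSRF check compares raw strings rather than Plug's masked tokens.

```elixir
defmodule CswshSketch do
  @moduledoc """
  Hypothetical sketch of the two CSWSH defenses, each usable on its own.
  Not Phoenix's implementation.
  """
  import Bitwise

  # Origin check: the browser-supplied Origin header must name an allowed host.
  # A missing header is rejected here (a simplification).
  def origin_ok?(nil, _allowed_hosts), do: false

  def origin_ok?(origin, allowed_hosts) when is_binary(origin) do
    case URI.parse(origin) do
      %URI{host: host} when is_binary(host) -> host in allowed_hosts
      _ -> false
    end
  end

  # CSRF check: the token sent by the client must equal the one stored in the
  # session. Equal-length inputs are compared without early exit.
  def csrf_ok?(sent, expected) when is_binary(sent) and is_binary(expected) do
    byte_size(sent) == byte_size(expected) and diff(sent, expected) == 0
  end

  def csrf_ok?(_sent, _expected), do: false

  # OR together the XOR of every byte pair: 0 only if all bytes match.
  defp diff(a, b) do
    Enum.zip(:binary.bin_to_list(a), :binary.bin_to_list(b))
    |> Enum.reduce(0, fn {x, y}, acc -> bor(acc, bxor(x, y)) end)
  end
end
```

A cross-site page fails the origin check because the browser sets the Origin header itself, and fails the CSRF check because it cannot read the victim's token; that is why either check alone is enough.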

I believe this pattern (Publicly caching private LiveViews) is another use case for LiveView, distinct from, but not exclusive of, server-rendered interactivity. Usually you would rely heavily on JavaScript, calling separate APIs to fetch the few bits of private user data. The proposed pattern greatly simplifies that, and is compatible with any caching system (plug_http_cache, a CDN, or any shared cache between the Phoenix server and the user's browser). But only time will tell 😁

Implementation

To disable CSRF check, you would do:

```elixir
socket "/live", Phoenix.LiveView.Socket,
  websocket: [connect_info: [session: @session_options], check_csrf: false]
```

The check_csrf option is transport-specific, unlike check_origin, which can also be configured at the endpoint level. This is because:

- we could have non-web transports for which checking CSRF makes no sense
- an endpoint-level option could confuse users, because you still need CSRF checks elsewhere, for instance for forms in regular Phoenix views
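As a sketch of how the per-transport option sits next to the endpoint-level origin check, an endpoint might look like the following. Module names, the session options and the allowed origin are placeholders; `check_csrf` is the option proposed in this PR, and it is assumed here to apply to the longpoll transport as well as the websocket one.

```elixir
# config/runtime.exs (placeholder origin): origin check at the endpoint level
import Config

config :my_app, MyAppWeb.Endpoint,
  check_origin: ["https://example.com"]

# lib/my_app_web/endpoint.ex
defmodule MyAppWeb.Endpoint do
  use Phoenix.Endpoint, otp_app: :my_app

  @session_options [store: :cookie, key: "_my_app_key", signing_salt: "change_me"]

  # The origin check inherited from the endpoint config keeps CSWSH protection,
  # so the session can be read without a CSRF token in the page.
  socket "/live", Phoenix.LiveView.Socket,
    websocket: [connect_info: [session: @session_options], check_csrf: false],
    longpoll: [connect_info: [session: @session_options], check_csrf: false]
end
```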

We change the signature of Phoenix.Socket.Transport.connect_info/3 to connect_info/4, adding an opts parameter that defaults to [check_csrf: true].

When the CSRF check is disabled for Phoenix socket transports, we no longer need to 1) add a CSRF token to the HTML page, or 2) send the token as a parameter in the JS LiveSocket call.

I added a test checking that when this option is set to true (the default), an invalid CSRF token prevents the session data from being retrieved.
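The `opts` argument and the behavior covered by the new test can be illustrated with a small, self-contained sketch. `ConnectInfoSketch.session/3` is hypothetical: it stands in for the session part of `connect_info/4`, and the plain string comparison replaces Plug's real CSRF token validation.

```elixir
defmodule ConnectInfoSketch do
  @moduledoc """
  Hypothetical stand-in for the session part of connect_info/4.
  The CSRF comparison is a plain string match, not Plug's token validation.
  """

  # opts defaults to [] and :check_csrf falls back to true when absent,
  # so callers that pass no options keep the CSRF check (secure by default).
  def session(params, stored_session, opts \\ []) do
    if Keyword.get(opts, :check_csrf, true) do
      token = params["_csrf_token"]

      if is_binary(token) and token == stored_session["_csrf_token"] do
        stored_session
      else
        # Missing or invalid token: connect, but without any session data.
        nil
      end
    else
      # CSRF check disabled: rely on the origin check alone.
      stored_session
    end
  end
end
```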

From a security perspective, I think that with the default values (secure by default) and the current warnings, it's hard to make LiveView insecure without noticing.

josevalim · 2024-10-22T08:33:41Z

Hi @tanguilp, I believe we had previous discussions about this. Do you remember where they happened? I can't recall all of the details and linking back to them could help speed up the process. Meanwhile, thank you for the PR. ❤️

rhcarvalho · 2024-10-22T08:43:53Z

> Hi @tanguilp, I believe we had previous discussions about this. Do you remember where they happened? I can't recall all of the details and linking back to them could help speed up the process. Meanwhile, thank you for the PR. ❤️

#5667 and https://elixirforum.com/t/http-caching-liveviews-first-render/59180?u=rhcarvalho

Hi José, I happened to read the conversation on the original PR for context about a week ago.

In particular, in one comment on that PR you drew a line in the sand on what you would or would not accept at the time.

But @tanguilp now also presents some new ideas and a new approach.

rhcarvalho · 2024-10-22T08:51:54Z

More specific links:

(Full disclosure: I'm not associated with the changes, but I'm interested in learning from them and trying to help.)

josevalim · 2024-10-22T09:25:21Z

Thank you @rhcarvalho, the Elixir Forum was the additional discussion that I was missing. Let me get up to date.

josevalim · 2024-10-22T09:31:53Z

> We propose instead to enable origin check, and disable the CSRF check

@tanguilp my understanding is that we only require the CSRF tokens if they are sent by the client, no? So why not make sending the CSRF token from the client optional in the first place? I believe the answer is that LiveView then fails... but that would be something to address in LiveView, not here. :)

tanguilp · 2024-10-22T18:37:48Z

> my understanding is that we only require the CSRF tokens if they are sent by the client, no?

No, we always check them if the session is configured in the transport.

> So why not make sending the CSRF token from the client optional in the first place? I believe the answer is that LiveView then fails... but that would be something to address in LiveView, not here. :)

It's not failing in LiveView, but in Phoenix's socket transport. There was a problem with LiveView when no session was set, but it turned out you can, well, just set a random value in the session.

There are two cases:

- connect_info: [session: @session_options] is not set → no problem, caching works out of the box
- connect_info: [session: @session_options] is set → Phoenix.Socket.Transport.connect_info/3 is called and the CSRF check is always performed. This PR makes the CSRF check optional (enabled by default) so that caching can be enabled (and possibly other use cases: the CSRF check is not needed as long as the origin check is enabled)
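The two cases above can be summarized as a small decision function. This is a hypothetical sketch (`SessionCasesSketch` is not part of Phoenix); it only inspects transport options the way they are written in an endpoint, where `connect_info` may mix bare atoms such as `:peer_data` with keyword pairs such as `session: ...`.

```elixir
defmodule SessionCasesSketch do
  @moduledoc "Hypothetical summary of the connect-time cases; not Phoenix code."

  def csrf_behaviour(transport_opts) do
    connect_info = Keyword.get(transport_opts, :connect_info, [])
    # connect_info can mix bare atoms (:peer_data) with pairs (session: ...)
    has_session? = Enum.any?(connect_info, &match?({:session, _}, &1))

    cond do
      # No session configured: nothing user-specific is checked, caching works.
      not has_session? -> :no_session
      # Session configured and check_csrf left at its default (true).
      Keyword.get(transport_opts, :check_csrf, true) -> :session_with_csrf_check
      # Session configured, CSRF check explicitly disabled by this PR's option.
      true -> :session_without_csrf_check
    end
  end
end
```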

> I believe we had previous discussions about this. Do you remember where they happened? I can't recall all of the details and linking back to them could help speed up the process.

(Thanks @rhcarvalho for posting the links!)

Last time we stopped while discussing the following points:

- the need to allow/encourage granular caching → discussed in Part 1 of the blog post (for example, `live "/", MainLive.Index, private: %{cache: true}` for opt-in caching)
- "We should document that you MUST have check_origin enabled if you want to use this feature." → I think this is sufficiently documented, although we could add a more visible warning
- "What I am saying is that it is granular per page. You don't disable session validation or change the CSWSH for the whole app, only on the pages you want to cache." → this is clearly not the direction taken by this PR. The rationale is that 1) the origin check already protects against CSWSH, and 2) disabling both checks by accident is hard unless you really don't read the docs. I think allowing the check to be disabled per route/page would make Phoenix's code more complex for a feature that may not become very popular
- by the way, this option can only be configured at the transport level, not at the endpoint level
- there's still the problem of live_session's :session option, which adds some values (potentially private data!) directly into the HTML, as discussed in Part 2 of the blog post. I'm not sure how to handle this point. Document it in the option?
- Part 2 discusses how to avoid accidentally caching private user data (assign_private, modeled on assign_async)
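A minimal sketch of the opt-in idea from Part 1: only routes declared with `private: %{cache: true}` get a shareable `cache-control` header, and everything else stays private. `CacheHeaderSketch` is hypothetical, and the header values and default `max_age` are illustrative assumptions, not what the blog post or plug_http_cache prescribe; in a real app a plug would read `conn.private` and set the header with `Plug.Conn.put_resp_header/3`.

```elixir
defmodule CacheHeaderSketch do
  @moduledoc """
  Hypothetical opt-in caching decision based on a route's private map.
  Header values and the 60-second default are illustrative only.
  """

  # Only an explicit `cache: true` opts a page into shared caching.
  def cache_control(private, max_age \\ 60) when is_map(private) do
    if Map.get(private, :cache) == true do
      "public, max-age=#{max_age}"
    else
      # Default: never store pages that did not opt in.
      "private, no-store"
    end
  end
end
```

Making caching opt-in keeps the safe behavior as the default, so forgetting the flag can only cost performance, never leak a page.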

Hopefully this answers your questions! I think the core issue is that this PR allows developers to shoot themselves in the foot by disabling both the origin and CSRF checks, which is not possible today. The alternatives are:

- bake this feature deeper into Phoenix
- maybe a custom Phoenix.Socket.Transport could help? I'm unsure about it: it seems it would require copy/pasting the current websocket & longpoll code minus the CSRF check
- not allowing LiveViews to be cached, to avoid possible critical security failures

Cheers!

josevalim · 2024-10-22T18:54:55Z

> It's not failing in LiveView, but in Phoenix's socket transport. There was a problem with LiveView when no session was set, but it turned out you can, well, just set a random value in the session.

Can you please explain how it is failing? Because, looking at this code, we simply set the session to nil, which will then fail in LiveView, but not in Phoenix. In that case, I'd prefer to deal with this in LiveView.

tanguilp · 2024-10-22T19:17:17Z

The point is that we want the session, but without the CSRF check, so as to implement the "Publicly caching private LiveViews" pattern.

josevalim · 2024-10-22T19:40:46Z

Doesn’t it mean, then, that we could cache user-specific information if we are not careful? So you need to write the page carefully to not expose any of that? How often is that used, and why not cache parts of the page at the Phoenix layer instead of the HTTP layer?

josevalim · 2024-10-22T19:43:58Z

In any case, this is good to me as long as we raise if both check_origin and check_csrf are false. You need to have at least one of them enabled. :)
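The rule agreed here (at least one of the two checks must stay enabled) can be sketched as a validation function. `CheckGuardSketch.validate!/1` is hypothetical; it treats any `check_origin` value other than `false` (`true`, a list of origins, an `{m, f, a}` tuple) as enabled, which matches the later remark in this thread that MFAs are left to the user.

```elixir
defmodule CheckGuardSketch do
  @moduledoc "Hypothetical guard: refuse transport options that disable both checks."

  def validate!(opts) do
    # Both options default to enabled when not given.
    check_origin = Keyword.get(opts, :check_origin, true)
    check_csrf = Keyword.get(opts, :check_csrf, true)

    # Only the literal `false` counts as disabled; lists and MFAs pass.
    if check_origin == false and check_csrf == false do
      raise ArgumentError,
            "at least one of :check_origin or :check_csrf must be enabled"
    end

    :ok
  end
end
```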

tanguilp · 2024-10-22T21:49:25Z

> Doesn’t it mean, then, that we could cache user-specific information if we are not careful? So you need to write the page carefully to not expose any of that?

Yes, but the risk is totally mitigated if the approach described in the article is used.

> How often is that used

Right now nobody uses it, but there's definitely interest!

> and why not cache parts of the page at the Phoenix layer instead of the HTTP layer?

It's just more complicated to reassemble the parts of the page, and it's not compatible with caching with external systems (including CDNs).

> In any case, this is good to me as long as we raise if both check_origin and check_csrf are false. You need to have at least one of them enabled. :)

Alright! Just need to check how to handle this in dev, where check_origin is set to false. Should we raise in dev as well when both checks are disabled?

josevalim · 2024-10-23T04:55:45Z

We should raise in dev as well, yes!

tanguilp · 2024-10-23T19:58:38Z

> We should raise in dev as well, yes!

Done in 5f2bf63 and d80a535

josevalim · 2024-10-24T07:44:44Z

lib/phoenix/socket/transport.ex

```diff
     cond do
+      check_origin == false and check_csrf == false ->
```

Is there any chance we can detect and raise this when the endpoint is being configured, rather than at request time?

Technically, we could do it at compile time, somewhere like https://github.com/phoenixframework/phoenix/blob/main/lib/phoenix/endpoint.ex#L637

However, check_origin is a runtime configuration option, so there's no real way to ensure it won't be changed at runtime. We could also check it when starting the endpoint, but as far as I can see we don't do this kind of check anywhere in the current code.

This also raises the question of the check_origin: {m, f, a} configuration: should we raise if the MFA call returns false and check_csrf is set to false as well (not currently done)?

Can we check it when the supervisor starts or when the transports start? If it is an MFA, it is fine not to check: the user is then really left to their own devices after manually bypassing all layers.

Done in d339626. I'm not sure it's the best place for it, but I've pushed it so we can discuss :) There are no tests yet, so I've set this PR to draft.
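Checking at startup rather than at request time could look roughly like this: walk every transport declared for a socket, resolve `check_origin` against the endpoint-level value, and raise if both checks end up `false`. This is a hypothetical sketch (`StartupCheckSketch` is not the code from d339626); it also covers the dev case discussed above, where the endpoint sets `check_origin: false`.

```elixir
defmodule StartupCheckSketch do
  @moduledoc """
  Hypothetical startup validation of a socket's transports.
  Transport-level :check_origin overrides the endpoint-level value;
  :check_csrf only exists per transport.
  """

  def validate_socket!(endpoint_config, transports) do
    endpoint_origin = Keyword.get(endpoint_config, :check_origin, true)

    # Skip transports that are turned off entirely (`longpoll: false`).
    for {name, transport_opts} <- transports, transport_opts != false do
      # `websocket: true` means "use the defaults".
      opts = if transport_opts == true, do: [], else: transport_opts
      check_origin = Keyword.get(opts, :check_origin, endpoint_origin)
      check_csrf = Keyword.get(opts, :check_csrf, true)

      if check_origin == false and check_csrf == false do
        raise ArgumentError,
              "#{name}: cannot disable both :check_origin and :check_csrf"
      end
    end

    :ok
  end
end
```

With a typical dev config (`check_origin: false` on the endpoint), a transport that also sets `check_csrf: false` raises as soon as the endpoint boots, instead of on the first connection.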

Schultzer · 2024-10-25T20:34:07Z

First of all, I like that this is going to be configurable. I wanted to add this SO answer as a good overview of when to use a CSRF token vs. the Origin header: https://stackoverflow.com/questions/24680302/csrf-protection-with-cors-origin-header-vs-csrf-token#24692474

The Origin header is not always guaranteed to be present, but I don't believe this is a concern at all: we have full control and know at compile time which routes and methods are allowed in the first place, so there is potential to improve security (and developer education) even further.

josevalim

I dropped one nitpick and we can ship it!

josevalim · 2024-10-31T07:37:02Z

💚 💙 💜 💛 ❤️

tanguilp added 5 commits October 13, 2024 17:11

Implement optionnal CSRF check

50cdc6d

Implement the check_csrf option

492e489

Document check_csrf option in Phoenix.Endpoint

e845ac8

Add test for check_csrf option

759eed4

Use boolean operator in condition

467e378

tanguilp added 2 commits October 23, 2024 22:56

Add test that checks disabling both origin and CSRF checks raises

5f2bf63

Add check that one of origin and CSRF check is performed

d80a535

josevalim reviewed Oct 24, 2024

lib/phoenix/socket/transport.ex (outdated, resolved)

tanguilp and others added 3 commits October 24, 2024 22:43

Update Phoenix.Socket.Transport.connect_info/4 doc

f5b45bf

We take into account that it's no longer possible to have both CSRF and origin checks disabled at the same time

Set the connect_option default options to []

f9a8f29

Co-authored-by: José Valim

Update Phoenix.Endpoint doc regarding the check_csrf opt

0f3fdab

Cannot be disabled with `check_origin` disabled as well

josevalim reviewed Oct 24, 2024

lib/phoenix/endpoint.ex (outdated, resolved)

Fix doc indentation in endpoint

5b2ba48

Co-authored-by: José Valim

tanguilp marked this pull request as draft October 27, 2024 15:30

Check prsence of CSRF or origin check when starting the transports

d339626