Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add check_csrf option to socket transport options #5952

Merged

Conversation

tanguilp
Copy link
Contributor

This PR proposes adding a check_csrf option to Phoenix's socket transport, so as to enable caching Liveview's static rendering.

Context

After releasing plug_http_cache, I was asked if we could use it to cache Liveviews.

After a few experiment, I came with the following PR last year: #5667. Note that in this PR, I didn't follow the proposed approach in the discussion there (prefer being honest here 🙈).

I took more time to study the different approaches how to implement it this year. I described it in 2 blog posts:

We're talking about caching Liveview's with session enabled. When disabled, caching works OOTB.

Key takeaways are:

  • there are 2 mechanisms to prevent CSWSH: checking origin and checking a CSRF token
  • both mechanisms protect against CSWSH. As a consequence using only one is safe
  • by default Phoenix uses both for websocket & longpoll transports. origin check can be disabled, CSRF check cannot
  • we can't cache liveviews because the generated pages contain a CSRF token, which is user-specific
  • we propose instead to enable origin check, and disable the CSRF check
  • the second article also discusses:
    • how to selectively caching some liveviews (opt-in caching)
    • a pattern to avoid accidentally cache private data (assign_private, similar to assign_async)

I believe this pattern (Publicly caching private Liveviews) is actually another use-case for Liveview, separate but not exclusive to server-rendered interactivity. Usually you would make heavy use of javascript, requesting a separate APIs to get the little bits of private user data. The proposed pattern greatly simplifies that, and is compatible with any caching system (plug_http_cache, CDN, any shared cache between the Phoenix server and the user browser). But only time will tell 😁

Screenshot from 2024-09-04 16-45-11

Implementation

To disable CSRF check, you would do:

socket "/live", Phoenix.LiveView.Socket, websocket: [connect_info: [session: @session_options], check_csrf: false]

The check_csrf option is transport-specific, contrary to check_origin that can be configured at the endpoint level as well. This is because:

  • we could have non-web transports for which checking CSRF makes no sense
  • this could confuse users, because you still need to check CSRF for Phoenix views' forms for instance

We change the signature of Phoenix.Socket.Transport.connect_info/4 to add an opts parameter that defaults to [check_csrf: true].

When disabling CSRF check for Phoenix socket transports, we no longer need to 1) add a CSRF token in the HTML page 2) send the token as a parameter to the JS LiveSocket call.

A test was added to check that when this option is set to true (default value), including an invalid CSRF token fails to retrieve session data.

From a security perspective, I think that with the default values (secure by default) and the current warnings, it's hard to make Liveview insecure without noticing it.

@josevalim
Copy link
Member

Hi @tanguilp, I believe we had previous discussions about this. Do you remember where they happened? I can't recall all of the details and linking back to them could help speed up the process. Meanwhile, thank you for the PR. ❤️

@rhcarvalho
Copy link
Contributor

Hi @tanguilp, I believe we had previous discussions about this. Do you remember where they happened? I can't recall all of the details and linking back to them could help speed up the process. Meanwhile, thank you for the PR. ❤️

#5667 and https://elixirforum.com/t/http-caching-liveviews-first-render/59180?u=rhcarvalho

Hi José, I happen to read the conversation on the original PR for context I guess like a week ago.

In particular, you had a comment in the PR where you "draw a line on the sand" for what you'd accept or not at the time.

But @tanguilp also presents now some new ideas and approach.

@rhcarvalho
Copy link
Contributor

More specific links:

(Full disclosure I'm not associated with the changes, but interested to learn from it and trying to help)

@josevalim
Copy link
Member

Thank you @rhcarvalho, the Elixir Forum was the additional discussion that I was missing. Let me get up to date.

@josevalim
Copy link
Member

We propose instead to enable origin check, and disable the CSRF check

@tanguilp my understanding is that we only require the CSRF tokens if they are sent by the client, no? So why not send the CSRF token from the client in the first place an option? I believe the answer is because LiveView then fails... but that would be something to address in LiveView, not here. :)

@tanguilp
Copy link
Contributor Author

my understanding is that we only require the CSRF tokens if they are sent by the client, no?

No, we always check them if the session is configured in the transport.

So why not send the CSRF token from the client in the first place an option? I believe the answer is because LiveView then fails... but that would be something to address in LiveView, not here. :)

It's not failing in Liveview, but in Phoenix's socket transport. There was a problem with LiveView when no session was set but it turned out you can, well, just set a random value in the session.

They're two cases:

  • connect_info: [session: @session_options] is not set → then no problem, caching works OOTB
  • connect_info: [session: @session_options] is set → then Phoenix.Socket.Transport.connect_info/3 is called and CSRF checki is always performed. This PR makes this CSRF check it optional (enabled by default) so that caching can be enabled (and maybe for other use-cases - CSRF check is not needed as long as we have origin check enabled)

I believe we had previous discussions about this. Do you remember where they happened? I can't recall all of the details and linking back to them could help speed up the process.

(Thanks @rhcarvalho for posting the links!)

Last time we stopped while discussing the following points:

  • the need to allow / encourage granular caching → this is discussed in Part 1 of the blog post (live "/", MainLive.Index, private: %{cache: true} for example for opt-in caching)
  • "We should document that you MUST have check_origin enabled if you want to use this feature." → I think this is sufficiently documented, although we could add a more visible warning
  • "What I am saying is that it is granular per page. You don't disable session validation or change the CSWSH for the whole app, only on the pages you want to cache." → this is clearly not the direction taken by this PR. The rationale is that 1) the origin check does protect against CSWSH 2) disabling both checks by accident is hard unless you really don't read the doc. I think allowing to disable by route / page would make Phoenix's code more complex for a feature that might not have huge popularity
  • by the way this option can only be configured at the transport level - not at the endpoint level
  • there's still the problem of live_session's :session parameter that adds some values (potentially private data!) into the HTML directly as discussed in Part 2 of the blog post. Not sure how to handle this point. Document it in the option?
  • Part 2 discusses how to avoid accidentally caching private user data (assign_private on the model of assign_async)

Hopefully it answers your questions! I think the core issue is that this PR allows developers to shoot in their feet if they disable both origin and CSRF check, which is not possible now. The alternatives being:

  • bake in this feature deeper into Phoenix
  • maybe a custom Phoenix.Socket.Transport could help? Unsure about it, seems like it would require copy/pasting current websocket & longpolling code without the CSRF check
  • not allowing caching LiveViews to avoid possible critical security failures

Cheers!

@josevalim
Copy link
Member

It's not failing in Liveview, but in Phoenix's socket transport. There was a problem with LiveView when no session was set but it turned out you can, well, just set a random value in the session.

Can you please explain how it is failing? Because, looking at this code, we simply set the session to nil, which will then fail in LiveView, but not in Phoenix. In that case, I'd prefer to deal with this in LiveView.

@tanguilp
Copy link
Contributor Author

The point is that we want the session but without the CSRF check so as to implement the Publicly caching private Liveviews pattern.

@josevalim
Copy link
Member

Doesn’t it mean then that we can cache user specific information, if we are not careful about it? So you need to write the page carefully to not expose any of that? How often is that used and why not cache parts of the page at the Phoenix layer instead of the HTTP layer?

@josevalim
Copy link
Member

In any case, this is good to me as long as we raise if both check_origin and check_csrf is false. You need to have at least one of them enabled. :)

@tanguilp
Copy link
Contributor Author

Doesn’t it mean then that we can cache user specific information, if we are not careful about it? So you need to write the page carefully to not expose any of that?

Yes, but the risk is totally mitigated if the approach described in the article is used.

How often is that used

Now nobody uses it but there's interest for sure!

and why not cache parts of the page at the Phoenix layer instead of the HTTP layer?

It's just more complicated to reassemble the parts of the page, and it's not compatible with caching with external systems (including CDNs).

In any case, this is good to me as long as we raise if both check_origin and check_csrf is false. You need to have at least one of them enabled. :)

Alright! Just need to check how to handle this in dev, where check_origin is set to false. Should we raise in dev as well when both checks are disabled?

@josevalim
Copy link
Member

We should raise in dev as well, yes!

@tanguilp
Copy link
Contributor Author

We should raise in dev as well, yes!

Done in 5f2bf63 and d80a535


cond do
check_origin == false and check_csrf == false ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any chance we can detect and raise this when the endpoint is being configured, rather than at request time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically we could do it at compile time somewhere like in https://github.com/phoenixframework/phoenix/blob/main/lib/phoenix/endpoint.ex#L637

However, check_origin is a runtime configuration option so there's not really to ensure it will no be changed at runtime. We could also check it when starting the endpoint, but as far as I can see we don't do this kind of checks in the current code.

This also raises the question of the check_origin: {m, f, a} configuration: should we raise if the MFA call returns false and check_csrf is set to false as well (not currently done)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we check it when the supervision starts or the transports start? if it is a MFA, it is fine to not check, then the user is really up their own devices after manually bypassing all layers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in d339626. Not sure it's the best place to do this, but I've pushed to discuss this :) There are no tests yet, hence I've set this PR to draft.

tanguilp and others added 3 commits October 24, 2024 22:43
We take into account that it's no longer possible to have both CSRF
and origin checks disabled at the same time
Cannot be disabled with `check_origin` disabled as well
lib/phoenix/endpoint.ex Outdated Show resolved Hide resolved
@Schultzer
Copy link
Contributor

First of all I like this is going to be configurable, I wanted to add this SO comment as a good overview of when to use CSRF token vs Origin headers https://stackoverflow.com/questions/24680302/csrf-protection-with-cors-origin-header-vs-csrf-token#24692474

Origin headers is not always are guaranteed to be there, but I don’t believe this is a concern at all, since we have all the control and know which routes and methods are allowed in the first place at compile time, so there is potential for improving security / education even further for a developer.

@tanguilp tanguilp marked this pull request as draft October 27, 2024 15:30
Copy link
Member

@josevalim josevalim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dropped one nitpick and we can ship it!

@josevalim josevalim marked this pull request as ready for review October 31, 2024 07:30
lib/phoenix/endpoint.ex Outdated Show resolved Hide resolved
@josevalim josevalim merged commit c486cdf into phoenixframework:main Oct 31, 2024
7 checks passed
@josevalim
Copy link
Member

💚 💙 💜 💛 ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants