net/url: don't parse ';' as a separator in query string [freeze exception] #25192
Comments
Complete repro:
package main

import (
	"fmt"
	"net/http"
)

func main() {
	http.HandleFunc("/foo", func(w http.ResponseWriter, req *http.Request) {
		err := req.ParseForm()
		if err != nil {
			fmt.Println(err)
			w.Write([]byte("error"))
			return
		}
		params := req.Form
		fmt.Println("params:", params, "count:", len(params))
		for key, values := range params {
			fmt.Println("param", key, ":", values)
		}
		w.Write([]byte("OK"))
	})
	fmt.Println("starting on port 9999")
	server := &http.Server{
		Addr: ":9999",
	}
	// Report the reason if the server fails to start or stops.
	if err := server.ListenAndServe(); err != nil {
		fmt.Println(err)
	}
}
@bradfitz - Is this expected?
Both & and ; are used to split key value pairs. The ; is optional since it allows you to provide a URL as a value with a query string.
Hello @zhyale, thank you for filing this issue and welcome to the Go project! @agnivade and @fraenkel thank you for the responses too.
So I believe, the root cause of the question here is rather:
If you run https://play.golang.org/p/QVz18jWspPF or inlined below:
package main

import (
	"log"
	"net/url"
)

func main() {
	log.SetFlags(0)
	u, err := url.Parse("http://localhost:9999/foo?id=1%27;--")
	if err != nil {
		log.Fatalf("Failed to parse URL: %v", err)
	}
	log.Printf("%#v\n", u.Query())
}
You'll see the real symptom
url.Values{"--":[]string{""}, "id":[]string{"1'"}}
A W3C recommendation https://www.w3.org/TR/html401/appendix/notes.html#h-B.2.2 recommended accepting ';' in place of '&' as a separator, and that was the basis on which this behavior was added. However, unfortunately that recommendation was superseded very recently, in mid-December 2017, by https://www.w3.org/TR/2017/REC-html52-20171214/ and now that points to https://url.spec.whatwg.org/#urlsearchparams
Node.js doesn't seem to recognize ';' as a separator:
url.parse('http://localhost:9999/foo?id=1%27;--', true);
Url {
  protocol: 'http:',
  slashes: true,
  auth: null,
  host: 'localhost:9999',
  port: '9999',
  hostname: 'localhost',
  hash: null,
  search: '?id=1%27;--',
  query: { id: '1\';--' },
  pathname: '/foo',
  path: '/foo?id=1%27;--',
  href: 'http://localhost:9999/foo?id=1%27;--' }
The old survey of what other languages do is technically still correct per #2210 (comment), but any future adoption of the new recommendation from W3C will mean that those languages also change their behavior.
Now the big question is: this behavior has been around since 2011, and changing it 7 years later might massively break code for many users. Perhaps we need a survey of the make-up and interpretation of query strings from some frontend server? This will perhaps also need some security considerations, or maybe a wide-scale adoption of the recommendation first?
In addition to those already paged, I will page some more folks to help pitch in thoughts about the fate of the new W3C recommendation /cc @ianlancetaylor @andybons @tombergan @agl @rsc @mikesamuel
The Go 1 docs say:
So we can't change it during Go 1.x. Repurposing this to be a Go 2.x issue.
IIUC, the problem is that
Array.from(new URL(`https://a/?a&b;c`).searchParams.keys())
// [ "a", "b;c" ]
but
values, _ := url.ParseQuery(`a&b;c`)
fmt.Println(reflect.ValueOf(values).MapKeys()) // [c a b]
Assuming that's right:
+1
We might be able to get a handle on the size of potential breakage by surveying URL composition libraries to get an idea of how many sources are likely to produce URLs that are substantially different given non-crafted inputs.
For example:
// html/template
t, _ := template.New("T").Parse(`<a href="/?a={{.}}&b=b">`)
t.Execute(&out, `foo;bar&baz`)
out.String() == `<a href="/?a=foo%3bbar%26baz&b=b">`
// net/url
url.QueryEscape(`foo;bar&baz`) == `foo%3Bbar%26baz`
url.QueryUnescape(`foo%3Bbar%26baz`) == `foo;bar&baz`
In vanilla JS
encodeURIComponent(`foo;bar&baz`) == `foo%3Bbar%26baz`
encodeURI(`foo;bar&baz`) == `foo;bar&baz`
I think this is mostly a correctness and interop issue but that the security consequences are not severe. That said, there are risks to inferring more structure than the spec allows.
If a survey finds that there are widely used url.QueryEscape / encodeURIComponent counterparts that escape
then
The risk is pretty small as long as
Hi everybody,
into:
And Golang parses the ua parameter into
Here is a Go Playground to test it: https://play.golang.org/p/BeKy_UuSyoO
It was recently pointed out that this behavior divergence can lead to cache poisoning attacks in reverse proxies that cache based on a subset of query parameters: if Go treats
At the risk of repeating myself, I want to point out that relying on parser alignment for security is doomed, so this will break again, and often in subtler ways that we can't fix universally. However, this is probably broken most of the time now, so we should fix it.
I did a quick survey of other popular languages and frameworks.
A more interesting question would be how caching proxies behave, but a lot of them are ad-hoc and hard to test. Snyk seems to think at least Varnish uses only
I think the W3C and WHATWG recommendations are sort of red herrings: they are about
Given Rails and Python are waiting for the next scheduled release to fix this, we will too, and ship the fix in Go 1.17. It would be too harsh of a behavior change for too vague of a security benefit to land in a security release anyway.
This is technically a backwards compatibility promise violation, but I am invoking the security exception. More broadly, we should think whether the thing we want to provide compatibility with is "the exact self-contained behavior of the library when it was introduced" or "correctly interoperate with the current ecosystem". Realistically, we're already doing more of the latter with crypto/tls and crypto/x509, because doing the exact same thing as 5 years ago would be useless and insecure. HTTP and web specifications are similarly living and evolving, and we should figure out what our policy about that is.
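To make the divergence concrete, here is a minimal sketch of two query splitters that disagree on the same raw query. It is an illustration only: splitQuery is a hypothetical helper (not part of net/url), it does no percent-decoding, and the query string is made up. A proxy that splits only on '&' and keys its cache on id sees one value, while a backend that also splits on ';' and takes the first id sees another.

```go
package main

import (
	"fmt"
	"strings"
)

// splitQuery is a hypothetical helper: it splits raw into key/value pairs,
// treating every character in seps as a pair separator. The first value seen
// for a key wins. No percent-decoding is done.
func splitQuery(raw, seps string) map[string]string {
	pairs := strings.FieldsFunc(raw, func(r rune) bool {
		return strings.ContainsRune(seps, r)
	})
	out := make(map[string]string, len(pairs))
	for _, p := range pairs {
		kv := strings.SplitN(p, "=", 2)
		if _, seen := out[kv[0]]; seen {
			continue // keep the first value for a repeated key
		}
		if len(kv) == 2 {
			out[kv[0]] = kv[1]
		} else {
			out[kv[0]] = ""
		}
	}
	return out
}

func main() {
	raw := "utm=x;id=evil&id=good" // made-up request query

	proxyView := splitQuery(raw, "&")    // separator policy: '&' only
	backendView := splitQuery(raw, "&;") // separator policy: '&' and ';'

	fmt.Println("proxy sees id =", proxyView["id"])
	fmt.Println("backend sees id =", backendView["id"])
}
```

If the proxy stores the backend's response under the key it computed, later requests for the proxy's id value would be served the response generated for the other one.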
It seems wrong that every single HTTP server implementation in the world is being charged with coping with a decision made by a few proxies. If the proxy doesn't think ; is special, then it could easily enforce that view by escaping the ; as it passes the request through. And if it did think ; was special, it could enforce that by rewriting to &. Then just a few proxies would need changing instead of all the HTTP server implementations in the world. The fact that we're having this discussion and not the nginx developers is utterly backward.
Anyway, that rant aside, if we do make this change, can we find some way to make it not a silent change? I worry about people updating to Go 1.17 and having their servers misbehave but with no idea why. Maybe a logging print for each request that ignores ; as separator, limited to one per second or something like that?
It also seems like a hard requirement to be able to opt back in to the old behavior. Users with client apps sending ; that don't have the misfortune of being behind nginx will not like being forced to rewrite and redeploy their apps just to get Go 1.17.
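The "escape the ; as it passes the request through" idea can be sketched as a small piece of Go middleware. This is only an illustration of the rewrite, not how any real proxy does it: escapeSemicolons, escapeQuerySemicolons and idSeenByHandler are hypothetical names introduced here, and the sketch assumes the front end wants ';' treated as ordinary data.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// escapeSemicolons percent-encodes every literal ';' in a raw query string so
// that no downstream parser can read it as a separator.
func escapeSemicolons(rawQuery string) string {
	return strings.ReplaceAll(rawQuery, ";", "%3B")
}

// escapeQuerySemicolons is a hypothetical middleware that applies the rewrite
// to a copy of the request before passing it on.
func escapeQuerySemicolons(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if strings.Contains(r.URL.RawQuery, ";") {
			r2 := r.Clone(r.Context()) // Clone deep-copies the URL
			r2.URL.RawQuery = escapeSemicolons(r.URL.RawQuery)
			r = r2
		}
		next.ServeHTTP(w, r)
	})
}

// idSeenByHandler runs one in-memory request through the middleware and
// returns the value of the "id" parameter as the inner handler sees it.
func idSeenByHandler(target string) string {
	inner := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.URL.Query().Get("id"))
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, target, nil)
	escapeQuerySemicolons(inner).ServeHTTP(rec, req)
	return rec.Body.String()
}

func main() {
	fmt.Println(escapeSemicolons("id=1%27;--"))
	fmt.Println(idSeenByHandler("/foo?id=1%27;--"))
}
```

A front end that instead treats ';' as a separator would replace it with "&" rather than "%3B"; either way the query leaves the front end with only one possible reading.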
Adding to the minutes. It's getting a bit late to make a change in Go 1.17 so we should try to converge on a plan soon. @bigluck, if you still use the AWS API Gateway, does it still unescape semicolons? Or does anyone else know?
@ianlancetaylor found private mail saying that https://angular.io/api/common/http/HttpUrlEncodingCodec did not escape semicolons at least in 2019, leading to confusion as well. Does anyone know if that is still the case? And does anyone know of any clients that do use semicolons for separators and would break if we stopped recognizing them?
This proposal has been added to the active column of the proposals project
I am not convinced we should make a breaking change here.
1. There are many many implementations of this behavior.
HTML 4 introduced web forms. Appendix B, section B.2.2 reads:
Go, like Python, Ruby, and other languages, implemented this recommendation. I can't find anything saying that W3C or WHATWG has explicitly retracted this. (The W3C link in the Snyk post is about POST form parsing, not query parameters. And the Snyk post also later refers to “the RFC” but seems to mean that W3C link again, not an actual RFC. Snyk's being sloppy, and sloppy mixed with security is never good.)
2. Making this change will break Go users.
I remain concerned about breaking existing users. Python deciding to make the change does not seem like a compelling argument to me. Go takes a much stronger view of backwards compatibility than Python does (for example, Python 3). There is no doubt that at least some hand-written HTML contains links that will break when semicolon is removed. We would need a good reason to break those users. The only acceptable reason here is security. But this is the wrong way to improve security of Nginx configurations.
3. Making this change is not the best fix for the web or nginx ecosystem.
It is clear that there are many URL query parsers that implemented the HTML 4 recommendations. If caching proxies like Nginx want to work correctly, they should be guarding against this problem, not every possible server sitting behind them. The fix can be made in one place (the proxy) instead of a very large number of places.
URL query parameter parsing turns out to be ambiguous, because we don't know whether a particular implementation will split on semicolons. The caching proxy is the first thing that processes the URL. If it interprets the query parameters a particular way, then it should rewrite them to have that meaning unambiguously. So if a proxy receives
The Snyk post says:
This is good advice. It boils down to "Make sure your server and your cache proxy agree about how to split query arguments." The simplest, most guaranteed way to do this is to have the proxy rewrite the query to be unambiguous.
4. Go servers behind nginx are not affected by default.
The nginx docs say the default proxy cache key is
The way to get into trouble is to use a key like
5. A potential non-breaking Go change.
To not break users, we could add a url.AllowQuerySemicolon function that controls whether ParseQuery accepts semicolons. It can default to true, but users who want to use nginx's query argument cache keys would call AllowQuerySemicolon(false) to bring the Go implementation in line with nginx's.
It is not clear to me that we have enough justification to take the next step of changing the default and breaking users, all for the very few users who are using nginx and setting a custom cache key using query argument parameters.
In the long term, it still seems like nginx should rewrite the query strings to escape semicolons in requests that are consulting the proxy cache and using query parameter argument keys, ensuring that nginx's understanding of the request URL matches the server's, 100% of the time, no matter what the server is.
I think focusing on nginx is optimistic. nginx happens to be a popular reverse proxy that exhibits this behavior, but I doubt it's the only one. We should check Cloudflare, Fastly, Akamai, AWS, GCP, Azure, Varnish, F5, and Apache, off the top of my head.
If they all need to change, then changing the server frameworks is not "changing the thing in a lot of places instead of in one place" (which I generally agree with as an argument). At that point, I think it becomes a matter of what's more expected and predictable. The HTML 4 spec is hardly authoritative for URL query strings in HTTP, so we just don't have an easy reference here. As of today, I found the
If you are not convinced, I think we should check with all those other reverse proxy vendors before declining this proposal.
With the change https://golang.org/cl/325697 in place our servers now produce lots of log lines like
This is really not helpful, and since it's a web server, anyone can flood you with requests with a
@yktoo Have you looked into using
|
Thanks @antichris, #49399/#50034 describe my concerns exactly. @seankhliao yes, log filtering seems to be the only workaround at the moment, albeit a very ugly one.
I was caught by surprise by this and want to restore the old behavior. But it's unclear to me how to do this while supporting both 1.17 and pre-1.17, since calling
@shinny-chengzhi You can use a build tag to compile code differently for go1.16 and go1.17.
@ianlancetaylor Thanks, that will do it.
It's used by the upstream server to log some of the kinda-breaking issues, for example [1]. Since the majority of the HTTP servers at Reddit are public facing, when those happen it's usually just a user messing with us, not the HTTP client we control doing something wrong. This gives us a way to suppress those logs, and also emit counters to better track how many times they happened. This also makes the upstream HTTP server use the same JSON logger by zap as the rest of our code. [1]: golang/go#25192 (comment)
It's used by the upstream server to log some of the kinda-breaking issues, for example [1]. Since the majority of the HTTP servers at Reddit are public facing, when those happen it's usually just a user messing with us, not the HTTP client we control doing something wrong. This gives us a way to suppress those logs, and also emit counters to better track how many times they happened. This also makes the upstream HTTP server use the same JSON logger by zap as the rest of our code, at warning level. [1]: golang/go#25192 (comment)
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (go version)?
go1.10.1 linux/amd64
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
CentOS 7 with kernel 3.10.0-514.10.2.el7.x86_64
What did you do?
I created a program to detect attacks such as SQL injection. When the test case is:
http://..../testcase?id=1%27;--
and I use:
r.ParseForm()
params := r.Form
fmt.Println("params:", params, "count:", len(params))
for key, values := range params {
	fmt.Println("param", key, ":", values)
}
Got:
params: map[--:[] id:[1']] count: 2
param id : [1']
param -- : []
What did you expect to see?
I expected only one key/value pair in this case:
key: id
value: 1';--
What did you see instead?
I got two key:[value] pairs.
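For comparison, here is a minimal sketch of the round trip when the semicolon is percent-encoded on the sending side. The encodeID and decodeID helpers are hypothetical names. This does not change how a raw ';' from an arbitrary client is treated; it only shows that an encoded value comes back as the single pair expected above.

```go
package main

import (
	"fmt"
	"net/url"
)

// encodeID builds a query string carrying a single id parameter. url.Values
// percent-encodes reserved characters, including ';' and '\''.
func encodeID(id string) string {
	return url.Values{"id": {id}}.Encode()
}

// decodeID parses a raw query string and returns the id parameter.
func decodeID(rawQuery string) (string, error) {
	v, err := url.ParseQuery(rawQuery)
	if err != nil {
		return "", err
	}
	return v.Get("id"), nil
}

func main() {
	q := encodeID("1';--")
	fmt.Println("encoded:", q)

	id, err := decodeID(q)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("decoded id:", id)
}
```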