Making urlparse WHATWG conformant #88049
Mike Lissner reported a set of test suites that exercise extreme conditions with URLs while remaining in conformance with url.spec.whatwg.org: https://github.com/web-platform-tests/wpt/tree/77da471a234e03e65a22ee6df8ceff7aaba391f8/url These test cases were run against the urlparse and urljoin functions: https://gist.github.com/mlissner/4d2110d7083d74cff3893e261a801515 Quoting verbatim
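For illustration only (this is not the quoted report or the linked gist): a minimal sketch of how test cases of this kind could be compared against urljoin. The `SAMPLE_CASES` list and the `find_mismatches` helper are hypothetical; the entries are hand-written and only assumed to resemble the suite's `input`/`base`/`href` shape, they are not copied from it.

```python
from urllib.parse import urljoin

# Hypothetical, hand-written cases. The field names ("input", "base",
# "href", "failure") are an assumption about the suite's JSON layout.
SAMPLE_CASES = [
    "# plain strings are treated as comments and skipped",
    {"input": "/b", "base": "http://example.com/a",
     "href": "http://example.com/b"},
    {"input": "http://example.com/a/../b", "base": None,
     "href": "http://example.com/b"},
]


def find_mismatches(cases):
    """Return (input, expected, actual) for every case urljoin gets wrong."""
    mismatches = []
    for case in cases:
        # Skip comment strings and cases that are expected to fail parsing.
        if not isinstance(case, dict) or case.get("failure"):
            continue
        base = case.get("base") or ""
        actual = urljoin(base, case["input"])
        if actual != case["href"]:
            mismatches.append((case["input"], case["href"], actual))
    return mismatches


for item in find_mismatches(SAMPLE_CASES):
    print(item)
```

With these sample cases, the relative reference resolves as expected, while the absolute URL containing `..` is reported as a mismatch because urljoin returns it unchanged when no base is given.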
It would also be interesting to test with the yarl module. It is based on urlparse and urljoin, but does extra normalization of %-encoding.
See also bpo-43882.
FWIW, rather than implementing our own URL parsing at all, wrapping a library extracted from a major browser with a compatible license (Chromium or Firefox) and keeping it updated would avoid disparities. Unfortunately, I'm not sure how feasible this really is. Do all of the API surfaces we must support in the stdlib, for the sake of compatibility with urllib, line up with such a browser-core URL parsing library? Something to ponder. Unlikely to be something we'll actually do.
If urlparse's behaviour was superseded by RFC 2396 back in 1998 (which is what resulted in the addition of urlsplit: #35466), the WHATWG URL Standard is a descendant of that (itself intending to supersede RFC 3986). As of the URL Standard (and already in RFC 3986), the entire concept of a path segment parameter has been dropped. By its very nature and purpose, urlparse can't pass the conformance suite, because any time it encounters a
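A small sketch of the difference between the two functions discussed above, using a made-up URL on `example.com`: urlparse splits a `;parameter` off the last path segment into a separate `params` field, while urlsplit leaves the path intact.

```python
from urllib.parse import urlparse, urlsplit

# Illustrative URL only; both path segments carry a ";..." suffix.
url = "http://example.com/a;b/c;d?q=1"

parsed = urlparse(url)   # six fields, including .params
split = urlsplit(url)    # five fields, no .params

# urlparse strips the parameter from the last segment only.
print(parsed.path)    # /a;b/c
print(parsed.params)  # d

# urlsplit keeps the semicolons as part of the path.
print(split.path)     # /a;b/c;d
```

Note that only the final segment is affected; the `;b` in the first segment stays in the path in both cases.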
I'm not sure if this is the correct issue or not, but it seemed related. Chromium (and derivative browsers) recently implemented a change to URL parsing that broke a bunch of custom protocols. We have an application that uses the protocol The way I understand it, the correct format for custom protocols is to not have the Thanks and please let me know if this is the wrong issue or if you need more information.
Also @sethmlarson as an FYI. FWIW, it isn't clear that we'd ever be able to do what this issue suggests, or that we should. There are multiple competing interests in the world with valid needs to use URLs (URIs?) in different ways.
I don't think we should chase browsers, and therefore we shouldn't implement WHATWG. They are allowed to update their "fleet" globally on a whim, and they primarily deal with users physically typing things into address bars; we have things called backwards compatibility guarantees. I think implementing RFC 3986 is appropriate for software like Python.