feat: implement :has pseudo-selector functionality #624
```elixir
## NOTE: this parses incorrectly, parses as:
## %PseudoClass{name: "has", value: [%Selector{type: "label", pseudo_classes: [%PseudoClass{name: "has", value: []}]}]}
## but would expect to parse as:
## %PseudoClass{name: "has", value: [%Selector{type: "div", pseudo_classes: [%PseudoClass{name: "has", value: [%Selector{type: "label"}]}]}]}
# assert_find(html, "tr:has(div:has(label))", [
#   {"tr", [],
#    [
#      {"th", [], [{"div", [], [{"label", [], ["NESTED"]}]}]},
#      {"td", [], [{"div", [], ["fetch me pls"]}]}
#    ]}
# ])

## NOTE: this does not parse, because "only simple selectors are allowed in :has() pseudo-class"
# assert_find(html, "th:has(> label)", [
#   {"th", [], [{"label", [], ["TEST"]}]}
# ])

## NOTE: this does not parse, because "only simple selectors are allowed in :has() pseudo-class"
# assert_find(html, "th:has(> div > label)", [
#   {"th", [], [{"div", [], [{"label", [], ["NESTED"]}]}]}
# ])

## NOTE: this parses incorrectly, parses as:
## %PseudoClass{name: "not", value: [%Selector{type: "label", pseudo_classes: [%PseudoClass{name: "has", value: []}]}]}
## but would expect to parse as:
## %PseudoClass{name: "not", value: [%Selector{type: "*", pseudo_classes: [%PseudoClass{name: "has", value: [%Selector{type: "label"}]}]}]}
# assert_find(html, "tr:not(:has(label))", [
#   {"tr", [], [{"th", [], ["No Label"]}, {"td", [], ["some data"]}]}
# ])
```
@philss During testing, I found some cases that are valid according to the CSS spec but are not handled correctly by Floki. I suspect this has to do with only simple selectors being allowed in `:not` and `:has`.
Perhaps this is out of scope for this PR, but do you see a solution for it? What do you think it would take to support full selectors in `:not` and `:has`?
Yeah, this is due to the fact that we don't parse the selector correctly. I think we would need to tweak the "selector" parser to be more recursive when it encounters those two pseudo-selectors. We would probably need a special case to handle the `>` combinator as well.
Yeah, the selector parser would definitely need to be more recursive. I'm not familiar with the syntax of the `.xrl` files and the accompanying parser implementations, but I did find compiler construction very interesting in uni, so here's some pseudo-grammar of how I think a CSS selector parser should be defined:
```
# Let's say an identifier is just a word
ID = ~r/\w+/
# a number is just, well, a number
NUMBER = ~r/\d+/
# and a value is anything within quotes
VALUE = ~r/"[^"]*"/
# Then the grammar for Floki's CSS selectors can be defined recursively as:
SELECTOR = (
  :root
  *
  ID
  SELECTOR?.ID
  SELECTOR?[ID=VALUE]
  SELECTOR?[ID~=VALUE]
  SELECTOR?[ID^=VALUE]
  SELECTOR?:nth-child(NUMBER)
  SELECTOR?#ID
  # etc
  SELECTOR? SELECTOR
  SELECTOR? > SELECTOR
  SELECTOR? + SELECTOR
  SELECTOR? ~ SELECTOR
  SELECTOR?:checked
  SELECTOR?:disabled
  SELECTOR?:fl-contains(VALUE)
  SELECTOR?:fl-icontains(VALUE)
  SELECTOR?:not(SELECTOR)
  SELECTOR?:has(SELECTOR)
)
```

Note that this grammar allows a superset of what CSS selectors semantically allow: for example, `div:has(:has(img))` is not allowed in CSS, but `div:not(:has(:not(img)))` is. Still, as you can see, CSS selectors are quite recursive in their grammar, so the parser would need to be as well. Semantic correctness should be checked separately, potentially even at evaluation time.
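To make the recursion concrete, here is a minimal recursive-descent sketch for a small subset of that grammar: type selectors, `*`, the descendant, `>`, `+` and `~` combinators, and `:has(...)`/`:not(...)`, whose arguments are parsed by calling back into the selector-list parser. The module name `MiniSelectorParser` and its map/tuple output are hypothetical; they do not mirror Floki's actual parser or its `%Selector{}`/`%PseudoClass{}` structs. A leading combinator such as the `>` in `:has(> label)` is simply stored on the first step of the relative selector, with `nil` meaning none was written.

```elixir
defmodule MiniSelectorParser do
  # Hypothetical sketch, NOT Floki's parser. Output shapes:
  #   selector list -> [complex, ...]
  #   complex       -> [{combinator, compound}, ...]  (first combinator is nil unless a leading >, + or ~ was written)
  #   compound      -> %{type: "div" | "*", pseudo_classes: [%{name: "has", value: [complex, ...]}]}
  defguardp is_ident_char(c) when c in ?a..?z or c in ?A..?Z or c in ?0..?9 or c in [?-, ?_]

  def parse(input) do
    {list, ""} = parse_list(String.trim(input))
    list
  end

  # selector_list := complex ("," complex)*
  defp parse_list(input) do
    {complex, rest} = parse_complex(skip_ws(input))

    case skip_ws(rest) do
      "," <> rest ->
        {more, rest} = parse_list(rest)
        {[complex | more], rest}

      rest ->
        {[complex], rest}
    end
  end

  # complex := combinator? compound (combinator compound)*
  defp parse_complex(input) do
    {lead, rest} = parse_combinator(input)
    {compound, rest} = parse_compound(skip_ws(rest))
    parse_steps(rest, [{lead, compound}])
  end

  defp parse_steps(input, acc) do
    trimmed = skip_ws(input)

    case parse_combinator(trimmed) do
      # Whitespace followed by another compound is the descendant combinator.
      {nil, rest} ->
        if trimmed != input and compound_start?(rest) do
          {compound, rest} = parse_compound(rest)
          parse_steps(rest, [{:descendant, compound} | acc])
        else
          {Enum.reverse(acc), trimmed}
        end

      {combinator, rest} ->
        {compound, rest} = parse_compound(skip_ws(rest))
        parse_steps(rest, [{combinator, compound} | acc])
    end
  end

  defp parse_combinator(">" <> rest), do: {:child, rest}
  defp parse_combinator("+" <> rest), do: {:adjacent, rest}
  defp parse_combinator("~" <> rest), do: {:sibling, rest}
  defp parse_combinator(input), do: {nil, input}

  # compound := ("*" | ident)? pseudo*   (a missing type means "*", as in ":has(label)")
  defp parse_compound("*" <> rest), do: parse_pseudos(rest, %{type: "*", pseudo_classes: []})

  defp parse_compound(input) do
    case take_ident(input, "") do
      {"", ":" <> _} -> parse_pseudos(input, %{type: "*", pseudo_classes: []})
      {"", _} -> raise ArgumentError, "expected a selector at #{inspect(input)}"
      {name, rest} -> parse_pseudos(rest, %{type: name, pseudo_classes: []})
    end
  end

  # pseudo := ":" ident ("(" selector_list ")")?   <- the recursive step
  defp parse_pseudos(":" <> rest, compound) do
    {name, rest} = take_ident(rest, "")

    {value, rest} =
      case rest do
        "(" <> inner ->
          {args, ")" <> after_paren} = parse_list(inner)
          {args, after_paren}

        _ ->
          {[], rest}
      end

    pseudo = %{name: name, value: value}
    parse_pseudos(rest, %{compound | pseudo_classes: compound.pseudo_classes ++ [pseudo]})
  end

  defp parse_pseudos(rest, compound), do: {compound, rest}

  defp take_ident(<<c, rest::binary>>, acc) when is_ident_char(c), do: take_ident(rest, acc <> <<c>>)
  defp take_ident(rest, acc), do: {acc, rest}

  defp compound_start?(<<c, _::binary>>) when c in [?*, ?:] or is_ident_char(c), do: true
  defp compound_start?(_), do: false

  defp skip_ws(input), do: String.trim_leading(input)
end
```

With this shape, `tr:has(div:has(label))` nests `label` inside the `div`'s `:has`, `tr:not(:has(label))` gets an implicit `*` subject inside `:not`, and `th:has(> div > label)` yields a `:has` value of `[[{:child, %{type: "div", ...}}, {:child, %{type: "label", ...}}]]`, which covers the three cases the commented-out tests describe.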
Regarding selectors like `> h2`, whether on their own or within `:has` (e.g. `div:has(> h2)`), you could interpret them as the case `SELECTOR? > SELECTOR` where the first selector is nil. Logically, the absence of a selector implies no filtering is done, i.e. every element we try matches. Given that we start evaluating a CSS selector by trying every node in the HTML tree, `> h2` on its own is thus equivalent to `* > h2` (i.e. match every `h2` that is the direct child of some element).

However, within `div:has(> h2)`, by the time we reach `> h2` in our evaluation we have already narrowed the search down to `div`s, so `> h2` is only evaluated relative to each `div` in the tree. This makes evaluating `div:has(> h2)` essentially equivalent to evaluating `div > h2`, except that the former selects the `div` and the latter selects the `h2`. The same logic applies to the other combinators: `div:has(+ dt)` is evaluated essentially the same way as `div + dt`, but the former selects the `div` and the latter selects the `dt`. That is how I interpret, and make logical sense of, the `:has` CSS spec.
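That evaluation strategy can be sketched on Floki-style `{tag, attributes, children}` tuples. The `RelativeHas` module below is hypothetical, not Floki's implementation, and only supports single-step relative selectors written as `{combinator, tag}`. The point it illustrates is that both selectors walk the same candidates: `select_has/3` (for `div:has(> h2)`) returns the subject `div`, while `select_combinator/4` (for `div > h2`) returns the element on the right of the combinator.

```elixir
defmodule RelativeHas do
  # Hypothetical sketch, NOT Floki's implementation. Elements are Floki-style
  # {tag, attributes, children} tuples; text nodes (binaries) are skipped.
  # A relative selector is one step: {:child | :descendant | :adjacent | :sibling, tag}.

  # Every element in the tree, paired with its *following* element siblings.
  def elements_with_siblings(nodes) do
    elements = Enum.filter(nodes, &element?/1)

    elements
    |> Enum.with_index()
    |> Enum.flat_map(fn {{_tag, _attrs, children} = el, i} ->
      [{el, Enum.drop(elements, i + 1)} | elements_with_siblings(children)]
    end)
  end

  # Elements reachable from `el` through the given combinator.
  def candidates({_tag, _attrs, children}, siblings, combinator) do
    case combinator do
      :child -> Enum.filter(children, &element?/1)
      :descendant -> Enum.map(elements_with_siblings(children), &elem(&1, 0))
      :adjacent -> Enum.take(siblings, 1)
      :sibling -> siblings
    end
  end

  # tag:has(<combinator> inner): returns the matching *subject* elements.
  def select_has(tree, tag, {combinator, inner}) do
    for {{^tag, _, _} = el, sibs} <- elements_with_siblings(tree),
        Enum.any?(candidates(el, sibs, combinator), &match_tag?(&1, inner)),
        do: el
  end

  # outer <combinator> inner: returns the matching elements on the *right*.
  def select_combinator(tree, outer, combinator, inner) do
    for {{^outer, _, _} = el, sibs} <- elements_with_siblings(tree),
        cand <- candidates(el, sibs, combinator),
        match_tag?(cand, inner),
        do: cand
  end

  defp match_tag?({tag, _attrs, _children}, tag), do: true
  defp match_tag?(_node, _tag), do: false

  defp element?({tag, _attrs, children}) when is_binary(tag) and is_list(children), do: true
  defp element?(_node), do: false
end

html = [
  {"div", [], [{"h2", [], ["Title"]}, {"p", [], ["text"]}]},
  {"div", [], [{"section", [], [{"h2", [], ["Nested"]}]}]},
  {"dl", [], [{"div", [], ["term group"]}, {"dt", [], ["Term"]}]}
]

RelativeHas.select_has(html, "div", {:child, "h2"})       # div:has(> h2): only the first div
RelativeHas.select_combinator(html, "div", :child, "h2")  # div > h2: only the "Title" h2
RelativeHas.select_has(html, "div", {:adjacent, "dt"})    # div:has(+ dt): the div inside dl
```

A real implementation would accept full relative selectors (multi-step chains and compound selectors) rather than a single tag, but the subject-versus-right-hand-side distinction stays the same.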
Does this help? I realise it's been a pretty long response, and it's difficult to get on the same page on a complex topic like this without a common grammar, so if you'd like to have a video call someday to meet each other and have a more interactive conversation on this subject, I'd be happy to 😊
Yeah, that was super helpful! I will not promise to implement changes for that soon, but I will keep all of that in mind.
Thank you very much for your research and suggestions.
Yeah, I understand; it probably requires some significant changes. Let me know when you get to it, I'll probably still be interested in helping out! ;)
philss left a comment
Thank you! I think you nailed it!
I added some points to change, but overall it looks great!
Co-authored-by: Philip Sampaio <philip.sampaio@gmail.com>
…d fix behaviour of :has when given multiple arguments
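I noticed I made a mistake in my implementation. According to https://developer.mozilla.org/en-US/docs/Web/CSS/:has#logical_operations, when we give `:has` multiple comma-separated arguments, an element should match if *any* of them match (logical OR). I've fixed that now, and the updated tests reflect that :)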
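The rule from that MDN section can be illustrated with a small sketch: a comma-separated list inside one `:has()` is an OR, while chaining several `:has()` pseudo-classes is an AND. The `HasLogic` module and its list-of-lists encoding (one inner list of descendant tag names per `:has()`) are hypothetical and far simpler than Floki's selectors; they only exist to show the two logical operations side by side.

```elixir
defmodule HasLogic do
  # Hypothetical sketch, NOT Floki's code. An element is a Floki-style
  # {tag, attributes, children} tuple; each :has() is modelled as a list of
  # descendant tag names, and a compound selector as a list of such lists.

  # All element descendants (text nodes do not match the 3-tuple pattern and are skipped).
  defp descendants({_tag, _attrs, children}) do
    for {_, _, _} = child <- children, desc <- [child | descendants(child)], do: desc
  end

  # One :has(a, b, ...): matches if ANY argument has a matching descendant (OR).
  def has_any?(element, tags) do
    found = element |> descendants() |> Enum.map(&elem(&1, 0))
    Enum.any?(tags, &(&1 in found))
  end

  # Chained :has(...):has(...): EVERY :has() must match (AND).
  def matches?(element, has_lists), do: Enum.all?(has_lists, &has_any?(element, &1))
end

card = {"div", [], [{"img", [], []}, {"p", [], ["caption"]}]}

HasLogic.matches?(card, [["img", "video"]])    # div:has(img, video)
HasLogic.matches?(card, [["img"], ["video"]])  # div:has(img):has(video)
```

For this `card`, the first call is true because an `img` descendant exists, while the second is false because the chained `:has(video)` finds nothing.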
Fixes #482, #382, #592
TODO:
- Add `:has` to README.