feat: implement :has pseudo-selector functionality by bvobart · Pull Request #624 · philss/floki

bvobart · 2025-05-15T23:12:25Z

TODO:

Add documentation for :has to README.
Review by @philss
Fix tests
Add a few more tests to include other combinators and sub-selectors.

bvobart · 2025-05-15T23:21:39Z

+    ## NOTE: this parses incorrectly, parses as:
+    ##   %PseudoClass{name: "has", value: [%Selector{type: "label", pseudo_classes: [%PseudoClass{name: "has", value: []}]}]}
+    ## but would expect to parse as:
+    ##   %PseudoClass{name: "has", value: [%Selector{type: "div", pseudo_classes: [%PseudoClass{name: "has", value: [%Selector{type: "label"}]}]}]}
+    # assert_find(html, "tr:has(div:has(label))", [
+    #   {"tr", [],
+    #    [
+    #      {"th", [], [{"div", [], [{"label", [], ["NESTED"]}]}]},
+    #      {"td", [], [{"div", [], ["fetch me pls"]}]}
+    #    ]}
+    # ])
+
+    ## NOTE: this does not parse, because "only simple selectors are allowed in :has() pseudo-class"
+    # assert_find(html, "th:has(> label)", [
+    #   {"th", [], [{"label", [], ["TEST"]}]}
+    # ])
+
+    ## NOTE: this does not parse, because "only simple selectors are allowed in :has() pseudo-class"
+    # assert_find(html, "th:has(> div > label)", [
+    #   {"th", [], [{"div", [], [{"label", [], ["NESTED"]}]}]}
+    # ])
+
+    ## NOTE: this parses incorrectly, parses as:
+    ##  %PseudoClass{name: "not", value: [%Selector{type: "label", pseudo_classes: [%PseudoClass{name: "has", value: []}]}]}
+    ## but would expect to parse as:
+    ##  %PseudoClass{name: "not", value: [%Selector{type: "*", pseudo_classes: [%PseudoClass{name: "has", value: [%Selector{type: "label"}]}]}]}
+    # assert_find(html, "tr:not(:has(label))", [
+    #   {"tr", [], [{"th", [], ["No Label"]}, {"td", [], ["some data"]}]}
+    # ])


@philss During testing, I found some cases that are valid according to the CSS spec, but are not handled correctly by Floki. I suspect this has something to do with only simple selectors being allowed in :not and :has.

Perhaps out of scope for this PR, but do you see a solution for this? What would it take to support full selectors in :not and :has, do you think?

Yeah, this is due to the fact that we don't parse the selector correctly. I think we would need to tweak the "selector" parser to be more recursive when it encounters those two pseudo-selectors. We would probably need a special case to handle the > combinator as well.

Yeah the selector parser would definitely need to be more recursive. I'm not familiar with the syntax of the .xrl files and accompanying parser implementations, but I did find compiler construction very interesting in uni, so here's some pseudo-grammar of how I think a CSS selector parser is supposed to be defined:

    # Let's say an identifier is just a word
    ID = ~r/\w+/
    # a number is just, well, a number
    NUMBER = ~r/\d+/
    # and a value is anything within quotes
    VALUE = ~r/".*"/

    # Then the grammar for Floki's CSS selectors can be defined recursively as:
    SELECTOR = (
      :root
      *
      ID
      .ID
      SELECTOR[ID=VALUE]
      SELECTOR[ID~=VALUE]
      SELECTOR[ID^=VALUE]
      SELECTOR:nth-child(NUMBER)
      SELECTOR#ID
      # etc
      SELECTOR? SELECTOR
      SELECTOR? > SELECTOR
      SELECTOR? + SELECTOR
      SELECTOR? ~ SELECTOR
      SELECTOR?:checked
      SELECTOR?:disabled
      SELECTOR?:fl-contains(VALUE)
      SELECTOR?:fl-icontains(VALUE)
      SELECTOR?:not(SELECTOR)
      SELECTOR?:has(SELECTOR)
    )

Note that this grammar allows a superset of what CSS selectors semantically allow, as for example div:has(:has(img)) is not allowed in CSS, but div:not(:has(:not(img))) is. But as you can see, CSS selectors are quite recursive in their grammar, so the parser would need to be as well. Semantic correctness should be checked separately, potentially even at evaluation time.
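To make the recursion concrete, here is a minimal sketch of a recursive-descent parser for a tiny subset of that grammar: a tag name (or `*`) followed by pseudo-classes whose arguments are themselves selectors. Everything in it is an assumption made for illustration: the `SelectorSketch` module, its function names, and the plain maps standing in for Floki's `%Selector{}` and `%PseudoClass{}` structs. It is not how Floki's tokenizer and parser are actually built.

```elixir
defmodule SelectorSketch do
  # Hypothetical, simplified parser. It only understands `type`, `*` and
  # `:name(selector, ...)`. Malformed input raises a MatchError.

  # Parses a whole string and returns a map shaped like a selector.
  def parse(input) do
    {selector, ""} = parse_selector(input)
    selector
  end

  # selector := type? pseudo*
  defp parse_selector(input) do
    {type, rest} = take_ident(input)
    type = if type == "", do: "*", else: type
    {pseudos, rest} = parse_pseudos(rest, [])
    {%{type: type, pseudo_classes: pseudos}, rest}
  end

  # pseudo := ":" name "(" args ")"
  defp parse_pseudos(":" <> rest, acc) do
    {name, "(" <> rest} = take_ident(rest)
    # The arguments are full selectors again: this is the recursive step.
    {args, ")" <> rest} = parse_args(rest, [])
    parse_pseudos(rest, [%{name: name, value: args} | acc])
  end

  defp parse_pseudos(rest, acc), do: {Enum.reverse(acc), rest}

  # args := selector ("," selector)*
  defp parse_args(input, acc) do
    {selector, rest} = parse_selector(String.trim_leading(input))

    case String.trim_leading(rest) do
      "," <> more -> parse_args(more, [selector | acc])
      other -> {Enum.reverse([selector | acc]), other}
    end
  end

  # Splits a leading `*` or identifier off the front of the input.
  defp take_ident("*" <> rest), do: {"*", rest}

  defp take_ident(input) do
    ident = Regex.run(~r/^[\w-]*/, input) |> hd()
    String.split_at(input, String.length(ident))
  end
end

# The nested case from the commented-out test now keeps `div` as the outer argument:
SelectorSketch.parse("tr:has(div:has(label))")
```

Because `parse_args/2` calls back into `parse_selector/1`, nesting depth is not limited by the parser; whether a given nesting is semantically valid CSS would still have to be checked separately, as noted above.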

Regarding selectors like > h2, whether on their own or within :has, e.g. div:has(> h2), you could interpret them as the case SELECTOR? > SELECTOR, where the first selector is nil. Logically, the absence of a selector implies that no selection is being done, i.e. every element we try matches. Given that we start the evaluation of a CSS selector by trying all HTML nodes in the HTML tree, > h2 on its own is thus equivalent to * > h2 (i.e. match every h2 that is a direct child of some element).

However, within div:has(> h2), by the time we reach > h2 in our evaluation, we have already narrowed our search down to divs, so > h2 is only evaluated on the div elements in the tree. That makes the evaluation of div:has(> h2) essentially equivalent to div > h2, except that the former selects the div and the latter selects the h2. The same logic applies to other combinators: div:has(+ dt) is essentially the same in evaluation as div + dt, but the former selects the div and the latter selects the dt. That is how I interpret and logically make sense of the :has CSS spec.
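The difference between div:has(> h2) and div > h2 can be sketched on the `{tag, attributes, children}` tuples that the tests in this PR use. This is only an illustration of the interpretation above, under the assumption that matching on tag names is enough; the `RelativeHasSketch` module and its functions are hypothetical and do not call Floki.

```elixir
defmodule RelativeHasSketch do
  # All names here are hypothetical; nodes are {tag, attributes, children} tuples.

  # Every element in the tree in document order (text nodes are skipped).
  def elements({_tag, _attrs, children} = node) do
    [node | Enum.flat_map(children, &elements/1)]
  end

  def elements(_text), do: []

  # Direct element children of `node` that have the given tag.
  def children_with_tag({_tag, _attrs, children}, child_tag) do
    Enum.filter(children, fn child -> match?({^child_tag, _, _}, child) end)
  end

  # parent:has(> child): same traversal, but the parents are returned.
  def has_child(tree, parent_tag, child_tag) do
    tree
    |> elements()
    |> Enum.filter(fn {tag, _, _} = node ->
      tag == parent_tag and children_with_tag(node, child_tag) != []
    end)
  end

  # parent > child: same traversal, but the children are returned.
  def child_of(tree, parent_tag, child_tag) do
    tree
    |> elements()
    |> Enum.filter(fn {tag, _, _} -> tag == parent_tag end)
    |> Enum.flat_map(fn node -> children_with_tag(node, child_tag) end)
  end
end

tree =
  {"body", [],
   [
     {"div", [], [{"h2", [], ["Title"]}]},
     {"div", [], [{"p", [], ["No heading"]}]}
   ]}

# Like div:has(> h2): returns [{"div", [], [{"h2", [], ["Title"]}]}]
RelativeHasSketch.has_child(tree, "div", "h2")

# Like div > h2: returns [{"h2", [], ["Title"]}]
RelativeHasSketch.child_of(tree, "div", "h2")
```

Both functions walk the same div elements and apply the same child check; they differ only in which side of the combinator ends up in the result.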

Does this help? I realise it's been a pretty long response and it's pretty difficult to get on the same page in a complex topic like this without speaking a common grammar, so if you want to have a video call someday to meet each other and have a more interactive conversation on this subject, I'd be happy to 😊

Yeah, that was super helpful! I will not promise to implement changes for that soon, but I will keep all of that in mind.
Thank you very much for your research and suggestions.

Yeah I understand, it probably requires some significant changes. Let me know when you do, I'll probably still be interested in helping out! ;)

philss

Thank you! I think you nailed it!
I added some points to change, but overall it looks great!

bvobart · 2025-05-20T10:56:33Z

I noticed I made a mistake in my implementation. According to https://developer.mozilla.org/en-US/docs/Web/CSS/:has#logical_operations when we give :has multiple arguments, e.g. :has(s1, s2), it means that s1 OR s2 must match in order for the :has to match. Initially, I thought it was an AND, which also made :has(s1, s2) equivalent to :has(s1):has(s2), but that is not true.

I've fixed that now and the updated tests reflect that :)
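The OR versus AND distinction can be sketched without Floki as follows. This is an illustration only: `HasArgsSketch`, its functions, and the idea of representing each :has argument as a one-argument matcher function are assumptions, not the code merged in this PR.

```elixir
defmodule HasArgsSketch do
  # `matchers` stands in for the parsed arguments of :has(s1, s2, ...). Each one
  # is a hypothetical function that returns true when its selector matches
  # something inside the given node.

  # :has(s1, s2): at least one argument has to match (OR).
  def has_any?(node, matchers), do: Enum.any?(matchers, & &1.(node))

  # :has(s1):has(s2): every chained :has has to match (AND).
  def has_all?(node, matchers), do: Enum.all?(matchers, & &1.(node))

  # True when some descendant of the node has the wanted tag.
  def descendant_with_tag?({_tag, _attrs, children}, wanted) do
    Enum.any?(children, fn
      {^wanted, _, _} -> true
      child -> descendant_with_tag?(child, wanted)
    end)
  end

  def descendant_with_tag?(_text, _wanted), do: false
end

row = {"tr", [], [{"th", [], [{"label", [], ["TEST"]}]}]}

matchers = [
  &HasArgsSketch.descendant_with_tag?(&1, "label"),
  &HasArgsSketch.descendant_with_tag?(&1, "img")
]

# Like tr:has(label, img): true, because the label matches.
HasArgsSketch.has_any?(row, matchers)

# Like tr:has(label):has(img): false, because there is no img.
HasArgsSketch.has_all?(row, matchers)
```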

bvobart mentioned this pull request May 15, 2025

Support for :has pseudo selector #482

Closed

feat: implement :has pseudo-selector functionality

27ed99d

bvobart force-pushed the main branch from 8c53a23 to 27ed99d Compare May 15, 2025 23:37

docs: update ReadMe with info about :has and simple selectors

5448f28

philss reviewed May 17, 2025

Comment thread lib/floki/selector/pseudo_class.ex Outdated

Comment thread test/floki_test.exs

bvobart and others added 2 commits May 19, 2025 23:30

Apply PR review suggestion on lib/floki/selector/pseudo_class.ex

1d7f004

Co-authored-by: Philip Sampaio <philip.sampaio@gmail.com>

fix: add more tests for :has, ensure they all pass for all parsers and fix behaviour of :has when given multiple arguments

a9b5e3c

bvobart requested a review from philss May 20, 2025 10:56

philss approved these changes Jun 1, 2025

philss merged commit f900ea1 into philss:main Jun 1, 2025
6 checks passed

This was referenced Jun 6, 2025

Selecting parents #592

Closed

Extend fl-contains to mach on children text as well #382

Open

philss merged 4 commits into philss:main from bvobart:main
