feat: implement :has pseudo-selector functionality #624

Merged
philss merged 4 commits into philss:main from bvobart:main
Jun 1, 2025

@bvobart bvobart commented May 15, 2025

Fixes #482, #382, #592

TODO:

  • Add documentation for :has to README.
  • Review by @philss
  • Fix tests
  • Add a few more tests to include other combinators and sub-selectors.

Comment thread test/floki_test.exs
Comment on lines +1791 to +1828
## NOTE: this parses incorrectly, parses as:
##   %PseudoClass{name: "has", value: [%Selector{type: "label", pseudo_classes: [%PseudoClass{name: "has", value: []}]}]}
## but would expect to parse as:
##   %PseudoClass{name: "has", value: [%Selector{type: "div", pseudo_classes: [%PseudoClass{name: "has", value: [%Selector{type: "label"}]}]}]}
# assert_find(html, "tr:has(div:has(label))", [
#   {"tr", [],
#    [
#      {"th", [], [{"div", [], [{"label", [], ["NESTED"]}]}]},
#      {"td", [], [{"div", [], ["fetch me pls"]}]}
#    ]}
# ])

## NOTE: this does not parse, because "only simple selectors are allowed in :has() pseudo-class"
# assert_find(html, "th:has(> label)", [
#   {"th", [], [{"label", [], ["TEST"]}]}
# ])

## NOTE: this does not parse, because "only simple selectors are allowed in :has() pseudo-class"
# assert_find(html, "th:has(> div > label)", [
#   {"th", [], [{"div", [], [{"label", [], ["NESTED"]}]}]}
# ])

## NOTE: this parses incorrectly, parses as:
##   %PseudoClass{name: "not", value: [%Selector{type: "label", pseudo_classes: [%PseudoClass{name: "has", value: []}]}]}
## but would expect to parse as:
##   %PseudoClass{name: "not", value: [%Selector{type: "*", pseudo_classes: [%PseudoClass{name: "has", value: [%Selector{type: "label"}]}]}]}
# assert_find(html, "tr:not(:has(label))", [
#   {"tr", [], [{"th", [], ["No Label"]}, {"td", [], ["some data"]}]}
# ])
@bvobart (Contributor Author):

@philss During testing, I found some cases that are valid according to the CSS spec but are not handled correctly by Floki. I suspect this has something to do with only simple selectors being allowed in :not and :has.

Perhaps out of scope for this PR, but do you see a solution for this? What do you think it would take to support full selectors in :not and :has?

@philss (Owner):

Yeah, this is due to the fact that we don't parse the selector correctly. I think we would need to tweak the "selector" parser to be more recursive when it encounters those two pseudo-selectors. We would probably need a special case to handle the > combinator as well.

@bvobart (Contributor Author), May 19, 2025:

Yeah the selector parser would definitely need to be more recursive. I'm not familiar with the syntax of the .xrl files and accompanying parser implementations, but I did find compiler construction very interesting in uni, so here's some pseudo-grammar of how I think a CSS selector parser is supposed to be defined:

# Let's say an identifier is just a word
ID = ~r/\w+/
# a number is just, well, a number
NUMBER = ~r/\d+/
# and a value is anything within quotes
VALUE = ~r/".*"/

# Then the grammar for Floki's CSS selectors can be defined recursively as:
SELECTOR = (
  :root
  *
  ID
  .ID
  SELECTOR[ID=VALUE]
  SELECTOR[ID~=VALUE]
  SELECTOR[ID^=VALUE]
  SELECTOR:nth-child(NUMBER)
  SELECTOR#ID
  # etc
  SELECTOR? SELECTOR
  SELECTOR? > SELECTOR
  SELECTOR? + SELECTOR
  SELECTOR? ~ SELECTOR
  SELECTOR?:checked
  SELECTOR?:disabled
  SELECTOR?:fl-contains(VALUE)
  SELECTOR?:fl-icontains(VALUE)
  SELECTOR?:not(SELECTOR)
  SELECTOR?:has(SELECTOR)
)

Note that this grammar accepts a superset of what CSS allows semantically: for example, div:has(:has(img)) is not allowed in CSS, but div:not(:has(:not(img))) is. As you can see, though, CSS selectors are quite recursive in their grammar, so the parser needs to be as well. Semantic correctness should be checked separately, potentially even at evaluation time.
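As a rough illustration of checking semantics separately from parsing, here is a minimal sketch of a validity check over an already-parsed, recursive selector tree. The SketchSelector and SketchPseudo structs are hypothetical stand-ins that only borrow the field names from the %Selector{} and %PseudoClass{} output quoted in the test comments; they are not Floki's real modules. The sketch also assumes the rule is "no :has at any depth inside another :has" and that every list-valued pseudo-class argument is a list of selectors.

```elixir
defmodule SketchPseudo do
  # Hypothetical stand-in for a parsed pseudo-class such as :has(...) or :not(...).
  defstruct name: nil, value: []
end

defmodule SketchSelector do
  # Hypothetical stand-in for a parsed selector; only the fields needed here.
  defstruct type: nil, pseudo_classes: []
end

defmodule SketchCheck do
  # Returns true when no :has appears (at any depth) inside another :has.
  def valid?(selector), do: valid?(selector, false)

  defp valid?(%SketchSelector{pseudo_classes: pseudos}, inside_has?) do
    Enum.all?(pseudos, &valid_pseudo?(&1, inside_has?))
  end

  # A :has encountered while already inside a :has is rejected.
  defp valid_pseudo?(%SketchPseudo{name: "has"}, true), do: false

  # Pseudo-classes with selector arguments are checked recursively.
  defp valid_pseudo?(%SketchPseudo{name: name, value: value}, inside_has?)
       when is_list(value) do
    Enum.all?(value, &valid?(&1, inside_has? or name == "has"))
  end

  # Anything else (e.g. a numeric argument) has nothing to recurse into.
  defp valid_pseudo?(%SketchPseudo{}, _inside_has?), do: true
end

has = fn selectors -> %SketchPseudo{name: "has", value: selectors} end
not_ = fn selectors -> %SketchPseudo{name: "not", value: selectors} end
any = fn pseudos -> %SketchSelector{type: "*", pseudo_classes: pseudos} end
img = %SketchSelector{type: "img"}

# div:has(:has(img))
nested = %SketchSelector{type: "div", pseudo_classes: [has.([any.([has.([img])])])]}

# div:not(:has(:not(img)))
mixed =
  %SketchSelector{type: "div", pseudo_classes: [not_.([any.([has.([any.([not_.([img])])])])])]}

IO.inspect(SketchCheck.valid?(nested)) # false
IO.inspect(SketchCheck.valid?(mixed))  # true
```

Keeping the check as a separate pass over the tree means the parser itself can stay fully recursive and permissive.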

Regarding selectors like > h2, whether on their own or within :has, e.g. div:has(> h2): you could interpret them as the case SELECTOR? > SELECTOR, where the first selector is nil. Logically, the absence of a selector implies that no filtering is done, i.e. every element we try matches. Given that we start the evaluation of a CSS selector by trying all HTML nodes in the tree, > h2 on its own is thus equivalent to * > h2 (i.e. match every h2 that is a direct child of some element).

However, within div:has(> h2), by the time we reach > h2 in our evaluation we have already narrowed the search down to divs, so > h2 is only evaluated relative to the div elements in the tree. That makes the evaluation of div:has(> h2) essentially equivalent to div > h2, except that the former selects the div and the latter selects the h2. The same logic applies to other combinators: div:has(+ dt) is evaluated in essentially the same way as div + dt, but the former selects the div and the latter selects the dt. That is how I interpret and logically make sense of the :has CSS spec.
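To make the "same evaluation, different selected element" point concrete, here is a small sketch over the {tag, attrs, children} tuples used in the tests. It does not use Floki's parser or internals: the HasSketch module, its function names, and the {:child, tag} / {:descendant, tag} encoding of a relative selector are all hypothetical and only cover tag names.

```elixir
defmodule HasSketch do
  # All element nodes in the tree, in document order; text nodes are skipped.
  def elements(nodes) when is_list(nodes), do: Enum.flat_map(nodes, &elements/1)
  def elements({_tag, _attrs, children} = node), do: [node | elements(children)]
  def elements(_other), do: []

  # Does `node` satisfy :has(> tag)? Only direct children are considered.
  def has?({_tag, _attrs, children}, {:child, tag}) do
    Enum.any?(children, &match?({^tag, _, _}, &1))
  end

  # Does `node` satisfy :has(tag)? Any descendant counts.
  def has?({_tag, _attrs, children}, {:descendant, tag}) do
    Enum.any?(elements(children), &match?({^tag, _, _}, &1))
  end

  # tag:has(relative) selects the anchor elements themselves.
  def select_has(tree, tag, relative) do
    for {^tag, _, _} = node <- elements(tree), has?(node, relative), do: node
  end

  # tag > child_tag walks the same parent/child pairs but selects the children.
  def select_child(tree, tag, child_tag) do
    for {^tag, _, children} <- elements(tree),
        {^child_tag, _, _} = child <- children,
        do: child
  end
end

tree = [
  {"div", [], [{"h2", [], ["Title"]}]},
  {"div", [], [{"p", [], [{"h2", [], ["Deep"]}]}]}
]

# div:has(> h2) returns the first div only.
IO.inspect(HasSketch.select_has(tree, "div", {:child, "h2"}))

# div > h2 returns the h2 inside that same div.
IO.inspect(HasSketch.select_child(tree, "div", "h2"))

# div:has(h2) returns both divs, since any descendant counts.
IO.inspect(HasSketch.select_has(tree, "div", {:descendant, "h2"}))
```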

Does this help? I realise it's been a pretty long response and it's pretty difficult to get on the same page in a complex topic like this without speaking a common grammar, so if you want to have a video call someday to meet each other and have a more interactive conversation on this subject, I'd be happy to 😊

@philss (Owner):

Yeah, that was super helpful! I will not promise to implement changes for that soon, but I will keep all of that in mind.
Thank you very much for your research and suggestions.

@bvobart (Contributor Author):

Yeah I understand, it probably requires some significant changes. Let me know when you do, I'll probably still be interested in helping out! ;)

@philss (Owner) left a comment:

Thank you! I think you nailed it!
I added some points to change, but overall it looks great!

Comment thread lib/floki/selector/pseudo_class.ex Outdated
Comment thread test/floki_test.exs
bvobart and others added 2 commits May 19, 2025 23:30
Co-authored-by: Philip Sampaio <philip.sampaio@gmail.com>
…d fix behaviour of :has when given multiple arguments

bvobart commented May 20, 2025

I noticed I made a mistake in my implementation. According to https://developer.mozilla.org/en-US/docs/Web/CSS/:has#logical_operations, when :has is given multiple arguments, e.g. :has(s1, s2), it matches if s1 OR s2 matches. Initially I thought it was an AND, which would have made :has(s1, s2) equivalent to :has(s1):has(s2), but that is not true.

I've fixed that now and the updated tests reflect that :)
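The difference between the two readings can be sketched as follows. This is not the code from the PR: the HasArgsSketch module and its functions are hypothetical, and each :has argument is modelled simply as a predicate on a {tag, attrs, children} node.

```elixir
defmodule HasArgsSketch do
  # :has(s1, s2) matches when at least one argument matches (logical OR).
  def has_any?(node, preds), do: Enum.any?(preds, & &1.(node))

  # :has(s1):has(s2) matches only when every :has matches (logical AND).
  def has_all?(node, preds), do: Enum.all?(preds, & &1.(node))

  # Builds a predicate: "node has a direct child element with this tag".
  def has_child(tag) do
    fn {_tag, _attrs, children} -> Enum.any?(children, &match?({^tag, _, _}, &1)) end
  end
end

th = {"th", [], [{"label", [], ["TEST"]}]}
preds = [HasArgsSketch.has_child("label"), HasArgsSketch.has_child("div")]

# th:has(> label, > div): true, because the label argument matches.
IO.inspect(HasArgsSketch.has_any?(th, preds))

# th:has(> label):has(> div): false, because th has no div child.
IO.inspect(HasArgsSketch.has_all?(th, preds))
```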

@bvobart bvobart requested a review from philss May 20, 2025 10:56
@philss philss merged commit f900ea1 into philss:main Jun 1, 2025
6 checks passed

Development

Successfully merging this pull request may close these issues.

Support for :has pseudo selector
