
Code Challenge submission #327


Open: wants to merge 4 commits into master

@Gawyn commented Apr 26, 2025

This PR adds a GoogleSearchArtworksParser, which parses artworks from a Google Search results page. The class exposes a single public method, .parse, which takes one string argument: either a file path (files/expected-array.json) or a URL (https://google.com/search?q=Van+Gogh+paintings).
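A minimal sketch of how one string argument could be routed to either a file read or a page fetch. `ParseInput`, `kind` and `load_html` are hypothetical names, not the PR's code. The fetcher is injected as a callable, on the assumption that the real fetch is browser-based (such as Selenium) and should be swappable in tests.

```ruby
require "uri"

# Hypothetical helper, not taken from the PR: decides how the single
# string passed to .parse should be turned into HTML.
module ParseInput
  module_function

  # :url for absolute http(s) URLs with a host, :file for everything else.
  # Relative paths such as "files/page.html" parse as URI::Generic.
  def kind(input)
    uri = URI.parse(input)
    uri.is_a?(URI::HTTP) && uri.host ? :url : :file
  rescue URI::InvalidURIError
    # Strings that are not valid URIs at all are treated as file paths.
    :file
  end

  # The fetcher (e.g. a Selenium-backed lambda) is passed in so this
  # dispatch logic can be exercised without launching a browser.
  def load_html(input, fetcher:)
    kind(input) == :url ? fetcher.call(input) : File.read(input)
  end
end
```

Keeping the URL/file decision in one small function means .parse itself never needs to know where the HTML came from.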

The most significant design decision was how to deal with the lazy-loaded images. There were two options. The simpler one was to use a "real" browser, via Selenium, and read the HTML after JavaScript has finished modifying it. The other was to work out exactly what the page's scripts do and reproduce that logic in the parser. The second option is far more performant (browsers are slow), but it is also more complicated and more brittle, so I went with Selenium. If scalability were the paramount concern, I would choose the no-Selenium option instead.
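For contrast, the option the PR did not take could look roughly like this: instead of running JavaScript, read the inline scripts that fill in the lazy-loaded thumbnails and map each image element id to its data URI. The script shape matched by the regular expression (`var s='data:image/...';var ii=['id', ...];`) and the `\x3d` escaping are assumptions about Google's markup, not something the PR documents; `lazy_image_sources` is a hypothetical name.

```ruby
# Hypothetical sketch of the "no-Selenium" option, not the PR's code.
# ASSUMPTION: the inline scripts look like
#   var s='data:image/jpeg;base64,...';var ii=['dimg_1','dimg_2'];
# If Google changes this shape, the regex silently stops matching.
LAZY_IMAGE_SCRIPT = /var s='(data:image\/[^']+)';\s*var ii=\[([^\]]*)\]/

# Returns a Hash of image element id => data URI.
def lazy_image_sources(html)
  html.scan(LAZY_IMAGE_SCRIPT).each_with_object({}) do |(src, ids), map|
    # ASSUMPTION: "=" padding is escaped as \x3d inside the script string.
    data_uri = src.gsub('\\x3d', '=')
    ids.scan(/'([^']+)'/).flatten.each { |id| map[id] = data_uri }
  end
end
```

The speed advantage is clear (one regex pass instead of a browser session), but so is the brittleness: every assumption in the comments is a point where a markup change breaks parsing without raising an error.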

One final note: after fetching the HTML with Selenium, I parse it with Nokolexbor. Although it would be possible to traverse the DOM through Selenium itself, Nokolexbor is much faster, even in a small project like this one, so I kept it.
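A rough sketch of that split, assuming the selenium-webdriver and nokolexbor gems and headless Chrome are available. `RenderThenParse` and its keyword arguments are hypothetical, not the PR's code. The gems are required lazily inside the default lambdas so the wiring can be exercised with stand-ins. A real implementation may also need an explicit wait (for example `Selenium::WebDriver::Wait`) until the lazy-loaded images are populated, since `driver.get` returning does not guarantee that.

```ruby
# Hypothetical pipeline, not the PR's code: render once with a browser,
# then do all DOM traversal on a fast in-process parser.
class RenderThenParse
  # Default fetcher: headless Chrome via selenium-webdriver (lazy require).
  SELENIUM_FETCHER = lambda do |url|
    require "selenium-webdriver"
    options = Selenium::WebDriver::Chrome::Options.new
    options.add_argument("--headless=new")
    driver = Selenium::WebDriver.for(:chrome, options: options)
    begin
      driver.get(url)
      driver.page_source # HTML as modified by the page's scripts so far
    ensure
      driver.quit # always release the browser process
    end
  end

  # Default parser: Nokolexbor (lazy require).
  NOKOLEXBOR_PARSER = lambda do |html|
    require "nokolexbor"
    Nokolexbor::HTML(html)
  end

  def initialize(fetcher: SELENIUM_FETCHER, parser: NOKOLEXBOR_PARSER)
    @fetcher = fetcher
    @parser = parser
  end

  # Selenium is touched only to obtain the HTML string; everything after
  # that works on the parsed document.
  def document_for(url)
    @parser.call(@fetcher.call(url))
  end
end
```

With the defaults, `RenderThenParse.new.document_for("https://google.com/search?q=Van+Gogh+paintings")` would return a Nokolexbor document ready for `css` queries, so the browser session is kept as short as possible.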
