In this PR, a `GoogleSearchArtworksParser` is added that can parse artworks from a Google Search results page. The only public method of this class is `.parse`, which accepts one string as a parameter. This string can be either a file path (`files/expected-array.json`) or a URL (`https://google.com/search?q=Van+Gogh+paintings`).
The most significant design decision was how to work around the lazy-loaded images. There were two options. The simplest was to use a "real" browser, via Selenium, to get the HTML after the JS finishes modifying it. The other was to work out exactly what the scripts do and then reproduce that logic in the parsing process. The second option is far more performant (browsers are slow), but also more complicated and more brittle. Therefore, I decided to go with the Selenium option. In a situation where scalability is the paramount concern, I would go for the no-Selenium option.
One final thing. You will see that, after fetching the HTML with Selenium, I parse it with Nokolexbor. While it would be possible to traverse the HTML with Selenium itself, Nokolexbor is much faster, even in a small project like this one. Therefore, I decided to keep Nokolexbor.
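
For reviewers who want the flow at a glance, here is a rough sketch of the approach described above. It is not the code in this PR: the helper names (`url?`, `fetch_rendered_html`, `load_html`, `parse_document`) are hypothetical, headless Chrome is an assumed driver choice, and reading a local file directly (without a browser) is an assumption about how the file-path case could work.

```ruby
require "uri"

# Hypothetical helper: true when the source string looks like an http(s) URL.
def url?(source)
  uri = URI.parse(source)
  uri.is_a?(URI::HTTP) && !uri.host.nil?
rescue URI::InvalidURIError
  false
end

# Hypothetical helper: load the page in a real browser so that the
# lazy-loading scripts run, then return the resulting HTML.
# Assumes the selenium-webdriver gem and a Chrome driver are available.
def fetch_rendered_html(url)
  require "selenium-webdriver" # loaded lazily: only the URL case needs a browser
  options = Selenium::WebDriver::Chrome::Options.new
  options.add_argument("--headless=new")
  driver = Selenium::WebDriver.for(:chrome, options: options)
  begin
    driver.get(url)
    driver.page_source
  ensure
    driver.quit
  end
end

# Hypothetical helper: accept either a URL or a file path.
# Assumption: a local file already contains the HTML to parse.
def load_html(source)
  url?(source) ? fetch_rendered_html(source) : File.read(source)
end

# Hand the HTML string to Nokolexbor for traversal instead of
# walking the DOM through Selenium.
def parse_document(html)
  require "nokolexbor"
  Nokolexbor::HTML(html)
end
```

With something like this, `parse_document(load_html(source))` would return a document that can be queried with CSS selectors (for example `doc.css("img")`), keeping the slow browser step confined to fetching.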