[HEP Training ingestors] added a custom event ingestor (Gray Scott events) #1236

Open
kennethrioja wants to merge 3 commits into ElixirTeSS:master from kennethrioja:gray-scott-ingestor

Conversation

@kennethrioja
Contributor

Summary of changes

Motivation and context

Requested by David Chamont (IN2P3).

Screenshots

N/A

Checklist

  • I have read and followed the CONTRIBUTING guide.
  • I confirm that I have the authority necessary to make this contribution on behalf of its copyright owner and agree to license it to the TeSS codebase under the BSD license.

Contributor

Copilot AI left a comment

Pull request overview

Adds a new HEPTraining ingestor to import Gray Scott School 2026 webinar events from an ICS feed, including fixtures/tests, and factors out a shared HTML-fetch helper for ingestors.

Changes:

  • Introduce Ingestors::Heptraining::GrayScottIngestor to ingest events from the Gray Scott ICS and scrape additional details from linked pages.
  • Register the new ingestor in IngestorFactory.
  • Add unit test + fixtures for the ICS and associated HTML pages; refactor GitHub ingestor to use a shared HTML fetcher.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| lib/ingestors/heptraining/gray_scott_ingestor.rb | New ingestor implementation for Gray Scott events (ICS parsing + HTML scraping). |
| lib/ingestors/ingestor_factory.rb | Registers the new Heptraining ingestor in the factory list. |
| lib/ingestors/ingestor.rb | Adds shared get_html_from_url helper for HTTParty+Nokogiri HTML retrieval. |
| lib/ingestors/github_ingestor.rb | Switches from a local HTML helper to the new base-class helper. |
| test/unit/ingestors/heptraining/gray_scott_ingestor_test.rb | Adds unit test coverage for the new ingestor behavior. |
| test/fixtures/files/ingestion/heptraining/grayscott/grayscott-event.ics | ICS fixture used by the new ingestor test. |
| test/fixtures/files/ingestion/heptraining/grayscott/grayscott-redirect.html | Redirect-page fixture used by the new ingestor test. |
| test/fixtures/files/ingestion/heptraining/grayscott/grayscott-page.html | Event detail page fixture used by the new ingestor test. |

Comment on lines +56 to +58
def get_redirected_url(url)
  uri = URI.parse(url)
  label = CGI.parse(uri.query)['label']&.first

Copilot AI Feb 20, 2026

This ingestor defines get_redirected_url(url) which overrides Ingestor#get_redirected_url(url, limit = 5) with a different signature/behavior. This can be confusing and makes it easy to accidentally bypass the shared redirect logic; consider renaming this method to something Gray-Scott-specific (or accept *args and call super where appropriate).
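
To illustrate the concern, here is a minimal sketch of what happens when a subclass redefines a method with a narrower signature. The class names (`BaseIngestor`, `ShadowingIngestor`, `RenamedIngestor`) and the method name `resolve_label_url` are hypothetical stand-ins, not the real TeSS classes, and the HTTP logic is replaced by canned strings.

```ruby
# Hypothetical stand-in for the shared base class. The real
# Ingestor#get_redirected_url performs HTTP requests; this sketch
# returns a canned string instead.
class BaseIngestor
  def get_redirected_url(url, limit = 5)
    "followed #{url} (limit #{limit})"
  end
end

# Redefining the method with a one-argument signature hides the shared
# logic and breaks any caller that passes a redirect limit.
class ShadowingIngestor < BaseIngestor
  def get_redirected_url(url)
    "custom lookup for #{url}"
  end
end

# Giving the site-specific lookup its own (hypothetical) name keeps
# both behaviours reachable.
class RenamedIngestor < BaseIngestor
  def resolve_label_url(url)
    "custom lookup for #{url}"
  end
end

shadowed_error =
  begin
    ShadowingIngestor.new.get_redirected_url('https://example.org', 3)
  rescue ArgumentError => e
    e.message
  end

renamed = RenamedIngestor.new
```

With the renamed variant, `renamed.get_redirected_url(url, 3)` still reaches the inherited redirect logic, while the shadowing variant raises `ArgumentError` for the same call.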

Comment on lines +60 to +68
  script_content = get_html_from_url(url).css('script').find { |s| s.content.include?('var dictReference') }&.content
  dict_match = script_content&.match(/var\s+dictReference\s*=\s*({[^}]+})/)
  return unless dict_match

  dict = JSON.parse(dict_match[1])
  matched_value = dict[label]

  "#{uri.scheme}://#{uri.host}#{uri.path.sub(%r{/[^/]+$}, '')}/#{matched_value}"
end

Copilot AI Feb 20, 2026

get_redirected_url can return nil (no matching script/dictionary) or build an invalid URL when matched_value is nil, but the caller immediately passes the result into get_html_from_url(...). Add a guard/fallback (e.g., return the original URL, or raise a descriptive error) before attempting to fetch/parse HTML.
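
One way such a guard could look is sketched below, assuming that falling back to the original URL is acceptable. The helper name `resolve_label_url` is hypothetical, it takes the already-extracted script text as an argument instead of fetching it, and it uses `URI.decode_www_form` in place of the `CGI.parse` call in the PR.

```ruby
require 'json'
require 'uri'

# Hypothetical helper: resolves the "label" query parameter against the
# dictReference object found in the page's script, and returns the
# original URL whenever any step of the lookup fails.
def resolve_label_url(url, script_content)
  uri = URI.parse(url)
  label = URI.decode_www_form(uri.query.to_s).to_h['label']
  dict_match = script_content.to_s.match(/var\s+dictReference\s*=\s*(\{[^}]+\})/)
  return url unless label && dict_match

  matched_value = JSON.parse(dict_match[1])[label]
  return url if matched_value.to_s.empty?

  # Replace the last path segment with the page named in the dictionary.
  "#{uri.scheme}://#{uri.host}#{uri.path.sub(%r{/[^/]+$}, '')}/#{matched_value}"
rescue JSON::ParserError, URI::InvalidURIError
  url
end

script = 'var dictReference = {"abc": "page.html"};'
resolved = resolve_label_url('https://example.org/school/redirect.html?label=abc', script)
unmatched = resolve_label_url('https://example.org/school/redirect.html?label=zzz', script)
no_script = resolve_label_url('https://example.org/school/redirect.html?label=abc', nil)
```

Whether to fall back silently or raise a descriptive error is a design choice for the PR author; the sketch only shows that the caller never receives `nil` or a URL ending in an empty segment.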

end

def get_html_from_url(url)
  response = HTTParty.get(url, follow_redirects: true, headers: { 'User-Agent' => config[:user_agent] })

Copilot AI Feb 20, 2026

get_html_from_url sets the User-Agent header to config[:user_agent] without a default; ingestors like GrayScottIngestor don't set :user_agent, which can result in a nil/empty User-Agent being sent and cause requests to be blocked. Consider defaulting to something like 'TeSS Bot' (consistent with get_redirected_url).

Suggested change
- response = HTTParty.get(url, follow_redirects: true, headers: { 'User-Agent' => config[:user_agent] })
+ response = HTTParty.get(url, follow_redirects: true, headers: { 'User-Agent' => config[:user_agent] || 'TeSS Bot' })
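
Note that `||` only covers a `nil` value; an empty or whitespace-only string is truthy in Ruby and would still be sent as the header. A slightly stricter default could be sketched as follows, where `user_agent_for` and `DEFAULT_USER_AGENT` are hypothetical names and `config` is assumed to be a plain Hash.

```ruby
# Hypothetical default, matching the 'TeSS Bot' value suggested above.
DEFAULT_USER_AGENT = 'TeSS Bot'

# Returns the configured User-Agent, or the default when the configured
# value is nil, empty, or whitespace-only.
def user_agent_for(config)
  value = config[:user_agent].to_s.strip
  value.empty? ? DEFAULT_USER_AGENT : value
end

from_missing = user_agent_for({})
from_blank   = user_agent_for({ user_agent: '   ' })
from_custom  = user_agent_for({ user_agent: 'Custom Agent' })
```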

Comment on lines +34 to +36
# puts "calevent: #{calevent.inspect}"
gs_url = calevent.custom_properties.find { |key, _| key.include?('http') }&.last&.first&.strip&.gsub(%r{^[/\s]+|[/\s]+$}, '')&.prepend('https://')
html = get_html_from_url(get_redirected_url(gs_url))

Copilot AI Feb 20, 2026

gs_url is derived from calevent.custom_properties, but the provided .ics fixture embeds the redirect URL inside the DESCRIPTION (folded line), not as a custom property. This will likely produce nil for gs_url and then raise when calling URI.parse/get_redirected_url. Extract the first http(s) URL from calevent.description (or calevent.url when present) and validate it before proceeding.
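
The extraction and validation step could be sketched roughly as below. The helper name `first_http_url` is hypothetical, and the sketch assumes the description text has already been unfolded by the iCal parser; it takes a plain string, not a calendar event.

```ruby
require 'uri'

# Hypothetical helper: returns the first http(s) URL found in the text,
# or nil when there is none or the candidate does not parse as HTTP(S).
def first_http_url(text)
  candidate = text.to_s[%r{https?://[^\s<>"']+}]
  return nil unless candidate

  uri = URI.parse(candidate)
  uri.is_a?(URI::HTTP) && uri.host ? uri.to_s : nil
rescue URI::InvalidURIError
  nil
end

found   = first_http_url('Details: https://example.org/redirect.html?label=abc (webinar)')
missing = first_http_url('No link in this description')
```

The regular expression is deliberately simple and would keep trailing punctuation attached to a URL, so the real feed format should be checked before relying on it.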

event = OpenStruct.new
event.title = calevent.summary.to_s
event.url = gs_url
event.description = html.css('.paragraphStyle').text.strip || calevent.description.to_s

Copilot AI Feb 20, 2026

html.css('.paragraphStyle').text.strip || calevent.description.to_s will never fall back to the iCal description when the HTML selector is missing, because text.strip returns "" (truthy) rather than nil. Use a blank/presence check so the iCal description is used when the scraped HTML is empty.

Suggested change
- event.description = html.css('.paragraphStyle').text.strip || calevent.description.to_s
+ html_description = html.css('.paragraphStyle').text.to_s.strip
+ event.description = html_description.empty? ? calevent.description.to_s : html_description
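
The truthiness issue can be reproduced without Nokogiri. In the sketch below, plain strings stand in for the scraped text and the iCal description, and `pick_description` is a hypothetical helper name.

```ruby
# An empty string is truthy in Ruby, so `||` never reaches the fallback.
scraped = ''
naive = scraped.strip || 'iCal description'

# Hypothetical helper: prefer the scraped text, but fall back to the
# iCal description when the scraped text is empty or whitespace-only.
def pick_description(scraped_text, ical_text)
  cleaned = scraped_text.to_s.strip
  cleaned.empty? ? ical_text.to_s : cleaned
end

fallback  = pick_description("  \n", 'iCal description')
preferred = pick_description(' Scraped text ', 'iCal description')
```

Here `naive` ends up as the empty string, while `pick_description` returns the iCal text for blank input and the stripped scraped text otherwise.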
