[HEP Training ingestors] added a custom event ingestor (Gray Scott events) #1236
base: master
New file, hunk `@@ -0,0 +1,75 @@` (partial):

```ruby
require 'icalendar'
require 'nokogiri'
require 'open-uri'
require 'tzinfo'

module Ingestors
  module Heptraining
    class GrayScottIngestor < Ingestor
      def self.config
        {
          key: 'gray_scott_event',
          title: 'Gray Scott Events API',
          category: :events
        }
      end

      def read(url)
        @verbose = false
        process_gray_scott(url)
      end

      private

      def process_gray_scott(url)
        events = Icalendar::Event.parse(open_url(url, raise: true).set_encoding('utf-8'))
        raise 'Not found' if events.nil? || events.empty?

        events.each do |e|
          process_calevent(e, url)
        end
      end

      def process_calevent(calevent, url)
        # puts "calevent: #{calevent.inspect}"
        gs_url = calevent.custom_properties.find { |key, _| key.include?('http') }&.last&.first&.strip&.gsub(%r{^[/\s]+|[/\s]+$}, '')&.prepend('https://')
        html = get_html_from_url(get_redirected_url(gs_url))

        event = OpenStruct.new
        event.title = calevent.summary.to_s
        event.url = gs_url
        event.description = html.css('.paragraphStyle').text.strip || calevent.description.to_s
```
Suggested change:

```diff
-        event.description = html.css('.paragraphStyle').text.strip || calevent.description.to_s
+        html_description = html.css('.paragraphStyle').text.to_s.strip
+        event.description = html_description.empty? ? calevent.description.to_s : html_description
```
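The suggestion matters because in Ruby only `nil` and `false` are falsy. When the page has no `.paragraphStyle` element, `html.css('.paragraphStyle').text.strip` yields `''`, and `'' || calevent.description.to_s` still evaluates to `''`, so the ICS description is never used. A minimal sketch of the suggested logic, pulled out into a hypothetical `pick_description` helper (not part of the PR) so it runs without Nokogiri:

```ruby
# In Ruby, '' is truthy, so `'' || fallback` returns '' rather than the fallback.
# Hypothetical helper mirroring the suggested change: prefer the HTML text,
# fall back to the ICS DESCRIPTION when the HTML yields nothing.
def pick_description(html_text, ics_description)
  html_description = html_text.to_s.strip
  html_description.empty? ? ics_description.to_s : html_description
end

puts pick_description("  \n ", 'From the ICS DESCRIPTION')
puts pick_description("\n  Sometimes memory...\n", 'ignored')
```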
Copilot (AI), Feb 20, 2026
This ingestor defines `get_redirected_url(url)`, which overrides `Ingestor#get_redirected_url(url, limit = 5)` with a different signature and behavior. This can be confusing and makes it easy to bypass the shared redirect logic accidentally; consider renaming this method to something Gray-Scott-specific (or accept `*args` and call `super` where appropriate).
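A minimal sketch of the second option, using stand-in classes: `BaseIngestorStub`, `GrayScottIngestorSketch` and the `KNOWN_TARGETS` table are hypothetical, not TeSS code. The override keeps a signature compatible with `get_redirected_url(url, limit = 5)` and falls back to `super`, passing any extra arguments through, when the Gray-Scott-specific lookup finds nothing:

```ruby
# Stand-in for the shared Ingestor; its real get_redirected_url(url, limit = 5)
# body is not shown in this PR, so this stub only echoes its arguments.
class BaseIngestorStub
  def get_redirected_url(url, limit = 5)
    "#{url} (shared logic, limit=#{limit})"
  end
end

# Hypothetical subclass: compatible signature, defers to super when the
# Gray-Scott-specific lookup has no answer.
class GrayScottIngestorSketch < BaseIngestorStub
  # Illustrative label table; the PR appears to derive targets from redirect.html instead.
  KNOWN_TARGETS = { 'redirect.html?label=demo' => 'https://example.org/page.html' }.freeze

  def get_redirected_url(url, *args)
    # Bare `super` forwards url and *args unchanged to the base implementation.
    KNOWN_TARGETS.find { |suffix, _| url.end_with?(suffix) }&.last || super
  end
end

ingestor = GrayScottIngestorSketch.new
puts ingestor.get_redirected_url('https://example.org/redirect.html?label=demo')
puts ingestor.get_redirected_url('https://example.org/other.html', 2)
```

Renaming the method (the first option) avoids the question entirely; this variant is only preferable if callers must keep using the `get_redirected_url` name.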
Copilot (AI), Feb 20, 2026
`get_redirected_url` can return `nil` (no matching script/dictionary) or build an invalid URL when `matched_value` is `nil`, but the caller immediately passes the result into `get_html_from_url(...)`. Add a guard or fallback (e.g., return the original URL, or raise a descriptive error) before attempting to fetch and parse the HTML.
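A minimal sketch of such a guard, assuming the resolver returns either `nil` or a string; `usable_target_or` is a hypothetical name. It also treats a URL whose path ends in `/` as the directory-only result that concatenating a `nil` `matched_value` onto a base URL would produce, which is an assumption about how the PR assembles the URL:

```ruby
require 'uri'

# Hypothetical guard (not part of the PR): use the resolved URL only when it is an
# absolute http(s) URL pointing at a page; otherwise keep the original redirect URL.
def usable_target_or(original_url, candidate)
  return original_url if candidate.nil? || candidate.strip.empty?

  uri = URI.parse(candidate)
  # URI::HTTPS is a subclass of URI::HTTP, so this accepts both schemes.
  return original_url unless uri.is_a?(URI::HTTP) && uri.host
  # Assumption: a path ending in '/' is what a nil matched_value would leave behind.
  return original_url if uri.path.empty? || uri.path.end_with?('/')

  candidate
rescue URI::InvalidURIError
  original_url
end

puts usable_target_or('https://host/redirect.html?label=x', nil)
puts usable_target_or('https://host/redirect.html?label=x', 'https://host/a/b/1-1-5-1-449.html')
```

Raising a descriptive error instead of returning `original_url` is the other option the comment mentions; falling back keeps ingestion of the remaining events going.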
Hunk `@@ -84,6 +84,11 @@ def open_url(url, raise: false, token: nil)` in an existing file (partial):

```ruby
end
end

def get_html_from_url(url)
response = HTTParty.get(url, follow_redirects: true, headers: { 'User-Agent' => config[:user_agent] })
```
Suggested change:

```diff
-response = HTTParty.get(url, follow_redirects: true, headers: { 'User-Agent' => config[:user_agent] })
+response = HTTParty.get(url, follow_redirects: true, headers: { 'User-Agent' => config[:user_agent] || 'TeSS Bot' })
```
New file, hunk `@@ -0,0 +1,23 @@`:

```text
BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//PhoenixTex2Html//gray_scott_2026_webinars/
BEGIN:VEVENT
CLASS:PUBLIC
DTSTAMP:20260212T103600
UID:TH8WMR_PNR0012_20260212T103600
DTSTART;TZID=Europe/Paris:20260226T100000
DTEND;TZID=Europe/Paris:20260226T113000
SUMMARY:Memory allocation, why and how to profile applications

LOCATION:Registration : <a id="0" href="https://teratec.webex.com/blabla">https://teratec.webex.com/blabla</a>

DESCRIPTION:Memory allocation, why and how to profile applications
\n
https://cta-lapp.pages.in2p3.fr/cours/gray_scott_revolutions/grayscott2026/redirect.html?label=sec_gray_scott_webinar_memory_allocation_memory_profiling\n
BEGIN:VALARM
TRIGGER:-PT10M
ACTION:DISPLAY
DESCRIPTION:Reminder
END:VALARM
END:VEVENT
END:VCALENDAR
```
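Whether the `https://` line above continues the DESCRIPTION depends on iCalendar line folding (RFC 5545, section 3.1): a line break followed by a single space or tab is a fold, and anything else starts a new content line. This diff rendering drops leading whitespace, so it cannot show which case the fixture is in. If the line is not folded, a parser would read it as a property named `https` with the value `//cta-lapp.pages.in2p3.fr/...`, which would be consistent with the `key.include?('http')` lookup and the `https://` re-prefixing in `process_calevent`. A minimal sketch of the unfolding rule; the helper names are hypothetical, and parameter values containing quoted `:` or `;` are not handled:

```ruby
# RFC 5545, section 3.1: CRLF followed by one space or tab is a fold.
# Bare LF is accepted too, since fixtures are often saved with Unix line endings.
def unfold_ics(text)
  text.gsub(/\r?\n[ \t]/, '')
end

# Property name = everything before the first ';' (parameters) or ':' (value).
def ics_property_names(text)
  unfold_ics(text).split(/\r?\n/).reject { |line| line.strip.empty? }.map { |line| line[/\A[^;:]+/] }
end

folded   = "DESCRIPTION:Intro\n https://example.org/page\nEND:VEVENT"
unfolded = "DESCRIPTION:Intro\nhttps://example.org/page\nEND:VEVENT"

p ics_property_names(folded)   # ["DESCRIPTION", "END"]
p ics_property_names(unfolded) # ["DESCRIPTION", "https", "END"]
```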
New file, hunk `@@ -0,0 +1,46 @@`:

```html

<!DOCTYPE html>
<html class="js sidebar-visible navy" lang="fr">
<head>
<meta charset="UTF-8">
<title>Memory allocation, why and how to profile applications
</title>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
<meta name="description" content="Memory allocation, why and how to profile applications
">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="theme-color" content="rgba(0, 0, 0, 0)">
<link rel="stylesheet" href="variables.css">
<link rel="stylesheet" href="dark_style.css" />
<link rel="stylesheet" href="general.css">
<link rel="stylesheet" href="chrome.css">
<link rel="stylesheet" href="highlight.css" disabled="">
<link rel="stylesheet" href="tomorrow-night.css">
<link rel="stylesheet" href="ayu-highlight.css" disabled="">
<!-- Fonts -->
<link rel="stylesheet" href="font-awesome.css">
<link rel="stylesheet" href="fonts.css">
<!-- <script src="" async></script> -->
<!-- <script src=""></script> -->
</head>
<body>

<a id="450" href="invitation/gray_scott_webinar_memory_allocation_memory_profiling.ics"><div class="rendezvousStyle"></div></a><b>Date</b> : 26/02/2026<br />
<b>Location</b> : Registration : <a id="458" href="https://teratec.webex.com/webappng/sites/teratec/webinar/webinarSeries/register/0465b64b919540de9910a5b84077b878">https://teratec.webex.com/webappng/sites/teratec/webinar/webinarSeries/register/0465b64b919540de9910a5b84077b878</a>
<br />
<b>Start at</b> : 10:00<br />
<b>Stop at</b> : 11:30 <h3 id="466">Speakers</h3>
<ul>
<li><a href="2-3-5-4513.html">Someone
</a></li>
<li><a href="2-3-5-4513.html">SomeoneElse
</a></li>
</ul>
<h3 id="471">Description</h3>
<p id="472" class="paragraphStyle">
Sometimes memory has become a major problem in applications, with its bandwidth but also by the incresing size needed by more and more complex and dynamic applications. So, how to track these errors and point problematic patterns ? How to find where the memory is consumed when the application reaches the hardware limit ? After my PhD on memory management in HPC context (NUMA, parallel, etc) I had the opportunity to develop two profilers (malloc and numa) now open-sources for C/C++/Fortran and Rust. I will briefly present these tools with some examples and expected observations.
</p>

</body>
</html>
```
New file, hunk `@@ -0,0 +1,28 @@`:

```html

<!DOCTYPE html>
<html lang="fr">
<head>
<meta charset="utf-8" />
<title>Page redirection</title>
<link rel="stylesheet" href="dark_style.css" />
<script type="text/javascript">
function redirectionWithLabelReference(){
var parameters = location.search.substring(1).split("?");
var tmp = parameters[0].split("=");
referenceName = unescape(tmp[1]);
var dictReference = {
"sec_gray_scott_webinar_memory_allocation_memory_profiling": "1-1-5-1-449.html"
};
if(referenceName in dictReference){
document.location.href=dictReference[referenceName];
}else{
document.location.href="index.html";
}
}
</script>
</head>
<body onLoad="setTimeout('redirectionWithLabelReference()', 1000)">
<div>Dans 2 secondes vous allez être redirigé vers la page que vous avez demandée... normalement</div>
</body>
</html>
```
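The redirect page maps the `label` query value to a page through the `dictReference` object and otherwise sends the browser to `index.html` (the French notice reads "In 2 seconds you will be redirected to the page you requested... normally", although the timeout is 1000 ms). A minimal sketch of doing the same lookup without running JavaScript; `gray_scott_redirect_target` is a hypothetical name, the regexes assume the flat `"key": "value"` layout of this fixture, and, unlike the page, it returns `nil` for an unknown label so the caller can fall back to the original URL:

```ruby
require 'uri'

# Hypothetical helper: resolves redirect.html?label=... the way the page's
# redirectionWithLabelReference() does, but returns nil instead of index.html.
def gray_scott_redirect_target(redirect_url, html)
  # Like the page's script, take the text after the first '=' of the query string.
  raw_label = URI.parse(redirect_url).query.to_s.split('=')[1]
  return nil if raw_label.nil?

  # Note: decode_www_form_component also maps '+' to ' ', unlike JS unescape().
  label = URI.decode_www_form_component(raw_label)
  body = html[/dictReference\s*=\s*\{(.*?)\}/m, 1]
  return nil if body.nil?

  mapping = body.scan(/"([^"]+)"\s*:\s*"([^"]+)"/).to_h
  target = mapping[label]
  # Relative targets resolve against the redirect page's directory.
  target && URI.join(redirect_url, target).to_s
end
```

With the fixture above and the redirect URL used in the test, this resolves to `https://cta-lapp.pages.in2p3.fr/cours/gray_scott_revolutions/grayscott2026/1-1-5-1-449.html`, the page the test stubs.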
New file, hunk `@@ -0,0 +1,40 @@`:

```ruby
require 'test_helper'

class GrayScottIngestorTest < ActiveSupport::TestCase
  setup do
    @ingestor = Ingestors::Heptraining::GrayScottIngestor.new
    @user = users(:regular_user)
    @content_provider = content_providers(:another_portal_provider)

    webmock('https://cta-lapp.pages.in2p3.fr/COURS/GRAY_SCOTT_REVOLUTIONS/GrayScott2026/invitation/gray_scott_2026_webinars.ics', 'heptraining/grayscott/grayscott-event.ics')
    webmock('https://cta-lapp.pages.in2p3.fr/cours/gray_scott_revolutions/grayscott2026/redirect.html?label=sec_gray_scott_webinar_memory_allocation_memory_profiling', 'heptraining/grayscott/grayscott-redirect.html')
    webmock('https://cta-lapp.pages.in2p3.fr/cours/gray_scott_revolutions/grayscott2026/1-1-5-1-449.html', 'heptraining/grayscott/grayscott-page.html')
  end

  teardown do
    reset_timezone
  end

  test 'should read Gray Scott ics' do
    @ingestor.read('https://cta-lapp.pages.in2p3.fr/COURS/GRAY_SCOTT_REVOLUTIONS/GrayScott2026/invitation/gray_scott_2026_webinars.ics')
    @ingestor.write(@user, @content_provider)

    sample = @ingestor.events.detect { |e| e.title == 'Memory allocation, why and how to profile applications' }
    assert sample.persisted?

    assert_equal sample.url, 'https://cta-lapp.pages.in2p3.fr/cours/gray_scott_revolutions/grayscott2026/redirect.html?label=sec_gray_scott_webinar_memory_allocation_memory_profiling'
    assert_includes sample.description, 'Sometimes memory has become a major problem in applications'
    assert_equal sample.end, '2026-02-26 10:30:00 +0000'
    assert_equal sample.start, '2026-02-26 09:00:00 +0000'
    assert_equal sample.timezone, 'Paris'
    assert_includes sample.venue, 'teratec.webex.com'
    assert_equal sample.organizer, 'Someone, SomeoneElse'
  end

  private

  def webmock(url, filename)
    file = Rails.root.join('test', 'fixtures', 'files', 'ingestion', filename)
    WebMock.stub_request(:get, url).to_return(status: 200, headers: {}, body: file.read)
  end
end
```
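The expected `start` and `end` values follow from the fixture's `DTSTART;TZID=Europe/Paris:20260226T100000` and `DTEND;TZID=Europe/Paris:20260226T113000`: on 26 February 2026 Paris is on CET (UTC+01:00), since European summer time only starts on the last Sunday of March, so 10:00 and 11:30 local time become 09:00 and 10:30 UTC. A quick check with core `Time`, using a fixed offset purely as an illustration; resolving the `TZID` properly needs a tz database such as the `tzinfo` gem the ingestor already requires:

```ruby
# Fixed +01:00 offset: valid for Europe/Paris on this winter date only.
start_local = Time.new(2026, 2, 26, 10, 0, 0, '+01:00')
end_local   = Time.new(2026, 2, 26, 11, 30, 0, '+01:00')

fmt = '%Y-%m-%d %H:%M:%S %z'
puts start_local.getutc.strftime(fmt) # 2026-02-26 09:00:00 +0000
puts end_local.getutc.strftime(fmt)   # 2026-02-26 10:30:00 +0000
puts((end_local - start_local) / 60)  # 90.0 (minutes)
```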
`gs_url` is derived from `calevent.custom_properties`, but the provided .ics fixture embeds the redirect URL inside the DESCRIPTION (folded line), not as a custom property. This will likely produce `nil` for `gs_url` and then raise when calling `URI.parse`/`get_redirected_url`. Extract the first http(s) URL from `calevent.description` (or `calevent.url` when present) and validate it before proceeding.
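A minimal sketch of the extraction step suggested above; `first_http_url` and `URL_PATTERN` are hypothetical names, and the pattern is a deliberately simple heuristic rather than a full RFC 3986 matcher. It stops at whitespace, quotes, angle brackets and backslashes, so a literal `\n` escape left at the end of the DESCRIPTION text is not swallowed into the URL; the result should still go through a validity check before any fetch.

```ruby
# Simple heuristic, not a complete URL grammar.
URL_PATTERN = %r{https?://[^\s<>"\\]+}

def first_http_url(text)
  text.to_s[URL_PATTERN]
end

# DESCRIPTION text shaped like the fixture, including literal "\n" escapes.
description = "Memory allocation, why and how to profile applications\n\\n\n" \
              "https://cta-lapp.pages.in2p3.fr/cours/gray_scott_revolutions/grayscott2026/" \
              "redirect.html?label=sec_gray_scott_webinar_memory_allocation_memory_profiling\\n"
puts first_http_url(description)
```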