Conversation
There was a problem hiding this comment.
Pull request overview
Adds RSS/Atom feed ingestion to TeSS by introducing dedicated ingestors for events and materials, including optional HTML feed discovery and support for several common metadata extensions (Dublin Core, RDF/Bioschemas, iTunes, Yahoo Media).
Changes:
- Introduce shared RSS/Atom ingestion helpers (
RSSIngestion) plus reusable Dublin Core parsing/building (DublinCoreIngestion). - Add new ingestors for event and material RSS/Atom feeds, including RDF/Bioschemas merge behavior and HTML alternate-feed discovery.
- Add RSS Media namespace support for Atom parsing and comprehensive unit tests for RSS/Atom ingestion and extensions.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| test/unit/rss_media_atom_test.rb | Tests Media namespace installation idempotency for Atom. |
| test/unit/ingestors/material_rss_ingestor_test.rb | Material RSS/Atom ingestion tests (DC, RSS versions, RDF/Bioschemas, HTML discovery, media/iTunes extensions). |
| test/unit/ingestors/event_rss_ingestor_test.rb | Event RSS/Atom ingestion tests (DC, relative links, RDF/Bioschemas, HTML discovery). |
| lib/rss/media.rb | Defines Yahoo Media RSS extension wiring + loads Atom-specific patch. |
| lib/rss/media/atom.rb | Patches Atom classes to support media:group parsing and makes namespace installation idempotent. |
| lib/ingestors/rss_ingestion.rb | Shared feed fetching/parsing + HTML discovery + extraction/merge helpers. |
| lib/ingestors/dublin_core_ingestion.rb | Centralized DC-to-OpenStruct builders and normalization helpers. |
| lib/ingestors/material_rss_ingestor.rb | New material RSS/Atom ingestor (RSS/RDF/Atom + Bioschemas LearningResource extraction). |
| lib/ingestors/event_rss_ingestor.rb | New event RSS/Atom ingestor (RSS/RDF/Atom + Bioschemas Event/Course extraction). |
| lib/ingestors/oai_pmh_ingestor.rb | Refactors OAI-PMH DC parsing to reuse DublinCoreIngestion. |
| lib/ingestors/ingestor_factory.rb | Registers the new RSS ingestors. |
| config/initializers/inflections.rb | Adds RSS acronym for correct Zeitwerk/inflector naming. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
Two additional notes:
|
fbacall
left a comment
There was a problem hiding this comment.
Looks good - very flexible and nice tests. I think some parts can be simplified, and it might be good to split the YouTube functionality into a simple subclass for the sake of clarity.
| rescue StandardError | ||
| nil |
There was a problem hiding this comment.
Can this rescue be more specific?
| @@ -0,0 +1,111 @@ | |||
| require 'rss' | |||
| require_relative '../rss/media' | |||
There was a problem hiding this comment.
Is this needed? Should be able to autoload
| else | ||
| @messages << "Parsing UNKNOWN feed: #{feed_title(feed)}" | ||
| @messages << 'unsupported feed format' | ||
| end |
There was a problem hiding this comment.
What other types of feed are there? Perhaps log what the feed was.
| rescue StandardError | ||
| nil |
| return nil if candidate.blank? | ||
|
|
||
| URI.parse(candidate) | ||
| return candidate if URI::DEFAULT_PARSER.make_regexp(%w[http https]).match?(candidate) |
There was a problem hiding this comment.
Do you need to check the regexp if it's already parsed above? Can just check .scheme on the URI object.
| links.map { |link| text_value(link.href) }.find(&:present?) | ||
| end | ||
|
|
||
| def resolve_feed_url(candidate_url, feed_url) |
There was a problem hiding this comment.
I'm not entirely sure what this method is for, but if you're trying to get an absolute URL from the candidate_url which may be relative or absolute, you can just use:
Addressable::URI.join(feed_url, candidate_url)
| @@ -0,0 +1,54 @@ | |||
| # Patches RSS::Atom::Feed and RSS::Atom::Entry with Media namespace support (see ../media.rb). | |||
| # Kept as RSS::Media::Atom so Zeitwerk can autoload it from lib/rss/media/atom.rb. | |||
There was a problem hiding this comment.
Is anywhere autoloading it?
I think put this in config/initializers, where we already have other patches. It will also save you having to worry about loading twice.
| if (url = discover_feed_url_from_youtube_playlist_url(base_url)) | ||
| @messages << "Found Atom feed link from YouTube playlist URL, following: #{url}" | ||
| return url | ||
| end |
There was a problem hiding this comment.
I think it might be nicer if we have a subclassed Ingestor for YouTube ingestion. Then we don't have to explain to users that they need to pick the RSS ingestor for YouTube content.
| def discover_feed_url_from_youtube_playlist_url(base_url) | ||
| uri = URI.parse(base_url) | ||
| host = uri.host.to_s.downcase | ||
| return nil unless host == 'youtube.com' || host.end_with?('.youtube.com') |
There was a problem hiding this comment.
Could perhaps re-use the is_youtube_url? method from lib/renderers/youtube.rb (make it not private)
| Array(values).map { |v| dublin_core_text(v) } | ||
| .map(&:to_s) | ||
| .map(&:strip) |
There was a problem hiding this comment.
Don't need to map 3 times, just do dublin_core_text(v).to_s.strip in the first block
Summary of changes
linkelement withapplication/rss+xmlor atomMotivation and context
Closes #722
Screenshots

Checklist