Skip to content

Handle frontend loaded zip archive content (h5p, bloom(pub/d), perseus, kpub) that contains large files#12805

Open
rtibbles wants to merge 12 commits intolearningequality:release-v0.19.xfrom
rtibbles:large_zipcontent
Open

Handle frontend loaded zip archive content (h5p, bloom(pub/d), perseus, kpub) that contains large files#12805
rtibbles wants to merge 12 commits intolearningequality:release-v0.19.xfrom
rtibbles:large_zipcontent

Conversation

@rtibbles
Copy link
Member

@rtibbles rtibbles commented Nov 7, 2024

Summary

Refactors the kolibri-zip package to use zip.js and implements adaptive loading for large ZIP archives (H5P, Bloom, Perseus, QTI). Instead of downloading entire archives upfront, the new approach:

  • Small archives (< 2.5MB): downloads the entire file in one request (fast path, unchanged UX)
  • Large archives: uses HTTP range requests to fetch ZIP metadata first, then groups small files into ~500KB chunks for efficient batch extraction
  • Large media files (audio/video >= 500KB within archives): not extracted at all — instead, a largeFileUrlGenerator callback provides streaming URLs pointing to the zipcontent backend, so video/audio plays via standard HTTP without being loaded into memory

Other changes:

  • Simplified zipContentUrl API — now takes a file object instead of separate parameters
  • Added CSP_MEDIA_SRC to allow media streaming from the zipcontent origin
  • Backend now decodes URL-encoded embedded file paths so filenames with spaces/special characters resolve correctly
  • Comprehensive test coverage for AdaptiveHttpReader and kolibri-zip

References

Fixes #9761

Reviewer guidance

How to test:

  1. Import the H5P channel with token zapif-babuz — it contains content with large embedded video/audio
  2. Open the content — it should load significantly faster than on release-v0.19.x
  3. Verify that embedded video/audio plays correctly (streamed from zipcontent backend, not extracted into memory)
  4. Test with Bloom content (.bloomd/.bloompub) and Perseus content as well

Risky areas deserving extra review:

  • packages/kolibri-zip/src/AdaptiveHttpReader.js — entirely new; implements HTTP range request reading with chunking, tail prefetching, and adaptive full-download-to-lazy switching
  • packages/kolibri-zip/src/index.js — rewritten from Promise/callback-based code (fflate with unzip callbacks) to async/await using zip.js. The control flow is substantially different even though the public API (file, files, filesFromExtension, close) is preserved. Also adds the new large file URL generation path through ExtractedFile.
  • kolibri/deployment/default/settings/base.py:489-491 — CSP change adding CSP_MEDIA_SRC to allow media streaming from the zipcontent origin

Architecture decisions:

  • Uses XMLHttpRequest instead of fetch for range requests to maintain iOS Safari 9.3 compatibility
  • The adaptive threshold (2.5MB) for switching from full download to range requests is a heuristic — content below this size is fast enough to download entirely - this should help to handle the non-media "large files" somewhat, as I have definitely seen some non-media files exceeding the 500KB threshold, so this allows them to be lazy loaded when there are too many of them
  • The 500KB threshold for "large media file" is also a somewhat arbitrary boundary at which streaming from the backend seems more efficient than extracting into a blob URL

AI usage

Claude Code was used extensively to implement the AdaptiveHttpReader, write test suites, and update all consumer components to the new API. All generated code was reviewed, tested, and refined, with a particular focus on removing or refining redundant test and production code paths.

@github-actions github-actions bot added the DEV: backend Python, databases, networking, filesystem... label Nov 7, 2024
@github-actions
Copy link
Contributor

github-actions bot commented Nov 7, 2024

@pcenov
Copy link
Member

pcenov commented Nov 14, 2024

Hi @rtibbles,

I noticed the following issues:

  1. A lot of resources are not loading at all. Examples are 'Advent Calendar (beta)', 'Interactive book', 'Course Presentation', 'Virtual Tour (360)'. For others it takes noticeably longer to load.
large.mp4
  1. In some resources there are missing icons, examples are 'Speak the Words Set':

speak

And 'Unidad 1 - Herramientas para el empoderamiento' from the QA channel, where the player buttons are missing:

buttons

I've created the following test channel zapif-babuz using the provided example resources at https://h5p.org/content-types-and-applications so that you can have a look and replicate the issues.

@rtibbles
Copy link
Member Author

Thanks @pcenov - I guess there was a good reason I marked this as a draft!

@rtibbles
Copy link
Member Author

Note the backend parts of this PR have been cherry-picked into this PR: #12993

@rtibbles rtibbles changed the base branch from develop to release-v0.19.x January 16, 2026 05:39
@github-actions github-actions bot added DEV: renderers HTML5 apps, videos, exercises, etc. APP: Learn Re: Learn App (content, quizzes, lessons, etc.) DEV: frontend SIZE: very large labels Jan 16, 2026
@rtibbles
Copy link
Member Author

@pcenov @radinamatic this has been updated - this is not super urgent, and QA of it shouldn't take precedence over others - but in case there is some spare bandwidth, I think the latest updates here should have resolved the issues that were previously being observed.

@pcenov
Copy link
Member

pcenov commented Jan 22, 2026

Hi @rtibbles - unfortunately after testing again with the resources from zapif-babuz I can confirm that the first issue is still present and the following resources are not displayed: 'Advent Calendar (beta)', 'Interactive book', 'Course Presentation', 'Virtual Tour (360)'

interactive book

Point 2 from my previous comment is fixed now and the icons are displayed correctly.

@rtibbles
Copy link
Member Author

I will take an incremental improvement - thanks for testing, I think this remaining issue will be quicker to hunt down!

@rtibbles
Copy link
Member Author

rtibbles commented Feb 14, 2026

The previously reported non-loading H5P content types from zapif-babuz are now working:

Content Type Screenshot
Advent Calendar (beta) Advent Calendar
Course Presentation Course Presentation
Interactive Book Interactive Book
Virtual Tour (360) Virtual Tour

@rtibbles rtibbles force-pushed the large_zipcontent branch 2 times, most recently from 77443cc to f274fff Compare February 20, 2026 06:23
@rtibbles rtibbles requested a review from rtibblesbot February 20, 2026 06:24
Copy link
Contributor

@rtibblesbot rtibblesbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well-structured approach to lazy-loading large zip archives on the frontend. The AdaptiveHttpReader, chunked fetching, and large-file URL delegation are solid.

CI: linting passed, other checks pending. No new strings added (appropriate for release branch target).

  • suggestion: ZIP_EXTRA_FIELD_ESTIMATE may underestimate — see inline
  • suggestion: tail prefetch range validation — see inline
  • praise: Good decision to only defer audio/video to URL generation while keeping large non-media files in chunks
  • praise: CSP_MEDIA_SRC addition correctly enables cross-origin media streaming

@rtibblesbot's comments are generated by an LLM, and should be evaluated accordingly

How was this generated?

Reviewed the pull request diff checking for:

  • Correctness: bugs, edge cases, undocumented behavior, resource leaks, hardcoded values
  • Design: unnecessary complexity, naming, readability, comment accuracy, redundant state
  • Architecture: duplicated concerns, minimal interfaces, composition over inheritance
  • Testing: behavior-based assertions, mocks only at hard boundaries, accurate coverage
  • Completeness: missing dependencies, unupdated usages, i18n, accessibility, security
  • Principles: DRY (same reason to change), SRP, Rule of Three (no premature abstraction)
  • Checked CI status and linked issue acceptance criteria
  • For UI changes: inspected screenshots for layout, visual completeness, and consistency

Copy link
Contributor

@rtibblesbot rtibblesbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good progress addressing prior feedback — both the ZIP_EXTRA_FIELD_ESTIMATE increase and tail prefetch bounds check are in place.

2 prior finding(s) resolved.

CI: Frontend tests failing (2 tests in AdaptiveHttpReader.spec.js). Linting passes, other checks pending.

  • blocking: Test assertions need updating after ZIP_EXTRA_FIELD_ESTIMATE change — see inline
  • suggestion: _tryFullDownload onprogress abort path doesn't set this.size — see inline
  • nitpick: BloomRunner comment copy-paste — see inline

@rtibblesbot's comments are generated by an LLM, and should be evaluated accordingly

How was this generated?

Compared the current PR state against findings from a prior review:

  • Retrieved prior bot reviews deterministically via the GitHub API
  • Classified each prior finding as RESOLVED, UNRESOLVED, PARTIALLY_ADDRESSED, or CONTESTED
  • Only raised NEW findings for newly introduced code
  • Reviewed the pull request diff checking for correctness, design, architecture, testing, completeness, and adherence to DRY/SRP principles
  • Checked CI status and linked issue acceptance criteria

Copy link
Member Author

@rtibbles rtibbles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some notes for myself.

Copy link
Contributor

@rtibblesbot rtibblesbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good progress — the core adaptive loading and large media delegation are solid.

5 prior finding(s) resolved.

Prior findings still open:

  • [suggestion] _tryFullDownload onprogress abort path doesn't set this.size — UNRESOLVED. When event.lengthComputable is false and Content-Length header is absent, this.size stays 0 after abort, breaking downstream lazy-mode operations. @rtibbles noted the onprogress handler may not be needed at all — either way, this needs resolution before merge.

CI: Linting failing — unused variable in test file. Frontend tests pass.

  • blocking: Lint failure from unused rangesBefore — see inline
  • suggestion: README "Large File Handling" naming is misleading per @rtibbles' own review — see inline
  • suggestion: Lazy loading test comment block can be cleaned up per @rtibbles' own review — see inline

@rtibblesbot's comments are generated by an LLM, and should be evaluated accordingly

How was this generated?

Compared the current PR state against findings from a prior review:

  • Retrieved prior bot reviews deterministically via the GitHub API
  • Classified each prior finding as RESOLVED, UNRESOLVED, PARTIALLY_ADDRESSED, or CONTESTED
  • Only raised NEW findings for newly introduced code
  • Reviewed the pull request diff checking for correctness, design, architecture, testing, completeness, and adherence to DRY/SRP principles
  • Checked CI status and linked issue acceptance criteria

Copy link
Contributor

@rtibblesbot rtibblesbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid implementation — the adaptive loading, chunked fetching, and large media delegation are well-designed and well-tested.

7 prior finding(s) resolved. 1 prior finding acknowledged (author accepted, pending fix).

CI: Frontend tests passing, linting passing, Python tests in progress. No new i18n strings (appropriate for release-v0.19.x target).

  • suggestion: embeddedFilePath in zipContentUrl is not URL-encoded — see inline
  • suggestion: chunksConfigured getter has a subtle edge case — see inline
  • praise: Comprehensive test coverage for both AdaptiveHttpReader and ZipFile lazy loading, including edge cases like tail chunk bounds fallback and large file gap detection

@rtibblesbot's comments are generated by an LLM, and should be evaluated accordingly

How was this generated?

Compared the current PR state against findings from a prior review:

  • Retrieved prior bot reviews deterministically via the GitHub API
  • Classified each prior finding as RESOLVED, UNADDRESSED, ACKNOWLEDGED, or CONTESTED
  • Only raised NEW findings for newly introduced code
  • Reviewed the pull request diff checking for correctness, design, architecture, testing, completeness, and adherence to DRY/SRP principles
  • Checked CI status and linked issue acceptance criteria

rtibbles and others added 3 commits February 28, 2026 12:01
…largeFile handling. Add zip origin to CSP for media.
Add unquote() call for embedded_filepath so filenames with spaces or
special characters resolve correctly. Add test coverage for both backend
(zip_wsgi) and frontend (zipContentUrl) URL encoding.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Contributor

@rtibblesbot rtibblesbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation is solid — the adaptive loading, chunked fetching, and large media delegation are well-designed and thoroughly tested.

3 prior finding(s) resolved. 1 prior finding contested (URL encoding — URL constructor handles it, confirming test added).

CI: Build assets job in progress. No new i18n strings (appropriate for release-v0.19.x target).

  • praise: Consistent largeFileUrlGenerator integration across all five consumers (Perseus, SafeHtml5, QTI, H5P, Bloom) with a clean API surface

@rtibblesbot's comments are generated by an LLM, and should be evaluated accordingly

How was this generated?

Compared the current PR state against findings from a prior review:

  • Retrieved prior bot reviews deterministically via the GitHub API
  • Classified each prior finding as RESOLVED, UNADDRESSED, ACKNOWLEDGED, or CONTESTED
  • Only raised NEW findings for newly introduced code
  • Reviewed the pull request diff checking for correctness, design, architecture, testing, completeness, and adherence to DRY/SRP principles
  • Checked CI status and linked issue acceptance criteria

try {
// Only use largeMediaThreshold if we have a generator to serve large files
// Otherwise, treat all files the same (include them all in chunks)
const effectiveThreshold = this.largeFileUrlGenerator ? this.largeMediaThreshold : Infinity;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: Smart use of Infinity threshold when no URL generator is provided — ensures all files are included in chunks without needing a separate code path.

Copy link
Member

@bjester bjester Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The README states:

Audio/video files larger than this use largeFileUrlGenerator instead of being extracted

Which made me go back to check whether it threw an error if the size was larger than the threshold but a URL generator wasn't supplied. This ternary prevents that situation, but silently. Since threshold handling is managed in AdaptiveHttpReader, this all leads to a bit of a confusing situation if misconfigured.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a warning in the constructor now to handle this case.

@rtibbles rtibbles marked this pull request as ready for review March 1, 2026 18:23
Copy link
Member Author

@rtibbles rtibbles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A quick self review for some signposting!

next(response.streaming_content).decode(), self.embedded_file_str
)

def test_request_for_url_encoded_embedded_file(self):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This come up during local self review using Claude, it seems like it's an edge case that we've not really hit previously, but it was simple enough to write a test for and fix.


# Includes a prefix that is almost certain not to collide
# with a filename embedded in a zip file. Prefix is:
# @*._ followed by the encoded base url
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment was outdated.

CSP_FRAME_SRC = CSP_DEFAULT_SRC + frame_src

# Allow media from our zipcontent origin as well
CSP_MEDIA_SRC = CSP_DEFAULT_SRC + frame_src
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't affect our sandboxing, as it only adds a CSP allow for media URLs specifically, and doesn't switch up the opposite direction.

if (!this.isH5P) {
// In the case that this is being routed via a remote URL
// ensure we preserve that for the zip endpoint.
const url = new URL(this.defaultFile.storage_url, window.location.href);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were repeating this logic, so I moved it into the zip url function instead.

const storageUrl = this.defaultFile.storage_url;
const zipFile = new ZipFile(storageUrl);
const zipFile = new ZipFile(storageUrl, {
largeFileUrlGenerator: filepath => urls.zipContentUrl(this.defaultFile, filepath),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose I could have made a function factory that gets reused, but it was such a small amount of code, it didn't seem worth abstracting.

@@ -0,0 +1,455 @@
// Polyfills for Node.js and older browsers (must be before zip.js import)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all new code, a subclass of the HttpReader class from zip.js to implement the lazy/chunked loading behaviour.

this._fileLoadingPromise = this._init();
}

async _init() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seemed like a good opportunity to switch these methods to async/await, as there was a lot of gnarly Promise callback nesting otherwise.

@@ -1 +0,0 @@
{"js":"application/javascript","json":"application/json","doc":"application/msword","pdf":"application/pdf","rtf":"text/rtf","xls":"application/vnd.ms-excel","ppt":"application/vnd.ms-powerpoint","odp":"application/vnd.oasis.opendocument.presentation","ods":"application/vnd.oasis.opendocument.spreadsheet","odt":"application/vnd.oasis.opendocument.text","pptx":"application/vnd.openxmlformats-officedocument.presentationml.presentation","xlsx":"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet","docx":"application/vnd.openxmlformats-officedocument.wordprocessingml.document","swf":"application/x-shockwave-flash","xhtml":"application/xhtml+xml","xml":"text/xml","mp3":"audio/mpeg","m4a":"audio/x-m4a","ogg":"audio/ogg","wav":"audio/x-wav","otf":"font/otf","ttf":"font/ttf","woff":"font/woff","bmp":"image/x-ms-bmp","gif":"image/gif","jpeg":"image/jpeg","jpg":"image/jpeg","png":"image/png","svg":"image/svg+xml","tif":"image/tiff","tiff":"image/tiff","css":"text/css","csv":"text/csv","html":"text/html","htm":"text/html","md":"text/markdown","txt":"text/plain","vtt":"text/vtt","mp4":"video/mp4","webm":"video/webm"} No newline at end of file
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zip.js includes more robust mimetype handling internally, which I chose to supplement rather than continuing to maintain our own.

@@ -0,0 +1,1154 @@
// Polyfills for Node.js/jsdom environment - must run BEFORE any module imports
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote this test suite before doing any refactor to zip.js to give me confidence of the public API of the kolibri-zip module remaining stable. This is also why the tests here still rely on fflate, because that was what was in place previously.

I had to make a few tweaks to the tests (for example adding the polyfills for webstream and text encoder/decoder etc), but for the most part even the tests of the novel behaviour were written first (with Claude Code assistance) in order to take a RED/GREEN approach here.

});

describe('Large file with URL generator', () => {
test('large media file cannot be converted to string when URL generator is used', async () => {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could in theory enable this, by turning the toString method into an async/await method to allow this - but it feels of limited use for media files.

@rtibbles rtibbles changed the title Handle frontend loaded zip archive content (h5p, bloomd) that contains large files Handle frontend loaded zip archive content (h5p, bloom(pub/d), perseus, kpub) that contains large files Mar 3, 2026
Copy link
Member

@bjester bjester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The most notable feedback I have is in regards to the potentially confusing configuration situation with largeMediaThreshold and largeFileUrlGenerator

// web-streams-polyfill/polyfill patches globals as a side effect
import 'web-streams-polyfill/polyfill';
// Use Node.js built-in TextEncoder/TextDecoder (available since Node 11)
import { TextEncoder, TextDecoder } from 'util';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommended practice now is to use node: protocol prefix.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, have updated.

Comment on lines +93 to +95
// Restore original URL methods
global.URL.createObjectURL = originalCreateObjectURL;
global.URL.revokeObjectURL = originalRevokeObjectURL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type of thing is handled for you by using proper jest mocks on objects, and jest.restoreAllMocks();

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

// Use Node.js built-in TextEncoder/TextDecoder (available since Node 11)
import { TextEncoder, TextDecoder } from 'util';

import mock from 'xhr-mock';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explicitly naming this xhrMock is much clearer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

const videoUrl = videoFile.toUrl();

// KEY ASSERTION: largeFileUrlGenerator must be called
expect(generatedUrls).toContain('media/video.mp4');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a local array of call args, to assert behavior, is a hack for what mock functions do normally. This pattern in repeated in several places, creating significant test logic code that does what mocks do.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, updated these.

expect(video._urlGenerator).toBeNull();
expect(video.obj).toBeInstanceOf(Uint8Array);
expect(video.obj.length).toBe(600 * 1024);
// toString() would work (but would be slow for large binary), toUrl() returns blob
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears there are other tests calling toString(), but this comments feels like an implementation note, rather than a good reason not to actually test .toString()? would work => well, test it!

It was attempted in large media file cannot be converted to string when URL generator is used which makes the reciprocal case sensible IMO

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, have tweaked - this was a left over note I missed cleaning up.

Comment on lines +9 to +12
global.URL.createObjectURL = jest.fn(() => {
return `blob:mock-${Math.random().toString(36).substr(2, 9)}`;
});
global.URL.revokeObjectURL = jest.fn();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never restored? see my other comment

embedded_filepath = embedded_filepath.replace("//", "/")
# Decode any URL-encoded characters in the embedded filepath so that
# filenames with spaces or special characters resolve correctly.
embedded_filepath = unquote(embedded_filepath)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the webserver not already unquote these, meaning this could inadvertently mess up embedded paths that legitimately have character sequences that would be considered url-encoded?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - you're right - the test was not veridical because it artificially encoded the path, whereas the WSGI server would already have dealt with that. Have reverted the test and the changes.

Comment on lines +44 to +50
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `maxFullLoadSize` | `number` | `2.5MB` | Files larger than this trigger lazy loading with range requests |
| `largeMediaThreshold` | `number` | `500KB` | Audio/video files larger than this use `largeFileUrlGenerator` instead of being extracted |
| `largeFileUrlGenerator` | `function` | `null` | Function `(filename) => url` for serving large audio/video files directly instead of extracting them |
| `chunkSize` | `number` | `500KB` | Size of chunks for grouped range requests |
| `filePathMappers` | `object` | (built-in) | Custom path replacement mappers by file extension |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's more documentation here than exists in the code. Please consider more JSDocs over separate README details

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved all code specific documentation to JSDocs.

try {
// Only use largeMediaThreshold if we have a generator to serve large files
// Otherwise, treat all files the same (include them all in chunks)
const effectiveThreshold = this.largeFileUrlGenerator ? this.largeMediaThreshold : Infinity;
Copy link
Member

@bjester bjester Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The README states:

Audio/video files larger than this use largeFileUrlGenerator instead of being extracted

Which made me go back to check whether it threw an error if the size was larger than the threshold but a URL generator wasn't supplied. This ternary prevents that situation, but silently. Since threshold handling is managed in AdaptiveHttpReader, this all leads to a bit of a confusing situation if misconfigured.

rtibbles and others added 5 commits March 20, 2026 17:33
…to add largeFile handling. Add zip origin to CSP for media.
…to add largeFile handling. Add zip origin to CSP for media.

fixup! Change API of zipContentUrl. Update all kolibri-zip consumers to add largeFile handling. Add zip origin to CSP for media.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fixup! Refactor kolibri-zip to use zip.js

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
// A fallback URL to the zipcontent endpoint for this H5P file
this.zipcontentUrl = new URL(
`../../zipcontent/${this.filepath.substring(this.filepath.lastIndexOf('/') + 1)}`,
`../../zipcontent/${this.filepath.split('/').pop()}`,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Propagated the update from the BloomRunner here.

rtibbles and others added 3 commits March 20, 2026 18:06
fixup! Refactor kolibri-zip to use zip.js

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fixup! Refactor kolibri-zip to use zip.js

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

APP: Learn Re: Learn App (content, quizzes, lessons, etc.) DEV: backend Python, databases, networking, filesystem... DEV: frontend DEV: renderers HTML5 apps, videos, exercises, etc. SIZE: very large

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Read media files more efficiently from zipped files

6 participants