Skip to content

Conversation

@dom96
Copy link
Contributor

@dom96 dom96 commented Nov 18, 2025

Uses HTMLRewriter to inject meta tags

Deployed to: https://python-opengraph.runtime-playground.workers.dev/

CC @ryanking13 (I can't add you as a reviewer yet)


# Remove existing OpenGraph and Twitter meta tags to avoid duplicates
meta_remover = ExistingMetaRemover()
rewriter.on('meta[property^="og:"]', to_js(meta_remover))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without to_js here we get an unclear error:

✘ [ERROR] Uncaught Error: This borrowed attribute proxy was automatically destroyed in the process of destroying the proxy it was borrowed from. Try using the 'copy' method.

  For more information about the cause of this error, use `pyodide.setDebug(true)` Error
      at _getAttrs (pyodide-internal:generated/emscriptenSetup:22274:19)
      at _adjustArgs (pyodide-internal:generated/emscriptenSetup:22292:77)
      at apply (pyodide-internal:generated/emscriptenSetup:23240:37)
      at apply (pyodide-internal:generated/emscriptenSetup:23083:20)


✘ [ERROR] Uncaught Error: This borrowed attribute proxy was automatically destroyed in the process of destroying the proxy it was borrowed from. Try using the 'copy' method.

  For more information about the cause of this error, use `pyodide.setDebug(true)`


✘ [ERROR] Uncaught Error: This borrowed attribute proxy was automatically destroyed in the process of destroying the proxy it was borrowed from. Try using the 'copy' method.

  For more information about the cause of this error, use `pyodide.setDebug(true)`

After reading pyodide/pyodide#1855 I was able to deduce that to_js is at least the first thing I need to do in order to then be able to copy as well. Looks like just doing to_js works here, but maybe there is a hidden bug here that I'm not aware of.

Anyway, we should put HTMLRewriter into our SDK to make this easier for devs.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create_proxy would also work, but I found another example using to_js and it is probably good to be consistent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say it should be create_proxy here.

Copy link
Contributor

@hoodmane hoodmane Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To get the resource management right, it should probably be:

        meta_remover = create_proxy(ExistingMetaRemover())
        meta_injector = create_proxy(MetaTagInjector(og_data))

        try:
            # Create an HTMLRewriter instance
            rewriter = HTMLRewriter.new()
            rewriter.on('meta[property^="og:"]', meta_remover)
            rewriter.on("head", to_js(meta_injector))
            return rewriter.transform(response.js_object)
        finally:
            meta_remover.destroy()
            meta_injector.destroy()

We should probably add a wrapper for HTMLRewriter to the sdk that can manage the handler lifetimes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, done. I needed to add a few copy in there too. Hopefully that makes sense.

Copy link

@ryanking13 ryanking13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CC @ryanking13 (I can't add you as a reviewer yet)

Yeah, do you know how I can be added to the GitHub org?


# Remove existing OpenGraph and Twitter meta tags to avoid duplicates
meta_remover = ExistingMetaRemover()
rewriter.on('meta[property^="og:"]', to_js(meta_remover))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create_proxy would also work, but I found another example using to_js and it is probably good to be consistent.

@dom96 dom96 force-pushed the dominik/opengraph branch 2 times, most recently from 08c0160 to eb78296 Compare November 20, 2025 11:50
Comment on lines 166 to 169
rewriter.on('meta[property^="og:"]', meta_remover.copy())
rewriter.on('meta[name^="twitter:"]', meta_remover.copy())
# Inject new OpenGraph meta tags into the <head> element
rewriter.on("head", meta_injector.copy())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need these copies.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it fails without them

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the rewriter do any work after rewriter.transform returns?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't think that it does, but I guess the error implies this

I removed the copy() and the destroy() for now

@dom96 dom96 force-pushed the dominik/opengraph branch 2 times, most recently from 0476c1b to 16c9b20 Compare November 21, 2025 15:03
@dom96 dom96 force-pushed the dominik/opengraph branch from 16c9b20 to 7592ede Compare November 21, 2025 16:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants