Some CSS hits a pathological case in the rrweb splitCssText pathway causing slowdown in processing #1668

Open
kevinansfield opened this issue Jan 22, 2025 · 21 comments

Comments

@kevinansfield

We've just had to disable user session recording on our posthog.com hosted account because our customers started experiencing browser hangs when using our editor with the Grammarly browser extension enabled.

Profiling during the hang showed rrweb's record.js script as the culprit. There's a related issue (rrweb-io/rrweb#1603) on their project but it's been closed.

I've opened the issue here as it affected the posthog project and it may just be a versioning problem with posthog using v2.0.0-alpha.18.

@pauldambra
Member

Hey

rrweb has been on v2 alpha for over 12 months... it's not at all an alpha version just an indication of the amount of time the core contributors have to finish features they want to include in v2

This is a tricky one since if things work normally without Grammarly I'm tempted to say it's a Grammarly bug... but that's not to reject any investigation or fix on our end :)

Can you share anything else about your site / tech stack? I'll need to set up a reproduction so I can investigate - although you're welcome to make a simple app that demonstrates the problem too!

@kevinansfield
Author

kevinansfield commented Jan 22, 2025

This is specifically a change that occurred in -alpha.18. It's apparently been fixed in rrweb but the -alpha.19 release containing it hasn't been cut yet.

I can only assume reverting to -alpha.17 will fix posthog until the -alpha.19 release is available.

Unfortunately I don't have a minimal repro available right now. Our editor screen where the problem was occurring contained both a textarea and a contenteditable area using Lexical.

@kevinansfield
Author

kevinansfield commented Jan 22, 2025

FWIW we started getting customer reports yesterday with timing that coincides with your v1.207.1 release that contained the -alpha.18 bump.

@kevinansfield
Author

kevinansfield commented Jan 22, 2025

One route to a repro would be our editor demo site - https://koenig.ghost.org. We don't have posthog enabled on that and so don't see any problems with Grammarly enabled, but it's a static site that could be cloned and posthog added in order to see the performance issues.

@pauldambra
Member

#1670

@danLDev

danLDev commented Jan 22, 2025

Have upgraded to the latest posthog version with the fix and we're still experiencing this (for all users, not just grammarly)

@pauldambra
Member

hey @danLDev

how did you update? the fix hasn't hit the CDN yet so you probably aren't running the fix

@pauldambra
Member

that has hit the CDN now so hopefully resolved for you

@danLDev

danLDev commented Jan 22, 2025

I just bumped the posthog-js package so I guess you're right. Looks like it's resolved now

@pauldambra
Member

awesome, thanks for taking time to confirm!

sorry for the interruption

there's a test in the upstream recorder tool to protect against this now, so at least the system as a whole is less fragile now 🙈

@pauldambra
Member

@kevinansfield I'm going to close this since I've had confirmation from others that it is resolved for them, here and on other channels

obvs feel free to comment back here or contact me over in-app support if you need any follow-up

thanks for the really clear issue and follow-up, helped cut down response time 🙌

@danLDev

danLDev commented Jan 22, 2025

@pauldambra Sorry to do this, I think I may have celebrated too early.

We're still getting reports of page hangs (and in some cases crashes on slower machines), even after bumping posthog-js to the latest version.


@pauldambra
Member

hey, is the site public for us to take a look? or you can open an in-app support ticket to share details there...

@danLDev

danLDev commented Jan 22, 2025

Unfortunately not, will open a support ticket

@Zloka

Zloka commented Jan 24, 2025

Hi,

We noticed the same in our web application. Disabling session recordings fixed it for now, while we upgrade to the latest PostHog version.

Out of curiosity, do you @danLDev still experience the issue? 🙂

If so, and if it helps @pauldambra with debugging, it should be possible for us to deploy a public-facing site.
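
For anyone needing the same stopgap, a minimal sketch of the "disable session recordings" workaround mentioned above. It assumes the `disable_session_recording` option of posthog-js; the `RecordingOptions` interface is a hypothetical, cut-down stand-in so the snippet compiles without the library installed.

```typescript
// Hypothetical subset of the posthog-js config, declared locally so this
// snippet is self-contained. Only the one field used here is modelled.
interface RecordingOptions {
  disable_session_recording: boolean;
}

// Turn recording off until a fixed recorder version is deployed.
const options: RecordingOptions = {
  disable_session_recording: true,
};

// With posthog-js installed, this object would be passed as the second
// argument to posthog.init('<your project key>', options).
console.log(JSON.stringify(options));
```

This only stops the recorder from starting; other event capture is unaffected, so it is a reasonable holding position while upgrading.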

@pauldambra
Member

hey zloka, are you on the latest version? has it improved with the latest versions?

if the site is public I can for sure take a look, i've seen some CSS that is ~10x more expensive to process but I'm not 100% on what's causing it

@pauldambra reopened this Jan 24, 2025
@pauldambra changed the title from "Site hang from rrweb when client has grammarly extension" to "Some CSS hits a pathological case in the rrweb splitCss pathway causing slowdown in processing" Jan 24, 2025
@pauldambra changed the title from "Some CSS hits a pathological case in the rrweb splitCss pathway causing slowdown in processing" to "Some CSS hits a pathological case in the rrweb splitCssText pathway causing slowdown in processing" Jan 24, 2025
@pauldambra
Member

i've reopened this and updated the title since this is definitely not completely fixed but I also don't want folk to think it's completely broken

i've got an improvement to the processing cache on the way to the CDN in 1.209.1 which means folk only take the hit on the first run through the CSS (per page load) but that's obvs still not good enough in the case where the recorder pauses on your css
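
To illustrate the idea of only paying the cost on the first run through the CSS, here is a minimal memoization sketch. It is not the actual posthog-js or rrweb cache: `memoize`, `slowSplit`, and the choice of the raw CSS text as the cache key are all assumptions made for the example.

```typescript
// Hypothetical signature for an expensive CSS-processing step.
type Processor = (cssText: string) => string[];

// Wrap a processor so each distinct CSS string is processed only once.
// The Map lives as long as the page does, so the cost is paid once per page load.
function memoize(process: Processor): Processor {
  const cache = new Map<string, string[]>();
  return (cssText: string): string[] => {
    const hit = cache.get(cssText);
    if (hit !== undefined) return hit;
    const result = process(cssText);
    cache.set(cssText, result);
    return result;
  };
}

// Stand-in for the slow work: split a stylesheet into its rules.
let calls = 0;
const slowSplit: Processor = (cssText) => {
  calls += 1;
  return cssText
    .split('}')
    .filter((part) => part.length > 0)
    .map((part) => part + '}');
};

const cachedSplit = memoize(slowSplit);
cachedSplit('a{color:red}b{color:blue}');
cachedSplit('a{color:red}b{color:blue}'); // served from the cache
console.log(calls);
```

As the comment above notes, a cache like this does nothing for the first pass, which is where the hang actually happens.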

have asked in the rrweb secret contributors slack too - want to check if someone can ELI5 the purpose of the function to help me figure out the fix (or indeed nerd snipe someone else into fixing it)

@MartinWorkfully

Hey, we've got the same issue.

Login to the site is public if you need to take a look at the problem in the wild.

This is a priority for our team, so if you need support debugging this just ping me.


@pauldambra
Member

Hey,

We rolled back to the previous version. Can you test if that has resolved things for you?

Thanks

Paul

@MartinWorkfully

Hey again, rolling back to 1.210.1 'fixes' the bug.

Obviously this also prevents us from updating your lib, but I'll watch this thread for updates on the fix version.

Thanks

eoghanmurray added a commit to eoghanmurray/rrweb that referenced this issue Jan 29, 2025
@pauldambra
Member

@MartinWorkfully if you aren't locking the version of replay by using our full or no external dependencies builds then you'll be getting these fixes automatically 👍

eoghanmurray added a commit to rrweb-io/rrweb that referenced this issue Feb 6, 2025
Fixes a browser 'lock up' at record time due to a presence of large amounts of css in <style> elements, which are split over multiple text nodes, which triggers the new code added in #1437 (see that PR for full explanation of why this all exists).  #1437 was not written with performance in mind as it was believed to be an edge case, but things like Grammarly browser extension (#1603) among other scenarios were triggering pathological behavior, some of which was solved in #1615.
See also #1640 (comment) for further discussion.

* Fix the case when there are multiple matches and we end up not finding a unique one - just go with the best guess when there are many splits by looking at the previous chunk's size
* Also add '0px' -> '0' stylesheet normalization, which also fixes the sample problem in a different way
* Add new test and modify it so that it can trigger a failure in the absence of the '0px' normalization; there may be other unknown ways of triggering a similar bug, so ensure that the primary 'best guess' method doesn't suffer a regression
* Leverage the 'best guess' method so that we can quit after 100 iterations trying to find a unique substring; hopefully this bit along with the `iterLimit` already added will prevent any future pathological cases.
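
The bullets above can be sketched roughly as follows. This is an illustration of the described techniques only, not rrweb's code: `normalizeCss`, `findSplitIndex`, and the helpers are hypothetical names, and the normalizer handles just whitespace and the '0px' case.

```typescript
// Simplified normalizer (assumption): rewrite '0px' to '0' and collapse
// whitespace so two serializations of the same CSS compare equal more often.
function normalizeCss(css: string): string {
  return css.replace(/\b0px\b/g, '0').replace(/\s+/g, ' ').trim();
}

// Every index at which `needle` occurs in `haystack`.
function allIndexesOf(haystack: string, needle: string): number[] {
  const found: number[] = [];
  let from = haystack.indexOf(needle);
  while (from !== -1) {
    found.push(from);
    from = haystack.indexOf(needle, from + 1);
  }
  return found;
}

// The candidate position nearest to where the split is expected to be.
function closestTo(positions: number[], expected: number): number {
  return positions.reduce((best, p) =>
    Math.abs(p - expected) < Math.abs(best - expected) ? p : best,
  );
}

// Grow a prefix of `nextChunk` until it matches `haystack` at exactly one
// place. Give up after `maxIterations` and return the best guess instead:
// the match closest to `expectedIndex` (e.g. derived from the previous
// chunk's size). Returns -1 if nothing matches at all.
function findSplitIndex(
  haystack: string,
  nextChunk: string,
  expectedIndex: number,
  maxIterations = 100,
): number {
  let bestGuess = -1;
  for (let len = 1; len <= nextChunk.length && len <= maxIterations; len++) {
    const positions = allIndexesOf(haystack, nextChunk.slice(0, len));
    if (positions.length === 0) break; // longer prefixes cannot match either
    bestGuess = closestTo(positions, expectedIndex);
    if (positions.length === 1) return positions[0];
  }
  return bestGuess;
}

console.log(normalizeCss('a { margin: 0px  10px }'));
console.log(findSplitIndex('a{b}a{c}a{d}', 'a{c}', 4));
console.log(findSplitIndex('aaaa', 'aa', 2, 1));
```

The point of the cap is that the loop always terminates with some answer, so repetitive CSS can cost accuracy but can no longer lock up the browser.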

Failing example extracted from large files identified by Paul D'Ambra (Posthog) ... see comment from MartinWorkfully: PostHog/posthog-js#1668
gnpaone added a commit to Midpath-Software/rrweb that referenced this issue Feb 7, 2025
* Fix up the 'should replace the existing DOM nodes on iframe navigation with `isAttachIframe`' test (rrweb-io#1636)

- it was working for me when the test was run in isolation (`-t` option), but when the entire cross-origin-iframes test was run, the change of iframe contents didn't seem to happen in time

* [chore]: Update actions/upload-artifact to v4 (rrweb-io#1643)

* update actions/upload-artifact to v4

---------

Co-authored-by: Eoghan Murray <[email protected]>

* Fix a code path where masking could be skipped on textareas (rrweb-io#1599)

* Fixes rrweb-io#1596

* [chore] Cache yarn packages for CI (rrweb-io#1646)

* [chore] Cache yarn packages for CI

* Cache yarn in release.yml

* [chore] Update deprecated download artifact on CI (rrweb-io#1647)

* I'm merging even though ESLint is still failing in Github Actions as I believe it's running actions _without_ this PR applied yet

* Fix env puppeteer error in cross-origin-iframes.test.ts (rrweb-io#1629)

* chore(ci): track bundle size (rrweb-io#1630)

* chore(ci): track bundle size

---------

Co-authored-by: pauldambra <[email protected]>

* Fix adapt css with split (rrweb-io#1600)

Fix for rrweb-io#1575 where postcss was raising an exception

* adapt the entire CSS as a whole in one pass with postcss, rather than adapting each split part separately
* break up the postcss output again and assign to individual text nodes (kind of inverse of splitCssText at record side)
* impose an upper bound of 30 iterations on the substring searches to preempt possible pathological behavior
* add tests to demonstrate the scenario and prevent regression

More technical details:
* Fix algorithm; checks against `ix_end` within loop were incorrect when `ix_start` was bigger than zero.  
* Fix that length check against wrong array was causing 'should record style mutations with multiple child nodes and replay them correctly' test to fail. 
Note on last point: I haven't looked into things more deeply than that the test was complaining about missing .length after `replayer.pause(1000);`

* Warn instead of fail on exceptions thrown from postcss (rrweb-io#1580)

* postcss was introduced in rrweb-io#1458 for use within adaptCssForReplay
* rrweb-io#1600 fixes the main case where invalid css could be introduced when valid css from the output of `sheet.cssRules` was split according to how it was split across text nodes of the <style>
* the guard introduced here is still useful as we likely in future will switch to capturing the raw stylesheet contents (both <style> and <link>), at which point we will be much less confident of getting valid css

* Fix splitCssText again (rrweb-io#1640)

Fixes a browser 'lock up' at record time due to a presence of large amounts of css in <style> elements, which are split over multiple text nodes, which triggers the new code added in rrweb-io#1437 (see that PR for full explanation of why this all exists).  rrweb-io#1437 was not written with performance in mind as it was believed to be an edge case, but things like Grammarly browser extension (rrweb-io#1603) among other scenarios were triggering pathological behavior, some of which was solved in rrweb-io#1615.
See also rrweb-io#1640 (comment) for further discussion.

* Fix the case when there are multiple matches and we end up not finding a unique one - just go with the best guess when there are many splits by looking at the previous chunk's size
* Also add '0px' -> '0' stylesheet normalization, which also fixes the sample problem in a different way
* Add new test and modify it so that it can trigger a failure in the absence of the '0px' normalization; there may be other unknown ways of triggering a similar bug, so ensure that the primary 'best guess' method doesn't suffer a regression
* Leverage the 'best guess' method so that we can quit after 100 iterations trying to find a unique substring; hopefully this bit along with the `iterLimit` already added will prevent any future pathological cases.

Failing example extracted from large files identified by Paul D'Ambra (Posthog) ... see comment from MartinWorkfully: PostHog/posthog-js#1668

* fix: move patch function into utils to improve bundling (rrweb-io#1631)

* fix: move patch function into utils to improve bundling

---------

Co-authored-by: pauldambra <[email protected]>
Co-authored-by: Justin Halsall <[email protected]>

---------

Co-authored-by: Eoghan Murray <[email protected]>
Co-authored-by: Kevin Townsend <[email protected]>
Co-authored-by: Justin Halsall <[email protected]>
Co-authored-by: Paul D'Ambra <[email protected]>
Co-authored-by: pauldambra <[email protected]>
Co-authored-by: John Henry Gunther <[email protected]>