Allow saving HTML snapshots of Cloudflare-Protected Websites #712

Closed
tastycroissant opened this issue Apr 19, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@tastycroissant

Feature Request: Integration with Flaresolverr for Archiving Cloudflare-Protected Websites

Description:
Currently, Linkding offers a convenient way to archive links in HTML format for revisiting later. However, a significant number of websites are protected by Cloudflare, preventing Linkding from capturing snapshots of these pages. This limitation hampers the completeness of the archiving process, especially considering the widespread use of Cloudflare protection.

Proposal:
I suggest integrating Flaresolverr into Linkding to overcome this issue. Flaresolverr is a service designed specifically to bypass Cloudflare protection, allowing access to websites that would otherwise be inaccessible due to Cloudflare's security measures. By incorporating Flaresolverr into Linkding's archiving process, users would gain the ability to capture snapshots of Cloudflare-protected websites seamlessly.

Benefits:

  • Enhanced archiving capabilities: Users can archive a wider range of websites, including those protected by Cloudflare.
  • Improved user experience: Seamless integration of Flaresolverr ensures a smoother archiving process without requiring users to manually bypass Cloudflare protection.

Implementation:
The integration with Flaresolverr could be implemented as an optional feature within Linkding's settings. Users could enable or disable the Flaresolverr integration based on their preferences. Additionally, clear documentation and user guidance should be provided to ensure ease of use and understanding.
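
To make the proposal concrete, here is a minimal sketch of how an archiver could ask a FlareSolverr instance for a page's HTML. It assumes FlareSolverr runs as a separate service on its default port and accepts a `request.get` command over HTTP, as its README describes. The function names and the timeout values are hypothetical, and this is not how Linkding currently works.

```python
import json
import urllib.request

# Assumption: FlareSolverr runs as its own service at this address.
FLARESOLVERR_URL = "http://localhost:8191/v1"


def build_request_payload(url: str, max_timeout_ms: int = 60000) -> dict:
    """Build the JSON body for a FlareSolverr `request.get` command."""
    return {"cmd": "request.get", "url": url, "maxTimeout": max_timeout_ms}


def extract_html(result: dict) -> str:
    """Return the solved page HTML, or raise if FlareSolverr reported a failure."""
    if result.get("status") != "ok":
        message = result.get("message", "unknown error")
        raise RuntimeError(f"FlareSolverr failed: {message}")
    return result["solution"]["response"]


def fetch_html(url: str, endpoint: str = FLARESOLVERR_URL) -> str:
    """POST the command to FlareSolverr and return the page HTML."""
    body = json.dumps(build_request_payload(url)).encode("utf-8")
    request = urllib.request.Request(
        endpoint, data=body, headers={"Content-Type": "application/json"}
    )
    # The HTTP timeout is set above maxTimeout so FlareSolverr can answer first.
    with urllib.request.urlopen(request, timeout=90) as response:
        return extract_html(json.load(response))
```

An optional setting could then decide whether the snapshot step calls something like `fetch_html` or loads the URL directly.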

Thank you for considering this feature request.

@sissbruecker
Owner

sissbruecker commented Apr 20, 2024

This is definitely an issue, and integrating with a solution like this could be an option. Then again it looks like it would need to be a more involved setup, as flaresolverr would ideally run in its own Docker container. At least from a quick look they rely on a specific Chrome / Chromium version that might not be compatible with what linkding installs, and then they don't support ARM architectures with standalone installs / builds, which linkding needs.

Another option could be to make this work through the browser extension somehow. When the browser extension runs you have already passed all Cloudflare checks, so there might be an option to just get the HTML through the extension and then make single-file work with that instead of trying to load the HTML through the URL on its own. Maybe it would also be possible to integrate the single-file browser extension with the linkding extension so that the full snapshot can already be captured in the browser.

Edit: Modified the title to reflect the problem that needs solving, rather than one of the possible options to solve it.

@sissbruecker added the enhancement (New feature or request) label on Apr 20, 2024
@sissbruecker changed the title from "Integration with Flaresolverr for Archiving Cloudflare-Protected Websites" to "Allow saving HTML snapshots of Cloudflare-Protected Websites" on Apr 20, 2024

@timthinks

This would be really great! Is there any update on this? Thank you for all the work!!!!

@DonkeeeyKong

DonkeeeyKong commented Nov 23, 2024

This is also an issue with Recaptcha-protected sites. Here is an example of a snapshot of a random archive.ph link:
[Screenshot, 2024-11-23: snapshot of the archive.ph link]
I looked around a little. (I still prefer Linkding, but here is what I found.) The snapshot Hoarder takes of my example website from above looks like this:
[Screenshot, 2024-11-23: Hoarder snapshot of the article "Tochter von Rocklegende über ihre lieblose Kindheit: „Ich machte vier Therapien auf einmal – typisch Zappa“"]
Afaik Hoarder uses monolith, not single-file, for snapshots. Would this be an alternative, @sissbruecker?

Thank you for this great tool, btw! Awesome work!

Never mind. This works flawlessly after setting a custom user-agent as suggested in #690 and described here.
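
For readers unfamiliar with the workaround: a custom user agent replaces the default client identifier sent with each request, which some bot checks react to. Below is a minimal Python sketch of the idea at the HTTP level. It is not Linkding's or single-file's actual code, and the user-agent string and function name are hypothetical placeholders.

```python
import urllib.request

# Hypothetical value: any realistic desktop browser string could be used here.
CUSTOM_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def build_request(url: str, user_agent: str = CUSTOM_USER_AGENT) -> urllib.request.Request:
    """Create a request that identifies itself with the given user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

Passing the result to `urllib.request.urlopen` would send the request with that header instead of Python's default one.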

@sissbruecker
Owner

Closing in favor of #980
