[1.0] turn off URL rewriting for my account #5542

Open
5 tasks done
aeharding opened this issue Mar 25, 2025 · 17 comments
Labels
enhancement New feature or request

Comments

@aeharding

aeharding commented Mar 25, 2025

Requirements

  • Is this a feature request? For questions or discussions use https://lemmy.ml/c/lemmy_support or the matrix chat.
  • Did you check to see if this issue already exists?
  • Is this only a feature request? Do not put multiple feature requests in one issue.
  • Is this a backend issue? Use the lemmy-ui repo for UI / frontend issues.
  • Do you agree to follow the rules in our Code of Conduct?

Is your proposal related to a problem?

Example one

Often I don't want Lemmy rewriting my Lemmy URLs. The biggest reason is that I don't want to appear associated with a given instance.

For example, imagine I make a post:

[email protected] posted:
Hey, check out my new post! https://lemmy.world/post/123

Imagine Google indexes lemmyporn.com. Search engines will index my post like this:

[email protected] posted:
Hey, check out my new post! https://lemmyporn.com/post/123

I never want this to happen. This would be a really bad look for me, as it would seem like I linked to a porn instance.

Example two

Many times I've linked to a post on a specific instance, for example when explaining how post content hasn't federated properly to instance X but has to instance Y. In this case, it would be nice if I could turn off rewriting for a specific post.

Describe the solution you'd like.

Ability to turn off URL rewriting for object resolving in post and comment body.

Describe alternatives you've considered.

There is really no alternative. I could post the URL in a code block, but then it isn't rendered as a link.

Additional context

This is a Lemmy 1.0.0 change. It should maybe be a larger discussion, too. In general I am opposed to rewriting links in user content, because it can change the meaning of my post.

Mastodon, for example, avoids rewriting URLs for this reason and leaves it up to clients. I think this is a better approach, because it doesn't change the user's meaning.

@aeharding aeharding added the enhancement New feature or request label Mar 25, 2025
@aeharding aeharding changed the title Ability to turn off URL rewriting per post [1.0] Ability to turn off URL rewriting per post Mar 25, 2025
@aeharding aeharding changed the title [1.0] Ability to turn off URL rewriting per post [1.0] turn off URL rewriting for my account Mar 25, 2025
@Nutomic
Member

Nutomic commented Mar 25, 2025

Copying from @aeharding's direct message:

I was thinking about a good middle ground that would be a win-win:

What if Lemmy extracted all non-local links, and then provided a map via the API alongside the original markdown?

For example, the markdown "Hey, visit my post! https://lemmy.world/post/123", when viewed from lemm.ee, would be returned from the API like this:

"Hey, visit my post! https://lemmy.world/post/123"
{ "https://lemmy.world/post/123": "https://lemm.ee/post/321" }

This would then be trivial for frontends to replace the href, as it's a super simple map :)
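A minimal TypeScript sketch of the client side of this idea, assuming the server returned a plain object mapping original URLs to local ones. The name `applyLinkMap` is hypothetical, and matching `href="..."` with a regex is a simplification; a real client would more likely rewrite hrefs inside its markdown renderer.

```typescript
// Hypothetical shape of the proposed map: original URL -> local URL.
type LinkMap = Record<string, string>;

// Rewrite href="..." attributes whose value appears as a key in the map.
// Sketch only: assumes double-quoted hrefs and no HTML-entity escaping in keys.
function applyLinkMap(html: string, map: LinkMap): string {
  return html.replace(/href="([^"]*)"/g, (match: string, href: string) => {
    const local = Object.prototype.hasOwnProperty.call(map, href) ? map[href] : undefined;
    return local !== undefined ? `href="${local}"` : match;
  });
}

const html =
  '<p>Hey, visit my post! <a href="https://lemmy.world/post/123">https://lemmy.world/post/123</a></p>';
const map: LinkMap = { "https://lemmy.world/post/123": "https://lemm.ee/post/321" };
console.log(applyLinkMap(html, map));
```

Only the `href` changes; the visible link text still shows the URL the author wrote.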

This would be possible, although it requires some extra effort for every frontend to implement, and we would need a separate database table to store these link mappings. It also wouldn't solve the problem with lemmynsfw in posts, as lemmy-ui running on that instance would still replace the link, so the HTML seen by search engine crawlers would be the same either way.

Ability to turn off URL rewriting for object resolving in post and comment body.

About this, how exactly would it work? As a setting by the post author which gets federated to all instances?

@dessalines
Member

It also wouldn't solve the problem with lemmynsfw in posts, as lemmy-ui running on that instance would still replace the link, so the HTML seen by search engine crawlers would be the same either way.

I don't see any way around that, which kind of makes this issue impossible to fix.

@aeharding
Author

I don't see any way around that, which kind of makes this issue impossible to fix.

If lemmy-ui detects a crawler, it could disable the markdown rewriting; that shouldn't be too bad, unless I'm misunderstanding.
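A rough sketch of server-side crawler detection, assuming the User-Agent header is the signal. The token list and function names are hypothetical (this is not lemmy-ui code), and User-Agent sniffing is easy to spoof or miss, so treat it as a heuristic at best.

```typescript
// Heuristic crawler detection by User-Agent substring. The token list is an
// illustrative assumption and far from exhaustive.
const CRAWLER_PATTERN = /bot|crawler|spider|slurp|bingpreview|facebookexternalhit/i;

function isLikelyCrawler(userAgent: string | undefined): boolean {
  if (!userAgent) return false;
  return CRAWLER_PATTERN.test(userAgent);
}

// During server-side rendering, keep the author's original URL for likely
// crawlers and use the rewritten (local) URL for everyone else.
function hrefForRequest(original: string, rewritten: string, userAgent: string | undefined): string {
  return isLikelyCrawler(userAgent) ? original : rewritten;
}
```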

About this, how exactly would it work? As a setting by the post author which gets federated to all instances?

That's the only way I can think of. However, it does have downsides, for example breaking links for non-crawler users.

This would be possible, although it requires some extra effort for every frontend to implement, and we would need a separate database table to store these link mappings. It also wouldn't solve the problem with lemmynsfw in posts, as lemmy-ui running on that instance would still replace the link, so the HTML seen by search engine crawlers would be the same either way.

It's true that it would be more work for clients. However, consider that all clients (other than lemmy-ui) already need some link-handling logic so that links stay in the app/client instead of opening lemmy-ui in a browser. And a simple map should be quite easy for apps to add.
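To illustrate the kind of link logic clients already carry, here is a sketch that recognizes URLs that look like Lemmy objects so an app can open them in-app. The path patterns (`/post/<id>`, `/comment/<id>`, `/c/<name>`, `/u/<name>`) are assumptions about Lemmy's web URLs, and `parseLemmyLink` is a hypothetical helper; actually opening the object would still require resolving it through the client's own instance.

```typescript
// Kinds of Lemmy objects a client might open in-app instead of in a browser.
type LemmyLink =
  | { kind: "post" | "comment"; host: string; id: number }
  | { kind: "community" | "user"; host: string; name: string };

function tryParseUrl(href: string): URL | null {
  try {
    return new URL(href);
  } catch {
    return null; // not an absolute URL
  }
}

// Recognize paths such as /post/123, /comment/45, /c/memes and /u/alice.
// These patterns are illustrative assumptions, not an exhaustive list.
function parseLemmyLink(href: string): LemmyLink | null {
  const parsed = tryParseUrl(href);
  if (!parsed) return null;
  const numeric = parsed.pathname.match(/^\/(post|comment)\/(\d+)\/?$/);
  if (numeric) {
    return { kind: numeric[1] as "post" | "comment", host: parsed.host, id: Number(numeric[2]) };
  }
  const named = parsed.pathname.match(/^\/(c|u)\/([^/]+)\/?$/);
  if (named) {
    return { kind: named[1] === "c" ? "community" : "user", host: parsed.host, name: decodeURIComponent(named[2]) };
  }
  return null;
}
```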

@dessalines
Member

Even if we moved all this logic to front-ends, how would you prevent any other federated services or apps from rewriting to their own local links?

This is probably a losing battle, as you'll never be able to account for every fediverse service or web app.

@aeharding
Author

Even if we moved all this logic to front-ends, how would you prevent any other federated services or apps from rewriting to their own local links?

Lemmy can't control anything other software does.

But I'd also like to flip that on its head. If Lemmy does rewrite links, will those rewritten links get federated to other services? I could see that being a problem. For example, say I post on lemmygrad.ml/c/memes:

Hi there! Check out this post: https://lemm.ee/post/123

With string replace, would that get rewritten to:

Hi there! Check out this post: https://lemmygrad.ml/post/321

Would that string-replaced content be federated to non-Lemmy services? For example, would a PieFed user see:

Hi there! Check out this post: https://lemmygrad.ml/post/321

If that is the case, all of those non-Lemmy instances would see my comment as if I had linked to lemmygrad.ml when I didn't, especially since no other federated services I'm aware of rewrite links in user post/comment bodies.

That is why I think, in terms of federation and external services, rewriting URLs is a much more slippery slope than preserving the original user-submitted URL.

@dessalines
Member

I think this is pretty convincing, although it does put a large burden on apps to do all their own link rewriting.

The places I can find where we do rewriting in the back-end are for images and tracking params.

  • post.thumbnail_url
    • We should consider removing any proxying logic, and making sure that this field only ever points to the actual source image.
    • Expose proxying settings in GetSite, and have front-ends read them to decide whether to proxy those or not.
  • post.url
    • Rewritten only for some images and for tracking param clearing. Only keep the tracking param clearing.
  • All markdown bodies.
    • We could remove all that logic entirely.
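The tracking-param clearing that would remain for post.url could look roughly like this. The parameter list is an illustrative assumption, not the list Lemmy uses, and `stripTrackingParams` is a hypothetical name.

```typescript
// Query parameters treated as trackers here (illustrative, not Lemmy's list).
const TRACKING_PARAMS = new Set(["fbclid", "gclid", "mc_eid"]);

// Remove known tracking parameters (and any utm_* parameter) from a URL,
// leaving everything else, including the host and path, untouched.
function stripTrackingParams(input: string): string {
  const url = new URL(input);
  // Collect keys first so deleting does not disturb iteration.
  const keys: string[] = [];
  url.searchParams.forEach((_value, key) => keys.push(key));
  for (const key of keys) {
    if (key.startsWith("utm_") || TRACKING_PARAMS.has(key)) {
      url.searchParams.delete(key);
    }
  }
  return url.toString();
}
```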

Any front-end logic to skip link rewriting for scrapers would have to be proposed in those repos.

@aeharding
Author

That makes sense to me.

Regarding images: can you give an example of what providing proxy information for thumbnail_url in GetSite would look like? As an alternative, I was thinking even image URLs could be added to the same map used for links.

There would be only one REST API change:

  1. For every PostView and CommentView, add a local_url_map property.

Pseudo response example:

Post response on lemmy.ml/c/memes by lemmy.world user, viewed on lemm.ee:

{
  "post": {
    "content": "Hi, check out my comment: https://lemmy.world/post/123",
    "thumbnail_url": "https://lemmy.world/pictrs/mypic.jpg",
    "local_url_map": {
      "https://lemmy.world/post/123": "https://lemm.ee/post/321",
      "https://lemmy.world/pictrs/mypic.jpg": "https://lemm.ee/pictrs/mypic.jpg"
    }
  }
}

Some benefits:

  1. Clients only need to make one assumption: if an image src or link href is found in local_url_map, the client may use the mapped URL (but doesn't have to).
  2. The business logic of how proxying occurs (URL format, etc.) is abstracted from clients. So if Lemmy wants to change the image proxy endpoint (/api/v3/image_proxy) in the future, frontends wouldn't need any changes.
  3. Clients can fall back to the original image or link if the proxied one returns a server error.
  4. Lemmy gets more fine-grained control over what is proxied. For example, when receiving a newly federated post, Lemmy could download the images asynchronously and add them to local_url_map later, after they have been successfully stored locally.
  5. Unproxyable images, like imgur, could be omitted from local_url_map. Lemmy would have complete control over this. The nice thing is that clients don't care: they may use the local URL if it is available, but the map doesn't have to be fully populated.
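A sketch of benefits 3 and 5 from the client's side, assuming the proposed (not existing) `local_url_map` field: the client tries the mapped URL first and falls back to the original, and an image missing from the map simply uses its original URL. `ProposedPost` and `imageCandidates` are hypothetical names.

```typescript
// Hypothetical response shape from the proposal (not an existing Lemmy API).
interface ProposedPost {
  content: string;
  thumbnail_url?: string;
  local_url_map: Record<string, string>;
}

// URLs a client could try for an image, in order: the local (proxied) copy
// if the server provided one, then the original source.
function imageCandidates(original: string, map: Record<string, string>): string[] {
  const local = Object.prototype.hasOwnProperty.call(map, original) ? map[original] : undefined;
  return local !== undefined && local !== original ? [local, original] : [original];
}

const examplePost: ProposedPost = {
  content: "Hi, check out my comment: https://lemmy.world/post/123",
  thumbnail_url: "https://lemmy.world/pictrs/mypic.jpg",
  local_url_map: {
    "https://lemmy.world/post/123": "https://lemm.ee/post/321",
    "https://lemmy.world/pictrs/mypic.jpg": "https://lemm.ee/pictrs/mypic.jpg",
  },
};

if (examplePost.thumbnail_url) {
  console.log(imageCandidates(examplePost.thumbnail_url, examplePost.local_url_map));
}
```

A browser client could set an `<img>` element's `src` to the first candidate and switch to the next one from an `error` event handler.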

Potential drawback:

  1. A more verbose response payload than providing a global proxy config in GetSite (gzip may mitigate this).

Of course, Lemmy could proxy as much or as little as it wants... maybe we just put fedi URLs in the proxy map, not images, to start, idk.

@dessalines
Member

Can you give an example of what providing proxy information for thumbnail_url in GetSite would look like? As an alternative, I was thinking even image URLs could be added to the same map used for links.

We'd just expose the Pictrs image mode, then force clients to build their own URLs for every non-local image if proxy mode is turned on.

We wouldn't provide any URL maps, clients would have to check those fields (and scan through markdown bodies), to rewrite their own links.
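For comparison, a sketch of this alternative, where clients build proxy URLs themselves from settings exposed in GetSite. The `ImageProxySettings` fields are hypothetical (including the `noProxyHosts` exclusion list), and the `/api/v3/image_proxy?url=` format is assumed from the endpoint named earlier in the thread.

```typescript
// Hypothetical subset of site settings a client might read from GetSite.
interface ImageProxySettings {
  instanceOrigin: string;   // e.g. "https://lemm.ee"
  proxyAllImages: boolean;  // whether non-local images should be proxied
  noProxyHosts: string[];   // hosts that must not be proxied (hypothetical)
}

// Build the image URL a client would display. The proxy path and "url"
// query parameter are assumptions about the endpoint format.
function displayImageUrl(source: string, s: ImageProxySettings): string {
  const src = new URL(source);
  const origin = new URL(s.instanceOrigin);
  const isLocal = src.host === origin.host;
  if (!s.proxyAllImages || isLocal || s.noProxyHosts.includes(src.host)) {
    return source;
  }
  const proxied = new URL("/api/v3/image_proxy", s.instanceOrigin);
  proxied.searchParams.set("url", source);
  return proxied.toString();
}
```

The trade-off discussed below is visible here: the URL format and the exclusion rules live in every client rather than on the server.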

@aeharding
Author

We'd just expose the Pictrs image mode, then force clients to build their own URLs for every non-local image if proxy mode is turned on.

I see. That could work, although it has the downside of forcing clients to build URLs, which could limit Lemmy's flexibility to change that logic in the future.

Also, how would that work if Lemmy needs to disable proxying for certain domains like imgur? Would Lemmy need to send that domain blacklist in GetSite?

On the other hand, the logic is much simpler with a local_url_map solution. I made a proof of concept on StackBlitz to show how I think a client would implement the proxying with thumbnail URLs and markdown mapping, check it out! It is super straightforward with markdown-it. You can even set IS_SCRAPER = true to see that nothing is proxied. :) Let me know what you think.

https://stackblitz.com/edit/vitejs-vite-rukomjdn?file=src%2Fmain.ts
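The proof of concept itself lives at the link above. As a much smaller sketch of the per-link decision such a renderer hook would make (for example inside a markdown-it `link_open` rule), assuming a `local_url_map`-style object and a boolean like `IS_SCRAPER`; `resolveHref` is a hypothetical name:

```typescript
type LocalUrlMap = Record<string, string>;

// Decide which href to emit for one link. Scrapers always get the author's
// original URL; other readers get the local URL when the server mapped one.
function resolveHref(original: string, map: LocalUrlMap, isScraper: boolean): string {
  if (isScraper) return original;
  return Object.prototype.hasOwnProperty.call(map, original) ? map[original] : original;
}
```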

@Nutomic
Member

Nutomic commented Mar 27, 2025

Links are only rewritten by your local instance (where your account is registered) and by the instance where a post is displayed. So the only way someone could see your post containing a rewritten lemmynsfw link is by viewing your post on that site. A link to the original post on your instance is included (with the Fediverse icon), so it's clear that this is changed from what you wrote. Lemmy also includes link rel="canonical" specifically so that search engines index posts correctly.

Honestly, your concern about having posts shown containing links with the lemmynsfw domain is only a theoretical problem. In practice, a much more likely problem is that someone would create an account with your name to impersonate you and post all kinds of malicious things. URL rewriting and image proxying were implemented like this specifically to make things easier for client devs, so it "just works". Implementing the solutions you suggest would require a lot of work from all client devs, when they already have a lot on their plate with the 1.0 API changes. It's really not worth the effort, so I would close this as wontfix.
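For reference, a canonical link is a single tag in the page head pointing at the preferred URL. A hypothetical helper that emits one for a post's original URL, with attribute escaping, might look like this (a sketch, not lemmy-ui's actual code):

```typescript
// Escape the characters that matter inside a double-quoted HTML attribute.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Emit a canonical link pointing at the post's original (home instance) URL,
// so crawlers attribute the content to where it was written.
function canonicalLinkTag(originalPostUrl: string): string {
  return `<link rel="canonical" href="${escapeAttr(originalPostUrl)}">`;
}
```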

@dessalines
Copy link
Member

dessalines commented Mar 27, 2025

I think there is potential value in only ever using source images everywhere in the back end. Not just for app cases like this, but also because it simplifies our code, and federated images and thumbnails will never run into proxy chaining when federating out.

Take, for instance, our markdown body rewriting or the complicated thumbnail logic: all of that could be removed.

I'm interested in what others have to say about this (especially other app devs):

cc @SleeplessOne1917 @MV-GH @phiresky @dullbananas

@Nutomic
Member

Nutomic commented Mar 28, 2025

The code would get much more complicated with this because we would have to add new database tables to store url mappings, and include them in the api. The markdown logic would still be needed to generate these mappings in the first place. And when a community forwards a post to followers it uses the original activity without any changes, so there is no proxy chaining in that case.

@dessalines
Member

I'd say we don't do any URL mappings; each front end might want to use its own unique URLs anyway, so it'd be pointless to do it.

As long as we have reference examples in lemmy-ui and jerboa for how to rewrite links, I don't think it should be a problem for others to copy it.

@Nutomic
Member

Nutomic commented Mar 31, 2025

The problem is that it requires a lot of work to change something that already works perfectly fine. Lots of apps/clients won't have time for that, or won't even know about it, so they won't benefit from these Lemmy features. Not to mention all the work required to implement this in the Lemmy backend. It's really not worth it.

@dessalines
Member

It would only be deleting things in the back-end, exposing the pictrs image mode in GetSite, and making sure all our proxying functions are deleted. It'd be easy to do, but I don't think we should until we get some more opinions from app devs.

@Nutomic
Member

Nutomic commented Apr 1, 2025

No, the backend still has to parse all incoming posts for image links and store them in the RemoteImage table to pass this check (which prevents Lemmy from being abused as a general-purpose proxy). So only the logic to replace links with proxied variants could be removed; the entire parsing logic is still needed.

And ActivityPub links still have to be dereferenced in the backend, then stored in a table activitypub_dereferenced_links (original: text, local: text). Then any data with markdown (post, comment, community sidebar, private message, etc.) would somehow have to be joined to that table so that clients can replace the links. That also requires SQL relations from all these different tables to activitypub_dereferenced_links.
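A simplified sketch of the two steps described here: pulling image URLs out of incoming markdown, and only allowing the proxy to fetch URLs that were recorded that way. The regex stands in for a real markdown parser and the `Set` stands in for the RemoteImage table; both function names are hypothetical.

```typescript
// Extract image URLs from markdown image syntax: ![alt](url), ![alt](<url>)
// or ![alt](url "title"). A regex is a simplification of real parsing.
function extractImageUrls(markdown: string): string[] {
  const urls: string[] = [];
  const re = /!\[[^\]]*\]\(\s*<?([^)\s>]+)>?(?:\s+"[^"]*")?\s*\)/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(markdown)) !== null) {
    urls.push(m[1]);
  }
  return urls;
}

// Only proxy URLs previously seen in federated content, so the proxy endpoint
// cannot be used to fetch arbitrary URLs.
function mayProxy(url: string, knownRemoteImages: ReadonlySet<string>): boolean {
  return knownRemoteImages.has(url);
}
```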
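In the backend this would be a SQL join, but the logic can be sketched in memory: given rows of the proposed activitypub_dereferenced_links table, build the map for one markdown body. Apart from the table and column names from the comment, everything here is a hypothetical illustration.

```typescript
// One row of the proposed activitypub_dereferenced_links table.
interface DereferencedLink {
  original: string;
  local: string;
}

// Collect http(s) URLs from a markdown body. Deliberately simple: stops at
// whitespace and common closing delimiters, and does not trim trailing punctuation.
function extractLinks(markdown: string): string[] {
  return markdown.match(/https?:\/\/[^\s)\]>"]+/g) ?? [];
}

// Build the per-object map a client would receive: only links that appear in
// this body and have a dereferenced local counterpart are included.
function buildLocalUrlMap(markdown: string, rows: DereferencedLink[]): Record<string, string> {
  const byOriginal = new Map(rows.map((r): [string, string] => [r.original, r.local]));
  const result: Record<string, string> = {};
  for (const link of extractLinks(markdown)) {
    const local = byOriginal.get(link);
    if (local !== undefined) result[link] = local;
  }
  return result;
}
```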

@dessalines
Member

dessalines commented Apr 1, 2025

No, the backend still has to parse all incoming posts for image links and store them in the RemoteImage table to pass this check (which prevents Lemmy from being abused as a general-purpose proxy). So only the logic to replace links with proxied variants could be removed; the entire parsing logic is still needed.

I think the only change there would be that we only ever store real image links, never proxied ones. On the apub side, we'd only ever send, receive, or store real image links, never proxied ones.
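Storing only real image links implies normalizing any proxied URL back to its source before saving or federating it. A sketch, assuming proxied URLs have the `/api/v3/image_proxy?url=<encoded source>` form mentioned earlier in the thread; `unwrapProxiedUrl` is a hypothetical helper, and the loop also undoes proxy chaining of the kind raised above.

```typescript
function tryParse(input: string): URL | null {
  try {
    return new URL(input);
  } catch {
    return null;
  }
}

// If a URL points at the (assumed) image proxy endpoint, return the source
// URL it wraps; otherwise return it unchanged. Loops to unwrap proxy-of-proxy
// chains, with a depth cap as a safety bound.
function unwrapProxiedUrl(input: string): string {
  let current = input;
  for (let depth = 0; depth < 10; depth++) {
    const parsed = tryParse(current);
    if (!parsed || parsed.pathname !== "/api/v3/image_proxy") return current;
    const inner = parsed.searchParams.get("url");
    if (!inner) return current;
    current = inner;
  }
  return current;
}
```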

Projects
None yet
Development

No branches or pull requests

3 participants