Indexing issue regarding Astro v5 Beta and Astro v4 #10673

Open
ArmandPhilippot opened this issue Jan 9, 2025 · 19 comments
Labels
site improvement Some thing that improves the website functionality - ask @delucis for help!

Comments

@ArmandPhilippot
Member

📋 Explain your issue

I just came across this link after a Google search: https://5-0-0-beta--astro-docs-2.netlify.app/en/getting-started/

The content might be out of date if someone refers to it thinking they're on Astro Docs. And duplicated content between this website and the real Docs website could harm SEO I think... (both use canonical but each with their own URL so I think Google can't determine which one is the real "original" and may suggest the beta website instead of the current docs...)

So I think we should handle this.

And quoting Chris: "We need to handle this and also https://v4.docs.astro.build/ properly and I don’t think we did. We’ll need to figure out if we can actually kill the v5 beta one, or if we need to update how it is indexed"

@sarah11918 sarah11918 assigned sarah11918 and delucis and unassigned sarah11918 Jan 10, 2025
@sarah11918
Member

Thanks for filing this issue so it's on the radar for @delucis ! 🙌

@delucis
Member

delucis commented Jan 11, 2025

Thanks for helping track this. Small update: I updated and redeployed the v5 branch (https://github.com/withastro/docs/tree/5.0.0-beta) to use docs.astro.build for canonical URLs, so that should hopefully filter through and remove beta URLs from search results.

For the v4 branch (https://github.com/withastro/docs/tree/v4), I’ve tried a different approach for now: adding an X-Robots-Tag: noindex header to all pages to tell search engines not to index them. My thinking here was that the v4 branch is longer lived and its content is less likely to match the v5 canonical content, so just requesting that it not be indexed may make more sense. But we can monitor it (AFAIK the v4 subdomain hasn’t caused issues in search results just yet).
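
As a rough illustration of the header-based approach described above, the sketch below checks whether a set of response headers asks crawlers not to index a page. The helper name `hasNoindexHeader` and the plain-object header shape are assumptions made for this example; it is not code from the docs repository, and it deliberately ignores user-agent-scoped values such as `googlebot: noindex`.

```typescript
// Hypothetical helper: returns true when an X-Robots-Tag header contains
// a "noindex" (or the equivalent "none") directive.
function hasNoindexHeader(headers: Record<string, string>): boolean {
  for (const name of Object.keys(headers)) {
    // Header names are case-insensitive, so normalise before comparing.
    if (name.toLowerCase() !== "x-robots-tag") continue;
    // A single header value can carry several comma-separated directives.
    const directives = headers[name]
      .toLowerCase()
      .split(",")
      .map((directive) => directive.trim());
    if (directives.includes("noindex") || directives.includes("none")) {
      return true;
    }
  }
  return false;
}

console.log(hasNoindexHeader({ "X-Robots-Tag": "noindex" })); // true
console.log(hasNoindexHeader({ "content-type": "text/html" })); // false
```

In practice the header object could be built from the response of a `HEAD` request to one of the v4 pages.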

@sarah11918
Member

Thanks @delucis ! Is there follow up/monitoring to do here? How will we know when we're "done" and the issue can be closed?

@sarah11918 sarah11918 added the site improvement Some thing that improves the website functionality - ask @delucis for help! label Jan 13, 2025
@delucis
Member

delucis commented Jan 13, 2025

I’m not sure 😁 Basically, yes, I’d expect this to take a little time to resolve so might need monitoring. Maybe @ArmandPhilippot could share the search query that returned the 5-0-0 deploy URL? Then once that seems to be working we can close?

@ArmandPhilippot
Member Author

Good thing I didn't delete the history... because I forgot to note it somewhere. 😅
I don't remember why I was searching for that but, from the date and time, it was "@astrojs/mdx" component.

I just checked in Google (in the first 3 pages) and I no longer see the v5 beta! However, I see: https://v4--astro-docs-2.netlify.app/fr/guides/integrations-guide/mdx/
The x-robots-tag appears in the headers, so I guess it might be a matter of days before Google decides to remove it from the search results.

@delucis
Member

delucis commented Jan 13, 2025

Thank you! I get a v5 branch URL on the 3rd page of results on Google (in Korean for some reason 😅):

Google search for astrojs/mdx component showing a result pointing to the v5 branch deployment of docs

But hopefully that will clear up and in any case 3rd page isn’t a disaster as most people don’t end up there (especially these days given how bad most results are…)

@delucis
Member

delucis commented Jan 13, 2025

I can also see 5-0-0 URLs showing up as not being indexed in the Google search console which is good.

@ArmandPhilippot
Member Author

Yeah, even I don't look at all the pages anymore. 😆 So, I just tried some less specific queries to check the results:

  • with @astrojs/mdx (no quotes) on google.fr, I still see the Netlify v4 website on the first page, then the Korean doc on page 2.
  • with astro mdx on google.com, the fifth result is a 5-0-0-beta URL...

Well, site:5-0-0-beta--astro-docs-2.netlify.app still gives a lot of results... 30 result pages until I get:

In order to display the most relevant results, we have omitted some entries that are very similar to the current 298 entries.

I guess I was optimistic when I said "matter of days"... maybe a little longer than that. 😅 But, at least, it means that some pages have already been removed!

@delucis
Member

delucis commented Feb 7, 2025

What do we think about closing this? Just did a quick check and didn’t see v4 or v5 URLs in early results and both sites now have correct canonicals pointing to docs.astro.build. They do still seem to show up sometimes either on results page 3+ or if you explicitly search with a site: filter, but I don’t think we can do much more?

@ArmandPhilippot
Member Author

I don't see any 5-0-0-beta URLs anymore (in the first 3 pages) so I think we're good for those!
However, I still see v4--astro-docs-2.netlify.app on the first page (third result) with @astrojs/mdx. But I'm not sure there's anything more we can do... I guess it just takes more time. So, yeah, I think we can close this issue.

@delucis
Member

delucis commented Feb 7, 2025

Ooh, I hadn’t spotted those in my searches. That URL doesn’t have the correct canonical, so maybe we need to fix it too:

<link rel="canonical" href="https://v4--astro-docs-2.netlify.app/en/getting-started/"/>

Netlify does set the x-robots-tag: noindex header because it’s a preview URL but I guess it’s not working?
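
To make this kind of check repeatable, here is a minimal sketch that pulls the canonical URL out of a page's HTML and compares its host with the one you expect. The function names `extractCanonical` and `canonicalPointsTo` are hypothetical, and the regex-based parsing is an assumption that holds for simple `<link>` tags like the one quoted above; a real audit would likely use a proper HTML parser.

```typescript
// Hypothetical helper: finds the href of the first <link rel="canonical"> tag.
function extractCanonical(html: string): string | null {
  const linkTags = html.match(/<link\b[^>]*>/gi) ?? [];
  for (const tag of linkTags) {
    if (!/\brel\s*=\s*["']canonical["']/i.test(tag)) continue;
    const href = tag.match(/\bhref\s*=\s*["']([^"']+)["']/i);
    if (href) return href[1];
  }
  return null;
}

// Hypothetical helper: true when the canonical URL is absolute and on expectedHost.
function canonicalPointsTo(html: string, expectedHost: string): boolean {
  const canonical = extractCanonical(html);
  if (canonical === null) return false;
  try {
    return new URL(canonical).hostname === expectedHost;
  } catch {
    // Relative or malformed canonical URLs cannot be attributed to a host.
    return false;
  }
}

const v4Head =
  '<link rel="canonical" href="https://v4--astro-docs-2.netlify.app/en/getting-started/"/>';
console.log(extractCanonical(v4Head));
// https://v4--astro-docs-2.netlify.app/en/getting-started/
console.log(canonicalPointsTo(v4Head, "docs.astro.build")); // false
```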

@ArmandPhilippot
Member Author

Either it's not working or it takes more time... Since those pages are no longer updated maybe Google doesn't crawl them as often... Not sure what happens here. But, yeah, updating the canonical could help I think.

Oh, I hadn't gone any further with this query: on page 2 I also see a 5-0-0-beta URL, still the Korean version. So it doesn't seem to have changed much since my last test... This seems fixed with astro mdx though.

@delucis
Member

delucis commented Feb 7, 2025

OK, I made docs.astro.build the canonical URL for the v4 branch in b320aef

It’s a bit of an odd one to set because we do it via site so it does have a few other side effects, like impacting open-graph image URLs. That will mean pages that have moved in the live docs might have broken OG image links in the v4 docs, but I guess it’s OK to sacrifice that for this archive copy.
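
The side effect described above can be sketched with plain URL resolution: when absolute URLs are derived from a single base value, changing that base moves canonical links and OG image links together. The helper `absoluteUrl` and the OG image path are made up for illustration and are not taken from the docs codebase.

```typescript
// Hypothetical helper mirroring how a configured base URL can be combined
// with a page path to build absolute URLs (canonical links, OG images).
function absoluteUrl(site: string, path: string): string {
  return new URL(path, site).href;
}

// Assumed example paths; the OG image path is invented for this sketch.
const pagePath = "/en/getting-started/";
const ogImagePath = "/open-graph/en/getting-started.webp";

const previewSite = "https://v4--astro-docs-2.netlify.app";
const liveSite = "https://docs.astro.build";

console.log(absoluteUrl(previewSite, pagePath));
// https://v4--astro-docs-2.netlify.app/en/getting-started/
console.log(absoluteUrl(liveSite, pagePath));
// https://docs.astro.build/en/getting-started/
console.log(absoluteUrl(liveSite, ogImagePath));
// https://docs.astro.build/open-graph/en/getting-started.webp
```

With the base pointing at the live docs, an OG image URL only resolves if the live site still serves an image at that path, which is why pages that have since moved can end up with broken image links in the archive copy.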

@ArmandPhilippot
Member Author

I tested on Discord:
Screenshot showing two examples of links with OG images on Discord

I don't know about other platforms (e.g. Twitter) but at least Discord doesn't show a big red cross or something like that instead of the image. So yeah, I think it's okay if this can help with SEO... 🤞🏽

@ArmandPhilippot
Member Author

I was checking if there were any changes since last time:

  • astro mdx on google.com: no 5.0.0 URLs, but still a v4--astro-docs-2.netlify.app on page 3 (Korean page)
  • @astrojs/mdx on google.com: the first page is clean! But a v4--astro-docs-2.netlify.app is still available on page 2...

So it seems fixed for 5.0.0 and I'd say v4--astro-docs-2.netlify.app is in progress?

However, I discovered another URL that could affect SEO... on page 3 for @astrojs/mdx I found: astrojs.cn and the canonical URL uses astrojs.cn instead of docs.astro.build. But I don't think we can do much here...

@delucis
Member

delucis commented Feb 24, 2025

Oh! I guess that’s a mirror someone made? I know that Netlify (and in general sites hosted by US and European companies) are often quite slow for Chinese users, so I guess having a mirror is quite useful for those people. Would be nice if the non-Chinese pages still used docs.astro.build for canonicals, but yeah, probably not much we can do.

@sarah11918
Member

Thanks for keeping up with this one! FWIW, I just now did a search on Google for both of those and did not get either of the preview URLs on the first 5 pages. (I do have some kind of extension to remove AI from the top of Google Search results, so that may affect it?)

Do we think it's safe to close now?

@ArmandPhilippot
Member Author

We probably have different results related to geolocation. I still have a v4--astro-docs-2.netlify.app for @astrojs/mdx on page 2 with google.com (but geolocated in France):

Image

So I don't know... I guess the results are being updated for the v4--astro-docs-2.netlify.app URLs as well, since their position has moved in 3 weeks, but I can't say for sure that what Chris did is working and that they will disappear... But, on the other hand, I don't see what more we can do.

@delucis
Member

delucis commented Feb 28, 2025

I might need to remove the x-robots-tag: noindex header. I can try that.

We updated the canonicals after adding noindex and, IIRC, that might mean that Google doesn’t reindex the page (respecting the noindex header), but if the page is already indexed, it can also mean it is never removed. So maybe removing the header for the v4 deploys and letting Google slowly absorb the canonicals is the way to go.
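
The combination discussed above can be made explicit with a small classifier. This is only a sketch of the concern raised in this thread, not a description of documented search engine behavior; the type `IndexingSignals` and the function `describeSignals` are hypothetical names.

```typescript
// Hypothetical shape for the two signals a page can send.
type IndexingSignals = {
  noindex: boolean; // page is served with X-Robots-Tag: noindex
  canonicalHost: string | null; // host of the canonical URL, if any
};

// Labels the combination of signals relative to the primary docs host.
function describeSignals(signals: IndexingSignals, primaryHost: string): string {
  const pointsToPrimary = signals.canonicalHost === primaryHost;
  if (signals.noindex && pointsToPrimary) {
    // The case suspected above: the canonical may never be picked up.
    return "mixed: noindex plus canonical to primary";
  }
  if (signals.noindex) return "noindex only";
  if (pointsToPrimary) return "canonical to primary only";
  return "no deduplication signal";
}

console.log(
  describeSignals({ noindex: true, canonicalHost: "docs.astro.build" }, "docs.astro.build"),
); // mixed: noindex plus canonical to primary
console.log(
  describeSignals({ noindex: false, canonicalHost: "docs.astro.build" }, "docs.astro.build"),
); // canonical to primary only
```

Removing the header, as proposed, would move the v4 deploys from the first case to the second.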

delucis added a commit that referenced this issue Feb 28, 2025