Posted on 11/09/2023 23:15:08
Hi Allan
Thanks for writing.
The "Show in sitemap" setting not being respected for product groups (and pages, for that matter) is a bug. It has just been fixed but not yet released; the fix is part of a set of changes to sitemap.xml tracked as devops#15253. These are the changes:
- Respect the "Show in sitemap" setting on pages and product groups.
- Remove variants from sitemap.xml by default, since a variant is the same product as its master, and therefore duplicate content by your definition. (This change is mostly to limit the amount of data.) Variants can still be enabled.
- If a primary domain is set for a website, it will be used instead of the host of the current request.
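To make the three rules concrete, here is a minimal sketch in Python. It is an illustration only, not Dynamicweb's implementation: `Entry`, `build_sitemap` and the `include_variants` flag are hypothetical names, and the data model is a flattened assumption of what a page, product group or product looks like by the time the sitemap is rendered.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

@dataclass
class Entry:
    # Hypothetical flattened view of a page, product group or product.
    path: str              # e.g. "/shop/bikes"
    show_in_sitemap: bool  # the "Show in sitemap" setting
    is_variant: bool = False

def build_sitemap(entries, request_url: str,
                  primary_domain: Optional[str] = None,
                  include_variants: bool = False) -> str:
    # Prefer the website's primary domain; fall back to the request host.
    parts = urlsplit(request_url)
    host = primary_domain or parts.netloc
    base = f"{parts.scheme}://{host}"

    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for e in entries:
        if not e.show_in_sitemap:
            continue  # respect "Show in sitemap"
        if e.is_variant and not include_variants:
            continue  # variants are left out unless explicitly enabled
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base + e.path
    return ET.tostring(urlset, encoding="unicode")
```

The filters run before any URL is built, so hidden entries and variants never reach the output; swapping the host is a single decision made once per request rather than per entry.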
With the underlying API change, we might also be able to give you a setting that lists products with multiple URLs only once, using the primary URL or, if there is none, the first URL created.
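A sketch of how such a setting could pick the one URL to list, assuming each product URL carries a creation timestamp and a primary flag. `ProductUrl` and `pick_sitemap_url` are hypothetical names for illustration; the setting itself does not exist yet.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class ProductUrl:
    # Hypothetical: one of several URLs a product can be reached at
    # (e.g. one per product group it belongs to).
    url: str
    created: datetime
    is_primary: bool = False

def pick_sitemap_url(urls: list[ProductUrl]) -> Optional[str]:
    """Return the single URL to list for a product, or None if it has none."""
    if not urls:
        return None
    for u in urls:
        if u.is_primary:
            return u.url
    # No primary URL: fall back to the first one created.
    return min(urls, key=lambda u: u.created).url
```

Falling back to the oldest URL keeps the choice stable between sitemap builds, which is friendlier to crawlers than a genuinely random pick.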
Nobody agrees on how these sitemaps should look, so it is hard for us to make a solution that works for everyone. But I am game for adding a bunch of checkboxes so it can be configured to your local flavor of SEO expert knowledge.
We also have an example of how to create your own ruleset here: https://github.com/dynamicweb/Swift/blob/main/Swift/Files/Templates/Designs/Swift/Swift_Sitemap.xml.cshtml
Related to duplicate content (opinionated)
There is duplicate content, and then there is the same page having two URLs; those are two very different situations.
What you describe is the same page having two URLs. That is not harmful, and it is not duplicate content. At most, Google will pick one URL over another when showing the page in search results, and sometimes it will pick one you do not prefer. By using the primary group and making sure your implementation puts the primary URL in the canonical tag of the page, you should have that handled just fine. According to Google's documentation, canonicals take precedence over URLs in sitemap.xml.
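The canonical part can be illustrated with a small Python sketch (the real templates are Razor, as in the Swift example linked above). The idea is that every URL serving the product renders the same `<link rel="canonical">` pointing at the primary URL; `canonical_link` and the example URLs are hypothetical.

```python
from html import escape

def canonical_link(primary_url: str) -> str:
    """Render the <link rel="canonical"> tag for a page's <head>."""
    # Escape &, <, > and quotes so the URL is safe inside the attribute.
    return f'<link rel="canonical" href="{escape(primary_url, quote=True)}">'

# Hypothetical product reachable under two product groups: both URLs
# render the same page, so both emit the same canonical tag.
PRIMARY = "https://www.example.com/bikes/city-bike"
ALTERNATES = ["https://www.example.com/bikes/city-bike",
              "https://www.example.com/sale/city-bike"]

for requested in ALTERNATES:
    print(requested, "->", canonical_link(PRIMARY))
```

The canonical depends only on the product's primary URL, never on the requested URL, which is what lets the alternate URL exist without competing with the primary one.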
Duplicate content is when you repeat the same content on multiple pages, e.g. having the same product description for many products or inserting the same paragraph all over the website. Even then, this is not really a bad duplicate content issue, just bad and confusing content.
A really bad duplicate content problem is when you copy content from other sites to your site, e.g. reusing product descriptions from a source that supplies them to many others, so the same text is available on many websites.
All that aside - I am very much aware you probably disagree.
An extremely old (2008) blog post from Google making the same point about URLs vs. content: https://developers.google.com/search/blog/2008/09/demystifying-duplicate-content-penalty
Another interesting aspect of this issue: https://www.searchenginejournal.com/truth-about-duplicate-content/442265/
And another: https://www.searchenginejournal.com/seo/seo-myths/