
Auto Generate Sitemap publish and unpublish


#1

Two bugs:
(1) When I switch on Auto Generate Sitemap in the site settings and publish, everything is fine: I can see the sitemap at domain.com/sitemap.xml. However, when I toggle auto generate back off and publish again, I can still see the same sitemap at domain.com/sitemap.xml. If this is intentional, it is certainly confusing: I left the custom sitemap section blank, so I would expect the sitemap to be deleted.

(2) I think this is a bug, but not sure.
Under robots.txt, I put in:
User-agent: *
Disallow: /template/*
The goal is to stop everything under /template from being crawled. I've tried this both with and without the *. When I auto-generate the sitemap, I still see all the pages under /template in sitemap.xml. I'm not fluent in how this works; maybe this is correct, and the sitemap still lists the pages while robots simply won't crawl /template. I figured I would report it anyway, as it is somewhat related to (1) above.


#2

For 1: anything you create and publish stays published until you unpublish and publish again. Use the unpublish link in the publish dialog to unpublish, then publish again; the XML should be gone.

It may solve 2 as well. Can you let me know?


#3

For 1: I just tested your theory. I unpublished the site and checked domain.com/sitemap.xml, which gave me a 404, as expected. I then went back, toggled auto gen to off (custom sitemap is blank), saved, and published the site. My old domain.com/sitemap.xml is still there, so I think your theory is incorrect.

For 2: I don't really know how this is supposed to work. If I have disallowed a path in robots.txt, does the auto-generated sitemap skip those pages, or does it still include them? For me, it is still including those pages in the sitemap. If that is by design, then it is fine. If not, then it is a bug.


#4

To be clear, I am simply reporting a bug. My own sitemap is fine because I use the custom sitemap option: I took the auto-generated sitemap, deleted a few lines, pasted the result into the custom sitemap field, and published.

I am just reporting what I think is a bug.
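
The manual workaround described above (take the auto-generated sitemap, delete the unwanted entries, paste the result back in as a custom sitemap) could also be scripted. The following is only a minimal sketch using the Python standard library; the function name `drop_urls_under`, the sample URLs, and the `/template/` prefix are illustrative assumptions, not anything Webflow provides.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def drop_urls_under(sitemap_xml: str, path_prefix: str) -> str:
    """Return the sitemap with every <url> whose path starts with path_prefix removed."""
    ET.register_namespace("", SITEMAP_NS)  # keep the default xmlns on output
    root = ET.fromstring(sitemap_xml)
    # Iterate over a copy, since entries are removed from root inside the loop.
    for url in list(root.findall(f"{{{SITEMAP_NS}}}url")):
        loc = (url.findtext(f"{{{SITEMAP_NS}}}loc") or "").strip()
        if urlparse(loc).path.startswith(path_prefix):
            root.remove(url)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sitemap content, standing in for the auto-generated file.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://domain.com/</loc></url>
  <url><loc>https://domain.com/about</loc></url>
  <url><loc>https://domain.com/template/page-a</loc></url>
</urlset>"""

if __name__ == "__main__":
    print(drop_urls_under(SAMPLE, "/template/"))
```

The output omits the XML declaration line, so that would need to be added back by hand before pasting the result into the custom sitemap field.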


#5

I stand corrected then, thanks for testing this. I didn't mean to dismiss your post.

I went to check on my own sites. I can browse http://larochelle.today/sitemap.xml, which is a site I've never activated the sitemap option for. Only sites that have no hosting defined lack a sitemap (inb4.webflow.io/sitemap.xml, for example).

I've recategorized your post as a bug.

@waldo could you have a look at that? What's the use of the toggle if it's on anyway?


#6

Hi @vincent and @harveywun, thank you so much for reaching out, and excellent questions!

Our auto-generated sitemap does not remove pages from sitemap.xml when a custom robots.txt rule disallows them. Referencing this article.

An XML sitemap shouldn't override robots.txt. If you have Google Webmaster Tools set up, you will see warnings on the sitemaps page that pages blocked by robots.txt are being submitted.

Now, robots.txt does not prevent indexation, just crawling. So if the pages were indexed before robots.txt was put in place, they may continue to be indexed. Google will also display just the URL for pages that it has discovered but can't crawl because of robots.txt.
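
To see in advance which sitemap entries would trigger that kind of "blocked by robots.txt" warning, the sitemap URLs can be compared against the robots.txt rules locally. This is a rough sketch with Python's standard `urllib.robotparser`; the helper name `blocked_urls` and the sample URLs are made up for illustration. One caveat, to the best of my knowledge: the standard-library parser does plain prefix matching and does not interpret `*` wildcards inside paths, so the rule is written as `Disallow: /template/` here rather than `/template/*`.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: list[str], user_agent: str = "*") -> list[str]:
    """Return the URLs that the given robots.txt rules disallow for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from text, no network access
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


# Rules modelled on the ones in this thread (without the trailing wildcard).
ROBOTS_TXT = """User-agent: *
Disallow: /template/
"""

# Hypothetical URLs, standing in for the <loc> entries of sitemap.xml.
SITEMAP_URLS = [
    "https://domain.com/",
    "https://domain.com/about",
    "https://domain.com/template/page-a",
]

if __name__ == "__main__":
    for url in blocked_urls(ROBOTS_TXT, SITEMAP_URLS):
        print("listed in sitemap but disallowed by robots.txt:", url)
```

Any URL this reports is one that the sitemap submits while robots.txt blocks it from being crawled, which is the mismatch discussed in this thread.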

I have gone ahead and changed this to a Wishlist item and created the item here to vote on: https://wishlist.webflow.com/ideas/WEBFLOW-I-211

Please let me know if this is helpful, or if you have any additional questions, I'm happy to help further. :slight_smile:


#7

#8

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.