Googlebot blocked (by robots.txt)

Hey, I launched my site two days ago and connected it to Google Search Console. However, the sitemap verification didn't go through (HTTP error). I specifically allowed it in the robots.txt and double-checked the head code, plus that all pages are included.

https://www.keinemaklerei.at/robots.txt
https://www.keinemaklerei.at/sitemap.xml

It looks like the site is blocked for bots, but I cannot explain why. I would be really grateful if someone could help me figure this out. :slight_smile:

Thank you so much, cheers, Constantin

Your robots.txt is wrong. Try:

User-agent: *
Disallow:
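
If you want to double-check what a crawler actually sees, Python's standard library ships a robots.txt parser. A minimal sketch (assuming Python 3; the URL is the live file from the original post):

from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt, then ask whether Googlebot
# may crawl the homepage.
parser = RobotFileParser("https://www.keinemaklerei.at/robots.txt")
parser.read()
print(parser.can_fetch("Googlebot", "https://www.keinemaklerei.at/"))
# True means crawling is allowed; False means the served file still blocks it.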

Hi Janne, are you sure? Am I getting this wrong?

https://www.robotstxt.org/robotstxt.html

I'm very sure, but do not add the /.
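
For reference, the two directives do opposite things, which is easy to misread:

User-agent: *
Disallow: /

blocks every compliant crawler from the whole site, while

User-agent: *
Disallow:

(an empty Disallow) allows everything.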

Ahh, thank you, I misunderstood that completely. :see_no_evil: I updated it and checked with the following URL, however still no luck :-/ Any ideas why?

https://search.google.com/test/mobile-friendly

That is a mobile test. Go to Google Search Console and add your URL.

I did that too. In Search Console, I deleted the old sitemap, where the HTTP error shows. Then I added it again, but the same error shows up.
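
One way to narrow down an "HTTP error" on a sitemap is to fetch it directly and inspect the response. A minimal sketch with Python's standard library (the URL is the one from the original post):

import urllib.request

# Request the sitemap and print the status code, content type,
# and the first bytes of the body.
url = "https://www.keinemaklerei.at/sitemap.xml"
with urllib.request.urlopen(url) as response:
    print(response.status)                        # expect 200
    print(response.headers.get("Content-Type"))   # expect an XML type
    print(response.read(200).decode("utf-8"))     # should start with <?xml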

I did the live test in Google Search Console; it states the URL is not available to Google.


[Screenshots: “URL not available” and “Crawling not allowed”]

Did you republish in Webflow after your change?

Yes, the new robots.txt was published before…

Have you verified your site in Google Search Console?

Yes, I did that three days ago. The test states that the robots.txt is the problem, but before this I already tried various things.

I always register the sitemap.xml for each domain variant:

http://www.domain.com/sitemap.xml
https://www.domain.com/sitemap.xml
http://domain.com/sitemap.xml
https://domain.com/sitemap.xml

Have you checked that the robots.txt is generated correctly?

Edit:
Check that the sitemap is turned on in Webflow.
Also, a blank robots.txt allows crawling.

Thank you for your help! I blanked the robots.txt now; it was correct before. The auto sitemap is on.

Eureka! :slight_smile: I didn't put the https://www.yourdomain.com domain in the property, and there was a cached version that prevented it from running. Thank you so much for your help!!! :slight_smile:

Yes, Googlebot can be blocked using the robots.txt file. If you use this file on your site, you can decide which parts of your site can be scanned and which cannot.
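
For example, a robots.txt that keeps crawlers out of a single directory while leaving the rest of the site open looks like this (the /private/ path and domain are just an illustration):

User-agent: *
Disallow: /private/

Sitemap: https://www.yourdomain.com/sitemap.xml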

Hi Guys,

I thought I would share, as this happened to us, and this is how to resolve it.

When hosting on Webflow, your robots.txt starts out blank. We then entered this to stop indexing whilst the site was being built:

User-agent: *
Disallow: /

We then just removed the text/content and republished, but the site was still being blocked. So we then entered this to allow the crawl and republished, but it was still being blocked:

User-agent: *
Disallow:

If you put any robots.txt in and publish, you can't just remove the text/code; you have to have the correct format in place. The problem is that Google can take up to 48 hours to correct the issue.
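
Before forcing anything, it is worth confirming what your server is actually serving right now, since Search Console may still be judging you by Google's cached copy. A small sketch (assuming Python 3; substitute your own domain):

import urllib.request

# Print the robots.txt exactly as it is served right now. If this already
# shows an empty Disallow but Search Console still reports a block, you are
# most likely looking at Google's cached copy.
url = "https://www.yourdomain.com/robots.txt"  # placeholder domain
with urllib.request.urlopen(url) as response:
    print(response.read().decode("utf-8"))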

All you have to do is click the link below, force it to re-fetch the corrected robots.txt, and then re-submit to Search Console, and all will be well!

https://www.google.com/webmasters/tools/robots-testing-tool?pli=1

Hope that helps


Amazing
THANKS A LOT