
What is the correct nofollow/noindex code snippet?

Hey all,

I have two questions regarding noindex/nofollow for search engines.

  1. What is the correct code snippet to stop search engines from crawling a page? I have seen 3 different variations on the forums that were marked as the correct answer. Here are two different examples I have seen:

<meta name="robots" content="noindex, follow" />
<meta name=”robots” content=”noindex, nofollow”>

One includes a “/” at the end, and the attribute values are wrapped in different types of quotes (straight " versus curly ”).

I don’t actually know how to verify that the nofollow/noindex is being recognized, so I can’t test each one. Can someone please let me know which is correct?

  2. My second question is whether I am using noindex/nofollow correctly. I have several empty collection pages. I was planning on adding the noindex/nofollow code inside the head tag of these pages, as I don’t want Google to think I have pages with no content. Should I be adding this code to these pages? Should I also include it on my 404 page? (I saw this briefly mentioned on a different forum.)
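For the first question, here is a rough sketch of how one could check which robots meta directives a page actually serves, using only Python’s standard-library HTML parser. The sample HTML below is made up for illustration, not taken from my site:

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collects the content of every <meta name="robots"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser routes self-closing tags (<meta ... />) here too,
        # so both straight-quoted variants are picked up the same way.
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            self.directives.append(attrs.get("content", ""))

# Made-up sample page head, just for illustration
sample = """
<head>
  <meta name="robots" content="noindex, nofollow">
  <meta name="description" content="ignored by the finder">
</head>
"""

finder = RobotsMetaFinder()
finder.feed(sample)
print(finder.directives)  # ['noindex, nofollow']
```

In a real check, the HTML would come from fetching the live page (for example with urllib.request) and feeding it to the parser the same way.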

Thanks for any help you all can offer. I truly appreciate it!

Here is my share link and staging link if it’s needed for this question.

  • Michael

Just bumping this up if anyone has a definitive answer 🙂

Edit: @samliew @vincent any chance you guys could help me here when you get a chance? I would really appreciate it!

Hello Michael,

Use a robots.txt file instead. This is what I recommend for your robots.txt and sitemap.xml settings in Webflow. It will allow the search engines to index all your site pages. Since Webflow is dynamic and you probably use the CMS for dynamic pages, I don’t recommend listing all your pages out manually, though you can take that approach if you prefer.

If you want to exclude a page or folder, you can use this syntax:
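For example, with placeholder folder and page names:

```txt
User-agent: *
Disallow: /folder-name/
Disallow: /some-page
```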

Hope this helps!

Bryan

@bgarrant Thanks so much for the reply, I really appreciate it. Just to make sure I don’t get this wrong and hide a bunch of things from Google, can you clarify my setup:

  1. I have a group of collection pages. The pages for Teams, Locations, Testimonials, and Hero Galleries are all empty. I display the CMS data from these collections on pages other than their individual CMS Collection Page, though.

  2. So I want to hide the above 4 pages from Google because I don’t want the search engine to think I have empty content. However, I definitely want Google to crawl all of my other pages.

  3. Is it best practice to also disallow legal pages like Privacy Policy?

  4. My site settings/robots.txt looks like the following then:

    User-agent: *
    Disallow: /team/
    Disallow: /location/
    Disallow: /testimonial/
    Disallow: /hero-gallery/
    Disallow: /terms-and-conditions/
    Disallow: /privacy-policy/

Does this all look correct to you for the goals of my specific project?
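As a sanity check on my own rules, here is a quick sketch that runs them through Python’s standard-library robots.txt parser to confirm which URLs end up blocked (the example URLs are just from my own site):

```python
from urllib.robotparser import RobotFileParser

# The same rules as in my site settings above
rules = """\
User-agent: *
Disallow: /team/
Disallow: /location/
Disallow: /testimonial/
Disallow: /hero-gallery/
Disallow: /terms-and-conditions/
Disallow: /privacy-policy/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Pages under the disallowed folders are blocked...
print(parser.can_fetch("*", "https://www.hosshomes.com/location/cicero"))  # False
# ...while everything else stays crawlable.
print(parser.can_fetch("*", "https://www.hosshomes.com/"))  # True
```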

Thanks again, I’m really thankful for the help.

It looks good to me, Michael. To be clear, once you add these entries to block the folders, any pages within those folders will also be blocked from crawling.


Thanks again, @bgarrant. Yes, I do understand that. So if I disallow “/location/”, then “https://www.hosshomes.com/location/cicero” would also not be crawled.

Last question, is it best practice to add these disallows for your legal pages, /privacy-policy for example?

Thanks!

I normally block those. I can’t think of any real reason to have them indexed.


@bgarrant Thanks for all the help!