Prevent Webflow Content from Being Indexed

Hello,

I just realized that the PDFs stored in our Webflow site’s CMS are being indexed by Google. Is there a way to stop this?


Here is my public share link: LINK

No, there is not, because the files are stored on the shared CDN. You would need to host those files somewhere else.

Hi @webdev - is this still the case?

I came across an answer from @mistercreate here which said the following:

Yes, you can exclude PDFs by adding the following to the robots.txt field in your SEO settings:
Disallow: /*.pdf

Your robots.txt settings can be found in the SEO tab (/seo) of any project.

Hopefully this was helpful. Feel free to let me know if you have any additional questions.

I’ll be standing by to help further!

Yes. The / refers to your own domain, not the CDN. Because the files are served from the CDN's domain, a robots.txt exclusion on your site can’t be honored for them. If it could, anyone could block everyone else's content from being indexed. The solution is to host the PDFs on a platform where you control both the files and their indexing.
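
To make the domain point concrete, here is a minimal Python sketch of how a crawler decides which robots.txt applies to a URL: it looks at the root of the same scheme and host that serves the resource. The hostnames and the `robots_url_for` helper are made up for illustration; they are not Webflow's actual domains or any official API.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(resource_url: str) -> str:
    """Return the robots.txt URL a crawler would consult for this resource."""
    parts = urlsplit(resource_url)
    # robots.txt is looked up at the root of the same scheme and host,
    # regardless of which site links to the resource.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical hostnames, for illustration only.
page = "https://www.example.com/resources/guide"
pdf = "https://cdn.example-host.net/abc123/guide.pdf"

print(robots_url_for(page))  # robots.txt on your own domain
print(robots_url_for(pdf))   # robots.txt on the CDN's domain, which you don't control
```

The two calls resolve to different hosts, which is why a rule added to your site's robots.txt never reaches a file served from the CDN.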

Thanks Jeff. Do you have any suggestions of what’s easiest to use for this?

@alastairbudge - Easiest is subjective. I manage lots of servers and cloud infrastructure so I have lots of options. What you need will depend on how often a PDF is served and where the visitors are located geographically. A busy resource might require a CDN.

Basically, you need to be able to place a robots.txt file at the root of whatever domain or subdomain serves the files. So you could use a service like Amazon S3, or a simple web hosting provider that lets you upload and publicly serve files along with your robots.txt, so you can restrict bots.
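
As a rough sketch of that setup, assume the PDFs live on a dedicated subdomain (the hypothetical `files.example.com`) whose robots.txt disallows everything. The snippet below uses Python's standard `urllib.robotparser` to check what a compliant crawler would be allowed to fetch. It uses a blanket `Disallow: /` rather than a `/*.pdf` pattern because, as far as I know, the standard-library parser only does plain prefix matching; the `is_crawlable` helper is my own name, not part of any library.

```python
from urllib.robotparser import RobotFileParser

# Contents of the robots.txt you would upload to the root of the
# file-hosting subdomain (hypothetical: https://files.example.com/robots.txt).
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Check whether a robots.txt body permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(is_crawlable(ROBOTS_TXT, "https://files.example.com/whitepaper.pdf"))
```

With the rules above this should print `False`. Keeping the files on their own subdomain means the blanket rule doesn't affect crawling of the main site.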

Thanks @webdev. I guess Amazon S3 will be easiest.

I think this would definitely be a great feature for Webflow to have, for users who don’t want uploaded documents to be indexed by Google.