I have a multi-tiered architecture for my website. Most content exists in Webflow, but not all of it. Our main website is actually served by our own service, but it proxies most path requests to Webflow. We need this for a couple of reasons. The overall typical flow of a user interaction is browser → our proxy → Webflow.
I want to make sure that the connection between our proxy and Webflow is protected by TLS. As far as I can tell, there are two ways to set this up.
First, I can have Webflow host some other domain, like www-webflow.example.com. This allows Webflow to provision certificates with Let's Encrypt correctly, the Webflow settings page shows a green checkmark, and everything looks happy. Our proxy can receive requests for www.example.com and proxy the appropriate ones to www-webflow.example.com, returning the responses. The problem with this approach is content like www.example.com/sitemap.xml, which is now full of URLs on www-webflow.example.com instead of www.example.com. It is not just the sitemap, though; most HTML pages carry metadata naming the domain Webflow thinks we’re using (www-webflow.example.com), and I very much don’t want that domain to leak out to search engine bots.
Second, I can tell Webflow that it is hosting www.example.com. This is what we’re currently doing. If I temporarily configure DNS to CNAME to Webflow, Webflow can obtain the Let's Encrypt certificate. Then I can point DNS back at our proxy’s A record, and everything works (for now). Our proxy can contact Webflow over TLS successfully and validate a good cert (our proxy lies about the name it is reaching, sending www.example.com both as the SNI in the TLS handshake and in the HTTP/1.1 Host header), but Webflow looks unhappy. Its expectations about DNS are unmet, and there’s an angry-looking warning in the hosting settings. More pertinently, I am worried about Let's Encrypt certificate renewal in 90 days.
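To make the SNI/Host trick and the renewal worry concrete, here is a minimal Python sketch of such an upstream connection plus a check of how many days the served certificate has left. This is not our actual proxy code: `probe_origin`, `build_request`, and `days_until_expiry` are hypothetical names, and the origin hostname passed in is a placeholder for whatever Webflow endpoint the proxy connects to.

```python
import socket
import ssl
import time


def build_request(public_host: str, path: str) -> bytes:
    """Build a minimal HTTP/1.1 request that names the public host."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {public_host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def days_until_expiry(not_after: str, now: float) -> float:
    """Convert a certificate's notAfter string into days remaining at `now`."""
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400.0


def probe_origin(origin_host: str, public_host: str, path: str = "/", port: int = 443):
    """Connect to origin_host, but present public_host as SNI and Host.

    The default context also verifies the certificate against public_host,
    so an expired or mismatched certificate raises ssl.SSLError here.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((origin_host, port), timeout=10) as raw:
        with ctx.wrap_socket(raw, server_hostname=public_host) as tls:
            cert = tls.getpeercert()
            tls.sendall(build_request(public_host, path))
            status_line = tls.makefile("rb").readline().decode("latin-1").strip()
    return status_line, days_until_expiry(cert["notAfter"], time.time())
```

Running something like `probe_origin("<webflow origin host>", "www.example.com")` on a schedule and alerting when the remaining days drop below some threshold would at least turn a silent expiry into a visible one.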
The second solution is okay if Webflow uses Let's Encrypt HTTP-01 challenges (with what path?), since our proxy could forward those requests. It does not work, and is a ticking time bomb certificate-wise, if Webflow uses TLS-ALPN-01.
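On the path question: as far as the ACME specification (RFC 8555) goes, an HTTP-01 validation request is a plain-HTTP GET on port 80 for /.well-known/acme-challenge/<token>. Whether Webflow actually uses HTTP-01 is an assumption I cannot confirm. If it does, the proxy would have to pass those requests through to Webflow unmodified. A minimal Python sketch of that routing decision follows; `choose_upstream` and the upstream labels are hypothetical, not part of any real proxy API.

```python
ACME_HTTP01_PREFIX = "/.well-known/acme-challenge/"


def is_acme_http01(request_target: str) -> bool:
    """Return True if the request target looks like an HTTP-01 challenge URL."""
    path = request_target.split("?", 1)[0]  # ignore any query string
    if not path.startswith(ACME_HTTP01_PREFIX):
        return False
    token = path[len(ACME_HTTP01_PREFIX):]
    # The token is a single path segment; reject empty or nested paths.
    return bool(token) and "/" not in token


def choose_upstream(scheme: str, request_target: str, own_prefixes: list[str]) -> str:
    """Pick an upstream label for a request (labels are hypothetical)."""
    if scheme == "http" and is_acme_http01(request_target):
        # Assumption: Webflow answers the challenge itself over plain HTTP.
        return "webflow-http"
    path = request_target.split("?", 1)[0]
    if any(path.startswith(prefix) for prefix in own_prefixes):
        return "own-service"
    return "webflow-https"
```

The check runs before any redirect from HTTP to HTTPS and before our own path routing, so a challenge request is never rewritten or answered by our service.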
The first solution is a disaster for SEO. We even changed our proxy to search and replace within content returned from Webflow so that the URLs in, e.g., sitemap.xml come out correct, but then realized our wacky proxy search and replace wasn’t behaving correctly for compressed content encodings. We stopped short of extending our proxy to decompress and recompress content around the search and replace, feeling that was too far afield.
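For scale, the decompress, replace, and recompress step we stopped short of would look roughly like the Python sketch below. It handles only gzip and uncompressed bodies; `rewrite_body` is a hypothetical helper, and any other encoding is rejected instead of being passed through with stale URLs.

```python
import gzip


def rewrite_body(body: bytes, content_encoding: str, old: bytes, new: bytes) -> bytes:
    """Replace `old` with `new` in a response body, honoring Content-Encoding.

    Returns the body in the same encoding it arrived in. The caller must
    recompute Content-Length from the returned bytes.
    """
    encoding = content_encoding.strip().lower()
    if encoding in ("", "identity"):
        return body.replace(old, new)
    if encoding == "gzip":
        plain = gzip.decompress(body)
        return gzip.compress(plain.replace(old, new))
    # Other encodings (e.g. br, deflate) are out of scope for this sketch.
    raise ValueError(f"unsupported Content-Encoding: {content_encoding!r}")
```

A simpler variant may be to have the proxy send `Accept-Encoding: identity` upstream so the body arrives uncompressed, assuming Webflow honors that header, and compress only on the way back to the browser.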
It’s unclear to me what Base Tag and Href Prefix in the “Custom Code” tab are supposed to do, and we couldn’t figure out how to make them work correctly for both our staging (www-staging.example.com, www-staging-webflow.example.com) and production publish targets. Would they fix our problem here if we got rid of our staging site?
What should we be doing here? From reading the docs, it does appear we are not the first to reverse proxy Webflow content, but I can find no documentation on which Let's Encrypt renewal practices Webflow employs, or on how to tell Webflow what it should treat as the base domain/path of a site, regardless of where it believes it is hosting it.