Bingbot reading HTTPS robots.txt incorrectly

Jaguar · March 25, 2023, 10:03am

epiz_30569901

Error Message

Response is:

t<html><body><script type="text/javascript" src="/aes.js" ></script><script>function toNumbers(d){var e=[];d.replace(/(..)/g,function(d){e.push(parseInt(d,16))});return e}function toHex(){for(var d=[],d=1==arguments.length&&arguments[0].constructor==Array?arguments[0]:arguments,e="",f=0;f<d.length;f++)e+=(16>d[f]?"0":"")+d[f].toString(16);return e.toLowerCase()}var a=toNumbers("f655ba9d09a112d4968c63579db590b4"),b=toNumbers("98344c2eee86c3994890592585b49f80"),c=toNumbers("98341c68040555c988359722c399e25e");document.cookie="__test="+toHex(slowAES.decrypt(c,2,a,b))+"; expires=Thu, 31-Dec-37 23:55:55 GMT; path=/"; location.href="https://jaguar.lovestoblog.com/robots.txt?i=1";</script><noscript>This site requires Javascript to work, please enable Javascript in your browser or use a browser with Javascript support</noscript></body></html>
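For anyone trying to read that response: going only by the script itself, it sets a cookie via `document.cookie` and then sends the browser to a second URL via `location.href`. The sketch below pulls those pieces out of such a page. It is an illustration only; the function name `parse_challenge` and the returned keys are made up for this example and are not part of any hosting or Bing tooling.

```python
import re
from typing import Dict, Optional


def parse_challenge(body: str) -> Optional[Dict[str, object]]:
    """Return details of a JavaScript challenge page, or None if body is not one.

    Hypothetical helper: it only looks for the patterns visible in the
    response quoted above and does not decrypt anything.
    """
    if "slowAES.decrypt" not in body:
        return None  # looks like ordinary content, e.g. a real robots.txt

    # Hex strings passed to toNumbers("...") in the page's script.
    hex_values = re.findall(r'toNumbers\("([0-9a-f]+)"\)', body)
    # Name of the cookie the script assigns through document.cookie.
    cookie = re.search(r'document\.cookie="([^="]+)=', body)
    # URL the script navigates to after setting the cookie.
    target = re.search(r'location\.href="([^"]+)"', body)

    return {
        "hex_values": hex_values,
        "cookie_name": cookie.group(1) if cookie else None,
        "redirect": target.group(1) if target else None,
    }
```

Run against the response quoted above, this reports the cookie name `__test` and the redirect target `https://jaguar.lovestoblog.com/robots.txt?i=1`, which suggests the real file is only served on the second request, once the cookie is present.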

However, if I point Bingbot to http://sitename/robots.txt, it reads the file perfectly.

Other Information

Also, when I submit pages to Bing, it returns the same response as above and reports that the H1, title, and other meta tags are missing.

I don’t know where to go with this. Is it a problem with Bing? (unlikely) Is there an issue with ZeroSSL? Is there a complication with the server/network setup? Is anyone else experiencing this issue?

NB: Google reads the sitemap correctly.

Thewebuser22 · March 25, 2023, 4:08pm

The response looks like it is related to this:

However, I know that Bing can crawl IF websites, so I did some searching on previous forum topics and found this:

The other user had that problem where things were reported missing and got the same response from the request, but as Oxy stated, the Bing bot itself can crawl your webpage.

Jaguar · March 25, 2023, 4:50pm

Thanks for your reply, but:

It is not just the robots.txt. This screen grab of a test of a live page shows the same erroneous result as the robots.txt test. I have no idea where that piece of code comes from.

So yes, the bot can crawl (something!), but it isn’t actually reading the page it is directed at. I don’t understand.

Summing up:

https://sitename/robots.txt produces the erroneous text shown above.
http://sitename/robots.txt works correctly.
It makes no difference for webpages whether I use http or https.
I have the http to https redirect in .htaccess. I tried commenting out that redirect, but it made no difference: http is still being redirected to https, so something else (a plugin?) must be doing the redirect. My guess is that http://sitename/pagename/ would work correctly, but unless I can find what is doing the redirect I can’t prove it.

Thewebuser22 · March 25, 2023, 10:35pm

Yeah, because Bing can access your webpage. I guess the Live URL system, however, is not allowed through. I’m assuming that was what they were talking about here:

However, as I said, Bing should be able to crawl your website (correctly):

However,

So while your webpage is indexed, Live URL does not seem to be able to get the contents of the webpage.

That’s what I can gather from this, anyways.

The guy in the other topic had the same problem: the image he showed and yours show the same results, which makes me think that the security system is the problem, as mentioned in that topic:

I believe then, because this seems similar (with what I can make out anyways), that this would be the solution:

as

This is from what I can tell, anyways. Someone could pipe in with more information if they need, but I know that a lot of bots will have problems with indexing your website (as in, the full functionality may not be available). That is deliberate, so as to avoid the bad bots (even though good ones may be blocked in the process, which is just another trade-off for using this system):

Jaguar · March 26, 2023, 8:51am

I appreciate the effort you have put into helping resolve this.

My read of the situation is that Cloudflare is compulsory if I want Bing to be able to crawl the website.

But… I can’t use Cloudflare on a free sub-domain

Are we really saying Bing is off-limits for sub-domains?

Thewebuser22 · March 26, 2023, 5:12pm

No, just the Live URL (which allows Bing to show you what it sees).

It still indexes your website, but you can’t use certain tools (like Live URL) that try to pull website information with a system that doesn’t use cookies and JavaScript. Bing’s web crawlers, however, are permitted because they use JS and cookies, so the security system lets them crawl.
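If you want to see that distinction for yourself, here is a rough sketch of a client that, like the tools described above, keeps no cookies and runs no JavaScript. The names `probe`, `fetch_text` and `CHALLENGE_MARKERS` are invented for this example, and the marker strings are simply taken from the response quoted at the top of the topic, so treat the check as an assumption rather than an official test.

```python
import urllib.request
from typing import Callable

# Strings that appear in the challenge page quoted earlier in this topic.
# Assumption: the page keeps this shape; adjust if yours looks different.
CHALLENGE_MARKERS = ("/aes.js", "slowAES.decrypt", "__test=")


def fetch_text(url: str) -> str:
    """Plain GET with no cookie jar and no JavaScript engine."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def probe(url: str, fetch: Callable[[str], str] = fetch_text) -> str:
    """Return "challenge" if the body is the JS challenge page, else "content"."""
    body = fetch(url)
    if all(marker in body for marker in CHALLENGE_MARKERS):
        return "challenge"
    return "content"
```

Calling `probe("https://sitename/robots.txt")` with your own address in place of `sitename` shows what a cookie-less, script-less fetch of that URL receives. The `fetch` parameter is only there so the check can be tried on saved responses without hitting the network.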

Jaguar · March 26, 2023, 6:43pm

Hmm, okay I see the distinction. Thanks again for your time.