Google Search Console error: "Indexed, though blocked by robots.txt"

I've used shared hosting in the past and everything was fine. But when I switched to a VPS (DigitalOcean), Google Search Console gives this error: "I … | Read the rest of https://www.webhostingtalk.com/showthread.php?t=1786978&goto=NewPost

robots.txt – Configuring Google Search Console for a Subfolder Site

I run a website at https://www.example.com/ and have just launched a new site at https://www.example.com/talkabout (i.e., in a subfolder).
I've already set up a GSC property, sitemaps (HTML and XML), and a robots.txt file for the root domain, but I was wondering whether I also need to configure them for https://www.example.com/talkabout?

I think the robots.txt file for the "/talkabout" site should be the same as the root domain's (https://www.example.com/robots.txt), but should I set up a separate GSC property and XML sitemap? For example, https://www.example.com/talkabout/sitemap.xml, and reference it in the robots.txt file?

Thanks in advance
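
For what it's worth, a single robots.txt at the root governs the entire host, including /talkabout, and the sitemaps protocol allows several Sitemap lines in it. A minimal sketch, where the /talkabout sitemap path is the hypothetical one from the question:

User-agent: *
Disallow:

Sitemap: https://www.example.com/sitemap.xml
Sitemap: https://www.example.com/talkabout/sitemap.xml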

Can robots.txt be used to prevent robots from seeing lazy-loaded content?

Say Googlebot is crawling https://example.com/page.

  • example.com has a robots.txt file that disallows /endpoint-for-lazy-loaded-content but allows /page
  • /page lazy-loads content from /endpoint-for-lazy-loaded-content (via fetch)

Does Googlebot see the lazy-loaded content?
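
For concreteness, a robots.txt matching the setup described above could look like the sketch below (paths taken from the question). Google's documentation indicates that its rendering service also obeys robots.txt when fetching subresources, so a disallowed endpoint would not be requested during rendering:

User-agent: *
Allow: /page
Disallow: /endpoint-for-lazy-loaded-content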

Google's URL Inspection tool says my image URL is blocked by robots.txt – I don't even have one!

I just discovered that our image-serving domain has not been crawled by Google for a long time.
The reason is that all of its URLs appear to be blocked by robots.txt – but I don't even have one.

Disclaimer: due to some configuration testing, I now have a robots.txt file at the root of the site that allows everything. I did not have one before.

We run an image resizing service on one of our website's subdomains.
I'm getting very strange behavior: Search Console claims the URLs are blocked by robots.txt, when in fact I don't even have one in the first place.

All the URLs on this subdomain give me this result when I test them live:

[Screenshot: URL is unknown to Google, blocked by robots.txt]

While trying to debug the problem, I created a robots.txt file at the root:

[Screenshot: valid robots.txt]

The robots.txt file is already visible in the search results:

[Screenshot: robots.txt indexed in search results]

The response headers also seem to be correct:

HTTP/2 200
date: Sun, 27 Oct 2019 02:22:49 GMT
content-type: image/jpeg
set-cookie: __cfduid=d348a8xxxx; expires=Mon, 26-Oct-20 02:22:49 GMT; path=/; domain=.legiaodosherois.com.br; HttpOnly; Secure
access-control-allow-origin: *
cache-control: public, max-age=31536000
via: 1.1 vegur
cf-cache-status: HIT
age: 1233
expires: Mon, 26 Oct 2020 02:22:49 GMT
alt-svc: h3-23=":443"; ma=86400
expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
server: cloudflare
cf-ray: 52c134xxx-IAD

Here are some examples of URLs for testing:

https://kanto.legiaodosherois.com.br/w760-h398-gnw-cfill-q80/wp-content/uploads/2019/10/legiao_zg1YXWVbJwFkxT_ZQR534L90lnm8d2IsjPUGruhqAe.png.jpeg
https://kanto.legiaodosherois.com.br/w760-h398-gnw-cfill-q80/wp-content/uploads/2019/10/legiao_FPutcVi19O8wWo70IZEAkrY3HJfK562panvxblm4SL.png.jpeg
https://kanto.legiaodosherois.com.br/w760-h398-gnw-cfill-q80/wp-content/uploads/2019/09/legiao_gTnwjab0Cz4tp5X8NOmLiWSGEMH29Bq7ZdhVPlUcFu.png.jpeg

What should I do?
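
Not part of the original post, but one quick diagnostic is to check what crawlers actually receive for this subdomain's robots.txt, here as a sketch using Python's standard library. Note that robots.txt is per-host, so the subdomain is governed by https://kanto.legiaodosherois.com.br/robots.txt, and Google generally treats a robots.txt URL that returns a server error as if the whole host were disallowed:

from urllib.robotparser import RobotFileParser

# robots.txt is per-host: the subdomain has its own file,
# independent of the main domain's.
rp = RobotFileParser()
rp.set_url("https://kanto.legiaodosherois.com.br/robots.txt")
rp.read()  # a 404 here means "allow all"; 401/403 mean "disallow all"

url = ("https://kanto.legiaodosherois.com.br/w760-h398-gnw-cfill-q80/"
       "wp-content/uploads/2019/10/"
       "legiao_zg1YXWVbJwFkxT_ZQR534L90lnm8d2IsjPUGruhqAe.png.jpeg")
print(rp.can_fetch("Googlebot", url))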

Which is better – meta robots tags or robots.txt?

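Which to use depends on the goal: robots.txt controls crawling, while a meta robots tag controls indexing of a page that has already been crawled (a meta tag is only seen if crawling is allowed). A minimal illustration of each, with a hypothetical /private/ path. This robots.txt rule keeps crawlers out of a section entirely:

User-agent: *
Disallow: /private/

whereas this meta tag on a crawlable page lets Google fetch the page but asks it not to index it:

<meta name="robots" content="noindex, follow">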

robots.txt issue – WordPress Development Stack Exchange

When we request indexing of our website through Google Search Console, we receive an error message like:

Crawl allowed? No: blocked by robots.txt
Page fetch Failed: Blocked by robots.txt

In the robots.txt file, we disallowed some pages, like:

User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /account/*
Disallow: /talent-profile

but we still have the problem. How can we solve it?
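
As an aside (not from the original question), the rules can be sanity-checked locally with Python's standard urllib.robotparser. Two caveats apply to this sketch: the stdlib parser applies rules in file order (first match wins), so the leading "Allow: /" would short-circuit everything and is dropped here; and it does not implement Google's "*" wildcard extension, so "/account/*" is rewritten as the equivalent prefix "/account/":

from urllib.robotparser import RobotFileParser

# Adjusted rules: "Allow: /" dropped (the stdlib parser is
# first-match-wins, so it would otherwise allow every path) and
# "/account/*" rewritten as "/account/" (no wildcard support;
# for Google the trailing "*" is redundant anyway).
rules = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /account/
Disallow: /talent-profile
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

for path in ("/", "/some-post/", "/wp-admin/options.php",
             "/account/settings", "/talent-profile"):
    print(path, rp.can_fetch("Googlebot", path))
# Expected: "/" and "/some-post/" allowed, the rest disallowed.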

Why is Google indexing our robots.txt file and displaying it in search results?

For some reason, Google indexes the robots.txt file of some of our sites and displays it in the search results. See screenshots below.

Our robots.txt file is not linked from anywhere on the site and contains only the following:

User-agent: *
Crawl-delay: 5

This only happens on certain sites. Why is this happening and how can we stop it?

Screenshot 1: Google Search Console

Screenshot 2: Google search results
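
One commonly suggested remedy (not from the original post) is to keep robots.txt crawlable but mark it non-indexable with an X-Robots-Tag response header, which Google honors for non-HTML files. The desired response would look roughly like this:

HTTP/1.1 200 OK
content-type: text/plain
x-robots-tag: noindex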

What should be disallowed in the robots.txt file?

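For illustration only (all paths hypothetical), a typical robots.txt disallows private areas and crawl traps while leaving everything else open by default:

# Block the admin area and internal search results;
# anything not matched below stays crawlable.
User-agent: *
Disallow: /admin/
Disallow: /search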