domains – How can I use robots.txt to disallow a subdomain only?

You can serve a different robots.txt file based on the subdomain through which the site has been accessed. One way of doing this on Apache is by internally rewriting the URL using mod_rewrite in .htaccess. Something like:

RewriteEngine On
RewriteCond %{HTTP_HOST} !^(www\.)?example\.com$ [NC]
RewriteRule ^robots\.txt$ robots-disallow.txt [L]

The above states that for all requests for robots.txt where the host is anything other than www.example.com or example.com, the request is internally rewritten to robots-disallow.txt. And robots-disallow.txt then contains the Disallow: / directive.
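
For example, robots-disallow.txt could be as simple as:

User-agent: *
Disallow: /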

If you have other directives in your .htaccess file, then this rule will need to be nearer the top, before any existing routing directives.

Robots.txt is blocking my labels

In my AdSense account, under “Revenue optimization”, I have crawl errors. When I click “Fix crawl errors”, I see this:

Blocked URL: http://www.rechargeoverload.in/search/label
Error: Robot Denied

My robots.txt:

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /

Sitemap: http://www.rechargeoverload.in/atom.xml?redirect=false&start-index=1&max-results=500
Sitemap: http://www.rechargeoverload.in/atom.xml?redirect=false&start-index=501&max-results=500

seo – how to specify all values for a parameter in robots.txt and sitemap.xml?

I have a PHP file that extracts a post ID from a URL parameter to show a specific post’s content. For example, it uses the postid parameter from the following URL to show the content of the post with ID 21:
example.com/mydir/posts.php?postid=21
The postid value is dynamic: every time a user posts content, a new post ID is added to the database.
How can I specify this PHP file in my robots.txt and sitemap.xml?
Currently, I use the following:
robots.txt:

Disallow: /mydir/
Allow: /mydir/posts.php?postid=*

sitemap.xml:
<loc>example.com/mydir/posts.php?postid=*</loc>

8 – Robots.txt changes are not reflected on AWS server

We have a deployment process: we initially trigger a deployment from a branch to an environment through Jenkins. The process then runs in a Docker environment, which pulls all the files from my branch, takes an image backup, and deploys it to the target environment. All other files are reflected correctly, but the robots.txt changes are not.

The changes are reflected in the S3 bucket.

Can anyone suggest how to change the file on the server, or any other way to resolve this issue?

disallow – Block URLs that start and end with common prefixes and suffixes using a robots.txt file

I have URLs like:
/category1/type-name-more
and I need to block all such URLs.

Disallow: /*-more$ is not OK because it would block all URLs ending in -more, not only those in /category1/. So is the variant
Disallow: /category1/*-more$
correct in this case, i.e. does it block all URLs that are in the /category1/ section and end with -more?

More info: the -name- part of the pattern is always changing.

google search – In robots.txt, should I use a wildcard at the end of a disallow directive?

I want to disallow a specific folder and all of its files and subdirectories, but I don’t know the difference between Disallow: /somedir/ and Disallow: /somedir/*. Which one of these lines should I use?

By the way, what does Disallow: /somedir? mean? Should I use it too?

Order of items in a very basic robots.txt changing the command scope?

I had a very simple robots.txt file set up for a site I maintain. After a spike in traffic that the ISP put down to crawlers, they suggested I add a crawl-delay directive, which is fair enough. So I ended up with this file:

User-agent: *
Disallow: /a-page-i-wanted-to-ignore
Crawl-delay: 1

I still receive spikes in traffic that are causing downtime. The ISP told me that this directive has only defined the crawl delay for the page /a-page-i-wanted-to-ignore.

I wanted to check: is that correct? Does placing a directive like Crawl-delay under a Disallow make it specific to that Disallow rule?

Importance of robots.txt

I want to know the importance of robots.txt for a website.

seo – Link rel=”nofollow” vs robots.txt disallow for “/search” URLs?

Since you don’t want bots to crawl these /search pages, you have no option other than to block them in robots.txt.
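
For reference (a minimal sketch, assuming your search pages all live under the /search path), the robots.txt block could look like:

User-agent: *
Disallow: /search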

However, that doesn’t necessarily prevent the /search pages from appearing as link-only results in the SERPs (if other pages link to them), but this is unlikely. And they are very unlikely to rank higher than other results anyway.

2: Add a rel="noindex" to the links that lead to the /search route.

noindex is not a recognised rel value for Google; maybe you mean rel="nofollow"?

You can set rel="nofollow" on the link to the search pages. However, this won’t necessarily prevent the target pages from appearing in the SERPs (if there are other links to them) or even from being crawled, although it should. (This value appears to have been demoted to an advisory value by Google).

To prevent a page from being indexed, you need to include a <meta name="robots" content="noindex"> tag on those pages, or serve them with an X-Robots-Tag HTTP response header. But in this case, you can’t block crawling in robots.txt, otherwise the bot will never see the meta tag (or response header).
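
As a rough sketch of the header approach (assuming Apache with mod_headers enabled, configured at the server or virtual-host level, and that all search pages live under /search):

# Send a noindex header for anything under /search.
# Do not also block /search in robots.txt, or bots will never see this header.
<LocationMatch "^/search">
    Header set X-Robots-Tag "noindex"
</LocationMatch>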

Depending on your use case, it might be beneficial to allow Google to crawl your search pages (in order to find internal pages), but not to index them. However, you don’t necessarily want to waste Google’s crawl budget on your search pages either, so it can depend on the nature of your site.