Google has announced that, from September 1, 2019, Googlebot will no longer obey indexing-related rules such as noindex in robots.txt files. Site owners who relied on those rules should switch to the alternatives that Google suggests. Googlebot is Google’s crawler: a bot that goes through web pages and adds them to the index, which is Google’s database of pages. Based on this index, Google ranks websites on the search engine results page.
Robots.txt is a file in which a website tells Googlebot which pages it may crawl and which it should avoid. It is mainly used to optimize how the website is crawled.
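To make the crawl rules concrete, here is a minimal sketch in Python that uses the standard library’s urllib.robotparser to check whether a URL may be crawled. The robots.txt content, the example.com URLs and the may_crawl helper are made up for illustration, and Python’s parser is not Googlebot’s own parser, so treat it as an approximation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site (not taken from a real website).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
# parse() accepts the file's lines, so no network request is needed here.
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(url, user_agent="Googlebot"):
    """Return True if the rules above let `user_agent` crawl `url`."""
    return parser.can_fetch(user_agent, url)


print(may_crawl("https://example.com/private/report.html"))  # blocked by Disallow
print(may_crawl("https://example.com/blog/post.html"))       # no rule matches
```

A Disallow rule only stops crawling. It does not, on its own, guarantee that the URL stays out of the index.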
Why is Google canceling it?
Google never treated noindex in robots.txt as an official directive. It worked in most cases, but it reportedly failed in about 8% of them, so it was never fool-proof. Google has now officially dropped support for the noindex, crawl-delay and nofollow directives in robots.txt files, and websites that use them have been told to remove them by September 1, 2019. The Google team analyzed the usage of robots.txt rules that the internet draft does not support, such as nofollow, crawl-delay and noindex. Because Google never documented these rules, their usage in relation to Googlebot is very low, and that usage was often contradicted by other rules in the same file. These mistakes affect websites’ presence on the search engine results page (SERP) in ways that webmasters did not intend.
Google made the official announcement on the morning of July 2, stating that these rules in robots.txt were unsupported and undocumented.
In its official blog post, this is what Google wrote right before suggesting alternatives:
“To maintain a working and healthy ecosystem and preparing for potential open-source releases in the future, we’re retiring the codes that handle unpublished and unsupported rules (such as noindex) on September 1, 2019.”
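If you are not sure whether your own robots.txt still contains any of these rules, a small script can flag them. The following is only a sketch: the sample file and the find_unsupported_rules helper are hypothetical, and the list of fields is limited to the three rules named above.

```python
# Rules named in the announcement as unsupported by Googlebot in robots.txt.
UNSUPPORTED_FIELDS = {"noindex", "nofollow", "crawl-delay"}

# Hypothetical robots.txt used only for this example.
SAMPLE = """\
User-agent: *
Disallow: /tmp/
Noindex: /drafts/
Crawl-delay: 10
"""


def find_unsupported_rules(robots_txt):
    """Return (line_number, line) pairs for rules Googlebot will ignore."""
    findings = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field in UNSUPPORTED_FIELDS:
            findings.append((number, line))
    return findings


for number, line in find_unsupported_rules(SAMPLE):
    print(f"line {number}: {line}")
```

Other crawlers may still honor some of these fields, so check who else reads the file before deleting a rule.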
What are the alternatives?
This change need not leave you helpless: Google gave a list of things you can do instead.
NOINDEX: The noindex directive in the robots meta tag is a very effective way to remove a URL from the index while crawling is still allowed. It is supported both in HTTP response headers and in HTML.
404 and 410 HTTP status codes: Both status codes tell the crawler that the page does not exist, and the crawler drops such URLs from the index once they have been crawled and processed.
Password protection: Hiding a page behind a login will generally remove the URL from Google’s index. The exception is when markup is used to indicate subscription or paywalled content.
Disallow in robots.txt: Search engines can only index pages that they know about, so blocking a page from being crawled usually means its content will not be indexed. A search engine may still index a URL based on links from other pages without seeing the content itself, but Google says it aims to make such pages less visible in the future.
Search Console Remove URL tool: This tool is a quick and easy way to remove a URL temporarily from Google’s search results.
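The first two alternatives are the ones a developer is most likely to implement, so here is a rough Python sketch of how a site might apply them. The paths, the REMOVED_PATHS and NOINDEX_PATHS sets and the respond function are assumptions for illustration and are not tied to any particular web framework.

```python
from http import HTTPStatus

# Hypothetical paths used only for this example.
REMOVED_PATHS = {"/old-offer"}        # pages that no longer exist
NOINDEX_PATHS = {"/internal-search"}  # pages to keep out of the index


def respond(path):
    """Return (status, headers, body) for a request to `path`."""
    if path in REMOVED_PATHS:
        # 410 Gone tells the crawler that the page has been removed.
        return HTTPStatus.GONE, {}, ""

    headers = {"Content-Type": "text/html; charset=utf-8"}
    robots_meta = ""
    if path in NOINDEX_PATHS:
        # Either signal is enough on its own; both are shown for illustration.
        headers["X-Robots-Tag"] = "noindex"                     # HTTP header form
        robots_meta = '<meta name="robots" content="noindex">'  # HTML form

    body = f"<html><head>{robots_meta}</head><body>Page {path}</body></html>"
    return HTTPStatus.OK, headers, body


status, headers, body = respond("/internal-search")
print(status.value, headers.get("X-Robots-Tag"))
```

For noindex to take effect, the page must not be blocked in robots.txt, because the crawler has to fetch the page to see the tag or header.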
Google is trying to protect websites while it keeps optimizing the algorithms that determine which websites rank at the top. It changes its rules and algorithms constantly, so this change was not a surprise. Google has already laid out alternatives so that no company needs to be hurt by the change, and it has given websites two months to adjust to the new directives.