Robots.txt: How to Show Your Best Side to Google

Thursday, October 30, 2014

Recently Google changed the way they view sites. In the past, search engines tended to see a website much as a user with a text-only browser would. This has since changed. Now search engines look at sites as a human using a modern web browser would: rich with imagery, video content and a variety of other media.


This can cause problems if a site's robots.txt accidentally blocks JavaScript, CSS and images from being crawled. This can happen when a CMS (content management system) ships with a default robots.txt that hides the key files required to display a page from Google's crawlers. On this point, Google have gone on record as saying:
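As a hypothetical illustration (the directory names here are assumptions, not any particular CMS's defaults), a single over-broad Disallow line is enough to hide every stylesheet and script from crawlers, while a narrower set of rules keeps private areas blocked without starving the renderer:

```
# Risky CMS-style default (directory names are hypothetical):
#   User-agent: *
#   Disallow: /assets/    <- blocks the CSS, JS and images Google renders with
#
# Safer equivalent: block only what genuinely needs hiding
User-agent: *
Disallow: /admin/
Allow: /assets/
```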

“Disallowing crawling of JavaScript or CSS files in your site’s robots.txt directly harms how well our algorithms render and index your content and can result in suboptimal rankings.”

This is a major change and indicates that how a site looks to a human user now factors into rankings. After all, Google want to provide the end user with the best possible sites, and in this day and age a text-based site simply won't do. We have come to expect imagery, video and a host of other points of interaction.

A robots.txt file is a powerful tool, but the downside is that even a minor error in it can cause major disruption. We have seen sites block so much of their JavaScript, images and CSS that they appear to Google as text-only websites. We have even seen sites accidentally block themselves entirely, which had a massive impact on their traffic.
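The worst case, a site blocking itself entirely, takes only two lines, which is part of why it slips through unnoticed:

```
# This tells every crawler to stay away from the whole site --
# appropriate on a staging server, disastrous in production
User-agent: *
Disallow: /
```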

Tips for checking your robots.txt

Google now provide tools as part of Google Webmaster Tools that let you see the results of a Google crawl and whether any elements on a page are blocked.

You or your developers should never use a default robots.txt for a particular CMS without checking each line to ensure it is required, and whether additional lines need to be added.

Make sure that every site has a robots.txt, even if it is empty. When Google's crawler bots can't find a robots.txt file, they can assume that the entire site no longer exists, and you could take a substantial rankings hit and ultimately lose traffic.

Another problem often encountered with robots.txt files comes during the launch of a new site. Developers put robots.txt files in place to block all crawlers during the development phase, so that a half-built site doesn't rank. Forgetting to remove these is a problem I see more often than I would like.

If you would like any further clarification on this issue, do not hesitate to get in contact with a member of the team. We'd be happy to help.
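Beyond checking in Webmaster Tools, you can sanity-check a robots.txt before it ever goes live. Here is a minimal sketch using Python's standard `urllib.robotparser`; the file contents, domain and paths are hypothetical examples:

```python
from urllib import robotparser

# Hypothetical robots.txt contents, parsed from an inline string
# rather than fetched from a live site.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /assets/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Everything under /assets/ is blocked, so Googlebot cannot fetch
# the stylesheet it needs to render the page; the page itself is fine.
print(rp.can_fetch("Googlebot", "https://example.com/assets/site.css"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/index.html"))       # True
```

Running a check like this against every CSS, JavaScript and image path a key page loads is a quick way to catch an over-broad Disallow before Google does.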