Implementing a proper robots.txt file is important for your website's search engine optimization.
About Robots.txt
Robots.txt is a standard on the internet that tells web crawlers and other bots not to access all or part of a website that is otherwise publicly viewable and crawlable. Note that it is a convention, not access control: well-behaved crawlers honor it, but nothing technically enforces it.
This is essentially all you need to know. With it, you can easily keep search engine crawlers from accessing, crawling, and reading any folder in your site's root.
How This Protocol Works
Most websites today use this standard to keep web crawlers out of certain parts of the site.
The most common examples are these:
User-agent: *
Disallow:

This example allows all robots to view all files: the wildcard in the User-agent line matches every crawler, and the empty Disallow line blocks nothing.
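If you want to check how a compliant crawler would interpret these rules, Python's standard urllib.robotparser module can parse them directly. The bot name MyBot and the example.com URL below are placeholders, not anything from the original article:

```python
from urllib.robotparser import RobotFileParser

# The allow-all example above: wildcard user-agent, empty Disallow.
rules = [
    "User-agent: *",
    "Disallow:",
]

parser = RobotFileParser()
parser.parse(rules)

# An empty Disallow blocks nothing, so any URL may be fetched.
print(parser.can_fetch("MyBot", "http://example.com/any/page.html"))
```

This is the same logic a well-behaved crawler applies before requesting a page.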
If you want to prevent all crawlers from reading a protected folder (for example, temp), use the following syntax:
User-agent: *
Disallow: /temp/

In this example, no web crawler has access to the temp folder in your website root. Of course, you can add more than one folder to your robots.txt file, one Disallow line per folder.
You can also prevent all crawlers from reading a specific file on your website. All you need to do is enter the full path to that file on a Disallow line.
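The folder and file rules just described can be verified the same way with urllib.robotparser. The paths /temp/ and /private/notes.html here are illustrative placeholders:

```python
from urllib.robotparser import RobotFileParser

# Sketch of the rules described above: one blocked folder, one blocked file.
rules = [
    "User-agent: *",
    "Disallow: /temp/",               # block the whole temp folder
    "Disallow: /private/notes.html",  # block one specific file
]

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("MyBot", "http://example.com/temp/page.html"))      # blocked
print(parser.can_fetch("MyBot", "http://example.com/private/notes.html"))  # blocked
print(parser.can_fetch("MyBot", "http://example.com/index.html"))          # allowed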
Some crawlers support the sitemap directive, which lets you list one or more sitemaps in the same robots.txt file. This can be very useful for websites that use multiple sitemaps. All you need to do is add a line like the following to your robots.txt file:
Sitemap: http://www.yoursite.com/sitemap-1.xml

I use the first, and most widely used, method: I allow all crawlers to access every part and folder of my website. I have also added the URL of my sitemap, just to make sure it is properly crawled.
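On Python 3.8 and later, urllib.robotparser also exposes any Sitemap lines it finds via site_maps(), so you can confirm the directive was picked up. This is a minimal sketch using the example sitemap URL from above:

```python
from urllib.robotparser import RobotFileParser

# Allow-all rules plus a Sitemap line, as in the example above.
rules = [
    "User-agent: *",
    "Disallow:",
    "Sitemap: http://www.yoursite.com/sitemap-1.xml",
]

parser = RobotFileParser()
parser.parse(rules)

# site_maps() returns the listed sitemap URLs, or None if there are none.
print(parser.site_maps())
```

Sitemap lines apply to the whole file, independent of any User-agent section, which is why multiple sitemaps can simply be listed one per line.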
Robots.txt on Popular Sites
Many popular sites on the internet publish their own robots.txt files. You can check any of them yourself by entering /robots.txt right after the main address of the website you are curious about.
Many blogs and websites on the internet use this protocol. Take some time to consider which folders and parts of your website you should keep from being crawled.
This helps crawlers work more effectively when they visit your website, reducing their load and the time it takes to crawl your site.