So Watz Up!!

Amazing facts you've never heard or read before, extensive travelogues, gadgets, jobs: you'll be hearing a lot more from us on this blog, and we hope to share great stories with you.

10 day SEO Guide to Supercharge your Online Business - robots.txt


Written on 1:41 AM by Mrudula

The "robots.txt" file gives instructions to the spiders that visit your site about which folders or files they may or may not visit. When a search engine reaches a web site, whether through a submission or by following a link from one site to another, its robot (also known as a "spider") first looks for this file. With a correctly set up robots.txt file in place, files that are available to a normal web surfer can be kept hidden from a search spider. This is useful if you are trying to conserve bandwidth (data transfer), since some engines will completely skip files and folders excluded in robots.txt; if you need to keep certain private files, such as databases or stock images, from being indexed; or if you want to link to another site without promoting it for ranking purposes.

The robots.txt file can be created with the standard Notepad text editor in Windows, with TextEdit in plain-text mode on a Mac, or even from a Unix command line. In each case, make sure the file is saved as robots.txt (all lowercase) and in plain-text mode.

The three most common items you will find in a robots.txt file are:

* allow
* disallow
* the wildcard, or asterisk: "*"

Normally you would use the "disallow" command so that an engine does not index certain areas of your site; the "allow" command is largely redundant, since spiders will follow any link you have not prohibited. Finally, the wildcard stands for all engines. So if you had a folder called "images" under the main directory, you could use the following coding to disallow all spiders from that folder:

User-agent: *
Disallow: /images/
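If you want to sanity-check rules like these before relying on them, Python's standard library ships a robots.txt parser. This is a minimal sketch; the file paths being tested are made up for illustration:

```python
# Check the example rules with Python's built-in robots.txt parser.
import urllib.robotparser

rules = """\
User-agent: *
Disallow: /images/
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# Any spider ("*") may fetch a normal page, but not the images folder.
print(parser.can_fetch("*", "/index.html"))       # True
print(parser.can_fetch("*", "/images/logo.gif"))  # False
```

For a live site you would call `parser.set_url("http://example.com/robots.txt")` followed by `parser.read()` instead of feeding the rules in as a string.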

If you wanted to disallow a particular robot from a set of folders, you would use that robot's name rather than the *. You can even specify individual files. For example:

User-agent: MSNBot
Disallow: /gopher/solutions/

User-agent: Googlebot
Disallow: /beta/private/new_widget_ideas.asp
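The same standard-library parser confirms that each of these rules binds only the robot it names (again a sketch; the test paths are illustrative):

```python
# Per-robot rules: each Disallow applies only to the named user agent.
import urllib.robotparser

rules = """\
User-agent: MSNBot
Disallow: /gopher/solutions/

User-agent: Googlebot
Disallow: /beta/private/new_widget_ideas.asp
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# MSNBot is barred from the solutions folder, but Googlebot is not.
print(rp.can_fetch("MSNBot", "/gopher/solutions/faq.html"))     # False
print(rp.can_fetch("Googlebot", "/gopher/solutions/faq.html"))  # True
print(rp.can_fetch("Googlebot", "/beta/private/new_widget_ideas.asp"))  # False
```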

One more way to control access by search engines is through the meta tags area of your HTML, where you can add coding such as:

<META name="ROBOTS" content="NOINDEX,NOFOLLOW">

This tells the robots not to index the page or follow links from that page. However, if a robot finds other pages through other areas of your site, a submission, a link from another site to that page, etc., the pages that do not include the meta tags may still be indexed.

Tell robots not to index your site:

<META name="ROBOTS" content="NOINDEX">

Tell robots not to follow any links on the page:

<META name="ROBOTS" content="NOFOLLOW">

Tell robots to follow the links on a site map but not index the site map page itself:

<META name="ROBOTS" content="NOINDEX,FOLLOW">
Tell robots to index the page but not to see your site as 'promoting' the sites you are linking to:

<META name="ROBOTS" content="INDEX,NOFOLLOW">
Prevent all search engines from showing a “Cached” link for your site:

<meta name="robots" content="noarchive">

Allow other search engines to show a “Cached” link, preventing only Google:

<meta name="googlebot" content="noarchive">

Note: this tag only removes the “Cached” link for the page. Google will continue to index the page and display a snippet.
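To double-check which robots directives a page actually serves, you can pull them out of the HTML with Python's standard library. This is a minimal sketch, and the sample page below is made up:

```python
# Extract the directives from <meta name="robots"> tags in a page.
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the content values of any <meta name="robots"> tags."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "meta":
            d = dict(attrs)
            if d.get("name", "").lower() == "robots":
                self.directives.extend(
                    part.strip().lower()
                    for part in d.get("content", "").split(","))

page = '<html><head><META name="ROBOTS" content="NOINDEX, NOFOLLOW"></head></html>'
p = RobotsMetaParser()
p.feed(page)
print(p.directives)  # ['noindex', 'nofollow']
```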

Since the meta tag examples above control the whole page, you can instead control individual links by adding relevance ("rel") attributes to the anchors in your HTML.

In order to improve the rank of the destination page for the term "Our Blog":

<a href="">Look at our Blog</a>

If you do not want a search engine robot to count the link for the sake of rankings:

<a href="" rel="nofollow">The truth about yellow widgets</a>

So try it now.

- Munnu
