Thursday, November 13, 2008

What Will Meta Tags and Robots.txt Files Do To Your Website?

As the web grows and the search engines keep getting smarter, we need to learn as many tricks as possible to get our sites indexed. Meta tags and the robots.txt file allow you to pick and choose which pages are and are not indexed by the search engines.

Robots.txt File Versus Meta Tags

A meta tag is a tool that we use to choose which pages we want the search engines to index.

Many web developers will use the meta tag to also tell the search engines which pages they do not want to have indexed.

This code generally looks like this:

<meta name="robots" content="noindex, nofollow">
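To see how a robot might read such a tag, here is a minimal Python sketch that scans a page for a robots meta tag and reports whether it asks not to be indexed. The class name RobotsMetaFinder, the helper is_noindex, and the sample page are hypothetical names used only for illustration; real search engines apply their own, more involved rules.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of any <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            content = attr_map.get("content", "")
            self.directives += [
                part.strip().lower() for part in content.split(",") if part.strip()
            ]


def is_noindex(html_text):
    """Return True if the page carries a robots meta tag containing noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html_text)
    return "noindex" in finder.directives


# Hypothetical page used only for illustration.
SAMPLE_PAGE = """<html><head>
<title>Members only</title>
<meta name="robots" content="noindex, nofollow">
</head><body>Private notes</body></html>"""

print(is_noindex(SAMPLE_PAGE))
```

The tag has to be placed on every page you want left out, which is one reason a single robots.txt file can be easier to maintain.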

Some search engines do not use this tag and will completely ignore it. This is where the robots.txt file comes in handy. In this file you are able to list specific pages that you do not want to have indexed. These may include password-protected folders, pages, or images, for example.

How Do I Make a Robots.txt File?

Begin by creating a simple Notepad text file and name it robots.txt. The name must be exactly robots.txt (with the "s"), or the search engines will not find it. There are two conventions that you will use in the file to tell the search engines what you do not want them to index.

These are:

  • User-agent: * : The asterisk (*) is a very important part of this code. By using the asterisk you are addressing all of the robots that are trying to index your site. This is the easiest way to make sure that every robot that honors the file sees your rules. You can target an individual robot by putting its user-agent name in place of the asterisk, but in general you will want to stick with this form.
  • Disallow: / : After the disallow statement you will list any files or folders that you do not want to have indexed by the robots. If you list only the slash (/) with no file or folder name after it, that code tells the robots not to index any part of the website. You do not necessarily want to do this. All files and folders listed after this code will not be indexed.
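The two conventions above can be tried out with Python's standard urllib.robotparser module. The sketch below is only an illustration: ExampleBot, OtherBot, and www.example.com are made-up names, and an individual search engine may interpret a file slightly differently than this parser does.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_text):
    """Parse robots.txt text held in a string (nothing is downloaded)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


# A lone slash after Disallow blocks the whole site for every robot.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Naming one robot (hypothetical "ExampleBot") blocks only that robot.
# An empty Disallow value means nothing is blocked for everyone else.
BLOCK_ONE_ROBOT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow:
"""

PAGE = "http://www.example.com/index.html"

everyone = build_parser(BLOCK_ALL)
print(everyone.can_fetch("OtherBot", PAGE))

one_robot = build_parser(BLOCK_ONE_ROBOT)
print(one_robot.can_fetch("ExampleBot", PAGE))
print(one_robot.can_fetch("OtherBot", PAGE))
```

With the first file no robot may fetch the page; with the second, only the named robot is turned away.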

Robots.txt File Example

Here is an example of how you would use both of these commands.

User-agent: *
Disallow: /tutorials/meta/
Disallow: /documents/images/
Disallow: /pages/404redirect/

In this example the asterisk is a sort of wild card. It addresses all of the search engine robots that would be indexing your site. Using the disallow command we are telling the search engine robots to not index the meta files, images, and 404 redirect pages in our site.
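One way to check what this example file does is to feed it to Python's standard urllib.robotparser module and ask about a few addresses. This is a rough sketch: the robot name AnyBot, the domain www.example.com, and the page names being tested are assumptions made up for the demonstration.

```python
from urllib.robotparser import RobotFileParser

# The example file from this post, held in a string.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /tutorials/meta/
Disallow: /documents/images/
Disallow: /pages/404redirect/
"""

parser = RobotFileParser()
parser.parse(EXAMPLE_ROBOTS_TXT.splitlines())

SITE = "http://www.example.com"  # hypothetical domain
CHECKS = [
    "/tutorials/meta/index.html",      # inside a disallowed folder
    "/tutorials/index.html",           # same parent folder, not disallowed
    "/documents/images/picture1.jpg",  # inside a disallowed folder
    "/pages/about.html",               # not disallowed
]

# "AnyBot" is a made-up robot name; the asterisk rule applies to it.
results = {path: parser.can_fetch("AnyBot", SITE + path) for path in CHECKS}

for path, allowed in results.items():
    print(path, "allowed" if allowed else "blocked")
```

Only the addresses that begin with one of the three Disallow values come back as blocked; everything else on the site stays open to the robots.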


These paths can be substituted for any files or folders that you do not want to have indexed. It is important to note that each Disallow value is matched against the beginning of a page's address, so a value ending in a slash, such as /tutorials/meta/, covers everything inside the meta folder within the tutorials folder. If the content sits at the top level of your site rather than inside another folder, you can leave out the outer folder name:
Disallow: /images/

This would direct the search engine robots to not index anything inside the images folder. On the other hand, if you have folders within folders and want to block a single file in an inner folder, your command may look like:

Disallow: /documents/images/picture1.jpg
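The matching described above boils down to a "starts with" test, which the short Python sketch below imitates. It is a deliberately simplified illustration: the function is_blocked is a hypothetical helper, and it ignores extras that some search engines support, such as wildcards inside paths or Allow lines.

```python
def is_blocked(url_path, disallow_rules):
    """Return True if url_path starts with any non-empty Disallow value."""
    return any(rule and url_path.startswith(rule) for rule in disallow_rules)


# One rule for a whole top-level folder, one rule for a single file.
rules = ["/images/", "/documents/images/picture1.jpg"]

# Hypothetical page addresses used only for illustration.
paths = [
    "/images/logo.gif",
    "/documents/images/picture1.jpg",
    "/documents/images/picture2.jpg",
    "/documents/index.html",
]

for path in paths:
    print(path, "blocked" if is_blocked(path, rules) else "allowed")
```

Note that the single-file rule leaves picture2.jpg in the same folder untouched, while the folder rule covers every file under /images/.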

These files are very simple to use. Upload robots.txt to the root directory of your website, alongside your regular website files, because robots only look for it at the top level of your domain.

Check out this site for more robots.txt file examples.

Action Steps For A Robots.txt File

1. Determine if there are any pages that you do not want the search engine to index.

2. Create a simple text file in Notepad using these commands, and name it robots.txt.

3. Upload the file to the root directory of your website through your hosting company.
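If you would rather generate the file than type it in Notepad, the second step can be sketched in a few lines of Python. The function names build_robots_txt and write_robots_txt are hypothetical, and the demo writes into a temporary folder; in practice you would point it at the folder you upload to your host.

```python
import tempfile
from pathlib import Path


def build_robots_txt(disallowed_paths, user_agent="*"):
    """Return robots.txt text with one Disallow line per path."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    return "\n".join(lines) + "\n"


def write_robots_txt(folder, disallowed_paths):
    """Write robots.txt into the given folder and return its location."""
    target = Path(folder) / "robots.txt"
    target.write_text(build_robots_txt(disallowed_paths), encoding="utf-8")
    return target


# Step 1: the pages you decided to keep out of the index.
pages_to_hide = ["/tutorials/meta/", "/documents/images/", "/pages/404redirect/"]

# Step 2: create the file (a temporary folder stands in for your site folder).
site_folder = tempfile.mkdtemp()
robots_file = write_robots_txt(site_folder, pages_to_hide)

print(robots_file.read_text(encoding="utf-8"))
```

Step 3, the upload itself, still happens through whatever file manager or FTP tool your hosting company provides.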

Important Points About Robots.txt Files

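  • Robots that honor the file will not index these pages. Therefore, any information or keywords on these pages will not help your search rankings.
  • Many site owners use this file to keep pages out of the search results. Keep in mind that robots.txt is itself publicly readable and does not stop visitors from opening a page, so it is not a substitute for password protection.
  • Be sure to use the asterisk to ensure that you are addressing all search engines.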
