darkgrey.com

How to Keep Robots Out of Your Web Site

 
Author: Roberto Bonomi

THE ROBOTS.TXT FILE

Search engines were created to help people find information quickly on the Internet. They acquire much of that information through robots (also known as spiders or crawlers) that look for web pages on their behalf.

These robots explore the web, looking for and recording all kinds of information. They usually start with URLs submitted by users, with links they find on other web sites, with sitemap files, or with the top level of a site.

Once a robot accesses the home page, it recursively accesses all the pages linked from that page. A robot can also check out every page it can find on a particular server.
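The link-following behavior described above can be sketched in a few lines of Python. This is only an illustration, not how any particular search engine works: the SITE dictionary is a made-up, in-memory stand-in for a real web site (each key is a page, each value is the list of pages it links to), and the crawl function name is hypothetical.

```python
from collections import deque

# Hypothetical site used only for illustration: page -> pages it links to.
SITE = {
    "/": ["/about.html", "/e-books/"],
    "/about.html": ["/"],
    "/e-books/": ["/e-books/guide.html"],
    "/e-books/guide.html": [],
}


def crawl(start, links):
    """Visit every page reachable from `start`, following links breadth-first."""
    seen = {start}           # pages already queued, so no page is visited twice
    queue = deque([start])   # pages waiting to be visited
    order = []               # pages in the order the robot visited them
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order
```

Calling crawl("/", SITE) starts at the home page and reaches every page of the sample site, including everything under /e-books/, because nothing has told the robot to stay out.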

After a robot finds a web page, it indexes the title, the keywords, the text, and so on. Sometimes, though, you might want to prevent search engines from indexing some of your web pages, such as news postings and specially marked pages (for example, affiliate pages). Keep in mind that compliance with the conventions described here is purely voluntary: each robot decides for itself whether to honor them.

ROBOTS EXCLUSION PROTOCOL

So if you want robots to keep out of some of your web pages, you can ask them to ignore the pages that you don't want indexed. To do that, place a robots.txt file in the top-level (root) directory of your web site.

For example, if you have a directory called e-books and you want to ask robots to keep out of it, your robots.txt file should read:

User-agent: *
Disallow: /e-books/
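If you want to check how a well-behaved robot would read the e-books rule, Python's standard urllib.robotparser module can parse it for you. The sketch below writes the rule with one directive per line and a leading slash on the path, which is the form robots expect. The is_allowed helper and the www.example.com addresses are made up for this illustration.

```python
from urllib.robotparser import RobotFileParser

# The e-books rule, one directive per line, with the path starting at the site root.
ROBOTS_TXT = """\
User-agent: *
Disallow: /e-books/
"""


def is_allowed(robots_txt, url, agent="*"):
    """Return True if a robot that honors robots.txt may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse the rules from memory, no download
    return parser.can_fetch(agent, url)
```

With these rules, is_allowed(ROBOTS_TXT, "http://www.example.com/e-books/guide.html") returns False, while the same call for "http://www.example.com/index.html" returns True. This only tells you what a polite robot will do; it does not stop a robot that ignores the file.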

When you don't have enough control over your server to set up a robots.txt file, you can try adding a META tag to the head section of each HTML document you want to protect.

For example, a tag like the following tells robots not to index a particular page and not to follow the links on it:

<meta name="ROBOTS" content="NOINDEX, NOFOLLOW">

Support for the META tag among robots is not as widespread as support for the Robots Exclusion Protocol, but most major web indexes currently support it.
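To confirm that a page really carries the robots META tag, you can read it back the way a robot might. The following is a minimal sketch using Python's standard html.parser module. The RobotsMetaParser class, the robots_directives function, and the sample PAGE are all made up for this illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser already lowercases tag and attribute names for us.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        for part in attributes.get("content", "").split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of robots directives declared in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page used only for illustration.
PAGE = (
    "<html><head><title>Affiliates</title>"
    '<meta name="ROBOTS" content="NOINDEX, NOFOLLOW">'
    "</head><body>Partner links</body></html>"
)
```

For the sample page, robots_directives(PAGE) returns a set containing "noindex" and "nofollow". A page without the tag gives an empty set, which a robot treats as permission to index the page and follow its links.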

NEWS POSTINGS

If you want to keep the search engines out of your news postings, you can add an "X-no-archive" line to your postings' headers:

X-no-archive: yes

Most common news clients allow you to add an X-no-archive line to the headers of your news postings, but some of them don't.
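If you compose postings with a script rather than a news client, you can add the header yourself. Here is a minimal sketch using Python's standard email.message module. The build_posting function, the address, and the newsgroup name are assumptions made for this illustration, and the sketch only builds the message text; it does not send anything.

```python
from email.message import EmailMessage


def build_posting(sender, newsgroup, subject, body):
    """Build a news posting that asks archives not to keep a copy."""
    message = EmailMessage()
    message["From"] = sender
    message["Newsgroups"] = newsgroup
    message["Subject"] = subject
    message["X-no-archive"] = "yes"  # the archive exclusion header
    message.set_content(body)
    return message


# Hypothetical posting used only for illustration.
posting = build_posting(
    "poster@example.com",
    "misc.test",
    "Test posting",
    "Please do not archive this message.",
)
```

Printing posting.as_string() shows the finished headers, including the X-no-archive: yes line, followed by the body. As with robots.txt, the header is only a request that archives may or may not honor.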

The problem is that most search engines assume that all information they find is public unless marked otherwise.

So be careful: although the robot and archive exclusion standards may help keep your material out of the major search engines, there are others that respect no such rules.

If you're highly concerned about the privacy of your e-mail and Usenet postings, you should use anonymous remailers and PGP. You can read about them here:

http://www.well.com/user/abacard/remail.html
http://www.io.com/~combs/htmls/crypto.html
http://world.std.com/~franl/pgp/

Even if you are not particularly concerned about privacy, remember that anything you write can be indexed and archived somewhere indefinitely, so use the robots.txt file wherever you need it.

Author Bio:
Roberto Bonomi is a famous writer. Roberto likes to scribble articles about this topic.
 
Copyright © 2008 www.darkgreycells.com All Rights Reserved.