
Designing Robot Text

What is a robots.txt file?

Basically, it tells the robots that visit your website where they are and are not allowed to go. Spidering is what search engine companies do to grab your website's information and pull it back to their search engine. A well-behaved search engine spider will follow the instructions in the robots.txt file.

Designing a robots.txt file:

It's actually very simple to do. First of all, it must be written in ASCII. To newbies, that basically means making it a plain text file. To really, really new newbies, that means using your Notepad application to make it.

Your robots.txt file must live in the root directory of the web site (for example, http://www.example.com/robots.txt), as spiders will not look for it anywhere else.
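
To make the "root directory" rule concrete, here is a minimal Python sketch of how a spider might work out where to look, starting from any page on the site. The function name robots_url and the example.com address are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page deep inside a site.
print(robots_url("http://www.example.com/products/widgets.html?id=7"))
```

Whatever page the spider starts from, the file it asks for is always the one at the top of the site, which is why a copy placed in a subdirectory is never read.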

Examples
To exclude all robots from parts of the server:
User-agent: *
Disallow: /cgi-bin/
Disallow: /misc/sitestats/
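
If you want to check rules like these before uploading them, Python's standard urllib.robotparser module can read them from memory. This is only a sketch: "ExampleBot" and the three test paths are invented for illustration, and real spiders may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# The rules from the example above, as they would appear in robots.txt.
RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /misc/sitestats/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())  # parsed in memory; nothing is downloaded

# "ExampleBot" is a made-up robot name; the paths are hypothetical.
for path in ("/index.html", "/cgi-bin/search.pl", "/misc/sitestats/today.html"):
    verdict = "allowed" if parser.can_fetch("ExampleBot", path) else "blocked"
    print(path, verdict)
```

Each Disallow line is matched as a prefix of the requested path, so everything under /cgi-bin/ and /misc/sitestats/ is blocked while the rest of the site stays open.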


Exclude a specific spider from parts of the server:
User-agent: Slurp
Disallow: /cgi-bin/
Disallow: /secure/
Disallow: /products/
Disallow: /misc/sitestats/


Nothing is disallowed and the spider can follow all links:
User-agent: *
Disallow:


To allow a single robot complete access and exclude all others:
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
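
A file with more than one record is easy to get wrong, so here is a hedged sketch that tests the single-robot case with Python's standard urllib.robotparser. The rules are written with a blank line between the two records and without a version number on the robot name. "ExampleBot/0.1" and the /products/ path are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# One record for Googlebot, one catch-all record for everyone else.
RULES = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def may_fetch(agent: str, path: str) -> bool:
    """Return True if the rules above let this robot request the path."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch(agent, path)


print(may_fetch("Googlebot/1.0", "/products/"))   # named robot
print(may_fetch("ExampleBot/0.1", "/products/"))  # hypothetical other robot
```

The named record wins for the robot it names, and the "*" record only applies to robots that have no record of their own.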


This keeps your entire web site from being indexed:
User-agent: *
Disallow: /

Are you getting 404 errors?

If you do not have a robots.txt file in the root directory of your web site, you're going to see a large number of 404 errors in your web stats. This is because bots or spiders requested the robots.txt file, but it wasn't there. You can confirm this by looking at your error logs.
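
One way to confirm this is to count the failed requests yourself. The sketch below assumes your server writes the widely used Common Log Format; the two sample lines are invented for illustration, and the function name missing_robots_hits is hypothetical.

```python
import re

# Assumes Common Log Format: ... "METHOD /path PROTOCOL" STATUS SIZE
REQUEST = re.compile(r'"(?:GET|HEAD) (\S+)[^"]*" (\d{3}) ')


def missing_robots_hits(log_lines):
    """Count requests for /robots.txt that the server answered with 404."""
    count = 0
    for line in log_lines:
        match = REQUEST.search(line)
        if match and match.group(1) == "/robots.txt" and match.group(2) == "404":
            count += 1
    return count


# Invented sample lines, not taken from a real log.
sample = [
    '209.185.108.147 - - [10/Oct/2000:13:55:36 -0700] "GET /robots.txt HTTP/1.0" 404 208',
    '209.185.108.147 - - [10/Oct/2000:13:55:37 -0700] "GET /index.html HTTP/1.0" 200 2326',
]
print(missing_robots_hits(sample))
```

If the count is above zero, uploading even an empty robots.txt file should make those 404 entries stop.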

Looking at Logs and Stats:

If you look at your website's access and error logs, you'll see who is visiting your website, what they are visiting, and which spider bots are spidering you. Some search engine spider bots you might see are:

209.67.247.156 - Fast WebCrawler/2.1 pre14
209.185.143.98 - Mozilla/3.0 (Slurp/s slurp@inktomi.com
216.35.103.42 - Slurp.so/1.0 slurp@inktomi.com
209.185.108.147 - Googlebot/1.0 googlebot@googlebot.com
209.67.229.101 - bos-spider10b.bos.lycos.com Lycos_Spider_(T-Rex)
204.162.96.124 - InfoSeek Sidewinder/0.9
198.3.103.97 - Excite ArchitextSpider
208.219.77.19 - NorthernLight Gulliver/1.3
Openfind data gatherer@Openbot
flunky crawler_admin@bigfoot.com
parallelContextFocusCrawler1.1
larbin_2.2.0 (crawl@compete.com)
Bot mailto:craftbot@yahoo.com
SpaceBison
webbandit
BunnySlippers
ScoutAbout
Ziggy The Clown From Hell!!
LinkWalker
LexiBot
BaiDuSpider
DigOut4U
KIT Fireball
Microsoft URL Control 5.01.4319
Microsoft URL Control 6.00.8862
moget
Xenu_s Link Sleuth 1.0r
URL Indexer
sitecheck.internetseer.com
MFC_Tear_Sample
Nokia7110
Nutscrape
TV33_Mercator_1 1.0
teomaagent crawler admin@teoma.com
bumblebee@relevare.com
Mitsu
CSE@IITBombay
MARS SV
AtlantisSearch
RepoMonkey Bait & Tackle
spider.yellopet.com www.yellopet.com
The Intraformant
SwishSpider
SlySearch (slysearch@slysearch.com)
AmigaVoyager
Toutatis 2.5 2 (http:
GaisLab data gatherer, Gaisbot
Zeus ThemeSite Viewer Webster Pro V2.9 Win32
WIBBLE WOBBLE
Surfnomore Spider v1.1
WebSauger 1.20b
UIowaCrawler
Kenjin Spider
Robot@SuperSnooper.Com
16274.345.67.23 WebWasher 3.0
NOSEYBOLLOCKS V1.0
GB2 LinkChecker
EmailWolf 1.00
EmailSiphon
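
To build a list like this from your own logs, you can tally the user-agent strings. The sketch below assumes the Combined Log Format, where the user agent is the last quoted field on each line. The sample lines are invented for illustration, and count_agents is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumes Combined Log Format: the user agent is the final quoted field.
USER_AGENT = re.compile(r'"([^"]*)"\s*$')


def count_agents(log_lines):
    """Return a Counter mapping each user-agent string to its hit count."""
    counts = Counter()
    for line in log_lines:
        match = USER_AGENT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Invented sample lines, not taken from a real log.
sample = [
    '209.185.108.147 - - [10/Oct/2000:13:55:36 -0700] "GET /robots.txt HTTP/1.0" 200 68 "-" "Googlebot/1.0 googlebot@googlebot.com"',
    '209.185.108.147 - - [10/Oct/2000:13:55:40 -0700] "GET / HTTP/1.0" 200 2326 "-" "Googlebot/1.0 googlebot@googlebot.com"',
    '216.35.103.42 - - [10/Oct/2000:14:02:11 -0700] "GET / HTTP/1.0" 200 2326 "-" "Slurp.so/1.0 slurp@inktomi.com"',
]
for agent, hits in count_agents(sample).most_common():
    print(hits, agent)
```

Sorting by hit count puts the busiest spiders at the top, which helps when deciding which robots deserve their own record in robots.txt.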
