“Search engines only know about your site because they are able to (or not able to) access it. While visitors can click freely from page to page of your site (assuming there aren’t logins or other secure areas they have to pass through first), robots *should* follow instructions you give them about what to access and how often. The robots.txt is this set of instructions.”

“There are all kinds of robots online, some good and some not so good. This may sound a little like Terminator (good vs evil bots) and it is, just without the hostile takeover of humanity (yet).”

Evil robots eat up your bandwidth for evil purposes.
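
A minimal sketch of how a well-behaved crawler reads per-bot rules, using Python's standard urllib.robotparser module. BadBot, GoodBot and example.com are made-up names for illustration only. Keep in mind that robots.txt is just a request: a truly evil robot can simply ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: shut out one named bot, restrict everyone else a little.
RULES = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

BASE = "http://example.com"  # placeholder host

# BadBot is asked to stay away from everything.
badbot_home = parser.can_fetch("BadBot", BASE + "/index.html")

# Any other bot falls back to the "*" group.
goodbot_home = parser.can_fetch("GoodBot", BASE + "/index.html")
goodbot_private = parser.can_fetch("GoodBot", BASE + "/private/report.html")

print(badbot_home, goodbot_home, goodbot_private)
```

Bots that ignore these rules have to be blocked on the server side (firewall, web server config), which is outside of what robots.txt can do.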

robots.txt online generator: http://tools.seobook.com/robots-txt/generator/

“Google’s most famous robot is called Googlebot and you can learn more about it here.”

“Googlebot crawls the web by following links from one page to another, so if your site isn’t well linked, it may be hard for us to discover it.”
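
To picture why a badly linked page is hard to discover, here is a toy sketch of link-following on a made-up site. The SITE dictionary and its paths are assumptions for illustration, and this is of course not how Googlebot is actually implemented: it only shows that a page nothing links to is never reached.

```python
from collections import deque

def discover(links, start):
    """Breadth-first walk: return every page reachable from start by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

# Hypothetical site: page -> pages it links to.
SITE = {
    "/": ["/about", "/blog"],
    "/about": ["/"],
    "/blog": ["/blog/post-1"],
    "/orphan": ["/"],  # links out, but no page links to it
}

found = discover(SITE, "/")
print(sorted(found))
```

Here "/orphan" never shows up in the result, because no page that the crawler visits links to it.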

“Bing’s robot is called… wait for it… bingbot. Duane Forrester gives us the skinny on bingbot (video in Silverlight, because they’re Microsoft!):”

Source: http://outspokenmedia.com/seo/getting-to-know-your-bots-robots-txt-101/

an example:

http://www.whitehouse.gov/robots.txt

#
# robots.txt
#
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these "robots" where not to go on your site,
# you save bandwidth and server resources.
#
# This file will be ignored unless it is at the root of your host:
# Used:    http://example.com/robots.txt
# Ignored: http://example.com/site/robots.txt
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/wc/robots.html
#
# For syntax checking, see:
# http://www.sxw.org.uk/computing/robots/check.html

User-agent: *
Crawl-delay: 10
# Directories
Disallow: /includes/
Disallow: /misc/
Disallow: /modules/
Disallow: /profiles/
Disallow: /scripts/
Disallow: /themes/
# Files
Disallow: /CHANGELOG.txt
Disallow: /cron.php
Disallow: /INSTALL.mysql.txt
Disallow: /INSTALL.pgsql.txt
Disallow: /install.php
Disallow: /INSTALL.txt
Disallow: /LICENSE.txt
Disallow: /MAINTAINERS.txt
Disallow: /update.php
Disallow: /UPGRADE.txt
Disallow: /xmlrpc.php
# Paths (clean URLs)
Disallow: /admin/
Disallow: /comment/reply/
Disallow: /filter/tips/
Disallow: /logout/
Disallow: /node/add/
Disallow: /search/
Disallow: /user/register/
Disallow: /user/password/
Disallow: /user/login/
# Paths (no clean URLs)
Disallow: /?q=admin/
Disallow: /?q=comment/reply/
Disallow: /?q=filter/tips/
Disallow: /?q=logout/
Disallow: /?q=node/add/
Disallow: /?q=search/
Disallow: /?q=user/password/
Disallow: /?q=user/register/
Disallow: /?q=user/login/
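
A small sketch of how a crawler could evaluate rules like the ones above, again with Python's urllib.robotparser. RULES is a shortened copy of the listing (not the full file) and "MyCrawler" is a made-up bot name. The crawl_delay() method needs Python 3.6 or newer.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the rules listed above.
RULES = """\
User-agent: *
Crawl-delay: 10
Disallow: /includes/
Disallow: /admin/
Disallow: /search/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

BASE = "http://www.whitehouse.gov"
AGENT = "MyCrawler"  # hypothetical bot name, matched by "User-agent: *"

allowed_home = parser.can_fetch(AGENT, BASE + "/")
allowed_admin = parser.can_fetch(AGENT, BASE + "/admin/settings")
allowed_search = parser.can_fetch(AGENT, BASE + "/search/robots")

# Seconds the site asks the bot to wait between requests.
delay = parser.crawl_delay(AGENT)

print(allowed_home, allowed_admin, allowed_search, delay)
```

To check a live site instead of a pasted copy, the same class offers set_url() and read(), which download the file before the checks are made.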

another example:

http://edition.cnn.com/robots.txt
Sitemap: http://www.cnn.com/sitemaps/sitemap-index.xml
Sitemap: http://www.cnn.com/sitemaps/sitemap-news.xml
Sitemap: http://www.cnn.com/sitemaps/sitemap-video-index.xml
Sitemap: http://www.cnn.com/sitemaps/sitemap-interactive.xml
User-agent: *
Disallow: /.element
Disallow: /editionssi
Disallow: /ads
Disallow: /aol
Disallow: /audio
Disallow: /audioselect
Disallow: /beta
Disallow: /browsers
Disallow: /cl
Disallow: /cnews
Disallow: /cnn_adspaces
Disallow: /cnnbeta
Disallow: /cnnintl_adspaces
Disallow: /development
Disallow: /NewsPass
Disallow: /NOKIA
Disallow: /partners
Disallow: /pipeline
Disallow: /pointroll
Disallow: /POLLSERVER
Disallow: /pr
Disallow: /PV
Disallow: /quickcast
Disallow: /Quickcast
Disallow: /QUICKNEWS
Disallow: /test
Disallow: /virtual
Disallow: /WEB-INF
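
The Sitemap: lines in the file above are not tied to any User-agent group, so they can be pulled out on their own. A minimal sketch, assuming the file content is already available as a string (SAMPLE is a shortened copy of the listing above, extract_sitemaps is a made-up helper name):

```python
from typing import List

def extract_sitemaps(robots_txt: str) -> List[str]:
    """Return all sitemap URLs named in a robots.txt text, in file order."""
    sitemaps = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        # Split on the first colon only: the URL itself contains colons.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            sitemaps.append(value.strip())
    return sitemaps

# Shortened copy of the file shown above.
SAMPLE = """\
Sitemap: http://www.cnn.com/sitemaps/sitemap-index.xml
Sitemap: http://www.cnn.com/sitemaps/sitemap-news.xml
User-agent: *
Disallow: /ads
"""

sitemap_urls = extract_sitemaps(SAMPLE)
print(sitemap_urls)
```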

another example:

http://www.amazon.com/robots.txt
User-agent: *
Disallow: /exec/obidos/account-access-login
Disallow: /exec/obidos/change-style
Disallow: /exec/obidos/flex-sign-in
Disallow: /exec/obidos/handle-buy-box
Disallow: /exec/obidos/tg/cm/member/
Disallow: /gp/cart
Disallow: /gp/flex
Disallow: /gp/product/e-mail-friend
Disallow: /gp/product/product-availability
Disallow: /gp/product/rate-this-item
Disallow: /gp/sign-in
Disallow: /gp/reader
Disallow: /gp/sitbv3/reader
Disallow: /gp/richpub/syltguides/create
Disallow: /gp/gfix
Disallow: /gp/associations/wizard.html
Disallow: /gp/dmusic/order
Disallow: /gp/legacy-handle-buy-box.html
Disallow: /gp/aws/ssop
Disallow: /gp/yourstore
Disallow: /gp/gift-central/organizer/add-wishlist
Disallow: /gp/vote
Disallow: /gp/voting/
Disallow: /gp/music/wma-pop-up
Disallow: /gp/customer-images
Disallow: /gp/richpub/listmania/createpipeline
Disallow: /gp/content-form
Disallow: /gp/pdp/invitation/invite
Disallow: /gp/customer-reviews/common/du
Disallow: /gp/customer-reviews/write-a-review.html
Disallow: /gp/associations/wizard.html
Disallow: /gp/music/clipserve
Disallow: /gp/offer-listing
Disallow: /gp/customer-media/upload
Disallow: /gp/history
Disallow: /gp/item-dispatch
Disallow: /gp/dmusic/order/handle-buy-box.html
Disallow: /gp/recsradio
Disallow: /gp/slredirect
Disallow: /dp/shipping/
Disallow: /dp/twister-update/
Disallow: /dp/manual-submit/
Disallow: /dp/e-mail-friend/
Disallow: /dp/product-availability/
Disallow: /dp/rate-this-item/
Disallow: /gp/registry/wishlist/*/reserve
Disallow: /gp/structured-ratings/actions/get-experience.html
Disallow: /gp/twitter/
Disallow: /ap/signin
Disallow: /gp/registry/wishlist/
Disallow: /wishlist/
Allow: /wishlist/universal*
Allow: /wishlist/vendor-button*
Allow: /wishlist/get-button*
Disallow: /gp/wishlist/
Allow: /gp/wishlist/universal*
Allow: /gp/wishlist/vendor-button*
Allow: /gp/wishlist/ipad-install*
Disallow: /registry/wishlist/
Disallow: /review/common/du
Disallow: /gp/registry/search.html
Disallow: /product-reviews/B0069IY63Y
Disallow: /gp/orc/rml/
Disallow: */gcrnsts
Disallow: /gp/gc/widget
Disallow: /gp/dmusic/mp3/player
Disallow: /gp/entity-alert/external
Disallow: /gp/customer-reviews/dynamic/sims-box
Disallow: /review/dynamic/sims-box
Disallow: /gp/redirect.html
Disallow: /gp/twister/ajaxv2
Disallow: /ss/twister/ajax
Disallow: /b?*node=7454917011
Disallow: /b?*node=7454927011
Disallow: /b?*node=7454939011
Disallow: /b?*node=7454898011
Disallow: /gp/customer-media/actions/delete/
Disallow: /gp/customer-media/actions/edit-caption/
Disallow: /gp/dmusic/

User-agent: Googlebot
Disallow: /rss/people/*/reviews
Disallow: /gp/pdp/rss/*/reviews
Disallow: /gp/cdp/member-reviews/
Disallow: /gp/aw/cr/
Disallow: /exec/obidos/account-access-login
Disallow: /exec/obidos/change-style
Disallow: /exec/obidos/flex-sign-in
Disallow: /exec/obidos/handle-buy-box
Disallow: /exec/obidos/tg/cm/member/
Disallow: /gp/cart
Disallow: /gp/flex
Disallow: /gp/product/e-mail-friend
Disallow: /gp/product/product-availability
Disallow: /gp/product/rate-this-item
Disallow: /gp/sign-in
Disallow: /gp/reader
Disallow: /gp/sitbv3/reader
Disallow: /gp/richpub/syltguides/create
Disallow: /gp/gfix
Disallow: /gp/associations/wizard.html
Disallow: /gp/dmusic/order
Disallow: /gp/legacy-handle-buy-box.html
Disallow: /gp/aws/ssop
Disallow: /gp/yourstore
Disallow: /gp/gift-central/organizer/add-wishlist
Disallow: /gp/vote
Disallow: /gp/voting/
Disallow: /gp/music/wma-pop-up
Disallow: /gp/customer-images
Disallow: /gp/richpub/listmania/createpipeline
Disallow: /gp/content-form
Disallow: /gp/pdp/invitation/invite
Disallow: /gp/customer-reviews/common/du
Disallow: /gp/customer-reviews/write-a-review.html
Disallow: /gp/associations/wizard.html
Disallow: /gp/music/clipserve
Disallow: /gp/offer-listing
Disallow: /gp/customer-media/upload
Disallow: /gp/history
Disallow: /gp/item-dispatch
Disallow: /gp/dmusic/order/handle-buy-box.html
Disallow: /gp/recsradio
Disallow: /gp/slredirect
Disallow: /dp/shipping/
Disallow: /dp/twister-update/
Disallow: /dp/manual-submit/
Disallow: /dp/e-mail-friend/
Disallow: /dp/product-availability/
Disallow: /dp/rate-this-item/
Disallow: /gp/registry/wishlist/*/reserve
Disallow: /gp/structured-ratings/actions/get-experience.html
Disallow: /gp/twitter/
Disallow: /ap/signin
Disallow: /gp/registry/wishlist/
Disallow: /wishlist/
Allow: /wishlist/universal*
Allow: /wishlist/vendor-button*
Allow: /wishlist/get-button*
Disallow: /gp/wishlist/
Allow: /gp/wishlist/universal*
Allow: /gp/wishlist/vendor-button*
Allow: /gp/wishlist/ipad-install*
Disallow: /registry/wishlist/
Disallow: /review/common/du
Disallow: /gp/registry/search.html
Disallow: /product-reviews/B0069IY63Y
Disallow: /gp/orc/rml/
Disallow: */gcrnsts
Disallow: /gp/gc/widget
Disallow: /gp/dmusic/mp3/player
Disallow: /gp/entity-alert/external
Disallow: */sim/B001132UEE
Disallow: /gp/customer-reviews/dynamic/sims-box
Disallow: /review/dynamic/sims-box
Disallow: /gp/redirect.html
Disallow: /gp/twister/ajaxv2
Disallow: /ss/twister/ajax
Disallow: /b?*node=7454917011
Disallow: /b?*node=7454927011
Disallow: /b?*node=7454939011
Disallow: /b?*node=7454898011
Disallow: /gp/customer-media/actions/delete/
Disallow: /gp/customer-media/actions/edit-caption/
Disallow: /gp/dmusic/

User-agent: EtaoSpider
Disallow: /

# Sitemap files
Sitemap: http://www.amazon.com/sitemap-manual-index.xml
Sitemap: http://www.amazon.com/sitemap_vendor_videos_us.xml
Sitemap: http://www.amazon.com/sitemap_vod_index.xml
Sitemap: http://www.amazon.com/sitemaps.f3053414d236e84.SitemapIndex_0.xml.gz
Sitemap: http://www.amazon.com/sitemaps.1946f6b8171de60.SitemapIndex_0.xml.gz
Sitemap: http://www.amazon.com/sitemaps.bbb7d657c7e29fa.SitemapIndex_0.xml.gz
Sitemap: http://www.amazon.com/sitemaps.11aafed315ee654.SitemapIndex_0.xml.gz
Sitemap: http://www.amazon.com/sitemaps.c21f969b5f03d33.SitemapIndex_0.xml.gz
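
The Amazon file mixes Disallow and Allow lines and uses * wildcards, so a simple prefix check is not enough to evaluate it. Below is a rough sketch of one way to do it, assuming the matching described in RFC 9309 and in Google's robots.txt documentation as I understand them: * matches any run of characters, a trailing $ anchors the end, the longest matching pattern wins, and Allow wins a tie. The function names and the short RULES list (a few lines copied from the file above) are for illustration only, and other crawlers may interpret the file differently.

```python
import re

def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then let each '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(path, rules):
    """rules: list of (directive, pattern) pairs, directive is 'allow' or 'disallow'."""
    best_len = -1
    best_allow = True  # no matching rule at all means the path is allowed
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if rule_to_regex(pattern).match(path):
            allow = directive == "allow"
            length = len(pattern)
            # Longest pattern wins; on equal length, Allow wins.
            if length > best_len or (length == best_len and allow):
                best_len, best_allow = length, allow
    return best_allow

# A few rules copied from the "User-agent: *" group above.
RULES = [
    ("disallow", "/wishlist/"),
    ("allow", "/wishlist/universal*"),
    ("disallow", "/gp/registry/wishlist/*/reserve"),
    ("disallow", "*/gcrnsts"),
]

for path in ["/wishlist/abc", "/wishlist/universal/add",
             "/gp/registry/wishlist/123/reserve", "/foo/gcrnsts", "/dp/B000"]:
    print(path, is_allowed(path, RULES))
```

With these rules, /wishlist/universal/add is allowed even though /wishlist/ is disallowed, because the Allow pattern is the longer (more specific) match.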

google and sitemaps: https://support.google.com/webmasters/answer/156184 

list of all known web crawlers / robots out there: http://www.robotstxt.org/db.html
