“Search engines only know about your site because they are able to (or not able to) access it. While visitors can click freely from page to page of your site (assuming there aren’t logins or other secure areas they have to pass through first), robots *should* follow instructions you give them about what to access and how often. The robots.txt is this set of instructions.”
“There are all kinds of robots online, some good and some not so good. This may sound a little like Terminator (good vs evil bots) and it is, just without the hostile takeover of humanity (yet).”
Evil robots eat up your bandwidth for evil purposes.
robots.txt online generator: http://tools.seobook.com/robots-txt/generator/
“Google’s most famous robot is called Googlebot and you can learn more about it here.”
“Bing’s robot is called… wait for it… bingbot. Duane Forrester gives us the skinny on bingbot (video in Silverlight, because they’re Microsoft!):”
Source: http://outspokenmedia.com/seo/getting-to-know-your-bots-robots-txt-101/
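To make the "set of instructions" idea concrete, here is a minimal sketch of how a well-behaved robot might consult robots.txt before fetching a page. It uses Python's standard-library urllib.robotparser. The rules, the example.com URLs and the robot names "examplebot" and "badbot" are made up for illustration and do not come from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, written inline so the sketch needs no network access.
RULES = """\
User-agent: *
Disallow: /admin/
Disallow: /search/

User-agent: badbot
Disallow: /
"""


def build_parser(rules_text):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser


def allowed(parser, agent, url):
    """Return True if the rules permit `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


parser = build_parser(RULES)
for agent, url in [
    ("examplebot", "http://example.com/index.html"),
    ("examplebot", "http://example.com/admin/users"),
    ("badbot", "http://example.com/index.html"),
]:
    print(agent, url, allowed(parser, agent, url))
```

Keep in mind that this check happens on the robot's side. Nothing in robots.txt stops a robot that simply ignores the file, which is why the quote above says robots *should* follow the instructions.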
example: http://www.whitehouse.gov/robots.txt

#
# robots.txt
#
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these "robots" where not to go on your site,
# you save bandwidth and server resources.
#
# This file will be ignored unless it is at the root of your host:
# Used: http://example.com/robots.txt
# Ignored: http://example.com/site/robots.txt
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/wc/robots.html
#
# For syntax checking, see:
# http://www.sxw.org.uk/computing/robots/check.html

User-agent: *
Crawl-delay: 10
# Directories
Disallow: /includes/
Disallow: /misc/
Disallow: /modules/
Disallow: /profiles/
Disallow: /scripts/
Disallow: /themes/
# Files
Disallow: /CHANGELOG.txt
Disallow: /cron.php
Disallow: /INSTALL.mysql.txt
Disallow: /INSTALL.pgsql.txt
Disallow: /install.php
Disallow: /INSTALL.txt
Disallow: /LICENSE.txt
Disallow: /MAINTAINERS.txt
Disallow: /update.php
Disallow: /UPGRADE.txt
Disallow: /xmlrpc.php
# Paths (clean URLs)
Disallow: /admin/
Disallow: /comment/reply/
Disallow: /filter/tips/
Disallow: /logout/
Disallow: /node/add/
Disallow: /search/
Disallow: /user/register/
Disallow: /user/password/
Disallow: /user/login/
# Paths (no clean URLs)
Disallow: /?q=admin/
Disallow: /?q=comment/reply/
Disallow: /?q=filter/tips/
Disallow: /?q=logout/
Disallow: /?q=node/add/
Disallow: /?q=search/
Disallow: /?q=user/password/
Disallow: /?q=user/register/
Disallow: /?q=user/login/

another example: http://edition.cnn.com/robots.txt
Sitemap: http://www.cnn.com/sitemaps/sitemap-index.xml
Sitemap: http://www.cnn.com/sitemaps/sitemap-news.xml
Sitemap: http://www.cnn.com/sitemaps/sitemap-video-index.xml
Sitemap: http://www.cnn.com/sitemaps/sitemap-interactive.xml

User-agent: *
Disallow: /.element
Disallow: /editionssi
Disallow: /ads
Disallow: /aol
Disallow: /audio
Disallow: /audioselect
Disallow: /beta
Disallow: /browsers
Disallow: /cl
Disallow: /cnews
Disallow: /cnn_adspaces
Disallow: /cnnbeta
Disallow: /cnnintl_adspaces
Disallow: /development
Disallow: /NewsPass
Disallow: /NOKIA
Disallow: /partners
Disallow: /pipeline
Disallow: /pointroll
Disallow: /POLLSERVER
Disallow: /pr
Disallow: /PV
Disallow: /quickcast
Disallow: /Quickcast
Disallow: /QUICKNEWS
Disallow: /test
Disallow: /virtual
Disallow: /WEB-INF

another example: http://www.amazon.com/robots.txt
User-agent: *
Disallow: /exec/obidos/account-access-login
Disallow: /exec/obidos/change-style
Disallow: /exec/obidos/flex-sign-in
Disallow: /exec/obidos/handle-buy-box
Disallow: /exec/obidos/tg/cm/member/
Disallow: /gp/cart
Disallow: /gp/flex
Disallow: /gp/product/e-mail-friend
Disallow: /gp/product/product-availability
Disallow: /gp/product/rate-this-item
Disallow: /gp/sign-in
Disallow: /gp/reader
Disallow: /gp/sitbv3/reader
Disallow: /gp/richpub/syltguides/create
Disallow: /gp/gfix
Disallow: /gp/associations/wizard.html
Disallow: /gp/dmusic/order
Disallow: /gp/legacy-handle-buy-box.html
Disallow: /gp/aws/ssop
Disallow: /gp/yourstore
Disallow: /gp/gift-central/organizer/add-wishlist
Disallow: /gp/vote
Disallow: /gp/voting/
Disallow: /gp/music/wma-pop-up
Disallow: /gp/customer-images
Disallow: /gp/richpub/listmania/createpipeline
Disallow: /gp/content-form
Disallow: /gp/pdp/invitation/invite
Disallow: /gp/customer-reviews/common/du
Disallow: /gp/customer-reviews/write-a-review.html
Disallow: /gp/associations/wizard.html
Disallow: /gp/music/clipserve
Disallow: /gp/offer-listing
Disallow: /gp/customer-media/upload
Disallow: /gp/history
Disallow: /gp/item-dispatch
Disallow: /gp/dmusic/order/handle-buy-box.html
Disallow: /gp/recsradio
Disallow: /gp/slredirect
Disallow: /dp/shipping/
Disallow: /dp/twister-update/
Disallow: /dp/manual-submit/
Disallow: /dp/e-mail-friend/
Disallow: /dp/product-availability/
Disallow: /dp/rate-this-item/
Disallow: /gp/registry/wishlist/*/reserve
Disallow: /gp/structured-ratings/actions/get-experience.html
Disallow: /gp/twitter/
Disallow: /ap/signin
Disallow: /gp/registry/wishlist/
Disallow: /wishlist/
Allow: /wishlist/universal*
Allow: /wishlist/vendor-button*
Allow: /wishlist/get-button*
Disallow: /gp/wishlist/
Allow: /gp/wishlist/universal*
Allow: /gp/wishlist/vendor-button*
Allow: /gp/wishlist/ipad-install*
Disallow: /registry/wishlist/
Disallow: /review/common/du
Disallow: /gp/registry/search.html
Disallow: /product-reviews/B0069IY63Y
Disallow: /gp/orc/rml/
Disallow: */gcrnsts
Disallow: /gp/gc/widget
Disallow: /gp/dmusic/mp3/player
Disallow: /gp/entity-alert/external
Disallow: /gp/customer-reviews/dynamic/sims-box
Disallow: /review/dynamic/sims-box
Disallow: /gp/redirect.html
Disallow: /gp/twister/ajaxv2
Disallow: /ss/twister/ajax
Disallow: /b?*node=7454917011
Disallow: /b?*node=7454927011
Disallow: /b?*node=7454939011
Disallow: /b?*node=7454898011
Disallow: /gp/customer-media/actions/delete/
Disallow: /gp/customer-media/actions/edit-caption/
Disallow: /gp/dmusic/

User-agent: Googlebot
Disallow: /rss/people/*/reviews
Disallow: /gp/pdp/rss/*/reviews
Disallow: /gp/cdp/member-reviews/
Disallow: /gp/aw/cr/
Disallow: /exec/obidos/account-access-login
Disallow: /exec/obidos/change-style
Disallow: /exec/obidos/flex-sign-in
Disallow: /exec/obidos/handle-buy-box
Disallow: /exec/obidos/tg/cm/member/
Disallow: /gp/cart
Disallow: /gp/flex
Disallow: /gp/product/e-mail-friend
Disallow: /gp/product/product-availability
Disallow: /gp/product/rate-this-item
Disallow: /gp/sign-in
Disallow: /gp/reader
Disallow: /gp/sitbv3/reader
Disallow: /gp/richpub/syltguides/create
Disallow: /gp/gfix
Disallow: /gp/associations/wizard.html
Disallow: /gp/dmusic/order
Disallow: /gp/legacy-handle-buy-box.html
Disallow: /gp/aws/ssop
Disallow: /gp/yourstore
Disallow: /gp/gift-central/organizer/add-wishlist
Disallow: /gp/vote
Disallow: /gp/voting/
Disallow: /gp/music/wma-pop-up
Disallow: /gp/customer-images
Disallow: /gp/richpub/listmania/createpipeline
Disallow: /gp/content-form
Disallow: /gp/pdp/invitation/invite
Disallow: /gp/customer-reviews/common/du
Disallow: /gp/customer-reviews/write-a-review.html
Disallow: /gp/associations/wizard.html
Disallow: /gp/music/clipserve
Disallow: /gp/offer-listing
Disallow: /gp/customer-media/upload
Disallow: /gp/history
Disallow: /gp/item-dispatch
Disallow: /gp/dmusic/order/handle-buy-box.html
Disallow: /gp/recsradio
Disallow: /gp/slredirect
Disallow: /dp/shipping/
Disallow: /dp/twister-update/
Disallow: /dp/manual-submit/
Disallow: /dp/e-mail-friend/
Disallow: /dp/product-availability/
Disallow: /dp/rate-this-item/
Disallow: /gp/registry/wishlist/*/reserve
Disallow: /gp/structured-ratings/actions/get-experience.html
Disallow: /gp/twitter/
Disallow: /ap/signin
Disallow: /gp/registry/wishlist/
Disallow: /wishlist/
Allow: /wishlist/universal*
Allow: /wishlist/vendor-button*
Allow: /wishlist/get-button*
Disallow: /gp/wishlist/
Allow: /gp/wishlist/universal*
Allow: /gp/wishlist/vendor-button*
Allow: /gp/wishlist/ipad-install*
Disallow: /registry/wishlist/
Disallow: /review/common/du
Disallow: /gp/registry/search.html
Disallow: /product-reviews/B0069IY63Y
Disallow: /gp/orc/rml/
Disallow: */gcrnsts
Disallow: /gp/gc/widget
Disallow: /gp/dmusic/mp3/player
Disallow: /gp/entity-alert/external
Disallow: */sim/B001132UEE
Disallow: /gp/customer-reviews/dynamic/sims-box
Disallow: /review/dynamic/sims-box
Disallow: /gp/redirect.html
Disallow: /gp/twister/ajaxv2
Disallow: /ss/twister/ajax
Disallow: /b?*node=7454917011
Disallow: /b?*node=7454927011
Disallow: /b?*node=7454939011
Disallow: /b?*node=7454898011
Disallow: /gp/customer-media/actions/delete/
Disallow: /gp/customer-media/actions/edit-caption/
Disallow: /gp/dmusic/

User-agent: EtaoSpider
Disallow: /

# Sitemap files
Sitemap: http://www.amazon.com/sitemap-manual-index.xml
Sitemap: http://www.amazon.com/sitemap_vendor_videos_us.xml
Sitemap: http://www.amazon.com/sitemap_vod_index.xml
Sitemap: http://www.amazon.com/sitemaps.f3053414d236e84.SitemapIndex_0.xml.gz
Sitemap: http://www.amazon.com/sitemaps.1946f6b8171de60.SitemapIndex_0.xml.gz
Sitemap: http://www.amazon.com/sitemaps.bbb7d657c7e29fa.SitemapIndex_0.xml.gz
Sitemap: http://www.amazon.com/sitemaps.11aafed315ee654.SitemapIndex_0.xml.gz
Sitemap: http://www.amazon.com/sitemaps.c21f969b5f03d33.SitemapIndex_0.xml.gz
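
The example files carry more than Disallow lines: the first one sets a Crawl-delay, and the CNN and Amazon files list Sitemap URLs. Below is a rough sketch of how a robot could read both values, again with Python's urllib.robotparser (site_maps() needs Python 3.8 or newer). The excerpt is a shortened, hypothetical mix of lines from the files above, and "examplebot" is a made-up robot name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt: Crawl-delay as in the first example,
# two Sitemap lines as in the CNN file.
EXCERPT = """\
Sitemap: http://www.cnn.com/sitemaps/sitemap-index.xml
Sitemap: http://www.cnn.com/sitemaps/sitemap-news.xml

User-agent: *
Crawl-delay: 10
Disallow: /admin/
Disallow: /search/
"""


def crawl_settings(rules_text, agent):
    """Return (crawl delay in seconds or None, list of sitemap URLs)."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    delay = parser.crawl_delay(agent)   # None if no Crawl-delay applies
    sitemaps = parser.site_maps() or []  # site_maps() returns None if empty
    return delay, sitemaps


delay, sitemaps = crawl_settings(EXCERPT, "examplebot")
print("wait between requests:", delay)
for url in sitemaps:
    print("sitemap:", url)
```

A polite robot would sleep for the returned delay between requests and use the sitemap URLs as a starting list of pages to visit.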
google and sitemaps: https://support.google.com/webmasters/answer/156184
list of all known web crawlers / robots out there: http://www.robotstxt.org/db.html