Domo Arigato, Mr. Roboto
There are plenty of rehashed posts in the blogosphere about misc. SEO 101 practices. When it comes to robots.txt matters, they typically cover topics like these:
- Disallow vs. the essentially non-existent "Allow"
- Directory vs. file exclusions
- Global vs. agent-specific exclusions (ordering, which wins if both are used)
- Wildcard matching: valid vs. invalid, supported vs. unsupported
- HTTP vs. HTTPS
- Subdomains vs. canonical domains
- "friendly" vs. "unfriendly" 'bots (allowing vs. blocking / blocking n' trapping)
- vs. the meta name="" tag (which wins if both are used)
- vs. security (how it's not a method, but can become a liability)
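For reference, here's a minimal robots.txt sketch touching several of the points above (the paths and rules are invented for illustration):

```
# Agent-specific rules: a named group overrides the global group for that bot
User-agent: Googlebot
Disallow: /staging/

# Global rules for everyone else
User-agent: *
Disallow: /private/           # directory exclusion
Disallow: /drafts/notes.html  # file exclusion
Allow: /private/public-faq/   # "Allow": non-standard; support varies by engine
Disallow: /*?sessionid=       # wildcard: supported by some engines, not all
```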
However, to my knowledge not a lot has been discussed about how to view robots.txt from a competitive SEM/SEO perspective specifically. A couple of years ago, Graywolf discussed PPC Quality Score considerations related to robots.txt, and how to sandbox a site's normal vs. campaign landing pages nicely between the algorithms that determine organic ranking vs. paid positioning (affecting metrics such as Minimum CPC, CTR, etc.).
What happens, however, if one is in a very heated, competitive SEM/SEO environment? Or, looked at another way: are you monitoring your competitors' robots.txt files? You should be. It's a common mistake to presume that only Webmasters who are total douches make disclosure slips in their robots.txt files, such as disallowing private content that for whatever reason isn't (yet, ahem?!?) protected by a username and password login. Beyond that, Webmasters sometimes list semi-private locations in their robots.txt files, such as the very examples in Michael's post.
On more than a few occasions, I've found competitors maintaining a nice, tidy list of all their misc. campaign landing pages in their robots.txt files. When that happens, it's a great way to dig up landing pages that might otherwise never have been found, such as obscure creative tests they may be running and/or designs they've tested in the past. Sometimes, sniffing competitors' robots.txt files can lead one to discover whole new components of a developing online marketing strategy, e.g. "Oh, wow! Look who's starting to target the Spanish market all of a sudden!"
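As a rough sketch of what that monitoring could look like in practice, here's a small Python example (the function names and snapshot workflow are my own invention, not any particular tool's): it pulls the Disallow paths out of a robots.txt body and diffs them against an earlier snapshot, so freshly listed landing pages jump out.

```python
def disallowed_paths(robots_txt: str) -> set[str]:
    """Collect every path named in a Disallow directive."""
    paths = set()
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop inline comments
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if path:  # a bare "Disallow:" excludes nothing
                paths.add(path)
    return paths


def new_exclusions(old_snapshot: str, new_snapshot: str) -> set[str]:
    """Paths disallowed now but not before -- often fresh campaign pages."""
    return disallowed_paths(new_snapshot) - disallowed_paths(old_snapshot)
```

Fetching the file itself is a one-liner with `urllib.request`; the interesting part is keeping dated snapshots around so each new Disallow line stands out.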
Part of the deal is that robots.txt files are content management issues unto themselves. Keeping them updated and properly synced with site content is enough work as it is for many, if not most, site managers. Some Webmasters who stay fully on top of robots management do pay their dues, making very careful choices about how to use robots.txt vs. metadata to manage indexing vs. crawling (semantic flow) vs. caching, and putting in the work on their builds accordingly... but they are in the minority.
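To make that distinction concrete: a Disallow line in robots.txt stops a page from being crawled at all, whereas a robots meta tag lets the page be crawled (and its links followed) while still controlling indexing and caching. A sketch, with illustrative directives:

```html
<!-- crawled, and its links followed, but kept out of the index
     and out of the engine's cached copy -->
<meta name="robots" content="noindex, follow, noarchive">
```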
Join the minority, and let's go hunting.