Dup' Content Googlebombing

Duplicate content is easy for publishers to create by mistake, without ever realizing it. It happens when the exact same content exists, or can be made to exist, at more than one URL. When engines find that content and URLs don't map squarely 1:1, they can get confused about which is the authoritative and/or original source, and in some cases they may interpret it as spam. The jury is out on whether they apply filters or actual penalties (i.e. ranking demotions) to sites found with this problem, but everyone in the business agrees it's fundamentally a bad thing. Engines do have sophisticated algorithms to try to avoid penalizing legitimate RSS feeds and other syndication; however, making sure one isn't causing it, or leaving the door open to it, is a critical SEO best practice. Doing this for sites within our influence and/or control is much of what we mean when we stress the importance of establishing canonical domains via 301 redirects, for example.
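To make the canonical-domain idea concrete, here's a minimal sketch of the decision a server would make on each request. It's an illustration under assumptions, not anyone's actual setup: `www.example.com`, the `http://` scheme and the function name `canonical_redirect` are all placeholders of mine.

```python
from typing import Optional, Tuple

# Assumption: placeholder canonical host; substitute the one real domain
# that should own the content.
CANONICAL_HOST = "www.example.com"


def canonical_redirect(host: str, path: str) -> Optional[Tuple[int, str]]:
    """Return (301, location) when the request host is not canonical, else None."""
    # Host headers are case-insensitive and may carry a port or a trailing dot.
    bare = host.strip().lower().split(":")[0].rstrip(".")
    if bare == CANONICAL_HOST:
        return None  # already canonical: serve the page normally
    if not path.startswith("/"):
        path = "/" + path
    # Any other host gets a permanent redirect to the same path on the canonical host.
    return 301, f"http://{CANONICAL_HOST}{path}"
```

The point of the 301 is that every alternate hostname ends up pointing engines at one URL instead of serving a second copy of the page.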

I bring this up now because there's an illustration handy:

Jason Calacanis is an A-list blogger whom I've mentioned before. A former AOL exec who knows how to stir up activity with a little flair for the dramatic (not unlike other masterful marketers such as Marc Benioff, Steve Jobs and various other heavy-hitters), Calacanis has been in and out of SEO news over the past year with infamous quips like "SEO is bullshit."

Now, his blog is calacanis.com, which until very recently showed a duplicate content issue. Weblogs, the company he co-founded and later sold to AOL, has the same issue across its network. To be fair, many sites and networks have vulnerabilities of one kind or another. Continuing with the example, though: the problem here is specifically wildcard subdomains, where any made-up subdomain resolves and serves the same pages as the real site.
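For site owners wondering whether they're exposed, a rough check is to ask for a subdomain that can't possibly exist and see what comes back. The sketch below is hedged accordingly: `serves_wildcard_duplicates` and the injected `fetch` callable are hypothetical names of mine, and the exact-match body comparison is a simplification that dynamic pages would defeat.

```python
import secrets
from typing import Callable, Tuple

# fetch(url) -> (status_code, body_bytes). Injected so the sketch does not
# depend on any particular HTTP library (an assumption of this example).
Fetch = Callable[[str], Tuple[int, bytes]]


def serves_wildcard_duplicates(domain: str, fetch: Fetch) -> bool:
    """True if a made-up subdomain answers 200 with the same body as the real site."""
    # A random label that should not exist as a real subdomain.
    probe = f"probe-{secrets.token_hex(4)}.{domain}"
    try:
        real_status, real_body = fetch(f"http://{domain}/")
        probe_status, probe_body = fetch(f"http://{probe}/")
    except OSError:
        # One of the fetches failed (e.g. the probe host does not resolve),
        # so there is no evidence of duplicate content being served.
        return False
    if real_status != 200 or probe_status != 200:
        # A redirect or an error on the probe is the healthy outcome.
        return False
    return real_body == probe_body
```

A `True` here means the made-up host is serving a full copy of the homepage, which is exactly the opening described above.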

When something like this is found, competitive and skilled SEOs can essentially pump URLs of their own design into the organic ranks, knocking down other domains' natural standing by hanging duplicate content issues over them. This kind of downgrading of other people's domains is one of a set of methods usually, and affectionately, referred to as "Googlebombing," though technically the methods aren't limited to any specific engine. Note the following little pinch of litter, which took minimal time and effort:

Calacanis duplicate content trick

For posterity, here's the screenshot (evidence highlighted in red).

The "-15." was a randomly generated numeric suffix, appended to an array of randomly rotated, cheeky (and fake) subdomains:

  • jason-sucks-seo-rules
  • jason-must-go-to-supplemental-hell
  • punished-for-blasphemy
  • etc.

So crawlers would've picked up a different permutation upon every visit.
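On the receiving end, junk hostnames like these do leave tracks. Here's a minimal sketch of tallying requests that arrive under hostnames a site never set up. It assumes the Host values have already been pulled out of the server logs, and `unexpected_hosts` plus the `example.com` names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set


def unexpected_hosts(hosts: Iterable[str], known: Set[str]) -> Counter:
    """Count requests per Host value that is not in the known set."""
    counts: Counter = Counter()
    for raw in hosts:
        # Normalize case, port and trailing dot before comparing.
        host = raw.strip().lower().split(":")[0].rstrip(".")
        if host and host not in known:
            counts[host] += 1
    return counts
```

Usage, with made-up log values:

```python
seen = unexpected_hosts(
    ["www.example.com", "punished-for-blasphemy-15.example.com"],
    {"www.example.com", "example.com"},
)
print(seen.most_common(5))
```

Anything that shows up in the tally is a hostname crawlers or visitors are actually reaching, so it's worth a look.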

As a courtesy, BTW, all of this was done under some casual, lightweight cloaking, though admittedly without restricting engines' caches at the time. (I don't have it in for Calacanis or anyone else, for that matter. Were I seriously out to test anyone or anything, engines or otherwise, I certainly wouldn't go blogging about it, as I'd be aiming to exploit from the shelter of anonymity.) Done in earnest, however, that is, from a popular site that gets crawled many times a day and whose changes reach the SERPs very quickly, this could perhaps have harmed his blog's rankings, perhaps substantially after a bit of time. Engines say in their public documentation that there's "almost nothing a competitor can do to harm your ranking" in their indexes. Yeah... "almost" just might be the operative word here. If not that, at the very least this kind of thing can make for a pretty noteworthy PR stunt.

Now, in this particular case it appears Calacanis has actually been alerted to this by a nice SEO (yes, we exist) who found the hole and announced it publicly. Michael Gray could've kept this in the shadows, and people could've exploited it for however long it flew under the radar. The hole seems to have been patched recently... as of this writing, though, only for the calacanis.com domain, I assume. Many blogs on the Weblogs network still show the vulnerability (random example).

The main lesson here IMHO is not that one might do well to avoid talking smack about SEO (though one might infer that, obviously). It's that this is the kind of thing SEOs must educate their clients about and help them take care of, preferably as part of preventative maintenance.

The main (if not only) reason scientists say it's rare for hyenas to take down elephants is that they've only observed and documented it first-hand in nature so many times. This kind of method is just one example, in other words, of how it's entirely possible for big brands to lose what should be their rightful rankings to packs of savvy affiliate marketers, spammers and/or general competitors.

As much as possible, we SEO consultants (again, yeah, some of us still exist) must help our elephants stay strong, mobile and free of blind spots. Many sites have a ton of work to do on both security and SEO if they're to get it together. The reality of the situation is quite far from "bullshit."

Hat tip to Graywolf, and good on ye for the quick-fix action, Jason.



Based out of Northern California, bl.asphemo.us is a bl.og dedicated to the advocacy and study of high-impact, data driven marketing disciplines and related concerns: Analytics and Data Mining, Marketing Automation, Integrated Advertising (targeting, retargeting), Demand Generation and Lead Nurturing, Social Media / Social Engineering (Crowd-hacking) and the new PR, Privacy, Security, CRM, SEO / SEM, CRO, ROI... more TLAs (three letter acronyms) than any sane person's daily lexicon should include.
