Primary Indexation: Size Matters

Indexation has been an area of concern for SEO, and a subject of debate and discussion, for some years now. Namely, it pertains to ranking in Google. Among the many things unique to that engine is its separation of indexation into two segments: a primary index and a supplemental one. For pages to rank substantively on queries, they need to get into the former. The latter is a big bucket for content Google deems untrustworthy, non-unique / duplicate / non-original, and/or lacking in relevance... even if just tentatively in some cases.

Today, the phrase "Supplemental Hell" isn't used as much as it was a handful of years ago, when SEOs first started observing Google's sandboxing of new sites into the supplemental index, sometimes for initial periods of months on end. The issue of understanding how indexation works, and how and when to lend weight to it in managing one's site(s), is however more relevant than ever. While at the end of the day links are, as always, the most important part of SEO, a site can have a ton of great links, but if it doesn't have any content substantially relevant to a given keyword, it won't rank for it, period.

This is why for some long tail, moderate/low-competition keywords, the ones that might be left not fully tapped in a given sector or niche's demand pool anyway, one can in some situations get improved ranking on site-side optimization alone... even if ideal opportunities to make substantial headway that way only come up so often. Generally I think these days it only applies to sites that have been building up organic trust, links and authority passively over a period of many years, like since the late 90s or something, and have had no proactive SEO work done on them in a long time, or ever. Those are probably the only cases, in my opinion, where one can do dramatic things like doubling a site's natural search traffic without also having to make a conscious, direct link building effort.

In any case, as with many SEO concepts, really putting indexation into good practice means digging into the details per project. Just as with ideas like pages' content-to-code ratios and keyword densities, or sites' inbound deep link ratios and average page load times, there's no hard and fast rule that sites in general should always be held to; no real standard. At the same time, we know calculated metrics like these can matter, even if in some cases more than in others. And the fact that Google's datacenters will offer up numbers that can change wildly, literally right before our eyes when running a site: operator query on a given domain repeatedly in a single sitting, makes the target all the more a moving one.

So how does one put indexation metrics into perspective so as to get some actual value out of them, instead of chasing one's tail gathering data and stopping at that? Examining a range of sites over time, in a trend-spotting effort, is the way to do it. Here's an example.

The following charts are of an indexation sampling across a number of sites. There's a huge variety here in the site types; these properties range from having thousands of pages to hundreds of millions. It's still intriguing to examine a given month's span, that caveat aside.

First, let's look at percentages of primary indexation: the percentage of each site's pages that were actually ranking-capable - for any potential queries - in the past month:

Primary Indexation

Next, let's look at total indexation. While the first chart shows the fractions, this one shows the denominators: estimates of all the pages these sites had across the whole of Google's index:

Total Indexation

They're wildly different views, no? There certainly were some interesting things that happened here. Pay close attention to how differently these sites behave when considering the trends against one another. For example, note how eBay's primary indexation levels held to a steady downward path until suddenly jumping after the first week of September. One could correlate that with sellers' activity as Labor Day approached. As their offers ramped up, a kind of "lag" became evident in the primary indexation... maybe not a lag of trust, but of Google simply taking a while to adjust to and keep pace with the page growth. Moreover, Labor Day itself has historical precedent with eBay dating back to their very beginnings (the first auction on their platform was on Labor Day of 1995). Notice also how, compared to eBay's oceanic volume of pages, TechCrunch's raindrop held very steady... and lastly, while Omniture's primary indexation dipped midway through for whatever reason, that only lasted for a few days, after which it recovered to essentially the same level.

Obviously this is just one casual exercise... short-term, minuscule data set... nonetheless it's worth taking a shot at some

Inferences and Interpretations

  • If we define size as the number of URLs (pages in theory), the smaller a site is the more important - not the more likely - it is that a higher percentage of its pages are in the primary index. This makes perfect sense if we work off an assumption that fewer pages on a given site correlates with increased likelihood of any given one of those pages being relevant and original content.
  • This rule does not apply, however, to sites that are largely made up of user-generated content which a) lives on its own unique URLs and b) is temporal. Those two things introduce a ton of flux into a given situation. Therefore, managing indexation to benchmarks is much harder to do for a site like eBay than it is for a site like Twitter, Yelp or even Amazon, for example.
  • The more a site is made up of user-generated content, the more variance in its percentage of pages in Google's primary index can and should be expected over a given monitoring period. While their architectures differ in how they render user-generated content, over this test month, for example, the highest degrees of change from start to end points were with (in order) eBay and Twitter (tied at 34%), followed by Amazon (21%). Of all the other sites monitored here, only Zynga got into the double digits, at 17% change. All the other sites maintained degrees of change in the single digits.
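
For anyone wanting to run the same kind of numbers on their own sites, here's a minimal sketch in Python. It assumes the primary indexation percentage is simply primary-indexed pages divided by total indexed pages, and that "degree of change" means the relative change between the first and last readings of the monitoring period (the exact formula behind the figures above isn't spelled out, so treat that as an assumption). The function names and sample counts are made up for illustration; they are not taken from the charts.

```python
def primary_pct(primary_pages, total_pages):
    """Share of a site's indexed pages that sit in the primary index, in percent."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    return 100.0 * primary_pages / total_pages


def degree_of_change(series):
    """Relative change (in percent) between the first and last readings.

    Assumption: change is measured start-to-end only, ignoring whatever
    happened in between.
    """
    start, end = series[0], series[-1]
    if start == 0:
        raise ValueError("first reading must be non-zero")
    return abs(end - start) / start * 100.0


# Hypothetical weekly samples: (primary pages, total pages) per reading.
samples = [(4_000, 10_000), (3_800, 10_500), (3_600, 11_000), (5_500, 11_000)]
pcts = [primary_pct(p, t) for p, t in samples]

print([round(x, 1) for x in pcts])
print(round(degree_of_change(pcts), 1))
```

Measuring start-to-end keeps the comparison across sites simple, but it hides mid-period dips that recover before the last reading, so it's worth eyeballing the full series as well.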

(and some)


  • The smaller (primarily) and/or less UGC-based (secondarily) your site is, the higher the "bar" of primary indexation you should be holding it to. This applies not only to having higher percentages of primary indexation, but also to having more stable trends of primary indexation.
  • Monitoring your degree of primary indexation vs. your direct competitors and peers over time can be a good idea, the prerequisite being the absence of major, i.e. fundamental, differences in content and URL architecture between the sites compared.
  • Even if not deeming it a KPI, one should always be monitoring one's primary indexation levels over time. At the very least, a sudden and unusual drop in it can potentially indicate something's gone wrong with the build.
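
A sudden drop is the kind of thing that's easy to flag automatically. Here's a rough sketch in Python, assuming you already log a primary indexation percentage per check (daily, weekly, whatever). The function name, the 20% threshold and the readings below are arbitrary placeholders, not recommendations or real data.

```python
def sudden_drops(readings, threshold_pct=20.0):
    """Return the indexes of readings that fell by more than threshold_pct
    relative to the reading just before them."""
    flagged = []
    for i in range(1, len(readings)):
        prev, curr = readings[i - 1], readings[i]
        # Relative drop versus the previous check, in percent.
        if prev > 0 and (prev - curr) / prev * 100.0 > threshold_pct:
            flagged.append(i)
    return flagged


# Hypothetical primary indexation percentages over six checks.
readings = [62.0, 61.5, 60.8, 41.0, 42.3, 43.0]
print(sudden_drops(readings))
```

Since the numbers Google reports can swing from one query to the next, it may make sense to average a few readings per check before feeding them in, so a single noisy sample doesn't trigger a false alarm.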

When it comes to how a site can attract organic action and how much, size clearly matters. However, so do factors of consistency, stability and technique.
