Google Says ...: Who does Google trust now?

What SEOs and Search Engines say about TrustRank and PageRank

Let me say up front that, so far as I am concerned, no one outside of Google is in a position to say definitively or authoritatively how Google determines trust. Nonetheless, many SEOs have been making very ignorant comments about Google and "trust" over the past 18 months or so. The problem began with everyone commenting on Google's listing TrustRank as a service mark. This was a curious situation because the expression TrustRank was coined by Yahoo!, who published a paper in conjunction with Stanford University introducing the TrustRank methodology for calculating PageRank more reliably.

Many of those SEOs have wrongly assumed (and stated repeatedly) that PageRank serves as the basis for Google's search results rankings. PageRank has apparently always been factored into the algorithm wherever possible, but Apostolos Gerasoulis of Ask has long claimed that Google never fully implemented PageRank anyway.

Matt Cutts has indicated that Google's internal PageRank drives their crawling priorities. I think this is probably for the main index only, but maybe it also drives the Supplemental Index crawling as well.

Google's apparent historical trust in sub-domains

It became apparent to me by early 2005 that Google had begun shifting its priorities in late 2004 (and perhaps earlier that year) to favor pages from older domains that I called Trusted Content Domains. I coined that expression to distinguish those domains from Spam Domains. Spam domains typically fall into one of two groups: 1-page doorway domains that redirect to primary content domains and domains that host a lot of worthless content.

I found that I could add content to an existing domain and see it rank well within a week to a few weeks, while people creating new domains were making no progress after several months. This was a marked change from the way my new-page content achieved rankings a year before. However, it very closely resembled the behavior of sub-domains coming off of primary domains going back to 2001 (see my comment on Aaron's post). I have complained in numerous public forums since 2001 that Google would automatically trust sub-domains. They never seemed to care, and a lot of sub-domain spam has been around for years because of that oversight.

In essence, Google has always seemed to confer on sub-domains, without question, the ability to achieve high rankings in search results. For technical reasons, I long resisted the temptation to hang sub-domains off Xenite.Org. I found that sub-directories often served my purposes, even though they took a little longer to establish relevance. However, though I am now starting to work with more sub-domains, I am concerned that Google may now be implementing serious sub-domain analysis and filtration. I may inadvertently trip some filters simply through inexperience and experimentation.

How Google Determines Search Results

Because of SEOs' ridiculous infatuation with link-bombing based "optimization", the importance of relevance has long gone unheeded in the SEO community. Sergey Brin and Larry Page established that determining relevance was the core factor of their ranking methodology in their original paper about Google, but the inconvenient fact has been swept under the rug of ranking-through-link-spam.

In January 2006, Matt Cutts published an article in Google's newsletter for Librarians in which he recapped Google's basic ranking strategy. Matt naturally discussed the PageRank algorithm because it is so often referred to, but he emphasized that PageRank is not the key to ranking in Google's search results. In fact, Matt literally wrote that, "in order to present and score" results for a query, Google picks pages that "include the user's query somewhere" and then ranks "the matching pages in order of relevance".
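
The two steps Matt describes (select the pages that contain the query, then order them by relevance) can be sketched in a few lines of Python. This is a toy illustration only, not Google's method: the term-count score is a stand-in for whatever relevance measure Google really uses, and the function names and sample pages are made up for the example.

```python
def matching_pages(pages, query):
    """Step 1: keep only pages that contain every query term somewhere."""
    terms = query.lower().split()
    return {
        url: text
        for url, text in pages.items()
        if all(term in text.lower().split() for term in terms)
    }


def toy_relevance(text, query):
    """Placeholder score (an assumption): count occurrences of the query terms."""
    words = text.lower().split()
    return sum(words.count(term) for term in query.lower().split())


def rank(pages, query):
    """Step 2: order the matching pages by the placeholder relevance score."""
    matches = matching_pages(pages, query)
    return sorted(
        matches,
        key=lambda url: toy_relevance(matches[url], query),
        reverse=True,
    )


# Hypothetical pages: URL -> page text.
pages = {
    "a.example/tolkien": "tolkien essays and tolkien news",
    "b.example/links": "links links links",
    "c.example/books": "books by tolkien",
}
ranked = rank(pages, "tolkien")
print(ranked)  # ['a.example/tolkien', 'c.example/books']
```

Note that the page with nothing but "links" never enters the ranking at all, because it fails the first step. Link data plays no part in this sketch.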

The SEO community continues to look in the wrong direction

Despite this apocalyptic revelation, SEOs have continued to pound the podium in favor of link building. And I will admit to helping them pound the podium with all my link-building articles, although I have tried to point out that links are important for other reasons.

I write about link-building for one reason: since I know how to do it better than most SEOs, I felt it might help to establish my linking credentials in a community obsessed with links. Most of the more popular link schemes owe something to my research over the years anyway -- it's just that the young SEOs are too consumed with their snide tirades to do the research to find out where all their cherished strategies came from.

I didn't invent these linking schemes, but I helped test and prove their effectiveness back in the day when they could truly be efficient and effective. And, sad to say, I probably am one of the grand-daddies of link farming. But you can blame Inktomi for being so darned frustrating. Most of you have no idea of what it really means to have to rank on the basis of linkage. I do. I hope we never have to return to those kinds of search engines.

The consequences of all the bad SEO practices since 2001

When Adam Mathes coined the expression "Google bombing", he was only giving a bad name to a practice that actually went back to the days before Google. Adam noticed how effectively the technique worked for bloggers, but spammers had been link bombing both Google and Inktomi for years. Well, after the media had their day with the new buzzword, a new generation of SEOs began building their business models on the foundation of link building.

After four years of thousands of SEOs blogging, writing articles, and sharing link-based ranking techniques in forums, FAQs, and eBooks, a large community of business decision-makers has been misled into believing that linkage is the key to ranking on Google. And what is truly sad is that it appears to be more true today than it was two years ago only because Google had to react to the massive onslaught of manipulative linking that has mangled its relevance scoring.

All "white hat" SEOs who practice link-building are as guilty as all "black hat" SEOs and spammers of burning down the trees in our forest and destroying the environment in which we optimize. It will be years before SEOs take responsibility for their ill-considered practices. Black hats at least snicker at the idea of ethical optimization and shamelessly promote their Web sites in whatever way they can. They work on volume and build their networks and just adapt to the algorithm changes.

But the rest of the community has bogged itself down in a blind tradition that was a terrible solution to a non-existing problem in the first place. Now they are chained to the link-building treadmill because even the SEOs who realize there is more to search engine optimization have to deal with unrealistic client demands and expectations. The machine has lurched into high gear and tumbled out of control. Maybe a few of the operators notice they are no longer in charge, but most still mindlessly wade through SEO forums blathering about PR (Toolbar PageRank), "quality links", sending out reciprocal and 1-way link requests, and now TrustRank.

How Important Has Trust Become?

Because of SEO "best practices" based on link-building, Google has gradually gone into high anti-link building gear. Since early 2004 the so-called Sandbox Effect has been debated and tested and evaluated in six thousand directions. Consensus now seems to be settling on the idea that new domains are sandboxed because they lack links from Trusted Content Domains. I credit John Scott with being the first to offer the most reasonable explanation, though he now feels somewhat differently about what causes the effect (things do change).

Since mid-2005, Google has implemented filters against fake link directories, scraped content sites, and RSS-feed driven sites. When I warned Danny Sullivan about these kinds of sites in early 2005, he expressed complete and total ignorance of the problem. Swept up in the fake link directory blitz, however, were many "low quality" SEO directories -- directories set up by people for various reasons, including accruing PageRank, helping other sites build up linkage, and gaming Google.

Another problem that began to get attention from SEOs in late 2004, and which has gradually increased in severity, is the transfer of many legitimate content sites to the Supplemental Index. Only over the past few weeks have I found enough bits and pieces from Google to assemble a coherent idea of what the Supplemental Index may be.

With the rollout of Big Daddy in early 2006, Google exacerbated Webmaster frustrations by increasing main index crawling and decreasing supplemental index crawling. Suddenly, everyone started talking about trust as if they knew what was going on. Remember that I said at the beginning of this post that I don't believe anyone outside Google knows what is going on.

How can trust be algorithmically determined?

But several of us have tried to guess what is happening. Todd Malicoat suggests that it's a trust filter based on Web site age, number and age of backlinks, and total "trustscore" of those backlinks. He adds: "Most trust criteria revolve around some dependence on age, which is actually a pretty good signal of quality". However, we know that Google ignores identified paid links, among others, so "total number of backlinks" isn't helpful. Nor do I still believe that age matters as much as I once thought it did.

Neither age of site nor age of links pointing to the site should really matter to how much a site can be trusted. A spammy link that sits around for 3 years is still a spammy link. A spammy site that sits around for 5 years is still a spammy site. I think Todd's third point is closer to the truth, and is really the only one required to explain what Google is doing.
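
A "trustscore" for backlinks could be computed in many ways, and nobody outside Google knows which one (if any) is in use. One published approach is the TrustRank idea: start from a hand-picked seed set of trusted pages and let trust flow along outbound links, weakening at each hop. The Python sketch below is a simplified version of that idea. The link graph, the seed choice, the 0.85 damping factor, and the iteration count are all assumptions made for illustration.

```python
def propagate_trust(links, seeds, damping=0.85, iterations=20):
    """Seed-biased PageRank: trust starts at the seeds and flows along links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    # Only seed pages receive a share of the "teleport" trust.
    seed_share = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(seed_share)
    for _ in range(iterations):
        incoming = {p: 0.0 for p in pages}
        for source, targets in links.items():
            if targets:
                # A page splits its current trust evenly among its outlinks.
                share = trust[source] / len(targets)
                for target in targets:
                    incoming[target] += share
        trust = {
            p: (1 - damping) * seed_share[p] + damping * incoming[p]
            for p in pages
        }
    return trust


# Hypothetical link graph: page -> pages it links to.
links = {
    "trusted-directory": ["good-site", "new-site"],
    "good-site": ["new-site"],
    "spam-a": ["spam-b"],
    "spam-b": ["spam-a"],
    "new-site": [],
}
trust = propagate_trust(links, seeds={"trusted-directory"})
for page in sorted(trust, key=trust.get, reverse=True):
    print(page, round(trust[page], 4))
```

In this toy graph the two spam pages link only to each other, so no trust ever reaches them, however long they have existed. Age appears nowhere in the calculation.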

But is Google scoring by trust or is it just trusting pages to confer PageRank and Link Anchor Text? In a follow up to his earlier Google Librarian article, Matt Cutts wrote "if more people trust your site, your site is more valuable" (implying that PageRank is used to help determine trustworthiness) and "we examine the content of neighboring pages, which can provide more clues as to whether the page we're looking at is trusted".

Another point Matt recently made was that the sudden appearance of hundreds of thousands of pages can trip a trust filter. That's a high threshold, but I'm sure it's that high for a reason.

Looking for trust in all the wrong neighborhoods

But what constitutes a "neighboring page" for a new domain? Any new page on an existing domain already has neighbors in its sibling pages (found in the same physical folder or directory) and cousins (found in other folders and directories on the same domain or sub-domain). New domains have to be placed into neighborhoods before they can have neighbors. Such neighborhoods are most likely only defined by linkage.

One simple possibility is that if a trusted "expert" or "hub" page links to a new domain, that expert/hub can be used to determine who the neighbors are. But even one expert's opinion isn't very informative. I think that Google looks for a variety of trusted expert opinions. These experts will include well-known human-edited directories with clear, definitive categories, but I think the expert votes also will come from some of the second-tier content sites. Any Web page that links to a group of related Web pages is usually considered to be an expert.

Until Google can form a collective opinion about where a new domain's "neighborhood" is, it isn't in much of a position to determine if that domain can be trusted. So, while many SEOs might be quick to say, "See? We do need to submit links to directories!" Maybe, but would you as a surfer want to trust a site only listed in directories? Why does no one else link to the site? You need more than one kind of expert opinion, in my opinion. Dan Thies suggested as much in late 2005 at the Highrankings Forum (and perhaps elsewhere).

"Well then," some hardcore reciprocators might say, "We just need to submit to directories and get reciprocal links from related pages."

Moving into the wrong neighborhood

But the problem is that Google looks for "excessive reciprocation". Some reciprocation is expected and tolerated. This is the World Wide Web, after all, where sites are expected to link to each other. But if you can only get links from directories and reciprocating sites, you're still not collecting independent opinions or votes of confidence from true authorities.
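
How Google measures "excessive reciprocation" is not public. As a rough illustration of the kind of signal involved, the Python sketch below computes the share of a site's inbound links that the site links back to. The function name, the sample graph, and the idea of using a simple ratio are assumptions; no threshold is suggested because none is known.

```python
def reciprocal_share(links, site):
    """Fraction of a site's inbound linking sites that it links back to."""
    inbound = {
        source
        for source, targets in links.items()
        if site in targets and source != site
    }
    if not inbound:
        return 0.0
    outbound = set(links.get(site, ()))
    return len(inbound & outbound) / len(inbound)


# Hypothetical link graph: site -> sites it links to.
links = {
    "my-site": ["partner-1", "partner-2", "partner-3"],
    "partner-1": ["my-site"],
    "partner-2": ["my-site"],
    "partner-3": ["my-site"],
    "news-site": ["my-site"],
}
share = reciprocal_share(links, "my-site")
print(share)  # 0.75
```

Here three of the four sites linking in are link partners, and only one link is an independent vote.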

"Authority pages" has become another SEO buzzword, and I have seldom seen anyone in the SEO community use the expression in a way that conveyed a clear meaning to me. I am sure most people who speak of authority pages have a clear idea of what they mean, and can probably articulate that idea. But I have found no real consensus on what the SEO community collectively means.

I'll go with the traditional HITS definition: an authority page is linked to by many experts. But some experts are more trustworthy than others, and those experts are often linked to by many authority pages. It's all very circular, of course, but I think it's important that new domains be linked from authority pages in clear context. That is, a reciprocal link won't do the trick. You need to have content surrounding or adjoining the link that is relevant to the link anchor text.
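
The circularity in the HITS definition is resolved by iteration: each page gets a hub (expert) score and an authority score, and the two are recomputed from each other until they settle. The Python sketch below shows the basic idea on a made-up graph. It says nothing about whether or how Google uses HITS-style scoring; the page names and iteration count are assumptions.

```python
import math


def hits(links, iterations=50):
    """Basic HITS iteration. Returns (hub, authority) score dictionaries."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    authority = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages linking in.
        authority = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in targets:
                authority[target] += hub[source]
        # Hub score: sum of the authority scores of the pages linked to.
        hub = {p: sum(authority[t] for t in links.get(p, ())) for p in pages}
        # Normalize both vectors so the scores stay bounded.
        a_norm = math.sqrt(sum(v * v for v in authority.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        authority = {p: v / a_norm for p, v in authority.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, authority


# Hypothetical link graph: page -> pages it links to.
links = {
    "expert-1": ["site-a", "site-b"],
    "expert-2": ["site-a", "site-b"],
    "expert-3": ["site-a"],
}
hub, authority = hits(links)
print(sorted(authority, key=authority.get, reverse=True))
```

In this graph site-a ends up with the highest authority score because all three experts link to it, and expert-1 and expert-2 end up as stronger hubs than expert-3 because they link to both authorities.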

But let's back up a moment. Is it not possible that there are sham experts and authorities? Absolutely. So you need to ask if Google hasn't found a way to favor some neighborhoods over others. One potential trust-impacting factor is who you link to. Matt Cutts has been reluctant to explain why spammy-looking links on one page may be trouble and why similar appearing links on another page seem okay.

Neighborhoods must be bubbles of tightly connected Web sites, and the neighborhoods that are most trustworthy are probably linked to by many other neighborhoods. So now we're venturing into the realm of speculation with the concept of NeighborhoodRank. Does Google tag neighborhoods as being more or less trustworthy? If so, then it may be that an entire neighborhood has to gain trust before its member pages earn trust.

Why link baiting works

This may explain why Rand Fishkin of SEOMoz is able to boost sites past the Sandbox Effect so quickly. When he creates Link Bait, his sites draw linkage from both new neighborhoods and old neighborhoods, and the old neighborhoods undoubtedly include a lot of trusted neighborhoods. His Link Bait domains are therefore drawn into the better neighborhoods because of where they link to and from whence their inbound linkage comes.

In other words, successful Link Bait doesn't have to wait for its neighborhood to be approved for trust. It simply joins one or more already established good, trusted neighborhoods.

Why reciprocation sometimes fails

And that may explain why link reciprocation doesn't always work. Some people complain that after gaining several hundred reciprocal links, they still seem to be sandboxed. In evaluating the backlinks for many such sites, I often find they link out to and receive links from what I personally would deem to be low quality sites, many of which appear not to be trusted.

I have my own test for deducing which sites may be trusted and which sites may not be. I don't disclose the test publicly because I don't know how accurate it is and I don't want to give away a possibly useful idea to people whom I don't want to help. My test is quick and simple, but even if it's on the right track I doubt it is 100% reliable. I am developing a couple of other tests to see if I can establish a consensus of results.

In the meantime, the continued emphasis on building links in quantity probably only magnifies the problem for most SEO'd Web sites. The more links the SEOs seek out from "tried and true" sources, the longer it probably takes to get sites to move past the Sandbox Effect. There will be differing degrees of success. Some SEOs most likely have very good sources of linkage. Most probably do not.

Are Supplemental Index Pages 'bad neighborhoods'?

I don't believe so. I think these pages represent documents that have not yet earned trust, but that doesn't mean they are considered to be 'bad'. Matt suggested to one person on his blog that "the best way I know of to move sites from more supplemental to normal is to get high-quality links (don’t bother to get low-quality links just for links’ sake)".

I have more to say about Google's Supplemental Index at my SEO Web site.

Final Word

The bottom line is that we still don't know what Google is doing, but we all agree that they are now being strongly influenced by a need to distinguish which sites can be trusted from those that cannot be trusted. I think there are some highly implausible and convoluted theories being proposed by other people right now. The more complicated a proposed explanation becomes, the less likely it is to be correct. For now, I think Google is looking at aggregate linking relationships to determine where community trust really exists. It's very, very difficult to fake trust from a broad variety of sources.

Simply getting links from free directories, article submission sites, reciprocal link partners, and other popular link sources will probably extend the time new sites need to earn trust, if for no other reason than that such sites will only very slowly attract natural links from trusted neighborhoods.

The real question comes down to this: if I am correct, or close to correct, in my analysis, how long will it take for spammers and SEOs to develop methodologies that effectively poison the "good" (trusted) neighborhoods and force Google to develop some filtration methodology?

I think maybe a year, perhaps 18 months. Until then, those SEOs who have inventories of trusted link sources will hoard their wealth and be very, very reluctant to share the gold. After all, the more people who know where to get the good links from, the less likely those link sources are to remain valuable.

3 Comments:

Michael Martinez said...: I agree with you, but there are a lot of highly selective Web sites that SEO link builders are finding ways to exploit. Look at the attention being paid to .edu and .gov domains right now.

You basically have thousands of link builders hacking ideas for obtaining links from those sites. I don't mean they are hacking the sites -- I just mean they are testing and probing the openness of the sites and finding domains where they can obtain links.

If enough people do that, they'll eventually dilute whatever quality value resides in those domain communities, or at least in some areas of those domain communities.

Eventually, people will start posting that ".edu links and .gov links no longer work (as well as they once did)". (12:48 PM)
Dan Thies said...: Hey Michael, someone just pointed me to this post. The idea that you want links from a diverse set of sources isn't new to either of us, is it? I can imagine a number of ways, including what's been published on TrustRank, to get the job done at the search engine. You could just as easily invent StinkRank, and propagate stench backwards through the inbound links pointing to crappy sites. :D It's not impossible to get to the concept of a neighborhood either.

Although we hear and see in practice that a Yahoo directory listing can help avoid that early-stage trust filtering (sandbox, whatever), shouldn't one ask what the mechanism behind this effect is? We know what is possible, we know what has been done in the past. The core concept of the PageRank algorithm can be applied in other areas. Perhaps some paid links are trusted (such as the Y! directory), because they imply that the website is legitimate... how many people will pay $299 to submit their 'traffic equalizer' site to Yahoo? (3:05 PM)
Rangan Badri said...: You surely possess deep insight in what you are talking about Mr. Michael Martinez.

Very informative your posts are.

I just blogged about your blog.

Thanks Mr. Michael Martinez (7:43 PM)

Friday, September 08, 2006
