Google Says ...

An unofficial, unaffiliated source of comment and opinion on statements from Google, Google employees, and Google representatives. In no way is this site owned by, operated by, or representative of Google, Google's point of view, policies, or statements.

Location: California, United States

Use your imagination. It's more entertaining.

Thursday, July 27, 2006

Should Search Engines Teach SEO?

Matt Cutts recently asked his blog readers if he should offer some SEO-101 (basics/fundamentals) posts or go into the deeper nether regions of the Force. Naturally, he got a mixed bag of reactions.

Personally, I think Matt should start with the basics. Not just because it would be fun to see long-time SEO pundits disagreeing with him on minor points, but because before you teach the advanced stuff you should make sure your students understand where you're coming from.

Some of the things Matt puts on his blog are propaganda posts, such as his lectures about using rel=nofollow. The NoFollow attribute is Matt's baby, so you have to expect him to advocate its use -- anyone would be expected to go to bat for their ideas.

But sometimes Matt drops significant hints that are just absolutely missed by the SEO pundits. They dwell on stupid, meaningless stuff like PageRank, the implications of "excessive reciprocal linking", why such-and-such site isn't being outed, etc. You know, no automated system is perfect, and I believe Google relies mostly on automation to keep its search results clean. Yes, yes, they penalize and ban sites so there is clearly human intervention in some places, but most of it is automated. The software is occasionally going to miss something. People occasionally miss something.

It would be interesting to see what search engine employees like Matt and Laura Lippay of Yahoo! (who used to be on the SEO industry side) have to say about the best methods of optimization. They would strive to teach people how to optimize both fairly and effectively. They would speak with authority that typical SEO gurus (including me) simply lack.

There are several search engine optimization organizations around these days, and at least two of them offer some sort of certification. While the people involved in these groups are widely respected, they don't have the proper credentials in search science to really be certifying anyone. A better certification system would rely upon a blend of traditional IR principles and commercial search engine placement practices.

And while many people in the SEO industry feel that the search engines would proselytize and use their access to SEO students to advocate "favored" practices that would, in fact, assist their engines in improving results, what's the harm in that? It's not like those students wouldn't be able to find the FAQs, tutorials, and forums that teach the Dark Side principles, that offer speculative advice, and that simply go off into the wild blue yonder prattling about PageRank converging to an average of 1.

Search engines have been looking over the fence for a long time, since before Google existed. They have quietly hired SEO professionals as "consultants" and examined various techniques and principles without saying much in public. Google employees openly attend SEO industry events and party with the Black Hat SEOs and give interviews and do all the main promotional stuff, but they haven't really done much to help Google lay a foundation of trust and understanding in the SEO community.

Educating SEOs in reliable methodologies that meet search engine guidelines would go a long way toward establishing professional standards. It would also help validate or calibrate the certifications being offered in the industry, because you know those certification curricula would be adjusted to at least examine what the search engines have to say about acceptable SEO practices.

But it would also help SEOs interact more directly with search engines. There is a dirty side to the long history of SEO-search service interactions that doesn't get talked about much. So far as I know, Google never played that game, but other search services have. In the past, people learned not to share too many secrets openly in certain SEO communities because community insiders took what they learned to their paying clients (search services) and ... well ... you can see where that leads.

I think Google, Yahoo!, MSN, and Ask all owe it to themselves and their users to discuss what they consider to be proper optimization in a more formal environment. I'm not saying I would want to pay Google $2000 for the privilege of sitting in the Googleplex for a week, but if it came down to that, then it would be better than the haphazard "what should I write about next?" from Matt Cutts.

Matt is a great resource for everyone, but he can't certify SEO professionals, and in my opinion, no one can do it properly. The need for certification has oft been discussed, proposed, and mostly sidelined despite the efforts being made. SEO certification needs to be standardized and made available to everyone at a cost-effective level. I can even envision 2-tier SEO certification, because project planning in itself is a major investment of time and resources.

Wednesday, July 26, 2006

Canvassing Google employee blogs

It's been a slow news month for Google employee blogs, which I don't bookmark because they are legion and most -- unlike Matt Cutts -- rarely say much about their employer. Non-disclosure issues aside, I think Google employees show remarkable restraint given just how much speculative commentary is provided across the Web (and Swedish researchers might conclude I post about 10% of that commentary, but that's another story).

There was, however, a small brouhaha over Google's authentication service when one of their employees, Ben Laurie, took the blogging/journalism community to task for (in his personal opinion) misrepresenting or miscomparing the service to Microsoft's upcoming Live features.

I think the most significant fact to come out of this exchange is Ben's statement that "Google doesn't announce what it's going to do, only what it's already done." I'm not sure how accurate that statement is (it was not sanctioned by Google and is Ben's personal expression). After all, if Google releases a beta tool, is that something they've already done, something they are doing, something they will be doing, or all three? It's a bit complicated.

Take the Google Web Toolkit, for example. You build AJAX applications in Java and then compile them to Javascript. This is a beta tool, but is it essentially finished, or is it just a foreshadowing of what Google will do with Webmaster tools in the future?

Google has apparently made a huge investment in Java technology. According to another Google employee (Crazybob), Google powers Gmail, AdWords, and Blogger with Java. Now, I didn't know that. I've been criticizing Web-based Java apps for years because they tend to run so slowly. I guess maybe the reason has more to do with available resources than anything else, since I only occasionally cringe at the slow response of Blogger's server.

Gregor Hohpe (who hopes to follow in the footsteps of Crazybob, among others) casually mentions that Java and integration are very important to Google. In fact, Google sits on the JCP Executive Committee. So they must really like Java.

Wonder how well they get along with Sun Microsystems? It's hard to say. Or will there be a merger? Hm....

I guess Google doesn't have much to say on the subject right now, but perhaps they will after the deed is done (if indeed done it ever will be).

Google disables "links to this post" - Why?

I've noticed that the official Google blog no longer shows who is linking to their posts. Why is that? Too much link spam?

Monday, July 24, 2006

Google's click-fraud prevention techniques are 'reasonable'

So says Alexander Tuzhilin, the independent 'expert' who reviews Google's fraud-detection systems for a court. It's not clear to me, after reading this document (which includes the author's bio), what qualifies him to review Google's click-fraud detection methodologies. He does not claim to have any prior experience or exposure to the management or generation of invalid clicks (as Google describes them).

Nonetheless, his conclusion is that Google makes a reasonable attempt to detect and neutralize invalid clicks. Googlers are understandably happy to be so vindicated in an official document. However, the fact that the author quotes Wikipedia further underscores his essential naivete. He offers no caveat about the volatility of Wikipedia entries and provides no publication date for the reference -- even a legal analysis written 100 years ago would be expected to identify its contemporary dictionaries and source materials.

I think Google makes a better presentation in its filed objections to a proposed $90 million settlement, especially where they point out that the proposed settlement implies fraud exceeding their total revenues to date has occurred. If there are any valid scientific claims being made in this case, neither side has done a very good job of providing them.

Dr. Tuzhilin's analysis even includes a self-admitted unscientific Zipf graph, where he analyzes the long tail of invalid clicks. While there is a certain logic to what he proposes, he offers no evidence to support his contention that the behavior he is analyzing conforms to his proposed model. That would be equivalent to Einstein saying, "Well, I think Time changes near Mass, but I have no mathematical model to show how this works." It has taken scientists decades to accumulate confirmable observations that support the extensive math accompanying Einstein's theory, so maybe I'm being a little harsh in comparing the long tail model to Relativity.
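To make the long-tail claim concrete, here is a minimal sketch of what a Zipf model actually asserts: the k-th most frequent source accounts for a share of events proportional to 1/k^s. The function name and parameters are my own illustration, not anything from Dr. Tuzhilin's report -- the point is simply that under this model the tail carries a large share of the total, which is why fitting it (or failing to) matters.

```python
def zipf_frequencies(n: int, s: float = 1.0) -> list[float]:
    """Normalized Zipf frequencies: f(k) proportional to 1 / k**s
    for ranks k = 1..n, scaled so the frequencies sum to 1."""
    weights = [1.0 / (k ** s) for k in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

# With 1,000 ranked sources and s = 1, the "long tail" (everything
# past rank 100) still carries roughly 30% of all events -- the tail
# is individually small but collectively large.
freqs = zipf_frequencies(1000, s=1.0)
head_share = sum(freqs[:100])
tail_share = sum(freqs[100:])
```

Whether real invalid-click behavior follows such a curve is exactly the unsupported assumption the report makes; the model is easy to draw but hard to verify without data.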

However, Dr. Tuzhilin only examined the issue from Google's perspective. His report indicates nothing about any attempts to contact people who specialize in, or rely upon, invalid click generation. Such people exist, they have been in business since before Google was founded, and based on my own conversations with some members of that shadow industry, they were already using advanced methodologies eight years ago that Google's vague measures appear to be incapable of detecting.

Dr. Tuzhilin suggests that Google has not implemented any data mining techniques in its filter technology. Google, if I were you, I'd assign about a dozen people to catch up on that major deficiency right now. You really have no idea of what you are up against. You need to look at the history of those IP addresses that are clicking on your ads and search results. In my opinion, based on what I have read across the Web, I think Google probably does a very good job of attempting to detect click fraud. Probably no one is better at it by now.
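As a deliberately naive sketch of the kind of data mining being suggested -- everything here (the function name, the threshold, the data shape) is my own illustration, not anything Google has described -- one could flag IP addresses whose click volume sits far above the norm:

```python
from collections import Counter
from statistics import mean, stdev

def flag_suspicious_ips(click_log: list[str], z_threshold: float = 3.0) -> set[str]:
    """Flag IPs whose click count exceeds the mean count across all
    IPs by more than z_threshold sample standard deviations."""
    counts = Counter(click_log)
    if len(counts) < 2:
        return set()  # not enough distinct IPs to compare
    volumes = list(counts.values())
    mu, sigma = mean(volumes), stdev(volumes)
    if sigma == 0:
        return set()  # all IPs clicked equally often; nothing stands out
    return {ip for ip, c in counts.items() if (c - mu) / sigma > z_threshold}

# Hypothetical log: one IP clicks 500 times, fifty others click once each.
log = ["10.0.0.1"] * 500 + [f"10.0.1.{i}" for i in range(50)]
flagged = flag_suspicious_ips(log)  # only the heavy clicker is flagged
```

A real fraud system would of course look at far more than raw volume (timing, history, distributed sources), which is precisely why a sophisticated click ring spreading traffic across many IPs would sail straight past a filter this simple.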

I'll agree that their methods are reasonable, based on the available data.

But are they effective on a large enough scale to assure advertisers that a majority of the invalid clicks are captured? There are two areas where that question must be answered: in the aggregate and in the specific. That is, overall, the effectiveness of the program may be acceptable ("acceptable" may require something more than a reasonable effort in some people's opinions). But some specific campaigns may be targeted for abuse that is slipping by the filters.

What Dr. Tuzhilin's report underscores is the fact that Google does not know how effective its filters truly are.