Google Says ...

An unofficial, unaffiliated source of comment and opinion on statements from Google, Google employees, and Google representatives. In no way is this site owned by, operated by, or representative of Google, Google's point of view, policies, or statements.

Location: California, United States

Use your imagination. It's more entertaining.

Thursday, August 03, 2006

What does 60 per cent really say about Google?

Bill Tancer published the July 2006 search volume breakdown showing that Google processed an estimated 60.2% of all U.S. searches for the month of July 2006. People cite these rising statistics frequently, but I never see any in-depth analysis of the activity behind the numbers. Perhaps I have to pay someone $1500 or more to see the raw data. That's not going to happen.

Given how much search activity is generated on Google by people checking their PageRank, their rankings, their visibility in Google ("Googling themselves"), competitor rankings, and coverage of their content by Google, as well as by robots, it should be no surprise that Google receives a tremendous amount of traffic.

But how much of that traffic is really useful? How much of Yahoo!'s traffic is really useful? How much of MSN's and Ask's traffic is useful?

My feeling is that Google has a lower overall percentage of Human Need-based Traffic. By Human Need-based Traffic, I mean queries where someone actually wants to find something useful for their personal benefit. Maybe they are researching a future purchase, or seeking an online community to join, or looking for interesting news and gossip, or maybe they want to buy something.

What Human Need-based Traffic does not include are all the positioning-centric queries that online marketers generate for statistical purposes, all the automated queries performed by robots for the purpose of scraping results or generating advertising clicks, and vanity queries.

Need-based traffic is really what should be measured. Is that what we are seeing in these statistics, though? I think that Google still dominates Need-based searches by a wide margin, but not by as wide a margin as it dominates overall searches.

As MSN and Ask become more visible, they may be subjected to more positioning-centric queries than in the past. If that proves to be the case, it should shift their overall market share. There may or may not be a correlation to advertising revenues. It would be interesting to study where the positioning-centric queries come from.

Google school documentary on the Web

Google employee Reza Behforooz points to an Australian documentary about Google called Behind.The.Screen.

The show lasts an hour but includes interviews with Google Superstars and lots of footage of the Googleplex. The production quality is about what American students would expect from school movies.

It's only a matter of time before Hollywood starts optioning the Google story....

Wednesday, August 02, 2006

How do you say that again?

Back in April, Google Research announced their beta Arabic-English translation tools. I performed a simple translation test, one which exemplifies just how difficult it is for people to use online translation tools. Let me share an anecdote with you first before I reveal the test and my results.

A few years ago, I had a very popular Web site. I called it Parma Endorion. Originally created in the fall of 1996, Parma Endorion was just a collection of a handful of essays I wrote about randomly selected topics concerning J.R.R. Tolkien's Middle-earth. From 1996 to 1998, I received hundreds of emails from teachers, librarians, and students around the world asking me how they could print out the essays (I had deliberately made this difficult to do). It finally dawned on me that I should stop being so intellectually proprietary and let people print the essays.

So in 1998 I redesigned the site to work more like a book (parma is the Elvish word for "book" in Tolkien's invented languages). Well, my email exploded with thank yous for a while. And then something wonderful happened. People started linking to Parma Endorion all over the place. Problem was, new research was showing me that the essays needed serious updating. And my readers wanted a sequel to Visualizing Middle-earth that I didn't have time to write. So in 2001 I arranged with Matt Tinaglia to update Parma Endorion and offer the third edition as a free eBook. All this is pretty well documented.

As the original essays had been translated into Polish and Italian, I thought it would be cool to translate the eBook. I contacted various overseas Tolkien groups and asked for help. I used a well-known online translation tool to write my letters of invitation. I translated the letters sentence-by-sentence, translating them back to English, repeating the process as often as necessary -- changing words where the translations didn't work -- until I found some consistency.

It's a brutal technique but it works for the most part. However, I just didn't pay close enough attention to the Spanish language translations. I can actually read Spanish to some degree. I used to read the Miami Herald's Spanish edition every day, so I really had no excuse for what happened. But I didn't notice that the tool had translated "fans" (as in devoted readers of Tolkien's books) to "ventiladores" (ventilators -- those rotating things that push air around). Well, when the Spanish ventiladores stopped laughing, one of them wrote back and said, "Yes, it's obvious you need help, and we'll be glad to help you."

Thus was born the Spanish translation of Parma Endorion.
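For the technically inclined, the back-translation technique described above can be sketched in a few lines. This is only a toy under stated assumptions: the two word tables are hypothetical stand-ins for an online translation tool (not any real service or API), and `translate` and `round_trip_ok` are names invented for illustration. The sketch also shows why the technique let "ventiladores" slip through: a wrong word sense can survive the round trip perfectly.

```python
# Toy word tables standing in for an online translation tool.
# These are hypothetical illustrations, not a real translation API.
EN_TO_ES = {"fans": "ventiladores", "books": "libros", "of": "de", "the": "los"}
ES_TO_EN = {spanish: english for english, spanish in EN_TO_ES.items()}


def translate(sentence, table):
    """Translate word by word, leaving unknown words unchanged."""
    return " ".join(table.get(word, word) for word in sentence.lower().split())


def round_trip_ok(sentence):
    """Translate to Spanish and back; report whether the English survived."""
    there = translate(sentence, EN_TO_ES)
    back = translate(there, ES_TO_EN)
    return back == sentence.lower(), there


consistent, spanish = round_trip_ok("fans of the books")
print(consistent, "|", spanish)
```

The check reports the sentence as consistent even though the Spanish refers to the rotating kind of fan, which is why a human reader still has to look at the intermediate text.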

So, what is the significance of all that? English has a fairly standardized spelling system. We have differences between the United States and areas of the British Commonwealth. For example, we write "color" and they write "colour". But for the most part English is very standardized. One area of notable exception, however, is the rendering of certain foreign languages into English. Arabic names in particular have multiple renderings in English. Take the terrorist organization Hezbollah. Or is that Hizballa? Or was that supposed to be Hizbullah? You can see the variation in spellings just by switching news sources in your Web browser.

So what does Google do with these kinds of variations in spelling? I decided to type a phrase into the English to Arabic tool using a variation of Hizballah's name. The Arabic to English tool returned the exact English meaning of my phrase, but with a different spelling: "Hizbullah".

I wondered what their authoritative source for that spelling might be. I suppose there may be a book of translation standards running around, but even the Israeli news media cannot agree on how to render the name into English. The Jerusalem Post uses Hizbullah (like Google) and Haaretz uses Hezbollah. So what does Google do to normalize English spellings of non-English words?
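I have no idea how Google actually handles this, but one crude way to treat different renderings as the same name is to reduce each spelling to a rough consonant skeleton. The sketch below is an assumption-laden toy, not Google's method: `variant_key` and `group_variants` are hypothetical names, and the rules (drop vowels, collapse doubled letters, ignore a trailing "h") are a heuristic I chose because they happen to cover the spellings mentioned above.

```python
import re
from collections import defaultdict


def variant_key(name):
    """Reduce a romanized name to a rough consonant skeleton (toy heuristic)."""
    s = re.sub(r"[^a-z]", "", name.lower())  # drop spaces, hyphens, apostrophes
    s = re.sub(r"[aeiou]", "", s)            # vowels vary most between renderings
    s = re.sub(r"(.)\1+", r"\1", s)          # collapse doubled consonants
    s = re.sub(r"h$", "", s)                 # a trailing "h" is often optional
    return s


def group_variants(names):
    """Group spellings that share the same skeleton, keeping input order."""
    groups = defaultdict(list)
    for name in names:
        groups[variant_key(name)].append(name)
    return dict(groups)


print(group_variants(["Hezbollah", "Hizballa", "Hizbullah", "Hamas"]))
```

A rule this blunt will also lump together unrelated words that share consonants, so a real system would need something far more careful, such as a curated list of known variants.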

While it may seem to some people that I'm getting lost in the details, the SEO world should pay some attention to what Google does with its translation technologies. Spellings are only one aspect of the challenges that face translators. Idiom -- the way we form phrases -- causes an even bigger headache. Today we say "I'm down with that" to mean "that's cool by me" which used to be "I'm okay with that" which replaced "I most heartily agree" which was subsequent to "I approve in all aspects".

When you think about how the Google tool translates Web pages, which may or may not use non-standard idiom, or obsolete idiom, think about how phrase normalization will become very important for Google. If they can accurately render whole passages of text into foreign languages (better than the older tool we're all so familiar with), Google will have taken online translation to a new level.

Furthermore, the significance of idiomatic translation is that a core relevance standard can be established. Think of a universal language that underlies all of our human languages. Linguists have been seeking the means of tying all human languages together for decades, perhaps centuries. They are not really close to doing that, but once they achieve a Unified Human Language Theory, they'll be able to offer translators new techniques and tools for determining what unusual passages may mean.

"may mean" is in itself significant. Language doesn't simply rely upon words and spelling and expressions. For example, nearly every human language -- if not indeed all of them -- incorporates metaphor to some degree. That is, we can use the phrase "hatching of the ugly duckling" to refer to the birth of something other than a duck. If you're a native Turkish language user and you have to translate a paper that uses "hatching of the ugly duckling", how do you determine what that expression is really referring to?

One of my Tolkien essays, "Is your canon on the loose", was translated into Hebrew a couple of years ago. The translator could not replicate the pun in my title ("canon" refers to the authoritative body of texts used for Tolkien research, but the title is styled on the popular idiomatic expression, "He is a loose cannon" -- "canon" and "cannon" are pronounced exactly the same way). My translator sensed the connection but did not fully appreciate it, and after I explained how the joke worked, he said, "We have no similar expression in Hebrew".

After giving the matter some thought, he chose to retitle the essay (with my approval, as well as my permission, after conferring with me and getting my opinion) "Choir of a thousand voices" (which is a much closer rendering of the actual Hebrew title than he could come to my original meaning). It was an appropriate choice for a twisted metaphor, as the core meaning for both expressions is quite the same thing. "Is your canon on the loose" refers to the fact that Tolkien canon discussions are very fluid -- I freely admit to changing contexts and canons on a frequent, mind-boggling basis. I have to because no two people use the same parameters.

So online translation tools are going to hit the same walls that human translators hit, and how those tools make their choices will be very important to search engine optimization specialists. Why is that? Because even though it's not presently possible for search engines to truly practice semantic indexing, that is exactly what they hope to achieve some day.

Semantic indexing would allow a search engine to capture a user's query in any language, any jargon, any idiomatic context and find all relevant documents -- in any language, any jargon, any idiomatic context. Isn't placing the world's collective knowledge at your fingertips one of Google's stated objectives? Hey, whether they can or cannot do it really doesn't matter right now. They are trying to do it.

Hopefully, they'll take whatever lessons they learn from these translation projects and apply them to working with user queries and page indexing.

But beware what you ask for. Today, the real estate for Web page design and optimization is wide open. If you cannot dominate one expression, you can dominate another, similar expression and then brand that similar expression into your targeted market. Teach them to search for the phrases you dominate and you blow your competition out of the water while he is still gloating over his number 1 ranking for the phrases you cannot touch.

If Google creates the Universal Translation Tool, they'll be able to substitute one expression for another in resolving queries. One might hope they would allow for an exact find search anyway (how else would one find an exact passage one wants to retrieve?). But if that day ever comes, optimizing for specific expressions will take a back seat to optimizing for concepts.
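To make the idea concrete, here is a minimal sketch of what substituting one expression for another might look like, with an exact-match switch kept alongside it. Everything in it is an assumption for illustration: the `CONCEPTS` table, the `AGREE` label, and the `search` function are hypothetical, and the phrases are simply the idioms from earlier in this post. A real engine would obviously not work from a hand-written table.

```python
# Hypothetical phrase-to-concept table built from the idiom example in this post.
CONCEPTS = {
    "i'm down with that": "AGREE",
    "i'm okay with that": "AGREE",
    "i most heartily agree": "AGREE",
    "i approve in all aspects": "AGREE",
}


def concepts_in(text):
    """Return the set of concept labels whose phrases appear in the text."""
    lowered = text.lower()
    return {label for phrase, label in CONCEPTS.items() if phrase in lowered}


def search(query, docs, exact=False):
    """Match documents by literal phrase, or by shared concept unless exact=True."""
    q = query.lower()
    if exact:
        return [d for d in docs if q in d.lower()]
    wanted = concepts_in(query)
    return [d for d in docs if q in d.lower() or (wanted & concepts_in(d))]


docs = [
    "She said, I most heartily agree.",
    "I'm down with that, he replied.",
    "No comment.",
]
print(search("I'm okay with that", docs))              # concept match
print(search("I'm okay with that", docs, exact=True))  # literal match only
```

With concept matching on, a page ranks for phrasings it never uses, which is exactly why dominating one particular expression would matter less.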

Concept optimization is in its barest infancy right now. Our methodologies are clumsy and rely mostly on brute force. Things will remain that way until the search engines become more sophisticated in language analysis and translation. When that happens, the rules will become more clear.

Until then, we'll have to keep an eye on things, standing watch over the camp, monitoring their progress, staying apace with their technological developments, matching their innovations with our improvisations.

We have to stay in touch on the issue.

Got that?

Tuesday, August 01, 2006

The Other Google Blog, and yet more things...

I don't actually have the address committed to memory. I usually just take a stab at typing in what I think the URL should be and if I don't see what I want I search on Google for the Google blog. It (so far) always comes up first.

Well, today I accidentally visited and thought, "Hm. I guess Google doesn't mind if people hyphenate their name into blogs after all".

The other Google blog is an unofficial blog about Google posted by a librarian named Susan Herzog, an "Information Literacy Librarian @ Eastern Connecticut State University". That's a bigger jawbreaker than anything in Khuzdul, as Sam Gamgee might be tempted to say. I can only guess what an Information Literacy Librarian might be -- sounds like something the U.S. military would dream up to describe a Webmaster.

Still, she provides some very good resources for people who want to do research about...Google. As one might expect, there are many entries concerning Google Print. I can see how librarians of all varieties would be interested in Google Print. And one of the headlines in Susan's blog reads, "What about authors?"

As a published author, I have been asked what I think about Google Print. My reaction is mixed. As someone who cannot afford to buy every book on Earth, it's tempting to have so many of them at my fingertips. On the other hand, I'm not sure how many people realize that you can, actually, print out the full contents of every book encased behind Google Print's protections. The result would look very ugly, but you can do it.

Which means that people can print Visualizing Middle-earth in its entirety, despite the fact that it's labeled as a "Limited Preview" book. Now, I'm not going to share the tedious details on how to print it all out, but if I can figure out how to do it, so can at least 7 other people. The rest of you probably don't care to try.

Why do I allow Google to index my book? Because I'm ambivalent about the whole matter. I might as well see what happens. I've earned a nice amount of money off of Visualizing Middle-earth (which is cream for me, considering I had been paid to write most of the essays by Suite101). It comes up in the top five or ten for a variety of searches such as "elves in Middle-earth", "tolkien middle-earth", "middle-earth movies", "lord of the rings movies", etc.

And I did absolutely nothing to optimize for those searches. I have no idea how one would be able to optimize for Google Print. But the day may come when books are written with search engines like Google Print in mind. Does that frighten you? It shouldn't. For decades, some authors have written books that they hoped would be picked up for film or television adaptation. Such books are designed to make the transition easily. Does the concept work? I doubt it. The film and television industries tend to go after classic books as much as possible, in my experience.

But books have also been written for specialized markets for decades as well. There are book packagers who see a need in a special niche and they go out and hire authors, artists, editors, whomever they need to produce the exact type of book required. You might buy one of these books at the checkout stand of your supermarket. Your child might buy one of these books in a school book fair. You have no way of knowing if the book you buy was the result of a package deal. I was approached a few years ago by Chris Zavisa, who has worked with Stephen King and Dean R. Koontz, to write the main text for a Middle-earth book. That's all I can say about it, but Chris introduced me to the world of book packaging.

So the production of books for services like Google Print is, in my opinion, something that will happen. It's only a matter of time until someone figures out how to (possibly) make some money off of it. How would I do it? Maybe I'd embed a lot of permanent advertising in the book, and see that each page ranks highly for a variety of searches.

But how do you optimize for Google Print? I don't know. Only Google knows (at this stage). So it would be a bit of a crapshoot, which is (I am sure) how Google wants it. But how long will it be before Google monetizes Google Print? They run ads in the margins, but technologically it should be possible for Google to increase their revenues from Google Print in other ways.

Frankly, I think they should provide a subscription book service, where you can read (and bookmark) books online. The authors would, naturally, get a royalty for your access. Much like the RIAA and MPAA, I suspect that mainstream publishers would be appalled at such an idea. After all, if done fairly and right, it would cut them out of the middle and get the money directly to the authors, many of whom wait a long, long time to see money for their work.

And all that came to me because I'm too lazy to memorize