Eye Tracking on Personalized Search: a Tough Nut to Crack

First published August 16, 2007 in Mediapost’s Search Insider

I was in Seattle for the SMX show when somebody asked me if we were planning on doing eye tracking studies on personalized search results. I replied that I would love to, but I just wasn't sure how. To accurately track interactions with a personalized page of results, you need access to your participant for a significant period of time and have to track their clickstream data, which raises some rather ugly privacy concerns. The other problem is that Google's current implementation of personalization is so watered down that it really doesn't have much impact on the user experience. What would be really interesting is to see what user interaction might look like with personalized results the way they'll be in two to three years.

Planting a Research Seed…

With that seed planted, I came back from Seattle, and the first thing I did was sit down with our research team to explore how we might pull this off. We realized early on that we wouldn't be able to do the kind of study where we bring in participants from our regular panel and track interactions with a real search engine. To come up with a really interesting study, we were going to have to fudge the methodology a bit. This was not going to be a study with bulletproof methodology.

So we opted for interesting instead. We decided that it would be fascinating to speculate on what the search results page might look like in 2010, with a more personalized, richer experience that brings many types of results onto the same page. How would the eye navigate a search results page that included more than just text-based Web results? How would we interact with images and video, maps and audio files, all interwoven on the same results page? How would advertising stand out from the organic results? Would the Golden Triangle still exist? Would we still scan the results in an F-shaped pattern?

All these were top-of-mind questions. So, starting in late June, we started to put the study together. Because we couldn't use our traditional panel (due to the privacy issues involved in getting a truly personalized experience), we had to reach out to our circle of family and friends. What we wanted to do was track interactions with the search results page as it might progress over the next three years.

We came up with three different flavors of search results: the universal results we're seeing today on Google, a slightly more aggressive presentation of universal and personalized search that we might see in a year or two, and then a much more personalized, varied presentation of results in a portal-like format that might represent the search results page in three years' time. We were able to interview some of our favorite experts in the world of search usability and behavior to get a glimpse of what search might look like in the year 2010. They included Jakob Nielsen, Marissa Mayer, Larry Cornett, Justin Osmer, Greg Sterling, Danny Sullivan and Chris Sherman.

Heat Mapalooza

I've just spent the last week going over hundreds of heat map slices to try to get a white paper together to release for SES San Jose. By the way, regular readers of this column will remember that when I came back from Seattle, I was somewhat taken aback by the lack of interest in what personalization might mean for the search marketer. For the 20 or so of you who posted comments indicating that you are definitely interested in how personalization will impact search marketing and would like to hear my thoughts, you'll be happy to know that we're adding a section to this white paper on just that subject.

The End of the Golden Triangle?

Without spoiling the results of the study, here are a few tidbits I can share.  Even in Google’s present linear format, the minute you start introducing images into the results, you break down the scan patterns that result in the Golden Triangle.

We saw significant variations in initial orientation points on the page, which led to much different interaction and scanning patterns. We tend to fixate on images, and if these images appear in the top-of-page real estate, they create different entry points for the eye. Our entry point has traditionally been in the far upper left, but now we may orient on an image that's in the second or third result and then continue scanning from that point.

In the sessions where scanning activity moved down the page and started from an in-line graphic, we saw a different level of interaction with the sponsored results. Scanning is pulled down the page and away from the top-of-page Golden Triangle real estate.

One of the really interesting things to consider is that the interface of the search results page is in more flux now than it has been at any time in the past decade. Engines are increasingly looking at the presentation of results as a key differentiating factor in the search engine war. Ask really pushed this approach with its introduction of 3D Search.

The search results page we see now has largely defined itself, based on Google’s success, across all the major search properties and has remained relatively static over the past few years.  All that is about to change.  As we search for a richer and more relevant search experience, the elements of the page will be in constant flux.

One of the challenges will be making sure that as personalization takes hold, the relevance of the organic results and the relevance of the sponsored results stay in sync. This was a point constantly hammered home by Marissa Mayer in several interviews with her. While Google is choosing the organic side to roll out its personalization technology, the company has to ensure that the relevancy of the sponsored results doesn't begin to drop relative to the personalized organic results. There will be a delicate juggling act needed to ensure that the user experience and the effectiveness of advertising don't sway too far from the ideal point of balance.

I can tell you that the heat maps I've seen so far are the most interesting ones I've seen since we first identified the Golden Triangle. If you do happen to be at SES San Jose, try to catch the results at the research update panel. Otherwise, I'll give you a heads-up in this column when it's available, in a few weeks' time.

Interview with Jakob Nielsen on the Future of the SERP (and other stuff)

I recently had the opportunity to talk to Jakob Nielsen for a series I'm doing for Search Engine Land about what the search results page will look like in 2010. Jakob is called a "controversial guru of Web design" on Wikipedia (Jakob gets his own shots in at Wikipedia in this interview) because of his strongly held views on the use of graphics and Flash in web design. I have a tremendous amount of respect for Jakob, even though we don't agree on everything, because of his no-frills, common-sense approach to the user experience. So I thought it was quite appropriate to sound him out on his feelings about the evolution of the search interface, now that, with Universal Search and Ask's 3D Search, we seem to be seeing more innovation in this area in the last 6 months than we've seen in the last 10 years. Jakob is not as optimistic about the pace of change as I am, but the conversation was fascinating. We touched on Universal Search, personalization, banner blindness on the SERP and scanning of the web in China, amongst other things. Usability geeks, enjoy!

Gord: For today I only really have one question, although I'm sure there will be lots of branch-offs from it. It revolves around what the search engine results page may look like in 2010. I thought you would be a great person to lend your insight on that.

Jakob: Ok, sure.

Gord: So why don’t we just start? Obviously there are some things that are happening now with personalization and universal search results. Let’s just open this up. What do you think we’ll be seeing on a search results page in 3 years?

Jakob: I don't think there will be that big a change, because 3 years is not that long a time. I think if you look back three years at 2004, there was not really that much difference from what there is today. I think if you look back ten years there still isn't that much difference. I actually just took a look at some old screen shots in preparation for this call, of various search engines like Infoseek and Excite and those guys that were around at that time, and Google's Beta release, and the truth is that they were pretty similar to what we have today as well. The main difference, the main innovation, seems to have been to abandon banner ads, which we all know now really do not work, and replace them with the text ads, and of course that affected the appearance of the page. And of course now the text ads are driven by the keywords, but in terms of the appearance of the page, they have been very static, very similar for 10 years. I think that's quite likely to continue. You could speculate on possible changes. Then I think there are three different big things that could happen.

One of them, which will not make any difference to the appearance, is a different prioritization scheme. Of course, the big thing that has happened in the last 10 years was a change from an information-retrieval-oriented relevance ranking to more of a popularity relevance ranking. And I think we can see a change to maybe more of a usefulness relevance ranking. I think there is a tendency now for a lot of not very useful results to be dredged up that happen to be very popular, like Wikipedia and various blogs. They're not going to be very useful or substantial to people who are trying to solve problems. So I think that with counting links and all of that, there may be a change, and we may go to a more behavioral judgment as to which sites actually solve people's problems, and they will tend to be more highly ranked.

But of course from the user perspective, that's not going to look any different. It's just going to be that the top one is going to be the one that the various search engines, by whatever means they think of, will judge to be the best, and that's what people will tend to click first, and then the second one and so on. That behavior will stay the same, and the appearance will be the same, but the sorting might be different. That, I think, is actually very likely to happen.

Gord: So, as you say, those will be the relevancy changes at the back end. The paradigm of the primarily text-based interface, with 10 organic results and 8-9 sponsored results where they are, you don't see that changing much in the next 3 years?

Jakob: No. I think you can speculate on possible changes to this as well. There could be small changes, there could be big changes. I don't think big changes. The small changes are, potentially, a change from the one-dimensional linear layout to more of a two-dimensional layout, with different types of information presented in different parts of the page, so you could have more of a newspaper metaphor in terms of the layout. I'm not sure if that's going to happen. It's a hugely dominant user behavior to scan a linear list, and so this attempt to put other things on the side, to tamper with the true layout, the true design of the page, to move from it being just a list, it's going to be difficult, but I think it's a possibility. There are a lot of things, types of information, that the search engines are crunching on, and one approach is to unify them all into one list based on the engine's best guess as to relevance or importance or whatever, and that is what I think is most likely to happen. But it could also be that they decide to split it up and say, well, out here to the right we'll put shopping results, and out here to the left we'll put news results, and down here at the bottom we'll put pictures, and so forth, and I think that's a possibility.

Gord: Like Ask is experimenting with right now with their 3D search. They’re actually breaking it up into 3 columns, and using the right rail and the left rail to show non-web based results.

Jakob: Exactly, except I really want to say that it’s 2 dimensional, it’s not 3 dimensional.

Gord: But that’s what they’re calling it.

Jakob: Yes I know, but that’s a stupid word. I don’t want to give them any credit for that. It’s 2 dimensional. It’s evolutionary in the sense that search results have been 1 dimensional, which is linear, just scroll down the page, and so potentially 2 dimensional (they can call it three but it is two) that is the big step, doing something differently and that may take off and more search engines may do that if it turns out to work well.  But I think it’s more likely that they will work on ways on integrating all these different sources into a linear list. But those are two alternative possibilities, and it depends on how well they are able to produce a single sorted list of all these different data sources.  Can they really guess people’s intent that well?

All this stuff, all this talk about personalization, that is incredibly hard to do. Partly because it's not just personalization based on a user model, which is hard enough already. You have to guess that this person prefers this style of content and so on. But furthermore, you have to guess what this person's "in this minute" interest is, and that is almost impossible to do. I'm not too optimistic on the ability to do that. In many ways I think the web provides self-personalization, you know, self-service personalization. I show you my navigational scheme of things you can do on my site and you pick the one you want today, and the job of the web designer is to, first of all, design choices that adequately meet common user needs, and secondly, simply explain these choices so people can make the right ones for them. And that's what most sites do very poorly. Both of those two steps are done very poorly on most corporate websites. But when it's done well, that leads to people being able to click, click, and they have what they want, because they know what they want, and it's very difficult for the computer to guess what they want in this minute.

Gord:  When we bring it back to the search paradigm, giving people that kind of control to be able to determine the type of content that’s most relevant to them requires them interacting with the page in some way.

Jakob: Yes, exactly, and that's actually my third possible change. My first one was the change to the ranking scheme; the second one was the potential change to two-dimensional layouts. The third one is to add more tools to the search interface to provide query reformulation and query refinement options. I'm also very skeptical about this, because this has been tried a lot of times and it has always failed. If you go back and look at old screen shots (you probably have more than I have) of all of the different search engines that have been out there over the last 15 years or so, there have been a lot of attempts to do things like this. I think Microsoft had one where you could prioritize one thing more, prioritize another thing more. There was another slider paradigm. I know that Infoseek, many, many years ago, had alternative query terms you could search on with just one click, which was very simple. Yet most people didn't even do that.

People are basically lazy, and this makes sense. The basic information foraging theory, which is, I think, the one theory that basically explains why the web is the way it is, says that people want to expend minimal effort to gain their benefits. And this is an evolutionary point that has come about because the people, or the creatures, who don't exert themselves are the ones most likely to survive when there are bad times or a crisis of some kind. So people are inherently lazy and don't want to exert themselves. Picking from a set of choices is one of the least effortful interaction styles, which is why this point-and-click interaction in general seems to work very well. Whereas tweaking sliders, operating pull-down menus and all that stuff, that is just more work.

Gord: Right.

Jakob: But of course, this depends on whether we can make these tools useful enough, because it's not that people will never exert themselves. People do, after all, still get out of bed in the morning, so people will do something if the effort is deemed worthwhile. But it just has to be the case that if you tweak the slider you get remarkably better results for your current needs. And it has to be really easy to understand. I think this has been a problem for many of these ideas. They made sense to the search engine experts, but the average user had no idea what would happen if they tweaked these various search settings, and so people tended not to use them.

Gord: Right. When you look at where Google appears to be going, it seems like they've made the decision, "we'll keep the functionality transparent in the background, we'll use our algorithms and our science to try to improve the relevancy," whereas someone like Ask might be more likely to offer more functionality and more controls on the page. So if Google is going the other way, they seem to be saying that personalization is what they're betting on to make that search experience better. You're not too optimistic that that will happen without some sort of interaction on the part of the user?

Jakob: Not, at least, in a small number of years. I think if you look very far ahead, you know, 10, 20, 30 years or whatever, then I think there can be a lot of things happening in terms of natural language understanding and making the computer more clever than it is now. If we get to that level then it may be possible to have the computer better guess at what each person needs without the person having to say anything, but I think right now, it is very difficult. The main attempt at personalization so far on the web is Amazon.com. They know so much about the user because they know what you've bought, which is a stronger signal of interest than if you had just searched for something. You search for a lot of things that you may never actually want, but actually paying money, that's a very, very strong signal of interest. Take myself, for example. I'm a very loyal shopper at Amazon. I've bought several hundred things from them, and despite that they rarely recommend (successfully)… sometimes they actually recommend things I like, but things I already have. I just didn't buy it from them, so they don't know I have it. But it's very, very rare that they recommend something where I say, "Oh yes, I really want that," so I actually buy it from them. And that's despite the fact that the economic incentive is extreme: recommending things that people will buy. And they know what people have bought. Despite that, and despite their working on this now for already 10 years (it's always been one of their main dreams to personalize shopping), they still don't have it very well done. What they have done very well is this "just in time" relevance, or "cross-sell" as it's normally called. So when you are on one book on one page, or one product in general, they will say, here are 5 other ones that are very similar to the one you're looking at now. But that's not saying, in general, I'm predicting that these 5 books will be of interest to you. They're saying, "Given that you're looking at this book, here are 5 other books that are similar," and therefore the lead that you're interested in these 5 books comes from your looking at that first book, not from them predicting or having a more elaborate theory about what I like.

Gord: Right.

Jakob: What “I like” tends not to be very useful.

Gord: Interesting. Jakob, I want to be considerate of your time, but I do have one more question I'd love to run by you. As search results move towards more types of images, we're already seeing more images showing up on the actual search results page for a lot of searches. Soon we could be seeing video and different types of information presented on the page. First of all, how will that impact our scanning patterns? We've both done eye tracking research on search engine results, so we know there are very distinct patterns that we see. Second of all, Marissa Mayer, in a statement not that long ago, seemed to backpedal a bit on the position that Google would never put display ads back on a search results page, seeming to open a door for non-text ads. Would you mind commenting on those two things?

Jakob: Well, they're actually quite related. If they put up display ads, then they will start training people to exhibit more banner blindness, which will also cause them to not look at other types of multimedia on the page. So as long as the page is very clean and the only ads are the text ads that are keyword driven, then I think that putting pictures and probably even videos on there actually works well. The problem, of course, is that they are inherently a more two-dimensional media form, and video is three-dimensional, because it's two-dimensional graphics and the third dimension is time, so they become more difficult to process in this linear, scan-down-the-page type of pattern. But on the other hand, people can process images faster; with just one fixation you can "grok" a lot of what's in an image. So I think that if they can keep the pages clean, then images will be incorporated into people's scanning patterns a little bit more: "Oh, this can give me a quick idea of what this is all about and what type of information I can expect." This of course assumes one more thing as well, which is that they can actually select good pictures.

Gord: Right.

Jakob: I would be kind of conservative in tweaking these algorithms, you know, deciding what threshold you should cross before you put an image up. I would really say tweak it so that you only put an image up when you're really sure that it's a highly relevant, good image. If it starts becoming the case that there are too many images, then we start seeing the obstacle course behavior. People scan around the images, as they do on a lot of corporate websites, where the images tend to be stock photos of glamour models that are irrelevant to what the user's there for. And then people evolve behavior where they look around the images, which is very contrary to what first principles of perceptual psychology would predict, which is that the images would be attractive. Images turn out to be repelling if people start feeling like they are irrelevant. It's a similar effect to banner blindness. If there's any type of design element that people start perceiving as being irrelevant to their needs, then they will start to avoid that design element.

Gord: So, they could be running the risk of banner blindness, by incorporating those images if they’re not absolutely relevant…

Jakob: Exactly.

Gord: …to the query. Ok thank you so much.  Just out of interest have you done a lot of usability work with Chinese?

Jakob: Some. I actually read the article you had on your site. We haven't done eye tracking studies, but we did some studies when we were in Hong Kong recently, and at that level the findings were very much the same, in terms of PDFs being bad and how people go through shopping carts. So a lot of the transactional behavior, the interaction behavior, is very, very similar.

Gord: It was interesting to see how they were interacting with the search results page. We're still trying to figure out what some of those interactions meant.

Jakob: I think it’s interesting. It can possibly be that the alphabet or character set is less scannable, but it is very hard to say because when you’re a foreigner, these characters look very blocky, and it looks very much like a lot of very similar scribbles.  But on the other hand, it could very well be the same, that people who don’t speak English would view a set of English words like a lot of little speck marks on the page, and yet words in English or in European languages are highly scannable because they have these shapes.

Gord: Right.

Jakob: So I think this is where more research is really called for to really find out.  But I think it’s possible, you know the hypothesis is that it’s just less scannable because the actual graphical or visual appearance of the words just don’t make the words pop as much.

Gord: There seem to be some conditioning effects as well, and intent plays a huge part. There are a lot of moving pieces there that we're still trying to sort out. The relevancy of the results is a huge issue, because the relevancy in China is really not that good, so…

Jakob: It seems like it would have a lot to do with experience and the amount of information. If you compare back with the use of search in the '80s, for example, before the web started, that was also a much more thorough reading of search results, because people didn't do search very well. Most people never did it, actually, and when you did do it you would search through a very small set of information, and you had to carefully consider each possibility. Then, as WebCrawler and Excite and AltaVista and those people started, users got more used to scanning; they got more used to filtering out lots of junk. So the paradigm has completely changed from "find everything about my question" to "protect myself against an overload of information". That paradigm shift requires you to have lived in a lot of information for a while.

Gord: I was actually talking to the Chinese engineering team down at Yahoo!, and that's one thing I said. If you look at how the Chinese are using the internet, it's very similar to North America in '99 or 2000. There's a lot of searching for entertainment files and MP3s. They're not using it for business and completing tasks nearly as much. It's an entertainment medium for them, and that will impact how they're browsing things like search results. It'll be interesting to watch, as that market matures and as users get more experienced, whether that scanning pattern condenses and tightens up a lot.

Jakob: Exactly. And I would certainly predict it would. There could be a language difference, basically a character set difference, as we just discussed, but I think the basic information foraging theory is still a universal truth. People have to protect themselves against information overload, if they have information overload. As long as you're not accustomed to that scenario, you don't evolve those behaviors. But once you get it… I think a lot of those people have lived in an environment where there's not a lot of information: only one state television channel and so forth. Gradually they're getting satellite television and they're getting millions of websites, and gradually they are getting many places where they can shop for given things, but that's going to be an evolution.

Gord: The other thing we saw was that there was a really quick scan right to the bottom of the page, within 5 seconds, just to determine how relevant these results were: were these legitimate results? And then there was a secondary pass through, where they went back to the top and then started going through. So they're very wary of what's presented on the page, and I think part of it is lack of trust in the information source and part of it is the amount of spam on the results page.

Jakob: Oh, yes, yes.

Gord: Great, thanks very much for your time, Jakob.

Jakob: Oh and thank you!

Shari Thurow Talking Smack about Eye Tracking

You know, if I didn't know better I'd say that Shari Thurow had issues with me and eye tracking. I ran across a column a couple of weeks ago where she was talking about the niches that SEOs are carving out for themselves, and she mentioned eye tracking specifically. In fact, she devoted a whole section to eye tracking. Now, it's pretty hard not to take it personally when Enquiro is the only search marketing company I know of that does extensive eye tracking. We're the only ones I'm aware of that have eye tracking equipment in-house. So when Shari singles out eye tracking and warns about using the results in isolation…

That brings me to my favorite group of SEO specialists: search usability professionals. As much as I read and admire their research, they, too, often don’t focus on the big picture.

…I’m not sure who else she might be talking about.

I've been meaning to post on this for a while, but I just didn't get around to it. I'm on the road today and feeling a little cranky, so what the heck. It's time to respond in kind. First, here's Shari's take on eye tracking and SEO.

Eye-tracking data is always fascinating to observe on a wide variety of Web pages, including SERPs (define). As a Web developer, I love eye-tracking data to let me know how well I’m drawing visitors’ attention to the appropriate calls to action for each page type.

Nonetheless, eye-tracking data can be deceiving. Most search marketers understand the SERP’s prime viewing area, which is in the shape of an “F.” Organic or natural search results are viewed far more often than search engine ads are, and (as expected) top, above-the-fold results are viewed more often than the lower, below-the-fold results. Viewing a top listing in a SERP isn’t the same as clicking that link and taking the Web site owner’s desired call to action.

Remember, usability testing isn’t the same as focus groups and eye tracking. Focus groups measure peoples’ opinions about a product or service. Eye-tracking data provide information about where people focus their visual attention. Usability testing is task-oriented. It measures whether participants complete a desired task. If the desired task isn’t completed, the tests often reveal the many roadblocks to task completion.

Eye-tracking tests used in conjunction with usability tests and Web analytics analysis can reveal a plethora of accurate information about search behavior. But eye-tracking tests used in isolation yield limited information, just as Web analytics and Web positioning data yield limited (and often erroneous) information.

Okay Shari, you didn’t mention me or Enquiro by name but again, who else would you be talking about?

Actually, Shari and I agree more than we disagree here. I agree that no single data source or research or testing approach provides all the answers, including eye tracking. However, eye tracking adds an extraordinarily rich layer of data to common usability testing. When Shari says eye tracking is not the same as usability testing, she's only half right. As Shari points out, eye tracking combines very well with usability testing, but in many cases that can be overkill. Usability testing is task oriented. There's no reason why eye tracking studies can't be task oriented as well (most of ours are). The eye tracking equipment we use is very unobtrusive; it's virtually like interacting with any computer in a usability lab. In usability testing, you put someone in front of the computer with a task and ask them to complete it. Typically you record the entire interaction with software such as TechSmith's Morae. Afterwards, you can replay the session and watch where the cursor goes. Eye tracking can capture all that, plus capture where the eyes went. It's like taking a two-dimensional test and suddenly making it three-dimensional. Everything you do in usability testing can also be done with eye tracking.

The fact is, the understanding we currently have of interaction with the search results would be impossible to know without eye tracking. I'd like to think that a lot of our current understanding of interaction with search results comes from the extensive eye tracking testing we've done on the search results page. The facts that Shari says are common knowledge among search marketers come, in large part, from our work with eye tracking. And we're not the only ones. Cornell and Microsoft have done their own eye tracking studies, as has Jakob Nielsen, and the findings have been remarkably similar. I've actually talked to the groups responsible for these other eye tracking tests, and we've all learned from each other.

When Enquiro produced our studies, we took a deep dive into the data we collected. I think we did an excellent job of not presenting just the top-level findings, but really trying to create an understanding of what interaction with the search results page looks like. Over the course of the last two years I've talked to Google, Microsoft and Yahoo. I've shared the findings of our research and learned a little bit more about the findings of their own internal research. I think, on the whole, we know a lot more about how people interact with search than we did two years ago, thanks in large part to eye tracking technology. The big picture Shari keeps alluding to has broadened and been colored in much more extensively thanks to those studies. And Enquiro has tried to share that information as much as possible. I don't know of anyone else in the search marketing world who's done more to help marketers understand how people interact with search. When we released our first study, Shari wrote a column that basically said, "Duh, who didn't know this before?" Well, based on my discussions with hundreds, actually thousands, of people: almost everyone, save for a few usability people at each of the main engines.

There are some dangers with eye tracking. Perhaps the biggest danger is that heat maps are so compelling visually. People tend not to go any further. The Golden Triangle image has been displayed hundreds, if not thousands of times, since we first released it. It’s one aggregate snapshot of search activity. And perhaps this is what Shari’s referring to. If so, I agree with her completely. This one snapshot can be deceiving. You need to do a really deep dive into the data to understand all the variations that can take place. But it’s not the methodology of eye tracking that’s at fault here. It’s people’s unwillingness to roll up their sleeves and weed through the amount of data that comes with eye tracking, preferring instead to stop at those colorful heat maps and not go any further. Conclusions on limited data can be dangerous, no matter the methodology behind them. I actually said the same for an eye tracking study Microsoft did that had a few people drawing overly simplified conclusions. The same is true for usability testing, focus groups, quantitative analysis, you name it. I really don’t believe Enquiro is guilty of doing this. That’s why we released reports that are a couple hundred pages in length, trying to do justice to the data we collected.

Look, eye tracking is a tool, a very powerful one. And I don't think there's any other tool I've run across that can provide more insight into the search experience, when it's used with a well-designed study. Personally, if you want to learn more about how people interact with engines, I don't think there's any better place to start than our reports. And it's not just me saying so. I've heard as much from hundreds of people who have bought them, including representatives at every major search engine (they all have corporate licenses), as well as a few companies you might have heard of: IBM, HP, Xerox, to name a few. I know the results pages you see at each of the major engines look the way they do in part because of our studies.

Shari says we don’t focus on the big picture. Shari, you should know that you can’t see the big picture until you fill in the individual pieces of the puzzle. That’s what we’ve been trying to do. I only wish more people out there followed our example.

Top Spot or Not in Google?

Brandt Dainow at Think Metrics shared the results of his campaign performance with Google AdWords and came up with the following conclusions:

    • There is no relationship between the position of an advertisement in the Google Ad listings and the chance of that ad being clicked on.
    • Bidding more per visitor in order to get a higher position will not get you more visitors.
    • The number one position in the listings is not the best position.
    • No ad position is any better than any other.
    • The factor which has the most bearing on your chance of being clicked on is the text in your ad, not the ad’s position.

These conclusions were arrived at after analyzing the Google ads he ran this year. He says,

“while position in the listings used to be important, it is not anymore. People are more discriminating in their use of Google Ads than they used to be; they have learned to read the ads rather than just click the first one they see”

This runs directly counter to all the research we've done, and also research done by others, including Atlas OnePoint. So I decided it was worth a deeper dive.
First, some facts about the analysis. It was done on ads he ran in October and November of last year, for the Christmas season. He acknowledges that this isn’t a definitive analysis, but the results are surprising enough that he encourages everyone to test their own campaigns.
In the following chart, he tracks the click through per position.

[Chart: Dainow's click-through rate by ad position]
Brandt expected to see a chart that started high on the left, and tapered down as it moved to the right. But there seemed to be little correlation between position and click through. This runs counter to our eye tracking, which showed a strong correlation, primarily on first page visits. Top sponsored ads on Google received 2 to 3 times the click throughs.

[Chart: Enquiro eye tracking data, click-through rate by ad position]

Further, Atlas OnePoint did some analysis from their data set, and similarly found a fairly high correlation between position and click through on Google and Overture/Yahoo.

[Chart: Atlas OnePoint click-through rate by ad position]

So why the difference?

Well, here are a couple of thoughts right off the bat. Dainow's data is exclusively for his campaigns. We don't see click-through rates for the other listings, both paid and non-paid, on the page, so we can't see how his ads stack up against the others on the page. Also, it may be that for the campaigns in question, Brandt's creative is more relevant than the other ads that show. He makes the point that creative is more important than position. I don't necessarily agree completely. The two work together. The odds of being seen are substantially higher in the top spots, and your creative doesn't work if it isn't seen. The discriminating searcher that Dainow sees emerging, who takes the time to read all the ads, isn't the searcher we see in eye tracking tests. That searcher quickly scans 3 to 4 listings, usually the top sponsored ads and the top 1 or 2 organic listings, and then makes their choice. This is true not only of our study, but also of the recent Microsoft one that just came out. Although Dainow's charts over time certainly seem to show that position is less important, there could be a number of other factors contributing to this.

I will agree with Brandt, though, that if seen, relevant and compelling copy does make a huge difference in the click-through rate of the ad. And for consumer researchers in particular, I still see search advertisers cranking out copy that's not aligned to intent. But all the evidence I've seen points to much higher visibility, and hence click-throughs, in the top sponsored spots.

When looking at analysis like Brandt Dainow is presenting, you have to be aware of all the variables. In this case, I’d really like to know the following:

  • What were the keywords that made up the campaigns
  • What was the creative that was running for his clients
  • What was the creative the competition was running
  • What were the overall click throughs for the page

In doing the analysis, you really need to control for these variables before you can make valid conclusions. Some are ones we can know, others, like the overall click throughs, only the engines would know.
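To make the "control for the variables" point concrete, here's a minimal sketch of the kind of check I'd want to run, using made-up campaign rows (the column names and numbers are hypothetical, not Dainow's actual export):

```python
# Hypothetical sketch: CTR by ad position, with and without controlling for keyword.
# The data and field names are illustrative only.
import pandas as pd

data = pd.DataFrame({
    "keyword":     ["digital camera"] * 3 + ["canon a530"] * 3,
    "position":    [1, 2, 3, 1, 2, 3],
    "impressions": [1200, 900, 750, 400, 350, 300],
    "clicks":      [96, 54, 30, 44, 28, 18],
})

# Naive view: pool all rows and compute CTR per position.
pooled = data.groupby("position").sum(numeric_only=True)
pooled["ctr"] = pooled["clicks"] / pooled["impressions"]
print(pooled["ctr"])

# Controlled view: CTR per keyword/position first, then averaged per position,
# so one high-volume keyword can't dominate the position curve.
per_kw = data.assign(ctr=data["clicks"] / data["impressions"])
print(per_kw.groupby("position")["ctr"].mean())

# A quick look at the position/CTR relationship across keyword segments.
print(per_kw["position"].corr(per_kw["ctr"]))
```

The numbers themselves don't matter; the point is that a flat pooled curve can hide a real position effect (or manufacture one) when the keyword mix, the creative and the competing ads aren't held constant.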

But Dainow is quick to point out that his findings show the need for individual testing on a campaign-by-campaign basis. And on that, we're in complete agreement. Our eye tracking tests and other research show general patterns over common searches, and the patterns have been surprisingly consistent from study to study. That probably gives us as good an idea as any of what typical searcher behavior might be. But as I've said before, there is no such thing as typical behavior. Look at enough searches and an average, aggregate pattern emerges, but each search is different. It depends on searcher intent, it depends on the results and what shows on the page, it depends on the engines, it depends on what searchers find on the other side of the click. All these things can dramatically affect a scan pattern. So while you might look to our studies or others as a starting point, we continually encourage you to use our findings to set up your own testing frameworks. Don't take anything for granted. But that's a message that often doesn't get through. And my concern is that advertisers looking for a magic bullet will read Dainow's conclusions highlighted at the top of this post and swallow them whole, without bothering to digest them. And there are still far too many question marks around this analysis for anyone to do that. I've contacted Dainow to set up a chat so I can find out more. Hopefully we can shed more light on this question.

Why No “Golden Triangle” in the Microsoft Eye Tracking Study

Over at Search Engine Land, Danny Sullivan did a deeper dive into the Microsoft eye tracking study that I posted about last Friday. In it, Danny said:

“Interesting, the pattern is different that the “golden triangle” that Enquiro has long talked about in its eye tracking studies, where you see all the red along the horizontal line of the top listing (indicating a lot of reading there), then less on the second listing, then less still as you move down. “

I just want to draw a few distinctions between the studies. In our study, we wanted to replicate typical search behavior as much as possible, so we let people interact with actual results pages. In the Microsoft study, they were testing what would happen when the most relevant result was moved down the page and how searchers responded to different snippet lengths. The results, while actual results, were intercepted and restructured (i.e., stripping out sponsored ads) to let the researchers test different variables. We have said repeatedly that the Golden Triangle is not a constant, as is shown in our second study, but follows intent and the presentation of the search results.

In fact, the Microsoft study does confirm many of our findings, in the linear scanning of results, the scanning of groups of results and the importance of being in the top 5.

Another potential misconception that could be drawn from Danny’s interpretation of results is hard and fast rules about how many results searchers scan. He settled on the number five. When looking at eye tracking results, it’s vital to remember that there is no typical activity. Please don’t take an average and apply it as a rule of thumb. Averages, or aggregate heat maps, are just that. They’re what happens when you take a lot of different sessions, varying greatly, and mash them together. Scanning activity is highly dependent on the intent of the user and what appears on the search results page. A particularly relevant result in top sponsored, matched to the intent of the majority of users, would probably mean little scanning beyond the first or second organic result. On the other hand, if the query is more ambiguous, you could see scanning a lot deeper on the page. The Microsoft study used two tasks that would generate a limited number of queries, and recorded interactions based on this limited scope. Our studies, while using more tasks, still out of necessity represented the tiniest slice of possible interactions.
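To illustrate the averaging trap with a tiny toy calculation (entirely made-up numbers, not data from either study): if half the sessions come from users with a clear navigational target and half from users with an ambiguous query, the "average scan depth" describes almost nobody.

```python
# Toy illustration with fabricated numbers: an aggregate "average scan depth"
# can describe a behavior that no individual session actually exhibited.
navigational = [2, 2, 1, 2, 3]   # clear target: shallow scans
ambiguous    = [8, 7, 9, 8, 8]   # vague query: deep scans

sessions = navigational + ambiguous
mean_depth = sum(sessions) / len(sessions)
print(mean_depth)                # 5.0
print(sessions.count(5))         # 0 -- not a single session scanned exactly 5 results
```

The same caution applies to aggregate heat maps: the blended picture is useful as a starting point, but it isn't a rule of thumb you can apply to any single query.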

After looking at over a thousand sessions in the past two years, I've learned first-hand that there are a lot of variables in scanning patterns and interactions with the search page. An eye tracking study provides clues, but no real answers. You have to take the results and try to extrapolate them beyond the scope of the study. We spent a lot of time doing this when writing up both our reports. You try to find universal behaviors and commonalities, but you have to be very careful not to accept the results at face value. Drawing conclusions such as "snippet lengths should be longer" or "official site tags should become standard" is dangerous, because they're not true for every search. The study actually found that ideal snippet length is highly dependent on the task and intent of the user.

If anything, what eye tracking has shown me is the need for more flexible search results, personalized to me and my intent at the time.

New Microsoft Eye Tracking Study

Microsoft has just released the results of an internal eye tracking study that looked at the impact of snippet length. For more detail, visit Marina Garrison’s blog where she looks at the notable findings.

[Heat map from the Microsoft eye tracking study]

A few quick ones and some comments:

Snippet length doesn’t seem to impact people’s search strategies.

This makes sense to us. We found scanning for word patterns rather than actual reading. In fact, a longer snippet may actually detract from the user experience in certain scenarios, such as navigational search. It makes it more difficult to pick up information scent quickly. Remember, we’re on and off the search page as quickly as possible.

People scan 4 listings regardless

This is definitely aligned with the Rule of 3 (or 4) we found in our eye tracking study. We found, however, that this isn’t a hard and fast rule, but rather a pretty common tendency. It changes depending on whether top sponsored ads appeared, how closely aligned the top result was to intent and other factors. But in general, we would agree that most people tend to scan 3 or 4 listings before clicking on one.

Scenario Success Rates Dropped Dramatically as the “Best” Listing Moved Down the Page

No big surprise here. This was referred to in our first study as the "Google Effect," and it comes from our being trained that the best result should show up on top. I actually co-authored a paper with Dr. Bajaj and Dr. Wood at the University of Tulsa about this very topic. By the way, it was Dr. Bajaj who called it the "Google Effect," not me, so please, Yahoo and Microsoft, don't beat me up on this one.

The report is available for download.

Google Pulls Back the Curtain on Quality Score – a Little

At the last few shows I've attended, an interesting theme emerged. Up to now, reverse engineering an algorithm was exclusively a preoccupation on the organic side. SEOs would try to outwit and outguess Yahoo's and Google's black boxes. But with the introduction of the quality score, that game suddenly moved to the sponsored side of the strategy table. Because the factors that went into the quality score weren't disclosed, particularly by Google, it was a game of test and guess for advertisers. A lot of show attendees were expressing frustration that there wasn't more transparency. Google has apparently heard the call, and yesterday issued a clarification.

Google’s advice?

  • Link to the page on your site that provides the most useful and accurate information about the product or service in your ad.
  • Ensure that your landing page is relevant to your keywords and your ad text.
  • Distinguish sponsored links from the rest of your site content.
  • Try to provide information without requiring users to register. Or, provide a preview of what users will get by registering.
  • In general, build pages that provide substantial and useful information to the end-user. If your ad does link to a page consisting of mostly ads or general search results (such as a directory or catalog page), provide additional information beyond what the user may have seen in your ad or on the page prior to clicking on your ad.
  • You should have unique content (should not be similar or nearly identical in appearance to another site). For more information, see our affiliate guidelines.

While a step forward, there’s still a lot hidden under the hood of this algorithm. Anytime you put algorithms in charge, it opens the door to reverse engineering, and you can bet the SEM community is going to launch a barrage of tests to try to determine the nuances that determine the quality of a landing page in the eyes of the quality score algorithm.

What this does do, however, is increase the complexity of the quality score substantially. There are now three separate components: user click-through, ad quality and landing page quality. Each addition exponentially increases the complexity of the algorithm, making it a lot tougher to game. It harkens back to the original introduction of the Google PageRank algorithm, which went beyond on-the-page factors to introduce the whole concept of authority within the structure of the Web.

How important is the quality score? It’s vital. Moving up the ranks on the sponsored side is at least as important as on the algorithmic side, and if you can make the leap from the right rail to the top sponsored ads, you can expect a 3 to 10X increase in visibility and click throughs.

Our recent eye tracking study showed just how important relevancy is in these top spots. And Google has always been very aware of that importance. They have an obsession with providing relevancy above the fold, especially in the Golden Triangle, that is not matched by any of the other engines. I actually had a chance to chat with Marissa Mayer about this. The interview will be part of the eye tracking study (currently available, by the way; you'll get a free final version with Marissa's interview when it's ready), but I'll be including some tidbits in this blog as well.

What’s Up with Verticals?

First published July 27, 2006 in Mediapost’s Search Insider

You probably haven't given a lot of thought lately to vertical search results, that thin sliver of search real estate sandwiched between the top sponsored ads and the top organic listings, which generally shows a few lines of news results, or local, or products. I have. Don't panic, there's really no reason why you should have. It's really just a sad comment on my day-to-day activities. But I've noticed some things, and I think it's incumbent upon me to share them with you. So let's get vertical for a few moments, shall we?

In a Location Near You

First, this is prime real estate. When vertical results appear on the major engines, they appear smack in the middle of the hottest part of the page. After a number of eye tracking studies, we can say with a degree of certainty that most searchers (upwards of 80 percent) at least look at the top sponsored ads and the top three or so organic listings. That means that vertical results, wedged in between, will at least be grazed over by a lot of eyeballs.

But position is not enough. Working the vertical angle is not just about grabbing some prime real estate. Verticals have to offer information scent. The information, links and visual cues they offer have to align with the user’s intent. In one bizarre example we saw during our latest study, somebody searched on Google for “digital cameras.” For some reason, Google saw fit to return news results for digital cameras. Now, just what percentage of the over two million people who searched for “digital cameras” last month (a quick estimate courtesy of Yahoo) do you guess would be looking for the scoop on how Nikon had to recall 710,000 digital camera batteries? Maybe the ex-product manager from Nikon, in between looking for new jobs on Monster, but that’s about it.

Hopelessly Devoted to OneBox?

While we’re on the subject, what’s the deal with Google and verticals anyway? Search pundit Greg Sterling said in a blog post some time ago that Google had an “almost religious devotion to OneBox,” its vertical label of choice. Could be, but it seems that a few in the temple of Google are questioning their religious affiliations. OneBox results have been a little sketchy of late. The reason this came to light is that I’ve just looked at 100-plus sessions in Google for a recent study, and there were surprisingly few of those sessions with OneBox results showing.

First of all, they hardly ever show for product-based searches. Try it for yourself. I must have tried over a dozen different common product searches before I got one that returned Froogle results via OneBox. Now why would that be? Well, for one thing, OneBox real estate competes with top sponsored ads, and perhaps advertisers are starting to resent the increased competition in their neighborhood for highly commercial searches. If that theory is correct, it flies in the face of Google's goal to provide the most relevant results for each query, no matter what the source of the results. Another reason might be that Froogle has never really gained traction as a shopping engine. Maybe Google quietly dialing down the rate of appearance of Froogle results on the main page is its way of admitting that these results aren't adding value to the user experience.

Doing Vertical Right

If you're looking for a good example of vertical execution, Yahoo currently seems to be leading the pack with its Shortcuts. The display of vertical results is consistent, and Yahoo seems to be one step ahead of the competition in aligning results with user intent.

Here are some examples we saw in a recent study:

One of the tasks given was to research the upcoming purchase of a digital camera. This resulted in a number of related queries being used, ranging from very general ("digital cameras") to very specific ("Canon Powershot A530"). When these queries were thrown at Yahoo, the engine was able to differentiate between them and return appropriate vertical results. Broad generic phrases returned vertical results that compared known brands or allowed browsing by features. More specific queries returned links that led to reviews and best prices for that model alone. It was a great example of results matching intent, and we saw interaction with these results go up dramatically as a result.

One very bright thing that Yahoo does consistently in its vertical listings is provide a 5-star rating scale. It appears for products, some local results (restaurants, hotels) and in various other places. When it comes to attracting our eye, nothing does the trick better than a visual cue that promises ratings. We love lists that sort from most popular to least popular. It’s the paradigm of the consumer researcher, and it’s something that reeks of scent. We saw eyeballs attracted to these icons like search marketers to an open bar (come on, I know many of you are already scoping out the cocktail network for San Jose).

A Vertical Future

I still believe that verticals mark a path into search’s future, but until the engines do better at disambiguating intent, either through personalization, behavioral tracking or just really smart key phrase parsing, they will be relegated to the thin sliver of real estate they currently occupy. Their success in luring users into what Sterling called a “Page 2” vertical experience will lie solely in how well they deliver on intent.

The Rule of Three in Search

First published July 20, 2006 in Mediapost’s Search Insider

Once again, I find myself up to my earlobes in eye-tracking data. I have no one to blame, as I got myself into this mess when I made the well-intentioned but poorly thought out promise to have the first draft of a study done by the time I head out on vacation at the end of the month.

In wading through the sessions (about 420 of them) sometimes new insights rise to the top–and sometimes my eyeballs just roll back in my head as my hands jerk spasmodically on my keyboard and drool runs down my cheek. Luckily, this week it was the former.

In this study, we are looking at interactions with Google, compared to MSN and Yahoo. Recently, one finding in particular seemed to be screaming out to be noticed. Being a compassionate sort of researcher, I listened.

When we looked at interactions with the top sponsored ads, there was a notable difference between MSN, Yahoo and Google. On MSN and Google, the percentage of clicks happening on these top ads seemed to be in line with previous studies done both by us and by others. But the amount of activity on the Yahoo ads seemed to be substantially higher. We started out by looking at first fixations, or the first place people looked on the page, even for a split second. Here, the engines were all in the same ball park, with 83.7 percent of first fixations in top sponsored ads for Yahoo, compared to 86.7 percent for MSN and 80.6 percent for Google.

Then, we looked at where the first activity on a listing happened: where on the page did people start actually scanning listings? Google held a good percentage of eyeballs, losing only 12.4 percent of the users, while MSN had a significant defection issue, losing 36.6 percent of the people who first fixated in the top sponsored ads. But Yahoo lost the fewest, with only 5.5 percent choosing to look elsewhere. And finally, Google had 25.8 percent click-throughs on these ads, and MSN had 16.7 percent (yes, this is low, but MSN was dealing with a number of issues at the time of the study). Yahoo led the pack with a 30.2 percent click-through rate. In fact, for the first time ever in our research, a sponsored link (the number one top sponsored ad) out-pulled the No. 1 organic link, at click-through rates of 25.6 percent vs. 14 percent. This was a complete reversal of the click-through ratios we saw on the other two engines.
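For anyone curious how those three measures fit together, here's a minimal sketch using fabricated session records (the field names and values are illustrative, not our actual data): each record notes where the eye landed first, where active scanning of listings began, and where the click finally went.

```python
# Illustrative only: fabricated eye tracking session records.
# Each tuple: (first_fixation_zone, first_scan_zone, clicked_zone)
sessions = [
    ("top_sponsored", "top_sponsored", "top_sponsored"),
    ("top_sponsored", "organic",       "organic"),
    ("top_sponsored", "top_sponsored", "organic"),
    ("organic",       "organic",       "organic"),
    ("top_sponsored", "top_sponsored", "top_sponsored"),
]

n = len(sessions)
fixated_top = [s for s in sessions if s[0] == "top_sponsored"]

# Share of sessions whose very first fixation landed in the top sponsored block.
first_fixation_share = len(fixated_top) / n

# "Defection": first fixation in top sponsored, but active scanning began elsewhere.
defection_rate = sum(1 for s in fixated_top if s[1] != "top_sponsored") / len(fixated_top)

# Click-through rate on the top sponsored block across all sessions.
top_sponsored_ctr = sum(1 for s in sessions if s[2] == "top_sponsored") / n

print(first_fixation_share, defection_rate, top_sponsored_ctr)
```

Because fixation share, defection rate and click-through rate are computed over different slices of the same sessions, an engine can win on one measure and lose on another, which is exactly what we saw here.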

For whatever reason, Yahoo’s top sponsored ads seemed to be locking searchers into their part of the results page to a much greater extent than Google and MSN.

Why? What the heck was going on? Better ads? Not really. If anything, Google’s ads seemed a touch more relevant.

Location, Location, Location

Part of it was real estate. Another interesting comparison we did was to look at the percentages of screen real estate devoted to various sections of the page. Yahoo has gone out of its way to make the top sponsored ads the dominant feature on a results page at 1024 by 768 screen resolution. At this size, the ads take up 23 percent of the real estate, compared to approximately 16 percent for Google and MSN. This pushes the organic listings on Yahoo perilously close to the fold.
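The arithmetic behind that 23 percent figure is simply the area the ad block occupies divided by the area of the visible viewport. The pixel dimensions in this quick sketch are hypothetical stand-ins, not measurements of Yahoo’s actual layout.

```python
# Back-of-the-envelope real estate calculation. The ad-block height is a
# hypothetical placeholder, not a measurement of Yahoo's actual page.
viewport_w, viewport_h = 1024, 768   # standard screen resolution used in the study
ad_block_w, ad_block_h = 1024, 177   # assume a full-width top sponsored block ~177 px tall

share = (ad_block_w * ad_block_h) / (viewport_w * viewport_h)
print(f"Top sponsored ads occupy {share:.0%} of the visible page")  # roughly 23%
```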

And there, as I stared at the screen shots of fully loaded (maximum ads and vertical results showing) Google, MSN and Yahoo results at standard resolution, a possible answer revealed itself. On Google: three top sponsored ads, three OneBox results, and three visible organic listings. On MSN, the same three:three:three presentation. But on Yahoo, there were four top sponsored ads, three vertical results, and just one and a half visible organic listings.

The Rule of Three

Hmmm, three, three and three. There was something there, niggling in the back of my mind. Quickly, I did a search for the “Rule of Three” and sure enough, there it was. We humans tend to think in triplets. Three is a good number to wrap our mind around, and we see it in all kinds of instances. We tend to remember points best when given in groups of three, we scan visual elements best when they come in threes, and we like to have three options to consider. Think how often three comes up in our society: three little pigs, three strikes, three doors on “Let’s Make a Deal,” three competitive quotes. It’s a triordered world out there.

So is it coincidence that search results tend to be presented to us, neatly ordered in groups of three? I think not. It strikes me that this engrained human behavior would probably translate to the search engine results page as well.

The Rule-breaker

MSN and Google tend to adhere to the rule of three in their layouts (depending on whether or not Google serves three top sponsored ads). Our choices are conveniently presented in neat trios, with logical divides between each.

Yahoo breaks the rule by tipping the balance in favor of the top sponsored ads. First, it provides four results, not three. Does this mean we need to spend a little more time up in these results, trying to fit the extra one into our limited memory slots? That appears to be the case, with people spending an average of 4.6 seconds in the Yahoo top sponsored results in our study, compared to 2.4 seconds for Google and 1.73 seconds for MSN.

Second, it only gives us one visible organic listing to consider. It breaks our natural desire to have three alternatives, thereby reducing the Promise of Interest for the organic listings. In effect, on the screen of results most people would see on Yahoo, there is really only one set of alternatives to consider: the top sponsored ads.

An earth-shaking discovery? Perhaps not. But cut me some slack. I’ve been looking at eye-tracking data daily for three months now, spending about three hours each day looking at interactions with the three engines. I think it’s time I took the three other members of my family on a three-week vacation, during which we’ll be visiting three countries. Wait a minute! Do I sense a pattern developing?

Branded Terms in Search Results: Pre-Mapping in Action

First published July 6, 2006 in Mediapost’s Search Insider

Two separate incidents in the last little while have lent credence to a behavioral pattern we’ve seen in many of our studies.

First, I was sitting in on a meeting where an agency (not ours) was reporting on the performance of its sponsored search campaigns and was ecstatic with the performance of its branded term phrases, which were outperforming every other keyword bucket in terms of both click-throughs and conversions. While giddy with delight, company executives were at a bit of a loss to explain why.

On a similar track, a search marketing firm recently released some results looking at cannibalization of search campaigns when you are buying terms for which you also hold the top organic position. Again, they found this is most likely to happen when you’re buying branded terms.

While neither of these examples should be surprising to a seasoned search marketer, we’re all interested to know the reasons behind this interplay between organic and sponsored, particularly on branded terms. The answer, as it so often does, lies in looking more closely at what the search user is doing.

Pre-Mapping: A Theory

After looking at thousands of search sessions in detail, one thing is becoming clear. Searchers are incredibly adept at focusing in on just the portion of the results page that interests them. The time required to relocate to the prime real estate is literally a fraction of a second. Yet that real estate isn’t always the same spot. It varies depending on query and intent. It also varies by user, but even the same users will navigate the real estate of the listings in very different ways, depending on what they’re looking for.

Pre-Mapping supposes that we’ve interacted with search results pages enough to know the sections of real estate we typically deal with. We know where the top sponsored ads are and what they are. We know about where the top organic listings start. And in our minds, we already have a good idea of the type of site we’re looking for and approximately where we expect it to appear. Before the page ever loads, we’ve already mapped out the sections that would appear to hold the greatest promise to deliver on our intent. As the page loads, we do a split-second scan to get our bearings (orient in the top left corner, see how many top ads there are, see where organic starts) and then we go to the part of the map we’ve predetermined to be our best starting point.

Theory in Practice

Let’s run through a few examples. Imagine you’re looking for the possible side effects of a medication. The types of sites you would be looking for would be authoritative information sites: either the official site for the medication, a recognized health portal or possibly a government information site. In this case, you may be leaning more towards objective sites, rather than the pharmaceutical company’s own site. After launching the search (the name of the drug), you’ll quickly filter out, or thin-slice, any commercially oriented sites. In this type of interaction, you’ve determined through pre-mapping that your area of greatest promise is not likely to be in the sponsored ads. You also expect the official site to rank No. 1 organically, so your area of greatest promise is probably in the No. 2 to 5 organic rankings, where you expect the types of sites you’re looking for to sit. In a split second, you’ve narrowed the area where you’ll start your active scanning to about 10 percent of the total real estate.

Now, let’s say you’re looking to renew your auto insurance. You’ve already checked out a few quotes online, but before you commit to any, you want to see how your current carrier compares. You’ve also pre-mapped the page in this case. Here, you expect your company to be bidding on the term (“Brand Name auto insurance”), and because it’s a commercially oriented query, you assume the sponsored listing will take you to a page where you can get a quote. Your area of greatest promise is the top sponsored ads. Again, you do your orientation scan to find your bearings in the upper left, but in this case, you would start right at the top sponsored link and work your way down the page until you find a link to the carrier in question that offers the promise of giving you a quote.

Theory Applied

Considering these two examples of user behavior, you can easily see what was happening in the two anecdotes I cited at the beginning of this piece. Brand terms will convert like gangbusters in the top sponsored location, because when a brand term is used, it’s very likely that the user has pre-mapped and is expecting to find that site in those top sponsored spots.

Similarly, you will find significant cannibalization because when users have pre-mapped, they start at the top and work down. They’ll hit the sponsored result before they hit any organic result that might appear. They’re looking for the quickest route, and in this case, the sponsored listing is giving it to them.

The likelihood of pre-mapping, and what this means for interaction with the page, lies in that deep, dark place where all the answers to search engine success lie: the mind of your target prospect. Spend some time exploring it.