In Wednesday’s SearchInsider, Aaron Goldman looked at video search and what’s going to be required for it to truly become an interesting advertising vehicle. Some of the speculation comes from Aaron’s musing about what might happen if Google purchased Blinkx.
To me, video search is one of the more interesting growth areas for search. Right now, video search is restricted by the state of the technology: our ability to index video is limited to the metadata attached to it. For each video clip, someone must take the time to add tags indicating what the video is about. As long as video search relies on this, the opportunities for advancement are extremely limited. But we’re advancing on several technical fronts toward indexing the content itself rather than relying on metadata. Several organizations, including Microsoft, are working on visual recognition algorithms that allow for true indexing of video content. Advancements in computing horsepower will soon give us the sheer muscle required for the gargantuan indexing task. Once we remove humans from the equation, allowing for automated indexing of video content, the world of video search suddenly becomes much more promising.
When this happens, accessing information in a video moves from being a linear experience to being a nonlinear one. Suddenly we have random access to information embedded within the video. As mentioned, the technology is being developed to enable this, but the question is, will we as viewers be able to adapt to this paradigm shift? Video has evolved as a linear, storytelling experience. Every video is generally a self-contained story with a distinct beginning, middle and end. This is how we’re used to looking at video.
But when video search makes it possible to access information at any point in the video, how will that impact our engagement with that video? In the last 10 years, we’ve seen some fairly dramatic shifts in how we assimilate written information. We have moved from our past experience, where information was presented in a very linear fashion in novels or books, to the way we currently assimilate information on websites. When we interact with websites, we “berry pick”, hunting in various places on the page for information cues that seem to offer what we are looking for. Assimilation of the written word is a much more erratic experience right now. We move in a nonlinear fashion through websites, picking up information and navigating based solely on our intent and the paths we choose for ourselves. One of the greatest revelations in website design was that we cannot restrict users to a linear progression through our site, much as we might want to control their experience.
This adaptation has happened fairly quickly on websites, but will it happen as quickly with video? When we can search for and access information anywhere in the video, what does that do to the nature of our engagement with that video? Certainly it opens the door to some very interesting marketing opportunities, with what I’ve previously described as “product placement on steroids”. The ability to click on any item in a video and instantly be connected to more information about that item creates a tremendous opportunity for advertisers. But it also opens the potential for multiple paths through a video. Does watching a video become more like playing a video game, where we can pursue different paths and have different experiences depending on the path we choose? Does a travel video on Prague become an interactive virtual tour, where we choose our own path through Prague? And is that interactive virtual tour assembled on the fly from dozens of different video clips? Do we assemble content based on our intent with the help of our video search tool? Do video producers take a dramatically more granular approach to producing content, leaving you to assemble the storyline from these individual bits of content, based on what you want to see?
This promises an extraordinarily rich user experience. Consider how this might play out for an individual user. We go to Google’s video search tool and search for the Loreta, one of the top tourist attractions in Prague. We find a clip that takes us on a quick virtual tour, and within the clip we can click on other things of interest. For instance, we could climb to the top of the bell tower and take a look over Prague. We could click on any building and, if a video were available, be instantly transported to that building. Or, if we choose, we could search for the nearest hotel and find the corresponding video clip. The entire video has been indexed, so no matter what we click on, our video search engine can use that to initiate a query and bring us back the resulting clips. The clips are assembled into a virtual montage that we can navigate through depending on our interest areas. We create a virtual version of Prague, assembled from all the video content that’s available, and we can access just what we’re interested in and search for any content that might be embedded in any of those individual video files. Underneath this layer of video content there could be additional layers of functionality. For instance, you could tie it in with mapping functionality, à la Google Earth. You could tie in Web search functionality so that you could easily click through to the relevant websites. This could also provide access to booking engines and a number of other potential actions that we could take.
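To make the click-to-query idea a little more concrete, here is a minimal sketch in Python of what "the entire video has been indexed" might mean in practice. Everything in it is an assumption for illustration: the clip names, the timestamps and the labels are made up, and the labels simply stand in for whatever a visual recognition system would produce. It is not how Google, Microsoft or anyone else actually does it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    clip_id: str        # hypothetical clip name
    start: float        # seconds into the clip
    end: float
    labels: frozenset   # what a recognizer (assumed, not shown) saw in this span


def build_index(segments):
    """Map each label to every segment in which it appears."""
    index = defaultdict(list)
    for seg in segments:
        for label in seg.labels:
            index[label].append(seg)
    return index


def search(index, label):
    """Return the segments showing `label`, ordered by clip and start time."""
    return sorted(index.get(label, []), key=lambda s: (s.clip_id, s.start))


def labels_at(segments, clip_id, t):
    """Labels visible at time `t` in a clip: what a click could turn into a query."""
    found = set()
    for seg in segments:
        if seg.clip_id == clip_id and seg.start <= t < seg.end:
            found |= seg.labels
    return found


# Made-up sample data standing in for an automatically indexed collection.
SEGMENTS = [
    Segment("loreta_tour", 0.0, 40.0, frozenset({"loreta", "bell tower"})),
    Segment("loreta_tour", 40.0, 75.0, frozenset({"bell tower", "prague castle"})),
    Segment("castle_walk", 0.0, 120.0, frozenset({"prague castle"})),
    Segment("hotel_promo", 10.0, 35.0, frozenset({"hotel", "loreta"})),
]

index = build_index(SEGMENTS)

# A text search jumps straight to the moments that show the Loreta.
for seg in search(index, "loreta"):
    print(seg.clip_id, seg.start, seg.end)

# A click 50 seconds into the tour becomes a set of follow-up queries.
for label in sorted(labels_at(SEGMENTS, "loreta_tour", 50.0)):
    print(label, [s.clip_id for s in search(index, label)])
```

The point of the sketch is that the index is keyed by what appears in the video and when, not by the clip as a whole. That is what turns a linear clip into something we can enter at any point and leave in any direction.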
Such an experience is not that great a stretch from where we are now. To see how it might play out, take a look at Microsoft’s PhotoSynth.
PhotoSynth View of Piazza San Marco in Venice
It does just what I’m describing with video, only with pictures. It creates a 3-D world from the thousands of pictures that have been publicly shared. I highly recommend taking it for a spin, as it provides a fascinating look at what human-computer interfaces can be.
As we start considering the possibilities for video, the problem is that we’re still stuck in our current paradigm of how we interact with video. My feeling is that once indexing technology allows us to truly index the content of the video, the nature of our interaction with video will completely change. We’ll take the sensory input we expect from video and extend that into our typical user experience with more types of content. Our interfaces will be more satisfying because they will become more like real life. They will engage more of our senses and put us into a deeper and richer virtual world. As technology progresses, our interface with it will increasingly resemble our experience with the physical world. As this happens, we will have the ability to step from an interface that engages our senses of sight and sound into a more abstract world where we interact with written text. The transition between these two interfaces will be seamless, and we can step back and forth as we wish.
The promise of video lies not so much in taking video as we know it and bringing it online. The promise of video is that it provides a distinctly different user experience, one that could prove to be the new interface to technology. But to make this happen we have to be able to index and search for the content that lies embedded within video. We have to be able to take that video content and manipulate and mold it into a virtual world that we can interact with. And that is the promise that lies within next-generation video search.