Siri and Alexa reduce how much time we spend looking at things online, but the visual internet will come back with a vengeance and in a new shape.

In Brief: With the rise of voice-activated digital assistants like Siri and Alexa, we predict a short-term loss in the visual attention that digital media companies can sell to advertisers. Later, with the rise of smart glasses (a.k.a. Heads-Up Displays or HUDs) we’ll see an explosion in advertising-based visual inventory — but with a twist.

By Brad Berens

Bennett and I have been friends since we were eight. Over a recent late-night dessert we compared notes about how thinly spread we each felt across work, family and life. Bennett then shared an insight from a counselor he sees: “Y’know how in Kung-Fu movies the hero stands in the center and all the villains gather into a circle around him and take turns attacking him one by one? Life isn’t like that.”

Neither is technology.

Technologies don’t take turns arriving in our lives. Instead, they’re locked in a Darwinian struggle to seize and hold a niche. Sometimes it’s a head-to-head struggle, like VHS versus Betamax, where the differences are slight and one technology wins because of marketing and luck. Sometimes different trends slam into each other and that collision creates a new thing, like the way that mobile phones ate digital cameras, email, notebooks, calendars, music collections, powerful microprocessors, decent battery life and the web to become smart phones.

A new collision is gaining velocity with the emergence of digital assistants and heads-up displays. Both new technologies are changing how users interact with information, particularly visual information. As these technologies give users new ways to behave, those behavior changes will put pressure on the business models and financial health of digital media companies, particularly ad-supported companies.

Voice Interfaces Reduce Visual Interaction

Even though newer Echo devices have screens and touch interfaces, the most compelling use case for Amazon’s Alexa, Apple’s Siri in the HomePod, and the Google Assistant in Google Home is eyes-free and hands-free.

For example, I often use my Echo device when I’m doing the dishes to catch up on the day’s events by asking, “Alexa, what’s in the news?” Or, if I’m about to wade deep into thought at my desk and don’t want to miss a conference call starting an hour later, I’ll ask Alexa to “set a timer for 55 minutes.”

I’m a failure at voice-driven commerce because I have yet to ask Alexa to buy anything from Amazon, but I have used IFTTT (the “If This, Then That” service that connects different devices and services) to connect Alexa to my to-do list so that I can add something just by speaking, which spares me from dropping everything to grab my phone or (gasp!) a pen and paper.

Alexa’s answers are pleasantly clutter-free. If I use my desktop computer to search Amazon for the latest John Grisham novel, then along with a prominent link to Camino Island, Amazon serves up a results page with 24 other distracting things that I can buy, as well as hundreds of other links. With Alexa, I just get Camino Island. (With commodity products, unless you specify a brand, Amazon will send you its generic house brand: CPG advertisers beware!)

Right now, most queries to smartphone-based digital assistants return a list of results that I have to look at, switching my attention from ears to eyes, but as these rudimentary artificial intelligences get better, my need to look at a screen will decline. Today, if I say, “Hey Siri, where’s a Peet’s coffee near me?” the AI will tell me the address and ask me if I want to call or get directions. If I choose “directions,” then I have to look at my phone. Before long, Siri will seamlessly transition to Apple Maps and speak turn-by-turn directions, so I won’t have to look away from the road.

The challenge that the rise of voice interfaces poses for ad-supported digital companies is that those companies make their money from propinquity: from the background clutter that sits near the thing I’m looking at or searching for but isn’t that thing.

Google, Facebook, the New York Times, AOL (excuse me, “Oath”), Reddit, Tumblr, Bing, LinkedIn, and others make much of their money from banners, pop-up ads, search results and other things we see but often don’t consciously notice: that is, online display advertising.

Amazon’s Alexa can already read news stories aloud in a smooth, easy-to-follow voice. It won’t be long until all the digital assistants can do so, and can navigate from article to article, site to site without users having to look at anything.

We can listen to only one thing at a time, so there aren’t background ads for Siri, Alexa and their ilk. Moreover, despite decades of conditioning to accept interruptive ads in radio, it’ll be game over the moment Alexa or Siri or Google Assistant says, “I’ll answer your question, but first please listen to this message from our friends at GlaxoSmithKline.”

The most powerful ad blocker turns out to be a switch from eyes to ears as the primary sense for media interaction. As voice-interface digital assistants grow in popularity and capability, the volume of visual inventory for these businesses will erode.

This erosion follows the decline in visual inventory that already happened as users moved most of their computing time to the smaller screens of mobile devices with less visual geography and therefore less room for ads.

In a recent Recode Decode interview, marketing professor and L2 founder Scott Galloway observed, “advertising has become a tax that the poor and the technologically illiterate pay.”

Since wealthier people will have voice-activated digital assistants first, that means that the people more exposed to visual advertising will have less disposable income and will therefore be less desirable targets for many advertisers. This creates more pressure on the display-ad-based media economy.

On the other hand, remember the Kung Fu movie quip? There’s another technology making changes in the visual internet at the same time.

Smart Glasses Increase Visual Interaction

Smart glasses are, simply, computer screens that you wear over your eyes. In contrast with voice interfaces, which are already popular in phones and speakers, smart glasses haven’t become a big hit because they’re expensive, battery life is limited, and many people get nervous around other people wearing cameras on their faces all the time. (Early Google Glass enthusiasts were sometimes dubbed “glassholes.”)

Some pundits think that just because Google Glass didn’t sweep the nation it means that all smart glasses are doomed to failure. But just as Apple’s failed Newton (1993) presaged the iPhone 14 years later (2007), Google Glass is merely an early prototype for a future technology hit.

Smart glasses fall along a spectrum of increasing immersion: augmented reality puts relevant information in your peripheral vision (Google Glass), mixed reality overlays information onto your location that you can manipulate (Microsoft’s HoloLens, with Pokémon Go as a phone-based version), and virtual reality absorbs you into a 360-degree environment that has little relationship to wherever your body happens to be (Facebook’s Oculus Rift, HTC Vive). The overarching category is “Heads-Up Display” or HUD.

What’s important about HUDs is that they increase the amount of digital information in the user’s visual field: not just the visual inventory for ads (like in this clip from the film “Minority Report”), but for everything.

Wherever you’re reading this column — on a computer, tablet, phone or paper printout — please stop for a moment and pay attention to your peripheral vision. I’m sitting at my desk as I write this. To my left is a window leading to the sunny outdoors. On my desk to the right are a scanner and a coffee cup. Papers lie all over the desk below the monitor, and there are post-it reminders and pictures on the wall behind the monitor. It’s a typical work environment.

If I were wearing a HUD, then all of that peripheral territory would be fair game for digital information pasted over the real world. That might be a good thing: I could have a “focus” setting on my HUD that grays out everything in my visual field that isn’t part of the window where I’m typing or the scattered paper notes about what I’m writing. If I needed to search for a piece of information on Google I might call a virtual monitor into existence next to my actual monitor and run the search without having to hide the text I’m writing. This is the good news version.

In the bad news version, ads, helpful suggestions, notifications, reminders and much more colonize the majority of my visual field: I think about those moments when my smart phone seems to explode with notifications, and then I imagine expanding that chaos to everything I can see. In some instances this might be a maddening cacophony, but others might be more subtle, exposing me to messages in the background at a high but non-irritating frequency in order to make the product more salient. (“I’m thirsty: I’ll have a Coke. Wait, I don’t drink soft drinks… how’d that happen?”) This may sound like the old Vance Packard “subliminal advertising” bugaboo, but it isn’t that creepy: it’s just advertising. Salience results from repetition.

Regardless of what fills the digital visual field, an explosion of visual inventory will be a smorgasbord of yummies for ad-supported media companies.

But there’s a twist.

Filters and the Decline of Shared Reality

Just sitting at my desk as I work is an overly simplistic use case for wearing a HUD: the real differences in all their complexity come into focus once I leave my office to wander the world.

With Heads-Up Displays, every surface becomes a possible screen for interactive information. That’s the output. Since the primary input channel will still be my voice, there’s a disparity between the thin amount of input I give and the explosion of output I receive. This is the digital assistant and HUD collision I mentioned earlier.

As I walk through a supermarket, the labels on products might look different to me than they do to the person pushing his cart down the aisle a few yards away. The supermarket might generate individualized coupons in real time that would float over the products in question and beckon. If my HUD integrated with my digital assistant, then I might be able to say, “Hey Siri, what can I make for dinner?” and have Siri show me what’s in the fridge and the pantry so that I can buy whatever else I need.

Smart glasses won’t just stick information on top of the reality on the other side of the lenses; they will also filter that reality in different ways.

We can see how this will work by looking at the technologies we already use. For example, businesses will compete to put hyper-customized articles, videos, and ads in front of you, similar to how ads pop up on your Facebook page today. But these articles and ads will be everywhere you look, rather than contained on your laptop or phone. This is algorithmic filtering based on your past behavior.

Likewise, your digital assistant will insert helpful information into your visual field (such as the name of the person you’re talking with that you can’t remember) that you either ask for or that it anticipates you might find useful. The Google app on many smart phones already does versions of this, like reminding you to leave for the airport so that you aren’t late for your flight.

Finally, you’ll be able to add your own filters by hand, changing people’s appearances or names in real time. If you’ve given one of your smart phone callers an individual ring tone, changed the name of a contact to something else (“What a Babe” or “Don’t Answer Him”), or watched a teenager put a dog nose or kitty ears on top of a photo in Snapchat, then you’ve already seen primitive versions of this in action.

An unintended consequence of this visual explosion is the decline of shared reality. We already spend much of our time avoiding the world around us in favor of the tastier, easier world inside our smart phones. But even if the latest meme coming out of Instagram is the funniest thing we’ve ever seen, the majority of what surrounds us is still analog, still the flesh and blood world untouched by digital information.

That changes with HUDs.

In the near future where HUDs are common, you and I might stand side by side on the same street corner looking at the same hodgepodge of people, cars, buildings and signs — but seeing different things because we have idiosyncratic, real-time filters. Each of us will be standing on the same corner but living inside what Eli Pariser calls “filter bubbles” that have ballooned out to surround our entire worlds.

Common knowledge at this point becomes rare because a big part of common knowledge is its social component. In the words of Michael Suk-Young Chwe from his book Rational Ritual, a society’s integration is the result of coordinated activities built on a set of shared information and messages.

For a society to function, Chwe writes, “Knowledge of the message is not enough; what is also required is knowledge of others’ knowledge, knowledge of others’ knowledge of others’ knowledge, and so on — that is, ‘common knowledge.’”

It has been challenging enough in our shared analog reality to achieve things like consensus in politics or word-of-mouth awareness in business. As we each move into new, idiosyncratically personalized environments where we don’t know what other people know, we’ll need to work harder to hear voices other than our own, to connect with each other as friends, family members, customers and citizens.

That may be a tall order.
__________

Brad Berens is the Center’s Chief Strategy Officer.

July 20, 2017