Steve Goldstein: 7 Ways Voice is Rapidly Evolving

Steve Goldstein’s Amplifi Media works with media companies and podcasters in developing audio content strategies. Goldstein writes frequently at Blogstein, the Amplifi blog.


The first weeks of this year brought two major gatherings on voice. The first played out against the glitz of Vegas at CES. The second took place far from the lights, across the country in Chattanooga. At both Voice Summit and Project Voice, Google, Amazon, Samsung and others shed light on where voice is today and where it is likely to go. This was my second time at the Chattanooga conference, and the tone this year was decidedly less a dreamy “what if” and more a realistic “how come.” The lessons about users, logic, simplicity and devices resonated through the halls.

Here are 7 top-line takeaways:

1. The pace of skill development has slowed – There are more than 100,000 skills on the Alexa platform worldwide, but according to Voicebot.ai only 14,000 were added in 2019. This suggests that momentum is slowing. But why? Have major companies all built their successful skills? Are they now spending time iterating? Have they failed, packed up their science project bags and moved on to the next shiny thing? All of the above. It is clearly a business that is growing up fast and moving beyond the era of the hobbyist. Along with growth comes frustration for many and success for some. Every new platform goes through a steep learning curve, and voice is no different.

2. Build things that solve problems – Most skills are intrinsically bad, and thus skill retention is pretty terrible. People ask for something and it doesn’t work, or they ask and it’s not worth returning to, or they simply forget the skill exists. The holy grails are useful content and simplicity. Will Hall of the RAIN Agency said, “The ‘F’ word is friction.” He’s right. There is plenty of friction in finding and accessing truly useful content. Tom Hewitson, who runs London-based skill builder Labworks.io, emphasized the importance of iteration, lamenting that most voice experiences are bad.

Developer Nick Schwab, who owns a Tesla funded by his own skill development, offered tips and a monetization pathway that has eluded most developers. Nick is clearly one of the minority of players who have cracked the code of building things people care about, use repeatedly and pay to upgrade. He built “sleep sounds” to drown out his neighbors. Focus on the user.

3. Discoverability is not a solved problem – At both conferences, we repeatedly heard that the toughest parts of voice development are the dynamic duo of awareness and discoverability. The first is largely a marketing issue and closely aligns with my company’s experience working with podcasts and our earlier work developing brands for commercial radio. Voice without a screen is hard. A Google executive said, “Discoverability is not a solved problem.” Agreed.

4. Beyond the cylinder and dot – We tend to think of voice as residing in smart speakers, but after CES and Chattanooga, it is abundantly clear that voice is in everything from microwaves and toothbrushes to shower heads and cars. You name it, it will talk to you. No one really knows how all of this will work, but after five years of voice devices on the market, the most popular interactions have stayed pretty much the same. According to Google, the three most common words used in voice search are “how,” “what” and “best.” Mostly, people are still looking for basic transactional content, such as answers to questions, followed by ambient solutions such as streaming services.

5. Big league thinking – We saw fantastic experimentation from organizations such as the NFL, which is thoughtfully looking for ways to get closer to consumers by learning which teams and players people are interested in and offering customized content. Ever wonder about a specific NFL rule mid-game? Ask the NFL.

6. There are 3,000 ways to set an alarm clock – Google’s Scott Huffman spoke at CES about the need to get past the rigidity of invocations and move to conversational voice design, which takes into account how real people speak. It is remarkable how much progress has been made in such a short period of time.

7. Baby you can drive my car – While autonomous vehicles generate chatter, they are clearly not happening yet for the mass market. In the next three years, however, voice in the car is likely to be a big deal. As we wrote in our last post, Amazon and Google are seizing on the failure of car companies to create good voice user experiences and are launching rich voice integration in cars from GM, Toyota, Audi, Ford and others. The thing to watch for here is an ecosystem in which you can trigger home devices or pull up your calendar directly from the car.

On a panel focused on podcasting and smart speakers, I shared the stage with Julie Daniel Davis, who is pioneering educational flash briefings for schools (which feels like a natural fit), and NPR’s Joel Sucherman. He and his team are setting the pace for voice integration and “pushing the envelope” with various rich initiatives, including a smart speaker version of the quiz show “Wait Wait... Don’t Tell Me!”

Thanks to Pete Erickson and Bradley Metrock for leading the way with two great conferences and plenty of material for note-taking.

Steve Goldstein