
It’s been a couple of months, and I still can’t talk to Alexa. My Amazon Echo Dot, the hockey puck-shaped smart speaker that emits a pale blue glow when asked a question or given an order, sits dormant in my office. It’s not because I dislike talking to machines. I’m a dictation expert on my phone—incorporating the words “period” and “comma” into my sentences, and saying “ha ha” in a staccato way that is the exact opposite of laughing—the product of years of idea-saving and communicating while walking or driving. But when I opened my Echo earlier this summer, I found a small list of sample questions, like a tourist’s starter guide to navigating a foreign country. While I understand that part of using a smart speaker is the simple novelty of giving orders to a sentient piece of plastic, I don’t know if I’m ready to use voice commands for their own sake; Alexa is programmed to answer questions I don’t know that I need to ask.
For instance, as someone who actually enjoys sorting streaming tracks into gargantuan playlists and maintaining a physical record collection, I don’t need assistance listening to music, which is pitched as one of the Echo’s primary tasks. At the same time, a lot of my music-loving friends have fallen for the Echo. One of them who grew up listening to AM signals through a transistor radio uses one of his Echoes (he has four) for background radio listening in the kitchen, as his wife listens to NPR through another one. Another friend taught her Echo to play ambient music while she sleeps. The people who use the Echo tend to really like the Echo. I can’t help but wonder if, eventually, I’ll be an Echo person too.
Almost four years after the Echo and Alexa’s rather inauspicious debut—“The whole thing is a tad baffling, but also intriguing,” wrote TechCrunch at the time—smart speakers are now teetering on omnipresence. Sales tripled between 2016 and 2017, and analysts expect nearly 60 million units will be bought globally this year. According to a study by NPR and Edison Research, 39 million Americans—16 percent of the country—owned a smart speaker in January 2018. Though Amazon still has the market cornered, Google launched its Home speaker in November 2016, and Apple’s Siri-run HomePod was released this February with a higher price and the promise of superior audio quality. The smart speaker marketplace grows more crowded by the day: Microsoft’s virtual assistant Cortana has found a home in a Harman Kardon speaker, while Samsung’s Bixby will reportedly debut in speaker form later this year. Sonos, Panasonic, and Sony are joining the fray as well. Would you buy a will.i.am-branded smart speaker? He hopes so.
Unsurprisingly, streaming music is proving to be these appliances’ killer app. NPR and Edison report that 60 percent of users surveyed asked their smart speakers to “play music” while spending time with others, making it easily the most popular function, ahead of answering general questions (mentioned by 30 percent of respondents) and getting the weather (28 percent). The listening isn’t purely random, either: A recent report notes that nearly half of smart speaker owners pay for a monthly streaming subscription, a number that is predicted to rise. At a recent British music industry meeting, smart speakers were compared to Apple’s iPod and App Store launches in terms of their possible effects on multiple industries. Three of the most valuable technology companies in the world are deploying interactive speakers to draw listeners to their branded music platforms: The next battle in the corporate streaming music war will be fought with voice.
The smart speaker is the product of decades of experimentation with voice recognition and domestic networking that has been made possible, as have so many recent innovations, by massive companies wielding incredible amounts of computing power. Alexa, Siri, and the other artificially intelligent, voice-recognizing (and always female) domestic robo-agents have roots in Bell Labs’ fledgling 1950s experiments with “Audrey,” but their capacity to recognize conversational speech patterns and interact with their owners in a naturalistic way situates them within the ongoing evolution of interactive AI, which once terrified us but now turns us on. These devices’ role in organizing the mundane duties of domestic life is part of a much broader campaign to network the entire home into a smoothly operating, data-rich whole: Echo can adjust your home’s thermostat and lock your doors, just like Google Home fits into its Nest system, and Apple’s HomePod dialogues with its HomeKit. Freely accessible digital music has been compared to a household utility—like water out of the tap, always available—for years, and with smart speakers, it’s now controllable by the same device that dims your lights.
Digital music files themselves have been remade as “smart” objects over the past several years—“smart” being the latest unavoidable tech buzzword describing technologies that promise to improve experience through mild surveillance. By corralling files into platforms, Spotify, Apple Music, Tidal, and their ilk have transformed the simple act of clicking play into a value-generating activity. Streaming songs aren’t exchangeable commodities like they are on CD, vinyl, or even MP3; instead, they’re pleasurable spyware, reporting back copious amounts of proprietary data on listeners (which, the companies promise, is then routed back into an ever-more-personalized and enjoyable user experience). When Spotify CEO Daniel Ek told The New Yorker that his company isn’t in the music space, but the moment space, he was implying that the experience is the commodity—not music, but everyday activities tuned to Spotify’s algorithms and curated playlists. Smart speakers nestle perfectly into a digital music landscape colonized by streaming platforms, the better to curate each activity as a meaningfully soundtracked moment.
Tech designers and engineers look at the world as a set of problems to efficiently, if not artfully, solve. Within certain corners of the digital music space, those problems manifest as barriers to a seamless listening experience—to experiencing streaming music as an atmospheric hum capable of instantaneously accommodating any mood, activity, or nostalgic pang. This is what Amazon Music director Ryan Redington is getting at when he tells me that “voice almost completely removes friction for getting the music quickly.” As an example, Redington describes how he uses music to shift into domestic mode after work. “I used to get home, take out my phone, unlock it, find Amazon Music, find a playlist that I want to listen to, connect to Bluetooth or a receiver in my house, then start playing music,” he explains. With a smart speaker, he claims, all that technological friction disappears. “Now I can just walk in my house, say, ‘Alexa, play’ whatever I want to listen to, and it just works.”
The Echo was not designed explicitly for music, but it was no coincidence that Amazon launched Prime Music, its free service for Amazon Prime members, a few months before the Echo was introduced to the world. (Amazon Music Unlimited, which features millions more tracks and was launched as a direct competitor to Spotify and Apple Music, debuted in 2016.) “I wouldn’t go as far as to say that [the Echo and Amazon Prime Music] were developed together,” Redington tells me, “but certainly, we knew that this device was being worked on, [and built] our music service to make sure it was very voice-forward.” While Spotify distinguishes itself with personally curated playlists, and Tidal and Apple Music offer artist exclusives on their platforms, Amazon Music hopes to separate itself with voice.
Though its competitors will no doubt catch up quickly, to date Amazon has done far more to integrate streaming music with voice commands. This is a realm that, to put it lightly, can differ starkly from the more familiar process of typing a question into a visual interface. “We are very much down in the weeds on understanding exactly what words customers are using when they ask for something,” explains Alex Luke, Amazon’s global head of programming and content strategy. “What does Alexa say back in response to that utterance, and then what music do we deliver after Alexa says her response?”
Indeed, one of the most significant issues for smart speaker engineers to address is what might be called the single-response problem. “In voice,” Redington explains, “you don’t have the luxury to give customers a lot of results—you have to start playing something.” Unlike a visual interface that can provide a screen full of sorted responses to a question for the user to select from, Alexa can only provide one answer at a time—otherwise there’s friction. In the smart speaker world, getting the right answer first is key. As Redington puts it, “When you ask for something and it works, that’s truly where the magic happens.”
As with all streaming music, the “magic” emerges from the metadata. In a platformed music environment, each individual track is appended with copious digital information that determines where and how it should circulate, from codes that track sales and streams to musical and activity information. Though any streaming platform user is deeply familiar with mood- and activity-geared playlists, the frictionless domestic landscape of voice-commanded speakers has led to a surge in such requests. “When people say, ‘Alexa, play me happy music,’ that’s something we never saw typed into our app, but we start to see happening a lot through the voice environment,” Redington explains.
While all platforms have teams creating reams of metadata through machine learning techniques and human curation that can determine if a song is “happy,” record labels understandably want to have a say as well. Will Slattery is the global digital sales manager for Ninja Tune, an electronic label that, translated into streaming language, features a lot of lyric-less music that lends itself toward specific moods and activities. “When people start interacting with smart speakers, they’re going to want to say, ‘Alexa, play some chill music,’ or ‘play music for dinner,’” Slattery predicts. “And that’s where a label could jump in and provide the [streaming] companies with that metadata, like, ‘This would be a good song for these specific moods.’” Ninja Tune artist Bonobo, Slattery notes, is very popular on study and concentration playlists—something the producer doesn’t take into account when composing his music, but which he can’t deny once it’s in circulation. “It is strange to imagine an artist hoping they someday get their music on fitness playlists,” as opposed to getting a rave review or a plum Coachella slot, one indie label owner tells me. “But this will change fast. What seems like a slightly absurd way to approach music today will be commonplace tomorrow.”