Let’s say you’re a regular at the local coffee shop. It’s likely the staff know you by name and can whip up your order without you having to say a word: double shot espresso, half-sweet hazelnut syrup, coconut milk latte with extra foam. The level of personalization that comes with something as simple as ordering a coffee has shaped consumer expectations across industries. According to a study by Accenture, 75% of consumers are more likely to purchase if the retailer knows them by name and can make recommendations based on their purchase history. That’s likely why you’re a regular at the local coffee shop.
Spotify is an interesting example of how organizations can dig deep into consumer data and serve up a highly personalized experience. If you’re a Spotify user, you might be familiar with the ‘Discover Weekly’ feature, a curated playlist customized to your taste in music. This feature has delighted Spotify users with its ability to tailor a playlist based on signals such as skipped songs, songs added to playlists and even a thumbs down. Let’s explore the mechanisms behind Spotify’s famous algorithm that delivers the right tunes to the right people, week after week.
Achieving computing efficiency with collaborative filtering
So how does Spotify build such highly personalized playlists for millions of users? Historically, Spotify has used collaborative filtering, an AI system that analyzes the preferences and patterns of similar users to make future predictions. Collaborative filtering enables the system to record associations between user preferences towards certain types of songs and artists.
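To make the idea concrete, here is a minimal sketch of user-based collaborative filtering in plain Python. It is an illustration only, not Spotify's implementation: the listening data, the user and track names, and the choice of cosine similarity are all assumptions made for this example.

```python
from math import sqrt

# Hypothetical listening data: user -> {track: implicit preference score}.
PLAYS = {
    "ana": {"track_a": 1.0, "track_b": 1.0, "track_c": 1.0},
    "ben": {"track_a": 1.0, "track_b": 1.0, "track_d": 1.0},
    "cy": {"track_e": 1.0, "track_f": 1.0},
}


def cosine(u, v):
    """Cosine similarity between two sparse preference vectors."""
    shared = set(u) & set(v)
    dot = sum(u[t] * v[t] for t in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(user, plays, top_n=3):
    """Suggest tracks the user has not heard, weighted by how similar
    the listeners who like those tracks are to the user."""
    target = plays[user]
    scores = {}
    for other, history in plays.items():
        if other == user:
            continue
        sim = cosine(target, history)
        if sim <= 0:
            continue  # ignore listeners with no overlapping taste
        for track, weight in history.items():
            if track not in target:
                scores[track] = scores.get(track, 0.0) + sim * weight
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


if __name__ == "__main__":
    print(recommend("ana", PLAYS))  # ['track_d']
```

In this toy data, "ana" and "ben" share two tracks, so the track only "ben" has played is suggested to "ana", while "cy" has no overlap with anyone and gets no suggestions. A production system would work on far larger, sparser data and would more likely use matrix factorization than pairwise comparison.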
Complex machine learning problems involving graph computation, streaming and query processing can be solved at scale using powerful technologies like Spark. Where data centers used to be overwhelmed by the sheer volume of processing queries, machine learning and AI have transformed the way information is used and synthesized.
Using natural language processing to interpret human speech (and song)
You’ve probably heard about Natural Language Processing (NLP) before but might not realize the significant role it plays in Spotify’s algorithm. NLP is an emerging form of artificial intelligence that analyzes human language to generate consumer insights. In essence, it’s technology that helps machines understand human language, whether spoken or written.
Spotify organizes the language used about music into ‘cultural vectors’, ranking the top terms (which change daily) used to reference the artist or song. Similar to collaborative filtering, the NLP mechanism weighs and ranks the terms by relevance to create a vector that determines similarities across songs and artists. For example, one might find an association between artists Drake and Kanye West, with terms such as rap, hip hop and R&B ranking high on the respective term lists. This indicates to Spotify that such words are commonly being used in reference to both artists and the music they produce, which in turn can help the algorithm curate complementary music for a user’s Discover Weekly playlist.
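The term-ranking idea can be sketched in a few lines of Python. This is a hedged illustration rather than Spotify's actual pipeline: the artist names are placeholders, the term weights are invented, and cosine similarity is an assumed way of comparing the vectors.

```python
from math import sqrt

# Hypothetical weighted term lists ("cultural vectors"); all values are made up.
TERM_WEIGHTS = {
    "artist_x": {"rap": 0.9, "hip hop": 0.8, "r&b": 0.4},
    "artist_y": {"rap": 0.8, "hip hop": 0.9, "r&b": 0.3},
    "artist_z": {"folk": 0.9, "acoustic": 0.7},
}


def top_terms(vector, n=2):
    """Return the n highest-weighted terms for one artist."""
    return sorted(vector, key=lambda term: (-vector[term], term))[:n]


def similarity(a, b):
    """Cosine similarity between two weighted term vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(name, vectors):
    """Find the other artist whose term vector is closest to the given one."""
    others = [other for other in vectors if other != name]
    return max(others, key=lambda other: similarity(vectors[name], vectors[other]))


if __name__ == "__main__":
    print(top_terms(TERM_WEIGHTS["artist_x"]))
    print(most_similar("artist_x", TERM_WEIGHTS))
```

Because "artist_x" and "artist_y" are described with the same high-ranking terms, their vectors point in nearly the same direction, while "artist_z" shares no terms with either and scores zero.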
Leveraging convolutional neural networks akin to facial recognition technology
Another way Spotify builds their Discover Weekly playlists: raw audio data from the music file itself. Raw audio models factor in new songs on the market. Collaborative filtering and NLP typically pick up songs that receive a lot of traction from users. Raw audio models, however, can pick up songs that might be a great fit for your Discover Weekly playlist, but don’t necessarily get the same traffic as more popular tracks.
By using convolutional neural networks, Spotify can drill down on the key characteristics shared across similar songs and artists. Instead of simply relying on track plays and interactions with artists, Spotify can look at the architecture behind a song and populate your Discover Weekly playlist with complementary music, whether the artist is well known or not.
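For a feel of what a convolutional layer does with audio, the sketch below runs a few small filters over a toy loudness signal, applies a ReLU, and keeps each filter's strongest response as a feature. Everything here is an assumption for illustration: the filters are hand-picked rather than learned, the signals are invented, and a real model would stack many trained layers over spectrograms.

```python
from math import sqrt


def conv1d(signal, kernel):
    """Slide the kernel across the signal (valid cross-correlation,
    which is what CNN layers compute)."""
    k = len(kernel)
    if len(signal) < k:
        raise ValueError("signal must be at least as long as the kernel")
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Keep positive responses, zero out the rest."""
    return [max(0.0, v) for v in values]


def embed(signal, kernels):
    """One feature per filter: its strongest response anywhere in the signal
    (global max pooling)."""
    return [max(relu(conv1d(signal, kernel))) for kernel in kernels]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hand-picked, hypothetical filters; a trained network would learn these.
KERNELS = [
    [1.0, -1.0],  # responds to a sudden drop in level
    [-1.0, 1.0],  # responds to a sudden rise in level
    [0.5, 0.5],   # responds to sustained level
]

if __name__ == "__main__":
    steady = [0.5, 0.5, 0.5, 0.5]  # toy "song" with constant loudness
    spiky = [0.0, 1.0, 0.0, 1.0]   # toy "song" with sharp beats
    print(embed(steady, KERNELS))
    print(distance(embed(steady, KERNELS), embed(spiky, KERNELS)))
```

Songs whose feature vectors sit close together sound alike by these measures, regardless of how many plays either track has, which is why this approach can surface lesser-known music.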
In the cloud, but breaking ground on new data processing strategies
By leveraging Google Cloud Platform (GCP), Spotify was able to shift their focus and capitalize on the innovative mechanisms offered by GCP, such as the BigQuery cloud data warehouse, Pub/Sub for messaging and Dataflow for batch and stream processing.
A byproduct of this cloud adoption has been the Discover Weekly playlist. Using a combination of collaborative filtering, NLP and raw audio modeling, Spotify is putting their data to use in a way we haven’t seen before in the music industry. Given the company’s massive storage requirements, Spotify has opted for the cloud to increase efficiencies and minimize resource expenditure.
It’s important to find what works best for your business goals since not everything always belongs in the public cloud. Many Digital Realty customers also leverage the expansive GCP features within Digital Realty data centers, as part of deploying secure, private connections in their hybrid cloud solutions. Consider public vs. private for each of your business applications. It may come down to decisions around the prioritization of control, cost optimization, and security. The public cloud is ideal for high elasticity scenarios—which Spotify deals with often, especially when, for example, a new album drops by a popular artist and everyone listens to it at the same time—while private infrastructures are better for more predictable capacity needs.
Need some good tunes to get you through the work day? Check out our Rock Your Colo playlist that we curated on Spotify with some of our favorite jams!