Let’s say you’re a regular at the local coffee shop. It’s likely the staff know you by name and can whip up your order without you having to say anything: double shot espresso, half-sweet hazelnut syrup, coconut milk latte with extra foam. The level of personalization that comes with something as simple as ordering a coffee has infiltrated consumer expectations across industries. According to a study by Accenture, 75% of consumers are more likely to purchase if the retailer knows them by name and can make recommendations based on their purchase history. That’s likely why you’re a regular at the local coffee shop.
Spotify is an interesting example of how organizations can dig deep into consumer data and serve up a highly personalized experience. If you’re a Spotify user, you might be familiar with the ‘Discover Weekly’ feature: a curated playlist customized to your taste in music. This feature has delighted Spotify users with its ability to tailor a playlist based on signals such as skipped songs, songs added to playlists and even a thumbs down. Let’s explore the mechanisms behind Spotify’s effective algorithm that delivers the right tunes to the right people, week after week.
Achieving computing efficiency with collaborative filtering
So how does Spotify build such highly personalized playlists for millions of users? Historically, Spotify has used collaborative filtering, a technique that analyzes the preferences and listening patterns of similar users to predict what a given user is likely to enjoy. Collaborative filtering enables the system to record associations between users and the types of songs and artists they prefer.
Spotify’s recommendation system relies on implicit feedback data such as stream counts, tracks saved to personal playlists and visits to an artist’s page after listening to a song. Suggested tracks are generated using Python libraries that apply matrix factorization to this data, producing two kinds of vectors: one represents the user (X), and the other represents the song (Y). To understand which users have similar preferences, the collaborative filtering system cross-references all vectors in Spotify (i.e. all users and all songs) to find the closest matches.
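To make the idea of user and song vectors concrete, here is a minimal sketch in Python with NumPy. It is not Spotify’s implementation: the play counts are invented, the function names are hypothetical, and it uses plain alternating least squares on a 0/1 "did the user stream it" matrix rather than the confidence-weighted variants typically used for implicit feedback at scale.

```python
import numpy as np

# Toy play counts (rows = users, columns = songs). All values are invented.
plays = np.array([
    [5, 3, 0, 0, 0, 0],
    [4, 2, 6, 0, 0, 0],
    [0, 0, 0, 7, 1, 2],
    [0, 0, 0, 3, 0, 4],
])

def factorize(plays, n_factors=2, n_iters=20, reg=0.1, seed=0):
    """Alternating least squares on a binary preference matrix."""
    rng = np.random.default_rng(seed)
    prefs = (plays > 0).astype(float)  # implicit feedback -> 0/1 preference
    n_users, n_songs = prefs.shape
    X = rng.normal(scale=0.1, size=(n_users, n_factors))  # user vectors
    Y = rng.normal(scale=0.1, size=(n_songs, n_factors))  # song vectors
    ridge = reg * np.eye(n_factors)
    for _ in range(n_iters):
        # Hold the song vectors fixed and solve for all user vectors.
        X = np.linalg.solve(Y.T @ Y + ridge, Y.T @ prefs.T).T
        # Hold the user vectors fixed and solve for all song vectors.
        Y = np.linalg.solve(X.T @ X + ridge, X.T @ prefs).T
    return X, Y

def most_similar_user(user, X):
    """Index of the other user whose vector has the highest cosine similarity."""
    norms = np.clip(np.linalg.norm(X, axis=1, keepdims=True), 1e-12, None)
    unit = X / norms
    sims = unit @ unit[user]
    sims[user] = -np.inf  # never return the user themselves
    return int(np.argmax(sims))

def recommend(user, X, Y, plays):
    """Highest-scoring song the user has not streamed yet."""
    scores = X[user] @ Y.T
    scores = np.where(plays[user] > 0, -np.inf, scores)
    return int(np.argmax(scores))

X, Y = factorize(plays)
print("user most similar to user 0:", most_similar_user(0, X))
print("song suggested for user 0:", recommend(0, X, Y, plays))
```

In this toy data, users who stream overlapping songs end up with vectors pointing in nearly the same direction, which is what lets the system suggest a track one of them has heard and the other has not.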
Now more than ever, computing efficiency is critical to success. Enter Spark: a popular cluster computing system, hailed by data scientists as a fast, simple and highly scalable data processing tool. The Spark ecosystem continues to be a crowd favorite, allowing seamless integration with a range of other tools including R, SQL, Python, Scala and Java.
Complex machine learning problems involving graph computation, streaming and query processing can be solved at scale using powerful technologies like Spark. Where data centers used to be overwhelmed by the sheer volume of processing queries, machine learning and AI have transformed the way information is used and synthesized.
Using natural language processing to interpret human speech (and song)
You’ve probably heard about Natural Language Processing (NLP) before but might not realize the significant role it plays in Spotify’s algorithm. NLP is an emerging form of artificial intelligence that analyzes human language to generate consumer insights. In essence, it’s technology that helps machines understand human speech.
Spotify’s NLP mechanism crawls the internet for written content on all things music. It aggregates what articles, blogs and metadata are saying about particular artists and songs, and assigns each artist and song a dynamic list of terms that changes every day and is weighted by relevance.
Spotify organizes this data into ‘cultural vectors’ ranking the top terms (which change daily) used to reference the artist or song. Similar to collaborative filtering, the NLP mechanism weighs and ranks the terms by relevance to create a vector that determines similarities across songs and artists. For example, one might find an association between artists Drake and Kanye West, with terms such as rap, hip hop and R&B ranking high on the respective term lists. This indicates to Spotify that such words are commonly being used in reference to both artists and the music they produce, which in turn can help the algorithm curate complementary music for a user’s Discover Weekly playlist.
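As a rough illustration of how weighted term lists can be compared, the sketch below treats each artist’s terms as a sparse vector and measures cosine similarity between them. The weights, the third artist and the function names are all invented for the example; the source does not say how Spotify actually scores or compares terms.

```python
import math

# Hypothetical term weights per artist. The numbers are invented.
term_weights = {
    "Drake": {"rap": 0.9, "hip hop": 0.8, "r&b": 0.6},
    "Kanye West": {"rap": 0.8, "hip hop": 0.9, "r&b": 0.4, "producer": 0.3},
    "Hypothetical Folk Artist": {"folk": 0.9, "acoustic": 0.7, "r&b": 0.1},
}

def top_terms(artist, k=3):
    """The k highest-weighted terms for an artist, most relevant first."""
    weights = term_weights[artist]
    return sorted(weights, key=weights.get, reverse=True)[:k]

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def closest_artist(artist):
    """The other artist whose term vector is most similar."""
    others = (name for name in term_weights if name != artist)
    return max(others, key=lambda n: cosine(term_weights[artist], term_weights[n]))

print(top_terms("Drake", k=2))
print(closest_artist("Drake"))
```

Because the two rap artists share their highest-weighted terms, their vectors are close, while the folk artist overlaps only on a low-weight term and scores far lower.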
According to an article written for the Harvard Business School, Spotify works with the largest crowdsourced music dataset in the world. The ‘Discover Weekly’ feature is a unique opportunity for Spotify to synthesize vast datasets into highly personalized playlists, demonstrating a true understanding of consumer preferences and building customer loyalty to a product that ‘gets them’.
Leveraging convolutional neural networks akin to facial recognition technology
Another way Spotify builds their Discover Weekly playlists: raw audio data from the music file itself. Raw audio models factor in new songs on the market. Collaborative filtering and NLP typically pick up songs that receive a lot of traction from users. Raw audio models, however, can pick up songs that might be a great fit for your Discover Weekly playlist, but don’t necessarily get the same traffic as more popular tracks.
Similar to the facial recognition feature most new smartphones sport these days, audio data can be analyzed using convolutional neural networks. These networks consist of a series of layers that take time-frequency audio frames, which together form a spectrogram, as input. Once the audio frames have passed through the convolutional layers, a global temporal pooling layer gathers statistics of the learned features across the whole song, reports Sander Dieleman, former Spotify intern. After the data is processed, the output of the neural network is a holistic view of the song’s features including time signature, key, mode, tempo and loudness.
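The following sketch shows the shape of that computation in Python with NumPy: one convolutional layer slides filters along the time axis of a spectrogram, and a global temporal pooling step collapses the result into a fixed-length vector. The filters are random and untrained, the spectrograms are random stand-ins for real audio, and the choice of mean and max pooling is an assumption made for illustration, so the output carries no musical meaning.

```python
import numpy as np

def conv1d_over_time(spectrogram, filters):
    """Slide each filter along the time axis, then apply a ReLU.

    spectrogram: (n_frames, n_bins) time-frequency frames
    filters:     (n_filters, width, n_bins), each spanning all frequency bins
    returns:     (n_frames - width + 1, n_filters) feature maps
    """
    n_frames, n_bins = spectrogram.shape
    n_filters, width, filter_bins = filters.shape
    if filter_bins != n_bins:
        raise ValueError("filters must span every frequency bin")
    n_out = n_frames - width + 1
    # Stack every window of `width` consecutive frames.
    windows = np.stack([spectrogram[t:t + width] for t in range(n_out)])
    activations = np.einsum("twb,fwb->tf", windows, filters)
    return np.maximum(activations, 0.0)

def global_temporal_pooling(feature_maps):
    """Summarize each feature over the whole time axis (mean and max)."""
    return np.concatenate([feature_maps.mean(axis=0), feature_maps.max(axis=0)])

rng = np.random.default_rng(0)
filters = rng.normal(size=(4, 3, 8))     # 4 untrained filters, 3 frames wide
short_song = rng.random(size=(100, 8))   # 100 frames, 8 frequency bins
long_song = rng.random(size=(250, 8))    # 250 frames, 8 frequency bins

short_features = global_temporal_pooling(conv1d_over_time(short_song, filters))
long_features = global_temporal_pooling(conv1d_over_time(long_song, filters))
print(short_features.shape, long_features.shape)
```

The point of the pooling step is that songs of different lengths produce vectors of the same size, so they can be compared with one another.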
In using convolutional neural networks, Spotify can drill down on the key characteristics shared across similar songs and artists. Instead of simply relying on track plays and interactions with artists, Spotify can look at the architecture behind a song and populate your Discover Weekly playlist with complementary music, whether the artist is well known or not.
In the cloud, but breaking ground on new data processing strategies
By leveraging Google Cloud Platform (GCP), Spotify was able to shift their focus and capitalize on the innovative mechanisms offered by GCP such as the BigQuery cloud data warehouse, Pub/Sub for messaging and Dataflow for batch and stream processing.
A byproduct of this cloud adoption has been the Discover Weekly playlist. Using a combination of collaborative filtering, NLP and raw audio modeling, Spotify is putting their data to use in a way we haven’t seen before in the music industry. Given the company’s massive storage requirements, Spotify has opted for the cloud to increase efficiencies and minimize resource expenditure.
It’s important to find what works best for your business goals since not everything always belongs in the public cloud. Many Digital Realty customers also leverage the expansive GCP features within Digital Realty data centers, as part of deploying secure, private connections in their hybrid cloud solutions. Consider public vs. private for each of your business applications. It may come down to decisions around the prioritization of control, cost optimization, and security. The public cloud is ideal for high elasticity scenarios—which Spotify deals with often, especially when, for example, a new album drops by a popular artist and everyone listens to it at the same time—while private infrastructures are better for more predictable capacity needs.
Need some good tunes to get you through the work day? Check out our Rock Your Colo playlist that we curated on Spotify with some of our favorite jams!