Final Blog Post

Our goal for this project was to see whether a song's audio features could predict its popularity among listeners. We looked for correlations between particular audio features, such as key, tempo, and danceability, and a song's popularity metric.

This data was gathered from the Million Song Dataset. We cut the set down to 35,000 songs and kept only those for which key fields, such as song popularity and tempo, were present, so that the database stayed clean.
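The filtering step can be sketched roughly as follows. This is a minimal illustration, not our actual pipeline: the field names in `REQUIRED_FIELDS` are hypothetical placeholders, and the random sampling used to reach the 35,000-song limit is an assumption, since the cut could have been made in other ways.

```python
import math
import random

# Hypothetical field names: stand-ins for "song popularity, tempo, etc."
REQUIRED_FIELDS = ("popularity", "tempo", "key", "loudness")


def is_complete(song, required=REQUIRED_FIELDS):
    """Return True when every required field is present and not NaN."""
    for field in required:
        value = song.get(field)
        if value is None:
            return False
        if isinstance(value, float) and math.isnan(value):
            return False
    return True


def clean_subset(songs, limit=35_000, seed=0):
    """Drop incomplete records, then keep at most `limit` of the rest.

    Random sampling with a fixed seed is an assumption made for this sketch.
    """
    complete = [song for song in songs if is_complete(song)]
    if len(complete) <= limit:
        return complete
    return random.Random(seed).sample(complete, limit)
```

Passing a list of song dictionaries to `clean_subset` returns only the records that have every required field, capped at the chosen limit.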

To test whether song metadata can predict potential popularity, we used three different classifiers: two support vector machines, one with a linear kernel and the other with a Gaussian kernel, and logistic regression, in an attempt to find linear relationships between features and song popularity. The main features our classifiers used include loudness2, artist hotness (read: popularity), artist familiarity, tempo, key, tempo * key (often switched to tempo * mode), mean segment pitch, mean segment timbre, mode, duration, and genre tags.
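For readers who want to try a similar comparison, here is a minimal sketch using scikit-learn. It is not our original code: the library choice, the synthetic data from `make_classification`, the feature scaling step, and the 5-fold cross-validation are all assumptions made so the example is self-contained.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the song feature matrix (X) and the
# popular / not-popular labels (y); real song data is not used here.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, random_state=0
)

# The three classifier types described above.
models = {
    "linear SVM": SVC(kernel="linear"),
    "Gaussian (RBF) SVM": SVC(kernel="rbf"),
    "logistic regression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, model in models.items():
    # Scale features first, since SVMs are sensitive to feature ranges.
    pipeline = make_pipeline(StandardScaler(), model)
    # Mean accuracy over 5 cross-validation folds.
    scores[name] = cross_val_score(pipeline, X, y, cv=5).mean()

for name, score in scores.items():
    print(f"{name}: mean CV accuracy = {score:.3f}")
```

Swapping the synthetic `X` and `y` for a real feature matrix and label vector would give a like-for-like comparison of the three models on the same folds.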

The results of our experiment in classifying popular music ultimately led to the conclusion that the metadata describing a track does not offer a reliable way to determine how popular a song could be. Rather, outside factors such as the existing popularity of the artist have a greater influence on the potential popularity of a song. Additionally, averaging harder-to-quantify values like timbre can lose much of the uniqueness that defines a song. The metadata therefore has limited power to describe and distinguish songs, which can fall on either the unpopular or the popular side of classification.

We created a web app that lets users feed songs found on Spotify into our classifier. The application calculates its own popularity metric and classifies the song as popular or not based on that metric. We provide other visualizations as well, including a word bubble that shows both artist and genre popularity over a range of years that the user chooses. The larger the word, the more popular the artist or genre. Finally, we built a 2-dimensional graph that plots any two of our 7 raw features (artist hotness, duration, key, loudness, mode, tempo, time signature) against each other. Ideally, we would like to explore all data points in the full 7-dimensional space to highlight key ranges that suggest correlation with song popularity. However, since that would be nearly impossible to make sense of on a 2D surface, we opted for the 2D plots.
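To give a sense of how many 2D views the seven raw features allow, the short sketch below enumerates every unordered pair. The helper name `feature_pairs` is made up for illustration and is not part of our web app.

```python
from itertools import combinations

# The seven raw features listed above.
RAW_FEATURES = [
    "artist hotness",
    "duration",
    "key",
    "loudness",
    "mode",
    "tempo",
    "time signature",
]


def feature_pairs(features=RAW_FEATURES):
    """Return every unordered pair of features, one per possible 2D plot."""
    return list(combinations(features, 2))


pairs = feature_pairs()
print(len(pairs))  # 21, i.e. 7 choose 2
for x_axis, y_axis in pairs[:3]:
    print(f"{x_axis} vs. {y_axis}")
```

With 21 possible pairings, letting the user pick the two axes is more practical than rendering every plot at once.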