Methodology

To perform our analysis, we decided to combine two data sources: the Billboard Year-End Hot 100 chart from 1970 to 2019 and the audio features provided by the Spotify API.

Billboard Year-End Hot 100 Songs

The Billboard Year-End Hot 100 chart ranks each year's most-listened-to songs in the United States. Songs at the top of the chart can, in most cases, be considered ‘very popular’.

For our project, we decided to keep only the top 20 songs of the Billboard chart for each year between 1970 and 2019. We considered the top 20 to be truly representative of each year's popular songs; at the same time, keeping as many as 20 songs per year (rather than fewer) left us with 1,000 tracks, a database large enough for a solid analysis.

As the Billboard website only contains the charts from 2006 onwards, we had to use the Billboard Year-End Hot 100 Wikipedia pages, which archive the Billboard magazine rankings between 1970 and 2005.

Even if the reliability of Wikipedia data can be questioned, the top 100 songs listed on Wikipedia between 2006 and 2020 correspond exactly to the official Billboard charts, which gave us confidence in the accuracy of the earlier archives as well.


The Spotify API
After choosing the songs we wanted to analyse, we used the Spotify API to obtain their audio features.

To work with the Spotify API, we used the Python library ‘Spotipy’, a lightweight client for the Spotify Web API that provides access to Spotify's music data, including the audio features of a track.
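As a minimal sketch of this setup (assuming client credentials from a hypothetical app registered on the Spotify developer dashboard; these are not the project's actual keys):

    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    # Authenticate with the client-credentials flow; the ID and secret are
    # placeholders for a hypothetical registered application.
    auth_manager = SpotifyClientCredentials(
        client_id="YOUR_CLIENT_ID",
        client_secret="YOUR_CLIENT_SECRET",
    )
    sp = spotipy.Spotify(auth_manager=auth_manager)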
For the Spotify API to analyse our tracks, we added the songs into playlists on Spotify: under the profile louaustralia, we created one playlist per year since 1970, each containing that year's top 20 Billboard songs. We left these playlists public and accessible to all, as we believe in the importance of Open Data.
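A sketch of how such playlists can be read back with Spotipy, reusing the sp client from the snippet above (the profile name is the one mentioned; the loop itself is illustrative, not the project's actual code):

    # Collect the track IDs from every public playlist on the louaustralia
    # profile. The 50 year-playlists fit within the API's default page of 50,
    # and each playlist holds only 20 tracks.
    track_ids = []
    for playlist in sp.user_playlists("louaustralia")["items"]:
        for item in sp.playlist_items(playlist["id"])["items"]:
            track = item["track"]
            if track is not None:  # skip tracks no longer available on Spotify
                track_ids.append(track["id"])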
We finally obtained a dataset containing the audio features of the top 20 songs of the US Billboard chart for every year since 1970. These features allowed us to look at the evolution of music over the last 50 years.
The code used for the Spotify API can be found here.
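The linked code is the authoritative version; as an illustrative sketch of the same step, the audio features can be requested in batches (the endpoint accepts up to 100 track IDs per call) and assembled into a pandas DataFrame:

    import pandas as pd

    # The audio-features endpoint accepts at most 100 track IDs per request,
    # so the collected IDs are sent in batches of 100.
    features = []
    for start in range(0, len(track_ids), 100):
        features.extend(sp.audio_features(track_ids[start:start + 100]))

    df = pd.DataFrame(features)  # one row per track, one column per feature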
The Spotify Audio Features
Below is the list of all the audio features we included in our data analysis, with their definitions. Spotify also provides a popularity measure, but as we exclusively chose ‘popular’ songs for our data, we felt there was no need to include it. We included the rest to help provide a complete view of the makeup of a song (Spotify, 2020).
Length: Length gives the duration of the track in milliseconds.
Danceability: Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
Energy: Energy represents a perceptual measure of intensity and activity and is a measure from 0.0 to 1.0. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.
Instrumentalness: Instrumentalness predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.
Acousticness: Acousticness gives a confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence that the track is acoustic and 0.0 low confidence.
Liveness: Liveness detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.
Loudness: Loudness gives the overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing the relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.
Speechiness: Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.
Tempo: Tempo gives the overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
Time Signature: Time Signature estimates the overall time signature of a track. The time signature (meter) is a notational convention to specify how many beats are in each bar (or measure).
Source: Spotify
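As a small illustration of how these definitions map onto the API response (a sketch reusing the hypothetical df from above): in the raw response, Length appears as duration_ms and Time Signature as time_signature.

    # Restrict the DataFrame to the features listed above, using the column
    # names the API actually returns, and convert the length to minutes.
    cols = [
        "duration_ms", "danceability", "energy", "instrumentalness",
        "acousticness", "liveness", "loudness", "speechiness",
        "tempo", "time_signature",
    ]
    audio = df[cols].copy()
    audio["length_min"] = audio["duration_ms"] / 60_000  # ms -> minutes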