https://apption.co/embeds/b7f0cb4d
If the dashboard doesn't load or display fully, view it directly on my Tableau page.
Using Spotify’s “Download your data” tool, users can request a copy of their personal data. Spotify offers three types of data packages, which can be requested individually or together: account data, extended streaming history, and technical log information.
For this project, I requested my extended streaming history, which contains detailed records of all audio and video content I’ve streamed since opening my account, including track metadata, timestamps, and playback behavior. According to Spotify, preparation can take up to 30 days.
A few days after submitting my request, I received a ZIP file containing 11 JSON files, each representing a segment of my listening history:
Streaming_History_Audio_2014-2018_0.json
Streaming_History_Audio_2018-2019_1.json
Streaming_History_Audio_2019-2020_2.json
...
Streaming_History_Audio_2023-2024_8.json
Streaming_History_Audio_2024-2025_9.json
Streaming_History_Video_2020-2024.json
Here’s a sample JSON object, annotated with what each field means (the // comments are my own annotations; the exported files contain plain JSON):
{
"ts": "2024-11-11T03:30:29Z", // Date and time when the stream ended (UTC)
"platform": "ios", // Platform used to stream the track
"ms_played": 1514, // Duration the track was played (in milliseconds)
"conn_country": "CA", // Country code where the stream occurred
"ip_addr": "24.202.7.143", // IP address used during the stream
"master_metadata_track_name": "Supernova", // Name of the track
"master_metadata_album_artist_name": "aespa", // Name of the artist or band
"master_metadata_album_album_name": "Armageddon - The 1st Album", // Name of the album
"spotify_track_uri": "spotify:track:5lKnZbdGCBViitE1Ce5TZh", // Spotify URI identifying the track
"episode_name": null, // Name of the podcast episode (if applicable)
"episode_show_name": null, // Name of the podcast show (if applicable)
"spotify_episode_uri": null, // Spotify URI identifying the podcast episode
"audiobook_title": null, // Name of the audiobook (if applicable)
"audiobook_uri": null, // Spotify URI identifying the audiobook
"audiobook_chapter_uri": null, // Spotify URI identifying the audiobook chapter
"audiobook_chapter_title": null, // Name of the audiobook chapter
"reason_start": "clickrow", // Why the track started (e.g., clickrow, autoplay)
"reason_end": "endplay", // Why the track ended (e.g., endplay, forwardbutton)
"shuffle": true, // Whether shuffle mode was used
"skipped": true, // Whether the user skipped the track
"offline": false, // Whether the track was played offline
"offline_timestamp": 1731295828, // Timestamp of when offline mode was used (if used)
"incognito_mode": false // Whether the track was played in a private session
}
This data structure is consistent across all audio and video streaming history files, with one JSON object representing each individual stream.
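As a quick illustration of that structure, here is a minimal Python sketch that parses one record and derives a few values from it. It uses a trimmed copy of the sample record without explanatory comments, since standard JSON parsers reject them. The variable names are only for illustration.

```python
import json
from datetime import datetime

# One stream record, trimmed to a few of the fields from the sample.
raw = """
{
  "ts": "2024-11-11T03:30:29Z",
  "ms_played": 1514,
  "master_metadata_track_name": "Supernova",
  "master_metadata_album_artist_name": "aespa",
  "episode_name": null,
  "skipped": true
}
"""

record = json.loads(raw)

# "ts" is an ISO 8601 UTC string. Swapping the trailing "Z" for an explicit
# offset keeps datetime.fromisoformat working on Python versions before 3.11.
ended_at = datetime.fromisoformat(record["ts"].replace("Z", "+00:00"))

# Playback duration is stored in milliseconds.
seconds_played = record["ms_played"] / 1000

# A music stream has a track name and no podcast episode name.
is_music = (
    record["master_metadata_track_name"] is not None
    and record["episode_name"] is None
)

print(ended_at.isoformat(), seconds_played, is_music)
```

JSON null values arrive in Python as None, which is what makes the later "music only" filter a simple null check.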
For detailed definitions of each field, Spotify provides a reference guide titled “Read Me First – Extended Streaming History.” You can access it on my GitHub.
To prepare the data for analysis and visualization in Tableau, I followed these key steps using Python and pandas:
<aside>
1. Merge all files: Scanned the target folder for all .json files, loaded each with pandas, and concatenated them into a single master dataset.
2. Convert timestamps: Parsed the raw UTC timestamps (ts) into timezone-aware datetime objects.
3. Filter for music only: Removed all rows containing podcast or audiobook metadata based on non-null media-specific columns.
4. Remove duplicates: Dropped exact duplicate rows across all fields to ensure data integrity.
5. Handle missing data: Filtered out any rows missing either the track name or artist name.
6. Adjust for local time: Converted UTC timestamps to local time using country-specific time zones, defaulting to UTC when no time zone was available for a country.
7. Export to CSV: Dropped temporary helper columns and saved the final cleaned dataset as a .csv file, ready for use in Tableau.
</aside>
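The steps above can be sketched roughly as follows. This is a simplified reconstruction under stated assumptions, not the exact script used for the project: the function names (load_history, clean_history, run), the helper column names (tz_name, local_time), and the COUNTRY_TZ lookup are all hypothetical. In particular, mapping a country to a single time zone is a simplification, since countries such as Canada span several zones.

```python
from pathlib import Path

import pandas as pd

# Hypothetical lookup from conn_country to one IANA time zone (an assumption;
# the original mapping is not shown). Unlisted countries fall back to UTC.
COUNTRY_TZ = {"CA": "America/Toronto", "KR": "Asia/Seoul"}

# Columns that are only populated for podcasts and audiobooks.
NON_MUSIC_COLS = [
    "episode_name", "episode_show_name", "spotify_episode_uri",
    "audiobook_title", "audiobook_uri",
    "audiobook_chapter_uri", "audiobook_chapter_title",
]


def load_history(folder):
    """Step 1: read every .json file in the folder into one DataFrame."""
    frames = [pd.read_json(path) for path in sorted(Path(folder).glob("*.json"))]
    return pd.concat(frames, ignore_index=True)


def clean_history(df, country_tz=COUNTRY_TZ):
    df = df.copy()
    # Step 2: parse the raw UTC strings into timezone-aware datetimes.
    df["ts"] = pd.to_datetime(df["ts"], utc=True)
    # Step 3: keep rows where every podcast/audiobook column is null.
    media_cols = [c for c in NON_MUSIC_COLS if c in df.columns]
    df = df[df[media_cols].isna().all(axis=1)]
    # Step 4: drop rows that are identical across all fields.
    df = df.drop_duplicates()
    # Step 5: drop rows missing the track name or the artist name.
    df = df.dropna(
        subset=["master_metadata_track_name", "master_metadata_album_artist_name"]
    ).copy()
    # Step 6: convert each timestamp to its country's zone, then strip the
    # offset so the column holds plain local wall-clock times.
    df["tz_name"] = df["conn_country"].map(country_tz).fillna("UTC")
    df["local_time"] = [
        ts.tz_convert(tz).tz_localize(None)
        for ts, tz in zip(df["ts"], df["tz_name"])
    ]
    # Step 7 (first half): drop the temporary helper column.
    return df.drop(columns=["tz_name"]).reset_index(drop=True)


def run(folder, out_csv):
    """Step 7 (second half): write the cleaned data to a CSV for Tableau."""
    clean_history(load_history(folder)).to_csv(out_csv, index=False)
```

Calling run on the unzipped export folder with an output file name of your choice would produce the CSV. The per-row conversion in step 6 is slow on large histories but easy to read; grouping rows by time zone and converting each group at once would be a faster alternative.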