Spotify Playlist ETL Pipeline

Python · Spotipy · AWS Lambda · AWS Glue · Amazon S3 · boto3 · Pandas · Spark · Snowflake · Snowpipe · SQL · Tableau · ETL · Data Modeling

Designed an end-to-end AWS ETL pipeline for Spotify playlist analytics. An extract Lambda writes raw Spotify API payloads to S3. A transform Lambda builds normalized album, artist, and song tables with deduplication and mixed release-date handling. Snowpipe then auto-ingests the transformed files from Snowflake stages into query-ready tables.

About the Project

Production-style data engineering project focused on reliability and analyst usability. The final pipeline automates ingestion and modeling so new playlist data lands in warehouse tables with minimal manual effort and faster time-to-insight.

Key Features

  • Extract Lambda uses Spotify API credentials from environment variables and writes raw JSON to S3
  • Transform Lambda parses playlist payloads into album, artist, and song datasets
  • Deduplicates records by primary key and standardizes mixed release-date formats through fallback parsing
  • Writes transformed CSV outputs to a separate S3 prefix for each entity table
  • Defines a Snowflake storage integration, stages, and file formats, with Snowpipe AUTO_INGEST for continuous loading
  • Reduced manual data preparation by ~90%, improved transformation speed by ~30%, and produced Tableau-ready outputs
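The extract step can be sketched as follows. This is a minimal sketch, not the project's actual Lambda:

- The environment-variable names, S3 prefix, and key naming are hypothetical.
- The Spotify and S3 calls are injected so the core logic runs without AWS.
- In Lambda, `fetch_tracks` would be Spotipy's `playlist_tracks` and `put_object` would be the boto3 S3 client's `put_object`.

```python
import json
import os
from datetime import datetime, timezone
from typing import Optional


def raw_key(prefix: str, now: datetime) -> str:
    """Build a timestamped S3 key for one raw payload (naming scheme is hypothetical)."""
    return f"{prefix}spotify_raw_{now.strftime('%Y%m%dT%H%M%S')}.json"


def extract(fetch_tracks, put_object, bucket: str, playlist_id: str,
            prefix: str = "raw_data/to_processed/",
            now: Optional[datetime] = None) -> str:
    """Fetch one playlist payload and write it unchanged to S3 as JSON.

    fetch_tracks and put_object are injected so this runs without AWS.
    """
    now = now or datetime.now(timezone.utc)
    payload = fetch_tracks(playlist_id)
    key = raw_key(prefix, now)
    put_object(Bucket=bucket, Key=key, Body=json.dumps(payload).encode("utf-8"))
    return key


def lambda_handler(event, context):
    # Imported here so the functions above stay testable without these packages.
    import boto3
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    # Credentials come from Lambda environment variables (names are hypothetical).
    auth = SpotifyClientCredentials(
        client_id=os.environ["SPOTIFY_CLIENT_ID"],
        client_secret=os.environ["SPOTIFY_CLIENT_SECRET"],
    )
    sp = spotipy.Spotify(client_credentials_manager=auth)
    s3 = boto3.client("s3")
    key = extract(sp.playlist_tracks, s3.put_object,
                  os.environ["RAW_BUCKET"], os.environ["PLAYLIST_ID"])
    return {"statusCode": 200, "body": key}
```

`playlist_tracks` returns one page of up to 100 items. A longer playlist would need pagination, for example by following Spotipy's `next` helper until it returns `None`.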
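The transform step's parsing and primary-key deduplication might look like the sketch below. It assumes the raw file holds a Spotify playlist-tracks response: an `items` list whose entries carry `added_at` and a `track` object. The output column names are hypothetical.

The project lists Pandas, where `DataFrame.drop_duplicates(subset=...)` would do the same job. Plain Python is used here to keep the example dependency-free.

```python
from typing import Any


def parse_playlist(payload: dict[str, Any]) -> tuple[list[dict], list[dict], list[dict]]:
    """Split a playlist-tracks payload into album, artist, and song rows."""
    albums, artists, songs = [], [], []
    for item in payload.get("items", []):
        track = item.get("track")
        if not track:  # removed or unavailable tracks can come back as null
            continue
        album = track["album"]
        albums.append({
            "album_id": album["id"],
            "name": album["name"],
            "release_date": album["release_date"],
            "total_tracks": album["total_tracks"],
            "url": album.get("external_urls", {}).get("spotify"),
        })
        for artist in track["artists"]:
            artists.append({
                "artist_id": artist["id"],
                "name": artist["name"],
                "url": artist.get("external_urls", {}).get("spotify"),
            })
        songs.append({
            "song_id": track["id"],
            "name": track["name"],
            "duration_ms": track["duration_ms"],
            "popularity": track["popularity"],
            "added_at": item["added_at"],
            "album_id": album["id"],
            # Link each song to its first-listed (primary) artist.
            "artist_id": track["artists"][0]["id"],
        })
    return albums, artists, songs


def dedupe(rows: list[dict], key: str) -> list[dict]:
    """Keep the first row seen for each primary-key value, preserving order."""
    seen = set()
    out = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            out.append(row)
    return out
```

Usage: `albums = dedupe(albums, "album_id")`, and likewise for `artist_id` and `song_id`. Albums and artists repeat whenever several tracks share them.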
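Release dates are the mixed-format case. The Spotify Web API reports an album's `release_date` at year, month, or day precision, for example `2021`, `2021-03`, or `2021-03-19`. One way to implement fallback parsing is to try the most specific format first and fall back to coarser ones. This approach is an assumption, not taken from the project's code.

```python
from datetime import date, datetime
from typing import Optional

# Tried in order, from most to least specific.
_FORMATS = ("%Y-%m-%d", "%Y-%m", "%Y")


def parse_release_date(value: Optional[str]) -> Optional[date]:
    """Parse a mixed-precision release date, falling back to coarser formats.

    Missing month/day parts default to 1; unparseable values return None.
    """
    if not value:
        return None
    for fmt in _FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue
    return None
```

Defaulting missing parts to the first month or day keeps the column a single DATE type in the warehouse. The alternative is to keep the raw string alongside the API's `release_date_precision` field.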
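Writing each entity to its own prefix lets one Snowflake stage and pipe watch exactly one table's files. The sketch below serializes rows with the standard `csv` module and uploads them through an injected `put_object`, which would be boto3's S3 client method in Lambda. The prefix names and file naming are hypothetical.

```python
import csv
import io
from datetime import datetime, timezone
from typing import Optional

# Hypothetical layout: one S3 prefix per entity table.
PREFIXES = {
    "albums": "transformed_data/album_data/",
    "artists": "transformed_data/artist_data/",
    "songs": "transformed_data/songs_data/",
}


def to_csv(rows: list[dict]) -> str:
    """Serialize rows to CSV text, using the first row's keys as the header."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def write_tables(tables: dict[str, list[dict]], put_object, bucket: str,
                 now: Optional[datetime] = None) -> list[str]:
    """Upload each non-empty table as a timestamped CSV under its own prefix."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%S")
    keys = []
    for name, rows in tables.items():
        if not rows:  # skip empty tables rather than writing header-less files
            continue
        key = f"{PREFIXES[name]}{name}_transformed_{stamp}.csv"
        put_object(Bucket=bucket, Key=key, Body=to_csv(rows).encode("utf-8"))
        keys.append(key)
    return keys
```

The `csv` module quotes any field containing a comma, so values like `"Hello, World"` survive the round trip. The loader must be configured to expect those quotes.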
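The Snowflake side is plain SQL. To keep these examples in one language, the sketch below renders the statements from Python so they can be pasted into a worksheet or run through a connector. The following are placeholders, and the target tables are assumed to already exist:

- object names
- bucket
- role ARN
- table names

`FIELD_OPTIONALLY_ENCLOSED_BY` is set because a CSV writer quotes fields that contain commas, such as some track names.

```python
def shared_ddl(bucket: str, role_arn: str,
               integration: str = "s3_int",
               file_format: str = "csv_fileformat") -> str:
    """Storage integration and CSV file format shared by every entity stage."""
    return f"""
CREATE OR REPLACE STORAGE INTEGRATION {integration}
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = '{role_arn}'
  STORAGE_ALLOWED_LOCATIONS = ('s3://{bucket}/transformed_data/');

CREATE OR REPLACE FILE FORMAT {file_format}
  TYPE = CSV
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  NULL_IF = ('NULL', 'null')
  EMPTY_FIELD_AS_NULL = TRUE;
""".strip()


def entity_ddl(entity: str, table: str, bucket: str, prefix: str,
               integration: str = "s3_int",
               file_format: str = "csv_fileformat") -> str:
    """One external stage plus one auto-ingest pipe per entity prefix."""
    return f"""
CREATE OR REPLACE STAGE {entity}_stage
  URL = 's3://{bucket}/{prefix}'
  STORAGE_INTEGRATION = {integration}
  FILE_FORMAT = {file_format};

CREATE OR REPLACE PIPE {entity}_pipe
  AUTO_INGEST = TRUE
AS
COPY INTO {table}
FROM @{entity}_stage;
""".strip()


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(shared_ddl("my-spotify-bucket", "arn:aws:iam::123456789012:role/snowflake-s3"))
    for entity, table, prefix in [
        ("album", "tbl_album", "transformed_data/album_data/"),
        ("artist", "tbl_artists", "transformed_data/artist_data/"),
        ("songs", "tbl_songs", "transformed_data/songs_data/"),
    ]:
        print(entity_ddl(entity, table, "my-spotify-bucket", prefix))
```

With `AUTO_INGEST = TRUE`, Snowpipe only loads files once the S3 bucket sends event notifications to the SQS queue listed in the pipe's `notification_channel` column (`SHOW PIPES`). The IAM role's trust policy also needs the user ARN and external ID reported by `DESC INTEGRATION`.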
Spotify Playlist ETL Pipeline | Mayur Bijarniya