
How to scrape Spotify in 2026: 5 working methods

Spotify holds a treasure trove of music data that developers, researchers, and music enthusiasts want to extract. But getting that data without getting blocked requires the right approach.

Whether you want to scrape Spotify playlists for analysis or build a music recommendation engine, you need reliable extraction methods. This guide shows you five working methods to scrape Spotify data in 2026.

We'll cover everything from the official API to browser automation, including practical code you can run today. By the end, you'll know exactly how to scrape Spotify tracks, artists, and playlists efficiently.

What Does Scraping Spotify Mean?

Scraping Spotify involves extracting track metadata, playlist information, artist details, and audio features programmatically from Spotify's platform. You can access this data through the official Web API with authentication, the embed API without credentials, or browser automation tools that render JavaScript-heavy pages.

The best method depends on your needs. The API gives structured data with rate limits. Web scraping gives flexibility but requires handling anti-bot protections.

Method 1: Use the Official Spotify Web API

The most reliable way to scrape Spotify data involves their official Web API. It requires authentication but offers structured JSON responses and predictable rate limits.

Step 1: Create a Spotify Developer Account

Head to the Spotify Developer Dashboard and log in with your Spotify account. Free accounts work fine.

Click "Create App" and fill in the required fields. The redirect URI can be http://localhost:8888/callback for testing purposes.

Once created, you'll see your Client ID and Client Secret. Keep these safe—you'll need them for authentication.

Step 2: Get an Access Token

Spotify uses OAuth 2.0. For server-to-server requests without user login, use the Client Credentials flow:

import requests
import base64

client_id = "your_client_id"
client_secret = "your_client_secret"

# Encode credentials
credentials = f"{client_id}:{client_secret}"
encoded = base64.b64encode(credentials.encode()).decode()

# Request token
response = requests.post(
    "https://accounts.spotify.com/api/token",
    headers={
        "Authorization": f"Basic {encoded}",
        "Content-Type": "application/x-www-form-urlencoded"
    },
    data={"grant_type": "client_credentials"}
)

token = response.json()["access_token"]
print(f"Token: {token}")

This code encodes your credentials as Base64, sends them to Spotify's token endpoint, and retrieves an access token valid for one hour.
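Tokens expire after an hour, so longer jobs need to re-request them. Here's a minimal sketch that caches the token and refreshes it shortly before expiry, reusing the encoded credentials from above:

import time

# Cache the token and refresh it shortly before the one-hour expiry
# reported by the token endpoint's expires_in field
_token_cache = {"token": None, "expires_at": 0}

def get_cached_token():
    if time.time() >= _token_cache["expires_at"]:
        response = requests.post(
            "https://accounts.spotify.com/api/token",
            headers={
                "Authorization": f"Basic {encoded}",
                "Content-Type": "application/x-www-form-urlencoded"
            },
            data={"grant_type": "client_credentials"}
        )
        payload = response.json()
        _token_cache["token"] = payload["access_token"]
        # Refresh 60 seconds early to avoid sending a stale token
        _token_cache["expires_at"] = time.time() + payload["expires_in"] - 60
    return _token_cache["token"]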

Step 3: Make API Requests

With your token, you can now query Spotify endpoints:

def get_track_info(track_id, token):
    """Fetch track details from Spotify API"""
    url = f"https://api.spotify.com/v1/tracks/{track_id}"
    
    response = requests.get(
        url,
        headers={"Authorization": f"Bearer {token}"}
    )
    
    return response.json()

# Example: Get info for "Blinding Lights"
track = get_track_info("0VjIjW4GlUZAMYd2vXMi3b", token)
print(f"Track: {track['name']}")
print(f"Artist: {track['artists'][0]['name']}")
print(f"Album: {track['album']['name']}")
print(f"Duration: {track['duration_ms'] / 1000:.0f} seconds")

The API returns rich metadata including popularity scores, available markets, preview URLs, and album artwork links.
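For instance, those fields are available directly on the same track object (note that preview_url can be null for some tracks):

# More fields available on the same track response
print(f"Popularity: {track['popularity']}/100")
print(f"Markets: {len(track['available_markets'])} countries")
print(f"Preview: {track['preview_url']}")  # may be None
print(f"Artwork: {track['album']['images'][0]['url']}")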

Step 4: Scrape Playlist Tracks

Playlists require pagination since Spotify returns 100 tracks maximum per request:

def get_all_playlist_tracks(playlist_id, token):
    """Fetch all tracks from a playlist with pagination"""
    tracks = []
    url = f"https://api.spotify.com/v1/playlists/{playlist_id}/tracks"
    
    while url:
        response = requests.get(
            url,
            headers={"Authorization": f"Bearer {token}"}
        )
        data = response.json()
        
        for item in data["items"]:
            track = item["track"]
            if track:  # Skip null tracks
                tracks.append({
                    "name": track["name"],
                    "artist": track["artists"][0]["name"],
                    "duration_ms": track["duration_ms"],
                    "popularity": track["popularity"]
                })
        
        url = data.get("next")  # Next page URL or None
    
    return tracks

# Example usage
playlist_tracks = get_all_playlist_tracks("37i9dQZF1DXcBWIGoYBM5M", token)
print(f"Found {len(playlist_tracks)} tracks")

The next field contains the URL for the next page. Keep requesting until it returns None.

Method 2: Scrape with SpotifyScraper Library (No Auth)

Don't want to deal with API credentials? The SpotifyScraper library extracts data from Spotify's embed API without any authentication.

Installation

pip install spotifyscraper

For Selenium support (handles JavaScript-heavy pages):

pip install "spotifyscraper[selenium]"

Basic Usage

from spotify_scraper import SpotifyClient

# No authentication needed
client = SpotifyClient()

# Get track information
track_url = "https://open.spotify.com/track/4iV5W9uYEdYUVa79Axb7Rh"
track = client.get_track_info(track_url)

print(f"Track: {track.get('name', 'Unknown')}")
print(f"Artist: {track['artists'][0]['name']}")
print(f"Duration: {track['duration_ms'] / 1000:.0f} seconds")

# Always close the client
client.close()

This library scrapes Spotify's embed endpoints that don't require OAuth tokens. It works for public tracks, albums, and playlists.
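Albums and playlists follow the same URL-in, dict-out pattern. A short sketch, assuming the library's get_album_info and get_playlist_info helpers mirror get_track_info:

from spotify_scraper import SpotifyClient

client = SpotifyClient()

# Same pattern as get_track_info, applied to other entity types
album = client.get_album_info(
    "https://open.spotify.com/album/2noRn2Aes5aoNVsU6iWThc"
)
playlist = client.get_playlist_info(
    "https://open.spotify.com/playlist/37i9dQZF1DXcBWIGoYBM5M"
)

print(f"Album: {album.get('name', 'Unknown')}")
print(f"Playlist: {playlist.get('name', 'Unknown')}")

client.close()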

Download Preview Audio

SpotifyScraper can grab 30-second preview clips:

from spotify_scraper import SpotifyClient

client = SpotifyClient()

# Download preview MP3
audio_path = client.download_preview_mp3(
    "https://open.spotify.com/track/4iV5W9uYEdYUVa79Axb7Rh",
    path="previews/",
    filename="daft_punk_preview.mp3"
)

print(f"Saved to: {audio_path}")
client.close()

This saves a 30-second preview as an MP3 file. Full tracks aren't available through scraping due to DRM protection.

Download Album Artwork

from spotify_scraper import SpotifyClient

client = SpotifyClient()

# Download cover art
cover_path = client.download_cover(
    "https://open.spotify.com/album/2noRn2Aes5aoNVsU6iWThc",
    path="covers/",
    size_preference="large",  # small, medium, or large
    format="jpeg"
)

print(f"Cover saved to: {cover_path}")
client.close()

Album covers come in three sizes. The "large" option gives you high-resolution images suitable for display.

Bulk Operations

For scraping multiple URLs efficiently:

from spotify_scraper import SpotifyClient
from spotify_scraper.utils.common import SpotifyBulkOperations

client = SpotifyClient()
bulk = SpotifyBulkOperations(client)

urls = [
    "https://open.spotify.com/track/4iV5W9uYEdYUVa79Axb7Rh",
    "https://open.spotify.com/track/0VjIjW4GlUZAMYd2vXMi3b",
    "https://open.spotify.com/album/2noRn2Aes5aoNVsU6iWThc"
]

# Process all URLs
results = bulk.process_urls(urls, operation="all_info")

# Export to files
bulk.export_to_json(results, "spotify_data.json")
bulk.export_to_csv(results, "spotify_data.csv")

client.close()

The bulk operations handler manages rate limiting and exports data directly to JSON or CSV formats.

Method 3: Browser Automation with Playwright

When embed APIs don't cut it, browser automation renders full JavaScript pages. Playwright excels at handling dynamic Spotify content.

Installation

pip install playwright lxml
playwright install chromium

Scraping a Playlist Page

import asyncio
from playwright.async_api import async_playwright
from lxml.html import fromstring

async def scrape_spotify_playlist(playlist_url):
    async with async_playwright() as p:
        # Launch browser
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        
        # Navigate to playlist
        await page.goto(playlist_url, wait_until='networkidle')
        
        # Wait for content to load
        await page.wait_for_timeout(3000)
        
        # Get page HTML
        content = await page.content()
        await browser.close()
        
        return content

async def extract_tracks(html_content):
    parser = fromstring(html_content)
    
    # Extract track names using XPath
    track_names = parser.xpath(
        '//div[@data-testid="tracklist-row"]'
        '//a[@data-testid="internal-track-link"]/div/text()'
    )
    
    # Extract artist names
    artist_names = parser.xpath(
        '//div[@data-testid="tracklist-row"]'
        '//div[@data-encore-id="text"]/a/text()'
    )
    
    return list(zip(track_names, artist_names))

async def main():
    url = "https://open.spotify.com/playlist/37i9dQZF1DXcBWIGoYBM5M"
    html = await scrape_spotify_playlist(url)
    tracks = await extract_tracks(html)
    
    for name, artist in tracks[:10]:
        print(f"{name} - {artist}")

asyncio.run(main())

Playwright waits for the network to settle before grabbing HTML. The XPath selectors target Spotify's internal component structure.
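A fixed three-second wait is fragile on slow connections. A more robust variant (sketch) waits for the track rows themselves to render before reading the HTML:

async def scrape_spotify_playlist_robust(playlist_url):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(playlist_url, wait_until='domcontentloaded')
        
        # Wait for actual track rows instead of a fixed timeout
        await page.wait_for_selector(
            '[data-testid="tracklist-row"]', timeout=15000
        )
        
        content = await page.content()
        await browser.close()
        return content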

Adding Proxy Support

For scraping at scale, rotate IP addresses to avoid rate limits:

async def scrape_with_proxy(playlist_url, proxy_config):
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            proxy={
                "server": proxy_config["server"],
                "username": proxy_config["username"],
                "password": proxy_config["password"]
            }
        )
        
        page = await browser.new_page()
        await page.goto(playlist_url, wait_until='networkidle')
        await page.wait_for_timeout(3000)
        
        content = await page.content()
        await browser.close()
        
        return content

# Example with proxy
proxy = {
    "server": "http://proxy.roundproxies.com:8080",
    "username": "your_username",
    "password": "your_password"
}

playlist_url = "https://open.spotify.com/playlist/37i9dQZF1DXcBWIGoYBM5M"
html = asyncio.run(scrape_with_proxy(playlist_url, proxy))

Residential proxies work best for Spotify since datacenter IPs often get flagged. Roundproxies.com offers residential and mobile proxies specifically designed for web scraping scenarios.

Handling Infinite Scroll

Some Spotify pages load content as you scroll:

async def scroll_and_scrape(page, scroll_pause=2):
    """Scroll page until no new content loads"""
    last_height = await page.evaluate("document.body.scrollHeight")
    
    while True:
        # Scroll to bottom
        await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        await page.wait_for_timeout(scroll_pause * 1000)
        
        # Calculate new scroll height
        new_height = await page.evaluate("document.body.scrollHeight")
        
        if new_height == last_height:
            break
        
        last_height = new_height
    
    return await page.content()

This scrolls repeatedly until the page height stops changing. Works great for long playlists with hundreds of tracks.
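Combining it with the earlier scraper is straightforward (sketch):

async def scrape_full_playlist(playlist_url):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(playlist_url, wait_until='networkidle')
        
        # Scroll until every lazily loaded track is in the DOM
        content = await scroll_and_scrape(page)
        await browser.close()
        return content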

Method 4: Use Spotipy for Advanced Data

Spotipy is the most mature Python wrapper for Spotify's Web API. It handles authentication flows and provides clean abstractions.

Installation

pip install spotipy

Client Credentials Flow

For public data without user login:

import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

client_id = "your_client_id"
client_secret = "your_client_secret"

sp = spotipy.Spotify(
    auth_manager=SpotifyClientCredentials(
        client_id=client_id,
        client_secret=client_secret
    )
)

# Search for tracks
results = sp.search(q='artist:Daft Punk', type='track', limit=10)

for track in results['tracks']['items']:
    print(f"{track['name']} - {track['album']['name']}")

The credentials manager handles token refresh automatically. No need to manually track expiration.

Get Audio Features

Spotify provides machine learning-derived audio features:

def get_audio_features(sp, track_ids):
    """Fetch audio features for multiple tracks"""
    features = sp.audio_features(track_ids)
    
    for f in features:
        if f:
            print(f"Track: {f['id']}")
            print(f"  Danceability: {f['danceability']}")
            print(f"  Energy: {f['energy']}")
            print(f"  Tempo: {f['tempo']} BPM")
            print(f"  Valence: {f['valence']} (happiness)")
            print()

# Example
track_ids = [
    "4iV5W9uYEdYUVa79Axb7Rh",  # One More Time
    "0VjIjW4GlUZAMYd2vXMi3b"   # Blinding Lights
]

get_audio_features(sp, track_ids)

Audio features include danceability, energy, tempo, valence (happiness), and more. Perfect for building recommendation systems or analyzing music trends.
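For example, ranking tracks by danceability takes a few lines (a sketch building on the same endpoint and the track_ids list above):

def rank_by_danceability(sp, track_ids):
    """Sort tracks by danceability, highest first"""
    features = sp.audio_features(track_ids)
    ranked = sorted(
        (f for f in features if f),  # skip tracks with no features
        key=lambda f: f['danceability'],
        reverse=True
    )
    return [(f['id'], f['danceability']) for f in ranked]

for track_id, score in rank_by_danceability(sp, track_ids):
    print(f"{track_id}: {score}")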

Get Artist Discography

def get_artist_albums(sp, artist_id):
    """Fetch all albums from an artist"""
    albums = []
    results = sp.artist_albums(artist_id, album_type='album')
    
    while results:
        albums.extend(results['items'])
        
        if results['next']:
            results = sp.next(results)
        else:
            results = None
    
    return albums

# Example: Get Daft Punk albums
daft_punk_id = "4tZwfgrHOc3mvqYlEYSvVi"
albums = get_artist_albums(sp, daft_punk_id)

for album in albums:
    print(f"{album['name']} ({album['release_date'][:4]})")

Spotipy's next() method handles pagination automatically. Just keep calling until you've retrieved everything.

Method 5: Direct HTTP Requests to Embed API

For lightweight scraping without external libraries, hit Spotify's embed endpoints directly.

The Embed Endpoint

Spotify's embed player exposes track data publicly:

import requests
import urllib.parse
import json

def get_embed_data(track_id):
    """Fetch track data from embed endpoint"""
    url = f"https://open.spotify.com/embed/track/{track_id}"
    
    response = requests.get(url)
    
    if response.status_code != 200:
        return None
    
    # Parse the response HTML for embedded JSON
    html = response.text
    
    # Find the embedded resource data
    start_marker = '"resource":"'
    end_marker = '"}'
    
    start = html.find(start_marker)
    if start == -1:
        return None
    
    # Extract and decode the URI-encoded JSON
    start += len(start_marker)
    end = html.find(end_marker, start)
    
    encoded_data = html[start:end]
    decoded_data = urllib.parse.unquote(encoded_data)
    
    return json.loads(decoded_data)

# Example
data = get_embed_data("4iV5W9uYEdYUVa79Axb7Rh")
if data:
    print(f"Track: {data.get('name')}")
    print(f"Artist: {data.get('artists', [{}])[0].get('name')}")

The embed page contains URI-encoded JSON with track metadata. No authentication required.

Better Parsing with BeautifulSoup

import requests
from bs4 import BeautifulSoup
import json
import urllib.parse

def scrape_embed_track(track_id):
    """Robust embed scraping with BeautifulSoup"""
    url = f"https://open.spotify.com/embed/track/{track_id}"
    
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                      "AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36"
    }
    
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    
    # Find script tag with resource data
    scripts = soup.find_all('script')
    
    for script in scripts:
        if script.string and '"resource"' in script.string:
            # Extract JSON from script
            content = script.string
            start = content.find('{"@context"')
            
            if start != -1:
                # Find matching closing brace
                depth = 0
                for i, char in enumerate(content[start:]):
                    if char == '{':
                        depth += 1
                    elif char == '}':
                        depth -= 1
                        if depth == 0:
                            json_str = content[start:start + i + 1]
                            return json.loads(json_str)
    
    return None

track_data = scrape_embed_track("4iV5W9uYEdYUVa79Axb7Rh")

This parses the embedded JSON-LD structured data that Spotify includes for SEO purposes. More reliable than regex matching.

Handling Rate Limits and Anti-Bot Protections

Scrape Spotify too aggressively and the platform will block you. Here's how to stay under the radar.

Implement Request Delays

import time
import random
import requests

def polite_request(url, headers, min_delay=1, max_delay=3):
    """Make request with random delay"""
    time.sleep(random.uniform(min_delay, max_delay))
    return requests.get(url, headers=headers)

Random delays between 1 and 3 seconds mimic human browsing patterns; consistent timing gets flagged.

Rotate User Agents

import random

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0.0.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 Safari/605.1.15",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36"
]

def get_random_headers():
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.5",
        "Accept-Encoding": "gzip, deflate, br",
        "DNT": "1"
    }

Rotate between real browser user agents. Spotify checks for automated tools through fingerprinting.

Use Residential Proxies

Datacenter IPs get blocked fast. Residential proxies look like real users:

import random
import requests

def make_proxied_request(url, proxies):
    """Rotate through proxy list"""
    proxy = random.choice(proxies)
    
    return requests.get(
        url,
        proxies={
            "http": f"http://{proxy}",
            "https": f"http://{proxy}"
        },
        timeout=30
    )

# Example proxy list
proxies = [
    "user:pass@residential1.roundproxies.com:8080",
    "user:pass@residential2.roundproxies.com:8080",
    "user:pass@residential3.roundproxies.com:8080"
]

Quality residential or ISP proxies from providers like Roundproxies reduce block rates significantly.

Handle Rate Limit Responses

import time
import requests

def robust_request(url, headers, max_retries=5):
    """Handle rate limits with exponential backoff"""
    for attempt in range(max_retries):
        response = requests.get(url, headers=headers)
        
        if response.status_code == 200:
            return response
        
        if response.status_code == 429:  # Too Many Requests
            # Get retry-after header or use exponential backoff
            retry_after = int(response.headers.get('Retry-After', 2 ** attempt))
            print(f"Rate limited. Waiting {retry_after}s...")
            time.sleep(retry_after)
            continue
        
        if response.status_code in [403, 503]:
            # Likely bot detection - switch proxy or wait longer
            wait_time = 60 * (attempt + 1)
            print(f"Blocked. Waiting {wait_time}s...")
            time.sleep(wait_time)
            continue
    
    return None

Exponential backoff increases wait times after each failure. The 429 response often includes a Retry-After header.

Storing Your Scraped Data

Save to CSV

import csv

def save_tracks_csv(tracks, filename):
    """Save track list to CSV file"""
    with open(filename, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=['name', 'artist', 'album', 'duration_ms'])
        writer.writeheader()
        writer.writerows(tracks)

tracks = [
    {"name": "One More Time", "artist": "Daft Punk", "album": "Discovery", "duration_ms": 320357},
    {"name": "Blinding Lights", "artist": "The Weeknd", "album": "After Hours", "duration_ms": 200040}
]

save_tracks_csv(tracks, "spotify_tracks.csv")

CSV works great for smaller datasets and Excel compatibility.

Save to JSON

import json

def save_tracks_json(tracks, filename):
    """Save track list to JSON file"""
    with open(filename, 'w', encoding='utf-8') as f:
        json.dump(tracks, f, indent=2, ensure_ascii=False)

save_tracks_json(tracks, "spotify_tracks.json")

JSON preserves nested structures better. Ideal for complex metadata with arrays.

Store in SQLite

import sqlite3

def create_database():
    """Create SQLite database for track storage"""
    conn = sqlite3.connect('spotify_data.db')
    cursor = conn.cursor()
    
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS tracks (
            id TEXT PRIMARY KEY,
            name TEXT,
            artist TEXT,
            album TEXT,
            duration_ms INTEGER,
            popularity INTEGER,
            scraped_at DATETIME DEFAULT CURRENT_TIMESTAMP
        )
    ''')
    
    conn.commit()
    return conn

def insert_track(conn, track):
    """Insert track into database"""
    cursor = conn.cursor()
    cursor.execute('''
        INSERT OR REPLACE INTO tracks (id, name, artist, album, duration_ms, popularity)
        VALUES (?, ?, ?, ?, ?, ?)
    ''', (
        track['id'],
        track['name'],
        track['artist'],
        track['album'],
        track['duration_ms'],
        track.get('popularity', 0)
    ))
    conn.commit()

SQLite handles larger datasets and supports querying. No server setup required.
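Once stored, the data is queryable. For example, to pull the most popular tracks scraped so far:

def top_tracks(conn, limit=10):
    """Return the most popular stored tracks"""
    cursor = conn.cursor()
    cursor.execute('''
        SELECT name, artist, popularity
        FROM tracks
        ORDER BY popularity DESC
        LIMIT ?
    ''', (limit,))
    return cursor.fetchall()

conn = create_database()
for name, artist, popularity in top_tracks(conn):
    print(f"{name} - {artist} ({popularity})")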

Common Pitfalls to Avoid

Pitfall 1: Ignoring Rate Limits

Hammering Spotify's API gets you banned fast. The official API historically allowed roughly 180 requests per minute, now closer to 100 (see the 2026 changes below). Web scraping should stay under one request per second.

Fix: Implement delays and respect rate limit headers.

Pitfall 2: Hardcoding Selectors

Spotify updates their web interface frequently. XPath selectors that work today might break tomorrow.

Fix: Use data-testid attributes when available—they're more stable than class names.
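For example (the class name in the second selector is illustrative, not Spotify's actual markup):

# Stable: targets a semantic test attribute
rows = parser.xpath('//div[@data-testid="tracklist-row"]')

# Fragile: targets a generated class name that changes between deploys
rows = parser.xpath('//div[contains(@class, "Row_1a2b3c")]')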

Pitfall 3: Missing Pagination

Many endpoints return limited results. Playlist tracks cap at 100. Search results cap at 50.

Fix: Always check for next fields and paginate until exhausted.

Pitfall 4: Not Handling Empty Tracks

Playlists sometimes contain null tracks (removed or unavailable songs).

Fix: Check if track exists before accessing properties:

for item in playlist_data['items']:
    track = item.get('track')
    if track:
        # Process track
        pass

Pitfall 5: Using Free Proxies

Public proxy lists contain honeypots and dead IPs. They'll get you blocked faster.

Fix: Invest in quality residential proxies from reputable providers.

Pipeline Architecture

For recurring scrapes at scale, wrap fetching, storage, and retry handling in a single pipeline class:

import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class SpotifyPipeline:
    def __init__(self, client, db_connection):
        self.client = client
        self.db = db_connection
        self.failed_ids = []
    
    def process_playlist(self, playlist_id: str) -> int:
        """Process entire playlist with error handling"""
        processed = 0
        
        try:
            tracks = self.client.get_playlist_tracks(playlist_id)
            
            for track in tracks:
                try:
                    self.db.insert_track(track)
                    processed += 1
                except Exception as e:
                    logger.error(f"Failed to insert track {track.get('id')}: {e}")
                    self.failed_ids.append(track.get('id'))
            
            logger.info(f"Processed {processed} tracks from playlist {playlist_id}")
            
        except Exception as e:
            logger.error(f"Failed to fetch playlist {playlist_id}: {e}")
        
        return processed
    
    def retry_failed(self) -> int:
        """Retry processing failed tracks"""
        recovered = 0
        
        for track_id in self.failed_ids.copy():
            try:
                track = self.client.get_track(track_id)
                self.db.insert_track(track)
                self.failed_ids.remove(track_id)
                recovered += 1
                time.sleep(1)  # Rate limiting
            except Exception as e:
                logger.warning(f"Retry failed for {track_id}: {e}")
        
        return recovered

This pipeline class handles batch processing with built-in retry logic.
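Wiring it up might look like this (a sketch; client and db are assumed to expose the get_playlist_tracks, get_track, and insert_track methods the class calls):

pipeline = SpotifyPipeline(client, db)

total = pipeline.process_playlist("37i9dQZF1DXcBWIGoYBM5M")
print(f"Inserted {total} tracks")

# Sweep up anything that failed on the first pass
if pipeline.failed_ids:
    recovered = pipeline.retry_failed()
    print(f"Recovered {recovered} failed tracks")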

Scheduling Regular Scrapes

Use cron or a scheduler to run scrapes automatically:

import schedule
import time

def daily_scrape_job():
    """Run daily playlist scrape"""
    playlists = [
        "37i9dQZF1DXcBWIGoYBM5M",  # Today's Top Hits
        "37i9dQZF1DX0XUsuxWHRQd",  # RapCaviar
        "37i9dQZF1DX4JAvHpjipBk"   # New Music Friday
    ]
    
    pipeline = SpotifyPipeline(client, db)
    
    for playlist_id in playlists:
        pipeline.process_playlist(playlist_id)
        time.sleep(60)  # Wait between playlists

# Schedule daily at 2 AM
schedule.every().day.at("02:00").do(daily_scrape_job)

# Run scheduler
while True:
    schedule.run_pending()
    time.sleep(60)

Schedule scrapes during off-peak hours. Space out requests to avoid triggering rate limits.

Monitoring and Alerts

Track your scraping success rate:

from typing import Dict

class ScrapeMonitor:
    def __init__(self):
        self.total_requests = 0
        self.successful = 0
        self.failed = 0
        self.rate_limited = 0
    
    def log_request(self, status_code: int):
        self.total_requests += 1
        
        if status_code == 200:
            self.successful += 1
        elif status_code == 429:
            self.rate_limited += 1
        else:
            self.failed += 1
    
    def get_success_rate(self) -> float:
        if self.total_requests == 0:
            return 0.0
        return (self.successful / self.total_requests) * 100
    
    def should_alert(self) -> bool:
        """Alert if success rate drops below 80%"""
        return self.get_success_rate() < 80.0
    
    def report(self) -> Dict:
        return {
            "total": self.total_requests,
            "successful": self.successful,
            "failed": self.failed,
            "rate_limited": self.rate_limited,
            "success_rate": f"{self.get_success_rate():.1f}%"
        }

Monitor your success rates. If they drop below 80%, you're likely getting detected.
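In practice, log every response as it comes back, reusing the robust_request and get_random_headers helpers from earlier (sketch):

monitor = ScrapeMonitor()

for url in urls_to_scrape:  # your list of target URLs
    response = robust_request(url, get_random_headers())
    monitor.log_request(response.status_code if response else 0)

print(monitor.report())
if monitor.should_alert():
    print("Success rate below 80%: rotate proxies or slow down")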

What Changed in Spotify Scraping for 2026

Spotify tightened security significantly after the December 2025 Anna's Archive incident where hackers scraped 86 million tracks. Here's what changed:

Stricter Rate Limiting

API rate limits dropped from ~180 requests/minute to approximately 100. Web endpoints now implement more aggressive fingerprinting.

Enhanced Bot Detection

Spotify now uses behavioral analysis beyond simple rate limiting. Consistent request timing, missing mouse movements, and automated browser signatures trigger blocks faster.

Embed API Restrictions

Some previously public embed endpoints now require authentication or return limited data. The SpotifyScraper library maintains workarounds, but expect occasional breakage.

Recommendations for 2026

  1. Use the official API whenever possible - It remains the most stable option
  2. Implement human-like delays - Random intervals between 2-5 seconds
  3. Rotate residential proxies - Datacenter IPs get blocked almost immediately
  4. Keep libraries updated - SpotifyScraper releases frequent patches
  5. Cache aggressively - Don't re-fetch data you already have (see the sketch below)
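
On the last point, even a simple on-disk cache slashes request volume. A minimal sketch that wraps the get_track_info function from Method 1 with a JSON file cache:

import json
import os

CACHE_FILE = "track_cache.json"

def load_cache():
    """Load previously fetched tracks from disk"""
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE, encoding='utf-8') as f:
            return json.load(f)
    return {}

def get_track_cached(track_id, token, cache):
    """Return cached track data, hitting the API only on a miss"""
    if track_id not in cache:
        cache[track_id] = get_track_info(track_id, token)
        with open(CACHE_FILE, 'w', encoding='utf-8') as f:
            json.dump(cache, f)
    return cache[track_id]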

The landscape keeps evolving. Stay updated by checking the SpotifyScraper GitHub repository for the latest techniques and workarounds.

Conclusion

You now have five working methods to scrape Spotify data in 2026:

  1. Official Web API - Most reliable, requires authentication
  2. SpotifyScraper library - No auth, uses embed API
  3. Playwright browser automation - Handles JavaScript, supports proxies
  4. Spotipy - Best Python wrapper for API access
  5. Direct HTTP requests - Lightweight embed scraping

Start with the official API for structured data. Use SpotifyScraper for quick no-auth access. Fall back to Playwright when you need rendered JavaScript content.

When you scrape Spotify at scale, remember to respect rate limits, rotate your headers and proxies, and store data efficiently. Building reliable scrapers takes practice, but these methods give you the foundation to extract any public Spotify data you need.

Frequently Asked Questions

Is it legal to scrape Spotify?

Scraping public data for personal or research use generally falls under fair use. However, Spotify's Terms of Service prohibit automated access. The official API is always the safest option.

Can I download full songs from Spotify?

No. Spotify uses Widevine DRM protection on audio streams. You can only download 30-second preview clips legally. Full track downloads require circumventing DRM, which violates the DMCA.

How do I avoid getting blocked while scraping?

Use residential proxies, rotate user agents, add random delays between requests, and limit your request rate. The SpotifyScraper library includes built-in rate limiting.

What's the best proxy type for Spotify scraping?

Residential proxies work best since they use real ISP IP addresses. Datacenter proxies get detected quickly. Mobile proxies offer the highest success rates but cost more.

How often does Spotify change their website structure?

Spotify updates their web interface every few weeks. Class names change frequently. Use data-testid attributes for selectors—they remain stable across updates. The official API structure rarely changes.