SoundCloud scraping lets you extract track metadata, artist profiles, play counts, and audio information from one of the world's largest music platforms. This guide covers multiple approaches to scrape SoundCloud effectively in 2026, from lightweight HTTP requests to full browser automation.
Whether you're building a music analytics tool, tracking artist performance, or collecting data for research, you'll find working code examples for every skill level.
What is SoundCloud Scraping?
When you scrape SoundCloud, you programmatically extract data from its website or internal API endpoints. SoundCloud hosts over 320 million tracks from independent artists, podcasters, and creators worldwide.
The data you can extract includes track titles, artist names, play counts, likes, comments, waveform data, genre tags, and profile information. That makes SoundCloud a valuable source for music trend analysis, competitor research, and building recommendation systems.
SoundCloud loads most content dynamically through JavaScript. This presents a challenge because simple HTTP requests won't see data that gets rendered after page load.
You have three main options for scraping SoundCloud:
- API-based extraction - Intercept SoundCloud's internal API calls
- HTTP requests with session handling - Simulate browser requests directly
- Browser automation - Use Playwright or Puppeteer to render JavaScript
Each method has trade-offs between speed, reliability, and complexity. Let's explore all three.
Prerequisites and Setup
Before diving into code, set up your Python environment.
Install the required packages:
pip install requests beautifulsoup4 playwright pandas
For Playwright, you also need to install browser binaries:
playwright install chromium
Create a project folder and add this base configuration file:
# config.py
import random
USER_AGENTS = [
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0",
]
def get_random_ua():
return random.choice(USER_AGENTS)
# Request delays to avoid rate limiting
MIN_DELAY = 1.5
MAX_DELAY = 3.0
# Base headers for requests
BASE_HEADERS = {
"Accept": "application/json, text/javascript, */*; q=0.01",
"Accept-Language": "en-US,en;q=0.9",
"Origin": "https://soundcloud.com",
"Referer": "https://soundcloud.com/",
}
This configuration rotates user agents and sets appropriate delays between requests.
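In practice, you import these values wherever you make requests. Here's a minimal sketch of the pattern, assuming config.py sits in the same folder (the discover URL is just a placeholder target):
# config_usage.py: minimal sketch of how config.py is meant to be used
import random
import time

import requests

from config import BASE_HEADERS, MAX_DELAY, MIN_DELAY, get_random_ua

urls = ["https://soundcloud.com/discover"]  # placeholder list of pages to fetch

for url in urls:
    headers = {**BASE_HEADERS, "User-Agent": get_random_ua()}  # fresh UA per request
    response = requests.get(url, headers=headers, timeout=15)
    print(url, response.status_code)
    time.sleep(random.uniform(MIN_DELAY, MAX_DELAY))  # polite delay between requests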
Method 1: Extracting the Client ID
SoundCloud's internal API requires a client_id parameter for authentication. This ID changes periodically, so you need to extract it dynamically.
Here's how to find the current client ID:
# extract_client_id.py
import requests
import re
from config import get_random_ua, BASE_HEADERS
def get_client_id():
"""
Extract SoundCloud's client_id from their JavaScript bundles.
The client_id is embedded in one of the app's JS files.
"""
headers = {**BASE_HEADERS, "User-Agent": get_random_ua()}
# First, get the main page to find JS bundle URLs
response = requests.get("https://soundcloud.com", headers=headers)
if response.status_code != 200:
raise Exception(f"Failed to load SoundCloud: {response.status_code}")
# Find all script URLs that match SoundCloud's asset pattern
script_pattern = r'https://a-v2\.sndcdn\.com/assets/[a-zA-Z0-9-]+\.js'
script_urls = re.findall(script_pattern, response.text)
# The client_id is typically in one of the last JS bundles
for url in reversed(script_urls):
js_response = requests.get(url, headers=headers)
if js_response.status_code == 200:
# Look for client_id pattern in the JS code
client_id_match = re.search(
r'client_id["\']?\s*[:=]\s*["\']([a-zA-Z0-9]{32})["\']',
js_response.text
)
if client_id_match:
return client_id_match.group(1)
raise Exception("Could not find client_id in SoundCloud scripts")
if __name__ == "__main__":
client_id = get_client_id()
print(f"Found client_id: {client_id}")
The script fetches SoundCloud's homepage, finds all JavaScript bundle URLs, then searches through them for the client ID pattern.
Save your extracted client ID because you'll use it for all API-based scraping methods.
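Because extraction costs several extra requests, consider caching the ID on disk and re-extracting only when it stops working. Here's a minimal sketch; the cache filename and six-hour expiry are arbitrary choices, not SoundCloud requirements:
# client_id_cache.py: cache the extracted client_id on disk (sketch)
import json
import time
from pathlib import Path

from extract_client_id import get_client_id

CACHE_FILE = Path("client_id_cache.json")

def get_cached_client_id(max_age_seconds=6 * 3600):
    """Return a cached client_id if it is recent enough, otherwise re-extract it."""
    if CACHE_FILE.exists():
        cached = json.loads(CACHE_FILE.read_text())
        if time.time() - cached["fetched_at"] < max_age_seconds:
            return cached["client_id"]
    client_id = get_client_id()
    CACHE_FILE.write_text(json.dumps({"client_id": client_id, "fetched_at": time.time()}))
    return client_id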
Method 2: API-Based Track Scraping
With the client ID in hand, you can query SoundCloud's internal API directly. This is the fastest approach and returns clean JSON data.
# scrape_tracks.py
import requests
import time
import random
import pandas as pd
from config import get_random_ua, BASE_HEADERS, MIN_DELAY, MAX_DELAY
from extract_client_id import get_client_id
class SoundCloudScraper:
"""
Scrape SoundCloud tracks using their internal API.
Returns structured data including play counts, likes, and metadata.
"""
def __init__(self, client_id=None):
self.client_id = client_id or get_client_id()
self.base_url = "https://api-v2.soundcloud.com"
self.session = requests.Session()
self.session.headers.update({
**BASE_HEADERS,
"User-Agent": get_random_ua()
})
def search_tracks(self, query, limit=50):
"""
Search for tracks matching a query string.
Returns list of track dictionaries with metadata.
"""
endpoint = f"{self.base_url}/search/tracks"
params = {
"q": query,
"client_id": self.client_id,
"limit": min(limit, 50), # API limit per request
"offset": 0,
}
all_tracks = []
while len(all_tracks) < limit:
response = self.session.get(endpoint, params=params)
if response.status_code != 200:
print(f"API error: {response.status_code}")
break
data = response.json()
tracks = data.get("collection", [])
if not tracks:
break
for track in tracks:
all_tracks.append(self._parse_track(track))
# Check if there are more results
next_href = data.get("next_href")
if not next_href:
break
params["offset"] += 50
# Respect rate limits
time.sleep(random.uniform(MIN_DELAY, MAX_DELAY))
return all_tracks[:limit]
def _parse_track(self, track_data):
"""
Extract relevant fields from raw API response.
Handles missing fields gracefully.
"""
user = track_data.get("user", {})
return {
"id": track_data.get("id"),
"title": track_data.get("title"),
"artist": user.get("username"),
"artist_id": user.get("id"),
"duration_ms": track_data.get("duration"),
"play_count": track_data.get("playback_count", 0),
"like_count": track_data.get("likes_count", 0),
"repost_count": track_data.get("reposts_count", 0),
"comment_count": track_data.get("comment_count", 0),
"genre": track_data.get("genre"),
"tags": track_data.get("tag_list"),
"created_at": track_data.get("created_at"),
"permalink_url": track_data.get("permalink_url"),
"downloadable": track_data.get("downloadable", False),
"artwork_url": track_data.get("artwork_url"),
}
def get_user_tracks(self, user_id, limit=100):
"""
Get all tracks uploaded by a specific user.
Useful for artist analysis and catalog scraping.
"""
endpoint = f"{self.base_url}/users/{user_id}/tracks"
params = {
"client_id": self.client_id,
"limit": 50,
"offset": 0,
}
all_tracks = []
while len(all_tracks) < limit:
response = self.session.get(endpoint, params=params)
if response.status_code != 200:
break
data = response.json()
tracks = data.get("collection", [])
if not tracks:
break
for track in tracks:
all_tracks.append(self._parse_track(track))
params["offset"] += 50
time.sleep(random.uniform(MIN_DELAY, MAX_DELAY))
return all_tracks[:limit]
def resolve_url(self, soundcloud_url):
"""
Convert a SoundCloud URL to API resource data.
Works with track, user, and playlist URLs.
"""
endpoint = f"{self.base_url}/resolve"
params = {
"url": soundcloud_url,
"client_id": self.client_id,
}
response = self.session.get(endpoint, params=params)
if response.status_code == 200:
return response.json()
return None
if __name__ == "__main__":
scraper = SoundCloudScraper()
# Search for tracks
tracks = scraper.search_tracks("electronic music", limit=20)
# Convert to DataFrame for analysis
df = pd.DataFrame(tracks)
print(f"Found {len(tracks)} tracks")
print(df[["title", "artist", "play_count", "genre"]].head(10))
# Save to CSV
df.to_csv("soundcloud_tracks.csv", index=False)
print("Saved to soundcloud_tracks.csv")
This scraper handles pagination automatically, respects rate limits, and outputs clean structured data.
The resolve_url method is particularly useful. Pass any SoundCloud URL and it returns the full API data for that resource.
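For example, resolving an arbitrary URL looks like this (the URL below is a placeholder; substitute any public track, user, or playlist link):
# resolve_example.py: sketch of resolving a SoundCloud URL to API data
from scrape_tracks import SoundCloudScraper

scraper = SoundCloudScraper()

data = scraper.resolve_url("https://soundcloud.com/some-artist/some-track")  # placeholder URL
if data:
    kind = data.get("kind")  # e.g. "track", "user", or "playlist"
    print(f"Resolved a {kind}: {data.get('title') or data.get('username')}")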
Method 3: Browser Automation with Playwright
Some SoundCloud pages require full JavaScript rendering. Browser automation handles these cases reliably.
# scrape_with_playwright.py
import asyncio
import json
from playwright.async_api import async_playwright
import pandas as pd
class SoundCloudBrowserScraper:
"""
Scrape SoundCloud using browser automation.
Handles JavaScript rendering and dynamic content loading.
"""
def __init__(self):
self.browser = None
self.context = None
async def init_browser(self, headless=True, proxy=None):
"""
Initialize Playwright browser with optional proxy support.
Use headless=False for debugging.
"""
playwright = await async_playwright().start()
browser_args = {
"headless": headless,
"args": [
"--disable-blink-features=AutomationControlled",
"--no-sandbox",
]
}
self.browser = await playwright.chromium.launch(**browser_args)
context_args = {
"viewport": {"width": 1920, "height": 1080},
"user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
}
# Add proxy if provided
if proxy:
context_args["proxy"] = {
"server": proxy["server"],
"username": proxy.get("username"),
"password": proxy.get("password"),
}
self.context = await self.browser.new_context(**context_args)
async def scrape_search_results(self, query, max_results=50):
"""
Scrape tracks from SoundCloud search results page.
Scrolls to load additional results.
"""
page = await self.context.new_page()
# Navigate to search results
search_url = f"https://soundcloud.com/search/sounds?q={query}"
await page.goto(search_url, wait_until="networkidle")
# Handle cookie consent if present
try:
consent_btn = page.locator("#onetrust-accept-btn-handler")
if await consent_btn.is_visible(timeout=3000):
await consent_btn.click()
await page.wait_for_timeout(500)
except:
pass
tracks = []
previous_count = 0
while len(tracks) < max_results:
# Scroll to load more results
await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
await page.wait_for_timeout(2000)
# Extract track data from page
track_elements = await page.locator(".searchList__item").all()
for element in track_elements[previous_count:]:
try:
track_data = await self._extract_track_from_element(element)
if track_data and track_data not in tracks:
tracks.append(track_data)
except Exception as e:
continue
# Check if we've loaded all results
if len(track_elements) == previous_count:
break
previous_count = len(track_elements)
await page.close()
return tracks[:max_results]
async def _extract_track_from_element(self, element):
"""
Parse track information from a search result element.
"""
title_el = element.locator(".soundTitle__title")
artist_el = element.locator(".soundTitle__usernameText")
title = await title_el.inner_text() if await title_el.count() > 0 else None
artist = await artist_el.inner_text() if await artist_el.count() > 0 else None
if not title:
return None
# Get the track URL
link_el = element.locator(".soundTitle__title a")
href = await link_el.get_attribute("href") if await link_el.count() > 0 else None
url = f"https://soundcloud.com{href}" if href else None
return {
"title": title.strip(),
"artist": artist.strip() if artist else None,
"url": url,
}
async def scrape_artist_page(self, artist_url):
"""
Scrape all tracks from an artist's profile page.
Handles infinite scrolling for large catalogs.
"""
page = await self.context.new_page()
# Navigate to artist's tracks page
tracks_url = f"{artist_url}/tracks" if not artist_url.endswith("/tracks") else artist_url
await page.goto(tracks_url, wait_until="networkidle")
# Accept cookies
try:
await page.click("#onetrust-accept-btn-handler", timeout=3000)
except:
pass
tracks = []
previous_count = 0
scroll_attempts = 0
max_scroll_attempts = 20
while scroll_attempts < max_scroll_attempts:
await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
await page.wait_for_timeout(2000)
sound_items = await page.locator(".soundList__item").all()
if len(sound_items) == previous_count:
scroll_attempts += 1
else:
scroll_attempts = 0
previous_count = len(sound_items)
# Extract all loaded tracks
sound_items = await page.locator(".soundList__item").all()
for item in sound_items:
try:
track_data = await self._extract_track_from_sound_item(item)
if track_data:
tracks.append(track_data)
except:
continue
await page.close()
return tracks
async def _extract_track_from_sound_item(self, item):
"""
Extract track data from artist page sound item.
"""
title_el = item.locator(".soundTitle__title span")
title_text = await title_el.first.inner_text() if await title_el.count() > 0 else None
if not title_text:
return None
# Get play count if visible
play_el = item.locator(".sc-ministats-plays")
plays = None
if await play_el.count() > 0:
plays_text = await play_el.inner_text()
plays = self._parse_count(plays_text)
return {
"title": title_text.strip(),
"play_count": plays,
}
def _parse_count(self, count_text):
"""
Convert formatted count strings like '1.2M' to integers.
"""
if not count_text:
return None
count_text = count_text.strip().upper()
multipliers = {"K": 1000, "M": 1000000, "B": 1000000000}
for suffix, mult in multipliers.items():
if suffix in count_text:
try:
num = float(count_text.replace(suffix, ""))
return int(num * mult)
except:
return None
try:
return int(count_text.replace(",", ""))
except:
return None
async def close(self):
"""Clean up browser resources."""
if self.context:
await self.context.close()
if self.browser:
await self.browser.close()
async def main():
scraper = SoundCloudBrowserScraper()
await scraper.init_browser(headless=True)
# Scrape search results
tracks = await scraper.scrape_search_results("ambient music", max_results=30)
print(f"Found {len(tracks)} tracks")
for track in tracks[:5]:
print(f" {track['artist']} - {track['title']}")
await scraper.close()
if __name__ == "__main__":
asyncio.run(main())
Playwright renders the JavaScript for you, and the scrolling loops above handle infinite-scroll pagination. This approach works for pages where the API method fails.
The downside is speed. Browser automation runs slower than direct API calls. Use it when you need to scrape content that requires JavaScript rendering.
Method 4: Scraping with Proxies
High-volume scraping requires rotating proxies to avoid IP blocks. Here's how to integrate residential proxies into your scraper.
# scrape_with_proxies.py
import requests
import time
import random
from config import get_random_ua, BASE_HEADERS, MIN_DELAY, MAX_DELAY
class ProxiedSoundCloudScraper:
"""
SoundCloud scraper with rotating proxy support.
Suitable for high-volume data collection.
"""
def __init__(self, client_id, proxy_config=None):
"""
Initialize with proxy configuration.
proxy_config format:
{
"host": "proxy.example.com",
"port": 8080,
"username": "user",
"password": "pass"
}
"""
self.client_id = client_id
self.proxy_config = proxy_config
self.base_url = "https://api-v2.soundcloud.com"
def _get_proxies(self):
"""Build proxy dict for requests library."""
if not self.proxy_config:
return None
auth = ""
if self.proxy_config.get("username"):
auth = f"{self.proxy_config['username']}:{self.proxy_config['password']}@"
proxy_url = f"http://{auth}{self.proxy_config['host']}:{self.proxy_config['port']}"
return {
"http": proxy_url,
"https": proxy_url,
}
def search_tracks(self, query, limit=100):
"""
Search tracks with proxy rotation.
Automatically retries on failure.
"""
endpoint = f"{self.base_url}/search/tracks"
params = {
"q": query,
"client_id": self.client_id,
"limit": 50,
"offset": 0,
}
all_tracks = []
failures = 0
max_failures = 3
while len(all_tracks) < limit and failures < max_failures:
headers = {
**BASE_HEADERS,
"User-Agent": get_random_ua()
}
try:
response = requests.get(
endpoint,
params=params,
headers=headers,
proxies=self._get_proxies(),
timeout=15
)
if response.status_code == 200:
data = response.json()
tracks = data.get("collection", [])
if not tracks:
break
all_tracks.extend(tracks)
params["offset"] += 50
failures = 0
elif response.status_code == 429:
# Rate limited - increase delay
time.sleep(10)
failures += 1
else:
failures += 1
except requests.exceptions.RequestException as e:
print(f"Request failed: {e}")
failures += 1
time.sleep(random.uniform(MIN_DELAY, MAX_DELAY))
return all_tracks[:limit]
# Example usage with Roundproxies residential proxies
if __name__ == "__main__":
# Configure your proxy settings
proxy_config = {
"host": "gate.roundproxies.com",
"port": 5432,
"username": "your_username",
"password": "your_password"
}
# Get client_id first
from extract_client_id import get_client_id
client_id = get_client_id()
# Initialize scraper with proxies
scraper = ProxiedSoundCloudScraper(client_id, proxy_config)
tracks = scraper.search_tracks("hip hop beats", limit=50)
print(f"Scraped {len(tracks)} tracks with proxy rotation")
Residential proxies from providers like Roundproxies.com route requests through real user IPs, making them harder to detect and block.
For large-scale scraping, rotate between multiple proxy endpoints and use session stickiness for sequences of related requests.
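A simple way to rotate is to cycle through a list of proxy endpoints and build a fresh proxies dict per request. The endpoints below are placeholders; adapt _get_proxies to call something like this:
# proxy_rotation.py: sketch of cycling through several proxy endpoints
import itertools

# Placeholder endpoints: replace with your provider's gateways or ports
PROXY_ENDPOINTS = [
    {"host": "gate.roundproxies.com", "port": 5432, "username": "user", "password": "pass"},
    {"host": "gate.roundproxies.com", "port": 5433, "username": "user", "password": "pass"},
]

proxy_cycle = itertools.cycle(PROXY_ENDPOINTS)

def next_proxies():
    """Return a requests-style proxy dict for the next endpoint in the rotation."""
    cfg = next(proxy_cycle)
    proxy_url = f"http://{cfg['username']}:{cfg['password']}@{cfg['host']}:{cfg['port']}"
    return {"http": proxy_url, "https": proxy_url}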
Handling Common Challenges
SoundCloud scraping presents several challenges. Here's how to handle them.
Rate Limiting
SoundCloud limits API requests per IP. The signs include 429 response codes and empty responses.
Implement exponential backoff:
import random
import time

import requests

def request_with_backoff(url, params, max_retries=5):
"""Make requests with exponential backoff on failure."""
base_delay = 2
for attempt in range(max_retries):
response = requests.get(url, params=params)
if response.status_code == 200:
return response
if response.status_code == 429:
delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
time.sleep(delay)
else:
break
return None
Geo-Blocking
Some tracks are restricted by region. Use proxies from the appropriate country to access geo-locked content.
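Many residential providers let you pick the exit country through the proxy username or a dedicated port. The exact syntax varies by provider, so the snippet below is purely illustrative:
# geo_proxy.py: illustrative only; country-targeting syntax differs per provider
proxy_config = {
    "host": "gate.roundproxies.com",          # placeholder gateway
    "port": 5432,
    "username": "your_username-country-de",   # hypothetical country flag in the username
    "password": "your_password",
}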
Client ID Expiration
SoundCloud rotates client IDs periodically. Build in automatic refresh logic:
def ensure_valid_client_id(self):
"""Verify client_id works, refresh if needed."""
test_url = f"{self.base_url}/search/tracks"
params = {"q": "test", "client_id": self.client_id, "limit": 1}
response = requests.get(test_url, params=params)
if response.status_code != 200:
self.client_id = get_client_id()
Dynamic Selectors
SoundCloud updates their HTML classes frequently. Use multiple fallback selectors:
title_selectors = [
".soundTitle__title > .sc-link-dark",
".trackItem__trackTitle",
"h3 a.sc-link-dark",
]
for selector in title_selectors:
element = page.locator(selector)
if await element.count() > 0:
return await element.inner_text()
Exporting and Analyzing Data
Once you've collected data, export it for analysis.
# export_and_analyze.py
import pandas as pd
from datetime import datetime
def export_tracks(tracks, filename=None):
"""
Export track data to CSV with timestamp.
"""
df = pd.DataFrame(tracks)
if not filename:
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
filename = f"soundcloud_tracks_{timestamp}.csv"
df.to_csv(filename, index=False)
print(f"Exported {len(tracks)} tracks to {filename}")
return df
def analyze_tracks(df):
"""
Generate basic statistics from scraped data.
"""
print("\n=== Track Analysis ===")
print(f"Total tracks: {len(df)}")
if "play_count" in df.columns:
print(f"Total plays: {df['play_count'].sum():,}")
print(f"Average plays: {df['play_count'].mean():,.0f}")
print(f"Top track: {df.loc[df['play_count'].idxmax(), 'title']}")
if "genre" in df.columns:
print(f"\nTop genres:")
print(df["genre"].value_counts().head(5))
if "artist" in df.columns:
print(f"\nMost prolific artists:")
print(df["artist"].value_counts().head(5))
# Example usage
if __name__ == "__main__":
from scrape_tracks import SoundCloudScraper
scraper = SoundCloudScraper()
tracks = scraper.search_tracks("indie rock", limit=100)
df = export_tracks(tracks)
analyze_tracks(df)
Legal and Ethical Considerations
Web scraping exists in a legal gray area. Follow these guidelines to scrape SoundCloud responsibly.
Respect robots.txt
Check SoundCloud's robots.txt file for crawl directives. Honor any restrictions specified there.
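Python's standard library can run the check for you. A small sketch:
# robots_check.py: test a URL against SoundCloud's robots.txt before crawling it
from urllib.robotparser import RobotFileParser

parser = RobotFileParser("https://soundcloud.com/robots.txt")
parser.read()

url = "https://soundcloud.com/search/sounds?q=ambient"
if parser.can_fetch("*", url):
    print("Allowed to crawl:", url)
else:
    print("Disallowed by robots.txt:", url)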
Rate Limit Your Requests
Don't hammer the servers. Add delays between requests and implement backoff when you receive errors.
Don't Scrape Private Content
Only access publicly available data. Avoid attempting to access private tracks or user information.
Use Data Responsibly
Scraped data should be used for legitimate purposes like research, analysis, or building tools that add value. Don't redistribute copyrighted content.
Check Terms of Service
Review SoundCloud's Terms of Service before scraping. Commercial use may require explicit permission.
Scraping Playlists and User Profiles
Beyond individual tracks, you can scrape entire playlists and user profile data.
# scrape_playlists.py
import requests
import time
from config import get_random_ua, BASE_HEADERS, MIN_DELAY, MAX_DELAY
class PlaylistScraper:
"""
Extract tracks from SoundCloud playlists and sets.
"""
def __init__(self, client_id):
self.client_id = client_id
self.base_url = "https://api-v2.soundcloud.com"
self.session = requests.Session()
self.session.headers.update({
**BASE_HEADERS,
"User-Agent": get_random_ua()
})
def get_playlist_tracks(self, playlist_url):
"""
Get all tracks from a playlist by URL.
Returns playlist metadata and track list.
"""
# First resolve the playlist URL to get ID
resolve_endpoint = f"{self.base_url}/resolve"
params = {
"url": playlist_url,
"client_id": self.client_id,
}
response = self.session.get(resolve_endpoint, params=params)
if response.status_code != 200:
return None
playlist_data = response.json()
# Extract basic playlist info
result = {
"id": playlist_data.get("id"),
"title": playlist_data.get("title"),
"creator": playlist_data.get("user", {}).get("username"),
"track_count": playlist_data.get("track_count"),
"likes_count": playlist_data.get("likes_count"),
"tracks": []
}
# Get full track details
tracks = playlist_data.get("tracks", [])
for track in tracks:
result["tracks"].append({
"id": track.get("id"),
"title": track.get("title"),
"artist": track.get("user", {}).get("username"),
"duration_ms": track.get("duration"),
"play_count": track.get("playback_count", 0),
})
return result
def get_user_profile(self, username):
"""
Scrape comprehensive user profile data.
Includes stats, description, and social links.
"""
user_url = f"https://soundcloud.com/{username}"
resolve_endpoint = f"{self.base_url}/resolve"
params = {
"url": user_url,
"client_id": self.client_id,
}
response = self.session.get(resolve_endpoint, params=params)
if response.status_code != 200:
return None
user_data = response.json()
return {
"id": user_data.get("id"),
"username": user_data.get("username"),
"full_name": user_data.get("full_name"),
"description": user_data.get("description"),
"city": user_data.get("city"),
"country": user_data.get("country_code"),
"followers_count": user_data.get("followers_count"),
"followings_count": user_data.get("followings_count"),
"track_count": user_data.get("track_count"),
"playlist_count": user_data.get("playlist_count"),
"likes_count": user_data.get("likes_count"),
"avatar_url": user_data.get("avatar_url"),
"created_at": user_data.get("created_at"),
"verified": user_data.get("verified", False),
}
def get_user_followers(self, user_id, limit=100):
"""
Get list of users following a specific account.
Useful for network analysis.
"""
endpoint = f"{self.base_url}/users/{user_id}/followers"
params = {
"client_id": self.client_id,
"limit": 50,
"offset": 0,
}
all_followers = []
while len(all_followers) < limit:
response = self.session.get(endpoint, params=params)
if response.status_code != 200:
break
data = response.json()
followers = data.get("collection", [])
if not followers:
break
for follower in followers:
all_followers.append({
"id": follower.get("id"),
"username": follower.get("username"),
"followers_count": follower.get("followers_count"),
})
params["offset"] += 50
time.sleep(MIN_DELAY)
return all_followers[:limit]
if __name__ == "__main__":
from extract_client_id import get_client_id
client_id = get_client_id()
scraper = PlaylistScraper(client_id)
# Get playlist data
playlist = scraper.get_playlist_tracks(
"https://soundcloud.com/example-user/sets/my-playlist"
)
if playlist:
print(f"Playlist: {playlist['title']}")
print(f"Tracks: {playlist['track_count']}")
This scraper extracts playlist metadata, track listings, and user profile information including follower counts and social stats.
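Profile and follower scraping follows the same pattern as the playlist example. A quick sketch (the username is a placeholder):
# profile_example.py: sketch of pulling profile and follower data
from extract_client_id import get_client_id
from scrape_playlists import PlaylistScraper

scraper = PlaylistScraper(get_client_id())

profile = scraper.get_user_profile("example-user")  # placeholder username
if profile:
    print(f"{profile['username']}: {profile['followers_count']} followers")
    followers = scraper.get_user_followers(profile["id"], limit=100)
    print(f"Fetched {len(followers)} follower records")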
Storing Data in a Database
For ongoing data collection, store results in a database rather than CSV files.
# database_storage.py
import sqlite3
from datetime import datetime
class SoundCloudDatabase:
"""
SQLite storage for scraped SoundCloud data.
Supports incremental updates and deduplication.
"""
def __init__(self, db_path="soundcloud_data.db"):
self.conn = sqlite3.connect(db_path)
self._create_tables()
def _create_tables(self):
"""Initialize database schema."""
cursor = self.conn.cursor()
cursor.execute("""
CREATE TABLE IF NOT EXISTS tracks (
id INTEGER PRIMARY KEY,
title TEXT,
artist TEXT,
artist_id INTEGER,
duration_ms INTEGER,
play_count INTEGER,
like_count INTEGER,
genre TEXT,
permalink_url TEXT,
created_at TEXT,
scraped_at TEXT
)
""")
cursor.execute("""
CREATE TABLE IF NOT EXISTS users (
id INTEGER PRIMARY KEY,
username TEXT,
full_name TEXT,
followers_count INTEGER,
track_count INTEGER,
city TEXT,
country TEXT,
scraped_at TEXT
)
""")
cursor.execute("""
CREATE INDEX IF NOT EXISTS idx_tracks_artist
ON tracks(artist_id)
""")
self.conn.commit()
def save_track(self, track_data):
"""
Insert or update a track record.
Updates play counts on duplicate.
"""
cursor = self.conn.cursor()
cursor.execute("""
INSERT OR REPLACE INTO tracks
(id, title, artist, artist_id, duration_ms, play_count,
like_count, genre, permalink_url, created_at, scraped_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
""", (
track_data.get("id"),
track_data.get("title"),
track_data.get("artist"),
track_data.get("artist_id"),
track_data.get("duration_ms"),
track_data.get("play_count"),
track_data.get("like_count"),
track_data.get("genre"),
track_data.get("permalink_url"),
track_data.get("created_at"),
datetime.now().isoformat()
))
self.conn.commit()
def save_tracks_batch(self, tracks):
"""Bulk insert multiple tracks efficiently."""
for track in tracks:
self.save_track(track)
def get_tracks_by_artist(self, artist_id):
"""Retrieve all tracks by a specific artist."""
cursor = self.conn.cursor()
cursor.execute(
"SELECT * FROM tracks WHERE artist_id = ?",
(artist_id,)
)
return cursor.fetchall()
def get_top_tracks(self, limit=100):
"""Get tracks sorted by play count."""
cursor = self.conn.cursor()
cursor.execute(
"SELECT * FROM tracks ORDER BY play_count DESC LIMIT ?",
(limit,)
)
return cursor.fetchall()
def close(self):
"""Close database connection."""
self.conn.close()
SQLite handles storage for most scraping projects. For larger scale operations, migrate to PostgreSQL or MongoDB.
Scheduling Automated Scraping Jobs
Run your scrapers on a schedule to collect data over time. The example below uses the schedule package, which isn't in the earlier install list, so add it with pip install schedule.
# scheduler.py
import schedule
import time
from scrape_tracks import SoundCloudScraper
from database_storage import SoundCloudDatabase
def daily_trending_scrape():
"""
Scheduled job to scrape trending tracks daily.
"""
print(f"Starting daily scrape at {time.strftime('%Y-%m-%d %H:%M:%S')}")
scraper = SoundCloudScraper()
db = SoundCloudDatabase()
# Scrape multiple genres
genres = ["electronic", "hip-hop", "indie", "pop", "ambient"]
for genre in genres:
try:
tracks = scraper.search_tracks(f"trending {genre}", limit=50)
db.save_tracks_batch(tracks)
print(f"Saved {len(tracks)} {genre} tracks")
except Exception as e:
print(f"Error scraping {genre}: {e}")
db.close()
print("Daily scrape complete")
# Schedule the job
schedule.every().day.at("02:00").do(daily_trending_scrape)
if __name__ == "__main__":
print("Scheduler started. Press Ctrl+C to exit.")
while True:
schedule.run_pending()
time.sleep(60)
This scheduler runs a scraping job at 2 AM daily. Adjust the timing based on when SoundCloud sees the least traffic.
For production deployments, use a proper task queue like Celery or run jobs through cron.
Troubleshooting Common Issues
Here are solutions to problems you'll encounter when scraping SoundCloud.
Empty Responses After Working Initially
Your client ID probably expired. SoundCloud rotates these anywhere from every few hours to every few days. Refresh it using the extraction function.
403 Forbidden Errors
Your IP address has likely been blocked. Solutions include:
- Switch to a residential proxy
- Increase delays between requests
- Rotate user agents more frequently
Missing Data Fields
SoundCloud returns different fields based on the endpoint and track status. Always use .get() with defaults when accessing nested data.
Slow Playwright Scraping
Browser automation is inherently slower. Speed it up by:
- Blocking images and media:
page.route("**/*.{png,jpg,jpeg,gif,webp}", lambda route: route.abort())
- Using headless mode
- Reusing browser contexts instead of creating new ones (see the sketch below)
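Here's a sketch of the first and third tips using the async API. It assumes the setup from Method 3; the discover URL is just an example page:
# fast_context.py: block heavy assets and reuse one context across pages
import asyncio
from playwright.async_api import async_playwright

async def main():
    pw = await async_playwright().start()
    browser = await pw.chromium.launch(headless=True)
    context = await browser.new_context()  # reuse this one context for every page

    # Abort image requests so pages render faster
    await context.route("**/*.{png,jpg,jpeg,gif,webp}", lambda route: route.abort())

    page = await context.new_page()
    await page.goto("https://soundcloud.com/discover")
    print(await page.title())

    await browser.close()
    await pw.stop()

if __name__ == "__main__":
    asyncio.run(main())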
Inconsistent Results
SoundCloud personalizes results based on location and history. Use a consistent proxy location and clear cookies between sessions for reproducible results.
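With the Method 3 scraper, that amounts to one clear_cookies call between runs. A minimal sketch:
# reset_state.py: clear cookies between runs so results stay comparable
import asyncio
from scrape_with_playwright import SoundCloudBrowserScraper

async def main():
    scraper = SoundCloudBrowserScraper()
    await scraper.init_browser(headless=True)

    first = await scraper.scrape_search_results("ambient music", max_results=10)
    await scraper.context.clear_cookies()  # drop any personalization before the next run
    second = await scraper.scrape_search_results("ambient music", max_results=10)

    print(len(first), len(second))
    await scraper.close()

if __name__ == "__main__":
    asyncio.run(main())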
Conclusion
You now have multiple proven methods to scrape SoundCloud effectively in 2026. The API-based approach offers speed and clean data. Browser automation handles JavaScript-heavy pages. Proxy integration enables scale. If you need to scrape SoundCloud data at volume, combining all three approaches gives you maximum flexibility.
Start with the API method for most use cases. It's faster and returns structured JSON. Fall back to Playwright when you encounter pages that require full JavaScript rendering.
For production scraping at scale, combine these methods with residential proxies to maintain reliable access. Monitor for changes in SoundCloud's structure and update your selectors accordingly.
The code examples in this guide are complete and working. Adapt them to your specific data collection needs and always scrape responsibly.
Frequently Asked Questions
Can I scrape SoundCloud without getting blocked?
Yes, by using reasonable request delays, rotating user agents, and residential proxies. Avoid aggressive scraping patterns that trigger rate limits.
Is scraping SoundCloud legal?
Scraping publicly available data is generally permitted for personal use and research. Commercial applications may require explicit permission from SoundCloud. Always review their Terms of Service.
How do I download audio files from SoundCloud?
This guide focuses on metadata extraction. Downloading audio files involves additional legal considerations around copyright. Only download tracks with explicit download permissions enabled by the creator.
What's the best programming language for SoundCloud scraping?
Python offers the best combination of libraries for web scraping. The requests library handles API calls efficiently, while Playwright manages browser automation when needed.
How often does SoundCloud change their API?
SoundCloud's internal API endpoints remain relatively stable, but client IDs rotate frequently. Build your scraper to refresh credentials automatically rather than hardcoding values.