4 easy ways to scrape AutoScout24 in 2026

07 June 2026

AutoScout24 is Europe's largest car marketplace, with more than 2 million live listings across 18 countries. That's a lot of pricing data sitting behind a search box.

This guide shows you four ways to scrape AutoScout24 with Python, ordered from easiest to hardest. You build and run all of them yourself. No paid scraping API, no black box.

What is AutoScout24 scraping?

Scraping AutoScout24 means pulling vehicle listing data straight from the site with code instead of copying it by hand. You get prices, mileage, year, fuel type, transmission, and seller details as structured records. The reliable way is to read the embedded JSON the page already ships, then handle Akamai's bot checks with a real browser only when you have to.

Most tutorials skip the first part. They parse HTML classes that break every release. We'll start with the durable approach instead.

Why scrape AutoScout24?

People scrape AutoScout24 for a handful of concrete reasons, and they all come back to pricing.

Dealers watch competitor prices across markets. A BMW 3 Series can sit €2,000 cheaper in Italy than in Germany, and that gap is only visible if you're pulling both markets.

Analysts model depreciation. Feed a few thousand listings into a notebook and you can see exactly how mileage and age bend resale value for a given model.

Buyers build alerts. Rather than refreshing a search page, a scraper pings you the moment a car matching your filters shows up under budget.

Whatever the reason, the shape of the job is the same: pull structured listings at some cadence and store them. Let's set up the tools.

What you'll need

Before any code, get these in place:

Python 3.9 or newer
requests, beautifulsoup4, and lxml for the lightweight methods
playwright and nodriver for the browser methods
Basic comfort reading JSON in your browser's dev tools

Install the core packages in one line:

pip install requests beautifulsoup4 lxml

We'll install the browser tools in their own sections so you only grab what you use.

How AutoScout24 is built (read this first)

AutoScout24 runs on Next.js. That single fact decides your whole strategy.

Every page ships a <script id="__NEXT_DATA__"> tag containing the full page payload as JSON. The prices and specs you see on screen are already in that blob before any JavaScript runs.

So you rarely need to parse HTML at all. You grab the script tag, load the JSON, and read clean fields.

There's a catch, and it's the reason plain scrapers fail. AutoScout24 sits behind Akamai Bot Manager, which fingerprints your TLS handshake, sets an _abck cookie, and throws 403s at anything that looks automated.

Next.js also hashes its CSS class names. A selector like Price_price__APlgs looks stable until the next deploy renames it to something else. Build a scraper on those classes and it rots within a release cycle.

The JSON payload doesn't have that problem. Its shape stays consistent far longer than any class name. That's why we lead with it.

Want to see it yourself before writing code? Open any AutoScout24 search page, hit F12, and go to the Console tab.

Paste this and press enter:

JSON.parse(document.getElementById("__NEXT_DATA__").textContent).props.pageProps

You'll get an expandable tree of every listing on the page. Spend two minutes here noting the field names for your market. It'll save you an hour of guessing later.
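The same exploration can be done from Python once you have the parsed payload as a dict. What follows is a minimal sketch: the helper name list_key_paths and the sample payload are invented for illustration and do not reflect AutoScout24's real structure.

```python
def list_key_paths(node, prefix="", max_depth=4):
    """Return dotted key paths found in a nested JSON structure.

    Lists are sampled through their first element only, which is enough
    to learn the field names of a homogeneous array of listings.
    """
    if max_depth == 0:
        return []
    paths = []
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.append(path)
            paths.extend(list_key_paths(value, path, max_depth - 1))
    elif isinstance(node, list) and node:
        paths.extend(list_key_paths(node[0], f"{prefix}[]", max_depth - 1))
    return paths


# Invented sample payload, only to show the output format.
sample = {"props": {"pageProps": {"listings": [{"id": "a1", "price": {"raw": 19990}}]}}}
for path in list_key_paths(sample, max_depth=6):
    print(path)
```

Run it against your own dump and raise max_depth until the listing fields show up. The printed paths tell you which keys to read later.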

The 4 methods at a glance

Method	How you fetch	How you parse	Detection risk	Best for
1. Embedded JSON	`requests`	`__NEXT_DATA__`	Medium	Clean data, small to medium jobs
2. HTML parsing	`requests`	CSS + `data-testid`	Medium	Quick one-offs, missing JSON fields
3. Playwright	Headless browser	`__NEXT_DATA__`	Lower	When plain requests get 403s
4. Nodriver	Stealth browser	`__NEXT_DATA__`	Lowest	When Akamai catches Playwright

Start at the top. Only move down when the site forces you to.

Method 1: Grab the embedded JSON with requests

This is the method almost every tutorial hides behind a paid API. It's the easiest and the most durable, so it goes first.

The idea: fetch the page with a normal request, find the __NEXT_DATA__ script, and read the listings out of JSON.

Set up a believable session

Akamai reads your headers before anything else. A bare requests call with no headers gets flagged instantly.

import requests

def make_session():
    session = requests.Session()
    session.headers.update({
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/122.0.0.0 Safari/537.36"
        ),
        "Accept-Language": "de-DE,de;q=0.9,en;q=0.8",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        # Sec-Fetch headers make the request look like a real navigation
        "Sec-Fetch-Dest": "document",
        "Sec-Fetch-Mode": "navigate",
        "Sec-Fetch-Site": "none",
    })
    return session

The de-DE language header matters. You're hitting a German-first site, so speaking German lowers suspicion.

Pull and load the JSON

Now find the script tag and parse it. Note that we don't hardcode a deep key path, because that path shifts between releases.

import json
from bs4 import BeautifulSoup

def get_next_data(html):
    soup = BeautifulSoup(html, "lxml")
    tag = soup.find("script", id="__NEXT_DATA__")
    if not tag or not tag.string:
        # Missing or empty script tag: blocked, challenged, or the page changed
        return None
    return json.loads(tag.string)

If get_next_data returns None, you were either blocked or the page shape changed. That's your signal to jump to Method 3.
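It helps to tell those cases apart before deciding what to do next. Here is a small sketch that labels a response from its status code and body. The function name and the label strings are my own, and the check for the script tag is a plain substring test, not a full parse.

```python
def classify_response(status_code, html):
    """Label a fetch result as 'ok', 'rate_limited', 'blocked',
    'error', or 'no_payload' (HTTP 200 but no __NEXT_DATA__ tag)."""
    if status_code == 429:
        return "rate_limited"
    if status_code in (401, 403):
        return "blocked"
    if status_code != 200:
        return "error"
    if 'id="__NEXT_DATA__"' not in (html or ""):
        # A 200 without the payload is usually a challenge page
        # or a changed page shape.
        return "no_payload"
    return "ok"


print(classify_response(200, '<script id="__NEXT_DATA__">{}</script>'))
print(classify_response(200, "<html>Please wait...</html>"))
print(classify_response(403, ""))
```

Logging the label for every request makes it obvious whether a failed run came from blocking or from a layout change.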

Find the listings inside the blob

The page data lives under props.pageProps. Instead of guessing the exact key, walk the structure and grab the first list of listing-shaped objects.

def find_listings(node):
    """Recursively search the JSON for the listings array."""
    if isinstance(node, list):
        # Treat the first list of dicts that carry a "price" key as the listings
        if node and isinstance(node[0], dict) and "price" in node[0]:
            return node
        for item in node:
            found = find_listings(item)
            if found:
                return found
    elif isinstance(node, dict):
        for value in node.values():
            found = find_listings(value)
            if found:
                return found
    return None

This survives key renames because it looks for the shape of a listing, not a fixed address. Open __NEXT_DATA__ in your dev tools console once to confirm the fields on your target market.
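A price key alone can also match things that aren't listings, such as a list of filter options. A stricter variant, sketched below, requires several keys on every item. The key name "id" and the sample payload are assumptions for illustration, so swap in whatever your own dump shows.

```python
def find_records(node, required_keys=("price", "id")):
    """Depth-first search for the first list whose items are all dicts
    containing every key in required_keys. Returns [] when nothing matches."""
    if isinstance(node, list):
        dicts = [item for item in node if isinstance(item, dict)]
        if dicts and len(dicts) == len(node) and all(
            key in item for item in dicts for key in required_keys
        ):
            return node
        children = node
    elif isinstance(node, dict):
        children = node.values()
    else:
        return []
    for child in children:
        found = find_records(child, required_keys)
        if found:
            return found
    return []


# Invented payload: "filters" has a price key but is not a listings array.
payload = {
    "props": {"pageProps": {
        "filters": [{"price": "0-5000"}],
        "listings": [{"id": "a1", "price": 19990}, {"id": "b2", "price": 24500}],
    }}
}
print([record["id"] for record in find_records(payload)])
```

Returning an empty list instead of None also means callers can pass the result straight to len() or a for loop.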

Tie it together

Here's the full Method 1 run against BMW listings in Germany.

def scrape_json(url):
    session = make_session()
    resp = session.get(url, timeout=20)
    if resp.status_code != 200:
        print(f"Blocked or error: HTTP {resp.status_code}")
        return []
    data = get_next_data(resp.text)
    # find_listings returns None when nothing matches, so fall back to []
    listings = find_listings(data) or []
    return listings

if __name__ == "__main__":
    url = "https://www.autoscout24.com/lst/bmw?atype=C&cy=D&sort=standard"
    cars = scrape_json(url)
    print(f"Found {len(cars)} listings")

When this works, you get richer data than the visible page shows, including fields AutoScout24 never renders. When it 403s, you've learned Akamai is challenging your IP, and it's time for a browser.

Clean up the records and save them

Raw listing objects carry more nesting than you want. Flatten each one to the fields you'll actually use.

def clean_listing(raw):
    """Pull the useful fields out of one raw listing dict."""
    return {
        "make": raw.get("make"),
        "model": raw.get("model"),
        "price": raw.get("price"),
        "mileage": raw.get("mileage"),
        "year": raw.get("firstRegistration"),
        "fuel": raw.get("fuelType"),
        "url": raw.get("url"),
    }

Field names vary by market, so confirm them against your own __NEXT_DATA__ dump. The .get() calls mean a renamed key returns None instead of crashing the run.
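When a field turns out to be nested or named differently between markets, a flat .get() isn't enough. The sketch below adds two hypothetical helpers, dig and first_of, that read dotted paths and try several candidates in order. The paths and the sample record are invented for illustration.

```python
def dig(record, path, default=None):
    """Follow a dotted path such as 'vehicle.make' through nested dicts."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def first_of(record, paths, default=None):
    """Return the first non-None value among several candidate paths."""
    for path in paths:
        value = dig(record, path)
        if value is not None:
            return value
    return default


# Invented record shape, only to demonstrate the lookups.
raw = {"vehicle": {"make": "BMW"}, "price": {"raw": 19990}}
print(first_of(raw, ["make", "vehicle.make"]))
print(first_of(raw, ["price.raw", "priceRaw"]))
print(first_of(raw, ["mileage", "vehicle.mileage"], default="unknown"))
```

Inside clean_listing you could replace a raw.get("make") call with first_of(raw, [...]) for any field that moves around.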

Then write the cleaned records to CSV:

import csv

def save_csv(listings, filename="autoscout24.csv"):
    if not listings:
        return
    rows = [clean_listing(item) for item in listings]
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)
    print(f"Saved {len(rows)} rows to {filename}")

For anything you re-scrape on a schedule, use the listing url as a unique key so repeat runs don't pile up duplicates.
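One way to do that, sketched under the assumption that each cleaned row carries a "url" field as in clean_listing above, is to merge rows through a dict keyed on the URL. The function name merge_by_url is my own.

```python
def merge_by_url(existing, new_rows):
    """Merge new rows into existing ones, keyed on 'url'.

    A newer row replaces an older one with the same url.
    Rows without a url are skipped because they can't be deduplicated.
    """
    merged = {row["url"]: row for row in existing if row.get("url")}
    for row in new_rows:
        if row.get("url"):
            merged[row["url"]] = row
    return list(merged.values())


yesterday = [{"url": "/offers/a", "price": 20000}]
today = [{"url": "/offers/a", "price": 19500}, {"url": "/offers/b", "price": 8900}]
print(merge_by_url(yesterday, today))
```

Load the previous CSV with csv.DictReader, merge, and write the result back. The file then holds one row per listing no matter how often you run the scraper.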

Method 2: Parse the HTML with BeautifulSoup

Sometimes you want a field that isn't in the JSON, or the JSON shape changed and you need a quick fix today. That's when you parse the rendered HTML directly.

Be honest with yourself about the tradeoff. This is the brittle path, so lean on data-testid attributes, which AutoScout24 keeps stable for its own tests.

def parse_html_listings(html):
    soup = BeautifulSoup(html, "lxml")
    cars = []
    for article in soup.select("article[class*='cldt-summary-full-item']"):
        # data-testid survives longer than hashed class names
        price = article.select_one("[data-testid='regular-price']")
        mileage = article.select_one("[data-testid='VehicleDetails-mileage_road']")
        year = article.select_one("[data-testid='VehicleDetails-calendar']")
        cars.append({
            "price": price.get_text(strip=True) if price else None,
            "mileage": mileage.get_text(strip=True) if mileage else None,
            "year": year.get_text(strip=True) if year else None,
        })
    return cars

Notice the class*= selector. It matches a partial class name, so a hash change on the suffix won't break it outright.

Still, treat this as a stopgap. If your data-testid selectors start returning None across the board, the markup moved, and Method 1's JSON is the better long-term home.
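The HTML route also hands you display strings rather than numbers. Here is a hedged sketch for turning price and mileage text into integers. It assumes the thousands separator is a dot, comma, space, or apostrophe followed by exactly three digits, and the example strings are illustrative rather than copied from the site.

```python
import re

# A number with optional thousands groups, e.g. 24.990 / 24,990 / 24 990 / 24'990
_NUMBER = re.compile(r"\d{1,3}(?:[.,'\s]\d{3})+|\d+")


def parse_number(text):
    """Extract the first integer from a display string, or None if there is none.

    Anything after the integer part (",-", ".50", " km") is ignored.
    Meant for prices and mileage, not for dates like 01/2020.
    """
    if not text:
        return None
    match = _NUMBER.search(text)
    if not match:
        return None
    return int(re.sub(r"\D", "", match.group()))


for sample in ["€ 24.990,-", "45.000 km", "€ 24,990", "1500", "n/a"]:
    print(sample, "->", parse_number(sample))
```

Apply it to the price and mileage values from parse_html_listings before saving, so Method 2 output sorts and compares the same way as the JSON fields.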

Method 3: Render with Playwright

When plain requests hit a wall of 403s, run a real browser. Playwright drives Chromium, passes Akamai's JavaScript challenge, and lets the _abck cookie get set the normal way.

Install it and its browser binary:

pip install playwright
playwright install chromium

The second command downloads the actual Chromium build Playwright controls.

Launch a less-detectable browser

A default headless launch leaks the navigator.webdriver flag. Strip it, and set a realistic viewport and timezone.

from playwright.sync_api import sync_playwright

def make_context(pw):
    browser = pw.chromium.launch(
        headless=True,
        args=["--disable-blink-features=AutomationControlled"],
    )
    context = browser.new_context(
        viewport={"width": 1920, "height": 1080},
        locale="de-DE",
        timezone_id="Europe/Berlin",
    )
    # Hide the automation flag before any page script reads it
    context.add_init_script(
        "Object.defineProperty(navigator, 'webdriver', {get: () => undefined});"
    )
    return browser, context

The add_init_script runs before the page's own JavaScript, so the flag is already gone when Akamai checks for it.

Reuse the JSON extractor

Here's the payoff for leading with Method 1: once the browser renders the page, you pull the same __NEXT_DATA__ JSON. One parser, two fetch strategies.

def scrape_playwright(url):
    with sync_playwright() as pw:
        browser, context = make_context(pw)
        page = context.new_page()
        page.goto(url, wait_until="networkidle", timeout=30000)
        html = page.content()
        browser.close()
    data = get_next_data(html)
    # find_listings returns None when nothing matches, so fall back to []
    return find_listings(data) or []

wait_until="networkidle" gives Akamai's challenge time to resolve before you read the HTML. If you still get an empty result, the IP itself is flagged, which proxies fix.

Method 4: Go stealth with Nodriver

Akamai in 2026 can spot Playwright by the side effects of how it drives Chrome. When that happens, Nodriver is the next step. It talks to Chrome directly, with no Selenium or WebDriver binary in between, and avoids the usual signals that give automation away.

pip install nodriver

Nodriver drives the Chrome already installed on your machine, so there's no separate browser download.

import asyncio
import nodriver as uc

async def scrape_nodriver(url):
    browser = await uc.start()
    page = await browser.get(url)
    await page.sleep(3)  # let the challenge clear
    html = await page.get_content()
    browser.stop()  # stop() is a regular method, not a coroutine
    data = get_next_data(html)
    return find_listings(data) or []  # None when nothing matches

if __name__ == "__main__":
    url = "https://www.autoscout24.com/lst/audi/a4?atype=C&cy=D"
    cars = asyncio.run(scrape_nodriver(url))
    print(f"Found {len(cars)} listings")

The sleep(3) is deliberate. Akamai's challenge can take a couple of seconds to run, and reading the HTML too early gives you an interstitial page instead of listings.

Nodriver is slower than plain requests, but it's the most reliable option when detection is aggressive. Keep it in reserve rather than reaching for it first.

Handling pagination

One page gives you 20 listings. AutoScout24 caps a single search at 20 pages, so 400 results per query.

Add a page parameter and loop, with a pause between requests.

import random
import time

def scrape_all_pages(base_url, max_pages=20):
    all_cars = []
    for page in range(1, max_pages + 1):
        sep = "&" if "?" in base_url else "?"
        url = f"{base_url}{sep}page={page}"
        cars = scrape_json(url)
        if not cars:
            break  # blocked or out of results
        all_cars.extend(cars)
        time.sleep(random.uniform(3, 7))  # look human
    return all_cars

To pull more than 400 records, split the search with tighter filters like price bands or registration year. Each narrowed search gives you a fresh set of 400.
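Splitting by price band can be sketched as a URL generator. The query parameter names pricefrom and priceto are assumptions here, so run a filtered search in your browser and copy the real names from the address bar before relying on them.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_params(url, **params):
    """Return url with the given query parameters added or overwritten."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query.update({key: str(value) for key, value in params.items()})
    return urlunsplit(parts._replace(query=urlencode(query)))


def price_band_urls(base_url, low, high, step):
    """Build one search URL per price band between low and high."""
    urls = []
    for start in range(low, high, step):
        end = min(start + step, high)
        # "pricefrom" / "priceto" are assumed parameter names.
        urls.append(with_params(base_url, pricefrom=start, priceto=end))
    return urls


base = "https://www.autoscout24.com/lst/bmw?atype=C&cy=D"
for url in price_band_urls(base, 0, 15000, 5000):
    print(url)
```

Adjacent bands share a boundary price, so a car listed at exactly that price can appear twice. Keying on the listing url removes those repeats. If a band still returns the full 400, shrink the step for that range.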

Going deeper: individual car pages

Search results give you the summary. The full picture lives on each car's own page: service history, number of owners, options list, and every technical spec.

The good news is that detail pages are also Next.js. The same __NEXT_DATA__ extractor you already wrote works without changes.

def scrape_detail(url):
    session = make_session()
    resp = session.get(url, timeout=20)
    if resp.status_code != 200:
        return None
    data = get_next_data(resp.text)
    # detail data sits under pageProps, not in a list
    return data["props"]["pageProps"] if data else None

Feed it a listing url from your search results and you get the complete record for that car.

Pace yourself here. A detail page per listing means up to 400 extra requests per search, so keep your delays generous and rotate IPs if you're doing this at volume.

When you need proxies

Akamai blocks datacenter IPs fast. If you're scraping past a few hundred pages, one IP won't cut it, and cheap datacenter ranges get flagged on sight.

Residential IPs from the country you're targeting work best, since a German listing search from a German home IP looks exactly like a real buyer. Roundproxies rotating residential pools handle this for AutoScout24's .de, .at, and .ch domains.

Wiring a proxy into requests is two lines:

def make_session_with_proxy(proxy_url):
    session = make_session()
    session.proxies = {"http": proxy_url, "https": proxy_url}
    return session

For Playwright, pass the proxy at launch instead:

browser = pw.chromium.launch(
    headless=True,
    proxy={"server": "http://your-proxy:port",
           "username": "user", "password": "pass"},
)

Match the proxy country to the domain you're scraping. Mismatches are a quiet way to get flagged.

Common errors and fixes

403 Forbidden on every request. Akamai flagged your fingerprint or IP. Move from Method 1 to Playwright or Nodriver, and add residential proxies. Our 403 Forbidden guide breaks down the causes.

__NEXT_DATA__ is missing. You got an interstitial challenge page, not the listing page. Switch to a browser method and wait for the network to settle.

All selectors return None. The HTML changed and your Method 2 classes are stale. Move that logic to the JSON in Method 1.

Empty list after a few hundred requests. You hit a soft IP block or the 400-result cap. Rotate proxies and split the search with filters.

HTTP 429. Too many requests, too fast. Add longer random delays and back off exponentially before retrying.

Here's a retry wrapper that doubles the wait after each failure, so a brief block doesn't kill your whole run.

import time

def fetch_with_backoff(url, session, max_retries=3):
    delay = 5
    for attempt in range(max_retries):
        resp = session.get(url, timeout=20)
        if resp.status_code == 200:
            return resp
        if resp.status_code not in (429, 403):
            return None  # not a rate limit or block, so retrying won't help
        if attempt < max_retries - 1:
            time.sleep(delay)  # no point waiting after the final attempt
            delay *= 2  # 5s, then 10s
    return None

Three attempts with doubling delays clear most soft rate limits. If you're still blocked after that, the problem is the IP, not the pace, and you need proxies.

Which method should you use?

Don't overthink it. Follow the ladder.

Use Method 1 for anything small to medium. It's fast, the data is clean, and it survives site updates. This should be your default.

Reach for Method 2 only when you need a field the JSON doesn't carry, or you need a fix in the next ten minutes.

Move to Method 3 the moment plain requests start returning 403s. A real browser clears most of Akamai's checks.

Save Method 4 for the hardest cases, where even Playwright gets caught. It's the slowest but the stealthiest.

For scaling any of them, add rotating residential proxies. Speed and staying unblocked are the same problem at volume.

A note on responsible scraping

Scrape the public data, not the site into the ground. Check robots.txt, keep your request rate reasonable, and add real delays.

Prices and specs on public listings are fair game for research and monitoring. Personal seller contact details are not.

Reusing data commercially can also cross AutoScout24's terms. When in doubt about commercial use, ask a lawyer, not a blog.

FAQ

Is scraping AutoScout24 legal? Collecting public listing data for personal research is generally fine. Republishing it or using it commercially may breach their terms, so check your specific case.

Why do my requests get blocked instantly? Datacenter IPs and bare headers trip Akamai on the first request. Use realistic headers, and residential proxies when you scale. Our bot detection guide covers the signals it reads.

Do I really not need a browser? For most jobs, no. Method 1's __NEXT_DATA__ approach pulls everything from a single request. You only need a browser once Akamai starts challenging your IP.

How many listings can I pull per day? Each search caps at 400 results. With proxy rotation and sensible delays, thousands of listings a day is realistic. Push too fast and you'll get blocked.

Which country domains work the same way? All of them. .de, .at, .ch, .it, and the rest all run the same Next.js setup, so the JSON method carries over. Just match your proxy country to the domain.

Can I scrape historical prices? No. AutoScout24 only shows current listings. To build history, scrape on a schedule and store the results yourself over time.
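A minimal sketch of that kind of store, using SQLite from the standard library. The table layout and function names are my own, and it assumes flat rows with a "url" and a numeric "price", like the output of clean_listing.

```python
import sqlite3
from datetime import datetime, timezone


def open_store(path="autoscout24.db"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS price_history (
               url TEXT NOT NULL,
               price INTEGER,
               seen_at TEXT NOT NULL
           )"""
    )
    return conn


def record_prices(conn, rows, seen_at=None):
    """Store a snapshot row only when a listing's price changed.

    Returns the number of rows written.
    """
    seen_at = seen_at or datetime.now(timezone.utc).isoformat()
    added = 0
    for row in rows:
        url, price = row.get("url"), row.get("price")
        if not url:
            continue
        last = conn.execute(
            "SELECT price FROM price_history WHERE url = ? "
            "ORDER BY rowid DESC LIMIT 1",
            (url,),
        ).fetchone()
        if last is None or last[0] != price:
            conn.execute(
                "INSERT INTO price_history (url, price, seen_at) VALUES (?, ?, ?)",
                (url, price, seen_at),
            )
            added += 1
    conn.commit()
    return added
```

Call record_prices after each scheduled run. Because unchanged prices are skipped, the table holds one row per price change, which is the history the site itself doesn't expose.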

Wrapping up

You now have four ways to scrape AutoScout24, from a single clean request to a full stealth browser. The order is the point.

Start with the embedded JSON, and only add weight when the site pushes back.

Method 1 handles most of what people actually need. Keep Playwright and Nodriver in your pocket for the days Akamai gets serious, and add residential proxies once you scale past a few hundred pages.

If you're scraping other European marketplaces next, the same Next.js JSON trick works on plenty of them. Here's the same approach applied to Immobilienscout24.

Marius Bernard

Marius Bernard is a Web Scraping Engineer & Technical Advisor at Roundproxies. He authored the Web Scraping chapter of the 2024 Web Almanac/Techinsider. He loves Python, Golang, and proxies.

This article was originally published in June 2026, written by Marius Bernard. It was most recently updated in July 2026.
