How to scrape AutoScout24 in 2026: 3 working methods

AutoScout24 is Europe's largest online car marketplace with over 2 million active listings. Scraping AutoScout24 gives you access to vehicle prices, specifications, mileage data, and seller information across 18+ countries.

In this guide, you'll learn how to scrape AutoScout24 using Python with multiple approaches. We'll cover everything from basic HTTP requests to browser automation for handling their Akamai bot protection.

What Is AutoScout24 Scraping?

Scraping AutoScout24 means programmatically extracting vehicle listing data from their website. This data includes car prices, makes, models, mileage, year of registration, fuel type, transmission, and seller details.

AutoScout24 uses Akamai bot protection and renders much of its content with JavaScript, so plain HTTP requests often fail with 403 errors. You need specific techniques to get past these protections and extract data reliably.

Why Scrape AutoScout24?

Car dealers, researchers, and data analysts scrape AutoScout24 for several reasons.

Price monitoring is the primary use case. Dealers track competitor pricing across European markets to adjust their own listings. Price differences between countries can be significant.

Market research helps manufacturers understand vehicle depreciation patterns. Knowing how mileage, age, and features affect resale value informs production decisions.

Inventory tracking lets buyers find specific vehicles matching their criteria. Rather than manually checking hundreds of listings, a scraper can alert you when the right car appears.

Lead generation for dealerships involves identifying private sellers who might want to trade in their vehicle. Contact information from listings creates sales opportunities.

Understanding AutoScout24's Structure

Before writing any code, you need to understand how AutoScout24 organizes its data.

URL Structure

AutoScout24 uses clean, predictable URLs for listings:

https://www.autoscout24.com/lst/{brand}?atype=C&cy=D&desc=0&sort=standard&ustate=N,U

The lst path indicates a listing search. The cy parameter selects the market by country code, for example D (Germany), A (Austria), or I (Italy).

Individual car pages follow this pattern:

https://www.autoscout24.com/offers/{brand}-{model}-{details}-{unique-id}
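The search URL pattern above can be assembled programmatically rather than by string concatenation. The following is a minimal sketch: build_search_url and its arguments are hypothetical names, and the atype, desc, sort, and ustate values are copied verbatim from the example URL rather than taken from any official documentation.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.autoscout24.com/lst"


def build_search_url(brand, model=None, country="D", sort="standard", descending=False):
    """Assemble a search URL following the pattern shown above."""
    path = f"{BASE_URL}/{brand}" + (f"/{model}" if model else "")
    params = {
        "atype": "C",
        "cy": country,            # market, e.g. D, A or I
        "desc": int(descending),
        "sort": sort,
        "ustate": "N,U",
    }
    # safe="," keeps the comma in ustate=N,U unescaped, as in the example URL
    return f"{path}?{urlencode(params, safe=',')}"


print(build_search_url("bmw"))
print(build_search_url("volkswagen", "golf", country="A"))
```

Building the query with urlencode keeps parameter escaping consistent when you start adding filters of your own.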

Page Structure

Listing pages contain vehicle cards with summary information. Each card shows the title, price, mileage, year, fuel type, and a thumbnail image.

Detail pages hold complete specifications. You'll find engine data, color, number of owners, service history, and seller contact details.

Anti-Bot Protection

AutoScout24 deploys Akamai's bot management system. This protection includes browser fingerprinting, JavaScript challenges, and IP-based rate limiting.

Basic Python requests often get blocked immediately. To succeed, you need browser-like headers with rotation, residential proxies, or browser automation.

Method 1: Scraping with Python Requests

The simplest approach uses Python's requests library with careful header management. This works for small-scale scraping when you rotate user agents and add delays.

Installing Dependencies

Open your terminal and install the required packages:

pip install requests beautifulsoup4 lxml

These packages handle HTTP requests and HTML parsing.

Basic Request Setup

Create a new file called autoscout24_scraper.py and add this code:

import requests
from bs4 import BeautifulSoup
import random
import time

# Realistic user agents to rotate
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Safari/605.1.15',
]

def create_session():
    """Create a requests session with random headers."""
    session = requests.Session()
    session.headers.update({
        'User-Agent': random.choice(USER_AGENTS),
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.5',
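        # Advertise 'br' only if the brotli package is installed; otherwise requests cannot decode the response
        'Accept-Encoding': 'gzip, deflate',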
        'Connection': 'keep-alive',
        'Upgrade-Insecure-Requests': '1',
        'Sec-Fetch-Dest': 'document',
        'Sec-Fetch-Mode': 'navigate',
        'Sec-Fetch-Site': 'none',
        'Sec-Fetch-User': '?1',
    })
    return session

This code creates a session with browser-like headers. The Sec-Fetch-* headers mimic real Chrome behavior.

Fetching a Listing Page

Now add a function to fetch and parse search results:

def fetch_listings(url, session):
    """Fetch car listings from a search URL."""
    try:
        # Add random delay between requests
        time.sleep(random.uniform(2, 5))
        
        response = session.get(url, timeout=15)
        
        if response.status_code == 403:
            print("Blocked by anti-bot protection")
            return None
            
        if response.status_code != 200:
            print(f"Error: Status code {response.status_code}")
            return None
            
        return BeautifulSoup(response.content, 'lxml')
        
    except requests.RequestException as e:
        print(f"Request failed: {e}")
        return None

The function returns a BeautifulSoup object for parsing or None if the request fails.

Extracting Car Data

AutoScout24 uses specific CSS classes for car information. Here's how to extract listing data:

def extract_car_listings(soup):
    """Extract car data from search results page."""
    cars = []
    
    # Find all listing articles
    articles = soup.find_all('article', class_='cldt-summary-full-item')
    
    for article in articles:
        try:
            # Extract car title (match the class prefix, since the hashed suffix can change)
            title_elem = article.select_one('a[class*="ListItem_title"]')
            title = title_elem.get_text(strip=True) if title_elem else 'N/A'
            
            # Extract link to detail page
            link = 'https://www.autoscout24.com' + title_elem['href'] if title_elem else None
            
            # Extract price (match the class prefix, since the hashed suffix can change)
            price_elem = article.select_one('p[class*="Price_price"]')
            price = price_elem.get_text(strip=True) if price_elem else 'N/A'
            
            # Extract mileage
            mileage_elem = article.find('span', attrs={'data-testid': 'VehicleDetails-mileage_road'})
            mileage = mileage_elem.get_text(strip=True) if mileage_elem else 'N/A'
            
            # Extract registration year
            year_elem = article.find('span', attrs={'data-testid': 'VehicleDetails-calendar'})
            year = year_elem.get_text(strip=True) if year_elem else 'N/A'
            
            # Extract fuel type
            fuel_elem = article.find('span', attrs={'data-testid': 'VehicleDetails-gas_pump'})
            fuel = fuel_elem.get_text(strip=True) if fuel_elem else 'N/A'
            
            # Extract transmission
            trans_elem = article.find('span', attrs={'data-testid': 'VehicleDetails-transmission'})
            transmission = trans_elem.get_text(strip=True) if trans_elem else 'N/A'
            
            # Extract power
            power_elem = article.find('span', attrs={'data-testid': 'VehicleDetails-speedometer'})
            power = power_elem.get_text(strip=True) if power_elem else 'N/A'
            
            cars.append({
                'title': title,
                'price': price,
                'mileage': mileage,
                'year': year,
                'fuel_type': fuel,
                'transmission': transmission,
                'power': power,
                'link': link
            })
            
        except Exception as e:
            print(f"Error parsing listing: {e}")
            continue
            
    return cars

The data-testid attributes are identifiers that AutoScout24 appears to use for testing. They tend to change less often than the hashed CSS class names, whose suffix can change whenever the front end is rebuilt.

Running the Basic Scraper

Add a main function to tie everything together:

def main():
    """Main scraper function."""
    session = create_session()
    
    # BMW listings in Germany
    url = "https://www.autoscout24.com/lst/bmw?atype=C&cy=D&desc=0&sort=standard&ustate=N,U"
    
    print(f"Scraping: {url}")
    soup = fetch_listings(url, session)
    
    if soup:
        cars = extract_car_listings(soup)
        print(f"Found {len(cars)} listings")
        
        for car in cars[:5]:  # Print first 5
            print(f"  {car['title']} - {car['price']}")
    else:
        print("Failed to fetch page")

if __name__ == "__main__":
    main()

Run this script with python autoscout24_scraper.py. If you get blocked, move to Method 2.
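The scraper above stores price and mileage as display strings, which are awkward to sort or compare. The sketch below shows one way to turn them into integers. The sample strings are assumptions about how the site formats values, and parse_number and normalize_listing are hypothetical helpers, not part of the original script.

```python
import re


def parse_number(text):
    """Return the digits in a display string as an int, or None if there are none."""
    digits = re.sub(r"\D", "", text or "")
    return int(digits) if digits else None


def normalize_listing(car):
    """Copy a listing dict and add numeric price and mileage fields."""
    cleaned = dict(car)
    cleaned["price_value"] = parse_number(car.get("price"))
    # Assumes mileage is shown in kilometres
    cleaned["mileage_km"] = parse_number(car.get("mileage"))
    return cleaned


sample = {"title": "BMW 320d", "price": "€ 24,990", "mileage": "45,000 km"}
print(normalize_listing(sample))
```

Because the helper simply strips every non-digit character, it works for both "24,990" and "24.990" thousands separators, but it would misread a price that includes decimal cents. Do not apply it to fields like power, where a string may contain two numbers.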

Method 2: Scraping with Playwright

Playwright renders JavaScript and handles dynamic content that plain requests miss. This approach bypasses many anti-bot checks because it runs a real browser.

Installing Playwright

Install Playwright and its browser binaries:

pip install playwright
playwright install chromium

The second command downloads Chromium, which Playwright uses for scraping.

Setting Up a Stealth Browser

Create a new file autoscout24_playwright.py:

from playwright.sync_api import sync_playwright
import random
import time

def create_browser_context(playwright):
    """Create a browser context with stealth settings."""
    browser = playwright.chromium.launch(
        headless=True,
        args=[
            '--disable-blink-features=AutomationControlled',
            '--no-sandbox',
            '--disable-dev-shm-usage',
        ]
    )
    
    context = browser.new_context(
        viewport={'width': 1920, 'height': 1080},
        user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
        locale='en-US',
        timezone_id='Europe/Berlin',
    )
    
    # Remove webdriver flag
    context.add_init_script("""
        Object.defineProperty(navigator, 'webdriver', {
            get: () => undefined
        });
    """)
    
    return browser, context

The add_init_script removes the navigator.webdriver flag that identifies automation.

Scraping with Playwright

Add the main scraping logic:

def scrape_autoscout24(url):
    """Scrape AutoScout24 using Playwright."""
    with sync_playwright() as playwright:
        browser, context = create_browser_context(playwright)
        page = context.new_page()
        
        try:
            # Navigate to page with longer timeout
            page.goto(url, wait_until='networkidle', timeout=30000)
            
            # Wait for listings to load
            page.wait_for_selector('article.cldt-summary-full-item', timeout=10000)
            
            # Extract data using page.evaluate for speed
            cars = page.evaluate("""
                () => {
                    const listings = [];
                    const articles = document.querySelectorAll('article.cldt-summary-full-item');
                    
                    articles.forEach(article => {
                        const titleElem = article.querySelector('a[class*="ListItem_title"]');
                        const priceElem = article.querySelector('p[class*="Price_price"]');
                        const mileageElem = article.querySelector('[data-testid="VehicleDetails-mileage_road"]');
                        const yearElem = article.querySelector('[data-testid="VehicleDetails-calendar"]');
                        const fuelElem = article.querySelector('[data-testid="VehicleDetails-gas_pump"]');
                        
                        listings.push({
                            title: titleElem ? titleElem.textContent.trim() : 'N/A',
                            link: titleElem ? 'https://www.autoscout24.com' + titleElem.getAttribute('href') : null,
                            price: priceElem ? priceElem.textContent.trim() : 'N/A',
                            mileage: mileageElem ? mileageElem.textContent.trim() : 'N/A',
                            year: yearElem ? yearElem.textContent.trim() : 'N/A',
                            fuel: fuelElem ? fuelElem.textContent.trim() : 'N/A',
                        });
                    });
                    
                    return listings;
                }
            """)
            
            return cars
            
        except Exception as e:
            print(f"Error: {e}")
            return []
            
        finally:
            browser.close()


def main():
    url = "https://www.autoscout24.com/lst/volkswagen/golf?atype=C&cy=D"
    
    print("Scraping with Playwright...")
    cars = scrape_autoscout24(url)
    
    print(f"Found {len(cars)} listings")
    for car in cars[:5]:
        print(f"  {car['title']} - {car['price']}")

if __name__ == "__main__":
    main()

page.evaluate() runs JavaScript in the browser context and returns the whole result in a single round trip. This is faster than querying elements one by one from Python.

Method 3: Using Nodriver for Stealth Scraping

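Nodriver is a newer alternative that drives Chrome directly over CDP (Chrome DevTools Protocol), without Selenium or a chromedriver binary. This removes many of the automation signals that anti-bot systems look for, and the library is designed specifically for bypassing advanced anti-bot systems.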

Installing Nodriver

pip install nodriver

Nodriver launches and manages a locally installed Chrome (or other Chromium-based) browser itself, so there is no separate driver to set up.

Stealth Scraping with Nodriver

Create autoscout24_nodriver.py:

import nodriver as uc
import asyncio

async def scrape_with_nodriver(url):
    """Scrape AutoScout24 using Nodriver for stealth."""
    browser = await uc.start()
    
    try:
        page = await browser.get(url)
        
        # Wait for content to load
        await page.sleep(3)
        
        # Find all listing articles
        articles = await page.select_all('article.cldt-summary-full-item')
        
        cars = []
        for article in articles:
            try:
                # Get text content from elements
                title_elem = await article.query_selector('a[class*="ListItem_title"]')
                price_elem = await article.query_selector('p[class*="Price_price"]')
                
                # text_all is a plain property (not awaitable) and includes text from nested elements
                title = title_elem.text_all if title_elem else 'N/A'
                price = price_elem.text_all if price_elem else 'N/A'
                
                cars.append({
                    'title': title.strip(),
                    'price': price.strip(),
                })
                
            except Exception as e:
                # Report the problem instead of silently dropping the listing
                print(f"Error parsing listing: {e}")
                continue
                
        return cars
        
    finally:
        # stop() is a regular method in nodriver, so it must not be awaited
        browser.stop()


async def main():
    url = "https://www.autoscout24.com/lst/audi/a4?atype=C&cy=D"
    
    print("Scraping with Nodriver...")
    cars = await scrape_with_nodriver(url)
    
    print(f"Found {len(cars)} listings")
    for car in cars[:5]:
        print(f"  {car['title']} - {car['price']}")

if __name__ == "__main__":
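    # The nodriver documentation recommends its own event loop helper over asyncio.run()
    uc.loop().run_until_complete(main())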

Because Nodriver controls Chrome without WebDriver, it tends to avoid the automation checks that often catch Playwright and Selenium in 2026.

Extracting Hidden JSON-LD Data

AutoScout24 embeds structured data in JSON-LD format. This data is cleaner than parsing HTML and includes information not visible on the page.

Finding JSON-LD Scripts

Look for <script type="application/ld+json"> tags in the page source:

import json
from bs4 import BeautifulSoup

def extract_json_ld(soup):
    """Extract JSON-LD structured data from page."""
    scripts = soup.find_all('script', type='application/ld+json')
    
    for script in scripts:
        try:
            # script.string is None for empty tags, so fall back to an empty string
            data = json.loads(script.string or '')
        except json.JSONDecodeError:
            continue

        # Normalize to a list so single objects and arrays share one code path
        items = data if isinstance(data, list) else [data]

        for item in items:
            # Check for Vehicle or Car schema
            if isinstance(item, dict) and item.get('@type') in ('Vehicle', 'Car'):
                return item

    return None

Parsing Vehicle Schema

The Vehicle schema contains standardized fields:

def parse_vehicle_schema(schema):
    """Parse Vehicle JSON-LD schema into clean data."""
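    if not schema:
        return None

    # brand, mileageFromOdometer and offers may be plain values instead of nested objects
    brand = schema.get('brand')
    mileage = schema.get('mileageFromOdometer')
    offers = schema.get('offers')
    if isinstance(offers, list):
        offers = offers[0] if offers else {}
    if not isinstance(offers, dict):
        offers = {}
    seller = offers.get('seller')

    return {
        'name': schema.get('name'),
        'brand': brand.get('name') if isinstance(brand, dict) else brand,
        'model': schema.get('model'),
        'year': schema.get('vehicleModelDate') or schema.get('productionDate'),
        'mileage': mileage.get('value') if isinstance(mileage, dict) else mileage,
        'mileage_unit': mileage.get('unitCode') if isinstance(mileage, dict) else None,
        'fuel_type': schema.get('fuelType'),
        'transmission': schema.get('vehicleTransmission'),
        'color': schema.get('color'),
        'price': offers.get('price'),
        'currency': offers.get('priceCurrency'),
        'seller': seller.get('name') if isinstance(seller, dict) else seller,
        'url': schema.get('url'),
    }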

JSON-LD extraction is usually more reliable than CSS selectors because the data follows the schema.org vocabulary rather than the page layout.
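JSON-LD documents are not always a flat object: the vehicle node can sit inside an array, an @graph container, or another object. The sketch below walks the parsed data recursively and yields every Vehicle or Car node it finds. The sample document is invented for illustration and is not taken from a real AutoScout24 page, and find_vehicle_nodes is a hypothetical helper.

```python
import json

VEHICLE_TYPES = {"Vehicle", "Car"}


def find_vehicle_nodes(data):
    """Yield every dict whose @type is Vehicle or Car, at any nesting depth."""
    if isinstance(data, list):
        for item in data:
            yield from find_vehicle_nodes(item)
    elif isinstance(data, dict):
        node_type = data.get("@type")
        # @type may be a single string or a list of strings
        types = node_type if isinstance(node_type, list) else [node_type]
        if VEHICLE_TYPES.intersection(t for t in types if isinstance(t, str)):
            yield data
        for value in data.values():
            yield from find_vehicle_nodes(value)


# Invented sample document, for illustration only
sample = json.loads("""
{"@context": "https://schema.org",
 "@graph": [
   {"@type": "WebPage", "name": "Listing"},
   {"@type": "Car", "name": "Audi A4",
    "offers": {"@type": "Offer", "price": 19900, "priceCurrency": "EUR"}}
 ]}
""")

vehicle = next(find_vehicle_nodes(sample), None)
print(vehicle["name"], vehicle["offers"]["price"])
```

The first node returned can be passed on to a parser such as parse_vehicle_schema.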

Handling Pagination

AutoScout24 shows 20 listings per page. To scrape all results, you need to iterate through pages.

Building Pagination URLs

Add a page parameter to your search URL:

def build_pagination_urls(base_url, max_pages=20):
    """Generate paginated URLs."""
    urls = []
    
    for page in range(1, max_pages + 1):
        # AutoScout24 uses 'page' parameter
        if '?' in base_url:
            paginated_url = f"{base_url}&page={page}"
        else:
            paginated_url = f"{base_url}?page={page}"
            
        urls.append(paginated_url)
        
    return urls

AutoScout24 limits results to 400 listings per search (20 pages × 20 results). For more data, split your search with filters.
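One way to split a search is by price band, so that each band stays under the result cap. The sketch below generates non-overlapping bands and adds them to a search URL. The pricefrom and priceto parameter names are assumptions; confirm them by applying a price filter in the browser and reading the resulting URL. price_bands and with_params are hypothetical helpers.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def price_bands(low, high, step):
    """Return (from, to) pairs that cover low..high without overlapping."""
    bands = []
    start = low
    while start <= high:
        end = min(start + step - 1, high)
        bands.append((start, end))
        start = end + 1
    return bands


def with_params(url, **params):
    """Return url with the given query parameters added or replaced."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query.update({key: str(value) for key, value in params.items()})
    # safe="," keeps values such as ustate=N,U readable
    return urlunsplit(parts._replace(query=urlencode(query, safe=",")))


base = "https://www.autoscout24.com/lst/bmw?atype=C&cy=D"
for low, high in price_bands(0, 29999, 10000):
    # pricefrom/priceto are assumed parameter names
    print(with_params(base, pricefrom=low, priceto=high))
```

Because with_params replaces an existing parameter instead of appending a duplicate, the same helper can also set the page number on a URL that already has one.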

Scraping Multiple Pages

def scrape_all_pages(base_url, session, max_pages=5):
    """Scrape multiple pages of results."""
    all_cars = []
    
    for page in range(1, max_pages + 1):
        url = f"{base_url}&page={page}" if '?' in base_url else f"{base_url}?page={page}"
        
        print(f"Scraping page {page}...")
        soup = fetch_listings(url, session)
        
        if not soup:
            print(f"Failed on page {page}, stopping")
            break
            
        cars = extract_car_listings(soup)
        
        if not cars:
            print("No more listings found")
            break
            
        all_cars.extend(cars)
        
        # Random delay between pages
        time.sleep(random.uniform(3, 7))
        
    return all_cars

Adding longer delays between pages reduces the chance of triggering rate limits.

Using Proxies to Avoid Blocks

When scraping at scale, you need rotating proxies to distribute requests across different IP addresses. AutoScout24 blocks datacenter IPs quickly, so residential proxies work best.

Setting Up Proxy Rotation

If you need residential proxies for AutoScout24 scraping, providers like Roundproxies.com offer rotating residential pools that work well with European car marketplaces.

Here's how to integrate proxies with requests:

def create_session_with_proxy(proxy_url):
    """Create session with proxy authentication."""
    session = requests.Session()
    
    session.proxies = {
        'http': proxy_url,
        'https': proxy_url,
    }
    
    session.headers.update({
        'User-Agent': random.choice(USER_AGENTS),
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'de-DE,de;q=0.9,en;q=0.8',
    })
    
    return session


def scrape_with_proxy_rotation(urls, proxy_list):
    """Scrape URLs with rotating proxies."""
    all_results = []
    
    for i, url in enumerate(urls):
        # Rotate through proxy list
        proxy = proxy_list[i % len(proxy_list)]
        session = create_session_with_proxy(proxy)
        
        soup = fetch_listings(url, session)
        if soup:
            cars = extract_car_listings(soup)
            all_results.extend(cars)
            
        time.sleep(random.uniform(2, 5))
        
    return all_results

Using Proxies with Playwright

For Playwright, pass proxy settings when creating the browser:

def create_browser_with_proxy(playwright, proxy_server, proxy_username, proxy_password):
    """Create Playwright browser with proxy."""
    browser = playwright.chromium.launch(
        headless=True,
        proxy={
            'server': proxy_server,
            'username': proxy_username,
            'password': proxy_password,
        }
    )
    
    return browser

Residential proxies from countries like Germany, Austria, or Switzerland work best for AutoScout24.

Complete AutoScout24 Scraper

Here's a fuller scraper that combines the techniques above into a single class:

#!/usr/bin/env python3
"""
AutoScout24 Scraper - Complete solution for extracting car listings
"""

import requests
from bs4 import BeautifulSoup
import json
import csv
import random
import time
from datetime import datetime
import re

# Configuration
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0',
]

REQUEST_DELAY = (2, 5)  # Random delay range in seconds


class AutoScout24Scraper:
    """Scraper for AutoScout24 car listings."""
    
    def __init__(self, proxy=None):
        self.session = self._create_session(proxy)
        self.results = []
        
    def _create_session(self, proxy=None):
        """Initialize requests session."""
        session = requests.Session()
        
        if proxy:
            session.proxies = {'http': proxy, 'https': proxy}
            
        session.headers.update({
            'User-Agent': random.choice(USER_AGENTS),
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.5',
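            # Advertise 'br' only if the brotli package is installed; otherwise requests cannot decode the response
            'Accept-Encoding': 'gzip, deflate',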
            'Connection': 'keep-alive',
            'Upgrade-Insecure-Requests': '1',
        })
        
        return session
        
    def _fetch_page(self, url):
        """Fetch a single page."""
        try:
            time.sleep(random.uniform(*REQUEST_DELAY))
            response = self.session.get(url, timeout=20)
            
            if response.status_code == 200:
                return BeautifulSoup(response.content, 'lxml')
            else:
                print(f"HTTP {response.status_code} for {url}")
                return None
                
        except Exception as e:
            print(f"Error fetching {url}: {e}")
            return None
            
    def _extract_listings(self, soup):
        """Extract listings from search page."""
        cars = []
        articles = soup.find_all('article', class_='cldt-summary-full-item')
        
        for article in articles:
            car = self._parse_listing(article)
            if car:
                cars.append(car)
                
        return cars
        
    def _parse_listing(self, article):
        """Parse a single listing element."""
        try:
            # Title and link
            title_elem = article.find('a', class_=re.compile(r'ListItem_title'))
            title = title_elem.get_text(strip=True) if title_elem else 'N/A'
            link = 'https://www.autoscout24.com' + title_elem['href'] if title_elem and title_elem.get('href') else None
            
            # Price - handle dynamic classes with regex
            price_elem = article.find('p', class_=re.compile(r'Price_price'))
            if not price_elem:
                # Fallback: search for currency pattern
                price_text = article.get_text()
                price_match = re.search(r'[€£]\s*[\d,.]+', price_text)
                price = price_match.group(0) if price_match else 'N/A'
            else:
                price = price_elem.get_text(strip=True)
                
            # Vehicle details using data-testid
            mileage_elem = article.find(attrs={'data-testid': 'VehicleDetails-mileage_road'})
            year_elem = article.find(attrs={'data-testid': 'VehicleDetails-calendar'})
            fuel_elem = article.find(attrs={'data-testid': 'VehicleDetails-gas_pump'})
            trans_elem = article.find(attrs={'data-testid': 'VehicleDetails-transmission'})
            power_elem = article.find(attrs={'data-testid': 'VehicleDetails-speedometer'})
            
            # Seller info
            seller_elem = article.find('span', class_=re.compile(r'SellerInfo_name'))
            location_elem = article.find('span', class_=re.compile(r'SellerInfo_address'))
            
            return {
                'title': title,
                'price': price,
                'mileage': mileage_elem.get_text(strip=True) if mileage_elem else 'N/A',
                'year': year_elem.get_text(strip=True) if year_elem else 'N/A',
                'fuel_type': fuel_elem.get_text(strip=True) if fuel_elem else 'N/A',
                'transmission': trans_elem.get_text(strip=True) if trans_elem else 'N/A',
                'power': power_elem.get_text(strip=True) if power_elem else 'N/A',
                'seller': seller_elem.get_text(strip=True) if seller_elem else 'N/A',
                'location': location_elem.get_text(strip=True) if location_elem else 'N/A',
                'link': link,
                'scraped_at': datetime.now().isoformat(),
            }
            
        except Exception as e:
            print(f"Error parsing listing: {e}")
            return None
            
    def scrape_search(self, base_url, max_pages=5):
        """Scrape multiple pages of search results."""
        for page in range(1, max_pages + 1):
            url = f"{base_url}&page={page}" if '?' in base_url else f"{base_url}?page={page}"
            
            print(f"Scraping page {page}...")
            soup = self._fetch_page(url)
            
            if not soup:
                break
                
            cars = self._extract_listings(soup)
            
            if not cars:
                print("No more listings")
                break
                
            self.results.extend(cars)
            print(f"  Found {len(cars)} listings")
            
        return self.results
        
    def scrape_detail_page(self, url):
        """Scrape a single car detail page."""
        soup = self._fetch_page(url)
        
        if not soup:
            return None
            
        # Try JSON-LD extraction first
        json_ld = self._extract_json_ld(soup)
        if json_ld:
            return self._parse_vehicle_schema(json_ld)
            
        # Fallback to HTML parsing
        return self._parse_detail_page(soup)
        
    def _extract_json_ld(self, soup):
        """Extract JSON-LD Vehicle schema."""
        scripts = soup.find_all('script', type='application/ld+json')
        
        for script in scripts:
            try:
                data = json.loads(script.string)
                if isinstance(data, list):
                    for item in data:
                        if item.get('@type') in ['Vehicle', 'Car', 'Product']:
                            return item
                elif data.get('@type') in ['Vehicle', 'Car', 'Product']:
                    return data
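            # Skip empty tags, invalid JSON, and entries that are not objects
            except (json.JSONDecodeError, TypeError, AttributeError):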
                continue
                
        return None
        
    def _parse_vehicle_schema(self, schema):
        """Parse Vehicle schema to dict."""
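        offers = schema.get('offers') or {}
        if isinstance(offers, list):
            # JSON-LD allows a list of offers; use the first one
            offers = offers[0] if offers else {}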
        
        return {
            'name': schema.get('name'),
            'brand': schema.get('brand', {}).get('name') if isinstance(schema.get('brand'), dict) else schema.get('brand'),
            'model': schema.get('model'),
            'year': schema.get('vehicleModelDate'),
            'mileage': schema.get('mileageFromOdometer', {}).get('value'),
            'fuel_type': schema.get('fuelType'),
            'transmission': schema.get('vehicleTransmission'),
            'color': schema.get('color'),
            'price': offers.get('price'),
            'currency': offers.get('priceCurrency'),
            'url': schema.get('url'),
        }
        
    def _parse_detail_page(self, soup):
        """Parse detail page HTML."""
        # Title
        title_elem = soup.find('h1', class_=re.compile(r'StageTitle'))
        title = title_elem.get_text(strip=True) if title_elem else 'N/A'
        
        # Price
        price_elem = soup.find('span', class_=re.compile(r'PriceInfo_price'))
        price = price_elem.get_text(strip=True) if price_elem else 'N/A'
        
        # Collect vehicle overview items
        details = {}
        overview_items = soup.find_all('div', class_=re.compile(r'VehicleOverview_itemContainer'))
        
        for item in overview_items:
            label = item.find('div', class_=re.compile(r'VehicleOverview_itemTitle'))
            value = item.find('div', class_=re.compile(r'VehicleOverview_itemText'))
            
            if label and value:
                key = label.get_text(strip=True).lower().replace(' ', '_')
                details[key] = value.get_text(strip=True)
                
        return {
            'title': title,
            'price': price,
            **details,
        }
        
    def save_to_csv(self, filename):
        """Save results to CSV file."""
        if not self.results:
            print("No results to save")
            return
            
        keys = self.results[0].keys()
        
        with open(filename, 'w', newline='', encoding='utf-8') as f:
            writer = csv.DictWriter(f, fieldnames=keys)
            writer.writeheader()
            writer.writerows(self.results)
            
        print(f"Saved {len(self.results)} listings to {filename}")
        
    def save_to_json(self, filename):
        """Save results to JSON file."""
        with open(filename, 'w', encoding='utf-8') as f:
            json.dump(self.results, f, indent=2, ensure_ascii=False)
            
        print(f"Saved {len(self.results)} listings to {filename}")


def main():
    """Example usage."""
    # Initialize scraper
    scraper = AutoScout24Scraper()
    
    # Define search URL
    search_url = "https://www.autoscout24.com/lst/mercedes-benz/c-class?atype=C&cy=D&desc=0&sort=standard&ustate=N,U"
    
    # Scrape search results
    print("Starting AutoScout24 scraper...")
    results = scraper.scrape_search(search_url, max_pages=3)
    
    print(f"\nTotal listings scraped: {len(results)}")
    
    # Save results
    scraper.save_to_csv('autoscout24_listings.csv')
    scraper.save_to_json('autoscout24_listings.json')
    
    # Optionally scrape detail pages
    if results and results[0].get('link'):
        print("\nScraping first detail page...")
        detail = scraper.scrape_detail_page(results[0]['link'])
        if detail:
            print(json.dumps(detail, indent=2))


if __name__ == "__main__":
    main()

This complete scraper handles search pages, detail pages, JSON-LD extraction, dynamic class matching, and data export.

Common Errors and Fixes

403 Forbidden Errors

AutoScout24's Akamai protection triggers 403 responses when it detects automation.

Fix: Switch from requests to Playwright or Nodriver. Use residential proxies and longer delays between requests.

Empty Results

Sometimes selectors return no data because AutoScout24 updated their HTML.

Fix: Use data-testid attributes instead of CSS classes. These change less often. Alternatively, extract JSON-LD data which follows a stable schema.

Captcha Challenges

Aggressive scraping triggers captcha verification pages.

Fix: Reduce request frequency. Use residential proxies with sticky sessions. Consider running in headful mode to solve captchas manually.

IP Blocks

Repeated requests from the same IP get temporarily or permanently blocked.

Fix: Rotate through a pool of residential proxies. Space requests across different times of day.
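
A minimal sketch of round-robin rotation, assuming you already have a list of proxy URLs from your provider. The ProxyRotator class is a hypothetical helper written for this example; it returns dictionaries in the shape that the requests library accepts for its proxies argument.

```python
from itertools import cycle


class ProxyRotator:
    """Hand out proxies from a fixed pool in round-robin order."""

    def __init__(self, proxy_urls):
        if not proxy_urls:
            raise ValueError("proxy_urls must contain at least one proxy")
        self._pool = cycle(proxy_urls)

    def next_proxies(self):
        """Return the next proxy as a requests-style proxies dict."""
        url = next(self._pool)
        return {"http": url, "https": url}
```

Typical usage would be session.get(url, proxies=rotator.next_proxies(), timeout=15), so each request leaves through the next proxy in the pool.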

Rate Limiting

Too many requests in a short period triggers rate limits.

Fix: Add random delays of 3-10 seconds between requests. Scrape during off-peak hours (European nights).
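
The random delay can be wrapped in a small helper like the one below. The name polite_sleep and the injectable sleep parameter are choices made for this sketch (the latter makes the function easy to test without actually waiting).

```python
import random
import time


def polite_sleep(min_seconds=3.0, max_seconds=10.0, sleep=time.sleep):
    """Pause for a random duration in [min_seconds, max_seconds] and return it."""
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

Call polite_sleep() between consecutive requests so the gaps do not form a fixed, easily detected interval.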

Comparing Scraping Methods

Each scraping method has trade-offs. Here's a quick comparison:

| Method | Speed | Detection Risk | Setup Complexity | Best For |
|---|---|---|---|---|
| Python Requests | Fast | High | Low | Small datasets, testing |
| Playwright | Medium | Medium | Medium | JS-heavy pages, medium scale |
| Nodriver | Medium | Low | Low | Bypassing advanced protection |
| Requests + Proxies | Fast | Low | Medium | Large-scale production |

Choose requests for quick prototypes. Use Playwright when you need JavaScript execution. Pick Nodriver when Playwright gets detected.

For production scraping, combine requests with rotating residential proxies. This gives you speed and reliability at scale.

Best Practices for AutoScout24 Scraping

Following these practices helps you avoid blocks and maintain data quality.

Respect the Site

Add delays between requests. AutoScout24 serves millions of users daily. Hammering their servers hurts everyone.

Scrape during off-peak hours. European nighttime (midnight to 6 AM CET) sees less traffic and fewer rate limits.

Don't scrape the same listings repeatedly. Store data locally and only refresh periodically.

Handle Errors Gracefully

Implement exponential backoff for failed requests. If a request fails, wait 5 seconds before retrying. Double the wait time for each subsequent failure.

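import time

import requests


def fetch_with_retry(url, session, max_retries=3):
    """Fetch a URL, backing off exponentially on rate limits and network errors."""
    delay = 5  # seconds to wait before the first retry

    for attempt in range(max_retries):
        try:
            response = session.get(url, timeout=15)
            if response.status_code == 200:
                return response
            if response.status_code != 429:
                # Other statuses (403, 404, ...) won't be fixed by waiting
                return None
        except requests.RequestException:
            pass  # network error or timeout: retry after the delay

        # Rate limited or network error: wait, then double the delay.
        # Skip the wait after the final attempt, since nothing follows it.
        if attempt < max_retries - 1:
            time.sleep(delay)
            delay *= 2

    return None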

Validate Your Data

Check that extracted data makes sense. Prices should be positive numbers. Years should be between 1980 and 2026. Mileage can't be negative.

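import re


def validate_listing(listing):
    """Add cleaned numeric fields to a listing; invalid values become None."""
    # Clean price: keep digits only, then require a positive number
    price_clean = re.sub(r'[^\d]', '', str(listing.get('price') or ''))
    price = int(price_clean) if price_clean else 0
    listing['price_numeric'] = price if price > 0 else None

    # Validate year: first four-digit number, within the accepted range.
    # year_numeric is always set, even when no year is found.
    year_match = re.search(r'(\d{4})', str(listing.get('year') or ''))
    year = int(year_match.group(1)) if year_match else None
    if year is not None and 1980 <= year <= 2026:
        listing['year_numeric'] = year
    else:
        listing['year_numeric'] = None

    # Clean mileage (assumes the listing has a 'mileage' key):
    # keeping digits only means the result can never be negative
    mileage_clean = re.sub(r'[^\d]', '', str(listing.get('mileage') or ''))
    listing['mileage_numeric'] = int(mileage_clean) if mileage_clean else None

    return listing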

Store Data Efficiently

For large datasets, use a database instead of CSV files. SQLite works for local storage. PostgreSQL handles concurrent writes better.

import sqlite3

def create_database():
    """Create SQLite database for listings."""
    conn = sqlite3.connect('autoscout24.db')
    cursor = conn.cursor()
    
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS listings (
            id TEXT PRIMARY KEY,
            title TEXT,
            price INTEGER,
            mileage INTEGER,
            year INTEGER,
            fuel_type TEXT,
            transmission TEXT,
            link TEXT UNIQUE,
            scraped_at TEXT
        )
    ''')
    
    conn.commit()
    return conn

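The UNIQUE constraint on link makes SQLite reject a second row for the same listing, so re-scraping cannot create duplicates. A plain INSERT raises sqlite3.IntegrityError in that case, so use INSERT OR IGNORE or update the existing row instead.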
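
One possible way to write rows into this table so that re-scraping refreshes a listing instead of failing is sketched below. The save_listing helper and the fallback of using the link as the id are assumptions made for this example, not part of the scraper above.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS listings (
    id TEXT PRIMARY KEY,
    title TEXT,
    price INTEGER,
    mileage INTEGER,
    year INTEGER,
    fuel_type TEXT,
    transmission TEXT,
    link TEXT UNIQUE,
    scraped_at TEXT
)
"""

COLUMNS = ("id", "title", "price", "mileage", "year",
           "fuel_type", "transmission", "link", "scraped_at")


def save_listing(conn, listing):
    """Insert a listing, or refresh price, mileage, and timestamp if the link exists."""
    row = {col: listing.get(col) for col in COLUMNS}
    row["id"] = row["id"] or row["link"]  # assumption: fall back to the link as id
    row["scraped_at"] = datetime.now(timezone.utc).isoformat()

    # Does nothing when the link (or id) is already stored
    conn.execute(
        "INSERT OR IGNORE INTO listings "
        "(id, title, price, mileage, year, fuel_type, transmission, link, scraped_at) "
        "VALUES (:id, :title, :price, :mileage, :year, :fuel_type, "
        ":transmission, :link, :scraped_at)",
        row,
    )
    # Refresh the fields that change between scrapes
    conn.execute(
        "UPDATE listings SET price = :price, mileage = :mileage, "
        "scraped_at = :scraped_at WHERE link = :link",
        row,
    )
    conn.commit()
```

Running the UPDATE unconditionally keeps the logic simple: for a brand-new row it rewrites the same values, and for an existing row it picks up the latest price.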

Advanced Techniques

These techniques handle edge cases and improve scraping reliability.

Fingerprint Rotation

Anti-bot systems track browser fingerprints. Rotate your fingerprint between sessions.

import random

def generate_fingerprint():
    """Generate random browser fingerprint settings."""
    screen_sizes = [(1920, 1080), (1366, 768), (1536, 864), (1440, 900)]
    languages = ['en-US', 'de-DE', 'en-GB', 'fr-FR']
    timezones = ['Europe/Berlin', 'Europe/Vienna', 'Europe/Zurich', 'Europe/Paris']
    
    return {
        'viewport': random.choice(screen_sizes),
        'language': random.choice(languages),
        'timezone': random.choice(timezones),
    }

Apply different fingerprints to each browser session.
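
If you are using Playwright, a fingerprint dictionary like the one above can be translated into browser context options. The helper below is a sketch with a made-up name; viewport, locale, and timezone_id are the keyword arguments that Playwright's new_context accepts for these settings.

```python
def to_context_options(fingerprint):
    """Map a fingerprint dict to keyword arguments for Playwright's new_context()."""
    width, height = fingerprint["viewport"]
    return {
        "viewport": {"width": width, "height": height},
        "locale": fingerprint["language"],
        "timezone_id": fingerprint["timezone"],
    }
```

With a launched browser, this would be used as context = browser.new_context(**to_context_options(generate_fingerprint())). Because the generator picks each value independently, it can produce unlikely pairings such as a de-DE locale with a Paris timezone; choosing language and timezone together may look more natural.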

Session Persistence

Keep sessions alive to maintain cookies and avoid re-authentication.

import pickle

def save_session(session, filename='session.pkl'):
    """Save session cookies to file."""
    with open(filename, 'wb') as f:
        pickle.dump(session.cookies, f)
        
def load_session(session, filename='session.pkl'):
    """Load session cookies from file."""
    try:
        with open(filename, 'rb') as f:
            cookies = pickle.load(f)
            session.cookies.update(cookies)
    except FileNotFoundError:
        pass

Concurrent Scraping

Speed up scraping with concurrent requests. Be careful not to exceed rate limits.

from concurrent.futures import ThreadPoolExecutor, as_completed

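def scrape_urls_concurrently(urls, max_workers=3):
    """Scrape multiple URLs concurrently.

    fetch_and_parse is not defined here: supply your own function that
    takes a URL and returns the parsed data (or None on failure).
    """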
    results = []
    
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(fetch_and_parse, url): url for url in urls}
        
        for future in as_completed(futures):
            url = futures[future]
            try:
                result = future.result()
                if result:
                    results.append(result)
            except Exception as e:
                print(f"Failed {url}: {e}")
                
    return results

Keep max_workers low (3-5) to avoid triggering rate limits.

FAQ

Is it legal to scrape AutoScout24?

Scraping publicly available data is generally legal for personal use and research. However, republishing scraped data or using it commercially may violate AutoScout24's terms of service. Consult a lawyer for your specific use case.

How many listings can I scrape per day?

AutoScout24 limits search results to 400 per query. With careful rate limiting and proxy rotation, you can scrape thousands of listings daily. Going too fast risks IP blocks.

Why does my scraper get blocked after a few requests?

Datacenter IPs are flagged immediately. AutoScout24's Akamai protection identifies non-browser requests. Switch to Playwright with residential proxies.

How do I handle different AutoScout24 country sites?

Each country uses a different domain or URL structure. Germany uses autoscout24.de, Switzerland uses autoscout24.ch. Adjust your base URL and potentially the CSS selectors.

Can I scrape historical pricing data?

No. AutoScout24 only shows current listings. For historical data, you need to scrape regularly and store results in a database over time.
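
A minimal sketch of building such a history with SQLite follows. The price_history table and the record_price helper are assumptions for this example; the idea is to append a new row only when a listing's price has changed since the last snapshot.

```python
import sqlite3
from datetime import datetime, timezone


def init_history(conn):
    """Create the snapshot table if it does not exist yet."""
    conn.execute("""
        CREATE TABLE IF NOT EXISTS price_history (
            link TEXT NOT NULL,
            price INTEGER NOT NULL,
            scraped_at TEXT NOT NULL
        )
    """)


def record_price(conn, link, price, scraped_at=None):
    """Append a snapshot if the price changed; return True when a row was added."""
    last = conn.execute(
        "SELECT price FROM price_history WHERE link = ? "
        "ORDER BY scraped_at DESC, rowid DESC LIMIT 1",
        (link,),
    ).fetchone()
    if last is not None and last[0] == price:
        return False  # unchanged since the previous scrape

    timestamp = scraped_at or datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO price_history (link, price, scraped_at) VALUES (?, ?, ?)",
        (link, price, timestamp),
    )
    conn.commit()
    return True
```

Calling record_price once per listing on every scheduled run gives you a compact record of price changes over time.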

What's the best proxy type for AutoScout24?

Residential proxies work best because they appear as regular home internet connections. ISP proxies are also effective. Avoid datacenter proxies as they get blocked quickly.

Summary

You now have three working methods to scrape AutoScout24: Python requests with header rotation, Playwright for JavaScript rendering, and Nodriver for stealth scraping.

Start with the requests approach for small projects. Move to Playwright when you hit blocks. Use Nodriver if advanced anti-bot detection stops Playwright from working.

Always respect rate limits, use delays between requests, and rotate your IP addresses with residential proxies when scaling up.

The complete scraper code in this guide handles pagination, JSON-LD extraction, and exports to CSV and JSON formats. Adapt it to your specific needs and target markets.