AutoScout24 is Europe's largest online car marketplace with over 2 million active listings. Scraping AutoScout24 gives you access to vehicle prices, specifications, mileage data, and seller information across 18+ countries.
In this guide, you'll learn how to scrape AutoScout24 using Python with multiple approaches. We'll cover everything from basic HTTP requests to browser automation for handling their Akamai bot protection.
What Is AutoScout24 Scraping?
Scraping AutoScout24 means programmatically extracting vehicle listing data from their website. This data includes car prices, makes, models, mileage, year of registration, fuel type, transmission, and seller details.
AutoScout24 uses Akamai bot protection and JavaScript rendering, which makes simple scraping requests fail with 403 errors. You need specific techniques to bypass these protections and extract data reliably.
Why Scrape AutoScout24?
Car dealers, researchers, and data analysts scrape AutoScout24 for several reasons.
Price monitoring is the primary use case. Dealers track competitor pricing across European markets to adjust their own listings. Price differences between countries can be significant.
Market research helps manufacturers understand vehicle depreciation patterns. Knowing how mileage, age, and features affect resale value informs production decisions.
Inventory tracking lets buyers find specific vehicles matching their criteria. Rather than manually checking hundreds of listings, a scraper can alert you when the right car appears.
Lead generation for dealerships involves identifying private sellers who might want to trade in their vehicle. Contact information from listings creates sales opportunities.
Understanding AutoScout24's Structure
Before writing any code, you need to understand how AutoScout24 organizes its data.
URL Structure
AutoScout24 uses clean, predictable URLs for listings:
https://www.autoscout24.com/lst/{brand}?atype=C&cy=D&desc=0&sort=standard&ustate=N,U
The lst path indicates a listing search. The cy parameter filters by market using country codes such as D (Germany), A (Austria), or I (Italy).
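As a rough sketch, the search URL above can be assembled from its parts with the standard library. The helper name build_search_url and its defaults are illustrative only; the query parameters are copied from the example URL, and their exact meaning is an assumption to check against the live site.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.autoscout24.com"

def build_search_url(brand, country="D", model=None):
    """Build a listing-search URL following the pattern shown above.

    The parameter names (atype, cy, desc, sort, ustate) come from the
    example URL; treat their semantics as assumptions.
    """
    path = f"/lst/{brand}" + (f"/{model}" if model else "")
    params = {
        "atype": "C",
        "cy": country,
        "desc": "0",
        "sort": "standard",
        "ustate": "N,U",
    }
    # safe="," keeps the comma in ustate instead of encoding it as %2C
    return f"{BASE_URL}{path}?{urlencode(params, safe=',')}"

print(build_search_url("bmw"))
print(build_search_url("volkswagen", country="A", model="golf"))
```

Building URLs this way keeps the country and brand in one place, which helps when you loop over several markets.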
Individual car pages follow this pattern:
https://www.autoscout24.com/offers/{brand}-{model}-{details}-{unique-id}
Page Structure
Listing pages contain vehicle cards with summary information. Each card shows the title, price, mileage, year, fuel type, and a thumbnail image.
Detail pages hold complete specifications. You'll find engine data, color, number of owners, service history, and seller contact details.
Anti-Bot Protection
AutoScout24 deploys Akamai's bot management system. This protection includes browser fingerprinting, JavaScript challenges, and IP-based rate limiting.
Basic Python requests often get blocked immediately. To succeed, you need proper header rotation, residential proxies, or browser automation.
Method 1: Scraping with Python Requests
The simplest approach uses Python's requests library with careful header management. This works for small-scale scraping when you rotate user agents and add delays.
Installing Dependencies
Open your terminal and install the required packages:
pip install requests beautifulsoup4 lxml
These packages handle HTTP requests and HTML parsing.
Basic Request Setup
Create a new file called autoscout24_scraper.py and add this code:
import requests
from bs4 import BeautifulSoup
import random
import time

# Realistic user agents to rotate
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Safari/605.1.15',
]

def create_session():
    """Create a requests session with random headers."""
    session = requests.Session()
    session.headers.update({
        'User-Agent': random.choice(USER_AGENTS),
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.5',
        'Accept-Encoding': 'gzip, deflate, br',
        'Connection': 'keep-alive',
        'Upgrade-Insecure-Requests': '1',
        'Sec-Fetch-Dest': 'document',
        'Sec-Fetch-Mode': 'navigate',
        'Sec-Fetch-Site': 'none',
        'Sec-Fetch-User': '?1',
    })
    return session
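This code creates a session with browser-like headers. The Sec-Fetch-* headers mimic what Chrome sends on a top-level navigation. Note that requests only decodes Brotli (br) responses when the brotli or brotlicffi package is installed, so install one of them or remove br from Accept-Encoding.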
Fetching a Listing Page
Now add a function to fetch and parse search results:
def fetch_listings(url, session):
    """Fetch car listings from a search URL."""
    try:
        # Add random delay between requests
        time.sleep(random.uniform(2, 5))
        response = session.get(url, timeout=15)
        if response.status_code == 403:
            print("Blocked by anti-bot protection")
            return None
        if response.status_code != 200:
            print(f"Error: Status code {response.status_code}")
            return None
        return BeautifulSoup(response.content, 'lxml')
    except requests.RequestException as e:
        print(f"Request failed: {e}")
        return None
The function returns a BeautifulSoup object for parsing or None if the request fails.
Extracting Car Data
AutoScout24 uses specific CSS classes for car information. Here's how to extract listing data:
def extract_car_listings(soup):
    """Extract car data from search results page."""
    cars = []
    # Find all listing articles
    articles = soup.find_all('article', class_='cldt-summary-full-item')
    for article in articles:
        try:
            # Extract car title
            title_elem = article.find('a', class_='ListItem_title__ndA4s')
            title = title_elem.get_text(strip=True) if title_elem else 'N/A'
            # Extract link to detail page
            link = 'https://www.autoscout24.com' + title_elem['href'] if title_elem else None
            # Extract price
            price_elem = article.find('p', class_='Price_price__APlgs')
            price = price_elem.get_text(strip=True) if price_elem else 'N/A'
            # Extract mileage
            mileage_elem = article.find('span', attrs={'data-testid': 'VehicleDetails-mileage_road'})
            mileage = mileage_elem.get_text(strip=True) if mileage_elem else 'N/A'
            # Extract registration year
            year_elem = article.find('span', attrs={'data-testid': 'VehicleDetails-calendar'})
            year = year_elem.get_text(strip=True) if year_elem else 'N/A'
            # Extract fuel type
            fuel_elem = article.find('span', attrs={'data-testid': 'VehicleDetails-gas_pump'})
            fuel = fuel_elem.get_text(strip=True) if fuel_elem else 'N/A'
            # Extract transmission
            trans_elem = article.find('span', attrs={'data-testid': 'VehicleDetails-transmission'})
            transmission = trans_elem.get_text(strip=True) if trans_elem else 'N/A'
            # Extract power
            power_elem = article.find('span', attrs={'data-testid': 'VehicleDetails-speedometer'})
            power = power_elem.get_text(strip=True) if power_elem else 'N/A'
            cars.append({
                'title': title,
                'price': price,
                'mileage': mileage,
                'year': year,
                'fuel_type': fuel,
                'transmission': transmission,
                'power': power,
                'link': link
            })
        except Exception as e:
            print(f"Error parsing listing: {e}")
            continue
    return cars
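The data-testid attributes are identifiers that AutoScout24 uses for testing, and they change less frequently than CSS classes. Class names with hashed suffixes, such as ListItem_title__ndA4s, look build-generated and may change on any redeploy, so update them or match on the stable prefix if the title or price comes back as N/A.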
Running the Basic Scraper
Add a main function to tie everything together:
def main():
    """Main scraper function."""
    session = create_session()
    # BMW listings in Germany
    url = "https://www.autoscout24.com/lst/bmw?atype=C&cy=D&desc=0&sort=standard&ustate=N,U"
    print(f"Scraping: {url}")
    soup = fetch_listings(url, session)
    if soup:
        cars = extract_car_listings(soup)
        print(f"Found {len(cars)} listings")
        for car in cars[:5]:  # Print first 5
            print(f" {car['title']} - {car['price']}")
    else:
        print("Failed to fetch page")

if __name__ == "__main__":
    main()
Run this script with python autoscout24_scraper.py. If you get blocked, move to Method 2.
Method 2: Scraping with Playwright
Playwright renders JavaScript and handles dynamic content that plain requests miss. This approach bypasses many anti-bot checks because it runs a real browser.
Installing Playwright
Install Playwright and its browser binaries:
pip install playwright
playwright install chromium
The second command downloads Chromium, which Playwright uses for scraping.
Setting Up a Stealth Browser
Create a new file autoscout24_playwright.py:
from playwright.sync_api import sync_playwright
import random
import time

def create_browser_context(playwright):
    """Create a browser context with stealth settings."""
    browser = playwright.chromium.launch(
        headless=True,
        args=[
            '--disable-blink-features=AutomationControlled',
            '--no-sandbox',
            '--disable-dev-shm-usage',
        ]
    )
    context = browser.new_context(
        viewport={'width': 1920, 'height': 1080},
        user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
        locale='en-US',
        timezone_id='Europe/Berlin',
    )
    # Remove webdriver flag
    context.add_init_script("""
        Object.defineProperty(navigator, 'webdriver', {
            get: () => undefined
        });
    """)
    return browser, context
The add_init_script call overrides navigator.webdriver on every page so it returns undefined instead of the value that identifies automation.
Scraping with Playwright
Add the main scraping logic:
def scrape_autoscout24(url):
    """Scrape AutoScout24 using Playwright."""
    with sync_playwright() as playwright:
        browser, context = create_browser_context(playwright)
        page = context.new_page()
        try:
            # Navigate to page with longer timeout
            page.goto(url, wait_until='networkidle', timeout=30000)
            # Wait for listings to load
            page.wait_for_selector('article.cldt-summary-full-item', timeout=10000)
            # Extract data using page.evaluate for speed
            cars = page.evaluate("""
                () => {
                    const listings = [];
                    const articles = document.querySelectorAll('article.cldt-summary-full-item');
                    articles.forEach(article => {
                        const titleElem = article.querySelector('a[class*="ListItem_title"]');
                        const priceElem = article.querySelector('p[class*="Price_price"]');
                        const mileageElem = article.querySelector('[data-testid="VehicleDetails-mileage_road"]');
                        const yearElem = article.querySelector('[data-testid="VehicleDetails-calendar"]');
                        const fuelElem = article.querySelector('[data-testid="VehicleDetails-gas_pump"]');
                        listings.push({
                            title: titleElem ? titleElem.textContent.trim() : 'N/A',
                            link: titleElem ? 'https://www.autoscout24.com' + titleElem.getAttribute('href') : null,
                            price: priceElem ? priceElem.textContent.trim() : 'N/A',
                            mileage: mileageElem ? mileageElem.textContent.trim() : 'N/A',
                            year: yearElem ? yearElem.textContent.trim() : 'N/A',
                            fuel: fuelElem ? fuelElem.textContent.trim() : 'N/A',
                        });
                    });
                    return listings;
                }
            """)
            return cars
        except Exception as e:
            print(f"Error: {e}")
            return []
        finally:
            browser.close()

def main():
    url = "https://www.autoscout24.com/lst/volkswagen/golf?atype=C&cy=D"
    print("Scraping with Playwright...")
    cars = scrape_autoscout24(url)
    print(f"Found {len(cars)} listings")
    for car in cars[:5]:
        print(f" {car['title']} - {car['price']}")

if __name__ == "__main__":
    main()
Using page.evaluate() runs JavaScript in the browser context. This is faster than selecting elements one by one from Python.
Method 3: Using Nodriver for Stealth Scraping
Nodriver is a newer alternative from the author of undetected-chromedriver. It drives Chrome directly over the Chrome DevTools Protocol (CDP) without Selenium or a WebDriver binary, which removes several of the signals that advanced anti-bot systems look for.
Installing Nodriver
pip install nodriver
Nodriver launches a Chrome or Chromium browser that is already installed on your machine, so make sure one is available. No separate driver binary is required.
Stealth Scraping with Nodriver
Create autoscout24_nodriver.py:
import nodriver as uc
import asyncio

async def scrape_with_nodriver(url):
    """Scrape AutoScout24 using Nodriver for stealth."""
    browser = await uc.start()
    try:
        page = await browser.get(url)
        # Wait for content to load
        await page.sleep(3)
        # Find all listing articles
        articles = await page.select_all('article.cldt-summary-full-item')
        cars = []
        for article in articles:
            try:
                title_elem = await article.query_selector('a[class*="ListItem_title"]')
                price_elem = await article.query_selector('p[class*="Price_price"]')
                # Element.text is a plain property in Nodriver, so it is not awaited
                title = title_elem.text if title_elem else 'N/A'
                price = price_elem.text if price_elem else 'N/A'
                cars.append({
                    'title': title.strip(),
                    'price': price.strip(),
                })
            except Exception as e:
                # Report the problem instead of silently dropping the listing
                print(f"Error parsing listing: {e}")
                continue
        return cars
    finally:
        # Browser.stop() is a regular (non-async) method in Nodriver
        browser.stop()

async def main():
    url = "https://www.autoscout24.com/lst/audi/a4?atype=C&cy=D"
    print("Scraping with Nodriver...")
    cars = await scrape_with_nodriver(url)
    print(f"Found {len(cars)} listings")
    for car in cars[:5]:
        print(f" {car['title']} - {car['price']}")

if __name__ == "__main__":
    asyncio.run(main())
Because Nodriver does not rely on WebDriver, it avoids some of the automation fingerprints that get Playwright and Selenium blocked in 2026.
Extracting Hidden JSON-LD Data
AutoScout24 embeds structured data in JSON-LD format. This data is cleaner than parsing HTML and includes information not visible on the page.
Finding JSON-LD Scripts
Look for <script type="application/ld+json"> tags in the page source:
import json
from bs4 import BeautifulSoup

VEHICLE_TYPES = ('Vehicle', 'Car')

def extract_json_ld(soup):
    """Extract JSON-LD structured data from page."""
    scripts = soup.find_all('script', type='application/ld+json')
    for script in scripts:
        try:
            # script.string is None for empty tags; '' makes json.loads raise cleanly
            data = json.loads(script.string or '')
        except json.JSONDecodeError:
            continue
        # Normalize to a list so single objects and arrays share one code path
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and item.get('@type') in VEHICLE_TYPES:
                return item
    return None
Parsing Vehicle Schema
The Vehicle schema contains standardized fields:
def _as_dict(value):
    """Return value if it is a dict, otherwise an empty dict."""
    return value if isinstance(value, dict) else {}

def parse_vehicle_schema(schema):
    """Parse Vehicle JSON-LD schema into clean data."""
    if not schema:
        return None
    # Nested fields may be missing, plain strings, or lists, so guard each one
    brand = schema.get('brand')
    mileage = _as_dict(schema.get('mileageFromOdometer'))
    offers = _as_dict(schema.get('offers'))
    return {
        'name': schema.get('name'),
        'brand': brand.get('name') if isinstance(brand, dict) else brand,
        'model': schema.get('model'),
        'year': schema.get('vehicleModelDate') or schema.get('productionDate'),
        'mileage': mileage.get('value'),
        'mileage_unit': mileage.get('unitCode'),
        'fuel_type': schema.get('fuelType'),
        'transmission': schema.get('vehicleTransmission'),
        'color': schema.get('color'),
        'price': offers.get('price'),
        'currency': offers.get('priceCurrency'),
        'seller': _as_dict(offers.get('seller')).get('name'),
        'url': schema.get('url'),
    }
JSON-LD extraction is more reliable than CSS selectors because the schema follows a standard format.
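JSON-LD documents do not always put the vehicle at the top level: the standard allows arrays, a list-valued @type, and nodes nested under @graph. Whether AutoScout24 uses any of these shapes is an assumption here, so the sketch below simply searches all of them. The function name find_vehicle_nodes and the sample document are made up for illustration.

```python
import json

VEHICLE_TYPES = {"Vehicle", "Car"}

def find_vehicle_nodes(data):
    """Recursively collect JSON-LD nodes whose @type is Vehicle or Car."""
    found = []
    if isinstance(data, list):
        for item in data:
            found.extend(find_vehicle_nodes(item))
    elif isinstance(data, dict):
        node_type = data.get("@type")
        # @type may be a single string or a list of strings
        types = set(node_type) if isinstance(node_type, list) else {node_type}
        if types & VEHICLE_TYPES:
            found.append(data)
        # Some pages nest their nodes under "@graph"
        found.extend(find_vehicle_nodes(data.get("@graph", [])))
    return found

# Invented sample document, not real AutoScout24 output
sample = json.loads("""
{
  "@context": "https://schema.org",
  "@graph": [
    {"@type": "WebPage", "name": "Listing"},
    {"@type": ["Product", "Car"], "name": "Example Car",
     "offers": {"@type": "Offer", "price": 19990, "priceCurrency": "EUR"}}
  ]
}
""")

for node in find_vehicle_nodes(sample):
    print(node["name"], node["offers"]["price"], node["offers"]["priceCurrency"])
```

Pass the result of json.loads on each script tag's contents to this function, then hand the first match to your schema parser.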
Handling Pagination
AutoScout24 shows 20 listings per page. To scrape all results, you need to iterate through pages.
Building Pagination URLs
Add a page parameter to your search URL:
def build_pagination_urls(base_url, max_pages=20):
    """Generate paginated URLs."""
    urls = []
    for page in range(1, max_pages + 1):
        # AutoScout24 uses 'page' parameter
        if '?' in base_url:
            paginated_url = f"{base_url}&page={page}"
        else:
            paginated_url = f"{base_url}?page={page}"
        urls.append(paginated_url)
    return urls
AutoScout24 limits results to 400 listings per search (20 pages × 20 results). For more data, split your search with filters.
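One way to split a search is by price band, so that each sub-search stays under the 400-listing cap. The sketch below generates non-overlapping bands and appends them to a base URL. The parameter names pricefrom and priceto are assumptions: apply a price filter in your browser and check the resulting URL before relying on them.

```python
from urllib.parse import urlencode

def price_bands(low, high, step):
    """Yield (from, to) price pairs that cover low..high without overlap."""
    if step < 1:
        raise ValueError("step must be at least 1")
    start = low
    while start <= high:
        end = min(start + step - 1, high)
        yield start, end
        start = end + 1

def banded_urls(base_url, low, high, step):
    """Return one URL per price band.

    'pricefrom' and 'priceto' are assumed parameter names, not confirmed ones.
    """
    separator = "&" if "?" in base_url else "?"
    return [
        f"{base_url}{separator}{urlencode({'pricefrom': lo, 'priceto': hi})}"
        for lo, hi in price_bands(low, high, step)
    ]

base = "https://www.autoscout24.com/lst/bmw?atype=C&cy=D"
for url in banded_urls(base, 0, 29999, 10000):
    print(url)
```

If a band still returns a full 20 pages, narrow the step for that range or add a second filter such as registration year.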
Scraping Multiple Pages
def scrape_all_pages(base_url, session, max_pages=5):
    """Scrape multiple pages of results."""
    all_cars = []
    for page in range(1, max_pages + 1):
        url = f"{base_url}&page={page}" if '?' in base_url else f"{base_url}?page={page}"
        print(f"Scraping page {page}...")
        soup = fetch_listings(url, session)
        if not soup:
            print(f"Failed on page {page}, stopping")
            break
        cars = extract_car_listings(soup)
        if not cars:
            print("No more listings found")
            break
        all_cars.extend(cars)
        # Random delay between pages (on top of the delay inside fetch_listings)
        time.sleep(random.uniform(3, 7))
    return all_cars
Adding longer delays between pages reduces the chance of triggering rate limits.
Using Proxies to Avoid Blocks
When scraping at scale, you need rotating proxies to distribute requests across different IP addresses. AutoScout24 blocks datacenter IPs quickly, so residential proxies work best.
Setting Up Proxy Rotation
If you need residential proxies for AutoScout24 scraping, providers like Roundproxies.com offer rotating residential pools that work well with European car marketplaces.
Here's how to integrate proxies with requests:
def create_session_with_proxy(proxy_url):
    """Create session with proxy authentication."""
    session = requests.Session()
    session.proxies = {
        'http': proxy_url,
        'https': proxy_url,
    }
    session.headers.update({
        'User-Agent': random.choice(USER_AGENTS),
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'de-DE,de;q=0.9,en;q=0.8',
    })
    return session

def scrape_with_proxy_rotation(urls, proxy_list):
    """Scrape URLs with rotating proxies."""
    all_results = []
    for i, url in enumerate(urls):
        # Rotate through proxy list
        proxy = proxy_list[i % len(proxy_list)]
        session = create_session_with_proxy(proxy)
        soup = fetch_listings(url, session)
        if soup:
            cars = extract_car_listings(soup)
            all_results.extend(cars)
        time.sleep(random.uniform(2, 5))
    return all_results
Using Proxies with Playwright
For Playwright, pass proxy settings when creating the browser:
def create_browser_with_proxy(playwright, proxy_server, proxy_username, proxy_password):
    """Create Playwright browser with proxy."""
    browser = playwright.chromium.launch(
        headless=True,
        proxy={
            'server': proxy_server,
            'username': proxy_username,
            'password': proxy_password,
        }
    )
    return browser
Residential proxies from countries like Germany, Austria, or Switzerland work best for AutoScout24.
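The requests-based helper takes a single proxy URL, while Playwright takes the server and credentials separately. A small sketch of building the URL form safely is shown below; build_proxy_url is a hypothetical helper, and the host, port, and credentials are placeholders rather than a real provider endpoint.

```python
from urllib.parse import quote

def build_proxy_url(host, port, username=None, password=None, scheme="http"):
    """Return a proxy URL in the scheme://user:pass@host:port form."""
    if username is None:
        return f"{scheme}://{host}:{port}"
    # Percent-encode credentials so characters like '@' or ':' don't break the URL
    user = quote(username, safe="")
    pwd = quote(password or "", safe="")
    return f"{scheme}://{user}:{pwd}@{host}:{port}"

# Placeholder endpoint and credentials, for illustration only
proxy_url = build_proxy_url("proxy.example.com", 8000, "user-de", "p@ss:word")
print(proxy_url)
```

The resulting string can be passed to create_session_with_proxy. For Playwright, keep the raw username and password and pass them in the proxy dictionary instead.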
Complete AutoScout24 Scraper
Here's a complete scraper that combines the techniques above. Treat it as a starting point and add the proxy and retry logic your scale requires:
#!/usr/bin/env python3
"""
AutoScout24 Scraper - Complete solution for extracting car listings
"""
import requests
from bs4 import BeautifulSoup
import json
import csv
import random
import time
from datetime import datetime
import re

# Configuration
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0',
]
REQUEST_DELAY = (2, 5)  # Random delay range in seconds
class AutoScout24Scraper:
    """Scraper for AutoScout24 car listings."""

    def __init__(self, proxy=None):
        self.session = self._create_session(proxy)
        self.results = []

    def _create_session(self, proxy=None):
        """Initialize requests session."""
        session = requests.Session()
        if proxy:
            session.proxies = {'http': proxy, 'https': proxy}
        session.headers.update({
            'User-Agent': random.choice(USER_AGENTS),
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.5',
            'Accept-Encoding': 'gzip, deflate, br',
            'Connection': 'keep-alive',
            'Upgrade-Insecure-Requests': '1',
        })
        return session

    def _fetch_page(self, url):
        """Fetch a single page."""
        try:
            time.sleep(random.uniform(*REQUEST_DELAY))
            response = self.session.get(url, timeout=20)
            if response.status_code == 200:
                return BeautifulSoup(response.content, 'lxml')
            else:
                print(f"HTTP {response.status_code} for {url}")
                return None
        except Exception as e:
            print(f"Error fetching {url}: {e}")
            return None
    def _extract_listings(self, soup):
        """Extract listings from search page."""
        cars = []
        articles = soup.find_all('article', class_='cldt-summary-full-item')
        for article in articles:
            car = self._parse_listing(article)
            if car:
                cars.append(car)
        return cars

    def _parse_listing(self, article):
        """Parse a single listing element."""
        try:
            # Title and link
            title_elem = article.find('a', class_=re.compile(r'ListItem_title'))
            title = title_elem.get_text(strip=True) if title_elem else 'N/A'
            link = 'https://www.autoscout24.com' + title_elem['href'] if title_elem and title_elem.get('href') else None
            # Price - handle dynamic classes with regex
            price_elem = article.find('p', class_=re.compile(r'Price_price'))
            if not price_elem:
                # Fallback: search for currency pattern
                price_text = article.get_text()
                price_match = re.search(r'[€£]\s*[\d,.]+', price_text)
                price = price_match.group(0) if price_match else 'N/A'
            else:
                price = price_elem.get_text(strip=True)
            # Vehicle details using data-testid
            mileage_elem = article.find(attrs={'data-testid': 'VehicleDetails-mileage_road'})
            year_elem = article.find(attrs={'data-testid': 'VehicleDetails-calendar'})
            fuel_elem = article.find(attrs={'data-testid': 'VehicleDetails-gas_pump'})
            trans_elem = article.find(attrs={'data-testid': 'VehicleDetails-transmission'})
            power_elem = article.find(attrs={'data-testid': 'VehicleDetails-speedometer'})
            # Seller info
            seller_elem = article.find('span', class_=re.compile(r'SellerInfo_name'))
            location_elem = article.find('span', class_=re.compile(r'SellerInfo_address'))
            return {
                'title': title,
                'price': price,
                'mileage': mileage_elem.get_text(strip=True) if mileage_elem else 'N/A',
                'year': year_elem.get_text(strip=True) if year_elem else 'N/A',
                'fuel_type': fuel_elem.get_text(strip=True) if fuel_elem else 'N/A',
                'transmission': trans_elem.get_text(strip=True) if trans_elem else 'N/A',
                'power': power_elem.get_text(strip=True) if power_elem else 'N/A',
                'seller': seller_elem.get_text(strip=True) if seller_elem else 'N/A',
                'location': location_elem.get_text(strip=True) if location_elem else 'N/A',
                'link': link,
                'scraped_at': datetime.now().isoformat(),
            }
        except Exception as e:
            print(f"Error parsing listing: {e}")
            return None
    def scrape_search(self, base_url, max_pages=5):
        """Scrape multiple pages of search results."""
        for page in range(1, max_pages + 1):
            url = f"{base_url}&page={page}" if '?' in base_url else f"{base_url}?page={page}"
            print(f"Scraping page {page}...")
            soup = self._fetch_page(url)
            if not soup:
                break
            cars = self._extract_listings(soup)
            if not cars:
                print("No more listings")
                break
            self.results.extend(cars)
            print(f" Found {len(cars)} listings")
        return self.results

    def scrape_detail_page(self, url):
        """Scrape a single car detail page."""
        soup = self._fetch_page(url)
        if not soup:
            return None
        # Try JSON-LD extraction first
        json_ld = self._extract_json_ld(soup)
        if json_ld:
            return self._parse_vehicle_schema(json_ld)
        # Fallback to HTML parsing
        return self._parse_detail_page(soup)
    def _extract_json_ld(self, soup):
        """Extract JSON-LD Vehicle schema."""
        scripts = soup.find_all('script', type='application/ld+json')
        for script in scripts:
            try:
                # script.string is None for empty tags; '' makes json.loads raise cleanly
                data = json.loads(script.string or '')
            except json.JSONDecodeError:
                continue
            # Treat a single object as a one-item list
            items = data if isinstance(data, list) else [data]
            for item in items:
                if isinstance(item, dict) and item.get('@type') in ['Vehicle', 'Car', 'Product']:
                    return item
        return None

    def _parse_vehicle_schema(self, schema):
        """Parse Vehicle schema to dict."""
        # Nested fields may be missing or not dicts, so normalize them first
        offers = schema.get('offers')
        offers = offers if isinstance(offers, dict) else {}
        mileage = schema.get('mileageFromOdometer')
        mileage = mileage if isinstance(mileage, dict) else {}
        brand = schema.get('brand')
        return {
            'name': schema.get('name'),
            'brand': brand.get('name') if isinstance(brand, dict) else brand,
            'model': schema.get('model'),
            'year': schema.get('vehicleModelDate'),
            'mileage': mileage.get('value'),
            'fuel_type': schema.get('fuelType'),
            'transmission': schema.get('vehicleTransmission'),
            'color': schema.get('color'),
            'price': offers.get('price'),
            'currency': offers.get('priceCurrency'),
            'url': schema.get('url'),
        }
    def _parse_detail_page(self, soup):
        """Parse detail page HTML."""
        # Title
        title_elem = soup.find('h1', class_=re.compile(r'StageTitle'))
        title = title_elem.get_text(strip=True) if title_elem else 'N/A'
        # Price
        price_elem = soup.find('span', class_=re.compile(r'PriceInfo_price'))
        price = price_elem.get_text(strip=True) if price_elem else 'N/A'
        # Collect vehicle overview items
        details = {}
        overview_items = soup.find_all('div', class_=re.compile(r'VehicleOverview_itemContainer'))
        for item in overview_items:
            label = item.find('div', class_=re.compile(r'VehicleOverview_itemTitle'))
            value = item.find('div', class_=re.compile(r'VehicleOverview_itemText'))
            if label and value:
                key = label.get_text(strip=True).lower().replace(' ', '_')
                details[key] = value.get_text(strip=True)
        return {
            'title': title,
            'price': price,
            **details,
        }

    def save_to_csv(self, filename):
        """Save results to CSV file."""
        if not self.results:
            print("No results to save")
            return
        keys = self.results[0].keys()
        with open(filename, 'w', newline='', encoding='utf-8') as f:
            writer = csv.DictWriter(f, fieldnames=keys)
            writer.writeheader()
            writer.writerows(self.results)
        print(f"Saved {len(self.results)} listings to {filename}")

    def save_to_json(self, filename):
        """Save results to JSON file."""
        with open(filename, 'w', encoding='utf-8') as f:
            json.dump(self.results, f, indent=2, ensure_ascii=False)
        print(f"Saved {len(self.results)} listings to {filename}")
def main():
    """Example usage."""
    # Initialize scraper
    scraper = AutoScout24Scraper()
    # Define search URL
    search_url = "https://www.autoscout24.com/lst/mercedes-benz/c-class?atype=C&cy=D&desc=0&sort=standard&ustate=N,U"
    # Scrape search results
    print("Starting AutoScout24 scraper...")
    results = scraper.scrape_search(search_url, max_pages=3)
    print(f"\nTotal listings scraped: {len(results)}")
    # Save results
    scraper.save_to_csv('autoscout24_listings.csv')
    scraper.save_to_json('autoscout24_listings.json')
    # Optionally scrape detail pages
    if results and results[0].get('link'):
        print("\nScraping first detail page...")
        detail = scraper.scrape_detail_page(results[0]['link'])
        if detail:
            print(json.dumps(detail, indent=2))

if __name__ == "__main__":
    main()
This complete scraper handles search pages, detail pages, JSON-LD extraction, dynamic class matching, and data export.
Common Errors and Fixes
403 Forbidden Errors
AutoScout24's Akamai protection triggers 403 responses when it detects automation.
Fix: Switch from requests to Playwright or Nodriver. Use residential proxies and longer delays between requests.
Empty Results
Sometimes selectors return no data because AutoScout24 updated their HTML.
Fix: Use data-testid attributes instead of CSS classes. These change less often. Alternatively, extract JSON-LD data which follows a stable schema.
Captcha Challenges
Aggressive scraping triggers captcha verification pages.
Fix: Reduce request frequency. Use residential proxies with sticky sessions. Consider running in headful mode to solve captchas manually.
IP Blocks
Repeated requests from the same IP get temporarily or permanently blocked.
Fix: Rotate through a pool of residential proxies. Space requests across different times of day.
Rate Limiting
Too many requests in a short period triggers rate limits.
Fix: Add random delays of 3-10 seconds between requests. Scrape during off-peak hours (European nights).
Comparing Scraping Methods
Each scraping method has trade-offs. Here's a quick comparison:
| Method | Speed | Detection Risk | Setup Complexity | Best For |
|---|---|---|---|---|
| Python Requests | Fast | High | Low | Small datasets, testing |
| Playwright | Medium | Medium | Medium | JS-heavy pages, medium scale |
| Nodriver | Medium | Low | Low | Bypassing advanced protection |
| Requests + Proxies | Fast | Low | Medium | Large-scale production |
Choose requests for quick prototypes. Use Playwright when you need JavaScript execution. Pick Nodriver when Playwright gets detected.
For production scraping, combine requests with rotating residential proxies. This gives you speed and reliability at scale.
Best Practices for AutoScout24 Scraping
Following these practices helps you avoid blocks and maintain data quality.
Respect the Site
Add delays between requests. AutoScout24 serves millions of users daily. Hammering their servers hurts everyone.
Scrape during off-peak hours. European nighttime (midnight to 6 AM CET) sees less traffic and fewer rate limits.
Don't scrape the same listings repeatedly. Store data locally and only refresh periodically.
Handle Errors Gracefully
Implement exponential backoff for failed requests. If a request fails, wait 5 seconds before retrying. Double the wait time for each subsequent failure.
def fetch_with_retry(url, session, max_retries=3):
    """Fetch with exponential backoff."""
    delay = 5
    for attempt in range(max_retries):
        try:
            response = session.get(url, timeout=15)
            if response.status_code == 200:
                return response
            elif response.status_code == 429:  # Rate limited
                time.sleep(delay)
                delay *= 2
            else:
                return None
        except Exception:
            time.sleep(delay)
            delay *= 2
    return None
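To see what the doubling rule produces, the schedule can be computed on its own. The sketch below also adds an optional cap and random jitter, which are not part of the function above; they are common additions so that several workers do not retry at the same instant. The name backoff_delays and its defaults are illustrative.

```python
import random

def backoff_delays(base=5.0, retries=4, cap=120.0, jitter=0.0, rng=random):
    """Return the wait before each retry: base, 2*base, 4*base, ... up to cap.

    jitter is a fraction (0.25 means +/-25%) used to randomize each delay.
    """
    delays = []
    for attempt in range(retries):
        delay = min(base * (2 ** attempt), cap)
        if jitter:
            delay *= 1 + rng.uniform(-jitter, jitter)
        delays.append(delay)
    return delays

print(backoff_delays())             # [5.0, 10.0, 20.0, 40.0]
print(backoff_delays(jitter=0.25))  # randomized around the same values
```

With a 5-second base, four retries already add up to 75 seconds of waiting, so keep max_retries small for pages you can afford to skip.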
Validate Your Data
Check that extracted data makes sense. Prices should be positive numbers. Years should be between 1980 and 2026. Mileage can't be negative.
import re

def validate_listing(listing):
    """Validate a car listing and add numeric price and year fields."""
    # Clean price: keep digits only (assumes whole-number prices such as "€ 25,900")
    price_str = listing.get('price', '')
    price_clean = re.sub(r'[^\d]', '', price_str)
    if price_clean and int(price_clean) > 0:
        listing['price_numeric'] = int(price_clean)
    else:
        listing['price_numeric'] = None
    # Validate year: default to None so the key exists even when no year is found
    listing['year_numeric'] = None
    year_str = listing.get('year', '')
    year_match = re.search(r'(\d{4})', year_str)
    if year_match:
        year = int(year_match.group(1))
        if 1980 <= year <= 2026:
            listing['year_numeric'] = year
    return listing
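The validation function covers price and year but not mileage. A minimal sketch of the mileage check follows. It assumes mileage strings look like "45,000 km" or "45.000 km", with the comma or dot acting as a thousands separator, and it uses an arbitrary upper bound of one million kilometres. Both the helper name parse_mileage and that bound are assumptions.

```python
import re

MAX_PLAUSIBLE_KM = 1_000_000  # arbitrary sanity limit, adjust to your data

def parse_mileage(text):
    """Convert a mileage string such as '45,000 km' to an int, or None if invalid."""
    match = re.search(r'(-?)\s*(\d[\d.,\s]*)', text or '')
    if not match:
        return None
    sign, digits = match.groups()
    # Drop thousands separators and whitespace, keeping digits only
    value = int(re.sub(r'\D', '', digits))
    # Negative or implausibly large odometer readings are treated as invalid
    if sign or value > MAX_PLAUSIBLE_KM:
        return None
    return value

for raw in ['45,000 km', '45.000 km', 'N/A', '-500 km']:
    print(repr(raw), '->', parse_mileage(raw))
```

Store the result under a separate key such as mileage_numeric so the original text stays available for debugging.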
Store Data Efficiently
For large datasets, use a database instead of CSV files. SQLite works for local storage. PostgreSQL handles concurrent writes better.
import sqlite3

def create_database():
    """Create SQLite database for listings."""
    conn = sqlite3.connect('autoscout24.db')
    cursor = conn.cursor()
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS listings (
            id TEXT PRIMARY KEY,
            title TEXT,
            price INTEGER,
            mileage INTEGER,
            year INTEGER,
            fuel_type TEXT,
            transmission TEXT,
            link TEXT UNIQUE,
            scraped_at TEXT
        )
    ''')
    conn.commit()
    return conn
Using link as a unique constraint prevents duplicate entries when you re-scrape.
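The constraint only helps if inserts are written to tolerate conflicts. One option in SQLite is INSERT OR IGNORE, sketched below against an in-memory database with a trimmed-down copy of the schema. The save_listing helper, the sample row, and the way the id is chosen are all illustrative assumptions.

```python
import sqlite3

def save_listing(conn, listing):
    """Insert a listing; return True if a row was written, False if it was a duplicate."""
    cursor = conn.execute(
        """
        INSERT OR IGNORE INTO listings (id, title, price, link, scraped_at)
        VALUES (:id, :title, :price, :link, :scraped_at)
        """,
        listing,
    )
    conn.commit()
    return cursor.rowcount == 1

# In-memory database with a reduced version of the listings table
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE listings (id TEXT PRIMARY KEY, title TEXT, price INTEGER, "
    "link TEXT UNIQUE, scraped_at TEXT)"
)

# Made-up sample row for illustration
row = {
    "id": "example-1",
    "title": "Example Car",
    "price": 19990,
    "link": "https://www.autoscout24.com/offers/example-1",
    "scraped_at": "2026-01-01T00:00:00",
}
print(save_listing(conn, row))                       # True
print(save_listing(conn, dict(row, id="other-id")))  # False: same link
```

INSERT OR IGNORE keeps the first version of a row. If you want re-scrapes to refresh the price, an upsert with ON CONFLICT(link) DO UPDATE is the alternative to look at.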
Advanced Techniques
These techniques handle edge cases and improve scraping reliability.
Fingerprint Rotation
Anti-bot systems track browser fingerprints. Rotate your fingerprint between sessions.
import random

def generate_fingerprint():
    """Generate random browser fingerprint settings."""
    screen_sizes = [(1920, 1080), (1366, 768), (1536, 864), (1440, 900)]
    languages = ['en-US', 'de-DE', 'en-GB', 'fr-FR']
    timezones = ['Europe/Berlin', 'Europe/Vienna', 'Europe/Zurich', 'Europe/Paris']
    return {
        'viewport': random.choice(screen_sizes),
        'language': random.choice(languages),
        'timezone': random.choice(timezones),
    }
Apply different fingerprints to each browser session.
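To apply a fingerprint with Playwright, its fields need to be mapped onto the keyword arguments that new_context() accepts: viewport, locale, and timezone_id, as used in the Playwright setup earlier. A minimal sketch of that mapping is shown here; the helper name fingerprint_to_context_options is hypothetical.

```python
def fingerprint_to_context_options(fingerprint):
    """Map a fingerprint dict to keyword arguments for Playwright's new_context()."""
    width, height = fingerprint['viewport']
    return {
        'viewport': {'width': width, 'height': height},
        'locale': fingerprint['language'],
        'timezone_id': fingerprint['timezone'],
    }

# Fixed example values; in practice pass the dict from generate_fingerprint()
fingerprint = {
    'viewport': (1366, 768),
    'language': 'de-DE',
    'timezone': 'Europe/Berlin',
}
options = fingerprint_to_context_options(fingerprint)
print(options)
# With Playwright installed: context = browser.new_context(**options)
```

Picking language and timezone independently can produce unlikely pairs such as fr-FR with Europe/Vienna, so you may prefer to choose them together as one profile.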
Session Persistence
Keep sessions alive to maintain cookies and avoid re-authentication.
import pickle

def save_session(session, filename='session.pkl'):
    """Save session cookies to file."""
    with open(filename, 'wb') as f:
        pickle.dump(session.cookies, f)

def load_session(session, filename='session.pkl'):
    """Load session cookies from file."""
    try:
        with open(filename, 'rb') as f:
            cookies = pickle.load(f)
            session.cookies.update(cookies)
    except FileNotFoundError:
        pass
Concurrent Scraping
Speed up scraping with concurrent requests. Be careful not to exceed rate limits.
from concurrent.futures import ThreadPoolExecutor, as_completed

def scrape_urls_concurrently(urls, fetch_and_parse, max_workers=3):
    """Scrape multiple URLs concurrently.

    fetch_and_parse is a function you supply: it takes one URL and
    returns the parsed result, or None on failure.
    """
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(fetch_and_parse, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                result = future.result()
                if result:
                    results.append(result)
            except Exception as e:
                print(f"Failed {url}: {e}")
    return results
Keep max_workers low (3-5) to avoid triggering rate limits.
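A low worker count limits parallelism but not the overall request rate. One way to enforce a minimum gap between requests across all threads is a small shared limiter, sketched below. MinIntervalLimiter is a hypothetical helper, and the clock and sleep parameters exist only to make it testable.

```python
import threading
import time

class MinIntervalLimiter:
    """Allow at most one call per `interval` seconds across all threads."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self):
        """Block until this caller's reserved time slot arrives; return the delay."""
        with self._lock:
            now = self._clock()
            slot = max(now, self._next_allowed)
            # Reserve the following slot before releasing the lock
            self._next_allowed = slot + self.interval
        delay = slot - now
        if delay > 0:
            self._sleep(delay)
        return delay

# Short interval so the demo finishes quickly; use seconds, not tenths, for real scraping
limiter = MinIntervalLimiter(interval=0.1)
start = time.monotonic()
for _ in range(3):
    limiter.wait()
print(f"3 calls spaced over {time.monotonic() - start:.2f}s")
```

Create one limiter, share it between workers, and call limiter.wait() at the top of the function each worker runs.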
FAQ
Is scraping AutoScout24 legal?
Scraping publicly available data is generally legal for personal use and research. However, republishing scraped data or using it commercially may violate their terms of service. Consult a lawyer for your specific use case.
How many listings can I scrape per day?
AutoScout24 limits search results to 400 per query. With careful rate limiting and proxy rotation, you can scrape thousands of listings daily. Going too fast risks IP blocks.
Why does my scraper get blocked after a few requests?
Datacenter IPs are flagged immediately. AutoScout24's Akamai protection identifies non-browser requests. Switch to Playwright with residential proxies.
How do I handle different AutoScout24 country sites?
Each country uses a different domain or URL structure. Germany uses autoscout24.de, Switzerland uses autoscout24.ch. Adjust your base URL and potentially the CSS selectors.
Can I scrape historical pricing data?
No. AutoScout24 only shows current listings. For historical data, you need to scrape regularly and store results in a database over time.
What's the best proxy type for AutoScout24?
Residential proxies work best because they appear as regular home internet connections. ISP proxies are also effective. Avoid datacenter proxies as they get blocked quickly.
Summary
You now have three working methods to scrape AutoScout24: Python requests with header rotation, Playwright for JavaScript rendering, and Nodriver for stealth scraping.
Start with the requests approach for small projects. Move to Playwright when you hit blocks. Use Nodriver if advanced anti-bot detection stops Playwright from working.
Always respect rate limits, use delays between requests, and rotate your IP addresses with residential proxies when scaling up.
The complete scraper code in this guide handles pagination, JSON-LD extraction, and exports to CSV and JSON formats. Adapt it to your specific needs and target markets.