Immobilienscout24 is Germany's largest real estate portal, and one of the most aggressively protected scraping targets in Europe.
Most tutorials on this topic funnel you into a paid scraping API within the first three paragraphs. This one doesn't. You'll build everything yourself with Python and Playwright: extract data from hidden JSON blobs, handle pagination without triggering bot detection, and store the results in a clean CSV file.
I've scraped Immobilienscout24 across hundreds of sessions while apartment hunting in Berlin and Munich. The techniques here reflect what actually survives their anti-bot stack in 2026.
What Is Immobilienscout24 Scraping?
Immobilienscout24 scraping is the process of programmatically extracting property listing data — rent prices, square footage, addresses, and amenities — from Germany's dominant real estate platform, immobilienscout24.de. It works by fetching search result pages and parsing hidden JSON data embedded in <script> tags rather than scraping visible HTML. Use it when you need structured real estate data for market analysis, apartment hunting automation, or investment research.
What Data Can You Scrape From Immobilienscout24?
When you scrape Immobilienscout24 search pages, the hidden JSON contains far more than what's visible in the UI.
Here's the full field map:
| Field | Source | Notes |
|---|---|---|
| Listing title | Search + Exposé | Usually includes address and room count |
| Cold rent (Kaltmiete) | Search + Exposé | Base rent before utilities |
| Warm rent (Warmmiete) | Exposé only | Includes Nebenkosten |
| Living area (m²) | Search + Exposé | |
| Number of rooms | Search + Exposé | German convention counts half-rooms |
| Full address | Exposé only | Search pages show approximate location |
| GPS coordinates | Search JSON | Latitude/longitude for each listing |
| Construction year | Exposé only | |
| Energy efficiency class | Exposé only | A+ through H |
| Floor level | Exposé only | |
| Deposit amount | Exposé only | Usually 2–3 months cold rent |
| Available from date | Exposé only | |
| Agent name and company | Exposé only | GDPR-sensitive — see legal section |
| Image URLs | Search + Exposé | Thumbnail in search, full-res in Exposé |
Search pages give you enough for filtering and analysis. Exposé pages give you the complete picture.
Most people who scrape Immobilienscout24 at scale work with search data first, then selectively fetch Exposé pages for listings that match their criteria. This keeps your request volume low and avoids unnecessary detection risk.
Prerequisites
Before writing any code, you'll need:
- Python 3.10+ installed
- Playwright for browser automation with stealth capabilities
- parsel for HTML/JSON parsing
- A basic understanding of browser DevTools
Set up your project with these commands:
```bash
mkdir immoscout-scraper && cd immoscout-scraper
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install playwright parsel
playwright install chromium
```
That last command downloads the Chromium binary that Playwright controls. No system Chrome installation required.
How Immobilienscout24's Anti-Bot System Works
Before you write a single request, you need to understand what you're up against.
Immobilienscout24 uses a layered detection system. It checks your TLS fingerprint, JavaScript execution environment, and behavioral patterns like mouse movement and scroll timing.
A plain requests.get() call will return a 403 or a challenge page within one or two requests. That's why every guide out there pushes paid proxy APIs — the naive approach genuinely doesn't work.
Here's what the detection stack looks for:
| Signal | What Gets Flagged |
|---|---|
| TLS fingerprint | Python requests has a distinct fingerprint vs. real browsers |
| Navigator properties | Headless Chrome exposes navigator.webdriver = true |
| Pagination pattern | Passing ?pagenumber=1 on the first page triggers a block |
| Request rate | More than ~10 requests/minute from one IP |
| Missing cookies | No prior session cookies = suspicious |
That pagination trap is worth highlighting. Every page except the first uses ?pagenumber=N in the URL. But if you include ?pagenumber=1 for the first page, Immobilienscout24 flags the request as bot traffic. Omit it entirely for page one.
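That rule is easy to centralize in a small helper so no call site can get it wrong. A minimal sketch (`build_page_url` is a name introduced here, not part of any library):

```python
def build_page_url(base_url: str, page_num: int) -> str:
    """Build a search URL, omitting pagenumber entirely for page one."""
    # Appending ?pagenumber=1 to the first page is the trap described above.
    if page_num <= 1:
        return base_url
    return f"{base_url}?pagenumber={page_num}"
```

Every place that constructs a search URL can then go through this one function instead of re-deriving the rule.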
Step 1: Launch a Stealth Browser Session
Playwright's Chromium instance passes most fingerprint checks out of the box when you configure it correctly.
```python
import asyncio
from playwright.async_api import async_playwright

async def get_browser():
    pw = async_playwright()
    instance = await pw.start()
    browser = await instance.chromium.launch(
        headless=True,
        args=[
            "--disable-blink-features=AutomationControlled",
            "--no-sandbox",
        ],
    )
    context = await browser.new_context(
        locale="de-DE",
        timezone_id="Europe/Berlin",
        viewport={"width": 1366, "height": 768},
        user_agent=(
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/125.0.0.0 Safari/537.36"
        ),
    )
    return instance, browser, context
```
Three things to notice. The --disable-blink-features=AutomationControlled flag removes the navigator.webdriver property. Setting locale to de-DE and timezone_id to Berlin makes the browser look like a German user. And the viewport matches a standard laptop resolution — unusual dimensions are a red flag.
Step 2: Scrape Immobilienscout24 Search Pages
Search result pages embed all listing data as JSON inside a <script> tag. You don't need to parse dozens of HTML elements — just extract the JSON blob.
```python
import json
from parsel import Selector

async def scrape_search_page(context, url):
    page = await context.new_page()
    await page.goto(url, wait_until="domcontentloaded")
    # Wait for the page to fully render
    await page.wait_for_timeout(3000)
    html = await page.content()
    await page.close()

    sel = Selector(text=html)
    # The listing data lives in a script tag as JSON
    raw_json = sel.xpath(
        '//script[contains(text(),"resultListModel")]/text()'
    ).get()
    if not raw_json:
        return [], 0

    data = json.loads(raw_json)
    results = data["searchResponseModel"]["resultlist.resultlist"]
    total = int(results["paging"]["numberOfListings"])
    listings = results["resultlistEntries"][0]["resultlistEntry"]
    return listings, total
```
The XPath selector targets script tags containing resultListModel. This JSON payload includes everything visible on the page — and some fields that aren't shown in the UI at all, like internal listing IDs and exact coordinate data.
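One caveat worth hedging: if the site ever wraps this payload in a JavaScript assignment (something like `IS24.resultList = {...};`) instead of pure JSON, `json.loads()` on the raw script text will fail. A defensive extractor, sketched here as a hypothetical helper named `extract_json_blob`, tolerates both shapes:

```python
import json

def extract_json_blob(script_text: str) -> dict:
    """Parse a script tag's text as JSON, tolerating a JS assignment wrapper."""
    try:
        # Happy path: the script body is pure JSON
        return json.loads(script_text)
    except json.JSONDecodeError:
        # Fallback: slice out the outermost {...} and parse that instead
        start = script_text.find("{")
        end = script_text.rfind("}")
        if start == -1 or end <= start:
            return {}
        return json.loads(script_text[start:end + 1])
```

If the happy path works for you today, this costs nothing; if the site changes its wrapper, your scraper keeps running.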
Step 3: Parse Listing Data Into Clean Records
The raw JSON structure is nested and inconsistent. Some fields exist only for rental listings, others only for purchase listings. This parser handles both.
```python
def parse_listing(entry):
    """Extract clean fields from a raw listing entry."""
    data = entry.get("resultlist.realEstate", {})
    address = data.get("address", {})
    price = data.get("price", {})
    return {
        "id": entry.get("@id", ""),
        "title": data.get("title", ""),
        "city": address.get("city", ""),
        "quarter": address.get("quarter", ""),
        "postcode": address.get("postcode", ""),
        "living_space": data.get("livingSpace", 0),
        "rooms": data.get("numberOfRooms", 0),
        "price_value": price.get("value", 0),
        "price_currency": price.get("currency", "EUR"),
        "is_private": data.get("privateOffer", False),
        "url": f"https://www.immobilienscout24.de/expose/{entry.get('@id', '')}",
    }
```
Note how we access resultlist.realEstate — the key literally contains a dot, so it has to be looked up as a single dictionary key (with brackets or .get()) rather than treated as nested access. This trips up a lot of people.
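A two-line illustration with a made-up entry shows why the dot matters:

```python
# A made-up entry shaped like the real payload
entry = {"@id": "123", "resultlist.realEstate": {"title": "2-Zimmer in Mitte"}}

# "resultlist.realEstate" is ONE key containing a literal dot, so it must be
# looked up whole; splitting on "." and traversing would find nothing.
real_estate = entry["resultlist.realEstate"]
```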
Step 4: Handle Pagination Without Getting Blocked
This is where most scrapers fail on Immobilienscout24. The pagination logic has a specific quirk that will get your IP flagged if you ignore it.
```python
import math

async def scrape_all_pages(context, base_url, max_pages=20):
    all_listings = []

    # Page 1: NO pagenumber parameter
    listings, total = await scrape_search_page(context, base_url)
    all_listings.extend([parse_listing(l) for l in listings])

    pages_available = min(math.ceil(total / 20), max_pages)
    print(f"Found {total} listings across {pages_available} pages")

    # Pages 2+: include pagenumber parameter
    for page_num in range(2, pages_available + 1):
        url = f"{base_url}?pagenumber={page_num}"
        listings, _ = await scrape_search_page(context, url)
        all_listings.extend([parse_listing(l) for l in listings])

        # Variable delay: 5-11 seconds between requests
        delay = 5 + (page_num % 7)
        await asyncio.sleep(delay)
        print(f"Page {page_num}/{pages_available} done ({len(all_listings)} total)")

    return all_listings
```
Two things matter here. First, page one uses the bare URL — no query parameters. Second, the delay between requests isn't a fixed interval. A constant 5-second delay is itself a bot signal. The modulo trick cycles through delays of 5 to 11 seconds (7, 8, 9, 10, 11, 5, 6, ... as page_num increases) without importing random.
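If you'd rather have genuinely unpredictable spacing, the standard library's random module gives you a drop-in alternative to the modulo pattern (a stylistic choice, not something the site requires):

```python
import random

def polite_delay(low: float = 5.0, high: float = 12.0) -> float:
    """Return a jittered delay in seconds, uniform in [low, high]."""
    return random.uniform(low, high)
```

In the pagination loop, `await asyncio.sleep(polite_delay())` then replaces the fixed `delay` computation.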
Step 5: Store Results to CSV
```python
import csv

def save_to_csv(listings, filename="immoscout_results.csv"):
    if not listings:
        print("No listings to save")
        return
    keys = listings[0].keys()
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=keys)
        writer.writeheader()
        writer.writerows(listings)
    print(f"Saved {len(listings)} listings to {filename}")
```
UTF-8 encoding is essential here. German property listings contain umlauts (ä, ö, ü) and the Eszett (ß) in addresses. Skip the encoding flag and your CSV will be garbled.
Step 6: Wire It All Together
Here's the complete scraper you can run from the command line:
```python
async def main():
    # Berlin rental apartments (add filters via the URL path if needed)
    search_url = (
        "https://www.immobilienscout24.de/Suche/de"
        "/berlin/berlin/wohnung-mieten"
    )
    instance, browser, context = await get_browser()
    try:
        listings = await scrape_all_pages(context, search_url)
        save_to_csv(listings)
    finally:
        await browser.close()
        await instance.stop()

if __name__ == "__main__":
    asyncio.run(main())
```
Run it with python scraper.py and you'll get a CSV file with all listings from your search query.
Customize the URL by changing the path segments. The pattern is /Suche/de/{state}/{city}/{property-type}. For Munich purchases: /Suche/de/bayern/muenchen/wohnung-kaufen.
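A small helper (my own naming, following the pattern just described) keeps those URLs consistent:

```python
def build_search_url(state: str, city: str, listing_type: str) -> str:
    """Compose a search URL from the /Suche/de/{state}/{city}/{type} pattern."""
    return (
        "https://www.immobilienscout24.de/Suche/de/"
        f"{state}/{city}/{listing_type}"
    )
```

For example, `build_search_url("bayern", "muenchen", "wohnung-kaufen")` yields the Munich purchase URL from above.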
Scraping Individual Property Pages for Full Details
Search pages give you summaries. Individual listing pages (called "Exposés") contain far more data: energy ratings, floor plans, construction year, deposit amounts, and agent contact info.
```python
async def scrape_property(context, listing_id):
    url = f"https://www.immobilienscout24.de/expose/{listing_id}"
    page = await context.new_page()
    await page.goto(url, wait_until="domcontentloaded")
    await page.wait_for_timeout(3000)
    html = await page.content()
    await page.close()

    sel = Selector(text=html)
    # Property details are in a separate JSON structure
    script_data = sel.xpath(
        '//script[contains(text(),"keyValues")]/text()'
    ).get()
    if not script_data:
        return {}

    data = json.loads(script_data)
    obj = data.get("expose", {})
    return {
        "id": listing_id,
        "construction_year": obj.get("constructionYear", ""),
        "energy_class": obj.get("energyEfficiencyClass", ""),
        "floor": obj.get("floor", ""),
        "deposit": obj.get("deposit", ""),
        "available_from": obj.get("freeFrom", ""),
        "heating_type": obj.get("heatingType", ""),
        "has_balcony": obj.get("balcony", False),
        "has_garden": obj.get("garden", False),
        "has_elevator": obj.get("lift", False),
    }
```
Rate-limit these requests aggressively. Property pages are heavier and more closely monitored than search pages. I keep it to one request every 8–15 seconds when scraping Exposés at volume.
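That pacing is easy to encode in a small driver loop. To keep this sketch self-contained and testable, it takes the fetch coroutine as a parameter; in practice you'd pass a wrapper around scrape_property from above:

```python
import asyncio
import random

async def scrape_exposes(fetch, listing_ids, low=8.0, high=15.0):
    """Fetch Exposé details one at a time with a randomized gap between requests."""
    details = []
    for listing_id in listing_ids:
        details.append(await fetch(listing_id))
        # 8-15 second jitter, matching the pacing recommended above
        await asyncio.sleep(random.uniform(low, high))
    return details
```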
Using Residential Proxies to Scale
At some point, a single IP won't cut it. Immobilienscout24 enforces per-IP rate limits that cap you at roughly 100–150 pages per session before you hit a CAPTCHA or temporary block.
Residential proxies solve this by routing each request through a different German household IP. The key word there is German — Immobilienscout24 geo-restricts some content and treats non-EU traffic with extra suspicion.
Here's how to integrate proxy rotation with Playwright:
```python
async def get_browser_with_proxy(proxy_url):
    pw = async_playwright()
    instance = await pw.start()
    browser = await instance.chromium.launch(
        headless=True,
        proxy={"server": proxy_url},
        args=["--disable-blink-features=AutomationControlled"],
    )
    context = await browser.new_context(
        locale="de-DE",
        timezone_id="Europe/Berlin",
        viewport={"width": 1366, "height": 768},
    )
    return instance, browser, context
```
Pass your proxy endpoint as proxy_url in the format http://user:pass@host:port. If you're using a rotating proxy service like Roundproxies, each request automatically gets a fresh IP from the pool.
For apartment hunting (a few hundred pages), you probably don't need proxies. For market research across all German cities, you absolutely do.
Automating Apartment Alerts With a Cron Job
Here's where scraping Immobilienscout24 gets genuinely useful beyond one-off data collection. Berlin's rental market moves fast — a good apartment gets 200+ inquiries within hours. If you can scrape Immobilienscout24 on a schedule and get notified the moment a new listing appears, you have a real edge.
This script compares each run against previously seen listing IDs and sends you a notification for new ones:
```python
import json
import os
import smtplib
from email.mime.text import MIMEText

SEEN_FILE = "seen_ids.json"

def load_seen_ids():
    if os.path.exists(SEEN_FILE):
        with open(SEEN_FILE, "r") as f:
            return set(json.load(f))
    return set()

def save_seen_ids(ids):
    with open(SEEN_FILE, "w") as f:
        json.dump(list(ids), f)

def send_alert(new_listings):
    body = "\n".join(
        f"{l['title']} - €{l['price_value']} - {l['url']}"
        for l in new_listings
    )
    msg = MIMEText(body)
    msg["Subject"] = f"{len(new_listings)} new listings on ImmoScout"
    msg["From"] = "scraper@yourdomain.com"
    msg["To"] = "you@yourdomain.com"
    # Configure with your SMTP server
    with smtplib.SMTP("smtp.yourdomain.com", 587) as server:
        server.starttls()
        server.login("user", "password")
        server.send_message(msg)
```
Wire this into your main function by checking listing["id"] against the seen set after each scrape. Save the updated set at the end.
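The diff step might look like this (`find_new_listings` is my own helper name; it slots between the scrape and `send_alert`):

```python
def find_new_listings(listings, seen_ids):
    """Split scraped listings into unseen ones and return the updated id set."""
    new = [listing for listing in listings if listing["id"] not in seen_ids]
    updated = seen_ids | {listing["id"] for listing in listings}
    return new, updated
```

After each scrape: `new, seen = find_new_listings(listings, load_seen_ids())`, then `send_alert(new)` if `new` is non-empty, then `save_seen_ids(seen)`.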
Run it every 10–15 minutes with a cron job:
```bash
# crontab -e
*/15 * * * * cd /home/you/immoscout-scraper && /home/you/immoscout-scraper/venv/bin/python scraper.py
```
In my experience, running this scraper against Berlin apartment listings caught new postings an average of 12 minutes before they showed up in Immobilienscout24's own email alerts. That's the difference between being first in line and being applicant #150.
One caveat: don't run this against more than one or two search URLs per cron interval. Fifteen-minute cycles against a single search query keep you well within safe request limits.
Troubleshooting
"Access Denied" or 403 on first request
Why: Your browser fingerprint is being rejected, usually because Playwright is running with default headless settings.
Fix: Make sure you're passing --disable-blink-features=AutomationControlled and setting a realistic user agent. Also verify your locale is de-DE.
Empty JSON / no script tag found
Why: The page loaded a challenge or CAPTCHA instead of actual content.
Fix: Increase the wait_for_timeout to 5000ms. If it persists, your IP is likely flagged — switch to a fresh one or wait 30 minutes.
Listings array is empty but total count is > 0
Why: You're probably hitting the ?pagenumber=1 trap on the first page.
Fix: Only append ?pagenumber=N for pages 2 and above. The first page URL must have no query parameters.
Scraper works once but fails on subsequent runs
Why: Immobilienscout24 stores session state. Reusing a stale browser context triggers detection.
Fix: Create a fresh browser context for each scraping session. Don't persist cookies between runs.
A Note on Responsible Scraping
Immobilienscout24's data is publicly visible — anyone can browse listings without an account. That said, a few ground rules keep you out of trouble.
Respect rate limits. Even if you can go faster, there's no reason to hammer their servers. A 5–15 second delay between requests is polite and sustainable.
Don't scrape personal data for commercial use. Agent names and contact details are covered by GDPR. If you're building a dataset for research or personal apartment hunting, you're fine. If you plan to sell scraped data, talk to a lawyer first.
Check robots.txt before scaling up. It won't stop your scraper technically, but it signals which paths the site explicitly discourages automated access to.
Taking Your Immobilienscout24 Scraper to Production
If you plan to scrape Immobilienscout24 regularly rather than as a one-off exercise, a few extra considerations will save you headaches.
Monitor for selector changes. Immobilienscout24 occasionally restructures their hidden JSON keys or renames fields. Build a simple validation step that checks whether expected keys like resultListModel exist in the response. If they don't, the scraper should log an error and stop rather than writing empty rows to your database.
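A minimal version of that validation step, assuming the JSON shape used in Step 2, might be:

```python
def validate_search_response(data: dict) -> bool:
    """Raise ValueError if the expected hidden-JSON keys have moved."""
    try:
        results = data["searchResponseModel"]["resultlist.resultlist"]
        results["paging"]["numberOfListings"]
        results["resultlistEntries"]
    except (KeyError, TypeError, IndexError) as exc:
        # Fail loudly instead of writing empty rows downstream
        raise ValueError(f"unexpected response structure: {exc!r}") from exc
    return True
```

Call it right after `json.loads()` in the search scraper, before any parsing.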
Run headless browsers in Docker. Playwright's Chromium binary has OS-level dependencies that break across environments. A Docker container with mcr.microsoft.com/playwright/python:v1.48.0-noble as the base image gives you a reproducible setup that works identically on your laptop and on a VPS.
Separate scraping from storage. Write scraped data to a JSON lines file first, then process it into your database or CSV in a second step. If the scraper crashes mid-run, you keep everything it already collected. This pattern also lets you re-process historical data when you add new fields to your parser.
Keep logs. Every request should log the URL, status code, number of listings extracted, and timestamp. When something breaks at 3 AM, you'll be glad you can pinpoint exactly which page caused the failure.
In production, I typically scrape Immobilienscout24 from a small VPS in Frankfurt running Ubuntu with a cron job. The geographic proximity to Immobilienscout24's servers keeps latency low, and a German IP address avoids geo-restriction issues entirely.
Immobilienscout24 URL Structure Reference
Understanding the URL pattern lets you scrape Immobilienscout24 for any city, property type, or price range without manually browsing the site first.
The base search URL follows this template:
https://www.immobilienscout24.de/Suche/de/{state}/{city}/{type}
Here are the most common property type slugs:
| Slug | Property Type |
|---|---|
| wohnung-mieten | Apartment for rent |
| wohnung-kaufen | Apartment for sale |
| haus-mieten | House for rent |
| haus-kaufen | House for sale |
You can add filter parameters directly to the URL path rather than query strings. For example, this URL searches for Berlin rental apartments with 2+ rooms, 60+ m², under €1,000:
/Suche/S-T/Wohnung-Miete/Berlin/Berlin/-/2,50-/60,00-/EURO--1000,00
The older S-T path format still works and is what Immobilienscout24 generates when you use the site's search filters. Either URL format returns the same hidden JSON structure, so your parser code stays identical.
For building a scraper that covers multiple cities, store these URLs in a config file rather than hardcoding them. That way you can scrape Immobilienscout24 listings across Munich, Hamburg, Frankfurt, and Berlin without touching the scraper logic.
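As a sketch, that config could be as simple as a mapping, loaded from a JSON or YAML file in practice. The non-Berlin paths below follow the `/Suche/de/{state}/{city}/{type}` pattern but are illustrative; verify each slug against the live site before relying on it:

```python
# Illustrative city -> search URL mapping; slugs unverified
SEARCH_URLS = {
    "berlin": "https://www.immobilienscout24.de/Suche/de/berlin/berlin/wohnung-mieten",
    "muenchen": "https://www.immobilienscout24.de/Suche/de/bayern/muenchen/wohnung-mieten",
    "hamburg": "https://www.immobilienscout24.de/Suche/de/hamburg/hamburg/wohnung-mieten",
}
```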
Wrapping Up
You now have a complete pipeline to scrape Immobilienscout24 — from launching a stealth browser and extracting hidden JSON, to handling the pagination anti-bot trap, enriching data from individual Exposé pages, and automating the whole thing on a schedule.
The approach here — Playwright with stealth flags, parsing embedded JSON instead of HTML, and variable request timing — works across most German real estate sites. The same patterns apply to Immowelt, WG-Gesucht, and Kleinanzeigen with minor selector adjustments.
If you're scaling beyond a single city, add residential proxy rotation and consider running multiple browser contexts in parallel with asyncio.gather(). Just keep the per-context rate under 10 pages per minute and you'll stay under the radar.
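A sketch of that fan-out, with a semaphore to cap concurrency (`scrape_fn` stands in for a per-city pipeline such as `scrape_all_pages` bound to its own browser context):

```python
import asyncio

async def scrape_cities(scrape_fn, urls, max_parallel=2):
    """Run one scraping task per search URL, at most max_parallel at a time."""
    sem = asyncio.Semaphore(max_parallel)

    async def bounded(url):
        # The semaphore caps how many cities are in flight simultaneously,
        # keeping the combined request rate modest.
        async with sem:
            return await scrape_fn(url)

    return await asyncio.gather(*(bounded(u) for u in urls))
```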