Immobilienscout24 is Germany's largest real estate portal, and one of the most aggressively protected scraping targets in Europe.
Most tutorials on this topic funnel you into a paid scraping API within the first three paragraphs. This one doesn't. You'll build everything yourself with Python and Playwright: extract data from hidden JSON blobs, handle pagination without triggering bot detection, and store the results in a clean CSV file.
I've scraped Immobilienscout24 across hundreds of sessions while apartment hunting in Berlin and Munich. The techniques here reflect what actually survives their anti-bot stack in 2026.
What Is Immobilienscout24 Scraping?
Immobilienscout24 scraping is the process of programmatically extracting property listing data — rent prices, square footage, addresses, and amenities — from Germany's dominant real estate platform, immobilienscout24.de. It works by fetching search result pages and parsing hidden JSON data embedded in <script> tags rather than scraping visible HTML. Use it when you need structured real estate data for market analysis, apartment hunting automation, or investment research.
What Data Can You Scrape From Immobilienscout24?
When you scrape Immobilienscout24 search pages, the hidden JSON contains far more than what's visible in the UI.
Here's the full field map:
| Field | Source | Notes |
|---|---|---|
| Listing title | Search + Exposé | Usually includes address and room count |
| Cold rent (Kaltmiete) | Search + Exposé | Base rent before utilities |
| Warm rent (Warmmiete) | Exposé only | Includes Nebenkosten |
| Living area (m²) | Search + Exposé | |
| Number of rooms | Search + Exposé | German convention counts half-rooms |
| Full address | Exposé only | Search pages show approximate location |
| GPS coordinates | Search JSON | Latitude/longitude for each listing |
| Construction year | Exposé only | |
| Energy efficiency class | Exposé only | A+ through H |
| Floor level | Exposé only | |
| Deposit amount | Exposé only | Usually 2–3 months cold rent |
| Available from date | Exposé only | |
| Agent name and company | Exposé only | GDPR-sensitive — see legal section |
| Image URLs | Search + Exposé | Thumbnail in search, full-res in Exposé |
Search pages give you enough for filtering and analysis. Exposé pages give you the complete picture.
Most people who scrape Immobilienscout24 at scale work with search data first, then selectively fetch Exposé pages for listings that match their criteria. This keeps your request volume low and avoids unnecessary detection risk.
Prerequisites
Before writing any code, you'll need:
- Python 3.10+ installed
- Playwright for browser automation with stealth capabilities
- parsel for HTML/JSON parsing
- A basic understanding of browser DevTools
Set up your project with these commands:
```bash
mkdir immoscout-scraper && cd immoscout-scraper
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install playwright parsel
playwright install chromium
```
That last command downloads the Chromium binary that Playwright controls. No system Chrome installation required.
How Immobilienscout24's Anti-Bot System Works
Before you write a single request, you need to understand what you're up against.
Immobilienscout24 uses a layered detection system. It checks your TLS fingerprint, JavaScript execution environment, and behavioral patterns like mouse movement and scroll timing.
A plain requests.get() call will return a 403 or a challenge page within one or two requests. That's why every guide out there pushes paid proxy APIs — the naive approach genuinely doesn't work.
Here's what the detection stack looks for:
| Signal | What Gets Flagged |
|---|---|
| TLS fingerprint | Python requests has a distinct fingerprint vs. real browsers |
| Navigator properties | Headless Chrome exposes navigator.webdriver = true |
| Pagination pattern | Passing ?pagenumber=1 on the first page triggers a block |
| Request rate | More than ~10 requests/minute from one IP |
| Missing cookies | No prior session cookies = suspicious |
That pagination trap is worth highlighting. Every page except the first uses ?pagenumber=N in the URL. But if you include ?pagenumber=1 for the first page, Immobilienscout24 flags the request as bot traffic. Omit it entirely for page one.
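That rule is easy to centralize in a small helper so no call site can get it wrong. A minimal sketch (`build_page_url` is a name introduced here, not part of any library):

```python
def build_page_url(base_url: str, page_num: int) -> str:
    """Build a search URL, omitting pagenumber entirely for page one."""
    # Appending ?pagenumber=1 to the first page is the trap described above.
    if page_num <= 1:
        return base_url
    return f"{base_url}?pagenumber={page_num}"
```

Every place that constructs a search URL can then go through this one function instead of re-deriving the rule.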
Step 1: Launch a Stealth Browser Session
Playwright's Chromium instance passes most fingerprint checks out of the box when you configure it correctly.
```python
import asyncio
from playwright.async_api import async_playwright

async def get_browser():
    pw = async_playwright()
    instance = await pw.start()
    browser = await instance.chromium.launch(
        headless=True,
        args=[
            "--disable-blink-features=AutomationControlled",
            "--no-sandbox",
        ],
    )
    context = await browser.new_context(
        locale="de-DE",
        timezone_id="Europe/Berlin",
        viewport={"width": 1366, "height": 768},
        user_agent=(
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/125.0.0.0 Safari/537.36"
        ),
    )
    return instance, browser, context
```
Three things to notice. The --disable-blink-features=AutomationControlled flag removes the navigator.webdriver property. Setting locale to de-DE and timezone_id to Berlin makes the browser look like a German user. And the viewport matches a standard laptop resolution — unusual dimensions are a red flag.
Step 2: Scrape Immobilienscout24 Search Pages
Search result pages embed all listing data as JSON inside a <script> tag. You don't need to parse dozens of HTML elements — just extract the JSON blob.
```python
import json
from parsel import Selector

async def scrape_search_page(context, url):
    page = await context.new_page()
    await page.goto(url, wait_until="domcontentloaded")
    # Wait for the page to fully render
    await page.wait_for_timeout(3000)
    html = await page.content()
    await page.close()

    sel = Selector(text=html)
    # The listing data lives in a script tag as JSON
    raw_json = sel.xpath(
        '//script[contains(text(),"resultListModel")]/text()'
    ).get()
    if not raw_json:
        return [], 0

    data = json.loads(raw_json)
    results = data["searchResponseModel"]["resultlist.resultlist"]
    total = int(results["paging"]["numberOfListings"])
    listings = results["resultlistEntries"][0]["resultlistEntry"]
    return listings, total
```
The XPath selector targets script tags containing resultListModel. This JSON payload includes everything visible on the page — and some fields that aren't shown in the UI at all, like internal listing IDs and exact coordinate data.
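One caveat worth hedging: if the site ever wraps this payload in a JavaScript assignment (something like `IS24.resultList = {...};`) instead of pure JSON, `json.loads()` on the raw script text will fail. A defensive extractor, sketched here as a hypothetical helper named `extract_json_blob`, tolerates both shapes:

```python
import json

def extract_json_blob(script_text: str) -> dict:
    """Parse a script tag's text as JSON, tolerating a JS assignment wrapper."""
    try:
        # Happy path: the script body is pure JSON
        return json.loads(script_text)
    except json.JSONDecodeError:
        # Fallback: slice out the outermost {...} and parse that instead
        start = script_text.find("{")
        end = script_text.rfind("}")
        if start == -1 or end <= start:
            return {}
        return json.loads(script_text[start:end + 1])
```

If the happy path works for you today, this costs nothing; if the site changes its wrapper, your scraper keeps running.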
Step 3: Parse Listing Data Into Clean Records
The raw JSON structure is nested and inconsistent. Some fields exist only for rental listings, others only for purchase listings. This parser handles both.
```python
def parse_listing(entry):
    """Extract clean fields from a raw listing entry."""
    data = entry.get("resultlist.realEstate", {})
    address = data.get("address", {})
    price = data.get("price", {})
    return {
        "id": entry.get("@id", ""),
        "title": data.get("title", ""),
        "city": address.get("city", ""),
        "quarter": address.get("quarter", ""),
        "postcode": address.get("postcode", ""),
        "living_space": data.get("livingSpace", 0),
        "rooms": data.get("numberOfRooms", 0),
        "price_value": price.get("value", 0),
        "price_currency": price.get("currency", "EUR"),
        "is_private": data.get("privateOffer", False),
        "url": f"https://www.immobilienscout24.de/expose/{entry.get('@id', '')}",
    }
```
Note how we access resultlist.realEstate — the key literally contains a dot, so it has to be looked up as a single dictionary key (with brackets or .get()) rather than treated as nested access. This trips up a lot of people.
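A two-line illustration with a made-up entry shows why the dot matters:

```python
# A made-up entry shaped like the real payload
entry = {"@id": "123", "resultlist.realEstate": {"title": "2-Zimmer in Mitte"}}

# "resultlist.realEstate" is ONE key containing a literal dot, so it must be
# looked up whole; splitting on "." and traversing would find nothing.
real_estate = entry["resultlist.realEstate"]
```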
Step 4: Handle Pagination Without Getting Blocked
This is where most scrapers fail on Immobilienscout24. The pagination logic has a specific quirk that will get your IP flagged if you ignore it.
```python
import math

async def scrape_all_pages(context, base_url, max_pages=20):
    all_listings = []

    # Page 1: NO pagenumber parameter
    listings, total = await scrape_search_page(context, base_url)
    all_listings.extend([parse_listing(l) for l in listings])

    pages_available = min(math.ceil(total / 20), max_pages)
    print(f"Found {total} listings across {pages_available} pages")

    # Pages 2+: include pagenumber parameter
    for page_num in range(2, pages_available + 1):
        url = f"{base_url}?pagenumber={page_num}"
        listings, _ = await scrape_search_page(context, url)
        all_listings.extend([parse_listing(l) for l in listings])

        # Variable delay: 5-11 seconds between requests
        delay = 5 + (page_num % 7)
        await asyncio.sleep(delay)
        print(f"Page {page_num}/{pages_available} done ({len(all_listings)} total)")

    return all_listings
```
Two things matter here. First, page one uses the bare URL — no query parameters. Second, the delay between requests isn't a fixed interval. A constant 5-second delay is itself a bot signal. The modulo trick cycles through delays of 5 to 11 seconds (7, 8, 9, 10, 11, 5, 6, ... as page_num increases) without importing random.
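If you'd rather have genuinely unpredictable spacing, the standard library's random module gives you a drop-in alternative to the modulo pattern (a stylistic choice, not something the site requires):

```python
import random

def polite_delay(low: float = 5.0, high: float = 12.0) -> float:
    """Return a jittered delay in seconds, uniform in [low, high]."""
    return random.uniform(low, high)
```

In the pagination loop, `await asyncio.sleep(polite_delay())` then replaces the fixed `delay` computation.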
Step 5: Store Results to CSV
```python
import csv

def save_to_csv(listings, filename="immoscout_results.csv"):
    if not listings:
        print("No listings to save")
        return
    keys = listings[0].keys()
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=keys)
        writer.writeheader()
        writer.writerows(listings)
    print(f"Saved {len(listings)} listings to {filename}")
```
UTF-8 encoding is essential here. German property listings contain umlauts (ä, ö, ü) and the Eszett (ß) in addresses. Skip the encoding flag and your CSV will be garbled.
Step 6: Wire It All Together
Here's the complete scraper you can run from the command line:
```python
async def main():
    # Berlin rental apartments (add filters via the URL path if needed)
    search_url = (
        "https://www.immobilienscout24.de/Suche/de"
        "/berlin/berlin/wohnung-mieten"
    )
    instance, browser, context = await get_browser()
    try:
        listings = await scrape_all_pages(context, search_url)
        save_to_csv(listings)
    finally:
        await browser.close()
        await instance.stop()

if __name__ == "__main__":
    asyncio.run(main())
```
Run it with python scraper.py and you'll get a CSV file with all listings from your search query.
Customize the URL by changing the path segments. The pattern is /Suche/de/{state}/{city}/{property-type}. For Munich purchases: /Suche/de/bayern/muenchen/wohnung-kaufen.
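A small helper (my own naming, following the pattern just described) keeps those URLs consistent:

```python
def build_search_url(state: str, city: str, listing_type: str) -> str:
    """Compose a search URL from the /Suche/de/{state}/{city}/{type} pattern."""
    return (
        "https://www.immobilienscout24.de/Suche/de/"
        f"{state}/{city}/{listing_type}"
    )
```

For example, `build_search_url("bayern", "muenchen", "wohnung-kaufen")` yields the Munich purchase URL from above.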
Scraping Individual Property Pages for Full Details
Search pages give you summaries. Individual listing pages (called "Exposés") contain far more data: energy ratings, floor plans, construction year, deposit amounts, and agent contact info.
```python
async def scrape_property(context, listing_id):
    url = f"https://www.immobilienscout24.de/expose/{listing_id}"
    page = await context.new_page()
    await page.goto(url, wait_until="domcontentloaded")
    await page.wait_for_timeout(3000)
    html = await page.content()
    await page.close()

    sel = Selector(text=html)
    # Property details are in a separate JSON structure
    script_data = sel.xpath(
        '//script[contains(text(),"keyValues")]/text()'
    ).get()
    if not script_data:
        return {}

    data = json.loads(script_data)
    obj = data.get("expose", {})
    return {
        "id": listing_id,
        "construction_year": obj.get("constructionYear", ""),
        "energy_class": obj.get("energyEfficiencyClass", ""),
        "floor": obj.get("floor", ""),
        "deposit": obj.get("deposit", ""),
        "available_from": obj.get("freeFrom", ""),
        "heating_type": obj.get("heatingType", ""),
        "has_balcony": obj.get("balcony", False),
        "has_garden": obj.get("garden", False),
        "has_elevator": obj.get("lift", False),
    }
```
Rate-limit these requests aggressively. Property pages are heavier and more closely monitored than search pages. I keep it to one request every 8–15 seconds when scraping Exposés at volume.
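That pacing is easy to encode in a small driver loop. To keep this sketch self-contained and testable, it takes the fetch coroutine as a parameter; in practice you'd pass a wrapper around scrape_property from above:

```python
import asyncio
import random

async def scrape_exposes(fetch, listing_ids, low=8.0, high=15.0):
    """Fetch Exposé details one at a time with a randomized gap between requests."""
    details = []
    for listing_id in listing_ids:
        details.append(await fetch(listing_id))
        # 8-15 second jitter, matching the pacing recommended above
        await asyncio.sleep(random.uniform(low, high))
    return details
```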
Using Residential Proxies to Scale
At some point, a single IP won't cut it. Immobilienscout24 enforces per-IP rate limits that cap you at roughly 100–150 pages per session before you hit a CAPTCHA or temporary block.
Residential proxies solve this by routing each request through a different German household IP. The key word there is German — Immobilienscout24 geo-restricts some content and treats non-EU traffic with extra suspicion.
Here's how to integrate proxy rotation with Playwright:
```python
async def get_browser_with_proxy(proxy_url):
    pw = async_playwright()
    instance = await pw.start()
    browser = await instance.chromium.launch(
        headless=True,
        proxy={"server": proxy_url},
        args=["--disable-blink-features=AutomationControlled"],
    )
    context = await browser.new_context(
        locale="de-DE",
        timezone_id="Europe/Berlin",
        viewport={"width": 1366, "height": 768},
    )
    return instance, browser, context
```
Pass your proxy endpoint as proxy_url in the format http://user:pass@host:port. If you're using a rotating proxy service like Roundproxies, each request automatically gets a fresh IP from the pool.
For apartment hunting (a few hundred pages), you probably don't need proxies. For market research across all German cities, you absolutely do.
Automating Apartment Alerts With a Cron Job
Here's where scraping Immobilienscout24 gets genuinely useful beyond one-off data collection. Berlin's rental market moves fast — a good apartment gets 200+ inquiries within hours. If you can scrape Immobilienscout24 on a schedule and get notified the moment a new listing appears, you have a real edge.
This script compares each run against previously seen listing IDs and sends you a notification for new ones:
```python
import json
import os
import smtplib
from email.mime.text import MIMEText

SEEN_FILE = "seen_ids.json"

def load_seen_ids():
    if os.path.exists(SEEN_FILE):
        with open(SEEN_FILE, "r") as f:
            return set(json.load(f))
    return set()

def save_seen_ids(ids):
    with open(SEEN_FILE, "w") as f:
        json.dump(list(ids), f)

def send_alert(new_listings):
    body = "\n".join(
        f"{l['title']} - €{l['price_value']} - {l['url']}"
        for l in new_listings
    )
    msg = MIMEText(body)
    msg["Subject"] = f"{len(new_listings)} new listings on ImmoScout"
    msg["From"] = "scraper@yourdomain.com"
    msg["To"] = "you@yourdomain.com"
    # Configure with your SMTP server
    with smtplib.SMTP("smtp.yourdomain.com", 587) as server:
        server.starttls()
        server.login("user", "password")
        server.send_message(msg)
```
Wire this into your main function by checking listing["id"] against the seen set after each scrape. Save the updated set at the end.
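The diff step might look like this (`find_new_listings` is my own helper name; it slots between the scrape and `send_alert`):

```python
def find_new_listings(listings, seen_ids):
    """Split scraped listings into unseen ones and return the updated id set."""
    new = [listing for listing in listings if listing["id"] not in seen_ids]
    updated = seen_ids | {listing["id"] for listing in listings}
    return new, updated
```

After each scrape: `new, seen = find_new_listings(listings, load_seen_ids())`, then `send_alert(new)` if `new` is non-empty, then `save_seen_ids(seen)`.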
Run it every 10–15 minutes with a cron job:
```bash
# crontab -e
*/15 * * * * cd /home/you/immoscout-scraper && /home/you/immoscout-scraper/venv/bin/python scraper.py
```
In my experience, running this scraper against Berlin apartment listings caught new postings an average of 12 minutes before they showed up in Immobilienscout24's own email alerts. That's the difference between being first in line and being applicant #150.
One caveat: don't run this against more than one or two search URLs per cron interval. Fifteen-minute cycles against a single search query keep you well within safe request limits.
Troubleshooting
"Access Denied" or 403 on first request
Why: Your browser fingerprint is being rejected, usually because Playwright is running with default headless settings.
Fix: Make sure you're passing --disable-blink-features=AutomationControlled and setting a realistic user agent. Also verify your locale is de-DE.
Empty JSON / no script tag found
Why: The page loaded a challenge or CAPTCHA instead of actual content.
Fix: Increase the wait_for_timeout to 5000ms. If it persists, your IP is likely flagged — switch to a fresh one or wait 30 minutes.
Listings array is empty but total count is > 0
Why: You're probably hitting the ?pagenumber=1 trap on the first page.
Fix: Only append ?pagenumber=N for pages 2 and above. The first page URL must have no query parameters.
Scraper works once but fails on subsequent runs
Why: Immobilienscout24 stores session state. Reusing a stale browser context triggers detection.
Fix: Create a fresh browser context for each scraping session. Don't persist cookies between runs.
A Note on Responsible Scraping
Immobilienscout24's data is publicly visible — anyone can browse listings without an account. That said, a few ground rules keep you out of trouble.
Respect rate limits. Even if you can go faster, there's no reason to hammer their servers. A 5–15 second delay between requests is polite and sustainable.
Don't scrape personal data for commercial use. Agent names and contact details are covered by GDPR. If you're building a dataset for research or personal apartment hunting, you're fine. If you plan to sell scraped data, talk to a lawyer first.
Check robots.txt before scaling up. It won't stop your scraper technically, but it signals which paths the site explicitly discourages automated access to.
Taking Your Immobilienscout24 Scraper to Production
If you plan to scrape Immobilienscout24 regularly rather than as a one-off exercise, a few extra considerations will save you headaches.
Monitor for selector changes. Immobilienscout24 occasionally restructures their hidden JSON keys or renames fields. Build a simple validation step that checks whether expected keys like resultListModel exist in the response. If they don't, the scraper should log an error and stop rather than writing empty rows to your database.
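A minimal version of that validation step, assuming the JSON shape used in Step 2, might be:

```python
def validate_search_response(data: dict) -> bool:
    """Raise ValueError if the expected hidden-JSON keys have moved."""
    try:
        results = data["searchResponseModel"]["resultlist.resultlist"]
        results["paging"]["numberOfListings"]
        results["resultlistEntries"]
    except (KeyError, TypeError, IndexError) as exc:
        # Fail loudly instead of writing empty rows downstream
        raise ValueError(f"unexpected response structure: {exc!r}") from exc
    return True
```

Call it right after `json.loads()` in the search scraper, before any parsing.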
Run headless browsers in Docker. Playwright's Chromium binary has OS-level dependencies that break across environments. A Docker container with mcr.microsoft.com/playwright/python:v1.48.0-noble as the base image gives you a reproducible setup that works identically on your laptop and on a VPS.
Separate scraping from storage. Write scraped data to a JSON lines file first, then process it into your database or CSV in a second step. If the scraper crashes mid-run, you keep everything it already collected. This pattern also lets you re-process historical data when you add new fields to your parser.
Keep logs. Every request should log the URL, status code, number of listings extracted, and timestamp. When something breaks at 3 AM, you'll be glad you can pinpoint exactly which page caused the failure.
In production, I typically scrape Immobilienscout24 from a small VPS in Frankfurt running Ubuntu with a cron job. The geographic proximity to Immobilienscout24's servers keeps latency low, and a German IP address avoids geo-restriction issues entirely.
Immobilienscout24 URL Structure Reference
Understanding the URL pattern lets you scrape Immobilienscout24 for any city, property type, or price range without manually browsing the site first.
The base search URL follows this template:
https://www.immobilienscout24.de/Suche/de/{state}/{city}/{type}
Here are the most common property type slugs:
| Slug | Property Type |
|---|---|
| wohnung-mieten | Apartment for rent |
| wohnung-kaufen | Apartment for sale |
| haus-mieten | House for rent |
| haus-kaufen | House for sale |
You can add filter parameters directly to the URL path rather than query strings. For example, this URL searches for Berlin rental apartments with 2+ rooms, 60+ m², under €1,000:
/Suche/S-T/Wohnung-Miete/Berlin/Berlin/-/2,50-/60,00-/EURO--1000,00
The older S-T path format still works and is what Immobilienscout24 generates when you use the site's search filters. Either URL format returns the same hidden JSON structure, so your parser code stays identical.
For building a scraper that covers multiple cities, store these URLs in a config file rather than hardcoding them. That way you can scrape Immobilienscout24 listings across Munich, Hamburg, Frankfurt, and Berlin without touching the scraper logic.
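As a sketch, that config could be as simple as a mapping, loaded from a JSON or YAML file in practice. The non-Berlin paths below follow the `/Suche/de/{state}/{city}/{type}` pattern but are illustrative; verify each slug against the live site before relying on it:

```python
# Illustrative city -> search URL mapping; slugs unverified
SEARCH_URLS = {
    "berlin": "https://www.immobilienscout24.de/Suche/de/berlin/berlin/wohnung-mieten",
    "muenchen": "https://www.immobilienscout24.de/Suche/de/bayern/muenchen/wohnung-mieten",
    "hamburg": "https://www.immobilienscout24.de/Suche/de/hamburg/hamburg/wohnung-mieten",
}
```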
Wrapping Up
You now have a complete pipeline to scrape Immobilienscout24 — from launching a stealth browser and extracting hidden JSON, to handling the pagination anti-bot trap, enriching data from individual Exposé pages, and automating the whole thing on a schedule.
The approach here — Playwright with stealth flags, parsing embedded JSON instead of HTML, and variable request timing — works across most German real estate sites. The same patterns apply to Immowelt, WG-Gesucht, and Kleinanzeigen with minor selector adjustments.
If you're scaling beyond a single city, add residential proxy rotation and consider running multiple browser contexts in parallel with asyncio.gather(). Just keep the per-context rate under 10 pages per minute and you'll stay under the radar.
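A sketch of that fan-out, with a semaphore to cap concurrency (`scrape_fn` stands in for a per-city pipeline such as `scrape_all_pages` bound to its own browser context):

```python
import asyncio

async def scrape_cities(scrape_fn, urls, max_parallel=2):
    """Run one scraping task per search URL, at most max_parallel at a time."""
    sem = asyncio.Semaphore(max_parallel)

    async def bounded(url):
        # The semaphore caps how many cities are in flight simultaneously,
        # keeping the combined request rate modest.
        async with sem:
            return await scrape_fn(url)

    return await asyncio.gather(*(bounded(u) for u in urls))
```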