Bol.com is the largest e-commerce platform in the Netherlands and Belgium. Over 41 million products, 13 million active users.

If you sell in the Dutch or Belgian market, competitor pricing data from bol.com is worth its weight in gold.

But bol.com isn't a static HTML site you can hit with requests.get() and call it a day. It runs on Next.js with React and loads product data through internal GraphQL endpoints.

That means you need a browser-based approach — or better yet, a way to intercept those API calls directly.

This guide walks you through three methods to scrape bol.com, from basic Playwright automation to intercepting the underlying API requests that power the site. Working Python code included for every method.

What Is Bol.com Scraping?

Bol.com scraping is the process of extracting product data — prices, titles, ratings, seller information, and availability — from bol.com's product and search result pages. Because bol.com renders content with JavaScript and loads data through internal API calls, simple HTTP-based scrapers won't work. You need either browser automation (Playwright, Puppeteer) or network request interception to capture the data as the page loads. The most reliable approach combines both.

What Data Can You Extract From Bol.com?

When you scrape bol.com product pages, you get access to a rich set of data points. Here's what's available on a typical listing:

Data Point | Location | Extraction Difficulty
--- | --- | ---
Product title | Search results + product page | Easy
Price (current, original, discount) | Buy block on product page | Easy
Seller name and rating | Buy block / offer section | Medium
All seller offers | "Other sellers" tab | Medium
Product specifications | Specs table on product page | Easy
Customer reviews and ratings | Review section (lazy-loaded) | Medium
EAN / product ID | URL path and page metadata | Easy
Stock and delivery status | Buy block | Easy
Category breadcrumbs | Top of product page | Easy
Image URLs | Product gallery | Medium

The "medium" items are lazy-loaded by React. They don't appear in the initial HTML — you need to scroll or wait for the component to fetch the data.

For price monitoring, you only need buy block data: title, price, seller, stock status. For full competitive analysis, you'll also want the seller offer list and review data.
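
If price monitoring is the goal, it helps to settle on a flat record per observation up front. Here's a minimal sketch of such a schema; the field names are illustrative choices, not taken from bol.com's data model:

from dataclasses import dataclass
from datetime import datetime

@dataclass
class PriceSnapshot:
    """One buy-block observation per product per run."""
    ean: str              # product identifier from the URL or page metadata
    title: str
    price_eur: float
    seller: str
    in_stock: bool
    observed_at: datetime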

Why Bol.com Is Tricky to Scrape

Most scraping tutorials show you how to hit a URL with requests and parse the HTML with BeautifulSoup. That approach fails on bol.com for three reasons.

First, bol.com is a single-page application built on Next.js and React. Product data doesn't exist in the initial HTML response — it's injected by JavaScript after the page loads.

A basic HTTP request returns an empty shell with no product information.

Second, bol.com loads data through internal GraphQL and REST endpoints. The product page you see in the browser is assembled from multiple asynchronous API calls.

If you can intercept those calls, you get structured JSON instead of parsing HTML.

Third, bol.com applies standard anti-bot protections: rate limiting, cookie consent gates, and browser fingerprinting. Not the hardest targets to bypass, but enough to block anyone running raw requests with default headers.

The upside: once you understand these layers, bol.com is actually easier to scrape than many comparable sites. The internal APIs return cleaner data than the HTML.

The bot protection is relatively light compared to Amazon or Walmart.

Prerequisites

Before you start, make sure you have:

  • Python 3.10+ installed
  • Playwright for browser automation
  • BeautifulSoup4 for HTML parsing
  • Basic familiarity with CSS selectors and browser DevTools

Install everything in one shot:

pip install playwright beautifulsoup4 lxml
playwright install chromium

The playwright install command downloads a bundled Chromium binary. It's around 150MB, so give it a minute on slower connections.
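
To confirm the install worked before pointing anything at bol.com, a quick smoke test against a neutral page is enough:

# Minimal check that Playwright and the bundled Chromium run correctly
import asyncio
from playwright.async_api import async_playwright

async def smoke_test():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto("https://example.com")
        print(await page.title())  # should print "Example Domain"
        await browser.close()

asyncio.run(smoke_test())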

Step 1: Understand Bol.com's Page Structure

Before writing any scraping code, open bol.com in your browser and inspect the page. Right-click any product element and choose "Inspect."

You'll notice something immediately: the page HTML is generated client-side by React. Product listings live inside div elements with data-test attributes like product-title and product-price.

Here are the key selectors for a bol.com search results page:

# bol.com search results selectors (as of early 2026)
SELECTORS = {
    "product_card": '[data-test="product-card"]',
    "product_title": '[data-test="product-title"]',
    "product_price": '[data-test="product-price"]',
    "product_rating": '[data-test="rating"]',
    "product_link": 'a[href*="/p/"]',
    "next_page": '[data-test="pagination-next"]',
}

These selectors can change when bol.com updates their frontend. Always verify them in DevTools before running a scrape.

If a selector breaks, the data-test pattern is your best friend. Search for data-test= in the Elements panel to find the current attribute names.
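
If you'd rather rediscover them programmatically, a snippet like this, dropped into any of the Playwright functions below while a bol.com page is open, dumps every data-test value on the page:

# List all data-test attribute values on the current page
values = await page.evaluate("""
    () => [...new Set(
        Array.from(document.querySelectorAll('[data-test]'))
            .map(el => el.getAttribute('data-test'))
    )]
""")
print(sorted(values))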

Step 2: Scrape Search Results with Playwright

Let's start with the most straightforward method: loading a search results page in a headless browser and extracting product data from the rendered DOM.

This function navigates to a bol.com search URL, waits for the product cards to load, and pulls out the data:

import asyncio
from playwright.async_api import async_playwright

async def scrape_search_results(query: str, max_pages: int = 3):
    """Scrape bol.com search results for a given query."""
    products = []

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            locale="nl-NL",
            viewport={"width": 1280, "height": 800},
        )
        page = await context.new_page()

        for page_num in range(1, max_pages + 1):
            url = (
                f"https://www.bol.com/nl/nl/s/"
                f"?searchtext={query}&page={page_num}"
            )
            await page.goto(url, wait_until="networkidle")

            # Wait for product cards to render
            await page.wait_for_selector(
                '[data-test="product-card"]',
                timeout=10000,
            )

            cards = await page.query_selector_all(
                '[data-test="product-card"]'
            )

            for card in cards:
                title_el = await card.query_selector("a")
                price_el = await card.query_selector(
                    '[class*="price"]'
                )

                title = await title_el.inner_text() if title_el else ""
                link = await title_el.get_attribute("href") if title_el else ""
                price = await price_el.inner_text() if price_el else ""

                products.append({
                    "title": title.strip(),
                    "price": price.strip(),
                    "url": f"https://www.bol.com{link}" if link else "",
                    "page": page_num,
                })

            # Polite delay between pages
            await asyncio.sleep(2)

        await browser.close()

    return products

A few things to notice. The locale="nl-NL" setting tells Chromium to present itself as a Dutch-language browser, which prevents bol.com from redirecting you to a country selection page.

The wait_until="networkidle" parameter ensures we don't try to parse the DOM before React has finished rendering.

Run it like this:

results = asyncio.run(scrape_search_results("laptop", max_pages=2))
for product in results[:5]:
    print(f"{product['title'][:60]} — {product['price']}")

You should see output like product titles with prices in euros. If you get empty results, bol.com may have updated their selectors — check DevTools and update the query selectors accordingly.

Step 3: Extract Detailed Product Data

Search results give you titles and prices, but product pages contain the good stuff: full descriptions, specifications, seller offers, reviews, and stock status.

Here's how to scrape a single product page:

async def scrape_product_page(product_url: str):
    """Extract detailed data from a bol.com product page."""
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(locale="nl-NL")
        page = await context.new_page()

        await page.goto(product_url, wait_until="networkidle")

        # Extract product details
        data = {}

        # Title
        title_el = await page.query_selector("h1")
        data["title"] = await title_el.inner_text() if title_el else ""

        # Price — look for the main price element
        price_el = await page.query_selector(
            '[data-test="buy-block"] [class*="price"]'
        )
        data["price"] = await price_el.inner_text() if price_el else ""

        # Rating
        rating_el = await page.query_selector(
            '[data-test="rating-value"]'
        )
        data["rating"] = await rating_el.inner_text() if rating_el else ""

        # Seller
        seller_el = await page.query_selector(
            '[data-test="seller-name"]'
        )
        data["seller"] = await seller_el.inner_text() if seller_el else ""

        # Specifications table
        specs = {}
        spec_rows = await page.query_selector_all(
            '[data-test="specifications"] tr'
        )
        for row in spec_rows:
            cells = await row.query_selector_all("td")
            if len(cells) >= 2:
                key = await cells[0].inner_text()
                val = await cells[1].inner_text()
                specs[key.strip()] = val.strip()

        data["specifications"] = specs
        data["url"] = product_url

        await browser.close()

    return data

The buy-block selector targets the main purchase area where the price and seller info live. Bol.com shows multiple sellers for the same product.

The first price you see is the "buy box" winner — usually the cheapest or best-rated seller.

One gotcha: bol.com lazy-loads specifications and reviews. If you need those sections, scroll the page first:

# Scroll to trigger lazy-loaded content
await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
await asyncio.sleep(1)
await page.evaluate("window.scrollTo(0, 0)")
await asyncio.sleep(1)

This mimics a user scrolling through the page, which triggers the React components to load the remaining content.
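
If you also need the review text, something like this can follow the scroll inside scrape_product_page. The data-test value is an assumption; verify the real attribute in DevTools before relying on it:

# Hypothetical review selector; check DevTools for the current attribute name
REVIEW_SELECTOR = '[data-test="review"]'
try:
    await page.wait_for_selector(REVIEW_SELECTOR, timeout=5000)
    review_els = await page.query_selector_all(REVIEW_SELECTOR)
    data["reviews"] = [(await el.inner_text()).strip() for el in review_els]
except Exception:
    data["reviews"] = []  # reviews never rendered or the selector changed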

Step 4: Intercept API Requests (The Smart Approach)

Here's where it gets interesting. Bol.com's frontend fetches product data from internal API endpoints. If you intercept those network requests, you get structured JSON instead of messy HTML — cleaner, faster, and more reliable.

Open DevTools, go to the Network tab, and filter by Fetch/XHR. Load a product page and watch the requests. You'll see calls to endpoints that return product data as JSON.

Here's how to capture those responses with Playwright:

async def scrape_via_api_intercept(product_url: str):
    """Intercept bol.com's internal API calls for clean JSON data."""
    captured_data = {}

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(locale="nl-NL")
        page = await context.new_page()

        # Listen for API responses
        async def handle_response(response):
            url = response.url
            if "/api/" in url or "graphql" in url.lower():
                try:
                    body = await response.json()
                    # Store responses keyed by URL path
                    path = url.split("?")[0].split("/")[-1]
                    captured_data[path] = body
                except Exception:
                    pass

        page.on("response", handle_response)

        await page.goto(product_url, wait_until="networkidle")
        # Brief wait for any trailing async requests
        await asyncio.sleep(2)

        await browser.close()

    return captured_data

The handle_response callback fires for every network response. We filter for URLs containing /api/ or graphql and parse the JSON body.

This gives you raw product data exactly as bol.com's frontend receives it — no selector breakage, no missing fields.

The response typically includes product name, EAN code, prices from all sellers, review counts, category breadcrumbs, and image URLs. Much richer than what you'd parse from HTML.

This is the method I'd recommend for any production scraper targeting bol.com. Selectors change every few months. API response structures change far less frequently.
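
Because the response shapes aren't documented anywhere, the practical first step is exploring what you actually captured. A small recursive search over the dictionary returned by scrape_via_api_intercept helps map out where the useful values live; the key names below are starting guesses, not guaranteed field names:

def find_keys(obj, wanted, found=None):
    """Recursively collect (key, value) pairs for the given key names."""
    if found is None:
        found = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k in wanted:
                found.append((k, v))
            find_keys(v, wanted, found)
    elif isinstance(obj, list):
        for item in obj:
            find_keys(item, wanted, found)
    return found

# product_url: any bol.com product page URL
captured = asyncio.run(scrape_via_api_intercept(product_url))
for key, value in find_keys(captured, {"price", "ean", "title"}):
    print(key, value)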

Step 5: Handle Pagination and Scale

A single search query on bol.com can return hundreds of pages. Here's a production-ready scraper that handles pagination, rate limiting, and data storage:

import asyncio
import json
import random
from datetime import datetime
from playwright.async_api import async_playwright

async def scrape_bol_at_scale(
    queries: list[str],
    max_pages_per_query: int = 10,
    output_file: str = "bol_data.json",
):
    """Production scraper with rate limiting and error handling."""
    all_products = []

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            locale="nl-NL",
            viewport={"width": 1280, "height": 800},
            user_agent=(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/131.0.0.0 Safari/537.36"
            ),
        )

        for query in queries:
            print(f"Scraping: {query}")

            for page_num in range(1, max_pages_per_query + 1):
                try:
                    page = await context.new_page()
                    url = (
                        f"https://www.bol.com/nl/nl/s/"
                        f"?searchtext={query}&page={page_num}"
                    )

                    await page.goto(url, wait_until="networkidle")
                    await page.wait_for_selector(
                        '[data-test="product-card"]',
                        timeout=10000,
                    )

                    # Extract with JavaScript for speed
                    products = await page.evaluate("""
                        () => {
                            const cards = document.querySelectorAll(
                                '[data-test="product-card"]'
                            );
                            return Array.from(cards).map(card => {
                                const link = card.querySelector('a');
                                const price = card.querySelector(
                                    '[class*="price"]'
                                );
                                return {
                                    title: link?.innerText?.trim() || '',
                                    url: link?.href || '',
                                    price: price?.innerText?.trim() || '',
                                };
                            });
                        }
                    """)

                    for prod in products:
                        prod["query"] = query
                        prod["page"] = page_num
                        prod["scraped_at"] = datetime.now().isoformat()

                    all_products.extend(products)
                    await page.close()

                    # Polite delay: 2-5 seconds between pages
                    delay = random.uniform(2, 5)
                    await asyncio.sleep(delay)

                except Exception as e:
                    print(f"  Error on page {page_num}: {e}")
                    await asyncio.sleep(5)
                    continue

            print(f"  Collected {len(all_products)} products so far")

        await browser.close()

    # Save results
    with open(output_file, "w", encoding="utf-8") as f:
        json.dump(all_products, f, ensure_ascii=False, indent=2)

    print(f"Done. {len(all_products)} products saved to {output_file}")
    return all_products

Key design decisions in this code: we use page.evaluate() to run extraction logic inside the browser. This is significantly faster than making individual query_selector calls from Python.

Each page gets its own Playwright page object that we close after extraction, preventing memory leaks in long-running scrapes.

Run it against multiple product categories:

asyncio.run(scrape_bol_at_scale(
    queries=["laptop", "koptelefoon", "hardloop schoenen"],
    max_pages_per_query=5,
))

The random delay between 2 and 5 seconds keeps your request pattern unpredictable. Fixed delays are a dead giveaway to bot detection systems.

Avoiding Detection on Bol.com

Bol.com uses standard anti-bot protections. Nothing as aggressive as Cloudflare Enterprise, but enough to catch naive scrapers. Here's what works in practice.

Rotate User Agents

Don't use the same user agent string for every request. Keep a list of current, real browser user agents and rotate through them:

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:134.0) "
    "Gecko/20100101 Firefox/134.0",
]

Pick a random one when creating each browser context. Outdated user agents (Chrome 90, Firefox 80) will get you flagged instantly.
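
Wired into the context creation from the earlier scripts, that looks like:

import random

context = await browser.new_context(
    locale="nl-NL",
    user_agent=random.choice(USER_AGENTS),
)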

Use Residential Proxies

If you're scraping more than a few hundred pages, you'll need proxy rotation. Datacenter IPs get blocked quickly on e-commerce sites.

Residential proxies with Dutch IP addresses work best for bol.com since the site serves the Netherlands and Belgium. A provider like Roundproxies can supply rotating residential IPs in the right geography.

context = await browser.new_context(
    proxy={
        "server": "http://proxy-host:port",
        "username": "your-user",
        "password": "your-pass",
    },
    locale="nl-NL",
)

Respect Rate Limits

This is the single most important anti-detection measure. Keep your request rate to at most one page every 2 seconds. Bol.com handles thousands of requests per second from real users, but concentrated traffic from a single source stands out.

If you get a 429 status code, back off immediately. Double your delay and retry after 30 seconds.
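
A drop-in wrapper around page.goto makes that policy explicit. This is a sketch of the idea, not something tuned against bol.com's actual rate limiter:

import asyncio

async def goto_with_backoff(page, url, max_retries=3, base_delay=30):
    """Navigate to a URL, doubling the wait after every 429 response."""
    delay = base_delay
    for attempt in range(max_retries):
        response = await page.goto(url, wait_until="networkidle")
        if response is None or response.status != 429:
            return response
        print(f"429 on attempt {attempt + 1}, backing off {delay}s")
        await asyncio.sleep(delay)
        delay *= 2
    raise RuntimeError(f"Still rate-limited after {max_retries} attempts: {url}")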

Exporting Data to CSV

JSON is great for storage, but most people want spreadsheets. Here's a quick conversion:

import csv
import json

def json_to_csv(json_file: str, csv_file: str):
    """Convert scraped JSON data to CSV."""
    with open(json_file, "r", encoding="utf-8") as f:
        products = json.load(f)

    if not products:
        print("No data to convert")
        return

    keys = products[0].keys()

    with open(csv_file, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=keys)
        writer.writeheader()
        for product in products:
            # Stringify values so nested dicts (like specs) don't break the writer
            row = {k: str(v) for k, v in product.items()}
            writer.writerow(row)

    print(f"Exported {len(products)} rows to {csv_file}")

For nested data like specifications, you'll want to flatten the dictionary into individual columns. The str(v) cast handles that for quick exports, though for production use you'd want proper column mapping.
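
For the Step 3 output, a proper flattening step could look like this; the spec_ prefix is just one naming choice:

def flatten_product(product: dict) -> dict:
    """Turn the nested specifications dict into spec_<name> columns."""
    flat = {k: v for k, v in product.items() if k != "specifications"}
    for spec_key, spec_val in product.get("specifications", {}).items():
        flat[f"spec_{spec_key}"] = spec_val
    return flat

Because different products expose different specification keys, union the keys across all flattened rows before passing fieldnames to DictWriter.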

Troubleshooting

"TimeoutError: Timeout 10000ms exceeded"

Why: The product cards didn't render in time. This usually means bol.com served a cookie consent banner, a CAPTCHA, or a country selection page instead of the search results.

Fix: Add consent banner handling before your main scrape logic:

# Dismiss cookie banner if present
try:
    consent_btn = await page.wait_for_selector(
        "#js-first-screen-accept",
        timeout=3000,
    )
    await consent_btn.click()
    await asyncio.sleep(1)
except Exception:
    pass  # No banner appeared

"Empty results — all fields are blank"

Why: Bol.com updated their CSS classes or data-test attributes. This is the most common maintenance issue with DOM-based scrapers.

Fix: Open bol.com in a real browser, inspect the elements you want, and update your selectors. Better yet, switch to the API interception method from Step 4, which is far less fragile.

"403 Forbidden after ~50 requests"

Why: Your IP got flagged. Bol.com rate-limits aggressive scrapers.

Fix: Slow down (3-5 second delays), rotate proxies, and use a realistic user agent. If you're hitting this consistently, your request pattern is too predictable — add randomized delays and vary your navigation flow.

"Page shows Dutch text I can't read"

Why: Bol.com defaults to Dutch for Netherlands traffic.

Fix: The product data is what matters, not the UI chrome. The first two URL path segments control country and language (for example /nl/nl/ for the Netherlands in Dutch and /be/fr/ for Belgium in French), so pick whichever locale you can work with.

Alternatively, extract the raw data as-is and translate field names in your post-processing pipeline.

A Note on Responsible Scraping

Bol.com's terms of service restrict automated data collection. Before you scrape, check their robots.txt at https://www.bol.com/robots.txt for current crawling rules.

Keep your scraper respectful: use reasonable delays, don't hammer their servers during peak hours, cache results so you're not re-scraping the same pages, and only collect publicly visible data.
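
Caching can be as simple as remembering when each URL was last fetched and skipping anything recent. A minimal sketch; the file name and 24-hour expiry are arbitrary choices:

import json
import time
from pathlib import Path

CACHE_FILE = Path("scrape_cache.json")
MAX_AGE = 24 * 3600  # re-scrape a URL at most once per day

def load_cache() -> dict:
    return json.loads(CACHE_FILE.read_text()) if CACHE_FILE.exists() else {}

def should_scrape(url: str, cache: dict) -> bool:
    return time.time() - cache.get(url, 0) > MAX_AGE

def mark_scraped(url: str, cache: dict) -> None:
    cache[url] = time.time()
    CACHE_FILE.write_text(json.dumps(cache))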

If you need regular, high-volume product data from bol.com, check whether their Partner Platform API covers your use case. The official API is faster, more reliable, and won't get you blocked.

Which Method Should You Use to Scrape Bol.com?

Method | Speed | Reliability | Maintenance | Best For
--- | --- | --- | --- | ---
DOM extraction (Steps 2-3) | Slow | Medium | High (selectors break) | Quick one-off scrapes
API interception (Step 4) | Fast | High | Low (JSON structure stable) | Production scrapers
Full paginated scraper (Step 5) | Medium | Medium | Medium | Category-wide monitoring

If you're building something that needs to run daily for months, API interception is the clear winner. You avoid the fragility of CSS selectors entirely.

For a one-time competitive analysis — say, grabbing all laptop prices for a pricing report — the DOM extraction method from Step 2 is faster to set up and good enough.

For scraping entire product categories across bol.com on a regular schedule, combine the paginated scraper from Step 5 with the API interception logic from Step 4. That gives you both coverage and clean data.

Wrapping Up

You now have three working approaches to scrape bol.com: DOM extraction with Playwright for quick one-off scrapes, API interception for cleaner structured data, and a production-ready paginated scraper for ongoing collection.

Start with the API interception method from Step 4. It's the most resilient to frontend changes and returns data in a format you can pipe directly into a database or analytics pipeline.

Save DOM scraping as your fallback for data that doesn't show up in the API responses — things like rendered image thumbnails or specific UI elements that only exist in the client-side HTML.

If you're building a price monitoring system for the Dutch or Belgian market, combine the scraper with a cron job and a simple SQLite database.

Scrape bol.com once daily during off-peak hours (early morning CET works well), diff the results against your previous run, and alert on price changes above your threshold.
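
A minimal version of that loop might look like the sketch below. It assumes prices have already been parsed from strings like "49,99" into floats during post-processing:

import sqlite3

def record_and_diff(products, db_path="prices.db", threshold=0.05):
    """Store today's prices and print items that moved more than threshold."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prices (url TEXT, price REAL, scraped_at TEXT)"
    )
    for p in products:
        prev = conn.execute(
            "SELECT price FROM prices WHERE url = ? ORDER BY scraped_at DESC LIMIT 1",
            (p["url"],),
        ).fetchone()
        if prev and prev[0] and abs(p["price"] - prev[0]) / prev[0] > threshold:
            print(f"Price change on {p['url']}: {prev[0]} -> {p['price']}")
        conn.execute(
            "INSERT INTO prices VALUES (?, ?, ?)",
            (p["url"], p["price"], p["scraped_at"]),
        )
    conn.commit()
    conn.close()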

The same techniques you learned here apply to other Dutch e-commerce sites too. Coolblue, Wehkamp, and Marktplaats use similar JavaScript-heavy frontends that require browser automation.