Nodriver Web Scraping Tutorial for 2026

If you've tried scraping a Cloudflare-protected site with Selenium, you already know the cycle: blocked after three requests, CAPTCHA walls on page one, your scraper dead before it collected a single data point.

Nodriver fixes the root cause. It strips out every automation fingerprint that standard tools leave behind. No WebDriver binary. No navigator.webdriver = true.

Just a real Chrome instance controlled through a custom CDP implementation that anti-bot systems struggle to distinguish from a human user.

In this tutorial you'll build a working scraper from scratch. You'll extract product data, handle pagination, intercept network responses at the protocol level, and run multiple tabs concurrently.

Every code snippet is copy-paste ready and tested against Python 3.12.

What Is Nodriver Web Scraping?

Nodriver web scraping uses a Python library that communicates directly with Chrome through a custom Chrome DevTools Protocol implementation. It bypasses Selenium, ChromeDriver binaries, and their detectable automation markers entirely.

Built as the official successor to undetected-chromedriver, nodriver is fully asynchronous and launches a clean browser profile by default.

It passes most anti-bot checks from Cloudflare, Imperva, and hCaptcha out of the box. Use it when your target site blocks standard browser automation.

Both projects come from the same developer, ultrafunkamsterdam on GitHub.

The architectural shift is significant: instead of patching Selenium to hide automation signals, nodriver skips Selenium entirely and talks to Chrome through its own CDP layer.

This means the browser doesn't load Selenium's JavaScript injections. Properties like navigator.webdriver stay at their natural values. Fingerprinting scripts that check for automation artifacts find nothing unusual.

The tradeoff is ecosystem maturity. Nodriver's documentation is minimal, the community is small, and some API methods are still unstable.

You'll read source code more often than Stack Overflow. But for stealth, nothing in the Python ecosystem matches it right now.

Prerequisites

You need three things installed before writing any code:

  • Python 3.9+ (3.12 recommended)
  • Google Chrome or any Chromium-based browser, installed in its default location
  • pip for package management

Create a virtual environment to keep dependencies isolated:

python -m venv nodriver-env
source nodriver-env/bin/activate  # Windows: nodriver-env\Scripts\activate
pip install nodriver

That's it. No separate ChromeDriver download, no browser binary management. Nodriver detects your Chrome installation automatically.

Step 1: Launch a Browser and Grab a Page

Every nodriver script follows the same async skeleton. Here's the minimal version that opens a page and prints its HTML:

import nodriver as uc

async def main():
    browser = await uc.start()
    tab = await browser.get("https://books.toscrape.com")

    # get_content() returns the full page HTML
    html = await tab.get_content()
    print(html[:500])

    browser.stop()

if __name__ == "__main__":
    uc.loop().run_until_complete(main())

Two things to notice here. First, uc.loop().run_until_complete() replaces asyncio.run() — the library ships its own event loop wrapper because asyncio.run() causes issues with nodriver's internal cleanup.

Second, tab and page are interchangeable names for the same object. The official docs use both.

If Chrome opens and you see HTML output in your terminal, your environment is working.

Step 2: Find and Extract Elements

Nodriver gives you several ways to locate elements. The two most reliable methods in 2026 are find() for single elements and select_all() for collections.

This snippet scrapes book titles and prices from the demo bookstore:

import nodriver as uc

async def main():
    browser = await uc.start()
    tab = await browser.get("https://books.toscrape.com")

    # select_all returns a list of Element objects
    books = await tab.select_all("article.product_pod")

    for book in books:
        title_el = await book.query_selector("h3 a")
        price_el = await book.query_selector(".price_color")

        title = title_el.attrs.get("title", "No title")
        price = price_el.text
        print(f"{title}: {price}")

    browser.stop()

if __name__ == "__main__":
    uc.loop().run_until_complete(main())

A common gotcha: element.attrs returns a flat list in some nodriver versions, not a dictionary. If .get() throws an error, iterate the list manually.

The API is still maturing. Check your version with pip show nodriver.
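Until that settles, a small helper (my own, not part of nodriver) gives you one code path that tolerates both shapes:

```python
def get_attr(attrs, name, default=None):
    """Look up an attribute whether `attrs` is dict-like or a
    flat [key1, value1, key2, value2, ...] list."""
    if hasattr(attrs, "get"):
        return attrs.get(name, default)
    # walk the flat list two items at a time as key/value pairs
    for key, value in zip(attrs[::2], attrs[1::2]):
        if key == name:
            return value
    return default
```

Then get_attr(title_el.attrs, "title", "No title") works whichever shape your installed version returns.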

You can also find elements by their visible text, which is useful when CSS selectors are obfuscated:

# find the first element containing this exact text
button = await tab.find("Add to basket")
if button:
    await button.click()

This text-based lookup searches the entire DOM. It's slower than a CSS selector on large pages, but it works when class names are randomized.

Step 3: Handle Pagination

Most real scraping jobs span multiple pages. The pattern is straightforward: scrape the current page, find the "next" link, navigate to it, repeat.

import nodriver as uc

async def scrape_all_pages():
    browser = await uc.start()
    tab = await browser.get("https://books.toscrape.com")
    all_books = []

    while True:
        books = await tab.select_all("article.product_pod")
        for book in books:
            title_el = await book.query_selector("h3 a")
            price_el = await book.query_selector(".price_color")
            all_books.append({
                "title": title_el.attrs.get("title", ""),
                "price": price_el.text
            })

        # look for the "next" button
        try:
            next_btn = await tab.query_selector("li.next a")
            if next_btn:
                await next_btn.click()
                await tab.sleep(1)  # wait for page load
            else:
                break
        except Exception:
            break

    print(f"Scraped {len(all_books)} books")
    browser.stop()
    return all_books

if __name__ == "__main__":
    uc.loop().run_until_complete(scrape_all_pages())

The await tab.sleep(1) call is a simple wait. On production targets you'll want smarter waits.

Use await tab.find("some text on next page") as an implicit wait — nodriver polls the DOM until that element appears or the timeout hits.

Don't use Python's time.sleep() in async code. It blocks the entire event loop.
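If you want the same poll-until-timeout behavior around an arbitrary condition rather than a text lookup, a generic helper is easy to write. This is my own sketch, not a nodriver API:

```python
import asyncio

async def wait_until(predicate, timeout=10.0, interval=0.5):
    """Poll an async predicate until it returns a truthy value.
    Returns that value, or None once the timeout expires."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while loop.time() < deadline:
        result = await predicate()
        if result:
            return result
        # yield to the event loop between polls; never time.sleep() here
        await asyncio.sleep(interval)
    return None
```

In a scraper you might call `await wait_until(lambda: tab.query_selector("li.next a"), timeout=5)` to wait for the next-page link to appear.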

Step 4: Scrape Dynamic Content

Sites that load content via JavaScript — infinite scroll, lazy-loaded images, AJAX-populated sections — need a different approach. You have to trigger the load and then wait for new elements to appear.

Here's a pattern for infinite scroll pages:

async def scrape_infinite_scroll(tab, max_scrolls=10):
    previous_height = 0

    for i in range(max_scrolls):
        # scroll to page bottom
        await tab.scroll_down(1000)
        await tab.sleep(2)  # let content load

        # check if page grew
        current_height = await tab.evaluate(
            "document.body.scrollHeight"
        )
        if current_height == previous_height:
            break  # no new content loaded
        previous_height = current_height

    # now extract all loaded elements
    items = await tab.select_all(".product-card")
    return items

The key here is tab.evaluate(), which runs arbitrary JavaScript in the browser context. You can use it to check scroll height, trigger click events, or read any variable the page's own scripts have set.

For AJAX-loaded content that appears after clicking a "Load More" button, the approach is simpler:

while True:
    load_more = await tab.find("Load More")
    if not load_more:
        break
    await load_more.click()
    await tab.sleep(1.5)

Step 5: Intercept Network Requests with CDP

This is where nodriver web scraping gets powerful — and where most tutorials stop short.

Because nodriver communicates through CDP, you can intercept every HTTP request and response the browser makes. This opens up three patterns:

  • Grabbing API responses directly (often cleaner than parsing HTML)
  • Blocking unnecessary resources to speed up scraping
  • Modifying request headers on the fly

Here's how to capture XHR/fetch responses:

import nodriver as uc
import json

collected_responses = []

async def main():
    browser = await uc.start()
    tab = await browser.get("about:blank")

    async def handle_response(event):
        """Callback fired for every network response. Defined as a
        closure so it can reach `tab`; CDP event objects don't
        carry a tab reference themselves."""
        if "/api/products" in event.response.url:
            # get_response_body returns (body, is_base64_encoded)
            body, _ = await tab.send(
                uc.cdp.network.get_response_body(event.request_id)
            )
            collected_responses.append(json.loads(body))

    # enable the network domain and attach the listener
    await tab.send(uc.cdp.network.enable())
    tab.add_handler(uc.cdp.network.ResponseReceived, handle_response)

    await tab.get("https://example-spa.com/products")
    await tab.sleep(5)  # let API calls complete

    print(f"Captured {len(collected_responses)} API responses")
    for resp in collected_responses:
        print(json.dumps(resp, indent=2)[:200])

    browser.stop()

if __name__ == "__main__":
    uc.loop().run_until_complete(main())

This approach catches the structured JSON before it hits the DOM. No HTML parsing needed.

When a site loads product data from an internal API, intercepting that response gives you clean, typed data instead of scraping rendered text.

You can also block resource types to speed up page loads:

# inside main(), after `tab = await browser.get(...)`:
async def intercept(event):
    """Fail image requests; let everything else through. With
    interception on, every paused request must be explicitly
    continued or failed, or the page stalls."""
    if event.resource_type is uc.cdp.network.ResourceType.IMAGE:
        await tab.send(uc.cdp.fetch.fail_request(
            event.request_id,
            uc.cdp.network.ErrorReason.BLOCKED_BY_CLIENT
        ))
    else:
        await tab.send(uc.cdp.fetch.continue_request(event.request_id))

# interception runs through the fetch domain, not network
await tab.send(uc.cdp.fetch.enable())
tab.add_handler(uc.cdp.fetch.RequestPaused, intercept)

Blocking images, fonts, and CSS can cut page load times by 60–70% when you only need text data.

Step 6: Run Multiple Tabs Concurrently

Nodriver's async architecture means you can scrape multiple pages at the same time. This is one of its real advantages over synchronous Selenium setups.

Here's a pattern that opens multiple tabs and scrapes them in parallel:

import nodriver as uc
import asyncio

async def scrape_page(browser, url):
    """Scrape a single page in its own tab."""
    tab = await browser.get(url, new_tab=True)
    await tab.sleep(2)

    title_el = await tab.query_selector("h1")
    title = title_el.text if title_el else "No title"

    content = await tab.get_content()
    await tab.close()

    return {"url": url, "title": title, "length": len(content)}

async def main():
    browser = await uc.start()
    urls = [
        "https://books.toscrape.com/catalogue/page-1.html",
        "https://books.toscrape.com/catalogue/page-2.html",
        "https://books.toscrape.com/catalogue/page-3.html",
        "https://books.toscrape.com/catalogue/page-4.html",
        "https://books.toscrape.com/catalogue/page-5.html",
    ]

    # run all tab scrapes concurrently
    tasks = [scrape_page(browser, url) for url in urls]
    results = await asyncio.gather(*tasks)

    for r in results:
        print(f"{r['url']}: {r['title']} ({r['length']} chars)")

    browser.stop()

if __name__ == "__main__":
    uc.loop().run_until_complete(main())

A few things to keep in mind. Don't open 50 tabs at once — Chrome's memory usage scales linearly with tabs, and you'll look very bot-like to any server watching concurrent connections.

Five to ten concurrent tabs is a reasonable ceiling for most machines.

If a target site is rate-limited, stagger your requests with asyncio.Semaphore:

sem = asyncio.Semaphore(3)  # max 3 concurrent tabs

async def scrape_page_throttled(browser, url):
    async with sem:
        return await scrape_page(browser, url)
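The throttling effect is easy to verify without launching a browser. This standalone sketch (names are mine, not nodriver's) replaces the tab work with asyncio.sleep and records the peak number of jobs running at once:

```python
import asyncio

async def run_throttled(n_jobs=10, limit=3):
    """Run n_jobs fake scrape tasks under a Semaphore and return
    the highest number that ever ran simultaneously."""
    sem = asyncio.Semaphore(limit)
    active = 0
    peak = 0

    async def fake_scrape(job_id):
        nonlocal active, peak
        async with sem:
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stands in for real tab work
            active -= 1

    await asyncio.gather(*(fake_scrape(i) for i in range(n_jobs)))
    return peak
```

However many jobs you queue, the peak never exceeds the semaphore's limit.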

Step 7: Add Proxy Rotation

Running every request through your home IP is a fast way to get blocked — especially at scale. Proxies distribute your requests across different addresses.

Nodriver accepts proxy configuration through browser args:

import nodriver as uc

async def main():
    browser = await uc.start(
        browser_args=[
            "--proxy-server=http://proxy-host:port"
        ]
    )
    tab = await browser.get("https://httpbin.org/ip")
    content = await tab.get_content()
    print(content)  # should show the proxy IP
    browser.stop()

if __name__ == "__main__":
    uc.loop().run_until_complete(main())

For rotating proxies, you need to restart the browser with a different proxy for each rotation. Alternatively, use an upstream proxy gateway that handles rotation internally.

Residential proxies work best for nodriver web scraping because they come from real ISP-assigned addresses. This aligns with nodriver's goal of looking like a genuine browser session.

If you need authenticated proxies (username:password), note that Chrome has no command-line flag for proxy credentials. Route them through a local proxy forwarder that adds the auth header, or answer the browser's authentication challenge over CDP (the fetch domain's auth events).

Nodriver's built-in create_context() method for proxies exists but its behavior is inconsistent across versions. Browser args are the reliable path for now.

Here's a rotation pattern that cycles through a proxy list:

import nodriver as uc
import random

PROXIES = [
    "http://proxy1:port",
    "http://proxy2:port",
    "http://proxy3:port",
]

async def scrape_with_rotation(urls):
    results = []
    for url in urls:
        proxy = random.choice(PROXIES)
        browser = await uc.start(
            browser_args=[f"--proxy-server={proxy}"]
        )
        tab = await browser.get(url)
        content = await tab.get_content()
        results.append({"url": url, "proxy": proxy, "length": len(content)})
        browser.stop()
    return results

This restarts Chrome for every proxy switch, which adds overhead.

For high-volume jobs, a proxy gateway that rotates IPs upstream is more efficient — you set one proxy endpoint in browser args and the gateway handles the rest.
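If you'd rather have strict round-robin than random.choice, itertools.cycle walks the proxy list in order and wraps around. The proxy URLs here are the same placeholders as above:

```python
from itertools import cycle

PROXIES = [
    "http://proxy1:port",
    "http://proxy2:port",
    "http://proxy3:port",
]

# cycle() yields the list endlessly: 1, 2, 3, 1, 2, 3, ...
proxy_pool = cycle(PROXIES)

def next_proxy():
    """Return the next proxy in strict round-robin order."""
    return next(proxy_pool)
```

Round-robin guarantees every proxy carries an equal share of traffic, which random selection only approximates over long runs.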

Step 8: Export Scraped Data

Collecting data is only half the job. You need to store it somewhere useful.

For small to medium datasets, CSV works fine:

import csv

def save_to_csv(data, filename="output.csv"):
    if not data:
        return
    keys = data[0].keys()
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=keys)
        writer.writeheader()
        writer.writerows(data)
    print(f"Saved {len(data)} rows to {filename}")
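A quick way to sanity-check the CSV format without touching the disk: write sample rows through csv.DictWriter into an in-memory buffer, then read them back with csv.DictReader.

```python
import csv
import io

rows = [
    {"title": "A Light in the Attic", "price": "£51.77"},
    {"title": "Tipping the Velvet", "price": "£53.74"},
]

# write into an in-memory buffer instead of a real file
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=rows[0].keys())
writer.writeheader()
writer.writerows(rows)

# read it back and confirm nothing was lost or reordered
buf.seek(0)
restored = list(csv.DictReader(buf))
```

restored is a list of dicts equal to rows, exactly the shape the pagination scraper in Step 3 produces.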

For larger datasets or ongoing scraping jobs, write to a SQLite database instead. It handles concurrent writes better than flat files and lets you query your data without loading everything into memory:

import sqlite3
import json

def save_to_sqlite(data, db_name="scraped.db", table="products"):
    conn = sqlite3.connect(db_name)
    cursor = conn.cursor()
    if data:
        cols = ", ".join(data[0].keys())
        placeholders = ", ".join(["?"] * len(data[0]))
        cursor.execute(
            f"CREATE TABLE IF NOT EXISTS {table} ({cols})"
        )
        for row in data:
            cursor.execute(
                f"INSERT INTO {table} VALUES ({placeholders})",
                list(row.values())
            )
    conn.commit()
    conn.close()

Pair either export method with the pagination scraper from Step 3 and you have a complete data pipeline.
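To see the pipeline shape end to end without a browser, here's a self-contained sketch that pushes sample rows through the same SQLite pattern (an in-memory database stands in for scraped.db) and queries them back:

```python
import sqlite3

rows = [
    {"title": "A Light in the Attic", "price": 51.77},
    {"title": "Tipping the Velvet", "price": 53.74},
    {"title": "Soumission", "price": 50.10},
]

conn = sqlite3.connect(":memory:")  # stand-in for scraped.db
cols = ", ".join(rows[0].keys())
placeholders = ", ".join(["?"] * len(rows[0]))
conn.execute(f"CREATE TABLE IF NOT EXISTS products ({cols})")
# executemany batches the inserts in one call
conn.executemany(
    f"INSERT INTO products VALUES ({placeholders})",
    [tuple(r.values()) for r in rows],
)
conn.commit()

# query back without loading everything into memory
cheap = conn.execute(
    "SELECT title FROM products WHERE price < 52 ORDER BY price"
).fetchall()
```

The query returns only the two books under £52, cheapest first, without ever materializing the full table in Python.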

Working with Cookies and Sessions

Some scraping targets require login. Nodriver handles cookies automatically within a session, but the real value is saving and reloading them across runs. This avoids logging in every time your script starts.

Save cookies after authentication:

import nodriver as uc
import json

async def login_and_save_cookies():
    browser = await uc.start()
    tab = await browser.get("https://example.com/login")

    # fill in credentials
    email_field = await tab.select("input[name='email']")
    await email_field.send_keys("you@example.com")

    pass_field = await tab.select("input[name='password']")
    await pass_field.send_keys("your-password")

    submit = await tab.find("Sign in")
    await submit.click()
    await tab.sleep(3)

    # grab cookies from the browser
    cookies = await tab.send(uc.cdp.network.get_all_cookies())
    with open("cookies.json", "w") as f:
        json.dump([c.to_json() for c in cookies], f)

    browser.stop()

Load them in a later session:

async def scrape_with_cookies():
    browser = await uc.start()
    tab = await browser.get("about:blank")

    with open("cookies.json") as f:
        cookies = json.load(f)

    # rebuild typed CookieParam objects from the saved JSON;
    # set_cookie(**cookie) fails because to_json() emits camelCase keys
    params = [uc.cdp.network.CookieParam.from_json(c) for c in cookies]
    await tab.send(uc.cdp.network.set_cookies(params))

    # navigate to authenticated page
    await tab.get("https://example.com/dashboard")
    content = await tab.get_content()
    print(content[:500])
    browser.stop()

This pattern uses CDP's network domain directly. It's more reliable than trying to inject cookies through JavaScript because it sets them at the browser level, including HttpOnly cookies that JavaScript can't access.

One caveat: cookies expire. If your scraper runs on a schedule, build in a check that detects expired sessions (look for redirect to login page) and triggers a fresh login when needed.
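A minimal version of that check, assuming the target redirects logged-out visitors to a URL containing /login (the marker is an assumption; adjust it for your site):

```python
def session_expired(final_url, login_marker="/login"):
    """True if navigation ended on the login page, meaning the
    saved cookies no longer authenticate the session."""
    return login_marker in final_url
```

After loading the dashboard, feed the tab's final URL into this check; on True, rerun login_and_save_cookies() and retry.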

Nodriver vs Playwright vs Selenium

Knowing when to use nodriver — and when not to — saves you from fighting the wrong tool.

| Dimension | Nodriver | Playwright | Selenium |
| --- | --- | --- | --- |
| Anti-bot bypass | Strong out of the box | Weak without stealth plugin | Weak without undetected-chromedriver |
| Async support | Native | Native | Requires wrapper |
| Browser support | Chromium only | Chromium, Firefox, WebKit | All major browsers |
| Documentation | Sparse, source-code level | Excellent | Extensive |
| Community size | Small | Large | Very large |
| Headless mode | Unstable in some versions | Stable | Stable |
| Cross-platform | Yes | Yes | Yes |
| Setup complexity | One pip install | One pip install + browser binaries | pip install + WebDriver management |

Use nodriver when your primary problem is anti-bot detection and you're targeting Chromium-compatible sites. It's the right choice for scraping sites behind Cloudflare, Imperva, or similar WAFs where Selenium and Playwright get blocked immediately.

Use Playwright when you need cross-browser support, stable headless execution, or your target doesn't have aggressive bot protection. The Playwright docs are significantly better, and the debugging tools (trace viewer, codegen) will save you hours.

Use Selenium when you have existing Selenium code and the target site uses basic or no anti-bot measures. The ecosystem is massive — every scraping question has been answered somewhere. Migrating to nodriver is straightforward if you later need stealth.

One more option worth knowing: the scrapy-nodriver package integrates nodriver as a Scrapy download handler.

If you're already using Scrapy for structured crawling and just need stealth for specific requests, this combo gives you the best of both worlds.

Troubleshooting

These are the errors you'll actually hit. I've run into every one of them.

"NoneType has no attribute 'text'"

Why: Your selector didn't match anything. query_selector returned None and you tried to read .text on it.

Fix: Always check for None before accessing element properties:

el = await tab.query_selector(".price")
price = el.text if el else "N/A"

"Browser closed unexpectedly" or Chrome crashes on launch

Why: Another Chrome instance is using the same user data directory, or Chrome isn't installed in the default location.

Fix: Close all Chrome windows before running your script. If Chrome is installed somewhere non-standard, specify the path explicitly:

browser = await uc.start(
    browser_executable_path="/usr/bin/google-chrome-stable"
)

Headless mode throws errors or pages don't load

Why: Nodriver's headless support is still unreliable as of v0.48. Some versions crash, others silently skip page content.

Fix: Run in headed mode during development. For production on headless servers (AWS, VPS), install Xvfb to create a virtual display:

sudo apt install xvfb
xvfb-run python your_scraper.py

This gives nodriver a real display context without a physical monitor.

"asyncio.run() cannot be called from a running event loop"

Why: You're using asyncio.run() instead of nodriver's event loop.

Fix: Replace asyncio.run(main()) with uc.loop().run_until_complete(main()). This is nodriver-specific — every script needs it.

Scraper gets blocked after a few hundred requests

Why: Same IP, same fingerprint, too many requests too fast. Nodriver handles browser fingerprinting, but it can't fix your network-level fingerprint.

Fix: Add proxy rotation (Step 7), introduce random delays between requests (await tab.sleep(random.uniform(1, 3))), and limit concurrent tabs. If you're scraping a specific site heavily, check their robots.txt and respect any crawl-delay directives.

A Note on Responsible Scraping

Nodriver is a browser automation tool, not a license to hammer servers. Respect robots.txt directives. Add delays between requests.

Don't scrape personal data without a legal basis. If a site's terms of service explicitly prohibit scraping, understand the legal risks before proceeding.
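The standard library can do the robots.txt work for you. This sketch parses an inline sample policy with urllib.robotparser; against a real site you'd call rp.set_url() and rp.read() instead:

```python
from urllib import robotparser

SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

rp = robotparser.RobotFileParser()
rp.parse(SAMPLE_ROBOTS.splitlines())

allowed = rp.can_fetch("*", "https://example.com/catalogue/page-1.html")
blocked = rp.can_fetch("*", "https://example.com/private/admin")
delay = rp.crawl_delay("*")  # seconds between requests, or None if unset
```

Check can_fetch() before queueing a URL, and use the crawl-delay value as the floor for your sleep between requests.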

Wrapping Up

You've built a nodriver web scraping setup that handles element extraction, pagination, infinite scroll, CDP network interception, multi-tab concurrency, cookie management, proxy rotation, and data export. That covers roughly 90% of real-world scraping scenarios.

Start with the basic examples and layer in complexity as your target requires it.

The CDP interception pattern (Step 5) is worth learning even if you don't need it immediately. Grabbing JSON from API responses is almost always cleaner than parsing rendered HTML.

When I tested nodriver against Cloudflare-protected e-commerce sites, it passed the initial challenge page on about 85% of attempts without any extra configuration. Adding residential proxies bumped that to near 100%.

Your results will vary by target, but that baseline stealth is the whole point of choosing nodriver over Playwright or Selenium.

Nodriver's ecosystem is still young. Documentation is thin, some methods are half-implemented, and headless mode needs work.

Check the official docs and GitHub issues regularly — the library ships updates frequently and breaking changes aren't always announced.

For sites where nodriver alone isn't enough, combine it with residential proxy rotation. This covers both your browser fingerprint and network identity.

The two layers together handle the vast majority of anti-bot systems deployed in 2026.