Looking to scrape modern websites that block traditional web scraping tools? You're not alone. As websites implement increasingly sophisticated anti-bot measures like Cloudflare, traditional tools like Selenium often fail to get past even basic protection.
This is where Nodriver comes in - the official successor to Undetected-Chromedriver that's revolutionizing how we approach web scraping in 2025.
I've personally used Nodriver to successfully scrape data from heavily protected sites that blocked every other tool I tried. In this guide, I'll show you the exact process that helped me bypass detection and extract data reliably.
What you'll learn:
- How to install and set up Nodriver properly
- Essential configuration for avoiding detection
- Techniques for scraping both static and dynamic content
- Methods to handle pagination and infinite scrolling
- Best practices for staying undetected
- How to handle common challenges and errors
Why You Can Trust This Method
Problem: Modern websites use sophisticated anti-bot systems that detect and block traditional scrapers. Selenium with ChromeDriver leaves obvious fingerprints that are easily detected.
Solution: Nodriver provides direct browser communication without the traditional webdriver, making it much harder to detect while offering better performance.
Proof: Nodriver is "optimized to stay undetected for most anti-bot solutions" and was designed specifically as the successor to Undetected-Chromedriver. In many cases it gets past protection from Cloudflare, Imperva, and other major WAFs that block conventional scrapers outright.
Step 1: Install Nodriver and Set Up Your Environment
First, let's get Nodriver installed and prepare your development environment.
Prerequisites
- Python 3.12 or higher
- Chrome browser installed (preferably in the default location)
- Basic knowledge of Python and asyncio
Installation
Install Nodriver using pip:
pip install nodriver
Set up your project structure
Create a new directory for your project and set up a virtual environment:
mkdir nodriver-scraper
cd nodriver-scraper
python -m venv venv
# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On Mac/Linux:
source venv/bin/activate
Common pitfall to avoid
When running on a headless machine, such as an AWS instance or any other environment without a display, use a virtual display tool like Xvfb to emulate a screen. This prevents display-related errors when the browser starts in its default non-headless mode.
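If you'd rather handle this from Python than wrap the command with xvfb-run, one option is the pyvirtualdisplay package, which drives Xvfb for you. A minimal sketch (assuming the Xvfb system package and pyvirtualdisplay are installed; the URL is just a placeholder):

# Sketch: run Nodriver under a virtual display on a headless Linux server.
# Assumes Xvfb and `pip install pyvirtualdisplay` are available.
import nodriver as uc
from pyvirtualdisplay import Display

async def main():
    browser = await uc.start()  # non-headless, rendered into the virtual display
    tab = await browser.get('https://example.com')
    await tab.save_screenshot('headless_check.png')

display = Display(visible=0, size=(1920, 1080))  # start Xvfb
display.start()
try:
    uc.loop().run_until_complete(main())
finally:
    display.stop()  # always tear the virtual display down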
Step 2: Create Your First Nodriver Browser Instance
Now let's create a basic Nodriver script that launches a browser.
import nodriver as uc

async def main():
    # Start the browser
    browser = await uc.start()

    # Open the target page (browser.get() works on the primary tab)
    tab = await browser.get('https://example.com')

    # Wait a moment for the page to load
    await tab.wait(2)

    # Take a screenshot to verify it worked
    await tab.save_screenshot('example.png')

    # Close the browser
    await browser.close()

if __name__ == '__main__':
    # Run the async function
    uc.loop().run_until_complete(main())
Tips for this step
- Nodriver provides direct communication with browsers, eliminating the need for traditional components like Selenium or Chromedriver binaries
- The browser runs in non-headless mode by default, which is less detectable
- Use the headless=True parameter in uc.start() only when necessary
Advanced configuration options
browser = await uc.start(
    headless=False,  # Run with GUI (more stealthy)
    user_data_dir="/path/to/profile",  # Use an existing browser profile
    browser_args=['--disable-blink-features=AutomationControlled'],
    lang="en-US"
)
Step 3: Navigate and Extract Basic Data
Let's scrape some actual data from a webpage.
import nodriver as uc

async def scrape_products():
    browser = await uc.start()

    # Navigate to the target page
    tab = await browser.get('https://scrapingcourse.com/ecommerce/')

    # Wait for products to load
    await tab.wait(2)

    # Extract product information
    products = []

    # Find all product containers
    product_elements = await tab.select_all('li.product')

    for product in product_elements:
        # Extract product details
        name_elem = await product.query_selector('h2')
        price_elem = await product.query_selector('.price')

        if name_elem and price_elem:
            name = await name_elem.text
            price = await price_elem.text
            products.append({
                'name': name,
                'price': price
            })

    print(f"Found {len(products)} products")
    for product in products:
        print(f"- {product['name']}: {product['price']}")

    await browser.close()
    return products

if __name__ == '__main__':
    uc.loop().run_until_complete(scrape_products())
Best practices for element selection
- Use CSS selectors to find elements: tab.select() for single elements and tab.select_all() for multiple matches
- Always check whether elements exist before trying to extract data
- Use await element.text to get text content
Common pitfall to avoid
Don't use overly specific selectors that might break if the website structure changes slightly. Prefer class names and data attributes over complex hierarchical selectors.
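As a quick illustration (the selectors below are hypothetical), both of these target the same product title, but only the second survives minor layout changes:

# Brittle: tied to the exact DOM hierarchy - breaks if any wrapper changes
title = await tab.select('body > div:nth-child(3) > div > ul > li:nth-child(1) > h2')

# Robust: anchored to a stable class name on the product container
title = await tab.select('li.product h2')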
Step 4: Handle Dynamic Content and JavaScript
Many modern websites load content dynamically. Here's how to handle that:
import nodriver as uc

async def scrape_dynamic_content():
    browser = await uc.start()
    tab = await browser.get('https://example.com/dynamic-page')

    # Wait for a specific element to appear
    await tab.wait_for('div.dynamic-content', timeout=10)

    # Execute JavaScript if needed
    result = await tab.evaluate('document.querySelector(".counter").innerText')
    print(f"Counter value: {result}")

    # Interact with the page
    button = await tab.select('button.load-more')
    if button:
        await button.click()
        # Wait for new content to load
        await tab.wait(2)

    # Scroll to trigger lazy loading
    await tab.scroll_down(500)

    await browser.close()

if __name__ == '__main__':
    uc.loop().run_until_complete(scrape_dynamic_content())
Tips for handling dynamic content
- Use wait_for() to wait for specific elements to appear (see the sketch after these tips)
- Nodriver can execute JavaScript, which is useful for extracting data from dynamic websites
- Always add appropriate waits after interactions
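Keep in mind that wait_for() raises an error if the element never shows up before its timeout, rather than returning None, so it's worth wrapping when the element is optional. A small sketch (the helper name and the cookie-banner selector are made up for illustration):

import nodriver as uc

async def wait_for_optional(tab, selector: str, timeout: int = 10):
    """Wait for an element that may never appear; return it, or None on timeout."""
    try:
        return await tab.wait_for(selector, timeout=timeout)
    except Exception:
        # The element never showed up within the timeout window
        return None

async def main():
    browser = await uc.start()
    tab = await browser.get('https://example.com/dynamic-page')
    banner = await wait_for_optional(tab, 'div.cookie-banner', timeout=5)
    if banner:
        await banner.click()  # dismiss it only if it actually rendered
    await browser.close()

if __name__ == '__main__':
    uc.loop().run_until_complete(main())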
Step 5: Implement Pagination and Scrolling
Here's how to handle pagination and infinite scrolling:
Pagination Example
async def scrape_with_pagination():
    browser = await uc.start()
    tab = await browser.get('https://scrapingcourse.com/ecommerce/')

    all_products = []
    page_num = 1

    while True:
        print(f"Scraping page {page_num}...")

        # Extract products from the current page
        products = await tab.select_all('li.product')
        for product in products:
            name = await product.query_selector('h2')
            if name:
                all_products.append(await name.text)

        # Check for a next-page button
        next_button = await tab.select('a.next')
        if not next_button:
            print("No more pages")
            break

        # Click through to the next page
        await next_button.click()
        await tab.wait(2)  # Wait for the page to load
        page_num += 1

    print(f"Total products scraped: {len(all_products)}")
    await browser.close()
    return all_products
Infinite Scrolling Example
async def scrape_infinite_scroll():
    browser = await uc.start()
    tab = await browser.get('https://scrapingcourse.com/infinite-scrolling')

    products = []
    last_height = 0

    while True:
        # Scroll to the bottom of the page
        await tab.evaluate('window.scrollTo(0, document.body.scrollHeight)')
        await tab.wait(2)  # Wait for new content to load

        # Check if new content was loaded
        new_height = await tab.evaluate('document.body.scrollHeight')
        if new_height == last_height:
            print("No more content to load")
            break
        last_height = new_height

        # Report how many products have loaded so far
        product_elements = await tab.select_all('.product-item')
        print(f"Products loaded: {len(product_elements)}")

    # Extract all product data once scrolling is finished (avoids duplicates)
    product_elements = await tab.select_all('.product-item')
    for elem in product_elements:
        name = await elem.query_selector('.product-name')
        price = await elem.query_selector('.product-price')
        if name and price:
            products.append({
                'name': await name.text,
                'price': await price.text
            })

    await browser.close()
    return products
Best practice tip
Always give newly loaded elements time to appear before extracting, otherwise you'll silently miss dynamically loaded content.
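One way to apply this tip is to poll the page until the number of matching elements stops growing before you extract anything. A sketch (the helper name and thresholds are illustrative, not part of Nodriver):

async def wait_until_stable(tab, selector: str, settle_rounds: int = 2, poll_delay: float = 1.0):
    """Keep polling until the element count stops changing, then return the elements."""
    previous, stable = -1, 0
    while stable < settle_rounds:
        elements = await tab.select_all(selector)
        if len(elements) == previous:
            stable += 1          # count unchanged - one step closer to "settled"
        else:
            stable = 0           # new items appeared, start counting again
            previous = len(elements)
        await tab.wait(poll_delay)
    return await tab.select_all(selector)

You could call products = await wait_until_stable(tab, '.product-item') in the infinite-scrolling example above instead of relying on a fixed two-second wait.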
Step 6: Save and Export Your Data
Once you've scraped the data, you need to save it properly:
import csv
import json
import nodriver as uc
from datetime import datetime

async def scrape_and_save():
    browser = await uc.start()
    tab = await browser.get('https://scrapingcourse.com/ecommerce/')

    # Scrape data
    products = []
    product_elements = await tab.select_all('li.product')

    for elem in product_elements:
        name = await elem.query_selector('h2')
        price = await elem.query_selector('.price')
        image = await elem.query_selector('img')

        if all([name, price, image]):
            products.append({
                'name': await name.text,
                'price': await price.text,
                'image_url': await image.get_attribute('src'),
                'scraped_at': datetime.now().isoformat()
            })

    # Save to CSV
    with open('products.csv', 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=['name', 'price', 'image_url', 'scraped_at'])
        writer.writeheader()
        writer.writerows(products)

    # Save to JSON
    with open('products.json', 'w', encoding='utf-8') as f:
        json.dump(products, f, indent=2, ensure_ascii=False)

    print(f"Saved {len(products)} products to CSV and JSON files")
    await browser.close()

if __name__ == '__main__':
    uc.loop().run_until_complete(scrape_and_save())
Tips for data export
- Always include timestamps in your scraped data
- Use UTF-8 encoding to handle special characters
- Consider using pandas for more complex data manipulation (see the sketch after these tips)
- Implement error handling for file operations
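For the pandas route mentioned above, here is a short sketch of what post-processing could look like. The column names match the dicts built in this step; the price-cleaning regex is just an example:

# Optional post-processing with pandas (pip install pandas openpyxl)
import pandas as pd

df = pd.DataFrame(products)                                       # list of dicts from above
df['price'] = df['price'].str.replace(r'[^\d.]', '', regex=True)  # strip currency symbols
df = df.drop_duplicates(subset='name')
df.to_csv('products_clean.csv', index=False)
df.to_excel('products.xlsx', index=False)                         # .xlsx export needs openpyxl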
Step 7: Scale Your Scraper with Best Practices
To build a production-ready scraper, implement these essential practices:
Error Handling and Retries
import asyncio
import nodriver as uc
from typing import Optional, List, Dict

async def safe_scrape(url: str, max_retries: int = 3) -> Optional[List[Dict]]:
    """Scrape with automatic retry on failure"""
    for attempt in range(max_retries):
        try:
            browser = await uc.start()
            tab = await browser.get(url)

            # Your scraping logic here
            await tab.wait_for('.product', timeout=10)
            products = await extract_products(tab)

            await browser.close()
            return products

        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            if attempt < max_retries - 1:
                await asyncio.sleep(2 ** attempt)  # Exponential backoff
            else:
                print("Max retries reached")
                return None
        finally:
            try:
                await browser.close()
            except:
                pass
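The snippet above assumes an extract_products() helper. A minimal sketch of one, reusing the selectors and extraction pattern from Step 3:

async def extract_products(tab):
    """Hypothetical helper used by safe_scrape; mirrors the Step 3 extraction."""
    products = []
    for elem in await tab.select_all('li.product'):
        name = await elem.query_selector('h2')
        price = await elem.query_selector('.price')
        if name and price:
            products.append({'name': await name.text, 'price': await price.text})
    return products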
Rate Limiting and Delays
import random

async def respectful_scraper(urls: List[str]):
    """Scrape multiple URLs with random delays"""
    browser = await uc.start()
    results = []

    for url in urls:
        tab = await browser.get(url)

        # Random delay between requests (1-3 seconds)
        delay = random.uniform(1, 3)
        await tab.wait(delay)

        # Extract data
        data = await extract_data(tab)
        results.append(data)

        # Close tab to free memory
        await tab.close()

    await browser.close()
    return results
Session Management and Cookies
async def scrape_with_session():
    """Use persistent session with saved cookies"""
    browser = await uc.start(
        user_data_dir="./browser_profile"  # Saves cookies and session
    )

    # First visit - might need to log in
    tab = await browser.get('https://example.com/login')

    # Check if already logged in
    if not await tab.select('.user-dashboard'):
        # Perform login
        await login(tab)

    # Now scrape protected content
    await tab.get('https://example.com/protected-data')
    data = await extract_protected_data(tab)

    await browser.close()
    return data
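The login() call above is a placeholder. If your target uses a simple form, a helper might look like the sketch below; the field selectors and credentials are made up and will differ for every site:

async def login(tab):
    """Hypothetical login helper - selectors and credentials are placeholders."""
    email = await tab.select('input[name="email"]')
    password = await tab.select('input[name="password"]')
    await email.send_keys('you@example.com')
    await password.send_keys('your-password')

    submit = await tab.select('button[type="submit"]')
    await submit.click()
    await tab.wait(3)  # give the post-login redirect time to complete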
Common pitfalls to avoid
- Don't name your script "nodriver.py" - it will shadow the package and cause import errors
- Avoid aggressive scraping - respect robots.txt and implement delays
- Don't ignore errors - proper error handling prevents data loss
- Check for CloudFlare challenges - even Nodriver may encounter Cloudflare captchas on heavily protected sites
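For that last pitfall, a rough way to detect that you've landed on a Cloudflare interstitial is to check the page title (the challenge page typically shows "Just a moment..."); waiting a few seconds often lets the challenge resolve on its own. A sketch, with illustrative function names and thresholds:

async def looks_like_cloudflare_challenge(tab) -> bool:
    title = await tab.evaluate('document.title')
    return 'just a moment' in str(title).lower()

async def get_past_challenge(browser, url: str, max_wait: int = 30):
    tab = await browser.get(url)
    waited = 0
    while await looks_like_cloudflare_challenge(tab) and waited < max_wait:
        await tab.wait(3)   # give the JS challenge time to resolve
        waited += 3
    return tab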
Pro tips for staying undetected
- Take advantage of Nodriver's conveniences - for example, browser.get() works directly on the primary tab, so you rarely need to manage tabs by hand
- Rotate user agents and browser profiles between sessions (see the sketch after this list)
- Use residential proxies for heavily protected sites
- Implement human-like behavior (random delays, mouse movements)
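Here is one way the rotation tips above could be wired together. The user-agent strings and proxy URL are placeholders; --user-agent and --proxy-server are standard Chromium flags passed through Nodriver's browser_args (note that authenticated proxies need extra handling, since Chromium ignores credentials embedded in the flag):

import random
import nodriver as uc

# Placeholder user agents - swap in a maintained, up-to-date list
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
]

async def start_rotated_browser(profile_dir: str, proxy: str = None):
    args = [f'--user-agent={random.choice(USER_AGENTS)}']
    if proxy:
        args.append(f'--proxy-server={proxy}')  # e.g. 'http://host:port'
    return await uc.start(
        user_data_dir=profile_dir,  # separate profile per identity
        browser_args=args,
    )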
Final thoughts
You've now learned how to use Nodriver for web scraping, from basic setup to advanced techniques. The key to successful scraping with Nodriver is understanding its asynchronous nature and leveraging its built-in anti-detection features.
Remember that what sets this package apart from better-known alternatives is its optimization for staying undetected by most anti-bot solutions. Even so, always scrape responsibly and respect website terms of service.
Next steps
- Explore advanced features: Try using Nodriver's xpath selectors and CDP (Chrome DevTools Protocol) access
- Build a monitoring system: Create scrapers that run on schedule and alert you to changes
- Scale with cloud deployment: Deploy your scrapers on AWS or similar platforms using Docker
- Handle complex scenarios: Learn to bypass more sophisticated protections and handle multi-step processes
Call-to-action
Ready to start scraping? Download the complete code examples from this tutorial and begin extracting data from even the most protected websites. If you need to handle extremely complex anti-bot systems, consider using a web scraping API like ZenRows or ScrapingBee as a fallback option.