How to Use pyppeteer_stealth for Web Scraping in 5 Steps

pyppeteer_stealth is a Python library that patches Pyppeteer's bot-like signals, making your web scraper appear more human-like to bypass anti-bot detection systems. In this guide, we'll show you how to leverage pyppeteer_stealth effectively, along with some unconventional tricks that actually work.

Ever tried scraping a JavaScript-heavy website with Pyppeteer only to get blocked immediately? You're not alone. Modern websites use sophisticated anti-bot systems like Cloudflare, DataDome, and PerimeterX that can detect headless browsers faster than you can say "navigator.webdriver."

Here's the thing: vanilla Pyppeteer leaks automation signals like a broken faucet. Properties like navigator.webdriver: true, the HeadlessChrome user agent, and missing browser plugins are dead giveaways that scream "BOT!" to any decent anti-bot system.
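
You can see these leaks for yourself. Here's a minimal sketch with vanilla Pyppeteer (no stealth patches, using example.com as a stand-in URL) that prints the signals anti-bot scripts typically inspect:

import asyncio
from pyppeteer import launch

async def show_automation_leaks():
    # Plain headless launch, nothing patched
    browser = await launch(headless=True)
    page = await browser.newPage()
    await page.goto('https://example.com')

    # The classic giveaways anti-bot scripts look for
    webdriver_flag = await page.evaluate('() => navigator.webdriver')
    user_agent = await page.evaluate('() => navigator.userAgent')
    plugin_count = await page.evaluate('() => navigator.plugins.length')

    print(f"navigator.webdriver: {webdriver_flag}")  # usually True when unpatched
    print(f"userAgent: {user_agent}")                # usually contains "HeadlessChrome"
    print(f"plugins: {plugin_count}")                # usually 0

    await browser.close()

asyncio.get_event_loop().run_until_complete(show_automation_leaks())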

That's where pyppeteer_stealth comes in. It's the Python port of the popular puppeteer-extra-plugin-stealth, designed to patch these telltale signs and make your scraper blend in with regular browser traffic.

But here's what most tutorials won't tell you: while pyppeteer_stealth is great for basic protection, advanced anti-bot systems have evolved beyond simple fingerprint checks. In this guide, we'll not only cover the basics but also dive into advanced techniques and alternative approaches that actually work in 2025.

Step 1: Install and Set Up pyppeteer_stealth

First, let's get the basics out of the way. Install both pyppeteer and pyppeteer_stealth:

pip install pyppeteer pyppeteer-stealth

Pro tip: pyppeteer automatically downloads Chromium on first run, which can take a while. To pre-download it manually:

import pyppeteer
pyppeteer.chromium_downloader.download_chromium()

Now, here's something most guides miss: pyppeteer_stealth hasn't been actively maintained since 2022. While it still works for many sites, you might want to consider using the more recent fork:

pip install pyppeteerstealth  # Note: different package name

Step 2: Implement Basic Stealth Mode

Here's the standard way to use pyppeteer_stealth:

import asyncio
from pyppeteer import launch
from pyppeteer_stealth import stealth

async def basic_stealth_scraper():
    browser = await launch(headless=True)
    page = await browser.newPage()
    
    # Apply stealth patches
    await stealth(page)
    
    await page.goto('https://bot.sannysoft.com/')
    await page.screenshot({'path': 'stealth_test.png'})
    
    await browser.close()

asyncio.get_event_loop().run_until_complete(basic_stealth_scraper())

But here's where it gets interesting. The stealth() function accepts several parameters that most people ignore:

await stealth(
    page,
    run_on_insecure_origins=True,  # Works on HTTP sites too
    languages=["en-US", "en"],
    vendor="Google Inc.",
    user_agent=None,  # Custom UA if needed
    locale="en-US,en",
    mask_linux=True,  # Hide Linux indicators
    webgl_vendor="Intel Inc.",
    renderer="Intel Iris OpenGL Engine",
    disabled_evasions=[]  # Disable specific patches if needed
)
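
One design note: the WebGL vendor, renderer, and user agent should tell a consistent story. The defaults above ("Intel Inc." / "Intel Iris OpenGL Engine") read as macOS-style strings, so pairing them with a Windows user agent can itself look suspicious. Here's a hedged sketch of a Windows-consistent configuration; the specific UA and renderer strings are illustrative values, not required ones:

await stealth(
    page,
    user_agent=(
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
        'AppleWebKit/537.36 (KHTML, like Gecko) '
        'Chrome/120.0.0.0 Safari/537.36'
    ),
    languages=["en-US", "en"],
    vendor="Google Inc.",
    # Illustrative Windows-style WebGL strings; real values vary by GPU
    webgl_vendor="Google Inc. (Intel)",
    renderer="ANGLE (Intel, Intel(R) UHD Graphics 630 Direct3D11 vs_5_0 ps_5_0)",
    mask_linux=True
)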

Step 3: Test Your Stealth Configuration

Don't just assume your stealth setup works. Test it against known fingerprinting sites:

async def test_stealth_effectiveness():
    test_sites = [
        'https://bot.sannysoft.com/',
        'https://fingerprintjs.github.io/fingerprintjs/',
        'https://browserleaks.com/javascript'
    ]
    
    browser = await launch(headless=False)  # Use headful for testing
    
    for site in test_sites:
        page = await browser.newPage()
        await stealth(page)
        
        # Set a realistic viewport and reinforce the webdriver patch
        await page.setViewport({'width': 1366, 'height': 768})
        await page.evaluateOnNewDocument('''() => {
            Object.defineProperty(navigator, 'webdriver', {
                get: () => undefined
            })
        }''')
        
        await page.goto(site)
        await page.waitFor(3000)
        
        # Check for common bot indicators
        webdriver_check = await page.evaluate('() => navigator.webdriver')
        plugins_check = await page.evaluate('() => navigator.plugins.length')
        
        print(f"{site}: webdriver={webdriver_check}, plugins={plugins_check}")
        
        await page.close()
    
    await browser.close()
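
Run it the same way as the first example. On bot.sannysoft.com, failed checks are highlighted in red, so a quick look at the headful window shows which patches are holding up:

asyncio.get_event_loop().run_until_complete(test_stealth_effectiveness())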

Step 4: Apply Advanced Evasion Techniques

Here's where we go beyond the basics. These are the tricks that actually make a difference:

4.1 Disable Specific Evasions for Better Performance

Some evasion modules can actually make you more detectable on certain sites:

# Disable problematic evasions
await stealth(
    page,
    disabled_evasions=[
        'iframe_content_window',  # Can cause issues with some frameworks
        'media_codecs'  # Sometimes triggers false positives
    ]
)

4.2 Implement Request Interception

Intercept and modify requests to remove automation headers:

async def advanced_scraper():
    browser = await launch({
        'headless': True,
        'args': [
            '--disable-blink-features=AutomationControlled',
            '--disable-dev-shm-usage',
            '--no-sandbox',
            '--disable-setuid-sandbox',
            '--disable-web-security',
            '--disable-features=IsolateOrigins,site-per-process'
        ]
    })
    
    page = await browser.newPage()
    await stealth(page)
    
    # Intercept requests
    await page.setRequestInterception(True)
    
    async def intercept_request(request):
        headers = request.headers
        # Remove automation indicators
        headers.pop('X-DevTools-Request-Id', None)
        headers.pop('X-DevTools-Emulate-Network-Conditions-Client-Id', None)
        
        await request.continue_({'headers': headers})
    
    page.on('request', lambda req: asyncio.ensure_future(intercept_request(req)))
    
    await page.goto('https://example.com')
    await browser.close()

4.3 The "Page Pool" Technique

Instead of creating new pages for each request, maintain a pool of pre-configured pages:

class StealthPagePool:
    def __init__(self, size=5):
        self.size = size
        self.pages = []
        self.browser = None
    
    async def initialize(self):
        self.browser = await launch({
            'headless': True,
            'args': ['--disable-blink-features=AutomationControlled']
        })
        
        for _ in range(self.size):
            page = await self.browser.newPage()
            await stealth(page)
            # Pre-configure pages with cookies, viewport, etc.
            await page.setViewport({'width': 1920, 'height': 1080})
            self.pages.append(page)
    
    async def get_page(self):
        if self.pages:
            return self.pages.pop()
        else:
            # Create new page if pool is empty
            page = await self.browser.newPage()
            await stealth(page)
            return page
    
    async def return_page(self, page):
        # Clear page state before returning to pool
        await page.goto('about:blank')
        self.pages.append(page)
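
A minimal usage sketch for the pool (the scrape_with_pool helper and URL list are illustrative, not part of the class):

async def scrape_with_pool(urls):
    pool = StealthPagePool(size=3)
    await pool.initialize()

    results = {}
    for url in urls:
        page = await pool.get_page()
        await page.goto(url)
        results[url] = await page.content()
        await pool.return_page(page)

    await pool.browser.close()
    return results

urls = ['https://example.com', 'https://example.org']
asyncio.get_event_loop().run_until_complete(scrape_with_pool(urls))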

4.4 The Nuclear Option: Browser Context Isolation

When dealing with aggressive anti-bot systems, use incognito contexts:

async def isolated_scraping():
    browser = await launch(headless=False)
    
    # Create isolated browser context
    context = await browser.createIncognitoBrowserContext()
    page = await context.newPage()
    
    await stealth(page)
    
    # Each context has its own cookies, cache, etc.
    await page.goto('https://heavily-protected-site.com')
    
    # Clean up
    await context.close()
    await browser.close()
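
Because contexts are much cheaper than full browser launches, you can also spin up a fresh context per target so sessions never cross-contaminate. A sketch (the URL list is illustrative):

async def one_context_per_target(urls):
    browser = await launch(headless=True)
    results = {}

    for url in urls:
        # Fresh cookies and cache for every target
        context = await browser.createIncognitoBrowserContext()
        page = await context.newPage()
        await stealth(page)

        await page.goto(url)
        results[url] = await page.content()

        await context.close()

    await browser.close()
    return results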

Step 5: Know When to Use Alternative Approaches

Here's the hard truth: pyppeteer_stealth won't bypass advanced protections like Cloudflare Enterprise, DataDome, or modern PerimeterX. When you hit these walls, it's time to think differently.

5.1 The Request-Based Approach: curl_cffi

Sometimes, you don't need a full browser. Use curl_cffi for TLS fingerprint spoofing:

from curl_cffi import requests

# Impersonate Chrome's TLS fingerprint
response = requests.get(
    'https://example.com',
    impersonate='chrome131',
    proxies={'http': 'http://proxy:port', 'https': 'http://proxy:port'}
)

print(response.text)

This is often faster and more reliable than browser automation for static content.
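
If you're making several requests to the same site, reuse a curl_cffi session so cookies persist across calls while keeping the impersonated TLS fingerprint (the URLs here are placeholders):

from curl_cffi import requests

session = requests.Session()

# Cookies set by the first response are reused automatically on later calls
first = session.get('https://example.com', impersonate='chrome131')
second = session.get('https://example.com/page/2', impersonate='chrome131')

print(first.status_code, second.status_code)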

5.2 The Hybrid Approach

Use pyppeteer_stealth for JavaScript rendering and curl_cffi for subsequent requests:

async def hybrid_scraper():
    # Use Pyppeteer to get past initial challenge
    browser = await launch(headless=True)
    page = await browser.newPage()
    await stealth(page)
    
    await page.goto('https://example.com')
    
    # Extract cookies after challenge
    cookies = await page.cookies()
    cookie_string = '; '.join([f"{c['name']}={c['value']}" for c in cookies])
    
    await browser.close()
    
    # Use curl_cffi for actual scraping
    headers = {
        'Cookie': cookie_string,
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
    }
    
    response = requests.get(
        'https://example.com/api/data',
        headers=headers,
        impersonate='chrome131'
    )
    
    return response.json()
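
As with the earlier functions, you'd drive it from the event loop; note that /api/data above is just a placeholder for whatever endpoint the target site actually exposes:

data = asyncio.get_event_loop().run_until_complete(hybrid_scraper())
print(data)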

5.3 The "Reverse Engineering" Approach

For sites using JavaScript challenges, sometimes it's better to solve them directly:

import cloudscraper

# Cloudscraper handles many JavaScript challenges automatically
scraper = cloudscraper.create_scraper(
    browser={
        'browser': 'chrome',
        'platform': 'windows',
        'desktop': True
    }
)

response = scraper.get('https://example.com')

Common Pitfalls to Avoid

  1. Don't use default viewports: The default 800x600 viewport is a dead giveaway. Always set realistic dimensions.
  2. Avoid patterns: Randomize delays, mouse movements, and scrolling patterns (see the sketch after this list). Consistent timing = bot behavior.
  3. Watch your concurrency: Running 100 concurrent browser instances from the same IP? That's not how humans browse.
  4. Update your patches: Anti-bot systems evolve. What worked yesterday might not work today.
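
For pitfall 2, here's a minimal sketch of randomized, human-ish behavior with Pyppeteer; the delay ranges and coordinates are arbitrary illustrative values:

import asyncio
import random

async def act_human(page):
    # Random pause, as if reading the page
    await asyncio.sleep(random.uniform(1.5, 4.0))

    # Drift the mouse through a few random points instead of jumping to a target
    for _ in range(random.randint(2, 5)):
        x = random.randint(100, 1200)
        y = random.randint(100, 700)
        await page.mouse.move(x, y, steps=random.randint(10, 25))
        await asyncio.sleep(random.uniform(0.2, 0.8))

    # Scroll in uneven chunks rather than one long jump
    for _ in range(random.randint(2, 4)):
        await page.evaluate(f'() => window.scrollBy(0, {random.randint(200, 600)})')
        await asyncio.sleep(random.uniform(0.5, 1.5))

Call it between navigations, for example right after each page.goto(), so the timing noise shows up where anti-bot systems actually measure it.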

Performance Optimization Tips

# Disable images and CSS for faster loading
async def optimized_scraper():
    browser = await launch({
        'headless': True,
        'args': ['--blink-settings=imagesEnabled=false']
    })
    
    page = await browser.newPage()
    await stealth(page)
    
    # Block unnecessary resources
    await page.setRequestInterception(True)
    
    async def block_resources(request):
        if request.resourceType in ['image', 'stylesheet', 'font']:
            await request.abort()
        else:
            await request.continue_()
    
    page.on('request', lambda req: asyncio.ensure_future(block_resources(req)))
    
    await page.goto('https://example.com')
    await browser.close()

Final Thoughts

pyppeteer_stealth is a solid starting point for evading basic bot detection, but it's not a silver bullet. The key to successful web scraping in 2025 is adaptability. Use pyppeteer_stealth as part of a broader strategy that includes:

  • Multiple approaches (browser automation, request libraries, API reverse engineering)
  • Proper proxy rotation with residential IPs
  • Realistic browsing patterns
  • Regular testing and updates

Remember: the best scraper is the one that doesn't look like a scraper at all.

Marius Bernard

Marius Bernard is a Product Advisor, Technical SEO, & Brand Ambassador at Roundproxies. He was the lead author for the SEO chapter of the 2024 Web and a reviewer for the 2023 SEO chapter.