pyppeteer_stealth is a Python library that patches Pyppeteer's bot-like signals, making your web scraper appear more human-like to bypass anti-bot detection systems. In this guide, we'll show you how to leverage pyppeteer_stealth effectively, along with some unconventional tricks that actually work.
Ever tried scraping a JavaScript-heavy website with Pyppeteer only to get blocked immediately? You're not alone. Modern websites use sophisticated anti-bot systems like Cloudflare, DataDome, and PerimeterX that can detect headless browsers faster than you can say "navigator.webdriver."
Here's the thing: vanilla Pyppeteer leaks automation signals like a broken faucet. Signals such as navigator.webdriver: true, the HeadlessChrome user agent string, and an empty navigator.plugins list are dead giveaways that scream "BOT!" to any decent anti-bot system.
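Want to see the leak for yourself? A minimal sketch like this (assuming a plain Pyppeteer install with no stealth patches) prints the exact values an anti-bot script would read:
import asyncio
from pyppeteer import launch

async def show_leaks():
    browser = await launch(headless=True)
    page = await browser.newPage()
    await page.goto('https://example.com')

    # Vanilla headless Chromium typically reports webdriver=True,
    # a HeadlessChrome user agent, and an empty plugins list
    print(await page.evaluate('() => navigator.webdriver'))
    print(await page.evaluate('() => navigator.userAgent'))
    print(await page.evaluate('() => navigator.plugins.length'))

    await browser.close()

asyncio.get_event_loop().run_until_complete(show_leaks())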
That's where pyppeteer_stealth comes in. It's the Python implementation of the popular puppeteer-stealth plugin, designed to patch these telltale signs and make your scraper blend in with regular browser traffic.
But here's what most tutorials won't tell you: while pyppeteer_stealth is great for basic protection, advanced anti-bot systems have evolved beyond simple fingerprint checks. In this guide, we'll not only cover the basics but also dive into advanced techniques and alternative approaches that actually work in 2025.
Step 1: Install and Set Up pyppeteer_stealth
First, let's get the basics out of the way. Install both pyppeteer and pyppeteer_stealth:
pip install pyppeteer pyppeteer-stealth
Pro tip: pyppeteer automatically downloads Chromium on first run, which can take a while. To pre-download it manually:
import pyppeteer
pyppeteer.chromium_downloader.download_chromium()
Now, here's something most guides miss: pyppeteer_stealth hasn't been actively maintained since 2022. While it still works for many sites, you might want to consider using the more recent fork:
pip install pyppeteerstealth # Note: different package name
Step 2: Implement Basic Stealth Mode
Here's the standard way to use pyppeteer_stealth:
import asyncio
from pyppeteer import launch
from pyppeteer_stealth import stealth
async def basic_stealth_scraper():
    browser = await launch(headless=True)
    page = await browser.newPage()

    # Apply stealth patches
    await stealth(page)

    await page.goto('https://bot.sannysoft.com/')
    await page.screenshot({'path': 'stealth_test.png'})

    await browser.close()

asyncio.get_event_loop().run_until_complete(basic_stealth_scraper())
But here's where it gets interesting. The stealth() function accepts several parameters that most people ignore:
await stealth(
    page,
    run_on_insecure_origins=True,  # Works on HTTP sites too
    languages=["en-US", "en"],
    vendor="Google Inc.",
    user_agent=None,               # Custom UA if needed
    locale="en-US,en",
    mask_linux=True,               # Hide Linux indicators
    webgl_vendor="Intel Inc.",
    renderer="Intel Iris OpenGL Engine",
    disabled_evasions=[]           # Disable specific patches if needed
)
Step 3: Test Your Stealth Configuration
Don't just assume your stealth setup works. Test it against known fingerprinting sites:
async def test_stealth_effectiveness():
    test_sites = [
        'https://bot.sannysoft.com/',
        'https://fingerprintjs.github.io/fingerprintjs/',
        'https://browserleaks.com/javascript'
    ]

    browser = await launch(headless=False)  # Use headful for testing

    for site in test_sites:
        page = await browser.newPage()
        await stealth(page)

        # Set a realistic viewport and reinforce the webdriver patch
        await page.setViewport({'width': 1366, 'height': 768})
        await page.evaluateOnNewDocument('''() => {
            Object.defineProperty(navigator, 'webdriver', {
                get: () => undefined
            })
        }''')

        await page.goto(site)
        await page.waitFor(3000)

        # Check for common bot indicators
        webdriver_check = await page.evaluate('() => navigator.webdriver')
        plugins_check = await page.evaluate('() => navigator.plugins.length')

        print(f"{site}: webdriver={webdriver_check}, plugins={plugins_check}")
        await page.close()

    await browser.close()
Step 4: Apply Advanced Evasion Techniques
Here's where we go beyond the basics. These are the tricks that actually make a difference:
4.1 Disable Specific Evasions for Better Performance
Some evasion modules can actually make you more detectable on certain sites:
# Disable problematic evasions
await stealth(
    page,
    disabled_evasions=[
        'iframe_content_window',  # Can cause issues with some frameworks
        'media_codecs'            # Sometimes triggers false positives
    ]
)
4.2 Implement Request Interception
Intercept and modify requests to remove automation headers:
async def advanced_scraper():
    browser = await launch({
        'headless': True,
        'args': [
            '--disable-blink-features=AutomationControlled',
            '--disable-dev-shm-usage',
            '--no-sandbox',
            '--disable-setuid-sandbox',
            '--disable-web-security',
            '--disable-features=IsolateOrigins,site-per-process'
        ]
    })

    page = await browser.newPage()
    await stealth(page)

    # Intercept requests
    await page.setRequestInterception(True)

    async def intercept_request(request):
        headers = request.headers
        # Remove automation indicators
        headers.pop('X-DevTools-Request-Id', None)
        headers.pop('X-DevTools-Emulate-Network-Conditions-Client-Id', None)
        await request.continue_({'headers': headers})

    page.on('request', lambda req: asyncio.ensure_future(intercept_request(req)))

    await page.goto('https://example.com')
    await browser.close()
4.3 The "Page Pool" Technique
Instead of creating new pages for each request, maintain a pool of pre-configured pages:
class StealthPagePool:
    def __init__(self, size=5):
        self.size = size
        self.pages = []
        self.browser = None

    async def initialize(self):
        self.browser = await launch({
            'headless': True,
            'args': ['--disable-blink-features=AutomationControlled']
        })
        for _ in range(self.size):
            page = await self.browser.newPage()
            await stealth(page)
            # Pre-configure pages with cookies, viewport, etc.
            await page.setViewport({'width': 1920, 'height': 1080})
            self.pages.append(page)

    async def get_page(self):
        if self.pages:
            return self.pages.pop()
        else:
            # Create a new page if the pool is empty
            page = await self.browser.newPage()
            await stealth(page)
            return page

    async def return_page(self, page):
        # Clear page state before returning to pool
        await page.goto('about:blank')
        self.pages.append(page)
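Here's a minimal usage sketch for the pool (assuming the StealthPagePool class above and the imports from Step 2; the URLs are placeholders):
async def scrape_with_pool(urls):
    pool = StealthPagePool(size=3)
    await pool.initialize()

    titles = []
    for url in urls:
        page = await pool.get_page()
        await page.goto(url)
        titles.append(await page.title())
        await pool.return_page(page)  # reset the page and hand it back

    await pool.browser.close()
    return titles

asyncio.get_event_loop().run_until_complete(
    scrape_with_pool(['https://example.com', 'https://example.org'])
)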
4.4 The Nuclear Option: Browser Context Isolation
When dealing with aggressive anti-bot systems, use incognito contexts:
async def isolated_scraping():
    browser = await launch(headless=False)

    # Create isolated browser context
    context = await browser.createIncognitoBrowserContext()
    page = await context.newPage()
    await stealth(page)

    # Each context has its own cookies, cache, etc.
    await page.goto('https://heavily-protected-site.com')

    # Clean up
    await context.close()
    await browser.close()
Step 5: Know When to Use Alternative Approaches
Here's the hard truth: pyppeteer_stealth won't bypass advanced protections like Cloudflare Enterprise, DataDome, or modern PerimeterX. When you hit these walls, it's time to think differently.
5.1 The Request-Based Approach: curl_cffi
Sometimes, you don't need a full browser. Use curl_cffi for TLS fingerprint spoofing:
from curl_cffi import requests
# Impersonate Chrome's TLS fingerprint
response = requests.get(
    'https://example.com',
    impersonate='chrome131',
    proxies={'http': 'http://proxy:port', 'https': 'http://proxy:port'}
)
print(response.text)
This is often faster and more reliable than browser automation for static content.
5.2 The Hybrid Approach
Use pyppeteer_stealth for JavaScript rendering and curl_cffi for subsequent requests:
from curl_cffi import requests

async def hybrid_scraper():
    # Use Pyppeteer to get past the initial challenge
    browser = await launch(headless=True)
    page = await browser.newPage()
    await stealth(page)

    await page.goto('https://example.com')

    # Extract cookies after the challenge
    cookies = await page.cookies()
    cookie_string = '; '.join([f"{c['name']}={c['value']}" for c in cookies])

    await browser.close()

    # Use curl_cffi for the actual scraping
    headers = {
        'Cookie': cookie_string,
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
    }
    response = requests.get(
        'https://example.com/api/data',
        headers=headers,
        impersonate='chrome131'
    )

    return response.json()
5.3 The "Reverse Engineering" Approach
For sites using JavaScript challenges, sometimes it's better to solve them directly:
import cloudscraper
# Cloudscraper handles many JavaScript challenges automatically
scraper = cloudscraper.create_scraper(
    browser={
        'browser': 'chrome',
        'platform': 'windows',
        'desktop': True
    }
)
response = scraper.get('https://example.com')
Common Pitfalls to Avoid
- Don't use default viewports: The default 800x600 viewport is a dead giveaway. Always set realistic dimensions.
- Avoid patterns: Randomize delays, mouse movements, and scrolling patterns (see the sketch after this list). Consistent timing = bot behavior.
- Watch your concurrency: Running 100 concurrent browser instances from the same IP? That's not how humans browse.
- Update your patches: Anti-bot systems evolve. What worked yesterday might not work today.
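To make the first two points concrete, here's a minimal sketch of randomized, human-like behavior on an already stealth-patched page (the viewport sizes, delay ranges, and coordinates are arbitrary illustrative values):
import asyncio
import random

async def browse_like_a_human(page, url):
    # Pick a realistic desktop viewport instead of the 800x600 default
    width, height = random.choice([(1366, 768), (1440, 900), (1920, 1080)])
    await page.setViewport({'width': width, 'height': height})
    await page.goto(url)

    # Jittered "reading" pause instead of a fixed sleep
    await asyncio.sleep(random.uniform(1.5, 4.0))

    # A few small, randomized mouse movements
    for _ in range(random.randint(2, 4)):
        await page.mouse.move(
            random.randint(100, width - 100),
            random.randint(100, height - 100),
            steps=random.randint(5, 15)
        )
        await asyncio.sleep(random.uniform(0.2, 0.8))

    # Scroll down a random amount, like a user skimming the page
    await page.evaluate(f'() => window.scrollBy(0, {random.randint(300, 900)})')
    await asyncio.sleep(random.uniform(0.5, 2.0))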
Performance Optimization Tips
# Disable images and CSS for faster loading
async def optimized_scraper():
    browser = await launch({
        'headless': True,
        'args': ['--blink-settings=imagesEnabled=false']
    })
    page = await browser.newPage()
    await stealth(page)

    # Block unnecessary resources
    await page.setRequestInterception(True)

    async def block_resources(request):
        if request.resourceType in ['image', 'stylesheet', 'font']:
            await request.abort()
        else:
            await request.continue_()

    page.on('request', lambda req: asyncio.ensure_future(block_resources(req)))

    await page.goto('https://example.com')
Final Thoughts
pyppeteer_stealth is a solid starting point for evading basic bot detection, but it's not a silver bullet. The key to successful web scraping in 2025 is adaptability. Use pyppeteer_stealth as part of a broader strategy that includes:
- Multiple approaches (browser automation, request libraries, API reverse engineering)
- Proper proxy rotation with residential IPs (see the sketch below)
- Realistic browsing patterns
- Regular testing and updates
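For the proxy rotation point, a minimal sketch with Pyppeteer (the proxy hosts and credentials are placeholders; rotation here happens once per browser session):
import random
from pyppeteer import launch
from pyppeteer_stealth import stealth

PROXIES = [
    {'server': 'proxy1.example.com:8000', 'username': 'user1', 'password': 'pass1'},
    {'server': 'proxy2.example.com:8000', 'username': 'user2', 'password': 'pass2'},
]

async def launch_with_rotating_proxy():
    proxy = random.choice(PROXIES)  # pick a different proxy for each session
    browser = await launch({
        'headless': True,
        'args': [f"--proxy-server={proxy['server']}"]
    })
    page = await browser.newPage()
    await page.authenticate({'username': proxy['username'], 'password': proxy['password']})
    await stealth(page)
    return browser, page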
Remember: the best scraper is the one that doesn't look like a scraper at all.