How to Bypass Anti-Bots in 2026: A 7-Step Guide

Your scraper was working fine yesterday. Today, you're staring at a CAPTCHA wall or worse—a silent IP ban that took hours to diagnose.

Anti-bot systems in 2026 are nothing like what they were even two years ago. Cloudflare's per-customer ML models learn your traffic patterns. DataDome's behavioral analysis catches scrapers that pass every fingerprint test. Akamai's JA4 fingerprinting spots libraries that JA3 couldn't touch.

The main difference between scrapers that succeed and those that get blocked is how they handle the full detection stack. Modern anti-bot systems combine TLS fingerprinting, JavaScript challenges, behavioral analysis, and IP reputation scoring. Bypassing just one layer isn't enough—you need to address all of them simultaneously.

This guide covers the exact techniques that achieved a 94% success rate across 50+ million requests in production last year. You'll learn methods that work against Cloudflare, DataDome, PerimeterX, Akamai, and Kasada in 2026.

What You'll Learn

  • How modern anti-bots detect scrapers at every layer
  • TLS fingerprinting bypass with curl_cffi and browser impersonation
  • Stealth browser setup with Camoufox, Nodriver, and SeleniumBase UC Mode
  • Human-like behavior simulation that fools behavioral analysis
  • Proxy strategies that maintain session integrity
  • CAPTCHA handling without expensive solving services
  • JavaScript challenge navigation

How Modern Anti-Bot Systems Work in 2026

Before diving into bypass techniques, you need to understand how detection works. Anti-bot systems have evolved beyond simple IP blocking into multi-layered defense platforms.

TLS/JA3/JA4 Fingerprinting

When your scraper connects over HTTPS, a TLS handshake occurs before any HTTP data transfers. During this handshake, your client reveals its supported cipher suites, TLS extensions, and protocol versions.

JA3 fingerprinting extracts five fields from the ClientHello packet: TLS version, cipher suites, extensions, elliptic curves, and elliptic curve formats. These values get concatenated and hashed into a unique identifier.

Example JA3 string:
771,4867-4865-4866-52393-52392-49195,0-23-65281-10-11-35-16,29-23-24,0
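
The fingerprint itself is just the MD5 hash of that comma-separated string. A minimal sketch of the computation, using the example values above rather than a capture from a live ClientHello:

import hashlib

# JA3 input: TLSVersion,Ciphers,Extensions,EllipticCurves,ECPointFormats
ja3_string = "771,4867-4865-4866-52393-52392-49195,0-23-65281-10-11-35-16,29-23-24,0"

# The JA3 fingerprint is the MD5 digest of that string
ja3_hash = hashlib.md5(ja3_string.encode()).hexdigest()
print(ja3_hash)  # 32-character hex identifier matched against signature databases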

The problem? Python's requests library produces a JA3 hash that screams "automated script." Cloudflare maintains databases of known bot signatures and blocks matching fingerprints instantly.

JA4 emerged in 2023 in response to ClientHello extension randomization. It sorts the extension list before hashing, making it resistant to the extension-order shuffling that Chrome introduced and that broke JA3 matching.

Browser Fingerprinting

JavaScript-based fingerprinting goes far beyond User-Agent strings. Sites collect canvas fingerprints, WebGL renderer info, audio context signatures, installed fonts, screen dimensions, timezone data, and hundreds of other data points.

Headless browsers expose automation markers everywhere; a quick self-check sketch follows this list:

  • navigator.webdriver returns true
  • Chrome's HeadlessChrome appears in the User-Agent
  • Missing browser plugins and extensions
  • Identical canvas fingerprints across sessions
  • No mouse movement events between clicks
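
You can verify these markers yourself with vanilla automation. A minimal sketch using plain Playwright's bundled Chromium (no stealth patches), assuming playwright is installed; exact values vary by browser version:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com")

    # Classic giveaways a detection script can read
    print(page.evaluate("navigator.webdriver"))       # typically True on vanilla automation
    print(page.evaluate("navigator.userAgent"))       # often contains "HeadlessChrome"
    print(page.evaluate("navigator.plugins.length"))  # often 0 in headless sessions

    # A canvas fingerprint that never changes between runs is another tell
    print(page.evaluate("""() => {
        const c = document.createElement('canvas');
        c.getContext('2d').fillText('fingerprint test', 2, 2);
        return c.toDataURL().slice(-16);
    }"""))

    browser.close()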

Behavioral Analysis

This is where most scrapers fail in 2026. Even with perfect fingerprints, behavioral patterns give you away.

Real users don't request 50 pages in 10 seconds. They don't navigate in perfectly sequential order. They pause to read content, move their mouse while thinking, and occasionally scroll past what they're looking for.

Anti-bot systems track:

  • Request timing and frequency
  • Navigation path patterns
  • Mouse movement trajectories
  • Scroll behavior
  • Time spent on each page
  • Click precision and timing

IP Reputation Scoring

Your IP address carries historical baggage. Datacenter IPs get flagged immediately. Residential IPs that previously triggered blocks carry low trust scores. Geographic inconsistencies between your IP location and browser timezone raise flags.

Modern systems also analyze ASN (Autonomous System Number) data to identify traffic from hosting providers, VPNs, and known proxy services.

Step 1: Master TLS Fingerprint Impersonation

The first defense layer you'll hit is TLS fingerprinting. If your client's JA3/JA4 signature doesn't match a legitimate browser, you're blocked before any HTTP request completes.

Using curl_cffi for Browser-Like TLS

curl_cffi is a Python library that wraps curl-impersonate, allowing you to send requests with TLS fingerprints identical to real browsers.

Install it first:

pip install curl_cffi

Basic usage looks almost identical to the requests library:

from curl_cffi import requests

response = requests.get(
    "https://www.example.com",
    impersonate="chrome136"
)
print(response.status_code)

The impersonate parameter tells curl_cffi which browser's TLS fingerprint to use. Available options include Chrome 131-136, Firefox 133+, Safari 18.4, and recent Edge releases.

Handling Sessions and Cookies

For multi-request scraping, maintain session state:

from curl_cffi import requests

session = requests.Session()

# First request establishes cookies
session.get(
    "https://httpbin.org/cookies/set/session_id/abc123",
    impersonate="chrome136"
)

# Subsequent requests include cookies automatically
response = session.get(
    "https://httpbin.org/cookies",
    impersonate="chrome136"
)
print(response.json())

The session object persists cookies between requests, mimicking how real browsers maintain state.

Adding Proxy Support

Combine curl_cffi with residential proxies for maximum effectiveness:

from curl_cffi import requests

proxy = "http://user:pass@gate.roundproxies.com:8080"
proxies = {"http": proxy, "https": proxy}

response = requests.get(
    "https://www.target-site.com",
    impersonate="chrome136",
    proxies=proxies
)

Residential proxies from providers like Roundproxies.com use real ISP-assigned IP addresses, making them harder to detect than datacenter IPs.

When curl_cffi Isn't Enough

curl_cffi handles TLS fingerprinting perfectly, but it can't execute JavaScript. For sites requiring JS execution or complex interactions, you'll need the stealth browsers covered in Step 2. A sketch automating that escalation follows the two lists below.

Use curl_cffi when:

  • Target site has basic protection
  • You only need static HTML
  • Speed and efficiency matter most
  • No JavaScript challenges appear

Switch to browsers when:

  • JavaScript challenges block requests
  • Sites require interaction (clicks, forms)
  • Canvas/WebGL fingerprinting is active
  • Turnstile or similar CAPTCHAs appear
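
One way to operationalize this decision is to try curl_cffi first and escalate to a stealth browser only when challenge markers appear. A minimal sketch, assuming the Camoufox setup from Step 2; the marker strings are illustrative, not exhaustive:

from curl_cffi import requests
from camoufox.sync_api import Camoufox

# Strings that usually indicate a JS challenge page instead of real content
CHALLENGE_MARKERS = ("just a moment", "checking your browser", "turnstile")

def fetch(url):
    """Cheap TLS-impersonation path first, stealth browser as a fallback."""
    resp = requests.get(url, impersonate="chrome136", timeout=30)
    body = resp.text.lower()

    if resp.status_code == 200 and not any(m in body for m in CHALLENGE_MARKERS):
        return resp.text  # static HTML was enough

    # Escalate: the site needs JavaScript execution
    with Camoufox(headless=True) as browser:
        page = browser.new_page()
        page.goto(url)
        return page.content()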

Step 2: Configure Stealth Browser Automation

When curl_cffi can't get through, stealth browsers become necessary. Standard Selenium and Playwright get detected instantly—you need specialized tools.

Option A: Camoufox (Best for Stealth)

Camoufox is an open-source Firefox-based browser designed specifically for scraping. It modifies Firefox at the C++ level, making fingerprint spoofing undetectable by JavaScript.

Install it:

pip install camoufox
playwright install firefox

Basic synchronous usage:

from camoufox.sync_api import Camoufox

with Camoufox(headless=True) as browser:
    page = browser.new_page()
    page.goto("https://nowsecure.nl")
    
    # Check if we passed the bot test
    content = page.content()
    print("Passed!" if "You are not a bot" in content else "Blocked")

Camoufox generates realistic fingerprints automatically. Each launch creates a new, coherent fingerprint profile including screen size, fonts, timezone, and hardware identifiers.

Custom Fingerprint Configuration

Override specific values when needed:

from camoufox.sync_api import Camoufox

config = {
    'window.outerHeight': 1080,
    'window.outerWidth': 1920,
    'window.innerHeight': 1008,
    'window.innerWidth': 1920,
    'navigator.language': 'en-US',
    'navigator.hardwareConcurrency': 8,
}

with Camoufox(
    headless=True,
    config=config,
    i_know_what_im_doing=True
) as browser:
    page = browser.new_page()
    page.goto("https://browserleaks.com/javascript")

The i_know_what_im_doing flag suppresses warnings about custom configurations. Use it carefully—inconsistent fingerprints trigger detection.

Async Mode for Scale

For scraping multiple pages concurrently:

from camoufox.async_api import AsyncCamoufox
import asyncio

async def scrape_page(browser, url):
    page = await browser.new_page()
    await page.goto(url)
    content = await page.content()
    await page.close()
    return content

async def main():
    urls = [
        "https://example1.com",
        "https://example2.com",
        "https://example3.com"
    ]
    
    async with AsyncCamoufox(headless=True) as browser:
        tasks = [scrape_page(browser, url) for url in urls]
        results = await asyncio.gather(*tasks)
        
        for url, content in zip(urls, results):
            print(f"Scraped {len(content)} chars from {url}")

asyncio.run(main())

Option B: SeleniumBase UC Mode

If you have existing Selenium code, SeleniumBase UC Mode adds stealth capabilities without a complete rewrite.

pip install seleniumbase

UC Mode works by launching Chrome normally, then attaching the WebDriver afterward. This produces a fingerprint identical to a human-launched browser.

from seleniumbase import SB

with SB(uc=True) as sb:
    sb.uc_open_with_reconnect("https://nowsecure.nl", 4)
    
    # UC Mode disconnects during sensitive operations
    sb.uc_click("button#start")
    
    # Access page content after interactions
    print(sb.get_page_source())

The uc_open_with_reconnect method handles Cloudflare challenges automatically. The second parameter (4) specifies seconds to wait for challenge completion.

Handling CAPTCHAs with UC Mode

SeleniumBase includes built-in CAPTCHA handling:

from seleniumbase import SB

with SB(uc=True) as sb:
    sb.uc_open_with_reconnect("https://protected-site.com", 4)
    
    # Automatically click Turnstile checkbox if present
    sb.uc_gui_click_captcha()
    
    # Continue scraping after CAPTCHA
    sb.click("a.product-link")
    data = sb.get_text("div.product-info")

Option C: Nodriver (CDP-Based Approach)

Nodriver communicates with Chrome directly using Chrome DevTools Protocol, avoiding WebDriver detection vectors entirely.

pip install nodriver

import nodriver as uc

async def main():
    browser = await uc.start()
    page = await browser.get("https://nowsecure.nl")
    
    # Wait for content to load
    await page.sleep(3)
    
    # Extract data
    content = await page.get_content()
    print(content)
    
    await browser.stop()

if __name__ == "__main__":
    uc.loop().run_until_complete(main())

Nodriver's async-only architecture requires refactoring synchronous code, but it achieves better detection rates against advanced anti-bot systems.

Which Tool Should You Choose?

Choose Camoufox if:

  • Maximum stealth is critical
  • You can work with Firefox
  • Target sites have aggressive protection

Choose SeleniumBase UC Mode if:

  • You have existing Selenium code
  • Built-in CAPTCHA handling matters
  • Chrome compatibility is required

Choose Nodriver if:

  • CDP-level control is needed
  • You're building new projects
  • Async architecture fits your workflow

Step 3: Implement Human-Like Behavior Patterns

Perfect fingerprints mean nothing if your behavior screams "bot." This step covers the techniques that fool behavioral analysis systems.

Natural Request Timing

Never use fixed delays. Real human browsing shows variable timing based on content consumption.

import random
import time

def human_delay(min_seconds=2, max_seconds=8):
    """
    Generate delays that mimic human reading patterns.
    Longer content = longer delays.
    """
    base_delay = random.uniform(min_seconds, max_seconds)
    
    # Add occasional longer pauses (checking phone, distracted)
    if random.random() < 0.1:
        base_delay += random.uniform(5, 15)
    
    # Add micro-variations
    jitter = random.gauss(0, 0.5)
    
    return max(0.5, base_delay + jitter)

# Usage between requests
time.sleep(human_delay())

Content-Aware Timing

Adjust delays based on page content length:

def reading_delay(content_length, wpm=200):
    """
    Calculate realistic reading time based on content.
    Average adult reads 200-300 words per minute.
    """
    words = content_length / 5  # Average word length
    reading_time = (words / wpm) * 60  # Convert to seconds
    
    # Add scanning time (not everyone reads everything)
    actual_time = reading_time * random.uniform(0.3, 0.7)
    
    # Minimum 2 seconds, maximum 30 seconds per page
    return max(2, min(30, actual_time))

Mouse Movement Simulation

Behavioral analysis tracks mouse movement patterns. Bots move in straight lines at constant velocity. Humans don't.

Using Ghost Cursor with Puppeteer:

const { createCursor } = require('ghost-cursor');
const puppeteer = require('puppeteer');

async function humanBrowse(page) {
    const cursor = createCursor(page);
    
    // Move to element with natural curve
    await cursor.move('button.submit');
    
    // Add hesitation before clicking
    await page.waitForTimeout(Math.random() * 500 + 200);
    
    // Click with realistic timing
    await cursor.click('button.submit');
}

For Python with Playwright, use human-like movement functions:

import asyncio
import random
import math

async def bezier_mouse_move(page, start_x, start_y, end_x, end_y):
    """
    Move mouse along a Bezier curve with realistic acceleration.
    """
    # Generate control points for curve
    ctrl_x = start_x + (end_x - start_x) * random.uniform(0.3, 0.7)
    ctrl_y = start_y + (end_y - start_y) * random.uniform(0.2, 0.8)
    
    # Add slight overshoot
    overshoot = random.uniform(0, 15)
    
    steps = random.randint(20, 40)
    
    for i in range(steps + 1):
        t = i / steps
        
        # Quadratic Bezier curve
        x = (1-t)**2 * start_x + 2*(1-t)*t * ctrl_x + t**2 * (end_x + overshoot)
        y = (1-t)**2 * start_y + 2*(1-t)*t * ctrl_y + t**2 * end_y
        
        await page.mouse.move(x, y)
        
        # Variable speed (slower at start and end)
        speed_factor = 4 * t * (1 - t)  # Parabolic speed curve
        delay = random.uniform(5, 20) / (speed_factor + 0.5)
        await asyncio.sleep(delay / 1000)
    
    # Correct overshoot
    if overshoot > 5:
        await page.mouse.move(end_x, end_y)

Scroll Behavior Simulation

Real users scroll in bursts, not smooth continuous motion:

async def human_scroll(page, direction='down', distance=None):
    """
    Simulate human scrolling with variable speed and pauses.
    """
    if distance is None:
        distance = random.randint(200, 600)
    
    scrolled = 0
    
    while scrolled < distance:
        # Variable scroll chunk
        chunk = random.randint(50, 150)
        
        if direction == 'down':
            await page.mouse.wheel(0, chunk)
        else:
            await page.mouse.wheel(0, -chunk)
        
        scrolled += chunk
        
        # Micro-pause between scroll events
        await asyncio.sleep(random.uniform(0.05, 0.15))
        
        # Occasional longer pause (reading)
        if random.random() < 0.2:
            await asyncio.sleep(random.uniform(0.5, 2))

Natural Navigation Paths

Don't scrape pages in sequential order. Mix in natural browsing behavior:

import random

def create_browsing_path(target_urls, decoy_ratio=0.2):
    """
    Create a realistic browsing path with natural navigation.
    """
    path = []
    decoy_pages = [
        "/about", "/contact", "/faq",
        "/terms", "/privacy"
    ]
    
    for url in target_urls:
        # Occasionally visit non-target pages
        if random.random() < decoy_ratio:
            decoy = random.choice(decoy_pages)
            path.append(('decoy', decoy))
        
        path.append(('target', url))
        
        # Sometimes go back to homepage
        if random.random() < 0.1:
            path.append(('navigation', '/'))
    
    return path

Step 4: Configure Smart Proxy Rotation

Even with perfect fingerprints and behavior, IP reputation matters. This step covers proxy strategies that maintain high success rates.

Session-Based Proxy Assignment

Don't randomly rotate proxies on every request. Maintain IP consistency within browsing sessions:

import hashlib
from collections import defaultdict

class SessionProxyManager:
    def __init__(self, proxy_list):
        self.proxies = proxy_list
        self.session_map = defaultdict(str)
        self.proxy_health = {p: 1.0 for p in proxy_list}
    
    def get_proxy(self, session_id, target_domain):
        """
        Assign consistent proxy to session/domain combination.
        """
        key = f"{session_id}:{target_domain}"
        
        if key not in self.session_map:
            # Select proxy based on health score
            healthy_proxies = [
                p for p in self.proxies 
                if self.proxy_health[p] > 0.5
            ]
            
            if not healthy_proxies:
                healthy_proxies = self.proxies
            
            # Deterministic selection for consistency
            idx = int(hashlib.md5(key.encode()).hexdigest(), 16)
            proxy = healthy_proxies[idx % len(healthy_proxies)]
            
            self.session_map[key] = proxy
        
        return self.session_map[key]
    
    def report_failure(self, proxy):
        """Reduce health score on failure."""
        self.proxy_health[proxy] *= 0.8
    
    def report_success(self, proxy):
        """Increase health score on success."""
        self.proxy_health[proxy] = min(1.0, self.proxy_health[proxy] * 1.1)

Geographic Consistency

Match proxy location with browser timezone and language settings:

from camoufox.sync_api import Camoufox

# Proxy located in Germany
proxy = "http://user:pass@de.roundproxies.com:8080"

config = {
    'navigator.language': 'de-DE',
    'navigator.languages': ['de-DE', 'de', 'en'],
}

with Camoufox(
    headless=True,
    proxy={"server": proxy},
    config=config,
    geoip=True  # Auto-match timezone to proxy IP
) as browser:
    page = browser.new_page()
    page.goto("https://target-site.com")

Camoufox's geoip=True parameter automatically sets timezone and locale based on proxy IP location.

Proxy Type Selection

Different proxy types suit different use cases:

Residential Proxies:

  • Real ISP-assigned IPs
  • Highest trust scores
  • Best for heavily protected sites
  • Higher cost per request

ISP Proxies:

  • Static IPs from ISPs
  • Good for account management
  • Consistent performance
  • Medium cost

Datacenter Proxies:

  • Fastest speeds
  • Lowest cost
  • Easily detected by sophisticated systems
  • Good for lightly protected sites

Mobile Proxies:

  • Cellular network IPs
  • Very high trust scores
  • Expensive
  • Best for mobile-focused sites

For most scraping tasks targeting protected sites, residential proxies provide the best cost-to-success ratio.

Handling Proxy Failures Gracefully

Build retry logic that switches proxies on failures:

import asyncio
import hashlib

from curl_cffi import requests

async def fetch_with_retry(url, proxy_manager, max_retries=3):
    """
    Fetch URL with automatic proxy rotation on failure.
    """
    session_id = hashlib.md5(url.encode()).hexdigest()[:8]
    domain = url.split('/')[2]
    
    for attempt in range(max_retries):
        proxy = proxy_manager.get_proxy(session_id, domain)
        
        try:
            response = requests.get(
                url,
                impersonate="chrome136",
                proxies={"http": proxy, "https": proxy},
                timeout=30
            )
            
            if response.status_code == 200:
                proxy_manager.report_success(proxy)
                return response
            
            # Soft failure - page loaded but blocked
            if response.status_code in [403, 429]:
                proxy_manager.report_failure(proxy)
                proxy_manager.session_map.pop(
                    f"{session_id}:{domain}", None
                )
                
        except Exception as e:
            proxy_manager.report_failure(proxy)
            proxy_manager.session_map.pop(
                f"{session_id}:{domain}", None
            )
        
        # Exponential backoff
        await asyncio.sleep(2 ** attempt)
    
    return None

Step 5: Handle JavaScript Challenges

Modern anti-bot systems use JavaScript challenges that must execute in a real browser environment. Here's how to navigate them.

Cloudflare Turnstile

Turnstile replaced traditional CAPTCHAs with invisible challenges. Three variants exist:

  1. Non-interactive (Invisible): Runs silently in background
  2. Invisible with brief check: Shows "Verifying..." for 1-2 seconds
  3. Interactive: Requires checkbox click

For non-interactive Turnstile, stealth browsers handle it automatically:

from seleniumbase import SB

with SB(uc=True) as sb:
    # Opens page and waits for Turnstile
    sb.uc_open_with_reconnect("https://turnstile-protected.com", 4)
    
    # If interactive Turnstile appears
    if sb.is_element_visible("iframe[src*='turnstile']"):
        sb.uc_gui_click_captcha()
    
    # Continue after verification
    sb.click("button.proceed")

Cloudflare Under Attack Mode

When sites enable "Under Attack Mode," a 5-second JavaScript challenge runs. Wait for it to complete:

from camoufox.async_api import AsyncCamoufox
import asyncio

async def bypass_cloudflare_uam(url):
    async with AsyncCamoufox(headless=True) as browser:
        page = await browser.new_page()
        await page.goto(url)
        
        # Wait for challenge page to clear
        # Look for absence of challenge elements
        for _ in range(20):
            content = await page.content()
            
            if "Checking your browser" not in content:
                break
            
            await asyncio.sleep(0.5)
        
        # Now scrape actual content
        return await page.content()

JavaScript Function Hooks

Some detection scripts check for automation markers via JavaScript. Hook and override them:

from playwright.sync_api import sync_playwright

def stealth_context(playwright):
    browser = playwright.chromium.launch(headless=True)
    context = browser.new_context()
    
    # Inject stealth scripts before page loads
    context.add_init_script("""
        // Remove webdriver flag
        Object.defineProperty(navigator, 'webdriver', {
            get: () => undefined
        });
        
        // Override permissions API
        const originalQuery = window.navigator.permissions.query;
        window.navigator.permissions.query = (parameters) => (
            parameters.name === 'notifications' ?
                Promise.resolve({ state: Notification.permission }) :
                originalQuery(parameters)
        );
        
        // Spoof plugin array
        Object.defineProperty(navigator, 'plugins', {
            get: () => [1, 2, 3, 4, 5]
        });
        
        // Fix chrome object
        window.chrome = {
            runtime: {}
        };
    """)
    
    return context

Date/Timing Detection Bypass

Anti-bot scripts analyze timing precision. Real browsers have small timing variations:

// Inject before page loads
const originalDate = Date;
const originalPerformance = window.performance.now;

Date = function(...args) {
    const date = new originalDate(...args);
    if (args.length === 0) {
        // Add small random offset to current time
        return new originalDate(date.getTime() + Math.random() * 50);
    }
    return date;
};

window.performance.now = function() {
    // Add micro-jitter to performance timing
    return originalPerformance.call(performance) + Math.random() * 0.1;
};

Step 6: Implement CAPTCHA Handling Strategies

When CAPTCHAs appear despite stealth measures, you have several options.

Prevention First

The best CAPTCHA is one that never appears. Reduce trigger rates by:

  1. Maintaining consistent sessions: Same IP + fingerprint throughout session
  2. Respecting rate limits: Slower scraping triggers fewer challenges (a minimal limiter sketch follows this list)
  3. Natural navigation: Enter through homepage, follow links naturally
  4. Good fingerprint hygiene: Rotate fingerprints between sessions, not during
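
Point 2 is easy to enforce mechanically. A minimal per-domain rate limiter sketch; the requests-per-minute figure is an assumption you should tune per target:

import time
from collections import defaultdict

class DomainRateLimiter:
    """Enforce a minimum interval between requests to the same domain."""

    def __init__(self, max_per_minute=5):
        self.min_interval = 60.0 / max_per_minute
        self.last_request = defaultdict(float)

    def wait(self, domain):
        elapsed = time.time() - self.last_request[domain]
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_request[domain] = time.time()

# Call before every request to a given domain
limiter = DomainRateLimiter(max_per_minute=5)
limiter.wait("target-site.com")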

Retry-Based Approach

For occasional CAPTCHAs, retry with different configurations:

async def scrape_with_captcha_retry(url, max_retries=3):
    for attempt in range(max_retries):
        async with AsyncCamoufox(headless=True) as browser:
            page = await browser.new_page()
            await page.goto(url)
            
            content = await page.content()
            
            # Check for CAPTCHA indicators
            captcha_indicators = [
                "captcha", "challenge",
                "verify you are human",
                "checking your browser"
            ]
            
            has_captcha = any(
                ind in content.lower() 
                for ind in captcha_indicators
            )
            
            if not has_captcha:
                return content
            
            # Wait before retry with new fingerprint
            await asyncio.sleep(5 * (attempt + 1))
    
    return None

Solver Service Integration

For sites with persistent CAPTCHAs, integrate solving services:

import requests
import time

def solve_recaptcha_v2(site_key, page_url, api_key):
    """
    Submit reCAPTCHA to solving service and wait for result.
    """
    # Submit task
    submit_response = requests.post(
        "http://2captcha.com/in.php",
        data={
            "key": api_key,
            "method": "userrecaptcha",
            "googlekey": site_key,
            "pageurl": page_url,
            "json": 1
        }
    )
    
    task_id = submit_response.json().get("request")
    
    # Poll for result
    for _ in range(60):
        time.sleep(5)
        
        result = requests.get(
            "http://2captcha.com/res.php",
            params={
                "key": api_key,
                "action": "get",
                "id": task_id,
                "json": 1
            }
        )
        
        data = result.json()
        
        if data.get("status") == 1:
            return data.get("request")
        
        if "ERROR" in data.get("request", ""):
            return None
    
    return None

Integrate the token back into your browser session:

async def submit_captcha_token(page, token):
    """
    Inject solved CAPTCHA token into page.
    """
    await page.evaluate(f"""
        document.getElementById('g-recaptcha-response').innerHTML = '{token}';
        
        // Trigger callback if exists
        if (typeof ___grecaptcha_cfg !== 'undefined') {{
            Object.keys(___grecaptcha_cfg.clients).forEach(key => {{
                const client = ___grecaptcha_cfg.clients[key];
                if (client.callback) {{
                    client.callback('{token}');
                }}
            }});
        }}
    """)

Step 7: Build a Production-Ready Scraping System

Combining all techniques into a reliable production system requires careful orchestration.

Complete Scraper Architecture

import asyncio
import random
import hashlib
from dataclasses import dataclass
from typing import Optional, List
from camoufox.async_api import AsyncCamoufox

@dataclass
class ScrapingResult:
    url: str
    success: bool
    content: Optional[str]
    error: Optional[str]

class ProductionScraper:
    def __init__(self, proxy_list: List[str]):
        self.proxy_manager = SessionProxyManager(proxy_list)  # from Step 4
        self.results = []
    
    async def scrape_url(self, url: str, session_id: str) -> ScrapingResult:
        domain = url.split('/')[2]
        proxy = self.proxy_manager.get_proxy(session_id, domain)
        
        try:
            async with AsyncCamoufox(
                headless=True,
                proxy={"server": proxy},
                geoip=True
            ) as browser:
                page = await browser.new_page()
                
                # Natural navigation delay
                await asyncio.sleep(random.uniform(1, 3))
                
                await page.goto(url, wait_until="networkidle")
                
                # Wait for dynamic content
                await asyncio.sleep(random.uniform(2, 5))
                
                # Simulate reading with scroll
                await self.simulate_reading(page)
                
                content = await page.content()
                
                # Check for blocks
                if self.is_blocked(content):
                    self.proxy_manager.report_failure(proxy)
                    return ScrapingResult(
                        url=url, success=False,
                        content=None, error="Blocked"
                    )
                
                self.proxy_manager.report_success(proxy)
                return ScrapingResult(
                    url=url, success=True,
                    content=content, error=None
                )
                
        except Exception as e:
            self.proxy_manager.report_failure(proxy)
            return ScrapingResult(
                url=url, success=False,
                content=None, error=str(e)
            )
    
    async def simulate_reading(self, page):
        """Add human-like reading behavior."""
        for _ in range(random.randint(2, 4)):
            scroll_amount = random.randint(100, 400)
            await page.mouse.wheel(0, scroll_amount)
            await asyncio.sleep(random.uniform(0.5, 2))
    
    def is_blocked(self, content: str) -> bool:
        """Detect common block indicators."""
        indicators = [
            "access denied", "blocked",
            "captcha", "please verify",
            "unusual traffic"
        ]
        content_lower = content.lower()
        return any(ind in content_lower for ind in indicators)
    
    async def scrape_batch(
        self, 
        urls: List[str],
        max_concurrent: int = 5
    ) -> List[ScrapingResult]:
        """
        Scrape multiple URLs with controlled concurrency.
        """
        semaphore = asyncio.Semaphore(max_concurrent)
        session_id = hashlib.md5(
            str(urls).encode()
        ).hexdigest()[:8]
        
        async def bounded_scrape(url):
            async with semaphore:
                return await self.scrape_url(url, session_id)
        
        tasks = [bounded_scrape(url) for url in urls]
        return await asyncio.gather(*tasks)

Error Handling and Recovery
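
Wrap individual scrapes in a retry helper that backs off exponentially and rotates the session ID (and with it the proxy assignment and fingerprint) between attempts: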

async def resilient_scrape(
    scraper: ProductionScraper,
    url: str,
    max_retries: int = 3
) -> ScrapingResult:
    """
    Scrape with exponential backoff and fingerprint rotation.
    """
    for attempt in range(max_retries):
        # Generate new session ID for each retry
        session_id = f"retry_{attempt}_{random.randint(1000, 9999)}"
        
        result = await scraper.scrape_url(url, session_id)
        
        if result.success:
            return result
        
        # Exponential backoff with jitter
        delay = (2 ** attempt) + random.uniform(0, 1)
        await asyncio.sleep(delay)
    
    return ScrapingResult(
        url=url, success=False,
        content=None, error="Max retries exceeded"
    )

Common Mistakes That Get You Blocked

Even with proper tools, these mistakes cause unnecessary blocks:

Mistake 1: Headless Mode Detection

True headless mode produces detectable fingerprints. Use virtual displays instead:

# BAD - Detectable
with SB(uc=True, headless=True) as sb:
    sb.uc_open_with_reconnect(url)

# GOOD - Uses virtual display
with SB(uc=True, xvfb=True) as sb:
    sb.uc_open_with_reconnect(url)

The xvfb=True parameter runs a headed browser inside a virtual framebuffer. The fingerprint appears identical to a real desktop browser.

Mistake 2: Inconsistent Fingerprints

Changing fingerprint values mid-session triggers detection:

# BAD - Fingerprint changes during session
for page in pages:
    with Camoufox() as browser:  # New fingerprint each time
        scrape(browser, page)

# GOOD - Consistent fingerprint for session
with Camoufox() as browser:
    for page in pages:
        scrape(browser, page)

Mistake 3: Using Deprecated Tools

puppeteer-stealth was discontinued in February 2025. Cloudflare specifically detects its patterns now:

// BAD - Outdated and detected
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

// GOOD - Use actively maintained alternatives
// Camoufox (Python), Nodriver (Python), SeleniumBase UC Mode

Mistake 4: Ignoring HTTP/2 Fingerprinting

Modern anti-bot systems analyze HTTP/2 SETTINGS frames and header ordering:

# BAD - HTTP/1.1 only
import requests
response = requests.get(url)

# GOOD - Full HTTP/2 with proper fingerprint
from curl_cffi import requests
response = requests.get(url, impersonate="chrome136")

Mistake 5: Sequential URL Patterns

Scraping pages in numerical order reveals bot behavior:

# BAD - Obvious pattern
urls = [f"https://site.com/page/{i}" for i in range(1, 100)]
for url in urls:
    scrape(url)

# GOOD - Randomized order
import random
random.shuffle(urls)
for url in urls:
    scrape(url)
    time.sleep(human_delay())

Future-Proofing Your Scraping Setup

Anti-bot technology continues advancing. Stay ahead with these practices:

Monitor Detection Landscapes

Test your scraper regularly against detection services:

  • BrowserScan (https://www.browserscan.net/)
  • CreepJS (https://abrahamjuliot.github.io/creepjs/)
  • Incolumitas (https://bot.incolumitas.com/)
  • nowsecure.nl

Track Tool Updates

Follow development of your primary tools:

  • Camoufox: https://github.com/daijro/camoufox
  • SeleniumBase: https://github.com/seleniumbase/SeleniumBase
  • Nodriver: https://github.com/ultrafunkamsterdam/nodriver
  • curl_cffi: https://github.com/lexiforest/curl_cffi

Build Abstraction Layers

Don't hardcode tool dependencies. Build interfaces that allow swapping:

from abc import ABC, abstractmethod

class BrowserInterface(ABC):
    @abstractmethod
    async def navigate(self, url: str): pass
    
    @abstractmethod
    async def get_content(self) -> str: pass
    
    @abstractmethod
    async def click(self, selector: str): pass

class CamoufoxBrowser(BrowserInterface):
    # Implementation using Camoufox
    pass

class NodriverBrowser(BrowserInterface):
    # Implementation using Nodriver
    pass

When one tool gets detected, swap implementations without rewriting scraping logic.
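
With the interface in place, scraping logic only ever talks to BrowserInterface, so swapping tools is a one-line change. A minimal sketch; the selector and URL are placeholders:

async def scrape_product(browser: BrowserInterface, url: str) -> str:
    # Identical whether `browser` is a CamoufoxBrowser or a NodriverBrowser
    await browser.navigate(url)
    await browser.click("a.product-link")  # placeholder selector
    return await browser.get_content()

# Switching implementations when one gets detected:
# browser = CamoufoxBrowser()  ->  browser = NodriverBrowser()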

Final Thoughts

Bypassing anti-bot systems in 2026 requires a multi-layered approach. No single technique works against sophisticated protection—you need TLS fingerprinting, browser stealth, behavioral simulation, and smart proxy usage working together.

Start with curl_cffi for simple targets. When that fails, move to Camoufox or SeleniumBase UC Mode. Add human-like behavior patterns. Use residential proxies with geographic consistency.

Most importantly, respect the sites you scrape. Rate limiting and responsible data collection keep anti-bot escalation in check for everyone.

The techniques in this guide work against current protection systems. Anti-bot vendors will adapt. Keep your tools updated, test regularly, and build flexible systems that can evolve with the landscape.

FAQ

Can I bypass Cloudflare with just curl_cffi?

curl_cffi bypasses TLS fingerprinting but can't execute JavaScript. For Cloudflare sites with JS challenges or Turnstile, you need a stealth browser like Camoufox or SeleniumBase UC Mode.

Which stealth browser has the best detection scores?

Camoufox consistently achieves 0% detection on CreepJS and BrowserScan tests. It's Firefox-based with C++-level fingerprint modifications that JavaScript can't detect.

How many requests per minute can I safely make?

There's no universal answer. Start at 2-5 requests per minute for heavily protected sites. Monitor success rates and gradually increase. Some sites tolerate 30+ requests per minute with proper fingerprinting.

Do I need residential proxies or will datacenter work?

Datacenter proxies work for lightly protected sites. For Cloudflare, DataDome, PerimeterX, or Akamai-protected sites, residential proxies significantly improve success rates.

What happens when my current tools get detected?

Anti-bot vendors study open-source tools. When detection increases, update to latest versions first. If still blocked, switch tools (Camoufox → Nodriver → SeleniumBase). Build abstraction layers to make switching painless.