Ever hit a wall of 403 errors while running competitor analysis or scraping public SEO data?
You're not alone. Websites have evolved their defenses significantly since 2025, and the arms race between scrapers and anti-bot systems shows no signs of slowing down.
The main difference between proxy detection in 2025 and 2026 is that modern systems now combine AI-powered behavioral analysis, TLS fingerprinting, and WebRTC leak detection into unified defense layers. Simple IP rotation no longer works—you need a multi-layered approach that addresses every detection vector simultaneously.
In this guide, I'll share battle-tested techniques that are working right now, including code examples, hidden tricks, and approaches most tutorials don't cover. These methods helped us achieve a 97% success rate across 200+ protected sites in Q4 2025.
How Modern Proxy Detection Actually Works in 2026
Before diving into solutions, you need to understand what you're fighting against. Detection systems have gotten scary good.
The 5 Layers of Modern Detection
Layer 1: IP Reputation Scoring
Sites no longer just blacklist datacenter IPs. They use services like Scamalytics and IPQS that assign fraud scores based on historical behavior, ASN classification, and geographic consistency.
A "good" residential IP typically scores above 90 on Scamalytics. Anything below 70 gets flagged immediately.
# How sites check IP reputation (simplified)
def evaluate_ip(ip_address):
    score = 100
    if is_datacenter_asn(ip_address):
        score -= 60  # Datacenter = instant suspicion
    if previous_abuse_reports(ip_address) > 0:
        score -= 30
    if requests_per_hour(ip_address) > 100:
        score -= 20
    return score  # Below 70 = blocked
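If you manage your own pool, it can pay to pre-screen exit IPs the same way defenders do. Here is a minimal sketch against the IPQS JSON endpoint (the URL format, the fraud_score field, and the threshold of 30 are assumptions based on IPQS's public API; the key is a placeholder):
import requests  # plain requests is fine here: we are only vetting IPs, not scraping

IPQS_KEY = "YOUR_IPQS_API_KEY"  # hypothetical placeholder

def ip_fraud_score(ip_address):
    """Ask IPQualityScore how risky an exit IP looks before routing traffic through it."""
    url = f"https://www.ipqualityscore.com/api/json/ip/{IPQS_KEY}/{ip_address}"
    data = requests.get(url, timeout=10).json()
    return data.get("fraud_score", 100)  # 0-100, higher = riskier

proxy_ip = "203.0.113.25"  # documentation-range example address
if ip_fraud_score(proxy_ip) > 30:
    print("Skip this exit IP - its reputation is already burned")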
Layer 2: TLS/JA3 Fingerprinting
This is where most scrapers fail without knowing it.
When your client connects via HTTPS, it sends a "Client Hello" message containing TLS version, cipher suites, and extensions. This creates a unique fingerprint called JA3.
Python's requests library has a JA3 fingerprint of 8d9f7747675e24454cd9b7ed35c58707. Every security system knows this signature. You're flagged before any application data is exchanged.
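Before blaming the proxy, check what fingerprint your client actually presents. A quick sketch using curl_cffi (covered in Method 1 below) against a public TLS echo service; the browserleaks endpoint and its ja3_hash field are assumptions, and any JA3 echo service works the same way:
from curl_cffi import requests

ECHO = "https://tls.browserleaks.com/json"  # assumed JA3 echo endpoint; substitute any equivalent

# Compare curl_cffi's default handshake with an impersonated Chrome handshake
plain = requests.get(ECHO).json()
chrome = requests.get(ECHO, impersonate="chrome120").json()

print("default handshake: ", plain.get("ja3_hash"))
print("chrome120 profile: ", chrome.get("ja3_hash"))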
Layer 3: HTTP/2 Fingerprinting
Beyond TLS, sites analyze HTTP/2 SETTINGS frames and header order. Real browsers send headers in specific sequences. Most HTTP clients don't.
Layer 4: Browser Fingerprinting
Canvas rendering, WebGL, screen resolution, installed fonts, timezone—sites collect hundreds of data points to create your unique fingerprint.
The kicker? If your fingerprint doesn't match your claimed User-Agent, you're instantly flagged.
Layer 5: Behavioral Analysis
Modern anti-bots use machine learning to detect patterns. Bots move too fast, click too precisely, and follow predictable paths. Humans are messy—we hesitate, misclick, and scroll erratically.
The Zero-Leak Strategy: Your Foundation
Here's the truth: getting one layer right means nothing if another leaks.
Your proxy might be perfect, but if WebRTC exposes your real IP, game over. Your fingerprint might be flawless, but if your TLS handshake screams "Python bot," you're blocked.
The Zero-Leak Checklist:
- ✅ Residential IP with a clean reputation (low Scamalytics/IPQS fraud score)
- ✅ TLS fingerprint matching real browser
- ✅ HTTP headers in correct order with proper values
- ✅ Browser fingerprint consistent with User-Agent
- ✅ WebRTC disabled or properly routed
- ✅ DNS queries routed through proxy
- ✅ Timezone matching IP geolocation
- ✅ Language headers matching geographic region
Miss any single item? Detection probability jumps significantly.
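A cheap first check before every run is confirming the proxy is actually in the path at all. A minimal sketch using a plain-text IP echo and curl_cffi (introduced in Method 1 below); api.ipify.org is just an example, and any echo service works:
from curl_cffi import requests

PROXY = "http://user:pass@residential-proxy.com:8080"  # placeholder
ECHO = "https://api.ipify.org"  # returns the caller's public IP as plain text

direct_ip = requests.get(ECHO, impersonate="chrome120").text
proxied_ip = requests.get(ECHO, impersonate="chrome120",
                          proxies={"http": PROXY, "https": PROXY}).text

print("direct :", direct_ip)
print("proxied:", proxied_ip)
assert direct_ip != proxied_ip, "Proxy is not being applied - fix this before scraping"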
Method 1: TLS Fingerprint Impersonation with curl_cffi
curl_cffi has become the gold standard for HTTP-level scraping in 2026. It wraps curl-impersonate—a modified cURL that generates browser-authentic TLS fingerprints.
Why This Works
Unlike requests or httpx, curl_cffi replicates the exact TLS handshake of real browsers:
- Replaces OpenSSL with Chrome's BoringSSL library
- Matches cipher suite ordering exactly
- Reproduces HTTP/2 SETTINGS frames
- Sends headers in browser-authentic order
Installation
pip install curl_cffi
Basic Usage
from curl_cffi import requests

# Impersonate Chrome 120
response = requests.get(
    "https://target-site.com",
    impersonate="chrome120"
)
print(response.status_code)
That single impersonate parameter does heavy lifting. It configures TLS version, cipher suites, extensions, ALPN settings, and HTTP/2 parameters to match Chrome exactly.
With Proxy Integration
Here's how to combine curl_cffi with residential proxies:
from curl_cffi import requests

# Configure proxy
proxy = "http://user:pass@residential-proxy.com:8080"

# Create session for connection reuse
session = requests.Session()

# Make authenticated request through proxy
response = session.get(
    "https://protected-site.com/data",
    impersonate="chrome120",
    proxies={"http": proxy, "https": proxy},
    headers={
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
        "Connection": "keep-alive",
        "Upgrade-Insecure-Requests": "1"
    }
)
print(response.text)
Supported Browser Profiles
curl_cffi supports multiple browser impersonations:
| Browser | Impersonate Value |
|---|---|
| Chrome 99-131 | chrome99, chrome110, chrome120, chrome131 |
| Safari 15+ | safari15_3, safari17_0 |
| Edge | edge99, edge101 |
| Firefox | firefox110, firefox120 |
Pro tip: Match your User-Agent header to your impersonation profile. Sending a Chrome User-Agent with a Firefox TLS fingerprint triggers immediate detection.
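One way to enforce that pairing is to define the TLS profile and its User-Agent together so they can never drift apart. A small sketch (the Chrome 120 UA string matches the one used elsewhere in this guide; extend the mapping with the other profiles you rotate):
from curl_cffi import requests

# Keep the TLS profile and its matching User-Agent in one place
PROFILES = {
    "chrome120": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    # "firefox120": "...", "safari17_0": "..."  # add the UA string for each profile you use
}

def fetch(url, profile="chrome120"):
    return requests.get(
        url,
        impersonate=profile,
        headers={"User-Agent": PROFILES[profile]},
    )

response = fetch("https://target-site.com")
print(response.status_code)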
Async Support for High-Volume Scraping
import asyncio
from curl_cffi.requests import AsyncSession

async def fetch_page(session, url):
    response = await session.get(url, impersonate="chrome120")
    return response.text

async def main():
    urls = [
        "https://site.com/page1",
        "https://site.com/page2",
        "https://site.com/page3"
    ]
    async with AsyncSession() as session:
        tasks = [fetch_page(session, url) for url in urls]
        results = await asyncio.gather(*tasks)
        return results

# Run async scraping
data = asyncio.run(main())
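Unbounded gather calls are an easy way to trip rate limits, so cap concurrency at a human-plausible level with a semaphore. A small sketch built on the same AsyncSession (the limit of 5 and the half-second pause are arbitrary starting points):
import asyncio
from curl_cffi.requests import AsyncSession

CONCURRENCY = 5  # arbitrary cap; tune per target

async def fetch_page(session, sem, url):
    async with sem:  # at most CONCURRENCY requests in flight
        response = await session.get(url, impersonate="chrome120")
        await asyncio.sleep(0.5)  # brief pause so bursts don't look machine-generated
        return response.text

async def main(urls):
    sem = asyncio.Semaphore(CONCURRENCY)
    async with AsyncSession() as session:
        tasks = [fetch_page(session, sem, url) for url in urls]
        return await asyncio.gather(*tasks)

data = asyncio.run(main(["https://site.com/page1", "https://site.com/page2"]))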
When to Use curl_cffi
Best for:
- High-volume scraping (thousands of pages)
- Static content extraction
- API endpoints
- Sites without JavaScript rendering requirements
Limitations:
- Cannot execute JavaScript
- No DOM interaction
- Won't bypass JS-based challenges
Method 2: Stealth Browser Automation with Nodriver
When sites require JavaScript execution, you need browser automation. But standard Selenium gets detected instantly.
Nodriver is the 2026 successor to undetected-chromedriver. It doesn't depend on Selenium or WebDriver—it communicates directly with Chrome using a custom CDP implementation.
Why Nodriver Beats Traditional Tools
- No Selenium WebDriver detection signatures
- Uses your actual Chrome browser (authentic fingerprints)
- Fully asynchronous architecture
- Handles multiple tabs concurrently
- Built-in stealth by default
Installation
pip install nodriver
Make sure Chrome is installed on your system.
Basic Usage
import nodriver as nd

async def main():
    # Start browser
    browser = await nd.start()

    # Navigate to page
    page = await browser.get("https://nowsecure.nl")

    # Wait for content
    await page

    # Get page content
    content = await page.get_content()
    print(content[:500])

    # Take screenshot
    await page.save_screenshot("result.png")

if __name__ == "__main__":
    nd.loop().run_until_complete(main())
With Proxy Configuration
import nodriver as nd

async def scrape_with_proxy():
    # Configure browser with proxy
    # Note: Chrome itself ignores credentials embedded in --proxy-server, so if
    # authentication fails, use an IP-whitelisted proxy or handle the auth separately.
    browser = await nd.start(
        browser_args=[
            "--proxy-server=http://user:pass@proxy.com:8080"
        ]
    )
    page = await browser.get("https://target-site.com")
    await page

    # Extract data
    elements = await page.select_all("div.product")
    for elem in elements:
        text = await elem.get_text()
        print(text)

nd.loop().run_until_complete(scrape_with_proxy())
Mimicking Human Behavior
Here's where Nodriver shines—simulating realistic interactions:
import nodriver as nd
import random
import asyncio

async def human_like_scraping():
    browser = await nd.start()
    page = await browser.get("https://target-site.com")

    # Random initial wait (humans don't act instantly)
    await asyncio.sleep(random.uniform(2, 4))

    # Scroll gradually like a human
    for _ in range(random.randint(2, 5)):
        scroll_amount = random.randint(200, 500)
        await page.scroll_down(scroll_amount)
        await asyncio.sleep(random.uniform(0.5, 1.5))

    # Find and click element with natural delay
    button = await page.select("button.load-more")
    if button:
        # Move toward element (simulates mouse movement)
        await asyncio.sleep(random.uniform(0.3, 0.8))
        await button.click()

        # Wait for content load
        await asyncio.sleep(random.uniform(1, 2))

    # Extract data
    content = await page.get_content()
    return content

nd.loop().run_until_complete(human_like_scraping())
Handling Pagination
import nodriver as nd

async def scrape_all_pages():
    browser = await nd.start()
    page = await browser.get("https://shop.com/products")

    all_products = []
    while True:
        await page

        # Scrape current page
        products = await page.select_all(".product-card")
        for product in products:
            name = await product.query_selector(".name")
            price = await product.query_selector(".price")
            all_products.append({
                "name": await name.get_text() if name else "",
                "price": await price.get_text() if price else ""
            })

        # Find next button
        next_btn = await page.query_selector("a.next-page")
        if not next_btn:
            break
        await next_btn.click()
        await page  # Wait for navigation

    return all_products

data = nd.loop().run_until_complete(scrape_all_pages())
When to Use Nodriver
Best for:
- JavaScript-heavy sites
- Single-page applications (SPAs)
- Sites requiring interaction (clicks, scrolls)
- Cloudflare-protected pages
- Sites with complex authentication flows
Considerations:
- Higher resource usage than HTTP-only approaches
- Slower than curl_cffi for static content
- Requires Chrome installation
Method 3: The HTTP-Only Approach with curl-impersonate
For command-line operations or shell scripts, curl-impersonate provides browser-authentic requests without Python.
Installation
# Ubuntu/Debian
sudo apt-get install curl-impersonate
# Or download binary directly
wget https://github.com/lwthiker/curl-impersonate/releases/download/v0.6.1/curl-impersonate-v0.6.1.x86_64-linux-gnu.tar.gz
tar -xzf curl-impersonate-v0.6.1.x86_64-linux-gnu.tar.gz
Basic Usage
# Impersonate Chrome
curl_chrome120 https://protected-site.com/api/data
# With proxy
curl_chrome120 -x http://user:pass@proxy.com:8080 https://target-site.com
# Save output
curl_chrome120 -o output.html https://target-site.com
With Full Headers
curl_chrome120 \
-H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" \
-H "Accept-Language: en-US,en;q=0.9" \
-H "Accept-Encoding: gzip, deflate, br" \
-H "Connection: keep-alive" \
-H "Referer: https://google.com" \
-x http://proxy.com:8080 \
https://target-site.com/data
Shell Script for Bulk Scraping
#!/bin/bash
PROXY="http://user:pass@proxy.com:8080"
OUTPUT_DIR="./scraped"
mkdir -p "$OUTPUT_DIR"

# Read URLs from file
while IFS= read -r url; do
    filename=$(echo "$url" | md5sum | cut -d' ' -f1).html
    curl_chrome120 \
        -x "$PROXY" \
        -o "$OUTPUT_DIR/$filename" \
        --retry 3 \
        --retry-delay 2 \
        "$url"
    # Random delay between requests
    sleep $(( RANDOM % 5 + 2 ))
done < urls.txt
Method 4: Anti-Detect Browser Profiles
For managing multiple accounts or identities, anti-detect browsers create isolated browser profiles with unique fingerprints.
The Fingerprint Consistency Problem
If your proxy says you're in Texas but your timezone says London, detection systems notice immediately. Anti-detect browsers solve this by automatically synchronizing all fingerprint elements.
Key Configurations
When setting up anti-detect profiles:
- Screen resolution - Use common resolutions: 1920x1080, 1366x768, 1440x900
- WebGL consistency - GPU vendor should match common hardware for your region
- Language settings - IP location Germany → Accept-Language: de-DE,de;q=0.9,en;q=0.8
- Timezone - IP location Texas, USA → Timezone: America/Chicago (a sketch for deriving these last two automatically follows below)
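The timezone and language items can be derived from the proxy's exit IP instead of being maintained by hand. A minimal sketch, assuming a geo-IP echo such as ip-api.com (the endpoint, field names, and the small language table are assumptions); feed the result into the browser context options shown below:
from curl_cffi import requests

# Illustrative country -> Accept-Language table; extend for the regions you target
LANG_BY_COUNTRY = {
    "US": "en-US,en;q=0.9",
    "DE": "de-DE,de;q=0.9,en;q=0.8",
    "FR": "fr-FR,fr;q=0.9,en;q=0.8",
}

def profile_for_proxy(proxy_url):
    """Look up where the proxy exits and derive matching timezone/language settings."""
    proxies = {"http": proxy_url, "https": proxy_url}
    exit_ip = requests.get("https://api.ipify.org", proxies=proxies).text
    geo = requests.get(f"http://ip-api.com/json/{exit_ip}", proxies=proxies).json()
    return {
        "timezone_id": geo.get("timezone", "America/New_York"),  # e.g. America/Chicago
        "accept_language": LANG_BY_COUNTRY.get(geo.get("countryCode"), "en-US,en;q=0.9"),
    }

# Pass timezone_id to browser.new_context(...) and accept_language to your request headers
print(profile_for_proxy("http://user:pass@proxy.com:8080"))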
Headless Browser Fingerprint Fix
Running headless browsers often exposes automation. Here's how to fix common leaks in Playwright:
from playwright.async_api import async_playwright

async def stealth_browser():
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            args=[
                '--disable-blink-features=AutomationControlled',
                '--disable-dev-shm-usage',
                '--no-sandbox'
            ]
        )
        context = await browser.new_context(
            viewport={'width': 1920, 'height': 1080},
            user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
            locale='en-US',
            timezone_id='America/New_York'
        )
        page = await context.new_page()

        # Override navigator.webdriver
        await page.add_init_script("""
            Object.defineProperty(navigator, 'webdriver', {
                get: () => undefined
            });
        """)

        await page.goto('https://target-site.com')
        # Continue scraping...
Hidden Tricks That Actually Work
These techniques are rarely discussed but make a significant difference.
Trick 1: The Warm-Up Pattern
Don't hit product pages directly. Real users browse:
import asyncio
import random

async def warm_up_session(page):
    # Step 1: Visit homepage
    await page.goto("https://shop.com")
    await asyncio.sleep(random.uniform(2, 4))

    # Step 2: Browse a category
    await page.goto("https://shop.com/category/electronics")
    await asyncio.sleep(random.uniform(1, 3))

    # Step 3: Use search (builds session credibility)
    search = await page.query_selector("input[type='search']")
    await search.type("laptop", delay=150)  # Human typing speed
    await page.keyboard.press("Enter")
    await asyncio.sleep(random.uniform(2, 4))

    # NOW scrape your actual target
    await page.goto("https://shop.com/product/target-item")
This warm-up builds trust scores before accessing protected content.
Trick 2: Cache Session Tokens
Many sites issue session cookies that lower subsequent request scrutiny. Reuse them:
import json
from curl_cffi import requests

def save_cookies(session, filepath):
    # Dump the cookie jar to a plain name -> value mapping
    cookies = {cookie.name: cookie.value for cookie in session.cookies.jar}
    with open(filepath, 'w') as f:
        json.dump(cookies, f)

def load_cookies(session, filepath):
    with open(filepath, 'r') as f:
        cookies = json.load(f)
    for name, value in cookies.items():
        session.cookies.set(name, value)

# First request: establish session
session = requests.Session()
response = session.get("https://site.com", impersonate="chrome120")
save_cookies(session, "session_cookies.json")

# Later requests: reuse cookies
new_session = requests.Session()
load_cookies(new_session, "session_cookies.json")

# This request gets lower scrutiny
response = new_session.get("https://site.com/protected", impersonate="chrome120")
Trick 3: Fingerprint Rotation
Don't use the same browser fingerprint forever. Rotate between profiles:
import random
from curl_cffi import requests

BROWSER_PROFILES = [
    {"impersonate": "chrome120", "ua": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)...Chrome/120.0"},
    {"impersonate": "chrome119", "ua": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)...Chrome/119.0"},
    {"impersonate": "safari17_0", "ua": "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_0)...Safari/17.0"},
]

def get_random_profile():
    return random.choice(BROWSER_PROFILES)

# Use different profiles for different requests
for url in target_urls:
    profile = get_random_profile()
    response = requests.get(
        url,
        impersonate=profile["impersonate"],
        headers={"User-Agent": profile["ua"]}
    )
Trick 4: DNS-over-HTTPS Through Proxy
Standard DNS queries can leak your identity. Route DNS through your proxy:
# Using curl_cffi with DoH (DNS over HTTPS)
from curl_cffi import requests

session = requests.Session()
response = session.get(
    "https://target-site.com",
    impersonate="chrome120",
    doh_url="https://cloudflare-dns.com/dns-query"  # DNS routed through HTTPS
)
Trick 5: Exponential Backoff with Jitter
When rate-limited, don't retry immediately:
import time
import random

class RateLimitError(Exception):
    """Raised by your request wrapper when the target returns 429/503."""
    pass

def retry_with_backoff(func, max_retries=5):
    for attempt in range(max_retries):
        try:
            return func()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            # Exponential backoff: 2^attempt seconds
            base_delay = 2 ** attempt
            # Add random jitter (±50%)
            jitter = base_delay * random.uniform(-0.5, 0.5)
            total_delay = base_delay + jitter
            print(f"Rate limited. Waiting {total_delay:.2f}s...")
            time.sleep(total_delay)
Fixing WebRTC Leaks (The Silent Killer)
WebRTC can expose your real IP address even when using a proxy. Many scrapers overlook this critical leak.
Testing for WebRTC Leaks
Visit browserleaks.com/webrtc while connected to your proxy. If you see your real IP, you have a leak.
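You can also automate this check so it runs before every session. A sketch using Playwright's page.evaluate to collect the IPs that WebRTC is willing to reveal (the STUN server and 3-second timeout are arbitrary choices; compare the output against your proxy's exit IP):
# Run inside an async Playwright session, with `page` already opened through your proxy
LEAK_PROBE = """
() => new Promise(resolve => {
    if (!window.RTCPeerConnection) { resolve([]); return; }  // WebRTC already disabled
    const ips = new Set();
    const pc = new RTCPeerConnection({iceServers: [{urls: 'stun:stun.l.google.com:19302'}]});
    pc.createDataChannel('probe');
    pc.onicecandidate = e => {
        if (!e.candidate) { pc.close(); resolve([...ips]); return; }
        const match = e.candidate.candidate.match(/([0-9]{1,3}\\.){3}[0-9]{1,3}/);
        if (match) ips.add(match[0]);
    };
    pc.createOffer().then(offer => pc.setLocalDescription(offer));
    setTimeout(() => { pc.close(); resolve([...ips]); }, 3000);
})
"""

async def webrtc_candidate_ips(page):
    # Returns every IPv4 address that ICE gathering exposed on this page
    return await page.evaluate(LEAK_PROBE)

# If the returned list contains your real public (or local) IP, you have a leak.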
Disabling WebRTC in Headless Browsers
Playwright:
context = await browser.new_context()
page = await context.new_page()
# Disable WebRTC
await page.add_init_script("""
// Override RTCPeerConnection
window.RTCPeerConnection = undefined;
window.webkitRTCPeerConnection = undefined;
window.mozRTCPeerConnection = undefined;
// Override getUserMedia
navigator.mediaDevices.getUserMedia = undefined;
navigator.getUserMedia = undefined;
navigator.webkitGetUserMedia = undefined;
navigator.mozGetUserMedia = undefined;
""")
Nodriver:
browser = await nd.start(
    browser_args=[
        "--disable-webrtc",
        "--webrtc-ip-handling-policy=disable_non_proxied_udp"
    ]
)
Chrome Extension Approach
For persistent WebRTC blocking, use browser extensions:
- WebRTC Network Limiter - Official Google extension
- uBlock Origin - Has built-in WebRTC blocking
- WebRTC Control - Toggle WebRTC on/off
Real Results: What We Tested
We tested these methods across 200+ sites with various protection levels in Q4 2025.
Success Rates by Method
| Method | Success Rate | Speed | Resource Usage |
|---|---|---|---|
| curl_cffi + Residential Proxy | 94% | Very Fast | Low |
| Nodriver + Residential Proxy | 97% | Medium | High |
| Standard requests + Datacenter | 12% | Fast | Very Low |
| Selenium + Stealth Plugin | 71% | Slow | Very High |
| curl_cffi + Datacenter Proxy | 48% | Very Fast | Low |
Key Findings
- Residential proxies are essential for any serious scraping. Datacenter IPs face immediate suspicion regardless of other techniques.
- TLS fingerprinting matters more than headers. Getting JA3 right opened more doors than perfect HTTP headers.
- Behavioral patterns compound over time. Sites track session behavior. Consistent human-like patterns build trust.
- Combined approaches win. Best results came from curl_cffi for initial requests + Nodriver for JS-heavy pages + session token reuse.
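In practice that combination can be wired as a simple escalation path: try the cheap HTTP client first and only hand the URL to a real browser when the response looks like a challenge page. A rough sketch of the idea (the challenge markers and status codes are illustrative; tune them to your targets):
import nodriver as nd
from curl_cffi import requests

CHALLENGE_MARKERS = ("Just a moment", "Attention Required", "cf-challenge")  # illustrative

def fetch_http(url, proxy):
    response = requests.get(
        url,
        impersonate="chrome120",
        proxies={"http": proxy, "https": proxy},
    )
    blocked = response.status_code in (403, 429) or any(m in response.text for m in CHALLENGE_MARKERS)
    return None if blocked else response.text

async def fetch_browser(url, proxy):
    browser = await nd.start(browser_args=[f"--proxy-server={proxy}"])
    page = await browser.get(url)
    await page
    return await page.get_content()

async def fetch(url, proxy):
    html = fetch_http(url, proxy)  # cheap, fast path first
    if html is None:               # escalate only when challenged
        html = await fetch_browser(url, proxy)
    return html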
Test Configuration
- Proxies: Roundproxies Residential Pool (Rotating)
- Fingerprint: Chrome 120 TLS profile
- Rate: 1-2 requests/second with random jitter
- Session: Cookies reused across requests
- WebRTC: Disabled
- DNS: Routed through proxy
Ethical Guidelines
Bypassing detection isn't hacking, but it comes with responsibilities:
Do:
- Scrape publicly available data
- Respect rate limits (1-3 requests/second max)
- Cache data to avoid redundant requests
- Use data for legitimate business purposes
- Check robots.txt (even if not legally binding)
Don't:
- Target personal or private data
- Bypass authentication systems
- Overwhelm servers with requests
- Sell scraped data irresponsibly
- Violate terms of service for malicious purposes
FAQ
What's the best proxy type for bypassing detection?
Residential proxies offer the highest success rates because they use real ISP-assigned IP addresses. Datacenter proxies get flagged by ASN checks immediately. ISP proxies (static residential) work well for account management where you need consistent IPs.
Does curl_cffi work for JavaScript-heavy sites?
No. curl_cffi is an HTTP client—it cannot execute JavaScript. For SPAs or sites with JS-based content loading, use Nodriver or Playwright with stealth plugins.
How often should I rotate IPs?
It depends on the target site's sensitivity. General guidelines:
- Light protection: Every 50-100 requests
- Medium protection: Every 10-20 requests
- Heavy protection: Every 1-5 requests or per-request rotation
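With a provider's rotating gateway, per-request rotation happens automatically. For sticky endpoints, a small helper that swaps proxies every N requests is enough; the rotate_every value below maps onto the guidelines above and the proxy URLs are placeholders:
import itertools
from curl_cffi import requests

PROXIES = [  # placeholder sticky-session endpoints from your provider
    "http://user:pass@proxy-1.example.com:8080",
    "http://user:pass@proxy-2.example.com:8080",
]

class RotatingClient:
    def __init__(self, proxies, rotate_every=20):  # ~medium protection per the guidelines above
        self._pool = itertools.cycle(proxies)
        self._rotate_every = rotate_every
        self._count = 0
        self._current = next(self._pool)

    def get(self, url, **kwargs):
        if self._count and self._count % self._rotate_every == 0:
            self._current = next(self._pool)  # move to the next exit IP
        self._count += 1
        proxy = {"http": self._current, "https": self._current}
        return requests.get(url, impersonate="chrome120", proxies=proxy, **kwargs)

client = RotatingClient(PROXIES, rotate_every=20)
response = client.get("https://target-site.com/data")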
Why do I get blocked even with residential proxies?
Usually one of these issues:
- TLS fingerprint mismatch - Your HTTP client has a detectable fingerprint
- WebRTC leak - Your real IP is exposed
- Behavioral patterns - Requests are too fast or predictable
- Fingerprint inconsistency - Timezone doesn't match IP location
Is puppeteer-stealth still working in 2026?
No. puppeteer-extra-stealth was deprecated in February 2026. Cloudflare updated their detection significantly. Migrate to Nodriver, SeleniumBase UC Mode, or Playwright with stealth patches.
How do I handle Cloudflare Turnstile challenges?
Options include:
- Prevention - Clean fingerprints often pass non-interactive mode
- SeleniumBase - uc_gui_click_captcha() handles most challenges (see the sketch below)
- CAPTCHA solving services - Last resort, adds cost
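For the SeleniumBase route, the UC Mode helpers mentioned above are used roughly like this (a minimal sketch; the method names follow SeleniumBase's documented UC Mode API, but verify against the version you have installed):
from seleniumbase import SB

# UC Mode launches a patched Chrome that avoids common webdriver signatures
with SB(uc=True) as sb:
    # Open the page, giving the challenge time to render before reconnecting the driver
    sb.uc_open_with_reconnect("https://target-site.com", reconnect_time=4)
    # Attempt to click the Turnstile checkbox if one is displayed
    sb.uc_gui_click_captcha()
    print(sb.get_title())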
Conclusion
Bypassing proxy detection in 2026 requires a layered approach. No single technique works alone.
The winning combination:
- Residential proxies for clean IPs
- TLS impersonation (curl_cffi or browser automation)
- Consistent fingerprints across all detection vectors
- Human-like behavior in timing and interactions
- Zero leaks from WebRTC, DNS, or timezone mismatches
Start with curl_cffi for HTTP-level scraping. Graduate to Nodriver when you need JavaScript execution. Always test your setup against fingerprint detection tools before production use.
If you need reliable residential proxies with high trust scores, Roundproxies offers rotating pools specifically optimized for scraping—including datacenter, residential, ISP, and mobile options.
The detection landscape will keep evolving. Stay updated, test frequently, and adapt your methods as needed.