Your scraper runs perfectly on test pages. Then you point it at Glassdoor, Udemy, or any major retail site and hit a wall.
403 Forbidden. "Incapsula incident ID" in the response. Your requests die before reaching the server.
Imperva Incapsula blocks roughly 95% of automated requests according to their 2025 Bad Bot Report. If you're scraping eCommerce, job boards, or financial sites at scale, you'll encounter it constantly.
This guide covers six proven bypass methods with working Python code. You'll learn HTTP client approaches, browser automation, and advanced fingerprinting strategies that work against current Imperva protections.
Each method has trade-offs. I'll help you pick the right one.
What is Imperva Incapsula?
Imperva Incapsula is a cloud-based Web Application Firewall (WAF) that sits between users and websites. It analyzes every incoming request before it reaches the origin server.
When your scraper connects to an Incapsula-protected site, the WAF generates a trust score based on hundreds of client characteristics. Low score? You're blocked.
Here's what Imperva checks:
TLS Fingerprinting (JA3/JA4): During the TLS handshake, your client sends information about supported cipher suites, extensions, and curves. Imperva hashes this into a fingerprint. Standard Python libraries like requests produce fingerprints that scream "bot."
IP Reputation: Imperva maintains massive databases of IP metadata. Datacenter IPs get flagged immediately. Residential and mobile IPs pass through.
HTTP Analysis: Header ordering, values, and the presence of browser-specific headers like Sec-CH-UA and Sec-Fetch-* matter. HTTP/1.1 connections raise suspicion since real browsers use HTTP/2 or HTTP/3.
JavaScript Fingerprinting: Imperva collects 180+ encrypted values through client-side JavaScript. Canvas fingerprints, WebGL data, audio context, navigator properties. Everything.
Behavioral Analysis: ML models detect timing patterns, navigation sequences, and request cadences. Bots often request pages in patterns humans never would.
The reese84 Cookie: Advanced challenge requiring browser execution. The cookie contains encrypted fingerprint data that HTTP-only approaches cannot generate.
Standard scraping tools fail multiple checks simultaneously. That's why simple User-Agent spoofing doesn't work anymore.
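The most basic tell is visible without any special tooling: Python's standard-library HTTP client announces itself in its default headers before TLS or JavaScript analysis even comes into play. A quick stdlib illustration:

```python
import urllib.request

# The default opener advertises a Python User-Agent on every request,
# which is enough on its own to fail Imperva's HTTP analysis.
opener = urllib.request.build_opener()
print(opener.addheaders)
# e.g. [('User-agent', 'Python-urllib/3.12')]
```

Swapping in a browser User-Agent string helps with this one check, but as the list above shows, the TLS handshake and JavaScript fingerprint still give the client away.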
How to Identify Imperva Protection
Before attempting bypass, confirm you're dealing with Incapsula. Here's a detection function:
```python
import requests

def detect_incapsula(url):
    """
    Detect whether a website uses Imperva Incapsula protection.
    Returns a dict with detection results.
    """
    try:
        response = requests.get(url, timeout=10)
        set_cookie = response.headers.get('Set-Cookie', '')
        indicators = {
            'status_403': response.status_code == 403,
            'incapsula_text': 'incapsula' in response.text.lower(),
            'incident_id': 'incident id' in response.text.lower(),
            # Match the full phrase; bare "powered by" appears on almost every site
            'powered_by': 'powered by incapsula' in response.text.lower(),
            'visid_cookie': 'visid_incap' in set_cookie,
            'incap_cookie': 'incap_ses' in set_cookie,
            'x_iinfo_header': 'X-Iinfo' in response.headers,
            'x_cdn_header': response.headers.get('X-CDN', '').lower() == 'imperva',
        }
        return {
            'protected': any(indicators.values()),
            'indicators': indicators,
        }
    except requests.RequestException as e:
        # Include 'protected' so callers can always read it
        return {'protected': False, 'error': str(e)}

# Test it
result = detect_incapsula('https://example.com')
print(f"Imperva detected: {result['protected']}")
```
Common block indicators include:
- HTTP 403 Forbidden response
- "Powered By Incapsula" text in HTML
- "Incapsula incident ID" message
- `X-Iinfo` response header
- `incap_ses_*` and `visid_incap` cookies
6 Methods to Bypass Imperva Incapsula
Before diving in, here's a quick overview:
| Method | Difficulty | Cost | Best For | Success Rate |
|---|---|---|---|---|
| curl_cffi | Easy | Free | Basic protection, high speed | Medium |
| Residential Proxies | Easy | $$ | IP-based blocking | High |
| Playwright Stealth | Medium | Free | JavaScript challenges | High |
| SeleniumBase UC | Medium | Free | Complex automation | High |
| nodriver | Hard | Free | Maximum stealth | Very High |
| Combined Approach | Hard | $$ | Production at scale | Very High |
Quick recommendation: Start with curl_cffi for basic scraping. For JavaScript-heavy sites or reese84 challenges, jump to Playwright or nodriver.
Basic Methods
1. curl_cffi: TLS Fingerprint Impersonation
curl_cffi solves the TLS fingerprinting problem without browser overhead. It impersonates real browser fingerprints at the network level.
Difficulty: Easy
Cost: Free
Success rate: Medium (works on ~60% of Incapsula sites)
How it works
Standard HTTP libraries like requests or httpx produce TLS fingerprints that Imperva immediately recognizes as non-browser traffic. The cipher suite ordering, extension list, and curve preferences all differ from real browsers.
curl_cffi wraps curl-impersonate, which replicates exact byte sequences from real Chrome, Firefox, and Safari handshakes. The JA3/JA4 fingerprint matches what Imperva expects from a legitimate browser.
Implementation
First, install the library:
pip install curl_cffi
Basic usage with Chrome impersonation:
```python
from curl_cffi import requests

def scrape_with_curl_cffi(url):
    """
    Scrape a URL using curl_cffi with a Chrome TLS fingerprint.
    """
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.9',
        'Accept-Encoding': 'gzip, deflate, br',
        'Sec-CH-UA': '"Google Chrome";v="136", "Chromium";v="136", "Not.A/Brand";v="99"',
        'Sec-CH-UA-Mobile': '?0',
        'Sec-CH-UA-Platform': '"Windows"',
        'Sec-Fetch-Dest': 'document',
        'Sec-Fetch-Mode': 'navigate',
        'Sec-Fetch-Site': 'none',
        'Sec-Fetch-User': '?1',
        'Upgrade-Insecure-Requests': '1',
        'Connection': 'keep-alive',
    }
    response = requests.get(
        url,
        headers=headers,
        impersonate="chrome136",
        timeout=30,
    )
    return response

# Usage
response = scrape_with_curl_cffi('https://target-site.com')
print(f"Status: {response.status_code}")
```
For session persistence and retry logic:
```python
from curl_cffi.requests import Session
import time
import random

class IncapsulaBypass:
    """
    Advanced Incapsula bypass with session persistence and retry logic.
    """

    def __init__(self, proxy=None):
        self.session = Session(impersonate="chrome136")
        self.proxy = proxy

    def get(self, url, max_retries=3):
        """
        GET request with automatic retry on failure.
        """
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.9',
            'Sec-CH-UA': '"Google Chrome";v="136", "Chromium";v="136"',
            'Sec-Fetch-Dest': 'document',
            'Sec-Fetch-Mode': 'navigate',
            'Sec-Fetch-Site': 'none',
        }
        proxies = {'http': self.proxy, 'https': self.proxy} if self.proxy else None
        for attempt in range(max_retries):
            try:
                response = self.session.get(
                    url,
                    headers=headers,
                    proxies=proxies,
                    timeout=30,
                )
                if response.status_code == 200:
                    return response
                if response.status_code == 403:
                    # Blocked - wait and retry
                    time.sleep(random.uniform(2, 5))
                    continue
            except Exception as e:
                print(f"Attempt {attempt + 1} failed: {e}")
                time.sleep(random.uniform(1, 3))
        return None

# Usage
bypass = IncapsulaBypass(proxy='http://user:pass@proxy.example.com:8080')
response = bypass.get('https://target-site.com')
```
Available impersonation profiles include: chrome99, chrome110, chrome120, chrome131, chrome136, safari17, safari18. Use the latest Chrome version for best results.
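To avoid reusing a single fingerprint across a large run, you can pick a profile per request. A minimal sketch (profile names taken from the list above; curl_cffi is imported lazily inside `fetch` so the helper loads even where the library isn't installed):

```python
import random

# Chrome profiles from the list above; newer curl_cffi releases may add more.
PROFILES = ["chrome110", "chrome120", "chrome131", "chrome136"]

def pick_profile():
    """Choose a random profile so repeated requests don't share one TLS fingerprint."""
    return random.choice(PROFILES)

def fetch(url, timeout=30):
    # Lazy import keeps this module importable without curl_cffi installed.
    from curl_cffi import requests as curl_requests
    return curl_requests.get(url, impersonate=pick_profile(), timeout=timeout)
```

Note that rotating the profile mid-session can itself look suspicious; keep one profile per session and rotate between sessions.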
Pros and cons
Pros:
- Fast execution (no browser overhead)
- Low resource usage
- Simple API similar to requests
- Handles HTTP/2 automatically
Cons:
- Cannot execute JavaScript challenges
- Fails against reese84 cookie requirements
- Some sites detect curl-impersonate patterns
When to use this method
Use curl_cffi when:
- Target site doesn't require JavaScript execution
- Speed is critical
- You need to make thousands of requests quickly
- Basic Incapsula protection without advanced challenges
Avoid this method if:
- Site shows JavaScript challenge pages
- You see reese84 cookie requirements
- curl_cffi returns 403 after multiple attempts with good proxies
2. Residential Proxy Rotation
Datacenter IPs get blocked almost immediately by Imperva. Residential proxies are essential for consistent access.
Difficulty: Easy
Cost: $$ (typically $5-15 per GB)
Success rate: High (when combined with other methods)
How it works
Imperva maintains databases of IP reputation. Every IP has metadata: datacenter vs residential, ASN, geographic location, historical behavior patterns.
Datacenter IPs from AWS, Google Cloud, or Digital Ocean get flagged immediately. Residential IPs appear to come from real ISP customers and pass IP reputation checks.
Implementation
```python
from curl_cffi.requests import Session
import random
import time

class ProxyRotator:
    """
    Manage rotating residential proxies for Incapsula bypass.
    """

    def __init__(self, proxy_list):
        """
        Initialize with a list of residential proxy URLs.
        Format: http://user:pass@host:port
        """
        self.proxies = proxy_list
        self.current_index = 0
        self.failed_proxies = set()

    def get_next(self):
        """
        Get the next working proxy from the rotation.
        """
        attempts = 0
        while attempts < len(self.proxies):
            proxy = self.proxies[self.current_index]
            self.current_index = (self.current_index + 1) % len(self.proxies)
            if proxy not in self.failed_proxies:
                return proxy
            attempts += 1
        # All proxies failed - reset and try again
        self.failed_proxies.clear()
        return self.proxies[0]

    def mark_failed(self, proxy):
        """
        Mark a proxy as failed.
        """
        self.failed_proxies.add(proxy)

def scrape_with_rotation(url, proxy_rotator, max_retries=3):
    """
    Scrape a URL with automatic proxy rotation on failure.
    """
    session = Session(impersonate="chrome136")
    for attempt in range(max_retries):
        proxy = proxy_rotator.get_next()
        proxies = {'http': proxy, 'https': proxy}
        try:
            response = session.get(
                url,
                proxies=proxies,
                timeout=30,
            )
            if response.status_code == 200:
                return response
            if response.status_code == 403:
                proxy_rotator.mark_failed(proxy)
                time.sleep(random.uniform(1, 3))
                continue
        except Exception as e:
            proxy_rotator.mark_failed(proxy)
            print(f"Proxy {proxy} failed: {e}")
    return None

# Usage
proxies = [
    'http://user:pass@resi1.proxy.com:8080',
    'http://user:pass@resi2.proxy.com:8080',
    'http://user:pass@resi3.proxy.com:8080',
]
rotator = ProxyRotator(proxies)
response = scrape_with_rotation('https://target-site.com', rotator)
```
For high-volume scraping, use sticky sessions to maintain the same IP for related requests:
```python
import random
import time

class StickyProxySession:
    """
    Maintain a sticky proxy session for related requests.
    """

    def __init__(self, proxy_endpoint, session_duration=300):
        """
        Args:
            proxy_endpoint: Residential proxy endpoint with session support
            session_duration: Seconds to maintain the same IP
        """
        self.endpoint = proxy_endpoint
        self.duration = session_duration
        self.session_id = None
        self.session_start = 0

    def get_proxy(self):
        """
        Get a proxy URL with a sticky session ID.
        """
        current_time = time.time()
        # Generate a new session ID once the old one expires
        if self.session_id is None or (current_time - self.session_start) > self.duration:
            self.session_id = f"session_{random.randint(10000, 99999)}"
            self.session_start = current_time
        # The format depends on your proxy provider
        # Common format: user-session-{id}:pass@host:port
        return self.endpoint.replace('user:', f'user-session-{self.session_id}:')
```
Pros and cons
Pros:
- Essential for bypassing IP reputation checks
- Enables geographic targeting
- Scales to high request volumes
Cons:
- Ongoing cost per GB
- Slower than direct connections
- Doesn't solve TLS or JavaScript challenges alone
When to use this method
Residential proxies are nearly mandatory for serious Incapsula bypass. Use them in combination with other methods.
Avoid cheap datacenter proxies. They'll fail immediately.
Intermediate Methods
3. Playwright with Stealth Mode
Playwright handles JavaScript challenges natively. Modern versions (2025-2026) include built-in stealth features that hide automation markers.
Difficulty: Medium
Cost: Free
Success rate: High
How it works
Playwright launches a real Chromium browser. It executes JavaScript just like a human visitor, generating authentic fingerprints and handling challenges automatically.
The key advantage: Playwright solves reese84 cookie challenges without additional work. The browser collects fingerprint data and generates the required encrypted payload.
Implementation
Install Playwright:
pip install playwright
playwright install chromium
Basic stealth configuration:
```python
from playwright.sync_api import sync_playwright
import random
import time

def scrape_with_playwright(url):
    """
    Scrape a URL using Playwright with stealth settings.
    """
    with sync_playwright() as p:
        # Launch with stealth arguments
        browser = p.chromium.launch(
            headless=True,
            args=[
                '--disable-blink-features=AutomationControlled',
                '--disable-dev-shm-usage',
                '--no-sandbox',
                '--disable-setuid-sandbox',
                '--disable-infobars',
                '--window-size=1920,1080',
                '--start-maximized',
            ]
        )
        # Create a context with realistic settings
        context = browser.new_context(
            viewport={'width': 1920, 'height': 1080},
            user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36',
            locale='en-US',
            timezone_id='America/New_York',
        )
        page = context.new_page()
        # Remove automation indicators before any page script runs
        page.add_init_script("""
            Object.defineProperty(navigator, 'webdriver', {
                get: () => undefined
            });
            // Override permissions
            const originalQuery = window.navigator.permissions.query;
            window.navigator.permissions.query = (parameters) => (
                parameters.name === 'notifications' ?
                    Promise.resolve({ state: Notification.permission }) :
                    originalQuery(parameters)
            );
        """)
        # Navigate with realistic timing
        page.goto(url, wait_until='networkidle')
        # Wait for any challenges to complete
        time.sleep(random.uniform(2, 4))
        content = page.content()
        browser.close()
        return content

# Usage
html = scrape_with_playwright('https://target-site.com')
print(f"Retrieved {len(html)} characters")
```
For proxy integration:
```python
from urllib.parse import urlparse

def scrape_with_playwright_proxy(url, proxy_url):
    """
    Scrape using Playwright with a residential proxy.
    Expects proxy_url in the form http://user:pass@host:port.
    """
    # urlparse handles credentials more robustly than manual string splitting
    parsed = urlparse(proxy_url)
    proxy_config = {'server': f'{parsed.scheme}://{parsed.hostname}:{parsed.port}'}
    if parsed.username:
        proxy_config['username'] = parsed.username
        proxy_config['password'] = parsed.password or ''
    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=True,
            proxy=proxy_config,
            args=['--disable-blink-features=AutomationControlled'],
        )
        context = browser.new_context(
            viewport={'width': 1920, 'height': 1080},
            user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36',
        )
        page = context.new_page()
        page.goto(url, wait_until='networkidle')
        content = page.content()
        browser.close()
        return content
```
Pros and cons
Pros:
- Executes JavaScript challenges automatically
- Generates authentic fingerprints
- Handles reese84 cookie natively
- Cross-browser support (Chromium, Firefox, WebKit)
Cons:
- Slower than HTTP clients (5-10x)
- Higher resource usage (300-500MB per instance)
- Requires browser binary installation
When to use this method
Use Playwright when:
- curl_cffi returns challenge pages
- Target requires JavaScript execution
- You need to interact with dynamic content
- Site uses reese84 cookies
Avoid this method if:
- Speed is critical and basic protection only
- Running on limited resources
4. SeleniumBase Undetected ChromeDriver
SeleniumBase with Undetected ChromeDriver patches automation markers at the driver level. It's well-maintained and handles many detection methods automatically.
Difficulty: Medium
Cost: Free
Success rate: High
How it works
Standard Selenium exposes multiple automation indicators: navigator.webdriver is true, Chrome DevTools Protocol markers are visible, and driver executables leave traces.
SeleniumBase UC mode patches these at a low level. It modifies the ChromeDriver binary to remove telltale signs and configures Chrome to hide automation flags.
Implementation
Install SeleniumBase:
pip install seleniumbase
Basic usage with UC mode:
```python
from seleniumbase import Driver
import time
import random

def scrape_with_seleniumbase(url):
    """
    Scrape a URL using SeleniumBase in Undetected ChromeDriver mode.
    """
    # Initialize the driver in UC mode
    driver = Driver(
        uc=True,
        headless=True,
        agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36',
    )
    try:
        driver.get(url)
        # Wait for the page and any challenges
        time.sleep(random.uniform(3, 5))
        # Check for an Incapsula challenge
        if 'incapsula' in driver.page_source.lower():
            # Wait longer for challenge resolution
            time.sleep(random.uniform(5, 10))
        return driver.page_source
    finally:
        driver.quit()

def scrape_with_seleniumbase_proxy(url, proxy):
    """
    SeleniumBase with proxy support.
    """
    driver = Driver(
        uc=True,
        headless=True,
        proxy=proxy,  # Format: host:port or user:pass@host:port
    )
    try:
        driver.get(url)
        time.sleep(random.uniform(3, 5))
        return driver.page_source
    finally:
        driver.quit()

# Usage
html = scrape_with_seleniumbase('https://target-site.com')
print(f"Retrieved {len(html)} characters")
```
For handling CAPTCHAs with SeleniumBase's built-in features:
```python
from seleniumbase import SB

def scrape_with_captcha_handling(url):
    """
    SeleniumBase with automatic CAPTCHA handling.
    """
    with SB(uc=True, headless=True) as sb:
        sb.open(url)
        # SeleniumBase can auto-click CAPTCHA checkboxes
        if sb.is_element_visible('iframe[src*="captcha"]'):
            sb.uc_gui_click_captcha()
            sb.sleep(3)
        return sb.get_page_source()
```
Pros and cons
Pros:
- Mature, well-maintained project
- Built-in CAPTCHA handling
- Familiar Selenium API
- Good documentation
Cons:
- Slower than Playwright
- ChromeDriver version must match Chrome
- Some detection methods still work against it
When to use this method
Use SeleniumBase when:
- You need robust CAPTCHA handling
- Team is familiar with Selenium
- Running complex multi-step automation
Advanced Methods
5. nodriver: CDP-Minimal Automation
nodriver represents the cutting edge of stealth automation. It communicates directly with Chrome via CDP while avoiding the markers that standard automation leaves behind.
Difficulty: Hard
Cost: Free
Success rate: Very High
How it works
Standard automation tools (Selenium, Playwright, Puppeteer) control browsers through WebDriver protocol or heavy CDP usage. These protocols leave detectable traces.
nodriver takes a different approach. It uses minimal CDP communication and emulates real user behavior through native OS-level inputs. This makes it invisible to most detection methods.
Recent benchmarks show nodriver achieving 25% success rate against major anti-bots in default configuration. Its fork, zendriver, achieves 75% with additional optimizations.
Implementation
Install nodriver:
pip install nodriver
Basic usage:
```python
import nodriver as uc
import asyncio

async def scrape_with_nodriver(url):
    """
    Scrape a URL using nodriver for maximum stealth.
    """
    browser = await uc.start(
        headless=True,
        browser_args=[
            '--disable-blink-features=AutomationControlled',
            '--window-size=1920,1080',
        ]
    )
    try:
        page = await browser.get(url)
        # Wait for dynamic content
        await asyncio.sleep(3)
        # Get the page content
        return await page.get_content()
    finally:
        browser.stop()  # nodriver tears down via stop()

# Run the async function
html = asyncio.run(scrape_with_nodriver('https://target-site.com'))
print(f"Retrieved {len(html)} characters")
```
For proxy support with nodriver (requires SOCKS5):
```python
import nodriver as uc
import asyncio

async def scrape_with_nodriver_proxy(url, socks5_proxy):
    """
    nodriver with a SOCKS5 proxy.
    Format: socks5://host:port
    Note: Chrome ignores credentials embedded in --proxy-server, so
    use an IP-allowlisted proxy or handle authentication separately.
    """
    browser = await uc.start(
        headless=True,
        browser_args=[
            f'--proxy-server={socks5_proxy}',
            '--disable-blink-features=AutomationControlled',
        ]
    )
    try:
        page = await browser.get(url)
        await asyncio.sleep(3)
        return await page.get_content()
    finally:
        browser.stop()
```
Consider zendriver for higher success rates:
pip install zendriver
```python
import zendriver as zd
import asyncio

async def scrape_with_zendriver(url):
    """
    zendriver for enhanced stealth (75% success vs 25% for nodriver).
    """
    browser = await zd.start(headless=True)
    try:
        page = await browser.get(url)
        await asyncio.sleep(3)
        return await page.get_content()
    finally:
        await browser.stop()
```
Pros and cons
Pros:
- Highest stealth of any automation framework
- Async-first architecture
- Minimal detection surface
- Active development
Cons:
- Async-only API (learning curve)
- SOCKS5 proxy requirement
- Less documentation than mainstream tools
- Chromium-only
When to use this method
Use nodriver/zendriver when:
- Other browser automation methods fail
- Target has advanced behavioral analysis
- Maximum stealth is required
- You're comfortable with async Python
6. Combined Approach: Full-Stack Bypass
Production scraping against Incapsula requires multiple layers working together. Here's a complete solution combining the best methods.
Difficulty: Hard
Cost: $$ (proxies)
Success rate: Very High
Implementation
```python
from curl_cffi.requests import Session as CurlSession
from playwright.sync_api import sync_playwright
import random
import time
from typing import Optional, Dict

class IncapsulaFullBypass:
    """
    Production-ready Incapsula bypass combining multiple methods.
    """

    def __init__(self, proxy_list: list):
        self.proxies = proxy_list
        self.proxy_index = 0
        self.curl_session = None

    def get_proxy(self) -> str:
        """Rotate through the proxy list."""
        proxy = self.proxies[self.proxy_index]
        self.proxy_index = (self.proxy_index + 1) % len(self.proxies)
        return proxy

    def try_curl_cffi(self, url: str) -> Optional[str]:
        """
        Attempt 1: fast HTTP client with TLS impersonation.
        """
        if self.curl_session is None:
            self.curl_session = CurlSession(impersonate="chrome136")
        proxy = self.get_proxy()
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.9',
            'Sec-CH-UA': '"Google Chrome";v="136"',
            'Sec-Fetch-Dest': 'document',
            'Sec-Fetch-Mode': 'navigate',
        }
        try:
            response = self.curl_session.get(
                url,
                headers=headers,
                proxies={'http': proxy, 'https': proxy},
                timeout=30,
            )
            if response.status_code == 200:
                # Check for challenge pages
                if 'incapsula' not in response.text.lower():
                    return response.text
        except Exception as e:
            print(f"curl_cffi failed: {e}")
        return None

    def try_playwright(self, url: str) -> Optional[str]:
        """
        Attempt 2: full browser automation for JavaScript challenges.
        """
        proxy = self.get_proxy()
        # Parse the proxy URL into Playwright's proxy config
        proxy_parts = proxy.replace('http://', '').replace('https://', '')
        if '@' in proxy_parts:
            auth, server = proxy_parts.rsplit('@', 1)
            user, password = auth.split(':', 1)
            proxy_config = {
                'server': f'http://{server}',
                'username': user,
                'password': password,
            }
        else:
            proxy_config = {'server': f'http://{proxy_parts}'}
        with sync_playwright() as p:
            browser = p.chromium.launch(
                headless=True,
                proxy=proxy_config,
                args=['--disable-blink-features=AutomationControlled'],
            )
            try:
                context = browser.new_context(
                    viewport={'width': 1920, 'height': 1080},
                    user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36',
                )
                page = context.new_page()
                # Remove the webdriver indicator
                page.add_init_script("""
                    Object.defineProperty(navigator, 'webdriver', {
                        get: () => undefined
                    });
                """)
                page.goto(url, wait_until='networkidle', timeout=60000)
                # Wait for challenges to resolve
                time.sleep(random.uniform(3, 6))
                content = page.content()
                # Verify we got real content
                if 'incapsula' not in content.lower() and len(content) > 1000:
                    return content
            except Exception as e:
                print(f"Playwright failed: {e}")
            finally:
                browser.close()
        return None

    def scrape(self, url: str, max_attempts: int = 3) -> Dict:
        """
        Main scraping method with a fallback chain.
        Returns a dict with the content and the method used.
        """
        for attempt in range(max_attempts):
            # Try the fast method first
            content = self.try_curl_cffi(url)
            if content:
                return {
                    'success': True,
                    'method': 'curl_cffi',
                    'content': content,
                    'attempts': attempt + 1,
                }
            # Fall back to browser automation
            content = self.try_playwright(url)
            if content:
                return {
                    'success': True,
                    'method': 'playwright',
                    'content': content,
                    'attempts': attempt + 1,
                }
            # Wait before retrying
            time.sleep(random.uniform(5, 10))
        return {
            'success': False,
            'method': None,
            'content': None,
            'attempts': max_attempts,
        }

# Usage
proxies = [
    'http://user:pass@resi1.example.com:8080',
    'http://user:pass@resi2.example.com:8080',
    'http://user:pass@resi3.example.com:8080',
]
bypass = IncapsulaFullBypass(proxies)
result = bypass.scrape('https://target-site.com')
if result['success']:
    print(f"Success with {result['method']} after {result['attempts']} attempts")
    print(f"Content length: {len(result['content'])}")
else:
    print("All bypass methods failed")
```
Which Method Should You Use?
| Situation | Best Method |
|---|---|
| Basic protection, speed critical | curl_cffi + residential proxies |
| JavaScript challenges present | Playwright with stealth |
| Complex automation needed | SeleniumBase UC |
| Maximum stealth required | nodriver/zendriver |
| Production at scale | Combined approach |
Start simple. Try curl_cffi first. If blocked, escalate to browser automation.
Always use residential proxies. Datacenter IPs fail immediately regardless of method.
Common Errors and Solutions
"403 Forbidden" after successful initial request
Cause: IP got flagged after behavioral analysis or too many requests.
Fix: Rotate to new proxy. Implement random delays between requests (2-10 seconds). Distribute traffic across multiple IPs.
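The randomized delays are simple to generate up front; a minimal sketch:

```python
import random

def humanized_delays(count, low=2.0, high=10.0):
    """Yield a randomized pause in seconds for each of `count` requests."""
    for _ in range(count):
        yield random.uniform(low, high)

# Usage: sleep for the next pause before each request, e.g.
# for pause, url in zip(humanized_delays(len(urls)), urls):
#     time.sleep(pause)
#     fetch(url)
```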
"Incapsula incident ID" in response
Cause: Request failed multiple detection checks.
Fix: Switch from HTTP client to browser automation. Verify residential proxy is working. Check TLS fingerprint with browserleaks.com.
JavaScript challenge loops forever
Cause: Browser automation detected, challenge keeps regenerating.
Fix: Use nodriver instead of Playwright. Clear cookies between attempts. Try different residential IP.
reese84 cookie not generated
Cause: JavaScript fingerprinting blocked or incomplete.
Fix: Must use browser automation. Ensure JavaScript enabled. Wait longer for challenge completion (10+ seconds).
Rate limited (429 status)
Cause: Too many requests from same IP or session.
Fix: Implement exponential backoff. Rotate proxies more frequently. Reduce concurrent requests.
```python
import time
import random

def exponential_backoff(attempt, base_delay=1, max_delay=60):
    """
    Calculate a delay with exponential backoff and jitter.
    """
    delay = min(base_delay * (2 ** attempt), max_delay)
    jitter = random.uniform(0, delay * 0.1)
    return delay + jitter

# Usage in a retry loop (make_request is a placeholder for your own request function)
for attempt in range(5):
    response = make_request()
    if response.status_code == 429:
        delay = exponential_backoff(attempt)
        print(f"Rate limited. Waiting {delay:.1f} seconds...")
        time.sleep(delay)
        continue
    break
```
Ethical Considerations
Before bypassing Incapsula protection, consider:
Terms of Service: Most sites prohibit scraping in their ToS. Bypassing protection may violate these terms.
Legal implications: Laws vary by jurisdiction. The Computer Fraud and Abuse Act (US) and similar laws elsewhere may apply. Consult legal counsel for commercial projects.
Responsible use:
- Only scrape public data
- Respect rate limits even when you can bypass them
- Cache data to minimize requests
- Identify yourself with contact info in headers when appropriate
- Don't overload target servers
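Caching in particular costs little to add. A minimal on-disk cache keyed by a URL hash (stdlib only; the `fetch` callable is a stand-in for whichever scraping function you use):

```python
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path("scrape_cache")

def cached_fetch(url, fetch):
    """Return cached content for url, calling fetch(url) only on a cache miss."""
    CACHE_DIR.mkdir(exist_ok=True)
    key = hashlib.sha256(url.encode()).hexdigest()
    path = CACHE_DIR / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())["content"]
    content = fetch(url)
    path.write_text(json.dumps({"url": url, "content": content}))
    return content
```

For production use you would add an expiry time, but even this version keeps repeated runs from hammering the target.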
When to use official APIs instead:
If a site offers an API, use it. APIs are faster, legal, and more reliable than scraping protected sites.
2026 Detection Trends
Imperva continues evolving. Monitor these emerging methods:
JA4 Fingerprinting: The successor to JA3 provides more granular TLS analysis. Libraries must update impersonation signatures regularly.
HTTP/3 Analysis: QUIC protocol fingerprints now analyzed. Current bypass tools need HTTP/3 support.
Behavioral ML Models: Machine learning detects subtle patterns in navigation timing and click sequences. Simple randomization may not suffice.
Device Attestation: Some implementations verify hardware characteristics through WebAuthn. This requires actual browser execution.
Canvas Fingerprint Verification: Beyond collection, systems verify render consistency across requests from the same session.
Stay updated with curl_cffi releases, Playwright updates, and anti-detection plugin changes. Join web scraping communities for early warnings about new detection methods.
Conclusion
Imperva Incapsula uses layered detection: TLS fingerprinting, IP reputation, HTTP analysis, JavaScript challenges, and behavioral monitoring. Effective bypass requires addressing multiple vectors simultaneously.
Start with curl_cffi for basic requests. The TLS impersonation handles fingerprint detection without browser overhead. Add residential proxies for IP reputation.
For JavaScript-heavy sites, use Playwright or nodriver. These execute challenges natively while hiding automation markers.
The reese84 cookie challenge requires browser execution. HTTP-only approaches cannot generate the required fingerprint payload. Plan for browser automation when targeting sites with this protection.
Match your method to the protection level:
| Protection Level | Recommended Approach |
|---|---|
| Basic | curl_cffi + residential proxies |
| Medium | Playwright with stealth |
| Advanced | nodriver + rotating residential |
| Enterprise | Combined approach with fallbacks |
Always test before scaling. What works today may need adjustment tomorrow as detection methods evolve.