You've built your scraper. It pulls clean data from test pages without breaking a sweat. Then you point it at a real target and get slapped with a 403 or a blank page that says "Access Denied."
That's AWS WAF. Amazon's firewall sits in front of millions of sites, and it's gotten significantly harder to get past in the last year.
This guide covers five methods to bypass AWS WAF — from quick header fixes to open-source token solvers. Every approach here is something you build and control yourself. No paid APIs, no managed services. Just working code.
What Is AWS WAF and How Does It Block Scrapers?
AWS WAF (Web Application Firewall) is Amazon's cloud-based security layer that inspects HTTP/HTTPS traffic before it reaches the origin server. It uses multiple detection layers simultaneously — IP reputation, TLS fingerprinting, JavaScript challenges, and behavioral analysis. The most common trigger for scrapers is the aws-waf-token cookie challenge, which requires valid JavaScript execution to pass.
Here's what happens when your scraper hits an AWS WAF-protected site. The firewall checks your request against several signals at once.
IP reputation is the first gate. AWS maintains lists of known datacenter ranges, VPN endpoints, and previously flagged addresses. If your request comes from AWS, GCP, Hetzner, or any major cloud provider, it's likely flagged before anything else is evaluated.
TLS fingerprinting is the second check. Every HTTP client produces a characteristic signature during the TLS handshake (its cipher suites, extensions, and their order), commonly summarized as a JA3 fingerprint. Python's requests library has a fingerprint that looks nothing like Chrome. AWS WAF can tell the difference instantly.
JavaScript challenges are where most scrapers die. AWS WAF injects a script that collects browser telemetry — Canvas rendering, WebGL capabilities, installed fonts, screen dimensions. Your browser executes this script and generates an aws-waf-token cookie. No token, no access.
Rate limiting catches everything else. Too many requests from one IP in a short window triggers throttling or a full block. AWS WAF also tracks behavioral patterns like navigation speed and request sequencing.
AWS WAF has two challenge levels you'll encounter:
- HTTP 202 response — JavaScript challenge only. Your client needs to execute JS and return the computed token.
- HTTP 405 response — Full CAPTCHA. A visual puzzle on top of the JS challenge. Much harder to handle programmatically.
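The two status codes above can be turned into a small lookup so the rest of a scraper can branch on the challenge level. This is a minimal sketch; the function name `challenge_type` and its return labels are made up for illustration and are not part of any AWS or third-party API.

```python
from typing import Optional


def challenge_type(status_code: int) -> Optional[str]:
    """Map a response status to the AWS WAF challenge level described above."""
    if status_code == 202:
        return "javascript"  # silent JS challenge, no user interaction
    if status_code == 405:
        return "captcha"     # visual puzzle on top of the JS challenge
    return None              # not a challenge status (could still be a plain block)


for code in (200, 202, 403, 405):
    print(code, challenge_type(code))
```

A 403 maps to `None` here on purpose: it signals a block, not a challenge that can be solved.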
The AWS WAF 8KB Body Inspection Limit
There's a well-documented limitation worth knowing: AWS WAF only inspects the beginning of a request body. For Application Load Balancer and AppSync integrations the limit is the first 8,192 bytes (8KB); for CloudFront, API Gateway, and several other integrations the default is 16KB and site owners can raise it to 64KB. With the default oversize handling, the rules are evaluated against that first chunk and everything beyond it is forwarded to the origin server without inspection.
This limit is relevant if you're sending POST requests to an API endpoint behind AWS WAF. The first 8KB is still inspected, so padding only helps when the content that would trigger a rule sits past the limit. The ghostsecurityarchive/waf-btk project on GitHub demonstrates this with an HTTP proxy that pads request bodies automatically.
Site owners have two counters: SizeConstraint rules that reject oversized requests outright, and an oversize-handling setting that treats any body over the limit as a match. But not every site configures them, and legitimate file uploads or large JSON payloads often exceed 8KB anyway, making blanket size restrictions impractical.
For scraping, this matters less than the token challenge. But if you're interacting with a WAF-protected API that blocks specific request patterns, it explains why some large POST bodies pass while small ones with the same content are blocked.
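For readers on the defending side, this is roughly what the SizeConstraint counter mentioned above looks like. The sketch writes the rule as a Python dict in the WAFv2 rule JSON layout; the rule name and metric name are placeholders, and the `would_block` helper is only a local approximation for illustration, not how AWS evaluates the rule.

```python
import json

# Hypothetical rule and metric names; the statement layout follows WAFv2 rule JSON.
BLOCK_OVERSIZED_BODY = {
    "Name": "block-oversized-body",
    "Priority": 0,
    "Statement": {
        "SizeConstraintStatement": {
            # MATCH means a body over the inspection limit counts as a match.
            "FieldToMatch": {"Body": {"OversizeHandling": "MATCH"}},
            "ComparisonOperator": "GT",
            "Size": 8192,
            "TextTransformations": [{"Priority": 0, "Type": "NONE"}],
        }
    },
    "Action": {"Block": {}},
    "VisibilityConfig": {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "blockOversizedBody",
    },
}


def would_block(body: bytes, rule: dict = BLOCK_OVERSIZED_BODY) -> bool:
    """Local approximation: True when the body is larger than the rule's Size."""
    statement = rule["Statement"]["SizeConstraintStatement"]
    return len(body) > statement["Size"]


print(json.dumps(BLOCK_OVERSIZED_BODY["Statement"], indent=2))
print(would_block(b"x" * 9000))
```

A site with a rule like this rejects padded requests before any other rule is considered, which is why the padding trick is unreliable.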
5 Methods to Bypass AWS WAF
Here's what each method costs, how hard it is to set up, and when it works best.
| Method | Difficulty | Cost | Best For | Reliability |
|---|---|---|---|---|
| TLS fingerprint spoofing | Easy | Free | Sites with light protection | Medium |
| Browser automation | Medium | Free | JS challenge bypass | High |
| Open-source token solver | Medium | Free | High-volume scraping | High |
| Residential proxy rotation | Easy | $$ | IP-based blocking | Medium |
| Combined bypass engine | Hard | $$ | Production-scale scraping | Very High |
Quick recommendation: If you're blocked right now and need data today, start with Method 2 (Playwright). For production workloads, jump straight to Method 5.
How to Identify AWS WAF Protection
Before you try to bypass anything, confirm you're actually dealing with AWS WAF. Misidentifying the protection wastes time.
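Check your response headers for x-amzn-waf-action, which AWS WAF sets on challenge and CAPTCHA responses. An x-amz-cf-id header only tells you the site sits behind CloudFront, which often (but not always) means AWS WAF is attached. Look at the response body for references to challenge.js or captcha.js from an awswaf.com domain.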
The most reliable signal is the cookie itself. Open your browser's dev tools, visit the target, and check for a cookie named aws-waf-token.
Here's a quick Python script that checks programmatically:
import requests

url = "https://target-site.com"
response = requests.get(url, allow_redirects=False, timeout=30)

# Check status code
if response.status_code in (202, 405):
    print(f"AWS WAF challenge detected (HTTP {response.status_code})")

# x-amzn-waf-action comes from AWS WAF itself;
# x-amz-cf-id only shows that CloudFront is in front of the site
waf_headers = ['x-amzn-waf-action', 'x-amz-cf-id']
for header in waf_headers:
    if header in response.headers:
        print(f"Found header: {header}")

# Check for challenge scripts in response body
if 'awswaf.com' in response.text or 'challenge.js' in response.text:
    print("AWS WAF challenge page detected in response body")
A 202 status code is the clearest indicator. Standard web servers almost never return 202 for a GET request — that's AWS WAF telling you to solve its challenge.
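The same checks can be packaged as a pure function that works on an already-fetched response, which makes them easy to unit-test and reuse. This is a sketch under assumptions: `waf_signals` and its label strings are invented for this example, and the markers it looks for are just the ones listed above.

```python
def waf_signals(status_code, headers, body):
    """Return the AWS WAF indicators found in one response (no network I/O)."""
    # Header names are case-insensitive, so normalise before comparing.
    lowered = {name.lower(): value for name, value in headers.items()}
    signals = []
    if status_code in (202, 405):
        signals.append(f"status:{status_code}")
    if "x-amzn-waf-action" in lowered:
        signals.append(f"header:x-amzn-waf-action={lowered['x-amzn-waf-action']}")
    if "awswaf.com" in body:
        signals.append("body:awswaf.com")
    if "aws-waf-token" in lowered.get("set-cookie", ""):
        signals.append("cookie:aws-waf-token")
    return signals


sample_body = "<script src='https://example.awswaf.com/challenge.js'></script>"
print(waf_signals(202, {"X-Amzn-Waf-Action": "challenge"}, sample_body))
```

An empty list does not prove the site is unprotected; it only means none of these particular markers appeared in this response.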
Basic Methods
1. TLS Fingerprint Spoofing with curl_cffi
Standard Python HTTP clients like requests and httpx have TLS fingerprints that look nothing like a real browser. AWS WAF checks this fingerprint during the handshake and blocks non-browser clients before the request even completes.
curl_cffi solves this by wrapping curl-impersonate, which replicates the exact TLS signatures of real browsers — including JA3 hashes, HTTP/2 settings, and header ordering.
Difficulty: Easy
Cost: Free
Reliability against AWS WAF: Medium — passes TLS checks but won't solve JS challenges alone
Install it first:
pip install curl_cffi --break-system-packages
Here's how to make a request that looks like it came from Chrome:
from curl_cffi import requests

# impersonate="chrome" uses the latest Chrome TLS fingerprint
response = requests.get(
    "https://protected-site.com",
    impersonate="chrome",  # matches Chrome's exact TLS signature
    headers={
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
    },
)
print(f"Status: {response.status_code}")
print(f"Cookies: {response.cookies}")
The impersonate="chrome" parameter does the heavy lifting. It changes the entire TLS handshake to match Chrome's signature — cipher suites, extensions, ALPN protocols, everything.
You can also use sessions to maintain cookies across requests, which matters because the aws-waf-token needs to persist:
from curl_cffi import requests

session = requests.Session()

# First request: may trigger a challenge
response = session.get(
    "https://protected-site.com",
    impersonate="chrome",
)

# If we got a token cookie, subsequent requests use it automatically
response = session.get(
    "https://protected-site.com/data-page",
    impersonate="chrome",
)
print(response.text[:500])
Sessions preserve the aws-waf-token cookie across requests. If the initial request passes the TLS check and returns data, you're set.
When this works: Sites with basic AWS WAF rules that primarily check TLS fingerprints and headers. You'll know within one request.
When this fails: Sites that require JavaScript execution to generate the aws-waf-token. If you get a 202 response even with curl_cffi, you need Method 2 or 3.
Bypass AWS WAF from the Command Line
If you prefer working from the terminal, curl-impersonate gives you the same TLS spoofing as curl_cffi but as a drop-in replacement for curl. It ships as Docker images — one for Chrome's fingerprint, one for Firefox's.
Pull the Chrome variant and make a request:
docker pull lwthiker/curl-impersonate:0.6-chrome
docker run --rm -v "$PWD":/work lwthiker/curl-impersonate:0.6-chrome \
  curl_chrome116 \
  -s -L \
  -H "Accept: text/html,application/xhtml+xml" \
  -H "Accept-Language: en-US,en;q=0.9" \
  -c /work/cookies.txt \
  "https://protected-site.com"
The curl_chrome116 binary produces TLS handshakes that match Chrome 116. The -c flag saves cookies, including any aws-waf-token, to a file for reuse in subsequent requests. The -v mount matters: without it, the cookie file is written inside the container and disappears when --rm removes the container.
To reuse the token on follow-up requests:
docker run --rm -v "$PWD":/work lwthiker/curl-impersonate:0.6-chrome \
  curl_chrome116 \
  -s -L \
  -b /work/cookies.txt \
  "https://protected-site.com/api/data"
This is useful for quick testing and debugging before you write Python code. If curl_chrome116 gets you a 200, you know TLS spoofing is enough. If you get a 202, you need the token solver or browser automation.
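Once curl has written a cookie file, you may want to hand the token to Python code. The sketch below reads a Netscape-format cookie file (the format curl's -c flag produces) with the standard library. The helper name `read_waf_token` and the sample file contents are invented for the demo.

```python
import tempfile
from http.cookiejar import MozillaCookieJar
from typing import Optional


def read_waf_token(cookie_file: str) -> Optional[str]:
    """Return the aws-waf-token value from a Netscape-format cookie file, if present."""
    jar = MozillaCookieJar(cookie_file)
    # Keep session cookies and expired ones so we can at least see what was saved.
    jar.load(ignore_discard=True, ignore_expires=True)
    for cookie in jar:
        if cookie.name == "aws-waf-token":
            return cookie.value
    return None


# Demo with a made-up cookie file in the layout curl writes.
sample = (
    "# Netscape HTTP Cookie File\n"
    ".protected-site.com\tTRUE\t/\tFALSE\t2147483647\taws-waf-token\texample-token\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write(sample)

print(read_waf_token(handle.name))
```

Because expired cookies are loaded too, treat the returned value as a candidate and expect to refresh it if the site answers with a challenge again.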
Intermediate Methods
2. Browser Automation with Playwright Stealth
When AWS WAF demands JavaScript execution, you need an actual browser. Playwright with stealth plugins gives you a real Chromium instance that executes the challenge script and generates a valid token.
Difficulty: Medium
Cost: Free
Reliability against AWS WAF: High for JS challenges, medium for CAPTCHA challenges
The key ingredient is playwright-stealth, which patches the telltale signs that Playwright is automated — things like navigator.webdriver being set to true, missing plugin arrays, and incorrect viewport properties.
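Install the dependencies. The stealth_sync import used below is the playwright-stealth 1.x interface; the 2.x releases changed the API, so pin a 1.x version if the import fails: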
pip install playwright playwright-stealth --break-system-packages
playwright install chromium
Here's a working implementation:
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync
import time


def bypass_aws_waf(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=True,
            args=[
                "--disable-blink-features=AutomationControlled",
                "--no-sandbox",
            ],
        )
        context = browser.new_context(
            viewport={"width": 1920, "height": 1080},
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                       "AppleWebKit/537.36 (KHTML, like Gecko) "
                       "Chrome/120.0.0.0 Safari/537.36",
        )
        page = context.new_page()
        stealth_sync(page)  # patches automation detection signals
        page.goto(url, wait_until="networkidle")

        # Wait for the WAF challenge to resolve
        # AWS WAF challenges typically complete within 5 seconds
        time.sleep(5)

        # Extract the aws-waf-token cookie
        cookies = context.cookies()
        waf_token = None
        for cookie in cookies:
            if "aws-waf-token" in cookie["name"]:
                waf_token = cookie["value"]
                break

        content = page.content()
        browser.close()
        return waf_token, content


token, html = bypass_aws_waf("https://protected-site.com")
if token:
    print(f"Got WAF token: {token[:50]}...")
else:
    print("No WAF token found: site may not use AWS WAF")
The stealth_sync(page) call is doing most of the anti-detection work. Without it, AWS WAF's browser fingerprinting catches Playwright immediately.
Reusing tokens across HTTP requests is where this gets practical. You don't want to spin up a browser for every request. Get the token once, then use it with curl_cffi:
from curl_cffi import requests as curl_requests


def scrape_with_token(url, waf_token):
    """Use a browser-obtained token with fast HTTP requests."""
    session = curl_requests.Session()
    response = session.get(
        url,
        impersonate="chrome",
        cookies={"aws-waf-token": waf_token},
    )
    return response


# Get token via Playwright (Method 2)
token, _ = bypass_aws_waf("https://protected-site.com")

# Reuse token for fast scraping
for page_num in range(1, 50):
    url = f"https://protected-site.com/data?page={page_num}"
    response = scrape_with_token(url, token)
    print(f"Page {page_num}: {response.status_code}")
This hybrid approach gives you the reliability of a real browser for token generation and the speed of HTTP requests for data collection.
Token expiration matters. AWS WAF tokens typically last 5–15 minutes depending on the site's configuration. Build in a refresh mechanism that re-launches the browser when you start getting 403s.
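One way to structure that refresh mechanism is a small cache that re-solves when the token is too old or has been rejected. This is a sketch, not a definitive implementation: the `TokenCache` class is hypothetical, the `solve` callable stands in for whatever obtains a token (for example a wrapper around bypass_aws_waf), and the 300-second default is an assumption at the low end of the range quoted above.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Holds one token and re-solves when it expires or gets rejected."""

    def __init__(self, solve: Callable[[], str], ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._solve = solve          # hypothetical hook that returns a fresh token
        self._ttl = ttl              # assumed lifetime in seconds
        self._clock = clock          # injectable so the cache can be tested
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return the cached token, solving a new one if missing or expired."""
        if self._token is None or self._clock() >= self._expires_at:
            self._token = self._solve()
            self._expires_at = self._clock() + self._ttl
        return self._token

    def invalidate(self) -> None:
        """Call after a 403 so the next get() solves a fresh challenge."""
        self._token = None


# Demo with a fake solver that just counts how often it is called.
solve_calls = []


def fake_solve() -> str:
    solve_calls.append(1)
    return f"token-{len(solve_calls)}"


cache = TokenCache(fake_solve, ttl=300)
print(cache.get(), cache.get())  # second call reuses the cached token
cache.invalidate()               # pretend the site answered 403
print(cache.get())               # a new token is solved
```

In a scraping loop you would call `cache.get()` before each request and `cache.invalidate()` whenever a previously working token starts returning 403.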
When this works: Most AWS WAF-protected sites with JS challenges (HTTP 202). Very reliable for sites that don't rotate their challenge scripts frequently.
When this fails: Sites with full CAPTCHA challenges (HTTP 405), or sites that re-validate the token with every request. Also slow — each browser session takes 5–10 seconds to initialize.
3. Open-Source Token Solver
This is the most powerful method available right now. An open-source project reverse-engineered the AWS WAF token generation process, letting you compute valid tokens without a browser.
Difficulty: Medium
Cost: Free
Reliability against AWS WAF: High
The solver extracts the encrypted challenge parameters (key, iv, context) from the initial AWS WAF response and computes the token locally. No headless browser, no JavaScript execution — just math.
The project is xKiian/awswaf on GitHub, available in both Python and Golang.
Install it:
pip install awswaf --break-system-packages
Here's how it works in practice:
import re
from curl_cffi import requests as curl_requests

SITE_URL = "https://protected-site.com"
session = curl_requests.Session()

# Step 1: Hit the site and get the challenge page
response = session.get(SITE_URL, impersonate="chrome")
if response.status_code == 202:
    print("AWS WAF challenge detected, extracting parameters")
    html = response.text

    # Step 2: Extract the gokuProps parameters from the HTML
    key_match = re.search(r'"key":"([^"]+)"', html)
    iv_match = re.search(r'"iv":"([^"]+)"', html)
    context_match = re.search(r'"context":"([^"]+)"', html)

    if key_match and iv_match and context_match:
        key = key_match.group(1)
        iv = iv_match.group(1)
        context = context_match.group(1)
        print(f"Extracted key: {key[:20]}...")
        print(f"Extracted iv: {iv[:20]}...")
Those three parameters — key, iv, and context — are everything the solver needs. They're embedded in the challenge HTML inside a window.gokuProps object.
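Since the parameters live in a single JavaScript object, an alternative to three separate regexes is to grab the whole object and parse it as JSON. The sketch below assumes the object is assigned as `window.gokuProps = {...}` and is flat (no nested braces); both are assumptions about the page layout, and the function name and sample HTML are made up.

```python
import json
import re
from typing import Optional

# Assumes a flat object literal: the match stops at the first closing brace.
GOKU_PATTERN = re.compile(r"window\.gokuProps\s*=\s*(\{.*?\})", re.DOTALL)


def extract_goku_props(html: str) -> Optional[dict]:
    """Return the gokuProps dict, or None if it is missing or incomplete."""
    match = GOKU_PATTERN.search(html)
    if not match:
        return None
    try:
        props = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    # All three fields are needed downstream, so reject partial objects.
    if not {"key", "iv", "context"} <= props.keys():
        return None
    return props


sample_html = (
    '<script>window.gokuProps = '
    '{"key":"AQID","iv":"BAUG","context":"ctx123"};</script>'
)
print(extract_goku_props(sample_html))
```

Returning `None` for any malformed or partial object gives callers a single failure case to check, which is handy when the challenge page layout changes.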
Now feed them to the solver:
import re
from awswaf.aws import AwsWaf

# Build the goku params dict from extracted values
goku = {
    "key": key,
    "iv": iv,
    "context": context,
}

# The solver's second argument is the WAF endpoint URL from the challenge script.
# Pull it from the script tag in the challenge HTML fetched in the previous step.
endpoint_match = re.search(r'src="(https://[^"]*awswaf[^"]*)"', html)
endpoint_url = endpoint_match.group(1) if endpoint_match else ""

# The solver computes the token locally
token = AwsWaf(goku, endpoint_url, "protected-site.com")()

# Use the token in subsequent requests
response = session.get(
    SITE_URL,
    impersonate="chrome",
    cookies={"aws-waf-token": token},
)
print(f"Status after bypass: {response.status_code}")
The AwsWaf class handles the cryptographic computation that would normally happen inside your browser's JavaScript engine. It produces the same aws-waf-token that a real browser would generate.
Why this matters for scale: A Playwright instance takes 5–10 seconds per token. The solver generates one in under 100 milliseconds. At high volumes, that difference is the gap between a working pipeline and an unusable one.
Limitations to know about: AWS periodically updates their challenge scripts. When they do, the solver may need updating too. The maintainers have been responsive, but there's always a window where a new WAF version breaks things. Monitor the GitHub issues to stay current.
The Golang version of the solver is even faster. If you're running a Go-based scraping pipeline, use it instead of the Python wrapper.
How the AWS WAF Token Generator Works
If you're wondering what's actually happening inside the solver, here's the short version. When AWS WAF serves a challenge page, it embeds encrypted parameters in a JavaScript object called window.gokuProps. That object contains three fields:
- key: An AES encryption key used to encrypt the challenge response
- iv: The initialization vector for the AES cipher
- context: A session identifier tied to your specific challenge
The challenge script (challenge.js) normally runs in your browser, collects telemetry data (screen resolution, installed plugins, Canvas fingerprint, WebGL renderer), encrypts it using those parameters, and sends the result to AWS's verification endpoint. If the payload looks legitimate, AWS returns an aws-waf-token cookie.
The open-source solver skips the browser entirely. It constructs a plausible telemetry payload, encrypts it with the same AES parameters, and submits it directly. The result is a valid aws-waf-token that AWS accepts.
This is why the solver breaks when AWS changes their challenge script — the expected telemetry format or encryption flow changes, and the solver needs updating to match.
Advanced Methods
4. Residential Proxy Rotation
IP reputation is one of AWS WAF's strongest signals. A datacenter IP from Hetzner or DigitalOcean gets flagged before anything else is even evaluated.
Residential proxies solve this by routing your requests through real ISP-assigned IPs. AWS WAF sees traffic from Comcast, Verizon, or Deutsche Telekom — the same providers that real users connect through.
Difficulty: Easy (once you have proxy access)
Cost: $$ (residential proxy bandwidth isn't cheap)
Reliability against AWS WAF: Medium — solves IP problems but not challenge problems
This method works best combined with Methods 1–3. Proxies handle the IP layer while TLS spoofing or token solving handles the challenge layer.
Here's how to set up rotating residential proxies with curl_cffi:
from curl_cffi import requests as curl_requests
import random

# Your proxy list: residential IPs from your provider
proxies = [
    "http://user:pass@us-res-1.provider.com:8080",
    "http://user:pass@us-res-2.provider.com:8080",
    "http://user:pass@de-res-1.provider.com:8080",
    "http://user:pass@uk-res-1.provider.com:8080",
]


def fetch_with_rotation(url):
    proxy = random.choice(proxies)
    response = curl_requests.get(
        url,
        impersonate="chrome",
        proxies={"https": proxy, "http": proxy},
        timeout=30,
    )
    return response


response = fetch_with_rotation("https://protected-site.com")
print(f"Status: {response.status_code}")
Random selection is the simplest rotation strategy, but not the best. A smarter approach tracks which proxies are getting blocked:
import random
import time


class ProxyRotator:
    def __init__(self, proxy_list):
        self.proxies = proxy_list
        self.failures = {}  # proxy -> timestamps of recent failures

    def get_proxy(self):
        """Return a proxy with the fewest recent failures."""
        now = time.time()
        # Clear failures older than 10 minutes
        for proxy in list(self.failures.keys()):
            self.failures[proxy] = [
                t for t in self.failures[proxy]
                if now - t < 600
            ]
        # Find the lowest failure count, then pick randomly among the ties
        # so healthy proxies share the load instead of always using the first one
        counts = {p: len(self.failures.get(p, [])) for p in self.proxies}
        best = min(counts.values())
        return random.choice([p for p, c in counts.items() if c == best])

    def report_failure(self, proxy):
        if proxy not in self.failures:
            self.failures[proxy] = []
        self.failures[proxy].append(time.time())
This rotator automatically avoids proxies that are getting blocked, spreading the load across healthy IPs.
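A rotator only helps if something reports failures back to it. The sketch below shows one possible retry loop around any object with get_proxy and report_failure methods. To stay self-contained it uses a tiny stand-in rotator and a fake `send` function instead of real HTTP; `fetch_with_failover`, `FewestFailuresRotator`, and the proxy URLs are all hypothetical names for illustration.

```python
from typing import Callable, Tuple


class FewestFailuresRotator:
    """Tiny stand-in exposing the same two methods as the rotator above."""

    def __init__(self, proxies):
        self.failures = {proxy: 0 for proxy in proxies}

    def get_proxy(self) -> str:
        return min(self.failures, key=self.failures.get)

    def report_failure(self, proxy: str) -> None:
        self.failures[proxy] += 1


def fetch_with_failover(url: str, rotator, send: Callable[[str, str], int],
                        max_attempts: int = 3,
                        blocked: Tuple[int, ...] = (403, 429)) -> Tuple[str, int]:
    """Try up to max_attempts proxies; return (proxy, status) of the last attempt."""
    proxy, status = "", 0
    for _ in range(max_attempts):
        proxy = rotator.get_proxy()
        status = send(url, proxy)  # hypothetical transport hook returning a status code
        if status not in blocked:
            break
        rotator.report_failure(proxy)  # steer later picks away from this proxy
    return proxy, status


def fake_send(url: str, proxy: str) -> int:
    # Pretend the first proxy is blocked and the second one is healthy.
    return 403 if proxy == "http://proxy-a:8080" else 200


rotator = FewestFailuresRotator(["http://proxy-a:8080", "http://proxy-b:8080"])
print(fetch_with_failover("https://protected-site.com", rotator, fake_send))
```

In real use, `send` would wrap a curl_cffi request and return `response.status_code`; keeping it as a parameter makes the loop testable without network access.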
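Geo-targeting matters. If the site you're scraping is US-based, use US residential proxies. Geographic inconsistencies stand out: a request whose headers advertise a US English browser (Accept-Language: en-US) but whose IP geolocates to a region the site rarely serves looks out of place, and site owners can also add explicit geo-match rules.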
When this works: As a complement to other methods. Proxies alone won't bypass JS challenges, but they prevent IP blocks that would make other methods fail.
When this fails: Against sites that validate the aws-waf-token cookie regardless of IP. You still need a valid token even with clean residential IPs.
5. Combined Bypass Engine
In production, no single method is reliable enough on its own. AWS WAF layers its defenses, so your bypass needs to be layered too.
This approach chains the previous four methods into a fault-tolerant system. It tries the fastest method first and falls back to more expensive ones when needed.
Difficulty: Hard
Cost: $$ (proxy costs, but solver and browser are free)
Reliability against AWS WAF: Very high
Here's the architecture:
┌─────────────────────────────────────────┐
│ Request Handler │
├─────────────────────────────────────────┤
│ 1. Try curl_cffi with TLS spoofing │
│ ↓ (if 202/403) │
│ 2. Try open-source token solver │
│ ↓ (if solver fails) │
│ 3. Fall back to Playwright + stealth │
│ ↓ (always) │
│ 4. Route through residential proxies │
└─────────────────────────────────────────┘
And here's a working implementation:
from curl_cffi import requests as curl_requests
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync
from urllib.parse import urlparse
import random
import time
import re


class AwsWafBypass:
    def __init__(self, proxies=None):
        self.proxies = proxies or []
        self.token_cache = {}  # domain -> (token, expiry)
        self.session = curl_requests.Session()

    def fetch(self, url):
        """Main entry point: tries methods in order."""
        domain = urlparse(url).netloc
        # Check for cached valid token
        token = self._get_cached_token(domain)
        proxy_dict = self._get_proxy_dict()

        # Attempt 1: Direct request with TLS spoofing
        response = self.session.get(
            url,
            impersonate="chrome",
            proxies=proxy_dict,
            cookies={"aws-waf-token": token} if token else {},
            timeout=30,
        )
        if response.status_code == 200:
            return response

        # Attempt 2: Token solver
        if response.status_code in [202, 405]:
            token = self._solve_token(response.text, domain)
            if token:
                self._cache_token(domain, token)
                response = self.session.get(
                    url,
                    impersonate="chrome",
                    proxies=proxy_dict,
                    cookies={"aws-waf-token": token},
                )
                if response.status_code == 200:
                    return response

        # Attempt 3: Full browser fallback
        token = self._browser_solve(url)
        if token:
            self._cache_token(domain, token)
            response = self.session.get(
                url,
                impersonate="chrome",
                proxies=proxy_dict,
                cookies={"aws-waf-token": token},
            )
        return response
The helper methods handle each bypass layer:
    def _solve_token(self, html, domain):
        """Try the open-source solver first, since it's fastest."""
        try:
            from awswaf.aws import AwsWaf
            key = re.search(r'"key":"([^"]+)"', html)
            iv = re.search(r'"iv":"([^"]+)"', html)
            ctx = re.search(r'"context":"([^"]+)"', html)
            if key and iv and ctx:
                goku = {
                    "key": key.group(1),
                    "iv": iv.group(1),
                    "context": ctx.group(1),
                }
                # Extract endpoint from challenge script URL
                endpoint = re.search(
                    r'src="(https://[^"]*awswaf[^"]*)"', html
                )
                ep = endpoint.group(1) if endpoint else ""
                return AwsWaf(goku, ep, domain)()
        except Exception as e:
            print(f"Solver failed: {e}")
        return None

    def _browser_solve(self, url):
        """Playwright fallback: slower but very reliable."""
        try:
            with sync_playwright() as p:
                browser = p.chromium.launch(headless=True)
                ctx = browser.new_context()
                page = ctx.new_page()
                stealth_sync(page)
                page.goto(url, wait_until="networkidle")
                time.sleep(5)
                for cookie in ctx.cookies():
                    if "aws-waf-token" in cookie["name"]:
                        browser.close()
                        return cookie["value"]
                browser.close()
        except Exception as e:
            print(f"Browser fallback failed: {e}")
        return None

    def _get_proxy_dict(self):
        if not self.proxies:
            return {}
        proxy = random.choice(self.proxies)
        return {"https": proxy, "http": proxy}

    def _get_cached_token(self, domain):
        if domain in self.token_cache:
            token, expiry = self.token_cache[domain]
            if time.time() < expiry:
                return token
            del self.token_cache[domain]
        return None

    def _cache_token(self, domain, token, ttl=300):
        """Cache token for 5 minutes (conservative default)."""
        self.token_cache[domain] = (token, time.time() + ttl)
Use the engine like this:
proxies = [
    "http://user:pass@residential1.provider.com:8080",
    "http://user:pass@residential2.provider.com:8080",
]
engine = AwsWafBypass(proxies=proxies)

urls = [
    "https://protected-site.com/page/1",
    "https://protected-site.com/page/2",
    "https://protected-site.com/page/3",
]
for url in urls:
    response = engine.fetch(url)
    print(f"{url} → {response.status_code}")
    time.sleep(random.uniform(1, 3))  # human-like delays
The time.sleep(random.uniform(1, 3)) at the end isn't optional. AWS WAF's behavioral analysis flags perfectly timed requests. Adding jitter makes your traffic pattern look human.
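The fixed 1 to 3 second jitter can be extended so the scraper slows down further when it starts getting blocked. This is a sketch of one possible policy, not something the guide's engine already does: `next_delay` and its parameters are hypothetical, and doubling per consecutive error with a 60-second cap is an arbitrary choice.

```python
import random


def next_delay(consecutive_errors: int, low: float = 1.0, high: float = 3.0,
               cap: float = 60.0, rng=random) -> float:
    """Jittered delay that doubles for each consecutive blocked request."""
    factor = 2 ** consecutive_errors          # 1, 2, 4, 8, ...
    return min(cap, rng.uniform(low, high) * factor)


rng = random.Random(42)  # seeded only so the demo is repeatable
for errors in range(5):
    print(errors, round(next_delay(errors, rng=rng), 2))
```

With no errors this behaves like the original `random.uniform(1, 3)`; after a few consecutive blocks the delay grows until it reaches the cap, giving rate-based rules time to reset.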
Token caching is the performance multiplier here. Without it, you're solving a new challenge for every request. With a 5-minute cache, you solve once and scrape freely until expiry.
Which Method Should You Use?
| Your Situation | Start With | Why |
|---|---|---|
| Getting 403s on first request | Method 1 (curl_cffi) | TLS fingerprint is likely the issue |
| Getting 202 challenge pages | Method 3 (token solver) | Fastest way to solve JS challenges |
| Token solver fails | Method 2 (Playwright) | A real browser runs the challenge script natively |
| Getting blocked after a few requests | Method 4 (proxies) | Your IP is being flagged |
| Building a production pipeline | Method 5 (combined) | Layer all defenses for reliability |
If you're hitting a site protected by AWS WAF Bot Control (a premium add-on that includes ML-based detection), expect to need the combined approach. Bot Control checks are more sophisticated than standard AWS WAF rules.
Troubleshooting
"403 Forbidden" on every request
Cause: Your IP is on a blocklist, or your TLS fingerprint is non-browser.
Fix: Switch to curl_cffi with impersonate="chrome". If still blocked, route through a residential proxy. Datacenter IPs get blocked preemptively by AWS WAF.
"202" response with challenge HTML
Cause: AWS WAF is serving a JavaScript challenge. Your client can't execute it.
Fix: Use the open-source token solver (Method 3) or Playwright (Method 2). Parse the response for window.gokuProps to extract the challenge parameters.
Token works once, then stops
Cause: The aws-waf-token has expired. Token TTL varies per site (typically 5–15 minutes).
Fix: Implement token refresh logic. When you get a 403 after previously successful requests, discard the cached token and re-solve the challenge.
Playwright gets detected
Cause: Missing the stealth plugin, or Playwright's automation flags are visible.
Fix: Ensure stealth_sync(page) is called before navigation. Add the --disable-blink-features=AutomationControlled flag. Use a realistic viewport size and user agent string.
"405 Method Not Allowed" response
Cause: The site uses AWS WAF's CAPTCHA challenge, not just JavaScript.
Fix: CAPTCHA challenges are harder. You'll need either a CAPTCHA-solving approach or manual intervention. The open-source solver handles some 405 scenarios via its Gemini integration, but reliability varies.
AWS WAF Managed Rules: XSS, SQLi, and CrossSiteScripting_BODY
If you're doing security testing rather than scraping, AWS WAF's managed rule groups are what you're fighting against. The two most common are the Core Rule Set (CRS) and the SQL Injection rule set.
The CrossSiteScripting_BODY rule inspects request bodies for XSS payloads. It catches most standard <script> tags, event handlers like onerror and onload, and common injection patterns. The rule operates on the first 8KB of the body — the same inspection limit discussed earlier.
The SQLiRuleSet works similarly for SQL injection patterns. It flags common payloads like ' OR 1=1-- and UNION SELECT statements.
Both rule sets have documented bypasses. Sysdig's research team found that the onbeforetoggle DOM event wasn't caught by the CRS XSS rules. Characters like <!, /, !, %, and ? have been observed to bypass pattern matching in certain AWS WAF configurations.
A few things worth knowing about AWS WAF rule behavior:
SizeConstraint rules let site owners reject requests over a certain size. This is the counter to the 8KB inspection limit — but it also blocks legitimate large uploads, so not everyone enables it.
Custom rules can match specific patterns, geographic origins, or header values. Every AWS WAF deployment is different because site owners mix managed rules with custom logic.
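Rate-based rules track request volume per IP. These are separate from the challenge system and trigger after a configurable threshold. The minimum was long 100 requests per 5-minute window; AWS has since lowered the floor and added shorter evaluation windows, so check the current documentation for exact limits.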
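On the client side, the simplest way to stay under a rate-based rule is to enforce your own budget below the site's threshold. The sketch below is a sliding-window limiter; the class name and the 2-requests-per-10-seconds demo budget are invented, and the real threshold of any given site is unknown, so pick a conservative number.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Tracks send times and reports how long to wait to stay within a budget."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock=time.monotonic):
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock          # injectable so the limiter can be tested
        self._sent = deque()         # timestamps of requests inside the window

    def wait_time(self) -> float:
        """Seconds to wait before the next request fits the budget (0.0 if now)."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self._window:
            self._sent.popleft()
        if len(self._sent) < self._max:
            return 0.0
        return self._window - (now - self._sent[0])

    def record(self) -> None:
        """Call right after sending a request."""
        self._sent.append(self._clock())


# Demo with a fake clock: a budget of 2 requests per 10 seconds.
fake_now = [0.0]
limiter = SlidingWindowLimiter(2, 10.0, clock=lambda: fake_now[0])
limiter.record()
fake_now[0] = 1.0
limiter.record()
fake_now[0] = 2.0
print(limiter.wait_time())  # budget is used up until the first request ages out
```

In a scraping loop, sleep for `wait_time()` before each request and call `record()` after it.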
For a full catalog of WAF bypass techniques across every major provider, the 0xInfection/Awesome-WAF repository on GitHub is the most complete reference. It covers fingerprinting methods, known bypasses, and evasion techniques for AWS WAF, Cloudflare, Akamai, Imperva, and dozens of other providers.
Tools and Resources
Here's a reference list of every tool mentioned in this guide, plus a few extras worth bookmarking.
| Tool | What It Does | Link |
|---|---|---|
| curl_cffi | Python HTTP client with TLS fingerprint spoofing | GitHub |
| curl-impersonate | Command-line curl with browser TLS signatures | GitHub |
| xKiian/awswaf | Open-source AWS WAF token generator (Python + Go) | GitHub |
| playwright-stealth | Anti-detection patches for Playwright | PyPI |
| Awesome-WAF | Encyclopedia of WAF bypasses and fingerprints | GitHub |
| waf-bypass (Nemesida) | WAF testing tool for false positives/negatives | GitHub |
| waf-btk | HTTP proxy for body padding past 8KB limit | GitHub |
A Note on Responsible Scraping
AWS WAF exists to protect web applications. Bypassing it to scrape public data is one thing. Using these techniques to access private data, overload servers, or violate terms of service is another.
Always respect robots.txt directives. Rate-limit your requests — if you're scraping 10 pages per second, you're probably causing problems. Identify yourself with a descriptive User-Agent when possible.
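Checking robots.txt can be automated with Python's standard library. The sketch below parses an in-memory sample so it runs without network access; the bot name, sample rules, and URLs are made up. In a real scraper you would fetch the site's actual robots.txt and pass its text to the same helper.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that was fetched separately."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

parser = build_parser(SAMPLE_ROBOTS)
agent = "my-research-bot"  # hypothetical descriptive User-Agent token

for url in ("https://protected-site.com/page/1",
            "https://protected-site.com/private/report"):
    print(url, parser.can_fetch(agent, url))

print("crawl delay:", parser.crawl_delay(agent))
```

When a Crawl-delay is present, use it as the lower bound for the pause between requests.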
The techniques here work because AWS WAF can't perfectly distinguish between a human user and a well-built scraper. Use that capability for legitimate data collection, not abuse.
Wrapping Up
AWS WAF is one of the more layered protections you'll encounter, but each layer has a specific bypass. TLS fingerprint spoofing handles the handshake check. Browser automation or the open-source solver handles JavaScript challenges. Residential proxies handle IP reputation.
For most scrapers, the open-source token solver combined with curl_cffi and proxy rotation handles the majority of AWS WAF-protected sites, though the exact share depends on how each site configures its rules. Fall back to Playwright when the solver can't handle a particular challenge variant.
Things change fast. AWS updates their detection regularly, and the open-source tools update in response. Keep your dependencies current and monitor your success rates.