How to bypass AWS WAF in 2026: 5 working (free) methods

You've built your scraper. It pulls clean data from test pages without breaking a sweat. Then you point it at a real target and get slapped with a 403 or a blank page that says "Access Denied."

That's AWS WAF. Amazon's firewall sits in front of millions of sites, and it's gotten significantly harder to get past in the last year.

This guide covers five methods to bypass AWS WAF, from quick header fixes to open-source token solvers. Every approach here is something you build and control yourself: no paid solver APIs, no managed scraping services. The only recurring cost is residential proxy bandwidth in Methods 4 and 5.

What Is AWS WAF and How Does It Block Scrapers?

AWS WAF (Web Application Firewall) is Amazon's cloud-based security layer that inspects HTTP/HTTPS traffic before it reaches the origin server. It uses multiple detection layers simultaneously — IP reputation, TLS fingerprinting, JavaScript challenges, and behavioral analysis. The most common trigger for scrapers is the aws-waf-token cookie challenge, which requires valid JavaScript execution to pass.

Here's what happens when your scraper hits an AWS WAF-protected site. The firewall checks your request against several signals at once.

IP reputation is the first gate. AWS publishes managed rule groups (the Amazon IP reputation list and the Anonymous IP list) that cover previously flagged addresses, VPN and proxy endpoints, and hosting-provider ranges. On sites that enable them, a request from AWS, GCP, Hetzner, or any major cloud provider is likely flagged before anything else is evaluated.

TLS fingerprinting is the second check. Every HTTP client produces a unique signature during the TLS handshake. Python's requests library has a fingerprint that looks nothing like Chrome. AWS WAF can tell the difference instantly.

JavaScript challenges are where most scrapers die. AWS WAF injects a script that collects browser telemetry — Canvas rendering, WebGL capabilities, installed fonts, screen dimensions. Your browser executes this script and generates an aws-waf-token cookie. No token, no access.

Rate limiting catches everything else. Too many requests from one IP in a short window triggers throttling or a full block. AWS WAF also tracks behavioral patterns like navigation speed and request sequencing.

AWS WAF has two challenge levels you'll encounter:

  • HTTP 202 response — JavaScript challenge only. Your client needs to execute JS and return the computed token.
  • HTTP 405 response — Full CAPTCHA. A visual puzzle on top of the JS challenge. Much harder to handle programmatically.
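
Because the two challenge levels map onto distinct status codes, a scraper can branch on them before doing anything expensive. The following is a minimal sketch, not an official API: classify_waf_response is a hypothetical helper name, and the header values "challenge" and "captcha" for x-amzn-waf-action are an assumption about what the firewall sends.

```python
def classify_waf_response(status_code, headers=None):
    """Guess which kind of AWS WAF response we are looking at.

    Returns one of: "js_challenge", "captcha", "blocked", "none".
    """
    # Header names are case-insensitive, so normalize before lookup.
    normalized = {k.lower(): str(v).lower() for k, v in (headers or {}).items()}
    action = normalized.get("x-amzn-waf-action")  # assumed values: challenge / captcha

    # Prefer the explicit header when it is present.
    if action == "challenge":
        return "js_challenge"
    if action == "captcha":
        return "captcha"

    # Otherwise fall back to the status codes described above.
    if status_code == 202:
        return "js_challenge"
    if status_code == 405:
        return "captcha"
    if status_code == 403:
        return "blocked"
    return "none"
```

A caller would pass response.status_code and response.headers from any HTTP client and pick a strategy from the returned label.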

The AWS WAF 8KB Body Inspection Limit

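There's a well-documented limitation worth knowing: AWS WAF inspects only the first part of a request body. Per AWS's documentation, the limit is 8 KB (8,192 bytes) for Application Load Balancer and AppSync, and 16 KB by default for CloudFront, API Gateway, and several other integrations, where site owners can raise it to as much as 64 KB. What happens to an oversized body depends on each rule's oversize handling setting. With the Continue option, the rule evaluates only the inspected portion and the rest reaches the origin unexamined.

This limit is relevant if you're sending POST requests to an API endpoint behind AWS WAF. Content that sits beyond the inspected portion isn't seen by body-matching rules, although the first part of the body is still evaluated as usual. The ghostsecurityarchive/waf-btk project on GitHub demonstrates this with an HTTP proxy that pads request bodies automatically.

AWS gives site owners two counters: size constraint rules that reject oversized requests outright, and oversize handling settings that treat any over-limit body as a match. But not every site configures them, and legitimate file uploads or large JSON payloads often exceed the limit anyway, making blanket size restrictions impractical.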

For scraping, this matters less than the token challenge. But if you're interacting with a WAF-protected API that blocks specific request patterns, padding your POST body past 8KB is a valid trick.

5 Methods to Bypass AWS WAF

Here's what each method costs, how hard it is to set up, and when it works best.

Method                     | Difficulty | Cost | Best For                    | Reliability
TLS fingerprint spoofing   | Easy       | Free | Sites with light protection | Medium
Browser automation         | Medium     | Free | JS challenge bypass         | High
Open-source token solver   | Medium     | Free | High-volume scraping        | High
Residential proxy rotation | Easy       | $$   | IP-based blocking           | Medium
Combined bypass engine     | Hard       | $$   | Production-scale scraping   | Very High

Quick recommendation: If you're blocked right now and need data today, start with Method 2 (Playwright). For production workloads, jump straight to Method 5.

How to Identify AWS WAF Protection

Before you try to bypass anything, confirm you're actually dealing with AWS WAF. Misidentifying the protection wastes time.

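Check your response headers for x-amzn-waf-action. An x-amz-cf-id header only tells you the site sits behind CloudFront, which often fronts AWS WAF but doesn't prove it. Look at the response body for references to challenge.js or captcha.js from an awswaf.com domain.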

The most reliable signal is the cookie itself. Open your browser's dev tools, visit the target, and check for a cookie named aws-waf-token.

Here's a quick Python script that checks programmatically:

import requests

url = "https://target-site.com"
# A timeout keeps the check from hanging on an unresponsive host
response = requests.get(url, allow_redirects=False, timeout=15)

# Check status code (202 = JS challenge, 405 = CAPTCHA)
if response.status_code in [202, 405]:
    print(f"AWS WAF challenge detected (HTTP {response.status_code})")

# x-amzn-waf-action points at AWS WAF itself; x-amz-cf-id only shows
# that CloudFront is in front of the site
waf_headers = ['x-amzn-waf-action', 'x-amz-cf-id']
for header in waf_headers:
    if header in response.headers:
        print(f"Found AWS edge header: {header}")

# Check for challenge scripts in response body
if 'awswaf.com' in response.text or 'challenge.js' in response.text:
    print("AWS WAF challenge page detected in response body")

A 202 status code is the clearest indicator. Standard web servers almost never return 202 for a GET request — that's AWS WAF telling you to solve its challenge.

Basic Methods

1. TLS Fingerprint Spoofing with curl_cffi

Standard Python HTTP clients like requests and httpx have TLS fingerprints that look nothing like a real browser. AWS WAF can match rules against the JA3 fingerprint derived from the handshake, so a site can reject non-browser clients before any challenge or content is served.

curl_cffi solves this by wrapping curl-impersonate, which replicates the exact TLS signatures of real browsers — including JA3 hashes, HTTP/2 settings, and header ordering.

Best for: Sites where AWS WAF is set to low/medium sensitivity
Difficulty: Easy
Cost: Free
Reliability against AWS WAF: Medium — passes TLS checks but won't solve JS challenges alone

Install it first, ideally inside a virtual environment:

pip install curl_cffi

Here's how to make a request that looks like it came from Chrome:

from curl_cffi import requests

# impersonate="chrome" uses the latest Chrome TLS fingerprint
response = requests.get(
    "https://protected-site.com",
    impersonate="chrome",  # matches Chrome's exact TLS signature
    headers={
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
    }
)

print(f"Status: {response.status_code}")
print(f"Cookies: {response.cookies}")

The impersonate="chrome" parameter does the heavy lifting. It changes the entire TLS handshake to match Chrome's signature — cipher suites, extensions, ALPN protocols, everything.

You can also use sessions to maintain cookies across requests, which matters because the aws-waf-token needs to persist:

from curl_cffi import requests

session = requests.Session()

# First request — may trigger a challenge
response = session.get(
    "https://protected-site.com",
    impersonate="chrome"
)

# If we got a token cookie, subsequent requests use it automatically
response = session.get(
    "https://protected-site.com/data-page",
    impersonate="chrome"
)
print(response.text[:500])

Sessions preserve the aws-waf-token cookie across requests. If the initial request passes the TLS check and returns data, you're set.

When this works: Sites with basic AWS WAF rules that primarily check TLS fingerprints and headers. You'll know within one request.

When this fails: Sites that require JavaScript execution to generate the aws-waf-token. If you get a 202 response even with curl_cffi, you need Method 2 or 3.

Bypass AWS WAF from the Command Line

If you prefer working from the terminal, curl-impersonate gives you the same TLS spoofing as curl_cffi but as a drop-in replacement for curl. It ships as Docker images — one for Chrome's fingerprint, one for Firefox's.

Pull the Chrome variant and make a request. Each docker run --rm container is discarded on exit, so mount the current directory to keep the cookie jar on the host:

docker pull lwthiker/curl-impersonate:0.6-chrome

docker run --rm -v "$PWD:/data" lwthiker/curl-impersonate:0.6-chrome \
  curl_chrome116 \
  -s -L \
  -H "Accept: text/html,application/xhtml+xml" \
  -H "Accept-Language: en-US,en;q=0.9" \
  -c /data/cookies.txt \
  "https://protected-site.com"

The curl_chrome116 wrapper produces TLS handshakes that match Chrome 116. The -c flag saves cookies, including any aws-waf-token, to cookies.txt in the mounted directory for reuse in subsequent requests.

To reuse the token on follow-up requests:

docker run --rm -v "$PWD:/data" lwthiker/curl-impersonate:0.6-chrome \
  curl_chrome116 \
  -s -L \
  -b /data/cookies.txt \
  "https://protected-site.com/api/data"

This is useful for quick testing and debugging before you write Python code. If curl_chrome116 gets you a 200, you know TLS spoofing is enough. If you get a 202, you need the token solver or browser automation.

Intermediate Methods

2. Browser Automation with Playwright Stealth

When AWS WAF demands JavaScript execution, you need an actual browser. Playwright with stealth plugins gives you a real Chromium instance that executes the challenge script and generates a valid token.

Best for: Sites requiring JS challenge completion
Difficulty: Medium
Cost: Free
Reliability against AWS WAF: High for JS challenges, medium for CAPTCHA challenges

The key ingredient is playwright-stealth, which patches the telltale signs that Playwright is automated — things like navigator.webdriver being set to true, missing plugin arrays, and incorrect viewport properties.

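Install the dependencies. The stealth_sync import used below comes from the 1.x line of playwright-stealth, so pin the version if a newer release fails to import it:

pip install playwright "playwright-stealth<2"
playwright install chromium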

Here's a working implementation:

from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync
import time

def bypass_aws_waf(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=True,
            args=[
                "--disable-blink-features=AutomationControlled",
                "--no-sandbox",
            ]
        )
        context = browser.new_context(
            viewport={"width": 1920, "height": 1080},
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                       "AppleWebKit/537.36 (KHTML, like Gecko) "
                       "Chrome/120.0.0.0 Safari/537.36",
        )
        page = context.new_page()
        stealth_sync(page)  # patches automation detection signals

        page.goto(url, wait_until="networkidle")

        # Poll for the aws-waf-token cookie instead of sleeping a fixed time;
        # the challenge usually resolves within a few seconds
        waf_token = None
        deadline = time.time() + 10
        while waf_token is None and time.time() < deadline:
            for cookie in context.cookies():
                if "aws-waf-token" in cookie["name"]:
                    waf_token = cookie["value"]
                    break
            else:
                # No token yet: let the page keep running, then check again
                page.wait_for_timeout(500)

        content = page.content()
        browser.close()
        return waf_token, content

token, html = bypass_aws_waf("https://protected-site.com")
if token:
    print(f"Got WAF token: {token[:50]}...")
else:
    print("No WAF token found — site may not use AWS WAF")

The stealth_sync(page) call is doing most of the anti-detection work. Without it, default automation signals such as navigator.webdriver make headless Playwright easy for AWS WAF's browser fingerprinting to flag.

Reusing tokens across HTTP requests is where this gets practical. You don't want to spin up a browser for every request. Get the token once, then use it with curl_cffi:

from curl_cffi import requests as curl_requests

def scrape_with_token(url, waf_token):
    """Use a browser-obtained token with fast HTTP requests."""
    session = curl_requests.Session()
    response = session.get(
        url,
        impersonate="chrome",
        cookies={"aws-waf-token": waf_token},
    )
    return response

# Get token via Playwright (Method 2)
token, _ = bypass_aws_waf("https://protected-site.com")

# Reuse token for fast scraping
for page_num in range(1, 50):
    url = f"https://protected-site.com/data?page={page_num}"
    response = scrape_with_token(url, token)
    print(f"Page {page_num}: {response.status_code}")

This hybrid approach gives you the reliability of a real browser for token generation and the speed of HTTP requests for data collection.

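Token expiration matters. AWS documents a default token immunity time of 300 seconds, and site owners can configure it from 60 seconds up to three days, so lifetimes vary by site. Build in a refresh mechanism that re-launches the browser when a previously working token starts returning 202 or 403.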
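
One way to structure that refresh logic is a small cache that re-solves only when the token is missing, expired, or explicitly rejected. This is a sketch under stated assumptions: TokenManager is a hypothetical class, the 300-second default is a conservative guess rather than a value read from the target site, and solve stands in for whatever function produces a token (for example, a wrapper around the Playwright function above).

```python
import time


class TokenManager:
    """Cache a single token and re-solve when it expires or is rejected."""

    def __init__(self, solve, ttl=300, clock=time.monotonic):
        self._solve = solve        # callable that returns a fresh token string
        self._ttl = ttl            # seconds to trust a token before re-solving
        self._clock = clock        # injectable so the logic can be tested
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return a cached token, solving a new challenge only when needed."""
        if self._token is None or self._clock() >= self._expires_at:
            self._token = self._solve()
            self._expires_at = self._clock() + self._ttl
        return self._token

    def invalidate(self):
        """Call this when the server rejects the token before the TTL is up."""
        self._token = None
```

In the scraping loop, call get() before each request and invalidate() whenever a request that should have worked comes back as a challenge or a block.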

When this works: Most AWS WAF-protected sites with JS challenges (HTTP 202). Very reliable for sites that don't rotate their challenge scripts frequently.

When this fails: Sites with full CAPTCHA challenges (HTTP 405), or sites that re-validate the token with every request. Also slow — each browser session takes 5–10 seconds to initialize.

3. Open-Source Token Solver

This is the most powerful method available right now. An open-source project reverse-engineered the AWS WAF token generation process, letting you compute valid tokens without a browser.

Best for: High-volume scraping where browser overhead is unacceptable
Difficulty: Medium
Cost: Free
Reliability against AWS WAF: High

The solver extracts the encrypted challenge parameters (key, iv, context) from the initial AWS WAF response and computes the token locally. No headless browser, no JavaScript execution — just math.

The project is xKiian/awswaf on GitHub, available in both Python and Golang.

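Install it. The command below assumes the package is published on PyPI under this name; if pip can't find it, follow the install steps in the repository's README:

pip install awswaf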

Here's how it works in practice:

import re
from curl_cffi import requests as curl_requests

SITE_URL = "https://protected-site.com"

session = curl_requests.Session()

# Step 1: Hit the site and get the challenge page
response = session.get(SITE_URL, impersonate="chrome")

if response.status_code == 202:
    print("AWS WAF challenge detected — extracting parameters")

    html = response.text

    # Step 2: Extract the gokuProps parameters from the HTML
    key_match = re.search(r'"key":"([^"]+)"', html)
    iv_match = re.search(r'"iv":"([^"]+)"', html)
    context_match = re.search(r'"context":"([^"]+)"', html)

    if key_match and iv_match and context_match:
        key = key_match.group(1)
        iv = iv_match.group(1)
        context = context_match.group(1)

        print(f"Extracted key: {key[:20]}...")
        print(f"Extracted iv: {iv[:20]}...")

Those three parameters — key, iv, and context — are everything the solver needs. They're embedded in the challenge HTML inside a window.gokuProps object.
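
Since the parameters live in a single object, parsing that object once can be more robust than three separate regexes. The sketch below assumes the challenge HTML assigns a flat JSON object literal to window.gokuProps; extract_goku_props is a hypothetical helper, and the sample markup in the usage note is invented for illustration.

```python
import json
import re

# Assumes a flat object literal with no nested braces.
GOKU_PATTERN = re.compile(r"window\.gokuProps\s*=\s*(\{.*?\})", re.DOTALL)
REQUIRED_FIELDS = ("key", "iv", "context")


def extract_goku_props(html):
    """Return the key/iv/context dict from challenge HTML, or None if absent."""
    match = GOKU_PATTERN.search(html)
    if match is None:
        return None
    try:
        props = json.loads(match.group(1))
    except json.JSONDecodeError:
        # The object was not valid JSON, so the page format differs from
        # what this sketch assumes.
        return None
    if not all(name in props for name in REQUIRED_FIELDS):
        return None
    return {name: props[name] for name in REQUIRED_FIELDS}
```

For example, calling it on '<script>window.gokuProps = {"key":"AQID","iv":"BAUG","context":"ctx"};</script>' yields a dict with those three fields, while a page without the object yields None, which is a cleaner signal than three independent match checks.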

Now feed them to the solver:

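import re
from awswaf.aws import AwsWaf

# Build the goku params dict from extracted values
goku = {"key": key, "iv": iv, "context": context}

# The second argument is the WAF endpoint taken from the challenge script URL.
# Assumption: the script tag's src points at an awswaf host; check the
# solver's README for the exact form it expects.
endpoint_match = re.search(r'src="(https://[^"]*awswaf[^"]*)"', html)
if endpoint_match is None:
    raise RuntimeError("Challenge script URL not found in response")
endpoint_url = endpoint_match.group(1)

# The solver computes the token locally
token = AwsWaf(goku, endpoint_url, "protected-site.com")()

# Use the token in subsequent requests
response = session.get(
    SITE_URL,
    impersonate="chrome",
    cookies={"aws-waf-token": token},
)
print(f"Status after bypass: {response.status_code}")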

The AwsWaf class handles the cryptographic computation that would normally happen inside your browser's JavaScript engine. It produces the same aws-waf-token that a real browser would generate.

Why this matters for scale: A Playwright instance takes 5–10 seconds per token. The solver skips browser startup and page rendering entirely, so it typically returns a token in a fraction of that time. At high volumes, that difference is the gap between a working pipeline and an unusable one.

Limitations to know about: AWS periodically updates their challenge scripts. When they do, the solver may need updating too. The maintainers have been responsive, but there's always a window where a new WAF version breaks things. Monitor the GitHub issues to stay current.

The Golang version of the solver is even faster. If you're running a Go-based scraping pipeline, use it instead of the Python wrapper.

How the AWS WAF Token Generator Works

If you're wondering what's actually happening inside the solver, here's the short version. When AWS WAF serves a challenge page, it embeds encrypted parameters in a JavaScript object called window.gokuProps. That object contains three fields:

  • key — An AES encryption key used to encrypt the challenge response
  • iv — The initialization vector for the AES cipher
  • context — A session identifier tied to your specific challenge

The challenge script (challenge.js) normally runs in your browser, collects telemetry data (screen resolution, installed plugins, Canvas fingerprint, WebGL renderer), encrypts it using those parameters, and sends the result to AWS's verification endpoint. If the payload looks legitimate, AWS returns an aws-waf-token cookie.

The open-source solver skips the browser entirely. It constructs a plausible telemetry payload, encrypts it with the same AES parameters, and submits it directly. The result is a valid aws-waf-token that AWS accepts.

This is why the solver breaks when AWS changes their challenge script — the expected telemetry format or encryption flow changes, and the solver needs updating to match.

Advanced Methods

4. Residential Proxy Rotation

IP reputation is one of AWS WAF's strongest signals. A datacenter IP from Hetzner or DigitalOcean gets flagged before anything else is even evaluated.

Residential proxies solve this by routing your requests through real ISP-assigned IPs. AWS WAF sees traffic from Comcast, Verizon, or Deutsche Telekom — the same providers that real users connect through.

Best for: Avoiding IP-based blocks and rate limiting
Difficulty: Easy (once you have proxy access)
Cost: $$ (residential proxy bandwidth isn't cheap)
Reliability against AWS WAF: Medium — solves IP problems but not challenge problems

This method works best combined with Methods 1–3. Proxies handle the IP layer while TLS spoofing or token solving handles the challenge layer.

Here's how to set up rotating residential proxies with curl_cffi:

from curl_cffi import requests as curl_requests
import random

# Your proxy list — residential IPs from your provider
proxies = [
    "http://user:pass@us-res-1.provider.com:8080",
    "http://user:pass@us-res-2.provider.com:8080",
    "http://user:pass@de-res-1.provider.com:8080",
    "http://user:pass@uk-res-1.provider.com:8080",
]

def fetch_with_rotation(url):
    proxy = random.choice(proxies)
    response = curl_requests.get(
        url,
        impersonate="chrome",
        proxies={"https": proxy, "http": proxy},
        timeout=30,
    )
    return response

response = fetch_with_rotation("https://protected-site.com")
print(f"Status: {response.status_code}")

Random selection is the simplest rotation strategy, but not the best. A smarter approach tracks which proxies are getting blocked:

import random
import time

class ProxyRotator:
    def __init__(self, proxy_list):
        self.proxies = proxy_list
        self.failures = {}  # proxy -> timestamps of recent failures

    def get_proxy(self):
        """Return a random proxy among those with the fewest recent failures."""
        now = time.time()
        # Drop failures older than 10 minutes
        for proxy in list(self.failures):
            self.failures[proxy] = [
                t for t in self.failures[proxy]
                if now - t < 600
            ]

        # Pick randomly among the least-failed proxies so healthy IPs
        # share the load instead of the first one taking every request
        counts = {p: len(self.failures.get(p, [])) for p in self.proxies}
        best = min(counts.values())
        return random.choice([p for p, n in counts.items() if n == best])

    def report_failure(self, proxy):
        self.failures.setdefault(proxy, []).append(time.time())

This rotator automatically avoids proxies that are getting blocked, spreading the load across healthy IPs.
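
A rotator only helps if something reports failures back to it. The sketch below shows one way to wire that up. It assumes any rotator object exposing get_proxy() and report_failure(), as the class above does. fetch_with_failover and its fetch parameter are hypothetical names, and treating 403 and 429 as "blocked" is an assumption you should tune per site.

```python
def fetch_with_failover(url, rotator, fetch, max_attempts=3, blocked=(403, 429)):
    """Try up to max_attempts proxies, reporting each one that fails.

    fetch is any callable taking (url, proxy) and returning an object
    with a status_code attribute.
    """
    last_error = None
    for _ in range(max_attempts):
        proxy = rotator.get_proxy()
        try:
            response = fetch(url, proxy)
        except Exception as exc:
            # Network errors count against the proxy as well.
            rotator.report_failure(proxy)
            last_error = exc
            continue
        if response.status_code in blocked:
            rotator.report_failure(proxy)
            continue
        return response
    raise RuntimeError(f"All {max_attempts} attempts failed for {url}") from last_error
```

Challenge responses (202, 405) are deliberately not treated as proxy failures here, since they indicate a missing token rather than a bad IP.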

Geo-targeting matters. If the site you're scraping is US-based, use US residential proxies. AWS WAF rules can match on country, and inconsistencies stand out: a session whose language headers and browser timezone say one country while its IP says another looks out of place.

When this works: As a complement to other methods. Proxies alone won't bypass JS challenges, but they prevent IP blocks that would make other methods fail.

When this fails: Against sites that validate the aws-waf-token cookie regardless of IP. You still need a valid token even with clean residential IPs.

5. Combined Bypass Engine

In production, no single method is reliable enough on its own. AWS WAF layers its defenses, so your bypass needs to be layered too.

This approach chains the previous four methods into a fault-tolerant system. It tries the fastest method first and falls back to more expensive ones when needed.

Best for: Production-scale scraping
Difficulty: Hard
Cost: $$ (proxy costs, but solver and browser are free)
Reliability against AWS WAF: Very high

Here's the architecture:

┌─────────────────────────────────────────┐
│            Request Handler              │
├─────────────────────────────────────────┤
│ 1. Try curl_cffi with TLS spoofing     │
│    ↓ (if 202/403)                       │
│ 2. Try open-source token solver        │
│    ↓ (if solver fails)                  │
│ 3. Fall back to Playwright + stealth   │
│    ↓ (always)                           │
│ 4. Route through residential proxies   │
└─────────────────────────────────────────┘

And here's a working implementation:

from curl_cffi import requests as curl_requests
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync
import random
import time
import re

class AwsWafBypass:
    def __init__(self, proxies=None):
        self.proxies = proxies or []
        self.token_cache = {}  # domain -> (token, expiry)
        self.session = curl_requests.Session()

    def fetch(self, url):
        """Main entry point — tries methods in order."""
        from urllib.parse import urlparse
        domain = urlparse(url).netloc

        # Check for cached valid token
        token = self._get_cached_token(domain)

        proxy_dict = self._get_proxy_dict()

        # Attempt 1: Direct request with TLS spoofing
        response = self.session.get(
            url,
            impersonate="chrome",
            proxies=proxy_dict,
            cookies={"aws-waf-token": token} if token else {},
            timeout=30,
        )

        if response.status_code == 200:
            return response

        # Attempt 2: Token solver
        if response.status_code in [202, 405]:
            token = self._solve_token(response.text, domain)
            if token:
                self._cache_token(domain, token)
                response = self.session.get(
                    url,
                    impersonate="chrome",
                    proxies=proxy_dict,
                    cookies={"aws-waf-token": token},
                )
                if response.status_code == 200:
                    return response

        # Attempt 3: Full browser fallback
        token = self._browser_solve(url)
        if token:
            self._cache_token(domain, token)
            response = self.session.get(
                url,
                impersonate="chrome",
                proxies=proxy_dict,
                cookies={"aws-waf-token": token},
            )
        return response

The helper methods handle each bypass layer:

    def _solve_token(self, html, domain):
        """Try the open-source solver first — it's fastest."""
        try:
            from awswaf.aws import AwsWaf

            key = re.search(r'"key":"([^"]+)"', html)
            iv = re.search(r'"iv":"([^"]+)"', html)
            ctx = re.search(r'"context":"([^"]+)"', html)

            if key and iv and ctx:
                goku = {
                    "key": key.group(1),
                    "iv": iv.group(1),
                    "context": ctx.group(1),
                }
                # Extract endpoint from challenge script URL
                endpoint = re.search(
                    r'src="(https://[^"]*awswaf[^"]*)"', html
                )
                ep = endpoint.group(1) if endpoint else ""
                return AwsWaf(goku, ep, domain)()
        except Exception as e:
            print(f"Solver failed: {e}")
        return None

    def _browser_solve(self, url):
        """Playwright fallback — slower but very reliable."""
        try:
            with sync_playwright() as p:
                browser = p.chromium.launch(headless=True)
                ctx = browser.new_context()
                page = ctx.new_page()
                stealth_sync(page)
                page.goto(url, wait_until="networkidle")
                time.sleep(5)

                for cookie in ctx.cookies():
                    if "aws-waf-token" in cookie["name"]:
                        browser.close()
                        return cookie["value"]
                browser.close()
        except Exception as e:
            print(f"Browser fallback failed: {e}")
        return None

    def _get_proxy_dict(self):
        if not self.proxies:
            return {}
        proxy = random.choice(self.proxies)
        return {"https": proxy, "http": proxy}

    def _get_cached_token(self, domain):
        if domain in self.token_cache:
            token, expiry = self.token_cache[domain]
            if time.time() < expiry:
                return token
            del self.token_cache[domain]
        return None

    def _cache_token(self, domain, token, ttl=300):
        """Cache token for 5 minutes (conservative default)."""
        self.token_cache[domain] = (token, time.time() + ttl)

Use the engine like this:

proxies = [
    "http://user:pass@residential1.provider.com:8080",
    "http://user:pass@residential2.provider.com:8080",
]

engine = AwsWafBypass(proxies=proxies)

urls = [
    "https://protected-site.com/page/1",
    "https://protected-site.com/page/2",
    "https://protected-site.com/page/3",
]

for url in urls:
    response = engine.fetch(url)
    print(f"{url} → {response.status_code}")
    time.sleep(random.uniform(1, 3))  # human-like delays

The time.sleep(random.uniform(1, 3)) at the end isn't optional. AWS WAF's behavioral analysis flags perfectly timed requests. Adding jitter makes your traffic pattern look human.
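
Jitter pairs naturally with backing off when the site starts pushing back. The sketch below is one possible policy, not a tuned recommendation: next_delay is a hypothetical helper, and the base, spread, and cap values are assumptions. With zero consecutive blocks it reproduces the 1 to 3 second range used above.

```python
import random


def next_delay(consecutive_blocks, base=1.0, spread=2.0, cap=60.0, rng=random):
    """Seconds to wait before the next request.

    Doubles the base wait for every consecutive blocked response, adds
    random jitter on top, and never exceeds cap.
    """
    backoff = base * (2 ** consecutive_blocks)
    return min(cap, backoff + rng.uniform(0, spread))
```

Reset the counter to zero after each successful response, so a single block slows the scraper briefly instead of permanently.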

Token caching is the performance multiplier here. Without it, you're solving a new challenge for every request. With a 5-minute cache, you solve once and scrape freely until expiry.

Which Method Should You Use?

Your Situation                       | Start With              | Why
Getting 403s on first request        | Method 1 (curl_cffi)    | TLS fingerprint is likely the issue
Getting 202 challenge pages          | Method 3 (token solver) | Fastest way to solve JS challenges
Token solver fails                   | Method 2 (Playwright)   | A real browser usually solves JS challenges
Getting blocked after a few requests | Method 4 (proxies)      | Your IP is being flagged
Building a production pipeline       | Method 5 (combined)     | Layer all methods for reliability

If you're hitting a site protected by AWS WAF Bot Control (a premium add-on that includes ML-based detection), expect to need the combined approach. Bot Control checks are more sophisticated than standard AWS WAF rules.

Troubleshooting

"403 Forbidden" on every request

Cause: Your IP is on a blocklist, or your TLS fingerprint is non-browser.

Fix: Switch to curl_cffi with impersonate="chrome". If still blocked, route through a residential proxy. Datacenter IPs get blocked preemptively by AWS WAF.

"202" response with challenge HTML

Cause: AWS WAF is serving a JavaScript challenge. Your client can't execute it.

Fix: Use the open-source token solver (Method 3) or Playwright (Method 2). Parse the response for window.gokuProps to extract the challenge parameters.

Token works once, then stops

Cause: The aws-waf-token has expired. Token lifetime varies per site (AWS's default immunity time is 300 seconds, but owners can change it).

Fix: Implement token refresh logic. When you get a 202 or 403 after previously successful requests, discard the cached token and re-solve the challenge.

Playwright gets detected

Cause: Missing the stealth plugin, or Playwright's automation flags are visible.

Fix: Ensure stealth_sync(page) is called before navigation. Add the --disable-blink-features=AutomationControlled flag. Use a realistic viewport size and user agent string.

"405 Method Not Allowed" response

Cause: The site uses AWS WAF's CAPTCHA challenge, not just JavaScript.

Fix: CAPTCHA challenges are harder. You'll need either a CAPTCHA-solving approach or manual intervention. The open-source solver handles some 405 scenarios via its Gemini integration, but reliability varies.

AWS WAF Managed Rules: XSS, SQLi, and CrossSiteScripting_BODY

If you're doing security testing rather than scraping, AWS WAF's managed rule groups are what you're fighting against. The two most common are the Core Rule Set (CRS) and the SQL Injection rule set.

The CrossSiteScripting_BODY rule inspects request bodies for XSS payloads. It catches most standard <script> tags, event handlers like onerror and onload, and common injection patterns. The rule operates on the first 8KB of the body — the same inspection limit discussed earlier.

The SQLiRuleSet works similarly for SQL injection patterns. It flags common payloads like ' OR 1=1-- and UNION SELECT statements.

Both rule sets have documented bypasses. Sysdig's research team found that the onbeforetoggle DOM event wasn't caught by the CRS XSS rules. Characters like <!, /, !, %, and ? have been observed to bypass pattern matching in certain AWS WAF configurations.

A few things worth knowing about AWS WAF rule behavior:

SizeConstraint rules let site owners reject requests over a certain size. This is the counter to the 8KB inspection limit — but it also blocks legitimate large uploads, so not everyone enables it.

Custom rules can match specific patterns, geographic origins, or header values. Every AWS WAF deployment is different because site owners mix managed rules with custom logic.

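Rate-based rules track request volume per IP. These are separate from the challenge system and trigger after a configurable threshold. The floor was long 100 requests per 5-minute window; AWS has since lowered the minimum to 10 requests and added shorter evaluation windows, so limits can be much tighter than older guides suggest.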
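
Because rate-based rules count requests over a rolling window, the client-side counterpart is a request budget over the same kind of window. The following is a sketch only: SlidingWindowBudget is a hypothetical class, and the real threshold and window are set by the site owner and are not visible to you, so any numbers you pass in are guesses that should err on the low side.

```python
import time
from collections import deque


class SlidingWindowBudget:
    """Track sent requests and report how long to wait before the next one."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock          # injectable for testing
        self._sent = deque()         # timestamps of requests inside the window

    def wait_time(self):
        """Seconds until another request fits in the window (0.0 if it fits now)."""
        now = self._clock()
        # Forget requests that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self._sent[0])

    def record(self):
        """Call right after sending a request."""
        self._sent.append(self._clock())
```

Before each request, sleep for wait_time() seconds, send, then call record(). Keep one budget per outgoing IP, since that is how the source describes the server-side counting.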

For a full catalog of WAF bypass techniques across every major provider, the 0xInfection/Awesome-WAF repository on GitHub is the most complete reference. It covers fingerprinting methods, known bypasses, and evasion techniques for AWS WAF, Cloudflare, Akamai, Imperva, and dozens of other providers.

Tools and Resources

Here's a reference list of every tool mentioned in this guide, plus a few extras worth bookmarking.

Tool                  | What It Does                                      | Source
curl_cffi             | Python HTTP client with TLS fingerprint spoofing  | GitHub
curl-impersonate      | Command-line curl with browser TLS signatures     | GitHub
xKiian/awswaf         | Open-source AWS WAF token generator (Python + Go) | GitHub
playwright-stealth    | Anti-detection patches for Playwright             | PyPI
Awesome-WAF           | Encyclopedia of WAF bypasses and fingerprints     | GitHub
waf-bypass (Nemesida) | WAF testing tool for false positives/negatives    | GitHub
waf-btk               | HTTP proxy for body padding past 8KB limit        | GitHub

A Note on Responsible Scraping

AWS WAF exists to protect web applications. Bypassing it to scrape public data is one thing. Using these techniques to access private data, overload servers, or violate terms of service is another.

Always respect robots.txt directives. Rate-limit your requests — if you're scraping 10 pages per second, you're probably causing problems. Identify yourself with a descriptive User-Agent when possible.
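
Checking robots.txt takes only a few lines with Python's standard library. The sketch below parses a robots.txt body you have already downloaded; build_robots_checker and the "my-research-bot" user agent in the usage note are hypothetical names chosen for illustration.

```python
from urllib.robotparser import RobotFileParser


def build_robots_checker(robots_txt, user_agent):
    """Parse robots.txt text and return (allowed, crawl_delay).

    allowed is a function taking a URL and returning True or False.
    crawl_delay is the Crawl-delay value for this user agent, or None.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    def allowed(url):
        return parser.can_fetch(user_agent, url)

    return allowed, parser.crawl_delay(user_agent)
```

Fetch https://protected-site.com/robots.txt once per domain, pass its text in along with a user agent such as "my-research-bot", skip any URL where allowed(url) is False, and use the returned crawl delay as a lower bound on the pause between requests.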

The techniques here work because AWS WAF can't perfectly distinguish between a human user and a well-built scraper. Use that capability for legitimate data collection, not abuse.

Wrapping Up

AWS WAF is one of the more layered protections you'll encounter, but each layer has a specific bypass. TLS fingerprint spoofing handles the handshake check. Browser automation or the open-source solver handles JavaScript challenges. Residential proxies handle IP reputation.

For most scrapers, the open-source token solver combined with curl_cffi and proxy rotation covers the majority of AWS WAF-protected sites. Fall back to Playwright when the solver can't handle a particular challenge variant.

Things change fast. AWS updates their detection regularly, and the open-source tools update in response. Keep your dependencies current and monitor your success rates.