Datadome protection keeps blocking your scraper? You're dealing with one of the toughest anti-bot systems on the web right now. This guide shows you exactly how to bypass Datadome using techniques that actually work in 2026.
Unlike basic protections, Datadome analyzes over a thousand signals—from TLS fingerprints to mouse movements—to determine if you're human or bot. Standard scraping libraries fail immediately.
But here's the good news: with the right approach, you can dramatically improve your success rate. Let's walk through six proven methods, from lightweight HTTP requests to full browser automation.
What you'll learn:
- How Datadome's multi-layered detection works
- Using curl_cffi for TLS fingerprint impersonation
- Deploying Playwright with stealth plugins
- Rotating residential proxies effectively
- Simulating human behavior patterns
- Avoiding honeypot traps
What is Datadome and How Does It Detect Bots?
Datadome is a bot management platform protecting over 1,200 companies worldwide. Major European retailers, ticketing platforms, and e-commerce sites rely on it to block scrapers, credential stuffing attacks, and DDoS attempts.
The system analyzes incoming traffic at multiple levels before deciding to allow, challenge, or block a request. Understanding how it works is the first step toward building an effective Datadome bypass strategy.
The protection operates through two detection stages. Server-side detection examines your request before any HTML loads. It analyzes TLS fingerprints, HTTP headers, IP reputation, request patterns, and connection metadata. If anything looks suspicious, you'll face a CAPTCHA or outright block.
Client-side detection activates after the page loads. JavaScript executes in your browser, fingerprinting everything—from GPU rendering to font availability to how you move your mouse. This behavioral data feeds into machine learning models that continuously adjust your trust score.
What makes Datadome especially challenging is its application-layer integration. Unlike CDN-based protections like Cloudflare, you can't simply find an origin server to bypass. The protection is baked directly into the application.
Most blocks result in HTTP 403 errors. Sometimes you'll see a challenge page requesting JavaScript be enabled or a slider CAPTCHA. The block pages often contain "datadome" in cookies or script references—useful for identifying what you're up against.
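A quick triage helper makes it easier to tell a Datadome block from an ordinary error before deciding which technique to reach for. Here's a minimal sketch using the standard requests library; the markers are heuristics based on the signs above, so adjust them to whatever your target actually returns:

import requests

def looks_like_datadome_block(response):
    """Heuristic check for a Datadome block or challenge page."""
    blocked_status = response.status_code == 403
    has_marker = ("datadome" in response.cookies
                  or "datadome" in response.text.lower())
    return blocked_status and has_marker

response = requests.get("https://example.com", timeout=10)
print("Datadome block?", looks_like_datadome_block(response))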
Key Detection Signals
Before diving into bypass methods, understand what triggers blocks. Datadome calculates a trust score based on dozens of signals—here are the critical ones:
TLS Fingerprinting creates a JA3 hash from your HTTPS handshake. Cipher suites, extensions, protocol versions, and their ordering combine into a unique fingerprint. Standard Python libraries like requests or httpx have fingerprints that immediately identify them as non-browser clients.
The fingerprint forms during the TLS negotiation before any application data transfers. Different operating systems, browsers, and HTTP libraries produce distinct JA3 hashes. Datadome maintains a database of known bot fingerprints and flags matches.
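You can see this for yourself by asking a TLS fingerprint echo service what it observes. The sketch below assumes an endpoint such as https://tls.browserleaks.com/json is available and returns a JSON field like ja3_hash (verify both before relying on it), and it uses curl_cffi, which Step 1 covers in detail:

import requests as plain_requests
from curl_cffi import requests as curl_requests

# Example fingerprint-echo endpoint; swap in whichever service you trust
ECHO_URL = "https://tls.browserleaks.com/json"

plain = plain_requests.get(ECHO_URL, timeout=10).json()
impersonated = curl_requests.get(ECHO_URL, impersonate="chrome", timeout=10).json()

# Field names depend on the echo service; .get() avoids KeyErrors
print("python-requests JA3:", plain.get("ja3_hash"))
print("curl_cffi (chrome) JA3:", impersonated.get("ja3_hash"))

The first hash stays the same no matter what headers you send; only the second matches a real Chrome handshake.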
HTTP/2 Fingerprinting goes beyond TLS. It analyzes frame ordering, header compression patterns, stream priorities, and connection settings. Real browsers implement HTTP/2 with specific quirks that differ from HTTP libraries—another detection vector that catches many scrapers.
Most scraping libraries default to HTTP/1.1 anyway. Using an outdated protocol version is itself a red flag since modern websites operate on HTTP/2 or HTTP/3.
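You can check which protocol version your client actually negotiates. The sketch below uses httpx, which reports the negotiated version on each response; HTTP/2 support requires the optional extra (pip install httpx[http2]):

import httpx

# Default client negotiates HTTP/1.1
with httpx.Client() as client:
    r = client.get("https://example.com")
    print("default:", r.http_version)  # typically "HTTP/1.1"

# Explicitly enable HTTP/2 (needs the h2 extra installed)
with httpx.Client(http2=True) as client:
    r = client.get("https://example.com")
    print("http2 enabled:", r.http_version)  # "HTTP/2" if the server supports it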
Browser Fingerprinting collects canvas signatures, WebGL data, available fonts, installed plugins, screen resolution, color depth, and the navigator.webdriver property. Default headless browser settings fail these checks spectacularly.
Automated browsers set navigator.webdriver to true by default. They lack plugins like Chrome PDF Viewer. Their canvas rendering produces slightly different outputs. All these signals combine to identify automation.
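You can inspect a few of these giveaways yourself by launching an unpatched headless browser and reading the same properties Datadome checks. This uses Playwright (covered in Step 2) purely for illustration:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com")
    # In a default automated browser these values give the game away
    print("webdriver:", page.evaluate("navigator.webdriver"))      # True
    print("plugins:", page.evaluate("navigator.plugins.length"))   # sparse or empty
    print("languages:", page.evaluate("navigator.languages"))
    browser.close()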
Behavioral Analysis tracks mouse movements, scroll patterns, keyboard input cadence, and time between actions. Real humans have natural jitter in their movements. Bots move in perfectly straight lines and execute actions with millisecond precision.
Datadome also monitors session behavior—how you navigate through pages, time spent reading content, and whether your browsing patterns match typical human behavior.
IP Reputation weighs heavily in the trust score. Datacenter IPs get flagged immediately since virtually no real users browse from AWS or Google Cloud. Residential IPs from major ISPs score much higher. Mobile IPs score highest due to their shared nature and association with real device usage.
Datadome maintains extensive databases of known proxy services, hosting providers, and previously flagged addresses. Request rate from a single IP also factors in—excessive speed triggers rate limiting or blocks.
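Even on clean residential IPs, keep per-IP request rates human. A minimal throttle sketch (the delay range is an assumption to tune per target):

import random
import time

def polite_get(session, url, min_delay=2.0, max_delay=6.0):
    """Fetch a URL, then sleep a randomized interval to keep per-IP rates low."""
    response = session.get(url)
    time.sleep(random.uniform(min_delay, max_delay))
    return response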
Step 1: Use curl_cffi for TLS Fingerprint Impersonation
For pages that don't require heavy JavaScript, you can bypass Datadome without spinning up a browser. The key is proper TLS fingerprinting—and curl_cffi handles this brilliantly.
The curl_cffi library wraps curl-impersonate, a fork of curl that replicates real browser TLS handshakes down to the cipher suite ordering. Your requests become indistinguishable from Chrome or Safari at the network level.
This library is significantly faster than using headless browsers and consumes far fewer resources. When content doesn't require JavaScript rendering, it's the optimal approach.
Install it with:
pip install curl-cffi
Here's a basic example:
from curl_cffi import requests
response = requests.get(
    "https://example.com",
    impersonate="chrome"
)
print(response.status_code)
print(response.text[:500])
The impersonate parameter does the heavy lifting. It's not just setting a User-Agent header—it modifies the entire TLS handshake, HTTP/2 settings, and connection behavior to match Chrome's exact fingerprint.
Supported Browser Versions
curl_cffi supports multiple browser fingerprints:
- chrome, chrome131, chrome136 (latest versions)
- safari, safari_ios
- edge101, edge99
Using chrome without a version number automatically selects the latest fingerprint as curl_cffi updates.
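If one fingerprint starts getting blocked, it can help to probe a few targets and see which one the site accepts. A small sketch; the candidate list mirrors the versions above, so trim it to whatever your installed curl_cffi release supports:

from curl_cffi import requests

CANDIDATES = ["chrome", "chrome131", "safari", "edge101"]

for target in CANDIDATES:
    try:
        r = requests.get("https://example.com", impersonate=target, timeout=10)
        print(f"{target}: HTTP {r.status_code}")
    except Exception as exc:
        print(f"{target}: failed ({exc})")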
Adding Realistic Headers
TLS fingerprinting alone isn't enough. You need proper HTTP headers too:
from curl_cffi import requests
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate, br",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
}
response = requests.get(
    "https://example.com",
    headers=headers,
    impersonate="chrome"
)
Those Sec-Fetch-* headers matter. Real browsers send them automatically, and missing them is a red flag.
Session Management
Datadome often requires session state. Use curl_cffi's Session object:
from curl_cffi import requests
session = requests.Session(impersonate="chrome")
# First request gets cookies
response = session.get("https://example.com")
# Subsequent requests maintain session
response2 = session.get("https://example.com/data")
This approach works great when content doesn't require JavaScript rendering. For dynamic pages, you'll need browser automation.
Step 2: Deploy Playwright with Stealth Plugins
When HTTP requests hit a wall, headless browsers become necessary. Standard Playwright gets detected instantly though—you need stealth plugins to patch the automation leaks.
The problem is that Playwright automatically configures browsers for automation. It sets navigator.webdriver to true, removes standard plugins, modifies Chrome runtime properties, and leaves dozens of other traces. Datadome's JavaScript fingerprinting catches all of these.
Stealth plugins patch these leaks by modifying browser properties before any page scripts run. They're not perfect—advanced detection still catches some configurations—but they dramatically improve success rates.
Install the dependencies:
npm install playwright-extra puppeteer-extra-plugin-stealth
Here's a working setup:
const { chromium } = require('playwright-extra');
const stealth = require('puppeteer-extra-plugin-stealth')();
chromium.use(stealth);
(async () => {
  const browser = await chromium.launch({
    headless: true,
    args: ['--disable-blink-features=AutomationControlled']
  });

  const context = await browser.newContext({
    viewport: { width: 1920, height: 1080 },
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    locale: 'en-US',
    timezoneId: 'America/New_York'
  });

  const page = await context.newPage();

  // Additional property masking
  await page.addInitScript(() => {
    Object.defineProperty(navigator, 'platform', {
      get: () => 'Win32'
    });
  });

  await page.goto('https://example.com', {
    waitUntil: 'networkidle'
  });

  const content = await page.content();
  console.log(content.substring(0, 500));

  await browser.close();
})();
The addInitScript runs before any page JavaScript, letting you modify navigator properties before Datadome checks them.
What Stealth Plugin Fixes
The stealth plugin patches over 200 headless browser leaks:
- Sets navigator.webdriver to undefined
- Adds missing plugins like Chrome PDF Plugin
- Fixes Chrome runtime inconsistencies
- Patches canvas fingerprinting anomalies
- Corrects permissions API responses
- Fixes WebGL vendor strings
- Removes automation-related Chrome flags
But it doesn't fix everything. Datadome can still catch automation through CDP (Chrome DevTools Protocol) detection and timing inconsistencies.
CDP detection is a newer technique. When automation tools send commands to the browser, anti-bots can detect the Runtime.enable command that Playwright uses. This exposes automation even with stealth patches applied.
Some newer undetected browser libraries attempt to patch CDP leaks, but this remains an active cat-and-mouse game between automation tools and anti-bot vendors.
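One example of that category is undetected-chromedriver, a Selenium-based driver that patches the chromedriver binary and launch flags before startup. A rough sketch, assuming pip install undetected-chromedriver; the constructor arguments change between releases, so check the project's current README:

import undetected_chromedriver as uc

# Launch a patched Chrome; the flags below follow the project's own examples
driver = uc.Chrome(headless=True, use_subprocess=False)
try:
    driver.get("https://example.com")
    print(driver.page_source[:500])
finally:
    driver.quit()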
Python Alternative
For Python developers, use playwright-stealth:
pip install playwright playwright-stealth
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync
def scrape_with_stealth():
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        stealth_sync(page)
        page.goto('https://example.com')
        content = page.content()
        print(content[:500])
        browser.close()
scrape_with_stealth()
The stealth_sync function applies all the same patches as the JavaScript version.
Step 3: Rotate Residential Proxies
Even with perfect fingerprints, your IP address can expose you. Datacenter IPs get flagged immediately by Datadome's reputation scoring system.
Residential proxies route traffic through real devices on residential ISPs. To Datadome, requests appear to come from home internet connections in specific cities—not from AWS servers or known hosting providers.
IP reputation accounts for an estimated 25-30% of the trust score. This single factor can determine whether your perfectly crafted request succeeds or fails. Using datacenter proxies while scraping Datadome-protected sites is almost guaranteed to fail.
There are three types of IP addresses to consider:
Residential proxies come from real ISP customers. They have excellent reputation scores since they represent legitimate home users. The downside is cost—residential bandwidth is expensive.
Mobile proxies route through cellular networks. They score highest because mobile carriers frequently recycle IPs among users. Anti-bots are reluctant to block mobile ranges since they'd affect legitimate mobile users.
Datacenter proxies come from hosting providers. They're cheap and fast but have terrible reputation scores. Datadome blocks them almost universally.
For any serious Datadome bypass effort, invest in quality residential or mobile proxies.
Implementing Proxy Rotation
With curl_cffi:
from curl_cffi import requests
import random
PROXY_LIST = [
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
    "http://user:pass@proxy3.example.com:8000",
]

def get_proxy():
    proxy = random.choice(PROXY_LIST)
    return {"http": proxy, "https": proxy}

response = requests.get(
    "https://example.com",
    proxies=get_proxy(),
    impersonate="chrome"
)
With Playwright:
const proxy = {
  server: 'http://proxy.example.com:8000',
  username: 'user',
  password: 'pass'
};

const browser = await chromium.launch({
  headless: true,
  proxy: proxy
});
Smart Proxy Management
Instead of random rotation, track which proxies perform best:
class ProxyManager:
    def __init__(self, proxy_list):
        self.proxies = proxy_list
        self.scores = {p: 0 for p in proxy_list}

    def get_best_proxy(self):
        return max(self.scores.items(), key=lambda x: x[1])[0]

    def record_result(self, proxy, success):
        self.scores[proxy] += 1 if success else -1
This learns which proxies work over time, reducing blocked requests.
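Wiring the manager into the earlier curl_cffi flow might look like this (a sketch; the proxy URLs are placeholders):

from curl_cffi import requests

manager = ProxyManager([
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
])

proxy = manager.get_best_proxy()
try:
    response = requests.get(
        "https://example.com",
        proxies={"http": proxy, "https": proxy},
        impersonate="chrome",
        timeout=10,
    )
    manager.record_result(proxy, response.status_code == 200)
except Exception:
    manager.record_result(proxy, success=False)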
Matching Geolocation
One common mistake: using a US proxy with UK timezone settings. Datadome catches these mismatches. Always align browser settings with proxy location:
const context = await browser.newContext({
  viewport: { width: 1920, height: 1080 },
  locale: 'en-GB',
  timezoneId: 'Europe/London',
  geolocation: { latitude: 51.5074, longitude: -0.1278 },
  permissions: ['geolocation']
});
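One way to keep these settings consistent is to derive them from the proxy's exit country instead of hard-coding them per script. A sketch in Python (the lookup table is a small illustrative subset; Playwright's Python API accepts the same options as the JavaScript snippet above):

# Map a proxy's country code to matching browser context settings.
LOCALE_PROFILES = {
    "US": {"locale": "en-US", "timezone_id": "America/New_York",
           "geolocation": {"latitude": 40.7128, "longitude": -74.0060}},
    "GB": {"locale": "en-GB", "timezone_id": "Europe/London",
           "geolocation": {"latitude": 51.5074, "longitude": -0.1278}},
    "DE": {"locale": "de-DE", "timezone_id": "Europe/Berlin",
           "geolocation": {"latitude": 52.5200, "longitude": 13.4050}},
}

def context_options_for(country_code):
    profile = LOCALE_PROFILES[country_code]
    return {
        "viewport": {"width": 1920, "height": 1080},
        "locale": profile["locale"],
        "timezone_id": profile["timezone_id"],
        "geolocation": profile["geolocation"],
        "permissions": ["geolocation"],
    }

# e.g. context = browser.new_context(**context_options_for("GB"))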
If you need high-quality residential proxies, services like Roundproxies offer residential, datacenter, ISP, and mobile proxy options that work well against anti-bot systems.
Step 4: Simulate Human Behavior
Datadome's behavioral analysis watches for robotic patterns. Perfect straight-line mouse movements, instant page interactions, and machine-precise timing all trigger blocks.
The key insight: humans are messy and inconsistent. We overshoot targets, pause to read, scroll at varying speeds, and move mice in curves rather than straight lines. Your bot needs to mimic this messiness.
Add randomness to everything:
async function humanDelay(min = 100, max = 300) {
  const delay = Math.random() * (max - min) + min;
  await new Promise(resolve => setTimeout(resolve, delay));
}

async function humanMouseMove(page) {
  const viewport = page.viewportSize();
  for (let i = 0; i < 5; i++) {
    const x = Math.random() * viewport.width;
    const y = Math.random() * viewport.height;
    // Multiple steps create curved, human-like movement
    await page.mouse.move(x, y, { steps: 10 });
    await humanDelay(50, 150);
  }
}

async function humanScroll(page) {
  const scrollAmount = Math.random() * 300 + 100;
  await page.mouse.wheel(0, scrollAmount);
  await humanDelay(500, 1500);
}

// Usage in scraping flow
await page.goto('https://example.com');
await humanDelay(1000, 2000);
await humanMouseMove(page);
await humanScroll(page);
await humanDelay(500, 1000);
This creates natural variation that's much harder to distinguish from real users.
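If your pipeline is Python-based, the same ideas translate directly to Playwright's sync API. A minimal sketch; the delay ranges are assumptions to tune per target:

import random
import time

def human_delay(min_s=0.1, max_s=0.3):
    time.sleep(random.uniform(min_s, max_s))

def human_mouse_move(page, moves=5):
    viewport = page.viewport_size
    for _ in range(moves):
        x = random.uniform(0, viewport["width"])
        y = random.uniform(0, viewport["height"])
        # steps > 1 makes Playwright interpolate the path instead of teleporting
        page.mouse.move(x, y, steps=10)
        human_delay(0.05, 0.15)

def human_scroll(page):
    page.mouse.wheel(0, random.uniform(100, 400))
    human_delay(0.5, 1.5)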
Warm-Up Sessions
Real users don't hit product pages directly. They browse homepages, explore categories, maybe use search, then arrive at specific products. Mimicking this navigation pattern builds trust with Datadome's behavioral analysis.
Here's the wrong approach:
# Don't do this - direct access is suspicious
session.get("https://example.com/product/12345")
Here's the right approach:
import time
import random
# Warm up the session first
session.get("https://example.com")
time.sleep(random.uniform(2, 4))
session.get("https://example.com/category/electronics")
time.sleep(random.uniform(1, 3))
# Now access the target page
session.get("https://example.com/product/12345")
For high-value scraping targets, consider even longer warm-up sequences that include search queries, category browsing, and random page visits. The more your session looks like a real shopping journey, the better your trust score.
Some scrapers maintain "warm" sessions—browser contexts that have already established trust through previous legitimate-looking activity. These can be reused for multiple scraping tasks without triggering fresh detection.
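With Playwright you can persist that established trust by saving the context's storage state (cookies plus local storage) and reloading it on later runs. A sketch using the sync API; the file path and URLs are placeholders:

from playwright.sync_api import sync_playwright

STATE_FILE = "warm_session.json"  # example path

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)

    # First run: browse around to build trust, then save the warmed-up state
    context = browser.new_context()
    page = context.new_page()
    page.goto("https://example.com")
    context.storage_state(path=STATE_FILE)
    context.close()

    # Later runs: start from the saved cookies instead of a cold session
    warm_context = browser.new_context(storage_state=STATE_FILE)
    page = warm_context.new_page()
    page.goto("https://example.com/product/12345")
    browser.close()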
Step 5: Scrape Google Cache for Historical Data
Here's a technique most guides skip: scraping from Google's cache instead of live sites.
When Google crawls the web, it caches page snapshots. Most Datadome-protected sites whitelist Google's crawler, making cached pages accessible without protection.
The trade-off: cached data can be hours or days old. This only works if you don't need real-time information.
Accessing Cached Pages
Prepend this to any URL:
https://webcache.googleusercontent.com/search?q=cache:
from curl_cffi import requests
target_url = "https://example.com/protected-page"
cache_url = f"https://webcache.googleusercontent.com/search?q=cache:{target_url}"
response = requests.get(cache_url, impersonate="chrome")
print(response.text)
Extracting Cache Timestamps
Google's cache includes when the page was last crawled:
import re
from curl_cffi import requests
def get_cached_page(url):
    cache_url = f"https://webcache.googleusercontent.com/search?q=cache:{url}"
    response = requests.get(cache_url, impersonate="chrome")
    timestamp_match = re.search(
        r'It is a snapshot of the page as it appeared on (.+?)\.',
        response.text
    )
    if timestamp_match:
        print(f"Cached on: {timestamp_match.group(1)}")
    return response.text
Limitations
Not all pages are cached, dynamic JavaScript content won't appear, some images and assets may be missing, and cache updates are unpredictable. Google has also been winding down public access to its cache, so verify that the webcache URLs still resolve for your targets before building on this method.
Use this for historical data or when real-time accuracy isn't critical.
Step 6: Avoid Honeypot Traps
Datadome plants invisible traps throughout pages. These honeypots are HTML elements hidden with CSS that humans never interact with—but naive scrapers will.
Recognizing Honeypots
Watch for elements like:
<div style="display:none">
<a href="/trap-link">Click here</a>
</div>
<input type="text" style="position:absolute;left:-9999px" name="trap_field" />
<a href="/fake-page" style="opacity:0">Invisible link</a>
Filtering Hidden Elements
Only interact with visible elements:
async function isVisible(element) {
  const box = await element.boundingBox();
  if (!box) return false;

  const style = await element.evaluate(el => {
    const computed = window.getComputedStyle(el);
    return {
      display: computed.display,
      visibility: computed.visibility,
      opacity: computed.opacity
    };
  });

  return style.display !== 'none'
    && style.visibility !== 'hidden'
    && parseFloat(style.opacity) > 0;
}

const links = await page.$$('a');
for (const link of links) {
  if (await isVisible(link)) {
    // Safe to interact
  }
}
In Python with BeautifulSoup:
from bs4 import BeautifulSoup
def is_hidden(element):
    # Normalize whitespace so "display: none" and "display:none" both match
    style = element.get('style', '').lower().replace(' ', '')
    hiding_patterns = [
        'display:none',
        'visibility:hidden',
        'opacity:0',
        'left:-9999',
    ]
    return any(p in style for p in hiding_patterns)

soup = BeautifulSoup(html, 'html.parser')
visible_links = [
    link for link in soup.find_all('a')
    if not is_hidden(link)
]
The rule is simple: if a human couldn't see or interact with it, your bot shouldn't either.
Putting It All Together
Here's a complete Datadome bypass class combining these techniques:
from curl_cffi import requests
import random
import time
class DatadomeBypass:
    def __init__(self, proxies=None):
        self.proxies = proxies or []
        self.session = None

    def create_session(self):
        self.session = requests.Session(impersonate="chrome")
        self.session.headers.update({
            "Accept-Language": random.choice([
                "en-US,en;q=0.9",
                "en-GB,en;q=0.9",
            ])
        })

    def get_with_retry(self, url, max_retries=3):
        for attempt in range(max_retries):
            # Fresh session per attempt so a flagged datadome cookie
            # doesn't carry over into the retry
            self.create_session()
            try:
                proxy = random.choice(self.proxies) if self.proxies else None
                response = self.session.get(
                    url,
                    proxies={"http": proxy, "https": proxy} if proxy else None,
                    timeout=10
                )
                # Datadome blocks typically return 403 and reference
                # "datadome" in cookies or the page body
                if response.status_code == 403:
                    print(f"Datadome block, attempt {attempt + 1}")
                    time.sleep(random.uniform(5, 10))
                    continue
                return response
            except Exception as e:
                print(f"Error: {e}")
                time.sleep(random.uniform(2, 5))
        return None
# Usage
bypass = DatadomeBypass(proxies=[
    "http://user:pass@proxy1.example.com:8000",
])
response = bypass.get_with_retry("https://example.com")
if response:
print(response.text[:500])
When Not to Bypass Datadome
Sometimes the smartest move is avoiding the bypass altogether:
- The site offers a public API — Even paid APIs are often cheaper than complex bypass infrastructure
- You only need occasional data snapshots — Manual collection might be simpler
- Legal risks outweigh benefits — Commercial scraping may violate terms of service
- Maintenance time exceeds data value — Bypasses require constant updates
Datadome constantly evolves. Their machine learning models adapt in real-time based on new attack patterns. Techniques working today might fail next week without warning.
If you're building a business on scraped data, consider whether that's a stable foundation. One Datadome update could break your entire operation overnight.
Legal Considerations
Scraping public data is generally considered legal in the United States following the hiQ Labs v. LinkedIn case. However, several factors complicate this:
- Bypassing technical security measures may violate the Computer Fraud and Abuse Act (CFAA)
- You're almost certainly violating the website's Terms of Service
- Commercial resale of scraped data raises additional concerns
- Different jurisdictions have different laws
For any commercial scraping operation, consult a lawyer familiar with digital rights and data collection law. The legal landscape continues evolving, and what's acceptable varies significantly by jurisdiction and use case.
FAQ
Is bypassing Datadome legal?
Scraping public data is generally legal (see hiQ Labs v. LinkedIn). However, bypassing security measures may violate the Computer Fraud and Abuse Act in some jurisdictions. You're almost certainly violating Terms of Service. Consult a lawyer for commercial use.
Can I bypass Datadome with free proxies?
Free proxies typically have terrible reputation scores. Datadome flags them immediately. Quality residential proxies are worth the investment for any serious scraping operation. The cost of good proxies is almost always less than the cost of failed scrapes and wasted development time.
How often does Datadome update its detection?
Datadome updates continuously. Their machine learning models adapt in real-time based on new attack patterns. What works today may not work next week. Plan for ongoing maintenance of any bypass solution.
Does reverse engineering Datadome's JavaScript work?
Possible in theory, extremely difficult in practice. Their code uses aggressive obfuscation, dynamic generation, time-bounded execution, and anti-debugging traps. Even successful reverse engineering requires constant maintenance as Datadome updates. The maintenance burden makes this approach impractical for most use cases.
What's the best approach for large-scale scraping?
Combine multiple techniques: curl_cffi for simple pages, Playwright with stealth for JavaScript-heavy content, quality residential proxies, realistic behavioral patterns, and session warm-up. No single technique works universally—layer them for best results.
Final Thoughts
Getting through Datadome requires a multi-layered approach. No single technique guarantees success—you need TLS fingerprinting, stealth browsers, quality proxies, and human-like behavior working together.
The arms race between scrapers and anti-bot vendors continues accelerating. Datadome processes trillions of signals daily, constantly refining its detection models. Yesterday's bypass might trigger blocks today.
Key takeaways:
- Use curl_cffi for lightweight HTTP requests with proper TLS fingerprints
- Deploy Playwright with stealth plugins for JavaScript-heavy pages
- Invest in quality residential proxies—IP reputation is critical
- Simulate realistic human behavior with randomized timing and movement
- Consider Google Cache for non-time-sensitive data
- Avoid honeypot traps by only interacting with visible elements
- Warm up sessions before accessing valuable pages
- Plan for ongoing maintenance as detection evolves
The most successful scrapers don't rely on any single bypass technique. They combine approaches, monitor success rates, and adapt quickly when something breaks.
Whatever your scraping goals, respect rate limits and server resources. Even when bypassing protection, be a good citizen of the web—excessive load or abusive patterns hurt everyone and accelerate the arms race.
With the right approach, patience, and ongoing attention, you can maintain reliable access to Datadome-protected sites while keeping your operations sustainable.