The main difference between Playwright and Selenium is their underlying architecture. Playwright uses persistent WebSocket connections via the Chrome DevTools Protocol (CDP) for direct browser communication. Selenium relies on the WebDriver API over HTTP requests, adding an intermediary translation layer. This architectural gap translates to 35-45% faster execution for Playwright, lower memory consumption, and native network interception capabilities.
This isn't a features comparison—it's a deep technical breakdown. We'll cover protocol-level differences, real benchmark data, stealth techniques for scraping, and when to skip browsers entirely.
Why Protocol Architecture Actually Matters
Protocol choices determine everything about your automation stack: speed, reliability under load, network interception capabilities, and what anti-bot defenses can detect.
Selenium's approach: Your test script sends HTTP requests to a WebDriver server (ChromeDriver on port 9515, for example). That driver then translates those commands into CDP or equivalent browser-specific protocols.
Playwright's approach: Your script maintains a persistent WebSocket connection directly to the browser using CDP for Chromium, plus custom integrations for Firefox and WebKit.
Every additional hop adds latency. More importantly, it creates state drift opportunities and expands your detectable surface area.
In practice, this determines whether your scraper finishes in 30 minutes or 3 hours.
Selenium's Communication Chain Explained
Selenium's four-step communication chain looks simple on the surface. But trace a single click through the entire flow:
- Test Script → WebDriver API
- WebDriver API → Browser Driver (ChromeDriver, GeckoDriver)
- Browser Driver → Browser (translated to CDP)
- Browser → Driver → Script (response returns)
Here's what a simple click actually triggers:
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.find_element(By.ID, "submit").click()
Under the hood, this generates:
{
  "method": "POST",
  "url": "/session/xxx/element/yyy/click"
}
That JSON command travels via HTTP to ChromeDriver. ChromeDriver translates it to CDP. Chrome executes. Response bubbles back through all layers.
Measured latency: A simple element click averages ~536ms with Selenium versus ~290ms with Playwright on identical hardware.
That's nearly 2x slower per action.
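If you want to reproduce the per-action timing yourself, a minimal sketch looks like this (the URL and element ID are placeholders; absolute numbers depend on hardware and network):
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/form")  # placeholder page with a #submit button

start = time.perf_counter()
driver.find_element(By.ID, "submit").click()
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"Selenium click round-trip: {elapsed_ms:.0f} ms")

driver.quit()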
Selenium 4.33: What's New in 2025-2026
Selenium 4.33 introduced several improvements worth noting:
- Live node previews in Grid UI
- BiDi webExtension module for deeper browser access
- Improved Docker/Kubernetes integration for Dynamic Grid
- W3C WebDriver BiDi support expansion
The BiDi protocol brings Selenium closer to Playwright's capabilities. You can now intercept console messages and network traffic natively.
But the fundamental HTTP-based architecture remains unchanged.
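As a rough sketch of the console-listener pattern from Selenium's own documentation (module paths and helper classes have shifted across 4.x releases, so verify against your installed version before relying on it):
import trio
from selenium import webdriver
from selenium.webdriver.common.bidi.console import Console
from selenium.webdriver.common.log import Log

async def capture_console():
    driver = webdriver.Chrome()
    # bidi_connection() opens the two-way session Selenium uses for event listeners
    async with driver.bidi_connection() as session:
        log = Log(driver, session)
        async with log.add_listener(Console.ALL) as messages:
            driver.execute_script("console.log('hello from the page')")
        print(messages["message"])
    driver.quit()

trio.run(capture_console)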
Playwright's Direct CDP Approach
Playwright maintains an always-on WebSocket to the browser. Commands bypass the middleman entirely.
// Playwright - direct WebSocket message
await page.click('#submit');
// No HTTP overhead, no driver translation
This architecture enables capabilities impossible in Selenium:
- Native route interception without plugins
- Request blocking at the protocol level
- Context isolation without launching new browser instances
- Auto-waiting built into every action (see the sketch below)
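Auto-waiting in particular removes a whole class of flaky sleeps. A minimal sync-API sketch (URL and selector are placeholders):
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com/form")
    # click() waits for the element to be attached, visible, stable, and enabled
    # before acting; no explicit wait or sleep is required
    page.click("#submit")
    browser.close()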
Playwright 1.57: The 2025-2026 Updates
Playwright 1.57 brought major changes:
- Chrome for Testing builds instead of Chromium (better detection profiles)
- Playwright Agents for LLM-driven test generation
- ARIA snapshot assertions for accessibility testing
- Trace grouping for visual debugging
- fail-on-flaky-tests CLI flag
The shift to Chrome for Testing is significant for scraping. Your automation now uses the same browser binaries as real users.
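If you want to go a step further and drive a locally installed Chrome rather than the bundled build, the channel option handles that. A minimal sketch (assumes stable Chrome is installed on the machine):
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # channel="chrome" launches the locally installed stable Chrome
    browser = p.chromium.launch(channel="chrome")
    page = browser.new_page()
    page.goto("https://example.com")
    browser.close()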
Speed Benchmarks: Real Numbers from Production
These benchmarks ran on identical hardware (16GB RAM, 2.6GHz) against the same dynamic e-commerce site. 100 iterations per tool.
| Metric | Selenium 4.33 | Playwright 1.57 |
|---|---|---|
| Page Load (avg) | 2.8s | 1.9s |
| Element Click | 536ms | 290ms |
| Form Fill (10 fields) | 4.2s | 2.1s |
| Full Page Screenshot | 1.1s | 0.6s |
| Memory per Instance | 380MB | 215MB |
JavaScript-Heavy SPA Results
The gap widens dramatically on React/Vue/Angular applications:
| Test Suite | Selenium | Playwright | Playwright + Route Blocking |
|---|---|---|---|
| 500 Pages | ~60 min | 35 min | 18 min |
| Memory Peak | 2.8GB | 1.6GB | 1.2GB |
| Flaky Tests | 12% | 3% | 2% |
Network interception alone cuts execution time by 50% on media-heavy sites.
Bypassing Bot Detection in 2026
Modern anti-bot systems don't just check navigator.webdriver. They correlate hundreds of signals:
- CDP command patterns
- WebSocket fingerprints
- Timing fingerprints
- GPU/codec characteristics
- TLS fingerprints
- Mouse movement patterns
Why Default Selenium Gets Blocked
Out-of-the-box Selenium leaks obvious fingerprints:
// Detection vectors in default Selenium (values a page can read)
navigator.webdriver;                      // true - dead giveaway
window.cdc_adoQpoasnfa76pfcZLmcfl_Array;  // ChromeDriver-injected property
navigator.plugins.length;                 // 0 in headless - classic marker
These properties exist because the W3C WebDriver spec requires automation to stay identifiable for legitimate testing; the cdc_ property is an implementation artifact that ChromeDriver injects into every page.
Undetected ChromeDriver for Selenium
import undetected_chromedriver as uc
options = uc.ChromeOptions()
options.add_argument("--disable-blink-features=AutomationControlled")
driver = uc.Chrome(options=options)
driver.get("https://target-site.com")
This patches most detection vectors automatically.
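A quick sanity check from the same session (a minimal sketch; a patched browser should no longer expose the flag):
# Should print None/False instead of True once the patch is applied
print(driver.execute_script("return navigator.webdriver"))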
Success rates (approximate):
| Protection Level | Success Rate |
|---|---|
| Basic bot detection | ~95% |
| Cloudflare (standard) | ~70% |
| DataDome | ~35% |
| PerimeterX | ~30% |
Playwright Stealth Mode
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync
with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    stealth_sync(page)  # Applies evasion patches
    page.goto("https://target-site.com")
The playwright-stealth library ports puppeteer-extra-plugin-stealth evasion modules to Playwright.
The CDP Detection Problem
Here's what most guides miss: CDP itself is detectable.
Advanced anti-bot systems watch for:
- Runtime.enable command patterns
- CDP connection signatures
- WebSocket handshake characteristics
Opening Chrome DevTools on a detection test site often triggers bot flags, because DevTools attaches through the same CDP mechanisms that automation tools use. Sites aim that same detection at automated browsers.
Patchright: The CDP Patching Fork
Patchright modifies Playwright internals to avoid sending Runtime.enable:
from patchright.async_api import async_playwright
async def stealth_browse():
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            channel="chrome",  # Real Chrome, not Chromium
            headless=False,
            args=["--disable-blink-features=AutomationControlled"]
        )
        context = await browser.new_context(
            viewport={"width": 1920, "height": 1080}
        )
        page = await context.new_page()
        await page.goto("https://cloudflare-protected-site.com")
This reduces detection on CreepJS from 100% to approximately 67%.
Still not bulletproof—but significantly better.
Network Interception: The Performance Hack Nobody Uses
Playwright's route interception is the single biggest optimization for JavaScript-heavy scraping.
Block Non-Essential Assets
// Block images, CSS, fonts - instant 40% speed boost
await page.route('**/*.{png,jpg,jpeg,gif,css,woff,woff2}', route => route.abort());
Keep Only API Responses
await page.route('**/*', route => {
  const type = route.request().resourceType();
  return ['document', 'xhr', 'fetch'].includes(type)
    ? route.continue()
    : route.abort();
});
This approach:
- Reduces page load times by 30-50%
- Cuts bandwidth by 60-80%
- Improves parallel execution efficiency
- Reduces memory pressure
Selenium's CDP Workaround (Limited)
Selenium 4 added CDP hooks, but they're clunky:
# Block images via CDP in Selenium (LOCAL ONLY)
driver.execute_cdp_cmd('Network.enable', {})
driver.execute_cdp_cmd('Network.setBlockedURLs', {
'urls': ['*.jpg', '*.png', '*.gif', '*.css']
})
Critical limitation: This only works with local ChromeDriver. Remote WebDriver/Grid loses CDP access.
The HTTP-Only Alternative (Skip Browsers Entirely)
The fastest browser automation is no browser at all.
If the target exposes JSON endpoints or renders server-side, go straight to HTTP:
import httpx
from selectolax.parser import HTMLParser
# 10x faster than any browser automation
async with httpx.AsyncClient() as client:
    response = await client.get(
        'https://api.example.com/products',
        headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)...'}
    )
    html = HTMLParser(response.text)
    products = html.css('.product-card')
When to Skip Browsers
Skip browser automation when:
- API endpoints are accessible or easily reverse-engineered
- Content is server-rendered (not React/Vue/Angular)
- No complex client-side interactions needed
- Cost or scale constraints exist
HTTPX vs Requests
HTTPX offers advantages over the classic requests library:
| Feature | Requests | HTTPX |
|---|---|---|
| HTTP/2 Support | No | Yes |
| Async Support | No | Yes |
| Connection Pooling | Basic | Advanced |
| Timeout Handling | Basic | Granular |
HTTP/2 support alone can reduce block rates. Many anti-bot systems flag HTTP/1.1 connections from suspicious IPs.
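Note that HTTPX only negotiates HTTP/2 when you opt in per client. A minimal sketch (requires the httpx[http2] extra):
import httpx

# pip install "httpx[http2]" provides the h2 dependency
client = httpx.AsyncClient(http2=True)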
curl_cffi for TLS Fingerprinting
When even HTTPX gets blocked, curl_cffi mimics real browser TLS fingerprints:
from curl_cffi import requests
# Mimics Chrome TLS fingerprint
response = requests.get(
"https://protected-site.com",
impersonate="chrome"
)
This bypasses TLS fingerprinting defenses that flag Python HTTP clients.
Real-World Decision Matrix
Use Playwright When:
- Speed is critical: 35-45% faster execution
- JavaScript-heavy targets: React, Vue, Angular SPAs
- Resource constraints: 44% less memory per instance
- Parallel scraping: Better efficiency and fewer timeouts
- Network manipulation needed: First-class route interception
Stick with Selenium When:
- Legacy browser support: Older Safari, IE11 scenarios
- Existing infrastructure: Selenium Grid already deployed
- Enterprise mandates: Org-wide Selenium standardization
- Real device testing: Mobile device farms
- Team expertise: Years of Selenium utilities built
Skip Both (Use HTTP) When:
- API access possible: Reverse-engineered endpoints
- Static HTML: Server-rendered content
- Extreme scale: Cost-sensitive, thousands of pages
- Simple data: No interactions needed
Advanced Techniques That Actually Work
Hybrid Approach: Browser Login, HTTP Scrape
Log in with Playwright, scrape with HTTPX. Best of both worlds.
from playwright.async_api import async_playwright
import httpx
async def hybrid_scrape():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()

        # Handle login with browser
        await page.goto('https://example.com/login')
        await page.fill('#username', 'user')
        await page.fill('#password', 'pass')
        await page.click('#submit')

        # Extract cookies
        cookies = await page.context.cookies()
        await browser.close()

    # Switch to httpx for actual scraping (10x faster)
    async with httpx.AsyncClient() as client:
        for cookie in cookies:
            client.cookies.set(cookie['name'], cookie['value'])
        response = await client.get('https://example.com/api/data')
        return response.json()
Browsers handle the hard parts (auth, CAPTCHA, JS rendering). HTTP handles volume.
The CDP Bridge: Selenium + Playwright
Combine Selenium's familiarity with Playwright's network superpowers:
from selenium import webdriver
from playwright.sync_api import sync_playwright

# Start with Selenium, exposing Chrome's CDP endpoint
options = webdriver.ChromeOptions()
options.add_argument('--remote-debugging-port=9222')
driver = webdriver.Chrome(options=options)
driver.get('https://example.com')

# Connect Playwright to the same browser over CDP
# (the debugging port above, not ChromeDriver's own HTTP port)
playwright = sync_playwright().start()
browser = playwright.chromium.connect_over_cdp("http://localhost:9222")

# Use Playwright's superior API on Selenium's browser
page = browser.contexts[0].pages[0]
page.route('**/*.png', lambda route: route.abort())
Useful during migrations or when Grid infrastructure constraints exist.
Fingerprint Rotation at Scale
Rotate high-signal traits across browser contexts:
const contexts = [];
const userAgents = [/* array of real UAs */];
const locales = ['en-US', 'en-GB', 'de-DE', 'fr-FR'];
const timezones = ['America/New_York', 'Europe/London', 'Asia/Tokyo'];

for (let i = 0; i < 10; i++) {
  const context = await browser.newContext({
    viewport: {
      width: 1920 + Math.floor(Math.random() * 100),
      height: 1080 + Math.floor(Math.random() * 100)
    },
    userAgent: userAgents[Math.floor(Math.random() * userAgents.length)],
    locale: locales[Math.floor(Math.random() * locales.length)],
    timezoneId: timezones[Math.floor(Math.random() * timezones.length)],
  });
  contexts.push(context);
}
Pair with randomized delays and request throttling to mimic organic traffic.
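A minimal sketch of that pacing idea, using a hypothetical polite_goto helper (the delay bounds are arbitrary assumptions, not tuned values):
import asyncio
import random

async def polite_goto(page, url, min_delay=2.0, max_delay=7.0):
    await page.goto(url, wait_until="domcontentloaded")
    # Random think-time so request timing doesn't form a machine-regular pattern
    await asyncio.sleep(random.uniform(min_delay, max_delay))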
Performance Optimization Tips
Playwright Optimization Checklist
- Block non-essential assets (images, fonts, analytics)
- Disable CSS animations via context settings
- Use page.waitForResponse instead of time.sleep() (see the sketch after the route example below)
- Prefer locators over CSS selectors for retry efficiency
- Minimize full page reloads using SPA transitions
// Optimized route blocking
await page.route('**/*', route => {
  const r = route.request();
  const type = r.resourceType();
  const url = r.url();

  if (type === 'document') return route.continue();
  if (['xhr', 'fetch'].includes(type)) {
    if (url.includes('/api/') || url.includes('/graphql')) {
      return route.continue();
    }
  }
  return route.abort();
});
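For the checklist items on waitForResponse and locators, a short async-API sketch (the /api/products endpoint and selectors are hypothetical):
# Wait for the API response that actually carries the data instead of sleeping
async with page.expect_response(lambda r: "/api/products" in r.url) as resp_info:
    await page.click("#load-more")
response = await resp_info.value
data = await response.json()

# Locators auto-retry, so reads stay stable while the SPA re-renders
rows = page.locator(".product-card")
print(len(data), await rows.count())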
Selenium Optimization Checklist
- Use CDP locally to block heavy assets
- Adopt undetected-chromedriver for stealth
- Run headful when headless gets blocked
- Spread load with realistic pacing
- Replace sleeps with explicit waits
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Explicit wait cuts flaky retries
wait = WebDriverWait(driver, 15)
element = wait.until(EC.element_to_be_clickable((By.ID, "submit")))
element.click()
Proxy Integration for Scale
When scraping at scale, proxies become mandatory. If you're using Roundproxies.com, here's how to integrate them with both tools:
Playwright with Residential Proxies
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    browser = p.chromium.launch(
        proxy={
            "server": "http://proxy.roundproxies.com:port",
            "username": "user",
            "password": "pass"
        }
    )
    page = browser.new_page()
    page.goto("https://target-site.com")
Selenium with Rotating Proxies
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument('--proxy-server=http://proxy.roundproxies.com:port')
driver = webdriver.Chrome(options=options)
Residential and mobile proxies reduce block rates by 60-80% compared to datacenter IPs.
Common Pitfalls (And How to Avoid Them)
Chasing Feature Lists Instead of Architecture
A shiny API doesn't change the physics of your network path. Evaluate the underlying protocol before comparing feature checkboxes.
Ignoring Detection Vectors
Even "stealth" plugins leave trails. Treat evasion as probabilistic, never guaranteed. Test against detection sites like BrowserScan and CreepJS regularly.
Over-Fetching Assets
Loading images, fonts, and CSS on a headless scraper torpedoes throughput. Block everything non-essential by default.
Global Sleep Statements
Replace time.sleep(2) with event-driven waits:
# Bad - wastes 2 seconds every time
time.sleep(2)
element.click()
# Good - waits only until ready
await page.wait_for_selector('#element', state='visible')
await page.click('#element')
One-Size-Fits-All Stacks
Mix Playwright, Selenium, and HTTP clients tactically. Different tools for different phases of the same workflow.
Browser Context Management
Playwright's browser contexts are a game-changer for parallel scraping. Each context is an isolated session with its own cookies, localStorage, and cache.
Creating Isolated Contexts
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    browser = p.chromium.launch()

    # Each context is fully isolated
    context1 = browser.new_context()
    context2 = browser.new_context()
    page1 = context1.new_page()
    page2 = context2.new_page()

    # Different sessions, same browser instance (no await in the sync API)
    page1.goto('https://site.com/user1')
    page2.goto('https://site.com/user2')
Context creation takes milliseconds. Browser launch takes seconds. This architectural difference enables efficient parallel workflows.
Memory Comparison
| Approach | Memory per "Session" |
|---|---|
| New Selenium Browser | ~380MB |
| New Playwright Browser | ~215MB |
| New Playwright Context | ~15MB |
For 50 parallel sessions:
- Selenium: ~19GB memory
- Playwright (new browsers): ~10.7GB
- Playwright (contexts): ~750MB + browser overhead
Context-based parallelism is 10-25x more memory efficient.
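A minimal sketch of what that looks like in practice, fanning many contexts out over a single browser (the helper names, URLs, and concurrency are placeholders):
import asyncio
from playwright.async_api import async_playwright

async def scrape_one(browser, url):
    # One isolated context (~15MB) per logical session
    context = await browser.new_context()
    page = await context.new_page()
    await page.goto(url)
    title = await page.title()
    await context.close()
    return title

async def main(urls):
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        results = await asyncio.gather(*(scrape_one(browser, u) for u in urls))
        await browser.close()
        return results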
Handling Authentication at Scale
Session Storage Extraction
# Save auth state after login
storage = await context.storage_state(path="auth.json")
# Reuse auth state in new contexts
new_context = await browser.new_context(storage_state="auth.json")
This eliminates login overhead for subsequent scraping runs.
Multi-Account Rotation
from playwright.async_api import async_playwright

auth_states = ["user1.json", "user2.json", "user3.json"]

async def scrape_with_rotation(urls):
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        for i, url in enumerate(urls):
            auth_file = auth_states[i % len(auth_states)]
            context = await browser.new_context(storage_state=auth_file)
            page = await context.new_page()
            await page.goto(url)
            # Extract data
            await context.close()
Rotate accounts to distribute rate limits across multiple authenticated sessions.
Error Handling and Retry Logic
Playwright Error Handling
import asyncio
from playwright.async_api import async_playwright, TimeoutError as PlaywrightTimeoutError

async def resilient_scrape(url, max_retries=3):
    for attempt in range(max_retries):
        try:
            async with async_playwright() as p:
                browser = await p.chromium.launch()
                try:
                    page = await browser.new_page()
                    response = await page.goto(url, timeout=30000)
                    if response and response.status == 403:
                        # Likely blocked - rotate proxy
                        raise Exception("Blocked - rotate proxy")
                    return await page.content()
                finally:
                    await browser.close()
        except PlaywrightTimeoutError:
            if attempt < max_retries - 1:
                await asyncio.sleep(2 ** attempt)  # Exponential backoff
                continue
            raise
Selenium Retry Pattern
from selenium import webdriver
from selenium.common.exceptions import TimeoutException, WebDriverException
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

@retry(
    retry=retry_if_exception_type((TimeoutException, WebDriverException)),
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10)
)
def resilient_selenium_scrape(url):
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # A single quit in finally releases the browser on success and failure alike
        driver.quit()
CI/CD Integration Patterns
GitHub Actions with Playwright
name: Scraping Pipeline
on:
  schedule:
    - cron: '0 */6 * * *'  # Every 6 hours
jobs:
  scrape:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npx playwright install chromium
      - run: npm run scrape
      - uses: actions/upload-artifact@v4
        with:
          name: scraped-data
          path: output/
Docker Deployment
FROM mcr.microsoft.com/playwright/python:v1.57.0-noble
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "scraper.py"]
Playwright's official Docker images include all browser dependencies pre-installed.
Debugging and Monitoring
Playwright Trace Viewer
Playwright's trace viewer captures every action, network request, and DOM snapshot:
# Enable tracing
await context.tracing.start(screenshots=True, snapshots=True)
# Your scraping code
await page.goto('https://example.com')
await page.click('#button')
# Save trace
await context.tracing.stop(path="trace.zip")
Open the trace file at trace.playwright.dev for visual debugging.
Selenium Logging
from selenium.webdriver.remote.remote_connection import LOGGER
import logging
LOGGER.setLevel(logging.DEBUG)
# Now see all WebDriver commands in logs
driver = webdriver.Chrome()
Monitoring Scraper Health
Track these metrics in production:
| Metric | Warning Threshold | Critical Threshold |
|---|---|---|
| Success Rate | <95% | <85% |
| Avg Response Time | >5s | >10s |
| Memory Usage | >80% | >95% |
| Error Rate | >5% | >15% |
Set up alerts when thresholds trigger. Early detection prevents cascading failures.
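A rudimentary sketch of turning those thresholds into an alert decision (the field names and result structure are assumptions about your own run records):
def health_status(results):
    # results: list of dicts like {"status": "ok" | "error", ...}
    total = len(results)
    ok = sum(1 for r in results if r.get("status") == "ok")
    success_rate = ok / total if total else 0.0
    if success_rate < 0.85:
        return "critical"   # matches the <85% threshold above
    if success_rate < 0.95:
        return "warning"    # matches the <95% threshold above
    return "healthy"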
Real-World Case Studies
E-commerce Price Monitoring
Challenge: Monitor 50,000 product prices daily across 12 retailers with varying anti-bot protection.
Solution Stack:
- Playwright with route blocking for JavaScript-heavy sites
- HTTPX for static HTML retailers
- Rotating residential proxies
- Redis queue for URL distribution
Results:
- 4-hour total scrape time (down from 18 hours with Selenium)
- 97% success rate
- $340/month infrastructure cost
Real Estate Data Aggregation
Challenge: Extract listings from 200+ local MLS sites, many with CAPTCHA protection.
Solution Stack:
- Selenium with undetected-chromedriver for authenticated sections
- Playwright for public listing pages
- 2Captcha integration for CAPTCHA solving
- PostgreSQL for deduplication
Results:
- 2.3M listings processed weekly
- 89% automation rate (11% required manual CAPTCHA solving)
Quick Reference: Copy-Paste Snippets
Playwright: Block Heavy Assets
await page.route('**/*.{png,jpg,jpeg,gif,css,woff,woff2}', r => r.abort());
Playwright: Keep Only API + Document
await page.route('**/*', r => {
  const t = r.request().resourceType();
  return (t === 'document' || t === 'xhr' || t === 'fetch')
    ? r.continue()
    : r.abort();
});
Selenium 4: Block Assets via CDP (Local)
driver.execute_cdp_cmd('Network.enable', {})
driver.execute_cdp_cmd('Network.setBlockedURLs', {
'urls': ['*.png', '*.jpg', '*.gif', '*.css']
})
Hybrid: Login with Playwright, Scrape with HTTPX
# (See full example in Advanced Techniques section)
session = httpx.AsyncClient()
# Set cookies from Playwright context
response = await session.get('https://example.com/api/data')
Bridge: Connect Playwright over Selenium's CDP
browser = playwright.chromium.connect_over_cdp(
    "http://localhost:9222"  # Chrome's --remote-debugging-port, not ChromeDriver's port
)
FAQ
Is Playwright faster than Selenium for SPAs?
Yes. Benchmarks consistently show 35-45% faster execution on React/Vue/Angular applications. The WebSocket-based architecture eliminates HTTP overhead per action.
How does Playwright's WebSocket Protocol improve speed?
It removes the WebDriver HTTP translation layer. Commands go directly to the browser via CDP. Fewer hops means lower latency and better reliability under load.
Can Selenium 4 match Playwright's request blocking?
Partially, via execute_cdp_cmd() on local ChromeDriver. Remote WebDriver and Selenium Grid don't support CDP commands, limiting this capability in distributed setups.
What about detection by Cloudflare, DataDome, PerimeterX?
Undetected ChromeDriver helps with light-to-medium protection. For aggressive anti-bot systems, expect an ongoing arms race. Consider specialized tools like Patchright, or evaluate if HTTP-based scraping can bypass the browser entirely.
When should I skip browser automation entirely?
When the target has accessible API endpoints, server-rendered HTML, or easily reverse-engineered calls. HTTPX/requests with proper headers will be 10x faster and significantly cheaper at scale.
Final Verdict: It's Not Either/Or
The practical approach in 2026 isn't tribal—it's compositional:
- Playwright for scraping: Speed, modern JS sites, native network interception, smaller memory footprint
- Selenium for testing: Cross-browser breadth, entrenched Grid infrastructure, legacy compatibility
- HTTPX/requests for APIs: When you can bypass the browser entirely
- Specialized tools: When anti-bot pressure demands CDP patching or managed browser services
The biggest mistake? Choosing by features instead of architecture.
Playwright's WebSocket-first design isn't just "faster." It reshapes what's possible: reliable request shaping, higher concurrency, smarter evasion.
Selenium remains valuable where standards, org mandates, and device labs matter.
Smart teams stitch them together, measure ruthlessly, and let the workload decide.