You're scraping a website—maybe for research, price monitoring, or building a dataset—and everything's working perfectly. Then, after your 50th request, you hit a wall. The site returns a 403 Forbidden, or worse, a CAPTCHA page. Your scraper's been detected, and now you're scrambling to figure out what went wrong.
I've been there more times than I care to admit. The culprit? Often, it's your User-Agent header giving you away. When your scraper announces itself as python-requests/2.31.0 with every single request, you might as well be waving a giant "I'm a bot!" flag.
In this article, I'll walk you through what User-Agents are, why rotating them matters more than you might think, and—most importantly—how to implement rotation properly with some tricks I've learned the hard way.
What is a User-Agent Anyway?
Every time your browser (or scraper) makes an HTTP request, it sends along a User-Agent header. This header is essentially an introduction—it tells the server what kind of software is making the request.
Here's what a typical Chrome browser User-Agent looks like:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
Breaking this down:
- Mozilla/5.0 - Historical compatibility marker (all browsers use this)
- Windows NT 10.0; Win64; x64 - Operating system information
- AppleWebKit/537.36 - Rendering engine
- Chrome/120.0.0.0 - Browser and version
- Safari/537.36 - Additional compatibility info
Now contrast that with what Python's requests library sends by default:
python-requests/2.31.0
See the problem? That's about as subtle as showing up to a costume party in your work clothes.
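You can confirm what your own setup sends without leaving Python. A quick sanity check, using the same httpbin endpoint the later examples use:
import requests

# The default User-Agent string bundled with your installed requests version
print(requests.utils.default_user_agent())  # e.g. python-requests/2.31.0

# Confirm it from the server's point of view
print(requests.get('https://httpbin.org/user-agent', timeout=10).json())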
Why Websites Care (And Why You Should Too)
Websites track User-Agents for several legitimate reasons—delivering mobile-optimized content, gathering analytics, and yes, detecting bots. According to recent data, bot traffic now makes up nearly half of all internet traffic, and not all of it is friendly.
But here's where it gets interesting. A surprising number of scraper blocks happen not because the website detected what you were doing, but because of how obviously you were doing it. When a site sees 1,000 requests in an hour, all from the same User-Agent, all following the exact same pattern, it doesn't take advanced AI to spot the bot.
This is where User-Agent rotation comes in. By varying your User-Agent with each request, you spread your traffic across multiple apparent "users," making your scraper's fingerprint less obvious.
How Websites Actually Detect Scrapers
Before we dive into implementation, let's understand what we're up against. Modern websites use layered detection:
Passive Fingerprinting: Websites collect data that your browser sends automatically. This includes your User-Agent, IP address, Accept-Language headers, and more. They combine these signals to create a fingerprint. If 500 requests all have the same fingerprint, that's a red flag.
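To make that concrete, here's a toy sketch of the idea from the server's side. The signals and the hashing scheme are illustrative assumptions, not any real vendor's logic:
import hashlib

def passive_fingerprint(ip, user_agent, accept_language):
    # Combine a few passively collected signals into one stable identifier
    raw = '|'.join([ip, user_agent, accept_language])
    return hashlib.sha256(raw.encode()).hexdigest()[:16]

# 500 requests sharing this value look like one very busy "user"
print(passive_fingerprint('203.0.113.7', 'python-requests/2.31.0', 'en-US'))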
Active Fingerprinting: More sophisticated sites go deeper. They might check:
- JavaScript execution (can you run JS?)
- Canvas fingerprinting (how your device renders graphics)
- WebGL information (GPU details)
- Font enumeration
- Audio context signatures
For basic HTTP scraping, you're mostly dealing with passive fingerprinting. User-Agent rotation helps here because it's the most obvious signal.
Behavioral Analysis: This is where things get tricky. Sites watch how you interact:
- Do you request pages too quickly? (Humans need time to read)
- Do you follow links humans wouldn't naturally click?
- Is your mouse movement realistic?
User-Agent rotation won't solve behavioral issues, but it's still a crucial first layer of defense.
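We'll cover rotation below; for the timing side, the usual first step is randomized, human-ish delays between requests. A minimal sketch (the intervals are arbitrary choices, not magic numbers):
import random
import time

def human_pause():
    """Sleep for a randomized, human-ish interval between requests."""
    if random.random() < 0.1:
        # Occasionally linger, like someone actually reading the page
        time.sleep(random.uniform(8, 20))
    else:
        time.sleep(random.uniform(1.5, 4.0))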
Basic User-Agent Rotation in Python
Let's start with the simplest approach—manually creating a list of User-Agents and rotating through them.
import requests
import random
user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.1 Safari/605.1.15',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
]

def scrape_page(url):
    headers = {
        'User-Agent': random.choice(user_agents)
    }
    response = requests.get(url, headers=headers)
    return response.text

# Make requests
for i in range(10):
    content = scrape_page('https://httpbin.org/user-agent')
    print(f"Request {i+1}: {content}")
This works, but it has limitations. You're stuck with the same five User-Agents, which means after a few hundred requests, patterns emerge. We can do better.
Using fake-useragent for Dynamic Rotation
The fake-useragent library maintains an updated database of real-world User-Agents. This is better because you're using actual browser strings that websites see every day.
First, install it:
pip install fake-useragent
Here's how to use it:
from fake_useragent import UserAgent
import requests
ua = UserAgent()
def scrape_with_rotation(url):
    headers = {
        'User-Agent': ua.random
    }
    response = requests.get(url, headers=headers)
    return response

# Each request gets a different User-Agent
for i in range(5):
    response = scrape_with_rotation('https://httpbin.org/user-agent')
    print(f"Request {i+1} User-Agent: {response.json()}")
What makes this better? The library pulls from a database of actual User-Agents seen in the wild. Every time you call ua.random, you get a different one.
You can also limit to specific browsers:
# Only Chrome and Firefox
ua = UserAgent(browsers=['chrome', 'firefox'])
# Only modern versions
ua = UserAgent(min_version=115.0)
# Get specific browser
print(ua.chrome) # Always returns Chrome UA
print(ua.firefox) # Always returns Firefox UA
Here's a trick I learned: don't just randomize—be smart about which User-Agents you use. Chrome has about 64% market share globally. If you want to blend in, weight your selection toward Chrome:
import requests
import random
from fake_useragent import UserAgent
ua = UserAgent()
def get_weighted_user_agent():
    """
    Returns a User-Agent with realistic browser distribution:
    Chrome: ~65%, Firefox: ~20%, Safari: ~15%
    """
    choice = random.random()
    if choice < 0.65:
        return ua.chrome
    elif choice < 0.85:
        return ua.firefox
    else:
        return ua.safari

def scrape_smart(url):
    headers = {
        'User-Agent': get_weighted_user_agent()
    }
    return requests.get(url, headers=headers)
This approach mimics real traffic patterns better than pure randomization.
Building a User-Agent with Matching Headers (The Trick Nobody Tells You)
Here's where most tutorials stop. But the User-Agent isn't the only header that matters: websites look at your entire header set, and inconsistencies stand out.
For example, if you send a Chrome User-Agent but Firefox's Accept headers, savvy detection systems will notice. Let me show you what I mean:
Chrome's typical headers:
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8
Accept-Language: en-US,en;q=0.9
Accept-Encoding: gzip, deflate, br
Sec-Ch-Ua: "Not_A Brand";v="8", "Chromium";v="120"
Sec-Ch-Ua-Mobile: ?0
Sec-Ch-Ua-Platform: "Windows"
Firefox's typical headers:
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Notice the differences? When you rotate User-Agents, you should rotate the corresponding headers too. Here's how:
import requests
import random
class BrowserProfile:
    """Complete browser profiles with matching headers"""

    CHROME = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.9',
        'Accept-Encoding': 'gzip, deflate, br',
        'Sec-Ch-Ua': '"Not_A Brand";v="8", "Chromium";v="120"',
        'Sec-Ch-Ua-Mobile': '?0',
        'Sec-Ch-Ua-Platform': '"Windows"',
        'Sec-Fetch-Dest': 'document',
        'Sec-Fetch-Mode': 'navigate',
        'Sec-Fetch-Site': 'none',
        'Sec-Fetch-User': '?1',
        'Upgrade-Insecure-Requests': '1'
    }

    FIREFOX = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.5',
        'Accept-Encoding': 'gzip, deflate, br',
        'DNT': '1',
        'Connection': 'keep-alive',
        'Upgrade-Insecure-Requests': '1',
        'Sec-Fetch-Dest': 'document',
        'Sec-Fetch-Mode': 'navigate',
        'Sec-Fetch-Site': 'none',
        'Sec-Fetch-User': '?1'
    }

    SAFARI = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.1 Safari/605.1.15',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.9',
        'Accept-Encoding': 'gzip, deflate, br',
        'Connection': 'keep-alive'
    }

    @classmethod
    def get_random_profile(cls):
        """Return a random browser profile"""
        profiles = [cls.CHROME, cls.FIREFOX, cls.SAFARI]
        return random.choice(profiles).copy()

# Usage
def scrape_with_full_profile(url):
    headers = BrowserProfile.get_random_profile()
    response = requests.get(url, headers=headers)
    return response

response = scrape_with_full_profile('https://httpbin.org/headers')
print(response.json())
This approach is significantly more robust. You're not just rotating the User-Agent—you're rotating complete, consistent browser fingerprints.
Session-Based Rotation: When to Switch Identities
Here's a mistake I made early on: rotating the User-Agent on every single request, even when scraping multiple pages from the same site in sequence.
Think about it from the website's perspective. A real user doesn't magically switch from Chrome on Windows to Firefox on Mac between clicking two links. That's suspicious behavior.
A better approach is session-based rotation:
import requests
import time
import random
from fake_useragent import UserAgent
class SmartScraper:
    def __init__(self):
        self.ua = UserAgent()
        self.session = None
        self.requests_in_session = 0
        self.max_requests_per_session = random.randint(10, 30)

    def _new_session(self):
        """Create a new session with a fresh User-Agent"""
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': self.ua.random
        })
        self.requests_in_session = 0
        self.max_requests_per_session = random.randint(10, 30)
        print(f"New session created. Will use for {self.max_requests_per_session} requests.")

    def get(self, url):
        """Make a request, rotating session after N requests"""
        if self.session is None or self.requests_in_session >= self.max_requests_per_session:
            self._new_session()
        self.requests_in_session += 1
        # Add realistic delay
        time.sleep(random.uniform(1, 3))
        return self.session.get(url)

# Usage
scraper = SmartScraper()
urls = [
    'https://httpbin.org/delay/1',
    'https://httpbin.org/delay/1',
    'https://httpbin.org/delay/1',
]

for url in urls:
    response = scraper.get(url)
    print(f"Status: {response.status_code}")
This maintains a consistent identity for a random number of requests (10-30), then switches. It's much more realistic than changing every time.
The DNS and TLS Fingerprint Problem (Advanced)
Here's something that caught me off guard: even with perfect User-Agent rotation and matching headers, sophisticated sites can still fingerprint you through TLS and DNS.
When Python's requests library (which uses urllib3) makes an HTTPS connection, it creates a TLS fingerprint based on:
- Cipher suites supported
- TLS extensions
- Compression methods
- Elliptic curves advertised
This fingerprint is often different from what real browsers produce, and it's collected before your HTTP headers are even sent.
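You can see this for yourself: the fingerprint-echo service used in the curl_cffi example below will also show what plain requests looks like on the wire. A minimal check (assuming the service is reachable; its exact JSON layout may change):
import requests

# Compare this output with what the same URL reports when opened in your real browser.
# The JA3/JA4-style TLS values reported for requests won't match any mainstream browser.
resp = requests.get('https://tls.peet.ws/api/all', timeout=10)
print(resp.json())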
There are a few ways to handle this:
Option 1: Use curl_cffi
The curl_cffi library mimics browser TLS fingerprints by binding to curl-impersonate, a patched build of libcurl, under the hood:
# pip install curl-cffi
from curl_cffi import requests
# Impersonate Chrome
response = requests.get('https://tls.peet.ws/api/all', impersonate='chrome120')
# Impersonate Safari (check the curl_cffi docs for the full list of supported targets)
response = requests.get('https://tls.peet.ws/api/all', impersonate='safari17_0')
This library automatically handles TLS fingerprinting to match real browsers. It's one of the most effective tools I've found for avoiding advanced detection.
Option 2: Use a headless browser
For JavaScript-heavy sites or when TLS fingerprinting is a concern, Playwright or Selenium with real browser engines provides authentic fingerprints:
from playwright.sync_api import sync_playwright
def scrape_with_real_browser(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        # Override the context's User-Agent with a full, current browser string
        context = browser.new_context(
            user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
        )
        page = context.new_page()
        page.goto(url)
        content = page.content()
        browser.close()
        return content
The tradeoff? Speed and resources. Headless browsers are much slower than HTTP libraries.
Avoiding Common User-Agent Rotation Mistakes
After implementing User-Agent rotation on dozens of projects, here are the mistakes I see most often:
1. Using Outdated User-Agents
Don't hardcode User-Agents from 2020. Browsers update constantly, and old User-Agents look suspicious. The fake-useragent library helps here since it maintains an updated database.
2. Forgetting About Mobile
Real traffic includes mobile devices. Mix in mobile User-Agents:
from fake_useragent import UserAgent
ua = UserAgent(platforms=['pc', 'mobile', 'tablet'])
print(ua.random) # Could be desktop or mobile
3. Rotating Too Perfectly
Ironically, rotating on a perfect schedule (every N requests exactly) can itself be a pattern. Add randomness:
import random
requests_until_rotation = random.randint(5, 20)
4. Ignoring Referer Headers
When you navigate between pages on a site, the Referer header shows where you came from. With User-Agent rotation, don't forget to maintain realistic Referer values:
previous_url = None

for url in urls:
    headers = {'User-Agent': ua.random}
    if previous_url:
        headers['Referer'] = previous_url
    response = requests.get(url, headers=headers)
    previous_url = url
Combining User-Agent Rotation with Proxies
User-Agent rotation alone isn't a silver bullet. For robust scraping, combine it with rotating proxies. Here's why: even with different User-Agents, if all requests come from the same IP address, you're still identifiable.
import requests
from fake_useragent import UserAgent
import random
ua = UserAgent()
proxies_list = [
    {'http': 'http://proxy1.com:8080', 'https': 'http://proxy1.com:8080'},
    {'http': 'http://proxy2.com:8080', 'https': 'http://proxy2.com:8080'},
    {'http': 'http://proxy3.com:8080', 'https': 'http://proxy3.com:8080'},
]

def scrape_with_rotation(url):
    headers = {'User-Agent': ua.random}
    proxy = random.choice(proxies_list)
    response = requests.get(url, headers=headers, proxies=proxy)
    return response
This creates unique (User-Agent, IP) pairs, making each request look like it's from a different user.
When User-Agent Rotation Isn't Enough
Let me be honest: there are limits to what User-Agent rotation can do. If a site uses advanced bot detection like DataDome, PerimeterX, or Cloudflare Bot Management, you're dealing with:
- Canvas fingerprinting
- Mouse movement tracking
- Timing analysis
- Browser automation detection
- Machine learning models that detect patterns across hundreds of variables
For these scenarios, you might need:
- Actual headless browsers (Playwright/Selenium)
- Browser extension spoofing
- CAPTCHA solving services
- Accepting that some sites are meant to be accessed through their APIs
User-Agent rotation is one layer in a defense-in-depth strategy. It's necessary but not always sufficient.
Putting It All Together: A Production-Ready Implementation
Here's a complete example that combines everything we've covered:
import requests
import random
import time
from fake_useragent import UserAgent
class ProductionScraper:
    def __init__(self, proxies=None):
        self.ua = UserAgent(browsers=['chrome', 'firefox', 'safari'])
        self.session = None
        self.proxies = proxies or []
        self.request_count = 0
        self.session_max_requests = random.randint(8, 25)
        self.last_request_time = None

    def _get_headers(self):
        """Generate complete header set"""
        return {
            'User-Agent': self.ua.random,
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.9',
            'Accept-Encoding': 'gzip, deflate, br',
            'DNT': '1',
            'Connection': 'keep-alive',
            'Upgrade-Insecure-Requests': '1'
        }

    def _reset_session(self):
        """Create new session with fresh identity"""
        self.session = requests.Session()
        self.session.headers.update(self._get_headers())
        self.request_count = 0
        self.session_max_requests = random.randint(8, 25)
        print(f"Session reset. New UA: {self.session.headers['User-Agent'][:50]}...")

    def _rate_limit(self):
        """Add realistic delays between requests"""
        if self.last_request_time:
            elapsed = time.time() - self.last_request_time
            min_delay = random.uniform(1.5, 3.5)
            if elapsed < min_delay:
                time.sleep(min_delay - elapsed)
        self.last_request_time = time.time()

    def get(self, url, **kwargs):
        """Make a request with rotation and rate limiting"""
        # Reset session if needed
        if not self.session or self.request_count >= self.session_max_requests:
            self._reset_session()
        # Apply rate limiting
        self._rate_limit()
        # Add proxy if available
        if self.proxies:
            kwargs['proxies'] = random.choice(self.proxies)
        # Make request
        self.request_count += 1
        try:
            response = self.session.get(url, timeout=10, **kwargs)
            response.raise_for_status()
            return response
        except requests.exceptions.RequestException as e:
            print(f"Request failed: {e}")
            return None

# Example usage
scraper = ProductionScraper()
urls = [
    'https://httpbin.org/user-agent',
    'https://httpbin.org/headers',
    'https://httpbin.org/delay/1'
]

for url in urls:
    response = scraper.get(url)
    if response:
        print(f"Success: {response.status_code}")
This implementation includes:
- Session-based rotation (switches identity every 8-25 requests)
- Realistic delays between requests
- Complete header sets
- Optional proxy support
- Error handling
Testing Your Rotation Strategy
Before deploying your scraper, test whether your rotation is actually working. Here are two ways:
1. Check Different User-Agents Are Being Sent:
from collections import Counter
import requests
from fake_useragent import UserAgent
ua = UserAgent()
user_agents_used = []
for _ in range(50):
    headers = {'User-Agent': ua.random}
    response = requests.get('https://httpbin.org/user-agent', headers=headers)
    user_agents_used.append(response.json()['user-agent'])

# Check diversity
print(f"Unique User-Agents: {len(set(user_agents_used))} out of 50 requests")
print(Counter(user_agents_used).most_common(5))
You want to see high diversity. If the same User-Agent appears more than a few times in 50 requests, your rotation might not be working.
2. Test Against Detection Services:
Visit sites like https://httpbin.org/headers or https://www.whatismybrowser.com to verify what headers you're actually sending:
response = requests.get('https://httpbin.org/headers', headers={'User-Agent': ua.random})
print(response.json())
Final Thoughts
User-Agent rotation is fundamental to successful web scraping, but it's not magic. It's one piece of a larger puzzle that includes:
- Rate limiting
- Proxy rotation
- Realistic behavior patterns
- Proper session management
- Respecting robots.txt (a quick check is sketched below)
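On that last point, the standard library already gives you a basic robots.txt check. A minimal sketch, with example.com standing in for whatever site you're actually targeting:
from urllib.robotparser import RobotFileParser

# Parse the site's robots.txt and ask whether a given path may be fetched.
# 'https://example.com' is a placeholder for your target site.
rp = RobotFileParser()
rp.set_url('https://example.com/robots.txt')
rp.read()

print(rp.can_fetch('*', 'https://example.com/some/page'))  # True if crawling is allowed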
The techniques I've shown you here—weighted browser distribution, matching headers, session-based rotation, and TLS considerations—go beyond what most tutorials cover. They're the difference between a scraper that works today and one that keeps working tomorrow.
Remember: the goal isn't to be invisible (you can't be), but to be indistinguishable from legitimate traffic. User-Agent rotation helps you blend into the crowd rather than standing out as an obvious bot.