SEO proxies are specialized proxy servers that mask your IP address while performing search engine optimization tasks like rank tracking, keyword research, SERP scraping, and competitor analysis. They prevent IP bans from Google and other search engines by routing requests through residential or datacenter IPs.
The best SEO proxies combine large IP pools, fast connection speeds, and geo-targeting capabilities to deliver accurate search data at scale. Whether you're an agency tracking thousands of keywords or a solo marketer monitoring local rankings, the right proxy setup determines your success rate.
In this guide, I'll break down the top SEO proxy providers for 2026, show you exactly how to use them with practical Python code examples, and reveal hidden tricks that most articles won't tell you.
Best SEO Proxy Providers at a Glance
| Provider | Best For | Pool Size | Starting Price | Success Rate |
|---|---|---|---|---|
| Roundproxies | All-around SEO & scraping | 100M+ IPs | $2/GB | 98%+ |
| Smartproxy | Budget-friendly option | 40M+ IPs | $4.50/GB | 87% |
| Oxylabs | Ethical enterprise solution | 100M+ IPs | $7.50/GB | 91.76% |
| Bright Data | Large-scale operations | 72M+ IPs | $2/GB | 90%+ |
| SOAX | Flexible rotation options | 155M+ IPs | $2/GB | 81.50% |
| NetNut | Static residential IPs | 5M+ IPs | $3.75/GB | 90.96% |
| Webshare | Budget SOCKS proxies | 150K+ IPs | $1.75/month | 85% |
What Are SEO Proxies and Why Do You Need Them?
SEO proxies act as intermediaries between your scraping tools and search engines.
When you send a request to Google, it sees the proxy's IP address instead of yours. This prevents your main IP from getting blacklisted when running hundreds or thousands of queries.
Here's what happens without proxies: Google detects unusual traffic patterns from your IP. After a few dozen requests, you hit CAPTCHAs. A few hundred more, and your IP gets blocked entirely.
With rotating residential proxies, each request appears to come from a different household. Google can't distinguish your automated queries from normal user searches.
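If you want to see the rotation in action, here's a minimal sketch (assuming a hypothetical rotating gateway endpoint and an IP-echo service like httpbin.org) that prints the exit IP for several consecutive requests:

```python
import requests

# Hypothetical rotating-gateway endpoint and credentials; substitute your provider's values.
PROXY = "http://user123:pass456@gate.example-provider.com:7777"
proxies = {"http": PROXY, "https": PROXY}

# httpbin.org/ip echoes back the IP it sees, so each print should show a different exit IP.
for _ in range(3):
    resp = requests.get("https://httpbin.org/ip", proxies=proxies, timeout=15)
    print(resp.json()["origin"])
```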
Key Use Cases for SEO Proxies
Rank tracking requires checking positions for keywords across different locations. A business in New York needs to know how they rank in LA, Chicago, and Miami. Proxies with geo-targeting let you simulate searches from any city.
Competitor analysis involves scraping competitor websites without revealing your identity. If you're checking their backlink profiles or content structure repeatedly, their server logs will show your IP. Proxies keep you anonymous.
Local SEO auditing means verifying how Google My Business listings appear in different regions. The same search shows different results based on location. Rotating proxies let you check all target markets.
Keyword research at scale generates thousands of queries to find search volumes and related terms. No single IP can handle this volume without triggering Google's anti-bot systems.
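Many providers expose geo-targeting through parameters embedded in the proxy username. The sketch below illustrates that convention with a hypothetical username format; check your provider's documentation for the exact syntax before relying on it:

```python
import requests

def geo_proxy(country, city=None):
    """Build a proxies dict for a specific location.

    The 'user-country-XX-city-YYY' username syntax is a common but
    provider-specific convention; verify the exact format with your provider.
    """
    user = f"user123-country-{country}"
    if city:
        user += f"-city-{city}"
    endpoint = f"http://{user}:pass456@gate.example-provider.com:7777"
    return {"http": endpoint, "https": endpoint}

# Check how the same query would be routed from three different US cities
for city in ["newyork", "chicago", "miami"]:
    r = requests.get("https://httpbin.org/ip", proxies=geo_proxy("us", city), timeout=15)
    print(city, r.json()["origin"])
```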
1. Roundproxies – Premium Quality, Fair Pricing

Overview: Roundproxies has been operating since 2019 and runs one of the larger networks in the proxy industry. With over 100 million IPs spanning nearly every country, it pairs broad global coverage with advanced targeting features.
Key Features:
- Extensive proxy pool across mobile, residential & data center IPs
- Precise targeting by country, city, carrier, etc.
- Flexible customization
- 24/7 live support & account managers
- EXTRA: SEO Residential Proxy Pools
Pros:
- Huge global network for strong IP diversity and minimal blocks
- Powerful tools for web data harvesting at scale
- Highly customizable plans for enterprises
- Superb customer support and SLAs
- Fully own proxy infrastructure
- Competitive pricing
1. Residential Proxies: $3/GB (down to $2 per GB)
2. Datacenter Proxies: from $0.30/month
Best For: Agencies and marketers conducting intensive SERP scraping and data gathering. Roundproxies shines when you want premium residential quality without enterprise-level pricing.
2. Smartproxy - Well-Rounded Option

Overview: Smartproxy strikes a great balance between performance and affordability, making it a top choice for a wide range of users. Its sizable residential proxy pool spans over 195 locations worldwide.
Key Features:
- 40M+ residential IPs & 40K+ data center IPs
- Auto-rotating or sticky sessions
- Unlimited connections & bandwidth
- Simple, user-friendly dashboard
Pros:
- Very competitive pricing for the proxy quality
- Great for diverse use cases like sneaker copping, social media, etc.
- Beginner-friendly browser extensions
- Free demo for testing the waters
Cons:
- Smaller network vs. some top-tier peers
- Limited city-level targeting
1. Residential Proxies: $7/GB (down to $4.50 per GB)
2. Datacenter Proxies: from $2.50/month
Best For: Smartproxy is an excellent "daily driver" for solopreneurs, small-to-medium businesses, and freelancers needing reliable proxies for tasks like ad verification, market research, etc.
3. Oxylabs - Powerful, Ethical Proxy Solution

Overview: Oxylabs provides a stellar combination of proxy performance and ethical practices. Its impressive proxy infrastructure and commitment to transparency make it a provider you can feel good about using.
Key Features:
- 100+ million residential proxies
- Extensive data center & mobile proxies
- Robust web scraping tools & integrations
- Advanced proxy rotation settings
Pros:
- Strong proxy performance rivaling top competitors
- Clear ethical practices and supply transparency
- Stellar customer support and public roadmap
- Unique features like AI-based Real-Time Crawler
Cons:
- Higher minimum commitment than some providers
- Pricing not disclosed upfront
1. Residential Proxies: $8/GB (down to $7.50 per GB)
2. Datacenter Proxies: from $1.20/month
Best For: Businesses that need powerful proxies and web scraping capabilities for market research, brand protection, etc. while maintaining high ethical standards of data acquisition.
4. Bright Data (formerly Luminati) - High-Grade Proxy

Overview: Bright Data, previously known as Luminati, is a heavyweight in the proxy industry, trusted by Fortune 500 companies for its vast network and advanced features. With over 72+ million IPs spanning every country, it offers unparalleled global coverage.
Key Features:
- Extensive proxy pool across mobile, residential & data center IPs
- Precise targeting by country, city, carrier, etc.
- Flexible customization and API integration
- 24/7 live support & account managers
Pros:
- Huge global network for strong IP diversity and minimal blocks
- Powerful tools for web data harvesting at scale
- Highly customizable plans for enterprises
- Superb customer support and SLAs
Cons:
- Higher price point vs. some competitors
- May be overkill for small-scale use cases
1. Residential Proxies: $3/GB (down to $2 per GB)
2. Datacenter Proxies: from $0.30/month
Best For: Large enterprises and marketers conducting intensive web scraping and data gathering. Bright Data shines for robust, large-scale data operations where quality and support are top priorities.
5. SOAX - Flexible Proxy Service

Overview: SOAX offers unique flexibility with its backconnect rotating proxies that automatically switch IP addresses for each connection request. Its streamlined service is easy to implement and covers a range of use cases.
Key Features:
- Residential, data center & IPv6 proxy support
- Automatic location-based rotation
- Supports HTTP(S) & SOCKS5 protocols
- Browser extensions for convenient proxy usage
Pros:
- Great for e-commerce and sneaker copping
- Quick setup with instant activation
- Flexible monthly, daily & hourly plans
- Affordable pricing for rotating proxies
Cons:
- Smaller network pool than some competitors
- Less control over targeting vs. other providers
1. Residential Proxies: $6.60/GB (down to $2 per GB)
2. Datacenter Proxies: from $2.50/month
Best For: Sneaker coppers, e-commerce businesses, and individuals wanting a flexible proxy service that's quick and easy to get started with, without a big commitment.
6. NetNut - Dependable Static Residential Proxies

Overview: NetNut provides reliable static residential proxies sourced through Direct Carrier Integrations. This unique approach maintains a consistently fast and stable proxy network ideal for high-volume, long-term use cases.
Key Features:
- Over 5 million static residential IPs
- Direct Carrier Integrations for quality IPs
- API for seamless integration
- Proxy optimizer & detailed analytics tools
Pros:
- Highly stable proxies ideal for long sessions
- Blazing speeds & low fail rates
- Pay-per-use model for cost efficiency
- Responsive tech support team
Cons:
- Fewer rotating IPs vs. backconnect services
- Higher priced than some competitors
- No unlimited option for datacenter proxies; you pay per GB
- Minimum commitment of $99 for residential proxies
1. Residential Proxies: $7.07/GB (down to $3.75 per GB)
2. Datacenter Proxies: from $1.00/month
Best For: Enterprises needing consistently fast, long-lasting sessions for intensive tasks like online marketing and web data gathering. NetNut's static IPs shine for maintaining uninterrupted sessions without hiccups.
7. Webshare - Reliable Proxies

Overview: Webshare specializes in proxies known for their strong anonymity and performance. Its user-friendly service is a top pick for accessing region-restricted content and general privacy needs.
Key Features:
- Shared & private SOCKS4/5 proxies
- Over 150,000 IPs across 50+ countries
- Unlimited bandwidth & traffic
- Handy proxy checker tool
Pros:
- Excellent proxy speeds & low fail rates
- Affordable pricing for reliable SOCKS proxies
- Easy setup & usage across devices/software
- Responsive customer support
Cons:
- More limited IP pool vs. residential providers
- Fewer advanced features for enterprise usage
1. Residential Proxies: $7.00/GB (down to $4.50 per GB)
2. Datacenter Proxies: from $1.75/month
Best For: Accessing geo-restricted content like streaming services, masking your IP for general anonymity, and secure web browsing. Webshare's high-quality SOCKS proxies offer a great combination of performance and affordability.
How to Choose the Right SEO Proxy Type
Different proxy types serve different purposes. Here's when to use each:
Residential Proxies
Choose residential proxies when scraping Google, Bing, or other search engines that actively detect datacenter IPs.
Residential IPs come from real devices on consumer ISP networks. Search engines trust them more because they look like normal user traffic.
The tradeoff is cost. Residential proxies charge by bandwidth, typically $2-8 per gigabyte. Heavy scraping jobs add up quickly.
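To budget for that, a rough back-of-the-envelope estimate helps. The sketch below assumes an average SERP response of around 500 KB and a $4/GB rate; both are placeholders you should replace with your own measurements and your provider's pricing:

```python
def residential_cost_estimate(num_queries, avg_page_kb=500, price_per_gb=4.0):
    """Rough bandwidth cost for residential proxies.

    avg_page_kb and price_per_gb are assumptions: measure your own average
    response size and plug in your provider's rate ($2-8/GB is typical).
    """
    total_gb = num_queries * avg_page_kb / 1_000_000  # KB to GB (decimal)
    return total_gb * price_per_gb

# 10,000 SERP checks at ~500 KB each and $4/GB works out to roughly $20
print(f"${residential_cost_estimate(10_000):.2f}")
```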
Datacenter Proxies
Datacenter proxies work well for site audits, internal page checks, and less-protected targets like Bing.
They cost significantly less – often under $2 for a dedicated IP with unlimited bandwidth. Speed is typically faster than residential connections.
The downside is detectability. Google blocks known datacenter IP ranges aggressively. Success rates drop to 30-50% without additional fingerprinting measures.
ISP Proxies (Static Residential)
ISP proxies combine datacenter speed with residential trust levels.
These IPs are registered to ISPs but hosted in datacenters. They appear residential to target websites while maintaining consistent performance.
Pricing falls between datacenter and residential options. They work well for long-running sessions where you need a stable IP address.
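Because the IP never changes, an ISP proxy pairs naturally with a persistent HTTP session. A minimal sketch, using a hypothetical static endpoint:

```python
import requests

# Hypothetical static ISP proxy endpoint; the exit IP stays the same across
# requests, which suits logged-in, stateful, or long-running sessions.
STATIC_PROXY = "http://user123:pass456@isp.example-provider.com:8000"

session = requests.Session()
session.proxies = {"http": STATIC_PROXY, "https": STATIC_PROXY}

# Cookies and the exit IP persist for the life of the session
first = session.get("https://httpbin.org/ip", timeout=15).json()["origin"]
second = session.get("https://httpbin.org/ip", timeout=15).json()["origin"]
print(first == second)  # expected True with a static ISP proxy
```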
Mobile Proxies
Mobile proxies route through cellular networks (4G/5G connections).
Search engines rarely block mobile IPs because millions of legitimate users share the same IP through carrier-grade NAT. This creates very high trust levels.
Use mobile proxies when other types fail or when you need to check mobile-specific SERPs. They're the most expensive option but also the most reliable.
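To pull mobile-specific SERPs, pair a mobile proxy with a mobile User-Agent so Google serves the mobile layout. A sketch, assuming a hypothetical mobile gateway:

```python
import requests

# Hypothetical mobile-proxy gateway; pair it with a mobile User-Agent
# so Google serves the mobile SERP layout.
MOBILE_PROXY = "http://user123:pass456@mobile.example-provider.com:9000"

headers = {
    "User-Agent": (
        "Mozilla/5.0 (Linux; Android 13; Pixel 7) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36"
    )
}

resp = requests.get(
    "https://www.google.com/search",
    params={"q": "best pizza near me", "hl": "en", "gl": "us"},
    headers=headers,
    proxies={"http": MOBILE_PROXY, "https": MOBILE_PROXY},
    timeout=30,
)
print(resp.status_code, len(resp.text))
```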
Python Code: Building Your Own SERP Scraper with Proxies
Let me show you how to build a production-ready Google scraper using proxies. We'll start simple and add complexity.
Basic Setup with Requests
First, install the required libraries:
```bash
pip install requests beautifulsoup4 lxml
```

Here's a basic scraper that rotates through multiple proxies:
```python
import requests
from bs4 import BeautifulSoup
import random
import time
# Your proxy list - format: ip:port:username:password
PROXIES = [
"gate.roundproxies.com:7777:user123:pass456",
"gate.roundproxies.com:7778:user123:pass456",
"gate.roundproxies.com:7779:user123:pass456",
]
def get_proxy():
"""Return a random proxy from the pool"""
proxy_str = random.choice(PROXIES)
parts = proxy_str.split(":")
ip, port, user, password = parts[0], parts[1], parts[2], parts[3]
return {
"http": f"http://{user}:{password}@{ip}:{port}",
"https": f"http://{user}:{password}@{ip}:{port}"
    }
```

This function randomly selects a proxy from your pool. The format works with most residential proxy providers that use username/password authentication.
Crafting Realistic Request Headers
Google looks at your request headers to detect bots. Here's how to make requests look human:
```python
def get_headers():
"""Return realistic browser headers"""
user_agents = [
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 Chrome/119.0.0.0 Safari/537.36",
"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0",
]
return {
"User-Agent": random.choice(user_agents),
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.5",
"Accept-Encoding": "gzip, deflate, br",
"DNT": "1",
"Connection": "keep-alive",
"Upgrade-Insecure-Requests": "1",
    }
```

The user agent rotation prevents fingerprinting based on browser identity. The other headers match what real browsers send.
The Core Scraping Function
Now let's build the actual scraper:
```python
def scrape_google(keyword, num_results=10, location="us"):
"""
Scrape Google search results for a keyword
Args:
keyword: Search query
num_results: Number of results to fetch
location: Country code for geo-targeting
Returns:
List of result dictionaries with title, url, description
"""
# Build Google search URL
base_url = "https://www.google.com/search"
params = {
"q": keyword,
"num": num_results,
"hl": "en",
"gl": location,
}
results = []
max_retries = 3
for attempt in range(max_retries):
try:
proxy = get_proxy()
headers = get_headers()
response = requests.get(
base_url,
params=params,
headers=headers,
proxies=proxy,
timeout=30
)
if response.status_code == 200:
soup = BeautifulSoup(response.text, "lxml")
results = parse_results(soup)
break
elif response.status_code == 429:
# Rate limited - wait and retry with different proxy
print(f"Rate limited, retrying with new proxy...")
time.sleep(random.uniform(2, 5))
continue
except requests.exceptions.RequestException as e:
print(f"Request failed: {e}")
time.sleep(1)
continue
    return results
```

The retry logic handles temporary failures gracefully. Rate limiting (429 responses) triggers a proxy rotation with a randomized delay.
Parsing Google Results
Google's HTML structure changes frequently. Here's a parser that handles current markup:
```python
def parse_results(soup):
"""Extract search results from Google HTML"""
results = []
# Find organic result containers
result_divs = soup.find_all("div", class_="g")
for div in result_divs:
try:
# Extract title
title_elem = div.find("h3")
title = title_elem.get_text() if title_elem else None
# Extract URL
link_elem = div.find("a")
url = link_elem.get("href") if link_elem else None
# Extract description snippet
desc_elem = div.find("div", class_="VwiC3b")
description = desc_elem.get_text() if desc_elem else None
if title and url:
results.append({
"title": title,
"url": url,
"description": description
})
except Exception as e:
continue
    return results
```

The class names (like VwiC3b) change periodically. You'll need to update these selectors when Google modifies their HTML.
Running Bulk Keyword Checks
Here's how to check rankings for multiple keywords efficiently:
```python
import json
from concurrent.futures import ThreadPoolExecutor
def check_rankings(keywords, your_domain, location="us"):
"""
Check rankings for multiple keywords
Args:
keywords: List of keywords to check
your_domain: Your website domain to find
location: Target location code
Returns:
Dictionary mapping keywords to ranking positions
"""
rankings = {}
for keyword in keywords:
print(f"Checking: {keyword}")
results = scrape_google(keyword, num_results=100, location=location)
position = None
for i, result in enumerate(results, 1):
if your_domain in result.get("url", ""):
position = i
break
rankings[keyword] = {
"position": position,
"found": position is not None,
"total_results": len(results)
}
# Randomized delay between queries
time.sleep(random.uniform(1, 3))
return rankings
# Example usage
keywords = [
"best seo proxies",
"rank tracking software",
"serp scraping python"
]
results = check_rankings(keywords, "yourdomain.com", location="us")
print(json.dumps(results, indent=2))
```

The delay between requests prevents triggering rate limits. Production systems should use longer delays or larger proxy pools.
Advanced Technique: Browser Automation with Selenium
When requests-based scraping fails due to JavaScript rendering or advanced bot detection, browser automation becomes necessary.
Setting Up Selenium with Proxies
```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
def create_driver_with_proxy(proxy_host, proxy_port, proxy_user, proxy_pass):
"""Create a Chrome driver configured with proxy"""
chrome_options = Options()
# Proxy authentication via extension
manifest_json = """
{
"version": "1.0.0",
"manifest_version": 2,
"name": "Chrome Proxy",
"permissions": [
"proxy",
"tabs",
"unlimitedStorage",
"storage",
"<all_urls>",
"webRequest",
"webRequestBlocking"
],
"background": {
"scripts": ["background.js"]
}
}
"""
background_js = f"""
var config = {{
mode: "fixed_servers",
rules: {{
singleProxy: {{
scheme: "http",
host: "{proxy_host}",
port: parseInt({proxy_port})
}},
bypassList: []
}}
}};
chrome.proxy.settings.set({{value: config, scope: "regular"}}, function() {{}});
chrome.webRequest.onAuthRequired.addListener(
function(details) {{
return {{
authCredentials: {{
username: "{proxy_user}",
password: "{proxy_pass}"
}}
}};
}},
{{urls: ["<all_urls>"]}},
['blocking']
);
"""
# Create extension
import zipfile
import os
plugin_file = 'proxy_auth_plugin.zip'
with zipfile.ZipFile(plugin_file, 'w') as zp:
zp.writestr("manifest.json", manifest_json)
zp.writestr("background.js", background_js)
chrome_options.add_extension(plugin_file)
# Additional stealth options
chrome_options.add_argument("--disable-blink-features=AutomationControlled")
chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
driver = webdriver.Chrome(options=chrome_options)
# Clean up extension file
os.remove(plugin_file)
    return driver
```

This creates a Chrome extension on-the-fly to handle proxy authentication. The stealth options help avoid detection by hiding Selenium's automation markers.
Scraping with Selenium
```python
def scrape_google_selenium(keyword, proxy_config):
"""Scrape Google using Selenium with proxy"""
driver = create_driver_with_proxy(
proxy_config["host"],
proxy_config["port"],
proxy_config["user"],
proxy_config["pass"]
)
try:
url = f"https://www.google.com/search?q={keyword}&num=20"
driver.get(url)
# Wait for results to load
WebDriverWait(driver, 10).until(
EC.presence_of_element_located((By.CSS_SELECTOR, "div.g"))
)
# Extract results
results = []
elements = driver.find_elements(By.CSS_SELECTOR, "div.g")
for elem in elements:
try:
title = elem.find_element(By.CSS_SELECTOR, "h3").text
link = elem.find_element(By.CSS_SELECTOR, "a").get_attribute("href")
results.append({"title": title, "url": link})
except Exception:
continue
return results
finally:
        driver.quit()
```

Selenium handles JavaScript-rendered content that requests can't see. The tradeoff is slower execution and higher resource usage.
Hidden Tricks Most Articles Won't Tell You
Here are advanced techniques that experienced SEO professionals use:
Trick 1: Sticky Sessions for Consistent Results
When checking rankings repeatedly, use sticky sessions to maintain the same IP:
```python
# Configure proxy for sticky session
proxy_endpoint = "gate.roundproxies.com:7777"
session_id = "my_seo_project_12345"
# Add session ID to username for sticky routing
proxy_auth = f"user123-session-{session_id}:password456"Most providers support session IDs in the username field. This routes all requests through the same IP for consistent results over time.
Trick 2: Location-Specific Search Parameters
Google uses multiple signals to determine location. Stack them for accuracy:
```python
params = {
"q": keyword,
"gl": "us", # Country
"hl": "en", # Language
"uule": "w+CAIQIC...", # Encoded location (city level)
"near": "New York, NY", # Additional location hint
}
```

The uule parameter encodes a specific geographic location. You can generate these using online tools or calculate them programmatically.
Trick 3: Fingerprint Randomization
Beyond IP rotation, randomize your browser fingerprint:
```python
import random
def randomize_viewport():
"""Generate random but realistic viewport dimensions"""
common_widths = [1366, 1440, 1536, 1920, 2560]
common_heights = [768, 900, 864, 1080, 1440]
return {
"width": random.choice(common_widths),
"height": random.choice(common_heights)
}
def randomize_timezone():
"""Return a random common timezone"""
timezones = [
"America/New_York",
"America/Chicago",
"America/Los_Angeles",
"America/Denver"
]
    return random.choice(timezones)
```

Combine these with proxy rotation to create unique browser profiles for each request.
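As a sketch of how these helpers plug into the Selenium setup from earlier, you could resize the window and override the reported timezone through Chrome DevTools commands (this assumes a Chrome-based driver like the one built by create_driver_with_proxy):

```python
def apply_random_fingerprint(driver):
    """Apply the randomized viewport and timezone helpers defined above.

    Relies on Chrome DevTools Protocol commands, so it assumes a
    Chrome/Chromium driver.
    """
    viewport = randomize_viewport()
    driver.set_window_size(viewport["width"], viewport["height"])

    # Override the timezone the browser reports to JavaScript
    driver.execute_cdp_cmd(
        "Emulation.setTimezoneOverride",
        {"timezoneId": randomize_timezone()},
    )

# Example usage (hypothetical credentials):
# driver = create_driver_with_proxy("gate.roundproxies.com", "7777", "user123", "pass456")
# apply_random_fingerprint(driver)
```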
Trick 4: Detect and Handle CAPTCHAs
Implement CAPTCHA detection to switch proxies before wasting requests:
```python
def check_for_captcha(response_text):
"""Detect if response contains a CAPTCHA"""
captcha_indicators = [
"unusual traffic",
"automated queries",
"captcha",
"recaptcha",
"/sorry/index"
]
response_lower = response_text.lower()
for indicator in captcha_indicators:
if indicator in response_lower:
return True
return False
# In your scraping loop
if check_for_captcha(response.text):
print("CAPTCHA detected - rotating proxy")
proxy = get_new_proxy()
    continue
```

Trick 5: Use Different Proxy Types for Different Tasks
Create a proxy pool with mixed types for optimal cost and reliability:
```python
class ProxyPool:
def __init__(self):
self.residential = [] # For Google
self.datacenter = [] # For site audits
self.mobile = [] # For mobile SERPs
def get_proxy_for_target(self, target):
if "google.com" in target:
return random.choice(self.residential)
elif target.startswith("mobile:"):
return random.choice(self.mobile)
else:
            return random.choice(self.datacenter)
```

This approach optimizes costs by using cheaper datacenter proxies where possible.
Troubleshooting Common SEO Proxy Issues
Problem: High Failure Rates on Google
Symptoms: Many 403 or 429 responses, frequent CAPTCHAs.
Solutions:
- Switch from datacenter to residential proxies
- Reduce request frequency (add longer delays; a backoff sketch follows this list)
- Rotate user agents more frequently
- Check if your proxy provider is specifically blocked
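A simple way to combine several of these fixes is exponential backoff plus proxy rotation on blocked responses. This sketch assumes a get_proxy() helper like the one earlier in this guide:

```python
import random
import time
import requests

def fetch_with_backoff(url, get_proxy, max_attempts=5):
    """Retry a blocked request with exponential backoff and a fresh proxy each time.

    get_proxy is assumed to return a requests-style proxies dict,
    like the get_proxy() helper earlier in this guide.
    """
    for attempt in range(max_attempts):
        try:
            resp = requests.get(url, proxies=get_proxy(), timeout=30)
        except requests.exceptions.RequestException:
            resp = None
        if resp is not None and resp.status_code not in (403, 429):
            return resp
        # Exponential backoff with jitter: roughly 2s, 4s, 8s... plus random noise
        wait = 2 ** (attempt + 1) + random.uniform(0, 2)
        print(f"Blocked or failed, retrying in {wait:.1f}s with a new proxy")
        time.sleep(wait)
    return None
```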
Problem: Inconsistent Ranking Data
Symptoms: Same keyword shows different positions across checks.
Solutions:
- Use sticky sessions to maintain consistent IPs
- Set explicit location parameters (gl, hl, uule)
- Clear cookies between sessions
- Run checks at consistent times of day
Problem: Slow Response Times
Symptoms: Requests taking 10+ seconds to complete.
Solutions:
- Choose proxy servers geographically closer to you (a quick latency benchmark is sketched below)
- Switch to ISP proxies for faster connections
- Reduce concurrent connections to avoid overloading
- Check if your provider's infrastructure is congested
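To find the faster endpoints, you can time a trivial request through each proxy and prefer the quickest ones. A minimal benchmark sketch (the proxy URLs are placeholders):

```python
import time
import requests

def benchmark_proxies(proxy_list, test_url="https://httpbin.org/ip", timeout=15):
    """Time a trivial request through each proxy and return them fastest-first.

    proxy_list entries are assumed to be full proxy URLs (http://user:pass@host:port).
    """
    timings = []
    for proxy in proxy_list:
        start = time.monotonic()
        try:
            requests.get(test_url, proxies={"http": proxy, "https": proxy}, timeout=timeout)
            timings.append((time.monotonic() - start, proxy))
        except requests.exceptions.RequestException:
            timings.append((float("inf"), proxy))  # unreachable or timed out
    return [proxy for _, proxy in sorted(timings)]
```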
Bonus: Complete Rank Tracking System Architecture
For those building production rank tracking systems, here's a complete architecture that handles thousands of keywords daily.
Database Schema for Storing Rankings
```python
import sqlite3
from datetime import datetime
def create_database():
"""Set up SQLite database for rank tracking"""
conn = sqlite3.connect('rankings.db')
cursor = conn.cursor()
# Keywords table
cursor.execute('''
CREATE TABLE IF NOT EXISTS keywords (
id INTEGER PRIMARY KEY AUTOINCREMENT,
keyword TEXT NOT NULL,
domain TEXT NOT NULL,
location TEXT DEFAULT 'us',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE(keyword, domain, location)
)
''')
# Rankings history table
cursor.execute('''
CREATE TABLE IF NOT EXISTS rankings (
id INTEGER PRIMARY KEY AUTOINCREMENT,
keyword_id INTEGER NOT NULL,
position INTEGER,
url TEXT,
checked_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (keyword_id) REFERENCES keywords(id)
)
''')
# SERP snapshots for competitor analysis
cursor.execute('''
CREATE TABLE IF NOT EXISTS serp_snapshots (
id INTEGER PRIMARY KEY AUTOINCREMENT,
keyword_id INTEGER NOT NULL,
position INTEGER NOT NULL,
title TEXT,
url TEXT,
description TEXT,
checked_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (keyword_id) REFERENCES keywords(id)
)
''')
conn.commit()
    return conn
```

This schema stores historical ranking data along with full SERP snapshots. You can track your position over time and monitor which competitors appear for each keyword.
Scheduling Daily Rank Checks
```python
import schedule
import threading
from queue import Queue
class RankTracker:
def __init__(self, proxy_pool, db_connection):
self.proxy_pool = proxy_pool
self.db = db_connection
self.task_queue = Queue()
self.results_queue = Queue()
def add_keywords(self, keywords, domain, location='us'):
"""Add keywords to track"""
cursor = self.db.cursor()
for keyword in keywords:
cursor.execute('''
INSERT OR IGNORE INTO keywords (keyword, domain, location)
VALUES (?, ?, ?)
''', (keyword, domain, location))
self.db.commit()
def check_all_keywords(self):
"""Queue all keywords for checking"""
cursor = self.db.cursor()
cursor.execute('SELECT id, keyword, domain, location FROM keywords')
for row in cursor.fetchall():
self.task_queue.put({
'id': row[0],
'keyword': row[1],
'domain': row[2],
'location': row[3]
})
def worker(self):
"""Worker thread that processes keyword checks"""
while True:
task = self.task_queue.get()
if task is None:
break
try:
results = scrape_google(
task['keyword'],
num_results=100,
location=task['location']
)
# Find domain position
position = None
found_url = None
for i, result in enumerate(results, 1):
if task['domain'] in result.get('url', ''):
position = i
found_url = result['url']
break
self.results_queue.put({
'keyword_id': task['id'],
'position': position,
'url': found_url,
'serp': results
})
except Exception as e:
print(f"Error checking {task['keyword']}: {e}")
self.task_queue.task_done()
def run_workers(self, num_workers=5):
"""Start worker threads"""
workers = []
for _ in range(num_workers):
t = threading.Thread(target=self.worker)
t.start()
workers.append(t)
        return workers
```

The threaded architecture processes multiple keywords concurrently while respecting rate limits through your proxy pool.
Generating Ranking Reports
```python
def generate_ranking_report(db, domain, days=30):
"""Generate a ranking trend report"""
cursor = db.cursor()
cursor.execute('''
SELECT
k.keyword,
k.location,
r.position,
r.checked_at
FROM keywords k
JOIN rankings r ON k.id = r.keyword_id
WHERE k.domain = ?
AND r.checked_at > datetime('now', '-' || ? || ' days')
ORDER BY k.keyword, r.checked_at
''', (domain, days))
results = cursor.fetchall()
# Group by keyword
from collections import defaultdict
keyword_data = defaultdict(list)
for row in results:
keyword_data[row[0]].append({
'position': row[2],
'date': row[3]
})
# Calculate trends
report = []
for keyword, data in keyword_data.items():
if len(data) >= 2:
first_position = data[0]['position']
last_position = data[-1]['position']
if first_position and last_position:
change = first_position - last_position
trend = 'up' if change > 0 else 'down' if change < 0 else 'stable'
else:
change = None
trend = 'unknown'
report.append({
'keyword': keyword,
'current_position': last_position,
'change': change,
'trend': trend,
'data_points': len(data)
})
    return sorted(report, key=lambda x: x['current_position'] or 999)
```

Advanced: Anti-Detection Techniques for 2026
Google's bot detection has become increasingly sophisticated. Here are cutting-edge techniques to stay under the radar.
Browser Fingerprint Spoofing
Modern detection looks beyond IP addresses to browser fingerprints. Here's how to randomize key fingerprint elements:
```python
import random
import string
def generate_canvas_noise():
"""Generate unique canvas fingerprint noise"""
return ''.join(random.choices(string.ascii_letters + string.digits, k=32))
def get_spoofed_navigator():
"""Generate spoofed navigator properties"""
platforms = [
{"platform": "Win32", "oscpu": "Windows NT 10.0; Win64; x64"},
{"platform": "MacIntel", "oscpu": "Intel Mac OS X 10_15_7"},
{"platform": "Linux x86_64", "oscpu": "Linux x86_64"}
]
selected = random.choice(platforms)
return {
"platform": selected["platform"],
"oscpu": selected["oscpu"],
"hardwareConcurrency": random.choice([4, 8, 12, 16]),
"deviceMemory": random.choice([4, 8, 16, 32]),
"languages": ["en-US", "en"],
"webdriver": False,
"plugins_length": random.randint(3, 7)
    }
```

Inject these values into your Selenium sessions using JavaScript execution:
```python
def inject_fingerprint_spoofing(driver, navigator_props):
"""Inject fingerprint spoofing scripts"""
script = f"""
// Override navigator properties
Object.defineProperty(navigator, 'platform', {{
get: () => '{navigator_props["platform"]}'
}});
Object.defineProperty(navigator, 'hardwareConcurrency', {{
get: () => {navigator_props["hardwareConcurrency"]}
}});
Object.defineProperty(navigator, 'webdriver', {{
get: () => false
}});
// Override plugins to look normal
Object.defineProperty(navigator, 'plugins', {{
get: () => new Array({navigator_props["plugins_length"]}).fill({{name: 'Plugin'}})
}});
"""
driver.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument', {
'source': script
    })
```

TLS Fingerprint Randomization
Advanced detection systems analyze TLS handshake patterns. Different browsers and versions create unique TLS fingerprints.
```python
# Using curl_cffi for TLS fingerprint impersonation
from curl_cffi import requests as curl_requests
def request_with_tls_spoofing(url, proxy):
"""Make request with Chrome TLS fingerprint"""
response = curl_requests.get(
url,
impersonate="chrome120", # Impersonate Chrome 120
proxies={
"http": proxy,
"https": proxy
},
timeout=30
)
    return response
```

The curl_cffi library impersonates the TLS fingerprint of real browsers. This bypasses detection systems that identify automation by TLS characteristics.
Request Timing Humanization
Bots make requests in predictable patterns. Humans don't. Add realistic timing variations:
```python
import numpy as np
def human_delay():
"""Generate human-like delay between requests"""
# Most humans have reaction times between 0.5-2 seconds
# With occasional longer pauses for reading
if random.random() < 0.1:
# 10% chance of longer pause (reading content)
return np.random.gamma(shape=2, scale=5)
else:
# Normal inter-request delay
return np.random.gamma(shape=2, scale=1) + 0.5
def scroll_pattern():
"""Generate human-like scroll behavior"""
scroll_actions = []
# Humans scroll in bursts, not continuously
num_scrolls = random.randint(3, 8)
for _ in range(num_scrolls):
scroll_distance = random.randint(100, 500)
pause_after = random.uniform(0.5, 2.0)
scroll_actions.append({
'distance': scroll_distance,
'pause': pause_after
})
    return scroll_actions
```

Cost Optimization Strategies
Proxy costs can add up quickly. Here's how to minimize spend while maintaining quality:
Strategy 1: Tiered Proxy Usage
Use expensive residential proxies only when necessary:
```python
class TieredProxyManager:
def __init__(self, datacenter_proxies, residential_proxies, mobile_proxies):
self.tiers = {
'datacenter': datacenter_proxies,
'residential': residential_proxies,
'mobile': mobile_proxies
}
self.tier_costs = {
'datacenter': 0.10, # $ per 1000 requests
'residential': 2.00, # $ per GB
'mobile': 5.00 # $ per GB
}
def get_proxy_for_task(self, task_type, previous_failures=0):
"""Select proxy tier based on task and failure history"""
if task_type == 'site_audit':
return random.choice(self.tiers['datacenter']), 'datacenter'
elif task_type == 'google_serp':
if previous_failures == 0:
return random.choice(self.tiers['residential']), 'residential'
elif previous_failures >= 2:
return random.choice(self.tiers['mobile']), 'mobile'
else:
return random.choice(self.tiers['residential']), 'residential'
else:
            return random.choice(self.tiers['datacenter']), 'datacenter'
```

Start with cheaper proxies and escalate to more expensive tiers only when needed.
Strategy 2: Caching SERP Results
Don't re-scrape data you already have:
```python
import hashlib
import json
from datetime import datetime, timedelta
class SERPCache:
def __init__(self, db_connection, cache_duration_hours=24):
self.db = db_connection
self.cache_duration = timedelta(hours=cache_duration_hours)
self._init_table()
def _init_table(self):
cursor = self.db.cursor()
cursor.execute('''
CREATE TABLE IF NOT EXISTS serp_cache (
cache_key TEXT PRIMARY KEY,
results TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
''')
self.db.commit()
def _get_key(self, keyword, location):
key_string = f"{keyword}:{location}"
return hashlib.md5(key_string.encode()).hexdigest()
def get(self, keyword, location):
"""Get cached results if fresh"""
cache_key = self._get_key(keyword, location)
cursor = self.db.cursor()
cursor.execute('''
SELECT results, created_at FROM serp_cache
WHERE cache_key = ?
''', (cache_key,))
row = cursor.fetchone()
if row:
created_at = datetime.fromisoformat(row[1])
if datetime.now() - created_at < self.cache_duration:
return json.loads(row[0])
return None
def set(self, keyword, location, results):
"""Cache SERP results"""
cache_key = self._get_key(keyword, location)
cursor = self.db.cursor()
cursor.execute('''
INSERT OR REPLACE INTO serp_cache (cache_key, results, created_at)
VALUES (?, ?, ?)
''', (cache_key, json.dumps(results), datetime.now().isoformat()))
        self.db.commit()
```

For competitor monitoring where real-time data isn't critical, a 24-hour cache reduces proxy costs significantly.
Strategy 3: Off-Peak Scheduling
Run heavy scraping jobs during off-peak hours when proxy networks are less congested:
```python
import schedule
from datetime import datetime
def is_off_peak():
"""Check if current time is off-peak (lower proxy usage)"""
hour = datetime.now().hour
# US off-peak: 2 AM - 6 AM Eastern
return 2 <= hour <= 6
def schedule_heavy_jobs():
"""Schedule resource-intensive jobs for off-peak"""
schedule.every().day.at("03:00").do(run_full_serp_analysis)
schedule.every().day.at("04:00").do(run_competitor_backlink_check)
# Light monitoring can run throughout the day
    schedule.every(4).hours.do(run_priority_keyword_check)
```

Final Thoughts: Building Your SEO Data Infrastructure
The best SEO professionals treat data collection as infrastructure, not a one-time project.
Invest in building robust systems that can scale with your needs. Start with the basics: a reliable proxy provider, a simple scraper, and a database to store results.
As you grow, add sophistication: multi-tiered proxy management, fingerprint spoofing, intelligent caching. The compound effect of good infrastructure pays dividends for years.
The proxy landscape evolves constantly. Providers improve their networks, search engines update detection methods, and new tools emerge. Stay current by testing regularly and maintaining relationships with multiple providers.
Your competitors are collecting data. The question is whether you'll have better data than they do.