I've been scraping websites for years, and if there's one thing I've learned, it's this: proxies aren't optional—they're the difference between collecting data at scale and getting your IP banned in five minutes.
When you're making hundreds or thousands of requests to a website, that site will notice. And when they notice, they'll block you. Proxies let you distribute those requests across multiple IP addresses, making your scraper look like regular traffic instead of a bot hammering their servers.
In this guide, I'll walk you through everything you need to know about using proxies for web scraping—from the basics to some lesser-known techniques that actually work.
Why proxies matter for web scraping
Here's the reality: most websites don't want you scraping them. They'll implement rate limits, IP bans, CAPTCHAs, and sophisticated fingerprinting techniques to stop you.
Without proxies, you're scraping from a single IP address. That's like showing up to a store every five seconds asking for the same product. You're going to get noticed, and you're going to get kicked out.
Proxies solve three critical problems:
IP bans: By rotating through multiple IP addresses, you avoid triggering automated blocking systems that flag high-volume requests from a single source.
Rate limits: Websites often limit how many requests an IP can make per minute. Proxies let you spread requests across multiple IPs, effectively bypassing these limits.
Geo-restrictions: Some content is only available in specific countries. Proxies from those locations give you access to region-locked data.
The trick is knowing which proxies to use and how to manage them properly. Let's start with the basics.
Types of proxies for web scraping
Not all proxies are created equal. The type you choose depends on your target website, budget, and how sophisticated their anti-bot measures are.
Datacenter proxies
These come from data centers—think cloud providers like AWS or DigitalOcean. They're fast, cheap, and perfect for scraping sites without heavy anti-bot protection.
The downside? They're easier to detect. Websites can often identify datacenter IPs because they come in predictable ranges and aren't associated with real ISPs. For basic scraping tasks, though, they're usually fine.
Use datacenter proxies when you're scraping sites like job boards, product catalogs, or any target that doesn't have sophisticated blocking.
Residential proxies
Residential proxies use IP addresses assigned to real homes by internet service providers. From the website's perspective, you look like a regular person browsing from their couch.
They're much harder to detect and block, which is why they cost more—sometimes significantly more. But for scraping e-commerce sites, social media platforms, or anything with serious anti-bot measures, residential proxies are often necessary.
The catch is that residential proxy pools are usually shared, and you need to rotate through them frequently to avoid detection.
Mobile proxies
Mobile proxies use IP addresses from cellular networks. These are the holy grail of proxies because mobile IPs are shared among many users and have incredibly high trust scores.
They're also expensive and can be slower than other options. But if you're scraping mobile apps or sites with aggressive blocking (like sneaker sites or ticket vendors), mobile proxies might be your only option.
ISP proxies
ISP proxies are a hybrid—they're hosted in data centers but use IP addresses registered to ISPs. You get the speed of datacenter proxies with some of the legitimacy of residential IPs.
They're a middle ground option that works well for medium-difficulty targets.
Setting up proxies in Python
Let's get practical. I'll show you how to use proxies with Python's `requests` library, which is what most people use for HTTP-based scraping.
Basic proxy setup
Here's the simplest way to use a proxy:
```python
import requests

proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}

response = requests.get('http://httpbin.org/ip', proxies=proxies)
print(response.json())
```
That's it. The `proxies` dictionary tells `requests` to route your traffic through the specified proxy server. The response will show the proxy's IP address instead of yours.
If your proxy requires authentication (most do), add your credentials to the URL:
```python
proxies = {
    'http': 'http://username:password@proxy.example.com:8080',
    'https': 'http://username:password@proxy.example.com:8080',
}
```
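One related gotcha: if the username or password contains characters like `@`, `:`, or `/`, the proxy URL will parse incorrectly. A small sketch using the standard library to escape placeholder credentials before building the URL:

```python
from urllib.parse import quote

# Placeholder credentials, just to show the escaping step
username = quote('user@example.com', safe='')
password = quote('p@ss:word', safe='')

proxy_url = f'http://{username}:{password}@proxy.example.com:8080'
proxies = {'http': proxy_url, 'https': proxy_url}
```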
Implementing proxy rotation
Using a single proxy defeats the purpose. You need to rotate through a pool of proxies to distribute your requests. Here's a basic rotation implementation:
```python
import requests
import random

proxy_list = [
    'http://username:password@proxy1.example.com:8080',
    'http://username:password@proxy2.example.com:8080',
    'http://username:password@proxy3.example.com:8080',
    'http://username:password@proxy4.example.com:8080',
]

def get_random_proxy():
    return random.choice(proxy_list)

def scrape_with_rotation(url):
    proxy = get_random_proxy()
    proxies = {
        'http': proxy,
        'https': proxy,
    }
    try:
        response = requests.get(url, proxies=proxies, timeout=10)
        return response.text
    except requests.exceptions.RequestException as e:
        print(f"Error with proxy {proxy}: {e}")
        return None

# Use it
urls = [
    'http://example.com/page1',
    'http://example.com/page2',
    'http://example.com/page3',
]

for url in urls:
    content = scrape_with_rotation(url)
    if content:
        print(f"Scraped {url} successfully")
```
This code randomly selects a proxy for each request. Simple, but it works for basic scraping tasks.
Smarter proxy rotation with weighting
Random rotation is fine, but you can do better. Here's a technique I use that weights proxies based on their reliability and recent usage:
```python
import random
import requests
from time import time

class Proxy:
    def __init__(self, ip, proxy_type="datacenter"):
        self.ip = ip  # full proxy URL, e.g. http://user:pass@host:port
        self.type = proxy_type
        self.status = "unchecked"  # alive, unchecked, dead
        self.last_used = None
        self.failures = 0

    def __repr__(self):
        return self.ip

class ProxyRotator:
    def __init__(self, proxies):
        self.proxies = [Proxy(p) for p in proxies]

    def get_weighted_proxy(self):
        weights = []
        for proxy in self.proxies:
            weight = 1000
            # Penalize dead proxies heavily
            if proxy.status == "dead":
                weight -= 800
            # Prefer residential over datacenter
            if proxy.type == "residential":
                weight += 300
            # Penalize recently used proxies
            if proxy.last_used:
                seconds_since_use = time() - proxy.last_used
                if seconds_since_use < 5:
                    weight -= 400
            # Penalize proxies with recent failures
            weight -= proxy.failures * 100
            weights.append(max(weight, 1))
        return random.choices(self.proxies, weights=weights)[0]

    def mark_success(self, proxy):
        proxy.status = "alive"
        proxy.last_used = time()
        proxy.failures = 0

    def mark_failure(self, proxy):
        proxy.failures += 1
        if proxy.failures >= 3:
            proxy.status = "dead"

# Usage
proxy_list = [
    'http://user:pass@proxy1.com:8080',
    'http://user:pass@proxy2.com:8080',
    'http://user:pass@proxy3.com:8080',
]

rotator = ProxyRotator(proxy_list)

def smart_scrape(url):
    max_retries = 3
    for attempt in range(max_retries):
        proxy = rotator.get_weighted_proxy()
        proxies = {'http': proxy.ip, 'https': proxy.ip}
        try:
            response = requests.get(url, proxies=proxies, timeout=10)
            if response.status_code == 200:
                rotator.mark_success(proxy)
                return response.text
            else:
                rotator.mark_failure(proxy)
        except Exception as e:
            rotator.mark_failure(proxy)
            print(f"Attempt {attempt + 1} failed with {proxy.ip}: {e}")
    return None
```
This approach avoids using the same proxy repeatedly, deprioritizes proxies that have failed recently, and gives preference to higher-quality proxy types. It's more sophisticated than random rotation and performs better at scale.
Beyond basic proxies: fingerprinting and detection
Here's something most proxy guides won't tell you: using proxies alone isn't enough anymore. Modern anti-bot systems use browser fingerprinting to identify scrapers, even when you're rotating IPs.
Fingerprinting analyzes dozens of browser characteristics—user agent, screen resolution, installed fonts, WebGL rendering, canvas fingerprints, and even TLS handshake patterns. If these don't match what a real browser would send, you're getting blocked regardless of your proxy.
User-Agent rotation
At minimum, rotate your User-Agent header alongside your proxies:
```python
import requests
import random

user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:132.0) Gecko/20100101 Firefox/132.0',
]

def scrape_with_headers(url, proxy):
    headers = {
        'User-Agent': random.choice(user_agents),
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.5',
        'Accept-Encoding': 'gzip, deflate, br',
        'Connection': 'keep-alive',
        'Upgrade-Insecure-Requests': '1',
    }
    proxies = {'http': proxy, 'https': proxy}
    response = requests.get(url, headers=headers, proxies=proxies)
    return response.text
```
This makes each request look like it's coming from a different browser. It's basic, but it works for sites with simple detection.
TLS fingerprinting and why it matters
Here's where things get interesting. Even if you rotate proxies and user agents, websites can still identify you through TLS fingerprinting.
When your Python script makes an HTTPS request, it performs a TLS handshake with the server. The handshake includes information about supported TLS versions, cipher suites, and extensions. This creates a unique "fingerprint" that identifies your HTTP client.
The problem? Python's `requests` library uses `urllib3`, which has a TLS fingerprint that's nothing like Chrome or Firefox. Websites can detect this instantly.
The solution is to use tools that mimic real browser TLS fingerprints. For Python, `curl_cffi` is one option:
```python
from curl_cffi import requests

# This mimics Chrome's TLS fingerprint
response = requests.get('https://example.com', impersonate="chrome120")
print(response.text)
```
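You can combine impersonation with your proxy pool too. A quick sketch with a placeholder proxy URL, assuming your version of `curl_cffi` accepts the requests-style `proxies` argument:

```python
from curl_cffi import requests

proxies = {
    'http': 'http://username:password@proxy.example.com:8080',
    'https': 'http://username:password@proxy.example.com:8080',
}

# Chrome-like TLS fingerprint and a rotated IP in the same request
response = requests.get('https://example.com',
                        impersonate="chrome120",
                        proxies=proxies)
print(response.status_code)
```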
Or use browser automation tools like Playwright or Selenium with stealth plugins for JavaScript-heavy sites.
Handling JavaScript execution
Many modern websites load content dynamically with JavaScript. A simple HTTP request won't work—you need to render the JavaScript.
For these cases, combine proxies with headless browsers:
```python
from playwright.sync_api import sync_playwright

def scrape_with_browser(url):
    with sync_playwright() as p:
        # Route all browser traffic through the proxy
        browser = p.chromium.launch(
            proxy={
                'server': 'http://proxy.example.com:8080',
                'username': 'user',
                'password': 'pass',
            }
        )
        context = browser.new_context(
            user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64)...'
        )
        page = context.new_page()
        page.goto(url)
        content = page.content()
        browser.close()
        return content
```
This executes JavaScript and returns the fully rendered page. The proxy ensures your real IP stays hidden.
Managing proxy pools at scale
When you're scraping thousands of pages, you need proper proxy management. Here's what I've learned:
Test your proxies before using them. Don't assume every proxy in your pool works. Write a test function that checks each proxy and removes dead ones:
```python
def test_proxy(proxy):
    proxies = {'http': proxy, 'https': proxy}
    try:
        response = requests.get('http://httpbin.org/ip',
                                proxies=proxies,
                                timeout=5)
        return response.status_code == 200
    except requests.exceptions.RequestException:
        return False

# Filter working proxies
working_proxies = [p for p in proxy_list if test_proxy(p)]
```
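Checking a big pool one proxy at a time gets slow. Here's a rough variation (not part of the snippet above, just a common pattern) that runs the same check concurrently with the standard library:

```python
from concurrent.futures import ThreadPoolExecutor

def filter_working_proxies(proxy_list, max_workers=20):
    # The checks are network-bound, so threads are enough here
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(test_proxy, proxy_list))
    return [p for p, ok in zip(proxy_list, results) if ok]

working_proxies = filter_working_proxies(proxy_list)
```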
Monitor success rates. Track which proxies work reliably and which cause problems. Remove consistently failing proxies from your pool.
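One way to do that, sketched here with a hypothetical `ProxyStats` helper rather than anything from the rotator above: count attempts and successes per proxy, then prune anything that falls below a threshold.

```python
from collections import defaultdict

class ProxyStats:
    """Hypothetical helper that tracks per-proxy success rates."""

    def __init__(self):
        self.attempts = defaultdict(int)
        self.successes = defaultdict(int)

    def record(self, proxy, success):
        self.attempts[proxy] += 1
        if success:
            self.successes[proxy] += 1

    def success_rate(self, proxy):
        if self.attempts[proxy] == 0:
            return 1.0  # no data yet, give it the benefit of the doubt
        return self.successes[proxy] / self.attempts[proxy]

    def prune(self, proxy_list, min_rate=0.5, min_attempts=10):
        # Keep proxies that are unproven or above the threshold
        return [p for p in proxy_list
                if self.attempts[p] < min_attempts
                or self.success_rate(p) >= min_rate]
```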
Handle errors gracefully. Proxies fail. Connection timeouts happen. Your code needs retry logic:
```python
def scrape_with_retry(url, max_retries=5):
    for attempt in range(max_retries):
        proxy = get_random_proxy()
        try:
            response = requests.get(url,
                                    proxies={'http': proxy, 'https': proxy},
                                    timeout=10)
            if response.status_code == 200:
                return response.text
        except Exception as e:
            if attempt == max_retries - 1:
                raise
            continue
    return None
```
Add delays between requests. Even with proxies, hitting a site too fast can trigger blocks. Add random delays:
```python
import time
import random

for url in urls:
    content = scrape_with_rotation(url)
    time.sleep(random.uniform(1, 3))  # Wait 1-3 seconds
```
This makes your scraper look more human.
Free vs paid proxies: what actually works
Let's be honest: free proxies are mostly garbage. They're slow, unreliable, and often already blacklisted by major websites.
I've spent hours collecting free proxies from proxy lists, only to find that 90% don't work. The ones that do work get burned quickly because thousands of other people are using them.
That said, free proxies can work for:
- Testing and prototyping your scraper
- Scraping small, low-security sites
- Learning how proxies work
For anything serious, pay for proxies. The cost is worth it. A residential proxy pool from a reputable provider will save you hours of debugging why your scraper keeps failing.
When shopping for paid proxies, look for:
- Large IP pools: More IPs means better rotation
- Geographic targeting: Ability to use IPs from specific countries
- Session control: Some scrapers need to maintain the same IP for multiple requests (see the sticky-session sketch after this list)
- Success rate guarantees: Good providers replace failing proxies
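On that session-control point: providers usually handle sticky sessions at their gateway, but even with plain `requests` you can pin one proxy to a `Session` so a multi-step flow keeps the same exit IP. A minimal sketch with placeholder URLs:

```python
import requests

def make_sticky_session(proxy_url):
    # Every request made through this session exits via the same proxy
    session = requests.Session()
    session.proxies = {'http': proxy_url, 'https': proxy_url}
    return session

session = make_sticky_session('http://username:password@proxy1.example.com:8080')
login_page = session.get('http://example.com/login')      # same IP...
account_page = session.get('http://example.com/account')  # ...for both requests
```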
Common proxy mistakes (and how to avoid them)
Mistake #1: Using a single proxy for all requests. This defeats the purpose. Rotate aggressively.
Mistake #2: Not handling proxy failures. Build retry logic into your scraper from day one.
Mistake #3: Ignoring fingerprinting. Proxies hide your IP, but fingerprinting can still expose you. Use proper headers and consider browser automation for difficult targets.
Mistake #4: Not monitoring bandwidth. Proxies often charge by bandwidth. Scraping image-heavy sites or downloading large files can rack up costs fast. Profile your bandwidth usage before scaling up (a rough tracking sketch follows this list).
Mistake #5: Using public proxy lists. If you found those proxies on a free list, so did everyone else. They're probably blocked already.
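For that bandwidth point, here's a rough way to see where the gigabytes go. It only counts response bodies, so real usage (headers, TLS overhead, retries) will run somewhat higher, and the wrapper name is just for illustration:

```python
import requests
from collections import defaultdict
from urllib.parse import urlparse

bytes_by_domain = defaultdict(int)

def tracked_get(url, **kwargs):
    # Thin wrapper around requests.get that tallies body size per domain
    response = requests.get(url, **kwargs)
    bytes_by_domain[urlparse(url).netloc] += len(response.content)
    return response

# After a scraping run, see which targets are eating your quota
for domain, total in sorted(bytes_by_domain.items(), key=lambda x: -x[1]):
    print(f"{domain}: {total / 1_000_000:.1f} MB")
```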
Advanced technique: proxy chaining
Here's a trick I don't see mentioned often: chaining proxies. You route your request through multiple proxies before it reaches the target site.
This adds an extra layer of anonymity, but it's slower and more complex. Most scrapers don't need it, but it's useful for accessing particularly sensitive or well-protected data.
You can implement this by setting up a SOCKS proxy that connects through another proxy, but honestly, it's overkill for 99% of scraping tasks.
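If you do want to experiment, step one is just talking to a SOCKS proxy at all. A minimal sketch of a single hop (not a full chain), assuming `requests` was installed with its SOCKS extra (`pip install requests[socks]`) and using a placeholder proxy:

```python
import requests

# socks5h:// also resolves DNS on the proxy side
socks_proxy = 'socks5h://username:password@socks.example.com:1080'
proxies = {'http': socks_proxy, 'https': socks_proxy}

response = requests.get('http://httpbin.org/ip', proxies=proxies, timeout=10)
print(response.json())
```

Chaining hops together is usually handled outside your script with tools like proxychains, which is part of why it's rarely worth the hassle.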
When proxies aren't enough
Sometimes, even with perfect proxy rotation and fingerprint spoofing, you still get blocked. Modern anti-bot systems are sophisticated—they analyze behavior patterns, mouse movements, and even the timing of your requests.
For these scenarios, you have a few options:
Option 1: Slow down. Make your scraper act more human. Add random delays, vary request patterns, and don't scrape everything at once.
Option 2: Use a scraping API. Services like ScraperAPI, Bright Data, or ZenRows handle proxies, fingerprinting, and CAPTCHA solving for you. They're expensive, but they work.
Option 3: Accept that some sites just don't want to be scraped. Sometimes the juice isn't worth the squeeze.
Wrapping up
Proxies are essential for web scraping at any meaningful scale. Start with datacenter proxies for simple targets, upgrade to residential proxies when you need more legitimacy, and rotate aggressively to avoid detection.
But remember: proxies are just one piece of the puzzle. Modern scraping requires proper headers, fingerprint management, and often browser automation. The good news is that once you understand these concepts, you can scrape pretty much anything.
The key is to start simple, test your setup, and gradually add complexity as needed. Don't try to build the perfect scraper on day one—build something that works, then improve it when you hit roadblocks.
And if you're scraping at serious scale? Budget for paid proxies from day one. The time you save debugging will more than make up for the cost.