How to Use Browserbase with Proxies in 2025

Browserbase is a serverless browser automation platform that runs headless browsers at scale—think of it as your personal army of browsers in the cloud, minus the headache of managing infrastructure. When you combine it with proxies, you get a powerful setup for localization testing, performance tuning, and resilient data collection. The combo also helps you avoid brittle single-IP bottlenecks, stay within provider limits, and design systems that degrade gracefully under load.

If you’re ready to wire up Browserbase with proxies using Python, you’ll find step-by-step instructions, production-ready patterns, and a handful of reliability “hidden tricks” you won’t often see covered in quickstart docs.

Why Your Scraping Setup Needs Both Browserbase AND Proxies

Here’s the thing: running naked browser automation often collapses under real-world variability. Sites track infrastructure patterns, you hit shared egress limits, and regional latencies throw off timing-sensitive flows. Browserbase handles the messy part—browser orchestration—beautifully. But without proxies, you still funnel everything through a small set of IPs and regions. That can create:

  • Single-IP bottlenecks: One flaky route can stall an entire batch.
  • Skewed results: Measuring from one geography hides regional issues.
  • Poor resilience: A network blip looks like an app failure.

Pairing Browserbase with strategic proxy usage gives you:

  • Distributed IPs for reliability and throughput (not “stealth”),
  • Realistic browser fingerprints from Browserbase’s managed environments,
  • Geolocation control for localization and QA,
  • Operational scale without turning every retry into a meltdown.
Note: Many teams say “avoid rate limits” as a shorthand for “achieve throughput.” Real fix: embrace queueing, backoff, concurrency caps, and caching. We’ll show you how to build that in.

Step 1: Set Up Your Browserbase Environment and Authentication

First things first—you need to get Browserbase running. Also, scope your credentials. Treat each project like an independent blast radius.

Install the Python SDK with a version that supports proxy controls alongside Playwright:

pip install "browserbase>=1.4.0" playwright

Now, set up your environment variables. Use project-specific API keys to isolate incidents and audit access clearly:

import os
from browserbase import Browserbase
from playwright.sync_api import sync_playwright

# Pro tip: use project-specific API keys (placeholders below; load real values from your secret manager)
os.environ["BROWSERBASE_API_KEY"] = "bb_live_xxx"
os.environ["BROWSERBASE_PROJECT_ID"] = "proj_xxx"

# Initialize with custom timeout for proxy connection validation
bb = Browserbase(
    api_key=os.environ["BROWSERBASE_API_KEY"],
    timeout=30.0  # Increase timeout for slower residential proxy handshakes
)
Why the longer timeout? Residential and mobile networks can be bursty. A slightly higher session-creation timeout saves you from chasing phantom “proxy down” errors.

Quick checklist

  • Keep one API key per project.
  • Store secrets in your secret manager or CI/CD vault (see the loading sketch after this list).
  • Set explicit timeouts—you own the tail latency.
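
To make the checklist concrete, here's a minimal sketch of loading credentials from the environment (populated by your secret manager or CI vault) rather than hardcoding them. The require_env helper is illustrative; the variable names match the ones used above.

import os
from browserbase import Browserbase

def require_env(name: str) -> str:
    """Fail fast with a clear message instead of a mid-run auth error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Values arrive via your secret manager or CI vault; nothing lives in the repo.
bb = Browserbase(
    api_key=require_env("BROWSERBASE_API_KEY"),
    timeout=30.0,  # explicit timeout: you own the tail latency
)
PROJECT_ID = require_env("BROWSERBASE_PROJECT_ID")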

Step 2: Configure Built-in Proxies with Geolocation Control

Browserbase’s built-in proxies are simple to start with and are often residential by default. For localization testing (e.g., US, DE, JP) and consistent UI behavior, pair proxy location with the browser context’s locale and viewport so timing and layout behave predictably.

def create_geo_targeted_session(country_code="US"):
    """
    Create a session with geolocation-aware settings.
    Uses built-in proxies when available and aligns locale/viewport.
    """
    session = bb.sessions.create(
        project_id=os.environ["BROWSERBASE_PROJECT_ID"],
        proxies=True,  # Enables built-in proxy pool
        browser_settings={
            "fingerprint": {
                "locales": [f"en-{country_code}" if country_code == "US" else "en-US"],
                "screen": {"max_width": 1920, "max_height": 1080}
            },
            "viewport": {"width": 1920, "height": 1080},
            # Optional: time zone alignment if supported by your plan
            # "timezone_id": "America/New_York"
        }
    )
    return session
Tip: If your plan supports location hints, use them. Otherwise, create sessions from workers in the region you want to test; some providers bias selection by caller location.
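
As a quick smoke test, here's a hedged usage sketch for the function above: connect Playwright to the geo-targeted session and print the egress IP so you can confirm which region you actually got. httpbin.org/ip is just a convenient echo endpoint; swap in whatever you trust.

from playwright.sync_api import sync_playwright

def verify_egress(country_code="DE"):
    """Smoke test: confirm which egress IP the built-in proxy actually gives you."""
    session = create_geo_targeted_session(country_code)
    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(session.connect_url)
        page = browser.contexts[0].pages[0]
        page.goto("https://httpbin.org/ip", timeout=30000)
        print(country_code, page.inner_text("body"))  # e.g. {"origin": "203.0.113.7"}
        browser.close()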

When to use built-ins vs BYO:

  • Built-ins: Fastest start, managed rotation, fewer moving parts.
  • Bring Your Own (BYO): Needed for specific compliance, audit trails, or when target categories require pre-approved IP pools.

Step 3: Bring Your Own Proxies (The Right Way)

There’s a smarter approach than hard-coding a single proxy: domain-based routing. It lets you separate traffic by risk, compliance, or performance. Example: route internal or allow-listed domains through a corporate egress; route public test targets through residential; leave the rest on Browserbase defaults.

def create_advanced_proxy_session():
    """
    Multi-proxy setup with domain-based routing.
    Different domains use different proxies automatically (first match wins).
    """
    session = bb.sessions.create(
        project_id=os.environ["BROWSERBASE_PROJECT_ID"],
        proxies=[
            {
                "type": "external",
                "server": "http://premium-proxy.com:8080",
                "username": "user123",
                "password": "pass456",
                "domainPattern": r".*\.gov|.*\.edu"  # Compliance-sensitive domains
            },
            {
                "type": "external", 
                "server": "http://residential-proxy.net:3128",
                "username": "res_user",
                "password": "res_pass",
                "domainPattern": r".*amazon\.com|.*ebay\.com"  # High-traffic e-commerce tests
            },
            {
                "type": "browserbase"  # Fallback for everything else
            }
        ]
    )
    return session
Note: The field names for domain routing can vary by provider SDK and plan. Keep patterns transparent and auditable—avoid routing rules you wouldn’t put in a change log.

Design tips

  • Make routing explicit. Comment patterns with the “why,” not just the “what” (see the matching sketch after this list).
  • Keep a fallback. Let the default take non-critical traffic.
  • Measure costs. Residential proxies can be pricier; assign them only where needed.
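
Because routing is first-match-wins, it helps to be able to answer “which proxy would this URL take?” without creating a session. The sketch below mirrors that matching logic locally for audits and unit tests; it's illustrative only (the provider does the real matching) and assumes the list shape from the example above.

import re
from typing import Dict, List
from urllib.parse import urlparse

def resolve_proxy(url: str, proxies: List[Dict]) -> Dict:
    """Return the proxy entry the URL would route through (first match wins)."""
    host = urlparse(url).netloc
    for proxy in proxies:
        pattern = proxy.get("domainPattern")
        if pattern is None:
            return proxy  # entry without a pattern acts as the catch-all fallback
        if re.search(pattern, host):
            return proxy
    return {"type": "browserbase"}  # nothing matched: default pool

# Audit example: a .gov host should hit the compliance proxy, not residential.
# resolve_proxy("https://agency.gov/page", session_proxy_list)["server"]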

Step 4: Implement Request Interception for Adaptive Handling

Request interception is often portrayed as a “ninja move” for tricking sites. Skip that mindset. Use it for observability and resilience:

  • Log request/response pairs for SLOs.
  • Apply exponential backoff on 429/503.
  • Respect Retry-After.
  • Throttle or pause domain-level concurrency when error rates spike.
from playwright.sync_api import Route, Request
import time
from collections import defaultdict
from typing import Dict

RETRYABLE = {429, 500, 502, 503, 504}

def run_with_adaptive_handling(playwright):
    session = create_advanced_proxy_session()
    browser = playwright.chromium.connect_over_cdp(session.connect_url)
    context = browser.contexts[0]
    page = context.pages[0]
    
    # Track failures and per-domain cooldowns
    failures: Dict[str, int] = defaultdict(int)
    cooldown_until: Dict[str, float] = defaultdict(float)
    base_delay = 1.0  # seconds

    def now() -> float:
        return time.time()

    def should_wait(domain: str) -> float:
        """Return seconds to wait if we're in a cooldown window."""
        return max(0.0, cooldown_until[domain] - now())

    def on_response(response):
        domain = response.url.split('/')[2]
        status = response.status
        if status in RETRYABLE:
            failures[domain] += 1
            # Exponential backoff with cap
            delay = min(30.0, base_delay * (2 ** min(failures[domain], 5)))
            # Honor Retry-After if present
            ra = response.headers.get("retry-after")
            if ra and ra.isdigit():
                delay = max(delay, float(ra))
            cooldown_until[domain] = now() + delay
        else:
            # Success: decay failures
            failures[domain] = max(0, failures[domain] - 1)

    def handle_route(route: Route, request: Request):
        # Sync Playwright API: the handler must be a plain function, not async.
        domain = request.url.split('/')[2]
        wait = should_wait(domain)
        if wait > 0:
            # Gentle backpressure per domain
            time.sleep(wait)
        route.continue_()

    page.route("**/*", handle_route)
    page.on("response", on_response)

    return page, session

This pattern prevents retry storms. Instead of forging headers or impersonating clients, you slow down, honor server feedback, and keep the system healthy.

Good citizen rule: If a site signals distress (429/503), back off. Your SLOs will improve, and you’ll avoid burning bridges.

Step 5: Scale with Proxy Rotation and Session Persistence

Here’s the final piece many teams miss: scaling with proxy rotation while preserving session persistence. You want to reuse authenticated state safely across a controlled pool—and rotate egress to distribute risk and load.

import asyncio
from typing import List, Dict
import random

class BrowserbaseProxyPool:
    def __init__(self, proxy_configs: List[Dict]):
        self.proxy_configs = proxy_configs
        self.sessions = []
        self.bb = Browserbase()
        
    async def create_session_pool(self, size: int = 5):
        """Pre-create sessions with different proxies."""
        tasks = []
        for _ in range(size):
            proxy_config = random.choice(self.proxy_configs)
            tasks.append(self._create_session_async(proxy_config))
        
        self.sessions = await asyncio.gather(*tasks)
        return self.sessions
    
    async def _create_session_async(self, proxy_config):
        """
        Create a session with retry logic for transient proxy failures.
        Note: the SDK calls are synchronous; wrap them in asyncio.to_thread
        if you need the pool to be built truly concurrently.
        """
        max_retries = 3
        for attempt in range(max_retries):
            try:
                # Context IDs come from the contexts API (not arbitrary strings),
                # so Browserbase can persist state against a known context.
                ctx = self.bb.contexts.create(
                    project_id=os.environ["BROWSERBASE_PROJECT_ID"]
                )
                session = self.bb.sessions.create(
                    project_id=os.environ["BROWSERBASE_PROJECT_ID"],
                    proxies=[proxy_config],
                    browser_settings={
                        "context": {
                            "id": ctx.id,
                            "persist": True  # Persist cookies/localStorage for this context ID
                        }
                    }
                )
                return session
            except Exception as e:
                # Retry only for network/proxy issues; bubble others
                msg = str(e).lower()
                if any(k in msg for k in ["proxy", "timeout", "network"]) and attempt < max_retries - 1:
                    await asyncio.sleep(2 ** attempt)  # Exponential backoff
                    proxy_config = random.choice(self.proxy_configs)
                else:
                    raise
    
    def get_random_session(self):
        """Get a random pre-warmed session."""
        if not self.sessions:
            raise RuntimeError("Session pool is empty. Call create_session_pool() first.")
        return random.choice(self.sessions)

# Usage example
proxy_configs = [
    {"type": "external", "server": "http://proxy1.com:8080", "username": "u1", "password": "p1"},
    {"type": "external", "server": "http://proxy2.com:8080", "username": "u2", "password": "p2"},
    {"type": "browserbase"}  # Include Browserbase pool for resilience
]

pool = BrowserbaseProxyPool(proxy_configs)
sessions = asyncio.run(pool.create_session_pool(size=10))  # or `await` it inside an async app
Why persist: True? It maintains cookies and local storage per context ID. That keeps you logged in during controlled proxy rotation—useful for test accounts and first-party workflows you own or have explicit permission to automate.
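
Here's a hedged sketch of that idea in isolation: create one persistent context up front, then reference its ID from every rotated session so cookies and localStorage follow you across proxies. The contexts call reflects the current Python SDK (bb.contexts.create); adjust if your version differs.

def rotate_with_shared_state(proxy_configs):
    """One persistent context, many sessions: state survives proxy rotation."""
    # Create the context once; Browserbase persists cookies/localStorage against its ID.
    shared_context = bb.contexts.create(project_id=os.environ["BROWSERBASE_PROJECT_ID"])
    sessions = []
    for proxy_config in proxy_configs:
        sessions.append(bb.sessions.create(
            project_id=os.environ["BROWSERBASE_PROJECT_ID"],
            proxies=[proxy_config],
            browser_settings={"context": {"id": shared_context.id, "persist": True}},
        ))
    return sessions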

Scaling tips

  • Cap concurrency per domain. Think: 2–10 tabs per domain, then scale out horizontally (see the sketch after this list).
  • Keep session TTLs tight for security. Rotate keys and contexts on a schedule.
  • Centralize metrics: success rate, median latency, tail latency (p95/p99), error codes, retries.
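
Here's a minimal asyncio sketch of the per-domain cap from the first tip: one semaphore per domain, so a slow or struggling site never consumes the whole worker pool. The limit of 5 and the worker callable are example values.

import asyncio
from collections import defaultdict
from urllib.parse import urlparse

# One semaphore per domain; created lazily inside the running event loop.
domain_limits = defaultdict(lambda: asyncio.Semaphore(5))  # 5 is an example cap

async def run_capped(url, worker):
    """Run `await worker(url)` while holding that domain's concurrency slot."""
    domain = urlparse(url).netloc
    async with domain_limits[domain]:
        return await worker(url)

# Usage: results = await asyncio.gather(*(run_capped(u, fetch_one) for u in urls))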

Hidden Tricks and Edge Cases (Reliability edition)

These aren’t “cheats.” They’re the small details that turn a demo into a durable system.

1) Request Mode Semantics—Get Them Right

When you fetch from the page context, specify mode and credentials appropriately so you mirror real browser behavior and avoid CORS surprises in legitimate scenarios.

def fetch_with_context(page, url):
    """Example of a page-evaluated fetch with explicit mode/credentials when permitted."""
    response = page.evaluate("""
        async (url) => {
            const res = await fetch(url, {
                method: 'GET',
                mode: 'cors',
                credentials: 'include',
                cache: 'no-cache',
                redirect: 'follow',
                referrerPolicy: 'strict-origin-when-cross-origin'
            });
            return { status: res.status, text: await res.text() };
        }
    """, url)
    return response
Tip: Match your referrer policy and credentials to what a normal, allowed user flow would do. Don’t spoof or misrepresent identity.

2) Health Checks + Circuit Breakers

Before you spin up expensive sessions, verify that a proxy is reachable and fast enough.

def validate_proxy_health(proxy_url, test_url="http://httpbin.org/ip"):
    """Quick proxy health check before creating sessions."""
    import requests
    try:
        response = requests.get(
            test_url,
            proxies={"http": proxy_url, "https": proxy_url},
            timeout=5
        )
        return response.status_code == 200
    except Exception:
        return False

Add a simple circuit breaker so you stop sending traffic to a sick upstream and recover automatically.

class CircuitBreaker:
    def __init__(self, fail_threshold=5, reset_after=60):
        self.fail_threshold = fail_threshold
        self.reset_after = reset_after
        self.failures = 0
        self.open_until = 0.0

    def allow(self, now_ts):
        return now_ts >= self.open_until

    def record(self, success, now_ts):
        if success:
            self.failures = max(0, self.failures - 1)
        else:
            self.failures += 1
            if self.failures >= self.fail_threshold:
                self.open_until = now_ts + self.reset_after
                self.failures = 0
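
Here's a minimal sketch wiring the two pieces together: skip upstreams whose breaker is open, probe health, and record the result. The proxy_breakers registry and the proxy URL list are illustrative names, not Browserbase APIs.

import time

proxy_breakers = {}  # one CircuitBreaker per proxy URL (illustrative registry)

def pick_healthy_proxy(proxy_urls):
    """Return the first proxy allowed by its breaker that also passes a health probe."""
    for proxy_url in proxy_urls:
        breaker = proxy_breakers.setdefault(proxy_url, CircuitBreaker())
        if not breaker.allow(time.time()):
            continue  # breaker open: give this upstream time to recover
        healthy = validate_proxy_health(proxy_url)
        breaker.record(healthy, time.time())
        if healthy:
            return proxy_url
    return None  # every upstream looks sick; back off or fall back to built-ins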

3) Captcha Handling—Respect Site Policies

Some sites deploy captchas to control abuse. If you’re running tests or automations with permission, handle them transparently:

  • Prefer first-party flows (e.g., test bypass flags in staging).
  • If allowed, use a compliant solver integrated by the site or an official API.
  • Persist the session context so solved checks don’t reappear mid-flow.
def handle_captcha_if_allowed(page, session_id):
    """
    Example placeholder. Only use captcha solutions that are permitted by the site
    and your policies. Persist context to avoid repeated challenges.
    """
    # .first avoids strict-mode errors when the selector matches multiple elements.
    if page.locator(".g-recaptcha, iframe[title*='captcha']").first.is_visible():
        # Pause and surface to a human-in-the-loop reviewer, or use an approved solver.
        page.wait_for_timeout(3000)  # Replace with a legit solver flow if permitted.
        # Persist context so you don't re-trigger the same challenge.
        # Note: persistence is normally set at session creation; verify that your
        # SDK version supports updating browser_settings on a live session.
        bb.sessions.update(
            session_id=session_id,
            browser_settings={"context": {"persist": True}}
        )
Don’t attempt to defeat captchas or antibot systems without permission. That’s a fast way to violate terms and ethics.

4) Robots.txt & Crawl Budget Awareness

Always read a site’s robots.txt before automated navigation and respect its directives.

def robots_allows(url, user_agent="*"):
    import urllib.parse as up
    import requests
    from urllib import robotparser

    parts = up.urlparse(url)
    robots_url = f"{parts.scheme}://{parts.netloc}/robots.txt"
    rp = robotparser.RobotFileParser()
    try:
        text = requests.get(robots_url, timeout=5).text
        rp.parse(text.splitlines())
        return rp.can_fetch(user_agent, url)
    except Exception:
        # Be conservative if robots can't be fetched
        return False
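
A small usage sketch to go with it: gate navigation on the robots check before spending session time on a page.

def safe_goto(page, url):
    """Navigate only when robots.txt permits it; otherwise skip and say why."""
    if not robots_allows(url):
        print(f"Skipping {url}: disallowed by robots.txt (or robots.txt unreachable)")
        return None
    return page.goto(url, timeout=30000)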

Common Pitfalls to Avoid

Don’t use free proxy lists. They’re often blacklisted, slow, or unsafe. You’ll see more 403 Forbidden, injected junk, and mysterious timeouts than meaningful work.

Don’t forget proxy validation. Browserbase validates proxies at session creation, but a quick client-side check saves money and time. Reuse the validate_proxy_health helper from the hidden tricks above before you pay for a session.

Don’t ignore provider restrictions. Some proxy providers disallow certain categories (e.g., financial services, specific corporate domains). Track these rules and route such domains via approved paths—including no automation at all where required.
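
One lightweight way to enforce that is a policy check before a URL ever reaches a session. The categories and domains below are placeholders; substitute whatever your providers actually restrict.

from urllib.parse import urlparse

# Placeholder policy: map restricted domain suffixes to the handling your providers require.
RESTRICTED_DOMAINS = {
    "examplebank.com": "no-automation",           # category the provider disallows outright
    "partner-portal.example": "corporate-egress", # must use the approved corporate route
}

def routing_policy(url: str) -> str:
    """Return 'default', or the restriction tag for domains your providers disallow."""
    host = urlparse(url).netloc
    for suffix, rule in RESTRICTED_DOMAINS.items():
        if host == suffix or host.endswith("." + suffix):
            return rule
    return "default"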

Don’t retry blindly. A tight retry loop without backoff is indistinguishable from a flood. Respect Retry-After, apply per-domain cooldowns, and cap total attempts.
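
For the plain-HTTP parts of your pipeline (health checks, metadata fetches), here's a hedged sketch of a bounded retry helper that respects Retry-After and adds jitter; the attempt cap and delays are example values.

import random
import time
import requests

RETRYABLE_STATUSES = {429, 500, 502, 503, 504}

def get_with_backoff(url, max_attempts=4, base_delay=1.0, **kwargs):
    """GET with capped retries, exponential backoff plus jitter, honoring Retry-After."""
    response = None
    for attempt in range(max_attempts):
        response = requests.get(url, timeout=10, **kwargs)
        if response.status_code not in RETRYABLE_STATUSES:
            return response
        retry_after = response.headers.get("Retry-After")
        delay = float(retry_after) if retry_after and retry_after.isdigit() else base_delay * (2 ** attempt)
        time.sleep(min(60.0, delay + random.uniform(0, 0.5)))  # cap the wait, add jitter
    return response  # still failing after the last attempt: let the caller decide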

Don’t conflate “works locally” with “production ready.” Latency, jitter, DNS quirks, and TLS issues show up at scale. Instrument everything.

Don’t forget data ethics. Even if something is technically possible, you’re accountable for what you automate. Keep a lightweight Data Processing Policy next to your code and review it quarterly.

Next Steps

You now have a battle-tested (and ethically grounded) Browserbase + proxies setup that scales. You can:

  • Use built-in proxies for a quick start and geolocation control.
  • Bring your own proxies with domain-based routing for compliance and cost control.
  • Add request interception for observability, retries with backoff, and graceful domain-level cooldowns.
  • Build a session pool with proxy rotation and session persistence so you can keep authentication state stable while distributing traffic responsibly.
  • Employ hidden tricks like health checks, circuit breakers, robots.txt readers, and policy-respecting captcha paths to keep your system reliable.

Remember the golden rules:

  • Start small, monitor everything, and scale gradually.
  • Prefer retries + backoff over “cranking concurrency.”
  • Keep robots.txt, rate limiting, and site policies front and center.
  • Document your routing rules and proxy provider restrictions.
  • If something feels like “slipping past antibot systems,” pause and rethink the design.

Want to level up further? Pair this foundation with a task queue (Celery, RQ, or a managed queue), a circuit-breaker library, and a central metrics pipeline (OpenTelemetry + your favorite backend). Add idempotency keys to your job runner so retried steps don’t duplicate side effects. The proxy game is always evolving, but with Browserbase as your core and a compliance-first mindset, you’ll beat 90% of the reliability and maintainability problems teams hit at scale—without crossing any lines.
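
On the idempotency point, a minimal sketch: derive a stable key from a job's parameters and skip work that already completed. The in-memory set stands in for whatever store your queue actually uses.

import hashlib
import json

completed_keys = set()  # stand-in for your queue's result store (Redis, a DB table, etc.)

def idempotency_key(task_name: str, params: dict) -> str:
    """Stable key derived from the task name plus its canonicalized parameters."""
    payload = json.dumps({"task": task_name, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def run_once(task_name: str, params: dict, fn):
    """Skip side effects when a retried job has already completed with the same key."""
    key = idempotency_key(task_name, params)
    if key in completed_keys:
        return "skipped"
    result = fn(**params)
    completed_keys.add(key)
    return result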

Marius Bernard

Marius Bernard is a Product Advisor, Technical SEO, & Brand Ambassador at Roundproxies. He was the lead author for the SEO chapter of the 2024 Web and a reviewer for the 2023 SEO chapter.