How to Fix BeautifulSoup 403 Error

Running into a 403 Forbidden error while scraping with BeautifulSoup? You’re definitely not the only one. It’s one of the most common issues web scrapers face—and one of the most frustrating. A 403 means the website you’re trying to access has flagged your request as suspicious and blocked it before your scraper even gets a chance to parse the content.

Over the years, I've built dozens of scrapers, from small research tools to full-scale data extraction systems, and I've dealt with my fair share of 403s. In one recent project, my scraper's success rate was stuck at just 30% due to constant 403s. After applying the strategies I'll walk you through in this guide, I pushed that rate past 95%.

Here’s exactly how to get around those pesky 403s and keep your BeautifulSoup scraper running smoothly.

Why You Can Trust These Solutions

Before diving into the fixes, it's important to clear something up: BeautifulSoup isn't what's causing the 403 error. BeautifulSoup (also known as bs4) simply parses the HTML after your script fetches it. The real culprit?

The way your HTTP request is being made—usually through requests or urllib.

Websites are increasingly savvy about detecting scrapers. Whether it’s through missing headers, suspicious request timing, or known bot IPs, anti-scraping measures are designed to block anything that doesn’t behave like a human visitor.

These solutions come directly from real-world experience, including scraping sites that are protected by advanced systems like Cloudflare.

Step 1: Understand What Causes the 403 Error

A 403 Forbidden error occurs when a server recognizes your request as coming from a bot rather than a real user and subsequently denies access. This happens because:

  1. Default User Agents: Libraries like Python's requests use identifiable user agents (e.g., "python-requests/2.31.0") that websites can easily detect.
  2. Missing or Incomplete Headers: Human browsers send numerous HTTP headers with each request, while basic scraping attempts often lack these.
  3. Request Patterns: Sending too many requests too quickly from the same IP address creates an unnatural pattern that triggers website security systems.
  4. IP Reputation: Some IPs, particularly those from data centers, are known to be used for scraping and may be preemptively blocked.

Understanding these factors helps you craft a more effective solution strategy.
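
You can see the first factor for yourself by printing the headers requests sends when you don't override anything. This is just a quick diagnostic; the exact version string will depend on the requests release installed on your machine.

import requests

# Headers that requests attaches when you don't supply any yourself.
# The User-Agent will read something like "python-requests/2.31.0",
# which is trivial for a server to recognize and block.
print(requests.utils.default_headers())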

Step 2: Customize Your User Agent

The easiest and most effective first step is to customize your User Agent string to mimic a legitimate browser.

import requests
from bs4 import BeautifulSoup

# Define a browser-like User Agent
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}

# Make the request with the custom User Agent
url = "https://example.com"
response = requests.get(url, headers=headers)

# Check if successful
print(response.status_code)

# Parse with BeautifulSoup if successful
if response.status_code == 200:
    soup = BeautifulSoup(response.content, 'html.parser')
    # Continue with your scraping logic
else:
    print(f"Failed to access the website: {response.status_code}")

This simple change can often resolve the 403 error for less protected websites. The User Agent string tells the server that your request is coming from a common browser (in this case, Chrome) rather than a scraping library.
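
If a single fixed string still gets blocked, a common refinement is to rotate through a small pool of realistic User-Agent values so repeated requests don't all look identical. The strings below are only examples; swap in values matching browsers you actually want to imitate.

import random
import requests
from bs4 import BeautifulSoup

# A small pool of realistic User-Agent strings (examples only)
user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Safari/605.1.15',
    'Mozilla/5.0 (X11; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0'
]

url = "https://example.com"

# Pick a different User-Agent for each request
headers = {'User-Agent': random.choice(user_agents)}
response = requests.get(url, headers=headers)

if response.status_code == 200:
    soup = BeautifulSoup(response.content, 'html.parser')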

Step 3: Add Complete Request Headers

If changing the User Agent doesn't fix the issue, you should enhance your request with a more complete set of browser-like headers.

import requests
from bs4 import BeautifulSoup

# Define comprehensive browser-like headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'Cache-Control': 'max-age=0'
}

url = "https://example.com"
response = requests.get(url, headers=headers)

# Check if successful
print(response.status_code)

# Parse with BeautifulSoup if successful
if response.status_code == 200:
    soup = BeautifulSoup(response.content, 'html.parser')
    # Continue with your scraping logic
else:
    print(f"Failed to access the website: {response.status_code}")

These additional headers better mimic what a real browser would send with a request, making it much harder for websites to distinguish your scraper from a legitimate user.
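
Before pointing these headers at a protected site, it's worth confirming they are actually being sent. One quick way, assuming you're happy to call the public httpbin.org echo service, is to request its /headers endpoint and inspect what comes back.

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Accept-Language': 'en-US,en;q=0.9'
}

# httpbin echoes the headers it received, so you can verify your custom
# headers before running the scraper against the real target.
response = requests.get("https://httpbin.org/headers", headers=headers)
print(response.json())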

Step 4: Implement Request Delays

Websites often track how quickly requests come in from a single source. Making requests too rapidly is a clear indicator of automated scraping. Adding delays between requests helps your scraper behave more like a human user.

import requests
from bs4 import BeautifulSoup
import time
import random

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}

urls = [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3"
]

for url in urls:
    # Add a random delay between 3 and 7 seconds
    delay = random.uniform(3, 7)
    time.sleep(delay)
    
    response = requests.get(url, headers=headers)
    
    # Check if successful
    print(f"URL: {url}, Status: {response.status_code}")
    
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, 'html.parser')
        # Continue with your scraping logic
    else:
        print(f"Failed to access the website: {response.status_code}")

Using random delays rather than fixed intervals makes the request pattern appear even more natural and less predictable.
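
If a request does come back with a 403, it can also be worth backing off and retrying after a longer pause, since some rate limiters clear temporary blocks on their own. The helper below is a minimal sketch of that idea; fetch_with_backoff and its retry settings are illustrative, not part of any library.

import time
import random
import requests

def fetch_with_backoff(url, headers, max_retries=3):
    # Illustrative helper: retry a blocked request with growing delays
    for attempt in range(max_retries):
        response = requests.get(url, headers=headers)
        if response.status_code != 403:
            return response
        # Wait longer after each failed attempt, plus a little jitter
        wait = (2 ** attempt) * 5 + random.uniform(0, 2)
        print(f"Got 403, retrying in {wait:.1f} seconds...")
        time.sleep(wait)
    return response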

Step 5: Use Sessions and Cookies

Many websites use cookies to maintain state and authenticate users. Using a session object in your scraper helps manage cookies automatically and creates a more consistent browsing experience that websites expect from human users.

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}

# Create a session object
session = requests.Session()

# First, visit the homepage to get cookies
home_url = "https://example.com"
session.get(home_url, headers=headers)

# Now visit the target page using the same session (with cookies intact)
target_url = "https://example.com/protected-page"
response = session.get(target_url, headers=headers)

# Check if successful
print(response.status_code)

# Parse with BeautifulSoup if successful
if response.status_code == 200:
    soup = BeautifulSoup(response.content, 'html.parser')
    # Continue with your scraping logic
else:
    print(f"Failed to access the website: {response.status_code}")

This approach is particularly useful for websites that set cookies during the initial visit and check for them in subsequent requests.
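
A quick way to check whether that first visit actually produced cookies is to inspect the session's cookie jar. If it comes back empty, the site may be setting cookies with JavaScript, which requests alone cannot execute.

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}

session = requests.Session()
session.get("https://example.com", headers=headers)

# Print the cookies set during the initial visit
print(session.cookies.get_dict())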

Step 6: Employ Proxies for IP Rotation

If you're still encountering 403 errors, your IP address might be flagged or blocked. Using proxies allows you to rotate your IP address, making it harder for websites to track and block your scraping activities.

import requests
from bs4 import BeautifulSoup
import random

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}

# List of proxy IPs (replace with actual proxies)
proxies_list = [
    {'http': 'http://proxy1.example.com:8080', 'https': 'https://proxy1.example.com:8080'},
    {'http': 'http://proxy2.example.com:8080', 'https': 'https://proxy2.example.com:8080'},
    {'http': 'http://proxy3.example.com:8080', 'https': 'https://proxy3.example.com:8080'}
]

url = "https://example.com"

# Try each proxy once, in random order, until one succeeds
for proxy in random.sample(proxies_list, len(proxies_list)):
    try:
        response = requests.get(url, headers=headers, proxies=proxy, timeout=10)

        # If successful, break the loop
        if response.status_code == 200:
            print(f"Successfully accessed using proxy: {proxy}")
            soup = BeautifulSoup(response.content, 'html.parser')
            # Continue with your scraping logic
            break
        else:
            print(f"Proxy {proxy} returned status: {response.status_code}")
    except requests.RequestException as e:
        print(f"Error with proxy {proxy}: {e}")

Note that free proxies often have reliability issues. For serious scraping projects, consider using paid proxy services that offer reliable and rotating residential IPs.
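
Many paid providers expose a single gateway endpoint that rotates the exit IP for you, so you don't have to maintain a proxy list yourself. The sketch below shows what that setup typically looks like; the hostname, port, and credentials are placeholders for whatever your provider gives you.

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}

# Placeholder gateway and credentials (replace with your provider's values).
# The gateway assigns a different residential IP to each request on its side.
proxy_gateway = "http://USERNAME:PASSWORD@gateway.example-proxy.com:8000"
proxies = {'http': proxy_gateway, 'https': proxy_gateway}

response = requests.get("https://example.com", headers=headers, proxies=proxies, timeout=10)
print(response.status_code)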

Final Thoughts

Fixing 403 errors when scraping with BeautifulSoup requires understanding how websites detect and block bots. By working through these six steps (understanding what triggers the block, customizing your User Agent, adding complete headers, implementing delays, using sessions, and employing proxies) you can significantly improve your chances of successful web scraping.

Remember that not all techniques are necessary for every website. Start with the simplest solutions and progressively add more sophisticated techniques as needed.
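
As a closing illustration, here is one way to combine the lighter-weight techniques (realistic headers, a shared session, and randomized delays) into a single helper. scrape_page and its defaults are placeholders to adapt to your own project, not a standard recipe.

import random
import time
import requests
from bs4 import BeautifulSoup

HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.9'
}

# A shared session keeps cookies across requests
session = requests.Session()

def scrape_page(url, proxies=None, min_delay=3, max_delay=7):
    # Pause before each request so the pattern looks more human
    time.sleep(random.uniform(min_delay, max_delay))
    response = session.get(url, headers=HEADERS, proxies=proxies, timeout=10)
    if response.status_code == 200:
        return BeautifulSoup(response.content, 'html.parser')
    print(f"Failed to access {url}: {response.status_code}")
    return None

soup = scrape_page("https://example.com")
if soup:
    print(soup.title)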

Marius Bernard

Marius Bernard is a Product Advisor, Technical SEO, & Brand Ambassador at Roundproxies. He was the lead author for the SEO chapter of the 2024 Web and a reviewer for the 2023 SEO chapter.