DuckDuckGo handles over 100 million daily searches. Unlike Google, it doesn't track users or personalize results.

This makes DuckDuckGo a goldmine for unbiased search data.

In this guide, you'll learn exactly how to scrape DuckDuckGo search results using three different methods. I'll show you working code that doesn't rely on expensive third-party APIs.

Whether you need to monitor keyword rankings, gather SERP data for research, or build a search aggregator, these techniques will get you there.

What You Need to Scrape DuckDuckGo

DuckDuckGo scraping requires different approaches depending on which version you target. The search engine serves two distinct page types:

The static HTML version lives at html.duckduckgo.com. It renders without JavaScript and uses traditional pagination. This version is faster to scrape and requires fewer resources.

The dynamic version at duckduckgo.com requires JavaScript rendering. It includes features like AI-generated summaries and infinite scroll pagination. Scraping this version demands browser automation tools.

Feature               Static Version                 Dynamic Version
URL                   html.duckduckgo.com/html/?q=   duckduckgo.com/?q=
JavaScript Required   No                             Yes
Pagination            "Next" button                  "More Results" button
AI Summaries          No                             Yes
Scraping Difficulty   Easy                           Moderate

Most scraping projects work fine with the static version. The code runs faster and uses less memory.

Let's start with the simplest approach.

Method 1: Scrape DuckDuckGo With HTTP Requests

This method uses Python's requests library combined with BeautifulSoup for parsing. It targets the static HTML version and works well for most use cases.

Setting Up Your Environment

First, create a project folder and virtual environment:

mkdir duckduckgo-scraper
cd duckduckgo-scraper
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install the required packages:

pip install requests beautifulsoup4

Building the Basic Scraper

Create a file named scraper.py and add the following imports:

import requests
from bs4 import BeautifulSoup
import csv
import time

The requests library handles HTTP connections. BeautifulSoup parses the HTML response into a searchable tree structure.
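
If you haven't used BeautifulSoup before, here's a minimal (toy) example of the parse-then-select workflow we'll rely on below:

from bs4 import BeautifulSoup

# Toy markup just to illustrate parsing and CSS-selector lookups
soup = BeautifulSoup("<p class='intro'>Hello <b>world</b></p>", "html.parser")
print(soup.select_one("p.intro b").get_text())  # prints: world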

Now add the core scraping function:

def scrape_duckduckgo(query, num_pages=1):
    """
    Scrape DuckDuckGo search results for a given query.
    
    Args:
        query: Search term to look up
        num_pages: Number of result pages to scrape
    
    Returns:
        List of dictionaries containing scraped results
    """
    base_url = "https://html.duckduckgo.com/html/"
    
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    }
    
    all_results = []
    
    params = {"q": query}
    
    for page in range(num_pages):
        response = requests.get(base_url, params=params, headers=headers)
        
        if response.status_code != 200:
            print(f"Error: Received status code {response.status_code}")
            break
            
        results, next_params = parse_results(response.text)
        all_results.extend(results)
        
        if not next_params:
            break
            
        params = next_params
        time.sleep(1)  # Be respectful to the server
    
    return all_results

This function sends GET requests to DuckDuckGo's static search page. The User-Agent header makes the request look like it's coming from a real browser.

Without it, DuckDuckGo typically responds with a 403 Forbidden error.

Parsing Search Results

Add the parsing function that extracts data from the HTML:

def parse_results(html):
    """
    Parse DuckDuckGo HTML and extract search results.
    
    Args:
        html: Raw HTML string from the response
    
    Returns:
        Tuple of (results list, next page params)
    """
    soup = BeautifulSoup(html, "html.parser")
    results = []
    
    # Find all result containers
    result_elements = soup.select("#links .result")
    
    for element in result_elements:
        # Extract the title and URL
        title_link = element.select_one(".result__a")
        if not title_link:
            continue
            
        title = title_link.get_text(strip=True)
        url = title_link.get("href", "")
        
        # DuckDuckGo uses protocol-relative URLs
        if url.startswith("//"):
            url = "https:" + url
        
        # Extract the display URL
        display_url_elem = element.select_one(".result__url")
        display_url = display_url_elem.get_text(strip=True) if display_url_elem else ""
        
        # Extract the snippet
        snippet_elem = element.select_one(".result__snippet")
        snippet = snippet_elem.get_text(strip=True) if snippet_elem else ""
        
        results.append({
            "title": title,
            "url": url,
            "display_url": display_url,
            "snippet": snippet
        })
    
    # Get next page parameters
    next_params = get_next_page_params(soup)
    
    return results, next_params

The CSS selectors target specific elements in DuckDuckGo's HTML structure. Each result sits inside a container with the result class.
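
As a rough sketch (simplified, illustrative markup rather than DuckDuckGo's exact HTML), the structure those selectors expect looks like this:

from bs4 import BeautifulSoup

# Simplified stand-in for a single result block; the live page includes more attributes
sample = """
<div id="links">
  <div class="result">
    <a class="result__a" href="//example.com/page">Example title</a>
    <a class="result__url">example.com</a>
    <div class="result__snippet">Example snippet text</div>
  </div>
</div>
"""
element = BeautifulSoup(sample, "html.parser").select_one("#links .result")
print(element.select_one(".result__a").get_text(strip=True))        # Example title
print(element.select_one(".result__snippet").get_text(strip=True))  # Example snippet text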

Handling Pagination

DuckDuckGo's pagination works through form submissions. Add this function to extract the next page parameters:

def get_next_page_params(soup):
    """
    Extract parameters needed to fetch the next page.
    
    Args:
        soup: BeautifulSoup object of current page
    
    Returns:
        Dictionary of form parameters or None if no next page
    """
    next_form = soup.select_one(".nav-link form")
    
    if not next_form:
        return None
    
    params = {}
    
    for input_elem in next_form.select("input"):
        name = input_elem.get("name")
        value = input_elem.get("value", "")
        
        if name:
            params[name] = value
    
    return params

The static version uses a hidden form for pagination. This function extracts all form fields and passes them to the next request.
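
For reference, the extracted parameters are just the form's hidden input fields as a flat dictionary. The exact field names can change, so the values below are purely illustrative; always pass back whatever the form actually contains:

# Illustrative only -- field names and values vary between pages
next_params = {
    "q": "python web scraping tutorial",
    "s": "30",      # offset-style field (illustrative)
    "dc": "31",     # counter-style field (illustrative)
    "kl": "us-en",  # locale field (illustrative)
}
response = requests.get(
    "https://html.duckduckgo.com/html/",
    params=next_params,
    headers={"User-Agent": "Mozilla/5.0"},  # same header strategy as scrape_duckduckgo()
)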

Saving Results to CSV

Add a function to export the scraped data:

def save_to_csv(results, filename):
    """
    Save scraped results to a CSV file.
    
    Args:
        results: List of result dictionaries
        filename: Output file path
    """
    if not results:
        print("No results to save")
        return
    
    fieldnames = results[0].keys()
    
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(results)
    
    print(f"Saved {len(results)} results to {filename}")

Running the Scraper

Add the main execution block:

if __name__ == "__main__":
    query = "python web scraping tutorial"
    results = scrape_duckduckgo(query, num_pages=3)
    save_to_csv(results, "duckduckgo_results.csv")
    
    # Print a sample
    for result in results[:5]:
        print(f"\nTitle: {result['title']}")
        print(f"URL: {result['url']}")
        print(f"Snippet: {result['snippet'][:100]}...")

Run it with:

python scraper.py

You'll get a CSV file containing titles, URLs, display URLs, and snippets from DuckDuckGo's search results.

Method 2: Scrape DuckDuckGo With Browser Automation

Some projects require the dynamic version with JavaScript-rendered content. Browser automation handles this by controlling a real browser instance.

Playwright offers a cleaner API than Selenium and runs faster. Let's build a scraper using it.

Installing Playwright

pip install playwright
playwright install chromium

The second command downloads the Chromium browser binary that Playwright controls.

Building the Browser-Based Scraper

Create browser_scraper.py:

from playwright.sync_api import sync_playwright
from urllib.parse import quote_plus
import json
import time

def scrape_duckduckgo_dynamic(query, max_results=30):
    """
    Scrape DuckDuckGo using browser automation.
    
    Args:
        query: Search term
        max_results: Maximum results to collect
    
    Returns:
        List of result dictionaries
    """
    results = []
    
    with sync_playwright() as p:
        # Launch browser in headless mode
        browser = p.chromium.launch(headless=True)
        
        context = browser.new_context(
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
        )
        
        page = context.new_page()
        
        # Navigate to DuckDuckGo
        search_url = f"https://duckduckgo.com/?q={query}"
        page.goto(search_url, wait_until="networkidle")
        
        # Wait for results to load
        page.wait_for_selector("[data-testid='result']", timeout=10000)
        
        while len(results) < max_results:
            # Extract visible results
            new_results = extract_results(page)
            
            for result in new_results:
                if result not in results:
                    results.append(result)
            
            if len(results) >= max_results:
                break
            
            # Click "More Results" if available
            more_button = page.query_selector("button:has-text('More Results')")
            
            if more_button:
                more_button.click()
                time.sleep(2)
            else:
                break
        
        browser.close()
    
    return results[:max_results]

Playwright waits for the network to become idle before proceeding, which gives the page's JavaScript time to finish rendering results before extraction begins.

Extracting Results From the Dynamic Page

def extract_results(page):
    """
    Extract search results from the current page state.
    
    Args:
        page: Playwright page object
    
    Returns:
        List of result dictionaries
    """
    results = []
    
    # The dynamic version uses data-testid attributes
    result_elements = page.query_selector_all("[data-testid='result']")
    
    for element in result_elements:
        try:
            title_elem = element.query_selector("h2 a")
            snippet_elem = element.query_selector("[data-result='snippet']")
            
            if not title_elem:
                continue
            
            title = title_elem.inner_text()
            url = title_elem.get_attribute("href")
            snippet = snippet_elem.inner_text() if snippet_elem else ""
            
            results.append({
                "title": title,
                "url": url,
                "snippet": snippet
            })
            
        except Exception:
            # Skip any result block that doesn't match the expected structure
            continue
    
    return results

The dynamic version's HTML structure differs from the static version. It uses data-testid attributes for testing, which also make scraping easier.

Running the Browser Scraper

if __name__ == "__main__":
    results = scrape_duckduckgo_dynamic("machine learning courses", max_results=50)
    
    print(f"Scraped {len(results)} results")
    
    with open("dynamic_results.json", "w") as f:
        json.dump(results, f, indent=2)

Browser automation uses more resources than HTTP requests. Reserve it for cases where you specifically need JavaScript-rendered content.

Method 3: Using the DDGS Python Library

DDGS (formerly duckduckgo-search) provides a high-level interface for DuckDuckGo scraping. It handles all the parsing logic internally.

Installing DDGS

pip install -U ddgs

Scraping With DDGS

The library supports both Python code and command-line usage:

from ddgs import DDGS

def search_with_ddgs(query, max_results=20):
    """
    Search DuckDuckGo using the DDGS library.
    
    Args:
        query: Search term
        max_results: Number of results to return
    
    Returns:
        List of result dictionaries
    """
    results = []
    
    with DDGS() as ddgs:
        for result in ddgs.text(query, max_results=max_results):
            results.append({
                "title": result.get("title"),
                "url": result.get("href"),
                "snippet": result.get("body")
            })
    
    return results

# Usage
results = search_with_ddgs("best python frameworks 2024", max_results=30)

DDGS also offers a command-line interface:

ddgs text -q "python web scraping" -m 20 -o results.csv

This outputs results directly to a CSV file without writing any code.

Additional DDGS Features

The library supports multiple search types:

from ddgs import DDGS

with DDGS() as ddgs:
    # Image search
    images = list(ddgs.images("sunset beach", max_results=10))
    
    # News search
    news = list(ddgs.news("tech industry", max_results=10))
    
    # Video search
    videos = list(ddgs.videos("python tutorial", max_results=10))

DDGS abstracts away the complexity but offers less flexibility than custom scrapers.

Avoiding Blocks When You Scrape DuckDuckGo

DuckDuckGo implements rate limiting to prevent abuse. Making too many requests from the same IP triggers blocks.

Signs You're Being Blocked

Watch for these indicators (a simple detection helper follows the list):

  • HTTP 403 Forbidden responses
  • CAPTCHA challenges appearing
  • Empty result pages
  • Longer response times followed by connection drops
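
A small helper along these lines can flag the most common signals programmatically (a heuristic sketch, not an exhaustive check):

def looks_blocked(response, results):
    """Heuristic check for common block signals."""
    if response.status_code in (403, 429):
        return True                          # outright refusal or rate limiting
    if "captcha" in response.text.lower():   # crude CAPTCHA detection
        return True
    if not results:                          # page came back but nothing parsed
        return True
    return False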

Implementing Request Delays

Add delays between requests to reduce detection:

import random
import time

import requests

def respectful_request(url, params, headers):
    """Make a request with random delay."""
    # Random delay between 1-3 seconds
    delay = random.uniform(1, 3)
    time.sleep(delay)
    
    return requests.get(url, params=params, headers=headers)

Random delays look more natural than fixed intervals.

Rotating User Agents

Cycle through different user agent strings:

import random

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Safari/605.1.15",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
]

def get_random_headers():
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.5",
        "Accept-Encoding": "gzip, deflate",
        "Connection": "keep-alive"
    }

Using Rotating Proxies for Scale

For large-scale scraping, rotating proxies are essential. Each request goes through a different IP address, which makes it much harder for DuckDuckGo to link the traffic to a single scraper.

Residential proxies work best because they use real home IP addresses. We offer residential, datacenter, ISP, and mobile proxy options that integrate easily with Python:

def scrape_with_proxy(query, proxy_url):
    """
    Make a request through a rotating proxy.
    
    Args:
        query: Search term
        proxy_url: Proxy connection string
    
    Returns:
        Response object
    """
    proxies = {
        "http": proxy_url,
        "https": proxy_url
    }
    
    base_url = "https://html.duckduckgo.com/html/"
    params = {"q": query}
    headers = get_random_headers()
    
    response = requests.get(
        base_url,
        params=params,
        headers=headers,
        proxies=proxies,
        timeout=30
    )
    
    return response

With rotating proxies, you can spread thousands of queries across many IP addresses and dramatically reduce the chance of hitting rate limits.
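
As a minimal sketch, rotation can be as simple as choosing a different endpoint per request. The proxy URLs below are placeholders, and get_random_headers() is the helper defined earlier:

import random
import requests

# Placeholder endpoints -- substitute your provider's gateway or proxy list
PROXIES = [
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
    "http://user:pass@proxy3.example.com:8000",
]

def scrape_with_rotation(query):
    """Send each request through a randomly chosen proxy."""
    proxy_url = random.choice(PROXIES)
    proxies = {"http": proxy_url, "https": proxy_url}
    return requests.get(
        "https://html.duckduckgo.com/html/",
        params={"q": query},
        headers=get_random_headers(),
        proxies=proxies,
        timeout=30,
    )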

Handling CAPTCHAs

If you encounter CAPTCHAs frequently, consider these approaches:

  1. Reduce request frequency
  2. Use higher-quality residential proxies
  3. Implement exponential backoff on errors
  4. Switch to the static version, which tends to trigger fewer CAPTCHAs

For the third point, a simple retry helper looks like this:

def exponential_backoff(func, max_retries=5):
    """Retry with exponential backoff on failure."""
    for attempt in range(max_retries):
        try:
            return func()
        except Exception as e:
            if attempt == max_retries - 1:
                raise
            
            wait_time = (2 ** attempt) + random.uniform(0, 1)
            print(f"Attempt {attempt + 1} failed. Waiting {wait_time:.1f}s")
            time.sleep(wait_time)
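
For example, you could wrap a request in the helper like this (a sketch that reuses respectful_request() and get_random_headers() from above; raise_for_status() turns 403/429 responses into exceptions so they get retried):

def fetch_page(query):
    response = respectful_request(
        "https://html.duckduckgo.com/html/",
        {"q": query},
        get_random_headers(),
    )
    response.raise_for_status()  # raise on 4xx/5xx so the backoff helper retries
    return response

response = exponential_backoff(lambda: fetch_page("python web scraping"))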

Complete Production Scraper

Here's a complete script combining all the techniques:

import requests
from bs4 import BeautifulSoup
import csv
import time
import random
from typing import List, Dict, Optional

class DuckDuckGoScraper:
    """Production-ready DuckDuckGo scraper with anti-detection measures."""
    
    def __init__(self, proxy_url: Optional[str] = None):
        self.base_url = "https://html.duckduckgo.com/html/"
        self.proxy_url = proxy_url
        self.session = requests.Session()
        
        self.user_agents = [
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0.0.0",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Safari/605.1.15",
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Firefox/121.0",
        ]
    
    def _get_headers(self) -> Dict[str, str]:
        return {
            "User-Agent": random.choice(self.user_agents),
            "Accept": "text/html,application/xhtml+xml",
            "Accept-Language": "en-US,en;q=0.5",
        }
    
    def _make_request(self, params: Dict) -> Optional[str]:
        proxies = None
        if self.proxy_url:
            proxies = {"http": self.proxy_url, "https": self.proxy_url}
        
        time.sleep(random.uniform(1, 2))
        
        try:
            response = self.session.get(
                self.base_url,
                params=params,
                headers=self._get_headers(),
                proxies=proxies,
                timeout=30
            )
            response.raise_for_status()
            return response.text
        except requests.RequestException as e:
            print(f"Request failed: {e}")
            return None
    
    def _parse_results(self, html: str) -> tuple:
        soup = BeautifulSoup(html, "html.parser")
        results = []
        
        for element in soup.select("#links .result"):
            title_link = element.select_one(".result__a")
            if not title_link:
                continue
            
            url = title_link.get("href", "")
            if url.startswith("//"):
                url = "https:" + url
            
            results.append({
                "title": title_link.get_text(strip=True),
                "url": url,
                "snippet": element.select_one(".result__snippet").get_text(strip=True) if element.select_one(".result__snippet") else ""
            })
        
        # Get next page params
        next_form = soup.select_one(".nav-link form")
        next_params = None
        
        if next_form:
            next_params = {}
            for inp in next_form.select("input"):
                if inp.get("name"):
                    next_params[inp.get("name")] = inp.get("value", "")
        
        return results, next_params
    
    def scrape(self, query: str, max_pages: int = 1) -> List[Dict]:
        all_results = []
        params = {"q": query}
        
        for page in range(max_pages):
            html = self._make_request(params)
            if not html:
                break
            
            results, next_params = self._parse_results(html)
            all_results.extend(results)
            
            if not next_params:
                break
            
            params = next_params
            print(f"Scraped page {page + 1}, total results: {len(all_results)}")
        
        return all_results
    
    def save_csv(self, results: List[Dict], filename: str):
        if not results:
            return
        
        with open(filename, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=results[0].keys())
            writer.writeheader()
            writer.writerows(results)


if __name__ == "__main__":
    scraper = DuckDuckGoScraper()
    results = scraper.scrape("best programming languages 2024", max_pages=3)
    scraper.save_csv(results, "output.csv")
    print(f"Done! Scraped {len(results)} results")

This class-based approach keeps code organized and makes it easy to add features like proxy rotation.
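
For example, routing every request through a proxy only takes the constructor argument; the connection string below is a placeholder:

# Placeholder credentials and host -- substitute your provider's connection string
scraper = DuckDuckGoScraper(proxy_url="http://username:password@gate.example.com:7000")
results = scraper.scrape("site reliability engineering", max_pages=2)
scraper.save_csv(results, "proxied_output.csv")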

Conclusion

You now have three reliable ways to scrape DuckDuckGo search results:

HTTP requests with BeautifulSoup work best for the static version. This approach is fast, lightweight, and handles most use cases.

Browser automation with Playwright handles the dynamic JavaScript version. Use this when you need AI summaries or other dynamic content.

The DDGS library provides a quick solution for simple scraping tasks. It's perfect for prototyping or one-off data collection.

For production scraping at scale, combine these techniques with rotating proxies and respect DuckDuckGo's servers with appropriate delays.

Start with the static version scraper. It covers 90% of use cases and runs much faster than browser automation.

FAQ

Is it legal to scrape DuckDuckGo?

Web scraping public information is generally legal. However, you should review DuckDuckGo's terms of service and robots.txt. Avoid overwhelming their servers with excessive requests.

Why do I get 403 errors when scraping DuckDuckGo?

DuckDuckGo returns 403 errors when it detects automated requests. Add a realistic User-Agent header to your requests. If blocks persist, implement request delays and consider using rotating proxies.

How many results can I scrape from DuckDuckGo?

The static version returns about 30 results per page. You can paginate through multiple pages to collect more. Practical limits depend on rate limiting and your proxy infrastructure.

Should I use the static or dynamic version?

Use the static version at html.duckduckgo.com unless you specifically need JavaScript-rendered features like AI summaries. The static version is faster and easier to scrape.

How do I avoid getting blocked?

Implement random delays between requests, rotate User-Agent strings, and use rotating residential proxies for larger projects. Keep request rates reasonable and handle errors gracefully with exponential backoff.