DuckDuckGo handles over 100 million daily searches. Unlike Google, it doesn't track users or personalize results.
Because results aren't personalized, every user sees the same results for a given query, which makes DuckDuckGo a goldmine for consistent, unbiased search data.
In this guide, you'll learn exactly how to scrape DuckDuckGo search results using three different methods. I'll show you working code that doesn't rely on expensive third-party APIs.
Whether you need to monitor keyword rankings, gather SERP data for research, or build a search aggregator, these techniques will get you there.
What You Need to Scrape DuckDuckGo
DuckDuckGo scraping requires different approaches depending on which version you target. The search engine serves two distinct page types:
The static HTML version lives at html.duckduckgo.com. It renders without JavaScript and uses traditional pagination. This version is faster to scrape and requires fewer resources.
The dynamic version at duckduckgo.com requires JavaScript rendering. It includes features like AI-generated summaries and infinite scroll pagination. Scraping this version demands browser automation tools.
| Feature | Static Version | Dynamic Version |
|---|---|---|
| URL | html.duckduckgo.com/html/?q= | duckduckgo.com/?q= |
| JavaScript Required | No | Yes |
| Pagination | "Next" button | "More Results" button |
| AI Summaries | No | Yes |
| Scraping Difficulty | Easy | Moderate |
Most scraping projects work fine with the static version. The code runs faster and uses less memory.
Let's start with the simplest approach.
Method 1: Scrape DuckDuckGo With HTTP Requests
This method uses Python's requests library combined with BeautifulSoup for parsing. It targets the static HTML version and works well for most use cases.
Setting Up Your Environment
First, create a project folder and virtual environment:
mkdir duckduckgo-scraper
cd duckduckgo-scraper
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
Install the required packages:
pip install requests beautifulsoup4
Building the Basic Scraper
Create a file named scraper.py and add the following imports:
import requests
from bs4 import BeautifulSoup
import csv
import time
The requests library handles HTTP connections. BeautifulSoup parses the HTML response into a searchable tree structure.
Now add the core scraping function:
def scrape_duckduckgo(query, num_pages=1):
"""
Scrape DuckDuckGo search results for a given query.
Args:
query: Search term to look up
num_pages: Number of result pages to scrape
Returns:
List of dictionaries containing scraped results
"""
base_url = "https://html.duckduckgo.com/html/"
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
}
all_results = []
params = {"q": query}
for page in range(num_pages):
response = requests.get(base_url, params=params, headers=headers)
if response.status_code != 200:
print(f"Error: Received status code {response.status_code}")
break
results, next_params = parse_results(response.text)
all_results.extend(results)
if not next_params:
break
params = next_params
time.sleep(1) # Be respectful to the server
return all_results
This function sends GET requests to DuckDuckGo's static search page. The User-Agent header makes the request look like it's coming from a real browser.
Without it, DuckDuckGo typically responds with a 403 Forbidden error.
Parsing Search Results
Add the parsing function that extracts data from the HTML:
def parse_results(html):
"""
Parse DuckDuckGo HTML and extract search results.
Args:
html: Raw HTML string from the response
Returns:
Tuple of (results list, next page params)
"""
soup = BeautifulSoup(html, "html.parser")
results = []
# Find all result containers
result_elements = soup.select("#links .result")
for element in result_elements:
# Extract the title and URL
title_link = element.select_one(".result__a")
if not title_link:
continue
title = title_link.get_text(strip=True)
url = title_link.get("href", "")
# DuckDuckGo uses protocol-relative URLs
if url.startswith("//"):
url = "https:" + url
# Extract the display URL
display_url_elem = element.select_one(".result__url")
display_url = display_url_elem.get_text(strip=True) if display_url_elem else ""
# Extract the snippet
snippet_elem = element.select_one(".result__snippet")
snippet = snippet_elem.get_text(strip=True) if snippet_elem else ""
results.append({
"title": title,
"url": url,
"display_url": display_url,
"snippet": snippet
})
# Get next page parameters
next_params = get_next_page_params(soup)
return results, next_params
The CSS selectors target specific elements in DuckDuckGo's HTML structure. Each result sits inside a container with the result class.
Handling Pagination
DuckDuckGo's pagination works through form submissions. Add this function to extract the next page parameters:
def get_next_page_params(soup):
"""
Extract parameters needed to fetch the next page.
Args:
soup: BeautifulSoup object of current page
Returns:
Dictionary of form parameters or None if no next page
"""
next_form = soup.select_one(".nav-link form")
if not next_form:
return None
params = {}
for input_elem in next_form.select("input"):
name = input_elem.get("name")
value = input_elem.get("value", "")
if name:
params[name] = value
return params
The static version uses a hidden form for pagination. This function extracts all form fields and passes them to the next request.
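If you're curious what that form carries, you can print the extracted parameters for a quick test query. This sketch reuses the imports already at the top of scraper.py; the hidden field names aren't documented and can change, so treat them as opaque values you pass straight back on the next request:
# Inspect the pagination form fields for a test query (field names are opaque)
html = requests.get(
    "https://html.duckduckgo.com/html/",
    params={"q": "python"},
    headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
).text
print(get_next_page_params(BeautifulSoup(html, "html.parser")))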
Saving Results to CSV
Add a function to export the scraped data:
def save_to_csv(results, filename):
"""
Save scraped results to a CSV file.
Args:
results: List of result dictionaries
filename: Output file path
"""
if not results:
print("No results to save")
return
fieldnames = results[0].keys()
with open(filename, "w", newline="", encoding="utf-8") as f:
writer = csv.DictWriter(f, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(results)
print(f"Saved {len(results)} results to {filename}")
Running the Scraper
Add the main execution block:
if __name__ == "__main__":
query = "python web scraping tutorial"
results = scrape_duckduckgo(query, num_pages=3)
save_to_csv(results, "duckduckgo_results.csv")
# Print a sample
for result in results[:5]:
print(f"\nTitle: {result['title']}")
print(f"URL: {result['url']}")
print(f"Snippet: {result['snippet'][:100]}...")
Run it with:
python scraper.py
You'll get a CSV file containing titles, URLs, display URLs, and snippets from DuckDuckGo's search results.
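As a quick sanity check on the export, you can summarize which domains dominate the results. This is a minimal sketch that assumes the duckduckgo_results.csv file produced by the run above:
import csv
from collections import Counter
from urllib.parse import urlparse

# Count how often each domain appears in the scraped results
with open("duckduckgo_results.csv", newline="", encoding="utf-8") as f:
    domains = Counter(urlparse(row["url"]).netloc for row in csv.DictReader(f))

for domain, count in domains.most_common(10):
    print(f"{domain}: {count}")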
Method 2: Scrape DuckDuckGo With Browser Automation
Some projects require the dynamic version with JavaScript-rendered content. Browser automation handles this by controlling a real browser instance.
Playwright offers a cleaner API than Selenium and generally runs faster. Let's build a scraper using it.
Installing Playwright
pip install playwright
playwright install chromium
The second command downloads the Chromium browser binary that Playwright controls.
Building the Browser-Based Scraper
Create browser_scraper.py:
from playwright.sync_api import sync_playwright
from urllib.parse import quote_plus
import json
import time
def scrape_duckduckgo_dynamic(query, max_results=30):
"""
Scrape DuckDuckGo using browser automation.
Args:
query: Search term
max_results: Maximum results to collect
Returns:
List of result dictionaries
"""
results = []
with sync_playwright() as p:
# Launch browser in headless mode
browser = p.chromium.launch(headless=True)
context = browser.new_context(
user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
)
page = context.new_page()
# Navigate to DuckDuckGo
search_url = f"https://duckduckgo.com/?q={query}"
page.goto(search_url, wait_until="networkidle")
# Wait for results to load
page.wait_for_selector("[data-testid='result']", timeout=10000)
while len(results) < max_results:
# Extract visible results
new_results = extract_results(page)
for result in new_results:
if result not in results:
results.append(result)
if len(results) >= max_results:
break
# Click "More Results" if available
more_button = page.query_selector("button:has-text('More Results')")
if more_button:
more_button.click()
time.sleep(2)
else:
break
browser.close()
return results[:max_results]
Playwright waits for the network to become idle before proceeding. This ensures all JavaScript has finished executing.
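If the results never render, for example because a CAPTCHA page was served, wait_for_selector raises a timeout and the whole scraper crashes. One way to handle that is a small helper that reports whether results appeared; this is a hypothetical addition, not part of the code above, and you could call it in place of the bare wait_for_selector line:
from playwright.sync_api import Page, TimeoutError as PlaywrightTimeoutError

def results_rendered(page: Page, timeout_ms: int = 10000) -> bool:
    """Return True if organic results appear within the timeout, False otherwise."""
    try:
        page.wait_for_selector("[data-testid='result']", timeout=timeout_ms)
        return True
    except PlaywrightTimeoutError:
        # Likely a block, CAPTCHA, or empty SERP; let the caller decide what to do
        return False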
Extracting Results From the Dynamic Page
def extract_results(page):
"""
Extract search results from the current page state.
Args:
page: Playwright page object
Returns:
List of result dictionaries
"""
results = []
# The dynamic version uses data-testid attributes
result_elements = page.query_selector_all("[data-testid='result']")
for element in result_elements:
try:
title_elem = element.query_selector("h2 a")
snippet_elem = element.query_selector("[data-result='snippet']")
if not title_elem:
continue
title = title_elem.inner_text()
url = title_elem.get_attribute("href")
snippet = snippet_elem.inner_text() if snippet_elem else ""
results.append({
"title": title,
"url": url,
"snippet": snippet
})
except Exception as e:
continue
return results
The dynamic version's HTML structure differs from the static version. It uses data-testid attributes for testing, which also make scraping easier.
Running the Browser Scraper
if __name__ == "__main__":
results = scrape_duckduckgo_dynamic("machine learning courses", max_results=50)
print(f"Scraped {len(results)} results")
with open("dynamic_results.json", "w") as f:
json.dump(results, f, indent=2)
Browser automation uses more resources than HTTP requests. Reserve it for cases where you specifically need JavaScript-rendered content.
Method 3: Using the DDGS Python Library
DDGS (formerly duckduckgo-search) provides a high-level interface for DuckDuckGo scraping. It handles all the parsing logic internally.
Installing DDGS
pip install -U ddgs
Scraping With DDGS
The library supports both Python code and command-line usage:
from ddgs import DDGS
def search_with_ddgs(query, max_results=20):
"""
Search DuckDuckGo using the DDGS library.
Args:
query: Search term
max_results: Number of results to return
Returns:
List of result dictionaries
"""
results = []
with DDGS() as ddgs:
for result in ddgs.text(query, max_results=max_results):
results.append({
"title": result.get("title"),
"url": result.get("href"),
"snippet": result.get("body")
})
return results
# Usage
results = search_with_ddgs("best python frameworks 2024", max_results=30)
DDGS also offers a command-line interface:
ddgs text -q "python web scraping" -m 20 -o results.csv
This outputs results directly to a CSV file without writing any code.
Additional DDGS Features
The library supports multiple search types:
from ddgs import DDGS
with DDGS() as ddgs:
# Image search
images = list(ddgs.images("sunset beach", max_results=10))
# News search
news = list(ddgs.news("tech industry", max_results=10))
# Video search
videos = list(ddgs.videos("python tutorial", max_results=10))
DDGS abstracts away the complexity but offers less flexibility than custom scrapers.
Avoiding Blocks When You Scrape DuckDuckGo
DuckDuckGo implements rate limiting to prevent abuse. Making too many requests from the same IP triggers blocks.
Signs You're Being Blocked
Watch for these indicators:
- HTTP 403 Forbidden responses
- CAPTCHA challenges appearing
- Empty result pages
- Longer response times followed by connection drops
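A small helper can check a response for these signals before you try to parse it. The markers below are heuristics based on the static version's HTML, not documented behavior, so adjust them to what you actually observe:
def looks_blocked(response) -> bool:
    """Heuristic check for the block indicators listed above (sketch, not exhaustive)."""
    if response.status_code in (403, 429):
        return True
    body = response.text.lower()
    # A challenge page or a page with no organic result links both suggest a block
    if "captcha" in body or "challenge" in body:
        return True
    if "result__a" not in body:
        return True
    return False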
Implementing Request Delays
Add delays between requests to reduce detection:
import random
import time
import requests
def respectful_request(url, params, headers):
"""Make a request with random delay."""
# Random delay between 1-3 seconds
delay = random.uniform(1, 3)
time.sleep(delay)
return requests.get(url, params=params, headers=headers)
Random delays look more natural than fixed intervals.
Rotating User Agents
Cycle through different user agent strings:
import random
USER_AGENTS = [
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Safari/605.1.15",
"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0",
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
]
def get_random_headers():
return {
"User-Agent": random.choice(USER_AGENTS),
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.5",
"Accept-Encoding": "gzip, deflate",
"Connection": "keep-alive"
}
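To plug the rotating headers into the static-version scraper, pass a fresh set on every request. A minimal usage sketch, assuming the get_random_headers helper above:
import requests

response = requests.get(
    "https://html.duckduckgo.com/html/",
    params={"q": "web scraping"},
    headers=get_random_headers(),
    timeout=30,
)
print(response.status_code)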
Using Rotating Proxies for Scale
For large-scale scraping, rotating proxies are essential. Each request goes through a different IP address, making it much harder for DuckDuckGo to identify your scraper.
Residential proxies work best because they use real home IP addresses. We offer residential, datacenter, ISP, and mobile proxy options that integrate easily with Python:
def scrape_with_proxy(query, proxy_url):
"""
Make a request through a rotating proxy.
Args:
query: Search term
proxy_url: Proxy connection string
Returns:
Response object
"""
proxies = {
"http": proxy_url,
"https": proxy_url
}
base_url = "https://html.duckduckgo.com/html/"
params = {"q": query}
headers = get_random_headers()
response = requests.get(
base_url,
params=params,
headers=headers,
proxies=proxies,
timeout=30
)
return response
With rotating proxies, you can scrape thousands of queries without hitting rate limits.
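If your provider hands you a list of proxy endpoints rather than a single rotating gateway, you can cycle through them per query. The proxy URLs below are placeholders, not real credentials:
from itertools import cycle

# Placeholder endpoints; substitute the connection strings from your provider
PROXIES = cycle([
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
    "http://user:pass@proxy3.example.com:8000",
])

queries = ["python web scraping", "playwright tutorial", "beautifulsoup examples"]
for q in queries:
    response = scrape_with_proxy(q, next(PROXIES))
    print(q, response.status_code)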
Handling CAPTCHAs
If you encounter CAPTCHAs frequently, consider these approaches:
- Reduce request frequency
- Use higher-quality residential proxies
- Implement exponential backoff on errors
- Switch to the static version, which triggers fewer CAPTCHAs
def exponential_backoff(func, max_retries=5):
"""Retry with exponential backoff on failure."""
for attempt in range(max_retries):
try:
return func()
except Exception as e:
if attempt == max_retries - 1:
raise
wait_time = (2 ** attempt) + random.uniform(0, 1)
print(f"Attempt {attempt + 1} failed. Waiting {wait_time:.1f}s")
time.sleep(wait_time)
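As a usage example, you can wrap a single fetch so transient failures retry automatically. This sketch assumes the scrape_with_proxy and exponential_backoff functions defined above, and the proxy URL is a placeholder:
def fetch_once():
    response = scrape_with_proxy("python web scraping", "http://user:pass@proxy.example.com:8000")
    response.raise_for_status()  # turn 403/429 into exceptions so the backoff retries
    return response

response = exponential_backoff(fetch_once)
print(len(response.text))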
Complete Production Scraper
Here's a complete script combining all the techniques:
import requests
from bs4 import BeautifulSoup
import csv
import time
import random
from typing import List, Dict, Optional
class DuckDuckGoScraper:
"""Production-ready DuckDuckGo scraper with anti-detection measures."""
def __init__(self, proxy_url: Optional[str] = None):
self.base_url = "https://html.duckduckgo.com/html/"
self.proxy_url = proxy_url
self.session = requests.Session()
self.user_agents = [
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0.0.0",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Safari/605.1.15",
"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Firefox/121.0",
]
def _get_headers(self) -> Dict[str, str]:
return {
"User-Agent": random.choice(self.user_agents),
"Accept": "text/html,application/xhtml+xml",
"Accept-Language": "en-US,en;q=0.5",
}
def _make_request(self, params: Dict) -> Optional[str]:
proxies = None
if self.proxy_url:
proxies = {"http": self.proxy_url, "https": self.proxy_url}
time.sleep(random.uniform(1, 2))
try:
response = self.session.get(
self.base_url,
params=params,
headers=self._get_headers(),
proxies=proxies,
timeout=30
)
response.raise_for_status()
return response.text
except requests.RequestException as e:
print(f"Request failed: {e}")
return None
def _parse_results(self, html: str) -> tuple:
soup = BeautifulSoup(html, "html.parser")
results = []
for element in soup.select("#links .result"):
title_link = element.select_one(".result__a")
if not title_link:
continue
url = title_link.get("href", "")
if url.startswith("//"):
url = "https:" + url
results.append({
"title": title_link.get_text(strip=True),
"url": url,
"snippet": element.select_one(".result__snippet").get_text(strip=True) if element.select_one(".result__snippet") else ""
})
# Get next page params
next_form = soup.select_one(".nav-link form")
next_params = None
if next_form:
next_params = {}
for inp in next_form.select("input"):
if inp.get("name"):
next_params[inp.get("name")] = inp.get("value", "")
return results, next_params
def scrape(self, query: str, max_pages: int = 1) -> List[Dict]:
all_results = []
params = {"q": query}
for page in range(max_pages):
html = self._make_request(params)
if not html:
break
results, next_params = self._parse_results(html)
all_results.extend(results)
if not next_params:
break
params = next_params
print(f"Scraped page {page + 1}, total results: {len(all_results)}")
return all_results
def save_csv(self, results: List[Dict], filename: str):
if not results:
return
with open(filename, "w", newline="", encoding="utf-8") as f:
writer = csv.DictWriter(f, fieldnames=results[0].keys())
writer.writeheader()
writer.writerows(results)
if __name__ == "__main__":
scraper = DuckDuckGoScraper()
results = scraper.scrape("best programming languages 2024", max_pages=3)
scraper.save_csv(results, "output.csv")
print(f"Done! Scraped {len(results)} results")
This class-based approach keeps code organized and makes it easy to add features like proxy rotation.
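To route the class through a proxy, pass the connection string when you construct it. The URL below is a placeholder for whatever your provider gives you:
scraper = DuckDuckGoScraper(proxy_url="http://username:password@gate.example-proxy.com:7000")
results = scraper.scrape("site reliability engineering books", max_pages=2)
scraper.save_csv(results, "proxied_output.csv")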
Conclusion
You now have three reliable ways to scrape DuckDuckGo search results:
HTTP requests with BeautifulSoup work best for the static version. This approach is fast, lightweight, and handles most use cases.
Browser automation with Playwright handles the dynamic JavaScript version. Use this when you need AI summaries or other dynamic content.
The DDGS library provides a quick solution for simple scraping tasks. It's perfect for prototyping or one-off data collection.
For production scraping at scale, combine these techniques with rotating proxies and respect DuckDuckGo's servers with appropriate delays.
Start with the static version scraper. It covers 90% of use cases and runs much faster than browser automation.
FAQ
Is it legal to scrape DuckDuckGo?
Web scraping public information is generally legal. However, you should review DuckDuckGo's terms of service and robots.txt. Avoid overwhelming their servers with excessive requests.
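If you want to check robots.txt programmatically before scraping, Python's standard library includes a parser. A minimal sketch:
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://html.duckduckgo.com/robots.txt")
rp.read()

# Check whether a generic user agent may fetch the static search endpoint
print(rp.can_fetch("*", "https://html.duckduckgo.com/html/?q=python"))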
Why do I get 403 errors when scraping DuckDuckGo?
DuckDuckGo returns 403 errors when it detects automated requests. Add a realistic User-Agent header to your requests. If blocks persist, implement request delays and consider using rotating proxies.
How many results can I scrape from DuckDuckGo?
The static version returns about 30 results per page. You can paginate through multiple pages to collect more. Practical limits depend on rate limiting and your proxy infrastructure.
Should I use the static or dynamic version?
Use the static version at html.duckduckgo.com unless you specifically need JavaScript-rendered features like AI summaries. The static version is faster and easier to scrape.
How do I avoid getting blocked?
Implement random delays between requests, rotate User-Agent strings, and use rotating residential proxies for larger projects. Keep request rates reasonable and handle errors gracefully with exponential backoff.