Product Hunt is a goldmine for market research, competitor analysis, and trend spotting.
Whether you're tracking launches in your industry, building a newsletter, or analyzing what makes products successful, scraping Product Hunt gives you access to data that would take hours to collect manually.

The catch? Product Hunt is a modern JavaScript-heavy site that doesn't play nice with simple HTTP requests. You'll need browser automation, smart anti-detection techniques, and a strategy for handling rate limits. In this guide, I'll show you exactly how to scrape Product Hunt using Python and Playwright, with real code that actually works.
What you'll find in this guide
- What data you can scrape from Product Hunt
- Should you use the API or scrape the site?
- Setting up Playwright for Product Hunt scraping
- Scraping the daily product feed
- Extracting product details and maker information
- Anti-detection techniques that work
- Handling rate limits without proxies
- Storing and exporting your data
What data can you scrape from Product Hunt?
Product Hunt surfaces a wealth of data about new products, and here's what you can realistically extract:
Product information: Names, taglines, descriptions, categories, launch dates, and product URLs. This is the bread and butter of most scraping projects.

Engagement metrics: Upvote counts, comment counts, and rankings. These numbers tell you what's resonating with the community.

Maker profiles: Information about the people behind products, including their names, profile links, and sometimes social media handles.

Comments and discussions: User feedback, questions, and conversations around products. This qualitative data is often overlooked but incredibly valuable.

Images and media: Product screenshots, logos, and demo videos. These can be downloaded for analysis or archiving.

Historical data: Past launches from the daily archive pages going back years. Want to see what was hot in 2018? It's all there.

Should you use the API or scrape the site?
Product Hunt offers a GraphQL API, so you might be wondering: why scrape at all?
The API has some serious limitations.
First, it requires approval for commercial use, which means you'll need to contact Product Hunt and explain your use case.
Second, there are rate limits—6,250 complexity points every 15 minutes for GraphQL queries, or 450 requests per 15 minutes for REST endpoints. For small projects, this is fine. For anything at scale, you'll hit the ceiling fast.
More importantly, the API requires OAuth authentication, which adds complexity to your setup. And if you're doing one-off research or building a prototype, going through the approval process feels like overkill.
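For context, here's roughly what a call to the official API looks like. This is a minimal sketch assuming you've already registered an application and have a developer token; the endpoint and field names follow the public v2 GraphQL schema as I remember it, so verify them against the current API docs:

import requests

# Assumes you already have a Product Hunt developer token (placeholder below)
API_URL = 'https://api.producthunt.com/v2/api/graphql'
TOKEN = 'your_developer_token_here'

query = """
{
  posts(first: 5) {
    edges {
      node {
        name
        tagline
        votesCount
      }
    }
  }
}
"""

response = requests.post(
    API_URL,
    json={'query': query},
    headers={'Authorization': f'Bearer {TOKEN}'},
)
print(response.json())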
Web scraping gives you more flexibility. You can extract exactly what you need without worrying about API schemas, you aren't bound by the API's quotas, and you don't need permission to get started. The trade-off? You'll need to handle JavaScript rendering and anti-bot detection.
For this guide, we'll focus on scraping the site directly. It's more practical for most use cases and teaches you techniques that apply to other modern websites.
Setting up your scraping environment
Let's get the boring stuff out of the way first. You'll need Python 3.8+ and a few libraries.
Create a new project folder and set up a virtual environment:
mkdir producthunt-scraper
cd producthunt-scraper
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
Install the required packages:
pip install playwright beautifulsoup4 lxml pandas
playwright install chromium
Playwright is doing the heavy lifting here. It controls a real browser, executes JavaScript, and handles all the dynamic content Product Hunt throws at you. Beautiful Soup will help us parse the HTML once Playwright grabs it, and pandas makes exporting data dead simple.
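Before writing any scraping logic, it's worth a quick smoke test to confirm Playwright and Chromium installed correctly. This minimal script just loads Product Hunt and prints the page title:

import asyncio
from playwright.async_api import async_playwright

async def smoke_test():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto('https://www.producthunt.com/')
        print(await page.title())  # If this prints a title, your setup works
        await browser.close()

asyncio.run(smoke_test())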
Scraping the daily product feed
The main Product Hunt page shows today's top products. Let's start there.
Here's a basic scraper that grabs product names and taglines:
import asyncio
from playwright.async_api import async_playwright
from bs4 import BeautifulSoup

async def scrape_daily_products():
    async with async_playwright() as p:
        # Launch browser in headless mode
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()

        # Navigate to Product Hunt
        await page.goto('https://www.producthunt.com/', wait_until='networkidle')

        # Wait for products to load
        await page.wait_for_selector('[data-test="homepage-section-0"]', timeout=10000)

        # Get the page content
        content = await page.content()

        # Parse with Beautiful Soup
        soup = BeautifulSoup(content, 'lxml')

        # Find all product cards
        products = []
        product_cards = soup.select('div[data-test^="post-item"]')

        for card in product_cards:
            # Extract product name
            name_elem = card.select_one('a[href^="/posts/"]')
            name = name_elem.text.strip() if name_elem else 'N/A'

            # Extract tagline
            tagline_elem = card.select_one('[color="subdued"]')
            tagline = tagline_elem.text.strip() if tagline_elem else 'N/A'

            # Extract upvotes
            upvote_elem = card.select_one('button[aria-label*="upvote"]')
            upvotes = upvote_elem.text.strip() if upvote_elem else '0'

            products.append({
                'name': name,
                'tagline': tagline,
                'upvotes': upvotes
            })

        await browser.close()
        return products

# Run the scraper
products = asyncio.run(scrape_daily_products())
for product in products:
    print(f"{product['name']} - {product['tagline']} ({product['upvotes']} upvotes)")
This code does several important things. First, it launches a Chromium browser in headless mode, which means no visible window pops up. Then it navigates to Product Hunt and waits for the network to go idle, ensuring all the JavaScript has executed and the page is fully loaded.
The wait_for_selector call is crucial. Product Hunt uses React, so the initial HTML is basically empty. We need to wait for the actual product cards to render before we can scrape anything.
Once we have the HTML, Beautiful Soup makes it easy to extract data using CSS selectors. Product Hunt's DOM structure uses data-test attributes, which are actually more stable than class names (those tend to change when they update their CSS).
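Even so, data-test attributes aren't guaranteed to survive a redesign. A simple defensive pattern is to try a short list of candidate selectors and use the first one that matches; the selectors here are the ones from this guide, and any extras you add are your own guesses about the markup:

def select_first(soup, selectors):
    """Return the first element matched by any candidate selector, or None."""
    for selector in selectors:
        elem = soup.select_one(selector)
        if elem:
            return elem
    return None

# Example: tagline with a fallback selector (order them from most to least specific)
tagline_elem = select_first(soup, ['[data-test="post-tagline"]', '[color="subdued"]'])
tagline = tagline_elem.text.strip() if tagline_elem else 'N/A'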
Extracting detailed product information
The daily feed gives you basic info, but what if you want everything—descriptions, maker details, comments, and more? You'll need to visit individual product pages.
Here's how to scrape a single product page:
async def scrape_product_details(product_url):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36'
        )
        page = await context.new_page()

        await page.goto(product_url, wait_until='networkidle')
        await page.wait_for_selector('[data-test="post-name"]', timeout=10000)

        content = await page.content()
        soup = BeautifulSoup(content, 'lxml')

        # Extract product name
        name_elem = soup.select_one('[data-test="post-name"]')
        name = name_elem.text.strip() if name_elem else 'N/A'

        # Extract description
        desc_elem = soup.select_one('[data-test="post-description"]')
        description = desc_elem.text.strip() if desc_elem else 'N/A'

        # Extract maker information
        makers = []
        maker_elements = soup.select('[data-test="post-maker"]')
        for maker in maker_elements:
            maker_name = maker.text.strip()
            maker_link = maker.get('href', '')
            makers.append({'name': maker_name, 'profile': maker_link})

        # Extract website link
        website_elem = soup.select_one('a[data-test="post-product-link"]')
        website = website_elem.get('href', '') if website_elem else 'N/A'

        # Extract comment count
        comment_elem = soup.select_one('[data-test="post-comment-count"]')
        comments = comment_elem.text.strip() if comment_elem else '0'

        await browser.close()

        return {
            'name': name,
            'description': description,
            'makers': makers,
            'website': website,
            'comments': comments
        }

# Example usage
product_data = asyncio.run(scrape_product_details('https://www.producthunt.com/posts/some-product'))
print(product_data)
Notice I added a custom user agent when creating the browser context. This is our first anti-detection measure. Playwright's default user agent screams "I'm a bot," so we're replacing it with one that looks like a regular Chrome browser on macOS.
The rest of the code follows the same pattern: navigate, wait for content, parse with Beautiful Soup, extract data. The key is using those data-test attributes to target the right elements.
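Because the select-then-default pattern repeats for every field, it's worth wrapping in a tiny helper. This is purely a convenience refactor of the snippets above, not a change in what gets scraped:

def get_text(soup, selector, default='N/A'):
    """Select one element and return its stripped text, or a default."""
    elem = soup.select_one(selector)
    return elem.text.strip() if elem else default

# The extraction code above then collapses to:
name = get_text(soup, '[data-test="post-name"]')
description = get_text(soup, '[data-test="post-description"]')
comments = get_text(soup, '[data-test="post-comment-count"]', default='0')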
Building a complete scraper with pagination
Let's tie it all together. This scraper grabs today's products, visits each one, extracts detailed info, and saves everything to a CSV file:
import asyncio
from playwright.async_api import async_playwright
from bs4 import BeautifulSoup
import pandas as pd

async def scrape_product_hunt_complete():
    async with async_playwright() as p:
        # Launch browser with anti-detection settings
        browser = await p.chromium.launch(
            headless=True,
            args=['--disable-blink-features=AutomationControlled']
        )
        context = await browser.new_context(
            user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
            viewport={'width': 1920, 'height': 1080}
        )
        page = await context.new_page()

        # Hide Playwright automation
        await page.add_init_script("""
            Object.defineProperty(navigator, 'webdriver', {
                get: () => false
            });
        """)

        # Scrape main page
        print("Scraping daily products...")
        await page.goto('https://www.producthunt.com/', wait_until='networkidle')
        await page.wait_for_selector('[data-test="homepage-section-0"]', timeout=10000)

        content = await page.content()
        soup = BeautifulSoup(content, 'lxml')

        # Get product URLs
        product_links = []
        product_cards = soup.select('a[href^="/posts/"]')
        for link in product_cards[:10]:  # Limit to 10 for testing
            href = link.get('href')
            if href and '/posts/' in href:
                full_url = f"https://www.producthunt.com{href}"
                if full_url not in product_links:
                    product_links.append(full_url)

        # Scrape each product
        all_products = []
        for i, url in enumerate(product_links, 1):
            print(f"Scraping product {i}/{len(product_links)}: {url}")
            try:
                await page.goto(url, wait_until='networkidle')
                await page.wait_for_selector('[data-test="post-name"]', timeout=10000)

                # Add human-like delay
                await asyncio.sleep(2)

                content = await page.content()
                soup = BeautifulSoup(content, 'lxml')

                # Extract data
                name_elem = soup.select_one('[data-test="post-name"]')
                name = name_elem.text.strip() if name_elem else 'N/A'

                tagline_elem = soup.select_one('[data-test="post-tagline"]')
                tagline = tagline_elem.text.strip() if tagline_elem else 'N/A'

                desc_elem = soup.select_one('[data-test="post-description"]')
                description = desc_elem.text.strip() if desc_elem else 'N/A'

                upvote_elem = soup.select_one('button[aria-label*="upvote"]')
                upvotes = upvote_elem.text.strip() if upvote_elem else '0'

                # Get maker names
                makers = []
                maker_elems = soup.select('[data-test="post-maker"]')
                for maker in maker_elems:
                    makers.append(maker.text.strip())

                all_products.append({
                    'name': name,
                    'tagline': tagline,
                    'description': description,
                    'upvotes': upvotes,
                    'makers': ', '.join(makers),
                    'url': url
                })
            except Exception as e:
                print(f"Error scraping {url}: {str(e)}")
                continue

        await browser.close()

        # Save to CSV
        df = pd.DataFrame(all_products)
        df.to_csv('producthunt_products.csv', index=False)
        print(f"\nScraped {len(all_products)} products. Saved to producthunt_products.csv")

        return all_products

# Run it
asyncio.run(scrape_product_hunt_complete())
This script includes several key improvements. The --disable-blink-features=AutomationControlled argument removes one of the telltale signs that you're using browser automation. The viewport size mimics a typical desktop browser, and we're injecting a script that overrides the navigator.webdriver property, a flag that anti-bot systems check.
The human-like delays (asyncio.sleep(2)) are important. If you scrape too fast, you'll trigger rate limits or get flagged as suspicious. Two seconds between requests is a reasonable pace that won't slow you down too much but keeps you under the radar.
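If you want those pauses to look even less mechanical, add a little jitter instead of sleeping exactly two seconds every time. A small helper like this works; the 2-4 second range is just a suggestion, not a magic number:

import asyncio
import random

async def human_delay(min_seconds=2, max_seconds=4):
    """Sleep for a random, human-looking interval between requests."""
    await asyncio.sleep(random.uniform(min_seconds, max_seconds))

# In the product loop, call this instead of a fixed asyncio.sleep(2):
# await human_delay()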
Advanced anti-detection techniques
Product Hunt doesn't have Cloudflare-level protection, but they do monitor for bot behavior. Here's how to stay undetected:
Use playwright-stealth: This library patches dozens of bot detection signals automatically.
pip install playwright-stealth
Then update your code:
from playwright_stealth import stealth_async

async def scrape_with_stealth():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()

        # Apply stealth patches
        await stealth_async(page)

        await page.goto('https://www.producthunt.com/')
        # Rest of your scraping code...
Rotate user agents: Don't use the same one for every request. Create a list and pick randomly:
import random

USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36'
]

context = await browser.new_context(
    user_agent=random.choice(USER_AGENTS)
)
Mimic human scrolling: Before grabbing content, scroll the page like a real user would:
async def human_scroll(page):
    await page.evaluate("""
        async () => {
            await new Promise((resolve) => {
                let totalHeight = 0;
                const distance = 100;
                const timer = setInterval(() => {
                    window.scrollBy(0, distance);
                    totalHeight += distance;
                    if (totalHeight >= document.body.scrollHeight) {
                        clearInterval(timer);
                        resolve();
                    }
                }, 100);
            });
        }
    """)
Add this before extracting data, and you'll trigger lazy-loading while looking less bot-like.
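In practice the call slots in right after navigation and before you grab the HTML. Here's a sketch of how it fits into the same flow as the earlier scrapers (extraction details omitted):

async def scrape_with_scroll():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto('https://www.producthunt.com/', wait_until='networkidle')

        # Scroll to trigger lazy-loaded product cards before capturing the HTML
        await human_scroll(page)

        content = await page.content()
        soup = BeautifulSoup(content, 'lxml')
        # ...extract products as before...
        await browser.close()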
Scraping historical data from archives
Product Hunt has archive pages for every day going back to 2013. The URL format is predictable:
https://www.producthunt.com/leaderboard/daily/2026/1/15
You can loop through dates and scrape historical launches:
import asyncio
from datetime import datetime, timedelta

async def scrape_archive(date):
    """Scrape products from a specific date"""
    year, month, day = date.year, date.month, date.day
    url = f"https://www.producthunt.com/leaderboard/daily/{year}/{month}/{day}"

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(url, wait_until='networkidle')
        # Same scraping logic as before...
        await browser.close()

async def scrape_last_week():
    # Scrape the last 7 days, one archive page at a time
    start_date = datetime.now() - timedelta(days=7)
    for i in range(7):
        date = start_date + timedelta(days=i)
        print(f"Scraping {date.strftime('%Y-%m-%d')}...")
        await scrape_archive(date)
        await asyncio.sleep(5)  # Be respectful with delays

asyncio.run(scrape_last_week())
This approach lets you build a dataset of thousands of products without hitting API rate limits.
Handling errors and retries
Web scraping is messy. Networks fail, pages time out, and selectors break when sites update. Build in retry logic:
async def scrape_with_retry(url, max_retries=3):
    for attempt in range(max_retries):
        try:
            async with async_playwright() as p:
                browser = await p.chromium.launch(headless=True)
                page = await browser.new_page()
                await page.goto(url, wait_until='networkidle', timeout=30000)

                # Your scraping logic here, e.g.:
                data = await page.content()

                await browser.close()
                return data
        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {str(e)}")
            if attempt < max_retries - 1:
                await asyncio.sleep(5 * 2 ** attempt)  # Exponential backoff: 5s, 10s, 20s...
            else:
                print(f"Failed after {max_retries} attempts")
                return None
The exponential backoff (waiting longer after each failure) prevents you from hammering the server when something's wrong.
Storing and analyzing your data
Once you've scraped Product Hunt, you'll want to do something useful with the data. Pandas makes this straightforward:
import pandas as pd

# Load your scraped data
df = pd.DataFrame(all_products)

# Find top products by upvotes
df['upvotes_int'] = df['upvotes'].str.replace(',', '').astype(int)
top_products = df.nlargest(10, 'upvotes_int')

# Analyze by maker (split the comma-joined maker names first)
maker_counts = df['makers'].str.split(', ').explode().value_counts()
print(f"Most active makers:\n{maker_counts.head()}")

# Export to different formats
df.to_csv('products.csv', index=False)
df.to_json('products.json', orient='records', indent=2)
df.to_excel('products.xlsx', index=False)  # Requires openpyxl
You can also push this data to a database, feed it into a dashboard, or use it for machine learning projects.
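For instance, if you'd rather have a queryable local store than flat files, pandas can write straight into SQLite; the database and table names below are arbitrary:

import sqlite3
import pandas as pd

conn = sqlite3.connect('producthunt.db')  # Creates the file if it doesn't exist
df.to_sql('products', conn, if_exists='append', index=False)

# Quick check: how many rows have we accumulated so far?
print(pd.read_sql('SELECT COUNT(*) AS total FROM products', conn))
conn.close()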
Ethical considerations and rate limiting
Let's talk about the elephant in the room: is this okay?
Scraping publicly available data is generally lawful, but it's important to be respectful. Product Hunt's terms of service discourage automated access, so use your judgment. If you're doing academic research, building a personal project, or creating something that benefits the community, you're probably fine. If you're planning to resell the data or compete directly with Product Hunt, you should use their API or reach out for permission.
As for rate limiting, I recommend:
- No more than 1 request per 2-3 seconds
- Scraping during off-peak hours (late night US time)
- Not hammering the site with hundreds of concurrent requests
- Stopping if you encounter 429 or 403 errors
Think of it like this: if a human could reasonably do what your scraper does, you're probably okay.
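On that last point, Playwright hands you the response status from page.goto, so your scraper can notice a 429 or 403 and back off on its own. A minimal sketch (the 60-second backoff is an arbitrary choice):

async def polite_goto(page, url, backoff_seconds=60):
    """Navigate to a URL and pause if Product Hunt starts pushing back."""
    response = await page.goto(url, wait_until='networkidle')
    if response and response.status in (429, 403):
        print(f"Got {response.status} for {url}, backing off for {backoff_seconds}s")
        await asyncio.sleep(backoff_seconds)
        return None
    return response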
Wrapping up
Scraping Product Hunt isn't rocket science, but it requires the right tools and techniques. Playwright handles the JavaScript rendering, anti-detection patches keep you under the radar, and smart rate limiting keeps you from getting blocked.
The code samples in this guide are a solid starting point for real projects. They handle errors, include delays, and use stealth techniques that work. The main things to remember: use Playwright instead of simple HTTP requests, hide your automation signals, and be respectful with your scraping pace.
Whether you're tracking competitors, researching market trends, or building a side project, Product Hunt's data is incredibly valuable. Now you know how to get it.