Product Hunt is a goldmine for market research, competitor analysis, and trend spotting.
Whether you're tracking launches in your industry, building a newsletter, or analyzing what makes products successful, scraping Product Hunt gives you access to data that would take hours to collect manually.

The catch? Product Hunt is a modern JavaScript-heavy site that doesn't play nice with simple HTTP requests. You'll need browser automation, smart anti-detection techniques, and a strategy for handling rate limits. In this guide, I'll show you exactly how to scrape Product Hunt using Python and Playwright, with real code that actually works.
What you'll find in this guide
- What data you can scrape from Product Hunt
- Should you use the API or scrape the site?
- Setting up Playwright for Product Hunt scraping
- Scraping the daily product feed
- Extracting product details and maker information
- Anti-detection techniques that work
- Handling rate limits without proxies
- Storing and exporting your data
What data can you scrape from Product Hunt?
Product Hunt surfaces a wealth of data about new products, and here's what you can realistically extract:
Product information: Names, taglines, descriptions, categories, launch dates, and product URLs. This is the bread and butter of most scraping projects.

Engagement metrics: Upvote counts, comment counts, and rankings. These numbers tell you what's resonating with the community.

Maker profiles: Information about the people behind products, including their names, profile links, and sometimes social media handles.

Comments and discussions: User feedback, questions, and conversations around products. This qualitative data is often overlooked but incredibly valuable.

Images and media: Product screenshots, logos, and demo videos. These can be downloaded for analysis or archiving.

Historical data: Past launches from the daily archive pages going back years. Want to see what was hot in 2018? It's all there.

Should you use the API or scrape the site?
Product Hunt offers a GraphQL API, so you might be wondering: why scrape at all?
The API has some serious limitations.
First, it requires approval for commercial use, which means you'll need to contact Product Hunt and explain your use case.
Second, there are rate limits—6,250 complexity points every 15 minutes for GraphQL queries, or 450 requests per 15 minutes for REST endpoints. For small projects, this is fine. For anything at scale, you'll hit the ceiling fast.
More importantly, the API requires OAuth authentication, which adds complexity to your setup. And if you're doing one-off research or building a prototype, going through the approval process feels like overkill.
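For context, here's roughly what a call to the official API looks like. This is a minimal sketch assuming you've already registered an application and have a developer token; the endpoint and field names follow the public v2 GraphQL schema as I remember it, so verify them against the current API docs:

import requests

# Assumes you already have a Product Hunt developer token (placeholder below)
API_URL = 'https://api.producthunt.com/v2/api/graphql'
TOKEN = 'your_developer_token_here'

query = """
{
  posts(first: 5) {
    edges {
      node {
        name
        tagline
        votesCount
      }
    }
  }
}
"""

response = requests.post(
    API_URL,
    json={'query': query},
    headers={'Authorization': f'Bearer {TOKEN}'},
)
print(response.json())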
Web scraping gives you more flexibility. You can extract exactly what you need without worrying about API schemas, you aren't bound by the API's quotas, and you don't need permission to get started. The trade-off? You'll need to handle JavaScript rendering and anti-bot detection.
For this guide, we'll focus on scraping the site directly. It's more practical for most use cases and teaches you techniques that apply to other modern websites.
Setting up your scraping environment
Let's get the boring stuff out of the way first. You'll need Python 3.8+ and a few libraries.
Create a new project folder and set up a virtual environment:
mkdir producthunt-scraper
cd producthunt-scraper
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
Install the required packages:
pip install playwright beautifulsoup4 lxml pandas
playwright install chromium
Playwright is doing the heavy lifting here. It controls a real browser, executes JavaScript, and handles all the dynamic content Product Hunt throws at you. Beautiful Soup will help us parse the HTML once Playwright grabs it, and pandas makes exporting data dead simple.
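Before writing any scraping logic, it's worth a quick smoke test to confirm Playwright and Chromium installed correctly. This minimal script just loads Product Hunt and prints the page title:

import asyncio
from playwright.async_api import async_playwright

async def smoke_test():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto('https://www.producthunt.com/')
        print(await page.title())  # If this prints a title, your setup works
        await browser.close()

asyncio.run(smoke_test())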
Scraping the daily product feed
The main Product Hunt page shows today's top products. Let's start there.
Here's a basic scraper that grabs product names and taglines:
import asyncio
from playwright.async_api import async_playwright
from bs4 import BeautifulSoup

async def scrape_daily_products():
    async with async_playwright() as p:
        # Launch browser in headless mode
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()

        # Navigate to Product Hunt
        await page.goto('https://www.producthunt.com/', wait_until='networkidle')

        # Wait for products to load
        await page.wait_for_selector('[data-test="homepage-section-0"]', timeout=10000)

        # Get the page content
        content = await page.content()

        # Parse with Beautiful Soup
        soup = BeautifulSoup(content, 'lxml')

        # Find all product cards
        products = []
        product_cards = soup.select('div[data-test^="post-item"]')

        for card in product_cards:
            # Extract product name
            name_elem = card.select_one('a[href^="/posts/"]')
            name = name_elem.text.strip() if name_elem else 'N/A'

            # Extract tagline
            tagline_elem = card.select_one('[color="subdued"]')
            tagline = tagline_elem.text.strip() if tagline_elem else 'N/A'

            # Extract upvotes
            upvote_elem = card.select_one('button[aria-label*="upvote"]')
            upvotes = upvote_elem.text.strip() if upvote_elem else '0'

            products.append({
                'name': name,
                'tagline': tagline,
                'upvotes': upvotes
            })

        await browser.close()
        return products

# Run the scraper
products = asyncio.run(scrape_daily_products())
for product in products:
    print(f"{product['name']} - {product['tagline']} ({product['upvotes']} upvotes)")
This code does several important things. First, it launches a Chromium browser in headless mode, which means no visible window pops up. Then it navigates to Product Hunt and waits for the network to go idle, ensuring all the JavaScript has executed and the page is fully loaded.
The wait_for_selector call is crucial. Product Hunt uses React, so the initial HTML is basically empty. We need to wait for the actual product cards to render before we can scrape anything.
Once we have the HTML, Beautiful Soup makes it easy to extract data using CSS selectors. Product Hunt's DOM structure uses data-test attributes, which are actually more stable than class names (those tend to change when they update their CSS).
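Even so, data-test attributes aren't guaranteed to survive a redesign. A simple defensive pattern is to try a short list of candidate selectors and use the first one that matches; the selectors here are the ones from this guide, and any extras you add are your own guesses about the markup:

def select_first(soup, selectors):
    """Return the first element matched by any candidate selector, or None."""
    for selector in selectors:
        elem = soup.select_one(selector)
        if elem:
            return elem
    return None

# Example: tagline with a fallback selector (order them from most to least specific)
tagline_elem = select_first(soup, ['[data-test="post-tagline"]', '[color="subdued"]'])
tagline = tagline_elem.text.strip() if tagline_elem else 'N/A'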
Extracting detailed product information
The daily feed gives you basic info, but what if you want everything—descriptions, maker details, comments, and more? You'll need to visit individual product pages.
Here's how to scrape a single product page:
async def scrape_product_details(product_url):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36'
        )
        page = await context.new_page()

        await page.goto(product_url, wait_until='networkidle')
        await page.wait_for_selector('[data-test="post-name"]', timeout=10000)

        content = await page.content()
        soup = BeautifulSoup(content, 'lxml')

        # Extract product name
        name_elem = soup.select_one('[data-test="post-name"]')
        name = name_elem.text.strip() if name_elem else 'N/A'

        # Extract description
        desc_elem = soup.select_one('[data-test="post-description"]')
        description = desc_elem.text.strip() if desc_elem else 'N/A'

        # Extract maker information
        makers = []
        maker_elements = soup.select('[data-test="post-maker"]')
        for maker in maker_elements:
            maker_name = maker.text.strip()
            maker_link = maker.get('href', '')
            makers.append({'name': maker_name, 'profile': maker_link})

        # Extract website link
        website_elem = soup.select_one('a[data-test="post-product-link"]')
        website = website_elem.get('href', '') if website_elem else 'N/A'

        # Extract comment count
        comment_elem = soup.select_one('[data-test="post-comment-count"]')
        comments = comment_elem.text.strip() if comment_elem else '0'

        await browser.close()

        return {
            'name': name,
            'description': description,
            'makers': makers,
            'website': website,
            'comments': comments
        }

# Example usage
product_data = asyncio.run(scrape_product_details('https://www.producthunt.com/posts/some-product'))
print(product_data)
Notice I added a custom user agent when creating the browser context. This is our first anti-detection measure. Playwright's default user agent screams "I'm a bot," so we're replacing it with one that looks like a regular Chrome browser on macOS.
The rest of the code follows the same pattern: navigate, wait for content, parse with Beautiful Soup, extract data. The key is using those data-test attributes to target the right elements.
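Because the select-then-default pattern repeats for every field, it's worth wrapping in a tiny helper. This is purely a convenience refactor of the snippets above, not a change in what gets scraped:

def get_text(soup, selector, default='N/A'):
    """Select one element and return its stripped text, or a default."""
    elem = soup.select_one(selector)
    return elem.text.strip() if elem else default

# The extraction code above then collapses to:
name = get_text(soup, '[data-test="post-name"]')
description = get_text(soup, '[data-test="post-description"]')
comments = get_text(soup, '[data-test="post-comment-count"]', default='0')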
Building a complete scraper with pagination
Let's tie it all together. This scraper grabs today's products, visits each one, extracts detailed info, and saves everything to a CSV file:
import asyncio
from playwright.async_api import async_playwright
from bs4 import BeautifulSoup
import pandas as pd

async def scrape_product_hunt_complete():
    async with async_playwright() as p:
        # Launch browser with anti-detection settings
        browser = await p.chromium.launch(
            headless=True,
            args=['--disable-blink-features=AutomationControlled']
        )
        context = await browser.new_context(
            user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
            viewport={'width': 1920, 'height': 1080}
        )
        page = await context.new_page()

        # Hide Playwright automation
        await page.add_init_script("""
            Object.defineProperty(navigator, 'webdriver', {
                get: () => false
            });
        """)

        # Scrape main page
        print("Scraping daily products...")
        await page.goto('https://www.producthunt.com/', wait_until='networkidle')
        await page.wait_for_selector('[data-test="homepage-section-0"]', timeout=10000)

        content = await page.content()
        soup = BeautifulSoup(content, 'lxml')

        # Get product URLs
        product_links = []
        product_cards = soup.select('a[href^="/posts/"]')
        for link in product_cards[:10]:  # Limit to 10 for testing
            href = link.get('href')
            if href and '/posts/' in href:
                full_url = f"https://www.producthunt.com{href}"
                if full_url not in product_links:
                    product_links.append(full_url)

        # Scrape each product
        all_products = []
        for i, url in enumerate(product_links, 1):
            print(f"Scraping product {i}/{len(product_links)}: {url}")
            try:
                await page.goto(url, wait_until='networkidle')
                await page.wait_for_selector('[data-test="post-name"]', timeout=10000)

                # Add human-like delay
                await asyncio.sleep(2)

                content = await page.content()
                soup = BeautifulSoup(content, 'lxml')

                # Extract data
                name_elem = soup.select_one('[data-test="post-name"]')
                name = name_elem.text.strip() if name_elem else 'N/A'

                tagline_elem = soup.select_one('[data-test="post-tagline"]')
                tagline = tagline_elem.text.strip() if tagline_elem else 'N/A'

                desc_elem = soup.select_one('[data-test="post-description"]')
                description = desc_elem.text.strip() if desc_elem else 'N/A'

                upvote_elem = soup.select_one('button[aria-label*="upvote"]')
                upvotes = upvote_elem.text.strip() if upvote_elem else '0'

                # Get maker names
                makers = []
                maker_elems = soup.select('[data-test="post-maker"]')
                for maker in maker_elems:
                    makers.append(maker.text.strip())

                all_products.append({
                    'name': name,
                    'tagline': tagline,
                    'description': description,
                    'upvotes': upvotes,
                    'makers': ', '.join(makers),
                    'url': url
                })
            except Exception as e:
                print(f"Error scraping {url}: {str(e)}")
                continue

        await browser.close()

        # Save to CSV
        df = pd.DataFrame(all_products)
        df.to_csv('producthunt_products.csv', index=False)
        print(f"\nScraped {len(all_products)} products. Saved to producthunt_products.csv")

        return all_products

# Run it
asyncio.run(scrape_product_hunt_complete())
This script includes several key improvements. The --disable-blink-features=AutomationControlled argument removes one of the telltale signs that you're using browser automation. The viewport size mimics a typical desktop browser, and we're injecting a script that overrides the navigator.webdriver property, a flag that anti-bot systems check.
The human-like delays (asyncio.sleep(2)) are important. If you scrape too fast, you'll trigger rate limits or get flagged as suspicious. Two seconds between requests is a reasonable pace that won't slow you down too much but keeps you under the radar.
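If you want those pauses to look even less mechanical, add a little jitter instead of sleeping exactly two seconds every time. A small helper like this works; the 2-4 second range is just a suggestion, not a magic number:

import asyncio
import random

async def human_delay(min_seconds=2, max_seconds=4):
    """Sleep for a random, human-looking interval between requests."""
    await asyncio.sleep(random.uniform(min_seconds, max_seconds))

# In the product loop, call this instead of a fixed asyncio.sleep(2):
# await human_delay()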
Advanced anti-detection techniques
Product Hunt doesn't have Cloudflare-level protection, but they do monitor for bot behavior. Here's how to stay undetected:
Use playwright-stealth: This library patches dozens of bot detection signals automatically.
pip install playwright-stealth
Then update your code:
from playwright_stealth import stealth_async

async def scrape_with_stealth():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()

        # Apply stealth patches
        await stealth_async(page)

        await page.goto('https://www.producthunt.com/')
        # Rest of your scraping code...
Rotate user agents: Don't use the same one for every request. Create a list and pick randomly:
import random

USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36'
]

context = await browser.new_context(
    user_agent=random.choice(USER_AGENTS)
)
Mimic human scrolling: Before grabbing content, scroll the page like a real user would:
async def human_scroll(page):
    await page.evaluate("""
        async () => {
            await new Promise((resolve) => {
                let totalHeight = 0;
                const distance = 100;
                const timer = setInterval(() => {
                    window.scrollBy(0, distance);
                    totalHeight += distance;
                    if (totalHeight >= document.body.scrollHeight) {
                        clearInterval(timer);
                        resolve();
                    }
                }, 100);
            });
        }
    """)
Add this before extracting data, and you'll trigger lazy-loading while looking less bot-like.
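In practice the call slots in right after navigation and before you grab the HTML. Here's a sketch of how it fits into the same flow as the earlier scrapers (extraction details omitted):

async def scrape_with_scroll():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto('https://www.producthunt.com/', wait_until='networkidle')

        # Scroll to trigger lazy-loaded product cards before capturing the HTML
        await human_scroll(page)

        content = await page.content()
        soup = BeautifulSoup(content, 'lxml')
        # ...extract products as before...
        await browser.close()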
Scraping historical data from archives
Product Hunt has archive pages for every day going back to 2013. The URL format is predictable:
https://www.producthunt.com/leaderboard/daily/2026/1/15
You can loop through dates and scrape historical launches:
import asyncio
from datetime import datetime, timedelta

async def scrape_archive(date):
    """Scrape products from a specific date"""
    year, month, day = date.year, date.month, date.day
    url = f"https://www.producthunt.com/leaderboard/daily/{year}/{month}/{day}"

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(url, wait_until='networkidle')
        # Same scraping logic as before...
        await browser.close()

async def scrape_last_week():
    # Scrape the last 7 days, one archive page at a time
    start_date = datetime.now() - timedelta(days=7)
    for i in range(7):
        date = start_date + timedelta(days=i)
        print(f"Scraping {date.strftime('%Y-%m-%d')}...")
        await scrape_archive(date)
        await asyncio.sleep(5)  # Be respectful with delays

asyncio.run(scrape_last_week())
This approach lets you build a dataset of thousands of products without hitting API rate limits.
Handling errors and retries
Web scraping is messy. Networks fail, pages time out, and selectors break when sites update. Build in retry logic:
async def scrape_with_retry(url, max_retries=3):
    for attempt in range(max_retries):
        try:
            async with async_playwright() as p:
                browser = await p.chromium.launch(headless=True)
                page = await browser.new_page()
                await page.goto(url, wait_until='networkidle', timeout=30000)

                # Your scraping logic here, e.g.:
                data = await page.content()

                await browser.close()
                return data
        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {str(e)}")
            if attempt < max_retries - 1:
                await asyncio.sleep(5 * 2 ** attempt)  # Exponential backoff: 5s, 10s, 20s...
            else:
                print(f"Failed after {max_retries} attempts")
                return None
The exponential backoff (waiting longer after each failure) prevents you from hammering the server when something's wrong.
Storing and analyzing your data
Once you've scraped Product Hunt, you'll want to do something useful with the data. Pandas makes this straightforward:
import pandas as pd

# Load your scraped data
df = pd.DataFrame(all_products)

# Find top products by upvotes
df['upvotes_int'] = df['upvotes'].str.replace(',', '').astype(int)
top_products = df.nlargest(10, 'upvotes_int')

# Analyze by maker (split the comma-joined maker names first)
maker_counts = df['makers'].str.split(', ').explode().value_counts()
print(f"Most active makers:\n{maker_counts.head()}")

# Export to different formats
df.to_csv('products.csv', index=False)
df.to_json('products.json', orient='records', indent=2)
df.to_excel('products.xlsx', index=False)  # Requires openpyxl
You can also push this data to a database, feed it into a dashboard, or use it for machine learning projects.
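For instance, if you'd rather have a queryable local store than flat files, pandas can write straight into SQLite; the database and table names below are arbitrary:

import sqlite3
import pandas as pd

conn = sqlite3.connect('producthunt.db')  # Creates the file if it doesn't exist
df.to_sql('products', conn, if_exists='append', index=False)

# Quick check: how many rows have we accumulated so far?
print(pd.read_sql('SELECT COUNT(*) AS total FROM products', conn))
conn.close()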
Ethical considerations and rate limiting
Let's talk about the elephant in the room: is this okay?
Scraping publicly available data is generally lawful, but it's important to be respectful. Product Hunt's terms of service discourage automated access, so use your judgment. If you're doing academic research, building a personal project, or creating something that benefits the community, you're probably fine. If you're planning to resell the data or compete directly with Product Hunt, you should use their API or reach out for permission.
As for rate limiting, I recommend:
- No more than 1 request per 2-3 seconds
- Scraping during off-peak hours (late night US time)
- Not hammering the site with hundreds of concurrent requests
- Stopping if you encounter 429 or 403 errors
Think of it like this: if a human could reasonably do what your scraper does, you're probably okay.
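On that last point, Playwright hands you the response status from page.goto, so your scraper can notice a 429 or 403 and back off on its own. A minimal sketch (the 60-second backoff is an arbitrary choice):

async def polite_goto(page, url, backoff_seconds=60):
    """Navigate to a URL and pause if Product Hunt starts pushing back."""
    response = await page.goto(url, wait_until='networkidle')
    if response and response.status in (429, 403):
        print(f"Got {response.status} for {url}, backing off for {backoff_seconds}s")
        await asyncio.sleep(backoff_seconds)
        return None
    return response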
Wrapping up
Scraping Product Hunt isn't rocket science, but it requires the right tools and techniques. Playwright handles the JavaScript rendering, anti-detection patches keep you under the radar, and smart rate limiting keeps you from getting blocked.
The code samples in this guide are a solid starting point for real projects. They handle errors, include delays, and use stealth techniques that work. The main things to remember: use Playwright instead of simple HTTP requests, hide your automation signals, and be respectful with your scraping pace.
Whether you're tracking competitors, researching market trends, or building a side project, Product Hunt's data is incredibly valuable. Now you know how to get it.