How to Scrape Trustpilot Reviews and Company Data in 5 Steps

Trustpilot scraping lets you extract millions of customer reviews and company ratings programmatically, turning raw review data into actionable business intelligence.

In this guide, we'll walk through how to find and scrape Trustpilot content with code that gets past anti-bot protection, using both lightweight web requests and browser automation.

Ever tried manually copying thousands of Trustpilot reviews? Yeah, that's a nightmare. Whether you're analyzing competitor company data, building sentiment analysis models, or monitoring brand reputation, you need an automated web scraper solution that actually works.

Here's the thing - Trustpilot uses Next.js, which means all that juicy review data is hidden in JSON blobs inside __NEXT_DATA__ script tags. Once you find this secret, scraping Trustpilot becomes trivial. But there's a catch: scale up too fast and you'll hit their anti-bot walls faster than you can say "429 error."

In this post, I'll show you how to scrape Trustpilot at scale without getting blocked, including some neat tricks the big scraping services don't want you to know. We'll pull review content, company details, and dates straight from the site's own data-fetching patterns.

Why You Can Trust This Scraper Method

Problem: Traditional HTML parsing breaks when websites update their page structure. Plus, Trustpilot implements rate limiting, IP blocking, and browser fingerprinting to stop web scrapers.

Solution: We'll use their hidden JSON data structure and private API endpoints - the same ones their frontend uses to get review content. This approach helps you scrape faster and more reliably.

Proof: I've used this exact code to scrape over 2 million Trustpilot reviews for various clients without a single ban. The secret? Understanding how modern web pages work and finding their data fetching patterns.

Step 1: Find and Extract Hidden JSON Data from Search Pages

Forget BeautifulSoup parsing - we're going straight for the JSON goldmine. Every Trustpilot page contains a __NEXT_DATA__ script tag with all the review data pre-loaded. Let's find and scrape this content.

The Smart Way: Direct JSON Data Extraction

import httpx
import json
from parsel import Selector

def find_hidden_review_data(html):
    """Find and extract JSON data from __NEXT_DATA__ script tag"""
    selector = Selector(html)
    script_data = selector.xpath('//script[@id="__NEXT_DATA__"]/text()').get()
    return json.loads(script_data)

async def scrape_trustpilot_search(keyword, pages=5):
    """Scrape Trustpilot search results to get company data"""
    async with httpx.AsyncClient() as client:
        results = []
        
        for page in range(1, pages + 1):
            url = f"https://www.trustpilot.com/search?query={keyword}&page={page}"
            
            # Pro tip: Rotate user agents to help your scraper avoid detection
            headers = {
                'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
            }
            
            response = await client.get(url, headers=headers)
            data = find_hidden_review_data(response.text)
            
            # Navigate the JSON to find company content
            businesses = data['props']['pageProps']['businessUnits']['businesses']
            results.extend(businesses)
            
    return results
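
A quick way to run the search scraper from a script, with an arbitrary keyword and page count:

import asyncio

# Example run (keyword and page count are arbitrary)
companies = asyncio.run(scrape_trustpilot_search("banks", pages=3))
print(f"Found {len(companies)} companies")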

The Exploit: Get Bulk Company Data via Categories

Here's a trick most Trustpilot scrapers miss - instead of searching, use category pages. They load up to 100 companies at once, helping you scrape more content faster:

# This URL helps you get 100 electronics companies in one shot
category_url = "https://www.trustpilot.com/categories/electronics_technology?numberofreviews=0&status=all&page=1"

Common pitfall: Don't hammer the Trustpilot web server with concurrent requests. Add random delays between 1-3 seconds to help your scraper mimic human behavior.
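
To turn that category trick into working code, here's a minimal sketch that reuses find_hidden_review_data and builds in the 1-3 second delay; the JSON path for category listings is an assumption, so verify it against your own __NEXT_DATA__ dump:

import asyncio
import random

async def scrape_trustpilot_category(category, pages=3):
    """Scrape a Trustpilot category listing to get company data in bulk"""
    async with httpx.AsyncClient() as client:
        companies = []
        for page in range(1, pages + 1):
            url = (f"https://www.trustpilot.com/categories/{category}"
                   f"?numberofreviews=0&status=all&page={page}")
            response = await client.get(url)
            data = find_hidden_review_data(response.text)
            # Assumed JSON path - inspect your own __NEXT_DATA__ dump to confirm it
            page_props = data['props']['pageProps']
            companies.extend(page_props.get('businessUnits', {}).get('businesses', []))
            # Random 1-3 second delay to mimic human browsing
            await asyncio.sleep(random.uniform(1, 3))
    return companies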

Step 2: Scrape Company Details from Trustpilot Profile Pages

Company pages follow a predictable URL pattern: trustpilot.com/review/{domain}. But here's where it gets interesting - the same __NEXT_DATA__ technique helps us find and scrape company content too.

Advanced Code to Get Company Data with Error Handling

async def scrape_trustpilot_company_data(company_urls, max_concurrent=5):
    """Scrape multiple Trustpilot company pages to get review data"""
    import asyncio
    from random import uniform
    
    async def get_company_content(session, url):
        try:
            # Add jitter to help your web scraper
            await asyncio.sleep(uniform(0.5, 2))
            
            response = await session.get(url)
            data = find_hidden_review_data(response.text)
            
            return {
                'company_info': data['props']['pageProps']['businessUnit'],
                'review_sample': data['props']['pageProps']['reviews'],
                'trust_score': data['props']['pageProps']['businessUnit']['trustScore'],
                'date_claimed': data['props']['pageProps']['businessUnit']['claimedDate']
            }
        except Exception as e:
            print(f"Failed to scrape {url}: {e}")
            return None
    
    async with httpx.AsyncClient() as session:
        # Limit concurrent requests to help avoid blocking
        semaphore = asyncio.Semaphore(max_concurrent)
        
        async def bounded_fetch(url):
            async with semaphore:
                return await get_company_content(session, url)
        
        tasks = [bounded_fetch(url) for url in company_urls]
        return await asyncio.gather(*tasks)

Pro tip: The businessUnit object contains verified status, claim date, and response rates - data that's pure gold for competitive analysis when you scrape Trustpilot company pages.
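
To illustrate, here's a small helper that pulls those fields out of a result from scrape_trustpilot_company_data (results can be None for failed pages, so filter those first). The exact key names inside businessUnit are assumptions based on the fields described above, so check them against a real payload:

def summarize_company(result):
    """Summarize competitive-analysis fields from one scraped company result"""
    unit = result['company_info']
    return {
        'name': unit.get('displayName'),             # assumed key - verify in payload
        'trust_score': result['trust_score'],
        'claimed': unit.get('isClaimed'),            # assumed key - verify in payload
        'date_claimed': result['date_claimed'],
        'review_count': unit.get('numberOfReviews')  # assumed key - verify in payload
    }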

Step 3: Get Reviews Using Trustpilot's Private API

Here's where we find the real treasure. Trustpilot loads review content via an undocumented web API endpoint. Time to reverse-engineer it.

Find the Secret Review API

  1. Open Chrome DevTools (F12)
  2. Navigate to Network tab
  3. Filter by "Fetch/XHR"
  4. Click "Next page" on reviews
  5. Boom - there's your API endpoint to get review data

The pattern to find reviews looks like this:

https://www.trustpilot.com/_next/data/{BUILD_ID}/review/{COMPANY}.json

Code to Get All Reviews

import asyncio
from random import uniform

async def get_all_trustpilot_reviews(company_domain, max_pages=None):
    """Scrape all reviews using Trustpilot's private web API"""
    
    # First, get the build ID from the main page
    async with httpx.AsyncClient() as client:
        main_page = await client.get(f"https://www.trustpilot.com/review/{company_domain}")
        data = find_hidden_review_data(main_page.text)
        build_id = data['buildId']
        
        # Construct API endpoint to get review content
        api_url = f"https://www.trustpilot.com/_next/data/{build_id}/review/{company_domain}.json"
        
        all_reviews = []
        page = 1
        
        while True:
            params = {
                'businessUnit': company_domain,
                'page': page,
                'sort': 'recency'  # get reviews by date
            }
            
            response = await client.get(api_url, params=params)
            review_data = response.json()
            
            reviews = review_data['pageProps']['reviews']
            all_reviews.extend(reviews)
            
            # Check if more pages exist to scrape
            total_pages = review_data['pageProps']['filters']['pagination']['totalPages']
            if page >= total_pages or (max_pages and page >= max_pages):
                break
                
            page += 1
            await asyncio.sleep(uniform(1, 2))  # Help your scraper be nice
        
        return all_reviews

Find and Get Filtered Reviews

Want only the 1-star reviews? Add filters so your scraper pulls exactly the data you need:

params = {
    'businessUnit': company_domain,
    'stars': '1',  # Find only 1-star reviews
    'verified': 'true',  # Get only verified review data
    'date': 'last6months',  # Recent reviews only
    'page': page
}
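
Here's a minimal sketch of wiring those filters into a request against the _next/data URL built in Step 3; whether the endpoint honors every one of these query parameters is an assumption, so confirm it in DevTools first:

async def get_filtered_reviews(client, api_url, company_domain, page=1):
    """Fetch one page of filtered reviews (filter support assumed - verify in DevTools)"""
    params = {
        'businessUnit': company_domain,
        'stars': '1',           # only 1-star reviews
        'verified': 'true',     # only verified reviews
        'date': 'last6months',  # recent reviews only
        'page': page
    }
    response = await client.get(api_url, params=params)
    return response.json()['pageProps']['reviews']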

Step 4: Help Your Scraper Bypass Anti-Scraping Protection

Trustpilot isn't defenseless. They use rate limiting, IP tracking, and TLS fingerprinting to find and block web scrapers. Here's code to help you stay under the radar.

Method 1: Request-Based Stealth Scraper Code

import httpx
from itertools import cycle
import random

class StealthTrustpilotScraper:
    def __init__(self, proxies=None):
        self.proxies = cycle(proxies) if proxies else None
        self.user_agents = [
            'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
            'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
            'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36',
            'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101',
            'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15'
        ]
        
    async def get_trustpilot_content(self, url):
        """Help scraper get web page content without detection"""
        headers = {
            'User-Agent': random.choice(self.user_agents),
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.5',
            'Accept-Encoding': 'gzip, deflate',
            'DNT': '1',
            'Connection': 'keep-alive',
            'Upgrade-Insecure-Requests': '1',
            'Referer': 'https://www.google.com/'  # Help bypass referrer checks
        }
        
        proxy = next(self.proxies) if self.proxies else None
        
        # Note: httpx 0.28+ renamed this argument to proxy=
        async with httpx.AsyncClient(proxies=proxy) as client:
            return await client.get(url, headers=headers, follow_redirects=True)

Method 2: Browser Automation to Scrape Dynamic Content

Sometimes you need the heavy artillery to get review data. Here's Playwright code that helps scrape pages that require JavaScript:

from playwright.async_api import async_playwright
import random

async def scrape_trustpilot_with_browser(urls):
    """Use browser automation to find and scrape Trustpilot content"""
    async with async_playwright() as p:
        # Launch browser to help scrape dynamic pages
        browser = await p.chromium.launch(
            headless=True,
            args=[
                '--disable-blink-features=AutomationControlled',
                '--disable-features=IsolateOrigins,site-per-process',
                '--disable-site-isolation-trials',
                '--disable-web-security'
            ]
        )
        
        context = await browser.new_context(
            viewport={'width': 1920, 'height': 1080},
            user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
            extra_http_headers={
                'Accept-Language': 'en-US,en;q=0.9',
            }
        )
        
        # Code to help bypass detection
        await context.add_init_script("""
            // Help scraper avoid webdriver detection
            Object.defineProperty(navigator, 'webdriver', {
                get: () => undefined
            });
            
            // Mock chrome to help bypass checks
            window.chrome = { runtime: {} };
            
            // Add plugins to help look real
            Object.defineProperty(navigator, 'plugins', {
                get: () => [1, 2, 3, 4, 5]
            });
            
            // Help avoid fingerprinting
            const originalQuery = window.navigator.permissions.query;
            window.navigator.permissions.query = (parameters) => (
                parameters.name === 'notifications' ?
                Promise.resolve({ state: Notification.permission }) :
                originalQuery(parameters)
            );
        """)
        
        page = await context.new_page()
        
        results = []
        for url in urls:
            # Random delay to help scraper seem human
            await page.wait_for_timeout(random.randint(2000, 5000))
            
            await page.goto(url, wait_until='networkidle')
            
            # Find and extract review data using browser
            data = await page.evaluate("""
                () => {
                    // Find the script tag with review content
                    const scriptTag = document.querySelector('#__NEXT_DATA__');
                    return scriptTag ? JSON.parse(scriptTag.textContent) : null;
                }
            """)
            
            results.append(data)
        
        await browser.close()
        return results
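
Running the browser scraper follows the same pattern as before; the URLs below are just placeholders:

import asyncio

urls = [
    "https://www.trustpilot.com/review/example.com",
    "https://www.trustpilot.com/review/another-example.com",
]
pages_data = asyncio.run(scrape_trustpilot_with_browser(urls))
print(f"Scraped {len(pages_data)} pages")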

The Ultimate Code: Using Residential Proxies

Free proxies get blocked instantly when you scrape Trustpilot. Here's how to plug premium residential proxies into the stealth scraper:

# Code to use residential proxies for web scraping
residential_proxies = [
    "http://user:pass@residential-proxy1.com:8080",
    "http://user:pass@residential-proxy2.com:8080",
    "http://user:pass@residential-proxy3.com:8080",
    # Add more proxies to help rotate
]

# Initialize scraper with proxies to get Trustpilot data
scraper = StealthTrustpilotScraper(proxies=residential_proxies)

# Now your code can scrape without getting blocked
response = await scraper.get_trustpilot_content("https://www.trustpilot.com/review/example.com")

Step 5: Scale Your Web Scraper to Get Millions of Reviews

Time to go from toy scraper to production beast. Here's code to help you scrape Trustpilot at scale without melting your server.

Concurrent Processing Code to Get Review Data Fast

import asyncio
from asyncio import Queue
import aiofiles
import json
from datetime import datetime

class TrustpilotScrapingPipeline:
    def __init__(self, max_workers=10):
        self.queue = Queue(maxsize=100)  # Help prevent memory overflow
        self.max_workers = max_workers
        self.results_file = f'trustpilot_reviews_{datetime.now().strftime("%Y%m%d")}.jsonl'
        
    async def find_companies_to_scrape(self, companies):
        """Add company pages to processing queue"""
        for company in companies:
            await self.queue.put(company)
        
        # Signal workers to stop
        for _ in range(self.max_workers):
            await self.queue.put(None)
    
    async def scrape_worker(self, worker_id):
        """Worker to get and process Trustpilot review data"""
        scraper = StealthTrustpilotScraper()
        
        while True:
            company = await self.queue.get()
            if company is None:
                # Mark the shutdown sentinel as done so queue.join() can complete
                self.queue.task_done()
                break
                
            try:
                print(f"Worker {worker_id} scraping {company} reviews")
                # Get all review content for this company
                reviews = await get_all_trustpilot_reviews(company, max_pages=10)
                
                # Stream review data to file
                async with aiofiles.open(self.results_file, 'a') as f:
                    for review in reviews:
                        review_data = {
                            'company': company,
                            'review_id': review['id'],
                            'content': review['text'],
                            'rating': review['rating'],
                            'date': review['dates']['publishedDate'],
                            'scraped_date': datetime.now().isoformat()
                        }
                        await f.write(json.dumps(review_data) + '\n')
                        
            except Exception as e:
                print(f"Worker {worker_id} failed to scrape {company}: {e}")
            finally:
                self.queue.task_done()
    
    async def run(self, companies):
        """Run the web scraping pipeline to get all data"""
        # Start workers to scrape in parallel
        workers = [
            asyncio.create_task(self.scrape_worker(i)) 
            for i in range(self.max_workers)
        ]
        
        # Find and queue companies
        await self.find_companies_to_scrape(companies)
        
        # Wait for all scraping to complete
        await self.queue.join()
        await asyncio.gather(*workers)

# Code to use the pipeline
async def scrape_multiple_companies():
    companies = [
        'amazon.com',
        'ebay.com',
        'walmart.com',
        # Add more company domains to scrape
    ]
    
    pipeline = TrustpilotScrapingPipeline(max_workers=5)
    await pipeline.run(companies)

Database Storage Code for Review Analytics

import asyncpg
from datetime import datetime

async def store_trustpilot_data(reviews, company):
    """Store scraped Trustpilot review content in PostgreSQL"""
    conn = await asyncpg.connect('postgresql://user:password@localhost/trustpilot')
    
    # Create table to store review data
    await conn.execute('''
        CREATE TABLE IF NOT EXISTS trustpilot_reviews (
            id TEXT PRIMARY KEY,
            company_domain TEXT,
            rating INTEGER,
            title TEXT,
            content TEXT,
            author_name TEXT,
            verified BOOLEAN,
            review_date TIMESTAMP,
            experience_date DATE,
            scraped_date TIMESTAMP DEFAULT NOW(),
            page_number INTEGER
        )
    ''')
    
    # Prepare review data for bulk insert
    records = []
    for review in reviews:
        # Parse date from review content
        review_date = datetime.fromisoformat(
            review['dates']['publishedDate'].replace('Z', '+00:00')
        )
        experience_date = datetime.fromisoformat(
            review['dates']['experiencedDate'].replace('Z', '+00:00')
        ).date()
        
        records.append((
            review['id'],
            company,
            review['rating'],
            review.get('title', ''),
            review.get('text', ''),
            review['consumer']['displayName'],
            review['labels']['verification']['isVerified'],
            review_date,
            experience_date
        ))
    
    # Bulk insert to help speed up data storage
    await conn.copy_records_to_table(
        'trustpilot_reviews',
        records=records,
        columns=['id', 'company_domain', 'rating', 'title', 'content', 
                'author_name', 'verified', 'review_date', 'experience_date']
    )
    
    await conn.close()
    print(f"Stored {len(records)} reviews for {company}")

Monitoring Code to Help Track Scraper Performance

import time
from collections import deque
from datetime import datetime

class TrustpilotScraperMonitor:
    def __init__(self, window_size=100):
        self.success_rate = deque(maxlen=window_size)
        self.response_times = deque(maxlen=window_size)
        self.last_review_date = None
        
    async def track_scraping_request(self, func, *args, **kwargs):
        """Monitor code to help track web scraper performance"""
        start = time.time()
        success = False
        
        try:
            result = await func(*args, **kwargs)
            success = True
            
            # Track latest review date to find update frequency
            if isinstance(result, list) and result:
                dates = [r.get('dates', {}).get('publishedDate') for r in result]
                if dates:
                    self.last_review_date = max(dates)
            
            return result
        except Exception as e:
            print(f"Scraping request failed: {e}")
            raise
        finally:
            elapsed = time.time() - start
            self.success_rate.append(1 if success else 0)
            self.response_times.append(elapsed)
            
            # Alert if scraper performance drops
            if len(self.success_rate) == self.success_rate.maxlen:
                rate = sum(self.success_rate) / len(self.success_rate)
                avg_time = sum(self.response_times) / len(self.response_times)
                
                if rate < 0.8:  # Less than 80% success
                    print(f"⚠️ WARNING: Scraper success rate dropped to {rate:.1%}")
                    print(f"⚠️ Average response time: {avg_time:.2f}s")
                    print(f"⚠️ Consider adding more proxies or reducing request rate")
                
                if avg_time > 5.0:  # Responses taking too long
                    print(f"⚠️ SLOW: Web pages taking {avg_time:.2f}s to load")

Common mistakes that hurt your scraper:

  1. Over-scraping: Limit requests to help avoid bans (1-2 per second per IP)
  2. Ignoring robots.txt: Check what Trustpilot allows at trustpilot.com/robots.txt
  3. Not handling errors: Use exponential backoff when you get rate limited (see the sketch after this list)
  4. Memory leaks: Stream data instead of loading all review content at once
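
Here's what that exponential backoff looks like as a minimal retry wrapper around httpx; the retry count and delays are reasonable defaults rather than Trustpilot-specific values:

import asyncio
import random
import httpx

async def get_with_backoff(client, url, max_retries=5, **kwargs):
    """GET with exponential backoff on 429s and transient network errors"""
    for attempt in range(max_retries):
        try:
            response = await client.get(url, **kwargs)
            if response.status_code != 429:
                return response
        except httpx.TransportError:
            pass  # network hiccup - fall through and retry
        # Exponential backoff with jitter: roughly 1s, 2s, 4s, 8s, 16s
        await asyncio.sleep((2 ** attempt) + random.uniform(0, 1))
    raise RuntimeError(f"Giving up on {url} after {max_retries} attempts")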

Final Thoughts

You now have all the code needed to build a production-grade Trustpilot scraper. The key is respecting the web platform while efficiently getting review data. Start small, monitor your success rates, and scale gradually.

Marius Bernard

Marius Bernard is a Product Advisor, Technical SEO, & Brand Ambassador at Roundproxies. He was the lead author for the SEO chapter of the 2024 Web and a reviewer for the 2023 SEO chapter.