How to scrape Shopee: 4 methods for prices & reviews (2026)

You've built scrapers for dozens of eCommerce sites. Then you point your code at Shopee and watch it fail spectacularly.

Shopee isn't like other sites. It's one of the most heavily protected eCommerce platforms in Southeast Asia, serving nearly 300 million active users across multiple countries.

The platform employs login walls, aggressive fingerprinting, JavaScript-rendered content, and frequent DOM changes that break most standard scraping approaches.

In this guide, I'll show you how to scrape Shopee using custom Python solutions. No paid APIs. No subscription services. Just clean, working code you can run today.

What You'll Learn

Scraping Shopee requires understanding its defenses before writing any code. This guide covers:

  • Why standard scrapers fail against Shopee
  • Setting up stealth browser automation with Playwright
  • Handling authentication and session persistence
  • Extracting product data, prices, and reviews
  • Implementing proxy rotation to avoid IP bans
  • Storing scraped data in JSON and CSV formats

Let's start by understanding what makes Shopee different.

Why Is Shopee So Hard to Scrape?

Shopee uses multiple layers of protection that work together to detect and block automated access. Understanding these defenses is the first step to bypassing them.

JavaScript-Rendered Content

Shopee loads product data dynamically through JavaScript. Send a basic HTTP request and you'll get an empty shell with no useful data.

The requests library returns this:

<div id="main"></div>

All the actual product information loads after JavaScript executes in a browser environment.
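
For context, here's roughly what that failure looks like with the requests library; depending on Shopee's defenses, the response is either a near-empty shell or an outright block (the search keyword is just a placeholder):

import requests

# A plain HTTP request only gets the empty application shell
response = requests.get(
    'https://shopee.sg/search?keyword=earbuds',
    headers={'User-Agent': 'Mozilla/5.0'},
)
print(response.status_code)
# The HTML contains skeleton markup but none of the product data,
# which is injected later by JavaScript running in a real browser
print('shopee-search-item-result' in response.text)  # Expected to be False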

Mandatory Login Wall

Unlike Amazon or eBay, Shopee forces authentication for most useful data. Without logging in, you'll hit redirect loops and blocked pages.

The platform requires:

  • Email/password authentication
  • Phone number verification (OTP)
  • Region-specific phone numbers for new accounts

Aggressive Bot Detection

Shopee employs sophisticated fingerprinting that checks:

  • Browser automation flags (Selenium, Puppeteer detection)
  • Canvas and WebGL fingerprints
  • Mouse movement patterns
  • Request timing and frequency
  • IP reputation and geolocation consistency

Frequent DOM Changes

Shopee updates its CSS selectors and page structure regularly. A scraper working today might break tomorrow when class names change from .shopee-search-item to .search-item-result__item.

This requires building scrapers that adapt to structural changes.

Method Overview: 4 Approaches to Scrape Shopee

Before diving into code, here's how different methods compare:

Method                      Difficulty   Cost   Success Rate   Best For
Basic HTTP Requests         Easy         Free   Very Low       Won't work
Standard Playwright         Medium       Free   Low            Testing only
Stealth Playwright          Medium       Free   High           Most use cases
Playwright + Anti-Detect    Hard         Free   Very High      Scale scraping

Quick recommendation: Start with stealth Playwright. It handles 80% of Shopee scraping needs without additional complexity.

Prerequisites

Before writing any code, ensure you have:

  • Python 3.9 or higher
  • A Shopee account (create one with a local phone number)
  • Basic understanding of async Python
  • Chrome/Chromium browser installed

Install the required packages:

pip install playwright playwright-stealth aiofiles
playwright install chromium

The playwright-stealth package patches common automation detection points that Shopee checks.

Method 1: Stealth Playwright Setup

Standard Playwright gets detected immediately. Shopee checks for automation flags like navigator.webdriver being true.

Stealth Playwright patches these detection points automatically.

Basic Stealth Configuration

Create a file called shopee_scraper.py:

import asyncio
from playwright.async_api import async_playwright
from playwright_stealth import stealth_async

async def create_stealth_browser():
    """Initialize a stealth browser that bypasses basic detection."""
    playwright = await async_playwright().start()
    
    browser = await playwright.chromium.launch(
        headless=False,  # Run headed first to debug
        args=[
            '--disable-blink-features=AutomationControlled',
            '--disable-dev-shm-usage',
            '--no-sandbox',
            '--disable-setuid-sandbox',
        ]
    )
    
    context = await browser.new_context(
        viewport={'width': 1920, 'height': 1080},
        user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
        locale='en-SG',
        timezone_id='Asia/Singapore',
    )
    
    page = await context.new_page()
    await stealth_async(page)
    
    return playwright, browser, context, page

The --disable-blink-features=AutomationControlled flag removes the automation indicator that many sites check.

Setting locale and timezone to match your target Shopee region (Singapore in this example) helps avoid geographical mismatches.

Testing the Connection

Add a simple test to verify the setup works:

async def test_connection():
    """Verify we can reach Shopee without immediate blocking."""
    playwright, browser, context, page = await create_stealth_browser()
    
    try:
        await page.goto('https://shopee.sg', wait_until='networkidle')
        await page.wait_for_timeout(3000)
        
        title = await page.title()
        print(f"Page title: {title}")
        
        # Check if we're blocked or redirected
        current_url = page.url
        if 'blocked' in current_url.lower() or 'captcha' in current_url.lower():
            print("Detected blocking - stealth may need adjustment")
        else:
            print("Successfully loaded Shopee homepage")
            
    finally:
        await browser.close()
        await playwright.stop()

if __name__ == "__main__":
    asyncio.run(test_connection())

Run this before proceeding. If you see the homepage load successfully, your stealth configuration works.

Method 2: Handling Shopee Authentication

Scraping Shopee requires authenticated sessions for meaningful data. There are two approaches: manual login with cookie persistence, or full session persistence with browser profiles.

Approach A: Manual Login with Cookie Persistence

The safest method is logging in manually once, then reusing those cookies:

import json
import os

async def login_and_save_cookies(page, cookies_file='shopee_cookies.json'):
    """Navigate to login, wait for manual authentication, save cookies."""
    await page.goto('https://shopee.sg/buyer/login')
    
    print("Please log in manually in the browser window...")
    print("Press Enter after you've successfully logged in.")
    input()
    
    # Save cookies for future sessions
    cookies = await page.context.cookies()
    with open(cookies_file, 'w') as f:
        json.dump(cookies, f, indent=2)
    
    print(f"Cookies saved to {cookies_file}")
    return cookies

async def load_cookies(context, cookies_file='shopee_cookies.json'):
    """Load previously saved cookies into the browser context."""
    if not os.path.exists(cookies_file):
        return False
    
    with open(cookies_file, 'r') as f:
        cookies = json.load(f)
    
    await context.add_cookies(cookies)
    return True

This approach:

  • Requires only one manual login
  • Stores session data locally
  • Works for days before requiring re-authentication

Approach B: Session Persistence with Browser Profiles

For longer-lasting sessions, save the entire browser state:

async def create_persistent_context():
    """Create a browser context that persists across sessions."""
    playwright = await async_playwright().start()
    
    # User data directory stores cookies, localStorage, etc.
    user_data_dir = './shopee_profile'
    
    context = await playwright.chromium.launch_persistent_context(
        user_data_dir,
        headless=False,
        viewport={'width': 1920, 'height': 1080},
        user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
        locale='en-SG',
        timezone_id='Asia/Singapore',
        args=['--disable-blink-features=AutomationControlled'],
    )
    
    page = context.pages[0] if context.pages else await context.new_page()
    await stealth_async(page)
    
    return playwright, context, page

Browser profiles maintain all session data automatically. After the first login, subsequent script runs stay authenticated.
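
A first run with this setup looks like the sketch below: log in manually once in the visible window, and later runs reuse the stored profile.

async def run_with_profile():
    """Open the persistent profile; only the first run needs a manual login."""
    playwright, context, page = await create_persistent_context()
    
    try:
        await page.goto('https://shopee.sg', wait_until='networkidle')
        # On the first run, log in manually in the window, then press Enter;
        # the profile directory keeps the session for subsequent runs
        input("Log in if prompted, then press Enter to continue...")
        print(await page.title())
    finally:
        await context.close()
        await playwright.stop()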

Method 3: Extracting Product Data

With authentication handled, let's extract actual product information.

Scraping Search Results

This function searches for products and extracts basic information:

async def scrape_search_results(page, keyword, max_pages=3):
    """Search Shopee and extract product listings."""
    products = []
    
    search_url = f'https://shopee.sg/search?keyword={keyword}'
    await page.goto(search_url, wait_until='networkidle')
    
    for page_num in range(max_pages):
        print(f"Scraping page {page_num + 1}...")
        
        # Wait for product cards to load
        await page.wait_for_selector('.shopee-search-item-result__item', timeout=10000)
        
        # Scroll to trigger lazy loading
        await scroll_page(page)
        
        # Extract product data
        items = await page.query_selector_all('.shopee-search-item-result__item')
        
        for item in items:
            product = await extract_product_card(item)
            if product:
                products.append(product)
        
        # Navigate to next page
        next_button = await page.query_selector('[class*="next-page"]')
        if next_button:
            await next_button.click()
            await page.wait_for_timeout(2000)
        else:
            break
    
    return products

async def scroll_page(page):
    """Scroll down to trigger lazy loading of images and data."""
    await page.evaluate('''
        async () => {
            await new Promise(resolve => {
                let totalHeight = 0;
                const distance = 300;
                const timer = setInterval(() => {
                    window.scrollBy(0, distance);
                    totalHeight += distance;
                    if (totalHeight >= document.body.scrollHeight) {
                        clearInterval(timer);
                        resolve();
                    }
                }, 100);
            });
        }
    ''')

The scroll function is essential. Shopee lazy-loads product images and some data fields.

Extracting Product Card Information

Parse individual product cards:

async def extract_product_card(item):
    """Extract data from a single product card element."""
    try:
        # Product name
        name_el = await item.query_selector('[data-sqe="name"]')
        name = await name_el.inner_text() if name_el else None
        
        # Price - handle both discounted and regular prices
        price_el = await item.query_selector('.price')
        price = await price_el.inner_text() if price_el else None
        
        # Clean price string
        if price:
            price = price.replace('$', '').replace(',', '').strip()
        
        # Product link
        link_el = await item.query_selector('a')
        link = await link_el.get_attribute('href') if link_el else None
        
        if link and not link.startswith('http'):
            link = f'https://shopee.sg{link}'
        
        # Sold count
        sold_el = await item.query_selector('.sold')
        sold = await sold_el.inner_text() if sold_el else '0'
        
        # Rating
        rating_el = await item.query_selector('.rating')
        rating = await rating_el.inner_text() if rating_el else None
        
        return {
            'name': name,
            'price': price,
            'link': link,
            'sold': sold,
            'rating': rating,
        }
        
    except Exception as e:
        print(f"Error extracting product: {e}")
        return None

Important: Shopee changes selector names frequently. If scraping fails, inspect the page structure and update selectors accordingly.
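
One way to soften that maintenance burden is to try several candidate selectors in order and take the first match. The helper below is a sketch; the fallback selector names are examples and need to be kept in sync with the live site:

async def query_first_match(node, selectors):
    """Return the first element matching any candidate selector (works on a Page or an ElementHandle)."""
    for selector in selectors:
        match = await node.query_selector(selector)
        if match:
            return match
    return None

# Illustrative fallback chain covering an old and a new class name
ITEM_SELECTORS = ['.shopee-search-item-result__item', '.search-item-result__item']
item = await query_first_match(page, ITEM_SELECTORS)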

Scraping Individual Product Pages

For detailed product information, navigate to individual pages:

async def scrape_product_details(page, product_url):
    """Extract detailed information from a product page."""
    await page.goto(product_url, wait_until='networkidle')
    await page.wait_for_timeout(2000)
    
    # Scroll to load all content
    await scroll_page(page)
    
    details = {}
    
    try:
        # Product title
        title_el = await page.query_selector('.product-title')
        details['title'] = await title_el.inner_text() if title_el else None
        
        # Current price
        price_el = await page.query_selector('.price-current')
        details['price'] = await price_el.inner_text() if price_el else None
        
        # Original price (if discounted)
        original_el = await page.query_selector('.price-original')
        details['original_price'] = await original_el.inner_text() if original_el else None
        
        # Stock quantity
        stock_el = await page.query_selector('.product-stock')
        details['stock'] = await stock_el.inner_text() if stock_el else None
        
        # Description
        desc_el = await page.query_selector('.product-description')
        details['description'] = await desc_el.inner_text() if desc_el else None
        
        # Seller information
        seller_el = await page.query_selector('.seller-name')
        details['seller'] = await seller_el.inner_text() if seller_el else None
        
        # Ratings summary
        rating_el = await page.query_selector('.product-rating')
        details['rating'] = await rating_el.inner_text() if rating_el else None
        
        # Number of reviews
        review_count_el = await page.query_selector('.review-count')
        details['review_count'] = await review_count_el.inner_text() if review_count_el else None
        
    except Exception as e:
        print(f"Error scraping product details: {e}")
    
    return details

Method 4: Proxy Rotation for Scale

Scraping from a single IP triggers rate limits quickly. Rotating residential proxies distributes requests across many addresses.

Implementing Proxy Rotation

import random

class ProxyRotator:
    """Manage a pool of proxies for request distribution."""
    
    def __init__(self, proxy_list):
        self.proxies = proxy_list
        self.current_index = 0
    
    def get_next_proxy(self):
        """Return the next proxy in rotation."""
        proxy = self.proxies[self.current_index]
        self.current_index = (self.current_index + 1) % len(self.proxies)
        return proxy
    
    def get_random_proxy(self):
        """Return a random proxy from the pool."""
        return random.choice(self.proxies)

async def create_browser_with_proxy(proxy_url):
    """Launch a browser configured to use a specific proxy."""
    playwright = await async_playwright().start()
    
    browser = await playwright.chromium.launch(
        headless=False,
        proxy={'server': proxy_url},
        args=['--disable-blink-features=AutomationControlled'],
    )
    
    context = await browser.new_context(
        viewport={'width': 1920, 'height': 1080},
        user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    )
    
    page = await context.new_page()
    await stealth_async(page)
    
    return playwright, browser, context, page

Using the Proxy Rotator

# Example proxy list - replace with your actual proxies
proxies = [
    'http://user:pass@proxy1.example.com:8080',
    'http://user:pass@proxy2.example.com:8080',
    'http://user:pass@proxy3.example.com:8080',
]

rotator = ProxyRotator(proxies)

async def scrape_with_rotation(keywords, products_per_keyword=50):
    """Scrape multiple keywords, rotating proxies between requests."""
    all_products = []
    
    for keyword in keywords:
        proxy = rotator.get_next_proxy()
        print(f"Scraping '{keyword}' with proxy: {proxy}")
        
        playwright, browser, context, page = await create_browser_with_proxy(proxy)
        
        try:
            products = await scrape_search_results(page, keyword)
            all_products.extend(products)
        finally:
            await browser.close()
            await playwright.stop()
        
        # Delay between keywords
        await asyncio.sleep(random.uniform(5, 10))
    
    return all_products

To work within Shopee's geo-restrictions, use residential proxies with IPs from the Southeast Asian countries where Shopee operates.
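
If you target more than one Shopee region, it also helps to keep separate pools per country and pick the pool that matches the domain you're scraping. The provider URLs below are placeholders:

# Hypothetical per-region pools - substitute your provider's endpoints
REGIONAL_PROXIES = {
    'shopee.sg': ['http://user:pass@sg-proxy1.example.com:8080'],
    'shopee.com.my': ['http://user:pass@my-proxy1.example.com:8080'],
    'shopee.co.th': ['http://user:pass@th-proxy1.example.com:8080'],
}

def rotator_for_domain(domain):
    """Build a ProxyRotator limited to proxies in the target country."""
    return ProxyRotator(REGIONAL_PROXIES[domain])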

Rate Limiting and Request Delays

Aggressive scraping gets you blocked fast. Implement intelligent delays:

import random
import time

class RateLimiter:
    """Control request frequency to avoid triggering rate limits."""
    
    def __init__(self, requests_per_minute=30):
        self.min_delay = 60.0 / requests_per_minute
        self.last_request = 0
    
    async def wait(self):
        """Wait appropriate time before next request."""
        elapsed = time.time() - self.last_request
        if elapsed < self.min_delay:
            delay = self.min_delay - elapsed
            # Add random jitter
            delay += random.uniform(0.5, 2.0)
            await asyncio.sleep(delay)
        self.last_request = time.time()

# Usage
rate_limiter = RateLimiter(requests_per_minute=20)

async def scrape_with_rate_limit(page, urls):
    """Scrape URLs while respecting rate limits."""
    results = []
    
    for url in urls:
        await rate_limiter.wait()
        
        try:
            data = await scrape_product_details(page, url)
            results.append(data)
        except Exception as e:
            print(f"Error scraping {url}: {e}")
            continue
    
    return results

Keep requests under 30 per minute per IP. Lower is safer for long-running scrapes.

Saving Scraped Data

Export your data to usable formats:

JSON Export

import json
from datetime import datetime

def save_to_json(products, filename=None):
    """Save product list to JSON file."""
    if filename is None:
        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
        filename = f'shopee_products_{timestamp}.json'
    
    with open(filename, 'w', encoding='utf-8') as f:
        json.dump(products, f, ensure_ascii=False, indent=2)
    
    print(f"Saved {len(products)} products to {filename}")
    return filename

CSV Export

import csv

def save_to_csv(products, filename=None):
    """Save product list to CSV file."""
    if not products:
        print("No products to save")
        return None
    
    if filename is None:
        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
        filename = f'shopee_products_{timestamp}.csv'
    
    # Get all unique keys from products
    fieldnames = set()
    for product in products:
        fieldnames.update(product.keys())
    fieldnames = sorted(list(fieldnames))
    
    with open(filename, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(products)
    
    print(f"Saved {len(products)} products to {filename}")
    return filename
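
A typical end-of-run export then looks like this, combining the search function from Method 3 with both writers:

async def scrape_and_export(page, keyword):
    """Scrape one keyword and write both JSON and CSV snapshots."""
    products = await scrape_search_results(page, keyword)
    save_to_json(products)
    save_to_csv(products)
    return products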

Complete Working Example

Here's the full scraper combining everything:

import asyncio
import json
import random
from datetime import datetime
from playwright.async_api import async_playwright
from playwright_stealth import stealth_async

class ShopeeScraper:
    """Complete Shopee scraper with stealth and rate limiting."""
    
    def __init__(self, cookies_file='shopee_cookies.json'):
        self.cookies_file = cookies_file
        self.playwright = None
        self.browser = None
        self.context = None
        self.page = None
    
    async def start(self):
        """Initialize the browser with stealth configuration."""
        self.playwright = await async_playwright().start()
        
        self.browser = await self.playwright.chromium.launch(
            headless=False,
            args=['--disable-blink-features=AutomationControlled'],
        )
        
        self.context = await self.browser.new_context(
            viewport={'width': 1920, 'height': 1080},
            user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
            locale='en-SG',
            timezone_id='Asia/Singapore',
        )
        
        self.page = await self.context.new_page()
        await stealth_async(self.page)
        
        # Load existing cookies if available
        await self._load_cookies()
    
    async def stop(self):
        """Clean up browser resources."""
        if self.browser:
            await self.browser.close()
        if self.playwright:
            await self.playwright.stop()
    
    async def _load_cookies(self):
        """Load saved cookies into browser context."""
        try:
            with open(self.cookies_file, 'r') as f:
                cookies = json.load(f)
            await self.context.add_cookies(cookies)
            print("Loaded existing session cookies")
        except FileNotFoundError:
            print("No saved cookies found - manual login required")
    
    async def _save_cookies(self):
        """Save current cookies to file."""
        cookies = await self.context.cookies()
        with open(self.cookies_file, 'w') as f:
            json.dump(cookies, f, indent=2)
    
    async def search_products(self, keyword, max_results=50):
        """Search and extract product listings."""
        products = []
        search_url = f'https://shopee.sg/search?keyword={keyword}'
        
        await self.page.goto(search_url, wait_until='networkidle')
        await self._scroll_page()
        
        items = await self.page.query_selector_all('[data-sqe="item"]')
        
        for item in items[:max_results]:
            await asyncio.sleep(random.uniform(0.1, 0.3))
            
            product = await self._extract_product_card(item)
            if product:
                products.append(product)
        
        return products
    
    async def _extract_product_card(self, item):
        """Extract data from a product card element."""
        try:
            name_el = await item.query_selector('[data-sqe="name"]')
            price_el = await item.query_selector('.price')
            link_el = await item.query_selector('a')
            
            return {
                'name': await name_el.inner_text() if name_el else None,
                'price': await price_el.inner_text() if price_el else None,
                'url': await link_el.get_attribute('href') if link_el else None,
                'scraped_at': datetime.now().isoformat(),
            }
        except Exception:
            return None
    
    async def _scroll_page(self):
        """Scroll to load lazy content."""
        await self.page.evaluate('''
            () => new Promise(resolve => {
                let total = 0;
                const timer = setInterval(() => {
                    window.scrollBy(0, 300);
                    total += 300;
                    if (total >= document.body.scrollHeight) {
                        clearInterval(timer);
                        resolve();
                    }
                }, 100);
            })
        ''')

async def main():
    """Main execution function."""
    scraper = ShopeeScraper()
    
    try:
        await scraper.start()
        
        # Search for products
        products = await scraper.search_products('wireless earbuds', max_results=20)
        
        # Save results
        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
        filename = f'shopee_results_{timestamp}.json'
        
        with open(filename, 'w', encoding='utf-8') as f:
            json.dump(products, f, ensure_ascii=False, indent=2)
        
        print(f"Scraped {len(products)} products")
        print(f"Results saved to {filename}")
        
    finally:
        await scraper.stop()

if __name__ == '__main__':
    asyncio.run(main())

Run this script with python shopee_scraper.py after configuring your cookies.

Extracting Product Reviews

Reviews provide valuable market intelligence. They load dynamically as you scroll down the product page, so the function below handles the scrolling and pagination:

async def scrape_product_reviews(page, product_url, max_reviews=50):
    """Extract reviews from a product page."""
    await page.goto(product_url, wait_until='networkidle')
    
    reviews = []
    
    # Scroll down to reviews section
    review_section = await page.query_selector('.product-ratings')
    if review_section:
        await review_section.scroll_into_view_if_needed()
        await page.wait_for_timeout(2000)
    
    # Click "All Reviews" tab if available
    all_reviews_tab = await page.query_selector('[data-filter="0"]')
    if all_reviews_tab:
        await all_reviews_tab.click()
        await page.wait_for_timeout(1500)
    
    while len(reviews) < max_reviews:
        # Extract visible reviews
        review_items = await page.query_selector_all('.shopee-product-rating')
        
        for item in review_items:
            review = await extract_single_review(item)
            if review and review not in reviews:
                reviews.append(review)
        
        # Check for "Next" button in pagination
        next_btn = await page.query_selector('.shopee-icon-button--right')
        if next_btn:
            is_disabled = await next_btn.get_attribute('disabled')
            if not is_disabled:
                await next_btn.click()
                await page.wait_for_timeout(2000)
            else:
                break
        else:
            break
    
    return reviews[:max_reviews]

async def extract_single_review(item):
    """Extract data from a single review element."""
    try:
        # Reviewer name
        author_el = await item.query_selector('.shopee-product-rating__author-name')
        author = await author_el.inner_text() if author_el else 'Anonymous'
        
        # Rating (count the filled stars)
        stars = await item.query_selector_all('.icon-rating-solid')
        rating = len(stars) if stars else None
        
        # Review text
        content_el = await item.query_selector('.shopee-product-rating__content')
        content = await content_el.inner_text() if content_el else ''
        
        # Review date
        date_el = await item.query_selector('.shopee-product-rating__time')
        date = await date_el.inner_text() if date_el else None
        
        # Product variation purchased
        variation_el = await item.query_selector('.shopee-product-rating__variation')
        variation = await variation_el.inner_text() if variation_el else None
        
        return {
            'author': author.strip(),
            'rating': rating,
            'content': content.strip(),
            'date': date,
            'variation': variation,
        }
        
    except Exception as e:
        print(f"Error extracting review: {e}")
        return None

Filtering Reviews by Rating

Shopee allows filtering reviews by star rating:

async def scrape_filtered_reviews(page, product_url, star_filter=None):
    """Scrape reviews filtered by star rating."""
    await page.goto(product_url, wait_until='networkidle')
    
    # Navigate to reviews section
    review_section = await page.query_selector('.product-ratings')
    if review_section:
        await review_section.scroll_into_view_if_needed()
        await page.wait_for_timeout(2000)
    
    # Apply star filter if specified (1-5)
    if star_filter and 1 <= star_filter <= 5:
        filter_btn = await page.query_selector(f'[data-filter="{star_filter}"]')
        if filter_btn:
            await filter_btn.click()
            await page.wait_for_timeout(1500)
    
    # Extract the visible (filtered) reviews directly; calling
    # scrape_product_reviews here would re-navigate and reset the filter
    reviews = []
    review_items = await page.query_selector_all('.shopee-product-rating')
    for item in review_items:
        review = await extract_single_review(item)
        if review:
            reviews.append(review)
    
    return reviews

This helps analyze negative reviews specifically or focus on highly positive feedback.
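
For example, pulling only one-star reviews to study common complaints (product_url is a placeholder):

negative_reviews = await scrape_filtered_reviews(page, product_url, star_filter=1)
for review in negative_reviews[:5]:
    print(review['rating'], review['content'][:80])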

Scaling Your Shopee Scraper

When scraping thousands of products, single-threaded execution becomes too slow. Here's how to scale efficiently.

Concurrent Scraping with asyncio

Run multiple browser contexts simultaneously:

import asyncio
from asyncio import Semaphore

class ScalableShopeeScraper:
    """Handle concurrent scraping with controlled parallelism."""
    
    def __init__(self, max_concurrent=5):
        self.semaphore = Semaphore(max_concurrent)
        self.results = []
    
    async def scrape_url(self, url, playwright):
        """Scrape a single URL with semaphore control."""
        async with self.semaphore:
            browser = await playwright.chromium.launch(
                headless=True,
                args=['--disable-blink-features=AutomationControlled'],
            )
            
            try:
                context = await browser.new_context(
                    user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
                )
                page = await context.new_page()
                await stealth_async(page)
                
                # Add random delay
                await asyncio.sleep(random.uniform(1, 3))
                
                data = await scrape_product_details(page, url)
                return data
                
            finally:
                await browser.close()
    
    async def scrape_many(self, urls):
        """Scrape multiple URLs concurrently."""
        playwright = await async_playwright().start()
        
        try:
            tasks = [self.scrape_url(url, playwright) for url in urls]
            results = await asyncio.gather(*tasks, return_exceptions=True)
            
            # Filter out exceptions
            valid_results = [r for r in results if not isinstance(r, Exception)]
            return valid_results
            
        finally:
            await playwright.stop()

Using Semaphore(5) limits concurrent browsers to 5, preventing memory issues while still gaining significant speed improvements.
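
Usage is a thin wrapper around scrape_many:

async def run_concurrent(urls):
    """Scrape a batch of product URLs with up to five parallel browsers."""
    scraper = ScalableShopeeScraper(max_concurrent=5)
    results = await scraper.scrape_many(urls)
    print(f"Scraped {len(results)} of {len(urls)} URLs successfully")
    return results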

Distributed Scraping Across Machines

For enterprise-scale scraping, distribute work across multiple machines:

import json
import redis

class DistributedScraper:
    """Coordinate scraping across multiple workers."""
    
    def __init__(self, redis_host='localhost'):
        self.redis = redis.Redis(host=redis_host)
        self.queue_name = 'shopee_urls'
        self.results_name = 'shopee_results'
    
    def add_urls_to_queue(self, urls):
        """Add URLs to the distributed queue."""
        for url in urls:
            self.redis.rpush(self.queue_name, url)
        print(f"Added {len(urls)} URLs to queue")
    
    async def worker_loop(self, worker_id):
        """Main loop for a worker process."""
        print(f"Worker {worker_id} starting...")
        
        scraper = ShopeeScraper()
        await scraper.start()
        
        try:
            while True:
                # Get URL from queue
                url = self.redis.lpop(self.queue_name)
                if not url:
                    await asyncio.sleep(5)
                    continue
                
                url = url.decode('utf-8')
                print(f"Worker {worker_id} processing: {url}")
                
                try:
                    data = await scrape_product_details(scraper.page, url)
                    self.redis.rpush(self.results_name, json.dumps(data))
                except Exception as e:
                    print(f"Worker {worker_id} error: {e}")
                    # Re-queue failed URL
                    self.redis.rpush(self.queue_name, url)
                
                await asyncio.sleep(random.uniform(2, 5))
                
        finally:
            await scraper.stop()

Run multiple workers on different machines, each pulling URLs from the shared Redis queue.
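
A worker entry point can be as small as this sketch (the Redis host is an assumption; point it at your shared instance):

# worker.py - run one copy per machine
# Assumes DistributedScraper from above is defined or importable here
import asyncio
import sys

if __name__ == '__main__':
    worker_id = sys.argv[1] if len(sys.argv) > 1 else 'worker-1'
    coordinator = DistributedScraper(redis_host='redis.internal.example.com')
    asyncio.run(coordinator.worker_loop(worker_id))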

Memory Management for Long Runs

Browser automation consumes significant memory. Restart browsers periodically:

class MemoryEfficientScraper:
    """Scraper that manages memory by recycling browsers."""
    
    def __init__(self, restart_after=100):
        self.request_count = 0
        self.restart_threshold = restart_after
        self.scraper = None
    
    async def ensure_browser(self):
        """Create or restart browser as needed."""
        if self.scraper is None or self.request_count >= self.restart_threshold:
            if self.scraper:
                await self.scraper.stop()
            
            self.scraper = ShopeeScraper()
            await self.scraper.start()
            self.request_count = 0
            print("Browser restarted for memory management")
        
        return self.scraper
    
    async def scrape(self, url):
        """Scrape with automatic browser recycling."""
        scraper = await self.ensure_browser()
        self.request_count += 1
        return await scrape_product_details(scraper.page, url)

Restarting every 100 requests prevents memory leaks from accumulating.

Troubleshooting Common Issues

Timeouts and Slow Pages

Shopee pages can load slowly. Increase timeout values:

await page.goto(url, timeout=60000, wait_until='networkidle')

CAPTCHA Challenges

If you encounter CAPTCHAs frequently:

  1. Reduce request frequency
  2. Use residential proxies instead of datacenter IPs
  3. Ensure browser fingerprint consistency
  4. Rotate user agents between sessions (a sketch follows this list)
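
A minimal rotation sketch, assuming you keep a small pool of realistic desktop user agents:

import random

# Illustrative strings - keep them aligned with current browser releases
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
]

async def new_context_with_random_ua(browser):
    """Create a fresh context whose user agent changes per session."""
    return await browser.new_context(
        user_agent=random.choice(USER_AGENTS),
        viewport={'width': 1920, 'height': 1080},
    )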

Empty Product Data

Selectors change often. When scraping returns empty values:

  1. Open the browser in headed mode
  2. Inspect actual element classes
  3. Update selectors in your code

Login Session Expiration

Sessions expire after several hours. Detect expiration so you can trigger re-authentication:

async def check_login_status(page):
    """Verify if still logged in."""
    await page.goto('https://shopee.sg/user/account')
    return 'login' not in page.url.lower()
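
A sketch of how that check can drive re-authentication, reusing login_and_save_cookies from Method 2:

async def ensure_logged_in(page):
    """Re-run the manual login flow when the session has expired."""
    if not await check_login_status(page):
        print("Session expired - logging in again")
        await login_and_save_cookies(page)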

"Access Denied" or 403 Errors

These indicate detection. Try these fixes in order:

  1. Verify stealth patches are applied:

# Test if the webdriver flag is hidden
result = await page.evaluate('navigator.webdriver')
print(f"Webdriver detected: {result}")  # Should be None or False

  2. Check timezone and locale match:

# Ensure these match your proxy location
context = await browser.new_context(
    locale='en-SG',
    timezone_id='Asia/Singapore',
    geolocation={'latitude': 1.3521, 'longitude': 103.8198},
)

  3. Rotate to a fresh IP address:

# Close the current browser and start with a new proxy
await browser.close()
new_proxy = proxy_rotator.get_random_proxy()
# Create a new browser with the fresh proxy

Infinite Loading or Stuck Pages

Shopee sometimes hangs during heavy JavaScript execution:

async def safe_navigate(page, url, max_retries=3):
    """Navigate with retry logic for stuck pages."""
    for attempt in range(max_retries):
        try:
            await page.goto(url, timeout=30000, wait_until='domcontentloaded')
            
            # Wait for key element instead of full load
            await page.wait_for_selector('.main-content', timeout=15000)
            return True
            
        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            if attempt < max_retries - 1:
                await page.reload()
                await asyncio.sleep(2)
    
    return False

Using domcontentloaded instead of networkidle prevents waiting for slow-loading trackers.

Data Returns Empty Despite Page Loading

This usually means selectors changed. Debug with:

async def debug_selectors(page):
    """Print page structure to find correct selectors."""
    # Get all class names on page
    classes = await page.evaluate('''
        () => {
            const elements = document.querySelectorAll('*');
            const classes = new Set();
            elements.forEach(el => {
                el.classList.forEach(c => classes.add(c));
            });
            return Array.from(classes).sort();
        }
    ''')
    
    print("Classes found on page:")
    for cls in classes[:50]:  # First 50
        print(f"  .{cls}")

Run this when scraping fails to discover current class names.

Best Practices for Shopee Scraping

Keep request rates low. Under 30 requests per minute per IP prevents most rate limiting. For safer operation, 15-20 requests per minute provides more headroom.

Use regional proxies. IPs from Singapore, Malaysia, Thailand, or other Southeast Asian countries avoid geo-blocking. Datacenter IPs get flagged quickly—residential proxies work much better for Shopee.

Persist sessions. Reuse browser profiles and cookies instead of logging in repeatedly. Each new login increases account risk flags.

Monitor for changes. Shopee updates its site frequently. Set up weekly tests that verify your selectors still work. A simple test that checks if key elements exist catches most breakages.
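
A minimal health check along those lines might look like this; the selectors listed are whatever your scraper currently depends on:

# Update this list alongside your scraper's selectors
CRITICAL_SELECTORS = ['[data-sqe="item"]', '[data-sqe="name"]', '.price']

async def check_selectors(page, url='https://shopee.sg/search?keyword=test'):
    """Warn when any critical selector no longer matches anything."""
    await page.goto(url, wait_until='networkidle')
    broken = []
    for selector in CRITICAL_SELECTORS:
        if await page.query_selector(selector) is None:
            broken.append(selector)
    if broken:
        print(f"Selector check failed for: {broken}")
    return not broken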

Implement exponential backoff. When you hit errors, increase wait times progressively:

async def scrape_with_backoff(page, url, max_retries=5):
    """Scrape with exponential backoff on failures."""
    base_delay = 2
    
    for attempt in range(max_retries):
        try:
            return await scrape_product_details(page, url)
        except Exception as e:
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            print(f"Attempt {attempt + 1} failed, waiting {delay:.1f}s")
            await asyncio.sleep(delay)
    
    raise Exception(f"Failed after {max_retries} attempts")

Respect the platform. Only collect publicly available data. Avoid personal information and honor robots.txt restrictions. Excessive scraping can lead to legal issues and harms the platform for other users.

Log everything. Track request counts, success rates, and error types:

import logging
from datetime import datetime

logging.basicConfig(
    filename=f'scraper_{datetime.now():%Y%m%d}.log',
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)

async def logged_scrape(page, url):
    """Scrape with comprehensive logging."""
    start = datetime.now()
    
    try:
        result = await scrape_product_details(page, url)
        duration = (datetime.now() - start).total_seconds()
        logging.info(f"SUCCESS: {url} ({duration:.2f}s)")
        return result
    except Exception as e:
        logging.error(f"FAILED: {url} - {str(e)}")
        raise

Logs help diagnose issues and track scraper health over time.

Handle anti-fingerprinting properly. Beyond basic stealth patches, consider canvas fingerprint randomization:

await page.add_init_script('''
    // Randomize canvas fingerprint
    const originalToDataURL = HTMLCanvasElement.prototype.toDataURL;
    HTMLCanvasElement.prototype.toDataURL = function(type) {
        if (type === 'image/png' && this.width === 220 && this.height === 30) {
            const context = this.getContext('2d');
            const imageData = context.getImageData(0, 0, this.width, this.height);
            for (let i = 0; i < imageData.data.length; i += 4) {
                imageData.data[i] += Math.floor(Math.random() * 10) - 5;
            }
            context.putImageData(imageData, 0, 0);
        }
        return originalToDataURL.apply(this, arguments);
    };
''')

This adds noise to canvas fingerprints that anti-bot systems use for tracking.

Conclusion

Scraping Shopee requires more sophistication than typical eCommerce sites, but it's absolutely achievable with the right approach.

The combination of stealth Playwright, proper session management, and residential proxy rotation handles most scenarios effectively.

Start with the basic stealth configuration and add complexity only as needed. Most use cases work fine with cookies-based authentication and moderate rate limiting.

Your next steps:

  1. Set up Playwright with stealth patches
  2. Log in manually and save cookies
  3. Test with small searches before scaling
  4. Add proxy rotation when you need volume

The complete code examples in this guide give you everything needed to start scraping Shopee today.