How to Scrape Gymshark: Complete Python Guide (2026)

Gymshark runs on Shopify, which exposes product data through JSON endpoints and predictable URL structures. This guide covers five different methods to extract their entire product catalog.

The fastest way to scrape Gymshark is using Shopify's /products.json endpoint combined with TLS fingerprint spoofing via curl_cffi. This approach runs 10x faster than browser automation while bypassing Cloudflare's bot detection. You'll extract 2,000+ products in under 30 seconds using the methods below.

Why Scrape Gymshark?

Gymshark maintains over 2,000 fitness products across three regional stores. The UK brand updates inventory multiple times daily during sales events.

Price monitoring captures flash sales before they sell out. Competitor analysis reveals pricing strategies and new product launches.

Stock tracking shows which items sell fastest. This data feeds into market research and inventory planning systems.

Gymshark's Technical Architecture

Gymshark operates three separate Shopify stores with identical backend structures:

Region          Domain             Currency
US              gymshark.com       USD
UK              uk.gymshark.com    GBP
Rest of World   row.gymshark.com   Regional

Each store maintains its own product catalog, pricing, and inventory levels. The Shopify backend stays consistent across all three.

Your scraper code works on any Gymshark site. Just swap the domain.

Standard HTTP libraries like requests get blocked by Cloudflare's TLS fingerprinting. The curl_cffi library solves this by mimicking real browser TLS signatures.

Install Dependencies

pip install curl_cffi xmltodict pandas

This installs the TLS-spoofing HTTP client, XML parser, and data handling library.

Extract Product URLs from Sitemap

Gymshark publishes a product sitemap at /sitemap_products_1.xml. This contains every product URL on the site.

from curl_cffi import requests
import xmltodict

SITEMAP_URL = 'https://www.gymshark.com/sitemap_products_1.xml'

response = requests.get(
    SITEMAP_URL, 
    impersonate="chrome"
)

sitemap_data = xmltodict.parse(response.text)

The impersonate="chrome" parameter makes the request appear to come from a real Chrome browser. Cloudflare sees a valid TLS fingerprint and allows the request through.

# Extract all product URLs from sitemap
product_urls = []
for item in sitemap_data['urlset']['url']:
    url = item['loc']
    if '/products/' in url:
        product_urls.append(url)

print(f"Found {len(product_urls)} products")

This typically returns 2,200+ product URLs. The sitemap updates daily as new products launch.
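
Each URL ends with the product handle, which the per-product JSON endpoint in Method 3 expects. A minimal sketch of pulling the handles out, assuming the /products/<handle> URL format holds:

# Derive product handles from sitemap URLs (assumes /products/<handle> paths)
handles = [
    url.split('/products/')[-1].split('?')[0].strip('/')
    for url in product_urls
]
print(handles[:5])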

Method 1: Fetch Products via JSON API

Shopify's /products.json endpoint returns structured data without HTML parsing. Each request fetches up to 250 products.

from curl_cffi import requests
import time

def fetch_products_page(domain, page=1):
    """Fetch a single page of products from Shopify JSON API."""
    url = f"https://{domain}/products.json?limit=250&page={page}"
    
    response = requests.get(url, impersonate="chrome")
    
    if response.status_code != 200:
        return []
    
    data = response.json()
    return data.get('products', [])

The function returns an empty list when pagination ends: Shopify responds with an empty products array once you request past the last page.

def scrape_all_products(domain):
    """Scrape complete product catalog from a Gymshark store."""
    all_products = []
    page = 1
    
    while True:
        products = fetch_products_page(domain, page)
        
        if not products:
            break
        
        all_products.extend(products)
        print(f"Page {page}: {len(products)} products")
        
        page += 1
        time.sleep(0.3)  # Rate limiting
    
    return all_products

The 0.3-second delay between requests keeps you clear of Shopify's abuse detection. Aggressive scraping leads to temporary IP blocks.

# Scrape US store
products = scrape_all_products('www.gymshark.com')
print(f"Total products: {len(products)}")

Expect around 2,200 products in 8-10 pages. The entire catalog downloads in under 30 seconds.

Parse Product Data

Each product object contains nested variant information. Extract the fields you need into a flat structure.

def parse_product(product):
    """Extract relevant fields from product JSON."""
    parsed = {
        'id': product['id'],
        'title': product['title'],
        'vendor': product['vendor'],
        'product_type': product['product_type'],
        'handle': product['handle'],
        'created_at': product['created_at'],
        'updated_at': product['updated_at'],
        'tags': ', '.join(product.get('tags', [])),
    }
    
    # Get primary variant pricing
    if product.get('variants'):
        variant = product['variants'][0]
        parsed['price'] = variant.get('price')
        parsed['compare_at_price'] = variant.get('compare_at_price')
        parsed['sku'] = variant.get('sku')
        parsed['available'] = variant.get('available')
    
    return parsed

The compare_at_price field shows the original price when items are on sale. This helps identify discounted products.

import pandas as pd

# Parse all products
parsed_products = [parse_product(p) for p in products]

# Create DataFrame
df = pd.DataFrame(parsed_products)
df.to_csv('gymshark_products.csv', index=False)

print(df.head())
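
To pull out the discounted items that compare_at_price flags, filter the DataFrame. A minimal sketch — prices arrive as strings, so convert them first:

# Items on sale carry a compare_at_price above the current price
df['price'] = pd.to_numeric(df['price'], errors='coerce')
df['compare_at_price'] = pd.to_numeric(df['compare_at_price'], errors='coerce')

on_sale = df[df['compare_at_price'] > df['price']]
print(f"{len(on_sale)} products currently discounted")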

Method 2: Async Scraping for Speed

Sequential requests waste time waiting for responses. Async scraping processes multiple requests concurrently.

from curl_cffi.requests import AsyncSession
import asyncio

async def fetch_page_async(session, domain, page):
    """Fetch a single page asynchronously."""
    url = f"https://{domain}/products.json?limit=250&page={page}"
    
    response = await session.get(url, impersonate="chrome")
    
    if response.status_code != 200:
        return []
    
    data = response.json()
    return data.get('products', [])

The AsyncSession handles concurrent connections efficiently. Each request runs independently without blocking others.

async def scrape_all_async(domain, max_pages=20):
    """Scrape all products using concurrent requests."""
    async with AsyncSession() as session:
        # Create tasks for all pages
        tasks = [
            fetch_page_async(session, domain, page) 
            for page in range(1, max_pages + 1)
        ]
        
        # Execute all requests concurrently
        results = await asyncio.gather(*tasks)
        
        # Flatten results
        all_products = []
        for page_products in results:
            all_products.extend(page_products)
        
        return all_products

# Run async scraper
products = asyncio.run(scrape_all_async('www.gymshark.com'))
print(f"Scraped {len(products)} products")

Async scraping completes in 3-5 seconds versus roughly 30 seconds for the sequential approach. Set max_pages high enough to cover every page of the catalog.

Batch Processing with Rate Limits

Too many concurrent requests trigger rate limits. Batch processing balances speed and reliability.

async def scrape_with_batches(domain, batch_size=5):
    """Scrape products in controlled batches."""
    async with AsyncSession() as session:
        all_products = []
        page = 1
        
        while True:
            # Create batch of tasks
            tasks = [
                fetch_page_async(session, domain, page + i)
                for i in range(batch_size)
            ]
            
            results = await asyncio.gather(*tasks)
            
            # Check if we've reached the end
            batch_products = []
            for result in results:
                batch_products.extend(result)
            
            if not batch_products:
                break
            
            all_products.extend(batch_products)
            page += batch_size
            
            await asyncio.sleep(0.5)  # Pause between batches
        
        return all_products

This approach scrapes 5 pages at once, pauses briefly, then continues. You get speed without overwhelming the server.
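
Run it the same way as the earlier async scraper:

# Scrape the US store in batches of 5 pages
products = asyncio.run(scrape_with_batches('www.gymshark.com'))
print(f"Scraped {len(products)} products")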

Method 3: Individual Product Pages

Some product details only appear on individual product pages. The JSON endpoint provides basic info, but product pages contain reviews, sizing guides, and additional images.

Fetch Single Product JSON

Each product has a dedicated JSON endpoint at /products/{handle}.json.

def fetch_product_details(domain, handle):
    """Fetch detailed product data from individual endpoint."""
    url = f"https://{domain}/products/{handle}.json"
    
    response = requests.get(url, impersonate="chrome")
    
    if response.status_code != 200:
        return None
    
    return response.json().get('product')

Individual product endpoints return the same structure as the list endpoint, but with fresher data. Use this for real-time price checks.

# Get detailed info for a specific product
product = fetch_product_details(
    'www.gymshark.com', 
    'gymshark-vital-seamless-2-0-leggings'
)

if product:
    print(f"Title: {product['title']}")
    print(f"Variants: {len(product['variants'])}")

Extract All Variants

Products like leggings have multiple size and color variants. Each variant has its own price, SKU, and availability.

def extract_variants(product):
    """Extract all variant information from a product."""
    variants = []
    
    for variant in product.get('variants', []):
        variants.append({
            'product_id': product['id'],
            'product_title': product['title'],
            'variant_id': variant['id'],
            'variant_title': variant['title'],
            'price': variant['price'],
            'compare_at_price': variant.get('compare_at_price'),
            'sku': variant.get('sku'),
            'available': variant.get('available'),
            'inventory_quantity': variant.get('inventory_quantity'),
        })
    
    return variants

Inventory quantity shows exact stock levels when available. Some stores hide this field for competitive reasons.
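
Combining fetch_product_details with extract_variants gives a variant-level view of one product. For example, using the leggings handle from above:

# Build a variant-level table for a single product
product = fetch_product_details(
    'www.gymshark.com',
    'gymshark-vital-seamless-2-0-leggings'
)

if product:
    df_variants = pd.DataFrame(extract_variants(product))
    print(df_variants[['variant_title', 'price', 'available']])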

Method 4: Browser Automation with Playwright

JavaScript-rendered content requires browser automation. Gymshark's product pages load reviews and dynamic pricing through JavaScript.

Basic Playwright Setup

pip install playwright playwright-stealth
playwright install chromium

The stealth plugin patches detection signatures that Cloudflare looks for.

from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

def scrape_with_browser(url):
    """Scrape page content using headless browser."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(
            viewport={'width': 1920, 'height': 1080},
            user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
        )
        
        page = context.new_page()
        stealth_sync(page)  # Apply stealth patches
        
        page.goto(url, wait_until='networkidle')
        
        content = page.content()
        browser.close()
        
        return content

The stealth_sync() function modifies browser fingerprints to avoid detection. It patches navigator.webdriver and other telltale properties.
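
Calling it on a product page returns the fully rendered HTML, ready for parsing — for example, using the leggings URL from earlier:

html = scrape_with_browser(
    'https://www.gymshark.com/products/gymshark-vital-seamless-2-0-leggings'
)
print(f"Rendered page: {len(html)} characters")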

Extract Product Details from HTML

from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

def scrape_product_page(url):
    """Extract product details from rendered page."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        stealth_sync(page)
        
        page.goto(url, wait_until='networkidle')
        
        # Wait for product details to load
        page.wait_for_selector('[data-testid="product-title"]', timeout=10000)
        
        # Extract data
        title = page.locator('[data-testid="product-title"]').inner_text()
        price = page.locator('[data-testid="product-price"]').inner_text()
        
        # Get all available sizes
        sizes = page.locator('[data-testid="size-button"]').all_inner_texts()
        
        browser.close()
        
        return {
            'title': title,
            'price': price,
            'sizes': sizes
        }

These data-testid selectors change periodically as Gymshark updates its frontend. Check the page source when scraping fails.

Intercept Network Requests

Browser automation can capture API responses that pages make internally. This reveals hidden data endpoints.

def intercept_api_calls(url):
    """Capture API responses made by the page."""
    api_responses = []
    
    def handle_response(response):
        if '/api/' in response.url or 'graphql' in response.url:
            try:
                api_responses.append({
                    'url': response.url,
                    'status': response.status,
                    'body': response.json()
                })
            except Exception:
                pass  # body was not JSON (or not yet available); skip it
    
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        stealth_sync(page)
        
        page.on('response', handle_response)
        page.goto(url, wait_until='networkidle')
        
        browser.close()
    
    return api_responses

This technique often discovers undocumented APIs. Gymshark may use internal endpoints for reviews, recommendations, or inventory checks.
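
A quick way to explore what the page calls is to print the captured endpoints:

# List the internal endpoints a product page calls
responses = intercept_api_calls(
    'https://www.gymshark.com/products/gymshark-vital-seamless-2-0-leggings'
)

for r in responses:
    print(r['status'], r['url'])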

Method 5: Collections and Category Scraping

Shopify organizes products into collections. Scraping by collection helps filter specific product types.

List All Collections

def fetch_collections(domain):
    """Fetch all product collections."""
    url = f"https://{domain}/collections.json"
    
    response = requests.get(url, impersonate="chrome")
    
    if response.status_code != 200:
        return []
    
    data = response.json()
    return data.get('collections', [])

Collections represent categories like "Women's Leggings" or "Men's Hoodies". Each has a handle for filtering.

# Get all collections
collections = fetch_collections('www.gymshark.com')

for collection in collections[:10]:
    print(f"{collection['title']}: {collection['handle']}")

Scrape Products by Collection

def fetch_collection_products(domain, collection_handle, page=1):
    """Fetch products from a specific collection."""
    url = f"https://{domain}/collections/{collection_handle}/products.json"
    url += f"?limit=250&page={page}"
    
    response = requests.get(url, impersonate="chrome")
    
    if response.status_code != 200:
        return []
    
    data = response.json()
    return data.get('products', [])

Collection-based scraping targets specific product categories without downloading the entire catalog.

# Scrape women's leggings only
leggings = fetch_collection_products(
    'www.gymshark.com', 
    'womens-leggings'
)

print(f"Found {len(leggings)} leggings")

Avoiding Detection and Blocks

Gymshark uses Cloudflare for bot protection. Here's how to stay under the radar.

Rotate User Agents

import random

USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
]

def get_random_headers():
    """Generate randomized request headers."""
    return {
        'User-Agent': random.choice(USER_AGENTS),
        'Accept': 'application/json, text/plain, */*',
        'Accept-Language': 'en-US,en;q=0.9',
        'Accept-Encoding': 'gzip, deflate, br',
        'Connection': 'keep-alive',
    }

Rotate user agents every 50-100 requests. Consistent fingerprints across thousands of requests raise flags.
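
A minimal sketch of plugging these headers into curl_cffi — passing headers alongside impersonate should override the matching impersonation defaults, but confirm against your curl_cffi version:

# Rotate request headers on top of the Chrome TLS fingerprint
response = requests.get(
    'https://www.gymshark.com/products.json?limit=250&page=1',
    impersonate="chrome",
    headers=get_random_headers()
)
print(response.status_code)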

Implement Proxy Rotation

Residential proxies avoid IP-based blocks when scraping at scale.

def fetch_with_proxy(url, proxy_url):
    """Make request through proxy server."""
    proxies = {
        'http': proxy_url,
        'https': proxy_url
    }
    
    response = requests.get(
        url, 
        impersonate="chrome",
        proxies=proxies
    )
    
    return response

Datacenter proxies get blocked quickly. Residential proxies from providers like Roundproxies.com mimic real user connections.
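
A minimal rotation sketch built on the function above — the proxy URLs are placeholders for your provider's endpoints and credentials:

import itertools

# Placeholder proxy URLs - substitute your provider's details
PROXY_POOL = itertools.cycle([
    'http://user:pass@proxy1.example.com:8000',
    'http://user:pass@proxy2.example.com:8000',
])

def fetch_rotating(url):
    """Send each request through the next proxy in the pool."""
    return fetch_with_proxy(url, next(PROXY_POOL))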

Handle Rate Limiting

import time
from curl_cffi import requests

def fetch_with_retry(url, max_retries=3):
    """Fetch URL with exponential backoff on failure."""
    for attempt in range(max_retries):
        try:
            response = requests.get(url, impersonate="chrome")
            
            if response.status_code == 429:
                wait_time = 2 ** attempt * 5
                print(f"Rate limited. Waiting {wait_time}s...")
                time.sleep(wait_time)
                continue
            
            if response.status_code == 403:
                print("Blocked. Switching proxy...")
                time.sleep(10)
                continue
            
            return response
            
        except Exception as e:
            print(f"Error: {e}")
            time.sleep(2 ** attempt)
    
    return None

Exponential backoff prevents hammering a blocked server. Wait times increase with each failed attempt.
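
Use it in place of a bare requests.get call:

response = fetch_with_retry('https://www.gymshark.com/products.json?limit=250&page=1')

if response is not None:
    products = response.json().get('products', [])
    print(f"Fetched {len(products)} products")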

Build a Price Monitoring System

Automated price tracking catches sales and restocks instantly.

import json
import schedule
import time
from datetime import datetime

PRICE_FILE = 'price_history.json'

def load_price_history():
    """Load previous price data."""
    try:
        with open(PRICE_FILE, 'r') as f:
            return json.load(f)
    except FileNotFoundError:
        return {}

def save_price_history(data):
    """Save price data to file."""
    with open(PRICE_FILE, 'w') as f:
        json.dump(data, f, indent=2)

Persistent storage tracks prices over time. JSON format makes it easy to analyze trends.

def check_for_price_drops():
    """Compare current prices to historical data."""
    current_products = scrape_all_products('www.gymshark.com')
    history = load_price_history()
    
    alerts = []
    
    for product in current_products:
        product_id = str(product['id'])
        current_price = float(product['variants'][0]['price'])
        
        if product_id in history:
            old_price = history[product_id]['price']
            
            if current_price < old_price:
                drop_pct = ((old_price - current_price) / old_price) * 100
                
                alerts.append({
                    'title': product['title'],
                    'old_price': old_price,
                    'new_price': current_price,
                    'drop_percent': round(drop_pct, 1),
                    'url': f"https://www.gymshark.com/products/{product['handle']}"
                })
        
        history[product_id] = {
            'price': current_price,
            'title': product['title'],
            'updated': datetime.now().isoformat()
        }
    
    save_price_history(history)
    return alerts

The function returns a list of products with price drops. Set up notifications via email, Slack, or Discord.
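
As one option, a minimal Discord webhook sketch — the webhook URL is a placeholder you create in your own server settings:

WEBHOOK_URL = 'https://discord.com/api/webhooks/your-webhook-id'  # placeholder

def send_discord_alert(alert):
    """Post a single price-drop alert to a Discord channel."""
    message = (
        f"{alert['title']}: ${alert['old_price']} -> ${alert['new_price']} "
        f"({alert['drop_percent']}% off)\n{alert['url']}"
    )
    requests.post(WEBHOOK_URL, json={'content': message})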

def run_monitor():
    """Execute price check and display alerts."""
    print(f"\n[{datetime.now()}] Checking prices...")
    
    alerts = check_for_price_drops()
    
    if alerts:
        print(f"\n🚨 {len(alerts)} PRICE DROPS FOUND!")
        for alert in alerts:
            print(f"\n{alert['title']}")
            print(f"  ${alert['old_price']} → ${alert['new_price']}")
            print(f"  {alert['drop_percent']}% off")
            print(f"  {alert['url']}")
    else:
        print("No price changes detected")

# Schedule checks every 6 hours
schedule.every(6).hours.do(run_monitor)

# Run initial check
run_monitor()

# Keep running
while True:
    schedule.run_pending()
    time.sleep(60)

Deploy this on a cloud server for 24/7 monitoring. AWS Lambda or Google Cloud Functions work well for scheduled tasks.
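
On Lambda, a scheduled trigger (an EventBridge rule) replaces the schedule loop above. A minimal, hedged sketch of the entry point — note that price_history.json would need durable storage such as S3, since Lambda's local filesystem is ephemeral:

# Minimal AWS Lambda entry point (assumes check_for_price_drops and its
# storage have been adapted for the Lambda environment)
def lambda_handler(event, context):
    alerts = check_for_price_drops()
    return {
        'statusCode': 200,
        'body': json.dumps({'alerts_found': len(alerts)})
    }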

Export Data for Analysis

Different output formats serve different use cases.

CSV Export with Pandas

import pandas as pd

def export_to_csv(products, filename='gymshark_products.csv'):
    """Export products to CSV file."""
    rows = []
    
    for product in products:
        for variant in product.get('variants', []):
            rows.append({
                'product_id': product['id'],
                'title': product['title'],
                'handle': product['handle'],
                'vendor': product['vendor'],
                'product_type': product['product_type'],
                'variant_id': variant['id'],
                'variant_title': variant['title'],
                'price': variant['price'],
                'compare_at_price': variant.get('compare_at_price'),
                'sku': variant.get('sku'),
                'available': variant.get('available'),
                'created_at': product['created_at'],
                'updated_at': product['updated_at'],
            })
    
    df = pd.DataFrame(rows)
    df.to_csv(filename, index=False)
    
    print(f"Exported {len(rows)} variants to {filename}")
    return df

CSV works with Excel, Google Sheets, and most analytics tools. Each row represents a single product variant.

JSON Export for APIs

import json

def export_to_json(products, filename='gymshark_products.json'):
    """Export products to JSON file."""
    with open(filename, 'w') as f:
        json.dump(products, f, indent=2)
    
    print(f"Exported {len(products)} products to {filename}")

JSON preserves the nested structure. Useful for feeding data into web applications or databases.

SQLite Database Storage

import sqlite3

def create_database():
    """Initialize SQLite database for product storage."""
    conn = sqlite3.connect('gymshark.db')
    cursor = conn.cursor()
    
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS products (
            id INTEGER PRIMARY KEY,
            title TEXT,
            handle TEXT UNIQUE,
            vendor TEXT,
            product_type TEXT,
            created_at TEXT,
            updated_at TEXT
        )
    ''')
    
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS variants (
            id INTEGER PRIMARY KEY,
            product_id INTEGER,
            title TEXT,
            price REAL,
            compare_at_price REAL,
            sku TEXT,
            available BOOLEAN,
            FOREIGN KEY (product_id) REFERENCES products(id)
        )
    ''')
    
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS price_history (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            variant_id INTEGER,
            price REAL,
            recorded_at TEXT,
            FOREIGN KEY (variant_id) REFERENCES variants(id)
        )
    ''')
    
    conn.commit()
    return conn

SQLite stores historical data efficiently. Query past prices to identify sale patterns.

def save_to_database(products, conn):
    """Save scraped products to SQLite database."""
    cursor = conn.cursor()
    
    for product in products:
        # Insert or update product
        cursor.execute('''
            INSERT OR REPLACE INTO products 
            (id, title, handle, vendor, product_type, created_at, updated_at)
            VALUES (?, ?, ?, ?, ?, ?, ?)
        ''', (
            product['id'],
            product['title'],
            product['handle'],
            product['vendor'],
            product['product_type'],
            product['created_at'],
            product['updated_at']
        ))
        
        # Insert variants
        for variant in product.get('variants', []):
            cursor.execute('''
                INSERT OR REPLACE INTO variants
                (id, product_id, title, price, compare_at_price, sku, available)
                VALUES (?, ?, ?, ?, ?, ?, ?)
            ''', (
                variant['id'],
                product['id'],
                variant['title'],
                variant['price'],
                variant.get('compare_at_price'),
                variant.get('sku'),
                variant.get('available')
            ))
            
            # Record price history
            cursor.execute('''
                INSERT INTO price_history (variant_id, price, recorded_at)
                VALUES (?, ?, datetime('now'))
            ''', (variant['id'], variant['price']))
    
    conn.commit()
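
With history accumulating, a single query surfaces the variants with the widest price swings — a sketch against the schema above:

def biggest_price_swings(conn, limit=10):
    """Rank variants by the spread between their highest and lowest recorded prices."""
    cursor = conn.cursor()
    cursor.execute('''
        SELECT p.title, v.title AS variant,
               MIN(h.price) AS lowest_price,
               MAX(h.price) AS highest_price
        FROM price_history h
        JOIN variants v ON v.id = h.variant_id
        JOIN products p ON p.id = v.product_id
        GROUP BY h.variant_id
        HAVING MIN(h.price) < MAX(h.price)
        ORDER BY MAX(h.price) - MIN(h.price) DESC
        LIMIT ?
    ''', (limit,))
    return cursor.fetchall()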

Multi-Region Scraping

Compare prices across Gymshark's regional stores.

REGIONS = {
    'US': 'www.gymshark.com',
    'UK': 'uk.gymshark.com',
    'ROW': 'row.gymshark.com'
}

def scrape_all_regions():
    """Scrape products from all regional stores."""
    all_data = {}
    
    for region, domain in REGIONS.items():
        print(f"\nScraping {region} store...")
        products = scrape_all_products(domain)
        all_data[region] = products
        print(f"  Found {len(products)} products")
        time.sleep(2)  # Pause between regions
    
    return all_data

Cross-region analysis reveals pricing arbitrage opportunities. The same leggings might cost less in one region.

def compare_regional_prices(all_data):
    """Compare prices for same products across regions."""
    comparisons = []
    
    # Build lookup by handle (product identifier)
    us_products = {p['handle']: p for p in all_data.get('US', [])}
    uk_products = {p['handle']: p for p in all_data.get('UK', [])}
    
    for handle, us_product in us_products.items():
        if handle in uk_products:
            uk_product = uk_products[handle]
            
            us_price = float(us_product['variants'][0]['price'])
            uk_price = float(uk_product['variants'][0]['price'])
            
            # Rough GBP to USD conversion
            uk_price_usd = uk_price * 1.27
            
            comparisons.append({
                'title': us_product['title'],
                'us_price': us_price,
                'uk_price_gbp': uk_price,
                'uk_price_usd': round(uk_price_usd, 2),
                'difference': round(us_price - uk_price_usd, 2)
            })
    
    return comparisons
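
Sorting the output by the price difference surfaces the biggest gaps first:

# Rank products by how much more they cost in the US store
all_data = scrape_all_regions()
comparisons = compare_regional_prices(all_data)
comparisons.sort(key=lambda c: c['difference'], reverse=True)

for c in comparisons[:10]:
    print(f"{c['title']}: US ${c['us_price']} vs UK ${c['uk_price_usd']} (USD)")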

Error Handling and Logging

Production scrapers need robust error handling.

import logging
from datetime import datetime

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('scraper.log'),
        logging.StreamHandler()
    ]
)

logger = logging.getLogger(__name__)

def scrape_with_logging(domain):
    """Scrape products with comprehensive logging."""
    logger.info(f"Starting scrape of {domain}")
    start_time = datetime.now()
    
    try:
        products = scrape_all_products(domain)
        
        elapsed = (datetime.now() - start_time).total_seconds()
        logger.info(f"Completed: {len(products)} products in {elapsed:.1f}s")
        
        return products
        
    except Exception as e:
        logger.error(f"Scrape failed: {str(e)}")
        raise

Logs help debug issues in production. Store them separately from output data.

Common Errors and Solutions

403 Forbidden: Cloudflare blocked your request. Use curl_cffi with impersonate="chrome" or switch to residential proxies.

Empty JSON Response: You've reached the end of pagination. This is expected behavior, not an error.

Connection Timeout: Network issue or server overload. Implement retry logic with exponential backoff.

Rate Limit (429): Too many requests. Slow down and add delays between requests.

Invalid JSON: The endpoint returned HTML instead of JSON. This usually means a CAPTCHA page or error. Check the response content.

SSL Certificate Error: TLS handshake failed. Update curl_cffi to the latest version.

Legal and Ethical Considerations

Gymshark's /products.json endpoint serves public data intended for search engines and third-party integrations. The sitemap explicitly lists product URLs for crawlers.

Respect rate limits. Don't hammer their servers with thousands of requests per minute.

Avoid scraping customer accounts or checkout processes. That crosses into unauthorized access territory.

Check their robots.txt for any restrictions. As of 2026, Gymshark doesn't block the JSON endpoints used in this guide.
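
A quick, hedged way to check robots.txt programmatically — fetch it with the same TLS-spoofed client, then parse it with Python's standard-library parser:

from urllib.robotparser import RobotFileParser

response = requests.get('https://www.gymshark.com/robots.txt', impersonate="chrome")

parser = RobotFileParser()
parser.parse(response.text.splitlines())

print(parser.can_fetch('*', 'https://www.gymshark.com/products.json'))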

Complete Working Scraper

Here's a production-ready scraper combining all techniques:

#!/usr/bin/env python3
"""
Gymshark Product Scraper - Complete Implementation
Extracts all products from Gymshark Shopify stores.
"""

from curl_cffi import requests
import pandas as pd
import time
import logging
import json
from datetime import datetime

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

class GymsharkScraper:
    """Scraper for Gymshark product data."""
    
    DOMAINS = {
        'US': 'www.gymshark.com',
        'UK': 'uk.gymshark.com',
        'ROW': 'row.gymshark.com'
    }
    
    def __init__(self, region='US'):
        self.domain = self.DOMAINS.get(region, self.DOMAINS['US'])
        self.region = region
        
    def fetch_page(self, page=1, max_retries=3):
        """Fetch a single page of products."""
        url = f"https://{self.domain}/products.json?limit=250&page={page}"
        
        for attempt in range(max_retries):
            try:
                response = requests.get(url, impersonate="chrome")
                
                if response.status_code == 200:
                    return response.json().get('products', [])
                    
                if response.status_code == 429:
                    wait = 2 ** attempt * 5
                    logger.warning(f"Rate limited. Waiting {wait}s...")
                    time.sleep(wait)
                    continue
                    
            except Exception as e:
                logger.error(f"Request failed: {e}")
                time.sleep(2 ** attempt)
        
        return []
    
    def scrape_all(self):
        """Scrape complete product catalog."""
        all_products = []
        page = 1
        
        while True:
            products = self.fetch_page(page)
            
            if not products:
                break
                
            all_products.extend(products)
            logger.info(f"Page {page}: {len(products)} products")
            
            page += 1
            time.sleep(0.3)
        
        logger.info(f"Total: {len(all_products)} products from {self.region}")
        return all_products
    
    def to_dataframe(self, products):
        """Convert products to pandas DataFrame."""
        rows = []
        
        for p in products:
            for v in p.get('variants', []):
                rows.append({
                    'product_id': p['id'],
                    'title': p['title'],
                    'handle': p['handle'],
                    'type': p['product_type'],
                    'variant_id': v['id'],
                    'variant': v['title'],
                    'price': v['price'],
                    'compare_price': v.get('compare_at_price'),
                    'available': v.get('available'),
                    'sku': v.get('sku'),
                })
        
        return pd.DataFrame(rows)
    
    def export_csv(self, products, filename=None):
        """Export products to CSV."""
        if filename is None:
            filename = f"gymshark_{self.region.lower()}_{datetime.now():%Y%m%d}.csv"
        
        df = self.to_dataframe(products)
        df.to_csv(filename, index=False)
        logger.info(f"Exported to {filename}")
        return filename


def main():
    """Main execution function."""
    # Scrape US store
    scraper = GymsharkScraper(region='US')
    products = scraper.scrape_all()
    scraper.export_csv(products)
    
    # Optional: scrape other regions
    # for region in ['UK', 'ROW']:
    #     scraper = GymsharkScraper(region=region)
    #     products = scraper.scrape_all()
    #     scraper.export_csv(products)


if __name__ == '__main__':
    main()

Save this as gymshark_scraper.py and run with python gymshark_scraper.py.

Next Steps

You now have multiple methods to scrape Gymshark's product catalog. The JSON endpoint approach works best for bulk data extraction.

For real-time monitoring, set up scheduled scraping with price comparison alerts. Cloud functions handle this efficiently.

Expand to other Shopify stores using the same techniques. The /products.json endpoint exists on most Shopify sites.

Consider building a web interface to browse and filter scraped data. React or Vue work well for this.

Frequently Asked Questions

Is it legal to scrape Gymshark?

Scraping publicly available product data is generally legal. The JSON endpoints serve data intended for public consumption. Don't scrape user accounts or bypass authentication.

How often should I scrape?

For price monitoring, every 6-12 hours captures most changes. During sales events, hourly checks catch flash deals.

Why do I get blocked even with curl_cffi?

You may need residential proxies for high-volume scraping. Datacenter IPs get flagged regardless of TLS fingerprint. Try rotating proxies between requests.

Can I scrape Gymshark reviews?

Reviews load via JavaScript. Use Playwright to render the page and intercept the review API calls. The network interception technique reveals the review endpoint.

How do I handle CAPTCHAs?

The JSON endpoint rarely triggers CAPTCHAs. If you see them, reduce request frequency or use residential proxies. CAPTCHA solving services work as a last resort.

What's the best proxy provider for Gymshark?

Residential proxies work best. Mobile proxies from UK and US locations have the highest success rates.

Conclusion

Gymshark's Shopify backend provides clean JSON access to their entire product catalog. The curl_cffi library bypasses TLS fingerprinting that blocks standard HTTP clients.

Start with the JSON endpoint method for bulk scraping. Add browser automation only when you need JavaScript-rendered content like reviews.

Rate limiting and proxy rotation keep your scraper running long-term. Monitor logs for blocks and adjust your approach accordingly.

The complete scraper class handles all common scenarios. Customize it for your specific use case.