Gymshark runs on Shopify, which makes product data accessible through JSON endpoints and browser automation. This guide shows you both methods.
You'll learn the fastest way to scrape Gymshark products, handle pagination limits, and avoid common blocking issues.
What's the Fastest Way to Scrape Gymshark?
Gymshark exposes Shopify's /products.json endpoint, which returns structured product data without any HTML parsing. This approach runs roughly 5x faster than browser automation and works on all Gymshark regional sites (US, UK, ROW). For JavaScript-heavy pages, combine it with Playwright or Selenium for complete data extraction.
Why Scrape Gymshark?
Gymshark sells over 2,000 fitness products across multiple regions. Price monitoring reveals when products drop below retail.
Stock tracking shows which items sell fastest. This data helps with competitive analysis and inventory planning.
The site updates frequently. Scraping captures these changes automatically.
Gymshark's Technical Structure
Gymshark operates three regional Shopify stores: gymshark.com (US), uk.gymshark.com (UK), and row.gymshark.com (Rest of World).
Each store maintains separate product catalogs and pricing. The backend structure stays identical across regions.
This makes your scraper reusable. Change the domain and scrape any Gymshark site.
Method 1: JSON Endpoint Scraping (Recommended)
This method extracts data directly from Shopify's API. No HTML parsing needed.
Get All Product URLs
Gymshark publishes a sitemap with every product URL at /sitemap_products_1.xml.
import requests
import xmltodict
# Target the Gymshark US site
SITEMAP = 'https://www.gymshark.com/sitemap_products_1.xml'
# Fetch sitemap content
response = requests.get(SITEMAP)
sitemap = xmltodict.parse(response.text)
# Extract all product URLs
urls = []
for item in sitemap['urlset']['url']:
    urls.append(item['loc'])
print(f"Found {len(urls)} products")
This returns ~2,200 product URLs. The sitemap updates daily with new products.
Store URLs in a text file for batch processing later.
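A minimal way to persist the list (the filename is just an example):
# Save the URLs for batch processing later
with open('gymshark_urls.txt', 'w') as f:
    f.write('\n'.join(urls))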
Extract Product Data
The /products.json endpoint returns up to 250 products per request. Pagination handles larger catalogs.
import requests
import json
def fetch_products(page=1):
    url = f'https://www.gymshark.com/products.json?limit=250&page={page}'
    
    response = requests.get(url)
    data = response.json()
    
    return data['products']
# Get first 250 products
products = fetch_products(page=1)
for product in products:
    print(f"Title: {product['title']}")
    print(f"Price: {product['variants'][0]['price']}")
    print(f"Available: {product['available']}")
    print("---")
Each product includes title, vendor, type, pricing, and availability. Variants contain size-specific data.
The API returns clean JSON. No CSS selectors or HTML parsing required.
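For example, to inspect the size-level data on a single product from the batch above:
# Each variant represents one size/colour combination
product = products[0]
for variant in product['variants']:
    print(variant['title'], variant['price'], variant['available'])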
Handle Pagination
Shopify limits results to 250 products per request. Loop through pages to get everything.
import time

def scrape_all_products():
    page = 1
    all_products = []
    
    while True:
        products = fetch_products(page=page)
        
        # Stop when no more products
        if not products:
            break
            
        all_products.extend(products)
        page += 1
        
        # Rate limiting - be respectful
        time.sleep(0.5)
    
    return all_products
# Get complete product catalog
complete_catalog = scrape_all_products()
print(f"Total products: {len(complete_catalog)}")
This scrapes the entire Gymshark catalog systematically. Takes about 10 seconds for 2,200 products.
Rate limiting prevents server overload. Stay under 2 requests per second.
Method 2: Browser Automation
Use this when JavaScript renders product data dynamically. Playwright handles modern web apps better than Selenium.
Setup Playwright
from playwright.sync_api import sync_playwright
import json
def scrape_with_browser(url):
    with sync_playwright() as p:
        # Launch browser in headless mode
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        
        # Navigate to product page
        page.goto(url)
        
        # Wait for product details to load
        page.wait_for_selector('.ProductTitle_product-title__2dbjR')
        
        # Extract data
        title = page.locator('.ProductTitle_product-title__2dbjR').inner_text()
        price = page.locator('.ProductPrice_product-price__1VQdR').inner_text()
        
        browser.close()
        
        return {
            'title': title,
            'price': price
        }
This method works when Gymshark's frontend loads data with JavaScript. Slower than JSON but handles dynamic content.
The selector classes change occasionally. Check the page source when scraping fails.
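If the hashed classes have rotated, one fallback (a sketch, assuming Gymshark embeds JSON-LD product data like most Shopify themes do) is to read the structured-data script instead of styled elements:
import json

def extract_jsonld(page):
    # Read the embedded structured-data block instead of hashed CSS classes
    # (assumes a Product JSON-LD script is present; confirm in the page source)
    raw = page.locator('script[type="application/ld+json"]').first.text_content()
    data = json.loads(raw)
    offers = data.get('offers') or {}
    if isinstance(offers, list):
        offers = offers[0]
    return {'title': data.get('name'), 'price': offers.get('price')}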
Method Comparison: Which Should You Use?
| Method | Speed | Complexity | Blocking Risk | Best For | 
|---|---|---|---|---|
| JSON Endpoint | Fast (10s for 2k products) | Low | Low | Bulk scraping, price monitoring | 
| Browser Automation | Slow (5-10s per product) | Medium | High | Dynamic content, JavaScript-heavy pages | 
| Third-Party APIs | Medium | Very Low | Very Low | Quick projects, no maintenance | 
JSON endpoints win for most use cases. Browser automation helps when JavaScript hides data.
Avoid Getting Blocked
Gymshark uses Cloudflare for bot detection. Here's how to scrape Gymshark without issues.
Rotate User Agents
import random
user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36'
]
headers = {
    'User-Agent': random.choice(user_agents)
}
response = requests.get(url, headers=headers)
This mimics different browsers. Cloudflare checks User-Agent strings.
Rotate after every 50-100 requests for best results.
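A simple way to do that is to track a request counter and pick a fresh User-Agent once it passes a threshold (the cutoff below is illustrative):
request_count = 0
current_agent = random.choice(user_agents)

def get_headers():
    global request_count, current_agent
    request_count += 1
    # Swap the User-Agent roughly every 75 requests (any value in the 50-100 range works)
    if request_count % 75 == 0:
        current_agent = random.choice(user_agents)
    return {'User-Agent': current_agent}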
Use Proxies for Scale
Residential proxies avoid IP bans when scraping thousands of products daily.
proxies = {
    'http': 'http://user:pass@proxy.example.com:8080',
    'https': 'http://user:pass@proxy.example.com:8080'
}
response = requests.get(url, proxies=proxies)
Free proxies fail frequently. Paid services like BrightData or Oxylabs work better.
Rotate proxies every 100-200 requests to stay under the radar.
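A sketch of that rotation, assuming you have a pool of proxy URLs from your provider:
import random

proxy_pool = [
    'http://user:pass@proxy1.example.com:8080',
    'http://user:pass@proxy2.example.com:8080',
]

def get_proxies():
    # Pick one proxy per batch of requests; replace with your provider's endpoints
    proxy = random.choice(proxy_pool)
    return {'http': proxy, 'https': proxy}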
Regional Site Differences
UK and ROW sites use identical structures. Change the domain to scrape them.
# US site
us_url = 'https://www.gymshark.com/products.json'
# UK site
uk_url = 'https://uk.gymshark.com/products.json'
# Rest of World
row_url = 'https://row.gymshark.com/products.json'
Pricing differs by region. The same leggings cost $60 on the US store and £48 on the UK store.
Currency conversion happens server-side. Scrape multiple regions for arbitrage opportunities.
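A short sketch that queries each regional endpoint in one loop, reusing requests from earlier:
REGIONS = {
    'US': 'https://www.gymshark.com',
    'UK': 'https://uk.gymshark.com',
    'ROW': 'https://row.gymshark.com',
}

for region, base in REGIONS.items():
    data = requests.get(f'{base}/products.json?limit=5').json()
    first = data['products'][0]
    print(region, first['title'], first['variants'][0]['price'])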
Build a Price Monitor
Track Gymshark prices automatically with scheduled scraping.
import schedule
import time
import json
def check_prices():
    products = fetch_products(page=1)
    
    # Load previous prices (start fresh if no history exists yet)
    try:
        with open('prices.json', 'r') as f:
            old_prices = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        old_prices = {}
    
    # Compare and alert on drops
    for product in products:
        product_id = str(product['id'])
        current_price = float(product['variants'][0]['price'])
        
        if product_id in old_prices:
            old_price = old_prices[product_id]
            if current_price < old_price:
                print(f"Price drop: {product['title']}")
                print(f"Was ${old_price}, now ${current_price}")
        
        old_prices[product_id] = current_price
    
    # Save updated prices
    with open('prices.json', 'w') as f:
        json.dump(old_prices, f)
# Run every 6 hours
schedule.every(6).hours.do(check_prices)
while True:
    schedule.run_pending()
    time.sleep(60)
This monitors prices without manual checking and prints an alert whenever a price drops. Swap the print calls for email or Slack notifications if you need real alerts.
Run it on a cloud server for 24/7 monitoring.
Store Data Efficiently
Save scraped data in pandas DataFrames for analysis.
import pandas as pd
def save_to_csv(products):
    # Flatten product data
    rows = []
    for product in products:
        for variant in product['variants']:
            rows.append({
                'product_id': product['id'],
                'title': product['title'],
                'vendor': product['vendor'],
                'product_type': product['product_type'],
                'variant_id': variant['id'],
                'size': variant['title'],
                'price': variant['price'],
                'available': variant['available'],
                'sku': variant['sku']
            })
    
    df = pd.DataFrame(rows)
    df.to_csv('gymshark_products.csv', index=False)
    
    return df
# Export all products
df = save_to_csv(complete_catalog)
print(df.head())
CSV format works with Excel and Google Sheets. Easy sharing with non-technical teams.
DataFrames enable quick filtering and sorting. Find all products under $30 in seconds.
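For example, to list everything under $30 (prices come back as strings in the Shopify JSON, so cast first):
# Cast string prices to floats, then filter and sort
df['price'] = df['price'].astype(float)
cheap = df[df['price'] < 30].sort_values('price')
print(cheap[['title', 'size', 'price']].head(10))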
Legal Considerations
Gymshark's robots.txt doesn't explicitly block product pages. The JSON endpoint serves public data.
Terms of service prohibit automated account actions. Don't scrape customer accounts or checkout flows.
Respect rate limits. Gymshark.com handles millions of visitors. Your scraper should blend in.
Common Errors and Fixes
"No products found" - You hit the end of pagination. This is normal.
"Timeout errors" - Add longer wait times between requests. Gymshark throttles aggressive scrapers.
"403 Forbidden" - Your IP got flagged. Switch proxies or reduce request frequency.
"Invalid JSON" - The endpoint returns 404 for removed products. Check status codes first.
Scale Your Scraping
Process products in parallel to speed up collection.
from concurrent.futures import ThreadPoolExecutor
def scrape_product(url):
    # Your scraping logic here
    pass
urls = [...]  # Your list of product URLs
with ThreadPoolExecutor(max_workers=5) as executor:
    results = list(executor.map(scrape_product, urls))
Limit workers to 5-10 threads. Higher numbers trigger rate limits.
Parallel processing cuts scraping time by 80%. Handle 2,000 products in under 2 minutes.
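As a concrete version of scrape_product, many Shopify storefronts also serve per-product JSON when you append .json to a product URL; this is a sketch under that assumption, so verify it against a live Gymshark URL before relying on it:
def scrape_product(url):
    # Assumes the Shopify-style per-product endpoint (<product URL>.json) is enabled
    response = requests.get(f'{url}.json')
    if response.status_code != 200:
        return None
    product = response.json().get('product', {})
    variants = product.get('variants', [{}])
    return {'title': product.get('title'), 'price': variants[0].get('price')}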
Next Steps
Start with the JSON endpoint method. It's reliable and fast for most projects.
Add browser automation only when JavaScript blocks data extraction. This happens rarely on Gymshark.
Schedule your scraper with cron jobs or cloud functions. Daily runs capture inventory changes.
Frequently Asked Questions
Is it legal to scrape Gymshark?
Scraping public product data is generally legal. Avoid scraping user accounts or violating terms of service. Consult a lawyer for commercial use.
How often does Gymshark update products?
New products launch weekly. Prices change during sales events. Daily scraping captures most updates.
Can I scrape Gymshark reviews?
Reviews load via JavaScript. Use Playwright to access the review API endpoint after page load.
What's the best proxy for Gymshark?
Residential proxies work best. Rotate IPs from US, UK, or the target region.
How do I handle CAPTCHAs?
JSON endpoints rarely show CAPTCHAs. If you see them, reduce request frequency or use CAPTCHA-solving services.
Conclusion
The JSON endpoint is the fastest way to scrape Gymshark product data. It returns clean, structured information without HTML parsing.
Browser automation handles edge cases where JavaScript renders content. Combine both methods for complete coverage.
Rate limiting and proxies keep your scraper running. Schedule regular checks to track prices and inventory automatically.