Ever spent hours writing a scraper only to watch it break the next day because a website changed its HTML structure? That's the frustrating reality most developers face.

Scrapling solves this problem. It's an adaptive Python library that automatically relocates elements when websites update their design.

In this guide, you'll learn how to use Scrapling for web scraping from start to finish. We'll cover static sites, dynamic JavaScript-heavy pages, bypassing anti-bot protections, and scaling with async requests.

What is Scrapling and How Does It Work?

Scrapling is a high-performance Python web scraping library that automatically adapts to website changes using intelligent similarity algorithms. Unlike BeautifulSoup or Selenium, which break when selectors change, Scrapling tracks elements and relocates them even after site redesigns. It combines a fast parsing engine with multiple fetcher classes to handle everything from simple HTTP requests to full browser automation with anti-bot bypass.

The library offers four main fetcher types:

  • Fetcher: Fast HTTP requests with TLS fingerprint impersonation
  • DynamicFetcher: Full browser automation via Playwright
  • StealthyFetcher: Modified Firefox with fingerprint spoofing for bypassing Cloudflare
  • Session classes: Persistent connections for faster sequential requests
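
Every class used in this guide imports from scrapling.fetchers; here's a minimal sketch of what that looks like before we dig into each one:

from scrapling.fetchers import Fetcher, DynamicFetcher, StealthyFetcher, FetcherSession

# Fast HTTP request (covered in Step 2)
page = Fetcher.get("https://quotes.toscrape.com/")

# Browser-based fetchers are covered in Steps 3 and 4:
# DynamicFetcher.fetch(url) renders JavaScript, StealthyFetcher.fetch(url) adds stealth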

Let's start building scrapers.

Step 1: Install Scrapling and Dependencies

Before writing any code, you need to set up Scrapling correctly.

The base installation only includes the parser engine. For actual scraping, you need the fetchers package.

Run this command in your terminal:

pip install "scrapling[fetchers]"

This installs the core library plus fetcher dependencies including curl-cffi for TLS fingerprinting.

Next, install browser binaries and fingerprint databases:

scrapling install

You'll see output like this:

Installing Playwright browsers...
Installing Playwright dependencies...
Installing Camoufox browser and databases...

This downloads Chromium for DynamicFetcher and the modified Firefox browser for StealthyFetcher.

For the complete package including CLI tools and AI features:

pip install "scrapling[all]"

Here's what each installation option includes:

Package | Includes
scrapling | Parser engine only
scrapling[fetchers] | Parser + all fetcher classes
scrapling[shell] | CLI tools and interactive shell
scrapling[ai] | MCP server for AI integration
scrapling[all] | Everything above

If you prefer Docker, there's a pre-built image with all dependencies:

docker pull pyd4vinci/scrapling

Verify your installation works:

from scrapling.fetchers import Fetcher

page = Fetcher.get('https://httpbin.org/get')
print(page.status)  # Should print: 200

If you get a 200 status code, Scrapling is ready to use.

Common installation issues:

  • Permission errors: Use pip install --user or a virtual environment
  • Browser install fails: Run scrapling install with admin/sudo privileges
  • SSL errors: Update your system's CA certificates

Step 2: Scrape a Static Website

Static websites serve HTML directly without JavaScript rendering. They're the easiest targets for web scraping.

Let's scrape quotes from a practice website using Scrapling's Fetcher class.

Import the fetcher and make a request:

from scrapling.fetchers import Fetcher

url = "https://quotes.toscrape.com/"
page = Fetcher.get(url)

print(f"Status: {page.status}")
print(f"Content length: {len(page.html)} characters")

The Fetcher.get() method returns a Response object containing the HTML and metadata.

Now extract the quotes using CSS selectors:

from scrapling.fetchers import Fetcher

url = "https://quotes.toscrape.com/"
page = Fetcher.get(url)

quotes = []

for quote_element in page.css(".quote"):
    text = quote_element.css_first(".text::text")
    author = quote_element.css_first(".author::text")
    tags = [tag.text for tag in quote_element.css(".tags .tag")]
    
    quotes.append({
        "text": text,
        "author": author,
        "tags": tags
    })

for quote in quotes[:3]:
    print(f"{quote['author']}: {quote['text'][:50]}...")

Notice the ::text pseudo-selector. This extracts text content directly, similar to Scrapy's syntax.

The css_first() method is about 10% faster than css() when you only need the first matching element.

You can also use XPath if you prefer that syntax:

quotes_xpath = page.xpath('//div[@class="quote"]')
text_xpath = quote_element.xpath('.//span[@class="text"]/text()')

Both selector types work interchangeably in Scrapling.
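
For example, nothing stops you from selecting containers with XPath and extracting fields with CSS on the same elements (a small sketch against the quotes site used above):

from scrapling.fetchers import Fetcher

page = Fetcher.get("https://quotes.toscrape.com/")

# Select the containers with XPath, then drill in with CSS on the same elements
for quote in page.xpath('//div[@class="quote"]'):
    author = quote.css_first(".author::text")
    text = quote.css_first(".text::text")
    print(f"{author}: {text[:40]}...")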

Scrapling provides rich DOM traversal capabilities. Once you have an element, you can navigate to related elements:

quote = page.css_first(".quote")

# Navigate to parent element
container = quote.parent

# Get next sibling element
next_quote = quote.next_sibling

# Get previous sibling
prev_quote = quote.previous_sibling

# Get all child elements
children = quote.children

# Find elements below this one in the DOM
elements_below = quote.below_elements()

These methods chain together for complex navigation:

# Get the author from the next sibling's child
author_element = quote.next_sibling.css_first(".author")

Text Extraction Options

Scrapling offers multiple ways to extract text:

element = page.css_first(".quote")

# Get direct text content
text = element.text

# Get all text including nested elements
all_text = element.get_all_text()

# Get text with whitespace stripped
clean_text = element.get_all_text(strip=True)

# Using pseudo-selector
text = element.css_first(".text::text")

The get_all_text() method recursively collects text from all child elements. Use this when content spans multiple nested tags.

Step 3: Handle Dynamic JavaScript Websites

Many modern websites load content via JavaScript after the initial page load. Standard HTTP requests won't capture this data.

Scrapling's DynamicFetcher launches a real browser to render JavaScript before extracting content.

Here's how to scrape a page that loads products via AJAX:

from scrapling.fetchers import DynamicFetcher

url = "https://www.scrapingcourse.com/javascript-rendering"

page = DynamicFetcher.fetch(
    url,
    wait_selector=".product-item",
    headless=True,
    network_idle=True
)

print(f"Status: {page.status}")

The key parameters here:

  • wait_selector: Pauses until this element appears in the DOM
  • headless=True: Runs the browser without a visible window
  • network_idle=True: Waits for network activity to stop

Now extract the dynamically loaded products:

from scrapling.fetchers import DynamicFetcher

url = "https://www.scrapingcourse.com/javascript-rendering"

page = DynamicFetcher.fetch(
    url,
    wait_selector=".product-item",
    headless=True,
    network_idle=True
)

products = []

for product in page.css(".product-item"):
    name = product.css_first(".product-name::text")
    price = product.css_first(".product-price::text")
    link = product.css_first(".product-link::attr(href)")
    image = product.css_first(".product-image::attr(src)")
    
    products.append({
        "name": name,
        "price": price,
        "url": link,
        "image": image
    })

print(f"Scraped {len(products)} products")

The ::attr(href) syntax extracts HTML attributes directly from elements.

Alternatively, access attributes through the attrib dictionary:

link = product.css_first(".product-link").attrib["href"]
# Or shorthand:
link = product.css_first(".product-link")["href"]

All three approaches produce identical results. Use whichever feels most natural.

Step 4: Bypass Cloudflare and Anti-Bot Protection

Cloudflare Turnstile and similar anti-bot systems block most automated scrapers. Scrapling's StealthyFetcher uses a modified Firefox browser with fingerprint spoofing to bypass these protections.

Here's how to scrape a Cloudflare-protected page:

from scrapling.fetchers import StealthyFetcher

url = "https://www.scrapingcourse.com/cloudflare-challenge"

page = StealthyFetcher.fetch(
    url,
    solve_cloudflare=True,
    humanize=True,
    headless=True
)

result = page.css_first("#challenge-info").get_all_text(strip=True)
print(result)

The critical parameters:

  • solve_cloudflare=True: Automatically handles Turnstile challenges
  • humanize=True: Simulates human-like cursor movements
  • headless=True: Runs without displaying the browser window

StealthyFetcher relies on Camoufox, a modified Firefox build with native fingerprint spoofing. This makes detection significantly harder than with standard browser automation tools.

For sites with aggressive bot detection, you might need to adjust settings:

page = StealthyFetcher.fetch(
    url,
    solve_cloudflare=True,
    humanize=True,
    headless=False,  # Visible browser may help with some protections
    google_search=True  # Makes request appear to come from Google search
)

The google_search=True parameter modifies the referer header to appear as if you clicked a Google search result. Many sites trust traffic from search engines more than direct visits.

Step 5: Scale with Async Sessions and Pagination

Scraping multiple pages sequentially is slow. Scrapling supports async operations to fetch pages concurrently.

Here's how to scrape paginated content efficiently:

import asyncio
from scrapling.fetchers import FetcherSession

async def scrape_page(session, url):
    page = await session.get(url)
    
    quotes = []
    for quote in page.css(".quote"):
        quotes.append({
            "text": quote.css_first(".text::text"),
            "author": quote.css_first(".author::text")
        })
    
    return quotes

async def scrape_all():
    base_url = "https://quotes.toscrape.com/page/{}/"
    all_quotes = []
    
    async with FetcherSession(impersonate="chrome") as session:
        tasks = []
        
        for page_num in range(1, 11):
            url = base_url.format(page_num)
            task = scrape_page(session, url)
            tasks.append(task)
        
        results = await asyncio.gather(*tasks)
        
        for page_quotes in results:
            all_quotes.extend(page_quotes)
    
    return all_quotes

quotes = asyncio.run(scrape_all())
print(f"Total quotes scraped: {len(quotes)}")

This script fetches all 10 pages concurrently instead of one at a time.

The FetcherSession class reuses connections across requests, making subsequent requests up to 10x faster than creating new connections each time.

For browser-based scraping, use AsyncStealthySession or AsyncDynamicSession:

from scrapling.fetchers import AsyncStealthySession

async with AsyncStealthySession(max_pages=5) as session:
    tasks = [session.fetch(url) for url in urls]
    results = await asyncio.gather(*tasks)

The max_pages parameter controls how many browser tabs run simultaneously. Setting this too high consumes excessive memory.

Browser Tab Pool Management

For browser-based sessions, Scrapling maintains a pool of tabs that rotate between requests:

from scrapling.fetchers import AsyncStealthySession

async with AsyncStealthySession(max_pages=3) as session:
    # Check pool status
    stats = session.get_pool_stats()
    print(f"Busy tabs: {stats['busy']}")
    print(f"Free tabs: {stats['free']}")
    
    tasks = [session.fetch(url) for url in urls]
    results = await asyncio.gather(*tasks)
    
    # Check stats after completion
    print(session.get_pool_stats())

The pool prevents memory issues from opening too many browser instances. Requests queue automatically when all tabs are busy.

Handling Pagination with Unknown Page Count

Sometimes you don't know how many pages exist. Use a while loop that checks for a "next" button:

from scrapling.fetchers import FetcherSession

all_data = []
page_num = 1

with FetcherSession() as session:
    while True:
        url = f"https://example.com/products?page={page_num}"
        page = session.get(url)
        
        items = page.css(".product")
        if not items:
            break  # No more products
            
        for item in items:
            all_data.append({
                "name": item.css_first(".name::text"),
                "price": item.css_first(".price::text")
            })
        
        # Check for next page link
        next_link = page.css_first(".pagination .next")
        if not next_link:
            break
            
        page_num += 1
        
print(f"Scraped {len(all_data)} items across {page_num} pages")

This pattern gracefully handles variable page counts.

Step 6: Integrate Proxies for Large-Scale Scraping

When scraping at scale, you'll eventually hit rate limits or IP bans. Proxies rotate your requests through different IP addresses.

Scrapling has native proxy support across all fetcher types.

Basic proxy integration:

from scrapling.fetchers import Fetcher

proxy_url = "http://username:password@proxy-host:port"

page = Fetcher.get(
    "https://httpbin.org/ip",
    proxy=proxy_url
)

print(page.json())  # Shows the proxy IP, not yours

For residential proxies that rotate automatically, services like Roundproxies.com provide endpoints that handle rotation server-side:

from scrapling.fetchers import StealthyFetcher

# Residential proxy from your provider
proxy = "http://user:pass@residential.proxy:port"

page = StealthyFetcher.fetch(
    "https://target-site.com",
    proxy=proxy,
    solve_cloudflare=True
)

Combining residential proxies with StealthyFetcher's fingerprint spoofing creates scrapers that are extremely difficult to detect.

For session-based scraping with proxies:

from scrapling.fetchers import FetcherSession

async with FetcherSession(
    impersonate="firefox",
    proxy="http://user:pass@proxy:port"
) as session:
    page1 = await session.get("https://site.com/page1")
    page2 = await session.get("https://site.com/page2")

The proxy setting persists across all requests in the session.

Rotating Proxies for Each Request

If your proxy provider doesn't rotate automatically, implement rotation yourself:

import random
from scrapling.fetchers import Fetcher

proxies = [
    "http://user:pass@proxy1:port",
    "http://user:pass@proxy2:port",
    "http://user:pass@proxy3:port",
]

def scrape_with_rotation(url):
    proxy = random.choice(proxies)
    return Fetcher.get(url, proxy=proxy)

for url in target_urls:
    page = scrape_with_rotation(url)
    # Process page...

For production scrapers, consider dedicated proxy services. Residential proxies from providers like Roundproxies.com are harder for target sites to detect compared to datacenter proxies.

Proxy Authentication Formats

Scrapling accepts proxies in standard formats:

# IP authentication (whitelist your IP first)
proxy = "http://proxy-host:port"

# Username/password authentication
proxy = "http://username:password@proxy-host:port"

# SOCKS5 proxy
proxy = "socks5://user:pass@proxy-host:port"

Test your proxy connection before scraping:

page = Fetcher.get("https://httpbin.org/ip", proxy=proxy)
print(page.json())  # Verify proxy IP is shown

Adaptive Scraping: Handle Website Redesigns Automatically

This is Scrapling's killer feature. Traditional scrapers break when websites change their HTML structure. Scrapling remembers element characteristics and finds them even after redesigns.

First, save element signatures during initial scraping:

from scrapling.fetchers import Fetcher

page = Fetcher.get("https://example.com/products")

# auto_save=True stores element fingerprints
products = page.css(".product-card", auto_save=True)

for product in products:
    print(product.css_first(".title::text"))

Later, when the website changes its CSS classes, use adaptive mode:

page = Fetcher.get("https://example.com/products")

# adaptive=True uses stored fingerprints to relocate elements
products = page.css(".product-card", adaptive=True)

# Still works even if .product-card changed to .item-container
for product in products:
    print(product.css_first(".title::text"))

Scrapling stores unique element properties: tag name, text content, attributes, parent/sibling relationships, and DOM depth. When you enable adaptive=True, it calculates similarity scores to find the best matching elements.

This eliminates the maintenance nightmare of constantly fixing broken selectors.

Find Similar Elements Without Writing Selectors

Sometimes you don't know the exact selector for all elements you need. Scrapling can find elements similar to one you've identified.

from scrapling.fetchers import Fetcher

page = Fetcher.get("https://quotes.toscrape.com/")

# Find one quote by its text
first_quote = page.find_by_text("The world as we have created it")

# Find all similar elements on the page
similar_quotes = first_quote.find_similar()

print(f"Found {len(similar_quotes)} similar elements")

The find_similar() method compares DOM structure, tag types, and attributes to locate matching elements.

You can fine-tune the matching:

similar = element.find_similar(
    ignore_attributes=["id", "data-timestamp"],  # Ignore dynamic attributes
    threshold=0.7  # Minimum similarity score (0-1)
)

This is especially useful when scraping sites with inconsistent HTML or when prototyping scrapers quickly.

Using the Scrapling CLI for Quick Extraction

Scrapling includes command-line tools for scraping without writing code.

Extract page content directly to a file:

scrapling extract get 'https://example.com' output.md

This saves the page body as markdown.

For more control, specify a CSS selector:

scrapling extract get 'https://quotes.toscrape.com' quotes.txt --css-selector '.quote .text'

For JavaScript-rendered pages:

scrapling extract fetch 'https://dynamic-site.com' data.html --no-headless

For Cloudflare-protected sites:

scrapling extract stealthy-fetch 'https://protected.com' content.md --solve-cloudflare

The interactive shell provides a REPL environment for testing selectors:

scrapling shell

Inside the shell, you can test CSS/XPath selectors and convert cURL commands to Scrapling code.

Common Mistakes and How to Avoid Them

Mistake 1: Using DynamicFetcher for Static Sites

Browser automation is slow and resource-intensive. Only use DynamicFetcher or StealthyFetcher when the target actually requires JavaScript rendering.

Test first with Fetcher:

from scrapling.fetchers import Fetcher

page = Fetcher.get(url)
content = page.css(".target-element")

if not content:
    # Page might be dynamic, try browser fetcher
    from scrapling.fetchers import DynamicFetcher
    page = DynamicFetcher.fetch(url)

Mistake 2: Not Using Sessions for Multiple Requests

Creating new connections for each request wastes time and resources.

Bad approach:

for url in urls:
    page = Fetcher.get(url)  # New connection each time

Better approach:

with FetcherSession() as session:
    for url in urls:
        page = session.get(url)  # Reuses connection

Sessions are up to 10x faster for sequential requests.

Mistake 3: Ignoring Rate Limits

Hammering a server with rapid requests gets you blocked fast. Add delays between requests:

import time

for url in urls:
    page = session.get(url)
    time.sleep(1)  # 1 second delay

For async code, use asyncio.sleep():

await asyncio.sleep(1)
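
In context, that looks something like this (a sketch reusing the async FetcherSession pattern from Step 5; urls is whatever list you're working through):

import asyncio
from scrapling.fetchers import FetcherSession

async def polite_scrape(urls):
    results = []
    async with FetcherSession(impersonate="chrome") as session:
        for url in urls:
            page = await session.get(url)
            results.append(page)
            await asyncio.sleep(1)  # pause between requests to stay polite
    return results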

Mistake 4: Not Handling Errors

Network requests fail. Scrapers should handle exceptions gracefully:

from scrapling.fetchers import Fetcher

try:
    page = Fetcher.get(url, timeout=30)
    
    if page.status != 200:
        print(f"Non-200 status: {page.status}")
        
except Exception as e:
    print(f"Request failed: {e}")

For retries, use the session's built-in retry parameter:

with FetcherSession(retries=3) as session:
    page = session.get(url)  # Automatically retries on failure

Implementing Robust Error Handling

Production scrapers need comprehensive error handling:

from scrapling.fetchers import Fetcher, DynamicFetcher
import time

def scrape_with_fallback(url, max_retries=3):
    """Scrape URL with fallback to browser if static fetch fails."""
    
    for attempt in range(max_retries):
        try:
            # Try static fetch first (faster)
            page = Fetcher.get(url, timeout=30)
            
            if page.status == 200:
                content = page.css(".target-element")
                if content:
                    return page
                    
                # Content not found, might be dynamic
                print("Content not in static HTML, trying browser...")
                
            elif page.status == 403:
                print("Blocked, trying stealth browser...")
                
            elif page.status == 429:
                print("Rate limited, waiting...")
                time.sleep(60)
                continue
                
        except Exception as e:
            print(f"Fetch error: {e}")
            
        # Fallback to browser
        try:
            page = DynamicFetcher.fetch(
                url,
                headless=True,
                network_idle=True,
                timeout=60000
            )
            
            if page.status == 200:
                return page
                
        except Exception as e:
            print(f"Browser fetch error: {e}")
            
        time.sleep(2 ** attempt)  # Exponential backoff
        
    return None

# Usage
page = scrape_with_fallback("https://target.com")
if page:
    data = page.css(".data")
else:
    print("Failed after all retries")

This pattern starts fast with HTTP requests and falls back to browser automation only when needed.

Performance Comparison: Scrapling vs Other Libraries

Scrapling's custom parsing engine significantly outperforms alternatives.

Text extraction benchmark (5000 nested elements):

Library | Time | vs Scrapling
Scrapling | 1.99ms | 1.0x
Parsel/Scrapy | 2.01ms | 1.01x
Raw lxml | 2.5ms | 1.26x
BeautifulSoup + lxml | 1541ms | 774x slower

For element similarity search (adaptive scraping):

Library | Time | vs Scrapling
Scrapling | 2.46ms | 1.0x
AutoScraper | 13.3ms | 5.4x slower

These benchmarks explain why Scrapling feels noticeably faster in real-world scraping tasks.

When to Use Each Fetcher Type

Choosing the right fetcher dramatically affects scraping success and speed.

Use Fetcher when:

  • Target serves static HTML
  • No JavaScript rendering required
  • Maximum speed needed
  • Scraping APIs or JSON endpoints

Use DynamicFetcher when:

  • Content loads via JavaScript/AJAX
  • Need to interact with page (clicks, scrolls)
  • SPA (Single Page Application) targets
  • Cloudflare not present

Use StealthyFetcher when:

  • Site has Cloudflare Turnstile
  • Aggressive bot detection present
  • Need fingerprint spoofing
  • Standard browser automation gets blocked

Use Session classes when:

  • Making multiple requests to same domain
  • Need to maintain cookies/state
  • Want connection reuse benefits
  • Scraping paginated content
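
To tie these rules together, here's a minimal dispatch sketch (the needs_js and has_cloudflare flags are hypothetical; in practice you'd set them based on what you learn by inspecting the target first):

from scrapling.fetchers import Fetcher, DynamicFetcher, StealthyFetcher

def fetch_page(url, needs_js=False, has_cloudflare=False):
    """Pick the cheapest fetcher that can handle the target."""
    if has_cloudflare:
        # Anti-bot protection: stealth browser with Cloudflare solving
        return StealthyFetcher.fetch(url, solve_cloudflare=True, headless=True)
    if needs_js:
        # JavaScript rendering without heavy anti-bot measures
        return DynamicFetcher.fetch(url, network_idle=True, headless=True)
    # Static HTML: a plain HTTP request is fastest
    return Fetcher.get(url)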

Summary

Scrapling simplifies Python web scraping by combining fast parsing, multiple fetcher options, and adaptive element tracking in one library.

The key points to remember:

  • Install with pip install "scrapling[fetchers]" then run scrapling install
  • Use Fetcher for static sites, DynamicFetcher for JavaScript, StealthyFetcher for anti-bot bypass
  • Session classes dramatically speed up multi-page scraping
  • Adaptive scraping with auto_save=True and adaptive=True survives website redesigns
  • Native proxy support works across all fetcher types
  • The CLI allows quick extraction without writing code

Scrapling handles the complexity of modern web scraping so you can focus on extracting the data you need.

Start with simple static scraping, then gradually incorporate browser automation and anti-bot features as your targets require them.

Frequently Asked Questions

Does Scrapling work with Python 3.9?

No. Scrapling requires Python 3.10 or higher. The library uses type hints and features not available in earlier versions.
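
A quick way to catch this early in your own scripts, using only the standard library:

import sys

if sys.version_info < (3, 10):
    raise RuntimeError(f"Scrapling needs Python 3.10+, found {sys.version.split()[0]}")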

Can Scrapling bypass all Cloudflare protections?

StealthyFetcher successfully bypasses Cloudflare Turnstile and standard protection in most cases. However, Cloudflare continuously updates its detection, and no tool guarantees a 100% bypass rate.

Is Scrapling faster than Selenium?

Yes. Scrapling's parsing engine is hundreds of times faster than BeautifulSoup (the parser Selenium users typically pair it with). For actual page fetching, DynamicFetcher uses Playwright, which performs similarly to Selenium, while StealthyFetcher uses Camoufox, which can be faster.

How do I export scraped data to CSV or JSON?

Scrapling focuses on fetching and parsing. For data export, use Python's standard libraries:

import json
import csv

# JSON export
with open('data.json', 'w') as f:
    json.dump(scraped_data, f)

# CSV export
with open('data.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['title', 'price'])
    writer.writeheader()
    writer.writerows(scraped_data)

Does Scrapling respect robots.txt?

Scrapling does not automatically check robots.txt. You're responsible for respecting website terms of service and applicable laws.
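
If you want to honor robots.txt anyway, the standard library can check it before you fetch (a minimal sketch; swap in the user agent string you actually send):

from urllib import robotparser
from scrapling.fetchers import Fetcher

rp = robotparser.RobotFileParser()
rp.set_url("https://quotes.toscrape.com/robots.txt")
rp.read()

url = "https://quotes.toscrape.com/page/2/"
if rp.can_fetch("*", url):  # check the rules for your user agent
    page = Fetcher.get(url)
else:
    print(f"Disallowed by robots.txt, skipping: {url}")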

How do I handle cookies and authentication?

Sessions automatically persist cookies between requests:

with FetcherSession() as session:
    # Login request
    login_page = session.post(
        "https://site.com/login",
        data={"user": "name", "pass": "secret"}
    )
    
    # Subsequent requests include session cookies
    dashboard = session.get("https://site.com/dashboard")

For browser-based sessions, cookies persist similarly within the session context.

Can I scrape JavaScript-only SPAs?

Yes. DynamicFetcher and StealthyFetcher fully render JavaScript. For Single Page Applications:

from scrapling.fetchers import DynamicFetcher

page = DynamicFetcher.fetch(
    "https://spa-site.com",
    network_idle=True,  # Wait for all XHR requests to complete
    wait_selector=".content-loaded"  # Wait for specific element
)

How do I debug selector issues?

Use the interactive shell to test selectors:

scrapling shell

Then test your selectors interactively before writing the full script. The shell supports live reloading and browser previews.
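
If you'd rather debug in a plain Python REPL, the same checks work against a fetched page (a small sketch using the quotes site from earlier):

from scrapling.fetchers import Fetcher

page = Fetcher.get("https://quotes.toscrape.com/")

# See how many elements a selector matches before trusting it
quotes = page.css(".quote")
print(f"{len(quotes)} elements matched .quote")

# Inspect the fields extracted from the first match
first = page.css_first(".quote")
print(first.css_first(".author::text"))
print(first.css_first(".text::text"))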

What's the difference between Scrapling and Scrapy?

Scrapy is a full web crawling framework with spiders, pipelines, and middleware. Scrapling is a focused library for fetching and parsing single pages.

Use Scrapy for large crawling projects with complex data pipelines. Use Scrapling for targeted scraping tasks where you need adaptive element tracking or anti-bot bypass.

Does Scrapling work behind a corporate firewall?

Yes, if your firewall allows outbound HTTP/HTTPS traffic. For browser fetchers, ensure ports used by Playwright/Camoufox aren't blocked. You may need to configure proxy settings to route through your corporate proxy.