Ever spent hours writing a scraper only to watch it break the next day because a website changed its HTML structure? That's the frustrating reality most developers face.
Scrapling solves this problem. It's an adaptive Python library that automatically relocates elements when websites update their design.
In this guide, you'll learn how to use Scrapling for web scraping from start to finish. We'll cover static sites, dynamic JavaScript-heavy pages, bypassing anti-bot protections, and scaling with async requests.
What is Scrapling and How Does It Work?
Scrapling is a high-performance Python web scraping library that automatically adapts to website changes using similarity algorithms. Unlike BeautifulSoup or Selenium, which break when selectors change, Scrapling tracks elements and relocates them even after site redesigns. It combines a fast parsing engine with multiple fetcher classes to handle scraping challenges ranging from simple HTTP requests to full browser automation with anti-bot bypass.
The library offers four main fetcher types:
- Fetcher: Fast HTTP requests with TLS fingerprint impersonation
- DynamicFetcher: Full browser automation via Playwright
- StealthyFetcher: Modified Firefox with fingerprint spoofing for bypassing Cloudflare
- Session classes: Persistent connections for faster sequential requests
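All of these classes live in the scrapling.fetchers module. A quick import sketch (the same names used throughout this guide; nothing here beyond the classes the later examples rely on):

from scrapling.fetchers import (
    Fetcher,          # plain HTTP requests with TLS fingerprint impersonation
    DynamicFetcher,   # full browser automation via Playwright
    StealthyFetcher,  # modified Firefox (Camoufox) with fingerprint spoofing
    FetcherSession,   # persistent session for sequential or async requests
)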
Let's start building scrapers.
Step 1: Install Scrapling and Dependencies
Before writing any code, you need to set up Scrapling correctly.
The base installation only includes the parser engine. For actual scraping, you need the fetchers package.
Run these commands in your terminal:
pip install "scrapling[fetchers]"
This installs the core library plus fetcher dependencies including curl-cffi for TLS fingerprinting.
Next, install browser binaries and fingerprint databases:
scrapling install
You'll see output like this:
Installing Playwright browsers...
Installing Playwright dependencies...
Installing Camoufox browser and databases...
This downloads Chromium for DynamicFetcher and the modified Firefox browser for StealthyFetcher.
For the complete package including CLI tools and AI features:
pip install "scrapling[all]"
Here's what each installation option includes:
| Package | Includes |
|---|---|
| scrapling | Parser engine only |
| scrapling[fetchers] | Parser + all fetcher classes |
| scrapling[shell] | CLI tools and interactive shell |
| scrapling[ai] | MCP server for AI integration |
| scrapling[all] | Everything above |
If you prefer Docker, a pre-built image with all dependencies exists:
docker pull pyd4vinci/scrapling
Verify your installation works:
from scrapling.fetchers import Fetcher
page = Fetcher.get('https://httpbin.org/get')
print(page.status) # Should print: 200
If you get a 200 status code, Scrapling is ready to use.
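If you want to confirm exactly which version got installed (the PyPI package name is scrapling), the standard library can tell you without importing the library itself:

from importlib.metadata import version

print(version("scrapling"))  # Prints the installed Scrapling version string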
Common installation issues:
- Permission errors: Use `pip install --user` or a virtual environment
- Browser install fails: Run `scrapling install` with admin/sudo privileges
- SSL errors: Update your system's CA certificates
Step 2: Scrape a Static Website
Static websites serve HTML directly without JavaScript rendering. They're the easiest targets for web scraping.
Let's scrape quotes from a practice website using Scrapling's Fetcher class.
Import the fetcher and make a request:
from scrapling.fetchers import Fetcher
url = "https://quotes.toscrape.com/"
page = Fetcher.get(url)
print(f"Status: {page.status}")
print(f"Content length: {len(page.html)} characters")
The Fetcher.get() method returns a Response object containing the HTML and metadata.
Now extract the quotes using CSS selectors:
from scrapling.fetchers import Fetcher
url = "https://quotes.toscrape.com/"
page = Fetcher.get(url)
quotes = []

for quote_element in page.css(".quote"):
    text = quote_element.css_first(".text::text")
    author = quote_element.css_first(".author::text")
    tags = [tag.text for tag in quote_element.css(".tags .tag")]
    quotes.append({
        "text": text,
        "author": author,
        "tags": tags
    })

for quote in quotes[:3]:
    print(f"{quote['author']}: {quote['text'][:50]}...")
Notice the ::text pseudo-selector. This extracts text content directly, similar to Scrapy's syntax.
The css_first() method is about 10% faster than css() when you only need the first matching element.
You can also use XPath if you prefer that syntax:
quotes_xpath = page.xpath('//div[@class="quote"]')
text_xpath = quote_element.xpath('.//span[@class="text"]/text()')
Both selector types work interchangeably in Scrapling.
Navigation Methods
Scrapling provides rich DOM traversal capabilities. Once you have an element, you can navigate to related elements:
quote = page.css_first(".quote")
# Navigate to parent element
container = quote.parent
# Get next sibling element
next_quote = quote.next_sibling
# Get previous sibling
prev_quote = quote.previous_sibling
# Get all child elements
children = quote.children
# Find elements below this one in the DOM
elements_below = quote.below_elements()
These methods chain together for complex navigation:
# Get the author from the next sibling's child
author_element = quote.next_sibling.css_first(".author")
Text Extraction Options
Scrapling offers multiple ways to extract text:
element = page.css_first(".quote")
# Get direct text content
text = element.text
# Get all text including nested elements
all_text = element.get_all_text()
# Get text with whitespace stripped
clean_text = element.get_all_text(strip=True)
# Using pseudo-selector
text = element.css_first(".text::text")
The get_all_text() method recursively collects text from all child elements. Use this when content spans multiple nested tags.
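To see the difference in practice, compare the two on a quote card from the practice site used earlier. The .quote div mostly wraps nested tags, so (as described above) its direct text is little more than whitespace, while get_all_text() collects the quote, author, and tags:

from scrapling.fetchers import Fetcher

page = Fetcher.get("https://quotes.toscrape.com/")
quote = page.css_first(".quote")

print(repr(quote.text))                      # Direct text only: mostly whitespace
print(quote.get_all_text(strip=True)[:80])   # Text from all nested tags, stripped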
Step 3: Handle Dynamic JavaScript Websites
Many modern websites load content via JavaScript after the initial page load. Standard HTTP requests won't capture this data.
Scrapling's DynamicFetcher launches a real browser to render JavaScript before extracting content.
Here's how to scrape a page that loads products via AJAX:
from scrapling.fetchers import DynamicFetcher
url = "https://www.scrapingcourse.com/javascript-rendering"
page = DynamicFetcher.fetch(
url,
wait_selector=".product-item",
headless=True,
network_idle=True
)
print(f"Status: {page.status}")
The key parameters here:
- `wait_selector`: Pauses until this element appears in the DOM
- `headless=True`: Runs the browser without a visible window
- `network_idle=True`: Waits for network activity to stop
Now extract the dynamically loaded products:
from scrapling.fetchers import DynamicFetcher
url = "https://www.scrapingcourse.com/javascript-rendering"
page = DynamicFetcher.fetch(
url,
wait_selector=".product-item",
headless=True,
network_idle=True
)
products = []

for product in page.css(".product-item"):
    name = product.css_first(".product-name::text")
    price = product.css_first(".product-price::text")
    link = product.css_first(".product-link::attr(href)")
    image = product.css_first(".product-image::attr(src)")
    products.append({
        "name": name,
        "price": price,
        "url": link,
        "image": image
    })

print(f"Scraped {len(products)} products")
The ::attr(href) syntax extracts HTML attributes directly from elements.
Alternatively, access attributes through the attrib dictionary:
link = product.css_first(".product-link").attrib["href"]
# Or shorthand:
link = product.css_first(".product-link")["href"]
All three approaches produce identical results. Use whichever feels most natural.
Step 4: Bypass Cloudflare and Anti-Bot Protection
Cloudflare Turnstile and similar anti-bot systems block most automated scrapers. Scrapling's StealthyFetcher uses a modified Firefox browser with fingerprint spoofing to bypass these protections.
Here's how to scrape a Cloudflare-protected page:
from scrapling.fetchers import StealthyFetcher
url = "https://www.scrapingcourse.com/cloudflare-challenge"
page = StealthyFetcher.fetch(
url,
solve_cloudflare=True,
humanize=True,
headless=True
)
result = page.css_first("#challenge-info").get_all_text(strip=True)
print(result)
The critical parameters:
- `solve_cloudflare=True`: Automatically handles Turnstile challenges
- `humanize=True`: Simulates human-like cursor movements
- `headless=True`: Runs without displaying the browser window
StealthyFetcher relies on Camoufox, a modified Firefox build with native fingerprint spoofing. This makes detection significantly harder than standard browser automation tools.
For sites with aggressive bot detection, you might need to adjust settings:
page = StealthyFetcher.fetch(
url,
solve_cloudflare=True,
humanize=True,
headless=False, # Visible browser may help with some protections
google_search=True # Makes request appear to come from Google search
)
The google_search=True parameter modifies the referer header to appear as if you clicked a Google search result. Many sites trust traffic from search engines more than direct visits.
Step 5: Scale with Async Sessions and Pagination
Scraping multiple pages sequentially is slow. Scrapling supports async operations to fetch pages concurrently.
Here's how to scrape paginated content efficiently:
import asyncio
from scrapling.fetchers import FetcherSession
async def scrape_page(session, url):
    page = await session.get(url)
    quotes = []
    for quote in page.css(".quote"):
        quotes.append({
            "text": quote.css_first(".text::text"),
            "author": quote.css_first(".author::text")
        })
    return quotes

async def scrape_all():
    base_url = "https://quotes.toscrape.com/page/{}/"
    all_quotes = []
    async with FetcherSession(impersonate="chrome") as session:
        tasks = []
        for page_num in range(1, 11):
            url = base_url.format(page_num)
            task = scrape_page(session, url)
            tasks.append(task)
        results = await asyncio.gather(*tasks)
        for page_quotes in results:
            all_quotes.extend(page_quotes)
    return all_quotes
quotes = asyncio.run(scrape_all())
print(f"Total quotes scraped: {len(quotes)}")
This script fetches all 10 pages concurrently instead of one at a time.
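Concurrency cuts total runtime, but firing every request at once can also get you rate limited. A common pattern is to cap the number of in-flight requests with an asyncio.Semaphore; here's a sketch built on the same session API as above (the limit of 3 is an arbitrary example):

import asyncio
from scrapling.fetchers import FetcherSession

async def scrape_page_limited(session, url, semaphore):
    async with semaphore:                      # Wait for a free slot before fetching
        page = await session.get(url)
        return [q.css_first(".text::text") for q in page.css(".quote")]

async def scrape_all_limited():
    urls = [f"https://quotes.toscrape.com/page/{n}/" for n in range(1, 11)]
    semaphore = asyncio.Semaphore(3)           # At most 3 requests in flight at once
    async with FetcherSession(impersonate="chrome") as session:
        tasks = [scrape_page_limited(session, u, semaphore) for u in urls]
        return await asyncio.gather(*tasks)

results = asyncio.run(scrape_all_limited())
print(f"Fetched {len(results)} pages")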
The FetcherSession class reuses connections across requests, making subsequent requests up to 10x faster than creating new connections each time.
For browser-based scraping, use AsyncStealthySession or AsyncDynamicSession:
from scrapling.fetchers import AsyncStealthySession
async with AsyncStealthySession(max_pages=5) as session:
    tasks = [session.fetch(url) for url in urls]
    results = await asyncio.gather(*tasks)
The max_pages parameter controls how many browser tabs run simultaneously. Setting this too high consumes excessive memory.
Browser Tab Pool Management
For browser-based sessions, Scrapling maintains a pool of tabs that rotate between requests:
from scrapling.fetchers import AsyncStealthySession
async with AsyncStealthySession(max_pages=3) as session:
    # Check pool status
    stats = session.get_pool_stats()
    print(f"Busy tabs: {stats['busy']}")
    print(f"Free tabs: {stats['free']}")

    tasks = [session.fetch(url) for url in urls]
    results = await asyncio.gather(*tasks)

    # Check stats after completion
    print(session.get_pool_stats())
The pool prevents memory issues from opening too many browser instances. Requests queue automatically when all tabs are busy.
Handling Pagination with Unknown Page Count
Sometimes you don't know how many pages exist. Use a while loop that checks for a "next" button:
from scrapling.fetchers import FetcherSession
all_data = []
page_num = 1
with FetcherSession() as session:
    while True:
        url = f"https://example.com/products?page={page_num}"
        page = session.get(url)
        items = page.css(".product")
        if not items:
            break  # No more products
        for item in items:
            all_data.append({
                "name": item.css_first(".name::text"),
                "price": item.css_first(".price::text")
            })
        # Check for next page link
        next_link = page.css_first(".pagination .next")
        if not next_link:
            break
        page_num += 1
print(f"Scraped {len(all_data)} items across {page_num} pages")
This pattern gracefully handles variable page counts.
Step 6: Integrate Proxies for Large-Scale Scraping
When scraping at scale, you'll eventually hit rate limits or IP bans. Proxies rotate your requests through different IP addresses.
Scrapling has native proxy support across all fetcher types.
Basic proxy integration:
from scrapling.fetchers import Fetcher
proxy_url = "http://username:password@proxy-host:port"
page = Fetcher.get(
"https://httpbin.org/ip",
proxy=proxy_url
)
print(page.json()) # Shows the proxy IP, not yours
For residential proxies that rotate automatically, services like Roundproxies.com provide endpoints that handle rotation server-side:
from scrapling.fetchers import StealthyFetcher
# Residential proxy from your provider
proxy = "http://user:pass@residential.proxy:port"
page = StealthyFetcher.fetch(
"https://target-site.com",
proxy=proxy,
solve_cloudflare=True
)
Combining residential proxies with StealthyFetcher's fingerprint spoofing creates scrapers that are extremely difficult to detect.
For session-based scraping with proxies:
from scrapling.fetchers import FetcherSession
async with FetcherSession(
    impersonate="firefox",
    proxy="http://user:pass@proxy:port"
) as session:
    page1 = await session.get("https://site.com/page1")
    page2 = await session.get("https://site.com/page2")
The proxy setting persists across all requests in the session.
Rotating Proxies for Each Request
If your proxy provider doesn't rotate automatically, implement rotation yourself:
import random
from scrapling.fetchers import Fetcher
proxies = [
    "http://user:pass@proxy1:port",
    "http://user:pass@proxy2:port",
    "http://user:pass@proxy3:port",
]

def scrape_with_rotation(url):
    proxy = random.choice(proxies)
    return Fetcher.get(url, proxy=proxy)

for url in target_urls:
    page = scrape_with_rotation(url)
    # Process page...
For production scrapers, consider dedicated proxy services. Residential proxies from providers like Roundproxies.com are harder for target sites to detect compared to datacenter proxies.
Proxy Authentication Formats
Scrapling accepts proxies in standard formats:
# IP authentication (whitelist your IP first)
proxy = "http://proxy-host:port"
# Username/password authentication
proxy = "http://username:password@proxy-host:port"
# SOCKS5 proxy
proxy = "socks5://user:pass@proxy-host:port"
Test your proxy connection before scraping:
page = Fetcher.get("https://httpbin.org/ip", proxy=proxy)
print(page.json()) # Verify proxy IP is shown
Adaptive Scraping: Handle Website Redesigns Automatically
This is Scrapling's killer feature. Traditional scrapers break when websites change their HTML structure. Scrapling remembers element characteristics and finds them even after redesigns.
First, save element signatures during initial scraping:
from scrapling.fetchers import Fetcher
page = Fetcher.get("https://example.com/products")
# auto_save=True stores element fingerprints
products = page.css(".product-card", auto_save=True)
for product in products:
    print(product.css_first(".title::text"))
Later, when the website changes its CSS classes, use adaptive mode:
page = Fetcher.get("https://example.com/products")
# adaptive=True uses stored fingerprints to relocate elements
products = page.css(".product-card", adaptive=True)
# Still works even if .product-card changed to .item-container
for product in products:
    print(product.css_first(".title::text"))
Scrapling stores unique element properties: tag name, text content, attributes, parent/sibling relationships, and DOM depth. When you enable adaptive=True, it calculates similarity scores to find the best matching elements.
This eliminates the maintenance nightmare of constantly fixing broken selectors.
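To give a feel for what a "similarity score" means here, the toy function below compares two element fingerprints on tag name, shared attributes, and text. This is only an illustration of the idea, not Scrapling's actual scoring code, and the weights are made up:

def toy_similarity(saved, candidate):
    """Rough illustration of fingerprint matching -- not Scrapling's real algorithm."""
    score = 0.0
    if saved["tag"] == candidate["tag"]:
        score += 0.4
    shared = set(saved["attrs"].items()) & set(candidate["attrs"].items())
    total = set(saved["attrs"]) | set(candidate["attrs"])
    if total:
        score += 0.3 * (len(shared) / len(total))
    if saved["text"] and saved["text"] == candidate["text"]:
        score += 0.3
    return score

old = {"tag": "div", "attrs": {"class": "product-card"}, "text": "Blue Widget"}
new = {"tag": "div", "attrs": {"class": "item-container"}, "text": "Blue Widget"}
print(round(toy_similarity(old, new), 2))  # Still a strong match despite the class rename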
Find Similar Elements Without Writing Selectors
Sometimes you don't know the exact selector for all elements you need. Scrapling can find elements similar to one you've identified.
from scrapling.fetchers import Fetcher
page = Fetcher.get("https://quotes.toscrape.com/")
# Find one quote by its text
first_quote = page.find_by_text("The world as we have created it")
# Find all similar elements on the page
similar_quotes = first_quote.find_similar()
print(f"Found {len(similar_quotes)} similar elements")
The find_similar() method compares DOM structure, tag types, and attributes to locate matching elements.
You can fine-tune the matching:
similar = element.find_similar(
ignore_attributes=["id", "data-timestamp"], # Ignore dynamic attributes
threshold=0.7 # Minimum similarity score (0-1)
)
This is especially useful when scraping sites with inconsistent HTML or when prototyping scrapers quickly.
Using the Scrapling CLI for Quick Extraction
Scrapling includes command-line tools for scraping without writing code.
Extract page content directly to a file:
scrapling extract get 'https://example.com' output.md
This saves the page body as markdown.
For more control, specify a CSS selector:
scrapling extract get 'https://quotes.toscrape.com' quotes.txt --css-selector '.quote .text'
For JavaScript-rendered pages:
scrapling extract fetch 'https://dynamic-site.com' data.html --no-headless
For Cloudflare-protected sites:
scrapling extract stealthy-fetch 'https://protected.com' content.md --solve-cloudflare
The interactive shell provides a REPL environment for testing selectors:
scrapling shell
Inside the shell, you can test CSS/XPath selectors and convert cURL commands to Scrapling code.
Common Mistakes and How to Avoid Them
Mistake 1: Using DynamicFetcher for Static Sites
Browser automation is slow and resource-intensive. Only use DynamicFetcher or StealthyFetcher when the target actually requires JavaScript rendering.
Test first with Fetcher:
from scrapling.fetchers import Fetcher
page = Fetcher.get(url)
content = page.css(".target-element")
if not content:
    # Page might be dynamic, try browser fetcher
    from scrapling.fetchers import DynamicFetcher
    page = DynamicFetcher.fetch(url)
Mistake 2: Not Using Sessions for Multiple Requests
Creating new connections for each request wastes time and resources.
Bad approach:
for url in urls:
    page = Fetcher.get(url)  # New connection each time
Better approach:
with FetcherSession() as session:
    for url in urls:
        page = session.get(url)  # Reuses connection
Sessions are up to 10x faster for sequential requests.
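If you want to check the difference yourself, here is a rough timing sketch against the practice site; exact numbers depend on your network and the target server:

import time
from scrapling.fetchers import Fetcher, FetcherSession

urls = [f"https://quotes.toscrape.com/page/{n}/" for n in range(1, 6)]

start = time.perf_counter()
for url in urls:
    Fetcher.get(url)                      # New connection for every request
print(f"Fetcher: {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
with FetcherSession() as session:
    for url in urls:
        session.get(url)                  # Connection reused across requests
print(f"Session: {time.perf_counter() - start:.2f}s")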
Mistake 3: Ignoring Rate Limits
Hammering a server with rapid requests gets you blocked fast. Add delays between requests:
import time
for url in urls:
    page = session.get(url)
    time.sleep(1)  # 1 second delay
For async code, use asyncio.sleep():
await asyncio.sleep(1)
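Fixed delays work, but adding a little random jitter makes the request timing look less mechanical. A minimal sketch, reusing the urls list and session from the examples above:

import random
import time

for url in urls:
    page = session.get(url)
    # Sleep between 1 and 3 seconds so requests aren't perfectly regular
    time.sleep(random.uniform(1.0, 3.0))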
Mistake 4: Not Handling Errors
Network requests fail. Scrapers should handle exceptions gracefully:
from scrapling.fetchers import Fetcher
try:
    page = Fetcher.get(url, timeout=30)
    if page.status != 200:
        print(f"Non-200 status: {page.status}")
except Exception as e:
    print(f"Request failed: {e}")
For retries, use the session's built-in retry parameter:
with FetcherSession(retries=3) as session:
    page = session.get(url)  # Automatically retries on failure
Implementing Robust Error Handling
Production scrapers need comprehensive error handling:
from scrapling.fetchers import Fetcher, DynamicFetcher
import time
def scrape_with_fallback(url, max_retries=3):
    """Scrape URL with fallback to browser if static fetch fails."""
    for attempt in range(max_retries):
        try:
            # Try static fetch first (faster)
            page = Fetcher.get(url, timeout=30)
            if page.status == 200:
                content = page.css(".target-element")
                if content:
                    return page
                # Content not found, might be dynamic
                print("Content not in static HTML, trying browser...")
            elif page.status == 403:
                print("Blocked, falling back to a browser...")
            elif page.status == 429:
                print("Rate limited, waiting...")
                time.sleep(60)
                continue
        except Exception as e:
            print(f"Fetch error: {e}")

        # Fallback to browser
        try:
            page = DynamicFetcher.fetch(
                url,
                headless=True,
                network_idle=True,
                timeout=60000
            )
            if page.status == 200:
                return page
        except Exception as e:
            print(f"Browser fetch error: {e}")

        time.sleep(2 ** attempt)  # Exponential backoff

    return None

# Usage
page = scrape_with_fallback("https://target.com")
if page:
    data = page.css(".data")
else:
    print("Failed after all retries")
This pattern starts fast with HTTP requests and falls back to browser automation only when needed.
Performance Comparison: Scrapling vs Other Libraries
Scrapling's custom parsing engine significantly outperforms alternatives.
Text extraction benchmark (5000 nested elements):
| Library | Time | vs Scrapling |
|---|---|---|
| Scrapling | 1.99ms | 1.0x |
| Parsel/Scrapy | 2.01ms | 1.01x |
| Raw lxml | 2.5ms | 1.26x |
| BeautifulSoup + lxml | 1541ms | 774x slower |
For element similarity search (adaptive scraping):
| Library | Time | vs Scrapling |
|---|---|---|
| Scrapling | 2.46ms | 1.0x |
| AutoScraper | 13.3ms | 5.4x slower |
These benchmarks explain why Scrapling feels noticeably faster in real-world scraping tasks.
When to Use Each Fetcher Type
Choosing the right fetcher dramatically affects scraping success and speed.
Use Fetcher when:
- Target serves static HTML
- No JavaScript rendering required
- Maximum speed needed
- Scraping APIs or JSON endpoints
Use DynamicFetcher when:
- Content loads via JavaScript/AJAX
- Need to interact with page (clicks, scrolls)
- SPA (Single Page Application) targets
- Cloudflare not present
Use StealthyFetcher when:
- Site has Cloudflare Turnstile
- Aggressive bot detection present
- Need fingerprint spoofing
- Standard browser automation gets blocked
Use Session classes when:
- Making multiple requests to same domain
- Need to maintain cookies/state
- Want connection reuse benefits
- Scraping paginated content
Summary
Scrapling simplifies Python web scraping by combining fast parsing, multiple fetcher options, and adaptive element tracking in one library.
The key points to remember:
- Install with `pip install "scrapling[fetchers]"`, then run `scrapling install`
- Use `Fetcher` for static sites, `DynamicFetcher` for JavaScript, `StealthyFetcher` for anti-bot bypass
- Session classes dramatically speed up multi-page scraping
- Adaptive scraping with `auto_save=True` and `adaptive=True` survives website redesigns
- Native proxy support works across all fetcher types
- The CLI allows quick extraction without writing code
Scrapling handles the complexity of modern web scraping so you can focus on extracting the data you need.
Start with simple static scraping, then gradually incorporate browser automation and anti-bot features as your targets require them.
Frequently Asked Questions
Does Scrapling work with Python 3.9?
No. Scrapling requires Python 3.10 or higher. The library uses type hints and features not available in earlier versions.
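If you're not sure which interpreter a given environment uses, check before installing:

import sys

assert sys.version_info >= (3, 10), "Scrapling requires Python 3.10+"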
Can Scrapling bypass all Cloudflare protections?
StealthyFetcher successfully bypasses Cloudflare Turnstile and standard protection in most cases. However, Cloudflare continuously updates their detection. No tool guarantees 100% bypass rates.
Is Scrapling faster than Selenium?
Yes. Scrapling's parsing engine is hundreds of times faster than BeautifulSoup (which Selenium users typically pair with). For actual page fetching, DynamicFetcher uses Playwright which performs similarly to Selenium, but StealthyFetcher uses optimized Camoufox which can be faster.
How do I export scraped data to CSV or JSON?
Scrapling focuses on fetching and parsing. For data export, use Python's standard libraries:
import json
import csv
# JSON export
with open('data.json', 'w') as f:
    json.dump(scraped_data, f)

# CSV export
with open('data.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['title', 'price'])
    writer.writeheader()
    writer.writerows(scraped_data)
Does Scrapling respect robots.txt?
Scrapling does not automatically check robots.txt. You're responsible for respecting website terms of service and applicable laws.
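If you want to honor robots.txt anyway, Python's standard library can check a path before you fetch it. A minimal sketch; the site and user-agent string are just examples:

from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://quotes.toscrape.com/robots.txt")
rp.read()
print(rp.can_fetch("MyScraperBot", "https://quotes.toscrape.com/page/2/"))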
How do I handle cookies and authentication?
Sessions automatically persist cookies between requests:
with FetcherSession() as session:
    # Login request
    login_page = session.post(
        "https://site.com/login",
        data={"user": "name", "pass": "secret"}
    )
    # Subsequent requests include session cookies
    dashboard = session.get("https://site.com/dashboard")
For browser-based sessions, cookies persist similarly within the session context.
Can I scrape JavaScript-only SPAs?
Yes. DynamicFetcher and StealthyFetcher fully render JavaScript. For Single Page Applications:
from scrapling.fetchers import DynamicFetcher
page = DynamicFetcher.fetch(
"https://spa-site.com",
network_idle=True, # Wait for all XHR requests to complete
wait_selector=".content-loaded" # Wait for specific element
)
How do I debug selector issues?
Use the interactive shell to test selectors:
scrapling shell
Then test your selectors interactively before writing the full script. The shell supports live reloading and browser previews.
What's the difference between Scrapling and Scrapy?
Scrapy is a full web crawling framework with spiders, pipelines, and middleware. Scrapling is a focused library for fetching and parsing single pages.
Use Scrapy for large crawling projects with complex data pipelines. Use Scrapling for targeted scraping tasks where you need adaptive element tracking or anti-bot bypass.
Does Scrapling work behind a corporate firewall?
Yes, if your firewall allows outbound HTTP/HTTPS traffic. For browser fetchers, ensure ports used by Playwright/Camoufox aren't blocked. You may need to configure proxy settings to route through your corporate proxy.