Want to extract structured data from unstructured websites in 2025—without wasting your time copying and pasting everything by hand? You’re in the right place.
Web scraping is still one of the fastest ways to gather public web data at scale, and the combination of Axios and Cheerio continues to be one of the most effective stacks in the modern web scraping toolkit.
In this comprehensive guide, we’ll break down exactly how to master web scraping using Axios and Cheerio—step by step. We’ll also tackle advanced 2025-specific techniques for getting around sophisticated anti-bot systems, as well as share what to use when you need alternatives.
Introduction to Web Scraping in 2025
Web scraping has come a long way. In 2025, websites are smarter. They detect bot-like behavior faster. And scraping without basic precautions? A quick way to get blocked.
But here’s the thing—the core principles haven’t changed. You send a request, receive HTML, parse it, and extract what you need. The magic lies in how you do this in a way that’s sustainable and stealthy.
Axios handles the heavy lifting of HTTP requests. Cheerio lets you sift through HTML using syntax nearly identical to jQuery. Together, they form a lightweight, no-headache solution for scraping sites that don’t require heavy JavaScript rendering.
Why Use Axios and Cheerio in 2025?
Before jumping into the code, let’s quickly answer the why behind these tools:
- Lightweight and efficient – Great performance with minimal setup
- jQuery-like syntax – If you’ve ever written a jQuery selector, you’ll be productive in minutes
- Promise-based – Axios plays well with async/await for clean code
- Cross-environment support – Works both on the server and browser
- Vibrant ecosystem – Tons of tutorials, StackOverflow answers, and GitHub support
While other tools like Playwright or Puppeteer are better for JavaScript-heavy websites, Axios and Cheerio dominate for speed and simplicity.
Step 1: Setting Up Your Environment
To get started, create a new folder and install the two packages:
mkdir axios-cheerio-scraper
cd axios-cheerio-scraper
npm init -y
npm install axios cheerio
Then create your scraper.js file. This is your sandbox for learning everything else in this guide.
// scraper.js
const axios = require('axios');
const cheerio = require('cheerio');

async function scrapeWebsite() {
  try {
    console.log('Starting scraper...');
    // We'll add our scraping code here
  } catch (error) {
    console.error('Error:', error.message);
  }
}

scrapeWebsite();
Step 2: Making Your First HTTP Request with Axios
Axios is your scraper’s entry point to the web. But in 2025, you can’t just hit a URL with default headers and expect results. You need to mimic a real browser.
Adding headers like User-Agent, Accept, and Accept-Language makes your request feel human, not like a bot.
async function scrapeWebsite() {
  try {
    // Configure request with headers to appear more browser-like
    const config = {
      headers: {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.9',
      }
    };

    // Make the HTTP request
    const response = await axios.get('https://example-store.com/products', config);

    // Log status and content type
    console.log(`Status: ${response.status}`);
    console.log(`Content type: ${response.headers['content-type']}`);

    // Now we have the HTML content in response.data
    const html = response.data;
    console.log('HTML content retrieved successfully!');

    // We'll parse this with Cheerio in the next step
    return html;
  } catch (error) {
    console.error('Error fetching data:', error.message);
    throw error;
  }
}
Once Axios successfully fetches the content, you’re holding a full HTML page in memory. That’s where Cheerio comes in.
Step 3: Parsing HTML with Cheerio
Think of Cheerio as your scraping scalpel. Once you've loaded the HTML response, you can use selectors to grab elements as if you were inside a browser's dev console.
Want the title tag? Simple.
Counting all .product-card items? Done.
async function scrapeWebsite() {
  try {
    // ... previous Axios code ...

    // Parse the HTML with Cheerio
    const $ = cheerio.load(html);

    // Example: Get the page title
    const pageTitle = $('title').text();
    console.log('Page title:', pageTitle);

    // Example: Count all product cards on the page
    const productCount = $('.product-card').length;
    console.log(`Found ${productCount} products on the page`);

    return $;
  } catch (error) {
    console.error('Error:', error.message);
    throw error;
  }
}
This step is where your scraping logic starts to take shape—and where inspecting the page source becomes your best friend.
Step 4: Extracting Specific Data Points
You’ve got your selectors—now it's time to extract the actual data.
Whether it’s product names, prices, images, or ratings, Cheerio lets you drill down to any nested child element you need. You can clean text with .trim() and convert attributes into structured JSON fields.
async function scrapeWebsite() {
  try {
    // ... previous code ...

    // Create an array to store our product data
    const products = [];

    // Select all product cards and iterate through them
    $('.product-card').each((index, element) => {
      // Extract data from each product
      const name = $(element).find('.product-name').text().trim();
      const price = $(element).find('.product-price').text().trim();
      const imageUrl = $(element).find('img').attr('src');
      const rating = $(element).find('.rating-stars').attr('data-rating');

      // Add to our products array
      products.push({
        name,
        price,
        imageUrl,
        rating: rating ? parseFloat(rating) : null,
        index
      });
    });

    console.log(`Successfully extracted data for ${products.length} products`);
    console.log('First product:', products[0]);

    return products;
  } catch (error) {
    console.error('Error:', error.message);
    throw error;
  }
}
If you’re building a scraper to extract eCommerce data, news headlines, or job listings, this is where you structure your output into meaningful results.
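If you want to keep those results around, a natural next step is writing them to disk. Here is a minimal sketch using Node's built-in fs module; the products.json filename is just an example:
const fs = require('fs');

async function saveProducts() {
  // Reuse the scraper from the previous step, which returns the products array
  const products = await scrapeWebsite();

  // Write the structured results to disk as pretty-printed JSON
  fs.writeFileSync('products.json', JSON.stringify(products, null, 2));
  console.log(`Saved ${products.length} products to products.json`);
}

saveProducts();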
Step 5: Handling Pagination and Navigation
Most websites don’t show everything on one page. Pagination is your next hurdle.
Using a loop with a page counter, you can programmatically scrape each page until there are no more. In 2025, you’ll also need to add delays between requests (even 2 seconds is helpful) to avoid tripping bot detection systems.
async function scrapeAllPages(baseUrl, maxPages = 5) {
  let allProducts = [];
  let currentPage = 1;

  while (currentPage <= maxPages) {
    try {
      console.log(`Scraping page ${currentPage}...`);

      // Construct the URL for the current page
      const url = `${baseUrl}?page=${currentPage}`;

      // Get the HTML content
      const response = await axios.get(url, {
        headers: {
          'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
        }
      });

      const $ = cheerio.load(response.data);

      // Extract products on this page
      const products = [];
      $('.product-card').each((index, element) => {
        // ... extraction code (same as before) ...
      });

      console.log(`Found ${products.length} products on page ${currentPage}`);
      allProducts = [...allProducts, ...products];

      // Check if there's a next page
      const hasNextPage = $('.pagination .next').length > 0;
      if (!hasNextPage) {
        console.log('No more pages available');
        break;
      }

      // Add a delay to avoid hitting rate limits (important in 2025!)
      await new Promise(resolve => setTimeout(resolve, 2000));

      currentPage++;
    } catch (error) {
      console.error(`Error on page ${currentPage}:`, error.message);
      break;
    }
  }

  console.log(`Total products scraped: ${allProducts.length}`);
  return allProducts;
}
Pro tip: Always check for the presence of a “Next” button or pagination control before deciding to stop scraping.
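To run the pagination scraper end to end, you can call it like this (the URL is the same placeholder used earlier in this guide, and the 10-page cap is arbitrary):
scrapeAllPages('https://example-store.com/products', 10)
  .then(products => console.log(`Scraped ${products.length} products in total`))
  .catch(error => console.error('Scrape failed:', error.message));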
Step 6: Avoiding Blocks and Bans
Let’s be honest—this is where most beginner scrapers get stuck.
Websites in 2025 deploy all sorts of anti-scraping defenses:
- Bot protection services (like Cloudflare or Akamai)
- CAPTCHA challenges
- IP rate limiting
- Browser fingerprinting
Here’s how to stay under the radar:
- Rotate User-Agents with every request
- Add realistic headers like Referer and Cache-Control
- Use exponential backoff for retries
- Detect block or CAPTCHA pages in response content
- Randomize request timing
It’s not about hacking your way through—it’s about staying polite and invisible.
// Enhanced request function with retry and rotation capabilities
async function makeRequest(url, attempt = 1, maxAttempts = 3) {
  try {
    // Rotate user agents
    const userAgents = [
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
      'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.3 Safari/605.1.15',
      'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36',
    ];

    // Pick a random user agent
    const randomUserAgent = userAgents[Math.floor(Math.random() * userAgents.length)];

    // Configure request
    const config = {
      headers: {
        'User-Agent': randomUserAgent,
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.9',
        'Referer': 'https://www.google.com/',
        'Cache-Control': 'no-cache',
        'Pragma': 'no-cache',
      },
      // Add a timeout to avoid hanging requests
      timeout: 10000
    };

    console.log(`Request attempt ${attempt} to ${url}`);
    const response = await axios.get(url, config);

    // Check if we got a captcha or block page
    if (response.data.includes('captcha') || response.data.includes('blocked')) {
      throw new Error('Detected captcha or blocking page');
    }

    return response.data;
  } catch (error) {
    if (attempt < maxAttempts) {
      // Calculate exponential backoff delay
      const backoffDelay = Math.pow(2, attempt) * 1000 + Math.random() * 1000;
      console.log(`Request failed. Retrying in ${Math.round(backoffDelay / 1000)} seconds...`);
      await new Promise(resolve => setTimeout(resolve, backoffDelay));
      return makeRequest(url, attempt + 1, maxAttempts);
    } else {
      console.error('Maximum retry attempts reached');
      throw error;
    }
  }
}
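Here is a small usage sketch showing how makeRequest can replace the direct axios.get calls from the earlier steps and feed its result straight into Cheerio (the URL is the same placeholder used throughout):
async function scrapeWithRetries() {
  // makeRequest handles user-agent rotation, retries, and basic block detection
  const html = await makeRequest('https://example-store.com/products');
  const $ = cheerio.load(html);
  console.log('Page title:', $('title').text());
}

scrapeWithRetries();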
Advanced Techniques for 2025
You’ve built a basic scraper—but what if the site is JavaScript-heavy or uses fingerprinting?
Here’s how to level up:
1. Use Headless Browsers When Needed
For websites that render content client-side, integrate Puppeteer or Playwright with Cheerio. These tools render the page just like a browser, giving you fully loaded HTML.
2. Fingerprint Spoofing
Use plugins like puppeteer-extra-plugin-stealth to mask the fact that you're using a headless browser.
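A minimal setup sketch looks like this, assuming you have installed puppeteer, puppeteer-extra, and puppeteer-extra-plugin-stealth; the openStealthBrowser helper name is just for illustration:
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

// Register the stealth plugin before launching the browser
puppeteer.use(StealthPlugin());

async function openStealthBrowser(url) {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();

  // Wait until network activity settles so client-side content has rendered
  await page.goto(url, { waitUntil: 'networkidle2' });

  const html = await page.content();
  await browser.close();
  return html;
}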
3. Proxy Rotation
Cycle through proxy IPs using paid or free proxy pools to avoid IP bans.
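Axios supports a per-request proxy option, so one simple pattern is to pick a different proxy from your pool on each call. This is only a sketch: the addresses below are documentation placeholders you would swap for your provider's real proxies.
// Placeholder proxy pool: replace with real proxies from your provider
const proxyPool = [
  { protocol: 'http', host: '203.0.113.10', port: 8080 },
  { protocol: 'http', host: '203.0.113.11', port: 8080 },
  { protocol: 'http', host: '203.0.113.12', port: 8080 },
];

function randomProxy() {
  return proxyPool[Math.floor(Math.random() * proxyPool.length)];
}

async function fetchViaProxy(url) {
  // Axios routes this request through the randomly selected proxy
  const response = await axios.get(url, { proxy: randomProxy() });
  return response.data;
}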
4. Hybrid Scraping
Render content with Puppeteer, extract the raw HTML, then switch to Cheerio for parsing. Best of both worlds—automation and speed.
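As a sketch, the hybrid pattern can reuse the openStealthBrowser helper from above for rendering and hand the result to Cheerio for parsing (the .product-card selectors are the same placeholders used throughout this guide):
async function hybridScrape(url) {
  // Step 1: let the headless browser execute JavaScript and return the full HTML
  const html = await openStealthBrowser(url);

  // Step 2: hand the rendered HTML to Cheerio for fast, jQuery-style parsing
  const $ = cheerio.load(html);
  const products = [];
  $('.product-card').each((index, element) => {
    products.push({
      name: $(element).find('.product-name').text().trim(),
      price: $(element).find('.product-price').text().trim(),
    });
  });
  return products;
}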
Python Alternative: Requests + BeautifulSoup
Prefer Python? No problem. You can achieve similar results using requests and BeautifulSoup. The syntax is different, but the logic is identical: send requests → parse DOM → extract content.
import requests
from bs4 import BeautifulSoup
import time
import random

def scrape_website(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.9',
    }

    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.content, 'html.parser')

    products = []
    for product in soup.select('.product-card'):
        name = product.select_one('.product-name').text.strip()
        price = product.select_one('.product-price').text.strip()
        image_url = product.select_one('img')['src']

        products.append({
            'name': name,
            'price': price,
            'image_url': image_url
        })

    return products
In some cases, Python scrapers are more readable, especially for data science workflows. But if you’re embedded in the JavaScript ecosystem, Axios + Cheerio is still the fastest path to production.
Alternatives to Axios and Cheerio in 2025
Need more power or automation? Consider these tools:
- Got-scraping – Smart Axios replacement with built-in anti-blocking features
- Playwright/Puppeteer – Ideal for JavaScript-heavy or interactive sites
- ScrapingBee / ZenRows – Paid scraping APIs with proxy and CAPTCHA handling built-in
- Selenium WebDriver – Still around, but mostly replaced by Playwright in modern stacks
Each has its place—use the right tool for the job.
Final Thoughts: Web Scraping in 2025
Scraping isn’t dead in 2025—it’s just smarter.
Axios and Cheerio remain a top-tier choice when you want clean, fast data extraction from static or semi-dynamic websites. They’re lightweight, flexible, and perfect for developers who value simplicity and speed.
But the rules of the game have evolved:
- You need browser headers, proxy rotation, and error handling
- You should respect sites’ robots.txt and legal limitations (see the sketch after this list)
- You must treat scraping as an engineering discipline, not a hack
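For the robots.txt point specifically, one lightweight option is the robots-parser package from npm. The sketch below assumes that dependency plus the same example-store.com placeholder and a hypothetical user agent string:
const robotsParser = require('robots-parser');

async function isScrapingAllowed(targetUrl, userAgent = 'MyScraperBot') {
  const robotsUrl = new URL('/robots.txt', targetUrl).href;

  // Fetch and parse the site's robots.txt before scraping
  const response = await axios.get(robotsUrl);
  const robots = robotsParser(robotsUrl, response.data);

  return robots.isAllowed(targetUrl, userAgent);
}

// Example: only proceed if the target path is allowed for our user agent
isScrapingAllowed('https://example-store.com/products')
  .then(allowed => console.log(allowed ? 'OK to scrape' : 'Disallowed by robots.txt'));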
If you follow the best practices outlined in this guide, you’ll be well-equipped to build robust scrapers that stay undetected and deliver consistent results.