Web Scraping with Axios and Cheerio in 2025

Want to extract structured data from unstructured websites in 2025—without wasting your time copying and pasting everything by hand? You’re in the right place.

Web scraping is still one of the fastest ways to gather public web data at scale, and the combination of Axios and Cheerio continues to be one of the most effective stacks in the modern web scraping toolkit.

In this comprehensive guide, we’ll break down exactly how to master web scraping using Axios and Cheerio—step by step. We’ll also tackle advanced 2025-specific techniques for getting around sophisticated anti-bot systems, as well as share what to use when you need alternatives.

Introduction to Web Scraping in 2025

Web scraping has come a long way. In 2025, websites are smarter. They detect bot-like behavior faster. And scraping without basic precautions? A quick way to get blocked.

But here’s the thing—the core principles haven’t changed. You send a request, receive HTML, parse it, and extract what you need. The magic lies in how you do this in a way that’s sustainable and stealthy.

Axios handles the heavy lifting of HTTP requests. Cheerio lets you sift through HTML using syntax nearly identical to jQuery. Together, they form a lightweight, no-headache solution for scraping sites that don’t require heavy JavaScript rendering.

Why Use Axios and Cheerio in 2025?

Before jumping into the code, let’s quickly answer the why behind these tools:

  • Lightweight and efficient – Great performance with minimal setup
  • jQuery-like syntax – If you’ve ever written a jQuery selector, you’ll be productive in minutes
  • Promise-based – Axios plays well with async/await for clean code
  • Cross-environment support – Axios works in both Node.js and the browser; Cheerio handles the parsing on the server
  • Vibrant ecosystem – Tons of tutorials, StackOverflow answers, and GitHub support

While other tools like Playwright or Puppeteer are better for JavaScript-heavy websites, Axios and Cheerio win on speed and simplicity for static pages.

Step 1: Setting Up Your Environment

To get started, create a new folder and install the two packages:

mkdir axios-cheerio-scraper  
cd axios-cheerio-scraper  
npm init -y  
npm install axios cheerio  

Then create your scraper.js file. This is your sandbox for learning everything else in this guide.

// scraper.js
const axios = require('axios');
const cheerio = require('cheerio');

async function scrapeWebsite() {
  try {
    console.log('Starting scraper...');
    // We'll add our scraping code here
  } catch (error) {
    console.error('Error:', error.message);
  }
}

scrapeWebsite();
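
Run node scraper.js once to confirm the scaffold runs without errors before adding anything else.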

Step 2: Making Your First HTTP Request with Axios

Axios is your scraper’s entry point to the web. But in 2025, you can’t just hit a URL with default headers and expect results. You need to mimic a real browser.

Adding headers like User-Agent, Accept, and Accept-Language makes your request look like it came from a real browser rather than a bot.

async function scrapeWebsite() {
  try {
    // Configure request with headers to appear more browser-like
    const config = {
      headers: {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.9',
      }
    };
    
    // Make the HTTP request
    const response = await axios.get('https://example-store.com/products', config);
    
    // Log status and content type
    console.log(`Status: ${response.status}`);
    console.log(`Content type: ${response.headers['content-type']}`);
    
    // Now we have the HTML content in response.data
    const html = response.data;
    console.log('HTML content retrieved successfully!');
    
    // We'll parse this with Cheerio in the next step
    return html;
  } catch (error) {
    console.error('Error fetching data:', error.message);
    throw error;
  }
}

Once Axios successfully fetches the content, you’re holding a full HTML page in memory. That’s where Cheerio comes in.

Step 3: Parsing HTML with Cheerio

Think of Cheerio as your scraping scalpel. Once you've loaded the HTML response, you can use selectors to grab elements as if you were inside a browser's dev console.

Want the title tag? Simple.
Want to count all .product-card items? Done.

async function scrapeWebsite() {
  try {
    // ... previous Axios code ...
    
    // Parse the HTML with Cheerio
    const $ = cheerio.load(html);
    
    // Example: Get the page title
    const pageTitle = $('title').text();
    console.log('Page title:', pageTitle);
    
    // Example: Count all product cards on the page
    const productCount = $('.product-card').length;
    console.log(`Found ${productCount} products on the page`);
    
    return $;
  } catch (error) {
    console.error('Error:', error.message);
    throw error;
  }
}

This step is where your scraping logic starts to take shape—and where inspecting the page source becomes your best friend.

Step 4: Extracting Specific Data Points

You’ve got your selectors—now it's time to extract the actual data.

Whether it’s product names, prices, images, or ratings, Cheerio lets you drill down to any nested child element you need. You can clean text with .trim() and convert attributes into structured JSON fields.

async function scrapeWebsite() {
  try {
    // ... previous code ...
    
    // Create an array to store our product data
    const products = [];
    
    // Select all product cards and iterate through them
    $('.product-card').each((index, element) => {
      // Extract data from each product
      const name = $(element).find('.product-name').text().trim();
      const price = $(element).find('.product-price').text().trim();
      const imageUrl = $(element).find('img').attr('src');
      const rating = $(element).find('.rating-stars').attr('data-rating');
      
      // Add to our products array
      products.push({
        name,
        price,
        imageUrl,
        rating: rating ? parseFloat(rating) : null,
        index
      });
    });
    
    console.log(`Successfully extracted data for ${products.length} products`);
    console.log('First product:', products[0]);
    
    return products;
  } catch (error) {
    console.error('Error:', error.message);
    throw error;
  }
}

If you’re building a scraper to extract eCommerce data, news headlines, or job listings, this is where you structure your output into meaningful results.
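
Once the products array is in hand, persisting it takes one extra call. Here is a small sketch using Node's built-in fs module; the products.json file name is just an example.

const fs = require('fs');

async function run() {
  // Reuse the scraper from the previous step and save its output
  const products = await scrapeWebsite();

  // Write the results as pretty-printed JSON for later analysis
  fs.writeFileSync('products.json', JSON.stringify(products, null, 2));
  console.log(`Saved ${products.length} products to products.json`);
}

run();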

Step 5: Handling Pagination and Navigation

Most websites don’t show everything on one page. Pagination is your next hurdle.

Using a loop with a page counter, you can programmatically scrape each page until none are left or you hit a sensible cap. In 2025, you’ll also need to add delays between requests (even two seconds helps) to avoid tripping bot detection systems.

async function scrapeAllPages(baseUrl, maxPages = 5) {
  let allProducts = [];
  let currentPage = 1;
  
  while (currentPage <= maxPages) {
    try {
      console.log(`Scraping page ${currentPage}...`);
      
      // Construct the URL for the current page
      const url = `${baseUrl}?page=${currentPage}`;
      
      // Get the HTML content
      const response = await axios.get(url, {
        headers: {
          'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
        }
      });
      
      const $ = cheerio.load(response.data);
      
      // Extract products on this page
      const products = [];
      $('.product-card').each((index, element) => {
        // ... extraction code (same as before) ...
      });
      
      console.log(`Found ${products.length} products on page ${currentPage}`);
      allProducts = [...allProducts, ...products];
      
      // Check if there's a next page
      const hasNextPage = $('.pagination .next').length > 0;
      if (!hasNextPage) {
        console.log('No more pages available');
        break;
      }
      
      // Add a delay to avoid hitting rate limits (important in 2025!)
      await new Promise(resolve => setTimeout(resolve, 2000));
      
      currentPage++;
    } catch (error) {
      console.error(`Error on page ${currentPage}:`, error.message);
      break;
    }
  }
  
  console.log(`Total products scraped: ${allProducts.length}`);
  return allProducts;
}

Pro tip: Always check for the presence of a “Next” button or pagination control before deciding to stop scraping.

Step 6: Avoiding Blocks and Bans

Let’s be honest—this is where most beginner scrapers get stuck.

Websites in 2025 deploy all sorts of anti-scraping defenses:

  • Bot protection services (like Cloudflare or Akamai)
  • CAPTCHA challenges
  • IP rate limiting
  • Browser fingerprinting

Here’s how to stay under the radar:

  • Rotate User-Agents with every request
  • Add realistic headers like Referer and Cache-Control
  • Use exponential backoff for retries
  • Detect block or CAPTCHA pages in response content
  • Randomize request timing

It’s not about hacking your way through—it’s about staying polite and invisible.

// Enhanced request function with retry and rotation capabilities
async function makeRequest(url, attempt = 1, maxAttempts = 3) {
  try {
    // Rotate user agents
    const userAgents = [
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
      'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.3 Safari/605.1.15',
      'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36',
    ];
    
    // Pick a random user agent
    const randomUserAgent = userAgents[Math.floor(Math.random() * userAgents.length)];
    
    // Configure request
    const config = {
      headers: {
        'User-Agent': randomUserAgent,
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.9',
        'Referer': 'https://www.google.com/',
        'Cache-Control': 'no-cache',
        'Pragma': 'no-cache',
      },
      // Add a timeout to avoid hanging requests
      timeout: 10000
    };
    
    console.log(`Request attempt ${attempt} to ${url}`);
    const response = await axios.get(url, config);
    
    // Check if we got a captcha or block page
    if (response.data.includes('captcha') || response.data.includes('blocked')) {
      throw new Error('Detected captcha or blocking page');
    }
    
    return response.data;
  } catch (error) {
    if (attempt < maxAttempts) {
      // Calculate exponential backoff delay
      const backoffDelay = Math.pow(2, attempt) * 1000 + Math.random() * 1000;
      console.log(`Request failed. Retrying in ${Math.round(backoffDelay / 1000)} seconds...`);
      
      await new Promise(resolve => setTimeout(resolve, backoffDelay));
      return makeRequest(url, attempt + 1, maxAttempts);
    } else {
      console.error('Maximum retry attempts reached');
      throw error;
    }
  }
}
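
To plug this into the rest of the scraper, swap the plain axios.get call for makeRequest and keep the Cheerio step unchanged. A quick usage sketch (the URL is the same placeholder used earlier):

async function scrapeWithRetries() {
  // Fetch with retries and user-agent rotation, then parse as before
  const html = await makeRequest('https://example-store.com/products');
  const $ = cheerio.load(html);
  console.log('Page title:', $('title').text());
}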

Advanced Techniques for 2025

You’ve built a basic scraper—but what if the site is JavaScript-heavy or uses fingerprinting?

Here’s how to level up:

1. Use Headless Browsers When Needed

For websites that render content client-side, integrate Puppeteer or Playwright with Cheerio. These tools render the page just like a browser, giving you fully loaded HTML to parse (the hybrid sketch under point 4 below shows the full flow).

2. Fingerprint Spoofing

Use plugins like puppeteer-extra-plugin-stealth to mask the fact that you're using a headless browser.
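
Wiring the plugin in takes only a few lines. A minimal sketch, assuming puppeteer, puppeteer-extra, and puppeteer-extra-plugin-stealth are installed:

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

// Register the plugin once; from here on, use puppeteer exactly as usual
puppeteer.use(StealthPlugin());

async function launchStealthBrowser() {
  // Launches Chromium with common headless giveaways patched
  // (navigator.webdriver, missing plugins, languages, and so on)
  return puppeteer.launch({ headless: true });
}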

3. Proxy Rotation

Cycle through proxy IPs using paid or free proxy pools to avoid IP bans.
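
Axios accepts a proxy option per request, so basic rotation is just picking a different entry from your pool each time. A rough sketch follows; the proxy hosts are placeholders for whatever pool you use, and for HTTPS targets some prefer an agent such as https-proxy-agent over Axios’s built-in proxy option.

const axios = require('axios');

// Placeholder pool – substitute real proxies from your provider
const proxyPool = [
  { protocol: 'http', host: 'proxy1.example.com', port: 8080 },
  { protocol: 'http', host: 'proxy2.example.com', port: 8080 },
  { protocol: 'http', host: 'proxy3.example.com', port: 8080 },
];

async function fetchViaRandomProxy(url) {
  // Pick a random proxy for this request
  const proxy = proxyPool[Math.floor(Math.random() * proxyPool.length)];
  console.log(`Routing request through ${proxy.host}:${proxy.port}`);

  const response = await axios.get(url, { proxy, timeout: 10000 });
  return response.data;
}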

4. Hybrid Scraping

Render content with Puppeteer, extract the raw HTML, then switch to Cheerio for parsing. Best of both worlds—automation and speed.
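
A rough sketch of that hybrid flow, assuming Puppeteer is installed alongside Cheerio (the selectors are the same placeholders used earlier in this guide):

const puppeteer = require('puppeteer');
const cheerio = require('cheerio');

async function scrapeRenderedPage(url) {
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });

    // Grab the fully rendered HTML, then hand it to Cheerio for fast parsing
    const html = await page.content();
    const $ = cheerio.load(html);

    const products = [];
    $('.product-card').each((index, element) => {
      products.push({
        name: $(element).find('.product-name').text().trim(),
        price: $(element).find('.product-price').text().trim(),
      });
    });

    return products;
  } finally {
    await browser.close();
  }
}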

Python Alternative: Requests + BeautifulSoup

Prefer Python? No problem. You can achieve similar results using requests and BeautifulSoup. The syntax is different, but the logic is identical: send requests → parse DOM → extract content.

import requests
from bs4 import BeautifulSoup
import time
import random

def scrape_website(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.9',
    }
    
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.content, 'html.parser')
    
    products = []
    
    for product in soup.select('.product-card'):
        name = product.select_one('.product-name').text.strip()
        price = product.select_one('.product-price').text.strip()
        image_url = product.select_one('img')['src']
        
        products.append({
            'name': name,
            'price': price,
            'image_url': image_url
        })
    
    return products

In some cases, Python scrapers are more readable, especially for data science workflows. But if you’re embedded in the JavaScript ecosystem, Axios + Cheerio is still the fastest path to production.

Alternatives to Axios and Cheerio in 2025

Need more power or automation? Consider these tools:

  • Got-scraping – A drop-in replacement for Axios built on Got, with browser-like headers and other anti-blocking features out of the box
  • Playwright/Puppeteer – Ideal for JavaScript-heavy or interactive sites
  • ScrapingBee / ZenRows – Paid scraping APIs with proxy and CAPTCHA handling built-in
  • Selenium WebDriver – Still around, but mostly replaced by Playwright in modern stacks

Each has its place—use the right tool for the job.

Final Thoughts: Web Scraping in 2025

Scraping isn’t dead in 2025—it’s just smarter.

Axios and Cheerio remain a top-tier choice when you want clean, fast data extraction from static or semi-dynamic websites. They’re lightweight, flexible, and perfect for developers who value simplicity and speed.

But the rules of the game have evolved:

  • You need browser headers, proxy rotation, and error handling
  • You should respect sites’ robots.txt and legal limitations
  • You must treat scraping as an engineering discipline—not a hack

If you follow the best practices outlined in this guide, you’ll be well-equipped to build robust scrapers that stay undetected and deliver consistent results.

Marius Bernard

Marius Bernard is a Product Advisor, Technical SEO, & Brand Ambassador at Roundproxies. He was the lead author for the SEO chapter of the 2024 Web and a reviewer for the 2023 SEO chapter.