How to Web-Scrape with JavaScript in 2025

You’ve got a bunch of data you want to collect from websites. And you’re thinking… “Do I really have to click through all of this and copy-paste it manually?”

Nope. That’s where web scraping comes in.

We used the exact approach in this guide to collect over 500,000 data points across different websites—in less than a month. Manually, that would’ve taken a small army and a full year. So yeah, this works.

In this guide, you’ll learn how to web-scrape with JavaScript in 2025. The tools have changed. The blockers have gotten smarter. But if you follow along, you’ll be up and scraping in no time.

1. Start with the right setup

Let’s not complicate things. To scrape websites with JavaScript, you need a proper setup. The good news? It’s a lot easier than it used to be.

Use Node.js 20+ or Deno

You’ll want to make sure you’re using an up-to-date runtime. Either of these works:

node -v          # check which version you're currently running
nvm install 20   # install Node.js 20 (LTS)
nvm use 20       # switch to it

Tip: Deno is a good option if you're paranoid about security, since nothing can touch the network or filesystem unless you explicitly allow it. Node.js is better if you're sticking to the mainstream ecosystem.
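If you do go with Deno, those permissions are flags you pass at run time. A quick sketch (scraper.ts is a placeholder filename):

deno run --allow-net --allow-write scraper.ts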

Structure your scraper

Don’t just toss files around. Make a clean folder with everything in its place:

web-scraper-2025/
├── src/
│   ├── scrapers/
│   ├── utils/
│   └── index.js
├── data/
├── package.json
└── .env

Install the right tools

Here’s what everyone’s using in 2025:

npm install puppeteer playwright axios cheerio jsdom
npm install dotenv winston

You won’t need every tool for every site—but you’ll be glad to have them.
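Two quick notes. Playwright also needs its browser binaries, which you grab with npx playwright install chromium. And dotenv plus winston only help if you wire them up; here's one minimal way a src/utils/logger.js could look (the filename, LOG_LEVEL variable, and log format are just this sketch's choices):

require('dotenv').config();           // load variables from .env into process.env
const winston = require('winston');

const logger = winston.createLogger({
  level: process.env.LOG_LEVEL || 'info',
  format: winston.format.combine(winston.format.timestamp(), winston.format.json()),
  transports: [new winston.transports.Console()],  // log to stdout; add a File transport if you want
});

module.exports = logger;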

2. Choose the right tool for the job

Not every website is built the same. Some load instantly. Some need to “hydrate” like a fancy salad. Your scraper needs to adapt.

If the site is simple…

Use Axios.

const axios = require('axios');

async function fetchPage(url) {
  const res = await axios.get(url, {
    // browser-like headers so basic bot filters don't reject the request outright
    headers: {
      'User-Agent': 'Mozilla/5.0',
      'Accept-Language': 'en-US,en;q=0.9',
    }
  });
  return res.data; // the raw HTML string
}
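You installed cheerio earlier, and this is where it earns its keep: it turns the raw HTML from fetchPage into something you can query with CSS selectors. A small sketch (the .title selector is a placeholder for whatever the target page actually uses):

const cheerio = require('cheerio');

async function getTitles(url) {
  const html = await fetchPage(url);
  const $ = cheerio.load(html); // parse the HTML into a queryable document
  return $('.title').map((_, el) => $(el).text().trim()).get();
}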

If the site is complex (hello, JavaScript frameworks)…

Use Playwright.

const { chromium } = require('playwright');

async function scrapeDynamicPage(url) {
  const browser = await chromium.launch({ headless: true });
  const page = await browser.newPage();
  // 'networkidle' waits until the page stops making requests, so client-rendered content is actually there
  await page.goto(url, { waitUntil: 'networkidle' });

  // this callback runs inside the page, so it can use the DOM directly
  const data = await page.evaluate(() => {
    return Array.from(document.querySelectorAll('.item')).map(item => ({
      title: item.querySelector('.title')?.textContent.trim(),
    }));
  });

  await browser.close();
  return data;
}
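Calling it works the same as the Axios version; the URL (and the .item/.title selectors above) are placeholders you swap for the real ones:

scrapeDynamicPage('https://example.com/listings')
  .then(items => console.log(items))
  .catch(err => console.error(err));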

3. Beat the blockers (because websites fight back now)

Websites in 2025 are way more aggressive about stopping scrapers. But we’ve got tricks.

Rotate your proxies

One IP + too many requests = blocked. Solution? Rotate.

const axios = require('axios');
const { getRandomProxy } = require('./utils/proxy-manager');

async function fetchWithProxy(url) {
  const proxy = getRandomProxy(); // a different proxy for every request
  return axios.get(url, {
    proxy: {
      host: proxy.host,
      port: proxy.port,
      auth: { username: proxy.username, password: proxy.password }
    },
    headers: { 'User-Agent': 'Mozilla/5.0' }
  });
}
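The getRandomProxy helper is assumed above but never shown. Here's one minimal way to write src/utils/proxy-manager.js, reading a comma-separated PROXY_LIST from .env (the variable name and format are just this sketch's convention, not anything your proxy provider dictates):

require('dotenv').config();

// expects PROXY_LIST like "user:pass@host1:8000,user:pass@host2:8000"
const proxies = (process.env.PROXY_LIST || '').split(',').filter(Boolean).map(entry => {
  const [auth, addr] = entry.split('@');
  const [username, password] = auth.split(':');
  const [host, port] = addr.split(':');
  return { username, password, host, port: Number(port) };
});

function getRandomProxy() {
  // pick a random proxy from the pool on every call
  return proxies[Math.floor(Math.random() * proxies.length)];
}

module.exports = { getRandomProxy };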

Avoid fingerprinting

Sites can now detect if you’re using a bot—unless you tweak your setup.

const { chromium } = require('playwright');

async function setupStealthBrowser() {
  const browser = await chromium.launch({
    headless: true,
    // stop Chromium from advertising that it's automation-controlled
    args: ['--disable-blink-features=AutomationControlled']
  });

  const context = await browser.newContext({
    // make the browser profile look like a normal desktop user
    userAgent: 'Mozilla/5.0',
    locale: 'en-US',
    timezoneId: 'America/New_York',
  });

  // navigator.webdriver is true in automated browsers; hide it before any page script runs
  await context.addInitScript(() => {
    Object.defineProperty(navigator, 'webdriver', { get: () => false });
  });

  return { browser, context };
}

4. Build your first scraper

Now it’s time to put everything together.

Let’s say you want to scrape product listings from a store. Here's the skeleton:

const fs = require('fs/promises');
const path = require('path');
// setupStealthBrowser is the helper from step 3 (the require path is just an example; adjust it to wherever you saved it)
const { setupStealthBrowser } = require('./utils/stealth-browser');

async function scrapeProducts(url, pages = 3) {
  const { browser, context } = await setupStealthBrowser();
  const page = await context.newPage();
  const allProducts = [];

  await page.goto(url, { waitUntil: 'networkidle' });

  for (let i = 0; i < pages; i++) {
    // wait until at least one product card is in the DOM
    await page.waitForSelector('.product-grid .product-item');

    const products = await page.evaluate(() => {
      return Array.from(document.querySelectorAll('.product-item')).map(item => ({
        name: item.querySelector('.product-name')?.textContent.trim(),
        price: item.querySelector('.product-price')?.textContent.trim(),
        available: !item.querySelector('.out-of-stock')
      }));
    });

    allProducts.push(...products);

    // stop when there's no enabled "next" button left
    const hasNext = await page.$('.pagination .next:not(.disabled)');
    if (!hasNext) break;

    await page.click('.pagination .next');
    await page.waitForTimeout(2000); // give the next page a moment to render
  }

  const timestamp = new Date().toISOString().replace(/:/g, '-');
  await fs.writeFile(
    path.join(__dirname, `../data/products-${timestamp}.json`),
    JSON.stringify(allProducts, null, 2)
  );

  await browser.close();
  console.log(`Scraped ${allProducts.length} products`);
  return allProducts;
}
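To try it end to end, call it with the listing URL you care about (example.com is a placeholder):

scrapeProducts('https://example.com/products', 5)
  .then(products => console.log(`Done: saved ${products.length} products`))
  .catch(err => console.error('Scrape failed:', err));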

5. Scale up (without getting banned)

One page at a time doesn’t cut it when you need all the data.

Here’s how to go big without burning out.

Add a rate limiter

This keeps you from bombarding the server and getting blocked:

class RateLimiter {
  constructor(maxRequests, timeWindow) {
    this.max = maxRequests;   // max requests allowed...
    this.window = timeWindow; // ...per window (in milliseconds)
    this.timestamps = [];
  }

  async acquire() {
    // drop timestamps that have aged out of the window
    const now = Date.now();
    this.timestamps = this.timestamps.filter(ts => now - ts < this.window);

    if (this.timestamps.length >= this.max) {
      // wait until the oldest request falls outside the window
      const waitTime = this.timestamps[0] + this.window - now;
      await new Promise(resolve => setTimeout(resolve, waitTime));
    }

    // record when this request actually goes out (not when we started waiting)
    this.timestamps.push(Date.now());
  }
}
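Here's one way to wire it into the proxy-rotating fetcher from step 3; the 10-requests-per-minute budget is just an example, so tune it for the site you're scraping:

const limiter = new RateLimiter(10, 60_000); // at most 10 requests per minute

async function politeFetch(url) {
  await limiter.acquire();     // blocks until a slot is free
  return fetchWithProxy(url);  // the fetcher from step 3
}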

Run scrapers in parallel

Use worker_threads to run multiple scrapers at once:

const { Worker, isMainThread, workerData, parentPort } = require('worker_threads');
// scrapeProducts is the function from step 4 (adjust the require path to wherever you export it from)
const { scrapeProducts } = require('./scrapers/products');

if (isMainThread) {
  // main thread: spawn one worker per URL, each running this same file
  const urls = [/* ... your URLs here ... */];

  urls.forEach(url => {
    new Worker(__filename, { workerData: { url } });
  });
} else {
  // worker thread: scrape the URL it was handed and report the result back
  (async () => {
    const result = await scrapeProducts(workerData.url);
    parentPort.postMessage(result);
  })();
}
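One caveat: a worker per URL also means a headless browser per URL, which chews through memory fast. A rough fix, dropped into the same isMainThread branch, is to launch workers in small batches and wait for each batch to finish before starting the next (the batch size of 3 is arbitrary):

async function runInBatches(urls, batchSize = 3) {
  for (let i = 0; i < urls.length; i += batchSize) {
    const batch = urls.slice(i, i + batchSize);
    // wait for every worker in this batch before launching the next batch
    await Promise.all(batch.map(url => new Promise((resolve, reject) => {
      const worker = new Worker(__filename, { workerData: { url } });
      worker.on('message', resolve);
      worker.on('error', reject);
    })));
  }
}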

Common mistakes to avoid

You don’t want your scraper breaking in the wild. Here’s what not to do:

  • Ignoring robots.txt – Just… don’t. Check it first.
  • Hardcoding CSS selectors – They’ll break the moment the site changes.
  • Skipping error handling – Add try-catch blocks and retries or your scraper will crash midway (see the sketch after this list).
  • Scraping too fast – Slow it down with delays and randomness.
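
For the error-handling point, a small retry wrapper keeps one flaky request from killing the whole run. A sketch (the three attempts and the delays are just reasonable defaults):

async function withRetry(fn, attempts = 3) {
  for (let i = 1; i <= attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i === attempts) throw err;                // out of attempts: give up
      const delay = 1000 * i + Math.random() * 500; // back off a little more each time, with jitter
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}

// usage: const html = await withRetry(() => fetchPage('https://example.com'));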

Here’s a better way to find elements:

async function getPrice(page) {
  const selectors = ['.product-price', '[itemprop="price"]', '.price-main'];
  for (const sel of selectors) {
    const el = await page.$(sel);
    if (el) return await el.textContent();
  }
  return null;
}

What’s next?

Congrats. You’ve got the modern web scraping fundamentals down. But there’s more you can do:

  • Build a scraping API with Express
  • Use machine learning to make your scraper adapt to changes
  • Run your scraper in the cloud (serverless-style)
  • Store data in real databases (PostgreSQL, MongoDB, etc.)

Want to keep learning? Check out these guides:

  • Ethical Web Scraping: Best Practices for 2025
  • How to Bypass CAPTCHA Using AI in JavaScript
  • Building an E-commerce Price Tracker
Marius Bernard

Marius Bernard is a Product Advisor, Technical SEO, & Brand Ambassador at Roundproxies. He was the lead author for the SEO chapter of the 2024 Web and a reviewer for the 2023 SEO chapter.