You’ve got a bunch of data you want to collect from websites. And you’re thinking… “Do I really have to click through all of this and copy-paste it manually?”
Nope. That’s where web scraping comes in.
We used the exact approach described in this guide to collect over 500,000 data points across different websites, in less than a month. Manually, that would’ve taken a small army and a full year. So yeah, this works.
In this guide, you’ll learn how to web-scrape with JavaScript in 2025. The tools have changed. The blockers have gotten smarter. But if you follow along, you’ll be up and scraping in no time.
1. Start with the right setup
Let’s not complicate things. To scrape websites with JavaScript, you need a proper setup. The good news? It’s a lot easier than it used to be.
Use Node.js 20+ or Deno
You’ll want to make sure you’re using an up-to-date runtime. Either of these works:
# check which version you're currently on
node -v
# install and switch to Node 20 with nvm
nvm install 20
nvm use 20
Tip: Deno is a good option if you’re paranoid about security. Node.js is better if you’re sticking to the mainstream ecosystem.
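If you do go the Deno route, a minimal fetch-based scraper needs no installs at all: fetch is built in, and network access is opt-in via a permission flag. The URL below is just a placeholder:

// scraper.js (run with: deno run --allow-net scraper.js)
const res = await fetch('https://example.com');   // placeholder URL
const html = await res.text();
console.log(`Fetched ${html.length} characters`);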
Structure your scraper
Don’t just toss files around. Make a clean folder with everything in its place:
web-scraper-2025/
├── src/
│ ├── scrapers/
│ ├── utils/
│ └── index.js
├── data/
├── package.json
└── .env
Install the right tools
Here’s what everyone’s using in 2025:
npm install puppeteer playwright axios cheerio jsdom
npm install --save-dev dotenv winston
You won’t need every tool for every site—but you’ll be glad to have them.
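The dev dependencies are worth wiring up right away. Here's a minimal sketch of loading .env values with dotenv and creating a winston logger; the TARGET_URL variable is just an example of something you might keep in .env:

// src/utils/logger.js (a minimal setup sketch)
require('dotenv').config();   // loads .env: proxy credentials, target URLs, etc.

const winston = require('winston');

const logger = winston.createLogger({
  level: 'info',
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.json()
  ),
  transports: [new winston.transports.Console()],
});

module.exports = logger;

// usage elsewhere: logger.info('Scraping started', { target: process.env.TARGET_URL });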
2. Choose the right tool for the job
Not every website is built the same. Some load instantly. Some need to “hydrate” like a fancy salad. Your scraper needs to adapt.
If the site is simple…
Use Axios.
const axios = require('axios');

async function fetchPage(url) {
  const res = await axios.get(url, {
    headers: {
      'User-Agent': 'Mozilla/5.0',
      'Accept-Language': 'en-US,en;q=0.9',
    },
  });
  return res.data;
}
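Axios only gets you the raw HTML; pair it with Cheerio (installed earlier) to actually pull data out. The '.item' and '.title' selectors here are placeholders:

const cheerio = require('cheerio');

async function scrapeTitles(url) {
  const html = await fetchPage(url);   // fetchPage() from the snippet above
  const $ = cheerio.load(html);
  // placeholder selectors; swap in whatever the target site actually uses
  return $('.item')
    .map((_, el) => $(el).find('.title').text().trim())
    .get();
}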
If the site is complex (hello, JavaScript frameworks)…
Use Playwright.
const { chromium } = require('playwright');

async function scrapeDynamicPage(url) {
  const browser = await chromium.launch({ headless: true });
  const page = await browser.newPage();

  // wait until network activity settles so client-side rendering has finished
  await page.goto(url, { waitUntil: 'networkidle' });

  const data = await page.evaluate(() => {
    return Array.from(document.querySelectorAll('.item')).map(item => ({
      title: item.querySelector('.title')?.textContent.trim(),
    }));
  });

  await browser.close();
  return data;
}
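Calling it is straightforward; the URL is a placeholder:

scrapeDynamicPage('https://example.com/products')   // placeholder URL
  .then(items => console.log(`Found ${items.length} items`))
  .catch(err => console.error('Scrape failed:', err.message));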
3. Beat the blockers (because websites fight back now)
Websites in 2025 are way more aggressive about stopping scrapers. But we’ve got tricks.
Rotate your proxies
One IP = one scraped page = blocked. Solution? Rotate.
const axios = require('axios');
const { getRandomProxy } = require('./utils/proxy-manager');

async function fetchWithProxy(url) {
  // pick a different proxy on every request
  const proxy = getRandomProxy();
  return axios.get(url, {
    proxy: {
      host: proxy.host,
      port: proxy.port,
      auth: { username: proxy.username, password: proxy.password },
    },
    headers: { 'User-Agent': 'Mozilla/5.0' },
  });
}
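The proxy-manager module itself isn't shown above. Here's a minimal sketch of what ./utils/proxy-manager.js could look like, assuming a PROXY_LIST entry in .env formatted as comma-separated host:port:username:password strings (both the variable name and the format are assumptions, not a standard):

// src/utils/proxy-manager.js (a sketch, not a drop-in implementation)
require('dotenv').config();

// PROXY_LIST is an assumed env var, e.g. "1.2.3.4:8080:user:pass,5.6.7.8:8080:user:pass"
const proxies = (process.env.PROXY_LIST || '')
  .split(',')
  .filter(Boolean)
  .map(entry => {
    const [host, port, username, password] = entry.split(':');
    return { host, port: Number(port), username, password };
  });

function getRandomProxy() {
  if (proxies.length === 0) throw new Error('No proxies configured in PROXY_LIST');
  return proxies[Math.floor(Math.random() * proxies.length)];
}

module.exports = { getRandomProxy };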
Avoid fingerprinting
Sites can now detect if you’re using a bot—unless you tweak your setup.
const { chromium } = require('playwright');

async function setupStealthBrowser() {
  const browser = await chromium.launch({
    headless: true,
    args: ['--disable-blink-features=AutomationControlled'],
  });

  const context = await browser.newContext({
    userAgent: 'Mozilla/5.0',
    locale: 'en-US',
    timezoneId: 'America/New_York',
  });

  // hide the webdriver flag that automated browsers normally expose
  await context.addInitScript(() => {
    Object.defineProperty(navigator, 'webdriver', { get: () => false });
  });

  return { browser, context };
}
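Used on its own, it looks like this (the URL is a placeholder); step 4 wires it into a full scraper:

(async () => {
  const { browser, context } = await setupStealthBrowser();
  const page = await context.newPage();
  await page.goto('https://example.com');   // placeholder URL
  console.log(await page.title());
  await browser.close();
})();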
4. Build your first scraper
Now it’s time to put everything together.
Let’s say you want to scrape product listings from a store. Here's the skeleton:
const { chromium } = require('playwright');
const fs = require('fs/promises');
const path = require('path');
// setupStealthBrowser() is the helper from step 3

async function scrapeProducts(url, pages = 3) {
  const { browser, context } = await setupStealthBrowser();
  const page = await context.newPage();
  const allProducts = [];

  await page.goto(url, { waitUntil: 'networkidle' });

  for (let i = 0; i < pages; i++) {
    await page.waitForSelector('.product-grid .product-item');

    const products = await page.evaluate(() => {
      return Array.from(document.querySelectorAll('.product-item')).map(item => ({
        name: item.querySelector('.product-name')?.textContent.trim(),
        price: item.querySelector('.product-price')?.textContent.trim(),
        available: !item.querySelector('.out-of-stock'),
      }));
    });
    allProducts.push(...products);

    // stop when there's no enabled "next" button left
    const hasNext = await page.$('.pagination .next:not(.disabled)');
    if (!hasNext) break;

    await page.click('.pagination .next');
    await page.waitForTimeout(2000);   // crude wait for the next page to render
  }

  const timestamp = new Date().toISOString().replace(/:/g, '-');
  await fs.writeFile(
    path.join(__dirname, `../data/products-${timestamp}.json`),
    JSON.stringify(allProducts, null, 2)
  );

  await browser.close();
  console.log(`Scraped ${allProducts.length} products`);
  return allProducts;
}
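To run it, call the function with a starting URL (a placeholder below) and a page count:

scrapeProducts('https://example.com/shop', 5)   // placeholder URL
  .then(products => console.log(`Done: ${products.length} products`))
  .catch(err => console.error('Scrape failed:', err));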
5. Scale up (without getting banned)
One page at a time doesn’t cut it when you need all the data.
Here’s how to go big without burning out.
Add a rate limiter
This keeps you from bombarding the server and getting blocked:
class RateLimiter {
  constructor(maxRequests, timeWindow) {
    this.max = maxRequests;
    this.window = timeWindow;   // window length in milliseconds
    this.timestamps = [];
  }

  async acquire() {
    const now = Date.now();
    // drop timestamps that have aged out of the window
    this.timestamps = this.timestamps.filter(ts => now - ts < this.window);

    if (this.timestamps.length >= this.max) {
      // wait until the oldest request falls out of the window
      const waitTime = this.timestamps[0] + this.window - now;
      await new Promise(resolve => setTimeout(resolve, waitTime));
    }

    this.timestamps.push(Date.now());
  }
}
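Usage is simple: call acquire() before every request. The 10-requests-per-minute numbers below are just an example, and the fetchWithProxy() call reuses the helper from step 3:

const limiter = new RateLimiter(10, 60000);   // example: at most 10 requests per minute

async function politeFetch(url) {
  await limiter.acquire();      // blocks until a slot is free
  return fetchWithProxy(url);   // fetchWithProxy() from step 3
}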
Run scrapers in parallel
Use worker_threads to run multiple scrapers at once:
const { Worker, isMainThread, workerData, parentPort } = require('worker_threads');
// scrapeProducts() from step 4 must be defined or required in this same file

if (isMainThread) {
  const urls = [/* ... your URLs here ... */];
  // spawn one worker per URL
  urls.forEach(url => {
    new Worker(__filename, { workerData: { url } });
  });
} else {
  (async () => {
    const result = await scrapeProducts(workerData.url);
    parentPort.postMessage(result);
  })();
}
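One worker per URL gets out of hand quickly once you have hundreds of URLs. A simple batching pattern, built on the snippet above, keeps concurrency bounded; the batch size of 4 is an arbitrary example:

const BATCH_SIZE = 4;   // arbitrary example; tune to your machine and the target site

async function runInBatches(urls) {
  for (let i = 0; i < urls.length; i += BATCH_SIZE) {
    const batch = urls.slice(i, i + BATCH_SIZE);
    // wait for this batch of workers to finish before starting the next one
    await Promise.all(batch.map(url => new Promise((resolve, reject) => {
      const worker = new Worker(__filename, { workerData: { url } });
      worker.on('message', resolve);
      worker.on('error', reject);
    })));
  }
}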
Common mistakes to avoid
You don’t want your scraper breaking in the wild. Here’s what not to do:
- Ignoring robots.txt – Just… don’t. Check it first.
- Hardcoding CSS selectors – They’ll break the moment the site changes.
- Skipping error handling – Add try-catch blocks or your scraper will crash midway.
- Scraping too fast – Slow it down with delays and randomness (see the delay helper after the selector example below).
Here’s a better way to find elements:
async function getPrice(page) {
  const selectors = ['.product-price', '[itemprop="price"]', '.price-main'];
  for (const sel of selectors) {
    const el = await page.$(sel);
    if (el) return await el.textContent();
  }
  return null;
}
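And for the “scraping too fast” mistake, a tiny helper that sleeps for a random interval between requests goes a long way. The 1–3 second range is arbitrary:

// random pause between requests; the 1000–3000 ms range is just an example
function randomDelay(min = 1000, max = 3000) {
  const ms = min + Math.random() * (max - min);
  return new Promise(resolve => setTimeout(resolve, ms));
}

// usage inside a scraping loop: await randomDelay();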
What’s next?
Congrats. You’ve got the modern web scraping fundamentals down. But there’s more you can do:
- Build a scraping API with Express
- Use machine learning to make your scraper adapt to changes
- Run your scraper in the cloud (serverless-style)
- Store data in real databases (PostgreSQL, MongoDB, etc.)
Want to keep learning? Check out these guides:
- Ethical Web Scraping: Best Practices for 2025
- How to Bypass CAPTCHA Using AI in JavaScript
- Building an E-commerce Price Tracker