Puppeteer Extra is like Puppeteer’s more capable cousin. It brings a modular plugin system into the mix, giving you the tools you need to tackle the real-world challenges of web scraping—whether it’s stealth mode, ad blocking, solving CAPTCHAs, or bypassing anti-bot walls.
If you've ever run into roadblocks using plain Puppeteer, this guide is going to be your new best friend.
Let’s be honest: web scraping today isn’t as simple as it used to be. Many modern websites are on high alert for bots, and basic Puppeteer setups are often flagged and blocked almost immediately. You might’ve already hit issues with CAPTCHAs, sluggish performance from loading too many ads, or even full-on Cloudflare blocks.
Standard Puppeteer is great for automation—but when it comes to scraping websites that fight back, it just doesn’t cut it. You’re up against:
- Anti-bot systems that flag your scripts
- Persistent CAPTCHA prompts
- Ad-heavy pages that waste time and bandwidth
- Browser fingerprinting traps
- Authentication flows that seem impossible to get past
That’s where Puppeteer Extra shines. In this guide, we’ll walk you through 5 essential steps to get the most out of Puppeteer Extra. You’ll learn how to install it, set it up with powerful plugins, and build a scraping setup that’s hard to detect and easy to scale.
Contents
❖ Why You Can Trust This Guide
❖ Step 1: Install and Set Up Puppeteer Extra
❖ Step 2: Master the Stealth Plugin for Avoiding Detection
❖ Step 3: Implement Automatic CAPTCHA Solving
❖ Step 4: Optimize Performance with Resource Blocking
❖ Step 5: Combine Multiple Plugins for Advanced Scraping
❖ Final Thoughts
Why You Can Trust This Guide
Here’s the deal—websites are evolving fast. They’re using smarter tools to detect bots and shut down scraping scripts. That’s the challenge.
The good news? Puppeteer Extra is designed to meet that challenge head-on. Its plugin system has been tested against real-world obstacles and has proven effective across a wide range of websites.
The techniques covered here are widely used in production scraping tools, so this guide focuses on what works in practice.
Step 1: Install and Set Up Puppeteer Extra
Before you can unleash the power of Puppeteer Extra, you need to get it installed and ready to go. The setup is quick, and once you’ve done it, you’ll have access to the whole plugin ecosystem.
Basic Installation
npm install puppeteer puppeteer-extra
# or using yarn
yarn add puppeteer puppeteer-extra
Your First Puppeteer Extra Script
const puppeteer = require('puppeteer-extra');
(async () => {
  // Launch browser with puppeteer-extra
  const browser = await puppeteer.launch({
    headless: false, // Set to true for production
    defaultViewport: null
  });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  // Take a screenshot to verify it's working
  await page.screenshot({ path: 'test.png' });
  await browser.close();
})();
TypeScript Support
import puppeteer from 'puppeteer-extra';
// TypeScript will automatically infer types
const browser = await puppeteer.launch();
Pro Tip: Develop with headless: false so you can watch what’s going on. Save headless: true for production once everything’s dialed in.
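One way to avoid editing the script every time you switch between development and production is to derive the launch options from an environment variable. The sketch below assumes the common `NODE_ENV` convention and uses a hypothetical helper named `buildLaunchOptions`; neither is part of Puppeteer Extra itself.

```javascript
// Hypothetical helper: choose launch options based on the environment name.
// Assumption: NODE_ENV === 'production' means "run headless".
function buildLaunchOptions(env = process.env.NODE_ENV) {
  const isProduction = env === 'production';
  return {
    headless: isProduction, // visible browser window while developing
    defaultViewport: null   // use the window size instead of a fixed viewport
  };
}

console.log(buildLaunchOptions('development'));
console.log(buildLaunchOptions('production'));
```

The returned object can be passed straight to `puppeteer.launch(buildLaunchOptions())`.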
Step 2: Master the Stealth Plugin for Avoiding Detection
The Stealth plugin is a game-changer. It helps disguise your automation as a real human browser session by tweaking and masking telltale signs that bots usually leave behind.
Installing the Stealth Plugin
npm install puppeteer-extra-plugin-stealth
Basic Stealth Configuration
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());
(async () => {
  const browser = await puppeteer.launch({
    headless: true,
    args: ['--no-sandbox']
  });
  const page = await browser.newPage();
  await page.goto('https://bot.sannysoft.com');
  // page.waitForTimeout() was removed in newer Puppeteer releases; a plain timer works in any version
  await new Promise((resolve) => setTimeout(resolve, 5000));
  await page.screenshot({ path: 'stealth-test.png', fullPage: true });
  await browser.close();
})();
What the Stealth Plugin Does
It quietly disables or modifies a bunch of browser characteristics that typically scream “bot.” That includes:
- Removing the navigator.webdriver flag
- Tweaking the user agent to look more human
- Faking plugin details
- Overriding permission prompts
- Masking WebGL info
- Restoring window.outerWidth and window.outerHeight, which are missing in headless mode
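To see whether these evasions are taking effect, you can read a few of the same values from the page and check them yourself. The sketch below is only an illustration: `findBotSignals` is a hypothetical helper, and the three checks are a small sample of what real detection scripts look at.

```javascript
// Hypothetical helper: list the values that commonly give automation away.
// `signals` is a plain object you collect from the page yourself.
function findBotSignals(signals) {
  const flags = [];
  if (signals.webdriver === true) {
    flags.push('navigator.webdriver is true');
  }
  if (signals.pluginCount === 0) {
    flags.push('no browser plugins reported');
  }
  if (/HeadlessChrome/.test(signals.userAgent)) {
    flags.push('user agent contains "HeadlessChrome"');
  }
  return flags;
}

// In a real script the object could be collected like this:
//   const signals = await page.evaluate(() => ({
//     webdriver: navigator.webdriver,
//     pluginCount: navigator.plugins.length,
//     userAgent: navigator.userAgent
//   }));
// Made-up sample resembling an unprotected headless browser:
const sample = {
  webdriver: true,
  pluginCount: 0,
  userAgent: 'Mozilla/5.0 (X11; Linux x86_64) HeadlessChrome/120.0.0.0'
};
console.log(findBotSignals(sample));
```

An empty array means none of these particular signals were found. It does not mean the browser is undetectable.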
Advanced Stealth Configuration
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
const stealth = StealthPlugin();
stealth.enabledEvasions.delete('user-agent-override');
puppeteer.use(stealth);
Heads-up: Even the best evasion tricks won’t always work on tough systems like Cloudflare. Sometimes, you’ll need a backup plan—like rotating proxies or scraping APIs.
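As a rough idea of what proxy rotation can look like, the sketch below hands out proxies in round-robin order and turns each one into a Chromium `--proxy-server` launch argument. The helper name `createProxyRotator` and the proxy addresses are placeholders, not part of any plugin.

```javascript
// Hypothetical helper: hand out proxies in round-robin order.
function createProxyRotator(proxies) {
  if (proxies.length === 0) {
    throw new Error('At least one proxy is required');
  }
  let index = 0;
  return function nextLaunchArgs() {
    const proxy = proxies[index % proxies.length];
    index += 1;
    // --proxy-server is a Chromium command-line switch
    return [`--proxy-server=${proxy}`];
  };
}

// Placeholder addresses; replace them with proxies you actually control.
const nextLaunchArgs = createProxyRotator([
  'http://proxy-a.example:8080',
  'http://proxy-b.example:8080'
]);
console.log(nextLaunchArgs());
console.log(nextLaunchArgs());
console.log(nextLaunchArgs()); // wraps around to the first proxy again
```

Each new browser could then be started with `puppeteer.launch({ args: nextLaunchArgs() })`, so consecutive launches leave through different IPs.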
Step 3: Implement Automatic CAPTCHA Solving
Let’s face it—CAPTCHAs are one of the most annoying scraping hurdles. Fortunately, Puppeteer Extra can handle them automatically using third-party services.
Installing the Recaptcha Plugin
npm install puppeteer-extra-plugin-recaptcha
Setting Up CAPTCHA Solving
const puppeteer = require('puppeteer-extra');
const RecaptchaPlugin = require('puppeteer-extra-plugin-recaptcha');
puppeteer.use(
  RecaptchaPlugin({
    provider: {
      id: '2captcha',
      token: 'YOUR_2CAPTCHA_API_KEY'
    },
    visualFeedback: true
  })
);
(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto('https://www.google.com/recaptcha/api2/demo');
  // Solve every reCAPTCHA found on the page
  await page.solveRecaptchas();
  // Solving does not submit the form, so click submit and wait for the navigation together
  await Promise.all([
    page.waitForNavigation(),
    page.click('#recaptcha-demo-submit')
  ]);
  console.log('CAPTCHA solved and form submitted!');
  await browser.close();
})();
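Hardcoding the API key is fine for a quick test, but it is easy to commit by accident. A small sketch of reading it from the environment instead, using a hypothetical `requireEnv` helper and the `CAPTCHA_API_KEY` variable name that the combined setup in Step 5 also relies on:

```javascript
// Hypothetical helper: read a required setting from the environment
// and fail early with a clear message when it is missing.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== 'string' || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Demo with an explicit object so the example runs without real secrets.
const demoEnv = { CAPTCHA_API_KEY: 'demo-key' };
console.log(requireEnv('CAPTCHA_API_KEY', demoEnv));
```

In the plugin configuration this would replace the literal string, for example `token: requireEnv('CAPTCHA_API_KEY')`.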
Handling Multiple CAPTCHAs and Frames
for (const frame of page.mainFrame().childFrames()) {
  await frame.solveRecaptchas();
}
Error Handling
try {
  const { solved, solutions } = await page.solveRecaptchas();
  if (solved.length === 0) {
    console.log('No CAPTCHAs found on the page');
  } else {
    console.log(`Solved ${solved.length} CAPTCHAs`);
  }
} catch (error) {
  console.error('CAPTCHA solving failed:', error);
}
Pro Tip: Keep an eye on your solving service costs. If you're scraping high-traffic sites, the CAPTCHA solving fees can add up quickly.
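To get a feel for how quickly those fees grow, a back-of-the-envelope estimate is enough. The helper name and the price below are made up for illustration; plug in whatever your provider actually charges.

```javascript
// Hypothetical helper: rough cost of CAPTCHA solving over a period.
// pricePerThousand is the provider's price for 1,000 solved CAPTCHAs.
function estimateCaptchaCost(solvesPerDay, pricePerThousand, days = 30) {
  const totalSolves = solvesPerDay * days;
  return (totalSolves / 1000) * pricePerThousand;
}

// 2,000 solves a day at an illustrative price of 3 per 1,000 solves, over 30 days
console.log(estimateCaptchaCost(2000, 3)); // 180
```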
Step 4: Optimize Performance with Resource Blocking
Why waste time downloading stuff you don’t need? Most sites serve images, fonts, ads, and other fluff that can slow you down. Here's how to block it smartly.
Installing Performance Plugins
npm install puppeteer-extra-plugin-adblocker puppeteer-extra-plugin-block-resources
Using the Adblocker Plugin
const puppeteer = require('puppeteer-extra');
const AdblockerPlugin = require('puppeteer-extra-plugin-adblocker');
puppeteer.use(AdblockerPlugin({ blockTrackers: true }));
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.example-with-ads.com');
  const content = await page.evaluate(() => document.body.innerText);
  console.log(content);
  await browser.close();
})();
Selective Resource Blocking
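const BlockResourcesPlugin = require('puppeteer-extra-plugin-block-resources');
puppeteer.use(
  BlockResourcesPlugin({
    blockedTypes: new Set(['image', 'stylesheet', 'font', 'media']),
    interceptResolutionPriority: 1
  })
);
// Register this inside your async function, after `page` has been created.
// The plugin handles requests too, so this handler has to cooperate with it.
page.on('request', (request) => {
  // Skip requests that another handler has already resolved
  if (request.isInterceptResolutionHandled()) return;
  const url = request.url();
  if (url.includes('doubleclick.net') || url.includes('google-analytics.com')) {
    // Passing a priority keeps this handler in cooperative intercept mode
    request.abort('failed', 1);
  } else {
    request.continue(request.continueRequestOverrides(), 0);
  }
});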
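The blocking decision itself is easier to test when it lives in a plain function rather than inside the request handler. The sketch below assumes a hypothetical `shouldBlockRequest` helper and matches on the parsed hostname, so a blocked domain that merely appears in a query string does not trigger a false positive.

```javascript
// Hypothetical helper: decide whether a request should be aborted.
const BLOCKED_TYPES = new Set(['image', 'stylesheet', 'font', 'media']);
const BLOCKED_HOSTS = ['doubleclick.net', 'google-analytics.com'];

function shouldBlockRequest(url, resourceType) {
  if (BLOCKED_TYPES.has(resourceType)) return true;
  let hostname;
  try {
    hostname = new URL(url).hostname;
  } catch {
    return false; // not a parseable URL, let it through
  }
  // Match the domain itself or any subdomain of it
  return BLOCKED_HOSTS.some(
    (host) => hostname === host || hostname.endsWith(`.${host}`)
  );
}

console.log(shouldBlockRequest('https://stats.g.doubleclick.net/collect', 'xhr'));      // true
console.log(shouldBlockRequest('https://example.com/?ref=doubleclick.net', 'document')); // false
```

Inside a request handler, the call would be `shouldBlockRequest(request.url(), request.resourceType())`.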
Performance Monitoring
const startTime = Date.now();
await page.goto('https://heavy-website.com', {
  waitUntil: 'networkidle0'
});
const loadTime = Date.now() - startTime;
console.log(`Page loaded in ${loadTime}ms`);
const metrics = await page.metrics();
console.log('Page metrics:', metrics);
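A single measurement can be misleading because network conditions vary from run to run. One option is to collect several load times with and without blocking and compare the summaries. The helper name and the sample numbers below are invented for illustration, not measured results.

```javascript
// Hypothetical helper: summarize several load-time samples (in milliseconds).
function summarizeTimings(samples) {
  if (samples.length === 0) {
    throw new Error('Need at least one sample');
  }
  const sorted = [...samples].sort((a, b) => a - b);
  const middle = Math.floor(sorted.length / 2);
  const median = sorted.length % 2 === 1
    ? sorted[middle]
    : (sorted[middle - 1] + sorted[middle]) / 2;
  return { min: sorted[0], median, max: sorted[sorted.length - 1] };
}

// Made-up samples: five runs without blocking, five runs with blocking
const withoutBlocking = [4200, 3900, 5100, 4000, 4400];
const withBlocking = [1800, 2100, 1700, 1900, 2500];
console.log('without blocking:', summarizeTimings(withoutBlocking));
console.log('with blocking:', summarizeTimings(withBlocking));
```

Comparing medians rather than single runs makes it less likely that one slow response skews the conclusion.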
Watch out: Don’t block essential scripts that the page needs to display your target data.
Step 5: Combine Multiple Plugins for Advanced Scraping
This is where it all comes together. You’ll see how to combine stealth, CAPTCHA solving, and ad blocking in one scraper that handles most of the obstacles covered so far.
Complete Advanced Scraping Setup
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
const RecaptchaPlugin = require('puppeteer-extra-plugin-recaptcha');
const AdblockerPlugin = require('puppeteer-extra-plugin-adblocker');
// Not installed in the earlier steps: npm install puppeteer-extra-plugin-anonymize-ua
const AnonymizeUAPlugin = require('puppeteer-extra-plugin-anonymize-ua');
// Configure all plugins
puppeteer.use(StealthPlugin());
puppeteer.use(AdblockerPlugin({ blockTrackers: true }));
puppeteer.use(AnonymizeUAPlugin());
puppeteer.use(
  RecaptchaPlugin({
    provider: {
      id: '2captcha',
      token: process.env.CAPTCHA_API_KEY
    },
    visualFeedback: true
  })
);
async function scrapePage(url) {
  const browser = await puppeteer.launch({
    headless: true,
    args: [
      '--no-sandbox',
      '--disable-setuid-sandbox',
      '--disable-dev-shm-usage',
      '--disable-accelerated-2d-canvas',
      '--no-first-run',
      '--no-zygote',
      '--disable-gpu'
    ]
  });
  try {
    const page = await browser.newPage();
    // Set the viewport size
    await page.setViewport({ width: 1920, height: 1080 });
    // Enable request interception for additional control
    await page.setRequestInterception(true);
    page.on('request', (request) => {
      // Skip requests another handler (such as the adblocker) has already resolved
      if (request.isInterceptResolutionHandled()) return;
      // Additional custom blocking logic if needed
      request.continue(request.continueRequestOverrides(), 0);
    });
    // Navigate with timeout
    await page.goto(url, {
      waitUntil: 'networkidle0',
      timeout: 30000
    });
    // Solve any CAPTCHAs
    await page.solveRecaptchas();
    // Wait for content to load
    await page.waitForSelector('body', { timeout: 10000 });
    // Extract data
    const data = await page.evaluate(() => {
      // Your extraction logic here
      return {
        title: document.title,
        content: document.body.innerText
      };
    });
    return data;
  } catch (error) {
    console.error('Scraping failed:', error);
    throw error;
  } finally {
    await browser.close();
  }
}
// Usage with error handling and retries
async function scrapeWithRetry(url, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      const data = await scrapePage(url);
      return data;
    } catch (error) {
      console.log(`Attempt ${i + 1} failed, retrying...`);
      if (i === maxRetries - 1) throw error;
      // Wait before retrying
      await new Promise(resolve => setTimeout(resolve, 2000 * (i + 1)));
    }
  }
}
// Example usage
(async () => {
  try {
    const data = await scrapeWithRetry('https://example.com');
    console.log('Scraped data:', data);
  } catch (error) {
    console.error('All retries failed:', error);
  }
})();
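The wait between attempts in scrapeWithRetry grows linearly: 2000 ms after the first failure, 4000 ms after the second, and no wait after the last attempt because the error is rethrown. The short sketch below reproduces that schedule with a hypothetical `retryDelays` helper, which makes it easy to see the worst-case waiting time before you change `maxRetries`.

```javascript
// Mirrors the wait logic in scrapeWithRetry: after failed attempt i (0-based),
// wait baseMs * (i + 1), except after the final attempt.
function retryDelays(maxRetries = 3, baseMs = 2000) {
  const delays = [];
  for (let i = 0; i < maxRetries - 1; i++) {
    delays.push(baseMs * (i + 1));
  }
  return delays;
}

console.log(retryDelays());       // [2000, 4000]
console.log(retryDelays(5, 500)); // [500, 1000, 1500, 2000]

// Total time spent waiting in the worst case with the defaults
const totalWait = retryDelays().reduce((sum, ms) => sum + ms, 0);
console.log(totalWait); // 6000
```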
Additional Useful Plugins
- puppeteer-extra-plugin-proxy: Use proxies to rotate IPs
- puppeteer-extra-plugin-user-preferences: Set custom browser behaviors
- puppeteer-extra-plugin-devtools: Great for debugging with DevTools
Plugin Compatibility
Plugins typically play well together, but:
- Some might override the same browser settings
- Too many plugins can slow things down
- Always test your setup thoroughly
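One simple check worth automating is whether the same plugin has been registered twice, which can happen when setup code is spread across several modules. The sketch below uses a hypothetical `findDuplicatePlugins` helper that works on a plain list of names; how you obtain that list is an assumption noted in the comments.

```javascript
// Hypothetical helper: report plugin names that appear more than once.
function findDuplicatePlugins(names) {
  const seen = new Set();
  const duplicates = new Set();
  for (const name of names) {
    if (seen.has(name)) duplicates.add(name);
    seen.add(name);
  }
  return [...duplicates];
}

// Assumption: puppeteer-extra exposes registered plugins as `puppeteer.plugins`,
// each with a `name`, so the list could come from
//   puppeteer.plugins.map((plugin) => plugin.name)
// Verify this against the version you have installed.
console.log(findDuplicatePlugins(['stealth', 'adblocker', 'stealth']));
```

This only catches exact duplicates. Two different plugins that touch the same browser setting still need manual testing.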
Final Thoughts
Puppeteer Extra unlocks a whole new level of scraping power. Whether it’s flying under the radar with stealth, cutting through CAPTCHA roadblocks, or speeding up your scrapes by skipping unnecessary resources—it’s built for serious scraping.
That said, the web is always evolving. The tactics that work today might not work tomorrow. So stay nimble, keep learning, and always respect the rules (like robots.txt and terms of service).