Airtop is a cloud browser automation platform that lets you control browsers through natural language commands and handle complex authentication like OAuth, 2FA, and CAPTCHAs automatically.
In this guide, we'll walk through setting up Airtop from scratch, including proxy configuration for bypassing anti-bot detection and handling authenticated sessions.
What You'll Learn
You'll learn how to:
- Set up Airtop with proper API authentication
- Configure residential proxies for stealth scraping
- Handle complex authentication flows (OAuth, 2FA)
- Build automated scrapers with natural language commands
- Bypass anti-bot measures effectively
Step 1: Get Your API Key and Initial Setup
First things first - you need an Airtop account and API key to get started.
Setting Up Your Account
- Head to portal.airtop.ai and create a free account
- Navigate to the API Keys section in your dashboard
- Click "+ Create new key" and give it a descriptive name
- Copy the generated key immediately (you won't see it again)
Installing the SDK
Airtop provides SDKs for both TypeScript/Node.js and Python. Pick your weapon:
For Node.js/TypeScript:
npm install @airtop/sdk dotenv
# or with yarn
yarn add @airtop/sdk dotenv
For Python:
pip install airtop python-dotenv
Environment Configuration
Create a .env file in your project root:
AIRTOP_API_KEY=your_api_key_here
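A missing or blank key usually shows up later as a confusing authentication error, so it can help to check it before constructing the client. The sketch below is one way to do that; requireEnv is a hypothetical helper for this guide, not part of the Airtop SDK.

```typescript
// Hypothetical helper (not part of the Airtop SDK): fail fast when a
// required environment variable is missing or blank.
type Env = Record<string, string | undefined>;

function requireEnv(env: Env, name: string): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage, after dotenv.config() has loaded the .env file:
// const apiKey = requireEnv(process.env, "AIRTOP_API_KEY");
```

Taking the environment as a parameter keeps the helper easy to test without touching the real process environment.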
Step 2: Initialize Your First Browser Session
Let's create a basic browser session to test everything's working:
TypeScript Example:
import { AirtopClient } from "@airtop/sdk";
import * as dotenv from 'dotenv';
dotenv.config();
const client = new AirtopClient({
apiKey: process.env.AIRTOP_API_KEY
});
async function createSession() {
try {
// Create a browser session
const session = await client.sessions.create({
configuration: {
// Session timeout in minutes
timeoutMinutes: 10
}
});
console.log(`Session created: ${session.id}`);
// Create a window and navigate
const window = await client.windows.create(session.id, {
url: "https://example.com"
});
console.log(`Window created: ${window.id}`);
return { session, window };
} catch (error) {
console.error("Failed to create session:", error);
}
}
createSession();
Python Example:
import os
from airtop import Airtop
from dotenv import load_dotenv
load_dotenv()
# Airtop is the synchronous client, so the calls below are made without await
client = Airtop(
    api_key=os.getenv("AIRTOP_API_KEY")
)
def create_session():
    try:
        # Create a browser session
        session = client.sessions.create(
            configuration={
                "timeoutMinutes": 10
            }
        )
        print(f"Session created: {session.id}")
        # Create a window
        window = client.windows.create(
            session_id=session.id,
            url="https://example.com"
        )
        print(f"Window created: {window.id}")
        return session, window
    except Exception as e:
        print(f"Failed to create session: {e}")
if __name__ == "__main__":
    create_session()
Step 3: Configure Residential Proxies
Here's where things get interesting. Airtop has built-in residential proxy support with over 100 million IPs from 100+ countries. This is crucial for bypassing anti-bot detection.
Using Airtop's Integrated Proxy
const sessionWithProxy = await client.sessions.create({
configuration: {
proxy: {
type: "residential",
country: "US", // ISO 3166-1 format
sticky: true // Keep same IP for 30 minutes
}
}
});
Using Your Own Proxy
If you already have a proxy provider (like Bright Data, Oxylabs, or SmartProxy), you can bring your own:
const sessionWithCustomProxy = await client.sessions.create({
configuration: {
proxy: {
server: "http://proxy.example.com:8080",
username: "your_username",
password: "your_password"
}
}
});
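Many providers hand out credentials as a single URL such as http://user:pass@host:port rather than as separate fields. The sketch below splits such a URL into the server, username, and password fields used in the example above. parseProxyUrl and the CustomProxy type are hypothetical names introduced here, and the field layout simply mirrors this guide's example.

```typescript
// Hypothetical helper: split a provider-style proxy URL into the
// server / username / password fields used in the example above.
interface CustomProxy {
  server: string;
  username?: string;
  password?: string;
}

function parseProxyUrl(proxyUrl: string): CustomProxy {
  const url = new URL(proxyUrl); // throws a TypeError on malformed input
  const proxy: CustomProxy = { server: `${url.protocol}//${url.host}` };
  // The URL class keeps credentials percent-encoded, so decode them.
  if (url.username) proxy.username = decodeURIComponent(url.username);
  if (url.password) proxy.password = decodeURIComponent(url.password);
  return proxy;
}
```

Decoding matters when a password contains reserved characters such as @, which providers typically percent-encode inside the URL.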
Domain-Specific Proxy Routing
This is a neat trick - you can route only specific domains through the proxy while letting others connect directly:
const sessionWithSelectiveProxy = await client.sessions.create({
configuration: {
proxy: [
{
domainPattern: "*.wikipedia.org",
relay: {
type: "residential",
country: "GB" // ISO 3166-1 alpha-2 code for the United Kingdom
}
},
{
domainPattern: "*", // All other domains
relay: null // No proxy
}
]
}
});
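To reason about which rule a given request would hit, it can help to model the rule list locally. The sketch below assumes the rules are checked in order, that the first match wins, and that * behaves like a glob wildcard. Those semantics are assumptions made for illustration, not documented Airtop behavior, and resolveRelay and patternToRegExp are hypothetical helpers.

```typescript
// Illustrative only: models how an ordered list of domain rules could be
// resolved. First-match-wins and glob semantics are assumptions here.
interface ProxyRule<R> {
  domainPattern: string;
  relay: R | null; // null means "connect directly"
}

function patternToRegExp(pattern: string): RegExp {
  // Escape regex metacharacters (except "*"), then turn each "*" into ".*".
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`^${escaped.replace(/\*/g, ".*")}$`, "i");
}

function resolveRelay<R>(rules: ProxyRule<R>[], hostname: string): R | null {
  for (const rule of rules) {
    if (patternToRegExp(rule.domainPattern).test(hostname)) {
      return rule.relay;
    }
  }
  return null; // no rule matched: fall back to a direct connection
}
```

Under these assumptions, *.wikipedia.org matches en.wikipedia.org but not the bare wikipedia.org, so a rule list that should cover both would need a second pattern.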
Step 4: Handle Authentication Like a Pro
One of Airtop's killer features is support for authenticated sessions: you log in once through a live view, completing 2FA if prompted, then save the profile and reuse it. Here's how to scrape data behind a login wall:
async function scrapeWithAuth() {
const session = await client.sessions.create();
// Create a window
const window = await client.windows.create(session.id, {
url: "https://linkedin.com"
});
// Create a live view for manual login
const liveView = await client.windows.createLiveView(
session.id,
window.id
);
console.log(`Login here: ${liveView.url}`);
console.log("Complete the login process (including 2FA if needed)");
// Wait for user to complete login
await new Promise(resolve => {
setTimeout(resolve, 60000); // Wait 60 seconds
});
// Save the authenticated profile
const profile = await client.profiles.create({
sessionId: session.id,
name: "linkedin-authenticated"
});
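// Reuse the saved profile in a fresh session
const newSession = await client.sessions.create({
profileId: profile.id
});
// Windows belong to the session that created them, so open one in the new session
const newWindow = await client.windows.create(newSession.id, {
url: "https://linkedin.com"
});
// Extract data using natural language
const data = await client.windows.pageQuery(
newSession.id,
newWindow.id,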
{
prompt: "Extract all job postings with company names, titles, and locations",
configuration: {
outputSchema: {
type: "array",
items: {
type: "object",
properties: {
company: { type: "string" },
title: { type: "string" },
location: { type: "string" }
}
}
}
}
}
);
return data;
}
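The fixed 60-second wait above either wastes time or cuts off a slow login. One alternative is to poll a condition until it passes or a deadline expires. The sketch below is a generic helper under that assumption. pollUntil is a hypothetical name, and the check you pass in (for example, a page query asking whether the user is logged in) is up to you.

```typescript
// Hypothetical helper: poll an async check until it passes or a deadline
// expires, instead of sleeping for a fixed interval.
interface PollOptions {
  intervalMs: number;
  timeoutMs: number;
  sleep?: (ms: number) => Promise<void>; // injectable so tests run instantly
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function pollUntil(
  check: () => Promise<boolean>,
  { intervalMs, timeoutMs, sleep = defaultSleep }: PollOptions
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await check()) return true;
    // Give up if the next sleep would run past the deadline.
    if (Date.now() + intervalMs > deadline) return false;
    await sleep(intervalMs);
  }
}

// Usage sketch: check every 5 seconds, for up to 3 minutes.
// const loggedIn = await pollUntil(isLoggedIn, { intervalMs: 5000, timeoutMs: 180000 });
```

The helper returns false on timeout rather than throwing, so the caller decides whether to retry, fall back to manual intervention, or abort.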
Step 5: Leverage Natural Language Commands
Instead of writing complex selectors, use Airtop's AI to interact with pages naturally:
async function naturalLanguageAutomation() {
const session = await client.sessions.create({
configuration: {
proxy: { type: "residential", country: "US" }
}
});
const window = await client.windows.create(session.id, {
url: "https://producthunt.com"
});
// Extract structured data with a simple prompt
const products = await client.windows.pageQuery(
session.id,
window.id,
{
prompt: `Find all new product launches from today.
For each product, extract:
- Product name
- Description
- Vote count
- Maker name
Ignore sponsored listings`,
configuration: {
followPagination: true, // Automatically handle pagination
maxPages: 5
}
}
);
// Interact with the page
await client.windows.act(session.id, window.id, {
action: "Click on the first product that has more than 100 votes"
});
// Take a screenshot
const screenshot = await client.windows.screenshot(
session.id,
window.id
);
return products;
}
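Model-generated output does not always match the shape you asked for, so it is worth validating it before it reaches downstream code. The sketch below assumes the extracted results arrive as a JSON string containing an array. The Product fields are illustrative names chosen to match the prompt above, and isProduct and parseProducts are hypothetical helpers.

```typescript
// Hypothetical shape for the Product Hunt prompt above; the field names
// are illustrative, not something the API guarantees.
interface Product {
  name: string;
  description: string;
  votes: number;
  maker: string;
}

// Runtime type guard: true only when every expected field has the right type.
function isProduct(value: unknown): value is Product {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.name === "string" &&
    typeof v.description === "string" &&
    typeof v.votes === "number" &&
    Number.isFinite(v.votes) &&
    typeof v.maker === "string"
  );
}

// Parse a JSON string of results, keeping valid rows and counting the rest.
function parseProducts(json: string): { products: Product[]; rejected: number } {
  let parsed: unknown;
  try {
    parsed = JSON.parse(json);
  } catch (err) {
    return { products: [], rejected: 0 }; // not JSON at all
  }
  const rows: unknown[] = Array.isArray(parsed) ? parsed : [];
  const products: Product[] = rows.filter(isProduct);
  return { products, rejected: rows.length - products.length };
}
```

Tracking the rejected count gives you a cheap signal for when a prompt or page layout has drifted and the extraction needs attention.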
Step 6: Build a Production-Ready Scraper
Let's put it all together with a real-world example - monitoring competitor pricing:
import { AirtopClient } from "@airtop/sdk";
import * as fs from 'fs';
class CompetitorPriceMonitor {
private client: AirtopClient;
private profileId?: string;
constructor(apiKey: string) {
this.client = new AirtopClient({ apiKey });
}
async initialize() {
// Check if we have a saved profile
const profilePath = './competitor-profile.json';
if (fs.existsSync(profilePath)) {
const profile = JSON.parse(fs.readFileSync(profilePath, 'utf-8'));
this.profileId = profile.id;
}
}
async monitorPricing(competitorUrl: string) {
try {
// Create session with proxy rotation
const session = await this.client.sessions.create({
profileId: this.profileId,
configuration: {
proxy: {
type: "residential",
country: "US",
sticky: false // Rotate IP for each request
}
}
});
const window = await this.client.windows.create(session.id, {
url: competitorUrl
});
// Wait for page to load
await this.client.windows.waitForLoad(session.id, window.id);
// Extract pricing data
const pricingData = await this.client.windows.pageQuery(
session.id,
window.id,
{
prompt: `Extract all pricing plans with:
- Plan name
- Monthly price
- Annual price
- Top 3 features
- Any discounts or promotions`,
configuration: {
outputSchema: {
type: "object",
properties: {
plans: {
type: "array",
items: {
type: "object",
properties: {
name: { type: "string" },
monthlyPrice: { type: "number" },
annualPrice: { type: "number" },
features: {
type: "array",
items: { type: "string" }
},
discount: { type: "string" }
}
}
},
lastUpdated: { type: "string" }
}
}
}
}
);
// Compare with previous data
const previousData = this.loadPreviousData(competitorUrl);
const changes = this.detectChanges(previousData, pricingData);
if (changes.length > 0) {
await this.notifyChanges(competitorUrl, changes);
}
// Save current data
this.savePricingData(competitorUrl, pricingData);
// Clean up
await this.client.sessions.terminate(session.id);
return pricingData;
} catch (error) {
console.error(`Error monitoring ${competitorUrl}:`, error);
throw error;
}
}
private detectChanges(previous: any, current: any): any[] {
// Implementation for change detection
const changes: any[] = [];
// Compare pricing, features, etc.
return changes;
}
private async notifyChanges(url: string, changes: any[]) {
// Send notifications (email, Slack, etc.)
console.log(`Price changes detected for ${url}:`, changes);
}
private loadPreviousData(url: string): any {
// Load from database or file
return null;
}
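private savePricingData(url: string, data: any) {
// Save to database or file
const dir = './pricing-data';
// Create the output directory on first run so writeFileSync does not fail
fs.mkdirSync(dir, { recursive: true });
fs.writeFileSync(
`${dir}/${url.replace(/[^a-z0-9]/gi, '_')}.json`,
JSON.stringify(data, null, 2)
);
}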
}
// Usage
async function main() {
const monitor = new CompetitorPriceMonitor(process.env.AIRTOP_API_KEY!);
await monitor.initialize();
const competitors = [
"https://competitor1.com/pricing",
"https://competitor2.com/plans",
"https://competitor3.com/pricing"
];
for (const url of competitors) {
await monitor.monitorPricing(url);
// Add delay to avoid rate limiting
await new Promise(resolve => setTimeout(resolve, 5000));
}
}
main().catch(console.error);
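The detectChanges method above is left as a stub. As a minimal sketch of what it might do, the function below compares two lists of plans by name and reports added plans, removed plans, and price changes. The Plan fields mirror the outputSchema in the example, while diffPlans and PlanChange are hypothetical names; a real implementation would probably also compare features and discounts.

```typescript
// Field names mirror the outputSchema in the example above.
interface Plan {
  name: string;
  monthlyPrice: number;
  annualPrice: number;
}

interface PlanChange {
  plan: string;
  kind: "added" | "removed" | "price";
  detail: string;
}

function diffPlans(previous: Plan[], current: Plan[]): PlanChange[] {
  const changes: PlanChange[] = [];
  // Index the previous plans by name for constant-time lookup.
  const before = new Map<string, Plan>();
  for (const plan of previous) before.set(plan.name, plan);
  const seen = new Set<string>();

  for (const plan of current) {
    seen.add(plan.name);
    const old = before.get(plan.name);
    if (!old) {
      changes.push({ plan: plan.name, kind: "added", detail: "new plan" });
      continue;
    }
    for (const field of ["monthlyPrice", "annualPrice"] as const) {
      if (old[field] !== plan[field]) {
        changes.push({
          plan: plan.name,
          kind: "price",
          detail: `${field}: ${old[field]} -> ${plan[field]}`,
        });
      }
    }
  }

  // Anything in the previous snapshot that was not seen again was removed.
  for (const plan of previous) {
    if (!seen.has(plan.name)) {
      changes.push({ plan: plan.name, kind: "removed", detail: "plan no longer listed" });
    }
  }
  return changes;
}
```

Matching on the plan name is the simplest key, but it treats a renamed plan as one removal plus one addition.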
Advanced Tips and Tricks
1. Bypass Cloudflare and Other Anti-Bot Systems
const stealthSession = await client.sessions.create({
configuration: {
proxy: {
type: "residential",
country: "US"
},
viewport: {
width: 1920,
height: 1080
},
userAgent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
// Airtop automatically handles fingerprinting
stealth: true
}
});
2. Handle Dynamic Content and Infinite Scroll
const scrollAndExtract = async (sessionId: string, windowId: string) => {
// Scroll to load dynamic content
await client.windows.act(sessionId, windowId, {
action: "Scroll to the bottom of the page slowly over 5 seconds"
});
// Extract data after scrolling
const data = await client.windows.pageQuery(sessionId, windowId, {
prompt: "Extract all product cards that are now visible",
configuration: {
waitForStable: true // Wait for content to stop changing
}
});
return data;
};
3. Smart CAPTCHA Handling
While Airtop handles many CAPTCHAs automatically, for complex ones you can use the live view:
async function handleCaptcha(session: any, window: any) {
const captchaDetected = await client.windows.pageQuery(
session.id,
window.id,
{
prompt: "Is there a CAPTCHA on this page? Return true or false"
}
);
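// pageQuery resolves to a response object, which is always truthy, so inspect the answer text.
// Assumption: the answer text is exposed as data.modelResponse; adjust for your SDK version.
const answer = String((captchaDetected as any)?.data?.modelResponse ?? "").trim().toLowerCase();
if (answer.startsWith("true")) {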
const liveView = await client.windows.createLiveView(
session.id,
window.id
);
console.log(`Manual intervention needed: ${liveView.url}`);
// Wait for human to solve CAPTCHA
await new Promise(resolve => setTimeout(resolve, 30000));
}
}
Pricing and Alternatives
Airtop Pricing Tiers
- Free Plan: 5,000 credits, 1 simultaneous session
- Starter: $29/month, 3 simultaneous sessions, integrated proxy
- Professional: $89/month, 30 simultaneous sessions, custom proxy support
- Enterprise: $380+/month, 100+ sessions, SOC 2 compliance, dedicated support
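Because each tier caps simultaneous sessions, a scraper that fans out over many URLs needs to limit how many jobs run at once. The sketch below is a generic limiter with no Airtop calls in it. runWithLimit is a hypothetical helper, and each job is assumed to be a function that creates, uses, and terminates its own session.

```typescript
// Hypothetical helper: run async jobs with at most `limit` in flight,
// e.g. to stay within a plan's simultaneous-session cap.
async function runWithLimit<T>(
  jobs: Array<() => Promise<T>>,
  limit: number
): Promise<T[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new Error("limit must be a positive integer");
  }
  const results: T[] = new Array(jobs.length);
  let next = 0;

  // Each worker repeatedly claims the next unstarted job until none remain.
  const worker = async (): Promise<void> => {
    while (next < jobs.length) {
      const index = next++; // safe: no await between the check and the claim
      results[index] = await jobs[index]();
    }
  };

  const workerCount = Math.min(limit, jobs.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // same order as the input jobs
}

// Usage sketch on a 3-session plan:
// const pages = await runWithLimit(urls.map((u) => () => scrape(u)), 3);
```

If any job rejects, the returned promise rejects as well, so jobs that should not abort the whole batch need their own try/catch.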
When to Use Airtop vs Alternatives
Choose Airtop when:
- You need to handle complex authentication (OAuth, 2FA)
- Natural language commands appeal to you
- You're scraping JavaScript-heavy sites with anti-bot measures
- You need reliable proxy rotation
Consider Playwright/Puppeteer when:
- You need fine-grained control over browser behavior
- You're building complex test suites
- Budget is extremely tight (both are free)
- You have existing infrastructure for proxy management
Consider Selenium when:
- You need support for legacy browsers
- Your team uses languages like Java or C#
- You have existing Selenium infrastructure
Common Pitfalls to Avoid
- Not rotating proxies enough: Some sites track IP patterns. Use sticky: false for aggressive sites.
- Not saving authenticated profiles: Authentication is expensive. Always save profiles for reuse.
- Over-relying on natural language: Sometimes explicit selectors are more reliable for production systems.
- Ignoring rate limits: Even with proxies, respect rate limits. Add delays between requests:
await new Promise(resolve => setTimeout(resolve, Math.random() * 3000 + 2000));
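The one-liner above waits a random 2 to 5 seconds on every request. When a site starts pushing back, a common refinement is to widen that window on each consecutive failure. The sketch below is one way to compute such a delay. backoffDelayMs is a hypothetical helper, and the default values are illustrative rather than recommendations from Airtop.

```typescript
// Hypothetical helper: randomized delay whose upper bound doubles with each
// consecutive failure, never dropping below minMs or exceeding capMs.
function backoffDelayMs(
  attempt: number, // 0 for the first wait, 1 after one failure, and so on
  minMs: number = 2000,
  capMs: number = 30000,
  rng: () => number = Math.random // injectable for deterministic tests
): number {
  const ceiling = Math.max(minMs, Math.min(capMs, minMs * 2 ** (attempt + 1)));
  return Math.floor(minMs + rng() * (ceiling - minMs));
}

// Usage sketch inside a retry loop:
// await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
```

With the defaults, the first wait falls between 2 and 4 seconds, and the window grows until it reaches the 30-second cap.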
Next Steps
Now that you have Airtop configured, consider these advanced implementations:
- Build a distributed scraping system with queue management
- Integrate with data pipelines using Apache Airflow
- Create a monitoring dashboard with real-time alerts
- Implement automatic proxy health checking and rotation
Remember, Airtop shines when you need to interact with complex, modern web applications that would typically require human intervention. The combination of cloud browsers, natural language processing, and integrated proxy support makes it a powerful tool for scenarios where traditional scraping fails.