Most developers reach for Python or JavaScript when they think about web scraping. But here's what they're missing: Dart offers a cleaner syntax, strong typing that catches errors before runtime, and—when you need it—true parallel processing through isolates.

I've spent the last year building scrapers in Dart for everything from e-commerce price monitoring to research data collection. The language isn't just viable for scraping; it's actually pretty great at it. In this guide, I'll show you how to scrape websites with Dart, starting from the basics and moving into techniques that most tutorials skip entirely.

What You'll Need Before Starting

Setting up Dart for web scraping is straightforward. First, install the Dart SDK from dart.dev—use a package manager like Homebrew on Mac (brew install dart) or Chocolatey on Windows (choco install dart-sdk).

Once that's done, create a new project:

dart create web_scraper
cd web_scraper

Add the essential packages to your pubspec.yaml:

dependencies:
  http: ^1.2.0
  html: ^0.15.4
  
dev_dependencies:
  lints: ^3.0.0

Run dart pub get to install everything. That's it—you're ready to scrape.

The Foundation: HTTP Requests and HTML Parsing

Every web scraper starts by fetching HTML and parsing it. The http package handles requests, while html parses the response into a structure you can query.

Here's a basic scraper that extracts product information:

import 'package:http/http.dart' as http;
import 'package:html/parser.dart' as parser;

Future<void> scrapeProducts() async {
  final url = Uri.parse('https://example-shop.com/products');
  final response = await http.get(url);

  if (response.statusCode == 200) {
    final document = parser.parse(response.body);
    final products = document.querySelectorAll('.product-card');
    
    for (var product in products) {
      final name = product.querySelector('h3')?.text.trim();
      final price = product.querySelector('.price')?.text.trim();
      final imageUrl = product.querySelector('img')?.attributes['src'];
      
      print('Product: $name');
      print('Price: $price');
      print('Image: $imageUrl\n');
    }
  } else {
    throw Exception('Failed to load page: ${response.statusCode}');
  }
}

This code does three things: fetches the HTML, parses it into a document object, then uses CSS selectors to extract specific elements. The ?. operator is Dart's null-aware access: querySelector returns null when nothing matches, and ?. short-circuits on that null instead of throwing.

But here's something most guides don't mention: the html package doesn't execute JavaScript. If your target site loads content dynamically, this approach won't work. We'll fix that later.

Building a Reusable Scraper Class

Writing standalone functions works for quick scripts, but real projects need structure. Here's a scraper class that handles common tasks:

import 'dart:io';
import 'package:http/http.dart' as http;
import 'package:html/parser.dart' as parser;
import 'package:html/dom.dart';

class WebScraper {
  final Map<String, String> headers;
  final Duration timeout;
  
  WebScraper({
    Map<String, String>? headers,
    this.timeout = const Duration(seconds: 10),
  }) : headers = headers ?? {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
  };

  Future<Document?> fetchPage(String url) async {
    try {
      final uri = Uri.parse(url);
      final response = await http.get(uri, headers: headers).timeout(timeout);
      
      if (response.statusCode == 200) {
        return parser.parse(response.body);
      } else {
        print('Error: Status code ${response.statusCode}');
        return null;
      }
    } catch (e) {
      print('Error fetching $url: $e');
      return null;
    }
  }

  Future<void> delay([Duration? duration]) async {
    await Future.delayed(duration ?? Duration(seconds: 2));
  }
}

Notice the default headers? Real browsers send these with every request. Without them, some sites will immediately block you or return different content. The User-Agent string tells servers you're using a browser, not a bot.

The delay() method is equally important. If you hammer a server with requests, you'll get blocked. Always add delays between requests—2 seconds is a good starting point.
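
Here's what the class looks like in use, as a minimal sketch (example.com stands in for whatever site you're targeting):

Future<void> main() async {
  final scraper = WebScraper();
  
  final document = await scraper.fetchPage('https://example.com');
  if (document != null) {
    print(document.querySelector('title')?.text.trim());
  }
  
  await scraper.delay(); // polite pause before the next request
}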

Extracting Data Like a Pro

CSS selectors are powerful, but they're not the only option. Sometimes you need to grab data from element attributes, or text that's nested several levels deep. Here's how to handle different scenarios:

import 'package:html/dom.dart';

// Node is the shared base class of Document and Element, so these helpers
// work on whole documents as well as individual elements.
extension ScraperExtensions on Node {
  String? getTextOrNull(String selector) {
    return querySelector(selector)?.text.trim();
  }
  
  String? getAttributeOrNull(String selector, String attribute) {
    return querySelector(selector)?.attributes[attribute];
  }
  
  List<String> getAllText(String selector) {
    return querySelectorAll(selector)
        .map((e) => e.text.trim())
        .where((text) => text.isNotEmpty)
        .toList();
  }
}

These extension methods make extraction cleaner. Instead of repeating null checks everywhere, you can write:

final document = await scraper.fetchPage(url);
if (document != null) {
  final title = document.getTextOrNull('h1.title');
  final description = document.getTextOrNull('div.description');
  final tags = document.getAllText('span.tag');
  final imageUrl = document.getAttributeOrNull('img.hero', 'src');
}

Handling Pagination Without Losing Your Mind

Most websites split content across multiple pages. Here's a pattern that crawls through pagination automatically:

Future<List<String>> crawlPages(String baseUrl, int maxPages) async {
  final scraper = WebScraper();
  final allData = <String>[];
  
  for (var page = 1; page <= maxPages; page++) {
    final url = '$baseUrl?page=$page';
    print('Scraping page $page...');
    
    final document = await scraper.fetchPage(url);
    if (document == null) break;
    
    final items = document.querySelectorAll('.item');
    if (items.isEmpty) break; // No more data
    
    for (var item in items) {
      final text = item.text.trim();
      allData.add(text);
    }
    
    await scraper.delay(Duration(seconds: 3));
  }
  
  return allData;
}

This code stops automatically when it hits an empty page, which is smarter than blindly crawling a fixed number of pages. The delay between pages prevents getting rate-limited.

Scraping JavaScript-Heavy Sites with Puppeteer

Remember when I said the html package can't execute JavaScript? This is where Puppeteer comes in. It's a port of the Node.js library that controls a headless Chrome browser.

Add it to your pubspec.yaml:

dependencies:
  puppeteer: ^3.11.0

Here's how to scrape a site that loads content with JavaScript:

import 'package:puppeteer/puppeteer.dart';

Future<void> scrapeWithPuppeteer() async {
  final browser = await puppeteer.launch(headless: true);
  final page = await browser.newPage();
  
  // Set a realistic viewport size
  await page.setViewport(DeviceViewport(width: 1920, height: 1080));
  
  // Navigate and wait for content to load
  await page.goto('https://dynamic-site.com/products', 
    wait: Until.networkIdle);
  
  // Execute JavaScript to scroll and load lazy content
  await page.evaluate('''
    window.scrollTo(0, document.body.scrollHeight);
  ''');
  await Future.delayed(Duration(seconds: 2));
  
  // Extract data using page.evaluate
  final products = await page.evaluate('''() => {
    const items = [];
    document.querySelectorAll('.product').forEach(el => {
      items.push({
        name: el.querySelector('h3')?.textContent,
        price: el.querySelector('.price')?.textContent,
      });
    });
    return items;
  }''');
  
  print(products);
  await browser.close();
}

The page.evaluate() method runs JavaScript in the browser context and returns the result to Dart. This is incredibly powerful—you can interact with the page exactly like a real user would.

But there's a catch: Puppeteer is resource-intensive. It launches an entire browser instance. For simple sites, stick with http + html. Save Puppeteer for when you actually need it.

The Secret Weapon: Concurrent Scraping with Isolates

Here's something most Dart tutorials completely skip: isolates. They're Dart's answer to threads, giving you true parallelism. Async/await already lets you overlap network requests, but parsing large HTML documents is CPU-bound work that ties up a single isolate. Spread that work across isolates and you can fetch and parse multiple pages at the same time without them getting in each other's way.

import 'dart:isolate';

class IsolateResult {
  final String url;
  final List<String> data;
  
  IsolateResult(this.url, this.data);
}

// Long-form worker for Isolate.spawn: it hands its SendPort back to the main
// isolate, then scrapes each URL it receives. The scrapeConcurrently function
// below gets the same result with the simpler Isolate.run API.
Future<void> scrapeUrlInIsolate(SendPort sendPort) async {
  final receivePort = ReceivePort();
  sendPort.send(receivePort.sendPort);
  
  await for (var message in receivePort) {
    if (message is String) {
      // Scrape the URL
      final scraper = WebScraper();
      final document = await scraper.fetchPage(message);
      
      if (document != null) {
        final data = document.querySelectorAll('.item')
            .map((e) => e.text.trim())
            .toList();
        
        sendPort.send(IsolateResult(message, data));
      }
    }
  }
}

Future<List<IsolateResult>> scrapeConcurrently(List<String> urls) async {
  final futures = <Future<IsolateResult?>>[];
  
  for (var url in urls) {
    final future = Isolate.run(() async {
      final scraper = WebScraper();
      final document = await scraper.fetchPage(url);
      
      if (document != null) {
        final data = document.querySelectorAll('.item')
            .map((e) => e.text.trim())
            .toList();
        return IsolateResult(url, data);
      }
      return null;
    });
    
    futures.add(future);
  }
  
  final completedResults = await Future.wait(futures);
  
  // Drop the URLs that failed to load.
  return completedResults.whereType<IsolateResult>().toList();
}

With Isolate.run(), each URL is fetched and parsed in its own isolate. On a multi-core machine this can be several times faster than scraping sequentially, especially when parsing is the bottleneck. Just remember: more parallelism means more simultaneous requests hitting the server. Don't abuse this or you'll get blocked.
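
One way to keep that parallelism polite is to process URLs in small batches instead of launching everything at once. Here's a sketch that builds on scrapeConcurrently; the batch size of 4 and the 2-second pause are arbitrary starting points, not requirements:

Future<List<IsolateResult>> scrapeInBatches(
  List<String> urls, {
  int batchSize = 4,
}) async {
  final results = <IsolateResult>[];
  
  for (var i = 0; i < urls.length; i += batchSize) {
    // Take at most batchSize URLs this round.
    final batch = urls.skip(i).take(batchSize).toList();
    results.addAll(await scrapeConcurrently(batch));
    
    // Pause between batches so the target server isn't hammered.
    await Future.delayed(Duration(seconds: 2));
  }
  
  return results;
}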

Anti-Detection Techniques That Actually Work

Websites use various methods to detect scrapers. Here's how to fly under the radar:

1. Rotate User Agents

Don't use the same User-Agent for every request:

import 'dart:math';

final userAgents = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36',
];

String getRandomUserAgent() {
  return userAgents[Random().nextInt(userAgents.length)];
}
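
To plug this into the WebScraper class from earlier, pass the rotated value in when you construct the scraper. A sketch (note that supplying headers replaces the defaults, so the Accept headers are repeated here):

final scraper = WebScraper(headers: {
  'User-Agent': getRandomUserAgent(),
  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  'Accept-Language': 'en-US,en;q=0.5',
});

Because the headers field is final, rotating per request means constructing a new scraper for each batch of requests.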

2. Implement Exponential Backoff

If you get rate-limited (status code 429), back off exponentially:

Future<http.Response?> fetchWithBackoff(String url, {int maxRetries = 3}) async {
  var delay = Duration(seconds: 1);
  
  for (var attempt = 0; attempt < maxRetries; attempt++) {
    try {
      final response = await http.get(Uri.parse(url));
      
      if (response.statusCode == 429) {
        print('Rate limited. Waiting ${delay.inSeconds}s before retry $attempt');
        await Future.delayed(delay);
        delay *= 2; // Exponential backoff
        continue;
      }
      
      return response;
    } catch (e) {
      if (attempt == maxRetries - 1) rethrow;
      await Future.delayed(delay);
      delay *= 2;
    }
  }
  
  return null;
}

3. Respect robots.txt

Always check a site's robots.txt before scraping:

Future<bool> canScrapeUrl(String baseUrl, String path) async {
  try {
    final robotsUrl = Uri.parse('$baseUrl/robots.txt');
    final response = await http.get(robotsUrl);
    
    if (response.statusCode == 200) {
      final lines = response.body.split('\n');
      var disallowed = false;
      
      // Simplified check: a complete parser would also honor User-agent
      // groups and Allow rules.
      for (var line in lines) {
        if (line.trim().startsWith('Disallow:')) {
          final disallowPath = line.split(':')[1].trim();
          // An empty Disallow value means "allow everything", so skip it.
          if (disallowPath.isEmpty) continue;
          if (path.startsWith(disallowPath)) {
            disallowed = true;
            break;
          }
        }
      }
      
      return !disallowed;
    }
  } catch (e) {
    print('Could not fetch robots.txt: $e');
  }
  
  return true; // If we can't fetch robots.txt, proceed cautiously
}
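
Here's one way to wire that check in before fetching; example-shop.com is just a placeholder:

Future<void> scrapeIfAllowed() async {
  const base = 'https://example-shop.com';
  const path = '/products';
  
  if (!await canScrapeUrl(base, path)) {
    print('robots.txt disallows $path, skipping');
    return;
  }
  
  final document = await WebScraper().fetchPage('$base$path');
  // ... extract data as usual
}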

4. Add Random Delays

Make your scraper behavior less predictable:

Future<void> randomDelay({int minSeconds = 2, int maxSeconds = 5}) async {
  // nextInt's upper bound is exclusive, so add 1 to make maxSeconds reachable.
  final seconds = minSeconds + Random().nextInt(maxSeconds - minSeconds + 1);
  await Future.delayed(Duration(seconds: seconds));
}

Saving Data Efficiently

Once you've scraped data, you need to store it. Here are three common approaches:

JSON Files

import 'dart:convert';
import 'dart:io';

Future<void> saveToJson(List<Map<String, dynamic>> data, String filename) async {
  final file = File(filename);
  final jsonString = JsonEncoder.withIndent('  ').convert(data);
  await file.writeAsString(jsonString);
  print('Saved ${data.length} items to $filename');
}

CSV Files

Future<void> saveToCsv(List<Map<String, dynamic>> data, String filename) async {
  if (data.isEmpty) return;
  
  final file = File(filename);
  final sink = file.openWrite();
  
  // Write headers
  final headers = data.first.keys.join(',');
  sink.writeln(headers);
  
  // Write data, doubling any embedded quotes per the CSV convention
  for (var row in data) {
    final values = row.values
        .map((v) => '"${v.toString().replaceAll('"', '""')}"')
        .join(',');
    sink.writeln(values);
  }
  
  await sink.close();
}

SQLite Database

For larger datasets, use a database:

// Note: sqflite is a Flutter plugin. In a pure Dart CLI, use sqflite_common_ffi
// (same API) or the sqlite3 package (see the sketch after this example).
import 'package:sqflite/sqflite.dart';

Future<void> saveToDatabase(List<Map<String, dynamic>> data) async {
  final db = await openDatabase(
    'scraped_data.db',
    version: 1,
    onCreate: (db, version) {
      return db.execute(
        'CREATE TABLE products(id INTEGER PRIMARY KEY, name TEXT, price TEXT, url TEXT)',
      );
    },
  );
  
  final batch = db.batch();
  for (var item in data) {
    batch.insert('products', item);
  }
  await batch.commit();
  await db.close();
}
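
If your scraper runs as a plain command-line Dart program rather than inside a Flutter app, the sqlite3 package is a lighter-weight option. A sketch of the equivalent insert logic, assuming the same products table:

import 'package:sqlite3/sqlite3.dart';

void saveWithSqlite3(List<Map<String, dynamic>> data) {
  final db = sqlite3.open('scraped_data.db');
  db.execute(
    'CREATE TABLE IF NOT EXISTS products(id INTEGER PRIMARY KEY, name TEXT, price TEXT, url TEXT)',
  );
  
  // Prepared statement: compiled once, reused for every row.
  final stmt = db.prepare('INSERT INTO products (name, price, url) VALUES (?, ?, ?)');
  for (final item in data) {
    stmt.execute([item['name'], item['price'], item['url']]);
  }
  
  stmt.dispose();
  db.dispose();
}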

A Real-World Example: E-commerce Price Monitor

Let's put everything together in a practical example. This scraper monitors product prices and alerts you when they drop:

import 'dart:convert';
import 'dart:io';
import 'package:http/http.dart' as http;
import 'package:html/parser.dart' as parser;

class PriceMonitor {
  final String configFile = 'watched_products.json';
  
  Future<void> checkPrices() async {
    final products = await loadWatchedProducts();
    final scraper = WebScraper();
    
    for (var product in products) {
      print('Checking ${product['name']}...');
      
      final document = await scraper.fetchPage(product['url']);
      if (document == null) continue;
      
      final priceText = document.querySelector('.price')?.text.trim();
      if (priceText == null) continue;
      
      final currentPrice = _extractPrice(priceText);
      final targetPrice = double.parse(product['target_price'].toString());
      
      if (currentPrice <= targetPrice) {
        _sendAlert(product['name'], currentPrice, targetPrice);
      }
      
      await scraper.delay(Duration(seconds: 3));
    }
  }
  
  double _extractPrice(String priceText) {
    // Remove currency symbols and parse
    final cleaned = priceText.replaceAll(RegExp(r'[^\d.]'), '');
    return double.parse(cleaned);
  }
  
  void _sendAlert(String name, double current, double target) {
    print('\n🚨 PRICE ALERT! 🚨');
    print('$name dropped to \$$current (target: \$$target)');
    print('Time to buy!\n');
  }
  
  Future<List<Map<String, dynamic>>> loadWatchedProducts() async {
    final file = File(configFile);
    if (!await file.exists()) return [];
    
    final content = await file.readAsString();
    return List<Map<String, dynamic>>.from(jsonDecode(content));
  }
}
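
For reference, a watched_products.json file that loadWatchedProducts can read might look like this; the fields are simply what checkPrices expects:

[
  {
    "name": "Mechanical Keyboard",
    "url": "https://example-shop.com/products/keyboard",
    "target_price": "79.99"
  },
  {
    "name": "4K Monitor",
    "url": "https://example-shop.com/products/monitor",
    "target_price": "299.00"
  }
]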

When to Use HTTP vs. Puppeteer

The decision tree is simple:

Use http + html when:

  • The site's content is in the initial HTML response
  • You need speed and low resource usage
  • You're scraping dozens or hundreds of pages
  • The site doesn't use heavy JavaScript

Use Puppeteer when:

  • Content loads dynamically via JavaScript
  • You need to interact with the page (click buttons, fill forms)
  • The site uses infinite scrolling
  • You need to take screenshots or generate PDFs

In practice, the majority of sites I've scraped work fine with just http + html. Reach for Puppeteer only when the content genuinely isn't in the initial response.

Common Pitfalls and How to Avoid Them

1. Not Handling Encoding Issues

Some sites use different character encodings:

final response = await http.get(uri);
final decoded = utf8.decode(response.bodyBytes);
final document = parser.parse(decoded);
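
If you want to go a step further, check the charset the server actually declares before decoding. A sketch that covers only the UTF-8 and Latin-1 cases:

import 'dart:convert';
import 'package:html/dom.dart';
import 'package:html/parser.dart' as parser;
import 'package:http/http.dart' as http;

Future<Document> fetchAndDecode(Uri uri) async {
  final response = await http.get(uri);
  final contentType = response.headers['content-type']?.toLowerCase() ?? '';
  
  // Decode according to the declared charset; fall back to tolerant UTF-8.
  final body = contentType.contains('charset=iso-8859-1')
      ? latin1.decode(response.bodyBytes)
      : utf8.decode(response.bodyBytes, allowMalformed: true);
  
  return parser.parse(body);
}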

2. Forgetting to Close Resources

Always clean up after Puppeteer:

final browser = await puppeteer.launch();
try {
  // ... scraping code
} finally {
  await browser.close(); // browser is declared outside try so finally can see it
}

3. Not Validating Data

Scraped data is messy. Always validate:

bool isValidPrice(String? text) {
  if (text == null || text.isEmpty) return false;
  return RegExp(r'\d+\.?\d*').hasMatch(text);
}

4. Ignoring Memory Usage

When scraping thousands of pages, memory adds up:

// Bad: Loads everything into memory
final allData = <Map<String, dynamic>>[];
for (var url in urls) {
  final data = await scrapePage(url);
  allData.add(data);
}

// Good: Write to disk incrementally
for (var url in urls) {
  final data = await scrapePage(url);
  await appendToFile(data);
}
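
The appendToFile helper is left abstract above. One way to implement it is to append one JSON object per line (JSON Lines), which keeps memory flat no matter how many pages you scrape; a sketch:

import 'dart:convert';
import 'dart:io';

Future<void> appendToFile(Map<String, dynamic> data,
    {String filename = 'results.jsonl'}) async {
  final file = File(filename);
  // FileMode.append creates the file if needed and writes at the end.
  await file.writeAsString('${jsonEncode(data)}\n', mode: FileMode.append);
}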

Testing Your Scraper

Don't deploy without testing. Here's a simple test harness:

Future<void> testScraper() async {
  final testUrls = [
    'https://httpbin.org/html',
    'https://httpbin.org/delay/2',
    'https://httpbin.org/status/404',
  ];
  
  final scraper = WebScraper();
  
  for (var url in testUrls) {
    print('\nTesting $url');
    try {
      final document = await scraper.fetchPage(url);
      print('✓ Success: ${document != null}');
    } catch (e) {
      print('✗ Failed: $e');
    }
  }
}
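
If you'd rather run these checks as part of a real test suite, here's a minimal sketch using the test package (add test to dev_dependencies and drop the file under test/); it assumes the WebScraper class is importable from your project:

import 'package:test/test.dart';

void main() {
  final scraper = WebScraper();
  
  test('fetchPage parses a static HTML page', () async {
    final document = await scraper.fetchPage('https://httpbin.org/html');
    expect(document, isNotNull);
    expect(document!.querySelector('h1'), isNotNull);
  });
  
  test('fetchPage returns null on a 404', () async {
    final document = await scraper.fetchPage('https://httpbin.org/status/404');
    expect(document, isNull);
  });
}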

Performance Optimization Tips

1. Reuse HTTP Clients

Creating a new client for each request is slow:

class WebScraper {
  final http.Client _client = http.Client();
  
  Future<Document?> fetchPage(String url) async {
    final response = await _client.get(Uri.parse(url));
    if (response.statusCode != 200) return null;
    return parser.parse(response.body); // same parsing logic as before
  }
  
  void dispose() {
    _client.close();
  }
}

2. Parse Only What You Need

Don't parse the entire document if you only need one element:

// Slower
final document = parser.parse(html);
final price = document.querySelector('.price')?.text;

// Faster - parse just the section you need
// (Fragile if the target div contains nested divs; fine for flat markup.)
final priceSection = html.indexOf('<div class="price">');
if (priceSection != -1) {
  final endSection = html.indexOf('</div>', priceSection);
  final snippet = html.substring(priceSection, endSection);
  final doc = parser.parseFragment(snippet);
  final price = doc.querySelector('.price')?.text;
}

3. Cache DNS Lookups

Repeated DNS lookups and TLS handshakes add latency to every request. Reusing a single http.Client (tip 1) keeps connections alive, so those costs are paid once per host instead of once per request. If you need finer control, wrap a configured dart:io HttpClient in the http package's IOClient.

Scraping Ethically and Legally

Before scraping any website:

  1. Read the Terms of Service - Some sites explicitly forbid scraping
  2. Check robots.txt - Respect these directives
  3. Don't overload servers - Add appropriate delays
  4. Identify yourself - Use a descriptive User-Agent
  5. Use public data only - Don't scrape behind logins unless authorized

If a site offers an API, use that instead. It's almost always more stable and better documented than scraping.

Wrapping Up

Dart might not be the first language you think of for web scraping, but it's genuinely capable. The strong typing catches bugs early, the async/await syntax is clean, and isolates give you real parallelism when you need it.

Start simple with http + html for static sites. Add Puppeteer when you hit JavaScript-heavy pages. Use isolates when you need speed. And always—always—be respectful to the sites you're scraping.

The scrapers you build today should still work tomorrow. Write clean code, handle errors gracefully, and test thoroughly. Your future self will thank you.

Now go build something. And if you get blocked, well, now you know how to fix it.