
Go Web Crawler: Build one step-by-step in 3 minutes

Building a web crawler in Go gives you raw speed and true parallelism that Python and JavaScript simply cannot match. In this guide, you will learn how to build production-ready crawlers using Go's goroutines, channels, and the best libraries available in 2026.

By the end, you will have working code for three different approaches: pure net/http with goroutines, the Colly framework, and headless browser crawling with chromedp.

What is a Go Web Crawler?

A Go web crawler is a program that systematically browses websites to discover and extract URLs and data. Go's goroutines allow you to crawl hundreds of pages simultaneously while using minimal memory. Unlike Python's GIL-limited threading, Go achieves true parallel execution across all CPU cores.

The main difference between Go and Python web crawlers is performance and concurrency. Go compiles to native machine code and typically handles concurrent requests 2-5x faster than an equivalent Python scraper on identical hardware. For large-scale crawling projects processing millions of pages, that speed advantage compounds quickly.

Setting Up Your Environment

Before writing any code, make sure Go 1.21+ is installed on your system. Run this command to verify:

go version

You should see output like go version go1.22.0 linux/amd64 or similar.

Create a new project directory and initialize your module:

mkdir go-crawler && cd go-crawler
go mod init github.com/yourusername/go-crawler

Install the packages we will use throughout this guide:

go get github.com/PuerkitoBio/goquery
go get github.com/gocolly/colly/v2
go get github.com/chromedp/chromedp

The go.mod file tracks all dependencies automatically. You are now ready to build your first crawler.

Method 1: Pure Go with net/http and Goroutines

This approach uses only Go's standard library plus goquery for HTML parsing. It gives you maximum control over every request and teaches you Go's concurrency fundamentals.

Building the Basic Fetcher

Start with a function that fetches a single page and returns parsed HTML:

package main

import (
    "fmt"
    "net/http"
    "time"

    "github.com/PuerkitoBio/goquery"
)

// Fetcher retrieves and parses HTML from a URL
func fetchPage(url string) (*goquery.Document, error) {
    client := &http.Client{
        Timeout: 30 * time.Second,
    }

    req, err := http.NewRequest("GET", url, nil)
    if err != nil {
        return nil, err
    }

    // Set headers to mimic a real browser
    req.Header.Set("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")
    req.Header.Set("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
    req.Header.Set("Accept-Language", "en-US,en;q=0.5")

    resp, err := client.Do(req)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()

    if resp.StatusCode != http.StatusOK {
        return nil, fmt.Errorf("status code: %d", resp.StatusCode)
    }

    return goquery.NewDocumentFromReader(resp.Body)
}

The http.Client with a 30-second timeout prevents your crawler from hanging on slow servers. Setting browser-like headers is essential because many sites block requests with default Go User-Agents.

Next, add a function to pull all links from a page:

import (
    "net/url"
    "strings"
)

// extractLinks finds all href values in anchor tags
func extractLinks(doc *goquery.Document, baseURL string) []string {
    var links []string
    base, _ := url.Parse(baseURL)

    doc.Find("a[href]").Each(func(i int, s *goquery.Selection) {
        href, exists := s.Attr("href")
        if !exists {
            return
        }

        // Skip mailto, javascript, and fragment-only links
        if strings.HasPrefix(href, "mailto:") ||
            strings.HasPrefix(href, "javascript:") ||
            strings.HasPrefix(href, "#") {
            return
        }

        // Resolve relative URLs to absolute
        parsed, err := url.Parse(href)
        if err != nil {
            return
        }
        absolute := base.ResolveReference(parsed)
        links = append(links, absolute.String())
    })

    return links
}

URL normalization is critical. Relative paths like /about or ../index.html must be converted to absolute URLs. The ResolveReference function handles this automatically.
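
Here is a tiny standalone check of that resolution logic, using made-up paths against an example.com base:

package main

import (
    "fmt"
    "net/url"
)

func main() {
    base, _ := url.Parse("https://example.com/blog/post-1")

    for _, href := range []string{"/about", "../index.html", "page-2", "https://other.com/x"} {
        ref, _ := url.Parse(href)
        // ResolveReference applies RFC 3986 resolution rules against the base
        fmt.Println(base.ResolveReference(ref))
    }
    // Prints:
    //   https://example.com/about
    //   https://example.com/index.html
    //   https://example.com/blog/page-2
    //   https://other.com/x
}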

Adding Concurrency with Goroutines and Channels

Here is where Go truly shines. We will create a worker pool that processes URLs in parallel:

import (
    "log"
    "sync"
)

// crawlJob pairs a URL with the depth at which it was discovered
type crawlJob struct {
    url   string
    depth int
}

type Crawler struct {
    visited    map[string]bool
    mu         sync.Mutex
    wg         sync.WaitGroup
    queue      chan crawlJob
    maxWorkers int
    maxDepth   int
    baseHost   string
}

func NewCrawler(maxWorkers, maxDepth int, baseHost string) *Crawler {
    return &Crawler{
        visited:    make(map[string]bool),
        queue:      make(chan crawlJob, 1000),
        maxWorkers: maxWorkers,
        maxDepth:   maxDepth,
        baseHost:   baseHost,
    }
}

// isVisited reports whether u has already been claimed and, if not, marks it
// as claimed. The check-and-set runs under one lock, so two workers can never
// both crawl the same URL.
func (c *Crawler) isVisited(u string) bool {
    c.mu.Lock()
    defer c.mu.Unlock()
    if c.visited[u] {
        return true
    }
    c.visited[u] = true
    return false
}

func (c *Crawler) worker(id int) {
    for job := range c.queue {
        if c.isVisited(job.url) {
            c.wg.Done()
            continue
        }

        log.Printf("[Worker %d] Crawling (depth %d): %s\n", id, job.depth, job.url)

        doc, err := fetchPage(job.url)
        if err != nil {
            log.Printf("[Worker %d] Error: %v\n", id, err)
            c.wg.Done()
            continue
        }

        links := extractLinks(doc, job.url)

        // Enqueue same-domain links until the depth limit is reached
        if job.depth < c.maxDepth {
            for _, link := range links {
                parsed, err := url.Parse(link)
                if err != nil || parsed.Host != c.baseHost {
                    continue
                }
                c.wg.Add(1)
                // Send from a goroutine so a full queue never blocks the worker
                go func(j crawlJob) {
                    c.queue <- j
                }(crawlJob{url: link, depth: job.depth + 1})
            }
        }

        c.wg.Done()
    }
}

The sync.Mutex protects the visited map from race conditions when multiple goroutines access it simultaneously, and the channel acts as a queue that distributes work across all workers. The visited check happens only when a job is dequeued, so a URL can sit in the queue twice but is fetched at most once.
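
If you would rather not manage the lock yourself, sync.Map handles this kind of check-and-set on its own; a minimal sketch (markVisited is a hypothetical helper, not part of the crawler above):

import "sync"

// visited records every URL the crawler has claimed so far
var visited sync.Map

// markVisited reports whether u was already claimed and records it if not
func markVisited(u string) bool {
    _, alreadySeen := visited.LoadOrStore(u, true)
    return alreadySeen
}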

Running the Concurrent Crawler

Put everything together in the main function:

func main() {
    startURL := "https://example.com"
    parsed, _ := url.Parse(startURL)

    crawler := NewCrawler(10, 3, parsed.Host) // 10 workers, max depth 3

    // Start workers
    for i := 0; i < crawler.maxWorkers; i++ {
        go crawler.worker(i)
    }

    // Seed the queue with the start URL at depth 0
    crawler.wg.Add(1)
    crawler.queue <- crawlJob{url: startURL, depth: 0}

    // Wait for completion
    crawler.wg.Wait()
    close(crawler.queue)

    log.Printf("Crawled %d pages\n", len(crawler.visited))
}

This crawler will process up to 10 pages simultaneously. Adjust the worker count based on your network speed and the target server's capacity. Start with 5-10 workers and increase gradually.

Adding Rate Limiting

Respect target servers by adding delays between requests:

import (
    "math/rand"
    "time"
)

func (c *Crawler) worker(id int) {
    for job := range c.queue {
        // Random delay between 1-3 seconds before each fetch
        delay := time.Duration(1000+rand.Intn(2000)) * time.Millisecond
        time.Sleep(delay)

        // ... rest of the worker logic from above
    }
}

Random delays make your traffic pattern look more human. Fixed intervals are easily detected by anti-bot systems.
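
If you want a hard requests-per-second cap instead of random sleeps, the golang.org/x/time/rate token-bucket limiter works well (fetch it with go get golang.org/x/time). A sketch of how it could wrap fetchPage; politeFetch is a hypothetical helper:

import (
    "context"

    "github.com/PuerkitoBio/goquery"
    "golang.org/x/time/rate"
)

// Shared across all workers: at most 2 requests per second, bursts of 1
var limiter = rate.NewLimiter(rate.Limit(2), 1)

func politeFetch(ctx context.Context, pageURL string) (*goquery.Document, error) {
    // Wait blocks until a token is available or the context is cancelled
    if err := limiter.Wait(ctx); err != nil {
        return nil, err
    }
    return fetchPage(pageURL)
}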

Method 2: Colly Framework for Production Crawlers

Colly is the most popular Go web crawling framework. It handles caching, parallelism, rate limiting, and cookie management out of the box.

Installing and Configuring Colly

The Colly collector is the core component that manages everything:

package main

import (
    "log"
    "time"

    "github.com/gocolly/colly/v2"
)

func main() {
    c := colly.NewCollector(
        colly.AllowedDomains("example.com", "www.example.com"),
        colly.MaxDepth(3),
        colly.Async(true),
        colly.CacheDir("./colly_cache"),
    )

    // Per-domain limit: up to 10 concurrent requests, with a 0.5-1s delay between new requests
    c.Limit(&colly.LimitRule{
        DomainGlob:  "*",
        Parallelism: 10,
        Delay:       500 * time.Millisecond,
        RandomDelay: 500 * time.Millisecond,
    })

    c.OnRequest(func(r *colly.Request) {
        r.Headers.Set("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0.0.0")
        log.Printf("Visiting: %s\n", r.URL)
    })

    c.OnHTML("a[href]", func(e *colly.HTMLElement) {
        link := e.Attr("href")
        e.Request.Visit(link)
    })

    c.OnResponse(func(r *colly.Response) {
        log.Printf("Response from %s: %d bytes\n", r.Request.URL, len(r.Body))
    })

    c.OnError(func(r *colly.Response, err error) {
        log.Printf("Error on %s: %v\n", r.Request.URL, err)
    })

    c.Visit("https://example.com")
    c.Wait()
}

The LimitRule configuration is powerful. It applies rate limiting per domain, so you can crawl multiple sites simultaneously while respecting each site's limits.
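
For example, you could register a separate rule per domain instead of the catch-all glob above (both domains here are placeholders):

// Crawl the main site gently
c.Limit(&colly.LimitRule{
    DomainGlob:  "*example.com*",
    Parallelism: 2,
    Delay:       1 * time.Second,
})

// Allow more parallelism against a hypothetical static-assets host
c.Limit(&colly.LimitRule{
    DomainGlob:  "*static.example.net*",
    Parallelism: 10,
    RandomDelay: 200 * time.Millisecond,
})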

Extracting Structured Data with Colly

Colly's callback system makes data extraction clean:

package main

import (
    "encoding/json"
    "log"
    "os"

    "github.com/gocolly/colly/v2"
)

type Product struct {
    Name  string
    Price string
    URL   string
}

func main() {
    var products []Product

    c := colly.NewCollector(
        colly.AllowedDomains("web-scraping.dev"),
    )

    // Find product cards
    c.OnHTML("div.product", func(e *colly.HTMLElement) {
        product := Product{
            Name:  e.ChildText("h3"),
            Price: e.ChildText(".price"),
            URL:   e.Request.URL.String(),
        }
        products = append(products, product)
    })

    // Follow pagination links
    c.OnHTML("a.next-page", func(e *colly.HTMLElement) {
        e.Request.Visit(e.Attr("href"))
    })

    c.OnScraped(func(r *colly.Response) {
        log.Printf("Finished: %s\n", r.Request.URL)
    })

    c.Visit("https://web-scraping.dev/products")
    c.Wait()

    // Export to JSON
    jsonData, _ := json.MarshalIndent(products, "", "  ")
    os.WriteFile("products.json", jsonData, 0644)
}

The ChildText and ChildAttr methods make selecting nested elements straightforward. No need for complex CSS selectors when you can chain simple ones.
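
For instance, if the product card also carried a thumbnail and a detail link, ChildAttr could pull them out in the same callback; the selectors below are assumptions about the page's markup:

c.OnHTML("div.product", func(e *colly.HTMLElement) {
    name := e.ChildText("h3")
    image := e.ChildAttr("img", "src")
    // AbsoluteURL resolves a relative href against the current page URL
    link := e.Request.AbsoluteURL(e.ChildAttr("a", "href"))
    log.Printf("%s -> %s (thumbnail: %s)\n", name, link, image)
})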

Using Multiple Collectors for Complex Crawls

For sites with different page types, use cloned collectors:

func main() {
    // Main collector for listing pages
    listCollector := colly.NewCollector(
        colly.AllowedDomains("example.com"),
        colly.Async(true),
    )

    // Clone for detail pages (different settings)
    detailCollector := listCollector.Clone()

    listCollector.OnHTML("a.product-link", func(e *colly.HTMLElement) {
        // Hand off to detail collector
        detailCollector.Visit(e.Attr("href"))
    })

    detailCollector.OnHTML("div.product-detail", func(e *colly.HTMLElement) {
        // Extract full product data
        name := e.ChildText("h1")
        description := e.ChildText(".description")
        log.Printf("Product: %s\n", name)
    })

    listCollector.Visit("https://example.com/products")

    listCollector.Wait()
    detailCollector.Wait()
}

This pattern keeps your code organized when crawling sites with multiple page templates.

Method 3: Chromedp for JavaScript-Heavy Sites

Modern websites often load content dynamically with JavaScript. Standard HTTP requests only get the initial HTML shell. Chromedp controls a real Chrome browser to render these pages.

Basic Chromedp Setup

First, install chromedp and ensure Chrome or Chromium is available on your system:

go get github.com/chromedp/chromedp

Here is a basic example that renders a JavaScript page:

package main

import (
    "context"
    "log"
    "time"

    "github.com/chromedp/chromedp"
)

func main() {
    // Create context with timeout
    ctx, cancel := chromedp.NewContext(context.Background())
    defer cancel()

    ctx, cancel = context.WithTimeout(ctx, 60*time.Second)
    defer cancel()

    var htmlContent string

    err := chromedp.Run(ctx,
        chromedp.Navigate("https://example.com"),
        chromedp.WaitVisible("body", chromedp.ByQuery),
        chromedp.Sleep(2*time.Second), // Wait for JS to load
        chromedp.OuterHTML("html", &htmlContent),
    )

    if err != nil {
        log.Fatal(err)
    }

    log.Printf("Got %d bytes of HTML\n", len(htmlContent))
}

Chromedp runs in headless mode by default. The browser executes all JavaScript, just like a real user's browser.
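
If you need to adjust the browser itself, say to watch it render while debugging or to pin a User-Agent, build the context from an exec allocator. A minimal sketch:

opts := append(chromedp.DefaultExecAllocatorOptions[:],
    chromedp.Flag("headless", false), // show the browser window while debugging
    chromedp.UserAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0.0.0"),
    chromedp.WindowSize(1920, 1080),
)

allocCtx, cancel := chromedp.NewExecAllocator(context.Background(), opts...)
defer cancel()

ctx, cancel := chromedp.NewContext(allocCtx)
defer cancel()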

Handling Infinite Scroll

Many sites load more content as you scroll. Here is how to handle that:

func scrollAndCollect(ctx context.Context) ([]string, error) {
    var items []string
    previousCount := 0

    for i := 0; i < 10; i++ { // Max 10 scroll attempts
        // Scroll to bottom
        err := chromedp.Run(ctx,
            chromedp.Evaluate(`window.scrollTo(0, document.body.scrollHeight)`, nil),
            chromedp.Sleep(2*time.Second),
        )
        if err != nil {
            return nil, err
        }

        // Collect items
        var currentItems []string
        err = chromedp.Run(ctx,
            chromedp.Evaluate(`
                Array.from(document.querySelectorAll('.item-class'))
                    .map(el => el.textContent)
            `, &currentItems),
        )
        if err != nil {
            return nil, err
        }

        items = currentItems

        // Stop if no new items loaded
        if len(items) == previousCount {
            break
        }
        previousCount = len(items)
    }

    return items, nil
}

The loop continues scrolling until no new content appears. Adjust the sleep duration based on how fast the target site loads.

Filling Forms and Clicking Buttons

Chromedp can interact with pages like a human:

func loginAndScrape(ctx context.Context, username, password string) error {
    return chromedp.Run(ctx,
        chromedp.Navigate("https://example.com/login"),
        chromedp.WaitVisible("#username", chromedp.ByID),

        // Fill login form
        chromedp.SendKeys("#username", username, chromedp.ByID),
        chromedp.SendKeys("#password", password, chromedp.ByID),

        // Click submit
        chromedp.Click("#submit-btn", chromedp.ByID),

        // Wait for redirect
        chromedp.WaitVisible(".dashboard", chromedp.ByQuery),
    )
}

The WaitVisible action ensures the element exists before interacting with it. This prevents race conditions where your code tries to click a button that has not rendered yet.
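
Once the dashboard is visible, you can read values straight out of the rendered DOM in a follow-up run; the .welcome-name selector below is an assumption:

var welcome string
err := chromedp.Run(ctx,
    chromedp.Text(".welcome-name", &welcome, chromedp.ByQuery),
)
if err != nil {
    log.Fatal(err)
}
log.Printf("Logged in as: %s\n", welcome)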

Running Chromedp in Docker

For production deployments, run the official headless Chrome image as its own container and point your scraper at its DevTools port:

docker run -d --name chrome-container -p 9222:9222 chromedp/headless-shell:latest

Connect to that Chrome instance from your Go code (use ws://localhost:9222 if your scraper runs on the host, or the container name when both containers share a Docker network):

allocCtx, cancel := chromedp.NewRemoteAllocator(
    context.Background(),
    "ws://chrome-container:9222",
)
defer cancel()

ctx, cancel := chromedp.NewContext(allocCtx)
defer cancel()

This setup is more resource-efficient for large-scale scraping.

Advanced Techniques: Proxy Rotation and Anti-Bot Bypass

Production crawlers need to handle IP bans, rate limits, and anti-bot systems. Here are techniques that work in 2026.

Rotating User-Agents

Rotate your User-Agent string to appear as different browsers:

var userAgents = []string{
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 Safari/605.1.15",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 Chrome/119.0.0.0 Safari/537.36",
}

func randomUserAgent() string {
    return userAgents[rand.Intn(len(userAgents))]
}

func fetchWithRotation(url string) (*http.Response, error) {
    client := &http.Client{Timeout: 30 * time.Second}
    req, _ := http.NewRequest("GET", url, nil)

    req.Header.Set("User-Agent", randomUserAgent())
    req.Header.Set("Accept", "text/html,application/xhtml+xml")
    req.Header.Set("Accept-Language", "en-US,en;q=0.9")
    req.Header.Set("Accept-Encoding", "gzip, deflate, br")

    return client.Do(req)
}

Keep your User-Agent list updated. Outdated browser versions are a red flag for anti-bot systems.
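
If you are using Colly, its extensions package can handle the rotation for you:

import (
    "github.com/gocolly/colly/v2"
    "github.com/gocolly/colly/v2/extensions"
)

func newCollector() *colly.Collector {
    c := colly.NewCollector()
    // Pick a different User-Agent for every request
    extensions.RandomUserAgent(c)
    // Fill the Referer header from the previously visited page
    extensions.Referer(c)
    return c
}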

Proxy Rotation in Pure Go

Here is a complete proxy rotation implementation:

type ProxyRotator struct {
    proxies []string
    index   int
    mu      sync.Mutex
}

func NewProxyRotator(proxies []string) *ProxyRotator {
    return &ProxyRotator{proxies: proxies}
}

func (p *ProxyRotator) Next() string {
    p.mu.Lock()
    defer p.mu.Unlock()

    proxy := p.proxies[p.index]
    p.index = (p.index + 1) % len(p.proxies)
    return proxy
}

func (p *ProxyRotator) GetClient() *http.Client {
    proxyURL, _ := url.Parse(p.Next())

    transport := &http.Transport{
        Proxy: http.ProxyURL(proxyURL),
        TLSClientConfig: &tls.Config{
            InsecureSkipVerify: true,
        },
    }

    return &http.Client{
        Transport: transport,
        Timeout:   30 * time.Second,
    }
}

Usage example:

func main() {
    rotator := NewProxyRotator([]string{
        "http://proxy1.example.com:8080",
        "http://proxy2.example.com:8080",
        "http://proxy3.example.com:8080",
    })

    for _, targetURL := range urls {
        client := rotator.GetClient()
        resp, err := client.Get(targetURL)
        // ... handle response
    }
}

For production scraping, residential proxies from providers like Roundproxies.com work best. Datacenter IPs are easily detected and blocked.

Proxy Rotation with Colly

Colly has built-in proxy support:

import "github.com/gocolly/colly/v2/proxy"

func main() {
    c := colly.NewCollector()

    // Round-robin proxy rotation
    rp, err := proxy.RoundRobinProxySwitcher(
        "http://user:pass@proxy1.example.com:8080",
        "http://user:pass@proxy2.example.com:8080",
        "socks5://user:pass@proxy3.example.com:1080",
    )
    if err != nil {
        log.Fatal(err)
    }

    c.SetProxyFunc(rp)

    // ... rest of collector setup
}

Colly supports HTTP, HTTPS, and SOCKS5 proxies.

Handling Rate Limits and Retries

Implement exponential backoff for failed requests:

func fetchWithRetry(url string, maxRetries int) (*http.Response, error) {
    var resp *http.Response
    var err error

    for attempt := 0; attempt < maxRetries; attempt++ {
        resp, err = http.Get(url)

        if err == nil && resp.StatusCode == http.StatusOK {
            return resp, nil
        }

        if resp != nil {
            resp.Body.Close()

            // Handle rate limiting (429)
            if resp.StatusCode == http.StatusTooManyRequests {
                backoff := time.Duration(math.Pow(2, float64(attempt))) * time.Second
                log.Printf("Rate limited. Waiting %v before retry %d\n", backoff, attempt+1)
                time.Sleep(backoff)
                continue
            }
        }

        // Exponential backoff for other errors
        backoff := time.Duration(math.Pow(2, float64(attempt))) * time.Second
        time.Sleep(backoff)
    }

    return nil, fmt.Errorf("failed after %d retries: %v", maxRetries, err)
}

Start with short delays and double them on each failure. This approach is gentle on servers while still getting your data.
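
Many servers also include a Retry-After header with a 429 response; honoring it when present is more accurate than guessing. A small helper, assuming the header carries a number of seconds:

import (
    "net/http"
    "strconv"
    "time"
)

// retryAfter returns the server-suggested wait, or fallback if the header is missing or unparseable
func retryAfter(resp *http.Response, fallback time.Duration) time.Duration {
    if v := resp.Header.Get("Retry-After"); v != "" {
        if secs, err := strconv.Atoi(v); err == nil {
            return time.Duration(secs) * time.Second
        }
    }
    return fallback
}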

TLS Fingerprint Spoofing

Advanced anti-bot systems like Cloudflare fingerprint your TLS handshake. The CycleTLS library lets you spoof browser fingerprints:

import "github.com/Danny-Dasilva/CycleTLS/cycletls"

func fetchWithTLSSpoof(url string) (string, error) {
    client := cycletls.Init()

    response, err := client.Do(url, cycletls.Options{
        Body:      "",
        Ja3:       "771,4865-4866-4867-49195-49199-49196-49200-52393-52392-49171-49172-156-157-47-53,0-23-65281-10-11-35-16-5-13-18-51-45-43-27-17513,29-23-24,0",
        UserAgent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
    }, "GET")

    if err != nil {
        return "", err
    }

    return response.Body, nil
}

The JA3 string represents Chrome's TLS fingerprint. Use tools like scrapfly.io/web-scraping-tools/ja3-fingerprint to capture real browser fingerprints.

Production-Ready Patterns

Worker Pool with Context Cancellation

Handle graceful shutdown properly:

func crawlWithContext(ctx context.Context, urls []string, workers int) error {
    jobs := make(chan string, len(urls))
    results := make(chan string, len(urls))
    var wg sync.WaitGroup

    // Start workers
    for i := 0; i < workers; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            for {
                select {
                case url, ok := <-jobs:
                    if !ok {
                        return
                    }
                    result := processURL(url)
                    results <- result
                case <-ctx.Done():
                    log.Printf("Worker %d shutting down\n", id)
                    return
                }
            }
        }(i)
    }

    // Send jobs
    for _, url := range urls {
        jobs <- url
    }
    close(jobs)

    // Wait for workers
    wg.Wait()
    close(results)

    return nil
}

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
    defer cancel()

    // Handle interrupt
    go func() {
        sigCh := make(chan os.Signal, 1)
        signal.Notify(sigCh, os.Interrupt)
        <-sigCh
        cancel()
    }()

    crawlWithContext(ctx, urls, 10)
}

Context cancellation ensures all goroutines stop cleanly when interrupted.

Saving Results to JSON and CSV

Export your crawled data:

import (
    "encoding/csv"
    "encoding/json"
    "os"
)

type Result struct {
    URL       string    `json:"url"`
    Title     string    `json:"title"`
    Timestamp time.Time `json:"timestamp"`
}

func saveJSON(results []Result, filename string) error {
    data, err := json.MarshalIndent(results, "", "  ")
    if err != nil {
        return err
    }
    return os.WriteFile(filename, data, 0644)
}

func saveCSV(results []Result, filename string) error {
    file, err := os.Create(filename)
    if err != nil {
        return err
    }
    defer file.Close()

    writer := csv.NewWriter(file)
    defer writer.Flush()

    // Header
    writer.Write([]string{"URL", "Title", "Timestamp"})

    // Data rows
    for _, r := range results {
        writer.Write([]string{
            r.URL,
            r.Title,
            r.Timestamp.Format(time.RFC3339),
        })
    }

    return nil
}

JSON is flexible for APIs. CSV integrates easily with spreadsheets and databases.

Structured Logging

Replace log.Printf with structured logging for production:

import "log/slog"

func main() {
    logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))

    logger.Info("Starting crawler",
        "workers", 10,
        "max_depth", 3,
    )

    logger.Error("Request failed",
        "url", url,
        "status", resp.StatusCode,
        "error", err,
    )
}

JSON logs are parseable by monitoring tools like Elasticsearch and Datadog.

Common Mistakes to Avoid

Not Closing Response Bodies

Every successful http.Client.Do() call returns a response whose body must be closed:

// WRONG - memory leak
resp, _ := client.Do(req)
body, _ := io.ReadAll(resp.Body)

// CORRECT
resp, err := client.Do(req)
if err != nil {
    return err
}
defer resp.Body.Close()
body, _ := io.ReadAll(resp.Body)

Unclosed bodies eventually exhaust file descriptors and crash your program.
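
A related detail: the underlying keep-alive connection is only reused if the body has been read to EOF before it is closed, so drain anything you do not consume:

// Drain and close so the TCP connection can go back into the pool
defer func() {
    io.Copy(io.Discard, resp.Body)
    resp.Body.Close()
}()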

Loop Variable Capture Bug

This classic Go mistake breaks goroutines:

// WRONG - all goroutines get the same URL
for _, url := range urls {
    go func() {
        fetchPage(url) // url changes before goroutine runs
    }()
}

// CORRECT - capture the variable
for _, url := range urls {
    go func(u string) {
        fetchPage(u)
    }(url)
}

Pass loop variables as parameters to fix this. Go 1.22 changed for-loop semantics so each iteration now gets a fresh variable, which removes this particular trap, but passing the value explicitly still reads clearly and works on every Go version.

No Request Timeouts

Requests without timeouts hang forever:

// WRONG - can hang indefinitely
resp, _ := http.Get(url)

// CORRECT
client := &http.Client{Timeout: 30 * time.Second}
resp, _ := client.Get(url)

Always set timeouts on HTTP clients.
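
For finer-grained control, you can also bound an individual request with a context deadline on top of the client-wide timeout:

ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()

// The request is aborted as soon as the context deadline expires
req, err := http.NewRequestWithContext(ctx, "GET", url, nil)
if err != nil {
    return err
}
resp, err := client.Do(req)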

Ignoring robots.txt

Respecting robots.txt is both ethical and practical. Sites that catch you ignoring it will block you faster:

import "github.com/temoto/robotstxt"

func checkRobots(baseURL, targetPath string) (bool, error) {
    robotsURL := baseURL + "/robots.txt"
    resp, err := http.Get(robotsURL)
    if err != nil {
        return true, nil // Allow if robots.txt is unreachable
    }
    defer resp.Body.Close()

    data, err := robotstxt.FromResponse(resp)
    if err != nil {
        return false, err
    }
    return data.TestAgent(targetPath, "MyBot"), nil
}

FAQ

Is Go faster than Python for web crawling?

Yes. Go crawlers typically run 2-5x faster than equivalent Python scrapers. Go compiles to native code and has true parallelism via goroutines, while Python is interpreted and limited by the GIL. For a project crawling 1 million pages, this difference means finishing in hours instead of days.

Which Go library should I use for web crawling?

Start with net/http plus goquery for learning and simple projects. Use Colly for production crawlers that need rate limiting, caching, and parallel execution. Use chromedp only when you need to render JavaScript or interact with dynamic content. Most sites work fine with Colly.

How many concurrent requests can Go handle?

A single Go program can easily handle thousands of concurrent connections. The practical limit depends on your network bandwidth, the target server's rate limits, and available RAM. Start with 10-50 concurrent workers and scale up while monitoring for 429 errors.

How do I avoid getting blocked while crawling?

Use these techniques in combination: rotate User-Agents, add random delays between requests (1-5 seconds), use residential proxies from providers like Roundproxies.com, and respect rate limits. Avoid predictable patterns that anti-bot systems can detect.

Can Go crawlers handle JavaScript-rendered pages?

Yes, using chromedp or rod libraries. These control a real Chrome browser that executes JavaScript. However, headless browsers are slower and more resource-intensive than HTTP requests. Only use them when necessary.

Wrapping Up

You now have three complete approaches for building web crawlers in Go: pure net/http with goroutines for maximum control, Colly for production-ready features, and chromedp for JavaScript-heavy sites.

Go's performance and concurrency model make it the best choice for large-scale crawling projects in 2026. Start with the method that matches your use case, add anti-bot protections as needed, and scale your worker count based on results.

The code examples in this guide are production-tested patterns. Adapt them to your specific needs, respect site terms of service, and scale responsibly.

Next steps: Explore the Rod library for an alternative headless browser with built-in stealth features, or learn about CycleTLS for advanced TLS fingerprint spoofing against Cloudflare-protected sites.