AngleSharp transforms HTML parsing from a tedious regex nightmare into something that feels natural—like using JavaScript's DOM API, but in C#. If you've ever tried scraping a website with HtmlAgilityPack only to realize it chokes on HTML5 or modern CSS selectors, AngleSharp is the upgrade you didn't know you needed.

I've spent the last few months building scrapers and document processors with AngleSharp, and I keep finding clever ways to use it that aren't covered in the typical "parse HTML with AngleSharp" tutorial. This guide will take you from setup to production-ready techniques that actually handle real-world scenarios.

Why AngleSharp beats the alternatives

Before we dig into code, let's talk about why you'd choose AngleSharp over HtmlAgilityPack or CsQuery.

AngleSharp follows the W3C specification. This means it parses HTML5 exactly like Chrome or Firefox would. If you've ever opened the browser DevTools and used document.querySelector(), that's the same API AngleSharp exposes in C#. HtmlAgilityPack, on the other hand, was built before HTML5 existed and has quirks that'll drive you crazy.

It handles broken HTML gracefully. Real-world HTML is a mess. AngleSharp includes the same error correction algorithms that browsers use, so it won't freak out when it encounters unclosed tags or misplaced elements.

CSS selectors work out of the box. While HtmlAgilityPack requires XPath (which nobody enjoys writing), AngleSharp natively supports CSS selectors. Want all links in a nav? document.QuerySelectorAll("nav a"). Simple as that.

It's actively maintained. Check the GitHub repository: AngleSharp gets regular updates, bug fixes, and new features. HtmlAgilityPack still ships releases, but its core API has changed little in years.

The main trade-off? AngleSharp is a bit slower than HtmlAgilityPack on simple parsing tasks. But for most use cases, the difference is negligible (we're talking milliseconds), and the superior API makes up for it.

Getting started with AngleSharp

Let's get AngleSharp installed and parse our first HTML document.

Installation

Open your terminal in your .NET project directory and run:

dotnet add package AngleSharp

That's the core library. Depending on what you're building, you might also want:

# For CSS parsing support
dotnet add package AngleSharp.Css

# For JavaScript execution
dotnet add package AngleSharp.Js

We'll use these extension packages later in the guide.

Your first AngleSharp program

Here's the simplest possible AngleSharp program:

using AngleSharp;
using AngleSharp.Html.Dom;

var html = @"
<!DOCTYPE html>
<html>
<head><title>My Page</title></head>
<body>
    <h1>Hello, AngleSharp!</h1>
    <p>This is a paragraph.</p>
</body>
</html>";

var config = Configuration.Default;
var context = BrowsingContext.New(config);
var document = await context.OpenAsync(req => req.Content(html));

Console.WriteLine(document.Title); // Output: My Page

Let's break down what's happening here:

Configuration.Default creates a basic configuration with minimal features enabled. Think of this as AngleSharp's settings object—it defines what capabilities your parser has.

BrowsingContext is like a browser tab. You can have multiple contexts running simultaneously, each with their own settings and document history.

OpenAsync loads your HTML content into a document. The req => req.Content(html) lambda tells AngleSharp to use our HTML string rather than fetching from a URL.

Basic HTML parsing and DOM navigation

Now that you've parsed a document, let's extract data from it.

Using QuerySelector and QuerySelectorAll

These are your bread-and-butter methods. If you've used JavaScript, this will feel familiar:

var html = @"
<html>
<body>
    <div class='product' data-id='101'>
        <h2>Laptop</h2>
        <span class='price'>$999</span>
    </div>
    <div class='product' data-id='102'>
        <h2>Mouse</h2>
        <span class='price'>$29</span>
    </div>
</body>
</html>";

var context = BrowsingContext.New(Configuration.Default);
var document = await context.OpenAsync(req => req.Content(html));

// Get a single element
var firstProduct = document.QuerySelector(".product");
Console.WriteLine(firstProduct.GetAttribute("data-id")); // 101

// Get all matching elements
var allPrices = document.QuerySelectorAll(".price");
foreach (var price in allPrices)
{
    Console.WriteLine(price.TextContent); // $999, then $29
}

The beauty of QuerySelector is that it accepts any valid CSS selector. Want nested elements? document.QuerySelector("div.product > h2"). Attribute selectors? document.QuerySelector("[data-id='101']"). It all works.
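As a quick illustration of those selector forms, here is a minimal sketch run against a trimmed-down version of the product markup from above. The :last-child query is an extra example added for illustration, and the variable names are arbitrary:

```csharp
using System;
using AngleSharp;

// Trimmed-down version of the product markup used earlier
var html = @"
<div class='product' data-id='101'><h2>Laptop</h2><span class='price'>$999</span></div>
<div class='product' data-id='102'><h2>Mouse</h2><span class='price'>$29</span></div>";

var context = BrowsingContext.New(Configuration.Default);
var document = await context.OpenAsync(req => req.Content(html));

// Child combinator: an <h2> that is a direct child of div.product
var firstHeading = document.QuerySelector("div.product > h2")?.TextContent;

// Attribute selector combined with a descendant selector
var mousePrice = document.QuerySelector("[data-id='102'] .price")?.TextContent;

// Structural pseudo-class: the product that is the last child of its parent
var lastHeading = document.QuerySelector(".product:last-child h2")?.TextContent;

Console.WriteLine($"{firstHeading}, {mousePrice}, {lastHeading}");
```

Each query returns the first match only; switch to QuerySelectorAll with the same selector string when you want every match.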

Sometimes you need to walk the DOM tree manually. AngleSharp gives you properties like ParentElement, Children, FirstElementChild, and more:

var product = document.QuerySelector(".product");
var heading = product.FirstElementChild; // The <h2>
var price = heading.NextElementSibling; // The <span class='price'>

Console.WriteLine(heading.TextContent); // Laptop
Console.WriteLine(price.TextContent); // $999

This is useful when the HTML structure is predictable but doesn't have convenient CSS classes.

Loading HTML from a URL

In real projects, you'll usually load HTML from the web. AngleSharp can handle HTTP requests directly:

var config = Configuration.Default.WithDefaultLoader();
var context = BrowsingContext.New(config);
var document = await context.OpenAsync("https://example.com");

Console.WriteLine(document.Title);

The .WithDefaultLoader() method registers AngleSharp's built-in document loader and HTTP requester, so OpenAsync can fetch pages over HTTP and HTTPS. Without it, the context has no way to make network requests.

Advanced CSS selectors with LINQ integration

Here's where AngleSharp starts to shine. Every collection it returns supports LINQ, which means you can chain queries elegantly.

Combining CSS selectors with LINQ

Let's say you're scraping a product listing and want all products over $100:

var products = document.QuerySelectorAll(".product")
    .Select(p => new {
        Name = p.QuerySelector("h2")?.TextContent,
        PriceText = p.QuerySelector(".price")?.TextContent
    })
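    .Where(p => {
        // Strip currency formatting, then parse independently of the machine's culture
        // (requires: using System.Globalization;)
        var priceStr = p.PriceText?.Replace("$", "").Replace(",", "");
        return decimal.TryParse(priceStr, NumberStyles.Number, CultureInfo.InvariantCulture, out var price)
            && price > 100;
    })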
    .ToList();

foreach (var product in products)
{
    Console.WriteLine($"{product.Name}: {product.PriceText}");
}

This is way cleaner than doing multiple loops. You're querying the DOM and filtering results in one fluid chain.

Using extension methods for cleaner code

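AngleSharp also exposes convenience members beyond QuerySelector. Some of them, like document.All, document.Images, and document.Links, are standard DOM properties; Text() is an extension method from the AngleSharp.Dom namespace:

// Get every element as a flat collection, then keep only the <div> elements
var allDivs = document.All.OfType<IHtmlDivElement>();

// Get all text content recursively (extension method, needs using AngleSharp.Dom)
var allText = document.Body.Text();

// Built-in collections for common element types
var images = document.Images; // All <img> elements
var links = document.Links;   // All <a> and <area> elements with an href attribute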

The document.All property is particularly handy—it gives you every element in the document as a flat collection, which you can then filter with LINQ:

var elementsWithDataAttribute = document.All
    .Where(e => e.HasAttribute("data-id"))
    .ToList();

Parsing and manipulating CSS

Most AngleSharp tutorials skip this entirely, but CSS parsing is incredibly useful. You might need to extract color schemes from a website, analyze stylesheets, or validate CSS in user-submitted content.

First, install the CSS extension:

dotnet add package AngleSharp.Css

Now you can parse CSS stylesheets:

using AngleSharp.Css.Dom;

var config = Configuration.Default
    .WithDefaultLoader()
    .WithCss();

var context = BrowsingContext.New(config);
var document = await context.OpenAsync("https://example.com");

// Access inline styles
var element = document.QuerySelector(".colored-box");
var inlineStyle = element.GetAttribute("style");
Console.WriteLine(inlineStyle);

// Access stylesheets
foreach (var sheet in document.StyleSheets.OfType<ICssStyleSheet>())
{
    Console.WriteLine($"Stylesheet: {sheet.Href ?? "inline"}");
    
    foreach (var rule in sheet.Rules.OfType<ICssStyleRule>())
    {
        Console.WriteLine($"  Selector: {rule.SelectorText}");
        Console.WriteLine($"  Color: {rule.Style.GetPropertyValue("color")}");
    }
}

This lets you do things like:

  • Extract all colors used in a design
  • Find unused CSS rules
  • Validate that user-submitted CSS doesn't contain dangerous properties
  • Migrate CSS values (e.g., convert all hex colors to RGB)
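The first item in that list, extracting the colors a design uses, can be sketched as follows. This assumes the AngleSharp.Css package is installed and uses a small inline stylesheet invented for the example; the exact string format of each color value is whatever AngleSharp.Css serializes, so the sketch only collects and de-duplicates the values:

```csharp
using System;
using System.Linq;
using AngleSharp;
using AngleSharp.Css.Dom;

// Hypothetical page with an inline stylesheet, used only for this example
var html = @"<html><head><style>
  .a { color: red; }
  .b { color: #00ff00; background-color: blue; }
  .c { margin: 0; }
</style></head><body></body></html>";

var config = Configuration.Default.WithCss();
var context = BrowsingContext.New(config);
var document = await context.OpenAsync(req => req.Content(html));

// Flatten every style rule of every stylesheet and collect distinct 'color' values
var colors = document.StyleSheets
    .OfType<ICssStyleSheet>()
    .SelectMany(sheet => sheet.Rules.OfType<ICssStyleRule>())
    .Select(rule => rule.Style.GetPropertyValue("color"))
    .Where(value => !string.IsNullOrEmpty(value)) // rules without a color yield an empty string
    .Distinct()
    .ToList();

foreach (var color in colors)
{
    Console.WriteLine(color);
}
```

To cover background-color, border-color, and similar properties as well, run the same query once per property name and merge the results.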

Getting computed styles

This is where it gets really interesting. AngleSharp can calculate the computed styles for an element—meaning it resolves the cascade and gives you the final values:

var element = document.QuerySelector(".button");
var computedStyle = document.DefaultView.GetComputedStyle(element);

Console.WriteLine(computedStyle.GetPropertyValue("background-color"));
Console.WriteLine(computedStyle.GetPropertyValue("font-size"));

This accounts for CSS inheritance, specificity, and the cascade. If you're building a design tool or need to accurately extract styling information, computed styles are essential.

JavaScript execution support

AngleSharp can execute JavaScript, but there's a caveat: the JavaScript engine is experimental and limited. For production use, I usually recommend pairing AngleSharp with a headless browser like PuppeteerSharp (we'll cover that later).

That said, for simple scripts, AngleSharp's JS support works fine. Install the package first:

dotnet add package AngleSharp.Js

Then enable scripting in the configuration:

using AngleSharp;
using AngleSharp.Js;

var config = Configuration.Default.WithJs();
var context = BrowsingContext.New(config);

var html = @"
<html>
<body>
    <div id='output'></div>
    <script>
        document.getElementById('output').textContent = 'Hello from JS!';
    </script>
</body>
</html>";

var document = await context.OpenAsync(req => req.Content(html));

var output = document.QuerySelector("#output");
Console.WriteLine(output.TextContent); // Hello from JS!

You can also execute arbitrary JavaScript against the document:

var result = document.ExecuteScript("document.querySelectorAll('p').length");
Console.WriteLine(result); // Number of paragraphs

When to use this: If the website uses JavaScript to populate data-* attributes or does simple DOM manipulation on page load, AngleSharp.Js can handle it. For single-page apps or sites with complex JavaScript, reach for PuppeteerSharp instead.

HTML sanitization techniques

This is a killer use case that most people overlook. If you accept HTML from users (like comments or WYSIWYG editors), you need to sanitize it to prevent XSS attacks.

Here's how to use AngleSharp as an HTML sanitizer:

// Requires: using System; using System.Collections.Generic; using System.Linq;
//           using System.Threading.Tasks; using AngleSharp; using AngleSharp.Dom;
public static async Task<string> SanitizeHtmlAsync(string dirtyHtml)
{
    var config = Configuration.Default;
    var context = BrowsingContext.New(config);
    var document = await context.OpenAsync(req => req.Content(dirtyHtml));

    // Define allowed tags, attributes, and URL schemes
    var allowedTags = new HashSet<string> { "p", "br", "strong", "em", "u", "a" };
    var allowedAttributes = new HashSet<string> { "href", "title" };
    var allowedSchemes = new HashSet<string> { "http", "https", "mailto" };

    // Remove all scripts
    foreach (var script in document.QuerySelectorAll("script").ToList())
    {
        script.Remove();
    }

    // Snapshot the descendants first, because removing nodes mutates the tree
    foreach (var element in document.Body.Descendants().OfType<IElement>().ToList())
    {
        // Remove disallowed elements together with their contents
        if (!allowedTags.Contains(element.LocalName))
        {
            element.Remove();
            continue;
        }

        // Remove disallowed attributes (onclick, onerror, style, ...)
        var attributesToRemove = element.Attributes
            .Select(attr => attr.Name)
            .Where(name => !allowedAttributes.Contains(name.ToLowerInvariant()))
            .ToList();

        foreach (var name in attributesToRemove)
        {
            element.RemoveAttribute(name);
        }

        // Keep href only for absolute URLs with an allowed scheme. This is stricter
        // than blocking "javascript:" and also drops relative links.
        var href = element.GetAttribute("href");
        if (href != null &&
            (!Uri.TryCreate(href.Trim(), UriKind.Absolute, out var uri) ||
             !allowedSchemes.Contains(uri.Scheme)))
        {
            element.RemoveAttribute("href");
        }
    }

    return document.Body.InnerHtml;
}

This approach:

  • Removes dangerous tags like <script> and <iframe>
  • Strips out disallowed attributes (like onclick or onerror)
  • Validates URLs in href attributes
  • Leaves allowed elements in place, but drops a disallowed element together with everything inside it

You can extend this to handle more cases—like stripping CSS expressions or validating that src attributes only point to trusted domains.
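As a rough sketch of the src check mentioned above, the snippet below strips any src attribute that does not point at an allowlisted HTTPS host. The host name cdn.example.com, the helper IsTrustedSrc, and the sample markup are all made up for illustration; a check like this could be folded into the sanitizer's attribute loop once img is added to the allowed tags:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using AngleSharp;
using AngleSharp.Dom;

// Hypothetical allowlist of hosts that may serve embedded resources
var trustedHosts = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "cdn.example.com" };

// True only for absolute https URLs whose host is on the allowlist
static bool IsTrustedSrc(string src, ISet<string> hosts) =>
    Uri.TryCreate(src, UriKind.Absolute, out var uri)
    && uri.Scheme == Uri.UriSchemeHttps
    && hosts.Contains(uri.Host);

var html = "<img src='https://cdn.example.com/a.png'>"
         + "<img src='https://evil.test/b.png'>"
         + "<img src='javascript:alert(1)'>";

var context = BrowsingContext.New(Configuration.Default);
var document = await context.OpenAsync(req => req.Content(html));

// Snapshot the matches, then strip every src that fails the check
foreach (var element in document.QuerySelectorAll("[src]").ToList())
{
    if (!IsTrustedSrc(element.GetAttribute("src"), trustedHosts))
    {
        element.RemoveAttribute("src");
    }
}

var remaining = document.QuerySelectorAll("[src]").Length;
Console.WriteLine($"Elements that kept their src: {remaining}");
```

Checking the parsed scheme and host against an allowlist is deliberately stricter than searching the string for known-bad prefixes, since anything that is not explicitly trusted gets removed.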

Performance optimization tricks

AngleSharp is fast, but when you're parsing thousands of documents, small optimizations add up. Here are techniques I've found effective.

Reuse BrowsingContext instances

Creating a new BrowsingContext has overhead. If you're parsing multiple documents with the same configuration, reuse the context:

var config = Configuration.Default.WithDefaultLoader();
var context = BrowsingContext.New(config);

// Parse multiple pages
foreach (var url in urls)
{
    var document = await context.OpenAsync(url);
    // Process document
}

This is especially important when scraping at scale.

Disable unnecessary features

If you don't need CSS or JavaScript support, don't enable them. The more features you load, the slower parsing becomes:

// Minimal config for maximum speed
var config = Configuration.Default;
var context = BrowsingContext.New(config);

Only add .WithCss() or .WithJs() if you actually need those features.

Parse fragments instead of full documents

If you only need to parse a snippet of HTML (like a component or a table), use fragment parsing:

var htmlFragment = "<div><span>Hello</span></div>";

var config = Configuration.Default;
var context = BrowsingContext.New(config);
var parser = context.GetService<IHtmlParser>();

var nodes = parser.ParseFragment(htmlFragment, null);
var element = nodes.OfType<IElement>().FirstOrDefault();

Console.WriteLine(element.QuerySelector("span").TextContent); // Hello

Fragment parsing avoids some of the work of setting up a full document, which can make it noticeably faster for small snippets. The exact gain depends on your input, so measure it on your own data.

Use async properly

AngleSharp's document-loading API is asynchronous. Don't block on .Result; await the call instead:

// Bad - blocks the thread
var document = context.OpenAsync(url).Result;

// Good - actually async
var document = await context.OpenAsync(url);

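When parsing multiple documents, use Task.WhenAll to parallelize. A BrowsingContext behaves like a single browser tab, so give each concurrent request its own context rather than sharing one (context reuse is for documents processed one after another):

var tasks = urls.Select(url => BrowsingContext.New(config).OpenAsync(url));
var documents = await Task.WhenAll(tasks);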
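Here is a self-contained sketch of the parallel pattern that runs without network access by parsing in-memory strings. It gives each task its own context, on the assumption that a single BrowsingContext is not designed for concurrent navigations; GetTitleAsync and the sample pages are invented for the example:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using AngleSharp;

// Stand-ins for downloaded pages, so the sketch needs no network
var pages = new[]
{
    "<title>One</title><p>first</p>",
    "<title>Two</title><p>second</p>",
    "<title>Three</title><p>third</p>"
};

var config = Configuration.Default;

// One context per task; the configuration object is shared
static async Task<string> GetTitleAsync(IConfiguration config, string html)
{
    var context = BrowsingContext.New(config);
    using var document = await context.OpenAsync(req => req.Content(html));
    return document.Title;
}

// Task.WhenAll returns results in the same order as the input tasks
var titles = await Task.WhenAll(pages.Select(html => GetTitleAsync(config, html)));

Console.WriteLine(string.Join(", ", titles));
```

When the inputs are real URLs, consider capping concurrency (for example with a SemaphoreSlim) so you don't open hundreds of connections to the same site at once.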

Form submission and manipulation

AngleSharp can fill out and submit forms programmatically. This is useful for automating logins, filling out contact forms, or testing form validation.

var config = Configuration.Default.WithDefaultLoader();
var context = BrowsingContext.New(config);
var document = await context.OpenAsync("https://example.com/contact");

// Find the form
var form = document.QuerySelector("form") as IHtmlFormElement;

// Fill out fields
var nameInput = document.QuerySelector("input[name='name']") as IHtmlInputElement;
nameInput.Value = "John Doe";

var emailInput = document.QuerySelector("input[name='email']") as IHtmlInputElement;
emailInput.Value = "john@example.com";

// Submit the form; SubmitAsync resolves to the document the server sends back
var resultDocument = await form.SubmitAsync();

// Inspect the result
Console.WriteLine(resultDocument.StatusCode);

This works for GET and POST forms. AngleSharp constructs the request automatically based on the form's action, method, and field values.

Gotcha: This doesn't execute JavaScript, so if the form uses JS for validation or submission, you'll need a headless browser instead.

Handling JavaScript-heavy sites with PuppeteerSharp

For single-page apps or sites that heavily rely on JavaScript to render content, AngleSharp alone isn't enough. The solution? Use PuppeteerSharp to render the page, then pass the HTML to AngleSharp for parsing.

First, install PuppeteerSharp:

dotnet add package PuppeteerSharp

Here's how to combine them:

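using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using PuppeteerSharp;
using AngleSharp;

// Product is assumed to be a simple class with string Name and Price properties
public async Task<List<Product>> ScrapeDynamicSite(string url)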
{
    // Launch headless browser
    var browserFetcher = new BrowserFetcher();
    await browserFetcher.DownloadAsync();
    
    await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions
    {
        Headless = true
    });
    
    await using var page = await browser.NewPageAsync();
    await page.GoToAsync(url, new NavigationOptions
    {
        WaitUntil = new[] { WaitUntilNavigation.Networkidle0 }
    });

    // Wait for content to load
    await page.WaitForSelectorAsync(".product");

    // Get rendered HTML
    var html = await page.GetContentAsync();

    // Parse with AngleSharp
    var config = Configuration.Default;
    var context = BrowsingContext.New(config);
    var document = await context.OpenAsync(req => req.Content(html));

    // Extract data
    var products = document.QuerySelectorAll(".product")
        .Select(p => new Product
        {
            Name = p.QuerySelector("h2")?.TextContent,
            Price = p.QuerySelector(".price")?.TextContent
        })
        .ToList();

    return products;
}

This approach gives you the best of both worlds:

  • PuppeteerSharp renders JavaScript and waits for dynamic content
  • AngleSharp parses the final HTML efficiently with its superior API

It's slower than pure AngleSharp (because you're running a browser), but for JavaScript-heavy sites, it's the only reliable option.

Memory management for large documents

When processing huge HTML documents (think Wikipedia articles or documentation sites), memory usage can become an issue. Here are strategies to keep it under control.

Dispose documents explicitly

AngleSharp documents implement IDisposable. When you're done with a document, dispose it:

using (var document = await context.OpenAsync(url))
{
    // Process document
}
// Document is disposed here

This releases resources immediately instead of waiting for garbage collection.

Process elements incrementally

Instead of collecting every result in memory at once, process elements as you find them and flush the results in batches:

var products = new List<Product>();

await foreach (var productElement in GetProductsAsync(document))
{
    // Process one product at a time
    var product = ExtractProduct(productElement);
    products.Add(product);
    
    // Could save to database here and clear the list
}

async IAsyncEnumerable<IElement> GetProductsAsync(IDocument document)
{
    foreach (var element in document.QuerySelectorAll(".product"))
    {
        yield return element;
        await Task.Yield(); // Allow other operations to proceed
    }
}

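Keep in mind that the parsed document itself still sits in memory in full. This pattern only reduces your footprint if you write results out as you go instead of accumulating them in the list.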

Parse selectively

If you only need specific parts of a document, query for just those elements and ignore the rest:

var document = await context.OpenAsync(url);

// Only process the main content area
var mainContent = document.QuerySelector("#main-content");
if (mainContent != null)
{
    // Work with just this subtree
    var paragraphs = mainContent.QuerySelectorAll("p");
    // ...
}

// Don't traverse the entire document

Common pitfalls and solutions

After working with AngleSharp for a while, I've hit these issues more than once. Here's how to avoid them.

Pitfall 1: Forgetting to await OpenAsync

AngleSharp's methods are async. If you forget to await, you'll get a Task instead of a document:

// Wrong - doesn't wait for document to load
var document = context.OpenAsync(url);

// Right
var document = await context.OpenAsync(url);

Pitfall 2: QuerySelector returns null

Always check if QuerySelector returns null before accessing properties:

// Bad - throws NullReferenceException if element doesn't exist
var text = document.QuerySelector(".missing").TextContent;

// Good - uses null-conditional operator
var text = document.QuerySelector(".missing")?.TextContent;

Pitfall 3: Not configuring the loader for external URLs

If you're loading from a URL, you must enable the default loader:

// This won't work - no HTTP client configured
var config = Configuration.Default;

// This will work
var config = Configuration.Default.WithDefaultLoader();

Pitfall 4: Expecting JavaScript to run automatically

The core AngleSharp library doesn't execute JavaScript. You need to either:

  1. Add AngleSharp.Js and use .WithJs()
  2. Use a headless browser like PuppeteerSharp

Pitfall 5: Parsing performance surprises

If parsing is taking longer than expected, check:

  • Are you parsing CSS? That adds overhead
  • Are you loading external resources? Resource loading is off by default, so check that you haven't set IsResourceLoadingEnabled = true in your LoaderOptions unless you need it
  • Are you reusing the BrowsingContext or creating a new one each time?

Final thoughts

AngleSharp is the HTML parser I wish I'd discovered years ago. It handles modern HTML correctly, gives you a powerful query API, and extends naturally when you need CSS parsing or JavaScript execution.

The techniques in this guide—especially the sanitization, performance optimization, and PuppeteerSharp integration—aren't covered in the typical tutorials. But they're the patterns that make AngleSharp production-ready.

Start with basic parsing, then layer in the advanced features as you need them. And remember: AngleSharp follows web standards, so if you know the browser DOM API, you already know AngleSharp.

Now go parse some HTML.