Web Scraping

How to Scrape Kalshi in 2026: 4 Working Methods

Kalshi sits on a goldmine of prediction market data — contract prices, order books, trade history, and implied probabilities across thousands of events. Whether you're building a trading bot, backtesting strategies, or researching market efficiency, you need that data in your own pipeline.

Good news: Kalshi actually makes this easier than most platforms. They offer public API endpoints that require zero authentication for market data. No headless browsers, no CAPTCHA solving, no reverse-engineering JavaScript.

This guide walks you through four methods to scrape Kalshi data with Python, from simple REST calls to real-time WebSocket streams. Every code block runs as-is. By the end, you'll have a complete data collection pipeline exporting to CSV or JSON.

I tested each method against Kalshi's production API while writing this. The code works as of February 2026, and I've noted where things might change.

What Is Kalshi Scraping?

Scraping Kalshi means programmatically extracting prediction market data — prices, volumes, order books, and trade histories — from the Kalshi exchange. Unlike traditional web scraping that parses HTML, Kalshi provides a public REST API at api.elections.kalshi.com that returns structured JSON. No authentication is required for read-only market data. For real-time streaming, WebSocket channels deliver live updates without polling. The fastest path is their REST API, which handles most data collection needs.

Prerequisites

Before writing any code, get your environment set up:

  • Python 3.8+ (3.11+ recommended for better asyncio performance)
  • pip packages: requests, websockets, pandas
  • Optional: A Kalshi account + API key (only needed for portfolio/order data)

Install everything in one shot:

pip install requests websockets pandas

Public market data endpoints don't require an account. You can start pulling data immediately.
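
If you want a quick sanity check before going further, a single unauthenticated request is enough to confirm the API is reachable. This minimal sketch hits the same /markets endpoint covered in Method 1:

import requests

resp = requests.get(
    "https://api.elections.kalshi.com/trade-api/v2/markets",
    params={"limit": 1, "status": "open"},
    timeout=10,
)
resp.raise_for_status()
market = resp.json()["markets"][0]
print(market["ticker"], market["title"])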

How Kalshi's Data Is Structured

Kalshi organizes markets in a three-level hierarchy. Understanding this saves you from writing confused API calls.

Series is the top level — a recurring category like "Highest temperature in NYC" or "S&P 500 closing price." Each series has a ticker like KXHIGHNY.

Events are specific instances within a series. "Will NYC hit 90°F on February 10, 2026?" is one event inside the KXHIGHNY series.

Markets are the actual tradeable contracts within an event. Each market has YES/NO prices, volume, and an order book.

The API mirrors this hierarchy exactly. You'll query /series, /events, and /markets endpoints depending on what you need.
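
Here's a short sketch that walks the hierarchy for the KXHIGHNY series used above. It relies on the /series/{ticker} and /markets endpoints described later, and assumes the series response is wrapped in a "series" key the same way the other endpoints wrap their payloads:

import requests

BASE_URL = "https://api.elections.kalshi.com/trade-api/v2"

# Top level: the series itself
series = requests.get(f"{BASE_URL}/series/KXHIGHNY").json().get("series", {})
print("Series:", series.get("title", "KXHIGHNY"))

# Bottom level: tradeable markets, filtered by the series ticker
markets = requests.get(
    f"{BASE_URL}/markets",
    params={"series_ticker": "KXHIGHNY", "status": "open"},
).json().get("markets", [])

# Middle level: group markets by their parent event
event_tickers = {m["event_ticker"] for m in markets}
print(f"{len(event_tickers)} open events, {len(markets)} tradeable markets")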

Method 1: REST API — Pull All Open Markets

This is the simplest approach and handles 90% of data collection needs. Kalshi's public API returns JSON without authentication for market data.

Start by fetching every open market on the exchange:

import requests
import json
from datetime import datetime

BASE_URL = "https://api.elections.kalshi.com/trade-api/v2"

def get_all_markets(status="open", limit=200):
    """Fetch all markets with cursor-based pagination."""
    all_markets = []
    cursor = None

    while True:
        params = {"status": status, "limit": limit}
        if cursor:
            params["cursor"] = cursor

        response = requests.get(f"{BASE_URL}/markets", params=params)
        response.raise_for_status()
        data = response.json()

        markets = data.get("markets", [])
        all_markets.extend(markets)

        # Kalshi uses cursor-based pagination
        cursor = data.get("cursor")
        if not cursor or not markets:
            break

    return all_markets

markets = get_all_markets()
print(f"Fetched {len(markets)} open markets")

The limit parameter caps at 200 per request. The cursor field in each response tells you where to pick up next. When it's empty, you've hit the end.

Each market object comes loaded with fields. Here are the ones you'll actually use:

# Inspect what you're getting back
sample = markets[0]
useful_fields = {
    "ticker": sample["ticker"],
    "title": sample["title"],
    "yes_price": sample["yes_price"],      # in cents (0-100)
    "no_price": sample.get("no_price"),
    "volume": sample["volume"],
    "open_interest": sample.get("open_interest"),
    "event_ticker": sample["event_ticker"],
    "series_ticker": sample.get("series_ticker"),
    "close_time": sample.get("close_time"),
    "status": sample["status"],
}
print(json.dumps(useful_fields, indent=2))

Prices are in cents. A yes_price of 65 means the market implies a 65% probability of that outcome. That's the core data point most researchers care about.
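
Converting a price to a probability is a one-liner, so it's worth doing as soon as you pull a snapshot. A quick sketch using the markets list from above:

# Cent prices map directly to implied probabilities
for m in markets[:10]:
    implied = m["yes_price"] / 100  # 65 cents -> 65% implied probability
    print(f"{m['ticker']}: {implied:.0%} implied chance of YES")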

Method 2: Targeted Series and Event Scraping

Pulling every market works, but it's noisy. If you're tracking specific categories — say, weather or economics — filter by series ticker instead.

def get_series_markets(series_ticker, status="open"):
    """Get all markets for a specific series."""
    params = {
        "series_ticker": series_ticker,
        "status": status,
    }
    response = requests.get(f"{BASE_URL}/markets", params=params)
    response.raise_for_status()
    return response.json().get("markets", [])

def get_event_details(event_ticker):
    """Get full details for a specific event."""
    response = requests.get(f"{BASE_URL}/events/{event_ticker}")
    response.raise_for_status()
    return response.json().get("event", {})

# Example: Fetch all NYC temperature markets
weather_markets = get_series_markets("KXHIGHNY")
for m in weather_markets[:5]:
    print(f"{m['ticker']}: {m['title']} — Yes: {m['yes_price']}c")

This returns only the markets you want, so you burn fewer API calls against the rate limit.

For order book depth — which tells you about liquidity and where the real money sits — grab the order book for any specific market:

def get_orderbook(market_ticker):
    """Fetch the order book for a single market."""
    url = f"{BASE_URL}/markets/{market_ticker}/orderbook"
    response = requests.get(url)
    response.raise_for_status()
    return response.json().get("orderbook", {})

# Grab the first weather market's order book
if weather_markets:
    ticker = weather_markets[0]["ticker"]
    book = get_orderbook(ticker)

    print(f"\nOrderbook for {ticker}:")
    print("YES bids:")
    for price, qty in book.get("yes", [])[:5]:
        print(f"  {price}c x {qty} contracts")

    print("NO bids:")
    for price, qty in book.get("no", [])[:5]:
        print(f"  {price}c x {qty} contracts")

Kalshi's order book only shows bids, not asks. That's by design — in binary markets, a YES bid at 40 cents is equivalent to a NO ask at 60 cents. The prices are complementary.
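
If you want an effective spread anyway, derive it from the complement. Here's a small sketch using the book fetched above, where each entry is a [price, quantity] pair as in the earlier loop:

# Best YES bid vs. the YES ask implied by the best NO bid
yes_bids = book.get("yes", [])
no_bids = book.get("no", [])

if yes_bids and no_bids:
    best_yes_bid = max(price for price, _ in yes_bids)
    best_no_bid = max(price for price, _ in no_bids)
    implied_yes_ask = 100 - best_no_bid  # a NO bid at 60c implies a YES ask at 40c
    print(f"Best YES bid: {best_yes_bid}c")
    print(f"Implied YES ask: {implied_yes_ask}c")
    print(f"Effective spread: {implied_yes_ask - best_yes_bid}c")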
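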

Method 3: Real-Time WebSocket Streaming

REST polling works fine for snapshots. But if you need live prices — for a trading bot or real-time dashboard — WebSockets eliminate the latency of repeated HTTP requests.

Kalshi's public WebSocket channels include ticker, trade, and orderbook_delta. No authentication needed for these.

import asyncio
import websockets
import json

WEBSOCKET_URL = "wss://api.elections.kalshi.com/trade-api/ws/v2"

async def stream_market_ticker(market_ticker, duration_seconds=60):
    """Stream live ticker updates for a market."""
    async with websockets.connect(WEBSOCKET_URL) as ws:
        # Subscribe to the ticker channel
        subscribe_msg = {
            "id": 1,
            "cmd": "subscribe",
            "params": {
                "channels": ["ticker"],
                "market_ticker": market_ticker,
            },
        }
        await ws.send(json.dumps(subscribe_msg))
        print(f"Subscribed to {market_ticker} ticker feed")

        # Collect updates for the specified duration
        loop = asyncio.get_running_loop()
        end_time = loop.time() + duration_seconds
        while loop.time() < end_time:
            try:
                message = await asyncio.wait_for(
                    ws.recv(), timeout=5.0
                )
                data = json.loads(message)
                msg_type = data.get("type")

                if msg_type == "ticker":
                    msg = data["msg"]
                    print(
                        f"[{msg['market_ticker']}] "
                        f"Yes: {msg.get('yes_bid')}/"
                        f"{msg.get('yes_ask')}c "
                        f"Volume: {msg.get('volume')}"
                    )
                elif msg_type == "error":
                    print(f"Error: {data}")
                    break
            except asyncio.TimeoutError:
                continue  # No updates in 5s, keep waiting

# Run it (replace with an active market ticker)
# asyncio.run(stream_market_ticker("KXHIGHNY-26FEB10-T45"))

The ticker channel pushes updates only when the market moves. During quiet periods, you might wait seconds between messages. That's normal — it means nothing changed.

For order book changes, swap "ticker" for "orderbook_delta". You'll get a full snapshot first, then incremental deltas after that.
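
Here's a rough sketch of how you might maintain a local book from that stream. The exact message fields ("side", "price", "delta", and the "yes"/"no" level arrays in the snapshot) are assumptions based on the snapshot-then-delta pattern, so print a raw message first and adjust the keys to what you actually receive:

# Maintain a local order book from snapshot + delta messages.
# Field names below are assumptions -- verify them against a live message.
local_book = {"yes": {}, "no": {}}

def apply_orderbook_message(data):
    msg_type = data.get("type")
    msg = data.get("msg", {})

    if msg_type == "orderbook_snapshot":
        # Full snapshot: replace both sides of the book
        for side in ("yes", "no"):
            local_book[side] = {price: qty for price, qty in msg.get(side, [])}
    elif msg_type == "orderbook_delta":
        # Incremental update to a single price level
        side, price, delta = msg["side"], msg["price"], msg["delta"]
        new_qty = local_book[side].get(price, 0) + delta
        if new_qty > 0:
            local_book[side][price] = new_qty
        else:
            local_book[side].pop(price, None)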

Method 4: Browser Scraping as a Fallback

The API covers almost everything. But occasionally you might need data that only appears on the Kalshi website — historical charts, market descriptions with rich formatting, or user-facing statistics not exposed via the API.

For those edge cases, Playwright handles Kalshi's JavaScript-rendered pages:

# pip install playwright
# playwright install chromium

from playwright.sync_api import sync_playwright

def scrape_kalshi_page(market_url):
    """Scrape data from a Kalshi market page."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()

        # Block unnecessary resources to speed things up
        page.route(
            "**/*.{png,jpg,jpeg,gif,svg,woff,woff2}",
            lambda route: route.abort(),
        )

        page.goto(market_url, wait_until="networkidle")

        # Extract the market title
        title = page.locator("h1").first.text_content()

        # Grab any visible price data from the page
        price_elements = page.locator(
            "[data-testid='yes-price'], .yes-price"
        ).all_text_contents()

        browser.close()
        return {"title": title, "prices": price_elements}

# Example usage:
# data = scrape_kalshi_page("https://kalshi.com/markets/kxhighny")

Use this sparingly. The API is faster, more reliable, and doesn't require a browser binary. Browser scraping also puts more load on Kalshi's servers, so respect their infrastructure.

If you're scaling browser-based collection beyond a handful of pages, rotate your IP addresses through residential proxies. Roundproxies residential proxies work well here since prediction market sites tend to flag datacenter IPs faster than residential ones.
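
If you do add proxies, Playwright accepts them as a launch option. The server address and credentials below are placeholders for whatever your provider issues:

from playwright.sync_api import sync_playwright

# Placeholder proxy settings -- substitute your provider's endpoint and credentials
PROXY = {
    "server": "http://proxy.example.com:8000",
    "username": "your-username",
    "password": "your-password",
}

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True, proxy=PROXY)
    page = browser.new_page()
    page.goto("https://kalshi.com/markets", wait_until="networkidle")
    print(page.title())
    browser.close()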

Authenticated Endpoints: Trade History and Portfolio Data

The methods above cover public market data. To scrape Kalshi trade history, candlestick data, or your own portfolio, you need API key authentication.

Generate your keys in the Kalshi dashboard under Settings > API. You'll get a key ID and a private key file for RSA signing.

import time
import base64
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

def get_auth_headers(api_key_id, private_key_path, method, path):
    """Generate signed authentication headers for Kalshi API."""
    timestamp = str(int(time.time() * 1000))

    # Load your private key
    with open(private_key_path, "rb") as f:
        private_key = serialization.load_pem_private_key(
            f.read(), password=None
        )

    # Sign timestamp + method + path; Kalshi expects RSA-PSS with SHA-256
    message = f"{timestamp}{method}{path}"
    signature = private_key.sign(
        message.encode(),
        padding.PSS(
            mgf=padding.MGF1(hashes.SHA256()),
            salt_length=padding.PSS.DIGEST_LENGTH,
        ),
        hashes.SHA256(),
    )

    return {
        "KALSHI-ACCESS-KEY": api_key_id,
        "KALSHI-ACCESS-SIGNATURE": base64.b64encode(
            signature
        ).decode(),
        "KALSHI-ACCESS-TIMESTAMP": timestamp,
    }

With authentication, you unlock endpoints like /markets/{ticker}/trades for recent trade data and /markets/{ticker}/candlesticks for OHLC-style price history. These are gold for backtesting.

def get_market_trades(ticker, api_key_id, private_key_path):
    """Fetch recent trades for a specific market."""
    path = f"/trade-api/v2/markets/{ticker}/trades"
    headers = get_auth_headers(
        api_key_id, private_key_path, "GET", path
    )

    resp = requests.get(
        f"{BASE_URL}/markets/{ticker}/trades",
        headers=headers,
    )
    resp.raise_for_status()
    return resp.json().get("trades", [])

One thing to watch: authenticated requests count against the same rate limit as public ones. Don't run your public scraper and authenticated scraper simultaneously from the same key.
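
A simple way to avoid tripping over yourself is to funnel every call, public or authenticated, through one shared throttle. A minimal sketch of that pattern:

import time
import threading
import requests

class Throttle:
    """Serialize requests so combined scrapers stay under one shared rate limit."""

    def __init__(self, max_per_second=16):
        self.min_interval = 1.0 / max_per_second
        self.lock = threading.Lock()
        self.last_request = 0.0

    def wait(self):
        with self.lock:
            elapsed = time.monotonic() - self.last_request
            if elapsed < self.min_interval:
                time.sleep(self.min_interval - elapsed)
            self.last_request = time.monotonic()

throttle = Throttle(max_per_second=16)

def throttled_get(url, headers=None, params=None):
    """Drop-in for requests.get() that respects the shared throttle."""
    throttle.wait()
    resp = requests.get(url, headers=headers, params=params)
    resp.raise_for_status()
    return resp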

What Data Can You Actually Scrape from Kalshi?

Before you start writing code, it helps to know exactly what's available. Here's the full breakdown of data you can extract when you scrape Kalshi's API:

Public (no auth needed):

Endpoint | Data | Use Case
/markets | All markets with prices, volume, status | Bulk snapshots, screening
/markets/{ticker} | Single market details | Targeted monitoring
/markets/{ticker}/orderbook | Bid depth for YES and NO | Liquidity analysis
/series/{ticker} | Series metadata and category | Categorization
/events/{ticker} | Event details and linked markets | Event-level analysis

Authenticated (API key required):

Endpoint | Data | Use Case
/markets/{ticker}/trades | Recent trade history | Volume analysis, backtesting
/markets/{ticker}/candlesticks | OHLC price bars | Chart building, technical analysis
/portfolio/positions | Your open positions | Portfolio tracking
/portfolio/fills | Your executed trades | Performance analysis

The public endpoints alone give you enough to build market scanners, probability trackers, and research datasets. Authenticated access adds the historical depth needed for serious quantitative work.
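
As a concrete example of what the public data alone supports, here's a small sketch of a scanner that flags liquid, near-toss-up markets from the Method 1 snapshot. The volume and price thresholds are arbitrary starting points:

def find_tossups(markets, min_volume=1000, price_band=(40, 60)):
    """Flag liquid markets trading near 50/50 in a Method 1 snapshot."""
    low, high = price_band
    hits = [
        m for m in markets
        if m.get("volume", 0) >= min_volume
        and low <= m.get("yes_price", 0) <= high
    ]
    return sorted(hits, key=lambda m: m.get("volume", 0), reverse=True)

for m in find_tossups(markets)[:10]:
    print(f"{m['ticker']}: {m['yes_price']}c YES, volume {m['volume']}")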

Building a Complete Data Pipeline

Individual API calls are useful for exploration. For ongoing data collection, you need a pipeline that handles pagination, rate limits, and storage.

Here's a production-ready scraper that pulls all markets and saves to CSV:

import requests
import pandas as pd
import time
from datetime import datetime

BASE_URL = "https://api.elections.kalshi.com/trade-api/v2"
RATE_LIMIT_DELAY = 0.06  # ~16 req/sec (under 20/sec limit)

def fetch_all_markets_to_df(status="open"):
    """
    Paginate through all markets and return a DataFrame.
    Respects rate limits with built-in delays.
    """
    all_markets = []
    cursor = None
    page = 0

    while True:
        params = {"status": status, "limit": 200}
        if cursor:
            params["cursor"] = cursor

        try:
            resp = requests.get(
                f"{BASE_URL}/markets", params=params
            )
            resp.raise_for_status()
        except requests.exceptions.HTTPError as e:
            if resp.status_code == 429:
                print("Rate limited. Waiting 5 seconds...")
                time.sleep(5)
                continue
            raise e

        data = resp.json()
        markets = data.get("markets", [])
        all_markets.extend(markets)

        page += 1
        print(f"Page {page}: fetched {len(markets)} markets "
              f"(total: {len(all_markets)})")

        cursor = data.get("cursor")
        if not cursor or not markets:
            break

        time.sleep(RATE_LIMIT_DELAY)

    return pd.DataFrame(all_markets)


def enrich_with_orderbooks(df, top_n=50):
    """
    Add order book depth for the top N markets by volume.
    """
    top_markets = df.nlargest(top_n, "volume")
    spreads = []

    for _, row in top_markets.iterrows():
        ticker = row["ticker"]
        try:
            resp = requests.get(
                f"{BASE_URL}/markets/{ticker}/orderbook"
            )
            resp.raise_for_status()
            book = resp.json().get("orderbook", {})

            yes_bids = book.get("yes", [])
            no_bids = book.get("no", [])

            spreads.append({
                "ticker": ticker,
                "best_yes_bid": yes_bids[0][0] if yes_bids else None,
                "best_no_bid": no_bids[0][0] if no_bids else None,
                "yes_depth": len(yes_bids),
                "no_depth": len(no_bids),
            })
        except Exception as e:
            print(f"Skipping {ticker}: {e}")
            spreads.append({"ticker": ticker})

        time.sleep(RATE_LIMIT_DELAY)

    spread_df = pd.DataFrame(spreads)
    return df.merge(spread_df, on="ticker", how="left")


# Run the pipeline
print(f"Starting Kalshi scrape at {datetime.now()}")
df = fetch_all_markets_to_df()
df = enrich_with_orderbooks(df, top_n=50)

filename = f"kalshi_markets_{datetime.now():%Y%m%d_%H%M}.csv"
df.to_csv(filename, index=False)
print(f"Saved {len(df)} markets to {filename}")

print(f"\nCategories: {df['category'].nunique()}")
print(f"Total volume: {df['volume'].sum():,.0f} contracts")
print(f"Markets with orderbook data: "
      f"{df['best_yes_bid'].notna().sum()}")

This script handles the two most common failure modes: rate limiting (429 responses) and missing data. The RATE_LIMIT_DELAY keeps you under the Basic tier's 20 reads/second ceiling.

Kalshi API Rate Limits

Kalshi enforces rate limits by tier. Here's what you're working with:

Tier | Read Limit | Write Limit | How to Qualify
Basic | 20/second | 10/second | Create an account
Advanced | 30/second | 30/second | Apply via Kalshi's form
Premier | 100/second | 100/second | 3.75% of exchange volume/month
Prime | 400/second | 400/second | 7.5% of exchange volume/month

For read-only data collection, Basic tier is fine. At 200 markets per page, twenty requests per second works out to roughly 4,000 markets per second of pagination, far more than a full snapshot of the exchange needs.

If you hit a 429 response, back off exponentially:

import time

def request_with_backoff(url, params=None, max_retries=5):
    """Make a GET request with exponential backoff on 429."""
    for attempt in range(max_retries):
        resp = requests.get(url, params=params)

        if resp.status_code == 429:
            wait = 2 ** attempt  # 1, 2, 4, 8, 16 seconds
            print(f"Rate limited. Retrying in {wait}s...")
            time.sleep(wait)
            continue

        resp.raise_for_status()
        return resp.json()

    raise Exception(f"Failed after {max_retries} retries")

Don't try to game the rate limit with distributed requests from multiple IPs. Kalshi tracks by API key for authenticated requests, not by IP. Hammering their public endpoints from dozens of addresses is a fast way to get blocked entirely.

Scheduling Recurring Scrapes

One-off pulls are useful for analysis. Continuous collection is where the real value lives — tracking how markets react to news, building historical datasets, or feeding a trading model.

Use cron for scheduled scrapes on a Linux server:

# Run every 15 minutes during market hours (6am-11pm ET)
*/15 6-23 * * * cd /home/user/kalshi-scraper && python scrape.py >> scrape.log 2>&1

Or wrap it in a Python loop with the schedule library:

# pip install schedule
import schedule
import time

def scrape_job():
    """Run the full scrape pipeline."""
    try:
        df = fetch_all_markets_to_df()
        filename = f"data/kalshi_{datetime.now():%Y%m%d_%H%M}.csv"
        df.to_csv(filename, index=False)
        print(f"[{datetime.now()}] Saved {len(df)} markets")
    except Exception as e:
        print(f"[{datetime.now()}] Scrape failed: {e}")

schedule.every(15).minutes.do(scrape_job)

while True:
    schedule.run_pending()
    time.sleep(1)

Store each run as a separate timestamped file. This gives you a clean historical record without worrying about database migrations early on. When the dataset grows large enough to warrant a database, migrate then.
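
When it's time to analyze the accumulated files, a few lines of pandas stitch them back together. The data/ directory and filename pattern here match the scrape_job above:

import glob
import pandas as pd

# Combine every timestamped snapshot into one DataFrame
frames = []
for path in sorted(glob.glob("data/kalshi_*.csv")):
    df = pd.read_csv(path)
    # Recover the scrape time from the filename: kalshi_YYYYMMDD_HHMM.csv
    stamp = path.split("kalshi_")[-1].replace(".csv", "")
    df["scraped_at"] = pd.to_datetime(stamp, format="%Y%m%d_%H%M")
    frames.append(df)

if frames:
    history = pd.concat(frames, ignore_index=True)
    print(history.groupby("scraped_at")["ticker"].count())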

Troubleshooting

"429 Too Many Requests"

Why: You exceeded Kalshi's rate limit for your tier (20 reads/second for Basic).

Fix: Add a delay between requests. time.sleep(0.06) keeps you at roughly 16 requests/second, leaving headroom. If you're already doing that and still getting 429s, check if another script is running against the same IP or API key.

Empty markets Array in Response

Why: Your filter parameters are too restrictive, or the series has no currently open markets.

Fix: Remove the status=open parameter or set status=all to include settled and closed markets. Some series only have markets during specific hours — weather markets, for example, resolve daily.

WebSocket Connection Drops

Why: Kalshi disconnects idle WebSocket connections, or network hiccups occur.

Fix: Wrap your WebSocket logic in a reconnection loop:

async def resilient_stream(market_ticker):
    """Auto-reconnect on WebSocket failures."""
    while True:
        try:
            await stream_market_ticker(market_ticker)
        except websockets.exceptions.ConnectionClosed:
            print("Connection lost. Reconnecting in 3s...")
            await asyncio.sleep(3)
        except Exception as e:
            print(f"Unexpected error: {e}. Retrying in 10s...")
            await asyncio.sleep(10)

KeyError When Accessing Market Fields

Why: Not all markets have every field populated. Settled markets lack order book data. Older markets may be missing fields like open_interest.

Fix: Use .get() with defaults instead of direct key access:

# Instead of market["open_interest"]
oi = market.get("open_interest", 0)

A Note on Responsible Scraping

Kalshi provides a public API specifically so developers can access their data programmatically. Use it.

Don't scrape their website HTML when the API gives you the same data faster and cleaner. Respect the rate limits — they exist to keep the exchange running smoothly for everyone, including traders with live orders.

If you need higher throughput than the Basic tier allows, apply for Advanced access through Kalshi's API tier form. It's free and they're generally responsive.

Check Kalshi's Developer Agreement before building anything commercial. The API is for building tools — not for redistributing raw market data to third parties.

Kalshi vs. Polymarket: Scraping Differences

If you're working with prediction market data, you're probably looking at both platforms. The approach to scraping Kalshi differs significantly from what Polymarket requires.

Kalshi gives you a clean REST API with public endpoints and official documentation. The data is centralized on their servers and returned as standard JSON. Polymarket is blockchain-based, so you're either querying their separate API (which has a different structure and auth model) or reading directly from the Polygon chain via RPC calls.

For most Python developers, Kalshi is the easier starting point. Structured JSON responses, cursor-based pagination, and WebSocket channels that follow standard patterns. Polymarket requires understanding of blockchain data models and often needs specialized libraries like web3.py.

The data shapes are different too. Kalshi markets are binary YES/NO contracts priced in cents. Polymarket uses outcome tokens with USDC-denominated prices. Your parsing logic won't be interchangeable.

If you need data from both, build separate scrapers. Don't try to shoehorn them into a single abstraction — the data models are too different. Write a clean interface for each and merge the data downstream in pandas or your database.

Wrapping Up

You now have four working methods to scrape Kalshi data: REST API for bulk snapshots, targeted series queries for focused collection, WebSocket streams for real-time feeds, and Playwright as a fallback for edge cases.

Start with Method 1. It covers the vast majority of use cases and takes five minutes to get running. Graduate to WebSockets when you need live data, and use the pipeline code to automate ongoing collection.

For most projects, the unauthenticated REST endpoints are all you need to scrape Kalshi effectively. Add API key authentication when you want trade history and candlestick data for backtesting. And don't overlook the order book endpoint — bid depth and liquidity gaps are where the real alpha hides in prediction markets.

The prediction market data is there. Go build something with it.