Need to extract transcripts from YouTube videos for content analysis, accessibility tools, or research? You're in the right place. YouTube hosts billions of videos with captions—both manually created and auto-generated—that contain valuable data waiting to be extracted.
This guide walks through multiple approaches to scraping YouTube captions, from simple Python libraries to direct API calls. We'll cover what actually works in 2026, handle edge cases like age-restricted videos, and show you techniques most tutorials won't mention.
Why scrape YouTube captions?
Before diving into code, let's talk about why you'd want to do this. The obvious use cases are content analysis and accessibility tools, but there's more:
Training AI models. Video transcripts are gold for natural language processing. Whether you're building a chatbot or fine-tuning an LLM, YouTube captions provide massive amounts of conversational text data.
SEO and content repurposing. Extract transcripts to generate blog posts, articles, or show notes. The text is already there—you just need to grab it.
Research and sentiment analysis. Academics and marketers analyze video content at scale to track trends, monitor brand mentions, or study linguistic patterns.
Building video search tools. Make YouTube content searchable by indexing transcripts. Users can find specific moments in videos based on what was said, not just video metadata.
The catch? YouTube doesn't make this straightforward. Their official API requires OAuth authentication and has strict quota limits. That's why scraping approaches are often the more practical choice.
Method 1: The Python approach (easiest)
Python's youtube-transcript-api library is the most reliable way to extract YouTube captions. It doesn't require an API key, works with auto-generated captions, and handles multiple languages. Best of all, it's actively maintained and updated to keep pace with YouTube's changes.
Installation and basic usage
Install it with pip:
pip install youtube-transcript-api
Here's the simplest possible example:
from youtube_transcript_api import YouTubeTranscriptApi
video_id = "dQw4w9WgXcQ" # Extract from youtube.com/watch?v=VIDEO_ID
transcript = YouTubeTranscriptApi.get_transcript(video_id)
for entry in transcript:
    print(f"[{entry['start']:.2f}s] {entry['text']}")
This returns a list of dictionaries with three keys: text (the caption text), start (the timestamp in seconds), and duration (how long the caption stays on screen).
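Those start and duration values are what make search-style features possible. Here's a minimal sketch that reuses the transcript fetched above; the caption_at helper is my own, not part of the library:

def caption_at(transcript, seconds):
    # Return whatever caption is on screen at the given time, or None
    for entry in transcript:
        if entry['start'] <= seconds < entry['start'] + entry['duration']:
            return entry['text']
    return None

print(caption_at(transcript, 42.0))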
Getting transcripts in different languages
YouTube videos often have captions in multiple languages. To specify which language you want:
from youtube_transcript_api import YouTubeTranscriptApi
video_id = "dQw4w9WgXcQ"
# Get Spanish captions
transcript = YouTubeTranscriptApi.get_transcript(video_id, languages=['es'])
# Fall back through multiple languages, in order of preference
transcript = YouTubeTranscriptApi.get_transcript(video_id, languages=['es', 'en', 'fr'])
The languages parameter accepts a list of language codes. The library tries them in order and returns the first one available.
Listing all available transcripts
Before fetching a specific language, check what's actually available:
from youtube_transcript_api import YouTubeTranscriptApi
video_id = "dQw4w9WgXcQ"
transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
for transcript in transcript_list:
    print(f"Language: {transcript.language}, Code: {transcript.language_code}")
    print(f"Is generated: {transcript.is_generated}")
    print(f"Translatable: {transcript.is_translatable}")
    print("---")
This tells you which captions are manually created vs. auto-generated, and whether translation is available.
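If you'd rather prefer human-made captions when they exist, a minimal sketch along these lines works; it leans on the library's find_manually_created_transcript and find_generated_transcript helpers:

from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api._errors import NoTranscriptFound

transcript_list = YouTubeTranscriptApi.list_transcripts("dQw4w9WgXcQ")
try:
    # Prefer captions a human uploaded
    transcript = transcript_list.find_manually_created_transcript(['en'])
except NoTranscriptFound:
    # Fall back to the auto-generated track
    transcript = transcript_list.find_generated_transcript(['en'])

print(transcript.is_generated, transcript.fetch()[0]['text'])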
Translating transcripts
Here's a neat trick: you can translate a transcript into another language on the fly, as long as it's marked as translatable. YouTube's caption system supports this natively:
from youtube_transcript_api import YouTubeTranscriptApi
video_id = "dQw4w9WgXcQ"
transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
# Get English transcript and translate to German
transcript = transcript_list.find_transcript(['en'])
translated = transcript.translate('de')
caption_data = translated.fetch()
for entry in caption_data:
    print(entry['text'])
This is incredibly useful for multilingual projects where you need the same video's content in multiple languages.
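For instance, here's a quick sketch that pulls the same video's captions in a handful of target languages (the language codes are arbitrary examples):

from youtube_transcript_api import YouTubeTranscriptApi

transcript_list = YouTubeTranscriptApi.list_transcripts("dQw4w9WgXcQ")
source = transcript_list.find_transcript(['en'])

# Translate the English track into each target language
for code in ['de', 'fr', 'ja']:
    translated = source.translate(code)
    print(code, translated.fetch()[0]['text'])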
Formatting output
Raw caption data is fine for processing, but what if you need a clean text file or JSON output?
from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import TextFormatter, JSONFormatter
video_id = "dQw4w9WgXcQ"
transcript = YouTubeTranscriptApi.get_transcript(video_id)
# Plain text output
formatter = TextFormatter()
text_formatted = formatter.format_transcript(transcript)
print(text_formatted)
# Save to file
with open('transcript.txt', 'w', encoding='utf-8') as f:
    f.write(text_formatted)
# JSON output with pretty printing
json_formatter = JSONFormatter()
json_formatted = json_formatter.format_transcript(transcript, indent=2)
with open('transcript.json', 'w', encoding='utf-8') as f:
    f.write(json_formatted)
The TextFormatter strips out timestamps and gives you clean, readable text. Perfect for content repurposing or feeding into language models.
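If you need subtitle files rather than plain text, the package's formatters module also includes an SRTFormatter. A minimal sketch, assuming your installed version ships it:

from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import SRTFormatter

transcript = YouTubeTranscriptApi.get_transcript("dQw4w9WgXcQ")

# Write a standard .srt subtitle file, timestamps included
with open('transcript.srt', 'w', encoding='utf-8') as f:
    f.write(SRTFormatter().format_transcript(transcript))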
Method 2: Node.js caption extraction
If you're working in JavaScript, there are several libraries for extracting YouTube captions. The most reliable in 2026 is youtube-caption-extractor, which handles both Node.js and edge runtime environments.
Setup and basic extraction
Install the package:
npm install youtube-caption-extractor
Here's how to use it:
const { getSubtitles } = require('youtube-caption-extractor');
const videoID = 'dQw4w9WgXcQ';
getSubtitles({ videoID: videoID, lang: 'en' })
  .then(captions => {
    captions.forEach(caption => {
      console.log(`[${caption.start}s] ${caption.text}`);
    });
  })
  .catch(err => console.error('Error:', err));
The response format is similar to Python's: you get an array of objects with start, dur, and text properties.
Handling videos with detailed metadata
Sometimes you need more than just captions. The youtube-caption-extractor library can also fetch video titles and descriptions:
const { getVideoDetails } = require('youtube-caption-extractor');
async function fetchVideoData(videoID) {
  try {
    const details = await getVideoDetails({
      videoID: videoID,
      lang: 'en'
    });

    console.log('Title:', details.title);
    console.log('Description:', details.description);
    console.log('\nSubtitles:');

    details.subtitles.forEach(sub => {
      console.log(`${sub.text}`);
    });
  } catch (error) {
    console.error('Error fetching video details:', error);
  }
}

fetchVideoData('dQw4w9WgXcQ');
This is particularly useful when you're building content analysis tools and need context beyond just the transcript.
Edge runtime compatibility
One major advantage of youtube-caption-extractor is its support for serverless environments like Vercel Edge Functions and Cloudflare Workers. The library automatically detects the environment and adapts:
// Works in Cloudflare Workers, Vercel Edge, AWS Lambda
import { getSubtitles } from 'youtube-caption-extractor';
export default async function handler(request) {
  const { searchParams } = new URL(request.url);
  const videoID = searchParams.get('videoId');

  const captions = await getSubtitles({
    videoID: videoID,
    lang: 'en'
  });

  return new Response(JSON.stringify(captions), {
    headers: { 'Content-Type': 'application/json' }
  });
}
This makes it dead simple to build API endpoints that extract captions on-demand without managing servers.
Method 3: YouTube's Innertube API (advanced)
Here's something most tutorials won't tell you: YouTube has an internal API called Innertube that its own web and mobile clients use. It's not officially documented, but we can leverage it to extract captions directly.
This approach gives you more control and is more resilient to library changes, though it requires more code.
Extracting the API key
YouTube embeds an API key in the HTML source of every video page. We can extract it with a simple regex:
const fetch = require('node-fetch');
async function getInnertubeApiKey(videoUrl) {
  const response = await fetch(videoUrl);
  const html = await response.text();

  const apiKeyMatch = html.match(/"INNERTUBE_API_KEY":"([^"]+)"/);
  if (!apiKeyMatch) {
    throw new Error('Could not extract API key');
  }

  return apiKeyMatch[1];
}
Once you have the key, you can make requests to YouTube's internal API.
Fetching player response data
The Innertube API requires specific client context. Here's how to structure the request:
async function getPlayerResponse(videoId, apiKey) {
  const url = `https://www.youtube.com/youtubei/v1/player?key=${apiKey}`;

  const payload = {
    videoId: videoId,
    context: {
      client: {
        clientName: 'ANDROID',
        clientVersion: '17.31.35',
        androidSdkVersion: 30,
      }
    }
  };

  const response = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload)
  });

  return await response.json();
}
Using the Android client context is important—it tends to have better success rates than web client requests.
Extracting caption track URLs
The player response contains caption track metadata. Parse it to get the caption URL:
function extractCaptionTrackUrl(playerResponse, lang = 'en') {
  const tracks = playerResponse?.captions
    ?.playerCaptionsTracklistRenderer
    ?.captionTracks;

  if (!tracks) {
    throw new Error('No caption tracks found');
  }

  const track = tracks.find(t => t.languageCode === lang);
  if (!track) {
    throw new Error(`No captions for language: ${lang}`);
  }

  // Clean up the URL (remove format parameter)
  return track.baseUrl.replace(/&fmt=\w+$/, '');
}
Downloading and parsing captions
Caption URLs return XML data. Here's how to fetch and convert it to JSON:
const xml2js = require('xml2js');
async function downloadAndParseCaptions(captionUrl) {
  const response = await fetch(captionUrl);
  const xmlText = await response.text();

  const parser = new xml2js.Parser();
  const result = await parser.parseStringPromise(xmlText);

  const captions = result.transcript.text.map(item => ({
    start: parseFloat(item.$.start),
    duration: parseFloat(item.$.dur || 0),
    text: item._
  }));

  return captions;
}
Putting it all together
Here's the complete Innertube extraction flow:
async function extractYouTubeTranscript(videoId, lang = 'en') {
  try {
    // Step 1: Get API key
    const videoUrl = `https://www.youtube.com/watch?v=${videoId}`;
    const apiKey = await getInnertubeApiKey(videoUrl);

    // Step 2: Fetch player response
    const playerResponse = await getPlayerResponse(videoId, apiKey);

    // Step 3: Extract caption URL
    const captionUrl = extractCaptionTrackUrl(playerResponse, lang);

    // Step 4: Download and parse
    const captions = await downloadAndParseCaptions(captionUrl);

    return captions;
  } catch (error) {
    console.error('Error extracting transcript:', error);
    throw error;
  }
}

// Usage
extractYouTubeTranscript('dQw4w9WgXcQ', 'en')
  .then(captions => {
    captions.forEach(c => console.log(c.text));
  });
This approach is more complex but gives you direct access to YouTube's internal systems. It's also more resilient to changes in third-party libraries.
Handling videos without captions
Not all YouTube videos have captions. Here's how to gracefully handle that:
from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api._errors import TranscriptsDisabled, NoTranscriptFound
def get_transcript_safely(video_id):
    try:
        transcript = YouTubeTranscriptApi.get_transcript(video_id)
        return transcript
    except TranscriptsDisabled:
        print(f"Transcripts are disabled for video: {video_id}")
        return None
    except NoTranscriptFound:
        print(f"No transcripts found for video: {video_id}")
        return None
    except Exception as e:
        print(f"Unexpected error: {str(e)}")
        return None
For videos without captions, you have a few options:
- Use speech-to-text APIs. Download the audio with yt-dlp and transcribe it using services like AssemblyAI, Whisper, or Deepgram (a minimal sketch follows this list).
- Skip the video. If you're batch processing, just move to the next one.
- Check for manually uploaded captions. Some videos disable auto-captions but have manual ones.
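Here's a minimal sketch of the speech-to-text route, assuming yt-dlp and the open-source Whisper package are installed (pip install yt-dlp openai-whisper); the helper name and model size are arbitrary choices:

import subprocess
import whisper

def transcribe_without_captions(video_id, model_name='base'):
    # Download the audio track only, as MP3
    subprocess.run([
        'yt-dlp', '-x', '--audio-format', 'mp3',
        '-o', f'{video_id}.%(ext)s',
        f'https://www.youtube.com/watch?v={video_id}'
    ], check=True)

    # Transcribe the downloaded file locally with Whisper
    model = whisper.load_model(model_name)
    result = model.transcribe(f'{video_id}.mp3')
    return result['text']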
Working around rate limits and blocks
YouTube implements rate limiting and bot detection. Here's how to avoid getting blocked:
Add delays between requests
import time
from youtube_transcript_api import YouTubeTranscriptApi
video_ids = ['id1', 'id2', 'id3', 'id4']
for video_id in video_ids:
    try:
        transcript = YouTubeTranscriptApi.get_transcript(video_id)
        print(f"Fetched transcript for {video_id}")

        # Wait a couple of seconds between requests
        time.sleep(2)
    except Exception as e:
        print(f"Error with {video_id}: {str(e)}")
Random delays work even better:
import random
time.sleep(random.uniform(2, 5))
Use proxy rotation
If you're scraping at scale, rotate through proxies:
from youtube_transcript_api import YouTubeTranscriptApi
proxies = {
    'http': 'http://proxy1.example.com:8080',
    'https': 'https://proxy1.example.com:8080'
}

transcript = YouTubeTranscriptApi.get_transcript(
    video_id,
    proxies=proxies
)
For serious scraping operations, consider using residential proxies or a proxy service like Bright Data.
Handle IP blocks gracefully
YouTube may block your IP temporarily. Implement exponential backoff:
import time
from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api._errors import YouTubeRequestFailed
def fetch_with_retry(video_id, max_retries=3):
    for attempt in range(max_retries):
        try:
            return YouTubeTranscriptApi.get_transcript(video_id)
        except YouTubeRequestFailed as e:
            if attempt < max_retries - 1:
                wait_time = (2 ** attempt) * 5  # 5, 10, 20 seconds
                print(f"Rate limited. Waiting {wait_time}s...")
                time.sleep(wait_time)
            else:
                raise e
Batch processing multiple videos
When you need to extract captions from hundreds or thousands of videos, efficiency matters. Here's a robust batch processing setup:
from youtube_transcript_api import YouTubeTranscriptApi
import csv
import time
import random
def batch_extract_captions(video_ids, output_file='transcripts.csv'):
    results = []

    for idx, video_id in enumerate(video_ids, 1):
        try:
            print(f"Processing {idx}/{len(video_ids)}: {video_id}")

            # Get transcript
            transcript = YouTubeTranscriptApi.get_transcript(video_id)

            # Combine all text
            full_text = ' '.join([entry['text'] for entry in transcript])

            results.append({
                'video_id': video_id,
                'transcript': full_text,
                'status': 'success'
            })

            # Random delay to avoid rate limits
            time.sleep(random.uniform(1, 3))
        except Exception as e:
            results.append({
                'video_id': video_id,
                'transcript': '',
                'status': f'error: {str(e)}'
            })

    # Save to CSV
    with open(output_file, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=['video_id', 'transcript', 'status'])
        writer.writeheader()
        writer.writerows(results)

    print(f"Saved {len(results)} results to {output_file}")

# Usage
video_list = ['dQw4w9WgXcQ', 'jNQXAC9IVRw', 'L_jWHffIx5E']
batch_extract_captions(video_list)
This approach handles errors gracefully, adds delays, and saves everything to a CSV for easy analysis.
Common issues and fixes
Issue: "YouTube is blocking requests from my server"
This often happens on cloud servers (AWS EC2, DigitalOcean, etc.). YouTube detects datacenter IPs and blocks them.
Solution: Use a SOCKS5 proxy with Tor:
from youtube_transcript_api import YouTubeTranscriptApi
proxies = {
    'http': 'socks5://127.0.0.1:9050',
    'https': 'socks5://127.0.0.1:9050'
}

transcript = YouTubeTranscriptApi.get_transcript(
    video_id,
    proxies=proxies
)
Install Tor on your server first (sudo apt-get install tor), then route requests through it. Note that SOCKS5 proxies also require SOCKS support in the requests library (pip install requests[socks]).
Issue: "Transcript is messy with weird line breaks"
Auto-generated captions often have poor formatting. Clean them up:
def clean_transcript(transcript):
    # Combine all text
    text = ' '.join([entry['text'] for entry in transcript])

    # Remove extra whitespace
    text = ' '.join(text.split())

    # Fix common spacing issues around punctuation
    text = text.replace(' .', '.')
    text = text.replace(' ,', ',')
    text = text.replace(' ?', '?')

    return text
For even better results, use an LLM like GPT-4 to clean and punctuate the text properly.
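As a rough sketch of that approach using the OpenAI Python SDK (the model name and prompt wording here are placeholders, not part of any library):

from openai import OpenAI

def punctuate_with_llm(raw_text):
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; use whichever model you have access to
        messages=[
            {"role": "system", "content": "Add punctuation and paragraph breaks to this transcript without changing the wording."},
            {"role": "user", "content": raw_text},
        ],
    )
    return response.choices[0].message.content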
Issue: "Age-restricted videos won't return captions"
Age-restricted videos require authentication. There's no clean workaround with scraping libraries—you'll need to use the official YouTube Data API with OAuth.
When to use the official YouTube Data API
The official API has advantages despite its complexity:
- Legal clarity. Using the official API is explicitly allowed under YouTube's terms.
- Access to private videos. If you own the video, you can fetch captions via OAuth.
- Higher reliability. Google maintains this API, so it won't break unexpectedly.
The downside? It requires OAuth setup and has strict quota limits (10,000 units per day). Fetching a caption costs 200 units, so you can only grab ~50 captions daily.
Here's a quick example using the official API:
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build
def download_caption_official(video_id):
    # Load OAuth credentials (setup required)
    credentials = Credentials.from_authorized_user_file('token.json')
    youtube = build('youtube', 'v3', credentials=credentials)

    # Look up the caption track ID for the video
    caption_list = youtube.captions().list(part='id', videoId=video_id).execute()
    caption_id = caption_list['items'][0]['id']

    # Download the caption track itself (200 quota units)
    request = youtube.captions().download(id=caption_id)
    caption_data = request.execute()
    return caption_data
For most scraping projects, the unofficial methods we covered earlier are more practical. Use the official API only when you need guaranteed access or are working with your own videos.
Wrapping up
Scraping YouTube captions isn't as straightforward as it should be, but with the right tools, it's definitely doable. The youtube-transcript-api library for Python and youtube-caption-extractor for Node.js handle most use cases with minimal code. For advanced needs, accessing YouTube's Innertube API directly gives you more control.
The key takeaways:
- Start with established libraries before building custom scrapers
- Always add delays between requests to avoid rate limits
- Handle errors gracefully—not every video has captions
- Consider proxies if you're processing at scale
- Clean up auto-generated captions for better readability
Whether you're building a content analysis tool, training an AI model, or making videos more accessible, extracting YouTube captions opens up a world of possibilities. Just remember to respect YouTube's resources and avoid hammering their servers with rapid-fire requests.
Now go extract some transcripts—the data is waiting.