Making HTTP requests in Python used to involve wrestling with urllib2's confusing API and writing boilerplate code for simple tasks. The requests library changed that by providing an intuitive interface that actually makes sense. If you've ever needed to fetch data from an API, scrape a website, or interact with web services, requests is the tool you'll reach for.
In this guide, I'll walk you through everything from basic GET requests to advanced techniques like automatic retries and streaming large files—including some tricks that don't show up in the typical tutorials.
Installing Requests
Requests isn't part of Python's standard library, so you'll need to install it first:
pip install requests
If pip on your system points at Python 2, use pip3 instead:
pip3 install requests
Once installed, import it into your script:
import requests
That's it. You're ready to start making HTTP requests.
Making Your First GET Request
The most common HTTP method is GET—it retrieves data from a server. Here's how to make a GET request:
import requests
response = requests.get('https://api.github.com/users/github')
print(response.status_code) # 200
print(response.text) # Raw response content
The requests.get() function returns a Response object that contains everything the server sent back. You can check whether the request succeeded by looking at the status code; 200 means success.
Understanding the Response Object
The Response object gives you several ways to access the data:
response = requests.get('https://api.github.com/users/github')
# Status code
print(response.status_code) # 200
# Response headers
print(response.headers['content-type'])  # application/json; charset=utf-8
# Raw text content
print(response.text)
# Parsed JSON (if the response is JSON)
data = response.json()
print(data['name']) # GitHub
# Raw bytes
print(response.content)
The difference between .text and .content is important: .text gives you a string (decoded using the response's encoding), while .content gives you raw bytes. For JSON APIs, use .json() to automatically parse the response into a Python dictionary.
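If .text ever looks garbled, the culprit is usually a wrong encoding guess. You can check and override the encoding before decoding; a quick sketch:

import requests

response = requests.get('https://api.github.com/users/github')

# requests guesses the encoding from the Content-Type header
print(response.encoding)       # e.g. 'utf-8'

# Override it if the guess is wrong; .text then decodes with the new value
response.encoding = 'utf-8'
print(response.text[:80])      # first 80 characters of the decoded body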
Query Parameters: The Right Way
Most APIs accept parameters through query strings. You could build the URL manually, but requests has a better way:
# Don't do this
url = 'https://api.github.com/search/repositories?q=python&sort=stars'
response = requests.get(url)
# Do this instead
url = 'https://api.github.com/search/repositories'
params = {
    'q': 'python',
    'sort': 'stars',
    'order': 'desc'
}
response = requests.get(url, params=params)
print(response.url) # Shows the full URL with encoded parameters
Requests handles URL encoding automatically, which is especially helpful when dealing with special characters or spaces in parameters.
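For example, a query containing spaces and symbols comes out properly percent-encoded (GitHub's search endpoint again, with an illustrative query):

params = {'q': 'machine learning & AI'}
response = requests.get('https://api.github.com/search/repositories', params=params)

# Spaces become '+' and '&' becomes '%26' in the final URL
print(response.url)
# https://api.github.com/search/repositories?q=machine+learning+%26+AI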
Passing Lists as Parameters
Sometimes you need to pass multiple values for the same parameter:
params = {
    'lang': ['python', 'javascript'],
    'sort': 'stars'
}
response = requests.get('https://api.example.com/repos', params=params)
# Results in: ?lang=python&lang=javascript&sort=stars
POST, PUT, PATCH, and DELETE Requests
GET is just one HTTP method. Here's how to use the others:
POST Requests
POST is typically used to create new resources or submit form data:
# Sending form data
data = {
    'username': 'johndoe',
    'email': 'john@example.com'
}
response = requests.post('https://api.example.com/users', data=data)
# Sending JSON
payload = {
    'title': 'New Post',
    'content': 'This is the content'
}
response = requests.post(
    'https://api.example.com/posts',
    json=payload  # Automatically sets Content-Type to application/json
)
Notice the json= parameter. This is cleaner than manually serializing your data and setting headers.
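For comparison, here's the manual equivalent that json= replaces (same hypothetical endpoint and payload as above):

import json

response = requests.post(
    'https://api.example.com/posts',
    data=json.dumps(payload),  # serialize by hand
    headers={'Content-Type': 'application/json'}  # and set the header yourself
)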
PUT and PATCH Requests
PUT replaces an entire resource, while PATCH updates specific fields:
# PUT - replace entire resource
updated_user = {
    'username': 'johndoe',
    'email': 'newemail@example.com',
    'bio': 'New bio'
}
response = requests.put('https://api.example.com/users/123', json=updated_user)
# PATCH - update specific fields
changes = {'bio': 'Updated bio only'}
response = requests.patch('https://api.example.com/users/123', json=changes)
DELETE Requests
Delete a resource:
response = requests.delete('https://api.example.com/users/123')
if response.status_code == 204:
    print("User deleted successfully")
Custom Headers
Many APIs require custom headers for authentication or to specify content types:
headers = {
    'User-Agent': 'MyApp/1.0',
    'Authorization': 'Bearer YOUR_API_TOKEN',
    'Accept': 'application/json'
}
response = requests.get('https://api.example.com/data', headers=headers)
A word of caution: some websites block requests that don't have a proper User-Agent header. Setting one can help avoid getting blocked.
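If a request is being rejected and you're not sure why, the Response keeps a reference to the PreparedRequest that actually went over the wire, so you can inspect the outgoing headers:

response = requests.get('https://api.example.com/data', headers=headers)

# The headers requests actually sent, including defaults it added for you
print(response.request.headers)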
Error Handling: Beyond Status Codes
Most tutorials tell you to check response.status_code, but there's a better approach:
try:
    response = requests.get('https://api.example.com/data', timeout=5)
    response.raise_for_status()  # Raises HTTPError for bad responses (4xx, 5xx)
    data = response.json()
    # Process your data
except requests.exceptions.HTTPError as e:
    print(f"HTTP error occurred: {e}")
except requests.exceptions.Timeout:
    # Catch Timeout before ConnectionError: ConnectTimeout inherits from both
    print("Request timed out")
except requests.exceptions.ConnectionError:
    print("Failed to connect to the server")
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
The raise_for_status() method automatically raises an exception for error status codes, making your error handling cleaner.
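One detail worth knowing: the HTTPError carries the failed Response on its .response attribute, so you can still inspect the status code and body inside the handler. A minimal sketch:

try:
    response = requests.get('https://api.example.com/data', timeout=5)
    response.raise_for_status()
except requests.exceptions.HTTPError as e:
    # The exception keeps a reference to the Response that triggered it
    print(e.response.status_code)
    print(e.response.text)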
Why You Should Always Set Timeouts
Here's something most tutorials skip: always set a timeout. Without one, your request can hang indefinitely if the server stops responding:
# Don't do this - can hang forever
response = requests.get('https://api.example.com/data')
# Do this
response = requests.get('https://api.example.com/data', timeout=5)
The timeout is in seconds. For more control, use a tuple to set separate timeouts for connection and read:
# (connection timeout, read timeout)
response = requests.get('https://api.example.com/data', timeout=(3, 10))
This gives the connection 3 seconds to establish, then allows up to 10 seconds between each chunk of data the server sends. Note that the read timeout is not a cap on total download time.
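The two timeouts also raise distinguishable exceptions, so you can tell a server that never answered apart from one that went quiet mid-response:

try:
    response = requests.get('https://api.example.com/data', timeout=(3, 10))
except requests.exceptions.ConnectTimeout:
    print("Could not establish a connection within 3 seconds")
except requests.exceptions.ReadTimeout:
    print("Server went silent for more than 10 seconds")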
Sessions: The Performance Booster Nobody Talks About
If you're making multiple requests to the same host, use a Session object. It reuses the underlying TCP connection, which can dramatically speed up your code:
# Slow - creates a new connection for each request
for i in range(100):
    response = requests.get(f'https://api.example.com/data/{i}')

# Fast - reuses the same connection
session = requests.Session()
for i in range(100):
    response = session.get(f'https://api.example.com/data/{i}')
session.close()
Sessions also persist cookies, headers, and other settings across requests:
session = requests.Session()
session.headers.update({'Authorization': 'Bearer YOUR_TOKEN'})
# This header applies to all requests made with this session
response1 = session.get('https://api.example.com/users')
response2 = session.get('https://api.example.com/posts')
Always close your session when done, or use it as a context manager:
with requests.Session() as session:
    response = session.get('https://api.example.com/data')
# Session automatically closes when exiting the with block
Authentication Made Simple
Requests supports multiple authentication methods out of the box.
Basic Authentication
from requests.auth import HTTPBasicAuth
response = requests.get(
    'https://api.example.com/data',
    auth=HTTPBasicAuth('username', 'password')
)

# Or use the shorthand
response = requests.get(
    'https://api.example.com/data',
    auth=('username', 'password')
)
Bearer Token Authentication
For API tokens:
headers = {'Authorization': 'Bearer YOUR_TOKEN'}
response = requests.get('https://api.example.com/data', headers=headers)
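If you'd rather not repeat that header on every call, requests also lets you package a token in a small reusable auth class. BearerAuth below is our own helper built on requests' AuthBase hook, not something requests ships:

from requests.auth import AuthBase

class BearerAuth(AuthBase):
    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        # Called for every request made with this auth object
        r.headers['Authorization'] = f'Bearer {self.token}'
        return r

response = requests.get('https://api.example.com/data', auth=BearerAuth('YOUR_TOKEN'))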
OAuth Authentication
For OAuth, you'll need the requests-oauthlib library:
pip install requests-oauthlib
from requests_oauthlib import OAuth1
auth = OAuth1('YOUR_APP_KEY', 'YOUR_APP_SECRET',
'YOUR_USER_TOKEN', 'YOUR_USER_SECRET')
response = requests.get('https://api.example.com/data', auth=auth)
Automatic Retries: The Professional Approach
Network requests fail. Servers go down. Connections timeout. Instead of manually catching exceptions and retrying, configure requests to handle it automatically:
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
# Create a retry strategy
retry_strategy = Retry(
    total=3,  # Total number of retries
    status_forcelist=[429, 500, 502, 503, 504],  # Retry on these status codes
    backoff_factor=1,  # Wait 1, 2, 4 seconds between retries
    allowed_methods=["HEAD", "GET", "OPTIONS", "POST"]  # Methods to retry (POST isn't idempotent; include it only if duplicates are safe)
)
# Mount it to a session
adapter = HTTPAdapter(max_retries=retry_strategy)
session = requests.Session()
session.mount("http://", adapter)
session.mount("https://", adapter)
# Now all requests through this session will automatically retry
response = session.get('https://api.example.com/data')
The backoff_factor creates exponential backoff: with a factor of 1, it waits 1 second before the first retry, 2 seconds before the second, and 4 seconds before the third (older urllib3 1.x versions skip the delay on the very first retry). This prevents hammering a server that's already struggling.
Understanding status_forcelist
By default, retries only happen for connection errors and certain exceptions. The status_forcelist parameter tells requests which HTTP status codes should trigger a retry:
- 429: Rate limit exceeded (wait and retry)
- 500: Internal server error (server issue, might work on retry)
- 502: Bad gateway (proxy issue)
- 503: Service unavailable (server overloaded)
- 504: Gateway timeout
Don't retry on 4xx errors (except 429) because those indicate client errors—your request is malformed, and retrying won't help.
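Since you'll want this setup in most projects, it's worth wrapping in a small factory function. A sketch, assuming urllib3 1.26+ (which renamed method_whitelist to allowed_methods); make_retrying_session is our own name, not a requests API:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_retrying_session(retries=3, backoff_factor=1):
    # Build a Session that transparently retries transient failures
    retry_strategy = Retry(
        total=retries,
        status_forcelist=[429, 500, 502, 503, 504],
        backoff_factor=backoff_factor,
        allowed_methods=["HEAD", "GET", "OPTIONS"],  # idempotent methods only
    )
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session = requests.Session()
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session

session = make_retrying_session()
response = session.get('https://api.example.com/data')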
Streaming Large Responses
When downloading large files or dealing with streaming APIs, loading the entire response into memory is a bad idea. Use streaming instead:
url = 'https://example.com/large-file.zip'
with requests.get(url, stream=True) as response:
    response.raise_for_status()
    with open('large-file.zip', 'wb') as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
This downloads the file in 8KB chunks, keeping memory usage low regardless of file size.
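If you don't need to inspect the chunks yourself, another common pattern hands the raw socket file to the standard library's shutil (the decode_content flag makes urllib3 decompress gzip-encoded responses on the fly):

import shutil
import requests

url = 'https://example.com/large-file.zip'
with requests.get(url, stream=True) as response:
    response.raise_for_status()
    response.raw.decode_content = True  # decompress if the server sent gzip
    with open('large-file.zip', 'wb') as f:
        shutil.copyfileobj(response.raw, f)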
Adding a Progress Bar
Want to show download progress? Combine streaming with the file size from headers:
import requests
url = 'https://example.com/large-file.zip'
response = requests.get(url, stream=True)
total_size = int(response.headers.get('content-length', 0))
downloaded = 0

with open('large-file.zip', 'wb') as f:
    for chunk in response.iter_content(chunk_size=8192):
        downloaded += len(chunk)
        f.write(chunk)
        if total_size:  # guard against a missing Content-Length header
            progress = (downloaded / total_size) * 100
            print(f"\rProgress: {progress:.1f}%", end='')
Streaming Line-by-Line
For streaming APIs that send data line by line (like server-sent events):
import json

response = requests.get('https://api.example.com/stream', stream=True)
for line in response.iter_lines(decode_unicode=True):
    if line:  # Filter out keep-alive newlines
        data = json.loads(line)
        print(f"Received: {data}")
Event Hooks: Intercepting Requests and Responses
Hooks let you inject custom behavior at different points in the request lifecycle. This is useful for logging, monitoring, or modifying responses:
def log_request(response, *args, **kwargs):
    print(f"Request to {response.url} returned {response.status_code}")
    print(f"Response time: {response.elapsed.total_seconds():.2f}s")
# Attach the hook to a single request
response = requests.get(
    'https://api.example.com/data',
    hooks={'response': log_request}
)
# Or attach it to all requests in a session
session = requests.Session()
session.hooks['response'].append(log_request)
You can even modify the response before it returns:
def add_custom_header(response, *args, **kwargs):
    # Add a custom attribute to the response object
    response.custom_header = response.headers.get('X-Custom-Header', 'Not found')

response = requests.get(
    'https://api.example.com/data',
    hooks={'response': add_custom_header}
)
print(response.custom_header)
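A handy real-world use: attach a hook that raises for error statuses on every response, so a session can never silently hand you a failed request:

def check_status(response, *args, **kwargs):
    # Raises HTTPError for any 4xx/5xx response on this session
    response.raise_for_status()

session = requests.Session()
session.hooks['response'].append(check_status)
response = session.get('https://api.example.com/data')  # raises on errors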
SSL Certificate Verification
By default, requests verifies SSL certificates. This is good—it protects you from man-in-the-middle attacks. But sometimes (like with self-signed certificates in development) you need to disable it:
# Only do this in development!
response = requests.get('https://self-signed.example.com', verify=False)
You'll get an InsecureRequestWarning on every call. For production, either fix the certificate or point requests at a certificate bundle:
response = requests.get('https://example.com', verify='/path/to/certfile.pem')
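You can also set this once on a session instead of per request, or export the REQUESTS_CA_BUNDLE environment variable so every script picks the bundle up automatically:

session = requests.Session()
session.verify = '/path/to/certfile.pem'  # applies to every request on this session
response = session.get('https://example.com')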
Pro Tips and Best Practices
1. Use Context Managers
Always use context managers for sessions or streaming responses to ensure proper cleanup:
with requests.Session() as session:
    response = session.get('https://api.example.com/data')
# Session automatically closed
2. Set Default Timeouts with Sessions
Instead of adding timeout to every request, set a default:
from requests.adapters import HTTPAdapter
class TimeoutHTTPAdapter(HTTPAdapter):
    def __init__(self, timeout=None, *args, **kwargs):
        self.timeout = timeout
        super().__init__(*args, **kwargs)

    def send(self, *args, **kwargs):
        # Apply the default only when the caller didn't pass a timeout
        if kwargs.get('timeout') is None:
            kwargs['timeout'] = self.timeout
        return super().send(*args, **kwargs)
session = requests.Session()
adapter = TimeoutHTTPAdapter(timeout=5)
session.mount("http://", adapter)
session.mount("https://", adapter)
# All requests now have a 5-second timeout by default
response = session.get('https://api.example.com/data')
3. Keep-Alive Connections
Sessions automatically handle keep-alive connections, but you can verify it's working by checking the headers:
session = requests.Session()
response = session.get('https://api.example.com/data')
print(response.headers.get('Connection'))  # Often 'keep-alive'; HTTP/1.1 servers may omit it since persistent connections are the default
4. Handle Rate Limits Gracefully
When you hit a rate limit, respect the Retry-After header:
import time
response = requests.get('https://api.example.com/data')

if response.status_code == 429:
    # Retry-After can also be an HTTP date; we assume seconds here
    retry_after = int(response.headers.get('Retry-After', 60))
    print(f"Rate limited. Waiting {retry_after} seconds...")
    time.sleep(retry_after)
    response = requests.get('https://api.example.com/data')
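In practice you'll want this in a loop with a cap on attempts. A sketch (fetch_with_backoff is our own helper, not a requests API):

import time
import requests

def fetch_with_backoff(url, max_attempts=5):
    # Retry on 429 responses, honoring the Retry-After header
    for attempt in range(max_attempts):
        response = requests.get(url, timeout=5)
        if response.status_code != 429:
            return response
        retry_after = int(response.headers.get('Retry-After', 60))
        time.sleep(retry_after)
    raise RuntimeError(f"Still rate limited after {max_attempts} attempts")

response = fetch_with_backoff('https://api.example.com/data')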
5. Debugging with Verbose Output
Need to see exactly what's being sent? Enable debug logging:
import logging
import http.client as http_client
http_client.HTTPConnection.debuglevel = 1
logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
requests_log = logging.getLogger("urllib3")  # modern requests uses the standalone urllib3 package
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True
response = requests.get('https://api.example.com/data')
This shows every header, redirect, and byte of data transmitted.
When Requests Isn't Enough
While requests is excellent for most use cases, it has limitations:
No async support: Requests is synchronous. For async HTTP, use httpx or aiohttp.
# For async requests, use httpx instead
import httpx
import asyncio
async def fetch_data():
    async with httpx.AsyncClient() as client:
        response = await client.get('https://api.example.com/data')
        return response.json()

asyncio.run(fetch_data())
Limited concurrent requests: For making many concurrent requests, consider using requests-futures or grequests, or the standard library's thread pool as sketched below.
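For a moderate number of concurrent requests, ThreadPoolExecutor plus plain requests calls goes a long way (endpoint illustrative):

from concurrent.futures import ThreadPoolExecutor
import requests

urls = [f'https://api.example.com/data/{i}' for i in range(20)]

def fetch(url):
    return requests.get(url, timeout=5)

# Run up to 5 requests at a time on worker threads
with ThreadPoolExecutor(max_workers=5) as executor:
    responses = list(executor.map(fetch, urls))

for response in responses:
    print(response.status_code)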
No HTTP/2: Requests uses HTTP/1.1. If you need HTTP/2, use httpx.
Wrapping Up
The requests library is the standard for HTTP in Python because it strikes the perfect balance between simplicity and power. Start with basic GET and POST requests, then layer on the advanced features as you need them: sessions for performance, automatic retries for reliability, and streaming for large files.
The techniques covered here—especially automatic retries, proper timeout handling, and sessions—will make your code more robust and production-ready. And that timeout warning? Take it seriously. I've debugged too many hung processes that were waiting indefinitely for a response that never came.
Now go forth and make some HTTP requests. Your APIs are waiting.