If you are working on Python-based projects that need web scraping or quick search results from Google, you have probably come across the googlesearch library. It is a popular resource, especially for those who want to automate the process of searching the web.
But, as a content strategist and developer, I have seen a few recurring issues that can trip you up.
In this blog article we will dive into how to use the googlesearch Python library, some of the most common errors, and how to fix them.
We will also throw in a few code snippets to make it tangible.
1. What is the googlesearch Python Library?
googlesearch (sometimes referred to as googlesearch-python) is a lightweight library that acts as a wrapper, letting you run Google searches from within your Python scripts.
It helps you:
- Automate queries
- Retrieve URLs
- Integrate search data into your projects
- Save time, because you don't have to repeatedly scrape raw HTML for every link.
Its basic concept is straightforward: you import the library, you pass in a query, and you get back a list of search results in a neat format.
But there are some gotchas.
We will get to those in just a bit.
2. Installing googlesearch
Before you begin, you should have Python installed and ready.
You can install googlesearch using pip.
Typically, you will do:
pip install google
But wait.
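Not all variations of the google or googlesearch libraries are the same, and the naming causes confusion.
There is a library just called google, but also others like googlesearch-python or google-search. Both google and googlesearch-python install a module named googlesearch, but their search() functions accept different arguments. The snippets in this article use num_results, proxy, and ssl_verify, which match googlesearch-python (pip install googlesearch-python); the older google package uses arguments such as num, stop, and pause instead.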
Be mindful that these naming overlaps might lead to version mismatches (we will talk about that soon).
3. Basic Usage & A Simple Code Demo
Once installed, you can do something like:
from googlesearch import search
query = "python tutorials"
for result in search(query, num_results=5, lang="en"):
    print(result)
This snippet runs a Google search for "python tutorials", returning up to five results in English.
Those results usually come back as URLs.
That is the simplest form of usage.
Let us break it down:
1. from googlesearch import search: We import the main function.
2. query: The string that we want to run as a search.
3. search(query, num_results=5, lang="en"): This goes out to Google, performs the search, and returns a generator that yields the results one by one.
4. print(result): We iterate over them, printing each.
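Because the results arrive lazily, it often helps to collect them into a list once and drop duplicates along the way. Here is a minimal sketch of that idea. The names collect_urls and fake_search are hypothetical helpers for this example, not part of the library, and fake_search only stands in for the real search() so the snippet runs offline.

```python
def collect_urls(search_fn, query, limit=5, **kwargs):
    """Return up to `limit` unique results, keeping their original order."""
    unique = []
    for url in search_fn(query, **kwargs):
        if url not in unique:
            unique.append(url)
        if len(unique) >= limit:
            break
    return unique


def fake_search(query, **kwargs):
    """Offline stand-in for googlesearch.search, used only for this demo."""
    yield from [
        "https://docs.python.org/3/tutorial/",
        "https://realpython.com/",
        "https://docs.python.org/3/tutorial/",  # duplicate on purpose
        "https://www.w3schools.com/python/",
    ]


if __name__ == "__main__":
    print(collect_urls(fake_search, "python tutorials", limit=3))
```

With the real library you would pass search itself, for example collect_urls(search, "python tutorials", limit=5, num_results=10, lang="en").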
4. Common Error #1: ModuleNotFoundError: No module named 'googlesearch'
One of the biggest problems people face is the dreaded No module named googlesearch error.
Here is how it typically occurs:
Traceback (most recent call last):
  File "test.py", line 1, in <module>
    from googlesearch import search
ModuleNotFoundError: No module named 'googlesearch'
Why it happens:
You installed the wrong package name (google vs googlesearch vs google-search).
You installed with pip, but you are running in a different Python environment that does not have googlesearch installed.
The package is no longer maintained or available on your path.
Quick Fixes:
Double-check which package you actually installed by running pip show google or pip freeze to see if google is listed.
If you have multiple Python environments, confirm with
python --version
python -m pip --version
that pip installs into the same environment that runs your code.
Sometimes, an older version of the library might not be recognized in Python 3.9+ environments.
Upgrade if needed:
pip install --upgrade google
As a fallback, you might consider using alternative libraries that replicate the same functionality, or build a custom solution with requests + BeautifulSoup.
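To see at a glance which interpreter is running and which of the similarly named packages it can actually see, a small diagnostic script can help. This is only a sketch using the standard library; diagnose is a hypothetical helper name, and the list of candidate package names is an assumption based on the names mentioned above.

```python
import sys
from importlib import metadata, util


def diagnose(module_name="googlesearch",
             candidates=("google", "googlesearch-python", "google-search")):
    """Report the running interpreter and which candidate packages it can see."""
    installed = {}
    for dist in candidates:
        try:
            installed[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            installed[dist] = None  # not installed in this environment
    return {
        "interpreter": sys.executable,
        "module_importable": util.find_spec(module_name) is not None,
        "installed": installed,
    }


if __name__ == "__main__":
    report = diagnose()
    print("Interpreter:", report["interpreter"])
    print("googlesearch importable:", report["module_importable"])
    for name, version in report["installed"].items():
        print(f"  {name}: {version or 'not installed'}")
```

Run it with the same python command you use for your script. If the interpreter path differs from the one pip reports, you have found the environment mismatch.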
5. Common Error #2: HTTP 429: Too Many Requests
It happens more often than you think.
You run a loop with 100 queries, and Google smacks you with a 429 error for too many requests.
The error message might show up as:
urllib.error.HTTPError: HTTP Error 429: Too Many Requests
Why it happens:
Google sees repeated pattern-like queries from your script, suspects a bot, and throttles or blocks you.
You run the search function in quick succession with no waiting or rotating user-agents.
You are on a network or IP that Google has flagged.
Possible Workarounds:
Throttle your queries with small sleeps in between. Something like:
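import time
from googlesearch import search

queries = ["python tutorials", "python web scraping"]
for query in queries:
    for result in search(query, num_results=5):
        print(result)
    time.sleep(2)  # sleep for 2 seconds before the next query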
If you consistently scrape large volumes, consider an official API or a third-party service that respects Google TOS.
For certain projects, you might have to use rotating proxies or user-agents, but be mindful of compliance and ethical scraping guidelines.
In most personal or small-scale projects, a small time.sleep() often does the trick.
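When a fixed sleep is not enough, retrying with a growing delay is a common next step. The sketch below is one way to do it, not something the library ships with: call_with_backoff and is_rate_limited are hypothetical helpers, and the check assumes the exception exposes the status either as a code attribute (as urllib.error.HTTPError does) or as response.status_code (as requests exceptions do).

```python
import random
import time


def is_rate_limited(exc):
    """Return True if the exception looks like an HTTP 429 response."""
    code = getattr(exc, "code", None)
    response = getattr(exc, "response", None)
    status = getattr(response, "status_code", None)
    return 429 in (code, status)


def call_with_backoff(func, *args, retries=4, base_delay=2.0,
                      sleep=time.sleep, **kwargs):
    """Call func, retrying on 429 with exponentially growing pauses."""
    for attempt in range(retries + 1):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # Re-raise anything that is not a 429, or the last failed attempt
            if not is_rate_limited(exc) or attempt == retries:
                raise
            # 2s, 4s, 8s, ... plus up to 1s of jitter to avoid a fixed rhythm
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            sleep(delay)
```

Since search() is a generator, wrap the call so the request happens inside the helper, for example call_with_backoff(lambda: list(search("python tutorials", num_results=5))).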
Use Proxies:
We developed our proxies specifically for web scraping. Companies of all sizes use our service, from YC-backed startups to bootstrapped businesses.
from googlesearch import search
proxy = 'http://USERNAME:PASSWORD@proxy.host.com:3219/'
j = search("proxy test", num_results=25, lang="en", proxy=proxy, ssl_verify=False)
for i in j:
    print(i)
With proxies, you can scale your request volume up, or scale it back down, as your application needs.
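One detail that often breaks proxy URLs like the one above: if the username or password contains characters such as @ or :, they must be percent-encoded or the URL is parsed incorrectly. A minimal sketch, where build_proxy_url is a hypothetical helper and the host and port are the placeholder values from the snippet above:

```python
from urllib.parse import quote


def build_proxy_url(user, password, host, port, scheme="http"):
    """Assemble a proxy URL, percent-encoding the credentials."""
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return f"{scheme}://{safe_user}:{safe_password}@{host}:{port}/"


if __name__ == "__main__":
    print(build_proxy_url("USERNAME", "p@ss:word", "proxy.host.com", 3219))
```

The returned string can be passed as the proxy argument in the same way as the hard-coded URL.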
SMALL SELF PROMO: Check out our Proxies at roundproxies.com.
6. Common Error #3: Inconsistent Results, or Different Results Each Run
For some queries, you might get different sets of URLs on repeated runs, even though your code did not change.
This can be unsettling.
One moment you see 5 URLs about Python, the next you see 3 or 4 different ones.
Why it happens:
- Google frequently personalizes or rotates results, based on server location or local data center.
- The library itself might not be pinned to a version that standardizes results.
- Language or region parameters are not set or are stale.
How to fix it:
Manually specify lang or region for more stable results. For instance:
search(query, num_results=5, lang="en", region="us")
Try using advanced query operators, e.g. site: or filetype:, to reduce the chance of random variations.
Accept that Google is dynamic. For certain tasks, absolute consistency from day to day might be unrealistic.
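If you want to put a number on how much two runs differ, comparing them as sets is a simple starting point. The sketch below computes the Jaccard overlap (shared URLs divided by all distinct URLs); result_overlap is a hypothetical helper, and the example URLs are made up for illustration.

```python
def result_overlap(run_a, run_b):
    """Jaccard overlap of two result lists: 1.0 is identical, 0.0 is disjoint."""
    first, second = set(run_a), set(run_b)
    if not first and not second:
        return 1.0  # two empty runs are trivially identical
    return len(first & second) / len(first | second)


if __name__ == "__main__":
    monday = ["https://a.example", "https://b.example", "https://c.example"]
    tuesday = ["https://b.example", "https://c.example", "https://d.example"]
    print(result_overlap(monday, tuesday))
```

Note that this ignores ranking order. If position matters for your task, compare the lists index by index instead.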
7. Common Error #4: SSL Certificate Issues
Every so often, you might see an ssl.SSLError or certificate verify failed error.
Here is a snippet of how it might look:
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed
Why it happens:
Python cannot verify Google's SSL certificate if your local certificate store is outdated.
On Mac or Windows, your default SSL store might not be recognized by Python, especially if you installed Python in a less conventional way.
The library you installed might have a pinned SSL or a dependency that conflicts with your environment.
How to fix it:
If you are on Mac, you can run
/Applications/Python\ 3.X/Install\ Certificates.command
to re-link or update certificates.
On Windows, make sure your OS is fully updated or install the certifi package:
pip install certifi
As a temporary measure (though it is not secure), you can disable SSL verification. But be aware of the risks.
Then run the search again with verification enabled:
results = list(search(search_query, region="us", advanced=True, ssl_verify=True))[:5]
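Before changing anything, it can be useful to check where Python is actually looking for certificates. The following is a small diagnostic sketch using only the standard library; describe_cert_store is a hypothetical helper name. It reports the default OpenSSL paths and a few environment variables that commonly override them (REQUESTS_CA_BUNDLE is read by the requests library).

```python
import os
import ssl


def describe_cert_store():
    """Summarize where this Python installation looks for CA certificates."""
    paths = ssl.get_default_verify_paths()
    overrides = ("SSL_CERT_FILE", "SSL_CERT_DIR", "REQUESTS_CA_BUNDLE")
    return {
        "cafile": paths.cafile,
        "capath": paths.capath,
        "cafile_exists": bool(paths.cafile and os.path.exists(paths.cafile)),
        "capath_exists": bool(paths.capath and os.path.isdir(paths.capath)),
        "env_overrides": {name: os.environ.get(name) for name in overrides},
    }


if __name__ == "__main__":
    for key, value in describe_cert_store().items():
        print(f"{key}: {value}")
```

If neither the CA file nor the CA directory exists, a missing or unlinked certificate store is a likely cause of the error.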
8. Common Error #5: 202 Ratelimited
The 202 Ratelimit error is a common error when you route searches through proxies.
Most of the time it is caused by a poor proxy rotation script. I have had good success with a more in-depth proxy rotation system like this one:
import asyncio
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

import aiohttp


@dataclass
class Proxy:
    url: str
    last_used: datetime
    fails: int
    response_time: float


class ProxyRotator:
    def __init__(self, min_delay: int = 3):
        self.proxies: List[Proxy] = []
        self.min_delay = min_delay
        self.lock = asyncio.Lock()

    async def add_proxies(self, proxy_list: List[str]):
        async with self.lock:
            for proxy in proxy_list:
                self.proxies.append(Proxy(
                    url=proxy,
                    last_used=datetime.min,
                    fails=0,
                    response_time=0.0
                ))

    async def get_proxy(self) -> Optional[str]:
        async with self.lock:
            now = datetime.now()
            # total_seconds() covers the whole gap; .seconds would drop full days
            available = [
                p for p in self.proxies
                if (now - p.last_used).total_seconds() > self.min_delay
                and p.fails < 3
            ]
            if not available:
                return None
            # Prefer the proxy with the fewest failures, then the fastest one
            proxy = min(available, key=lambda p: (p.fails, p.response_time))
            proxy.last_used = now
            return proxy.url

    async def report_failure(self, proxy_url: str):
        async with self.lock:
            for proxy in self.proxies:
                if proxy.url == proxy_url:
                    proxy.fails += 1
                    break

    async def report_success(self, proxy_url: str, response_time: float):
        async with self.lock:
            for proxy in self.proxies:
                if proxy.url == proxy_url:
                    proxy.response_time = response_time
                    proxy.fails = max(0, proxy.fails - 1)
                    break

    async def make_request(self, url: str, max_attempts: int = 10) -> Optional[str]:
        timeout = aiohttp.ClientTimeout(total=10)
        # Bounded loop: give up instead of spinning forever on dead proxies
        for _ in range(max_attempts):
            proxy = await self.get_proxy()
            if proxy is None:
                # Every proxy is cooling down or has failed too often
                await asyncio.sleep(self.min_delay)
                continue
            try:
                start = datetime.now()
                async with aiohttp.ClientSession(timeout=timeout) as session:
                    async with session.get(url, proxy=proxy) as response:
                        if response.status == 200:
                            elapsed = (datetime.now() - start).total_seconds()
                            await self.report_success(proxy, elapsed)
                            return await response.text()
                # Any non-200 answer counts as a failure for this proxy
                await self.report_failure(proxy)
            except (aiohttp.ClientError, asyncio.TimeoutError):
                await self.report_failure(proxy)
        return None


async def main():
    rotator = ProxyRotator()
    proxies = [
        "http://user:pass@ip1:port",
        "http://user:pass@ip2:port"
    ]
    await rotator.add_proxies(proxies)
    tasks = [
        rotator.make_request("https://example.com")
        for _ in range(5)
    ]
    results = await asyncio.gather(*tasks)
    print(f"{sum(r is not None for r in results)} of {len(results)} requests succeeded")


if __name__ == "__main__":
    asyncio.run(main())
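One detail worth checking in any cooldown logic like the one in the rotator: timedelta.seconds only holds the seconds component of a time difference and ignores full days, while total_seconds() returns the whole duration. A short sketch of the difference, where cooled_down is a hypothetical helper written for this example:

```python
from datetime import datetime, timedelta


def cooled_down(last_used, now, min_delay):
    """True if more than min_delay seconds have passed since last_used."""
    return (now - last_used).total_seconds() > min_delay


gap = timedelta(days=1, seconds=1)
print(gap.seconds)          # seconds component only, days are dropped
print(gap.total_seconds())  # full duration in seconds

last_used = datetime(2024, 1, 1, 0, 0, 0)
now = datetime(2024, 1, 2, 0, 0, 1)
print(cooled_down(last_used, now, min_delay=3))
```

A proxy that was last used a day and one second ago has clearly cooled down, but a check based on .seconds alone would treat it as used one second ago.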
9. Potential Deprecations & Alternatives
The googlesearch library has changed maintainers a few times.
This means certain versions might not be stable or fully up to date with Google's scraping-related changes.
If you find that your searches are returning empty results or partial data, it might be time to check the library’s GitHub for updates.
Alternatives can include:
The official Google Custom Search API, which requires an API key and a search engine ID and has usage quotas.
import pprint

from googleapiclient.discovery import build

my_api_key = "Google API key"
my_cse_id = "Custom Search Engine ID"


def google_search(search_term, api_key, cse_id, **kwargs):
    service = build("customsearch", "v1", developerKey=api_key)
    res = service.cse().list(q=search_term, cx=cse_id, **kwargs).execute()
    # "items" is missing from the response when the query has no results
    return res.get("items", [])


results = google_search(
    "stackoverflow site:en.wikipedia.org", my_api_key, my_cse_id, num=10)
for result in results:
    pprint.pprint(result)
The Stack Overflow community has a good solution for the commercial Search API from Google that is worth looking up.
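If you need more than one page of results from the Custom Search API, keep in mind that it returns at most 10 results per request and, to my knowledge, caps a single query at 100 results overall. A minimal sketch of how the paging arithmetic could look; page_requests is a hypothetical helper, and the limits are passed in as parameters so you can adjust them if your quota differs.

```python
def page_requests(total, page_size=10, max_results=100):
    """Yield (start, num) pairs that together cover `total` results."""
    total = min(total, max_results)
    start = 1  # the API's start index is 1-based
    while start <= total:
        num = min(page_size, total - start + 1)
        yield start, num
        start += num


if __name__ == "__main__":
    for start, num in page_requests(25):
        print(start, num)
```

Each pair can be forwarded to the google_search function through its keyword arguments, for example google_search(term, my_api_key, my_cse_id, start=start, num=num).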
Use DuckDuckGo Search:
from duckduckgo_search import DDGS
results = DDGS().text("python programming", max_results=5)
print(results)
You can find more information in a full guide about DuckDuckGo search.
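To get the same shape of output as the googlesearch snippets (a plain list of URLs), you can pull the link out of each result. This sketch assumes each result is a dictionary with title, href, and body keys, which is what I have seen the text() call return; treat those key names as an assumption and check them against your installed version. extract_urls is a hypothetical helper, and the sample data is made up so the snippet runs offline.

```python
def extract_urls(results):
    """Return the links from a list of result dictionaries, skipping empty ones."""
    return [item["href"] for item in results if item.get("href")]


if __name__ == "__main__":
    sample = [
        {"title": "Python", "href": "https://www.python.org/", "body": "..."},
        {"title": "No link", "body": "..."},
    ]
    print(extract_urls(sample))
```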
Building your own mini-tool with requests and parsel or BeautifulSoup, though it is more manual and subject to monthly Google changes.
Here is a full working version:
import json
import os
import random
import time
from itertools import cycle
from typing import Dict, List, Optional
from urllib.parse import quote_plus

import requests
from bs4 import BeautifulSoup


class ProxyManager:
    def __init__(self, proxy_file: str = "proxies.txt"):
        self.proxies = self._load_proxies(proxy_file)
        self.proxy_cycle = cycle(self.proxies) if self.proxies else None

    # Load proxies from a proxies.txt file from roundproxies.com for maximum quality,
    # one proxy per line in ip:port:user:password format
    def _load_proxies(self, proxy_file: str) -> List[Dict[str, str]]:
        if not os.path.exists(proxy_file):
            return []
        proxies = []
        with open(proxy_file) as f:
            for line in f:
                try:
                    ip, port, user, pwd = line.strip().split(':')
                    proxy_str = f"http://{user}:{pwd}@{ip}:{port}"
                    proxies.append({"http": proxy_str, "https": proxy_str})
                except ValueError:
                    # Skip blank or malformed lines
                    continue
        return proxies

    def get_next_proxy(self) -> Optional[Dict[str, str]]:
        return next(self.proxy_cycle) if self.proxy_cycle else None


class GoogleSearchScraper:
    def __init__(self, output_file: str = "results.json", proxy_file: str = "proxies.txt"):
        self.output_file = output_file
        self.proxy_manager = ProxyManager(proxy_file)
        self.session = requests.Session()
        # update() keeps the session's default headers and only overrides the User-Agent
        self.session.headers.update({
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
        })

    def _make_request(self, url: str, retries: int = 5) -> Optional[str]:
        for _ in range(retries):
            # Without any proxies there is nothing to rotate, so give up
            if not (proxy := self.proxy_manager.get_next_proxy()):
                return None
            try:
                time.sleep(random.uniform(2, 5))
                response = self.session.get(url, proxies=proxy, timeout=30)
                if response.status_code == 200:
                    return response.text
                if response.status_code == 429:
                    # Rate limited: wait longer before trying the next proxy
                    time.sleep(random.uniform(5, 10))
            except requests.RequestException:
                continue
        return None

    def _parse_results(self, html: str) -> List[Dict[str, str]]:
        if not html:
            return []
        soup = BeautifulSoup(html, "html.parser")
        results = []
        # Google changes its markup often; adjust the selector if this returns nothing
        for div in soup.select('div.g')[:5]:
            title = div.select_one('h3')
            link = div.select_one('a[href]')
            # Skip blocks without a title or a link instead of raising
            if title and link:
                results.append({"title": title.text, "url": link['href']})
        return results

    def scrape(self, query: str) -> List[Dict[str, str]]:
        # quote_plus also encodes characters such as & and #, not just spaces
        url = f"https://www.google.com/search?q={quote_plus(query)}"
        if html := self._make_request(url):
            return self._parse_results(html)
        return []


def main():
    scraper = GoogleSearchScraper()
    # Enter your custom search term here instead of "python tutorials"
    results = scraper.scrape("python tutorials")
    print(json.dumps(results, indent=2))


if __name__ == "__main__":
    main()
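A note on building the search URL: queries that contain characters such as &, +, or # need percent-encoding, which a plain space-to-plus replacement does not handle. A minimal sketch using the standard library follows. build_search_url is a hypothetical helper, and hl and num are query parameters Google has historically accepted for language and result count, so treat their effect as an assumption.

```python
from urllib.parse import urlencode


def build_search_url(query, lang="en", num=10):
    """Build a Google search URL with a properly encoded query string."""
    params = {"q": query, "hl": lang, "num": num}
    return "https://www.google.com/search?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("c++ & python", lang="en", num=5))
```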
For many casual scenarios, the googlesearch library is still a decent and quick solution, so long as you are aware of the possible hurdles.
Final Thoughts
Using the googlesearch library is a convenient way to automate search tasks, but it does not come without its quirks.
You can inadvertently run into throttling, environment mismatches, or those pesky SSL issues.
If you keep these fix-it tips in mind, your experience should be smoother.
Good luck with your Google queries, and remember that the best approach is always to experiment and see how the library handles your specific tasks.
If you hit anything beyond the errors here, it is worth exploring communities on GitHub or Stack Overflow.
They are a good source of additional debugging stories that might fit your unique scenario.