Parsing Google’s Search Results by Keywords: A Detailed Guide to Building a Free Google Parser

From the author

Any SEO expert knows the pain of collecting Google keyword data. It’s one thing if you can count all the queries on one hand, but what if they number in the thousands? How do you check the search volume in Google for each keyword? Frankly, once you hit tens of thousands of keywords, it’s enough to make your head spin. You’ll be tempted to reach for outdated, familiar tools, only to find modern reality throwing a curveball: the old formula of Key Collector + Google Ads + a few proxies simply doesn’t cut it anymore. We’re entering a new era, and without direct access to the official API, things get grim and complicated fast.

But every cloud has a silver lining. If you’re ready to dive into coding acrobatics, a bit of ritual dancing, and a dash of Google bureaucracy, you can likely snag direct access to the trove of data within the Keyword Planner API—a goldmine for SEO enthusiasts craving a real “Google Wordstat” (admit it, we all secretly want that). In this article, I’ll show you a step-by-step guide for ditching overpriced external services (300 bucks a month—really?!) in favor of building your own tool on top of the Google Ads API. Let’s roll!

And yes, there’s no actual “Google Wordstat” out there—those stories are SEO campfire tales. Sure, you could force-fit the Google Keyword Planner into that role, but they’re too different to compare side by side…

A Google Parser, or Why You Need Access to the Google Keyword Planner

Let’s broaden our view. If you’re looking to assemble a complete Google semantic core, you’ll need a decent batch of queries (it’s definitely more than 20!). Key Collector once did the trick but is less helpful now. This situation often forces people onto paid equivalents like Keyword Tool—costly services that don’t really stand out, other than by replicating Google Ads data through their own interface or API. But we want direct access and the ability to automate our process however we see fit.

Though Key Collector may be less relevant for extracting new data, you can still use it to process an existing semantic core: it does a solid job with sorting, cleaning, clustering, and integrating with external services. You likely won’t need residential proxies or CAPTCHA-solving services for that; a tool like KeySo might be handy, but that’s another story.

Keyword Tool itself requires a hefty fee, and Ubersuggest is basically a lightweight spin on Keyword Tool—lightweight because it doesn’t offer an API.
Long story short: if you want an official solution that lets you code your own data-gathering logic, your best bet is the Google Ads API—preferably doing it yourself rather than paying for middleman services that run $300–$1,000 a month.

Building a Google Parser via the API: Where It All Begins—Your Developer Token and Ad Account

Want to pull in 40,000+ keywords? Go for it! But first, there’s some formal stuff to handle. Sure, parts of this might feel like red tape, but once you get through it, you’ll be rewarded with the keyword data stash you once only dreamed of.

You need a Google Ads account—not an empty one, but one that’s actually spent at least some ad budget. Without real expenses, the Keyword Planner stays locked down. Getting a functioning ad account is beyond the scope of this article, but it’s required if you want to access the Keyword Planner as the backbone of your parser.

Create a Manager Account. Go to the Manager Accounts (MCC) page, sign up, and once you’re in settings, check the left-hand sidebar for “API Center.” That’s where you’ll find your Developer token—grab it and note it down.

Google Cloud Console: Client ID and Client Secret—You Can’t Parse SERPs Without These

Moving on. You’ll need to authenticate with the Google Ads API using OAuth, which means registering an application in the Google Cloud Console:

  1. Sign into the Google Cloud Console under the same Google account that manages your ad account.

  2. Create a new project (name it whatever you like).

  3. In the left-hand menu, go to APIs & Services → Library.

  4. Search for Google Ads API, then click Enable.

  5. Then go to APIs & Services → Credentials.

  6. Click “Create Credentials” → “OAuth client ID.”

  7. Choose the application type “Web application.”

  8. Give it a name (maybe “my-ads-api,” or anything else).

  9. Under Authorized redirect URIs, add:

http://localhost:8081/
http://localhost:8081

  10. Hit “Create.” You’ll get two important values: Client ID and Client Secret.

Concerning those two nearly identical redirect URIs: I experimented with multiple setups. The one with the trailing slash worked for me, but I kept the version without the slash just in case it might work differently for someone else.

Getting a refresh_token for Google Ads—The Key to a Functional Google Parser

For your code to call the Google Ads API automatically, it needs a refresh_token: a long-lived credential that the client library exchanges for short-lived access tokens behind the scenes, with no manual logins.
Let’s generate it (and by the way, this took me the longest, aside from Google’s official approval process, but more on that shortly).

  1. Install the Python library:

pip install google-ads

  2. Create a .py file (e.g., get_refresh_token.py) and paste this code:

import logging

from google_auth_oauthlib.flow import InstalledAppFlow

logging.basicConfig(level=logging.DEBUG)


def generate_refresh_token(client_id, client_secret):
    scopes = ["https://www.googleapis.com/auth/adwords"]

    flow = InstalledAppFlow.from_client_config(
        {
            "installed": {
                "client_id": client_id,
                "client_secret": client_secret,
                "auth_uri": "https://accounts.google.com/o/oauth2/auth",
                "token_uri": "https://oauth2.googleapis.com/token",
                "redirect_uris": ["http://localhost:8081/"],  # Notice the trailing slash
            }
        },
        scopes,
    )

    auth_url, state = flow.authorization_url(
        access_type="offline",
        include_granted_scopes="true",
        prompt="consent",
    )
    print(f"Authorization URL: {auth_url}")

    credentials = flow.run_local_server(port=8081, state=state)
    print(f"Your Refresh Token: {credentials.refresh_token}")


generate_refresh_token(
    "Client ID",
    "Client secret",
)

Don’t forget to insert your Client ID and Client Secret at the bottom of the file.

  3. Run:

python get_refresh_token.py

  4. In the console, you’ll see an authorization link. Open it in your browser (it might open automatically, but if not, just copy and paste). Choose the ad account you want, and when it’s done, your console will show a refresh_token. Copy and save it somewhere safe.
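
If you’re curious what “updating session tokens behind the scenes” looks like, here is a minimal sketch of the standard OAuth 2.0 refresh request, sent to the same token_uri used in the script above. The google-ads library does this for you, so treat it purely as an illustration; the function names build_refresh_request and fetch_access_token are my own, not part of any Google library.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Same token endpoint as the "token_uri" in the refresh-token script.
TOKEN_URI = "https://oauth2.googleapis.com/token"


def build_refresh_request(client_id, client_secret, refresh_token):
    """Build (but do not send) the POST that trades a refresh token for an access token."""
    body = urlencode(
        {
            "grant_type": "refresh_token",
            "client_id": client_id,
            "client_secret": client_secret,
            "refresh_token": refresh_token,
        }
    ).encode("ascii")
    return Request(
        TOKEN_URI,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_access_token(client_id, client_secret, refresh_token):
    """Send the request and return the short-lived access token (needs network access)."""
    request = build_refresh_request(client_id, client_secret, refresh_token)
    with urlopen(request) as response:
        return json.load(response)["access_token"]
```

Calling fetch_access_token with your real Client ID, Client Secret, and refresh_token is also a quick way to check that the three values actually belong together before you put them into a config file.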

The google-ads.yaml Config File—The Core of a Functional Google Parser

What was all the fuss for? To feed the data into a config file that your keyword-collection script will rely on. Let’s build it with the info we gathered. Create or edit a file called google-ads.yaml:

developer_token: "YOUR_DEVELOPER_TOKEN"
client_id: "YOUR_CLIENT_ID"
client_secret: "YOUR_CLIENT_SECRET"
refresh_token: "YOUR_REFRESH_TOKEN"
login_customer_id: "YOUR_AD_ACCOUNT_ID"  # Ad account ID, digits only
use_proto_plus: True  # required by recent versions of the google-ads library

Keep in mind: in my setup, login_customer_id is the actual ad account ID (the one where your ads run), not the MCC’s ID. Don’t mix them up!
Google’s documentation describes login_customer_id as the ID of the manager account you authenticate through, but I haven’t tested that variant. If you go that route, the customer ID in your main script should still be the ad account you are querying (both without dashes, just digits in a row).
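
Since the “no dashes, just digits” rule is easy to trip over when you copy an ID from the Google Ads interface, here is a small sketch that cleans it up before it goes into google-ads.yaml or your script. The helper name normalize_customer_id is my own invention, and it assumes the usual 10-digit ID format shown in the interface as 123-456-7890.

```python
import re


def normalize_customer_id(raw):
    """Strip dashes and whitespace from a customer ID and check that 10 digits remain."""
    digits = re.sub(r"[\s-]", "", raw)
    if not re.fullmatch(r"\d{10}", digits):
        raise ValueError(f"Expected a 10-digit customer ID, got: {raw!r}")
    return digits


if __name__ == "__main__":
    print(normalize_customer_id("123-456-7890"))
```

Failing early with a clear message is friendlier than an API error that only hints at a bad customer ID.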

Getting a Higher Access Level (Otherwise, Your Google Parser Remains a Pipe Dream)

Now for the main gotcha: even if you do everything above, you’ll run into Test Access restrictions. To retrieve real keyword data, you need to bump it up to Basic Access. Here’s how:

  1. Go back to the API Center of your manager account (MCC).

  2. You’ll see you have Test Access. Click that status—there should be an option to Apply for Basic access.

  3. Fill out the form, explaining that you’re using the API for your own advertising analysis or internal automated data gathering.

  4. Wait for approval (up to 3 days).

If Google finds your request valid, you’ll get access. In most cases, if your ad account is genuinely active, you’re golden.
One more important note: your manager account must be linked to your ad account. Head to “Accounts” in the left menu; if your ad account isn’t there, click Add and enter the account number you want to link. You’ll get an email notification—confirm it, and you’re all set.

At Last, Collecting Keywords: A Python Script for Parsing Google

Below is a snippet for gathering data via the official Google Ads Keyword Planner API. Briefly, the script reads a list of keywords from a CSV, splits them into batches of 10, sends them to the API for monthly traffic stats (average searches, competition, bid ranges), then saves everything into another CSV file:

import csv
import time

from google.ads.googleads.client import GoogleAdsClient


def chunk_list(lst, n):
    """Yield successive n-sized chunks of lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]


def main():
    # Load the client from the config file
    client = GoogleAdsClient.load_from_storage("google-ads.yaml")

    # Keyword ideas service
    keyword_plan_idea_service = client.get_service("KeywordPlanIdeaService")

    # Put your Customer ID here (no dashes, e.g. "1234567890")
    customer_id = "YOUR_CUSTOMER_ID"

    # Read keywords from a CSV
    keywords = []
    with open("keywords.csv", "r", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        for row in reader:
            kw = row["keyword"].strip()
            if kw:
                keywords.append(kw)

    chunk_size = 10

    with open("keyword_data.csv", "w", newline="", encoding="utf-8") as outfile:
        writer = csv.writer(outfile)
        writer.writerow([
            "keyword",
            "avg_monthly_searches",
            "competition",
            "low_top_of_page_bid_micros",
            "high_top_of_page_bid_micros",
        ])

        for chunk in chunk_list(keywords, chunk_size):
            # Build the request
            request = client.get_type("GenerateKeywordIdeasRequest")
            request.customer_id = customer_id

            # Add language and geo:
            # France
            request.geo_target_constants.append("geoTargetConstants/2250")
            # French
            request.language = "languageConstants/1002"

            # Insert the keywords
            request.keyword_seed.keywords.extend(chunk)

            # Send the request
            response = keyword_plan_idea_service.generate_keyword_ideas(request=request)

            # Iterate the pager itself so that every page of results is read
            for idea in response:
                text = idea.text
                metrics = idea.keyword_idea_metrics
                avg_searches = metrics.avg_monthly_searches if metrics.avg_monthly_searches else 0
                competition = metrics.competition.name if metrics.competition else "UNSPECIFIED"
                # Bids come in micros (millionths of the account currency)
                low_bid = metrics.low_top_of_page_bid_micros if metrics.low_top_of_page_bid_micros else 0
                high_bid = metrics.high_top_of_page_bid_micros if metrics.high_top_of_page_bid_micros else 0

                writer.writerow([
                    text,
                    avg_searches,
                    competition,
                    low_bid,
                    high_bid,
                ])

            # Pause to avoid hitting API limits
            time.sleep(1)


if __name__ == "__main__":
    main()

How It Works

  • Your keywords.csv needs a keyword column for your input queries (make sure the header is literally keyword).

  • The script reads the CSV, batches the keywords into groups of 10 to avoid exceeding API limits.

  • For each batch, a GenerateKeywordIdeasRequest is built.

  • You specify geoTargetConstants (2250 is France, but you can swap in your own region) and languageConstants (1002 is French, also swappable). Adjust both to your locale if you need another country or language. (To find the ID of the right region or country, see the list at https://developers.google.com/google-ads/api/data/geotargets.)

  • The resulting keyword ideas (including synonyms or expanded phrases) get written to keyword_data.csv with metrics (search volume, competition, etc.).

  • time.sleep(1) ensures a one-second delay between requests so you don’t risk slamming Google’s limits.
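
A fixed one-second pause works for me, but if a request does fail with a temporary or quota-style error, you may want to retry it with growing pauses instead of crashing halfway through a 40,000-keyword run. Below is a minimal sketch of that idea. The helper call_with_backoff is my own, not part of the google-ads library; in the script above you would probably pass retry_on=(GoogleAdsException,) (from google.ads.googleads.errors) and wrap the generate_keyword_ideas call in a lambda.

```python
import time


def call_with_backoff(func, max_attempts=5, base_delay=1.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call func(); on a retryable error wait base_delay * 2**attempt, then try again.

    The last failure is re-raised once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    # Stand-in for an API call that fails twice before succeeding.
    state = {"calls": 0}

    def flaky():
        state["calls"] += 1
        if state["calls"] < 3:
            raise RuntimeError("temporary failure")
        return "ok"

    print(call_with_backoff(flaky, base_delay=0.1))
```

The sleep parameter is injectable only so the helper can be tried out without actually waiting.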

A Few Clarifications on Language Codes and Limits

First, about language codes: I couldn’t find a straightforward list of numeric IDs. (If you’ve got a direct link, please share!) So here’s a script that fetches them from the API (though you’ll still need a working account for it):

from google.ads.googleads.client import GoogleAdsClient


def main():
    # Load the client from the config file
    client = GoogleAdsClient.load_from_storage("google-ads.yaml")

    # Service for GAQL queries
    ga_service = client.get_service("GoogleAdsService")

    customer_id = "YOUR_CUSTOMER_ID"  # insert your Customer ID without dashes
    query = """
    SELECT language_constant.id, language_constant.code, language_constant.name, language_constant.targetable
    FROM language_constant
    ORDER BY language_constant.id
    """

    # Run a streaming query
    response = ga_service.search_stream(customer_id=customer_id, query=query)

    for batch in response:
        for row in batch.results:
            language = row.language_constant
            print(f"ID: {language.id}, Code: {language.code}, Name: {language.name}, Targetable: {language.targetable}")


if __name__ == "__main__":
    main()

Save that code in the same folder as google-ads.yaml (and make sure to insert your own Customer ID). You’ll get a list of language IDs, codes, names, and whether they’re targetable.
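
If you redirect that output to a file, a few lines of Python can turn each printed line into the languageConstants/... resource name that the keyword script expects. This is only a sketch under the assumption that the lines look exactly like the print statement above; parse_language_line is a name I made up, and the sample line is just an example of that format.

```python
import re

# Matches the format printed by the language-listing script.
LINE_RE = re.compile(r"ID: (\d+), Code: ([^,]+), Name: (.+), Targetable: (True|False)")


def parse_language_line(line):
    """Parse one printed line into a dict, or return None if it does not match."""
    match = LINE_RE.fullmatch(line.strip())
    if match is None:
        return None
    lang_id, code, name, targetable = match.groups()
    return {
        "id": int(lang_id),
        "code": code,
        "name": name,
        "targetable": targetable == "True",
        # The value you would assign to request.language
        "resource_name": f"languageConstants/{lang_id}",
    }


if __name__ == "__main__":
    sample = "ID: 1000, Code: en, Name: English, Targetable: True"
    print(parse_language_line(sample)["resource_name"])
```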

About limits:

  • With Basic Access, you have 15,000 daily API calls. Since each call can handle up to 10 keywords, that’s 150,000 keywords per day.

  • On Standard Access, the limits are far looser, but you’ll have to prove your loyalty and spotless reputation to Google (avoid overloading the API, spend more on ads, etc.). In short: YOU NEED MORE MONEY (that’s just how Google rolls).
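
To plan a large run, the arithmetic above can be sketched as a tiny helper. The numbers are simply the ones from this article (10 keywords per request, 15,000 calls per day on Basic Access, a one-second pause between requests); plan_requests is a hypothetical name, and you should plug in your own figures if your limits differ.

```python
import math


def plan_requests(total_keywords, batch_size=10, daily_calls=15_000, pause_seconds=1.0):
    """Return (requests needed, days needed, minimum seconds spent sleeping)."""
    requests_needed = math.ceil(total_keywords / batch_size)
    days_needed = math.ceil(requests_needed / daily_calls)
    min_sleep_seconds = requests_needed * pause_seconds
    return requests_needed, days_needed, min_sleep_seconds


if __name__ == "__main__":
    requests_needed, days_needed, min_sleep = plan_requests(40_000)
    print(f"{requests_needed} requests, {days_needed} day(s), "
          f"at least {min_sleep / 60:.0f} minutes of pauses")
```

So the 40,000 keywords mentioned earlier fit into a single day’s quota with plenty of room to spare; the pauses alone add up to a bit over an hour.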

In Closing

The main “gotcha” of this entire setup is that you need approval for Basic Access in the Google Ads API. No token or Python script can save you if Google doesn’t grant that higher level. Generally, if your ad account is truly active, Google won’t refuse. Once you get Basic Access, you’re good to go!

I hope this article helps everyone aiming to build their own semantic collection tools—without shelling out crazy amounts of cash for third-party services. Just remember to watch your limits, pace your API requests, and keep your application forms neat!

Good luck with your experiments, folks! May your JSON responses always be accurate and your CPCs forever sane!


Link to the original article: https://habr.com/ru/articles/869386/

