In this post, we show an easy way to connect to Google’s Search Console API from a Python notebook. Once connected, you can pull search analytics data, manage sitemaps, and inspect crawl errors.
First, create a new OAuth credential in the Google Developers Console and select “Other” as the application type. Google’s documentation explains in detail how to set this up.
After completing these steps, you’ll have a CLIENT_ID and a CLIENT_SECRET, which this notebook needs in order to connect to Google Search Console. Dominic Woodman’s post on the Moz blog gives easy step-by-step instructions for setting this up on Google’s side.
Google provides information on how to use Python to connect to their API; however, the sample code on that page is written in Python 2. We updated the code to Python 3 and changed it so that the credentials are saved to disk, so you don’t need to enter the verification code each time you run it.
#!/usr/bin/env python3
# Requires Python 3
import httplib2
import requests
import logging
import pandas as pd
import time
from tqdm import tqdm
import warnings

# Silence noisy library logging and warnings in the notebook
logging.getLogger('googleapiclient.discovery_cache').setLevel(logging.ERROR)
logging.getLogger('oauth2client._helpers').setLevel(logging.ERROR)
warnings.filterwarnings("ignore")

from googleapiclient import errors
from googleapiclient.discovery import build
from oauth2client.client import OAuth2WebServerFlow
from oauth2client.file import Storage

# Copy your credentials from the console
CLIENT_ID = 'XXXXXXXXXXXXXXXXXXXXXXXXXX.apps.googleusercontent.com'
CLIENT_SECRET = 'XXXXXXXXXXXXXXXXXXXXXXXXXX'

# Check https://developers.google.com/webmaster-tools/search-console-api-original/v3/ for all available scopes
OAUTH_SCOPE = 'https://www.googleapis.com/auth/webmasters.readonly'

# Redirect URI for installed apps
REDIRECT_URI = 'urn:ietf:wg:oauth:2.0:oob'

# Create a credential storage object. You pick the filename.
storage = Storage('gsc_credentials')

# Attempt to load existing credentials. None is returned if it fails.
credentials = storage.get()

# Only attempt to get new credentials if the load failed.
if not credentials:
    # Run through the OAuth flow and retrieve credentials
    flow = OAuth2WebServerFlow(CLIENT_ID, CLIENT_SECRET, OAUTH_SCOPE, REDIRECT_URI)
    authorize_url = flow.step1_get_authorize_url()
    print('Go to the following link in your browser: ' + authorize_url)
    code = input('Enter verification code: ').strip()
    credentials = flow.step2_exchange(code)

    # Save the credentials so the next run can skip the verification step
    storage.put(credentials)
    if storage.get():
        print('Credentials saved for later.')

# Create an httplib2.Http object and authorize it with our credentials
http = httplib2.Http()
http = credentials.authorize(http)

webmasters_service = build('webmasters', 'v3', http=http)
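A quick way to confirm that the connection works is to list the properties the authorized account can see. The snippet below is a minimal sketch rather than part of Google’s sample: `list_verified_sites` is a hypothetical helper name, and it assumes the response of the API’s `sites().list()` method carries a `siteEntry` list whose items have `siteUrl` and `permissionLevel` fields.

```python
def list_verified_sites(service):
    """Return the site URLs that the authorized account can query.

    `service` is assumed to be the object returned by
    build('webmasters', 'v3', http=http).
    """
    response = service.sites().list().execute()
    # 'siteEntry' is missing from the response when the account has no properties.
    entries = response.get('siteEntry', [])
    # Skip properties that were added to the account but never verified.
    return [
        entry['siteUrl']
        for entry in entries
        if entry.get('permissionLevel') != 'siteUnverifiedUser'
    ]
```

In the notebook, calling `list_verified_sites(webmasters_service)` should return a list of URLs; an empty list usually means the credentials belong to an account with no verified properties.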
With the code above, you can connect to Google Search Console. The API then gives you access to the following resources:
- Search Analytics: You can extract information from the search analytics report. The “query” method returns your search traffic data, and you can add filters, a date range, and other parameters to extract exactly the data you need. The method returns zero or more rows grouped by the row keys (dimensions) that you define. For more detailed information and examples, see https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics#resource
- Sitemaps: You can delete a sitemap from a site with the delete method, retrieve information about a specific sitemap with the get method, and list the sitemap entries submitted for a site or included in a sitemap index file with the list method. You can also submit a new sitemap for a site with the submit method. For detailed information, see https://developers.google.com/webmaster-tools/search-console-api-original/v3/sitemaps#resource
- URL Crawl Error Counts: From the crawl error report, you can use the query method to retrieve a time series of the number of URL crawl errors per error category (404s, soft 404s, 50x errors, etc.) and platform (web, smartphone, etc.). See https://developers.google.com/webmaster-tools/search-console-api-original/v3/urlcrawlerrorscounts#resource
- URL Crawl Errors Samples: From the same crawl error report, you can retrieve details about the crawl errors for one of a site’s sample URLs. JR Oakes has a great example of this in which he uses the API to find pages in the crawl errors that are linked externally and should be redirected. You can also list a site’s sample URLs for a specified crawl error category and platform, and mark a crawl error as fixed, which removes it from the sample list. See https://developers.google.com/webmaster-tools/search-console-api-original/v3/urlcrawlerrorssamples#resource
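As an illustration of the Search Analytics query method described in the list above, here is a minimal sketch that pages through the results and flattens each row into a plain dictionary. `fetch_search_analytics` is a hypothetical helper name, and the site URL and dates in the usage note are placeholders. The sketch assumes the request body accepts `startDate`, `endDate`, `dimensions`, `rowLimit`, and `startRow`, and that each returned row holds `keys`, `clicks`, `impressions`, `ctr`, and `position`.

```python
def fetch_search_analytics(service, site_url, start_date, end_date,
                           dimensions=('query',), row_limit=25000):
    """Collect all Search Analytics rows for a date range as flat dicts.

    `service` is assumed to be the object returned by
    build('webmasters', 'v3', http=http). Dates are 'YYYY-MM-DD' strings.
    """
    records = []
    start_row = 0
    while True:
        body = {
            'startDate': start_date,
            'endDate': end_date,
            'dimensions': list(dimensions),
            'rowLimit': row_limit,
            'startRow': start_row,
        }
        response = service.searchanalytics().query(
            siteUrl=site_url, body=body).execute()
        # The 'rows' key is absent when there is no data for the request.
        page = response.get('rows', [])
        for row in page:
            # 'keys' holds one value per requested dimension, in the same order.
            record = dict(zip(dimensions, row['keys']))
            for metric in ('clicks', 'impressions', 'ctr', 'position'):
                record[metric] = row[metric]
            records.append(record)
        # A short page means the last page has been reached.
        if len(page) < row_limit:
            break
        start_row += row_limit
    return records
```

In the notebook, something like `pd.DataFrame(fetch_search_analytics(webmasters_service, 'https://www.example.com/', '2019-01-01', '2019-01-31', dimensions=('query', 'page')))` would turn the result into a DataFrame with one column per dimension and metric.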
As you can see, there is a lot you can do once you are connected to Google Search Console.
You might also be interested in reading JR’s post about saving Google Search Console data to BigQuery.
If you have other examples you would like to share, please link to them in the comments.