One of the limitations of exporting data from Google Analytics is that you can only export 5,000 rows of data at a time. If you have more than 5,000 rows of data, you’ll have to make multiple exports my moving to the next page in reports:
There are a couple of ways that allows users to export more than 5000 rows, like:
- Changing the number from 5000 at the end of the URL to 50000 or other number to get larger reports, which doesn’t work anymore.
- The other option is installing the Google Analytics Sheets Add-On to pull 10,000 rows.
- Using the Google Analytics API to pull more data directly from Google Analytics.
In this post we’re providing a guide on how you can connect to Google’s API to extract different reports. My colleague Grayson Parks created the notebook that we’ll explain in this post. Find him on twitter at @GraysonParks.
Python Notebook that Connects to Google Analytics API
This notebook allows you generate simple queries at scale utilizing the Google Analytics API. The underlying architecture is based on this library by Stijn Debrouwere. You can click on the link to read the documentation for more information on authentication, metrics and more.
Below are the required packages that needed for this notebook to run.
#Required Libraries !pip install pandas as pd !pip install googleanalytics import pandas as pd import googleanalytics as ga
First, we’ll provide the name of the file we want to output to. This file must be a CSV.
file_out = 'all_data.csv'
Google Analytics Credentials
In the next cell, we will add the credentials that are provided in the Google Developers Console. Visit this section on authentication for more information on how to register a new API project/app with Google and get these credentials. You can also visit the official instructions from Google on how to create these credentials.
identity = "INSERT IDENTITY" client_id = "INSERT ID" client_secret = "INSERT SECRET"
The code below authenticates the process using the above Google Analytics Credentials
try: accounts = ga.authenticate( client_id=client_id, client_secret=client_secret, identity=identity, interactive=True, save=True ) except Exception as e: print('An error occured', str(e))
Account and Profile information
Next, we need to provide the account name, account ID and profile ID of the property we want to extract the data from. All items need to be strings.
- Account name is the primary name of the Google Analytics property
- When you are in a profile in the Google Analytics portal, there is string in the URL that looks something like this: a1234567w123456789p123456789
- Account ID is the string of numbers after the ‘a’
- Profile ID is the string of numbers after the ‘p’
#Choose the client account_name = "ADD NAME" account_id = "ADD ACCOUNT" profile_id = "ADD PROFILE"
Next we need to provide date paramters for the query, this will tell Google Analytics the starting and ending points of the data that we want to extract. Dates must be strings in the form of YYYYMMDD. Intervals must be exactly one of the following:
#Enter Date Parameters interval = 'Monthly' start_date = '20180101' end_date = '20180801'
Provide metrics, dimensions and segments as strings. Metrics are required, though dimensions and segments are optional. However, usually you’ll want to specify a dimension like Pages since having only the metrics without a dimension is not as useful. If you don’t want to use any dimensions or segments, you need to use “None” like in the example below. Multiple metrics/dimensions/parameters can be provided, you’ll have to use commas to separate them.
metrics = ‘Pageviews’, ‘BounceRate’
dimensions = None
segments = ‘Organic Traffic’
Metrics and dimensions can be specified using the internal ID, the slug or the (case-insensitive) human-readable name. Visit the Dimensions and Metrics reference for more information. Below we’re providing an example of a report that will pull a report of Pages getting organic traffic, with the metrics Page views and Bounce rate:
#Enter Metric Parameters metrics = 'Pageviews','BounceRate' dimensions = 'ga:pagePath' segments = 'Organic Traffic'
The code below runs the queries based on the above parameters
try: #Converts parameters to tuples if isinstance(metrics, str): metrics = (metrics,) if isinstance(dimensions, str): dimensions = (dimensions,) if isinstance(segments, str): segments = (segments,) profile_found = False # Goes throught the accounts until it finds the correct profile for account in accounts: if int(account.id) == int(account_id): web_properties = account.webproperties for web_property in web_properties: profiles = web_property.profiles for profile in profiles: if int(profile.id) == int(profile_id): #Profile found profile_found = True print("Found profile: ",account_name,web_property.id,profile_id) #Selects the query based on the interval parameter if interval == 'Daily': query = profile.core.query.daily(start_date,end_date) elif interval == 'Weekly': query = profile.core.query.weekly(start_date,end_date) elif interval == 'Monthly': query = profile.core.query.monthly(start_date,end_date) elif interval == 'Yearly': query = profile.core.query.yearly(start_date,end_date) else: print("Error: Invalid Interval") break #Selects type of query based on the metric parameters if metrics and dimensions and segments: data = query.segment(*segments).metrics(*metrics).dimensions(*dimensions).get() elif metrics and dimensions: data = query.metrics(*metrics).dimensions(*dimensions).get() elif metrics and segments: data = query.segment(*segments).metrics(*metrics).get() elif metrics: data = query.metrics(*metrics).get() else: print("Error: Invalid Metrics") break df = data.as_dataframe() if profile_found is False: print("Profile was not found!") except Exception as e: print('An error occured.', str(e))
Finally, we want to send the results to the specified file name that we created in the first cell.
We hope this notebook is useful and let us know if you had any issues running this notebook. We’re happy to reply and help you get the reports you need.