Technical SEO

How to Export More Than 5,000 Rows of Data from Google Analytics

  • Written by Robert Padgett Cooper

One of the limitations of exporting data from Google Analytics is that you can only export 5,000 rows of data at a time. If you have more than 5,000 rows of data, you’ll have to make multiple exports my moving to the next page in reports:

Google Analytics pagination

There are a couple of ways that allows users to export more than 5000 rows, like:

  • Changing the number from 5000 at the end of the URL to 50000 or other number to get larger reports, which doesn’t work anymore.
  • The other option is installing the Google Analytics Sheets Add-On to pull 10,000 rows.
  • Using the Google Analytics API to pull more data directly from Google Analytics.

In this post we’re providing a guide on how you can connect to Google’s API to extract different reports. My colleague Grayson Parks created the notebook that we’ll explain in this post. Find him on twitter at @GraysonParks.

Python Notebook that Connects to Google Analytics API

This notebook allows you generate simple queries at scale utilizing the Google Analytics API. The underlying architecture is based on this library by Stijn Debrouwere. You can click on the link to read the documentation for more information on authentication, metrics and more.

Below are the required packages that needed for this notebook to run.

#Required Libraries
!pip install pandas as pd
!pip install googleanalytics
import pandas as pd
import googleanalytics as ga

 

Output File

First, we’ll provide the name of the file we want to output to. This file must be a CSV.

file_out = 'all_data.csv'

 

Google Analytics Credentials

In the next cell, we will add the credentials that are provided in the Google Developers Console. Visit this section on authentication for more information on how to register a new API project/app with Google and get these credentials. You can also visit the official instructions from Google on how to create these credentials.

identity = "INSERT IDENTITY"
client_id = "INSERT ID"
client_secret = "INSERT SECRET"

 

Authenticate

The code below authenticates the process using the above Google Analytics Credentials

try:
    accounts = ga.authenticate(
      client_id=client_id, 
      client_secret=client_secret, 
      identity=identity,  
      interactive=True,
      save=True
    )
except Exception as e:
    print('An error occured', str(e))

 

Account and Profile information

Next, we need to provide the account name, account ID and profile ID of the property we want to extract the data from. All items need to be strings.

  • Account name is the primary name of the Google Analytics property
  • When you are in a profile in the Google Analytics portal, there is string in the URL that looks something like this: a1234567w123456789p123456789
  • Account ID is the string of numbers after the ‘a’
  • Profile ID is the string of numbers after the ‘p’
#Choose the client
account_name = "ADD NAME"
account_id = "ADD ACCOUNT"
profile_id = "ADD PROFILE"

 

Time Parameters

Next we need to provide date paramters for the query, this will tell Google Analytics the starting and ending points of the data that we want to extract. Dates must be strings in the form of YYYYMMDD. Intervals must be exactly one of the following:

  • Daily
  • Weekly
  • Monthly
  • Yearly
#Enter Date Parameters
interval = 'Monthly'
start_date = '20180101'
end_date = '20180801'

 

Metric Parameters

Provide metrics, dimensions and segments as strings. Metrics are required, though dimensions and segments are optional. However, usually you’ll want to specify a dimension like Pages since having only the metrics without a dimension is not as useful.  If you don’t want to use any dimensions or segments, you need to use “None” like in the example below. Multiple metrics/dimensions/parameters can be provided, you’ll have to use commas to separate them.

Example:
metrics = ‘Pageviews’, ‘BounceRate’
dimensions = None
segments = ‘Organic Traffic’

Metrics and dimensions can be specified using the internal ID, the slug or the (case-insensitive) human-readable name. Visit the Dimensions and Metrics reference for more information. Below we’re providing an example of a report that will pull a report of Pages getting organic traffic, with the metrics Page views and Bounce rate:

#Enter Metric Parameters
metrics = 'Pageviews','BounceRate'
dimensions = 'ga:pagePath'
segments = 'Organic Traffic'

 

Run Query

The code below runs the queries based on the above parameters

try:
    #Converts parameters to tuples
    if isinstance(metrics, str):
        metrics = (metrics,)
    if isinstance(dimensions, str):
        dimensions = (dimensions,)
    if isinstance(segments, str):
        segments = (segments,)

    profile_found = False    

    # Goes throught the accounts until it finds the correct profile   
    for account in accounts:
        if int(account.id) == int(account_id):
            web_properties = account.webproperties
            for web_property in web_properties:
                profiles = web_property.profiles
                for profile in profiles:
                    if int(profile.id) == int(profile_id):
                        #Profile found
                        profile_found = True    
                        print("Found profile: ",account_name,web_property.id,profile_id)

                        #Selects the query based on the interval parameter
                        if interval == 'Daily':
                            query = profile.core.query.daily(start_date,end_date)
                        elif interval == 'Weekly':
                            query = profile.core.query.weekly(start_date,end_date)
                        elif interval == 'Monthly':
                            query = profile.core.query.monthly(start_date,end_date)
                        elif interval == 'Yearly':
                            query = profile.core.query.yearly(start_date,end_date)
                        else:
                            print("Error: Invalid Interval")
                            break

                        #Selects type of query based on the metric parameters
                        if metrics and dimensions and segments:
                            data = query.segment(*segments).metrics(*metrics).dimensions(*dimensions).get()
                        elif metrics and dimensions:
                            data = query.metrics(*metrics).dimensions(*dimensions).get()
                        elif metrics and segments:
                            data = query.segment(*segments).metrics(*metrics).get()
                        elif metrics:
                            data = query.metrics(*metrics).get()
                        else:
                            print("Error: Invalid Metrics")
                            break
                        df = data.as_dataframe()
    if profile_found is False:
        print("Profile was not found!")
    
except Exception as e:
    print('An error occured.', str(e))

 

Output

Finally, we want to send the results to the specified file name that we created in the first cell.

df.to_csv(file_out,encoding="utf-8")
df.head()

We hope this notebook is useful and let us know if you had any issues running this notebook. We’re happy to reply and help you get the reports you need.

Adapt is now Locomotive. Same award-winning team, new name.
Learn More
close-image