Back Arrrow Go Back

How to Export More Than 5,000 Rows of Data from Google Analytics

Illustration of laptop and smartphone with website analytics on the screen

 

One of the limitations of exporting data from Google Analytics is that you can only export 5,000 rows of data at a time. If you have more than 5,000 rows of data, you’ll have to make multiple exports by moving to the next page in reports:

Google Analytics pagination

There are a couple of ways for users to export more than 5,000 rows:

  • Change the number from 5,000 at the end of the URL to 50,000 or other number to get larger reports — except that doesn’t work anymore.
  • Install the Google Analytics Sheets Add-On to pull 10,000 rows.
  • Use the Google Analytics API to pull more data directly from Google Analytics.

In this post we’re providing a guide on how you can connect to Google’s API to extract different reports. My colleague Grayson Parks created the notebook that we’ll explain in this post. Find him on twitter at @GraysonParks.

Python Notebook that Connects to Google Analytics API

This notebook allows you to generate simple queries at scale utilizing the Google Analytics API. The underlying architecture is based on this library by Stijn Debrouwere. You can click on the link to read the documentation for more information on authentication, metrics, and more.

Below are the required packages needed for this notebook to run.

#Required Libraries
!pip install pandas as pd
!pip install googleanalytics
import pandas as pd
import googleanalytics as ga

 

Output File

First, we’ll provide the name of the file we want to output to. This file must be a CSV.

file_out = 'all_data.csv'

 

Google Analytics Credentials

In the next cell, we will add the credentials that are provided in the Google Developers Console. Visit this section on authentication for more information on how to register a new API project/app with Google and get these credentials. You can also visit the official instructions from Google on how to create these credentials.

identity = "INSERT IDENTITY"
client_id = "INSERT ID"
client_secret = "INSERT SECRET"

 

Authenticate

The code below authenticates the process using the above Google Analytics Credentials

try:
    accounts = ga.authenticate(
      client_id=client_id, 
      client_secret=client_secret, 
      identity=identity,  
      interactive=True,
      save=True
    )
except Exception as e:
    print('An error occured', str(e))

 

Account and Profile information

Next, we need to provide the account name, account ID, and profile ID of the property we want to extract the data from. All items need to be strings.

  • Account name is the primary name of the Google Analytics property
  • When you are in a profile in the Google Analytics portal, there is a string in the URL that looks something like this: a1234567w123456789p123456789
  • Account ID is the string of numbers after the ‘a’
  • Profile ID is the string of numbers after the ‘p’
#Choose the client
account_name = "ADD NAME"
account_id = "ADD ACCOUNT"
profile_id = "ADD PROFILE"

 

Time Parameters

Next we need to provide date parameters for the query, this will tell Google Analytics the starting and ending points of the data that we want to extract. Dates must be strings in the form of YYYYMMDD. Intervals must be exactly one of the following:

  • Daily
  • Weekly
  • Monthly
  • Yearly
#Enter Date Parameters
interval = 'Monthly'
start_date = '20180101'
end_date = '20180801'

 

Metric Parameters

Provide metrics, dimensions, and segments as strings. Metrics are required, though dimensions and segments are optional. However, usually you’ll want to specify a dimension like Pages since having only the metrics without a dimension is not as useful.  If you don’t want to use any dimensions or segments, you need to use “None” like in the example below. Multiple metrics/dimensions/parameters can be provided; you’ll have to use commas to separate them.

Example:
metrics = ‘Pageviews’, ‘BounceRate’
dimensions = None
segments = ‘Organic Traffic’

Metrics and dimensions can be specified using the internal ID, the slug or the (case-insensitive) human-readable name. Visit the Dimensions and Metrics reference for more information. Below we’re providing an example of a report that will pull a report of Pages getting organic traffic with the metrics Page views and Bounce rate:

#Enter Metric Parameters
metrics = 'Pageviews','BounceRate'
dimensions = 'ga:pagePath'
segments = 'Organic Traffic'

 

Run Query

The code below runs the queries based on the above parameters

try:
    #Converts parameters to tuples
    if isinstance(metrics, str):
        metrics = (metrics,)
    if isinstance(dimensions, str):
        dimensions = (dimensions,)
    if isinstance(segments, str):
        segments = (segments,)

    profile_found = False    

    # Goes throught the accounts until it finds the correct profile   
    for account in accounts:
        if int(account.id) == int(account_id):
            web_properties = account.webproperties
            for web_property in web_properties:
                profiles = web_property.profiles
                for profile in profiles:
                    if int(profile.id) == int(profile_id):
                        #Profile found
                        profile_found = True    
                        print("Found profile: ",account_name,web_property.id,profile_id)

                        #Selects the query based on the interval parameter
                        if interval == 'Daily':
                            query = profile.core.query.daily(start_date,end_date)
                        elif interval == 'Weekly':
                            query = profile.core.query.weekly(start_date,end_date)
                        elif interval == 'Monthly':
                            query = profile.core.query.monthly(start_date,end_date)
                        elif interval == 'Yearly':
                            query = profile.core.query.yearly(start_date,end_date)
                        else:
                            print("Error: Invalid Interval")
                            break

                        #Selects type of query based on the metric parameters
                        if metrics and dimensions and segments:
                            data = query.segment(*segments).metrics(*metrics).dimensions(*dimensions).get()
                        elif metrics and dimensions:
                            data = query.metrics(*metrics).dimensions(*dimensions).get()
                        elif metrics and segments:
                            data = query.segment(*segments).metrics(*metrics).get()
                        elif metrics:
                            data = query.metrics(*metrics).get()
                        else:
                            print("Error: Invalid Metrics")
                            break
                        df = data.as_dataframe()
    if profile_found is False:
        print("Profile was not found!")
    
except Exception as e:
    print('An error occured.', str(e))

 

Output

Finally, we want to send the results to the specified file name that we created in the first cell.

df.to_csv(file_out,encoding="utf-8")
df.head()

We hope you find this notebook useful. Please let us know if you have any issues running it. We’re happy to help you get the reports you need.

 

Are you in need of technical SEO services? Contact LOCOMOTIVE Agency today.

Connect with us

Get in touch with us today to talk with our agency experts & learn how we can best benefit you and your business.

LOCATED IN
  • North America
  • Europe