Using Google BigQuery API to load BigQuery tables into Pandas Dataframes

Access Google BigQuery via Python Jupyter Notebook and load BigQuery tables into Pandas Dataframes

Google BigQuery is an analytics data warehouse provided by the Google Cloud Platform. Its key advantage is that you can set it up quickly and query your data straight from the browser.

Enable the Google BigQuery API and you can query your data remotely from a Python notebook and load it into a Pandas DataFrame for analysis. The following steps assume that you already have a Google Cloud Platform account.

  • Within the Google Cloud Platform console, set up your authentication:
    go to this page and follow the steps under Create a Service Account Key
  • Once the key is created and you have the JSON file with your key details, head over to your Python Jupyter Notebook

Steps to take within your Python Jupyter Notebook:

Install Google BigQuery libraries by running the following code if it is not yet installed:

#install the BigQuery Client library

!pip install --upgrade google-cloud-bigquery

Connect to Google BigQuery API

Upload your JSON authentication key file to a location on your server and run the following code

#import the Google Cloud BigQuery libraries
import pandas
from google.cloud import bigquery
from google.oauth2 import service_account

#input the location of your json file
credentials = service_account.Credentials.from_service_account_file('/home/nbuser/library/My First Project-8140695541c9.json')

#input your project id - this can be found under your Project details found on Google Cloud Platform
project_id = 'astute-age-200900'
client = bigquery.Client(credentials= credentials,project=project_id)

Define your SQL query and load BigQuery tables into Pandas Dataframes

In this example, I’m selecting all records from my table transactions, which sits under the dataset transactions. The transactions table contains customer transaction records from service stations across Sydney.

#use SQL to select and read data from your table
#REMEMBER: code is case sensitive
#your SQL code is stored within the variable sql_TraTra

sql_TraTra = "SELECT * FROM transactions.transactions"

#get the results with the query() function and store them in res_TraTra
res_TraTra = client.query(sql_TraTra)
#transform the results into a Pandas DataFrame using the to_dataframe() function
df_TraTra = res_TraTra.to_dataframe()
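Pulling SELECT * from a large table into pandas can be slow and costly, so it is often worth capping the rows while exploring. A sketch of building such a query string (the 1,000-row cap and the variable names below are my assumptions, not from the article):

```python
# Cap the number of rows pulled back during exploration
row_limit = 1000  # assumed cap; raise it once the schema looks right

# Same table as above, with a LIMIT appended
sql_sample = f"SELECT * FROM transactions.transactions LIMIT {row_limit}"

# The query would then run exactly as before:
# df_sample = client.query(sql_sample).to_dataframe()
```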

Display top and bottom rows of your dataframe

#display the top 30 and bottom 30 rows
df_TraTra.head(30)
df_TraTra.tail(30)
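The notebook output is not embedded here, so here is a self-contained sketch of head()/tail() with a small synthetic DataFrame standing in for df_TraTra (the column names and values are made up for illustration):

```python
import pandas as pd

# Synthetic stand-in for the transactions table
df_demo = pd.DataFrame({
    "station": ["A", "B", "C", "D"],
    "amount": [10.0, 25.5, 7.2, 40.0],
})

# head(n)/tail(n) return the first/last n rows as new DataFrames
top = df_demo.head(2)
bottom = df_demo.tail(2)
print(top)
print(bottom)
```

Note that in a single notebook cell only the last expression is displayed automatically, hence the explicit print calls; asking for more rows than the table has (e.g. head(30) on 4 rows) simply returns all of them.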

