Access Google BigQuery from a Python Jupyter Notebook and load BigQuery tables into pandas DataFrames
Google BigQuery is an analytics data warehouse provided by the Google Cloud Platform. A key advantage is that you can set it up quickly and query it straight from your browser.
Once you enable the Google BigQuery API, you can query your data remotely from a Python Jupyter Notebook and load it into a pandas DataFrame for analysis. The following steps assume that you already have a Google Cloud Platform account.
- Within the Google Cloud Platform console, set up your authentication
Go to this page and follow the steps under Create a Service Account Key
- Once it is created and you have the JSON file with your key details, head on to your Python Jupyter Notebook
Steps to take within your Python Jupyter Notebook:
Install the Google BigQuery client library by running the following code if it is not yet installed:

```python
# install the BigQuery client library
!pip install --upgrade google-cloud-bigquery
```
Connect to Google BigQuery API
Upload your JSON authentication key file to a location on your server and run the following code:
```python
# import pandas and the Google Cloud BigQuery libraries
import pandas
from google.cloud import bigquery
from google.oauth2 import service_account

# input the location of your JSON key file
credentials = service_account.Credentials.from_service_account_file(
    '/home/nbuser/library/My First Project-8140695541c9.json')

# input your project ID - this can be found under Project details on Google Cloud Platform
project_id = 'astute-age-200900'

client = bigquery.Client(credentials=credentials, project=project_id)
```
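As a side note, the project ID you hardcode above is also stored inside the service-account key file itself, in its "project_id" field. Here is a minimal sketch of reading it from the JSON instead of hardcoding it; the key contents below are made up for illustration (a real key also holds the private key and client email):

```python
import json
import tempfile

# made-up stand-in for a downloaded service-account key file
sample_key = {"type": "service_account", "project_id": "astute-age-200900"}

with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(sample_key, f)
    key_path = f.name

# read the project ID back out of the key file
with open(key_path) as f:
    project_id = json.load(f)["project_id"]

print(project_id)  # astute-age-200900
```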
Define your SQL query and load BigQuery tables into Pandas Dataframes
In this example, I’m calling all records from my table transactions, which sits under the dataset transactions. The transactions table contains customer transaction records from various service stations across Sydney.
```python
# use SQL to select and read data from your table
# REMEMBER: code is case sensitive
# your SQL code is stored within the variable sql_TraTra
sql_TraTra = "SELECT * FROM transactions.transactions"

# get results with the query() function and store them in the object res_TraTra
res_TraTra = client.query(sql_TraTra)

# transform the results into a pandas DataFrame using the to_dataframe() function
df_TraTra = res_TraTra.to_dataframe()
```
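The query string uses BigQuery's dataset.table reference. If you query several tables this way, a small helper (hypothetical, not part of the BigQuery client library) can assemble the string for you:

```python
# hypothetical helper: build a SELECT statement for a dataset.table reference
def build_select(dataset, table, limit=None):
    query = "SELECT * FROM {}.{}".format(dataset, table)
    if limit is not None:
        query += " LIMIT {}".format(limit)
    return query

print(build_select("transactions", "transactions"))
# SELECT * FROM transactions.transactions
print(build_select("transactions", "transactions", 1000))
# SELECT * FROM transactions.transactions LIMIT 1000
```

Adding a LIMIT while exploring keeps the amount of data pulled into your notebook (and your BigQuery bill) small.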
Display top and bottom rows of your dataframe
```python
# display the top 30 and bottom 30 rows
df_TraTra
```
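You can also ask for the top or bottom rows explicitly with head() and tail(). A sketch on a small stand-in DataFrame (the column names here are made up; your df_TraTra will have your table's schema):

```python
import pandas as pd

# stand-in DataFrame; in practice this would be df_TraTra from the query above
df = pd.DataFrame({"station": ["A", "B", "C", "D"],
                   "amount": [10, 20, 30, 40]})

top = df.head(2)     # first 2 rows
bottom = df.tail(2)  # last 2 rows

print(top)
print(bottom)
```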
Here is the full Jupyter notebook: