Read CSV files into Pandas DataFrames

Use case examples – read CSV files into Pandas DataFrames

Why do we read CSV files into Pandas DataFrames?

  • CSV is a common data file format.
  • Many applications export their data to CSV.
  • A CSV file is just raw text; it has no built-in functionality to summarise or manipulate the data.
  • Loading a CSV file into a Pandas DataFrame makes it easy to manipulate the data and apply statistical functions and summaries.
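As a quick illustration of that last point, the sketch below loads a few rows shaped like the football results file used in this article and summarises them. The rows are embedded as a string and read through `io.StringIO`, which `read_csv` accepts in place of a file path; treat them as sample values only. The `total_goals` column is an addition made for illustration, not part of the dataset.

```python
import io

import pandas as pd

# A few sample rows in the same column layout as the football results file
# (embedded here so the sketch runs without the download)
csv_text = """date,home_team,away_team,home_score,away_score,tournament,city,country,neutral
1872-11-30,Scotland,England,0,0,Friendly,Glasgow,Scotland,False
1873-03-08,England,Scotland,4,2,Friendly,London,England,False
1874-03-07,Scotland,England,2,1,Friendly,Glasgow,Scotland,False
1875-03-06,England,Scotland,2,2,Friendly,London,England,False
1876-03-04,Scotland,England,3,0,Friendly,Glasgow,Scotland,False
1876-03-25,Scotland,Wales,4,0,Friendly,Glasgow,Scotland,False
"""

df = pd.read_csv(io.StringIO(csv_text))

# Derived column (illustrative): total goals in each match
df["total_goals"] = df["home_score"] + df["away_score"]

# Summary statistics (count, mean, std, min, quartiles, max) for the score columns
print(df[["home_score", "away_score", "total_goals"]].describe())

# Average goals scored at home, grouped by the home team
home_avg = df.groupby("home_team")["home_score"].mean()
print(home_avg)
```

None of this is possible on the raw text file itself; once the data is in a DataFrame, summaries and groupings are one-line calls.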

Sample dataset for this example: read CSV into Pandas DataFrame

We will use a CSV file that contains historical international football results (courtesy of Mart Jurisoo on Kaggle).

Download the CSV file here

Walkthrough of code – Read CSV into Pandas DataFrames

We provide 2 examples of how CSV files are read into Pandas DataFrames:

  1. Reading a CSV file that contains a header row with a name for each column of data. Pandas uses these header names as the column names of the DataFrame.
  2. Reading a CSV file that has no header row. In this situation, we need to specify the column names in our code.

Example Code – Read CSV into Pandas DataFrames

Read the CSV into Pandas

import pandas as pd
# set the file location of your data file
fileloc1 = "internation_football_results.csv"

# read the CSV file into a Pandas DataFrame called df_games
df_games = pd.read_csv(fileloc1)

# display the first 5 rows of df_games
# (a notebook shows the last expression automatically; in a script, wrap it in print())
df_games.head()
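If you want to try the call above before downloading the file, the sketch below runs the same `read_csv` call on an in-memory sample. `io.StringIO` stands in for `fileloc1` (`read_csv` accepts a file-like object as well as a path), and the rows are sample values in the dataset's column layout, not the full file.

```python
import io

import pandas as pd

# Sample rows with a header line, standing in for the downloaded file
csv_text = """date,home_team,away_team,home_score,away_score,tournament,city,country,neutral
1872-11-30,Scotland,England,0,0,Friendly,Glasgow,Scotland,False
1873-03-08,England,Scotland,4,2,Friendly,London,England,False
1874-03-07,Scotland,England,2,1,Friendly,Glasgow,Scotland,False
1875-03-06,England,Scotland,2,2,Friendly,London,England,False
1876-03-04,Scotland,England,3,0,Friendly,Glasgow,Scotland,False
1876-03-25,Scotland,Wales,4,0,Friendly,Glasgow,Scotland,False
"""

# Same call as above, with a file-like object instead of a path
df_games = pd.read_csv(io.StringIO(csv_text))

# The header line became the column names rather than a data row
print(list(df_games.columns))

# head() returns the first 5 rows by default; print() is needed outside a notebook
print(df_games.head())
print(df_games.shape)  # (rows, columns) -> (6, 9)
```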

Read a CSV file into a Pandas DataFrame and specify column names

In the above example, we did not have to specify the column names because the first line of the file contained them.
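To see why the header row matters, the sketch below reads a small headerless sample (sample values in the same layout) two ways. With the default settings, `read_csv` treats the first data row as the header, so that match is lost from the data; with `header=None`, every line is kept as data and the columns are simply numbered 0, 1, 2 and so on.

```python
import io

import pandas as pd

# Sample rows with no header line at the top
no_header_text = """1872-11-30,Scotland,England,0,0,Friendly,Glasgow,Scotland,False
1873-03-08,England,Scotland,4,2,Friendly,London,England,False
1874-03-07,Scotland,England,2,1,Friendly,Glasgow,Scotland,False
"""

# Default header inference: the first match is (wrongly) used as column names
df_wrong = pd.read_csv(io.StringIO(no_header_text))
print(list(df_wrong.columns))
print(len(df_wrong))  # only 2 data rows remain

# header=None: all 3 lines are data and the columns are labelled 0..8
df_numbered = pd.read_csv(io.StringIO(no_header_text), header=None)
print(list(df_numbered.columns))
print(len(df_numbered))
```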

However, if the file has no header row, you need to specify the column names yourself. This is how you do it:

Download the sample file without headers

# set the file location of your headerless data file
fileloc2 = "internation_football_results_sample.csv"

# read the CSV file into the Pandas DataFrame df_games_xheaders,
# using names= to supply one column name per field
df_games_xheaders = pd.read_csv(
    fileloc2,
    names=['date', 'home_team', 'away_team', 'home_score', 'away_score',
           'tournament', 'city', 'country', 'neutral'],
)

df_games_xheaders
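The `names` argument can also rename the columns of a file that already has a header row. In that case, pass `header=0` as well so the existing header line is discarded: when `names` is given on its own, pandas behaves as if `header=None` and reads the old header line as an ordinary data row. The sample rows and the new column names below are hypothetical, chosen only for illustration.

```python
import io

import pandas as pd

# Sample file that already has a header line
csv_text = """date,home_team,away_team,home_score,away_score,tournament,city,country,neutral
1872-11-30,Scotland,England,0,0,Friendly,Glasgow,Scotland,False
1873-03-08,England,Scotland,4,2,Friendly,London,England,False
"""

# Hypothetical replacement names, one per column
new_names = ["match_date", "home", "away", "home_goals", "away_goals",
             "competition", "city", "country", "neutral"]

# header=0 drops the file's own header line; names supplies the replacements
df_renamed = pd.read_csv(io.StringIO(csv_text), header=0, names=new_names)
print(df_renamed)

# Without header=0, the old header line is kept as the first data row
df_oops = pd.read_csv(io.StringIO(csv_text), names=new_names)
print(df_oops.head(1))
```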

Walkthrough of code – Read CSV into Pandas DataFrames and set index columns

The next 2 examples look at how we index a DataFrame using columns from the CSV file. An index is a reference that lets you look up records in a data table or DataFrame quickly and efficiently; think of it as something similar to the table of contents in a book. Because the index acts as a reference, it is best to choose columns with unique values. Pandas does allow repeated index values, however: in the football results a team appears as home_team in many matches, so looking up one team returns several rows.
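The table-of-contents idea can be tried on a small sample. This sketch (sample rows in the football file's layout) builds an index with `set_index`, uses `.loc` to look up a record by its index label, and checks `index.is_unique`. The `date` column happens to be unique in these few rows, though it would not be in the full dataset, where many matches share a date.

```python
import io

import pandas as pd

csv_text = """date,home_team,away_team,home_score,away_score,tournament,city,country,neutral
1872-11-30,Scotland,England,0,0,Friendly,Glasgow,Scotland,False
1873-03-08,England,Scotland,4,2,Friendly,London,England,False
1874-03-07,Scotland,England,2,1,Friendly,Glasgow,Scotland,False
1875-03-06,England,Scotland,2,2,Friendly,London,England,False
1876-03-04,Scotland,England,3,0,Friendly,Glasgow,Scotland,False
1876-03-25,Scotland,Wales,4,0,Friendly,Glasgow,Scotland,False
"""
df = pd.read_csv(io.StringIO(csv_text))

# In this small sample every date is different, so it works as a unique key
by_date = df.set_index("date")
print(by_date.index.is_unique)                  # True
print(by_date.loc["1873-03-08", "home_team"])   # England (one row, so a single value)

# home_team repeats, so Pandas accepts it as an index but it is not unique
by_home = df.set_index("home_team")
print(by_home.index.is_unique)                  # False
```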

  1. Pandas reads a CSV file with headers, and we specify which column to use as the DataFrame's index.
  2. Pandas reads a CSV file with headers, and we specify which 2 columns to use as the DataFrame's index (Pandas calls a multi-column index a MultiIndex).
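Setting `index_col` while reading should give the same DataFrame as reading the file first and then calling `set_index`, which is handy if you only decide on an index later. A sketch comparing the two routes for one and two index columns, on sample rows in the football file's layout:

```python
import io

import pandas as pd

csv_text = """date,home_team,away_team,home_score,away_score,tournament,city,country,neutral
1872-11-30,Scotland,England,0,0,Friendly,Glasgow,Scotland,False
1873-03-08,England,Scotland,4,2,Friendly,London,England,False
1876-03-25,Scotland,Wales,4,0,Friendly,Glasgow,Scotland,False
"""

# Option 1: choose the index column(s) while reading
one_a = pd.read_csv(io.StringIO(csv_text), index_col="home_team")
two_a = pd.read_csv(io.StringIO(csv_text), index_col=["home_team", "away_team"])

# Option 2: read with the default 0, 1, 2... index, then promote the column(s)
df = pd.read_csv(io.StringIO(csv_text))
one_b = df.set_index("home_team")
two_b = df.set_index(["home_team", "away_team"])

# equals() compares shape, values and labels of the two DataFrames
print(one_a.equals(one_b), two_a.equals(two_b))
print(two_a.index.names)
```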

Example Code – Read CSV into Pandas DataFrames and set index columns

Read CSV with headers into Pandas DataFrame and state index column

#use file referenced in fileloc1 and use column home_team as index

df_games_oneindex = pd.read_csv(fileloc1, index_col='home_team')

df_games_oneindex
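Because home_team repeats in the football results, looking up one label in df_games_oneindex returns every match with that home team, as a DataFrame rather than a single row. A sketch on sample rows in the same layout, with `io.StringIO` standing in for `fileloc1`:

```python
import io

import pandas as pd

csv_text = """date,home_team,away_team,home_score,away_score,tournament,city,country,neutral
1872-11-30,Scotland,England,0,0,Friendly,Glasgow,Scotland,False
1873-03-08,England,Scotland,4,2,Friendly,London,England,False
1874-03-07,Scotland,England,2,1,Friendly,Glasgow,Scotland,False
1875-03-06,England,Scotland,2,2,Friendly,London,England,False
1876-03-04,Scotland,England,3,0,Friendly,Glasgow,Scotland,False
1876-03-25,Scotland,Wales,4,0,Friendly,Glasgow,Scotland,False
"""
df_games_oneindex = pd.read_csv(io.StringIO(csv_text), index_col="home_team")

# A repeated label returns all matching rows
scotland_home = df_games_oneindex.loc["Scotland"]
print(scotland_home)
print(len(scotland_home))  # 4 matches in this sample

# Row label plus column name: Scotland's goals in its home matches
print(df_games_oneindex.loc["Scotland", "home_score"].tolist())  # [0, 2, 3, 4]
```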

Read CSV with headers into Pandas DataFrame and state 2 columns as an index

#use file referenced in fileloc1 and use column home_team and away_team as index

df_games_twoindex = pd.read_csv(fileloc1, index_col=['home_team','away_team'])

df_games_twoindex
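With two index columns, Pandas builds a MultiIndex, and a (home_team, away_team) tuple selects one pairing, while `xs` selects on a single level. The sketch sorts the index first; lookups on an unsorted MultiIndex still work but can trigger a PerformanceWarning. Sample rows in the football file's layout stand in for `fileloc1`.

```python
import io

import pandas as pd

csv_text = """date,home_team,away_team,home_score,away_score,tournament,city,country,neutral
1872-11-30,Scotland,England,0,0,Friendly,Glasgow,Scotland,False
1873-03-08,England,Scotland,4,2,Friendly,London,England,False
1874-03-07,Scotland,England,2,1,Friendly,Glasgow,Scotland,False
1875-03-06,England,Scotland,2,2,Friendly,London,England,False
1876-03-04,Scotland,England,3,0,Friendly,Glasgow,Scotland,False
1876-03-25,Scotland,Wales,4,0,Friendly,Glasgow,Scotland,False
"""
df_games_twoindex = pd.read_csv(
    io.StringIO(csv_text), index_col=["home_team", "away_team"]
).sort_index()

# Every match with Scotland at home against England (full tuple key, all columns)
scot_v_eng = df_games_twoindex.loc[("Scotland", "England"), :]
print(scot_v_eng)

# Every match where Wales was the away team, whoever the home team was
wales_away = df_games_twoindex.xs("Wales", level="away_team")
print(wales_away)
```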
