Use case examples – read CSV files into Pandas Dataframes
Why do we load or read CSV files into Pandas Dataframes?
- CSV is a common data file format
- Many applications export their data to CSV
- A CSV file is just raw text; it has no built-in functionality to summarise or manipulate the data
- We load CSV files into Pandas Dataframes to enable easy data manipulation and application of statistical functions and summaries
Walkthrough of code – Read CSV into Pandas Dataframes
We provide 2 examples of how CSV files are read into Pandas Dataframes:
- Pandas Dataframe reads a CSV file that contains header names for each column of data. We use these header names as the column names for our dataframe.
- Pandas Dataframe reads a CSV file that does not contain header names for each column of data. In this situation, we will need to specify the column names in our code.
Example Code – Read CSV into Pandas Dataframes
Read the CSV into Pandas
import pandas as pd

# set the file location of your data file
fileloc1 = "internation_football_results.csv"

# read the CSV file into a Pandas dataframe df_games
df_games = pd.read_csv(fileloc1)

# display the first 5 rows in the df_games dataframe
df_games.head()
Read CSV file into Pandas Dataframes and specify column names
In the above example, we did not have to specify the column names as the 1st line contained the column names.
However, if you need to specify column names, this is how you do it:
# input your file location
fileloc2 = "internation_football_results_sample.csv"

# read the CSV file and load it into the Pandas dataframe df_games_xheaders,
# passing the column names explicitly because the file has no header row
df_games_xheaders = pd.read_csv(
    fileloc2,
    names=['date', 'home_team', 'away_team', 'home_score', 'away_score',
           'tournament', 'city', 'country', 'neutral'],
)

# display the dataframe
df_games_xheaders
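To see the difference between the two cases without needing the data files, here is a minimal sketch that reads small in-memory CSV strings instead. The sample rows and team names are made up for illustration and are not taken from the football results file.

```python
import io
import pandas as pd

# Hypothetical sample data, invented for this illustration only
csv_with_header = (
    "date,home_team,away_team\n"
    "2020-01-01,Alpha,Beta\n"
    "2020-01-02,Gamma,Delta\n"
)
csv_no_header = (
    "2020-01-01,Alpha,Beta\n"
    "2020-01-02,Gamma,Delta\n"
)

# Case 1: the first line holds the column names, so read_csv uses it as the header
df_with = pd.read_csv(io.StringIO(csv_with_header))

# Case 2: no header line, so we supply the column names ourselves
df_without = pd.read_csv(
    io.StringIO(csv_no_header),
    names=["date", "home_team", "away_team"],
)

# Pitfall: if names= is omitted for a headerless file, the first data row
# is consumed as the header and one record is lost
df_wrong = pd.read_csv(io.StringIO(csv_no_header))

print(df_with.columns.tolist())
print(df_without.shape)
print(df_wrong.columns.tolist())
```

The last dataframe shows why it matters to know whether your file has a header line: the first match ends up as column names rather than as data.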
Walkthrough of code – Read CSV into Pandas Dataframes and set index columns
The next 2 examples look at how we index a dataframe based on the records in the CSV file. An index is a set of labels that lets you look up records in a data table or dataframe quickly and efficiently. Think of it as something like the table of contents in a book. Because it serves as a reference, it makes sense to choose columns with unique data values.
- Pandas Dataframe reads a CSV file with headers and we indicate which column will be used as an index for the Dataframe.
- Pandas Dataframe reads a CSV file with headers and we indicate which 2 columns will be used as an index for the Dataframe.
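As a rough illustration of what an index gives you, the sketch below sets one column as the index and then looks rows up by label with `.loc`. The data is a made-up in-memory sample, not the real results file. Note that Pandas does not require index values to be unique: a repeated label simply returns every matching row.

```python
import io
import pandas as pd

# Hypothetical sample data, invented for this illustration only
sample_csv = (
    "date,home_team,away_team,home_score,away_score\n"
    "2019-06-01,Alpha,Beta,2,1\n"
    "2019-06-08,Gamma,Alpha,0,0\n"
    "2019-06-15,Alpha,Gamma,3,2\n"
)

# use the home_team column as the index
df_sample = pd.read_csv(io.StringIO(sample_csv), index_col="home_team")

# look up all rows whose index label is "Alpha" (two home games here)
alpha_home = df_sample.loc[["Alpha"]]

# the index contains repeated labels, so it is not unique
print(df_sample.index.is_unique)
print(alpha_home)
```

Passing a list of labels to `.loc` always returns a dataframe, which keeps the result type predictable whether a label matches one row or several.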
Example Code – Read CSV into Pandas Dataframes and set index columns
Read CSV with headers into Pandas DataFrame and state index column
# use the file referenced in fileloc1 and use column home_team as the index
df_games_oneindex = pd.read_csv(fileloc1, index_col='home_team')

# display the dataframe
df_games_oneindex
Read CSV with headers into Pandas DataFrame and state 2 columns as an index
# use the file referenced in fileloc1 and use columns home_team and away_team as the index
df_games_twoindex = pd.read_csv(fileloc1, index_col=['home_team', 'away_team'])

# display the dataframe
df_games_twoindex
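With two index columns, Pandas builds a hierarchical index (a `MultiIndex`). The following sketch, again using made-up in-memory data rather than the real file, shows one way such an index might be queried: by the first level only, or by a full (home_team, away_team) pair. Sorting the index first is an optional step added here to keep lookups efficient.

```python
import io
import pandas as pd

# Hypothetical sample data, invented for this illustration only
sample_csv = (
    "date,home_team,away_team,home_score,away_score\n"
    "2019-06-01,Alpha,Beta,2,1\n"
    "2019-06-08,Gamma,Alpha,0,0\n"
    "2019-06-15,Alpha,Gamma,3,2\n"
)

# use home_team and away_team together as a two-level index
df_pairs = pd.read_csv(
    io.StringIO(sample_csv),
    index_col=["home_team", "away_team"],
).sort_index()

# select on the first level only: all home games of "Alpha",
# returned with away_team as the remaining index
alpha_rows = df_pairs.loc["Alpha"]

# select one specific pairing by giving both levels as a tuple
alpha_vs_gamma = df_pairs.loc[[("Alpha", "Gamma")]]

print(alpha_rows)
print(alpha_vs_gamma)

# reset_index() turns the index levels back into ordinary columns
df_flat = df_pairs.reset_index()
```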