A common challenge when dealing with data is the treatement of date values. We look at how to:
- Get Pandas Dataframes to recognise loaded data values as dates
- Change dates in Pandas DataFrames to date data types
Use case examples – Change dates in Pandas DataFrames to date time data type
We need Pandas Dataframes to recognise dates as a datetime data type so that we can:
- filter records by date ranges
- sort records by dates (for instance, in chronological order)
- perform functions that allow us to group or summarise records by time periods (for example, year, quarter, month, week)
Walkthrough of code
In this set of code, we:
- load the CSV dataset into a Pandas DataFrame into a dataframe called df_games_src
- list column and data types information about the dataframe using the statements df_games_src.columns and df_games_src.dtypes
- list first 5 rows in the dataframe with the head() statement
- get Pandas dataframe to recognise the date column field in the dataframe df_games_src as a date using 2 methods:
- to_datetime method from the datetime library
- astype statement
Example Code – Defining DataFrame column fields as date data types
Load the csv data into a data frame and get dataframe details
import pandas as pd #source files to read historical international games df_games_src = pd.read_csv("results.csv") print("df_games Column headings:") print(df_games_src.columns) print("df_games Data types:") print(df_games_src.dtypes)
Get Pandas DataFrames to recognise loaded data values as dates
The information above shows that the date column of our loaded csv data is recognised as the object data type.
To conduct any date operations on this column, we need to change the Pandas DataFrame column data type to a datetime object.
#observe the date values of the dataframe df_games_src.head()
Looking at the date values of the date column, we see that they are in the format day / month / year.
There are 2 ways we can do it:
We can use the datetime Python library to convert the date column to a datetime data type.
import datetime df_games_src['date'] = pd.to_datetime(df_games_src['date'], format='%d/%m/%Y')
We can also use the method .astype(‘datetime64[ns]’) to convert it to date.
df_games_src['date'] = df_games_src['date'].astype('datetime64[ns]')
Notice the change in the data type of the date column from object to datetime64[ns].