Introduction
This article explains how to handle DateTime values in Pandas. When a file is imported into a DataFrame, a column that is supposed to hold DateTime values is usually read in as strings, which Pandas reports as the object dtype. DateTime operations cannot be performed on that data type; that is where the Pandas ‘to_datetime’ function helps, along with a few other functions. In this article, we will cover
- ‘to_datetime’ function
- Date Comparisons
- Date formatting
- ‘date_range’ function
Setup
In this article, we use the same dataset I have used in my other Pandas articles: a Kaggle dataset that provides YouTube trending video statistics. URL: https://www.kaggle.com/datasnaek/youtube-new and the file we are using is ‘USvideos.csv’.
import pandas as pd
df = pd.read_csv('USvideos.csv')
df.columns
The df.columns call lists the columns of the dataset.
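To see the dtype problem without downloading the file, here is a minimal sketch that reads a tiny inline CSV. The two rows, the column subset, and the timestamp format are assumptions made up for illustration, not an excerpt of the real file.

```python
import io

import pandas as pd

# Hypothetical two-row sample; the real file has many more rows and columns.
csv_text = (
    "video_id,publish_time,views\n"
    "abc123,2017-11-13T17:13:01.000Z,748374\n"
    "def456,2017-11-12T05:37:17.000Z,2418783\n"
)

sample = pd.read_csv(io.StringIO(csv_text))

# The date column is read as text, not as a datetime dtype.
print(sample.dtypes)
print(pd.api.types.is_datetime64_any_dtype(sample["publish_time"]))  # False
```

Without extra arguments, read_csv does not try to detect dates, which is why the conversion step described in the next section is needed.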
to_datetime function
The ‘to_datetime’ function converts its argument to a DateTime.
Converting String to Datetime
pd.to_datetime('11-13-2017 17:13:01') # returns Timestamp('2017-11-13 17:13:01')
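If the argument cannot be parsed as a DateTime, an error is raised. The exact class depends on the Pandas version (older versions surface dateutil's ‘ParserError’), but it is a subclass of ValueError,
pd.to_datetime('Test')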
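As a rough sketch of how unparseable values might be handled, the snippet below shows two options: catching the error, or passing errors="coerce" so that bad values become NaT. The variable names and the two-element series are illustrative assumptions.

```python
import pandas as pd

# Option 1: catch the failure explicitly. ValueError covers the
# version-specific parse errors, which are subclasses of it.
try:
    pd.to_datetime("Test")
    parsed_ok = True
except ValueError as exc:
    parsed_ok = False
    print(f"Could not parse: {exc}")

# Option 2: let pandas replace unparseable entries with NaT (not-a-time).
mixed = pd.Series(["2017-11-13 17:13:01", "Test"])
converted = pd.to_datetime(mixed, errors="coerce")
print(converted.isna().tolist())  # [False, True]
```

Coercing is convenient for messy columns, but it silently hides bad rows, so it is worth counting the NaT values afterwards.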
Looking into the real dataset, the ‘publish_time’ column is of type object.
df['publish_time']
On passing the ‘publish_time’ column to the to_datetime function, the data type changes to datetime.
pd.to_datetime(df['publish_time'])
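Note that to_datetime returns a new Series, so the result has to be assigned back to keep it. A minimal sketch on a made-up two-row frame follows; the timestamp format with a trailing ‘Z’ is an assumption about what the column looks like.

```python
import pandas as pd

# Hypothetical stand-in for df; only the column of interest is included.
sample = pd.DataFrame(
    {"publish_time": ["2017-11-13T17:13:01.000Z", "2017-11-12T05:37:17.000Z"]}
)

# Assign the converted Series back so the DataFrame keeps the new dtype.
sample["publish_time"] = pd.to_datetime(sample["publish_time"])

# The trailing 'Z' marks UTC, so the resulting dtype is timezone-aware.
print(sample["publish_time"].dtype)
print(sample["publish_time"].dt.hour.tolist())  # [17, 5]
```

Once the column has a datetime dtype, the .dt accessor exposes parts such as year, month, and hour.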
To convert a UNIX timestamp to a datetime, use the to_datetime function again with unit='s', where 's' stands for seconds
pd.to_datetime(1749321105, unit="s") # Timestamp('2025-06-07 18:31:45')
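The same unit argument also works on a whole column and with other resolutions. The sketch below uses made-up epoch values to illustrate seconds and milliseconds.

```python
import pandas as pd

# Epoch values in seconds; 0 is 1970-01-01 00:00:00 and 86400 is one day later.
epoch_seconds = pd.Series([0, 86400, 1749321105])
as_datetime = pd.to_datetime(epoch_seconds, unit="s")
print(as_datetime.tolist())

# The same instant expressed in milliseconds needs unit="ms".
same_instant = pd.to_datetime(1749321105000, unit="ms")
print(same_instant)  # 2025-06-07 18:31:45
```

Passing the wrong unit does not raise an error for values in range; it just yields a very different date, so it helps to sanity-check one known value.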
DateTime comparison
Since the to_datetime function converts the argument to Timestamp, we can compare two dates easily after conversion.
datetimeOne = pd.to_datetime('2021-12-31 14:30:00')
datetimeTwo = pd.to_datetime('2021-12-31 14:32:02')
datetimeOne > datetimeTwo # False
datetimeOne < datetimeTwo # True
The same comparison can be used in a conditional,
if datetimeOne > datetimeTwo:
    print('DateOne is greater')
else:
    print('DateTwo is greater') # DateTwo is greater
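Comparisons also work element-wise on a whole column, and subtracting two Timestamps gives a Timedelta. Here is a small sketch; the three sample times and the snake_case variable names are assumptions chosen for illustration.

```python
import pandas as pd

datetime_one = pd.to_datetime("2021-12-31 14:30:00")
datetime_two = pd.to_datetime("2021-12-31 14:32:02")

# Subtracting two Timestamps yields a Timedelta.
gap = datetime_two - datetime_one
print(gap)                  # 0 days 00:02:02
print(gap.total_seconds())  # 122.0

# Comparing a Series against a Timestamp yields a boolean mask for filtering.
times = pd.Series(
    pd.to_datetime(
        ["2021-12-30 09:00:00", "2021-12-31 14:31:00", "2022-01-01 00:00:00"]
    )
)
after_one = times[times > datetime_one]
print(len(after_one))  # 2
```

The same boolean-mask pattern can be applied to a DataFrame to keep only the rows inside a date window.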
DateTime Formatting
For formatting, a datetime Series in Pandas offers the “pandas.Series.dt.strftime” function. The strftime function accepts a format string such as ‘%m-%d-%Y %H:%M:%S’.
df['publish_time'] = pd.to_datetime(df['publish_time']).dt.strftime('%m-%d-%Y %H:%M:%S')
df.head()
This formats publish_time as Month-Day-Year Hour:Minute:Second. Note that strftime returns strings, so the column is no longer a datetime column after this step.
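A minimal sketch of the formatting step on two made-up timestamps, including how the strings might be parsed back when date operations are needed again; the explicit format argument on the way back is a choice made here so that month and day are not confused.

```python
import pandas as pd

times = pd.Series(pd.to_datetime(["2017-11-13 17:13:01", "2018-01-05 08:00:00"]))

# strftime produces strings in the requested layout.
formatted = times.dt.strftime("%m-%d-%Y %H:%M:%S")
print(formatted.tolist())  # ['11-13-2017 17:13:01', '01-05-2018 08:00:00']

# Parse the strings back with the same format to recover datetime values.
round_trip = pd.to_datetime(formatted, format="%m-%d-%Y %H:%M:%S")
print((round_trip == times).all())  # True
```

A common pattern is to keep the datetime column as it is and write the formatted strings to a separate column used only for display.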
date_range
The ‘date_range’ function generates a DatetimeIndex of evenly spaced dates, with the default frequency being one day.
pd.date_range("12-31-2021", periods=10)
The frequency can be changed from DAY to MONTH with the freq argument.
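As a sketch of the difference between the two frequencies, the snippet below builds a daily and a monthly range. It uses the "MS" (month start) alias as an assumption about what is wanted; the month-end alias is spelled "M" in older Pandas versions and "ME" from version 2.2, so the month-start alias is used here to avoid version differences.

```python
import pandas as pd

# Default frequency: one entry per calendar day.
daily = pd.date_range("12-31-2021", periods=10)
print(daily[0], daily[-1])  # 2021-12-31 00:00:00 2022-01-09 00:00:00

# Month-start frequency: one entry on the first day of each month.
monthly = pd.date_range("2022-01-01", periods=4, freq="MS")
print(monthly.strftime("%Y-%m-%d").tolist())
# ['2022-01-01', '2022-02-01', '2022-03-01', '2022-04-01']
```

Both calls return a DatetimeIndex, which can be used directly as a DataFrame index or wrapped in a Series.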
The ‘bdate_range’ function generates business days only, so weekends are excluded.
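A short sketch of bdate_range starting on Friday, 31 December 2021, to show the weekend being skipped. The start date and period count are arbitrary choices for illustration.

```python
import pandas as pd

# 2021-12-31 is a Friday, so the next business day is Monday 2022-01-03.
business_days = pd.bdate_range("2021-12-31", periods=5)
print(business_days.strftime("%Y-%m-%d").tolist())
# ['2021-12-31', '2022-01-03', '2022-01-04', '2022-01-05', '2022-01-06']

# dayofweek is 0 for Monday through 6 for Sunday; business days are all below 5.
print((business_days.dayofweek < 5).all())  # True
```

With the default settings only Saturdays and Sundays are skipped; public holidays are still included.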
Summary
As we can see, working with datetime in Pandas is quite simple, although I believe the documentation is sparse, which can create some confusion. The beauty of Pandas is that other Python libraries tend to work well with it; ‘Pendulum’ is another date library that can be used alongside it.