Handling DateTime In Pandas

Introduction

This article explains how to handle DateTime values in Pandas. When a file is imported into a DataFrame, a column that is supposed to hold DateTime values is usually read as Strings and stored with the object dtype. DateTime operations cannot be performed on such a column until it is converted, and that is where the Pandas ‘to_datetime’ function helps, along with a few related functions. In this article, we will cover

  1. ‘to_datetime’ function
  2. Date Comparisons
  3. Date formatting
  4. ‘date_range’ function

Setup

In this article, we use the same dataset as in my other Pandas articles: a Kaggle dataset that provides YouTube video trending statistics, URL: https://www.kaggle.com/datasnaek/youtube-new. The file we are using is ‘USvideos.csv’.

import pandas as pd

df = pd.read_csv('USvideos.csv')
df.columns

The call to df.columns lists the column names of the dataset.


to_datetime function

The ‘to_datetime’ function converts its argument to a DateTime. For example, converting a String to a DateTime:

pd.to_datetime('11-13-2017 17:13:01') # returns, Timestamp('2017-11-13 17:13:01')
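When the layout of the String is known in advance, it can be passed explicitly through the ‘format’ parameter instead of relying on inference. The following is a minimal sketch; the day-first input String is an assumption made up for illustration and does not come from the dataset.

```python
import pandas as pd

# Hypothetical day-first input; the explicit format removes any
# ambiguity between day and month.
ts = pd.to_datetime('13-11-2017 17:13:01', format='%d-%m-%Y %H:%M:%S')
print(ts)  # 2017-11-13 17:13:01
```

An explicit format is also useful for a value such as '11-12-2017', which could be read as either November 12 or December 11.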

If the argument cannot be parsed as a DateTime, a ‘ValueError’ is raised (depending on the Pandas version, this may be a subclass such as dateutil's ‘ParserError’),

pd.to_datetime('Test')
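A parsing failure can be handled in two ways: by catching the exception, or by asking to_datetime to return NaT (Not a Time) for unparseable values through errors='coerce'. The sketch below assumes only that the raised exception is a ‘ValueError’ or one of its subclasses.

```python
import pandas as pd

# Option 1: catch the exception raised for an unparseable String.
try:
    pd.to_datetime('Test')
except ValueError as exc:
    print(f'Could not parse: {exc}')

# Option 2: coerce unparseable values to NaT instead of raising.
safe = pd.to_datetime('Test', errors='coerce')
print(pd.isna(safe))  # True
```

Coercion is convenient for whole columns, because a few bad rows then show up as missing values rather than stopping the conversion.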

Looking into the real dataset, the ‘publish_time’ column is of type object.

df['publish_time']

On passing the ‘publish_time’ column to the ‘to_datetime’ function, the datatype changes to datetime64.

pd.to_datetime(df['publish_time'])
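Once a column has a datetime dtype, its parts become available through the ‘.dt’ accessor. The sketch below uses a small stand-in DataFrame so that it runs without the CSV file; the two sample values and their ISO 8601 layout are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample rows standing in for the 'publish_time' column.
sample = pd.DataFrame({
    'publish_time': ['2017-11-13T17:13:01.000Z', '2017-11-14T07:30:00.000Z']
})

# Assign the converted values back so the DataFrame keeps the new dtype.
sample['publish_time'] = pd.to_datetime(sample['publish_time'])

print(pd.api.types.is_datetime64_any_dtype(sample['publish_time']))  # True
print(sample['publish_time'].dt.year.tolist())  # [2017, 2017]
print(sample['publish_time'].dt.hour.tolist())  # [17, 7]
```

Note that to_datetime returns a new Series, so the result has to be assigned back to the column for the change to persist.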

To convert a UNIX timestamp to a DateTime, use the to_datetime function with unit='s', where ‘s’ stands for seconds.

pd.to_datetime(1749321105, unit="s") # Timestamp('2025-06-07 18:31:45')
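The same conversion works on a whole Series, and the ‘unit’ parameter also accepts other resolutions such as 'ms' for milliseconds. A short sketch, where the second timestamp in the Series is a made-up value one minute later:

```python
import pandas as pd

# Convert a Series of UNIX timestamps given in seconds.
seconds = pd.Series([1749321105, 1749321165])
converted = pd.to_datetime(seconds, unit='s')
print(converted.iloc[0])  # 2025-06-07 18:31:45

# The same instant expressed in milliseconds needs unit='ms'.
in_ms = pd.to_datetime(1749321105000, unit='ms')
print(in_ms == pd.to_datetime(1749321105, unit='s'))  # True
```

Choosing the wrong unit does not raise an error in most cases; it silently produces a date far from the intended one, so it is worth checking one converted value by hand.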

DateTime comparison

Since the to_datetime function converts the argument to Timestamp, we can compare two dates easily after conversion.

datetimeOne = pd.to_datetime('2021-12-31 14:30:00')
datetimeTwo = pd.to_datetime('2021-12-31 14:32:02')
datetimeOne > datetimeTwo # False
datetimeOne < datetimeTwo # True

We can also use the comparison in a conditional statement:

if datetimeOne > datetimeTwo:
    print('DateOne is greater')
else:
    print('DateTwo is greater') # DateTwo is greater
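Comparisons also work element-wise on a Series, which makes them useful for filtering rows, and subtracting two Timestamps gives the gap between them as a Timedelta. A minimal sketch; the three values in ‘dates’ are invented for illustration.

```python
import pandas as pd

datetimeOne = pd.to_datetime('2021-12-31 14:30:00')
datetimeTwo = pd.to_datetime('2021-12-31 14:32:02')

# Subtracting two Timestamps yields a Timedelta.
gap = datetimeTwo - datetimeOne
print(gap.total_seconds())  # 122.0

# Comparing a Series with a Timestamp yields a boolean mask for filtering.
dates = pd.Series(pd.to_datetime([
    '2021-12-30 09:00:00', '2021-12-31 14:31:00', '2022-01-01 08:00:00'
]))
after_one = dates[dates > datetimeOne]
print(len(after_one))  # 2
```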

DateTime Formatting

For formatting, the ‘.dt’ accessor of a Pandas Series provides the function “pandas.Series.dt.strftime”. The strftime function accepts a format String such as ‘%m-%d-%Y %H:%M:%S’.

df['publish_time'] = pd.to_datetime(df['publish_time']).dt.strftime('%m-%d-%Y %H:%M:%S')
df.head()

This formats publish_time as Month-Day-Year Hour:Minute:Second. Note that strftime returns Strings, so the column no longer has a DateTime type after this step.
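Because strftime produces Strings, one option is to keep the DateTime values and store the formatted text separately. The sketch below shows the effect on a small Series; the sample values are assumptions for illustration.

```python
import pandas as pd

times = pd.Series(pd.to_datetime(['2017-11-13 17:13:01', '2017-11-14 07:30:00']))

# strftime returns text, not datetime values.
formatted = times.dt.strftime('%m-%d-%Y %H:%M:%S')
print(formatted.iloc[0])  # 11-13-2017 17:13:01
print(type(formatted.iloc[0]).__name__)  # str

# The original Series is untouched and still supports the .dt accessor.
print(times.dt.month.tolist())  # [11, 11]
```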

date_range

The date_range function generates a DatetimeIndex of dates at a fixed frequency, the default frequency being DAY.

pd.date_range("12-31-2021", periods=10)

The frequency can be changed from DAY to MONTH with the ‘freq’ parameter.
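A monthly range might look like the sketch below. The alias 'MS' (month start) is an assumption chosen here because it behaves the same across Pandas versions; the month-end alias is spelled differently in older and newer releases.

```python
import pandas as pd

# Month-start frequency: the first date is the first month start
# on or after the given start date.
monthly = pd.date_range('12-31-2021', periods=3, freq='MS')
print(monthly.strftime('%Y-%m-%d').tolist())
# ['2022-01-01', '2022-02-01', '2022-03-01']
```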

The ‘bdate_range’ function generates business days only, so it excludes the weekends.
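As a short sketch of this behaviour, the range below starts on Friday, December 31, 2021 and skips the following Saturday and Sunday:

```python
import pandas as pd

# Five business days starting from a Friday.
business_days = pd.bdate_range('12-31-2021', periods=5)
print(business_days.strftime('%Y-%m-%d').tolist())
# ['2021-12-31', '2022-01-03', '2022-01-04', '2022-01-05', '2022-01-06']
```

Note that bdate_range skips weekends only; public holidays such as January 3 substitutes are not excluded by default.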

Summary

As we can see, working with datetime in Pandas is simple, although I believe the documentation is sparse, which can create some confusion. Pandas also works well alongside other Python libraries; for example, the ‘Pendulum’ library can be used together with it.

