Pandas DateTime

Pandas is a robust, open-source Python library for manipulating and analysing data. It provides data structures and functions that allow for efficient data processing.

While working with data, it is common to come across time series data. Pandas provides a wide range of date and time processing options, which makes it a handy tool for this kind of data. Many data science tasks require working with time series, time zones, and date arithmetic, and pandas simplifies these processes.

Features of Pandas DateTime:

  • Parsing dates and times: Pandas can parse dates and times from strings with the to_datetime() function, and while loading a file through the parse_dates parameter of read_csv(). These can handle a variety of date and time formats, including Unix timestamps and human-readable formats.

  • Manipulating dates and times: Pandas provides a number of functions for manipulating dates and times, including shift(), resample(), and to_timedelta(). These can be used to shift a time series by a number of periods, change its frequency, and convert values to time deltas for date arithmetic. Subtracting one date from another gives the difference between them as a time delta.

  • Visualizing dates and times: Pandas provides plotting methods such as plot(), hist(), and plot.bar(), which are built on Matplotlib. These can be used to create line charts, histograms, and bar charts of date and time data.
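The parsing and manipulation features listed above can be sketched roughly as follows. The date strings and series values are made up for illustration, and the day-first format string is an assumption about how the input is written.

```python
import pandas as pd

# Parse day-first strings; the explicit format avoids ambiguity.
parsed = pd.to_datetime(["05/05/2024", "06/05/2024"], format="%d/%m/%Y")

# Subtracting two Timestamps yields a Timedelta.
gap = parsed[1] - parsed[0]

# A small daily series to manipulate (illustrative values).
series = pd.Series(
    [10, 20, 30, 40],
    index=pd.date_range("2024-05-05", periods=4, freq="D"),
)

# shift() moves the values down by one row; the index stays in place.
shifted = series.shift(1)

# resample() regroups the series into 2-day bins and sums each bin.
two_day_totals = series.resample("2D").sum()

print(parsed)
print(gap)
print(shifted)
print(two_day_totals)
```

After shift(1) the first row has no value to receive, so it holds NaN.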

Installation of libraries

pip install pandas

  • Note: There is no need to install a separate library for date and time operations; the pandas module has built-in functions for them.

Example for retrieving day, month and year from given date:

import pandas as pd

ts = pd.Timestamp('2024-05-05')
y = ts.year
print('Year is: ', y)
m = ts.month
print('Month is: ', m)
d = ts.day
print('Day is: ', d)

Output:

Year is:  2024
Month is:  5
Day is:  5

  • Note: pd.Timestamp is the pandas equivalent of Python's datetime.datetime and represents a single point in time. To get a Unix timestamp (the number of seconds since 1970-01-01 00:00:00 UTC), call the timestamp() method on a Timestamp object.
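For converting between a Timestamp and Unix epoch seconds, a minimal sketch could look like this. It assumes a timezone-naive Timestamp, which pandas interprets as UTC when computing the epoch value.

```python
import pandas as pd

ts = pd.Timestamp("2024-05-05")

# timestamp() returns the seconds since the Unix epoch as a float.
# A timezone-naive Timestamp is treated as UTC here.
epoch_seconds = ts.timestamp()

# pd.to_datetime with unit="s" converts epoch seconds back to a Timestamp.
round_trip = pd.to_datetime(epoch_seconds, unit="s")

print(epoch_seconds)
print(round_trip)
```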

Example for extracting time related data from given date:

import pandas as pd

ts = pd.Timestamp('2024-10-24 12:00:00')
print('Hour is: ', ts.hour)
print('Minute is: ', ts.minute)
print('Weekday is: ', ts.weekday())
print('Quarter is: ', ts.quarter)

Output:

Hour is:  12
Minute is:  0
Weekday is:  3
Quarter is:  4
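Note that weekday() numbers the days from Monday = 0 to Sunday = 6. When a readable label is more useful than a number, the day_name() and month_name() methods can be used, as in this short sketch:

```python
import pandas as pd

ts = pd.Timestamp("2024-10-24 12:00:00")

# weekday() is zero-based and starts on Monday.
weekday_number = ts.weekday()

# day_name() and month_name() return English names by default.
weekday_name = ts.day_name()
month_name = ts.month_name()

print(weekday_number, weekday_name, month_name)
```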

Example for getting current date and time:

import pandas as pd

ts = pd.Timestamp.now()
print('Current date and time is: ', ts)

Output:

Current date and time is:  2024-05-25 11:48:25.593213

Example for generating dates for the next five days:

import pandas as pd

ts = pd.date_range(start = pd.Timestamp.now(), periods = 5)
for i in ts:
    print(i.date())

Output:

2024-05-25
2024-05-26
2024-05-27
2024-05-28
2024-05-29

Example for generating dates for the previous five days:

import pandas as pd

ts = pd.date_range(end = pd.Timestamp.now(), periods = 5)
for i in ts:
    print(i.date())

Output:

2024-05-21
2024-05-22
2024-05-23
2024-05-24
2024-05-25
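date_range() also accepts a freq argument that controls the spacing between dates. The sketch below uses a fixed start date rather than Timestamp.now() so that the result is reproducible; the choice of 2024-05-25 (a Saturday) is only for illustration.

```python
import pandas as pd

start = pd.Timestamp("2024-05-25")  # fixed date (a Saturday) for reproducibility

# Calendar days going forward and backward from the same date.
next_five = pd.date_range(start=start, periods=5, freq="D")
previous_five = pd.date_range(end=start, periods=5, freq="D")

# freq="B" yields business days (Monday to Friday), so the weekend is skipped.
# Public holidays are not taken into account.
business_days = pd.date_range(start=start, periods=3, freq="B")

for day in business_days:
    print(day.date(), day.day_name())
```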

Pandas DateTime is more efficient than the built-in datetime library in several respects:

  • In pandas, you can add a time delta to an entire column of dates in a single vectorized operation, whereas Python's datetime requires a loop.

Example using Pandas DateTime:

import pandas as pd

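# freq='min' means one-minute steps (the older alias 'T' is deprecated in recent pandas versions)
dates = pd.DataFrame(pd.date_range('2023-01-01', periods=100000, freq='min'))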
dates += pd.Timedelta(days=1)
print(dates)

Output:

                    0
0     2023-01-02 00:00:00
1     2023-01-02 00:01:00
2     2023-01-02 00:02:00
3     2023-01-02 00:03:00
4     2023-01-02 00:04:00
...                   ...
99995 2023-03-12 10:35:00
99996 2023-03-12 10:36:00
99997 2023-03-12 10:37:00
99998 2023-03-12 10:38:00
99999 2023-03-12 10:39:00

Example using Built-In datetime library:

from datetime import datetime, timedelta

dates = [datetime(2023, 1, 1) + timedelta(minutes=i) for i in range(100000)]
dates = [date + timedelta(days=1) for date in dates]

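Output: The result is a Python list of 100000 datetime objects. It is too long to print in full here, and building it takes longer than the pandas version because each addition runs in a Python-level loop.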
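As a quick check that the two approaches are interchangeable, the following sketch compares their results on a smaller input. The size of 1000 is an arbitrary choice to keep the run short; actual timings depend on the machine and are not shown.

```python
from datetime import datetime, timedelta

import pandas as pd

n = 1000  # smaller than the 100000 used above so the comparison runs quickly

# Vectorized: one addition applied to the whole DatetimeIndex.
pandas_dates = pd.date_range("2023-01-01", periods=n, freq="min") + pd.Timedelta(days=1)

# Loop: one addition per element.
python_dates = [
    datetime(2023, 1, 1) + timedelta(minutes=i) + timedelta(days=1)
    for i in range(n)
]

# Convert the pandas result to plain datetime objects and compare.
same = list(pandas_dates.to_pydatetime()) == python_dates
print(same)
```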

  • Pandas stores datetime data using NumPy's datetime64 dtype, which represents each value as a 64-bit integer (8 bytes per date), so a column of dates is held compactly in one contiguous array.

  • Each datetime object in Python takes up extra memory since it contains not only the date and time but also the additional metadata and overhead associated with Python objects.

  • Pandas offers a wide range of convenient functions and methods for date manipulation, extraction, and conversion, such as pd.to_datetime(), pd.date_range(), pd.timedelta_range(), and more.

  • The built-in datetime library requires manual implementation for many of these operations, leading to longer and less efficient code.
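The memory difference described above can be inspected with a short sketch like the one below. The exact size of a Python datetime object depends on the interpreter and platform, so no figure is given for it here.

```python
import sys
from datetime import datetime

import pandas as pd

n = 1000
index = pd.date_range("2023-01-01", periods=n, freq="min")

# Each datetime64 value is a 64-bit integer, so the array uses 8 bytes per date.
array_bytes = index.values.nbytes

# A single Python datetime object carries object overhead on top of its payload.
object_bytes = sys.getsizeof(datetime(2023, 1, 1))

print(index.dtype)
print(array_bytes, "bytes for", n, "dates in pandas")
print(object_bytes, "bytes for one Python datetime object")
```

A list of datetime objects also stores a pointer per element, which adds to the per-object size reported here.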