# Working with Date & Time in Pandas

Real-world datasets often contain date and time values. Pandas handles such data well and provides a wide range of options for processing it.

- **Parsing dates and times**: Pandas can parse dates and times from strings with `to_datetime()` and with the `parse_dates` parameter of `read_csv()`. These tools handle a variety of date and time formats, including Unix timestamps and human-readable strings.

- **Manipulating dates and times**: Pandas provides functions such as `shift()`, `resample()`, and `to_timedelta()` for manipulating dates and times. They can be used to add or subtract time periods, change the frequency of a time series, and work with the difference between two dates or times.

- **Visualizing dates and times**: Pandas plotting methods such as `plot()`, `hist()`, and `plot.bar()` can be used to create line charts, histograms, and bar charts of date and time data.

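
The parsing and manipulation functions listed above can be sketched as follows. This is a minimal illustration rather than a complete tour: the sample dates, the Unix timestamps, and the series values are all made up for the example.

```python
import pandas as pd

# Parse human-readable strings (assumed sample values, day/month/year order).
parsed = pd.to_datetime(['05/05/2024', '06/05/2024'], format='%d/%m/%Y')

# Parse Unix timestamps given in seconds since 1970-01-01 UTC.
from_unix = pd.to_datetime([1714910400, 1714996800], unit='s')

# A small daily series with assumed values.
index = pd.date_range('2024-05-01', periods=4, freq='D')
series = pd.Series([10, 20, 30, 40], index=index)

shifted = series.shift(1)               # each value moves one row later
two_day = series.resample('2D').sum()   # regroup into 2-day bins
delta = parsed[1] - parsed[0]           # subtracting dates gives a Timedelta

print(from_unix[0])
print(two_day.tolist())
print(delta == pd.to_timedelta('1 day'))
```

Passing an explicit `format` to `to_datetime()` avoids ambiguity between day-first and month-first strings.
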
### `Timestamp` function

`pd.Timestamp` is the pandas counterpart of Python's `datetime.datetime` and represents a single point in time. Its fields, such as year, month, and day, are available as attributes, and its `timestamp()` method returns the corresponding Unix timestamp, a numerical representation of the datetime.

Example for retrieving the day, month and year from a given date:

```python
import pandas as pd

# Build a Timestamp from an ISO-formatted date string
ts = pd.Timestamp('2024-05-05')
y = ts.year
print('Year is:', y)
m = ts.month
print('Month is:', m)
d = ts.day
print('Day is:', d)
```

Output:

```python
Year is: 2024
Month is: 5
Day is: 5
```

Example for extracting time-related data from a given date:

```python
import pandas as pd

ts = pd.Timestamp('2024-10-24 12:00:00')
print('Hour is:', ts.hour)
print('Minute is:', ts.minute)
# weekday() counts from Monday = 0, so Thursday is 3
print('Weekday is:', ts.weekday())
print('Quarter is:', ts.quarter)
```

Output:

```python
Hour is: 12
Minute is: 0
Weekday is: 3
Quarter is: 4
```

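The relationship between a `Timestamp` and a Unix timestamp can be sketched as follows. This is a minimal illustration, assuming a naive (timezone-free) sample value, which pandas treats as UTC when `timestamp()` is called; the variable names are made up for the example.

```python
import pandas as pd

# Assumed sample value with no timezone attached.
ts = pd.Timestamp('2024-05-05 12:00:00')

# Seconds since 1970-01-01 00:00:00 UTC, returned as a float.
unix_seconds = ts.timestamp()
print(unix_seconds)

# Convert the number back into a Timestamp.
restored = pd.to_datetime(unix_seconds, unit='s')
print(restored == ts)
```

Note that Python's built-in `datetime.timestamp()` interprets a naive value as local time instead, so the two libraries can disagree for timezone-free inputs.
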
### `Timestamp.now()`

Example for getting the current date and time:

```python
import pandas as pd

ts = pd.Timestamp.now()
print('Current date and time is:', ts)
```

Output:
```python
Current date and time is: 2024-05-25 11:48:25.593213
```

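By default `Timestamp.now()` returns a naive timestamp in the machine's local time. The sketch below shows one way to request a timezone-aware value instead; the choice of `'UTC'` and the variable names are assumptions made for illustration.

```python
import pandas as pd

# Naive timestamp in local time (no timezone attached).
local_now = pd.Timestamp.now()

# Timezone-aware timestamp; 'UTC' is chosen here only as an example.
utc_now = pd.Timestamp.now(tz='UTC')

print(local_now.tzinfo)   # None
print(utc_now.tzinfo)

# normalize() resets the time part to midnight, keeping only the date.
print(utc_now.normalize())
```

Aware timestamps are usually the safer choice when data from several time zones is combined.
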
### `date_range` function

Example for generating five consecutive dates starting from today:

```python
import pandas as pd

# Five daily timestamps beginning at the current moment
ts = pd.date_range(start = pd.Timestamp.now(), periods = 5)
for i in ts:
    print(i.date())
```

Output:

```python
2024-05-25
2024-05-26
2024-05-27
2024-05-28
2024-05-29
```

Example for generating five consecutive dates ending today:

```python
import pandas as pd

# Five daily timestamps ending at the current moment
ts = pd.date_range(end = pd.Timestamp.now(), periods = 5)
for i in ts:
    print(i.date())
```

Output:
```python
2024-05-21
2024-05-22
2024-05-23
2024-05-24
2024-05-25
```

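`date_range` also accepts a `freq` argument that controls the spacing between dates. The sketch below is a small illustration, assuming a fixed start date so the result is reproducible; the variable names are made up for the example.

```python
import pandas as pd

# A fixed start date (assumed for illustration); 2024-05-25 is a Saturday.
start = '2024-05-25'

# Business days only: weekends are skipped.
business_days = pd.date_range(start=start, periods=5, freq='B')

# One date per week, anchored on Mondays.
mondays = pd.date_range(start=start, periods=3, freq='W-MON')

for day in business_days:
    print(day.date())

for day in mondays:
    print(day.date())
```

Because the start date falls on a weekend, both ranges begin on the following Monday, 2024-05-27.
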
### Built-in vs pandas date & time operations

In `pandas`, you can add a time delta to an entire column of dates in a single vectorized operation, whereas Python's built-in `datetime` requires an explicit loop or comprehension.

Example in Pandas:

```python
import pandas as pd

# 100,000 timestamps at one-minute intervals ('min' replaces the deprecated 'T' alias)
dates = pd.DataFrame(pd.date_range('2023-01-01', periods=100000, freq='min'))
# Shift every value by one day in a single operation
dates += pd.Timedelta(days=1)
print(dates)
```

Output:
```python
                        0
0     2023-01-02 00:00:00
1     2023-01-02 00:01:00
2     2023-01-02 00:02:00
3     2023-01-02 00:03:00
4     2023-01-02 00:04:00
...                   ...
99995 2023-03-12 10:35:00
99996 2023-03-12 10:36:00
99997 2023-03-12 10:37:00
99998 2023-03-12 10:38:00
99999 2023-03-12 10:39:00
```

Example using the built-in `datetime` library:

```python
from datetime import datetime, timedelta

# Build the same 100,000 timestamps one element at a time
dates = [datetime(2023, 1, 1) + timedelta(minutes=i) for i in range(100000)]
# Adding one day means visiting every element
dates = [date + timedelta(days=1) for date in dates]
```

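To compare the two approaches above on the same data, one option is a rough timing with `timeit`. This is only a sketch: the helper names `shift_pandas` and `shift_python` are hypothetical, and the measured times depend on the machine and the library versions.

```python
import timeit
from datetime import datetime, timedelta

import pandas as pd

n = 100_000  # same size as the examples above

pandas_dates = pd.Series(pd.date_range('2023-01-01', periods=n, freq='min'))
python_dates = [datetime(2023, 1, 1) + timedelta(minutes=i) for i in range(n)]

def shift_pandas():
    # One vectorized addition over the whole column.
    return pandas_dates + pd.Timedelta(days=1)

def shift_python():
    # One addition per element.
    return [d + timedelta(days=1) for d in python_dates]

# Both approaches end on the same date.
print(shift_pandas().iloc[-1] == shift_python()[-1])

# Total seconds for 10 runs of each approach.
print('pandas  :', timeit.timeit(shift_pandas, number=10))
print('datetime:', timeit.timeit(shift_python, number=10))
```
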
Why use pandas functions?

- Pandas uses NumPy's `datetime64` dtype, which stores each value in a fixed 8 bytes, so datetime data is held compactly and efficiently.
- Each Python `datetime` object takes up more memory, since it carries not only the date and time but also the overhead associated with every Python object.
- Pandas offers a wide range of convenient functions and methods for date manipulation, extraction, and conversion, such as `pd.to_datetime()`, `date_range()`, `timedelta_range()`, and more. The `datetime` library requires manual implementation for many of these operations, leading to longer and less efficient code.

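
The storage difference described above can be inspected directly. The sketch below assumes a small sample of 1,000 values and CPython's `sys.getsizeof` as a rough per-object measure; exact object sizes vary between Python builds.

```python
import sys
from datetime import datetime, timedelta

import pandas as pd

n = 1_000  # small assumed sample size

series = pd.Series(pd.date_range('2023-01-01', periods=n, freq='min'))
objects = [datetime(2023, 1, 1) + timedelta(minutes=i) for i in range(n)]

# datetime64 values are fixed-width: 8 bytes each.
print(series.dtype, series.dtype.itemsize)
print('pandas bytes  :', series.memory_usage(index=False))

# Each datetime is a full Python object; this sum excludes the list itself.
print('datetime bytes:', sum(sys.getsizeof(d) for d in objects))

# timedelta_range builds evenly spaced durations, analogous to date_range.
print(pd.timedelta_range(start='1 day', periods=3, freq='D'))
```
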