# Working with Date & Time in Pandas
Real-world datasets often contain dates and times. Pandas handles such data well and provides a wide range of tools for processing it.
- **Parsing dates and times**: Pandas can parse dates and times from strings with `to_datetime()`, or while loading a file through the `parse_dates` parameter of `read_csv()`. These tools handle many date and time formats, including Unix timestamps and common human-readable formats.
- **Manipulating dates and times**: Pandas provides functions such as `shift()`, `resample()`, and `to_timedelta()`. They can be used to add or subtract time periods, change the frequency of a time series, and calculate the difference between two dates or times.
- **Visualizing dates and times**: Pandas provides plotting methods such as `plot()`, `hist()`, and `plot.bar()`. They can be used to create line charts, histograms, and bar charts of date and time data.
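As a rough illustration of the parsing and manipulation functions listed above, the sketch below parses a few strings and a Unix timestamp, adds a fixed offset, and resamples a small series. The sample values and variable names are made up for the example.

```python
import pandas as pd

# Parse strings in a known format into a datetime Series.
raw = pd.Series(["2024-05-01", "2024-05-02", "2024-05-04"])
dates = pd.to_datetime(raw, format="%Y-%m-%d")

# A Unix timestamp (seconds since 1970-01-01 UTC) is parsed with unit="s".
from_unix = pd.to_datetime(1714521600, unit="s")

# Add a fixed offset to every element at once.
shifted = dates + pd.to_timedelta(2, unit="D")

# Difference between consecutive dates, as a timedelta Series.
gaps = dates.diff()

# Change the frequency of a time series: sum the values in 2-day bins.
values = pd.Series([1, 2, 3], index=pd.DatetimeIndex(dates))
totals = values.resample("2D").sum()

print(from_unix)
print(shifted.tolist())
print(gaps.tolist())
print(totals)
```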
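### `Timestamp` class
`pd.Timestamp` represents a single point in time and is the pandas counterpart of Python's `datetime.datetime`. It exposes the components of a date and time as attributes, and its `timestamp()` method returns the corresponding Unix timestamp, a numerical representation of the datetime as seconds since 1970-01-01 UTC.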
Example: retrieving the day, month, and year from a given date:
```python
import pandas as pd

ts = pd.Timestamp('2024-05-05')
# Date components are available as plain attributes
print('Year is:', ts.year)
print('Month is:', ts.month)
print('Day is:', ts.day)
```
Output:
```python
Year is: 2024
Month is: 5
Day is: 5
```
Example: extracting time-related fields from a given timestamp:
```python
import pandas as pd

ts = pd.Timestamp('2024-10-24 12:00:00')
print('Hour is:', ts.hour)
print('Minute is:', ts.minute)
# weekday() counts from Monday = 0, so a Thursday gives 3
print('Weekday is:', ts.weekday())
print('Quarter is:', ts.quarter)
```
Output:
```python
Hour is: 12
Minute is: 0
Weekday is: 3
Quarter is: 4
```
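A `Timestamp` can also be converted to and from a Unix timestamp. The following is a minimal sketch, assuming the time is interpreted as UTC; the variable names are illustrative only.

```python
import pandas as pd

# A timezone-aware timestamp, so the epoch conversion is unambiguous.
ts = pd.Timestamp("2024-05-05", tz="UTC")

# Seconds since the Unix epoch (1970-01-01 00:00:00 UTC), as a float.
unix_seconds = ts.timestamp()

# Round trip: rebuild a Timestamp from the epoch seconds.
restored = pd.Timestamp(unix_seconds, unit="s", tz="UTC")

print(unix_seconds)
print(restored)
```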
### `Timestamp.now()`
Example: getting the current date and time:
```python
import pandas as pd

# Current local date and time, without timezone information
ts = pd.Timestamp.now()
print('Current date and time is:', ts)
```
Output:
```python
Current date and time is: 2024-05-25 11:48:25.593213
```
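`Timestamp.now()` also accepts a `tz` argument when a timezone-aware value is needed. The sketch below is one possible usage; the zone `"Asia/Kolkata"` is only an example choice, and the printed values depend on when the code runs.

```python
import pandas as pd

# Timezone-aware "now" in UTC.
now_utc = pd.Timestamp.now(tz="UTC")

# The same instant expressed in another zone (example zone, pick your own).
now_local = now_utc.tz_convert("Asia/Kolkata")

# normalize() resets the time part to midnight and keeps the date.
today = now_utc.normalize()

print(now_utc)
print(now_local)
print(today)
```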
### `date_range` function
Example: generating dates for the next five days:
```python
import pandas as pd

# Five daily timestamps starting from the current moment
ts = pd.date_range(start=pd.Timestamp.now(), periods=5)
for i in ts:
    print(i.date())
```
Output:
```python
2024-05-25
2024-05-26
2024-05-27
2024-05-28
2024-05-29
```
Example: generating dates for the previous five days:
```python
import pandas as pd

# Five daily timestamps ending at the current moment
ts = pd.date_range(end=pd.Timestamp.now(), periods=5)
for i in ts:
    print(i.date())
```
Output:
```python
2024-05-21
2024-05-22
2024-05-23
2024-05-24
2024-05-25
```
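`date_range` is not limited to daily steps: the `freq` argument selects the spacing. The sketch below uses fixed start dates so the result is reproducible; the dates themselves are arbitrary examples.

```python
import pandas as pd

# Business days only ("B"): weekends are skipped.
# 2024-05-24 is a Friday, so the next entry is the following Monday.
business = pd.date_range(start="2024-05-24", periods=3, freq="B")

# Weekly dates anchored on Mondays ("W-MON").
mondays = pd.date_range(start="2024-05-25", periods=3, freq="W-MON")

for day in business:
    print(day.date())
for day in mondays:
    print(day.date())
```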
### Built-in vs pandas date & time operations
In `pandas`, you can add a time delta to an entire column of dates in a single vectorized operation, whereas Python's built-in `datetime` module requires an explicit loop or list comprehension.
Example in pandas:
```python
import pandas as pd
# 'min' is the minute frequency alias (the older 'T' alias is deprecated)
dates = pd.DataFrame(pd.date_range('2023-01-01', periods=100000, freq='min'))
dates += pd.Timedelta(days=1)
print(dates)
```
Output:
```python
                        0
0     2023-01-02 00:00:00
1     2023-01-02 00:01:00
2     2023-01-02 00:02:00
3     2023-01-02 00:03:00
4     2023-01-02 00:04:00
...                   ...
99995 2023-03-12 10:35:00
99996 2023-03-12 10:36:00
99997 2023-03-12 10:37:00
99998 2023-03-12 10:38:00
99999 2023-03-12 10:39:00
```
Example using the built-in `datetime` library:
```python
from datetime import datetime, timedelta
dates = [datetime(2023, 1, 1) + timedelta(minutes=i) for i in range(100000)]
dates = [date + timedelta(days=1) for date in dates]
```
Why use pandas functions?
- Pandas stores datetime data in NumPy's `datetime64` dtype, which uses a fixed 8 bytes per value, so a column of dates is held compactly in a single array.
- Each Python `datetime` object takes up more memory, because it carries not only the date and time but also the overhead associated with every Python object.
- Pandas offers a wide range of convenient functions and methods for date manipulation, extraction, and conversion, such as `pd.to_datetime()`, `pd.date_range()`, and `pd.timedelta_range()`. With the `datetime` library, many of these operations have to be implemented by hand, which leads to longer and less efficient code.
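The storage difference described above can be checked with a small sketch. The sample size `n` and the variable names are arbitrary, and the exact size of a Python `datetime` object depends on the interpreter build, so treat the numbers as indicative only.

```python
import sys
from datetime import datetime, timedelta

import pandas as pd

n = 10_000
py_dates = [datetime(2023, 1, 1) + timedelta(minutes=i) for i in range(n)]
pd_dates = pd.Series(pd.date_range("2023-01-01", periods=n, freq="min"))

# Each datetime64 value occupies a fixed number of bytes in the array.
bytes_per_value = pd_dates.to_numpy().itemsize

# Size of one Python datetime object (the list adds a pointer per element).
bytes_per_object = sys.getsizeof(py_dates[0])

# Vectorized field extraction through the .dt accessor, no explicit loop.
hours = pd_dates.dt.hour

print("datetime64 bytes per value:", bytes_per_value)
print("datetime object bytes:", bytes_per_object)
print(hours.head())
```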