Lecture 21 - Time Series
13 November, 2024
JOIN
statement to combine data from different tablesJOIN
statements, such as INNER JOIN
, LEFT JOIN
, RIGHT JOIN
, and FULL JOIN
pd.to_sql()
to write data to a SQL database from a DataFramedatetime
module to work with dates and timesmatplotlib
datetime
module to work with dates and timespandas
and matplotlib
date_str | sp500 | djia | date_ex1 | date_ex2 | date_ex3 | |
---|---|---|---|---|---|---|
0 | 4/4/2018 | 2644.69 | 24264.30 | April 04 2018 | Wednesday, 2018-04-04 | 4-Apr-18 |
1 | 4/5/2018 | 2662.84 | 24505.22 | April 05 2018 | Thursday, 2018-04-05 | 5-Apr-18 |
2 | 4/6/2018 | 2604.47 | 23932.76 | April 06 2018 | Friday, 2018-04-06 | 6-Apr-18 |
3 | 4/9/2018 | 2613.16 | 23979.10 | April 09 2018 | Monday, 2018-04-09 | 9-Apr-18 |
4 | 4/10/2018 | 2656.87 | 24408.00 | April 10 2018 | Tuesday, 2018-04-10 | 10-Apr-18 |
sp500
is the S&P 500 indexdjia
is the Dow Jones Industrial Averagedatetime
objects using the pd.to_datetime()
functionparse_dates
argument in pd.read_csv()
financial = pd.read_csv("data_raw/financial.csv",parse_dates = ["date"])
date_str
, to a datetime
object# Convert the date column to datetime
financial["date"] = pd.to_datetime(financial["date_str"])
financial.head()
date_str | sp500 | djia | date_ex1 | date_ex2 | date_ex3 | date | |
---|---|---|---|---|---|---|---|
0 | 4/4/2018 | 2644.69 | 24264.30 | April 04 2018 | Wednesday, 2018-04-04 | 4-Apr-18 | 2018-04-04 |
1 | 4/5/2018 | 2662.84 | 24505.22 | April 05 2018 | Thursday, 2018-04-05 | 5-Apr-18 | 2018-04-05 |
2 | 4/6/2018 | 2604.47 | 23932.76 | April 06 2018 | Friday, 2018-04-06 | 6-Apr-18 | 2018-04-06 |
3 | 4/9/2018 | 2613.16 | 23979.10 | April 09 2018 | Monday, 2018-04-09 | 9-Apr-18 | 2018-04-09 |
4 | 4/10/2018 | 2656.87 | 24408.00 | April 10 2018 | Tuesday, 2018-04-10 | 10-Apr-18 | 2018-04-10 |
financial.dtypes
min()
and max()
functionshead()
and tail()
as we did beforeplt.plot()
is used to create line plots(x,y)
datafinancial.sort_values("date")
djia
column on the y
axis and the date
column on the x
axissp500 | djia | |
---|---|---|
sp500 | 1.000000 | 0.981448 |
djia | 0.981448 | 1.000000 |
# Normalise the indices to start from the same point
financial["sp500_normalised"] = financial["sp500"] / financial["sp500"].iloc[0]
financial["djia_normalised"] = financial["djia"] / financial["djia"].iloc[0]
# Plot the normalized data
plt.plot("date", "sp500_normalised", data = financial, label = "S&P 500 (Normalised)")
plt.plot("date", "djia_normalised", data = financial, label = "Dow Jones Industrial Average (Normalised)")
plt.xlabel("Time")
plt.ylabel("Normalised Index")
plt.title("S&P 500 vs. Dow Jones Industrial Average (Normalised)")
plt.legend()
plt.show()
datetime
objects to strings using the strftime()
method%
is used to denote date formats%Y
is the year, %m
is the month, and %d
is the day%H
is the hour, %M
is the minute, and %S
is the second0 April
1 April
2 April
Name: monthname, dtype: object
YYYY-MM-DD
message_monthday = financial["date"].dt.strftime("The day of the week is %A")
print(message_monthday.head(3))
0 The day of the week is Wednesday
1 The day of the week is Thursday
2 The day of the week is Friday
Name: date, dtype: object
0 2018-04
1 2018-04
2 2018-04
Name: date, dtype: object
date_test
which has the format using .dt.strftime()
Weekday, Month Day, Year
Monday, December 31, 2023
datetime
module to perform operations on dates# Calculate the difference between two dates
date1 = datetime(2023, 12, 31)
date2 = datetime(2023, 12, 25)
difference = date1 - date2
print(difference)
6 days, 0:00:00
timedelta
function to add or subtract dayspd.to_datetime(financial["date_ex1"], format = "%A, %Y-%m-%d")
ValueError: time data "April 04 2018" doesn't match format "%A, %Y-%m-%d", at position 0. You might want to try:
- passing `format` if your strings have a consistent format;
- passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
- passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.
pd.Grouper()
to group the data, then we use the agg()
function to calculate the aggregate# Group the data by year
year_group = pd.Grouper(key='date', freq='YE') # 'YE' stands for year end
financial.groupby(year_group).agg(sp500_mean = ("sp500","mean")).reset_index()
date | sp500_mean | |
---|---|---|
0 | 2018-12-31 | 2751.921862 |
1 | 2019-12-31 | 2913.360556 |
2 | 2020-12-31 | 3217.856482 |
3 | 2021-12-31 | 4273.406627 |
4 | 2022-12-31 | 4098.514741 |
5 | 2023-12-31 | 4003.579844 |
month_group = pd.Grouper(key='date', freq='ME')
monthly_data = (financial
.groupby(month_group)
.agg(sp500_mean = ("sp500", "mean"))
.reset_index())
monthly_data
date | sp500_mean | |
---|---|---|
0 | 2018-04-30 | 2659.463158 |
1 | 2018-05-31 | 2701.493636 |
2 | 2018-06-30 | 2754.352857 |
3 | 2018-07-31 | 2793.643333 |
4 | 2018-08-31 | 2857.820435 |
... | ... | ... |
56 | 2022-12-31 | 3912.380952 |
57 | 2023-01-31 | 3960.656500 |
58 | 2023-02-28 | 4079.684737 |
59 | 2023-03-31 | 3968.559130 |
60 | 2023-04-30 | 4112.555000 |
61 rows × 2 columns
pd.to_datetime()
data_ex3
to a datetime
objectWeekday, Month Day, Year
financial['date_ex3_dt'] = pd.to_datetime(financial["date_ex3"], format = "%d-%b-%y")
week_group = pd.Grouper(key='date_ex3_dt', freq='W')
weekly_data = (financial
.groupby(week_group)
.agg(sp500_std = ("sp500", "std"))
.reset_index())
plt.plot("date_ex3_dt","sp500_std",
data = weekly_data)
plt.xlabel("Time")
plt.ylabel("S&P 500")
plt.title("Weekly standard deviation of the stock market performance")
plt.show()