Netflix Stock Price Prediction
Contents
Netflix Stock Price Prediction¶
Author: Jiayu Wang
Student ID: 74613921
Course Project, UC Irvine, Math 10, W22
Introduction¶
Stock market prediction is trying to determine the future value of a company stock. This project will utilize historic data with linear regression to help us predict the future stock values.
Section 1 Clean Dataset¶
import pandas as pd
import altair as alt
nfstocks = pd.read_csv("/work/netflix_stock.csv")
nfstocks.dropna(inplace=True)
nfstocks
Date | Open | High | Low | Close | Adj Close | Volume | |
---|---|---|---|---|---|---|---|
0 | 2002-05-23 | 1.156429 | 1.242857 | 1.145714 | 1.196429 | 1.196429 | 104790000 |
1 | 2002-05-24 | 1.214286 | 1.225000 | 1.197143 | 1.210000 | 1.210000 | 11104800 |
2 | 2002-05-28 | 1.213571 | 1.232143 | 1.157143 | 1.157143 | 1.157143 | 6609400 |
3 | 2002-05-29 | 1.164286 | 1.164286 | 1.085714 | 1.103571 | 1.103571 | 6757800 |
4 | 2002-05-30 | 1.107857 | 1.107857 | 1.071429 | 1.071429 | 1.071429 | 10154200 |
... | ... | ... | ... | ... | ... | ... | ... |
4940 | 2022-01-05 | 592.000000 | 592.840027 | 566.880005 | 567.520020 | 567.520020 | 4148700 |
4941 | 2022-01-06 | 554.340027 | 563.359985 | 542.010010 | 553.289978 | 553.289978 | 5711800 |
4942 | 2022-01-07 | 549.460022 | 553.429993 | 538.219971 | 541.059998 | 541.059998 | 3381700 |
4943 | 2022-01-10 | 538.489990 | 543.690002 | 526.320007 | 539.849976 | 539.849976 | 4486100 |
4944 | 2022-01-11 | 536.989990 | 543.909973 | 530.070007 | 540.840027 | 540.840027 | 3072000 |
4945 rows × 7 columns
We would only need data “Date” and “Adj Close.”
nfstocks['Date'] = pd.to_datetime(nfstocks['Date'])
Here we want to use adjusted closing price, which is “Adj Close.” The adjusted closing price amends a stock’s closing price to reflect that stock’s value after accounting for any corporate actions. Closing price is the raw price which is just the cash value of last transcted price before market closes.
Using adjusted closing prices since these fully incorporate any splits, dividens, spin-offs and other distributions made by trader.
nfstocks1 = nfstocks[['Date','Adj Close']]
Before we start utilize linear regression predict the future trend, we want to see whats the trend for stock prices in the past 20 years.
import altair as alt
from altair import Chart, X, Y
import numpy as np
nearest = alt.selection(type='single', nearest=True, on='mouseover',
fields=["Date"], empty='none')
# The basic line
line = alt.Chart().mark_line(interpolate='basis').encode(
alt.X('Date:T', axis=alt.Axis(title='')),
alt.Y('Adj Close:Q', axis=alt.Axis(title='',format='$f')),
color='symbol:N'
).properties(title = "Stock Price")
selectors = alt.Chart().mark_point().encode(
x="Date:T",
opacity=alt.value(0),
).add_selection(
nearest
)
# Draw points
points = line.mark_point().encode(
opacity=alt.condition(nearest, alt.value(1), alt.value(0))
)
text = line.mark_text(align='left', dx=5, dy=-5).encode(
text=alt.condition(nearest, 'Adj Close:Q', alt.value(' '))
)
# Draw a rule
rules = alt.Chart().mark_rule(color="gray").encode(
x="Date:T",
).transform_filter(
nearest
)
stockChart = alt.layer(selectors, line, points, rules, text,data=nfstocks1).add_selection(nearest)
stockChart
This graph is reflecting the price changes over the span of 20 years which is from 1.156 dollars to 537.00 dollars . We can see a clear trend that the prices is continuing to increase.
Now we are going to utilize Linear Regression to help us predict the future price trend based on historic data.
Section 2: Expoential Moving Average (EMA)¶
We know that in predicting stock prices utilize techinical analysis is crucial.
Techinical analysis indicator determines the support and resistance levels. This help indicate whether the prices has dropped lower or climbed higher.
Techinical indicators are heuristic or pattern-based signals produced by the price, volume, and/or open interest of a security or contract used by traders who follow techinical analysis.
Based on some research online, we will add exponential moving average to our existing data set.
Expoential moving average is a type of moving average that places a greater weight and significance on the most recent data points.
import pandas as pd
nfstocks1['EMA'] = nfstocks1['Adj Close'].ewm(span=20, min_periods=0,adjust=False,ignore_na=False).mean()
/shared-libs/python3.7/py-core/lib/python3.7/site-packages/ipykernel_launcher.py:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
This is separate from the ipykernel package so we can avoid doing imports until
Short term traders usually rely on 12 to 26 day EMA. Especially that EMA reacts more quickly to price swings than the SMA, but it will lag quite a bit over longer periods.
In order to test the accuracy of the EMA calculation, we searched that for 2021.06.03, on MarketWatch website shows that EMA for Netflix is 499.77 USD, which matches our calculations.
c1 = Chart(nfstocks1).mark_line().encode(
x = "Date",
y = "Adj Close",
color = "symbol:N"
).properties(title= "Stock Price")
c2 = Chart(nfstocks1).mark_line(color ="red").encode(
x = "Date",
y = "EMA"
)
c1 + c2
Now we are going to develop our regression model and see how effective the EMA is at predicting the price of the stock.
We are first going to use the 80/20 split to train and test our data
Section 3: Linear Regression¶
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
X_train, X_test, y_train, y_test = train_test_split(nfstocks1[['Adj Close']],nfstocks1[['EMA']], test_size = .2)
#test set
print(X_test.describe())
Adj Close
count 989.000000
mean 120.458023
std 174.670201
min 0.459286
25% 3.940000
50% 25.214287
75% 163.070007
max 691.690002
#training set
print(X_train.describe())
Adj Close
count 3956.000000
mean 110.194928
std 165.076349
min 0.372857
25% 3.920893
50% 23.812143
75% 130.042499
max 682.609985
Training Model
from sklearn.linear_model import LinearRegression
reg = LinearRegression()
reg.fit(X_train, y_train)
LinearRegression()
y_pred = reg.predict(X_test)
Now we are going to use mean absolute error and coefficient of determination to examine how well this model fits and examine the coefficient.
from sklearn.metrics import mean_squared_error,mean_absolute_error,r2_score
print("Mean Absolute Error:", mean_absolute_error(y_test, y_pred))
print("Coefficient of Determination:", r2_score(y_test, y_pred))
Mean Absolute Error: 4.544997954933673
Coefficient of Determination: 0.9969900789763098
We know that Mean Absolute Error can be described as the sum of the absolute error for all observed values divided by the total number of observations. Therefore, the lower MAE we get, the better.
For the coefficient of determination(R_squared), we know that it has of valuyes of 1 or 0 will indicate the regression line represents all or none of the data. Therefore, we would want our coefficient is higher (closer to 1.0) since it helps to indicate that it is a better fit for the observation.
Based on the ideas and output above, we know that our regression from the MAE and R-squared perspectives that they are a good fit.
Now we want to utilize the graph to show the observed value and predicted values
df1 = pd.DataFrame(nfstocks1["Adj Close"].iloc[500:510])
df1["Date"] = pd.DataFrame(nfstocks1['Date'].iloc[500:510])
c3 = alt.Chart(df1).mark_circle().encode(
x = alt.X("Date",scale=alt.Scale(zero=False)),
y = alt.Y("Adj Close",scale=alt.Scale (zero=False))
)
nfstocks1["pred"] = reg.predict(nfstocks1[["Adj Close"]])
/shared-libs/python3.7/py-core/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
"""Entry point for launching an IPython kernel.
df1["prediction"] = pd.DataFrame(nfstocks1['pred'].iloc[500:510])
c4 = alt.Chart(df1).mark_line(color="red").encode(
x = alt.X("Date",scale=alt.Scale(zero=False)),
y = alt.Y("prediction",scale=alt.Scale (zero=False))
)
c3+c4
Section 4: Simulation to Test the Model¶
Since we already developed and trained the model based on historic pricing data. Now we want to develop the model that can use EMA of any given days to repdict the close price.
We also want to use a trading strategy such as if our predicted value of the stock is higher than the open value of the stock, we will consider to trade. However, if our predicted stock price is equal to or smaller than open value of the stock, we will consider not trade.
Input some data that we already have to test: (Here I choose data that are from consecutive days since the prediction of current stock price is based on the EMA value from the day before)
df2 = pd.DataFrame(nfstocks["Date"].iloc[4900:4944])
df2['Open'] = nfstocks['Open'].iloc[4900:4944]
df2['Adj Close'] = nfstocks['Adj Close'].iloc[4900:4944]
df2['EMA'] = nfstocks1['EMA'].iloc[4900:4944]
Predicted Value
df2['predict'] = reg.predict(df2[["EMA"]].values)
/shared-libs/python3.7/py/lib/python3.7/site-packages/sklearn/base.py:451: UserWarning: X does not have valid feature names, but LinearRegression was fitted with feature names
"X does not have valid feature names, but"
#Here we use the conditions method, link is below reference
conditions = [df2['predict'] > df2['Open'],df2['predict'] < df2['Open']]
choices = ['Trade','Not Trade']
df2['Trade Decision'] = np.select(conditions, choices, default='Not Trade')
Now we want to create a seperate column with the potential loss/earning that we can make
df2['Earning'] = df2["predict"]-df2["Open"]
Before providing a direct view of potential earnings, you can utilize the function below to get the predicted value of the day.
You can follows the steps below:
#First, you can put your open value of the day, here we take 598.179993
#as an example
open = 598.179993
#Second, run this block, you will get to know what is the predicted close price will be based on your open price
close = df2['Adj Close'].where(df2['Open'] == open).dropna().values[0]
print(f'If you have an open price as ${open}, your predicted close price will be ${close}')
If you have an open price as $598.179993, your predicted close price will be $605.039978
import matplotlib as mpl
In order to give a more direct view of potential earnings, we highlighted the earning based on its value. If we are able
def style_negative(v, props='color:red;'):
return props if v < 0 else None
df2 = df2.style.applymap(style_negative,subset=["Earning"])
def highlight_max(s, props = ''):
return np.where(s == np.nanmax(s.values), props, '')
df2.apply(highlight_max, props='color:white;background-color:darkblue')
Date | Open | Adj Close | EMA | predict | Trade Decision | Earning | |
---|---|---|---|---|---|---|---|
4900 | 2021-11-08 00:00:00 | 650.289978 | 651.450012 | 654.898466 | 648.444913 | Not Trade | -1.845065 |
4901 | 2021-11-09 00:00:00 | 653.700012 | 655.989990 | 655.002421 | 648.547848 | Not Trade | -5.152164 |
4902 | 2021-11-10 00:00:00 | 653.010010 | 646.909973 | 654.231712 | 647.784696 | Not Trade | -5.225314 |
4903 | 2021-11-11 00:00:00 | 650.239990 | 657.580017 | 654.550598 | 648.100455 | Not Trade | -2.139535 |
4904 | 2021-11-12 00:00:00 | 660.010010 | 682.609985 | 657.222921 | 650.746573 | Not Trade | -9.263437 |
4905 | 2021-11-15 00:00:00 | 681.239990 | 679.330017 | 659.328358 | 652.831365 | Not Trade | -28.408625 |
4906 | 2021-11-16 00:00:00 | 678.270020 | 687.400024 | 662.001850 | 655.478641 | Not Trade | -22.791379 |
4907 | 2021-11-17 00:00:00 | 690.000000 | 691.690002 | 664.829293 | 658.278359 | Not Trade | -31.721641 |
4908 | 2021-11-18 00:00:00 | 691.609985 | 682.020020 | 666.466505 | 659.899517 | Not Trade | -31.710468 |
4909 | 2021-11-19 00:00:00 | 692.349976 | 678.799988 | 667.641123 | 661.062616 | Not Trade | -31.287360 |
4910 | 2021-11-22 00:00:00 | 676.020020 | 659.200012 | 666.837207 | 660.266584 | Not Trade | -15.753436 |
4911 | 2021-11-23 00:00:00 | 658.179993 | 654.059998 | 665.620330 | 659.061639 | Trade | 0.881646 |
4912 | 2021-11-24 00:00:00 | 658.010010 | 658.289978 | 664.922202 | 658.370356 | Trade | 0.360346 |
4913 | 2021-11-26 00:00:00 | 675.000000 | 665.640015 | 664.990565 | 658.438049 | Not Trade | -16.561951 |
4914 | 2021-11-29 00:00:00 | 663.200012 | 663.840027 | 664.880990 | 658.329548 | Not Trade | -4.870464 |
4915 | 2021-11-30 00:00:00 | 668.200012 | 641.900024 | 662.692326 | 656.162347 | Not Trade | -12.037665 |
4916 | 2021-12-01 00:00:00 | 649.479980 | 617.770020 | 658.414011 | 651.925985 | Trade | 2.446005 |
4917 | 2021-12-02 00:00:00 | 617.099976 | 616.469971 | 654.419341 | 647.970485 | Trade | 30.870509 |
4918 | 2021-12-03 00:00:00 | 622.750000 | 602.130005 | 649.439404 | 643.039381 | Trade | 20.289381 |
4919 | 2021-12-06 00:00:00 | 606.010010 | 612.690002 | 645.939461 | 639.573758 | Trade | 33.563748 |
4920 | 2021-12-07 00:00:00 | 619.830017 | 625.580017 | 644.000466 | 637.653777 | Trade | 17.823760 |
4921 | 2021-12-08 00:00:00 | 630.000000 | 628.080017 | 642.484233 | 636.152412 | Trade | 6.152412 |
4922 | 2021-12-09 00:00:00 | 627.580017 | 611.000000 | 639.485735 | 633.183316 | Trade | 5.603299 |
4923 | 2021-12-10 00:00:00 | 616.780029 | 611.659973 | 636.835662 | 630.559230 | Trade | 13.779201 |
4924 | 2021-12-13 00:00:00 | 612.000000 | 604.559998 | 633.761789 | 627.515499 | Trade | 15.515499 |
4925 | 2021-12-14 00:00:00 | 598.710022 | 597.989990 | 630.354951 | 624.142068 | Trade | 25.432046 |
4926 | 2021-12-15 00:00:00 | 598.179993 | 605.039978 | 627.944002 | 621.754760 | Trade | 23.574767 |
4927 | 2021-12-16 00:00:00 | 597.090027 | 591.059998 | 624.431239 | 618.276444 | Trade | 21.186417 |
4928 | 2021-12-17 00:00:00 | 591.609985 | 586.729980 | 620.840643 | 614.721056 | Trade | 23.111071 |
4929 | 2021-12-20 00:00:00 | 586.429993 | 593.739990 | 618.259629 | 612.165351 | Trade | 25.735358 |
4930 | 2021-12-21 00:00:00 | 597.539978 | 604.919983 | 616.989186 | 610.907366 | Trade | 13.367388 |
4931 | 2021-12-22 00:00:00 | 603.359985 | 614.239990 | 616.727358 | 610.648106 | Trade | 7.288121 |
4932 | 2021-12-23 00:00:00 | 616.400024 | 614.090027 | 616.476184 | 610.399394 | Not Trade | -6.000630 |
4933 | 2021-12-27 00:00:00 | 615.000000 | 613.119995 | 616.156547 | 610.082892 | Not Trade | -4.917108 |
4934 | 2021-12-28 00:00:00 | 614.950012 | 610.710022 | 615.637830 | 609.569261 | Not Trade | -5.380751 |
4935 | 2021-12-29 00:00:00 | 610.710022 | 610.539978 | 615.152320 | 609.088513 | Not Trade | -1.621509 |
4936 | 2021-12-30 00:00:00 | 612.989990 | 612.090027 | 614.860673 | 608.799725 | Not Trade | -4.190265 |
4937 | 2021-12-31 00:00:00 | 610.010010 | 602.440002 | 613.677752 | 607.628404 | Not Trade | -2.381606 |
4938 | 2022-01-03 00:00:00 | 605.609985 | 597.369995 | 612.124632 | 606.090514 | Trade | 0.480529 |
4939 | 2022-01-04 00:00:00 | 599.909973 | 591.150024 | 610.127051 | 604.112520 | Trade | 4.202547 |
4940 | 2022-01-05 00:00:00 | 592.000000 | 567.520020 | 606.069238 | 600.094498 | Trade | 8.094498 |
4941 | 2022-01-06 00:00:00 | 554.340027 | 553.289978 | 601.042642 | 595.117192 | Trade | 40.777165 |
4942 | 2022-01-07 00:00:00 | 549.460022 | 541.059998 | 595.330009 | 589.460577 | Trade | 40.000555 |
4943 | 2022-01-10 00:00:00 | 538.489990 | 539.849976 | 590.046197 | 584.228577 | Trade | 45.738587 |
Based on the graph above, we are able to see the highlights of the highest value of Netflix stock price in the selected 44 days. We also highlighted the trade areas to show you that the days that are reccomendated to trade with poential earnings.
Summary¶
In this project, we first cleaned and organized the data set. With the help of the graph, we are able to see the trend of the stock performances from 2002 to 2021. Later, we added another factor EMA, Expoential Moving Average, to our dataset and utilize the linear regression to train the model and help us predict the future stock prices. In the end, we utilize some data in the original dataset and ran simulations to test the effecitveness and application of this model.
References¶
Dataset: Kaggle Dataset
EMA: https://stackoverflow.com/questions/48775841/pandas-ema-not-matching-the-stocks-ema
Conditions: https://www.statology.org/compare-two-columns-in-pandas/
altair chart: https://altair-viz.github.io/gallery/multiline_tooltip.html