Netflix Stock Price Prediction

Author: Jiayu Wang

Student ID: 74613921

Course Project, UC Irvine, Math 10, W22

Introduction

Stock market prediction is trying to determine the future value of a company stock. This project will utilize historic data with linear regression to help us predict the future stock values.

Section 1 Clean Dataset

import pandas as pd
import altair as alt 
nfstocks = pd.read_csv("/work/netflix_stock.csv")
nfstocks.dropna(inplace=True)
nfstocks
Date Open High Low Close Adj Close Volume
0 2002-05-23 1.156429 1.242857 1.145714 1.196429 1.196429 104790000
1 2002-05-24 1.214286 1.225000 1.197143 1.210000 1.210000 11104800
2 2002-05-28 1.213571 1.232143 1.157143 1.157143 1.157143 6609400
3 2002-05-29 1.164286 1.164286 1.085714 1.103571 1.103571 6757800
4 2002-05-30 1.107857 1.107857 1.071429 1.071429 1.071429 10154200
... ... ... ... ... ... ... ...
4940 2022-01-05 592.000000 592.840027 566.880005 567.520020 567.520020 4148700
4941 2022-01-06 554.340027 563.359985 542.010010 553.289978 553.289978 5711800
4942 2022-01-07 549.460022 553.429993 538.219971 541.059998 541.059998 3381700
4943 2022-01-10 538.489990 543.690002 526.320007 539.849976 539.849976 4486100
4944 2022-01-11 536.989990 543.909973 530.070007 540.840027 540.840027 3072000

4945 rows × 7 columns

We would only need data “Date” and “Adj Close.”

nfstocks['Date'] = pd.to_datetime(nfstocks['Date'])

Here we want to use adjusted closing price, which is “Adj Close.” The adjusted closing price amends a stock’s closing price to reflect that stock’s value after accounting for any corporate actions. Closing price is the raw price which is just the cash value of last transcted price before market closes.

Using adjusted closing prices since these fully incorporate any splits, dividens, spin-offs and other distributions made by trader.

nfstocks1 = nfstocks[['Date','Adj Close']]

Before we start utilize linear regression predict the future trend, we want to see whats the trend for stock prices in the past 20 years.

import altair as alt
from altair import Chart, X, Y
import numpy as np
nearest = alt.selection(type='single', nearest=True, on='mouseover',
fields=["Date"], empty='none')

# The basic line
line = alt.Chart().mark_line(interpolate='basis').encode(
    alt.X('Date:T', axis=alt.Axis(title='')),
    alt.Y('Adj Close:Q', axis=alt.Axis(title='',format='$f')),
    color='symbol:N'
).properties(title = "Stock Price")

selectors = alt.Chart().mark_point().encode(
    x="Date:T",
    opacity=alt.value(0),
).add_selection(
    nearest
)

# Draw points 
points = line.mark_point().encode(
    opacity=alt.condition(nearest, alt.value(1), alt.value(0))
)

text = line.mark_text(align='left', dx=5, dy=-5).encode(
    text=alt.condition(nearest, 'Adj Close:Q', alt.value(' '))
)

# Draw a rule
rules = alt.Chart().mark_rule(color="gray").encode(
    x="Date:T",
).transform_filter(
    nearest
)


stockChart = alt.layer(selectors, line, points, rules, text,data=nfstocks1).add_selection(nearest)
stockChart

This graph is reflecting the price changes over the span of 20 years which is from 1.156 dollars to 537.00 dollars . We can see a clear trend that the prices is continuing to increase.

Now we are going to utilize Linear Regression to help us predict the future price trend based on historic data.

Section 2: Expoential Moving Average (EMA)

We know that in predicting stock prices utilize techinical analysis is crucial.

  • Techinical analysis indicator determines the support and resistance levels. This help indicate whether the prices has dropped lower or climbed higher.

  • Techinical indicators are heuristic or pattern-based signals produced by the price, volume, and/or open interest of a security or contract used by traders who follow techinical analysis.

Based on some research online, we will add exponential moving average to our existing data set.

  • Expoential moving average is a type of moving average that places a greater weight and significance on the most recent data points.

import pandas as pd 

nfstocks1['EMA'] = nfstocks1['Adj Close'].ewm(span=20, min_periods=0,adjust=False,ignore_na=False).mean()
/shared-libs/python3.7/py-core/lib/python3.7/site-packages/ipykernel_launcher.py:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  This is separate from the ipykernel package so we can avoid doing imports until

Short term traders usually rely on 12 to 26 day EMA. Especially that EMA reacts more quickly to price swings than the SMA, but it will lag quite a bit over longer periods.

In order to test the accuracy of the EMA calculation, we searched that for 2021.06.03, on MarketWatch website shows that EMA for Netflix is 499.77 USD, which matches our calculations.

c1 = Chart(nfstocks1).mark_line().encode(
    x = "Date",
    y = "Adj Close",
    color = "symbol:N"
).properties(title= "Stock Price")
c2 = Chart(nfstocks1).mark_line(color ="red").encode(
    x = "Date",
    y = "EMA"
)
c1 + c2

Now we are going to develop our regression model and see how effective the EMA is at predicting the price of the stock.

We are first going to use the 80/20 split to train and test our data

Section 3: Linear Regression

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
X_train, X_test, y_train, y_test = train_test_split(nfstocks1[['Adj Close']],nfstocks1[['EMA']], test_size = .2)
#test set
print(X_test.describe())
        Adj Close
count  989.000000
mean   120.458023
std    174.670201
min      0.459286
25%      3.940000
50%     25.214287
75%    163.070007
max    691.690002
#training set
print(X_train.describe())
         Adj Close
count  3956.000000
mean    110.194928
std     165.076349
min       0.372857
25%       3.920893
50%      23.812143
75%     130.042499
max     682.609985

Training Model

from sklearn.linear_model import LinearRegression 

reg = LinearRegression()
reg.fit(X_train, y_train)
LinearRegression()
y_pred = reg.predict(X_test)

Now we are going to use mean absolute error and coefficient of determination to examine how well this model fits and examine the coefficient.

from sklearn.metrics import mean_squared_error,mean_absolute_error,r2_score
print("Mean Absolute Error:", mean_absolute_error(y_test, y_pred))
print("Coefficient of Determination:", r2_score(y_test, y_pred))
Mean Absolute Error: 4.544997954933673
Coefficient of Determination: 0.9969900789763098

We know that Mean Absolute Error can be described as the sum of the absolute error for all observed values divided by the total number of observations. Therefore, the lower MAE we get, the better.

For the coefficient of determination(R_squared), we know that it has of valuyes of 1 or 0 will indicate the regression line represents all or none of the data. Therefore, we would want our coefficient is higher (closer to 1.0) since it helps to indicate that it is a better fit for the observation.

Based on the ideas and output above, we know that our regression from the MAE and R-squared perspectives that they are a good fit.

Now we want to utilize the graph to show the observed value and predicted values

df1 = pd.DataFrame(nfstocks1["Adj Close"].iloc[500:510])
df1["Date"] = pd.DataFrame(nfstocks1['Date'].iloc[500:510])
c3 = alt.Chart(df1).mark_circle().encode(
    x = alt.X("Date",scale=alt.Scale(zero=False)),
    y = alt.Y("Adj Close",scale=alt.Scale (zero=False))
)
    
nfstocks1["pred"] = reg.predict(nfstocks1[["Adj Close"]])
/shared-libs/python3.7/py-core/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.
df1["prediction"] = pd.DataFrame(nfstocks1['pred'].iloc[500:510])
c4 = alt.Chart(df1).mark_line(color="red").encode(
    x = alt.X("Date",scale=alt.Scale(zero=False)),
    y = alt.Y("prediction",scale=alt.Scale (zero=False))
)
c3+c4

Section 4: Simulation to Test the Model

Since we already developed and trained the model based on historic pricing data. Now we want to develop the model that can use EMA of any given days to repdict the close price.

We also want to use a trading strategy such as if our predicted value of the stock is higher than the open value of the stock, we will consider to trade. However, if our predicted stock price is equal to or smaller than open value of the stock, we will consider not trade.

Input some data that we already have to test: (Here I choose data that are from consecutive days since the prediction of current stock price is based on the EMA value from the day before)

df2 = pd.DataFrame(nfstocks["Date"].iloc[4900:4944])
df2['Open'] = nfstocks['Open'].iloc[4900:4944]
df2['Adj Close'] = nfstocks['Adj Close'].iloc[4900:4944]
df2['EMA'] = nfstocks1['EMA'].iloc[4900:4944]

Predicted Value

df2['predict'] = reg.predict(df2[["EMA"]].values)
/shared-libs/python3.7/py/lib/python3.7/site-packages/sklearn/base.py:451: UserWarning: X does not have valid feature names, but LinearRegression was fitted with feature names
  "X does not have valid feature names, but"
#Here we use the conditions method, link is below reference
conditions = [df2['predict'] > df2['Open'],df2['predict'] < df2['Open']]
choices = ['Trade','Not Trade']
df2['Trade Decision'] = np.select(conditions, choices, default='Not Trade')

Now we want to create a seperate column with the potential loss/earning that we can make

df2['Earning'] = df2["predict"]-df2["Open"]

Before providing a direct view of potential earnings, you can utilize the function below to get the predicted value of the day.

You can follows the steps below:

#First, you can put your open value of the day, here we take 598.179993 
#as an example 
open = 598.179993	
#Second, run this block, you will get to know what is the predicted close price will be based on your open price
close = df2['Adj Close'].where(df2['Open'] == open).dropna().values[0]
print(f'If you have an open price as ${open}, your predicted close price will be ${close}')
If you have an open price as $598.179993, your predicted close price will be $605.039978
import matplotlib as mpl 

In order to give a more direct view of potential earnings, we highlighted the earning based on its value. If we are able

def style_negative(v, props='color:red;'):
    return props if v < 0 else None
df2 = df2.style.applymap(style_negative,subset=["Earning"])
def highlight_max(s, props = ''):
    return np.where(s == np.nanmax(s.values), props, '')
df2.apply(highlight_max, props='color:white;background-color:darkblue')
  Date Open Adj Close EMA predict Trade Decision Earning
4900 2021-11-08 00:00:00 650.289978 651.450012 654.898466 648.444913 Not Trade -1.845065
4901 2021-11-09 00:00:00 653.700012 655.989990 655.002421 648.547848 Not Trade -5.152164
4902 2021-11-10 00:00:00 653.010010 646.909973 654.231712 647.784696 Not Trade -5.225314
4903 2021-11-11 00:00:00 650.239990 657.580017 654.550598 648.100455 Not Trade -2.139535
4904 2021-11-12 00:00:00 660.010010 682.609985 657.222921 650.746573 Not Trade -9.263437
4905 2021-11-15 00:00:00 681.239990 679.330017 659.328358 652.831365 Not Trade -28.408625
4906 2021-11-16 00:00:00 678.270020 687.400024 662.001850 655.478641 Not Trade -22.791379
4907 2021-11-17 00:00:00 690.000000 691.690002 664.829293 658.278359 Not Trade -31.721641
4908 2021-11-18 00:00:00 691.609985 682.020020 666.466505 659.899517 Not Trade -31.710468
4909 2021-11-19 00:00:00 692.349976 678.799988 667.641123 661.062616 Not Trade -31.287360
4910 2021-11-22 00:00:00 676.020020 659.200012 666.837207 660.266584 Not Trade -15.753436
4911 2021-11-23 00:00:00 658.179993 654.059998 665.620330 659.061639 Trade 0.881646
4912 2021-11-24 00:00:00 658.010010 658.289978 664.922202 658.370356 Trade 0.360346
4913 2021-11-26 00:00:00 675.000000 665.640015 664.990565 658.438049 Not Trade -16.561951
4914 2021-11-29 00:00:00 663.200012 663.840027 664.880990 658.329548 Not Trade -4.870464
4915 2021-11-30 00:00:00 668.200012 641.900024 662.692326 656.162347 Not Trade -12.037665
4916 2021-12-01 00:00:00 649.479980 617.770020 658.414011 651.925985 Trade 2.446005
4917 2021-12-02 00:00:00 617.099976 616.469971 654.419341 647.970485 Trade 30.870509
4918 2021-12-03 00:00:00 622.750000 602.130005 649.439404 643.039381 Trade 20.289381
4919 2021-12-06 00:00:00 606.010010 612.690002 645.939461 639.573758 Trade 33.563748
4920 2021-12-07 00:00:00 619.830017 625.580017 644.000466 637.653777 Trade 17.823760
4921 2021-12-08 00:00:00 630.000000 628.080017 642.484233 636.152412 Trade 6.152412
4922 2021-12-09 00:00:00 627.580017 611.000000 639.485735 633.183316 Trade 5.603299
4923 2021-12-10 00:00:00 616.780029 611.659973 636.835662 630.559230 Trade 13.779201
4924 2021-12-13 00:00:00 612.000000 604.559998 633.761789 627.515499 Trade 15.515499
4925 2021-12-14 00:00:00 598.710022 597.989990 630.354951 624.142068 Trade 25.432046
4926 2021-12-15 00:00:00 598.179993 605.039978 627.944002 621.754760 Trade 23.574767
4927 2021-12-16 00:00:00 597.090027 591.059998 624.431239 618.276444 Trade 21.186417
4928 2021-12-17 00:00:00 591.609985 586.729980 620.840643 614.721056 Trade 23.111071
4929 2021-12-20 00:00:00 586.429993 593.739990 618.259629 612.165351 Trade 25.735358
4930 2021-12-21 00:00:00 597.539978 604.919983 616.989186 610.907366 Trade 13.367388
4931 2021-12-22 00:00:00 603.359985 614.239990 616.727358 610.648106 Trade 7.288121
4932 2021-12-23 00:00:00 616.400024 614.090027 616.476184 610.399394 Not Trade -6.000630
4933 2021-12-27 00:00:00 615.000000 613.119995 616.156547 610.082892 Not Trade -4.917108
4934 2021-12-28 00:00:00 614.950012 610.710022 615.637830 609.569261 Not Trade -5.380751
4935 2021-12-29 00:00:00 610.710022 610.539978 615.152320 609.088513 Not Trade -1.621509
4936 2021-12-30 00:00:00 612.989990 612.090027 614.860673 608.799725 Not Trade -4.190265
4937 2021-12-31 00:00:00 610.010010 602.440002 613.677752 607.628404 Not Trade -2.381606
4938 2022-01-03 00:00:00 605.609985 597.369995 612.124632 606.090514 Trade 0.480529
4939 2022-01-04 00:00:00 599.909973 591.150024 610.127051 604.112520 Trade 4.202547
4940 2022-01-05 00:00:00 592.000000 567.520020 606.069238 600.094498 Trade 8.094498
4941 2022-01-06 00:00:00 554.340027 553.289978 601.042642 595.117192 Trade 40.777165
4942 2022-01-07 00:00:00 549.460022 541.059998 595.330009 589.460577 Trade 40.000555
4943 2022-01-10 00:00:00 538.489990 539.849976 590.046197 584.228577 Trade 45.738587

Based on the graph above, we are able to see the highlights of the highest value of Netflix stock price in the selected 44 days. We also highlighted the trade areas to show you that the days that are reccomendated to trade with poential earnings.

Summary

In this project, we first cleaned and organized the data set. With the help of the graph, we are able to see the trend of the stock performances from 2002 to 2021. Later, we added another factor EMA, Expoential Moving Average, to our dataset and utilize the linear regression to train the model and help us predict the future stock prices. In the end, we utilize some data in the original dataset and ran simulations to test the effecitveness and application of this model.