League of Legends Winning Factor

Author: Tianyi Chen

Course Project, UC Irvine, Math 10, S22

Introduction

My project aims to find a model that can accurately predict whether a League of Legends team will win, based on the team's in-game leads across various statistics. In this project, I tried four models: logistic regression, decision tree, random forest, and K-Nearest Neighbors classifier. All four have similar prediction accuracy. Using the decision tree's feature importances, I found that gold lead is the most important factor affecting the win rate.

Main portion of the project

import pandas as pd
import altair as alt
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import log_loss
df= pd.read_csv("high_diamond_ranked_10min.csv")
df
gameId blueWins blueWardsPlaced blueWardsDestroyed blueFirstBlood blueKills blueDeaths blueAssists blueEliteMonsters blueDragons ... redTowersDestroyed redTotalGold redAvgLevel redTotalExperience redTotalMinionsKilled redTotalJungleMinionsKilled redGoldDiff redExperienceDiff redCSPerMin redGoldPerMin
0 4519157822 0 28 2 1 9 6 11 0 0 ... 0 16567 6.8 17047 197 55 -643 8 19.7 1656.7
1 4523371949 0 12 1 0 5 5 5 0 0 ... 1 17620 6.8 17438 240 52 2908 1173 24.0 1762.0
2 4521474530 0 15 0 0 7 11 4 1 1 ... 0 17285 6.8 17254 203 28 1172 1033 20.3 1728.5
3 4524384067 0 43 1 0 4 5 5 1 0 ... 0 16478 7.0 17961 235 47 1321 7 23.5 1647.8
4 4436033771 0 75 4 0 6 6 6 0 0 ... 0 17404 7.0 18313 225 67 1004 -230 22.5 1740.4
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
9874 4527873286 1 17 2 1 7 4 5 1 1 ... 0 15246 6.8 16498 229 34 -2519 -2469 22.9 1524.6
9875 4527797466 1 54 0 0 6 4 8 1 1 ... 0 15456 7.0 18367 206 56 -782 -888 20.6 1545.6
9876 4527713716 0 23 1 0 6 7 5 0 0 ... 0 18319 7.4 19909 261 60 2416 1877 26.1 1831.9
9877 4527628313 0 14 4 1 2 3 3 1 1 ... 0 15298 7.2 18314 247 40 839 1085 24.7 1529.8
9878 4523772935 1 18 0 1 6 6 5 0 0 ... 0 15339 6.8 17379 201 46 -927 58 20.1 1533.9

9879 rows × 40 columns

# checking whether there are any missing values
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9879 entries, 0 to 9878
Data columns (total 40 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   gameId                        9879 non-null   int64  
 1   blueWins                      9879 non-null   int64  
 2   blueWardsPlaced               9879 non-null   int64  
 3   blueWardsDestroyed            9879 non-null   int64  
 4   blueFirstBlood                9879 non-null   int64  
 5   blueKills                     9879 non-null   int64  
 6   blueDeaths                    9879 non-null   int64  
 7   blueAssists                   9879 non-null   int64  
 8   blueEliteMonsters             9879 non-null   int64  
 9   blueDragons                   9879 non-null   int64  
 10  blueHeralds                   9879 non-null   int64  
 11  blueTowersDestroyed           9879 non-null   int64  
 12  blueTotalGold                 9879 non-null   int64  
 13  blueAvgLevel                  9879 non-null   float64
 14  blueTotalExperience           9879 non-null   int64  
 15  blueTotalMinionsKilled        9879 non-null   int64  
 16  blueTotalJungleMinionsKilled  9879 non-null   int64  
 17  blueGoldDiff                  9879 non-null   int64  
 18  blueExperienceDiff            9879 non-null   int64  
 19  blueCSPerMin                  9879 non-null   float64
 20  blueGoldPerMin                9879 non-null   float64
 21  redWardsPlaced                9879 non-null   int64  
 22  redWardsDestroyed             9879 non-null   int64  
 23  redFirstBlood                 9879 non-null   int64  
 24  redKills                      9879 non-null   int64  
 25  redDeaths                     9879 non-null   int64  
 26  redAssists                    9879 non-null   int64  
 27  redEliteMonsters              9879 non-null   int64  
 28  redDragons                    9879 non-null   int64  
 29  redHeralds                    9879 non-null   int64  
 30  redTowersDestroyed            9879 non-null   int64  
 31  redTotalGold                  9879 non-null   int64  
 32  redAvgLevel                   9879 non-null   float64
 33  redTotalExperience            9879 non-null   int64  
 34  redTotalMinionsKilled         9879 non-null   int64  
 35  redTotalJungleMinionsKilled   9879 non-null   int64  
 36  redGoldDiff                   9879 non-null   int64  
 37  redExperienceDiff             9879 non-null   int64  
 38  redCSPerMin                   9879 non-null   float64
 39  redGoldPerMin                 9879 non-null   float64
dtypes: float64(6), int64(34)
memory usage: 3.0 MB

As we can see, there are no missing values in this data set.

Now we create a sub-dataframe that stores, for each statistic, the difference between the blue side's and the red side's value.

# create new "Lead" columns that will hold the blue-minus-red differences
for i in range(2, 21):
    df[df.columns[i] + "Lead"] = ""
    
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9879 entries, 0 to 9878
Data columns (total 59 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   gameId                            9879 non-null   int64  
 1   blueWins                          9879 non-null   int64  
 2   blueWardsPlaced                   9879 non-null   int64  
 3   blueWardsDestroyed                9879 non-null   int64  
 4   blueFirstBlood                    9879 non-null   int64  
 5   blueKills                         9879 non-null   int64  
 6   blueDeaths                        9879 non-null   int64  
 7   blueAssists                       9879 non-null   int64  
 8   blueEliteMonsters                 9879 non-null   int64  
 9   blueDragons                       9879 non-null   int64  
 10  blueHeralds                       9879 non-null   int64  
 11  blueTowersDestroyed               9879 non-null   int64  
 12  blueTotalGold                     9879 non-null   int64  
 13  blueAvgLevel                      9879 non-null   float64
 14  blueTotalExperience               9879 non-null   int64  
 15  blueTotalMinionsKilled            9879 non-null   int64  
 16  blueTotalJungleMinionsKilled      9879 non-null   int64  
 17  blueGoldDiff                      9879 non-null   int64  
 18  blueExperienceDiff                9879 non-null   int64  
 19  blueCSPerMin                      9879 non-null   float64
 20  blueGoldPerMin                    9879 non-null   float64
 21  redWardsPlaced                    9879 non-null   int64  
 22  redWardsDestroyed                 9879 non-null   int64  
 23  redFirstBlood                     9879 non-null   int64  
 24  redKills                          9879 non-null   int64  
 25  redDeaths                         9879 non-null   int64  
 26  redAssists                        9879 non-null   int64  
 27  redEliteMonsters                  9879 non-null   int64  
 28  redDragons                        9879 non-null   int64  
 29  redHeralds                        9879 non-null   int64  
 30  redTowersDestroyed                9879 non-null   int64  
 31  redTotalGold                      9879 non-null   int64  
 32  redAvgLevel                       9879 non-null   float64
 33  redTotalExperience                9879 non-null   int64  
 34  redTotalMinionsKilled             9879 non-null   int64  
 35  redTotalJungleMinionsKilled       9879 non-null   int64  
 36  redGoldDiff                       9879 non-null   int64  
 37  redExperienceDiff                 9879 non-null   int64  
 38  redCSPerMin                       9879 non-null   float64
 39  redGoldPerMin                     9879 non-null   float64
 40  blueWardsPlacedLead               9879 non-null   object 
 41  blueWardsDestroyedLead            9879 non-null   object 
 42  blueFirstBloodLead                9879 non-null   object 
 43  blueKillsLead                     9879 non-null   object 
 44  blueDeathsLead                    9879 non-null   object 
 45  blueAssistsLead                   9879 non-null   object 
 46  blueEliteMonstersLead             9879 non-null   object 
 47  blueDragonsLead                   9879 non-null   object 
 48  blueHeraldsLead                   9879 non-null   object 
 49  blueTowersDestroyedLead           9879 non-null   object 
 50  blueTotalGoldLead                 9879 non-null   object 
 51  blueAvgLevelLead                  9879 non-null   object 
 52  blueTotalExperienceLead           9879 non-null   object 
 53  blueTotalMinionsKilledLead        9879 non-null   object 
 54  blueTotalJungleMinionsKilledLead  9879 non-null   object 
 55  blueGoldDiffLead                  9879 non-null   object 
 56  blueExperienceDiffLead            9879 non-null   object 
 57  blueCSPerMinLead                  9879 non-null   object 
 58  blueGoldPerMinLead                9879 non-null   object 
dtypes: float64(6), int64(34), object(19)
memory usage: 4.4+ MB
# making the sub-dataframe we want to work on and converting the object columns to numeric
df_dif = df.iloc[:, range(40, 59)].apply(pd.to_numeric)
# filling the data by subtracting the red side from the blue side
for i in range(2, 21):
    df_dif.iloc[:, i-2] = df.iloc[:, i] - df.iloc[:, i+19]
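As a side note, the same difference dataframe can be built in one step, without creating empty placeholder columns first. A minimal sketch (df_dif_alt is a hypothetical name, not used in the rest of the project):

# Alternative sketch: subtract the red-side columns from the blue-side columns directly
blue_cols = df.columns[2:21]   # blueWardsPlaced ... blueGoldPerMin
red_cols = df.columns[21:40]   # redWardsPlaced ... redGoldPerMin
df_dif_alt = pd.DataFrame(
    df[blue_cols].to_numpy() - df[red_cols].to_numpy(),
    columns=[c + "Lead" for c in blue_cols],
    index=df.index,
)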

Looking at the names of the individual columns, I noticed that the gold and experience differentials are already in the data. We need to remove these duplicate entries (columns 15 and 16 of df_dif) and also add the result of the match (the "blueWins" column).

Also, TotalGoldLead reflects the same information as GoldPerMinLead, so GoldPerMinLead is a duplicate. Likewise, KillsLead reflects the same information as DeathsLead, so DeathsLead is a duplicate.

# Check that these columns really are duplicates
df.iloc[:,[17,18,36,37,12,14,31,33]]
blueGoldDiff blueExperienceDiff redGoldDiff redExperienceDiff blueTotalGold blueTotalExperience redTotalGold redTotalExperience
0 643 -8 -643 8 17210 17039 16567 17047
1 -2908 -1173 2908 1173 14712 16265 17620 17438
2 -1172 -1033 1172 1033 16113 16221 17285 17254
3 -1321 -7 1321 7 15157 17954 16478 17961
4 -1004 230 1004 -230 16400 18543 17404 18313
... ... ... ... ... ... ... ... ...
9874 2519 2469 -2519 -2469 17765 18967 15246 16498
9875 782 888 -782 -888 16238 19255 15456 18367
9876 -2416 -1877 2416 1877 15903 18032 18319 19909
9877 -839 -1085 839 1085 14459 17229 15298 18314
9878 927 -58 -927 58 16266 17321 15339 17379

9879 rows × 8 columns

df_dif.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9879 entries, 0 to 9878
Data columns (total 19 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   blueWardsPlacedLead               9879 non-null   int64  
 1   blueWardsDestroyedLead            9879 non-null   int64  
 2   blueFirstBloodLead                9879 non-null   int64  
 3   blueKillsLead                     9879 non-null   int64  
 4   blueDeathsLead                    9879 non-null   int64  
 5   blueAssistsLead                   9879 non-null   int64  
 6   blueEliteMonstersLead             9879 non-null   int64  
 7   blueDragonsLead                   9879 non-null   int64  
 8   blueHeraldsLead                   9879 non-null   int64  
 9   blueTowersDestroyedLead           9879 non-null   int64  
 10  blueTotalGoldLead                 9879 non-null   int64  
 11  blueAvgLevelLead                  9879 non-null   float64
 12  blueTotalExperienceLead           9879 non-null   int64  
 13  blueTotalMinionsKilledLead        9879 non-null   int64  
 14  blueTotalJungleMinionsKilledLead  9879 non-null   int64  
 15  blueGoldDiffLead                  9879 non-null   int64  
 16  blueExperienceDiffLead            9879 non-null   int64  
 17  blueCSPerMinLead                  9879 non-null   float64
 18  blueGoldPerMinLead                9879 non-null   float64
dtypes: float64(3), int64(16)
memory usage: 1.4 MB
# Dropping the redundant columns and adding the match result
df_dif["blueWins"] = df["blueWins"]
df_dif.drop(["blueGoldDiffLead","blueExperienceDiffLead","blueGoldPerMinLead","blueDeathsLead"], inplace=True, axis=1)

Now we have a clean dataframe.

df_dif.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9879 entries, 0 to 9878
Data columns (total 16 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   blueWardsPlacedLead               9879 non-null   int64  
 1   blueWardsDestroyedLead            9879 non-null   int64  
 2   blueFirstBloodLead                9879 non-null   int64  
 3   blueKillsLead                     9879 non-null   int64  
 4   blueAssistsLead                   9879 non-null   int64  
 5   blueEliteMonstersLead             9879 non-null   int64  
 6   blueDragonsLead                   9879 non-null   int64  
 7   blueHeraldsLead                   9879 non-null   int64  
 8   blueTowersDestroyedLead           9879 non-null   int64  
 9   blueTotalGoldLead                 9879 non-null   int64  
 10  blueAvgLevelLead                  9879 non-null   float64
 11  blueTotalExperienceLead           9879 non-null   int64  
 12  blueTotalMinionsKilledLead        9879 non-null   int64  
 13  blueTotalJungleMinionsKilledLead  9879 non-null   int64  
 14  blueCSPerMinLead                  9879 non-null   float64
 15  blueWins                          9879 non-null   int64  
dtypes: float64(2), int64(14)
memory usage: 1.2 MB

Now we can test several models to predict the outcome of a game. Let's see which model works best for League of Legends win prediction.

Baseline prediction

# Get the column names we want to use as predictors
cols=df_dif.columns[0:15]
X_train, X_test, y_train, y_test = train_test_split(df_dif[cols], df_dif["blueWins"], test_size=0.5, random_state=0)
1-y_test.mean()
0.5135627530364373

So our baseline accuracy (always predicting that blue loses) is about 51%.
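We can also sanity-check the class balance on the full dataset (a quick check; given the baseline above, the two classes should be close to 50/50, so plain accuracy is a reasonable metric here):

# fraction of wins and losses for the blue side over all games
df_dif["blueWins"].value_counts(normalize=True)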

Logistic regression

reg = LogisticRegression(max_iter=100)
reg.fit(df_dif[cols], df_dif["blueWins"])
/shared-libs/python3.7/py/lib/python3.7/site-packages/sklearn/linear_model/_logistic.py:818: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG,
LogisticRegression()

At first I ran into the "lbfgs failed to converge" warning with max_iter=100, so I increased it to 1000.
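The warning message also suggests scaling the data as another way to help the solver converge. A minimal sketch of that alternative (using a pipeline; I did not use this in the project and simply raised max_iter instead):

# Alternative fix (not used below): standardize the features before fitting,
# as the convergence warning suggests; lbfgs then usually converges within
# the default number of iterations.
from sklearn.pipeline import make_pipeline
scaled_reg = make_pipeline(StandardScaler(), LogisticRegression())
scaled_reg.fit(df_dif[cols], df_dif["blueWins"])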

reg = LogisticRegression(max_iter=1000)
reg.fit(df_dif[cols], df_dif["blueWins"])
LogisticRegression(max_iter=1000)
reg.score(X_train, y_train)
0.7363838833772018
# Prediction accuracy by Logistic regression
reg.score(X_test, y_test)
0.728744939271255

For logistic regression, we get about 74% accuracy on the training set and 73% on the test set. The difference is small, so we do not have to worry about overfitting.

We store the accuracy for the final comparison.

LR_pred= reg.score(X_test, y_test)

Decision tree

clf = DecisionTreeClassifier(max_depth=7, max_leaf_nodes=40)
clf.fit(X_train, y_train)
DecisionTreeClassifier(max_depth=7, max_leaf_nodes=40)
clf.score(X_train, y_train)
0.756023486535736
# Prediction accuracy by Decision tree
clf.score(X_test, y_test)
0.7192307692307692

For the decision tree model, we get about 76% on the training set and 72% on the test set. The difference is small, so we do not have to worry about overfitting.

For the decision tree, we can dig a little deeper to find more information.

# Visualizing the decision tree (hard to read for this project because of the number of leaves)
fig = plt.figure(figsize=(300,200))
plot_tree(
    clf,
    feature_names=clf.feature_names_in_,
    filled=True
);
(Figure: plot of the fitted decision tree.)

With the decision tree model, one more thing we can do is look at how much each factor influences the outcome, so we know which one matters most for winning.

pd.Series(clf.feature_importances_, index=cols)
blueWardsPlacedLead                 0.008686
blueWardsDestroyedLead              0.003835
blueFirstBloodLead                  0.000000
blueKillsLead                       0.004430
blueAssistsLead                     0.006805
blueEliteMonstersLead               0.013418
blueDragonsLead                     0.026373
blueHeraldsLead                     0.000000
blueTowersDestroyedLead             0.000000
blueTotalGoldLead                   0.839309
blueAvgLevelLead                    0.009342
blueTotalExperienceLead             0.062684
blueTotalMinionsKilledLead          0.004767
blueTotalJungleMinionsKilledLead    0.017190
blueCSPerMinLead                    0.003162
dtype: float64

We can easily see that gold is much more important than any other factor, but let's make an Altair chart to visualize the importances.

# store each factor's importance in a dataframe
df_importance = pd.DataFrame({"importance": clf.feature_importances_, "feature": clf.feature_names_in_})
alt.Chart(df_importance).mark_bar().encode(
    x="importance",
    y="feature",
    tooltip=["importance", "feature"],
).properties(
    title="Importance of factors affecting Blue's win"
)

As the chart shows, gold is the most important factor influencing victory, followed by experience and then dragons.

For a decision tree, the number of leaf nodes can cause problems (overfitting or underfitting). Let's see what number of leaf nodes works best for the model.

train_error_dict = {}
test_error_dict = {}
for n in range(2,40):
    clf = DecisionTreeClassifier(max_depth=10, max_leaf_nodes=n)
    clf.fit(X_train, y_train)
    train_error_dict[n]= log_loss(y_train, clf.predict_proba(X_train))
    test_error_dict[n]= log_loss(y_test, clf.predict_proba(X_test))
df_train = pd.DataFrame({"y":train_error_dict, "type": "train"})
df_test = pd.DataFrame({"y":test_error_dict, "type": "test"})
df_error = pd.concat([df_train, df_test]).reset_index()
alt.Chart(df_error).mark_line(clip=True).encode(
    x="index:O",
    y="y",
    color="type"
)

We find that the test error is lowest around n = 24. Let's retrain the decision tree with this number of leaf nodes and see whether the accuracy improves.
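We can also confirm the best leaf-node count programmatically instead of reading it off the chart (a quick sketch using the test-error dictionary built above):

# leaf-node count with the smallest test log loss
best_n = min(test_error_dict, key=test_error_dict.get)
best_n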

clf = DecisionTreeClassifier(max_depth=7, max_leaf_nodes=24)
clf.fit(X_train, y_train)
DecisionTreeClassifier(max_depth=7, max_leaf_nodes=24)
clf.score(X_train, y_train)
0.7501518526017412
clf.score(X_test, y_test)
0.7226720647773279

We store the accuracy for the final comparison.

DT_pred=clf.score(X_test, y_test)

Random forests

rfe = RandomForestClassifier(n_estimators=1000, max_leaf_nodes=25)
rfe.fit(X_train, y_train)
RandomForestClassifier(max_leaf_nodes=25, n_estimators=1000)
rfe.score(X_train, y_train)
0.755618546264426
rfe.score(X_test, y_test)
0.7230769230769231

Again, the difference between training and test accuracy is small, so we do not have to worry about overfitting. We get about 72% prediction accuracy on the test set.
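The leaf-node limit of 25 was chosen by hand. A sketch of how it could be tuned with cross-validation instead (the grid values and the smaller n_estimators are illustrative choices to keep the search fast, not part of the original analysis):

# Tune max_leaf_nodes for the random forest with 5-fold cross-validation
from sklearn.model_selection import GridSearchCV
param_grid = {"max_leaf_nodes": [10, 25, 50, 100]}
search = GridSearchCV(RandomForestClassifier(n_estimators=100, random_state=0), param_grid, cv=5)
search.fit(X_train, y_train)
search.best_params_, search.best_score_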

Now we again look at the feature importances.

# store each factor's importance in a dataframe
df_importance1 = pd.DataFrame({"importance": rfe.feature_importances_, "feature": rfe.feature_names_in_})
alt.Chart(df_importance1).mark_bar().encode(
    x="importance",
    y="feature",
    tooltip=["importance", "feature"],
).properties(
    title="Importance of factors affecting Blue's win"
)

From the chart, gold and experience are still the two most influential factors, although the importance assigned to gold drops significantly compared with the single decision tree. The third most important factor is KillsLead.

We store the accuracy for the final comparison.

# Prediction accuracy by Random forests
RF_pred = rfe.score(X_test, y_test)

K-Nearest Neighbors Classifier

For the K-Nearest Neighbors classifier, we need to rescale the data first, because KNN is distance-based and features with larger numeric ranges would otherwise dominate the distance.
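Note that the scaler below uses with_std=False, so it only centers the data and does not actually rescale it. A more common choice for a distance-based model like KNN is full standardization (centering and scaling); a minimal sketch of that alternative, which I did not use:

# Alternative sketch: center *and* scale every predictor before KNN
full_scaler = StandardScaler()  # with_mean=True, with_std=True by default
X_scaled = pd.DataFrame(full_scaler.fit_transform(df_dif[cols]), columns=cols, index=df_dif.index)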

# make a copy that stores the rescaled data
df_dif_rescaled=df_dif.copy()
scaler = StandardScaler(with_mean=True, with_std=False)
scaler.fit(df_dif_rescaled[cols])
StandardScaler(with_std=False)
df_dif_rescaled[cols] = scaler.transform(df_dif_rescaled[cols])
X_train1, X_test1, y_train1, y_test1 = train_test_split(df_dif_rescaled[cols], df_dif_rescaled["blueWins"], test_size=0.5, random_state=0)
# make a function so we can vary the number of neighbors and find the best KNN model for League of Legends
def make_KNN(n):
    neigh = KNeighborsClassifier(n_neighbors=n)
    neigh.fit(X_train1, y_train1)
    return(neigh.score(X_train1, y_train1),neigh.score(X_test1, y_test1))
a,b=make_KNN(10)
a
0.7505567928730512
# Prediction accuracy by K-Nearest Neighbors Classifier
b
0.7040485829959514

The training accuracy is noticeably higher than the test accuracy, which suggests mild overfitting, so we try a larger n_neighbors.

a,b=make_KNN(100)
a
0.733144361206722
b
0.7224696356275304

This time it looks better, and we get about 72% accuracy with the K-Nearest Neighbors classifier.
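Trying 10 and 100 neighbors by hand worked, but we could also sweep a range of values with the make_KNN helper defined above (a small sketch; the range of candidate values is an arbitrary choice):

# test accuracy for several n_neighbors values; keep the best one
knn_scores = {n: make_KNN(n)[1] for n in range(10, 201, 10)}
best_k = max(knn_scores, key=knn_scores.get)
best_k, knn_scores[best_k]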

We store the accuracy for the final comparison.

KNN_pred = b

Now we compare the test accuracy of each method.

pd.DataFrame({"prob": [LR_pred, DT_pred, RF_pred, KNN_pred]}, index = ["LR","DT","RF","KNN"] )
prob
LR 0.728745
DT 0.722672
RF 0.723077
KNN 0.722470

All four models have very similar accuracy, with logistic regression holding a tiny lead. Overall, any of the four can be used to predict a win or a loss.

Decision boundary for logistic regression

Based on the decision tree model, we know that gold and experience are the two factors with the highest importance values.

Since logistic regression does a good job of predicting wins, we use it to draw decision boundaries based on the gold and experience leads. With these boundaries, we can read off whether a team's win probability exceeds a given threshold from its gold lead and experience lead.

# get the column names we will use for the decision boundary
cols1 = ["blueTotalGoldLead","blueTotalExperienceLead"]
reg1 = LogisticRegression()
reg1.fit(df_dif[cols1], df_dif["blueWins"])
LogisticRegression()
# get the coefficients and the intercept
Goldcoef, Expcoef = reg1.coef_[0]
intercept = reg1.intercept_[0]
# Formulas for the experience lead that gives a 70% or 50% win probability, given a gold lead
win70 = lambda GoldLead: (1/Expcoef)*(-np.log((1/0.7)-1)-intercept-Goldcoef*GoldLead)
win50 = lambda GoldLead: (1/Expcoef)*(-np.log((1/0.5)-1)-intercept-Goldcoef*GoldLead)
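These formulas come from solving the fitted logistic model, P(win) = 1 / (1 + exp(-(intercept + Goldcoef*GoldLead + Expcoef*ExpLead))), for ExpLead at a fixed probability p. A quick sanity check (a sketch with an arbitrary gold lead of 1000): a point on the 70% curve should give a predicted win probability of about 0.7.

# check that a point on the win70 curve has a predicted win probability of roughly 0.7
gold = 1000
point = pd.DataFrame({"blueTotalGoldLead": [gold], "blueTotalExperienceLead": [win70(gold)]})
reg1.predict_proba(point)[:, 1]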

Altair raises an error when I make the graph with the original dataset because it has more than 5,000 rows. Thus, I take a random sample of 5,000 rows.
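As an aside, another option (which I did not use) is to lift Altair's 5,000-row limit instead of sampling:

# alternative to sampling: disable Altair's default 5,000-row limit
# (the chart becomes heavier with all ~9,900 rows)
alt.data_transformers.disable_max_rows()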

df_sub = df_dif.sample(5000)
df_sub["bdry70"] = df_sub["blueTotalGoldLead"].map(win70)
df_sub["bdry50"] = df_sub["blueTotalGoldLead"].map(win50)
c=alt.Chart(df_sub).mark_circle().encode(
    x=alt.X("blueTotalGoldLead", scale=alt.Scale(zero=False)),
    y=alt.Y("blueTotalExperienceLead", scale=alt.Scale(zero=False)),
    color="blueWins:N"
)
c70= alt.Chart(df_sub).mark_line(color="red").encode(
    x=alt.X("blueTotalGoldLead", scale=alt.Scale(zero=False)),
    y=alt.Y("bdry70", scale=alt.Scale(zero=False)),
    tooltip=["blueTotalGoldLead","bdry70"]
)
c50= alt.Chart(df_sub).mark_line(color="black").encode(
    x=alt.X("blueTotalGoldLead", scale=alt.Scale(zero=False)),
    y=alt.Y("bdry50", scale=alt.Scale(zero=False)),
    tooltip=["blueTotalGoldLead","bdry50"]
)
c+c70+c50

Based on this graph, we can tell whether a team's win probability in League of Legends is above 50% or 70% from how far ahead they are in gold and experience.
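We can also plug a specific lead into the fitted model directly. For example (a sketch with made-up lead values):

# hypothetical example: a 2,000 gold lead and a 1,000 experience lead at 10 minutes;
# the second column of predict_proba is the predicted win probability
example = pd.DataFrame({"blueTotalGoldLead": [2000], "blueTotalExperienceLead": [1000]})
reg1.predict_proba(example)[:, 1]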

Summary

I tested four models: logistic regression, decision tree, random forest, and K-Nearest Neighbors classifier. The prediction accuracy of all four models is around 72%, which I think makes them useful as a reference. Within the decision tree model, I found that gold is the most important winning factor, with experience the next most important. This suggests that a team needs to build a lead in gold and experience in order to win in League of Legends.

References

  • What is the source of your dataset(s)?

This is where I got my data.

  • Were any portions of the code or ideas taken from another source? List those sources here and say how they were used.

This is the website where I learned how to drop a column from a dataframe.

This is the website where I learned how to use the K-Nearest Neighbors Classifier.

  • List other references that you found helpful.

https://www.educative.io/edpresso/how-to-delete-a-column-in-pandas

https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html
