League of Legends Winning Factor¶
Author: Tianyi Chen
Course Project, UC Irvine, Math 10, S22
Introduction¶
My project aims to find a model that can accurately predict whether a League of Legends team will win, based on how far ahead that team is in various in-game statistics. I tried four models: logistic regression, decision tree, random forest, and K-Nearest Neighbors Classifier. All four have similar prediction accuracy. Using the decision tree's feature importances, I found that gold lead is the most important factor affecting the chance of winning.
Main portion of the project¶
import pandas as pd
import altair as alt
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import log_loss
df= pd.read_csv("high_diamond_ranked_10min.csv")
df
gameId | blueWins | blueWardsPlaced | blueWardsDestroyed | blueFirstBlood | blueKills | blueDeaths | blueAssists | blueEliteMonsters | blueDragons | ... | redTowersDestroyed | redTotalGold | redAvgLevel | redTotalExperience | redTotalMinionsKilled | redTotalJungleMinionsKilled | redGoldDiff | redExperienceDiff | redCSPerMin | redGoldPerMin | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 4519157822 | 0 | 28 | 2 | 1 | 9 | 6 | 11 | 0 | 0 | ... | 0 | 16567 | 6.8 | 17047 | 197 | 55 | -643 | 8 | 19.7 | 1656.7 |
1 | 4523371949 | 0 | 12 | 1 | 0 | 5 | 5 | 5 | 0 | 0 | ... | 1 | 17620 | 6.8 | 17438 | 240 | 52 | 2908 | 1173 | 24.0 | 1762.0 |
2 | 4521474530 | 0 | 15 | 0 | 0 | 7 | 11 | 4 | 1 | 1 | ... | 0 | 17285 | 6.8 | 17254 | 203 | 28 | 1172 | 1033 | 20.3 | 1728.5 |
3 | 4524384067 | 0 | 43 | 1 | 0 | 4 | 5 | 5 | 1 | 0 | ... | 0 | 16478 | 7.0 | 17961 | 235 | 47 | 1321 | 7 | 23.5 | 1647.8 |
4 | 4436033771 | 0 | 75 | 4 | 0 | 6 | 6 | 6 | 0 | 0 | ... | 0 | 17404 | 7.0 | 18313 | 225 | 67 | 1004 | -230 | 22.5 | 1740.4 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
9874 | 4527873286 | 1 | 17 | 2 | 1 | 7 | 4 | 5 | 1 | 1 | ... | 0 | 15246 | 6.8 | 16498 | 229 | 34 | -2519 | -2469 | 22.9 | 1524.6 |
9875 | 4527797466 | 1 | 54 | 0 | 0 | 6 | 4 | 8 | 1 | 1 | ... | 0 | 15456 | 7.0 | 18367 | 206 | 56 | -782 | -888 | 20.6 | 1545.6 |
9876 | 4527713716 | 0 | 23 | 1 | 0 | 6 | 7 | 5 | 0 | 0 | ... | 0 | 18319 | 7.4 | 19909 | 261 | 60 | 2416 | 1877 | 26.1 | 1831.9 |
9877 | 4527628313 | 0 | 14 | 4 | 1 | 2 | 3 | 3 | 1 | 1 | ... | 0 | 15298 | 7.2 | 18314 | 247 | 40 | 839 | 1085 | 24.7 | 1529.8 |
9878 | 4523772935 | 1 | 18 | 0 | 1 | 6 | 6 | 5 | 0 | 0 | ... | 0 | 15339 | 6.8 | 17379 | 201 | 46 | -927 | 58 | 20.1 | 1533.9 |
9879 rows × 40 columns
# checking whether there are any missing values
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9879 entries, 0 to 9878
Data columns (total 40 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 gameId 9879 non-null int64
1 blueWins 9879 non-null int64
2 blueWardsPlaced 9879 non-null int64
3 blueWardsDestroyed 9879 non-null int64
4 blueFirstBlood 9879 non-null int64
5 blueKills 9879 non-null int64
6 blueDeaths 9879 non-null int64
7 blueAssists 9879 non-null int64
8 blueEliteMonsters 9879 non-null int64
9 blueDragons 9879 non-null int64
10 blueHeralds 9879 non-null int64
11 blueTowersDestroyed 9879 non-null int64
12 blueTotalGold 9879 non-null int64
13 blueAvgLevel 9879 non-null float64
14 blueTotalExperience 9879 non-null int64
15 blueTotalMinionsKilled 9879 non-null int64
16 blueTotalJungleMinionsKilled 9879 non-null int64
17 blueGoldDiff 9879 non-null int64
18 blueExperienceDiff 9879 non-null int64
19 blueCSPerMin 9879 non-null float64
20 blueGoldPerMin 9879 non-null float64
21 redWardsPlaced 9879 non-null int64
22 redWardsDestroyed 9879 non-null int64
23 redFirstBlood 9879 non-null int64
24 redKills 9879 non-null int64
25 redDeaths 9879 non-null int64
26 redAssists 9879 non-null int64
27 redEliteMonsters 9879 non-null int64
28 redDragons 9879 non-null int64
29 redHeralds 9879 non-null int64
30 redTowersDestroyed 9879 non-null int64
31 redTotalGold 9879 non-null int64
32 redAvgLevel 9879 non-null float64
33 redTotalExperience 9879 non-null int64
34 redTotalMinionsKilled 9879 non-null int64
35 redTotalJungleMinionsKilled 9879 non-null int64
36 redGoldDiff 9879 non-null int64
37 redExperienceDiff 9879 non-null int64
38 redCSPerMin 9879 non-null float64
39 redGoldPerMin 9879 non-null float64
dtypes: float64(6), int64(34)
memory usage: 3.0 MB
As we can see, there are no missing values in this data set.
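For a direct confirmation, we can also count the missing values explicitly (a minimal sketch); this should return 0 for this dataset.
# total number of missing values across the whole dataframe
df.isna().sum().sum()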
Now we create a sub-dataframe that stores, for each statistic, the difference between the blue side's value and the red side's value.
# create new (for now empty) columns in df that will hold the blue-minus-red differences
for i in range(2,21):
    df[df.columns[i]+"Lead"] = ""
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9879 entries, 0 to 9878
Data columns (total 59 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 gameId 9879 non-null int64
1 blueWins 9879 non-null int64
2 blueWardsPlaced 9879 non-null int64
3 blueWardsDestroyed 9879 non-null int64
4 blueFirstBlood 9879 non-null int64
5 blueKills 9879 non-null int64
6 blueDeaths 9879 non-null int64
7 blueAssists 9879 non-null int64
8 blueEliteMonsters 9879 non-null int64
9 blueDragons 9879 non-null int64
10 blueHeralds 9879 non-null int64
11 blueTowersDestroyed 9879 non-null int64
12 blueTotalGold 9879 non-null int64
13 blueAvgLevel 9879 non-null float64
14 blueTotalExperience 9879 non-null int64
15 blueTotalMinionsKilled 9879 non-null int64
16 blueTotalJungleMinionsKilled 9879 non-null int64
17 blueGoldDiff 9879 non-null int64
18 blueExperienceDiff 9879 non-null int64
19 blueCSPerMin 9879 non-null float64
20 blueGoldPerMin 9879 non-null float64
21 redWardsPlaced 9879 non-null int64
22 redWardsDestroyed 9879 non-null int64
23 redFirstBlood 9879 non-null int64
24 redKills 9879 non-null int64
25 redDeaths 9879 non-null int64
26 redAssists 9879 non-null int64
27 redEliteMonsters 9879 non-null int64
28 redDragons 9879 non-null int64
29 redHeralds 9879 non-null int64
30 redTowersDestroyed 9879 non-null int64
31 redTotalGold 9879 non-null int64
32 redAvgLevel 9879 non-null float64
33 redTotalExperience 9879 non-null int64
34 redTotalMinionsKilled 9879 non-null int64
35 redTotalJungleMinionsKilled 9879 non-null int64
36 redGoldDiff 9879 non-null int64
37 redExperienceDiff 9879 non-null int64
38 redCSPerMin 9879 non-null float64
39 redGoldPerMin 9879 non-null float64
40 blueWardsPlacedLead 9879 non-null object
41 blueWardsDestroyedLead 9879 non-null object
42 blueFirstBloodLead 9879 non-null object
43 blueKillsLead 9879 non-null object
44 blueDeathsLead 9879 non-null object
45 blueAssistsLead 9879 non-null object
46 blueEliteMonstersLead 9879 non-null object
47 blueDragonsLead 9879 non-null object
48 blueHeraldsLead 9879 non-null object
49 blueTowersDestroyedLead 9879 non-null object
50 blueTotalGoldLead 9879 non-null object
51 blueAvgLevelLead 9879 non-null object
52 blueTotalExperienceLead 9879 non-null object
53 blueTotalMinionsKilledLead 9879 non-null object
54 blueTotalJungleMinionsKilledLead 9879 non-null object
55 blueGoldDiffLead 9879 non-null object
56 blueExperienceDiffLead 9879 non-null object
57 blueCSPerMinLead 9879 non-null object
58 blueGoldPerMinLead 9879 non-null object
dtypes: float64(6), int64(34), object(19)
memory usage: 4.4+ MB
# make the sub-dataframe we want to work on and convert the object columns to numeric
df_dif = df.iloc[:,range(40,59)].apply(pd.to_numeric)
# fill in each Lead column as the blue side's value minus the red side's value
for i in range(2,21):
    df_dif.iloc[:,i-2] = df.iloc[:,i] - df.iloc[:,i+19]
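As an aside, the same result could be obtained without the intermediate empty-string columns by subtracting the red-side block from the blue-side block directly. A minimal sketch, assuming the blue columns sit at positions 2-20 and the matching red columns at positions 21-39 as shown above (df_dif_alt is a hypothetical name and is not used later):
# vectorized alternative: subtract the red block from the blue block in one step
blue = df.iloc[:, 2:21]
red = df.iloc[:, 21:40]
df_dif_alt = pd.DataFrame(blue.values - red.values, columns=[c + "Lead" for c in blue.columns])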
Looking at the names of the individual columns, I noticed that the gold and experience differences are already in the data. We need to remove these duplicate entries (columns 15 and 16 of df_dif) and add the result of the match (the "blueWins" column).
Also, TotalGoldLead reflects the same thing as GoldPerMinLead, so GoldPerMinLead is a duplicate, and KillsLead reflects the same thing as DeathsLead, so DeathsLead is a duplicate.
# Check that they really are duplicates.
df.iloc[:,[17,18,36,37,12,14,31,33]]
blueGoldDiff | blueExperienceDiff | redGoldDiff | redExperienceDiff | blueTotalGold | blueTotalExperience | redTotalGold | redTotalExperience | |
---|---|---|---|---|---|---|---|---|
0 | 643 | -8 | -643 | 8 | 17210 | 17039 | 16567 | 17047 |
1 | -2908 | -1173 | 2908 | 1173 | 14712 | 16265 | 17620 | 17438 |
2 | -1172 | -1033 | 1172 | 1033 | 16113 | 16221 | 17285 | 17254 |
3 | -1321 | -7 | 1321 | 7 | 15157 | 17954 | 16478 | 17961 |
4 | -1004 | 230 | 1004 | -230 | 16400 | 18543 | 17404 | 18313 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
9874 | 2519 | 2469 | -2519 | -2469 | 17765 | 18967 | 15246 | 16498 |
9875 | 782 | 888 | -782 | -888 | 16238 | 19255 | 15456 | 18367 |
9876 | -2416 | -1877 | 2416 | 1877 | 15903 | 18032 | 18319 | 19909 |
9877 | -839 | -1085 | 839 | 1085 | 14459 | 17229 | 15298 | 18314 |
9878 | 927 | -58 | -927 | 58 | 16266 | 17321 | 15339 | 17379 |
9879 rows × 8 columns
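We can also check these relationships programmatically (a minimal sketch; each line should print True if the corresponding duplication claim holds, and since the data is taken at the ten-minute mark, GoldPerMin should simply be TotalGold divided by 10).
# sanity checks for the duplicated information described above
print((df["blueGoldDiff"] == df["blueTotalGold"] - df["redTotalGold"]).all())
print(np.allclose(df["blueGoldPerMin"] * 10, df["blueTotalGold"]))
print((df["blueKills"] == df["redDeaths"]).all())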
df_dif.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9879 entries, 0 to 9878
Data columns (total 19 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 blueWardsPlacedLead 9879 non-null int64
1 blueWardsDestroyedLead 9879 non-null int64
2 blueFirstBloodLead 9879 non-null int64
3 blueKillsLead 9879 non-null int64
4 blueDeathsLead 9879 non-null int64
5 blueAssistsLead 9879 non-null int64
6 blueEliteMonstersLead 9879 non-null int64
7 blueDragonsLead 9879 non-null int64
8 blueHeraldsLead 9879 non-null int64
9 blueTowersDestroyedLead 9879 non-null int64
10 blueTotalGoldLead 9879 non-null int64
11 blueAvgLevelLead 9879 non-null float64
12 blueTotalExperienceLead 9879 non-null int64
13 blueTotalMinionsKilledLead 9879 non-null int64
14 blueTotalJungleMinionsKilledLead 9879 non-null int64
15 blueGoldDiffLead 9879 non-null int64
16 blueExperienceDiffLead 9879 non-null int64
17 blueCSPerMinLead 9879 non-null float64
18 blueGoldPerMinLead 9879 non-null float64
dtypes: float64(3), int64(16)
memory usage: 1.4 MB
# add the match result and drop the redundant columns
df_dif["blueWins"] = df["blueWins"]
df_dif.drop(["blueGoldDiffLead","blueExperienceDiffLead","blueGoldPerMinLead","blueDeathsLead"], inplace=True, axis=1)
Now we have a clean dataframe.
df_dif.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9879 entries, 0 to 9878
Data columns (total 16 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 blueWardsPlacedLead 9879 non-null int64
1 blueWardsDestroyedLead 9879 non-null int64
2 blueFirstBloodLead 9879 non-null int64
3 blueKillsLead 9879 non-null int64
4 blueAssistsLead 9879 non-null int64
5 blueEliteMonstersLead 9879 non-null int64
6 blueDragonsLead 9879 non-null int64
7 blueHeraldsLead 9879 non-null int64
8 blueTowersDestroyedLead 9879 non-null int64
9 blueTotalGoldLead 9879 non-null int64
10 blueAvgLevelLead 9879 non-null float64
11 blueTotalExperienceLead 9879 non-null int64
12 blueTotalMinionsKilledLead 9879 non-null int64
13 blueTotalJungleMinionsKilledLead 9879 non-null int64
14 blueCSPerMinLead 9879 non-null float64
15 blueWins 9879 non-null int64
dtypes: float64(2), int64(14)
memory usage: 1.2 MB
We can now test several models for predicting the outcome of a game. Let's see which model works best for League of Legends win prediction.
Baseline prediction¶
# Get the names of the columns we want to use as predictors.
cols=df_dif.columns[0:15]
X_train, X_test, y_train, y_test = train_test_split(df_dif[cols], df_dif["blueWins"], test_size=0.5, random_state=0)
1-y_test.mean()
0.5135627530364373
So our baseline accuracy (always predicting that the blue side loses) is about 51%.
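The same baseline can be reproduced with scikit-learn's DummyClassifier (a minimal sketch); it should land near the 51% computed above, assuming the majority class in the training split is also a blue loss.
# always predict the most frequent class seen in training
from sklearn.dummy import DummyClassifier
dummy = DummyClassifier(strategy="most_frequent")
dummy.fit(X_train, y_train)
dummy.score(X_test, y_test)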
Logistic regression¶
reg = LogisticRegression(max_iter=100)
reg.fit(df_dif[cols], df_dif["blueWins"])
/shared-libs/python3.7/py/lib/python3.7/site-packages/sklearn/linear_model/_logistic.py:818: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG,
LogisticRegression()
At first I ran into the "lbfgs failed to converge" warning with the default of 100 iterations, so I increased max_iter to 1000.
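As the warning itself suggests, another option would have been to standardize the features before fitting, for example with a pipeline. A minimal sketch (not what I use below, which simply raises max_iter); on scaled data lbfgs usually converges without the warning.
# scale the features, then fit logistic regression
from sklearn.pipeline import make_pipeline
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)
pipe.score(X_test, y_test)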
reg = LogisticRegression(max_iter=1000)
reg.fit(df_dif[cols], df_dif["blueWins"])
LogisticRegression(max_iter=1000)
reg.score(X_train, y_train)
0.7363838833772018
# Prediction accuracy by Logistic regression
reg.score(X_test, y_test)
0.728744939271255
For logistic regression, we get about 74% on the training set and 73% on the test set. The difference is small, so we do not have to worry about overfitting.
We store the accuracy for the final comparison.
LR_pred= reg.score(X_test, y_test)
Decision tree¶
clf = DecisionTreeClassifier(max_depth=7, max_leaf_nodes=40)
clf.fit(X_train, y_train)
DecisionTreeClassifier(max_depth=7, max_leaf_nodes=40)
clf.score(X_train, y_train)
0.756023486535736
# Prediction accuracy by Decision tree
clf.score(X_test, y_test)
0.7192307692307692
For the decision tree model, we get 76% on the training set and 72% on the test set. The difference is small, so we do not have to worry about overfitting.
For the decision tree, we can dig a little deeper to find more information.
# Visualizing the decision tree (not very readable for this project)
fig = plt.figure(figsize=(300,200))
plot_tree(
clf,
feature_names=clf.feature_names_in_,
filled=True
);
With the decision tree model, one more thing we can do is look at how much each factor influences the outcome, so we know which factor matters most for winning.
pd.Series(clf.feature_importances_, index=cols)
blueWardsPlacedLead 0.008686
blueWardsDestroyedLead 0.003835
blueFirstBloodLead 0.000000
blueKillsLead 0.004430
blueAssistsLead 0.006805
blueEliteMonstersLead 0.013418
blueDragonsLead 0.026373
blueHeraldsLead 0.000000
blueTowersDestroyedLead 0.000000
blueTotalGoldLead 0.839309
blueAvgLevelLead 0.009342
blueTotalExperienceLead 0.062684
blueTotalMinionsKilledLead 0.004767
blueTotalJungleMinionsKilledLead 0.017190
blueCSPerMinLead 0.003162
dtype: float64
We can easily see that gold is far more important than any other factor, but let's make an Altair chart to visualize these importances.
# store each factor's importance in a dataframe
df_importance = pd.DataFrame({"importance": clf.feature_importances_, "feature": clf.feature_names_in_})
alt.Chart(df_importance).mark_bar().encode(
x="importance",
y="feature",
tooltip=["importance", "feature"],
).properties(
title="Importance of factors affecting Blue's win"
)
As we can see from the chart, gold is the most important factor influencing victory, followed by experience and then dragons.
For a decision tree, the number of leaf nodes can cause problems (overfitting or underfitting). Let's see which number of leaf nodes works best for the model.
train_error_dict = {}
test_error_dict = {}
# record the train and test log loss for each number of leaf nodes
for n in range(2,40):
    clf = DecisionTreeClassifier(max_depth=10, max_leaf_nodes=n)
    clf.fit(X_train, y_train)
    train_error_dict[n] = log_loss(y_train, clf.predict_proba(X_train))
    test_error_dict[n] = log_loss(y_test, clf.predict_proba(X_test))
df_train = pd.DataFrame({"y":train_error_dict, "type": "train"})
df_test = pd.DataFrame({"y":test_error_dict, "type": "test"})
df_error = pd.concat([df_train, df_test]).reset_index()
alt.Chart(df_error).mark_line(clip=True).encode(
x="index:O",
y="y",
color="type"
)
Looking at the chart, n = 24 leaf nodes appears best for the model.
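Rather than reading the minimum off the chart, we could also pick it programmatically from the test errors computed above (a minimal sketch; it should agree with the value we read from the chart).
# the leaf count with the lowest test log loss
best_n = min(test_error_dict, key=test_error_dict.get)
best_n
Let's refit the decision tree with this number of leaf nodes to see whether the accuracy improves.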
clf = DecisionTreeClassifier(max_depth=7, max_leaf_nodes=24)
clf.fit(X_train, y_train)
DecisionTreeClassifier(max_depth=7, max_leaf_nodes=24)
clf.score(X_train, y_train)
0.7501518526017412
clf.score(X_test, y_test)
0.7226720647773279
We store the accuracy for the final comparison.
DT_pred=clf.score(X_test, y_test)
Random forests¶
rfe = RandomForestClassifier(n_estimators=1000, max_leaf_nodes=25)
rfe.fit(X_train, y_train)
RandomForestClassifier(max_leaf_nodes=25, n_estimators=1000)
rfe.score(X_train, y_train)
0.755618546264426
rfe.score(X_test, y_test)
0.7230769230769231
Again, the difference between the training accuracy and the test accuracy is small, so we do not have to worry about overfitting. We get about 72% prediction accuracy on the test set.
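Random forests also provide an out-of-bag accuracy estimate, which is a convenient extra check that does not require a separate test set. A minimal sketch (a separate fit, not part of the model above); the out-of-bag score should land in the same neighborhood as the test accuracy.
# out-of-bag estimate of accuracy (a sketch)
rfe_oob = RandomForestClassifier(n_estimators=1000, max_leaf_nodes=25, oob_score=True)
rfe_oob.fit(X_train, y_train)
rfe_oob.oob_score_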
Now we again look at the importance of each factor.
# store each factor's importance in a dataframe
df_importance1 = pd.DataFrame({"importance": rfe.feature_importances_, "feature": rfe.feature_names_in_})
alt.Chart(df_importance1).mark_bar().encode(
x="importance",
y="feature",
tooltip=["importance", "feature"],
).properties(
title="Importance of factors affecting Blue's win"
)
From the chart, gold and experience are still the two most influential factors, but gold's importance value is noticeably lower than in the single decision tree, and the third most important factor is now KillsLead.
We store the accuracy for the final comparison.
# Prediction accuracy by Random forests
RF_pred = rfe.score(X_test, y_test)
K-Nearest Neighbors Classifier¶
For the K-Nearest Neighbors Classifier, we need to rescale the data first.
# make a copy that will hold the rescaled data
df_dif_rescaled=df_dif.copy()
scaler = StandardScaler(with_mean=True, with_std=False)
scaler.fit(df_dif_rescaled[cols])
StandardScaler(with_std=False)
df_dif_rescaled[cols] = scaler.transform(df_dif_rescaled[cols])
X_train1, X_test1, y_train1, y_test1 = train_test_split(df_dif_rescaled[cols], df_dif_rescaled["blueWins"], test_size=0.5, random_state=0)
# make a function so we can vary the number of neighbors and look for the best
# K-Nearest Neighbors Classifier for League of Legends
def make_KNN(n):
    neigh = KNeighborsClassifier(n_neighbors=n)
    neigh.fit(X_train1, y_train1)
    return (neigh.score(X_train1, y_train1), neigh.score(X_test1, y_test1))
a,b=make_KNN(10)
a
0.7505567928730512
# Prediction accuracy by K-Nearest Neighbors Classifier
b
0.7040485829959514
It looks like we are overfitting (the training accuracy is noticeably higher than the test accuracy), so we try a larger n_neighbors to reduce the model's complexity.
a,b=make_KNN(100)
a
0.733144361206722
b
0.7224696356275304
This time it looks better, and we get about 72% accuracy with the K-Nearest Neighbors Classifier.
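Instead of trying a couple of values by hand, we could also sweep over n_neighbors with the make_KNN helper defined above and watch how the train and test accuracies move (a minimal sketch; the particular range of k values is an arbitrary choice).
# train/test accuracy for a range of neighbor counts (a sketch)
knn_scores = {n: make_KNN(n) for n in range(10, 201, 10)}
df_knn = pd.DataFrame(knn_scores, index=["train", "test"]).T
df_knn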
We store the accuracy for the final comparison.
KNN_pred = b
Now we compare the prediction accuracy of each model.
pd.DataFrame({"prob": [LR_pred, DT_pred, RF_pred, KNN_pred]}, index = ["LR","DT","RF","KNN"] )
prob | |
---|---|
LR | 0.728745 |
DT | 0.722672 |
RF | 0.723077 |
KNN | 0.722470 |
All four models have very similar accuracy, with logistic regression holding a tiny lead. Overall, any of these four models could be used to predict a win or a loss.
Decision boundary for logistic regression¶
Based on the decision tree model, we know that gold and experience are the two factors with the highest importance values.
Since logistic regression does a good job of predicting wins, we use it to draw a decision boundary based on gold and experience. Using the decision boundary, we can estimate a team's win rate from a specific experience lead and gold lead.
# the new columns we will work with for the decision boundary
cols1 = ["blueTotalGoldLead","blueTotalExperienceLead"]
reg1 = LogisticRegression()
reg1.fit(df_dif[cols1], df_dif["blueWins"])
LogisticRegression()
# get the coefficients and the intercept
Goldcoef, Expcoef = reg1.coef_[0]
intercept = reg1.intercept_[0]
# experience lead needed for a 70% and a 50% predicted win rate, given a gold lead
win70 = lambda GoldLead: (1/Expcoef)*(-np.log((1/0.7)-1)-intercept-Goldcoef*GoldLead)
win50 = lambda GoldLead: (1/Expcoef)*(-np.log((1/0.5)-1)-intercept-Goldcoef*GoldLead)
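These formulas come from solving the logistic model for the experience lead: the model estimates P(win) = 1/(1 + exp(-(intercept + Goldcoef*GoldLead + Expcoef*ExpLead))), so fixing P(win) = p and solving for ExpLead gives ExpLead = (1/Expcoef)*(-log(1/p - 1) - intercept - Goldcoef*GoldLead). As a quick sanity check (a minimal sketch), points on the 50% line should give predicted probabilities close to 0.5.
# evaluate the fitted model at two points on the 50% boundary (a sketch)
check = pd.DataFrame({"blueTotalGoldLead": [0, 2000], "blueTotalExperienceLead": [win50(0), win50(2000)]})
reg1.predict_proba(check)[:, 1]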
Altair raises an error when I try to plot the original dataset because it has more than 5,000 rows, so I take a random sample of 5,000 rows instead.
df_sub = df_dif.sample(5000)
df_sub["bdry70"] = df_sub["blueTotalGoldLead"].map(win70)
df_sub["bdry50"] = df_sub["blueTotalGoldLead"].map(win50)
c=alt.Chart(df_sub).mark_circle().encode(
x=alt.X("blueTotalGoldLead", scale=alt.Scale(zero=False)),
y=alt.Y("blueTotalExperienceLead", scale=alt.Scale(zero=False)),
color="blueWins:N"
)
c70= alt.Chart(df_sub).mark_line(color="red").encode(
x=alt.X("blueTotalGoldLead", scale=alt.Scale(zero=False)),
y=alt.Y("bdry70", scale=alt.Scale(zero=False)),
tooltip=["blueTotalGoldLead","bdry70"]
)
c50= alt.Chart(df_sub).mark_line(color="black").encode(
x=alt.X("blueTotalGoldLead", scale=alt.Scale(zero=False)),
y=alt.Y("bdry50", scale=alt.Scale(zero=False)),
tooltip=["blueTotalGoldLead","bdry50"]
)
c+c70+c50
Based on this graph, we can tell whether a team's predicted win rate in League of Legends is above 50% or 70% from how far ahead it is in experience and gold.
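The fitted model can also be queried directly. For example, to estimate the win probability for a team with a hypothetical 2,000 gold lead and 1,000 experience lead at ten minutes (a sketch; the numbers are made up for illustration):
# predicted win probability for a hypothetical lead (a sketch)
example = pd.DataFrame({"blueTotalGoldLead": [2000], "blueTotalExperienceLead": [1000]})
reg1.predict_proba(example)[0, 1]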
Summary¶
I tested four models: logistic regression, decision tree, random forest, and K-Nearest Neighbors Classifier. The prediction accuracy of all four models is around 72%, which I think makes them a useful reference. Within the decision tree model, I found that gold is the most important winning factor, followed by experience. This suggests that a team needs to find a way to build a lead in gold and experience in order to win in League of Legends.
References¶
What is the source of your dataset(s)?
This is where I got my data.
Were any portions of the code or ideas taken from another source? List those sources here and say how they were used.
This is the website where I learned how to drop a column in a dataframe.
This is the website where I learned how to use the K-Nearest Neighbors Classifier.
List other references that you found helpful.
https://www.educative.io/edpresso/how-to-delete-a-column-in-pandas
https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html