Men’s Volleyball Performance Prediction#

Author: Mingyang Yi

Course Project, UC Irvine, Math 10, S23

Introduction#

As a volleyball fan, I’m very interested in which factors affect the result of a match the most, and I also want to predict a team’s performance.

Data Cleaning#

import pandas as pd
df=pd.read_csv("mensvolleyball-PlusLiga08-23.csv")

Here I keep only the Team 1 columns (plus Date and Winner) using DataFrame.filter with a regex, an approach from the pandas (PyData) documentation; I then strip the "1" from the column names so the two per-team DataFrames share the same columns.

df2 = df.filter(regex="T1|Winner|Team_1|Date").copy()

In the raw data, Winner is 0 when Team 1 wins and 1 when Team 2 wins. Since I will stack the two per-team DataFrames later, I flip this boolean for the Team 1 rows so that Winner = 1 always means the team in that row won.

df2['Winner'] = df2['Winner'].replace({1: 0, 0: 1})
df2.columns = df2.columns.str.replace("1", "")
df2
Date Team_ T_Score T_Sum T_BP T_Ratio T_Srv_Sum T_Srv_Err T_Srv_Ace T_Srv_Eff ... T_Rec_Perf T_Att_Sum T_Att_Err T_Att_Blk T_Att_Kill T_Att_Kill_Perc T_Att_Eff T_Blk_Sum T_Blk_As Winner
0 01.10.2022, 14:45 AZS Olsztyn 1 60.0 17.0 11.0 79.0 18 6.0 -13% ... 25% 100 7.0 14.0 47.0 47% 26% 7.0 11 0
1 30.09.2022, 17:30 Jastrzębski Węgiel 3 51.0 17.0 27.0 77.0 15 4.0 -7% ... 16% 88 4.0 1.0 43.0 48% 43% 4.0 8 1
2 01.10.2022, 20:30 LUK Lublin 2 76.0 23.0 35.0 109.0 16 3.0 -9% ... 21% 115 6.0 10.0 63.0 54% 40% 10.0 9 0
3 02.10.2022, 14:45 Warta Zawiercie 3 66.0 16.0 22.0 98.0 21 5.0 -16% ... 12% 92 8.0 7.0 52.0 56% 40% 9.0 11 1
4 03.10.2022, 17:30 BBTS Bielsko-Biała 1 63.0 22.0 17.0 100.0 19 7.0 -7% ... 23% 97 5.0 10.0 48.0 49% 34% 8.0 10 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2634 20.03.2010, 17:00 Pamapol Wielton Wieluń 3 50.0 74.0 6.0 11.0 2,00 37.0 0 ... 18 48% 67.0 4.0 7.0 35 52% 9.0 3,00 1
2635 19.03.2010, 18:00 ZAKSA Kędzierzyn-Koźle 3 54.0 74.0 4.0 11.0 1,33 46.0 2 ... 18 39% 74.0 4.0 9.0 41 55% 9.0 3,00 1
2636 20.03.2010, 17:00 PGE Skra Bełchatów 3 54.0 75.0 5.0 12.0 1,67 54.0 5 ... 15 27% 69.0 3.0 5.0 41 59% 8.0 2,67 1
2637 20.03.2010, 17:00 Asseco Resovia 3 55.0 73.0 8.0 6.0 2,67 49.0 1 ... 19 38% 88.0 5.0 7.0 42 48% 5.0 1,67 1
2638 20.03.2010, 14:45 Chemik Bydgoszcz 0 43.0 64.0 1.0 12.0 0,33 65.0 1 ... 26 40% 89.0 9.0 7.0 41 46% 1.0 0,33 0

2639 rows × 23 columns

df3=df.filter(regex='T2|Winner|Team_2|Date')
df3.columns = df3.columns.str.replace("2", "")
df3
Date Team_ T_Score T_Sum T_BP T_Ratio T_Srv_Sum T_Srv_Err T_Srv_Ace T_Srv_Eff ... T_Rec_Perf T_Att_Sum T_Att_Err T_Att_Blk T_Att_Kill T_Att_Kill_Perc T_Att_Eff T_Blk_Sum T_Blk_As Winner
0 01.10.2022, 14:45 ZAKSA Kędzierzyn-Koźle 3 69 30 38 96 11 10 2% ... 26% 88 7 7 45 51% 35% 14 11 1
1 30.09.2022, 17:30 GKS Katowice 0 48 16 16 70 16 4 -11% ... 20% 91 8 4 43 47% 34% 1 17 0
2 01.10.2022, 20:30 Czarni Radom 3 82 23 40 104 19 9 -5% ... 18% 128 10 10 63 49% 33% 10 13 1
3 02.10.2022, 14:45 PGE Skra Bełchatów 2 71 21 25 103 23 8 -8% ... 9% 102 9 9 56 54% 37% 7 14 0
4 03.10.2022, 17:30 Cuprum Lubin 3 80 30 32 103 26 12 -8% ... 22% 109 7 8 58 53% 39% 10 10 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2634 20.03.2010, 17:00 AZS Częstochowa 0 34 52 0 15 0,00 60 3 ... 26 43% 70 9 9 27 39% 7 2,33 0
2635 19.03.2010, 18:00 AZS Olsztyn 0 39 57 2 11 0,67 63 4 ... 14 22% 80 10 9 28 35% 9 3,00 0
2636 20.03.2010, 17:00 Jadar Radom 0 43 67 4 13 1,33 63 5 ... 11 17% 66 7 8 35 53% 5 1,67 0
2637 20.03.2010, 17:00 Projekt Warszawa 0 37 59 1 10 0,33 67 8 ... 16 23% 82 8 6 31 38% 6 2,00 0
2638 20.03.2010, 14:45 Jastrzębski Węgiel 3 50 66 1 9 0,33 52 1 ... 26 50% 73 7 1 42 58% 7 2,33 1

2639 rows × 23 columns

df1 = pd.concat([df2, df3], axis=0, ignore_index=True)
perc_cols = ['T_Srv_Eff', 'T_Rec_Pos', 'T_Rec_Perf', 'T_Att_Kill_Perc', 'T_Att_Eff', 'T_Att_Sum']
for col in perc_cols:
    # strip the trailing "%" so these columns become numeric
    df1[col] = pd.to_numeric(df1[col].str.replace('%', ''))
float_cols = ['T_Srv_Err', 'T_Blk_As']
for col in float_cols:
    # older seasons use a comma as the decimal separator (e.g. "2,00")
    df1[col] = pd.to_numeric(df1[col].str.replace(',', '.'))

I convert the points scored on aces, attack kills, and blocks into fractions of the team’s total points, because some matches last 5 sets while others last only 3, so raw point counts are not comparable across matches.

df1["T_Att_Kill"]=df1["T_Att_Kill"]/df1["T_Sum"]
df1["T_Blk_Sum"]=df1["T_Blk_Sum"]/df1["T_Sum"]
df1["T_Srv_Ace"]=df1["T_Srv_Ace"]/df1["T_Sum"]
df1
Date Team_ T_Score T_Sum T_BP T_Ratio T_Srv_Sum T_Srv_Err T_Srv_Ace T_Srv_Eff ... T_Rec_Perf T_Att_Sum T_Att_Err T_Att_Blk T_Att_Kill T_Att_Kill_Perc T_Att_Eff T_Blk_Sum T_Blk_As Winner
0 01.10.2022, 14:45 AZS Olsztyn 1 60.0 17.0 11.0 79.0 18.00 0.100000 -13 ... 25 100 7.0 14.0 0.783333 47 26 0.116667 11.00 0
1 30.09.2022, 17:30 Jastrzębski Węgiel 3 51.0 17.0 27.0 77.0 15.00 0.078431 -7 ... 16 88 4.0 1.0 0.843137 48 43 0.078431 8.00 1
2 01.10.2022, 20:30 LUK Lublin 2 76.0 23.0 35.0 109.0 16.00 0.039474 -9 ... 21 115 6.0 10.0 0.828947 54 40 0.131579 9.00 0
3 02.10.2022, 14:45 Warta Zawiercie 3 66.0 16.0 22.0 98.0 21.00 0.075758 -16 ... 12 92 8.0 7.0 0.787879 56 40 0.136364 11.00 1
4 03.10.2022, 17:30 BBTS Bielsko-Biała 1 63.0 22.0 17.0 100.0 19.00 0.111111 -7 ... 23 97 5.0 10.0 0.761905 49 34 0.126984 10.00 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5273 20.03.2010, 17:00 AZS Częstochowa 0 34.0 52.0 0.0 15.0 0.00 1.764706 3 ... 26 43 70.0 9.0 0.264706 27 39 0.205882 2.33 0
5274 19.03.2010, 18:00 AZS Olsztyn 0 39.0 57.0 2.0 11.0 0.67 1.615385 4 ... 14 22 80.0 10.0 0.230769 28 35 0.230769 3.00 0
5275 20.03.2010, 17:00 Jadar Radom 0 43.0 67.0 4.0 13.0 1.33 1.465116 5 ... 11 17 66.0 7.0 0.186047 35 53 0.116279 1.67 0
5276 20.03.2010, 17:00 Projekt Warszawa 0 37.0 59.0 1.0 10.0 0.33 1.810811 8 ... 16 23 82.0 8.0 0.162162 31 38 0.162162 2.00 0
5277 20.03.2010, 14:45 Jastrzębski Węgiel 3 50.0 66.0 1.0 9.0 0.33 1.040000 1 ... 26 50 73.0 7.0 0.020000 42 58 0.140000 2.33 1

5278 rows × 23 columns

Model Training#

In this section I train a decision tree classifier on the cleaned per-team statistics to predict whether a team won its match. I first confirm there are no missing values and drop the columns that directly encode the outcome, then check a train/test split for overfitting before tuning the number of leaf nodes.

df1.isna().any(axis=0)
Date               False
Team_              False
T_Score            False
T_Sum              False
T_BP               False
T_Ratio            False
T_Srv_Sum          False
T_Srv_Err          False
T_Srv_Ace          False
T_Srv_Eff          False
T_Rec_Sum          False
T_Rec_Err          False
T_Rec_Pos          False
T_Rec_Perf         False
T_Att_Sum          False
T_Att_Err          False
T_Att_Blk          False
T_Att_Kill         False
T_Att_Kill_Perc    False
T_Att_Eff          False
T_Blk_Sum          False
T_Blk_As           False
Winner             False
dtype: bool

The first few columns record the sets and total points the team won, so they essentially give away the result and are not good predictors; I only use the columns from T_Srv_Sum through T_Blk_As.

X = df1.loc[:, "T_Srv_Sum":"T_Blk_As"]
y=df1["Winner"]
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = DecisionTreeClassifier(max_depth=5)
clf.fit(X_train, y_train)
DecisionTreeClassifier(max_depth=5)
a=clf.score(X_train, y_train)
b=clf.score(X_test, y_test)
print(a,b)
0.8889152060634771 0.8702651515151515

The training accuracy (about 0.889) and test accuracy (about 0.870) are close, so the tree is not overfitting and we can move on.
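As an extra check (a minimal sketch reusing X, y, and DecisionTreeClassifier from above; cv_scores is just an illustrative name), 5-fold cross-validation gives another view of how well the depth-5 tree generalizes:

from sklearn.model_selection import cross_val_score
# 5-fold cross-validated accuracy of a depth-5 tree on the full data
cv_scores = cross_val_score(DecisionTreeClassifier(max_depth=5), X, y, cv=5)
print(cv_scores.mean(), cv_scores.std())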

from sklearn.tree import plot_tree
import matplotlib.pyplot as plt
df_err = pd.DataFrame(columns=["leaves", "error", "set"])

Next, compute the training and test error curves as a function of the number of leaf nodes.

for n in range(2, 200):
    clf2 = DecisionTreeClassifier(max_leaf_nodes=n, random_state=42)
    clf2.fit(X_train, y_train)
    train_error = 1 - clf2.score(X_train, y_train)
    test_error = 1 - clf2.score(X_test, y_test)
    d_train = {"leaves": n, "error": train_error, "set":"train"}
    d_test = {"leaves": n, "error": test_error, "set":"test"}
    df_err.loc[len(df_err)] = d_train
    df_err.loc[len(df_err)] = d_test
import altair as alt
c = alt.Chart(df_err).mark_line().encode(
    x="leaves",
    y="error",
    color="set"
)
c

The test error curve bottoms out at approximately 17 leaves, so I use max_leaf_nodes=17 below.
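The same spot can also be located programmatically instead of reading it off the chart (a small sketch using the df_err table built above):

# locate the leaf count with the lowest test error
test_err = df_err[df_err["set"] == "test"].copy()
test_err["error"] = test_err["error"].astype(float)
print(test_err.loc[test_err["error"].idxmin(), ["leaves", "error"]])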

clf1 = DecisionTreeClassifier(max_depth=5, max_leaf_nodes=17)
clf1.fit(X, y)
DecisionTreeClassifier(max_depth=5, max_leaf_nodes=17)
clf1.score(X, y)
0.8819628647214854
fig = plt.figure(figsize=(100,200))
plot_tree(
    clf1,
    feature_names=clf1.feature_names_in_,
    filled=True
);
(Figure: the fitted decision tree, drawn with plot_tree.)

Predicting with Logistic Regression#

from sklearn.linear_model import LogisticRegression
clf3 = LogisticRegression()
clf3.fit(X,y)
/shared-libs/python3.7/py/lib/python3.7/site-packages/sklearn/linear_model/_logistic.py:818: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG,
LogisticRegression()
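The convergence warning above can be addressed by scaling the features and/or increasing max_iter. One possible fix is sketched below (clf3_scaled is only illustrative; the rest of the analysis keeps the clf3 fit as it is):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
# standardize the features so the lbfgs solver converges more easily
clf3_scaled = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf3_scaled.fit(X_train, y_train)
print(clf3_scaled.score(X_test, y_test))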
df1["Pred"] = clf3.predict(X)
df1
Date Team_ T_Score T_Sum T_BP T_Ratio T_Srv_Sum T_Srv_Err T_Srv_Ace T_Srv_Eff ... T_Att_Sum T_Att_Err T_Att_Blk T_Att_Kill T_Att_Kill_Perc T_Att_Eff T_Blk_Sum T_Blk_As Winner Pred
0 01.10.2022, 14:45 AZS Olsztyn 1 60.0 17.0 11.0 79.0 18.00 0.100000 -13 ... 100 7.0 14.0 0.783333 47 26 0.116667 11.00 0 0
1 30.09.2022, 17:30 Jastrzębski Węgiel 3 51.0 17.0 27.0 77.0 15.00 0.078431 -7 ... 88 4.0 1.0 0.843137 48 43 0.078431 8.00 1 1
2 01.10.2022, 20:30 LUK Lublin 2 76.0 23.0 35.0 109.0 16.00 0.039474 -9 ... 115 6.0 10.0 0.828947 54 40 0.131579 9.00 0 0
3 02.10.2022, 14:45 Warta Zawiercie 3 66.0 16.0 22.0 98.0 21.00 0.075758 -16 ... 92 8.0 7.0 0.787879 56 40 0.136364 11.00 1 0
4 03.10.2022, 17:30 BBTS Bielsko-Biała 1 63.0 22.0 17.0 100.0 19.00 0.111111 -7 ... 97 5.0 10.0 0.761905 49 34 0.126984 10.00 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5273 20.03.2010, 17:00 AZS Częstochowa 0 34.0 52.0 0.0 15.0 0.00 1.764706 3 ... 43 70.0 9.0 0.264706 27 39 0.205882 2.33 0 0
5274 19.03.2010, 18:00 AZS Olsztyn 0 39.0 57.0 2.0 11.0 0.67 1.615385 4 ... 22 80.0 10.0 0.230769 28 35 0.230769 3.00 0 0
5275 20.03.2010, 17:00 Jadar Radom 0 43.0 67.0 4.0 13.0 1.33 1.465116 5 ... 17 66.0 7.0 0.186047 35 53 0.116279 1.67 0 0
5276 20.03.2010, 17:00 Projekt Warszawa 0 37.0 59.0 1.0 10.0 0.33 1.810811 8 ... 23 82.0 8.0 0.162162 31 38 0.162162 2.00 0 0
5277 20.03.2010, 14:45 Jastrzębski Węgiel 3 50.0 66.0 1.0 9.0 0.33 1.040000 1 ... 50 73.0 7.0 0.020000 42 58 0.140000 2.33 1 1

5278 rows × 24 columns

df1[df1["Winner"] == df1["Pred"]]
Date Team_ T_Score T_Sum T_BP T_Ratio T_Srv_Sum T_Srv_Err T_Srv_Ace T_Srv_Eff ... T_Att_Sum T_Att_Err T_Att_Blk T_Att_Kill T_Att_Kill_Perc T_Att_Eff T_Blk_Sum T_Blk_As Winner Pred
0 01.10.2022, 14:45 AZS Olsztyn 1 60.0 17.0 11.0 79.0 18.00 0.100000 -13 ... 100 7.0 14.0 0.783333 47 26 0.116667 11.00 0 0
1 30.09.2022, 17:30 Jastrzębski Węgiel 3 51.0 17.0 27.0 77.0 15.00 0.078431 -7 ... 88 4.0 1.0 0.843137 48 43 0.078431 8.00 1 1
2 01.10.2022, 20:30 LUK Lublin 2 76.0 23.0 35.0 109.0 16.00 0.039474 -9 ... 115 6.0 10.0 0.828947 54 40 0.131579 9.00 0 0
4 03.10.2022, 17:30 BBTS Bielsko-Biała 1 63.0 22.0 17.0 100.0 19.00 0.111111 -7 ... 97 5.0 10.0 0.761905 49 34 0.126984 10.00 0 0
5 02.10.2022, 20:30 Stal Nysa 3 68.0 23.0 29.0 100.0 22.00 0.102941 -12 ... 105 5.0 4.0 0.823529 53 44 0.073529 9.00 1 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5273 20.03.2010, 17:00 AZS Częstochowa 0 34.0 52.0 0.0 15.0 0.00 1.764706 3 ... 43 70.0 9.0 0.264706 27 39 0.205882 2.33 0 0
5274 19.03.2010, 18:00 AZS Olsztyn 0 39.0 57.0 2.0 11.0 0.67 1.615385 4 ... 22 80.0 10.0 0.230769 28 35 0.230769 3.00 0 0
5275 20.03.2010, 17:00 Jadar Radom 0 43.0 67.0 4.0 13.0 1.33 1.465116 5 ... 17 66.0 7.0 0.186047 35 53 0.116279 1.67 0 0
5276 20.03.2010, 17:00 Projekt Warszawa 0 37.0 59.0 1.0 10.0 0.33 1.810811 8 ... 23 82.0 8.0 0.162162 31 38 0.162162 2.00 0 0
5277 20.03.2010, 14:45 Jastrzębski Węgiel 3 50.0 66.0 1.0 9.0 0.33 1.040000 1 ... 50 73.0 7.0 0.020000 42 58 0.140000 2.33 1 1

4489 rows × 24 columns

4489/5278
0.8505115574081091

The prediction accuracy is about 85%, which is reasonably high.
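The same fraction can also be computed directly with scikit-learn’s accuracy_score, which simply counts matching labels (a quick cross-check):

from sklearn.metrics import accuracy_score
# fraction of rows where the logistic regression prediction equals the true label
print(accuracy_score(df1["Winner"], df1["Pred"]))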

from sklearn.metrics import mean_absolute_error
mean_absolute_error(clf3.predict(X_test), y_test)
0.1571969696969697
mean_absolute_error(clf3.predict(X_train), y_train)
0.14756039791567976
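Since the labels are 0 or 1, the mean absolute error here is just the misclassification rate, so it should equal one minus the accuracy (a quick check reusing the fitted clf3):

# for binary labels, MAE = error rate = 1 - accuracy
print(1 - clf3.score(X_test, y_test))
print(1 - clf3.score(X_train, y_train))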

Most Important Factor#

In men’s volleyball, many people complain that too many service aces hurt the flow of a match. I want to know whether the service ace is the dominant factor.

clf1.feature_importances_
array([0.05616922, 0.        , 0.64921671, 0.        , 0.03569843,
       0.00612576, 0.        , 0.        , 0.00312938, 0.        ,
       0.        , 0.0188901 , 0.00658239, 0.22418801, 0.        ,
       0.        ])
pd.Series(clf1.feature_importances_, index=X.columns)
T_Srv_Sum          0.056169
T_Srv_Err          0.000000
T_Srv_Ace          0.649217
T_Srv_Eff          0.000000
T_Rec_Sum          0.035698
T_Rec_Err          0.006126
T_Rec_Pos          0.000000
T_Rec_Perf         0.000000
T_Att_Sum          0.003129
T_Att_Err          0.000000
T_Att_Blk          0.000000
T_Att_Kill         0.018890
T_Att_Kill_Perc    0.006582
T_Att_Eff          0.224188
T_Blk_Sum          0.000000
T_Blk_As           0.000000
dtype: float64
df4 = pd.DataFrame({"importance": clf1.feature_importances_, "factors": clf1.feature_names_in_})
df4
importance factors
0 0.056169 T_Srv_Sum
1 0.000000 T_Srv_Err
2 0.649217 T_Srv_Ace
3 0.000000 T_Srv_Eff
4 0.035698 T_Rec_Sum
5 0.006126 T_Rec_Err
6 0.000000 T_Rec_Pos
7 0.000000 T_Rec_Perf
8 0.003129 T_Att_Sum
9 0.000000 T_Att_Err
10 0.000000 T_Att_Blk
11 0.018890 T_Att_Kill
12 0.006582 T_Att_Kill_Perc
13 0.224188 T_Att_Eff
14 0.000000 T_Blk_Sum
15 0.000000 T_Blk_As
alt.Chart(df4).mark_bar().encode(
    x="factors",
    y="importance",
    tooltip=["importance"]
).properties(
    title = 'Importance of Factors'
)

The chart shows that the share of points from service aces (T_Srv_Ace) is by far the most important feature in the decision tree, followed by attack efficiency (T_Att_Eff).
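Impurity-based importances from a single tree can be somewhat biased, so as a cross-check (a sketch only, not part of the original analysis; perm is an illustrative name) the ranking can also be examined with permutation importance on the held-out split:

from sklearn.inspection import permutation_importance
# shuffle each feature on the test set and measure the drop in accuracy
perm = permutation_importance(clf1, X_test, y_test, n_repeats=10, random_state=0)
print(pd.Series(perm.importances_mean, index=X.columns).sort_values(ascending=False))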

df1_1=df1.head(2000)
alt.Chart(df1_1).mark_circle().encode(
    x=alt.X("T_Srv_Ace", scale=alt.Scale(zero=False)),
    y=alt.Y("T_Att_Eff", scale=alt.Scale(zero=False)),
    color="Winner",
    tooltip=["Team_", "Date", "T_Att_Eff","T_Srv_Ace"],
).interactive()

You may find this chart a little odd because the points split into two clusters; that happens because some matches last 5 sets while others last only 3, which separates the per-point fractions into two groups. I was a little surprised by this figure, because I expected that more service aces would mean a better chance of winning. Instead, it suggests that when service aces make up a larger share of a team’s points, the team is less likely to win, while a higher attack efficiency makes winning more likely.

Predicting Team Performance#

Many people don’t pay much attention to defensive skills when they watch matches, so I want to check whether defensive skills are important for a strong team.

alt.Chart(df1_1).mark_circle().encode(
    x=alt.X("T_Blk_Sum", scale=alt.Scale(zero=False)),
    y=alt.Y("T_Rec_Pos", scale=alt.Scale(zero=False)),
    color="Winner",
    tooltip=["Team_", "Date", "T_Att_Eff","T_Srv_Ace"],
).interactive()

There is no obvious relation between the two defensive statistics, although slightly more of the dark blue (winning) dots sit toward the top and the right. Let’s focus on one team.

I use XGBoost, a gradient-boosted tree ensemble, to predict the winner from these two defensive statistics, since it handles tabular data like this well.

pip install xgboost
Collecting xgboost
  Downloading xgboost-1.6.2-py3-none-manylinux2014_x86_64.whl (255.9 MB)
Requirement already satisfied: numpy in /shared-libs/python3.7/py/lib/python3.7/site-packages (from xgboost) (1.21.6)
Requirement already satisfied: scipy in /shared-libs/python3.7/py/lib/python3.7/site-packages (from xgboost) (1.7.3)
Installing collected packages: xgboost
Successfully installed xgboost-1.6.2
Note: you may need to restart the kernel to use updated packages.
Z=df1[["T_Blk_Sum","T_Rec_Pos"]]
X_train1, X_test1, y_train1, y_test1 = train_test_split(Z, y, test_size=0.2)
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

model1 = xgb.XGBClassifier()
model1.fit(X_train1, y_train1)
predictions = model1.predict(X_test1)
t=model1.predict(Z)
accurac = accuracy_score(y, t)
print("Accuracy:", accurac)
Accuracy: 0.7391057218643425
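Note that this accuracy is computed on all rows, including the ones the model was trained on; a fairer estimate scores only the held-out split created above (a small sketch reusing predictions and y_test1):

# evaluate only on matches the model has not seen during training
print("Held-out accuracy:", accuracy_score(y_test1, predictions))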
df1["p"]=model1.predict(Z)
df5=df1[df1["Winner"]==1]
df5
Date Team_ T_Score T_Sum T_BP T_Ratio T_Srv_Sum T_Srv_Err T_Srv_Ace T_Srv_Eff ... T_Att_Err T_Att_Blk T_Att_Kill T_Att_Kill_Perc T_Att_Eff T_Blk_Sum T_Blk_As Winner Pred p
1 30.09.2022, 17:30 Jastrzębski Węgiel 3 51.0 17.0 27.0 77.0 15.00 0.078431 -7 ... 4.0 1.0 0.843137 48 43 0.078431 8.00 1 1 0
3 02.10.2022, 14:45 Warta Zawiercie 3 66.0 16.0 22.0 98.0 21.00 0.075758 -16 ... 8.0 7.0 0.787879 56 40 0.136364 11.00 1 0 0
5 02.10.2022, 20:30 Stal Nysa 3 68.0 23.0 29.0 100.0 22.00 0.102941 -12 ... 5.0 4.0 0.823529 53 44 0.073529 9.00 1 1 0
6 02.10.2022, 17:30 Trefl Gdańsk 3 59.0 22.0 33.0 73.0 16.00 0.101695 -6 ... 2.0 8.0 0.847458 60 48 0.050847 10.00 1 1 0
7 01.10.2022, 17:30 Asseco Resovia 3 52.0 23.0 25.0 74.0 14.00 0.115385 -8 ... 7.0 5.0 0.692308 52 34 0.192308 8.00 1 1 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5269 14.03.2010, 14:45 Asseco Resovia 3 60.0 78.0 6.0 8.0 2.00 1.016667 2 ... 85.0 5.0 0.050000 48 56 0.100000 2.00 1 1 0
5270 14.03.2010, 14:45 Jastrzębski Węgiel 3 73.0 87.0 4.0 15.0 1.00 0.917808 3 ... 100.0 4.0 0.095890 58 58 0.150685 2.75 1 1 1
5271 14.03.2010, 14:45 ZAKSA Kędzierzyn-Koźle 3 73.0 97.0 6.0 14.0 1.50 0.904110 4 ... 96.0 7.0 0.082192 51 53 0.219178 4.00 1 1 1
5272 13.03.2010, 18:00 PGE Skra Bełchatów 3 67.0 97.0 3.0 14.0 0.75 1.044776 2 ... 104.0 8.0 0.164179 53 51 0.164179 2.75 1 1 1
5277 20.03.2010, 14:45 Jastrzębski Węgiel 3 50.0 66.0 1.0 9.0 0.33 1.040000 1 ... 73.0 7.0 0.020000 42 58 0.140000 2.33 1 1 1

2639 rows × 25 columns

Next, I find the team that won the most matches.

a=df5["Team_"].value_counts()
a
ZAKSA Kędzierzyn-Koźle    324
PGE Skra Bełchatów        321
Asseco Resovia            290
Jastrzębski Węgiel        286
Projekt Warszawa          194
AZS Olsztyn               172
Trefl Gdańsk              171
Chemik Bydgoszcz          129
Cuprum Lubin              110
Czarni Radom              108
Warta Zawiercie            91
AZS Częstochowa            83
GKS Katowice               74
Społem Kielce              71
MKS Będzin                 44
BBTS Bielsko-Biała         35
Ślepsk Malow Suwałki       32
Stal Nysa                  26
LUK  Lublin                22
Stocznia Szczecin          19
Jadar Radom                17
Pamapol Wielton Wieluń     14
Barkom Każany Lwów          6
Name: Team_, dtype: int64
df_2 = df1.loc[df1['Team_'] == 'ZAKSA Kędzierzyn-Koźle']
df_2
Date Team_ T_Score T_Sum T_BP T_Ratio T_Srv_Sum T_Srv_Err T_Srv_Ace T_Srv_Eff ... T_Att_Err T_Att_Blk T_Att_Kill T_Att_Kill_Perc T_Att_Eff T_Blk_Sum T_Blk_As Winner Pred p
14 04.10.2022, 21:00 ZAKSA Kędzierzyn-Koźle 3 59.0 23.0 37.0 73.0 9.00 0.050847 -5 ... 5.0 4.0 0.762712 59 47 0.186441 7.00 1 1 0
29 13.10.2022, 17:30 ZAKSA Kędzierzyn-Koźle 3 53.0 20.0 34.0 73.0 7.00 0.075472 0 ... 3.0 2.0 0.735849 56 49 0.188679 5.00 1 1 1
44 30.10.2022, 14:45 ZAKSA Kędzierzyn-Koźle 3 50.0 22.0 26.0 73.0 11.00 0.120000 0 ... 2.0 6.0 0.660000 50 37 0.220000 9.00 1 1 1
59 06.11.2022, 14:45 ZAKSA Kędzierzyn-Koźle 3 62.0 20.0 26.0 95.0 14.00 0.032258 -9 ... 8.0 8.0 0.790323 49 33 0.177419 18.00 1 1 1
74 19.11.2022, 17:30 ZAKSA Kędzierzyn-Koźle 3 60.0 15.0 40.0 74.0 9.00 0.033333 -6 ... 5.0 3.0 0.800000 57 48 0.166667 6.00 1 1 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5235 10.01.2010, 14:45 ZAKSA Kędzierzyn-Koźle 3 59.0 74.0 11.0 14.0 3.67 0.694915 0 ... 68.0 2.0 0.084746 44 65 0.067797 1.33 1 1 1
5245 12.02.2010, 18:00 ZAKSA Kędzierzyn-Koźle 3 72.0 103.0 8.0 13.0 1.60 1.125000 11 ... 102.0 5.0 0.166667 54 53 0.138889 2.00 1 1 0
5254 20.02.2010, 17:00 ZAKSA Kędzierzyn-Koźle 3 67.0 91.0 3.0 16.0 0.75 1.074627 1 ... 124.0 6.0 0.164179 56 45 0.134328 2.25 1 1 0
5263 06.03.2010, 18:00 ZAKSA Kędzierzyn-Koźle 3 77.0 106.0 12.0 18.0 2.40 1.077922 3 ... 115.0 8.0 0.207792 52 45 0.168831 2.60 1 0 1
5271 14.03.2010, 14:45 ZAKSA Kędzierzyn-Koźle 3 73.0 97.0 6.0 14.0 1.50 0.904110 4 ... 96.0 7.0 0.082192 51 53 0.219178 4.00 1 1 1

445 rows × 25 columns

alt.Chart(df_2).mark_circle().encode(
    x="T_Blk_Sum",
    y="T_Rec_Pos",
    color=alt.Color("p", title="Wins"),
    tooltip = ("T_Blk_Sum", "T_Rec_Pos")
).properties(
    width=500,
    height=200,
)

There is no strong connection between blocking and reception for this team; the dark blue (predicted-win) dots sit somewhat higher on the chart, but the relationship is weak, so defensive skills do not appear to be as important as offensive skills.

Summary#

After building these models, I found that the share of points scored on service aces is the most important single factor for predicting a men’s volleyball match, but a larger ace share is actually associated with losing; by contrast, a high attack efficiency is strongly associated with winning. Therefore, practicing attacking skills seems more essential than chasing service aces. Block and reception statistics show no significant effect, so in this data defensive skills do not appear as important as offensive ones.

References#


  • Dataset source:

https://www.kaggle.com/code/kacpergregorowicz/predicting-volleyball-match-winners/notebook

  • Other helpful references:

https://pandas.pydata.org/pandas-docs/version/0.19/generated/pandas.DataFrame.filter.html

https://www.activestate.com/resources/quick-reads/how-to-create-a-neural-network-in-python-with-and-without-keras/
