Men’s Volleyball Performance Prediction#
Author: Mingyang Yi
Course Project, UC Irvine, Math 10, S23
Introduction#
As a volleyball fan, I’m very interested in which factor affects the result the most, and I also want to predict a team’s performance.
Data Cleaning#
import pandas as pd
df=pd.read_csv("mensvolleyball-PlusLiga08-23.csv")
This is one way to delete part of each column name; the approach comes from the pandas (PyData) documentation.
df2 = df.filter(regex="T1|Winner|Team_1|Date").copy()
I need to split the data into two DataFrames later (one per team), so I flip the Winner flag for team 1. Taking an explicit .copy() above avoids pandas’ SettingWithCopyWarning that appears when assigning to a filtered view.
df2['Winner'] = df2['Winner'].replace({1: 0, 0: 1})
df2.columns = df2.columns.str.replace("1", "")
df2
Date | Team_ | T_Score | T_Sum | T_BP | T_Ratio | T_Srv_Sum | T_Srv_Err | T_Srv_Ace | T_Srv_Eff | ... | T_Rec_Perf | T_Att_Sum | T_Att_Err | T_Att_Blk | T_Att_Kill | T_Att_Kill_Perc | T_Att_Eff | T_Blk_Sum | T_Blk_As | Winner | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 01.10.2022, 14:45 | AZS Olsztyn | 1 | 60.0 | 17.0 | 11.0 | 79.0 | 18 | 6.0 | -13% | ... | 25% | 100 | 7.0 | 14.0 | 47.0 | 47% | 26% | 7.0 | 11 | 0 |
1 | 30.09.2022, 17:30 | Jastrzębski Węgiel | 3 | 51.0 | 17.0 | 27.0 | 77.0 | 15 | 4.0 | -7% | ... | 16% | 88 | 4.0 | 1.0 | 43.0 | 48% | 43% | 4.0 | 8 | 1 |
2 | 01.10.2022, 20:30 | LUK Lublin | 2 | 76.0 | 23.0 | 35.0 | 109.0 | 16 | 3.0 | -9% | ... | 21% | 115 | 6.0 | 10.0 | 63.0 | 54% | 40% | 10.0 | 9 | 0 |
3 | 02.10.2022, 14:45 | Warta Zawiercie | 3 | 66.0 | 16.0 | 22.0 | 98.0 | 21 | 5.0 | -16% | ... | 12% | 92 | 8.0 | 7.0 | 52.0 | 56% | 40% | 9.0 | 11 | 1 |
4 | 03.10.2022, 17:30 | BBTS Bielsko-Biała | 1 | 63.0 | 22.0 | 17.0 | 100.0 | 19 | 7.0 | -7% | ... | 23% | 97 | 5.0 | 10.0 | 48.0 | 49% | 34% | 8.0 | 10 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2634 | 20.03.2010, 17:00 | Pamapol Wielton Wieluń | 3 | 50.0 | 74.0 | 6.0 | 11.0 | 2,00 | 37.0 | 0 | ... | 18 | 48% | 67.0 | 4.0 | 7.0 | 35 | 52% | 9.0 | 3,00 | 1 |
2635 | 19.03.2010, 18:00 | ZAKSA Kędzierzyn-Koźle | 3 | 54.0 | 74.0 | 4.0 | 11.0 | 1,33 | 46.0 | 2 | ... | 18 | 39% | 74.0 | 4.0 | 9.0 | 41 | 55% | 9.0 | 3,00 | 1 |
2636 | 20.03.2010, 17:00 | PGE Skra Bełchatów | 3 | 54.0 | 75.0 | 5.0 | 12.0 | 1,67 | 54.0 | 5 | ... | 15 | 27% | 69.0 | 3.0 | 5.0 | 41 | 59% | 8.0 | 2,67 | 1 |
2637 | 20.03.2010, 17:00 | Asseco Resovia | 3 | 55.0 | 73.0 | 8.0 | 6.0 | 2,67 | 49.0 | 1 | ... | 19 | 38% | 88.0 | 5.0 | 7.0 | 42 | 48% | 5.0 | 1,67 | 1 |
2638 | 20.03.2010, 14:45 | Chemik Bydgoszcz | 0 | 43.0 | 64.0 | 1.0 | 12.0 | 0,33 | 65.0 | 1 | ... | 26 | 40% | 89.0 | 9.0 | 7.0 | 41 | 46% | 1.0 | 0,33 | 0 |
2639 rows × 23 columns
df3=df.filter(regex='T2|Winner|Team_2|Date')
df3.columns = df3.columns.str.replace("2", "")
df3
Date | Team_ | T_Score | T_Sum | T_BP | T_Ratio | T_Srv_Sum | T_Srv_Err | T_Srv_Ace | T_Srv_Eff | ... | T_Rec_Perf | T_Att_Sum | T_Att_Err | T_Att_Blk | T_Att_Kill | T_Att_Kill_Perc | T_Att_Eff | T_Blk_Sum | T_Blk_As | Winner | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 01.10.2022, 14:45 | ZAKSA Kędzierzyn-Koźle | 3 | 69 | 30 | 38 | 96 | 11 | 10 | 2% | ... | 26% | 88 | 7 | 7 | 45 | 51% | 35% | 14 | 11 | 1 |
1 | 30.09.2022, 17:30 | GKS Katowice | 0 | 48 | 16 | 16 | 70 | 16 | 4 | -11% | ... | 20% | 91 | 8 | 4 | 43 | 47% | 34% | 1 | 17 | 0 |
2 | 01.10.2022, 20:30 | Czarni Radom | 3 | 82 | 23 | 40 | 104 | 19 | 9 | -5% | ... | 18% | 128 | 10 | 10 | 63 | 49% | 33% | 10 | 13 | 1 |
3 | 02.10.2022, 14:45 | PGE Skra Bełchatów | 2 | 71 | 21 | 25 | 103 | 23 | 8 | -8% | ... | 9% | 102 | 9 | 9 | 56 | 54% | 37% | 7 | 14 | 0 |
4 | 03.10.2022, 17:30 | Cuprum Lubin | 3 | 80 | 30 | 32 | 103 | 26 | 12 | -8% | ... | 22% | 109 | 7 | 8 | 58 | 53% | 39% | 10 | 10 | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2634 | 20.03.2010, 17:00 | AZS Częstochowa | 0 | 34 | 52 | 0 | 15 | 0,00 | 60 | 3 | ... | 26 | 43% | 70 | 9 | 9 | 27 | 39% | 7 | 2,33 | 0 |
2635 | 19.03.2010, 18:00 | AZS Olsztyn | 0 | 39 | 57 | 2 | 11 | 0,67 | 63 | 4 | ... | 14 | 22% | 80 | 10 | 9 | 28 | 35% | 9 | 3,00 | 0 |
2636 | 20.03.2010, 17:00 | Jadar Radom | 0 | 43 | 67 | 4 | 13 | 1,33 | 63 | 5 | ... | 11 | 17% | 66 | 7 | 8 | 35 | 53% | 5 | 1,67 | 0 |
2637 | 20.03.2010, 17:00 | Projekt Warszawa | 0 | 37 | 59 | 1 | 10 | 0,33 | 67 | 8 | ... | 16 | 23% | 82 | 8 | 6 | 31 | 38% | 6 | 2,00 | 0 |
2638 | 20.03.2010, 14:45 | Jastrzębski Węgiel | 3 | 50 | 66 | 1 | 9 | 0,33 | 52 | 1 | ... | 26 | 50% | 73 | 7 | 1 | 42 | 58% | 7 | 2,33 | 1 |
2639 rows × 23 columns
df1 = pd.concat([df2, df3], axis=0, ignore_index=True)
perc_cols = ['T_Srv_Eff', 'T_Rec_Pos', 'T_Rec_Perf', 'T_Att_Kill_Perc', 'T_Att_Eff', 'T_Att_Sum']
for col in perc_cols:
df1[col] = pd.to_numeric(df1[col].str.replace('%', ''))
float_cols = ['T_Srv_Err', 'T_Blk_As']
for col in float_cols:
df1[col] = pd.to_numeric(df1[col].str.replace(',', '.'))
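The two string cleanups above (stripping “%” signs and turning decimal commas into dots) can be sketched on a tiny toy frame; the values here are invented, not taken from the dataset:

```python
import pandas as pd

# Toy frame mimicking the raw text columns (values invented for illustration)
toy = pd.DataFrame({"T_Srv_Eff": ["-13%", "2%"], "T_Blk_As": ["3,00", "1,67"]})

# Strip the percent sign, then convert the remaining digits to numbers
toy["T_Srv_Eff"] = pd.to_numeric(toy["T_Srv_Eff"].str.replace("%", "", regex=False))

# Replace the European decimal comma with a dot before converting
toy["T_Blk_As"] = pd.to_numeric(toy["T_Blk_As"].str.replace(",", ".", regex=False))

print(toy["T_Srv_Eff"].tolist())  # [-13, 2]
print(toy["T_Blk_As"].tolist())   # [3.0, 1.67]
```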
I convert the raw point counts into shares of each team’s total points (T_Sum), because some matches run five sets while others finish in three, so raw counts are not directly comparable.
df1["T_Att_Kill"]=df1["T_Att_Kill"]/df1["T_Sum"]
df1["T_Blk_Sum"]=df1["T_Blk_Sum"]/df1["T_Sum"]
df1["T_Srv_Ace"]=df1["T_Srv_Ace"]/df1["T_Sum"]
df1
Date | Team_ | T_Score | T_Sum | T_BP | T_Ratio | T_Srv_Sum | T_Srv_Err | T_Srv_Ace | T_Srv_Eff | ... | T_Rec_Perf | T_Att_Sum | T_Att_Err | T_Att_Blk | T_Att_Kill | T_Att_Kill_Perc | T_Att_Eff | T_Blk_Sum | T_Blk_As | Winner | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 01.10.2022, 14:45 | AZS Olsztyn | 1 | 60.0 | 17.0 | 11.0 | 79.0 | 18.00 | 0.100000 | -13 | ... | 25 | 100 | 7.0 | 14.0 | 0.783333 | 47 | 26 | 0.116667 | 11.00 | 0 |
1 | 30.09.2022, 17:30 | Jastrzębski Węgiel | 3 | 51.0 | 17.0 | 27.0 | 77.0 | 15.00 | 0.078431 | -7 | ... | 16 | 88 | 4.0 | 1.0 | 0.843137 | 48 | 43 | 0.078431 | 8.00 | 1 |
2 | 01.10.2022, 20:30 | LUK Lublin | 2 | 76.0 | 23.0 | 35.0 | 109.0 | 16.00 | 0.039474 | -9 | ... | 21 | 115 | 6.0 | 10.0 | 0.828947 | 54 | 40 | 0.131579 | 9.00 | 0 |
3 | 02.10.2022, 14:45 | Warta Zawiercie | 3 | 66.0 | 16.0 | 22.0 | 98.0 | 21.00 | 0.075758 | -16 | ... | 12 | 92 | 8.0 | 7.0 | 0.787879 | 56 | 40 | 0.136364 | 11.00 | 1 |
4 | 03.10.2022, 17:30 | BBTS Bielsko-Biała | 1 | 63.0 | 22.0 | 17.0 | 100.0 | 19.00 | 0.111111 | -7 | ... | 23 | 97 | 5.0 | 10.0 | 0.761905 | 49 | 34 | 0.126984 | 10.00 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
5273 | 20.03.2010, 17:00 | AZS Częstochowa | 0 | 34.0 | 52.0 | 0.0 | 15.0 | 0.00 | 1.764706 | 3 | ... | 26 | 43 | 70.0 | 9.0 | 0.264706 | 27 | 39 | 0.205882 | 2.33 | 0 |
5274 | 19.03.2010, 18:00 | AZS Olsztyn | 0 | 39.0 | 57.0 | 2.0 | 11.0 | 0.67 | 1.615385 | 4 | ... | 14 | 22 | 80.0 | 10.0 | 0.230769 | 28 | 35 | 0.230769 | 3.00 | 0 |
5275 | 20.03.2010, 17:00 | Jadar Radom | 0 | 43.0 | 67.0 | 4.0 | 13.0 | 1.33 | 1.465116 | 5 | ... | 11 | 17 | 66.0 | 7.0 | 0.186047 | 35 | 53 | 0.116279 | 1.67 | 0 |
5276 | 20.03.2010, 17:00 | Projekt Warszawa | 0 | 37.0 | 59.0 | 1.0 | 10.0 | 0.33 | 1.810811 | 8 | ... | 16 | 23 | 82.0 | 8.0 | 0.162162 | 31 | 38 | 0.162162 | 2.00 | 0 |
5277 | 20.03.2010, 14:45 | Jastrzębski Węgiel | 3 | 50.0 | 66.0 | 1.0 | 9.0 | 0.33 | 1.040000 | 1 | ... | 26 | 50 | 73.0 | 7.0 | 0.020000 | 42 | 58 | 0.140000 | 2.33 | 1 |
5278 rows × 23 columns
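The per-point normalization above can be checked on toy numbers; dividing by T_Sum puts three-set and five-set matches on the same scale (the values below are hypothetical):

```python
import pandas as pd

# Hypothetical raw counts: kills and total points won (T_Sum) for two matches
toy = pd.DataFrame({"T_Att_Kill": [47.0, 63.0], "T_Sum": [60.0, 76.0]})

# Express kills as a share of the team's total points, as done for df1
toy["T_Att_Kill"] = toy["T_Att_Kill"] / toy["T_Sum"]
print(toy["T_Att_Kill"].round(3).tolist())  # [0.783, 0.829]
```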
Model Training#
In this section I train a decision tree classifier on the per-match statistics to predict the Winner label. I first confirm the data has no missing values, then compare train and test accuracy to check for overfitting, and finally tune the number of leaf nodes using an error curve.
df1.isna().any(axis=0)
Date False
Team_ False
T_Score False
T_Sum False
T_BP False
T_Ratio False
T_Srv_Sum False
T_Srv_Err False
T_Srv_Ace False
T_Srv_Eff False
T_Rec_Sum False
T_Rec_Err False
T_Rec_Pos False
T_Rec_Perf False
T_Att_Sum False
T_Att_Err False
T_Att_Blk False
T_Att_Kill False
T_Att_Kill_Perc False
T_Att_Eff False
T_Blk_Sum False
T_Blk_As False
Winner False
dtype: bool
The first few columns record the total points and sets a team won, which directly encode the match outcome, so they are not good predictors.
X = df1.loc[:, "T_Srv_Sum":"T_Blk_As"]
y=df1["Winner"]
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = DecisionTreeClassifier(max_depth=5)
clf.fit(X_train, y_train)
DecisionTreeClassifier(max_depth=5)
a=clf.score(X_train, y_train)
b=clf.score(X_test, y_test)
print(a,b)
0.8889152060634771 0.8702651515151515
The train and test accuracies are close (0.889 vs. 0.870), so the model is not badly overfitting and we can move on to prediction.
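The overfitting check used here (comparing train and test accuracy) can be illustrated on synthetic data; this is a sketch, not the project’s dataset:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the match statistics
X_demo, y_demo = make_classification(n_samples=1000, n_features=16, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

# A depth-limited tree keeps train and test accuracy close...
shallow = DecisionTreeClassifier(max_depth=5, random_state=0).fit(Xtr, ytr)
gap = shallow.score(Xtr, ytr) - shallow.score(Xte, yte)

# ...while an unconstrained tree memorizes the training set (a sign of overfitting)
deep = DecisionTreeClassifier(random_state=0).fit(Xtr, ytr)
print(gap, deep.score(Xtr, ytr))  # small gap vs. a perfect 1.0 training score
```

A small train–test gap, as in the 0.889 vs. 0.870 scores above, is what suggests the depth-5 tree is not badly overfitting.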
from sklearn.tree import plot_tree
import matplotlib.pyplot as plt
df_err = pd.DataFrame(columns=["leaves", "error", "set"])
Next I compute the train and test error curves as a function of the number of leaf nodes:
for n in range(2, 200):
clf2 = DecisionTreeClassifier(max_leaf_nodes=n, random_state=42)
clf2.fit(X_train, y_train)
train_error = 1 - clf2.score(X_train, y_train)
test_error = 1 - clf2.score(X_test, y_test)
d_train = {"leaves": n, "error": train_error, "set":"train"}
d_test = {"leaves": n, "error": test_error, "set":"test"}
df_err.loc[len(df_err)] = d_train
df_err.loc[len(df_err)] = d_test
import altair as alt
c = alt.Chart(df_err).mark_line().encode(
x="leaves",
y="error",
color="set"
)
c
The sweet spot (lowest test error) is at approximately 17 leaf nodes.
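Instead of eyeballing the chart, the sweet spot can be read off an error table like df_err with idxmin; the sketch below uses a few invented rows:

```python
import pandas as pd

# Hypothetical slice of an error table shaped like df_err (error values invented)
df_err_demo = pd.DataFrame({
    "leaves": [10, 17, 25, 50],
    "error":  [0.130, 0.118, 0.121, 0.127],
    "set":    ["test", "test", "test", "test"],
})

# Pick the leaf count with the smallest test error
test_rows = df_err_demo[df_err_demo["set"] == "test"]
best = test_rows.loc[test_rows["error"].idxmin(), "leaves"]
print(best)  # 17
```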
clf1 = DecisionTreeClassifier(max_depth=5, max_leaf_nodes=17)
clf1.fit(X, y)
DecisionTreeClassifier(max_depth=5, max_leaf_nodes=17)
clf1.score(X, y)
0.8819628647214854
fig = plt.figure(figsize=(100,200))
plot_tree(
clf1,
feature_names=clf1.feature_names_in_,
filled=True
);
Use Logistic Regression to Predict#
from sklearn.linear_model import LogisticRegression
clf3 = LogisticRegression(max_iter=1000)  # raise max_iter so the lbfgs solver converges
clf3.fit(X, y)
LogisticRegression(max_iter=1000)
df1["Pred"] = clf3.predict(X)
df1
Date | Team_ | T_Score | T_Sum | T_BP | T_Ratio | T_Srv_Sum | T_Srv_Err | T_Srv_Ace | T_Srv_Eff | ... | T_Att_Sum | T_Att_Err | T_Att_Blk | T_Att_Kill | T_Att_Kill_Perc | T_Att_Eff | T_Blk_Sum | T_Blk_As | Winner | Pred | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 01.10.2022, 14:45 | AZS Olsztyn | 1 | 60.0 | 17.0 | 11.0 | 79.0 | 18.00 | 0.100000 | -13 | ... | 100 | 7.0 | 14.0 | 0.783333 | 47 | 26 | 0.116667 | 11.00 | 0 | 0 |
1 | 30.09.2022, 17:30 | Jastrzębski Węgiel | 3 | 51.0 | 17.0 | 27.0 | 77.0 | 15.00 | 0.078431 | -7 | ... | 88 | 4.0 | 1.0 | 0.843137 | 48 | 43 | 0.078431 | 8.00 | 1 | 1 |
2 | 01.10.2022, 20:30 | LUK Lublin | 2 | 76.0 | 23.0 | 35.0 | 109.0 | 16.00 | 0.039474 | -9 | ... | 115 | 6.0 | 10.0 | 0.828947 | 54 | 40 | 0.131579 | 9.00 | 0 | 0 |
3 | 02.10.2022, 14:45 | Warta Zawiercie | 3 | 66.0 | 16.0 | 22.0 | 98.0 | 21.00 | 0.075758 | -16 | ... | 92 | 8.0 | 7.0 | 0.787879 | 56 | 40 | 0.136364 | 11.00 | 1 | 0 |
4 | 03.10.2022, 17:30 | BBTS Bielsko-Biała | 1 | 63.0 | 22.0 | 17.0 | 100.0 | 19.00 | 0.111111 | -7 | ... | 97 | 5.0 | 10.0 | 0.761905 | 49 | 34 | 0.126984 | 10.00 | 0 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
5273 | 20.03.2010, 17:00 | AZS Częstochowa | 0 | 34.0 | 52.0 | 0.0 | 15.0 | 0.00 | 1.764706 | 3 | ... | 43 | 70.0 | 9.0 | 0.264706 | 27 | 39 | 0.205882 | 2.33 | 0 | 0 |
5274 | 19.03.2010, 18:00 | AZS Olsztyn | 0 | 39.0 | 57.0 | 2.0 | 11.0 | 0.67 | 1.615385 | 4 | ... | 22 | 80.0 | 10.0 | 0.230769 | 28 | 35 | 0.230769 | 3.00 | 0 | 0 |
5275 | 20.03.2010, 17:00 | Jadar Radom | 0 | 43.0 | 67.0 | 4.0 | 13.0 | 1.33 | 1.465116 | 5 | ... | 17 | 66.0 | 7.0 | 0.186047 | 35 | 53 | 0.116279 | 1.67 | 0 | 0 |
5276 | 20.03.2010, 17:00 | Projekt Warszawa | 0 | 37.0 | 59.0 | 1.0 | 10.0 | 0.33 | 1.810811 | 8 | ... | 23 | 82.0 | 8.0 | 0.162162 | 31 | 38 | 0.162162 | 2.00 | 0 | 0 |
5277 | 20.03.2010, 14:45 | Jastrzębski Węgiel | 3 | 50.0 | 66.0 | 1.0 | 9.0 | 0.33 | 1.040000 | 1 | ... | 50 | 73.0 | 7.0 | 0.020000 | 42 | 58 | 0.140000 | 2.33 | 1 | 1 |
5278 rows × 24 columns
df1[df1["Winner"] == df1["Pred"]]
Date | Team_ | T_Score | T_Sum | T_BP | T_Ratio | T_Srv_Sum | T_Srv_Err | T_Srv_Ace | T_Srv_Eff | ... | T_Att_Sum | T_Att_Err | T_Att_Blk | T_Att_Kill | T_Att_Kill_Perc | T_Att_Eff | T_Blk_Sum | T_Blk_As | Winner | Pred | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 01.10.2022, 14:45 | AZS Olsztyn | 1 | 60.0 | 17.0 | 11.0 | 79.0 | 18.00 | 0.100000 | -13 | ... | 100 | 7.0 | 14.0 | 0.783333 | 47 | 26 | 0.116667 | 11.00 | 0 | 0 |
1 | 30.09.2022, 17:30 | Jastrzębski Węgiel | 3 | 51.0 | 17.0 | 27.0 | 77.0 | 15.00 | 0.078431 | -7 | ... | 88 | 4.0 | 1.0 | 0.843137 | 48 | 43 | 0.078431 | 8.00 | 1 | 1 |
2 | 01.10.2022, 20:30 | LUK Lublin | 2 | 76.0 | 23.0 | 35.0 | 109.0 | 16.00 | 0.039474 | -9 | ... | 115 | 6.0 | 10.0 | 0.828947 | 54 | 40 | 0.131579 | 9.00 | 0 | 0 |
4 | 03.10.2022, 17:30 | BBTS Bielsko-Biała | 1 | 63.0 | 22.0 | 17.0 | 100.0 | 19.00 | 0.111111 | -7 | ... | 97 | 5.0 | 10.0 | 0.761905 | 49 | 34 | 0.126984 | 10.00 | 0 | 0 |
5 | 02.10.2022, 20:30 | Stal Nysa | 3 | 68.0 | 23.0 | 29.0 | 100.0 | 22.00 | 0.102941 | -12 | ... | 105 | 5.0 | 4.0 | 0.823529 | 53 | 44 | 0.073529 | 9.00 | 1 | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
5273 | 20.03.2010, 17:00 | AZS Częstochowa | 0 | 34.0 | 52.0 | 0.0 | 15.0 | 0.00 | 1.764706 | 3 | ... | 43 | 70.0 | 9.0 | 0.264706 | 27 | 39 | 0.205882 | 2.33 | 0 | 0 |
5274 | 19.03.2010, 18:00 | AZS Olsztyn | 0 | 39.0 | 57.0 | 2.0 | 11.0 | 0.67 | 1.615385 | 4 | ... | 22 | 80.0 | 10.0 | 0.230769 | 28 | 35 | 0.230769 | 3.00 | 0 | 0 |
5275 | 20.03.2010, 17:00 | Jadar Radom | 0 | 43.0 | 67.0 | 4.0 | 13.0 | 1.33 | 1.465116 | 5 | ... | 17 | 66.0 | 7.0 | 0.186047 | 35 | 53 | 0.116279 | 1.67 | 0 | 0 |
5276 | 20.03.2010, 17:00 | Projekt Warszawa | 0 | 37.0 | 59.0 | 1.0 | 10.0 | 0.33 | 1.810811 | 8 | ... | 23 | 82.0 | 8.0 | 0.162162 | 31 | 38 | 0.162162 | 2.00 | 0 | 0 |
5277 | 20.03.2010, 14:45 | Jastrzębski Węgiel | 3 | 50.0 | 66.0 | 1.0 | 9.0 | 0.33 | 1.040000 | 1 | ... | 50 | 73.0 | 7.0 | 0.020000 | 42 | 58 | 0.140000 | 2.33 | 1 | 1 |
4489 rows × 24 columns
4489/5278
0.8505115574081091
The prediction accuracy (4489 correctly classified rows out of 5278) is about 85%, which is fairly high.
from sklearn.metrics import mean_absolute_error
mean_absolute_error(clf3.predict(X_test), y_test)
0.1571969696969697
mean_absolute_error(clf3.predict(X_train), y_train)
0.14756039791567976
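For 0/1 labels, mean absolute error is exactly the misclassification rate, so 1 − MAE is the accuracy; a toy check (values invented):

```python
import numpy as np
from sklearn.metrics import accuracy_score, mean_absolute_error

# Toy binary labels and predictions
y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1])  # one mistake out of five

mae = mean_absolute_error(y_true, y_pred)
acc = accuracy_score(y_true, y_pred)
print(mae, acc)  # 0.2 0.8 -- for 0/1 labels they sum to 1
```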
Most Important Factor#
In men’s volleyball, many people complain that too many service aces hurt the flow of the matches. I want to know whether the service ace is the dominant factor.
clf1.feature_importances_
array([0.05616922, 0. , 0.64921671, 0. , 0.03569843,
0.00612576, 0. , 0. , 0.00312938, 0. ,
0. , 0.0188901 , 0.00658239, 0.22418801, 0. ,
0. ])
pd.Series(clf1.feature_importances_, index=X.columns)
T_Srv_Sum 0.056169
T_Srv_Err 0.000000
T_Srv_Ace 0.649217
T_Srv_Eff 0.000000
T_Rec_Sum 0.035698
T_Rec_Err 0.006126
T_Rec_Pos 0.000000
T_Rec_Perf 0.000000
T_Att_Sum 0.003129
T_Att_Err 0.000000
T_Att_Blk 0.000000
T_Att_Kill 0.018890
T_Att_Kill_Perc 0.006582
T_Att_Eff 0.224188
T_Blk_Sum 0.000000
T_Blk_As 0.000000
dtype: float64
df4 = pd.DataFrame({"importance": clf1.feature_importances_, "factors": clf1.feature_names_in_})
df4
importance | factors | |
---|---|---|
0 | 0.056169 | T_Srv_Sum |
1 | 0.000000 | T_Srv_Err |
2 | 0.649217 | T_Srv_Ace |
3 | 0.000000 | T_Srv_Eff |
4 | 0.035698 | T_Rec_Sum |
5 | 0.006126 | T_Rec_Err |
6 | 0.000000 | T_Rec_Pos |
7 | 0.000000 | T_Rec_Perf |
8 | 0.003129 | T_Att_Sum |
9 | 0.000000 | T_Att_Err |
10 | 0.000000 | T_Att_Blk |
11 | 0.018890 | T_Att_Kill |
12 | 0.006582 | T_Att_Kill_Perc |
13 | 0.224188 | T_Att_Eff |
14 | 0.000000 | T_Blk_Sum |
15 | 0.000000 | T_Blk_As |
alt.Chart(df4).mark_bar().encode(
x="factors",
y="importance",
tooltip=["importance"]
).properties(
title = 'Importance of Factors'
)
The result shows that the service ace is the most important factor in men’s volleyball, by this tree’s impurity-based importance.
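Impurity-based importances from a single tree can be fragile, so one hedge is to cross-check with permutation importance (how much the score drops when a column is shuffled). A sketch on synthetic data, where the first two columns are informative by construction:

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.tree import DecisionTreeClassifier

# shuffle=False keeps the 2 informative features in the first two columns
X_demo, y_demo = make_classification(n_samples=500, n_features=4, n_informative=2,
                                     n_redundant=0, shuffle=False, random_state=0)
tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_demo, y_demo)

# Average score drop over 10 shuffles of each column
result = permutation_importance(tree, X_demo, y_demo, n_repeats=10, random_state=0)
print(result.importances_mean.round(3))  # informative columns should dominate
```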
df1_1=df1.head(2000)
alt.Chart(df1_1).mark_circle().encode(
x=alt.X("T_Srv_Ace", scale=alt.Scale(zero=False)),
y=alt.Y("T_Att_Eff", scale=alt.Scale(zero=False)),
color="Winner",
tooltip=["Team_", "Date", "T_Att_Eff","T_Srv_Ace"],
).interactive()
You may find this chart a little odd because it splits into two clusters: some matches run five sets while others run three, which produces the two groups. I was surprised by the figure, because I expected that more service aces would mean a better chance of winning. Instead, it shows that when aces make up a larger share of total points, the team is less likely to win; meanwhile, the more effective the attack, the more likely the team is to win.
Predict Team Performance#
Many people don’t pay attention to defensive skills when they watch games, so I want to check whether defensive skills matter for a strong team.
alt.Chart(df1_1).mark_circle().encode(
x=alt.X("T_Blk_Sum", scale=alt.Scale(zero=False)),
y=alt.Y("T_Rec_Pos", scale=alt.Scale(zero=False)),
color="Winner",
tooltip=["Team_", "Date", "T_Att_Eff","T_Srv_Ace"],
).interactive()
There is no obvious relation between the two defensive skills, though the plot weakly suggests more dark blue dots toward the top right. Let’s focus on one team.
I use XGBoost, a gradient-boosted ensemble of decision trees, because it combines many shallow trees into a strong classifier and works well on tabular prediction tasks like this one.
pip install xgboost
Z=df1[["T_Blk_Sum","T_Rec_Pos"]]
X_train1, X_test1, y_train1, y_test1 = train_test_split(Z, y, test_size=0.2)
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
model1 = xgb.XGBClassifier()
model1.fit(X_train1, y_train1)
predictions = model1.predict(X_test1)
t=model1.predict(Z)
accuracy = accuracy_score(y, t)
print("Accuracy:", accuracy)
Accuracy: 0.7391057218643425
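Note that the accuracy above is computed on all 5278 rows, including the 80% the model was trained on, so it is likely optimistic. A cleaner estimate scores only the held-out split; the sketch below uses synthetic data and sklearn’s GradientBoostingClassifier as a stand-in for XGBClassifier:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier  # stand-in for xgb.XGBClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Two synthetic features playing the role of T_Blk_Sum and T_Rec_Pos
X_demo, y_demo = make_classification(n_samples=600, n_features=2, n_informative=2,
                                     n_redundant=0, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(Xtr, ytr)

# Score only the held-out 20% so training rows do not inflate the estimate
test_acc = accuracy_score(yte, model.predict(Xte))
print(test_acc)
```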
df1["p"]=model1.predict(Z)
df5=df1[df1["Winner"]==1]
df5
Date | Team_ | T_Score | T_Sum | T_BP | T_Ratio | T_Srv_Sum | T_Srv_Err | T_Srv_Ace | T_Srv_Eff | ... | T_Att_Err | T_Att_Blk | T_Att_Kill | T_Att_Kill_Perc | T_Att_Eff | T_Blk_Sum | T_Blk_As | Winner | Pred | p | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 30.09.2022, 17:30 | Jastrzębski Węgiel | 3 | 51.0 | 17.0 | 27.0 | 77.0 | 15.00 | 0.078431 | -7 | ... | 4.0 | 1.0 | 0.843137 | 48 | 43 | 0.078431 | 8.00 | 1 | 1 | 0 |
3 | 02.10.2022, 14:45 | Warta Zawiercie | 3 | 66.0 | 16.0 | 22.0 | 98.0 | 21.00 | 0.075758 | -16 | ... | 8.0 | 7.0 | 0.787879 | 56 | 40 | 0.136364 | 11.00 | 1 | 0 | 0 |
5 | 02.10.2022, 20:30 | Stal Nysa | 3 | 68.0 | 23.0 | 29.0 | 100.0 | 22.00 | 0.102941 | -12 | ... | 5.0 | 4.0 | 0.823529 | 53 | 44 | 0.073529 | 9.00 | 1 | 1 | 0 |
6 | 02.10.2022, 17:30 | Trefl Gdańsk | 3 | 59.0 | 22.0 | 33.0 | 73.0 | 16.00 | 0.101695 | -6 | ... | 2.0 | 8.0 | 0.847458 | 60 | 48 | 0.050847 | 10.00 | 1 | 1 | 0 |
7 | 01.10.2022, 17:30 | Asseco Resovia | 3 | 52.0 | 23.0 | 25.0 | 74.0 | 14.00 | 0.115385 | -8 | ... | 7.0 | 5.0 | 0.692308 | 52 | 34 | 0.192308 | 8.00 | 1 | 1 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
5269 | 14.03.2010, 14:45 | Asseco Resovia | 3 | 60.0 | 78.0 | 6.0 | 8.0 | 2.00 | 1.016667 | 2 | ... | 85.0 | 5.0 | 0.050000 | 48 | 56 | 0.100000 | 2.00 | 1 | 1 | 0 |
5270 | 14.03.2010, 14:45 | Jastrzębski Węgiel | 3 | 73.0 | 87.0 | 4.0 | 15.0 | 1.00 | 0.917808 | 3 | ... | 100.0 | 4.0 | 0.095890 | 58 | 58 | 0.150685 | 2.75 | 1 | 1 | 1 |
5271 | 14.03.2010, 14:45 | ZAKSA Kędzierzyn-Koźle | 3 | 73.0 | 97.0 | 6.0 | 14.0 | 1.50 | 0.904110 | 4 | ... | 96.0 | 7.0 | 0.082192 | 51 | 53 | 0.219178 | 4.00 | 1 | 1 | 1 |
5272 | 13.03.2010, 18:00 | PGE Skra Bełchatów | 3 | 67.0 | 97.0 | 3.0 | 14.0 | 0.75 | 1.044776 | 2 | ... | 104.0 | 8.0 | 0.164179 | 53 | 51 | 0.164179 | 2.75 | 1 | 1 | 1 |
5277 | 20.03.2010, 14:45 | Jastrzębski Węgiel | 3 | 50.0 | 66.0 | 1.0 | 9.0 | 0.33 | 1.040000 | 1 | ... | 73.0 | 7.0 | 0.020000 | 42 | 58 | 0.140000 | 2.33 | 1 | 1 | 1 |
2639 rows × 25 columns
I find the team that won the most games.
a=df5["Team_"].value_counts()
a
ZAKSA Kędzierzyn-Koźle 324
PGE Skra Bełchatów 321
Asseco Resovia 290
Jastrzębski Węgiel 286
Projekt Warszawa 194
AZS Olsztyn 172
Trefl Gdańsk 171
Chemik Bydgoszcz 129
Cuprum Lubin 110
Czarni Radom 108
Warta Zawiercie 91
AZS Częstochowa 83
GKS Katowice 74
Społem Kielce 71
MKS Będzin 44
BBTS Bielsko-Biała 35
Ślepsk Malow Suwałki 32
Stal Nysa 26
LUK Lublin 22
Stocznia Szczecin 19
Jadar Radom 17
Pamapol Wielton Wieluń 14
Barkom Każany Lwów 6
Name: Team_, dtype: int64
df_2 = df1.loc[df1['Team_'] == 'ZAKSA Kędzierzyn-Koźle']
df_2
Date | Team_ | T_Score | T_Sum | T_BP | T_Ratio | T_Srv_Sum | T_Srv_Err | T_Srv_Ace | T_Srv_Eff | ... | T_Att_Err | T_Att_Blk | T_Att_Kill | T_Att_Kill_Perc | T_Att_Eff | T_Blk_Sum | T_Blk_As | Winner | Pred | p | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
14 | 04.10.2022, 21:00 | ZAKSA Kędzierzyn-Koźle | 3 | 59.0 | 23.0 | 37.0 | 73.0 | 9.00 | 0.050847 | -5 | ... | 5.0 | 4.0 | 0.762712 | 59 | 47 | 0.186441 | 7.00 | 1 | 1 | 0 |
29 | 13.10.2022, 17:30 | ZAKSA Kędzierzyn-Koźle | 3 | 53.0 | 20.0 | 34.0 | 73.0 | 7.00 | 0.075472 | 0 | ... | 3.0 | 2.0 | 0.735849 | 56 | 49 | 0.188679 | 5.00 | 1 | 1 | 1 |
44 | 30.10.2022, 14:45 | ZAKSA Kędzierzyn-Koźle | 3 | 50.0 | 22.0 | 26.0 | 73.0 | 11.00 | 0.120000 | 0 | ... | 2.0 | 6.0 | 0.660000 | 50 | 37 | 0.220000 | 9.00 | 1 | 1 | 1 |
59 | 06.11.2022, 14:45 | ZAKSA Kędzierzyn-Koźle | 3 | 62.0 | 20.0 | 26.0 | 95.0 | 14.00 | 0.032258 | -9 | ... | 8.0 | 8.0 | 0.790323 | 49 | 33 | 0.177419 | 18.00 | 1 | 1 | 1 |
74 | 19.11.2022, 17:30 | ZAKSA Kędzierzyn-Koźle | 3 | 60.0 | 15.0 | 40.0 | 74.0 | 9.00 | 0.033333 | -6 | ... | 5.0 | 3.0 | 0.800000 | 57 | 48 | 0.166667 | 6.00 | 1 | 1 | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
5235 | 10.01.2010, 14:45 | ZAKSA Kędzierzyn-Koźle | 3 | 59.0 | 74.0 | 11.0 | 14.0 | 3.67 | 0.694915 | 0 | ... | 68.0 | 2.0 | 0.084746 | 44 | 65 | 0.067797 | 1.33 | 1 | 1 | 1 |
5245 | 12.02.2010, 18:00 | ZAKSA Kędzierzyn-Koźle | 3 | 72.0 | 103.0 | 8.0 | 13.0 | 1.60 | 1.125000 | 11 | ... | 102.0 | 5.0 | 0.166667 | 54 | 53 | 0.138889 | 2.00 | 1 | 1 | 0 |
5254 | 20.02.2010, 17:00 | ZAKSA Kędzierzyn-Koźle | 3 | 67.0 | 91.0 | 3.0 | 16.0 | 0.75 | 1.074627 | 1 | ... | 124.0 | 6.0 | 0.164179 | 56 | 45 | 0.134328 | 2.25 | 1 | 1 | 0 |
5263 | 06.03.2010, 18:00 | ZAKSA Kędzierzyn-Koźle | 3 | 77.0 | 106.0 | 12.0 | 18.0 | 2.40 | 1.077922 | 3 | ... | 115.0 | 8.0 | 0.207792 | 52 | 45 | 0.168831 | 2.60 | 1 | 0 | 1 |
5271 | 14.03.2010, 14:45 | ZAKSA Kędzierzyn-Koźle | 3 | 73.0 | 97.0 | 6.0 | 14.0 | 1.50 | 0.904110 | 4 | ... | 96.0 | 7.0 | 0.082192 | 51 | 53 | 0.219178 | 4.00 | 1 | 1 | 1 |
445 rows × 25 columns
alt.Chart(df_2).mark_circle().encode(
x="T_Blk_Sum",
y="T_Rec_Pos",
color=alt.Color("p", title="Predicted win"),
tooltip = ("T_Blk_Sum", "T_Rec_Pos")
).properties(
width=500,
height=200,
)
There is not much connection between blocking and reception skills; the dark blue dots sit somewhat higher, but the relation is weak, so defensive skills do not appear as important as offensive skills.
Summary#
After modeling, I found that the share of points won by service aces is the most important feature for men’s volleyball, but a larger ace share is actually associated with losing; by contrast, attack efficiency matters more for winning a game. Therefore, practicing attacking skills seems more essential than practicing serving. We also found that blocking and reception have no significant effect, so in this data defensive skills do not show the same importance as offensive ones.
References#
Dataset source:
https://www.kaggle.com/code/kacpergregorowicz/predicting-volleyball-match-winners/notebook
Other references that I found helpful:
https://pandas.pydata.org/pandas-docs/version/0.19/generated/pandas.DataFrame.filter.html