Predict Position of Player with Seasonal Stats
Author: Sarah Thayer
Course Project, UC Irvine, Math 10, W22
Introduction¶
My project explores two Kaggle datasets of NBA stats to predict players' listed positions. Two datasets are needed: one contains the players and their listed positions, and the second contains season stats, height, and weight. We merge the two for a complete dataset.
Main portion of the project¶
import numpy as np
import pandas as pd
import altair as alt
Load First Dataset¶
NBA Stats found here.
The columns we are interested in extracting are Player, Tm (team), Pos, and Age.
Display shape of the dataframe to confirm we have enough data points.
df_position = pd.read_csv("nba.csv", sep = ',' ,encoding = 'latin-1')
df_position = df_position.loc[:,df_position.columns.isin(['Player', 'Tm', 'Pos', 'Age']) ]
df_position.head()
| | Player | Pos | Age | Tm |
---|---|---|---|---|
0 | Alex Abrines\abrinal01 | SG | 24 | OKC |
1 | Quincy Acy\acyqu01 | PF | 27 | BRK |
2 | Steven Adams\adamsst01 | C | 24 | OKC |
3 | Bam Adebayo\adebaba01 | C | 20 | MIA |
4 | Arron Afflalo\afflaar01 | SG | 32 | ORL |
df_position.shape
(664, 4)
Clean First Dataset¶
Rename column Tm to Team.
NBA players have a unique player ID in the Player column. Remove the player ID to view names (e.g. in “Alex Abrines\abrinal01”, remove the unique player ID after the “\”).
NBA players that have been traded mid-season appear more than once. Drop Player duplicates from the dataframe and check the shape.
df_position = df_position.rename(columns={'Tm' : 'Team'})
df_position['Player'] = df_position['Player'].map(lambda x: x.split('\\')[0])
df_pos_unique = df_position[~df_position.duplicated(subset=['Player'])]
df_pos_unique.shape
df_pos_unique.head()
| | Player | Pos | Age | Team |
---|---|---|---|---|
0 | Alex Abrines | SG | 24 | OKC |
1 | Quincy Acy | PF | 27 | BRK |
2 | Steven Adams | C | 24 | OKC |
3 | Bam Adebayo | C | 20 | MIA |
4 | Arron Afflalo | SG | 32 | ORL |
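The cleaning steps can be tried on a small toy frame. The sketch below is illustrative only: the second player and the team codes "AAA" and "BBB" are made up, and `drop_duplicates(keep="first")` is used as an equivalent of the `~duplicated(...)` filter above. It shows that the first row per player is the one that survives, which matters when a player has rows for several teams.

```python
import pandas as pd

# Hypothetical rows in the same "Name\id" format as the raw file.
toy = pd.DataFrame({
    "Player": [
        "Alex Abrines\\abrinal01",
        "Some Player\\playeso01",
        "Some Player\\playeso01",
    ],
    "Tm": ["OKC", "AAA", "BBB"],
})

# Vectorized equivalent of the map/lambda: keep the text before the backslash.
toy["Player"] = toy["Player"].str.split("\\").str[0]

# Keep the first row per player; later rows with the same name are dropped.
toy_unique = toy.drop_duplicates(subset=["Player"], keep="first")
print(toy_unique)
```

Because only the first row is kept, the team recorded for a traded player depends on the row order in the file.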
Load Second Dataset¶
NBA stats found here.
This is a large dataset of 20 years of stats from 49 different leagues. Filter the dataframe for the relevant data in the NBA league during the 2017 - 2018 regular season. The new dataframe contains height and weight.
df = pd.read_csv("players_stats_by_season_full_details.csv", encoding='latin-1' )
df= df[(df["League"] == 'NBA') & (df["Season"] == '2017 - 2018') & (df["Stage"] == 'Regular_Season')]
df_hw = df.loc[:,~df.columns.isin(['Rk', 'League', 'Season', 'Stage', 'birth_month', 'birth_date', 'height', 'weight',
'nationality', 'high_school', 'draft_round', 'draft_pick', 'draft_team'])]
df_hw.shape
(279, 22)
Clean Second Dataset¶
Drop duplicate Player rows from the dataframe containing height and weight.
df_hw_unique = df_hw[~df_hw.duplicated(subset=['Player'])]
df_hw_unique.shape
(279, 22)
Prepare Merged Data¶
Merge First and Second Dataset
Encode the NBA listed positions
Confirm it’s the same player by matching name and team.
df_merged = pd.merge(df_pos_unique,df_hw_unique, on = ['Player','Team'])
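By default `pd.merge` performs an inner join, so any player whose name or team does not match exactly in both frames is silently dropped. A minimal sketch of how to see which rows fail to match, using made-up toy frames (the player names "A", "B", "C" are placeholders) and the `indicator=True` option of `pd.merge`:

```python
import pandas as pd

# Toy stand-ins for the positions frame and the stats frame.
left = pd.DataFrame({
    "Player": ["A", "B", "C"],
    "Team": ["OKC", "BRK", "MIA"],
    "Pos": ["SG", "PF", "C"],
})
right = pd.DataFrame({
    "Player": ["A", "B", "C"],
    "Team": ["OKC", "BRK", "HOU"],   # player C has a different team here
    "height_cm": [198.0, 201.0, 208.0],
})

# A left join with an indicator column keeps every row of `left`
# and records whether a match was found in `right`.
check = pd.merge(left, right, on=["Player", "Team"], how="left", indicator=True)
unmatched = check[check["_merge"] == "left_only"]
print(unmatched[["Player", "Team"]])
```

Inspecting the unmatched rows is a quick way to tell whether players were lost to spelling differences or to team mismatches.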
Encode Positions¶
Encode the positions (strings) into numbers (ints). This is an integer label mapping rather than a true one-hot encoding.
enc = {'PG' : 1, 'SG' : 2, 'SF': 3, 'PF': 4, 'C':5}
df_merged["Pos_enc"] = df_merged["Pos"].map(enc)
df_merged
| | Player | Pos | Age | Team | GP | MIN | FGM | FGA | 3PM | 3PA | ... | DRB | REB | AST | STL | BLK | PTS | birth_year | height_cm | weight_kg | Pos_enc |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Alex Abrines | SG | 24 | OKC | 75 | 1134.2 | 115 | 291 | 84 | 221 | ... | 88 | 114 | 28 | 38 | 8 | 353 | 1993.0 | 198.0 | 86.0 | 2 |
1 | Quincy Acy | PF | 27 | BRK | 70 | 1359.2 | 130 | 365 | 102 | 292 | ... | 217 | 257 | 57 | 33 | 29 | 411 | 1990.0 | 201.0 | 109.0 | 4 |
2 | Steven Adams | C | 24 | OKC | 76 | 2487.0 | 448 | 712 | 0 | 2 | ... | 301 | 685 | 88 | 92 | 78 | 1056 | 1993.0 | 213.0 | 120.0 | 5 |
3 | Bam Adebayo | C | 20 | MIA | 69 | 1368.1 | 174 | 340 | 0 | 7 | ... | 263 | 381 | 101 | 32 | 41 | 477 | 1997.0 | 208.0 | 116.0 | 5 |
4 | LaMarcus Aldridge | C | 32 | SAS | 75 | 2508.7 | 687 | 1347 | 27 | 92 | ... | 389 | 635 | 152 | 43 | 90 | 1735 | 1985.0 | 211.0 | 120.0 | 5 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
229 | Lou Williams | SG | 31 | LAC | 79 | 2589.2 | 582 | 1337 | 186 | 518 | ... | 158 | 198 | 417 | 85 | 19 | 1782 | 1986.0 | 185.0 | 79.0 | 2 |
230 | Justise Winslow | PF | 21 | MIA | 68 | 1680.2 | 207 | 488 | 49 | 129 | ... | 306 | 370 | 148 | 54 | 33 | 529 | 1996.0 | 201.0 | 102.0 | 4 |
231 | Delon Wright | PG | 25 | TOR | 69 | 1432.7 | 201 | 432 | 56 | 153 | ... | 153 | 198 | 200 | 72 | 33 | 555 | 1992.0 | 196.0 | 83.0 | 1 |
232 | Nick Young | SG | 32 | GSW | 80 | 1393.1 | 201 | 488 | 123 | 326 | ... | 105 | 125 | 36 | 38 | 7 | 581 | 1985.0 | 201.0 | 95.0 | 2 |
233 | Thaddeus Young | PF | 29 | IND | 81 | 2607.2 | 421 | 864 | 58 | 181 | ... | 328 | 512 | 152 | 135 | 36 | 955 | 1988.0 | 203.0 | 100.0 | 4 |
234 rows × 25 columns
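The dictionary mapping used for `Pos_enc` assigns a single integer to each position. For comparison, the sketch below puts that mapping next to a one-hot encoding built with `pd.get_dummies` on a small made-up series; it is only meant to show the difference in shape between the two encodings.

```python
import pandas as pd

pos = pd.Series(["PG", "SG", "SF", "PF", "C", "PG"], name="Pos")

# Integer (ordinal) mapping, as used for Pos_enc: one column, values 1..5.
enc = {"PG": 1, "SG": 2, "SF": 3, "PF": 4, "C": 5}
pos_int = pos.map(enc)

# One-hot encoding: one indicator column per position.
pos_onehot = pd.get_dummies(pos, prefix="Pos")

print(pos_int.tolist())
print(pos_onehot.columns.tolist())
```

The integer mapping implies an ordering from guard to center, while the one-hot form treats the five positions as unordered categories.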
Find Best Model¶
Feature Selection: The data has player name, team, position, height, weight, and 20+ seasonal stats. Not all features are relevant to predicting NBA position. Test different variations of features by iterating through combinations() of k columns in cols. Combinations and estimating the counts of the training trials can be found here: “…the number of k-element subsets (or k-combinations) of an n-element set”.
KNN Optimization: On each combination of possible training features, iterate through a range of ints for possible n_neighbors.
Log Results: Create a results dictionary to store features, n_neighbors, log_loss, and classifier for each training iteration. Iterate through the results dictionary to find the smallest log_loss along with the features used, the n_neighbors used, and the classifier object.
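results = {
    0: {                        # key: trial number
        'features': [],         # list of column names used
        'n_neighbors': None,    # int, number of neighbors for KNN
        'log_loss': None,       # float, log loss on the test split
        'classifier': None,     # fitted KNeighborsClassifier
    }
}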
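The size of this search can be estimated before running it. The sketch below assumes the settings used in this section (18 candidate columns, subset sizes 12 through 17, and three values of n_neighbors) and counts the subsets with `math.comb`; it gives an upper bound, since the search stops early once a loss threshold is met.

```python
from math import comb

n_cols = 18                # number of candidate columns in cols
k_values = range(12, 18)   # subset sizes tried by the outer loop
n_neighbor_options = 3     # n_neighbors in range(3, 6)

# Number of k-element subsets for each subset size.
subsets = {k: comb(n_cols, k) for k in k_values}
max_trials = sum(subsets.values()) * n_neighbor_options

for k, n in subsets.items():
    print(f"{n_cols} choose {k}: {n} subsets")
print(f"Upper bound on training trials: {max_trials}")
```

For k = 12 alone there are 18564 subsets, so up to 55692 trials at that size.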
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import log_loss
import warnings
warnings.filterwarnings('ignore')  # silence all warnings, including FutureWarning
from itertools import combinations
# NOTE: 'Pos_enc' encodes the target, so subsets that include it leak the label.
cols = ['Pos_enc','FGM', 'FGA', '3PM','3PA','FTM', 'FTA','TOV','PF','ORB','DRB',
        'REB','AST','STL', "BLK",'PTS','height_cm','weight_kg'
        ]
trial_num = 0       # count of training attempts
loss_min = False    # found log_loss minimum
n_search = True     # searching for ideal n neighbors
results = {}        # dictionary of results per training attempt
found_clf = False

for count in range(12, 18):
    print(f"Testing: {len(cols)} Choose {count}")
    for tup in combinations(cols, count):   # iterate through combinations of columns
        for i in range(3, 6):               # iterate through options of n neighbors
            if n_search:
                X = df_merged[list(tup)]
                y = df_merged["Pos"]
                scaler = StandardScaler()
                scaler.fit(X)
                X_scaled = scaler.transform(X)
                # NOTE: the split below uses the unscaled X, so the fitted
                # classifier never sees X_scaled.
                X_scaled_train, X_scaled_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
                clf = KNeighborsClassifier(n_neighbors=i)
                clf.fit(X_scaled_train, y_train)
                probs = clf.predict_proba(X_scaled_test)
                loss = log_loss(y_true=y_test, y_pred=probs, labels=clf.classes_)
                results[trial_num] = {
                    'features': list(tup),
                    'n_neighbors': i,
                    'log_loss': loss,
                    'classifier': clf
                }
                trial_num += 1
                if loss < .7:
                    n_search = False
                    loss_min = True
                    found_clf = True
                    print("Found ideal n_neighbors")
                    break
        if (n_search == False) or (loss < .6):
            loss_min = True
            print('Found combination of features')
            break
    if loss_min:
        print('Return classifier')
        break
if not found_clf:
    print("Couldn't find accurate classifier. Continue to find best results.")
Testing: 18 Choose 12
Found ideal n_neighbors
Found combination of features
Return classifier
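One way to make sure scaling is applied to exactly the data the classifier sees, and to depend less on a single random split, is to wrap the scaler and the classifier in a scikit-learn pipeline and score it with cross-validation. The sketch below is not the search used in this project: it runs on synthetic data from `make_classification` (a stand-in for the player stats, with 5 classes like the 5 positions) and only tunes n_neighbors.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 250 samples, 12 features, 5 classes.
X_demo, y_demo = make_classification(
    n_samples=250, n_features=12, n_informative=6,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)

# The scaler is refit on the training part of every fold,
# so no information from the held-out fold reaches the model.
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
grid = GridSearchCV(
    pipe,
    param_grid={"kneighborsclassifier__n_neighbors": [3, 4, 5]},
    scoring="neg_log_loss",   # scikit-learn maximizes scores, hence the sign flip
    cv=5,
)
grid.fit(X_demo, y_demo)

print(grid.best_params_)
print(f"Cross-validated log loss: {-grid.best_score_:.3f}")
```

The same pipeline could be fit on a chosen subset of the real feature columns in place of `X_demo`.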
Return Best Results¶
Find the training iteration with the best log_loss.
Return the classifier and print the features selected, neighbors used, and corresponding log_loss.
min_key = 0                                 # trial number of the best result so far
min_log_loss = results[0]['log_loss']
for key in results:
    # key = trial number
    iter_features = results[key]['features']
    iter_n_neighbors = results[key]['n_neighbors']
    iter_log_loss = results[key]['log_loss']
    if iter_log_loss < min_log_loss:
        min_log_loss = iter_log_loss
        min_key = key
print(f"Total Attempts: {len(results)}")
print(f"Best log_loss: {results[min_key]['log_loss']}")
print(f"Best features: {results[min_key]['features']}")
print(f"Number of features: {len(results[min_key]['features'])}")
print(f"Ideal n_neighbors: {results[min_key]['n_neighbors']}")
print(f"Best classifier: {results[min_key]['classifier']}")
Total Attempts: 54609
Best log_loss: 0.6832883413753273
Best features: ['3PM', '3PA', 'FTM', 'FTA', 'TOV', 'ORB', 'DRB', 'REB', 'AST', 'PTS', 'height_cm', 'weight_kg']
Number of features: 12
Ideal n_neighbors: 5
Best classifier: KNeighborsClassifier()
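The lookup of the best trial can also be written with the built-in `min` and a key function. The sketch below uses a small made-up dictionary, `results_demo`, with the same layout as `results`; the feature lists and loss values are invented for illustration.

```python
# Hypothetical results in the same layout as the `results` dictionary.
results_demo = {
    0: {"features": ["AST", "REB"], "n_neighbors": 3, "log_loss": 0.91},
    1: {"features": ["AST", "height_cm"], "n_neighbors": 5, "log_loss": 0.68},
    2: {"features": ["PTS", "weight_kg"], "n_neighbors": 4, "log_loss": 0.75},
}

# min() iterates over the keys and compares them by their stored log_loss.
best_key = min(results_demo, key=lambda k: results_demo[k]["log_loss"])
best = results_demo[best_key]

print(f"Best trial: {best_key}")
print(f"Best log_loss: {best['log_loss']}")
print(f"Best features: {best['features']}")
```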
Predict the position of NBA players on the entire dataset.
Access the best classifier in the results dict by min_key.
X = df_merged[results[min_key]['features']]
scaler = StandardScaler()
scaler.fit(X)
X_scaled = scaler.transform(X)
clf = results[min_key]['classifier']
df_merged['Preds'] = clf.predict(X)
Visualize Results¶
Display all the Centers. The predicted values show good results at identifying Centers.
Look at a true Point Guard. Chris Paul is a good example of a Point Guard.
Look at LeBron James. In 2018, with Cleveland, the Kaggle dataset lists him as a Power Forward.
df_merged[df_merged['Pos_enc']==5]
| | Player | Pos | Age | Team | GP | MIN | FGM | FGA | 3PM | 3PA | ... | REB | AST | STL | BLK | PTS | birth_year | height_cm | weight_kg | Pos_enc | Preds |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | Steven Adams | C | 24 | OKC | 76 | 2487.0 | 448 | 712 | 0 | 2 | ... | 685 | 88 | 92 | 78 | 1056 | 1993.0 | 213.0 | 120.0 | 5 | C |
3 | Bam Adebayo | C | 20 | MIA | 69 | 1368.1 | 174 | 340 | 0 | 7 | ... | 381 | 101 | 32 | 41 | 477 | 1997.0 | 208.0 | 116.0 | 5 | C |
4 | LaMarcus Aldridge | C | 32 | SAS | 75 | 2508.7 | 687 | 1347 | 27 | 92 | ... | 635 | 152 | 43 | 90 | 1735 | 1985.0 | 211.0 | 120.0 | 5 | C |
5 | Jarrett Allen | C | 19 | BRK | 72 | 1441.2 | 234 | 397 | 5 | 15 | ... | 388 | 49 | 28 | 88 | 587 | 1998.0 | 211.0 | 108.0 | 5 | C |
16 | Aron Baynes | C | 31 | BOS | 81 | 1484.7 | 210 | 446 | 3 | 21 | ... | 434 | 93 | 22 | 51 | 482 | 1986.0 | 208.0 | 118.0 | 5 | C |
21 | Jordan Bell | C | 23 | GSW | 57 | 808.9 | 116 | 185 | 0 | 4 | ... | 207 | 102 | 35 | 56 | 262 | 1995.0 | 206.0 | 102.0 | 5 | C |
23 | Bismack Biyombo | C | 25 | ORL | 82 | 1494.9 | 183 | 352 | 0 | 1 | ... | 468 | 66 | 21 | 95 | 468 | 1992.0 | 206.0 | 116.0 | 5 | C |
25 | Tarik Black | C | 26 | HOU | 51 | 536.5 | 75 | 127 | 1 | 11 | ... | 163 | 13 | 21 | 30 | 180 | 1991.0 | 206.0 | 113.0 | 5 | C |
35 | Clint Capela | C | 23 | HOU | 74 | 2034.2 | 441 | 676 | 0 | 1 | ... | 802 | 68 | 58 | 138 | 1026 | 1994.0 | 208.0 | 109.0 | 5 | C |
38 | Willie Cauley-Stein | C | 24 | SAC | 73 | 2044.0 | 388 | 773 | 3 | 12 | ... | 510 | 172 | 77 | 67 | 932 | 1993.0 | 213.0 | 109.0 | 5 | C |
43 | Zach Collins | C | 20 | POR | 66 | 1045.6 | 115 | 289 | 35 | 113 | ... | 221 | 52 | 17 | 31 | 292 | 1997.0 | 213.0 | 107.0 | 5 | C |
51 | Deyonta Davis | C | 21 | MEM | 62 | 942.6 | 161 | 265 | 0 | 0 | ... | 250 | 40 | 15 | 39 | 360 | 1996.0 | 206.0 | 108.0 | 5 | C |
52 | Ed Davis | C | 28 | POR | 78 | 1470.9 | 170 | 292 | 0 | 1 | ... | 575 | 40 | 32 | 52 | 414 | 1989.0 | 208.0 | 102.0 | 5 | C |
53 | Dewayne Dedmon | C | 28 | ATL | 62 | 1542.3 | 250 | 477 | 50 | 141 | ... | 489 | 90 | 40 | 51 | 617 | 1989.0 | 213.0 | 111.0 | 5 | C |
57 | Gorgui Dieng | C | 28 | MIN | 79 | 1332.7 | 186 | 388 | 19 | 61 | ... | 360 | 71 | 45 | 39 | 470 | 1990.0 | 211.0 | 114.0 | 5 | C |
60 | Andre Drummond | C | 24 | DET | 78 | 2625.0 | 466 | 881 | 0 | 11 | ... | 1247 | 237 | 114 | 127 | 1171 | 1993.0 | 211.0 | 127.0 | 5 | C |
63 | Joel Embiid | C | 23 | PHI | 63 | 1912.3 | 510 | 1056 | 66 | 214 | ... | 690 | 199 | 40 | 111 | 1445 | 1994.0 | 213.0 | 113.0 | 5 | C |
64 | Derrick Favors | C | 26 | UTA | 77 | 2153.5 | 395 | 702 | 14 | 63 | ... | 552 | 102 | 54 | 82 | 944 | 1991.0 | 208.0 | 120.0 | 5 | C |
72 | Marc Gasol | C | 33 | MEM | 73 | 2408.4 | 434 | 1033 | 109 | 320 | ... | 592 | 305 | 54 | 101 | 1258 | 1985.0 | 216.0 | 116.0 | 5 | C |
73 | Pau Gasol | C | 37 | SAS | 77 | 1812.0 | 287 | 627 | 43 | 120 | ... | 619 | 238 | 24 | 79 | 775 | 1980.0 | 213.0 | 113.0 | 5 | C |
78 | Rudy Gobert | C | 25 | UTA | 56 | 1816.1 | 276 | 444 | 0 | 0 | ... | 601 | 80 | 44 | 129 | 756 | 1992.0 | 216.0 | 111.0 | 5 | C |
81 | Marcin Gortat | C | 33 | WAS | 82 | 2074.6 | 290 | 560 | 0 | 0 | ... | 623 | 151 | 40 | 61 | 690 | 1984.0 | 211.0 | 109.0 | 5 | C |
90 | Montrezl Harrell | C | 24 | LAC | 76 | 1293.0 | 348 | 548 | 1 | 7 | ... | 307 | 74 | 36 | 52 | 836 | 1994.0 | 203.0 | 109.0 | 5 | PF |
94 | John Henson | C | 27 | MIL | 76 | 1969.7 | 287 | 502 | 1 | 7 | ... | 513 | 114 | 45 | 109 | 665 | 1990.0 | 211.0 | 99.0 | 5 | C |
100 | Al Horford | C | 31 | BOS | 72 | 2277.1 | 368 | 753 | 97 | 226 | ... | 530 | 339 | 43 | 78 | 927 | 1986.0 | 208.0 | 111.0 | 5 | PF |
112 | Amir Johnson | C | 30 | PHI | 74 | 1170.3 | 140 | 260 | 10 | 32 | ... | 330 | 118 | 45 | 44 | 342 | 1987.0 | 206.0 | 109.0 | 5 | C |
117 | Nikola Jokic | C | 22 | DEN | 75 | 2440.9 | 504 | 1010 | 111 | 280 | ... | 803 | 458 | 90 | 61 | 1385 | 1995.0 | 213.0 | 113.0 | 5 | C |
119 | DeAndre Jordan | C | 29 | LAC | 77 | 2423.7 | 373 | 578 | 0 | 0 | ... | 1171 | 117 | 39 | 71 | 927 | 1988.0 | 211.0 | 120.0 | 5 | C |
121 | Enes Kanter | C | 25 | NYK | 71 | 1829.8 | 422 | 713 | 0 | 2 | ... | 780 | 105 | 36 | 37 | 1000 | 1992.0 | 211.0 | 113.0 | 5 | C |
125 | Kosta Koufos | C | 28 | SAC | 71 | 1391.2 | 222 | 389 | 0 | 0 | ... | 472 | 87 | 48 | 32 | 477 | 1989.0 | 213.0 | 120.0 | 5 | C |
133 | Kevon Looney | C | 21 | GSW | 66 | 910.0 | 112 | 193 | 1 | 5 | ... | 215 | 42 | 34 | 56 | 267 | 1996.0 | 206.0 | 100.0 | 5 | C |
134 | Brook Lopez | C | 29 | LAL | 74 | 1735.0 | 369 | 793 | 112 | 325 | ... | 294 | 126 | 30 | 98 | 961 | 1988.0 | 213.0 | 122.0 | 5 | SF |
135 | Robin Lopez | C | 29 | CHI | 64 | 1690.5 | 342 | 645 | 4 | 14 | ... | 290 | 124 | 14 | 53 | 756 | 1988.0 | 213.0 | 125.0 | 5 | C |
136 | Kevin Love | C | 29 | CLE | 59 | 1651.2 | 334 | 729 | 137 | 330 | ... | 546 | 103 | 43 | 24 | 1039 | 1988.0 | 208.0 | 114.0 | 5 | PF |
140 | Ian Mahinmi | C | 31 | WAS | 77 | 1145.3 | 138 | 248 | 0 | 2 | ... | 312 | 53 | 38 | 42 | 366 | 1986.0 | 211.0 | 119.0 | 5 | C |
141 | Thon Maker | C | 20 | MIL | 74 | 1237.8 | 130 | 316 | 31 | 104 | ... | 225 | 46 | 38 | 53 | 356 | 1997.0 | 216.0 | 100.0 | 5 | C |
148 | JaVale McGee | C | 30 | GSW | 65 | 615.2 | 136 | 219 | 0 | 6 | ... | 169 | 33 | 21 | 57 | 310 | 1988.0 | 213.0 | 122.0 | 5 | C |
150 | Salah Mejri | C | 31 | DAL | 61 | 728.9 | 88 | 137 | 0 | 3 | ... | 246 | 35 | 22 | 67 | 214 | 1986.0 | 218.0 | 107.0 | 5 | C |
164 | Dirk Nowitzki | C | 39 | DAL | 77 | 1900.4 | 346 | 758 | 138 | 337 | ... | 438 | 120 | 43 | 45 | 927 | 1978.0 | 213.0 | 111.0 | 5 | PF |
166 | Jusuf Nurkic | C | 23 | POR | 79 | 2088.3 | 480 | 951 | 0 | 7 | ... | 708 | 143 | 64 | 111 | 1132 | 1994.0 | 213.0 | 132.0 | 5 | C |
169 | Kyle O'Quinn | C | 27 | NYK | 77 | 1386.6 | 224 | 384 | 4 | 17 | ... | 470 | 158 | 36 | 98 | 550 | 1990.0 | 208.0 | 113.0 | 5 | C |
174 | Zaza Pachulia | C | 33 | GSW | 69 | 971.8 | 149 | 264 | 0 | 1 | ... | 321 | 109 | 38 | 17 | 373 | 1984.0 | 211.0 | 122.0 | 5 | C |
179 | Mason Plumlee | C | 27 | DEN | 74 | 1441.3 | 221 | 368 | 0 | 1 | ... | 400 | 142 | 49 | 81 | 524 | 1990.0 | 211.0 | 107.0 | 5 | C |
180 | Jakob Poeltl | C | 22 | TOR | 82 | 1523.5 | 253 | 384 | 1 | 2 | ... | 393 | 57 | 39 | 100 | 567 | 1995.0 | 213.0 | 109.0 | 5 | C |
183 | Dwight Powell | C | 26 | DAL | 79 | 1671.6 | 255 | 430 | 28 | 84 | ... | 444 | 91 | 67 | 32 | 671 | 1991.0 | 211.0 | 109.0 | 5 | C |
185 | Julius Randle | C | 23 | LAL | 82 | 2190.5 | 504 | 904 | 10 | 45 | ... | 654 | 210 | 43 | 45 | 1323 | 1994.0 | 206.0 | 113.0 | 5 | C |
193 | Domantas Sabonis | C | 21 | IND | 74 | 1811.8 | 340 | 661 | 13 | 37 | ... | 572 | 151 | 40 | 32 | 861 | 1996.0 | 211.0 | 109.0 | 5 | C |
211 | Daniel Theis | C | 25 | BOS | 63 | 935.7 | 126 | 233 | 18 | 58 | ... | 274 | 56 | 30 | 48 | 331 | 1992.0 | 203.0 | 110.0 | 5 | C |
214 | Tristan Thompson | C | 26 | CLE | 53 | 1072.2 | 132 | 235 | 0 | 0 | ... | 352 | 33 | 16 | 17 | 307 | 1991.0 | 208.0 | 108.0 | 5 | C |
217 | Karl-Anthony Towns | C | 22 | MIN | 82 | 2918.1 | 639 | 1172 | 120 | 285 | ... | 1012 | 199 | 64 | 115 | 1743 | 1995.0 | 213.0 | 112.0 | 5 | C |
220 | Myles Turner | C | 21 | IND | 65 | 1836.3 | 306 | 639 | 56 | 157 | ... | 417 | 87 | 38 | 118 | 828 | 1996.0 | 211.0 | 113.0 | 5 | PF |
221 | Ekpe Udoh | C | 30 | UTA | 63 | 809.5 | 60 | 120 | 0 | 1 | ... | 150 | 53 | 43 | 74 | 162 | 1987.0 | 208.0 | 111.0 | 5 | SF |
222 | Jonas Valanciunas | C | 25 | TOR | 77 | 1727.3 | 390 | 687 | 30 | 74 | ... | 660 | 81 | 29 | 69 | 980 | 1992.0 | 213.0 | 120.0 | 5 | C |
225 | David West | C | 37 | GSW | 73 | 998.7 | 216 | 378 | 3 | 8 | ... | 238 | 138 | 47 | 75 | 495 | 1980.0 | 206.0 | 113.0 | 5 | C |
227 | Hassan Whiteside | C | 28 | MIA | 54 | 1364.2 | 312 | 578 | 2 | 2 | ... | 618 | 54 | 38 | 94 | 754 | 1989.0 | 216.0 | 120.0 | 5 | C |
55 rows × 26 columns
df_merged[df_merged['Player']=='Chris Paul']
| | Player | Pos | Age | Team | GP | MIN | FGM | FGA | 3PM | 3PA | ... | REB | AST | STL | BLK | PTS | birth_year | height_cm | weight_kg | Pos_enc | Preds |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
178 | Chris Paul | PG | 32 | HOU | 58 | 1846.6 | 367 | 798 | 144 | 379 | ... | 313 | 457 | 96 | 14 | 1081 | 1985.0 | 183.0 | 79.0 | 1 | PG |
1 rows × 26 columns
df_merged[df_merged['Player']=='LeBron James']
| | Player | Pos | Age | Team | GP | MIN | FGM | FGA | 3PM | 3PA | ... | REB | AST | STL | BLK | PTS | birth_year | height_cm | weight_kg | Pos_enc | Preds |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
110 | LeBron James | PF | 33 | CLE | 82 | 3025.8 | 857 | 1580 | 149 | 406 | ... | 709 | 747 | 116 | 71 | 2251 | 1984.0 | 203.0 | 113.0 | 4 | PF |
1 rows × 26 columns
Evaluate Performance¶
Evaluate the performance of the classifier using the log_loss metric. The split below is a fresh random split, so some of its test rows may have been in the classifier's training set.
X_scaled_train, X_scaled_test, y_train, y_test = train_test_split(X,y,test_size=0.2)
probs = clf.predict_proba(X_scaled_test)
loss = log_loss(y_true=y_test ,y_pred=probs , labels= clf.classes_)
loss
0.6991690157387052
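Log loss is the average negative log of the probability the classifier assigned to the true class, so lower is better and confident wrong answers are penalized heavily. As a small worked example, the sketch below computes it by hand for three made-up predictions over two classes and compares the result with `sklearn.metrics.log_loss`.

```python
import numpy as np
from sklearn.metrics import log_loss

classes = ["C", "PG"]                 # column order of the probabilities
y_true = ["C", "PG", "C"]             # made-up true labels
probs = np.array([[0.8, 0.2],         # made-up predicted probabilities
                  [0.4, 0.6],
                  [0.5, 0.5]])

# Pick the probability assigned to the true class in each row.
idx = [classes.index(label) for label in y_true]
p_true = probs[np.arange(len(y_true)), idx]

manual = -np.mean(np.log(p_true))
library = log_loss(y_true, probs, labels=classes)
print(manual, library)
```

The two values agree, which confirms that the metric only looks at the probability given to the correct position for each player.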
1st row: mostly pink with some light blue. The classifier is good at identifying Point Guards; sometimes Small Forwards trigger a false positive.
5th row: mostly blue with some yellow. This makes sense: the classifier is great at identifying Centers, and sometimes Power Forwards trigger a false positive.
chart = alt.Chart(df_merged).mark_circle().encode(
    x=alt.X('height_cm:O', axis=alt.Axis(title='Height in cm')),
    y=alt.Y('Pos_enc:O', axis=alt.Axis(title='Encoded')),
    color=alt.Color("Preds", title="Positions"),
).properties(
    title="Predicted NBA Positions",
)
chart
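The pattern visible in the chart can also be read as a table of listed versus predicted positions. The sketch below builds such a table with `pd.crosstab` on a small made-up frame that uses the same column names as df_merged ("Pos" and "Preds"); the rows are invented and do not reflect the project's results.

```python
import pandas as pd

# Made-up listed and predicted positions for six players.
demo = pd.DataFrame({
    "Pos":   ["PG", "PG", "SF", "C", "C", "PF"],
    "Preds": ["PG", "SF", "SF", "C", "PF", "PF"],
})

# Rows are listed positions, columns are predicted positions.
confusion = pd.crosstab(
    demo["Pos"], demo["Preds"],
    rownames=["Listed"], colnames=["Predicted"],
)

# Share of players in each listed position that were predicted correctly.
per_position_accuracy = (demo["Pos"] == demo["Preds"]).groupby(demo["Pos"]).mean()

print(confusion)
print(per_position_accuracy)
```

Applied to df_merged, the off-diagonal cells would show which pairs of positions are confused with each other.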
Summary¶
Taking player seasonal stats, height, and weight we attempted to predict NBA positions by classification. Some NBA positions are easier to predict than others.
References¶
Dataframes used from Kaggle
Basketball Players Stats per Season - 49 Leagues found here.
NBA Player Stats 2017-2018 found here.