The relationship and prediction between Pokemon’s stats and generation#

Author: Mingyan Xu

Course Project, UC Irvine, Math 10, F22

Introduction#

Looking at projects from past quarters, I found several analyses of Pokemon, one of my favorite games, so I chose to explore this dataset further. My main goal is to study the relationship between generation and each of the stats, and to see whether Pokemon have become stronger from one generation to the next. In addition, I will try to predict trends in the design of new-generation Pokemon, such as their types and stats.

Import Section#

import pandas as pd
import altair as alt
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

Feature Engineering#

This dataset contains general information on 800 Pokemon, including their names, types, HP, attack, defense, and other stats.

df = pd.read_csv("Pokemon.csv")
df
# Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
0 1 Bulbasaur Grass Poison 318 45 49 49 65 65 45 1 False
1 2 Ivysaur Grass Poison 405 60 62 63 80 80 60 1 False
2 3 Venusaur Grass Poison 525 80 82 83 100 100 80 1 False
3 3 VenusaurMega Venusaur Grass Poison 625 80 100 123 122 120 80 1 False
4 4 Charmander Fire NaN 309 39 52 43 60 50 65 1 False
... ... ... ... ... ... ... ... ... ... ... ... ... ...
795 719 Diancie Rock Fairy 600 50 100 150 100 150 50 6 True
796 719 DiancieMega Diancie Rock Fairy 700 50 160 110 160 110 110 6 True
797 720 HoopaHoopa Confined Psychic Ghost 600 80 110 60 150 130 70 6 True
798 720 HoopaHoopa Unbound Psychic Dark 680 80 160 60 170 130 80 6 True
799 721 Volcanion Fire Water 600 80 110 120 130 90 70 6 True

800 rows × 13 columns

I will use only the Pokemon that have two types, to avoid errors caused by missing type values. The dropna() method drops every row that contains a NaN value anywhere in the dataset.

df = df.dropna()

I plan to use only non-legendary Pokemon for this project, because legendary Pokemon are designed to have higher stats than regular ones.

df = df[df["Legendary"]==False]
df
# Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
0 1 Bulbasaur Grass Poison 318 45 49 49 65 65 45 1 False
1 2 Ivysaur Grass Poison 405 60 62 63 80 80 60 1 False
2 3 Venusaur Grass Poison 525 80 82 83 100 100 80 1 False
3 3 VenusaurMega Venusaur Grass Poison 625 80 100 123 122 120 80 1 False
6 6 Charizard Fire Flying 534 78 84 78 109 85 100 1 False
... ... ... ... ... ... ... ... ... ... ... ... ... ...
785 711 GourgeistSmall Size Ghost Grass 494 55 85 122 58 75 99 6 False
786 711 GourgeistLarge Size Ghost Grass 494 75 95 122 58 75 69 6 False
787 711 GourgeistSuper Size Ghost Grass 494 85 100 122 58 75 54 6 False
790 714 Noibat Flying Dragon 245 40 30 35 45 40 55 6 False
791 715 Noivern Flying Dragon 535 85 70 80 97 80 123 6 False

374 rows × 13 columns

I don’t need the “Legendary” column anymore, so I drop it.

df = df.drop("Legendary",axis=1)

The original column names do not work well in Altair charts, so I rename the special attack and special defense columns.

df = df.rename(columns={"Sp. Atk":"SpecialA","Sp. Def": "SpecialD"})
df.shape
(374, 12)
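
The cleaning steps in this section can also be written as one method chain. The sketch below runs on a tiny made-up frame (the three rows in `toy` are copied from the table above for illustration, not the full dataset) and assumes the same column names as Pokemon.csv. Unlike a bare dropna(), it passes subset=["Type 2"], so only a missing second type causes a row to be dropped; whether that gives the same result on the full data depends on no other column containing NaN, which is an assumption here.

```python
import pandas as pd

# Illustrative rows only; values copied from the table shown earlier.
toy = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Diancie"],
    "Type 1": ["Grass", "Fire", "Rock"],
    "Type 2": ["Poison", None, "Fairy"],
    "Sp. Atk": [65, 60, 100],
    "Sp. Def": [65, 50, 150],
    "Legendary": [False, False, True],
})

cleaned = (
    toy.dropna(subset=["Type 2"])        # keep dual-type Pokemon only
       .loc[lambda d: ~d["Legendary"]]   # keep non-legendary Pokemon
       .drop(columns="Legendary")
       .rename(columns={"Sp. Atk": "SpecialA", "Sp. Def": "SpecialD"})
)
print(cleaned)
```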

Analysis of the dataset#

To focus on generation, I first check the number of Pokemon in each generation.

df["Generation"].value_counts()
1    74
3    74
5    69
4    61
2    53
6    43
Name: Generation, dtype: int64

Use the groupby method to display the rows of each generation.

df.groupby("Generation").apply(display)
# Name Type 1 Type 2 Total HP Attack Defense SpecialA SpecialD Speed Generation
0 1 Bulbasaur Grass Poison 318 45 49 49 65 65 45 1
1 2 Ivysaur Grass Poison 405 60 62 63 80 80 60 1
2 3 Venusaur Grass Poison 525 80 82 83 100 100 80 1
3 3 VenusaurMega Venusaur Grass Poison 625 80 100 123 122 120 80 1
6 6 Charizard Fire Flying 534 78 84 78 109 85 100 1
... ... ... ... ... ... ... ... ... ... ... ... ...
151 140 Kabuto Rock Water 355 30 80 90 55 45 55 1
152 141 Kabutops Rock Water 495 60 115 105 65 70 80 1
153 142 Aerodactyl Rock Flying 515 80 105 65 60 75 130 1
154 142 AerodactylMega Aerodactyl Rock Flying 615 80 135 85 70 95 150 1
161 149 Dragonite Dragon Flying 600 91 134 95 100 100 80 1

74 rows × 12 columns

# Name Type 1 Type 2 Total HP Attack Defense SpecialA SpecialD Speed Generation
177 163 Hoothoot Normal Flying 262 60 30 30 36 56 50 2
178 164 Noctowl Normal Flying 442 100 50 50 76 96 70 2
179 165 Ledyba Bug Flying 265 40 20 30 40 80 55 2
180 166 Ledian Bug Flying 390 55 35 50 55 110 85 2
181 167 Spinarak Bug Poison 250 40 60 40 40 40 30 2
182 168 Ariados Bug Poison 390 70 90 70 60 60 40 2
183 169 Crobat Poison Flying 535 85 90 80 70 80 130 2
184 170 Chinchou Water Electric 330 75 38 38 56 56 67 2
185 171 Lanturn Water Electric 460 125 58 58 76 76 67 2
188 174 Igglybuff Normal Fairy 210 90 30 15 40 20 15 2
190 176 Togetic Fairy Flying 405 55 40 85 80 105 40 2
191 177 Natu Psychic Flying 320 40 50 45 70 45 70 2
192 178 Xatu Psychic Flying 470 65 75 70 95 70 95 2
196 181 AmpharosMega Ampharos Electric Dragon 610 90 95 105 165 110 45 2
198 183 Marill Water Fairy 250 70 20 50 20 50 40 2
199 184 Azumarill Water Fairy 420 100 50 80 60 80 50 2
202 187 Hoppip Grass Flying 250 35 35 40 35 55 50 2
203 188 Skiploom Grass Flying 340 55 45 50 45 65 80 2
204 189 Jumpluff Grass Flying 460 75 55 70 55 95 110 2
208 193 Yanma Bug Flying 390 65 65 45 75 45 95 2
209 194 Wooper Water Ground 210 55 45 45 25 25 15 2
210 195 Quagsire Water Ground 430 95 85 85 65 65 35 2
213 198 Murkrow Dark Flying 405 60 85 42 85 42 91 2
214 199 Slowking Water Psychic 490 95 75 80 100 110 30 2
218 203 Girafarig Normal Psychic 455 70 80 65 90 65 85 2
220 205 Forretress Bug Steel 465 75 90 140 60 60 40 2
222 207 Gligar Ground Flying 430 65 75 105 35 65 85 2
223 208 Steelix Steel Ground 510 75 85 200 55 65 30 2
224 208 SteelixMega Steelix Steel Ground 610 75 125 230 55 95 30 2
227 211 Qwilfish Water Poison 430 65 95 75 55 55 85 2
228 212 Scizor Bug Steel 500 70 130 100 55 80 65 2
229 212 ScizorMega Scizor Bug Steel 600 70 150 140 65 100 75 2
230 213 Shuckle Bug Rock 505 20 10 230 10 230 5 2
231 214 Heracross Bug Fighting 500 80 125 75 40 95 85 2
232 214 HeracrossMega Heracross Bug Fighting 600 80 185 115 40 105 75 2
233 215 Sneasel Dark Ice 430 55 95 55 35 75 115 2
237 219 Magcargo Fire Rock 410 50 50 120 80 80 30 2
238 220 Swinub Ice Ground 250 50 50 40 30 30 50 2
239 221 Piloswine Ice Ground 450 100 100 80 60 60 50 2
240 222 Corsola Water Rock 380 55 55 85 65 85 35 2
243 225 Delibird Ice Flying 330 45 55 45 65 45 75 2
244 226 Mantine Water Flying 465 65 40 70 80 140 70 2
245 227 Skarmory Steel Flying 465 65 80 140 40 70 70 2
246 228 Houndour Dark Fire 330 45 60 30 80 50 65 2
247 229 Houndoom Dark Fire 500 75 90 50 110 80 95 2
248 229 HoundoomMega Houndoom Dark Fire 600 75 90 90 140 90 115 2
249 230 Kingdra Water Dragon 540 75 95 95 95 95 85 2
257 238 Smoochum Ice Psychic 305 45 30 15 85 65 65 2
265 246 Larvitar Rock Ground 300 50 64 50 45 50 41 2
266 247 Pupitar Rock Ground 410 70 84 70 65 70 51 2
267 248 Tyranitar Rock Dark 600 100 134 110 95 100 61 2
268 248 TyranitarMega Tyranitar Rock Dark 700 100 164 150 95 120 71 2
271 251 Celebi Psychic Grass 600 100 100 100 100 100 100 2
# Name Type 1 Type 2 Total HP Attack Defense SpecialA SpecialD Speed Generation
275 254 SceptileMega Sceptile Grass Dragon 630 70 110 75 145 85 145 3
277 256 Combusken Fire Fighting 405 60 85 60 85 60 55 3
278 257 Blaziken Fire Fighting 530 80 120 70 110 70 80 3
279 257 BlazikenMega Blaziken Fire Fighting 630 80 160 80 130 80 100 3
281 259 Marshtomp Water Ground 405 70 85 70 60 70 50 3
... ... ... ... ... ... ... ... ... ... ... ... ...
409 373 SalamenceMega Salamence Dragon Flying 700 95 145 130 120 90 120 3
410 374 Beldum Steel Psychic 300 40 55 80 35 60 30 3
411 375 Metang Steel Psychic 420 60 75 100 55 80 50 3
412 376 Metagross Steel Psychic 600 80 135 130 95 90 70 3
413 376 MetagrossMega Metagross Steel Psychic 700 80 145 150 105 110 110 3

74 rows × 12 columns

# Name Type 1 Type 2 Total HP Attack Defense SpecialA SpecialD Speed Generation
434 389 Torterra Grass Ground 525 95 109 105 75 85 56 4
436 391 Monferno Fire Fighting 405 64 78 52 78 52 81 4
437 392 Infernape Fire Fighting 534 76 104 71 104 71 108 4
440 395 Empoleon Water Steel 530 84 86 88 111 101 60 4
441 396 Starly Normal Flying 245 40 55 30 30 30 60 4
... ... ... ... ... ... ... ... ... ... ... ... ...
532 479 RotomHeat Rotom Electric Fire 520 50 65 107 105 107 86 4
533 479 RotomWash Rotom Electric Water 520 50 65 107 105 107 86 4
534 479 RotomFrost Rotom Electric Ice 520 50 65 107 105 107 86 4
535 479 RotomFan Rotom Electric Flying 520 50 65 107 105 107 86 4
536 479 RotomMow Rotom Electric Grass 520 50 65 107 105 107 86 4

61 rows × 12 columns

# Name Type 1 Type 2 Total HP Attack Defense SpecialA SpecialD Speed Generation
558 499 Pignite Fire Fighting 418 90 93 55 70 55 55 5
559 500 Emboar Fire Fighting 528 110 123 65 100 65 65 5
578 519 Pidove Normal Flying 264 50 55 50 36 30 43 5
579 520 Tranquill Normal Flying 358 62 77 62 50 42 65 5
580 521 Unfezant Normal Flying 488 80 115 80 65 55 93 5
... ... ... ... ... ... ... ... ... ... ... ... ...
713 647 KeldeoOrdinary Forme Water Fighting 580 91 72 90 129 90 108 5
714 647 KeldeoResolute Forme Water Fighting 580 91 72 90 129 90 108 5
715 648 MeloettaAria Forme Normal Psychic 600 100 77 77 128 128 90 5
716 648 MeloettaPirouette Forme Normal Fighting 600 100 128 90 77 77 128 5
717 649 Genesect Bug Steel 600 71 120 95 120 95 99 5

69 rows × 12 columns

# Name Type 1 Type 2 Total HP Attack Defense SpecialA SpecialD Speed Generation
720 652 Chesnaught Grass Fighting 530 88 107 122 74 75 64 6
723 655 Delphox Fire Psychic 534 75 69 72 114 100 104 6
726 658 Greninja Water Dark 530 72 95 67 103 71 122 6
728 660 Diggersby Normal Ground 423 85 56 77 50 77 78 6
729 661 Fletchling Normal Flying 278 45 50 43 40 38 62 6
730 662 Fletchinder Fire Flying 382 62 73 55 56 52 84 6
731 663 Talonflame Fire Flying 499 78 81 71 74 69 126 6
734 666 Vivillon Bug Flying 411 80 52 50 90 50 89 6
735 667 Litleo Fire Normal 369 62 50 58 73 54 72 6
736 668 Pyroar Fire Normal 507 86 68 72 109 66 106 6
743 675 Pangoro Fighting Dark 495 95 124 78 69 71 58 6
748 679 Honedge Steel Ghost 325 45 80 100 35 37 28 6
749 680 Doublade Steel Ghost 448 59 110 150 45 49 35 6
750 681 AegislashBlade Forme Steel Ghost 520 60 150 50 150 50 60 6
751 681 AegislashShield Forme Steel Ghost 520 60 50 150 50 150 60 6
756 686 Inkay Dark Psychic 288 53 54 53 37 46 45 6
757 687 Malamar Dark Psychic 482 86 92 88 68 75 73 6
758 688 Binacle Rock Water 306 42 52 67 39 56 50 6
759 689 Barbaracle Rock Water 500 72 105 115 54 86 68 6
760 690 Skrelp Poison Water 320 50 60 60 60 60 30 6
761 691 Dragalge Poison Dragon 494 65 75 90 97 123 44 6
764 694 Helioptile Electric Normal 289 44 38 33 61 43 70 6
765 695 Heliolisk Electric Normal 481 62 55 52 109 94 109 6
766 696 Tyrunt Rock Dragon 362 58 89 77 45 45 48 6
767 697 Tyrantrum Rock Dragon 521 82 121 119 69 59 71 6
768 698 Amaura Rock Ice 362 77 59 50 67 63 46 6
769 699 Aurorus Rock Ice 521 123 77 72 99 92 58 6
771 701 Hawlucha Fighting Flying 500 78 92 75 74 63 118 6
772 702 Dedenne Electric Fairy 431 67 58 57 81 67 101 6
773 703 Carbink Rock Fairy 500 50 50 150 50 150 50 6
777 707 Klefki Steel Fairy 470 57 80 91 80 87 75 6
778 708 Phantump Ghost Grass 309 43 70 48 50 60 38 6
779 709 Trevenant Ghost Grass 474 85 110 76 65 82 56 6
780 710 PumpkabooAverage Size Ghost Grass 335 49 66 70 44 55 51 6
781 710 PumpkabooSmall Size Ghost Grass 335 44 66 70 44 55 56 6
782 710 PumpkabooLarge Size Ghost Grass 335 54 66 70 44 55 46 6
783 710 PumpkabooSuper Size Ghost Grass 335 59 66 70 44 55 41 6
784 711 GourgeistAverage Size Ghost Grass 494 65 90 122 58 75 84 6
785 711 GourgeistSmall Size Ghost Grass 494 55 85 122 58 75 99 6
786 711 GourgeistLarge Size Ghost Grass 494 75 95 122 58 75 69 6
787 711 GourgeistSuper Size Ghost Grass 494 85 100 122 58 75 54 6
790 714 Noibat Flying Dragon 245 40 30 35 45 40 55 6
791 715 Noivern Flying Dragon 535 85 70 80 97 80 123 6
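
To compare generations directly rather than reading the raw rows, a per-generation summary is a compact alternative. A minimal sketch on a few rows taken from the output above (the frame `sample` is illustrative; on the real data the same call would be applied to df):

```python
import pandas as pd

# A handful of rows copied from the tables above, for illustration only.
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Hoothoot", "Noctowl", "Noibat", "Noivern"],
    "Total": [318, 405, 262, 442, 245, 535],
    "Generation": [1, 1, 2, 2, 6, 6],
})

# Number of Pokemon and average Total per generation.
summary = sample.groupby("Generation")["Total"].agg(["count", "mean"])
print(summary)
```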

Divide the dataset into six sub-DataFrames, one per generation.

grouped = df.groupby("Generation")
# Note: the name `list` shadows Python's built-in; it is kept because later cells index into it.
list=[]

for gen, group in grouped:
    list.append(group)
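
Because the list above is zero-indexed, generation g ends up at position g - 1 (generation 2 is list[1]). A dictionary keyed by the generation number is one way to avoid that offset, and it does not reuse the name of Python's built-in list. A small sketch, where `by_gen` and the toy frame are hypothetical names used only for illustration:

```python
import pandas as pd

toy = pd.DataFrame({
    "Name": ["Bulbasaur", "Hoothoot", "Noctowl", "Noibat"],
    "Generation": [1, 2, 2, 6],
})

# Map each generation number to its own sub-DataFrame (copied so that
# adding columns later does not trigger a SettingWithCopyWarning).
by_gen = {gen: group.copy() for gen, group in toy.groupby("Generation")}

print(by_gen[2])
```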

Data Visualization#

First, I visualize the number of Pokemon in each generation. The first and third generations have the most (74 each), and the sixth generation has the fewest (43).

alt.Chart(df).mark_bar(size=15).encode(
    x=alt.X("Generation"),
    y="count()",
    color="Generation:N"
)
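# Set the seaborn theme before plotting so that it applies to this figure.
sns.set(font_scale=1.0)
sns.set_style("whitegrid")
plt.figure(figsize=(6,4))
# Correlation matrix of the six base stats (HP through Speed).
corr=df.iloc[:,5:11].corr().round(3)
sns.heatmap(corr,annot=True)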
[Figure: heatmap of the pairwise correlations between HP, Attack, Defense, SpecialA, SpecialD, and Speed]
alt.Chart(df).mark_circle().encode(
    x=alt.X("SpecialA",scale=alt.Scale(domain=[0, 250])),
    y=alt.Y("Attack",scale=alt.Scale(domain=[0, 240])),
    color="Generation:N",
    tooltip=["Name","Attack","SpecialA"]
).properties(
    title="Attack vs. Special Attack based on generation"
)
alt.Chart(df).mark_circle().encode(
    x=alt.X("SpecialD",scale=alt.Scale(domain=[0, 250])),
    y=alt.Y("Defense",scale=alt.Scale(domain=[0, 240])),
    color="Generation:N",
    tooltip=["Name","Defense","SpecialD"]
).properties(
    title="Defense vs. Special Defense based on generation"
)

From the charts above, the overall shape and trend are similar for every generation. The distribution is not what I expected: I thought a Pokemon would have either a high attack or a high special attack, but the charts show that most Pokemon have similar attack and special attack values. The same holds for defense and special defense.
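
Each cell of the heatmap above is a Pearson correlation coefficient, r = cov(x, y) / (σ_x σ_y). As a worked illustration, the sketch below computes r by hand for Attack and SpecialA using five generation-1 rows copied from the table in this notebook (Bulbasaur, Ivysaur, Venusaur, Mega Venusaur, Charizard). The helper name `pearson` is hypothetical, and five rows are far too few to say anything about the full dataset.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

attack = [49, 62, 82, 100, 84]
special_attack = [65, 80, 100, 122, 109]
print(round(pearson(attack, special_attack), 3))
```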

K-Means Clustering#

Use the stat values to assign each Pokemon to a cluster. I choose 6 clusters because there are 6 generations in the dataset.

kmeans = KMeans(n_clusters=6)

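Use all the stat columns in the dataset, from "Total" through "Speed", to fit the clusters.

first_col = "Total"
last_col = "Speed"
# Label-based slicing selects every column from first_col to last_col inclusive;
# df[[first_col,last_col]] would select only those two columns.
kmeans.fit(df.loc[:,first_col:last_col])
KMeans(n_clusters=6)
arr = kmeans.predict(df.loc[:,first_col:last_col])
df["cluster"]= arr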
alt.Chart(df).mark_circle().encode(
    x="Attack",
    y="SpecialA",
    color="cluster:N",
    tooltip=["#","Name","Attack","SpecialA"]
).facet(
    row="Generation"
)

The charts show that the clusters are spread roughly evenly across the generations, which suggests that a Pokemon’s stats are not clearly related to its generation. Based on the K-means result, the number of Pokemon of similar strength is about the same in each generation.
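
The observation that clusters are spread evenly across generations can also be checked numerically with a contingency table rather than by eye. A minimal sketch, assuming a frame with Generation and cluster columns like the one built above; the labels in `toy` are made up for illustration:

```python
import pandas as pd

# Made-up generation and cluster labels, for illustration only.
toy = pd.DataFrame({
    "Generation": [1, 1, 1, 1, 2, 2, 2, 2],
    "cluster":    [0, 0, 1, 2, 0, 1, 1, 2],
})

# Share of each cluster within each generation (every row sums to 1).
shares = pd.crosstab(toy["Generation"], toy["cluster"], normalize="index")
print(shares)
```

If the rows of `shares` look alike on the real data, that supports the reading that cluster membership carries little information about generation.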

Machine Learning - Linear Regression#

For the machine learning part, I first use a linear regression model to predict the generation from the stats.

features = ["Total","HP","Attack","SpecialA","Defense","SpecialD","Speed"]
X_train, X_test, y_train, y_test = train_test_split(df[features],df["Generation"],train_size=0.7)
lin = LinearRegression()
lin.fit(X_train,y_train)
LinearRegression()
linear_train_accuracy = lin.score(X_train,y_train)
linear_test_accuracy = lin.score(X_test,y_test)
print(f"When using the linear regression model,\nthe accuracy for the training set is {linear_train_accuracy},\nand the accuracy for the test set is {linear_test_accuracy}.")
When using the linear regression model,
the accuracy for the training set is 0.010434201363984985,
and the accuracy for the test set is -0.04463859744672849.

The score method of a regressor returns the coefficient of determination (R²) rather than a classification accuracy. Both values are close to zero, and the test value is slightly negative, which is far lower than for the datasets we used in class. This means that a linear model of the stats explains almost none of the variation in generation.
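
To make the scale of these scores concrete: for regressors, scikit-learn's score is R² = 1 − SS_res / SS_tot, so a model that always predicts the mean of the targets scores exactly 0, and a model that does worse than that scores below 0. A small hand-rolled sketch (the function name `r2` and the numbers are illustrative, not taken from the dataset):

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

generations = [1, 2, 3, 4, 5, 6]
always_mean = [3.5] * 6               # constant prediction at the mean
reversed_guess = [6, 5, 4, 3, 2, 1]   # worse than predicting the mean

print(r2(generations, always_mean))     # 0.0
print(r2(generations, reversed_guess))  # -3.0
```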

Machine Learning - Logistic Regression#

Next, I use a logistic regression model to check whether the generation can be predicted from a Pokemon’s stats. For this model, I also compute the mean squared error alongside the accuracy.

lgr = LogisticRegression(max_iter=500)
lgr.fit(X_train,y_train)
LogisticRegression(max_iter=500)
log_test_accuracy = lgr.score(X_test, y_test)
log_train_accuracy = lgr.score(X_train, y_train)
log_train_error = mean_squared_error(y_train,lgr.predict(X_train))
log_test_error = mean_squared_error(y_test,lgr.predict(X_test))
print(f"The mean squared  error for the training set is {log_train_error},\n  and the mean squared error for the test set is {log_test_error}.\nThe accuracy for the training set is {log_train_accuracy},\n  and the accuracy for the test set is {log_test_accuracy}.")
The mean squared  error for the training set is 4.819923371647509,
  and the mean squared error for the test set is 5.070796460176991.
The accuracy for the training set is 0.2681992337164751,
  and the accuracy for the test set is 0.18584070796460178.

The result is similar to that of the linear regression model: the accuracy is low (about 0.19 on the test set, with six possible generations) and the mean squared error is relatively high.
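
A useful reference point for the accuracy above is the score of a trivial classifier that always guesses the most common generation. The sketch below computes that baseline from the value_counts() output shown earlier; it uses the counts for the whole filtered dataset, so the exact figure for a particular train/test split will differ slightly.

```python
# Generation counts copied from the value_counts() output above.
counts = {1: 74, 3: 74, 5: 69, 4: 61, 2: 53, 6: 43}

total = sum(counts.values())
majority_baseline = max(counts.values()) / total   # always guess the largest class
uniform_guess = 1 / len(counts)                    # guess uniformly at random

print(total)
print(round(majority_baseline, 3))
print(round(uniform_guess, 3))
```

A test accuracy of about 0.19 is therefore roughly what always guessing the largest generation would achieve.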

It is clear from the two models above that the generation can’t be predicted from the stats, so for the remaining models I changed my goal to using attack to predict special attack for each Pokemon.

Machine Learning - Decision Tree Regressor#

Since the target is now special attack, which is a numeric value, a tree regressor is more appropriate than a classifier.

alt.Chart(df).mark_line().encode(
    x="Attack",
    y="SpecialA"
)

The line has too many peaks, so it may be hard to predict. I choose generation 2 (53 Pokemon in the filtered dataset) and try to predict the special attack stat.

df1 = list[1].copy()  # generation 2; copy to avoid a SettingWithCopyWarning when adding a column
reg = DecisionTreeRegressor(max_leaf_nodes=10,max_depth=10)
reg.fit(df1[["Attack"]],df1["SpecialA"])
DecisionTreeRegressor(max_depth=10, max_leaf_nodes=10)
df1["PredictA"] = reg.predict(df1[["Attack"]])
d1 = alt.Chart(df1).mark_line().encode(
    x="Attack",
    y="SpecialA"
)

d2 = alt.Chart(df1).mark_line(color="red").encode(
    x="Attack",
    y="PredictA"
)
d1+d2

The graph shows that the prediction follows the overall trend reasonably well, but there is a large peak in the middle of the data that is hard to predict.
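
With a single input feature, a regression tree predicts a step function: every Attack value that falls in the same leaf receives the same prediction, so max_leaf_nodes=10 allows at most 10 distinct predicted values. This is one reason isolated peaks are hard for the model to follow. A sketch on synthetic data (the arrays `x` and `y` are randomly generated stand-ins, not the Pokemon data):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
x = rng.integers(10, 190, size=60).reshape(-1, 1)   # stand-in for Attack
y = 0.5 * x.ravel() + rng.normal(0, 20, size=60)    # stand-in for SpecialA

tree = DecisionTreeRegressor(max_leaf_nodes=10, max_depth=10, random_state=0)
tree.fit(x, y)

# One prediction per leaf, so at most 10 distinct values.
distinct_predictions = np.unique(tree.predict(x))
print(len(distinct_predictions))
```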

Machine Learning - Random Forest Regressor#

Finally, I use a random forest regressor to predict special attack for the Pokemon in generation 6, which has the fewest Pokemon.

df2 = list[5].copy()  # generation 6; copy to avoid a SettingWithCopyWarning when adding a column
features2 = ["Total","HP","Attack","Defense","SpecialD","Speed"]
X_train1, X_test1, y_train1, y_test1 = train_test_split(df2[features2],df2["SpecialA"],train_size=0.6,random_state=32203564)
rfe = RandomForestRegressor(n_estimators=100, max_leaf_nodes=15)
rfe.fit(X_train1,y_train1)
RandomForestRegressor(max_leaf_nodes=15)
rfe.score(X_train1,y_train1)
0.8852875743696715
rfe.score(X_test1,y_test1)
0.4096405747744222
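
One point worth keeping in mind when reading these scores: in the rows shown in this notebook, Total equals the sum of the six individual stats, and features2 contains Total together with every stat except SpecialA. The target can then be recovered exactly by subtraction, so the task is easier than it may appear. The check below uses three rows copied from the first table; that the identity holds for every row of the dataset is an assumption here, not something verified.

```python
# (Total, HP, Attack, Defense, SpecialA, SpecialD, Speed), copied from the table above.
rows = {
    "Bulbasaur": (318, 45, 49, 49, 65, 65, 45),
    "Ivysaur":   (405, 60, 62, 63, 80, 80, 60),
    "Venusaur":  (525, 80, 82, 83, 100, 100, 80),
}

for name, (total, hp, attack, defense, sp_atk, sp_def, speed) in rows.items():
    # Subtract the five stats that are in features2 from Total.
    recovered = total - (hp + attack + defense + sp_def + speed)
    print(name, recovered, recovered == sp_atk)
```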

Graphing for the Random Forest Regressor#

rfe.fit(df2[["Attack"]],df2["SpecialA"])
RandomForestRegressor(max_leaf_nodes=15)
df2["PredictA"] = rfe.predict(df2[["Attack"]])
d1 = alt.Chart(df2).mark_line().encode(
    x="Attack",
    y="SpecialA"
)

d2 = alt.Chart(df2).mark_line(color="red").encode(
    x="Attack",
    y="PredictA"
)
d1+d2

Summary#

I tried several methods, including K-means clustering, linear regression, and logistic regression, to analyze the relationship between generation and the stats of Pokemon, but none of them fit the data well. It is also hard to predict the stats of future generations, because the relationship is not strong enough. After failing to predict the generation, I tried to predict special attack from the other stats using a decision tree regressor and a random forest regressor. The random forest scored about 0.89 on the training set and about 0.41 on the test set, so its predictions are only moderately reliable on unseen Pokemon.

References#

  • What is the source of your dataset(s)? My dataset is the Pokemon dataset from Kaggle.
