The relationship and prediction between Pokemon’s stats and generation
The relationship and prediction between Pokemon’s stats and generation#
Author: Mingyan Xu
Course Project, UC Irvine, Math 10, F22
Introduction#
From projects in past quarters, I found that there were some analyses of Pokemon, which is one of my favorite games, so I chose to explore this dataset further. My main topic is the relationship between generation and each of a Pokemon’s stats, and whether Pokemon have become stronger across generations. In addition, I will try to predict trends in the design of new-generation Pokemon, such as their types and stats.
Import Section#
import pandas as pd
import altair as alt
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
Feature Engineering#
This dataset contains general information on 800 Pokemon, including their names, types, HP, attack, defense and other stats.
df = pd.read_csv("Pokemon.csv")
df
| | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
4 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
795 | 719 | Diancie | Rock | Fairy | 600 | 50 | 100 | 150 | 100 | 150 | 50 | 6 | True |
796 | 719 | DiancieMega Diancie | Rock | Fairy | 700 | 50 | 160 | 110 | 160 | 110 | 110 | 6 | True |
797 | 720 | HoopaHoopa Confined | Psychic | Ghost | 600 | 80 | 110 | 60 | 150 | 130 | 70 | 6 | True |
798 | 720 | HoopaHoopa Unbound | Psychic | Dark | 680 | 80 | 160 | 60 | 170 | 130 | 80 | 6 | True |
799 | 721 | Volcanion | Fire | Water | 600 | 80 | 110 | 120 | 130 | 90 | 70 | 6 | True |
800 rows × 13 columns
I will use only the Pokemon that have two types, to avoid the errors caused by missing values in the Type 2 column. The dropna() method drops every row that contains a NaN value anywhere in the dataset.
df = df.dropna()
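As a small illustration of what dropna() does here, the sketch below rebuilds three rows from the table above by hand. The name `sample` is only for illustration. Passing subset=["Type 2"] is an alternative that drops a row only when the second type is missing, which would matter if another column ever contained NaN.

```python
import numpy as np
import pandas as pd

# Three rows copied from the table above; Charmander has no second type.
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Charmander"],
    "Type 1": ["Grass", "Grass", "Fire"],
    "Type 2": ["Poison", "Poison", np.nan],
    "Total": [318, 405, 309],
})

# dropna() with no arguments removes every row that has a NaN in any column.
dual_any = sample.dropna()

# Restricting the check to "Type 2" states the intent more explicitly.
dual_type2 = sample.dropna(subset=["Type 2"])

print(dual_any["Name"].tolist())
print(dual_type2["Name"].tolist())
```

On this sample both calls keep the same two rows, because Type 2 is the only column with a missing value.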
I plan to use only non-legendary Pokemon for this project, because legendary Pokemon are designed to have higher stats than regular Pokemon.
df = df[df["Legendary"]==False]
df
| | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
6 | 6 | Charizard | Fire | Flying | 534 | 78 | 84 | 78 | 109 | 85 | 100 | 1 | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
785 | 711 | GourgeistSmall Size | Ghost | Grass | 494 | 55 | 85 | 122 | 58 | 75 | 99 | 6 | False |
786 | 711 | GourgeistLarge Size | Ghost | Grass | 494 | 75 | 95 | 122 | 58 | 75 | 69 | 6 | False |
787 | 711 | GourgeistSuper Size | Ghost | Grass | 494 | 85 | 100 | 122 | 58 | 75 | 54 | 6 | False |
790 | 714 | Noibat | Flying | Dragon | 245 | 40 | 30 | 35 | 45 | 40 | 55 | 6 | False |
791 | 715 | Noivern | Flying | Dragon | 535 | 85 | 70 | 80 | 97 | 80 | 123 | 6 | False |
374 rows × 13 columns
I don’t need the “Legendary” column anymore, so I just drop it.
df = df.drop("Legendary",axis=1)
The original column names do not work in Altair charts, probably because Altair treats the period in names such as "Sp. Atk" as nested-field syntax, so I rename the special attack and special defense columns.
df = df.rename(columns={"Sp. Atk":"SpecialA","Sp. Def": "SpecialD"})
df.shape
(374, 12)
Analysis of the dataset#
To focus on generation, first check the number of Pokemon in each generation.
df["Generation"].value_counts()
1 74
3 74
5 69
4 61
2 53
6 43
Name: Generation, dtype: int64
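value_counts() orders its result by frequency, which makes the generations appear out of order. As a minimal sketch, the block below rebuilds a stand-in Generation column from the counts printed above (the variable names are illustrative) and uses sort_index() to list the counts in generation order.

```python
import pandas as pd

# Per-generation counts reported by value_counts() above.
counts = {1: 74, 2: 53, 3: 74, 4: 61, 5: 69, 6: 43}

# Rebuild a Generation column with those counts (a stand-in for df["Generation"]).
generation = pd.Series(
    [gen for gen, n in counts.items() for _ in range(n)], name="Generation"
)

# sort_index() orders the result by generation instead of by frequency.
by_generation = generation.value_counts().sort_index()
print(by_generation)
print(by_generation.sum())
```

The counts add up to 374, which matches the number of rows left after filtering.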
Use groupby method to see more detailed information.
df.groupby("Generation").apply(display)
| | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | SpecialA | SpecialD | Speed | Generation |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 |
1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 |
2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 |
6 | 6 | Charizard | Fire | Flying | 534 | 78 | 84 | 78 | 109 | 85 | 100 | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
151 | 140 | Kabuto | Rock | Water | 355 | 30 | 80 | 90 | 55 | 45 | 55 | 1 |
152 | 141 | Kabutops | Rock | Water | 495 | 60 | 115 | 105 | 65 | 70 | 80 | 1 |
153 | 142 | Aerodactyl | Rock | Flying | 515 | 80 | 105 | 65 | 60 | 75 | 130 | 1 |
154 | 142 | AerodactylMega Aerodactyl | Rock | Flying | 615 | 80 | 135 | 85 | 70 | 95 | 150 | 1 |
161 | 149 | Dragonite | Dragon | Flying | 600 | 91 | 134 | 95 | 100 | 100 | 80 | 1 |
74 rows × 12 columns
| | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | SpecialA | SpecialD | Speed | Generation |
---|---|---|---|---|---|---|---|---|---|---|---|---|
177 | 163 | Hoothoot | Normal | Flying | 262 | 60 | 30 | 30 | 36 | 56 | 50 | 2 |
178 | 164 | Noctowl | Normal | Flying | 442 | 100 | 50 | 50 | 76 | 96 | 70 | 2 |
179 | 165 | Ledyba | Bug | Flying | 265 | 40 | 20 | 30 | 40 | 80 | 55 | 2 |
180 | 166 | Ledian | Bug | Flying | 390 | 55 | 35 | 50 | 55 | 110 | 85 | 2 |
181 | 167 | Spinarak | Bug | Poison | 250 | 40 | 60 | 40 | 40 | 40 | 30 | 2 |
182 | 168 | Ariados | Bug | Poison | 390 | 70 | 90 | 70 | 60 | 60 | 40 | 2 |
183 | 169 | Crobat | Poison | Flying | 535 | 85 | 90 | 80 | 70 | 80 | 130 | 2 |
184 | 170 | Chinchou | Water | Electric | 330 | 75 | 38 | 38 | 56 | 56 | 67 | 2 |
185 | 171 | Lanturn | Water | Electric | 460 | 125 | 58 | 58 | 76 | 76 | 67 | 2 |
188 | 174 | Igglybuff | Normal | Fairy | 210 | 90 | 30 | 15 | 40 | 20 | 15 | 2 |
190 | 176 | Togetic | Fairy | Flying | 405 | 55 | 40 | 85 | 80 | 105 | 40 | 2 |
191 | 177 | Natu | Psychic | Flying | 320 | 40 | 50 | 45 | 70 | 45 | 70 | 2 |
192 | 178 | Xatu | Psychic | Flying | 470 | 65 | 75 | 70 | 95 | 70 | 95 | 2 |
196 | 181 | AmpharosMega Ampharos | Electric | Dragon | 610 | 90 | 95 | 105 | 165 | 110 | 45 | 2 |
198 | 183 | Marill | Water | Fairy | 250 | 70 | 20 | 50 | 20 | 50 | 40 | 2 |
199 | 184 | Azumarill | Water | Fairy | 420 | 100 | 50 | 80 | 60 | 80 | 50 | 2 |
202 | 187 | Hoppip | Grass | Flying | 250 | 35 | 35 | 40 | 35 | 55 | 50 | 2 |
203 | 188 | Skiploom | Grass | Flying | 340 | 55 | 45 | 50 | 45 | 65 | 80 | 2 |
204 | 189 | Jumpluff | Grass | Flying | 460 | 75 | 55 | 70 | 55 | 95 | 110 | 2 |
208 | 193 | Yanma | Bug | Flying | 390 | 65 | 65 | 45 | 75 | 45 | 95 | 2 |
209 | 194 | Wooper | Water | Ground | 210 | 55 | 45 | 45 | 25 | 25 | 15 | 2 |
210 | 195 | Quagsire | Water | Ground | 430 | 95 | 85 | 85 | 65 | 65 | 35 | 2 |
213 | 198 | Murkrow | Dark | Flying | 405 | 60 | 85 | 42 | 85 | 42 | 91 | 2 |
214 | 199 | Slowking | Water | Psychic | 490 | 95 | 75 | 80 | 100 | 110 | 30 | 2 |
218 | 203 | Girafarig | Normal | Psychic | 455 | 70 | 80 | 65 | 90 | 65 | 85 | 2 |
220 | 205 | Forretress | Bug | Steel | 465 | 75 | 90 | 140 | 60 | 60 | 40 | 2 |
222 | 207 | Gligar | Ground | Flying | 430 | 65 | 75 | 105 | 35 | 65 | 85 | 2 |
223 | 208 | Steelix | Steel | Ground | 510 | 75 | 85 | 200 | 55 | 65 | 30 | 2 |
224 | 208 | SteelixMega Steelix | Steel | Ground | 610 | 75 | 125 | 230 | 55 | 95 | 30 | 2 |
227 | 211 | Qwilfish | Water | Poison | 430 | 65 | 95 | 75 | 55 | 55 | 85 | 2 |
228 | 212 | Scizor | Bug | Steel | 500 | 70 | 130 | 100 | 55 | 80 | 65 | 2 |
229 | 212 | ScizorMega Scizor | Bug | Steel | 600 | 70 | 150 | 140 | 65 | 100 | 75 | 2 |
230 | 213 | Shuckle | Bug | Rock | 505 | 20 | 10 | 230 | 10 | 230 | 5 | 2 |
231 | 214 | Heracross | Bug | Fighting | 500 | 80 | 125 | 75 | 40 | 95 | 85 | 2 |
232 | 214 | HeracrossMega Heracross | Bug | Fighting | 600 | 80 | 185 | 115 | 40 | 105 | 75 | 2 |
233 | 215 | Sneasel | Dark | Ice | 430 | 55 | 95 | 55 | 35 | 75 | 115 | 2 |
237 | 219 | Magcargo | Fire | Rock | 410 | 50 | 50 | 120 | 80 | 80 | 30 | 2 |
238 | 220 | Swinub | Ice | Ground | 250 | 50 | 50 | 40 | 30 | 30 | 50 | 2 |
239 | 221 | Piloswine | Ice | Ground | 450 | 100 | 100 | 80 | 60 | 60 | 50 | 2 |
240 | 222 | Corsola | Water | Rock | 380 | 55 | 55 | 85 | 65 | 85 | 35 | 2 |
243 | 225 | Delibird | Ice | Flying | 330 | 45 | 55 | 45 | 65 | 45 | 75 | 2 |
244 | 226 | Mantine | Water | Flying | 465 | 65 | 40 | 70 | 80 | 140 | 70 | 2 |
245 | 227 | Skarmory | Steel | Flying | 465 | 65 | 80 | 140 | 40 | 70 | 70 | 2 |
246 | 228 | Houndour | Dark | Fire | 330 | 45 | 60 | 30 | 80 | 50 | 65 | 2 |
247 | 229 | Houndoom | Dark | Fire | 500 | 75 | 90 | 50 | 110 | 80 | 95 | 2 |
248 | 229 | HoundoomMega Houndoom | Dark | Fire | 600 | 75 | 90 | 90 | 140 | 90 | 115 | 2 |
249 | 230 | Kingdra | Water | Dragon | 540 | 75 | 95 | 95 | 95 | 95 | 85 | 2 |
257 | 238 | Smoochum | Ice | Psychic | 305 | 45 | 30 | 15 | 85 | 65 | 65 | 2 |
265 | 246 | Larvitar | Rock | Ground | 300 | 50 | 64 | 50 | 45 | 50 | 41 | 2 |
266 | 247 | Pupitar | Rock | Ground | 410 | 70 | 84 | 70 | 65 | 70 | 51 | 2 |
267 | 248 | Tyranitar | Rock | Dark | 600 | 100 | 134 | 110 | 95 | 100 | 61 | 2 |
268 | 248 | TyranitarMega Tyranitar | Rock | Dark | 700 | 100 | 164 | 150 | 95 | 120 | 71 | 2 |
271 | 251 | Celebi | Psychic | Grass | 600 | 100 | 100 | 100 | 100 | 100 | 100 | 2 |
| | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | SpecialA | SpecialD | Speed | Generation |
---|---|---|---|---|---|---|---|---|---|---|---|---|
275 | 254 | SceptileMega Sceptile | Grass | Dragon | 630 | 70 | 110 | 75 | 145 | 85 | 145 | 3 |
277 | 256 | Combusken | Fire | Fighting | 405 | 60 | 85 | 60 | 85 | 60 | 55 | 3 |
278 | 257 | Blaziken | Fire | Fighting | 530 | 80 | 120 | 70 | 110 | 70 | 80 | 3 |
279 | 257 | BlazikenMega Blaziken | Fire | Fighting | 630 | 80 | 160 | 80 | 130 | 80 | 100 | 3 |
281 | 259 | Marshtomp | Water | Ground | 405 | 70 | 85 | 70 | 60 | 70 | 50 | 3 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
409 | 373 | SalamenceMega Salamence | Dragon | Flying | 700 | 95 | 145 | 130 | 120 | 90 | 120 | 3 |
410 | 374 | Beldum | Steel | Psychic | 300 | 40 | 55 | 80 | 35 | 60 | 30 | 3 |
411 | 375 | Metang | Steel | Psychic | 420 | 60 | 75 | 100 | 55 | 80 | 50 | 3 |
412 | 376 | Metagross | Steel | Psychic | 600 | 80 | 135 | 130 | 95 | 90 | 70 | 3 |
413 | 376 | MetagrossMega Metagross | Steel | Psychic | 700 | 80 | 145 | 150 | 105 | 110 | 110 | 3 |
74 rows × 12 columns
| | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | SpecialA | SpecialD | Speed | Generation |
---|---|---|---|---|---|---|---|---|---|---|---|---|
434 | 389 | Torterra | Grass | Ground | 525 | 95 | 109 | 105 | 75 | 85 | 56 | 4 |
436 | 391 | Monferno | Fire | Fighting | 405 | 64 | 78 | 52 | 78 | 52 | 81 | 4 |
437 | 392 | Infernape | Fire | Fighting | 534 | 76 | 104 | 71 | 104 | 71 | 108 | 4 |
440 | 395 | Empoleon | Water | Steel | 530 | 84 | 86 | 88 | 111 | 101 | 60 | 4 |
441 | 396 | Starly | Normal | Flying | 245 | 40 | 55 | 30 | 30 | 30 | 60 | 4 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
532 | 479 | RotomHeat Rotom | Electric | Fire | 520 | 50 | 65 | 107 | 105 | 107 | 86 | 4 |
533 | 479 | RotomWash Rotom | Electric | Water | 520 | 50 | 65 | 107 | 105 | 107 | 86 | 4 |
534 | 479 | RotomFrost Rotom | Electric | Ice | 520 | 50 | 65 | 107 | 105 | 107 | 86 | 4 |
535 | 479 | RotomFan Rotom | Electric | Flying | 520 | 50 | 65 | 107 | 105 | 107 | 86 | 4 |
536 | 479 | RotomMow Rotom | Electric | Grass | 520 | 50 | 65 | 107 | 105 | 107 | 86 | 4 |
61 rows × 12 columns
| | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | SpecialA | SpecialD | Speed | Generation |
---|---|---|---|---|---|---|---|---|---|---|---|---|
558 | 499 | Pignite | Fire | Fighting | 418 | 90 | 93 | 55 | 70 | 55 | 55 | 5 |
559 | 500 | Emboar | Fire | Fighting | 528 | 110 | 123 | 65 | 100 | 65 | 65 | 5 |
578 | 519 | Pidove | Normal | Flying | 264 | 50 | 55 | 50 | 36 | 30 | 43 | 5 |
579 | 520 | Tranquill | Normal | Flying | 358 | 62 | 77 | 62 | 50 | 42 | 65 | 5 |
580 | 521 | Unfezant | Normal | Flying | 488 | 80 | 115 | 80 | 65 | 55 | 93 | 5 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
713 | 647 | KeldeoOrdinary Forme | Water | Fighting | 580 | 91 | 72 | 90 | 129 | 90 | 108 | 5 |
714 | 647 | KeldeoResolute Forme | Water | Fighting | 580 | 91 | 72 | 90 | 129 | 90 | 108 | 5 |
715 | 648 | MeloettaAria Forme | Normal | Psychic | 600 | 100 | 77 | 77 | 128 | 128 | 90 | 5 |
716 | 648 | MeloettaPirouette Forme | Normal | Fighting | 600 | 100 | 128 | 90 | 77 | 77 | 128 | 5 |
717 | 649 | Genesect | Bug | Steel | 600 | 71 | 120 | 95 | 120 | 95 | 99 | 5 |
69 rows × 12 columns
| | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | SpecialA | SpecialD | Speed | Generation |
---|---|---|---|---|---|---|---|---|---|---|---|---|
720 | 652 | Chesnaught | Grass | Fighting | 530 | 88 | 107 | 122 | 74 | 75 | 64 | 6 |
723 | 655 | Delphox | Fire | Psychic | 534 | 75 | 69 | 72 | 114 | 100 | 104 | 6 |
726 | 658 | Greninja | Water | Dark | 530 | 72 | 95 | 67 | 103 | 71 | 122 | 6 |
728 | 660 | Diggersby | Normal | Ground | 423 | 85 | 56 | 77 | 50 | 77 | 78 | 6 |
729 | 661 | Fletchling | Normal | Flying | 278 | 45 | 50 | 43 | 40 | 38 | 62 | 6 |
730 | 662 | Fletchinder | Fire | Flying | 382 | 62 | 73 | 55 | 56 | 52 | 84 | 6 |
731 | 663 | Talonflame | Fire | Flying | 499 | 78 | 81 | 71 | 74 | 69 | 126 | 6 |
734 | 666 | Vivillon | Bug | Flying | 411 | 80 | 52 | 50 | 90 | 50 | 89 | 6 |
735 | 667 | Litleo | Fire | Normal | 369 | 62 | 50 | 58 | 73 | 54 | 72 | 6 |
736 | 668 | Pyroar | Fire | Normal | 507 | 86 | 68 | 72 | 109 | 66 | 106 | 6 |
743 | 675 | Pangoro | Fighting | Dark | 495 | 95 | 124 | 78 | 69 | 71 | 58 | 6 |
748 | 679 | Honedge | Steel | Ghost | 325 | 45 | 80 | 100 | 35 | 37 | 28 | 6 |
749 | 680 | Doublade | Steel | Ghost | 448 | 59 | 110 | 150 | 45 | 49 | 35 | 6 |
750 | 681 | AegislashBlade Forme | Steel | Ghost | 520 | 60 | 150 | 50 | 150 | 50 | 60 | 6 |
751 | 681 | AegislashShield Forme | Steel | Ghost | 520 | 60 | 50 | 150 | 50 | 150 | 60 | 6 |
756 | 686 | Inkay | Dark | Psychic | 288 | 53 | 54 | 53 | 37 | 46 | 45 | 6 |
757 | 687 | Malamar | Dark | Psychic | 482 | 86 | 92 | 88 | 68 | 75 | 73 | 6 |
758 | 688 | Binacle | Rock | Water | 306 | 42 | 52 | 67 | 39 | 56 | 50 | 6 |
759 | 689 | Barbaracle | Rock | Water | 500 | 72 | 105 | 115 | 54 | 86 | 68 | 6 |
760 | 690 | Skrelp | Poison | Water | 320 | 50 | 60 | 60 | 60 | 60 | 30 | 6 |
761 | 691 | Dragalge | Poison | Dragon | 494 | 65 | 75 | 90 | 97 | 123 | 44 | 6 |
764 | 694 | Helioptile | Electric | Normal | 289 | 44 | 38 | 33 | 61 | 43 | 70 | 6 |
765 | 695 | Heliolisk | Electric | Normal | 481 | 62 | 55 | 52 | 109 | 94 | 109 | 6 |
766 | 696 | Tyrunt | Rock | Dragon | 362 | 58 | 89 | 77 | 45 | 45 | 48 | 6 |
767 | 697 | Tyrantrum | Rock | Dragon | 521 | 82 | 121 | 119 | 69 | 59 | 71 | 6 |
768 | 698 | Amaura | Rock | Ice | 362 | 77 | 59 | 50 | 67 | 63 | 46 | 6 |
769 | 699 | Aurorus | Rock | Ice | 521 | 123 | 77 | 72 | 99 | 92 | 58 | 6 |
771 | 701 | Hawlucha | Fighting | Flying | 500 | 78 | 92 | 75 | 74 | 63 | 118 | 6 |
772 | 702 | Dedenne | Electric | Fairy | 431 | 67 | 58 | 57 | 81 | 67 | 101 | 6 |
773 | 703 | Carbink | Rock | Fairy | 500 | 50 | 50 | 150 | 50 | 150 | 50 | 6 |
777 | 707 | Klefki | Steel | Fairy | 470 | 57 | 80 | 91 | 80 | 87 | 75 | 6 |
778 | 708 | Phantump | Ghost | Grass | 309 | 43 | 70 | 48 | 50 | 60 | 38 | 6 |
779 | 709 | Trevenant | Ghost | Grass | 474 | 85 | 110 | 76 | 65 | 82 | 56 | 6 |
780 | 710 | PumpkabooAverage Size | Ghost | Grass | 335 | 49 | 66 | 70 | 44 | 55 | 51 | 6 |
781 | 710 | PumpkabooSmall Size | Ghost | Grass | 335 | 44 | 66 | 70 | 44 | 55 | 56 | 6 |
782 | 710 | PumpkabooLarge Size | Ghost | Grass | 335 | 54 | 66 | 70 | 44 | 55 | 46 | 6 |
783 | 710 | PumpkabooSuper Size | Ghost | Grass | 335 | 59 | 66 | 70 | 44 | 55 | 41 | 6 |
784 | 711 | GourgeistAverage Size | Ghost | Grass | 494 | 65 | 90 | 122 | 58 | 75 | 84 | 6 |
785 | 711 | GourgeistSmall Size | Ghost | Grass | 494 | 55 | 85 | 122 | 58 | 75 | 99 | 6 |
786 | 711 | GourgeistLarge Size | Ghost | Grass | 494 | 75 | 95 | 122 | 58 | 75 | 69 | 6 |
787 | 711 | GourgeistSuper Size | Ghost | Grass | 494 | 85 | 100 | 122 | 58 | 75 | 54 | 6 |
790 | 714 | Noibat | Flying | Dragon | 245 | 40 | 30 | 35 | 45 | 40 | 55 | 6 |
791 | 715 | Noivern | Flying | Dragon | 535 | 85 | 70 | 80 | 97 | 80 | 123 | 6 |
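Divide the filtered dataset into 6 DataFrames, one per generation.
grouped = df.groupby("Generation")
# Note: the name `list` shadows Python's built-in list type; it is kept because later cells index into it.
list=[]
for gen, group in grouped:
    list.append(group)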
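An alternative to collecting the groups in a list is a dictionary keyed by generation number. This is a minimal sketch on a few rows copied from the tables above; the name `frames_by_gen` is hypothetical and not used elsewhere in this notebook.

```python
import pandas as pd

# A few rows copied from the tables above (illustrative subset only).
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Charizard", "Hoothoot", "Crobat", "Noibat"],
    "Total": [318, 534, 262, 535, 245],
    "Generation": [1, 1, 2, 2, 6],
})

# Map each generation number to its sub-DataFrame; .copy() makes each piece
# independent, so adding columns to it later does not raise a pandas warning.
frames_by_gen = {gen: group.copy() for gen, group in sample.groupby("Generation")}

print(sorted(frames_by_gen))
print(frames_by_gen[2]["Name"].tolist())
```

With this layout a generation is looked up by its own number (frames_by_gen[2]) rather than by its position in a list, and no built-in name is shadowed.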
Data Visualization#
First, visualize the number of Pokemon in each generation: the first and third generations have the most (74 each) and the sixth generation has the fewest (43).
alt.Chart(df).mark_bar(size=15).encode(
    x=alt.X("Generation"),
    y="count()",
    color="Generation:N"
)
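# Set the seaborn theme before drawing so that it applies to this heatmap.
sns.set(font_scale=1.0)
sns.set_style("whitegrid")
plt.figure(figsize=(6,4))
# Correlation between the six individual stats (columns HP through Speed).
corr=df.iloc[:,5:11].corr().round(3)
sns.heatmap(corr,annot=True)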
alt.Chart(df).mark_circle().encode(
    x=alt.X("SpecialA",scale=alt.Scale(domain=[0, 250])),
    y=alt.Y("Attack",scale=alt.Scale(domain=[0, 240])),
    color="Generation:N",
    tooltip=["Name","Attack","SpecialA"]
).properties(
    title="Attack v.s. Special Attack based on generation"
)
alt.Chart(df).mark_circle().encode(
    x=alt.X("SpecialD",scale=alt.Scale(domain=[0, 250])),
    y=alt.Y("Defense",scale=alt.Scale(domain=[0, 240])),
    color="Generation:N",
    tooltip=["Name","Defense","SpecialD"]
).properties(
    title="Defense v.s. Special Defense based on generation"
)
From the charts above, the general shape and trend are similar for every generation. The distribution is not what I expected: I thought a Pokemon would have either a high attack or a high special attack, but the graph shows that most Pokemon have similar attack and special attack values. The same holds for defense and special defense.
K-Means Clusters#
Use the stat values to form clusters. I choose 6 clusters because there are 6 generations in total in the dataset.
kmeans = KMeans(n_clusters=6)
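Use the stat values in the dataset to predict the cluster. Note that df[[first_col,last_col]] selects only the two columns Total and Speed; selecting every stat column from Total through Speed would need a slice such as df.loc[:, first_col:last_col].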
first_col = "Total"
last_col = "Speed"
kmeans.fit(df[[first_col,last_col]])
KMeans(n_clusters=6)
arr = kmeans.predict(df[[first_col,last_col]])
df["cluster"]= arr
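The two ways of selecting columns behave differently, which matters for which stats the clustering actually sees. The sketch below shows this on two rows copied from the tables above; `sample`, `two_cols` and `all_stats` are illustrative names.

```python
import pandas as pd

# Two rows copied from the tables above (Bulbasaur and Charizard).
sample = pd.DataFrame({
    "Total": [318, 534],
    "HP": [45, 78],
    "Attack": [49, 84],
    "Defense": [49, 78],
    "SpecialA": [65, 109],
    "SpecialD": [65, 85],
    "Speed": [45, 100],
})

first_col = "Total"
last_col = "Speed"

# A list of two labels selects exactly those two columns.
two_cols = sample[[first_col, last_col]]

# A label slice with .loc selects every column from first_col through last_col.
all_stats = sample.loc[:, first_col:last_col]

print(two_cols.columns.tolist())
print(all_stats.columns.tolist())
```

Whichever selection is chosen, the same columns have to be passed to both fit and predict.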
alt.Chart(df).mark_circle().encode(
    x="Attack",
    y="SpecialA",
    color="cluster:N",
    tooltip=["#","Name","Attack","SpecialA"]
).facet(
    row="Generation"
)
The charts show that the predicted clusters are spread roughly evenly across the generations, so we can conclude that a Pokemon’s stat strength is not clearly related to its generation. Therefore, based on the K-means result, the number of Pokemon of similar strength is about the same in each generation.
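One way to put a number on "roughly evenly distributed" is a contingency table of generation against cluster. The labels in this sketch are invented purely for illustration and are not the clusters from the fit above; on the real data, the same call could be applied to the Generation and cluster columns.

```python
import pandas as pd

# Hypothetical generation and cluster labels, invented for illustration only.
toy = pd.DataFrame({
    "Generation": [1, 1, 1, 1, 2, 2, 2, 2],
    "cluster":    [0, 0, 1, 1, 0, 1, 1, 0],
})

# Each row shows how one generation's Pokemon are split across clusters;
# normalize="index" converts the counts to row-wise proportions.
shares = pd.crosstab(toy["Generation"], toy["cluster"], normalize="index")
print(shares)
```

If the rows of such a table look alike, the clusters are spread similarly across generations.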
Machine Learning - Linear Regression#
For the machine learning part, I first choose to use the linear regression model to predict the generation based on the known data.
features = ["Total","HP","Attack","SpecialA","Defense","SpecialD","Speed"]
X_train, X_test, y_train, y_test = train_test_split(df[features],df["Generation"],train_size=0.7)
lin = LinearRegression()
lin.fit(X_train,y_train)
LinearRegression()
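The feature list contains Total together with the six individual stats. As a quick check on four rows copied from the tables above, Total equals the sum of the six stats, so it adds no independent information to a linear model. This only verifies the rows shown here; whether it holds for every row is an assumption.

```python
import pandas as pd

# Four rows copied from the tables above.
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Venusaur", "Charizard"],
    "Total": [318, 405, 525, 534],
    "HP": [45, 60, 80, 78],
    "Attack": [49, 62, 82, 84],
    "Defense": [49, 63, 83, 78],
    "SpecialA": [65, 80, 100, 109],
    "SpecialD": [65, 80, 100, 85],
    "Speed": [45, 60, 80, 100],
})

stats = ["HP", "Attack", "Defense", "SpecialA", "SpecialD", "Speed"]

# Compare the Total column with the row-wise sum of the six stats.
matches = sample[stats].sum(axis=1) == sample["Total"]
print(matches.all())
```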
linear_train_accuracy = lin.score(X_train,y_train)
linear_test_accuracy = lin.score(X_test,y_test)
print(f"When using the linear regression model,\nthe accuracy for the training set is {linear_train_accuracy},\nand the accuracy for the test set is {linear_test_accuracy}.")
When using the linear regression model,
the accuracy for the training set is 0.010434201363984985,
and the accuracy for the test set is -0.04463859744672849.
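Using the score method, I calculate the score for both the training and test data. For a regression model this score is the coefficient of determination (R²), not a classification accuracy. Both values are close to zero, much lower than for the datasets we used in class, and the negative test score means the model predicts worse than always guessing the mean generation. In other words, the linear regression model does not fit this dataset well.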
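To see how a regression score can be negative, here is a small worked example of the R² formula, 1 - SS_res / SS_tot, on toy targets. The helper `r_squared` is written for this illustration and is not part of scikit-learn.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Toy targets: one Pokemon from each of the six generations.
y = [1, 2, 3, 4, 5, 6]

# Always predicting the mean (3.5) gives a score of exactly 0.
r2_mean = r_squared(y, [3.5] * 6)

# Always predicting generation 1 is worse than the mean, so the score is negative.
r2_const = r_squared(y, [1] * 6)

print(r2_mean, r2_const)
```

A score near zero therefore means the model does about as well as predicting the average generation for every Pokemon.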
Machine Learning - Logistic Regression#
Then, I use a logistic regression model to check whether the generation can be predicted from a Pokemon’s stats. For this model, I also calculate the mean squared error, in addition to the accuracy.
lgr = LogisticRegression(max_iter=500)
lgr.fit(X_train,y_train)
LogisticRegression(max_iter=500)
log_test_accuracy = lgr.score(X_test, y_test)
log_train_accuracy = lgr.score(X_train, y_train)
log_train_error = mean_squared_error(y_train,lgr.predict(X_train))
log_test_error = mean_squared_error(y_test,lgr.predict(X_test))
print(f"The mean squared error for the training set is {log_train_error},\n and the mean squared error for the test set is {log_test_error}.\nThe accuracy for the training set is {log_train_accuracy},\n and the accuracy for the test set is {log_test_accuracy}.")
The mean squared error for the training set is 4.819923371647509,
and the mean squared error for the test set is 5.070796460176991.
The accuracy for the training set is 0.2681992337164751,
and the accuracy for the test set is 0.18584070796460178.
The result is similarly poor: the test accuracy is only about 0.19 and the mean squared error is relatively high.
It is clear from the two models above that the generation can’t be predicted from the stats, so I changed my goal: the following regressions use attack to predict each Pokemon’s special attack.
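For context, it helps to compare these accuracies with simple baselines. The sketch below uses the per-generation counts reported earlier to compute the accuracy of always predicting the most common generation and of guessing uniformly. The actual train and test splits may have slightly different proportions, so these are rough reference values only.

```python
# Per-generation counts from value_counts() earlier in the notebook.
counts = {1: 74, 2: 53, 3: 74, 4: 61, 5: 69, 6: 43}

total = sum(counts.values())

# Accuracy of a classifier that always predicts the most common generation.
majority_baseline = max(counts.values()) / total

# Accuracy expected from guessing uniformly among the six generations.
uniform_baseline = 1 / len(counts)

print(round(majority_baseline, 3), round(uniform_baseline, 3))
```

A test accuracy of about 0.19 is close to these baselines, which supports the reading that the stats carry little information about generation.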
Machine Learning - Decision Tree Regressor#
Since the target is now a numeric stat (special attack) rather than a category, a tree regressor is more appropriate than a classifier.
alt.Chart(df).mark_line().encode(
    x="Attack",
    y="SpecialA"
)
There are too many peaks in the data, so it may be hard to predict. I choose generation 2, which has 53 Pokemon after filtering, and try to predict the special attack stat.
# Index 1 holds generation 2; copy it so a prediction column can be added safely.
df1 = list[1].copy()
reg = DecisionTreeRegressor(max_leaf_nodes=10,max_depth=10)
reg.fit(df1[["Attack"]],df1["SpecialA"])
DecisionTreeRegressor(max_depth=10, max_leaf_nodes=10)
df1["PredictA"] = reg.predict(df1[["Attack"]])
d1 = alt.Chart(df1).mark_line().encode(
    x="Attack",
    y="SpecialA"
)
d2 = alt.Chart(df1).mark_line(color="red").encode(
    x="Attack",
    y="PredictA"
)
d1+d2
The graph shows that the prediction is fairly accurate, but there is a big peak in the middle of the data that is hard to predict.
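The red curve is a step function because each leaf of a decision tree predicts a single constant value. The sketch below illustrates this on synthetic stand-in data (not the Pokemon dataset), using the same max_leaf_nodes and max_depth settings as above; the random seed and the data-generating formula are assumptions made for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (not the Pokemon dataset): 60 noisy points.
rng = np.random.default_rng(0)
attack = rng.integers(20, 150, size=60).reshape(-1, 1)
special = 0.5 * attack.ravel() + rng.normal(0, 15, size=60)

tree = DecisionTreeRegressor(max_leaf_nodes=10, max_depth=10, random_state=0)
tree.fit(attack, special)

# Each leaf predicts one constant value, so the fitted curve is a step
# function with at most max_leaf_nodes distinct levels.
levels = np.unique(tree.predict(attack))
print(len(levels))
```

With at most 10 leaves, a sharp isolated peak tends to be averaged into a neighbouring step, which is consistent with the peak being hard to predict.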
Machine Learning - Random Forest Regressor#
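Finally, I use a random forest regressor to predict the special attack of the Pokemon in generation 6, which has the fewest Pokemon.
# Index 5 holds generation 6; copy it so a prediction column can be added safely.
df2 = list[5].copy()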
features2 = ["Total","HP","Attack","Defense","SpecialD","Speed"]
X_train1, X_test1, y_train1, y_test1 = train_test_split(df2[features2],df2["SpecialA"],train_size=0.6,random_state=32203564)
rfe = RandomForestRegressor(n_estimators=100, max_leaf_nodes=15)
rfe.fit(X_train1,y_train1)
RandomForestRegressor(max_leaf_nodes=15)
rfe.score(X_train1,y_train1)
0.8852875743696715
rfe.score(X_test1,y_test1)
0.4096405747744222
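The gap between the training score (about 0.89) and the test score (about 0.41) suggests overfitting, and with only 43 rows a single split can be noisy. Cross-validation gives a steadier estimate. This is a minimal sketch on synthetic stand-in data of the same size, not the Pokemon dataset; the seed and the data-generating formula are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (not the Pokemon dataset): 43 rows, 6 features,
# matching the size of the generation 6 subset.
rng = np.random.default_rng(0)
X = rng.normal(size=(43, 6))
y = X[:, 0] * 10 + rng.normal(scale=5, size=43)

forest = RandomForestRegressor(n_estimators=100, max_leaf_nodes=15, random_state=0)

# Five-fold cross-validation returns one R^2 score per held-out fold.
scores = cross_val_score(forest, X, y, cv=5)
print(scores.mean(), scores.std())
```

On the real data, the same call could be tried with df2[features2] and df2["SpecialA"] in place of X and y.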
Graphing for the Random Forest Regressor#
rfe.fit(df2[["Attack"]],df2["SpecialA"])
RandomForestRegressor(max_leaf_nodes=15)
df2["PredictA"] = rfe.predict(df2[["Attack"]])
d1 = alt.Chart(df2).mark_line().encode(
    x="Attack",
    y="SpecialA"
)
d2 = alt.Chart(df2).mark_line(color="red").encode(
    x="Attack",
    y="PredictA"
)
d1+d2
Summary#
I tried different methods, including K-means clustering, linear regression and logistic regression, to analyze the relationship between generation and Pokemon stats, but none of them fit the data well. It is also hard to predict the stats of future-generation Pokemon, because the relationship is not strong enough. After failing to predict the generation, I tried to predict special attack values with a decision tree regressor and a random forest regressor, and found that the fitted curves follow the special attack values fairly closely on the data they were trained on. However, the random forest scored about 0.89 on its training set but only about 0.41 on its test set, so these predictions should be treated with caution.
References#
What is the source of your dataset(s)? My dataset is the Pokemon dataset from Kaggle.
List any other references that you found helpful. Tutorial for Pokemon Pinting and forming new dataframe based on a groupby object; Difference between Decision Tree Regressor and Classifier