Exoplanet Candidate Analysis
Author: Maya Drusinsky
Email: mdrusins@uci.edu
Course Project, UC Irvine, Math 10, S22
I’ve been coding since around June 2021. For about the past 10 months I have had a research internship where my job was to use computer vision and some machine learning to analyze images that the researchers produced from their experiments. Most of my Python knowledge involved OpenCV and matplotlib, but I also had some knowledge of NumPy, Seaborn, and pandas. I taught myself to code (I had never taken a coding class until this quarter), so I was missing a lot of basic Python knowledge, but I am fairly well versed in libraries we didn’t use much in this class, like OpenCV.
Introduction
This dataset contains approximately 10,000 Kepler Objects of Interest (the koi_ prefix on most column names), that is, exoplanet candidates identified by NASA’s Kepler space telescope. Each object is classified as a false positive, a candidate, or a confirmed exoplanet. Along with its status, each object of interest has columns for numerous descriptive features (such as the source of its signal or its radius). A full description of all the columns and what they contain can be found here. The goal of my project is to use machine learning on these features to accurately predict each object’s disposition.
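As a rough sketch of that goal (not necessarily the exact models used later in this notebook), the task can be framed as supervised classification: pick some numeric columns as features, use koi_disposition as the label, hold out a test set, and measure accuracy on it. The helper name fit_disposition_model, the 80/20 split, and the choice of a depth-5 DecisionTreeClassifier are illustrative assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_disposition_model(data, feature_cols, label_col="koi_disposition", seed=0):
    """Hypothetical helper: train a decision tree to predict the disposition.

    Rows with a missing value in any chosen column are dropped first.
    Returns the fitted model and its accuracy on a held-out 20% test split.
    """
    subset = data[feature_cols + [label_col]].dropna()
    X = subset[feature_cols]
    y = subset[label_col]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    # Depth limit is an assumption, chosen only to keep the tree small
    model = DecisionTreeClassifier(max_depth=5, random_state=seed)
    model.fit(X_train, y_train)
    # score() returns mean accuracy on the held-out rows
    return model, model.score(X_test, y_test)
```

Once df has been read in, a call might look like fit_disposition_model(df, ["koi_period", "koi_impact", "koi_insol"]); that feature list is only an example.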
Main portion of the project
import pandas as pd
import altair as alt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import plot_tree
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import log_loss
df = pd.read_csv("kepler_exoplanets.csv")
df.head()
| | rowid | kepid | kepoi_name | kepler_name | koi_disposition | koi_pdisposition | koi_score | koi_fpflag_nt | koi_fpflag_ss | koi_fpflag_co | ... | koi_steff_err2 | koi_slogg | koi_slogg_err1 | koi_slogg_err2 | koi_srad | koi_srad_err1 | koi_srad_err2 | ra | dec | koi_kepmag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 10797460 | K00752.01 | Kepler-227 b | CONFIRMED | CANDIDATE | 1.000 | 0 | 0 | 0 | ... | -81.0 | 4.467 | 0.064 | -0.096 | 0.927 | 0.105 | -0.061 | 291.93423 | 48.141651 | 15.347 |
| 1 | 2 | 10797460 | K00752.02 | Kepler-227 c | CONFIRMED | CANDIDATE | 0.969 | 0 | 0 | 0 | ... | -81.0 | 4.467 | 0.064 | -0.096 | 0.927 | 0.105 | -0.061 | 291.93423 | 48.141651 | 15.347 |
| 2 | 3 | 10811496 | K00753.01 | NaN | FALSE POSITIVE | FALSE POSITIVE | 0.000 | 0 | 1 | 0 | ... | -176.0 | 4.544 | 0.044 | -0.176 | 0.868 | 0.233 | -0.078 | 297.00482 | 48.134129 | 15.436 |
| 3 | 4 | 10848459 | K00754.01 | NaN | FALSE POSITIVE | FALSE POSITIVE | 0.000 | 0 | 1 | 0 | ... | -174.0 | 4.564 | 0.053 | -0.168 | 0.791 | 0.201 | -0.067 | 285.53461 | 48.285210 | 15.597 |
| 4 | 5 | 10854555 | K00755.01 | Kepler-664 b | CONFIRMED | CANDIDATE | 1.000 | 0 | 0 | 0 | ... | -211.0 | 4.438 | 0.070 | -0.210 | 1.046 | 0.334 | -0.133 | 288.75488 | 48.226200 | 15.509 |
5 rows × 50 columns
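The preview shows that several columns are identifiers or names rather than measurements (rowid, kepid, kepoi_name, kepler_name). In the rows shown, kepler_name is filled in only for CONFIRMED objects, so using it as a feature would leak the answer. One possible way to pick out the numeric feature columns is sketched below; the excluded-column list is an assumption based only on the preview, and treating koi_pdisposition as a label rather than a feature is also an assumption.

```python
import pandas as pd

# Assumed non-feature columns, based on the df.head() preview
NON_FEATURE_COLS = (
    "rowid", "kepid", "kepoi_name", "kepler_name",  # identifiers and names
    "koi_disposition", "koi_pdisposition",          # disposition labels
)


def feature_columns(data, exclude=NON_FEATURE_COLS):
    """Hypothetical helper: list numeric columns of `data` not in `exclude`.

    rowid and kepid are numeric, so they have to be excluded explicitly;
    text columns are removed by select_dtypes.
    """
    kept = data.drop(columns=[c for c in exclude if c in data.columns])
    return kept.select_dtypes(include="number").columns.tolist()
```

Applied to the full table, feature_columns(df) would return the remaining numeric koi_* columns plus ra and dec.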
df.shape
(9564, 50)
Plotting the data
To get an idea of what the data looks like, I will plot a variety of Altair charts using different features from the dataframe. The dataframe has many rows, so I take a random sample of 5000 of them to make the charts more readable.
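A sample size of 5000 also matches Altair's default row limit: Altair raises a MaxRowsError when a chart's data has more than 5000 rows, unless alt.data_transformers.disable_max_rows() is called. A plain df.sample only keeps the class proportions approximately. If the proportions should match the full table exactly (up to rounding), a stratified sample is one alternative; the sketch below is that swapped-in technique, not what this notebook does, and the helper name is hypothetical.

```python
import pandas as pd


def stratified_sample(data, n, label_col="koi_disposition", seed=0):
    """Hypothetical helper: sample about n rows, preserving class shares.

    Every class is sampled with the same fraction n / len(data), so each
    class keeps its share of the full table (up to rounding per class).
    """
    frac = min(1.0, n / len(data))
    return data.groupby(label_col).sample(frac=frac, random_state=seed)
```

For example, alt.Chart(stratified_sample(df, 5000)) could stand in for alt.Chart(df.sample(5000)) in the charts that follow.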
# Draw one random sample with a fixed seed and reuse it, so every chart shows the same 5000 objects
df_sample = df.sample(5000, random_state=0)

# Clicking a point selects its disposition; charts that use transform_filter(sel) then show only that class
sel = alt.selection_single(fields=["koi_disposition"])

# Number of objects in each disposition class
c1 = alt.Chart(df_sample).mark_bar().encode(
    x="koi_disposition",
    y="count(koi_disposition)",
    color="koi_disposition"
).properties(
    title="Exoplanet Disposition"
)

# Orbital period vs. impact parameter; clip=True hides points outside the axis domains
c2 = alt.Chart(df_sample).mark_circle(clip=True).encode(
    x=alt.X("koi_impact", scale=alt.Scale(domain=[0, 20])),
    y=alt.Y("koi_period", scale=alt.Scale(domain=[0, 1300])),
    color=alt.Color("koi_disposition", scale=alt.Scale(scheme="magma")),
    tooltip=["koi_disposition", "koi_impact", "koi_score"]
).add_selection(sel).transform_filter(sel)

# Mean insolation flux per disposition
c3 = alt.Chart(df_sample).mark_bar().encode(
    x="koi_disposition",
    y="mean(koi_insol)",
    color="koi_disposition"
)

# Mean impact parameter per disposition
c4 = alt.Chart(df_sample).mark_bar().encode(
    x="koi_disposition",
    y="mean(koi_impact)",
    color="koi_disposition"
)

# Disposition score vs. impact parameter
c5 = alt.Chart(df_sample).mark_circle(clip=True).encode(
    x=alt.X("koi_impact", scale=alt.Scale(domain=[0, 4])),
    y="koi_score",
    color=alt.Color("koi_disposition", scale=alt.Scale(scheme="magma")),
    tooltip=["koi_disposition", "koi_impact", "koi_score"]
).add_selection(sel).transform_filter(sel)
c1
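Chart c1 shows how many objects fall into each class. If the classes are not equally sized, a useful reference point for any classifier is the majority-class baseline: the accuracy obtained by always predicting the most common disposition. A minimal pandas sketch (the function name is hypothetical):

```python
import pandas as pd


def class_balance(labels):
    """Hypothetical helper: class shares and majority-class baseline accuracy."""
    shares = labels.value_counts(normalize=True)
    # Always predicting the most common class is correct for exactly shares.max() of the rows
    return shares, shares.max()
```

Called as class_balance(df["koi_disposition"]), the second value is the accuracy a model has to beat to be doing better than guessing the most common class.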
The charts below compare the dispositions on two other columns of the dataset: the mean insolation flux (koi_insol) and the mean impact parameter (koi_impact) of each class.
alt.hconcat(c3, c4)
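The bars in c3 and c4 are per-class means computed on a 5000-row sample. The same kind of summary can be read off as a table with a pandas groupby, for instance on the full df. Since a mean can be pulled up by a few extreme values, the sketch below also reports the median; that addition, and the helper name, are assumptions rather than part of the original charts.

```python
import pandas as pd


def summarize_by_disposition(data, cols=("koi_insol", "koi_impact"), label_col="koi_disposition"):
    """Hypothetical helper: per-class mean and median of `cols` (NaNs skipped).

    The result has one row per class and (column, statistic) column pairs.
    """
    return data.groupby(label_col)[list(cols)].agg(["mean", "median"])
```

For example, summarize_by_disposition(df) would give the full-table version of the two bar charts, with medians alongside.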