Analyses on NICS Firearm Background Checks¶
Nathan Samarasena
Course Project, UC Irvine, Math 10, W22
Introduction¶
For my project, I will be doing analyses on the 'nics-firearm-background-checks.csv' file from a BuzzFeed GitHub repository.
I will explore how well different columns in the data set predict certain aspects of the background checks, and whether some combinations of columns work better than others.
Each row of the data set summarizes the background checks conducted in one state during one month.
Main portion of the project¶
Importing Libraries and Data¶
import seaborn as sns
import numpy as np
import pandas as pd
import altair as alt
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import log_loss
df = pd.read_csv('nics-firearm-background-checks.csv')
df.shape
(15400, 27)
df.columns
Index(['month', 'state', 'permit', 'permit_recheck', 'handgun', 'long_gun',
'other', 'multiple', 'admin', 'prepawn_handgun', 'prepawn_long_gun',
'prepawn_other', 'redemption_handgun', 'redemption_long_gun',
'redemption_other', 'returned_handgun', 'returned_long_gun',
'returned_other', 'rentals_handgun', 'rentals_long_gun',
'private_sale_handgun', 'private_sale_long_gun', 'private_sale_other',
'return_to_seller_handgun', 'return_to_seller_long_gun',
'return_to_seller_other', 'totals'],
dtype='object')
df
month | state | permit | permit_recheck | handgun | long_gun | other | multiple | admin | prepawn_handgun | ... | returned_other | rentals_handgun | rentals_long_gun | private_sale_handgun | private_sale_long_gun | private_sale_other | return_to_seller_handgun | return_to_seller_long_gun | return_to_seller_other | totals | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2022-02 | Alabama | 25401.0 | 499.0 | 21822.0 | 14541.0 | 1351.0 | 1260 | 0.0 | 13.0 | ... | 0.0 | 0.0 | 0.0 | 28.0 | 29.0 | 2.0 | 1.0 | 0.0 | 0.0 | 69098 |
1 | 2022-02 | Alaska | 301.0 | 0.0 | 2644.0 | 2178.0 | 348.0 | 202 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 2.0 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5916 |
2 | 2022-02 | Arizona | 2560.0 | 473.0 | 20150.0 | 9935.0 | 1690.0 | 1153 | 0.0 | 11.0 | ... | 1.0 | 0.0 | 0.0 | 15.0 | 13.0 | 0.0 | 0.0 | 2.0 | 0.0 | 38149 |
3 | 2022-02 | Arkansas | 1842.0 | 309.0 | 7780.0 | 5756.0 | 429.0 | 515 | 4.0 | 15.0 | ... | 0.0 | 0.0 | 0.0 | 5.0 | 11.0 | 1.0 | 0.0 | 0.0 | 0.0 | 19002 |
4 | 2022-02 | California | 15815.0 | 10550.0 | 36362.0 | 23017.0 | 4941.0 | 1 | 0.0 | 1.0 | ... | 183.0 | 0.0 | 0.0 | 7638.0 | 3090.0 | 626.0 | 19.0 | 20.0 | 0.0 | 106295 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
15395 | 1998-11 | Virginia | 0.0 | NaN | 14.0 | 2.0 | NaN | 8 | 0.0 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 24 |
15396 | 1998-11 | Washington | 1.0 | NaN | 65.0 | 286.0 | NaN | 8 | 1.0 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 361 |
15397 | 1998-11 | West Virginia | 3.0 | NaN | 149.0 | 251.0 | NaN | 5 | 0.0 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 408 |
15398 | 1998-11 | Wisconsin | 0.0 | NaN | 25.0 | 214.0 | NaN | 2 | 0.0 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 241 |
15399 | 1998-11 | Wyoming | 8.0 | NaN | 45.0 | 49.0 | NaN | 5 | 0.0 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 107 |
15400 rows × 27 columns
# parse the 'month' strings (formatted like '2022-02') into separate month and year columns
df['dt_month'] = pd.to_datetime(df['month']).dt.month
df['dt_year'] = pd.to_datetime(df['month']).dt.year
More Handguns than other Firearms¶
permit and permit_recheck¶
df2 = df[df['permit'].notna()]
df2 = df2[df2['permit_recheck'].notna()].copy()
df2['more_handgun'] = df2['handgun'] > (df2['long_gun'] + df2['other'])
df2.shape
(4015, 30)
df2.columns
Index(['month', 'state', 'permit', 'permit_recheck', 'handgun', 'long_gun',
'other', 'multiple', 'admin', 'prepawn_handgun', 'prepawn_long_gun',
'prepawn_other', 'redemption_handgun', 'redemption_long_gun',
'redemption_other', 'returned_handgun', 'returned_long_gun',
'returned_other', 'rentals_handgun', 'rentals_long_gun',
'private_sale_handgun', 'private_sale_long_gun', 'private_sale_other',
'return_to_seller_handgun', 'return_to_seller_long_gun',
'return_to_seller_other', 'totals', 'dt_month', 'dt_year',
'more_handgun'],
dtype='object')
df2
month | state | permit | permit_recheck | handgun | long_gun | other | multiple | admin | prepawn_handgun | ... | private_sale_handgun | private_sale_long_gun | private_sale_other | return_to_seller_handgun | return_to_seller_long_gun | return_to_seller_other | totals | dt_month | dt_year | more_handgun | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2022-02 | Alabama | 25401.0 | 499.0 | 21822.0 | 14541.0 | 1351.0 | 1260 | 0.0 | 13.0 | ... | 28.0 | 29.0 | 2.0 | 1.0 | 0.0 | 0.0 | 69098 | 2 | 2022 | True |
1 | 2022-02 | Alaska | 301.0 | 0.0 | 2644.0 | 2178.0 | 348.0 | 202 | 0.0 | 0.0 | ... | 2.0 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5916 | 2 | 2022 | True |
2 | 2022-02 | Arizona | 2560.0 | 473.0 | 20150.0 | 9935.0 | 1690.0 | 1153 | 0.0 | 11.0 | ... | 15.0 | 13.0 | 0.0 | 0.0 | 2.0 | 0.0 | 38149 | 2 | 2022 | True |
3 | 2022-02 | Arkansas | 1842.0 | 309.0 | 7780.0 | 5756.0 | 429.0 | 515 | 4.0 | 15.0 | ... | 5.0 | 11.0 | 1.0 | 0.0 | 0.0 | 0.0 | 19002 | 2 | 2022 | True |
4 | 2022-02 | California | 15815.0 | 10550.0 | 36362.0 | 23017.0 | 4941.0 | 1 | 0.0 | 1.0 | ... | 7638.0 | 3090.0 | 626.0 | 19.0 | 20.0 | 0.0 | 106295 | 2 | 2022 | True |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
4010 | 2016-02 | Virginia | 784.0 | 0.0 | 30085.0 | 15948.0 | 1133.0 | 0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 47955 | 2 | 2016 | True |
4011 | 2016-02 | Washington | 15736.0 | 0.0 | 20583.0 | 11991.0 | 1832.0 | 863 | 1.0 | 3.0 | ... | 578.0 | 422.0 | 30.0 | 5.0 | 15.0 | 0.0 | 56043 | 2 | 2016 | True |
4012 | 2016-02 | West Virginia | 3527.0 | 0.0 | 10746.0 | 7436.0 | 357.0 | 757 | 5.0 | 6.0 | ... | 11.0 | 5.0 | 1.0 | 3.0 | 2.0 | 0.0 | 27216 | 2 | 2016 | True |
4013 | 2016-02 | Wisconsin | 9420.0 | 0.0 | 19465.0 | 12431.0 | 821.0 | 62 | 0.0 | 0.0 | ... | 5.0 | 15.0 | 0.0 | 0.0 | 0.0 | 0.0 | 42855 | 2 | 2016 | True |
4014 | 2016-02 | Wyoming | 551.0 | 0.0 | 2287.0 | 2036.0 | 139.0 | 150 | 0.0 | 3.0 | ... | 0.0 | 4.0 | 0.0 | 1.0 | 1.0 | 0.0 | 5703 | 2 | 2016 | True |
4015 rows × 30 columns
X_colnames = ['permit','permit_recheck']
y_colname = 'more_handgun'
X = df2.loc[:,X_colnames].copy()
y = df2.loc[:,y_colname].copy()
scaler = StandardScaler()
scaler.fit(X)
X_scaled = scaler.transform(X)
clf = KNeighborsClassifier(n_neighbors = 10)
clf.fit(X_scaled,y)
KNeighborsClassifier(n_neighbors=10)
X_scaled_train, X_scaled_test, y_train, y_test = train_test_split(X_scaled,y,test_size=0.4)
clf2 = KNeighborsClassifier(n_neighbors = 10)
clf2.fit(X_scaled_train,y_train)
KNeighborsClassifier(n_neighbors=10)
probs = clf2.predict_proba(X_scaled_test)
log_loss(y_test,probs)
0.7955703072201001
probs
array([[0.2, 0.8],
[0.3, 0.7],
[0.1, 0.9],
...,
[0.2, 0.8],
[0.4, 0.6],
[0.1, 0.9]])
# probs only covers the test rows, so align them with the test-set index
df2['probsSer'] = pd.Series(probs[:,1], index=y_test.index)
private_sale_handgun and private_sale_not_handgun¶
df3 = df[df['private_sale_handgun'].notna()]
df3 = df3[df3['handgun'].notna()]
df3 = df3[df3['long_gun'].notna()]
df3 = df3[df3['other'].notna()].copy()
df3['more_handgun'] = df3['handgun'] > (df3['long_gun'] + df3['other'])
df3.shape
(5665, 30)
# build the comparison column from df3 itself (df2 covers a different set of rows)
df3['private_sale_not_handgun'] = df3.loc[:,'private_sale_long_gun'] + df3.loc[:,'private_sale_other']
## X2_colnames = ['private_sale_handgun','private_sale_long_gun','private_sale_other']
X2_colnames = ['private_sale_handgun','private_sale_not_handgun']
y2_colname = 'more_handgun'
X2 = df3.loc[:,X2_colnames].copy()
y2 = df3.loc[:,y2_colname].copy()
scaler = StandardScaler()
scaler.fit(X2)
X2_scaled = scaler.transform(X2)
clf3 = KNeighborsClassifier(n_neighbors = 10)
clf3.fit(X2_scaled,y2)
KNeighborsClassifier(n_neighbors=10)
X2_scaled_train, X2_scaled_test, y2_train, y2_test = train_test_split(X2_scaled,y2,test_size=0.4)
clf4 = KNeighborsClassifier(n_neighbors = 10)
clf4.fit(X2_scaled_train,y2_train)
KNeighborsClassifier(n_neighbors=10)
probs2 = clf4.predict_proba(X2_scaled_test)
log_loss(y2_test,probs2)
0.8214485407353612
# probs2 only covers the test rows, so align them with the test-set index
df3['probs2Ser'] = pd.Series(probs2[:,1], index=y2_test.index)
State¶
handgun and long_gun¶
df4 = df[df['state'].notna()]
df4 = df4[df4['handgun'].notna()]
df4 = df4[df4['long_gun'].notna()]
df4.shape
(15380, 29)
X3_colnames = ['handgun','long_gun']
y3_colname = 'state'
X3 = df4.loc[:,X3_colnames].copy()
y3 = df4.loc[:,y3_colname].copy()
scaler = StandardScaler()
scaler.fit(X3)
X3_scaled = scaler.transform(X3)
clf5 = KNeighborsClassifier(n_neighbors = 5)
clf5.fit(X3_scaled,y3)
KNeighborsClassifier()
X3_scaled_train, X3_scaled_test, y3_train, y3_test = train_test_split(X3_scaled,y3,test_size=0.4)
clf6 = KNeighborsClassifier(n_neighbors = 5)
clf6.fit(X3_scaled_train,y3_train)
KNeighborsClassifier()
probs3 = clf6.predict_proba(X3_scaled_test)
log_loss(y3_test,probs3)
16.72825544530875
Graphing¶
permit and permit_recheck for predicting when there are more handguns¶
alt.data_transformers.disable_max_rows()
graph = alt.Chart(df2).mark_bar().encode(
x = 'more_handgun',
y = 'permit',
color = 'more_handgun',
tooltip = ['more_handgun']
)
graph2 = alt.Chart(df2).mark_bar().encode(
x = 'more_handgun',
y = 'permit_recheck',
color = 'more_handgun',
tooltip = ['more_handgun']
)
graph|graph2
alt.data_transformers.disable_max_rows()
graph3 = alt.Chart(df2).mark_bar().encode(
x = 'more_handgun',
y = 'permit',
color = 'probsSer',
tooltip = ['probsSer','more_handgun']
)
alt.data_transformers.disable_max_rows()
graph4 = alt.Chart(df2).mark_bar().encode(
x = 'more_handgun',
y = 'permit_recheck',
color = 'probsSer',
tooltip = ['probsSer','more_handgun']
)
graph3|graph4
private_sale_handgun and private_sale_not_handgun for predicting when there are more handguns¶
alt.data_transformers.disable_max_rows()
graph5 = alt.Chart(df3).mark_bar().encode(
x = 'more_handgun',
y = 'private_sale_handgun',
color = 'more_handgun',
tooltip = ['more_handgun']
)
graph6 = alt.Chart(df3).mark_bar().encode(
x = 'more_handgun',
y = 'private_sale_not_handgun',
color = 'more_handgun',
tooltip = ['more_handgun']
)
graph5|graph6
alt.data_transformers.disable_max_rows()
graph7 = alt.Chart(df3).mark_bar().encode(
x = 'more_handgun',
y = 'private_sale_handgun',
color = 'probs2Ser',
tooltip = ['probs2Ser','more_handgun']
)
alt.data_transformers.disable_max_rows()
graph8 = alt.Chart(df3).mark_bar().encode(
x = 'more_handgun',
y = 'private_sale_not_handgun',
color = 'probs2Ser',
tooltip = ['probs2Ser','more_handgun']
)
graph7|graph8
Accompanying Documentation¶
Importing Libraries and Data¶
To start, we will import all required libraries.
Next, we read the nics-firearm-background-checks.csv file into a DataFrame named df and take note of its shape and columns.
We also create dt_month and dt_year columns from the month strings, to be used later in the project.
More Handguns than other Firearms¶
From there, we create a new DataFrame for each comparison between columns, since different comparisons require dropping different rows with missing (NA) values; an equivalent one-step filter is sketched below.
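For reference, pandas' dropna with a subset argument does the same filtering in one step. This is only an equivalent sketch of how df2 was built above (df2_alt is an illustrative name, not part of the notebook), not a change to the analysis.
# equivalent to the two notna() filters used to build df2 above
df2_alt = df.dropna(subset=['permit', 'permit_recheck']).copy()
df2_alt['more_handgun'] = df2_alt['handgun'] > (df2_alt['long_gun'] + df2_alt['other'])
df2_alt.shape   # should match df2.shape, i.e. (4015, 30)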
To start, we look for features that predict whether a month's background checks involve more handguns than long guns and other firearms combined, using KNeighborsClassifier. My initial idea was to use the permit and permit_recheck counts, but the resulting log_loss was fairly high, coming out around 0.8 on the test set, so these columns may not be the best predictors of when the checks involve more handguns.
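To put that number in context, one yardstick (not part of the original notebook, just a suggestion) is the log_loss of a model that ignores the features and always predicts the training-set class frequencies; scikit-learn's DummyClassifier provides that baseline.
# hypothetical baseline: always predict the training-set class frequencies
from sklearn.dummy import DummyClassifier
baseline = DummyClassifier(strategy='prior')
baseline.fit(X_scaled_train, y_train)
baseline_probs = baseline.predict_proba(X_scaled_test)
log_loss(y_test, baseline_probs)  # if the KNN log_loss is not much lower than this, the features add little
If the KNN model's log_loss sits close to this baseline, permit and permit_recheck carry little signal about the more_handgun label.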
Next we try private_sale_handgun together with the private_sale_not_handgun column that we built above. Running the same KNeighborsClassifier pipeline, the log_loss in the run shown above comes out around 0.82, which is comparable to the permit-based model rather than clearly better; because train_test_split is not seeded, the exact value changes from run to run.
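Since the split is random, fixing the random_state argument of train_test_split (shown here only as a suggestion, not part of the original notebook) would make the reported log_loss reproducible between runs.
# suggested tweak: fix the seed so the same train/test split, and hence the same log_loss, is produced each run
X2_scaled_train, X2_scaled_test, y2_train, y2_test = train_test_split(
    X2_scaled, y2, test_size=0.4, random_state=0)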
State¶
To predict state, I went with the handgun and long_gun columns as features.
When trying to predict the state from the background-check data, we run into a few problems. Because many states' populations treat gun ownership and gun control similarly, their monthly handgun and long gun counts look alike, so the classifier is very likely to mistake one state for another; the very high log_loss of roughly 16.7 above reflects this. Swapping one of these count columns for another similar column does not help, since it yields much the same result.
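One quick way to see how often states are confused (a check added here as a suggestion, not in the original notebook) is the test-set accuracy of the fitted classifier.
# fraction of test rows whose state is predicted correctly; a low value means frequent confusion between states
clf6.score(X3_scaled_test, y3_test)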
Graphing¶
For graphing, I decided to show what each feature column looked like with respect to what we were trying to predict, using bar charts made with Altair.
Earlier in the project I also created the probsSer and probs2Ser columns, which hold the predicted probability that a row has more handguns, so the bar charts can be colored by that probability and show more insightful information for each data point.
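The charts above let Altair infer the encoding types; the variant below (only a suggested sketch, not a replacement for the charts in the notebook) spells the types out and aggregates with a mean, which can be easier to read than stacked raw bars.
# suggested variant: explicit encoding types and a mean aggregate
alt.Chart(df2).mark_bar().encode(
    x = 'more_handgun:N',
    y = 'mean(permit):Q',
    color = 'more_handgun:N',
    tooltip = ['more_handgun:N', 'mean(permit):Q']
)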
Summary¶
Through this project, I unfortunately found that making predictions from this data is much more difficult than I initially thought. The log_loss was the biggest indicator of whether the insight provided by machine learning was useful, and the numbers stayed uncomfortably high even after adjusting the test size and n_neighbors significantly (within reason for the size of the data). However, the probsSer column told me a lot about how well the classifier works when compared directly against the data.
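As a rough illustration of that tuning (a sketch of the kind of comparison described above, not the exact experiments run for the project), one can loop over a few n_neighbors values and compare the resulting log_loss on the held-out permit-based split.
# sketch: compare log_loss for several values of n_neighbors on the permit-based features
for k in (5, 10, 20, 50):
    clf_k = KNeighborsClassifier(n_neighbors=k)
    clf_k.fit(X_scaled_train, y_train)
    print(k, log_loss(y_test, clf_k.predict_proba(X_scaled_test)))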