Working with missing data
Contents
Working with missing data¶
Today we covered a variety of shorter topics. The most important topic is how to work with missing data.
Warm-up question¶
What can go inside the brackets?
df[???]
(Notice that there is no loc
or iloc
in this code.)
import pandas as pd
We import a fake (artificial) dataset.
df = pd.read_csv("../data/sampleData.csv")
df
A | B | C | |
---|---|---|---|
0 | 3 | -2.3 | -1 |
1 | 2.1 | 1.0 | 0 |
2 | ? | 5.0 | ? |
3 | 12.3 | -4.0 | 4.1 |
4 | 7 | 2.8 | 3 |
5 | 5 | 6.0 | ? |
6 | ? | 1.0 | 1.8 |
7 | 2 | -1.0 | 3.14 |
The most basic way to use the syntax df[???]
is to access a column of a DataFrame. The column is returned as a pandas Series.
df["B"]
0 -2.3
1 1.0
2 5.0
3 -4.0
4 2.8
5 6.0
6 1.0
7 -1.0
Name: B, dtype: float64
type(df["B"])
pandas.core.series.Series
s = df["B"]
s.index
RangeIndex(start=0, stop=8, step=1)
s[5]
6.0
pandas Series are like fancier versions of the dictionary data type in Python. Here we convert the pandas Series s
into a Python dictionary d
.
d = dict(s)
d
{0: -2.3, 1: 1.0, 2: 5.0, 3: -4.0, 4: 2.8, 5: 6.0, 6: 1.0, 7: -1.0}
Notice how similar the content is.
s
0 -2.3
1 1.0
2 5.0
3 -4.0
4 2.8
5 6.0
6 1.0
7 -1.0
Name: B, dtype: float64
We access elements in a Python dict
the same we access them in a pandas Series
, using square brackets.
d[5]
6.0
Here is a simpler example of a Python dictionary.
d2 = {"chris":"instructor","yasmeen":"ta"}
d2["chris"]
'instructor'
Another use of the syntax df[???]
is to provide a list of column names.
df
A | B | C | |
---|---|---|---|
0 | 3 | -2.3 | -1 |
1 | 2.1 | 1.0 | 0 |
2 | ? | 5.0 | ? |
3 | 12.3 | -4.0 | 4.1 |
4 | 7 | 2.8 | 3 |
5 | 5 | 6.0 | ? |
6 | ? | 1.0 | 1.8 |
7 | 2 | -1.0 | 3.14 |
df[["A","C"]]
A | C | |
---|---|---|
0 | 3 | -1 |
1 | 2.1 | 0 |
2 | ? | ? |
3 | 12.3 | 4.1 |
4 | 7 | 3 |
5 | 5 | ? |
6 | ? | 1.8 |
7 | 2 | 3.14 |
You can even repeat a column multiple times.
df[["A","C","A"]]
A | C | A | |
---|---|---|---|
0 | 3 | -1 | 3 |
1 | 2.1 | 0 | 2.1 |
2 | ? | ? | ? |
3 | 12.3 | 4.1 | 12.3 |
4 | 7 | 3 | 7 |
5 | 5 | ? | 5 |
6 | ? | 1.8 | ? |
7 | 2 | 3.14 | 2 |
Another use of the syntax df[???]
is for what is called slicing.
df
A | B | C | |
---|---|---|---|
0 | 3 | -2.3 | -1 |
1 | 2.1 | 1.0 | 0 |
2 | ? | 5.0 | ? |
3 | 12.3 | -4.0 | 4.1 |
4 | 7 | 2.8 | 3 |
5 | 5 | 6.0 | ? |
6 | ? | 1.0 | 1.8 |
7 | 2 | -1.0 | 3.14 |
Here we get the first 3 rows in the DataFrame.
# slicing
df[:3]
A | B | C | |
---|---|---|---|
0 | 3 | -2.3 | -1 |
1 | 2.1 | 1.0 | 0 |
2 | ? | 5.0 | ? |
Alternate methods:
df.head(3)
A | B | C | |
---|---|---|---|
0 | 3 | -2.3 | -1 |
1 | 2.1 | 1.0 | 0 |
2 | ? | 5.0 | ? |
df.iloc[:3]
A | B | C | |
---|---|---|---|
0 | 3 | -2.3 | -1 |
1 | 2.1 | 1.0 | 0 |
2 | ? | 5.0 | ? |
The last example we’ll cover with the syntax df[???]
, and the version most relevant for this last week’s material, is Boolean indexing.
df
A | B | C | |
---|---|---|---|
0 | 3 | -2.3 | -1 |
1 | 2.1 | 1.0 | 0 |
2 | ? | 5.0 | ? |
3 | 12.3 | -4.0 | 4.1 |
4 | 7 | 2.8 | 3 |
5 | 5 | 6.0 | ? |
6 | ? | 1.0 | 1.8 |
7 | 2 | -1.0 | 3.14 |
In the following, we find all the rows where the “A” column is equal to ?
, but how does this actually work?
# Boolean indexing
df[df["A"]=="?"]
A | B | C | |
---|---|---|---|
2 | ? | 5.0 | ? |
6 | ? | 1.0 | 1.8 |
Let’s look more closely at what is going inside the square brackets. This is an example of a Boolean Series.
# Boolean Series
df["A"]=="?"
0 False
1 False
2 True
3 False
4 False
5 False
6 True
7 False
Name: A, dtype: bool
Think of the Boolean Series as telling Python which rows to keep. In this case, only rows 2 and 6 are kept.
df[df["A"]=="?"]
A | B | C | |
---|---|---|---|
2 | ? | 5.0 | ? |
6 | ? | 1.0 | 1.8 |
Usually this will arise from a Boolean Series, but it’s also fine to use a list of True
and False
values. Here we get rid of the row labeled 3.
df[[True, True, True, False, True, True, True, True]]
A | B | C | |
---|---|---|---|
0 | 3 | -2.3 | -1 |
1 | 2.1 | 1.0 | 0 |
2 | ? | 5.0 | ? |
4 | 7 | 2.8 | 3 |
5 | 5 | 6.0 | ? |
6 | ? | 1.0 | 1.8 |
7 | 2 | -1.0 | 3.14 |
Here is an example of making a small DataFrame from a Python dictionary. Here the keys of the dictionary are the names of the columns, and the values of the dictionary are the values in the rows.
df3 = pd.DataFrame({"col0":[3,1,4],"col1":[2,2,5]})
df3
col0 | col1 | |
---|---|---|
0 | 3 | 2 |
1 | 1 | 2 |
2 | 4 | 5 |
The following does not work because the inner list does not have length matching the number of rows. (Warning. It can be confusing, because if you are providing strings rather than Boolean values, then the length should match the number of columns!)
df3[[True, False]]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/var/folders/8j/gshrlmtn7dg4qtztj4d4t_w40000gn/T/ipykernel_45289/2048590108.py in <module>
----> 1 df3[[True, False]]
~/miniconda3/envs/math10s22/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
3447 # Do we have a (boolean) 1d indexer?
3448 if com.is_bool_indexer(key):
-> 3449 return self._getitem_bool_array(key)
3450
3451 # We are left with two options: a single key, and a collection of keys,
~/miniconda3/envs/math10s22/lib/python3.7/site-packages/pandas/core/frame.py in _getitem_bool_array(self, key)
3495 elif len(key) != len(self.index):
3496 raise ValueError(
-> 3497 f"Item wrong length {len(key)} instead of {len(self.index)}."
3498 )
3499
ValueError: Item wrong length 2 instead of 3.
Here we keep the first and last rows.
df3[[True, False,True]]
col0 | col1 | |
---|---|---|
0 | 3 | 2 |
2 | 4 | 5 |
df3
col0 | col1 | |
---|---|---|
0 | 3 | 2 |
1 | 1 | 2 |
2 | 4 | 5 |
That example is a little artificial, but it is the same reasoning that’s used with Boolean indexing. Here we make a Boolean Series.
df3["col0"] > 2
0 True
1 False
2 True
Name: col0, dtype: bool
And now we provide that Boolean Series inside the square brackets. This tells pandas which rows to keep. We are keeping the rows which have a value strictly greater than 2 in “col0”.
# Boolean indexing
df3[df3["col0"] > 2]
col0 | col1 | |
---|---|---|
0 | 3 | 2 |
2 | 4 | 5 |
Logic in pandas¶
Let’s return to our original DataFrame.
df
A | B | C | |
---|---|---|---|
0 | 3 | -2.3 | -1 |
1 | 2.1 | 1.0 | 0 |
2 | ? | 5.0 | ? |
3 | 12.3 | -4.0 | 4.1 |
4 | 7 | 2.8 | 3 |
5 | 5 | 6.0 | ? |
6 | ? | 1.0 | 1.8 |
7 | 2 | -1.0 | 3.14 |
Say we want to know which rows have a “?” anywhere (in any of the columns). Here is one solution, but it is not DRY. The vertical bar |
denotes “or” in pandas.
(df["A"]=="?") | (df["B"]=="?") | (df["C"]=="?")
0 False
1 False
2 True
3 False
4 False
5 True
6 True
7 False
dtype: bool
Aside: The parentheses are important.
df["A"]=="?" | df["B"]=="?" | df["C"]=="?"
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~/miniconda3/envs/math10s22/lib/python3.7/site-packages/pandas/core/ops/array_ops.py in na_logical_op(x, y, op)
301 # (xint or xbool) and (yint or bool)
--> 302 result = op(x, y)
303 except TypeError:
~/miniconda3/envs/math10s22/lib/python3.7/site-packages/pandas/core/roperator.py in ror_(left, right)
55 def ror_(left, right):
---> 56 return operator.or_(right, left)
57
TypeError: ufunc 'bitwise_or' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
~/miniconda3/envs/math10s22/lib/python3.7/site-packages/pandas/core/ops/array_ops.py in na_logical_op(x, y, op)
315 try:
--> 316 result = libops.scalar_binop(x, y, op)
317 except (
~/miniconda3/envs/math10s22/lib/python3.7/site-packages/pandas/_libs/ops.pyx in pandas._libs.ops.scalar_binop()
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'double'
The above exception was the direct cause of the following exception:
TypeError Traceback (most recent call last)
/var/folders/8j/gshrlmtn7dg4qtztj4d4t_w40000gn/T/ipykernel_45289/2004769874.py in <module>
----> 1 df["A"]=="?" | df["B"]=="?" | df["C"]=="?"
~/miniconda3/envs/math10s22/lib/python3.7/site-packages/pandas/core/ops/common.py in new_method(self, other)
67 other = item_from_zerodim(other)
68
---> 69 return method(self, other)
70
71 return new_method
~/miniconda3/envs/math10s22/lib/python3.7/site-packages/pandas/core/arraylike.py in __ror__(self, other)
72 @unpack_zerodim_and_defer("__ror__")
73 def __ror__(self, other):
---> 74 return self._logical_method(other, roperator.ror_)
75
76 @unpack_zerodim_and_defer("__xor__")
~/miniconda3/envs/math10s22/lib/python3.7/site-packages/pandas/core/series.py in _logical_method(self, other, op)
5511 rvalues = extract_array(other, extract_numpy=True, extract_range=True)
5512
-> 5513 res_values = ops.logical_op(lvalues, rvalues, op)
5514 return self._construct_result(res_values, name=res_name)
5515
~/miniconda3/envs/math10s22/lib/python3.7/site-packages/pandas/core/ops/array_ops.py in logical_op(left, right, op)
390 filler = fill_int if is_self_int_dtype and is_other_int_dtype else fill_bool
391
--> 392 res_values = na_logical_op(lvalues, rvalues, op)
393 # error: Cannot call function of unknown type
394 res_values = filler(res_values) # type: ignore[operator]
~/miniconda3/envs/math10s22/lib/python3.7/site-packages/pandas/core/ops/array_ops.py in na_logical_op(x, y, op)
326 f"Cannot perform '{op.__name__}' with a dtyped [{x.dtype}] array "
327 f"and scalar of type [{typ}]"
--> 328 ) from err
329
330 return result.reshape(x.shape)
TypeError: Cannot perform 'ror_' with a dtyped [float64] array and scalar of type [bool]
Let’s take a closer look at how this syntax works.
(df["A"]=="?") | (df["B"]=="?") | (df["C"]=="?")
0 False
1 False
2 True
3 False
4 False
5 True
6 True
7 False
dtype: bool
Here is an example of a Boolean DataFrame. In each position, it says whether the DataFrame df
had the value “?”.
df=="?"
A | B | C | |
---|---|---|---|
0 | False | False | False |
1 | False | False | False |
2 | True | False | True |
3 | False | False | False |
4 | False | False | False |
5 | False | False | True |
6 | True | False | False |
7 | False | False | False |
Here is a method to find if there are any True
values along a row. The keyword argument axis=1
is telling pandas to look one row at a time. Notice how this produces the same Boolean Series as the non-Dry version above, with the vertical bars |
.
# DRY code
(df=="?").any(axis=1)
0 False
1 False
2 True
3 False
4 False
5 True
6 True
7 False
dtype: bool
If we instead use axis=0
, then we are determining if there are any True
values in each column, one column at a time.
(df=="?").any(axis=0)
A True
B False
C True
dtype: bool
We can also use all
instead of any
, if we want to know if all of the values are True
.
(df=="?").all(axis=0)
A False
B False
C False
dtype: bool
Problems with this DataFrame¶
The “?”s in this DataFrame cause a lot of problems. They are supposed to represent missing values, but pandas doesn’t know that.
df
A | B | C | |
---|---|---|---|
0 | 3 | -2.3 | -1 |
1 | 2.1 | 1.0 | 0 |
2 | ? | 5.0 | ? |
3 | 12.3 | -4.0 | 4.1 |
4 | 7 | 2.8 | 3 |
5 | 5 | 6.0 | ? |
6 | ? | 1.0 | 1.8 |
7 | 2 | -1.0 | 3.14 |
The values in column “A” look like numbers, but because of the “?” values, pandas is treating them as strings. For example, if we try to use the sum
method, the values get concatenated.
df["A"].sum()
'32.1?12.375?2'
The “B” column is better because it doesn’t have any “?”s in it.
df.dtypes
A object
B float64
C object
dtype: object
This is how +
works when the two objects are strings in Python.
"3"+"2"
'32'
The describe
method also doesn’t work in two of the three columns, because pandas thinks they are strings.
df.describe()
B | |
---|---|
count | 8.000000 |
mean | 1.062500 |
std | 3.466137 |
min | -4.000000 |
25% | -1.325000 |
50% | 1.000000 |
75% | 3.350000 |
max | 6.000000 |
Often missing values are handled automatically by pandas. In this case, they are not handled automatically, so we tell pandas explicitly when importing that the missing values are denoted by “?”.
df4 = pd.read_csv("../data/sampleData.csv", na_values="?")
Notice how those “?” values have gotten represented as NaN
.
df4
A | B | C | |
---|---|---|---|
0 | 3.0 | -2.3 | -1.00 |
1 | 2.1 | 1.0 | 0.00 |
2 | NaN | 5.0 | NaN |
3 | 12.3 | -4.0 | 4.10 |
4 | 7.0 | 2.8 | 3.00 |
5 | 5.0 | 6.0 | NaN |
6 | NaN | 1.0 | 1.80 |
7 | 2.0 | -1.0 | 3.14 |
Now sum
works correctly.
df4["A"].sum()
31.4
We can use the axis
argument the same as before. Here axis=0
is saying to add up the columns one at a time.
df4.sum(axis=0)
A 31.40
B 8.50
C 11.04
dtype: float64
Here we are adding the values in each row, one row at a time, because of the axis=1
argument.
df4.sum(axis=1)
0 -0.30
1 3.10
2 5.00
3 12.40
4 12.80
5 11.00
6 2.80
7 4.14
dtype: float64
There are lots of different methods which behave similarly. For example, here we compute the median
of each row.
df4.median(axis=1)
0 -1.0
1 1.0
2 5.0
3 4.1
4 3.0
5 5.5
6 1.4
7 2.0
dtype: float64
NaN¶
Let’s look more closely at these missing values.
df4.loc[2,"A"]
nan
type(df4.loc[2,"A"])
numpy.float64
This constant is part of the important NumPy library.
import numpy as np
If we want to make this “not a number” value ourselves, we can use the following syntax.
np.nan
nan
Missing values in a real dataset¶
So far today, all of our examples came from a small artificial dataset. Here we import a slightly bigger (still relatively small) dataset with real data in it. Notice how we don’t need to specify the na_values
keyword argument, because pandas correctly identifies the missing data automatically.
cars = pd.read_csv("../data/cars.csv")
Using the info
method, we can see that, for example, there are 8 null values (i.e., missing values) in the “Miles_per_Gallon column.
cars.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 406 entries, 0 to 405
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Name 406 non-null object
1 Miles_per_Gallon 398 non-null float64
2 Cylinders 406 non-null int64
3 Displacement 406 non-null float64
4 Horsepower 400 non-null float64
5 Weight_in_lbs 406 non-null int64
6 Acceleration 406 non-null float64
7 Year 406 non-null object
8 Origin 406 non-null object
dtypes: float64(4), int64(2), object(3)
memory usage: 28.7+ KB
cars["Miles_per_Gallon"].isna()
0 False
1 False
2 False
3 False
4 False
...
401 False
402 False
403 False
404 False
405 False
Name: Miles_per_Gallon, Length: 406, dtype: bool
Here is a quick way to count those missing values.
cars["Miles_per_Gallon"].isna().sum()
8
Using Boolean indexing to find the rows where the “Miles_per_Gallon” value is missing.
cars[cars["Miles_per_Gallon"].isna()]
Name | Miles_per_Gallon | Cylinders | Displacement | Horsepower | Weight_in_lbs | Acceleration | Year | Origin | |
---|---|---|---|---|---|---|---|---|---|
10 | citroen ds-21 pallas | NaN | 4 | 133.0 | 115.0 | 3090 | 17.5 | 1970-01-01 | Europe |
11 | chevrolet chevelle concours (sw) | NaN | 8 | 350.0 | 165.0 | 4142 | 11.5 | 1970-01-01 | USA |
12 | ford torino (sw) | NaN | 8 | 351.0 | 153.0 | 4034 | 11.0 | 1970-01-01 | USA |
13 | plymouth satellite (sw) | NaN | 8 | 383.0 | 175.0 | 4166 | 10.5 | 1970-01-01 | USA |
14 | amc rebel sst (sw) | NaN | 8 | 360.0 | 175.0 | 3850 | 11.0 | 1970-01-01 | USA |
17 | ford mustang boss 302 | NaN | 8 | 302.0 | 140.0 | 3353 | 8.0 | 1970-01-01 | USA |
39 | volkswagen super beetle 117 | NaN | 4 | 97.0 | 48.0 | 1978 | 20.0 | 1971-01-01 | Europe |
367 | saab 900s | NaN | 4 | 121.0 | 110.0 | 2800 | 15.4 | 1982-01-01 | Europe |
A common task is to remove all the rows which are missing any values. Here is a quick way to do that. (This is displaying a new DataFrame; the way it’s written, it does not change cars
.)
cars.dropna()
Name | Miles_per_Gallon | Cylinders | Displacement | Horsepower | Weight_in_lbs | Acceleration | Year | Origin | |
---|---|---|---|---|---|---|---|---|---|
0 | chevrolet chevelle malibu | 18.0 | 8 | 307.0 | 130.0 | 3504 | 12.0 | 1970-01-01 | USA |
1 | buick skylark 320 | 15.0 | 8 | 350.0 | 165.0 | 3693 | 11.5 | 1970-01-01 | USA |
2 | plymouth satellite | 18.0 | 8 | 318.0 | 150.0 | 3436 | 11.0 | 1970-01-01 | USA |
3 | amc rebel sst | 16.0 | 8 | 304.0 | 150.0 | 3433 | 12.0 | 1970-01-01 | USA |
4 | ford torino | 17.0 | 8 | 302.0 | 140.0 | 3449 | 10.5 | 1970-01-01 | USA |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
401 | ford mustang gl | 27.0 | 4 | 140.0 | 86.0 | 2790 | 15.6 | 1982-01-01 | USA |
402 | vw pickup | 44.0 | 4 | 97.0 | 52.0 | 2130 | 24.6 | 1982-01-01 | Europe |
403 | dodge rampage | 32.0 | 4 | 135.0 | 84.0 | 2295 | 11.6 | 1982-01-01 | USA |
404 | ford ranger | 28.0 | 4 | 120.0 | 79.0 | 2625 | 18.6 | 1982-01-01 | USA |
405 | chevy s-10 | 31.0 | 4 | 119.0 | 82.0 | 2720 | 19.4 | 1982-01-01 | USA |
392 rows × 9 columns
Here is another way to do the same thing, using any
.
cars.isna()
Name | Miles_per_Gallon | Cylinders | Displacement | Horsepower | Weight_in_lbs | Acceleration | Year | Origin | |
---|---|---|---|---|---|---|---|---|---|
0 | False | False | False | False | False | False | False | False | False |
1 | False | False | False | False | False | False | False | False | False |
2 | False | False | False | False | False | False | False | False | False |
3 | False | False | False | False | False | False | False | False | False |
4 | False | False | False | False | False | False | False | False | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
401 | False | False | False | False | False | False | False | False | False |
402 | False | False | False | False | False | False | False | False | False |
403 | False | False | False | False | False | False | False | False | False |
404 | False | False | False | False | False | False | False | False | False |
405 | False | False | False | False | False | False | False | False | False |
406 rows × 9 columns
We want to identify rows where any value is missing. Since we are working one row at a time, we use axis=1
. Here is a Boolean Series.
cars.isna().any(axis=1)
0 False
1 False
2 False
3 False
4 False
...
401 False
402 False
403 False
404 False
405 False
Length: 406, dtype: bool
Here we are using Boolean indexing to get keep only those rows which are missing data.
cars[cars.isna().any(axis=1)]
Name | Miles_per_Gallon | Cylinders | Displacement | Horsepower | Weight_in_lbs | Acceleration | Year | Origin | |
---|---|---|---|---|---|---|---|---|---|
10 | citroen ds-21 pallas | NaN | 4 | 133.0 | 115.0 | 3090 | 17.5 | 1970-01-01 | Europe |
11 | chevrolet chevelle concours (sw) | NaN | 8 | 350.0 | 165.0 | 4142 | 11.5 | 1970-01-01 | USA |
12 | ford torino (sw) | NaN | 8 | 351.0 | 153.0 | 4034 | 11.0 | 1970-01-01 | USA |
13 | plymouth satellite (sw) | NaN | 8 | 383.0 | 175.0 | 4166 | 10.5 | 1970-01-01 | USA |
14 | amc rebel sst (sw) | NaN | 8 | 360.0 | 175.0 | 3850 | 11.0 | 1970-01-01 | USA |
17 | ford mustang boss 302 | NaN | 8 | 302.0 | 140.0 | 3353 | 8.0 | 1970-01-01 | USA |
38 | ford pinto | 25.0 | 4 | 98.0 | NaN | 2046 | 19.0 | 1971-01-01 | USA |
39 | volkswagen super beetle 117 | NaN | 4 | 97.0 | 48.0 | 1978 | 20.0 | 1971-01-01 | Europe |
133 | ford maverick | 21.0 | 6 | 200.0 | NaN | 2875 | 17.0 | 1974-01-01 | USA |
337 | renault lecar deluxe | 40.9 | 4 | 85.0 | NaN | 1835 | 17.3 | 1980-01-01 | Europe |
343 | ford mustang cobra | 23.6 | 4 | 140.0 | NaN | 2905 | 14.3 | 1980-01-01 | USA |
361 | renault 18i | 34.5 | 4 | 100.0 | NaN | 2320 | 15.8 | 1982-01-01 | Europe |
367 | saab 900s | NaN | 4 | 121.0 | 110.0 | 2800 | 15.4 | 1982-01-01 | Europe |
382 | amc concord dl | 23.0 | 4 | 151.0 | NaN | 3035 | 20.5 | 1982-01-01 | USA |
A more natural thing is to get rid of those rows. We can negate our Boolean Series by using ~
.
# negation in pandas
cars[~cars.isna().any(axis=1)]
Name | Miles_per_Gallon | Cylinders | Displacement | Horsepower | Weight_in_lbs | Acceleration | Year | Origin | |
---|---|---|---|---|---|---|---|---|---|
0 | chevrolet chevelle malibu | 18.0 | 8 | 307.0 | 130.0 | 3504 | 12.0 | 1970-01-01 | USA |
1 | buick skylark 320 | 15.0 | 8 | 350.0 | 165.0 | 3693 | 11.5 | 1970-01-01 | USA |
2 | plymouth satellite | 18.0 | 8 | 318.0 | 150.0 | 3436 | 11.0 | 1970-01-01 | USA |
3 | amc rebel sst | 16.0 | 8 | 304.0 | 150.0 | 3433 | 12.0 | 1970-01-01 | USA |
4 | ford torino | 17.0 | 8 | 302.0 | 140.0 | 3449 | 10.5 | 1970-01-01 | USA |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
401 | ford mustang gl | 27.0 | 4 | 140.0 | 86.0 | 2790 | 15.6 | 1982-01-01 | USA |
402 | vw pickup | 44.0 | 4 | 97.0 | 52.0 | 2130 | 24.6 | 1982-01-01 | Europe |
403 | dodge rampage | 32.0 | 4 | 135.0 | 84.0 | 2295 | 11.6 | 1982-01-01 | USA |
404 | ford ranger | 28.0 | 4 | 120.0 | 79.0 | 2625 | 18.6 | 1982-01-01 | USA |
405 | chevy s-10 | 31.0 | 4 | 119.0 | 82.0 | 2720 | 19.4 | 1982-01-01 | USA |
392 rows × 9 columns