{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "5aaa54a06acf4c5f97504a5ba7e94c05",
    "deepnote_cell_height": 297.765625,
    "deepnote_cell_type": "markdown",
    "tags": []
   },
   "source": [
    "# Working with missing data\n",
    "\n",
    "Today we covered a variety of shorter topics.  The most important topic is how to work with missing data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "5d7bdd82-8296-4c49-b0d7-dc3eee542733",
    "deepnote_cell_height": 144.78125,
    "deepnote_cell_type": "markdown",
    "tags": []
   },
   "source": [
    "## Warm-up question\n",
    " \n",
    "* What can go inside the brackets? `df[???]`\n",
    "\n",
    "(Notice that there is no `loc` or `iloc` in this code.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "cell_id": "8c20572d905c43dab8a01e29069d4b84",
    "deepnote_cell_height": 81,
    "deepnote_cell_type": "code",
    "deepnote_output_heights": [
     21.1875
    ],
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 0,
    "execution_start": 1649102496321,
    "source_hash": "9b82ee11",
    "tags": []
   },
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We import a fake (artificial) dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "cell_id": "f433429e6ac04cfa8d4b82e46a8448f0",
    "deepnote_cell_height": 81,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 2,
    "execution_start": 1649102529882,
    "source_hash": "e8446f83",
    "tags": []
   },
   "outputs": [],
   "source": [
    "df = pd.read_csv(\"../data/sampleData.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "cell_id": "4754670996c942718a0e2b2a9dc24883",
    "deepnote_cell_height": 516,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 32,
    "execution_start": 1649102532504,
    "source_hash": "f804c160",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3</td>\n",
       "      <td>-2.3</td>\n",
       "      <td>-1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2.1</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>?</td>\n",
       "      <td>5.0</td>\n",
       "      <td>?</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>12.3</td>\n",
       "      <td>-4.0</td>\n",
       "      <td>4.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>7</td>\n",
       "      <td>2.8</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>5</td>\n",
       "      <td>6.0</td>\n",
       "      <td>?</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>?</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>2</td>\n",
       "      <td>-1.0</td>\n",
       "      <td>3.14</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      A    B     C\n",
       "0     3 -2.3    -1\n",
       "1   2.1  1.0     0\n",
       "2     ?  5.0     ?\n",
       "3  12.3 -4.0   4.1\n",
       "4     7  2.8     3\n",
       "5     5  6.0     ?\n",
       "6     ?  1.0   1.8\n",
       "7     2 -1.0  3.14"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The most basic way to use the syntax `df[???]` is to access a column of a DataFrame.  The column is returned as a pandas Series."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "cell_id": "55a4c233f645461e994210f685991552",
    "deepnote_cell_height": 271.78125,
    "deepnote_cell_type": "code",
    "deepnote_output_heights": [
     174.78125
    ],
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 480,
    "execution_start": 1649102598839,
    "source_hash": "41abad46",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0   -2.3\n",
       "1    1.0\n",
       "2    5.0\n",
       "3   -4.0\n",
       "4    2.8\n",
       "5    6.0\n",
       "6    1.0\n",
       "7   -1.0\n",
       "Name: B, dtype: float64"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"B\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "cell_id": "8449f6076436473399f7c57874338466",
    "deepnote_cell_height": 118.1875,
    "deepnote_cell_type": "code",
    "deepnote_output_heights": [
     21.1875
    ],
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 4,
    "execution_start": 1649102625302,
    "source_hash": "71add696",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.series.Series"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(df[\"B\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "cell_id": "acbfaf219be0417091b60cddbcf9c782",
    "deepnote_cell_height": 81,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 1,
    "execution_start": 1649102649500,
    "source_hash": "f2e61a57",
    "tags": []
   },
   "outputs": [],
   "source": [
    "s = df[\"B\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "cell_id": "63ec392b2831477abdd30a3d36ac0f90",
    "deepnote_cell_height": 118.1875,
    "deepnote_cell_type": "code",
    "deepnote_output_heights": [
     21.1875
    ],
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 5,
    "execution_start": 1649102663575,
    "source_hash": "4a3a057a",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "RangeIndex(start=0, stop=8, step=1)"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "cell_id": "c171c80a28e049908956fe70adaad747",
    "deepnote_cell_height": 118.1875,
    "deepnote_cell_type": "code",
    "deepnote_output_heights": [
     21.1875
    ],
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 7,
    "execution_start": 1649102689384,
    "source_hash": "68ae6f2",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6.0"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s[5]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "pandas Series are like fancier versions of the dictionary data type in Python.  Here we convert the pandas Series `s` into a Python dictionary `d`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "cell_id": "9701a406501945af866a578e1d16742e",
    "deepnote_cell_height": 154.1875,
    "deepnote_cell_type": "code",
    "deepnote_output_heights": [
     21.1875
    ],
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 19,
    "execution_start": 1649102732781,
    "source_hash": "1e4e7cd4",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{0: -2.3, 1: 1.0, 2: 5.0, 3: -4.0, 4: 2.8, 5: 6.0, 6: 1.0, 7: -1.0}"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "d = dict(s)\n",
    "d"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice how similar the content is."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "cell_id": "ed64c6aafe864f9ca41193fc80700bdd",
    "deepnote_cell_height": 271.78125,
    "deepnote_cell_type": "code",
    "deepnote_output_heights": [
     174.78125
    ],
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 4,
    "execution_start": 1649102735876,
    "source_hash": "465dd177",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0   -2.3\n",
       "1    1.0\n",
       "2    5.0\n",
       "3   -4.0\n",
       "4    2.8\n",
       "5    6.0\n",
       "6    1.0\n",
       "7   -1.0\n",
       "Name: B, dtype: float64"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We access elements in a Python `dict` the same we access them in a pandas `Series`, using square brackets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "cell_id": "586018bf91a241ab8cce7edc6c60d4a2",
    "deepnote_cell_height": 118.1875,
    "deepnote_cell_type": "code",
    "deepnote_output_heights": [
     21.1875
    ],
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 3,
    "execution_start": 1649102756456,
    "source_hash": "359b1ee1",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6.0"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "d[5]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is a simpler example of a Python dictionary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "cell_id": "59a0d48592e04f5b83c73bc0f2d515c4",
    "deepnote_cell_height": 81,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 3,
    "execution_start": 1649102848551,
    "source_hash": "e8da924b",
    "tags": []
   },
   "outputs": [],
   "source": [
    "d2 = {\"chris\":\"instructor\",\"yasmeen\":\"ta\"}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "cell_id": "ec543a4fcd514aaaa544e8d46182c66e",
    "deepnote_cell_height": 118.1875,
    "deepnote_cell_type": "code",
    "deepnote_output_heights": [
     21.1875
    ],
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 23,
    "execution_start": 1649102859777,
    "source_hash": "9f05ebc5",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'instructor'"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "d2[\"chris\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another use of the syntax `df[???]` is to provide a list of column names."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "cell_id": "80a8a4316fbc4322966349bcc351120e",
    "deepnote_cell_height": 516,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 14,
    "execution_start": 1649102894176,
    "source_hash": "f804c160",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3</td>\n",
       "      <td>-2.3</td>\n",
       "      <td>-1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2.1</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>?</td>\n",
       "      <td>5.0</td>\n",
       "      <td>?</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>12.3</td>\n",
       "      <td>-4.0</td>\n",
       "      <td>4.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>7</td>\n",
       "      <td>2.8</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>5</td>\n",
       "      <td>6.0</td>\n",
       "      <td>?</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>?</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>2</td>\n",
       "      <td>-1.0</td>\n",
       "      <td>3.14</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      A    B     C\n",
       "0     3 -2.3    -1\n",
       "1   2.1  1.0     0\n",
       "2     ?  5.0     ?\n",
       "3  12.3 -4.0   4.1\n",
       "4     7  2.8     3\n",
       "5     5  6.0     ?\n",
       "6     ?  1.0   1.8\n",
       "7     2 -1.0  3.14"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "cell_id": "6fe6f5085bb74967bbbf0105edd25235",
    "deepnote_cell_height": 516,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 8,
    "execution_start": 1649102927653,
    "source_hash": "2044d700",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>C</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3</td>\n",
       "      <td>-1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2.1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>?</td>\n",
       "      <td>?</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>12.3</td>\n",
       "      <td>4.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>7</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>5</td>\n",
       "      <td>?</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>?</td>\n",
       "      <td>1.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>2</td>\n",
       "      <td>3.14</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      A     C\n",
       "0     3    -1\n",
       "1   2.1     0\n",
       "2     ?     ?\n",
       "3  12.3   4.1\n",
       "4     7     3\n",
       "5     5     ?\n",
       "6     ?   1.8\n",
       "7     2  3.14"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[[\"A\",\"C\"]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can even repeat a column multiple times."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "cell_id": "8805b3d51c5d4d5aa6877c4015353731",
    "deepnote_cell_height": 516,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 7,
    "execution_start": 1649102948079,
    "source_hash": "7cc23c65",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>C</th>\n",
       "      <th>A</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3</td>\n",
       "      <td>-1</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2.1</td>\n",
       "      <td>0</td>\n",
       "      <td>2.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>?</td>\n",
       "      <td>?</td>\n",
       "      <td>?</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>12.3</td>\n",
       "      <td>4.1</td>\n",
       "      <td>12.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>7</td>\n",
       "      <td>3</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>5</td>\n",
       "      <td>?</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>?</td>\n",
       "      <td>1.8</td>\n",
       "      <td>?</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>2</td>\n",
       "      <td>3.14</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      A     C     A\n",
       "0     3    -1     3\n",
       "1   2.1     0   2.1\n",
       "2     ?     ?     ?\n",
       "3  12.3   4.1  12.3\n",
       "4     7     3     7\n",
       "5     5     ?     5\n",
       "6     ?   1.8     ?\n",
       "7     2  3.14     2"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[[\"A\",\"C\",\"A\"]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another use of the syntax `df[???]` is for what is called *slicing*."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "cell_id": "5fd2d1b0a62445d5a4769674607ebc4f",
    "deepnote_cell_height": 516,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 11,
    "execution_start": 1649103038361,
    "source_hash": "f804c160",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3</td>\n",
       "      <td>-2.3</td>\n",
       "      <td>-1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2.1</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>?</td>\n",
       "      <td>5.0</td>\n",
       "      <td>?</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>12.3</td>\n",
       "      <td>-4.0</td>\n",
       "      <td>4.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>7</td>\n",
       "      <td>2.8</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>5</td>\n",
       "      <td>6.0</td>\n",
       "      <td>?</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>?</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>2</td>\n",
       "      <td>-1.0</td>\n",
       "      <td>3.14</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      A    B     C\n",
       "0     3 -2.3    -1\n",
       "1   2.1  1.0     0\n",
       "2     ?  5.0     ?\n",
       "3  12.3 -4.0   4.1\n",
       "4     7  2.8     3\n",
       "5     5  6.0     ?\n",
       "6     ?  1.0   1.8\n",
       "7     2 -1.0  3.14"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here we get the first 3 rows in the DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "cell_id": "a6f42843eeee4211a17557f4767fe534",
    "deepnote_cell_height": 329,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 8,
    "execution_start": 1649103047149,
    "source_hash": "6d948350",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3</td>\n",
       "      <td>-2.3</td>\n",
       "      <td>-1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2.1</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>?</td>\n",
       "      <td>5.0</td>\n",
       "      <td>?</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     A    B   C\n",
       "0    3 -2.3  -1\n",
       "1  2.1  1.0   0\n",
       "2    ?  5.0   ?"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# slicing\n",
    "df[:3]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alternate methods:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "cell_id": "b3962d21fff84eb1aa31bbc933fd0ada",
    "deepnote_cell_height": 311,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 7,
    "execution_start": 1649103090080,
    "source_hash": "3a1ea484",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3</td>\n",
       "      <td>-2.3</td>\n",
       "      <td>-1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2.1</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>?</td>\n",
       "      <td>5.0</td>\n",
       "      <td>?</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     A    B   C\n",
       "0    3 -2.3  -1\n",
       "1  2.1  1.0   0\n",
       "2    ?  5.0   ?"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "cell_id": "1f985ee58f304c6291223e4d16d04077",
    "deepnote_cell_height": 311,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 18,
    "execution_start": 1649103105891,
    "source_hash": "d3dff12e",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3</td>\n",
       "      <td>-2.3</td>\n",
       "      <td>-1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2.1</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>?</td>\n",
       "      <td>5.0</td>\n",
       "      <td>?</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     A    B   C\n",
       "0    3 -2.3  -1\n",
       "1  2.1  1.0   0\n",
       "2    ?  5.0   ?"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.iloc[:3]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The last example we'll cover with the syntax `df[???]`, and the version most relevant for this last week's material, is Boolean indexing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "cell_id": "9a29ab5f6c334b4390d24a93e5b537eb",
    "deepnote_cell_height": 516,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 19,
    "execution_start": 1649103189815,
    "source_hash": "f804c160",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3</td>\n",
       "      <td>-2.3</td>\n",
       "      <td>-1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2.1</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>?</td>\n",
       "      <td>5.0</td>\n",
       "      <td>?</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>12.3</td>\n",
       "      <td>-4.0</td>\n",
       "      <td>4.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>7</td>\n",
       "      <td>2.8</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>5</td>\n",
       "      <td>6.0</td>\n",
       "      <td>?</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>?</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>2</td>\n",
       "      <td>-1.0</td>\n",
       "      <td>3.14</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      A    B     C\n",
       "0     3 -2.3    -1\n",
       "1   2.1  1.0     0\n",
       "2     ?  5.0     ?\n",
       "3  12.3 -4.0   4.1\n",
       "4     7  2.8     3\n",
       "5     5  6.0     ?\n",
       "6     ?  1.0   1.8\n",
       "7     2 -1.0  3.14"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the following, we find all the rows where the \"A\" column is equal to `?`, but how does this actually work?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "cell_id": "9d2944ff897c410fbabb44142e26da49",
    "deepnote_cell_height": 288,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 5,
    "execution_start": 1649103204902,
    "source_hash": "ef7bfe16",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "application/vnd.deepnote.dataframe.v3+json": {
       "column_count": 3,
       "columns": [
        {
         "dtype": "object",
         "name": "A",
         "stats": {
          "categories": [
           {
            "count": 2,
            "name": "?"
           }
          ],
          "nan_count": 0,
          "unique_count": 1
         }
        },
        {
         "dtype": "float64",
         "name": "B",
         "stats": {
          "histogram": [
           {
            "bin_end": 1.4,
            "bin_start": 1,
            "count": 1
           },
           {
            "bin_end": 1.8,
            "bin_start": 1.4,
            "count": 0
           },
           {
            "bin_end": 2.2,
            "bin_start": 1.8,
            "count": 0
           },
           {
            "bin_end": 2.6,
            "bin_start": 2.2,
            "count": 0
           },
           {
            "bin_end": 3,
            "bin_start": 2.6,
            "count": 0
           },
           {
            "bin_end": 3.4000000000000004,
            "bin_start": 3,
            "count": 0
           },
           {
            "bin_end": 3.8000000000000003,
            "bin_start": 3.4000000000000004,
            "count": 0
           },
           {
            "bin_end": 4.2,
            "bin_start": 3.8000000000000003,
            "count": 0
           },
           {
            "bin_end": 4.6,
            "bin_start": 4.2,
            "count": 0
           },
           {
            "bin_end": 5,
            "bin_start": 4.6,
            "count": 1
           }
          ],
          "max": "5.0",
          "min": "1.0",
          "nan_count": 0,
          "unique_count": 2
         }
        },
        {
         "dtype": "object",
         "name": "C",
         "stats": {
          "categories": [
           {
            "count": 1,
            "name": "?"
           },
           {
            "count": 1,
            "name": "1.8"
           }
          ],
          "nan_count": 0,
          "unique_count": 2
         }
        },
        {
         "dtype": "int64",
         "name": "_deepnote_index_column"
        }
       ],
       "row_count": 2,
       "rows": [
        {
         "A": "?",
         "B": 5,
         "C": "?",
         "_deepnote_index_column": 2
        },
        {
         "A": "?",
         "B": 1,
         "C": "1.8",
         "_deepnote_index_column": 6
        }
       ]
      },
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>?</td>\n",
       "      <td>5.0</td>\n",
       "      <td>?</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>?</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   A    B    C\n",
       "2  ?  5.0    ?\n",
       "6  ?  1.0  1.8"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Boolean indexing\n",
    "df[df[\"A\"]==\"?\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's look more closely at what is going inside the square brackets.  This is an example of a Boolean Series."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "cell_id": "12e2c29f0cb8455ab29e39941ba91552",
    "deepnote_cell_height": 289.78125,
    "deepnote_cell_type": "code",
    "deepnote_output_heights": [
     174.78125
    ],
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 4,
    "execution_start": 1649103263654,
    "source_hash": "246a5e1f",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    False\n",
       "1    False\n",
       "2     True\n",
       "3    False\n",
       "4    False\n",
       "5    False\n",
       "6     True\n",
       "7    False\n",
       "Name: A, dtype: bool"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Boolean Series\n",
    "df[\"A\"]==\"?\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Think of the Boolean Series as telling Python which rows to keep.  In this case, only rows 2 and 6 are kept."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "cell_id": "0177c4fb541c4ea08c1415f0a8ab4622",
    "deepnote_cell_height": 270,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 10,
    "execution_start": 1649103315632,
    "source_hash": "ef7bfe16",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>?</td>\n",
       "      <td>5.0</td>\n",
       "      <td>?</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>?</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   A    B    C\n",
       "2  ?  5.0    ?\n",
       "6  ?  1.0  1.8"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[df[\"A\"]==\"?\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Usually this will arise from a Boolean Series, but it's also fine to use a list of `True` and `False` values.  Here we get rid of the row labeled 3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "cell_id": "fc02e96706914e4ab5c86ff594c3f6b2",
    "deepnote_cell_height": 475,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 7,
    "execution_start": 1649103342955,
    "source_hash": "23af1fe9",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3</td>\n",
       "      <td>-2.3</td>\n",
       "      <td>-1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2.1</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>?</td>\n",
       "      <td>5.0</td>\n",
       "      <td>?</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>7</td>\n",
       "      <td>2.8</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>5</td>\n",
       "      <td>6.0</td>\n",
       "      <td>?</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>?</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>2</td>\n",
       "      <td>-1.0</td>\n",
       "      <td>3.14</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     A    B     C\n",
       "0    3 -2.3    -1\n",
       "1  2.1  1.0     0\n",
       "2    ?  5.0     ?\n",
       "4    7  2.8     3\n",
       "5    5  6.0     ?\n",
       "6    ?  1.0   1.8\n",
       "7    2 -1.0  3.14"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[[True, True, True, False, True, True, True, True]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is an example of making a small DataFrame from a Python dictionary.  Here the keys of the dictionary are the names of the columns, and the values of the dictionary are the values in the rows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "cell_id": "55f89463c67445e9bfa5e7198e4663cc",
    "deepnote_cell_height": 81,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 2,
    "execution_start": 1649103444009,
    "source_hash": "76ee9abc",
    "tags": []
   },
   "outputs": [],
   "source": [
    "df3 = pd.DataFrame({\"col0\":[3,1,4],\"col1\":[2,2,5]})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "cell_id": "7f3786671475450bbbf9d6970312ebe6",
    "deepnote_cell_height": 311,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 6,
    "execution_start": 1649103445630,
    "source_hash": "816a197e",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col0</th>\n",
       "      <th>col1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   col0  col1\n",
       "0     3     2\n",
       "1     1     2\n",
       "2     4     5"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following does not work because the inner list does not have length matching the number of rows.  (**Warning**.  It can be confusing, because if you are providing strings rather than Boolean values, then the length should match the number of columns!)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "cell_id": "18fe0ab283d6469faa7b7f05561c0f58",
    "deepnote_cell_height": 144.1875,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 106,
    "execution_start": 1649103481464,
    "source_hash": "3fd56677",
    "tags": []
   },
   "outputs": [
    {
     "ename": "ValueError",
     "evalue": "Item wrong length 2 instead of 3.",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mValueError\u001b[0m                                Traceback (most recent call last)",
      "\u001b[0;32m/var/folders/8j/gshrlmtn7dg4qtztj4d4t_w40000gn/T/ipykernel_44741/2048590108.py\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mdf3\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;32mFalse\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;32m~/miniconda3/envs/math10s22/lib/python3.7/site-packages/pandas/core/frame.py\u001b[0m in \u001b[0;36m__getitem__\u001b[0;34m(self, key)\u001b[0m\n\u001b[1;32m   3447\u001b[0m         \u001b[0;31m# Do we have a (boolean) 1d indexer?\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   3448\u001b[0m         \u001b[0;32mif\u001b[0m \u001b[0mcom\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mis_bool_indexer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 3449\u001b[0;31m             \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_getitem_bool_array\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m   3450\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   3451\u001b[0m         \u001b[0;31m# We are left with two options: a single key, and a collection of keys,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/miniconda3/envs/math10s22/lib/python3.7/site-packages/pandas/core/frame.py\u001b[0m in \u001b[0;36m_getitem_bool_array\u001b[0;34m(self, key)\u001b[0m\n\u001b[1;32m   3495\u001b[0m         \u001b[0;32melif\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m!=\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mindex\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   3496\u001b[0m             raise ValueError(\n\u001b[0;32m-> 3497\u001b[0;31m                 \u001b[0;34mf\"Item wrong length {len(key)} instead of {len(self.index)}.\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m   3498\u001b[0m             )\n\u001b[1;32m   3499\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;31mValueError\u001b[0m: Item wrong length 2 instead of 3."
     ]
    }
   ],
   "source": [
    "df3[[True, False]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here we keep the first and last rows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "cell_id": "2cbe15565df846bf99a9177badcbd566",
    "deepnote_cell_height": 270,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 10,
    "execution_start": 1649103510578,
    "source_hash": "d20d3a35",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col0</th>\n",
       "      <th>col1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   col0  col1\n",
       "0     3     2\n",
       "2     4     5"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df3[[True, False,True]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "cell_id": "80400f135bb5425f9fac6c84ba213493",
    "deepnote_cell_height": 311,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 46,
    "execution_start": 1649103530162,
    "source_hash": "816a197e",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col0</th>\n",
       "      <th>col1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   col0  col1\n",
       "0     3     2\n",
       "1     1     2\n",
       "2     4     5"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "That example is a little artificial, but it is the same reasoning that's used with Boolean indexing.  Here we make a Boolean Series."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "cell_id": "6d464049be164180ad77fad3f76840c5",
    "deepnote_cell_height": 175.78125,
    "deepnote_cell_type": "code",
    "deepnote_output_heights": [
     78.78125
    ],
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 14,
    "execution_start": 1649103547860,
    "source_hash": "d31e785f",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     True\n",
       "1    False\n",
       "2     True\n",
       "Name: col0, dtype: bool"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df3[\"col0\"] > 2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And now we provide that Boolean Series inside the square brackets.  This tells pandas which rows to keep.  We are keeping the rows which have a value strictly greater than 2 in \"col0\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "cell_id": "497bc75f71ef48a7a277440727f47f40",
    "deepnote_cell_height": 288,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 9,
    "execution_start": 1649103560707,
    "source_hash": "ca239ea5",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col0</th>\n",
       "      <th>col1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   col0  col1\n",
       "0     3     2\n",
       "2     4     5"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Boolean indexing\n",
    "df3[df3[\"col0\"] > 2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Logic in pandas\n",
    "\n",
    "Let's return to our original DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "cell_id": "1336a409e6114ba4a950749c3f0edd19",
    "deepnote_cell_height": 516,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 15,
    "execution_start": 1649103616726,
    "source_hash": "f804c160",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3</td>\n",
       "      <td>-2.3</td>\n",
       "      <td>-1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2.1</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>?</td>\n",
       "      <td>5.0</td>\n",
       "      <td>?</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>12.3</td>\n",
       "      <td>-4.0</td>\n",
       "      <td>4.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>7</td>\n",
       "      <td>2.8</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>5</td>\n",
       "      <td>6.0</td>\n",
       "      <td>?</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>?</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>2</td>\n",
       "      <td>-1.0</td>\n",
       "      <td>3.14</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      A    B     C\n",
       "0     3 -2.3    -1\n",
       "1   2.1  1.0     0\n",
       "2     ?  5.0     ?\n",
       "3  12.3 -4.0   4.1\n",
       "4     7  2.8     3\n",
       "5     5  6.0     ?\n",
       "6     ?  1.0   1.8\n",
       "7     2 -1.0  3.14"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Say we want to know which rows have a \"?\" anywhere (in any of the columns).  Here is one solution, but it is not DRY.  The vertical bar `|` denotes \"or\" in pandas."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "cell_id": "1335f38034414a63bcef7f1e07d05375",
    "deepnote_cell_height": 307.78125,
    "deepnote_cell_type": "code",
    "deepnote_output_heights": [
     174.78125
    ],
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 5,
    "execution_start": 1649103666382,
    "source_hash": "c5b2323d",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    False\n",
       "1    False\n",
       "2     True\n",
       "3    False\n",
       "4    False\n",
       "5     True\n",
       "6     True\n",
       "7    False\n",
       "dtype: bool"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(df[\"A\"]==\"?\") | (df[\"B\"]==\"?\") | (df[\"C\"]==\"?\") "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Aside**: The parentheses are important."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "cell_id": "3015c57058de4c18a1063acb45be0a95",
    "deepnote_cell_height": 144.1875,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 8,
    "execution_start": 1649103753888,
    "source_hash": "8417a5f9",
    "tags": [
     "output_scroll"
    ]
   },
   "outputs": [
    {
     "ename": "TypeError",
     "evalue": "Cannot perform 'ror_' with a dtyped [float64] array and scalar of type [bool]",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mTypeError\u001b[0m                                 Traceback (most recent call last)",
      "\u001b[0;32m~/miniconda3/envs/math10s22/lib/python3.7/site-packages/pandas/core/ops/array_ops.py\u001b[0m in \u001b[0;36mna_logical_op\u001b[0;34m(x, y, op)\u001b[0m\n\u001b[1;32m    301\u001b[0m         \u001b[0;31m#  (xint or xbool) and (yint or bool)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 302\u001b[0;31m         \u001b[0mresult\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mop\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    303\u001b[0m     \u001b[0;32mexcept\u001b[0m \u001b[0mTypeError\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/miniconda3/envs/math10s22/lib/python3.7/site-packages/pandas/core/roperator.py\u001b[0m in \u001b[0;36mror_\u001b[0;34m(left, right)\u001b[0m\n\u001b[1;32m     55\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mror_\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mleft\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mright\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 56\u001b[0;31m     \u001b[0;32mreturn\u001b[0m \u001b[0moperator\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mor_\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mright\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mleft\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m     57\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;31mTypeError\u001b[0m: ufunc 'bitwise_or' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''",
      "\nDuring handling of the above exception, another exception occurred:\n",
      "\u001b[0;31mValueError\u001b[0m                                Traceback (most recent call last)",
      "\u001b[0;32m~/miniconda3/envs/math10s22/lib/python3.7/site-packages/pandas/core/ops/array_ops.py\u001b[0m in \u001b[0;36mna_logical_op\u001b[0;34m(x, y, op)\u001b[0m\n\u001b[1;32m    315\u001b[0m             \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 316\u001b[0;31m                 \u001b[0mresult\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlibops\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mscalar_binop\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mop\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    317\u001b[0m             except (\n",
      "\u001b[0;32m~/miniconda3/envs/math10s22/lib/python3.7/site-packages/pandas/_libs/ops.pyx\u001b[0m in \u001b[0;36mpandas._libs.ops.scalar_binop\u001b[0;34m()\u001b[0m\n",
      "\u001b[0;31mValueError\u001b[0m: Buffer dtype mismatch, expected 'Python object' but got 'double'",
      "\nThe above exception was the direct cause of the following exception:\n",
      "\u001b[0;31mTypeError\u001b[0m                                 Traceback (most recent call last)",
      "\u001b[0;32m/var/folders/8j/gshrlmtn7dg4qtztj4d4t_w40000gn/T/ipykernel_44741/2004769874.py\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mdf\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"A\"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m==\u001b[0m\u001b[0;34m\"?\"\u001b[0m \u001b[0;34m|\u001b[0m \u001b[0mdf\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"B\"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m==\u001b[0m\u001b[0;34m\"?\"\u001b[0m \u001b[0;34m|\u001b[0m \u001b[0mdf\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"C\"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m==\u001b[0m\u001b[0;34m\"?\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;32m~/miniconda3/envs/math10s22/lib/python3.7/site-packages/pandas/core/ops/common.py\u001b[0m in \u001b[0;36mnew_method\u001b[0;34m(self, other)\u001b[0m\n\u001b[1;32m     67\u001b[0m         \u001b[0mother\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mitem_from_zerodim\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mother\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     68\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 69\u001b[0;31m         \u001b[0;32mreturn\u001b[0m \u001b[0mmethod\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mother\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m     70\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     71\u001b[0m     \u001b[0;32mreturn\u001b[0m \u001b[0mnew_method\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/miniconda3/envs/math10s22/lib/python3.7/site-packages/pandas/core/arraylike.py\u001b[0m in \u001b[0;36m__ror__\u001b[0;34m(self, other)\u001b[0m\n\u001b[1;32m     72\u001b[0m     \u001b[0;34m@\u001b[0m\u001b[0munpack_zerodim_and_defer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"__ror__\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     73\u001b[0m     \u001b[0;32mdef\u001b[0m \u001b[0m__ror__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mother\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 74\u001b[0;31m         \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_logical_method\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mother\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mroperator\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mror_\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m     75\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     76\u001b[0m     \u001b[0;34m@\u001b[0m\u001b[0munpack_zerodim_and_defer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"__xor__\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/miniconda3/envs/math10s22/lib/python3.7/site-packages/pandas/core/series.py\u001b[0m in \u001b[0;36m_logical_method\u001b[0;34m(self, other, op)\u001b[0m\n\u001b[1;32m   5511\u001b[0m         \u001b[0mrvalues\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mextract_array\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mother\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mextract_numpy\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mextract_range\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   5512\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 5513\u001b[0;31m         \u001b[0mres_values\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mops\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlogical_op\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlvalues\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mrvalues\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mop\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m   5514\u001b[0m         \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_construct_result\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mres_values\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mres_name\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   5515\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/miniconda3/envs/math10s22/lib/python3.7/site-packages/pandas/core/ops/array_ops.py\u001b[0m in \u001b[0;36mlogical_op\u001b[0;34m(left, right, op)\u001b[0m\n\u001b[1;32m    390\u001b[0m         \u001b[0mfiller\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mfill_int\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mis_self_int_dtype\u001b[0m \u001b[0;32mand\u001b[0m \u001b[0mis_other_int_dtype\u001b[0m \u001b[0;32melse\u001b[0m \u001b[0mfill_bool\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    391\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 392\u001b[0;31m         \u001b[0mres_values\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mna_logical_op\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlvalues\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mrvalues\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mop\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    393\u001b[0m         \u001b[0;31m# error: Cannot call function of unknown type\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    394\u001b[0m         \u001b[0mres_values\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mfiller\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mres_values\u001b[0m\u001b[0;34m)\u001b[0m  \u001b[0;31m# type: ignore[operator]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/miniconda3/envs/math10s22/lib/python3.7/site-packages/pandas/core/ops/array_ops.py\u001b[0m in \u001b[0;36mna_logical_op\u001b[0;34m(x, y, op)\u001b[0m\n\u001b[1;32m    326\u001b[0m                     \u001b[0;34mf\"Cannot perform '{op.__name__}' with a dtyped [{x.dtype}] array \"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    327\u001b[0m                     \u001b[0;34mf\"and scalar of type [{typ}]\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 328\u001b[0;31m                 ) from err\n\u001b[0m\u001b[1;32m    329\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    330\u001b[0m     \u001b[0;32mreturn\u001b[0m \u001b[0mresult\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreshape\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;31mTypeError\u001b[0m: Cannot perform 'ror_' with a dtyped [float64] array and scalar of type [bool]"
     ]
    }
   ],
   "source": [
    "df[\"A\"]==\"?\" | df[\"B\"]==\"?\" | df[\"C\"]==\"?\" "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's take a closer look at how this syntax works."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "cell_id": "29e7f6ac17fd44b3883a525a6bf4d43f",
    "deepnote_cell_height": 271.78125,
    "deepnote_cell_type": "code",
    "deepnote_output_heights": [
     174.78125
    ],
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 3,
    "execution_start": 1649103858923,
    "source_hash": "c5b2323d",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    False\n",
       "1    False\n",
       "2     True\n",
       "3    False\n",
       "4    False\n",
       "5     True\n",
       "6     True\n",
       "7    False\n",
       "dtype: bool"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(df[\"A\"]==\"?\") | (df[\"B\"]==\"?\") | (df[\"C\"]==\"?\") "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is an example of a Boolean DataFrame.  In each position, it says whether the DataFrame `df` had the value \"?\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "cell_id": "d3a47d273a414326a9bddcc9af46a091",
    "deepnote_cell_height": 516,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 10,
    "execution_start": 1649103867483,
    "source_hash": "c01b8366",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       A      B      C\n",
       "0  False  False  False\n",
       "1  False  False  False\n",
       "2   True  False   True\n",
       "3  False  False  False\n",
       "4  False  False  False\n",
       "5  False  False   True\n",
       "6   True  False  False\n",
       "7  False  False  False"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df==\"?\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is a method to find if there are any `True` values along a row.  The keyword argument `axis=1` is telling pandas to look one row at a time.  Notice how this produces the same Boolean Series as the non-Dry version above, with the vertical bars `|`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {
    "cell_id": "84f9f3f1374d4d0ea17b59cf93c80a6e",
    "deepnote_cell_height": 271.78125,
    "deepnote_cell_type": "code",
    "deepnote_output_heights": [
     174.78125
    ],
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 4,
    "execution_start": 1649103915327,
    "source_hash": "38eeb1d6",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    False\n",
       "1    False\n",
       "2     True\n",
       "3    False\n",
       "4    False\n",
       "5     True\n",
       "6     True\n",
       "7    False\n",
       "dtype: bool"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# DRY code\n",
    "(df==\"?\").any(axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we instead use `axis=0`, then we are determining if there are any `True` values in each column, one column at a time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {
    "cell_id": "a323399c5f96404a9e25489a2b9748bc",
    "deepnote_cell_height": 175.78125,
    "deepnote_cell_type": "code",
    "deepnote_output_heights": [
     78.78125
    ],
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 4,
    "execution_start": 1649103941642,
    "source_hash": "b536adb1",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "A     True\n",
       "B    False\n",
       "C     True\n",
       "dtype: bool"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(df==\"?\").any(axis=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also use `all` instead of `any`, if we want to know if all of the values are `True`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {
    "cell_id": "e03c20de434b453ea5202096fb9f20b6",
    "deepnote_cell_height": 175.78125,
    "deepnote_cell_type": "code",
    "deepnote_output_heights": [
     78.78125
    ],
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 10,
    "execution_start": 1649104132497,
    "source_hash": "c38979c3",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "A    False\n",
       "B    False\n",
       "C    False\n",
       "dtype: bool"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(df==\"?\").all(axis=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Problems with this DataFrame\n",
    "\n",
    "The \"?\"s in this DataFrame cause a lot of problems.  They are supposed to represent missing values, but pandas doesn't know that."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {
    "cell_id": "1d4f446ac16543f49d943403a5f8dc35",
    "deepnote_cell_height": 516,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 19,
    "execution_start": 1649104203165,
    "source_hash": "f804c160",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3</td>\n",
       "      <td>-2.3</td>\n",
       "      <td>-1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2.1</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>?</td>\n",
       "      <td>5.0</td>\n",
       "      <td>?</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>12.3</td>\n",
       "      <td>-4.0</td>\n",
       "      <td>4.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>7</td>\n",
       "      <td>2.8</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>5</td>\n",
       "      <td>6.0</td>\n",
       "      <td>?</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>?</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>2</td>\n",
       "      <td>-1.0</td>\n",
       "      <td>3.14</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      A    B     C\n",
       "0     3 -2.3    -1\n",
       "1   2.1  1.0     0\n",
       "2     ?  5.0     ?\n",
       "3  12.3 -4.0   4.1\n",
       "4     7  2.8     3\n",
       "5     5  6.0     ?\n",
       "6     ?  1.0   1.8\n",
       "7     2 -1.0  3.14"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The values in column \"A\" look like numbers, but because of the \"?\" values, pandas is treating them as strings.  For example, if we try to use the `sum` method, the values get concatenated."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {
    "cell_id": "cc2b970d5f4c4028b190a1087105dbb6",
    "deepnote_cell_height": 118.1875,
    "deepnote_cell_type": "code",
    "deepnote_output_heights": [
     21.1875
    ],
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 19,
    "execution_start": 1649104232830,
    "source_hash": "3299dadc",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'32.1?12.375?2'"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"A\"].sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The \"B\" column is better because it doesn't have any \"?\"s in it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {
    "cell_id": "046aabfac03e4f20b4d5d24dc50cca5d",
    "deepnote_cell_height": 175.78125,
    "deepnote_cell_type": "code",
    "deepnote_output_heights": [
     78.78125
    ],
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 16,
    "execution_start": 1649104262434,
    "source_hash": "52430027",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "A     object\n",
       "B    float64\n",
       "C     object\n",
       "dtype: object"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.dtypes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is how `+` works when the two objects are strings in Python."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {
    "cell_id": "15f63b71ad2b49d69b65c3f4bfa58c2c",
    "deepnote_cell_height": 118.1875,
    "deepnote_cell_type": "code",
    "deepnote_output_heights": [
     21.1875
    ],
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 12,
    "execution_start": 1649104281757,
    "source_hash": "470eda09",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'32'"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\"3\"+\"2\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `describe` method also doesn't work in two of the three columns, because pandas thinks they are strings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {
    "cell_id": "a07b3aee862d486ebcf6f034bac2d1ec",
    "deepnote_cell_height": 516,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 17,
    "execution_start": 1649104303499,
    "source_hash": "f88152d9",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>B</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>8.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>1.062500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>3.466137</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>-4.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>-1.325000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>1.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>3.350000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>6.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              B\n",
       "count  8.000000\n",
       "mean   1.062500\n",
       "std    3.466137\n",
       "min   -4.000000\n",
       "25%   -1.325000\n",
       "50%    1.000000\n",
       "75%    3.350000\n",
       "max    6.000000"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Often missing values are handled automatically by pandas.  In this case, they are not handled automatically, so we tell pandas explicitly when importing that the missing values are denoted by \"?\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {
    "cell_id": "c90045a79e7048d6952554c9fabc40a9",
    "deepnote_cell_height": 81,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 1,
    "execution_start": 1649104390159,
    "source_hash": "647887ff",
    "tags": []
   },
   "outputs": [],
   "source": [
    "df4 = pd.read_csv(\"../data/sampleData.csv\", na_values=\"?\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice how those \"?\" values have gotten represented as `NaN`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {
    "cell_id": "ec9244eab7324f7f8333a6e1c8a8afd3",
    "deepnote_cell_height": 516,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 17,
    "execution_start": 1649104391685,
    "source_hash": "786d530e",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3.0</td>\n",
       "      <td>-2.3</td>\n",
       "      <td>-1.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2.1</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>NaN</td>\n",
       "      <td>5.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>12.3</td>\n",
       "      <td>-4.0</td>\n",
       "      <td>4.10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>7.0</td>\n",
       "      <td>2.8</td>\n",
       "      <td>3.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>5.0</td>\n",
       "      <td>6.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>2.0</td>\n",
       "      <td>-1.0</td>\n",
       "      <td>3.14</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      A    B     C\n",
       "0   3.0 -2.3 -1.00\n",
       "1   2.1  1.0  0.00\n",
       "2   NaN  5.0   NaN\n",
       "3  12.3 -4.0  4.10\n",
       "4   7.0  2.8  3.00\n",
       "5   5.0  6.0   NaN\n",
       "6   NaN  1.0  1.80\n",
       "7   2.0 -1.0  3.14"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df4"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now `sum` works correctly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {
    "cell_id": "8b176dbbff794838997e1472525191f8",
    "deepnote_cell_height": 118.1875,
    "deepnote_cell_type": "code",
    "deepnote_output_heights": [
     21.1875
    ],
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 12,
    "execution_start": 1649104414272,
    "source_hash": "f571903f",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "31.4"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df4[\"A\"].sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can use the `axis` argument the same as before.  Here `axis=0` is saying to add up the columns one at a time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {
    "cell_id": "80696626331545ff93cee70d354ed80e",
    "deepnote_cell_height": 175.78125,
    "deepnote_cell_type": "code",
    "deepnote_output_heights": [
     78.78125
    ],
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 476,
    "execution_start": 1649104443308,
    "source_hash": "a7ba9e58",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "A    31.40\n",
       "B     8.50\n",
       "C    11.04\n",
       "dtype: float64"
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df4.sum(axis=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here we are adding the values in each row, one row at a time, because of the `axis=1` argument."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {
    "cell_id": "4dd4163aa634456b924c32df3dd10e38",
    "deepnote_cell_height": 271.78125,
    "deepnote_cell_type": "code",
    "deepnote_output_heights": [
     174.78125
    ],
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 7,
    "execution_start": 1649104488854,
    "source_hash": "68513aa5",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    -0.30\n",
       "1     3.10\n",
       "2     5.00\n",
       "3    12.40\n",
       "4    12.80\n",
       "5    11.00\n",
       "6     2.80\n",
       "7     4.14\n",
       "dtype: float64"
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df4.sum(axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are lots of different methods which behave similarly.  For example, here we compute the `median` of each row."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {
    "cell_id": "91d85a84fab6424aba4726d39494b770",
    "deepnote_cell_height": 271.78125,
    "deepnote_cell_type": "code",
    "deepnote_output_heights": [
     174.78125
    ],
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 9,
    "execution_start": 1649104511372,
    "source_hash": "a64f4a4b",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0   -1.0\n",
       "1    1.0\n",
       "2    5.0\n",
       "3    4.1\n",
       "4    3.0\n",
       "5    5.5\n",
       "6    1.4\n",
       "7    2.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df4.median(axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## NaN\n",
    "\n",
    "Let's look more closely at these missing values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {
    "cell_id": "1e3bc538203c490b83418bf8a058438a",
    "deepnote_cell_height": 118.1875,
    "deepnote_cell_type": "code",
    "deepnote_output_heights": [
     21.1875
    ],
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 19,
    "execution_start": 1649104645761,
    "source_hash": "389fcc9a",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "nan"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df4.loc[2,\"A\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {
    "cell_id": "4fb243f15f294c078c387d5346775514",
    "deepnote_cell_height": 118.1875,
    "deepnote_cell_type": "code",
    "deepnote_output_heights": [
     21.1875
    ],
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 13,
    "execution_start": 1649104656229,
    "source_hash": "96b8035c",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "numpy.float64"
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(df4.loc[2,\"A\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This constant is part of the important NumPy library."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {
    "cell_id": "ae34a79ae0f7428989f6b83df9d16c82",
    "deepnote_cell_height": 81,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 0,
    "execution_start": 1649104709748,
    "source_hash": "c2602aa8",
    "tags": []
   },
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we want to make this \"not a number\" value ourselves, we can use the following syntax."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {
    "cell_id": "3f4cf77f3960412d81866259f7380a80",
    "deepnote_cell_height": 118.1875,
    "deepnote_cell_type": "code",
    "deepnote_output_heights": [
     21.1875
    ],
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 17,
    "execution_start": 1649104715353,
    "source_hash": "e200addd",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "nan"
      ]
     },
     "execution_count": 65,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.nan"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Missing values in a real dataset\n",
    "\n",
    "So far today, all of our examples came from a small artificial dataset.  Here we import a slightly bigger (still relatively small) dataset with real data in it.  Notice how we don't need to specify the `na_values` keyword argument, because pandas correctly identifies the missing data automatically."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {
    "cell_id": "c70cd2ecd2b046c78641fa24bec7e42a",
    "deepnote_cell_height": 81,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 1,
    "execution_start": 1649104825688,
    "source_hash": "cef682a2",
    "tags": []
   },
   "outputs": [],
   "source": [
    "cars = pd.read_csv(\"../data/cars.csv\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using the `info` method, we can see that, for example, there are 8 null values (i.e., missing values) in the \"Miles_per_Gallon column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {
    "cell_id": "af03aec7146d4562a8e22b5f458dc93f",
    "deepnote_cell_height": 409.828125,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 13,
    "execution_start": 1649104836117,
    "source_hash": "53e91b83",
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 406 entries, 0 to 405\n",
      "Data columns (total 9 columns):\n",
      " #   Column            Non-Null Count  Dtype  \n",
      "---  ------            --------------  -----  \n",
      " 0   Name              406 non-null    object \n",
      " 1   Miles_per_Gallon  398 non-null    float64\n",
      " 2   Cylinders         406 non-null    int64  \n",
      " 3   Displacement      406 non-null    float64\n",
      " 4   Horsepower        400 non-null    float64\n",
      " 5   Weight_in_lbs     406 non-null    int64  \n",
      " 6   Acceleration      406 non-null    float64\n",
      " 7   Year              406 non-null    object \n",
      " 8   Origin            406 non-null    object \n",
      "dtypes: float64(4), int64(2), object(3)\n",
      "memory usage: 28.7+ KB\n"
     ]
    }
   ],
   "source": [
    "cars.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {
    "cell_id": "a9dcd462dd8242d0bf9d0768b7391301",
    "deepnote_cell_height": 329.375,
    "deepnote_cell_type": "code",
    "deepnote_output_heights": [
     232.375
    ],
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 6,
    "execution_start": 1649104886482,
    "source_hash": "b2d762b9",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      False\n",
       "1      False\n",
       "2      False\n",
       "3      False\n",
       "4      False\n",
       "       ...  \n",
       "401    False\n",
       "402    False\n",
       "403    False\n",
       "404    False\n",
       "405    False\n",
       "Name: Miles_per_Gallon, Length: 406, dtype: bool"
      ]
     },
     "execution_count": 69,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cars[\"Miles_per_Gallon\"].isna()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is a quick way to count those missing values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {
    "cell_id": "e74c2d32fef74d8ea61a3d38bb509e92",
    "deepnote_cell_height": 118.1875,
    "deepnote_cell_type": "code",
    "deepnote_output_heights": [
     21.1875
    ],
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 8,
    "execution_start": 1649104900782,
    "source_hash": "51f9069e",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "8"
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cars[\"Miles_per_Gallon\"].isna().sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using Boolean indexing to find the rows where the \"Miles_per_Gallon\" value is missing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {
    "cell_id": "9913547c6b1144849ee86bf03d02a394",
    "deepnote_cell_height": 516,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 10,
    "execution_start": 1649104923859,
    "source_hash": "c6185c9a",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Miles_per_Gallon</th>\n",
       "      <th>Cylinders</th>\n",
       "      <th>Displacement</th>\n",
       "      <th>Horsepower</th>\n",
       "      <th>Weight_in_lbs</th>\n",
       "      <th>Acceleration</th>\n",
       "      <th>Year</th>\n",
       "      <th>Origin</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>citroen ds-21 pallas</td>\n",
       "      <td>NaN</td>\n",
       "      <td>4</td>\n",
       "      <td>133.0</td>\n",
       "      <td>115.0</td>\n",
       "      <td>3090</td>\n",
       "      <td>17.5</td>\n",
       "      <td>1970-01-01</td>\n",
       "      <td>Europe</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>chevrolet chevelle concours (sw)</td>\n",
       "      <td>NaN</td>\n",
       "      <td>8</td>\n",
       "      <td>350.0</td>\n",
       "      <td>165.0</td>\n",
       "      <td>4142</td>\n",
       "      <td>11.5</td>\n",
       "      <td>1970-01-01</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>ford torino (sw)</td>\n",
       "      <td>NaN</td>\n",
       "      <td>8</td>\n",
       "      <td>351.0</td>\n",
       "      <td>153.0</td>\n",
       "      <td>4034</td>\n",
       "      <td>11.0</td>\n",
       "      <td>1970-01-01</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>plymouth satellite (sw)</td>\n",
       "      <td>NaN</td>\n",
       "      <td>8</td>\n",
       "      <td>383.0</td>\n",
       "      <td>175.0</td>\n",
       "      <td>4166</td>\n",
       "      <td>10.5</td>\n",
       "      <td>1970-01-01</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>amc rebel sst (sw)</td>\n",
       "      <td>NaN</td>\n",
       "      <td>8</td>\n",
       "      <td>360.0</td>\n",
       "      <td>175.0</td>\n",
       "      <td>3850</td>\n",
       "      <td>11.0</td>\n",
       "      <td>1970-01-01</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>ford mustang boss 302</td>\n",
       "      <td>NaN</td>\n",
       "      <td>8</td>\n",
       "      <td>302.0</td>\n",
       "      <td>140.0</td>\n",
       "      <td>3353</td>\n",
       "      <td>8.0</td>\n",
       "      <td>1970-01-01</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39</th>\n",
       "      <td>volkswagen super beetle 117</td>\n",
       "      <td>NaN</td>\n",
       "      <td>4</td>\n",
       "      <td>97.0</td>\n",
       "      <td>48.0</td>\n",
       "      <td>1978</td>\n",
       "      <td>20.0</td>\n",
       "      <td>1971-01-01</td>\n",
       "      <td>Europe</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>367</th>\n",
       "      <td>saab 900s</td>\n",
       "      <td>NaN</td>\n",
       "      <td>4</td>\n",
       "      <td>121.0</td>\n",
       "      <td>110.0</td>\n",
       "      <td>2800</td>\n",
       "      <td>15.4</td>\n",
       "      <td>1982-01-01</td>\n",
       "      <td>Europe</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                 Name  Miles_per_Gallon  Cylinders  \\\n",
       "10               citroen ds-21 pallas               NaN          4   \n",
       "11   chevrolet chevelle concours (sw)               NaN          8   \n",
       "12                   ford torino (sw)               NaN          8   \n",
       "13            plymouth satellite (sw)               NaN          8   \n",
       "14                 amc rebel sst (sw)               NaN          8   \n",
       "17              ford mustang boss 302               NaN          8   \n",
       "39        volkswagen super beetle 117               NaN          4   \n",
       "367                         saab 900s               NaN          4   \n",
       "\n",
       "     Displacement  Horsepower  Weight_in_lbs  Acceleration        Year  Origin  \n",
       "10          133.0       115.0           3090          17.5  1970-01-01  Europe  \n",
       "11          350.0       165.0           4142          11.5  1970-01-01     USA  \n",
       "12          351.0       153.0           4034          11.0  1970-01-01     USA  \n",
       "13          383.0       175.0           4166          10.5  1970-01-01     USA  \n",
       "14          360.0       175.0           3850          11.0  1970-01-01     USA  \n",
       "17          302.0       140.0           3353           8.0  1970-01-01     USA  \n",
       "39           97.0        48.0           1978          20.0  1971-01-01  Europe  \n",
       "367         121.0       110.0           2800          15.4  1982-01-01  Europe  "
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cars[cars[\"Miles_per_Gallon\"].isna()]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A common task is to remove all the rows which are missing any values.  Here is a quick way to do that.  (This is displaying a new DataFrame; the way it's written, it does not change `cars`.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {
    "cell_id": "53d1abe86db34f7fbb01fbe19702f488",
    "deepnote_cell_height": 598,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 27,
    "execution_start": 1649105008028,
    "source_hash": "8f2e83b8",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Miles_per_Gallon</th>\n",
       "      <th>Cylinders</th>\n",
       "      <th>Displacement</th>\n",
       "      <th>Horsepower</th>\n",
       "      <th>Weight_in_lbs</th>\n",
       "      <th>Acceleration</th>\n",
       "      <th>Year</th>\n",
       "      <th>Origin</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>chevrolet chevelle malibu</td>\n",
       "      <td>18.0</td>\n",
       "      <td>8</td>\n",
       "      <td>307.0</td>\n",
       "      <td>130.0</td>\n",
       "      <td>3504</td>\n",
       "      <td>12.0</td>\n",
       "      <td>1970-01-01</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>buick skylark 320</td>\n",
       "      <td>15.0</td>\n",
       "      <td>8</td>\n",
       "      <td>350.0</td>\n",
       "      <td>165.0</td>\n",
       "      <td>3693</td>\n",
       "      <td>11.5</td>\n",
       "      <td>1970-01-01</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>plymouth satellite</td>\n",
       "      <td>18.0</td>\n",
       "      <td>8</td>\n",
       "      <td>318.0</td>\n",
       "      <td>150.0</td>\n",
       "      <td>3436</td>\n",
       "      <td>11.0</td>\n",
       "      <td>1970-01-01</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>amc rebel sst</td>\n",
       "      <td>16.0</td>\n",
       "      <td>8</td>\n",
       "      <td>304.0</td>\n",
       "      <td>150.0</td>\n",
       "      <td>3433</td>\n",
       "      <td>12.0</td>\n",
       "      <td>1970-01-01</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>ford torino</td>\n",
       "      <td>17.0</td>\n",
       "      <td>8</td>\n",
       "      <td>302.0</td>\n",
       "      <td>140.0</td>\n",
       "      <td>3449</td>\n",
       "      <td>10.5</td>\n",
       "      <td>1970-01-01</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>401</th>\n",
       "      <td>ford mustang gl</td>\n",
       "      <td>27.0</td>\n",
       "      <td>4</td>\n",
       "      <td>140.0</td>\n",
       "      <td>86.0</td>\n",
       "      <td>2790</td>\n",
       "      <td>15.6</td>\n",
       "      <td>1982-01-01</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>402</th>\n",
       "      <td>vw pickup</td>\n",
       "      <td>44.0</td>\n",
       "      <td>4</td>\n",
       "      <td>97.0</td>\n",
       "      <td>52.0</td>\n",
       "      <td>2130</td>\n",
       "      <td>24.6</td>\n",
       "      <td>1982-01-01</td>\n",
       "      <td>Europe</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>403</th>\n",
       "      <td>dodge rampage</td>\n",
       "      <td>32.0</td>\n",
       "      <td>4</td>\n",
       "      <td>135.0</td>\n",
       "      <td>84.0</td>\n",
       "      <td>2295</td>\n",
       "      <td>11.6</td>\n",
       "      <td>1982-01-01</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>404</th>\n",
       "      <td>ford ranger</td>\n",
       "      <td>28.0</td>\n",
       "      <td>4</td>\n",
       "      <td>120.0</td>\n",
       "      <td>79.0</td>\n",
       "      <td>2625</td>\n",
       "      <td>18.6</td>\n",
       "      <td>1982-01-01</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>405</th>\n",
       "      <td>chevy s-10</td>\n",
       "      <td>31.0</td>\n",
       "      <td>4</td>\n",
       "      <td>119.0</td>\n",
       "      <td>82.0</td>\n",
       "      <td>2720</td>\n",
       "      <td>19.4</td>\n",
       "      <td>1982-01-01</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>392 rows × 9 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                          Name  Miles_per_Gallon  Cylinders  Displacement  \\\n",
       "0    chevrolet chevelle malibu              18.0          8         307.0   \n",
       "1            buick skylark 320              15.0          8         350.0   \n",
       "2           plymouth satellite              18.0          8         318.0   \n",
       "3                amc rebel sst              16.0          8         304.0   \n",
       "4                  ford torino              17.0          8         302.0   \n",
       "..                         ...               ...        ...           ...   \n",
       "401            ford mustang gl              27.0          4         140.0   \n",
       "402                  vw pickup              44.0          4          97.0   \n",
       "403              dodge rampage              32.0          4         135.0   \n",
       "404                ford ranger              28.0          4         120.0   \n",
       "405                 chevy s-10              31.0          4         119.0   \n",
       "\n",
       "     Horsepower  Weight_in_lbs  Acceleration        Year  Origin  \n",
       "0         130.0           3504          12.0  1970-01-01     USA  \n",
       "1         165.0           3693          11.5  1970-01-01     USA  \n",
       "2         150.0           3436          11.0  1970-01-01     USA  \n",
       "3         150.0           3433          12.0  1970-01-01     USA  \n",
       "4         140.0           3449          10.5  1970-01-01     USA  \n",
       "..          ...            ...           ...         ...     ...  \n",
       "401        86.0           2790          15.6  1982-01-01     USA  \n",
       "402        52.0           2130          24.6  1982-01-01  Europe  \n",
       "403        84.0           2295          11.6  1982-01-01     USA  \n",
       "404        79.0           2625          18.6  1982-01-01     USA  \n",
       "405        82.0           2720          19.4  1982-01-01     USA  \n",
       "\n",
       "[392 rows x 9 columns]"
      ]
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cars.dropna()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is another way to do the same thing, using `any`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {
    "cell_id": "efcfcfcb195841f0a7064cfeeb269239",
    "deepnote_cell_height": 598,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 12,
    "execution_start": 1649105033537,
    "source_hash": "9102c47e",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Miles_per_Gallon</th>\n",
       "      <th>Cylinders</th>\n",
       "      <th>Displacement</th>\n",
       "      <th>Horsepower</th>\n",
       "      <th>Weight_in_lbs</th>\n",
       "      <th>Acceleration</th>\n",
       "      <th>Year</th>\n",
       "      <th>Origin</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>401</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>402</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>403</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>404</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>405</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>406 rows × 9 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      Name  Miles_per_Gallon  Cylinders  Displacement  Horsepower  \\\n",
       "0    False             False      False         False       False   \n",
       "1    False             False      False         False       False   \n",
       "2    False             False      False         False       False   \n",
       "3    False             False      False         False       False   \n",
       "4    False             False      False         False       False   \n",
       "..     ...               ...        ...           ...         ...   \n",
       "401  False             False      False         False       False   \n",
       "402  False             False      False         False       False   \n",
       "403  False             False      False         False       False   \n",
       "404  False             False      False         False       False   \n",
       "405  False             False      False         False       False   \n",
       "\n",
       "     Weight_in_lbs  Acceleration   Year  Origin  \n",
       "0            False         False  False   False  \n",
       "1            False         False  False   False  \n",
       "2            False         False  False   False  \n",
       "3            False         False  False   False  \n",
       "4            False         False  False   False  \n",
       "..             ...           ...    ...     ...  \n",
       "401          False         False  False   False  \n",
       "402          False         False  False   False  \n",
       "403          False         False  False   False  \n",
       "404          False         False  False   False  \n",
       "405          False         False  False   False  \n",
       "\n",
       "[406 rows x 9 columns]"
      ]
     },
     "execution_count": 73,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cars.isna()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We want to identify rows where any value is missing.  Since we are working one row at a time, we use `axis=1`.  Here is a Boolean Series."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {
    "cell_id": "f1a4f5e5c7484f7b8905aa215bf28b9c",
    "deepnote_cell_height": 329.375,
    "deepnote_cell_type": "code",
    "deepnote_output_heights": [
     232.375
    ],
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 9,
    "execution_start": 1649105078776,
    "source_hash": "3ad1372b",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      False\n",
       "1      False\n",
       "2      False\n",
       "3      False\n",
       "4      False\n",
       "       ...  \n",
       "401    False\n",
       "402    False\n",
       "403    False\n",
       "404    False\n",
       "405    False\n",
       "Length: 406, dtype: bool"
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cars.isna().any(axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here we are using Boolean indexing to get keep only those rows which are missing data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {
    "cell_id": "482c12d8ebe045ca8a8063a3d3e12445",
    "deepnote_cell_height": 598,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 46,
    "execution_start": 1649105094095,
    "source_hash": "8dabe1a0",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Miles_per_Gallon</th>\n",
       "      <th>Cylinders</th>\n",
       "      <th>Displacement</th>\n",
       "      <th>Horsepower</th>\n",
       "      <th>Weight_in_lbs</th>\n",
       "      <th>Acceleration</th>\n",
       "      <th>Year</th>\n",
       "      <th>Origin</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>citroen ds-21 pallas</td>\n",
       "      <td>NaN</td>\n",
       "      <td>4</td>\n",
       "      <td>133.0</td>\n",
       "      <td>115.0</td>\n",
       "      <td>3090</td>\n",
       "      <td>17.5</td>\n",
       "      <td>1970-01-01</td>\n",
       "      <td>Europe</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>chevrolet chevelle concours (sw)</td>\n",
       "      <td>NaN</td>\n",
       "      <td>8</td>\n",
       "      <td>350.0</td>\n",
       "      <td>165.0</td>\n",
       "      <td>4142</td>\n",
       "      <td>11.5</td>\n",
       "      <td>1970-01-01</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>ford torino (sw)</td>\n",
       "      <td>NaN</td>\n",
       "      <td>8</td>\n",
       "      <td>351.0</td>\n",
       "      <td>153.0</td>\n",
       "      <td>4034</td>\n",
       "      <td>11.0</td>\n",
       "      <td>1970-01-01</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>plymouth satellite (sw)</td>\n",
       "      <td>NaN</td>\n",
       "      <td>8</td>\n",
       "      <td>383.0</td>\n",
       "      <td>175.0</td>\n",
       "      <td>4166</td>\n",
       "      <td>10.5</td>\n",
       "      <td>1970-01-01</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>amc rebel sst (sw)</td>\n",
       "      <td>NaN</td>\n",
       "      <td>8</td>\n",
       "      <td>360.0</td>\n",
       "      <td>175.0</td>\n",
       "      <td>3850</td>\n",
       "      <td>11.0</td>\n",
       "      <td>1970-01-01</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>ford mustang boss 302</td>\n",
       "      <td>NaN</td>\n",
       "      <td>8</td>\n",
       "      <td>302.0</td>\n",
       "      <td>140.0</td>\n",
       "      <td>3353</td>\n",
       "      <td>8.0</td>\n",
       "      <td>1970-01-01</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>38</th>\n",
       "      <td>ford pinto</td>\n",
       "      <td>25.0</td>\n",
       "      <td>4</td>\n",
       "      <td>98.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2046</td>\n",
       "      <td>19.0</td>\n",
       "      <td>1971-01-01</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39</th>\n",
       "      <td>volkswagen super beetle 117</td>\n",
       "      <td>NaN</td>\n",
       "      <td>4</td>\n",
       "      <td>97.0</td>\n",
       "      <td>48.0</td>\n",
       "      <td>1978</td>\n",
       "      <td>20.0</td>\n",
       "      <td>1971-01-01</td>\n",
       "      <td>Europe</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>133</th>\n",
       "      <td>ford maverick</td>\n",
       "      <td>21.0</td>\n",
       "      <td>6</td>\n",
       "      <td>200.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2875</td>\n",
       "      <td>17.0</td>\n",
       "      <td>1974-01-01</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>337</th>\n",
       "      <td>renault lecar deluxe</td>\n",
       "      <td>40.9</td>\n",
       "      <td>4</td>\n",
       "      <td>85.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1835</td>\n",
       "      <td>17.3</td>\n",
       "      <td>1980-01-01</td>\n",
       "      <td>Europe</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>343</th>\n",
       "      <td>ford mustang cobra</td>\n",
       "      <td>23.6</td>\n",
       "      <td>4</td>\n",
       "      <td>140.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2905</td>\n",
       "      <td>14.3</td>\n",
       "      <td>1980-01-01</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>361</th>\n",
       "      <td>renault 18i</td>\n",
       "      <td>34.5</td>\n",
       "      <td>4</td>\n",
       "      <td>100.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2320</td>\n",
       "      <td>15.8</td>\n",
       "      <td>1982-01-01</td>\n",
       "      <td>Europe</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>367</th>\n",
       "      <td>saab 900s</td>\n",
       "      <td>NaN</td>\n",
       "      <td>4</td>\n",
       "      <td>121.0</td>\n",
       "      <td>110.0</td>\n",
       "      <td>2800</td>\n",
       "      <td>15.4</td>\n",
       "      <td>1982-01-01</td>\n",
       "      <td>Europe</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>382</th>\n",
       "      <td>amc concord dl</td>\n",
       "      <td>23.0</td>\n",
       "      <td>4</td>\n",
       "      <td>151.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3035</td>\n",
       "      <td>20.5</td>\n",
       "      <td>1982-01-01</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                 Name  Miles_per_Gallon  Cylinders  \\\n",
       "10               citroen ds-21 pallas               NaN          4   \n",
       "11   chevrolet chevelle concours (sw)               NaN          8   \n",
       "12                   ford torino (sw)               NaN          8   \n",
       "13            plymouth satellite (sw)               NaN          8   \n",
       "14                 amc rebel sst (sw)               NaN          8   \n",
       "17              ford mustang boss 302               NaN          8   \n",
       "38                         ford pinto              25.0          4   \n",
       "39        volkswagen super beetle 117               NaN          4   \n",
       "133                     ford maverick              21.0          6   \n",
       "337              renault lecar deluxe              40.9          4   \n",
       "343                ford mustang cobra              23.6          4   \n",
       "361                       renault 18i              34.5          4   \n",
       "367                         saab 900s               NaN          4   \n",
       "382                    amc concord dl              23.0          4   \n",
       "\n",
       "     Displacement  Horsepower  Weight_in_lbs  Acceleration        Year  Origin  \n",
       "10          133.0       115.0           3090          17.5  1970-01-01  Europe  \n",
       "11          350.0       165.0           4142          11.5  1970-01-01     USA  \n",
       "12          351.0       153.0           4034          11.0  1970-01-01     USA  \n",
       "13          383.0       175.0           4166          10.5  1970-01-01     USA  \n",
       "14          360.0       175.0           3850          11.0  1970-01-01     USA  \n",
       "17          302.0       140.0           3353           8.0  1970-01-01     USA  \n",
       "38           98.0         NaN           2046          19.0  1971-01-01     USA  \n",
       "39           97.0        48.0           1978          20.0  1971-01-01  Europe  \n",
       "133         200.0         NaN           2875          17.0  1974-01-01     USA  \n",
       "337          85.0         NaN           1835          17.3  1980-01-01  Europe  \n",
       "343         140.0         NaN           2905          14.3  1980-01-01     USA  \n",
       "361         100.0         NaN           2320          15.8  1982-01-01  Europe  \n",
       "367         121.0       110.0           2800          15.4  1982-01-01  Europe  \n",
       "382         151.0         NaN           3035          20.5  1982-01-01     USA  "
      ]
     },
     "execution_count": 77,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cars[cars.isna().any(axis=1)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A more natural thing is to get rid of those rows.  We can negate our Boolean Series by using `~`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {
    "cell_id": "afcad5bc940d4c718a2de164db65ca90",
    "deepnote_cell_height": 616,
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_millis": 48,
    "execution_start": 1649105137634,
    "source_hash": "3743334f",
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Miles_per_Gallon</th>\n",
       "      <th>Cylinders</th>\n",
       "      <th>Displacement</th>\n",
       "      <th>Horsepower</th>\n",
       "      <th>Weight_in_lbs</th>\n",
       "      <th>Acceleration</th>\n",
       "      <th>Year</th>\n",
       "      <th>Origin</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>chevrolet chevelle malibu</td>\n",
       "      <td>18.0</td>\n",
       "      <td>8</td>\n",
       "      <td>307.0</td>\n",
       "      <td>130.0</td>\n",
       "      <td>3504</td>\n",
       "      <td>12.0</td>\n",
       "      <td>1970-01-01</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>buick skylark 320</td>\n",
       "      <td>15.0</td>\n",
       "      <td>8</td>\n",
       "      <td>350.0</td>\n",
       "      <td>165.0</td>\n",
       "      <td>3693</td>\n",
       "      <td>11.5</td>\n",
       "      <td>1970-01-01</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>plymouth satellite</td>\n",
       "      <td>18.0</td>\n",
       "      <td>8</td>\n",
       "      <td>318.0</td>\n",
       "      <td>150.0</td>\n",
       "      <td>3436</td>\n",
       "      <td>11.0</td>\n",
       "      <td>1970-01-01</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>amc rebel sst</td>\n",
       "      <td>16.0</td>\n",
       "      <td>8</td>\n",
       "      <td>304.0</td>\n",
       "      <td>150.0</td>\n",
       "      <td>3433</td>\n",
       "      <td>12.0</td>\n",
       "      <td>1970-01-01</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>ford torino</td>\n",
       "      <td>17.0</td>\n",
       "      <td>8</td>\n",
       "      <td>302.0</td>\n",
       "      <td>140.0</td>\n",
       "      <td>3449</td>\n",
       "      <td>10.5</td>\n",
       "      <td>1970-01-01</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>401</th>\n",
       "      <td>ford mustang gl</td>\n",
       "      <td>27.0</td>\n",
       "      <td>4</td>\n",
       "      <td>140.0</td>\n",
       "      <td>86.0</td>\n",
       "      <td>2790</td>\n",
       "      <td>15.6</td>\n",
       "      <td>1982-01-01</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>402</th>\n",
       "      <td>vw pickup</td>\n",
       "      <td>44.0</td>\n",
       "      <td>4</td>\n",
       "      <td>97.0</td>\n",
       "      <td>52.0</td>\n",
       "      <td>2130</td>\n",
       "      <td>24.6</td>\n",
       "      <td>1982-01-01</td>\n",
       "      <td>Europe</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>403</th>\n",
       "      <td>dodge rampage</td>\n",
       "      <td>32.0</td>\n",
       "      <td>4</td>\n",
       "      <td>135.0</td>\n",
       "      <td>84.0</td>\n",
       "      <td>2295</td>\n",
       "      <td>11.6</td>\n",
       "      <td>1982-01-01</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>404</th>\n",
       "      <td>ford ranger</td>\n",
       "      <td>28.0</td>\n",
       "      <td>4</td>\n",
       "      <td>120.0</td>\n",
       "      <td>79.0</td>\n",
       "      <td>2625</td>\n",
       "      <td>18.6</td>\n",
       "      <td>1982-01-01</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>405</th>\n",
       "      <td>chevy s-10</td>\n",
       "      <td>31.0</td>\n",
       "      <td>4</td>\n",
       "      <td>119.0</td>\n",
       "      <td>82.0</td>\n",
       "      <td>2720</td>\n",
       "      <td>19.4</td>\n",
       "      <td>1982-01-01</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>392 rows × 9 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                          Name  Miles_per_Gallon  Cylinders  Displacement  \\\n",
       "0    chevrolet chevelle malibu              18.0          8         307.0   \n",
       "1            buick skylark 320              15.0          8         350.0   \n",
       "2           plymouth satellite              18.0          8         318.0   \n",
       "3                amc rebel sst              16.0          8         304.0   \n",
       "4                  ford torino              17.0          8         302.0   \n",
       "..                         ...               ...        ...           ...   \n",
       "401            ford mustang gl              27.0          4         140.0   \n",
       "402                  vw pickup              44.0          4          97.0   \n",
       "403              dodge rampage              32.0          4         135.0   \n",
       "404                ford ranger              28.0          4         120.0   \n",
       "405                 chevy s-10              31.0          4         119.0   \n",
       "\n",
       "     Horsepower  Weight_in_lbs  Acceleration        Year  Origin  \n",
       "0         130.0           3504          12.0  1970-01-01     USA  \n",
       "1         165.0           3693          11.5  1970-01-01     USA  \n",
       "2         150.0           3436          11.0  1970-01-01     USA  \n",
       "3         150.0           3433          12.0  1970-01-01     USA  \n",
       "4         140.0           3449          10.5  1970-01-01     USA  \n",
       "..          ...            ...           ...         ...     ...  \n",
       "401        86.0           2790          15.6  1982-01-01     USA  \n",
       "402        52.0           2130          24.6  1982-01-01  Europe  \n",
       "403        84.0           2295          11.6  1982-01-01     USA  \n",
       "404        79.0           2625          18.6  1982-01-01     USA  \n",
       "405        82.0           2720          19.4  1982-01-01     USA  \n",
       "\n",
       "[392 rows x 9 columns]"
      ]
     },
     "execution_count": 78,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# negation in pandas\n",
    "cars[~cars.isna().any(axis=1)]"
   ]
  }
 ],
 "metadata": {
  "deepnote": {
   "is_reactive": false
  },
  "deepnote_execution_queue": [],
  "deepnote_notebook_id": "a008aa3f-3aaa-4f75-9462-e29841a64137",
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}