{ "cells": [ { "cell_type": "markdown", "metadata": { "cell_id": "1596786f-e64f-4ad9-b663-6c4212b7f1e4", "deepnote_cell_type": "markdown", "tags": [] }, "source": [ "# Week 10, Monday\n", "\n", "[YuJa recording](https://uci.yuja.com/V/Video?v=4539948&node=15119194&a=1725214600&autoplay=1)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "cell_id": "1d0b28e8-327e-41b0-b327-1e5953bef295", "deepnote_cell_type": "code", "deepnote_to_be_reexecuted": false, "execution_millis": 2047, "execution_start": 1646673562178, "source_hash": "66ddaf3f", "tags": [] }, "outputs": [], "source": [ "import seaborn as sns\n", "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": { "cell_id": "f9352b56-a5b5-46d3-914f-92de23de05e7", "deepnote_cell_type": "markdown", "tags": [] }, "source": [ "## Lecture\n", "\n", "We will practice analyzing and cleaning a dataset. This dataset contains scaled versions of the Midterm 2 scores.\n", "\n", "* Can you identify which problem needed to be curved differently between the two versions of the midterm?" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "cell_id": "8b0737dc-d8ac-4702-aac1-4c3af7be3e18", "deepnote_cell_type": "code", "deepnote_to_be_reexecuted": false, "execution_millis": 3, "execution_start": 1646673564236, "source_hash": "9c374d52", "tags": [] }, "outputs": [], "source": [ "df = pd.read_csv(\"../data/Midterm_scaled.csv\")" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
StudentIDSIS User IDSIS Login IDSection1a1b1c1d1e1f23Column that is randomly blank.Version
0NaNNaNNaNNaNNaNOut of 3Out of 3Out of 3Out of 3Out of 3Out of 3Out of 4Out of 8NaNNaN
1NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
2Chris Davis23232.023251.0daviscjMATH 10 LEC A: INTRO PROG DATA SCI (44870)-0.7171017190.8429706670.8879704080.9720782310.6541390060.2641352721.0723805290.887667048NaNa
3Chris Davis23232.023251.0daviscjMATH 10 LEC A: INTRO PROG DATA SCI (44870)1.223291168-1.3519340880.8879704080.9720782310.6541390060.26413527200.887667048NaNNaN
4Chris Davis23232.023251.0daviscjMATH 10 LEC A: INTRO PROG DATA SCI (44870)-0.717101719-1.351934088-1.206729017-0.470360434-2.0813513820.2641352720-1.432371827NaNa
\n", "
" ], "text/plain": [ " Student ID SIS User ID SIS Login ID \\\n", "0 NaN NaN NaN NaN \n", "1 NaN NaN NaN NaN \n", "2 Chris Davis 23232.0 23251.0 daviscj \n", "3 Chris Davis 23232.0 23251.0 daviscj \n", "4 Chris Davis 23232.0 23251.0 daviscj \n", "\n", " Section 1a 1b \\\n", "0 NaN Out of 3 Out of 3 \n", "1 NaN NaN NaN \n", "2 MATH 10 LEC A: INTRO PROG DATA SCI (44870) -0.717101719 0.842970667 \n", "3 MATH 10 LEC A: INTRO PROG DATA SCI (44870) 1.223291168 -1.351934088 \n", "4 MATH 10 LEC A: INTRO PROG DATA SCI (44870) -0.717101719 -1.351934088 \n", "\n", " 1c 1d 1e 1f 2 \\\n", "0 Out of 3 Out of 3 Out of 3 Out of 3 Out of 4 \n", "1 NaN NaN NaN NaN NaN \n", "2 0.887970408 0.972078231 0.654139006 0.264135272 1.072380529 \n", "3 0.887970408 0.972078231 0.654139006 0.264135272 0 \n", "4 -1.206729017 -0.470360434 -2.081351382 0.264135272 0 \n", "\n", " 3 Column that is randomly blank. Version \n", "0 Out of 8 NaN NaN \n", "1 NaN NaN NaN \n", "2 0.887667048 NaN a \n", "3 0.887667048 NaN NaN \n", "4 -1.432371827 NaN a " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Student object\n", "ID float64\n", "SIS User ID float64\n", "SIS Login ID object\n", "Section object\n", "1a object\n", "1b object\n", "1c object\n", "1d object\n", "1e object\n", "1f object\n", "2 object\n", "3 object\n", "Column that is randomly blank. float64\n", "Version object\n", "dtype: object" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want to convert most of those columns to numeric values." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "cell_id": "76069d3f-99ea-4e68-a8eb-3e7f6f4c7b1d", "deepnote_cell_type": "code", "deepnote_to_be_reexecuted": false, "execution_millis": 50, "execution_start": 1646673564245, "source_hash": "bb4a0207", "tags": [] }, "outputs": [ { "ename": "ValueError", "evalue": "Unable to parse string \"Out of 3\" at position 0", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "File \u001b[0;32m~/miniconda3/envs/torch/lib/python3.8/site-packages/pandas/_libs/lib.pyx:2315\u001b[0m, in \u001b[0;36mpandas._libs.lib.maybe_convert_numeric\u001b[0;34m()\u001b[0m\n", "\u001b[0;31mValueError\u001b[0m: Unable to parse string \"Out of 3\"", "\nDuring handling of the above exception, another exception occurred:\n", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "Input \u001b[0;32mIn [5]\u001b[0m, in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mpd\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mto_numeric\u001b[49m\u001b[43m(\u001b[49m\u001b[43mdf\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43m1a\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m~/miniconda3/envs/torch/lib/python3.8/site-packages/pandas/core/tools/numeric.py:184\u001b[0m, in \u001b[0;36mto_numeric\u001b[0;34m(arg, errors, downcast)\u001b[0m\n\u001b[1;32m 182\u001b[0m coerce_numeric \u001b[38;5;241m=\u001b[39m errors \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;129;01min\u001b[39;00m (\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mignore\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mraise\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m 183\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m--> 184\u001b[0m values, _ \u001b[38;5;241m=\u001b[39m \u001b[43mlib\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mmaybe_convert_numeric\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 185\u001b[0m \u001b[43m \u001b[49m\u001b[43mvalues\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mset\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mcoerce_numeric\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mcoerce_numeric\u001b[49m\n\u001b[1;32m 186\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 187\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m (\u001b[38;5;167;01mValueError\u001b[39;00m, \u001b[38;5;167;01mTypeError\u001b[39;00m):\n\u001b[1;32m 188\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m errors \u001b[38;5;241m==\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mraise\u001b[39m\u001b[38;5;124m\"\u001b[39m:\n", "File \u001b[0;32m~/miniconda3/envs/torch/lib/python3.8/site-packages/pandas/_libs/lib.pyx:2357\u001b[0m, in \u001b[0;36mpandas._libs.lib.maybe_convert_numeric\u001b[0;34m()\u001b[0m\n", "\u001b[0;31mValueError\u001b[0m: Unable to parse string \"Out of 3\" at position 0" ] } ], "source": [ "pd.to_numeric(df[\"1a\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can fix that error by using the `errors` keyword argument." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 NaN\n", "1 NaN\n", "2 -0.717102\n", "3 1.223291\n", "4 -0.717102\n", "5 -0.717102\n", "6 -0.717102\n", "7 -0.717102\n", "8 1.223291\n", "9 -0.717102\n", "10 -0.717102\n", "11 -0.717102\n", "12 -1.687298\n", "13 -0.717102\n", "14 1.223291\n", "15 1.223291\n", "16 -1.687298\n", "17 1.223291\n", "18 -0.717102\n", "19 -0.717102\n", "20 -0.717102\n", "21 -0.717102\n", "22 -0.717102\n", "23 1.223291\n", "24 1.223291\n", "25 1.223291\n", "26 1.223291\n", "27 -0.717102\n", "28 1.223291\n", "29 1.223291\n", "30 -0.717102\n", "31 -0.717102\n", "32 1.223291\n", "33 -0.717102\n", "34 -0.717102\n", "35 1.223291\n", "36 -0.717102\n", "37 -0.717102\n", "38 1.223291\n", "39 -0.717102\n", "40 1.223291\n", "41 -0.717102\n", "42 -0.717102\n", "43 1.223291\n", "44 1.223291\n", "45 -0.717102\n", "46 1.223291\n", "47 -0.717102\n", "Name: 1a, dtype: float64" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.to_numeric(df[\"1a\"], errors=\"coerce\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we want to do the same thing to all the columns from \"1a\" to \"3\", we can use `apply` and a `lambda` function." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "cell_id": "006db592-cefd-4d55-8c0b-405e0c60cab7", "deepnote_cell_type": "code", "deepnote_to_be_reexecuted": false, "execution_millis": 37, "execution_start": 1646673564294, "source_hash": "64346799", "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
1a1b1c1d1e1f23
0NaNNaNNaNNaNNaNNaNNaNNaN
1NaNNaNNaNNaNNaNNaNNaNNaN
2-0.7171020.8429710.8879700.9720780.6541390.2641351.0723810.887667
31.223291-1.3519340.8879700.9720780.6541390.2641350.0000000.887667
4-0.717102-1.351934-1.206729-0.470360-2.0813510.2641350.000000-1.432372
5-0.717102-1.3519340.887970-0.4703600.6541390.2641350.000000-0.968364
6-0.717102-1.351934-1.206729-0.4703600.6541390.2641350.000000-1.432372
7-0.717102-0.620299-0.159379-0.4703600.6541390.264135-1.072381-0.968364
81.2232910.8429710.8879700.9720780.6541390.2641351.0723810.887667
9-0.717102-0.620299-1.2067290.9720780.6541390.2641351.0723810.423659
10-0.717102-1.3519340.887970-1.9127990.6541390.2641350.0000000.887667
11-0.7171020.8429710.8879700.9720780.6541390.2641351.0723810.887667
12-1.687298-1.351934-0.159379-0.470360-2.0813510.264135-3.217142-0.968364
13-0.7171020.842971-0.159379-0.470360-1.1695210.264135-1.072381-1.432372
141.2232910.8429710.8879700.9720780.6541390.2641350.0000000.887667
151.2232910.842971-1.206729-0.4703600.654139-3.785939-1.072381-0.504356
16-1.687298-1.351934-1.206729-0.470360-2.081351-3.7859390.000000-1.896380
171.2232910.8429710.8879700.9720780.6541390.2641351.0723810.423659
18-0.717102-1.351934-0.159379-0.470360-1.1695210.2641351.0723810.887667
19-0.717102-1.351934-0.159379-0.4703600.6541390.2641350.000000-0.040349
20-0.7171020.8429710.8879700.9720780.6541390.2641350.0000000.887667
21-0.7171020.8429710.8879700.9720780.6541390.2641351.0723810.887667
22-0.7171020.8429710.887970-0.4703600.6541390.2641351.0723810.887667
231.223291-1.351934-0.159379-0.470360-0.2576910.2641350.000000-0.504356
241.2232910.8429710.8879700.9720780.6541390.2641351.0723810.887667
251.2232910.842971-1.206729-0.4703600.6541390.2641350.000000-0.504356
261.2232910.8429710.8879700.9720780.6541390.2641350.0000000.887667
27-0.7171020.842971-1.206729-1.912799-1.1695210.264135-1.072381-1.432372
281.2232910.8429710.8879700.9720780.6541390.2641350.0000000.887667
291.2232910.842971-1.206729-1.9127990.6541390.2641351.072381-0.040349
30-0.7171020.842971-1.2067290.972078-1.1695210.2641350.0000000.887667
31-0.7171020.842971-0.159379-0.4703600.6541390.264135-1.072381-0.968364
321.2232910.8429710.8879700.9720780.6541390.2641351.0723810.887667
33-0.7171020.8429710.8879700.9720780.6541390.2641351.0723810.887667
34-0.7171020.842971-0.159379-0.470360-0.2576910.264135-1.072381-0.968364
351.2232910.842971-2.254079-0.4703600.6541390.2641350.000000-0.040349
36-0.717102-1.351934-1.206729-0.470360-1.1695210.264135-3.217142-0.040349
37-0.717102-1.351934-1.206729-0.4703600.6541390.2641350.0000000.887667
381.2232910.842971-1.206729-0.470360-1.1695210.2641350.000000-0.504356
39-0.717102-1.351934-2.254079-3.355238-2.0813510.264135-1.072381-1.896380
401.223291-0.6202990.887970-0.470360-1.1695210.264135-1.0723810.887667
41-0.7171020.8429710.8879700.9720780.6541390.2641351.072381-0.040349
42-0.7171020.8429710.8879700.9720780.6541390.2641350.0000000.887667
431.223291-0.6202990.8879700.972078-1.1695210.2641350.000000-0.040349
441.2232910.8429710.8879700.9720780.6541390.2641351.0723810.887667
45-0.717102-0.6202990.8879700.9720780.654139-3.7859390.000000-0.040349
461.223291-1.351934-0.159379-0.470360-2.0813510.2641350.000000-2.824395
47-0.7171020.1113360.887970-0.4703600.6541390.2641350.0000000.887667
\n", "
" ], "text/plain": [ " 1a 1b 1c 1d 1e 1f 2 \\\n", "0 NaN NaN NaN NaN NaN NaN NaN \n", "1 NaN NaN NaN NaN NaN NaN NaN \n", "2 -0.717102 0.842971 0.887970 0.972078 0.654139 0.264135 1.072381 \n", "3 1.223291 -1.351934 0.887970 0.972078 0.654139 0.264135 0.000000 \n", "4 -0.717102 -1.351934 -1.206729 -0.470360 -2.081351 0.264135 0.000000 \n", "5 -0.717102 -1.351934 0.887970 -0.470360 0.654139 0.264135 0.000000 \n", "6 -0.717102 -1.351934 -1.206729 -0.470360 0.654139 0.264135 0.000000 \n", "7 -0.717102 -0.620299 -0.159379 -0.470360 0.654139 0.264135 -1.072381 \n", "8 1.223291 0.842971 0.887970 0.972078 0.654139 0.264135 1.072381 \n", "9 -0.717102 -0.620299 -1.206729 0.972078 0.654139 0.264135 1.072381 \n", "10 -0.717102 -1.351934 0.887970 -1.912799 0.654139 0.264135 0.000000 \n", "11 -0.717102 0.842971 0.887970 0.972078 0.654139 0.264135 1.072381 \n", "12 -1.687298 -1.351934 -0.159379 -0.470360 -2.081351 0.264135 -3.217142 \n", "13 -0.717102 0.842971 -0.159379 -0.470360 -1.169521 0.264135 -1.072381 \n", "14 1.223291 0.842971 0.887970 0.972078 0.654139 0.264135 0.000000 \n", "15 1.223291 0.842971 -1.206729 -0.470360 0.654139 -3.785939 -1.072381 \n", "16 -1.687298 -1.351934 -1.206729 -0.470360 -2.081351 -3.785939 0.000000 \n", "17 1.223291 0.842971 0.887970 0.972078 0.654139 0.264135 1.072381 \n", "18 -0.717102 -1.351934 -0.159379 -0.470360 -1.169521 0.264135 1.072381 \n", "19 -0.717102 -1.351934 -0.159379 -0.470360 0.654139 0.264135 0.000000 \n", "20 -0.717102 0.842971 0.887970 0.972078 0.654139 0.264135 0.000000 \n", "21 -0.717102 0.842971 0.887970 0.972078 0.654139 0.264135 1.072381 \n", "22 -0.717102 0.842971 0.887970 -0.470360 0.654139 0.264135 1.072381 \n", "23 1.223291 -1.351934 -0.159379 -0.470360 -0.257691 0.264135 0.000000 \n", "24 1.223291 0.842971 0.887970 0.972078 0.654139 0.264135 1.072381 \n", "25 1.223291 0.842971 -1.206729 -0.470360 0.654139 0.264135 0.000000 \n", "26 1.223291 0.842971 0.887970 0.972078 0.654139 0.264135 0.000000 \n", "27 -0.717102 0.842971 -1.206729 -1.912799 -1.169521 0.264135 -1.072381 \n", "28 1.223291 0.842971 0.887970 0.972078 0.654139 0.264135 0.000000 \n", "29 1.223291 0.842971 -1.206729 -1.912799 0.654139 0.264135 1.072381 \n", "30 -0.717102 0.842971 -1.206729 0.972078 -1.169521 0.264135 0.000000 \n", "31 -0.717102 0.842971 -0.159379 -0.470360 0.654139 0.264135 -1.072381 \n", "32 1.223291 0.842971 0.887970 0.972078 0.654139 0.264135 1.072381 \n", "33 -0.717102 0.842971 0.887970 0.972078 0.654139 0.264135 1.072381 \n", "34 -0.717102 0.842971 -0.159379 -0.470360 -0.257691 0.264135 -1.072381 \n", "35 1.223291 0.842971 -2.254079 -0.470360 0.654139 0.264135 0.000000 \n", "36 -0.717102 -1.351934 -1.206729 -0.470360 -1.169521 0.264135 -3.217142 \n", "37 -0.717102 -1.351934 -1.206729 -0.470360 0.654139 0.264135 0.000000 \n", "38 1.223291 0.842971 -1.206729 -0.470360 -1.169521 0.264135 0.000000 \n", "39 -0.717102 -1.351934 -2.254079 -3.355238 -2.081351 0.264135 -1.072381 \n", "40 1.223291 -0.620299 0.887970 -0.470360 -1.169521 0.264135 -1.072381 \n", "41 -0.717102 0.842971 0.887970 0.972078 0.654139 0.264135 1.072381 \n", "42 -0.717102 0.842971 0.887970 0.972078 0.654139 0.264135 0.000000 \n", "43 1.223291 -0.620299 0.887970 0.972078 -1.169521 0.264135 0.000000 \n", "44 1.223291 0.842971 0.887970 0.972078 0.654139 0.264135 1.072381 \n", "45 -0.717102 -0.620299 0.887970 0.972078 0.654139 -3.785939 0.000000 \n", "46 1.223291 -1.351934 -0.159379 -0.470360 -2.081351 0.264135 0.000000 \n", "47 -0.717102 0.111336 0.887970 -0.470360 0.654139 0.264135 0.000000 \n", "\n", " 3 \n", "0 NaN \n", "1 NaN \n", "2 0.887667 \n", "3 0.887667 \n", "4 -1.432372 \n", "5 -0.968364 \n", "6 -1.432372 \n", "7 -0.968364 \n", "8 0.887667 \n", "9 0.423659 \n", "10 0.887667 \n", "11 0.887667 \n", "12 -0.968364 \n", "13 -1.432372 \n", "14 0.887667 \n", "15 -0.504356 \n", "16 -1.896380 \n", "17 0.423659 \n", "18 0.887667 \n", "19 -0.040349 \n", "20 0.887667 \n", "21 0.887667 \n", "22 0.887667 \n", "23 -0.504356 \n", "24 0.887667 \n", "25 -0.504356 \n", "26 0.887667 \n", "27 -1.432372 \n", "28 0.887667 \n", "29 -0.040349 \n", "30 0.887667 \n", "31 -0.968364 \n", "32 0.887667 \n", "33 0.887667 \n", "34 -0.968364 \n", "35 -0.040349 \n", "36 -0.040349 \n", "37 0.887667 \n", "38 -0.504356 \n", "39 -1.896380 \n", "40 0.887667 \n", "41 -0.040349 \n", "42 0.887667 \n", "43 -0.040349 \n", "44 0.887667 \n", "45 -0.040349 \n", "46 -2.824395 \n", "47 0.887667 " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[:,\"1a\":\"3\"].apply(lambda col: pd.to_numeric(col, errors=\"coerce\"), axis=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this particular dataset, a much easier strategy is just to specify during the import that we want to skip the top two rows (after the header row)." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "cell_id": "ba3ce33b-332a-4016-b1b8-be4c51bc3727", "deepnote_cell_type": "code", "deepnote_to_be_reexecuted": false, "execution_millis": 10, "execution_start": 1646673564333, "source_hash": "dec346e2", "tags": [] }, "outputs": [], "source": [ "df = pd.read_csv(\"../data/Midterm_scaled.csv\", skiprows=[1,2])" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "cell_id": "eff67d44-e50f-4dc4-8b76-d6c0fade82aa", "deepnote_cell_type": "code", "deepnote_to_be_reexecuted": false, "execution_millis": 40, "execution_start": 1646673564354, "source_hash": "c085b6ba", "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
StudentIDSIS User IDSIS Login IDSection1a1b1c1d1e1f23Column that is randomly blank.Version
0Chris Davis2323223251daviscjMATH 10 LEC A: INTRO PROG DATA SCI (44870)-0.7171020.8429710.8879700.9720780.6541390.2641351.0723810.887667NaNa
1Chris Davis2323223251daviscjMATH 10 LEC A: INTRO PROG DATA SCI (44870)1.223291-1.3519340.8879700.9720780.6541390.2641350.0000000.887667NaNNaN
2Chris Davis2323223251daviscjMATH 10 LEC A: INTRO PROG DATA SCI (44870)-0.717102-1.351934-1.206729-0.470360-2.0813510.2641350.000000-1.432372NaNa
3Chris Davis2323223251daviscjMATH 10 LEC A: INTRO PROG DATA SCI (44870)-0.717102-1.3519340.887970-0.4703600.6541390.2641350.000000-0.968364NaNNaN
4Chris Davis2323223251daviscjMATH 10 LEC A: INTRO PROG DATA SCI (44870)-0.717102-1.351934-1.206729-0.4703600.6541390.2641350.000000-1.432372NaNNaN
\n", "
" ], "text/plain": [ " Student ID SIS User ID SIS Login ID \\\n", "0 Chris Davis 23232 23251 daviscj \n", "1 Chris Davis 23232 23251 daviscj \n", "2 Chris Davis 23232 23251 daviscj \n", "3 Chris Davis 23232 23251 daviscj \n", "4 Chris Davis 23232 23251 daviscj \n", "\n", " Section 1a 1b 1c \\\n", "0 MATH 10 LEC A: INTRO PROG DATA SCI (44870) -0.717102 0.842971 0.887970 \n", "1 MATH 10 LEC A: INTRO PROG DATA SCI (44870) 1.223291 -1.351934 0.887970 \n", "2 MATH 10 LEC A: INTRO PROG DATA SCI (44870) -0.717102 -1.351934 -1.206729 \n", "3 MATH 10 LEC A: INTRO PROG DATA SCI (44870) -0.717102 -1.351934 0.887970 \n", "4 MATH 10 LEC A: INTRO PROG DATA SCI (44870) -0.717102 -1.351934 -1.206729 \n", "\n", " 1d 1e 1f 2 3 \\\n", "0 0.972078 0.654139 0.264135 1.072381 0.887667 \n", "1 0.972078 0.654139 0.264135 0.000000 0.887667 \n", "2 -0.470360 -2.081351 0.264135 0.000000 -1.432372 \n", "3 -0.470360 0.654139 0.264135 0.000000 -0.968364 \n", "4 -0.470360 0.654139 0.264135 0.000000 -1.432372 \n", "\n", " Column that is randomly blank. Version \n", "0 NaN a \n", "1 NaN NaN \n", "2 NaN a \n", "3 NaN NaN \n", "4 NaN NaN " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The original dataset only specifies who had version \"a\", not version \"b\". Let's fill in version \"b\" in place of the `NaN` values." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "cell_id": "fff5a8eb-2bcc-4aa3-a213-794d1063e298", "deepnote_cell_type": "code", "deepnote_to_be_reexecuted": false, "execution_millis": 3, "execution_start": 1646673564495, "source_hash": "5d4ebd94", "tags": [] }, "outputs": [], "source": [ "df[\"Version\"] = df[\"Version\"].fillna(\"b\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are some examples using `df.groupby`. There are more examples below in the Worksheet portion." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This shows all possible combinations of (scaled) scores on problems \"1a\" and \"1b\". For example, this shows that 3 students scored -0.717101719 on 1a and -0.62029917 on 1b." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The value of x is (-1.687298162, -1.351934088)\n", "(2, 15)\n", "\n", "The value of x is (-0.717101719, -1.351934088)\n", "(9, 15)\n", "\n", "The value of x is (-0.717101719, -0.62029917)\n", "(3, 15)\n", "\n", "The value of x is (-0.717101719, 0.111335748)\n", "(1, 15)\n", "\n", "The value of x is (-0.717101719, 0.842970667)\n", "(13, 15)\n", "\n", "The value of x is (1.223291168, -1.351934088)\n", "(3, 15)\n", "\n", "The value of x is (1.223291168, -0.62029917)\n", "(2, 15)\n", "\n", "The value of x is (1.223291168, 0.842970667)\n", "(13, 15)\n", "\n" ] } ], "source": [ "for x,y in df.groupby([\"1a\",\"1b\"]):\n", " print(f\"The value of x is {x}\")\n", " print(y.shape)\n", " print(\"\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For our question of whether one version was easier than the other version, we are interested in grouping by \"Version\". " ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "cell_id": "f39c15ad-f4cf-4bf4-83e7-2c9b48208045", "deepnote_cell_type": "code", "deepnote_to_be_reexecuted": false, "execution_millis": 13, "execution_start": 1646673564499, "source_hash": "4f3b753a", "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The value of x is a\n", "(21, 15)\n", "The value of x is b\n", "(25, 15)\n" ] } ], "source": [ "for x,y in df.groupby(\"Version\"):\n", " print(f\"The value of x is {x}\")\n", " print(y.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this code, `y` is a DataFrame. Notice for example how in the first displayed DataFrame, the exams are all Version a." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "cell_id": "75512172-322a-4975-89b9-96abe52cbfa0", "deepnote_cell_type": "code", "deepnote_to_be_reexecuted": false, "execution_millis": 40, "execution_start": 1646673564504, "source_hash": "a25be741", "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The value of x is a\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
StudentIDSIS User IDSIS Login IDSection1a1b1c1d1e1f23Column that is randomly blank.Version
0Chris Davis2323223251daviscjMATH 10 LEC A: INTRO PROG DATA SCI (44870)-0.7171020.8429710.8879700.9720780.6541390.2641351.0723810.887667NaNa
2Chris Davis2323223251daviscjMATH 10 LEC A: INTRO PROG DATA SCI (44870)-0.717102-1.351934-1.206729-0.470360-2.0813510.2641350.000000-1.432372NaNa
5Chris Davis2323223251daviscjMATH 10 LEC A: INTRO PROG DATA SCI (44870)-0.717102-0.620299-0.159379-0.4703600.6541390.264135-1.072381-0.968364NaNa
11Chris Davis2323223251daviscjMATH 10 LEC A: INTRO PROG DATA SCI (44870)-0.7171020.842971-0.159379-0.470360-1.1695210.264135-1.072381-1.432372NaNa
14Chris Davis2323223251daviscjMATH 10 LEC A: INTRO PROG DATA SCI (44870)-1.687298-1.351934-1.206729-0.470360-2.081351-3.7859390.000000-1.896380NaNa
\n", "
" ], "text/plain": [ " Student ID SIS User ID SIS Login ID \\\n", "0 Chris Davis 23232 23251 daviscj \n", "2 Chris Davis 23232 23251 daviscj \n", "5 Chris Davis 23232 23251 daviscj \n", "11 Chris Davis 23232 23251 daviscj \n", "14 Chris Davis 23232 23251 daviscj \n", "\n", " Section 1a 1b 1c \\\n", "0 MATH 10 LEC A: INTRO PROG DATA SCI (44870) -0.717102 0.842971 0.887970 \n", "2 MATH 10 LEC A: INTRO PROG DATA SCI (44870) -0.717102 -1.351934 -1.206729 \n", "5 MATH 10 LEC A: INTRO PROG DATA SCI (44870) -0.717102 -0.620299 -0.159379 \n", "11 MATH 10 LEC A: INTRO PROG DATA SCI (44870) -0.717102 0.842971 -0.159379 \n", "14 MATH 10 LEC A: INTRO PROG DATA SCI (44870) -1.687298 -1.351934 -1.206729 \n", "\n", " 1d 1e 1f 2 3 \\\n", "0 0.972078 0.654139 0.264135 1.072381 0.887667 \n", "2 -0.470360 -2.081351 0.264135 0.000000 -1.432372 \n", "5 -0.470360 0.654139 0.264135 -1.072381 -0.968364 \n", "11 -0.470360 -1.169521 0.264135 -1.072381 -1.432372 \n", "14 -0.470360 -2.081351 -3.785939 0.000000 -1.896380 \n", "\n", " Column that is randomly blank. Version \n", "0 NaN a \n", "2 NaN a \n", "5 NaN a \n", "11 NaN a \n", "14 NaN a " ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "The value of x is b\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
StudentIDSIS User IDSIS Login IDSection1a1b1c1d1e1f23Column that is randomly blank.Version
1Chris Davis2323223251daviscjMATH 10 LEC A: INTRO PROG DATA SCI (44870)1.223291-1.3519340.8879700.9720780.6541390.2641350.0000000.887667NaNb
3Chris Davis2323223251daviscjMATH 10 LEC A: INTRO PROG DATA SCI (44870)-0.717102-1.3519340.887970-0.4703600.6541390.2641350.000000-0.968364NaNb
4Chris Davis2323223251daviscjMATH 10 LEC A: INTRO PROG DATA SCI (44870)-0.717102-1.351934-1.206729-0.4703600.6541390.2641350.000000-1.432372NaNb
6Chris Davis2323223251daviscjMATH 10 LEC A: INTRO PROG DATA SCI (44870)1.2232910.8429710.8879700.9720780.6541390.2641351.0723810.887667NaNb
7Chris Davis2323223251daviscjMATH 10 LEC A: INTRO PROG DATA SCI (44870)-0.717102-0.620299-1.2067290.9720780.6541390.2641351.0723810.423659NaNb
\n", "
" ], "text/plain": [ " Student ID SIS User ID SIS Login ID \\\n", "1 Chris Davis 23232 23251 daviscj \n", "3 Chris Davis 23232 23251 daviscj \n", "4 Chris Davis 23232 23251 daviscj \n", "6 Chris Davis 23232 23251 daviscj \n", "7 Chris Davis 23232 23251 daviscj \n", "\n", " Section 1a 1b 1c \\\n", "1 MATH 10 LEC A: INTRO PROG DATA SCI (44870) 1.223291 -1.351934 0.887970 \n", "3 MATH 10 LEC A: INTRO PROG DATA SCI (44870) -0.717102 -1.351934 0.887970 \n", "4 MATH 10 LEC A: INTRO PROG DATA SCI (44870) -0.717102 -1.351934 -1.206729 \n", "6 MATH 10 LEC A: INTRO PROG DATA SCI (44870) 1.223291 0.842971 0.887970 \n", "7 MATH 10 LEC A: INTRO PROG DATA SCI (44870) -0.717102 -0.620299 -1.206729 \n", "\n", " 1d 1e 1f 2 3 \\\n", "1 0.972078 0.654139 0.264135 0.000000 0.887667 \n", "3 -0.470360 0.654139 0.264135 0.000000 -0.968364 \n", "4 -0.470360 0.654139 0.264135 0.000000 -1.432372 \n", "6 0.972078 0.654139 0.264135 1.072381 0.887667 \n", "7 0.972078 0.654139 0.264135 1.072381 0.423659 \n", "\n", " Column that is randomly blank. Version \n", "1 NaN b \n", "3 NaN b \n", "4 NaN b \n", "6 NaN b \n", "7 NaN b " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "for x,y in df.groupby(\"Version\"):\n", " print(f\"The value of x is {x}\")\n", " display(y.head())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Instead of iterating over the different possibilities, we can also perform what is called an aggregation operation, such as taking the `mean`." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IDSIS User ID1a1b1c1d1e1f23Column that is randomly blank.
Version
a23232.023251.0-0.0241040.355214-0.109506-0.126923-0.0840090.0712750.153197-0.018253NaN
b23232.023251.00.020248-0.2983800.0919850.1066150.070568-0.059871-0.1286860.015332NaN
\n", "
" ], "text/plain": [ " ID SIS User ID 1a 1b 1c 1d \\\n", "Version \n", "a 23232.0 23251.0 -0.024104 0.355214 -0.109506 -0.126923 \n", "b 23232.0 23251.0 0.020248 -0.298380 0.091985 0.106615 \n", "\n", " 1e 1f 2 3 \\\n", "Version \n", "a -0.084009 0.071275 0.153197 -0.018253 \n", "b 0.070568 -0.059871 -0.128686 0.015332 \n", "\n", " Column that is randomly blank. \n", "Version \n", "a NaN \n", "b NaN " ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby(\"Version\").mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's a little easier to read if we take the transpose." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Versionab
ID23232.00000023232.000000
SIS User ID23251.00000023251.000000
1a-0.0241040.020248
1b0.355214-0.298380
1c-0.1095060.091985
1d-0.1269230.106615
1e-0.0840090.070568
1f0.071275-0.059871
20.153197-0.128686
3-0.0182530.015332
Column that is randomly blank.NaNNaN
\n", "
" ], "text/plain": [ "Version a b\n", "ID 23232.000000 23232.000000\n", "SIS User ID 23251.000000 23251.000000\n", "1a -0.024104 0.020248\n", "1b 0.355214 -0.298380\n", "1c -0.109506 0.091985\n", "1d -0.126923 0.106615\n", "1e -0.084009 0.070568\n", "1f 0.071275 -0.059871\n", "2 0.153197 -0.128686\n", "3 -0.018253 0.015332\n", "Column that is randomly blank. NaN NaN" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby(\"Version\").mean().T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also apply formatting to these strings, by saying we only want three decimal places." ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "cell_id": "70989629-1fe1-4cd2-ac04-af1c2904880f", "deepnote_cell_type": "code", "deepnote_to_be_reexecuted": false, "execution_millis": 4, "execution_start": 1646673564555, "source_hash": "7e49bc2", "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Versionab
ID23232.00023232.000
SIS User ID23251.00023251.000
1a-0.0240.020
1b0.355-0.298
1c-0.1100.092
1d-0.1270.107
1e-0.0840.071
1f0.071-0.060
20.153-0.129
3-0.0180.015
Column that is randomly blank.nannan
\n", "
" ], "text/plain": [ "Version a b\n", "ID 23232.000 23232.000\n", "SIS User ID 23251.000 23251.000\n", "1a -0.024 0.020\n", "1b 0.355 -0.298\n", "1c -0.110 0.092\n", "1d -0.127 0.107\n", "1e -0.084 0.071\n", "1f 0.071 -0.060\n", "2 0.153 -0.129\n", "3 -0.018 0.015\n", "Column that is randomly blank. nan nan" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby(\"Version\").mean().T.applymap(lambda x: f\"{x:.3f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice how the 1b value is significantly higher in the \"a\" column than in the \"b\" column. This is the reason that the \"b\" version of the exam was curved one point more than the \"a\" version." ] }, { "cell_type": "markdown", "metadata": { "cell_id": "4d07e3ca-2c94-4d1d-8c11-2382edd100d2", "deepnote_cell_type": "markdown", "tags": [] }, "source": [ "## Worksheet\n", "\n", "(This worksheet contains some repetition from the portion above.)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "cell_id": "ed82ba1a-4834-4b57-9d4e-1e8de8eafeb2", "deepnote_cell_type": "code", "deepnote_to_be_reexecuted": false, "execution_millis": 147, "execution_start": 1646673564601, "source_hash": "c13fed2b", "tags": [] }, "outputs": [], "source": [ "df = sns.load_dataset(\"taxis\")\n", "df.dropna(inplace=True)" ] }, { "cell_type": "markdown", "metadata": { "cell_id": "cb70799a-ebd6-43b4-af4f-eb17a16dafc7", "deepnote_cell_type": "markdown", "tags": [] }, "source": [ "## Practice with pandas groupby\n", "\n", "We haven't covered pandas groupby in Math 10 before today. This is a possible \"extra topic\" for the course project.\n", "\n", "Here is an example using `groupby`. We also use f-strings." ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "cell_id": "33fe5fc3-8e15-474e-8370-ee42638f2fbb", "deepnote_cell_type": "code", "deepnote_to_be_reexecuted": false, "execution_millis": 12, "execution_start": 1646673564772, "source_hash": "848e885f", "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a is Allerton/Pelham Gardens\n", "The type of b is \n", "The shape of b is (2, 14)\n" ] } ], "source": [ "for a,b in df.groupby(\"pickup_zone\"):\n", " print(f\"a is {a}\")\n", " print(f\"The type of b is {type(b)}\")\n", " print(f\"The shape of b is {b.shape}\")\n", " break" ] }, { "cell_type": "markdown", "metadata": { "cell_id": "c12fe168-9097-4e03-bdc4-3a674b5390e8", "deepnote_cell_type": "markdown", "tags": [] }, "source": [ "If we instead wanted to get the first 5 values, we could do something like the following. For example, this indicates that 65 rides began in the pickup zone \"Astoria\"." ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "cell_id": "a3daf81e-4c94-4231-8133-89b8982c676c", "deepnote_cell_type": "code", "deepnote_to_be_reexecuted": false, "execution_millis": 11, "execution_start": 1646673564773, "source_hash": "65003594", "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a is Allerton/Pelham Gardens\n", "The type of b is \n", "The shape of b is (2, 14)\n", "\n", "a is Alphabet City\n", "The type of b is \n", "The shape of b is (9, 14)\n", "\n", "a is Astoria\n", "The type of b is \n", "The shape of b is (65, 14)\n", "\n", "a is Auburndale\n", "The type of b is \n", "The shape of b is (1, 14)\n", "\n", "a is Battery Park\n", "The type of b is \n", "The shape of b is (1, 14)\n", "\n" ] } ], "source": [ "counter = 0\n", "\n", "for a,b in df.groupby(\"pickup_zone\"):\n", " print(f\"a is {a}\")\n", " print(f\"The type of b is {type(b)}\")\n", " print(f\"The shape of b is {b.shape}\")\n", " print(\"\")\n", " counter += 1\n", "\n", " if counter >= 5:\n", " break\n" ] }, { "cell_type": "markdown", "metadata": { "cell_id": "5b3adf1c-a957-41a5-9f2b-f34a8d8dc416", "deepnote_cell_type": "markdown", "tags": [] }, "source": [ "You can also group by multiple categories. For example, the following indicates that only 4 rides in the dataset began in Bronx and finished in Brooklyn." ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "cell_id": "8aa3c0b8-6d78-4770-b536-72b3ed7e4fa8", "deepnote_cell_type": "code", "deepnote_to_be_reexecuted": false, "execution_millis": 17, "execution_start": 1646673564774, "source_hash": "5f3c2c2a", "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a is ('Bronx', 'Bronx')\n", "The type of b is \n", "The shape of b is (66, 14)\n", "\n", "a is ('Bronx', 'Brooklyn')\n", "The type of b is \n", "The shape of b is (4, 14)\n", "\n", "a is ('Bronx', 'Manhattan')\n", "The type of b is \n", "The shape of b is (25, 14)\n", "\n", "a is ('Bronx', 'Queens')\n", "The type of b is \n", "The shape of b is (4, 14)\n", "\n", "a is ('Brooklyn', 'Bronx')\n", "The type of b is \n", "The shape of b is (5, 14)\n", "\n" ] } ], "source": [ "counter = 0\n", "\n", "for a,b in df.groupby([\"pickup_borough\",\"dropoff_borough\"]):\n", " print(f\"a is {a}\")\n", " print(f\"The type of b is {type(b)}\")\n", " print(f\"The shape of b is {b.shape}\")\n", " print(\"\")\n", " counter += 1\n", "\n", " if counter >= 5:\n", " break" ] }, { "cell_type": "markdown", "metadata": { "cell_id": "a29fa746-705f-450f-b536-95d25bd0f95b", "deepnote_cell_type": "markdown", "tags": [] }, "source": [ "Sample exercises:\n", "\n", "1. For each pickup borough, using f-strings, print the average tip for rides that begin in that borough. \n", "\n", "2. Try producing a sub-DataFrame, `df_sub`, which contains only the \"distance\", \"fare\", \"tip\", and \"pickup_zone\" columns, and which contains only rows where the \"tip\" amount is greater than zero. Then execute `df_sub.groupby(\"pickup_zone\").mean()`. What information is this providing?\n", "\n", "3. Do the same thing as in the previous exercise, but instead find what the maximum was in each category, instead of the average." ] }, { "cell_type": "markdown", "metadata": { "cell_id": "8d68a4b7-63a8-482d-b5c6-2e3fc674b377", "deepnote_cell_type": "markdown", "tags": [] }, "source": [ "## Practice with pandas styler\n", "\n", "We haven't covered pandas styler in Math 10. This is a possible \"extra topic\" for the course project.\n", "\n", "Based on the [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/style.html#Styler-Functions).\n", "\n", "As an example, we will color the cells blue for which the \"pickup_zone\" or \"dropoff_zone\" contains the word \"Midtown\"." ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "cell_id": "5fae2fd5-0014-4ebb-b53f-78921ce09964", "deepnote_cell_type": "code", "deepnote_to_be_reexecuted": false, "execution_millis": 5, "execution_start": 1646673564786, "source_hash": "5388c2f6", "tags": [] }, "outputs": [], "source": [ "def make_blue(x):\n", " if \"Midtown\" in x:\n", " return 'color:white;background-color:darkblue'\n", " else:\n", " return None" ] }, { "cell_type": "markdown", "metadata": { "cell_id": "d4ce90c8-5cab-46f6-ab65-4cc08da767d7", "deepnote_cell_type": "markdown", "tags": [] }, "source": [ "You will have to scroll right to see the blue cells. We only display the first 20 rows." ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "cell_id": "7a8b451d-142f-4676-a5c2-cf5bcffa07e2", "deepnote_cell_type": "code", "deepnote_to_be_reexecuted": false, "execution_millis": 2211, "execution_start": 1646673564792, "source_hash": "13b6aa31", "tags": [] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 pickupdropoffpassengersdistancefaretiptollstotalcolorpaymentpickup_zonedropoff_zonepickup_boroughdropoff_borough
02019-03-23 20:21:092019-03-23 20:27:2411.6000007.0000002.1500000.00000012.950000yellowcredit cardLenox Hill WestUN/Turtle Bay SouthManhattanManhattan
12019-03-04 16:11:552019-03-04 16:19:0010.7900005.0000000.0000000.0000009.300000yellowcashUpper West Side SouthUpper West Side SouthManhattanManhattan
22019-03-27 17:53:012019-03-27 18:00:2511.3700007.5000002.3600000.00000014.160000yellowcredit cardAlphabet CityWest VillageManhattanManhattan
32019-03-10 01:23:592019-03-10 01:49:5117.70000027.0000006.1500000.00000036.950000yellowcredit cardHudson SqYorkville WestManhattanManhattan
42019-03-30 13:27:422019-03-30 13:37:1432.1600009.0000001.1000000.00000013.400000yellowcredit cardMidtown EastYorkville WestManhattanManhattan
52019-03-11 10:37:232019-03-11 10:47:3110.4900007.5000002.1600000.00000012.960000yellowcredit cardTimes Sq/Theatre DistrictMidtown EastManhattanManhattan
62019-03-26 21:07:312019-03-26 21:17:2913.65000013.0000002.0000000.00000018.800000yellowcredit cardBattery Park CityTwo Bridges/Seward ParkManhattanManhattan
82019-03-23 11:48:502019-03-23 12:06:1413.63000015.0000001.0000000.00000019.300000yellowcredit cardEast Harlem SouthMidtown CenterManhattanManhattan
92019-03-08 16:18:372019-03-08 16:26:5711.5200008.0000001.0000000.00000013.300000yellowcredit cardLincoln Square EastCentral ParkManhattanManhattan
102019-03-16 10:02:252019-03-16 10:22:2913.90000017.0000000.0000000.00000017.800000yellowcashLaGuardia AirportAstoriaQueensQueens
112019-03-20 19:39:422019-03-20 19:45:3611.5300006.5000002.1600000.00000012.960000yellowcredit cardUpper West Side SouthManhattan ValleyManhattanManhattan
122019-03-18 21:27:142019-03-18 21:34:1611.0500006.5000001.0000000.00000011.300000yellowcredit cardMurray HillMidtown CenterManhattanManhattan
132019-03-19 07:55:252019-03-19 08:09:1711.75000010.5000000.0000000.00000013.800000yellowcashLincoln Square WestTimes Sq/Theatre DistrictManhattanManhattan
142019-03-27 12:13:342019-03-27 12:25:4802.90000011.5000000.0000000.00000014.800000yellowcashFinancial District NorthTwo Bridges/Seward ParkManhattanManhattan
152019-03-16 17:53:342019-03-16 18:13:5732.09000013.5000000.0000000.00000016.800000yellowcashUpper West Side NorthClinton EastManhattanManhattan
162019-03-15 12:36:052019-03-15 12:54:2812.12000013.0000000.0000000.00000016.300000yellowcashEast ChelseaMeatpacking/West Village WestManhattanManhattan
172019-03-23 20:50:492019-03-23 21:02:0712.60000010.5000002.0000000.00000016.300000yellowcredit cardMidtown CenterEast Harlem SouthManhattanManhattan
182019-03-27 06:28:362019-03-27 06:38:1012.1800009.5000001.9200000.00000014.720000yellowcredit cardGramercyMidtown CenterManhattanManhattan
192019-03-25 22:04:252019-03-25 22:11:3061.0800006.5000001.0800000.00000011.380000yellowcredit cardEast ChelseaEast ChelseaManhattanManhattan
202019-03-21 03:37:342019-03-21 03:44:1311.0700006.5000001.5400000.00000011.840000yellowcredit cardPenn Station/Madison Sq WestKips BayManhattanManhattan
\n" ], "text/plain": [ "" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[:20].style.applymap(make_blue,subset=[\"pickup_zone\",\"dropoff_zone\"])" ] }, { "cell_type": "markdown", "metadata": { "cell_id": "bbbd7bf6-c70c-4fd7-8180-a0184f7c470f", "deepnote_cell_type": "markdown", "tags": [] }, "source": [ "Here is a similar example, but where we color every cell in the row a random color. Notice the use of f-strings." ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "cell_id": "de9657ba-adca-42e6-90f7-453017dd2092", "deepnote_cell_type": "code", "deepnote_to_be_reexecuted": false, "execution_millis": 59, "execution_start": 1646673567522, "source_hash": "11e01a88", "tags": [] }, "outputs": [], "source": [ "rng = np.random.default_rng()\n", "color_list = [\"red\",\"purple\",\"orange\",\"wheat\",\"black\",\"blue\"]\n", "prop_list = [f'color:white;background-color:{c}' for c in color_list]\n", "\n", "def find_midtown(row):\n", " if (\"Midtown\" in row[\"dropoff_zone\"]) or (\"Midtown\" in row[\"pickup_zone\"]):\n", " return rng.choice(prop_list, size=len(row))\n", " else:\n", " return [None]*len(row)" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "cell_id": "ef246260-5ba8-488f-8926-a03acb86dfba", "deepnote_cell_type": "code", "deepnote_to_be_reexecuted": false, "execution_millis": 2269, "execution_start": 1646673567527, "output_cleared": true, "source_hash": "870b551a", "tags": [] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 pickupdropoffpassengersdistancefaretiptollstotalcolorpaymentpickup_zonedropoff_zonepickup_boroughdropoff_borough
02019-03-23 20:21:092019-03-23 20:27:2411.6000007.0000002.1500000.00000012.950000yellowcredit cardLenox Hill WestUN/Turtle Bay SouthManhattanManhattan
12019-03-04 16:11:552019-03-04 16:19:0010.7900005.0000000.0000000.0000009.300000yellowcashUpper West Side SouthUpper West Side SouthManhattanManhattan
22019-03-27 17:53:012019-03-27 18:00:2511.3700007.5000002.3600000.00000014.160000yellowcredit cardAlphabet CityWest VillageManhattanManhattan
32019-03-10 01:23:592019-03-10 01:49:5117.70000027.0000006.1500000.00000036.950000yellowcredit cardHudson SqYorkville WestManhattanManhattan
42019-03-30 13:27:422019-03-30 13:37:1432.1600009.0000001.1000000.00000013.400000yellowcredit cardMidtown EastYorkville WestManhattanManhattan
52019-03-11 10:37:232019-03-11 10:47:3110.4900007.5000002.1600000.00000012.960000yellowcredit cardTimes Sq/Theatre DistrictMidtown EastManhattanManhattan
62019-03-26 21:07:312019-03-26 21:17:2913.65000013.0000002.0000000.00000018.800000yellowcredit cardBattery Park CityTwo Bridges/Seward ParkManhattanManhattan
82019-03-23 11:48:502019-03-23 12:06:1413.63000015.0000001.0000000.00000019.300000yellowcredit cardEast Harlem SouthMidtown CenterManhattanManhattan
92019-03-08 16:18:372019-03-08 16:26:5711.5200008.0000001.0000000.00000013.300000yellowcredit cardLincoln Square EastCentral ParkManhattanManhattan
102019-03-16 10:02:252019-03-16 10:22:2913.90000017.0000000.0000000.00000017.800000yellowcashLaGuardia AirportAstoriaQueensQueens
112019-03-20 19:39:422019-03-20 19:45:3611.5300006.5000002.1600000.00000012.960000yellowcredit cardUpper West Side SouthManhattan ValleyManhattanManhattan
122019-03-18 21:27:142019-03-18 21:34:1611.0500006.5000001.0000000.00000011.300000yellowcredit cardMurray HillMidtown CenterManhattanManhattan
132019-03-19 07:55:252019-03-19 08:09:1711.75000010.5000000.0000000.00000013.800000yellowcashLincoln Square WestTimes Sq/Theatre DistrictManhattanManhattan
142019-03-27 12:13:342019-03-27 12:25:4802.90000011.5000000.0000000.00000014.800000yellowcashFinancial District NorthTwo Bridges/Seward ParkManhattanManhattan
152019-03-16 17:53:342019-03-16 18:13:5732.09000013.5000000.0000000.00000016.800000yellowcashUpper West Side NorthClinton EastManhattanManhattan
162019-03-15 12:36:052019-03-15 12:54:2812.12000013.0000000.0000000.00000016.300000yellowcashEast ChelseaMeatpacking/West Village WestManhattanManhattan
172019-03-23 20:50:492019-03-23 21:02:0712.60000010.5000002.0000000.00000016.300000yellowcredit cardMidtown CenterEast Harlem SouthManhattanManhattan
182019-03-27 06:28:362019-03-27 06:38:1012.1800009.5000001.9200000.00000014.720000yellowcredit cardGramercyMidtown CenterManhattanManhattan
192019-03-25 22:04:252019-03-25 22:11:3061.0800006.5000001.0800000.00000011.380000yellowcredit cardEast ChelseaEast ChelseaManhattanManhattan
202019-03-21 03:37:342019-03-21 03:44:1311.0700006.5000001.5400000.00000011.840000yellowcredit cardPenn Station/Madison Sq WestKips BayManhattanManhattan
\n" ], "text/plain": [ "" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[:20].style.apply(find_midtown,axis=1)" ] }, { "cell_type": "markdown", "metadata": { "cell_id": "d69bd3f9-803d-474a-89d4-f85fc332c977", "deepnote_cell_type": "markdown", "tags": [] }, "source": [ "pandas styler sample exercises:\n", "\n", "1. Try changing the text color to red on all rides where the fare was at least 10 dollars.\n", "\n", "2. For all cells where the pickup time is between 11pm and midnight, try giving those cells a black background with white text.\n", "\n", "3. For how many rides was the tip amount greater than 40% of the fare? Try coloring the entire row for those rides in red." ] } ], "metadata": { "deepnote": { "is_reactive": false }, "deepnote_execution_queue": [], "deepnote_notebook_id": "12e1f49e-fe81-4953-ab05-c27d9551969f", "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.12" } }, "nbformat": 4, "nbformat_minor": 4 }