Homework 3¶
Author: BLANK
Collaborators: BLANK
Question 1¶
Import the attached “indexInfo.csv” file.
Use the rename method (documentation) to rename the “Index” column as “Abbreviation” and the “Exchange” column as “Full Name”. You will have to specify the keyword argument axis=1, which indicates that we are changing names in the columns axis and not in the rows axis.
Use the set_index method (documentation) to set the “Abbreviation” column as the new index of this DataFrame.
Give the resulting DataFrame the variable name df_info.
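The rename and set_index steps can be sketched on a toy DataFrame. This is only an illustration: toy_info and df_info_toy are hypothetical names, and the two rows are typed in by hand rather than read from “indexInfo.csv”.

```python
import pandas as pd

# Hand-typed stand-in for the imported file (hypothetical, two rows only).
toy_info = pd.DataFrame({
    "Index": ["NYA", "N225"],
    "Exchange": ["New York Stock Exchange", "Tokyo Stock Exchange"],
})

# axis=1 tells rename that the dictionary keys are column labels, not row labels.
renamed = toy_info.rename({"Index": "Abbreviation", "Exchange": "Full Name"}, axis=1)

# set_index returns a new DataFrame whose row labels are the abbreviations.
df_info_toy = renamed.set_index("Abbreviation")
print(df_info_toy)
```

Both methods return new DataFrames by default, so the result has to be assigned to a variable.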
Question 2¶
Write a function fullname which takes as input an abbreviation string like “N225” and as output returns the full name, like “Tokyo Stock Exchange”. As another example, fullname("NYA") should be equal to “New York Stock Exchange”. (Hint. Figure out how to use the pandas Series df_info["Full Name"]. This question will be much easier because you set the index in the previous part.)
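As a rough illustration of the hint, a pandas Series whose index holds the abbreviations can be queried by label. The names full_names and lookup_full_name below are hypothetical, and the Series is built by hand instead of being taken from df_info.

```python
import pandas as pd

# Hand-built stand-in for df_info["Full Name"] (hypothetical, two entries only).
full_names = pd.Series(
    ["New York Stock Exchange", "Tokyo Stock Exchange"],
    index=pd.Index(["NYA", "N225"], name="Abbreviation"),
    name="Full Name",
)

def lookup_full_name(abbreviation):
    # .loc selects by index label, so the abbreviation acts like a dictionary key.
    return full_names.loc[abbreviation]

print(lookup_full_name("N225"))
```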
Question 3¶
Load the attached “indexData.csv” file. Save the result with the variable name df_pre.
Add a new column named “Full Name” to df_pre which contains the full name of each stock exchange. (Hint. Use map and the function you wrote above.)
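The map hint can be illustrated on toy data. In this sketch, toy_pre and lookup_full_name are hypothetical names, the “Close” values are made up, and the assumption that the abbreviation column is called “Index” is not confirmed by the text above.

```python
import pandas as pd

# Hand-built lookup Series (hypothetical stand-in for df_info["Full Name"]).
full_names = pd.Series(
    ["New York Stock Exchange", "Tokyo Stock Exchange"],
    index=["NYA", "N225"],
)

def lookup_full_name(abbreviation):
    return full_names.loc[abbreviation]

# Toy stand-in for df_pre; the column name "Index" is an assumption.
toy_pre = pd.DataFrame({"Index": ["NYA", "N225", "NYA"], "Close": [1.0, 2.0, 3.0]})

# Series.map calls the function once per entry and returns a new Series.
toy_pre["Full Name"] = toy_pre["Index"].map(lookup_full_name)
print(toy_pre)
```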
Question 4¶
Here are two ways to drop rows with missing data from a DataFrame.
Use the code df_pre.dropna().
Use our “usual” method, involving Boolean indexing, isna, any, and axis.
Time both of these separately using %%timeit. Write a markdown cell saying how the speeds compare.
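The two approaches can be compared on a toy DataFrame. This sketch assumes the “usual” method means negating a row-wise isna().any(axis=1) mask, and it uses the standard-library timeit module in place of the %%timeit notebook magic so that it also runs outside a notebook. The name toy and its values are hypothetical.

```python
import timeit

import pandas as pd

# Toy data with missing entries (None becomes NaN in a float column).
toy = pd.DataFrame({"Open": [1.0, None, 3.0], "Close": [1.5, 2.5, None]})

# Way 1: the built-in method.
way_a = toy.dropna()

# Way 2: Boolean indexing. any(axis=1) is True for rows with at least one
# missing value, and ~ flips the mask so that only complete rows are kept.
way_b = toy[~toy.isna().any(axis=1)]

print(way_a.equals(way_b))

# Rough timings in seconds for 20 repetitions each; values vary by machine.
print(timeit.timeit(lambda: toy.dropna(), number=20))
print(timeit.timeit(lambda: toy[~toy.isna().any(axis=1)], number=20))
```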
Question 5¶
Using whichever method you prefer, make a new DataFrame with the variable name df which has the same data as df_pre but with the rows with missing data dropped. (Consider adding .copy() at the end of your code, especially if warnings show up later.)
Check your answer. The shape of df should be the tuple (110253, 9).
Question 6¶
The “Date” column contains strings like “12/31/65”, which means “12/31/1965”, as well as strings like “6/2/21”, which means “6/2/2021”.
Write a function fix_date which takes as input strings like “12/31/65” and “6/2/21”, and as output returns strings like, for example, “12/31/1965” and “6/2/2021”. (Hint. You can slice a string just like you can slice a list. Notice that there are no dates in the “Date” column from before 1960, so if something like “1/1/10” appears, you can be sure that it stands for 2010 and not 1910.)
Try to make the code for your fix_date function DRY, including as little repetition as possible.
Use map to apply fix_date to all of the entries in the “Date” column. Name the resulting pandas Series temp_series.
Insert a new column into df, named “Date2”, which contains the temp_series values converted to pandas datetime objects using pd.to_datetime. (Be sure to apply pd.to_datetime to the whole Series, not to the individual entries. There is no need to use map in this portion.)
Check your work. If you evaluate df.dtypes, you should see that the “Date” column has “object” as its dtype and that the “Date2” column has “datetime64[ns]” (or something similar) as its dtype.
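One possible shape for the string-slicing step is sketched below. The name fix_date_sketch is hypothetical, and the cutoff of 60 is an assumption based on the remark that no dates fall before 1960. It is one way to follow the slicing hint, not the only one.

```python
def fix_date_sketch(date_string):
    # The last two characters are the two-digit year; everything before them,
    # including the final slash, is kept unchanged.
    two_digit_year = date_string[-2:]
    # Assumption: years 60 through 99 belong to the 1900s, 00 through 59 to the 2000s.
    century = "19" if int(two_digit_year) >= 60 else "20"
    return date_string[:-2] + century + two_digit_year

for example in ["12/31/65", "6/2/21", "1/1/10"]:
    print(example, "->", fix_date_sketch(example))
```

Choosing the century in a single expression and building the result once keeps the function DRY, since the slicing and concatenation are not repeated in separate branches.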
Question 7¶
Define a DataFrame df_sub that contains only rows from df that occurred on a Tuesday in the year 2000 or later. (More details: If s is a pandas Series of datetime objects, you can use s.dt.year to access the year and you can use s.dt.day_name() to access the day name. I’m not sure why one requires no parentheses and the other requires parentheses.)
Check your work: The shape of df_sub should be (14063, 10), and the first date in df_sub should be from January 4th, 2000.
Question 8¶
By default, Altair can only be used with DataFrames with 5000 rows or fewer. Replace 5000 with 15000 by using the following code.
alt.data_transformers.enable('default', max_rows=15000)
Define an Altair selection_interval using the following code.
brush = alt.selection_interval(encodings=["x"])
The portion encodings=["x"] specifies that the selection can only be made along the x-axis.
Question 9¶
Make an Altair line chart c using df_sub for the data, using the “Date2” column for the x-axis, using “Close” for the y-axis, and using “Full Name” for the color. For the color scheme (see the Altair documentation), use “category20”.
Add the brush to c using add_selection.
Specify that c should have a width of 600 and a height of 100, as in this example from the Altair documentation.
Question 10¶
Make a similar chart c_detail, again using df_sub for the data, using the “Date2” column for the x-axis, using “Close” for the y-axis, and using “Full Name” for the color, again with scheme="category20". (So far, c_detail is exactly the same as c.)
Instead of using add_selection, for c_detail use transform_filter(brush), so that only the data points selected in c will be shown.
Specify that c_detail should have a width of 600 and a height of 400.
Question 11¶
Display c_detail above c, using c_detail&c.
Try clicking and dragging on the c chart. The c_detail chart should respond by displaying only the data from the highlighted region.
Submission¶
To submit this homework, use the Share option at the top right to share the project and create a link, then submit that link on Canvas.