Homework 3
Author: BLANK
Collaborators: BLANK
Question 1
- Import the attached “indexInfo.csv” file.
- Use the `rename` method (see the pandas documentation) to rename the “Index” column as “Abbreviation” and the “Exchange” column as “Full Name”. You will have to specify the keyword argument `axis=1`, which indicates that we are changing names in the columns axis and not in the rows axis.
- Use the `set_index` method (see the pandas documentation) to set the “Abbreviation” column as the new index of this DataFrame.
- Give the resulting DataFrame the variable name `df_info`.
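As a rough illustration of `rename` followed by `set_index`, here is a minimal sketch on a tiny hand-made DataFrame. The name `toy_info` and its two rows are assumptions made for the example (the two exchanges are the ones mentioned in Question 2); the real data should come from “indexInfo.csv”.

```python
import pandas as pd

# Hypothetical stand-in for the contents of indexInfo.csv (only two rows).
toy_info = pd.DataFrame({
    "Index": ["N225", "NYA"],
    "Exchange": ["Tokyo Stock Exchange", "New York Stock Exchange"],
})

# axis=1 tells rename that the dictionary keys are column names, not row labels.
renamed = toy_info.rename({"Index": "Abbreviation", "Exchange": "Full Name"}, axis=1)

# set_index returns a new DataFrame whose row labels are the abbreviations.
indexed = renamed.set_index("Abbreviation")
print(indexed)
```

Note that neither method changes the original DataFrame; each returns a new one, so the result has to be assigned to a variable.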
Question 2
Write a function `fullname` which takes as input an abbreviation string like “N225” and as output returns the full name, like “Tokyo Stock Exchange”. As another example, `fullname("NYA")` should be equal to “New York Stock Exchange”. (Hint. Figure out how to use the pandas Series `df_info["Full Name"]`. This question will be much easier because you set the `index` in the previous part.)
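The hint relies on the fact that a pandas Series can be indexed by its row labels. The sketch below shows that behaviour on a small hand-made DataFrame; `toy_info` and `names` are hypothetical names used only for this illustration.

```python
import pandas as pd

# Hypothetical DataFrame shaped like df_info: abbreviations as the index.
toy_info = pd.DataFrame(
    {"Full Name": ["Tokyo Stock Exchange", "New York Stock Exchange"]},
    index=pd.Index(["N225", "NYA"], name="Abbreviation"),
)

# Selecting one column gives a Series that keeps the same index.
names = toy_info["Full Name"]

# Square brackets (or .loc) look a value up by its index label.
print(names["N225"])
print(names.loc["NYA"])
```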
Question 3
- Load the attached “indexData.csv” file. Save the result with the variable name `df_pre`.
- Add a new column named “Full Name” to `df_pre` which contains the full name of each stock exchange. (Hint. Use `map` and the function you wrote above.)
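Here is a minimal sketch of how `map` applies a function to every entry of a Series. The names `toy_names`, `lookup_name`, and `toy_data` are hypothetical, and the assumption that the abbreviations live in a column called “Index” should be checked against the actual file.

```python
import pandas as pd

# Hypothetical lookup Series: abbreviation -> full name.
toy_names = pd.Series(
    ["Tokyo Stock Exchange", "New York Stock Exchange"],
    index=["N225", "NYA"],
)

def lookup_name(abbreviation):
    """Return the full name for one abbreviation string."""
    return toy_names[abbreviation]

# Hypothetical stand-in for df_pre; the column name "Index" is an assumption.
toy_data = pd.DataFrame({
    "Index": ["NYA", "N225", "NYA"],
    "Close": [100.0, 200.0, 101.0],
})

# map calls lookup_name once per entry and returns a new Series.
toy_data["Full Name"] = toy_data["Index"].map(lookup_name)
print(toy_data)
```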
Question 4
Here are two ways to drop rows with missing data from a DataFrame.
- Use the code `df_pre.dropna()`.
- Use our “usual” method, involving Boolean indexing, `isna`, `any`, and `axis`.
Time both of these separately using `%%timeit`. Write a markdown cell saying how the speeds compare.
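The two approaches can be compared on a toy DataFrame as in the sketch below. Because `%%timeit` is a notebook cell magic, this sketch uses the standard-library `timeit` module instead; the name `toy` and its values are made up, and timings on such a small example say little about the real data.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical DataFrame with missing values in two of its three rows.
toy = pd.DataFrame({
    "Open": [1.0, np.nan, 3.0],
    "Close": [1.5, 2.5, np.nan],
})

# Method 1: the built-in dropna.
via_dropna = toy.dropna()

# Method 2: Boolean indexing. isna marks missing cells, any(axis=1) flags rows
# containing at least one missing cell, and ~ keeps the rows that have none.
via_mask = toy[~toy.isna().any(axis=1)]

# Outside a notebook, timeit.timeit plays the role of %%timeit.
t_dropna = timeit.timeit(lambda: toy.dropna(), number=100)
t_mask = timeit.timeit(lambda: toy[~toy.isna().any(axis=1)], number=100)
print(f"dropna: {t_dropna:.4f} s, Boolean indexing: {t_mask:.4f} s")
```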
Question 5
- Using whichever method you prefer, make a new DataFrame with the variable name `df` which has the same data as `df_pre` but with the rows with missing data dropped. (Consider adding `.copy()` at the end of your code, especially if warnings show up later.)
- Check your answer. The `shape` of `df` should be the tuple `(110253, 9)`.
Question 6
The “Date” column contains strings like “12/31/65”, which means “12/31/1965”, as well as strings like “6/2/21”, which means “6/2/2021”.
- Write a function `fix_date` which takes as input strings like “12/31/65” and “6/2/21”, and as output returns strings like, for example, “12/31/1965” and “6/2/2021”. (Hint. You can slice a string just like you can slice a list. Notice that there are no dates in the “Date” column from before 1960, so if something like “1/1/10” appears, you can be sure that it stands for 2010 and not 1910.)
- Try to make the code for your `fix_date` function DRY, including as little repetition as possible.
- Use `map` to apply `fix_date` to all of the entries in the “Date” column. Name the resulting pandas Series `temp_series`.
- Insert a new column into `df`, named “Date2”, which contains the `temp_series` values converted to pandas datetime objects using `pd.to_datetime`. (Be sure to apply `pd.to_datetime` to the whole Series, not to the individual entries. There is no need to use `map` in this portion.)
- Check your work. If you evaluate `df.dtypes`, you should see that the “Date” column has “object” as its dtype and that the “Date2” column has “datetime64[ns]” (or something similar) as its dtype.
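One possible shape for these steps is sketched below on a three-entry Series. It is only one of several reasonable approaches; the cutoff of 60 encodes the hint that no dates come from before 1960, and the name `toy_dates` and the explicit `format` argument are assumptions of this sketch.

```python
import pandas as pd

def fix_date(date_str):
    """Expand a two-digit year, assuming years 60-99 mean 19xx and 00-59 mean 20xx."""
    prefix, year = date_str[:-2], date_str[-2:]  # slice off the last two characters
    century = "19" if int(year) >= 60 else "20"
    return prefix + century + year

print(fix_date("12/31/65"))  # 12/31/1965
print(fix_date("6/2/21"))    # 6/2/2021

# Hypothetical stand-in for the "Date" column.
toy_dates = pd.Series(["12/31/65", "6/2/21", "1/1/10"])

# map applies fix_date entry by entry; pd.to_datetime converts the whole Series at once.
fixed = toy_dates.map(fix_date)
converted = pd.to_datetime(fixed, format="%m/%d/%Y")
print(converted.dtype)
```

Computing the century once and concatenating a single time keeps the function free of the duplicated `return` branches that an `if`/`else` on the full string would need.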
Question 7
- Define a DataFrame `df_sub` that contains only rows from `df` that occurred on a Tuesday in the year 2000 or later. (More details: If `s` is a pandas Series of datetime objects, you can use `s.dt.year` to access the year and you can use `s.dt.day_name()` to access the day name. I’m not sure why one requires no parentheses and the other requires parentheses.)
- Check your work: The shape of `df_sub` should be `(14063, 10)`, and the first date in `df_sub` should be from January 4th, 2000.
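The sketch below shows the `.dt` accessor on a three-row toy DataFrame (the names `toy` and `toy_sub` are hypothetical). On the parentheses: in pandas, `year` is exposed as an attribute, while `day_name` is a method that accepts an optional `locale` argument, so it has to be called.

```python
import pandas as pd

# Hypothetical data: a Tuesday in 1999, a Tuesday in 2000, and a Wednesday in 2000.
toy = pd.DataFrame({
    "Date2": pd.to_datetime(["1999-12-28", "2000-01-04", "2000-01-05"]),
    "Close": [1.0, 2.0, 3.0],
})

# Each comparison produces a Boolean Series with one entry per row.
is_tuesday = toy["Date2"].dt.day_name() == "Tuesday"
is_recent = toy["Date2"].dt.year >= 2000

# & combines the two conditions elementwise; only rows where both hold are kept.
toy_sub = toy[is_tuesday & is_recent]
print(toy_sub)
```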
Question 8
By default, Altair can only be used with DataFrames with 5000 rows or fewer. Replace 5000 with 15000 by using the following code.
alt.data_transformers.enable('default', max_rows=15000)
Define an Altair `selection_interval` using the following code.
brush = alt.selection_interval(encodings=["x"])
The portion encodings=["x"] specifies that the selection can only be made along the x-axis.
Question 9
- Make an Altair line chart `c` using `df_sub` for the data, using the “Date2” column for the x-axis, using “Close” for the y-axis, and using “Full Name” for the color. For the color scheme (see the Altair documentation), use “category20”.
- Add the `brush` to `c` using `add_selection`.
- Specify that `c` should have a width of 600 and a height of 100, as in this example from the Altair documentation.
Question 10
- Make a similar chart `c_detail`, again using `df_sub` for the data, using the “Date2” column for the x-axis, using “Close” for the y-axis, and using “Full Name” for the color, again with `scheme="category20"`. (So far, `c_detail` is exactly the same as `c`.)
- Instead of using `add_selection`, for `c_detail` use `transform_filter(brush)`, so that only the data points selected in `c` will be shown.
- Specify that `c_detail` should have a width of 600 and a height of 400.
Question 11
- Display `c_detail` above `c`, using `c_detail & c`.
- Try clicking and dragging on the `c` chart. The `c_detail` chart should respond by displaying only the data from the highlighted region.
Submission
To submit this homework, go to the Share option at the top right, share the project to create a link, and then submit that link on Canvas.