Homework 3

Author: BLANK

Collaborators: BLANK

Question 1

  • Import the attached “indexInfo.csv” file.

  • Use the rename method (see the pandas documentation) to rename the “Index” column as “Abbreviation” and the “Exchange” column as “Full Name”. You will need to pass the keyword argument axis=1, which indicates that the names being changed are column labels, not row labels.

  • Use the set_index method (see the pandas documentation) to set the “Abbreviation” column as the new index of this DataFrame.

  • Give the resulting DataFrame the variable name df_info.
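As a rough illustration of how rename and set_index fit together, here is a minimal sketch on a small hand-made DataFrame. The names toy_info and toy_df_info and the two rows of data are made up for this example and are only a stand-in for the real “indexInfo.csv” file, which may contain other columns.

```python
import pandas as pd

# Hypothetical stand-in for indexInfo.csv; the real file may have more columns and rows.
toy_info = pd.DataFrame({
    "Index": ["NYA", "N225"],
    "Exchange": ["New York Stock Exchange", "Tokyo Stock Exchange"],
})

# rename returns a new DataFrame; axis=1 says the dictionary keys are column labels.
renamed = toy_info.rename({"Index": "Abbreviation", "Exchange": "Full Name"}, axis=1)

# set_index moves the "Abbreviation" column out of the columns and into the row index.
toy_df_info = renamed.set_index("Abbreviation")
print(toy_df_info)
```

Both methods return new DataFrames rather than modifying the original, which is why the results are assigned to new variable names here.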

Question 2

  • Write a function fullname which takes as input an abbreviation string like “N225” and as output returns the full name, like “Tokyo Stock Exchange”. As another example, fullname("NYA") should be equal to “New York Stock Exchange”. (Hint. Figure out how to use the pandas Series df_info["Full Name"]. This question will be much easier because you set the index in the previous part.)
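The hint relies on the fact that a pandas Series can be looked up by its index labels. The sketch below shows that idea on a hand-made Series; toy_names and toy_fullname are hypothetical names for this example, and this is only one possible approach, not necessarily the intended one.

```python
import pandas as pd

# Hypothetical Series playing the role of df_info["Full Name"]:
# the index holds abbreviations, the values hold full names.
toy_names = pd.Series(
    ["New York Stock Exchange", "Tokyo Stock Exchange"],
    index=["NYA", "N225"],
    name="Full Name",
)

def toy_fullname(abbreviation):
    # .loc looks a value up by its index label rather than by its position.
    return toy_names.loc[abbreviation]

print(toy_fullname("N225"))
```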

Question 3

  • Load the attached “indexData.csv” file. Save the result with the variable name df_pre.

  • Add a new column named “Full Name” to df_pre which contains the full name of each stock exchange. (Hint. Use map and the function you wrote above.)
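The map method applies a function to every entry of a Series and returns a new Series of the results. Here is a minimal sketch on toy data; it assumes the abbreviations live in a column called “Index”, which may not match the real “indexData.csv” file, and all variable names are made up for the example.

```python
import pandas as pd

# Hypothetical lookup Series (abbreviation -> full name).
toy_names = pd.Series({
    "NYA": "New York Stock Exchange",
    "N225": "Tokyo Stock Exchange",
})

# Hypothetical stand-in for df_pre; the "Index" column name is an assumption.
toy_pre = pd.DataFrame({"Index": ["NYA", "N225", "NYA"], "Close": [1.0, 2.0, 3.0]})

def toy_fullname(abbreviation):
    return toy_names.loc[abbreviation]

# map calls toy_fullname once per entry; the result is assigned as a new column.
toy_pre["Full Name"] = toy_pre["Index"].map(toy_fullname)
print(toy_pre)
```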

Question 4

Here are two ways to drop rows with missing data from a DataFrame.

  • Use the code df_pre.dropna().

  • Use our “usual” method, involving Boolean indexing, isna, any, and axis.

  • Time both of these separately using %%timeit. Write a markdown cell saying how the speeds compare.
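To see that the two approaches select the same rows, here is a small sketch on a toy DataFrame with made-up column names and values. It leaves out %%timeit, which is a notebook cell magic and has to go on the first line of its own cell.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value in each of two rows.
toy = pd.DataFrame({"Open": [1.0, np.nan, 3.0], "Close": [1.5, 2.5, np.nan]})

# Method 1: dropna removes every row that contains at least one missing value.
dropped = toy.dropna()

# Method 2: isna() marks missing entries, any(axis=1) flags rows containing
# at least one True, and ~ negates the mask so only complete rows are kept.
has_missing = toy.isna().any(axis=1)
filtered = toy[~has_missing]

print(dropped.equals(filtered))
```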

Question 5

  • Using whichever method you prefer, make a new DataFrame with the variable name df which has the same data as df_pre but with the rows with missing data dropped. (Consider adding .copy() at the end of your code, especially if warnings show up later.)

  • Check your answer. The shape of df should be the tuple (110253, 9).

Question 6

The “Date” column contains strings like “12/31/65”, which means “12/31/1965”, as well as strings like “6/2/21”, which means “6/2/2021”.

  • Write a function fix_date which takes as input strings like “12/31/65” and “6/2/21”, and as output returns the corresponding strings with four-digit years, “12/31/1965” and “6/2/2021”. (Hint. You can slice a string just like you can slice a list. Notice that there are no dates in the “Date” column from before 1960, so if something like “1/1/10” appears, you can be sure that it stands for 2010 and not 1910.)

  • Try to make the code for your fix_date function DRY, including as little repetition as possible.

  • Use map to apply fix_date to all of the entries in the “Date” column. Name the resulting pandas Series temp_series.

  • Insert a new column into df, named “Date2”, which contains the temp_series values converted to pandas datetime objects using pd.to_datetime. (Be sure to apply pd.to_datetime to the whole Series, not to the individual entries. There is no need to use map in this portion.)

  • Check your work. If you evaluate df.dtypes, you should see that the “Date” column has “object” as its dtype and that the “Date2” column has “datetime64[ns]” (or something similar) as its dtype.
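One possible shape for this workflow is sketched below on a three-entry toy Series. It is only an illustration under the assumption stated in the hint (no dates before 1960); the cutoff of 60, the name toy_dates, and the explicit format argument are choices made for this example.

```python
import pandas as pd

def fix_date(s):
    # The last two characters are the two-digit year; the rest is "month/day/".
    prefix, yy = s[:-2], s[-2:]
    # Assumption from the hint: years 60-99 belong to the 1900s, everything else to the 2000s.
    century = "19" if int(yy) >= 60 else "20"
    return prefix + century + yy

# Hypothetical stand-in for the "Date" column.
toy_dates = pd.Series(["12/31/65", "6/2/21", "1/1/10"])

# map applies fix_date to each string individually.
temp = toy_dates.map(fix_date)

# pd.to_datetime is applied once to the whole Series; the explicit format is optional
# but removes any ambiguity between month/day and day/month.
converted = pd.to_datetime(temp, format="%m/%d/%Y")
print(converted.dtype)
```

Slicing from the end with s[-2:] avoids having to handle one-digit and two-digit months or days separately, which keeps the function short.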

Question 7

  • Define a DataFrame df_sub that contains only rows from df that occurred on a Tuesday in the year 2000 or later. (More details: If s is a pandas Series of datetime objects, you can use s.dt.year to access the year and s.dt.day_name() to access the day name. The first is an attribute, so it takes no parentheses; the second is a method, so it must be called with parentheses.)

  • Check your work: The shape of df_sub should be (14063, 10), and the first date in df_sub should be from January 4th, 2000.
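The two conditions can be combined into a single Boolean mask with &. Here is a minimal sketch on four hand-picked dates; the DataFrame toy and its contents are made up for this example.

```python
import pandas as pd

# Hypothetical dates: a Tuesday in 1999, a Monday in 2000, and Tuesdays in 2000 and 2021.
toy = pd.DataFrame({
    "Date2": pd.to_datetime(["1999-12-28", "2000-01-03", "2000-01-04", "2021-06-01"]),
})

# Each comparison yields a Boolean Series; the parentheses are required
# because & binds more tightly than >= and ==.
mask = (toy["Date2"].dt.year >= 2000) & (toy["Date2"].dt.day_name() == "Tuesday")
toy_sub = toy[mask]
print(toy_sub)
```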

Question 8

  • By default, Altair can only be used with DataFrames with 5000 rows or fewer. Replace 5000 with 15000 by using the following code.

alt.data_transformers.enable('default', max_rows=15000)

  • Define an Altair selection_interval using the following code.

brush = alt.selection_interval(encodings=["x"])

The portion encodings=["x"] specifies that the selection can only be made along the x-axis.

Question 9

  • Make an Altair line chart c using df_sub for the data, using the “Date2” column for the x-axis, using “Close” for the y-axis, and using “Full Name” for the color. For the color scheme (see the Altair documentation), use “category20”.

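  • Add the brush to c using add_selection. (In Altair 5 and later, add_selection is deprecated in favor of add_params, which is used the same way here.)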

  • Specify that c should have a width of 600 and a height of 100, as in this example from the Altair documentation.

Question 10

  • Make a similar chart c_detail, again using df_sub for the data, using the “Date2” column for the x-axis, using “Close” for the y-axis, and using “Full Name” for the color, again with scheme="category20". (So far, c_detail is exactly the same as c.)

  • Instead of using add_selection, for c_detail use transform_filter(brush), so that only the data points selected in c will be shown.

  • Specify that c_detail should have a width of 600 and a height of 400.

Question 11

  • Display c_detail above c, using c_detail&c.

  • Try clicking and dragging on the c chart. The c_detail chart should respond by displaying only the data from the highlighted region.

Submission

To submit this homework, use the Share option at the top right to create a link to the project, then submit that link on Canvas.
