Homework 4
Contents
Homework 4¶
Remark: This might not reflect updates, so check the Deepnote file for the official version.
List your name and the names of any collaborators at the top of this notebook.
(Reminder: It’s encouraged to work together; you can even submit the exact same homework as another student or two students, but you must list each other’s names at the top.)
Overview¶
This project includes a csv file unemployment.csv
which includes US unemployment rates for various industries between 2000 and 2010. Load that csv file using pd.read_csv
and save the resulting DataFrame as df
.
The background question for this homework is,
How does the time of year affect unemployment in different industries?
Extracting the month from the date¶
The DataFrame includes a “date” column, whose entries are strings. Use each of the following methods to extract the month (represented as an integer from 1 to 12) from the date.
If we wanted to get the year from a string like
s = '2000-01-01 08:00:00+00:00'
, we could uses[0:4]
. Adapt that strategy to get a pandas Series containing the month for each row, usingmap
and alambda
function. Convert the entries from strings to integers usingint
.Another option is to use
s.split("-")
. Use that strategy, again together withmap
, alambda
function, andint
.A more sophisticated option is to, as a first step, convert the
dtype
of the “date” column usingto_datetime
. (In other words, replace what’s in the “date” column so they are no longer strings; this doesn’t requiremap
.) Ift
is one of the entries in the resulting column, then you can uset.month
to get the month. Again usemap
and alambda
function.The best method is probably to avoid using
map
completely. Convert the column again usingto_datetime
, and then adapt one of the methods from the pandas documentation to convert each entry in the series into a month number.Using that last method, create a new column in the in the DataFrame containing these months. Name that column “month”.
If everything went correctly, the dtype
of the “date” column should now be datetime64[ns, UTC]
, and the dtype
of the “month” column should be int64
.
Plotting the data 1¶
Use Altair and mark_circle
to make a scatter plot of the data, placing “date” along the x-axis, “rate” along the y-axis, and coloring the points using “industry”. Include a tooltip showing the “month” and “industry”.
Plotting the data 2¶
Adapt the stacked bar chart example from the Altair documentation to make a stacked bar chart of the data. Use “month” for the x-axis, “mean(rate)” for the y-axis, and again “industry” for the color. Specify an encoding data type for the “month” column so that the plot looks better. (What is the appropriate data type for the discrete set 1, 2, …, 11, 12?)
Plotting the data 3¶
Let c1
denote the stacked bar chart you made in the previous part.
Create an Altair selection object using the following code:
choice = alt.selection_multi(fields=['industry'], bind='legend')
Add that to c1
using .add_selection
.
You have already specified x, y, and color. Also specify opacity
for c1
by using
opacity=alt.condition(choice, alt.value(1), alt.value(0.2)),
Display c1
and try clicking on one of the entries in the legend. The selected entry should be emphasized. You can also select multiple industries by holding down shift while you click.
Plotting the data 4¶
Create a new Altair Chart c2
using the following code.
c2 = alt.Chart(df).mark_bar().encode(
x='month:O',
y='mean(rate)',
color='industry',
).transform_filter(choice)
Then display both c1
and c2
side-by-side using c1|c2
. Now if you click on the legend, c2
should only show that industry.
Interpreting the results¶
What is an example of an industry that seems to have higher unemployment in the winter months? What is an example of an industry that seems to have higher unemployment in the summer months? (You should answer this question using the Altair charts, but if you want extra practice with pandas, you can also try to use pandas to answer it.)
Submission¶
Download the .ipynb file for this notebook (click on the folder icon to the left, then the … next to the file name) and upload the file on Canvas.