Worksheet 9
You are encouraged to work in groups of up to 3 total students, but each student should make their own submission on Canvas. (It’s fine for everyone in the group to have the same upload.)
Part 1: Generating the data
Using the make_regression function from sklearn.datasets, generate data for linear regression with the following properties. Reminder: once you have imported this function, you can use help(make_regression) to get more information.
There should be 150 instances (also called observations).
There should be 2 input features (also called predictors). We will think of these input features as x0 and x1.
Use the default value of one target dimension. (Meaning you don’t need to type anything for this part.)
Choose a positive integer and store it as the variable rs.
Use the random_state keyword argument, and set it to the value rs, so that you get reproducible data. (Be sure to type rs, not the number itself.)
Choose a value of noise so that the data looks close to linear, but not perfectly linear. (We will plot it below. If the data looks like it lies perfectly on a plane, then you need to make noise bigger. If the data looks totally random, then you need to make noise smaller.)
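As a rough illustration of how the properties above map onto keyword arguments, here is a minimal sketch. The values rs = 7 and noise = 20.0 are placeholders chosen only for this example, not recommended answers; you should pick your own integer and tune noise by looking at the plot. The output names X and y are also just a common convention.

```python
from sklearn.datasets import make_regression

# Placeholder seed for this sketch; choose your own positive integer.
rs = 7

# n_targets is left at its default of 1, so it is not passed explicitly.
# noise=20.0 is an assumed starting value that will likely need adjusting.
X, y = make_regression(
    n_samples=150,    # 150 instances
    n_features=2,     # two input features, x0 and x1
    noise=20.0,
    random_state=rs,  # pass the variable, not the number itself
)

print(X.shape)  # (150, 2)
print(y.shape)  # (150,)
```

Because random_state is fixed, rerunning this cell produces the same X and y each time.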
Part 2: Putting the data in a pandas DataFrame
Make a pandas DataFrame df containing both the input data and the output data together. Name the input data columns "x0" and "x1", and name the target data column "z". Your DataFrame should have 150 rows and 3 columns.
Here are two different strategies you can use to make df.
Use df = pd.DataFrame({"x0": ???, "x1": ???, "z": ???}). Here we are creating the DataFrame from a dictionary specifying the three columns. You can use slicing with X to get the two columns (don’t use loc or iloc, because X is a NumPy array, not a pandas DataFrame).
Use df = pd.DataFrame(X, columns = [???, ???]), where X is the first output from make_regression, and where you specify the names of the two columns in the columns list. Then you can add a new column "z" to df just like you usually would: df["z"] = ???.
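To make the difference between the two strategies concrete, the sketch below builds the DataFrame both ways and checks that they agree. The seed, the noise value, and the names df1 and df2 are assumptions made for this illustration; in your own notebook you would build a single DataFrame named df.

```python
import pandas as pd
from sklearn.datasets import make_regression

rs = 7  # placeholder seed for this sketch
X, y = make_regression(n_samples=150, n_features=2, noise=20.0, random_state=rs)

# Strategy 1: build from a dictionary, slicing the NumPy array by column.
# X[:, 0] means "all rows, column 0" (NumPy indexing, not loc or iloc).
df1 = pd.DataFrame({"x0": X[:, 0], "x1": X[:, 1], "z": y})

# Strategy 2: build from the whole array, then add the target column.
df2 = pd.DataFrame(X, columns=["x0", "x1"])
df2["z"] = y

print(df1.shape)        # (150, 3)
print(df1.equals(df2))  # True
```

Either approach gives the same result, so the choice mostly comes down to which reads more clearly to you.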
Part 3: Plotting the data using Plotly
Altair is not able to make 3d plots (and I don’t know of anything in Python that makes as nice of 3d plots as Mathematica or Matlab), so we will use Plotly instead.
Adapt this first example (be sure to remove the df = px.data.iris() line) and plot our linear regression data from df.
Rotate the resulting plot so that it is clear this data lies approximately but not perfectly on a plane. (It’s possible this will never be clear. If that’s the case, try changing random_state above, or try decreasing the amount of random Gaussian noise that you are using.)
Include title=f"random_state=???" among the Plotly keyword arguments so that I know what random_state value you used. Don’t type the number directly; instead use the rs variable you defined above. (Notice that we are using an f-string here.)
Submission
Save the resulting Plotly image as a png file by clicking the camera icon at the top right of the plot. (Make sure the title appears in the downloaded image, and make sure the data looks approximately, but not perfectly, linear.)
Submit that png file on Canvas.