Worksheet 9
You are encouraged to work in groups of up to 3 total students, but each student should make their own submission on Canvas. (It’s fine for everyone in the group to have the same upload.)
Part 1: Generating the data
Using the make_regression function from sklearn.datasets, generate data for linear regression with the following properties. Reminder: once you have imported this function, you can use help(make_regression) to get more information.
There should be 150 instances (also called observations).
There should be 2 input features (also called predictors). We will think of these input features as x0 and x1.
Use the default value of one target dimension. (This means you don't need to type anything for this part.)
Choose a positive integer and store it as the variable rs.
Use the random_state keyword argument, and set it to the value rs, so that you get reproducible data. (Be sure to type rs, not the number itself.)
Choose a value for the noise keyword argument so that the data looks close to linear, but not perfectly linear. (We will plot it below. If the data looks like it lies perfectly on a plane, then you need to make noise bigger. If the data looks totally random, then you need to make noise smaller.)
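Here is a minimal sketch of Part 1. It assumes rs = 12 and noise=10, but both are arbitrary placeholder choices rather than required values, and noise should be tuned by looking at the plot. The optional helper plane_rms_residual is a hypothetical name, not part of scikit-learn. It fits a plane by least squares with np.linalg.lstsq and reports how far the points sit from that plane. With noise=0 the residual is essentially zero, which is the "perfectly on a plane" case the instructions warn about.

```python
import numpy as np
from sklearn.datasets import make_regression

rs = 12  # arbitrary positive integer (placeholder choice)

# 150 instances, 2 input features, default single target; noise=10 is a guess to tune
X, y = make_regression(n_samples=150, n_features=2, noise=10, random_state=rs)
print(X.shape, y.shape)  # (150, 2) (150,)


def plane_rms_residual(X, y):
    """Hypothetical helper: RMS vertical distance from the best-fit plane z = a*x0 + b*x1 + c."""
    A = np.column_stack([X, np.ones(len(X))])  # design matrix with an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.sqrt(np.mean((y - A @ coef) ** 2))


# With noise=0 the generated points lie exactly on a plane, so the residual is ~0
X0, y0 = make_regression(n_samples=150, n_features=2, noise=0, random_state=rs)
print(plane_rms_residual(X0, y0))  # essentially zero (floating-point rounding only)
print(plane_rms_residual(X, y))    # roughly on the order of the noise level
```

This numeric check only complements the visual check in Part 3. The worksheet still asks you to judge the noise level from the plot.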
Part 2: Putting the data in a pandas DataFrame
Make a pandas DataFrame df containing both the input data and the output data together. Name the input data columns "x0" and "x1", and name the target data column "z". Your DataFrame should have 150 rows and 3 columns.
Here are two different strategies you can use to make df.
Use df = pd.DataFrame({"x0": ???, "x1": ???, "z": ???}). Here we are creating the DataFrame from a dictionary specifying the three columns. You can use slicing with X (the first output from make_regression) to get the two columns. Don't use loc or iloc, because X is a NumPy array, not a pandas DataFrame.
Use df = pd.DataFrame(X, columns = [???, ???]), where X is the first output from make_regression, and where you specify the names of the two columns in the columns list. Then you can add a new column "z" to df just like you usually would: df["z"] = ???.
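The sketch below builds the table both ways. It regenerates the data with the same placeholder choices (rs = 12, noise=10) so that the block runs on its own. X[:, 0] and X[:, 1] are NumPy column slices, and y, the second output of make_regression, becomes the "z" column. DataFrame.equals then confirms that the two strategies produce the same table.

```python
import pandas as pd
from sklearn.datasets import make_regression

rs = 12  # placeholder seed
X, y = make_regression(n_samples=150, n_features=2, noise=10, random_state=rs)

# Strategy 1: a dictionary of columns; slice the NumPy array by column position
df1 = pd.DataFrame({"x0": X[:, 0], "x1": X[:, 1], "z": y})

# Strategy 2: name the two input columns, then add the target as a new column
df2 = pd.DataFrame(X, columns=["x0", "x1"])
df2["z"] = y

print(df1.shape)        # (150, 3)
print(df1.equals(df2))  # True
```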
Part 3: Plotting the data using Plotly
Altair cannot make 3D plots (and I don't know of anything in Python that makes 3D plots as nice as those from Mathematica or Matlab), so we will use Plotly instead.
Adapt this first example (be sure to remove the df = px.data.iris() line) and plot our linear regression data from df.
Rotate the resulting plot so that it is clear this data lies approximately, but not perfectly, on a plane. (It's possible this will never be clear. If that's the case, try changing random_state above, or try decreasing the amount of random Gaussian noise that you are using.)
Include title=f"random_state=???" among the Plotly keyword arguments so that I know what random_state value you used. Don't type the number directly; instead use the rs variable you defined above. (Notice that we are using an f-string here.)
Submission
Save the resulting Plotly image as a png file by clicking the camera icon at the top right of the plot. (Make sure the title appears in the downloaded image, and make sure the data looks approximately, but not perfectly, linear.)
Submit that png file on Canvas.