Worksheet 9

You are encouraged to work in groups of up to three students, but each student should make their own submission on Canvas. (It’s fine for everyone in the group to upload the same file.)

Part 1: Generating the data

Using the make_regression function from sklearn.datasets, generate data for linear regression with the following properties. Reminder: once you have imported this function, you can use help(make_regression) to get more information.

  • There should be 150 instances (also called observations).

  • There should be 2 input features (also called predictors). We will think of these input features as x0 and x1.

  • Use the default value of one target dimension. (Meaning you don’t need to type anything for this part.)

  • Choose a positive integer and store it as the variable rs.

  • Use the random_state keyword argument, and set it to the value rs, so that you get reproducible data. (Be sure to type rs, not the number itself.)

  • Choose a value of noise so that the data looks close to linear, but not perfectly linear. (We will plot it below. If the data looks like it lies perfectly on a plane, then you need to make noise bigger. If the data looks totally random, then you need to make noise smaller.)
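As a rough sketch of how these requirements map onto keyword arguments: the seed 42, the noise value 10, and the output names X and z below are placeholder assumptions, not prescribed values, and you should pick your own seed and tune noise after looking at the plot.

```python
from sklearn.datasets import make_regression

# Hypothetical seed; choose your own positive integer.
rs = 42

# 150 instances, 2 input features, default single target.
# noise=10 is only a starting guess; adjust it after plotting.
X, z = make_regression(
    n_samples=150,
    n_features=2,
    noise=10,
    random_state=rs,
)

print(X.shape)  # (150, 2): one row per instance, one column per feature
print(z.shape)  # (150,): one target value per instance
```

Because random_state is fixed, running the same call again returns identical arrays.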

Part 2: Putting the data in a pandas DataFrame

Make a pandas DataFrame df containing both the input data and the output data together. Name the input data columns "x0" and "x1", and name the target data column "z". Your DataFrame should have 150 rows and 3 columns.

Here are two different strategies you can use to make df.

  • Use df = pd.DataFrame({"x0": ???, "x1": ???, "z": ???}). Here we are creating the DataFrame from a dictionary specifying the three columns. Use NumPy slicing on X to get its two columns (don’t use loc or iloc, because X is a NumPy array, not a pandas DataFrame).

  • Use df = pd.DataFrame(X, columns = [???, ???]), where X is the first output from make_regression, and where you specify the names of the two columns in the columns list. Then you can add a new column "z" to df just like you usually would: df["z"] = ???.
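One possible way to fill in both strategies is sketched below. It assumes the two outputs of make_regression were stored as X and z (names chosen here for illustration), and it regenerates the data with a placeholder seed and noise value so that the block runs on its own.

```python
import pandas as pd
from sklearn.datasets import make_regression

rs = 42  # hypothetical seed
X, z = make_regression(n_samples=150, n_features=2, noise=10, random_state=rs)

# Strategy 1: build the DataFrame from a dictionary of columns.
# X[:, 0] is NumPy slicing: every row, column 0.
df = pd.DataFrame({"x0": X[:, 0], "x1": X[:, 1], "z": z})

# Strategy 2: start from the 2-column array, then add the target column.
df_alt = pd.DataFrame(X, columns=["x0", "x1"])
df_alt["z"] = z

print(df.shape)  # (150, 3)
```

Either strategy should give the same 150-by-3 table; which one to use is a matter of taste.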

Part 3: Plotting the data using Plotly

Altair cannot make 3D plots (and I don’t know of anything in Python that makes 3D plots as nice as those from Mathematica or Matlab), so we will use Plotly instead.

  • Adapt this first example (be sure to remove the df = px.data.iris() line) and plot our linear regression data from df.

  • Rotate the resulting plot so that it is clear this data lies approximately but not perfectly on a plane. (It’s possible this will never be clear. If that’s the case, try changing random_state above, or try decreasing the amount of random Gaussian noise that you are using.)

  • Include title=f"random_state=???" among the Plotly keyword arguments so that I know what random_state value you used. Don’t type the number directly; instead use the rs variable you defined above. (Notice that we are using an f-string here.)

Submission

  • Save the resulting Plotly image as a png file by clicking the camera icon at the top right of the plot. (Make sure the title appears in the downloaded image, and make sure the data looks approximately, but not perfectly, linear.)

  • Submit that png file on Canvas.