💄 Makeup Foundation Shades Analysis 💅🏻#

Author: Loulou Vivian Mahfouz

Course Project, UC Irvine, Math 10, S23

✨ Introduction#

This project is about makeup foundation shades: foundation is makeup that matches one's skin color. Makeup users buy foundation to even out skin tone, smooth their complexion, hide wrinkles, conceal blemishes, and more. I love makeup, art, fashion, music, and creativity in general, so I thought this would be an interesting dataset to work with, and I could also learn something new along the way.

So, I analyze bestselling makeup foundations to gain some insight into my overall guiding questions:

Do foundations really cover a wide range of shades (based on this dataset)? Which brand(s) are actually inclusive of all/many shade ranges (based on this dataset)?

import pandas as pd
import seaborn as sns
import numpy as np
import altair as alt
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss

🔥 Definitions & Descriptions#

Some definitions are taken from an HSLA reference website.

Here are brief explanations of the columns in this dataset:

  • brand - brand name

  • brand_short - shortened brand name (for coding purposes & easier to read)

  • product - specific bestseller foundation name

  • product_short - shortened product name

  • hex - hexadecimal color code giving the amount of red, green, and blue in a color (see the short sketch after the group list below)

  • H - hue - particular shade/tint of a color (0 to 360 degrees on the color wheel)

  • S - Saturation - intensity of a color - greyscale (0% completely grey to 100% no grey)

  • V - (Alpha) Value - specifies opacity - dimension of lightness/darkness or intensity of the strength of light (0.0 fully transparent to 1.0 fully opaque)

  • L - Lightness - amount of light/brightness in a color (0% no light to 100% full light)

The group column, sorted numerically:

  • 0: Fenty Beauty's PRO FILT'R Foundation Only

  • 1: Make Up For Ever's Ultra HD Foundation Only

  • 2: US Best Sellers

  • 3: BIPOC-recommended Brands with BIPOC Founders

  • 4: BIPOC-recommended Brands with White Founders

  • 5: Nigerian Best Sellers

  • 6: Japanese Best Sellers

  • 7: Indian Best Sellers
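
To make the hex and color columns above concrete, here is a minimal sketch (my own addition, not from the original dataset code) that computes these components for one hex code using Python's built-in colorsys module. The sample hex value is taken from a dataset row shown later.

import colorsys

hex_code = "f0c7b3"  # a L'Oréal True Match shade that appears in the dataset
r, g, b = (int(hex_code[i:i + 2], 16) / 255 for i in (0, 2, 4))

h, s, v = colorsys.rgb_to_hsv(r, g, b)  # HSV: hue, saturation, value
_, l, _ = colorsys.rgb_to_hls(r, g, b)  # HLS: we only need lightness here

print(f"H = {h * 360:.0f} degrees, S = {s:.2f}, V = {v:.2f}, L = {l * 100:.0f}")
# Prints values matching that dataset row closely: H 20, S 0.25, V 0.94, L 83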

df = pd.read_csv("shades.csv") #read the csv file
df.sample(20) # Here are 20 random row samples of the original dataset
brand brand_short product product_short hex H S V L group
619 L'Oréal lo True Match tms f0c7b3 20.0 0.25 0.94 83 7
432 Make Up For Ever mu Ultra HD uhd c98567 18.0 0.49 0.79 62 1
288 Laws of Nature ln Foxy Finish ff b19277 28.0 0.33 0.69 63 3
315 Lancôme lc Teint Idole ti d0975d 30.0 0.55 0.82 67 4
261 Black Up bu Matifying Fluid mf 3b2218 17.0 0.59 0.23 16 3
510 Addiction ad The Foundation tf f3bc86 30.0 0.45 0.95 80 6
382 Bobbi Brown br Skin Long-Wear slw e2b996 28.0 0.34 0.89 78 4
455 House of Tara ht Oil Free off b68358 27.0 0.52 0.71 59 5
110 Estée Lauder el Double Wear dw 5e3617 26.0 0.76 0.37 27 2
562 NARS na Velvet Matte vm e4aa7c 27.0 0.46 0.89 74 6
320 Lancôme lc Teint Idole ti ce9163 26.0 0.52 0.81 66 4
202 Fenty fe PRO FILT'R pf 824f30 23.0 0.63 0.51 39 0
173 Fenty fe PRO FILT'R pf f4cca8 28.0 0.31 0.96 85 0
478 Elsas Pro ep Full Coverage fcf 815d51 15.0 0.37 0.51 43 5
423 Make Up For Ever mu Ultra HD uhd dcb59c 23.0 0.29 0.86 77 1
338 MAC ma Studio Fix sff ffc9a9 22.0 0.34 1.00 85 4
347 MAC ma Studio Fix sff d0a97b 32.0 0.41 0.82 72 4
538 Kate ka Secret Skin Maker Zero ssm e8b793 25.0 0.37 0.91 78 6
390 Bobbi Brown br Skin Long-Wear slw d09059 28.0 0.57 0.82 65 4
14 Maybelline mb Fit Me fmf eab181 27.0 0.45 0.92 77 2
# Add a group_definition column to df, mapping each row's group number to its description (listed earlier)
group_dict = {
    0: 'Fenty Beauty\'s PRO FILT\'R Foundation Only',
    1: 'Make Up For Ever\'s Ultra HD Foundation Only',
    2: 'US Best Sellers',
    3: 'BIPOC-recommended Brands with BIPOC Founders',
    4: 'BIPOC-recommended Brands with White Founders',
    5: 'Nigerian Best Sellers',
    6: 'Japanese Best Sellers',
    7: 'Indian Best Sellers'
}

df['group_definition'] = df.group.map(group_dict)
df.sample(15)
brand brand_short product product_short hex H S V L group group_definition
161 Covergirl + Olay oc Simply Ageless sa eac4ae NaN NaN NaN 82 2 US Best Sellers
523 Shu Uemera su Petal Skin Fluid psf e0aa7c 28.0 0.45 0.88 74 6 Japanese Best Sellers
525 Shu Uemera su Petal Skin Fluid psf f9dabd 29.0 0.24 0.98 89 6 Japanese Best Sellers
359 MAC ma Studio Fix sff d9b28d 29.0 0.35 0.85 75 4 BIPOC-recommended Brands with White Founders
528 Shu Uemera su Petal Skin Fluid psf dbab6d 34.0 0.50 0.86 73 6 Japanese Best Sellers
581 Bharat & Doris bd Liquid Foundation lf cba167 35.0 0.49 0.80 69 7 Indian Best Sellers
281 Laws of Nature ln Foxy Finish ff 6c4529 25.0 0.62 0.42 33 3 BIPOC-recommended Brands with BIPOC Founders
564 NARS na Velvet Matte vm da9b5a 30.0 0.59 0.85 69 6 Japanese Best Sellers
256 Black Up bu Matifying Fluid mf 9e5a35 21.0 0.66 0.62 46 3 BIPOC-recommended Brands with BIPOC Founders
331 Lancôme lc Teint Idole ti 391b11 15.0 0.70 0.22 14 4 BIPOC-recommended Brands with White Founders
475 Trim & Prissy tp Hi - Def hdf b0704d 21.0 0.56 0.69 54 5 Nigerian Best Sellers
178 Fenty fe PRO FILT'R pf e2ad85 26.0 0.41 0.89 75 0 Fenty Beauty's PRO FILT'R Foundation Only
48 bareMinerals bm barePRO pro cfa786 27.0 0.35 0.81 72 2 US Best Sellers
522 Shu Uemera su Petal Skin Fluid psf c99e76 29.0 0.41 0.79 68 6 Japanese Best Sellers
536 Shiseido sh Synchro Skin ss cb9068 24.0 0.49 0.80 65 6 Japanese Best Sellers

🧹 Data Cleaning#

df.isna().any() #Checking for missing values. Three columns contain some missing values here.
brand               False
brand_short         False
product             False
product_short       False
hex                 False
H                    True
S                    True
V                    True
L                   False
group               False
group_definition    False
dtype: bool
print(f"This dataset currently has {df.shape[0]} rows and {df.shape[1]} columns")
This dataset currently has 625 rows and 11 columns
df.dropna(inplace=True) #dropping missing values
df.shape #shape of the df
(613, 11)
print(f"The Makeup-Shades dataset contains {df.shape[0]} rows and {df.shape[1]} columns after dropping rows with missing values")
The Makeup-Shades dataset contains 613 rows and 11 columns after dropping rows with missing values

🧐 Observing Data Info#

df.dtypes #This is important in case I need to convert any values in the future
brand                object
brand_short          object
product              object
product_short        object
hex                  object
H                   float64
S                   float64
V                   float64
L                     int64
group                 int64
group_definition     object
dtype: object
df.describe().drop(columns="group")
H S V L
count 613.000000 613.000000 613.000000 613.000000
mean 25.314845 0.459494 0.779543 65.654160
std 5.327852 0.154089 0.173955 17.570246
min 4.000000 0.100000 0.200000 11.000000
25% 23.000000 0.350000 0.690000 55.000000
50% 26.000000 0.440000 0.840000 70.000000
75% 29.000000 0.560000 0.910000 79.000000
max 45.000000 1.000000 1.000000 95.000000

The table above shows important numerical information about the data: the count, mean, standard deviation, min, max, and quartiles for each numerical column. The count is the number of non-missing values in a column, which is why it matches the number of rows. The mean, min, max, and quartiles of the group numbers are meaningless (they are just labels), so I dropped that column so we can ignore those values.

Approximately, the average hue is 25 degrees, the average saturation is 0.46 or 46%, the average value (or alpha value) is 0.78, and the average lightness is about 66 on the 0-100 scale.

Using this HSLA Color Picker website, here is the generated mean shade:

#(extra)
from PIL import Image
img = Image.open('Average HSLA.png')
mg = Image.open('colorwheel.jpg')
img.show()
mg.show()
[Figures: the generated mean HSLA shade swatch and an HSL color wheel]

The hues (skin tones) range from a minimum of 4 degrees to a maximum of 45 degrees in our case, which makes sense looking at this wheel: skin tones sit in the red-orange-yellow region.
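
As a code-based cross-check of the website swatch, here is a small sketch (my addition) that converts the rounded mean H, S, L back into a hex code with colorsys. One caveat: the dataset's S is HSV saturation, so reusing it in an HSL conversion is only approximate, just like the website lookup.

import colorsys

h, s, l = 25 / 360, 0.46, 0.66          # rounded means from df.describe()
r, g, b = colorsys.hls_to_rgb(h, l, s)  # note the H, L, S argument order
print('#{:02x}{:02x}{:02x}'.format(int(r * 255), int(g * 255), int(b * 255)))
# Expect a warm mid-tan shade, roughly #d0a180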

df.corr(numeric_only=True) #correlations between the numeric columns (numeric_only avoids errors on the string columns in newer pandas)
H S V L group
H 1.000000 -0.166436 0.409831 0.451416 0.118561
S -0.166436 1.000000 -0.707797 -0.810619 -0.048267
V 0.409831 -0.707797 1.000000 0.980690 0.165535
L 0.451416 -0.810619 0.980690 1.000000 0.145904
group 0.118561 -0.048267 0.165535 0.145904 1.000000

Plots#

The plot below shows the count of each Lightness value. Lightness values range from 0 (black) to 100 (white). Since the lightness values cluster mostly around the 70s, this could indicate that foundation shades skew toward the lighter end, since 70 is closer to 100 (white).

plt.figure(figsize=(50,10))
sns.countplot(x="L", data=df)
plt.show()
[Figure: countplot of Lightness values]
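
To put a number on that skew, here is a quick check (my addition) using pandas:

# A negative skewness means the long tail is on the dark side, i.e. shades
# pile up toward the light end of the 0-100 scale.
print(df['L'].skew())    # expect a negative value
print(df['L'].median())  # the median (70) sits above the mean (about 66)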

Extra (Box & Whisker Plots)#

import plotly.graph_objects as go
from plotly.subplots import make_subplots
colors = ['red', 'blue', 'green', 'orange', 'purple', 'brown', 'pink', 'gray']
groups = df.group.unique().tolist()
def show_violin(column):
    fig = go.Figure()
    # 8 groups and 8 colors - hard-coded here
    for group, color in zip(groups, colors[:8]):
        # match rows on the numeric group code (comparing group_definition
        # to the integer code would never match any row)
        mask = df['group'] == group
        # add violin plot for group
        fig.add_trace(go.Violin(x=df['group_definition'][mask],
                                y=df[column][mask],
                                name=group_dict[group],
                                # boxplot with mean
                                box_visible=True,
                                meanline_visible=True,
                                # color
                                line_color=color
                               ))
    fig.update_layout(template='plotly_white', width=1200)
    fig.show()
def show_distribution(column, title=''):
    fig = make_subplots(rows=1, cols=2,
                        shared_yaxes=True, horizontal_spacing=0.01)
    for group, color in zip(groups, colors[:8]):
        fig.add_trace(go.Box(y=df['group_definition'][df['group'] == group],
                                x=df[column][df['group'] == group],
                                name=group,
                                boxpoints='outliers', # only outliers
                                marker_color='black',
                                line_color='black',
                                fillcolor=color,
                                marker=dict(symbol="diamond"),
                                opacity=0.6), 
                      row=1, col=1)
    for group, color in zip(groups, colors[:8]):
      fig.add_trace(go.Violin(y=df['group_definition'][df['group'] == group],
                              x=df[column][df['group'] == group], 
                              line_color=color), 
                    row=1, col=2)
    # more space between group label and boxplot
    fig.update_yaxes(ticksuffix = ' '* 10) 

    fig.update_layout(template='plotly_white', width=1000, showlegend=False,
                      title=title,  title_x=0.5)
    fig.update_traces(orientation='h')
    fig.update_traces(side='positive', width=2, points=False, col=2)
    fig.show()

This long code block uses plotly to display box-and-whisker plots (and violins) showing the overall skew of the data. We can also hover with the mouse to read off the min, median, and other summary values. I think this is a great way to visualize and compare the different groupings.

Even though Fenty only has one product listed as a bestseller, the shape of its lightness distribution is the closest to a normal bell curve. We can also see that the Japanese and Indian bestsellers are skewed to the right, meaning they lean toward the whiter/lighter shades. This makes sense for the Japanese market; however, Indian skin tones range from darker to lighter, so it is strange that their plots look similar. For Nigerian bestsellers, the plot is skewed more to the left, though I would expect it to sit much further left (closer to 0/black). This could stem from a variety of reasons: the data collection may not have captured enough variety, or brands may simply cater more to lighter skin tones.

So, we can see that Fenty, the US bestsellers, and the two BIPOC-recommended groups (with BIPOC founders and with White founders) have the largest shade ranges (including outliers).

show_distribution('L', 'Lightness')
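
To back the shade-range comparison with numbers, here is a short companion table (my addition) summarizing Lightness per group; the min-to-max spread is the "shade range," and the mean hints at which direction each group leans:

# Per-group Lightness summary: spread (min to max) and central tendency
df.groupby('group_definition')['L'].agg(['min', 'max', 'mean', 'count'])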

3D Plotting (Extra)#

Below, I use code written with some help from AI to convert each hex value to RGB values, which are easier to interpret and are what computers typically work with. The code is not too complicated, since we learned .apply, defining functions, and indexing in Math 10; the tricky part was figuring out the hex-to-RGB conversion. I add this new column to the dataframe, and I also split the RGB triple into separate red, green, and blue columns.

# Define a function to convert a hex color string to an RGB tuple
def hex_to_rgb(hex_code):
    # Convert each pair of hex digits to an integer (base 16)
    r = int(hex_code[0:2], 16)
    g = int(hex_code[2:4], 16)
    b = int(hex_code[4:6], 16)

    # Combine the integers into an RGB tuple and return it
    return (r, g, b)

# Apply the function to each row of the hex code column
df['rgb'] = df['hex'].apply(hex_to_rgb)
# Split the RGB triple into separate r, g, b columns
df[['r', 'g', 'b']] = df['hex'].apply(hex_to_rgb).apply(pd.Series)

df.sample()
brand brand_short product product_short hex H S V L group group_definition rgb r g b
371 MAC ma Studio Fix sff a0714d 26.0 0.52 0.63 52 4 BIPOC-recommended Brands with White Founders (160, 113, 77) 160 113 77

Here is a plot of the RGB values for all of the data. I thought it looked cool to visualize all of the shades in 3D!

  • Code from Bing Chat AI

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# Extract the R, G, and B values from the DataFrame
R = df['r'].values
G = df['g'].values
B = df['b'].values

# Create a 3D plot using the Axes3D class
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

# Plot the R, G, and B values on the 3D plot
ax.scatter(R, G, B, c=np.array([R, G, B]).T / 255)

# Set the labels for the axes
ax.set_xlabel('r')
ax.set_ylabel('g')
ax.set_zlabel('b')

# Show the plot
plt.show()
[Figure: 3D scatter of the r, g, b values, colored by shade]

Here is the same data plotted as above, but now we can rotate the figure, zoom, and use the cursor to read exact values off the plot.

  • Help from Bing Chat AI

import plotly.graph_objects as go

# Prefix '#' so plotly reads the hex strings as colors - help from LA Hansen
# (the hex codes have no '0x' prefix; str.lstrip('0x') would wrongly strip
# leading '0' digits, so plain concatenation is the safe fix)
df['color'] = '#' + df['hex']

# Create color scale
unique_colors = df['color'].unique()
color_scale = [(p, color) for p, color in zip(np.linspace(0, 1, len(unique_colors)), unique_colors)]

# Create scatter plot
fig = go.Figure(data=go.Scatter3d(
    x=df['r'],
    y=df['g'],
    z=df['b'],
    mode='markers',
    marker=dict(
        size=6,
        color=df['color'],                # set color to an array/list of desired values
        colorscale=color_scale,           # choose a colorscale
        opacity=0.8
    )
))

# tight layout - help from LA Yufei
# This code is to view the data at an angle specified (rotating plot)
name = 'eye = (x:1.5, y:0., z:0.)'
camera = dict(
    eye=dict(x=1.5, y=0, z=0.)
)

fig.update_layout(scene_camera=camera, title=name)
fig.show()

💡 K-means Clustering#

  • Code from Bing Chat AI

from sklearn.cluster import KMeans

# Extract the RGB values
rgb = df[['r', 'g', 'b']]

# Define the number of clusters
k = 6

# Fit the K-means model
kmeans = KMeans(n_clusters=k, random_state=0).fit(rgb)

# Get the cluster labels
labels = kmeans.labels_

# Add the cluster labels to the original data
df['Cluster'] = labels



# Plot the clusters in 3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(df['r'], df['g'], df['b'], c=df['Cluster'], cmap='viridis')
ax.set_xlabel('Red')
ax.set_ylabel('Green')
ax.set_zlabel('Blue')
plt.show()
[Figure: 3D scatter of r, g, b colored by K-means cluster]

Clusters are formed by assigning each point to the nearest cluster center. So, we could use the tooltip below to check that out if interested.
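
As an extra way to read the clusters, here is a small sketch (my addition; it assumes kmeans is still the 6-cluster RGB model fit above) that renders each cluster center as a hex code, giving a rough palette of six representative shades:

# Each cluster center is itself an RGB point, so it can be written as hex.
centers = kmeans.cluster_centers_.round().astype(int).tolist()
print(['#{:02x}{:02x}{:02x}'.format(r, g, b) for r, g, b in centers])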

Below, the 2D plot of r vs b is displayed. We use :N for nominal data: discrete categories with no ordering. So, the cluster numbers are just labels in this case.

#Since this is the k means clustering in 2 dimension, I chose to work with red & blue only.
col0 = "r"
col1 = "b"

from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=5, random_state=0) # fixed seed so the clusters are reproducible
kmeans.fit(df[[col0,col1]])
arr = kmeans.predict(df[[col0, col1]])
df["cluster"] = arr

alt.Chart(df).mark_circle(size = 50).encode(
    x=col0,
    y=col1,
    color="cluster:N", 
    tooltip = ["L","H","S","V","rgb", "brand"]
)

Linear Regression#

I would like to generate a line of best fit for my data, with L on the x-axis and V on the y-axis. Here we fit a linear regression model to the data.

from sklearn.linear_model import LinearRegression
regg = LinearRegression()
regg.fit(df[["L"]], df["V"])
LinearRegression()

regg.coef_ is an array with one entry per input feature, so we use [0] to get the single element inside.

regg.coef_[0] #coefficient/slope of line of best fit
0.009709348040742022
regg.intercept_ #y-intercept of line of best fit
0.14208414152087523
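
Putting the two numbers together, the fitted line is roughly V = 0.0097 * L + 0.142. Here is a quick sanity check (my addition) that computes one prediction by hand and compares it with the model:

L_value = 70
print(regg.coef_[0] * L_value + regg.intercept_)     # about 0.82
print(regg.predict(pd.DataFrame({"L": [L_value]})))  # should print the same value
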
df["pred"] = regg.predict(df[["L"]])
df["pred"] # predicted Values
0      0.977088
1      1.035344
2      1.025635
3      0.996507
4      0.773192
         ...   
620    0.967379
621    0.947960
622    0.967379
623    0.938251
624    0.918832
Name: pred, Length: 613, dtype: float64
c7 = alt.Chart(df).mark_line(color="magenta").encode(
    x = "L",
    y = "pred"
)
c7

We use linear regression because we want to predict a continuous quantitative value; in this case, the (alpha) Value V is what we predict.

c4 = alt.Chart(df).mark_circle().encode(
    x="L",
    y="V",
    color=alt.Color("brand", scale=alt.Scale(scheme="dark2")),
    tooltip = ["L", "H","V","S","brand"]
)
c4 + c7 #layering charts on top of each other

I think the line of best fit is a good fit for this data. As Lightness increases, Value increases as well, which makes sense since the slope is positive. As the lightness increases (closer to 100), the intensity of the strength of light increases (more opaque). Note that the prediction line doesn't end exactly at 1.0 or 100, which are technically the maximum values possible for V and L.
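
To back up "good fit" with a number (my addition), the R² score from scikit-learn should be high here, given the 0.98 correlation between V and L seen earlier:

# R^2 of the fitted line; with corr(V, L) around 0.98, expect roughly 0.96.
regg.score(df[["L"]], df["V"])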

🌲 Decision Tree Classifier (Feature Engineering)#

from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier(max_leaf_nodes=5)
cols = ["L", "H"]
clf.fit(df[cols], df["product"])
DecisionTreeClassifier(max_leaf_nodes=5)

I am using 5 leaf nodes, so the feature space is divided into 5 regions, each predicting a single product. So the model tries to classify the product name based on the H and L values, as you can see in the plot below.

import matplotlib.pyplot as plt
from sklearn.tree import plot_tree
fig = plt.figure(figsize=(20,10))
_ = plot_tree(clf, feature_names=clf.feature_names_in_, class_names=clf.classes_, filled=True)
[Figure: the fitted decision tree with 5 leaf nodes]

Here we split the data into training and testing sets. The training set is used to fit the model, while the test set evaluates it on new, unseen data. We do this to see how well the model generalizes instead of just memorizing the data points, which we do not want. We use double brackets for the inputs since we need a 2D array, and our output/prediction is the product. I chose a train size of 20% of the data.

X_train, X_test, y_train, y_test = train_test_split(df[["L","H"]], df["product"], train_size = 0.2)
clf.fit(X_train, y_train) # refit on the training portion only, so the test score reflects truly unseen data
clf.score(X_test,y_test)
0.1384928716904277

The score is the accuracy score for the model’s predictions. We see the score here for both the testing and training data.

clf.score(X_train,y_train)
0.11475409836065574

Here is the number of unique values in the product column. In other words, here is the count of all the different product names.

len(df["product"].unique())
37
1/37
0.02702702702702703

We compare the test and training scores to 1/37, where 37 is the number of product categories; 1/37 is the accuracy of guessing a product uniformly at random. Since the test and training scores are similar, there are no clear signs of overfitting, so 5 leaf nodes is a reasonable choice here.
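
One refinement worth noting (my addition): products have different numbers of shades, so a fairer baseline than a uniform random guess is always predicting the single most common product:

# Accuracy of always guessing the most frequent product name.
df["product"].value_counts(normalize=True).iloc[0]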

clf.classes_ #All of the product names 
array(['#1 CAKE MIX', 'ColorStay', 'Diorskin Forever', 'Double Wear',
       'Fit Me', 'Fit Me Matte', 'Flawless Finish', 'Foundation',
       'Foxy Finish', 'Full Coverage', 'Hi - Def', 'Infalliable',
       'Invisible Finish', 'Liquid Foundation', 'Make-Up Ecostay',
       'Matifying Fluid', 'Matte Wear', 'Oil Free', "PRO FILT'R",
       'Perfect Match', 'Petal Skin Fluid', 'Photo Perfect', 'RMK Liquid',
       'SKINgenius', 'Second to None', 'Secret Skin Maker Zero',
       'Skin Long-Wear', 'Studio Fix', 'Synchro Skin', 'Teint Idole',
       'The Foundation', 'True Color', 'True Match', 'Ultra HD',
       'Velvet Matte', 'X Factor', 'barePRO'], dtype=object)

Plots#

Below, I display the data in an Altair chart with pred2 as the color, H on the x-axis, and L on the y-axis. I am still using the model with 5 leaf nodes here.

df_art = pd.DataFrame(df, columns=cols)
df_art["product"] = df["product"]
df_art["pred2"] = clf.predict(df_art[cols])

c5= alt.Chart(df_art).mark_circle(size=50).encode(
    x = "H",
    y = "L",
    color=alt.Color("pred2", scale=alt.Scale(scheme="category20")),
    tooltip=["L", "H","product", "pred2"]
)
c5

From the chart above, I am predicting the product name based on lightness and hue values. We use classification to predict a discrete class label. Using the tooltip, we can compare each point's true product name with its color-coded predicted product name, so we can spot points predicted correctly and incorrectly. We can clearly see the decision boundaries of each product.

  • Repeating the same steps as above but with 100 leaf nodes to see a difference in the chart.

from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier(max_leaf_nodes=100)
cols = ["L", "H"]
clf.fit(df[cols], df["product"])
X_train, X_test, y_train, y_test = train_test_split(df[["L","H"]], df["product"], train_size = 0.2)
df_art = pd.DataFrame(df, columns=cols)
df_art["product"] = df["product"]
df_art["pred3"] = clf.predict(df_art[cols])

Since I am using 100 leaf nodes, there are more groups: with the larger leaf-node budget, more products can get regions of their own. So the decision boundaries are less clear with 100 leaf nodes, since there are so many groups.

c5= alt.Chart(df_art).mark_circle(size=50).encode(
    x = "H",
    y = "L",
    color=alt.Color("pred3", scale=alt.Scale(scheme="category20")),
    tooltip=["L", "H","product", "pred3"]
)

c5

Key Takeaways#

It is important to note that some brands have 40 shades, but it is crucial to see how those shades are distributed across the lightness range. These images are from the article that this data was collected for; the dataset source is referenced below.

#(extra)
from PIL import Image
im = Image.open('Fenty vs Makeup Forever.png')
ag = Image.open('US Bestsellers.png')
im.show()
ag.show()
[Figures: shade-distribution charts comparing Fenty vs. Make Up For Ever, and the US bestsellers]

The Ultimate Conclusion 😎#

In terms of the data, I found that the popular bestselling brands mentioned above do fairly well at actually representing a variety of shades, as they cover the widest shade ranges. Are all brands inclusive of all shade ranges? No: some brands do not offer a wide range of shades. Just because a brand claims to have “dark and light” shades does not mean all or many shade ranges are actually covered, and we could see this in the analysis above. I was able to get some insight into my main questions, though technically I would need a deeper, more complex analysis to answer them fully. I liked having guiding questions, which directed my project and kept me motivated.

Even though the model's predictions, testing, and training scores were nothing special, next time I think more variables should be included, even though that might make everything more complicated. I think that would require more complex code than what we learned in Math 10, so I did not want to overcomplicate things. I honestly had no idea that analyzing colors had so many components, so it got complicated to understand at times, but I learned a lot about code and colors, which was awesome.

Honestly, I learned a lot about data science through this course, and I learned how data science can apply to almost anything, for instance AI tools like ChatGPT or Bing Chat. I also learned a lot about colors and hex codes, which I had no clue about before. It is awesome how machines can predict things. I really learned a lot in this class, and it inspired me to go into the data science concentration!

References#

  • What is the source of your dataset(s)?

I found my dataset on Kaggle. https://www.kaggle.com/datasets/shivamb/makeup-shades-dataset


