McDonald’s Menu Analysis

Author: Jenny Tran

Course Project, UC Irvine, Math 10, W22

Introduction

McDonald’s is one of the most popular fast food chains in the United States, known for affordable but unhealthy foods and beverages. We will use the McDonald’s Nutrition Facts dataset to find which food items and categories appear to be the most and least healthy.

We will define healthy foods as items with the most protein and the fewest calories, among other measures. We will define unhealthy foods as items with the most sugar and calories and the least protein.

Main portion of the project

import numpy as np
import pandas as pd
import seaborn as sns
import altair as alt
import plotly.express as px
import plotly.offline as py
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, log_loss

Import the Data

df = pd.read_csv("menu.csv")
df

	Category	Item	Serving Size	Calories	Calories from Fat	Total Fat	Total Fat (% Daily Value)	Saturated Fat	Saturated Fat (% Daily Value)	Trans Fat	...	Carbohydrates	Carbohydrates (% Daily Value)	Dietary Fiber	Dietary Fiber (% Daily Value)	Sugars	Protein	Vitamin A (% Daily Value)	Vitamin C (% Daily Value)	Calcium (% Daily Value)	Iron (% Daily Value)
0	Breakfast	Egg McMuffin	4.8 oz (136 g)	300	120	13.0	20	5.0	25	0.0	...	31	10	4	17	3	17	10	0	25	15
1	Breakfast	Egg White Delight	4.8 oz (135 g)	250	70	8.0	12	3.0	15	0.0	...	30	10	4	17	3	18	6	0	25	8
2	Breakfast	Sausage McMuffin	3.9 oz (111 g)	370	200	23.0	35	8.0	42	0.0	...	29	10	4	17	2	14	8	0	25	10
3	Breakfast	Sausage McMuffin with Egg	5.7 oz (161 g)	450	250	28.0	43	10.0	52	0.0	...	30	10	4	17	2	21	15	0	30	15
4	Breakfast	Sausage McMuffin with Egg Whites	5.7 oz (161 g)	400	210	23.0	35	8.0	42	0.0	...	30	10	4	17	2	21	6	0	25	10
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
255	Smoothies & Shakes	McFlurry with Oreo Cookies (Small)	10.1 oz (285 g)	510	150	17.0	26	9.0	44	0.5	...	80	27	1	4	64	12	15	0	40	8
256	Smoothies & Shakes	McFlurry with Oreo Cookies (Medium)	13.4 oz (381 g)	690	200	23.0	35	12.0	58	1.0	...	106	35	1	5	85	15	20	0	50	10
257	Smoothies & Shakes	McFlurry with Oreo Cookies (Snack)	6.7 oz (190 g)	340	100	11.0	17	6.0	29	0.0	...	53	18	1	2	43	8	10	0	25	6
258	Smoothies & Shakes	McFlurry with Reese's Peanut Butter Cups (Medium)	14.2 oz (403 g)	810	290	32.0	50	15.0	76	1.0	...	114	38	2	9	103	21	20	0	60	6
259	Smoothies & Shakes	McFlurry with Reese's Peanut Butter Cups (Snack)	7.1 oz (202 g)	410	150	16.0	25	8.0	38	0.0	...	57	19	1	5	51	10	10	0	30	4

260 rows × 24 columns

df['Category'].value_counts()

Coffee & Tea          95
Breakfast             42
Smoothies & Shakes    28
Chicken & Fish        27
Beverages             27
Beef & Pork           15
Snacks & Sides        13
Desserts               7
Salads                 6
Name: Category, dtype: int64

print(f"Coffee & Tea has the most items with a total of {(df['Category'] == 'Coffee & Tea').sum()}")

Coffee & Tea has the most items with a total of 95
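
Because the categories are so unevenly sized, it helps to know how well a trivial guess would do before judging the classifier in the next section. The sketch below uses the counts printed above; `category_counts` and `baseline_accuracy` are illustrative names that do not appear in the original notebook.

```python
# Category counts copied from the value_counts() output above.
category_counts = {
    "Coffee & Tea": 95,
    "Breakfast": 42,
    "Smoothies & Shakes": 28,
    "Chicken & Fish": 27,
    "Beverages": 27,
    "Beef & Pork": 15,
    "Snacks & Sides": 13,
    "Desserts": 7,
    "Salads": 6,
}

total_items = sum(category_counts.values())

# The largest category and its share of the menu: a classifier that always
# guessed this category would be right this fraction of the time.
majority_category, majority_count = max(category_counts.items(), key=lambda kv: kv[1])
baseline_accuracy = majority_count / total_items

print(f"{majority_category}: {majority_count} of {total_items} items "
      f"({baseline_accuracy:.1%})")
```

Any classifier that predicts Category should beat this always-guess-the-majority baseline to be considered useful.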

KNeighborsClassifier

Use KNeighborsClassifier to predict the Category using Calories and Sodium.

clf = KNeighborsClassifier(n_neighbors=4)

X = df[["Calories", "Sodium"]]
y = df['Category']

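# Hold out 60% of the rows for testing; the split is random, so results vary between runs.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.6)

# Fit on the training rows only, so the test rows stay unseen by the model.
clf.fit(X_train, y_train)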

KNeighborsClassifier(n_neighbors=4)

df['Prediction'] = clf.predict(X)
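
To make the idea behind KNeighborsClassifier concrete, here is a minimal from-scratch sketch of nearest-neighbor voting. The points are hypothetical (Calories, Sodium) pairs, not rows from menu.csv, and `knn_predict` is an illustrative helper rather than scikit-learn's implementation.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    """Return the majority label among the k training points nearest to query."""
    # Order the training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    nearest_labels = [train_labels[i] for i in order[:k]]
    # Majority vote among the k nearest labels.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical (Calories, Sodium) pairs, NOT taken from menu.csv.
train_points = [(300, 750), (450, 860), (400, 780), (0, 10), (150, 30), (100, 15)]
train_labels = ["Breakfast", "Breakfast", "Breakfast",
                "Beverages", "Beverages", "Beverages"]

prediction = knn_predict(train_points, train_labels, query=(120, 40), k=3)
print(prediction)
```

Since the vote depends on raw Euclidean distance, whichever feature has the larger numeric range tends to dominate; standardizing Calories and Sodium before fitting may be worth trying.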

Altair

Use Altair to plot the actual and the predicted categories side by side.

c1 = alt.Chart(df).mark_circle().encode(
    x = "Sodium",
    y = "Calories",
    color = 'Category',
    tooltip = 'Item'
)

c2 = alt.Chart(df).mark_circle().encode(
    x = "Sodium",
    y = "Calories",
    color = 'Prediction',
    tooltip = 'Item'
)

c1 | c2

The first chart colors each item by its actual category and the second by its predicted category. Many more items are labelled Breakfast in the predicted chart than in the actual one. In both charts, Sodium and Calories appear to be positively correlated. We can also see that the drinks and beverages have the least sodium.

Next, we will find the number of neighbors (k) that gives the smallest log loss on the test set.

for k in range(10,50):
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(X_train, y_train)
    loss = log_loss(y_test, clf.predict_proba(X_test))
    print(k)
    print(loss)

10
3.3120738960940423
11
3.097198769819272
12
2.8921496270330396
13
2.901655283064176
14
2.2934615215939638
15
2.303168029898779
16
2.31457644369687
17
2.3282185367836825
18
2.328052254278475
19
2.340009135159213
20
2.142756164151032
21
1.9468874856039895
22
1.940569861791783
23
1.742740277377538
24
1.7482270386068899
25
1.7578548376770478
26
1.7581303091387004
27
1.7656781842106335
28
1.7694041997904968
29
1.5820999079470608
30
1.594377309250326
31
1.6011036914647103
32
1.6121611682957064
33
1.6228389634423437
34
1.6386336728933297
35
1.6504111525039924
36
1.6558699120306624
37
1.6631468941999572
38
1.6721948474042325
39
1.6843839877483247
40
1.6916566327585547
41
1.4990591506581366
42
1.5111139169286842
43
1.5241539302123701
44
1.535250593641166
45
1.5451405212684357
46
1.557352058251086
47
1.5662835247323774
48
1.5776393206201005
49
1.3904371313084163
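
The values printed above are log losses: the average negative log of the probability the classifier assigned to each item's true category. As a rough sketch of that calculation (the labels and probabilities below are made up for illustration, and `mean_log_loss` is a hypothetical helper, not the scikit-learn function):

```python
import math


def mean_log_loss(true_labels, predicted_probs, eps=1e-15):
    """Average negative log-probability assigned to the true class."""
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        # Clip so that a probability of exactly 0 does not produce log(0).
        p = min(max(probs.get(label, 0.0), eps), 1.0)
        total -= math.log(p)
    return total / len(true_labels)


# Made-up example: two items whose true categories are known.
true_labels = ["Salads", "Desserts"]

# Case 1: the true class gets a high probability both times.
confident = [{"Salads": 0.9, "Desserts": 0.1}, {"Salads": 0.2, "Desserts": 0.8}]
# Case 2: the first item's true class gets probability 0.
one_miss = [{"Salads": 0.0, "Desserts": 1.0}, {"Salads": 0.2, "Desserts": 0.8}]

confident_loss = mean_log_loss(true_labels, confident)
one_miss_loss = mean_log_loss(true_labels, one_miss)
print(confident_loss, one_miss_loss)
```

A single true class with probability 0 is penalized very heavily. With few neighbors, a k-nearest-neighbors model often gives the true class zero votes, which is likely why the smaller k values here have such large losses. scikit-learn's log_loss also clips probabilities away from exactly 0, though the exact threshold depends on the version.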

In the output above, the log_loss is smallest at k=49 (about 1.39), the largest value tried, so in this run setting n_neighbors equal to 49 gives the best fit. (Because the train/test split is random, this number may change if we run the notebook multiple times. The ideal k should fall somewhere between 35 and 50.)
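
Rather than reading the minimum off the printed list by eye, the losses can be collected and the minimizing k selected programmatically. The sketch below uses a handful of (k, loss) pairs copied from the output above; `losses` and `best_k` are illustrative names.

```python
# A few (k, log loss) pairs copied from the printed output above.
losses = {
    10: 3.3120738960940423,
    23: 1.742740277377538,
    29: 1.5820999079470608,
    41: 1.4990591506581366,
    45: 1.5451405212684357,
    49: 1.3904371313084163,
}

# The k whose log loss is smallest.
best_k = min(losses, key=losses.get)
print(best_k, losses[best_k])
```

In the notebook's loop, the same dictionary could be filled with `losses[k] = loss` instead of printing each value. Since the best k here sits at the edge of the range that was searched, a wider range might be worth checking.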

Plotly

Create bar charts using Plotly. Each chart shows the food categories together with the average amount of one nutrient, starting with protein and sugar.

# Average protein per category, as a DataFrame with 'Category' and 'Protein' columns
avgprotein = df.groupby('Category')['Protein'].mean().reset_index()

fig = px.bar(avgprotein, x='Category', y='Protein')
fig.show()

Chicken & Fish has the highest average protein. Let’s find which food item in this category has the most protein.

chifi = df[df['Category'] == 'Chicken & Fish']
chifi = chifi[['Item', 'Protein']]
chifi.sort_values(by=['Protein'], ascending=False)

	Item	Protein
82	Chicken McNuggets (40 piece)	87
81	Chicken McNuggets (20 piece)	44
60	Premium Grilled Chicken Club Sandwich	40
64	Bacon Clubhouse Grilled Chicken Sandwich	40
59	Premium Crispy Chicken Club Sandwich	36
62	Premium Grilled Chicken Ranch BLT Sandwich	36
63	Bacon Clubhouse Crispy Chicken Sandwich	36
71	Premium McWrap Chicken & Bacon (Grilled Chicken)	36
70	Premium McWrap Chicken & Bacon (Crispy Chicken)	32
61	Premium Crispy Chicken Ranch BLT Sandwich	32
75	Premium McWrap Southwest Chicken (Grilled Chic...	31
73	Premium McWrap Chicken & Ranch (Grilled Chicken)	30
58	Premium Grilled Chicken Classic Sandwich	28
77	Premium McWrap Chicken Sweet Chili (Grilled Ch...	27
74	Premium McWrap Southwest Chicken (Crispy Chicken)	27
72	Premium McWrap Chicken & Ranch (Crispy Chicken)	27
57	Premium Crispy Chicken Classic Sandwich	24
76	Premium McWrap Chicken Sweet Chili (Crispy Chi...	23
67	Bacon Cheddar McChicken	22
80	Chicken McNuggets (10 piece)	22
65	Southern Style Crispy Chicken Sandwich	21
68	Bacon Buffalo Ranch McChicken	20
83	Filet-O-Fish	15
69	Buffalo Ranch McChicken	14
66	McChicken	14
79	Chicken McNuggets (6 piece)	13
78	Chicken McNuggets (4 piece)	9

The Chicken McNuggets (40 piece) have the most protein (87 g), followed by the Chicken McNuggets (20 piece) with 44 g.
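
The top two rows are the same food in larger portions, so it may be worth checking how much of the ranking is due to portion size. A small sketch using the Chicken McNuggets protein values from the table above (`nugget_protein` and `protein_per_piece` are illustrative names):

```python
# Protein (g) for each Chicken McNuggets portion, copied from the table above.
# Keys are the number of pieces in the portion.
nugget_protein = {4: 9, 6: 13, 10: 22, 20: 44, 40: 87}

# Protein per single nugget for each portion size.
protein_per_piece = {
    pieces: grams / pieces for pieces, grams in nugget_protein.items()
}

for pieces, per_piece in protein_per_piece.items():
    print(f"{pieces:>2} piece: {per_piece:.2f} g protein per piece")
```

Every portion works out to a little over 2 g of protein per piece, which suggests the 40 piece box ranks first because it is the largest serving, not because it is richer in protein.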

Next, we will check the average sugar of each category.

# Average sugar per category, as a DataFrame with 'Category' and 'Sugars' columns
avgsugar = df.groupby('Category')['Sugars'].mean().reset_index()

fig = px.bar(avgsugar, x='Category', y='Sugars')
fig.show()

Smoothies & Shakes has the highest average sugar. Let’s find which item in this category has the most sugar.

smsh = df[df['Category'] == 'Smoothies & Shakes']
smsh = smsh[['Item', 'Sugars']]
smsh.sort_values(by=['Sugars'], ascending=False)

	Item	Sugars
253	McFlurry with M&M’s Candies (Medium)	128
246	Strawberry Shake (Large)	123
249	Chocolate Shake (Large)	120
251	Shamrock Shake (Large)	115
258	McFlurry with Reese's Peanut Butter Cups (Medium)	103
243	Vanilla Shake (Large)	101
245	Strawberry Shake (Medium)	100
248	Chocolate Shake (Medium)	97
250	Shamrock Shake (Medium)	93
252	McFlurry with M&M’s Candies (Small)	89
256	McFlurry with Oreo Cookies (Medium)	85
242	Vanilla Shake (Medium)	81
244	Strawberry Shake (Small)	79
247	Chocolate Shake (Small)	77
240	Mango Pineapple Smoothie (Large)	72
234	Blueberry Pomegranate Smoothie (Large)	70
237	Strawberry Banana Smoothie (Large)	70
255	McFlurry with Oreo Cookies (Small)	64
241	Vanilla Shake (Small)	63
254	McFlurry with M&M’s Candies (Snack)	59
239	Mango Pineapple Smoothie (Medium)	56
236	Strawberry Banana Smoothie (Medium)	54
233	Blueberry Pomegranate Smoothie (Medium)	54
259	McFlurry with Reese's Peanut Butter Cups (Snack)	51
238	Mango Pineapple Smoothie (Small)	46
235	Strawberry Banana Smoothie (Small)	44
232	Blueberry Pomegranate Smoothie (Small)	44
257	McFlurry with Oreo Cookies (Snack)	43

The McFlurry with M&M’s Candies (Medium) has the most sugar.
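
As with protein, raw sugar totals favor larger servings. One way to compare items more fairly is sugar per 100 g, which requires pulling the gram weight out of the Serving Size text. The sketch below assumes the "N oz (M g)" pattern seen in the rows displayed in the first table of this notebook; `grams_from_serving` is a hypothetical helper, and it returns None for a string without a gram weight (such as the made-up "16 fl oz cup" used in the test call).

```python
import re


def grams_from_serving(serving_size):
    """Extract the gram weight from a string such as '10.1 oz (285 g)'."""
    match = re.search(r"\((\d+(?:\.\d+)?)\s*g\)", serving_size)
    return float(match.group(1)) if match else None


# (Item, Serving Size, Sugars) copied from rows 255-259 of the first table.
rows = [
    ("McFlurry with Oreo Cookies (Small)", "10.1 oz (285 g)", 64),
    ("McFlurry with Oreo Cookies (Medium)", "13.4 oz (381 g)", 85),
    ("McFlurry with Oreo Cookies (Snack)", "6.7 oz (190 g)", 43),
    ("McFlurry with Reese's Peanut Butter Cups (Medium)", "14.2 oz (403 g)", 103),
    ("McFlurry with Reese's Peanut Butter Cups (Snack)", "7.1 oz (202 g)", 51),
]

# Sugar per 100 g of serving for each item.
sugar_per_100g = {
    item: 100 * sugars / grams_from_serving(size) for item, size, sugars in rows
}

for item, value in sugar_per_100g.items():
    print(f"{item}: {value:.1f} g sugar per 100 g")

# A string with no gram weight in parentheses yields None.
print(grams_from_serving("16 fl oz cup"))
```

On the full DataFrame, the same helper could be applied with `df['Serving Size'].apply(grams_from_serving)`, keeping in mind that rows where it returns None would need separate handling.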

Next, we will find which category has the highest average total fat.

# Average total fat per category, as a DataFrame with 'Category' and 'Total Fat' columns
avgfat = df.groupby('Category')['Total Fat'].mean().reset_index()

fig = px.bar(avgfat, x='Category', y='Total Fat')
fig.show()

The Beef & Pork, Breakfast, and Chicken & Fish categories have the highest average total fat.

Correlation between Nutrients (Vitamins, Iron, Fat, etc.)

dailyper = df[['Vitamin A (% Daily Value)','Vitamin C (% Daily Value)','Calcium (% Daily Value)',
      'Iron (% Daily Value)','Total Fat (% Daily Value)',
      'Cholesterol (% Daily Value)','Carbohydrates (% Daily Value)']]

dailyper.corr()

	Vitamin A (% Daily Value)	Vitamin C (% Daily Value)	Calcium (% Daily Value)	Iron (% Daily Value)	Total Fat (% Daily Value)	Cholesterol (% Daily Value)	Carbohydrates (% Daily Value)
Vitamin A (% Daily Value)	1.000000	0.069171	0.179190	0.137879	0.054038	0.080059	0.083376
Vitamin C (% Daily Value)	0.069171	1.000000	-0.215380	0.001292	-0.089353	-0.083315	-0.035450
Calcium (% Daily Value)	0.179190	-0.215380	1.000000	0.034149	0.162031	0.132382	0.590263
Iron (% Daily Value)	0.137879	0.001292	0.034149	1.000000	0.735478	0.653167	0.210643
Total Fat (% Daily Value)	0.054038	-0.089353	0.162031	0.735478	1.000000	0.680378	0.460298
Cholesterol (% Daily Value)	0.080059	-0.083315	0.132382	0.653167	0.680378	1.000000	0.270992
Carbohydrates (% Daily Value)	0.083376	-0.035450	0.590263	0.210643	0.460298	0.270992	1.000000

From this table we can see that Cholesterol and Iron are both positively correlated with Total Fat (about 0.68 and 0.74, respectively): items high in Total Fat tend to be high in Cholesterol and Iron as well. Ideally, we want foods with less Cholesterol and Total Fat, and more Iron. Carbohydrates has the next-highest correlation with Iron (about 0.21) after Total Fat and Cholesterol, so high-carbohydrate foods are one place to look instead, although a correlation of 0.21 is weak.
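
To see how the columns rank against Iron, the Iron row of the correlation table can be sorted. The sketch below copies that row from the table above (column names shortened for readability) and also prints r squared, a common rough gauge of how much of the variation a straight-line relationship would account for.

```python
# Correlations of 'Iron (% Daily Value)' with the other columns,
# copied from the table above.
iron_corr = {
    "Vitamin A": 0.137879,
    "Vitamin C": 0.001292,
    "Calcium": 0.034149,
    "Total Fat": 0.735478,
    "Cholesterol": 0.653167,
    "Carbohydrates": 0.210643,
}

# Sort from the strongest to the weakest correlation with Iron.
ranked = sorted(iron_corr.items(), key=lambda kv: kv[1], reverse=True)

for name, r in ranked:
    print(f"{name:<14} r = {r:.3f}   r^2 = {r * r:.3f}")
```

Carbohydrates comes third, but its r squared is below 0.05, so carbohydrate content alone says little about iron content.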

Salads

Salads are generally regarded as healthy. Let’s find which salad contains the most dietary fiber.

salads = df[df['Category']=='Salads']
salads

	Category	Item	Serving Size	Calories	Calories from Fat	Total Fat	Total Fat (% Daily Value)	Saturated Fat	Saturated Fat (% Daily Value)	...	Carbohydrates (% Daily Value)	Dietary Fiber	Dietary Fiber (% Daily Value)	Sugars	Protein	Vitamin A (% Daily Value)	Vitamin C (% Daily Value)	Calcium (% Daily Value)	Iron (% Daily Value)	Prediction
84	Salads	Premium Bacon Ranch Salad (without Chicken)	7.9 oz (223 g)	140	70	7.0	11	3.5	18	...	3	3	12	4	9	170	30	15	6	Coffee & Tea
85	Salads	Premium Bacon Ranch Salad with Crispy Chicken	9 oz (255 g)	380	190	21.0	33	6.0	29	...	7	2	10	5	25	100	25	15	8	Chicken & Fish
86	Salads	Premium Bacon Ranch Salad with Grilled Chicken	8.5 oz (241 g)	220	80	8.0	13	4.0	20	...	3	2	10	4	29	110	30	15	8	Snacks & Sides
87	Salads	Premium Southwest Salad (without Chicken)	8.1 oz (230 g)	140	40	4.5	7	2.0	9	...	7	6	23	6	6	160	25	15	10	Beverages
88	Salads	Premium Southwest Salad with Crispy Chicken	12.3 oz (348 g)	450	190	22.0	33	4.5	22	...	14	7	28	12	23	170	30	15	15	Chicken & Fish
89	Salads	Premium Southwest Salad with Grilled Chicken	11.8 oz (335 g)	290	80	8.0	13	2.5	13	...	9	7	28	10	27	170	30	15	15	Snacks & Sides

6 rows × 25 columns

fig = px.pie(salads, values='Dietary Fiber', names='Item',
             title='Dietary Fiber in Salads',)
fig.show()

The Southwest salads have the most dietary fiber overall. The Premium Southwest Salad with Crispy Chicken and the Premium Southwest Salad with Grilled Chicken tie for the highest, at 7 g each.
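
The pie chart can be cross-checked directly from the Dietary Fiber column of the salads table above. In this sketch, `fiber_g`, `top_items`, and `totals` are illustrative names, and grouping by the text "Bacon Ranch" or "Southwest" in the item name is an assumption about how to split the two salad bases.

```python
# Dietary Fiber (g) for each salad, copied from the salads table above.
fiber_g = {
    "Premium Bacon Ranch Salad (without Chicken)": 3,
    "Premium Bacon Ranch Salad with Crispy Chicken": 2,
    "Premium Bacon Ranch Salad with Grilled Chicken": 2,
    "Premium Southwest Salad (without Chicken)": 6,
    "Premium Southwest Salad with Crispy Chicken": 7,
    "Premium Southwest Salad with Grilled Chicken": 7,
}

# All items that share the highest fiber value (there can be ties).
top_grams = max(fiber_g.values())
top_items = [item for item, grams in fiber_g.items() if grams == top_grams]

# Total fiber per salad base, matched by a substring of the item name.
totals = {"Bacon Ranch": 0, "Southwest": 0}
for item, grams in fiber_g.items():
    for base in totals:
        if base in item:
            totals[base] += grams

print(top_items)
print(totals)
```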

Summary

There are multiple ways to interpret the results, depending on the diet someone follows. For a high-protein diet, it’s best to choose items from the Chicken & Fish category. For a low-calorie diet, it’s best to avoid high-sodium foods, since sodium is positively correlated with calories; this includes items in the Breakfast and Chicken & Fish categories. Smoothies & Shakes should generally be avoided, as they contain far more sugar on average than the other categories. It’s hard to find foods that are high in protein and iron but low in total fat, cholesterol, and sodium. The Premium Southwest Salad is probably the best item to choose for an overall healthy diet: the salads, overall, didn’t contain a lot of sugar and fat and had a good amount of protein, and this salad in particular was high in dietary fiber.