We will work with data on the fat and protein content of items on the
Burger King menu. The data are in a file named BKmenu.txt.
Double-clicking on the file should open it in a text editor (usually
Notepad in Windows), but you will probably find it hard to work with
there. One strategy is to select everything and paste it into an empty
Excel spreadsheet. Excel will usually separate the data into columns, and
you can then copy one column at a time into the R data editor or read it
in with the scan() function. From here on we assume you have found some
way to get the data into R as two variables named `fat` and `protein`.

The command that produces the regression equation and related information
is not what you might expect. Here `fat` is the dependent variable and
`protein` the independent variable. The formula `fat ~ protein` reads
"fat modeled as a function of protein". `lm` stands for linear model. By
itself, `lm` prints only the intercept and slope. Wrapping it in
`summary` produces the fuller table below.
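If BKmenu.txt is laid out as a plain text table, read.table() may be simpler than going through Excel. The sketch below assumes a hypothetical layout: whitespace-separated columns under a header row that names them protein and fat. Check the real file and adjust header= or col.names= to match. So that the sketch runs on its own, it first writes a few made-up rows to a temporary file. These rows are not the Burger King data. The sketch also stores the fitted model once in `fit`, which saves retyping lm() at every later step.

```r
# Made-up rows for illustration only; these are NOT the Burger King values.
# With the real data, skip this step and use path <- "BKmenu.txt" instead.
path <- tempfile(fileext = ".txt")
writeLines(c("protein fat",
             "25 40",
             "12 18",
             "30 35",
             "5 10",
             "18 30"), path)

# Hypothetical layout: a header row, then whitespace-separated columns
bk <- read.table(path, header = TRUE)

# Fit once and keep the result; data = bk tells lm() where the columns live
fit <- lm(fat ~ protein, data = bk)

coef(fit)                # intercept and slope only, as printing lm() shows
summary(fit)             # the full summary table
summary(fit)$r.squared   # Multiple R-squared
summary(fit)$sigma       # residual standard error, s_e
```

The commands below refer to `fat` and `protein` directly. To use those bare names, run attach(bk) first, or wrap the commands in with(bk, ...).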

```
> summary(lm(formula = fat ~ protein))

Call:
lm(formula = fat ~ protein)

Residuals:
    Min      1Q  Median      3Q     Max 
-11.726  -8.772   1.239   7.029  20.052 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   6.4113     2.6466   2.423   0.0217 *  
protein       0.9769     0.1212   8.057  5.4e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 9.311 on 30 degrees of freedom
Multiple R-Squared: 0.6839,     Adjusted R-squared: 0.6734 
F-statistic: 64.92 on 1 and 30 DF,  p-value: 5.402e-09 
```

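There is much more here than you really need. The basic information is
the two coefficient estimates, the residual standard error, and the
Multiple R-squared. You can read off the regression equation as

$$\widehat{\text{fat}} = 6.4113 + 0.9769 \times \text{protein},$$

with $R^2 = 0.6839 = 68.39\%$ and $s_e = 9.311$. The bad news is that fat
tends to go up when protein content goes up. The slope is positive: each
additional unit of protein is associated with about 0.98 more units of
fat on average. The good news is that there is a lot of scatter around
the line. Protein accounts for only about 68% of the variation in fat
($R^2$), and individual items typically differ from the line by around
9.3 units ($s_e$), so you may find exceptions.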
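As a worked example of using the equation, the sketch below plugs a protein value into the fitted line, using the coefficients and $s_e$ printed above. The protein value of 20 is an arbitrary, hypothetical choice. The band of plus or minus two residual standard errors is only a rough rule of thumb for where most individual items fall. It ignores uncertainty in the estimated coefficients. A stored fit can account for that with predict(fit, newdata = data.frame(protein = 20), interval = "prediction").

```r
# Coefficients and s_e copied from the summary() output above
b0  <- 6.4113   # intercept
b1  <- 0.9769   # slope for protein
s_e <- 9.311    # residual standard error

protein_new <- 20                  # hypothetical menu item
fat_hat <- b0 + b1 * protein_new   # predicted fat from the regression equation
fat_hat                            # 25.9493

# Rough range for an individual item: prediction +/- 2 * s_e
c(lower = fat_hat - 2 * s_e, upper = fat_hat + 2 * s_e)   # about 7.3 to 44.6
```

The width of that range, roughly 37 units of fat, is one concrete way to see the "lots of scatter" described above.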

You can also compute or graph the residuals.

```
> residuals(lm(formula = fat ~ protein))
          1           2           3           4           5           6 
  4.2599876   7.3757304   3.6998103   6.8155532  -9.9946466  -7.9483495 
          7           8           9          10          11          12 
-10.6937153  -6.6011210  -6.5316753 -10.0872409   8.1442448 -10.7168638 
         13          14          15          16          17          18 
  6.9127591  -8.1566866   7.6812735   5.6812735  -0.2492808  -2.1335380 
         19          20          21          22          23          24 
  5.6349763  11.0285020   2.7275706  20.0516505  15.8896106   7.7275706 
         25          26          27          28          29          30 
 10.7275706   6.6349763 -10.1335380 -11.1103894 -10.1335380  -6.4113208 
         31          32 
 -8.3650237 -11.7261323 
> plot(protein, residuals(lm(formula = fat ~ protein)))
```

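These look reasonably random, but they are not clumped around zero. Instead they fall into two groups. The positive residuals lie mostly between about +5 and +11, with two larger ones near 16 and 20. The negative residuals lie between about -6 and -12. Only 5 of the 32 residuals lie between -5 and +5.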

If you plan to do much with the residuals, you may wish to store them in
a variable for further work. For example, here they are stored in a
variable `res` and then a histogram is made.

```
> res = residuals(lm(formula = fat ~ protein))
> hist(res)
```

The histogram is not reproduced here, but it shows signs of the bimodality (a two-peaked shape) mentioned above.
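Because the residuals are printed in full above, you can check both the two-group pattern and the link between the residuals and $s_e$. The sketch below types in the 32 residuals exactly as printed, rounded to seven decimals. It draws a histogram with bins of width 5, a display choice made for this illustration, and counts how many residuals fall below -5, between -5 and 5, and above 5. It then recomputes the residual standard error as $\sqrt{\sum e_i^2 / (n-2)}$ with $n - 2 = 30$ degrees of freedom. The result should match the 9.311 reported by summary() up to rounding.

```r
# The 32 residuals as printed by residuals(lm(formula = fat ~ protein))
res <- c(  4.2599876,   7.3757304,   3.6998103,   6.8155532,  -9.9946466,
          -7.9483495, -10.6937153,  -6.6011210,  -6.5316753, -10.0872409,
           8.1442448, -10.7168638,   6.9127591,  -8.1566866,   7.6812735,
           5.6812735,  -0.2492808,  -2.1335380,   5.6349763,  11.0285020,
           2.7275706,  20.0516505,  15.8896106,   7.7275706,  10.7275706,
           6.6349763, -10.1335380, -11.1103894, -10.1335380,  -6.4113208,
          -8.3650237, -11.7261323)

# Histogram with bins of width 5 (an arbitrary choice for display)
hist(res, breaks = seq(-15, 25, by = 5))

# How many residuals are well below, near, and well above zero?
groups <- cut(res, breaks = c(-15, -5, 5, 25))
table(groups)   # 14 below -5, 5 between -5 and 5, 13 above 5

# With an intercept in the model, residuals average zero (up to rounding)
mean(res)

# Residual standard error: sqrt(RSS / (n - 2)); compare with 9.311
n <- length(res)
sqrt(sum(res^2) / (n - 2))
```

The counts make the bimodality concrete. The mean is zero by construction, so "not clumped around zero" means few residuals lie near zero, not that the residuals are off-center.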

© 2006 Robert W. Hayden