# Two-Way Analysis of Variance with R

Two-Way Analysis of Variance (ANOVA) is a technique for studying the relationship between a quantitative dependent variable and two qualitative independent variables. Usually we want to know whether the mean of the dependent variable differs across the levels of the qualitative variables. As an example, we will use data from a student project reported in Stats: Data and Models (2nd ed.) by De Veaux, Velleman and Bock (Addison-Wesley, 2008), Chapter 29, Exercise 11. The student was interested in her success at basketball free throws. The study investigated whether the quantitative variable "number of shots made" (i.e., successfully completed out of 50 tries) was related to two qualitative variables, "Time of Day" and "Shoes Worn". ANOVA is commonly used with experimental studies, and this is one. You can find the data at our site as a plain text file and as an Excel spreadsheet. Download the text file now and save it to R's working directory, which is where R looks for files by default (type getwd() in R to see which directory that is).

R can read data from a text file. The file must be laid out as a table, with one column per variable. All columns must be the same length, and missing values must be coded as "NA". Optionally, the first row of the file may contain names for the variables. To use the file you just downloaded, define a variable equal to its contents with read.table(). Below, filename.txt is a placeholder for the name under which you saved the file:

    > baskball <- read.table("filename.txt", header=TRUE)

The argument header=TRUE tells R that the first row of the file should be interpreted as variable names. (There must be a name for every variable, and the names must not contain spaces.) You can now list the objects you have created in R with

    > objects()
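The following sketch shows the file-format rules above in action without the download. It uses the text argument of read.table(), which reads from a string instead of a file. This is an illustration only. The rows are an excerpt of the free-throw data, except that the last count has been replaced by a hypothetical NA to show how missing values are handled. The names file_text, demo and no_header are illustrative.

```r
# Hypothetical excerpt in the layout read.table() expects: a header row of
# variable names (no spaces), one column per variable, NA for a missing value.
# The first three data rows come from the free-throw data; the NA in the
# last row is invented for illustration only.
file_text <- "Time Shoes Made
Morning Others 25
Morning Others 26
Night Others 27
Night Others NA"

# With header = TRUE the first row supplies the column names.
demo <- read.table(text = file_text, header = TRUE)
print(names(demo))                     # "Time" "Shoes" "Made"
print(sum(is.na(demo$Made)))           # "NA" was read as a missing value
print(mean(demo$Made, na.rm = TRUE))   # average of the non-missing counts

# With header = FALSE the names row is treated as data: the columns get the
# default names V1, V2, V3, and the third column is no longer numeric.
no_header <- read.table(text = file_text, header = FALSE)
print(names(no_header))
print(nrow(no_header))
```

The comparison at the end is why header=TRUE matters. Without it, the word "Made" becomes a data value and R can no longer treat that column as numbers.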

This should return baskball along with any other objects you may have created. The variables inside baskball do not appear on this list, because they are tucked away as columns of the data frame rather than stored as separate objects. To see their names, type

    > names(baskball)

To bring them out of hiding, attach the data frame to R's search path:

    > attach(baskball)

Then you can refer to Time, Shoes and Made directly, provided you remember that R is case-sensitive: made or MADE will not be found.
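attach() is convenient, but attached columns can mask, or be masked by, other objects with the same names. For that reason, many R users instead pick out a column with $ or evaluate an expression inside the data frame with with(). The sketch below makes three points:

- It types in the 16 observations from this study by hand so that it runs on its own.
- It demonstrates the case-sensitivity point.
- It detaches the data frame again before showing the two alternatives.

```r
# The 16 free-throw observations, entered by hand so the sketch is
# self-contained (Time and Shoes stored as factors).
baskball <- data.frame(
  Time  = factor(rep(rep(c("Morning", "Night"), each = 2), times = 4)),
  Shoes = factor(rep(rep(c("Others", "Favorite"), each = 4), times = 2)),
  Made  = c(25, 26, 27, 27, 32, 22, 30, 34, 35, 34, 33, 30, 33, 37, 36, 38)
)

attach(baskball)
print(exists("Made"))   # TRUE: found on the search path after attach()
print(exists("made"))   # FALSE: R is case-sensitive
detach(baskball)        # take the data frame off the search path again

# Alternatives that do not need attach():
print(mean(baskball$Made))          # $ picks out a single column
print(with(baskball, mean(Made)))   # with() evaluates inside the data frame
```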

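The data set contains these 16 observations:

| Time | Shoes | Made |
|---|---|---|
| Morning | Others | 25 |
| Morning | Others | 26 |
| Night | Others | 27 |
| Night | Others | 27 |
| Morning | Favorite | 32 |
| Morning | Favorite | 22 |
| Night | Favorite | 30 |
| Night | Favorite | 34 |
| Morning | Others | 35 |
| Morning | Others | 34 |
| Night | Others | 33 |
| Night | Others | 30 |
| Morning | Favorite | 33 |
| Morning | Favorite | 37 |
| Night | Favorite | 36 |
| Night | Favorite | 38 |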
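If you would rather not download the file, you can type the table above straight into R. This sketch builds an equivalent baskball data frame with data.frame() and rep(), and then counts the observations in each Time and Shoes combination.

It stores Time and Shoes as factors. That is a choice made here, not something the original file forces. read.table() in R 4.0 and later reads text columns as character vectors, which aov() and tapply() also accept.

```r
# Reading down the table: Time alternates in pairs (Morning, Morning,
# Night, Night, ...), Shoes alternates in blocks of four (Others x4,
# Favorite x4, ...), and Made is the number of shots made out of 50.
baskball <- data.frame(
  Time  = factor(rep(rep(c("Morning", "Night"), each = 2), times = 4)),
  Shoes = factor(rep(rep(c("Others", "Favorite"), each = 4), times = 2)),
  Made  = c(25, 26, 27, 27, 32, 22, 30, 34,
            35, 34, 33, 30, 33, 37, 36, 38)
)

str(baskball)

# Observations per Time x Shoes cell: every cell has 4, so the design
# is balanced.
cell_counts <- table(baskball$Time, baskball$Shoes)
print(cell_counts)
```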

We can compare the two times or the two shoes by looking at summary statistics or at parallel boxplots. To get the means for each level of each factor, use R's tapply command. It takes three arguments: the data you wish to summarize, the factor that determines the groups, and the function you wish to apply to each group.

    > tapply(Made, Time, mean)
    Morning   Night
     30.500  31.875
    > tapply(Made, Shoes, mean)
    Favorite   Others
      32.750   29.625
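The sketch below computes the same means in one self-contained block. It also passes a list of both factors to tapply() to get the four Time by Shoes cell means, and it draws the parallel boxplots mentioned above. The data are typed in from the table of observations. The names time_means, shoe_means, cell_means and shoe_diff are illustrative.

```r
baskball <- data.frame(
  Time  = factor(rep(rep(c("Morning", "Night"), each = 2), times = 4)),
  Shoes = factor(rep(rep(c("Others", "Favorite"), each = 4), times = 2)),
  Made  = c(25, 26, 27, 27, 32, 22, 30, 34, 35, 34, 33, 30, 33, 37, 36, 38)
)

# Mean shots made for each level of each factor.
time_means <- with(baskball, tapply(Made, Time, mean))
shoe_means <- with(baskball, tapply(Made, Shoes, mean))
print(time_means)
print(shoe_means)

# A list of both factors gives a table of the four cell means.
cell_means <- with(baskball, tapply(Made, list(Time, Shoes), mean))
print(cell_means)

# Favorite minus other shoes, in shots made out of 50.
shoe_diff <- as.numeric(shoe_means["Favorite"] - shoe_means["Others"])
print(shoe_diff)

# Parallel boxplots of Made for each factor, side by side.
par(mfrow = c(1, 2))
boxplot(Made ~ Time, data = baskball, main = "Time of day", ylab = "Shots made")
boxplot(Made ~ Shoes, data = baskball, main = "Shoes worn")
```

The cell means are worth a look before fitting the interaction model below. A large gap between how the shoes compare in the morning and at night would hint at an interaction.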

The means suggest she does better at night and in her favorite shoes, but the differences could just be due to natural variability. We can check with ANOVA. We prefer to start with a model that includes the interaction between the two factors. In R's model formulas, Time*Shoes is shorthand for Time + Shoes + Time:Shoes, the two main effects plus their interaction. R is a bit roundabout here: we first run the ANOVA, store the results in a variable, and then generate a summary of those results.

    > int <- aov(Made ~ Time*Shoes)
    > summary(int)
                Df  Sum Sq Mean Sq F value Pr(>F)
    Time         1   7.562   7.562  0.3441 0.5684
    Shoes        1  39.062  39.062  1.7773 0.2072
    Time:Shoes   1  18.062  18.062  0.8218 0.3825
    Residuals   12 263.750  21.979
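The design is balanced, with 4 observations in every Time and Shoes cell. As a result, the main-effect sums of squares in this table can be checked by hand. Each one is a sum over the factor's levels, with one term per level:

(observations at that level) × (level mean − grand mean)²

The sketch below re-enters the data so it runs on its own and performs that check. It then rebuilds the Shoes F value and p-value from the mean squares with pf(). factor_ss is a hypothetical helper written for this example, not part of R.

```r
baskball <- data.frame(
  Time  = factor(rep(rep(c("Morning", "Night"), each = 2), times = 4)),
  Shoes = factor(rep(rep(c("Others", "Favorite"), each = 4), times = 2)),
  Made  = c(25, 26, 27, 27, 32, 22, 30, 34, 35, 34, 33, 30, 33, 37, 36, 38)
)

int <- aov(Made ~ Time * Shoes, data = baskball)
tab <- summary(int)[[1]]   # the ANOVA table as a data frame (rows in printed order)
print(tab)

# Hypothetical helper: main-effect sum of squares for a balanced design.
grand <- mean(baskball$Made)
factor_ss <- function(f) {
  level_means <- tapply(baskball$Made, f, mean)
  level_n     <- tapply(baskball$Made, f, length)
  sum(level_n * (level_means - grand)^2)
}
ss_time  <- factor_ss(baskball$Time)    # compare with the Time row
ss_shoes <- factor_ss(baskball$Shoes)   # compare with the Shoes row
print(c(Time = ss_time, Shoes = ss_shoes))

# F = factor mean square / residual mean square (1 and 12 df here);
# the p-value is the upper tail of the F distribution.
ms_resid <- tab[["Mean Sq"]][4]
f_shoes  <- (ss_shoes / 1) / ms_resid
p_shoes  <- pf(f_shoes, df1 = 1, df2 = 12, lower.tail = FALSE)
print(c(F = f_shoes, p = p_shoes))
```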

The interaction term's p-value of 0.3825 suggests we need not worry about interaction, so we repeat the process with a simple additive model.

    > noint <- aov(Made ~ Time + Shoes)
    > summary(noint)
                Df  Sum Sq Mean Sq F value Pr(>F)
    Time         1   7.562   7.562  0.3489 0.5649
    Shoes        1  39.062  39.062  1.8020 0.2024
    Residuals   13 281.812  21.678
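Between the two tables, the main-effect sums of squares are unchanged while the residual line grows. Dropping the interaction folds its sum of squares (18.062) and its degree of freedom into the residuals: 263.750 + 18.062 = 281.812 on 13 df. That is why the F values shift slightly.

The sketch below confirms this and compares the two nested models with anova(). That F test should reproduce the interaction line of the first table, since both use the larger model's residual mean square. It re-enters the data so it runs on its own.

```r
baskball <- data.frame(
  Time  = factor(rep(rep(c("Morning", "Night"), each = 2), times = 4)),
  Shoes = factor(rep(rep(c("Others", "Favorite"), each = 4), times = 2)),
  Made  = c(25, 26, 27, 27, 32, 22, 30, 34, 35, 34, 33, 30, 33, 37, 36, 38)
)

int   <- aov(Made ~ Time * Shoes, data = baskball)
noint <- aov(Made ~ Time + Shoes, data = baskball)

# Residual sums of squares and residual degrees of freedom of each model.
print(c(with_interaction = deviance(int), additive = deviance(noint)))
print(c(df.residual(int), df.residual(noint)))

# F test of the additive model against the interaction model.
cmp <- anova(noint, int)
print(cmp)
```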

Unfortunately, the p-values for both factors in both models are quite large, suggesting that any effect we saw could well be due to chance. However, there is an alternative interpretation: with just 16 observations, we can only detect a fairly large difference. The shoes appeared to make a difference of about 3 successes in 50 tries (a mean of 32.75 in her favorite shoes versus 29.625 in the others). If a difference of that size matters in practice, we might repeat the experiment with more trials. Before we do, though, we might make some displays to check whether data of this sort meet the assumptions of ANOVA. No sense gathering more if they don't. ;-)
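Here is one way to make those displays and to put a rough number on the sample-size concern, again as a self-contained sketch. The residual plots and the Shapiro-Wilk and Bartlett tests are common checks of the normality and equal-variance assumptions. They are not necessarily what the original authors had in mind.

The power calculation is only an approximation, and it makes three simplifications:

- It treats the shoe comparison as a two-sample t test with 8 observations per group.
- It ignores the Time factor.
- It plugs in the standard deviation estimated from the additive model's residual mean square.

```r
baskball <- data.frame(
  Time  = factor(rep(rep(c("Morning", "Night"), each = 2), times = 4)),
  Shoes = factor(rep(rep(c("Others", "Favorite"), each = 4), times = 2)),
  Made  = c(25, 26, 27, 27, 32, 22, 30, 34, 35, 34, 33, 30, 33, 37, 36, 38)
)

noint <- aov(Made ~ Time + Shoes, data = baskball)
res <- residuals(noint)

# Displays: residuals against fitted values (look for changing spread)
# and a normal quantile plot (look for curvature or outliers).
par(mfrow = c(1, 2))
plot(fitted(noint), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
qqnorm(res)
qqline(res)

# Formal checks to read alongside the plots: Shapiro-Wilk for normality of
# the residuals, Bartlett's test for equal variances across the four cells.
normality <- shapiro.test(res)
cells     <- interaction(baskball$Time, baskball$Shoes)
equal_var <- bartlett.test(baskball$Made, cells)
print(normality)
print(equal_var)

# Rough power for a 3.125-shot shoe difference (two-sample t approximation).
sd_hat   <- sqrt(deviance(noint) / df.residual(noint))
pow      <- power.t.test(n = 8, delta = 3.125, sd = sd_hat)$power
n_needed <- power.t.test(delta = 3.125, sd = sd_hat, power = 0.8)$n
print(c(power_with_8_per_group = pow, n_per_group_for_80pct_power = n_needed))
```

With only 4 observations per cell, the formal tests have little power themselves, so the plots deserve at least as much weight. The power estimate is similarly rough, but it illustrates the point in the text. A difference of about 3 shots is hard to detect with 8 observations per shoe type.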