Comparing Means for Two Independent Samples in R

We will use data comparing the lifetimes of generic and brand-name batteries, the main example at the start of Chapter 24 of De Veaux, Velleman and Bock, Stats: Data and Models, 2nd ed., 2008, Addison Wesley, Boston. If you have this book (or another by the same authors), run the ActivStats CD, but instead of starting ActivStats click on the Datasets button at the lower left of the start-up window. Select the version of the text you have and go to the Text folder. Look among the Chapter 24 files. Life may be easier if you just select all the files and copy them to the directory where the R program is. (We will assume you have done that.)

These files are tab-delimited text files, meaning there are tab characters separating the columns of variables. (The only reason you need to know that is that the commands are different for inputting different kinds of files.) You can read these files in any text editor, such as Notepad, EditPad Lite, or Emacs. You can also read them in R with the read.delim command. You have to tell R what file to read, so type file= followed by the file name in quotes (with path if the file is not in the R directory). Alternatively, you can type file=file.choose() and a fairly standard file-choosing dialog box will pop up and allow you to select the file. That is what we did, but the selection process is invisible below.

> read.delim(file=file.choose())
   Times Battery.Type
1  194.0   Brand Name
2  205.5   Brand Name
3  199.2   Brand Name
4  172.4   Brand Name
5  184.0   Brand Name
6  169.5   Brand Name
7  190.7      Generic
8  203.5      Generic
9  203.5      Generic
10 206.5      Generic
11 222.5      Generic
12 209.4      Generic

Note that the lifetimes are in one variable and the type of battery in another. This is standard database format. If you do not have the CD, the dataset is small enough that you can just type it in from the above. You can abbreviate the categorical variable as B/G or even 0/1.
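If you are typing the data in, one way to do it is to build the data frame directly. This is only a minimal sketch, not the method used in this lesson; the name bat and the column names are chosen to match the listing above.

```r
# Lifetimes copied from the listing above, brand-name batteries first.
Times <- c(194.0, 205.5, 199.2, 172.4, 184.0, 169.5,
           190.7, 203.5, 203.5, 206.5, 222.5, 209.4)

# rep() builds the matching labels: six of each type, in the same order.
Battery.Type <- rep(c("Brand Name", "Generic"), each = 6)

# Combine the two variables into one data frame, one row per battery.
bat <- data.frame(Times, Battery.Type)
str(bat)   # check the number of rows and the two variables
```

Typing bat at the prompt afterwards should reproduce the twelve-row listing.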

If you had looked at the data file in a text editor you might have noted that Battery Type got changed to Battery.Type. R does not like names with spaces in them. The spaces in "Brand Name" could also cause problems. Generally speaking, if you are setting up your own data, do not use names for files, values, variables, etc., that include spaces.
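The renaming can be seen on a tiny stand-in for the data file. In this sketch the two-row text string txt is made up for illustration (it is not the real file), and the text= and check.names= arguments are passed through read.delim to read.table.

```r
# A two-row, tab-delimited stand-in for the data file ("\t" is a tab).
txt <- "Times\tBattery Type\n194.0\tBrand Name\n190.7\tGeneric\n"

# Default behaviour: headers are converted to syntactically valid names,
# so the space in "Battery Type" becomes a period.
fixed <- read.delim(text = txt)
names(fixed)

# check.names = FALSE keeps the header exactly as written.
asis <- read.delim(text = txt, check.names = FALSE)
names(asis)
```

A name that keeps its space has to be wrapped in backticks whenever it is used, which is one reason to avoid spaces in the first place.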

The command above just shows you what is in the file (and whether R can make any sense out of it). To do anything with the data, you have to store it in a data frame; we then attach the data frame so that its variables can be used by name. (You do not need to know exactly what that means in order to do it.) We named the data frame bat.

> bat <- read.delim(file=file.choose(),header=TRUE)
> attach(bat)
> Times
 [1] 194.0 205.5 199.2 172.4 184.0 169.5 190.7 203.5 203.5 206.5 222.5 209.4
> ?t.test

Typing Times causes R to list the times, verifying that we can now access them. Typing ?t.test causes a help window (not shown here) to pop up with cryptic information on the t.test command.
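Attaching is a convenience, not a requirement. As a hedged alternative to the approach used here, the sketch below reaches the same variables with $ and with(); the data frame is rebuilt by hand only so that the example runs on its own.

```r
# Rebuild the data frame by hand so this example is self-contained.
bat <- data.frame(
  Times = c(194.0, 205.5, 199.2, 172.4, 184.0, 169.5,
            190.7, 203.5, 203.5, 206.5, 222.5, 209.4),
  Battery.Type = rep(c("Brand Name", "Generic"), each = 6)
)

# $ extracts one variable from the data frame without attaching it.
bat$Times

# with() evaluates a single expression using the variables inside bat.
# Here tapply() computes the mean lifetime separately for each battery type.
grp_means <- with(bat, tapply(Times, Battery.Type, mean))
grp_means
```

Many modelling commands also accept a data= argument naming the data frame, which serves the same purpose.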

> t.test(formula = Times ~ Battery.Type)

        Welch Two Sample t-test

data:  Times by Battery.Type 
t = -2.5462, df = 8.986, p-value = 0.03143
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -35.097420  -2.069246 
sample estimates:
mean in group Brand Name    mean in group Generic 
                187.4333                 206.0167 

We are thinking that battery lifetimes may depend on the type of battery. After formula = we type the dependent variable, a "~", and the independent variable. (The tilde "~" separates dependent from independent variables in R.) The small p-value of 0.03143 suggests there may well be a difference. The 95% confidence interval, which is for the brand-name mean minus the generic mean, lies entirely below zero, so it contains the surprising news that the difference is not in the direction we might have expected! The generics seem better! This is clearer in a display, which also lets us check assumptions and conditions.
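To see where the numbers in the output come from, the Welch statistic, its degrees of freedom, the p-value and the interval can be computed step by step. This is a sketch of the standard Welch formulas, with the data typed in by hand; the variable names (brand, generic, se and so on) are ours, not part of R.

```r
brand   <- c(194.0, 205.5, 199.2, 172.4, 184.0, 169.5)
generic <- c(190.7, 203.5, 203.5, 206.5, 222.5, 209.4)

# Each group's contribution to the squared standard error: variance / n.
vb <- var(brand)   / length(brand)
vg <- var(generic) / length(generic)
se <- sqrt(vb + vg)

# Difference in means (brand name minus generic) and the t statistic.
diff_means <- mean(brand) - mean(generic)
t_stat <- diff_means / se

# Welch-Satterthwaite approximation to the degrees of freedom.
df <- (vb + vg)^2 /
  (vb^2 / (length(brand) - 1) + vg^2 / (length(generic) - 1))

# Two-sided p-value and 95% confidence interval for the difference.
p_value <- 2 * pt(-abs(t_stat), df)
ci <- diff_means + c(-1, 1) * qt(0.975, df) * se

round(c(t = t_stat, df = df, p = p_value), 4)
round(ci, 3)
```

The results should agree with the t.test output above (t = -2.5462, df = 8.986, p-value = 0.03143) up to rounding.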

> boxplot(formula = Times ~ Battery.Type)

At least R syntax is consistent.

[Figure: side-by-side boxplots of Times for the Brand Name and Generic batteries]

The generics do indeed last longer. As for being normally distributed, this is about as good as we can expect with samples of size six.
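If you want the numbers behind the picture, boxplot can return them instead of drawing. This is a minimal sketch: the data frame is retyped so the example runs on its own, and the name b is arbitrary.

```r
bat <- data.frame(
  Times = c(194.0, 205.5, 199.2, 172.4, 184.0, 169.5,
            190.7, 203.5, 203.5, 206.5, 222.5, 209.4),
  Battery.Type = rep(c("Brand Name", "Generic"), each = 6)
)

# plot = FALSE suppresses the drawing and just returns the summaries.
b <- boxplot(Times ~ Battery.Type, data = bat, plot = FALSE)

b$names   # group labels, in the order the boxes are drawn
b$stats   # one column per group; rows run from lower whisker,
          # lower hinge, median, upper hinge, to upper whisker
b$out     # any points that would be drawn individually
```

Comparing the middle row of b$stats for the two groups gives the medians that the boxes are centred on.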


©2008 statistics.com, portions ©2006-2007 Robert W. Hayden and used by permission.