These examples use the heart attack data, which comes with the following description of its variables:

Heart Attack Patients

This set of data is all of the hospital discharges in New York State with an admitting diagnosis of an Acute Myocardial Infarction (AMI), also called a heart attack, who did not have surgery, in the year 1993. There are 12,844 cases.

- AGE gives age in years.
- SEX is coded M for males, F for females.
- DIAGNOSIS is in the form of an International Classification of Diseases, 9th Edition, Clinical Modification code. These tell which part of the heart was affected.
- DRG is the Diagnosis Related Group. It groups together patients with similar management. In this data set there are just three different DRGs:
  - 121 for AMIs with cardiovascular complications who did not die.
  - 122 for AMIs without cardiovascular complications who did not die.
  - 123 for AMIs where the patient died.
- LOS gives the hospital length of stay in days.
- DIED has a 1 for patients who died in hospital and a 0 otherwise.
- CHARGES gives the total hospital charges in dollars.

Data provided by Health Process Management of Doylestown, PA.

This is a very large data set, so it is provided as zip files. (You may need a program such as WinZip to unzip them.) Plain text (with tabs separating entries) and Excel versions of the data are available.

Getting tables into R is a bit complicated, so start with the
file that contains only the data on the DIED variable. Save it on
your hard drive in R's working directory (the directory that
`getwd()` reports). If you name the file `DIED4R.txt`, you can use
this R command to input the data:

    > died = scan(file="DIED4R.txt")
    Read 12844 items

This puts the data into a variable called "died". Use `table`
on this variable to get counts if you do not already have them.

    > table(died)
    died
        0     1
    11434  1410
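The counts convert directly into proportions. The following is a minimal sketch, not part of the original handout. It rebuilds `died` from the counts shown above so that it runs without the data file; if you have already loaded `died` with `scan`, skip the first statement. The names `counts`, `props`, and `p_hat` are chosen here for illustration.

```r
# Rebuild the 0/1 vector from the counts in the table above.
# This is a stand-in for scan(); skip it if `died` is already loaded.
died <- rep(c(0, 1), times = c(11434, 1410))

counts <- table(died)         # number of 0s and number of 1s
props  <- prop.table(counts)  # the same table expressed as proportions
p_hat  <- mean(died)          # mean of a 0/1 variable = proportion of 1s

print(counts)
print(round(props, 4))
print(p_hat)
```

Because DIED is coded 0/1, its mean is the sample proportion of patients who died, which is the estimate that the test below works with.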

1410 of the patients died. A single command, `prop.test`, gives a
confidence interval and tests any hypothesized value *p*_{0} you
specify. Here we compare the observed proportion to a (hypothetical)
usual mortality rate of 10%. Ignore the `X-squared` value and use the
*p*-value for a hypothesis test.

    > prop.test(1410,12844,p=0.1)

            1-sample proportions test with continuity correction

    data:  1410 out of 12844, null probability 0.1
    X-squared = 13.5385, df = 1, p-value = 0.0002337
    alternative hypothesis: true p is not equal to 0.1
    95 percent confidence interval:
     0.1044507 0.1153421
    sample estimates:
            p
    0.1097789
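For readers curious where the `X-squared` value and the *p*-value come from, here is a hedged sketch that recomputes both by hand and compares them with what `prop.test` reports. It assumes only the counts shown above; the variable names (`x`, `n`, `p0`, `x2`, `p_value`, `fit`) are illustrative.

```r
x  <- 1410    # patients who died
n  <- 12844   # all patients
p0 <- 0.1     # hypothesized mortality rate

# Chi-squared statistic with Yates' continuity correction:
# shrink |observed - expected| by 0.5 before squaring.
expected <- n * p0
x2 <- (abs(x - expected) - 0.5)^2 / (n * p0 * (1 - p0))

# Two-sided p-value: upper tail of the chi-squared distribution, 1 df
p_value <- pchisq(x2, df = 1, lower.tail = FALSE)

# Compare with the values prop.test computes itself
fit <- prop.test(x, n, p = p0)
print(c(by_hand = x2, prop_test = unname(fit$statistic)))
print(c(by_hand = p_value, prop_test = fit$p.value))
```

The *p*-value is far below 0.05, so at that level the data are not consistent with a true mortality rate of 10%. The confidence interval (roughly 10.4% to 11.5%) tells the same story, since it lies entirely above 10%.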

Getting R to read a table containing more than one variable is more complicated, but you will need to learn how to do this eventually.
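As a first taste, the sketch below reads a small tab-separated table with `read.delim`. It rests on several assumptions: that the file has a header row, that the columns are named as in the variable description above, and that the object is called `heart` (a name chosen here). The three data rows are invented purely so the example runs on its own; they are not taken from the real data set.

```r
# Write a tiny tab-separated file so the sketch is self-contained.
# These rows are INVENTED for illustration, not real patient records.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "AGE\tSEX\tDIAGNOSIS\tDRG\tLOS\tDIED\tCHARGES",
  "70\tF\t41041\t122\t5\t0\t4500",
  "81\tM\t41091\t123\t2\t1\t3000",
  "64\tM\t41011\t121\t9\t0\t9800"
), path)

# read.delim() reads tab-separated text and, by default,
# treats the first line as the column names.
heart <- read.delim(path)

str(heart)          # one column per variable
table(heart$DIED)   # counts from a single column of the table
```

With the real file you would replace `path` with its file name. If that file turns out to have no header row, `read.delim(path, header = FALSE)` would be the variant to try.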

©2006-2007 Robert W. Hayden