Inference for One Proportion in R

These examples use the heart attack data, which come with this description of the variables:

Heart Attack Patients
This set of data is all of the hospital discharges in New York State 
with an admitting diagnosis of an Acute Myocardial Infarction (AMI), 
also called a heart attack, who did not have surgery, in the year 1993. 
There are 12,844 cases.

AGE gives age in years

SEX is coded M for males and F for females.

DIAGNOSIS is in the form of an International Classification of Diseases, 
9th Edition, Clinical Modification code. These codes tell which part of the 
heart was affected.

DRG is the Diagnosis Related Group. It groups together patients with 
similar management. In this data set there are just three different DRGs:

121 for AMIs with cardiovascular complications who did not die.
122 for AMIs without cardiovascular complications who did not die.
123 for AMIs where the patient died.

LOS gives the hospital length of stay in days.

DIED has a 1 for patients who died in hospital and a 0 otherwise.

CHARGES gives the total hospital charges in dollars.

Data provided by Health Process Management of Doylestown, PA.

This is a very large data set, so it is provided as zip files. (You may need a program such as WinZip to unzip them.) Plain text (with tabs separating entries) and Excel versions of the data are available.

Getting tables into R is a bit complicated, so use this file, which contains only the data on the DIED variable. Save it on your hard drive in R's working directory (the command getwd() shows which directory that is). If you name the file DIED4R.txt, you can use this R command to input the data:

> died = scan(file="DIED4R.txt")
Read 12844 items

This puts the data into a variable called "died". Use the table command on this variable to get counts if you do not already have them.

> table(died)
    0     1 
11434  1410 
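
As a quick check, the sample proportion can be computed directly from these counts. The sketch below rebuilds a 0/1 vector from the table above rather than reading DIED4R.txt, so it runs without the file. The name died_demo is made up for this illustration and stands in for died.

```r
# Rebuild a 0/1 vector with the same counts as the table above
# (11434 zeros, 1410 ones). died_demo is a hypothetical stand-in for died.
died_demo <- rep(c(0, 1), times = c(11434, 1410))

counts <- table(died_demo)   # counts of 0s and 1s
n      <- length(died_demo)  # total number of patients
x      <- sum(died_demo)     # number who died (the 1s)
p_hat  <- mean(died_demo)    # sample proportion, the same as x / n

print(counts)
print(round(p_hat, 4))
```

Because the data are coded 0 and 1, the mean of the vector is the proportion who died, a little under 11%.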

1410 of the 12,844 patients died. A single command, prop.test, gives a confidence interval for the true proportion and tests the null hypothesis that it equals any value p0 you specify. Here we compare the observed proportion to a (hypothetical) usual mortality rate of 10%. Ignore the X-squared value and use the p-value for a hypothesis test.

> prop.test(1410,12844,p=0.1)

        1-sample proportions test with continuity correction

data:  1410 out of 12844, null probability 0.1 
X-squared = 13.5385, df = 1, p-value = 0.0002337
alternative hypothesis: true p is not equal to 0.1 
95 percent confidence interval:
 0.1044507 0.1153421 
sample estimates:
        p 
0.1097789 

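If you need individual numbers from this output, one option is to store the result and pull out its components. The sketch below also recomputes X-squared by hand, assuming the usual continuity-corrected formula for a one-sample test. The variable names result, x_squared and p_value are chosen here for illustration.

```r
# prop.test returns a list-like "htest" object; store it to pull out pieces.
result <- prop.test(1410, 12844, p = 0.1)

result$p.value    # p-value for the null hypothesis p = 0.1
result$conf.int   # 95 percent confidence interval
result$estimate   # sample proportion, 1410/12844

# Hand check of X-squared, assuming the continuity-corrected form
# (|x - n*p0| - 0.5)^2 / (n * p0 * (1 - p0)).
x  <- 1410
n  <- 12844
p0 <- 0.1
x_squared <- (abs(x - n * p0) - 0.5)^2 / (n * p0 * (1 - p0))

# Two-sided p-value from the chi-squared distribution with 1 df.
p_value <- pchisq(x_squared, df = 1, lower.tail = FALSE)

print(c(x_squared = x_squared, p_value = p_value))
```

The hand-computed values should agree with the X-squared and p-value lines printed by prop.test.
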
Getting R to read a table containing more than one variable is more complicated, but you will need to learn how to do this eventually.
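
As a rough preview, tab-separated text with a header row can usually be read with read.delim. The sketch below writes a tiny temporary file so that it runs on its own. The three rows are made up for illustration and are not taken from the real data, only some of the variables are included, and the names demo_file and heart are hypothetical.

```r
# Write a tiny tab-separated file so the example is self-contained.
# These rows are MADE UP for illustration; they are not real patients.
demo_file <- tempfile(fileext = ".txt")
writeLines(c("AGE\tSEX\tDRG\tLOS\tDIED\tCHARGES",
             "70\tM\t122\t5\t0\t4500",
             "83\tF\t123\t2\t1\t3100",
             "64\tF\t121\t9\t0\t9800"),
           demo_file)

# read.delim reads tab-separated text with a header row into a data frame.
heart <- read.delim(demo_file)

str(heart)           # one column per variable
table(heart$DIED)    # counts for one column, as with the died vector
```

With the real file you would pass its name in place of demo_file, assuming its first row holds the variable names.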

©2006-2007 Robert W. Hayden