Independent Samples vs Paired Samples
By Bob Hayden with help from Rachel Braun
In each of these situations:
a) decide whether this is an independent sample or paired data design
b) describe a reasonable variable to measure
c) fill in the details for an appropriate experimental design or sampling plan
1. To test the effect of background music on productivity, workers at a tee-shirt factory are observed for two months. For one month they have no music; for the other month they have background music. Each worker's tee-shirt output is recorded.
2. A random sample of ten workers in Plant A is to be compared to a sample of ten workers in Plant B to assess absenteeism due to illness.
3. A new reading course was tried on ten women. Each student's reading speed was measured before the course, and again after attending the course for eight weeks.
4. To compare the average weight gain of pigs fed two different rations, nine pairs of pigs were used. The pigs in each pair were littermates.
5. To test the effects of a new fertilizer, 100 plots are treated with Fertilizer A, and 100 plots are treated with Fertilizer B.
6. A sample of upperclassmen at your school is taken. We wish to compare the average number of hours of sleep of juniors and seniors.
7. A new fertilizer is tested on 100 plots. Each plot is divided in half. Fertilizer A is applied to one half and B to the other.
8. Consumers Union wants to compare two types of calculators. They get 100 volunteers and ask them to carry out a series of 50 routine calculations (such as figuring discounts, sales tax, totaling a bill, etc.). Each calculation is done on each type of calculator, and the time required for each calculation is recorded.
Answer Key
By Rachel Braun with help from Bob Hayden
1. a) This is a paired design; each worker's output will be measured twice, with and without music.
b) Output can be measured as number of tee-shirts sewn. To control for absenteeism and for differing numbers of working days per month, the variable might be measured as tee-shirts sewn per day.
c) Since there is matching, each worker serves as his or her own control for factors that might affect production, such as sewing skill and working speed. Because tee-shirt output might increase with experience, regardless of the presence of music, work stations are randomized so that some work stations have music the first month and others have music the second month. This might be accomplished by numbering the stations (say there are ten, numbered 0 to 9) and choosing the first 5 nonrepeating digits from a random number generator for the 'music first' group.
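The randomization of work stations described in 1c could be sketched as follows. This is only an illustration: the function name, the seed, and the choice of ten stations with five in the 'music first' group are assumptions taken from the "say, ten stations" example, not a prescribed procedure.

```python
import random

def assign_music_first(n_stations=10, n_music_first=5, seed=None):
    """Split stations 0..n_stations-1 into 'music first' and 'music second' groups.

    Hypothetical helper: the station count and group size follow the
    illustrative numbers in the answer key.
    """
    rng = random.Random(seed)
    stations = list(range(n_stations))
    # Sampling without replacement plays the role of taking the first
    # 5 nonrepeating digits from a random number generator.
    music_first = sorted(rng.sample(stations, n_music_first))
    music_second = [s for s in stations if s not in music_first]
    return music_first, music_second

first, second = assign_music_first(seed=1)
print("Music in month 1:", first)
print("Music in month 2:", second)
```

Every station lands in exactly one group, so each worker is still measured under both conditions; only the order differs.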
2. a) This is an independent samples design. Even though the same NUMBER of workers is chosen in each plant, there is no apparent effort to match them by years of experience, gender, etc.
b) A possible variable of interest is number of days absent due to illness over, say, a three-month period. Days could be fractional; i.e., if a worker leaves 2 hours early due to illness developing at work, that absence might count as 0.25 days.
c) To control for seasonal effects in illness (e.g., flu season), the same three months are chosen for each plant. Employee rolls are numbered, and ten employees from each plant are chosen using a random number table. Employee time charts are then reviewed for the period of interest.
OPTIONS TO IMPROVE THE ESTIMATES: Stratified sampling could be used with employee rolls. We might first stratify by age groups (e.g., 18-29, 30-39, 40-49, etc.) and/or by gender with strata separately sampled. This is an observational study, NOT an experiment.
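The basic sampling plan in 2c (number the employee rolls, then draw ten employees from each plant at random) might look like the sketch below. The employee IDs, roll sizes, and seeds are made up for illustration; a software random generator stands in for the random number table.

```python
import random

def sample_employees(roll, k=10, seed=None):
    """Draw a simple random sample of k employees from a numbered roll."""
    rng = random.Random(seed)
    return rng.sample(roll, k)  # sampling without replacement

# Hypothetical employee rolls; the real rolls would come from each plant.
plant_a_roll = [f"A{i:03d}" for i in range(1, 121)]
plant_b_roll = [f"B{i:03d}" for i in range(1, 86)]

# The two samples are drawn separately, with no matching between them,
# which is what makes this an independent samples design.
sample_a = sample_employees(plant_a_roll, seed=11)
sample_b = sample_employees(plant_b_roll, seed=12)
print(sample_a)
print(sample_b)
```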
3. a) This is a paired design; each woman's reading speed will be measured twice, before and after the course.
b) The variable of interest might be number of words read in a 10-minute period.
c) We measure reading speed before and after the course, and subtract (after - before) to get gain in reading speed. Average gain might then be compared to results claimed by other reading courses. (The women would have to be volunteers, leading us to wonder whether they might be particularly motivated to learn to read quickly, contaminating the results. The alternative would be to solicit a larger sample, then randomize them to two groups, the new speed reading course and an existing speed reading course.)
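The "subtract, then average" step in 3c can be sketched as below. The before and after word counts are invented purely to show the arithmetic (and only five students are shown rather than ten); they are not data from any study.

```python
from statistics import mean

def paired_gains(before, after):
    """Return after - before for each student, keeping the pairing intact."""
    if len(before) != len(after):
        raise ValueError("each student needs both a before and an after score")
    return [a - b for b, a in zip(before, after)]

# Made-up words read in 10 minutes, one entry per student.
before = [2000, 2100, 1900, 2200, 2050]
after = [2300, 2250, 2100, 2400, 2150]

gains = paired_gains(before, after)
average_gain = mean(gains)
print("Gains:", gains)
print("Average gain:", average_gain)
```

The analysis works on one number per student (the gain), which is the hallmark of a paired design.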
4. a) This is a paired design; the pigs are in pairs by litter.
b) A possible variable of interest is number of ounces gained after, say, one week on the ration.
c) We measure the pigs' weights before and after the week on their ration, and subtract to get the gain in weight. The pigs are paired by litter. (If more than one pair of pigs is taken from the same litter, the pigs are paired by first listing the litter in weight order and working down the list two pigs at a time.) One pig from each pair is selected, and a coin is tossed to determine whether its ration will be type A or type B. The remaining pig is fed the other ration.
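The pairing and coin-toss steps in 4c could be sketched like this. The pig labels and weights are hypothetical, and `rng.choice` stands in for the physical coin.

```python
import random

def pair_by_weight(litter):
    """Pair littermates by listing them in weight order, two at a time.

    litter is a list of (pig_id, weight) tuples. With an odd number of
    pigs, the heaviest one is left unpaired.
    """
    ordered = sorted(litter, key=lambda pig: pig[1])
    return [(ordered[i][0], ordered[i + 1][0])
            for i in range(0, len(ordered) - 1, 2)]

def assign_rations(pairs, seed=None):
    """Toss a coin for the first pig of each pair; its partner gets the other ration."""
    rng = random.Random(seed)
    assignment = {}
    for first_pig, second_pig in pairs:
        ration = rng.choice(["A", "B"])  # the coin toss
        assignment[first_pig] = ration
        assignment[second_pig] = "B" if ration == "A" else "A"
    return assignment

# Hypothetical litter of four pigs with made-up weights in pounds.
litter = [("p1", 30), ("p2", 25), ("p3", 28), ("p4", 33)]
pairs = pair_by_weight(litter)
rations = assign_rations(pairs, seed=3)
print(pairs)
print(rations)
```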
5. a) This is an independent samples design; two distinct samples of plots are taken, and no attempt is made to match individual plots across the two samples.
b) The variable of interest might be bushels of product, assuming all plots are of equal size.
c) If a field is easily divided into 200 plots, they could be numbered (000 to 199) and fertilizer A assigned to the first 100 nonrepeating eligible numbers. While simple to describe, this design might be difficult to put into practice. A more workable design might be developed where there are fewer distinct plots, but care is taken to block on factors that might affect productivity (access to ground water, presence of stone cropping, amount of sunlight), and randomization of fertilizer to plot happens within the blocks.
6. a) This is an independent samples design; two distinct samples (juniors, seniors) are taken, and no attempt is made to match students across the samples based on variables we may have wished to control (courseload, participation in athletics, after-school job).
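Randomizing fertilizer to plots within blocks, as suggested in 5c, might be sketched as follows. The block names and plot labels are invented, and the sketch assumes each block has an even number of plots so the two fertilizers split evenly.

```python
import random

def randomize_within_blocks(blocks, seed=None):
    """Assign fertilizer A to a random half of each block and B to the rest.

    blocks maps a block name to its list of plot labels.
    """
    rng = random.Random(seed)
    assignment = {}
    for plots in blocks.values():
        shuffled = list(plots)
        rng.shuffle(shuffled)  # random order within this block only
        half = len(shuffled) // 2
        for plot in shuffled[:half]:
            assignment[plot] = "A"
        for plot in shuffled[half:]:
            assignment[plot] = "B"
    return assignment

# Hypothetical blocks grouped by a factor such as access to ground water.
blocks = {
    "near_water": ["w1", "w2", "w3", "w4"],
    "far_from_water": ["d1", "d2", "d3", "d4", "d5", "d6"],
}
assignment = randomize_within_blocks(blocks, seed=5)
print(assignment)
```

Because the shuffle happens separately in each block, both fertilizers are tried under each set of growing conditions.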
b) A possible variable of interest is number of hours of sleep per week or perhaps per school night, on average.
c) The data would be collected by self-reporting, so, just as with government Consumer Expenditure Surveys, students could participate in the study for only a short time (a week, perhaps), or they will start to modify their sleep habits as they become aware of being observed. All subjects should record their data the same week, to avoid seasonal differences (proximity to midterms, school plays, changeover of sports seasons, etc.). In this case, a random sample of students from each class is taken from school rolls. This could be done as a systematic random sample: take a random start at the beginning of the list, and take every kth student thereafter.
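A systematic random sample of the kind described in 6c might be drawn as in this sketch. The roll of 200 juniors and the sample size of 20 are assumptions for illustration; k is taken as the roll length divided by the sample size, rounded down.

```python
import random

def systematic_sample(roll, n, seed=None):
    """Take a random start among the first k names, then every kth name after it."""
    k = len(roll) // n
    if k < 1:
        raise ValueError("sample size cannot exceed the length of the roll")
    rng = random.Random(seed)
    start = rng.randrange(k)  # random start at the beginning of the list
    return [roll[start + i * k] for i in range(n)]

# Hypothetical school roll for one class.
junior_roll = [f"junior_{i:03d}" for i in range(200)]
junior_sample = systematic_sample(junior_roll, n=20, seed=7)
print(junior_sample)
```

The same function would be run separately on the senior roll, giving two independent samples.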
7. a) This is a paired design. Each fertilizer is tried on the same plot, which has been subdivided.
b) The difference in bushels harvested between the two halves of each plot could be measured. Note: this will result in 100 distinct data points, one for each plot. The design in #5 resulted in 200 distinct pieces of information (more data but a larger variance).
c) Each of the 100 plots is divided along the same gradient (e.g., north/south). One of the halves is selected, and a coin is flipped: heads for fertilizer A, tails for B. The remaining half gets the other treatment.
NOTE: This design is superior to the design in #5, as by splitting each plot we control for some of the factors already suggested (water, stones, sun) and more importantly, for factors we've neglected or can't imagine.
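The "more data but a larger variance" remark in 7b can be made concrete with a standard variance comparison. The symbols below are not in the original: assume each fertilizer is applied to n units, yields under A and B have standard deviations \sigma_A and \sigma_B, and the two halves of the same plot have correlation \rho.

```latex
% Independent samples (design #5): n plots per fertilizer, no matching
\operatorname{Var}(\bar{A} - \bar{B}) = \frac{\sigma_A^2}{n} + \frac{\sigma_B^2}{n}

% Paired samples (design #7): n split plots, d_i = A_i - B_i
\operatorname{Var}(\bar{d}) = \frac{\sigma_A^2 + \sigma_B^2 - 2\rho\,\sigma_A\sigma_B}{n}
```

Under these assumptions, whenever the two halves of a plot tend to yield alike (\rho > 0), the paired variance is the smaller of the two, even though design #7 records only 100 differences.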
8. a) This is a paired design. Each volunteer performs the calculations on the two calculators. In this way, we have controlled for differences in individuals' accuracy in punching calculator keys.
b) Number of seconds per task is recorded for each of the 50 routine calculations. Particular tasks could be compared to see whether either calculator is better at the specific operation, or a composite score (or subscore) could be obtained by adding all or some of the 50 times.
c) Since subjects gain familiarity with the tasks (e.g., with figuring a sales tax) with experience, it is important to randomize the order in which the calculators are tried, or differences in speed may be due simply to practice and NOT to calculator type. This could be done by flipping a coin to see which calculator is used first.