Two by Two Tables Containing Counts (TwobyTwo)
This chapter provides the formulae and examples for
calculating crude and adjusted point estimates and confidence intervals for:
risk ratios and differences; odds ratios; incidence rate ratios and
differences; and etiologic and prevented fractions. The tests for interaction are
also presented. First, the
estimates from a single 2x2 table (“count” data) are
presented followed by estimates adjusted or summarized across stratified
data.
For a single 2x2 table, the notation is as depicted in Table 15-1. The formulae for calculating the risk ratio, risk difference, and odds ratio and their confidence intervals are shown below.
Table 15-1. Notation and table setup for a 2x2 table
| | Exposed | Nonexposed | Total |
| Disease | a | b | m1 |
| No Disease | c | d | m0 |
| Total | n1 | n0 | n |
Estimated risk in the exposed = $a / n_1$
Estimated risk in the population = $m_1 / n$
The point and variance estimates and the confidence interval formulae are provided in Table 15.1. For some parameters there is no variance formula. The confidence limits for the Etiologic Fraction in the Exposed are based on the calculated upper and lower bounds of the confidence limits for the risk ratio (RRUB and RRLB, respectively) with risk data. A similar approach is used when the Etiologic Fraction in the Exposed is based on the Odds Ratio.
There are many statistical tests that can be performed on a single 2x2 table. Common tests include the chi-square test (corrected, uncorrected, and Mantel-Haenszel) and exact tests (Fisher and mid-p exact). In this chapter the uncorrected and Mantel-Haenszel chi-square tests will be presented; however, these tests should be used only when the number of “expected” observations in each cell is > 5. When the expected number of observations in any cell is < 5, one of the exact tests should be used. How to calculate exact p-values is beyond the scope of this text and requires an iterative calculation. The expected number of observations in a cell is calculated by multiplying the row total by the column total and dividing by the total sample size.
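The uncorrected chi square is calculated as
$\chi^2 = \dfrac{n\,(ad - bc)^2}{n_1 n_0 m_1 m_0}$
and the Mantel-Haenszel chi square as
$\chi^2_{MH} = \dfrac{(n - 1)\,(ad - bc)^2}{n_1 n_0 m_1 m_0}$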
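The expected-count rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original text: the function name `expected_counts` and the example counts are hypothetical, and the cell labels follow the a, b, c, d notation of Table 15-1.

```python
def expected_counts(a, b, c, d):
    """Expected count for each cell: (row total * column total) / n."""
    n1, n0 = a + c, b + d   # column totals: exposed, nonexposed
    m1, m0 = a + b, c + d   # row totals: disease, no disease
    n = n1 + n0
    return [[m1 * n1 / n, m1 * n0 / n],
            [m0 * n1 / n, m0 * n0 / n]]

# Hypothetical counts chosen only for illustration.
cells = expected_counts(2, 8, 10, 20)
smallest = min(min(row) for row in cells)
print(cells, smallest)
```

If the smallest expected count falls below 5, the guidance above points toward an exact test instead of a chi-square test.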
Table 5.1. Estimates and confidence intervals for epidemiologic parameters for a single table
| Parameter | Point Estimate | Variance Estimate | Confidence Interval |
| Parameters based on risks (from randomized trials and cohort studies) or prevalences (cross-sectional studies) | | | |
| Risk Ratio | $RR = \dfrac{a/n_1}{b/n_0}$ | $\mathrm{Var}[\ln(RR)] = \dfrac{c}{a\,n_1} + \dfrac{d}{b\,n_0}$ | $RR \exp\left(\pm Z \sqrt{\mathrm{Var}[\ln(RR)]}\right)$ |
| Risk Difference | $RD = \dfrac{a}{n_1} - \dfrac{b}{n_0}$ | $\mathrm{Var}(RD) = \dfrac{a\,c}{n_1^3} + \dfrac{b\,d}{n_0^3}$ | $RD \pm Z \sqrt{\mathrm{Var}(RD)}$ |
| Etiologic Fraction in the Population | | | |
| Etiologic Fraction in the Exposed | | Based on variance estimate for the RR | LB= |
| Prevented Fraction in the Population | | | |
| Prevented Fraction in the Exposed | | Based on variance estimate for the RR | LB= |
| Parameters based on the odds and odds ratio (from randomized trials, cohort studies, case-control, or cross-sectional studies) | | | |
| Odds Ratio | $OR = \dfrac{a\,d}{b\,c}$ | $\mathrm{Var}[\ln(OR)] = \dfrac{1}{a} + \dfrac{1}{b} + \dfrac{1}{c} + \dfrac{1}{d}$ | $OR \exp\left(\pm Z \sqrt{\mathrm{Var}[\ln(OR)]}\right)$ |
| Etiologic Fraction in the Population | | | |
| Etiologic Fraction in the Exposed | | Based on variance estimate for the OR | LB= |
| Prevented Fraction in the Population | | | |
| Prevented Fraction in the Exposed | | Based on variance estimate for the OR | LB= |
LB=lower bound; UB=upper bound
P’=…
To work through an example of the calculations, consider a study of children 12-23.9 months of age in which the prevalence of anemia was estimated. The results are shown in Table 15-2.
Table 15-2. Example data; prevalence of anemia in children 12-23.9 months of age by sex
| | Male | Female | Total |
| Anemic | 205 | 129 | 334 |
| Not Anemic | 89 | 86 | 175 |
| Total | 294 | 215 | 509 |
The prevalence
estimates are:
Prevalence in males = 205/294 = 0.697 or 69.7%
Prevalence in females = 129/215 = 0.600 or 60.0%
The Prevalence Ratio estimate is as follows (using the formulae for the risk ratio):
Prevalence ratio = .697/.600 = 1.16
Variance of the natural log of the prevalence ratio = $\dfrac{89}{205 \times 294} + \dfrac{86}{129 \times 215} = 0.00458$
95% confidence interval (set the Z value in the formula to 1.96 for a two-sided 95% confidence interval; for a 90% confidence interval the Z value is 1.645, and for a 99% confidence interval, 2.576):
$1.16 \exp\left(\pm 1.96 \sqrt{0.00458}\right)$ = (1.02, 1.32)
The interpretation would be that males in this study were 1.16 times as likely to have anemia as females; the 95% confidence interval around this estimate is 1.02, 1.32.
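As a rough check on the prevalence ratio arithmetic, the following Python sketch recomputes the ratio and its interval from the four cell counts. The function name `risk_ratio_ci` is a hypothetical helper, and the variance formula is the standard log-scale one, which is assumed here to be the one the chapter intends.

```python
import math

def risk_ratio_ci(a, b, c, d, z=1.96):
    """Risk (or prevalence) ratio with a log-scale confidence interval."""
    n1, n0 = a + c, b + d                      # exposed and nonexposed totals
    rr = (a / n1) / (b / n0)
    var_ln_rr = c / (a * n1) + d / (b * n0)    # variance of ln(RR)
    half_width = z * math.sqrt(var_ln_rr)
    return rr, rr * math.exp(-half_width), rr * math.exp(half_width)

# Anemia example: males are "exposed", females "nonexposed".
rr, lower, upper = risk_ratio_ci(205, 129, 89, 86)
print(round(rr, 2), round(lower, 2), round(upper, 2))
```

Because the code carries the unrounded ratio through the calculation, the upper limit can differ in the second decimal place from a hand calculation that starts from the rounded value 1.16.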
The Prevalence Difference estimate is as follows (using the formulae for the risk difference):
Prevalence difference = .697 - .600 = .097 or 9.7%
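Variance of the prevalence difference = $\dfrac{0.697 \times 0.303}{294} + \dfrac{0.600 \times 0.400}{215} = 0.00183$
95% confidence interval:
$0.097 \pm 1.96 \sqrt{0.00183}$ = (.013, .181) or (1.3%, 18.1%)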
The interpretation would be that the prevalence of anemia is 9.7% higher in males compared to females (in terms of an absolute difference), with a 95% confidence interval from 1.3% to 18.1%.
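The prevalence difference can be checked the same way. In this sketch, `risk_difference_ci` is a hypothetical name, and the variance is taken as the sum of the two binomial variances with n (not n - 1) in the denominator, which is an assumption that matches the interval reported above.

```python
import math

def risk_difference_ci(a, b, c, d, z=1.96):
    """Risk (or prevalence) difference with a Wald-type confidence interval."""
    n1, n0 = a + c, b + d
    p1, p0 = a / n1, b / n0
    rd = p1 - p0
    var_rd = p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0
    half_width = z * math.sqrt(var_rd)
    return rd, rd - half_width, rd + half_width

rd, lower, upper = risk_difference_ci(205, 129, 89, 86)
print(round(rd, 3), round(lower, 3), round(upper, 3))
```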
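The odds ratio estimate, or in this example the prevalence odds ratio estimate, is as follows:
Odds ratio = (205*86)/(129*89) = 1.54
Variance of the natural log of the odds ratio = $\dfrac{1}{205} + \dfrac{1}{129} + \dfrac{1}{89} + \dfrac{1}{86} = 0.0355$
95% confidence interval = $1.54 \exp\left(\pm 1.96 \sqrt{0.0355}\right)$ = (1.06, 2.23)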
The
interpretation would be that the odds of anemia in males is
1.54 times the odds in females with a 95% confidence interval of 1.06 to
2.23. Note that the odds ratio is larger
than the risk ratio because the prevalence of anemia is high (334/509 = 66%).
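A short Python sketch of the odds ratio calculation follows. The helper name `odds_ratio_ci` is hypothetical, and the variance of the log odds ratio is assumed to be the usual sum of reciprocal cell counts.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a log-scale confidence interval."""
    odds_ratio = (a * d) / (b * c)
    var_ln_or = 1 / a + 1 / b + 1 / c + 1 / d   # variance of ln(OR)
    half_width = z * math.sqrt(var_ln_or)
    return (odds_ratio,
            odds_ratio * math.exp(-half_width),
            odds_ratio * math.exp(half_width))

odds_ratio, lower, upper = odds_ratio_ci(205, 129, 89, 86)
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2))
```

As with the ratio of prevalences, working from the unrounded odds ratio can shift the last reported digit of the upper limit relative to a hand calculation that starts from 1.54.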
The uncorrected chi-square test would be calculated as:
$\chi^2 = \dfrac{509\,(205 \times 86 - 129 \times 89)^2}{294 \times 215 \times 334 \times 175} = 5.209$
which would have a p-value of .022. The Mantel-Haenszel chi square would be calculated as:
$\chi^2_{MH} = \dfrac{508\,(205 \times 86 - 129 \times 89)^2}{294 \times 215 \times 334 \times 175} = 5.199$
which would have a p-value of .023. The conclusion would be that there was a statistically significant association between the sex of the child and the prevalence of anemia. Note that the statistical test for a 2x2 table can be used with the risk ratio, risk difference, or odds ratio. Also, it is calculated the same way whether the data are from an unmatched case-control study, a cohort study, or a clinical trial.
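The two chi-square statistics and their p-values can be reproduced with the Python standard library alone. This is a sketch under stated assumptions: the function names are hypothetical, and the p-value uses the identity that the upper tail of a chi-square distribution with one degree of freedom equals `erfc(sqrt(x / 2))`.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Return (uncorrected, Mantel-Haenszel) chi-square for one 2x2 table."""
    n1, n0, m1, m0 = a + c, b + d, a + b, c + d
    n = n1 + n0
    core = (a * d - b * c) ** 2 / (n1 * n0 * m1 * m0)
    return n * core, (n - 1) * core

def p_value_1df(chi2):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

uncorrected, mantel_haenszel = chi_square_2x2(205, 129, 89, 86)
print(round(uncorrected, 3), round(p_value_1df(uncorrected), 3))
print(round(mantel_haenszel, 3), round(p_value_1df(mantel_haenszel), 3))
```

The only difference between the two statistics is the factor n versus n - 1, so they converge as the sample size grows.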
Formulae and Example for Stratified Data (Count Data)
For stratified analyses, the same calculations used for the crude table can be used for stratum-specific estimates. For adjusted or summary estimates, a slightly different notation is used, as shown in Table 15-2. In this table, the subscript i denotes estimates from a specific stratum. The general approach for adjusted point estimates is to weight each of the stratum-specific estimates by a weighting method and then sum the results.
Table 15-2. Notation and table setup for stratified 2x2 tables
| | Exposed | Nonexposed | Total |
| Disease | ai | bi | m1i |
| No Disease | ci | di | m0i |
| Total | n1i | n0i | ni |
For
the risk ratio and the odds ratio, two different approaches are
given for estimating the adjusted point estimate and confidence
interval, one referred to as the directly
adjusted ratio and the other referred to as the Mantel-Haenszel adjusted ratio. The directly adjusted approach requires
“large” numbers in each stratum. The
weights for directly adjusted values are the inverse of the variance; this
approach provides a greater weight to strata with the least amount of variance and
less weight to strata with a large variance.
The Mantel-Haenszel method works better when
data are sparse.
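| Parameter | Point Estimate | Confidence Interval |
| Risk Ratio – Directly Adjusted | $RR_{direct} = \exp\left(\dfrac{\sum w_i \ln(RR_i)}{\sum w_i}\right)$ where $w_i = \dfrac{1}{\mathrm{Var}[\ln(RR_i)]}$ and $\mathrm{Var}[\ln(RR_i)] = \dfrac{c_i}{a_i n_{1i}} + \dfrac{d_i}{b_i n_{0i}}$ | $RR_{direct} \exp\left(\pm \dfrac{Z}{\sqrt{\sum w_i}}\right)$ |
| Risk Ratio – Mantel-Haenszel Adjusted | $RR_{MH} = \dfrac{\sum a_i n_{0i} / n_i}{\sum b_i n_{1i} / n_i}$ | $RR_{MH} \exp\left(\pm Z \cdot SE\right)$ where $SE = \sqrt{\dfrac{\sum (m_{1i} n_{1i} n_{0i} - a_i b_i n_i) / n_i^2}{\left(\sum a_i n_{0i} / n_i\right)\left(\sum b_i n_{1i} / n_i\right)}}$ |
| Risk Difference – Directly Adjusted | $RD_{direct} = \dfrac{\sum w_i RD_i}{\sum w_i}$ where $w_i = \dfrac{1}{\mathrm{Var}(RD_i)}$ and $\mathrm{Var}(RD_i) = \dfrac{a_i c_i}{n_{1i}^3} + \dfrac{b_i d_i}{n_{0i}^3}$ | $RD_{direct} \pm \dfrac{Z}{\sqrt{\sum w_i}}$ |
| Odds Ratio – Directly Adjusted | $OR_{direct} = \exp\left(\dfrac{\sum w_i \ln(OR_i)}{\sum w_i}\right)$ where $w_i = \dfrac{1}{\mathrm{Var}[\ln(OR_i)]}$ and $\mathrm{Var}[\ln(OR_i)] = \dfrac{1}{a_i} + \dfrac{1}{b_i} + \dfrac{1}{c_i} + \dfrac{1}{d_i}$ | $OR_{direct} \exp\left(\pm \dfrac{Z}{\sqrt{\sum w_i}}\right)$ |
| Odds Ratio – Mantel-Haenszel Adjusted | $OR_{MH} = \dfrac{\sum R_i}{\sum S_i}$ where $P_i = (a_i + d_i)/n_i$, $Q_i = (b_i + c_i)/n_i$, $R_i = a_i d_i / n_i$, $S_i = b_i c_i / n_i$ | $OR_{MH} \exp\left(\pm Z \cdot SE\right)$ where $SE = \sqrt{\dfrac{\sum P_i R_i}{2 \left(\sum R_i\right)^2} + \dfrac{\sum (P_i S_i + Q_i R_i)}{2 \sum R_i \sum S_i} + \dfrac{\sum Q_i S_i}{2 \left(\sum S_i\right)^2}}$ |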
Tests for Interaction for the Risk Ratio, Risk Difference, and the Odds Ratio
The tests for interaction presented here are generally referred to as the “Breslow-Day test of homogeneity” and are based on a chi square test.
The test for interaction for the risk ratio is:
$\chi^2 = \sum \dfrac{\left[\ln(RR_i) - \ln(RR_{direct})\right]^2}{\mathrm{Var}[\ln(RR_i)]}$
where the Var[ln(RRi)] = 1/wi from the direct RR point estimate calculation.
The test for interaction for the risk difference is:
$\chi^2 = \sum \dfrac{\left(RD_i - RD_{direct}\right)^2}{\mathrm{Var}(RD_i)}$
where the Var(RDi) = 1/wi from the direct RD point estimate calculation.
To test for interaction for the odds ratio (OR), the chi square test is calculated as:
$\chi^2 = \sum \dfrac{\left[\ln(OR_i) - \ln(OR_{direct})\right]^2}{\mathrm{Var}[\ln(OR_i)]}$
where the Var[ln(ORi)] = 1/wi from the direct OR point estimate calculation.
A
statistical test to assess whether there is a statistically significant
association between the exposure and outcome variable controlling for the third
variable is the Mantel-Haenszel uncorrected
chi-square test. This statistic would be used only if it was decided that there was no
statistically significant interaction.
An example of the calculations for stratified data is provided next. Continuing with the example of the association between sex and anemia in children, the data in Table 15-3 are stratified on mother’s education level. Again, because the data were based on prevalent cases, the term “prevalence” will be used rather than “risk.”
Table 15-3. Example data; prevalence of anemia in children 12-23.9 months of age by sex, stratified on mother’s education level
Mother has low level of education
| | Male | Female | Total |
| Anemic | 66 | 36 | 102 |
| Not Anemic | 28 | 32 | 60 |
| Total | 94 | 68 | 162 |
Mother has high level of education
| | Male | Female | Total |
| Anemic | 139 | 93 | 232 |
| Not Anemic | 61 | 54 | 115 |
| Total | 200 | 147 | 347 |
Calculation of the directly adjusted prevalence ratio and
its 95% confidence interval is shown in Table 15-4.
Table 15-4. Calculations for computing directly adjusted prevalence (risk) ratio
| Stratum | PRi | ln(PRi) | wi | wi ln(PRi) |
| 1 | 1.326 | .2821669 | 56.86628 | 16.04578 |
| 2 | 1.099 | .0944001 | 162.75481 | 15.36407 |
| Sum | | | 219.62109 | 31.40985 |
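The calculated point estimate is:
$PR_{direct} = \exp\left(\dfrac{31.40985}{219.62109}\right) = \exp(0.14302) = 1.154$
The 95% confidence interval is:
$\exp\left(0.14302 \pm \dfrac{1.96}{\sqrt{219.62109}}\right)$ = (1.011, 1.317)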
The interpretation would be that males were 1.154 times as likely to be anemic as females, controlling or adjusting for the mother’s education level. In addition, we are 95% confident that the true prevalence ratio is captured between 1.011 and 1.317. However, we must still calculate the test for interaction to see if the mother’s education level modifies the sex-anemia relationship. To calculate the test for interaction, the directly adjusted risk ratio needs to be calculated beforehand. Also, note that $1/\mathrm{Var}[\ln(PR_i)] = w_i$.
Therefore, the test for interaction for the prevalence/risk ratio would be:
$\chi^2 = 56.86628\,(0.2821669 - 0.14302)^2 + 162.75481\,(0.0944001 - 0.14302)^2 = 1.486$
The p-value is calculated for a chi square value of 1.486 with one degree of freedom (the degrees of freedom equal the number of strata minus 1). The p-value from this example is .223. Therefore, we would state that the mother’s education level does not significantly modify the sex-anemia relationship. The next question is whether the mother’s education level confounds the relationship. The crude prevalence ratio was 1.16 and the directly adjusted value was 1.15, which is less than a 1% difference; the conclusion would therefore be that mother’s education neither modifies nor confounds the sex-anemia relationship.
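The directly adjusted ratio and its test for interaction can be sketched in Python as follows. The function name `direct_adjusted_rr` and the tuple layout `(a, b, c, d)` per stratum are assumptions made for this illustration; the weights are the inverse variances of the stratum-specific log ratios, as described earlier in the chapter.

```python
import math

def direct_adjusted_rr(strata, z=1.96):
    """Inverse-variance pooled risk ratio, its CI, and the interaction chi-square.

    strata: list of (a, b, c, d) tuples, one per stratum.
    """
    log_ratios, weights = [], []
    for a, b, c, d in strata:
        n1, n0 = a + c, b + d
        log_ratios.append(math.log((a / n1) / (b / n0)))
        weights.append(1 / (c / (a * n1) + d / (b * n0)))  # 1 / Var[ln(RR_i)]
    total_weight = sum(weights)
    pooled = sum(w * x for w, x in zip(weights, log_ratios)) / total_weight
    half_width = z / math.sqrt(total_weight)
    # Weighted squared deviations of stratum estimates from the pooled estimate.
    chi2 = sum(w * (x - pooled) ** 2 for w, x in zip(weights, log_ratios))
    return (math.exp(pooled), math.exp(pooled - half_width),
            math.exp(pooled + half_width), chi2)

strata = [(66, 36, 28, 32), (139, 93, 61, 54)]  # low and high maternal education
print([round(v, 3) for v in direct_adjusted_rr(strata)])
```

Because the code does not round the stratum-specific ratios before taking logs, its results can differ from the tabled hand calculation in the third decimal place.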
The calculation of the Mantel-Haenszel adjusted prevalence ratio and its 95% confidence interval is shown in Table 15-5.
Table 15-5. Calculations for computing the Mantel-Haenszel prevalence (risk) ratio
| Stratum | $a_i n_{0i} / n_i$ | $b_i n_{1i} / n_i$ | $(m_{1i} n_{1i} n_{0i} - a_i b_i n_i) / n_i^2$ |
| 1 | 27.7037 | 20.8889 | 10.17650 |
| 2 | 58.8847 | 53.6023 | 19.3933 |
| Sum | 86.5884 | 74.4912 | 29.5698 |
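The point estimate is
$PR_{MH} = \dfrac{86.5884}{74.4912} = 1.162$
To calculate the 95% confidence interval we will first calculate the standard error of the estimate:
$SE = \sqrt{\dfrac{29.5698}{86.5884 \times 74.4912}} = 0.0677$
The 95% confidence interval is calculated as:
$1.162 \exp\left(\pm 1.96 \times 0.0677\right)$ = (1.018, 1.327)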
Previously we found that mother’s education did not modify the sex-anemia relationship; therefore the interpretation would be that, controlling for mother’s education, males were 1.162 times as likely to be anemic as females. However, because there is little confounding (the crude value is 1.16), there is no need to control for mother’s education level.
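The Mantel-Haenszel ratio can be reproduced with a short Python sketch built from the three column sums of Table 15-5. The name `mh_risk_ratio` and the per-stratum tuple layout `(a, b, c, d)` are hypothetical choices for this example.

```python
import math

def mh_risk_ratio(strata, z=1.96):
    """Mantel-Haenszel adjusted risk ratio with a log-scale confidence interval."""
    numerator = denominator = variance_term = 0.0
    for a, b, c, d in strata:
        n1, n0, m1 = a + c, b + d, a + b
        n = n1 + n0
        numerator += a * n0 / n
        denominator += b * n1 / n
        variance_term += (m1 * n1 * n0 - a * b * n) / n ** 2
    rr = numerator / denominator
    se = math.sqrt(variance_term / (numerator * denominator))  # SE of ln(RR_MH)
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)

strata = [(66, 36, 28, 32), (139, 93, 61, 54)]
print([round(v, 3) for v in mh_risk_ratio(strata)])
```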
Calculation of the directly adjusted prevalence difference
and its 95% confidence interval is shown in Table
15-6.
Table 15-6. Calculations for computing the direct adjusted prevalence (risk) difference
| Stratum | PDi | wi | wi PDi |
| 1 | 0.1727 | 169.8171 | 29.3274 |
| 2 | 0.0623 | 378.6661 | 23.5909 |
| Sum | | 548.4832 | 52.9183 |
The point estimate is:
$PD_{direct} = \dfrac{52.9183}{548.4832} = 0.0965$
and the 95% confidence interval is:
$0.0965 \pm \dfrac{1.96}{\sqrt{548.4832}}$ = (.0128, .1802)
Depending on the frequency of disease, it may be useful to describe the difference in terms of per 100 individuals (or percent), per 1,000, or some other unit. In this example, the males had a prevalence of anemia 9.65% higher (in absolute terms) than females controlling for maternal education, and we are 95% confident that the truth is captured between 1.28% and 18.02%. However, before the decision is made as to whether or not to present the adjusted difference, the test for interaction should be calculated. Again, note that $1/\mathrm{Var}(PD_i) = w_i$.
Therefore, the test for interaction for prevalence/risk differences would be:
$\chi^2 = 169.8171\,(0.1727 - 0.0965)^2 + 378.6661\,(0.0623 - 0.0965)^2 = 1.429$
The chi square value of 1.42886 with one degree of freedom would have a p-value of .232, which would not be statistically significant. The next step would be to determine whether mother’s education confounds the sex-anemia relationship. The crude prevalence difference was .097, essentially the same as the adjusted difference (.0965), which would lead to the conclusion that there is no important confounding in this analysis.
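The inverse-variance weighting used for the directly adjusted difference is generic: it needs only a list of stratum estimates and their variances. The sketch below separates those two pieces; both function names are hypothetical, and the variance of each stratum difference is assumed to be the sum of the two binomial variances.

```python
import math

def pool_inverse_variance(estimates, variances, z=1.96):
    """Pooled estimate, CI limits, and interaction chi-square (strata - 1 df)."""
    weights = [1 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    half_width = z / math.sqrt(total_weight)
    chi2 = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    return pooled, pooled - half_width, pooled + half_width, chi2

def difference_and_variance(a, b, c, d):
    """Stratum-specific risk difference and its variance."""
    n1, n0 = a + c, b + d
    p1, p0 = a / n1, b / n0
    return p1 - p0, p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0

strata = [(66, 36, 28, 32), (139, 93, 61, 54)]
pairs = [difference_and_variance(*s) for s in strata]
result = pool_inverse_variance([p[0] for p in pairs], [p[1] for p in pairs])
print([round(v, 4) for v in result])
```

The same pooling function could be applied to log odds ratios or log risk ratios, with the pooled value and limits exponentiated afterwards.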
Calculation of the directly adjusted (prevalence) odds ratio
and its 95% confidence interval is shown in Table
15-7.
Table 15-7. Calculations for computing the direct adjusted (prevalence) odds ratio
| Stratum | ORi | ln(ORi) | wi | wi ln(ORi) |
| 1 | 2.095 | .73955 | 9.09971 | 6.72969 |
| 2 | 1.323 | .27990 | 18.91829 | 5.29523 |
| Sum | | | 28.01800 | 12.02492 |
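The calculated point estimate is:
$OR_{direct} = \exp\left(\dfrac{12.02492}{28.01800}\right) = \exp(0.42919) = 1.536$
The 95% confidence interval is:
$\exp\left(0.42919 \pm \dfrac{1.96}{\sqrt{28.01800}}\right)$ = (1.061, 2.224)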
The interpretation would be that the odds of anemia in males were 1.536 times the odds of anemia in females, controlling or adjusting for the mother’s education level. In addition, we are 95% confident that the true prevalence odds ratio is captured between 1.061 and 2.224. However, we must still calculate the test for interaction to see if the mother’s education level modifies the sex-anemia relationship. To calculate the test for interaction, the directly adjusted odds ratio needs to be calculated beforehand. Also, note that $1/\mathrm{Var}[\ln(OR_i)] = w_i$.
Therefore, the test for interaction for the (prevalence) odds ratio would be:
$\chi^2 = 9.09971\,(0.73955 - 0.42919)^2 + 18.91829\,(0.27990 - 0.42919)^2 = 1.298$
The p-value is calculated for a chi square value of 1.298 with one degree of freedom (the degrees of freedom equal the number of strata minus 1). The p-value from this example is .255. Therefore, we would state that the mother’s education level does not significantly modify the sex-anemia relationship. The next question is whether the mother’s education level confounds the relationship. The crude prevalence odds ratio was 1.536 and the directly adjusted value was the same; the conclusion would be that, based on the odds ratio, mother’s education neither modifies nor confounds the sex-anemia relationship.
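Calculation of the Mantel-Haenszel adjusted (prevalence) odds ratio and its 95% confidence interval is as follows. To calculate the point estimate and the confidence interval, the eight values shown in Table 15-8 need to be calculated. In that table, $P_i = (a_i + d_i)/n_i$, $Q_i = (b_i + c_i)/n_i$, $R_i = a_i d_i / n_i$, and $S_i = b_i c_i / n_i$.
The calculated point estimate is:
$OR_{MH} = \dfrac{\sum R_i}{\sum S_i} = \dfrac{34.66816}{22.57092} = 1.536$
The standard error of the natural log of the point estimate is calculated as:
$SE = \sqrt{\dfrac{19.91786}{2\,(34.66816)^2} + \dfrac{12.85722 + 14.75040}{2 \times 34.66816 \times 22.57092} + \dfrac{9.71370}{2\,(22.57092)^2}} = 0.1883$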
Table 15-8. Calculations for computing the Mantel-Haenszel adjusted (prevalence) odds ratio
| Stratum | Pi | Qi | Ri | Si |
| 1 | .60494 | .39506 | 13.03704 | 6.22222 |
| 2 | .55620 | .44380 | 21.63112 | 16.34870 |
| Sum | 1.16114 | .83886 | 34.66816 | 22.57092 |
| Stratum | PiRi | PiSi | QiRi | QiSi |
| 1 | 7.88663 | 3.76407 | 5.15041 | 2.45815 |
| 2 | 12.03123 | 9.09315 | 9.59999 | 7.25555 |
| Sum | 19.91786 | 12.85722 | 14.75040 | 9.71370 |
The confidence interval is based on the Robins, Greenland, and Breslow variance estimate. The 95% confidence interval is
$1.536 \exp\left(\pm 1.96 \times 0.1883\right)$ = (1.062, 2.222)
Previously we found that mother’s education did not modify the sex-anemia relationship; therefore the interpretation would be that, controlling for mother’s education, the odds of anemia in males were 1.536 times the odds in females. However, because there is little or no confounding (the crude value is 1.536), there is no need to control for mother’s education level.
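The Mantel-Haenszel odds ratio and its standard error can be sketched in Python by accumulating the sums that appear in Table 15-8. The function name `mh_odds_ratio` is hypothetical, and the definitions of P, Q, R, and S used in the code (agreement proportion, disagreement proportion, ad/n, and bc/n) are assumptions that reproduce the tabled values.

```python
import math

def mh_odds_ratio(strata, z=1.96):
    """Mantel-Haenszel odds ratio with a Robins-Greenland-Breslow style SE."""
    sum_r = sum_s = sum_pr = sum_ps_qr = sum_qs = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        p, q = (a + d) / n, (b + c) / n
        r, s = a * d / n, b * c / n
        sum_r += r
        sum_s += s
        sum_pr += p * r
        sum_ps_qr += p * s + q * r
        sum_qs += q * s
    odds_ratio = sum_r / sum_s
    se = math.sqrt(sum_pr / (2 * sum_r ** 2)
                   + sum_ps_qr / (2 * sum_r * sum_s)
                   + sum_qs / (2 * sum_s ** 2))   # SE of ln(OR_MH)
    return odds_ratio, odds_ratio * math.exp(-z * se), odds_ratio * math.exp(z * se)

strata = [(66, 36, 28, 32), (139, 93, 61, 54)]
print([round(v, 3) for v in mh_odds_ratio(strata)])
```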
The overall Mantel-Haenszel uncorrected chi-square test would be calculated using the intermediate calculations shown in Table 15-9.
Table 15-9. Calculations for computing the Mantel-Haenszel uncorrected chi-square test
| Stratum | $(a_i d_i - b_i c_i) / n_i$ | $(n_{1i} n_{0i} m_{1i} m_{0i}) / [(n_i - 1) n_i^2]$ |
| 1 | 6.81481 | 9.25832 |
| 2 | 5.28242 | 18.82774 |
| Sum | 12.09723 | 28.08606 |
Therefore
$\chi^2_{MH} = \dfrac{(12.09723)^2}{28.08606} = 5.211$
which would have a p-value of .022.
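The summary test can be checked with the following Python sketch, which accumulates the two columns of Table 15-9 across strata. The function name `mh_chi_square` is hypothetical, and the p-value again relies on the one-degree-of-freedom identity `erfc(sqrt(x / 2))`.

```python
import math

def mh_chi_square(strata):
    """Mantel-Haenszel uncorrected chi-square across strata, with its p-value."""
    difference_sum = variance_sum = 0.0
    for a, b, c, d in strata:
        n1, n0, m1, m0 = a + c, b + d, a + b, c + d
        n = n1 + n0
        difference_sum += (a * d - b * c) / n          # observed minus expected a
        variance_sum += n1 * n0 * m1 * m0 / ((n - 1) * n ** 2)
    chi2 = difference_sum ** 2 / variance_sum
    p_value = math.erfc(math.sqrt(chi2 / 2))           # upper tail, 1 df
    return chi2, p_value

chi2, p_value = mh_chi_square([(66, 36, 28, 32), (139, 93, 61, 54)])
print(round(chi2, 3), round(p_value, 3))
```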