User:Likelihoodist/sandbox

Odds Ratio for a Matched Case-Control Study
A case-control study involves selecting representative samples of cases and controls who do, and do not, have some disease, respectively. These samples are usually independent of each other. The prior prevalence of exposure to some risk factor is observed in subjects from both samples. This permits the estimation of the odds ratio for disease in exposed vs. unexposed people. Sometimes, however, it makes sense to match cases to controls on one or more confounding variables. In this case, the prior exposure of interest is determined for each case and her/his matched control. The data can be summarized in the following table.

Matched 2x2 Table
This table gives the exposure status of the matched pairs of subjects. There are $$n_{11}$$ pairs where both the case and her/his matched control were exposed, $$n_{10}$$ pairs where the case patient was exposed but the control subject was not, $$n_{01}$$ pairs where the control subject was exposed but the case patient was not, and $$n_{00}$$ pairs where neither subject was exposed. The exposure of matched case and control pairs is correlated due to the similar values of their shared confounding variables.

The following derivation is due to Breslow & Day. We consider each pair as belonging to a stratum with identical values of the confounding variables. Conditioned on belonging to the same stratum, the exposure statuses of the case and the control are independent of each other. For any case-control pair within the same stratum let

$$p_1$$ be the probability that a case patient is exposed,

$$p_0$$ be the probability that a control patient is exposed,

$$q_1 = 1 - p_1$$ be the probability that a case patient is not exposed, and

$$q_0 = 1- p_0$$ be the probability that a control patient is not exposed.

Then the probability that a case is exposed and a control is not is $$p_1q_0$$, and the probability that a control is exposed and a case is not is $$p_0q_1$$. The within-stratum odds ratio for exposure in cases relative to controls is

$$\psi = (p_1/ q_1)/(p_0/q_0) =  p_1q_0/(q_1p_0) $$

We assume that $$\psi$$ is constant across strata.

Now concordant pairs, in which either both the case and the control are exposed or neither is exposed, tell us nothing about the odds of exposure in cases relative to the odds of exposure among controls. The probability that the case is exposed and the control is not, given that the pair is discordant, is

$$\pi = (p_1q_0)/(p_1q_0 + q_1p_0) =  \psi/(\psi + 1) $$
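The identity above implies that $$\pi$$ depends on the stratum only through $$\psi$$. A minimal numerical sketch of this point follows; the function names, the value of `psi`, and the control exposure probabilities are all hypothetical and chosen only for illustration.

```python
def discordant_prob(p1, p0):
    """P(case exposed, control unexposed | the pair is discordant)."""
    q1, q0 = 1.0 - p1, 1.0 - p0
    return (p1 * q0) / (p1 * q0 + q1 * p0)


def case_prob_from_odds_ratio(psi, p0):
    """Case exposure probability implied by odds ratio psi and control probability p0."""
    odds1 = psi * p0 / (1.0 - p0)  # odds of exposure among cases
    return odds1 / (1.0 + odds1)


psi = 4.5  # hypothetical odds ratio, assumed constant across strata
for p0 in (0.05, 0.2, 0.5, 0.8):  # hypothetical stratum-specific control exposure
    p1 = case_prob_from_odds_ratio(psi, p0)
    pi = discordant_prob(p1, p0)
    # pi should equal psi / (psi + 1) in every stratum, whatever p0 is
    print(f"p0={p0:.2f}  p1={p1:.4f}  pi={pi:.4f}  psi/(psi+1)={psi / (psi + 1):.4f}")
```

Although $$p_0$$ and $$p_1$$ vary from stratum to stratum, the last two columns agree in every row, which is why the discordant pairs can be pooled across strata.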

Given the number of discordant pairs, $$n_{10}$$ has a binomial distribution, $$n_{10} \sim B(n_{10} + n_{01},\pi)$$, and the maximum likelihood estimate of $$\pi$$ is

$$\hat{\pi} = n_{10}/(n_{10}+n_{01}) = \hat{\psi}/(\hat{\psi} + 1)$$

Multiplying both sides of this equation by $$(n_{10} + n_{01})(\hat{\psi} + 1 )$$ and subtracting $$n_{10}\hat{\psi}$$ gives

$$n_{10} = \hat{\psi}(n_{10} + n_{01} - n_{10})$$ and hence

$$\hat{\psi} = n_{10}/n_{01} $$.

Now $$\hat{\pi}$$ is the maximum likelihood estimate of $$\pi$$, and $$\psi = \pi/(1-\pi)$$ is a monotonic function of $$\pi$$. It follows that $$\hat{\psi}$$ is the conditional maximum likelihood estimate of $$\psi$$ given the number of discordant pairs. Rothman et al. give an alternate derivation of $$\hat{\psi}$$ by showing that it is a special case of the Mantel-Haenszel estimate of the intra-strata odds ratio for stratified 2x2 tables. They also reference Breslow & Day as providing the derivation given here.
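The closed-form estimate can be checked numerically. The sketch below uses hypothetical discordant counts and a coarse grid search, not any particular published routine. It maximizes the conditional binomial log-likelihood over $$\pi$$ and compares the implied odds ratio with $$n_{10}/n_{01}$$.

```python
import math


def matched_odds_ratio(n10, n01):
    """Conditional ML estimate of the odds ratio from the discordant pairs."""
    if n01 == 0:
        raise ValueError("estimate is undefined (infinite) when n01 == 0")
    return n10 / n01


def binom_loglik(pi, n10, n01):
    """Binomial log-likelihood of pi, dropping the constant binomial coefficient."""
    return n10 * math.log(pi) + n01 * math.log(1.0 - pi)


n10, n01 = 40, 16  # hypothetical discordant pair counts
grid = [i / 1000 for i in range(1, 1000)]
pi_grid = max(grid, key=lambda p: binom_loglik(p, n10, n01))

print("closed form psi:", matched_odds_ratio(n10, n01))
print("grid search psi:", pi_grid / (1.0 - pi_grid))  # agrees up to grid resolution
```

The two values differ only by the resolution of the grid. The concordant counts $$n_{11}$$ and $$n_{00}$$ never enter the calculation.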

Under the null hypothesis that $$\psi = 1$$, $$\pi = 1/(1+1) = 0.5$$.

Hence, we can test the null hypothesis that $$\psi = 1$$ by testing the null hypothesis that $$\pi = 0.5$$. This is done using McNemar's test.

There are a number of ways to calculate a confidence interval for $$\pi$$. Let $$\hat{\pi}_{LB} $$ and $$\hat{\pi}_{UB}$$ denote the lower and upper bound of a confidence interval for $$\pi$$, respectively. Since $$\psi = \pi/(1-\pi)$$, the corresponding confidence interval for $$\psi$$ is

$$(\frac{\hat{\pi}_{LB} }{1 -\hat{\pi}_{LB} }, \frac{\hat{\pi}_{UB} }{1 -\hat{\pi}_{UB} })$$.

Matched 2x2 tables may also be analyzed using conditional logistic regression. This technique has the advantage of allowing users to regress case-control status against multiple risk factors from matched case-control data.

Example
McEvoy et al. studied the use of cell phones by drivers as a risk factor for automobile crashes in a case-crossover study. All study subjects were involved in an automobile crash requiring hospital attendance. Each driver's cell phone use at the time of her/his crash was compared to her/his cell phone use in a control interval at the same time of day one week earlier. We would expect that a person's cell phone use at the time of the crash would be correlated with her/his use one week earlier. Comparing usage during the crash and control intervals adjusts for the driver's characteristics, the time of day, and the day of the week. The data can be summarized in the following table.

There were 5 drivers who used their phones in both intervals, 27 who used them in the crash but not the control interval, 6 who used them in the control but not the crash interval, and 288 who did not use them in either interval. The odds ratio for crashing while using their phone relative to driving when not using their phone was

$$\hat{\psi} = 27/6 = 4.5 $$.

Testing the null hypothesis that $$\psi = 1$$ is the same as testing the null hypothesis that $$\pi = 0.5$$ given 27 out of 33 discordant pairs in which the driver was using her/his phone at the time of the crash. McNemar's $$\chi^2 = 13.36$$. This statistic has one degree of freedom and yields a P value of 0.0003. This allows us to reject the hypothesis that cell phone use has no effect on the risk of automobile crashes ($$\psi = 1$$) with a high level of statistical significance.
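The test statistic can be reproduced from the two discordant counts. The following is a minimal sketch using only the Python standard library. It assumes the uncorrected form of McNemar's statistic, $$(n_{10} - n_{01})^2/(n_{10} + n_{01})$$, which is the form that matches the value quoted above. The function name `mcnemar_test` is a hypothetical helper.

```python
import math


def mcnemar_test(n10, n01):
    """McNemar chi-square (no continuity correction) and its P value on 1 df."""
    chi2 = (n10 - n01) ** 2 / (n10 + n01)
    # For 1 degree of freedom, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


chi2, p = mcnemar_test(27, 6)
print(f"chi2 = {chi2:.2f}, P = {p:.4f}")  # chi2 = 13.36, P = 0.0003
```

Applying a continuity correction would give a somewhat smaller statistic, so the choice of form should be stated when reporting results.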

Using Wilson's method, a 95% confidence interval for $$\pi$$ is (0.6561, 0.9139). Hence, a 95% confidence interval for $$\psi$$ is

$$(\frac{0.6561 }{1 - 0.6561}, \frac{0.9139}{1 - 0.9139 }) = (1.9, 10.6)$$

(McEvoy et al. analyzed their data using conditional logistic regression and obtained almost identical results to those given here. See the last row of Table 3 in their paper.)
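The Wilson interval and its transformation to the odds-ratio scale can be reproduced as follows. This is a sketch of the standard Wilson score formula; the helper names `wilson_interval` and `odds` are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (z = 1.959964 gives 95%)."""
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def odds(p):
    """Map a probability pi to the odds ratio scale: psi = pi / (1 - pi)."""
    return p / (1.0 - p)


lo, hi = wilson_interval(27, 33)  # 27 of the 33 discordant pairs
print(f"pi:  ({lo:.4f}, {hi:.4f})")              # (0.6561, 0.9139)
print(f"psi: ({odds(lo):.1f}, {odds(hi):.1f})")  # (1.9, 10.6)
```

Because $$\pi/(1-\pi)$$ is increasing in $$\pi$$, transforming the two endpoints preserves the coverage of the interval.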

Software for Power and Sample Size Calculations
Numerous programs are available for performing power and sample size calculations. These include nQuery Advisor, PASS, PS, R, Russ Lenth's power and sample-size page, SAS Power and sample size, and Stata. A large set of power and sample size routines is included in R and Stata, which are comprehensive statistical packages. The other programs listed above are specialized for these calculations and are easier to use for people who are not familiar with the more general packages. nQuery, PASS, SAS and Stata are commercial products. The other programs listed above are freely available.

Tom

Please take a look at my edits to the Statistical power and Sample size determination pages. My intent in writing this paragraph was to remove the orphan status of the PASS and PS pages. Since I last looked, external software links have been added to this page. Some of these links are not included in my paragraph, as I wanted to get some feedback from you and other editors before making further edits. It would be easy enough to include the PS and PASS


 * 1) REDIRECT PS Power and Sample Size

Edit to Fisher's exact test page
The decision to condition on the margins of the table is also controversial. The p-values derived from Fisher's test come from the distribution that conditions on the margin totals. In this sense, the test is exact only for the conditional distribution and not the original table where the margin totals may change from experiment to experiment. It is possible to obtain an exact p-value for the 2x2 table when the margins are not held fixed. Barnard's test, for example, allows for random margins. However, some authors (including, later, Barnard himself) have criticized Barnard's test based on this property. They argue that the marginal success total is an (almost) ancillary statistic, containing (almost) no information about the tested property.

The act of conditioning on the marginal success rate from a 2x2 table can be shown to ignore some information in the data about the unknown odds ratio. The argument that the marginal totals are (almost) ancillary implies that the appropriate likelihood function for making inferences about this odds ratio should be conditioned on the marginal success rate. Whether that lost information is important for inferential purposes is the essence of the controversy.
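To make the conditioning concrete, the sketch below computes a two-sided Fisher p-value directly from the hypergeometric distribution that results when all margins of the 2x2 table are held fixed. The table entries are hypothetical. Summing the probabilities of all tables no more likely than the observed one is one common two-sided convention, assumed here for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher p-value for the table [[a, b], [c, d]], all margins fixed."""
    r1, r2, c1 = a + b, c + d, a + c  # row totals and first column total
    total = comb(r1 + r2, c1)

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k
        return comb(r1, k) * comb(r2, c1 - k) / total

    k_min, k_max = max(0, c1 - r2), min(r1, c1)
    p_obs = prob(a)
    # Sum over tables with the same margins that are no more probable than the
    # observed one (small relative tolerance guards against float ties)
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_obs * (1 + 1e-9))


print(fisher_exact_two_sided(3, 1, 1, 3))  # hypothetical 2x2 table
```

Only tables sharing the observed margins contribute to the sum, which is exactly the restriction that the controversy described above concerns.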

Alternatives
An alternative exact test, Barnard's exact test, has been developed and proponents of it suggest that this method is more powerful, particularly in 2 × 2 tables. Another alternative is to use maximum likelihood estimates to calculate a p-value from the exact binomial or multinomial distributions and reject or fail to reject based on the p-value.

Choi et al. propose a p-value derived from the likelihood ratio test based on the conditional distribution of the odds ratio given the marginal success rate. This p-value is inferentially consistent with classical tests of normally distributed data as well as with likelihood ratios and support intervals based on this conditional likelihood function. It is also readily computable.