Sobel test

In statistics, the Sobel test is a method of testing the significance of a mediation effect. The test is based on the work of Michael E. Sobel and is an application of the delta method. In mediation, the relationship between the independent variable and the dependent variable is hypothesized to be an indirect effect that exists due to the influence of a third variable (the mediator). As a result, when the mediator is included in a regression analysis model with the independent variable, the effect of the independent variable is reduced and the effect of the mediator remains significant. The Sobel test is essentially a specialized t test that determines whether the reduction in the effect of the independent variable, after the mediator is included in the model, is a significant reduction, and therefore whether the mediation effect is statistically significant.

Theoretical basis


When evaluating a mediation effect three different regression models are examined:

Model 1: YO = γ1 + τXI + ε1

Model 2: XM = γ2 + αXI + ε2

Model 3: YO = γ3 + τ'XI + βXM + ε3

In these models YO is the dependent variable, XI is the independent variable and XM is the mediator. The parameters γ1, γ2, and γ3 represent the intercepts for each model, while ε1, ε2, and ε3 represent the error term for each equation. τ denotes the relationship between the independent variable and the dependent variable in model 1, while τ' denotes that same relationship in model 3 after controlling for the effect of the mediator. The terms αXI and βXM represent the relationship between the independent variable and the mediator, and the mediator and the dependent variable after controlling for the independent variable, respectively.
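As an illustrative sketch (not part of the original article), the three models can be estimated by ordinary least squares on simulated data. The sample size and effect sizes below are arbitrary assumptions chosen for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulate data with a true mediation structure (illustrative values).
x_i = rng.normal(size=n)                           # independent variable X_I
x_m = 0.5 * x_i + rng.normal(size=n)               # mediator X_M (alpha = 0.5)
y_o = 0.4 * x_m + 0.2 * x_i + rng.normal(size=n)   # outcome Y_O (beta = 0.4, tau' = 0.2)

def ols(y, *predictors):
    """Return intercept and slope estimates from an OLS fit."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Model 1: Y_O = gamma_1 + tau * X_I
_, tau = ols(y_o, x_i)
# Model 2: X_M = gamma_2 + alpha * X_I
_, alpha = ols(x_m, x_i)
# Model 3: Y_O = gamma_3 + tau' * X_I + beta * X_M
_, tau_prime, beta = ols(y_o, x_i, x_m)

print(tau, alpha, tau_prime, beta)
```

With these simulated effects, τ should come out near 0.5 × 0.4 + 0.2 = 0.4, and τ' near 0.2.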

Product of coefficients
From these models, the mediation effect is calculated as (τ – τ'). This represents the change in the magnitude of the effect that the independent variable has on the dependent variable after controlling for the mediator. From examination of these equations it can be determined that (αβ) = (τ – τ'). The α term represents the magnitude of the relationship between the independent variable and the mediator. The β term represents the magnitude of the relationship between the mediator and dependent variable after controlling for the effect of the independent variable. Therefore (αβ) represents the product of these two terms. In essence this is the amount of variance in the dependent variable that is accounted for by the independent variable through the mechanism of the mediator. This is the indirect effect, and the (αβ) term has been termed the product of coefficients.
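The identity (αβ) = (τ − τ') can be checked numerically: for OLS regressions with intercepts fitted on the same sample, the decomposition holds exactly, up to floating-point error. The data below are hypothetical, simulated for the check:

```python
import numpy as np

rng = np.random.default_rng(1)
x_i = rng.normal(size=300)
x_m = 0.6 * x_i + rng.normal(size=300)
y_o = 0.5 * x_m + 0.3 * x_i + rng.normal(size=300)

# tau: slope of Y_O on X_I alone; alpha: slope of X_M on X_I.
tau = np.polyfit(x_i, y_o, 1)[0]
alpha = np.polyfit(x_i, x_m, 1)[0]

# tau' and beta from the two-predictor model (Model 3).
X = np.column_stack([np.ones_like(x_i), x_i, x_m])
_, tau_prime, beta = np.linalg.lstsq(X, y_o, rcond=None)[0]

# The decomposition agrees to floating-point precision.
print(abs((tau - tau_prime) - alpha * beta))
```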

Venn diagram approach
Another way of thinking about the product of coefficients is to examine the figure below. Each circle represents the variance of one of the variables. Where the circles overlap represents variance the circles have in common, and thus the effect of one variable on another. For example, sections c + d represent the effect of the independent variable on the dependent variable if we ignore the mediator, and correspond to τ. This total amount of variance in the dependent variable that is accounted for by the independent variable can then be broken down into areas c and d. Area c is the variance that the independent variable and the dependent variable have in common with the mediator, and this is the indirect effect. Area c corresponds to the product of coefficients (αβ) and to (τ − τ'). The Sobel test assesses how large area c is: if area c is sufficiently large, the Sobel test is significant and significant mediation is occurring.



Calculating the Sobel test
In order to determine the statistical significance of the indirect effect, a statistic based on the indirect effect must be compared to its null sampling distribution. The Sobel test compares the magnitude of the indirect effect to its estimated standard error to derive a t statistic

t = (τ − τ')⁄SE   or   t = (αβ)⁄SE

where SE is the pooled standard error term, SE = √(α²σ²β + β²σ²α), and σ²β is the variance of β and σ²α is the variance of α.

This t statistic can then be compared to the normal distribution to determine its significance. Alternative methods of calculating the Sobel test have been proposed that use either the z or t distributions to determine significance, and each estimates the standard error differently.
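A minimal sketch of the full calculation, assuming simulated data and the classical OLS standard-error formulas, with the statistic compared to the standard normal as described above:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)
n = 400
x_i = rng.normal(size=n)
x_m = 0.5 * x_i + rng.normal(size=n)
y_o = 0.4 * x_m + 0.1 * x_i + rng.normal(size=n)

def fit_with_se(y, X):
    """OLS coefficients and their standard errors (classic formulas)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (len(y) - X.shape[1])      # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)               # coefficient covariance
    return coef, np.sqrt(np.diag(cov))

# Model 2: X_M = gamma_2 + alpha * X_I
X2 = np.column_stack([np.ones(n), x_i])
(_, alpha), (_, se_alpha) = fit_with_se(x_m, X2)

# Model 3: Y_O = gamma_3 + tau' * X_I + beta * X_M
X3 = np.column_stack([np.ones(n), x_i, x_m])
(_, _, beta), (_, _, se_beta) = fit_with_se(y_o, X3)

# Sobel statistic and two-sided p-value from the standard normal.
se_ab = sqrt(alpha**2 * se_beta**2 + beta**2 * se_alpha**2)
z = alpha * beta / se_ab
p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
print(z, p)
```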

Distribution of the product term
The distribution of the product term αβ is only normal at large sample sizes, which means that at smaller sample sizes the p-value derived from the formula will not be an accurate estimate of the true p-value. This occurs because both α and β are assumed to be normally distributed, and the distribution of the product of two normally distributed variables is skewed unless the means are much larger than the standard deviations. If the sample is large enough this will not be a problem; however, determining when a sample is sufficiently large is somewhat subjective.
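This non-normality can be seen by simulating products of normal draws; the means and standard deviations below are illustrative assumptions. When the means dominate the standard deviations the product is approximately normal, but otherwise it is heavily peaked and heavy-tailed:

```python
import numpy as np

rng = np.random.default_rng(3)

# Product of two standard normals: means do not dominate, so non-normal.
prod_small = rng.normal(0, 1, 100_000) * rng.normal(0, 1, 100_000)

# Means much larger than the standard deviations: product is near-normal.
prod_large = rng.normal(10, 1, 100_000) * rng.normal(10, 1, 100_000)

def excess_kurtosis(x):
    """Sample excess kurtosis; 0 for a normal distribution."""
    z = (x - x.mean()) / x.std()
    return (z**4).mean() - 3.0

print(excess_kurtosis(prod_small))  # far from 0 (theoretical value is 6)
print(excess_kurtosis(prod_large))  # close to 0
```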

Problems with the product of coefficients
In some situations it is possible that (τ – τ') ≠ (αβ). This occurs when the sample sizes differ across the models used to estimate the mediated effects. Suppose that the independent variable and the mediator are available for 200 cases, while the dependent variable is available for only 150 cases. This means that the α parameter is based on a regression model with 200 cases, while the β parameter is based on a regression model with only 150 cases; both τ and τ' are based on regression models with 150 cases. Because the models are estimated on different sample sizes and different participants, (τ – τ') ≠ (αβ). The only time (τ – τ') = (αβ) is when exactly the same participants are used in each of the regression models.
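A hypothetical sketch of this situation, with the outcome simulated but observed for only 150 of 200 cases:

```python
import numpy as np

rng = np.random.default_rng(5)
x_i = rng.normal(size=200)
x_m = 0.5 * x_i + rng.normal(size=200)
y_o = 0.4 * x_m + 0.2 * x_i + rng.normal(size=200)

# Suppose the outcome is observed for only the first 150 cases.
obs = slice(0, 150)

# alpha estimated from all 200 cases ...
alpha = np.polyfit(x_i, x_m, 1)[0]

# ... but tau, tau', and beta only from the 150 cases with complete data.
tau = np.polyfit(x_i[obs], y_o[obs], 1)[0]
X = np.column_stack([np.ones(150), x_i[obs], x_m[obs]])
_, tau_prime, beta = np.linalg.lstsq(X, y_o[obs], rcond=None)[0]

print(tau - tau_prime, alpha * beta)  # close, but no longer identical
```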

Product of the coefficients distribution
One strategy to overcome the non-normality of the distribution of the product of coefficients is to compare the Sobel test statistic to the distribution of the product instead of to the normal distribution. This approach bases the inference on a mathematical derivation of the distribution of the product of two normally distributed variables, which acknowledges the skew of the distribution instead of imposing normality.

Bootstrapping
Another approach that is becoming more popular in the literature is bootstrapping. Bootstrapping is a non-parametric resampling procedure that can build an empirical approximation of the sampling distribution of αβ by repeatedly sampling the dataset. Bootstrapping does not rely on the assumption of normality.
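A minimal bootstrap sketch, assuming simulated data and a percentile confidence interval for αβ (one of several possible bootstrap intervals):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
x_i = rng.normal(size=n)
x_m = 0.5 * x_i + rng.normal(size=n)
y_o = 0.4 * x_m + rng.normal(size=n)

def indirect_effect(xi, xm, yo):
    """Estimate alpha * beta from the two mediation regressions."""
    alpha = np.polyfit(xi, xm, 1)[0]
    X = np.column_stack([np.ones_like(xi), xi, xm])
    beta = np.linalg.lstsq(X, yo, rcond=None)[0][2]
    return alpha * beta

# Resample cases with replacement and re-estimate alpha * beta each time.
boot = np.empty(2000)
for b in range(2000):
    idx = rng.integers(0, n, n)
    boot[b] = indirect_effect(x_i[idx], x_m[idx], y_o[idx])

# Percentile confidence interval: mediation is judged significant
# if the interval excludes zero.
lo, hi = np.percentile(boot, [2.5, 97.5])
print(lo, hi)
```

Because the empirical distribution of the bootstrap replicates stands in for the sampling distribution of αβ, no normality assumption about the product term is needed.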