User:Eyoungstrom/sandbox/SEM

Structural Equation Modeling
Structural equation modeling (SEM) is a label for a diverse set of methods used mostly in the social and behavioral sciences by scientists doing both observational and experimental research. SEM involves constructing a model to represent how various aspects of a phenomenon are thought to be causally and structurally related to one another. The postulated causal structuring is often depicted with arrows representing causal connections between variables (as in Figure 1), but these causal connections can be equivalently represented as equations. The causal structures imply that specific patterns of connections should appear among the values of the variables, and the observed connections between the variables’ values are used to estimate the magnitudes of the causal effects and to test whether the observed data are consistent with the postulated causal structuring.

The boundary between what is and is not a structural equation model is not always clear but SE models often contain postulated causal connections among a set of latent variables (variables thought to exist but which can’t be directly observed) and causal connections linking the postulated latent variables to variables that can be observed and whose values are available in some data set. Variations among the styles of latent causal connections, variations among the observed variables measuring the latent variables, and variations in the statistical estimation strategies result in the SEM toolkit including confirmatory factor analysis, confirmatory composite analysis, path analysis, multi-group modeling, longitudinal modeling, partial least squares path modeling, latent growth modeling and hierarchical or multilevel modeling.

Figure 1 depicts a model in which the latent concept of human intelligence is measured using two different intelligence tests, and the latent concept academic performance is measured by the students’ SAT scores and their high school GPA. Because intelligence and academic performance are merely imagined or theory-postulated variables, their precise scale values are unknown, though the model specifies that each latent variable’s values must fall somewhere along the observable scale possessed by one of the indicators. The 1.0 effect connecting a latent to an indicator specifies that each real unit increase or decrease in the latent variable’s value results in a corresponding unit increase or decrease in the indicator’s value. It is hoped a good indicator has been chosen for each latent, but the 1.0 values do not signal perfect measurement because this model also postulates that there are other unspecified entities causally impacting the observed indicator measurements, thereby introducing measurement error. This model postulates that separate measurement errors influence each of the two indicators of latent intelligence, and each of the two indicators of latent academic performance. The unlabeled arrow pointing to academic performance acknowledges that things other than intelligence can also influence academic performance.

Researchers using SEM employ software programs to estimate the strength and sign of the effect coefficients of interest, and the extent to which unknown sources contribute measurement errors and residual variation in latent variables like academic performance. Because a postulated model such as Figure 1 may not correspond to the worldly forces controlling the data measurements, programs also attempt to provide diagnostic clues suggesting which indicators or which model components might be introducing inconsistency between the model and the observed worldly-based data.

History
Structural equation modeling (SEM) began differentiating itself from correlation and regression when Sewall Wright provided explicit causal interpretations for a set of regression-style equations based on a solid understanding of the physical and physiological mechanisms producing direct and indirect effects among his observed variables. The equations were estimated like ordinary regression equations but the substantive context for the measured variables permitted clear causal, not merely predictive, understandings. O. D. Duncan introduced SEM to the social sciences in his 1975 book, and SEM blossomed in the late 1970s and 1980s when increasing computing power permitted practical model estimation.

Different yet mathematically related modeling approaches developed in psychology, sociology, and economics. The convergence of two of these developmental streams (factor analysis from psychology, and path analysis from sociology via Duncan) produced the current core of SEM. LISREL, one of several programs Karl Jöreskog developed at Educational Testing Service, embedded latent variables (which psychologists knew as the latent factors from factor analysis) within path-analysis-style equations (which sociologists inherited from Wright and Duncan). The factor-structured portion of the model incorporated measurement errors and thereby permitted measurement-error-adjusted, if not necessarily error-free, estimation of effects connecting latent variables.

Hayduk (1987) provided the first book-length introduction to structural equation modeling with latent variables, and this was soon followed by Bollen’s popular text (1989). Traces of the historical convergence of the factor analytic and path analytic traditions persist as the distinction between the measurement and structural portions of models, and as continuing disagreements over model testing and over whether measurement should precede or accompany structural estimates. Viewing factor analysis as a data-reduction technique deemphasizes testing, which contrasts with the path analytic appreciation for testing specific postulated causal connections, where the testing sometimes signals model inappropriateness. The friction between the factor analytic and path analytic traditions is clear in the SEMNET archive. SEMNET is a free and open listserv supported by the University of Alabama {semnet@listserv.ua.edu}.

Wright's path analysis influenced Hermann Wold, Wold’s student Karl Jöreskog, and Jöreskog’s student Claes Fornell, but SEM never gained a large following among U.S. econometricians, possibly due to fundamental differences in modeling objectives and typical data structures. The continued separation of SEM’s economic branch has led to procedural and terminological differences, though deep mathematical and statistical connections remain. The economic version of SEM can be seen in SEMNET discussions of endogeneity, and in the heat produced as Judea Pearl’s approach to causality via directed acyclic graphs (DAGs) rubs against economic approaches to modeling {Pearl, J. (2009) Causality: Models, Reasoning, and Inference. Second edition. New York: Cambridge University Press}. Discussions comparing and contrasting the various approaches are beginning to appear, but disciplinary differences in data structures and in the concerns motivating SEM analysis suggest the final chapters on the fruitfulness of more fully incorporating SEM’s economic branch into SEM, or SEM and DAGs into economics, remain to be written.

General SEM Steps and Considerations
The following considerations apply to many structural equation models.

Model specification
Building or specifying a model requires attending to:


 * the set of variables to be employed,
 * what is known about the variables,
 * what is presumed or hypothesized about the variables’ causal connections and disconnections,
 * what the researcher seeks to learn from the modeling,
 * and the cases for which values of the variables will be available (kids? workers? companies? countries? cells? accidents? cults?).

Structural equation models attempt to mirror the worldly forces operative for causally homogeneous cases – namely cases enmeshed in the same worldly causal structures but whose values on the causes differ and who therefore possess different values on the outcome variables. Causal homogeneity can be facilitated by case selection, or by segregating cases in a multi-group model. The model’s specification is not complete until the researcher specifies:


 * which effects and/or correlations/covariances are to be included and estimated,
 * which effects and other coefficients are forbidden or presumed unnecessary,
 * and which coefficients will be given fixed/unchanging values (e.g., to provide measurement scales for latent variables as in Figure 1).

The latent level of a model is composed of endogenous and exogenous variables. The endogenous latent variables are the variables postulated as receiving effects from any other modeled variable or variables. Each endogenous variable is modeled as the dependent variable in a regression-style equation. The exogenous latent variables are background variables postulated as causing one or more of the endogenous variables. Causal connections among the exogenous variables are not explicitly modeled but are acknowledged by modeling the exogenous variables as freely correlating with one another. As in regression, each endogenous variable is assigned an error/residual variable representing the causal effects of unavailable and usually unknown causes. Each latent variable, whether exogenous or endogenous, is thought of as containing the cases’ true-scores on that variable, and these true-scores causally contribute valid/genuine variations into one or more of the observed/reported indicator variables.

To keep track of the connections between the various model components, the LISREL program employed a set of matrices having specific names. These names became relatively standard notation, though the notation has been extended and altered to accommodate a variety of statistical considerations (Jöreskog and Sörbom, 1967; Hayduk, 1987; Bollen, 1989; Kline, 2016). Texts and programs that “simplify” model specification via diagrams, or by using equations permitting user-selected variable names, re-convert the user’s model into some standard matrix-algebra form in the background. The “simplification” is often achieved by implicitly introducing default program “assumptions” about model features with which users supposedly need not concern themselves. Unfortunately, these default assumptions easily obscure model components and leave unrecognized issues lurking within the model’s structure.
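These matrix conventions can be sketched concretely. The following Python/NumPy fragment specifies a two-latent model like Figure 1 in all-y LISREL-style matrices and computes the covariance matrix the model implies for the four indicators. Every numerical value here is a hypothetical illustration, not an estimate from real data.

```python
import numpy as np

# All-y LISREL-style matrices for a Figure-1-like model:
# intelligence -> academic performance, two indicators per latent.
# All numeric values are hypothetical illustrations.

B = np.array([[0.0, 0.0],       # intelligence receives no modeled effect
              [0.8, 0.0]])      # effect of intelligence on performance

Psi = np.diag([1.0, 0.4])       # variance of intelligence; residual variance
                                # of academic performance

Lambda = np.array([[1.0, 0.0],  # test 1 scales intelligence (fixed at 1.0)
                   [0.9, 0.0],  # test 2 loading (free, illustrative value)
                   [0.0, 1.0],  # SAT scales performance (fixed at 1.0)
                   [0.0, 0.7]]) # GPA loading (free, illustrative value)

Theta = np.diag([0.3, 0.3, 0.5, 0.5])  # measurement-error variances

# Model-implied covariance matrix of the indicators:
# Sigma = Lambda (I - B)^-1 Psi (I - B)^-T Lambda^T + Theta
inv = np.linalg.inv(np.eye(2) - B)
Sigma = Lambda @ inv @ Psi @ inv.T @ Lambda.T + Theta
print(np.round(Sigma, 3))
```

Estimation then amounts to choosing the free values in these matrices so that the implied matrix comes as close as possible to the sample covariance matrix.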

Estimation of Free Model Coefficients
Model coefficients fixed at zero, 1.0, or other values do not require estimation because they already have specified values. Estimated values for free model coefficients are obtained by maximizing fit to, or minimizing difference from, the data relative to what the data’s features would look like if the free model coefficients took on the estimated values. The model’s implications for what should be observed with specific coefficient values depend on: a) the coefficient’s location in the model (e.g. which variables are involved), b) the nature of the connections between the variables (effects are often assumed to be linear), c) the nature of error or residual variables (these are often assumed to be independent of many variables), and d) the measurement scales appropriate for the variables (interval level measurement is often assumed).

A stronger effect connecting two latent variables implies that indicators of those latents should be more strongly correlated. Hence, the most reasonable estimate of the latent effect’s magnitude will be the value that best matches the data correlation (maximizes the match with the data, or minimizes the difference from the data). With maximum likelihood estimation, the numerical values of all the free model coefficients are adjusted (progressively increased or decreased from initial start values) until they maximize the likelihood of observing the sample data – whether the data are the variables’ covariances/correlations, or the cases’ actual values on the indicator variables. Ordinary least squares estimates are the coefficient values that minimize the squared differences between the data and what the data would look like if the model were correctly specified, namely if all the model’s features corresponded to real worldly features.
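A minimal sketch of the maximum likelihood idea, assuming a simplified Figure-1-style model in which only one coefficient (the latent effect b) is free and all other values are fixed at hypothetical numbers purely for illustration; real SEM software estimates all free coefficients simultaneously. The function minimized is the standard ML discrepancy F = ln|Σ(θ)| + tr(SΣ(θ)⁻¹) − ln|S| − p.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical fixed values for a simplified Figure-1-style model;
# only the latent effect b (intelligence -> performance) is free here.
Lambda = np.array([[1.0, 0.0], [0.9, 0.0], [0.0, 1.0], [0.0, 0.7]])
Theta = np.diag([0.3, 0.3, 0.5, 0.5])
Psi = np.diag([1.0, 0.4])

def implied_sigma(b):
    """Covariance matrix implied by the model when the latent effect is b."""
    B = np.array([[0.0, 0.0], [b, 0.0]])
    inv = np.linalg.inv(np.eye(2) - B)
    return Lambda @ inv @ Psi @ inv.T @ Lambda.T + Theta

def f_ml(b, S):
    """Standard ML fit function: ln|Sigma| + tr(S Sigma^-1) - ln|S| - p."""
    Sigma = implied_sigma(b)
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_sigma + np.trace(S @ np.linalg.inv(Sigma)) - logdet_s - S.shape[0]

# Let the "sample" covariance equal the matrix implied by b = 0.8, so the
# minimizer should recover that value and drive the fit function to zero.
S = implied_sigma(0.8)
result = minimize_scalar(lambda b: f_ml(b, S), bounds=(0.0, 2.0), method='bounded')
print(round(result.x, 3))  # ~0.8
```

When the sample covariance is not exactly reproducible by the model, the minimized fit-function value is positive, and that residual discrepancy feeds the model test discussed below.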

The appropriate statistical feature to maximize or minimize to obtain estimates depends on the variables’ levels of measurement (estimation is generally easier with interval level measurement than nominal or ordinal measures), and where a specific variable appears in the model (e.g. endogenous dichotomous variables are more awkward to estimate than exogenous dichotomous variables). Most SEM programs provide several options for what is to be maximized or minimized during estimation (e.g. maximum likelihood estimation (MLE), full information maximum likelihood (FIML), ordinary least squares (OLS), weighted least squares (WLS), diagonally weighted least squares (DWLS), two-stage least squares, three stage least squares, etc.),{Kline, 2016} but detailed consideration of alternative estimation strategies requires more sophistication than is appropriate here. {A couple of helpful, more complex, references should be included here.}

One common problem is that a coefficient’s estimated value may be underidentified because it is insufficiently constrained by the model and data. There will be no unique best estimate unless the model and data together sufficiently constrain or restrict a coefficient’s value. For example, the magnitude of a single data correlation between two variables is insufficient to provide estimates of a pair of modeled reciprocal effects between those variables because the correlation might be accounted for by one of the reciprocal effects being stronger than the other, by the other effect being stronger than the first, or by the effects being of equal magnitude. Under-identified effect estimates can be rendered identified by introducing additional model and/or data constraints. For example, reciprocal effects can be rendered identified by constraining one effect estimate to be double, triple, or equivalent to, the other effect estimate, but the resultant estimates will only be trustworthy if the additional model constraint corresponds to the world’s structure. Data on a third variable that directly causes only one of the reciprocally causally connected variables can also assist identification. Constraining the third variable to not directly cause the other of the reciprocally connected variables breaks the symmetry that otherwise plagues reciprocal effect estimates, because the third variable must be more strongly correlated with the variable it causes directly than with the indirectly impacted variable at the “other” end of the reciprocal effects.{Rigdon, E. (1995). A necessary and sufficient identification rule for structural models estimated in practice. Multivariate Behavioral Research, 30(3):359-383.} Notice that this again presumes the properness of the model’s causal specification, namely that there really is no direct effect leading from the third variable to the variable at the “other” end of the reciprocal effects. Also notice that a theory’s demand for null/zero effects provides constraints assisting estimation, though many theories do not attend sufficiently to which effects are alleged to be nonexistent.

Model Assessment
Estimated model coefficients depend on the data, the model, and the estimation strategy. Hence model assessment should include:


 * whether the data contain reasonable measurements of appropriate variables,
 * whether the modeled cases are causally homogeneous (it makes no sense to estimate one model if the data cases reflect two or more different causal networks),
 * whether the model appropriately represents the theory or features of interest (models are rendered unpersuasive by omitting features required by a theory, or by inserting coefficients inconsistent with that theory),
 * whether the estimates are statistically justifiable (substantive assessments may be devastated by violating assumptions, by using an inappropriate estimator, and/or by encountering non-convergence of iterative estimators),
 * the substantive reasonableness of the estimates (negative variances, and correlations exceeding 1.0 or -1.0, are impossible; estimates that are statistically possible but inconsistent with theory may also challenge theory and our understanding),
 * and the remaining consistency or inconsistency between the model and data (the estimation process minimizes the differences between the model and data, but important and informative differences may nonetheless remain).

Research claiming to test or investigate a theory must attend to beyond-chance model-data inconsistency. Estimation adjusts the model’s free coefficients to provide the best possible fit to the data. Hence, if a model remains stubbornly inconsistent with the data despite selection of optimal coefficient estimates, an honest research response requires reporting and attending to the evidence pointing toward theory disconfirmation (namely a significant model χ2 test).{ Hayduk, L. A. (2014b) Shame for disrespecting evidence: The personal consequences of insufficient respect for structural equation model testing.  BMC: Medical Research Methodology, 14(124)1-10   DOI 10.1186/1471-2288-14-124    http://www.biomedcentral.com/1471-2288/14/124} It would be nonsense to claim a theory has been tested if beyond-chance model-data inconsistency is disregarded! If the modeling objective is to investigate the sufficiency of mechanisms postulated as carrying effects from background variables to causally downstream variables, model-data inconsistency questions the assessment because it reports that the currently modeled mechanisms and estimated effect magnitudes are inconsistent with the data supposedly adjudicating the mechanisms’ veracity.

Estimates of coefficients in failing (data inconsistent) models are interpretable but they are only interpretable in the sense that they report how the world would appear to someone believing a model that is inconsistent with the available data. The estimates in data-inconsistent models do not necessarily become obviously-wrong in the sense of becoming statistically-strange or wrong-signed according to theory. The estimates may closely match the theory’s requirements but if the theory remains inconsistent with the data, the match between the estimates and theory provides no succor. Estimates in failing models report how the effects would appear to someone clinging to an incorrect view of the world. Failing models can be interpreted, but the interpretation should be acknowledged, and be presented, as an interpretation that is confronted by the data.

Numerous fit indices attempt to quantify how closely a model fits the data but all the indices suffer from the logical difficulty that the size or amount of ill fit is not trustably coordinated with the severity or nature of the issues being signaled by the data. {Hayduk, L. A. (2014a). Seeing perfectly-fitting factor models that are causally misspecified: Understanding that close-fitting models can be worse. Educational and Psychological Measurement, 74(6): 905-926. (doi: 10.1177/0013164414527449) } Models with different causal structures which fit data identically well have been called equivalent models.{Kline, 2016} Such models are data-fit-equivalent though not causally equivalent. For even moderately complex models, truly equivalently-fitting models are rare. Models which are radically causally incorrect, yet which nonetheless almost-fit the available data according to some index, constitute a much greater research impediment.

The definitions of fit indices such as the RMSEA (Root Mean Square Error of Approximation), SRMR (Standardized Root Mean Square Residual), AIC (Akaike Information Criterion), and the CFI (Comparative Fit Index) can be found in multiple sources {Kline 2016, and others}. Hu and Bentler (1999) {Hu, L. and Bentler, P.M. (1999) Cutoff criteria for fit indices in covariance structure analysis: Conventional criteria versus new alternatives. ''Structural Equation Modeling. 6:1-55''.} became popular by proposing combinations of index values they thought would function reasonably given specific kinds of misspecification in factor models. Unfortunately, many articles cite Hu and Bentler (1999) despite not investigating factor models, and despite not even attending to the pairs of index values Hu and Bentler required. What is worse, many “accept” their wrong models rather than making the relatively simple changes that, according to Hu and Bentler, would recover the proper model. There is no reason to “accept” a causally wrong model when the proper model is within reach! Many researchers have cited Hu and Bentler and then proceeded in ways that render their published SEM claims dubious and likely to result in red faces or worse.
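For concreteness, two of these indices can be computed directly from χ2 results using their standard defining formulas. All numbers below are hypothetical, and some programs use N rather than N − 1 in the RMSEA denominator.

```python
import math

# Hypothetical chi-square results for a tested model and for the
# "baseline" (independence) model that CFI compares against.
chisq_m, df_m = 9.27, 4      # tested model
chisq_b, df_b = 250.0, 6     # baseline model
N = 300                      # sample size

# RMSEA: misfit per degree of freedom, adjusted for sample size.
rmsea = math.sqrt(max(chisq_m - df_m, 0.0) / (df_m * (N - 1)))

# CFI: proportionate improvement over the baseline model's misfit.
cfi = 1.0 - max(chisq_m - df_m, 0.0) / max(chisq_b - df_b, 0.0)

print(round(rmsea, 3), round(cfi, 3))  # 0.066 0.978
```

Note that both indices summarize the amount of misfit; neither reveals where in the model the misfit originates, which is the logical difficulty discussed above.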

A modification index is an estimate of how much a model’s fit to the data would “improve” (but not necessarily how much the model itself would improve) if a specific currently-fixed model coefficient were freed for estimation. Researchers confronting a data-inconsistent model can easily free whichever series of additional coefficients the modification indices report as likely to produce the greatest improvements in fit. Unfortunately this introduces a substantial risk of moving from a causally-wrong-and-failing model to a causally-wrong-but-fitting model because the improved data-fit alone does not provide any assurance that the freed coefficients are substantively reasonable or world matching. The original model may contain causal misspecifications such as incorrectly directed effects, or assumptions about unavailable variables, that cannot be corrected by adding coefficients to the current model, and hence the model remains misspecified despite the closer fit provided by the added coefficients. Fitting yet worldly-inconsistent models are likely to arise if a researcher committed to a particular model (for example a factor structured model with a specific number of factors) bludgeons an initially-failing model into fitting by inserting measurement error covariances “suggested” by the modification indices.

“Accepting” failing models as “close enough” is also not a reasonable alternative. A cautionary instance is provided by Browne, MacCallum, Kim, Andersen, and Glaser (2002) {Browne, M.W., MacCallum, R.C., Kim, C.T., Andersen, B.L., and Glaser, R. (2002) When fit indices and residuals are incompatible. Psychological Methods, 7:403-421.} who addressed the mathematics behind why the χ2 test can have (though it does not always have) considerable power to detect model misspecification. The probability reported by the χ2 test is the probability that the data (whether viewed as covariances or as the indicator variables’ values) could arise by random sampling variations if the current model, with its optimal estimates, constituted the real underlying population forces. A small χ2 probability reports that it would be unlikely for the current data to have arisen, with the remaining differences attributable to random sampling variations, if the modeled structure constituted the real population causal forces. Browne, MacCallum, Kim, Andersen, and Glaser presented a factor model they viewed as acceptable despite the model being significantly inconsistent with their data according to χ2. The fallaciousness of their idea that close fit should be treated as good enough was demonstrated by Hayduk, Pazderka-Robinson, Cummings, Levers and Beres (2005) {Hayduk, L. A., Pazderka-Robinson, H., Cummings, G. G., Levers, M-J. D., & Beres, M. A. (2005). Structural equation model testing and the quality of natural killer cell activity measurements. BMC Medical Research Methodology, 5(1), 1-9. (Open Web Access). (doi: 10.1186/1471-2288-5-1) Note the correction of .922 to .992, and the correction of .944 to .994 in their Table 1.} who demonstrated a fitting model for Browne, et al.’s own data by incorporating an experimental feature Browne, et al. had overlooked. The fault was not in the math of the indices or in the over-sensitivity of the testing. The fault was in the authors forgetting, neglecting, or overlooking that the amount of ill fit cannot be trusted to correspond to the seriousness of the real problems in a model’s specification. {Hayduk, L.A. (2014a). Seeing perfectly-fitting factor models that are causally misspecified: Understanding that close-fitting models can be worse. Educational and Psychological Measurement, 74(6): 905-926. (doi: 10.1177/0013164414527449)}

Many researchers have tried to justify switching to indices rather than testing their models by claiming that χ2 problematically increases (and hence  χ2’s probability decreases) with the sample size (N). There are two mistakes in discounting χ2 on this basis. First, for proper models, χ2 does not increase with increasing N, {Hayduk, L.A. (2014b) } so if χ2 increases with N that itself is a sign that something is problematic. And second, for models that are detectably misspecified, χ2’s increase with N provides the good-news of increasing statistical power to detect model misspecification (namely detect what otherwise would become a Type II error). Some kinds of important misspecifications cannot be detected by χ2 {Hayduk, L.A. (2014a). Seeing perfectly-fitting factor models that are causally misspecified: Understanding that close-fitting models can be worse. Educational and Psychological Measurement, 74(6): 905-926. (doi: 10.1177/0013164414527449)} so any amount of ill fit beyond what might be reasonably produced by random variations should be reported and conscientiously addressed. Only the misinformed, or those willfully hiding evidence of problems, will continue to misrepresent the connection between N and χ2.{ Hayduk, L. A. (2014b). Shame for disrespecting evidence: The personal consequences of insufficient respect for structural equation model testing.  BMC: Medical Research Methodology, 14(124)1-10.  DOI 10.1186/1471-2288-14-124    http://www.biomedcentral.com/1471-2288/14/124 ;  Barrett, P. (2007). Structural equation modeling: Adjudging model fit. Personality and Individual Differences, 42(5), 815-824. } The χ2 model test (possibly adjusted {Satorra, A., and Bentler, P. M. (1994). Corrections to test statistics and standard errors in covariance structure analysis. In A. von Eye and C. C. Clogg (Eds.), Latent variables analysis: Applications for developmental research (pp. 399-419). Thousand Oaks, CA: Sage.}) is the strongest available model test.
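The mechanics of the χ2 model test can be sketched as follows. The fit-function value, sample size, and coefficient count are hypothetical, and some programs multiply by N rather than N − 1.

```python
from scipy.stats import chi2

N = 300        # sample size (hypothetical)
p = 4          # number of observed indicator variables
q = 6          # number of free coefficients estimated (hypothetical)
F_min = 0.031  # minimized ML fit-function value (hypothetical)

# Test statistic and degrees of freedom: the p(p+1)/2 distinct variances and
# covariances provide the data constraints; each free coefficient uses one up.
chisq = (N - 1) * F_min
df = p * (p + 1) // 2 - q

# Probability of this much (or more) model-data discrepancy arising from
# random sampling variation if the model were correctly specified:
p_value = chi2.sf(chisq, df)
print(round(chisq, 2), df, round(p_value, 3))
```

Notice that a correctly specified model keeps F_min near zero as N grows, so chisq does not automatically inflate with sample size; only detectable misspecification makes the statistic grow with N.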

Sample size, power, and estimation
Researchers agree samples should be large enough to provide stable coefficient estimates and reasonable testing power but there is no general consensus regarding specific required sample sizes, or even how to determine appropriate sample sizes. Recommendations have been based on the number of coefficients to be estimated, the number of modeled variables, and Monte Carlo simulations addressing specific model coefficients.{Kline, 2016}  Sample size recommendations based on the ratio of the number of indicators to latents are factor oriented and do not apply to models employing single indicators having fixed nonzero measurement error variances. Overall, for moderate sized models without statistically difficult-to-estimate coefficients, the required sample sizes (N’s) seem roughly comparable to the N’s required for a regression employing all the indicators.

The larger the sample size, the greater the likelihood of including cases that are not causally homogeneous. Consequently, increasing N to improve the likelihood of being able to report a desired coefficient as significant, simultaneously increases both the risk of model misspecification and the power to detect the model’s misspecification. Researchers seeking to learn from their modeling (including potentially learning their model requires adjustment or replacement) will strive for as large a sample size as permitted by funding and by their assessment of likely population-based causal homogeneity. (If the available N is huge, modeling specific sub-types or sub-sets of cases can control for variables that might otherwise disrupt causal homogeneity.) Researchers fearing they might have to report their model’s deficiencies are torn between wanting a larger N to provide sufficient power to detect coefficients of interest, while avoiding the power capable of signaling model-data inconsistency. The huge variation in model structures and data characteristics suggests adequate sample sizes might be usefully located by considering other researchers’ experiences (both good and bad) with models of comparable size and complexity estimated with similar data.

Interpretation
All structural equation models are causal models and have causal interpretations but those interpretations will be fallacious/wrong if the model’s structure does not correspond to the world’s causal structure. Consequently, interpretation should address the overall status of the model, not merely the coefficients in the model. Whether a model fits the data, and how a model came to fit the data, are paramount for interpretation. Data fit obtained by exploring, or by following successive modification indices, does not guarantee the model is wrong but raises serious doubts because these approaches are prone to incorrectly modeling data features. For example, exploring to see how many factors are required preempts finding the data is not factor structured, especially if the factor model has been “persuaded” to fit via inclusion of measurement error covariances. Data’s ability to speak against a postulated model is progressively eroded with each unwarranted inclusion of a “modification index suggested” effect or error covariance. Exploratory analyses are prone to inappropriate inclusion of coefficients because it becomes exceedingly difficult to recover a proper model if the initial/base model contains several misspecifications.{Herting, R.H. and Costner, H.L. (2000) Another perspective on “The proper number of factors” and the appropriate number of steps. Structural Equation Modeling 7(1):92-110.}

Avoiding model-misguided interpretations requires detecting misspecified models. One helpful way to do this is to add new latent variables entering or exiting the original model at a few clear causal locations/variables. The correlations between indicators of the new latents and all the original indicators contribute to testing the original model’s structure because the few new and focused effects must work in coordination with the model’s original direct and indirect effects to appropriately coordinate the new and original indicators. If the original model’s structure was problematic, the sparse new causal connections will be insufficient to coordinate the new indicators with the original indicators, and will signal the inappropriateness of the original model structure through model failure. The correlational constraints grounded in null/zero effects, and coefficients assigned fixed nonzero values, contribute to both model testing and estimation of free coefficients, and hence deserve acknowledgment as the scaffolding supporting the estimates and their interpretation.

Replication is unlikely to detect misspecified models which inappropriately fit the data. If the replicate data are within random variations of the original data, the wrong model that fit the original data will likely continue to fit. The weakness of replication as guarantor of model properness is especially concerning after intentional exploration, as when confirmatory factor analysis (CFA) is applied to a random half of data investigated via exploratory factor analysis (EFA).

Direct-effect estimates in fitting models are interpreted in parallel to the interpretation of coefficients in regression equations but with causal commitment. Each unit increase in a causal variable’s value is viewed as producing the estimated magnitude of response in the dependent variable’s value given control or adjustment for all the other operative/modeled causal mechanisms. Indirect effects are interpreted similarly, with the magnitude of a specific indirect effect equaling the product of the series of direct effects comprising that indirect effect. The statistical insignificance of an effect indicates the estimate could rather easily have arisen as a random sampling variation around a null/zero effect, so the interpretation of the estimate as a real effect becomes equivocal. Effects touching loops or reciprocal effects require slightly revised interpretations. {Hayduk, L.A. (1987). Structural Equation Modeling with LISREL: Essentials and Advances. Baltimore: Johns Hopkins University Press.; Hayduk, L.A. (1996). LISREL Issues, Debates, and Strategies. Baltimore: Johns Hopkins University Press.}
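The arithmetic of direct, indirect, and total effects can be illustrated with a hypothetical three-variable chain; the coefficient values below are invented for illustration.

```python
# Hypothetical standardized effect estimates for a model with
# X -> M -> Y plus a direct effect X -> Y.
effect_x_on_m = 0.5   # direct effect of X on M
effect_m_on_y = 0.4   # direct effect of M on Y
direct_x_on_y = 0.2   # direct effect of X on Y

# The indirect effect of X on Y through M is the product of the direct
# effects along that path; the total effect sums the direct and indirect.
indirect_x_on_y = effect_x_on_m * effect_m_on_y
total_x_on_y = direct_x_on_y + indirect_x_on_y
print(indirect_x_on_y, total_x_on_y)  # 0.2 0.4
```

With a negative direct effect and a positive indirect effect (or vice versa), the same arithmetic shows how pathways can counteract one another and even produce a near-zero total effect despite substantial individual effects.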

And as in regression, the proportion of each dependent variable’s variance explained by variations in the modeled causes is provided by R2, though the Blocked-Error R2 should be used if the dependent variable is involved in reciprocal effects or has an error variable correlated with any predictor’s error variable. {Hayduk, L.A. (2006). Blocked-Error-R2: A conceptually improved definition of the proportion of explained variance in models containing loops or correlated residuals. Quality and Quantity, 40, 629-649.}
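The regression parallel can be sketched directly: R2 is the ratio of the variance of the modeled (predicted) part of the dependent variable to its total variance. All coefficients and the sample size below are assumed for illustration, and the Blocked-Error adjustment for loops or correlated residuals is not shown.

```python
import numpy as np

# Minimal sketch (all coefficients assumed): R2 as the proportion of a
# dependent variable's variance explained by its modeled causes.
rng = np.random.default_rng(1)
n = 50_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 0.6 * x1 + 0.3 * x2 + rng.normal(size=n)   # error variance 1

X = np.column_stack([x1, x2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta                                # modeled part of y

r2 = np.var(y_hat) / np.var(y)
# population value: (0.36 + 0.09) / (0.36 + 0.09 + 1), about 0.31
```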

If a model adjusts for measurement errors, the adjustment permits interpreting latent-level effects as referring to variations in true scores.{ Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2003). The theoretical status of latent variables. Psychological Review, 110(2), 203–219. https://doi.org/10.1037/0033-295X.110.2.203 } SEM interpretations depart most radically from regression interpretations when there are causal connections among several modeled latent variables because SEM interpretations should convey the consequences of the patterns of indirect effects carrying effects from background variables through intervening variables to the downstream dependent variables. SEM interpretations encourage understanding how multiple worldly causal pathways can work in coordination, or independently, or even counteract one another. Direct effects may be counteracted (or reinforced) by indirect effects, or have their correlational implications counteracted (or reinforced) by the effects of common causes. Estimates should be interpreted in the context of the full model. Interpretations become progressively more complex for models containing interactions, nonlinearities, multiple groups, multiple levels, and categorical variables. {Kline, 2016}

The caution with which this segment on Interpretation began warrants repeating. Interpretation should be possible whether a model is consistent with, or is inconsistent with, the data. The estimates report how the world would appear to someone believing the model – even if that belief is unfounded because the model happens to be wrong. Interpretation should acknowledge that the model coefficients may or may not correspond to “parameters” – because the model’s coefficients may not have corresponding worldly structural features.

Both failing and fitting models can advance research. To be dependable, a model should investigate academically informative causal structures, fit the applicable data with understandable estimates, and not include vacuous coefficients.{Millsap, R.E. (2007) Structural equation modeling made difficult. Personality and Individual Differences, 42:875-881.}  Dependable fitting models are rarer than failing models and models inappropriately bludgeoned into fitting, but such models are possible. {Hayduk, L.A., Pazderka-Robinson, H., Cummings, G.G., Levers, M-J.D., & Beres, M.A. (2005). Structural equation model testing and the quality of natural killer cell activity measurements. BMC Medical Research Methodology, 5(1), 1-9. (Open Web Access) doi: 10.1186/1471-2288-5-1. Note the correction of .922 to .992, and the correction of .944 to .994 in Table 1.;  Entwisle, D.R., Hayduk, L.A. and Reilly, T.W. (1982) Early Schooling: Cognitive and Affective Outcomes. Baltimore:  Johns Hopkins University Press. ; Hayduk, L.A. (1994). Personal space: Understanding the simplex model. Journal of Nonverbal Behavior, 18(3):245-260.; Hayduk, L.A., Stratkotter, R., and Rovers, M.W. (1997). Sexual Orientation and the Willingness of Catholic Seminary Students to Conform to Church Teachings. Journal for the Scientific Study of Religion, 36(3):455-467.}

SEM Interpretive Fundamentals
Careful SEM interpretations connect specific model causal segments to their variance and covariance implications. Those new to SEM would be well advised to pursue understanding model implications by beginning with the definitions of variance and covariance, and the covariance-based definition of correlation (namely, correlation as the covariance between two variables, each standardized to have variance 1.0). Interpretation of effects in SE models begins with understanding how a causal equation, and the corresponding real-world features, use the variance in one variable to causally explain variance in another variable. The next step is to understand how two causal variables can both explain variance in a dependent variable, as well as how covariance between two such causes can increase or decrease explained variance in the effect (yes, SE models can explain a decrease in variance). Understanding causal implications implicitly connects to understanding controlling, and why some things but not others should be controlled.{Pearl, J. (2009) Causality: Models, Reasoning, and Inference. (2nd ed.). New York: Cambridge University Press.; Hayduk, L. A., Cummings, G., Stratkotter, R., Nimmo, M., Grugoryev, K., Dosman, D., Gillespie, M., & Pazderka-Robinson, H. (2003). Pearl’s D-separation: One more step into causal thinking. Structural Equation Modeling, 10(2), 289-311. } Another fundamental component is understanding how a common cause explains covariance between two affected variables. As models become more complex these fundamental components can combine in non-intuitive ways, such as explaining how there can be no correlation (zero covariance) between two variables despite the variables being connected by a direct and non-zero causal effect. Authors near SEM’s beginnings were attentive to explaining such matters {Duncan, 1975; Hayduk 1987, 1996} but SEM beginners will likely have to practice considerable self-directed searching to find more recent presentations.
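Two of these fundamentals can be simulated. All coefficients below are assumed for illustration, chosen so that a common cause exactly counteracts a direct effect: the sketch shows correlation as the covariance between standardized variables, and a case of (near-)zero covariance between two variables despite a nonzero direct causal effect between them.

```python
import numpy as np

# Simulated sketch (all coefficients assumed, chosen so the common
# cause z exactly counteracts the direct effect of x on y):
rng = np.random.default_rng(2)
n = 200_000

z = rng.normal(size=n)                               # common cause
x = 0.5 * z + np.sqrt(0.75) * rng.normal(size=n)     # var(x) = 1
y = 0.5 * x - 1.0 * z + rng.normal(size=n)           # direct effect of x is +0.5

# Correlation is the covariance between the standardized variables.
zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()
corr_from_cov = np.mean(zx * zy)

# cov(x, y) = 0.5*var(x) - 1.0*cov(x, z) = 0.5 - 0.5 = 0, so x and y
# are (nearly) uncorrelated despite the nonzero direct effect.
cov_xy = np.cov(x, y)[0, 1]
```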

Controversies and Movements
Structural equation modeling is fraught with controversies, many of which have been aired in discussions on SEMNET and which remain available in SEMNET’s archive. SEMNET is a free listserv that is available at semnet@listserv.ua.edu. One disagreement centered on measurement and model testing. Attempting to “simplify” data by reducing multiple indicators to a smaller number of latent factors led many researchers to use factor-structured models. It was hoped the multiple indicators could be replaced by a scale, a composite-indicator, or factor-scores in practical contexts or when modeling the causal actions of the postulated latent factor. This implicitly suggested that research could be conducted stepwise, with measurement assessments providing a scale value or factor-score before attending to latent structural connections. This stepwise approach sounds reasonable but discussion of its multiple underlying deficiencies simmered on SEMNET for more than a year. One key deficiency of the stepwise approach corresponds to the difference between reliability and validity. A factor measurement model requires the causal actions of an underlying latent factor to reliably coordinate the values of the latent’s indicators. A full structural equation model moves toward validity by further requiring that the same latent factor also match the correlations between the latent factor’s indicators and indicators of theorized causes and consequences of that latent factor.{ Hayduk, L. A. (1996). LISREL Issues, Debates, and Strategies. Baltimore: Johns Hopkins University Press. } The indicators should have survived traditional initial assessments but a thorough validity assessment requires that the factor’s indicators (not just a scale manufactured from those indicators) be embedded in a full structural equation model connecting that factor to other relevant latent variables. 
The validity of a factor is questioned if it fails to function appropriately when embedded in an appropriate theoretical context. The simmering, and sometimes boiling, SEMNET discussions resulted in a special issue of the journal Structural Equation Modeling focused on a target article by Hayduk and Glaser (2000a),{ Hayduk, L.A., & Glaser, D.N. (2000a). Jiving the four-step, waltzing around factor analysis, and other serious fun. Structural Equation Modeling, 7(1), 1-35.} which was followed by several comments and a rejoinder{Hayduk, L.A., & Glaser, D.N. (2000b). Doing the four-step, right-2-3, wrong-2-3: A brief reply to Mulaik and Millsap; Bollen; Bentler; and Herting and Costner. Structural Equation Modeling, 7(1), 111-123.} (all freely available, thanks to the efforts of G. Marcoulides).

These discussions fueled a concern for testing model-data consistency, which ignited the next round of SEMNET discussions – with the path-history SEMNETers defending careful model testing and those with factor-histories trying to defend fit-indexing rather than fit-testing. This round of SEMNET discussions led to a target article by Barrett {Barrett, P. (2007). Structural equation modeling: Adjudging model fit. Personality and Individual Differences, 42(5), 815-824.} who said: “In fact, I would now recommend banning ALL such indices from ever appearing in any paper as indicative of model “acceptability” or “degree of misfit”.” {Barrett, P. (2007), page 821.} Barrett’s article drew considerable commentary from both perspectives. {Millsap, R.E. (2007) Structural equation modeling made difficult. Personality and Individual Differences, 42:875-881.; Hayduk, L.A., Cummings, G., Boadu, K., Pazderka-Robinson, H., & Boulianne, S. (2007). Testing! testing! one, two, three – Testing the theory in structural equation models! Personality and Individual Differences, 42(5), 841-850.}

A briefer controversy focused on competing models. It can be helpful to create competing models but there are fundamental issues that cannot be resolved by creating two models and retaining the better fitting model. The statistical sophistication of presentations like Levy and Hancock (2007), for example, makes it easy to overlook that a researcher might begin with one terrible model and one atrocious model, and end by retaining the structurally terrible model because some index reports it as better fitting than the other model. Unfortunately, even otherwise strong texts like Kline (2016) {Kline, 2016} are deficient in their presentation of model testing.{Hayduk, L.A. (2018). Review essay on Rex B. Kline’s Principles and Practice of Structural Equation Modeling: Encouraging a fifth edition. Canadian Studies in Population, 45(3-4), 154-178. DOI 10.25336/csp29397  (Open web access.)} Overall, careful structural equation modeling requires thorough and detailed diagnostic assessment of failing models, even if the model happens to be the best of several alternative models.

Factor models and theory-embedded factor structures having multiple indicators tend to fail, and one way to reduce the model-data inconsistency is to drop weak indicators. Reducing the number of indicators led to concern for, and controversy over, the minimum number of indicators required to support a latent variable in a structural equation model. Those tied to the factor tradition could be persuaded to reduce the number of indicators to three per latent variable, but three or even two indicators can be inconsistent with an underlying factor common cause. Subsequently Hayduk and Littvay (2012) { Hayduk, L.A., & Littvay, L. (2012). Should researchers use single indicators, best indicators, or multiple indicators in structural equation models? BMC Medical Research Methodology, 12:159, 1-17. (Open Web Access).} discussed how to think about, defend, and adjust for measurement error when using only a single indicator for each latent variable. Single indicators can be used effectively in SE models, and have been used for a long time,{ Entwisle, D. R., Hayduk, L. A. and Reilly, T.W. (1982) Early Schooling: Cognitive and Affective Outcomes. Baltimore:  Johns Hopkins University Press.}  but controversy is only as far away as a reviewer who happens to be dedicated to considering measurement from only a factor analytic perspective.
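The logic behind single-indicator adjustment parallels the classic correction for attenuation: if an indicator’s measurement-error variance is fixed from an assumed reliability, latent-level associations can be recovered from the attenuated observed associations. The reliabilities (0.8, 0.7), latent correlation (0.6), and sample size in this sketch are all assumed for illustration; this is not the full Hayduk-Littvay procedure, only the underlying arithmetic.

```python
import numpy as np

# Hedged sketch: recovering a latent correlation from single indicators
# whose error variances are implied by assumed reliabilities.
rng = np.random.default_rng(3)
n = 300_000

lx = rng.normal(size=n)                                    # latent cause
ly = 0.6 * lx + np.sqrt(1 - 0.36) * rng.normal(size=n)     # latent corr = 0.6

rel_x, rel_y = 0.8, 0.7                                    # assumed reliabilities
x = np.sqrt(rel_x) * lx + np.sqrt(1 - rel_x) * rng.normal(size=n)
y = np.sqrt(rel_y) * ly + np.sqrt(1 - rel_y) * rng.normal(size=n)

r_obs = np.corrcoef(x, y)[0, 1]              # attenuated by measurement error
r_latent = r_obs / np.sqrt(rel_x * rel_y)    # disattenuated estimate
```

The observed correlation is noticeably smaller than 0.6, while the disattenuated estimate recovers the assumed latent correlation.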

The controversy over model testing declined as SEMNET moved toward requiring clear reports of model-data inconsistency and encouraged attention to model misspecification. This is evident in the lack of opposition to the requirement of addressing “endogeneity”, namely model misspecification leading to lack of independence of error variables. Attention to proper model specification is solidifying as a SEM benchmark-requirement.

The controversy over the causal nature of structural equation models has also been declining, and Bollen and Pearl’s (2013) {Bollen, K.A. and Pearl, J. (2013) Eight myths about causality and structural equation models. In S.L. Morgan (ed.) Handbook of Causal Analysis for Social Research, Chapter 15, 301-328, Springer.}  elucidation of many of the confounding-myths will likely extend this trend. Even Mulaik, a factor-analysis stalwart, has acknowledged the causal basis of factor models.{Mulaik, S.A., 2009}

Though declining, tinges of these and many other controversies are scattered throughout the SEM literature, and you can easily incite disagreement by asking: What should be done with models that are significantly inconsistent with the data? Or by asking: Does simplicity override respect for evidence of data inconsistency? Or, what weight should be given to indexes which show close or not-so-close data fit for some models? Or, should we be especially lenient toward, and “reward”, parsimonious models that are inconsistent with the data? Or, given that the RMSEA condones disregarding some real ill fit for each model degree of freedom, doesn’t that mean that people testing models with null-hypotheses of non-zero RMSEA are doing deficient model testing? Considerable variation in sophistication is required to address such questions, though there are likely to be common response styles – likely centered on the interlocutors’ assessments of whether researchers are required to report and respect evidence of model-data inconsistency.
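The RMSEA point can be made concrete. Using one common form of the formula (assumed here), RMSEA = sqrt(max(χ² − df, 0) / (df·(N − 1))), the chi-square misfit condoned by any fixed nonzero RMSEA cutoff grows in proportion to the model’s degrees of freedom; the 0.05 cutoff and N = 400 below are illustrative.

```python
import math

# One common form of the RMSEA formula (assumed here):
#   RMSEA = sqrt(max(chi2 - df, 0) / (df * (N - 1)))
def rmsea(chi2: float, df: int, n: int) -> float:
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# Excess chi-square (over df) sitting exactly at an RMSEA cutoff of
# 0.05 for N = 400: the condoned misfit scales linearly with df.
n = 400
excess = {df: 0.05**2 * df * (n - 1) for df in (5, 50)}
```

A model with 50 degrees of freedom can carry ten times the absolute chi-square misfit of a 5-df model and still sit at the same RMSEA value, which is why exact-fit testers object to null hypotheses of nonzero RMSEA.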

Modeling Alternatives, Extensions, and Statistical Kin
The basic structural equation model has been extended in multiple ways, and alternative estimation strategies have been developed.

·         Longitudinal models  {reference required}

·         Growth curve models  {reference required}

·         Multiple group models (genders, cultures, regions, organizations, genetic groups, with or without specified constraints between groups)  {reference required}

·         Latent class models  {reference required}

·         Latent growth modeling  {reference required}

·         Hierarchical/multilevel models (people nested within groups; responses nested within people)  {reference required}

·         Mixture models  {reference required}

·         Multi-trait multi-method models  {reference required}

·         Measurement invariance models  {reference required}

·         Random intercepts models  {reference required}

·        Fusion validity models

SEM-specific software
LISREL (the first full SEM software, introduced basic notation, not free)

https://ssicentral.com/index.php/products/lisrel/

Mplus (not free)  https://www.statmodel.com/

SEM in Stata  (not free)   https://www.stata.com/features/structural-equation-modeling/

Lavaan (free R package)   https://cran.r-project.org/web/packages/lavaan/index.html

OpenMx (another free R package)   https://cran.r-project.org/web/packages/OpenMx/index.html

AMOS (not free)  https://www.ibm.com/products/structural-equation-modeling-sem (there is a placeholder for this on the disambiguation page: AMOS (statistical software package))

EQS  (not free)  https://www.azom.com/software-details.aspx?SoftwareID=11

Structural equation modeling programs differ widely in their capabilities and user requirements. Generally, the “simpler” or “more convenient” the user input, the greater the number of implicit model assumptions, and the greater the risk of introducing problems, or of disregarding output that signals problems, arising from insufficient attention to research design, methodology, measurement, or estimation difficulties.

Abbreviations Common in the Literature
CFA = Confirmatory Factor Analysis

CFI = Comparative Fit Index

DWLS = Diagonally Weighted Least Squares

EFA = Exploratory Factor Analysis

ESEM = Exploratory Structural Equation Modeling

GFI = Goodness of Fit Index

MIIV = Model Implied Instrumental Variables { Bollen, K. (2018). Model implied instrumental variables (MIIVs): An alternative orientation to structural equation modeling. Multivariate Behavioral Research, 54(1), 31-46. doi: 10.1080/00273171.2018.1483224. }

OLS = Ordinary Least Squares

RMSEA = Root Mean Square Error of Approximation

RMSR = Root Mean Squared Residual

SRMR = Standardized Root Mean Squared Residual

SEM = Structural Equation Model or Modeling

WLS  = Weighted Least Squares

χ2 = Chi-square

{I have left the references to be collected from the text, and formatted, by the Wikipedia people (due to the likely formatting complications). }