User:Scientificinfo/sandbox

Generalized Model Aggregation (GMA) is a generic method for aggregating prior statistical findings into a meta-model. It provides a flexible approach to synthesizing empirical findings from heterogeneous prior studies, that is, studies that use different subsets of explanatory variables and different study designs and methods.

From health to economics, many research disciplines use statistical methods to estimate relationships among variables, and summarizing those findings is critical for building on previous work and assessing the viability of a new model. The rapid growth in scientific output also requires flexible and robust methods for aggregating the findings of prior studies. In most cases, qualitative reviews, such as narrative reviews, are conducted to take stock of what is known, but they offer little quantitative guidance. Common quantitative aggregation methods (i.e., meta-analysis methods), in turn, usually combine estimates of the effect of one explanatory variable (e.g., a treatment) on one response variable (e.g., a health outcome) across multiple studies with similar designs.

Description
In the first step, empirical statistical findings (referred to as signatures) are extracted from prior studies. Signatures can be regression coefficients, correlation matrices, or variances of effect sizes across prior studies. In general, any measure that carries information about the underlying phenomenon (i.e., the true but unknown underlying model) can be used as a signature. Even if signatures are biased or incomplete, they contain relevant information about the true data generating process. The prior models then need to be simulated (i.e., replicated) in order to generate simulated signatures. Each replication should match the sample size, the variables studied, and any other conditions of its original study. Generating simulated signatures requires simulated data on both the explanatory and the response variables; the simulated response variable is the output of the meta-model.

Finally, the data generating process (the meta-model) is estimated numerically through optimization. In other words, an optimization solver estimates the unknown parameters of the meta-model by making the vector of simulated signatures match the vector of empirical signatures as closely as possible.
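
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published GMA implementation: the two prior studies, their reported slopes, the correlation RHO, the noise level NOISE_SD, and all function names are hypothetical, the weighting matrix is the identity, and a simple pattern search stands in for whatever solver is used in practice. The random draws are generated once and reused for every candidate parameter vector, so the objective is deterministic.

```python
import math
import random

# Hypothetical empirical signatures: each prior study reports one
# simple-regression slope and observes only one of the two regressors.
STUDIES = [
    {"name": "A", "n": 400, "regressor": 0, "slope": 0.8},  # y on x1 only
    {"name": "B", "n": 600, "regressor": 1, "slope": 0.7},  # y on x2 only
]
RHO = 0.5       # assumed correlation between x1 and x2
NOISE_SD = 0.5  # assumed residual standard deviation of the meta-model


def draw_study_data(n, rng):
    """Draw correlated regressors and noise once, for reuse in every objective call."""
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x2 = RHO * z1 + math.sqrt(1 - RHO ** 2) * z2
        rows.append((z1, x2, rng.gauss(0, NOISE_SD)))
    return rows


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulated_signatures(params, draws):
    """Replicate each prior study on data generated by the meta-model."""
    b1, b2 = params
    sims = []
    for study, rows in zip(STUDIES, draws):
        y = [b1 * x1 + b2 * x2 + e for x1, x2, e in rows]  # meta-model output
        x = [row[study["regressor"]] for row in rows]
        sims.append(ols_slope(x, y))
    return sims


def objective(params, draws):
    """Squared distance between simulated and empirical signatures (identity weights)."""
    sims = simulated_signatures(params, draws)
    return sum((s - study["slope"]) ** 2 for s, study in zip(sims, STUDIES))


def pattern_search(func, start, step=0.5, tol=1e-4):
    """Derivative-free coordinate search; halves the step when no move improves."""
    best, best_val = list(start), func(start)
    while step > tol:
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                val = func(trial)
                if val < best_val:
                    best, best_val, improved = trial, val, True
        if not improved:
            step /= 2
    return best, best_val


def estimate_meta_model(seed=0):
    """Estimate (b1, b2) by matching simulated to empirical signatures."""
    rng = random.Random(seed)
    draws = [draw_study_data(study["n"], rng) for study in STUDIES]
    return pattern_search(lambda p: objective(p, draws), [0.0, 0.0])


if __name__ == "__main__":
    (b1, b2), loss = estimate_meta_model()
    print(f"b1 = {b1:.3f}, b2 = {b2:.3f}, loss = {loss:.2e}")
```

Neither hypothetical study observes both regressors, yet the two slopes jointly identify both coefficients because the assumed correlation ties them together. In the large-sample limit the slopes equal b1 + RHO * b2 and b2 + RHO * b1, so the estimates should land near b1 = 0.6 and b2 = 0.4, up to simulation noise.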

Properties
GMA estimates share their theoretical underpinnings with the method of simulated moments and indirect inference. This gives them many appealing statistical properties:
 * GMA estimates do not require explicit likelihood functions. They are consistent under mild assumptions, as long as the underlying signatures are consistent and include enough information to identify the model.
 * Estimated parameters are asymptotically multivariate normal under many common scenarios, providing a straightforward path to confidence interval estimation and hypothesis testing.
 * While any positive semi-definite weighting matrix W can be used for estimation, the optimal choice (in terms of minimizing the variance of the estimated parameters) is a W proportional to the inverse of the covariance matrix of the empirical signatures.
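
The role of the weighting matrix can be written compactly. The notation below is an assumption introduced here for illustration, since the text does not fix symbols: θ is the vector of meta-model parameters, s^e the vector of empirical signatures, s^s(θ) the vector of simulated signatures, and Σ the covariance matrix of the empirical signatures. This is the standard quadratic-form objective of the method of simulated moments, not a formula quoted from the GMA authors.

```latex
\hat{\theta} = \arg\min_{\theta}
  \bigl(s^{e} - s^{s}(\theta)\bigr)^{\top} W \,\bigl(s^{e} - s^{s}(\theta)\bigr),
\qquad
W^{*} \propto \Sigma^{-1}, \quad \Sigma = \operatorname{Cov}\bigl(s^{e}\bigr)
```

With W equal to the identity matrix, every signature counts equally. With W proportional to the inverse of Σ, precisely estimated signatures receive more weight than noisy ones.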

Code
The code, along with instructions on how to run it, is publicly available.