Quantification of margins and uncertainties

Quantification of Margins and Uncertainty (QMU) is a decision support methodology for complex technical decisions. QMU focuses on the identification, characterization, and analysis of performance thresholds and their associated margins for engineering systems that are evaluated under conditions of uncertainty, particularly when portions of those results are generated using computational modeling and simulation. QMU has traditionally been applied to complex systems where comprehensive experimental test data is not readily available and cannot be easily generated for either end-to-end system execution or for specific subsystems of interest. Examples of systems where QMU has been applied include nuclear weapons performance, qualification, and stockpile assessment. QMU focuses on characterizing in detail the various sources of uncertainty that exist in a model, thus allowing the uncertainty in the system response output variables to be well quantified. These sources are frequently described in terms of probability distributions to account for the stochastic nature of complex engineering systems. The characterization of uncertainty supports comparisons of design margins for key system performance metrics to the uncertainty associated with their calculation by the model. QMU supports risk-informed decision-making processes where computational simulation results provide one of several inputs to the decision-making authority. There is currently no standardized methodology across the simulation community for conducting QMU; the term is applied to a variety of different modeling and simulation techniques that focus on rigorously quantifying model uncertainty in order to support comparison to design margins.

History
The fundamental concepts of QMU were originally developed concurrently at several national laboratories supporting nuclear weapons programs in the late 1990s, including Lawrence Livermore National Laboratory, Sandia National Laboratories, and Los Alamos National Laboratory. The original focus of the methodology was to support nuclear stockpile decision-making, an area where full experimental test data could no longer be generated for validation due to bans on nuclear weapons testing. The methodology has since been applied in other areas where safety-critical or mission-critical decisions for complex projects must be made using results based on modeling and simulation. Examples outside of the nuclear weapons field include applications at NASA for interplanetary spacecraft and rover development, missile six-degree-of-freedom (6DOF) simulation results, and characterization of material properties in terminal ballistic encounters.

Overview
QMU focuses on quantifying the ratio of design margin to model output uncertainty. The process begins with the identification of the key performance thresholds for the system, which can frequently be found in the system requirements documents. These thresholds (also referred to as performance gates) can specify an upper bound of performance, a lower bound of performance, or both when the metric must remain within a specified range. For each of these performance thresholds, the associated performance margin must be identified. The margin represents the targeted range in which the system is designed to operate in order to safely avoid the upper and lower performance bounds. These margins account for aspects such as the design safety factor to which the system is being developed, as well as the confidence level in that safety factor. QMU focuses on determining the quantified uncertainty of the simulation results as they relate to the performance threshold margins. This total uncertainty includes all forms of uncertainty related to the computational model as well as the uncertainty in the threshold and margin values. Identifying and characterizing these values allows the margin-to-uncertainty (M/U) ratios to be calculated for the system. These M/U values serve as quantified inputs that help authorities make risk-informed decisions about how to interpret and act upon simulation-based results.
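As a minimal sketch of the ratio described above, consider a single metric with one threshold, a best-estimate value from the simulation, and a single scalar uncertainty. The function name, its parameters, and the choice to measure margin as the distance from the best estimate to the threshold are illustrative assumptions; actual QMU implementations define M and U in different ways.

```python
def margin_to_uncertainty(threshold, best_estimate, uncertainty, upper_bound=True):
    """Return (margin, M/U) for one performance metric.

    Assumptions (hypothetical, for illustration only):
    - margin is the distance from the best estimate to the threshold;
    - uncertainty is a single positive number in the same units.
    """
    if uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    if upper_bound:
        # Metric must stay below the threshold.
        margin = threshold - best_estimate
    else:
        # Metric must stay above the threshold.
        margin = best_estimate - threshold
    return margin, margin / uncertainty


# Made-up numbers: threshold 100, best estimate 80, uncertainty 5.
margin, ratio = margin_to_uncertainty(threshold=100.0, best_estimate=80.0, uncertainty=5.0)
print(f"M = {margin}, M/U = {ratio}")
```

A negative margin in this sketch would mean the best estimate already violates the threshold, regardless of the uncertainty.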



QMU recognizes that there are multiple types of uncertainty that propagate through a model of a complex system. The simulation in the QMU process produces output results for the key performance thresholds of interest, known as the Best Estimate Plus Uncertainty (BE+U). The best estimate component of BE+U represents the core information that is known and understood about the model response variables. High confidence in these estimates usually rests on ample experimental test data for the process of interest, which allows the simulation model to be thoroughly validated.

The types of uncertainty that contribute to the value of the BE+U can be broken down into several categories:


 * Aleatory uncertainty: This type of uncertainty is naturally present in the system being modeled and is sometimes known as “irreducible uncertainty” or “stochastic variability.” Examples include processes that are naturally stochastic, such as wind gust parameters and manufacturing tolerances.
 * Epistemic uncertainty: This type of uncertainty is due to a lack of knowledge about the system being modeled and is also known as “reducible uncertainty.” Epistemic uncertainty can result from uncertainty about the correct underlying equations of the model, incomplete knowledge of the full set of scenarios to be encountered, and lack of experimental test data defining the key model input parameters.
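One common way to keep the two categories separate in a simulation study is nested (double-loop) sampling: an outer loop over epistemic unknowns and an inner loop over aleatory variability. The sketch below is an illustration under stated assumptions, not a method prescribed by QMU. The `simulate` function, the parameter ranges, and the choice of the 95th percentile are all hypothetical.

```python
import random


def simulate(load_factor, gust):
    # Hypothetical stand-in for a real simulation model.
    return load_factor * (1.0 + gust)


def nested_sampling(n_epistemic=20, n_aleatory=500, seed=0):
    """Return the (min, max) of the 95th-percentile response across epistemic samples."""
    rng = random.Random(seed)
    percentiles = []
    for _ in range(n_epistemic):
        # Epistemic: a fixed but poorly known parameter, drawn from an assumed interval.
        load_factor = rng.uniform(0.9, 1.1)
        # Aleatory: natural variability, drawn from an assumed distribution.
        outputs = sorted(
            simulate(load_factor, rng.gauss(0.0, 0.1)) for _ in range(n_aleatory)
        )
        percentiles.append(outputs[int(0.95 * (n_aleatory - 1))])
    return min(percentiles), max(percentiles)


low, high = nested_sampling()
print(f"95th-percentile response ranges from {low:.3f} to {high:.3f}")
```

The spread between `low` and `high` reflects the epistemic contribution, which further testing or analysis could in principle narrow. The percentile within each inner loop reflects aleatory variability, which it could not.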

The system may also suffer from requirements uncertainty related to the specified thresholds and margins associated with the system requirements. QMU acknowledges that in some situations, the system designer may have high confidence in what the correct value for a specific metric may be, while at other times, the selected value may itself suffer from uncertainty due to lack of experience operating in this particular regime. QMU attempts to separate these uncertainty values and quantify each of them as part of the overall inputs to the process.

QMU can also factor in human error in the ability to identify the unknown unknowns that can affect a system. These errors can be quantified to some degree by looking at the limited experimental data that may be available for previous system tests and identifying what percentage of tests resulted in system thresholds being exceeded in an unexpected manner. This approach attempts to predict future events based on the past occurrences of unexpected outcomes.
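A very simple version of this estimate is the fraction of past tests in which a threshold was exceeded unexpectedly. The records below are invented for illustration, and the function name is hypothetical. With few tests available, such a fraction is only a rough indicator.

```python
def unexpected_exceedance_rate(test_outcomes):
    """Fraction of past tests flagged as unexpected threshold exceedances."""
    if not test_outcomes:
        raise ValueError("at least one test record is required")
    flagged = sum(1 for unexpected in test_outcomes if unexpected)
    return flagged / len(test_outcomes)


# Hypothetical records: True marks a test with an unexpected exceedance.
history = [False, False, True, False, False, False, False, True, False, False]
rate = unexpected_exceedance_rate(history)
print(f"Observed rate of unexpected exceedances: {rate:.0%}")
```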

The underlying parameters that serve as inputs to the models are frequently modeled as samples from a probability distribution. The input parameter distributions, together with the model propagation equations, determine the distribution of the output parameter values. The distribution of a specific output value must be considered when determining an acceptable M/U ratio for that performance variable. If the uncertainty limit for U includes a finite upper bound due to the particular distribution of that variable, a lower M/U ratio may be acceptable. However, if U is modeled as a normal or exponential distribution, which can include outliers from the far tails of the distribution, a larger M/U ratio may be required in order to reduce system risk to an acceptable level.

Acceptable M/U ratios for safety-critical systems can vary from application to application. Studies have cited acceptable M/U ratios as being in the 2:1 to 10:1 range for nuclear weapons stockpile decision-making. Intuitively, the larger the value of M/U, the less of the available performance margin is being consumed by uncertainty in the simulation outputs. A ratio of 1:1 could result in a simulation run where the simulated performance threshold is not exceeded when in actuality the entire design margin may have been consumed. Rigorous QMU does not ensure that the system itself is capable of meeting its performance margin; rather, it serves to ensure that the decision-making authority can make judgments based on accurately characterized results.

The underlying objective of QMU is to present information to decision-makers that fully characterizes the results in light of the uncertainty as understood by the model developers. This presentation of results gives decision-makers an opportunity to make informed decisions while understanding what sensitivities exist in the results due to the current understanding of uncertainty. Advocates of QMU recognize that decisions for complex systems cannot be made strictly on the basis of the quantified M/U metrics. Subject matter expert (SME) judgment and other external factors such as stakeholder opinions and regulatory issues must also be considered by the decision-making authority before a final outcome is decided.

Verification and validation
Verification and validation (V&V) of a model is closely related to QMU. Verification is broadly acknowledged as the process of determining if a model was built correctly; validation activities focus on determining if the correct model was built. V&V against available experimental test data is an important aspect of accurately characterizing the overall uncertainty of the system response variables. V&V seeks to make maximum use of component and subsystem-level experimental test data to accurately characterize model input parameters and the physics-based models associated with particular sub-elements of the system. The use of QMU in the simulation process helps to ensure that the stochastic nature of the input variables (due to both aleatory and epistemic uncertainties) as well as the underlying uncertainty in the model are properly accounted for when determining the simulation runs required to establish model credibility prior to accreditation.

Advantages and disadvantages
QMU has the potential to support improved decision-making for programs that must rely heavily on modeling and simulation. Modeling and simulation results are being used more often during the acquisition, development, design, and testing of complex engineering systems. One of the major challenges of developing simulations is knowing how much fidelity should be built into each element of the model. The pursuit of higher fidelity can significantly increase development time and total cost of the simulation development effort. QMU provides a formal method for describing the required fidelity relative to the design threshold margins for key performance variables. This information can also be used to prioritize areas of future investment for the simulation. Analysis of the various M/U ratios for the key performance variables can help identify model components that are in need of fidelity upgrades in order to increase simulation effectiveness.
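One simple way to use M/U ratios for prioritization is to rank the key performance variables from the lowest ratio to the highest, so that the variables with the least margin relative to uncertainty surface first. The metric names and numbers below are invented for illustration, and ranking by the raw ratio is an assumption rather than an established QMU procedure.

```python
def rank_by_mu(metrics):
    """Sort metrics by ascending M/U so the weakest ratio comes first.

    `metrics` maps a metric name to a (margin, uncertainty) pair.
    """
    ratios = {
        name: margin / uncertainty
        for name, (margin, uncertainty) in metrics.items()
    }
    return sorted(ratios.items(), key=lambda item: item[1])


# Hypothetical (margin, uncertainty) pairs for three performance variables.
metrics = {
    "peak_temperature": (30.0, 5.0),
    "structural_load": (12.0, 8.0),
    "guidance_error": (4.0, 1.0),
}
ranking = rank_by_mu(metrics)
for name, ratio in ranking:
    print(f"{name}: M/U = {ratio:.1f}")
```

In this made-up example, `structural_load` would be the first candidate for a fidelity upgrade or additional test data.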

A variety of potential issues related to the use of QMU have also been identified. QMU can lead to longer development schedules and increased development costs relative to traditional simulation projects due to the additional rigor being applied. Proponents of QMU state that the level of uncertainty quantification required is driven by certification requirements for the intended application of the simulation. Simulations used for capability planning or system trade analyses generally need to model only the overall performance trends of the systems and components being analyzed. However, for safety-critical systems where experimental test data is lacking, simulation results provide a critical input to the decision-making process. Another potential risk related to the use of QMU is a false sense of confidence regarding protection from unknown risks. The use of quantified results for key simulation parameters can lead decision-makers to believe all possible risks have been fully accounted for, an assumption that is particularly hard to justify for complex systems. Proponents of QMU advocate for a risk-informed decision-making process to counter this risk; in this paradigm, M/U results as well as SME judgment and other external factors are always factored into the final decision.