Nonhomogeneous Gaussian regression

Non-homogeneous Gaussian regression (NGR) is a type of statistical regression analysis used in the atmospheric sciences to convert ensemble forecasts into probabilistic forecasts. Relative to simple linear regression, NGR uses the ensemble spread as an additional predictor, with the aim of improving the prediction of uncertainty and allowing the predicted uncertainty to vary from case to case. The prediction of uncertainty in NGR is derived from both past forecast error statistics and the ensemble spread. NGR was originally developed for site-specific medium-range temperature forecasting, but has since also been applied to site-specific medium-range wind forecasting and to seasonal forecasts, and has been adapted for precipitation forecasting. The introduction of NGR was the first demonstration that probabilistic forecasts that take account of the varying ensemble spread could achieve better skill scores than forecasts based on standard model output statistics approaches applied to the ensemble mean.

Intuition
Weather forecasts generated by computer simulations of the atmosphere and ocean typically consist of an ensemble of individual forecasts. Ensembles are used to capture and quantify the uncertainties in the weather forecasting process, such as uncertainty in the initial conditions and uncertainty in the parameterisations in the model. For point forecasts of normally distributed variables, one can summarize an ensemble forecast with the mean and the standard deviation of the ensemble. The ensemble mean is often a better forecast than any of the individual forecasts, and the ensemble standard deviation may give an indication of the uncertainty in the forecast.

However, direct output from computer simulations of the atmosphere needs calibration before it can be meaningfully compared with observations of weather variables. This calibration process is often known as model output statistics (MOS). The simplest form of such calibration is to correct biases, using a bias correction calculated from past forecast errors. Bias correction can be applied to both individual ensemble members and the ensemble mean. A more complex form of calibration is to use past forecasts and past observations to train a simple linear regression model that maps the ensemble mean onto the observations. In such a model, the uncertainty in the prediction is derived purely from the statistical properties of the past forecast errors. Ensemble forecasts, however, are constructed with the hope that the ensemble spread may contain additional information about the uncertainty, above and beyond the information that can be derived from analysing past performance of the forecast. In particular, since the ensemble spread is typically different for each successive forecast, it has been suggested that the ensemble spread may give a basis for predicting different levels of uncertainty in different forecasts, which is difficult to do using estimates of uncertainty based on past performance alone. Whether the ensemble spread actually contains information about forecast uncertainty, and how much information it contains, depends on many factors such as the forecast system, the forecast variable, the resolution and the lead time of the forecast.

NGR is a way to include information from the ensemble spread in the calibration of a forecast, by predicting future uncertainty as a weighted combination of the uncertainty estimated using past forecast errors, as in MOS, and the uncertainty estimated using the ensemble spread. The weights on the two sources of uncertainty information are calibrated using past forecasts and past observations in an attempt to derive optimal weighting.

Overview
Consider a series of past weather observations $$y_t$$ over a period of $$T$$ days (or other time interval):


 * $$y_t, \quad t=1,\ldots,T$$

and a corresponding series of past ensemble forecasts, characterized by the sample mean $$m_t$$ and standard deviation $$s_t$$ of the ensemble:


 * $$(m_t,s_t), \quad t=1,\ldots,T$$.

Also consider a new ensemble forecast from the same system with ensemble mean $$M$$ and ensemble standard deviation $$S$$, intended as a forecast for an unknown future weather observation $$Y$$.

A straightforward way to calibrate the new ensemble forecast output parameters $$(M,S)$$ and produce a calibrated forecast for $$Y$$ is to use a simple linear regression model based on the ensemble mean $$M$$, trained using the past weather observations and past forecasts:


 * $$y_t \sim N(\alpha+\beta m_t, \sigma^2)$$

This model has the effect of bias correcting the ensemble mean and adjusting the level of variability of the forecast. It can be applied to the new ensemble forecast $$(M,S)$$ to generate a point forecast for $$Y$$ using


 * $$\hat{Y}=\hat{\alpha}+\hat{\beta} M$$

or to obtain a probabilistic forecast for the distribution of possible values for $$Y$$ based on the normal distribution with mean $$\hat{\alpha}+\hat{\beta} M$$ and variance $$\hat{\sigma}^2$$:


 * $$\hat{Y} \sim N(\hat{\alpha}+\hat{\beta} M, \hat{\sigma}^2)$$

The use of regression to calibrate weather forecasts in this way is an example of model output statistics.
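
The regression-based calibration described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation: the training values, the function name `fit_linear_mos` and the threshold of 14 are made up for the example, and the residual standard deviation is computed with $$T-2$$ degrees of freedom, which is one common convention (a maximum likelihood fit would divide by $$T$$ instead).

```python
import math
from statistics import NormalDist, fmean


def fit_linear_mos(obs, ens_mean):
    """Least-squares fit of obs = alpha + beta * ens_mean + Gaussian noise."""
    n = len(obs)
    m_bar = fmean(ens_mean)
    y_bar = fmean(obs)
    sxx = sum((m - m_bar) ** 2 for m in ens_mean)
    sxy = sum((m - m_bar) * (y - y_bar) for m, y in zip(ens_mean, obs))
    beta = sxy / sxx
    alpha = y_bar - beta * m_bar
    # Residual standard deviation with n - 2 degrees of freedom (assumed convention).
    rss = sum((y - alpha - beta * m) ** 2 for m, y in zip(ens_mean, obs))
    sigma = math.sqrt(rss / (n - 2))
    return alpha, beta, sigma


# Made-up past ensemble means m_t and observations y_t (illustrative only).
m_t = [10.2, 12.5, 8.1, 15.0, 11.3, 9.7, 13.8, 14.1]
y_t = [11.0, 13.9, 8.0, 16.8, 12.1, 10.9, 14.2, 15.9]

alpha, beta, sigma = fit_linear_mos(y_t, m_t)

# Calibrate a new ensemble mean M; the ensemble spread S is not used here.
M = 12.0
forecast = NormalDist(mu=alpha + beta * M, sigma=sigma)
point_forecast = forecast.mean
prob_above_14 = 1.0 - forecast.cdf(14.0)

print(f"alpha={alpha:.3f} beta={beta:.3f} sigma={sigma:.3f}")
print(f"point forecast={point_forecast:.2f}  P(Y > 14)={prob_above_14:.3f}")
```

Note that the predicted standard deviation is the same for every new forecast, whatever the ensemble spread, which is the limitation that NGR addresses.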

However, this simple linear regression model does not use the ensemble standard deviation $$S$$, and hence misses any information that the ensemble standard deviation may contain about the forecast uncertainty. The NGR model was introduced as a way to potentially improve the prediction of uncertainty in the forecast of $$Y$$ by including information extracted from the ensemble standard deviation. It achieves this by generalising the simple linear regression model to either:


 * $$y_t \sim N(\alpha+\beta m_t, \sigma=\gamma + \delta s_t)$$

or


 * $$y_t \sim N(\alpha+\beta m_t, \sigma^2=\gamma + \delta s_t^2)$$

Either model can then be used to calibrate the new ensemble forecast parameters $$(M,S)$$, using either


 * $$\hat{Y} \sim N(\hat{\alpha}+\hat{\beta} M, \hat{\sigma}=\hat{\gamma} + \hat{\delta} S)$$

or


 * $$\hat{Y} \sim N(\hat{\alpha}+\hat{\beta} M, \hat{\sigma}^2=\hat{\gamma} + \hat{\delta} S^2)$$

respectively. The prediction uncertainty is now given by two terms: the $$\gamma$$ term is constant in time, while the $$\delta$$ term varies as the ensemble spread varies.
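
To illustrate how the predicted uncertainty now varies with the ensemble spread, the sketch below evaluates both forms of the NGR predictive distribution for two new forecasts that share the same ensemble mean but have different spreads. The parameter values and the function name `ngr_forecast` are hypothetical and chosen only for illustration; in practice the parameters would be estimated from past forecasts and observations.

```python
import math
from statistics import NormalDist


def ngr_forecast(M, S, alpha, beta, gamma, delta, variance_form=True):
    """Return the NGR predictive distribution for ensemble mean M and spread S.

    variance_form=True  uses sigma^2 = gamma + delta * S^2
    variance_form=False uses sigma   = gamma + delta * S
    """
    mu = alpha + beta * M
    if variance_form:
        variance = gamma + delta * S ** 2
        if variance <= 0.0:
            raise ValueError("predicted variance must be positive")
        sigma = math.sqrt(variance)
    else:
        sigma = gamma + delta * S
        if sigma <= 0.0:
            raise ValueError("predicted standard deviation must be positive")
    return NormalDist(mu=mu, sigma=sigma)


# Hypothetical fitted parameters (not taken from any real forecast system).
params = dict(alpha=0.5, beta=1.05, gamma=0.8, delta=0.6)

# Two new forecasts with the same ensemble mean but different ensemble spread.
low_spread = ngr_forecast(M=12.0, S=0.5, **params)
high_spread = ngr_forecast(M=12.0, S=2.5, **params)

for label, dist in [("low spread", low_spread), ("high spread", high_spread)]:
    lower, upper = dist.inv_cdf(0.05), dist.inv_cdf(0.95)
    print(f"{label}: mean={dist.mean:.2f} sd={dist.stdev:.2f} "
          f"90% interval=({lower:.2f}, {upper:.2f})")
```

Both distributions have the same mean, but the forecast with the larger ensemble spread receives a wider predictive interval. Setting `delta` to zero recovers the constant-variance regression model.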

Parameter estimation
In the scientific literature, the four parameters $$\alpha, \beta, \gamma, \delta$$ of NGR have been estimated either by maximum likelihood or by minimising the continuous ranked probability score (CRPS). The pros and cons of these two approaches have also been discussed.
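
As a sketch of what these two estimation criteria look like, the code below writes each as an objective function of the parameter vector for the variance form of the model. Since lower CRPS values indicate better forecasts, both objectives are written as quantities to minimise (the likelihood is expressed as a mean negative log-likelihood). The CRPS uses the standard closed-form expression for a normal predictive distribution, $$\sigma\left[z(2\Phi(z)-1)+2\varphi(z)-1/\sqrt{\pi}\right]$$ with $$z=(y-\mu)/\sigma$$. The function names and data are made up for illustration, and the optimisation step itself is left out: either objective could be passed to a general-purpose numerical optimiser such as `scipy.optimize.minimize`.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for its pdf and cdf


def neg_log_likelihood(params, y, m, s):
    """Mean negative Gaussian log-likelihood of the variance-form NGR model."""
    alpha, beta, gamma, delta = params
    total = 0.0
    for y_t, m_t, s_t in zip(y, m, s):
        variance = gamma + delta * s_t ** 2
        if variance <= 0.0:
            return math.inf  # parameters outside the valid region
        mu = alpha + beta * m_t
        total += 0.5 * math.log(2.0 * math.pi * variance)
        total += (y_t - mu) ** 2 / (2.0 * variance)
    return total / len(y)


def gaussian_crps(mu, sigma, y):
    """Closed-form CRPS of a normal forecast N(mu, sigma^2) for observation y."""
    z = (y - mu) / sigma
    return sigma * (z * (2.0 * _STD_NORMAL.cdf(z) - 1.0)
                    + 2.0 * _STD_NORMAL.pdf(z)
                    - 1.0 / math.sqrt(math.pi))


def mean_crps(params, y, m, s):
    """Mean CRPS of the variance-form NGR model over the training data."""
    alpha, beta, gamma, delta = params
    total = 0.0
    for y_t, m_t, s_t in zip(y, m, s):
        variance = gamma + delta * s_t ** 2
        if variance <= 0.0:
            return math.inf
        total += gaussian_crps(alpha + beta * m_t, math.sqrt(variance), y_t)
    return total / len(y)


# Made-up training data: observations, ensemble means, ensemble spreads.
y = [11.0, 13.9, 8.0, 16.8, 12.1, 10.9]
m = [10.2, 12.5, 8.1, 15.0, 11.3, 9.7]
s = [0.6, 1.4, 0.5, 2.0, 0.9, 1.1]

trial_params = (0.0, 1.0, 0.5, 1.0)  # (alpha, beta, gamma, delta), arbitrary
print("mean NLL :", neg_log_likelihood(trial_params, y, m, s))
print("mean CRPS:", mean_crps(trial_params, y, m, s))
```

Returning infinity when the predicted variance is not positive is one simple way to keep an optimiser inside the valid parameter region; constraining or reparameterising $$\gamma$$ and $$\delta$$ would be an alternative design.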

History
NGR was originally developed in the private sector by scientists at Risk Management Solutions Ltd for the purpose of using information in the ensemble spread for the valuation of weather derivatives.

Terminology
NGR was originally referred to as ‘spread regression’ rather than NGR. Subsequent authors introduced first the alternative name Ensemble Model Output Statistics (EMOS) and then NGR. The original name ‘spread regression’ has now fallen from use; EMOS is used to refer generally to any method used for the calibration of ensembles, while NGR is typically used to refer to the method described in this article.