The use of Ensembles

Climate model results provide the basis for projections of future climate change. Historically, the international scientific community has released assessment reports that included model evaluations but avoided weighting or ranking models. Projections and uncertainties have been based mostly on a 'one model, one vote' approach, despite the fact that models differ in terms of resolution, processes included, forcings and agreement with observations.

Projections in the IPCC’s Fifth (AR5) and Sixth (AR6) Assessment Reports are based largely on CMIP5 and CMIP6, phases of the Coupled Model Intercomparison Project (CMIP) of the World Climate Research Programme (WCRP), a collaborative process in which the research and modelling community has agreed on the type of simulations to be performed. While many different types of climate models exist, the following discussion focuses on the global dynamical models included in CMIP.

Uncertainties in climate modelling arise from uncertainties in initial conditions, boundary conditions (e.g., a radiative forcing scenario), observational uncertainties, uncertainties in model parameters and structural uncertainties resulting from the fact that some processes in the climate system are not fully understood or are impossible to resolve due to computational constraints. The widespread participation in CMIP provides some perspective on model uncertainty. Nevertheless, intercomparisons that facilitate systematic multi-model evaluation are not designed to yield formal error estimates, and are in essence ‘ensembles of opportunity’. The spread of a multi-model ensemble is therefore rarely a direct measure of uncertainty, particularly given that models are unlikely to be independent, but the spread can help to characterise uncertainty. Using the spread in this way involves understanding how the variation across an ensemble was generated, making assumptions about the appropriate statistical framework, and choosing appropriate model quality metrics. Such topics are only beginning to be addressed by the research community (e.g., Randall et al., 2007; Tebaldi and Knutti, 2007; Gleckler et al., 2008; Knutti, 2008; Reichler and Kim, 2008; Waugh and Eyring, 2008; Pierce et al., 2009; Santer et al., 2009; Annan and Hargreaves, 2010; Knutti, 2010; Knutti et al., 2010).

When analysing results from multi-model ensembles, the following points should be taken into account:

Forming and interpreting ensembles for a particular purpose requires an understanding of the variations between model simulations and model set-up, and clarity about the assumptions.
The distinction between ‘best effort’ simulations (i.e., the results from the default version of a model submitted to a multi-model database) and ‘perturbed physics’ ensembles is important and must be recognized. Perturbed physics ensembles can provide useful information about the spread of possible future climate change and can address model diversity in ways that best effort runs cannot.
In many cases it may be appropriate to consider simulations from CMIP5 and to combine CMIP5 and CMIP6, recognizing differences in specifications (e.g., differences in forcing scenarios). IPCC assessments should consider the large amount of scientific work on CMIP5, in particular in cases where lack of time prevents an in-depth analysis of CMIP6. It is also useful to track model improvement through different generations of models.
Consideration needs to be given to cases where the number of ensemble members or simulations differs between contributing models. A single model’s ensemble size should not inappropriately determine the weight given to that model in the multi-model ensemble. In some cases ensemble members may need to be averaged first before combining different models, while in other cases only one member may be used for each model.
Ensemble members may not represent estimates of the climate system behaviour (trajectory) that are entirely independent of one another. This is likely true of members that simply represent different versions of the same model or use the same initial conditions. But even different models may share components and choices of parameterizations of processes and may have been calibrated using the same data sets.
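
The point above about unequal ensemble sizes can be illustrated with a minimal sketch. The model names, the warming values and the three function names are hypothetical, invented only for illustration; they do not come from any CMIP archive. The sketch compares a pooled mean over all runs (where a model with many members dominates) with the two alternatives mentioned above: averaging each model's members first, and using only one member per model.

```python
from statistics import mean

# Hypothetical projected warming (degrees C) per ensemble member, grouped by model.
# Names and numbers are invented for illustration only.
projections = {
    "model_a": [2.0, 2.2, 2.1, 1.9, 2.3],  # five members
    "model_b": [3.0],                      # one member
    "model_c": [2.5, 2.7],                 # two members
}


def pooled_mean(runs_by_model):
    """Average over every run; models with more members carry more weight."""
    all_runs = [value for runs in runs_by_model.values() for value in runs]
    return mean(all_runs)


def model_mean_first(runs_by_model):
    """Average each model's members first, then average the model means,
    so every model contributes equally regardless of ensemble size."""
    return mean(mean(runs) for runs in runs_by_model.values())


def one_member_per_model(runs_by_model):
    """Use only the first listed member of each model."""
    return mean(runs[0] for runs in runs_by_model.values())


if __name__ == "__main__":
    print("pooled over all runs:   ", round(pooled_mean(projections), 3))
    print("model means first:      ", round(model_mean_first(projections), 3))
    print("one member per model:   ", round(one_member_per_model(projections), 3))
```

In this invented example the pooled mean is pulled towards model_a simply because it contributed the most runs. Neither alternative addresses the separate issue that different models may not be independent of one another.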