We employ the percentage improvement in mean square error over a climatological forecast (MSEClim) as our skill score measure. The climatologies used here are a rolling prior 10-year climatology for the replicated real-time forecast period 1988–2002, a fixed 51-year climatology 1952–2002 for the cross-validated hindcast period 1952–2002 and a fixed 53-year climatology 1950–2002 for the cross-validated hindcast period 1950–2002. MSEClim is the standard skill metric recommended by the World Meteorological Organization for verification of deterministic seasonal forecasts [WMO, 2002]. MSEClim is a robust skill measure which is immune to the bias problems associated with the correlation and percentage of variance skill measures. Positive skill indicates that the model does better than a climatology forecast, while negative skill indicates that it does worse than climatology.
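As a reading aid, the skill measure can be written out explicitly. The form below assumes the conventional definition of a percentage improvement in mean square error; the symbols (o_i for the observed value in year i, f_i for the model forecast, c_i for the climatology forecast, n for the number of years) are illustrative notation introduced here rather than notation taken from the text.

```latex
\mathrm{MSEClim} \;=\; \left( 1 - \frac{\frac{1}{n}\sum_{i=1}^{n} (f_i - o_i)^2}{\frac{1}{n}\sum_{i=1}^{n} (c_i - o_i)^2} \right) \times 100\%
```

Under this form a perfect forecast scores 100%, a forecast with the same mean square error as climatology scores 0%, and a forecast with a larger mean square error than climatology scores below 0%.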
We compute the statistical significance of the MSEClim skill using the bootstrap method [Efron, 1979; also see Efron and Gong, 1983]. The bootstrap tests the hypothesis that the model forecasts are more skilful than those from climatology to a given level of significance. We apply the bootstrap by randomly selecting (with replacement) 15 (1988–2002), 51 (1952–2002) or 53 (1950–2002) actual values together with their associated predicted and climatology forecast values to provide a fresh set of hindcasts for which the MSEClim skill measure can be calculated. This process is repeated 2,000 times and the results are histogrammed to give the required skill score. Provided the original data are independent (in distribution and in order), the distribution of these recalculated values maps the uncertainty in the forecast skill about the original value over a 15-year or 51(53)-year period. 95% two-tailed confidence intervals for this uncertainty are then readily obtained. Where the lower boundary of this 95% confidence interval has an MSEClim skill value greater than 0%, the model forecast has skill better than climatology to 97.5% confidence.
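The resampling procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the results reported here: the function names `mse_clim_skill` and `bootstrap_skill_interval`, the use of NumPy, the fixed random seed, and the percentile method for extracting the interval are all choices made for this sketch.

```python
import numpy as np


def mse_clim_skill(obs, pred, clim):
    """Percentage improvement in MSE of `pred` over the climatology forecast `clim`."""
    obs, pred, clim = (np.asarray(a, dtype=float) for a in (obs, pred, clim))
    mse_model = np.mean((pred - obs) ** 2)
    mse_clim = np.mean((clim - obs) ** 2)
    # Assumption: mse_clim is non-zero, i.e. climatology is not a perfect forecast.
    return 100.0 * (1.0 - mse_model / mse_clim)


def bootstrap_skill_interval(obs, pred, clim, n_resamples=2000, seed=0):
    """Return (lower, upper, skills): a two-tailed 95% interval and all resampled skills."""
    obs, pred, clim = (np.asarray(a, dtype=float) for a in (obs, pred, clim))
    rng = np.random.default_rng(seed)  # seed fixed only for reproducibility of the sketch
    n = obs.size
    skills = np.empty(n_resamples)
    for i in range(n_resamples):
        # Draw n years with replacement; each year keeps its observed,
        # predicted and climatology values together.
        idx = rng.integers(0, n, size=n)
        skills[i] = mse_clim_skill(obs[idx], pred[idx], clim[idx])
    # Percentile method (an assumption of this sketch): 2.5th and 97.5th percentiles.
    lower, upper = np.percentile(skills, [2.5, 97.5])
    return lower, upper, skills
```

With arrays of 15, 51 or 53 yearly values, a returned `lower` greater than 0 corresponds to the criterion stated above for skill better than climatology at 97.5% confidence.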
