Wednesday, February 25, 2009

S09, MM07 and Spatial Autocorrelation

S09 suggests that spatial autocorrelation caused spurious results in MM07. In addition, S09 points out that the significance of the MM07 results decreases when the RSS data are used. S09 also demonstrates that spurious correlations can be produced by regressing economic statistics against model-driven temperature data. My conclusion is that, using standard techniques of spatial analysis, these findings appear to be incorrect as they relate to MM07. There may in fact be other issues with MM07, but I am unconvinced that the arguments in S09 disprove MM07.

As a caveat, I have only recently taken a look at spatial analysis. In addition, I am hardly an expert in statistics. But the methods for doing this seem straightforward, and they are well implemented in the R language in the spdep package. The various functions have a number of options to choose from. I have played around with these without the results changing. I have not obtained any unreported results counter to my conclusions, but readers are free to try other options, of course. Once again I have published all my code and the locations of the data. I am wide open to suggestions and criticism.

It is well understood that the test for spatial autocorrelation in a regression is whether the residuals are spatially correlated. The most common test for this seems to be Moran’s I test. Moran’s I test yields a statistic that generally falls between –1 and 1, which indicates the amount and sign of the correlation. It also yields a p-value, which indicates whether the correlation is significant.

Running Moran’s I on the residuals of the main UAH regression gives a statistic of .01 with a p-value of .24. This shows a very small positive spatial correlation that is not statistically significant. I actually ran the test using three distance weighting schemes: 1/x, 1/x^2, and 1/sqrt(x), where x is the distance between locations. All showed similar results. I think 1/x is the most standard, and I will report the rest of the results in this post using that weighting scheme.
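For readers who want to see what the statistic measures, here is a minimal sketch in Python. It is not the spdep code used for the results in this post. It assumes planar coordinates with distinct locations (real station data would need great-circle distances), row-standardized weights, and a simple permutation p-value in place of the analytical test that spdep applies to regression residuals. The function names and the five-site example are made up for illustration.

```python
import math
import random


def inverse_distance_weights(coords, power=1.0):
    """Row-standardized weights with w_ij = 1 / d_ij**power and w_ii = 0.

    power=1.0, 2.0 and 0.5 correspond to the 1/x, 1/x^2 and 1/sqrt(x) schemes.
    """
    n = len(coords)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                weights[i][j] = 1.0 / math.dist(coords[i], coords[j]) ** power
        row_sum = sum(weights[i])
        weights[i] = [w / row_sum for w in weights[i]]
    return weights


def morans_i(values, weights):
    """Moran's I = (n / S0) * sum_ij w_ij * z_i * z_j / sum_i z_i**2,
    where z are deviations from the mean and S0 is the sum of all weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * num / den


def permutation_p_value(values, weights, n_perm=999, seed=0):
    """One-sided p-value for positive autocorrelation, found by shuffling
    the values across locations (an assumption, not spdep's method)."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical example: five sites on a line with steadily rising residuals.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]
for power in (1.0, 2.0, 0.5):
    w = inverse_distance_weights(coords, power)
    print(power, round(morans_i(residuals, w), 3), permutation_p_value(residuals, w))
```

With so few sites the statistic can change noticeably from one weighting scheme to another, which is one reason to try several schemes on the real data.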

The conclusion, unsurprisingly, is the same as that of Dr. McKitrick’s follow-up paper (unpublished, but available on his website) dealing with spatial autocorrelation in his results.

As a point of interest, I ran the Moran’s I test on the residuals of the regression using the RSS data as discussed in S09. A bit surprisingly, this shows more signs of spatial autocorrelation, with a statistic of .029 and a p-value of .053, just short of significance at the 95% confidence level. Later I will show the results of running both the UAH and RSS regressions with regression estimators that take spatial autocorrelation into account.

Turning to the model data, I duplicated the results in S09 by running a regression using the modeled tropospheric and surface data. Exactly as in S09, various economic variables showed significance in the regression, although the coefficients are very small.

Running the Moran’s I test on this result showed that the residuals are significantly spatially correlated. The statistic is .06 with a p-value < .01. Thus, as hypothesized in S09, the spurious results of this regression appear to be caused by spatial autocorrelation. However, the fact that this regression produces spurious results in no way means that the MM07 results are spurious: as shown above, the Moran’s I test finds no significant spatial autocorrelation in the residuals of the MM07 regression.

Another interesting result, which is discussed only briefly in MM07 and not at all in S09, is that a regression of the surface data without tropospheric data also shows significance for the economic variables. This result would tend to minimize the concern about the choice of satellite data introduced in S09. I duplicated this regression result, but the Moran’s I test shows very significant autocorrelation in the residuals. The statistic is .17 with a p-value < .01. So, for the moment at least, the regression result isn’t meaningful.

The lagsarlm function in spdep is a regression estimator that takes spatial autocorrelation into account. It yields significance levels for the individual variables, as well as a p-value showing the significance of the overall result. As a start, I ran an estimate on the model described above, which doesn’t use the tropospheric data. The resulting estimate shows that several of the economic variables are significant (all but x). In addition, the overall p-value is < .01. Using the Model E data, I get an estimate in which the coefficients are extremely small but significant for several variables (e, x, p, m). However, the p-value of the result is .38, so the overall result isn’t significant. This suggests that the test is working, since we wouldn’t expect the result to be significant: the economic data could have no effect on the model.
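To give a feel for the kind of model a spatial lag estimator fits, here is a rough Python sketch of the spatial lag regression y = rho*W*y + a + b*x + error, estimated by maximizing the concentrated log-likelihood over a grid of rho values. This is an illustration of the general technique, not the spdep implementation: it assumes a single regressor, a row-standardized weight matrix W, and a coarse grid search, and it leaves out the significance tests entirely. All function names and the six-site data set are made up.

```python
import math


def log_det(matrix):
    """Log of |det(matrix)| by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    total = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        total += math.log(abs(a[col][col]))
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return total


def ols_sse(x, z):
    """Regress z on an intercept and x; return (SSE, intercept, slope)."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z)) / sxx
    intercept = mz - slope * mx
    sse = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    return sse, intercept, slope


def fit_spatial_lag(y, x, w, grid_step=0.05):
    """Grid search over rho in [-0.9, 0.9] for the highest value of
    log|I - rho*W| - (n/2) * log(SSE(rho) / n)."""
    n = len(y)
    wy = [sum(w[i][j] * y[j] for j in range(n)) for i in range(n)]
    steps = int(round(0.9 / grid_step))
    best = None
    for k in range(-steps, steps + 1):
        rho = k * grid_step
        z = [yi - rho * wyi for yi, wyi in zip(y, wy)]  # spatially filtered y
        sse, intercept, slope = ols_sse(x, z)
        a = [[(1.0 if i == j else 0.0) - rho * w[i][j] for j in range(n)]
             for i in range(n)]
        loglik = log_det(a) - 0.5 * n * math.log(sse / n)
        if best is None or loglik > best[0]:
            best = (loglik, rho, intercept, slope)
    return {"loglik": best[0], "rho": best[1],
            "intercept": best[2], "slope": best[3]}


def chain_weights(n):
    """Row-standardized weights for sites on a line, where each site's
    neighbors are the sites directly beside it (hypothetical layout)."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < n]
        for j in neighbors:
            w[i][j] = 1.0 / len(neighbors)
    return w


# Made-up data constructed to satisfy y = 0.5*W*y + 1 + 2*x + small noise.
w = chain_weights(6)
y = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0]
noise = [0.001, -0.001, 0.001, -0.001, 0.001, -0.001]
wy = [sum(w[i][j] * y[j] for j in range(6)) for i in range(6)]
x = [(y[i] - 0.5 * wy[i] - 1.0 - noise[i]) / 2.0 for i in range(6)]
result = fit_spatial_lag(y, x, w)
print(result)
```

A grid search keeps the sketch short; a real estimator would optimize rho numerically and would also report standard errors for the coefficients.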

Running the UAH model through lagsarlm, I get a result similar to the original regression. In addition, the p-value is .02, showing that the result is significant. So the UAH model in MM07 holds up under both tests.

Finally, I ran the RSS model from S09. Taking spatial autocorrelation into account raises the significance of the economic variables. Now g, e, p, m, y, and c are all significant. In addition, the p-value for the overall regression is .018. Absent any other information, then, I would conclude that MM07 demonstrates that surface temperature trends are affected by a set of economic variables.

This is hardly surprising, since there are all sorts of ways that this could happen. The urban heat island effect is only one possible reason, along with various large-scale surface changes. I don’t know the significance of this, but the Model E data shows the troposphere trend at .16 degrees per decade with the surface at .14 degrees per decade. This is in contrast to the measured data, with the surface at .30 and the troposphere (both UAH and RSS) at .24. In the model the surface is heating a bit more than ten percent slower than the troposphere, but as measured in the real world the surface is heating 25% faster. Of course, all kinds of things could cause this discrepancy, including issues with the model data and with the measurements of either the troposphere or the surface. But it is ironic that this roughly 30% swing is in line with MM07’s estimate of the bias in the global surface temperature trend.
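The ratios behind those percentages can be checked with a few lines of Python, using only the four trend values quoted above. The variable and function names are made up for illustration, and the sketch stops at the two ratios because the post does not say how the swing itself is measured.

```python
# Trends in degrees per decade, as quoted in the paragraph above.
model_surface, model_troposphere = 0.14, 0.16
measured_surface, measured_troposphere = 0.30, 0.24


def surface_excess(surface, troposphere):
    """Percent by which the surface trend exceeds the tropospheric trend
    (negative when the surface is warming more slowly)."""
    return 100.0 * (surface / troposphere - 1.0)


model_excess = surface_excess(model_surface, model_troposphere)           # about -12.5
measured_excess = surface_excess(measured_surface, measured_troposphere)  # about 25.0
print(round(model_excess, 1), round(measured_excess, 1))
```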

Test results can be found here. An ever-growing code file can be found here.

3 comments:

  1. Nicolas, obviously I'm a firm believer in posting code for a variety of reasons and thank you for doing so. As a reader, I'm often interested in experimenting with slight variations and this can be done very efficiently if you don't have to run the gauntlet of trying to figure out every nuance of the prose.

    One suggestion. Over the last couple of years, I've tried to make my online code "turnkey" as much as possible, i.e. I try to remove all references to internal directories in favor of downloads within R, which has excellent tools to do so. It would be easy to modify your script to be turnkey and I think that you'd be pleased with the product.

  2. I'll take a look at how to do that. Thanks.

  3. Keep up your good work. Perhaps you will be the bridge between CA and Real Climate.
