Saturday, April 11, 2009

Duplicating Rahmstorf 2007

I had quite an interesting time replicating the calculations from "A Semi-Empirical Approach to Projecting Future Sea-Level Rise" (Science 1/19/2007 R07) by Stefan Rahmstorf. The general concept in the paper is very simple. He estimates a linear equation for the rate of sea level rise based on global temperature from measured data. The difficulty came in because the paper relies on an algorithm that is not generally available. To demonstrate this it seems that although there were two critical comments on his paper published in Science neither replicated the original calculations of R07. It only became possible to replicate the results when Dr. Rahmstorf published his code as part of his response to those comments. Even then you had to do a little hunting. In this post I will show that I have duplicated those results, so that my subsequent comments make sense.

R07 is quite simple in concept. His theory is that the rate of sea level rise has a linear correlation with surface temperature. The idea is that you start with a situation where sea level is stable. Then you raise the temperature causing sea level to rise until is reaches a new equilibrium. I won't explain the whole paper, but this makes perfect sense. There are, of course, multiple inputs to sea level rise, but pretty much all of them should respond to a change in surface temperature, the question is how fast and how much.

To estimate this linear equation he uses sea level data from Church and White (IPCC), as well as the GISS global temperature record. But this is where things get a little tricky. He doesn't do a simple regression on the temperature and sea level because this data is "noisy." Instead runs both the temperature and the sea level data through a process to separate the trend from the noise. This isn't explained in the main text of the paper, but it is referred to in the text below figures 2 and 3. "Both temperature andsea-level curves were smoothed by computing nonlinear trend lines, with an embedding period of 15 years (14)." He uses the word "smoothed" but this is not a typical smoothing algorithm. Use of a typical polynomial fit will not get as "good" a result, as I discovered in early attempts.

I might have discovered this more quickly if I had read all the way to the bottom of Dr. Rahmstorf's reply to comments where a link to the code and data was published. The code and data weren't linked from the original paper which is where I started. Both published comments also didn't have the benefit of this code and data which is clear from reading them, and I guess after that everyone called it a day. But I will get to this stuff in a later post.

Reference 14 from the paper is "J. C. Moore, A. Grinsted, S. Jevrejeva, Eos 86, 226 (2005)." This paper is titled "New Tools for Analyzing Time Series Relationships and Trends." It is actually a short review article of several new mathematical techniques. At first glance it certainly isn't immediately apparent that this tells us how the "smoothing" was done. But there is a section titled "Nonlinear Trends in Sea Level and Temperature." This section refers to the use of Single Spectrum Analysis (SSA) to extract a nonlinear trend. It refers to Ghil et. al. 2002. So this is at best an indirect reference.

At this point I still hadn't discovered the code at the bottom of the reply, but I did have a lead. So I looked into SSA and discovered software at UCLA that was available. To make a long story short SSA is necessary but not sufficient to duplicate the R07 results. In fact further searches, and eventually the code led me to the fact that R07 relied on a matlab function called ssatrend.m. Dr. Rahmstorf on Real Climate indicated that this was available from Aslak Grinsted.

I wrote Dr. Grinsted who wrote me back very promptly, and sent me the source code to ssatrend.m. He also commented that he had no idea how Dr. Rahmstorf had gotten a copy of it, and that he had never meant for it to be distributed. I think that he was just concerned about it being unsupported. My own view is that it is pretty strange to use some random piece of code in a published paper without making the code your own. Especially where, as in this case, the strength of ther result depends on using this specific algorithm. It isn't available from any of the usual Matlab repositories which is the first place I looked.

In any event with a little effort I found an SSA algorithm written in R on source forge. I checked its results agains the UCLA program and they were identical so I knew the foundation was good. Then with some effort, I translated ssatrend.m into R so that I could run it on an open platform.

Finally I translated the code supplied in the reply written by Dr. Rahmstorf into R so that I could at see if I was starting from the same place. This was successful.

Using the supplied input files for sea level and temperature I get a correlation coefficient (R) of .88 exactly as reported in R07. In addition the following graphs are identical to figures two and three (top) of R07. The code is here.

At the end of the day the use of this unpublished algorithm made this much harder than it should have been as the underlying concept is very simple. I have no idea how a reviewer could have evaluated whether the use of this algorithm made sense. Having said that I don't really see any problem with it, and I look forward to trying ssatrend with other data.

But now that I can get the exact results from the paper, I have a couple of comments that go beyond what has already been written in Science.


  1. Interesting.
    What happens to the correlation value when embedding period of 5, 10, 20, 25 years are used ? My cynical guess is that 15 years provides the largest correlation of any "multiple of 5" (it would look pretty suspicious if he said '13.76 year embedding period' for example) and is another example of climate scientists picking the parameters AFTER they know what gives the best desired result. Hope i'm wrong.

  2. Interesting question about the correlation value. The paper discussed varying the slope of the estimate is the same when the embedding period is anywhere from 2 to 17 years. This is true. What is also true is that the correlation drops to .68 at 5 years, and only .47 at 2 years. Above 17 years the correlation is still good, but the trend starts to drop.

    So in a sense you are right that the 15 year choice is "optimal." On the other hand the purpose of using SSA is to find the signal in noisy data, and I think that this is a good balance.

    The real issue is whether this type of model works at all, which I think is doubtful even when just looking at what is presented in the paper.

  3. I just thought of an additional question on this subject:

    Given that the warming over the 20th century is not evenly distributed with several large areas showing very little/no warming while other more localised areas like Chinese city centers and airports show much greater warming - can the same also not be said of ocean temperatures ? The immediate follow on to this is, does a large localised temperature increase in one part of the ocean increase the volume of the ocean more or less than the same temperature increase averaged across the entire ocean ?

    Given that I do not think I have read this being discussed, I would again cynically guess that localised heating has a significantly smaller effect on the increase in ocean volume, and that is why the projections use a smoothed out average across the whole ocean.

  4. You might be interested in this series of extensive thoughts on that paper:

  5. Anonymous,

    I believe that localized urban heating is unique to land and you wouldn't see that type of thing in the oceans. Even if you did the thermosteric effects would proportionately increase all sea level.

    Having said that sea level is not a single number, and has increases at different rates and times in different locations. This paper and my response deal in the global average.

  6. Andrew,

    I have written an even better :-) critique in my next post!

  7. (TCO

    I like how you go after things. Here is an example of McI using a specialized subroutine that got him a more extreme result.

    (read comments and the search for understanding hosking.sim)