Blogs review: The Reinhart and Rogoff debacle
What’s at stake: The authors of the widely acclaimed book on the history of financial crises, This Time is Different, have faced the mother of all academic backlashes after a group of economists identified important flaws (including basic coding errors) in their analysis of the relationship between public debt and economic growth. In particular, the critics debunked the now popular notion that there is a debt threshold (90% of GDP) beyond which economic growth decreases in a nonlinear way. The availability of the dataset (in Stata) has also allowed an econometrician to raise serious doubts about the idea that causality runs from debt to GDP growth. While the backlash has, until now, centered mostly on this particular work on debt and GDP growth, similar data issues have been identified in the book This Time is Different.
The 90% threshold, the austerity narrative, and the Twitter backlash
Paul Krugman writes that the intellectual edifice of austerity economics rests largely on two academic papers that have now been debunked. One was Alesina/Ardagna (see our previous review here) on expansionary fiscal contractions. The other paper, which has had immense influence, was Reinhart/Rogoff on the negative effects of debt on growth. Very quickly, everyone “knew” that terrible things happen when debt passes 90 percent of GDP.
Mike Konczal writes that from the beginning there have been complaints that Reinhart and Rogoff (RR) weren’t releasing the data for their results (e.g. Dean Baker). Konczal knew of several people trying to replicate the results who were bumping into walls left and right: it couldn’t be done. After trying to replicate the RR results and failing, Thomas Herndon, Michael Ash and Robert Pollin (HAP) reached out to RR, who were willing to share their data spreadsheet. This allowed HAP to see how RR’s data was constructed.
Peter Coy (HT Tim Duy) notes that the Twitterverse exploded with chatter about the new research paper claiming to find holes in a landmark academic paper that’s been cited to justify extreme austerity measures. By mid-afternoon the authors had emailed reporters a quick reply defending their work.
Data issues in “Growth in a time of debt”
Mike Konczal writes that HAP identify three main issues. First, RR selectively exclude years of high debt and average growth. Second, they use a debatable method to weight the countries. Third, there also appears to be a coding error that excludes high-debt and average-growth countries. All three bias their results, and without them you don’t get their controversial result about low growth for countries with a debt-to-GDP ratio higher than 90%.
Ritwik Priya writes that removing the asymmetric weights and including all episodes is where the meat of the debate lies. The Excel error only changes the result from -0.1% to 0.1% (see Ritchie King for a picture of the now infamous Excel spreadsheet and some tips on how to use Excel properly!), while removing the weights and including all episodes moves it further, from 0.1% to 2.2%. A gap of 2% vs. 3% simply lacks the jaw-drop value of less than 0% vs. 3%, and is actually not even statistically significant.
HAP write that RR adopt a non-standard weighting methodology for measuring average real GDP growth within their public debt/GDP categories. After assigning each country-year to one of four public debt/GDP groups, RR calculate the average real GDP growth for each country within the group, that is, a single average value for the country for all the years it appeared in the category. The country averages within each group were then averaged, equally weighted by country, to calculate the average real GDP growth rate within each public debt/GDP grouping. The problem is that equal weighting by country gives a one-year episode as much weight as nearly two decades in the above 90 percent public debt/GDP range.
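The difference between the two weighting schemes can be sketched in a few lines of Python. This is a minimal illustration, not RR's or HAP's actual code: the two countries "A" and "B", their growth rates, and the function names are all hypothetical, chosen only to mimic a one-year episode sitting next to a 19-year one.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical country-year observations in the above-90% debt/GDP group:
# (country, real GDP growth in percent). These are NOT the actual RR data.
# Country "A" appears for one bad year, country "B" for 19 ordinary years.
observations = [("A", -7.0)] + [("B", 2.5)] * 19


def country_weighted_mean(obs):
    """Average within each country first, then weight the country means equally."""
    by_country = defaultdict(list)
    for country, growth in obs:
        by_country[country].append(growth)
    return mean(mean(values) for values in by_country.values())


def country_year_weighted_mean(obs):
    """Average over every country-year, so longer episodes carry more weight."""
    return mean(growth for _, growth in obs)


equal_by_country = country_weighted_mean(observations)
equal_by_year = country_year_weighted_mean(observations)
print(f"equal weight per country:      {equal_by_country:.3f}")
print(f"equal weight per country-year: {equal_by_year:.3f}")
```

With these made-up numbers the single bad year drags the country-weighted average below zero, while the country-year average stays close to the growth rate of the long episode. Neither scheme is "correct" in itself; the sketch only shows how much the choice can matter.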
Reinhart and Rogoff argue that their weighting procedure is hardly unconventional. They do not want to excessively weight Greece, for example, which has debt over 90% for 19 years in the 1946-2009 sample. Otherwise, the post-war advanced economy experience would quickly reduce to the experiences of Greece and Japan. Their approach, they note, has been followed in many other settings where one does not want to overly weight a small number of countries that may have their own peculiarities.
Josh Barro writes that because there were only seven countries in the data set that RR used to calculate average GDP growth under high debt conditions, and because they weighted each country’s average growth equally, getting New Zealand wrong by more than 10 percentage points was a very big deal, shaving 1.5 percentage points off their estimate of average growth. Ritwik Priya writes that it’s true that any data analysis exercise will have to make these choices about weighting. But for precisely this reason data analysis exercises offer, or should offer, at the very least, footnotes or explanatory pieces detailing these choices and a short note on why they were preferred to other choices.
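Barro's arithmetic follows directly from the equal weighting. As a rough worked check, let $N$ be the number of countries in the high-debt group and $e$ the error in one country's average growth; both symbols are introduced here for illustration, and the value $e = 10.5$ is an assumed figure consistent with "more than 10 percentage points", not a number taken from the data:

```latex
\Delta \bar{g} = \frac{e}{N},
\qquad N = 7,\ e = 10.5
\;\Rightarrow\;
\Delta \bar{g} = \frac{10.5}{7} = 1.5 \text{ percentage points}
```

An error of exactly 10 points would shift the group average by about 1.4 points, so the quoted 1.5 is what one would expect from a single mismeasured country in a group of seven.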
Reinhart and Rogoff write that the charge of selective omissions is the one they object to in the strongest terms. The “gaps” are explained by the fact that there were still gaps in their public debt data set at the time of the paper, a data set no one else had ever been able to construct before and which they have now filled in much more completely.
The conceptual issues in “Growth in a time of debt”
Coppola writes that the Rortybomb blog delivered the killer punch to RR. Econometric analysis by Arindrajit Dube demonstrated that even with good data, the economic analysis was flawed and the conclusions unjustifiable. High public debt cannot reliably be shown to cause low growth. But low growth can reasonably reliably be shown to cause high public debt. Ritwik Priya writes that it is worth pointing out that the main reason RR come up with the 90% figure is that their intervals are 30 percentage points wide, i.e. they split the data into buckets of 0-30%, 30-60% and so on. This is purely an artifact of a modeling choice, and the actual tipping point, assuming any exists, may be 80% or 110% or 93.7%.
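Priya's point about bucket width can be illustrated with a small sketch. The data below are hypothetical and deliberately contain no tipping point at all (growth declines linearly with debt); the function name and the bucket widths of 30 and 40 are assumptions made for the example. Unlike RR's four groups, the sketch does not pool everything above 90% into one open-ended bucket.

```python
from statistics import mean

# Hypothetical (debt/GDP in %, growth in %) pairs with a smooth linear decline
# in growth and no threshold anywhere. These are NOT the actual RR data.
data = [(debt, 4.0 - 0.02 * debt) for debt in range(0, 150, 5)]


def mean_growth_by_bucket(pairs, width):
    """Group observations into debt buckets of the given width, keyed by lower edge."""
    buckets = {}
    for debt, growth in pairs:
        lower = (debt // width) * width
        buckets.setdefault(lower, []).append(growth)
    return {lower: mean(values) for lower, values in sorted(buckets.items())}


by_30 = mean_growth_by_bucket(data, 30)  # bucket edges at 0, 30, 60, 90, 120
by_40 = mean_growth_by_bucket(data, 40)  # bucket edges at 0, 40, 80, 120

for width, table in ((30, by_30), (40, by_40)):
    print(f"width {width}:", {edge: round(value, 2) for edge, value in table.items()})
```

Whatever the data look like, any "threshold" read off such a table can only sit at a bucket edge: 90% with buckets of 30, but 80% or 120% with buckets of 40.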
Arindrajit Dube notes that there is a visible negative relationship between growth and debt-to-GDP, but as HAP point out, the strength of the relationship is actually much stronger at low ratios of debt-to-GDP. This makes us worry about the causal mechanism. After all, while nonlinearity may be expected at high ratios due to a tipping point, the stronger negative relationship at low ratios is difficult to rationalize using a tipping point dynamic.
Arindrajit Dube writes that while it is difficult to ascertain causality from plots like this, we can leverage the time pattern of changes to gain some insight. Here is a simple question: does a high debt-to-GDP ratio better predict future growth rates, or past ones? As Dube's diagram shows, current period debt-to-GDP is a pretty poor predictor of future GDP growth at debt-to-GDP ratios of 30 or greater, the range where one might expect to find a tipping point dynamic. But it does a great job predicting past growth. This pattern is a telltale sign of reverse causality.
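The logic of the past-versus-future comparison can be sketched with simulated data. This is not Dube's method (he works with the actual RR dataset and regression plots); the sketch swaps in plain correlations on a made-up series. The assumptions are stated loudly: growth is pure noise that debt never affects, and debt rises whenever growth is low, so causality runs only from growth to debt. All parameter values and function names are hypothetical.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def corr_with_shifted_growth(debt, growth, shift):
    """Correlate debt[t] with growth[t + shift]; a negative shift means past growth."""
    pairs = [(debt[t], growth[t + shift])
             for t in range(len(debt)) if 0 <= t + shift < len(growth)]
    return pearson([d for d, _ in pairs], [g for _, g in pairs])


# ASSUMPTION: growth is independent noise around 2% and is never affected by debt.
rng = random.Random(0)
growth = [rng.gauss(2.0, 2.0) for _ in range(5000)]

# ASSUMPTION: debt drifts back toward 60% of GDP but jumps when growth is low.
debt, level = [], 60.0
for g in growth:
    level = 60.0 + 0.5 * (level - 60.0) - 5.0 * (g - 2.0)
    debt.append(level)

corr_past = corr_with_shifted_growth(debt, growth, -1)
corr_future = corr_with_shifted_growth(debt, growth, +1)
print(f"debt vs. last year's growth: {corr_past:.2f}")
print(f"debt vs. next year's growth: {corr_future:.2f}")
```

In this simulated world, debt is clearly negatively correlated with past growth and essentially uncorrelated with future growth, even though debt has no effect on growth by construction. That is the pattern Dube describes as a telltale sign of reverse causality.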
Data issues in “This Time is Different”
A number of authors, including Cardiff Garcia and Joseph Cotterill, Adam Posen and Ryan Avent (on Twitter), think that the criticism of this finding should be kept separate from judgments about RR's earlier book, “This Time is Different”, which remains a highly valuable contribution to the study of financial crises. Paul Krugman writes that the book had a sound empirical strategy: it focused only on extreme events, then described what happened around those events. Because of the severity of the shock, it was reasonable to infer that whatever happened around crises was in fact crisis-related, so problems of causation were sidestepped.
But several authors have noted that there were also signs of sloppiness in the construction of the dataset for “This Time is Different”.
Andrew Jalil reveals major inconsistencies in the banking panics series of “This Time is Different”, based on his reading of contemporary news reports surrounding each of the banking panic episodes identified by RR. Jalil notes that the book actually provides two, sometimes contradictory, versions of the banking crisis series (Table A.3.1 and Table A.4.1). One version, for example, identifies December 1861 and April 1864 as banking crises, whereas the other does not contain them. [April 1864 should not be classified as a banking panic, since what happened was just a serious disturbance on stock markets that was unrelated to the state of banks and did not turn into a banking panic.] The RR series also happens to classify certain foreign crises as domestic ones. The banking panic that took place in England in 1825 is, for example, wrongly classified as one that affected the US.
David Lopez-Salido and Edward Nelson write that the account of postwar U.S. financial crises by RR is also questionable on several counts. RR treat the 1970s as free of financial crises in the United States, even though the mid-1970s witnessed banking stresses that saw banks’ equity capital ratio plunge to a postwar low. RR do not treat 1982 and 1983 as years of financial crisis, despite the pressure in those years on U.S. commercial banks brought by the LDC debt position. And the savings and loan crisis is referred to by RR as the S&L crisis of 1984, even though S&L failures were actually lower in 1984 than in any other year of 1981-1989.
Republishing and referencing
Bruegel considers itself a public good and takes no institutional standpoint. Anyone is free to republish and/or quote this post without prior consent. Please provide a full reference, clearly stating Bruegel and the relevant author as the source, and include a prominent hyperlink to the original post.