Be Skeptical of Both Piketty And His Skeptics

Data never has a virgin birth. It can be tempting to assume that the information contained in a spreadsheet or a database is pure or clean or beyond reproach. But this is almost never the case. All data is collected and compiled by someone — either an individual researcher or a government agency or a scientific laboratory or a news organization or someone or something else. Sometimes, the data collection process is automated or programmatic. But that automation process is initiated by human beings who write code or programs or algorithms; those programs can have bugs, which will be faithfully replicated by the computers.

This is another way of saying that almost all data is subject to human error. It’s important both to reduce the error rate and to develop methods that are more robust to the presence of error.1 And it’s important to keep expectations in check when a controversy like the one surrounding the French economist Thomas Piketty arises.

Piketty’s 696-page book “Capital in the Twenty-First Century” has become an unlikely best-seller in the United States. That’s perhaps because it was published at a time of rapidly increasing interest in economic inequality in the U.S.2 But on Friday, the Financial Times’ Chris Giles published a list of apparent errors and methodological questions in the data underpinning Piketty’s work. Piketty has so far responded to the Financial Times only in general terms.

My goal here is not to litigate the individual claims made by Giles; see The New York Times’ Neil Irwin or The Economist’s Ryan Avent for more detail on that. Rather, I hope to provide some broad perspective about data collection, publication and analysis. A series of disclosures: First, my economic priors and preferences are closer to The Economist’s than to Piketty’s.3 Second, I haven’t finished Piketty’s book, although I’ve spent some time exploring his data. Third, I’m no expert on macroeconomic policy or macroeconomic data. Fourth, this comment rather liberally takes advantage of our footnote system; there’s a short version (sans footnotes) and a long version (avec).

My perspective is that of someone who has spent a lot of time compiling and analyzing moderately complex data sets of different kinds. Also, I’m someone who, like Piketty, has seen his public profile grow unexpectedly in recent years. I consider myself extremely fortunate for this — however, I know that attention can sometimes yield disproportionate praise and criticism. Throat-clearing aside, here’s what I have to offer.

Piketty’s data sets are very detailed, and they aggregate data from many original sources. For instance, the data Piketty and the economist Gabriel Zucman compiled on wealth inequality in the United Kingdom for their paper “Capital is Back: Wealth-Income Ratios in Rich Countries, 1700-2010” contains about 220 data series for the U.K. alone, which are hard-coded into their spreadsheet. These data series are compiled from a wide array of original sources, which are reasonably well documented in the spreadsheet.

This type of data-collection exercise — many different data series over many different years, compiled from many countries and many sources — offers many opportunities for error. Part of the reason Piketty’s efforts are potentially valuable is because data on wealth inequality is lacking. But that also means his numbers will not have received as much scrutiny as other data sets.

An extreme contrast would be to something like Major League Baseball statistics, almost every detail of which has been scrubbed and scrutinized by enthusiasts for decades. Even so, they contain errors from time to time. There are, however, usually larger gains to be had when data or methods or findings are relatively new — as they are in Piketty’s case. (An analogy is the way a vacuum’s first sweep of the living-room floor picks up a lot more dust and dirt than the second and third attempts.) Perhaps Piketty is guilty of coming to some fairly grand conclusions based on data that has not yet received all that much scrutiny.

What error rate is acceptable? The right answer is probably not “zero.” If researchers kept scrubbing data until it was perfect, they’d never have time for analysis. There comes a point of diminishing returns; whether Hack Wilson had 191 RBIs or 190 during the 1930 season ought not to have a material impact on any analysis of baseball player performance. At other times, entire articles or analyses or theories or paradigms are developed on the basis of deeply flawed data.

I don’t know where Piketty sits on this spectrum. However, I think Giles (and some of the commentary surrounding his work) could do a better job of describing Piketty’s error rate relative to the overall volume of data that was examined. If Giles scrutinized all of Piketty’s data and found a handful of errors, that would be very different from taking a small subsample of that data and finding it rife with mistakes.

All of this is part of the peer-review process. Academics sometimes think of peer review as a relatively specific activity undertaken by other academics before academic papers or journal articles are published. This process of peer review has been much studied over the years (often in peer-reviewed articles, naturally), and scholars have come to different conclusions about how effective it is in avoiding various types of errors in published research.

I’m not necessarily opposed to this type of peer review. But I think it defines peer review too narrowly and confines it too much to the academy. Peer review, to my mind, should be thought of as a continuous process: It starts from the moment a researcher first describes her result to a colleague over coffee and it never ends, even after her work has been published in a peer-reviewed journal (or a best-selling book). Many findings are contradicted or even retracted years after being published, and replication rates for peer-reviewed academic studies across a variety of disciplines are disturbingly low.

I have a dog in this fight, obviously. I think journalistic organizations from the Financial Times to FiveThirtyEight should be thought of as prospective participants in the peer-review process, meaning both that we provide peer review and that our work is subject to peer review.

I can’t speak for the FT, but I know that FiveThirtyEight gets some things badly wrong from time to time. It’s helpful to have readers who hold us to a very high standard. (A terrific question is whether FiveThirtyEight and other news organizations are transparent enough about their research to be full-fledged participants in the peer-review process. That’s something I should probably address more completely in a separate post, but see the footnotes for some discussion about it.4)

Piketty’s errors would not have been detected so soon had he not published his data in detail. That’s not to say that transparency is an absolute defense.5 But one should also assume that there are at least as many problems (and probably more) with unpublished data or poorly explained methods.6

The peer-review process ideally involves both exactly replicating a research finding and replicating it in principle. It would be problematic if other researchers couldn’t duplicate Piketty’s data. But it would be at least as problematic — I’d argue more so — if they could replicate it but found that Piketty’s conclusions were not very robust to changes in assumptions or data sources.

Some of Giles’s critique of Piketty gets at this problem. For instance, he calls into question Piketty’s finding that wealth inequality is rising throughout Western Europe, a result that he says depends on a particular series of assumptions and choices that Piketty made.7

Of course, Giles’s methodological choices can be scrutinized, too. Perhaps there’s some reasonable set of assumptions under which wealth inequality is not rising at all in Western Europe, another under which it’s increasing modestly, and a third under which it’s increasing substantially.
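
One way to frame that question is as a simple sensitivity check: recompute the same headline quantity under each set of assumptions and see whether the direction of the result holds. The sketch below, in Python, uses entirely made-up scenario names and wealth shares — they are not Giles’s or Piketty’s actual estimates — just to show the shape of the exercise.

```python
# A schematic sensitivity check with hypothetical numbers and splicing choices
# (not anyone's actual series): recompute the 1980-2010 change in a
# wealth-concentration measure under alternative assumptions and ask whether
# the direction of the conclusion survives.

SCENARIOS = {
    "survey-based splice": (0.58, 0.57),  # (hypothetical share in 1980, share in 2010)
    "tax-data splice":     (0.58, 0.61),
    "mixed splice":        (0.58, 0.64),
}

for name, (share_1980, share_2010) in SCENARIOS.items():
    change = share_2010 - share_1980
    direction = "rising" if change > 0 else "flat or falling"
    print(f"{name:20s} change = {change:+.2f} -> wealth inequality {direction}")

# If the sign of the change flips across reasonable scenarios, the headline
# conclusion is fragile; if it holds under all of them, it is on firmer ground.
```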

In the medium term, the better test might be research built up from scratch, largely independently of both Piketty and Giles. How robust are their findings to reasonable changes in data and assumptions?8

And in the long run, the best test might be whether Piketty’s hypothesis makes a good prediction about wealth inequality, i.e., whether wealth inequality continues to rise. The prediction won’t be as easy to evaluate as election forecasts are.9 Still, Piketty’s book comes closer to making a testable prediction than much other macroeconomic work.

Science is messy, and the social sciences are messier than the hard sciences. Research findings based on relatively new and novel data sets (like Piketty’s) are subject to one set of problems — the data itself will have been less well scrutinized and is more likely to contain errors, small and large. Research on well-worn data sets is subject to another set. Such data is probably in better shape, but if researchers are coming to some new and novel conclusions from it, that may reflect some flaw in their interpretation or analysis.

The closest thing to a solution is to remain appropriately skeptical, perhaps especially when the research finding is agreeable to you. A lot of apparently damning critiques prove to be less so when you assume from the start that data analysis and empirical research, like other forms of intellectual endeavor, are not free from human error. Nonetheless, once the dust settles, it seems likely that both Piketty and Giles will have moved us toward an improved understanding of wealth inequality and its implications.

Footnotes

  1. Perhaps the simplest way to make data more robust is to average it together from several different independent sources: A “bad” poll, for example, makes much less difference in an analysis when it’s averaged together with a dozen “good” ones.
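
    As a rough numerical illustration, with invented numbers rather than any real polls: a single poll that misses by seven points barely moves an average of a dozen polls. A minimal sketch in Python:

```python
# Invented numbers for illustration: twelve "good" polls clustered around 52
# percent support, plus one "bad" poll that reads 45 percent.
good_polls = [51.5, 52.0, 52.3, 51.8, 52.1, 51.9, 52.4, 52.2, 51.7, 52.0, 52.3, 51.6]
bad_poll = 45.0

average_without = sum(good_polls) / len(good_polls)
average_with = (sum(good_polls) + bad_poll) / (len(good_polls) + 1)

print(f"Average of the good polls alone: {average_without:.1f}")  # about 52.0
print(f"Average including the bad poll:  {average_with:.1f}")     # about 51.4
# The seven-point error in one poll shifts the overall average by roughly half a point.
```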

  2. By contrast, the original version of the book did not sell especially well in France, where discussions of inequality have long played a more prominent role in the political discourse.

  3. Which is to say: they’re approximately center-right on a European scale and somewhere near the center on an American scale.

  4. Just as there are a lot of different types of peer review, there are different kinds of transparency. FiveThirtyEight will be accused — very reasonably — of trying to have it both ways with respect to transparency. We’re publishing our raw data and our code more and more often, but we do not always do so.

    Even in these cases, however, we aim to be detailed in describing our methodology and explicit about describing our assumptions. These plain-language descriptions about methodology can sometimes provide more insight to a wider audience than posting the data or code alone might. We’ve occasionally seen, by contrast, academics (or other news organizations) who publish their data or code in full, but without anything resembling an appropriate instruction manual.

    I’m not saying that either approach is better — they aren’t mutually exclusive and the gold standard for transparency would involve doing both.

  5. For instance, the polling firm Research 2000 was transparent about publishing its data, and that only served to make clear that the data had probably been faked.

  6. One critique that’s been made of Piketty (as it was made of Carmen Reinhart and Kenneth Rogoff before him) is that he should have prepared and published his analysis in a statistical programming language (like R or Stata) rather than in Microsoft Excel. Languages like these do a better job of distinguishing data from code (there can be problems with both) and can make it clearer when a researcher is making ad hoc adjustments to a formula. However, the advantage of Excel is that a lot more people know how to use it. That potentially yields a lot more eyes — a lot more peer-reviewers — on the same data.
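
    To make the data-versus-code distinction concrete, here is a toy sketch in Python (the series and the adjustment are hypothetical, not Piketty’s actual data or method) of keeping hard-coded data separate from the code that adjusts it:

```python
# Hypothetical wealth-to-income ratios by decade, hard-coded as data only.
RAW_WEALTH_INCOME_RATIO = {
    1970: 3.1,
    1980: 3.4,
    1990: 3.9,
    2000: 4.4,
    2010: 4.9,
}

def adjust_for_coverage(series, factor=1.05):
    """Apply one explicit, named adjustment to every observation.

    In a spreadsheet this kind of tweak tends to hide inside individual cell
    formulas; as a function it is visible, documented and easy to review.
    """
    return {year: value * factor for year, value in series.items()}

adjusted = adjust_for_coverage(RAW_WEALTH_INCOME_RATIO)
for year in sorted(adjusted):
    print(year, round(adjusted[year], 2))
```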

  7. Giles, perhaps less helpfully, also accuses Piketty of having “fat fingers.” It isn’t quite clear whether the term is meant to imply that Piketty has been sloppy and clumsy, or whether Giles is accusing Piketty of having “put his thumb on the scale,” i.e. deliberately manipulating his data to produce a particular result: one that shows rising wealth inequality.

    I’d suggest that there’s sometimes a middle ground between mere sloppiness and overt manipulation. Even careful researchers implicitly abide by the following heuristic: Always check your data and your assumptions, but double-check them when they contradict your thesis. This tendency shows up in polling analysis, for example. An analyst who has a particular preference about the race’s outcome won’t necessarily cherry-pick the outlier polls that show the best results for his candidate. But he’ll argue for throwing out the other side’s outliers even as he tolerates his own. Generally, his arguments for disregarding the opponent’s outliers are pretty reasonable — outlier polls (like outlying data points of any kind) rarely have the story right. But unless he applies the same scrutiny to the outlier polls that show a favorable result for his candidate, his overall impression of the race might wind up being fairly skewed.

  8. One thing that distinguished the FiveThirtyEight election forecasting model in 2012 was that many other methods, developed independently from ours, produced similar results.

  9. This is both because wealth inequality is much harder to measure than election outcomes and because Piketty’s prediction could potentially be self-canceling depending on how policymakers act upon it.

Nate Silver founded and was the editor in chief of FiveThirtyEight.
