Thursday, October 8, 2009

What Does a Correlation Really Tell Us?


Ed Glaeser wants to know why Argentina's economy hasn't performed better over the past 100 years, and prints this graph:

He interprets it thusly:

Schooling is measured by the share of the relevant populations that was enrolled in primary, secondary or tertiary schooling. Argentina [in 1900] may have been rich, but it was not that well-educated. In 2000, Argentina was doing about as well as would be expected based on its education levels in 1900. Long-run national success is built on human capital, both because of the link between schooling and technology and because of the link between education and well-functioning democracy.

I mentioned this to a fellow grad student (who is Argentinian) and she thought Glaeser was crazy. She argued that correlation doesn't equal causation, and it is certainly plausible that the causal direction runs the other way (i.e. the poor economic performance led to lower public revenues which then led to less investment in education). She also half-jokingly recommended that I re-read some dependency theory to understand why regions are clustered at the top and bottom of the above graph.

To me, either Glaeser or my classmate could be correct. They could just as well both be incorrect. There's been a lot of political instability in Argentina over the last century, right? Perhaps that instability is driving both results. Any of these interpretations is plausible, and just looking at the above graph doesn't help us eliminate alternative hypotheses.

Of course, Glaeser could just respond by arguing that education is also highly correlated with well-functioning democracy, so maybe low education levels explain the political instability as well. But if that story doesn't ring true to my classmate, who is very smart and surely knows more about her country than Glaeser does, then I'd have to wonder. Besides, if political elites knew that their hold on power would be strengthened by limiting opportunities for education, then maybe they'd... limit opportunities for education.

What's the point? Well, there are several. One is that we can learn quite a lot by examining cross-sectional correlations. But another is that correlations alone won't take us all the way to understanding. And the third is that it's really, really hard to figure out what's gone on in Argentina over the past 100 years.


GabbyD said...

in this case, you cannot argue reverse causation, coz the graph compares education 100 years ago to income per capita today.

Thomas Oatley said...

That's a lot of weight to attach to a five-point difference (Arg vs. Japan) in school enrollment 100 years ago. I suspect the word you are searching for to characterize this finding is "spurious."

I notice that East Asian societies are not included in the sample. I hypothesize that their inclusion would weaken the relationship substantially.

Kindred Winecoff said...

Thomas -

I actually noticed that too, esp. after my classmate mentioned the clustered regions. I wonder why they aren't in the sample... because many were still colonies in 1900? No data? If so then this might be selection bias. Also, by this metric several of the world's most successful economies of the past 100 years (e.g. U.S., Canada, Germany, U.K.) have actually underperformed slightly: they're below the regression line. (actually the pattern looks clearly nonlinear to me, but whatever.)

Gabby -

Mea culpa. But I think the overall point still stands. And actually, reverse causation could have a role if Argentina was unable to maintain or expand education over the 100-year period to keep pace with the other countries in the sample (something that Glaeser does not examine, and that I imagine is not actually the case, but it could be).
