First, a quick formality: I’m blogging about bad/misleading/less-informative-than-they-might-have-been graphs with my friend Catherine Roberts at her new fun internet adventure, Graphchat. There will be a lot of graphs, so you can probably look at it while at work and no one will be able to tell, assuming your work has something to do with graphs.
Jordan Weissman looked at income distribution in the U.S. and New York, targeting the notion of “New York rich.” In his words:
it’s usually a somewhat strange experience when I get into conversations here about class. If I mention that a six-figure salary counts as rich in much of the country—that just $250,000 gets you into the top 2 percent—the response is usually, “Sure, but that’s not New York rich.”
Except, it sort of is.
I won’t quibble here about the claim, but rather about its graphical presentation. Weissman picked this side-by-side histogram to make his point:
New York City is visibly heavier at very low and very high incomes, but the graph doesn’t make the magnitudes of these differences easy to judge. If the claim is that the distribution of income in NYC is similar to the distribution of income in the U.S. at large, presenting the data this way, with tons of white space between income buckets that isn’t really doing anything, is confusing. Additionally, if the claim is specifically about “New York rich,” it matters less whether the proportion of households in the $50,000 to $74,999 bucket is similar in the U.S. and NYC than how far out the right tail extends. You can’t figure that out from this graphic.
Things get worse after Weissman adds a claim that San Francisco is really the rich people city. To make the claim that New York isn’t special in one particular way and San Francisco is, Weissman threw in this graph comparing San Francisco to… the U.S.:
If you can remember where the red bar is in the previous image, you can do a really quick visual comparison and find out whether New York and San Francisco are alike or different in each income bucket! Lest you think I’m being unfair by throwing two paragraphs in between the two graphs, click the link I opened with and compare the space between the graphs in Weissman’s article (or here it is again if you don’t like scrolling) with the space between the graphs here.
The issues here are:
- To compare NYC and San Francisco, Weissman showed both of them compared to a third entity
- If income distribution is lumpy within any of these brackets, we don’t have any sense of the underlying shape
Neither of these needs to be a problem. We can instead visualize the same data (I used the one-year ACS instead of the five-year) as KDE (kernel density estimate) plots, fit everything on one graph, and actually see what San Francisco and NYC look like relative to each other and to the national average:
Here, it’s much more obvious how right-skewed the San Francisco income distribution is, much more obvious how similar the U.S. and NYC income distributions are, and much more obvious just how far the right tail goes. I don’t know what to tell you about the weird squiggle all the way in the right tail.
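For anyone who wants to see the mechanics, here is a rough sketch of the KDE idea in Python. It is not the code behind the plot above. The "incomes" are made-up lognormal draws rather than ACS data, the place names, medians, and spreads are hypothetical, and the bandwidth is a generic normal-reference rule of thumb that I am assuming for illustration.

```python
import math
import random
import statistics


def gaussian_kde(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `samples` at each grid point."""
    n = len(samples)
    if bandwidth is None:
        # Normal-reference rule of thumb; an assumed default, not the post's setting.
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]


def fake_incomes(median, sigma, n, seed):
    """Draw made-up lognormal 'household incomes'. NOT real ACS data."""
    rng = random.Random(seed)
    return [rng.lognormvariate(math.log(median), sigma) for _ in range(n)]


# Hypothetical parameters, chosen only so the two curves look different.
places = {
    "Place A": fake_incomes(median=55_000, sigma=0.8, n=500, seed=1),
    "Place B": fake_incomes(median=80_000, sigma=0.9, n=500, seed=2),
}

# One shared grid ($0 to $500,000) so every curve can go on the same axes.
grid = [i * 5_000 for i in range(101)]
curves = {name: gaussian_kde(vals, grid) for name, vals in places.items()}
```

Each entry in `curves` is a list of density values along `grid`, so drawing all of them as lines on one chart gives the everything-on-one-graph comparison. One caveat with this plain version: a Gaussian kernel spreads a little of the mass below $0 and smooths the far tail, so the very ends of each curve deserve some skepticism.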
As I said above, I have no issues with the claim Weissman made. He’s correct that the income distributions in the U.S. and NYC are similar and that San Francisco is, if you pick random citizens, richer than either. His graphical choices just didn’t make that obvious.
As usual, all code/data work (there’s really not a lot) is on GitHub.