Chattin’ more about graphs


This one makes fun of some WaPo map. Here’s the map (click for full size):

At the top link Catherine and I will tell you what’s wrong with it.

Brownback’s Tax Cuts and Migration Claims

Sam Brownback claimed that the tax cuts he passed in 2012 would compel 35,000 people to move to Kansas. This isn’t population growth (aren’t people turned on by high taxes?), but specifically migration.

I don’t think Sam Brownback believes he’s an evil guy who lies about things for fun, so I’ll give him the benefit of the doubt and assume that somewhere in some time series that he has access to he had good reason to believe in his tax cuts. More specifically, that there’s a case in publicly available data in which cutting top marginal income tax rates while doing nothing to offset the cut yields a huge influx of migrants to the low top tax-rate utopia. I don’t know that I’ll find anything, but I’ll do my best. (Also, not bothering with the supposed budget surplus and huge economic growth the tax cut was supposed to produce. That’s been handled in enough places already.)

The short answer turned out to be that, if you don’t think too hard about the numbers, there’s a good reason for Brownback to believe that, on net, more people would move into Kansas than out of it in the years following his tax cut. I looked at each state that had a top marginal income tax rate cut followed by four years without another such cut to see if there was a naive reason to believe in Brownback’s migration story, and the results were sort of mixed. Simply, each state that qualified had positive net migration over the four years after it passed its tax cut, but the migration patterns start to look pretty strange when you look at geographic variation.

First, the simple news. Here are¹ the net migration figures for each state and tax cut year combination:

State                 Tax Cut Year(s)  Net Migration
Vermont               2001                       774
Iowa                  2008                     1,029
Michigan              2005                       913
Kansas                2008                     1,470
Utah                  2008                       957
Hawaii                2002                       807
Rhode Island          2002                       700
Nebraska              2008                       765
Arkansas              2005                     1,634
Massachusetts         2002                       409
District of Columbia  2001                     1,304
Maryland              2002 & 2006              4,087
Idaho                 2001                       949

That’s good for our good buddy Sam! Look at all of those positive numbers! But I have to urge caution. First, people move for a lot of reasons. The Census question on why people move (2012 to 2013) showed some of the reasons and some of the ways they interact. “Top marginal income tax rate” wasn’t one of them, but it doesn’t really need to be. Second, if the top marginal income tax rate were a strong motivating factor for people to move, there’s a decent-sized cohort of states that have no income tax, which would make marginal reductions in the top income tax rate less appealing. For the tax rate to be the thing that pushed people over the edge, the claim has to be that for 765 people, from 2008 to 2011, holding all other reasons for moving to Nebraska fixed, the 1.61 percentage point decrease in tax rate on the next $1,000 earned over $1,500,000 (based on NBER TAXSIM methodology) swayed their choice. It’s possible to check the income of those who moved to Nebraska in this time period, but I haven’t done so. Regardless, I don’t think they were all millionaires.
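To put the Nebraska number in dollars, here is a quick back-of-the-envelope sketch in Python 3. The function name is mine, and it only restates the arithmetic in the paragraph above; it is not a TAXSIM calculation.

```python
def marginal_tax_saving(rate_cut_pct_points, extra_income):
    """Dollars saved on `extra_income` when the marginal rate falls by
    `rate_cut_pct_points` percentage points."""
    return rate_cut_pct_points / 100.0 * extra_income

# Nebraska 2008: a 1.61 percentage point cut applied to the next $1,000 earned.
saving = marginal_tax_saving(1.61, 1000)
print("Saving on the next $1,000: ${:.2f}".format(saving))
```

So the story requires someone to relocate over roughly $16 per extra $1,000 earned above $1.5 million.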

Third, keeping in mind the second note of caution, the geographic patterns are sometimes really weird.

Colors here are:

  • Darkest red: at or above 90% of the maximum net migration to the state in question
  • Darkest blue: at or below 90% of the negative of the maximum net migration to the state in question
  • White: within 10% (positive or negative) of the maximum from zero
  • Black: indicates which state is being shown²
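The cutoffs above can be sketched as a small classifier (Python 3). This is my reading of the legend, not the plotting code from the repo: `max_net` stands for the largest net migration into the state being mapped, and the two "lighter" labels for everything in between are an assumption, since the legend only names the extremes and white.

```python
def color_bin(net, max_net):
    """Classify one origin state's net migration relative to max_net.

    max_net: largest net migration into the mapped state (assumed positive).
    The "lighter" labels are an assumption; the legend only defines the
    darkest shades and white.
    """
    if max_net <= 0:
        raise ValueError("max_net must be positive")
    share = net / float(max_net)
    if share >= 0.9:
        return "darkest red"
    if share <= -0.9:
        return "darkest blue"
    if abs(share) <= 0.1:
        return "white"
    return "lighter red" if share > 0 else "lighter blue"

# Hypothetical flows, with a maximum net migration of 100 people.
for net in (95, 40, 5, -40, -95):
    print(net, color_bin(net, 100))
```

Moving the 90% and 10% cutoffs changes which states land in the extreme bins.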

It makes sense that a lot of people moved from California, Florida, Texas, and New York to other states when there was positive net migration. Whatever factors were driving that net migration were going to drive more total people from these four very populous states, if only because there were more people possibly to drive. This is reasonable to believe unless you look at Michigan’s 2005 tax cut, which overall preceded positive net migration, but came with large net out-migration to both Texas and California alongside large net in-migration from Florida. You might claim that these populations are in some way qualitatively different, but then you’d have a lot of similarity to explain in the other maps.

Also, sometimes closeness seems to be the most important factor for where people seem to be moving, in which case we might think that people’s lives wouldn’t be that uprooted if they moved for a better top marginal income tax rate (see Idaho 2001, Kansas 2008, Maryland 2002 and 2006). This isn’t strictly reliable though, as Arkansas’s pattern of states with large in- and out-migrations is all over the place. Additionally, why did so many Kansans who left after the 2008 tax cut bypass Oklahoma for Texas?

Is there anything here? That’s unclear. There are counterexamples to any consistent narrative I’ve thought of lazily to throw at the maps, and I’m sure they’d change slightly with different cutoffs. Additionally, there are further tests I can do: checking net migration to neighboring states in the same periods, checking county-level net migration to and from states after tax cuts (in the spirit of Arin Dube’s minimum wage work), actually doing the data cleaning and lit work to estimate some sort of regression model.

As usual, all code is available on GitHub, with one caveat: the ACS dataset I started with was 1.7 GB, which is larger than GitHub will let me upload. If you want the data, I’ll provide IPUMS instructions and the SQL code I used to create the tables.

1 On a 1% sample ACS scale, so I think multiply by 100 if you want to compare to a state’s initial population? I’m still unclear on this.
2 Except in Michigan’s case, where I think I broke something in the code that handles the polygons that represent the shapes. Probably an upper and lower peninsula problem. Might fix later. I totally fixed Michigan.


Also I think I’m obligated to drop these two citations here as a precaution:

Steven Ruggles, J. Trent Alexander, Katie Genadek, Ronald Goeken, Matthew B. Schroeder, and Matthew Sobek. Integrated Public Use Microdata Series: Version 5.0 [Machine-readable database]. Minneapolis: University of Minnesota, 2010.

Feenberg, Daniel Richard, and Elizabeth Coutts, An Introduction to the TAXSIM Model, Journal of Policy Analysis and Management vol 12 no 1, Winter 1993, pages 189-194.

Jordan Weissmann and “New York rich”

First, a quick formality: I’m blogging about bad/misleading/less-informative-than-they-might-have-been graphs with my friend Catherine Roberts at her new fun internet adventure, Graphchat. There will be a lot of graphs, so you can probably look at it while at work and no one will be able to tell, assuming your work has something to do with graphs.


Jordan Weissmann looked at income distribution in the U.S. and New York, targeting the notion of “New York rich.” In his words:

it’s usually a somewhat strange experience when I get into conversations here about class. If I mention that a six-figure salary counts as rich in much of the country—that just $250,000 gets you into the top 2 percent—the response is usually, “Sure, but that’s not New York rich.”

Except, it sort of is.

I won’t quibble here about the claim, but rather about its graphical presentation. Weissmann picked this side-by-side histogram to make his point:

New York City is visibly heavier in very low income and very high income, but the magnitudes of these differences aren’t exactly illustrative. If the claim is that the distribution of income in NYC is similar to the distribution of income in the U.S. at large, presenting the data this way with tons of white space between income buckets that aren’t really doing anything is confusing. Additionally, if the claim is in particular about “New York rich,” it doesn’t seem to matter whether the proportion of households in the $50,000 to $74,999 bucket is similar in the U.S. and NYC. How far out the right tail extends seems to be more instructive, and you can’t figure that out from this graphic.

Things get worse after Weissmann adds a claim that San Francisco is really the rich people city. To make the claim that New York isn’t special in one particular way and San Francisco is, Weissmann threw in this graph comparing San Francisco to… the U.S.:

If you can remember where the red bar is in the previous image, you can do a really quick visual comparison and find out whether New York and San Francisco are alike or different in each income bucket! Lest you think I’m being unfair by throwing two paragraphs in between the two graphs, click the link I opened with and compare the space between the graphs in Weissmann’s article (or here it is again if you don’t like scrolling) with the space between the graphs here.

The issues here are:

  1. To compare NYC and San Francisco, Weissmann showed both of them compared to a third entity
  2. If income distribution is lumpy within any of these brackets, we don’t have any sense of the underlying shape

Neither of these needs to be a problem. We can instead visualize the same data (I used the one year ACS instead of 5 year) in KDE plots, fit everything on one graph, and actually see what San Francisco and NYC look like relative to each other and to the national average:

Here, it’s much more obvious how right-skewed the San Francisco income distribution is, much more obvious how similar the U.S. and NYC income distributions are, and much more obvious just how far the right tail goes. I don’t know what to tell you about the weird squiggle all the way in the right tail.
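For anyone who wants to try the same kind of plot, here is a minimal sketch of a Gaussian kernel density estimate in Python 3, using only the standard library. The incomes are made-up log-normal draws, not the ACS data, and the bandwidth and grid are arbitrary choices of mine; the actual plot code is in the repo.

```python
import math
import random

def gaussian_kde(sample, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `sample` at each grid point."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - x) / bandwidth) ** 2) for x in sample)
            for g in grid]

random.seed(0)
# Made-up household incomes in $1,000s; NOT the ACS data used in the post.
incomes = {
    "U.S.": [random.lognormvariate(4.0, 0.8) for _ in range(500)],
    "NYC": [random.lognormvariate(4.0, 0.9) for _ in range(500)],
    "San Francisco": [random.lognormvariate(4.4, 0.9) for _ in range(500)],
}
grid = [float(v) for v in range(0, 1001, 5)]  # $0 to $1,000k in $5k steps
densities = {place: gaussian_kde(values, grid, bandwidth=10.0)
             for place, values in incomes.items()}
# All three curves share one grid, so they can go on a single set of axes,
# e.g. matplotlib's ax.plot(grid, densities[place], label=place).
```

Because every place is evaluated on the same grid, the curves overlay directly, which is what makes the NYC versus San Francisco comparison possible without a third reference chart.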

As I said above, I have no issues with the claim Weissmann made. He’s correct that the income distributions in the U.S. and NYC are similar and that San Francisco is, if you pick random citizens, richer than either. His graphical choices just didn’t make that obvious.

As usual, all code/data work (there’s really not a lot) on github.

NIH Grant Funding Concentration – Are the rich getting richer?

Bill Gardner wrote about concentration of NIH grants and possible risks. The worry is timely because, in negative NIH funding growth environments, concentration might increase. There’s definitely negative growth in NIH funding:

So the question then becomes whether fears of concentration are justified. Grantome, an organization that provides information about grants and other forms of research funding, used the following standard to try to answer this question: if the past proportion of R01 grants an organization receives is positively correlated with its three-year annual growth rate in R01 grants, we’ll say grant funding concentration is increasing. They produced this chart:

There’s some obvious correlation there, but also an obvious problem: instead of the past proportion of R01 grants, they used the “current” proportion. The issue here is possible reverse causality: it’s likely that organizations that have larger proportions now would have grown more quickly over the last three years, because otherwise how did they get so big?

This question matters for reasons Gardner is clear about and that I agree with. Using a Richard Florida map showing scientific citations per capita in a number of the world’s cities…

…Gardner quips “Post-war global science has been led by the US, but not really. It’s been dominated by Massachusetts, the Washington-NYC corridor, and California.” His concerns are:

First, increasing concentration of science on the coasts will increase US regional economic and educational disparities… Second, a greater concentration of scientific dominance in a few liberal states is to the disadvantage of the NIH, because over time it must further erode broad political support for medical science.

These both make intuitive sense to me. The only assumption behind these concerns is that Congressional representatives systematically vote for bills and budgets that send more funds to their own states and against bills and budgets that don’t, which, duh.

So it is a problem that grants are concentrated, but fortunately, they don’t appear to be getting more so. When I fixed the Grantome chart to look at proportion of grants held in 2010, the correlation vanished entirely. Instead, we had a nice funnel, showing diminishing variance but no drift as we increase the proportion of grants held. “Big” organizations are those with at least 1% of all R01 grants, “little” organizations are all others (with an arbitrary cutoff of 0.02%, because Grantome did that).
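The check itself is simple enough to sketch in Python 3. The grant counts below are invented for illustration, and I am assuming "three-year annual growth rate" means a compound annual rate; the real data work is in the repo.

```python
import math

def annual_growth(start, end, years=3):
    """Compound annual growth rate from `start` to `end` over `years` years."""
    return (end / float(start)) ** (1.0 / years) - 1.0

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical R01 counts: (grants in the base year, grants three years later).
grants = {"Org A": (400, 410), "Org B": (120, 135), "Org C": (30, 24), "Org D": (8, 11)}

total_base = sum(base for base, _ in grants.values())
# Use the PAST (base-year) share, not the current one, to avoid the reverse causality problem.
shares = [base / float(total_base) for base, _ in grants.values()]
growth = [annual_growth(base, later) for base, later in grants.values()]
print("Correlation of base-year share with later growth: {:.3f}".format(pearson(shares, growth)))
```

Swapping the base-year share for the end-of-period share reproduces the setup I am objecting to, since organizations that grew fastest will mechanically hold larger shares afterwards.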

In fact, I checked each year’s proportion of grants held vs. three year annual growth rate in number of grants from 1986 to 2010, and that same funnel pattern was present in all of them:

This says nothing about the current level of concentration, which is still scary for the reasons mentioned above, but at least grant funding concentration doesn’t appear to be getting worse.

(all code/datawork available on Github)

Zipf’s Law and IPython Notebooks

Bill Gardner is blowing the reproducible research horn again, which inspired me actually to bother learning how IPython notebooks worked. I probably should have started using IPython notebooks a while ago, given the tools available and my physical proximity (and academic irrelevance) to the HAP/Reinhart-Rogoff adventure. Oh well, get ’em next time.

Anyway, the Think Complexity book continues to be a font of interesting exercises and reasons to write python scripts. The chapter I’m on starts with something called Zipf’s Law. Zipf’s Law is an empirical claim about word frequencies: take a large corpus in a given language, count how often each word appears, and rank the words from most to least frequent. You can then estimate how frequently a word will appear given only its rank, along the lines of:

Log(f) = Log(c) – s Log(r)

where f is frequency, r is rank, and s and c are parameters that depend on language. That’s a simple linear equation though, so with a large body of words it’s not hard to estimate. I played with Adam Smith, and it worked ok, which is strange. Here’s what I wound up with (code below):




Section 1: Zipf’s Law

Problem 5.1

Write a program that reads a text from a file, counts word frequencies, and prints one line for each word, in descending order of frequency. You can test it by downloading an out-of-copyright book in plain text format from Project Gutenberg. You might want to remove punctuation from the words.

Plot the results and check whether they form a straight line. For plotting suggestions, see Section 3.6. Can you estimate the value of s?


  • nltk: Natural Language Toolkit. Useful for playing with text, especially large bodies of text.
  • re: python regular expressions library
  • pprint: pretty print. Makes long lists less ugly.
  • word_tokenize: tokenizes strings
  • urllib2: because we’re going to need some web data
In [1]:
import nltk
import re
import pprint
from nltk import word_tokenize
import urllib2

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

We’re going to need some text, so we’ll grab Adam Smith’s Wealth of Nations from Project Gutenberg.

In [2]:
url = ''  # Project Gutenberg plain-text URL for The Wealth of Nations goes here
response = urllib2.urlopen(url)
raw = response.read().decode('utf8')

There’s a bunch of license info though, so we’ll use find and reverse find on some stock Gutenberg labels to figure out where that is and then slice the string to between those positions.

In [3]:
start, end = raw.find('*** START OF'), raw.rfind('*** END OF')  # marker strings are assumed
raw = raw[start:end]


In [4]:
tokens = np.array([w.lower() for w in word_tokenize(raw)])
vocab = set(tokens)

So now we have every word in the book as a lowercase token, plus the set of unique words, and they all came from The Wealth of Nations. The goal, though, is to see whether we can replicate/observe Zipf’s Law:

Log(f) = Log(c) – s Log(r)

where f is the frequency of a word, r is the word’s rank, and c and s are parameters that depend on the language. This is obviously a linear form, and should appear that way when plotted.

For this we need to count the frequencies of each word.

Vocab is all of the unique words in the corpus, so we can use a dict comprehension with vocab as keys and the length of the array of words (tokens) equal to each key (this takes some time). Then, just to inspect, we can print the first few items and their counts.

In [5]:
counts = {v: len(tokens[tokens == v]) for v in vocab}
In [6]:
print [{key: val} for key, val in counts.items()[:10]]
[{u'writings': 1}, {u'delusions.': 1}, {u'portugal.': 9}, {u'foul': 1}, {u'four': 188}, {u'prices': 70}, {u'woods': 6}, {u'thirst': 1}, {u'ocean.': 1}, {u'preface': 2}]
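The dict comprehension above rescans the whole token array once per unique word, which is why it takes a while. A single pass with `collections.Counter` from the standard library should give the same counts much faster. This is a sketch in Python 3 syntax on a toy sentence rather than the book, and `count_words` is just an illustrative name.

```python
from collections import Counter

def count_words(tokens):
    """Count occurrences of each token in a single pass over the list."""
    return Counter(tokens)

# Toy tokens standing in for the tokenized book.
toy_tokens = "the wealth of nations is the wealth of the people".split()
toy_counts = count_words(toy_tokens)

# most_common() returns (word, count) pairs in descending order of count.
for word, n in toy_counts.most_common(3):
    print(word, n)
```

A `Counter` is a dict subclass, so anything later in the notebook that expects a plain dict of counts should work with it unchanged.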

We then need to grab ranks of each word. This is easier in a pandas DataFrame than in a dict.

In [7]:
# Build the frame straight from the dict: words as the index, counts as floats.
df = pd.DataFrame({'Count': pd.Series(counts, dtype = 'float64')})
# Rank 1 is the most frequent word; tied counts share the lowest rank.
df['Rank'] = df['Count'].rank(method = 'min', ascending = False)

Then finally we estimate s and plot things

In [8]:
x = np.log(df['Rank'])
y = np.log(df['Count'])

fit = np.polyfit(x, y, deg = 1)
fitted = fit[0] * x + fit[1]
print 'Estimate of s is:\n{0}'.format(round(-fit[0], 3))  # the fitted slope is -s
Estimate of s is:
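One way to sanity-check this kind of fit is to run it on frequencies that follow Zipf’s Law exactly and see whether the known s comes back. The sketch below is Python 3 with a hand-rolled least-squares fit, so it needs no numpy; the values s = 1.2 and c = 5000 are arbitrary choices for the synthetic data, not estimates from the book.

```python
import math

def fit_zipf(ranks, freqs):
    """Least-squares fit of log(f) = log(c) - s*log(r); returns (s, c)."""
    xs = [math.log(r) for r in ranks]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return -slope, math.exp(intercept)  # slope is -s, intercept is log(c)

# Synthetic frequencies that obey Zipf's Law exactly (s = 1.2, c = 5000).
ranks = list(range(1, 201))
freqs = [5000.0 / r ** 1.2 for r in ranks]
s_hat, c_hat = fit_zipf(ranks, freqs)
print("s = {:.3f}, c = {:.1f}".format(s_hat, c_hat))
```

Note the sign: the regression slope is the negative of s, so the slope has to be flipped before reporting it.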

In [9]:
fig = plt.figure(figsize = (4,4), facecolor = 'w', edgecolor = 'w')
ax = fig.add_subplot(111)

ax.plot(x, y, 'bo', alpha = 0.5)

ax.set_title('Log(Rank) vs. Log(Count) Words,\nAdam Smith\'s Wealth of Nations')

ax.set_xlim(left = max([min(np.log(df['Rank'])) * 0.95,0]))


Minimum Wage and Unemployment

Following up time! I’m sure I don’t need to link to the collection of papers and blog posts claiming minimum wage increases increase/decrease unemployment or the total employment level or the rate of job growth? Here’s John Schmitt at CEPR on why the minimum wage has no discernible effect on employment, and here’s the J. Meer Texas A&M paper on the relationship between minimum wage and labor dynamics. After that, you’re on your own.

One criticism of the previous post’s approach brought up by Dr. Steve Steib¹ (that I was totally going to mention anyway) is that state populations over time are not strictly comparable because people can move between states. Extreme cases here are pretty obvious. If North Dakota decreases its minimum wage to -$1,000 per hour (for science? To honor the ghost of Ayn Rand? As a bid to secure Paul Ryan’s transfer of citizenship? Whatever), low-wage workers, assuming their wages in ND set them below the poverty level for their family size in North Dakota, will move out of North Dakota, which will decrease the North Dakota poverty rate through selection rather than through any impact on people’s standard of living. Another criticism of the previous chart is that any relationship between poverty and the minimum wage might be attenuated by some sort of threshold effect: if the minimum wage is far enough below the poverty level for full-time work, increases won’t affect the poverty rate because there’s a large space for minimum wages higher than the current minimum but still below the poverty level. For those reasons, the poverty rate might not be a good proxy for the question “does increasing the minimum wage make people’s lives suck less?”, and it might be more useful to examine the impact on unemployment.

Examining unemployment in the context of interstate migration is also problematic. It’s possible that states that increase their minimum wages might attract in-migration. Ignoring the idea that firms lay people off for now (i.e., ceteris paribus), this in-migration increases the labor supply for a given level of labor demand, which, if wages are sticky downward, results in higher unemployment. If wages aren’t sticky downward, then wages in a region with a minimum wage increase should decrease. Both of these are testable predictions, which is nice, but for now, the raw, unexplained relationship between minimum wage and unemployment rates for different states is easier to display.

From a lazy, non-econometric approach to visualizing the two time series, here’s what the relationship looks like between 1988 and 2006, with the unemployment rate² in blue and the minimum wage² in red:

Full image at the link

Again, look at all that variation! Look at North Dakota! Steadily rising minimum wage with decreasing unemployment! Liberal point: PROVEN. But also, look at D.C. Unemployment sure was on the downswing before that minimum wage increase got out of control. Conservative point: PROVEN.

I’m not trying to say too much about what the “actual” relationship between minimum wage and unemployment and/or poverty is here. I think the point is probably something like “the relationship between changes in the wage-level and unemployment and/or poverty varies across space and time,” but that’s boring. Instead of that… look, graphs! But really, next time you hear that minimum wage increases will ruin everything/save us all, keep in mind both that meta-analyses tend to find a small and insignificant effect and that the impact of any minimum wage change will probably be dwarfed by what’s going on in the economy at large.

P.S. GitHub link is included above, but in case anyone wants to play with data, feel free to fork the repo. The current version of unemployment is monthly and seasonally adjusted. I’d like to do something similar with the not seasonally adjusted monthly data aggregated to the annual level. The monthly unemployment series in FRED for not seasonally adjusted unemployment rates are just [state abbrev.]URN, and the weights are just the size of the labor force in each month, which are [state abbrev.]LF. This isn’t hard, but I have a job and am sort of lazy?
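The aggregation I have in mind would look roughly like this (Python 3). The monthly numbers are made up and merely stand in for one year of a state’s URN and LF series; nothing here downloads anything from FRED.

```python
def annual_unemployment_rate(monthly_rates, monthly_labor_force):
    """Labor-force-weighted average of monthly unemployment rates."""
    if len(monthly_rates) != len(monthly_labor_force):
        raise ValueError("need one labor-force weight per monthly rate")
    total_lf = float(sum(monthly_labor_force))
    weighted = sum(r * lf for r, lf in zip(monthly_rates, monthly_labor_force))
    return weighted / total_lf

# Made-up values for one state-year: rates in percent, labor force in thousands.
rates = [5.2, 5.4, 5.1, 4.8, 4.6, 4.9, 5.0, 4.7, 4.4, 4.3, 4.5, 4.8]
labor_force = [360, 361, 362, 365, 368, 372, 373, 371, 367, 366, 364, 363]
print("Annual rate: {:.2f}%".format(annual_unemployment_rate(rates, labor_force)))
```

Weighting by labor force means months with more people in the labor force count for more, which matters in states with big seasonal swings.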

1 The actual facebook comment from Dr. Steib was “Intra state migration attracted by minimum wage differences???” The assumption that people don’t respond to their policy environment is a bit strong, so this is a point well taken, even if Matt Yglesias says people don’t move.
2 Standardization here was identical to standardization in the previous post, in which series were divided by their mean values to bring the scales in line.

Minimum Wage and Poverty

The CBO estimates that increasing the federal minimum wage to $10.10 per hour would lift 900,000 people out of poverty. Pew Research points out that full time minimum wage work “hasn’t been enough to lift most out of poverty for decades,” which sounds like an argument for increasing it. Meanwhile, the Mises Institute has a principles-of-micro (this can be either a compliment or a criticism, but here, I think for lack of empirics, it’s a criticism) explanation for how minimum wage laws increase unemployment and poverty, and Jeffrey Dorfman at Forbes interprets the CBO report as evidence that the minimum wage is “terrible anti-poverty policy.”

This is all very confusing. I looked at minimum wage rates by state and poverty levels by state over time to see if there was anything that jumped out/suggested a more likely correct explanation of the relationship between the minimum wage and poverty and found that (with a lazy, non-econometric approach to looking at the relationship between poverty and the minimum wage), conclusions are hard to come by.

The chart below shows the minimum wage rate and two-year moving average poverty rate for each state, D.C., and the U.S. on the whole from 1993 through 2005. Minimum wage is in red, and poverty rate is in blue. Each was normalized by its mean in the period to get values onto roughly the same scale.


…nothing really jumps out. Connecticut has a steadily increasing minimum wage over the period, but no interesting poverty trend. New Mexico has a nice big poverty wobble in a period with no minimum wage change. California’s nice little criss-cross is a terrible joke about the perils of putting separate series on the same graph. I guess Kentucky was on a downward poverty trend before their slight uptick in minimum wage, and then the trend reversed? I guess?

You’d need a better sense of what’s supposed to be related to poverty rates (Education levels? Demographics? Tax policy? I’m not sure what to control for here) before drawing any conclusions of course, but I’d expect, if anyone’s going to call minimum wage either a terrible or great anti-poverty policy, there’d be something more visible here.
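For reference, the smoothing and rescaling behind the chart can be sketched in a few lines of Python 3. The series are made up, and two details are my assumptions rather than anything stated above: that the moving average is a trailing one, and that the poverty series is smoothed before it is divided by its mean.

```python
def moving_average(values, window=2):
    """Trailing moving average; the result is window-1 items shorter than the input."""
    return [sum(values[i - window + 1:i + 1]) / float(window)
            for i in range(window - 1, len(values))]

def normalize_by_mean(values):
    """Divide each value by the series mean, so the result averages to 1."""
    mean = sum(values) / float(len(values))
    return [v / mean for v in values]

# Made-up series for one state; not the data behind the chart.
poverty_rate = [15.0, 14.0, 13.0, 12.0, 13.0, 14.0]  # percent
minimum_wage = [4.25, 4.25, 4.75, 5.15, 5.15, 5.15]  # dollars per hour

poverty_scaled = normalize_by_mean(moving_average(poverty_rate))
wage_scaled = normalize_by_mean(minimum_wage)
print(poverty_scaled)
print(wage_scaled)
```

Dividing by the mean puts both series around 1, which is what lets a dollar figure and a percentage share an axis, and also what produces accidental criss-crosses like California’s.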

Next step: the same but with unemployment.