Red State/Blue State


ArcGIS includes a large suite of geostatistical analysis tools ranging from simple interpolation and cluster analyses to sophisticated kriging and geographically-weighted regression methods. This exercise has you explore them using a fairly small dataset (US states; N = 50) and a large dataset (US counties; N = 3140). ESRI's help files include a lot of useful background information on the various geostatistical tools. I encourage you to take time to explore these resources. Almost all of the geostatistics modules are fully supported in the ArcEditor (mid-level and student version) license.

For many years the Tax Foundation compared total federal taxes paid by state (from the IRS's annual Data Book) with total federal spending in each state (from the Census's Consolidated Federal Funds Report). Their most recent data, from 2005, reveal wide discrepancies across the 50 states. At one extreme, "winners" such as New Mexico, Mississippi, West Virginia and Alaska received about $2 in federal spending for each $1 they sent to Washington in taxes. At the other extreme, "losers" such as New Jersey, Nevada, Connecticut, New Hampshire and Minnesota got less than 75 cents in federal spending for each $1 they sent to Washington. Similar discrepancies exist today, although the enormous growth in deficit spending means that every state now receives more in federal spending than it sends to Washington in federal taxes.

Part 1 (in Excel): For at least the last 10 years, states voting "red" (Republican) in presidential elections have generally had larger federal spending-to-tax ratios, and so contributed more to the federal deficit, than states voting "blue" (Democratic). But is this relationship statistically significant?

First, try a nice old-fashioned Chi-square test to get a quick and dirty answer: count the states in each quadrant of the graph, put the counts into a 2x2 table in Excel, and use Excel's Chi-square utility.
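If you want to check Excel's answer by hand, the 2x2 calculation can be sketched in a few lines of Python. This is only an illustration: the function name is my own, the counts are hypothetical placeholders (not the actual state tallies), and no continuity correction is applied. Substitute the counts you read off the graph.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (chi2, p_value) for 1 degree of freedom, with no continuity correction.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column categories were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# HYPOTHETICAL counts, for illustration only.
# Rows: red states, blue states. Columns: ratio above 1, ratio below 1.
example_counts = [[20, 8], [7, 15]]
chi2, p = chi_square_2x2(example_counts)
print(f"chi-square = {chi2:.3f}, p = {p:.4f}")
```

A small p-value suggests that the red/blue split and the winner/loser split are not independent, but with only 50 states and a binary cut on each axis this is a blunt test.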


Now download my compilation of Tax Foundation and election data and use Excel's linear regression utility (in the Data Analysis tools) to test this correlation using data from other years. Rather than use a binary red/blue variable, you can use the popular vote data to calculate the natural logarithm of the ratio of Democratic votes to Republican votes: ln(D/R). This variable represents the degree to which a state is red or blue.
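The log-ratio variable and the regression it feeds can be sketched as follows. Treat this as a rough illustration under stated assumptions: the function names are my own, the five rows of data are made up (they are not from my spreadsheet), and the sketch reports only the slope, intercept and R-squared, without the significance tests that Excel's regression output includes.

```python
import math


def log_vote_ratio(dem_votes, rep_votes):
    """ln(D/R): positive for blue-leaning states, negative for red-leaning ones."""
    return math.log(dem_votes / rep_votes)


def simple_linear_regression(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# HYPOTHETICAL rows: (Democratic votes, Republican votes, spending-to-tax ratio).
rows = [
    (600_000, 900_000, 1.60),
    (1_200_000, 1_000_000, 0.95),
    (2_500_000, 1_800_000, 0.80),
    (400_000, 700_000, 1.75),
    (3_000_000, 2_900_000, 1.05),
]
x = [log_vote_ratio(d, r) for d, r, _ in rows]
y = [ratio for _, _, ratio in rows]
slope, intercept, r2 = simple_linear_regression(x, y)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R^2 = {r2:.3f}")
```

One reason to prefer ln(D/R) over the raw ratio D/R is symmetry: a state that votes 2:1 Democratic and a state that votes 2:1 Republican get values of equal size and opposite sign.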

My original hypothesis was that the incumbent majority in Washington would give relatively more federal money to states that were politically aligned with it (and maybe to swing states that might be "bribed" to vote for it), and relatively less federal money to states aligned with the out-of-power party. But these federal spending and tax patterns don't change much in response to changing political majorities in Washington.

Part 2: I obtained more recent data from the IRS and CFFR and calculated similar spending-to-tax ratios by state for 2006 through 2009. This spreadsheet also contains log ratios of the popular votes by state for the 2004 and 2008 presidential elections.

As an advanced variant of the simple red state/blue state exercise, use Arc's geostatistical tools to test for more recent spatial correlations between states' federal spending-to-tax ratios and political orientations. The red state/blue state political divide exhibits significant geographic clustering, which may reflect differences in regional media markets, degrees of urbanization, etc. You can also test the significance of other factors, such as differing median educational attainment, rates of church attendance, median household incomes, etc. You are encouraged to incorporate any data you like into this analysis.

  1. First, join the Excel data to your States layer, and try out some of Arc's Spatial Statistics Tools, analyzing the spatial clustering of red and blue states, and of winners and losers. Try the High/Low Clustering (Getis-Ord General G), Spatial Autocorrelation (Global Moran's I), Cluster and Outlier Analysis (Anselin Local Moran's I) and Hot Spot Analysis (Getis-Ord Gi*) utilities. (Review the documentation on each of these tools first!)

  2. What other factors might explain the regionalization of American politics, e.g., education levels, income levels, age distribution, degree of urbanization, immigration rates, racial composition, etc.? Find and join some additional explanatory variables to your States layer. Review the documentation on ArcGIS's Ordinary Least Squares (OLS) regression utility and use this procedure to model and test the relationship between political orientation and these other variables in the lower 48 states, omitting DC (not a state), Alaska and Hawaii.

  3. One of the coolest tools in ArcGIS is the Geographically-Weighted Regression utility. While conventional regression procedures yield single coefficient point estimates and significance tests, this utility yields coefficient estimates that vary across space. Estimate a geographically-weighted regression (GWR) model using the same variables that you used in the OLS model. Save and explain some of the more important coefficient maps.

  4. The standard OLS model is based on the assumption that the data are independent and identically-distributed (IID), so factors like location and local densities of data aren't supposed to matter. But "closer things are more closely related" implies spatial autocorrelation, which can be viewed as an information redundancy problem: sampling an additional datapoint in a cluster of similar datapoints inflates your nominal sample size and the reported significance levels of your regression coefficients. One solution to this problem is to use a spatially-weighted regression, where datapoints in densely-sampled regions are given lower weights than datapoints in sparsely-sampled regions. (NOTE: spatial weight matrices should always be calculated from projected, not geographic, data!)

    Review the documentation on ArcGIS's spatial weights matrix utility so that you understand how the weights are created. Then create a spatial weights matrix for the 48 states in the continental US, and re-estimate the GWR model incorporating these weights.
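To build some intuition for what a spatial weights matrix does before you run the tools in steps 1 and 4, here is a minimal sketch of the Global Moran's I statistic computed from a binary contiguity matrix. It is not ArcGIS's implementation: the function names and the six-unit "chain" geography are my own toy assumptions, and the sketch omits the z-score and p-value that the Spatial Autocorrelation tool reports.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and an n x n spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]          # deviations from the mean
    s0 = sum(sum(row) for row in weights)   # sum of all weights
    cross = sum(
        weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n)
    )
    return (n / s0) * cross / sum(zi * zi for zi in z)


def chain_contiguity(n):
    """Binary weights for n units in a row: each unit neighbors the units beside it."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


# TOY example: six units in a line, with hypothetical log vote ratios.
w = chain_contiguity(6)
clustered = [1, 1, 1, -1, -1, -1]      # similar values sit next to each other
alternating = [1, -1, 1, -1, 1, -1]    # neighbors always differ

print(morans_i(clustered, w))    # positive: spatial clustering
print(morans_i(alternating, w))  # negative: spatial dispersion
```

Under the null hypothesis of no spatial autocorrelation the expected value of Moran's I is -1/(n - 1), so values well above that point to clustering. Real state data would need a weights matrix built from actual adjacency or from distances measured in a projected coordinate system.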

Summarize and compare the results of the OLS, GWR and spatially-weighted GWR models.
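When you write up the comparison, it may help to see why OLS and GWR can disagree. The sketch below fits a one-predictor regression at each location, weighting the other observations by a Gaussian distance kernel. Everything in it is an assumption made for illustration: the function names, the fixed Gaussian kernel, the bandwidth and the eight made-up observations. The ArcGIS tool offers its own kernel and bandwidth options and handles multiple predictors.

```python
import math


def gaussian_kernel(distance, bandwidth):
    """Weight that decays smoothly with distance (fixed Gaussian kernel)."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def weighted_fit(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sw
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def gwr_one_predictor(coords, x, y, bandwidth):
    """Return one (slope, intercept) pair per location.

    coords are assumed to be in projected units, so that straight-line
    distances between them are meaningful.
    """
    results = []
    for cx, cy in coords:
        w = [gaussian_kernel(math.hypot(cx - px, cy - py), bandwidth)
             for px, py in coords]
        results.append(weighted_fit(x, y, w))
    return results


# HYPOTHETICAL data: a western cluster where y rises with x,
# and an eastern cluster where y falls with x.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0), (11, 0), (12, 0), (13, 0)]
x = [0, 1, 2, 3, 0, 1, 2, 3]
y = [0, 1, 2, 3, 3, 2, 1, 0]

global_slope, _ = weighted_fit(x, y, [1.0] * len(x))   # equal weights = plain OLS
local = gwr_one_predictor(coords, x, y, bandwidth=2.0)
print(f"global OLS slope: {global_slope:.3f}")
print("local slopes:", [round(s, 3) for s, _ in local])
```

In this toy case the single OLS slope averages two opposite regional relationships into nothing, while the local slopes recover each one. That is the kind of pattern the GWR coefficient maps are meant to reveal.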

Part 3 (optional): To do a finer-grained analysis of the red-blue divide, download the county-level 2008 vote results Excel file and join it to the US counties shapefile. Calculate the same red-blue log-ratio variable for counties that you calculated for the states. Your basic objective here is to use ArcGIS's geostatistics tools to visualize, analyze and hopefully explain the socioeconomic determinants and spatial clustering patterns behind the red-county/blue-county divide.

I crunched the CFFR data for 2009 by object code, agency and program, and at various levels of geography: state, county and Congressional District. Unfortunately, the most recent county-level data on federal tax burdens that I could find are for 2004; they are on the Tax Foundation's website.

The attribute table included with the US counties shapefile contains lots of other potential predictors of the red-county/blue-county divide. You will notice that urban counties tend to vote Democratic while rural counties tend to vote Republican. The attribute table has two Rural-Urban Continuum Code fields, one for 1993 and one for 2003. The 2003 field, RURURBCC03, may be a good predictor of political orientation.

You will find that some ArcGIS spatial statistics tools won't work on such large datasets. I am not expecting any particular results here; just test out a few hypotheses and see which ones the data support.