ArcGIS includes a large suite of geostatistical analysis tools, ranging from simple interpolation and cluster analyses to sophisticated kriging and geographically weighted regression methods. This exercise has you explore them using a fairly small dataset (US states; N = 50) and a large dataset (counties in the lower 48 states; N = 3107). ESRI's help files include a lot of useful background information on the various geostatistical tools, and I encourage you to take time to explore these resources. All of the geostatistics modules are included in your student version license.
Part 1: For many years the Tax Foundation compared total federal taxes paid by state (from the IRS's annual Data Book) with total federal spending on each state (from the Census's Consolidated Federal Funds Report, or CFFR). Their most recent data, from 2005, reveal wide discrepancies across the 50 states. At one extreme, "winners" such as New Mexico, Mississippi, West Virginia and Alaska received about $2 in federal spending for each $1 they sent to Washington in taxes. At the other extreme, "losers" such as New Jersey, Nevada, Connecticut, New Hampshire and Minnesota got less than 75 cents in federal spending for each $1 they sent to Washington. Similar discrepancies exist today, although the enormous growth in deficit spending means that every state now receives more in federal spending than it sends to Washington in federal taxes. For at least the last 20 years, states voting "red" (Republican) in presidential elections have generally had larger federal spending-to-tax ratios, and so contributed more to the federal deficit, than states voting "blue" (Democratic). But is this relationship statistically significant?
First, try a quick-and-dirty analysis using 2008 election data. Use Excel's chi-square utility to compare the 2x2 table showing the actual numbers of Winner-McCain, Winner-Obama, Loser-McCain and Loser-Obama states versus another table representing the results you would expect if Obama and McCain had won the same proportions of fiscal winners to losers.
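If you want to check your Excel result by hand, the chi-square comparison can be sketched in a few lines of Python. This is only an illustration: the counts below are made-up placeholders, not the real 2008 tabulation, and the function name `chi_square_2x2` is my own. The "expected" table is built the same way you would build it in Excel, from the row and column totals. Excel's CHISQ.TEST function (CHITEST in older versions) takes the actual and expected ranges and returns the corresponding p-value.

```python
import math

def chi_square_2x2(observed):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value, expected). No continuity correction is applied.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value, expected

# HYPOTHETICAL counts for illustration only (not the real 2008 data).
# Rows: fiscal winners, fiscal losers. Columns: McCain states, Obama states.
observed = [[15, 10],
            [7, 18]]
statistic, p_value, expected = chi_square_2x2(observed)
print(f"chi-square = {statistic:.3f}, p = {p_value:.4f}")
print("expected counts:", expected)
```

A p-value below 0.05 would lead you to reject the hypothesis that fiscal winner/loser status is independent of which candidate carried the state.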
 


Now download my compilation of Tax Foundation and election data and use Excel's linear regression utility (in the Data Analysis tools) to test this correlation using data from other years. Rather than use a binary red/blue variable, you can use the popular vote data to calculate the natural logarithm of the ratio of Democratic votes to Republican votes: ln(D/R). This variable represents the degree to which a state is red or blue. My original hypothesis was that the incumbent majority in Washington would give relatively more federal money to states that were politically aligned with them (and maybe swing states that might be "bribed" to vote for them), and relatively less federal money to states aligned with the out-of-power party. But these federal spending and tax patterns don't change much in response to changing political majorities in Washington.
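The ln(D/R) variable and the regression itself are simple enough to sketch outside Excel. The example below is a minimal illustration, assuming the spending-to-tax ratio is the dependent variable and ln(D/R) is the predictor; the six "states" and all of their numbers are invented placeholders, and `log_vote_ratio` and `simple_ols` are names I made up for this sketch.

```python
import math

def log_vote_ratio(dem_votes, rep_votes):
    """ln(D/R): positive for blue-leaning states, negative for red-leaning ones."""
    return math.log(dem_votes / rep_votes)

def simple_ols(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared, slope_std_err).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    residual_ss = syy - slope * sxy
    slope_std_err = math.sqrt(residual_ss / (n - 2) / sxx)
    return slope, intercept, r_squared, slope_std_err

# HYPOTHETICAL data: (Democratic votes, Republican votes, spending-to-tax ratio).
states = [
    (600, 400, 0.80),
    (550, 450, 0.95),
    (500, 500, 1.10),
    (450, 550, 1.30),
    (400, 600, 1.45),
    (350, 650, 1.90),
]
x = [log_vote_ratio(d, r) for d, r, _ in states]
y = [ratio for _, _, ratio in states]
slope, intercept, r_squared, slope_std_err = simple_ols(x, y)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R^2 = {r_squared:.3f}")
print(f"t statistic for slope = {slope / slope_std_err:.2f}")
```

A negative slope means that bluer states tend to have lower spending-to-tax ratios. Excel's regression output reports the same slope, R-squared and t statistic.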
Part 2: I obtained more recent data from the IRS and CFFR and calculated similar spending-to-tax ratios by state for 2010. This spreadsheet also contains log ratios of the popular votes by state for the 2004, 2008 and 2012 presidential elections. (Census has not compiled a CFFR since 2010, because Congress decided to stop wasting our tax dollars reporting on how they're, uh...wasting our tax dollars.) As an advanced variant of the simple red state/blue state exercise, use Arc's geostatistical tools to test for more recent spatial correlations between states' federal spending-to-tax ratios and political orientations. The red state/blue state political divide exhibits significant geographic clustering, which reflects differences in regional media markets, degrees of urbanization, etc. You can also test the significance of other factors, such as differing median educational attainment, rates of church attendance, median household incomes, etc. You are encouraged to incorporate any data you like into this analysis.
Part 3: Try doing a fine-grain analysis of the red-blue divide at the county level using the same tools. Download the county-level 2008 vote results Excel file and join it to the US counties shapefile. Calculate the same red-blue log-ratio variable for counties that you calculated for the states. Your basic objective here is to use ArcGIS's geostatistics tools to visualize, analyze and hopefully explain the socioeconomic determinants and spatial clustering patterns behind the red-county/blue-county divide. I crunched the CFFR data for 2009 by object code, agency and program, and at various levels of geography: state, county and Congressional District. Unfortunately, the most recent county-level data on federal tax burdens that I could find is for 2004; it's on the Tax Foundation's website. The attribute table included with the US counties shapefile contains lots of other potential predictors of red-county/blue-county voting. You will notice that urban counties tend to vote Democratic while rural counties tend to vote Republican. There are two Rural-Urban Continuum Code fields, for 1993 and 2003, in the counties shapefile's attribute table. RURURBCC03 may be a good predictor of political orientation. Here's the output map from a Hot Spot Analysis of the log-ratio of Obama to McCain votes, by county, which represents core areas of red-state and blue-state voting strengths.
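For those curious about what a hot spot analysis computes, here is a rough sketch of the Getis-Ord Gi* statistic, which is a z-score comparing the sum of values in each feature's neighborhood to what the global mean would predict. This is a simplified illustration, not ArcGIS's implementation: it assumes binary weights (1 for a neighbor or the feature itself, 0 otherwise), and the eight values on a line are invented toy data standing in for county log-ratios.

```python
import math

def getis_ord_gi_star(values, neighborhoods):
    """Gi* z-scores with binary weights.

    values: one number per feature.
    neighborhoods: for each feature i, a list of feature indices in its
        neighborhood, INCLUDING i itself (that is what the "star" means).
    """
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    z_scores = []
    for idx in neighborhoods:
        w_sum = len(idx)     # sum of binary weights
        w_sq_sum = len(idx)  # sum of squared binary weights (1 squared is 1)
        numerator = sum(values[j] for j in idx) - mean * w_sum
        denominator = std * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        z_scores.append(numerator / denominator)
    return z_scores

# TOY data: eight features in a row, "blue" on the left and "red" on the right.
values = [5, 6, 5, 0, 0, -5, -6, -5]
# Each feature's neighborhood is itself plus the features immediately beside it.
neighborhoods = [
    [j for j in (i - 1, i, i + 1) if 0 <= j < len(values)]
    for i in range(len(values))
]
for i, z in enumerate(getis_ord_gi_star(values, neighborhoods)):
    print(f"feature {i}: Gi* z = {z:+.2f}")
```

Features with z above +1.96 sit in statistically significant hot spots (here, clusters of high log-ratios), and those below -1.96 sit in cold spots.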
The following maps are output from a trial geographically weighted regression of the log odds ratio of the 2008 popular vote by county, LNVRATIO = ln[Obama/McCain], against median household income (MEDHHINC) and percentage of population that graduated from college (PCTCOLLG):
The income coefficient map is generally positive, with negative clusters along the upper Mississippi and west coast.
I normalized the income coefficient map by the corresponding coefficient standard error map (coefficient divided by standard error) to obtain a t-test map of coefficient significance with break values of -1.96 and +1.96.
I then superimposed the coefficient map with 50% transparency on top of the ttest map to identify the clusters of counties with significantly positive or negative income coefficients.
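The normalization step amounts to computing a t statistic for each county and binning it at the two break values. A minimal sketch, assuming you have exported each county's coefficient and standard error from the regression output (the function name and the sample numbers are hypothetical):

```python
def significance_class(coefficient, std_error, critical=1.96):
    """Return (t, label) for one county's local regression coefficient."""
    t = coefficient / std_error
    if t <= -critical:
        return t, "significantly negative"
    if t >= critical:
        return t, "significantly positive"
    return t, "not significant"

# HYPOTHETICAL (coefficient, standard error) pairs for three counties.
for coef, se in [(0.00004, 0.00001), (-0.00003, 0.00001), (0.00001, 0.00002)]:
    t, label = significance_class(coef, se)
    print(f"coef = {coef:+.5f}, se = {se:.5f}, t = {t:+.2f}: {label}")
```

The break values of -1.96 and +1.96 correspond to a two-sided 5% significance level under a normal approximation.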
Here are equivalent composite sign/significance maps for the higher education and intercept coefficients and the normalized residuals:
The spatial distribution of residuals appears to exhibit some clustering (positive spatial autocorrelation), which would imply that the estimated coefficient significances are overstated. I used the Spatial Autocorrelation (Moran's I) tool to obtain a calculated Moran's index of 0.187319 with a variance of only 0.000029, versus an expected index of -0.000322 (that is, -1/(N - 1) for N = 3107 counties). The null hypothesis of no spatial autocorrelation is strongly rejected. The appropriate correction for autocorrelation in the residuals would be to construct a weight matrix based on inverse distances between county centroids, so that nearby (and more highly correlated) counties carry relatively less weight in the regression procedure.
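To see why the rejection is so strong, you can reproduce the z-score from the three numbers the tool reports. This is a quick check rather than a replacement for the tool; because the variance is rounded to two significant figures, the result is only approximate.

```python
import math

n_counties = 3107
observed_index = 0.187319   # Moran's I reported by the tool
variance = 0.000029         # variance reported by the tool (rounded)

# Under the null hypothesis of no spatial autocorrelation, E[I] = -1 / (N - 1).
expected_index = -1.0 / (n_counties - 1)

# Standardize the observed index.
z_score = (observed_index - expected_index) / math.sqrt(variance)

print(f"expected index = {expected_index:.6f}")
print(f"z-score = {z_score:.1f}")
```

The z-score comes out at roughly 35, far beyond the 1.96 threshold for significance at the 5% level.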
Part 4: A cartogram is a map in which the features themselves are scaled according to some relevant variable, with necessary distortions of shape required to fit the features together. I downloaded a cartogram tool written by Tom Gross from the ESRI website, installed it and used it to create the maps below.
Here's a cartogram of US states scaled by their 2008 Electoral College votes, and thematized to show the 2008 popular votes (the logarithm of the ratio of Obama votes to McCain votes).
This map shows a truer balance of blue and red than the more conventional red-blue maps, which exaggerate the red states with large areas but small populations. Here's another cartogram of counties scaled by their 2010 populations, and thematized to show the county-level popular vote. I dissolved the county polygons by state to create the state polygons. Note the more severe distortions of these state boundaries.
Again, this map shows a lot more balance between blue and red than the more conventional county vote map shown above, which is dominated by red counties with large land areas but low population densities. Since these are basically maps with uniform densities, they may have nicer sampling properties for geostatistical analysis than a conventional map! Try installing this cartogram tool on your own computer, and create a cartogram of states sized by their 2012 electoral votes, showing the 2012 election results. Here's a cluster analysis of the cartogram created with the Getis-Ord Hot Spot Analysis geostatistical tool:
