FREC 480 -- GIS in Natural Resource Management
Census and TIGER Data


This project provides an introduction to the analysis of US Census Bureau data mapped with TIGER geodata. The US Constitution requires the Federal government to conduct a complete census of state populations every decade for the purpose of reapportioning the House of Representatives.  Nowadays the Census Bureau contacts every household by mail, with follow-up visits by enumerators.  Every household answers the questions on the "short-form" questionnaire about occupants' genders, ages, races, etc.  A sample of households (roughly one in six in 2000) receives a "long-form" questionnaire containing all the short-form questions plus additional questions regarding income, schooling, employment, marital status, etc.  Summary data compiled from the short-form ("100% count") questions are called Summary File 1, or "SF1" data. Summary data compiled from the long-form ("Sample") questions are called Summary File 3, or "SF3" data.

The Census Bureau summarizes these data at various levels of geographic detail, using a hierarchy of geographic area units: States, Counties, Census Tracts, Census Block Groups and Census Blocks.   The Bureau also publishes GIS data, known as TIGER (Topologically Integrated Geographic Encoding and Referencing) files, that are used to create polygon shapefiles of States, Counties, Tracts, Block Groups and Blocks. Each of these is identified by a unique FIPS (Federal Information Processing Standard) code:
  • Each state has a 2-digit FIPS ID; Delaware's is 10.
  • Each county within a state has a 3-digit FIPS ID, appended to the 2-digit state ID. New Castle County, Delaware, has FIPS ID 10003.
  • Each Census Tract within a county has a 6-digit ID, appended to the county code. The Tract in New Castle County DE that contains the center of the UD campus has FIPS ID 10003014502.
  • Each Block Group within a Tract has a single digit ID appended to the Tract ID. The center of campus is Block Group 100030145022.
  • Each Block within a Block Group is identified by three more digits appended to the Block Group ID. Morris Library is located in Block 100030145022003.
The Bureau releases summarized SF3 data down to the Block Group level, and summarized SF1 data down to the Block level. State-, county-, tract-, block-group- and block-level Census data tables extracted from these data files can be joined to these polygons via their matching FIPS codes.
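
The nesting of the FIPS codes described above can be illustrated with a short Python sketch. The function and class names here (split_block_fips, BlockFips) are made up for illustration and are not part of ArcGIS or any Census tool. The codes are kept as text, not numbers, so leading zeros are never lost.

```python
from typing import NamedTuple


class BlockFips(NamedTuple):
    """Cumulative FIPS IDs for each level that contains a Census Block."""
    state: str        # 2 digits
    county: str       # state + 3 digits
    tract: str        # county + 6 digits
    block_group: str  # tract + 1 digit
    block: str        # block group + 3 digits (15 digits in all)


def split_block_fips(stfid: str) -> BlockFips:
    """Split a 15-digit block ID into the IDs of its enclosing areas."""
    if len(stfid) != 15 or not stfid.isdigit():
        raise ValueError(f"expected 15 digits, got {stfid!r}")
    return BlockFips(
        state=stfid[:2],
        county=stfid[:5],
        tract=stfid[:11],
        block_group=stfid[:12],
        block=stfid,
    )


# The Morris Library block from the list above.
morris = split_block_fips("100030145022003")
print(morris.county)       # New Castle County
print(morris.block_group)  # the Block Group at the center of campus
```

Because each level's ID is a prefix of the next, a block-level table can be matched to its block group simply by taking the first 12 characters of the block ID.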

The TIGER data also include point, line and polygon features representing roads, rail lines, streams, water polygons and other physical features.

ESRI, the publisher of ArcGIS, maintains a website  http://arcdata.esri.com/data/tiger2000/tiger_download.cfm from which you can download TIGER and associated 2000 Census data.  Access this site to download the following shapefiles for New Castle County, Delaware:

  • Census Block Groups 2000: "tgr10003grp00.shp"
  • Census Blocks 2000: "tgr10003blk00.shp"
  • Census Tracts 2000: "tgr10003trt00.shp"
  • County 2000: "tgr10003cty00.shp"
  • Line Features -- Hydrography: "tgr10003lkH.shp"
  • Line Features -- Rails: "tgr10003lkB.shp"
  • Line Features -- Roads: "tgr10003lkA.shp"
  • School Districts -- Unified: "tgr10003uni.shp"
  • Water Polygons: "tgr10003wat.shp"  
  • Also include the Census Block Demographics (SF1) -- "tgr10000sf1blk.dbf" -- in your download.

Unzip all of these files into the same project folder on your data stick. (The download file from the ESRI site contains a separate zipped directory for each shapefile; but you should extract the contents of each of these to the same directory.) Then use ArcCatalog to rename these shapefiles with more meaningful names ("roads," "streams," etc.). 

Next, download the Census Block Group demographics file that I created from the SF3 data for New Castle County. One sheet of the Excel workbook contains the data; the other defines the variables.

The TIGER shapefiles are in lat-lon decimal degrees, but they don't have accompanying projection (.prj) files that specify this, so ArcGIS won't handle them correctly until you define the coordinate system for each shapefile. Use ArcToolbox's Data Management Tools--Projections and Transformations--Define Projection tool, or edit the shapefile Properties in ArcCatalog, to define each shapefile's coordinate system as "Geographic--Spheroid-Based--GRS1980."

Now load the TIGER shapefiles and SF1 Census database file into a new ArcMap project.  In the data frame's Properties, set the Coordinate System to "Projected--State Plane--NAD 1983 (HARN)--Delaware." This doesn't alter the unprojected shapefiles; it just displays them in a State Plane projection like the map on the left, not like the map on the right.

PART ONE: Exploring TIGER data

  1. Create a categorical road map with different line styles for sets of Census Feature Classification Code (CFCC) categories in the Roads shapefile: A1x's are interstate highways; A2x's are main highways; A3x's are connecting roads; A4x's and higher are neighborhood roads, except A63's which are highway ramps. Group the A10's as a category, the A20's as a category, the A30's as a category, the A40's and everything else except A63's as a category, and the A63's as a category. Include the water and rail features in your map with appropriate display styles. Once you get really nice symbology set up for the roads shapefile, you can save the shapefile symbology as a Layer file. (You can even save symbologies for a whole group of shapefiles in a group layer file.)
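
The five-way grouping of CFCC codes in step 1 can be written out as a small Python sketch. This only restates the grouping rule; in ArcMap you would build the same groups in the layer's Symbology dialog. The function name and category labels are made up for illustration.

```python
def road_category(cfcc: str) -> str:
    """Assign a CFCC road code to one of the five map categories in step 1."""
    code = cfcc.strip().upper()
    if code == "A63":            # ramps get their own category
        return "Highway ramp"
    if code.startswith("A1"):
        return "Interstate highway"
    if code.startswith("A2"):
        return "Main highway"
    if code.startswith("A3"):
        return "Connecting road"
    # A4x and everything else except A63
    return "Neighborhood and other roads"


for code in ["A15", "A21", "A31", "A41", "A63", "A74"]:
    print(code, "->", road_category(code))
```

The A63 test comes first so that ramps are never swept into the catch-all category.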

  2. Join the SF1 demographics Block-level file to the Census Blocks shapefile attribute table using the common STFID field.  As explained above, each block is identified in the STFID field by its hierarchical 15-digit FIPS code SSCCCTTTTTTBBBB where SS is the state, CCC is the county, TTTTTT is the tract and BBBB is the block ID.  (Block Groups within each Tract are identified by the first digit of the block ID.)  Likewise, join the Block-Group-level SF3 data for New Castle County to your block group shapefile using the 12-digit block group IDs (SSCCCTTTTTTB).

    Create "AREA" fields (data type should be "Double") in the block attribute table and the block group attribute table. Then right-click on the field headings and use "Calculate Geometry" to calculate the polygon areas in square meters or square kilometers (1 sq. km = 1,000,000 sq. m). Note that if you calculate areas from lat-lon units you get bogus measures based on "square degrees."

    Now create cool-to-hot thematic maps of 2000 population density by Census Block and by Block Group for the county using whatever classification scheme works best. 
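
The density calculation behind the thematic maps in step 2 is just population divided by area, with a unit conversion if your AREA field is in square meters. A minimal Python sketch follows; the function name and the sample numbers are made up for illustration. In ArcMap you would do the same arithmetic in the Field Calculator.

```python
SQ_M_PER_SQ_KM = 1_000_000.0


def population_density(population: float, area_sq_m: float) -> float:
    """People per square kilometer, given a polygon area in square meters."""
    if area_sq_m <= 0:
        raise ValueError("area must be positive")
    return population * SQ_M_PER_SQ_KM / area_sq_m


# Hypothetical block group: 1,200 residents on 400,000 square meters.
print(population_density(1200, 400_000))  # 3000.0 people per sq. km
```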

  3. Download the EPA's point shapefile of toxic waste sites. The "TYPE" field near the end of the attribute table identifies the "Superfund" toxic waste sites. I included a field of ones to use in creating density maps of these.

    Use the Spatial Analyst "Density" or "Interpolate to Raster--Kriging" tool to create separate density maps of the Superfund sites and all other EPA sites using a search radius of 5000 meters. Use the Raster Calculator to create a weighted-sum exposure risk map, adding 5 times the Superfund density plus the other EPA site density. Does there appear to be a spatial correlation between exposure risk and poverty rates?
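
The weighted-sum risk map in step 3 can be sketched in plain Python for a single raster cell. This sketch assumes a simple point density (a count of sites within the search radius divided by the area of the search circle); the Spatial Analyst tools may weight points differently, so treat it as an illustration of the arithmetic only. The function names and coordinates are made up, and each site counts as one, like the field of ones in the shapefile.

```python
import math


def point_density(cell_x, cell_y, points, radius):
    """Sites per square map unit within `radius` of a cell center."""
    count = sum(
        1 for (px, py) in points
        if math.hypot(px - cell_x, py - cell_y) <= radius
    )
    return count / (math.pi * radius ** 2)


def exposure_risk(cell_x, cell_y, superfund, other, radius=5000.0, weight=5.0):
    """Weighted sum: 5 x Superfund density plus the density of other sites."""
    return (weight * point_density(cell_x, cell_y, superfund, radius)
            + point_density(cell_x, cell_y, other, radius))


# Hypothetical State Plane coordinates in meters.
superfund_sites = [(0.0, 0.0)]
other_sites = [(1000.0, 0.0), (10000.0, 0.0)]  # the second is outside 5000 m
print(exposure_risk(0.0, 0.0, superfund_sites, other_sites))
```

The Raster Calculator applies the same weighted sum to every cell of the two density rasters at once.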

PART TWO: Spatial Statistics

  1. Open a blank Arc session and download a zipped shapefile of New Castle County block groups with a different attribute table. Add this shapefile to the dataframe. This shapefile is in DE State Plane NAD 1983 (HARN) coordinates.

    Under Tools--Extensions, activate the Geostatistical Analyst extension. Add the Geostatistical Analyst toolbar. Use the Geostatistical Analyst's "Explore Data" tools to examine the spatial clustering of poverty and racial groups in the county. (If you get an error telling you to increase the maximum number of features a geostatistics tool can analyze, increase the maximum from 300 to 400: from "My Computer" open C:--Program Files--ArcGIS--Utilities and run the AdvancedArcMapSettings utility; make the change in the "Geostatistics" tab.)

    1. Compare histograms of PCTBLACK, PCTWHITE and ln_B_W_ (the natural log of the ratio PCTBLACK/PCTWHITE). What do the skewness (asymmetry; a normal distribution has zero skewness) and kurtosis (fatness of the tails of the distribution; a normal distribution has kurtosis=3) suggest about the clustering of blacks and whites?
    2. Compare Normal QQ plots of PCTBLACK, PCTWHITE, ln_B_W_, MEDHHINC and PCTPOV. How do these accord with your analyses of the histograms?
    3. Create a semi-variogram/covariance cloud of ln_B_W_. These graphs plot the covariance of each pair of block groups against the distance between the pair. Click the Covariance tab to see the spatial covariance cloud, which should look like a Nike swoosh. What does this suggest about spatial clustering of blacks and whites in the county? If you suspected there was a particular directionality to the covariance, you could click the "Show search direction" box and examine covariance clouds in different search directions.
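
The skewness and kurtosis statistics used with the histograms in this step can be computed from the moments of the data. The Python sketch below uses the plain population-moment definitions, under which a normal distribution has skewness 0 and kurtosis 3. The Geostatistical Analyst may apply a slightly different small-sample formula, so expect small differences from the numbers it reports. The function names and sample values are made up for illustration.

```python
def central_moment(values, k):
    """k-th central moment: the mean of (x - mean)^k."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)


def skewness(values):
    """Third central moment scaled by the variance to the power 1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Fourth central moment scaled by the variance squared."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


symmetric = [1, 2, 3, 4, 5]
right_skewed = [0, 0, 0, 0, 10]  # most values low, one far above the rest
print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_skewed), kurtosis(right_skewed))
```

A percentage variable with many block groups near zero and a few near 100 behaves like the second sample: a long right tail and positive skewness.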

  2. In the Geostatistical Wizard toolset, create an Inverse Distance Weighted interpolation of ln_B_W_. Then create an ordinary Kriging of ln_B_W_. How does the IDW interpolation compare to the kriging prediction map?
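
The idea behind the Inverse Distance Weighted interpolation in step 2 can be sketched in a few lines of Python: the predicted value at a location is a weighted average of the sample values, with weights that fall off as a power of distance. This sketch assumes a power of 2 and uses every sample point; the Geostatistical Wizard lets you change the power and restrict the search neighborhood. The function name and sample data are made up.

```python
import math


def idw(x, y, samples, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    `samples` is a list of (sample_x, sample_y, value) tuples.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(sx - x, sy - y)
        if distance == 0.0:
            return value  # exactly on a sample point: return its value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical ln_B_W_ values at two block group centroids.
samples = [(0.0, 0.0, 1.0), (2.0, 0.0, 3.0)]
print(idw(1.0, 0.0, samples))  # halfway between the two samples
print(idw(0.5, 0.0, samples))  # closer to the first sample
```

IDW always reproduces the sample values exactly and never predicts outside their range, which is one thing to look for when you compare it with the kriging map.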

  3. Now execute a Cokriging of ln_B_W_ (Dataset 1) with PCTPOV (Dataset 2). How does the co-kriging prediction map for ln_B_W_ compare with the ordinary kriging prediction map? (Right-click on the co-kriging layer in the table of contents and click "Compare...")

  4. In ArcToolbox's Spatial Statistics--Analyzing Patterns tools, use the High/Low Clustering (Getis-Ord) tool to determine the probability that the spatial distributions of high and/or low values of PCTWHITE, PCTBLACK and PCTPOV are merely random.

  5. Use the Spatial Autocorrelation (Moran's I) tool to compare the spatial autocorrelation of PCTBLACK, PCTWHITE and PCTPOV.
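
The Moran's I index from step 5 can be computed by hand for a toy example. The Python sketch below assumes a simple binary neighbor matrix (1 if two areas are neighbors, 0 otherwise); the ArcGIS tool offers several other ways to define spatial weights and also reports a z-score, which is not computed here. The function name and data are made up.

```python
def morans_i(values, weights):
    """Global Moran's I for `values` with spatial weights matrix `weights`.

    weights[i][j] is the weight between areas i and j (0 on the diagonal).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weight_sum = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n) for j in range(n)
    )
    return (n / weight_sum) * cross / sum(d * d for d in dev)


# Four hypothetical areas in a row; each touches only its immediate neighbors.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i([1, 1, 5, 5], chain))  # similar values side by side: positive
print(morans_i([1, 5, 1, 5], chain))  # alternating values: negative
```

Values near zero indicate no spatial pattern; positive values indicate that similar values cluster together.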

  6. Under the Mapping Clusters tools, run the Cluster and Outlier and the Hot Spot Analyses on ln_B_W_.

  7. Under the Modeling Spatial Relationships tools, run an Ordinary Least Squares regression of ln_B_W_ (dependent variable) against PCTPOV, i.e. ln_B_W_ = C0 + C1×PCTPOV + e. This procedure models the relationship between race and poverty with intercept and slope coefficient estimates that do not vary over space. Does the map of residuals e from this regression exhibit significant clustering?
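
For a single explanatory variable, the regression in step 7 has a closed-form solution, sketched below in Python. C0 and C1 follow the notation in the equation above; the function name and the data values are made up, and the ArcGIS tool reports many more diagnostics than this.

```python
def ols_fit(x, y):
    """Fit y = C0 + C1*x by ordinary least squares; return (C0, C1, residuals)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    c1 = sxy / sxx               # slope
    c0 = mean_y - c1 * mean_x    # intercept
    residuals = [yi - (c0 + c1 * xi) for xi, yi in zip(x, y)]
    return c0, c1, residuals


# Made-up values standing in for PCTPOV and ln_B_W_ in four block groups.
pctpov = [5.0, 10.0, 20.0, 40.0]
ln_b_w = [-2.1, -1.4, -0.6, 1.5]
c0, c1, residuals = ols_fit(pctpov, ln_b_w)
print(c0, c1)
print(residuals)
```

The residuals are what you map in this step: one C0 and one C1 apply to the whole county, so any spatial pattern the model misses shows up as clusters of positive or negative residuals.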

  8. Now run a Geographically Weighted Regression of ln_B_W_ against PCTPOV. This procedure allows the regression coefficients to vary over space. The map of residuals from this regression should be much more random. Switch the symbology to display the C1 slope coefficient. Notice how the relationship between race and poverty is negative around Newark but strongly positive north of Wilmington. Explain.
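
The idea behind the Geographically Weighted Regression in step 8 can be sketched as a separate weighted least-squares fit at each location, where nearby observations count more than distant ones. The Python sketch below assumes a Gaussian distance kernel with a fixed bandwidth; the ArcGIS tool has its own kernel and bandwidth options, so this illustrates the concept and is not a reproduction of the tool. All names and data are made up.

```python
import math


def local_fit(loc_x, loc_y, coords, pred, resp, bandwidth):
    """Weighted least-squares fit of resp = C0 + C1*pred around one location.

    Observations are weighted by a Gaussian kernel of their distance
    from (loc_x, loc_y). Returns the local (C0, C1).
    """
    w = [
        math.exp(-0.5 * (math.hypot(cx - loc_x, cy - loc_y) / bandwidth) ** 2)
        for cx, cy in coords
    ]
    total = sum(w)
    mean_p = sum(wi * p for wi, p in zip(w, pred)) / total
    mean_r = sum(wi * r for wi, r in zip(w, resp)) / total
    spp = sum(wi * (p - mean_p) ** 2 for wi, p in zip(w, pred))
    spr = sum(wi * (p - mean_p) * (r - mean_r)
              for wi, p, r in zip(w, pred, resp))
    c1 = spr / spp
    return mean_r - c1 * mean_p, c1


# Two made-up clusters 100 km apart with opposite relationships.
coords = [(0, 0), (10, 0), (0, 10), (100000, 0), (100010, 0), (100000, 10)]
pred = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
resp = [0.0, 1.0, 2.0, 2.0, 1.0, 0.0]
print(local_fit(0, 0, coords, pred, resp, bandwidth=1000.0))       # slope near +1
print(local_fit(100000, 0, coords, pred, resp, bandwidth=1000.0))  # slope near -1
```

A single county-wide regression on these six points would report a slope of zero, hiding two strong but opposite local relationships.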


"Do the chickens have large talons?"
