FREC 682 -- Spatial Analysis
TIGER data import


TIGER (Topologically Integrated Geographic Encoding & Referencing) files are the most widely used public GIS data. They support creation of county-level (1:100,000-scale) base maps of road, rail and water features (mostly lines) and of Census region boundaries (e.g. county, Census tract, Census block group and Census block). This project has you extract records from TIGER data, create GRASS vector maps from them, manipulate vector map topology, and extract and map Census data.

You may want to check out the GRASS tutorial by J. Hinthorne, which explains TIGER data, details methods of importing these data to GRASS, and explains how to create choropleth maps of Census data.

Each county's TIGER dataset contains a set of files; the two most important are the .RT1 file, which contains end-nodes and identifiers for ALL the linework for the county (roads, rail lines, streams, invisible boundaries, etc.), and the .RT2 file, which contains shape coordinates for the linework in the .RT1 file. It doesn't make sense to extract all of the linework at once into an undifferentiated line file. Rather, you should select subsets of records from the .RT1 file to import as separate maps or layers.

To extract records representing a particular category of line feature, refer to the Census Feature Classification Code (CFCC), which occupies columns 56-58 of the .RT1 file. Column 56 is a letter: A=roads; B=rail features; H=water features. The digit in column 57 narrows the class: A1*=interstates; A2*=highways; A3*=connecting roads; A4*=neighborhood roads; H0*=shorelines; H1*=streams; H2*=canals, ditches; H3*=lakes, ponds; H4*=reservoirs; H5*=bay, ocean; H7*=invisible water boundaries--pretty useless.
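As a minimal sketch of the CFCC selection described above, the following builds a tiny synthetic fixed-width file and uses awk's substr() to pull out road and stream records. The records are made up (not real TIGER data); only the CFCC position in columns 56-58 follows the description above, and the file names sample.rt1, roads.rt1 and streams.rt1 are hypothetical.

```shell
#!/bin/sh
# Sketch only: synthetic records, NOT real TIGER data.
# Each fake record pads 55 filler columns so the CFCC lands in cols 56-58.
for cfcc in A11 A41 B11 H11 H31; do
    printf '%-55s%s%s\n' "1rec-$cfcc" "$cfcc" " trailing fields"
done > sample.rt1

# All road records: column 56 is "A"
awk 'substr($0, 56, 1) == "A"' sample.rt1 > roads.rt1

# Streams only: columns 56-57 are "H1"
awk 'substr($0, 56, 2) == "H1"' sample.rt1 > streams.rt1

# Report how many records each subset holds
wc -l roads.rt1 streams.rt1
```

On the real data you would point the same awk tests at your .RT1 file (e.g. the nc_tiger.1 link) and hand each subset to v.in.tig.basic.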

To extract boundary segments for polygons such as census tracts, select the records where the polygon ID on the "left" side of the segment (based on the direction in which the segment was digitized) is different from the polygon ID on the "right" side of the segment. There are pairs of fields for left- and right-side 3-digit county FIPS codes (cols 135-137, 138-140), 6-character Census Tract codes (cols 171-176, 177-182, but we'll only mess with the first four digits of these, which identify the "basic" tract numbers) and 4-character Census Block codes (cols 183-186, 187-190, but we'll only use the first digit of these, which identifies the Block Group).
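The left/right comparison can be sketched in awk as follows. The records are synthetic (not real TIGER data), the mkrec helper and file names are hypothetical, and the block group test compares the basic tract plus the block group digit on the assumption that block group numbers are only unique within a tract.

```shell
#!/bin/sh
# Sketch only: synthetic records, NOT real TIGER data.
# mkrec pads 170 filler columns so the left/right tract codes land in
# cols 171-176 / 177-182 and the left/right block codes in 183-186 / 187-190.
mkrec() { printf '%-170s%-6s%-6s%-4s%-4s\n' "$1" "$2" "$3" "$4" "$5"; }

{
    mkrec seg1 010100 010100 101A 102   # same tract, same block group
    mkrec seg2 010100 010200 101  201   # tract boundary
    mkrec seg3 010101 010102 301  305   # only the tract suffix differs
    mkrec seg4 010200 010200 101  205   # same tract, block group 1 vs 2
} > sample_bndy.rt1

# Tract boundaries: first four tract digits differ between left and right
awk 'substr($0, 171, 4) != substr($0, 177, 4)' sample_bndy.rt1 > tract_bndy.rt1

# Block group boundaries: basic tract + block group digit differ
awk '(substr($0, 171, 4) substr($0, 183, 1)) != (substr($0, 177, 4) substr($0, 187, 1))' \
    sample_bndy.rt1 > blkgrp_bndy.rt1

# Show which segments were kept in each subset
awk '{ print FILENAME ": " $1 }' tract_bndy.rt1 blkgrp_bndy.rt1
```

A county boundary subset would use the same pattern on the county FIPS fields in cols 135-137 and 138-140.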

  1. Open a GRASS session in the de_utm83 location. The 1997 updates of the New Castle County (DE) TIGER Type 1 and Type 2 data files tgr10003.rt1 and tgr10003.rt2 are located in /home/grass.data/census.data/. If you want, create links to these in your home directory, e.g.,

       ln -s /home/grass.data/census.data/tgr10003.rt1 nc_tiger.1

  2. Use AWK to extract the appropriate subsets of records from the .RT1 file, and v.in.tig.basic to create the following vector maps:

  3. Use AWK to extract the appropriate boundary records from the .rt1 file and v.in.tig.basic to create the following vector maps:


  4. Use v.to.rast to create a filled (area) raster map of the county. This step illustrates how you can manually control GRASS's topology-building. As generated by v.in.tig.basic, the county boundary is made up of lots of little arcs, each with its own ID in the map's dig_att file. If you simply v.to.rast the vector boundary line segments directly, you'll just get a raster outline--not what you want! To get a filled polygon, you can hack the boundary map's dig_att file, which contains the arc IDs for all the boundary line segments. The first couple of lines of the original dig_att file look like this:

    L 435289.8280658 4350016.983914  187249930
    L 434980.9961061 4349992.139591  187249951
    The first field indicates the feature type (L=line; A=area), the next two fields are easting and northing coordinates (yes, specified to the micrometer!), and the final field is the arc ID.  When you run v.support to build the topology, GRASS scans the arc records in the map's dig file and matches each dig_att record by position to the line arc coincident with or nearest that point, or the multiple area arc segments that circumscribe that point.  Note that GRASS can commingle lines and areas in the same map.

    You can rename the original dig_att file to something else, then extract its first line with the UNIX head -1 command to create a single-line dig_att file for the map, e.g.,

       head -1 nc.bndy.orig > nc.bndy

    Then use a text editor to edit this new dig_att file to specify that this is an area ("A") rather than a line ("L") feature, specify any interior point, and pick any ID value you like:

       A 434980.9961061 4349992.139591  1

    Then run v.support to rebuild the county boundary vector map's topology. Then when you v.to.rast this map, you should get a filled raster polygon. Now you know how to hack the topology of a vector map.
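If you'd rather script the dig_att edit than use a text editor, a minimal sketch might look like the following. It works on a stand-in file in the current directory (a real dig_att file lives inside your GRASS mapset), and the interior point coordinates are placeholder values, not a verified point inside New Castle County.

```shell
#!/bin/sh
# Sketch only: works on a stand-in dig_att file in the current directory.
printf '%s\n' \
    'L 435289.8280658 4350016.983914  187249930' \
    'L 434980.9961061 4349992.139591  187249951' > nc.bndy

# Placeholder interior point (hypothetical values; substitute coordinates
# that really fall inside your county boundary).
east=445000
north=4385000

mv nc.bndy nc.bndy.orig      # keep the original line attributes
# Take the first record, switch L to A, swap in the interior point and ID 1.
head -n 1 nc.bndy.orig |
    awk -v e="$east" -v n="$north" '{ printf "A %s %s  %d\n", e, n, 1 }' > nc.bndy
cat nc.bndy
```

You would still run v.support afterwards so GRASS rebuilds the topology from the new attribute file.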

    Use d.vect to superimpose interstates, highways and secondary roads, rail features and all water features, all in different colors. Save this display as a .GIF file for a Web page presentation of your work.
     

  6. Create copies of your vector Census tract and block group maps.

  8. I have already used the GRASS module m.in.stf1.tape and AWK to extract separate sets of tract and block group records from the STF1A 1990 Census files for all 3 counties in Delaware. These files, de.stf1a.blkgrp and de.stf1a.tract, are located in /home/grass.data/census.data.

  9. Use v.apply.census (what the Hinthorne tutorial refers to as s.in.stf1) to create area vector maps, which can then be rasterized to create thematic (choropleth) maps. v.apply.census basically rewrites the dig_att file of your vector map the same way you hacked the county dig_att file, replacing line ID records with area ID records whose IDs are values extracted or calculated from STF1A data fields for each Census reporting area. The interior X-Y coordinates in the new dig_att file are extracted from the INTPTLAT and INTPTLON fields (the lat-lon coordinates of an interior point in the tract or block group) in the STF1A data file, converted to UTM. Note that v.apply.census overwrites the map's dig_att file, which is why you should run it on copies of your vector block group and tract maps.

    Refer to the Matrix Section of the STF1A Data Dictionary to identify the appropriate fields and field lengths in the Census data file. v.apply.census even lets you do mathematical combinations of fields, e.g.:

       v.apply.census in=de.stf1a.tract out=att f='ncc.popden=(I291/J172)*1000*2.59'

    creates a map with population densities (persons per square mile) as area IDs in the dig_att file. (See Note 12 in the STF1A Data Dictionary: area measures are in thousandths of a square kilometer, and there are 2.59 square kilometers per square mile, so persons per square kilometer times 2.59 gives persons per square mile.) Note v.apply.census's simple field reference system for the Census records: for example, "J172" refers to 10 columns starting in column 172 (A=1 column, B=2, etc.).
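To see how the field reference system maps onto the fixed-width record, here is a minimal awk sketch that decodes references such as "I291" and "J172" and computes persons per square kilometer. It is not v.apply.census's own code; the field() helper, the file names and the record values are hypothetical, and it assumes the letter scheme simply continues through the alphabet.

```shell
#!/bin/sh
# Sketch only: synthetic record with land area 2590 (thousandths of a sq km)
# in cols 172-181 and population 5000 in cols 291-299. Values are made up.
printf '%-171s%10d%-109s%9d\n' "fake" 2590 "" 5000 > sample.stf1a

awk '
# field(): letter gives the width (A=1, B=2, ...), number gives the start column
function field(rec, ref,    width, start) {
    width = index("ABCDEFGHIJKLMNOPQRSTUVWXYZ", substr(ref, 1, 1))
    start = substr(ref, 2) + 0
    return substr(rec, start, width) + 0
}
{
    pop  = field($0, "I291")     # 9 columns starting at column 291
    area = field($0, "J172")     # 10 columns starting at column 172
    printf "%.1f\n", pop / area * 1000    # persons per sq km
}' sample.stf1a > popden.txt

cat popden.txt
```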

    Create the following vector area maps:

  10. Now use v.to.rast to create raster thematic maps from these vector area maps.

  12. Finally, use r.neighbors or r.mfilter to do a low-pass (neighbor averaging) filtering to smooth the edges of your population density and housing values area features. (Densities and housing values don't really change abruptly at Census tract or block group boundaries, do they?)  Display and save .GIF versions of these two maps with some vector road, rail and/or water features superimposed for visual reference.
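To get a feel for what neighbor averaging does to a sharp edge, here is a small awk sketch of a 3x3 moving-average filter on a text grid. It only illustrates the idea behind a low-pass filter; it does not call r.neighbors or r.mfilter, and its handling of edge cells (averaging only the neighbors that exist) is an assumption of the sketch rather than a statement about GRASS.

```shell
#!/bin/sh
# Sketch only: 3x3 mean filter on a tiny text grid with one abrupt edge.
cat > grid.txt <<'EOF'
0 0 0 0
0 0 90 90
0 0 90 90
0 0 90 90
EOF

awk '
{ for (c = 1; c <= NF; c++) v[NR, c] = $c; ncols = NF }
END {
    for (r = 1; r <= NR; r++) {
        line = ""
        for (c = 1; c <= ncols; c++) {
            sum = 0; n = 0
            # Sum whichever of the 9 neighborhood cells exist in the grid
            for (dr = -1; dr <= 1; dr++)
                for (dc = -1; dc <= 1; dc++)
                    if ((r + dr, c + dc) in v) { sum += v[r + dr, c + dc]; n++ }
            sep = (c > 1) ? " " : ""
            line = line sep int(sum / n + 0.5)   # rounded neighborhood mean
        }
        print line
    }
}' grid.txt > smoothed.txt

cat smoothed.txt
```

The hard 0-to-90 step in the input comes out as a ramp, which is the effect you are after at tract and block group boundaries.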



  13. Create a brief HTML presentation of your work.

  15. Optional (some final hacks you can try): Try creating a GRASS site (point) file directly from the STF1A file using a script of the form:

    #!/bin/bash

    # script to extract lat-lon ref'd census data in DECIMAL DEGREES
    # from STF1A file into GRASS site map in DD:MM:SS
    # NOTE: per the substr() offsets used below, INTPTLAT occupies cols 269-277
    # (sign char, 2 degree digits, 6 decimal digits; 9 chars long) and
    # INTPTLON occupies cols 278-287 (sign char, 3 degree digits,
    # 6 decimal digits; 10 chars long)

    awk '{ mx=100+int(substr($0,272,6)*.00006) ;
    sx=100+(substr($0,272,6)*.0036)%60 ;
    my=100+int(substr($0,282,6)*.00006) ;
    sy=100+(substr($0,282,6)*.0036)%60 }

    {print substr($0,270,2) ":" substr(mx,2,7) ":" substr(sx,2,7) "|"     \
    substr($0,280,2) ":"  substr(my,2,7) ":" substr(sy,2,7) "|"           \
    substr($0,72,3) substr($0,52,6) substr($0,51,1)}' < de.stf1a.blkgrp > \
    blkgrp.points

    You could display these in a lat-long location, or convert them to UTM with m.ll2u. Alternatively, you could hack up a new dig_att file by hand this way.