Intro to Atlas GIS 2.0
One reason GIS technologies have flourished in the U.S. is the widespread availability of public GIS data resources. In 1980, the Census Bureau began publishing digital geographic data to complement and support mapping of its population and housing data. These are now known as TIGER data. The U.S. Geological Survey also publishes digital data locating transportation and water features, hypsography (elevation contour) lines and other physical features. This chapter reviews these data resources and explains how you can use them with AtlasGIS.
The Bureau of the Census produces and distributes both geographic data (TIGER/Line files) and attribute data (tabulated Census data files at various geographic Summary Levels). Census data can be geo-referenced using the TIGER/Line files to support GIS analysis at various levels of geographic detail. The Summary Level hierarchy and SUMLEV codes are:
The Census block is the smallest unit of Census geography. A Census block typically contains about 70 people. A block group is a set of Census blocks aggregated (and identified) by the first digit of the block code.
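The relationship between block codes and block groups can be sketched in a few lines of Python. This is only an illustration: the function names and the sample block codes are made up, and the codes are assumed to belong to a single Census tract.

```python
def block_group_of(block_code):
    """Return the block group digit (the first digit) of a block code."""
    code = str(block_code).strip()
    if not code or not code[0].isdigit():
        raise ValueError(f"unexpected block code: {block_code!r}")
    return code[0]


def group_blocks(block_codes):
    """Aggregate block codes into block groups keyed by their first digit."""
    groups = {}
    for code in block_codes:
        groups.setdefault(block_group_of(code), []).append(code)
    return groups


# Hypothetical block codes within one tract.
blocks = ["101", "102", "115", "201", "202", "301"]
for group, members in sorted(group_blocks(blocks).items()):
    print("block group", group, "->", members)
```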
While zipcode areas are not official census divisions, they are referenced in the TIGER/Line files. Some census files (STF3B and Economic Census Volume 2) contain data at the zipcode level. Note that zipcodes have some drawbacks: the Postal Service frequently revises zipcode boundaries, and some zipcodes are assigned to post offices (just points in a GIS) serving undelineated areas. On the other hand, most zipcode areas are reasonably good transportation area units, since they are defined to (hopefully) minimize mail delivery costs. Also, survey respondents generally know their home zipcodes and are willing to report these to survey researchers. They are generally less willing to report actual home addresses, and generally don't know what Census tract they live in. Zipcode boundary files are available from various commercial vendors.
TIGER/Line files geo-reference physical features (streets, water features, etc.). They are extremely useful for developing base map files for use in GIS's. TIGER files do not contain conventional Census data.
TIGER/Line data cover the United States and its territories, with a set of files covering one county or its equivalent. (These supersede the older DIME files released by the Census Bureau with the 1980 Census.) The most important TIGER/Line files are Type 1 and Type 2 files:
TIGER Type 1 files contain one record for each line (or point) feature. Fields include the feature ID and name; the address range on the left and right sides (if a street); a 3-character Census Feature Classification Code (CFCC) identifying the feature as a primary road, perennial stream, etc.; and the Census Tract, Block Group and Block IDs on the left and right sides of the feature. TIGER Type 2 files contain one or more records of up to 10 "shape points" (vertices in lat-long) linked to each Type 1 file record by feature ID. Other files contain information on landmarks (points), some regions (parks, etc.), alternate feature names, etc.
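The link between Type 1 and Type 2 records can be illustrated with a short Python sketch. It assumes the fixed-width records have already been parsed into dictionaries; the key names (feature_id, seq, points), the sequence number used to order multiple Type 2 records, and the sample values are all assumptions made for illustration, not the actual TIGER/Line field layout.

```python
from collections import defaultdict


def collect_shape_points(type2_records):
    """Group shape points by feature ID, in record-sequence order."""
    by_feature = defaultdict(list)
    ordered = sorted(type2_records, key=lambda r: (r["feature_id"], r["seq"]))
    for rec in ordered:
        if len(rec["points"]) > 10:
            raise ValueError("a Type 2 record holds at most 10 shape points")
        by_feature[rec["feature_id"]].extend(rec["points"])
    return dict(by_feature)


def attach_shapes(type1_records, type2_records):
    """Return copies of the Type 1 records with their shape points attached."""
    shapes = collect_shape_points(type2_records)
    return [
        dict(rec, shape_points=shapes.get(rec["feature_id"], []))
        for rec in type1_records
    ]


# Hypothetical, already-parsed records (points are (longitude, latitude)).
type1 = [
    {"feature_id": 1001, "name": "Main St", "cfcc": "A41"},
    {"feature_id": 1002, "name": "Mill Creek", "cfcc": "H11"},
]
type2 = [
    {"feature_id": 1001, "seq": 2, "points": [(-75.55, 39.74)]},
    {"feature_id": 1001, "seq": 1, "points": [(-75.56, 39.73), (-75.555, 39.735)]},
]

for feature in attach_shapes(type1, type2):
    print(feature["name"], feature["shape_points"])
```

A feature with no Type 2 records (a straight segment) simply gets an empty list of shape points.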
The latest data release (1994) is an update of the 1992 Post-Census files. TIGER/Line files are distributed on CD's which also include text files providing good technical documentation.
The Census Bureau conducts a decennial Census of Population and Housing to comply with Constitutional requirements for Congressional redistricting. Data from the 1990 Census on Population and Housing are published in various STF (Summary Tape File) releases. The most frequently used are STF1 and STF3:
STF1 contains "100-percent" data derived from the short-form questions answered by all U.S. households. Population items include tabulations by age, race, sex, marital status, Hispanic origin, household type, household relationship, etc., as well as various cross-tabulations of these. Housing items include occupancy/vacancy status, tenure, units in structure, contract rent, meals included in rent, value, and number of rooms in housing unit, with various cross-tabulations by race, Hispanic origin of householder or tenure. STF1 data releases include:
The CD's on which the various STF1 and STF3 files are released also contain copious technical documentation. The most useful documents are the Data Dictionaries, each of which has two parts: the "Identification Section" explains codes for the initial identification fields in each record, and the "Matrix Section" lists the contents of each data field, field name in the CD-ROM file, field length (generally 9 characters) and position in the tape file.
The Census Bureau conducts an Economic Census in years ending in "2" or "7." The results of the 1992 Economic Censuses have been distributed on compact discs as Volume 1 and Volume 2.
Geographic files in AtlasGIS use Strategic Mapping, Inc.'s proprietary data structures. To use TIGER/Line data in AtlasGIS, these have to be translated into the AtlasGIS data format. Atlas Import/Export, a separate DOS-based program available from Strategic Mapping, translates geodata files in a number of formats (BNA, Arc/Info, DLG, DXF, GBF/Dime, Mapbase, MapInfo and TIGER/Line) into AtlasGIS geographic files. Import/Export is installed on several computers in the lab. With Import/Export, the transformation is done by running one-line batch programs of the format:
C:> ie input-file output-file /option1 /option2 ....
For importing TIGER/Line data, Import/Export options include:
Import/Export imports TIGER/Line files directly to create line layers such as streets, rail lines and water boundaries, and point layers for landmarks such as churches, schools, parks, etc. These files are typically quite large and detailed. The road, stream and water boundary layers are useful in their own right. They make good base maps for superimposing additional geographic layers. And they can also be used for address matching (explained in detail below).
Import/Export does not import Census geography region features from TIGER/Line data directly. These region boundaries are typically comprised of multiple line segments, and AtlasGIS lacks the automated topology-building functions necessary for assembling component boundary segments into region boundaries.
Atlas sales representatives have conceded to us that Import/Export is a somewhat buggy program. We have had difficulty using it on some machines. Its layer translation file for importing USGS Digital Line Graph (DLG) files fails to allocate many 1:24,000-scale hydrography (water) and hypsography (elevation contour) features to appropriate records unless the /all option is specified. Although Import/Export is supposed to recognize file types by their filename extension, it frequently requires options to instruct it which import and export formats to use, e.g., /ibna (import .BNA format file, explained below) and /oagf (output Atlas geographic file).
Import/Export imports line and point features from TIGER/Line files directly, but TIGER/Line files don't contain polygon records per se, and Import/Export cannot extract county, Census tract, block group or block boundary polygons directly. To form such region layers, an additional program is required to build the region topologies, i.e., string the TIGER/Line records together to define closed polyline boundaries and extract feature identifiers. A commercial program called Doctor Doolittle, distributed by BonData, Inc., was specifically written to create .BNA (Atlas ASCII format) files defining census polygons. The .BNA output file of Doctor Doolittle can then be converted to an AtlasGIS geographic file by Import/Export.
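To give a feel for what "building region topology" involves, here is a minimal Python sketch of chaining line segments into one closed boundary. It is not Doctor Doolittle's algorithm, which is not documented here. It assumes each segment has already been reduced to a start point, an end point, and the region IDs on its left and right sides (hypothetical key names), that the region has a single simple boundary ring, and it ignores intermediate shape points and islands.

```python
def build_ring(segments, region_id):
    """Chain the boundary segments of one region into a closed vertex list."""
    # A segment lies on the boundary when the region is on exactly one side.
    boundary = [
        s for s in segments
        if (s["left"] == region_id) != (s["right"] == region_id)
    ]
    if not boundary:
        raise ValueError(f"no boundary segments for region {region_id!r}")

    ring = [boundary[0]["start"], boundary[0]["end"]]
    remaining = boundary[1:]
    while ring[-1] != ring[0]:
        for i, seg in enumerate(remaining):
            if seg["start"] == ring[-1]:
                ring.append(seg["end"])
                break
            if seg["end"] == ring[-1]:
                ring.append(seg["start"])  # segment is digitized in reverse
                break
        else:
            raise ValueError("boundary does not close on itself")
        del remaining[i]
    return ring


# Hypothetical segments around a unit-square region "1".
segments = [
    {"start": (0, 0), "end": (1, 0), "left": "1", "right": "2"},
    {"start": (1, 1), "end": (1, 0), "left": "3", "right": "1"},
    {"start": (1, 1), "end": (0, 1), "left": "4", "right": "1"},
    {"start": (0, 1), "end": (0, 0), "left": "1", "right": "5"},
    {"start": (0, 0), "end": (1, 1), "left": "1", "right": "1"},  # interior street
    {"start": (1, 0), "end": (2, 0), "left": "2", "right": "3"},  # unrelated
]
print(build_ring(segments, "1"))
```

The first vertex is repeated at the end of the result, which is the explicitly closed form that a region boundary file needs.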
Doctor Doolittle (version 2) is a well-written, easy-to-use program that can be run in batch mode or interactively. The user simply specifies the input TIGER/Line files, the output filename and the types of census region features to be created. The post-Census 1992 TIGER/Line files support formation of the following feature types (with Dr. Doolittle codes):
C:> doc_do2
Once Doctor Doolittle creates the .BNA files, Import/Export imports them to create AtlasGIS geographic file regions:
C:> ie input-file.bna output-file.agf /ibna /oagf /names 2
The /names 2 option is required.
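If several .BNA files need converting, the one-line commands can be generated rather than typed. The Python sketch below only builds the command text in the form shown above; it does not run Import/Export. The function name and the rule for deriving a default output filename are assumptions made for illustration.

```python
import ntpath  # DOS/Windows path rules, usable on any platform


def ie_bna_command(bna_path, agf_name=None):
    """Build the Import/Export command line for one .BNA file."""
    if agf_name is None:
        # Assumed default: same base name with an .agf extension.
        stem = ntpath.splitext(ntpath.basename(bna_path))[0]
        agf_name = stem + ".agf"
    return f"ie {bna_path} {agf_name} /ibna /oagf /names 2"


# Lines that could be saved to a DOS batch file.
for path in [r"c:\cg910003.bna"]:
    print(ie_bna_command(path))
```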
.BNA files are comma- or (optionally) tab-delimited ASCII files containing feature identifier lines followed by vertices (one coordinate pair per line). The identifier lines contain one, two or three feature names in double quotes, and end with an integer type/length code n indicating the type of feature and number of vertices on following lines. For example:
"_Name", "_Name2", "_ID", n
X1, Y1
X2, Y2
...
Xj, Yj
For a point feature, n = 1: a single line with its vertex follows.
For a line, n = -2 or less: the specified number of lines with vertices follows.
For a region, n = 3 or greater: the specified number of lines follows.
For a circle, n = 2: two lines specifying the vertices of the center and one point on the perimeter follow.
Regions must be explicitly closed: i.e., the first vertex is repeated on the last line. In cases where region features include islands or exclude interior lakes, the principal perimeter must close on itself and the vertices of the exterior island or interior lake follow immediately and close on themselves. AtlasGIS recognizes the implicit discontinuity.
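The format is simple enough to read with a short script, which can be handy when a .BNA file has to be checked or edited by hand. The following Python sketch handles only the comma-delimited variant and assumes that, for a line feature, the number of vertex lines is the absolute value of n. The function name and sample features are made up.

```python
import csv


def parse_bna(text):
    """Parse comma-delimited .BNA text into (names, kind, vertices) tuples."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    features = []
    i = 0
    while i < len(lines):
        # Identifier line: quoted names followed by the type/length code n.
        header = next(csv.reader([lines[i]], skipinitialspace=True))
        *names, code = [field.strip() for field in header]
        n = int(code)
        count = abs(n)  # assumption: lines carry a negative vertex count

        vertices = []
        for ln in lines[i + 1 : i + 1 + count]:
            x, y = ln.split(",")
            vertices.append((float(x), float(y)))
        if len(vertices) != count:
            raise ValueError(f"feature {names} is missing vertex lines")

        if n == 1:
            kind = "point"
        elif n == 2:
            kind = "circle"
        elif n >= 3:
            kind = "region"
        elif n <= -2:
            kind = "line"
        else:
            raise ValueError(f"unexpected type/length code {n}")

        features.append((names, kind, vertices))
        i += 1 + count
    return features


sample = '''"Tract 1", "0001", 5
0.0, 0.0
1.0, 0.0
1.0, 1.0
0.0, 1.0
0.0, 0.0
"School", "S1", 1
0.5, 0.5
"Main St", "L1", -2
0.0, 0.0
1.0, 1.0
'''

for names, kind, vertices in parse_bna(sample):
    print(names, kind, len(vertices))
```

The sketch does not enforce closure, because a region with islands or lakes contains several rings that each close on themselves.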
The .BNA format is more fully documented in the Atlas Import/Export User Guide. You can always import geographic features to AtlasGIS with this format if no other format works. On occasion, to convert point features in an AtlasGIS geographic file to AtlasGIS datapoints, we have had to export the geographic point features to .BNA format, edit the ASCII file, and then re-import them with File-Datapoint-Tools-Import.
STF1 and STF3 Census data files from CD-ROM are dBase-compatible and can generally be used as AtlasGIS attribute files once a location identifier field (based on County FIPS code, Tract Number, etc.) is constructed which can be matched to a field in the geographic file. For example, an STF3B file containing Census data by zipcode can be matched to a geographic file containing a zipcode region layer or a zipcode centroid point layer, since both files have matchable fields containing zipcodes.
Attribute data files are imported into AtlasGIS using File-Attribute-Tools-Import. Importable formats are dBase 3 and 4 (.DBF), 123/Symphony Worksheets (.WK1, .WKS, etc.), Excel Worksheets, and ASCII tab- or comma-delimited. Imports of dBase files are quite reliable; when importing from other formats, check the structure of the new attribute file.
When the attribute file structure is correctly defined in the file structure spreadsheet, a pop-up form follows in which the import matching options are set. This is where you specify the identifier fields that match the active geographic file's records to records in the attribute file being imported. The fields being matched must be of the same type (numeric, character, etc.).
If you are importing raw dBase-type Census data files, at some point you will need to create a new character identifier field of at least 15 columns in which you concatenate the state FIPS code (first 2 characters), county FIPS code (next 3 characters), tract ID (next 6 characters), block group ID (next 1 character) and block ID (next 3 characters). This identifier field is needed to match Census data attribute records to geographic features. You can create this field in the source file with a database manager program, or in the new attribute file you are importing to with File-Attribute-Tools-Import. To create a new field, enter its name and other characteristics in the bottom row of the Database structure spreadsheet. Then you can fill the field with the appropriate concatenation expression using Edit-Attribute-Replace-Expression.
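The concatenation itself is easy to prototype outside AtlasGIS, for example to check a few identifiers before building the field. The Python sketch below follows the 2+3+6+1+3 layout described above. Left-padding short values with zeros is an assumption (useful when codes were stored as numbers and lost leading zeros), and the tract and block values in the example are made up.

```python
# (component name, width in characters), in concatenation order
LAYOUT = [("state", 2), ("county", 3), ("tract", 6), ("block_group", 1), ("block", 3)]


def census_id(state, county, tract, block_group, block):
    """Concatenate the component codes into a 15-character identifier."""
    values = {
        "state": state,
        "county": county,
        "tract": tract,
        "block_group": block_group,
        "block": block,
    }
    pieces = []
    for name, width in LAYOUT:
        text = str(values[name]).strip()
        if len(text) > width:
            raise ValueError(f"{name} code {text!r} is longer than {width} characters")
        pieces.append(text.zfill(width))  # assumption: pad with leading zeros
    return "".join(pieces)


# State 10, county 003 (New Castle); tract and block codes are hypothetical.
print(census_id("10", 3, "010100", 1, "101"))
```

Whatever tool builds the field, the result must be a character field so that leading zeros survive the match against the geographic file.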
File-Attribute-Tools-Import offers various options for handling duplicate attribute matches (you might have redundant records, or might be importing multiple records--for customers, say--to get cumulative data on the customer base in each sales region): Reject (imports the first occurrence of a data record and discards the rest); Replace (imports the last occurrence) or Sum (adds duplicate numeric records--useful for aggregating multiple attribute records).
Options for handling non-matches are: Reject (ignores all records that do not match to a geographic feature) or Import (imports records into the attribute file even though they have no match in the geographic file). The new attribute file has a .dbf extension and can now be referenced with the correct geographic file in place.
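The effect of the duplicate and non-match options can be made concrete with a small Python model. This only mimics the behavior described above on in-memory records. It is not how AtlasGIS implements the import, and the function name, parameter names and sample data are invented for illustration.

```python
def import_attributes(records, key, feature_ids,
                      duplicates="reject", non_matches="reject"):
    """Model the import options on a list of record dictionaries."""
    if duplicates not in ("reject", "replace", "sum"):
        raise ValueError(f"unknown duplicates option: {duplicates!r}")
    if non_matches not in ("reject", "import"):
        raise ValueError(f"unknown non_matches option: {non_matches!r}")

    table = {}
    for rec in records:
        k = rec[key]
        if k not in feature_ids and non_matches == "reject":
            continue  # no matching geographic feature: ignore the record
        if k not in table:
            table[k] = dict(rec)  # first occurrence is always kept
        elif duplicates == "replace":
            table[k] = dict(rec)  # last occurrence wins
        elif duplicates == "sum":
            for field, value in rec.items():
                if field != key and isinstance(value, (int, float)):
                    table[k][field] = table[k].get(field, 0) + value
        # duplicates == "reject": later occurrences are discarded
    return table


# Hypothetical customer records keyed by sales region.
features = {"N", "S"}
customers = [
    {"REGION": "N", "SALES": 100},
    {"REGION": "N", "SALES": 50},
    {"REGION": "S", "SALES": 70},
    {"REGION": "X", "SALES": 5},
]
print(import_attributes(customers, "REGION", features, duplicates="sum"))
```

With Sum, the two "N" records collapse into one record holding their combined sales, which is the cumulative-data case mentioned above.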
After the attribute file is imported, File-Attribute-Tools-Structure lets you make further changes to the attribute file structure as necessary. (Note: if you are creating new fields, keep in mind that dBase3 files are limited to a maximum of 128 fields; dBase4 files can have up to 255.)
Besides the standard geographic, attribute and datapoint files which come with AtlasGIS, the following files are available from the Spatial Analysis Lab for direct use in, or import to, AtlasGIS.
This exercise covers (1) the direct import of TIGER line features to AtlasGIS, (2) the creation and import of Census region features, and (3) the import of Census data to AtlasGIS attribute files so that data records are matched to corresponding Census region features in an AtlasGIS geographic file.
1. Direct Import of Tiger/Line files:
From the 1992 TIGER/Line files, create an Atlas/GIS line map of New Castle county using the default layer translation file called TIGERIN.TRN. Copy the New Castle TIGER/Line .f51 file from compact disc to the C drive. (Import/Export works faster reading from the hard disk than from CD.) The compact disc path is D:\10\003. The file to be copied is tgr10003.f51.
Switch to the directory where Import/Export is located. At the DOS prompt type
ie c:\tgr10003.f51 nc_tiger.agf /layer tigerin.trn
The import process will take about 15 minutes. The output includes the four files associated with an Atlas/GIS geographic file (NC_TIGER.AGF, .AGX, .AIF and .ANX). Restart AtlasGIS, activate "nc_tiger" and set appropriate line colors with Display-Layer-Settings.
2. Creating Census Polygons:
From the TIGER/Line files, create an Atlas/GIS map of New Castle county at the block group level. The TIGER/Line files required to create the block groups are tgr10003.f51 and tgr10003.f52. Since we already have a copy of tgr10003.f51, we only need to copy tgr10003.f52.
Switch to the directory where Doctor Doolittle is located. Try one of the following:
For interactive mode, type doc_do2 at the DOS prompt, and go through the bar menus. Extract just the block group ("bGroup") polygons. After the last bar menu, the polygon building process starts. The output is a .bna file.
--or--
For batch mode, type doc_do2 c:\tgr10003.f51 /c:\cg9 at the DOS prompt.
The polygon formation process will take about 15 minutes. When it is finished, you will have an Atlas ASCII format file of polygon boundaries ready for import into AtlasGIS.
Now import the .BNA format output from Dr. Doolittle to Atlas GIS: from the directory where Import/Export is located, type
ie c:\cg910003.bna nc_bgrps.agf /ibna /oagf /names 2
at the DOS prompt.
The import process yields the four files associated with an Atlas/GIS geographic file (NC_BGRPS.AGF, .AGX, .AIF and .ANX).
3. Importing Census Data into an Atlas/GIS attribute file
Import the Census file containing New Castle population counts at the block group level into Atlas/GIS. Activate the geographic file containing New Castle county block groups.
Use the attribute import utility File-Attribute-Tools-Import, select "dBaseIII/IV" as data file type to be imported, and specify the file, including its path, to be imported:
c:\exercise\stf1a001.dbf
The file structure worksheet appears. A CENSUSID field has already been created in this file: you do not need to make any changes to it. Press F10. Specify the matching options:
Create a thematic map of population density (POP100/LANDAREA) in New Castle County by Census block group.
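The density variable is just the ratio of the two fields. As a quick check of the arithmetic outside AtlasGIS, here is a Python sketch with made-up values. The result is in persons per whatever unit LANDAREA is recorded in, and records with no land area are returned as None rather than divided by zero.

```python
def population_density(pop100, landarea):
    """Return POP100 / LANDAREA, or None when land area is missing or zero."""
    if landarea is None or landarea <= 0:
        return None
    return pop100 / landarea


# Hypothetical block group records.
block_groups = [
    {"CENSUSID": "A", "POP100": 1500, "LANDAREA": 2.0},
    {"CENSUSID": "B", "POP100": 900, "LANDAREA": 0.5},
    {"CENSUSID": "C", "POP100": 0, "LANDAREA": 0.0},  # e.g. all water
]
for rec in block_groups:
    print(rec["CENSUSID"], population_density(rec["POP100"], rec["LANDAREA"]))
```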