Intro to Atlas GIS 2.0
Mackenzie, Tanjuakio and Sparco



Chapter 8: Address-Matching

Introduction

One of AtlasGIS's most powerful capabilities is address-matching, i.e., geo-referencing addresses and/or converting database files with street address fields into geo-referenced AtlasGIS datapoint files. There are several ways to locate datapoints via street addresses:

View-Map-Address centers the map view on a particular street address and temporarily marks the location. (It does not actually create new datapoints.) The user is prompted to enter a target address and viewing radius in a query window.

The AtlasGIS script ADDR2PNT, accessed through File-Run, creates datapoints one at a time from addresses entered by the user. Its operation is self-explanatory.

Edit-Datapoint-Admatch performs wholesale address-matching of datapoint files against geographic street files, in either batch or interactive mode. These utilities are discussed in detail below.

Edit-Datapoint-assign_Centroids is a quick-and-dirty way of locating ungeoreferenced datapoints at the centroids of the regions in which they fall. It can be used for datapoints that Admatch has failed to locate. The user specifies the region layer from which the region centroids are assigned.
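
This chapter does not say how AtlasGIS defines a region's centroid. One common definition is the area-weighted centroid of the region's boundary polygon, sketched below in Python with the shoelace formula as an illustration, not as AtlasGIS's own method. Treating longitude and latitude as flat x/y coordinates is an assumption that is reasonable only for small regions, and the centroid of a strongly concave region can fall outside the region itself.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def polygon_centroid(ring: Sequence[Point]) -> Point:
    """Area-weighted centroid of a simple polygon given as (x, y) vertices.

    The ring may be open or closed (first vertex repeated at the end).
    Coordinates are treated as planar, which is an assumption.
    """
    pts = list(ring)
    if len(pts) > 1 and pts[0] == pts[-1]:
        pts = pts[:-1]  # drop the closing vertex
    n = len(pts)
    twice_area = cx = cy = 0.0
    for i in range(n):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % n]
        cross = x0 * y1 - x1 * y0  # shoelace term for edge i
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * area) = sum / (3 * twice_area)
    return cx / (3 * twice_area), cy / (3 * twice_area)
```

For a 2-by-2 square with one corner at the origin, the function returns (1.0, 1.0).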

Basics of Address-Matching

The process of address-matching follows four basic steps:

  1. Address records are parsed into street number, street prefix (North, SW, etc.), street name, street type (Road, Ave., etc.), and suffix (NW, Apt. 4B, etc.);
  2. Common variants (abbreviations et al.) are interpreted via a translation file;
  3. The street prefix, name and type are matched to records in a geographic street file derived from TIGER/Line Type 1 files; the side of the street (odd- or even-numbered) is determined, and a position along it is interpolated by matching the street number to the geographic file's address-range fields derived from TIGER; and
  4. Locational coordinates are calculated for each matched record in the database so that it can be used as an AtlasGIS datapoint.
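
Steps 1 and 2 can be sketched in Python. This is not AtlasGIS's parser: the translation table, the recognized direction and street-type words, and the ParsedAddress structure are hypothetical stand-ins, and a production parser must handle many more cases (for example, "West St." or "St. Georges Rd.").

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for a translation file: common variants -> standard form.
TRANSLATIONS = {
    "NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W",
    "ROAD": "RD", "AVENUE": "AVE", "AV": "AVE", "STREET": "ST", "LANE": "LN",
}
DIRECTIONS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}
STREET_TYPES = {"RD", "AVE", "ST", "LN"}


@dataclass
class ParsedAddress:
    number: Optional[int]
    prefix: str
    name: str
    street_type: str
    suffix: str


def normalize(token: str) -> str:
    """Upper-case a token, drop surrounding periods/commas, apply the translation table."""
    t = token.upper().strip(".,")
    return TRANSLATIONS.get(t, t)


def parse_address(text: str) -> ParsedAddress:
    tokens = [normalize(t) for t in text.split()]
    number = None
    if tokens and tokens[0].isdigit():
        number = int(tokens.pop(0))
    prefix = ""
    # A leading direction counts as a prefix only if a street name still follows it.
    if len(tokens) > 1 and tokens[0] in DIRECTIONS:
        prefix = tokens.pop(0)
    # The street type is the last recognized type word after the first token;
    # anything after it (NW, Apt. 4B, ...) is treated as the suffix.
    for i in range(len(tokens) - 1, 0, -1):
        if tokens[i] in STREET_TYPES:
            return ParsedAddress(number, prefix, " ".join(tokens[:i]),
                                 tokens[i], " ".join(tokens[i + 1:]))
    return ParsedAddress(number, prefix, " ".join(tokens), "", "")
```

For instance, "100 North College Avenue Apt. 4B" parses to number 100, prefix N, name COLLEGE, type AVE and suffix APT 4B.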

The effectiveness of the address-matching utilities is constrained by the quality of the TIGER/Line data on which the geographic street file is based. TIGER/Line Type 1 files include the address-range, prefix, name, type and suffix fields needed for matching, but these data exist only for urbanized areas such as upper New Castle County. These fields are mostly empty for street features in Kent and Sussex Counties, where few areas have city-style addresses and most rural households have rural delivery addresses (e.g., RD2, Box 824), which are proprietary information of the Postal Service.
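
Because rural delivery addresses cannot be matched against TIGER street names at all, it can help to flag them before running Admatch and send them straight to a fallback such as Edit-Datapoint-assign_Centroids. A minimal Python sketch; the pattern is an assumption about how such addresses are usually written in a source file, not something AtlasGIS provides.

```python
import re

# Assumed spellings of rural delivery addresses, e.g. "RD2, Box 824",
# "R.D. 3 Box 12" or "RR 1 Box 5"; extend the pattern for local variants.
RURAL_DELIVERY = re.compile(r"^\s*(?:R\.?\s?D\.?|RR)\s*\d+\b", re.IGNORECASE)


def is_rural_delivery(address: str) -> bool:
    """True if the address looks like a rural delivery route with no street to match."""
    return RURAL_DELIVERY.match(address) is not None
```

Street addresses that merely begin with R, such as "Red Clay Rd", are not flagged, because the pattern requires RD or RR followed by a route number.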

Other resources for address-matching in Delaware are limited. The Reference Room of the Morris Library at the University of Delaware has copies of Cross-Reference Directories for Delaware, which list telephone numbers in geographic order, although these cannot easily be related to rural delivery addresses. Kent and Sussex Counties are currently developing a 911 Emergency Response system which will include a database of georeferenced addresses; this database will likely be made available to GIS researchers in the future.

Performing Address-Matches

Before using any address-matching utility, you must have an active geographic file containing address-range, street prefix, street name and street type fields. Such files are easily created from TIGER/Line Type 1 files using Strategic Mapping's Import/Export program (see Part 5).

You may want to experiment with the simpler utilities (View-Map-Address and the ADDR2PNT script) on your own. These are not discussed in detail here.

The Edit-Datapoint-Admatch utility is the most powerful method of address-matching. To use it, activate an (un-georeferenced) datapoint file containing the address field to be parsed, along with a geographic street file. The datapoint file must include empty numeric LAT and LON fields; these are created automatically when you import data using File-Datapoint-Tools-Import. You may also add an optional MATCHCODE field with File-Datapoint-Tools-Structure: AtlasGIS will enter a code in this field indicating whether each record was matched and, if not, why it failed to match.

You may also have an attribute file active, and you may create fields in the datapoint file so that matched datapoints are assigned field values from the corresponding geographic features. (For example, the street geographic file might also contain Census Tract features, so you could address-match customers and also identify the Census Tract each one lives in: each customer's TRACT ID would be taken from the tract in which he or she is located.)

Address-matching is usually an iterative process: first you match records according to strict match criteria in batch (non-interactive) mode; then you try successively more relaxed match criteria interactively.

Start with Edit-Datapoint-Admatch-Batch and fill in the settings form. You will process "Unmatched" records, using all line layers whose street features have the necessary address fields and, optionally, any point features with the necessary address fields as well. The Display ID simply helps you identify which records are still unmatched when you switch to interactive mode. The Address field is the single field in the datapoint file that AtlasGIS will parse.

Note: Be aware of situations that can yield erroneous matches; for example, a datapoint at 45 W. Main St., Middletown (an address not present in the street file) could be mis-located at 45 W. Main St., Newark.

Start Admatching with all Relax settings at "No." You can relax one or more criteria later in interactive mode. If both the datapoint file and the geographic file have town or zip code fields, you can specify these as additional match fields. (Unfortunately, files derived from TIGER/Line Type 1 records generally don't have these.)
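
The Middletown/Newark note above shows why extra match fields help. In this minimal Python sketch the field names (PREFIX, NAME, TYPE, TOWN) are hypothetical and the address-range check is omitted: a candidate street must agree with the record on every listed field, so adding TOWN turns a wrong match into an honest non-match.

```python
from typing import Dict, List, Sequence

Feature = Dict[str, str]


def find_candidates(record: Feature, streets: Sequence[Feature],
                    match_fields: Sequence[str]) -> List[Feature]:
    """Street features whose value equals the record's value on every match field."""
    return [st for st in streets
            if all(st.get(f) == record.get(f) for f in match_fields)]


# Hypothetical data: the street file contains only the Newark W. Main St.
streets = [{"PREFIX": "W", "NAME": "MAIN", "TYPE": "ST", "TOWN": "NEWARK"}]
middletown = {"PREFIX": "W", "NAME": "MAIN", "TYPE": "ST", "TOWN": "MIDDLETOWN"}

wrong = find_candidates(middletown, streets, ["PREFIX", "NAME", "TYPE"])
safe = find_candidates(middletown, streets, ["PREFIX", "NAME", "TYPE", "TOWN"])
```

Here `wrong` contains the Newark street, while `safe` is empty.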

You can fill the datapoint TRACT field with the Census Tract ID for the side of the street to which AtlasGIS matches each record: enter TRACTL and TRACTR in the left and right columns of the Fill menu. The Offset Distance specifies how far from the street (in feet) each matched datapoint is placed.
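
Step 3 of the matching process (choosing the side of the street by odd or even number and interpolating along the address range) can be sketched together with the Offset Distance and the TRACTL/TRACTR fill. This Python sketch assumes straight segments in planar coordinates measured in feet, and field names modeled on the TIGER/Line Type 1 address-range fields (FRADDL, TOADDL, FRADDR, TOADDR); your AtlasGIS street file may name them differently, and real segments have shape points and longitude/latitude coordinates. Left and right are taken relative to travel from the segment's FROM end to its TO end.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, float]


@dataclass
class StreetSegment:
    # Address ranges named after TIGER/Line Type 1 fields (assumed names).
    fraddl: int
    toaddl: int
    fraddr: int
    toaddr: int
    tractl: str
    tractr: str
    start: Point  # FROM end, planar coordinates in feet (assumption)
    end: Point    # TO end


def _fraction(number: int, frm: int, to: int) -> Optional[float]:
    """Relative position (0..1) of number in the range frm..to, or None if it is
    outside the range or has the wrong parity for that side of the street."""
    if not min(frm, to) <= number <= max(frm, to) or number % 2 != frm % 2:
        return None
    return 0.5 if frm == to else (number - frm) / (to - frm)


def locate(number: int, seg: StreetSegment,
           offset_ft: float) -> Optional[Tuple[float, float, str]]:
    """Interpolate number along seg, push it offset_ft off the line toward the
    matching side, and return (x, y, tract for that side); None if no match."""
    (x0, y0), (x1, y1) = seg.start, seg.end
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    # Unit normal pointing to the left of the FROM -> TO direction.
    left_nx, left_ny = -dy / length, dx / length
    for frm, to, tract, sign in ((seg.fraddl, seg.toaddl, seg.tractl, 1.0),
                                 (seg.fraddr, seg.toaddr, seg.tractr, -1.0)):
        t = _fraction(number, frm, to)
        if t is not None:
            return (x0 + t * dx + sign * offset_ft * left_nx,
                    y0 + t * dy + sign * offset_ft * left_ny,
                    tract)
    return None
```

For a segment running 1,000 ft due east with odd numbers 101 to 201 on the left and even numbers 100 to 200 on the right, number 150 lands halfway along the segment, 50 ft to the south (right) side, and picks up the right-hand tract.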

Your batch Admatch session almost certainly won't match all your datapoint records. You can improve your success rate in Edit-Datapoint-Admatch-Interactive by relaxing one or more match criteria (address number, prefix, name, type or suffix) so as to match records with minor typos, unrecognized abbreviations, etc. (Suggestion: relax street prefixes, types and/or suffixes first, then street names. AtlasGIS won't let you relax all the criteria.) In interactive mode, you are prompted to choose between multiple possible matches under relaxed criteria, and you can cycle through still-unmatched records, edit erroneous address records in the datapoint file, correct erroneous address field parsing, etc.
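
The effect of relaxing criteria can be sketched in Python. This is an illustration, not AtlasGIS's algorithm: a strict criterion requires an exact match, a relaxed prefix or type is ignored, and a relaxed street name only has to be similar according to Python's `difflib.SequenceMatcher` ratio, a swapped-in stand-in for whatever tolerance AtlasGIS applies (the 0.8 threshold is arbitrary). Relaxing every criterion is refused, echoing AtlasGIS's own restriction; address-number checks are left out for brevity.

```python
from difflib import SequenceMatcher
from typing import Dict, Iterable, List, Sequence

Street = Dict[str, str]
CRITERIA = ("prefix", "name", "type")


def relaxed_candidates(record: Street, streets: Sequence[Street],
                       relax: Iterable[str] = (), min_ratio: float = 0.8) -> List[Street]:
    """Streets that match record on every strict criterion.

    A relaxed prefix or type is ignored; a relaxed name only has to be
    similar (SequenceMatcher ratio >= min_ratio), which tolerates small typos.
    """
    relaxed = set(relax)
    if set(CRITERIA) <= relaxed:
        raise ValueError("at least one criterion must stay strict")
    result = []
    for st in streets:
        ok = True
        for f in CRITERIA:
            if f not in relaxed:
                ok = st[f] == record[f]
            elif f == "name":
                ok = SequenceMatcher(None, st[f], record[f]).ratio() >= min_ratio
            # A relaxed prefix or type leaves ok unchanged, i.e. it always passes.
            if not ok:
                break
        if ok:
            result.append(st)
    return result


streets = [
    {"prefix": "N", "name": "COLLEGE", "type": "AVE"},
    {"prefix": "S", "name": "COLLEGE", "type": "AVE"},
    {"prefix": "N", "name": "CHAPEL", "type": "ST"},
]
record = {"prefix": "N", "name": "COLEGE", "type": "AVE"}  # misspelled name
```

With this record, strict matching finds nothing; relaxing the name finds N College Ave; relaxing the prefix as well also returns S College Ave, the kind of multiple choice that interactive mode asks you to resolve.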


EXERCISE 8.1: Mapping the Incidence of Giardia lamblia

(Contributed by Maria C. Centenera, M.S. Candidate, Dept. of Food & Resource Economics, University of Delaware)

Giardiasis is a particularly nasty intestinal disease caused by Giardia lamblia, a water-borne protozoan parasite. By law, all diagnosed cases are reported to the State Epidemiologist at the Delaware Department of Health and Social Services. The objective of this exercise is to map these cases in New Castle County, Delaware, and show their distribution by patient age and over time.

  1. Setup: To begin, activate the geographic file of New Castle County roads. Import the dBase file of reported giardiasis cases from 1992, 1993 and part of 1994 ("GIARDIA"), creating an AtlasGIS datapoint file with File-Datapoint-Tools-Import. AtlasGIS will prompt you to add LAT and LON fields, which the Admatch utility will fill in later. Also add a character field called MATCHCODE to the datapoint file, using File-Datapoint-Tools-Structure-/Tools-Insert.

    The GIARDIA file contains records for all cases in Delaware. Since you are address-matching only the New Castle County cases (the other counties have a much lower proportion of city-style addresses to match against), select just the New Castle County records in the datapoint file using Edit-Datapoint-Browse-/Tools-Select-Condition, with COUNTYCODE=3 as the selection criterion. Write the selected records to a new file (say, "NC_GIARD") with File-Datapoint-Tools-Write, then activate this new datapoint file (File-Datapoint-Use).

  2. Batch Admatch: Begin the actual address-matching in batch mode with Edit-Datapoint-Admatch-Batch. Specify unmatched datapoint records; <<All>> line layers and <<All>> point layers; the display and matching fields (Address); and the Fields to Fill (fill the MATCHCODE field with the match code). After you execute the match, you can use Edit-Datapoint-Browse on the datapoint file's MATCHCODE field to see which records were successfully matched. (See the following table for definitions of AtlasGIS's match codes.)

  3. Interactive Admatch: Increase the number of matches by relaxing one or more search conditions. Re-execute the admatch on still-unmatched datapoint records with Edit-Datapoint-Admatch-Interactive. As you review the still-unmatched records one at a time, use the Edit utility to correct misspellings, etc. in the datapoint records. If you know where an unmatched datapoint should be located on the map, use the Map utility to place it manually. You can Unmatch records you know are erroneously located, which resets their lat-long coordinates to (0,0). Note: the coordinates of untried datapoints are (0,0), and AtlasGIS attempts to match these immediately; the coordinates of tried-but-unmatched datapoints are (-1,-1).

  4. You will likely have some unmatched records despite your best efforts. Since you can't locate these points exactly, it may be sufficient to place them at the centroids of their zip code areas. You will need to activate a separate zip code geographic file (the U.S. Zip Centroid file that comes with AtlasGIS will do) and use the Edit-Datapoint-assign_Centroids utility to replace the 0 and -1 values in the LAT and LON fields of unmatched records with the centroid coordinates.

    Before doing this, you should probably split the matched and unmatched records into separate datapoint files, run Edit-Datapoint-assign_Centroids on the unmatched records only, and then re-merge the datapoint files. Use Select-Condition (matched records will have LAT>1; unmatched records will have LAT<1), then File-Datapoint-Tools-Write.

  5. Now create a one-variable thematic map using the datapoints you just matched. The resulting map will show the incidence of giardiasis in roughly four-month intervals. In the Thematic-Variable setup, select the Datapoint layer and enter the data expression YR*100+MMWRWK. This displays cases by time of incidence (year, then week). Use a Ranged Symbol thematic type with the Discontinuous ranging method and 9 ranges. Edit the range values as follows to group the datapoints by quarter-year:

    Now try mapping by age of patient, using 4 or 5 ranges. You will see that giardiasis mainly afflicts children and elderly people. (Most cases apparently stem from poor hygiene at day care centers and nursing homes.)
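
Steps 3 to 5 rely on some record-keeping conventions: the (0,0) and (-1,-1) coordinate flags, the LAT>1 / LAT<1 split, and the YR*100+MMWRWK time code. The Python sketch below mirrors that bookkeeping on plain dictionaries so the logic is easy to check; it is not an AtlasGIS API. The ZIP field name, the zip-centroid lookup table and its coordinates are hypothetical (AtlasGIS itself assigns centroids from a region layer), the 13-week quarter boundaries are an assumed grouping of weeks rather than the exercise's own range values, and YR is assumed to hold a two-digit year.

```python
from bisect import bisect_left
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]

UNTRIED = (0.0, 0.0)            # LAT/LON of records Admatch has not tried yet
TRIED_UNMATCHED = (-1.0, -1.0)  # LAT/LON of records Admatch tried but could not match


def match_status(rec: Record) -> str:
    """Classify a record by the coordinate flags described in step 3."""
    coords = (rec["LAT"], rec["LON"])
    if coords == UNTRIED:
        return "untried"
    if coords == TRIED_UNMATCHED:
        return "unmatched"
    return "matched"


def split_by_lat(records: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Step 4 selection: matched records have LAT > 1, unmatched ones (0 or -1) LAT < 1."""
    records = list(records)
    matched = [r for r in records if r["LAT"] > 1]
    unmatched = [r for r in records if r["LAT"] < 1]
    return matched, unmatched


def assign_zip_centroids(unmatched: List[Record],
                         centroids: Dict[str, Tuple[float, float]],
                         zip_field: str = "ZIP") -> List[Record]:
    """Copy each unmatched record's zip centroid into LAT/LON (hypothetical ZIP field).

    Returns the records whose zip code has no known centroid.
    """
    leftover = []
    for r in unmatched:
        centroid = centroids.get(r.get(zip_field))
        if centroid is None:
            leftover.append(r)
        else:
            r["LAT"], r["LON"] = centroid
    return leftover


def time_code(yr: int, mmwr_week: int) -> int:
    """Same arithmetic as the thematic expression YR*100+MMWRWK."""
    return yr * 100 + mmwr_week


def quarter_breaks(years: Iterable[int]) -> List[int]:
    """Upper bounds of assumed 13-week quarters; week 53, when present, falls in Q4."""
    return [y * 100 + w for y in years for w in (13, 26, 39, 53)]


def range_index(code: int, breaks: List[int]) -> int:
    """0-based index of the first range whose upper bound is >= code."""
    return bisect_left(breaks, code)
```

After centroids are assigned, `matched + unmatched` plays the role of the re-merged datapoint file; records returned by `assign_zip_centroids` still need manual attention.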


AtlasGIS Match Codes

Unmatched addresses are indicated by UPPER CASE letters:

Matched addresses are indicated by lower case letters:

