AWK is a Unix programming language named for its developers (Aho, Weinberger and Kernighan). Once you get used to its funky, terse syntax and get past its useless error messages, it's well worth learning. In this course your principal uses of AWK will be to reformat files to create GRASS cats, reclass and sites files. AWK can do a great deal more, however. For a more complete reference, see Aho, A.V., B.W. Kernighan and P.J. Weinberger, The AWK Programming Language. Reading, MA: Addison-Wesley.
AWK programs perform pattern-matching and procedures on data streams.
Simple AWK programs enclosed in single quotes can be typed and executed right at the Unix prompt. For example, the program
awk '/string/ {print $1, " = " $3^2}' < input_file > output_file
extracts all lines with a character sequence matching the pattern string from input_file, parses each line into fields separated by blanks, and outputs the first field, an equals sign and the value of the third (numeric) field squared to output_file. If no input file is specified, awk reads from stdin; if no output file is specified, it writes to stdout. In seconds, a simple AWK program can create a GRASS cats or reclass rules file which might take you hours to create with a conventional text editor.
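To try the one-liner without creating any files, you can pipe a few lines into it. The sketch below uses made-up sample data (the tree names and numbers are hypothetical) and captures the output in a shell variable so it can be inspected. Note the two blanks before the equals sign: the comma in the print statement inserts the output field separator, and the quoted " = " adds a blank of its own.

```shell
# Hypothetical sample data: three blank-separated fields per line.
result=$(printf 'oak_string 12 3\npine 7 5\nstring_elm 9 4\n' |
  awk '/string/ {print $1, " = " $3^2}')

# Only the two lines containing "string" are printed.
printf '%s\n' "$result"
```

The pine line is skipped because it does not match /string/.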
More complex AWK programs can be stored as text files and executed with the -f option:
awk -f textfile < input_file > output_file
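Here is a minimal sketch of the -f option in use. The program file is created with mktemp purely for illustration (in practice you would write it with a text editor), and the site names are hypothetical.

```shell
# Store a one-line AWK program in a temporary file (the name is arbitrary).
prog=$(mktemp)
printf '%s\n' '/forest/ { print NR ": " $1 }' > "$prog"

# Run the stored program on some hypothetical input, then clean up.
result=$(printf 'site1 forest\nsite2 meadow\nsite3 forest\n' | awk -f "$prog")
rm -f "$prog"

printf '%s\n' "$result"
```

The program prints the record number and first field of every line containing forest.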
AWK has various system variables: FS and OFS are the input and output field-separators (by default FS splits on runs of blanks or tabs and OFS is a single blank); RS and ORS are the input and output record separators (default is newline); NR is the number of the current record; NF is the number of fields in the current record; $0 is the entire current record; $n (n>0) is the nth field in the current record.
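A quick way to get a feel for these variables is to print them. This sketch, using made-up colon-separated input, resets FS and OFS and then prints the record number, the field count, the first field and the last field ($NF) of each line.

```shell
# Input fields are separated by colons; output fields will be joined by hyphens.
result=$(printf 'a:b:c\nd:e\n' |
  awk 'BEGIN { FS = ":"; OFS = "-" } { print NR, NF, $1, $NF }')

printf '%s\n' "$result"
```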
AWK is very flexible about matching character or numeric patterns. Patterns can be
(This last example selects lines where the two characters starting in the fifth column are xx and the third field matches nasty, plus lines beginning with The, plus lines ending with mean, plus lines in which the fourth field is greater than two.)
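A pattern matching that description could be written as shown below. This is a reconstruction from the description, not necessarily the original example, and the input lines are invented for the test. A pattern with no procedure simply prints each matching line.

```shell
# Five hypothetical input lines; the first four each satisfy one clause.
result=$(printf '%s\n' \
  'abcdxx is nasty 1' \
  'The end' \
  'so very mean' \
  'a b c 5' \
  'a b c 1' |
  awk '(substr($0,5,2) == "xx" && $3 ~ /nasty/) || /^The/ || /mean$/ || $4 > 2')

printf '%s\n' "$result"
```

The last input line matches none of the four clauses, so it is not printed.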
BEGIN lets you do things before the program starts processing input lines, e.g., reset variables such as FS or RS, or create column headings for your output. For example:
BEGIN { FS = "|"; OFS = ","; print "Site coordinates" }
END lets you do things after the program has processed the last input line, e.g., print ending text or cumulative variables such as line counts or column totals. For example:
{ sum += $2 }
END { print "Total acreage is ", sum }
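Putting BEGIN, a main procedure and END together might look like the following sketch. The parcel names and acreages are hypothetical; the heading is printed once before any input is read and the total once after the last line.

```shell
# Hypothetical input: parcel name and acreage separated by a vertical bar.
result=$(printf 'north|120\nsouth|80\n' |
  awk 'BEGIN { FS = "|"; print "Parcel acreage" }
       { sum += $2 }
       END { print "Total acreage is", sum }')

printf '%s\n' "$result"
```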
AWK procedures are enclosed in {curly brackets}. Procedures can
(1) assign variables or arrays. For example:
BEGIN {FS = ","}
(resets the field-separator character to comma before reading input)
/string/ { count["string"]++ }
(creates array count, whose element "string" counts the lines matching string)
split (substr($0,4,12),N,",")
(splits the substring at commas into array elements N[1], N[2], ...)
newvar = $4*sqrt($5)
(sets newvar to the fourth field times the square root of the fifth)
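The sketch below combines these kinds of assignment in one small program. The input data and the names total and n are made up for illustration; count, N and newvar follow the examples above.

```shell
# Hypothetical input: a species name and two numbers per line.
result=$(printf 'oak 4 9\npine 2 16\noak 1 25\n' |
  awk '{ count[$1]++                 # array indexed by the first field
         newvar = $2 * sqrt($3)      # ordinary variable, recomputed per line
         total += newvar }
       END { n = split("10,20,30", N, ",")   # n is the number of pieces
             print count["oak"], count["pine"], total, n, N[3] }')

printf '%s\n' "$result"
```

Variables and array elements start out empty (zero when used as numbers), so nothing needs to be declared or initialized first.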
AWK operators by order of (decreasing) precedence:
Field reference: $
Increment or decrement: ++ --
Exponentiate: ^
Multiply, divide, modulus: * / %
Add, subtract: + -
Concatenation: (blank space)
Relational: < <= > >= != ==
Match regular expression: ~ !~
Logical: && ||
C-style assignment: = += -= *= /= %= ^=
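A few throwaway expressions show how the precedence order plays out in practice. This is only a sketch; the variable x is arbitrary. When in doubt, add parentheses.

```shell
result=$(awk 'BEGIN {
  print 2 + 3 * 2 ^ 2      # ^ first, then *, then +
  print 1 " " 2 + 3        # + binds tighter than concatenation
  x = 2 + 3 < 4 * 2        # arithmetic, then comparison, then assignment
  print x                  # a true comparison has the value 1
}')

printf '%s\n' "$result"
```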
AWK arithmetic functions: exp, int, log and sqrt
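You can check what each arithmetic function does from a BEGIN procedure, which needs no input. In this small sketch, note that int truncates rather than rounds, and that log is the natural logarithm.

```shell
result=$(awk 'BEGIN { print sqrt(16), int(3.9), exp(0), log(1) }')

printf '%s\n' "$result"
```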
AWK string functions:
index(string,substring)
returns the position of the first occurrence of substring in string, or 0 if there is none.
length[(argument)]
returns the length of argument if specified, otherwise the length of $0.
split(string,array[,f])
splits string by separator f (or by FS if f is omitted) into array[1], array[2], ...
substr(string,s[,l])
extracts from string starting at position number s with length l (or the rest of string).
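The four string functions can be tried out on a single test string. In this sketch the string and the names s, n and parts are arbitrary; character positions count from 1.

```shell
result=$(awk 'BEGIN {
  s = "spearfish,forest"
  n = split(s, parts, ",")     # n is the number of pieces
  print index(s, "fish"), length(s), n, parts[2], substr(s, 6, 4), substr(s, 11)
}')

printf '%s\n' "$result"
```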
(2) print output. Output can be unformatted (print) or formatted (printf):
{ print $2 $1 ":" $4/$3 }
(prints the first two fields in reverse order, a colon and the ratio of the fourth field to the third, all run together with no separators.)
{ printf "Clump %d: \t%4.1f acres \t%s. \n", $2, $4*640, $6 }
prints formatted output: %d specifies a decimal number format for the clump ID; %n.mf specifies a floating-point number format (minimum width n, m decimal places) for the acreage, converted from square miles; %s specifies a string format. \t is the tab character; \n is the newline character. The input line
28 4 12 0.072 vegcov forest spearfish
would be printed as the tab-aligned output line
Clump 4:        46.1 acres      forest.
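You can run the printf example on that input line directly from the prompt, as in this sketch. The period after the cover name and the blank before the newline both come from the format string.

```shell
result=$(printf '28 4 12 0.072 vegcov forest spearfish\n' |
  awk '{ printf "Clump %d: \t%4.1f acres \t%s. \n", $2, $4*640, $6 }')

# The output contains real tab characters wherever the format has \t.
printf '%s\n' "$result"
```

Unlike print, printf adds no separators or newline of its own, so anything you want in the output has to appear in the format.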
(3) perform flow-control (you won't need these for this class):
For-loops:
for ( [initial expression];
[test expression];
[increment counter expression] )
{ commands }
example: for (i = 1; i <= 20; i++) does 20 iterations
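As a small sketch of a for loop at work, the program below adds up all the fields on each input line, using NF as the loop limit. The input numbers and the name total are made up.

```shell
result=$(printf '1 2 3\n10 20\n' |
  awk '{ total = 0
         for (i = 1; i <= NF; i++) total += $i   # $i is the ith field
         print total }')

printf '%s\n' "$result"
```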
If-Then-Else:
if (condition)
{ commands1 }
[ else
{ commands2 } ]
does commands1 if condition is true; commands2 (or nothing) if false; condition is any expression with relational or pattern-match operators.
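An if-then-else might be used to label records by size, as in this sketch. The 100-acre threshold, the parcel names and the variable size are hypothetical.

```shell
result=$(printf 'north 120\nsouth 80\n' |
  awk '{ if ($2 > 100) { size = "large" } else { size = "small" }
         print $1, size }')

printf '%s\n' "$result"
```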
Other flow-control commands:
break exits from a for loop.
continue begins next iteration of a for loop.
exit terminates remaining procedures; terminates input; executes END procedure, if any.
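The three commands can be seen together in one sketch. The rules used here (skip negative values, stop scanning a line at a zero, quit at a line reading stop) are invented for the example.

```shell
result=$(printf '5 -1 7\n3 0 9\nstop\n8 8 8\n' |
  awk '$1 == "stop" { exit }            # skip the rest of the input, run END
       { for (i = 1; i <= NF; i++) {
           if ($i < 0) continue         # ignore negative values
           if ($i == 0) break           # stop scanning this line at a zero
           sum += $i } }
       END { print sum }')

printf '%s\n' "$result"
```

The first line contributes 5 and 7, the second only 3, and the line after stop is never read, so the total is 15.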
The best way to learn AWK is to study examples. Patience with it pays off. Enhancements to the original AWK include gawk and nawk, which support additional built-in arithmetic functions, string functions and flow-control. AWK on the Linux servers is actually gawk (the GNU Project's version of AWK).