Digital Image Basics
(adapted from a lecture written by Betsy Mackenzie)

Human vision

Humans evolved as predators. Like other predators, we have forward-looking aligned eyes to support stereo-vision, so that we can gauge the distances of prey and/or threats. In contrast, most prey species have divergent eyes to support wide-angle vision. Their eyes are attuned to brightness rather than color; they see in grayscale, but are better at detecting motion in the landscape (predators?) than we are. We inherited grayscale vision (rods) from our ancestors and developed color vision (cones) to complement that.

The human retina has about 120 million rods and 7 million cones. Rods are grayscale light sensors--very sensitive to overall brightness across the full spectrum of visible light (wavelengths between 0.4 and 0.7 micrometers). Rods function much better in low light than cones. Rods are spread across the retina to support wider-angle vision.

Human visual acuity is sharpest at a small part of the retina called the macula, where our cones are concentrated. Cones are our color sensors, attuned to specific wavelengths but less sensitive than rods to overall brightness. About 64% of our cones see red light, 32% see green light and 2% see blue light. The predominance of red- and green-sensitive cones explains why we can discriminate somewhat better among red and yellow wavelengths than among blue wavelengths, and why red instrumentation lights affect our night vision much less than white light. Our blue-sensitive cones, though few, are individually more sensitive to light than our green- or red-sensitive cones.

Vision uses more of the human brain than any other function. Our optic nerve bundles are up to half a centimeter thick and carry an enormous flow of information to our brains, but the eyes and optic nerves are only a small part of visual intelligence. Because the macular focus is so narrow, our eyes are constantly scanning and our visual cortex is constantly interpolating to sustain functional vision. Our brains are constantly judging distances based on the tiny visual offsets of objects perceived in eyes that are less than three inches apart. Our cognitive functions are constantly interpreting patterns in the data: "That's a chair; that's a lamp." And virtually all of this happens unconsciously.

Basics of imaging

Analog photographs are created by exposing film with an emulsion of photo-reactive silver halide to light, developing the negative to neutralize the unreacted silver halide, shining light through the negative onto photo-reactive paper and then fixing the paper positive.  The reactivity of silver halide increases with particle size, which is why large prints from high-speed film tend to be grainier than prints from slower film.  The silver halide crystals are arbitrarily distributed in the emulsion spread on the film, so the graininess of prints is arbitrary too.

In contrast, digital images are derived from electrical signals recorded by rectangular arrays of charge-coupled devices (CCD's) integrated with photoelectric sensors. The sensors convert light in specific wavelengths to electrical charges and the CCD's convert the electrical charges into digital values. The image is stored as a regular array (aka grid or raster) of numbers that a computer can interpret as color or brightness values.  The regularity of this array makes the image readily recoverable from the numbers; you just need to know how many numbers fill a row. If you zoom way in on a digital image, you will typically see individual pixels as squares.
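As a minimal sketch of why the row width is all you need to recover a raster, the Python below rebuilds a grid from a flat run of numbers. The 4 x 3 image and the function names are made up for illustration.

```python
def rows_from_flat(values, width):
    """Split a flat sequence of pixel values into rows of `width` pixels."""
    if width <= 0 or len(values) % width != 0:
        raise ValueError("pixel count must be a multiple of a positive row width")
    return [list(values[i:i + width]) for i in range(0, len(values), width)]


def pixel_at(values, width, row, col):
    """Look up one pixel directly in the flat sequence (row-major order)."""
    return values[row * width + col]


# A hypothetical 4-wide, 3-high grayscale image stored as 12 numbers in a row.
flat = [0, 0, 255, 255,
        0, 128, 128, 255,
        0, 0, 0, 255]

grid = rows_from_flat(flat, width=4)
for row in grid:
    print(row)
```

The same 12 numbers read with a width of 3 or 6 would give a different (scrambled) picture, which is why image files record the row width in their headers.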

The amount of data an image contains is determined by its size, resolution and bit depth. Size is simply the height times the width of the image, measured in inches, centimeters, points or picas.  (This is not to be confused with the image's file size, measured in KB or MB). Resolution is pixels per linear inch.  Higher resolution means squeezing more pixels into the same space. Bit depth is the number of bits used to store the color information for each pixel.  The pixel value might refer to an indexed color, or it might code a mix of red, green and blue values.  It would specify a shade of gray in a grayscale image, or it might simply indicate if the pixel is black or white.

Brief detour: a binary math primer

A bit (short for "binary digit") is the smallest unit of computer information.  The binary number system has only 2 digits (0,1), which reference the on-or-off charge states of the fundamental data elements in computers. 

In base 10, a single digit has 10^1 = 10 permutations (0 to 9); 2 digits have 10^2 = 100 permutations (0 to 99); 3 digits have 10^3 = 1,000 permutations (0 to 999) ...and so on.
In base 2, a single bit has 2^1 = 2 permutations: 0 1; 2 bits have 2^2 = 4 permutations: 00 01 10 11; 3 bits have 2^3 = 8 permutations: 000 001 010 011 100 101 110 111 ...and so on.

An image with a bit-depth of one has only black or white pixels.
An image with a bit-depth of 8 can have 2^8 = 256 colors or shades of gray.
An image with a bit-depth of 24 can combine 256 levels of red x 256 levels of green x 256 levels of blue to yield over 16 million possible colors, which matches or exceeds the color sensitivity of human vision.
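The counts above can be checked with a few lines of Python. This is only an illustrative sketch; the function names are invented here.

```python
from itertools import product


def bit_patterns(bits):
    """List every pattern of 0s and 1s that fits in the given number of bits."""
    return ["".join(p) for p in product("01", repeat=bits)]


def levels(bit_depth):
    """Number of distinct values a pixel can take at a given bit depth."""
    return 2 ** bit_depth


print(bit_patterns(2))   # the four 2-bit patterns
print(levels(1))         # black or white
print(levels(8))         # indexed colors or shades of gray
print(levels(24))        # 256 x 256 x 256 RGB combinations
```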

So image size, resolution and bit depth jointly determine how the image looks and how much storage it requires. A simple formula to determine the size of an image file (uncompressed) is:  height x width x resolution^2 x bit depth = size of image file in bits   (To convert to KB, there are 8 bits in a byte and 2^10 = 1024 bytes in a kilobyte, hence 8,192 bits/KB.)

Bit-depth vs. resolution

Compare the two equal-sized images here. The upper left picture is a 1-bit image with a resolution of 2000 dots per inch. Each tiny pixel is either white or black. The upper right picture is an 8-bit image with a resolution of 200 dots per inch. Its pixels are much larger but have 256 shades of gray.

The two details showing the guy's left eye and glasses frame illustrate the difference. The overall image quality is about the same, but the first image file is over 12 times larger than the second. Without any compression, the 1-bit 2,000-dpi image is (2 x 2 x 2000^2 x 1)/8192 = 1953 KB, while the 8-bit 200-dpi image is (2 x 2 x 200^2 x 8)/8192 = 156 KB.
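The file-size arithmetic for these two 2 x 2 inch images can be reproduced with a short Python sketch. The function name is invented for illustration, and the result ignores any file header or compression.

```python
BITS_PER_KB = 8 * 1024  # 8 bits per byte, 1024 bytes per kilobyte


def uncompressed_kb(height_in, width_in, dpi, bit_depth):
    """Uncompressed image size in kilobytes: height x width x dpi^2 x bit depth."""
    bits = height_in * width_in * dpi ** 2 * bit_depth
    return bits / BITS_PER_KB


one_bit = uncompressed_kb(2, 2, 2000, 1)     # 1-bit image at 2000 dpi
eight_bit = uncompressed_kb(2, 2, 200, 8)    # 8-bit image at 200 dpi

print(one_bit, eight_bit, one_bit / eight_bit)
```

Because resolution is squared in the formula, a tenfold increase in dpi costs a hundredfold increase in pixels, which an eightfold saving in bit depth cannot offset.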

When processing images, keep your ultimate size and resolution objectives in mind. For example, if you are scanning a 3 x 5 inch photo for eventual printing as a 6 x 10 hardcopy image on a 300dpi printer, there is no advantage in scanning the source image at more than 600dpi. When you resize the image from 3x5 to 6x10 the resolution will change from 600dpi to 300 dpi. Your scanning resolution should match the final dpi times the ratio of the final size to the original size.

An image's size and resolution are inversely related: enlarging the image to double its original size reduces its original resolution by half. Excessive enlargement will reveal the pixel artifacts ("jaggies").
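The scanning rule and the inverse relationship between size and resolution can be written as two small Python functions. This is a sketch with invented names; sizes are linear dimensions (one side of the image) in inches.

```python
def scan_dpi(final_dpi, original_in, final_in):
    """Scanning resolution needed so the enlarged image ends up at final_dpi."""
    return final_dpi * final_in / original_in


def resized_dpi(current_dpi, original_in, final_in):
    """Resolution after stretching the same pixels over a new linear size."""
    return current_dpi * original_in / final_in


# A 3-inch side enlarged to 6 inches for a 300 dpi printer.
print(scan_dpi(300, original_in=3, final_in=6))

# The same scan after enlargement: doubling the size halves the resolution.
print(resized_dpi(600, original_in=3, final_in=6))
```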

Some color theory

There are three color-space models you should be aware of. The RGB color space is represented here as a three dimensional cube with red, green and blue axes. Colors are defined as (x,y,z) coordinates in the cube where black is (0,0,0) and white is (255,255,255). All other colors are represented by their coordinates within the cube. There is a "gray line" diagonal through the center of the cube that goes from black at the origin to white at the far corner.

The RGB color space is called additive because you add various intensities of red, green and blue to black to get a color. RGB is the standard color space used in computer displays, scanners and film recorders--devices where the default is black.

The inverse of the RGB model is the CMY or CMYK (cyan-magenta-yellow plus black) model. This is a subtractive color model: colors are defined by subtracting values of cyan, magenta and yellow (the complements of red, green and blue) from white. This model can also be visualized as a cube with C, Y and M axes; the origin is white; the gray line traces the diagonal from the origin through the cube to the black far corner.

CMYK is used in printing where the default background (paper) is white. The K in CMYK stands for black.  CMYK printers typically substitute cheap black ink for equal proportions of expensive CMY inks when printing color blends. 
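A minimal Python sketch of the two ideas above: CMY as the complement of RGB, and black ink replacing the equal share of the three colored inks. This is a simplified illustration with invented function names; real printers rely on device-specific color profiles rather than this plain arithmetic.

```python
def rgb_to_cmy(r, g, b):
    """Subtract each 0-255 RGB value from white to get its CMY complement."""
    return 255 - r, 255 - g, 255 - b


def cmy_to_cmyk(c, m, y):
    """Move the amount common to all three inks into the black (K) channel."""
    k = min(c, m, y)
    return c - k, m - k, y - k, k


red = rgb_to_cmy(255, 0, 0)
print(red, cmy_to_cmyk(*red))            # pure red needs no black ink

dark_gray = rgb_to_cmy(64, 64, 64)
print(dark_gray, cmy_to_cmyk(*dark_gray))  # a gray prints with black ink only
```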

The HIS color space defines colors based on their Hue (dominant color), Intensity (value or brightness) and Saturation (color purity).  This model is sometimes called HSV (Hue-Saturation-Value). The HIS color space is represented here as a cone. The gray line runs through the center of the cone from black (0) at the origin (tip) to white (255) at the end (base).  A color's Intensity (aka value) is its distance along the gray line: each cross-section of the cone is a color wheel of uniform intensity.   A color's Hue is its angular distance around the color wheel from red (0) through green (86) and blue (170) to red again (255).  Its Saturation is its relative distance from the gray center of its color wheel (0) where R, G and B are in equal proportions, to the perimeter of the wheel (255) where the color is a pure primary or a mix of just two primaries.
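Python's standard colorsys module offers an HSV conversion that can serve as a rough sketch of this model. The 0-255 scaling and the function name below are assumptions made for illustration; a particular image editor may round or scale differently, so hue values can differ by one or so from the figures quoted above.

```python
import colorsys


def rgb_to_his(r, g, b):
    """Convert 0-255 RGB values to (hue, intensity, saturation), each 0-255."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return round(h * 255), round(v * 255), round(s * 255)


for name, rgb in [("red", (255, 0, 0)),
                  ("green", (0, 255, 0)),
                  ("blue", (0, 0, 255)),
                  ("mid gray", (128, 128, 128))]:
    print(name, rgb_to_his(*rgb))
```

The gray pixel comes out with zero saturation, matching its position on the gray line at the center of the color wheel.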

Most image processing software uses the RGB model to manipulate images. The HIS model is just an alternative set of dimensions for the same color space. (One application of the HIS model involves creating a hillshaded digital elevation map where the DEM determines the hue and the hillshade map determines the intensity.)

Digital cameras and scanners

A charge-coupled device (CCD) is a silicon semiconductor that acts as a light detector.  When light hits the crystalline silicon in the device, the electrons in the silicon become excited and create an electrical charge proportional to the amount of light (or the number of photons) that the silicon is exposed to. Digital cameras use a grid (2 dimensional array) of crystalline silicon CCD's.  The entire array is exposed at once and each cell in the grid captures a value which becomes a pixel or dot in the digital image. Three filters (red, green and blue) capture the intensity of each band of light, creating a composite color image.  The size of the grid (number of megapixels) defines the maximum resolution of the camera.

The intensity of light hitting each CCD through each filter is recorded as an 8-bit number (0-255).  Each pixel's red, green and blue values are combined to yield a 24-bit color value. The picture is then stored on some removable medium (e.g., flash card or SD card).
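One common way to combine three 8-bit values into a single 24-bit number is bit shifting, sketched below in Python. The red-green-blue byte order and the function names are assumptions for illustration; cameras and file formats may order the bytes differently.

```python
def pack_rgb(r, g, b):
    """Combine three 8-bit channel values into one 24-bit integer."""
    for value in (r, g, b):
        if not 0 <= value <= 255:
            raise ValueError("each channel must fit in 8 bits (0-255)")
    return (r << 16) | (g << 8) | b


def unpack_rgb(color):
    """Split a 24-bit integer back into its red, green and blue bytes."""
    return (color >> 16) & 0xFF, (color >> 8) & 0xFF, color & 0xFF


orange = pack_rgb(255, 165, 0)
print(orange, hex(orange), unpack_rgb(orange))
```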

Desktop scanners use a linear array of CCD's that passes under the face-down analog image, recording it as pixel values one line at a time. The resolution of a scanner is determined horizontally by the optical resolution of the CCD array, and vertically by the speed and accuracy of the motor that moves the array.  The important number to consider when looking at scanner resolution is the optical resolution; many scanners can double or quadruple their effective resolutions by interpolating pixels between recorded pixel values.  Some scanners use a white light source and three (RGB) filters to capture the image in one pass. Others make three passes, one for each of three primary colors.
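To see why interpolated resolution adds pixels but no new information, here is a small Python sketch that doubles the sample count of one scan line by averaging neighbors. Simple linear interpolation is assumed here for illustration; actual scanners may use other methods.

```python
def interpolate_line(samples):
    """Insert the average of each neighboring pair between recorded samples."""
    out = []
    for left, right in zip(samples, samples[1:]):
        out.append(left)
        out.append((left + right) / 2)  # invented pixel, not a measurement
    if samples:
        out.append(samples[-1])
    return out


recorded = [10, 20, 40]
print(interpolate_line(recorded))
```

Every inserted value is computed from its recorded neighbors, so fine detail that fell between two sensor cells is not recovered.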

Image editing

A digital image program is used to edit image files. The simplest editor is the old Paint program that comes with MS-Windows, although its capabilities are limited and its default BMP formats (see below) are not particularly efficient.

You can learn a lot of digital image theory from using a fancier image editor such as GIMP (free from www.gimp.org).  The GIS lab usually has GIMP installed on its machines, and I encourage you to download and install it on your own.  It is a great alternative to Adobe Photoshop. Like Photoshop, GIMP lets you edit multiple layers of an image, and supports a large number of filters and other enhancements.

Rule #1: Always edit your images in RGB mode (24-bit depth) rather than Indexed Color mode (8-bit depth).   If you are editing a GIF, convert it to RGB for editing, then convert it back to Indexed when saving your edits.

Although web download speeds have increased dramatically over the past decade, you should still try to keep your web image file sizes reasonable. Larger image files take longer to download because there is more data to transfer and more packet assembly required at the client end.  People quickly get impatient with slow downloads, and may not wait around for oversized images.

An editing example:

The image below is a .PNG exported from ArcScene:

I used some standard GIMP tools to obtain the final image below: autocrop to eliminate extraneous white space; increase overall brightness and contrast; select the reds which were still too dark, grow the selection to cover most of the piedmont area, feather the selection so changes in the selected area will blend well at the edges, and brighten and sharpen the selected part of the image; add the image title with anti-aliasing to smooth the text; add an alpha channel for transparency, then select and delete the white background to make it transparent, and re-save.

Image file formats

The Web supports three image file formats: GIF, PNG and JPEG.

  • The GIF (Graphics Interchange Format) format uses "indexed" 8-bit color and an image resolution of 72 dpi. Up to 256 colors are defined, one per pixel value, in a color table at the beginning of the file. The remainder of the file is pixel values referencing these colors.

    The GIF format supports animation by timed display of a series of GIFs to create a cartoon like effect.  GIFs can also be interlaced so that when they are downloaded they display every other line and then go back and fill in the missing lines.  This makes the image seem to appear faster.  GIF files can also include specified transparent colors, so that you can blend a GIF's background into the page. 

    GIF uses a lossless compression algorithm known as LZW (Lempel-Ziv-Welch, the inventors' initials); a patent on LZW was the basis for the licensing fees once charged for GIF software.  LZW is a dictionary-based method rather than strict run-length coding, but on simple images the effect is similar: run-length coding basically abbreviates "00000000001110000000000" to "10x0 3x1 10x0."   This compression strategy is most efficient for images that have long (horizontal) runs of uniform color values, but it is not very efficient for most photographs. The LZW patent has expired, and GIF is still widely used.

  • The PNG (Portable Network Graphics) format was developed as a web-compatible, patent-free substitute for GIF after Compuserve started charging other software developers licensing fees for GIF's patented compression. PNG has much the same functionality as GIF, with better file compression and support for 24-bit color.

    PNG is generally the best format for maps posted on the web. It offers better color fidelity than GIF and preserves the crispness of borders better than JPEG.

  • The JPEG (Joint Photographic Experts Group) format is generally most efficient for storing photographic images.  The JPEG coding process sections the image into 8 x 8 blocks of pixels, and calculates cosine transforms that approximate the intensity and hue shifts within each block.  The image file just stores the transform coefficients, not the original pixel values, so the JPEG decoding process produces an image that only approximates the original.  So this is a "lossy" format, but its compression efficiency can be very high. 

    JPEG's can be saved in various levels of image quality, substituting compression efficiency for additional transform information that retains image quality.   Low-quality JPEG's with maximum compression often exhibit discernible "smears" on the edges of image features, and the 8x8 pixel blocks may be annoyingly obvious.

    The quality of an image can really suffer from cumulative information loss if you edit and resave it in JPEG format multiple times.  If you think you may have to re-edit an image, you should keep it in a lossless format such as BMP or even GIF. 
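The run-length abbreviation used in the GIF discussion above can be sketched in a few lines of Python. This illustrates plain run-length coding only, in the "count x value" notation of the text; it is not an implementation of LZW, and the function name is invented here.

```python
from itertools import groupby


def run_length_encode(values):
    """Abbreviate each run of identical symbols as '<count>x<symbol>'."""
    return " ".join(f"{len(list(run))}x{symbol}" for symbol, run in groupby(values))


# Ten 0s, three 1s, ten 0s: long uniform runs compress well.
flat_area = "0" * 10 + "1" * 3 + "0" * 10
print(run_length_encode(flat_area))

# Rapidly alternating values, as in a noisy photograph, get longer instead.
noisy = "01011010"
print(run_length_encode(noisy))
```

The second example shows why this strategy suits maps and line art better than photographs: when neighboring pixels rarely match, the "abbreviation" is longer than the data.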

Other image formats are not directly supported by the Web, but may be useful in other contexts:
  • BMP (Microsoft Windows Bitmap) formats are recognized by all Windows programs and most other PC applications.  The format supports multiple bit depths: 1-bit, 4-bit, 8-bit and 24-bit.  But since BMP images have no compression their file sizes are often 10+ times as big as equivalent GIF or JPEG images.

  • TIFF (Tagged Image File Format) is an old format originally created by the Aldus and Microsoft Corporations to store scanned images. There are actually many types of TIFF format; most platforms recognize the standard types. 

  • Postscript (PS) and Encapsulated Postscript (EPS) are mixed formats that encode both raster and vector graphics. These were developed by Adobe, and are precursors of Adobe's Acrobat format.

  • The PBM (Portable Bitmap) format was developed by Jef Poskanzer as a generic intermediary UNIX format for translating images between formats with his Portable Bitmap Tools.  Rather than create N x (N-1) direct format translators for N image formats, the PBM library has 2N translators for 2-step conversions through PBM formats.  To convert a TIFF to a GIF, for example, you would use tifftopnm and then ppmtogif.
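The saving from routing conversions through an intermediary format is easy to tabulate. The Python below is just the arithmetic from the PBM item above, with invented function names.

```python
def direct_translators(n):
    """One translator for every ordered pair of distinct formats."""
    return n * (n - 1)


def hub_translators(n):
    """One translator into and one out of the intermediary, per format."""
    return 2 * n


for n in (3, 10, 30):
    print(n, direct_translators(n), hub_translators(n))
```

With only three formats the two approaches tie at six translators; the gap widens quickly as formats are added.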

Exporting map images

You should generally save your maps in PNG format for display in your project web pages. There are various ways of exporting maps, charts and layouts from ArcMap as web page images:

  1. File--Export to create an image file of your current ArcMap data or layout view.  Adjust the image resolution to control the size of the image. Image heights and widths between 300 and 1,000 pixels are generally best for the web. The exported image will have the same aspect ratio (height/width) and whitespace as the ArcMap frame, so you should size the map frame and position the map appropriately before exporting an image of it. Alternatively, you can crop the image afterward with GIMP. 
  2. Edit--Copy Map to Clipboard copies your map to the Windows clipboard for pasting into any graphic editing package such as GIMP or the Windows Paint program.  Edit as needed and save in PNG format.  Paint only lets you save a pasted image in BMP format, but when you reload it, you can save it as a PNG.
  3. Use Alt-PrintScreen (screen-dump) to copy an entire ArcMap window to the Windows clipboard for pasting into an external editor. 
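When choosing an export resolution, it can help to work out the pixel dimensions in advance. The Python sketch below assumes the exported pixel size is simply the frame size in inches times the dpi setting; the function names and the example frame sizes are hypothetical.

```python
def export_pixels(width_in, height_in, dpi):
    """Approximate pixel dimensions of a frame exported at a given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def dpi_for_width(width_in, target_px):
    """Resolution setting that yields the target pixel width."""
    return target_px / width_in


# A letter-size layout at 96 dpi is slightly taller than the 1,000-pixel guideline.
print(export_pixels(8.5, 11, 96))

# An 8-inch-wide frame needs 100 dpi to export at 800 pixels wide.
print(dpi_for_width(8, 800))
```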

A word about copyright law

Pulling images, videos, music and text off the web is ridiculously easy, but most web materials are covered by copyright. Copyright law protects the rights of the creator to control how his or her creative work is used for a specific time period, typically 50 to 100 years beyond the author's death. After the copyright has expired, the work enters the "public domain" and may be freely used by anyone. Most nations have signed the Berne Convention, which establishes automatic copyright for the creator of any creative work as soon as it is created. The creator does not have to give public notice of copyright claim.

Copyright law does permit limited "fair use" of copyrighted materials: the fair use doctrine allows you to quote or distribute parts of copyrighted materials for academic, journalistic or satirical purposes. 

The best way to avoid copyright violation is to create your own stuff, use public domain materials, or look for materials licensed for free use. For example, the GNU Project distributes contributors' "copyleft" software for free.