Digital Image Basics
(adapted from a lecture written by Betsy Mackenzie)
Human vision
Humans evolved as predators.
Like other predators, we have forward-looking aligned eyes to
support stereo-vision, so that we can gauge the distances of
prey and/or threats.
In contrast, most prey species have divergent eyes to
support wide-angle vision. Their eyes are attuned to brightness
rather than color; they see in grayscale, but are better at
detecting motion in the landscape (predators?) than we are.
We inherited grayscale vision (rods) from our ancestors and developed
color vision (cones) to complement that.
The human retina has about 120 million rods and 7 million cones.
Rods are grayscale light
sensors--very sensitive to overall brightness across the full spectrum
of visible light (wavelengths between 0.4 and 0.7 micrometers).
Rods function much better in low light than cones.
Rods are spread across the retina to support wider-angle vision.
Human visual acuity is sharpest at a small part of the retina called the
macula, where our cones are concentrated.
Cones are our color sensors, attuned to specific
wavelengths but less sensitive than rods to overall brightness.
About 64% of our cones see red light, 32% see green
light and 2% see blue light. The predominance of red- and green-
sensitive cones explains why we can discriminate somewhat better among
red and yellow wavelengths than among blue wavelengths, and why
red instrumentation lights affect our night vision much less
than white light. Our cones are more sensitive to blue light than
green or red light.
Vision uses more of the human brain than any other function.
Our optic nerve bundles are up to half a centimeter thick
and carry an enormous flow of information to our brains,
but the eyes and optic nerves are only a small part of
visual intelligence.
Because the macular focus is so narrow, our
eyes are constantly scanning and our visual cortex is
constantly interpolating to sustain functional vision.
Our brains are constantly judging distances based on the
tiny visual offsets of objects perceived in eyes that are
less than three inches apart. Our cognitive functions are
constantly interpreting patterns in the data: "That's a chair;
that's a lamp." And virtually all of this
happens unconsciously.
Basics of imaging
Analog photographs are created by
exposing film with an emulsion of photo-reactive silver halide to
light,
developing the negative to neutralize the unreacted silver halide,
shining light through the negative onto photo-reactive paper and
then fixing the paper positive.
The reactivity of silver halide increases with particle size, which
is why large prints from high-speed film tend to be grainier than
prints from slower film. The silver halide crystals are
arbitrarily distributed in the emulsion spread on the film, so the
graininess of prints is arbitrary too.
In contrast, digital images are derived from electrical signals
recorded by rectangular arrays of
charge-coupled devices (CCD's) integrated with photoelectric
sensors. The sensors convert light in specific wavelengths to
electrical charges and the CCD's convert the electrical charges
into digital values. The image is stored as a regular
array (aka grid or raster) of
numbers that a computer can interpret as color or brightness
values.
The regularity of this array makes the image
readily recoverable from the numbers; you just need to know
how many numbers fill a row.
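As a minimal sketch of that idea, the Python snippet below rebuilds rows from a flat list of pixel values once the row width is known. The function name `to_rows` and the tiny 4 x 3 one-bit image are made up for illustration.

```python
def to_rows(values, width):
    """Split a flat sequence of pixel values into rows of `width` pixels."""
    if width <= 0 or len(values) % width != 0:
        raise ValueError("value count must be a whole multiple of the row width")
    return [values[i:i + width] for i in range(0, len(values), width)]


# A hypothetical 4 x 3 one-bit image stored as 12 numbers.
flat = [0, 1, 1, 0,
        1, 0, 0, 1,
        0, 1, 1, 0]

# Print each recovered row, drawing 1 as '#' and 0 as '.'.
for row in to_rows(flat, 4):
    print("".join("#" if v else "." for v in row))
```

With the wrong row width the same numbers would still split into rows, but the picture would be scrambled, which is why the width has to be stored with the image.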
If you zoom way in on a digital image, you will typically see
individual pixels as squares.
The amount of data an image contains is determined by its size,
resolution and bit depth.
Size is simply the height
times the width of the image, measured in inches, centimeters, points or
picas. (This is not to be confused with the image's file size,
measured in KB or MB).
Resolution is pixels per linear inch.
Higher resolution
means squeezing more pixels into the same space.
Bit depth is the number
of bits used to store the color information for each pixel. The
pixel value might refer to an indexed color, or it might code a mix of
red, green and blue values. It would specify a shade of gray in a
grayscale image, or it might simply indicate if the pixel is black or
white.
Brief detour: a binary math primer
A bit (short for "binary digit") is the smallest unit of computer
information. The binary number system has only 2 digits (0,1),
which reference the on-or-off charge states of the fundamental data
elements in computers.
In base 10, a single digit has 10^1 = 10 permutations (0 to 9);
2 digits have 10^2 = 100 permutations (0 to 99);
3 digits have 10^3 = 1,000 permutations (0 to 999) ...and so on.
In base 2, a single bit has 2^1 = 2 permutations: 0 1;
2 bits have 2^2 = 4 permutations: 00 01 10 11;
3 bits have 2^3 = 8 permutations: 000 001 010 011 100 101 110 111 ...and so on.
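The counting above can be checked with a short Python sketch that lists every pattern for a given number of bits. The helper name `bit_patterns` is an illustrative choice, not something from the lecture.

```python
from itertools import product


def bit_patterns(n_bits):
    """Return every n-bit pattern as a string, in counting order."""
    return ["".join(bits) for bits in product("01", repeat=n_bits)]


# Show the patterns for 1, 2 and 3 bits, as in the text.
for n in (1, 2, 3):
    patterns = bit_patterns(n)
    print(n, "bit(s):", len(patterns), "patterns:", " ".join(patterns))
```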
An image with a bit-depth of one has only black or white pixels.
An image with a bit-depth of 8 can have 2^8 = 256 colors or shades
of gray.
An image with a bit-depth of 24 can combine 256 levels of red x 256
levels of green x 256 levels of blue to yield over 16 million
possible colors, which matches or exceeds the color sensitivity of
human vision.
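A quick way to confirm these figures is to compute 2 raised to the bit depth. The function name `color_count` below is only an illustrative label.

```python
def color_count(bit_depth):
    """Number of distinct values a pixel can take at the given bit depth."""
    return 2 ** bit_depth


for depth in (1, 8, 24):
    print(f"{depth:>2}-bit: {color_count(depth):,} possible values")

# 24-bit color is three 8-bit channels multiplied together.
print(256 * 256 * 256 == color_count(24))
```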
So image size, resolution and bit depth jointly determine how the
image looks and how much storage it requires.
A simple formula to determine the size of an image file (uncompressed)
is: height x width x resolution^2 x bit depth = size of image file in bits.
(To convert to KB, there are 8 bits in a byte and 2^10 = 1024 bytes in a
kilobyte, hence 8,192 bits/KB.)
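The formula translates directly into a few lines of Python. This is a sketch: the function name `uncompressed_kb` and the 4 x 6 inch example image are assumptions chosen for illustration.

```python
BITS_PER_KB = 8 * 1024  # 8 bits per byte, 1024 bytes per kilobyte


def uncompressed_kb(height_in, width_in, dpi, bit_depth):
    """Uncompressed image size in kilobytes for a print size given in inches."""
    bits = height_in * width_in * dpi ** 2 * bit_depth
    return bits / BITS_PER_KB


# Hypothetical example: a 4 x 6 inch, 300 dpi, 24-bit color image.
print(f"{uncompressed_kb(4, 6, 300, 24):,.0f} KB")
```

Because resolution is squared, doubling the dpi quadruples the file size, while doubling the bit depth only doubles it.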
Bit-depth vs. resolution

Compare the two equal-sized images here. The upper
left picture is a 1-bit image with
a resolution of
2000 dots per inch. Each tiny pixel is either white or black.
The upper right picture is an 8-bit image with
a resolution of 200 dots per inch. Its pixels are much larger
but have 256 shades of gray.
The two
details showing the guy's left eye and glasses frame illustrate the
difference.
The overall image quality is about the same, but the
first image file is over 12 times larger than the second.
Without any compression,
the 1-bit 2,000-dpi image is (2 x 2 x 2000^2 x 1)/8192 = 1953
KB, while the 8-bit 200-dpi image is (2 x 2 x 200^2 x 8)/8192
= 156 KB.
When processing images, keep your ultimate size and resolution objectives
in mind. For example, if you are scanning a 3 x 5 inch photo for eventual
printing as a 6 x 10 inch hardcopy image on a 300 dpi printer, there is no
advantage in scanning the source image at more than 600 dpi. When you resize
the image from 3 x 5 to 6 x 10, the resolution will change from 600 dpi to
300 dpi. Your scanning resolution should match the final dpi times the
ratio of the final size to the original size.
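The scanning rule can be written as a one-line function. The name `scan_dpi` and its parameter names are illustrative. Sizes are measured along the same edge of the image, in the same unit.

```python
def scan_dpi(final_dpi, original_size, final_size):
    """Scanning resolution needed so the resized print still has final_dpi.

    original_size and final_size are lengths of the same edge, same unit.
    """
    return final_dpi * final_size / original_size


# The 3 x 5 inch photo printed at 6 x 10 inches on a 300 dpi printer.
print(scan_dpi(300, original_size=3, final_size=6))  # 600.0
```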
An image's size and resolution are inversely related:
enlarging the image to double its original size
reduces its original resolution by half.
Excessive enlargement will reveal the pixel artifacts ("jaggies").
Some color theory
There are three color-space models you should be aware of.
The RGB color space is represented here as a three dimensional
cube with red, green and blue axes.
Colors are defined as (x,y,z) coordinates in the cube
where black is (0,0,0) and white is (255,255,255).
All other colors are represented by their coordinates within the cube.
There is a "gray line" diagonal through the center of the cube
that goes from black at the origin to white at the far corner.
The RGB color space is called additive
because you add various intensities of red, green and blue to black to
get a color. RGB is the standard color space used in computer displays,
scanners and film recorders--devices where the
default is black.
The inverse of the RGB model is the CMY or CMYK
(cyan-magenta-yellow plus black) model.
This is a subtractive color model: colors are defined by
subtracting values of cyan, magenta and yellow (the complements of red,
green and blue) from white.
This model can also be visualized as a cube with C, Y and M axes;
the origin is white; the gray line traces the diagonal from the
origin through the cube to the black far corner.
CMYK is used in printing where the default background
(paper) is white.
The K in CMYK stands for black. CMYK printers typically substitute
cheap black ink for equal proportions of expensive CMY inks when printing
color blends.
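The relationship between the two models can be sketched in Python. This is a deliberately simplified conversion, assuming 8-bit values and full black substitution. Real printer drivers rely on color profiles, and the function name `rgb_to_cmyk` is just an illustrative label.

```python
def rgb_to_cmyk(r, g, b):
    """Convert 8-bit RGB to 8-bit (C, M, Y, K) with simple black substitution."""
    # Each subtractive primary is the complement of an additive one.
    c, m, y = 255 - r, 255 - g, 255 - b
    # The amount common to all three inks is printed with black instead.
    k = min(c, m, y)
    return c - k, m - k, y - k, k


print(rgb_to_cmyk(255, 0, 0))      # pure red: magenta plus yellow, no black
print(rgb_to_cmyk(128, 128, 128))  # mid gray: black ink only
```

Any point on the gray line (equal R, G and B) ends up with no colored ink at all, which is the ink-saving substitution described above.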
The HIS
color space defines colors based on their Hue (dominant color),
Intensity
(value or brightness) and Saturation (color purity).
This model is sometimes called HSV (Hue-Saturation-Value).
The HIS color space is represented here as a cone. The gray line runs
through the center of the cone from black (0) at the origin (tip)
to white (255) at the end (base).
A color's Intensity (aka value) is its distance along the gray line:
each cross-section of the cone
is a color wheel of uniform intensity.
A color's Hue is its
angular distance from red (0) through green (86) and blue (170) to red
again (255) around the color wheel.
Its Saturation is its relative
distance from the grey center of its color wheel (0) where R, G and B
are
in equal proportions, to the perimeter of the wheel (255) where the
color
is a pure primary or a mix of just two primaries.
Most image processing software uses the RGB model to manipulate
images.
The HIS model is just an alternative set of dimensions for the
same color space.
(One application of the HIS model involves creating a hillshaded
digital elevation map where the DEM determines the hue and the
hillshade map determines the intensity.)
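To see that these are just different coordinates for the same colors, the sketch below converts RGB values with Python's standard `colorsys` module. It uses the HSV variant as a stand-in for the model described above. The wrapper name `rgb_to_hsv255` and the 0-255 scaling are assumptions made to match the ranges quoted in the text.

```python
import colorsys


def rgb_to_hsv255(r, g, b):
    """Convert 8-bit RGB to (hue, saturation, value), each scaled to 0-255."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return round(h * 255), round(s * 255), round(v * 255)


for name, rgb in [("red", (255, 0, 0)), ("green", (0, 255, 0)),
                  ("blue", (0, 0, 255)), ("gray", (128, 128, 128))]:
    print(name, rgb_to_hsv255(*rgb))
```

With this scaling green lands at hue 85 rather than 86. The exact number depends on how a given program maps the hue angle onto 0-255. Gray has zero saturation, as expected for a point on the gray line.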
Digital cameras and scanners
A charge-coupled device (CCD) is a silicon semiconductor that acts
as
a light detector. When light hits the crystalline silicon in the
device, the electrons in the silicon become excited and create an
electrical
charge proportional to the amount of light (or the number of photons)
that the silicon is exposed to.
Digital cameras use a grid (2 dimensional array) of crystalline
silicon
CCD's. The entire array is exposed at once and each cell in the
grid
captures a value which becomes a pixel or dot in the digital image.
Three filters (red, green and blue) capture the intensity of each band of
light creating a composite color image. The size of the grid (number
of megapixels) defines
the maximum resolution of the camera.
The intensity of light hitting each CCD through each filter
is recorded as an
8-bit number (0-255).
Each pixel's red, green and blue
values are combined to yield a 24 bit color value.
The picture is then stored on some
removable media (e.g., flash card or SD card).
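Combining three 8-bit readings into one 24-bit value is simple bit arithmetic. The sketch below assumes red occupies the high byte. That ordering is an assumption, since cameras and file formats differ, and the function names are illustrative.

```python
def pack_rgb(r, g, b):
    """Pack three 8-bit channel values into one 24-bit integer (red high)."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must be in 0-255")
    return (r << 16) | (g << 8) | b


def unpack_rgb(value):
    """Split a 24-bit color value back into its (r, g, b) channels."""
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


orange = pack_rgb(255, 165, 0)
print(f"{orange:06X}", unpack_rgb(orange))
```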
Desktop scanners use a linear array of CCD's that passes under the
face-down analog image, recording it as pixel values one
line at a time. The resolution of a scanner is measured by the optical
resolution of the CCD on the horizontal and by the speed and accuracy
of
the motor that controls the linear array on the vertical side. The
important
number to consider when looking at scanner resolution is the optical
resolution; many scanners can double or quadruple their effective
resolutions by interpolating pixels between recorded pixel
values. Some scanners use a white light source and three (RGB)
filters to capture the image in one pass.
Others make three passes, one for each of three primary colors.
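The interpolation mentioned above can be illustrated with a short sketch. It assumes the simplest scheme, averaging each pair of neighboring recorded values. Actual scanner software may use more elaborate methods, and the name `interpolate_row` is made up for this example.

```python
def interpolate_row(samples):
    """Insert one averaged pixel between each pair of recorded pixels."""
    if not samples:
        return []
    out = []
    for left, right in zip(samples, samples[1:]):
        out.append(left)
        out.append((left + right) // 2)  # invented in-between pixel
    out.append(samples[-1])
    return out


print(interpolate_row([0, 100, 200]))  # [0, 50, 100, 150, 200]
```

The output has nearly twice as many pixels, but the added values are computed rather than measured, which is why the optical resolution is the number that matters.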
Image editing
A digital image program is used to edit image files.
The simplest editor is the old Paint program that comes
with MS-Windows, although its capabilities are limited and its default
BMP formats (see below) are not particularly efficient.
You can learn a lot of digital image theory from using a fancier image
editor such as GIMP (free from www.gimp.org).
The GIS lab usually has Gimp installed on its machines, and
I encourage you to download and install it on your own.
It is a great alternative to Adobe Photoshop.
Like Photoshop,
GIMP lets you edit multiple
layers of an image, and supports a large number of filters and other
enhancements.
Rule #1: Always edit your
images in RGB mode (24-bit depth) rather than Indexed Color mode
(8-bit depth). If you are editing a GIF, convert it
to RGB for editing, then convert it back to Indexed when saving your
edits.
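The difference between the two modes can be sketched in Python. In indexed mode each pixel is a small number that looks up a color in a table, while in RGB mode each pixel carries its own color. The function names and the three-color palette below are hypothetical, and this sketch does not perform the color reduction that an editor such as GIMP applies when an image has more than 256 colors.

```python
def to_rgb(indexed_pixels, palette):
    """Replace each palette index with its (r, g, b) color."""
    return [palette[i] for i in indexed_pixels]


def to_indexed(rgb_pixels):
    """Build a palette from the distinct colors and index each pixel into it."""
    palette = []
    lookup = {}
    indexed = []
    for color in rgb_pixels:
        if color not in lookup:
            if len(palette) == 256:
                raise ValueError("more than 256 distinct colors; reduce colors first")
            lookup[color] = len(palette)
            palette.append(color)
        indexed.append(lookup[color])
    return indexed, palette


# Hypothetical palette: white, black and a dark red.
palette = [(255, 255, 255), (0, 0, 0), (200, 30, 30)]
pixels = [0, 0, 2, 1]

rgb = to_rgb(pixels, palette)
indexed, new_palette = to_indexed(rgb)
print(rgb)
print(indexed, new_palette)
```

Editing in RGB mode means filters can produce any in-between color. Converting back to indexed mode is the step where colors have to fit into the table again.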
Although web download speeds have increased dramatically over the
past decade, you should still try to keep your web image file
sizes reasonable.
Larger image files take longer to download because there is more
data to transfer and more packet assembly required at the client end.
People quickly get impatient with
slow downloads, and may not wait around for oversized images.
An editing example:
The image below is a .PNG exported from ArcScene:
I used some standard GIMP tools to obtain the final image below:
autocrop to eliminate extraneous white space; increase overall brightness
and contrast; select the reds which were still too dark, grow the
selection to cover most of the piedmont area, feather the selection
so changes in the selected area will blend well at the edges, and
brighten and sharpen the selected part of the image; add the image
title with anti-aliasing to smooth the text; add
an alpha channel for transparency, then select and
delete the white background to make it transparent, and re-save.
Image file formats
The Web supports three image file formats: GIF, PNG and JPEG.
- The GIF (Graphics Interchange Format) format uses "indexed"
8-bit color and image resolution of 72 dpi. Up to 256 colors, one
for each pixel value, are defined in a color table at the beginning of the
file.
The remainder of the file is pixel values referencing these colors.
The GIF format supports animation by timed display of a series of
GIFs to create a cartoon like effect. GIFs can also be interlaced
so that when they are downloaded they display every other line and then
go back and fill in the missing lines. This makes the image seem
to appear faster. GIF files can also include specified
transparent
colors, so that you can blend a GIF's background into the page.
GIF uses a lossless compression algorithm known as
LZW (the initials of its inventors, Lempel, Ziv and Welch); a patent on LZW,
held by Unisys, was the basis for licensing fees on the GIF format.
LZW replaces repeated sequences of pixel values with short codes, so its
effect resembles run-length coding, which basically abbreviates
"00000000001110000000000" to "10x0 3x1 10x0." This
compression strategy is most efficient for images that have
long (horizontal) runs of uniform color values, but it is not very
efficient for most photographs.
The LZW patent has
expired, and GIF is still widely used.
-
The PNG format was developed as a web-compatible, patent-free substitute
for GIF after software developers were asked to pay
licensing fees for the patented compression used in GIF. PNG has most of the
functionality of GIF (except animation), better file compression, and support for 24-bit color.
PNG is generally the best format for maps posted on the web.
It offers better color fidelity than GIF and preserves the crispness of
borders better than JPEG.
-
The JPEG (Joint Photographic Experts Group) format is generally most
efficient for storing photographic images. The JPEG coding
process sections the image
into 8 x 8 blocks of pixels, and calculates cosine transforms that
approximate
the intensity and hue shifts within each block. The image file
just
stores the transform coefficients, not the original pixel values, so
the JPEG decoding process produces an image that only approximates the
original. So this is a "lossy" format, but its compression
efficiency can be very high.
JPEG's can be saved in various levels of image
quality, substituting compression efficiency for additional transform
information that retains image quality.
Low-quality JPEG's with maximum compression often exhibit discernible
"smears" on the edges of image features, and the 8x8 pixel blocks may
be annoyingly obvious.
The quality of an image can really suffer from cumulative
information loss if you edit and resave it in JPEG format multiple
times. If you think you may have to re-edit an image, you should
keep it in a lossless format such as BMP or even GIF.
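Returning to the run-length abbreviation shown in the GIF entry above, the idea can be sketched in a few lines of Python. This illustrates plain run-length coding only. It is not a GIF or LZW encoder, and the function name `run_length_encode` is an illustrative choice.

```python
from itertools import groupby


def run_length_encode(bits):
    """Encode a string as 'COUNTxSYMBOL' tokens, one token per run."""
    return " ".join(f"{len(list(run))}x{symbol}" for symbol, run in groupby(bits))


print(run_length_encode("00000000001110000000000"))  # 10x0 3x1 10x0
print(run_length_encode("0101"))                      # 1x0 1x1 1x0 1x1
```

The second call shows why this strategy suits flat-colored graphics better than photographs: when values change at almost every pixel, the "compressed" form is longer than the original.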
Other image formats are not directly supported by the Web, but may be
useful in other contexts:
- BMP (Microsoft Windows Bitmap) formats are recognized by all Windows
programs and most other PC applications. The format supports
multiple bit depths: 1-bit, 4-bit, 8-bit and 24-bit. But since
BMP images have no compression their file sizes are often 10+ times as
big as equivalent GIF or JPEG images.
- TIFF (Tagged Image File Format) is an old format originally created by
the
Aldus and Microsoft Corporations to store scanned images. There are
actually
many types of TIFF format; most platforms recognize the standard
types.
- Postscript (PS) and Encapsulated Postscript (EPS) are mixed
formats that encode both raster and vector graphics. These were
developed by Adobe, and are precursors of Adobe's Acrobat format.
- The PBM (Portable Bitmap) format was developed by Jeff
Poskanzer as a generic intermediary UNIX format for translating images
between formats with his Portable Bitmap Tools.
Rather than create N x (N-1) direct format translators for N image
formats, the PBM library has 2N translators for 2-step conversions
through PBM formats.
To convert a TIFF to a GIF, for example, you would use tifftopnm and
then ppmtogif.
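Part of what makes PBM a convenient intermediary is that its plain-text variant is simple enough to write by hand. The sketch below assumes the plain "P1" layout: a magic number, then the width and height, then one 0 or 1 per pixel, with 1 meaning black. The function name `to_plain_pbm` and the small cross-shaped image are made up for illustration.

```python
def to_plain_pbm(rows):
    """Return a plain (ASCII, 'P1') PBM document for rows of 0/1 pixels."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same width")
    lines = ["P1", f"{width} {height}"]
    # One text line per image row; 1 is black and 0 is white.
    lines += [" ".join(str(int(bool(p))) for p in row) for row in rows]
    return "\n".join(lines) + "\n"


# A hypothetical 3 x 3 image of a black cross on white.
print(to_plain_pbm([[0, 1, 0],
                    [1, 1, 1],
                    [0, 1, 0]]), end="")
```

Saving that text to a file with a .pbm extension should give an image that the Portable Bitmap Tools, and editors such as GIMP, can open.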
Exporting map images
You should generally save your maps in PNG
format for display in your project web pages.
There are various ways of exporting maps, charts and layouts from
ArcMap as web page images:
- File--Export to create
an image file of your current ArcMap data or layout view.
Adjust the image resolution to
control the size of the image. Image heights and widths between
300 and 1,000 pixels are generally best for the web.
The exported image will have the same aspect ratio (height/width)
and whitespace as the Arc map frame, so you should size the map frame and
position the map appropriately before exporting an image of it.
Alternately, you can crop the image afterward with Gimp.
- Edit--Copy Map to Clipboard copies
your map to the Windows clipboard for pasting into any graphic editing
package such as Gimp or the Windows Paint
program. Edit as needed and save in PNG format. Paint only
lets you save a pasted image in BMP format, but when you reload it, you
can save it as a PNG.
- Use Alt-PrintScreen (screen-dump) to copy an entire ArcMap window
to the Windows clipboard for pasting into an external editor.
A word about copyright law
Pulling images, videos, music and text off the web is ridiculously
easy, but most web materials are covered by copyright.
Copyright law protects the rights of the creator to control
how his or her creative work is used for a specific time period,
typically 50 to 70 years beyond the author's death.
After the copyright has expired, the work enters the "public domain"
and may be freely used by anyone.
Most nations have signed
the Berne copyright conventions which establish automatic
copyright for the creator of any creative work as soon as it is
created. The creator does not have to give public notice of
copyright claim.
Copyright law does permit limited "fair use" of copyrighted materials:
the fair use doctrine allows you to quote or distribute parts of
copyrighted
materials for academic, journalistic or satirical purposes.
The best way to avoid copyright violation is to create your own
stuff, use public domain materials, or look for materials
licensed for free use. For example, the GNU Project distributes
contributors' "copyleft" software for free.