Overheads for Unit 10—Chapter 19 (Interpreting Standardized Test Scores)

OH 1

The Challenge

Technical Challenge

Educational and psychological measures not like pounds or inches

No zero point
Units of measurement not equal

Methods have been developed to cope with these limitations

By providing meaningful frames of reference for interpreting scores
By providing ways that give equal units of measurement
By providing ways to compare and add very different kinds of scores

Professional Standard

“Should be able to interpret commonly reported scores: [such as] percentile ranks, percentile band scores, standard scores, and grade equivalents.” (Standard 3 for Teacher Competence in Educational Assessment)

OH 2

Methods of Interpreting Test Scores

Raw score

· Number of points earned when the test is scored according to its scoring directions

Has no inherent meaning (neither does % correct)

Criterion-referenced and standards-based interpretations

Definition

student’s score relates to clear description of specific tasks a student can perform
those tasks, in turn, related to specified standards of mastery
no need to consider other students’ scores

Most useful when test designed for this purpose

set of clearly stated learning objectives
enough items to infer degree of mastery or non-mastery of that domain
items selected to actually measure that domain

Guidelines for when you can (cautiously) interpret norm-referenced tests in criterion-referenced terms

achievement domains (e.g., objectives) are homogeneous, delimited, and clearly specified?

· if not, avoid specific descriptive statements

enough items (say, 10) for each type of interpretation?

· if not, combine items into larger clusters or make only tentative judgments

easy items were omitted to increase discrimination?

· if so, then scores won’t describe what low achievers can do

used selection-type items only?

· if so, then scores influenced by guessing

test items provide directly relevant measure of the objectives?

· if not, base interpretations on what they do measure

Norm-Referenced Interpretation

Definition

student’s score relative to other students (in a norm group)
norm group is carefully defined
no need to look at level of mastery

Derived scores

definition: raw scores converted into numbers that have meaning within a particular comparison group
derived scores needed because simple rankings have limited value
most common types: grade equivalents, percentiles, standard scores
simple to calculate and conversion tables often provided
many types are standard scores (e.g., T-scores, NCE, standard age scores), based on same logic using the normal curve
other types of developmental scales besides GE (e.g., age-equivalents)

Expectancy Tables (chapter 4)

Definition: two-way chart that shows how often students at each score level on one measure (say, SAT math) perform at each level on another valued performance (say, freshman grades in college)
Don’t need any norms
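As a minimal sketch, assuming both measures have already been grouped into score bands and outcome levels, an expectancy table is a cross-tabulation expressed as row percentages. The `expectancy_table` helper and the paired data are hypothetical, for illustration only.

```python
from collections import Counter

def expectancy_table(pairs, predictor_bands, outcome_levels):
    """Return {band: {level: percent}} from (predictor_band, outcome_level) pairs."""
    counts = Counter(pairs)
    table = {}
    for band in predictor_bands:
        total = sum(counts[(band, level)] for level in outcome_levels)
        # Row percentages: of students in this band, what share reached each level
        table[band] = {
            level: (100 * counts[(band, level)] / total if total else 0.0)
            for level in outcome_levels
        }
    return table

# Hypothetical data: (SAT math band, freshman grade)
pairs = [
    ("600+", "A"), ("600+", "A"), ("600+", "B"), ("600+", "C"),
    ("500-599", "A"), ("500-599", "B"), ("500-599", "B"), ("500-599", "C"),
    ("<500", "B"), ("<500", "C"), ("<500", "C"), ("<500", "D"),
]
bands = ["600+", "500-599", "<500"]
levels = ["A", "B", "C", "D"]
table = expectancy_table(pairs, bands, levels)
for band in bands:
    print(band, [f"{table[band][lvl]:.0f}%" for lvl in levels])
```

Each row answers "of the students in this score band, what percentage earned each grade?" Only the paired scores are needed; no norm group is involved.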

OH 3

Grade Equivalent Scores

Description

· Definition: the grade level at which the typical student obtains that raw score

· Sample interpretation: “student had the same raw score that was average for students in grade 5.6 in the average school”

· A typical score is determined for each month of a grade (e.g., 5.0 through 5.9 for grade 5)

· Tables provided, so just look up what grade level corresponds to a student’s raw score

· Widely used, especially in elementary school
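The table lookup can be sketched as follows. The cutoffs are hypothetical (a real publisher's table lists every raw score for the specific test, form, and level), and `grade_equivalent` is an illustrative helper, not any publisher's tool.

```python
import bisect

# Hypothetical conversion table: lowest raw score that earns each grade equivalent
RAW_CUTOFFS = [10, 14, 18, 21, 24, 27, 29, 31]
GRADE_EQUIVALENTS = [3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5]

def grade_equivalent(raw_score):
    """Look up the GE for a raw score; None if it falls below the table."""
    i = bisect.bisect_right(RAW_CUTOFFS, raw_score) - 1
    if i < 0:
        return None  # below the lowest tabled raw score
    # Scores above the top cutoff simply get the top GE here; real tables
    # often extrapolate at the extremes, which makes those GEs undependable.
    return GRADE_EQUIVALENTS[i]

print(grade_equivalent(25))
```

In this made-up table, three raw-score points separate 4.5 from 5.0 but only two separate 6.0 from 6.5, a small illustration of why GE units should not be assumed equal.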

Widely Misinterpreted!

Don’t confuse GE norms with standards that all students should attain
Don’t interpret a GE as an estimate of the grade a student should be placed in
Don’t expect all students to gain 1.0 GE each year (the average). Not a realistic goal
Don’t assume that the units are equal at different parts of the scale (the same difference can mean “just above” or “vastly above” average)
Don’t assume that scores on different tests are comparable

Some publishers norm their tests on fuller ranges of students than others do
Patterns of growth (variance in scores) may differ across subjects

Don’t interpret extreme scores as dependable estimates of a student’s performance (they are usually extrapolated)

Usefulness

Most useful in reporting growth in basic skills in elementary school
Least useful for comparing performance on different tests
Inequality in grade units will muddle interpretation if you don’t keep it clearly in mind

OH 4

Percentile Rank

Description

· Definition: the percentage of students in the norm group scoring below a particular raw score (relative position in the group)

· Widely used and easily understood
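A minimal sketch of the definition above, counting the percentage of a hypothetical norm group that scored below a given raw score. Many publishers instead use a midpoint convention (those below plus half of those at the score), so published tables may differ slightly from this.

```python
def percentile_rank(raw_score, norm_scores):
    """Percentage of the norm group scoring below raw_score."""
    below = sum(1 for s in norm_scores if s < raw_score)
    return 100 * below / len(norm_scores)

norm_group = [12, 15, 15, 18, 20, 21, 21, 23, 25, 28]  # hypothetical norm group
print(percentile_rank(21, norm_group))  # 50.0
```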

Requirements for use

A conversion table (from raw scores to percentiles) based on a norm group
A norm group (conversion table) that is appropriate for the students taking the test: grade or age, time of year
A norm group (conversion table) that is also specific to the exact test being given: test, subtest, form or (difficulty) level of the test
Many tests or student groups means many conversion tables
Different purposes (comparisons of same child with different groups) require different norms

Limitations

Must always refer to a student’s percentile rank as relative to a particular norm group
Usually require multiple sets of norms, especially in high school and beyond
Units not equal, especially at the extremes

· Pattern of inequality is predictable, however

· Same percentile difference (say, 5 points) reflects a much bigger difference in performance at the extremes than near the average (recall the shape of the normal curve)
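Assuming scores are normally distributed, the inequality can be checked by converting percentile ranks back to SD units with Python's `statistics.NormalDist`. The `z_gap` helper is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

def z_gap(p_low, p_high):
    """Distance in SD units between two percentile ranks on a normal curve."""
    return std_normal.inv_cdf(p_high / 100) - std_normal.inv_cdf(p_low / 100)

print(round(z_gap(50, 55), 2))  # near the average: 0.13
print(round(z_gap(90, 95), 2))  # near the top: 0.36
```

The same 5-point percentile difference spans almost three times as much of the score scale near the top as it does near the middle.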

OH 5

Standard Scores

Definition

Standard score—how far above or below average a student scored
Distance is calculated in standard deviation (SD) units (a standard deviation is a measure of spread or variability)
The mean and standard deviation are for a particular norm group

Advantages

Based on the “normal curve,” which means that:

Scores are distributed symmetrically around the mean (average)
Each SD represents a fixed (but different) percentage of cases
Almost everyone (about 99.7%) falls between -3.0 and +3.0 SDs of the mean
The SD allows conversion of very different kinds of raw scores to a common scale that (a) has equal units and (b) can be readily interpreted in terms of the normal curve
When we can assume that scores follow a normal curve (classroom tests usually don’t, but standardized tests usually do), we can translate standard scores into percentiles. Very useful!
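A small sketch of the normal-curve translation, assuming the scores really are normally distributed. It uses the standard normal cumulative distribution from Python's `statistics.NormalDist`; `z_to_percentile` is an illustrative name.

```python
from statistics import NormalDist

std_normal = NormalDist()

def z_to_percentile(z):
    """Percentile rank implied by a z-score, assuming a normal distribution."""
    return 100 * std_normal.cdf(z)

# Share of cases within 1, 2, and 3 SDs of the mean
for k in (1, 2, 3):
    share = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within ±{k} SD: {share:.1%}")  # 68.3%, 95.4%, 99.7%

print(round(z_to_percentile(1.0)))  # 84
```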

OH 6

Types of Standard Scores

All Standard Scores

Share a common logic
Can be translated into each other (see figure 19.2, p. 494)

z-Score

Simplest
The one on which all others based
Formula: z = (X - M)/SD, where X is the person’s score, M is the group’s average, and SD is the group’s spread (standard deviation of scores)
z is negative for below-average scores, so z-scores are usually converted into some other system that uses only positive numbers
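The formula translates directly into code. This sketch assumes the population SD (`statistics.pstdev`) of a small hypothetical norm group; with large norm samples, dividing by n or n - 1 makes little practical difference.

```python
from statistics import mean, pstdev

def z_score(x, group_scores):
    """z = (X - M) / SD, with M and SD computed from the norm group."""
    return (x - mean(group_scores)) / pstdev(group_scores)

group = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical norm group: M = 5, SD = 2
print(z_score(9, group))  # 2.0
print(z_score(3, group))  # -1.0 (below average, so negative)
```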

T-Score

Normally distributed standard scores
M=50, SD=10
Can be obtained from z scores: T = 50 + 10(z)
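A one-line sketch of the conversion above, showing how the T-scale removes negative values. `t_score` is an illustrative name.

```python
def t_score(z):
    """T = 50 + 10z."""
    return 50 + 10 * z

for z in (-1.5, 0.0, 2.0):
    print(z, t_score(z))  # 35.0, 50.0, 70.0
```

Because almost everyone falls within ±3 SDs, T-scores nearly always land between 20 and 80.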

Normalized Standard Scores

Start with scores that you want to make conform to the normal curve
Get percentile ranks for each score
Transform percentiles into z scores using a conversion table (I handed one out in class)
Then transform into any other standard score you want (e.g., T-score, IQ equivalents)
Hope that your assumption was right, namely, that the scores really do naturally follow a normal curve. If they don’t, your interpretations (say, of equal units) may be somewhat mistaken
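The steps above can be sketched in Python, with `NormalDist().inv_cdf` standing in for the printed percentile-to-z conversion table. One assumption: percentile ranks use the midpoint convention (below plus half of those at the score) so that no rank is exactly 0 or 100, which have no finite z. The raw scores are hypothetical.

```python
from statistics import NormalDist

def normalized_t_scores(scores):
    """Map raw scores to normalized T-scores via percentile ranks."""
    n = len(scores)
    inv = NormalDist().inv_cdf
    result = {}
    for x in sorted(set(scores)):
        below = sum(1 for s in scores if s < x)
        at = sum(1 for s in scores if s == x)
        pr = (below + at / 2) / n        # step 1: midpoint percentile rank (0 < pr < 1)
        z = inv(pr)                      # step 2: percentile -> z via the normal curve
        result[x] = round(50 + 10 * z)   # step 3: z -> T-score
    return result

skewed = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]  # hypothetical, negatively skewed raw scores
print(normalized_t_scores(skewed))  # {1: 34, 2: 42, 3: 49, 4: 58}
```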

Stanines

Very simple type of normalized standard score
Ranges from 1-9 (the “standard nines”)
Each stanine from 2-8 covers ½ SD
Stanine 5 = percentiles 40-59 (the middle 20 percent)
A difference of 2 stanines usually signals a real difference
Strengths

1. easily explained to students and parents

2. normalized, so can compare different tests

3. can add stanines to get a composite score

4. easily recorded (only one column)

Limitations

1. like all standard scores, cannot record growth

2. crude, but prevents overinterpretation
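Stanines follow the familiar 4-7-12-17-20-17-12-7-4 percent split, so converting a percentile rank is a cutoff lookup. This sketch also adds stanines into a simple composite; the subtest names and ranks are hypothetical.

```python
import bisect

# Lowest percentile rank in stanines 2 through 9
STANINE_CUTS = [4, 11, 23, 40, 60, 77, 89, 96]

def stanine(percentile_rank):
    """Stanine (1-9) for a percentile rank (0-99)."""
    return bisect.bisect_right(STANINE_CUTS, percentile_rank) + 1

# Hypothetical percentile ranks on three subtests for one student
ranks = {"reading": 45, "math": 82, "language": 15}
stanines = {name: stanine(pr) for name, pr in ranks.items()}
print(stanines, "composite:", sum(stanines.values()))
```

With these cutoffs, percentile ranks 40 through 59 all map to stanine 5, matching the middle 20 percent.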

Normal-Curve Equivalents (NCE)

Normally distributed standard scores
M=50
SD=21.06
Results in scores that go from 1-99
Like percentiles, except that they have equal units (compared with percentiles, this means they make fewer distinctions in the middle of the curve and more at the extremes)
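A sketch, assuming normally distributed scores, that converts percentile ranks to NCEs using M = 50 and SD = 21.06. It shows that the two scales agree at 1, 50, and 99 but diverge in between.

```python
from statistics import NormalDist

def percentile_to_nce(pr):
    """NCE = 50 + 21.06z, where z is the normal deviate for the percentile rank."""
    z = NormalDist().inv_cdf(pr / 100)
    return 50 + 21.06 * z

for pr in (1, 25, 50, 75, 99):
    print(pr, round(percentile_to_nce(pr), 1))  # 1.0, 35.8, 50.0, 64.2, 99.0
```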

Standard Age Scores (SAS)

Normally distributed standard scores
Put into an IQ metric, where
M=100
SD=15 (Wechsler IQ Test) or SD=16 (Stanford-Binet IQ Test)
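Because the two conventions use different SDs, the same relative standing yields different numbers. A minimal sketch (`sas` is an illustrative name):

```python
def sas(z, sd=15):
    """Standard age score in an IQ metric: mean 100, SD given by the test."""
    return 100 + sd * z

z = 2.0
print(sas(z, sd=15), sas(z, sd=16))  # 130.0 132.0
```

Always check which SD a test uses before comparing scores across tests.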

OH 7

Converting among Standard Scores

Easy Convertibility

All are different ways of saying the same thing
All represent equal units at different ranges of scores
All can be averaged (among themselves)
Can easily convert one into the other
Figure 19.2 on p. 494 shows how they line up with each other
But interpretable only when scores are actually normally distributed (standardized tests usually are)
Downside—not as easily understood by students and parents as are percentiles
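Since each scale here is a linear rescaling of z, converting between them always goes through z. This sketch assumes SAS uses SD = 15 and leaves out stanines, which are grouped bands rather than a linear rescaling. The `SCALES` table and `convert` helper are illustrative.

```python
# (mean, SD) for each scale; all are linear rescalings of z
SCALES = {"z": (0, 1), "T": (50, 10), "NCE": (50, 21.06), "SAS": (100, 15)}

def convert(score, from_scale, to_scale):
    """Convert a standard score between scales by way of z."""
    m_from, sd_from = SCALES[from_scale]
    m_to, sd_to = SCALES[to_scale]
    z = (score - m_from) / sd_from
    return m_to + sd_to * z

print(convert(60, "T", "SAS"))            # 115.0
print(round(convert(60, "T", "NCE"), 2))  # 71.06
```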

OH 8

Using Standard Scores to Examine Profiles

Uses

You can compare a student’s scores on different tests and subtests when you convert all the scores to the same type of standard score
But all the tests must use the same norm group
Plotting profiles can show their relative strengths and weaknesses
Should be plotted as confidence bands to show the margin of error
Interpret scores as different only when their bands do not overlap
Sometimes plotted separately for males and females (say, on vocational interest tests), but this is a controversial practice
Tests sometimes come with tabular or narrative reports of profiles (see p. 496)
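The band-overlap rule can be sketched as follows, assuming bands of ±1 SEM (publishers may use other band widths) and subtest scores on one common standard-score scale from the same norm group. The scores and SEMs are hypothetical.

```python
def band(score, sem):
    """Confidence band of plus or minus 1 SEM around a score."""
    return (score - sem, score + sem)

def bands_overlap(a, b):
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical subtest T-scores and SEMs, all from the same norm group
profile = {"reading": (58, 3), "math": (49, 4), "science": (55, 3)}
bands = {name: band(score, sem) for name, (score, sem) in profile.items()}

names = list(bands)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        verdict = "no clear difference" if bands_overlap(bands[a], bands[b]) else "likely real difference"
        print(f"{a} vs {b}: {verdict}")
```

Here only reading versus math (bands 55 to 61 and 45 to 53) would be read as a real difference.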

OH 9

Using Standard Scores to Examine Mastery of Skill Types

Some standardized tests try to provide some criterion-referenced information by providing scores on specific sets of skills (see Figure 19.4 on p. 498)
Be very cautious with these—use them as clues only, because each skill area typically has very few items

OH 10

Judging the Adequacy of Norms for Standard Scores

Remember Your Aim!

To interpret performance relative to a well-defined reference group

Criteria for Judging Norms

Relevant

Is this particular norm group appropriate for (a) the decision you want to make and (b) the set of students involved?

Representative

Was the norm group created with a random sample or stratified random sample? Does it match census figures (by race, sex, age, location, etc.) for the general population being considered?

Up-to-date

Don’t rely on the copyright date of the test manual. Read the manual to see how old the norms are
Beware of the Lake Wobegon effect (outdated norms can make most students look above average)!

Comparable

· If you want to compare scores on tests with different norm groups, check the test manuals to see how comparable the groups are

Adequately described—look for:

· Method of sampling

· Number and distribution of cases in the norming sample

· Age, race, sex, geography, etc. of norm sample

· Extent to which standardized conditions were maintained in testing

· Prefer tests whose norms are described in more detail

OH 11

Cautions in Interpreting Standardized Test Scores

Scores should be interpreted:

With clear knowledge about what the test measures. Don’t rely on titles; examine the content (breadth, etc.)
In light of other factors (aptitudes, educational experiences, cultural background, health, motivation, etc.) that may have affected test performance
According to the type of decision being made (high or low for what?)
As a band of scores rather than a specific value. Always subtract 1 SEM from and add 1 SEM to the obtained score to get a range, which helps avoid overinterpretation
In light of all your evidence. Look for corroborating or conflicting evidence
Never rely on a single score to make a big decision
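When a manual reports reliability but not the SEM, the classical test theory formula SEM = SD × √(1 − reliability) gives it. This sketch builds a ±1 SEM band from hypothetical values (SD 15, reliability .91, obtained score 106). Under the usual assumptions, such a band captures the true score about two times in three.

```python
import math

def sem(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def score_band(observed, sd, reliability, k=1):
    """Observed score plus or minus k SEMs."""
    e = k * sem(sd, reliability)
    return observed - e, observed + e

# Hypothetical values: SD 15, reliability .91, obtained score 106
low, high = score_band(106, 15, 0.91)
print(f"SEM = {sem(15, 0.91):.1f}; band = {low:.1f} to {high:.1f}")  # SEM = 4.5; band = 101.5 to 110.5
```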