Overheads
for Unit 10—Chapter 19 (Interpreting Standardized Test Scores)
OH 1
The Challenge
Technical Challenge
Educational and psychological measures are not like pounds or inches
- No zero point
- Units of measurement not equal
Methods have been developed to cope with this limitation
- By providing meaningful frames of reference for interpreting scores
- By providing ways that give equal units of measurement
- By providing ways to compare and add very different kinds of scores
Professional Standard
“Should be able to interpret commonly reported
scores: [such as] percentile ranks, percentile band scores, standard scores,
and grade equivalents.” (Standard 3 for Teacher Competence in Educational
Assessment)
OH 2
Methods of Interpreting Test Scores
Raw score
· Number of points earned when the test is scored following the scoring directions
- Has no inherent meaning (neither does % correct)
Criterion-referenced and standards-based interpretations
- Definition
- student’s score relates to clear description of specific tasks a student can perform
- those tasks, in turn, related to specified standards of mastery
- no need to consider other students’ scores
- Most useful when test designed for this purpose
- set of clearly stated learning objectives
- enough items to infer degree of mastery or non-mastery of that domain
- items selected to actually measure that domain
- Guidelines for when you can (cautiously) interpret norm-referenced tests in criterion-referenced terms
- achievement domains (e.g., objectives) are homogeneous, delimited, and clearly specified?
· if not, avoid specific descriptive statements
- enough items (say, 10) for each type of interpretation?
· if not, combine items into larger clusters or make only tentative judgments
- easy items were omitted to increase discrimination?
· if so, then scores won’t describe what low achievers can do
- used selection-type items only?
· if so, then scores influenced by guessing
- test items provide directly relevant measure of the objectives?
· if not, base interpretations on what they do measure
Norm-Referenced Interpretation
- Definition
- student’s score relative to other students (in a norm group)
- norm group is carefully defined
- no need to look at level of mastery
- Derived scores
- definition: raw scores converted into numbers that have meaning within a particular comparison group
- derived scores needed because simple rankings have limited value
- most common types: grade equivalents, percentiles, standard scores
- simple to calculate and conversion tables often provided
- many types are standard scores (e.g., T-scores, NCE, standard age scores), based on same logic using the normal curve
- other types of developmental scales besides GE (e.g., age-equivalents)
Expectancy Tables
(chapter 4)
- Definition: two-way chart that shows how often students at each score level (say, SAT math) perform at each level on another valued performance (say, freshman grades in college)
- Don’t need any norms
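A minimal sketch of how such a two-way chart could be tallied from paired records. The function name `expectancy_table`, the level labels, and the eight student records are all hypothetical and serve only to illustrate the layout.

```python
from collections import Counter, defaultdict

def expectancy_table(pairs):
    """pairs: iterable of (predictor_level, outcome_level).
    Returns {predictor_level: {outcome_level: percent of that row}}."""
    counts = defaultdict(Counter)
    for predictor, outcome in pairs:
        counts[predictor][outcome] += 1
    table = {}
    for predictor, row in counts.items():
        total = sum(row.values())
        # Express each cell as a percentage of students at that predictor level.
        table[predictor] = {o: 100.0 * c / total for o, c in row.items()}
    return table

# Hypothetical records: (test score level, later grade level).
students = [
    ("high", "A-B"), ("high", "A-B"), ("high", "C"), ("high", "A-B"),
    ("low", "C"), ("low", "D-F"), ("low", "C"), ("low", "D-F"),
]
table = expectancy_table(students)
for level, row in table.items():
    print(level, row)
```

Each row answers the question "of the students who scored at this level, what percentage ended up at each outcome level?", which is why no norm group is required.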
OH 3
Grade Equivalent Scores
Description
· Definition: the grade level at which the typical student obtains that raw score
· Sample interpretation: “student had the same raw score that was average for students in grade 5.6 in the average school”
· Typical score is determined for each month in a grade: 5.0-5.9
· Tables provided, so just look up what grade level corresponds to a student’s raw score
· Widely used, especially in elementary school
Widely Misinterpreted!
- Don’t confuse GE norms with standards that all students should attain
- Don’t interpret a GE as an estimate of the grade a student should be placed in
- Don’t expect all students to gain 1.0 GE each year (the average). Not a realistic goal
- Don’t assume that the units are equal at different parts of the scale (the same difference can mean “just above” or “vastly above” average)
- Don’t assume that scores on different tests are comparable
- Some publishers test fuller ranges of students than others
- Patterns of growth (variance in scores) may differ across subjects
- Don’t interpret extreme scores as dependable estimates of student’s performance (usually extrapolated)
Usefulness
- Most useful in reporting growth in basic skills in elementary school
- Least useful for comparing performance on different tests
- Inequality in grade units will muddle interpretation if you don’t keep it clearly in mind
OH 4
Percentile Rank
Description
· Definition: the percentage of students in the norm group scoring below a particular raw score (relative position in the group)
· Widely used and easily understood
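The definition can be sketched directly in code. The function name `percentile_rank` and the ten-score norm group below are hypothetical; published tests supply a conversion table instead of raw norm data.

```python
def percentile_rank(raw_score, norm_scores):
    """Percentage of norm-group scores strictly below raw_score."""
    below = sum(1 for s in norm_scores if s < raw_score)
    return 100.0 * below / len(norm_scores)

# Hypothetical norm group of 10 raw scores (illustrative data only).
norm_group = [12, 15, 15, 18, 20, 22, 25, 27, 30, 34]

# Six of the ten scores fall below 25.
print(percentile_rank(25, norm_group))
```

The same raw score would get a different percentile rank against a different norm group, which is why the norm group must always be named.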
Requirements for use
- A conversion table (from raw scores to percentiles) based on a norm group
- A norm group (conversion table) that is appropriate for the students taking the test: grade or age, time of year
- A norm group (conversion table) that is also specific to the exact test being given: test, subtest, form or (difficulty) level of the test
- Many tests or student groups means many conversion tables
- Different purposes (comparisons of same child with different groups) require different norms
Limitations
- Must always refer to a student’s percentile rank as relative to a particular norm group
- Usually require multiple sets of norms, especially in high school and beyond
- Units not equal, especially at the extremes
· Pattern of inequality is predictable, however
· Same percentile difference (say, 5 points) reflects a much bigger difference in performance at the extremes than near the average (recall the shape of the normal curve)
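To get a feel for the size of this inequality, the sketch below expresses a 5-point percentile gap in standard deviation units, assuming the scores follow a normal curve. The helper name `z_gap` is hypothetical; `NormalDist` comes from Python's standard `statistics` module.

```python
from statistics import NormalDist

nd = NormalDist()  # standard normal curve: mean 0, SD 1

def z_gap(p_low, p_high):
    """Distance in SD units between two percentile ranks on a normal curve."""
    return nd.inv_cdf(p_high / 100) - nd.inv_cdf(p_low / 100)

middle = z_gap(50, 55)    # a 5-point gap near the average
extreme = z_gap(94, 99)   # a 5-point gap near the top

print(round(middle, 2), round(extreme, 2))
```

Under the normality assumption, the gap near the top is several times larger in SD units than the same-sized gap near the middle.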
OH 5
Standard Scores
Definition
- Standard score: how far above or below average a student scored
- Distance is calculated in standard deviation (SD) units (a standard deviation is a measure of spread or variability)
- The mean and standard deviation are for a particular norm group
Advantages
Based on the “normal curve,” which means that:
- Scores are distributed symmetrically around the mean (average)
- Each SD represents a fixed (but different) percentage of cases
- Almost everyone is included between –3.0 and 3.0 SDs of the mean
- The SD allows conversion of very different kinds of raw scores to a common scale that has (a) equal units and (b) can be readily interpreted in terms of the normal curve
- When we can assume that scores follow a normal curve (classroom tests usually don’t but standardized tests do), we can translate standard scores into percentiles (very useful!)
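As a sketch of that last point, a standard score expressed in SD units can be turned into a percentile with the normal curve's cumulative distribution. This assumes the scores really are normally distributed; the function name `sd_units_to_percentile` is hypothetical.

```python
from statistics import NormalDist

def sd_units_to_percentile(z):
    """Percentage of a normal distribution falling below z SDs from the mean."""
    return 100 * NormalDist().cdf(z)

# Scores at the mean, 1 SD above, and 1 SD below.
for z in (0.0, 1.0, -1.0):
    print(z, round(sd_units_to_percentile(z), 1))
```

A score exactly at the mean lands at the 50th percentile, and scores one SD above and below the mean land at roughly the 84th and 16th percentiles.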
OH 6
Types of Standard Scores
All Standard Scores
- Share a common logic
- Can be translated into each other (see figure 19.2, p. 494)
z-Score
- Simplest
- The one on which all others are based
- Formula: z = (X - M)/SD, where X is person’s score, M is group’s average, and SD is group’s spread (standard deviation in scores)
- z is negative for scores that are below average, so z’s are usually converted into some other system that has all positive numbers
T-Score
- Normally distributed standard scores
- M=50, SD=10
- Can be obtained from z scores: T = 50 + 10(z)
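The two formulas above, z = (X - M)/SD and T = 50 + 10(z), can be sketched as follows. The raw score, mean, and SD in the example are hypothetical.

```python
def z_score(x, mean, sd):
    """z = (X - M) / SD: distance from the group mean in SD units."""
    return (x - mean) / sd

def t_score(z):
    """T = 50 + 10z: rescales z so that typical scores are positive."""
    return 50 + 10 * z

# Hypothetical student: raw score 34 on a test with norm-group M=40, SD=4.
z = z_score(34, 40, 4)
print(z, t_score(z))
```

The student is 1.5 SDs below the mean, so z is negative while the T-score stays positive.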
Normalized Standard Scores
- Start with scores that you want to make conform to the normal curve
- Get percentile ranks for each score
- Transform percentiles into z scores using a conversion table (I handed one out in class)
- Then transform into any other standard score you want (e.g., T-score, IQ equivalents)
- Hope that your assumption was right, namely, that the scores really do naturally follow a normal curve. If they don’t, your interpretations (say, of equal units) may be somewhat mistaken
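The steps above can be sketched in code, with the inverse normal function standing in for the printed conversion table. One assumption is added here that is not in the outline: percentile ranks are computed with the midpoint convention (scores below plus half of the tied scores), so that no score lands at exactly 0 or 100 percent, where the inverse normal is undefined. The function name `normalized_t_scores` is hypothetical.

```python
from statistics import NormalDist

def normalized_t_scores(raw_scores):
    """Map each distinct raw score to a normalized T-score."""
    n = len(raw_scores)
    result = {}
    for x in sorted(set(raw_scores)):
        below = sum(1 for s in raw_scores if s < x)
        equal = sum(1 for s in raw_scores if s == x)
        # Step 1: percentile rank as a proportion (midpoint convention, an assumption).
        p = (below + 0.5 * equal) / n
        # Step 2: proportion -> z via the inverse normal curve.
        z = NormalDist().inv_cdf(p)
        # Step 3: z -> T-score (M=50, SD=10).
        result[x] = 50 + 10 * z
    return result

# Hypothetical raw scores for five students.
t_by_raw = normalized_t_scores([1, 2, 3, 4, 5])
print({raw: round(t, 1) for raw, t in t_by_raw.items()})
```

The middle score maps to T = 50, and the other scores are spaced according to the normal curve, not according to the raw-score distances.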
Stanines
- Very simple type of normalized standard score
- Ranges from 1-9 (the “standard nines”)
- Each stanine from 2-8 covers ½ SD
- Stanine 5 = percentiles 40-59 (the middle 20 percent)
- A difference of 2 stanines usually signals a real difference
- Strengths
1. easily explained to students and parents
2. normalized, so can compare different tests
3. can add stanines to get a composite score
4. easily recorded (only one column)
- Limitations
1. like all standard scores, cannot record growth
2. crude, but prevents overinterpretation
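A minimal sketch of assigning stanines from percentile ranks. The cut points below follow the conventional 4-7-12-17-20-17-12-7-4 percent split, which is consistent with "Stanine 5 = percentiles 40-59" above; a given test manual may handle the boundaries slightly differently, so treat the exact cutoffs as an assumption.

```python
from bisect import bisect_right

# Lowest percentile rank belonging to stanines 2 through 9.
STANINE_CUTS = [4, 11, 23, 40, 60, 77, 89, 96]

def stanine(percentile_rank):
    """Return the stanine (1-9) for a percentile rank."""
    return bisect_right(STANINE_CUTS, percentile_rank) + 1

for pr in (2, 40, 59, 60, 97):
    print(pr, stanine(pr))
```

Percentile ranks 40 and 59 both fall in stanine 5, while 60 moves up to stanine 6, which illustrates how coarse the scale is.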
Normal-Curve Equivalents (NCE)
- Normally distributed standard scores
- M=50
- SD=21.06
- Results in scores that go from 1-99
- Like percentiles, except that they have equal units (this means that they make fewer distinctions in the middle of the curve and more at the extremes)
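The relationship between percentiles and NCEs can be sketched by passing a percentile rank through z and then applying M=50, SD=21.06. This assumes normally distributed scores; the function name `percentile_to_nce` is hypothetical.

```python
from statistics import NormalDist

def percentile_to_nce(percentile_rank):
    """Percentile rank -> z (normal curve) -> NCE with M=50, SD=21.06."""
    z = NormalDist().inv_cdf(percentile_rank / 100)
    return 50 + 21.06 * z

for pr in (1, 10, 25, 50, 75, 90, 99):
    print(pr, round(percentile_to_nce(pr), 1))
```

The SD of 21.06 is what makes the two scales line up at 1, 50, and 99. In between, NCEs sit closer to 50 than the matching percentile ranks do, so NCEs spread the middle of the distribution over fewer points.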
Standard Age Scores (SAS)
- Normally distributed standard scores
- Put into an IQ metric, where
- M=100
- SD=15 (Wechsler IQ Test) or SD=16 (Stanford-Binet IQ Test)
OH 7
Converting among Standard Scores
Easy Convertibility
- All are different ways of saying the same thing
- All represent equal units at different ranges of scores
- All can be averaged (among themselves)
- Can easily convert one into the other
- Figure 19.2 on p 494 shows how they line up with each other
- But interpretable only when scores are actually normally distributed (standardized tests usually are)
- Downside: not as easily understood by students and parents as are percentiles
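Because every scale listed in these overheads is a mean plus some number of SDs, conversion can be sketched as one small function that goes through z. The `SCALES` dictionary and the `convert` function are hypothetical names; the means and SDs are the ones given in these overheads (SAS is shown with SD=15).

```python
# (mean, SD) for each standard-score scale.
SCALES = {
    "z": (0.0, 1.0),
    "T": (50.0, 10.0),
    "NCE": (50.0, 21.06),
    "SAS": (100.0, 15.0),
}

def convert(score, from_scale, to_scale):
    """Convert a standard score from one scale to another via z."""
    from_mean, from_sd = SCALES[from_scale]
    to_mean, to_sd = SCALES[to_scale]
    z = (score - from_mean) / from_sd
    return to_mean + to_sd * z

# A T-score of 60 is 1 SD above the mean on every scale.
print(convert(60, "T", "z"), convert(60, "T", "SAS"), convert(60, "T", "NCE"))
```

The conversion is only arithmetic; whether the converted score is interpretable still depends on the scores being normally distributed and on the norm group.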
OH 8
Using Standard Scores to Examine Profiles
Uses
- You can compare a student’s scores on different tests and subtests when you convert all the scores to the same type of standard score
- But all the tests must use the same norm group
- Plotting profiles can show their relative strengths and weaknesses
- Should be plotted as confidence bands to illustrate fringe of error
- Interpret scores as different only when their bands do not overlap
- Sometimes plotted separately by male and female (say, on vocational interest tests), but this is a controversial practice
- Tests sometimes come with tabular or narrative reports of profiles (see p. 496)
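The rule about overlapping bands can be sketched as below. The band here is the score plus and minus one standard error of measurement (SEM), which is one common choice; the subtest scores and the SEM of 3 are hypothetical, and the names `band` and `bands_overlap` are illustrative only.

```python
def band(score, sem):
    """Confidence band: score minus 1 SEM to score plus 1 SEM."""
    return (score - sem, score + sem)

def bands_overlap(band_a, band_b):
    """True if two (low, high) bands share any part of the scale."""
    return band_a[0] <= band_b[1] and band_b[0] <= band_a[1]

# Hypothetical T-scores on three subtests, each with SEM = 3.
reading = band(55, 3)   # 52 to 58
math = band(60, 3)      # 57 to 63
spelling = band(50, 3)  # 47 to 53

print(bands_overlap(reading, math))    # bands share 57-58
print(bands_overlap(spelling, math))   # bands do not touch
```

Under this rule, reading and math would not be interpreted as different, while spelling and math would.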
OH 9
Using Standard Scores to Examine Mastery of Skill Types
- Some standardized tests try to provide some criterion-referenced information by providing scores on specific sets of skills (see Figure 19.4 on p. 498)
- Be very cautious with these; use them as clues only, because each skill area typically has very few items
OH 10
Judging the Adequacy of Norms for Standard Scores
Remember Your Aim!
- To interpret performance relative to a well-defined reference group
Criteria for Judging Norms
- Relevant
- Is this particular norm group appropriate for (a) the decision you want to make and (b) the set of students involved?
- Representative
- Was the norm group created with a random sample or stratified random sample? Does it match census figures (by race, sex, age, location, etc.) for the general population being considered?
- Up-to-date
- Don’t rely on the copyright date of the test manual. Read the manual to see how old the norms are
- Beware of Lake Wobegon effect!
- Comparable
· If you want to compare scores on tests with different norm groups, check the test manuals to see how comparable the groups are
- Adequately described; look for:
· Method of sampling
· Number and distribution of cases in the norming sample
· Age, race, sex, geography, etc. of norm sample
· Extent to which standardized conditions were maintained in testing
· Prefer the tests whose norms are described in more detail
OH 11
Cautions in Interpreting Standardized Test Scores
Scores should be interpreted:
- With clear knowledge about what the test measures. Don’t rely on titles; examine the content (breadth, etc.)
- In light of other factors (aptitudes, educational experiences, cultural background, health, motivation, etc.) that may have affected test performance
- According to the type of decision being made (high or low for what?)
- As a band of scores rather than a specific value. Always subtract 1 SEM from the score and add 1 SEM to it to get a range, to avoid overinterpretation
- In light of all your evidence. Look for corroborating or conflicting evidence
- Never rely on a single score to make a big decision