Overheads for Unit 1--Chapter 1 (Educational Testing: Context, Issues, and Trends)
OH 1
Why Learn About Externally-Mandated Tests?
Pervasive
- Many states and districts mandate them
- Great variety (from MC to performance-based)
- Various purposes (e.g., monitor student progress, school accountability)
Controversial
- Concerns over effectiveness/side-effects
- Fairness
Attract policy makers as a reform tool
- Inexpensive
- Easy to implement
- Quick to implement
- Results are highly visible, easily reported by media
Will affect your life as a teacher
- May be on panels determining test standards and test use
- Will have to explain tests and scores to students, parents, public
- Content standards and accountability procedures may affect your teaching
OH 2
Earlier Waves of Test-Based Educational Reform
Title I (TIERS)
- Since 1965
- Federal compensatory education program
- Twice-yearly tests to assess program
Minimum Competency Tests
- 1970s-1980s
- State tests for basic skills
- Passing required for HS graduation
Nation at Risk (and other major reports)
- 1983+
- Stress standards beyond the minimum
- Testing often major instrument of reform
- Basis of "report cards" on schools
- "Report cards" increase stakes of results
- Pressure on schools to get scores up
- Pressure produces side-effects (e.g., teaching to test, Lake Wobegon effect)
OH 3
Recent Wave of "Standards-Based" Reform (1990s)
Differs because:
- Ambitious "world-class" standards
- More performance-based assessments (less MC)
- High-stakes accountability for schools, teachers, and (sometimes) students
- All students assessed
Established both content and performance standards
- Specify ends, not means
- Content (the "what")—what students should know and be able to do in specific areas in specific grades
- Performance (the "how well")—level of performance to be achieved
- Almost every state has developed and adopted both
- Basis for assessments intended to be aligned with curriculum
Emphasizes performance-based assessment
- Common names—authentic, alternative, or performance assessment
- Common theme—shift from fixed-choice MC to students constructing responses that judges rate
- Rests on 3 premises
- WYTIWYG (what you test is what you get)
- You don’t get what you don’t assess
- Make tests worth teaching to
High-stakes accountability mechanisms
- Increasingly popular with policy makers
- Rewards for schools (e.g., special recognition, money)
- Sanctions for schools (e.g., remove principal, reassign teachers, oversight)
- Impact on students (e.g., promotion, graduation, types of diploma)
Includes all students
- Many can take without any special accommodation (e.g., the recently moved)
- Others can take with only minor accommodations (e.g., extra time)
- Some will need more extensive accommodations (e.g., test in a different language), but then can take
- Those with IEPs can take IEP-based tests (that is, tests modified in ways that instruction is already modified by the IEP, e.g., more time, read instructions to student, student answers orally)
OH 4
Growing Role of Federal Government in Testing
NAEP (1969+)
- National sample
- Many subjects
- Variety of item types
- Some items repeated over years to chart trends (see example on p. 11)
- Ages 9, 13, 17
- State-by-state option beginning 1990
- Performance standards—below basic, basic, proficient, advanced
- 3 purposes—report level of achievement for 3 ages; changes over time; differences across demographic groups
TIMSS
- Math
- Difficult to compare across countries (e.g., selectivity, sample quality, differences in definitions of education levels, translation)
- Poor U.S. performance spurs calls for higher standards (see example on p. 13)
Various presidential initiatives
- Last three presidents (e.g., Goals 2000); the most recent initiative is too recent for the book
- All proposed system of voluntary national tests
- Bush's was just approved
OH 5
Public Concern over Testing
Active public involvement
- Public often on panels determining objectives and standards
- Has led to more testing
Some concern that there is too much testing
- Takes too much class time?
- Distorts curriculum?
Debates over social consequences
- Often contentious
- E.g., attacks on testing industry, calls for moratoriums on testing
- 3 concerns (on following overhead transparencies)
OH 6
Social Consequences I: Nature and Quality of Tests
Complaints
- MC (multiple-choice) may penalize the most able, creative students
- Tests too structured, problems too narrow and unrealistic
- Tests measure only a limited aspect of individual
Responses to complaints
- Performance-based tests are now common--fewer are MC tests
- Many complaints reflect poor test use, not poor tests (e.g., overgeneralizing from a single score)
- There are costs to not testing, to opting for less rather than more information. Are the alternatives really better--or worse?
OH 7
Social Consequences II: Effects of Testing on Students
Anxiety
- Concern: Tests create anxiety
- Response: A little anxiety helps most students; liberal time limits help avoid harmful anxiety
Labeling
- Concern: Tests categorize and label students
- Response: Problems come when users overgeneralize from single scores; ability grouping can help when it is flexible and responsive to changes in performance
Self-concepts
- Concern: Tests damage students’ self-concepts
- Response: Problems come from overgeneralizing from low scores; can be avoided by mentioning strengths as well as weaknesses
Self-fulfilling prophecies
- Concern: Tests create self-fulfilling prophecies
- Response: Don’t overgeneralize; attend to strengths as well as weaknesses
Overall lesson
- Use tests properly! Don’t overgeneralize from single scores!
OH 8
Social Consequences III: Racial and Gender Fairness
Definitions of test "fairness" often differ
- (1) "absence of bias"—same score predicts the same thing, regardless of race
- one of the professional standards for test fairness
- (2) "procedural fairness"—testing conditions provide equal opportunity for all to show what they know (e.g., comparable grading standards)
- one of the professional standards for test fairness
- (3) "opportunity to learn"—all students had the same opportunity to learn the material being tested
- (4) "equality of results"—all races get the same average score
- this often requires violating definitions 1 and 2
"Proper" definition depends on your purposes
- If it is valid scores, then 1 and 2 are essential
- If it is to measure an enduring aptitude or ability, then 3 is also essential; it is not, if the aim is to know how much students actually know (regardless of ability)
- If it is equal scores for all groups, then 1-3 are irrelevant (and often conflict)
Distinction between test bias and unfair test use
- Test bias = flaw in the test (e.g., content that is unfamiliar or demeaning to some groups)
- test makers now use citizen panels and statistical tests to avoid this
- this is a technical issue
- lack of bias does not mean that all groups will score equally (there are many possible reasons for average score and skill differences)
- Unfair test use = unfair use of an unbiased test (e.g., do we use the same or different score cutoffs for different racial groups? Should new tests be added that favor minorities or women, e.g., SAT writing test?)
- this is a social-political issue (there is no technical solution)
- decision may be affected by why scores differ (e.g., lack of opportunity to learn)