Overheads for Unit 1--Chapter 1 (Educational Testing: Context, Issues, and Trends)
OH 1
Why Learn About Externally-Mandated Tests?
Pervasive
- Many states and districts mandate them
- Great variety (from MC to performance-based)
- Various purposes (e.g., monitor student progress, school accountability)
Controversial
- Concerns over effectiveness/side-effects
- Fairness
Attract policy makers as a reform tool
- Inexpensive
- Easy to implement
- Quick to implement
- Results are highly visible, easily reported by media
Will affect your life as a teacher
- May be on panels determining test standards and test use
- Will have to explain tests and scores to students, parents, public
- Content standards and accountability procedures may affect your teaching
OH 2
Earlier Waves of Test-Based Educational Reform
Title I (TIERS)
- Since 1965
- Federal compensatory education program
- Twice-yearly tests to assess program
Minimum Competency Tests
- 1970s-1980s
- State tests for basic skills
- Passing required for HS graduation
Nation at Risk (and other major reports)
- 1983+
- Stress standards beyond the minimum
- Testing often major instrument of reform
- Basis of "report cards" on schools
- "Report cards" increase stakes of results
- Pressure on schools to get scores up
- Pressure produces side-effects (e.g., teaching to test, Lake Wobegon effect)
OH 3
Recent Wave of "Standards-Based" Reform (1990s)
Differs because:
- Ambitious "world-class" standards
- More performance-based assessments (less MC)
- High-stakes accountability for schools, teachers, and (sometimes) students
- All students assessed
Established both content and performance standards
- Specify ends, not means
- Content (the "what")—what students should know and be able to do in specific areas in specific grades
- Performance (the "how well")—level of performance to be achieved
- Almost every state has developed and adopted both
- Basis for assessments intended to be aligned with curriculum
Emphasizes performance-based assessment
- Common names—authentic, alternative, or performance assessment
- Common theme—shift from fixed-choice MC to students constructing responses that judges rate
- Rests on 3 premises
- WYTIWYG (what you test is what you get)
- You don’t get what you don’t assess
- Make tests worth teaching to
High-stakes accountability mechanisms
- Increasingly popular with policy makers
- Rewards for schools (e.g., special recognition, money)
- Sanctions for schools (e.g., remove principal, reassign teachers, oversight)
- Impact on students (e.g., promotion, graduation, types of diploma)
Includes all students
- Many can take without any special accommodation (e.g., the recently moved)
- Others can take with only minor accommodations (e.g., extra time)
- Some will need more extensive accommodations (e.g., test in a different language), but then can take
- Those with IEPs can take IEP-based tests (that is, tests modified in ways that instruction is already modified by the IEP, e.g., more time, read instructions to student, student answers orally)
OH 4
Growing Role of Federal Government in Testing
NAEP (1969+)
- National sample
- Many subjects
- Variety of item types
- Some items repeated over years to chart trends (see example on p. 11)
- Ages 9, 13, 17
- State-by-state option beginning 1990
- Performance standards—below basic, basic, proficient, advanced
- 3 purposes—report level of achievement for 3 ages; changes over time; differences across demographic groups
TIMSS
- Math
- Difficult to compare across countries (e.g., selectivity, sample quality, differences in definitions of education levels, translation)
- Poor U.S. performance spurs calls for higher standards (see example on p. 13)
Various presidential initiatives
- Last three presidents (e.g., Goals 2000); the most recent initiative is too recent for the book
- All proposed system of voluntary national tests
- Bush's was just approved
OH 5
Public Concern over Testing
Active public involvement
- Public often on panels determining objectives and standards
- Has led to more testing
Some concern that there is too much testing
- Takes too much class time?
- Distorts curriculum?
Debates over social consequences
- Often contentious
- E.g., attacks on testing industry, calls for moratoriums on testing
- 3 concerns (on following overhead transparencies)
OH 6
Social Consequences I: Nature and Quality of Tests
Complaints
- MC (multiple-choice) may penalize the most able, creative students
- Tests too structured, problems too narrow and unrealistic
- Tests measure only a limited aspect of individual
Responses to complaints
- Performance-based tests are now common--fewer are MC tests
- Many complaints reflect poor test use, not poor tests (e.g., overgeneralizing from a single score)
- There are costs to not testing, to opting for less rather than more information. Are the alternatives really better--or worse?
OH 7
Social Consequences II: Effects of Testing on Students
Anxiety
- Concern: Tests create anxiety
- Response: A little anxiety helps most students; liberal time limits help avoid harmful anxiety
Labeling
- Concern: Tests categorize and label students
- Response: Problems come when users overgeneralize from single scores; ability grouping can help when it is flexible and responsive to changes in performance
Self-concepts
- Concern: Tests damage students’ self-concepts
- Response: Problems come from overgeneralizing from low scores; can be avoided by mentioning strengths as well as weaknesses
Self-fulfilling prophecies
- Concern: Tests create self-fulfilling prophecies
- Response: Don’t overgeneralize; attend to strengths as well as weaknesses
Overall lesson
- Use tests properly! Don’t overgeneralize from single scores!
OH 8
Social Consequences III: Racial and Gender Fairness
Definitions of test "fairness" often differ
- (1) "absence of bias"—same score predicts the same thing, regardless of race
- one of the professional standards for test fairness
- (2) "procedural fairness"—testing conditions provide equal opportunity for all to show what they know (e.g., comparable grading standards)
- one of the professional standards for test fairness
- (3) "opportunity to learn"—all students had the same opportunity to learn the material being tested
- (4) "equality of results"—all races get the same average score
- this often requires violating definitions 1 and 2
"Proper" definition depends on your purposes
- If it is valid scores, then 1 and 2 are essential
- If it is to measure an enduring aptitude or ability, then 3 is also essential; it is not, if the aim is to know how much students actually know (regardless of ability)
- If it is equal scores for all groups, then 1-3 are irrelevant (and often conflict)
Distinction between test bias and unfair test use
- Test bias = flaw in the test (e.g., content that is unfamiliar or demeaning to some groups)
- test makers now use citizen panels and statistical tests to avoid this
- this is a technical issue
- lack of bias does not mean that all groups will score equally (there are many possible reasons for average score and skill differences)
- Unfair test use = unfair use of an unbiased test (e.g., do we use the same or different score cutoffs for different racial groups? Should new tests be added that favor minorities or women, e.g., SAT writing test?)
- this is a social-political issue (there is no technical solution)
- decision may be affected by why scores differ (e.g., lack of opportunity to learn)