Overheads for Unit 3--Chapter 5 (Reliability)
OH 1
Relation Between Validity and Reliability
Question: What is the difference between validity and reliability?
Answer:
- Validity is the extent to which test scores mean what you say they mean. That is, are you interpreting the scores appropriately?
- Reliability is the extent to which test results are consistent over time, across different versions of the test, or across the people scoring it. That is, how dependable are the results?
OH 2
Why should we be concerned about reliability?
Answer:
- Your test can’t be valid unless it is reliable (i.e., its scores are dependable).
- In fact, a test’s criterion validity can be no higher than the square root of its reliability.
- It is important to know how much measurement error there is in individuals’ scores (e.g., on a standardized test).
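The square-root ceiling above takes only a line of arithmetic to check; the reliability value here is invented for illustration.

```python
# Illustration (invented value): a test's criterion validity can be
# no higher than the square root of its reliability.
import math

reliability = 0.81
validity_ceiling = math.sqrt(reliability)
print(round(validity_ceiling, 2))   # a reliability of .81 caps validity at .90
```

So even a fairly reliable test (.81) cannot support a validity coefficient above .90.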
OH 3
Reliability:
Some important points
- there are different kinds of consistency, so there are different kinds of reliability
- reliability requires statistical, not logical, analysis (validity requires both)
- calculating reliability requires test scores
- reliability can be reported in three ways, which serve different purposes:
  - correlations
  - standard error of measurement
  - percentage agreement
OH 4
Reliability Coefficient (Rxx)

Rxx = the following ratio:

      similarity in ranks on Forms 1 & 2 (covariance of scores)
      ---------------------------------------------------------
                            (SD1)(SD2)

SD = standard deviation

Important points:
- Like all correlations, reliability coefficients are sensitive to variation in the sample (SD): smaller variation means lower reliabilities, all else equal.
- Why? Because tests can’t distinguish well among people who don’t differ much in knowledge or ability (SD is small). With retesting, small changes in their scores can easily change their ranks on the test, which depresses the numerator above (relative to the SDs).
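The ratio above is simply a Pearson correlation between two sets of scores. A minimal sketch, using invented scores for the same students on two equivalent forms:

```python
# Equivalent-forms reliability as the Pearson correlation between
# scores on Form 1 and Form 2. Student scores are invented.
import statistics

form1 = [78, 85, 92, 64, 70, 88, 75, 81]   # same students, Form 1
form2 = [80, 83, 95, 61, 72, 85, 77, 84]   # same students, Form 2

def reliability_coefficient(x, y):
    """Pearson r: covariance of the two score sets over (SD1)(SD2)."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return cov / (statistics.stdev(x) * statistics.stdev(y))

print(round(reliability_coefficient(form1, form2), 3))
```

With these invented scores the ranks barely shift between forms, so Rxx comes out high; shuffling the Form 2 scores would drive it toward zero.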
OH 5
Assessing Reliability of Norm-Referenced Tests:
Correlational Methods
Methods:
- test-retest—same test, different times
- equivalent forms—different forms of test, "same" time
- test-retest with equivalent forms—different forms, different time
- internal consistency—different parts of same test
- split half
- Kuder-Richardson and Coefficient Alpha
- interrater consistency—different raters/graders
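Of the internal-consistency methods listed above, coefficient alpha is the most general. A minimal sketch using the standard alpha formula, with invented item-score data (rows are students, columns are items):

```python
# Coefficient alpha (internal consistency) from an item-score matrix.
# alpha = k/(k-1) * (1 - sum of item variances / variance of totals).
# The item scores below are invented for illustration.
import statistics

scores = [
    [1, 0, 1, 1],  # student 1's scores on 4 items
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]

def coefficient_alpha(rows):
    k = len(rows[0])                       # number of items
    items = list(zip(*rows))               # column view: one tuple per item
    item_vars = sum(statistics.variance(col) for col in items)
    total_var = statistics.variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - item_vars / total_var)

print(round(coefficient_alpha(scores), 3))
```

Note how the number of items (k) enters the formula directly: this is one reason item count matters so much for reliability (see OH 6).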
OH 6
Assessing Reliability of Norm-Referenced Tests:
Correlational Methods
Important points:
Comparing methods
- some methods include more types of consistency than others
- some are better suited to some purposes than others
- test-retest with equivalent forms is the most useful for most purposes
Influences on reliability
- number of items **crucial, because it is something you can control!!**
- spread of scores
OH 7
What kinds of consistency does each method capture?
Exercise
Put an X in the appropriate spots of Table 5.4
OH 8
Assessing Reliability of Norm-Referenced Tests:
Standard Error of Measurement
Definition: The amount of error (movement) in a person’s test score that we can expect from one administration to another of the same or a comparable test.
Helps answer these questions:
- If the time interval is short: How sure can we be that the person’s true score really is close to their observed score? (fringe of error)
- If the time interval is long: How likely is their score to remain roughly the same over some period of time? (stability of test scores)
OH 9
Standard Error of Measurement (SEM)
Important points:
- SEM is derived directly from the reliability coefficient:
  SEM = SD × √(1 − Rxx)
- SEMs always depend on the spread of scores (SD) and other characteristics of a group (e.g., age)
- SEMs always refer to a specific set of test-takers; therefore
  - you need to judge whether the estimates derived from another group really apply to your students (e.g., their age level, heterogeneity)
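The SEM formula is a one-liner; a minimal sketch, with the SD, reliability, and observed score all invented for illustration:

```python
# Standard error of measurement: SEM = SD * sqrt(1 - reliability).
# The SD, reliability, and observed score below are invented values.
import math

def sem(sd, reliability):
    """Expected 'fringe of error' around an observed score."""
    return sd * math.sqrt(1 - reliability)

error = sem(sd=10, reliability=0.90)
print(round(error, 1))

# Roughly a 68% band around an observed score: observed ± 1 SEM.
observed = 75
print(observed - error, observed + error)
```

Note how the SD appears directly in the formula: the same reliability coefficient implies a wider fringe of error in a more spread-out group.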
OH 10
Examples of how the "fringe of error" around scores increases as reliability falls (p. 122)
Note: the following numbers are taken from page 122. They include 3 rows from that table (for SD 10, 20, and 30).

               Reliability coefficient
SD     .95    .90    .85    .80    .75    .70
10     2.2    3.2    3.9    4.5    5.0    5.5
20     4.5    6.3    7.7    8.9   10.0   11.0
30     6.7    9.5   11.6   13.4   15.0   16.4
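Every entry in the page-122 table follows from the SEM formula on OH 9; a quick sketch that regenerates the three rows:

```python
# Reproduce the 'fringe of error' table: SEM = SD * sqrt(1 - r),
# for the SD rows and reliability-coefficient columns shown above.
import math

sds = [10, 20, 30]
reliabilities = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70]

for sd in sds:
    row = [round(sd * math.sqrt(1 - r), 1) for r in reliabilities]
    print(sd, row)
```

Reading across a row shows the fringe of error growing as reliability falls; reading down a column shows it growing with the spread of scores.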
OH 11
Differences in error of measurement
Do they really matter? How?
Would you expect all kinds of tests to be equally reliable? Why or why not?
OH 12
Assessing Reliability of Criterion-Referenced Tests:
Percentage Agreement
Question:
Why might we not want to use correlational methods with criterion-referenced tests?
Answer:
The aims of norm- and criterion-referenced tests are usually different. The former often sample a broader range of material and seek to differentiate among students. In contrast, criterion-referenced tests usually cover a smaller, more specific domain of tasks and are meant to assess absolute, not relative, levels of success in mastering the material.
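One common form of percentage agreement asks how often two administrations classify the same student the same way relative to a mastery cut score. A minimal sketch; the cut score and all scores are invented:

```python
# Percentage agreement for a criterion-referenced (mastery) test:
# the share of students given the same master/non-master decision
# on two administrations. Cut score and scores are invented.
CUT = 70   # hypothetical mastery cut score

admin1 = [82, 65, 74, 58, 91, 70, 67, 88]
admin2 = [79, 68, 71, 62, 89, 66, 72, 85]

def percentage_agreement(x, y, cut):
    """Percent of students classified the same way both times."""
    same = sum((a >= cut) == (b >= cut) for a, b in zip(x, y))
    return 100 * same / len(x)

print(percentage_agreement(admin1, admin2, CUT))
```

Unlike a correlation, this index does not depend on the spread of scores, only on consistency of the absolute pass/fail decision, which is why it suits criterion-referenced tests.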
OH 13
Which decisions demand high test reliability?
- Important
- Final
- Irreversible
- Unconfirmable
- Concern individuals
- Have lasting consequences
OH 14
Usability of Assessments
- ease of administration
- time required for administration
- ease of interpretation and use
- availability of alternate forms
- cost of testing