Overheads for Unit 3--Chapter 5 (Reliability)

Relation Between Validity and Reliability


What is the difference between validity and reliability?



Why should we be concerned about reliability?



Reliability: Some important points

  1. there are different kinds of consistency, so there are different kinds of reliability
  2. reliability requires statistical, not logical, analysis (validity requires both)
  3. calculating reliability requires test scores
  4. reliability can be reported in three ways, which serve different purposes
    1. correlations
    2. standard error of measurement
    3. percentage agreement
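The first reporting method, a correlation, can be illustrated with a minimal sketch. It assumes the same students took two forms of a test and computes a Pearson correlation between the two sets of scores. The function name `pearson_r` and the score lists are hypothetical and are not taken from the chapter.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two score lists of equal length (at least 2 scores)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each list
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores for five students on two forms of the same test
form_1 = [10, 20, 30, 40, 50]
form_2 = [12, 18, 33, 39, 52]
print(round(pearson_r(form_1, form_2), 3))
```

A value near 1 would indicate that students keep roughly the same relative standing across the two forms.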


Reliability Coefficient (Rxx)

Rxx = square root of the following ratio:

similarity in ranks on Forms 1 & 2

SD = standard deviation


Important point:


Assessing Reliability of Norm-Referenced Tests: Correlational Methods



Important points:

Comparing methods

Influences on reliability


What kinds of consistency does each of the methods capture?


Put an X in the appropriate spots of Table 5.4


Assessing Reliability of Norm-Referenced Tests: Standard Error of Measurement

Definition: The amount of error (movement) in a person’s test score that we can expect from one administration to another of the same or a comparable test.

Helps answer these questions:


Standard Error of Measurement (SEM)

Important points:

SEM = SD times the square root of (1-reliability)
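As a rough sketch, the formula above can be applied to a single score. The helper names `sem` and `score_band` are hypothetical, and treating the "fringe of error" as a band of plus or minus one SEM around the observed score is an illustrative assumption rather than a rule stated in these notes.

```python
from math import sqrt


def sem(sd, reliability):
    """Standard error of measurement: SD times the square root of (1 - reliability)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1 - reliability)


def score_band(observed, sd, reliability, n_sems=1.0):
    """Band of n_sems standard errors around an observed score (assumed convention)."""
    margin = n_sems * sem(sd, reliability)
    return observed - margin, observed + margin


# Hypothetical case: observed score of 100 on a test with SD = 10 and reliability = .75
print(sem(10, 0.75))              # 5.0
print(score_band(100, 10, 0.75))  # (95.0, 105.0)
```

With a lower reliability or a larger SD, the same call returns a wider band.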


Examples of how "fringe of error" around scores increases as reliability falls (p. 122)

Note: the following numbers are taken from page 122. They include 3 rows from that table (for SD 10, 20, and 30).

             Reliability coefficient
SD      .95    .90    .85    .80    .75    .70
10      2.2    3.2    3.9    4.5    5.0    5.5
20      4.5    6.3    7.7    8.9   10.0   11.0
30      6.7    9.5   11.6   13.4   15.0   16.4
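The three rows above can be checked against the SEM formula (SD times the square root of 1 minus reliability). The sketch below recomputes them, rounded to one decimal place. The names `sem_table`, `SDS`, and `RELIABILITIES` are hypothetical.

```python
from math import sqrt

RELIABILITIES = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70]
SDS = [10, 20, 30]


def sem_table(sds, reliabilities):
    """Map each SD to its list of SEM values, one per reliability coefficient."""
    return {
        sd: [round(sd * sqrt(1 - r), 1) for r in reliabilities]
        for sd in sds
    }


table = sem_table(SDS, RELIABILITIES)

# Print in the same layout as the table: one row per SD
print("SD  " + "".join(f"{r:>7.2f}" for r in RELIABILITIES))
for sd in SDS:
    print(f"{sd:<4}" + "".join(f"{value:>7.1f}" for value in table[sd]))
```

Reading across any row shows the SEM growing as reliability falls. Reading down any column shows it growing in proportion to the SD.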


Differences in error of measurement

Do they really matter? How?

Would you expect all kinds of tests to be equally reliable? Why or why not?


Assessing Reliability of Criterion-Referenced Tests: Percentage Agreement


Why might we not want to use correlational methods with criterion-referenced tests?


The aims of norm- and criterion-referenced tests are usually different. The former often sample a broader range of material and seek to differentiate among students. In contrast, criterion-referenced tests usually cover a smaller, more specific domain of tasks and are meant to assess absolute, not relative, levels of success in mastering the material.
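Because a criterion-referenced test sorts students by an absolute standard, one simple way to sketch percentage agreement is to count how often two administrations lead to the same mastery decision. The function name `percentage_agreement`, the cutoff of 70, and the scores below are all hypothetical. The chapter may define the statistic in more detail than this sketch assumes.

```python
def percentage_agreement(scores_1, scores_2, cutoff):
    """Percent of students classified the same way (at or above cutoff, or below it)
    on two administrations of a test."""
    if len(scores_1) != len(scores_2) or not scores_1:
        raise ValueError("need two non-empty score lists of equal length")
    same = sum(
        (first >= cutoff) == (second >= cutoff)
        for first, second in zip(scores_1, scores_2)
    )
    return 100.0 * same / len(scores_1)


# Hypothetical scores for five students, with mastery assumed to mean a score of 70 or more
first_try = [85, 72, 90, 65, 78]
second_try = [88, 69, 92, 70, 81]
print(percentage_agreement(first_try, second_try, cutoff=70))  # 60.0
```

In this example two of the five students change classification between administrations, even though their scores move by only a few points. A correlation would not show this kind of inconsistency directly.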


Which decisions demand high test reliability?


Usability of Assessments