ITEMAL performs item analyses of individual test questions as well as entire tests. It can be used to analyze data collected on optical scan sheets and processed by the test scoring program, or any scored test data that has been placed in a disk file.
The program provides the following summary information:

For each item: the item difficulty index; the number and proportion of students choosing each alternative, by group and in total; the mean total score of the students choosing each alternative; flags marking questionable alternatives; and the biserial and point-biserial correlations between item score and total test score.

For the entire test: the mean item difficulty; the mean item score - total score biserial correlation; the Kuder-Richardson 20 reliability; the test mean, variance, and standard deviation; the standard error of measurement; and distributions of the items by difficulty and by biserial correlation.
Details on the definition and interpretation of the statistics calculated by the ITEMAL program may be found in
Thorndike, Robert L. Applied Psychometrics. Boston: Houghton Mifflin Co., 1982.
Crocker, Linda, and Algina, James. Introduction to Classical and Modern Test Theory. New York: Holt, Rinehart and Winston, 1986.
The ITEMAL program allows you to analyze exams of any length, but you must break those with more than 160 questions into data sets of 160 questions or fewer. You then place data sets and their accompanying control information consecutively in a single data file. You may also place more than one complete test in a single data file.
Each data set is arranged in a special manner, as shown in the sample data file reproduced below. It begins with several lines of control information needed by the ITEMAL program.
The following parameters must be recorded on a single line:
+------------------------------------------------------------------+
|                      SAMPLE ITEMAL DATA FILE                      |
+------------------------------------------------------------------+
|....+....1....+....2....+....3....+....4....+....5....+....6....+ |
|                                                                  |
| 0001 PSY101 10 S. Freud 091312 10 20 1 4 1 1                    |
| 12341234123412341234                                            |
| (T14, I5, T41, 20I1)                                            |
| 001             20                      12341234123412341234    |
| 002             10                      12341234121133113311    |
| 003              1                      13223322332233223322    |
| 004             15                      12341234123412313311    |
|                 -1                                              |
| 005              8                      33113311331112341234    |
| 006              0                      44114411441144114411    |
| 007              4                      12344411441144114411    |
| 008             20                      12341234123412341234    |
|                 -1                                              |
| 009             12                      12341234123433223322    |
| 010             16                      33111234123412341234    |
|                 -1                                              |
|                 -1                                              |
|                 -1                                              |
|                                                                  |
| ....+....1....+....2....+....3....+....4....+....5....+....6....+|
|                                                                  |
+------------------------------------------------------------------+
Data Format: In columns 1-72 of the following line you will describe how the student data is formatted in the data file. You must use T (Tab) and I (Integer) formats to specify the location of the student score and the responses to be analyzed. The format statement must begin with a left parenthesis in column 1 and end with a right parenthesis. For example, if you were going to record the test scores in columns 14-18 and the responses in columns 41-60, you would use this format statement:
(T14, I5, T41, 20I1)
T14, I5 indicates that the scores start in column 14 and use a maximum of 5 columns; T41, 20I1 indicates that 20 one-digit student answers are found starting in column 41. The scores and answers must be right-justified (i.e., they must appear in the right-most position of the columns set aside for them).
If you wanted to omit the 10th, 11th, and 17th items from the analysis, for example, you would need to modify the format statement to reflect this omission. Since items 10, 11, and 17 are in columns 50, 51, and 57, you would need to skip those columns. The new format statement would therefore be
(T14, I5, T41, 9I1, T52, 5I1, T58, 3I1)
The answer key, described above, would then have the correct answers for items 1-9, 12-16, and 18-20 in the first 17 consecutive columns of that line.
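The column arithmetic above can be sketched in Python (illustrative only; ITEMAL itself reads the format statement directly). The sample line below is hypothetical but laid out per (T14, I5, T41, 20I1): Tn moves to column n (1-based), Iw reads a w-digit integer, and kI1 reads k one-digit integers.

```python
# Hypothetical student line: ID in columns 1-3, score right-justified in
# columns 14-18, twenty one-digit answers in columns 41-60.
line = "001".ljust(13) + "20".rjust(5) + " " * 22 + "12341234123412341234"

score = int(line[13:18])                   # T14, I5   -> columns 14-18
answers = [int(c) for c in line[40:60]]    # T41, 20I1 -> columns 41-60

# With (T14, I5, T41, 9I1, T52, 5I1, T58, 3I1), columns 50, 51, and 57
# (items 10, 11, and 17) are skipped, leaving 17 items:
kept = [int(c) for c in line[40:49] + line[51:56] + line[57:60]]
assert len(kept) == 17
```

Note that Python slices are 0-based and half-open, so Fortran columns m through n become the slice [m-1:n].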
Student Data Section: You would usually record each student's answers on a single line (for example, columns 41-60 in the example above) although you might need several lines for a long test. Using special code lines, you may divide the students into as many as five groups for ITEMAL to analyze both individually and collectively. For example, the first group might contain the top 20% of the class, the second group the next 20%, and so on. Note that you must do the grouping; ITEMAL does not do it automatically.
If you record each student's data on a single line, separate each group by a single line with a right-justified -1 in the columns specified in the format statement for the score value. In the example above, each -1 is in columns 17-18.
If you represent each student's data on two lines, separate the groups by two lines. The first line should have a -1 in the score area specified by the format statement; the second line should be blank. In general, the number of lines needed between groups is equal to the number of lines you use to record each student's data. The separator lines consist of one -1 line followed by the necessary number of blank lines. (The program will also allow you to use -1 lines instead of blank lines.)
If you want ITEMAL to analyze fewer than five groups, you must put the extra sets of -1 lines at the end of the whole data set. In the example above, the three groups each end with a -1 line; the two additional -1 lines at the end bring the total to five. As a second example, if there were 2 lines per student and only one group were to be analyzed, the student data would be followed by 10 separator lines in all: five two-line separators, written either as alternating -1 and blank lines or as ten -1 lines.
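A minimal sketch of the separator convention for one-line-per-student data, assuming the score field occupies columns 14-18 as in the example above. The reader stops after five groups, matching the fixed group count described in the rules; this is an illustration, not ITEMAL's own code.

```python
# Each group of student lines ends with a line carrying -1 in the score
# field; unused groups are represented by extra -1 lines at the end.
def read_groups(lines, n_groups=5):
    groups, current = [], []
    for line in lines:
        if int(line[13:18]) == -1:      # separator line closes a group
            groups.append(current)
            current = []
            if len(groups) == n_groups:
                break
        else:
            current.append(line)
    return groups

data = ["001".ljust(13) + "20".rjust(5),
        "002".ljust(13) + "10".rjust(5),
        "".ljust(13) + "-1".rjust(5),   # end of group 1
        "003".ljust(13) + "8".rjust(5),
        "".ljust(13) + "-1".rjust(5),   # end of group 2
        "".ljust(13) + "-1".rjust(5),   # three unused groups
        "".ljust(13) + "-1".rjust(5),
        "".ljust(13) + "-1".rjust(5)]
groups = read_groups(data)
assert [len(g) for g in groups] == [2, 1, 0, 0, 0]
```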
If a test is composed of more than 160 questions, you will need to divide it into data sets of 160 questions or fewer. Before each set, repeat the control information specified in Steps 1-3. Stack the sets consecutively.
If you stack complete, independent tests, you must separate each test with one blank line. Two blank lines should follow the last complete test even if there is only a single complete test.
Note: Clients who use the test scoring service will automatically get a file that has been analyzed using the ITEMAL program, regardless of the options that are filled in on their control sheet. This file will be uploaded to the UD Dropbox with the filename bin#.itemal.analysis.
Warning: Do not use the automatically generated ITEMAL PDF file if you have multiple sections. Follow the directions to modify the output file produced by the test scoring program.
If you generate the data using the Test Scoring program, you will usually choose "4" as the value of standard option 3 on the Test Scoring Control Sheet. The Test Scoring Program will generate the student data for ITEMAL as well as the necessary control information described above. If the output file produced by the Test Scoring Program includes data for more than one section of a course, you will first need to delete the extra blank line and the line of asterisks generated by the Test Scoring Program. You can easily do this using any editor, such as pico.
Furthermore, the output file from the Test Scoring program will contain the item responses for all test questions, even those you may have designated to be deleted from the scoring. If you do not want ITEMAL to analyze these items, you must modify the information in Steps 1-3 above using an editor. Each step will require at least one change. If you do not make these changes, the summary statistics that appear in ITEMAL's "Additional Test Information" section may have spurious values.
You may also choose the values 3 or 5 for standard option 3 in the Test Scoring program. For each of these choices, the Test Scoring program will produce a file of student data, but you must construct the necessary control information for ITEMAL and insert the requisite -1 lines. The Test Scoring program documentation describes the format of the output.
The ITEMAL program is only available on Strauss, and not on Copland.
The general form of the command to run ITEMAL is
~consult/scanning/itemal < input_filename > output_filename
where you replace input_filename with the name of your input file and output_filename with the name of your output file. You must type the "<" and ">" UNIX redirection symbols. For example,
~consult/scanning/itemal < test1.itemal > test1.analysis
The output in the file test1.analysis can be printed on the Smith Hall laser printers by typing
enscript -rB -p- -fCourier-Bold8 test1.analysis | qpr -q smips
To print it on another printer, substitute the name of the printer for smips.
Item difficulty index: The proportion of students choosing the correct response is termed item difficulty; therefore, the higher the index, the easier the item. Item difficulty is recorded in the individual item output section under the heading "Proportion Choosing" and is marked with an asterisk. Mean Item Difficulty is printed in the "Additional Test Information" output section.
A classroom test covering related subject matter should contain items having a wide range of difficulty values. However, items with difficulty indices near or below the "chance level" (0.20 for an item having 5 alternatives; 0.25 for an item having 4 alternatives) are undesirable. Equally undesirable are easy items having indices close to 1 since they are poor discriminators. In general, if the proportion of students choosing an incorrect alternative is over 50%, you should make the alternative less attractive. Of course, you may intend a few questions to be very difficult or very easy.
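As a small illustration with hypothetical data, the difficulty index is simply the proportion of students answering correctly:

```python
# Ten students' responses to one item whose keyed answer is 1 (i.e., A).
key = 1
responses = [1, 1, 3, 1, 4, 1, 1, 3, 1, 1]
difficulty = sum(r == key for r in responses) / len(responses)
assert difficulty == 0.7        # 7 of 10 correct: a fairly easy item

chance_level = 1 / 4            # for a 4-alternative item; indices near or
assert difficulty > chance_level  # below 0.25 would be undesirable
```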
Questionable items: Ideally, students who choose the incorrect alternatives for an item should be the students having lower test scores as well. An item for which this is not happening may be detected by comparing the columns "Proportion Choosing" and "Mean Score" in the item output section. For incorrect alternatives, high proportions should not coincide with relatively high mean total test scores. This notion roughly corresponds to wanting positive correlations between item scores and total scores.
ITEMAL tries to detect and note potentially bad alternatives by printing a question mark in the "Questionable Item" column. It does this for a correct alternative if either (a) the mean score for students choosing the correct answer is less than the overall mean test score or (b) if the proportion choosing the correct answer is not between 20% and 80%. It also does this for an incorrect alternative if the mean score for students choosing the incorrect alternative exceeds the mean score for students choosing the correct alternative. Furthermore, an alternative will be marked with a question mark if the proportion choosing the alternative is less than a critical value depending on the number of alternatives per item. Each critical value is approximately equal to .2/(number of alternatives - 1) for numbers of alternatives between 2 and 5. All of these criteria are heuristically based and are merely indicators.
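The flagging rules above can be sketched as follows; this is a reconstruction from the description, not ITEMAL's actual code. Applied to Item 1 of the sample output (overall mean 10.60, correct choosers' mean 11.71), it reproduces the question marks shown there.

```python
# Heuristic "questionable alternative" flags, reconstructed from the text.
def questionable(is_correct, prop_choosing, mean_of_choosers,
                 overall_mean, mean_of_correct, n_alternatives):
    if is_correct:
        # correct answer: flagged if its choosers score below the overall
        # mean, or if its proportion falls outside 20%-80%
        return (mean_of_choosers < overall_mean
                or not 0.20 <= prop_choosing <= 0.80)
    # incorrect alternative: flagged if its choosers outscore the correct
    # choosers, or if too few students chose it
    critical = 0.2 / (n_alternatives - 1)   # approximate critical value
    return (mean_of_choosers > mean_of_correct
            or prop_choosing < critical)

# Item 1 of the sample output:
assert not questionable(True,  0.700, 11.71, 10.60, 11.71, 4)   # *A
assert questionable(False, 0.000,  0.00, 10.60, 11.71, 4)       #  B  ?
assert questionable(False, 0.200, 12.00, 10.60, 11.71, 4)       #  C  ?
assert not questionable(False, 0.100,  0.00, 10.60, 11.71, 4)   #  D
```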
Item score - total score biserial correlation: Usually, it is desirable to keep items with high positive correlation coefficients and to eliminate those with coefficients that are negative or near zero. As a rough guide, you should eliminate or substantially revise questions with correlation coefficients less than 0.10 and try to improve those questions with coefficients in the 0.10 - 0.30 range.
For each item, ITEMAL reports the point-biserial correlation and the biserial correlation using the total test score, including the contribution from that item. Many psychometricians prefer analyzing a corrected total score that does not include the contribution from that item. For exams with large numbers of items the two results are quite close; the difference becomes more important as the number of items decreases. Corrected total scores can be obtained from the reliability analysis procedures in SAS and SPSS.
You may encounter cases in which the estimated correlations exceed 1. This is due to the unbiased estimators of variances ITEMAL uses in the final calculation. In these cases, it is reasonable to interpret these correlations as being equal to 1.
The reported t-value is a statistic for testing whether the point-biserial correlation is zero. It is equivalent to a statistic for testing whether the mean test score of the students answering the item correctly equals the mean test score of the students answering it incorrectly.
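Under standard textbook formulas (ITEMAL's exact computation may differ in detail, e.g. in its variance estimators), the point-biserial correlation and its t statistic can be sketched as:

```python
from math import sqrt

def point_biserial(item, totals):
    """item: 0/1 scores on one question; totals: total test scores."""
    n = len(totals)
    mean = sum(totals) / n
    sd = sqrt(sum((t - mean) ** 2 for t in totals) / (n - 1))  # unbiased SD
    p = sum(item) / n                                          # difficulty
    mean_correct = sum(t for i, t in zip(item, totals) if i) / sum(item)
    r = (mean_correct - mean) / sd * sqrt(p / (1 - p))
    t = r * sqrt((n - 2) / (1 - r * r))     # tests whether r is zero
    return r, t

# Toy data: the two correct answerers also have the higher total scores.
r, t = point_biserial([1, 1, 0, 0], [3, 3, 1, 1])
assert abs(r - 0.866) < 0.001
```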
The mean item score - total score biserial correlation, the mean value of all the individual item score - total score biserial correlations, is printed in the "Additional Test Information" output section.
Test Reliability: ITEMAL uses the Kuder-Richardson 20 (KR-20) internal consistency formula to compute test reliability. It measures the similarity of the items within the test and is equivalent to Cronbach's coefficient alpha when items are scored either right or wrong.
KR-20 estimates may be as large as 1. Higher values indicate a greater degree of reliability than lower values. These estimates may be meaningless if even small numbers of students are unable to complete the test within the allotted time.
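A minimal sketch of KR-20 from its standard definition, with items scored 0/1 and rows representing students. The choice of the unbiased (n-1) variance here is an assumption; texts differ on the convention, and that choice is one reason estimates can reach or exceed their nominal bounds.

```python
def kr20(item_matrix):
    """KR-20 reliability; item_matrix[s][j] is student s's 0/1 score on item j."""
    n = len(item_matrix)                     # number of students
    k = len(item_matrix[0])                  # number of items
    totals = [sum(row) for row in item_matrix]
    mean = sum(totals) / n
    var = sum((t - mean) ** 2 for t in totals) / (n - 1)  # unbiased variance
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_matrix) / n        # item difficulty
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var)
```

For example, kr20([[1, 1], [1, 0], [0, 1], [0, 0]]) evaluates to 0.5.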
Test standard deviation: The standard deviation is a measure of the variability or scatter of test scores. It is especially useful because of its relationship to the normal curve. When the number of students is relatively large and the test scores follow a bell-shaped curve, one standard deviation taken on each side of the mean includes approximately 68% of the test scores, two standard deviations on each side include over 95% of the scores, and three standard deviations include over 99%.
Standard error of measurement: The standard error of measurement (s.e.m.) is an estimate of the probable extent of error in individual test scores. Its interpretation is similar to that of a standard deviation. For example, an s.e.m. of 1.5 indicates that for any particular test score, the odds are 2 to 1 that the student's true score (his average score on many similar tests) will not deviate from the one obtained by more than 1.5 points. Thus, small standard errors of measurement are associated with more reliable tests.
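The standard relationship is s.e.m. = standard deviation * sqrt(1 - reliability). Plugging in the sample test's own figures (standard deviation 7.32, KR-20 reliability 0.958) recovers the reported value of 1.50:

```python
from math import sqrt

sd = 7.32          # test standard deviation from the sample output
kr20 = 0.958       # KR-20 reliability from the sample output

sem = sd * sqrt(1 - kr20)
assert abs(sem - 1.50) < 0.01   # matches the reported s.e.m.
```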
Reproduced below is the ITEMAL output for the first four questions of the test data shown above and the "Additional Test Information" output for the entire test.
ITEM ANALYSIS FOR DATA HAVING SPECIFIABLE RIGHT-WRONG ANSWERS

THE USER HAS SPECIFIED THE FOLLOWING INFORMATION ON CONTROL CARDS

JOB NUMBER                                     1
COURSE                                         PSY101 10
INSTRUCTOR                                     S. Freud
DATE (MONTH, DAY, YEAR)                        12 13 12
NUMBER OF STUDENTS                             10
NUMBER OF ITEMS                                20
ITEM EVALUATION OPTION (0=NO, 1=YES)           1
MAXIMUM NUMBER OF ANSWER CHOICES               4
INPUT FORMAT                                   (T14, I5, T41, 20I1)
RESPONSE FORM                                  1=A, 2=B, 3=C, ...ETC
NUMBER OF COPIES OF OUTPUT (MAX. ALLOWED=2)    1

CORRECT ANSWERS IN GROUPS OF FIVE
12341 23412 34123 41234


ITEM NUMBER  1     CORRECT ANSWER AND ITEM DIFFICULTY INDEX ARE IDENTIFIED BY *

            1ST    2ND    3RD    4TH    5TH   RESPONSE  PROPORTION   MEAN
OPTIONS    GROUP  GROUP  GROUP  GROUP  GROUP   TOTAL     CHOOSING   SCORE   QUESTIONABLE

 OMIT         0      0      0      0      0      0        0.000      0.00
*A OR 1       4      2      1      0      0      7      * 0.700     11.71
 B OR 2       0      0      0      0      0      0        0.000      0.00        ?
 C OR 3       0      1      1      0      0      2        0.200     12.00        ?
 D OR 4       0      1      0      0      0      1        0.100      0.00
 TOTAL        4      4      2      0      0     10

BISERIAL CORRELATION BETWEEN ITEM SCORE AND TOTAL SCORE ON TEST = 0.304
POINT-BISERIAL CORRELATION = 0.232     T = 0.676


ITEM NUMBER  2     CORRECT ANSWER AND ITEM DIFFICULTY INDEX ARE IDENTIFIED BY *

            1ST    2ND    3RD    4TH    5TH   RESPONSE  PROPORTION   MEAN
OPTIONS    GROUP  GROUP  GROUP  GROUP  GROUP   TOTAL     CHOOSING   SCORE   QUESTIONABLE

 OMIT         0      0      0      0      0      0        0.000      0.00
 A OR 1       0      0      0      0      0      0        0.000      0.00        ?
*B OR 2       3      2      1      0      0      6      * 0.600     13.50
 C OR 3       1      1      1      0      0      3        0.300      8.33
 D OR 4       0      1      0      0      0      1        0.100      0.00
 TOTAL        4      4      2      0      0     10

BISERIAL CORRELATION BETWEEN ITEM SCORE AND TOTAL SCORE ON TEST = 0.615
POINT-BISERIAL CORRELATION = 0.485     T = 1.569


ITEM NUMBER  3     CORRECT ANSWER AND ITEM DIFFICULTY INDEX ARE IDENTIFIED BY *

            1ST    2ND    3RD    4TH    5TH   RESPONSE  PROPORTION   MEAN
OPTIONS    GROUP  GROUP  GROUP  GROUP  GROUP   TOTAL     CHOOSING   SCORE   QUESTIONABLE

 OMIT         0      0      0      0      0      0        0.000      0.00
 A OR 1       0      2      1      0      0      3        0.300      8.00
 B OR 2       1      0      0      0      0      1        0.100      1.00
*C OR 3       3      2      1      0      0      6      * 0.600     13.50
 D OR 4       0      0      0      0      0      0        0.000      0.00        ?
 TOTAL        4      4      2      0      0     10

BISERIAL CORRELATION BETWEEN ITEM SCORE AND TOTAL SCORE ON TEST = 0.615
POINT-BISERIAL CORRELATION = 0.485     T = 1.569


ITEM NUMBER  4     CORRECT ANSWER AND ITEM DIFFICULTY INDEX ARE IDENTIFIED BY *

            1ST    2ND    3RD    4TH    5TH   RESPONSE  PROPORTION   MEAN
OPTIONS    GROUP  GROUP  GROUP  GROUP  GROUP   TOTAL     CHOOSING   SCORE   QUESTIONABLE

 OMIT         0      0      0      0      0      0        0.000      0.00
 A OR 1       0      2      1      0      0      3        0.300      8.00
 B OR 2       1      0      0      0      0      1        0.100      1.00
 C OR 3       0      0      0      0      0      0        0.000      0.00        ?
*D OR 4       3      2      1      0      0      6      * 0.600     13.50
 TOTAL        4      4      2      0      0     10

BISERIAL CORRELATION BETWEEN ITEM SCORE AND TOTAL SCORE ON TEST = 0.615
POINT-BISERIAL CORRELATION = 0.485     T = 1.569


                        ADDITIONAL TEST INFORMATION

THE MEAN ITEM DIFFICULTY FOR THE ENTIRE TEST = 0.530
THE MEAN ITEM SCORE - TOTAL SCORE BISERIAL CORRELATION = 0.846
KUDER-RICHARDSON 20 RELIABILITY = 0.958
TEST MEAN = 10.60     VARIANCE = 53.60     STANDARD DEVIATION = 7.32
STANDARD ERROR OF MEASUREMENT (BASED ON KR-20) = 1.50
NUMBER OF STUDENTS = 10
NUMBER OF ITEMS ON TEST = 20

DISTRIBUTION OF THE TEST ITEMS           DISTRIBUTION OF THE TEST ITEMS
IN TERMS OF THE                          IN TERMS OF
PERCENTAGE OF STUDENTS PASSING THEM      ITEM SCORE - TOTAL SCORE
                                         BISERIAL CORRELATIONS

PERCENT PASSING  NUMBER OF ITEMS         CORRELATIONS     NUMBER OF ITEMS
   0 -  19              0                NEGATIVE - .10          0
  20 -  39              0                  .11 - .30             1
  40 -  59             10                  .31 - .50             0
  60 -  79             10                  .51 - .70             3
  80 - 100              0                  .71 - .90             8
                                           .91 -                 8

CHOICES    % KEYED    % CHOSEN    AVG. DIFF.
   A        0.250      0.315        0.560
   B        0.250      0.205        0.540
   C        0.250      0.265        0.520
   D        0.250      0.215        0.500

% KEYED = FREQUENCY OF A GIVEN KEY DIVIDED BY THE NUMBER OF ITEMS.
% CHOSEN = FREQUENCY OF A GIVEN RESPONSE DIVIDED BY THE TOTAL NUMBER OF
           RESPONSES TO ALL ITEMS (EXCLUDING OMITS).
AVG. DIFF. = TOTAL OF ALL ITEM DIFFICULTY VALUES FOR ITEMS WITH A GIVEN KEY
             DIVIDED BY THE NUMBER OF SUCH ITEMS.