Other Issues in Computer-Adaptive Testing

Pass-Fail CAT

In many situations, the purpose of the test is to classify examinees into two or more mutually exclusive and exhaustive categories. This includes the common "mastery test," where the two classifications are "Pass" and "Fail," but also situations with three or more classifications, such as "Insufficient," "Basic," and "Advanced" levels of knowledge or competency. The kind of "item-level adaptive" CAT described in this article is most appropriate for tests that are not Pass/Fail, or for Pass/Fail tests where providing good feedback is extremely important. Some modifications are necessary for a Pass/Fail CAT, also known as a computerized classification test (CCT). In particular, a new termination criterion and scoring algorithm must be applied that classifies the examinee into a category rather than providing a point estimate of ability. There are two primary methodologies available for this. The more prominent of the two is the sequential probability ratio test (SPRT). This formulates the examinee classification problem as a hypothesis test in which the examinee's ability is equal either to some specified point above the cut score or to another specified point below the cut score.
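The SPRT decision rule can be sketched in a few lines. The following is a minimal illustration, not a reference implementation: it assumes a two-parameter logistic (2PL) item response model, places the two hypothesized ability points symmetrically at cut - delta and cut + delta, and uses Wald's standard thresholds for nominal error rates alpha and beta. The function names, parameter names, and default values are all hypothetical.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response (model choice is an assumption)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def sprt_decision(responses, items, cut, delta=0.3, alpha=0.05, beta=0.05):
    """Classify an examinee with Wald's sequential probability ratio test.

    responses: list of 0/1 scored answers given so far
    items:     list of (a, b) item parameters, same order as responses
    cut:       cut score on the ability (theta) scale
    delta:     half-width of the indifference region around the cut score
    """
    theta_lo, theta_hi = cut - delta, cut + delta
    log_lr = 0.0
    for u, (a, b) in zip(responses, items):
        p_hi = p_correct(theta_hi, a, b)
        p_lo = p_correct(theta_lo, a, b)
        # Log-likelihood contribution of this response under each hypothesis
        log_lr += math.log(p_hi / p_lo) if u == 1 else math.log((1 - p_hi) / (1 - p_lo))

    upper = math.log((1 - beta) / alpha)   # crossing this favors "above the cut"
    lower = math.log(beta / (1 - alpha))   # crossing this favors "below the cut"
    if log_lr >= upper:
        return "pass"
    if log_lr <= lower:
        return "fail"
    return "continue"
```

In use, the function would be called after every scored response, and testing would stop as soon as it returns something other than "continue". A wider delta makes decisions faster at the cost of less protection for examinees near the cut score.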

A confidence interval approach is also used: after each item is administered, the algorithm evaluates whether the examinee's true score is likely to be above or below the passing score. For example, the algorithm may continue until the 95% confidence interval for the true score no longer contains the passing score. At that point, no further items are needed because the pass-fail decision can already be made with 95% confidence (assuming that the psychometric models underlying the adaptive testing fit the examinee and test). For examinees with true scores very close to the passing score, this algorithm results in long tests, while those with true scores far above or below the passing score receive the shortest exams. As a practical matter, the algorithm is generally programmed to have a minimum and a maximum test length (or a minimum and maximum administration time). This approach was originally called "adaptive mastery testing," but it can also be applied to non-adaptive item selection and to classification situations with two or more cut scores (the typical mastery test has a single cut score).
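The stopping rule just described can be sketched as follows. This is an illustrative sketch under stated assumptions: the interval is a normal-theory interval built from an ability estimate and its standard error (both supplied by the caller), the cut score is expressed on the same scale, and an examinee who reaches the maximum length without a clear decision is classified by the point estimate. The function name, parameter names, and default limits are hypothetical.

```python
from statistics import NormalDist


def ci_decision(theta_hat, se, cut, n_items, conf=0.95, min_items=5, max_items=50):
    """Return "pass", "fail", or "continue" from a confidence interval check.

    theta_hat: current ability estimate
    se:        standard error of that estimate
    cut:       passing score on the same scale as theta_hat
    n_items:   number of items administered so far
    """
    # Two-sided critical value, e.g. about 1.96 for conf=0.95
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    lower, upper = theta_hat - z * se, theta_hat + z * se

    # Only allow an early decision once the minimum length is reached
    if n_items >= min_items:
        if lower > cut:
            return "pass"
        if upper < cut:
            return "fail"

    # Forced decision at maximum length (fallback rule is an assumption)
    if n_items >= max_items:
        return "pass" if theta_hat >= cut else "fail"
    return "continue"
```

An examinee whose estimate sits near the cut score keeps receiving "continue" until the standard error shrinks enough or the maximum length is reached, which is why such examinees see the longest tests.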

The item selection algorithm utilized depends on the termination criterion. Maximizing information at the cut score is more appropriate for the SPRT because it maximizes the difference in the probabilities used in the likelihood ratio. Maximizing information at the ability estimate is more appropriate for the confidence interval approach because it minimizes the conditional standard error of measurement, which decreases the width of the confidence interval needed to make a classification.
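The difference between the two selection rules comes down to the ability value at which item information is evaluated. The sketch below assumes a 2PL model, for which the Fisher information of an item is a^2 * P * (1 - P); the pool layout and function names are hypothetical.

```python
import math


def information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def select_item(pool, administered, target_theta):
    """Pick the unused item with maximum information at target_theta.

    pool:         list of (a, b) item parameters
    administered: collection of pool indices already used
    target_theta: the cut score (SPRT) or the current ability
                  estimate (confidence interval approach)
    """
    candidates = [i for i in range(len(pool)) if i not in administered]
    return max(candidates, key=lambda i: information(target_theta, *pool[i]))
```

For an SPRT-based test, one would call select_item(pool, used, cut) throughout the session, whereas for the confidence interval approach the third argument would be the ability estimate updated after each response.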

Constraints of Adaptivity

ETS researcher Martha Stocking has quipped that most adaptive tests are actually barely adaptive tests (BATs) because, in practice, many constraints are imposed upon item choice. For example, CAT exams must usually meet content specifications; a verbal exam may need to be composed of equal numbers of analogy, fill-in-the-blank, and synonym item types. Also, on some tests, an attempt is made to balance surface characteristics of the items, such as the gender of the people in the items or the ethnicity implied by their names. Thus CAT exams are frequently constrained in which items they may choose, and for some exams the constraints may be substantial and require complex search strategies (e.g., linear programming) to find suitable items.
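To show how a content specification narrows item choice, here is a deliberately simple greedy heuristic. It is not linear programming and not any particular operational method: it merely restricts selection to the item type whose administered share is furthest below its target share, then picks the most informative item of that type. A 2PL model is assumed, and the pool layout, type labels, and function names are hypothetical.

```python
import math
from collections import Counter


def information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def select_constrained(pool, administered, theta, targets):
    """Greedy content-balanced item selection.

    pool:         list of dicts with keys "a", "b", "type"
    administered: list of pool indices already used
    theta:        current ability estimate
    targets:      dict mapping item type to its target proportion
    """
    used = set(administered)
    counts = Counter(pool[i]["type"] for i in administered)
    n = len(administered)

    # Item types that still have unused items in the pool
    available = [t for t in targets
                 if any(it["type"] == t and i not in used for i, it in enumerate(pool))]

    # Type whose observed share is furthest below its target share
    neediest = max(available, key=lambda t: targets[t] - (counts[t] / n if n else 0.0))

    candidates = [i for i, it in enumerate(pool)
                  if it["type"] == neediest and i not in used]
    return max(candidates, key=lambda i: information(theta, pool[i]["a"], pool[i]["b"]))
```

Note that the chosen item can be far from the most informative item in the whole pool, which is the sense in which a heavily constrained test becomes "barely adaptive."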

Wim van der Linden and his coauthors have advanced an alternative approach called shadow testing, which involves creating entire shadow tests as part of selecting items. Selecting items from shadow tests helps adaptive tests meet selection criteria by focusing on globally optimal choices (as opposed to choices that are optimal for a given item).
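The idea can be illustrated on a toy scale. The sketch below assembles, by brute-force enumeration, a full-length "shadow test" that contains every item already administered, satisfies exact per-type quotas, and has maximum total information at the current ability estimate; the next item is then the most informative unused item within that shadow test. Operational systems would use an integer programming solver rather than enumeration, and the 2PL model, quota format, and all names here are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def next_item_via_shadow_test(pool, administered, theta, test_length, quotas):
    """Select the next item from an optimal full-length shadow test.

    pool:         list of dicts with keys "a", "b", "type"
    administered: list of pool indices already used
    quotas:       dict mapping item type to the exact count required
    """
    remaining = test_length - len(administered)
    if remaining <= 0:
        raise ValueError("test is already complete")

    def info(i):
        return information(theta, pool[i]["a"], pool[i]["b"])

    free = [i for i in range(len(pool)) if i not in administered]
    best_test, best_info = None, -1.0
    # Enumerate every way to complete the test (feasible only for tiny pools)
    for extra in combinations(free, remaining):
        test = list(administered) + list(extra)
        counts = Counter(pool[i]["type"] for i in test)
        if any(counts[t] != q for t, q in quotas.items()):
            continue  # violates the content specification
        total = sum(info(i) for i in test)
        if total > best_info:
            best_test, best_info = test, total

    if best_test is None:
        raise ValueError("no feasible shadow test")
    # Administer the most informative not-yet-used item of the shadow test
    return max((i for i in best_test if i not in administered), key=info)
```

Because every candidate comes from a complete test that already satisfies all constraints, the selection can never paint itself into a corner where the remaining items cannot meet the specification.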