In discussions of the uses of IRT for psychological measurement the contention is often heard that IRT allows only the ordering of persons and items and that therefore a metric measurement of amounts of change or of treatment effects, e.g., in clinical psychology, cannot be established by means of IRT. The standard argument is that ultimately all measurements of abilities or traits must remain ordinal because we may at any time apply strictly monotone transformations to latent scales.
On the other hand, it might be countered that the same is true in physics, where we may at any time decide to replace the scales of properties like mass, force, or voltage - the metric scale properties of which are never doubted - by monotone functions of these scales, like square roots or logarithms. To this it might in turn be retorted that the fundamental measurement of physical properties is based on concatenation operations as an underpinning of the resulting ratio scales, whereas such operations do not exist for person traits in psychology.
It may be shown, however, that the principle of Specific Objectivity (SO), which is an empirically testable axiom introduced by Rasch (1968, 1972, 1977) in IRT and which was applied by Fischer (1987) to the measurement of change, actually is a perfect analogue of a concatenation operation. To make this clearer, we have to introduce the formal definition of SO on which the paper of Fischer (1987) was based.
Parameterization: Consider a patient S suffering from anxiety, who undergoes some therapy. Let , for t=1,2,3, be S's positions on the latent anxiety scale at three time points T1,T2,T3, and denote the amount of change (= the effect of a treatment) between Ta and Tb by . Then it is assumed that there exists a continuous function U(x,y), defined on R2, such that . U is called a `comparator' because it serves to compare the two values of anxiety, and , in order to arrive at an assessment of the amount of change, . For any fixed value x0, z=U(x0,y) is assumed to be a bijective decreasing mapping, and for any fixed value y0, z=U(x,y0) is assumed to be a bijective increasing mapping.
It then follows immediately from the monotonicity of U that there exists a function F such that . The meaning of F is that, if we know both S's latent anxiety at time point Ta before the treatment and the effect of the treatment given between Ta and Tb, denoted , we can predict uniquely S's amount of anxiety at time point Tb. F is therefore called the `effect function'.
Specific Objectivity (SO): The total effect of any two treatments with effect parameters
This definition of SO in a measurement of change framework is
analogous to Rasch's (1967, 1968, 1972, 1977) definition of SO for
the comparison of items or persons. The analogy becomes obvious if
in (1), yielding
This equation, which should hold for all possible values of the parameters , has strong implications for the functions H and U, because the right-hand side is a function of , whereas the left-hand side is independent of ; this means that must somehow cancel out of the right-hand side.
From results in the theory of functional equations (see Aczél, 1966)
it follows that there exist continuous, bijective functions (`scale
Moreover, the scale transformation is unique up to positive linear transformations , with c>0, and is unique up to similarity transformations .
Using a term introduced by G. Rasch, we may say that U and H are `latently additive' functions. From this property it is further concluded that, once this admissible additive representation of the latent trait and the effect parameters has been adopted, the scale for anxiety is an interval scale, and that for the treatment effect, a ratio scale.
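The role of latent additivity can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration only: the scale transformations `k` and `m` are arbitrary hypothetical choices, and the comparator `U` and the combination rule `H` are simply constructed to be additive after transformation. Under these assumptions, the check shows that the effect attributed to the whole interval does not depend on the intermediate state.

```python
import math

def k(x):
    """Hypothetical strictly increasing bijection of the latent trait scale."""
    return x**3 + x

def m(z):
    """Hypothetical strictly increasing bijection of the effect scale, m(0) = 0."""
    return math.sinh(z)

def m_inv(w):
    """Inverse of the hypothetical effect-scale transformation m."""
    return math.asinh(w)

def U(x, y):
    """Comparator: increasing in x, decreasing in y, additive after transformation."""
    return m_inv(k(x) - k(y))

def H(z1, z2):
    """Combined effect of two successive changes, additive on the m-scale."""
    return m_inv(m(z1) + m(z2))

def max_gap(x1, x3, intermediates):
    """Largest deviation between the direct and the combined comparison."""
    return max(abs(U(x1, x3) - H(U(x1, x2), U(x2, x3))) for x2 in intermediates)

# The intermediate state x2 cancels out, whatever value it takes.
gap = max_gap(1.5, -0.4, [-2.0, -0.7, 0.0, 0.3, 1.1, 2.5])
print(f"largest deviation over intermediate states: {gap:.2e}")
```

Replacing `k` by `c * k + d` with c > 0, and `m` by `c * m`, leaves both functions unchanged, which is one way to see the interval and ratio scale properties mentioned above.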
These results, however, are only one side of the problem of
constructing an IRT model that allows for a specifically objective
assessment of treatment effects. The other side is the search for an
appropriate ICC that links the latent parameters to the observed
reactions. Suppose the manifest reactions are S's responses to a
dichotomous item or the presence vs. absence of a certain symptom of
anxiety, and assume that the comparator function U be
a function of the two response probabilities
at time points Ta and Tb, respectively,
which is some monotone function of
an (unconditional or conditional) likelihood of the observed
responses; then it can be shown that the ICC
a logistic function (cf. Fischer, 1987),
The reason why we get such strong results about the measurement of treatment effects lies in the functional equations (1) and (2), in which continuity and strict monotonicity of the functions play a decisive role. But are these assumptions reasonable? In fact they are. Firstly, monotonicity is implied by the underlying psychological concepts: it is a defensible assumption that a medicinal or psychotherapeutic treatment of anxiety will always reduce - rather than increase - anxiety; etc. Secondly, the assumption of continuity is defensible because we may increase the dosage of a treatment as much or as little as we want, thus creating treatments with infinitely many different effect parameters, and it appears plausible that a small dosage of the treatment will at most produce a small change in the latent trait parameter. For this argument, however, it is essential that the dosage of a treatment is a manipulable experimental factor rather than an incidental factor. This is the substantive justification of the assumptions that yield strong results concerning the measurement of effects.
It is worthwhile to mention that in other psychometric settings we do not obtain equally strong results about the measurement of, say, abilities in educational testing: if persons can be tested only once and if the test consists of a finite number of items with fixed item parameters conforming to the Rasch Model (RM), it can be shown that the scale properties depend on whether the differences between item parameters are rational or irrational numbers; in other words, the scale properties of the measurement are empirically undecidable (see Fischer, 1995). (This is primarily of theoretical interest, though, and has few practical consequences because interval scale properties can be shown to hold for infinitely many discrete points along the latent continuum; but in principle the scale level remains undecidable.) The reason for this dilemma is that continuity in terms of the item parameters - tacitly assumed by Rasch (1967, 1968, 1972, 1977) and most authors in the IRT literature - is not justifiable: we cannot change item content in such a manner that arbitrarily small changes of item difficulty result.
These considerations show why applications of IRT to clinical or applied psychology are particularly nice from a psychometric point of view. There are also other reasons why IRT is especially fruitful for clinical psychology; some of them are given below.
Although treatment effect studies basically are experimental because treatments can be given to or withheld from patients and dosages can be manipulated, the randomized assignment of patients to treatment groups is often hard to realize: there may be practical or ethical reasons to reject randomization. Therefore, treatment groups are often compared to waiting groups (i.e., groups of patients waiting for treatment). Unfortunately, experience shows that in most cases waiting groups differ significantly from the treatment groups with respect to the distribution of relevant person characteristics. Moreover, in psychotherapy the patients play an active role in the selection of the therapy method. Lastly, patients' syndromes and the severity of their illness influence the treatment decisions made by doctors or psychologists. Therefore, treatment effect studies often are not experimental studies in a strict sense; they are only what is called `quasi-experimental'.
We therefore have to ask, how can results about treatment effects be obtained that remain valid in spite of systematic differences between treatment and control groups? IRT has been seen to be an excellent basis for solving this problem as long as the differences between treatment and control groups reside in the distributions of the person parameters. The answer again lies in the method of conditional inference introduced in IRT by Rasch (1960) and Andersen (1973). Conditional inference is the statistical counterpart of the epistemic principle of SO. It requires that there exist a nontrivial likelihood function depending on the treatment effect parameter(s) but being independent of the incidental person parameters. Maximizing that function yields a conditional maximum likelihood (CML) estimator of the treatment effect independently of the distribution of the latent person parameters in the samples. This CML estimator will generally be biased (like other ML estimators), but if its consistency can be proved, it suffices to increase the sample size to obtain as precise a result as one desires. SO as a general methodological principle and conditional inference as a practical tool therefore provide an excellent answer to the scientific challenges of therapy effect studies - as long as the differences between groups are describable by the distributions of the person parameters. (On the many uses of the CML methods in the Rasch model and related models, cf. the monograph by Fischer & Molenaar, 1995.)
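The logic of conditional inference can be sketched for the simplest conceivable case. The sketch assumes one dichotomous symptom observed before and after treatment, a logistic response function, and a treatment effect that is added to the person parameter. The names `theta`, `eta`, and all functions are hypothetical choices for this illustration and are not taken from any program mentioned in the text. Conditional on exactly one positive response, the probability of the pattern (0, 1) no longer involves the person parameter, so the effect can be estimated from the discordant pairs alone.

```python
import math
import random

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def conditional_prob_01(theta, eta):
    """P(pattern (0, 1) | exactly one positive response) for one person."""
    p1 = logistic(theta)          # time point 1
    p2 = logistic(theta + eta)    # time point 2, shifted by the effect
    p01 = (1.0 - p1) * p2
    p10 = p1 * (1.0 - p2)
    return p01 / (p01 + p10)

def cml_estimate(pairs):
    """CML estimate of eta: log ratio of the two discordant pattern counts."""
    n01 = sum(1 for pair in pairs if pair == (0, 1))
    n10 = sum(1 for pair in pairs if pair == (1, 0))
    return math.log(n01 / n10)    # assumes both counts are positive

def simulate(thetas, eta, rng):
    """Draw one (before, after) response pair per person."""
    pairs = []
    for theta in thetas:
        x1 = int(rng.random() < logistic(theta))
        x2 = int(rng.random() < logistic(theta + eta))
        pairs.append((x1, x2))
    return pairs

rng = random.Random(2024)
eta_true = -0.8
# Two groups with deliberately different person parameter distributions.
mild = [rng.gauss(-1.0, 0.5) for _ in range(20000)]
severe = [rng.gauss(1.5, 1.0) for _ in range(20000)]
est_mild = cml_estimate(simulate(mild, eta_true, rng))
est_severe = cml_estimate(simulate(severe, eta_true, rng))
print(round(est_mild, 2), round(est_severe, 2))
```

Both groups should yield estimates close to the same effect although their person parameter distributions differ markedly, which is the property the quasi-experimental setting requires.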
The greater part of the IRT literature deals with dichotomous items. In clinical psychology, however, symptoms may be expressed in several degrees, or critical events (such as epileptic seizures or migraine attacks) may occur more or less frequently. Therefore, in clinical psychology IRT models are required for observations that are realizations of bounded or unbounded integer variables. A fairly general formal framework for treatment effect studies in clinical or applied psychology therefore is the following:
Let the state of a person at time point T1 be expressed in terms
of the distribution of a positive integer random variable H1 (a
so-called `lattice variable') with probability functions
governed by a real valued parameter
are continuous in
For time point T2, let the state of the
person similarly be expressed in the form of a distribution of a
lattice variable H2 with probability functions
is a real valued parameter
The variables H1 and H2 are
assumed to be stochastically independent and increasing in
in the sense that
for all h, are strictly monotone
increasing in ,
The present formalization covers cases where the observation is a frequency, or where it is a response to a polytomous item with ordered categories, or a response to a dichotomous item; in the first of these cases, h is allowed to take infinitely many values, h = 0, 1, 2, ...; in the second, H1 and H2 are bounded such that 0 ≤ h ≤ m, where m+1 is the number of response categories; and in the third, m = 1. Notice that the parameter of H2 is written in additive form without loss of generality because we know that, if a specifically objective result concerning change is possible, the parameters of the person and of the treatment effect can always be transformed into the additive system (6) through (8).
Under these assumptions about the variables H1 and H2, the
following result can be derived (see Fischer, 1995, pp. 298-303):
If the conditional probability
realizations h1 and h2, is some function
and strictly monotone in ,
must be probability functions of the form
The distribution form underlying (11) is a so-called `power series distribution', which is well-known in statistics (see, e.g., Noack, 1950; Patil, 1965; Johnson & Kotz, 1969, Vol. 1). However, the general form of the model in (11) is not yet useful because the power series distributions comprise infinitely many unknown constants ph and qh. To render (11) useful, we either have to introduce more restrictive assumptions on the model or to truncate the variables H1 and H2 so that h is restricted to the values 0, 1, ..., m, as, e.g., in rating scale items with m+1 ordered response categories; then the number of constants ph and qh is at most 2m, so that they can be estimated empirically.
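The following sketch illustrates why the power series form fits the requirement of specific objectivity. It assumes the standard truncated power series form, in which P(H = h) is proportional to a constant times the h-th power of a positive parameter, written here as exp(theta) at the first and exp(theta + eta) at the second time point. The constants in `p` and `q` and all function names are hypothetical. Under these assumptions, the conditional distribution of H2 given the sum H1 + H2 does not depend on theta.

```python
import math

def power_series_pmf(consts, log_param):
    """P(H = h) proportional to consts[h] * exp(h * log_param), h = 0..m."""
    weights = [c * math.exp(h * log_param) for h, c in enumerate(consts)]
    total = sum(weights)
    return [w / total for w in weights]

def conditional_given_sum(p, q, theta, eta, s):
    """Distribution of H2 given H1 + H2 = s, with H1 and H2 independent."""
    pmf1 = power_series_pmf(p, theta)          # time point 1
    pmf2 = power_series_pmf(q, theta + eta)    # time point 2
    joint = {h2: pmf1[s - h2] * pmf2[h2]
             for h2 in range(len(q)) if 0 <= s - h2 < len(p)}
    total = sum(joint.values())
    return {h2: value / total for h2, value in joint.items()}

p = [1.0, 0.8, 0.5, 0.1]   # hypothetical constants for H1, four categories
q = [1.0, 1.2, 0.4, 0.3]   # hypothetical constants for H2
low = conditional_given_sum(p, q, theta=-1.0, eta=0.7, s=3)
high = conditional_given_sum(p, q, theta=2.0, eta=0.7, s=3)
for h2 in sorted(low):
    print(h2, round(low[h2], 4), round(high[h2], 4))
```

The two printed columns agree for every value of h2, because the factor exp(s * theta) is common to all terms of the joint probability and cancels when the conditional distribution is normalized.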
Regarding the first of these two cases, consider an application where H1 is the number of migraine attacks of a patient that occur during an observation period prior to the migraine treatment; that is, H1 represents the so-called `base rate' of that patient. To enable a meaningful comparison of both periods of observation, we assume that the two distributions are the same, that is, ph = qh for all h; in other words, all that possibly changes resides in the latent trait parameter.
Clearly, for any chosen observation period, some attacks will have happened prior to it and others would happen after its end if the observation were continued; that is, some attacks will remain unobserved. The same holds for the observation period after the treatment. The methodological problem is that the unobserved events may be a source of bias of the result about the treatment effect. If it is postulated, however, that ignoring the unobserved events does not systematically affect the result about the treatment effect (denoted `ignorability principle' by Fischer, 1991), it can be shown that the model must be equivalent to Rasch's (1960, 1973) multiplicative Poisson model, which results from (11) by setting ph = 1/h! and qh = 1/h!.
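For the Poisson case, the conditional argument can be checked numerically. The sketch below assumes two observation periods of equal length, a pre-treatment attack rate exp(theta), and a post-treatment rate exp(theta + eta); these symbols and all function names are hypothetical choices for the illustration. Given the total number of attacks, the number observed after treatment then follows a binomial distribution whose probability depends on eta only.

```python
import math

def poisson_pmf(h, rate):
    return math.exp(-rate) * rate**h / math.factorial(h)

def binomial_pmf(h, n, prob):
    return math.comb(n, h) * prob**h * (1.0 - prob)**(n - h)

def conditional_h2(h2, total, theta, eta):
    """P(H2 = h2 | H1 + H2 = total) under the multiplicative Poisson model."""
    rate1 = math.exp(theta)          # attack rate before treatment
    rate2 = math.exp(theta + eta)    # attack rate after treatment
    joint = poisson_pmf(total - h2, rate1) * poisson_pmf(h2, rate2)
    marginal = poisson_pmf(total, rate1 + rate2)   # sum of two Poisson counts
    return joint / marginal

def cml_effect(h1_total, h2_total):
    """CML estimate of eta from attack counts summed over patients."""
    return math.log(h2_total / h1_total)   # assumes both totals are positive

eta = -0.6
prob = math.exp(eta) / (1.0 + math.exp(eta))
for theta in (-1.0, 0.5, 2.0):
    # Same conditional probability for every base rate theta.
    print(theta, conditional_h2(3, 10, theta, eta), binomial_pmf(3, 10, prob))
```

In this simplified setting the estimate of the effect is just the log ratio of the attack counts after and before treatment, whatever the patients' individual base rates are.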
In the second case, where H1 and H2 are responses to rating
scale items, let the variables H1 and H2 be restricted to
Introducing new parameters
(11), we obtain
By applying the CML method again, the person parameters can be eliminated and the so-called `basic parameters' estimated jointly. A problem that arises is that treatment effect studies mostly employ only small samples, so that estimating the item parameters jointly with the treatment effect parameters may bias the latter. Suppose, however, that a sample of calibration data for the items is available. The information about the item parameters contained in the calibration data can be used by simply adding the calibration sample to the actual treatment effect data: notice that the model for time point T1 - prior to the treatments, where the dosages qvj1 are zero for all v and j - is just a PCM, so that the calibration data may simply be added to the observations obtained at time point T1. This will stabilize the item parameter estimates and thus render the treatment effect estimates more precise.
Here we encounter one more payoff of the CML approach: since the person parameters are eliminated, the remaining `basic parameters' of the LPCM may generalize over different samples or populations. The LPCM is therefore very well suited for those problems which are treated in Meta Analysis, which nowadays is so popular in clinical psychology: the central question in Meta Analysis is, do treatment effects generalize over different studies? In the present models, the H0 of the generalizability of treatment effects over different samples is immediately testable by means of conditional likelihood ratio tests.
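A conditional likelihood ratio test of this kind can be sketched for the simplest dichotomous case, with one symptom observed before and after treatment in several samples. This is a deliberately reduced illustration under stated assumptions, not the LPCM itself and not the procedure of any particular program: each sample is summarized by the hypothetical counts `n01` (symptom absent, then present) and `n10` (present, then absent), and the statistic is referred to a chi-square distribution with one degree of freedom fewer than the number of samples.

```python
import math

def cond_loglik(n01, n10, eta):
    """Conditional log-likelihood of eta given the discordant response pairs."""
    return n01 * eta - (n01 + n10) * math.log1p(math.exp(eta))

def clr_test(studies):
    """studies: list of (n01, n10) tuples, one per sample, all counts positive."""
    # H1: each sample has its own effect, estimated by log(n01 / n10).
    separate = sum(cond_loglik(a, b, math.log(a / b)) for a, b in studies)
    # H0: one common effect, estimated from the pooled counts.
    a_all = sum(a for a, _ in studies)
    b_all = sum(b for _, b in studies)
    eta_pooled = math.log(a_all / b_all)
    pooled = sum(cond_loglik(a, b, eta_pooled) for a, b in studies)
    statistic = 2.0 * (separate - pooled)
    return statistic, len(studies) - 1

def chi2_sf_1df(x):
    """Upper tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(x, 0.0) / 2.0))

statistic, df = clr_test([(30, 60), (60, 30)])
print(statistic, df, chi2_sf_1df(statistic))
```

A large statistic, as for the two invented samples above with opposite effect directions, speaks against the generalizability of the treatment effect over the samples.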
There are two important special cases of the LPCM which deserve being
mentioned: in the LPCM, there is one parameter
combination of item
category, that is, the items may have
different response categories. In items which all have the same
response categories, it often makes sense to replace the
that is, to assign one scalar
item (easiness) parameter to each item and one category
(attractiveness) parameter to each response category. Thereby,
the number of parameters to be estimated is considerably reduced.
The other special case occurs if the items are dichotomous (`yes' vs. `no'; correct vs. incorrect; `+' vs. `-'); then the Linear Logistic Test Model (LLTM; Fischer, 1973, 1976, 1995b) for the measurement of change results as a special case. All these models - including of course the simple RM, the PCM and the RSM applied to data obtained at just one time point - can be estimated and hypotheses within these models can be tested by means of a recently developed program that runs on PCs under Windows (LPCM-WIN 1.0; Fischer & Ponocny-Seliger, 1998). This program is an elegant tool especially for treatment effect studies in clinical or applied psychology.
In the family of models described so far it is assumed that all items Ii measure one and the same unidimensional ability or trait. In many domains of applied research in psychology or education, however, the researcher will opt for multidimensional item sets to monitor change of behavior under the effects of treatments. A typical example is the treatment of depressive patients, where the items are various symptoms related to emotion, cognition, body functions, circadian rhythm, work, and social relations. Clearly, such a heterogeneous set of items will never form a unidimensional scale of depressiveness. It is therefore paramount to extend the models so that items may be multidimensional.
This method of dealing with multidimensionality within the family of
LPCMs is a simple reinterpretation of the LPCM: to each person
Sv, assign one separate parameter
per item Ii,
These parameters will be eliminated again, however,
by conditioning on one raw score per item namely,
To obtain a multidimensional LPCM, all we have to do is to rewrite
To specify the subsets Jd and to handle the data accordingly, various tools are provided by the computer program LPCM-WIN 1.0. The program supports incomplete designs where different samples of items from each subset Jd are selected at different time points, which has the advantage that memory and satiation effects, which often are quite troublesome in repeated measurement designs, are avoided. To summarize this, LPCM-WIN 1.0 allows for