Katrijn Van Deun, Luc Delbeke
University of Leuven, Belgium
A short elementary overview is presented on the use of multidimensional scaling as a method of data analysis, and as a tool for perceptual and cognitive modelling in psychology.
Multidimensional scaling is an exploratory technique used to visualize proximities in a low-dimensional space. Interpretation of the dimensions can lead to an understanding of the processes underlying the perceived nearness of entities. Furthermore, it is possible to incorporate individual or group differences in the solution. In this paper a general overview of multidimensional scaling is given, explaining the basics and giving a classification of the frequently used models. An example is discussed, and the results obtained using the popular ALSCAL algorithm are compared to those obtained using the recent and promising PROXSCAL algorithm.
A sound knowledge of linear algebra is assumed. The basics can be found in Strang (1993).
Multidimensional scaling (MDS) encompasses a collection of methods which allow one to gain insight into the underlying structure of relations between entities by providing a geometrical representation of these relations. As such, these methods belong to the more general category of methods for multivariate data analysis. Multidimensional scaling can be characterised by the generality of the type of observed relations which can be submitted to the data analysis on the one hand, and by the specificity of the type of geometrical representation of these relations on the other hand. Any kind of relation between a pair of entities that can be translated into a proximity measure (or, conversely, into a dissimilarity measure) can be considered possible input for multidimensional scaling, while the choice of a particular type of spatial representation can be considered to be the most important part of the "modelling" which goes together with the application of a specific MDS-algorithm to the set of proximities.
MDS has its roots in two important traditions within psychology. The first is in psychophysics and the other in psychometrics. Young and Householder (1941) wanted to extend the methodology of unidimensional scaling of perceptual characteristics of stimuli to the simultaneous scaling of several characteristics. Guttman (1954) was interested in a less restrictive model than the factor-analytic model to represent the relations between several assessment variables, a model which at the same time would allow for a much more systematic way to formulate hypotheses on the underlying structure of these assessment variables. The psychophysical approach led to algorithmic developments which soon came to be known by the name of multidimensional scaling, while the psychometric approach preferred to label its own production of algorithms under the heading of "smallest space analysis". Nowadays, both traditions can still be recognised in terms of types of applications of MDS-methods, while the latest generation of MDS-algorithms is clearly based on features borrowed from earlier algorithms as they were developed within both traditional lines. Hence there is no longer any need or reason to make the distinction as far as the methodology as such is concerned. This is also reflected in a number of recent textbooks, such as Cox & Cox (1994), Borg & Groenen (1997), and Everitt & Rabe-Hesketh (1997).
We use the symbol pij to refer to the proximity measure between entities i and j. If a subject has to indicate the perceived dissimilarity between two colour patches on a rating scale (0 for "no difference" and 10 for "maximal difference"), then this rating can be considered to be a reversed measure of the proximity between the two colour stimuli. Or a correlation coefficient between variables i and j can be considered to be a proximity measure for these two variables. The proximities are then represented in a geometrical space, e.g. in a Euclidean space. The distance between two points i and j in an m-dimensional Euclidean space is given by the formula:

$d_{ij} = \sqrt{\sum_{a=1}^{m} (x_{ia} - x_{ja})^2}$ (3.1)
The Euclidean distances are related to the observed proximities by a suitable transformation depending on the measurement characteristics considered to be appropriate for these proximities: dij = f(pij) (under the assumption that the geometrical model fits the data perfectly). If the proximities are unique up to e.g. a linear transformation, then f(pij) = a+b(pij), where the multiplicative constant b is negative when the proximities have been observed as similarities between stimuli, and positive when they have been observed directly as dissimilarities. If the proximities contain only ordinal information with respect to the perceived dissimilarity (or similarity) between stimuli, the function f will belong to the class of all possible monotone (or inverse monotone) transformations. MDS-analyses which imply uniqueness on the interval level (or stronger levels of uniqueness such as ratio or absolute level) are known as metric MDS or classical scaling. They can be shown to be special cases of principal components analysis. If weaker levels of uniqueness than the interval level are assumed, use is made of so-called nonmetric MDS algorithms. The construction of this latter type of algorithms in the MDS-context was to a certain extent the gateway through which algorithms for non-linear analysis became well known in psychology.
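As a concrete illustration (a Python/NumPy sketch; the function and variable names are our own, not from the original text), the matrix of Euclidean distances between n points in m dimensions can be computed as follows:

```python
import numpy as np

def euclidean_distances(X):
    """Distance matrix d_ij for an n x m coordinate matrix X."""
    # Form all pairwise differences (x_i - x_j) via broadcasting,
    # then sum the squared differences over the m dimensions.
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

X = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 4.0]])
D = euclidean_distances(X)
print(D[0, 1])  # 5.0
```

The resulting matrix is symmetric with a zero diagonal, as a distance matrix should be.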
The coordinates in the distance function (xia, i = 1, ..., n with n = number of entities, a = 1, ..., m with m = number of dimensions) and the function f which transforms the proximities into distances are estimated by minimising the following badness-of-fit function (usually called stress or S-function in the context of MDS):

$S = \sqrt{\dfrac{\sum_{i<j} (\hat{d}_{ij} - d_{ij})^2}{\sum_{i<j} d_{ij}^2}}$ (3.2)
This function takes into account that it is not realistic to expect a perfect fit of the model to the data. Therefore, the $\hat{d}_{ij}$ values are introduced in the S-function as the optimal approximations of the transformed proximities to the distances dij in the geometrical representation. They are obtained by applying the suitable transformation to the observed proximities, or: $\hat{d}_{ij} = f(p_{ij})$. The $\hat{d}_{ij}$ values are often referred to as the disparities, in contrast to the untransformed proximities pij on the one hand and the modelled distances dij in the geometrical space on the other hand. To the extent that the disparities have a close fit to the distances, the function f can be considered to be a kind of psychophysical function relating the observed proximities to the modelled distances, e.g. when the dimensions of the geometrical space are clearly related to specific physical characteristics of the stimuli.
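The S-function and the role of the disparities can be sketched in Python/NumPy (names are ours; we assume the common normalisation by the sum of squared distances):

```python
import numpy as np

def stress(dhat, d):
    """Badness-of-fit S between disparities dhat and model distances d,
    normalised by the sum of squared distances (a common Stress-1 form).
    Only the upper triangle is used, so each pair (i, j) counts once."""
    iu = np.triu_indices_from(d, k=1)
    misfit = ((dhat[iu] - d[iu]) ** 2).sum()
    scale = (d[iu] ** 2).sum()
    return np.sqrt(misfit / scale)

d = np.array([[0., 1., 2.],
              [1., 0., 1.],
              [2., 1., 0.]])
print(stress(d, d))  # 0.0 -- a perfect fit gives zero stress
```

Any discrepancy between disparities and distances raises the stress value above zero.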
Multidimensional scaling is not just one method of data analysis. Different algorithms can be used to obtain the geometrical representation of the proximities, and this goes together with the existence of a number of multidimensional scaling models. In fact the models can be classified according to the kind of proximity dealt with, and some of these models have been extended to incorporate inferential and confirmatory procedures. This correspondence between data and model is reflected in what follows: first we give a classification of proximity measures and then we complement it with the common MDS models.
Proximity is a general term which indicates 'nearness' of entities. Two cases can be distinguished: the nearness of objects belonging to the same set (dissimilarities/similarities) and the nearness of objects belonging to different sets (dominance relations). In psychology they are often obtained directly (see the colour patches example above), but there are many ways to obtain them indirectly, the classical case being the dissimilarity based on one minus the Pearson product-moment correlation. An overview of coefficients used to obtain proximities is given by Cox and Cox (1994).
The classification we will use for proximity data draws on the work of Coombs (1964), Carroll and Arabie (1980) and Young (1987 and 1999). It is based on the shape of the data matrix and the measurement characteristics of the data.
The shape of the data matrix encompasses the number of ways, the number of modes and the presence/absence of replications and of symmetry. The number of ways of a data matrix refers to the dimensionality of the data matrix, i.e. the number of manipulated conditions considered in the experimental design. The number of modes refers to the number of unique ways underlying the dissimilarities. A special kind of way worth noting is one containing replications of the experimental design. Replications are considered to be exchangeable. In the case of a two-way (replicated) data matrix, we speak of a rectangular matrix if there are two modes and of a square matrix if there is only one mode. This coincides with the distinction between dominance relations and (dis)similarities. The symmetry of the proximities is often assumed in the multidimensional scaling of square matrices but not always fulfilled (e.g. it can happen that a person 'a' likes/is close to a person 'b' but is disliked by/far away from that same person 'b').
The measurement characteristics include the measurement level, the measurement process and the conditionality. The measurement level relates to the invariance of the proximities under transformations. The usual scales are the ratio, interval, ordinal and nominal scales. Multidimensional scaling is particularly suited for the analysis of ordinal data; this is the domain of the non-metric scaling models. Metric MDS is applicable to data measured at the ratio and interval levels. The measurement process comes down to the distinction between continuous and discrete: objects measured by a discrete process and belonging to the same category have the same number, while objects measured by a continuous process fall in a range of numbers when belonging to the same category. This feature is important for the approach to ties in ordinal MDS models. If all proximities can be compared, there is no conditionality. More often the proximities are row- or matrix-conditional (e.g. when two persons rank tomatoes as the most preferred vegetable, this does not necessarily mean they equally like tomatoes: for one person this could really be his favorite vegetable, while for the other this could be the least rejected vegetable of those he has to order according to his preference).
In total we have 7 elements (as the number of modes w.r.t. the number of ways indicates whether the data matrix is square or rectangular) to categorize the proximity matrix. This means that a lot of different proximities, and thus a lot of different models, could be constructed. Therefore we will restrict ourselves to the shape and measurement characteristics that are important in distinguishing the common MDS models. These are the number of ways and modes, the squareness, symmetry and replication of the design. As to the measurement level, all models mentioned represent in fact two models: a metric and a non-metric one.
In this section we first classify the common multidimensional scaling models according to the classification used for the proximities. After that the models are introduced and finally we mention some recent developments in MDS.
The variety in proximities goes together with a variety in multidimensional scaling models. So far, the multidimensional scaling literature has mainly dealt with two- and three-way data matrices. A first major distinction is between the metric and nonmetric scaling models. Both cover the same range of models, so we will treat them together. The second distinction is made w.r.t. the squareness of the matrix: square matrices are the regular ones, while rectangular matrices are the domain of the unfolding models. A third distinction is made w.r.t. the third way: is this way a replication of the two-way design or not? Finally, for each of the square matrices described so far there exist models for symmetric as well as asymmetric matrices. All this is summarized in table 1.
Metric Multidimensional scaling
We start with the simplest case of multidimensional scaling: when the data are quantitative. In classical scaling the proximities are treated directly as distances. However, the matrix of (dis)similarities P should be preprocessed in order to have a metric. Two properties have to hold, called non-degeneracy and the triangular inequality. Non-degeneracy means that $\delta_{ii} = 0$ for all i, and the triangular inequality states that $\delta_{ij} \leq \delta_{ik} + \delta_{kj}$ for all triplets (i, j, k). The matrix obtained after preprocessing is labeled D. It can be shown that the elements of the double-centered matrix of squared distances equal minus two times the scalar products:

$d_{ij}^2 - \frac{1}{n}\sum_{i} d_{ij}^2 - \frac{1}{n}\sum_{j} d_{ij}^2 + \frac{1}{n^2}\sum_{i,j} d_{ij}^2 = -2\sum_{a=1}^{m} x_{ia}x_{ja}$ (4.1)
with i the row index, j the column index, n the number of objects and m the number of dimensions. Then the matrix of scalar products is:

$B = -\frac{1}{2}\left(I - \frac{1}{n}ii'\right) D^{(2)} \left(I - \frac{1}{n}ii'\right)$ (4.2)

(with $D^{(2)}$ the matrix of squared distances $d_{ij}^2$),
where I is an n by n identity matrix and i a unity vector of length n. This matrix is symmetric and positive semidefinite. Performing the singular value decomposition (SVD), $B = V\Lambda V'$, we can define $B = XX'$ with $X = V\Lambda^{1/2}$ being the matrix of coordinates. The eigenvectors are assumed to be normed. Retaining only the first r eigenvectors leads to a solution in lower dimensionality: this implies that the summation over a in equation (4.1.) runs over 1 to r instead of m. This is the best lower-rank approximation in the least-squares sense.
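A minimal Python/NumPy sketch of this classical scaling procedure (double centring, eigendecomposition, truncation; the function and variable names are our own):

```python
import numpy as np

def classical_scaling(Delta, r=2):
    """Classical (metric) MDS: double-centre the squared dissimilarities,
    form the scalar-product matrix B, and keep the r largest eigenvalues
    of its eigendecomposition B = V L V'."""
    n = Delta.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix I - ii'/n
    B = -0.5 * J @ (Delta ** 2) @ J              # scalar products
    vals, vecs = np.linalg.eigh(B)               # B is symmetric
    order = np.argsort(vals)[::-1][:r]           # r largest eigenvalues
    L = np.clip(vals[order], 0.0, None)          # guard tiny negative values
    return vecs[:, order] * np.sqrt(L)           # X = V L^{1/2}

# Distances among four points in the plane are recovered exactly
# (up to rotation/reflection of the configuration).
X = np.array([[0., 0.], [3., 4.], [0., 4.], [5., 1.]])
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
Xr = classical_scaling(D, r=2)
```

Because the input distances here are exactly Euclidean in two dimensions, the distances computed from the recovered configuration reproduce the input distance matrix.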
Nonmetric Multidimensional scaling
In the case of ordinal data we want to recover the order of the proximities, and not the proximities themselves or a linear transformation of them. This means that a procedure other than the singular value decomposition has to be followed, namely one which is invariant under monotonic transformations. A first solution to this problem was given by Shepard (1962) and was algorithmically considerably refined by Kruskal (1964a, 1964b). They proposed to minimize a fit measure called Stress by an iterative algorithm. Stress is calculated according to (3.2.) and is minimised in two ways: with respect to the disparities (the optimal monotone transformation of the proximities) and with respect to the coordinates of the configuration.
The iterative algorithm can be described in four steps:
1. Choose an initial configuration of the n points.
2. Compute the distances dij in the current configuration.
3. Perform the optimal scaling step: find the disparities, i.e. the monotone transformation of the proximities that best fits the current distances.
4. Update the coordinates so as to decrease Stress, and return to step 2 until Stress no longer decreases substantially.
Figure 1 summarizes the algorithm.
Figure 1. The iterative MDS-algorithm
The algorithm described by Figure 1 is a general one for which different implementations exist. In fact there are several ways to obtain an initial configuration, to perform the optimal scaling and to update the coordinates. Moreover, more than one Stress measure exists, and this can lead to different solutions. The most popular algorithm is ALSCAL (Takane, Young and De Leeuw, 1977), or alternating least squares scaling, which is implemented in SAS and SPSS and in which all models mentioned so far are available. We might expect PROXSCAL to become the next popular and flexible algorithm (Busing, Commandeur and Heiser, 1996; Busing, 1998).
Two cases are to be distinguished in the nonmetric phase: the primary and the secondary approach to ties. In the primary approach to ties, the measurement process is considered to be continuous and there are no equality restrictions, while in the secondary approach a discrete measurement process is presupposed and an equality restriction is implied: if $p_{ij} = p_{kl}$ then $\hat{d}_{ij} = \hat{d}_{kl}$.
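The optimal scaling step at the heart of the nonmetric algorithm is a monotone regression, usually computed with the pool-adjacent-violators algorithm. A Python/NumPy sketch (names are ours; tied proximities are simply kept in input order and no equality is forced, so this is closer to the primary approach to ties):

```python
import numpy as np

def monotone_regression(p, d):
    """Optimal scaling step of nonmetric MDS: fit disparities that are
    weakly monotone in the proximities p and closest, in the least-squares
    sense, to the current configuration distances d."""
    order = np.argsort(p, kind="stable")
    blocks = [[v] for v in d[order].astype(float)]
    i = 0
    while i < len(blocks) - 1:
        if np.mean(blocks[i]) > np.mean(blocks[i + 1]):
            blocks[i] += blocks.pop(i + 1)   # pool adjacent violators
            i = max(i - 1, 0)                # re-check the previous block
        else:
            i += 1
    fitted = np.concatenate([np.full(len(b), np.mean(b)) for b in blocks])
    dhat = np.empty_like(fitted)
    dhat[order] = fitted                     # undo the sort
    return dhat

p = np.array([1, 2, 3, 4])           # proximities (here: dissimilarities)
d = np.array([2.0, 1.0, 3.0, 2.5])   # current configuration distances
print(monotone_regression(p, d))     # 1.5, 1.5, 2.75, 2.75
```

Pairs whose distances violate the required order are replaced by their block means, yielding the closest weakly increasing sequence.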
CMDS stands for Classical Multidimensional Scaling. The proximity matrix is two-way and symmetric. The Euclidean distance model is the one given under the basic equation (3.1.). Metric scaling is based on the SVD of the double-centered matrix of squared Euclidean distances. For non-metric scaling an iterative algorithm is used.
RMDS stands for Replicated Multidimensional Scaling. In this case the proximity matrix is three-way and symmetric. The third way (and second mode) consists of replications of the design. Replicated Multidimensional Scaling is done by performing an MDS on each of the replications separately or by reducing the different submatrices to one (e.g. by taking the mean of all submatrices).
The proximity matrix analysed in the Weighted Multidimensional Scaling Models is three-way and symmetric. The third way (and second mode) is not a mere replication, but is considered to be a separate source of systematic variability in the observed proximities. Hence, the distance is calculated according to the weighted Euclidean distance model:

$d_{ijk} = \sqrt{\sum_{a=1}^{m} w_{ka}(x_{ia} - x_{ja})^2}$ (4.3)
with k the index of the third way.
This model was extended to a more general one which is based on the generalized Euclidean distance:

$d_{ijk} = \sqrt{(g_i - g_j)'\, W_k\, (g_i - g_j)}$ (4.4)
with Wk an m by m weight matrix and G the matrix of coordinates of the group space (with rows gi). Weighting can be done in several ways by setting restrictions on the matrix of weights. In fact the previous models can be seen as special cases of the generalized MDS models; e.g. using an identity matrix for the weights results in CMDS. In (4.3.) we considered the so-called INDSCAL models (individual differences scaling), which allow different dimension weights for each subject and suppose that Wk is diagonal. The more general model is known as the IDIOSCAL model (idiosyncratic weighting); this model allows a separate stretching as well as a separate rotation of the group space for each subject.
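A sketch of the weighted (INDSCAL) distance computation for one source k, in Python/NumPy (function and variable names are ours):

```python
import numpy as np

def weighted_distances(X, w):
    """Weighted Euclidean (INDSCAL) distances for one source k: w holds
    that source's non-negative dimension weights w_ka."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((w * diff ** 2).sum(axis=-1))

X = np.array([[0., 0.], [3., 4.]])
# Unit weights recover the plain (CMDS) Euclidean distance.
print(weighted_distances(X, np.array([1., 1.]))[0, 1])  # 5.0
# Stretching dimension 1 and collapsing dimension 2.
print(weighted_distances(X, np.array([4., 0.]))[0, 1])  # 6.0
```

Setting all weights to one reduces the model to the unweighted Euclidean distance, which is exactly the sense in which CMDS is a special case.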
A major reference for the INDSCAL approach is the article by Carroll and Chang (1970). Noteworthy is the development of an algorithm which allows subjects to be classified into classes having the same matrix of weights Wc (see Winsberg and De Soete, 1993).
Different approaches for the analysis of asymmetric proximity data have been proposed (see Zielman and Heiser, 1996). Here we consider only those which take the asymmetry into account and do not treat it as random error (e.g. by taking the mean of pij and pji). An interesting proposal was to split the original data into two parts, one containing the symmetric part of the data and the other containing the asymmetric part (Weeks and Bentler, 1982): $P = M + N$,
with M equal to ½(P + P′) and N equal to ½(P − P′). This decomposition leads to a partitioning of the sum of squares of the proximities into a part due to symmetry and a part due to skew-symmetry (the cross-product term equals zero as M and N are uncorrelated):

$\sum_{i,j} p_{ij}^2 = \sum_{i,j} m_{ij}^2 + \sum_{i,j} n_{ij}^2$
With this formulation the proportion of variation due to skewness in the data is easily obtained. M is analysed with one of the models for symmetric data while N can be handled in different ways. One solution is based on a SVD to produce plots of vectors for which the area of the triangle spanned by two vectors is proportional to the amount of asymmetry between these points. Such a plot is called a Gower diagram (see Gower and Hand, 1996). Borg and Groenen (1997) proposed another way to treat N.
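The decomposition into a symmetric and a skew-symmetric part, and the exact partitioning of the sum of squares, are easy to verify numerically (a Python/NumPy sketch; names are ours):

```python
import numpy as np

def sym_skew_split(P):
    """Decompose a square proximity matrix into its symmetric part M and
    its skew-symmetric part N, so that P = M + N."""
    M = 0.5 * (P + P.T)
    N = 0.5 * (P - P.T)
    return M, N

P = np.array([[0., 2., 5.],
              [4., 0., 1.],
              [3., 7., 0.]])
M, N = sym_skew_split(P)
# The sums of squares partition exactly: ss(P) = ss(M) + ss(N).
print(np.allclose((P ** 2).sum(), (M ** 2).sum() + (N ** 2).sum()))  # True
```

The proportion of the total sum of squares carried by N is then a direct measure of how asymmetric the data are.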
In the case of an asymmetric replicated proximity matrix, ASYMSCAL models are used on each of the matrices separately or on one compound matrix obtained by combining the different submatrices.
When the different replications of an asymmetric proximity matrix differ in a substantial way but an underlying common space is presupposed, weighted asymmetric multidimensional scaling should be applied. However, this goal has not yet been achieved (see Zielman & Heiser, 1996). A possible detour is to use an unfolding model for three-way data, an approach taken by DeSarbo & Carroll (1985).
Unfolding models apply to rectangular matrices, i.e. to dominance relations. Both metric and nonmetric unfolding models exist. A classical dataset for the unfolding models consists of objects which are ranked by different subjects according to their preference for the objects. The aim of the unfolding is to represent the subjects and the objects in a space such that the order of the distances between a person and the different objects corresponds to the preference ordering. One way to do this is to apply multidimensional scaling as treated previously, but on an incomplete square matrix:
Figure 2. Incomplete matrix of proximities between two sets formed by p objects and n subjects
This is done in ALSCAL, using an appropriate stress formula called Stress2 (see Borg & Groenen, 1997) to prevent a degenerate solution. In the case of such a solution, the obtained result will be good in terms of stress but it will be meaningless (e.g. the objects cluster together and the subjects cluster together, so only two points are obtained). Using Stress2 still leaves so much freedom that a degenerate solution may nevertheless be obtained. This calls for the addition of restrictions to the solution in order to counteract a degenerate result. ALSCAL permits the user to fix coordinates, which implies that some knowledge of the underlying structure is needed. PROXSCAL also has this option and can even constrain coordinates w.r.t. external variables. However, the algorithm does not provide for the use of rectangular data or the Stress2 measure.
Replicated Multidimensional Unfolding comes down to the application of classical multidimensional unfolding on each of the rectangular matrices separately or on one rectangular matrix which resulted from joining the different submatrices.
Weighted Multidimensional Unfolding applies to three-way three-mode data for which the replications (third way) are not considered to be exchangeable. This problem could be handled as a WMDS with missing values, using Stress2. However, the solution obtained will be difficult to interpret, as the order of the distances between a subject point and several object points in the common space does not necessarily reflect the preference order. Each subject rather has his own private preference space (see Borg and Groenen, 1997). DeSarbo and Carroll (1985) developed a model for the unfolding of three-way (three-mode) metric data. Their algorithm counteracts the problem of degenerate results by means of user-provided weights gijk in the loss function (previously called Stress):

$L = \sum_{k}\sum_{i}\sum_{j} g_{ijk}\,(\hat{d}_{ijk} - d_{ijk})^2$
MDS is primarily considered an exploratory technique, but algorithms based on maximum likelihood estimation have been developed (Ramsay, 1977), more recently also including the EM algorithm (see Winsberg and De Soete, 1993). This means that inference is possible. However, as the algorithms used for inference make (different) assumptions about the distribution of the errors, one should be careful here. A useful application of the EM-based algorithm for the WMDS models is the determination of the number of classes that can be distinguished in the third way.
Confirmatory MDS is made possible by Procrustes analysis. It comes down to matching one configuration to another and producing a measure of the match. Matching is done by seeking the rotation, translation and dilation that minimize the sum of squared distances between the points of the two configurations Y and X:

$R^2 = \min_{s,\,T,\,t}\ \sum_{i} \left\| x_i - s\,T y_i - t \right\|^2$
with $R^2$ the Procrustes statistic.
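A Python/NumPy sketch of such a Procrustes match (the orthogonal rotation comes from an SVD, followed by the optimal dilation and translation; the function name is ours):

```python
import numpy as np

def procrustes(Y, X):
    """Match configuration Y to X by translation, dilation and rotation,
    returning the transformed Y and the residual sum of squares."""
    Yc = Y - Y.mean(axis=0)                 # remove translation
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Yc.T @ Xc)     # optimal rotation T = U Vt
    T = U @ Vt
    scale = s.sum() / (Yc ** 2).sum()       # optimal dilation
    Yhat = scale * (Yc @ T) + X.mean(axis=0)
    return Yhat, ((Yhat - X) ** 2).sum()
```

When Y is merely a rotated, scaled and shifted copy of X, the residual sum of squares is zero up to rounding error, signalling a perfect match.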
The example we will use bears on relationships among kin. The data are due to Rosenberg and Kim (1975). They studied the perceived similarities between 15 kinship terms (aunt, brother, cousin, daughter, father, granddaughter, grandfather, grandmother, grandson, mother, nephew, niece, sister, son and uncle) by giving students a sorting task. The students themselves were subdivided into six groups according to their gender and the sorting condition they belonged to: in the single-sort condition the terms had to be sorted only once ("… sort the 15 words into categories on the basis of some aspect of meaning. (…) You need not even know what the basis is yourself.", Rosenberg and Kim, p. 491), while students in the multiple-sort condition had to give at least a second partition based on a different basis of meaning ("… mix the 15 words up and sort the words again. This time, however, you must use a different basis of meaning.", Rosenberg and Kim, p. 491).
The analysis was done using the MDS procedure in SAS, which is based on the ALSCAL algorithm. The plots were made using S-Plus. Busing (1998) and Busing, Commandeur and Heiser (1996) analysed the same data using PROXSCAL. Their results can be found on the internet: http://www.fsw.leidenuniv.nl/www/w3_ment/proxscal/proxscal.html. The data matrix is three-way symmetric with no replications, as each of the six conditions might lead to a different structure. Therefore an INDSCAL model was used.
First we take a look at the fit of the group structure for one up to five dimensions. In SAS the fit is evaluated by Stress1, which is calculated as follows:

$S_1 = \sqrt{\dfrac{\sum_{i<j} (\hat{d}_{ij} - d_{ij})^2}{\sum_{i<j} d_{ij}^2}}$
The value of Stress1 in function of the dimensionality of the solution is shown in Figure 3 for the solution with condition weights applied to the dimensions of a common space.
Figure 3. Stress1 in function of the number of dimensions for the group structure
Both the two-dimensional and the three-dimensional structure are suggested, as they both lead to a considerable reduction in stress. We also show the graphs for each of the conditions separately. This is illustrated by Figure 4. In general we get the same pattern as for the group structure: the two- and three-dimensional structures are suggested. Two conditions behave somewhat differently: the need for a third dimension is most obvious for the females in the single-sort condition, and females in the first-sort condition have lower stress values than the other conditions.
Figure 4. Stress1 in function of the number of dimensions for the individual structures
First we take a look at the solution for the group-structure in two dimensions. This is shown in Figure 5. The first dimension can be interpreted as a separation between the sexes: on the left we find terms indicating male kin and on the right we find terms indicating female kin. Cousin is in between as it can indicate both. The second dimension is one of degree of kinship: the bottom of the space is occupied by people related in the first degree (father, mother, daughter, son, brother and sister), then we find people related in the second degree (grandfather, grandmother, granddaughter and grandson), then people related in the third degree (aunt, uncle, niece and nephew) and finally we get the fourth degree (cousin).
Figure 5. Two-dimensional MDS solution
To obtain an idea about the individual structures, we consider the weights given by the different groups to each of the two dimensions. This is shown in table 2.
Females in the first-sort condition put more emphasis on the first dimension and less on the second dimension, this in contrast to females in the second-sort condition. Males in the multiple-sort condition do not make this distinction between the first and the second sort.
Next we take a look at the group structure in a three-dimensional configuration (see Figure 6).
Figure 6. Three-dimensional MDS solution
The first dimension shows a clear separation of the genders: males are situated on the left and females on the right. The position of cousin is in between the two sexes. The second dimension can be interpreted as one of degree of kinship. Going from the back to the front, we first have those who are related in the fourth/third degree, then those who are related in the first degree and finally those who are related in the second degree. Compared to the previous solution, the rank order in the degrees of kinship is not completely recovered. The last dimension seems to separate the nuclear family from the others, as we find father, mother, son, daughter, brother and sister on the positive side and the others on the negative side. The position of cousin is once again in between.
Next we take a look at the different conditions. The dimension weights for each of the groups are given in the table below.
In the single sort conditions, less importance is attached to the gender. Females in the first-sort condition attach more importance to gender (and less to the nuclear type) while females in the second-sort condition attach less importance to gender. Males in the multiple-sort conditions do not make this distinction.
These results are generally in accordance with those of Busing, Commandeur and Heiser (1996). However, using PROXSCAL they were able to put some restrictions on the common space: the characteristics of the terms (gender, degree of kinship and generation) were taken into account, leading to a solution which is easier to interpret. The relative differences in weighting for the different conditions coincide.
Three methods of analysis are closely related to MDS. These are principal component analysis (PCA), correspondence analysis (CA) and cluster analysis. In this section we will give a short description of PCA, CA and cluster analysis and their relation to MDS.
Principal components analysis or PCA is performed on a matrix A of n entities observed w.r.t. p variables. The aim is to search for new variables, called principal components, which are linear combinations of the original variables, chosen in such a way that they account for most of the variation in the original variables. In metric CMDS a matrix of distances D between the n entities is given, and the aim is to find a low-dimensional configuration of the entities such that the distances are approximated in a least-squares sense. When these distances are Euclidean distances, the coordinates contained in X represent the principal coordinates which would be obtained when doing PCA on A. This approach is called principal coordinates analysis as well as classical scaling. A more detailed account of this correspondence can be found in Everitt and Rabe-Hesketh (1997).
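This correspondence can be checked numerically: for Euclidean distances between the rows of A, classical scaling recovers the principal component scores of A up to column signs (a Python/NumPy sketch; all names are ours, and we assume the generic case of distinct eigenvalues):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(10, 4))
Ac = A - A.mean(axis=0)

# Principal component scores of A (PCA via the SVD of the centred data).
U, s, Vt = np.linalg.svd(Ac, full_matrices=False)
scores = U * s

# Classical scaling on the Euclidean distances between the rows of A.
D2 = ((A[:, None, :] - A[None, :, :]) ** 2).sum(-1)   # squared distances
n = A.shape[0]
J = np.eye(n) - np.ones((n, n)) / n                   # centring matrix
B = -0.5 * J @ D2 @ J                                 # scalar products
vals, vecs = np.linalg.eigh(B)
order = np.argsort(vals)[::-1]
coords = vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))

# The two coordinate sets agree up to the arbitrary sign of each column.
print(np.allclose(np.abs(scores), np.abs(coords[:, :4]), atol=1e-8))
```

The agreement follows because the double-centred matrix B equals Ac Ac', whose eigenvectors scaled by the square roots of the eigenvalues are exactly the principal component scores.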
Correspondence analysis is classically used on a two-way contingency table with the aim to visualize the relations (i.e. deviations from statistical independence) between the row and column categories. The same is done by the unfolding models: subjects (row-categories) and objects (column-categories) are visualized in a way that the order of the distances between a subject-point and the object-points reflects the preference-ranking of the subject. The measure of "proximity" used in CA is the chi-square distance between the profiles. A short description of CA and its relation to MDS can be found in Borg and Groenen (1997).
Cluster analysis models, or ultrametric tree models, are equally applicable to proximity data, including two-way (asymmetric) square and rectangular data as well as three-way two-mode data. The main difference with the MDS models is that most models for cluster analysis lead to a hierarchical structure. The dissimilarities are approximated by path distances under a number of restrictions. The path distances are sought so as to minimize the sum of squared errors:

$L = \sum_{i<j} w_{ij}\,(\delta_{ij} - d_{ij})^2$
the weights wij are usually equal to one, but they can be set equal to zero when dissimilarities are missing. A comparative study with both real and simulated data (see Everitt and Rabe-Hesketh, 1997) showed that data with an underlying hierarchical structure result in a better fit when using tree models, while data with an underlying spatial structure result in a better fit when using multidimensional scaling models. When an inappropriate model was used, scaling models appeared to perform slightly better than tree models. Hybrid models based on both techniques have been suggested.
In this section we consider the well-known algorithms ALSCAL, MULTISCALE and PROXSCAL. They have in common that they are implemented in SAS and/or SPSS, which makes them the most widely used and most user-friendly algorithms. Nevertheless, one is not restricted to these. Everitt and Rabe-Hesketh (1997) give an extensive overview of the available software programs (except for PROXSCAL), with an indication of the type of models they can handle and the place to find them.
The table below shows the three algorithms. For each of them the models which they can handle are indicated by an 'X'. Another row indicates where the software can be found: in SAS, in SPSS, upon request or on the internet. A further row indicates whether it is possible to constrain the solution to fixed coordinates or to coordinates fixed w.r.t. external variables in a certain way (e.g. an ordinal relation). The last row indicates the algorithm used in the model estimation phase of the program: alternating least squares (ALS), maximum likelihood (ML) or iterative majorization (IM).
The brackets around the marks for the unfolding models mean that the program can handle the unfolding problem as a square MDS problem with missing data, but does not include the option to minimize Stress2 and is not applicable to rectangular data. For unfolding models, Busing, Commandeur and Heiser (1996) refer to the SMACOF-III unfolding program. This is the only program of the older SMACOF series which was not consolidated into PROXSCAL.
A collection of MDS computer algorithms and accompanying manuals is available on the World Wide Web: http://netlib.uow.edu.au/mds/.
Borg, I., & Groenen, P. (1997). Modern multidimensional scaling. Theory and applications. New York: Springer.
Busing, F. M. T. A. (1998). PROXSCAL: User’s guide for version 6.3. Retrieved October 8, 1999 from the World Wide Web: http://www.fsw.leidenuniv.nl/www/w3_ment/proxscal/proxscal.html.
Busing, F. M. T. A., Commandeur, J. F., & Heiser, W. J. (1996). PROXSCAL: A multidimensional scaling program for individual differences scaling with constraints. Retrieved October 8, 1999 from the World Wide Web: http://www.fsw.leidenuniv.nl/www/w3_ment/proxscal/proxscal.html.
Carroll, J. D., & Arabie, P. (1980). Multidimensional scaling. Annual Review of Psychology, 31, 607-649.
Carroll, J. D., & Chang, J. J. (1970). Analysis of individual differences in multidimensional scaling via an n-way generalization of "Eckart-Young" decomposition. Psychometrika, 35, 283-319.
Coombs, C. H. (1964). A theory of data. New York: Wiley.
Cox, T. F., & Cox, M. A. A. (1994). Multidimensional scaling. London: Chapman & Hall.
DeSarbo, W. S., & Carroll, J. D. (1985). Three-way metric unfolding via alternating least squares. Psychometrika, 50, 275-300.
Everitt, B. S., & Rabe-Hesketh, S. (1997). The analysis of proximity data. London: Arnold.
Gower, J. C., & Hand, D. J. (1996). Biplots. London: Chapman & Hall.
Guttman, L. (1954). A new approach to factor analysis: the radex. In P. Lazarsfeld (Ed.), Mathematical thinking in the behavioral sciences (pp. 258-348). New York: Free Press.
Kruskal, J. B. (1964a). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29, 1-27.
Kruskal, J. B. (1964b). Nonmetric multidimensional scaling: a numerical method. Psychometrika, 29, 115-129.
Ramsay, J. O. (1977). Maximum likelihood estimation in multidimensional scaling. Psychometrika, 42, 241-266.
Rosenberg, S., & Kim, M. (1975). The method of sorting as a data-gathering procedure in multivariate research. Multivariate Behavioral Research, 10, 489-502.
Shepard, R. N. (1962a). The analysis of proximities: multidimensional scaling with unknown distance function Part I. Psychometrika, 27, 125-140.
Shepard, R. N. (1962b). The analysis of proximities: multidimensional scaling with unknown distance function Part II. Psychometrika, 27, 219-246.
Strang, G. (1993). Introduction to linear algebra. MA: Wellesley Cambridge Press.
Takane, Y., Young, F. W., & De Leeuw, J. (1977). Nonmetric individual differences multidimensional scaling: an alternating least squares method with optimal scaling features. Psychometrika, 42, 593-600.
Weeks, D. G., & Bentler, P. M. (1982). Restricted multidimensional scaling models for asymmetric proximities. Psychometrika, 47, 201-208.
Winsberg, S., & De Soete, G. (1993). A latent class approach to fitting the weighted Euclidean model, CLASCAL. Psychometrika, 58, 315-330.
Young, F. W. (1987). Multidimensional scaling: History, theory and applications. Hillsdale, NJ: Lawrence Erlbaum.
Young, F. W. (1999). Multidimensional scaling. Retrieved October 15, 1999 from the World Wide Web: http://forrest.psych.unc.edu/teaching/p208a/mds/mds.html
Young, G., & Householder, A. S. (1941). A note on multidimensional psycho-physical analysis. Psychometrika, 6, 331-333.
Zielman, B., & Heiser, W. J. (1996). Models for asymmetric proximities. British Journal of Mathematical and Statistical Psychology, 49, 127-146.