Detecting Outliers - Multivariate
From PsychWiki - A Collaborative Psychology Wiki
Revision as of 20:54, 7 September 2009 by Doug
- What are bivariate and multivariate outliers? Bivariate and multivariate outliers are outliers that occur in the joint combination of two (bivariate) or more (multivariate) variables. They are to be contrasted with univariate outliers, which occur within a single variable. See below for concrete examples of bivariate and multivariate outliers.
- How do I detect outliers?
- One procedure for identifying bivariate and multivariate outliers is called Mahalanobis Distances. It calculates the distance of each case's scores from the center of the cluster formed by the remaining cases. If you run Mahalanobis Distances in SPSS, the procedure creates a new column at the end of the data file containing a calculated score for each subject. The score is based on the specific variables entered into the analysis, so you can calculate many different sets of Mahalanobis Distances by entering different sets of variables.
- Imagine you have conducted a study measuring health behaviors (e.g., duration of exercise per week, eating habits, smoking habits). You could look for bivariate outliers between eating habits and smoking habits, then separately between exercise duration and eating habits, then between exercise duration and smoking habits. You could also look for multivariate outliers in the joint combination of all three variables. Each separate test for outliers yields its own set of Mahalanobis Distances scores.
- For each separate analysis, a separate score for each subject is created in a new column at the end of the data file. A subject is considered an outlier if its Mahalanobis Distances score exceeds a "critical value".
- The critical value comes from a chi-square table, found at the back of most statistics textbooks, and depends on the probability level you set and the degrees of freedom. Here is a webpage that displays the table. The degrees of freedom for this test equal the number of variables under investigation. Thus, if you are analyzing a bivariate relationship, degrees of freedom = 2; if you are analyzing 3 variables, degrees of freedom = 3, and so forth. The probability level to use for this test is p < .001.
- To read the table, find the row for your degrees of freedom, then scan to the right until you reach the column associated with 0.001. That is your critical value. For example, the critical value for a bivariate relationship is 13.82. Any Mahalanobis Distances score above that critical value is a bivariate outlier.
- Notice, however, that multivariate outlier analysis is just as arbitrary as univariate outlier analysis. The threshold level is set arbitrarily, just as the univariate thresholds of 1.5*IQR and 3*IQR are set arbitrarily.
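The procedure described above (score each case, then compare the score with a critical value) can be sketched outside SPSS as follows. This is a minimal illustration under stated assumptions, not SPSS's own implementation: it assumes the score compared with the critical value is the squared Mahalanobis distance, it measures each case from the centroid of all cases using the sample covariance, and it assumes the critical value is the chi-square value for the chosen probability level. The function names and the toy `exercise` and `smoking` data are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable, integer df >= 1."""
    half = x / 2.0
    # Closed-form starting points: df = 1 (odd case) or df = 2 (even case).
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    # Each step from k to k + 2 degrees of freedom adds one term.
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

def critical_value(df, alpha=0.001):
    """Value whose upper-tail probability equals alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # widen the bracket until it contains the answer
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def mahalanobis_sq_2d(xs, ys):
    """Squared Mahalanobis distance of each (x, y) case from the centroid (assumed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Sample variances and covariance (n - 1 denominator).
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2  # determinant of the 2x2 covariance matrix
    scores = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        # d' S^-1 d written out for the 2x2 case.
        scores.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return scores

def flag_outliers(scores, cutoff):
    """Indices of cases whose score exceeds the cutoff."""
    return [i for i, d2 in enumerate(scores) if d2 > cutoff]

# Hypothetical toy data: two variables, five subjects.
exercise = [1, 2, 3, 4, 5]
smoking = [1, 2, 3, 4, 0]

scores = mahalanobis_sq_2d(exercise, smoking)
cutoff = critical_value(df=2)          # two variables, p < .001
print(round(cutoff, 2))                # 13.82, the bivariate value quoted above
print(flag_outliers(scores, cutoff))   # [] here: five toy cases cannot reach the cutoff
```

The last subject has the largest score in this toy data (3.2), but a data set this small can never exceed 13.82, so nothing is flagged. The sketch handles two variables only; for three or more variables the same idea applies with a full covariance matrix and degrees of freedom equal to the number of variables.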