Multivariate Analysis

Multivariate Discriminant Analysis
Priyanshi Gupta

An Overview
• MDA is a statistical technique used to classify an observation into one of several a priori groupings based on the observation's individual characteristics.
• It is used primarily to classify and/or make predictions in problems where the dependent variable is qualitative, for example male or female, or bankrupt or non-bankrupt.
• The first step is therefore to establish explicit group classifications. We have observations coming from k groups, and we are looking for the best function for discriminating between observations from different groups.

• Once such a function is in place, we move to classification: assigning a new observation to the appropriate population using the discriminant function.
• Typically, we start with a set of data (called the LEARNING set) whose observations, possibly coming from different populations, have been pre-classified, i.e. they have predefined group memberships. From these previously classified data we build a discriminant function and, after proper calibration, use it to classify a new observation as coming from one of the groups.
• Discriminant analysis is used when the groups are known a priori.

Types of DA Problems

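• Two-group problems: regression can be used (regressing a 0/1 group indicator on the predictors gives coefficients proportional to the two-group discriminant function).

• k-group problems (where k ≥ 2): regression cannot be used if k > 2.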

Example of a 2-Group DA Problem: ACME Manufacturing
• All employees of ACME Manufacturing are given a pre-employment test measuring mechanical and verbal aptitude.

• Each current employee has also been classified into one of two groups: satisfactory or unsatisfactory.
• We want to determine whether the two groups of employees differ with respect to their test scores.
• If so, we want to develop a rule for predicting whether new applicants will be satisfactory or unsatisfactory.
Cliff T. Ragsdale, Spreadsheet Modeling & Decision Analysis, A Practical Introduction to Management Science 5th edition

Graph of Data for Current Employees
[Figure: scatter plot of Verbal Aptitude against Mechanical Aptitude for Satisfactory Employees and Unsatisfactory Employees, with the Group 1 centroid (C1) and the Group 2 centroid (C2) marked.]

Cliff T. Ragsdale, Spreadsheet Modeling & Decision Analysis, A Practical Introduction to Management Science 5th edition

Discriminant analysis
• Discriminant analysis is used to analyze relationships between a non-metric dependent variable and metric or dichotomous independent variables.

• Discriminant analysis uses the independent variables to distinguish among the groups or categories of the dependent variable.
• Discriminant analysis creates an equation that minimizes the probability of misclassifying cases into their respective groups or categories.
• The usefulness of a discriminant model is judged by its accuracy rate, i.e. its ability to predict the known group memberships in the categories of the dependent variable. To estimate this rate honestly, we apply the model to a HOLD-OUT sample of cases that were not used to derive it.

Objectives of Discriminant Analysis
• Determining which of the independent variables account most for the differences in the average score profiles of the two or more groups.

• Establishing procedures for classifying statistical units (individuals or objects) into groups on the basis of their scores on a set of independent variables.
• Establishing the number and composition of the dimensions of discrimination between groups formed from the set of independent variables.

Assumptions of Discriminant Analysis
• The observations are a random sample.
• Multivariate normality: the independent variables are (multivariate) normally distributed within each level of the grouping variable.
• Homogeneity of variance/covariance (homoscedasticity): the variance/covariance matrices of the independent variables are the same across the levels of the grouping variable. This can be tested with Box's M statistic.
• Multicollinearity: predictive power can decrease as the correlation between predictor variables increases.
• Independence: participants are assumed to be randomly sampled, and a participant's score on a variable is assumed to be independent of the scores on that variable for all other participants.

Discriminant functions
• It is similar to regression analysis.

• A discriminant score is calculated as a weighted combination of the independent variables:
$$D_i = a + b_1 x_{1i} + b_2 x_{2i} + \dots + b_n x_{ni}$$
where $D_i$ is the predicted (discriminant) score of case $i$, $x_{ji}$ is the value of predictor $j$ for case $i$, and $b_j$ is the discriminant coefficient of predictor $j$.

• MDA sets the variate's weights to maximize the between-group variance relative to the within-group variance.
• If the group sizes are equal, the cutoff is the mean of the two group centroids (the average of the group-mean discriminant scores). If the group sizes are unequal, the cutoff is calculated from weighted means. Conceptually, we can think of the discriminant function or equation as defining the boundary between groups.

• Discriminant scores are standardized, so that if a score falls on one side of the boundary (a standard score less than zero) the case is predicted to be a member of one group, and if it falls on the other side (a positive standard score) it is predicted to be a member of the other group.
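For two groups, the weights that maximize between-group relative to within-group variance have a closed form (Fisher's linear discriminant): b is proportional to S⁻¹(x̄₂ - x̄₁), where S is the pooled within-group covariance matrix. The sketch below is a minimal illustration assuming NumPy, equal group covariance matrices and equal group sizes, so the midpoint of the two centroid scores is used as the cutoff; the constant a is chosen so that this cutoff sits at zero, mirroring the sign rule described above. Function names and toy data are hypothetical.

```python
import numpy as np

def fit_two_group_discriminant(x1, x2):
    """Fisher's two-group discriminant function D = a + b @ x.

    x1, x2: arrays of shape (n_i, p), one row per case of group 1 / group 2.
    b is S_pooled^{-1} (mean2 - mean1), so group 2 receives the higher scores;
    a is chosen so that the midpoint of the two centroid scores equals zero.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, n2 = len(x1), len(x2)
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    # Pooled within-group covariance (assumes equal covariance matrices)
    s_pooled = ((n1 - 1) * np.cov(x1, rowvar=False)
                + (n2 - 1) * np.cov(x2, rowvar=False)) / (n1 + n2 - 2)
    b = np.linalg.solve(np.atleast_2d(s_pooled), m2 - m1)
    a = -float(b @ (m1 + m2)) / 2.0
    return a, b


def discriminant_score(a, b, x):
    return a + float(b @ np.asarray(x, dtype=float))


def classify(a, b, x):
    """Group 1 if the score is at or below the zero cutoff, else group 2."""
    return 1 if discriminant_score(a, b, x) <= 0 else 2


# Hypothetical toy data: two predictors, four cases per group
group1 = [[1, 2], [2, 1], [2, 3], [3, 2]]
group2 = [[6, 7], [7, 6], [7, 8], [8, 7]]
a, b = fit_two_group_discriminant(group1, group2)
print(classify(a, b, [2, 2]), classify(a, b, [7, 7]))  # 1 2
```

With unequal group sizes the zero point would no longer be the appropriate cutoff; the weighted or refined cutoffs discussed later would be used instead.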

Number of functions
• If the dependent variable defines two groups, one statistically significant discriminant function is required to distinguish the groups; if it defines three groups, two statistically significant discriminant functions are required to distinguish among the three groups; and so on.

• If a discriminant function is able to distinguish among groups, it must have a strong relationship to at least one of the independent variables.

• The number of possible discriminant functions in an analysis is limited to the smaller of the number of independent variables and one less than the number of groups defined by the dependent variable.

Discriminant scores
• The aim of the statistical analysis in DA is to combine (weight) the variable scores in some way so that a single new composite variable, the discriminant score, is produced. We can obtain a discriminant score for each observation.
• One way of thinking about this is in terms of a food recipe, where changing the proportions (weights) of the ingredients changes the characteristics of the finished cakes. Ideally, the weighted combinations of ingredients will produce two different types of cake.
• Discriminant analysis works by creating a new variable, the discriminant function score, which is used to predict the group to which a case belongs.
• As in factor analysis, the computation relies on an eigenvalue problem: the coefficients for the independent variables are chosen to maximize the separation (between-group relative to within-group variation) of the groups defined by the dependent variable, and each eigenvalue measures how much separation its function achieves.
• The discriminant function is similar to a regression equation in which the independent variables are multiplied by coefficients and summed to produce a score.
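The eigenvalue computation can be sketched in a few lines: the coefficient vectors are eigenvectors of W⁻¹B, where W and B are the within-group and between-group sums-of-squares-and-cross-products matrices, and each eigenvalue is the between-to-within ratio achieved by its function. This is a minimal illustration assuming NumPy and hypothetical toy data, not the exact routine of any statistics package. Rescaling each function so its scores have pooled within-group variance 1 follows the usual convention for unstandardized canonical discriminant functions, and at most min(p, k - 1) functions are kept, as stated under Number of functions.

```python
import numpy as np

def canonical_discriminant(x, groups):
    """Return (eigenvalues, coefficients, constants) of the discriminant functions.

    x: (n, p) predictor matrix; groups: length-n group labels.
    Column j of `coefficients` holds the weights of function j.
    """
    x = np.asarray(x, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n, p = x.shape
    k = len(labels)
    grand_mean = x.mean(axis=0)

    w = np.zeros((p, p))      # within-group SSCP matrix
    b = np.zeros((p, p))      # between-group SSCP matrix
    for g in labels:
        xg = x[groups == g]
        dev = xg - xg.mean(axis=0)
        w += dev.T @ dev
        d = (xg.mean(axis=0) - grand_mean)[:, None]
        b += len(xg) * (d @ d.T)

    # Eigen-decomposition of W^{-1} B (eigenvalues are real up to rounding)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(w, b))
    order = np.argsort(eigvals.real)[::-1][: min(p, k - 1)]
    eigvals = eigvals.real[order]
    coefs = eigvecs.real[:, order]

    # Rescale so each function's scores have pooled within-group variance 1
    within_var = np.sum(coefs * (w @ coefs), axis=0) / (n - k)
    coefs = coefs / np.sqrt(within_var)
    constants = -grand_mean @ coefs   # centres the scores at zero overall
    return eigvals, coefs, constants


# Hypothetical toy data: three groups, two predictors
x = [[1, 2], [2, 1], [2, 3], [3, 2], [6, 2], [7, 1],
     [7, 3], [8, 2], [4, 7], [5, 6], [5, 8], [6, 7]]
g = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
eigenvalues, coefficients, constants = canonical_discriminant(x, g)
print(eigenvalues)
```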

Wilks’ Lambda
• In the first step, Wilks' lambda (converted to an approximate F or chi-square statistic) is used to test whether the discriminant model as a whole is significant. A significant lambda means one can reject the null hypothesis that the groups have the same mean discriminant function scores and conclude that the model discriminates.
• Wilks' lambda is a statistic used as a measure of the separation of the class centres. It is used for testing the equality of the population means.
• Wilks' lambda therefore plays the same role in the multivariate domain as Fisher's F does in (univariate) ANOVA.

• Definition of Wilks' lambda: Wilks' lambda is the proportion of the total variance that is not explained by the response variable (the variable that identifies the classes) in the classical decomposition of variance. It is therefore the ratio of:
* the within-class (intra-class) variance, to
* the total variance.
In matrix form, $\Lambda = |\mathbf{W}| / |\mathbf{T}|$, where $\mathbf{W}$ and $\mathbf{T}$ are the within-group and total sums-of-squares-and-cross-products matrices. Note the difference from ANOVA's F statistic, which is the ratio of the explained (between-group) mean square to the residual (within-group) mean square.

• Wilks' lambda is therefore a number between 0 and 1. If only a small fraction of the total variance (inertia) is not explained by the existence of groups, then the groups are well separated and their means are significantly different. Hence:
* a small (close to 0) value of Wilks' lambda means that the groups are well separated;
* a large (close to 1) value of Wilks' lambda means that the groups are poorly separated.
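In the multivariate case, Wilks' lambda can be computed as the ratio of the determinants of the within-group and total sums-of-squares-and-cross-products matrices, |W| / |T|, which generalizes the within/total variance ratio defined above. The sketch assumes NumPy and hypothetical toy data. Its chi-square conversion is Bartlett's approximation, one common way of turning lambda into a test statistic (n cases, p predictors, k groups, df = p(k - 1)); a given package may report a different conversion.

```python
import math
import numpy as np

def wilks_lambda(x, groups):
    """Wilks' lambda = det(W) / det(T) for predictor matrix x and group labels."""
    x = np.asarray(x, dtype=float)
    groups = np.asarray(groups)
    centred = x - x.mean(axis=0)
    t = centred.T @ centred                      # total SSCP matrix
    w = np.zeros_like(t)                         # within-group SSCP matrix
    for g in np.unique(groups):
        dev = x[groups == g] - x[groups == g].mean(axis=0)
        w += dev.T @ dev
    return np.linalg.det(w) / np.linalg.det(t)


def bartlett_chi_square(lam, n, p, k):
    """Bartlett's chi-square approximation for testing equal group means.

    Returns (statistic, degrees of freedom) with df = p * (k - 1).
    """
    stat = -(n - 1 - (p + k) / 2.0) * math.log(lam)
    return stat, p * (k - 1)


# Hypothetical toy data: both groups share the mean (1, 1), so lambda is 1
same_means = [[0, 0], [2, 2], [0, 2], [2, 0], [1, 0], [1, 2], [0, 1], [2, 1]]
print(wilks_lambda(same_means, ["a"] * 4 + ["b"] * 4))  # 1.0
```

Well-separated groups drive the determinant of W far below that of T, so lambda moves toward 0 and the chi-square statistic grows.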

Wilks’ Lambda
• Wilks' test: the distribution of Wilks' lambda under the null hypothesis is known when:
* all variables are (multivariate) normally distributed,
* the classes have identical covariance matrices, and
* the classes have identical means (the null hypothesis being tested).
• Software often displays only the p-value of the test statistic rather than the value of Wilks' lambda itself. Software sometimes also displays the value of Wilks' lambda for each individual independent variable; these values may then be regarded as measuring the discriminant power of the corresponding variable.
• Wilks' test and variable selection: Wilks' lambda may also be used for variable selection in discriminant analysis. It is possible to build a statistic that is approximately F distributed and is a function of the Wilks' lambdas pertaining to:
* a given subset of variables, and
* the same subset with a new variable added.
An F test is then used to identify which new variable most increases the group separation, and this variable is then added to the model.
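The forward-selection idea above can be sketched as follows. For a candidate variable added to q already selected variables, the standard partial F built from the two Wilks' lambdas is F = (Λ_q / Λ_(q+1) - 1)(n - k - q)/(k - 1), with k - 1 and n - k - q degrees of freedom. This simplified sketch assumes NumPy and a fixed F-to-enter threshold of 3.84 (an assumption; packages can instead use a significance-of-F criterion), and it omits the removal step of a full stepwise procedure. Names and toy data are hypothetical.

```python
import numpy as np

def wilks_lambda(x, groups):
    """det(W) / det(T) for the columns of x."""
    centred = x - x.mean(axis=0)
    t = centred.T @ centred
    w = np.zeros_like(t)
    for g in np.unique(groups):
        dev = x[groups == g] - x[groups == g].mean(axis=0)
        w += dev.T @ dev
    return np.linalg.det(w) / np.linalg.det(t)


def forward_select(x, groups, f_to_enter=3.84):
    """Greedy forward selection by partial F (no removal step)."""
    x = np.asarray(x, dtype=float)
    groups = np.asarray(groups)
    n, p = x.shape
    k = len(np.unique(groups))
    selected = []
    while len(selected) < p:
        q = len(selected)
        # Lambda of the current subset (1.0 when nothing is selected yet)
        lam_q = wilks_lambda(x[:, selected], groups) if selected else 1.0
        best_var, best_f = None, -np.inf
        for j in range(p):
            if j in selected:
                continue
            lam_new = wilks_lambda(x[:, selected + [j]], groups)
            # Partial F with (k - 1, n - k - q) degrees of freedom
            f = (lam_q / lam_new - 1.0) * (n - k - q) / (k - 1)
            if f > best_f:
                best_var, best_f = j, f
        if best_f < f_to_enter:
            break
        selected.append(best_var)
    return selected


# Hypothetical data: column 0 separates the groups, column 1 has equal means
x = [[0, 5], [1, 3], [2, 4], [1, 6], [10, 3], [11, 5], [12, 6], [11, 4]]
g = ["A"] * 4 + ["B"] * 4
print(forward_select(x, g))  # [0]
```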

Good and Poor Distributions
• Since we have assumed that the independent variables are normally distributed, at the end of the DA process each group should ideally have a normal distribution of discriminant scores.
• The degree of overlap between the discriminant score distributions can then be used as a measure of the success of the technique.
• The top two distributions in the figure overlap too much and do not discriminate as well as the bottom pair. Misclassification will be minimal in the lower pair, whereas many cases will be misclassified in the upper pair.

Discriminant analysis and classification
• Discriminant analysis consists of two stages: in the first, the discriminant functions are derived; in the second, they are used to classify the cases.
• Although discriminant analysis does compute correlation measures to estimate the strength of the relationship, these correlations measure the relationship between the independent variables and the discriminant scores.
• A more useful measure of the utility of a discriminant model is classification accuracy, which compares the group membership predicted by the model with the actual, known group membership (the value of the dependent variable).

A Classification Rule
• Compute the distance from the point in question to the centroid of each group, and assign it to the closest group.
• If an observation's discriminant score is less than or equal to some cutoff value, assign it to group 1; otherwise assign it to group 2.
• What should the cutoff value be?
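The nearest-centroid rule above can be sketched as follows. The slide does not say which distance to use; this minimal sketch assumes the Mahalanobis distance based on the pooled within-group covariance matrix, because plain Euclidean distance would let differences in predictor scale and correlation distort the comparison. NumPy, the function name and the toy data are assumptions for illustration.

```python
import numpy as np

def nearest_centroid(x_train, groups, x_new):
    """Assign x_new to the group whose centroid is closest in Mahalanobis distance."""
    x_train = np.asarray(x_train, dtype=float)
    groups = np.asarray(groups)
    x_new = np.asarray(x_new, dtype=float)
    labels = np.unique(groups)
    n, k = len(x_train), len(labels)

    centroids = {g: x_train[groups == g].mean(axis=0) for g in labels}
    # Pooled within-group covariance matrix
    w = sum((x_train[groups == g] - centroids[g]).T
            @ (x_train[groups == g] - centroids[g]) for g in labels)
    s_inv = np.linalg.inv(w / (n - k))

    def squared_distance(g):
        diff = x_new - centroids[g]
        return float(diff @ s_inv @ diff)

    return min(labels, key=squared_distance)


# Hypothetical toy data for two groups
train = [[1, 2], [2, 1], [2, 3], [3, 2], [6, 7], [7, 6], [7, 8], [8, 7]]
labels = ["G1"] * 4 + ["G2"] * 4
print(nearest_centroid(train, labels, [2.5, 2]))  # G1
```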

Cut-Off Point
• The choice of cutoff point depends on:
* the importance of correct classification,
* the cost of misclassification, and
* prevalence (the lower the prevalence, the higher the proportion of false positives among the positive results).

• The accuracy of classification depends largely on the selection of an "optimal" cutoff point. Traditionally, the cutoff point used in studies was arbitrary, for example 0.5, which lacked theoretical justification.

Cutoff Value
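• For data that are multivariate normal with equal covariances (and with equal prior probabilities and misclassification costs), the optimal cutoff value is:
$$\text{Cutoff Value} = \frac{\bar{Z}_1 + \bar{Z}_2}{2}$$
where $\bar{Z}_1$ and $\bar{Z}_2$ are the mean discriminant scores (centroids) of groups 1 and 2.
• Even when the data are not multivariate normal, this cutoff value tends to give good results.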

A Refined Cutoff Value

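• Costs of misclassification may differ.
• Probabilities of group membership may differ.

• The following refined cutoff value accounts for these considerations:
$$\text{Cutoff Value} = \frac{S_p^2}{\bar{Z}_1 - \bar{Z}_2}\,\ln\!\left(\frac{p_2\,C(1 \mid 2)}{p_1\,C(2 \mid 1)}\right) + \frac{\bar{Z}_1 + \bar{Z}_2}{2}$$
where $S_p^2$ is the pooled within-group variance of the discriminant scores, $p_i$ is the prior probability of membership in group $i$, and $C(i \mid j)$ is the cost of classifying an observation into group $i$ when it belongs to group $j$.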

Cliff T. Ragsdale, Spreadsheet Modeling & Decision Analysis, A Practical Introduction to Management Science 5th edition
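The cutoff formulas on these slides can be computed directly. This sketch assumes the convention of the classification rule above (scores at or below the cutoff go to group 1, so group 1 has the lower centroid) and reads C(i | j) as the cost of classifying a group-j case into group i, and S_p² as the pooled variance of the discriminant scores. With equal priors and equal costs the logarithm is zero and the refined value reduces to the simple midpoint (Z̄1 + Z̄2)/2. Function names and the example numbers are hypothetical.

```python
import math

def refined_cutoff(z1_bar, z2_bar, sp2, p1, p2, c12, c21):
    """Refined cutoff for two-group discriminant scores.

    z1_bar, z2_bar: mean discriminant scores (centroids) of groups 1 and 2
    sp2: pooled within-group variance of the discriminant scores
    p1, p2: prior probabilities of groups 1 and 2
    c12: C(1|2), cost of classifying a group-2 case into group 1
    c21: C(2|1), cost of classifying a group-1 case into group 2
    """
    midpoint = (z1_bar + z2_bar) / 2.0
    return sp2 / (z1_bar - z2_bar) * math.log((p2 * c12) / (p1 * c21)) + midpoint


def assign_group(score, cutoff):
    """Classification rule: score <= cutoff -> group 1, otherwise group 2."""
    return 1 if score <= cutoff else 2


# Hypothetical example: centroids -1 and +1, unit pooled variance, equal priors,
# and misclassifying a group-2 case into group 1 is twice as costly
cutoff = refined_cutoff(-1.0, 1.0, 1.0, 0.5, 0.5, 2.0, 1.0)
print(cutoff, assign_group(-0.2, cutoff))
```

Raising C(1 | 2) moves the cutoff toward the group-1 centroid, so fewer cases are assigned to group 1; in the example a score of -0.2, which the midpoint rule would send to group 1, is now assigned to group 2.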

Problem of Misclassification
• Classification models can err in two ways. In the bankrupt versus non-bankrupt firms example:
• First, the model could indicate a low probability of bankruptcy when, in fact, the risk is high. This is referred to as a Type I error. The cost of this error to a creditor would be the loss of interest and principal through default; in addition, creditors could incur recovery costs in a bankruptcy proceeding.
• On the flip side, a model could assign a high risk of bankruptcy to a low-risk firm. The cost of the resulting Type II error includes the opportunity cost of not lending to a good credit and the lost profits. For investors in the firm, the error cost may include the premature sale of securities at distressed prices.
• Altman et al. (1977) provide evidence of an asymmetric cost structure, with an estimated Type I error cost that is higher than the Type II error cost. The cost of a Type I error was estimated from the loan loss experience of banks, and the cost of a Type II error was the opportunity cost of not lending to a non-bankrupt firm because it was predicted to become bankrupt.
• Altman et al. (1977) proposed the ZETA model, which achieves a lower Type I error than Altman's (1968) MDA formulation.

Efforts to calculate “Optimal Cut-Off Point” for bankruptcy models
• Joy and Tollefson (1975), Altman and Eisenbeis (1978) and Altman et al. (1977) calculated the optimal cutoff point using the ZETA model. Two elements were identified: in selecting the optimal cutoff score of the estimated model, one should consider
* the prior probabilities of belonging to the failing or non-failing group (i.e. population), and
* the costs of Type I and Type II errors.
• Later, Maddala (1983) developed another optimal cutoff point equation:

• There have been many more attempts to calculate the optimal cutoff score; however, a fixed cutoff probability that can be used in all kinds of institutional arrangements, in different countries and at all times, does not exist.
Source: Kuo H, Lee C, Lin L, Piesse J. Chapter 22, Encyclopedia of Finance (2006), Springer Science+Business Media Inc.

Some Real Life Examples
• Loan classification problem: we want to classify a new application into one of several potential risk classes to decide whether or not to grant a loan to that individual. Based on past experience with these types of applications, we build a discriminant function and then decide whether the application is potentially low risk or high risk.
• Warning or alert systems for financial crises or extreme events, such as bankruptcy prediction, credit card fraud and currency crises: looking at the present state of a firm, one tries to classify that state into a particular class of distress.
• Medical diagnostics: patients are monitored constantly on certain health parameters, and each patient's health condition is classified as critical or non-critical.

• Predicting the success or failure of a new product.

Conducting Discriminant Analysis
1. Formulate the model
2. Estimate the discriminant function coefficients
3. Determine the significance of the discriminant function
4. Interpret the results
5. Assess the validity of the discriminant analysis

Source: Malhotra, N. (2007). Marketing Research: An Applied Orientation. Prentice Hall.

Overall test of relationship
• The overall test of the relationship between the independent variables and the groups defined by the dependent variable is a series of tests of whether each of the functions needed to distinguish among the groups is statistically significant.

• In some analyses, we might discover that two or more of the groups defined by the dependent variable cannot be distinguished using the available independent variables. While it is reasonable to interpret a solution with fewer significant discriminant functions than the maximum possible number, our problems will require that all of the possible discriminant functions be significant.

SPSS Activity: Discriminant Analysis
• A large international air carrier has collected data on employees in three different job classifications: 1) customer service personnel, 2) mechanics and 3) dispatchers. The director of Human Resources wants to know whether these three job classifications appeal to different personality types. Each employee is given a battery of psychological tests, including measures of interest in outdoor activity, sociability and conservativeness.
• SOURCE: www.ats.ucla.edu

Steps in SPSS
• Analyse >> Classify >> Discriminant

• Select 'JOB' as your grouping variable and enter it into the Grouping Variable box.
• Click the Define Range button and enter the lowest and highest codes for your groups (here 1 and 3), then click Continue.
• Select your predictors (PVs), enter them into the Independents box, and select Enter independents together. If you had planned a stepwise analysis, you would instead select Use stepwise method at this point.
** If the set of predictor variables (PVs) is small, or the objective is simply to determine the discriminating capability of the entire set of PVs without regard to the impact of any individual PV, the simultaneous approach (Enter independents together) is used. **
** When you have many predictors, the stepwise method can be useful because it automatically selects the "best" variables to use in the model. **
• Click the Statistics button and select Means, Univariate ANOVAs, Box's M, Unstandardized and Within-groups correlation.

Specifying statistical output
First, mark the Means checkbox on the Descriptives panel. We will use the group means in our interpretation.

Second, mark the Univariate ANOVAs checkbox on the Descriptives panel. Perusing these tests suggests which variables might be useful discriminators.

Third, mark the Box's M checkbox. Box's M statistic evaluates conformity to the assumption of homogeneity of the group covariance matrices.

Fourth, click on the Continue button to close the dialog box.

Details for classification - 1
First, mark the option button to Compute from group sizes on the Prior Probabilities panel. This incorporates the size of the groups defined by the dependent variable into the classification of cases using the discriminant functions.

Second, mark the Casewise results checkbox on the Display panel to include classification details for each case in the output.

Third, mark the Summary table checkbox to include summary tables comparing actual and predicted classification.

Details for classification - 2

Fourth, mark the Leave-one-out classification checkbox to request SPSS to include a cross-validated classification in the output. This option produces a less biased estimate of classification accuracy by sequentially holding each case out of the calculations for the discriminant functions, and using the derived functions to classify the case held out.
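The leave-one-out procedure can also be sketched outside SPSS. This minimal sketch assumes NumPy, a linear classifier based on the pooled within-group covariance matrix, and prior probabilities computed from group sizes (mirroring the options chosen in these dialogs). It illustrates the procedure; it does not reproduce SPSS's internal computations. Function names and toy data are hypothetical.

```python
import numpy as np

def fit_lda(x, groups):
    """Group means, inverse pooled covariance and size-based priors."""
    labels = np.unique(groups)
    n, k = len(x), len(labels)
    means = {g: x[groups == g].mean(axis=0) for g in labels}
    w = sum((x[groups == g] - means[g]).T @ (x[groups == g] - means[g])
            for g in labels)
    s_inv = np.linalg.inv(w / (n - k))
    priors = {g: np.mean(groups == g) for g in labels}
    return labels, means, s_inv, priors


def predict_lda(model, case):
    """Pick the group with the largest -0.5 * D^2 + ln(prior)."""
    labels, means, s_inv, priors = model

    def score(g):
        diff = case - means[g]
        return -0.5 * float(diff @ s_inv @ diff) + np.log(priors[g])

    return max(labels, key=score)


def leave_one_out_accuracy(x, groups):
    """Classify each case with functions derived from all the other cases."""
    x = np.asarray(x, dtype=float)
    groups = np.asarray(groups)
    hits = 0
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        model = fit_lda(x[keep], groups[keep])
        hits += int(predict_lda(model, x[i]) == groups[i])
    return hits / len(x)


# Hypothetical toy data for two well-separated groups
train = [[1, 2], [2, 1], [2, 3], [3, 2], [6, 7], [7, 6], [7, 8], [8, 7]]
labels = ["G1"] * 4 + ["G2"] * 4
print(leave_one_out_accuracy(train, labels))  # 1.0
```

Refitting once per case is what makes the estimate less optimistic than resubstitution: no case ever helps derive the functions used to classify it.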

Details for classification - 3

Fifth, accept the default Within-groups option button on the Use Covariance Matrix panel. The covariance matrices measure the dispersion in the groups defined by the dependent variable. If we fail the test of homogeneity of group covariance matrices (Box's M), our option is to use Separate-groups covariance matrices in classification.

Sixth, mark the Combined-groups checkbox on the Plots panel to obtain a visual plot of the relationship between functions and groups defined by the dependent variable.

Seventh, click on the Continue button to close the dialog box.

Groups, functions, and variables
• To interpret the relationship between an independent variable and the dependent variable, we must first identify how the discriminant functions separate the groups, and then determine the role of the independent variable in each function.

• SPSS provides a table called "Functions at Group Centroids" (multivariate means) that indicates which groups are separated by which functions.

• SPSS provides another table called the "Structure Matrix" which, like its counterpart in factor analysis, gives the loading, or correlation, between each independent variable and each function. This tells us which variables to interpret for each function: each variable is interpreted on the function on which it loads most highly.

Which independent variables to interpret
• In a simultaneous discriminant analysis, in which all independent variables are entered together, we only interpret the relationships for independent variables that have a loading of 0.30 or higher (in absolute value) on one or more discriminant functions. A variable can have a high loading on more than one function, which complicates the interpretation; we interpret the variable for the function on which it has the highest loading.

• In a stepwise discriminant analysis, we limit the interpretation of relationships between independent variables and groups defined by the dependent variable to those independent variables that met the statistical test for inclusion in the analysis.

Comparing accuracy rates
• To characterize our model as useful, we require the cross-validated accuracy rate produced by SPSS to be at least 25% higher than the proportional by-chance accuracy rate, which is the sum of the squared proportions of cases in each group.

• The cross-validated accuracy rate is a one-at-a-time hold-out method that classifies each case based on a discriminant solution derived from all of the other cases in the analysis. It is a more realistic estimate of the accuracy rate we should expect in the population, because discriminant analysis inflates accuracy rates when the cases classified are the same cases used to derive the discriminant functions.

• Cross-validated accuracy rates are not produced by SPSS when separate covariance matrices are used in the classification, which we address more next week.
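The accuracy comparison described above can be sketched in plain Python. The proportional by-chance accuracy is the sum of the squared group proportions, and the model is judged useful if its cross-validated accuracy is at least 1.25 times that figure. As an illustration, the group sizes are the 56, 49 and 32 cases listed under Cases Used in Analysis in the welfare example's prior-probabilities table (Table 2). No cross-validated accuracy is reported in this material, so the accuracy values passed to the final check are hypothetical.

```python
def proportional_chance(group_sizes):
    """Proportional by-chance accuracy: sum of squared group proportions."""
    n = sum(group_sizes)
    return sum((size / n) ** 2 for size in group_sizes)


def hit_ratio(actual, predicted):
    """Share of cases whose predicted group equals the known group."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def beats_chance_criterion(cv_accuracy, group_sizes, margin=1.25):
    """True if accuracy is at least 25% above the proportional by-chance rate."""
    return cv_accuracy >= margin * proportional_chance(group_sizes)


sizes = [56, 49, 32]
print(round(proportional_chance(sizes), 3))          # 0.35
print(round(1.25 * proportional_chance(sizes), 3))   # 0.437
```

In this example, a cross-validated accuracy of about 43.7% or more would be needed before the model is called useful.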

Table 1
Analysis Case Processing Summary

Unweighted Cases                                                          N      Percent
Valid                                                                     138    51.1
Excluded  Missing or out-of-range group codes                             7      2.6
          At least one missing discriminating variable                    115    42.6
          Both missing or out-of-range group codes and at least
          one missing discriminating variable                             10     3.7
          Total                                                           132    48.9
Total                                                                     270    100.0

The minimum ratio of valid cases to independent variables for discriminant analysis is 5 to 1, with a preferred ratio of 20 to 1. In this analysis, there are 138 valid cases and 4 independent variables. The ratio of cases to independent variables is 34.5 to 1, which satisfies the minimum requirement. In addition, the ratio of 34.5 to 1 satisfies the preferred ratio of 20 to 1.

Assumption of equal dispersion for dependent variable groups
In discriminant analysis, the best measure of overall fit is classification accuracy. The appropriateness of using the pooled covariance matrix in computing classifications is evaluated by the Box's M statistic.

We examine the probability of the Box's M statistic to determine whether we meet the assumption of equal dispersion of the group covariance matrices (the multivariate measure of variance). This test is very sensitive, so we should select a conservative alpha value of 0.01. At that alpha level, we fail to reject the null hypothesis for this analysis, so the assumption is met.
Had we failed this test, our remedy would be to re-run the discriminant analysis requesting the use of separate covariance matrices in classification.
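Box's M compares the log-determinants of the group covariance matrices with that of the pooled matrix: M = (n - k) ln|S_p| - Σ (n_i - 1) ln|S_i|. M(1 - c) is approximately chi-square with p(p + 1)(k - 1)/2 degrees of freedom, where c = [(2p² + 3p - 1) / (6(p + 1)(k - 1))] [Σ 1/(n_i - 1) - 1/(n - k)]. The sketch is a minimal NumPy version of this textbook approximation with hypothetical names and toy data. It is not SPSS's code, and packages may also report an F approximation.

```python
import numpy as np

def box_m(x, groups):
    """Box's M with its chi-square approximation; returns (M, chi_square, df).

    Assumes every group has more cases than predictors, so each group
    covariance matrix is non-singular.
    """
    x = np.asarray(x, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n, p = x.shape
    k = len(labels)

    sizes = np.array([np.sum(groups == g) for g in labels])
    covs = [np.atleast_2d(np.cov(x[groups == g], rowvar=False)) for g in labels]
    pooled = sum((ni - 1) * s for ni, s in zip(sizes, covs)) / (n - k)

    def log_det(m):
        return np.linalg.slogdet(m)[1]

    m_stat = (n - k) * log_det(pooled) - sum(
        (ni - 1) * log_det(s) for ni, s in zip(sizes, covs))
    c = ((2 * p**2 + 3 * p - 1) / (6.0 * (p + 1) * (k - 1))) * (
        np.sum(1.0 / (sizes - 1)) - 1.0 / (n - k))
    chi_square = m_stat * (1.0 - c)
    df = p * (p + 1) * (k - 1) // 2
    return m_stat, chi_square, df


# Hypothetical toy data: three groups with identical covariance matrices
x = [[1, 2], [2, 1], [2, 3], [3, 2], [6, 2], [7, 1],
     [7, 3], [8, 2], [4, 7], [5, 6], [5, 8], [6, 7]]
g = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
print(box_m(x, g))
```

If SciPy is available, `scipy.stats.chi2.sf(chi_square, df)` gives the probability to compare against the conservative 0.01 level.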

Table 2

Prior Probabilities for Groups

                                  Cases Used in Analysis
WELFARE          Prior    Unweighted    Weighted
1 TOO LITTLE     .409     56            56.000
2 ABOUT RIGHT    .358     49            49.000
3 TOO MUCH       .234     32            32.000
Total            1.000    137           137.000

In addition to the requirement for the ratio of cases to independent variables, discriminant analysis requires that there be a minimum number of cases in the smallest group defined by the dependent variable. The number of cases in the smallest group must be larger than the number of independent variables, and preferably contain 20 or more cases. The number of cases in the smallest group in this problem is 32, which is larger than the number of independent variables (4), satisfying the minimum requirement. In addition, the number of cases in the smallest group satisfies the preferred minimum of 20 cases.
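The sample-size rules of thumb quoted in the two paragraphs above can be collected into one small check. This is a plain-Python sketch with a hypothetical function name. The inputs are the figures reported for this analysis: 138 valid cases, 4 independent variables and a smallest group of 32.

```python
def check_sample_size(n_valid, n_predictors, smallest_group):
    """Apply the rules of thumb for discriminant analysis sample sizes."""
    ratio = n_valid / n_predictors
    return {
        "cases_per_predictor": ratio,
        "ratio_minimum_5_to_1": ratio >= 5,
        "ratio_preferred_20_to_1": ratio >= 20,
        "smallest_group_exceeds_predictors": smallest_group > n_predictors,
        "smallest_group_preferred_20": smallest_group >= 20,
    }


print(check_sample_size(138, 4, 32))
```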

NUMBER OF DISCRIMINANT FUNCTIONS - 1
The maximum possible number of discriminant functions is the smaller of one less than the number of groups defined by the dependent variable and the number of independent variables. In this analysis there were 3 groups defined by opinion about spending on welfare and 4 independent variables, so the maximum possible number of discriminant functions was 2.

NUMBER OF DISCRIMINANT FUNCTIONS - 2
In the table of Wilks' Lambda, which tested the functions for statistical significance, the stepwise analysis identified 2 discriminant functions that were statistically significant. The Wilks' lambda statistic for the test of functions 1 through 2 (chi-square = 21.853) had a probability of 0.001, which was less than or equal to the level of significance of 0.05.

After removing function 1, the Wilks' lambda statistic for the test of function 2 (chi-square=7.074) had a probability of 0.029 which was less than or equal to the level of significance of 0.05. The significance of the maximum possible number of discriminant functions supports the interpretation of a solution using 2 discriminant functions.
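The reported probabilities are consistent with Bartlett's approximation applied to the Wilks' lambda of the functions remaining at each step. With eigenvalues λ_i of W⁻¹B, the residual lambda is Λ_j = ∏_(i≥j) 1/(1 + λ_i), the statistic is χ² = -(n - 1 - (p + k)/2) ln Λ_j, and the degrees of freedom are (p - j + 1)(k - j). The eigenvalues are not shown in this output, so the example below only recovers the reported probabilities from the reported chi-square values. It assumes p = 3 (the three variables entered by the stepwise analysis) and k = 3 groups. To stay within the standard library, it uses the closed-form upper-tail probability that exists only for even degrees of freedom; for general df a library routine such as `scipy.stats.chi2.sf` would be used instead.

```python
import math

def residual_df(p, k, j):
    """Degrees of freedom for testing functions j, j+1, ... (j = 1 tests all)."""
    return (p - j + 1) * (k - j)


def residual_wilks(eigenvalues, j):
    """Wilks' lambda of functions j, j+1, ... from the eigenvalues of W^-1 B."""
    lam = 1.0
    for ev in eigenvalues[j - 1:]:
        lam *= 1.0 / (1.0 + ev)
    return lam


def bartlett_chi_square(lam, n, p, k):
    """Bartlett's chi-square approximation for a Wilks' lambda."""
    return -(n - 1 - (p + k) / 2.0) * math.log(lam)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this shortcut only handles positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# p = 3 predictors entered by the stepwise analysis, k = 3 groups
print(residual_df(3, 3, 1), residual_df(3, 3, 2))               # 6 2
print(round(chi2_sf_even_df(21.853, residual_df(3, 3, 1)), 3))  # 0.001
print(round(chi2_sf_even_df(7.074, residual_df(3, 3, 2)), 3))   # 0.029
```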

Independent variables and group membership: relationship of functions to groups
In order to specify the role that each independent variable plays in predicting group membership on the dependent variable, we must link together the relationship between the discriminant functions and the groups defined by the dependent variable, the role of the significant independent variables in the discriminant functions, and the differences in group means for each of the variables.

Functions at Group Centroids

                   Function
WELFARE            1        2
1 TOO LITTLE       -.220    .235
2 ABOUT RIGHT      .446     -.031
3 TOO MUCH         -.311    -.362

Unstandardized canonical discriminant functions evaluated at group means

Function 1 separates survey respondents who thought we spend about the right amount of money on welfare (the positive value of 0.446) from survey respondents who thought we spend too much (negative value of -0.311) or too little money (negative value of -0.220) on welfare.

Function 2 separates survey respondents who thought we spend too little money on welfare (positive value of 0.235) from survey respondents who thought we spend too much money (negative value of -0.362) on welfare. We ignore the second group (-0.031) in this comparison because it was distinguished from the other two groups by function 1.

Independent variables and group membership: which predictors to interpret

Variables Entered/Removed (a, b, c, d)

Step  Entered                            Min. D Squared  Between Groups  Exact F  df1  df2      Sig.
1     NUMBER OF HOURS WORKED LAST WEEK   .023            1 and 3         .475     1    135.000  .492
2     R SELF-EMP OR WORKS FOR SOMEBODY   .251            1 and 2         3.289    2    134.000  .040
3     HIGHEST YEAR OF SCHOOL COMPLETED   .364            1 and 3         2.433    3    133.000  .068

At each step, the variable that maximizes the Mahalanobis distance between the two closest groups is entered.
a. Maximum number of steps is 8.
b. Maximum significance of F to enter is .05.
c. Minimum significance of F to remove is .10.
d.

When we use the stepwise method of variable inclusion, we limit our interpretation of independent variable predictors to those listed as statistically significant in the table of Variables Entered/Removed. Had we used simultaneous entry of all variables, we would not have imposed this limitation. We will interpret the impact of the following independent variables on membership in the groups defined by the dependent variable:
•number of hours worked in the past week
•self-employment
•highest year of school completed

Independent variables and group membership: predictor loadings on functions
We do not interpret loadings in the structure matrix unless they are 0.30 or higher.
Structure Matrix

                                            Function
                                            1         2
HIGHEST YEAR OF SCHOOL COMPLETED            .687*     .136
NUMBER OF HOURS WORKED LAST WEEK            -.582*    .345
R SELF-EMP OR WORKS FOR SOMEBODY            .223      .889*
RESPONDENTS INCOME (a)                      .101      .292*

Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions. Variables ordered by absolute size of correlation within function.
*. Largest absolute correlation between each variable and any discriminant function.
a. This variable not used in the analysis.

Based on the structure matrix, the predictor variables strongly associated with discriminant function 1, which distinguished between survey respondents who thought we spend about the right amount of money on welfare and survey respondents who thought we spend too much or too little money on welfare, were number of hours worked in the past week (r = -0.582) and highest year of school completed (r = 0.687).

Based on the structure matrix, the predictor variable strongly associated with discriminant function 2, which distinguished between survey respondents who thought we spend too little money on welfare and survey respondents who thought we spend too much money on welfare, was self-employment (r = 0.889).
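The structure-matrix loadings and the interpretation rule can be sketched as follows. A loading is the pooled within-groups correlation between a predictor and a function's scores, obtained by centring both within each group before correlating. Each variable is then assigned to the function on which its absolute loading is highest, provided that loading is at least 0.30. This is a minimal sketch assuming NumPy with hypothetical names. The final example applies the interpretation rule to the loadings reported above and keeps school, hours and self-employment while dropping income (0.292).

```python
import numpy as np

def structure_matrix(x, groups, scores):
    """Pooled within-groups correlations between predictors and function scores.

    x: (n, p) predictors; scores: (n, f) discriminant function scores.
    """
    x = np.asarray(x, dtype=float).copy()
    scores = np.asarray(scores, dtype=float).copy()
    groups = np.asarray(groups)
    for g in np.unique(groups):
        in_g = groups == g
        x[in_g] -= x[in_g].mean(axis=0)          # centre within each group
        scores[in_g] -= scores[in_g].mean(axis=0)
    cross = x.T @ scores
    norms = np.outer(np.sqrt((x ** 2).sum(axis=0)),
                     np.sqrt((scores ** 2).sum(axis=0)))
    return cross / norms


def variables_to_interpret(loadings, threshold=0.30):
    """Map each variable index to the function (0-based) with its largest
    absolute loading, keeping only loadings that reach the threshold."""
    loadings = np.asarray(loadings, dtype=float)
    best = np.argmax(np.abs(loadings), axis=1)
    return {i: int(best[i]) for i in range(loadings.shape[0])
            if abs(loadings[i, best[i]]) >= threshold}


# Structure matrix reported above (rows: school, hours, self-employment, income)
reported = [[0.687, 0.136], [-0.582, 0.345], [0.223, 0.889], [0.101, 0.292]]
print(variables_to_interpret(reported))  # {0: 0, 1: 0, 2: 1}
```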

Independent variables and group membership: predictors associated with first function - 1
Group Statistics

                                                                      Valid N (listwise)
WELFARE        Variable                            Mean    Std. Dev.  Unweighted  Weighted
1 TOO LITTLE   NUMBER OF HOURS WORKED LAST WEEK    43.96   13.240     56          56.000
               HIGHEST YEAR OF SCHOOL COMPLETED    13.73   2.401      56          56.000
               R SELF-EMP OR WORKS FOR SOMEBODY    1.93    .260       56          56.000
               RESPONDENTS INCOME                  13.70   5.034      56          56.000
2 ABOUT RIGHT  NUMBER OF HOURS WORKED LAST WEEK    37.90   13.235     50          50.000
               HIGHEST YEAR OF SCHOOL COMPLETED    14.78   2.558      50          50.000
               R SELF-EMP OR WORKS FOR SOMEBODY    1.90    .303       50          50.000
               RESPONDENTS INCOME                  14.00   5.503      50          50.000
3 TOO MUCH     NUMBER OF HOURS WORKED LAST WEEK    42.03   10.456     32          32.000
               HIGHEST YEAR OF SCHOOL COMPLETED    13.38   2.524      32          32.000
               R SELF-EMP OR WORKS FOR SOMEBODY    1.75    .440       32          32.000
               RESPONDENTS INCOME                  14.75   5.304      32          32.000
Total          NUMBER OF HOURS WORKED LAST WEEK    41.32   12.846     138         138.000
               HIGHEST YEAR OF SCHOOL COMPLETED    14.03   2.537      138         138.000

The average number of hours worked in the past week for survey respondents who thought we spend about the right amount of money on welfare (mean=37.90) was lower than the average number of hours worked in the past week for survey respondents who thought we spend too little money on welfare (mean=43.96) and survey respondents who thought we spend too much money on welfare (mean=42.03).

This supports the relationship that "survey respondents who thought we spend about the right amount of money on welfare worked fewer hours in the past week than survey respondents who thought we spend too little or too much money on welfare."

Independent variables and group membership: predictors associated with first function - 2
The average highest year of school completed for survey respondents who thought we spend about the right amount of money on welfare (mean=14.78) was higher than the average highest year of school completed for survey respondents who thought we spend too little money on welfare (mean=13.73) and survey respondents who thought we spend too much money on welfare (mean=13.38).

This supports the relationship that "survey respondents who thought we spend about the right amount of money on welfare had completed more years of school than survey respondents who thought we spend too little or too much money on welfare."

Independent variables and group membership: predictors associated with second function

Since self-employment is a dichotomous variable, the mean is not directly interpretable. Its interpretation must take into account the coding, by which 1 corresponds to self-employed and 2 corresponds to working for someone else. The lower mean for survey respondents who thought we spend too much money on welfare (mean=1.75), when compared to the mean for survey respondents who thought we spend too little money on welfare (mean=1.93), implies that the former group contained more survey respondents who were self-employed and fewer survey respondents who were working for someone else.

This supports the relationship that "survey respondents who thought we spend too much money on welfare were more likely to be self-employed than survey respondents who thought we spend too little money on welfare."
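Because of the 1/2 coding, a group mean can be converted back into a proportion: if p is the share of self-employed respondents, the mean is 1*p + 2*(1 - p) = 2 - p. The Python sketch below applies this to the three group means. The helper name is hypothetical, and it assumes the means were rounded to two decimals when it infers whole-number counts. As a consistency check, the implied sample standard deviation of a two-valued variable, sqrt(k(n - k) / (n(n - 1))) for k self-employed cases out of n, should land on the .260, .303 and .440 reported in the Group Statistics table.

```python
from math import sqrt


def decode_dichotomous_mean(mean, n):
    """For a variable coded 1 = self-employed, 2 = works for someone else,
    recover the share and the (approximate) count of self-employed cases."""
    share_self_employed = 2 - mean           # mean = 1*p + 2*(1 - p) = 2 - p
    count = round(share_self_employed * n)   # assumes the mean was rounded to 2 decimals
    # Sample SD of a two-valued variable with `count` cases at one level out of n
    sd = sqrt(count * (n - count) / (n * (n - 1)))
    return share_self_employed, count, sd


# Group means and sizes for R SELF-EMP OR WORKS FOR SOMEBODY
groups = {"TOO LITTLE": (1.93, 56), "ABOUT RIGHT": (1.90, 50), "TOO MUCH": (1.75, 32)}

for name, (mean, n) in groups.items():
    share, count, sd = decode_dichotomous_mean(mean, n)
    print(f"{name:12s} self-employed share = {share:.2f} (~{count} of {n}), implied SD = {sd:.3f}")
```

Read this way, roughly 7% of the TOO LITTLE group but 25% of the TOO MUCH group were self-employed, which is the difference the interpretation relies on.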

CLASSIFICATION USING THE DISCRIMINANT MODEL: by chance accuracy rate
The independent variables could be characterized as useful predictors of membership in the groups defined by the dependent variable if the cross-validated classification accuracy rate was significantly higher than the accuracy attainable by chance alone. Operationally, the cross-validated classification accuracy rate should be at least 25% higher than (that is, at least 1.25 times) the proportional by chance accuracy rate. The proportional by chance accuracy rate was computed by squaring and summing the proportions of cases in each group, taken from the table of prior probabilities for groups (0.406² + 0.362² + 0.232² ≈ 0.350).

Prior Probabilities for Groups

                        Cases Used in Analysis
WELFARE          Prior   Unweighted   Weighted
1 TOO LITTLE      .406           56     56.000
2 ABOUT RIGHT     .362           50     50.000
3 TOO MUCH        .232           32     32.000
Total            1.000          138    138.000
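The proportional by chance accuracy rate and the 25% benchmark can be reproduced from the group sizes alone. A minimal Python sketch (the function name is hypothetical), using the unweighted counts of cases used in the analysis (56, 50 and 32):

```python
def proportional_chance_criterion(group_sizes, margin=1.25):
    """Proportional by chance accuracy rate and the 'at least 25% higher' benchmark."""
    total = sum(group_sizes)
    priors = [n / total for n in group_sizes]   # prior probability = share of cases
    chance_rate = sum(p ** 2 for p in priors)   # sum of squared group proportions
    return priors, chance_rate, margin * chance_rate


priors, chance, benchmark = proportional_chance_criterion([56, 50, 32])
print("priors:", [round(p, 3) for p in priors])
print(f"proportional by chance accuracy rate = {chance:.3f}")
print(f"benchmark (1.25 x chance) = {benchmark:.3f}")
```

It returns priors of .406, .362 and .232, a chance rate of about 0.350 and a benchmark of about 0.437, which is the 43.7% criterion applied to the cross-validated accuracy.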

CLASSIFICATION USING THE DISCRIMINANT MODEL: criteria for classification accuracy

Classification Results (b, c)

                          Predicted Group Membership
WELFARE (actual)          1 TOO LITTLE  2 ABOUT RIGHT  3 TOO MUCH   Total
Original, Count
  1 TOO LITTLE                      43             15           6      64
  2 ABOUT RIGHT                     26             30           6      62
  3 TOO MUCH                        17             10           9      36
  Ungrouped cases                    3              3           2       8
Original, %
  1 TOO LITTLE                    67.2           23.4         9.4   100.0
  2 ABOUT RIGHT                   41.9           48.4         9.7   100.0
  3 TOO MUCH                      47.2           27.8        25.0   100.0
  Ungrouped cases                 37.5           37.5        25.0   100.0
Cross-validated (a), Count
  1 TOO LITTLE                      43             15           6      64
  2 ABOUT RIGHT                     26             30           6      62
  3 TOO MUCH                        17             11           8      36
Cross-validated (a), %
  1 TOO LITTLE                    67.2           23.4         9.4   100.0
  2 ABOUT RIGHT                   41.9           48.4         9.7   100.0
  3 TOO MUCH                      47.2           30.6        22.2   100.0

a. Cross-validation is done only for those cases in the analysis. In cross-validation, each case is classified by the functions derived from all cases other than that case.
b. 50.6% of original grouped cases correctly classified.
c. 50.0% of cross-validated grouped cases correctly classified.

The cross-validated accuracy rate computed by SPSS was 50.0%, which was greater than or equal to the proportional by chance accuracy criterion of 43.7% (1.25 × 35.0% ≈ 43.7%). The criterion for classification accuracy is satisfied.
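The hit ratios in footnotes b and c are the share of grouped cases that fall on the diagonal of the count table. A short Python sketch (the helper name is hypothetical) that recomputes them from the Count rows, leaving out the ungrouped cases, and compares each with the 1.25 × 0.350 benchmark from the text:

```python
def hit_ratio(confusion):
    """Share of cases on the diagonal of a count confusion matrix
    (rows = actual group, columns = predicted group)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Grouped cases only (ungrouped cases excluded), taken from the Classification Results
original = [[43, 15, 6], [26, 30, 6], [17, 10, 9]]
cross_validated = [[43, 15, 6], [26, 30, 6], [17, 11, 8]]

benchmark = 1.25 * 0.350  # proportional by chance criterion stated in the text
for label, table in [("original", original), ("cross-validated", cross_validated)]:
    rate = hit_ratio(table)
    print(f"{label}: {rate:.1%} correctly classified; meets criterion: {rate >= benchmark}")
```

82 of 162 grouped cases are correct in the original classification (50.6%) and 81 of 162 after cross-validation (50.0%), both above the benchmark. The cross-validated counts themselves come from SPSS's leave-one-out procedure (footnote a); this sketch only scores the resulting table and does not refit any discriminant functions.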

Stepwise Discriminant Analysis


Stepwise discriminant analysis is analogous to stepwise multiple regression in that the predictors are entered sequentially based on their ability to discriminate between the groups.
An F ratio is calculated for each predictor by conducting a univariate analysis of variance in which the groups are treated as the categorical variable and the predictor as the criterion variable. The predictor with the highest F ratio is the first to be selected for inclusion in the discriminant function, if it meets certain significance and tolerance criteria.
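To make the univariate F ratio concrete, here is a minimal Python sketch that computes the one-way ANOVA F for a single predictor directly from group summary statistics. As an illustration it plugs in the NUMBER OF HOURS WORKED LAST WEEK figures from the Group Statistics table (group sizes, means and standard deviations). The function name is hypothetical, and because the inputs are rounded, the result only approximates what SPSS would report from the raw data.

```python
from math import sqrt


def one_way_anova_f(ns, means, sds):
    """Univariate F ratio for one predictor across groups, from summary statistics."""
    total_n = sum(ns)
    k = len(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / total_n
    # Between-groups sum of squares: spread of the group means around the grand mean
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # Within-groups sum of squares: pooled spread of cases around their own group mean
    ss_within = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    df_between, df_within = k - 1, total_n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, ss_between, ss_within


# Hours worked last week, by WELFARE group (TOO LITTLE, ABOUT RIGHT, TOO MUCH)
ns = [56, 50, 32]
means = [43.96, 37.90, 42.03]
sds = [13.240, 13.235, 10.456]

f_ratio, ssb, ssw = one_way_anova_f(ns, means, sds)
total_sd = sqrt((ssb + ssw) / (sum(ns) - 1))  # should be close to the Total row's SD
print(f"F(2, 135) = {f_ratio:.2f}")
print(f"recovered total SD = {total_sd:.3f}")
```

With these inputs the F ratio is roughly 3.1 on (2, 135) degrees of freedom, and the recovered total standard deviation lands close to the 12.846 reported for the Total row, a useful sanity check that between- and within-group sums of squares add up to the total.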







A second predictor is added based on the highest adjusted or partial F ratio, taking into account the predictor already selected.

Each predictor selected is tested for retention based on its association with other predictors selected.
The process of selection and retention is continued until all predictors meeting the significance criteria for inclusion and retention have been entered in the discriminant function. The order in which the variables were selected also indicates their importance in discriminating between the groups.
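The entry side of this process can be sketched in Python as a greedy forward search that, at each step, enters the predictor giving the smallest overall Wilks' lambda, computed as |W| / |T| (the determinant of the pooled within-groups sums-of-squares-and-cross-products matrix over that of the total matrix). This is a simplified sketch under stated assumptions: the data, group labels and variable names X1 to X3 are hypothetical, and it omits the F-to-enter and F-to-remove significance tests, the tolerance check and the retention (removal) step that SPSS applies.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def mean_vector(rows):
    n = len(rows)
    return [sum(values) / n for values in zip(*rows)]


def sscp(rows, cols, center):
    """Sums of squares and cross-products of the chosen columns around `center`."""
    return [[sum((x[i] - center[i]) * (x[j] - center[j]) for x in rows) for j in cols]
            for i in cols]


def wilks_lambda(data, labels, cols):
    """Wilks' lambda = |W| / |T| for the predictors listed in `cols`."""
    total = sscp(data, cols, mean_vector(data))
    within = [[0.0] * len(cols) for _ in cols]
    for group in sorted(set(labels)):
        rows = [x for x, y in zip(data, labels) if y == group]
        block = sscp(rows, cols, mean_vector(rows))
        within = [[w + b for w, b in zip(w_row, b_row)] for w_row, b_row in zip(within, block)]
    return det(within) / det(total)


def forward_stepwise(data, labels, max_vars):
    """Greedy forward selection: enter the predictor that minimizes overall Wilks' lambda."""
    selected, remaining, history = [], list(range(len(data[0]))), []
    while remaining and len(selected) < max_vars:
        best = min(remaining, key=lambda v: wilks_lambda(data, labels, selected + [v]))
        selected.append(best)
        remaining.remove(best)
        history.append((best, wilks_lambda(data, labels, selected)))
    return history


# Hypothetical toy data: X1 separates the groups strongly, X2 weakly, X3 not at all
names = ["X1", "X2", "X3"]
data = [
    (1.0, 5.0, 3.1), (1.5, 5.5, 2.9), (0.5, 4.8, 3.3), (1.2, 5.2, 2.7),  # group A
    (4.0, 5.4, 3.0), (4.5, 6.0, 3.2), (3.8, 5.1, 2.8), (4.2, 5.9, 3.0),  # group B
]
labels = ["A"] * 4 + ["B"] * 4

history = forward_stepwise(data, labels, max_vars=2)
for step, (var, lam) in enumerate(history, start=1):
    print(f"Step {step}: entered {names[var]}, Wilks' lambda = {lam:.4f}")
```

Adding a predictor can never increase Wilks' lambda, so the value printed at each step is non-increasing; the size of each drop is what a real implementation would test against its entry criterion.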





Stepwise Discriminant Analysis: Methods
 Wilks' lambda. A variable selection method for stepwise discriminant analysis that chooses variables for entry into the equation on the basis of how much they lower Wilks' lambda. At each step, the variable that minimizes the overall Wilks' lambda is entered.

 Unexplained variance. At each step, the variable that minimizes the sum of the unexplained variation between groups is entered.
 Mahalanobis distance. A measure of how much a case's values on the independent variables differ from the average of all cases. A large Mahalanobis distance identifies a case as having extreme values on one or more of the independent variables.

 Smallest F ratio. A method of variable selection in stepwise analysis based on maximizing an F ratio computed from the Mahalanobis distance between groups.
 Rao's V. A measure of the differences between group means. Also called the Lawley-Hotelling trace. At each step, the variable that maximizes the increase in Rao's V is entered. After selecting this option, enter the minimum value a variable must have to enter the analysis.

Mahalanobis D
 The "Mahalanobis distance" is a rule for calculating the distance between two points. The two usual cases where the Mahalanobis distance plays an important role :  Distance of a point to the mean of a distribution,  And, distance between the means of two distributions.  Better than Euclidian Distance in certain cases: In this image, the two points A and B are equally distant from the centre µ of the distribution.  Yet, it seems inappropriate to say that they occupy "equivalent" positions with respect to O as:  A is in a low density (probability) region,  While B is in a high density (probability) region.  So, in a situation like this one, the usual Euclidian distance d ²(A, µ) = i (oi - µi)² does not seem to be the right tool for measuring the "distance" of a point to the centre of the distribution.

 We would instead consider "two points with the same probability density" as "points equally distant from the mean", as this would make them equally probable when drawing observations from the distribution.
 So we use the Mahalanobis distance instead of the Euclidean distance: $D^2 = (x - \mu)' \Sigma^{-1} (x - \mu)$, with $\Sigma$ the covariance matrix of the distribution. D is called the Mahalanobis distance of the point x to the mean µ of the distribution.
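A small Python sketch of the formula for two-dimensional data, using the explicit inverse of a 2 × 2 covariance matrix. The centre, the covariance matrix (unit variances, correlation 0.8) and the points A and B are hypothetical values chosen to mimic the situation described for the image: both points are the same Euclidean distance from µ, but B lies along the long axis of the cloud and A lies across it.

```python
from math import dist


def mahalanobis_sq_2d(x, mu, cov):
    """Squared Mahalanobis distance (x - mu)' Sigma^{-1} (x - mu) for 2-D points."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]  # inverse of a 2x2 matrix
    diff = [x[0] - mu[0], x[1] - mu[1]]
    return sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))


mu = (0.0, 0.0)
cov = ((1.0, 0.8), (0.8, 1.0))  # hypothetical: strong positive correlation
A = (1.0, -1.0)                 # across the main axis of the cloud (low density)
B = (1.0, 1.0)                  # along the main axis of the cloud (high density)

for name, point in [("A", A), ("B", B)]:
    print(name, "Euclidean:", round(dist(point, mu), 3),
          "Mahalanobis D^2:", round(mahalanobis_sq_2d(point, mu, cov), 3))
```

Both points are √2 ≈ 1.414 from µ in Euclidean terms, yet A has D² = 10 while B has D² ≈ 1.11, so A is judged far more extreme, in line with its lower density.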

Mahalanobis distance and Discriminant Analysis
 Suppose you want to discriminate between two equally extended spherical classes with equal a priori probabilities. Then the best classification rule is simply to assign an observation x to the class whose centre (mean) is closer to x in the sense of the ordinary Euclidean distance.  But this is no longer the case if the classes are not spherical. We should then assign x to the class to which it has the higher probability of belonging, that is, the class with the largest probability density at x (because the a priori probabilities are equal); when the classes share the same covariance matrix, this is the class with the lower Mahalanobis distance of x to the class mean. For example, in the lower image of the above illustration, x should be assigned to class C1 although it is in "C2 territory" from a Euclidean point of view.
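The classification rule can be sketched as a nearest-mean classifier that measures distance either in Euclidean terms or with the Mahalanobis distance under a covariance matrix shared by both classes (equal priors assumed, as in the text). The class centres, the shared covariance and the point x below are hypothetical values chosen so that x is closer to C2's centre in Euclidean terms but closer to C1's centre in the Mahalanobis sense.

```python
from math import dist


def mahalanobis_sq_2d(x, mu, cov):
    """Squared Mahalanobis distance (x - mu)' Sigma^{-1} (x - mu) for 2-D points."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    diff = [x[0] - mu[0], x[1] - mu[1]]
    return sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))


def classify(x, means, cov):
    """Assign x to the class with the nearest mean under each distance rule.
    Assumes equal priors and one covariance matrix shared by all classes."""
    by_euclid = min(means, key=lambda k: dist(x, means[k]))
    by_mahalanobis = min(means, key=lambda k: mahalanobis_sq_2d(x, means[k], cov))
    return by_euclid, by_mahalanobis


means = {"C1": (0.0, 0.0), "C2": (3.0, 0.0)}  # hypothetical class centres
cov = ((1.0, 0.8), (0.8, 1.0))                # hypothetical shared covariance
x = (2.0, 2.0)
print(classify(x, means, cov))
```

Here the Euclidean rule picks C2 while the Mahalanobis rule picks C1, because x lies along the long axis of C1's elongated cloud, mirroring the "C2 territory" example.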

