
Dave’s Crash Course in Statistics using SPSS


Submitted By ahbean90

1.0 Classifying the different types of data

There are four types of variables: nominal, ordinal, interval and ratio. Distinguishing between these types is important, because some statistical tools may only be used with certain types of data.

Nominal variables: values are assigned to categories that have no particular order. The numbers are arbitrary labels and carry no meaning of size or rank.
For example, “sex” where 1=male, 2=female; “marital status” where 1=never married, 2=married, 3=de facto; and “yes/no” questions where 1=yes, 2=no.

Ordinal variables: values are assigned to categories that are related to each other in a logical order, such as ascending or descending order.
For example, “age group” where 1=under 21yrs, 2=21-35yrs, 3=36-49yrs, 4=50yrs and over; and “education” where 1=high school completed, 2=tertiary studies completed, 3=post-graduate studies completed.
The higher the value assigned, the higher the category (i.e., the higher the age group or education level). The distances between categories, however, are not necessarily equal.

Interval variables: values are ordered in the same way as ordinal variables, and in addition the intervals (distances) between adjacent values are equal. There is, however, no true zero point.
For example, “please indicate how strongly you agree with the following statements…” on the scale 1----------2----------3----------4----------5, where 1=strongly disagree, 2=somewhat disagree, 3=neither disagree nor agree, 4=somewhat agree, 5=strongly agree.
Likert-type scales like this one are conventionally treated as interval in nature, on the assumption that respondents perceive the scale points as equally spaced.

Ratio variables: have all the characteristics of interval variables, plus an absolute (true) zero point, so ratios between values are meaningful (200 meters is twice as far as 100 meters). Ratio data arise most often from questions whose responses are numeric; the numeric responses themselves are then used as the coded values.
For example, “What is your age in years? __________years”: if the response is 8 years, the value 8 is assigned to that person’s age. “How far do you walk per day, in meters? __________meters”: if the response is 200 meters, the value 200 is assigned to that person.
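The four levels are cumulative: each level permits every comparison allowed at the level below it, plus one more. The Python sketch below encodes that idea; the `Level` enum, the `meaningful_operations` helper and the example classifications are illustrative assumptions, not part of SPSS or the case data.

```python
from enum import IntEnum


class Level(IntEnum):
    """Levels of measurement, ordered from least to most informative."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


def meaningful_operations(level):
    """Return the comparisons that make sense at a given level.

    Each level keeps every operation allowed at the levels below it.
    """
    ops = {"same/different"}          # nominal: codes only name categories
    if level >= Level.ORDINAL:
        ops.add("higher/lower")       # ordinal: categories have a logical order
    if level >= Level.INTERVAL:
        ops.add("difference")         # interval: equal distances between values
    if level >= Level.RATIO:
        ops.add("ratio")              # ratio: true zero, so "twice as far" makes sense
    return ops


# Hypothetical classification of a few variables discussed in Section 1.0
examples = {
    "sex": Level.NOMINAL,
    "age group": Level.ORDINAL,
    "agreement rating (1-5)": Level.INTERVAL,
    "meters walked per day": Level.RATIO,
}
for name, level in examples.items():
    print(f"{name:24} {level.name:9} {sorted(meaningful_operations(level))}")
```

Because `Level` is an `IntEnum`, the `>=` comparisons follow the nominal < ordinal < interval < ratio ordering directly.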

2.0 Different types of statistics for different types of data

Different types of descriptive and inferential statistics are used depending on whether the data is nominal, ordinal, interval, or ratio in nature.
Descriptive statistics: describe the sample you have at hand.
Inferential statistics: use the sample to make inferences or generalizations about the wider population.

|Type of data |Descriptive statistics |Inferential statistics |
|Nominal |Frequencies, mode |Chi-square (χ²) |
|Ordinal |Frequencies, mode, median, range, min/max |Chi-square (χ²) |
|Interval and ratio |Mean, standard deviation, variance, range, min/max |Correlations; T-test (independent samples, paired samples); One-way ANOVA; Factor Analysis; Multiple regression |
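The table can also be read as a rule for which summaries to report. A minimal Python sketch, assuming each variable is a list of numeric codes with `None` for a blank cell; `describe` is a hypothetical helper written for illustration, not an SPSS command.

```python
import statistics
from collections import Counter


def describe(values, level):
    """Return only the descriptive statistics suited to the level of measurement.

    `level` is one of "nominal", "ordinal", "interval", "ratio".
    Blank cells (None) are dropped first.
    """
    data = [v for v in values if v is not None]
    if level == "nominal":
        return {"n": len(data), "frequencies": dict(Counter(data)),
                "mode": statistics.mode(data)}
    if level == "ordinal":
        return {"n": len(data), "frequencies": dict(Counter(data)),
                "mode": statistics.mode(data), "median": statistics.median(data),
                "min": min(data), "max": max(data), "range": max(data) - min(data)}
    if level in ("interval", "ratio"):
        return {"n": len(data), "mean": statistics.mean(data),
                "sd": statistics.stdev(data), "variance": statistics.variance(data),
                "min": min(data), "max": max(data), "range": max(data) - min(data)}
    raise ValueError(f"unknown level of measurement: {level!r}")


sex = [1, 0, 0, 1, 0, None]        # hypothetical codes: 1 = Male, 0 = Female
print(describe(sex, "nominal"))    # frequencies and mode only, never a mean
```

Refusing to compute a mean for nominal data is the point: a "mean sex" would be a number with no interpretation.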

3.0 Procedure of this tutorial

This tutorial uses Case 5, “Values and the Automobile Market”, found in your Zikmund text (p.694). If you haven’t already read the case, do so now before proceeding further.

Do: Beside Case Exhibit 5.4, or beside the corresponding questionnaire in the Appendix at the end of this SPSS Crash Course, note whether each variable is nominal, ordinal, interval, or ratio. For example, age is ordinal, sex is nominal, and so on. If you are not sure, review the definitions and examples given earlier under “Classifying the different types of data”. Classifying the data now will help us avoid problems when performing statistical procedures later.

The corresponding data file for the research done in this case is on the CBS network at N:>COMMON>MR200>Luxury.sav (PERTH students only). Alternatively, download this file from Blackboard (all OFFSHORE students).

Do: Open Luxury.sav

4.0 Getting familiar with the SPSS screen

SPSS works similarly to MS Excel and other spreadsheets. In the SPSS Data Editor, each column represents an item (question) in the questionnaire and each row represents a different respondent.

Example

In the Luxury.sav file, each column (age, sex, educ, …) corresponds to a question in Case Exhibit 5.4, and each row corresponds to a different person who took part in the survey. The sample size therefore determines the number of rows; in this case, the sample size is n=155.

Note that there are some blank cells in this data. These blank cells indicate a non-response: if a respondent did not answer an item on the questionnaire (whether or not he was meant to, or even if he skipped an entire section because it did not apply), that cell is left blank. Do not enter zero. SPSS would treat 0 as a genuine response, and in this data set 0 is also the code for “Female”.
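Why a blank is not the same as a zero can be shown with a few hypothetical 1-to-7 ratings. In this Python sketch `None` stands for a blank cell; the ratings are made up for illustration.

```python
import statistics

# Hypothetical 1-7 ratings of one issue by six respondents; None marks a blank cell
ratings = [6, 7, None, 5, 6, None]

# Correct: leave non-responses blank and exclude them from the calculation
valid = [r for r in ratings if r is not None]
print(statistics.mean(valid))         # 6  (average of the four real answers)

# Wrong: typing 0 for a non-response turns it into a real (and impossible) rating
zero_filled = [0 if r is None else r for r in ratings]
print(statistics.mean(zero_filled))   # 4  (dragged down by two fake zeros)
```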

5.0 Assigning labels and values to your variables

Each item in your questionnaire (i.e., each column) is called a variable in SPSS. You can assign names, labels and values to each of these variables in the Variable View. The tabs at the bottom left of the screen switch between Data View and Variable View.

Example

Do: Go to the Variable View

In most cases you will only need to change the “Name”, “Label”, and “Values” columns. Leave the other columns unchanged.

In this example, the “Name” of each variable has already been filled in for you. The “Label” column lets you describe further what that variable means. Once entered, these labels will appear in every table or graph you construct.

Do: For the variable “age”, type in a “Label” of “age of car owner”; similarly, for “sex”, type in a label of “gender of car owner”, and so on. When you assign labels to “issue 1” to “issue 20”, refer to Case Exhibit 5.1 and use keywords as the labels: for example, label “issue 1” as “fun and excitement”, “issue 2” as “being good to myself”, and so on. The labels for the other variables can be found in Case Exhibits 5.2 and 5.3 respectively.

The next step is to assign “Values” to each of the possible responses for each question. These “Values” are listed in Case Exhibit 5.4: for example, a respondent who is “35 years and under” is assigned a “Value” of “2”, and so on. This is done as follows:

Do: For the variable “age”, click on the “Values” cell, then on the (…) button, and a window will appear. In this window, do the following:
Value = “2”, Value Label = “35 years and under” → click on “Add”
Value = “3”, Value Label = “36 to 45 years” → click on “Add”
and so on… When you have assigned all the values for age, click OK.
Similarly, assign values to the other variables, except for the “ISSUES”, “ATTRIBUTES”, and “VALUES” variables (see Case Exhibit 5.4), where the scale of 1 to 7 is self-explanatory.
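Value labels change only how codes are displayed, not the numbers stored in the data. A rough Python analogy, using the age codes and labels from the questionnaire in the Appendix; `AGE_LABELS` and `apply_labels` are illustrative names, not SPSS features, and the sample column is hypothetical.

```python
# Value labels for "age", as listed in the Appendix questionnaire (codes 2-6)
AGE_LABELS = {2: "35 & Under", 3: "36 - 45", 4: "46 - 55", 5: "56 - 65", 6: "65 +"}


def apply_labels(codes, labels):
    """Display each code as its label.

    Blank cells (None) stay blank; codes without a label are shown as numbers.
    """
    return [None if c is None else labels.get(c, c) for c in codes]


ages = [2, 5, None, 3, 6]   # hypothetical column of age codes
print(apply_labels(ages, AGE_LABELS))
# ['35 & Under', '56 - 65', None, '36 - 45', '65 +']
print(ages)                 # the stored codes themselves are unchanged
```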

When you have assigned all the values for the variables, go back to the “Data View”.

Do: Try clicking on the ‘value labels’ button (the ‘red tag’ icon) on the top toolbar. What changes? The cells now show the value labels instead of the numeric codes, and the labels you defined will also appear in any statistics you run from now on.

6.0 Editing the data

Sometimes we may not be entirely happy with the data we have. For example, a variable may have too many categories, or a question may have been asked in a way that produced a different scale (nominal, ordinal, interval, or ratio) from the one we would like to use in our analysis. The “Transform” menu offers several editing procedures. We will look at two situations here.

It is possible to re-categorize your data.

Example

If you have five “education” categories, as in Case Exhibit 5.4 (1=less than high school, 2=high school grad, 3=some college, 4=college grad, and 5=graduate degree), but you are only interested in whether the respondent is a graduate or non-graduate, you can re-code this data from five categories into just two.

Do: Click on the menu Transform>Recode and use “Into Different Variables” (recommended). This creates a new column in your data sheet, so your old values remain intact for future reference.
Select “educ” from the left selection box and click on the arrow to bring it into the middle box.
Give the output variable a name (suggest: “neweduc”) and a label (suggest: “grad/nongrad”), and click on “Change”.
Click on “Old and New Values”. For the moment, make a mental note that a graduate (i.e., categories 4 or 5) will get a new value of “1”, and a non-graduate (i.e., categories 1, 2, or 3) will get a new value of “2”. Now make the following changes:
Old value 1 → New value 2, click Add
Old value 2 → New value 2, click Add
Old value 3 → New value 2, click Add
Old value 4 → New value 1, click Add
Old value 5 → New value 1, click Add
then Continue and OK.
Now scroll horizontally to the right of the Data View: the new variable “neweduc” has been created as a new column. In the Variable View, you will find “neweduc” at the bottom of the list. Assign the values 1=graduate and 2=non graduate to “neweduc” using the same procedure shown in Section 5.0.
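The Recode step amounts to a lookup table from old codes to new codes, written into a separate column. A Python sketch of the same mapping, assuming `None` represents a blank cell; the helper name and the sample responses are hypothetical.

```python
# Old "educ" code -> new "neweduc" code (1 = graduate, 2 = non-graduate),
# matching the Old -> New pairs entered in the Recode dialog
EDUC_TO_NEWEDUC = {1: 2, 2: 2, 3: 2, 4: 1, 5: 1}


def recode_into_different_variable(column, mapping):
    """Return a NEW list of recoded values, leaving the original column untouched.

    Blank cells (None) and any code missing from the mapping become None.
    """
    return [None if v is None else mapping.get(v) for v in column]


educ = [1, 4, 3, None, 5, 2]   # hypothetical responses
neweduc = recode_into_different_variable(educ, EDUC_TO_NEWEDUC)
print(neweduc)   # [2, 1, 2, None, 1, 2]
print(educ)      # unchanged: [1, 4, 3, None, 5, 2]
```

Building a new list rather than overwriting `educ` mirrors the advice to recode into a different variable: the original answers stay available.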

It is also possible to re-compute ratio or interval data into a ‘lesser’ scale. For example, ‘age in years’ (ratio) may be re-coded into predefined ‘age groups’ (ordinal). The reverse, however, does not hold: you cannot re-code a nominal or ordinal scale into a ‘higher’ scale, because the data lack the necessary information. For example, if we have collected ‘income categories’, we cannot re-code them into ‘income dollars’ because we do not know each respondent’s exact income. To perform the permitted transformation (for example, from ‘income dollars’ into ‘income categories’), try Transform>Compute in SPSS: name your target variable, say ‘income category’, assign a value (numeric expression), say “1”, and click on the “If” button to set the conditions mathematically, say ‘income dollars’ […]

Descriptive Statistics>Frequencies (suitable for nominal and ordinal data)
Analyze>Descriptive Statistics>Descriptives (suitable for interval and ratio data)

Example 1

What are the proportions of respondents in each age (ordinal) and gender (nominal) group?

Do: Click on the menu Analyze>Descriptive Statistics>Frequencies.
Individually select “age” and “sex” from the selection box and click on the arrow to bring them into the right-hand box.
Under “Statistics”, check Median, Mode, Range, Minimum, and Maximum (only the mode is meaningful for “sex”, which is nominal).
Under “Charts”, select the appropriate chart (suggest: pie charts).
Click on OK.

Explanation: The statistics show 10 missing values. This means 10 respondents did not indicate their age and sex. The “Valid Percent” column gives the proportion of respondents in each age or sex group, excluding the missing values. You may copy and paste these tables and charts into another application such as MS Word or MS Excel.
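The difference between the “Percent” and “Valid Percent” columns is simply the denominator: Percent divides by all cases, including missing ones, while Valid Percent divides by the non-missing cases only. A Python sketch that rebuilds both from a column of codes; the data are hypothetical and `frequency_table` is an illustrative helper, not SPSS output.

```python
from collections import Counter


def frequency_table(column, labels):
    """Rebuild Frequency, Percent and Valid Percent columns.

    Percent divides by ALL cases (blanks included);
    Valid Percent divides by the non-missing cases only.
    """
    total = len(column)
    valid = [v for v in column if v is not None]
    missing = total - len(valid)
    rows = []
    for code, count in sorted(Counter(valid).items()):
        rows.append((labels.get(code, code), count,
                     round(100 * count / total, 1),
                     round(100 * count / len(valid), 1)))
    rows.append(("Missing", missing, round(100 * missing / total, 1), None))
    return rows


sex = [1, 0, 0, None, 1, 0, 0, 1, None, 0]   # hypothetical codes: 1 = Male, 0 = Female
for row in frequency_table(sex, {1: "Male", 0: "Female"}):
    print(row)
# ('Female', 5, 50.0, 62.5)
# ('Male', 3, 30.0, 37.5)
# ('Missing', 2, 20.0, None)
```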

Example 2

What is the most important car attribute (interval) to luxury car buyers?

Do: Click on the menu Analyze>Descriptive Statistics>Descriptives.
Select all the car attributes (see Case Exhibit 5.2) from the selection box and click on the arrow to bring them into the right-hand box.
Click on OK.

Explanation: A table appears in the output showing the descriptive statistics for the various car attributes. To answer the question of which attribute is the most important, look at the means, i.e., the average importance ratings, of the attributes. Which attribute has the highest mean, and is therefore the most important?
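The Descriptives output reduces to a handful of summaries per column, and “most important” means the attribute with the highest mean. A Python sketch with made-up ratings; apart from “Comfort” (which appears in the questionnaire), the attribute names and all the numbers are hypothetical.

```python
import statistics

# Hypothetical importance ratings (1-7) for three car attributes; None = blank cell
attributes = {
    "Comfort": [6, 7, 5, 6, None],
    "Price": [4, 5, 3, None, 4],
    "Safety": [7, 7, 6, 6, 7],
}

summary = {}
for name, values in attributes.items():
    data = [v for v in values if v is not None]
    summary[name] = (len(data), statistics.mean(data),
                     statistics.stdev(data), min(data), max(data))

for name, (n, mean, sd, low, high) in summary.items():
    print(f"{name:8} N={n} mean={mean:.2f} sd={sd:.2f} min={low} max={high}")

# The attribute with the highest mean rating is the most important on average
most_important = max(summary, key=lambda name: summary[name][1])
print("Highest mean importance:", most_important)   # Safety
```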

Note that for your output to be meaningful, interval and ratio variables should be described using min./max., means and standard deviations, while nominal and ordinal variables should be described using frequencies and modes (plus medians, min./max., and range for ordinal data only). In NO case should you use means and standard deviations to describe nominal or ordinal data: a mean “sex” of 0.61 does not mean anything.

8.0 Inferential Statistics

The aim of inferential statistics is to make inferences or judgments about the population on the basis of the sample. Because the characteristics of a sample are never exactly the same as those of the population, sampling error occurs. SPSS outputs report this as “Sig.” (the p-value): the probability of obtaining a result at least as extreme as the one observed if there were no real difference or relationship in the population. In marketing (and many social sciences), the maximum acceptable level of error (the significance level) is 5%, i.e., sig.=0.05. If sig.>0.05, the result could plausibly be due to sampling error alone, so making inferences about the population based on the sample is unreliable. If the error is less than 5% (sig.<0.05) […] Crosstabs
Select “car” into the row and “educ” into the column (the other way round makes no difference).
Under “Statistics”, check “Chi-square”, click Continue.
Under “Cells”, check both “Observed” and “Expected” counts, click Continue.
Click OK.
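The “Expected” counts requested under “Cells” are what the chi-square test compares the observed counts with: each expected count is (row total × column total) / grand total. A minimal Python sketch of the Pearson chi-square statistic, using a hypothetical car type by graduate/non-graduate table (education collapsed to two groups for brevity; the counts are invented). For exactly 2 degrees of freedom the p-value has the closed form exp(-χ²/2), which the example uses; for other degrees of freedom you would need the full chi-square distribution, as SPSS or a statistics library provides.

```python
import math


def chi_square(observed):
    """Pearson chi-square test statistic for a table of observed counts.

    Expected count for each cell = row total * column total / grand total.
    Returns (statistic, degrees of freedom, table of expected counts).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Hypothetical counts: rows = car type, columns = graduate / non-graduate
observed = [
    [10, 20],   # American
    [20, 10],   # European
    [15, 15],   # Japanese
]
stat, df, expected = chi_square(observed)
print(f"chi-square = {stat:.3f}, df = {df}")   # chi-square = 6.667, df = 2
print("expected counts:", expected)            # every cell expects 15.0

# Closed-form p-value, valid ONLY when df = 2: P(X > x) = exp(-x / 2)
p_value = math.exp(-stat / 2)
print(f"sig. = {p_value:.3f}")                 # sig. = 0.036
```

Here sig. is below 0.05, so (for these invented counts) car type and education would be judged related.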

Explanation: the first thing to look for is the significance value. We need to verify that the level of error is less than 5% (i.e., Sig.<0.05) […] Correlate>Bivariate
Select all the “Issues” (“Issues 1 to 20”) and click on the arrow to bring them into the Variables box.
Click on OK.
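Each top number in the correlation matrix is Pearson’s r for one pair of items. A minimal Python sketch for a single pair, assuming missing answers are excluded pairwise (only respondents who answered both items count toward N); the ratings are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two columns, plus N.

    Respondents with a blank (None) on either item are skipped,
    so N counts only those who answered both items.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy), n


issue1 = [7, 6, 5, 3, None, 2]   # hypothetical 1-7 ratings
issue2 = [6, 6, 4, 2, 5, 1]
r, n = pearson_r(issue1, issue2)
print(f"r = {r:.3f}, N = {n}")   # r = 0.983, N = 5
```

An r close to 1 means respondents who rate one issue highly tend to rate the other highly too; close to -1 means the opposite, and close to 0 means little linear relationship.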

Explanation: a fairly large table will appear in the SPSS output. This is called the Correlation Matrix, and it shows the correlations between every possible pair of “issues”. Each cell contains three numbers: the top number is the Pearson Correlation Coefficient (r) mentioned above, the second is the significance value, and the third, N, is the number of respondents who answered both items. The first thing you should look at is the significance value, as we need to first establish whether Sig.<0.05 […] Independent Samples T-test
Select all the “Issues” (“Issues 1 to 20”) and click on the arrow to bring them into the Test Variables box.
Select “neweduc” (the new variable you created earlier) and click on the arrow to bring it into the Grouping Variable box.
Click on “Define Groups” to define the codes used for “grad” (group 1) and “nongrad” (group 2) as follows: Group 1 = 1, Group 2 = 2.
Click on Continue, then OK.
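The “equal variances assumed” version of the independent-samples t-test pools the two groups’ variances. A minimal Python sketch of that t statistic and its degrees of freedom, using hypothetical ratings for graduates and non-graduates; it stops short of the “Sig. (2-tailed)” value, which requires the t distribution (SPSS or a statistics library provides it), and it does not cover the “equal variances not assumed” (Welch) formula.

```python
import math
import statistics


def pooled_t(group1, group2):
    """Independent-samples t statistic with equal variances assumed.

    Blank cells (None) are dropped first. Returns (t, degrees of freedom).
    """
    a = [v for v in group1 if v is not None]
    b = [v for v in group2 if v is not None]
    n1, n2 = len(a), len(b)
    # Pool the two sample variances, weighting each by its degrees of freedom
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(a) - statistics.mean(b)) / std_error
    return t, n1 + n2 - 2


grad = [6, 7, 6, 5, 6]            # hypothetical ratings of one issue, neweduc = 1
nongrad = [4, 5, 3, 4, 4, None]   # neweduc = 2
t, df = pooled_t(grad, nongrad)
print(f"t = {t:.3f}, df = {df}")  # t = 4.472, df = 8
```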

Explanation: two large tables will appear in the SPSS output. The first table shows summary statistics for each group (graduates and non-graduates). The second table contains the statistics for the T-test, and we need to examine it first to determine the significance value for each of the “issues”. This second table is divided into two sections: the “Levene’s Test for Equality of Variances” (left side of the table) and the “T-test for Equality of Means” (right side). The value that interests us is the “Sig. (2-tailed)” under the “T-test for Equality of Means” section of the table. If there is a significant difference between the groups’ ratings of an “issue”, the “Sig. (2-tailed)” will have a value <0.05 […] Compare Means>One-way ANOVA
Select all the “VALUES” (see Case Exhibit 5.3) and click on the arrow to bring them into the “Dependent List” box.
Select “car” and click on the arrow to bring it into the “Factor” box.
Click on “Post Hoc”, check “Scheffe”, and then click on Continue.
Click on “Options”, check “Descriptive” and “Means Plot”, and then click on Continue, then OK.
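The F ratio in the ANOVA table compares the spread between group means with the spread within groups. A minimal Python sketch of the one-way F statistic, grouping hypothetical ratings of one value by car type (1 = American, 2 = European, 3 = Japanese, as in the questionnaire); the numbers are invented, and converting F to Sig. and running the Scheffe post hoc comparisons are left to SPSS.

```python
import statistics


def one_way_anova(*groups):
    """One-way ANOVA F statistic for two or more groups.

    Blank cells (None) are dropped first. Returns (F, df_between, df_within).
    """
    groups = [[v for v in g if v is not None] for g in groups]
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-groups: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups: how far each rating sits from its own group mean
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


american = [5, 6, 5, 4]   # hypothetical 1-7 ratings of one value
european = [6, 7, 7, 6]
japanese = [4, 5, 4, 3]
f, df_b, df_w = one_way_anova(american, european, japanese)
print(f"F = {f:.2f}, df = ({df_b}, {df_w})")   # F = 11.40, df = (2, 9)
```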

Explanation: the interpretation of the One-way ANOVA output is a three-step procedure. Firstly, we need to look at the “ANOVA” table (second table) to determine which variables have a Sig.<0.05 […] Linear.

End of Crash Course

Appendix: Values and the Automobile Market Case Questionnaire

The following questionnaire was used in the case for this SPSS crash course. The sections in this questionnaire correspond to the “List of Variables and Computer Codes” in Case Exhibit 5.4 on p.696 of your Zikmund text.
Section 1: Demographics.
Section 2: Issues that Consumers Consider when Buying Luxury Automobiles (from Case Exhibit 5.1).
Section 3: Car Attributes (from Case Exhibit 5.2).
Section 4: List of Values (from Case Exhibit 5.3).

Questionnaire

Luxury Automobile Survey

We are surveying people who have purchased a luxury automobile within the last year. Your help in completing this questionnaire is much appreciated. Please be assured that all your responses are confidential and will not be linked to you in any way.

|1 |To analyze the information we get from this survey, we need to be able to classify information. The information about yourself will not be used for identification, but used only for establishing broad categories. |

|A |What is your Age? (years). |
| |[2] |35 & Under |[4] |46 - 55 |[6] |65 + |
| |[3] |36 - 45 |[5] |56 - 65 | | |

|B |What is your Gender? |
| |[1] |Male |[0] |Female |

|C |What is your Highest Formal Qualification? |
| |[1] |Not Completed High School |[4] |College Graduate |
| |[2] |High School Graduate |[5] |Graduate Degree |
| |[3] |Attended some College | | |

|D |What is your Personal Annual Income Before Tax? (Dollars/Year Before Tax). |
| |[1] |Less than $35,000 |[3] |$50,001 – $65,000 |
| |[2] |$35,000 - $50,000 |[4] |$65,000 + |

|E |What Type of Luxury Car do you currently own? |
| |[1] |American Car |[2] |European Car |[3] |Japanese Car |

|2 |Please rate the extent to which you agree or disagree with the following statements |Strongly |Strongly |
| |regarding Buying a Luxury Automobile. |Agree |Disagree |
| |(Please circle one number for each statement). | | |
| |Having a luxury car is a major part of my fun and excitement. |1 |2 |
| |Comfort |1 |2 |
| |

[Figure: scatterplots of a very strong positive correlation (r close to 1), a very weak correlation (r close to 0) and a very strong negative correlation (r close to -1); a second diagram showing Factor 1, Factor 2 and Factor 3]
