Statistics 309

Submitted By acumber
Words 715
Pages 3
1. Review of random variables and summary statistics
Definitions:
• Sample space: A collection of all possible outcomes of a random experiment. Usually denoted by Ω.
• Probability distribution: An assignment of a number between 0 and 1 to each possible outcome. These numbers must sum to 1. Usually denoted by P.
• Random variable: A function defined on a sample space. Usually denoted by X.
• Expected value of X: $\mu_x = E[X] = \sum_{\omega \in \Omega} P(\omega) X(\omega)$ (if X is a continuous random variable, the summation is replaced by an integral).
• Variance of X: $\mathrm{Var}(X) = \sigma_x^2 = E[(X - \mu_x)^2]$
• Standard deviation of X: $\sigma_x = \sqrt{E[(X - \mu_x)^2]}$
• Covariance of X and Y: $\mathrm{Cov}(X, Y) = E[(X - \mu_x)(Y - \mu_y)]$
• Correlation of X and Y: $\rho_{xy} = \frac{\mathrm{Cov}(X, Y)}{\sigma_x \sigma_y}$
Consider a sample $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$. Then,
• Sample mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

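• Sample variance: $s_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
• Sample standard deviation: $s_x = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$
• Sample covariance: $s_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
• Sample correlation: $r_{xy} = \frac{s_{xy}}{s_x s_y}$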
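
The sample statistics listed above can be computed directly from their definitions. The following is a minimal sketch in plain Python; the function names and the five data points are illustrative assumptions, not part of the original notes.

```python
import math

def sample_mean(values):
    # x-bar = (1/n) * sum of x_i
    return sum(values) / len(values)

def sample_covariance(xs, ys):
    # s_xy = (1/(n-1)) * sum of (x_i - x-bar)(y_i - y-bar)
    n = len(xs)
    x_bar, y_bar = sample_mean(xs), sample_mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def sample_variance(xs):
    # s_x^2 is the sample covariance of x with itself
    return sample_covariance(xs, xs)

def sample_std(xs):
    # s_x = square root of the sample variance
    return math.sqrt(sample_variance(xs))

def sample_correlation(xs, ys):
    # r_xy = s_xy / (s_x * s_y)
    return sample_covariance(xs, ys) / (sample_std(xs) * sample_std(ys))

# Hypothetical data chosen only to illustrate the formulas.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

print("mean of x:", sample_mean(xs))
print("variance of x:", sample_variance(xs))
print("covariance:", sample_covariance(xs, ys))
print("correlation:", sample_correlation(xs, ys))
```

Writing the variance as the covariance of a variable with itself keeps the n − 1 divisor in one place, so the variance and covariance cannot drift apart.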

Interpretation of correlation rxy
• $-1 \le r_{xy} \le 1$ always.
• If $r_{xy} > 0$, then large values of X tend to be associated with large values of Y, and small values of X tend to be associated with small values of Y. In this case, X and Y are positively linearly related.
• If $r_{xy} < 0$, then large values of X tend to be associated with small values of Y, and small values of X tend to be associated with large values of Y. In this case, X and Y are negatively linearly related.
• A larger $|r_{xy}|$ implies a stronger linear relationship between X and Y.
• If $r_{xy} = 0$, then X and Y are not linearly related (they may still have a non-linear relationship).
• If rxy = 1, then all of (xi , yi ) are on a straight line with positive
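
Two of the points above can be checked numerically: a correlation of zero does not rule out a non-linear relationship, and points lying exactly on an increasing straight line give a correlation of one. The sketch below uses made-up data (a line y = 2x + 1 and a parabola y = x²) and a hypothetical helper function; it is an illustration under those assumptions rather than part of the original notes.

```python
import math

def sample_correlation(xs, ys):
    # r_xy = s_xy / (s_x * s_y); the 1/(n-1) factors cancel, so sums suffice.
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical x values, symmetric around zero.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Exactly linear and increasing: y = 2x + 1.
line = [2 * x + 1 for x in xs]

# Exactly determined by x, but not linearly: y = x^2.
parabola = [x * x for x in xs]

r_line = sample_correlation(xs, line)
r_parabola = sample_correlation(xs, parabola)

print("r for the line:", r_line)
print("r for the parabola:", r_parabola)
```

In the parabola case Y is completely determined by X, yet the positive and negative deviations cancel in the covariance sum, which is why the correlation only measures linear association.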
