Premium Essay

Vol. XXXIV (March 1996), pp. 97-114

The Standard Error of Regressions

By Deirdre N. McCloskey and Stephen T. Ziliak

University of Iowa

Suggestions by two anonymous and patient referees greatly improved the paper. Our thanks also to seminars at Clark, Iowa State, Harvard, Houston, Indiana, and Kansas State universities, at Williams College, and at the universities of Virginia and Iowa. A colleague at Iowa, Calvin Siebert, was materially helpful.

THE IDEA OF statistical significance is old, as old as Cicero writing on forecasts (Cicero, De Divinatione, I. xiii. 23). In 1773 Laplace used it to test whether comets came from outside the solar system (Elizabeth Scott 1953, p. 20). The first use of the very word "significance" in a statistical context seems to be John Venn's, in 1888, speaking of differences expressed in units of probable error; ... significant for science or policy and yet be insignificant statistically, ignored by the less thoughtful researchers.

In the 1930s Jerzy Neyman and Egon S. Pearson, and then more explicitly Abraham Wald, argued that actual investigations should depend on substantive not merely statistical significance. In 1933 Neyman and Pearson wrote of type I and type II errors:

Is it more serious to convict an innocent man or to acquit a guilty? That will depend on the consequences of the error; is the punishment death or fine; what is the danger to the community of released criminals; what are the current ethical views on punishment? From the point of view of mathematical theory all that we can do is to show how the risk of errors may be controlled and minimised. The use of these statistical tools in any given case, in determining just how the balance should be struck, must be left to the investigator. (Neyman and Pearson 1933, p. 296; italics supplied)

Premium Essay

...Expected counts:

| Expected | American | Continental | Delta | United | Total |
| Yes | 52.5 | 57.75 | 68.25 | 31.5 | 210 |
| No | 47.5 | 52.25 | 61.75 | 28.5 | 190 |

Chi-square contributions, (O - E)^2 / E:

| CHI SQ | American | Continental | Delta | United | Row sum |
| YES | 0.385714286 | 2.19155844 | 0.000916 | 1.34127 | 3.919458 |
| NO | 0.426315789 | 2.4222488 | 0.001012 | 1.482456 | 4.332033 |

Chi-square statistic = 8.251491
df = (n-1)(m-1) = (2-1)(4-1) = 3
CHIDIST p-value = 0.04109026

Chapter 12 #5 SUMMARY OUTPUT

| Regression Statistics | |
| Multiple R | 0.877498417 |
| R Square | 0.770003471 |
| Adjusted R Square | 0.731670716 |
| Standard Error | 5.383266912 |
| Observations | 8 |...
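The chi-square computation above can be reproduced with a short script. A minimal sketch in Python, standard library only: the observed counts below are reconstructed so as to be consistent with the expected counts and contributions shown (they are an inference, not printed in the output), and the p-value uses the closed-form survival function of the chi-square distribution for df = 3 to avoid needing scipy.

```python
import math

# Observed counts (rows: Yes/No; columns: American, Continental, Delta, United).
# Reconstructed to match the expected counts and chi-square contributions above.
observed = [[48, 69, 68, 25],
            [52, 41, 62, 35]]

row_totals = [sum(row) for row in observed]        # [210, 190]
col_totals = [sum(col) for col in zip(*observed)]  # [100, 110, 130, 60]
grand = sum(row_totals)                            # 400

# Expected count for each cell: (row total * column total) / grand total
expected = [[r * c / grand for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over all cells
chi_sq = sum((o - e) ** 2 / e
             for o_row, e_row in zip(observed, expected)
             for o, e in zip(o_row, e_row))

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (2-1)(4-1) = 3

# Survival function of the chi-square distribution, valid for df = 3 only:
# P(X > x) = erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
p_value = (math.erfc(math.sqrt(chi_sq / 2))
           + math.sqrt(2 * chi_sq / math.pi) * math.exp(-chi_sq / 2))

print(round(chi_sq, 4), df, round(p_value, 4))   # 8.2515 3 0.0411
```

The statistic and p-value match the spreadsheet's 8.251491 and CHIDIST value of 0.04109 up to rounding.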

Words: 1598 - Pages: 7

Premium Essay

...In statistics, the Durbin–Watson statistic is a test statistic used to detect the presence of autocorrelation (a relationship between values separated from each other by a given time lag) in the residuals (prediction errors) from a regression analysis. It is named after James Durbin and Geoffrey Watson. The small-sample distribution of this ratio was derived by John von Neumann (von Neumann, 1941). Durbin and Watson (1950, 1951) applied this statistic to the residuals from least squares regressions and developed bounds tests for the null hypothesis that the errors are serially uncorrelated against the alternative that they follow a first-order autoregressive process. Later, John Denis Sargan and Alok Bhargava developed several von Neumann–Durbin–Watson type test statistics for the null hypothesis that the errors on a regression model follow a process with a unit root against the alternative hypothesis that the errors follow a stationary first-order autoregression (Sargan and Bhargava, 1983). Note that the distribution of this test statistic does not depend on the estimated regression coefficients and the variance of the errors.[1] Computing and interpreting the...
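The statistic itself is simple to compute from a residual series: the sum of squared successive differences of the residuals divided by the sum of squared residuals. A minimal sketch (the residual values are illustrative, not from any particular regression):

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by the sum of squared residuals.
    Values near 2 suggest no first-order autocorrelation; values near 0
    suggest positive, and values near 4 negative, autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Residuals drifting slowly (all the same sign): d near 0, positive autocorrelation
print(round(durbin_watson([1.0, 1.1, 0.9, 1.0, 1.2]), 3))   # 0.018

# Residuals alternating in sign: d well above 2, negative autocorrelation
print(round(durbin_watson([1.0, -1.0, 1.0, -1.0]), 3))      # 3.0
```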

Words: 1060 - Pages: 5

Premium Essay

...Statistical Project Assignment | Statistics for Business & Economics

DATASET 1: SIMPLE REGRESSION ANALYSIS

Variable definition: Xi = weight of car (pounds); Yi = price of car ($)

1. (a) Regression model using X to predict Y

Weight and Price of Car Sales

| Regression Statistics | |
| Multiple R | 0.212585295 |
| R Square | 0.045192508 |
| Adjusted R Square | 0.038951936 |
| Standard Error | 7883.368653 |
| Observations | 155 |

ANOVA
| | df | SS | MS | F | Significance F |
| Regression | 1 | 450055137.6 | 450055137.6 | 7.241725381 | 0.007915154 |
| Residual | 153 | 9508567701 | 62147501.31 | | |
| Total | 154 | 9958622839 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
| Intercept | 9854.041192 | 2894.819474 | 3.404026151 | 0.000847889 | 4135.063875 | 15573.01851 |
| Weight | 2.843766419 | 1.056751555 | 2.691045407 | 0.007915154 | 0.756058281 | 4.931474557 |

Table 1 – Simple Linear Regression Model (Y and X)

Simple linear regression equation: Ŷi = b0 + b1·Xi. From Table 1, b0 = 9854.0412 and b1 = 2.8438, so Ŷi = 9854.0412 + 2.8438·Xi.

Figure 1 – Scatter Plot – Weight of Car vs Price of Car

(b) Interpret the slope: b1 measures the estimated change in the average value of Y as a result of...
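The fitted equation Ŷi = 9854.0412 + 2.8438·Xi can be used for point prediction. A hedged illustration (the 3,000-pound weight is a made-up value, not a car from the data set):

```python
# Fitted coefficients from Table 1
b0 = 9854.0412   # intercept: estimated price ($) when weight is zero
b1 = 2.8438      # slope: estimated price increase ($) per additional pound

def predicted_price(weight_lb):
    """Point prediction from the fitted line Y-hat = b0 + b1 * X."""
    return b0 + b1 * weight_lb

# Hypothetical 3,000 lb car (illustrative, not from the data)
print(round(predicted_price(3000), 2))   # 18385.44
```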

Words: 3699 - Pages: 15

Premium Essay

...the 0.05 significance level. To study the effect of the independent variable CONVERT on the selling price, we set the following null and alternative hypotheses. The null hypothesis is H0: the mean price does not depend on whether the car is a convertible. The alternative hypothesis is H1: the mean price depends on whether the car is a convertible. Using the independent variable CONVERT and the dependent variable selling price, the following regression output is generated by Excel Data Analysis.

SUMMARY OUTPUT

| Regression Statistics | |
| Multiple R | 0.386986 |
| R Square | 0.149758 |
| Adjusted R Square | 0.123993 |
| Standard Error | 3541.151 |
| Observations | 35 |

ANOVA
| | df | SS | MS | F | Significance F |
| Regression | 1 | 72887081 | 72887081 | 5.812481 | 0.021642 |
| Residual | 33 | 4.14E+08 | 12539754 | | |
| Total | 34 | 4.87E+08 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
| Intercept | 7281.2 |...
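A compact way to see what the CONVERT coefficient means: with a 0/1 dummy regressor, the OLS intercept is the mean price of the 0 group (non-convertibles) and the slope is the difference in group means. A sketch with made-up prices, not the case data:

```python
# Hypothetical selling prices ($), purely illustrative
convertibles     = [11000.0, 12000.0]
non_convertibles = [7000.0, 7500.0, 7400.0]

mean_conv = sum(convertibles) / len(convertibles)
mean_non  = sum(non_convertibles) / len(non_convertibles)

# With a 0/1 dummy regressor (CONVERT = 1 for convertibles), OLS gives:
#   intercept = mean of the 0 group, slope = difference in group means
b0 = mean_non
b1 = mean_conv - mean_non
print(b0, b1)   # 7300.0 4200.0
```

Testing H0 that the mean price does not depend on CONVERT is then exactly the t test on the slope coefficient reported in the output.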

Words: 3107 - Pages: 13

Premium Essay

...Introduction

Regression analysis was developed by Francis Galton in 1886 to study the weights of mother and daughter sweet peas. Regression analysis is a parametric test used for inference from a sample to a population. The goal of regression analysis is to investigate how effective one or more variables are in predicting the value of a dependent variable. In the following, we conduct three simple regression analyses.

Benefits and Intrinsic Job Satisfaction: regression output from Excel

SUMMARY OUTPUT

| Regression Statistics | |
| Multiple R | 0.616038 |
| R Square | 0.379503 |
| Adjusted R Square | 0.371338 |
| Standard Error | 0.773609 |
| Observations | 78 |

ANOVA
| | df | SS | MS | F | Significance F |
| Regression | 1 | 27.81836 | 27.81836 | 46.48237 | 1.93E-09 |
| Residual | 76 | 45.48382 | 0.598471 | | |
| Total | 77 | 73.30218 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
| Intercept | 2.897327 | 0.310671 | 9.326021 | 3.18E-14 | 2.278571 | 3.516082 |
| X Variable 1 | 0.42507 | 0.062347 | 6.817798 | 1.93E-09 | 0.300895 | 0.549245 |

Graph

Benefits and Extrinsic Job Satisfaction: regression output from Excel

SUMMARY OUTPUT

| Regression Statistics | |
| Multiple R | 0.516369 |
| R Square | 0.266637 |
| Adjusted R Square | 0.256987 |
| Standard Error | 0.35314 |
| Observations | 78 |

ANOVA ...
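The R Square in the output can be checked directly from the ANOVA table, since R² is the ratio of the explained (regression) sum of squares to the total sum of squares:

```python
# ANOVA decomposition from the Excel output above
ss_regression = 27.81836   # explained sum of squares
ss_residual   = 45.48382   # unexplained sum of squares
ss_total      = 73.30218   # total sum of squares

# Sanity check: SS_total = SS_regression + SS_residual
assert abs(ss_regression + ss_residual - ss_total) < 1e-6

r_squared = ss_regression / ss_total
print(round(r_squared, 4))   # 0.3795
```

This agrees with the reported R Square of 0.379503 up to rounding in the printed sums of squares.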

Words: 684 - Pages: 3

Premium Essay

...HOUSE PRICES II, CASE 28
Olusegun Abebayo, Taksamai Tanapaisankit, Staceyann Barton
GM533 Applied Managerial Statistics

Abstract

Pricing your home competitively is an important factor in achieving a good selling price. As a seller, the aim is to get the best asking price, and to avoid losing money one has to be careful not to underprice the home. As mentioned in the article Selling Your Home – The Importance of Pricing Correctly, the most important factor when selling your home is not what your home is listed for, but rather what similar homes have recently sold for. This is the statistic that will properly tell you what buyers are willing to pay for a similar home in a comparable neighborhood. In the article entitled Pricing Houses to Sell, Elizabeth Weintraub provided a few guidelines that can be effective in pricing one's home. She suggested that a seller look at every similar home that was or is listed in the same neighborhood over the past six months, comparing similar square footage, within 10% up or down from the subject property, if possible: compare apples to apples. The objective of this study is to use the data given in Case 28 – House Prices II to determine the selling price for a house in Eastville, Oregon, and to describe how the findings might be used as a general method for estimating the selling price of any house in my neighborhood. In doing so, we had to figure out what factors determine the selling...

Words: 6813 - Pages: 28

Premium Essay

...Unit 5 Regression Analysis
American Intercontinental University

Regression Analysis

Independent Variable: Benefits; Dependent Variable: Intrinsic

| Regression Statistics | |
| Multiple R | 0.252916544 |
| R Square | 0.063966778 |
| Adjusted R Square | 0.045966139 |
| Standard Error | 0.390066747 |
| Observations | 54 |

ANOVA
| | df | SS | MS | F | Significance F |
| Regression | 1 | 0.540685116 | 0.540685116 | 3.553583771 | 0.065010363 |
| Residual | 52 | 7.911907477 | 0.152152067 | | |
| Total | 53 | 8.452592593 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
| Intercept | 4.88865703 | 0.188506099 | 25.93368096 | 2.04938E-31 | 4.510391881 | 5.266922187 |
| Benefits | 0.06958624 | 0.036913916 | 1.885095162 | 0.065010363 | -0.004486945 | 0.143659433 |

Independent Variable: Benefits; Dependent Variable: Extrinsic

| Regression Statistics | |
| Multiple R | 0.332749251 |
| R Square | 0.110722064 |
| Adjusted R Square | 0.093620565 |
| Standard Error | 0.405766266 |
| Observations | 54 |

ANOVA
| | df | SS | MS | F | Significance F |
| Regression | 1 | 1.065986925 | 1.065987 | 6.474407048 | 0.013952455 |
| Residual | 52 | 8.561605668 | 0.164646 | | |
| Total | 53 | 9.627592593 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95...
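The Adjusted R Square values in both outputs follow from R Square, the number of observations, and the number of predictors via the standard penalty formula; a quick check:

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared penalizes added predictors:
    1 - (1 - R^2) * (n - 1) / (n - k - 1),
    with n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Values from the two regressions above (n = 54, one predictor each)
print(round(adjusted_r_squared(0.063966778, n=54, k=1), 6))   # 0.045966
print(round(adjusted_r_squared(0.110722064, n=54, k=1), 6))   # 0.093621
```

Both match the Adjusted R Square figures Excel reports.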

Words: 463 - Pages: 2

Premium Essay

...MULTIPLE REGRESSION

After completing this chapter, you should be able to:
- understand model building using multiple regression analysis
- apply multiple regression analysis to business decision-making situations
- analyze and interpret the computer output for a multiple regression model
- test the significance of the independent variables in a multiple regression model
- use variable transformations to model nonlinear relationships
- recognize potential problems in multiple regression analysis and take steps to correct them
- incorporate qualitative variables into the regression model by using dummy variables

Multiple Regression Assumptions
- The errors are normally distributed
- The mean of the errors is zero
- The errors have a constant variance
- The model errors are independent

Model Specification
- Decide what you want to do and select the dependent variable
- Determine the potential independent variables for your model
- Gather sample data (observations) for all variables

The Correlation Matrix
Correlation between the dependent variable and selected independent variables can be found using Excel: Tools / Data Analysis… / Correlation. The statistical significance of a correlation can be checked with a t test.

Example: A distributor of frozen dessert pies wants to evaluate factors thought to influence demand. Dependent variable: pie sales (units per week) ...
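The t test for a correlation coefficient mentioned above uses t = r·√(n − 2) / √(1 − r²) with n − 2 degrees of freedom. A small sketch (the r and n values are a hypothetical example, not from the pie-sales data):

```python
import math

def correlation_t_stat(r, n):
    """t statistic for testing H0: population correlation = 0,
    with n - 2 degrees of freedom: t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical example: sample correlation r = 0.6 from n = 27 paired observations
t = correlation_t_stat(0.6, 27)
print(round(t, 2))   # 3.75
```

A t of 3.75 on 25 degrees of freedom would reject H0 at the usual 0.05 level, so this correlation would be judged statistically significant.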

Words: 1561 - Pages: 7

Premium Essay

...(final data rows: "… 2 4,273", "30 2 3,067", "22 4 3,074", "46 5 4,820", "66 4 5,149")

| | Income ($1000s) | Household Size | Amount Charged ($) |
| Sum | 2,112 | 180 | 197,717 |
| N | 50 | 50 | 50 |
| Mean | 42.24 | 3.6 | 3,954 |
| Mode | 22 | 2 | 2,921 |
| Median | 40 | 3 | 4,009 |
| Max | 67 | 7 | 5,678 |
| Min | 21 | 1 | 2,448 |
| Range | 46 | 6 | 3,230 |
| St. Deviation | 15.29 | 1.81 | 920.90 |
| Skewness | 0.18 | 0.44 | 0.09 |

While the most common income is $22,000, the average income of the sample population is $42,240. The household data are skewed by the fact that two-person households make up the majority. The income data points are spread out over a wide range of values, as indicated by the high standard deviation of $15,290. Household incomes range from $21,000 to $67,000, while household sizes range from 1 to 7. The amount-charged data appear to be normally distributed.

Simple linear regression, Amount Charged vs. Annual Income: Ŷi = 2388.83 + 37.06·Xi

Where: Ŷi = estimated, or predicted, Y value for amount charged in $; Xi = value of the independent variable,...
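One useful property for checking a fitted line: a least-squares regression line always passes through the point of means (X̄, Ȳ). So plugging the mean income from the summary statistics into the fitted equation should return approximately the mean amount charged (the column labels are as interpreted from the prose; income is in $1000s):

```python
b0, b1 = 2388.83, 37.06   # fitted intercept and slope from the equation above
mean_income = 42.24       # mean annual income, in $1000s
mean_charged = 3954       # mean amount charged, $

# A least-squares line passes through (mean x, mean y)
predicted_at_mean = b0 + b1 * mean_income
print(round(predicted_at_mean, 2))   # 3954.24 -- the sample mean, up to rounding
```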

Words: 1644 - Pages: 7

Premium Essay

...Methods and Analysis. Instructor: Leonidas Murembya. April 23, 2013.

Abstract

This paper discusses regression analysis using AIU's survey responses from the AIU data set, in order to complete a regression analysis for benefits & intrinsic, benefits & extrinsic, and benefits & overall job satisfaction, and gives an overview of these regressions along with what they would mean to a manager (AIU Online).

Introduction

Regression analysis can help us predict how the needs of a company are changing and where the greatest need will be. That allows companies to hire the employees they need before they are needed, so they are not left in the lurch. Our regression analysis compares two factors only, an independent variable and a dependent variable (Murembya, 2013).

Benefits and Intrinsic Job Satisfaction: regression output from Excel

SUMMARY OUTPUT

| Regression Statistics | |
| Multiple R | 0.018314784 |
| R Square | 0.000335431 |
| Adjusted R Square | -0.009865228 |
| Standard Error | 1.197079687 |
| Observations | 100 |

(R Square is the portion of the relationship explained by the line: here only about 0.03% of the relationship is linear.)

ANOVA
| | df | SS | MS | F | Significance F |
| Regression | 1 | 0.04712176 | 0.047122 | 0.032883 | 0.856477174 |
| Residual | 98 | 140.4339782 | 1.433 | | |
| Total | 99 | 140.4811 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
| Intercept | 4.731133588 | 1.580971255 | 2.992549 | 0.003501 | 1.593747586 | 7.86852 |

Intrinsic...

Words: 830 - Pages: 4

Premium Essay

...Simple linear regression results:
Dependent Variable: Size
Independent Variable: Income ($1000)
Size = 2.3893113 + 0.023563983 Income ($1000)
Sample size: 50
R (correlation coefficient) = 0.1984
R-sq = 0.039351758
Estimate of error standard deviation: 1.7220922

Parameter estimates:
| Parameter | Estimate | Std. Err. | Alternative | DF | T-Stat | P-Value |
| Intercept | 2.3893113 | 0.77432936 | ≠ 50 | 48 | -61.486355 | <0.0001 |
| Slope | 0.023563983 | 0.016804602 | > 50 | 48 | -2973.9731 | 1 |

Analysis of variance table for regression model:
| Source | DF | SS | MS | F-stat | P-value |
| Model | 1 | 5.8311434 | 5.8311434 | 1.9662601 | 0.1673 |
| Error | 48 | 142.34886 | 2.9656012 | | |
| Total | 49 | 148.18 | | | |

Simple linear regression results for Location=Urban:
Dependent Variable: Income ($1000)
Independent Variable: Size
Income ($1000) = 39.63889 - 3.3611112 Size
Sample size: 13
R (correlation coefficient) = -0.3589
R-sq = 0.12878229
Estimate of error standard deviation: 7.597348

Parameter estimates:
| Parameter | Estimate | Std. Err. | Alternative | DF | T-Stat | P-Value |
| Intercept | 39.63889 | 5.117386 | ≠ .40 | 11 | 7.66776 | <0.0001 |
| Slope | -3.3611112 | 2.6358569 | < .40 | 11 | -1.4269027 | 0.0907 |

Analysis of variance table for regression model:
| Source | DF | SS | MS | F-stat | P-value |
| Model | 1 | 93.85256 | 93.85256 | 1.6260059 | 0.2285 |
| Error | 11 | 634.9167 | 57.719696 | | |
| Total...

Words: 1608 - Pages: 7

Premium Essay

...Chapter 3: Answers to Questions and Problems

1. a. When P = $12, R = ($12)(1) = $12. When P = $10, R = ($10)(2) = $20. Thus, the price decrease results in an $8 increase in total revenue, so demand is elastic over this range of prices.

b. When P = $4, R = ($4)(5) = $20. When P = $2, R = ($2)(6) = $12. Thus, the price decrease results in an $8 decrease in total revenue, so demand is inelastic over this range of prices.

c. Recall that total revenue is maximized at the point where demand is unitary elastic. We also know that marginal revenue is zero at this point. For a linear demand curve, marginal revenue lies halfway between the demand curve and the vertical axis. In this case, marginal revenue is a line starting at a price of $14 and intersecting the quantity axis at a value of Q = 3.5. Thus, marginal revenue is 0 at 3.5 units, which corresponds to a price of $7, as shown in Figure 3-1 (the demand and MR curves; the figure itself is omitted here).

Managerial Economics and Business Strategy, 5e, Page 1

2. a. At the given prices, quantity demanded is 700 units: Qx^d = 1000 - 2(154) + 0.02(400) = 700. Substituting the relevant information into the elasticity formula gives E = (dQx/dPx)(Px/Qx) = -2(154/700) = -0.44. Since this is less than one in absolute value, demand is inelastic at this price. If the firm charged a lower price, total revenue would decrease.

b. At the given prices, quantity demanded is 300 units: Qx^d = 1000 - 2(354) + 0.02(400) = 300. Substituting the relevant information...
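The elasticity arithmetic in part 2a can be checked in a few lines, using the demand slope implied by the computation above (dQ/dP = -2):

```python
def point_elasticity(dq_dp, price, quantity):
    """Own-price elasticity of demand at a point: (dQ/dP) * (P / Q)."""
    return dq_dp * price / quantity

# Part 2a: demand slope dQ/dP = -2; at Px = 154, quantity demanded is 700
e = point_elasticity(-2, 154, 700)
print(round(e, 2))   # -0.44 : |e| < 1, so demand is inelastic at this price
```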

Words: 4130 - Pages: 17

Premium Essay

...s 11-17-09. OIS 3440 Capstone Project

Executive Summary

My goal was to gather financial information and establish relationships between eight different variables. I incorporated the following variables into a small survey: age, income, investment, number of children, number of years of college, additional investment for the current year, and home value. The reason for studying this data is that I am a finance major and will be working toward becoming a financial advisor. I created a survey that answered the questions for the criteria listed above. I asked friends and family through email and Facebook, and told them the reason for my asking. Everyone was willing to participate, as I kept their personal information confidential. I collected and analyzed 49 completed surveys. After analyzing the data I noticed that there were many relationships among the variables. Based on the research I concluded that the higher the age, the more respondents are willing to invest for the current year. There was also a strong relationship between age and income: the older the person, the more money they made. The dependent variable was age. Older people make more money, are more familiar with investments, and are more willing to make larger investments currently. Also, an increase in age was related to an increase in home value.

Analysis

| Investments | Income | Education | Kids | Home Value | Additional Investments | Age | ...

Words: 1705 - Pages: 7

Premium Essay

...Linear regression deals with numerical measures that express the relationship between two variables. Relationships between variables can be strong or weak, and direct or inverse. A few examples: the amount McDonald's spends on advertising per month and its total sales in a month; or the amount of study time one puts toward statistics and the grade one receives. The formal definition of regression analysis is the equation that allows one to estimate the value of one variable based on the value of another. A key objective in performing a regression analysis is to estimate the dependent variable Y based on a selected value of the independent variable X. For example, Nike could measure how much it spends on celebrity endorsements and the effect that spending has on sales in a month. Here, the amount spent on celebrity endorsements would be the independent X variable; without the X variable, Y would be impossible to estimate. The general form of the regression equation is Ŷ = a + bX, where Ŷ is the estimated value of the Y variable for a selected X value. The term a represents the Y-intercept: the estimated value of Y when X = 0. The term b is the slope of the line, or the average change in Ŷ for each change of one unit in the independent variable X. Finally, X is any value of the independent variable that is selected. Regression...
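The coefficients a and b in Ŷ = a + bX come from the least-squares formulas b = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² and a = Ȳ − bX̄. A minimal sketch on toy data that lie exactly on a known line, so the fit recovers the true intercept and slope:

```python
def fit_line(x, y):
    """Least-squares estimates for Y-hat = a + b*X:
    b = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2),
    a = mean_y - b * mean_x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b

# Toy data lying exactly on Y = 3 + 2X, so the fit returns a = 3, b = 2
a, b = fit_line([1, 2, 3, 4], [5, 7, 9, 11])
print(a, b)   # 3.0 2.0
```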

Words: 1324 - Pages: 6

Premium Essay

...The Inaugural Coase Lecture

An Introduction to Regression Analysis

Alan O. Sykes*

Regression analysis is a statistical tool for the investigation of relationships between variables. Usually, the investigator seeks to ascertain the causal effect of one variable upon another—the effect of a price increase upon demand, for example, or the effect of changes in the money supply upon the inflation rate. To explore such issues, the investigator assembles data on the underlying variables of interest and employs regression to estimate the quantitative effect of the causal variables upon the variable that they influence. The investigator also typically assesses the "statistical significance" of the estimated relationships, that is, the degree of confidence that the true relationship is close to the estimated relationship. Regression techniques have long been central to the field of economic statistics ("econometrics"). Increasingly, they have become important to lawyers and legal policy makers as well. Regression has been offered as evidence of liability under Title VII of the Civil Rights Act of 1964, as evidence of racial bias in death penalty litigation, as evidence of damages in contract actions, as evidence of violations under the Voting Rights Act, and as evidence of damages in antitrust litigation, among other things. In this lecture, I will provide an overview of the most basic techniques of regression analysis—how they work, what they assume,

* Professor of Law, University of Chicago...

Words: 11643 - Pages: 47