
Regression

In: Business and Management

Submitted By marco100
Words 1464
Pages 6
Regression Analysis: Basic Concepts
Allin Cottrell

1 The simple linear model

Suppose we reckon that some variable of interest, y, is ‘driven by’ some other variable x. We then call y the dependent variable and x the independent variable. In addition, suppose that the relationship between y and x is basically linear, but is inexact: besides its determination by x, y has a random component, u, which we call the ‘disturbance’ or ‘error’.

Let i index the observations on the data pairs (x, y). The simple linear model formalizes the ideas just stated:

$$y_i = \beta_0 + \beta_1 x_i + u_i$$

The parameters $\beta_0$ and $\beta_1$ represent the y-intercept and the slope of the relationship, respectively.

In order to work with this model we need to make some assumptions about the behavior of the error term. For now we’ll assume three things:

$E(u_i) = 0$: u has a mean of zero for all i
$E(u_i^2) = \sigma_u^2$: it has the same variance for all i
$E(u_i u_j) = 0,\ i \neq j$: no correlation across observations
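To make the three error assumptions concrete, here is a minimal Python sketch (not part of the original text) that generates artificial data from $y_i = \beta_0 + \beta_1 x_i + u_i$.

A few choices here are assumptions made only for this illustration:

- The function name `simulate_simple_linear_model` and the parameter values are hypothetical.
- The $u_i$ are drawn independently from a normal distribution. This goes beyond what the text assumes, but it satisfies all three conditions: zero mean, a common variance $\sigma_u^2$, and no correlation across observations.

```python
import random


def simulate_simple_linear_model(beta0, beta1, xs, sigma_u, seed=0):
    """Draw y_i = beta0 + beta1 * x_i + u_i for each x_i in xs.

    Each error u_i is drawn independently from Normal(0, sigma_u**2), so the
    errors have mean zero, the same variance for every i, and no correlation
    across observations. Normality is an extra, illustrative assumption.
    Returns the list of y values and the list of errors actually drawn.
    """
    rng = random.Random(seed)  # seeded generator, so each call is reproducible
    us = [rng.gauss(0.0, sigma_u) for _ in xs]
    ys = [beta0 + beta1 * x + u for x, u in zip(xs, us)]
    return ys, us


if __name__ == "__main__":
    # Hypothetical "true" parameters for the illustration.
    xs = [float(i) for i in range(1, 11)]
    ys, us = simulate_simple_linear_model(beta0=2.0, beta1=0.5, xs=xs, sigma_u=1.0)
    for x, y in zip(xs, ys):
        print(f"x = {x:4.1f}   y = {y:7.3f}")
    print("sample mean of the errors:", sum(us) / len(us))
```

Because the true $\beta_0$ and $\beta_1$ are known when data are simulated this way, such data can be used to check how close an estimation method comes to the parameters it is trying to recover.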

We’ll see later how to check whether these assumptions are met, and also what resources we have for dealing with a situation where they’re not met.

We have just made a bunch of assumptions about what is ‘really going on’ between y and x, but we’d like to put numbers on the parameters $\beta_0$ and $\beta_1$. Well, suppose we’re able to gather a sample of data on x and y. The task of estimation is then to come up with coefficients: numbers that we can calculate from the data (call them $\hat\beta_0$ and $\hat\beta_1$) which serve as estimates of the unknown parameters. If we can do this somehow, the estimated equation will have the form

$$\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$$

We define the estimated error or residual associated with each pair of data values as the actual $y_i$ value minus the prediction based on $x_i$ along with the estimated coefficients:

$$\hat{u}_i = y_i - \hat{y}_i = y_i - (\hat\beta_0 + \hat\beta_1 x_i)$$

In a scatter diagram of y against x, this is the vertical distance between the observed value $y_i$ and the fitted value $\hat{y}_i$.
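The excerpt does not yet say how $\hat\beta_0$ and $\hat\beta_1$ are to be calculated. As an illustration only, the sketch below assumes ordinary least squares, which chooses the coefficients that minimize $\sum_i \hat{u}_i^2$. It then computes the fitted values $\hat{y}_i$ and residuals $\hat{u}_i$ as defined in the text. The helper names `ols_simple` and `fitted_and_residuals` are hypothetical.

```python
def ols_simple(xs, ys):
    """Return (beta0_hat, beta1_hat) minimizing the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x does not vary, so the slope cannot be estimated")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta1_hat = sxy / sxx
    beta0_hat = y_bar - beta1_hat * x_bar  # fitted line passes through (x_bar, y_bar)
    return beta0_hat, beta1_hat


def fitted_and_residuals(xs, ys, beta0_hat, beta1_hat):
    """Fitted values y_hat_i = b0 + b1 * x_i and residuals u_hat_i = y_i - y_hat_i."""
    y_hat = [beta0_hat + beta1_hat * x for x in xs]
    u_hat = [y - yh for y, yh in zip(ys, y_hat)]
    return y_hat, u_hat


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0]
    ys = [1.0, 3.0, 2.0]
    b0, b1 = ols_simple(xs, ys)
    y_hat, u_hat = fitted_and_residuals(xs, ys, b0, b1)
    print(b0, b1)  # 1.5 0.5
    print(u_hat)   # [-0.5, 1.0, -0.5]
```

When the model includes an intercept, least-squares residuals sum to zero (up to rounding), as they do in this example. That gives a quick sanity check on the arithmetic.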
