Heteroscedasticity

Submitted By samiam7
Words 402
Pages 2
In statistics, a collection of random variables is heteroscedastic (often spelled heteroskedastic,[1] and commonly pronounced with a hard k regardless of spelling) if there are sub-populations that have different variabilities from others. Here "variability" could be quantified by the variance or any other measure of statistical dispersion. Thus heteroscedasticity is the absence of homoscedasticity.

The possible existence of heteroscedasticity is a major concern in the application of regression analysis, including the analysis of variance, because the presence of heteroscedasticity can invalidate statistical tests of significance that assume that the modelling errors are uncorrelated and normally distributed and that their variances do not vary with the effects being modelled. Similarly, in testing for differences between sub-populations using a location test, some standard tests assume that variances within groups are equal.
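To illustrate the equal-variance assumption in location tests, the following minimal sketch compares the pooled two-sample t statistic, which assumes equal group variances, with Welch's statistic, which does not. The function names and the sample data are hypothetical and chosen only for illustration; NumPy is assumed to be available.

```python
import numpy as np

def pooled_t(a, b):
    """Two-sample t statistic that assumes equal variances in both groups."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = a.size, b.size
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / na + 1 / nb))

def welch_t(a, b):
    """Two-sample t statistic that lets each group keep its own variance."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    se2 = a.var(ddof=1) / a.size + b.var(ddof=1) / b.size
    return (a.mean() - b.mean()) / np.sqrt(se2)

# Hypothetical data: a small, widely spread group and a larger, tight group.
small_noisy = [-4.0, -1.0, 2.0, 5.0, 8.0]
large_tight = [0.9, 1.0, 1.1, 0.9, 1.0, 1.1, 0.9, 1.0, 1.1, 0.9, 1.0, 1.1]

print("pooled t:", pooled_t(small_noisy, large_tight))
print("Welch  t:", welch_t(small_noisy, large_tight))
```

In this made-up sample the smaller group is the more variable one, so the pooled standard error is too small and the pooled statistic comes out larger than Welch's. With equal group sizes the two statistics coincide.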

Tests for the possible presence of heteroscedasticity are outlined below.

The term means "differing variance" and comes from the Greek "hetero" ('different') and "skedasis" ('dispersion').

Suppose there is a sequence of random variables $\{Y_t\}_{t=1}^{n}$ and a sequence of vectors of random variables $\{X_t\}_{t=1}^{n}$. In dealing with conditional expectations of $Y_t$ given $X_t$, the sequence $\{Y_t\}_{t=1}^{n}$ is said to be heteroskedastic if the conditional variance of $Y_t$ given $X_t$ changes with $t$. Some authors refer to this as conditional heteroscedasticity to emphasize that it is the sequence of conditional variances that changes, not the unconditional variance. In fact, it is possible to observe conditional heteroscedasticity even when dealing with a sequence of unconditionally homoscedastic random variables; however, the opposite does not hold. If the variance changes only because of changes in the value of $X$ and not because of a dependence on the index $t$, the changing variance might be described using a scedastic function.
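The definition can be made concrete with a small simulation. The sketch below assumes a hypothetical scedastic function in which the conditional standard deviation of Y given X equals 0.5·X; the model, the seed, and the cut-offs used to split the sample are illustrative choices, not part of the text above.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

n = 10_000
x = rng.uniform(1.0, 10.0, size=n)
# Hypothetical scedastic function: conditional standard deviation 0.5 * x.
sigma = 0.5 * x
y = 2.0 * x + rng.normal(0.0, 1.0, size=n) * sigma

# Deviation of y from its conditional mean E[Y | X] = 2x (known here by construction).
dev = y - 2.0 * x

# Compare the spread of the deviations for small and large values of x.
low, high = x < 4.0, x > 7.0
var_low, var_high = dev[low].var(ddof=1), dev[high].var(ddof=1)
print(f"variance for x < 4: {var_low:.2f}")
print(f"variance for x > 7: {var_high:.2f}")
```

Under these assumptions the deviations are far more dispersed for large x than for small x, which is what a conditional variance that changes with X looks like in data.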

When using some statistical techniques, such as ordinary least squares (OLS), a number of assumptions are typically made. One of these is that the error term has a constant variance. This might not be true even if the error term is assumed to be drawn from identical distributions.
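One way to probe the constant-variance assumption after an OLS fit is to regress the squared residuals on the regressors and look at n·R² from that auxiliary regression. The sketch below uses this Breusch-Pagan-style statistic, a technique not named in the text above and swapped in here purely as an illustration; the helper names and the simulated data are hypothetical.

```python
import numpy as np

def ols_fit(X, y):
    """Return OLS coefficients and residuals for a design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def lm_statistic(x, y):
    """n * R^2 from regressing squared OLS residuals on x (with intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    _, resid = ols_fit(X, y)
    u2 = resid ** 2
    _, aux_resid = ols_fit(X, u2)
    # R^2 of the auxiliary regression; both variances use the same divisor.
    r2 = 1.0 - aux_resid.var() / u2.var()
    return x.size * r2

rng = np.random.default_rng(1)
n = 2_000
x = rng.uniform(1.0, 10.0, size=n)
y_const = 1.0 + 2.0 * x + rng.normal(0.0, 3.0, size=n)            # constant error variance
y_grow = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, size=n) * 0.5 * x   # spread grows with x

lm_const = lm_statistic(x, y_const)
lm_grow = lm_statistic(x, y_grow)
print("LM statistic, constant variance:", lm_const)
print("LM statistic, growing variance: ", lm_grow)
```

With a single regressor, the statistic is commonly compared with a chi-squared distribution with one degree of freedom, whose 5% critical value is about 3.84. A large value for the second data set suggests that the constant-variance assumption fails there.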

For example, the variance of the error term could vary or increase with each observation, which is often the case with cross-sectional or time series measurements. Heteroscedasticity is often studied as part of econometrics, which frequently deals with data exhibiting it. White's influential paper[2] used "heteroskedasticity" instead of "heteroscedasticity", whereas the latter has been used in later works.[3]
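Assuming the paper cited here is White's 1980 article on heteroskedasticity-consistent covariance estimation, the idea can be sketched as follows: the classical OLS covariance uses one common error variance, while the "sandwich" form (often labelled HC0) weights each observation by its own squared residual. The simulated data and variable names below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2_000
x = rng.normal(0.0, 1.0, size=n)
# Hypothetical errors whose spread grows with |x|.
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, size=n) * np.abs(x)

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# Classical covariance: assumes one common error variance.
sigma2 = resid @ resid / (n - X.shape[1])
cov_classical = sigma2 * XtX_inv

# HC0 "sandwich" covariance: each observation keeps its own squared residual.
meat = X.T @ (X * resid[:, None] ** 2)
cov_hc0 = XtX_inv @ meat @ XtX_inv

se_classical = np.sqrt(np.diag(cov_classical))
se_hc0 = np.sqrt(np.diag(cov_hc0))
print("classical SE (intercept, slope):", se_classical)
print("HC0 SE       (intercept, slope):", se_hc0)
```

In this setup the classical standard error of the slope understates the sampling variability, so the HC0 value is noticeably larger. The coefficient estimates themselves are the same in both cases; only the estimated uncertainty changes.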
