Premium Essay

In: Other Topics

Submitted By delilah

Words 2046

Pages 9

Notes accompany the Third Edition of Statistics: The Art and Science of Learning From Data by Alan Agresti and Christine Franklin

Contents

CHAPTER 9: HYPOTHESIS TESTS
9.1 Elements of a Hypothesis Test
9.2 Normal Hypothesis Test for Population Proportion p
9.3 The t-Test: Hypothesis Testing for Population Mean µ
9.4 Possible Errors in Hypothesis Testing
9.5 Limitations and Common Misinterpretations of Hypothesis Testing

Stat 3011

Chapter 9

CHAPTER 9: HYPOTHESIS TESTS

Motivating Example: A diet pill company advertises that at least 75% of its customers lose 10 pounds or more within 2 weeks. You suspect the company of falsely advertising the benefits of taking their pills. Suppose you take a sample of 100 product users and find that only 5% have lost at least 10 pounds. Is this enough to prove your claim? What about if 72% had lost at least 10 pounds? Goal:

9.1 Elements of a Hypothesis Test

1. Assumptions

2. Hypotheses Each hypothesis test has two hypotheses about the population: Null Hypothesis (H0 ):

Alternative Hypothesis (Ha ):

Diet Pill Example: Let p = true proportion of diet pill customers that lose at least 10 pounds. State the null and alternative hypotheses for the diet pill example.

3. Test Statistic

Definition: Test Statistic. A test statistic is a measure of how compatible the data are with the null hypothesis. The larger the test statistic, the less compatible the data are with the null hypothesis. Most test statistics we will see have the following form:

What does a large value of |T…...
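To make the diet pill example concrete, here is a minimal Python sketch of a one-proportion z test. It is not taken from the notes: it assumes the hypotheses H0: p = 0.75 against Ha: p < 0.75 and the usual large-sample statistic z = (p̂ - p0) / sqrt(p0(1 - p0)/n). The function name `one_proportion_z` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(p_hat, p0, n):
    """Left-tailed one-proportion z test (assumed form, see lead-in)."""
    # Standard error computed under the null hypothesis p = p0.
    se = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    # Left-tailed p-value: probability of a z this small or smaller under H0.
    p_value = NormalDist().cdf(z)
    return z, p_value


# Two scenarios from the motivating example: 72% and 5% of 100 users.
z_72, p_72 = one_proportion_z(0.72, 0.75, 100)
z_05, p_05 = one_proportion_z(0.05, 0.75, 100)

print(f"72% observed: z = {z_72:.3f}, p-value = {p_72:.4f}")
print(f" 5% observed: z = {z_05:.3f}, p-value = {p_05:.3g}")
```

Under these assumptions, a sample proportion of 72% gives a z value well within one standard error of the advertised 75%, while 5% lies many standard errors below it.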

Free Essay

...STAT 346/446 - A computer is needed on which the R software environment can be installed (recent Mac, Windows, or Linux computers are sufficient). We will use R for illustrating concepts, and students will need to use R to complete some of their projects. It can be downloaded at http://cran.r-project.org. Please come and see me when questions arise. Attendance is mandatory.

Topics covered in STAT 346/446, EPBI 482

Chapter 5 – Properties of a Random Sample: Order Statistics; Distributions of some sample statistics; Definitions of chi-square, t and F distributions; Large sample methods; Convergence in probability; Convergence in law; Continuity Theorem for mgfs; Major Theorems; WLLN; CLT; Continuity Theorem; Corollaries; Delta Method

Chapter 7 – Point Estimation: Method of Moments; Maximum Likelihood Estimation; Transformation Property of MLE; Comparing statistical procedures; Risk function; Inadmissibility and admissibility; Mean squared error; Properties of Estimators; Unbiasedness; Consistency; Mean-squared error consistency; Sufficiency (CH 6); Definition; Factorization Theorem; Minimal SS; Finding a SS in exponential families; Search for the MVUE; Rao-Blackwell Theorem; Completeness; Lehmann-Scheffe; Location and scale invariance; Location and scale parameters; Cramer-Rao lower bound

Chapter 9 - Interval Estimation: Pivotal Method for finding a confidence interval; Method for finding the “best” confidence interval; Large sample confidence......

Words: 321 - Pages: 2

Premium Essay

...STAT 302 – Statistical Methods, Lecture 8
Dr. Avishek Chakraborty, Visiting Assistant Professor, Department of Statistics, Texas A&M University

Using sample data to draw a conclusion about a population
• Statistical inference provides methods for drawing conclusions about a population from sample data.
• Two key methods of statistical inference:
  o Confidence intervals
  o Hypothesis tests (a.k.a. tests of significance)

Hypothesis Testing: Evaluating the effectiveness of new machinery at the Bloggs Chemical Plant
• Before the installation of new machinery, long historical records revealed that the daily yield of fertilizer produced by the Bloggs Chemical Plant had a mean μ = 880 tons and a standard deviation σ = 21 tons. Some new machinery is being evaluated with the aim of increasing the daily mean yield without changing the population standard deviation σ.

Null hypotheses
• The claim tested by a statistical test is called the null hypothesis. The test is designed to assess the strength of the evidence against the null hypothesis. Usually the null hypothesis is a statement of “no effect” or “no difference”, that is, a statement of the status quo.

Alternative hypotheses
• The claim about the population that we are trying to find evidence for is the alternative hypothesis. The alternative hypothesis is one-sided if it states that a parameter is larger than or...
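The Bloggs Chemical Plant setup can be sketched as a one-sided z test in Python. The lecture excerpt gives only μ = 880 and σ = 21, so the sample size (50 days) and sample mean (888 tons) below are hypothetical values chosen purely for illustration, as is the choice of H0: μ = 880 against Ha: μ > 880.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 880.0   # historical mean daily yield (from the lecture)
sigma = 21.0   # known population standard deviation (from the lecture)

n = 50         # HYPOTHETICAL number of days observed with the new machinery
x_bar = 888.0  # HYPOTHETICAL sample mean yield over those days

# z statistic: distance of the sample mean from mu_0 in standard-error units.
z = (x_bar - mu_0) / (sigma / sqrt(n))

# One-sided (upper-tail) p-value, matching Ha: mu > 880.
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.4f}, one-sided p-value = {p_value:.4f}")
```

A small p-value here would count as evidence against the status-quo claim of no increase in mean yield; with different hypothetical data the conclusion could of course differ.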

Words: 921 - Pages: 4

Premium Essay

...printout given above), explain β0 and β1, and also provide the units of the slope and y-intercept. Do β0 and β1 make sense?
d) Is there sufficient evidence to conclude that the model contributes information for predicting the percentage of refund spent in 3 months? (State the hypotheses and do the test.)
e) Is there sufficient evidence to conclude, "As the family income increases, the percentage of refund spent in 3 months decreases"? (State the hypotheses and do the test.) (Does it make sense to do this test? Explain.)
f) Calculate R-sq. What is the practical meaning of R-sq?
g) Calculate the standard error of estimate. What is the practical meaning of Sε? (Get the residual printouts – 5 points.) In Minitab, go to Stat > Regression > Regression, then follow the screen prints to get the residual plots, and click OK.
h) State regression Assumption 1 and test it using the residual plots.
i) State regression Assumption 2 and test it using the residual plots.
j) State regression Assumption 3 and test it using the residual plots.
k) State regression Assumption 4 and test it using the residual plots.
l) Calculate R-sq(adjusted).
m) Find a 95% confidence interval for β0.
n) Find a 95% confidence interval for β1.
o) Explain the relationship between confidence intervals and hypothesis testing.
p) What is an outlier? Are there any outliers?
q) What is an...

Words: 1807 - Pages: 8

Premium Essay

...83/84: STAT, TESTS, 1-PropZTest
p0: assumed proportion (0.21)
x: number of successes (732)
n: total number of candies (3500)
In the next line, select the correct alternative hypothesis/test, then Calculate, Enter. On the next screen, the second line shows the test. The next line has the test statistic. The next line has the p-value of the test (if less than the significance level, reject the null). The next two lines have p̂ and n.

If using StatCrunch, you will want Stat > Proportions > One Sample > with summary. In the first window, you will enter the same information as for part 3: number of the color (number of successes) and total number of candies (number of observations). Then click Next, and in the following window, enter the claimed proportion as a decimal in the box next to “null”, select the inequality that matches the alternative hypothesis, and then click Calculate. The output will include the test statistic (Z-Stat) and the p-value.

Hypothesis test results:
p : proportion of successes for population
H0 : p = 0.21
HA : p ≠ 0.21

Proportion | Count | Total | Sample Prop. | Std. Err. | Z-Stat | P-value
p | 732 | 3500 | 0.20914286 | 0.006884766 | -0.12449848 | 0.9009

Mean
When you test for the mean number of candies per bag, you will need x̄ (sample mean), s (sample standard deviation) and n (total number of bags) as before. The test statistic is a z, because we have a large sample.
Test statistic:
If using the TI 83/84: STAT, TESTS, Z-Test
Input: Stats
0: ......
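As a cross-check on the calculator and StatCrunch steps, the same one-proportion z test can be reproduced in a few lines of Python. This is only a sketch using the numbers quoted above (p0 = 0.21, 732 successes, 3500 candies) and the standard library; it is not part of the original assignment.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.21  # claimed proportion under H0
x = 732    # number of candies of the color (successes)
n = 3500   # total number of candies

p_hat = x / n                      # sample proportion
std_err = sqrt(p0 * (1 - p0) / n)  # standard error under H0
z = (p_hat - p0) / std_err         # test statistic

# Two-sided p-value, matching HA: p != 0.21.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"Sample Prop. = {p_hat:.8f}")
print(f"Std. Err.    = {std_err:.9f}")
print(f"Z-Stat       = {z:.8f}")
print(f"P-value      = {p_value:.4f}")
```

The printed values should agree with the StatCrunch row above up to rounding, which suggests the software uses the null proportion in the standard error.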

Words: 1120 - Pages: 5

Premium Essay

...Mean
Alpha = 1 - Confidence Level
Margin of Error (E): E =CONFIDENCE(alpha,SD,n)
E =CONFIDENCE.T(alpha,SD,n) (small sample)
Confidence Interval: =(mean - E) to =(mean + E)

Confidence Intervals, Proportion
z =NORMSINV(confidence level + alpha/2)
E =z * SQRT(p * (1-p)/n)
Confidence Interval: =(proportion - E) to =(proportion + E)

Sample Size, Mean
z =NORMSINV(confidence level + alpha/2)
n = (z * SD / E)^2

Sample Size, Proportion
Validation: n*p ≥ 5 and n*(1-p) ≥ 5
n = p*(1 − p) * (z / E)^2

Hypothesis Testing
Mean:
1-tail: z =NORMSINV(confidence level)
2-tail: z =NORMSINV(confidence level + alpha/2)
Test Statistic: z = (sample mean - mean) / (SD/SQRT(n))
"Less than" alternative: reject if the statistic is more negative than the decision value
"More than" alternative: reject if the statistic is more positive than the decision value
Two-tailed (T2): reject if the statistic is more negative or more positive than the decision values
Proportion:
Test Statistic: z = (ps - p) / SQRT(p*(1 - p) / n)
Std error = SQRT(p*q/n), where q = 1 - p
Small Sample:
1 tail: =TINV(2 * alpha,df)
2 tail: =TINV(alpha,df)
Where df = degrees of freedom = n - 1
=T.INV(alpha,df)
=T.INV.2T(alpha,df)
P-Value:
1-tail: =NORMDIST(sample mean, population mean, SD / SQRT(n),1)
2-tail: =NORMDIST(sample mean, population mean, SD / SQRT(n),1)*2

Regression Analysis
y = m*x + b (but in statistics it's written y = a + bx)
In Excel statistics analysis: "Multiple R" = coefficient of correlation...
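The proportion formulas in the sheet above translate directly into Python. The sketch below uses `statistics.NormalDist().inv_cdf` in place of Excel's NORMSINV; the inputs (p = 0.4, n = 200, 95% confidence, target margin 0.05) are hypothetical and chosen only to exercise the formulas.

```python
from math import ceil, sqrt
from statistics import NormalDist

confidence = 0.95         # HYPOTHETICAL confidence level
alpha = 1 - confidence    # Alpha = 1 - Confidence Level
p = 0.4                   # HYPOTHETICAL sample proportion
n = 200                   # HYPOTHETICAL sample size

# z = NORMSINV(confidence level + alpha/2), i.e. the 1 - alpha/2 quantile.
z = NormalDist().inv_cdf(confidence + alpha / 2)

# E = z * SQRT(p * (1 - p) / n)
E = z * sqrt(p * (1 - p) / n)
interval = (p - E, p + E)

# Sample size for a target margin: n = p * (1 - p) * (z / E)^2, rounded up.
target_E = 0.05
n_needed = ceil(p * (1 - p) * (z / target_E) ** 2)

# Validation rule from the sheet: n*p >= 5 and n*(1 - p) >= 5.
valid = n * p >= 5 and n * (1 - p) >= 5

print(f"z = {z:.6f}, E = {E:.4f}, CI = ({interval[0]:.4f}, {interval[1]:.4f})")
print(f"n needed for E = {target_E}: {n_needed}, validation passed: {valid}")
```

Rounding the required sample size up rather than to the nearest integer is a design choice here, so that the achieved margin does not exceed the target.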

Words: 671 - Pages: 3

Premium Essay

...STAT 4220 Homework 2 Report

Problem 1.22:
a) Ŷ = 168.6 + 2.03X
b) Ŷh = 168.6 + 2.03(40) = 168.6 + 81.2 = 249.8
c) 2.03

The population studied is plastic hardness. X is the elapsed time in hours and Y is the hardness in Brinell units. Hardness ranged from a minimum of 196 units to a maximum of 253, and elapsed time from a minimum of 16 hours to a maximum of 40. The mean was 225.6 for units and 28 for hours. The median was 226.5 units and 28 hours. The standard deviation of units with hour was 173.6. There was small variance and large bias.

Problem 1.28:
a) Ŷ = 20517.6 + (-170.58)X. No, this equation does not fit well because the data do not form a line.
b) 1) -170.58 2) Ŷh = 6871.2 3) ε10 = 1401.57 4) MSE = 5552112

The population was crime rates. X is the percentage of individuals in the county having at least a high-school diploma and Y is the crime rate. The percentage ranged from a low of 61 to a maximum of 91. The crime rate ranged from a low of 2105 to a maximum of 14016. The mean was 7111 for crime rate and 78.6 percent. The median was 79 percent and 6930 for crime rate. The standard deviation of crime rate and percent was -6601.54. There was a large variance and small bias.

Problem 1.31: In this problem the error will not include batch-to-batch variability, and there will be a smaller variance than in the original experiment. When you use different batches, there will not be a way to evaluate your results from the original experiment and the results there......
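The point predictions in Problems 1.22 and 1.28 are plain evaluations of the fitted lines, which the following Python sketch reproduces. The helper name `predict` is made up, and the value X = 80 in the second call is an inference from the reported Ŷh = 6871.2; the homework excerpt does not state it.

```python
def predict(intercept, slope, x):
    """Evaluate a fitted simple linear regression line at x."""
    return intercept + slope * x


# Problem 1.22: hardness (Brinell units) after 40 hours.
hardness_40 = predict(168.6, 2.03, 40)

# Problem 1.28: crime rate at X = 80 percent (X inferred, see lead-in).
crime_80 = predict(20517.6, -170.58, 80)

print(f"Problem 1.22, X = 40: Y-hat = {hardness_40:.1f}")
print(f"Problem 1.28, X = 80: Y-hat = {crime_80:.1f}")
```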

Words: 337 - Pages: 2

Free Essay

...tie in the Best Actress category, and the mean of the two ages is used; in 1932 there was a tie in the Best Actor category, and the mean of the two ages is used. These data are suggested by the article “Ages of Oscar-winning Best Actors and Actress,” by Richard Brown and Gretchen Davis, Mathematics Teacher magazine. In that article, the year of birth of the award winner was subtracted from the year of the awards ceremony, but the ages in the tables below are based on the birth date of the winner and the date of the awards ceremony.)

Analyzing the Results
1. Go to MyStatLab → StatCrunch → StatCrunch website → Open StatCrunch, which will take you to the spreadsheet. Use Data to load your Excel data onto the spreadsheet, Graph for all graphs, and Stat for all analysis, and use them to answer questions 2 to 4. Copy and paste all graphs and StatCrunch output for full credit.
2. First explore the data using suitable statistics and graphs such as a histogram, boxplot, etc. Use the results to make info

In the histogram for actresses, most received the Oscar in the age group between 20-40. The maximum number of actresses received the Oscar in the age group between 25-20. In the histogram for actors, most received Oscars in the age group between 30-50. The maximum number of actors received the Oscar in the age group between 40-45. The two boxes are a little different, with the median for actresses lower than that for actors. Both box plots have no outliers there......

Words: 341 - Pages: 2

Premium Essay

...that is directly below 0.00. The probability of .8413 found in the body of the table means that the area less than a Z-value of 1.00 is equal to .8413. Because a normal random variable can range from -∞ to ∞, equality is negligible on a continuous scale. Therefore, any P(X = x) = 0.

Stats Class Example: Grades in a statistics course are normally distributed with a mean of 72 and a standard deviation of 8.5. Find the probability that a randomly selected student gets a grade of:
(a) under 85.
(b) over 80.
(c) between 75 and 85.
(d) between 61.6 and 83.1.

Use the Normal Distribution to verify the Empirical Rule:

Determining Percentiles from the Normal Distribution: Given μ and σ, use X = μ + Zσ, i.e. Z(.96) = 1.75 and Z(.04) = -Z(.96) = -1.75.
(i) What is the 78th percentile of the standard normal distribution?
(ii) What is the 35th percentile of the standard normal distribution?

Stats Class Example (cont.’d):
(e) If the professor decides to give a grade of "A" to the top 10% of student grades, what grade would a student have to score to get an "A"?
(f) If the professor decides to fail 5% of the students, what grade would a student have to score to pass......
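The Stats Class Example can be checked without a Z table using Python's standard library. The sketch below is not part of the original notes; it simply applies the normal model with mean 72 and standard deviation 8.5, with variable names chosen for illustration.

```python
from statistics import NormalDist

grades = NormalDist(mu=72, sigma=8.5)  # grade distribution from the example
std_normal = NormalDist()              # standard normal, mean 0 and sd 1

# Table lookup quoted in the notes: area below Z = 1.00.
area_below_1 = std_normal.cdf(1.0)

# (a) to (d): probabilities as differences of cumulative areas.
p_under_85 = grades.cdf(85)
p_over_80 = 1 - grades.cdf(80)
p_75_to_85 = grades.cdf(85) - grades.cdf(75)
p_61_6_to_83_1 = grades.cdf(83.1) - grades.cdf(61.6)

# Percentiles via X = mu + Z * sigma (inv_cdf does this in one step).
z_96 = std_normal.inv_cdf(0.96)
cutoff_a = grades.inv_cdf(0.90)     # (e) top 10% receive an "A"
cutoff_pass = grades.inv_cdf(0.05)  # (f) bottom 5% fail

print(f"P(Z < 1.00) = {area_below_1:.4f}")
print(f"(a) {p_under_85:.4f}  (b) {p_over_80:.4f}")
print(f"(c) {p_75_to_85:.4f}  (d) {p_61_6_to_83_1:.4f}")
print(f"Z(.96) = {z_96:.2f}, A cutoff = {cutoff_a:.1f}, pass cutoff = {cutoff_pass:.1f}")
```

Answers read from a printed table may differ slightly in the last digit because table lookups round Z to two decimals.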

Words: 14529 - Pages: 59

Free Essay

...Lecture 1 Examples. STAT 102

In Exercises 1-13, identify which of these types of sampling is used: random, stratified, systematic, cluster, or convenience.
1. When she wrote Marriage and Divorce: Legal and Psychological Issues, author Julia Kim based her conclusions on 4500 responses from 100,000 questionnaires distributed to women.
2. A psychologist at the University of Saskatchewan surveys all students from each of 20 randomly selected classes.
3. A sociologist at Grant MacEwen Community College selects 12 men and 12 women from each of 4 Statistics classes.
4. Sony selects every 200th compact disc from the assembly line and conducts a thorough test of quality.
5. A gun registry lobbyist writes the name of each Member of Parliament on a separate card, shuffles the cards, and then draws 10 names.
6. Due to a number of factors, a real estate broker classifies his clients as: upper-class Protestants, middle-class Protestants, lower-class Protestants, upper-class Catholics, etc. Over the last years he had about 1200 clients from 15 different groups. Trying to analyze the tendency, he randomly selected 3 clients from each group.
7. A fashion expert polls online 50 males and 50 females about their brand of clothing.
8. An Air Canada market researcher interviews all passengers on each of 10 randomly selected flights.
9. A medical researcher from Acadia University interviews all leukemia patients in each of 20 randomly selected hospitals.
10. A reporter for the Financial Post interviews every 25th......
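To make the sampling types named in these exercises easier to tell apart, here is a small Python sketch on a made-up population of 100 numbered units. The population, the two strata, and the clusters of ten are all hypothetical; the sketch only illustrates the mechanics of random, systematic, stratified, and cluster sampling (convenience sampling has no random mechanism to simulate).

```python
import random

rng = random.Random(0)            # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # HYPOTHETICAL units numbered 1..100

# Simple random sample: every set of 10 units is equally likely.
simple = rng.sample(population, 10)

# Systematic sample: random start, then every 10th unit.
start = rng.randrange(10)
systematic = population[start::10]

# Stratified sample: split into strata, sample some units from EACH stratum.
strata = {"A": population[:50], "B": population[50:]}
stratified = {name: rng.sample(units, 5) for name, units in strata.items()}

# Cluster sample: pick whole clusters at random, keep ALL units in them.
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
cluster_sample = [unit for c in rng.sample(clusters, 2) for unit in c]

print("simple:    ", sorted(simple))
print("systematic:", systematic)
print("stratified:", stratified)
print("cluster:   ", cluster_sample)
```

The key contrast is between the last two: stratified sampling takes a few units from every group, while cluster sampling takes every unit from a few groups.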

Words: 404 - Pages: 2

Premium Essay

...test results:
μ1 : mean of Credit Balance($)
μ2 : mean of Size
μ1 - μ2 : mean difference
H0 : μ1 - μ2 = 0.05
HA : μ1 - μ2 ≠ 0.05
(with pooled variances)

Difference | Sample Mean | Std. Err. | DF | T-Stat | P-value
μ1 - μ2 | 3967.04 | 131.7902 | 98 | 30.100796 | <0.0001

Based on my findings, I believe that size is a great indicator for helping to find credit balance. Here is why: you can look at the average size of a household and see the cost to run that household. The bigger the household size, the more money it costs to operate that household. I think that this is what all the tests illustrate when you look at each figure.

APPENDIX C

Simple linear regression results:
Dependent Variable: Size
Independent Variable: Credit Balance($)
Size = -2.1549776 + 0.0014041137 Credit Balance($)
Sample size: 50
R (correlation coefficient) = 0.7524
R-sq = 0.56616867
Estimate of error standard deviation: 1.1572698

Parameter estimates:
Parameter | Estimate | Std. Err. | DF | 95% L. Limit | 95% U. Limit
Intercept | -2.1549776 | 0.7231484 | 48 | -3.6089647 | -0.70099014
Slope | 0.0014041137 | 1.7740639E-4 | 48 | 0.0010474143 | 0.0017608132

Analysis of variance table for regression model:
Source | DF | SS | MS | F-stat | P-value
Model | 1 | 83.89487 | 83.89487 | 62.64207 | <0.0001
Error | 48 | 64.28513 | 1.3392736 | |
Total | 49 | 148.18 | | |

APPENDIX D

When you look at the intervals of the Credit......
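The summary numbers in the Appendix C regression output are tied together by a few identities, which the Python sketch below checks using only the figures reported in the analysis of variance table. It is an illustration added for clarity, not part of the original report.

```python
from math import sqrt

# Sums of squares and degrees of freedom from the ANOVA table.
ss_model, df_model = 83.89487, 1
ss_error, df_error = 64.28513, 48
ss_total = 148.18

# Mean squares are sums of squares divided by their degrees of freedom.
ms_model = ss_model / df_model
ms_error = ss_error / df_error

f_stat = ms_model / ms_error    # F-stat = MS(Model) / MS(Error)
r_sq = ss_model / ss_total      # R-sq = SS(Model) / SS(Total)
r = sqrt(r_sq)                  # |R| for simple linear regression
error_sd = sqrt(ms_error)       # estimate of error standard deviation

print(f"MS error = {ms_error:.7f}")
print(f"F-stat   = {f_stat:.5f}")
print(f"R-sq     = {r_sq:.8f}, R = {r:.4f}")
print(f"Error SD = {error_sd:.7f}")
```

The sign of R is taken from the slope, which is positive in this output, so R equals the positive square root.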

Words: 1413 - Pages: 6

Free Essay

...Stats Paper
Cara Robertson
September 19, 2013
Elements of Statistics MAT121.M2
Jenny Fiedeldey
Chatfield College

Statistics Paper #1: “It Ain't Necessarily So”

Being as interested in news and politics as I am, I was already aware of the fact that statistics are extremely inaccurate. Statistics falsely portray their sample or population as exaggerated or understated. Either way, statistics are basically lies, whether that is the intention or not. Reading “It Ain't Necessarily So” has only further confirmed my beliefs about statistics and their falseness. I had never taken into consideration all of those who are involved in the inaccuracy of said statistics, though. I had always just blamed the news sources for that. However, reading this paper has taught me that the news sources are probably the only people not involved in what is basically a lie; they are just given the information and told to report it. I now know that the victim (or in some cases, so-called “victim”), the investigator, and the person collecting the data are the ones who are to blame for the misrepresentation. These false studies are presented to the public every day, concerning a very wide range of topics. Extreme confusion is caused when the public hears drastically varying numbers and reports concerning things such as presidential approval rates, unemployment rates, and any other topic one might think of. When each news source is reporting entirely different information......

Words: 975 - Pages: 4

Premium Essay

...financial institutions it will be happy with the choice it makes to come to or to stay with Wells Fargo. In the 1990s Wells Fargo came out with a Vision and Values book, but under Norwest Corporation; at that time it was a small regional bank, and now Wells Fargo is a well-known bank with a large global presence. The vision and values came about by going back to and keeping the traditions of each company it brought in to make Wells Fargo what it is today. Those beliefs are just as strong today as they were when they were first written down on a piece of paper. Staying true to them has helped Wells Fargo become known in every household, and a place where one in 600 US workers works. Wells Fargo is now home to 70 million customers. With stats like this, Wells Fargo is ranked among the top 10 publicly traded companies according to Forbes magazine, based on sales, assets and market value. This is all attributed to going back to when it first started and keeping the customers first. No matter how big Wells Fargo got, its vision and values remained the same.

2. Analyze the five (5) forces of competition to determine how they impact the company.

Wells Fargo had to compete against other banks. Depending on their size, the larger banks were trying to maximize customers as well. They......

Words: 2009 - Pages: 9

Premium Essay

...options would be in error (Second building block: all estimates are wrong)

Example: You are measuring the tensile strength provided by 4 suppliers. Supplier 1 has been your supplier in the past and will be considered your base level. You create 3 dummy variables: X1 = 1 if supplier 2, 0 otherwise; X2 = 1 if supplier 3, 0 otherwise; and X3 = 1 if supplier 4, 0 otherwise. After taking random samples of size 5 from each supplier, you find the following result:

ANOVA
 | df | SS | MS | F | Significance F
Regression | 3 | 63.2855 | 21.09517 | 3.461629 | 0.041366
Residual | 16 | 97.504 | 6.094 | |
Total | 19 | 160.7895 | | |

 | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept (S1) | 19.52 | 1.103993 | 17.68128 | 6.34E-12 | 17.17964 | 21.86036
X1 (S2-S1) | 4.74 | 1.561282 | 3.035968 | 0.007866 | 1.430231 | 8.049769
X2 (S3-S1) | 3.32 | 1.561282 | 2.126458 | 0.049376 | 0.010231 | 6.629769
X3 (S4-S1) | 1.64 | 1.561282 | 1.050419 | 0.309133 | -1.66977 | 4.949769

Predicted tensile strength = 19.52 + 4.74X1 + 3.32X2 + 1.64X3

With 95% confidence, the average tensile strength of supplier 2 is from 1.430231 to 8.049769 more than supplier 1. With 95% confidence, the average tensile strength of supplier 4 is from 1.66977 less than up to 4.949769 more than supplier 1. The sample average tensile strength of supplier 4 would be 19.52 + 1.64 = 21.16.

Using part b to infer......
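The dummy-variable equation above can be evaluated per supplier with a short Python sketch. The coefficients come from the regression output; the function name `predicted_strength` and the dictionary layout are illustrative choices, not part of the original notes.

```python
# Coefficients from the regression output (supplier 1 is the base level).
INTERCEPT = 19.52
DUMMY_COEF = {2: 4.74, 3: 3.32, 4: 1.64}  # X1, X2, X3 for suppliers 2, 3, 4


def predicted_strength(supplier):
    """Predicted tensile strength; at most one dummy variable equals 1."""
    # For supplier 1 all dummies are 0, so only the intercept remains.
    return INTERCEPT + DUMMY_COEF.get(supplier, 0.0)


predictions = {s: predicted_strength(s) for s in (1, 2, 3, 4)}
for supplier, strength in predictions.items():
    print(f"Supplier {supplier}: {strength:.2f}")

# The F statistic in the ANOVA table is MS(Regression) / MS(Residual).
f_stat = 21.09517 / 6.094
print(f"F = {f_stat:.4f}")
```

Because each supplier's prediction is the intercept plus its own coefficient, the coefficients are read as differences from supplier 1, which is why the table labels them S2-S1, S3-S1, and S4-S1.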

Words: 2068 - Pages: 9

Premium Essay

Regression Statistics
R | 0.63097
R Square | 0.39813
Adjusted R Square | 0.38559
Standard Error | 731.71323
Total Number Of Cases | 50

Amount Charged($) = 2203.9996 + 40.4798 * Income ($1000s)

ANOVA
 | d.f. | SS | MS | F | p-level
Regression | 1. | 16,999,744.78596 | 16,999,744.78596 | 31.75123 | 0.
Residual | 48. | 25,699,404.03404 | 535,404.25071 | |
Total | 49. | 42,699,148.82 | | |

 | Coefficients | Standard Error | LCL | UCL | t Stat | p-level | H0 (5%) rejected?
Intercept | 2,203.99962 | 329.04893 | 1,542.40241 | 2,865.59683 | 6.69809 | 0. | Yes
Income ($1000s) | 40.47977 | 7.18386 | 26.03566 | 54.92388 | 5.63482 | 0. | Yes
T (5%) | 2.01063
LCL - Lower value of a reliable interval (LCL)
UCL - Upper value of a reliable interval (UCL)

Residuals
Observation | Predicted Y | Residual | Standard Residuals
1 | 4,389.90718 | -373.90718 | -0.5163
2 | 3,418.39271 | -259.39271 | -0.35817
3 | 3,499.35225 | 1,600.64775 | 2.2102
4 | 4,227.9881 | 514.0119 | 0.70976
......
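The coefficient table above follows the usual pattern t Stat = coefficient / standard error and interval = coefficient ± T × standard error. The Python sketch below checks this for the Income row using only the reported values; it is an added illustration, and the variable names are arbitrary.

```python
# Reported values for the Income ($1000s) coefficient.
coef = 40.47977
std_err = 7.18386
t_crit = 2.01063  # "T (5%)" from the output, 48 residual degrees of freedom

# t statistic for H0: coefficient = 0.
t_stat = coef / std_err

# Interval limits: coefficient minus/plus critical value times standard error.
lcl = coef - t_crit * std_err
ucl = coef + t_crit * std_err

# H0 is rejected at the 5% level when |t| exceeds the critical value.
rejected = abs(t_stat) > t_crit

print(f"t Stat = {t_stat:.5f}")
print(f"LCL = {lcl:.5f}, UCL = {ucl:.5f}")
print(f"H0 (5%) rejected? {'Yes' if rejected else 'No'}")
```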

Words: 2460 - Pages: 10

Premium Essay

...sales and size as independent variable:

SUMMARY OUTPUT

Regression Statistics
Multiple R | 0.992399
R Square | 0.984855
Adjusted R Square | 0.977283
Standard Error | 1.249868
Observations | 10

ANOVA
 | df | SS | MS | F | Significance F
Regression | 3 | 609.527 | 203.1757 | 130.0599 | 7.56E-06
Residual | 6 | 9.373017 | 1.562169 | |
Total | 9 | 618.9 | | |

 | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0%
Intercept | -10.1702 | 3.473129 | -2.92827 | 0.026346 | -18.6687 | -1.6718 | -18.6687 | -1.6718
Food Sales (tens of thousands of dollars) | 0.027038 | 0.012041 | 2.245505 | 0.065847 | -0.00243 | 0.056501 | -0.00243 | 0.056501
Nonfood Sales (tens of thousands of dollars) | 0.097052 | 0.030147 | 3.219291 | 0.018153 | 0.023285 | 0.17082 | 0.023285 | 0.17082
Store Size (thousands of square feet) | 0.524675 | 0.059158 | 8.869011 | 0.000114 | 0.37992 | 0.66943 | 0.37992 | 0.66943

Multiple R is 0.99. It is positive and very close to 1. This means that this model is

Adjusted R square is 97.7%, meaning that 97.7% of the change in the profit can be explained by these 3......
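The R Square, Adjusted R Square, and F values in this output can be recomputed from the ANOVA table, as the Python sketch below shows. It assumes the standard adjustment formula 1 - (1 - R²)(n - 1)/(n - k - 1) with n = 10 observations and k = 3 predictors; the sketch is an added illustration rather than part of the original write-up.

```python
n = 10  # observations
k = 3   # predictors: food sales, nonfood sales, store size

# Sums of squares from the ANOVA table.
ss_regression = 609.527
ss_residual = 9.373017
ss_total = 618.9

# R Square: share of total variation explained by the model.
r_sq = ss_regression / ss_total

# Adjusted R Square penalizes for the number of predictors (assumed formula).
adj_r_sq = 1 - (1 - r_sq) * (n - 1) / (n - k - 1)

# F = MS(Regression) / MS(Residual).
f_stat = (ss_regression / k) / (ss_residual / (n - k - 1))

print(f"R Square          = {r_sq:.6f}")
print(f"Adjusted R Square = {adj_r_sq:.6f}")
print(f"F                 = {f_stat:.4f}")
```

With only 10 observations and 3 predictors, the adjustment is noticeable, which is one reason the adjusted figure is usually the one quoted for multiple regression.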

Words: 1157 - Pages: 5