Statistical Thinking in Sports

CRC PRESS
Boca Raton  Ann Arbor  London  Tokyo

Contents

1 Introduction
  Jim Albert and Ruud H. Koning
  1.1 Introduction
    1.1.1 Patterns of world records in sports (2 articles)
    1.1.2 Competition, rankings and betting in soccer (3 articles)
    1.1.3 An investigation into some popular baseball myths (3 articles)
    1.1.4 Uncertainty of attendance at sports events (2 articles)
    1.1.5 Home advantage, myths in tennis, drafting in hockey pools, American football
  1.2 Website
  1.3 Acknowledgements
  Reference

2 Modelling the development of world records in running
  Gerard H. Kuper and Elmer Sterken
  2.1 Introduction
  2.2 Modelling world records
    2.2.1 Cross-sectional approach
    2.2.2 Fitting the individual curves
  2.3 Selection of the functional form
    2.3.1 Candidate functions
    2.3.2 Theoretical selection of curves
    2.3.3 Fitting the models
    2.3.4 The Gompertz curve in more detail
  2.4 Running data
  2.5 Results of fitting the Gompertz curves
  2.6 Limit values of time and distance
  2.7 Summary and conclusions
  References

3 The physics and evolution of Olympic winning performances
  Ray Stefani
  3.1 Introduction
  3.2 Running events
    3.2.1 The physics of running
    3.2.2 Measuring the rate of improvement in running
    3.2.3 Periods of summer Olympic history
    3.2.4 The future of running
  3.3 Jumping events
    3.3.1 The physics of jumping
    3.3.2 Measuring the rate of improvement in jumping
    3.3.3 The future of jumping
  3.4 Swimming events
    3.4.1 The physics of swimming
    3.4.2 Measuring the rate of improvement in swimming
    3.4.3 The future of swimming
  3.5 Rowing
    3.5.1 The physics of rowing
    3.5.2 Measuring the rate of improvement in rowing
    3.5.3 The future of rowing
  3.6 Speed skating
    3.6.1 The physics of speed skating
    3.6.2 Measuring the rate of improvement in speed skating
    3.6.3 Periods of winter Olympic history
    3.6.4 The future of speed skating
  3.7 A summary of what we have learned
  References

4 Competitive balance in national European soccer competitions
  Marco Haan, Ruud H. Koning and Arjen van Witteloostuijn
  4.1 Introduction
  4.2 Measurement of competitive balance
  4.3 Empirical results
  4.4 Can national competitive balance measures be condensed?
  4.5 Conclusion
  References

5 Statistical analysis of the effectiveness of the FIFA World Rankings
  Ian McHale and Stephen Davies
  5.1 Introduction
  5.2 FIFA's ranking procedure
  5.3 Implications of the FIFA World Rankings
  5.4 The data
  5.5 Preliminary analysis
    5.5.1 Team win percentage, in and out of own confederation
    5.5.2 International soccer versus domestic soccer
  5.6 Forecasting soccer matches
  5.7 Using the FIFA World Rankings to forecast match results
    5.7.1 Reaction to new information
    5.7.2 A forecasting model for match result using past results
  5.8 Conclusion
  References

6 Forecasting scores and results and testing the efficiency of the fixed-odds betting market in Scottish league football
  Stephen Dobson and John Goddard
  6.1 Introduction
  6.2 Literature review
  6.3 Regression models for goal scoring and match results
  6.4 Data and estimation results
  6.5 The efficiency of the market for fixed-odds betting on Scottish league football
  6.6 Conclusion
  References

7 Hitting in the pinch
  Jim Albert
  7.1 Introduction
  7.2 A breakdown of a plate appearance: four hitting rates
  7.3 Predicting runs scored by the four rates
  7.4 Separating luck from ability
  7.5 Situational biases
  7.6 A model for clutch hitting
  7.7 Clutch stars?
  7.8 Related work and concluding comments
  References

8 Does momentum exist in a baseball game?
  Rebecca J. Sela and Jeffrey S. Simonoff
  8.1 Introduction
  8.2 Models for baseball play
  8.3 Situational and momentum effects
  8.4 Does momentum exist?
    8.4.1 Modeling transition probabilities
    8.4.2 Modeling runs scored
  8.5 Rally starters and rally killers
  8.6 Conclusions
  References

9 Inference about batter-pitcher matchups in baseball from small samples
  Hal S. Stern and Adam Sugano
  9.1 Introduction
  9.2 The batter-pitcher matchup: a binomial view
  9.3 A hierarchical model for batter-pitcher matchup data
    9.3.1 Data for a single player
    9.3.2 A probability model for batter-pitcher matchups
    9.3.3 Results - Derek Jeter
    9.3.4 Results - multiple players
  9.4 Batter-pitcher data from the pitcher's perspective
    9.4.1 Results - a single pitcher
    9.4.2 Results - multiple players
  9.5 Towards a more realistic model
  9.6 Discussion
  References

10 Outcome uncertainty measures: how closely do they predict a close game?
  Babatunde Buraimo, David Forrest and Robert Simmons
  10.1 Introduction
  10.2 Measures of outcome uncertainty
  10.3 Data
  10.4 Preliminary analysis of the betting market
  10.5 Model
  10.6 Out-of-sample testing
  10.7 Concluding remarks
  References

11 The impact of post-season play-off systems on the attendance at regular season games
  Chris Bojke
  11.1 Introduction
  11.2 Theoretical model of the demand for attendance and the impact of play-off design
  11.3 Measuring the probability of end-of-season outcomes and game significance
  11.4 The data: the 2000/01 English Football League 2nd tier
  11.5 Statistical issues in the measurement of the determinants of attendance
    11.5.1 Skewed, non-negative heteroscedastic data
    11.5.2 Clustering of attendance within teams and unobserved heterogeneity
    11.5.3 Multicollinearity
    11.5.4 Final statistical model
  11.6 Model estimation
    11.6.1 Choice of explanatory variables
    11.6.2 Regression results
  11.7 The impact of the play-off system on regular league attendances
  11.8 Conclusions
  References

12 Measurement and interpretation of home advantage
  Ray Stefani
  12.1 Introduction
  12.2 Measuring home advantage
  12.3 Rugby union, soccer, NBA
  12.4 Australian rules football, NFL and college football
  12.5 NHL hockey and MLB baseball
  12.6 Can home advantage become unfair?
  12.7 Summary
  References

13 Myths in Tennis
  Jan Magnus and Franc Klaassen
  13.1 Introduction
  13.2 The data and two selection problems
  13.3 Service myths
    13.3.1 A player is as good as his or her second service
    13.3.2 Serving first
    13.3.3 New balls
  13.4 Winning mood
    13.4.1 At the beginning of a final set, both players have the same chance of winning the match
    13.4.2 In the final set the player who has won the previous set has the advantage
    13.4.3 After breaking your opponent's service there is an increased chance that you will lose your own service
    13.4.4 After missing break points in the previous game there is an increased chance that you will lose your own service
  13.5 Big points
    13.5.1 The seventh game
    13.5.2 Do big points exist?
    13.5.3 Real champions
  13.6 Conclusion
  References

14 Back to back evaluations on the gridiron
  David J. Berri
  14.1 Why do professional team sports track player statistics?
  14.2 The NFL's quarterback rating measure
  14.3 The Scully approach
  14.4 Modeling team offense and defense
  14.5 Net Points, QB Score and RB Score
  14.6 Who is the best?
  14.7 Forecasting performance in the NFL
  14.8 Do different metrics tell a different story?
  14.9 Do we have marginal physical product in the NFL?
  References

15 Optimal drafting in hockey pools
  Amy E. Summers, Tim B. Swartz and Richard A. Lockhart
  15.1 Introduction
  15.2 Statistical modelling
    15.2.1 Distribution of points
    15.2.2 Distribution of games
  15.3 An optimality criterion
  15.4 A simulation study
  15.5 An actual Stanley Cup playoff pool
  15.6 Discussion
  References

References

List of authors

1 Introduction

Jim Albert, Bowling Green State University
Ruud H. Koning, University of Groningen

1.1 INTRODUCTION

Sports has taken an ever more prominent position in society. An increasing number of people watch sports events on television, and more people see live sports in stadiums and arenas. The economic value of franchises, broadcasting rights, and merchandising has grown. Books on sports appear on the New York Times bestseller list (for example, Lewis, 2004). Besides this increasing interest from the general public, scientists have also added sports to their research agenda. Traditionally, research from physiology and medicine has been used to improve performance in sports. Currently, researchers from economics, statistics, sociology and law are also working in the sports field. In this volume, the focus is on the statistical analysis of sports.

There has always been a close connection between sports and statistics. In most sports, players and teams are measured by various statistics, and these statistics are used to provide rankings of players and teams. A recent popular phenomenon is fantasy sports, where participants draft teams of players and games are won and lost on the basis of actual statistical information. One reason for the close connection between statistics and sports is probably the abundant data that are available on sports. Scores are kept, and individual performance is measured and tracked over time. The advent of the internet has perhaps helped to distribute these data to an ever wider group of researchers. Also, sport is a convenient and familiar context to use in teaching or in demonstrating a new statistical method.

In this volume, we illustrate a number of different models for sports data, such as time series, linear regression, ordered probit regression, factor analysis, and generalized linear models. A variety of distributions, such as the binomial, gamma, and Poisson, are used to model the variable of interest. Parameters are estimated by least squares, maximum likelihood, and Bayesian methods, and are used in simulation models. Indeed, sports provides a very broad area of application of statistical thinking.

1.1.1 PATTERNS OF WORLD RECORDS IN SPORTS (2 ARTICLES)

One fascinating subject for study is the pattern of world records in sports over time. Gerard Kuper and Elmer Sterken, in “Modelling the development of world records in running”, provide an interesting look into the pattern of world records of metric running events for men and women. As the authors explain, there are interesting questions associated with these data. Can one estimate the ultimate human performance in these events? What is the impact of technological innovations on the pattern of records? Is the pattern of world records similar for different running distances, and will women outperform men in the future? The authors provide a comprehensive survey of the use of different parametric families to model world records and give some interesting conclusions.

Ray Stefani, in “The physics and evolution of Olympic winning performances”, takes a “holistic” view of the pattern of winning performances over time in a variety of Olympic sports. To understand the changes in the winning performance in a given sport, say swimming, one should understand the factors influencing the power of a swimmer, such as the size and fitness of the athletes, the arm and leg positioning techniques, coaching, and the quality of the venue and equipment. Although one cannot directly compare the time of a swimming event with the height of a pole vault, Stefani defines a dimensionless measure, percent improvement per Olympiad, to describe the change in winning performances. This measure is used to contrast the evolution of winning performances in a wide variety of sports.

1.1.2 COMPETITION, RANKINGS AND BETTING IN SOCCER (3 ARTICLES)

Soccer is the most popular sport in the world, as judged by the number of people playing or watching the sport. Besides being interesting as a sport, it has also become an economic activity of some significance. For example, anti-trust regulators watch the sale of television rights, and the European Commission is involved in setting up a system of transfer fees. Also, just as in many other sports, betting on soccer matches has become increasingly popular. With the ongoing commercialization of soccer, one could almost forget that it is a game, and that organizing leagues want to know which team is best. These issues are addressed in Chapters 4, 5 and 6.

Marco Haan, Ruud Koning, and Arjen van Witteloostuijn look at the development of competitive balance over time in their chapter “Competitive balance in national European soccer competitions”. They do so for seven different countries. First, they discuss different dimensions of competitive balance and propose empirical measures that capture these dimensions. Then they examine whether balance has changed over time; in particular, they investigate the popular belief that competitive balance has worsened over time. Finally, noting the lack of agreement on a single measure of competitive balance in soccer, they use a factor model to see whether seven different indicators can be reduced to one factor. It turns out that the predominant factor can be interpreted as contemporaneous competitive balance.

Soccer is played at different levels: club teams play in national leagues and in international tournaments such as the Champions League, and national teams play every four years to win the World Cup. Ranking club teams is relatively easy, considering the number of games they play. It is much harder to rank national teams as, in a given year, they play only a limited number of games against a selected set of opponents. Still, the world soccer federation FIFA publishes a ranking of national teams that is updated frequently. What is the quality of this ranking? This issue is addressed by Ian McHale and Stephen Davies in “Statistical analysis of the effectiveness of the FIFA World Rankings”. They conclude that the FIFA World Ranking does not use all past information efficiently.

Betting on sports results is a hobby for some and a way of earning a living for others. To what extent are betting markets efficient? Stephen Dobson and John Goddard’s “Forecasting scores and results and testing the efficiency of the fixed-odds betting market in Scottish league football” examines different betting strategies using two different statistical forecasting models: a goals-based model and a results-based model. These forecasting models are capable of eliminating almost all of the bookmakers’ over-round.

1.1.3 AN INVESTIGATION INTO SOME POPULAR BASEBALL MYTHS (3 ARTICLES)

Baseball has been called the most statistical sport in the sense that more numerical information is collected about this game than any other. For a given baseball play, such as a batted ball hit into center field for a single, many associated variables will be recorded about the event, including the inning, runners on base, the players on the field, and the exact location of the hit in the field. Websites such as The Baseball Archive (www.baseball1.com) and Retrosheet (www.retrosheet.org) provide extensive datasets on historical players and teams and play-by-play game results. The easy availability of these data invites interesting analyses by researchers, which are reflected in the three baseball papers in this volume. All three articles investigate the validity of popular myths in baseball.

“Hitting in the pinch” by Jim Albert investigates the popular belief that particular ballplayers have the ability to perform better in important or “clutch” situations during a game. In his paper, Albert shows that the ability to hit well can depend on the runners on base and the number of outs in an inning. But there is little evidence to suggest that particular players have the ability to do better in important situations.

Another popular belief in sports is the importance of momentum during a game. If particular players perform well during a game, many people believe that this will motivate other players to also perform well, causing the team to rally. Rebecca Sela and Jeffrey Simonoff take a statistical view of this issue in “Does momentum exist in a baseball game?” They begin with a Markov chain model for baseball, where the probability of a movement from one state (say, no runners on and one out) to another state (runner on first and one out) depends only on the beginning state. They then consider a more sophisticated model where the probability of a movement can depend on various “momentum” variables. The authors find little statistical evidence of momentum effects, especially from a predictive viewpoint.

Hal Stern and Adam Sugano, in “Inference about batter-pitcher matchups in baseball from small samples”, investigate the final myth, the importance of batter/pitcher matchup data. Baseball managers will often make decisions on the basis of how particular batters perform against particular pitchers. The problem is that one has small samples for a large number of batter/pitcher matchups, so one is likely to see extreme sample outcomes by chance. Stern and Sugano suggest modeling this type of data with a hierarchical model; this model allows one to see how a batter’s ability can vary depending on the quality of the pitcher.

1.1.4 UNCERTAINTY OF ATTENDANCE AT SPORTS EVENTS (2 ARTICLES)

Professional sports is a business, and all teams wish to have high attendance at their games. A popular assumption is that audiences will be attracted to games where the outcome is very uncertain. Babatunde Buraimo, David Forrest, and Robert Simmons, in “Outcome uncertainty measures: how closely do they predict a close game?”, note that there is little empirical support for this assumption. They suggest that one problem is that, although several measures of outcome uncertainty are used in the literature, it is unclear whether any of these measures are actually good predictors of close contests. Their paper defines several measures of outcome uncertainty and finds that they explain only a small amount of the variation in game results in Spanish football.

Attendance at sports events is also a central theme in the article “The impact of post-season play-off systems on the attendance at regular season games” by Chris Bojke. Many different sports have introduced play-off systems; one benefit of these systems is that they enhance attendance by increasing the time that teams are in contention for the league championship. Unfortunately, there is little research on the impact of play-off design on attendance, and Bojke presents a statistical model to understand this relationship. By fitting the model to English football data, one is able to measure the impact of a particular play-off system on attendance. Even though his application is to soccer, the methodology can be applied to modeling attendance for other sports as well.

1.1.5 HOME ADVANTAGE, MYTHS IN TENNIS, DRAFTING IN HOCKEY POOLS, AMERICAN FOOTBALL

One is familiar with the saying “there is no place like home”, and this statement is especially true for sports competitions. For all sports, the team playing in its home field or area generally has an advantage. Ray Stefani’s article “Measurement and interpretation of home advantage” explores the home-field advantage for a number of sports. A mathematical model is used to quantify home-field advantage for a particular sport, and the size of the estimated home advantage is shown to differ greatly between sports. Stefani describes the physiological, psychological and tactical factors implicit in home-field advantage and argues that player fatigue, especially in a continuous-action sport, plays an important role in home advantage.

Jan Magnus and Franc Klaassen provide a nice survey of their tennis research in “Myths in tennis.” Tennis is an international sport watched by fans all over the world, and the television commentators have strong views about competition. In particular, commentators believe that a player has an advantage if he or she serves first, and that top players have a special ability to perform well in the important points in a match. This article explores these beliefs using four years of point-by-point tennis data from the Wimbledon tennis championships. Many interesting conclusions are reached about tennis matches, and most of the beliefs held by television commentators are shown to be false.

David Berri’s article “Back to back evaluations on the gridiron” describes the evaluation of player performance in American football. Baseball is a relatively easy sport in which to evaluate player performance, since the game is essentially a confrontation between a single pitcher and a single batter. Football is fundamentally different from baseball in that the performance of a particular player, such as a quarterback, is highly dependent on the performance of his teammates on the field. Berri applies the regression methodology of Scully and Blass to develop a measure of marginal performance for football players. This methodology leads to some interesting measures of player performance; in particular, Berri’s measure of a quarterback’s performance is likely superior to the official quarterback rating system of the National Football League. But these measures do not appear to show consistency over time, suggesting that the statistics collected by professional football are not useful for measuring the productivity of individual players.

Amy Summers, Tim Swartz, and Richard Lockhart, in “Optimal drafting in hockey pools”, consider how one can be successful in a hockey fantasy league. In this game, participants draft players from the 16 National Hockey League teams that have qualified for the Stanley Cup Playoffs. The winner of the league is the one whose players have accumulated the largest number of points. An interesting statistical model is devised for the number of points scored by hockey players, and this is used to find an optimal selection strategy. One attractive by-product of this model is that a player can make intelligent draft choices in real time.
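To make the drafting problem concrete, here is a minimal Python sketch. It is not the authors' model: it only assumes, following the chapter outline's separation of the distribution of points from the distribution of games, that a player's expected pool points can be approximated as a scoring rate times the expected number of playoff games his team plays. The function names, the field names `ppg` and `exp_games`, and all player data are hypothetical.

```python
def expected_pool_points(points_per_game, expected_games):
    """Expected playoff points, approximated as scoring rate x expected games."""
    return points_per_game * expected_games


def rank_for_draft(players):
    """Return player names sorted by expected pool points, best first."""
    ranked = sorted(
        players,
        key=lambda p: expected_pool_points(p["ppg"], p["exp_games"]),
        reverse=True,
    )
    return [p["name"] for p in ranked]


# Hypothetical players: names, rates, and game counts are invented for illustration.
players = [
    {"name": "Star on early-exit team", "ppg": 1.2, "exp_games": 5.0},
    {"name": "Solid scorer on contender", "ppg": 0.9, "exp_games": 16.0},
    {"name": "Depth forward on contender", "ppg": 0.4, "exp_games": 16.0},
]

print(rank_for_draft(players))
```

Under these invented numbers, a modest scorer on a team expected to go deep into the playoffs ranks ahead of a star whose team is expected to exit early, which is why a model for the number of games matters as much as a model for scoring.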

1.2 WEBSITE

This book is intended for sports enthusiasts with some background in statistics. They may be students, teachers, or researchers, but also practitioners or (sport) policy makers. To promote more statistical thinking in sports, we have made a website, www.statistical-thinking-in-sports.com, that has additional material such as appendices, references, tables, and data. Feel free to use the information provided there, but do send a copy of your paper or project for inclusion on the website if you use material from it.

1.3 ACKNOWLEDGEMENTS

Chapters in this volume have been commented upon by Jos de Koning, Bart Los, Alec Stephenson, Tom Wansbeek, Ryanne van Dalen, Jay Bennett, Eric Bradlow, and Chris Andrews. Technical support has been provided by Siep Kroonenberg and Sashi Kumar.

REFERENCES

Lewis, M. (2004). Moneyball: The Art of Winning an Unfair Game. New York: W.W. Norton & Company.

11 The impact of post-season play-off systems on the attendance at regular season games

Chris Bojke, University of Groningen and Pharmerit UK

ABSTRACT

Post-season play-offs feature in many professional sports leagues, and they are thought to positively influence attendance at regular season games by prolonging the period during which teams are still in contention for end-of-season outcomes such as promotion. However, the variety of systems in existence indicates that the relationship between design and attendance is unknown. This research addresses this issue and aims to analyze the extent to which one such play-off system influenced attendance at regular league matches during the English Division 1 2000/01 season. It does so in three steps: construction of a simple theoretical model identifying play-off relevant parameters (promotion probability; non-zero probability; significance of match); statistical estimation of the effect of these parameters on attendance; and finally, a prediction of the attendances that would be observed given the counterfactual values of the promotion variables implied by a non-play-off promotion regime. The theoretical model shows that play-off related variables may counteract each other, which makes the overall impact of play-offs on regular league attendance an empirical matter. A random effects GLM, with a correction for endogenous variables, allows for unbiased estimates of the impact of play-off parameters when faced with strictly non-negative, heteroscedastic and skewed data produced by heterogeneous teams. The model permits unbiased prediction of attendances under different regimes, and the results show that, relative to an automatic promotion regime, the current play-off system does indeed appear to have positively influenced attendance at regular season games, though the overall impact is estimated at less than 1%. Furthermore, the redistribution of promotion probability and significance across heterogeneous teams has led to some teams benefiting more than others and raises the possibility that some teams may lose attendance during regular season games as a consequence of the addition of post-season play-offs.

11.1

INTRODUCTION

Identifying and understanding the relationship between a sporting competition’s characteristics and the demand for that product in the form of attendance is an important component in the design of league and cup competition formats. One such common policy-amenable element added to league structures is a post-season play-off system, whereby the allocation of end of season outcomes such as winning the overall championship or promotion/relegation to different divisions is finally determined. Such play-off systems are common and feature in many diverse sports, from determining champions in North American sports such as Major League Baseball and American Football to partly determining promotion and relegation issues in open league formats such as the European soccer leagues. One motivation for the presence of play-offs is that they are argued to influence regular season attendance by increasing the proportion of regular season games for which a team is still in contention for the end of season outcome. Though play-offs are a common feature of many professional sporting competitions, there is little consistency in the design and size of play-off structure both within and across different sports, indicating that the size and nature of the impact of play-off designs on attendance at regular season matches is largely unknown. Although there exists an extensive literature on the determinants of demand within sport in general and on the impact of league design in particular, limitations in the statistical techniques and the lack of a model which relates play-off design to demand mean that this uncertainty has not been reduced (Cairns, 1990; Kuypers, 1997; Borland and Macdonald, 2003; Noll, 2003). This chapter therefore outlines a statistical approach that may be used to address this important research gap and is illustrated with an empirical investigation of the incremental impact of a promotional play-off system on the attendance at regular season league matches in the English professional soccer league immediately below the top-tier Premiership division. The process of conducting this research is conceptually simple and falls into three distinct steps:

1. Identification of the theoretical means by which play-off design may influence regular season games;
2. Estimation of the relevant parameters using empirical data; and finally
3. Prediction of attendances under other hypothetical play-off designs to identify the effects of different designs.

Although conceptually simple, all these steps have proved difficult in practice and methods for conducting each step are thus covered to some degree within this text, which is structured as follows: firstly I outline a simple model of the determinants of the demand for attendance which identifies the theoretical framework by which the introduction of a post-season play-off system may influence demand in regular season games. Secondly I describe the method by which the play-off relevant variables are derived before describing the data in section 11.4 and, in section 11.5, discussing the statistical issues which arise in estimating the parameters of the model given “awkward” skewed, heteroscedastic data generated by heterogeneous teams. Section 11.6 presents the results of the statistical estimation, finding play-off related variables statistically significant, and section 11.7 predicts the incremental difference that the play-off system has made, finding a modest impact of an approximately 0.7% to 0.9% increase in aggregate attendance in regular season games in the 2000/01 season and, in addition, finding that this increase is not uniform across teams. Section 11.8 draws together the conclusions from the chapter.

Post-season play-off systems and attendance


11.2

THEORETICAL MODEL OF THE DEMAND FOR ATTENDANCE AND THE IMPACT OF PLAY-OFF DESIGN

The framework in which to assess the impact of play-offs on attendance is provided by a conceptual microeconomic model of demand which argues that the demand for attendance, $Y_{ijt}$, of a match $t$ between opponents $i$ and $j$ is a function of the characteristics of that match such as: the teams/individuals competing, the cost of attendance, whether the match is televised and, importantly, the context or significance of the match in resolving who gets what end of season outcome at the end of the overall competition e.g. promotion, relegation or the championship itself:

$$Y_{ijt} = d(x_{it}, x_{jt}, z_t), \tag{11.1}$$

with $x_{it}$ representing home team $i$ characteristics such as the quality of the home team, whether the team is still in contention for a desirable end of season outcome, the potential significance of the match in resolving end of season outcomes, etc. at the time of match $t$. An analogous set of team characteristics applicable to away team $j$ at the time of match $t$ is represented by $x_{jt}$. Finally, $z_t$ is a set of characteristics applicable to both teams such as ticket price, whether the match is televised live, the uncertainty surrounding the outcome of the match, etc. As attendance at matches tends to be dominated by home-team supporters one would a priori hypothesize that the factors contained within $x_{it}$ have a larger influence than those contained in $x_{jt}$. Post-season play-off systems may enter the demand function through their potential impact on the match significance and/or the probability of obtaining the end-of-season outcome. This may be illustrated by the example of the English soccer leagues where, prior to the introduction of play-off systems, the second highest division operated a strictly automatic promotion scheme whereby the teams which finished in the top three positions at the end of the season were automatically promoted to the higher division. In contrast, the current play-off design is one in which the top two teams are automatically promoted to the higher division and the following four teams (positions three through six) play in a cup-style knock-out competition in which the winner joins the two automatically promoted teams. 
Given those definitions, imagine a hypothetical two-thirds played season where a mid-table team no longer has any reasonable chance of obtaining third position but a distinct possibility of obtaining sixth or slightly higher. Under the automatic promotion regime, this team has a zero probability of obtaining promotion, whereas under the stated play-off system there would be a non-zero probability, thus illustrating the potential impact of a post-season play-off system on the determinants of demand during regular season games. Indeed this impact appears to be singled out as a main motivating factor for the presence of play-off systems: a non-zero probability of obtaining promotion is a positive demand driver; play-off systems create more games where teams have such non-zero probabilities, ergo play-off systems increase demand in regular season games. However, this simple conclusion may be erroneous as it omits several other important issues that may act in a counteractive manner, for example what is the impact of such systems on the significance of a given match in determining end-of-season outcome? If there are no counteractive forces then an attendance maximising play-off design would appear to include all competing teams. Indeed this is almost the case in the US Major League Soccer (MLS) where eight out of ten teams qualify for the end of season championship play-offs, prompting former US national coach Bruce Arena to comment that “most of the MLS regular season games mean nothing” (Gardner, 2005). A further issue to consider is that with a fixed number of promoted teams, at any point in time the probabilities of promotion summed across all teams equal the number of promotion spots. That is, if there are three promotion places, then if at a given time point we were able to measure the probability of an end of season promotion for each and every team, they would all sum to 3. Thus when a play-off system creates more games with non-zero probabilities, it does so by redistributing the probability of promotion across teams and games rather than increasing probability in total. As there exist heterogeneous football clubs addressing very different sized markets, redistributing promotion probability, particularly from bigger clubs to smaller clubs as is likely in a play-off system, may potentially reduce attendance. The above discussion has identified three demand inputs that play-off systems may affect: (i) the probability of obtaining the outcome (promotion in this case) for team $i$ at match $t$, $p_{it}$; (ii) a simple dummy variable indicating whether this probability is non-zero or not, $nzp_{it}$; and (iii) a significance variable, in this case defined as the difference in the probability of promotion that a win rather than a defeat for team $i$ in match $t$ would make, $dp_{it}$. 
However, identification (and measurement) of these variables is insufficient to estimate the impact of different play-off designs; in addition one must identify the relationship between these variables and play-off design. In order to do this it is useful to consider the probability of promotion as the product of two different probabilities, summed over league positions: the probability of team $i$ finishing in league position $m$, $\pi_{it}(m)$, and the probability of a team finishing in position $m$ being promoted, $\pi(\mathrm{prom} \mid m)$, where the lack of subscripts on this latter term indicates that it is constant across all teams and invariant to the stage of the season. Thus, in terms of our demand variables:

$$p_{it} = \sum_{m} \pi_{it}(m)\, \pi(\mathrm{prom} \mid m), \tag{11.2}$$

$$nzp_{it} = \begin{cases} 1 & \text{if } p_{it} > 0 \\ 0 & \text{if } p_{it} = 0, \end{cases} \tag{11.3}$$

$$dp_{it} = p_{i,t+1}(i \text{ wins } t) - p_{i,t+1}(i \text{ loses } t). \tag{11.4}$$

This is easily illustrated by comparing the old automatic promotion scheme in England with the current play-off system. In the old automatic scheme:

$$\pi(\mathrm{prom} \mid m) = \begin{cases} 1 & \text{if } m = 1, 2, 3 \\ 0 & \text{otherwise,} \end{cases} \tag{11.5}$$

whereas in the current play-off scheme it would appear to be more like:

$$\pi(\mathrm{prom} \mid m) = \begin{cases} 1 & \text{if } m = 1, 2 \\ 0.25 & \text{if } m = 3, 4, 5, 6 \\ 0 & \text{otherwise.} \end{cases} \tag{11.6}$$

In the automatic scheme these probabilities are known with certainty, whereas in the current play-off scheme they are estimated with some uncertainty. The assumption of equal probabilities of promotion from each of the four qualifying positions is supported by results from the second tier of English soccer but may not apply in other circumstances and so further research in this area may be required (Dart and Gross, 2005). Nevertheless equations (11.2) through (11.6) identify the potential impact play-off systems may have on attendance at regular league matches and can be used to evaluate claims about the impact of play-off design on attendance. The theoretical model has thus identified a potential trade-off: marginal fans of clubs may be attracted to matches in which their team still has a non-zero probability of obtaining a desirable end of season outcome, and play-off systems can increase the number of such games. However, since the overall probability is fixed, this effect is achieved by a redistribution of probability. Therefore, the impact of a play-off system cannot be determined theoretically, but is instead an empirical question of which of these counteracting forces dominates.
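As a minimal sketch of equations (11.2) to (11.6), the following Python computes the three play-off related variables from a distribution over finishing positions. The function names and the example distribution for a mid-table team are hypothetical and chosen only for illustration; the promotion rules follow the two regimes described above, including the assumed value of 0.25 for positions three to six.

```python
N_POSITIONS = 24  # teams in the division


def prom_given_position(regime):
    """Return pi(prom | m) for m = 1..N_POSITIONS as a list (index m - 1)."""
    positions = range(1, N_POSITIONS + 1)
    if regime == "automatic":  # equation (11.5): top three are promoted
        return [1.0 if m <= 3 else 0.0 for m in positions]
    if regime == "playoff":    # equation (11.6): assumed 0.25 for places 3 to 6
        return [1.0 if m <= 2 else 0.25 if m <= 6 else 0.0 for m in positions]
    raise ValueError(f"unknown regime: {regime!r}")


def promotion_probability(finish_probs, regime):
    """Equation (11.2): sum over m of pi_it(m) * pi(prom | m)."""
    rule = prom_given_position(regime)
    return sum(q * r for q, r in zip(finish_probs, rule))


def non_zero_indicator(p):
    """Equation (11.3): 1 if the promotion probability is positive, else 0."""
    return 1 if p > 0 else 0


def significance(finish_probs_if_win, finish_probs_if_loss, regime):
    """Equation (11.4): promotion probability after a win minus after a loss."""
    return (promotion_probability(finish_probs_if_win, regime)
            - promotion_probability(finish_probs_if_loss, regime))


# Hypothetical mid-table team: no chance of the top three,
# some chance of finishing fifth or sixth (illustrative numbers only).
finish_probs = [0.0] * N_POSITIONS
finish_probs[4], finish_probs[5], finish_probs[6] = 0.10, 0.20, 0.70

for regime in ("automatic", "playoff"):
    p = promotion_probability(finish_probs, regime)
    print(regime, round(p, 3), non_zero_indicator(p))
```

Both rules sum to three promotion places across the 24 positions, which is the fixed total of promotion probability that the play-off system redistributes.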

11.3

MEASURING THE PROBABILITY OF END-OF-SEASON OUTCOMES AND GAME SIGNIFICANCE

The play-off related variables may only be implemented in a regression model if there are measures of these probabilities readily available. Though there exists a betting market for end of season outcomes, potentially providing a means of obtaining probabilities of end of season outcomes for each team prior to each match (and by definition asserting whether the probability is non-zero; these data are not available for this research), there remains one element which is not readily available: how a match may affect probabilities of obtaining end of season outcomes. In practice this is more complex than simply including market betting odds of promotion or winning the championship at the time of the match (which may be observable) as one must include the betting odds should that match be won or lost, all other things remaining equal (potentially neither of these odds will be observable). Thus even if the conceptual measures are accepted, estimation of the difference in probabilities is likely to remain a contentious area in research. To address this issue I implement an imperfect measure based on placing simulated results for remaining matches (based on individual match ex-ante betting odds) onto the existing league table at any point in time. For example, suppose there are 40 matches remaining in a 500 match league. The 460 completed matches give a factual table: where each team is, how many points they have, how many games they have played, who has played who, etc. The probabilities of the game outcomes (home win, draw, away win) for all 40 remaining games then allow us to devise a measure of the likely probabilities of where each team will finish. That is, I can simulate the outcome of each remaining game and add them to the fixed real table to produce an expected final table. Conducting this simulation exercise a number of times allows probabilistic statements about the likely finishing positions, i.e. for each team we can simulate $\pi_{it}(m)$ for each $m$ and obtain $p_{it}$ by multiplying each $\pi_{it}(m)$ by the assumed value of $\pi(\mathrm{prom} \mid m)$ and summing over $m$. Furthermore, one can obtain a measure of the significance of any of the remaining matches by: first taking the fixed existing league table, assuming team $i$ wins that game, simulating the remaining games and recording the probability of finishing positions for team $i$; then, secondly, conducting the same exercise with the exception that we assume team $i$ loses that particular game. The differences in these two probabilities of finishing in a particular league position, multiplied by the relevant $\pi(\mathrm{prom} \mid m)$ and summed across all $m$, therefore give a measure of the significance of that game, i.e. the difference in the probability of a team obtaining the end of season outcome as a result of winning the game and losing that game. The proposed measures have the benefit of incorporating all teams' actual positions and points and the expected total number of points, allowing for the market expectations of the difficulty of the remaining schedules, taking into account matches between teams effectively competing for the same places and the structure of the league, i.e. how many automatic promotion places there are, etc. However, the measures do have a number of weaknesses. Firstly data on betting odds are only available for the odds immediately prior to matches being played. 
Ideally, contemporary betting odds for all matches at each time point during the season are required in order to calculate the true ﬁnal table expectations. If odds change over time, then by using odds posted in March to estimate the expected signiﬁcance of a match in January, we are likely to have measurement error in a right hand side variable. If odds are subject to random changes over time, then this will simply be captured in the error term and will not cause any speciﬁc problems. However, if the odds are subject to systematic changes, a more likely scenario, then some bias via measurement error will occur. Secondly, I assume that the outcome of a particular match has no inﬂuence on eventual league outcomes other than that made by the allocation of points from that game. In other words I assume that there are no spill-over effects; that by winning one game, a team does not inﬂuence the probabilities in another match involving itself or other teams. Thus the measures I use to provide values for the model are potentially ﬂawed, and ﬂawed in a systematic and predictable manner. However, the approach incorporates in a systematic manner: the reality of the existing tables, the structure of the league and a means of incorporating expectations based on an existing schedule (i.e. who plays who with a probabilistic statement of the likely outcomes.) In addition, the empirical results have face validity in that they identify games as signiﬁcant which look to the author as games which are worthy of that description (see section 11.4). Nevertheless a more rigorous solution to measuring these probabilities is likely to be a future research priority in this area.
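The simulation procedure described in this section can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the chapter: the function names, the four-team table and the outcome probabilities are hypothetical, and ties on points are broken at random instead of by goal difference and goals scored.

```python
import random

OUTCOMES = ["home", "draw", "away"]


def simulate_finish_probs(points, remaining, team, n_sims=2000, seed=0, forced=None):
    """Estimate pi_it(m), the probability that `team` finishes in position m.

    points    -- dict mapping team -> current points (the fixed, factual table)
    remaining -- list of (home, away, (p_home, p_draw, p_away)) fixtures
    forced    -- optional (fixture_index, outcome) to fix one result
    """
    rng = random.Random(seed)
    counts = [0] * len(points)
    for _ in range(n_sims):
        table = dict(points)
        for k, (home, away, probs) in enumerate(remaining):
            if forced is not None and forced[0] == k:
                outcome = forced[1]
            else:
                outcome = rng.choices(OUTCOMES, weights=probs)[0]
            if outcome == "home":
                table[home] += 3
            elif outcome == "away":
                table[away] += 3
            else:
                table[home] += 1
                table[away] += 1
        # Simplification: ties on points are broken at random.
        ranking = sorted(table, key=lambda t: (table[t], rng.random()), reverse=True)
        counts[ranking.index(team)] += 1
    return [c / n_sims for c in counts]


def match_significance(points, remaining, team, fixture_index, prom_rule, **kwargs):
    """Promotion probability if `team` wins the fixture minus if it loses."""
    home, _away, _probs = remaining[fixture_index]
    win, loss = ("home", "away") if team == home else ("away", "home")

    def prom_prob(outcome):
        finish = simulate_finish_probs(points, remaining, team,
                                       forced=(fixture_index, outcome), **kwargs)
        return sum(q * r for q, r in zip(finish, prom_rule))

    return prom_prob(win) - prom_prob(loss)


# Hypothetical four-team league with one promotion place and two games left.
points = {"A": 10, "B": 9, "C": 4, "D": 3}
remaining = [("A", "B", (0.4, 0.3, 0.3)), ("C", "D", (0.5, 0.25, 0.25))]
prom_rule = [1.0, 0.0, 0.0, 0.0]

print(simulate_finish_probs(points, remaining, "A", n_sims=500))
print(match_significance(points, remaining, "A", 0, prom_rule, n_sims=500))
```

Using the same seed for the win and loss scenarios means both are evaluated against the same simulated results for the other fixtures, which reduces simulation noise in the difference.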


11.4

THE DATA: THE 2000/01 ENGLISH FOOTBALL LEAGUE 2ND TIER

Promotion (and relegation) play-offs were introduced to the English football leagues in the 1986/87 season. The current system in England for the division below the Premiership, now known as the Championship but during the 2000/01 season as the 1st Division, is that the top two teams are promoted automatically to a higher division and the next four teams enter a straightforward knock-out tournament. The 3rd and 4th placed teams meet the 6th and 5th placed teams respectively in a two-legged semi-final, playing home and away. The winners of the two semi-finals then meet for a single final played on neutral territory where the winner is promoted to the higher division. Currently, in all divisions, relegation is strictly automatic based on league position. The non-play-off alternative in the English system is that the top three placed teams gain automatic promotion. Over the course of a regular season each of the 24 teams plays a balanced schedule, with each team playing each other twice, once home and once away, and so a full 1st division season consists of 552 games. For each league match, three points are awarded to a winning team and none to a losing team. In the event of a draw, a single point is awarded to each team. League positions are firstly determined by accumulated points, then aggregate goal difference and then total goals scored if teams are tied on points. The season runs from August to May with no winter break. Fixtures are determined prior to the start of the season, but have some degree of flexibility with some game dates being rearranged due to weather postponements, cup matches or television schedules. Typically, rearranged fixtures will occur as close to the original date as possible, for example games moved so they may be televised live are typically moved from a Saturday to the preceding Friday, following Sunday, Monday or Tuesday. 
Fixtures are traditionally played on Saturday afternoons and also on bank holidays such as Boxing Day and Easter Saturday and Monday. During the 2000/2001 season, the teams within the 1st division showed sizeable intra- and inter-team variation in the match attendance at their 23 home games, as shown in Table 11.1. The table shows a number of notable features: there is a wide variation in average attendances with larger clubs such as Nottingham Forest, Birmingham, Blackburn, Shefﬁeld Wednesday and Wolverhampton Wanderers having averages three to four times that of the smaller clubs Grimsby, Crewe, Stockport and Wimbledon. Indeed none of the minimum attendances at the four biggest clubs fall below the maximum attendance at the smaller clubs. Clubs with the largest average values also tended to have the largest variances, with the variance being correlated with the square of the mean, indicating heteroscedasticity. That the means are systematically closer to the minimum level rather than the maximum is indicative of a skew. The data also indicate that stadium sizes are not thought to have had a censoring effect on the data, with only a very few attendances approaching stadium capacities. The teams with larger and smaller average attendances are predictable on historical grounds, may reﬂect different potential team speciﬁc market sizes and generally afford the teams greater or lesser resources which can aid league performance.
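The link between the level and the spread of attendances noted above can be checked directly from the team means and standard deviations reported in Table 11.1. The sketch below is only a rough descriptive check: the helper `pearson` is written out by hand and is not part of the chapter, and the coefficient of variation is used as an informal guide to whether the standard deviation scales with the mean.

```python
from math import sqrt

# Mean and standard deviation of home attendance by team, from Table 11.1
# (teams in the table's alphabetical order, Barnsley to Wolves).
mean_att = [14465, 21283, 20740, 16062, 16234, 6698, 17061, 14990,
            9281, 5646, 12809, 16525, 20615, 13533, 14617, 12013,
            17211, 19268, 7030, 9052, 13941, 17657, 7901, 19258]
sd_att = [1928, 3798, 3544, 3577, 1872, 983, 1986, 2734,
          735, 1212, 2673, 1835, 2456, 1973, 1411, 2112,
          4313, 5272, 1300, 1255, 1939, 2097, 2343, 3067]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# If the variance is proportional to the squared mean, the coefficient of
# variation (sd / mean) should be of broadly similar size across teams.
cv = [s / m for s, m in zip(sd_att, mean_att)]

print("correlation of sd with mean:", round(pearson(mean_att, sd_att), 2))
print("coefficient of variation, min and max:", round(min(cv), 3), round(max(cv), 3))
```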


TABLE 11.1

2000/01 Attendances, league positions and simulated promotion probabilities by team.

Team          Mean Att.  St. dev. Att.  Median pos.  Mean Prob.  Mean Prob. Dif  Non-zero Prob. Games
Barnsley        14465       1928           13          0.012        0.007              27
Birmingham      21283       3798            4          0.324        0.093              46
Blackburn       20740       3544            4          0.541        0.125              46
Bolton          16062       3577            3          0.412        0.113              46
Burnley         16234       1872            9          0.052        0.035              45
Crewe            6698        983           18          0.001        0.001               8
C Palace        17061       1986           19          0.003        0.003              15
Fulham          14990       2734            1          0.977        0.015              46
Gillingham       9281        735           15          0.002        0.002              23
Grimsby          5646       1212           19          0            0                   3
Huddersfield    12809       2673           23          0            0                   5
Norwich         16525       1835           16          0.002        0.001              24
N Forest        20615       2456            7          0.112        0.06               45
Portsmouth      13533       1973           15          0.001        0.002              15
Preston NE      14617       1411            6          0.118        0.053              46
QPR             12013       2112           22          0            0                   5
Shef United     17211       4313            9          0.039        0.029              42
Shef Wed        19268       5272           21          0.001        0.001               6
Stockport        7030       1300           20          0            0.001               4
Tranmere         9052       1255           19          0            0                   6
Watford         13941       1939            5          0.234        0.072              44
WBA             17657       2097            5          0.157        0.066              46
Wimbledon        7901       2343           11          0.029        0.024              44
Wolves          19258       3067           15          0.002        0.002              16
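The league positions summarised in Table 11.1 follow from the points and tie-break rules described earlier in this section. As a hedged illustration, the sketch below implements that ranking rule and the fixture count for a balanced home-and-away schedule; the `Row` class, the function names and the three-team example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One team's running totals in the league table."""
    points: int = 0
    goals_for: int = 0
    goals_against: int = 0


def record_result(table, home, away, home_goals, away_goals):
    """Add one match result: 3 points for a win, 1 each for a draw."""
    h = table.setdefault(home, Row())
    a = table.setdefault(away, Row())
    h.goals_for += home_goals
    h.goals_against += away_goals
    a.goals_for += away_goals
    a.goals_against += home_goals
    if home_goals > away_goals:
        h.points += 3
    elif home_goals < away_goals:
        a.points += 3
    else:
        h.points += 1
        a.points += 1


def ranking(table):
    """Order teams by points, then goal difference, then goals scored."""
    def key(team):
        row = table[team]
        return (-row.points, -(row.goals_for - row.goals_against), -row.goals_for)
    return sorted(table, key=key)


def fixtures_in_season(n_teams):
    """Each team hosts every other team once in a balanced schedule."""
    return n_teams * (n_teams - 1)


# Hypothetical three-team example in which all teams finish level on points.
table = {}
record_result(table, "X", "Y", 2, 0)
record_result(table, "Y", "Z", 3, 1)
record_result(table, "Z", "X", 1, 0)

print(ranking(table))
print(fixtures_in_season(24))
```

In the example all three teams have three points, so the order is decided by goal difference alone.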


Hence the larger clubs tend to be found at the higher end of the league table with a few exceptions: Sheffield Wednesday, a team with a historically large support, had relatively few resources as a result of overspending in previous seasons. Sheffield Wednesday spent much of the season in the bottom four positions, but still managed an impressive home average of 19268 attendees. Conversely Fulham, a team with a rather more modest historical following, benefited from a wealthy chairman and spent almost all of the season in first place. Despite the league success, crowds failed to match those of Sheffield Wednesday. Such summary statistics hint at the importance of allowing for individual team specific latent support that is independent of short term success (or other temporary variables). Indeed such data hint at the potential limitations that tinkering with league design may have on attendance. With regard to the measure of promotional significance, if we assume that the probability of promotion when qualifying for a play-off position is 0.25, then the average league position, the probability of promotion, the difference the game may make to the probability of promotion facing each team immediately prior to their next match and the number of non-zero probability of promotion games are shown in the final four columns. Fulham dominated the division, occupying the top spot for most of the season, and with a high proportion of games won, a sizable gap between themselves and the chasing group and the likelihood of winning much of the remaining schedule, the probability of being promoted was close to one at the time of many of their matches (though this is probably an overestimate for the earlier games, created by the means of generating the probabilities). The difference an individual match would make in altering the probability of promotion for Fulham was small, averaging around 1.5% per game. 
At the opposite extreme there are a number of teams whose probabilities are so low (15 out of 24 teams have average probabilities of promotion of less than 5%) that, similar to Fulham, the average difference a game may make to the probabilities of promotion is small (less than 3% for the 15 teams). However, for those teams with an average probability of promotion in the range 0.25 to 0.75, they have an average probability difference per game of approximately 10%. Figure 11.1 shows the relationship between the probability of promotion and the difference to probabilities that a match can make. It shows a quadratic relationship between the two with low and high probabilities being associated with low differences in potential changes, with a few obvious outliers. The limited nature of the observational data and the mathematical constraints imposed by probabilities may limit the ability of the statistical methods in separating out the effects of probability and the effect of the difference in probabilities a game can make. Given the ﬂawed nature of assessing the probabilities, a note is required to support the practical means of measuring the probability of promotion and the difference in probability a match may make. This is illustrated by inspection of games for which the model predicts a high signiﬁcance. The most signiﬁcant match is thus identiﬁed as a match in February. Initially one might feel that this would be too early in a season which runs to May but closer inspection of the game involved is fairly convincing. The game in question occurred when second placed Bolton hosted

FIGURE 11.1 The relationship between promotion probability and match significance. [Plot: promotion probability (0.0 to 1.0) on the horizontal axis, match significance (0.00 to 0.30) on the vertical axis.]

third placed Blackburn. At the time Bolton were seven points clear of Blackburn (66 points against 59; Fulham in first place had 75 and Birmingham in fourth place also had 59 points) and had thirteen games left to play in the regular season; Blackburn had played one game less and had fourteen matches left. The probabilities of promotion prior to the match were Bolton 58% and Blackburn 50%. If Bolton were to win the match, their lead over Blackburn would increase to ten points and the probability of promotion would increase to 76.75% and Blackburn’s would fall to 34.75%. If Bolton were to lose, their probability would fall to 46.75% (hence a probability difference of the game of 30%) and Blackburn’s would rise to 62.25% (hence a probability difference to Blackburn of 27.5%). (The final score was Bolton 1 - 4 Blackburn.) The measure has thus incorporated the distance of the challenging teams, the fact that two of the challenging teams are playing against each other, that the third placed team has a game in hand and the importance of finishing second over third, etc. Blackburn had a similarly significant match two games later when they played away at Birmingham City. With both teams having a probability of promotion of approximately 59%, the difference to Blackburn of the game was 24.75%. The slightly smaller difference in probability reflects the fact that this match was not expected to affect the points total of the current second placed team, which was still Bolton, ahead of third placed Birmingham on goal difference and three points ahead of Blackburn. Of the remaining games, all of the significances over 15% feature either Birmingham, Bolton or Blackburn, mostly in the latter half of the season and mostly reflecting the chase for second place. None of the games that look like being major influences on play-off places at the bottom end of the qualifying places (i.e. positions 5 and 6) are valued quite as significantly. The clearest example appears to be Nottingham Forest away to Wimbledon in their third last game of the season, where seventh placed Forest were two points adrift of the last play-off position and faced a promotion probability difference of 11.75% (14% if Forest won and 2.25% if they lost; Wimbledon were not play-off contenders). Although the probability of promotion and difference in probability is relatively modest compared to some games, Forest were ten points adrift of third place at the time with only a maximum of nine points available, so if it were not for the play-off regime, the probability of promotion and the difference that match could make to the probability of promotion would be 0%. The proposed measure of significance has identified games which appear intuitively significant in the context of the season and clearly incorporates notions of six-pointers (i.e. games where points could be denied to other competing teams), actual position and expected performance over the remaining season. Arguably such benefits provide a powerful motivation for persisting with imperfect measures. Finally, given an understanding of the relationship between play-off systems and promotion probabilities and game significance, and assuming that match results are unaffected by the league design, it is possible to estimate the probabilities and match significances for this season as if it had operated under the traditional pre-1987 league structure with the teams finishing in the top three league positions being automatically promoted to the top division. 

TABLE 11.2

Impact of regimes on aggregate significance.

Measure                              Current Play-Off System   Automatic Promotion System
Mean promotion probability (home)            0.128                      0.129
Mean match significance (home)               0.029                      0.03
Non-zero probability games (home)            330                        216
Mean promotion probability (away)            0.124                      0.123
Mean match significance (away)               0.03                       0.029
Non-zero probability games (away)            323                        215
Table 11.2 shows the difference in aggregate levels of probability and significance under the two different regimes: the current play-off system and the pre-1987 automatic promotion system for home and away teams. The table shows that the average probability of promotion per game is, as expected, unchanged (there is a slight redistribution between home teams and away teams which is not inconsistent with the mathematics). The situation is the same for the difference in probabilities a match may cause: the aggregate stock of this measure is unchanged across regimes. This may initially seem counter-intuitive but the limited nature of the play-off system described has essentially moved the significance from games which distinguish finishing third from lower to those that determine who finishes second from lower. The obvious and expected difference is in the number of games for which either the home team or away team has a non-zero probability of promotion. In the 2000/01 season, the current play-off system created 114 more games where a home team had a non-zero probability than would occur with an automatic promotion scheme, approximately 21% of the season.
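The significance figures quoted for individual matches, and the comparison of non-zero probability games in Table 11.2, are simple differences that can be re-derived as a quick check. The variable names below are illustrative; every number is taken from the text or from Table 11.2.

```python
# Promotion probabilities (in per cent) quoted for Bolton v Blackburn.
bolton_if_win, bolton_if_loss = 76.75, 46.75
blackburn_if_win, blackburn_if_loss = 62.25, 34.75  # Blackburn win = Bolton loss

sig_bolton = bolton_if_win - bolton_if_loss
sig_blackburn = blackburn_if_win - blackburn_if_loss

# Nottingham Forest away to Wimbledon under the play-off regime.
sig_forest = 14.0 - 2.25

# Table 11.2: home games with a non-zero promotion probability.
games_playoff, games_automatic = 330, 216
season_games = 24 * 23  # balanced home-and-away schedule for 24 teams
extra_games = games_playoff - games_automatic
share_of_season = extra_games / season_games

print(sig_bolton, sig_blackburn, sig_forest)
print(extra_games, round(100 * share_of_season, 1))
```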

11.5

STATISTICAL ISSUES IN THE MEASUREMENT OF THE DETERMINANTS OF ATTENDANCE

As indicated by the descriptive analysis, a number of statistical issues in the estimation of the relevant parameter values of the demand model exist: attendance data are non-negative, heteroscedastic and often skewed. In addition, attendances are clustered within heterogeneous teams. Finally, conceptually different important variables such as $dp_{it}$ and $p_{it}$ may be highly correlated, causing issues of multicollinearity.

11.5.1 SKEWED, NON-NEGATIVE HETEROSCEDASTIC DATA

Within the economic literature, the most commonly addressed data problem is that of a positively skewed dependent variable and heteroscedasticity. This tends to be solved via a semi-log or log-log functional form (Forrest and Simmons, 2002; Garcia and Rodriguez, 2002), i.e. defining the dependent variable as the natural logarithm of attendance and computing robust standard errors for valid inference. Such an approach is acceptable if the objective of the analysis is to conduct hypothesis tests. However, the objective of this research is not only to test the statistical significance of the system but also to identify its incremental effect measured in its natural metric, i.e. attendances. In this case the limitations of such transformation methods are revealed, as "unbiased and consistent quantities on the transformed scale usually do not retransform into unbiased or consistent quantities on the untransformed scale" (Duan, 1983). Although Duan also identified a means of correcting this bias, smearing, it has been found that the smearing procedure performs poorly in the presence of heteroscedastic error terms (Manning and Mullahy, 2001).

The alternative then is to look for models which can accommodate different data assumptions and thus circumvent the retransformation problem altogether. The Generalized Linear Model (GLM) framework represents a class of models (including the familiar linear model, logit, probit and Poisson count models) which provide some scope in handling the skewed, heteroscedastic, non-negative attendance data. McCullagh and Nelder (1989) provide a full description of the GLM framework, but a brief summary follows.

In a simple linear model the relationship between a set of explanatory variables, $x_i$, and a response variable, $y_i$, is typically modelled as $y_i = \eta_i + \epsilon_i$, where $\eta_i$, the linear predictor, is given by $\eta_i = x_i'\beta$ and the error term $\epsilon_i$, the random fluctuation around the expected outcome, is assumed to be distributed $\epsilon_i \sim N(0, \sigma^2)$.

The GLM specification introduces two additional elements. Firstly, it splits the relationship between the linear predictor and the conditional expectation $\mu_i$, such that the expectation is a function of the predictor, i.e. $\mu_i = g^{-1}(\eta_i)$ or $g(\mu_i) = \eta_i$, such that $g(E(y_i \mid \eta_i)) = \eta_i$. The function $g(\cdot)$ is known as the link function as it provides the link between the linear predictor and the conditional expectation. In the simple linear model, as $\mu_i = \eta_i$, the link function is simply the identity link function. However, in the GLM framework other link functions may be applied, such as a log or probit link function, i.e. $\log(\mu_i) = \eta_i$ or $\Phi^{-1}(\mu_i) = \eta_i$. The inverse of the link function, $g^{-1}(\cdot)$, may be used to turn a given linear predictor into an expectation on the original metric, e.g. $\mu_i = \exp(\eta_i)$ or $\mu_i = \Phi(\eta_i)$. Though in principle this looks very similar to the transformation solutions, there is a subtle but important distinction: the GLM deals with a function of the expectation, $g(E(y_i \mid \eta_i)) = \eta_i$, whilst the transformation solutions specify the expectation of a function, $E(g(y_i) \mid \eta_i) = \eta_i$. It is this distinction which means the transformation solution is plagued by the back or re-transformation problem and the GLM approach is not.

The second distinguishing feature of the GLM is the increased range of distributions which surround the conditional expectation, i.e. the conditional probability distribution of the responses. For example, in the linear model, the conditional distribution is normal, with errors distributed $N(0, \sigma^2)$. In the GLM framework, the conditional distribution may be any from the exponential family, which includes the binomial, normal and gamma distributions. The choice of conditional distribution implies a relationship between the conditional variance and the conditional mean given by:

$$\mathrm{var}(y_i \mid \eta_i) = \phi V(\mu_i), \qquad (11.7)$$

where $V(\mu_i)$ is known as the variance function and links the conditional variance and expectation, and $\phi$ is known as the dispersion parameter, which is unrelated to the conditional expectation. For example, with the Poisson distribution the variance is equal to the expectation (a well known property of the Poisson count model), and with the gamma distribution the variance is proportional to the square of the expectation. The observed empirical relationship between variance and conditional mean can therefore be used to determine which is the appropriate choice of distribution. In this case, given the summary data, the gamma distribution appears to be an appropriate choice to model the non-zero, skewed and heteroscedastic attendance data.

The other modelling choice is to choose an appropriate link function. A log link function creates a multiplicative effect between covariates. In the next section I discuss the addition of individual team specific effects to account for clustering; with a log link function this team specific effect interacts with other covariates, thus allowing some scope in modelling a heterogeneous response to play-off related covariates across teams (a similar motivation is provided by Berri, Schmidt, and Brook, 2004, in a log-log specification). Thus, in order to produce unbiased estimates of the incremental effect on attendance of the English play-off system relative to an automatic promotion scheme, given the skewed, heteroscedastic and non-zero nature of the attendance data, a GLM with a log link function and gamma distribution is specified.

11.5.2 CLUSTERING OF ATTENDANCE WITHIN TEAMS AND UNOBSERVED HETEROGENEITY

Like the simple linear regression model, the standard GLM framework assumes that the observations are independent of each other. However, the descriptive analysis has indicated that unobserved heterogeneity may exist between teams, principally in this case different underlying market sizes, which in a classical multi-level setting may be captured by a random intercept or random effect consistent across time, $v_i$. Conceptually the random term is easily incorporated into the linear component of the GLM framework, i.e. $\eta_{ijt} = x_{it}'\beta_i + x_{jt}'\beta_j + z_t'\beta_z + v_i$, and further extensions could be made to allow for away team random effects.

However, a key assumption of the random effects model is that the random effect has zero correlation with the included observed variables. In this case this assumption appears suspect, as we may expect the teams with larger underlying markets to have access to larger resources and hence to have higher expected probabilities of promotion etc. The consequence of ignoring this potential correlation is that we may obtain biased estimates of the impact of the correlated covariates. Skrondal and Rabe-Hesketh (2004) describe these correlated observed covariates as endogenous and note that many analysts (particularly economists) falsely believe that this situation automatically rules out a random effects model in favour of a fixed effects model. This is not the case, as a rather simple and elegant solution exists: the impact of a correlated covariate $x_{it}$ may be estimated without bias in a random effects model if one simply includes the cluster mean $\bar{x}_{.t}$ as an additional covariate in the regression model. The inclusion of the cluster mean breaks the correlation between the random effect and the covariate of interest.

Theoretical considerations guide the identification of potentially endogenous variables. For example, the three play-off related variables for home teams are all likely to be endogenous and so team clustered means are included for these variables in the regression model. However, variables associated with away teams (such as away team league position) are likely to be exogenous since, by the random nature of fixture determination, the values of the away team variables will be uncorrelated with the home team random effect. Other potentially endogenous variables include whether the match is broadcast live on the (Sky) satellite subscription channels and ticket prices.

11.5.3 MULTICOLLINEARITY

A potential estimation problem occurs through the collinear nature of several of the explanatory variables. Predictably, those variables associated with the probability of promotion are all highly and significantly correlated. For example, the correlation between a home team's league position and the significance of the game to a team is $-0.69$. The consequence of such collinearity is that we may be unable to separate out the individual effects of variables, and hypothesis tests may have large (but still unbiased) standard errors (Kennedy, 2004). Further consequences may include parameter estimates having incorrect signs or implausible magnitudes, and parameter estimates that are sensitive to the inclusion (or exclusion) of a few data points (Greene, 2003).

Unfortunately, to some extent we are constrained by the passive nature of the data collection: we are unable to construct an active data collection via an experimental design whereby we can produce orthogonal relationships between variables of interest. It is thus worthwhile exploring potential solutions to the problem. An obvious solution may be to incorporate more data by expanding the data to include more years or more divisions. However, if the relationship between these variables holds across other possible datasets, then this solution will be of limited use: the extra data will contain little additional information, though the increased sample sizes would, at the margin, reduce parameter estimate variance. That additional data would contain the same restricting correlations is a situation more plausible than not: it is hard to imagine leagues where league position is not highly correlated with the probability of obtaining an end of season outcome and where low (or high) probabilities of obtaining that outcome are subject to large swings as a result of one game.

Other analytical solutions such as ridge regression or factor analysis are often implemented but also criticised (Maddala, 2001), and it is likely that there exists no analytical solution to multicollinearity in this application. For example, collapsing the three play-off related variables to a single factor may resolve the issue of collinearity but does not permit estimation of the separate effects. In such circumstances it may be more prudent to accept the limitations of the data, concede that the presence of multicollinearity is unavoidable, and examine the sensitivity of the results and policy implications to the problems and uncertainty caused by multicollinearity. In this particular case, I do this by presenting a range of models where variables whose nonsignificance or magnitude is considered a potential artefact of multicollinearity are variously dropped and the sensitivity of the results to each specification is examined. The intuition behind this approach is that it gives us the range of responses from models with different assumptions. If we find predictions are largely invariant to model specification, then the consequences of multicollinearity are diminished.

11.5.4 FINAL STATISTICAL MODEL

All final statistical models are thus of the form given in equation 11.8, expressed as a Generalized Linear Model:

$$\mu^{po}_{ijt} \equiv E(y \mid \eta^{po}_{ijt}) = g^{-1}(\eta^{po}_{ijt}), \qquad (11.8)$$

i.e. the conditional expectation of attendance for home team $i$ against away team $j$ at a league match indexed by $t$ under the play-off system (po), $\mu^{po}_{ijt}$, is a function $g^{-1}(\cdot)$ of a linear predictor $\eta^{po}_{ijt}$, where $g^{-1}(\cdot) = \exp(\cdot)$ and $\eta^{po}_{ijt} = x^{po}_{it}\beta_i + x^{po}_{jt}\beta_j + z_t\beta_z + \bar{x}^{po}_{.t}\beta_{end} + v_i$. $x^{po}_{it}$, $x^{po}_{jt}$ and $z_t$ are vectors of match characteristics relating to the home team, away team and match at match $t$. The lack of a play-off superscript for the match specific characteristics indicates that they are considered invariant to play-off design. $v_i$ is a team specific time invariant random effect ($v_i \sim N(0, \sigma_i^2)$), $\bar{x}^{po}_{.t}$ is a vector of team averages of those variables which are potentially endogenous, and $\beta_i$, $\beta_j$, $\beta_z$ and $\beta_{end}$ are vectors of unknown parameters to be estimated. Note that the component $\bar{x}^{po}_{.t}\beta_{end} + v_i$ accounts for the underlying heterogeneity of each team, split into a fixed and a random component, and does not represent an impact of any observable match characteristic. As the data indicate that the variance of the dependent variable is related to the square of the expectation, a gamma distribution is assumed and hence the conditional variance is given by:

$$\mathrm{var}(y_{ijt} \mid \eta_{ijt}) = \mu^{2}_{ijt}\,\frac{1}{\alpha}. \qquad (11.9)$$
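The mean and variance structure in equations 11.8 and 11.9 can be illustrated with a short simulation. The sketch below is not the estimation procedure used in this chapter; it assumes hypothetical values for the linear predictor and the gamma shape parameter (`eta` and `alpha` are illustrative, not estimates), draws gamma responses with mean exp(eta), and checks that the sample variance is close to the squared mean divided by alpha. It also shows why a log transformation of the response runs into the retransformation problem: exponentiating the mean of log attendance falls short of mean attendance.

```python
import math
import random
import statistics


def inverse_log_link(eta):
    """Turn a linear predictor into an expectation: mu = exp(eta)."""
    return math.exp(eta)


def gamma_variance(mu, alpha):
    """Conditional variance implied by equation 11.9: mu^2 / alpha."""
    return mu ** 2 / alpha


def simulate_attendance(mu, alpha, n, seed=1):
    """Draw n gamma responses with mean mu and variance mu^2 / alpha."""
    rng = random.Random(seed)
    # gammavariate(shape, scale) has mean shape * scale,
    # so shape alpha and scale mu / alpha give mean mu.
    return [rng.gammavariate(alpha, mu / alpha) for _ in range(n)]


# Hypothetical values chosen for illustration only (not estimates).
eta = 9.5
alpha = 4.0

mu = inverse_log_link(eta)
draws = simulate_attendance(mu, alpha, n=100_000)

sample_mean = statistics.fmean(draws)
sample_var = sum((y - sample_mean) ** 2 for y in draws) / len(draws)

# Log-transform route: exp(E[log y]) is not E[y].
naive_retransform = math.exp(statistics.fmean([math.log(y) for y in draws]))

print(f"mu = {mu:.0f}, sample mean = {sample_mean:.0f}")
print(f"mu^2/alpha = {gamma_variance(mu, alpha):.0f}, "
      f"sample variance = {sample_var:.0f}")
print(f"exp(mean of log y) = {naive_retransform:.0f}")
```

The last printed figure sits visibly below the sample mean, which is the gap a smearing correction tries to close and which the log link avoids by modelling the expectation directly.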

11.6 MODEL ESTIMATION

11.6.1 CHOICE OF EXPLANATORY VARIABLES

The GLM statistical model with a log link and gamma distribution, estimated with a random effect for the home team, has been justified in sections 11.4 through 11.5. Here I outline the inclusion criteria for the explanatory variables. Included are the three play-off related variables, $p_{it}$, $nzp_{it}$ and $dp_{it}$, for the home team and a set of analogous variables for the away team, $p_{jt}$, $nzp_{jt}$ and $dp_{jt}$. Team quality is captured by league positions for both home and away teams: $pos_{it}$ and $pos_{jt}$. Matches against local rivals are also anticipated to generate interest. Variables for derby games for both home teams and away teams ($derby_{it}$ and $derby_{jt}$) are constructed on the basis of nearest neighbour: each is set to one when the away (or home) team featuring in that match is the nearest club to that particular team, and to zero otherwise. The one exception to this construction rule is the intra-city Sheffield derby games: due to the intense rivalry between Sheffield Wednesday and Sheffield United, a further dummy variable, $sheffd_t$, is constructed to capture the difference between these two intra-city derby matches and any other derby games. $miles_t$ measures the distance between teams, as attendance is expected to be diminishing in distance: travelling away fans may be deterred by the additional travel costs and marginal home fans may be deterred by a lack of interest in seeing a team from some distance.

Two further dummy variables, $first_{it}$ and $last_{it}$, are constructed for the home team only, to capture whether it was the first or last home game of the season for that team. Clubs often arrange additional entertainment and teams may complete a post-match lap of honour to thank the fans for their support throughout the season. The additional party atmosphere of a home team's first or last game, and the first (or last) chance to see your team at home since last (until next) season, is expected to attract the marginal fan.
The same arrangements are not made for away teams and thus no variable for the away team is included.

Match uncertainty has been the focus of several empirical studies and is thought to be a driver of demand, with fans expected to shy away from games which have a certain outcome. In this chapter I measure outcome uncertainty as a Theil measure based on the match betting odds (Peel and Thomas, 1996). The measure is increasing in uncertainty and ranges from 0.75 (Fulham versus Tranmere, where the bookmakers' average mark-up purged probability of a home win was 0.73; a draw was 0.18; and an away win was 0.09) to 1.09 (Crystal Palace versus QPR, where the probability of a home win was 0.37; a draw was 0.29; and an away win was 0.34). Data on betting odds, as with data on fixtures, results and game dates, were provided by Mabel's Tables (2003) and data on attendance by Statmail (2003).

Football is traditionally played on a Saturday afternoon and on certain public holidays. However, many fixtures are played midweek. It is perceived that, relative to the traditional Saturday fixture, games played on public holidays attract attendance whereas games played midweek deter attendance. Thus dummy variables $midweek_t$ and $hol_t$ are included, capturing when the game was played (the omitted baseline category is a Friday or weekend fixture). Games played on a weekday bank holiday, such as a Bank Holiday Monday, are classed as holiday fixtures and not midweek fixtures. A $sky_t$ dummy is included, capturing whether the game was televised live on the main subscription satellite TV provider, Sky. Dummy variables $Aug_t$, $Sep_t$, $Oct_t$, $Nov_t$, $Dec_t$, $Feb_t$, $Mar_t$, $Apr_t$ and $May_t$ capture the month in which the game is played, with January being the omitted category.

However, a number of desirable variables are omitted on the grounds that they were unavailable. The home team random effect is argued to pick up the net impact of time invariant omitted variables. However, for variables such as ticket price, which for some teams may be responsive to expected demand for individual matches and therefore vary within a season, a random effect may not suffice. Common examples include children under 16 being admitted for £1 when accompanied by a full-price adult, or season ticket holders being able to bring a friend for £5. The omission of this endogenous variable means the model must be regarded as a reduced form model rather than fully structural, though if ticket prices remain constant within teams across a season then the effect of ticket price will be subsumed in the random effect.

Finally, given the requirements of the random effects model, a distinction must be made between those variables that are regarded as endogenous and those that are exogenous. For those variables that are considered endogenous, team season means are included in the model specification. All those variables associated with the away team are assumed exogenous, as are holiday fixtures, derby variables, the miles measure, month dummies and first and last game variables. The play-off related variables for the home team are considered endogenous, as are the home team's league position, the Theil measure of match uncertainty and whether the match is televised live on Sky. In addition, since televised games are often moved to midweek, the midweek dummy is also considered potentially endogenous.

11.6.2 REGRESSION RESULTS

In order to test the sensitivity of the analysis to issues of collinearity between league positions and play-off related variables, five models are estimated. Model 1 has all explanatory variables included; models 2 through 5 omit league positions. Model 2 has all the play-off related variables included, whereas models 3, 4 and 5 each include a single play-off variable: the non-zero probability dummy, the promotion probability and the probability difference (or significance) respectively. All regressions are estimated using the GLLAMM suite of commands (Rabe-Hesketh, Skrondal, and Pickles, 2004) in Stata SE 8 (StataCorp, 2003) and use a random effects specification with a log link function and a gamma distribution. There are 552 observations clustered within 24 home teams. Table 11.3 contains the regression results and regression log-likelihood for model 1, and the results for the remaining models are available on the website associated with this book: www.statistical-thinking-in-sport.com. The results are broadly similar across models and any meaningful differences are discussed within this text.

TABLE 11.3
Regression results model 1.

Variable              coeff.     std. err.
Constant              23.51      0.855
pos (home)            -0.004     0.002
pos (away)             0.002     0.002
nzp (home)             0.058     0.022
p (home)               0.302     0.101
pd (home)              0.359     0.336
nzp (away)             0.003     0.019
p (away)               0.194     0.034
pd (away)             -0.247     0.191
hol                    0.052     0.02
first                  0.112     0.039
last                   0.108     0.037
midweek               -0.067     0.016
derby (home)           0.06      0.035
derby (away)           0.05      0.035
sky                   -0.067     0.023
theil                  0.529     0.165
miles                 -0.001     0
sheffd                 0.507     0.095
August                -0.061     0.037
September             -0.049     0.03
October               -0.004     0.028
November               0.007     0.03
December               0.051     0.028
February               0.09      0.029
March                  0.107     0.029
April                  0.092     0.028
May                    0.075     0.05
av.pos (home)          0.034     0.006
av.nzp (home)          0.361     0.073
av.p (home)           -2.504     0.176
av.pd (home)           1.419     0.422
av.sky                 1.092     0.129
av.real˙mi k          -0.556     0.141
av.theil             -14.595     0.818
var (rand effect)      0.033     0.001
log likelihood     -4968.423

The results in all models conform to prior expectations: the play-off related variables are all positive and in most cases statistically significant; parameters associated with the home team exceed their analogous away team counterparts; variables anticipated as being positive drivers of attendance, such as games being played on bank holidays, first and last games of the season and derby games, are all positive and mostly significant; variables anticipated to have a negative impact on attendance, such as the live broadcast variable, increasing distance between teams and a midweek setting, all have negative estimated coefficients which are mostly statistically significant. In accordance with economic theory, the positive and mostly significant coefficients estimated for the Theil measure of uncertainty support the notion that increased uncertainty has a positive impact on demand.


The month dummies show a distinctive time trend, with months after January having positive coefficients whereas months prior to January have negative coefficients. If the month dummies were picking up a negative influence of adverse weather conditions, then one would have expected the months of August and September to have had positive coefficients. Thus the month dummies may be picking up some omitted variable which is correlated with the time of the season. Potentially this may be an artefact of the means of calculating the promotion probabilities and match significances. Matches earlier in the season have promotion probabilities and significances calculated using a disproportionate number of matches simulated on the basis of betting odds fixed some months later; thus these variables may be more prone to measurement error in the earlier part of the season. The trend observed in the month dummies may be an indication of this and indeed a crude means of correcting this bias.

The averaged endogenous variables are generally significant and have the expected signs. These variables do not estimate the impact of the associated variable but contribute to a fixed component of a team's unobserved heterogeneity. For example, the positive and significant coefficient associated with the averaged Sky variable demonstrates the tendency for games with expected high attendances to be chosen for live broadcast rather than a positive impact of broadcasting on attendance. The negative and significant coefficient estimated for the Sky dummy is the unbiased estimate of the impact of live broadcast, indicating a negative impact on attendance. In all cases the random effects specification appears justified, with the estimated variance of the random effect being significantly different from zero.

The variation in results across models is also as expected. Those variables uncorrelated with the play-off variables produce estimates generally consistent across the models as various play-off variables are omitted. As expected, the coefficients associated with play-off related variables do change as correlated variables are omitted. Notably, the magnitude and significance of the remaining play-off variables increase as correlated variables are omitted. The promotion probability variable is positive and significant in all models for both home and away teams. The non-zero probability dummy is also always positive and significant for the home team, though only significant for the away team when no other play-off related variable is included. The probability difference or significance variable is of the expected sign but only significant when it is the only included play-off related variable and, even then, only when it is applied to the home team.
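Because the models use a log link, a coefficient is read as a proportional rather than an absolute change in expected attendance: a one-unit change in a covariate multiplies the expectation by exp(coefficient). A minimal sketch of that conversion follows, using four of the dummy-variable coefficients reported for model 1 in Table 11.3; the function name `percent_effect` is illustrative only.

```python
import math


def percent_effect(coef):
    """Percentage change in expected attendance for a one-unit increase
    in a covariate under a log link: 100 * (exp(coef) - 1)."""
    return 100.0 * (math.exp(coef) - 1.0)


# Model 1 coefficients for selected dummy variables (Table 11.3).
model1 = {"hol": 0.052, "midweek": -0.067, "sky": -0.067, "sheffd": 0.507}

for name, coef in model1.items():
    print(f"{name:8s} coef = {coef:+.3f}  effect = {percent_effect(coef):+.1f}%")
```

On these figures a bank holiday fixture is associated with roughly 5% higher expected attendance, and a midweek or live-televised fixture with roughly 6.5% lower, other covariates held fixed.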

11.7 THE IMPACT OF THE PLAY-OFF SYSTEM ON REGULAR LEAGUE ATTENDANCES

Given a theoretical model, a measure of the expected values of the play-off related variables under an automatic promotion regime and estimates of the relationship between these variables and attendance, it is possible to predict the expected attendances for each match that would have occurred had the play-off system not been implemented, by replacing the observed play-off variable values with the estimated counterfactuals, as shown in equation (11.10). As previously stated, this requires the assumption that $(\mathrm{prom} \mid m)$ has no impact on ${}_{it}(m)$.

$$\mu^{auto}_{ijt} = \exp(x^{auto}_{it}\beta_i + x^{auto}_{jt}\beta_j + z_t\beta_z + \bar{x}^{po}_{.t}\beta_{end} + v_i). \qquad (11.10)$$

Of particular note within this equation is the retention of the original values of $\bar{x}^{po}_{.t}$ rather than a new set $\bar{x}^{auto}_{.t}$. This is because the original values of $\bar{x}^{po}_{.t}$ (in combination with $\beta_{end}$) capture a component of a home team's time invariant heterogeneity rather than the impact of the play-off variables. The underlying heterogeneity is assumed to apply in different hypothetical settings and so should remain constant across designs.

TABLE 11.4
Incremental impact on aggregate attendance over automatic promotion regime.

Model     Description            Impact on aggregate attendance   % increase over auto regime
Model 1   All variables                   71569                     0.91%
Model 2   No league positions             54595                     0.69%
Model 3   Non-zero prob only             153024                     1.97%
Model 4   Promotion prob only            -19291                    -0.24%
Model 5   Significance only              -14078                    -0.18%

In the 2000/01 season, the aggregate attendance across all teams was 7909514. Table 11.4 indicates the estimated incremental impact the play-off regime has made relative to what would have occurred had an automatic top-three promotion scheme been in operation. In all five models the incremental difference is rather modest, ranging from an increase of 1.97% to a decrease of 0.24%. Models 3 to 5 provide limits to the uncertainty attributable to the correlation between promotion related variables. Model 3 assumes that there is no impact of a redistribution of probability or significance and that the only impact occurs through the creation of new non-zero probability games; it therefore presents the play-off system in its "best" light, assuming that there are no counteracting effects of redistributing probability from large attendance generating teams to smaller teams. In this model, no team is worse off than under the automatic regime and the overall impact is an additional 153024 attendees over the full regular season of 552 games, an average of an additional 277 attendees per game. The increase is not even across teams: teams such as Fulham (+1940) and Bolton (+2206) have increased aggregate home attendances through the impact on the away teams, whereas teams such as Norwich (+13689) and Sheffield United (+10783) gain more substantially by having far more non-zero probability games (both clubs had slim chances of obtaining a play-off place) and being reasonably big attendance producing clubs.
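The substitution in equation (11.10), on which all of these model comparisons rest, can be sketched as follows. The covariate names, coefficient values and match values below are hypothetical, chosen only to show the mechanics: the play-off related covariates for the home and away team are swapped for their automatic-regime counterparts, while the match characteristics, the play-off regime team means and the random effect are held fixed.

```python
import math


def dot(values, coefs):
    """Sum of value * coefficient over the variables named in `values`."""
    return sum(values[name] * coefs[name] for name in values)


def expected_attendance(x_home, x_away, z, xbar_po, v_i, beta):
    """exp() of the linear predictor: home, away and match terms, plus the
    team-mean term and random effect that capture team heterogeneity."""
    eta = (dot(x_home, beta["home"]) + dot(x_away, beta["away"])
           + dot(z, beta["match"]) + dot(xbar_po, beta["end"]) + v_i)
    return math.exp(eta)


# Hypothetical coefficients and one hypothetical match (illustration only).
beta = {
    "home": {"nzp": 0.06, "p": 0.30},
    "away": {"nzp": 0.01, "p": 0.20},
    "match": {"const": 9.0, "midweek": -0.07},
    "end": {"av_p": -2.5},
}
z = {"const": 1.0, "midweek": 1.0}
xbar_po = {"av_p": 0.10}   # play-off regime team mean, kept in both runs
v_i = 0.15                 # home team random effect, kept in both runs

x_home_po, x_away_po = {"nzp": 1.0, "p": 0.05}, {"nzp": 1.0, "p": 0.20}
x_home_auto, x_away_auto = {"nzp": 0.0, "p": 0.00}, {"nzp": 1.0, "p": 0.25}

mu_po = expected_attendance(x_home_po, x_away_po, z, xbar_po, v_i, beta)
mu_auto = expected_attendance(x_home_auto, x_away_auto, z, xbar_po, v_i, beta)
print(f"play-off: {mu_po:.0f}, automatic: {mu_auto:.0f}, "
      f"difference: {mu_po - mu_auto:+.0f}")
```

Under the log link the held-fixed terms cancel from the ratio of the two predictions, so the proportional change depends only on the swapped play-off covariates, while the absolute change scales with the team's baseline attendance. Summing the match-level differences over a season would give an aggregate figure of the kind reported in Table 11.4.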
Model 4 represents the play-offs in their worst possible light, whereby the impact of the play-offs occurs through the redistribution of promotion probabilities. In this case a reduction in attendance of 0.24% is estimated, or a reduction of 35 fans per game. Again the reduction is not felt evenly across teams, with Bolton (-28797) and Blackburn (-21343) losing the most over the season, reflecting the reduction in the probability of promotion for these two teams contesting second and third positions. Attendance during the post-season play-off games themselves may compensate for this loss, but for Blackburn, the team which eventually finished second, there would be no such compensating attendance. Fulham (-463), who only had relatively small probabilities of finishing 3rd, are thus only marginally affected. The teams that would gain the most if this were the correct model are West Bromwich Albion (+10615) and Preston North End (+10626), teams which spent much of the season in the lower play-off positions, fifth and sixth.

The outcome of model 5, where the only consequence of the play-offs is assumed to be a redistribution of significance, is not as theoretically predictable as in models 3 and 4. The empirical estimate is actually a reduction of 0.18% or 26 attendees per game. The ambiguity of the anticipated effect is illustrated by the expected impact on Fulham (+2081). With a play-off system there is a significant difference between finishing second and third and little between third and fourth, whereas with an automatic promotion scheme there is a significant difference between finishing third and fourth and not between second and third. For Fulham, as they had a very low probability of finishing fourth, the shift of significance "up" the table meant that more of Fulham's games had greater overall significance. The opposite is true for Birmingham (-20981), who had a good chance of finishing third (they spent most of the season in fourth place). As the play-off system had removed much of the significance between finishing third and fourth, it thus removed much of the significance in Birmingham matches. Had the positions been reversed and larger club Birmingham had the season that smaller club Fulham had (and vice versa), the greater efficiency of Birmingham in turning significance into attendance may have reversed the overall picture. Bolton (-10168) are the other big losers whereas Sheffield United (+5311) and Preston (+4143) are the biggest gainers.

Models 1 and 2 include all play-off related variables and so allow for a net effect of the counteracting play-off related variables. They estimate the net impact of the play-off regime to be an increase in attendance of 0.91% and 0.69% respectively, i.e. an increase of 71569 and 54595, or 130 and 99 attendees per game. This indicates that the production of more non-zero probability games (keeping teams theoretically in contention) outweighs the redistribution of probability and significance from the larger to smaller clubs, at least in the season 2000/01. However, the overall effect is rather modest. Across teams, both models predict that Bolton, Birmingham, Blackburn, Crewe and Fulham (in model 2 only) lose attendance (Crewe via the impact on away teams), with Bolton (-27846), Blackburn (-15635) and Birmingham (-13793) being the major losers (though two of the three will gain attendance from the play-off games themselves), whereas all other teams gain smaller attendances that are, in aggregate, sufficient to overcome the losses suffered by the bigger clubs.
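The per-game and percentage figures quoted in this section follow from Table 11.4 by simple arithmetic. The sketch below reproduces them under one assumption that the text does not state explicitly: that the percentage is taken over the implied automatic-regime attendance, i.e. the observed aggregate minus the estimated impact. The helper names are illustrative.

```python
AGGREGATE_ATTENDANCE = 7_909_514   # observed 2000/01 total under the play-offs
GAMES = 552                        # regular season matches

# Estimated impact on aggregate attendance by model (Table 11.4).
impact = {
    "Model 1": 71_569,
    "Model 2": 54_595,
    "Model 3": 153_024,
    "Model 4": -19_291,
    "Model 5": -14_078,
}


def per_game(delta):
    """Average change in attendance per regular season match."""
    return delta / GAMES


def pct_over_auto(delta):
    """Impact as a percentage of the implied automatic-regime total
    (assumed base: observed aggregate minus the estimated impact)."""
    return 100.0 * delta / (AGGREGATE_ATTENDANCE - delta)


for model, delta in impact.items():
    print(f"{model}: {per_game(delta):+8.1f} per game, "
          f"{pct_over_auto(delta):+.3f}% over automatic regime")
```

With this base the per-game figures round to 130, 99, 277, -35 and -26, and the percentages agree with the printed table to within rounding; Model 2 comes out at about 0.695%, which the table reports as 0.69%.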

11.8 CONCLUSIONS

The original research question aimed to analyse the extent to which post-season playoff systems inﬂuenced attendance at regular league matches. The analysis was applied to a single season of English league football and conducted in three steps:

200

Statistical Thinking in Sports

construction of a simple theoretical model identifying play-off relevant parameters (promotion probability; non-zero probability; signiﬁcance of match); a statistical estimate of these parameter values and the effect on attendance; and ﬁnally, a predication of attendances that would be observed given the values of the promotion variables implied by an automatic promotion regime. The theoretical model identiﬁes that play-off related variables may act against each other and thus makes the impact of play-offs an empirical matter. Section 11.3 identiﬁed the practical means of measuring the play-off related parameters in this analysis. Lack of ideal data has required construction of imperfect measures using ex-ante betting data in constructing the expectations of ﬁnal league positions. This will probably introduce an element of right-hand side variable measurement error that is largest during the early part of the season (where the time difference between actual and ideal variable measurement is largest and expectations are a greater function of the simulation). Whilst it is expected further research may target this issue and produce more reﬁned measurements, inspection of matches which the ﬂawed measure identify as signiﬁcant feel intrinsically correct and incorporate current league positions, league structure, remaining ﬁxtures and some reasonable rational means of incorporating how the remaining ﬁxtures will be resolved and how they will impact on the ﬁnal table. The empirical content of the paper is provided by analysis of the 2000/2001 English 1st division season which is described in section 11.4. The use of a single season means that inference may be limited to that season alone and, in addition, may have contributed to issues of multicollinearity. However, the season does not look unduly exceptional, with larger teams ﬁnishing towards the top of the division, and so the results may be expected to replicated over other seasons. 
The descriptive section 11.4 also identifies the data issues involved: strictly non-zero, skewed and heteroscedastic attendance data, clustered within teams as a result of unobserved heterogeneity between teams. The implications are discussed in section 11.5, which concludes that a GLM framework is appropriate because it allows unbiased predictions of the impact on attendance, something the more common log transformation solutions do not permit. The GLM framework is supplemented with random effects and a correction for endogeneity. Within the model, a log link function allows clubs with different market sizes to have heterogeneous responses to determinants of demand, and the choice of a Gamma distribution allows for unbiased estimates and predictions despite the skewed, non-zero and heteroscedastic nature of the dependent variable. Though there does appear to be an issue with multicollinearity, and the data may be limited in the extent to which they can separate out the individual effects of the elements of promotion probability and significance, sensitivity analyses show that the substantive overall results are generally robust to this issue. The estimation finds that two elements of match significance are consistently significant: the non-zero probability of a game and the probability of promotion at the time of the game. The difference a match may make to the probability is of the correct sign but not significant unless the highly correlated variables are omitted from the specification. The data also permit a detailed description of how the play-off system has reallocated probability and significance from bigger to smaller clubs, leading to a potential for play-offs to actually reduce aggregate attendance. However, given the estimated relationship between game characteristics and attendance, the impact of creating more games where teams are theoretically still in contention for an end of season outcome is sufficient for play-offs to have increased overall attendance.

In total, the estimated impact is a rather modest increase in regular-season attendance of between 0.69% and 0.91%. This supports the notion that the English system has created additional attendance, but by an amount that may be smaller than anticipated. Two reasons stand out. First, although statistically significant, the play-off related variables do not have a great practical impact on the attendance “production” function: a model which assumes no impact of the redistribution of probability still estimates a limited positive impact of just 1.79%. Such results may indicate the limitations that other attendance generating policies may have. Second, although the net impact is positive, the redistribution effect reduces the overall gain: the bigger teams occupying the higher play-off qualification positions (3rd and 4th) and 2nd place lose amounts of attendance that are not matched by any other individual team's gain and are eclipsed only by the combined gain of all other teams.
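The point that log transformation solutions do not give unbiased predictions on the original scale can be illustrated with a small simulation. The sketch below is not the chapter's estimation: the data-generating process, coefficient values and variable names are assumptions chosen for illustration. It fits a least-squares line to log "attendance", shows that naively exponentiating the fitted values under-predicts average attendance, and applies Duan's smearing factor as one retransformation correction. A GLM with a log link models the mean on the original scale directly, so no such correction step is needed.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x (closed form)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


rng = random.Random(42)
n = 20000
sigma = 0.6  # assumed noise level on the log scale

# Hypothetical process: log attendance is linear in one covariate plus normal noise.
x = [rng.random() for _ in range(n)]
y = [math.exp(9.0 + 0.5 * xi + rng.gauss(0.0, sigma)) for xi in x]

# Regress log(y) on x, as in the common log transformation approach.
log_y = [math.log(yi) for yi in y]
intercept, slope = fit_line(x, log_y)
fitted_log = [intercept + slope * xi for xi in x]
residuals = [ly - f for ly, f in zip(log_y, fitted_log)]

# Naive retransformation: exponentiate the fitted log values.
naive = [math.exp(f) for f in fitted_log]

# Smearing factor: the average of the exponentiated residuals.
smearing_factor = sum(math.exp(r) for r in residuals) / n
smeared = [smearing_factor * v for v in naive]

mean_actual = sum(y) / n
mean_naive = sum(naive) / n
mean_smeared = sum(smeared) / n

print("naive / actual:  ", mean_naive / mean_actual)
print("smeared / actual:", mean_smeared / mean_actual)
```

With normal errors on the log scale, the naive ratio should sit near exp(-sigma^2 / 2), while the smeared ratio should be close to one. The smearing correction relies on the residual distribution being the same across observations, an assumption that heteroscedastic attendance data would strain.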


References

Albert, J. (2001). Using play-by-play baseball data to develop a better measure of batting performance. Bowling Green State University. Online at bayes.bgsu.edu/papers/ rating paper2.pdf. Albert, J. (2002). Hitting with runners in scoring position. Chance 15, 8–16. Albert, J. (2003). Teaching Statistics Using Baseball. Washington, DC: Mathematical Association of America. Albert, J. (2005). Does a baseball hitter’s batting average measure ability or luck? Stats 44. Albert, J. and J. Bennett (2003). Curve Ball: Baseball, Statistics and the Role of Chance in the Game (revised ed.). New York: Springer-Verlag. Albright, S.C. (1993). A statistical analysis of hitting streaks in baseball. Journal of the American Statistical Association 88, 1175–1183 (with discussion). Anderson, C.L. (1977). Note on the advantage of ﬁrst serve. Journal of Combinatorial Theory A 23, 363. Audas, R., S. Dobson, and J. Goddard (2002). The impact of managerial change on team performance in professional sports. Journal of Economics and Business 54, 633–650. Baimbridge, M., S. Cameron, and P. Dawson (1996). Satellite television and the demand for football: A whole new ball game? Scottish Journal of Political Economy 43, 317–333. Barnett, V. and S. Hilditch (1993). The effect of an articiﬁal pitch surface on home team performance in football (soccer). Journal of the Royal Statistical Society, A 156, 39–50. Bennett, J. (Ed.) (1998). Statistics in Sport. New York: Oxford University Press. Berri, D.J. (2004). A simple measure of worker productivity in the National Basketball Association. Mimeo. Berri, D.J., S. Brook, A. Fenn, B. Frick, and R. Vicente-Mayoral (2005). The short supply of tall people: Explaining competitive imbalance in the National Basketball Association. Journal of Economic Issues 39(4), 1029–1041.

277

278

References

Berri, D.J. and E. Eschker (2005). Performance when it counts? The myth of the prime-time performer in the NBA. Journal of Economics Issues 39(3), 798–807. Berri, D.J. and A. Fenn (2004). Is the sports media color-blind? Presented at the Southern Economic Association; New Orleans, Louisiana. Berri, D.J. and A. Krautmann (2006). Shirking on the court: Testing for the disincentive effects of guaranteed pay. Economic Inquiry 44(3), 536–546. Berri, D.J. and M.B. Schmidt (2006). On the road with the National Basketball Associations superstar externality. Journal of Sports Economics 7, 347–358. Berri, D.J., M.B. Schmidt, and S.L. Brook (2004). Stars at the gate: The impact of star power on NBA gate revenues. Journal of Sports Economics 5(1), 33–50. Berri, D.J., M. Schmidt, and S. Brook (2006). The Wages of Wins. Stanford: Stanford University Press. Berry, S.M. (2000). My triple crown. Chance 13(2), 56–61. Berry, S.M., C.S. Reese, and P.D. Larkey (1999). Bridging different eras in sports. Journal of the American Statistical Association 94(447), 661–676. Bertalanffy, L. von (1938). A quantitative theory of organic growth. Human Biology 10, 181–213. Bissinger, B. (2005). Three Nights in August: Strategy, Heartbreak and Joy Inside the Mind of a Manager. Mariner Books. Blass, A.A. (1992). Does the baseball labor market contradict the human capital model of investment? The Review of Economics and Statistics 74(2), 261–268. Blest, D.C. (1996). Lower bounds for athletic performances. The Statistician 45, 243–253. Borghans, L. (1995). Keuzeprobleem op Centre Court. Economisch Statistische Berichten 80, 658–661. Borland, J. and R. Macdonald (2003). Demand for sport. Oxford Review of Economic Policy 19, 478–502. Boulier, B. and H. Stekler (2003). Predicting the outcomes of National Football League games. International Journal of Forecasting 19, 257–270. Brain, P. and R. Cousens (1989). An equation to describe dose responses where there is stimulation of growth at low dose. 
Weed Research 29, 93–96. Bray, S.R., J. Obrara, and M. Kwan (2005). Batting last as a home advantage factor in men’s NCAA tournament baseball. Journal of Sports Sciences 23(7), 681–686. Burnham, K.P. and D.R. Anderson (2002). Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach (2nd. ed. ed.). New York: Springer-Verlag.

References

279

Cain, M., D. Law, and D. Peel (2000). The favourite-longshot bias and market efﬁciency in UK football betting. Scottish Journal of Political Economy 47, 25–36. Cairns, J. (1990). The demand for professional team sports. British Review of Economic Issues 12(28), 1–20. Carlin, B.P. (1996). Improved NCAA basketball tournament modeling via point spread and team strength information. The American Statistician 50, 39–43. Charnes, A., W.W. Cooper, and E. Rhodes (1978). Measuring the efﬁciency of decision making units. European Journal of Operational Research 2, 429–444. Chatterjee, S. and S. Chatterjee (1982). New lamps for old: An exploratory analysis of running times in Olympic Games. Applied Statistics 31, 14–22. Clarke, S.R. (1993). Computer forecasting of Australian Rules Football for a daily newspaper. Journal of the Operational Research Society 44, 753–799. Clarke, S.R. and J.M. Norman (1995). Home ground advantage of individual clubs in English soccer. The Statistician 44(4), 509–521. Courneya, K.S. and A.V. Carron (1990). Batting ﬁrst versus last: Implications for the home advantage. Journal of Sport and Exercise Psychology 12, 312–316. Cramer, R.D. (1977). Do clutch hitters exist? Baseball Research Journal 2. Croskey, M. A., P. M. Dawson, A. C. Luessen, I. E. Marohn, and H. E. Wright (1922). The height of the center of gravity of man. American Journal of Physiology 61, 171– 185. Crowder, M., M. Dixon, A. Ledford, and M Robinson (2002). Dynamic modelling and prediction of English Football League matches for betting. The Statistician 51, 157–168. Czarnitzki, D. and G. Stadtmann (2002). Uncertainty of outcome versus reputation: Empirical evidence for the ﬁrst German football division. Empirical Economics 27, 101–112. Dart, J. and J. Gross (2005). Does it really matter where you ﬁnish in the play-offs? The Guardian. Dawson, P., S. Dobson, J. Goddard, and J. Wilson (2007). Are football referees really biased and inconsistent? 
Evidence on the incidence of disciplinary sanctions in the English Premier League. Journal of the Royal Statistical Society Series A 170, 231–250. De Boer, R. W., J. Cari, W. Vaes, J. P. Clarijs, A. P. Hollander, G. De Groot, and G. J. Van Ingen Schenau (1987). Moments of force, power, and muscle coordination in speed skating. International Journal of Sports Medicine 8(6), 371–378. De Boer, R. W., G. J. Ettema, H. Van Gorkum, G. De Groot, and G. J. Van Ingen Schenau (1987). Biomechanical aspects of push off techniques in speed skating the curves. International Journal of Sports Biomechanics 3, 69–79.

280

References

De Koning, J.J., G. De Groot, and G.J. Van Ingen Schenau (1989). Mechanical aspects of the sprint start in olympic speed skating. International Journal of Sports Biomechanics 5, 151–168. De Koning, J.J., G. De Groot, and G.J. Van Ingen Schenau (1991). Coordination of leg muscles during speed skating. Journal of Biomechanics 24(2), 137–146. Deakin, M.A.B. (1967). Estimating bounds on athletic performance. Mathematics Gazette 51, 100–103. Dixon, M.J. and S.C. Coles (1997). Modelling association football scores and inefﬁciencies in the football betting market. Applied Statistics 46, 265–280. Dixon, M.J. and P.F. Pope (2004). The value of statistical forecasts in the UK association football betting market. International Journal of Forecasting 20, 697–711. Dixon, M.J. and M.E. Robinson (1998). A birth process for association football matches. The Statistician 47, 523–538. Dobson, S. and J. Goddard (2001). The Economics of Football. Cambridge: Cambridge University Press. Duan, N. (1983). Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78, 605–610. Dubin, C.L. (1990). Commission of Inquiry into the Use of Drugs and Banned Practices Intended to Increase Athletic Performance. Ottawa: Canadian Government Publishing Center. Dyte, D. and S.R. Clarke (2000). A ratings based Poisson model for World Cup simulation. Journal of the Operational Research Society 51, 993–998. Efron, B. and C. Morris (1977). Stein’s paradox in statistics. Scientiﬁc American 236(5), 119–127. Fizel, J. (2005). The National Football League conundrum. In J. Fizel (Ed.), The Handbook of Sports Economics Research, pp. 170–171. M.E. Sharpe, Inc. Forrest, D., J. Beaumont, J. Goddard, and R. Simmons (2005). Home advantage and the debate about competitive balance in professional sports leagues. Journal of Sports Sciences 23(4), 439–445. Forrest, D., J. Goddard, and R. Simmons (2005). Odds setters as forecasters. 
International Journal of Forecasting 21, 551–564. Forrest, D. and R. Simmons (2000a). Forecasting sport: the behaviour and performance of football tipsters. International Journal of Forecasting 16, 316–331. Forrest, D. and R. Simmons (2000b). Making up the results: the work of the Football Pools Panel, 1963-1997. The Statistican 49(2), 253–260. Forrest, D. and R. Simmons (2002). Outcome uncertainty and attendance demand in sport: The case of English soccer. The Statistician 51(1), 13–38.

References

281

Forrest, D., R. Simmons, and B. Buraimo (2005). Outcome uncertainty and the couch potato audience. Scottish Journal of Political Economy 52, 641–661. Fox, D. (2005, November 10). Tony LaRussa and the search for signiﬁcance. www.hardballtimes.com/main/article/ tony-larussa-and-the-search-for-significance. Francis, A.W. (1943). Running records. Science 98, 315–316. Gale, D. (1980). Optimal strategy for serving in tennis. Mathematics Magazine 44, 197–199. Garcia, J. and P. Rodriguez (2002). The determinants of football match attendance revisited: Empricial evidence from the Spanish football league. Journal of Sports Economics 3(1), 18–38. Gardner, P. (2005). DC United outdo Adu. World Soccer, 22–23. Gelman, A., J.B. Carlin, H.S. Stern, and D B. Rubin (2003). Bayesian Data Analysis (2nd ed.). Boca Raton: CRC Press/Chapman & Hall. George, S.L. (1973). Optimal strategy in tennis: A simple probabilistic model. Applied Statistics 22, 97–104. Gillman, L. (1985). Missing more serves may win more points. Mathematics Magazine 58, 222–224. Goddard, J. (2005). Regression models for forecasting goals and match results in association football. International Journal of Forecasting 21, 331–340. Goddard, J. and I. Asimakopoulos (2004). Forecasting football match results and the efﬁciency of ﬁxed-odds betting. Journal of Forecasting 23, 51–66. Goddard, J. and S. Thomas (2006). The efﬁciency of the UK ﬁxed-odds betting market for Euro 2004. International Journal of Sports Finance 1, 21–32. Gompertz, B. (1825). On the nature of the function expressive of the law of human mortality, and on a new mode of determining life contingencies. Philosophical Transactions of the Royal Society of London 115, 513–585. Grabiner (1993). Clutch hitting study. grabiner/fullclutch.html. www.baseball1.com/bb-data/

Greene, W.H. (2003). Econometric Analysis (5th ed.). Prentice Hall. Grifﬁths, R.C. and R.K. Milne (1978). A class of bivariate Poisson processes. Journal of Multivariate Analysis 8, 380–395. Grubb, H.J. (1998). Models for comparing athletic performances. The Statistician 47, 509–521. Haan, M.A., R.H. Koning, and A. van Witteloostuijn (2007). The effects of institutional change in European soccer. Manuscript.

282

References

Hannan, E.L. (1976). An analysis of different serving strategies in tennis. In R.E. Machol, S.P. Ladany, and D.G. Morrison (Eds.), Management Science in Sports, pp. 125–135. New York: North-Holland. Hart, R., J. Hutton, and T. Sharot (1975). A statistical analysis of association football attendance. Journal of the Royal Statistical Society, Series C 24, 17–27. Hastie, T., R. Tibshirani, and J. Friedman (2001). The Elements of Statistical Learning. New York: Springer. Hellebrandt, F.A. and E.B. Franssen (1943). Physiological study of the vertical standing of man. Physiology Review 23, 220–255. Hill, A.V. (1913). The combinations of haemoglobin with oxygen and with carbon monoxide. Biochemistry 7, 471–480. Hodak, G.A. (1988). Gordon H. Adam, 1936 Olympic Games. Olympic Oral History Project, Amateur Athletic Foundation of Los Angeles. Howard, R.A. (1960). Dynamic Programming and Markov Processes. Cambridge, MA: MIT Press. Insley, R., L. Mok, and T.B. Swartz (2004). Issues related to sports gambling. The Australian and New Zealand Journal of Statistics 46, 219–232. Iwaoka, K., H. Hatta, Y. Atomi, and M. Miyashita (1988). Lactate, respiratory compensation thresholds, and distance running performance in runners of both sexes. International Journal of Sports Medicine 9(5), 306–309. Jackson, D.A. (1989). Letter to the editor on “Probability models for tennis scoring systems” by L.H. Riddle. Applied Statistics 38, 377–378. Jacoby, E. and B. Fraley (1995). Complete Book of Jumps. Champaign, IL: Human Kinetics. James, B. (1984). The Bill James Baseball Abstract. New York: Ballantine Books. James, B. (2006). The Bill James Handbook 2007. Skokie, IL.: ACTA Sports. Jang, K. T., M. G. Flynn, D. L. Costill, J. A. Kirwin, J. P.and Houmard, J. B. Mitchell, and L. J. D’Acquisto (1987). Energy balance in competitive swimmers and runners. Journal of Swimming Research 3, 19–23. Janoschek, A. (1957). 
Das reaktionskinetische Grundgesetz und seine Beziehungen zum Wachstums- und Ertragsgesetz. Statistische Vierteljahresschrift 10, 25–37. Karlis, D. and I. Ntzoufras (2003). Analysis of sports data by using bivariate Poisson models. The Statistician 52, 381–393. Kennedy, P. (2004). A Guide to Econometrics (5th ed.). Oxford: Blackwell Publishing. Kennelly, A.E. (1905). A study of racing animals. The American Academy of Arts and Sciences.

References

283

Kingston, J.G. (1976). Comparison of scoring systems in two-sided competitions. Journal of Combinatorial Theory 20, 357–362. Klaassen, F.J.G.M. and J.R. Magnus (2001). Are points in tennis independent and identically distributed? Evidence from a dynamic binary panel data model. Journal of the American Statistical Association 96, 500–509. Klaassen, F.J.G.M. and J.R. Magnus (2006). Are economic agents successful maximizers? An analysis through service strategy in tennis. Submitted for publication. Koning, R.H. (2000). Balance in competition in Dutch soccer. The Statistician 49(3), 419–431. Koop, G. (2004). Modelling the evolution of distributions: An application to Major League Baseball. Journal of the Royal Statistical Society, Series A 167, 639–655. Kuper, G.H. and E. Sterken (2003). Endurance in speed skating: The development of world records. European Journal of Operational Research 148(2), 293–301. Kuypers, T. (1996). The beautiful game? An econometric study of why people watch English football. Discussion Paper 96-01, Department of Economics, University College London, London. Kuypers, T. (1997). The beautiful game? An econometric study of audiences, gambling and efﬁciency in English football. Kuypers, T. (2000). Information and efﬁciency: an empirical study of a ﬁxed odds betting market. Applied Economics 32, 1353–1363. Lane, F.C. (1925). Batting. Cleveland, Ohio: SABR. Lee, A. (1999). Modelling rugby league data via bivariate negative binomial regression. Australian and New Zealand Journal of Statistics 41, 153–171. Lee, Y.H. and D.J. Berri (forthcoming). A re-examination of production functions and efﬁciency estimates for the National Basketball Association. Scottish Journal of Political Economy. Lehmann, E. L. and G. Casella (2003). Theory of Point Estimation (2nd ed.). New York: Springer. Lerner, L (1996). Physics for scientists and engineers. Boston: Jones and Bartlett. Lewis, M. (2004). Moneyball: The Art of Winning an Unfair Game. New York: W.W. 
Norton & Company. Lietzke, M.H. (1954). An analytical study of world and Olympic racing records. Science 119, 333–336. Linhart, H. and W. Zucchini (1986). Model Selection. New York: John Wiley and Sons.

284

References

Linthorne, N.P (1999, 31 October-5 November). Optimum throwing and jumping angles in athletics. In Proceedings 5th IOC Congress on Sport Sciences with Annual Conference of Science and Medicine in Sport, Sydney, Australia. Little, A. (1995). Wimbledon Compendium 1995. All England Lawn Tennis and Croquet Club. Loland, R. (2001). Fair Play in Sport. London: Routledge. Long, J.S. and J. Freese (2003). Regression Models for Categorical Dependent Variables using Stata. College Station, TX: Stata Press. Lutoslawska, G., A. Klusiewics, D. Sitkowsi, and B. Krawczyk (1996). The effect of simulated 2 km laboratory rowing on blood lactate, plasma inorganic phosphate and ammonia in male and female junior rowers. Biology of Sport 13(1), 31–38. Mabel’s Tables (2003). Football yearbook. www.mabels-tables.com. MacKinnon, J., H. White, and R. Davidson (1983). Tests for model speciﬁcation in the presence of alternative hypothesis: Some further results. Journal of Econometrics 21, 53–70. Maddala, G.S. (2001). Introduction to Econometrics (3rd ed.). Chichester: John Wiley. Magnus, J.R. and F.J.G.M. Klaassen (1999a). The effect of new balls in tennis: Four years at Wimbledon. The Statistician 48, 239–246. Magnus, J.R. and F.J.G.M. Klaassen (1999b). The ﬁnal set in a tennis match: Four years at Wimbledon. Journal of Applied Statistics 26, 461–468. Magnus, J.R. and F.J.G.M. Klaassen (1999c). On the advantage of serving ﬁrst in a tennis set: Four years at Wimbledon. The Statistician 48, 247–256. Maher, M.J. (1982). Modelling association football scores. Statistica Neerlandica 36, 109–118. Maisel, H. (1966). Best k of 2k Association 61, 329–344. 1 comparison. Journal of the American Statistical

Malthus, T.R. (1798). An Essay on the Principle of Population. Library of Economics and Liberty. Manning, W.G. and J. Mullahy (2001). Estimating log models: To transform or not to transform? Journal of Health Economics 20, 461–494. Maud, P.J. and B.B. Schultz (1986). Gender comparisons in anaerobic power and anaerobic capacity tests. British Journal of Sports Medicine 20(2), 51–54. McArdle, W., F.I. Katch, and V.L. Katch (1981). Exercise Physiology. Philadelphia: Lea and Febiger. McCracken, V. (2001). Pitching and defense: how much control do hurlers have? Baseball Prospectus. www.baseballprospectus.com.

References

285

McCullagh, P. and J.A. Nelder (1989). Generalized Linear Models (2nd ed.). Monographs on statistics and applied probability 37. Chapman and Hall. McFadden, D. (1984). Econometric analysis of qualitative choice models. In Z. Griliches and M.D. Intrilligator (Eds.), Handbook of Econometrics, Volume II, Chapter 24. Amsterdam: North-Holland. McGinnis, P.M. (1991). Biomechanics and pole vaulting: Run fast and hold high. In Proceedings of the American Society of Biomechanics, 15th Annual Meeting, Tempe, AZ, pp. 16–17. McGinnis, P.M. (1997). Mechanics of the pole vault take-off. New Studies in Athletics 12(1), 43–46. McHale, I.G. and P.A. Scarf (2006). Modelling soccer matches using bivariate discrete outcomes. Technical report, University of Salford. McQuarrie, A.D.R. and C.-L. Tsai (1998). Regression and Time Series Model Selection. Singapore: World Scientiﬁc. Miles, R.E. (1984). Symmetric sequential analysis: The efﬁciencies of sports scoring systems (with particular reference to those of tennis). Journal of the Royal Statistical Society, Series B 46, 93–108. Morrison, D.G. and M.U. Kalwani (1993). The best NFL ﬁeld goal kickers: Are they lucky or good? Chance 6(3), 30–37. Mosteller, F. and J.W. Tuckey (1977). Data Analysis and Regression. AddisonWesley. Nevill, A.M., N.J. Balmer, and A.M. Williams (2002). The inﬂuence of crowd noise and experience upon refereeing decisions in football. Psychology of Sport and Exercise 3, 261–272. Nevill, A.M and G. Whyte (2005). Are there limits to running world records? Medicine and Science in Sports and Exercise 37, 1785–1788. NFL Record & Fact Book (Various editions). Noll, R.G. (2003). The organization of sports leagues. Oxford Review of Economic Policy 19(4), 530–551. Norman, J.M. (1985). Dynamic programming in tennis — when to use a fast serve. Journal of Operational Research Society 36, 75–77. Paton, D. and A. Cooke (2005). Attendance at county cricket: An economic analysis. Journal of Sports Economics 6, 24–45. Peel, D.A. 
and D.A. Thomas (1988). Outcome uncertainty and the demand for football. Scottish Journal of Political Economy 35, 242–249. Peel, D.A. and D.A. Thomas (1992). The demand for football: Some evidence on outcome uncertainty. Empirical Economics 17, 323–331.

286

References

Peel, D.A. and D.A. Thomas (1996). Attendance demand: An investigation of repeat ﬁxtures. Applied Economics Letters 3(6), 391–394. Pollard, R. (1986). Home advantage in soccer: A retrospective analysis. Journal of Sports Sciences 4, 237–248. Pollard, R. (2002). Evidence of a reduced home advantage when a team moves to a new stadium. Journal of Sports Sciences 20(12), 969–973. Pollard, R. (2006). Worldwide regional variations in home advantage in association football. Journal of Sports Sciences 24(3), 231–240. Pope, P.F. and D.A. Peel (1989). Information, prices and efﬁciency in a ﬁxed-odds betting market. Economica 56, 323–341. Quirk, J. and R.D. Fort (1992). Pay Dirt The Business of Professional Team Sports. Princeton: Princeton University Press. Rabe-Hesketh, S., A. Skrondal, and A. Pickles (2004). GLLAMM Manual. U.C. Berkeley Division of Biostatistics Working Paper Series. Rascher, D. (1999). A test of the optimal positive production network externality in Major League Baseball. In J. Fizel, E. Gustafson, and L. Hadley (Eds.), Sports Economics: Current Research, Westport, CT. Praeger. Ratkowsky, D.A. (1990). Dekker. Handbook of Nonlinear Regression Models. Marcel

Reep, R. and B. Benjamin (1968). Skill and chance in Association Football. Journal of the Royal Statistical Society A 131, 581–585. Richards, F.J. (1959). A ﬂexible growth function for empirical use. Journal of Experimental Botany 10, 290–300. Ridder, G., J.S. Cramer, and P. Hopstaken (1994). Down to ten: Estimating the effect of a red card in soccer. Journal of the American Statistical Association 89, 1124– 1127. Riddle, L.H. (1988). Probability models for tennis scoring systems. Applied Statistics 37, 63–75 & 490. Riddle, L.H. (1989). Author’s reply to D.A. Jackson. Applied Statistics 38, 378–379. Ruane, T. (2005). In search of clutch hitting. retrosheet.org/Research/ RuaneT/clutch_art.htm. Rue, H. and O. Salvesen (2000). Prediction and retrospective analysis of soccer matches in a league. The Statistician 49, 399–418. Schabenberger, O., B.E. Tharp, J.J. Kells, and D. Penner (1999). Statistical tests for Hormesis and effective dosages in herbicide dose response. Agronomy Journal 91, 713–721.

References

287

Schnute, J. (1981). A versatile growth model with statistically stable parameters. Canadian Journal of Fishery and Aquatic Science 38, 1128–1140. Scully, G.W. (1974). Pay and performance in major league baseball. American Economic Review 64, 915–930. Silver, N. (2004). Lies, damned lies: The unique Ichiro. Baseball Prospectus. www. baseballprospectus.com. Silver, N. (2006). Is David Ortiz a clutch hitter? In J. Keri (Ed.), Baseball Between the Numbers, Chapter 1-2, pp. 14–34. New York: Basic Books. Simon, G.A. and J.S. Simonoff (2006). “Last licks”: Do they really help? The American Statistician 60(1), 13–18. Simonoff, J.S. (2003). Analyzing Categorical Data. New York: Springer-Verlag. Skrondal, A. and S. Rabe-Hesketh (2004). Generalized Latent Variable Modeling: Multilevel, Longitudinal and Structural Equation Models. Chapman & Hall/CRC. Smith, R.L. (1988). Forecasting records by maxiumum likelihood. Journal of the American Statistical Association 83, 331–388. StataCorp (2003). Stata Statistical Software: Release 8. College Station, TX: StataCorp LP. Statmail (2003). Statmail. www.statmail.co.uk. Stefani, R.T. (1987). Applications of statistical methods to American Football. Journal of Applied Statistics 14(1), 61–73. Stefani, R.T. (1998). Predicting outcomes. In J. Bennett (Ed.), Statistics in Sport, Chapter 12. New York: Oxford University Press. Stefani, R.T (1999). A taxonomy of sports rating systems. IEEE Trans. On Systems, Man and Cybernetics, Part A 29(1), 116–120. Stefani, R.T. (2006). The relative power output and relative lean body mass of world and Olympic male and female champions with implications for gender equity. Journal of Sports Sciences 24(12), 1329–1339. Stefani, R.T and S.R. Clarke (1992). Predictions and home advantage for Australian rules football. Journal of Applied Statistics 19(2), 251–259. Stefani, R.T. and D. Stefani (2000). Power output of Olympic rowing champions. Olympic Review XXVI-30, 59–63. Sterken, E. (2005). 
A stochastic frontier approach to running performance. IMA Journal of Management Mathematics 16, 141–149. Stern, H.S. and C.N. Morris (1993). Looking for small effects: power and ﬁnite sample bias considerations. (comment on C. Albright’s “A statistical analysis of hitting streaks in baseball”). Journal of the American Statistical Association 88, 1189–1194.

288

References

Summers, A. (2005). Hockey pools for proﬁt: a simulation based player selection strategy. Master’s thesis, Department of Statistics and Actuarial Science, Simon Fraser University. Szymanski, S. (2003). The economic design of sporting contests. Journal of Economic Literature 41, 1137–1187. Szymanski, S. and A. Zimbalist (2003). National Pastime. Washington DC: Brookings Institution Press. Tatem, A.J., C.A. Guerra, P.M. Atkinson, and S.I. Hay (2004). Momentous sprint at the 2156 Olympics? Nature 431, 525. Thorn, J. and P. Palmer (1984). The Hidden Game of Baseball: A Revolutionary Approach to Baseball and Its Statistics. New York: Doubleday. Torgler, B. (2004). The economics of the FIFA football World Cup. Kyklos 57, 287–300. Toussaint, H.M. and P.J. Beck (1992). Biomechanics of competitive front crawl swimming. Sports Medicine 13, 8–24. Toussaint, H.M., G. De Groot, H.H. Savelberg, K. Vervoorn, A.P. Hollander, and G.J. Van Ingen Schenau (1988). Active drag related to velocity in male and female swimmers. Journal of Biomechanics 21, 435–438. Toussaint, H.M., T.Y. Janssen, and M. Kluft (1991). Effect of propelling surface size on the mechanics and energetics of front crawl swimming. Journal of Biomechanics 24, 205–211. Toussaint, H.M., W. Knops, G. De Groot, and A.P. Hollander (1990). The mechanical efﬁciency of front crawl swimming. Medicine and Science in Sports and Exercise 22, 402–408. Tuck, L.O. and L. Lazauskas (1996, 30 Sept. to 2 October). Low drag rowing shells. In Third Conference on Mathematics and Computers in Sport, Queensland, Australia, pp. 17–34. Bond University. Van Ingen Schenau, G.J, G. De Groot, and R.W. De Boer (1985). The control of speed in elite female speed skaters. Journal of Biomechanics 18(2), 91–96. Vrooman, J. (1996). The baseball’s player market reconsidered. Southern Economic Journal 63(2), 339–360. Wainer, H., C. Njue, and S. Palmer (2000). Assessing time trends in sex differences in swimming and running. Chance 13(1), 10–21. Wakai, M. 
and N.P Linthorne (2000, 7-12 September). Optimum takeoff angle in the standing long jump. In Sports Medicine Book of Abstracts, 2000 Pre-Olympic Congress: International Congress on Sport Science, Sports Medicine and Physical Education, Brisbane, Australia.

References

289

Wallechinsky, D. (2004). The Complete Book of the Summer Olympics, Athens 2004 Edition. Sport Classic. Wallechinsky, D. (2005). The Complete Book of the Winter Olympics, Turin 2006 Edition. Sport Classic. Welki, A. and T. Zlatoper (1999). U.S professional football game-day attendance. Atlantic Economic Journal 27, 285–298. Wellock, I.J., G.C. Emmans, and I. Kyriazakis (2004). Describing and predicting potential growth in the pig. Animal Science 78, 379–388. Weyand, P.G., D.B. Sternlight, M.J. Bellizzi, and S Wright (2000). Faster top running speeds are achieved with greater ground forces not rapid leg movements. Journal of Applied Physiology 89, 1991–1999. Whipp, B.J. and S.A. Ward (1992). Will women soon outrun men? Nature 355, 25. Woolner, K. (2001). Counterpoint: Pitching and defense. Baseball Prospectus. www. baseballprospectus.com.