Free Essay

Loyalty

In:

Submitted By acerman
Words 16766
Pages 68
Chapter 1

Discrete Probability Distributions
1.1 Simulation of Discrete Probabilities

Probability
In this chapter, we shall first consider chance experiments with a finite number of possible outcomes ω1 , ω2 , . . . , ωn . For example, we roll a die and the possible outcomes are 1, 2, 3, 4, 5, 6 corresponding to the side that turns up. We toss a coin with possible outcomes H (heads) and T (tails). It is frequently useful to be able to refer to an outcome of an experiment. For example, we might want to write the mathematical expression which gives the sum of four rolls of a die. To do this, we could let Xi , i = 1, 2, 3, 4, represent the values of the outcomes of the four rolls, and then we could write the expression X 1 + X 2 + X 3 + X4 for the sum of the four rolls. The Xi ’s are called random variables. A random variable is simply an expression whose value is the outcome of a particular experiment. Just as in the case of other types of variables in mathematics, random variables can take on different values. Let X be the random variable which represents the roll of one die. We shall assign probabilities to the possible outcomes of this experiment. We do this by assigning to each outcome ωj a nonnegative number m(ωj ) in such a way that m(ω1 ) + m(ω2 ) + · · · + m(ω6 ) = 1 . The function m(ωj ) is called the distribution function of the random variable X. For the case of the roll of the die we would assign equal probabilities or probabilities 1/6 to each of the outcomes. With this assignment of probabilities, one could write P (X ≤ 4) = 1 2 3

2

CHAPTER 1. DISCRETE PROBABILITY DISTRIBUTIONS

to mean that the probability is 2/3 that a roll of a die will have a value which does not exceed 4. Let Y be the random variable which represents the toss of a coin. In this case, there are two possible outcomes, which we can label as H and T. Unless we have reason to suspect that the coin comes up one way more often than the other way, it is natural to assign the probability of 1/2 to each of the two outcomes. In both of the above experiments, each outcome is assigned an equal probability. This would certainly not be the case in general. For example, if a drug is found to be effective 30 percent of the time it is used, we might assign a probability .3 that the drug is effective the next time it is used and .7 that it is not effective. This last example illustrates the intuitive frequency concept of probability. That is, if we have a probability p that an experiment will result in outcome A, then if we repeat this experiment a large number of times we should expect that the fraction of times that A will occur is about p. To check intuitive ideas like this, we shall find it helpful to look at some of these problems experimentally. We could, for example, toss a coin a large number of times and see if the fraction of times heads turns up is about 1/2. We could also simulate this experiment on a computer.

Simulation
We want to be able to perform an experiment that corresponds to a given set of probabilities; for example, m(ω1 ) = 1/2, m(ω2 ) = 1/3, and m(ω3 ) = 1/6. In this case, one could mark three faces of a six-sided die with an ω1 , two faces with an ω2 , and one face with an ω3 . In the general case we assume that m(ω1 ), m(ω2 ), . . . , m(ωn ) are all rational numbers, with least common denominator n. If n > 2, we can imagine a long cylindrical die with a cross-section that is a regular n-gon. If m(ωj ) = nj /n, then we can label nj of the long faces of the cylinder with an ωj , and if one of the end faces comes up, we can just roll the die again. If n = 2, a coin could be used to perform the experiment. We will be particularly interested in repeating a chance experiment a large number of times. Although the cylindrical die would be a convenient way to carry out a few repetitions, it would be difficult to carry out a large number of experiments. Since the modern computer can do a large number of operations in a very short time, it is natural to turn to the computer for this task.

Random Numbers
We must first find a computer analog of rolling a die. This is done on the computer by means of a random number generator. Depending upon the particular software package, the computer can be asked for a real number between 0 and 1, or an integer in a given set of consecutive integers. In the first case, the real numbers are chosen in such a way that the probability that the number lies in any particular subinterval of this unit interval is equal to the length of the subinterval. In the second case, each integer has the same probability of being chosen.

1.1. SIMULATION OF DISCRETE PROBABILITIES .203309 .932052 .069664 .750216 .946708 .762057 .415178 .670982 .784810 .380365 .151121 .716719 .352320 .089734 .027381 .623868 .967412 .049723 .966730 .900794

3

Table 1.1: Sample output of the program RandomNumbers. Let X be a random variable with distribution function m(ω), where ω is in the set {ω1 , ω2 , ω3 }, and m(ω1 ) = 1/2, m(ω2 ) = 1/3, and m(ω3 ) = 1/6. If our computer package can return a random integer in the set {1, 2, ..., 6}, then we simply ask it to do so, and make 1, 2, and 3 correspond to ω1 , 4 and 5 correspond to ω2 , and 6 correspond to ω3 . If our computer package returns a random real number r in the interval (0, 1), then the expression 6r + 1 will be a random integer between 1 and 6. (The notation x means the greatest integer not exceeding x, and is read “floor of x.”) The method by which random real numbers are generated on a computer is described in the historical discussion at the end of this section. The following example gives sample output of the program RandomNumbers. Example 1.1 (Random Number Generation) The program RandomNumbers generates n random real numbers in the interval [0, 1], where n is chosen by the user. When we ran the program with n = 20, we obtained the data shown in Table 1.1. 2

Example 1.2 (Coin Tossing) As we have noted, our intuition suggests that the probability of obtaining a head on a single toss of a coin is 1/2. To have the computer toss a coin, we can ask it to pick a random real number in the interval [0, 1] and test to see if this number is less than 1/2. If so, we shall call the outcome heads; if not we call it tails. Another way to proceed would be to ask the computer to pick a random integer from the set {0, 1}. The program CoinTosses carries out the experiment of tossing a coin n times. Running this program, with n = 20, resulted in: THTTTHTTTTHTTTTTHHTT. Note that in 20 tosses, we obtained 5 heads and 15 tails. Let us toss a coin n times, where n is much larger than 20, and see if we obtain a proportion of heads closer to our intuitive guess of 1/2. The program CoinTosses keeps track of the number of heads. When we ran this program with n = 1000, we obtained 494 heads. When we ran it with n = 10000, we obtained 5039 heads.

4

CHAPTER 1. DISCRETE PROBABILITY DISTRIBUTIONS

We notice that when we tossed the coin 10,000 times, the proportion of heads was close to the “true value” .5 for obtaining a head when a coin is tossed. A mathematical model for this experiment is called Bernoulli Trials (see Chapter 3). The Law of Large Numbers, which we shall study later (see Chapter 8), will show that in the Bernoulli Trials model, the proportion of heads should be near .5, consistent with our intuitive idea of the frequency interpretation of probability. Of course, our program could be easily modified to simulate coins for which the probability of a head is p, where p is a real number between 0 and 1. 2 In the case of coin tossing, we already knew the probability of the event occurring on each experiment. The real power of simulation comes from the ability to estimate probabilities when they are not known ahead of time. This method has been used in the recent discoveries of strategies that make the casino game of blackjack favorable to the player. We illustrate this idea in a simple situation in which we can compute the true probability and see how effective the simulation is. Example 1.3 (Dice Rolling) We consider a dice game that played an important role in the historical development of probability. The famous letters between Pascal and Fermat, which many believe started a serious study of probability, were instigated by a request for help from a French nobleman and gambler, Chevalier de M´r´. It is said that de M´r´ had been betting that, in four rolls of a die, at ee ee least one six would turn up. He was winning consistently and, to get more people to play, he changed the game to bet that, in 24 rolls of two dice, a pair of sixes would turn up. It is claimed that de M´r´ lost with 24 and felt that 25 rolls were ee necessary to make the game favorable. It was un grand scandale that mathematics was wrong. ee We shall try to see if de M´r´ is correct by simulating his various bets. The program DeMere1 simulates a large number of experiments, seeing, in each one, if a six turns up in four rolls of a die. When we ran this program for 1000 plays, a six came up in the first four rolls 48.6 percent of the time. When we ran it for 10,000 plays this happened 51.98 percent of the time. We note that the result of the second run suggests that de M´r´ was correct ee in believing that his bet with one die was favorable; however, if we had based our conclusion on the first run, we would have decided that he was wrong. Accurate results by simulation require a large number of experiments. 2 The program DeMere2 simulates de M´r´’s second bet that a pair of sixes ee will occur in n rolls of a pair of dice. The previous simulation shows that it is important to know how many trials we should simulate in order to expect a certain degree of accuracy in our approximation. We shall see later that in these types of experiments, a rough rule of thumb is that, at least 95% of the time, the error does not exceed the reciprocal of the square root of the number of trials. Fortunately, for this dice game, it will be easy to compute the exact probabilities. We shall show in the next section that for the first bet the probability that de M´r´ wins is ee 1 − (5/6)4 = .518.

1.1. SIMULATION OF DISCRETE PROBABILITIES
10 8 6 4 2 5 -2 -4 -6 -8 -10 10 15 20 25 30 35 40

5

Figure 1.1: Peter’s winnings in 40 plays of heads or tails. One can understand this calculation as follows: The probability that no 6 turns up on the first toss is (5/6). The probability that no 6 turns up on either of the first two tosses is (5/6)2 . Reasoning in the same way, the probability that no 6 turns up on any of the first four tosses is (5/6)4 . Thus, the probability of at least one 6 in the first four tosses is 1 − (5/6)4 . Similarly, for the second bet, with 24 rolls, the probability that de M´r´ wins is 1 − (35/36)24 = .491, and for 25 rolls it ee is 1 − (35/36)25 = .506. Using the rule of thumb mentioned above, it would require 27,000 rolls to have a reasonable chance to determine these probabilities with sufficient accuracy to assert that they lie on opposite sides of .5. It is interesting to ponder whether a gambler can detect such probabilities with the required accuracy from gambling experience. Some writers on the history of probability suggest that de M´r´ was, in fact, just ee interested in these problems as intriguing probability problems. Example 1.4 (Heads or Tails) For our next example, we consider a problem where the exact answer is difficult to obtain but for which simulation easily gives the qualitative results. Peter and Paul play a game called heads or tails. In this game, a fair coin is tossed a sequence of times—we choose 40. Each time a head comes up Peter wins 1 penny from Paul, and each time a tail comes up Peter loses 1 penny to Paul. For example, if the results of the 40 tosses are THTHHHHTTHTHHTTHHTTTTHHHTHHTHHHTHHHTTTHH. Peter’s winnings may be graphed as in Figure 1.1. Peter has won 6 pennies in this particular game. It is natural to ask for the probability that he will win j pennies; here j could be any even number from −40 to 40. It is reasonable to guess that the value of j with the highest probability is j = 0, since this occurs when the number of heads equals the number of tails. Similarly, we would guess that the values of j with the lowest probabilities are j = ±40.

6

CHAPTER 1. DISCRETE PROBABILITY DISTRIBUTIONS

A second interesting question about this game is the following: How many times in the 40 tosses will Peter be in the lead? Looking at the graph of his winnings (Figure 1.1), we see that Peter is in the lead when his winnings are positive, but we have to make some convention when his winnings are 0 if we want all tosses to contribute to the number of times in the lead. We adopt the convention that, when Peter’s winnings are 0, he is in the lead if he was ahead at the previous toss and not if he was behind at the previous toss. With this convention, Peter is in the lead 34 times in our example. Again, our intuition might suggest that the most likely number of times to be in the lead is 1/2 of 40, or 20, and the least likely numbers are the extreme cases of 40 or 0. It is easy to settle this by simulating the game a large number of times and keeping track of the number of times that Peter’s final winnings are j, and the number of times that Peter ends up being in the lead by k. The proportions over all games then give estimates for the corresponding probabilities. The program HTSimulation carries out this simulation. Note that when there are an even number of tosses in the game, it is possible to be in the lead only an even number of times. We have simulated this game 10,000 times. The results are shown in Figures 1.2 and 1.3. These graphs, which we call spike graphs, were generated using the program Spikegraph. The vertical line, or spike, at position x on the horizontal axis, has a height equal to the proportion of outcomes which equal x. Our intuition about Peter’s final winnings was quite correct, but our intuition about the number of times Peter was in the lead was completely wrong. The simulation suggests that the least likely number of times in the lead is 20 and the most likely is 0 or 40. This is indeed correct, and the explanation for it is suggested by playing the game of heads or tails with a large number of tosses and looking at a graph of Peter’s winnings. In Figure 1.4 we show the results of a simulation of the game, for 1000 tosses and in Figure 1.5 for 10,000 tosses. In the second example Peter was ahead most of the time. It is a remarkable fact, however, that, if play is continued long enough, Peter’s winnings will continue to come back to 0, but there will be very long times between the times that this happens. These and related results will be discussed in Chapter 12. 2 In all of our examples so far, we have simulated equiprobable outcomes. We illustrate next an example where the outcomes are not equiprobable. Example 1.5 (Horse Races) Four horses (Acorn, Balky, Chestnut, and Dolby) have raced many times. It is estimated that Acorn wins 30 percent of the time, Balky 40 percent of the time, Chestnut 20 percent of the time, and Dolby 10 percent of the time. We can have our computer carry out one race as follows: Choose a random number x. If x < .3 then we say that Acorn won. If .3 ≤ x < .7 then Balky wins. If .7 ≤ x < .9 then Chestnut wins. Finally, if .9 ≤ x then Dolby wins. The program HorseRace uses this method to simulate the outcomes of n races. Running this program for n = 10 we found that Acorn won 40 percent of the time, Balky 20 percent of the time, Chestnut 10 percent of the time, and Dolby 30 percent

1.1. SIMULATION OF DISCRETE PROBABILITIES

7

Figure 1.2: Distribution of winnings.

Figure 1.3: Distribution of number of times in the lead.

8

CHAPTER 1. DISCRETE PROBABILITY DISTRIBUTIONS

1000 plays 20 10 0 200 -10 -20 -30 -40 -50
Figure 1.4: Peter’s winnings in 1000 plays of heads or tails.

400

600

800

1000

10000 plays 200

150

100

50

0 2000 4000 6000 8000 10000

Figure 1.5: Peter’s winnings in 10,000 plays of heads or tails.

1.1. SIMULATION OF DISCRETE PROBABILITIES

9

of the time. A larger number of races would be necessary to have better agreement with the past experience. Therefore we ran the program to simulate 1000 races with our four horses. Although very tired after all these races, they performed in a manner quite consistent with our estimates of their abilities. Acorn won 29.8 percent of the time, Balky 39.4 percent, Chestnut 19.5 percent, and Dolby 11.3 percent of the time. The program GeneralSimulation uses this method to simulate repetitions of an arbitrary experiment with a finite number of outcomes occurring with known probabilities. 2

Historical Remarks
Anyone who plays the same chance game over and over is really carrying out a simulation, and in this sense the process of simulation has been going on for centuries. As we have remarked, many of the early problems of probability might well have been suggested by gamblers’ experiences. It is natural for anyone trying to understand probability theory to try simple experiments by tossing coins, rolling dice, and so forth. The naturalist Buffon tossed a coin 4040 times, resulting in 2048 heads and 1992 tails. He also estimated the number π by throwing needles on a ruled surface and recording how many times the needles crossed a line (see Section 2.1). The English biologist W. F. R. Weldon1 recorded 26,306 throws of 12 dice, and the Swiss scientist Rudolf Wolf2 recorded 100,000 throws of a single die without a computer. Such experiments are very timeconsuming and may not accurately represent the chance phenomena being studied. For example, for the dice experiments of Weldon and Wolf, further analysis of the recorded data showed a suspected bias in the dice. The statistician Karl Pearson analyzed a large number of outcomes at certain roulette tables and suggested that the wheels were biased. He wrote in 1894: Clearly, since the Casino does not serve the valuable end of huge laboratory for the preparation of probability statistics, it has no scientific raison d’ˆtre. Men of science cannot have their most refined theories e disregarded in this shameless manner! The French Government must be urged by the hierarchy of science to close the gaming-saloons; it would be, of course, a graceful act to hand over the remaining resources of the Casino to the Acad´mie des Sciences for the endowment of a laboratory e of orthodox probability; in particular, of the new branch of that study, the application of the theory of chance to the biological problems of evolution, which is likely to occupy so much of men’s thoughts in the near future.3 However, these early experiments were suggestive and led to important discoveries in probability and statistics. They led Pearson to the chi-squared test, which
C. Fry, Probability and Its Engineering Uses, 2nd ed. (Princeton: Van Nostrand, 1965). Czuber, Wahrscheinlichkeitsrechnung, 3rd ed. (Berlin: Teubner, 1914). 3 K. Pearson, “Science and Monte Carlo,” Fortnightly Review , vol. 55 (1894), p. 193; cited in S. M. Stigler, The History of Statistics (Cambridge: Harvard University Press, 1986).
2 E. 1 T.

10

CHAPTER 1. DISCRETE PROBABILITY DISTRIBUTIONS

is of great importance in testing whether observed data fit a given probability distribution. By the early 1900s it was clear that a better way to generate random numbers was needed. In 1927, L. H. C. Tippett published a list of 41,600 digits obtained by selecting numbers haphazardly from census reports. In 1955, RAND Corporation printed a table of 1,000,000 random numbers generated from electronic noise. The advent of the high-speed computer raised the possibility of generating random numbers directly on the computer, and in the late 1940s John von Neumann suggested that this be done as follows: Suppose that you want a random sequence of four-digit numbers. Choose any four-digit number, say 6235, to start. Square this number to obtain 38,875,225. For the second number choose the middle four digits of this square (i.e., 8752). Do the same process starting with 8752 to get the third number, and so forth. More modern methods involve the concept of modular arithmetic. If a is an integer and m is a positive integer, then by a (mod m) we mean the remainder when a is divided by m. For example, 10 (mod 4) = 2, 8 (mod 2) = 0, and so forth. To generate a random sequence X0 , X1 , X2 , . . . of numbers choose a starting number X0 and then obtain the numbers Xn+1 from Xn by the formula Xn+1 = (aXn + c) (mod m) , where a, c, and m are carefully chosen constants. The sequence X0 , X1 , X2 , . . . is then a sequence of integers between 0 and m − 1. To obtain a sequence of real numbers in [0, 1), we divide each Xj by m. The resulting sequence consists of rational numbers of the form j/m, where 0 ≤ j ≤ m − 1. Since m is usually a very large integer, we think of the numbers in the sequence as being random real numbers in [0, 1). For both von Neumann’s squaring method and the modular arithmetic technique the sequence of numbers is actually completely determined by the first number. Thus, there is nothing really random about these sequences. However, they produce numbers that behave very much as theory would predict for random experiments. To obtain different sequences for different experiments the initial number X0 is chosen by some other procedure that might involve, for example, the time of day.4 During the Second World War, physicists at the Los Alamos Scientific Laboratory needed to know, for purposes of shielding, how far neutrons travel through various materials. This question was beyond the reach of theoretical calculations. Daniel McCracken, writing in the Scientific American, states: The physicists had most of the necessary data: they knew the average distance a neutron of a given speed would travel in a given substance before it collided with an atomic nucleus, what the probabilities were that the neutron would bounce off instead of being absorbed by the nucleus, how much energy the neutron was likely to lose after a given
4 For a detailed discussion of random numbers, see D. E. Knuth, The Art of Computer Programming, vol. II (Reading: Addison-Wesley, 1969).

1.1. SIMULATION OF DISCRETE PROBABILITIES collision and so on.5

11

John von Neumann and Stanislas Ulam suggested that the problem be solved by modeling the experiment by chance devices on a computer. Their work being secret, it was necessary to give it a code name. Von Neumann chose the name “Monte Carlo.” Since that time, this method of simulation has been called the Monte Carlo Method. William Feller indicated the possibilities of using computer simulations to illustrate basic concepts in probability in his book An Introduction to Probability Theory and Its Applications. In discussing the problem about the number of times in the lead in the game of “heads or tails” Feller writes: The results concerning fluctuations in coin tossing show that widely held beliefs about the law of large numbers are fallacious. These results are so amazing and so at variance with common intuition that even sophisticated colleagues doubted that coins actually misbehave as theory predicts. The record of a simulated experiment is therefore included.6 Feller provides a plot showing the result of 10,000 plays of heads or tails similar to that in Figure 1.5. The martingale betting system described in Exercise 10 has a long and interesting history. Russell Barnhart pointed out to the authors that its use can be traced back at least to 1754, when Casanova, writing in his memoirs, History of My Life, writes She [Casanova’s mistress] made me promise to go to the casino [the Ridotto in Venice] for money to play in partnership with her. I went there and took all the gold I found, and, determinedly doubling my stakes according to the system known as the martingale, I won three or four times a day during the rest of the Carnival. I never lost the sixth card. If I had lost it, I should have been out of funds, which amounted to two thousand zecchini.7 Even if there were no zeros on the roulette wheel so the game was perfectly fair, the martingale system, or any other system for that matter, cannot make the game into a favorable game. The idea that a fair game remains fair and unfair games remain unfair under gambling systems has been exploited by mathematicians to obtain important results in the study of probability. We will introduce the general concept of a martingale in Chapter 6. The word martingale itself also has an interesting history. The origin of the word is obscure. The Oxford English Dictionary gives examples of its use in the
5 D. D. McCracken, “The Monte Carlo Method,” Scientific American, vol. 192 (May 1955), p. 90. 6 W. Feller, Introduction to Probability Theory and its Applications, vol. 1, 3rd ed. (New York: John Wiley & Sons, 1968), p. xi. 7 G. Casanova, History of My Life, vol. IV, Chap. 7, trans. W. R. Trask (New York: HarcourtBrace, 1968), p. 124.

12

CHAPTER 1. DISCRETE PROBABILITY DISTRIBUTIONS

early 1600s and says that its probable origin is the reference in Rabelais’s Book One, Chapter 19: Everything was done as planned, the only thing being that Gargantua doubted if they would be able to find, right away, breeches suitable to the old fellow’s legs; he was doubtful, also, as to what cut would be most becoming to the orator—the martingale, which has a draw-bridge effect in the seat, to permit doing one’s business more easily; the sailor-style, which affords more comfort for the kidneys; the Swiss, which is warmer on the belly; or the codfish-tail, which is cooler on the loins.8 In modern uses martingale has several different meanings, all related to holding down, in addition to the gambling use. For example, it is a strap on a horse’s harness used to hold down the horse’s head, and also part of a sailing rig used to hold down the bowsprit. The Labouchere system described in Exercise 9 is named after Henry du Pre Labouchere (1831–1912), an English journalist and member of Parliament. Labouchere attributed the system to Condorcet. Condorcet (1743–1794) was a political leader during the time of the French revolution who was interested in applying probability theory to economics and politics. For example, he calculated the probability that a jury using majority vote will give a correct decision if each juror has the same probability of deciding correctly. His writings provided a wealth of ideas on how probability might be applied to human affairs.9

Exercises
1 Modify the program CoinTosses to toss a coin n times and print out after every 100 tosses the proportion of heads minus 1/2. Do these numbers appear to approach 0 as n increases? Modify the program again to print out, every 100 times, both of the following quantities: the proportion of heads minus 1/2, and the number of heads minus half the number of tosses. Do these numbers appear to approach 0 as n increases? 2 Modify the program CoinTosses so that it tosses a coin n times and records whether or not the proportion of heads is within .1 of .5 (i.e., between .4 and .6). Have your program repeat this experiment 100 times. About how large must n be so that approximately 95 out of 100 times the proportion of heads is between .4 and .6? 3 In the early 1600s, Galileo was asked to explain the fact that, although the number of triples of integers from 1 to 6 with sum 9 is the same as the number of such triples with sum 10, when three dice are rolled, a 9 seemed to come up less often than a 10—supposedly in the experience of gamblers. in the Portable Rabelais, ed. S. Putnam (New York: Viking, 1946), p. 113. Marquise de Condorcet, Essai sur l’Application de l’Analyse ` la Probabilit´ d`s D´cisions a e e e Rendues a la Pluralit´ des Voix (Paris: Imprimerie Royale, 1785). e
9 Le 8 Quoted

1.1. SIMULATION OF DISCRETE PROBABILITIES

13

(a) Write a program to simulate the roll of three dice a large number of times and keep track of the proportion of times that the sum is 9 and the proportion of times it is 10. (b) Can you conclude from your simulations that the gamblers were correct? 4 In raquetball, a player continues to serve as long as she is winning; a point is scored only when a player is serving and wins the volley. The first player to win 21 points wins the game. Assume that you serve first and have a probability .6 of winning a volley when you serve and probability .5 when your opponent serves. Estimate, by simulation, the probability that you will win a game. 5 Consider the bet that all three dice will turn up sixes at least once in n rolls of three dice. Calculate f (n), the probability of at least one triple-six when three dice are rolled n times. Determine the smallest value of n necessary for a favorable bet that a triple-six will occur when three dice are rolled n times. (DeMoivre would say it should be about 216 log 2 = 149.7 and so would answer 150—see Exercise 1.2.17. Do you agree with him?) 6 In Las Vegas, a roulette wheel has 38 slots numbered 0, 00, 1, 2, . . . , 36. The 0 and 00 slots are green and half of the remaining 36 slots are red and half are black. A croupier spins the wheel and throws in an ivory ball. If you bet 1 dollar on red, you win 1 dollar if the ball stops in a red slot and otherwise you lose 1 dollar. Write a program to find the total winnings for a player who makes 1000 bets on red. 7 Another form of bet for roulette is to bet that a specific number (say 17) will turn up. If the ball stops on your number, you get your dollar back plus 35 dollars. If not, you lose your dollar. Write a program that will plot your winnings when you make 500 plays of roulette at Las Vegas, first when you bet each time on red (see Exercise 6), and then for a second visit to Las Vegas when you make 500 plays betting each time on the number 17. What differences do you see in the graphs of your winnings on these two occasions? 8 An astute student noticed that, in our simulation of the game of heads or tails (see Example 1.4), the proportion of times the player is always in the lead is very close to the proportion of times that the player’s total winnings end up 0. Work out these probabilities by enumeration of all cases for two tosses and for four tosses, and see if you think that these probabilities are, in fact, the same. 9 The Labouchere system for roulette is played as follows. Write down a list of numbers, usually 1, 2, 3, 4. Bet the sum of the first and last, 1 + 4 = 5, on red. If you win, delete the first and last numbers from your list. If you lose, add the amount that you last bet to the end of your list. Then use the new list and bet the sum of the first and last numbers (if there is only one number, bet that amount). Continue until your list becomes empty. Show that, if this

14

CHAPTER 1. DISCRETE PROBABILITY DISTRIBUTIONS happens, you win the sum, 1 + 2 + 3 + 4 = 10, of your original list. Simulate this system and see if you do always stop and, hence, always win. If so, why is this not a foolproof gambling system?

10 Another well-known gambling system is the martingale doubling system. Suppose that you are betting on red to turn up in roulette. Every time you win, bet 1 dollar next time. Every time you lose, double your previous bet. Continue to play until you have won at least 5 dollars or you have lost more than 100 dollars. Write a program to simulate this system and play it a number of times and see how you do. In his book The Newcomes, W. M. Thackeray remarks “You have not played as yet? Do not do so; above all avoid a martingale if you do.”10 Was this good advice? 11 Modify the program HTSimulation so that it keeps track of the maximum of Peter’s winnings in each game of 40 tosses. Have your program print out the proportion of times that your total winnings take on values 0, 2, 4, . . . , 40. Calculate the corresponding exact probabilities for games of two tosses and four tosses. 12 In an upcoming national election for the President of the United States, a pollster plans to predict the winner of the popular vote by taking a random sample of 1000 voters and declaring that the winner will be the one obtaining the most votes in his sample. Suppose that 48 percent of the voters plan to vote for the Republican candidate and 52 percent plan to vote for the Democratic candidate. To get some idea of how reasonable the pollster’s plan is, write a program to make this prediction by simulation. Repeat the simulation 100 times and see how many times the pollster’s prediction would come true. Repeat your experiment, assuming now that 49 percent of the population plan to vote for the Republican candidate; first with a sample of 1000 and then with a sample of 3000. (The Gallup Poll uses about 3000.) (This idea is discussed further in Chapter 9, Section 9.1.) 13 The psychologist Tversky and his colleagues11 say that about four out of five people will answer (a) to the following question: A certain town is served by two hospitals. In the larger hospital about 45 babies are born each day, and in the smaller hospital 15 babies are born each day. Although the overall proportion of boys is about 50 percent, the actual proportion at either hospital may be more or less than 50 percent on any day. At the end of a year, which hospital will have the greater number of days on which more than 60 percent of the babies born were boys? (a) the large hospital
M. Thackerey, The Newcomes (London: Bradbury and Evans, 1854–55). K. McKean, “Decisions, Decisions,” Discover, June 1985, pp. 22–31. Kevin McKean, Discover Magazine, c 1987 Family Media, Inc. Reprinted with permission. This popular article reports on the work of Tverksy et. al. in Judgement Under Uncertainty: Heuristics and Biases (Cambridge: Cambridge University Press, 1982).
11 See 10 W.

1.1. SIMULATION OF DISCRETE PROBABILITIES (b) the small hospital (c) neither—the number of days will be about the same.

15

Assume that the probability that a baby is a boy is .5 (actual estimates make this more like .513). Decide, by simulation, what the right answer is to the question. Can you suggest why so many people go wrong? 14 You are offered the following game. A fair coin will be tossed until the first time it comes up heads. If this occurs on the jth toss you are paid 2j dollars. You are sure to win at least 2 dollars so you should be willing to pay to play this game—but how much? Few people would pay as much as 10 dollars to play this game. See if you can decide, by simulation, a reasonable amount that you would be willing to pay, per game, if you will be allowed to make a large number of plays of the game. Does the amount that you would be willing to pay per game depend upon the number of plays that you will be allowed? 15 Tversky and his colleagues12 studied the records of 48 of the Philadelphia 76ers basketball games in the 1980–81 season to see if a player had times when he was hot and every shot went in, and other times when he was cold and barely able to hit the backboard. The players estimated that they were about 25 percent more likely to make a shot after a hit than after a miss. In fact, the opposite was true—the 76ers were 6 percent more likely to score after a miss than after a hit. Tversky reports that the number of hot and cold streaks was about what one would expect by purely random effects. Assuming that a player has a fifty-fifty chance of making a shot and makes 20 shots a game, estimate by simulation the proportion of the games in which the player will have a streak of 5 or more hits. 16 Estimate, by simulation, the average number of children there would be in a family if all people had children until they had a boy. Do the same if all people had children until they had at least one boy and at least one girl. How many more children would you expect to find under the second scheme than under the first in 100,000 families? (Assume that boys and girls are equally likely.) 17 Mathematicians have been known to get some of the best ideas while sitting in a cafe, riding on a bus, or strolling in the park. In the early 1900s the famous mathematician George P´lya lived in a hotel near the woods in Zurich. He o liked to walk in the woods and think about mathematics. P´lya describes the o following incident: At the hotel there lived also some students with whom I usually took my meals and had friendly relations. On a certain day one of them expected the visit of his fianc´e, what (sic) I knew, but e e I did not foresee that he and his fianc´e would also set out for a
12 ibid.

16

CHAPTER 1. DISCRETE PROBABILITY DISTRIBUTIONS

-3

-2

-1

0

1

2

3

a. Random walk in one dimension.

b. Random walk in two dimensions.

c. Random walk in three dimensions.

Figure 1.6: Random walk.

1.1. SIMULATION OF DISCRETE PROBABILITIES stroll in the woods, and then suddenly I met them there. And then I met them the same morning repeatedly, I don’t remember how many times, but certainly much too often and I felt embarrassed: It looked as if I was snooping around which was, I assure you, not the case.13

17

This set him to thinking about whether random walkers were destined to meet. P´lya considered random walkers in one, two, and three dimensions. In one o dimension, he envisioned the walker on a very long street. At each intersection the walker flips a fair coin to decide which direction to walk next (see Figure 1.6a). In two dimensions, the walker is walking on a grid of streets, and at each intersection he chooses one of the four possible directions with equal probability (see Figure 1.6b). In three dimensions (we might better speak of a random climber), the walker moves on a three-dimensional grid, and at each intersection there are now six different directions that the walker may choose, each with equal probability (see Figure 1.6c). The reader is referred to Section 12.1, where this and related problems are discussed. (a) Write a program to simulate a random walk in one dimension starting at 0. Have your program print out the lengths of the times between returns to the starting point (returns to 0). See if you can guess from this simulation the answer to the following question: Will the walker always return to his starting point eventually or might he drift away forever? (b) The paths of two walkers in two dimensions who meet after n steps can be considered to be a single path that starts at (0, 0) and returns to (0, 0) after 2n steps. This means that the probability that two random walkers in two dimensions meet is the same as the probability that a single walker in two dimensions ever returns to the starting point. Thus the question of whether two walkers are sure to meet is the same as the question of whether a single walker is sure to return to the starting point. Write a program to simulate a random walk in two dimensions and see if you think that the walker is sure to return to (0, 0). If so, P´lya would o be sure to keep meeting his friends in the park. Perhaps by now you have conjectured the answer to the question: Is a random walker in one or two dimensions sure to return to the starting point? P´lya answered o this question for dimensions one, two, and three. He established the remarkable result that the answer is yes in one and two dimensions and no in three dimensions.
13 G. P´lya, “Two Incidents,” Scientists at Work: Festschrift in Honour of Herman Wold, ed. o T. Dalenius, G. Karlsson, and S. Malmquist (Uppsala: Almquist & Wiksells Boktryckeri AB, 1970).

18

CHAPTER 1. DISCRETE PROBABILITY DISTRIBUTIONS (c) Write a program to simulate a random walk in three dimensions and see whether, from this simulation and the results of (a) and (b), you could have guessed P´lya’s result. o

1.2

Discrete Probability Distributions

In this book we shall study many different experiments from a probabilistic point of view. What is involved in this study will become evident as the theory is developed and examples are analyzed. However, the overall idea can be described and illustrated as follows: to each experiment that we consider there will be associated a random variable, which represents the outcome of any particular experiment. The set of possible outcomes is called the sample space. In the first part of this section, we will consider the case where the experiment has only finitely many possible outcomes, i.e., the sample space is finite. We will then generalize to the case that the sample space is either finite or countably infinite. This leads us to the following definition.

Random Variables and Sample Spaces
Definition 1.1 Suppose we have an experiment whose outcome depends on chance. We represent the outcome of the experiment by a capital Roman letter, such as X, called a random variable. The sample space of the experiment is the set of all possible outcomes. If the sample space is either finite or countably infinite, the 2 random variable is said to be discrete. We generally denote a sample space by the capital Greek letter Ω. As stated above, in the correspondence between an experiment and the mathematical theory by which it is studied, the sample space Ω corresponds to the set of possible outcomes of the experiment. We now make two additional definitions. These are subsidiary to the definition of sample space and serve to make precise some of the common terminology used in conjunction with sample spaces. First of all, we define the elements of a sample space to be outcomes. Second, each subset of a sample space is defined to be an event. Normally, we shall denote outcomes by lower case letters and events by capital letters. Example 1.6 A die is rolled once. We let X denote the outcome of this experiment. Then the sample space for this experiment is the 6-element set Ω = {1, 2, 3, 4, 5, 6} , where each outcome i, for i = 1, . . . , 6, corresponds to the number of dots on the face which turns up. The event E = {2, 4, 6}

1.2. DISCRETE PROBABILITY DISTRIBUTIONS

19

corresponds to the statement that the result of the roll is an even number. The event E can also be described by saying that X is even. Unless there is reason to believe the die is loaded, the natural assumption is that every outcome is equally likely. Adopting this convention means that we assign a probability of 1/6 to each of the six outcomes, i.e., m(i) = 1/6, for 1 ≤ i ≤ 6. 2

Distribution Functions
We next describe the assignment of probabilities. The definitions are motivated by the example above, in which we assigned to each outcome of the sample space a nonnegative number such that the sum of the numbers assigned is equal to 1. Definition 1.2 Let X be a random variable which denotes the value of the outcome of a certain experiment, and assume that this experiment has only finitely many possible outcomes. Let Ω be the sample space of the experiment (i.e., the set of all possible values of X, or equivalently, the set of all possible outcomes of the experiment.) A distribution function for X is a real-valued function m whose domain is Ω and which satisfies: 1. m(ω) ≥ 0 , 2. ω∈Ω for all ω ∈ Ω , and

m(ω) = 1 .

For any subset E of Ω, we define the probability of E to be the number P (E) given by P (E) = ω∈E m(ω) . 2

Example 1.7 Consider an experiment in which a coin is tossed twice. Let X be the random variable which corresponds to this experiment. We note that there are several ways to record the outcomes of this experiment. We could, for example, record the two tosses, in the order in which they occurred. In this case, we have Ω ={HH,HT,TH,TT}. We could also record the outcomes by simply noting the number of heads that appeared. In this case, we have Ω ={0,1,2}. Finally, we could record the two outcomes, without regard to the order in which they occurred. In this case, we have Ω ={HH,HT,TT}. We will use, for the moment, the first of the sample spaces given above. We will assume that all four outcomes are equally likely, and define the distribution function m(ω) by m(HH) = m(HT) = m(TH) = m(TT) = 1 . 4

20

CHAPTER 1. DISCRETE PROBABILITY DISTRIBUTIONS

Let E ={HH,HT,TH} be the event that at least one head comes up. Then, the probability of E can be calculated as follows: P (E) = m(HH) + m(HT) + m(TH) 3 1 1 1 + + = . = 4 4 4 4 Similarly, if F ={HH,HT} is the event that heads comes up on the first toss, then we have P (F ) = m(HH) + m(HT) 1 1 1 + = . = 4 4 2 2 Example 1.8 (Example 1.6 continued) The sample space for the experiment in which the die is rolled is the 6-element set Ω = {1, 2, 3, 4, 5, 6}. We assumed that the die was fair, and we chose the distribution function defined by m(i) = 1 , 6 for i = 1, . . . , 6 .

If E is the event that the result of the roll is an even number, then E = {2, 4, 6} and P (E) = m(2) + m(4) + m(6) 1 1 1 1 + + = . = 6 6 6 2 2 Notice that it is an immediate consequence of the above definitions that, for every ω ∈ Ω, P ({ω}) = m(ω) . That is, the probability of the elementary event {ω}, consisting of a single outcome ω, is equal to the value m(ω) assigned to the outcome ω by the distribution function. Example 1.9 Three people, A, B, and C, are running for the same office, and we assume that one and only one of them wins. The sample space may be taken as the 3-element set Ω ={A,B,C} where each element corresponds to the outcome of that candidate’s winning. Suppose that A and B have the same chance of winning, but that C has only 1/2 the chance of A or B. Then we assign m(A) = m(B) = 2m(C) . Since m(A) + m(B) + m(C) = 1 ,

1.2. DISCRETE PROBABILITY DISTRIBUTIONS we see that 2m(C) + 2m(C) + m(C) = 1 , which implies that 5m(C) = 1. Hence, m(A) = 2 , 5 m(B) = 2 , 5 m(C) = 1 . 5

21

Let E be the event that either A or C wins. Then E ={A,C}, and P (E) = m(A) + m(C) = 3 2 1 + = . 5 5 5 2 In many cases, events can be described in terms of other events through the use of the standard constructions of set theory. We will briefly review the definitions of these constructions. The reader is referred to Figure 1.7 for Venn diagrams which illustrate these constructions. Let A and B be two sets. Then the union of A and B is the set A ∪ B = {x | x ∈ A or x ∈ B} . The intersection of A and B is the set A ∩ B = {x | x ∈ A and x ∈ B} . The difference of A and B is the set A − B = {x | x ∈ A and x ∈ B} . The set A is a subset of B, written A ⊂ B, if every element of A is also an element of B. Finally, the complement of A is the set ˜ A = {x | x ∈ Ω and x ∈ A} . The reason that these constructions are important is that it is typically the case that complicated events described in English can be broken down into simpler events using these constructions. For example, if A is the event that “it will snow tomorrow and it will rain the next day,” B is the event that “it will snow tomorrow,” and C is the event that “it will rain two days from now,” then A is the intersection of the events B and C. Similarly, if D is the event that “it will snow tomorrow or it will rain the next day,” then D = B ∪ C. (Note that care must be taken here, because sometimes the word “or” in English means that exactly one of the two alternatives will occur. The meaning is usually clear from context. In this book, we will always use the word “or” in the inclusive sense, i.e., A or B means that at ˜ least one of the two events A, B is true.) The event B is the event that “it will not snow tomorrow.” Finally, if E is the event that “it will snow tomorrow but it will not rain the next day,” then E = B − C.

22

CHAPTER 1. DISCRETE PROBABILITY DISTRIBUTIONS

A

B

A

B

A ∼ A

A

B



A B

A B


A B

Figure 1.7: Basic set operations.

Properties
Theorem 1.1 The probabilities assigned to events by a distribution function on a sample space Ω satisfy the following properties: 1. P (E) ≥ 0 for every E ⊂ Ω . 2. P (Ω) = 1 . 3. If E ⊂ F ⊂ Ω, then P (E) ≤ P (F ) . 4. If A and B are disjoint subsets of Ω, then P (A ∪ B) = P (A) + P (B) . ˜ 5. P (A) = 1 − P (A) for every A ⊂ Ω .

Proof. For any event E the probability P (E) is determined from the distribution m by m(ω) , P (E) = ω∈E for every E ⊂ Ω. Since the function m is nonnegative, it follows that P (E) is also nonnegative. Thus, Property 1 is true. Property 2 is proved by the equations P (Ω) = ω∈Ω m(ω) = 1 .

Suppose that E ⊂ F ⊂ Ω. Then every element ω that belongs to E also belongs to F . Therefore, m(ω) ≤ m(ω) , ω∈E ω∈F

since each term in the left-hand sum is in the right-hand sum, and all the terms in both sums are non-negative. This implies that P (E) ≤ P (F ) , and Property 3 is proved.

1.2. DISCRETE PROBABILITY DISTRIBUTIONS

23

Suppose next that A and B are disjoint subsets of Ω. Then every element ω of A ∪ B lies either in A and not in B or in B and not in A. It follows that P (A ∪ B) = ω∈A∪B m(ω) =

ω∈A

m(ω) +

ω∈B

m(ω)

= P (A) + P (B) , and Property 4 is proved. Finally, to prove Property 5, consider the disjoint union ˜ Ω=A∪A . Since P (Ω) = 1, the property of disjoint additivity (Property 4) implies that ˜ 1 = P (A) + P (A) , ˜ whence P (A) = 1 − P (A). 2

It is important to realize that Property 4 in Theorem 1.1 can be extended to more than two sets. The general finite additivity property is given by the following theorem.

Theorem 1.2 If A1 , . . . , An are pairwise disjoint subsets of Ω (i.e., no two of the Ai ’s have an element in common), then n P (A1 ∪ · · · ∪ An ) = i=1 P (Ai ) .

Proof. Let ω be any element in the union A1 ∪ · · · ∪ An . Then m(ω) occurs exactly once on each side of the equality in the statement of the theorem. 2 We shall often use the following consequence of the above theorem. Theorem 1.3 Let A1 , . . . , An be pairwise disjoint events with Ω = A1 ∪ · · · ∪ An , and let E be any event. Then n P (E) = i=1 P (E ∩ Ai ) .

Proof. The sets E ∩ A1 , . . . , E ∩ An are pairwise disjoint, and their union is the set E. The result now follows from Theorem 1.2. 2

24

CHAPTER 1. DISCRETE PROBABILITY DISTRIBUTIONS

Corollary 1.1 For any two events A and B, ˜ P (A) = P (A ∩ B) + P (A ∩ B) . 2 Property 4 can be generalized in another way. Suppose that A and B are subsets of Ω which are not necessarily disjoint. Then:

Theorem 1.4 If A and B are subsets of Ω, then P (A ∪ B) = P (A) + P (B) − P (A ∩ B) . (1.1)

Proof. The left side of Equation 1.1 is the sum of m(ω) for ω in either A or B. We must show that the right side of Equation 1.1 also adds m(ω) for ω in A or B. If ω is in exactly one of the two sets, then it is counted in only one of the three terms on the right side of Equation 1.1. If it is in both A and B, it is added twice from the calculations of P (A) and P (B) and subtracted once for P (A ∩ B). Thus it is counted exactly once by the right side. Of course, if A ∩ B = ∅, then Equation 1.1 reduces to Property 4. (Equation 1.1 can also be generalized; see Theorem 3.8.) 2

Tree Diagrams
Example 1.10 Let us illustrate the properties of probabilities of events in terms of three tosses of a coin. When we have an experiment which takes place in stages such as this, we often find it convenient to represent the outcomes by a tree diagram as shown in Figure 1.8. A path through the tree corresponds to a possible outcome of the experiment. For the case of three tosses of a coin, we have eight paths ω1 , ω2 , . . . , ω8 and, assuming each outcome to be equally likely, we assign equal weight, 1/8, to each ˜ path. Let E be the event “at least one head turns up.” Then E is the event “no heads turn up.” This event occurs for only one outcome, namely, ω8 = TTT. Thus, ˜ E = {TTT} and we have ˜ P (E) = P ({TTT}) = m(TTT) = By Property 5 of Theorem 1.1, ˜ P (E) = 1 − P (E) = 1 − 7 1 = . 8 8 1 . 8

Note that we shall often find it is easier to compute the probability that an event does not happen rather than the probability that it does. We then use Property 5 to obtain the desired probability.

1.2. DISCRETE PROBABILITY DISTRIBUTIONS
First toss Second toss Third toss H H H H T T (Start) H H T T H T T T Outcome

25

ω1 ω2 ω3 ω4 ω5 ω6 ω7 ω8

Figure 1.8: Tree diagram for three tosses of a coin. Let A be the event “the first outcome is a head,” and B the event “the second outcome is a tail.” By looking at the paths in Figure 1.8, we see that P (A) = P (B) = 1 . 2

Moreover, A ∩ B = {ω3 , ω4 }, and so P (A ∩ B) = 1/4. Using Theorem 1.4, we obtain P (A ∪ B) = P (A) + P (B) − P (A ∩ B) 3 1 1 1 + − = . = 2 2 4 4

Since A ∪ B is the 6-element set, A ∪ B = {HHH,HHT,HTH,HTT,TTH,TTT} , we see that we obtain the same result by direct enumeration. 2

In our coin tossing examples and in the die rolling example, we have assigned an equal probability to each possible outcome of the experiment. Corresponding to this method of assigning probabilities, we have the following definitions.

Uniform Distribution
Definition 1.3 The uniform distribution on a sample space Ω containing n elements is the function m defined by m(ω) = for every ω ∈ Ω. 1 , n 2

26

CHAPTER 1. DISCRETE PROBABILITY DISTRIBUTIONS

It is important to realize that when an experiment is analyzed to describe its possible outcomes, there is no single correct choice of sample space. For the experiment of tossing a coin twice in Example 1.2, we selected the 4-element set Ω ={HH,HT,TH,TT} as a sample space and assigned the uniform distribution function. These choices are certainly intuitively natural. On the other hand, for some ¯ purposes it may be more useful to consider the 3-element sample space Ω = {0, 1, 2} in which 0 is the outcome “no heads turn up,” 1 is the outcome “exactly one head turns up,” and 2 is the outcome “two heads turn up.” The distribution function m ¯ ¯ on Ω defined by the equations 1 1 1 , m(1) = , ¯ m(2) = ¯ 4 2 4 is the one corresponding to the uniform probability density on the original sample space Ω. Notice that it is perfectly possible to choose a different distribution func¯ tion. For example, we may consider the uniform distribution function on Ω, which is the function q defined by ¯ m(0) = ¯ 1 . 3 Although q is a perfectly good distribution function, it is not consistent with ob¯ served data on coin tossing. q (0) = q (1) = q (2) = ¯ ¯ ¯ Example 1.11 Consider the experiment that consists of rolling a pair of dice. We take as the sample space Ω the set of all ordered pairs (i, j) of integers with 1 ≤ i ≤ 6 and 1 ≤ j ≤ 6. Thus, Ω = { (i, j) : 1 ≤ i, j ≤ 6 } . (There is at least one other “reasonable” choice for a sample space, namely the set of all unordered pairs of integers, each between 1 and 6. For a discussion of why we do not use this set, see Example 3.14.) To determine the size of Ω, we note that there are six choices for i, and for each choice of i there are six choices for j, leading to 36 different outcomes. Let us assume that the dice are not loaded. In mathematical terms, this means that we assume that each of the 36 outcomes is equally likely, or equivalently, that we adopt the uniform distribution function on Ω by setting 1 , 1 ≤ i, j ≤ 6 . m((i, j)) = 36 What is the probability of getting a sum of 7 on the roll of two dice—or getting a sum of 11? The first event, denoted by E, is the subset E = {(1, 6), (6, 1), (2, 5), (5, 2), (3, 4), (4, 3)} . A sum of 11 is the subset F given by F = {(5, 6), (6, 5)} . Consequently, P (E) = P (F ) = ω∈E ω∈F

m(ω) = 6 · m(ω) = 2 ·

1 36 1 36

= =

1 6 1 18

, .

1.2. DISCRETE PROBABILITY DISTRIBUTIONS

27

What is the probability of getting neither snakeeyes (double ones) nor boxcars (double sixes)? The event of getting either one of these two outcomes is the set E = {(1, 1), (6, 6)} . Hence, the probability of obtaining neither is given by ˜ P (E) = 1 − P (E) = 1 − 17 2 = . 36 18 2 In the above coin tossing and the dice rolling experiments, we have assigned an equal probability to each outcome. That is, in each example, we have chosen the uniform distribution function. These are the natural choices provided the coin is a fair one and the dice are not loaded. However, the decision as to which distribution function to select to describe an experiment is not a part of the basic mathematical theory of probability. The latter begins only when the sample space and the distribution function have already been defined.

Determination of Probabilities
It is important to consider ways in which probability distributions are determined in practice. One way is by symmetry. For the case of the toss of a coin, we do not see any physical difference between the two sides of a coin that should affect the chance of one side or the other turning up. Similarly, with an ordinary die there is no essential difference between any two sides of the die, and so by symmetry we assign the same probability for any possible outcome. In general, considerations of symmetry often suggest the uniform distribution function. Care must be used here. We should not always assume that, just because we do not know any reason to suggest that one outcome is more likely than another, it is appropriate to assign equal probabilities. For example, consider the experiment of guessing the sex of a newborn child. It has been observed that the proportion of newborn children who are boys is about .513. Thus, it is more appropriate to assign a distribution function which assigns probability .513 to the outcome boy and probability .487 to the outcome girl than to assign probability 1/2 to each outcome. This is an example where we use statistical observations to determine probabilities. Note that these probabilities may change with new studies and may vary from country to country. Genetic engineering might even allow an individual to influence this probability for a particular case.

Odds
Statistical estimates for probabilities are fine if the experiment under consideration can be repeated a number of times under similar circumstances. However, assume that, at the beginning of a football season, you want to assign a probability to the event that Dartmouth will beat Harvard. You really do not have data that relates to this year’s football team. However, you can determine your own personal probability

28

CHAPTER 1. DISCRETE PROBABILITY DISTRIBUTIONS

by seeing what kind of a bet you would be willing to make. For example, suppose that you are willing to make a 1 dollar bet giving 2 to 1 odds that Dartmouth will win. Then you are willing to pay 2 dollars if Dartmouth loses in return for receiving 1 dollar if Dartmouth wins. This means that you think the appropriate probability for Dartmouth winning is 2/3. Let us look more carefully at the relation between odds and probabilities. Suppose that we make a bet at r to 1 odds that an event E occurs. This means that we think that it is r times as likely that E will occur as that E will not occur. In general, r to s odds will be taken to mean the same thing as r/s to 1, i.e., the ratio between the two numbers is the only quantity of importance when stating odds. Now if it is r times as likely that E will occur as that E will not occur, then the probability that E occurs must be r/(r + 1), since we have ˜ P (E) = r P (E) and ˜ P (E) + P (E) = 1 . In general, the statement that the odds are r to s in favor of an event E occurring is equivalent to the statement that P (E) = = r/s (r/s) + 1 r . r+s

If we let P (E) = p, then the above equation can easily be solved for r/s in terms of p; we obtain r/s = p/(1 − p). We summarize the above discussion in the following definition. Definition 1.4 If P (E) = p, the odds in favor of the event E occurring are r : s (r to s) where r/s = p/(1 − p). If r and s are given, then p can be found by using the equation p = r/(r + s). 2

Example 1.12 (Example 1.9 continued) In Example 1.9 we assigned probability 1/5 to the event that candidate C wins the race. Thus the odds in favor of C winning are 1/5 : 4/5. These odds could equally well have been written as 1 : 4, 2 : 8, and so forth. A bet that C wins is fair if we receive 4 dollars if C wins and pay 1 dollar if C loses. 2

Infinite Sample Spaces
If a sample space has an infinite number of points, then the way that a distribution function is defined depends upon whether or not the sample space is countable. A sample space is countably infinite if the elements can be counted, i.e., can be put in one-to-one correspondence with the positive integers, and uncountably infinite

1.2. DISCRETE PROBABILITY DISTRIBUTIONS

29

otherwise. Infinite sample spaces require new concepts in general (see Chapter 2), but countably infinite spaces do not. If Ω = {ω1 , ω2 , ω3 , . . .} is a countably infinite sample space, then a distribution function is defined exactly as in Definition 1.2, except that the sum must now be a convergent infinite sum. Theorem 1.1 is still true, as are its extensions Theorems 1.2 and 1.4. One thing we cannot do on a countably infinite sample space that we could do on a finite sample space is to define a uniform distribution function as in Definition 1.3. You are asked in Exercise 20 to explain why this is not possible. Example 1.13 A coin is tossed until the first time that a head turns up. Let the outcome of the experiment, ω, be the first time that a head turns up. Then the possible outcomes of our experiment are Ω = {1, 2, 3, . . .} . Note that even though the coin could come up tails every time we have not allowed for this possibility. We will explain why in a moment. The probability that heads comes up on the first toss is 1/2. The probability that tails comes up on the first toss and heads on the second is 1/4. The probability that we have two tails followed by a head is 1/8, and so forth. This suggests assigning the distribution function m(n) = 1/2n for n = 1, 2, 3, . . . . To see that this is a distribution function we must show that 1 1 1 m(ω) = + + + · · · = 1 . 2 4 8 ω That this is true follows from the formula for the sum of a geometric series, 1 , 1 + r + r2 + r3 + · · · = 1−r or r , (1.2) r + r2 + r3 + r4 + · · · = 1−r for −1 < r < 1. Putting r = 1/2, we see that we have a probability of 1 that the coin eventually turns up heads. The possible outcome of tails every time has to be assigned probability 0, so we omit it from our sample space of possible outcomes. Let E be the event that the first time a head turns up is after an even number of tosses. Then E = {2, 4, 6, 8, . . .} , and 1 1 1 + + + ··· . 4 16 64 Putting r = 1/4 in Equation 1.2 see that P (E) = P (E) = 1 1/4 = . 1 − 1/4 3

Thus the probability that a head turns up for the first time after an even number of tosses is 1/3 and after an odd number of tosses is 2/3. 2

30

CHAPTER 1. DISCRETE PROBABILITY DISTRIBUTIONS

Historical Remarks
An interesting question in the history of science is: Why was probability not developed until the sixteenth century? We know that in the sixteenth century problems in gambling and games of chance made people start to think about probability. But gambling and games of chance are almost as old as civilization itself. In ancient Egypt (at the time of the First Dynasty, ca. 3500 B.C.) a game now called “Hounds and Jackals” was played. In this game the movement of the hounds and jackals was based on the outcome of the roll of four-sided dice made out of animal bones called astragali. Six-sided dice made of a variety of materials date back to the sixteenth century B.C. Gambling was widespread in ancient Greece and Rome. Indeed, in the Roman Empire it was sometimes found necessary to invoke laws against gambling. Why, then, were probabilities not calculated until the sixteenth century? Several explanations have been advanced for this late development. One is that the relevant mathematics was not developed and was not easy to develop. The ancient mathematical notation made numerical calculation complicated, and our familiar algebraic notation was not developed until the sixteenth century. However, as we shall see, many of the combinatorial ideas needed to calculate probabilities were discussed long before the sixteenth century. Since many of the chance events of those times had to do with lotteries relating to religious affairs, it has been suggested that there may have been religious barriers to the study of chance and gambling. Another suggestion is that a stronger incentive, such as the development of commerce, was necessary. However, none of these explanations seems completely satisfactory, and people still wonder why it took so long for probability to be studied seriously. An interesting discussion of this problem can be found in Hacking.14 The first person to calculate probabilities systematically was Gerolamo Cardano (1501–1576) in his book Liber de Ludo Aleae. This was translated from the Latin by Gould and appears in the book Cardano: The Gambling Scholar by Ore.15 Ore provides a fascinating discussion of the life of this colorful scholar with accounts of his interests in many different fields, including medicine, astrology, and mathematics. You will also find there a detailed account of Cardano’s famous battle with Tartaglia over the solution to the cubic equation. In his book on probability Cardano dealt only with the special case that we have called the uniform distribution function. This restriction to equiprobable outcomes was to continue for a long time. In this case Cardano realized that the probability that an event occurs is the ratio of the number of favorable outcomes to the total number of outcomes. Many of Cardano’s examples dealt with rolling dice. Here he realized that the outcomes for two rolls should be taken to be the 36 ordered pairs (i, j) rather than the 21 unordered pairs. This is a subtle point that was still causing problems much later for other writers on probability. For example, in the eighteenth century the famous French mathematician d’Alembert, author of several works on probability, claimed that when a coin is tossed twice the number of heads that turn up would
14 I. 15 O.

Hacking, The Emergence of Probability (Cambridge: Cambridge University Press, 1975). Ore, Cardano: The Gambling Scholar (Princeton: Princeton University Press, 1953).

1.2. DISCRETE PROBABILITY DISTRIBUTIONS

31

be 0, 1, or 2, and hence we should assign equal probabilities for these three possible outcomes.16 Cardano chose the correct sample space for his dice problems and calculated the correct probabilities for a variety of events. Cardano’s mathematical work is interspersed with a lot of advice to the potential gambler in short paragraphs, entitled, for example: “Who Should Play and When,” “Why Gambling Was Condemned by Aristotle,” “Do Those Who Teach Also Play Well?” and so forth. In a paragraph entitled “The Fundamental Principle of Gambling,” Cardano writes: The most fundamental principle of all in gambling is simply equal conditions, e.g., of opponents, of bystanders, of money, of situation, of the dice box, and of the die itself. To the extent to which you depart from that equality, if it is in your opponent’s favor, you are a fool, and if in your own, you are unjust.17 Cardano did make mistakes, and if he realized it later he did not go back and change his error. For example, for an event that is favorable in three out of four cases, Cardano assigned the correct odds 3 : 1 that the event will occur. But then he assigned odds by squaring these numbers (i.e., 9 : 1) for the event to happen twice in a row. Later, by considering the case where the odds are 1 : 1, he realized that this cannot be correct and was led to the correct result that when f out of n outcomes are favorable, the odds for a favorable outcome twice in a row are f 2 : n2 − f 2 . Ore points out that this is equivalent to the realization that if the probability that an event happens in one experiment is p, the probability that it happens twice is p2 . Cardano proceeded to establish that for three successes the formula should be p3 and for four successes p4 , making it clear that he understood that the probability is pn for n successes in n independent repetitions of such an experiment. This will follow from the concept of independence that we introduce in Section 4.1. Cardano’s work was a remarkable first attempt at writing down the laws of probability, but it was not the spark that started a systematic study of the subject. This came from a famous series of letters between Pascal and Fermat. This correspondence was initiated by Pascal to consult Fermat about problems he had been given by Chevalier de M´r´, a well-known writer, a prominent figure at the court of ee Louis XIV, and an ardent gambler. The first problem de M´r´ posed was a dice problem. The story goes that he had ee been betting that at least one six would turn up in four rolls of a die and winning too often, so he then bet that a pair of sixes would turn up in 24 rolls of a pair of dice. The probability of a six with one die is 1/6 and, by the product law for independent experiments, the probability of two sixes when a pair of dice is thrown is (1/6)(1/6) = 1/36. Ore18 claims that a gambling rule of the time suggested that, since four repetitions was favorable for the occurrence of an event with probability 1/6, for an event six times as unlikely, 6 · 4 = 24 repetitions would be sufficient for d’Alembert, “Croix ou Pile,” in L’Encyclop´die, ed. Diderot, vol. 4 (Paris, 1754). e Ore, op. cit., p. 189. 18 O. Ore, “Pascal and the Invention of Probability Theory,” American Mathematics Monthly, vol. 67 (1960), pp. 409–419.
17 O. 16 J.

32

CHAPTER 1. DISCRETE PROBABILITY DISTRIBUTIONS

a favorable bet. Pascal showed, by exact calculation, that 25 rolls are required for a favorable bet for a pair of sixes. The second problem was a much harder one: it was an old problem and concerned the determination of a fair division of the stakes in a tournament when the series, for some reason, is interrupted before it is completed. This problem is now referred to as the problem of points. The problem had been a standard problem in mathematical texts; it appeared in Fra Luca Paccioli’s book summa de Arithmetica, Geometria, Proportioni et Proportionalit` , printed in Venice in 1494,19 in the form: a A team plays ball such that a total of 60 points are required to win the game, and each inning counts 10 points. The stakes are 10 ducats. By some incident they cannot finish the game and one side has 50 points and the other 20. One wants to know what share of the prize money belongs to each side. In this case I have found that opinions differ from one to another but all seem to me insufficient in their arguments, but I shall state the truth and give the correct way. Reasonable solutions, such as dividing the stakes according to the ratio of games won by each player, had been proposed, but no correct solution had been found at the time of the Pascal-Fermat correspondence. The letters deal mainly with the attempts of Pascal and Fermat to solve this problem. Blaise Pascal (1623–1662) was a child prodigy, having published his treatise on conic sections at age sixteen, and having invented a calculating machine at age eighteen. At the time of the letters, his demonstration of the weight of the atmosphere had already established his position at the forefront of contemporary physicists. Pierre de Fermat (1601– 1665) was a learned jurist in Toulouse, who studied mathematics in his spare time. He has been called by some the prince of amateurs and one of the greatest pure mathematicians of all times. The letters, translated by Maxine Merrington, appear in Florence David’s fascinating historical account of probability, Games, Gods and Gambling.20 In a letter dated Wednesday, 29th July, 1654, Pascal writes to Fermat: Sir, Like you, I am equally impatient, and although I am again ill in bed, I cannot help telling you that yesterday evening I received from M. de Carcavi your letter on the problem of points, which I admire more than I can possibly say. I have not the leisure to write at length, but, in a word, you have solved the two problems of points, one with dice and the other with sets of games with perfect justness; I am entirely satisfied with it for I do not doubt that I was in the wrong, seeing the admirable agreement in which I find myself with you now. . . Your method is very sound and is the one which first came to my mind in this research; but because the labour of the combination is excessive, I have found a short cut and indeed another method which is much
19 ibid., 20 F.

p. 414. N. David, Games, Gods and Gambling (London: G. Griffin, 1962), p. 230 ff.

1.2. DISCRETE PROBABILITY DISTRIBUTIONS

33

3

0

0

0

2 Number of games B has won 1

8

16

32

64

20

32

48

64

0

32

44

56

64

0

1 2 Number of games A has won

3

Figure 1.9: Pascal’s table. quicker and neater, which I would like to tell you here in a few words: for henceforth I would like to open my heart to you, if I may, as I am so overjoyed with our agreement. I see that truth is the same in Toulouse as in Paris. Here, more or less, is what I do to show the fair value of each game, when two opponents play, for example, in three games and each person has staked 32 pistoles. Let us say that the first man had won twice and the other once; now they play another game, in which the conditions are that, if the first wins, he takes all the stakes; that is 64 pistoles; if the other wins it, then they have each won two games, and therefore, if they wish to stop playing, they must each take back their own stake, that is, 32 pistoles each. Then consider, Sir, if the first man wins, he gets 64 pistoles; if he loses he gets 32. Thus if they do not wish to risk this last game but wish to separate without playing it, the first man must say: ‘I am certain to get 32 pistoles, even if I lost I still get them; but as for the other 32, perhaps I will get them, perhaps you will get them, the chances are equal. Let us then divide these 32 pistoles in half and give one half to me as well as my 32 which are mine for sure.’ He will then have 48 pistoles and the other 16. . . Pascal’s argument produces the table illustrated in Figure 1.9 for the amount due player A at any quitting point. Each entry in the table is the average of the numbers just above and to the right of the number. This fact, together with the known values when the tournament is completed, determines all the values in this table. If player A wins the first game,

34

CHAPTER 1. DISCRETE PROBABILITY DISTRIBUTIONS

then he needs two games to win and B needs three games to win; and so, if the tounament is called off, A should receive 44 pistoles. The letter in which Fermat presented his solution has been lost; but fortunately, Pascal describes Fermat’s method in a letter dated Monday, 24th August, 1654. From Pascal’s letter:21 This is your procedure when there are two players: If two players, playing several games, find themselves in that position when the first man needs two games and second needs three, then to find the fair division of stakes, you say that one must know in how many games the play will be absolutely decided. It is easy to calculate that this will be in four games, from which you can conclude that it is necessary to see in how many ways four games can be arranged between two players, and one must see how many combinations would make the first man win and how many the second and to share out the stakes in this proportion. I would have found it difficult to understand this if I had not known it myself already; in fact you had explained it with this idea in mind. Fermat realized that the number of ways that the game might be finished may not be equally likely. For example, if A needs two more games and B needs three to win, two possible ways that the tournament might go for A to win are WLW and LWLW. These two sequences do not have the same chance of occurring. To avoid this difficulty, Fermat extended the play, adding fictitious plays, so that all the ways that the games might go have the same length, namely four. He was shrewd enough to realize that this extension would not change the winner and that he now could simply count the number of sequences favorable to each player since he had made them all equally likely. If we list all possible ways that the extended game of four plays might go, we obtain the following 16 possible outcomes of the play: WWWW WWWL WWLW WWLL WLWW WLWL WLLW WLLL LWWW LWWL LWLW LWLL LLWW LLWL LLLW LLLL .

Player A wins in the cases where there are at least two wins (the 11 underlined cases), and B wins in the cases where there are at least three losses (the other 5 cases). Since A wins in 11 of the 16 possible cases Fermat argued that the probability that A wins is 11/16. If the stakes are 64 pistoles, A should receive 44 pistoles in agreement with Pascal’s result. Pascal and Fermat developed more systematic methods for counting the number of favorable outcomes for problems like this, and this will be one of our central problems. Such counting methods fall under the subject of combinatorics, which is the topic of Chapter 3.
21 ibid.,

p. 239ff.

1.2. DISCRETE PROBABILITY DISTRIBUTIONS

35

We see that these two mathematicians arrived at two very different ways to solve the problem of points. Pascal’s method was to develop an algorithm and use it to calculate the fair division. This method is easy to implement on a computer and easy to generalize. Fermat’s method, on the other hand, was to change the problem into an equivalent problem for which he could use counting or combinatorial methods. We will see in Chapter 3 that, in fact, Fermat used what has become known as Pascal’s triangle! In our study of probability today we shall find that both the algorithmic approach and the combinatorial approach share equal billing, just as they did 300 years ago when probability got its start.

Exercises
1 Let Ω = {a, b, c} be a sample space. Let m(a) = 1/2, m(b) = 1/3, and m(c) = 1/6. Find the probabilities for all eight subsets of Ω. 2 Give a possible sample space Ω for each of the following experiments: (a) An election decides between two candidates A and B. (b) A two-sided coin is tossed. (c) A student is asked for the month of the year and the day of the week on which her birthday falls. (d) A student is chosen at random from a class of ten students. (e) You receive a grade in this course. 3 For which of the cases in Exercise 2 would it be reasonable to assign the uniform distribution function? 4 Describe in words the events specified by the following subsets of Ω = {HHH, HHT, HT H, HT T, T HH, T HT, T T H, T T T } (see Example 1.6). (a) E = {HHH,HHT,HTH,HTT}. (b) E = {HHH,TTT}. (c) E = {HHT,HTH,THH}. (d) E = {HHT,HTH,HTT,THH,THT,TTH,TTT}. 5 What are the probabilities of the events described in Exercise 4? 6 A die is loaded in such a way that the probability of each face turning up is proportional to the number of dots on that face. (For example, a six is three times as probable as a two.) What is the probability of getting an even number in one throw? ˜ 7 Let A and B be events such that P (A ∩ B) = 1/4, P (A) = 1/3, and P (B) = 1/2. What is P (A ∪ B)?

36

CHAPTER 1. DISCRETE PROBABILITY DISTRIBUTIONS 8 A student must choose one of the subjects, art, geology, or psychology, as an elective. She is equally likely to choose art or psychology and twice as likely to choose geology. What are the respective probabilities that she chooses art, geology, and psychology? 9 A student must choose exactly two out of three electives: art, French, and mathematics. He chooses art with probability 5/8, French with probability 5/8, and art and French together with probability 1/4. What is the probability that he chooses mathematics? What is the probability that he chooses either art or French?

10 For a bill to come before the president of the United States, it must be passed by both the House of Representatives and the Senate. Assume that, of the bills presented to these two bodies, 60 percent pass the House, 80 percent pass the Senate, and 90 percent pass at least one of the two. Calculate the probability that the next bill presented to the two groups will come before the president. 11 What odds should a person give in favor of the following events? (a) A card chosen at random from a 52-card deck is an ace. (b) Two heads will turn up when a coin is tossed twice. (c) Boxcars (two sixes) will turn up when two dice are rolled. 12 You offer 3 : 1 odds that your friend Smith will be elected mayor of your city. What probability are you assigning to the event that Smith wins? 13 In a horse race, the odds that Romance will win are listed as 2 : 3 and that Downhill will win are 1 : 2. What odds should be given for the event that either Romance or Downhill wins? 14 Let X be a random variable with distribution function mX (x) defined by mX (−1) = 1/5, mX (0) = 1/5, mX (1) = 2/5, mX (2) = 1/5 . (a) Let Y be the random variable defined by the equation Y = X + 3. Find the distribution function mY (y) of Y . (b) Let Z be the random variable defined by the equation Z = X 2 . Find the distribution function mZ (z) of Z. *15 John and Mary are taking a mathematics course. The course has only three grades: A, B, and C. The probability that John gets a B is .3. The probability that Mary gets a B is .4. The probability that neither gets an A but at least one gets a B is .1. What is the probability that at least one gets a B but neither gets a C? 16 In a fierce battle, not less than 70 percent of the soldiers lost one eye, not less than 75 percent lost one ear, not less than 80 percent lost one hand, and not

1.2. DISCRETE PROBABILITY DISTRIBUTIONS

37

less than 85 percent lost one leg. What is the minimal possible percentage of those who simultaneously lost one ear, one eye, one hand, and one leg?22 *17 Assume that the probability of a “success” on a single experiment with n outcomes is 1/n. Let m be the number of experiments necessary to make it a favorable bet that at least one success will occur (see Exercise 1.1.5). (a) Show that the probability that, in m trials, there are no successes is (1 − 1/n)m . (b) (de Moivre) Show that if m = n log 2 then lim 1− 1 n 1 n m n→∞

= n 1 . 2

Hint: n→∞ lim

1−

= e−1 .

Hence for large n we should choose m to be about n log 2. (c) Would DeMoivre have been led to the correct answer for de M´r´’s two ee bets if he had used his approximation? 18 (a) For events A1 , . . . , An , prove that P (A1 ∪ · · · ∪ An ) ≤ P (A1 ) + · · · + P (An ) . (b) For events A and B, prove that P (A ∩ B) ≥ P (A) + P (B) − 1. 19 If A, B, and C are any three events, show that P (A ∪ B ∪ C) = P (A) + P (B) + P (C) − P (A ∩ B) − P (B ∩ C) − P (C ∩ A) + P (A ∩ B ∩ C) .

20 Explain why it is not possible to define a uniform distribution function (see Definition 1.3) on a countably infinite sample space. Hint: Assume m(ω) = a for all ω, where 0 ≤ a ≤ 1. Does m(ω) have all the properties of a distribution function? 21 In Example 1.13 find the probability that the coin turns up heads for the first time on the tenth, eleventh, or twelfth toss. 22 A die is rolled until the first time that a six turns up. We shall see that the probability that this occurs on the nth roll is (5/6)n−1 · (1/6). Using this fact, describe the appropriate infinite sample space and distribution function for the experiment of rolling a die until a six turns up for the first time. Verify that for your distribution function ω m(ω) = 1.
22 See

Knot X, in Lewis Carroll, Mathematical Recreations, vol. 2 (Dover, 1958).

38

CHAPTER 1. DISCRETE PROBABILITY DISTRIBUTIONS

23 Let Ω be the sample space Ω = {0, 1, 2, . . .} , and define a distribution function by m(j) = (1 − r)j r , for some fixed r, 0 < r < 1, and for j = 0, 1, 2, . . .. Show that this is a distribution function for Ω. 24 Our calendar has a 400-year cycle. B. H. Brown noticed that the number of times the thirteenth of the month falls on each of the days of the week in the 4800 months of a cycle is as follows: Sunday 687 Monday 685 Tuesday 685 Wednesday 687 Thursday 684 Friday 688 Saturday 684 From this he deduced that the thirteenth was more likely to fall on Friday than on any other day. Explain what he meant by this. 25 Tversky and Kahneman23 asked a group of subjects to carry out the following task. They are told that: Linda is 31, single, outspoken, and very bright. She majored in philosophy in college. As a student, she was deeply concerned with racial discrimination and other social issues, and participated in anti-nuclear demonstrations. The subjects are then asked to rank the likelihood of various alternatives, such as: (1) Linda is active in the feminist movement. (2) Linda is a bank teller. (3) Linda is a bank teller and active in the feminist movement. Tversky and Kahneman found that between 85 and 90 percent of the subjects rated alternative (1) most likely, but alternative (3) more likely than alternative (2). Is it? They call this phenomenon the conjunction fallacy, and note that it appears to be unaffected by prior training in probability or statistics. Explain why this is a fallacy. Can you give a possible explanation for the subjects’ choices?
23 K.

McKean, “Decisions, Decisions,” pp. 22–31.

1.2. DISCRETE PROBABILITY DISTRIBUTIONS

39

26 Two cards are drawn successively from a deck of 52 cards. Find the probability that the second card is higher in rank than the first card. Hint: Show that 1 = P (higher) + P (lower) + P (same) and use the fact that P (higher) = P (lower). 27 A life table is a table that lists for a given number of births the estimated number of people who will live to a given age. In Appendix C we give a life table based upon 100,000 births for ages from 0 to 85, both for women and for men. Show how from this table you can estimate the probability m(x) that a person born in 1981 would live to age x. Write a program to plot m(x) both for men and for women, and comment on the differences that you see in the two cases. *28 Here is an attempt to get around the fact that we cannot choose a “random integer.” (a) What, intuitively, is the probability that a “randomly chosen” positive integer is a multiple of 3? (b) Let P3 (N ) be the probability that an integer, chosen at random between 1 and N , is a multiple of 3 (since the sample space is finite, this is a legitimate probability). Show that the limit P3 = lim P3 (N )
N →∞

exists and equals 1/3. This formalizes the intuition in (a), and gives us a way to assign “probabilities” to certain events that are infinite subsets of the positive integers. (c) If A is any set of positive integers, let A(N ) mean the number of elements of A which are less than or equal to N . Then define the “probability” of A as P (A) = lim A(N )/N ,
N →∞

provided this limit exists. Show that this definition would assign probability 0 to any finite set and probability 1 to the set of all positive integers. Thus, the probability of the set of all integers is not the sum of the probabilities of the individual integers in this set. This means that the definition of probability given here is not a completely satisfactory definition. (d) Let A be the set of all positive integers with an odd number of digits. Show that P (A) does not exist. This shows that under the above definition of probability, not all sets have probabilities. 29 (from Sholander24 ) In a standard clover-leaf interchange, there are four ramps for making right-hand turns, and inside these four ramps, there are four more ramps for making left-hand turns. Your car approaches the interchange from the south. A mechanism has been installed so that at each point where there exists a choice of directions, the car turns to the right with fixed probability r.
24 M.

Sholander, Problem #1034, Mathematics Magazine, vol. 52, no. 3 (May 1979), p. 183.

40

CHAPTER 1. DISCRETE PROBABILITY DISTRIBUTIONS (a) If r = 1/2, what is your chance of emerging from the interchange going west? (b) Find the value of r that maximizes your chance of a westward departure from the interchange.

30 (from Benkoski25 ) Consider a “pure” cloverleaf interchange in which there are no ramps for right-hand turns, but only the two intersecting straight highways with cloverleaves for left-hand turns. (Thus, to turn right in such an interchange, one must make three left-hand turns.) As in the preceding problem, your car approaches the interchange from the south. What is the value of r that maximizes your chances of an eastward departure from the interchange? 31 (from vos Savant26 ) A reader of Marilyn vos Savant’s column wrote in with the following question: My dad heard this story on the radio. At Duke University, two students had received A’s in chemistry all semester. But on the night before the final exam, they were partying in another state and didn’t get back to Duke until it was over. Their excuse to the professor was that they had a flat tire, and they asked if they could take a make-up test. The professor agreed, wrote out a test and sent the two to separate rooms to take it. The first question (on one side of the paper) was worth 5 points, and they answered it easily. Then they flipped the paper over and found the second question, worth 95 points: ‘Which tire was it?’ What was the probability that both students would say the same thing? My dad and I think it’s 1 in 16. Is that right?” (a) Is the answer 1/16? (b) The following question was asked of a class of students. “I was driving to school today, and one of my tires went flat. Which tire do you think it was?” The responses were as follows: right front, 58%, left front, 11%, right rear, 18%, left rear, 13%. Suppose that this distribution holds in the general population, and assume that the two test-takers are randomly chosen from the general population. What is the probability that they will give the same answer to the second question?

25 S. Benkoski, Comment on Problem #1034, Mathematics Magazine, vol. 52, no. 3 (May 1979), pp. 183-184. 26 M. vos Savant, Parade Magazine, 3 March 1996, p. 14.

Similar Documents

Premium Essay

Loyalty Cards

...The Effectiveness Of Loyalty Cards To Build Store Loyalty. The Case Of Tesco YourFirstName YourLastName University title EFFECTIVENESS OF LOYALTY CARDS TO BUILD THE STORE LOYALTY Abstract The main purpose of this research is to assess the effectiveness of loyalty cards and identify how loyalty cards build store loyalty. The research entailed selecting a survey from a sample of British households, which included 175 loyalty card holder and 175 non loyalty card holders. The data involved sending of a structured questionnaire to potential participants via email and the result was expected back in a period of two week time. Both the qualitative and quantitative methods of data analysis were used, Cronbach’s Alpha and Independent T-Test. The qualitative analysis was done with the use of content analysis. The findings from the research revealed that loyalty cards are less effective to build behavioral store loyalty but more effective in influencing attitudinal store. Introduction Considering the current economical crisis, new and existing companies have a reason to establish as well as maintain customers and earn profit as much as possible. New trends have emerged in the market which involves shortening products life cycles, and specialization in service area as well as focusing on customer preference, (Lee et al, 2010). Different countries adopt different methods of implementing customer relationship management (CRM). Traditionally, companies focused...

Words: 5630 - Pages: 23

Premium Essay

Customer Loyalty

...Customer loyalty has become a central goal of modern business. With each additional year of a relationship, customers become less costly to serve. Over time, as the loyalty life cycle plays out, loyal customers even become business builders: buying more, paying premium prices, and bringing in new customers through referrals (O’Brien & Jones, 1996). A rewards program can accelerate the loyalty cycle, encouraging first or second year customers to behave more like a company’s most profitable tenth-year customers – but only if it is planned and implemented as part of a larger loyalty management strategy (O’Brien & Jones, 1996). 4/5 consumers agree that companies should reward loyal customers instead of offering the best deals to new customers. This is particularly relevant to FMG who strives to compete on quality, not price. FMG has the need to adapt its current loyalty program to be more in-line with modern discussion around the topic. West (2004) found that when monetary rewards are given they tend to be spent on day-to-day living and the reward is not as morale boosting as hoped. Our current 1% discount per year model thus fails to provide our clients with the intended feeling of goodwill towards FMG. Moreover, as customers become accustomed to receiving a reward under a loyalty program they begin to view the rewards as an entitlement (Haisley & Loewenstein, 2011). This implies that while our reward magnitude is increasing over time, their perceived value will not increase...

Words: 1337 - Pages: 6

Premium Essay

Brand Loyalty

...Brand loyalty Click below Link for Answer http://workbank247.com/q/brand-loyalty/3937 http://workbank247.com/q/brand-loyalty/3937 This assessment asks you to write an essay on the topic of brand loyalty. The essay question is: Is the use of rewards and loyalty cards an effective way to build brand loyalty?In your essay you need to examine the affect of brand loyalty on the consumers’ purchase and product involvement and their decision-making process.Preparing for your essayYou need to do research using relevant academic and industry sources and prepare an argument.Below are some steps to help you gather the relevant information and prepare your essay.Step 1: Develop an understanding the marketplace and the current rewards systemIn an attempt to win the “supermarket war”, Coles has recently introduced a revised loyalty rewards systems.Please go online and review the details of the reward system on the Coles website and review the case study story called “Has Coles got revamped flybuys right?”See website link to B &T Magazine on Interact site In addition to these sources, you need to understand the situation in the market place- so it is a good idea to search for other media stories related to loyalty programs in other supermarkets.Step 2: Theory about brand loyalty and the decision-making stagesYou need to include theory in your essay, so in addition to the text book, you need to find some academic journal articles. Please use the followingtwoarticles:Bridson, K., Evans...

Words: 564 - Pages: 3

Premium Essay

Loyalty Marketing

...2012 Trends in Loyalty Marketing 2 2 0 1 2 T R E N D S I N L O YA LT Y M A R K E T I N G TABLE OF CONTENTS 1 GAME MECHANICS ................................................................... 3 2 SOCIAL MEDIA ........................................................................... 4 3 TIME-BASED POSTING ................................................................5 4 PRIVACY .......................................................................................6 5 REAL-TIME MARKETING .............................................................7 6 MOBILE ........................................................................................7 7 STAYING AHEAD OF THE CURVE ..............................................8 3 2 0 1 2 T R E N D S I N L O YA LT Y M A R K E T I N G Today’s standards for measuring the strength of customer engagement – frequency, recency, monetary value, and advocacy – are really no different than those in the hit American television show Mad Men, set more than fty years ago. While the measures of success remain the same, the marketing landscape has changed beyond recognition, with enormous challenges to establishing any customer relationship at all, much less advocacy. Marketers can’t simply tack on the internet to the TV, print, radio, and outdoor options available in the Mad Men era. The internet itself is fragmented into social media, search, games, the blogosphere, entertainment channels, and conventional ...

Words: 2081 - Pages: 9

Free Essay

2013 Loyalty Census

...JUNE 2013 Bulking Up: The 2013 COLLOQUY Loyalty Census Growth and Trends in U.S. Loyalty Program Activity Jeff Berry Sr. Director, Knowledge Development and Application, LoyaltyOne Research Director, COLLOQUY 2.65 BILLION That’s the number of U.S. loyalty program memberships in 2012 Learn why that incredible figure is both good and bad news for loyalty marketers, and how you can profitably respond to it COLLOQUY.COM | 06.13 | 1 Bulking Up: The 2013 COLLOQUY Loyalty Census Growth and Trends in U.S. Loyalty Program Activity Introduction When our 2011 COLLOQUY Loyalty Census revealed that the number of loyalty-program memberships in the U.S. topped 2 billion in 2010, you could almost hear the sound of the industry doing a double-take. Now it’s time to listen for that sound again, because the loyalty tally leaped to 2.65 billion in 2012. Obviously, the loyalty industry is pumping up. And what fascinates us as much as half a billion of pure growth are the reasons behind this continued expansion. Back in 2000, COLLOQUY tallied 973 million memberships in our landmark program sizing study. By the time we published the 2009 COLLOQUY Loyalty Census, that figure had reached 1.8 billion memberships. In the 2011 Census, we noted that the rate of expansion was slowing when we saw only 16.3% growth between 2009 and the eye-popping 2 billion in 2011. But the slowdown was temporary. The current figure of 2.65 billion represents additional 26.7% growth. Where is this bulk...

Words: 5669 - Pages: 23

Premium Essay

Walmart Customer Loyalty Program

...Walmart Customer Loyalty Program Save Money, Live Better, Reward Yourself! Travis Godderidge Rick Hayes Darrell Johnson Chad Mikels HBM 581 Services Management Section 2 Team 5 February 24, 2013 [Type  text]         Table of Contents Executive Summary Background Research Companies researched, program names and descriptions Description of the Proposed Program Company overview Target customers Suggested Program Names Features and Benefits Terms and Conditions Duration of the Program Initiating and Marketing Tracking Device for Customers Estimates costs to implement and sustain the loyalty program Summary and Recommendations References 1     2 3 8 9 9 9 10 11 12 13 14 15 16     Executive Summary In the fierce world of global discount retail competition, differentiation is critical to long term success and survival. Walmart was founded on the differentiating factor of the lowest prices for the highest quality products. While this has served Walmart well and will continue to do so with global sourcing and supply chain management to secure the best products at the best prices, an expansion of the services provided to the customer and the ability to obtain demographic spending information and trends will prove invaluable to Walmart’s continued success and rapid global expansion. Additionally, programs that help retain customers once the customer has entered the store will increase market share more rapidly and increase...

Words: 4862 - Pages: 20

Premium Essay

Gnc Customer Loyalty

...Customer Loyalty 2013 (Promotion change) GNC new Member Pricing Program known as the Member Price Program (allows members to access GNC's products at a reduced cost) A previous GNC loyalty program, known as Gold Card, has nearly 7 million active members who will be transferred to the new program.  For GNC, which sells commodity items such as vitamins and supplements that customers are likely to repurchase on an ongoing basis, a strong loyalty program can be a true differentiator from its competition. The brand believes its new pricing program is just that. - In less than 2 months, GNC Holdings has added 3 million new customers to its Member Price Gold Card loyalty program that rolled out during May and June Company accomplished its primary goals regarding the new Membership Price Gold Card customer loyalty program: * New Gold Card member acquisition * Increased flexibility to manage pricing * More transaction growth spread throughout the month * Improved price perception * Better program awareness among new customers. Most significantly, the new Membership Gold Card program addressed the biggest customer complaint against the previous version of the program, which was the inability to use the card throughout the month. Under the new Member Price program, customers save up to 50% on every item everyday at any GNC store or on their website. Under the old program, customers paid $15 for a one-year membership and received 20% off purchases during the...

Words: 626 - Pages: 3

Premium Essay

“Loyalty Cards Cost Shoppers Millions

...MKTG7501 Consumer Behaviour “Loyalty cards cost shoppers millions” Introduction - Everyday Reward Card  What is “Everyday Rewards Card”? The Everyday Rewards Card is an initiative from Woolworths that provide a series of benefit rewards to their customers. Such as fuel saving offers, exclusive offers and discounts of their products. Also, now it’s able to earn Qantas Frequent Flyer points when you shopping in Woolworths, BIG W, Dick Smith and Tandy stores.  How to accumulate reward points? You can earn one point per dollar when spent over $30 in one transaction when you shop in Woolworths, BIG W and one point per dollar when spent over $50 in one transaction when you shop in the Dick Smith and Tandy stores. Pros & Cons of using Everyday Rewards Card  Advantages There are three major advantages that Woolworths offers its rewards card holders : save 4c per litre on fuel at Caltex petrol stations, receive exclusive offers and discounts from all participating retailers, and earn Qantas frequent flyer points for a free flight ticket.  Disadvantages According to a recent report from a consumer group Choice, consumers will unconsciously spend more than what they can gain from these kind of loyalty programs. First of all, paybacks are very "poor". Woolworths Everyday Rewards program averagely offers less than $1 return for every $100 a shopper spends. And if anyone want to collect points for free flight tickets, he or she will need to spend at least $22,030 to gain...

Words: 662 - Pages: 3

Premium Essay

Bsa/310 Kudler Loyalty Program

...Kudler Loyalty Program Adam Love BSA/310 5/30/2011 Deborah Marshall Kudler Loyalty Program Kudler Fine Foods is a well-known California grocer specializing in high quality and specialty foods. Since Kudler does not consider the price of its products to be a primary differentiator from other grocers their intent is to produce more profit through increasing their customer purchase cycle. To achieve their goal Kudler has partnered with a loyalty points program to provide customers with points which can be redeemed for high end gift items, airline first-class upgrades, or other specialty foods. This program will also enable the ability to track customer transactions and identify trends that will allow Kudler Fine Foods to customize its product base to better serve its customers. Before implementing this program, Kudler will need to identify legal, ethical, and security considerations as well as complete a financial analysis to justify the expenditures of the loyalty program. The Program According to Kudler Fine Food’s Sales & Marketing intranet website, customers prefer high quality and specialty foods are more important than high value prices. Kudler’s Sales and Marketing Department is developing a Frequent Shopper Program which will offer incentives to its members through point accumulation and offer Kudler valuable insight into the shopping tendencies of customers. Initially, this program should promote customer loyalty and give Kudler...

Words: 1159 - Pages: 5

Premium Essay

How Businesses Develop Brand Loyalty

...Abstract Brand loyalty is one of the most powerful marketing tools available, and it ensures that your products will be desired again. Many consumers develop loyalty to products of all types. Cars, electronics, clothing, and even groceries are often chosen based on prior experiences with particular brands. There are many ways to create and maintain brand loyalty. Three of those ways include rewards programs, customer satisfaction, and advertising. This report covers the definition, creation, and influence of social media on brand loyalty.   How Businesses Develop Brand Loyalty All companies strive to create brand loyalty. It is one of the most powerful marketing tools available, and it ensures that your products will be desired again. Many consumers develop loyalty to products of all types. Cars, electronics, clothing, and even groceries are often chosen based on prior experiences with particular brands. How do companies develop this loyalty? In this report I will cover the definition, creation, and influence of social media on brand loyalty. The Hutchinson Unabridged Encyclopedia (2014) defines Brand Loyalty as “the attachment of the consumer to a particular brand, even though that brand may be more expensive than competing brands.” It is a psychological contract between the consumer and the brand, and is the target for all marketers. Consumers tend to develop devotion to specific brands. They buy what they have already bought in the past and opt for the brands that they...

Words: 1175 - Pages: 5

Premium Essay

The Influence of Brand Loyalty on Cosmetics Buying Behavior

...performance with an average of 5% growth rate among the industries every year (Kumar, 2005). According to Britannica (2011) the word "cosmetics" is derived from the Greek word Kosmetikos, which means "skilled at decorating". It refers to any of several preparations (excluding soap) that are applied to the human body for beautifying, preserving, or altering the appearance or for cleansing, coloring, conditioning, or protecting the skin, hair, nails, lips, eyes, or teeth. Besides that, when people hear the word ‘cosmetic’, they tend to think of makeup and perfume design for women. Actually, cosmetics come in many forms, which are powder, soap, shampoo, body makeup, toothpaste and others (Kumar et al., 2006). According to Wikipedia, brand loyalty can be defined as the situation in which the consumer usually...

Words: 4242 - Pages: 17

Premium Essay

Evaluate the Viability of Loyalty Programs in the Cruising Industry

...1. Evaluate the viability of loyalty programs in the cruising industry. Believe it or not people like the idea of feeling important and nothing seems more important than getting perks. So why belabor the point by putting up barriers to the customer and I like the Idea of loyalty rewards I hate the idea of carrying an extra card or coupon to be redeemed but I don’t want a coupon and I don’t want to have to redeem anything at some website. Whenever you ask the consumer to complete an additional action beyond the transaction in order to gain credit in a loyalty program, you're placing distance between the positive emotion elicited by the reward and the action of purchasing an item or experience. http://www.huffingtonpost.com/seth-priebatsch/customer-loyalty-programs_b_859462.html I like this Idea for Carnival because it could be very transactional and very lucrative...Let’s say a customer enjoyed a spa service, a few hours in the casino and a rock climbing session So why not use these customers experience as personal guides to what will attract these individual back to the Carnival cruise experience. So instead of offering a membership let them know when they Cruse with Carnival they are lifetime voyagers. If they enjoyed the spa offer them an upgrade them or comp them on their next Voyage treatment, Or matching funds at the casino even a free go on the Rock wall. Really if the idea here is to draw repeat customers why hide it. Just add the complimentary stuff to their...

Words: 795 - Pages: 4

Premium Essay

Security Considerations: Kudler Fine Foods Loyalty Program

...Initiation Phase During the initiation phase of the Loyalty Points project, the development team must store one to one interviews with key stakeholders in a secure location. The key deliverables in this stage are the business requirements document (BRD) and the project charter. Hence, the project manager (PM) must store these notes and documents in a secure location. If the documents are in electronic format, having a full disk encryption technique like Secure Disk from Seagate Technologies or a file encryption program like PGP Desktop would be an acceptable choice to cipher all documents in the event an inside or outside attacker stole any hardware during the initiation phase project documentation of the loyalty points program. (Conklin et. al., 2012) Planning Phase In the planning phase of the loyalty points program, JAD session notes between stakeholders and the development team along with the work breakdown structure (WBS), Data Flow Diagrams (DFD), and system flow charts will be the key deliverables for this phase. Like the initiation phase, the PM must consider what medium to store JAD, WBS and DFD documents. If the PM stores these documents in electronic format, using a full disk or file encryption program in tandem with username/password combination for the laptop or desktop OS would be a good layered defense for user authentication to gain access to these project documents. (Conklin et. al., 2012) If the deliverables are in hard copy, having the copies...

Words: 1182 - Pages: 5

Premium Essay

Retailing Lessons from Loyalty Programs Around the Globe

...Retailing Lessons from Loyalty Programs Around the Globe Terlisa Smith Professor Cynthia Mable Retail Management – MKT 310 January 29, 2012 Loyalty programs are invented to reward, and encourage, loyal buying behavior. This helps promote consumers to keep coming back to the store.  When consumers present the cards they either get a discount on the current purchase, or points that can be used for future purchases.  Discuss the CVS Extra Care loyalty program and whether the program is a source of shopper insights that can be leveraged in multiple ways to be successful.  Since launching in 2001, CVS/pharmacy’s Extra Care Rewards Program has become the largest retail rewards program in the U.S., with more than 66 million active cardholders. It’s one of the most popular loyalty programs. The Extra Care program pays back customers 2 percent on purchases in the form of a coupon printed at the bottom of the store receipt. It also rewards customers with $1.00 to spend on front store merchandise for every two prescriptions purchased. The Extra Care program operates as an extra layer to our mass marketing efforts. It allows CVS to target customers with offers that print at the register and in direct mailings according to what they really buy and need regularly. Card sales provide CVS with another body of information that helps provide better services and products, customize our promotional plans, and generally be a better pharmacy retailer for customers. Recommend how...

Words: 761 - Pages: 4

Premium Essay

Case Analysis Nectar Making Loyalty Pay

...5: Case 5: Promotion Decisions MAKING LOYALTY PAY NAME STUDENT ID Nancy Kumari S11013306 Saher Buksh S01007626 Aditya Raniga S11052046 Table of Contents CASE BACKGROUND 2 NECTAR 2 Nectar Promotions: 3 SAINSBURY 4 Therefore the aim of this case analysis is to find out: 4 CUSTOMER REGISTRATION 5 TWO METHODS OF REDEEMING POINTS 5 COMPETITION IN THE UK GROCERY RETAIL INDUSTRY 6 COMPARISION WITH COMPETITORS 7 SITUATIONAL ANALYSIS 8 CONSUMER PERCEPTION OF NECTAR PROGRAM 8 SPONSOR PERCEPTION OF NECTAR PROGRAM 9 SHOULD SAINSBURY CONTINUE WITH NECTAR? 10 SHOULD SAINSBURY HAVE ITS OWN LOYALTY PROGRAM? 12 SHOULD SAINSBURY DISCONTINUE WITH THE CURRENT LOYALTY PROGRAM & DIVERT FUNDS TO NON-LOYALTY PROGRAMS 14 CUSTOMER LOYALTY PROGRAM IN FIJI 16 RECOMMENDATION 17 CONCLUSION 20 BIBLIOGRAPHY 21 CASE BACKGROUND NECTAR • The Nectar loyalty program is the largest loyalty program in Britain. • It was launched by Loyalty Management UK (LMUK and chaired by Air Miles co-founder Keith Mills). • Nectar opened for business in 2002. • With reference to various individual retail loyalty programs, Keith Mills,Gierkink, and further members of the LMUK team undertook the decision to search for an assembly of retailers who had their individual reward programs however were not content with their performance and could enhance their performance by combining with a multi partner loyalty scheme. • The initial four sponsors of LMUK were Sainsbury...

Words: 3004 - Pages: 13