How many Points are Expected to be Played?

Having recently discussed the Theory of Point Differences, I will now discuss the theory of the total number of points played in a game and in a match. The expected number of points is often a good measure of how one-sided a match is expected to be: if the expected number of rallies is rather low, one player is considered the favorite and has a good chance of winning a short match. For players or pairs of equal strength, on the other hand, matches are expected to last longer, often going to three games.

Note that in the absence of lets and penalties the number of points is equal to the number of rallies played. We will therefore regard the two values for number of rallies and number of points as the same for this post.

How to Calculate the Expected Number of Points

Averages

Under the assumption that one player wins each rally with the constant probability \(p\), the average number of points in a game can be calculated as follows:

\[\begin{eqnarray} \bar{N}_{rallies per game} (p) &=& \sum\limits_{n_2 = 0}^{19} \, (21+n_2) \cdot \dbinom{20+n_2}{20} p^{20} (1-p)^{n_2}\cdot p \\ &+& \sum\limits_{i = 0}^8 (42 + 2i) \cdot \dbinom{20+20}{20} p^{20} (1-p)^{20} \cdot \left[ 2p(1-p) \right]^i \cdot p^2 \\ &+& 59 \cdot \dbinom{20+20}{20} p^{20} (1-p)^{20} \cdot \left[ 2p(1-p) \right]^9 \cdot p \\ &+& 59 \cdot \dbinom{20+20}{20} p^{20} (1-p)^{20} \cdot \left[ 2p(1-p) \right]^9 \cdot (1-p) \\ &+& \sum\limits_{i = 0}^8 (42+2i) \cdot \dbinom{20+20}{20} p^{20} (1-p)^{20} \cdot \left[ 2p(1-p) \right]^i \cdot (1-p)^2 \\ &+& \sum\limits_{n_1 = 0}^{19} \, (n_1+21) \cdot \dbinom{n_1+20}{n_1} p^{n_1} (1-p)^{20}\cdot (1-p) \end{eqnarray}\]

The formula is quite similar to the one used in the post Theory of Point Differences. The first line corresponds to games won with a score of 21 to at most 19, the second line to games won by two points after extra points (22-20 up to 30-28), and the third line to games won with a score of 30-29. The remaining lines correspond to the same games in reverse order, but lost. Each probability is weighted by the number of points played, given by the first factor of each term.
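The sum can also be evaluated numerically. The following is a minimal Python sketch, not the code used for this post; the function names `game_length_pmf` and `expected_rallies` are made up for illustration. It builds the probability of each possible game length from the terms of the formula and then takes the weighted sum. For a rally probability of 50% it should reproduce the value of about 37.24 given in the table further down.

```python
from math import comb


def game_length_pmf(p):
    """Return {rallies: probability} for a single game to 21 (cap at 30).

    p is the constant probability that side 1 wins a rally.
    Wins and losses of side 1 are merged, since only the length matters here.
    """
    q = 1.0 - p
    pmf = {}
    # Scores 21-k or k-21 with k = 0..19: the winner takes the last rally.
    for k in range(20):
        ways = comb(20 + k, k)
        pmf[21 + k] = ways * (p**21 * q**k + q**21 * p**k)
    # Probability of reaching 20-20.
    deuce = comb(40, 20) * (p * q) ** 20
    # From 20-20: i pairs of rallies are split, then one side wins two in a row.
    for i in range(9):
        pmf[42 + 2 * i] = deuce * (2 * p * q) ** i * (p * p + q * q)
    # 29-29: a single rally decides the game, giving 59 rallies in total.
    pmf[59] = deuce * (2 * p * q) ** 9
    return pmf


def expected_rallies(p):
    """Average number of rallies in a game, weighting each length by its probability."""
    return sum(n * prob for n, prob in game_length_pmf(p).items())


if __name__ == "__main__":
    for p in (0.5, 0.55, 0.6):
        print(p, round(expected_rallies(p), 2))
```

Merging the won and lost branches into one dictionary keeps the sketch short; if the winner of the game is needed as well, the two branches have to be kept separate.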

The average for a match can then be calculated using the formula

\[\begin{equation} \bar{N}_{rallies per match} (p) = \bar{N}_{games per match}(p) \cdot \bar{N}_{rallies per game} (p) \end{equation}\]

where \(\bar{N}_{games per match}(p)\) is the average number of games per match.
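The average number of games is not spelled out above. As a sketch, assuming a best-of-three match with independent games, and writing \(P_{game}(p)\) for the probability of winning a single game (a symbol introduced here only for illustration; it is the sum of the first three lines of the formula for the average without the weights), it can be written as

\[\begin{equation} \bar{N}_{games per match}(p) = 2 + 2 \, P_{game}(p) \left[ 1 - P_{game}(p) \right] \end{equation}\]

A third game is only played if the first two games are split, which happens with probability \(2 P_{game} (1 - P_{game})\). For a rally probability of 50% this gives 2.5 games and therefore \(2.5 \cdot 37.24 \approx 93.1\) rallies, in line with the table further down.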

Medians

The calculation of the median uses similar formulas. The game or match outcomes are ordered by the number of points and their probabilities are summed, starting with the lowest number of points, until a cumulative probability of 50% is reached. The number of points at which this happens is the median.
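The procedure can be sketched in a few lines of Python. The function name `median_from_pmf` and the toy distribution are hypothetical and only serve as an illustration. The convention of returning the first value at which the cumulative probability reaches 50% is an assumption, as the post does not state how ties are handled.

```python
def median_from_pmf(pmf):
    """Return the smallest outcome whose cumulative probability reaches 50%.

    pmf maps a number of points to its probability.
    """
    cumulative = 0.0
    for n in sorted(pmf):
        cumulative += pmf[n]
        if cumulative >= 0.5:
            return n
    raise ValueError("probabilities sum to less than 0.5")


# Toy distribution, not real badminton data.
toy = {21: 0.2, 30: 0.25, 37: 0.3, 40: 0.25}
print(median_from_pmf(toy))  # prints 37
```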

Plots

Rallies Per Game

The following plot shows the average and the median for the number of rallies in a game for different probabilities to win a single rally.

Average and Median of Points in a Game for different Rally Probabilities

We see that there is no big difference between the median and the average. Both curves start at 21 points and rise towards the center of the plot. The maximum number of expected points is reached for a game between players of equal strength, where the median is 37 and the average is slightly larger.

The following three plots show histograms of the number of rallies for rally probabilities of 50%, 55% and 60%. The dashed vertical line indicates the average, while the dotted vertical line shows the median. The horizontal axis shows the number of rallies. Possible values range from 21, corresponding to a score of 21-0, to 59, which corresponds to the highest possible game score of 30-29. Of course these extremes are rather unlikely; most games will have a number of points in the middle of the range.

Histogram of points in a game for a rally probability of 50%

In the first plot we see a maximum of about 12.5% at 40 rallies. This means that about one in eight games between equally strong players or pairs will end with a score of 21-19. A bit more than 6% of games end after 42 points, i.e. with a score of 22-20. For each additional pair of points the probability halves, finally leading to a probability of 0.024% for a score of 30-29.
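These numbers can be checked by hand by dropping the weights in the formula for the average and setting \(p = 1/2\). As a short worked calculation, with \(P(N = n)\) used here as ad hoc notation for the probability that a game lasts exactly \(n\) rallies:

\[\begin{eqnarray} P(N = 40) &=& 2 \cdot \dbinom{39}{20} \left( \tfrac{1}{2} \right)^{40} = \dbinom{40}{20} \left( \tfrac{1}{2} \right)^{40} \approx 12.5\,\% \\ P(N = 42) &=& \dbinom{40}{20} \left( \tfrac{1}{2} \right)^{40} \cdot \tfrac{1}{2} \approx 6.3\,\% \\ P(N = 59) &=& \dbinom{40}{20} \left( \tfrac{1}{2} \right)^{40} \cdot \left( \tfrac{1}{2} \right)^{9} \approx 0.024\,\% \end{eqnarray}\]

The factor \(\tfrac{1}{2}\) in the second line is the probability that one side wins both rallies after 20-20. At 29-29 a single rally decides the game, so a score of 30-29 has the same probability as a score of 30-28.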

Histogram of points in a game for a rally probability of 55%

Increasing one player’s strength shifts the histogram towards lower values. The maximum is now at 39 rallies, and the frequencies for 38 and 40 rallies are almost the same.

Histogram of points in a game for a rally probability of 60%

For a rally probability of 60% one would naively expect games to end with scores of about 21-14. The histogram is thus shifted even further towards lower numbers of rallies and now peaks at 34. The entries for games needing extra points are also smaller.

Rallies Per Match

The following plot shows the average and the median for the number of rallies in a whole match for different probabilities to win a single rally.

Average and Median of Points in a Match for different Rally Probabilities

This plot looks different from the plot for games shown above. The peaks around a rally probability of 50% are much narrower due to the greater number of rallies. There is also a gap between the averages and the medians. This is because matches consisting of three games have a bigger influence on the average than on the median.

These three plots show histograms for the number of rallies in a complete match for three values of the rally probability, that is for rally probabilities of 50%, 55% and 60%. As above, the dashed vertical line indicates the average while the dotted vertical line shows the median.

Histogram of points in a match for a rally probability of 50%

For a rally probability of 50%, we clearly see the two peaks for two- and three-game matches. For two-game matches the peak is at 75 rallies, while for three-game matches it is at 112. The curve for three-game matches is a bit broader, as the additional game adds further width to the distribution. Both the average and the median are at about 93, in a region with rather low frequencies between the two peaks. This region is hard to reach, as it requires either a two-game match with two very long games or a three-game match with very short games. Possible scorelines for matches with exactly 94 points would be, for example, 27-25 22-20 or 21-10 11-21 21-10.

Also note that in some regions, most notably between 80 and 90, matches with an even number of rallies are more likely than matches with an odd number. This is due to games with a final score between 21-19 and 30-28, which always have an even number of points. If a given number of points in a match is predominantly reached via such scores, matches with an even number of points are favoured.
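A histogram like this can be obtained by combining the distributions of single games. The following Python sketch is one possible way to do it, under the assumptions of a best-of-three match, independent games and a constant rally probability; it is not the code used for the plots, and all function names are hypothetical. Games won and lost by side 1 are kept separate, because the number of games depends on who wins the first two.

```python
from collections import defaultdict
from math import comb


def game_pmf_by_winner(p):
    """Return (win, loss): {rallies: probability} for games won or lost by side 1."""
    q = 1.0 - p
    win, loss = {}, {}
    for k in range(20):  # scores 21-k and k-21
        ways = comb(20 + k, k)
        win[21 + k] = ways * p**21 * q**k
        loss[21 + k] = ways * q**21 * p**k
    deuce = comb(40, 20) * (p * q) ** 20  # probability of reaching 20-20
    for i in range(9):  # scores 22-20 up to 30-28
        split = deuce * (2 * p * q) ** i
        win[42 + 2 * i] = split * p * p
        loss[42 + 2 * i] = split * q * q
    last = deuce * (2 * p * q) ** 9  # 29-29, decided by one rally
    win[59], loss[59] = last * p, last * q
    return win, loss


def convolve(a, b):
    """Distribution of the sum of two independent rally counts."""
    out = defaultdict(float)
    for n1, p1 in a.items():
        for n2, p2 in b.items():
            out[n1 + n2] += p1 * p2
    return out


def match_length_pmf(p):
    """Return {rallies: probability} for a best-of-three match."""
    win, loss = game_pmf_by_winner(p)
    match = defaultdict(float)
    # Two-game matches: side 1 wins both games or loses both games.
    for two_games in (convolve(win, win), convolve(loss, loss)):
        for n, prob in two_games.items():
            match[n] += prob
    # Three-game matches: the first two games are split (two possible orders),
    # then a third game with any result is played.
    any_game = {n: win[n] + loss[n] for n in win}
    for n, prob in convolve(convolve(win, loss), any_game).items():
        match[n] += 2 * prob
    return dict(match)


if __name__ == "__main__":
    pmf = match_length_pmf(0.5)
    print(round(sum(n * prob for n, prob in pmf.items()), 2))
```

The possible match lengths range from 42 rallies (21-0 21-0) to 177 rallies (three games of 30-29).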

Histogram of points in a match for a rally probability of 55%

Increasing the strength of one player or pair reduces the number of three-game matches, as can be seen in the plot. Both peaks also move towards lower values; the three-game peak moves from 112 to 110. Note also that the median and the average are no longer close together, with a difference of almost 10 rallies between them.

Histogram of points in a match for a rally probability of 60%

Increasing the inequality even more further reduces the probability of three-game matches. The curves also move further towards lower numbers of rallies. As the influence of three-game matches weakens, the difference between average and median decreases as well.

Table

The following table gives the averages and medians for the number of rallies in a game and in a match for different probabilities to win a single rally. The columns Side 1 and Side 2 give the rally probabilities. Note that the number of rallies is symmetric in these probabilities, i.e. it doesn’t make a difference whether the probabilities are 70 and 30 or 30 and 70. Therefore only combinations with a probability of at least 50% for the first side are shown.

Side 1 Side 2 Average (Game) Median (Game) Average (Match) Median (Match)
100% 0% 21.00 21.0 42.00 42.0
99% 1% 21.21 21.0 42.42 42.0
98% 2% 21.43 21.0 42.86 43.0
97% 3% 21.65 21.0 43.30 43.0
96% 4% 21.88 22.0 43.75 44.0
95% 5% 22.11 22.0 44.21 44.0
94% 6% 22.34 22.0 44.68 44.0
93% 7% 22.58 22.0 45.16 45.0
92% 8% 22.83 23.0 45.65 45.0
91% 9% 23.08 23.0 46.15 46.0
90% 10% 23.33 23.0 46.67 46.0
89% 11% 23.60 23.0 47.19 47.0
88% 12% 23.86 24.0 47.73 48.0
87% 13% 24.14 24.0 48.28 48.0
86% 14% 24.42 24.0 48.84 49.0
85% 15% 24.71 24.0 49.41 49.0
84% 16% 25.00 25.0 50.00 50.0
83% 17% 25.30 25.0 50.60 50.0
82% 18% 25.61 25.0 51.22 51.0
81% 19% 25.93 26.0 51.85 52.0
80% 20% 26.25 26.0 52.50 52.0
79% 21% 26.58 26.0 53.17 53.0
78% 22% 26.92 27.0 53.85 54.0
77% 23% 27.27 27.0 54.55 54.0
76% 24% 27.63 27.0 55.27 55.0
75% 25% 28.00 28.0 56.01 56.0
74% 26% 28.38 28.0 56.78 56.0
73% 27% 28.77 28.0 57.57 57.0
72% 28% 29.17 29.0 58.40 58.0
71% 29% 29.57 29.0 59.26 59.0
70% 30% 29.99 30.0 60.17 60.0
69% 31% 30.43 30.0 61.14 61.0
68% 32% 30.87 31.0 62.17 61.0
67% 33% 31.32 31.0 63.29 62.0
66% 34% 31.78 31.0 64.50 63.0
65% 35% 32.24 32.0 65.84 64.0
64% 36% 32.71 32.0 67.32 65.0
63% 37% 33.18 33.0 68.96 66.0
62% 38% 33.66 33.0 70.76 67.0
61% 39% 34.12 34.0 72.74 69.0
60% 40% 34.57 35.0 74.89 70.0
59% 41% 35.01 35.0 77.19 71.0
58% 42% 35.43 35.0 79.60 72.0
57% 43% 35.82 36.0 82.05 74.0
56% 44% 36.17 36.0 84.48 75.0
55% 45% 36.48 37.0 86.79 77.0
54% 46% 36.74 37.0 88.88 79.0
53% 47% 36.96 37.0 90.63 81.0
52% 48% 37.11 37.0 91.97 84.0
51% 49% 37.20 37.0 92.80 89.0
50% 50% 37.24 37.0 93.09 93.0

Conclusion

The distributions of the number of rallies in a game and in a match can be computed with the formulas above. For a single game, median and average are always close to each other. The frequency distribution shows that for a game between players of equal strength a score of 21-19 is the most likely outcome. For games with one favorite the expected number of rallies decreases as shown above.

For matches the situation is a bit more complex due to the possibility of two- and three-game matches. This leads to significant differences between medians and averages. For the frequency distributions the two peaks belonging to the different number of games are clearly visible.

One question one might ask is how the distributions would look under different scoring systems. Increasing the number of games would probably smear out the peaks and remove gaps like the one seen between two- and three-game matches. For example, in the proposed scoring system of five games to eleven, the difference between three- and four-game matches would probably not be large enough to create a clear border. This could be calculated in the same way as shown in this post, which might be the topic of a future post.