Statistics of Extra Points


Today we take a look at extra points, i.e. the part of a game after the score has reached 20-all. Normally a game is won by the first player or pair to reach 21 points, but the requirement of a two-point margin makes this overtime necessary. This part of a game is characterized by its tightness: neither side managed to win enough points to conclude the game in the regular duration. Moreover, the first side to gain a lead of two points is awarded the game, so this part is also characterized by the possibility of winning or losing the whole game within just two rallies. If no two-point margin has been established when the score reaches 29-all, a single final rally decides the game.

For this analysis we only use matches played under the current scoring system of best of three games to 21 points. Matches with other scoring systems are ignored, even though they might contain extra points as well.

How often do Extra Points Occur?

First we can count how many games ended with extra points and discuss the distribution of final scores. As all necessary information is contained within the final score, we can include all matches where the final score is known.

Numbers of Games

Games | With Extra Points | Percentage | Without Extra Points | Percentage
1407959 | 113238 | 8.04% | 1294721 | 91.96%

We find that about 8% of all games required extra points, i.e. roughly one game in 12 to 13 is only decided in overtime.

Numbers of Matches

Match Length | Matches | No Game With Extra Points | 1 Game With Extra Points | 2 Games With Extra Points | 3 Games With Extra Points
all | 620239 | 515886 (83.18%) | 95717 (15.43%) | 8387 (1.35%) | 249 (0.04%)
2 games | 452758 | 400472 (88.45%) | 49695 (10.98%) | 2591 (0.57%) | n/a
3 games | 167481 | 115414 (68.91%) | 46022 (27.48%) | 5796 (3.46%) | 249 (0.15%)

We can also count how many matches contained at least one game with extra points. We differentiate by the number of games the match lasted and by the number of games that required extra points.

Out of all matches, about 17 per cent contained at least one game that required extra points. Matches lasting three games have a higher proportion of extended games than two-game matches: the proportion of three-game matches with exactly one game with extra points (27.48%) is about 2.5 times that of two-game matches (10.98%). This is a selection effect, as three-game matches usually feature players or pairs that are more closely matched in strength. Only about one in 670 three-game matches (0.15%) consists of three games that all required extra points.
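The totals in the two tables above can be cross-checked against each other, since every two-game match contributes two games and every three-game match three. A minimal Python sketch using only the counts from the match table (the dictionary layout and the helper `share` are illustrative choices, not part of the original analysis):

```python
# Counts from the match table: matches with 0, 1, 2 or 3 games that needed extra points.
matches = {
    2: [400472, 49695, 2591, 0],    # two-game matches (three extended games impossible)
    3: [115414, 46022, 5796, 249],  # three-game matches
}

def share(row, k):
    """Fraction of matches in `row` with exactly k games that went to extra points."""
    return row[k] / sum(row)

total_matches = sum(sum(row) for row in matches.values())
total_games = sum(n_games * sum(row) for n_games, row in matches.items())
extended_games = sum(k * row[k] for row in matches.values() for k in range(4))

print(total_matches, total_games, extended_games)             # 620239 1407959 113238
print(round(total_games / extended_games, 1))                 # 12.4, one game in 12 to 13
print(round(share(matches[3], 1) / share(matches[2], 1), 2))  # 2.5
print(round(sum(matches[3]) / matches[3][3]))                 # 673
```

The recomputed game totals match the first table exactly, which confirms that both tables are based on the same set of matches.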

Matches where the first two games required extra points

Matches | Same Game Winner | Percentage | Rallies Won in 1st Game | Expected for 2nd Game
4905 | 2591 | 52.8% | 52.2% | 54.0%

One question that naturally arises is whether some players are better at extra points than others. To investigate, we look at all matches where both of the first two games required extra points. These matches are either won in two games, if both games went to the same player or pair, or go to a third game, if the first two games were won by different sides. If players differ in their ability to win extra points, there should be a correlation between the winners of the first and the second game: a player who won the first game in overtime should also be expected to win the second game once it reaches extra points. If the outcome is more random, we would expect a weaker correlation.

We find that in 52.8% of the matches where the first two games required extra points, both games were won by the same player or pair. This is more than half, but not by much. On average, the winner of the first game won 52.2% of the rallies in that game. Adapting the formula from the post Probability to Win a Game to start at a score of 20-all, we would have expected the winner of the first game to win 54.0% of the second games that required extra points. The observed percentage is thus lower than expected. We cannot say that some players are better at winning extra points; what we see is much more likely a regression to the mean, or in other words, winning the first game was just a statistical fluctuation.
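The formula from the post Probability to Win a Game is not reproduced here. As a simplified stand-in, the sketch below assumes that every rally is won independently with the same probability `p`, ignores who is serving, and plays out the game from 20-all with the two-point rule and the single deciding rally at 29-all. The function name `win_from_level` is hypothetical.

```python
def win_from_level(p, level=20, cap=29):
    """Probability that a side winning every rally independently with probability p
    wins the game from level-all (win by two, one deciding rally at cap-all).
    Serve effects are ignored."""
    q = 1.0 - p
    prob = 0.0
    reach = 1.0                   # probability of reaching the current level score
    for _ in range(level, cap):   # level scores 20-all ... 28-all
        prob += reach * p * p     # win the next two rallies: game over
        reach *= 2 * p * q        # split the next two rallies: next level score
    return prob + reach * p       # at cap-all a single rally decides

print(round(win_from_level(0.522), 3))   # 0.544
```

Plugging in the average of 52.2% gives roughly 54.4%, close to the 54.0% quoted above; the small gap may stem from differences between this simplified model and the original calculation.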

Histogram of Final Scores

We now take a look at the distribution of final scores. The following table shows how often each final score occurred, together with the ratio of its frequency to that of the previous (next lower) final score. The subsequent plot shows the same frequencies.

Final Score | Games | Percentage | Ratio to Previous Score
22-20 | 53625 | 3.809% | n/a
23-21 | 27541 | 1.956% | 51.36%
24-22 | 15153 | 1.076% | 55.02%
25-23 | 7794 | 0.554% | 51.44%
26-24 | 4200 | 0.298% | 53.89%
27-25 | 2216 | 0.157% | 52.76%
28-26 | 1275 | 0.091% | 57.54%
29-27 | 636 | 0.045% | 49.88%
30-28 | 358 | 0.025% | 56.29%
30-29 | 440 | 0.031% | 122.91%

Frequency of Final Scores after Extra Points

We see that the frequency starts at around 4% for a score of 22-20 and then roughly halves with each additional pair of points, giving an almost perfectly straight line in this logarithmic plot. Summing the geometric series, 4% × (1 + 1/2 + 1/4 + ...) ≈ 4% × 2, shows that the frequencies add up to about 8%, the total frequency of games with extra points mentioned at the beginning of this post.
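The geometric-series argument can be checked directly against the table. The sketch below, using only the counts from the table, sums the games with extra points, recomputes the ratio column and compares the total with a geometric series whose first term is the 22-20 share and whose ratio is 1/2.

```python
# Game counts per final score, taken from the table above.
final_scores = ["22-20", "23-21", "24-22", "25-23", "26-24",
                "27-25", "28-26", "29-27", "30-28", "30-29"]
games = [53625, 27541, 15153, 7794, 4200, 2216, 1275, 636, 358, 440]
all_games = 1407959  # all games in the database, from the first table

total = sum(games)
print(total, round(100 * total / all_games, 2))   # 113238 8.04

# Ratio of each final score's frequency to that of the previous one
for score, n_prev, n_cur in zip(final_scores[1:], games, games[1:]):
    print(f"{score}: {100 * n_cur / n_prev:.2f}%")

# Geometric series with first term = share of 22-20 and ratio 1/2
first = 100 * games[0] / all_games
print(round(first / (1 - 0.5), 2))                # 7.62
```

The series gives about 7.6%, in line with the roughly 8% counted directly.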

The halving of the frequency can be understood as follows. Suppose the score is 21-20 and each player or pair has a probability of about 50% of winning each rally. This assumption is well justified, as the players have been level for the whole game so far. From this score there is a 50% chance that the leader wins the next rally and thus the game. If not, the score is level again. After the next rally the score will be 22-21 to one side, which then has the same 50% chance to end the game at 23-21. This continues until one side has won the game, at the latest at a score of 30-29. Since every game point is converted with a probability of 50%, the probability of any final score is the probability of reaching 20-all, multiplied by 0.5 for the converted game point and by a further factor of 0.5 for each failed game point before it. We therefore expect the ratios to be about 50%. A final score of 30-29 occurs with certainty if the game point at 29-28 is not converted, so we would expect the frequencies of 30-28 and 30-29 to be the same.
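The coin-flip argument can also be illustrated with a small Monte Carlo simulation. The sketch below assumes that every rally is won by either side with probability 0.5, independently of the serve, and plays out many games from 20-all; the resulting shares should roughly halve from one final score to the next, with 30-28 and 30-29 about equally frequent. The function `play_from_20_all` and the seed are illustrative choices.

```python
import random
from collections import Counter

def play_from_20_all(rng, p=0.5):
    """Play one game from 20-all; side A wins each rally with probability p.
    Returns the final score as 'winner-loser'."""
    a = b = 20
    while True:
        if rng.random() < p:
            a += 1
        else:
            b += 1
        # The game ends with a two-point lead, or at 30 points (30-29 after 29-all).
        if abs(a - b) == 2 or max(a, b) == 30:
            return f"{max(a, b)}-{min(a, b)}"

rng = random.Random(1)   # fixed seed for reproducibility
n = 200_000
counts = Counter(play_from_20_all(rng) for _ in range(n))

# All scores have the form "dd-dd", so plain string sorting orders them correctly.
for score in sorted(counts):
    print(score, f"{counts[score] / n:.4f}")
```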

Looking more closely at the data, we see that the ratios are a bit larger than expected: we expected ratios of 50%, but all ratios except one are above 50%. This hints that serving is a disadvantage. At a score of 21-20, for example, the player or pair with the game point has just won the last rally and is therefore serving on their own game point. If serving is a disadvantage, game points are converted less often, so the score reaches higher numbers more often than expected.
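The serve argument can be turned into a one-parameter model. Under rally-point scoring the side holding a game point has always just won the previous rally, so it is serving. Assuming, for illustration, that the server converts each game point with a fixed probability `s`, the ratio between consecutive final-score frequencies becomes `1 - s`, which can be estimated from the table. The geometric mean over the eight steps from 22-20 to 30-28 is used here as a simple average; a proper fit would be more careful.

```python
# Game counts for final scores 22-20, 23-21, ..., 30-28, 30-29 (from the table)
games = [53625, 27541, 15153, 7794, 4200, 2216, 1275, 636, 358, 440]

# Assumed model: the leader serves on every game point and converts it with
# probability s, so consecutive final-score frequencies differ by a factor (1 - s).
# Average that factor over the 8 steps from 22-20 to 30-28 (geometric mean).
ratio = (games[8] / games[0]) ** (1 / 8)
s = 1 - ratio
print(round(ratio, 3), round(s, 3))                          # 0.535 0.465

# The same model predicts 30-29 : 30-28 = (1 - s) : s instead of 1 : 1.
print(round((1 - s) / s, 2), round(games[9] / games[8], 2))  # 1.15 1.23
```

Under this simple model a server converts a game point only about 46 to 47% of the time, and 30-29 should be more frequent than 30-28 whenever s < 0.5, which points in the same direction as the 122.91% in the table.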

Percentage of Games with Extra Points per Odds

As some bookmakers offer bets on whether a game will require extra points, we can also look at the frequency of games with extra points among those matches in the database that have information about odds. We only include matches that have odds for the winner of the match. The implied probability is calculated as already discussed in the post How do Odds Develop for Repeated Men’s Singles Matches? We use the implied probability of the weaker player or pair and group it into bins with a width of 5%.
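The method from the post How do Odds Develop for Repeated Men’s Singles Matches? is not reproduced here. As an assumption, the sketch below uses one common convention for two-way markets, normalising the inverse decimal odds so that they sum to one, and then assigns the weaker side's probability to a 5%-wide bin. The odds in the example are made up.

```python
def implied_probabilities(odds_a, odds_b):
    """Convert decimal odds of a two-way market into probabilities by normalising
    the inverse odds to sum to 1 (assumed convention; removes the bookmaker margin)."""
    inv_a, inv_b = 1 / odds_a, 1 / odds_b
    total = inv_a + inv_b
    return inv_a / total, inv_b / total

def bin_label(prob, width=5):
    """Label of the bin (in percent) containing prob; the weaker side never
    exceeds 50%, so exactly 50% is put into the last bin."""
    pct = 100 * prob
    lower = min(int(pct // width) * width, 50 - width)
    return f"{lower} to {lower + width} %"

p_a, p_b = implied_probabilities(1.30, 3.40)   # made-up odds for illustration
weaker = min(p_a, p_b)
print(round(weaker, 3), bin_label(weaker))      # 0.277 25 to 30 %
```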

Implied Probability | 1st Game | 2nd Game | 3rd Game
0 to 5 % | 37 of 3642 (1.02%) | 41 of 3642 (1.13%) | 6 of 96 (6.25%)
5 to 10 % | 390 of 10682 (3.65%) | 365 of 10682 (3.42%) | 67 of 1070 (6.26%)
10 to 15 % | 836 of 12564 (6.65%) | 821 of 12564 (6.53%) | 205 of 2578 (7.95%)
15 to 20 % | 953 of 10594 (9.00%) | 878 of 10594 (8.29%) | 270 of 3032 (8.91%)
20 to 25 % | 792 of 8403 (9.43%) | 718 of 8403 (8.54%) | 252 of 2792 (9.03%)
25 to 30 % | 1080 of 10714 (10.08%) | 1025 of 10714 (9.57%) | 390 of 3848 (10.14%)
30 to 35 % | 869 of 7893 (11.01%) | 830 of 7893 (10.52%) | 318 of 3072 (10.35%)
35 to 40 % | 888 of 7934 (11.19%) | 833 of 7934 (10.50%) | 353 of 3238 (10.90%)
40 to 45 % | 724 of 6855 (10.56%) | 774 of 6855 (11.29%) | 290 of 2869 (10.11%)
45 to 50 % | 883 of 8103 (10.90%) | 890 of 8103 (10.98%) | 349 of 3291 (10.60%)

For example, when bookmakers gave one player a winning probability of less than 5%, about 1% of the first games went to extra points. The following plot shows the same data.

Plot

The vertical error bars show the statistical uncertainty and the horizontal error bars the width of the bin. The entries have been shifted slightly off center to make them easier to distinguish.

We find that the percentage of games with extra points increases with the winning probability of the weaker player or pair. This is expected: the more evenly matched the opponents are, the more likely a game with extra points becomes. For fairly competitive matches, i.e. matches with winning probabilities between 30 and 70%, the percentage of games with extra points is largely independent of the implied winning probability and lies roughly between 10 and 11%. Thus we would expect odds for a bet that a particular game will have extra points to be around 9.0 to 10.0 on average.
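The step from percentages to odds is a reciprocal: a bet paying decimal odds o breaks even when the event has probability 1/o. The sketch below pools the first-game entries for implied probabilities between 30% and 50% and attaches a binomial standard error, assumed here to be the kind of statistical uncertainty shown by the vertical error bars.

```python
from math import sqrt

# First-game entries of the odds table for implied probabilities from 30% to 50%
extra = [869, 888, 724, 883]
games = [7893, 7934, 6855, 8103]

rate = sum(extra) / sum(games)
error = sqrt(rate * (1 - rate) / sum(games))   # binomial standard error
fair_odds = 1 / rate                            # decimal odds with zero margin

print(f"{100 * rate:.2f} ± {100 * error:.2f} %, fair odds {fair_odds:.2f}")

# Fair odds for the 10% to 11% range quoted in the text
for p in (0.10, 0.11):
    print(p, round(1 / p, 2))
```

The pooled rate comes out at roughly 10.9%, i.e. fair odds of about 9.2; any bookmaker margin would push the offered odds below such fair values.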

For low winning probabilities we see that the percentages for third games are higher than for the first two games. This is also a selection effect. If a third game was played, the opponents were more closely matched than expected and thus the probability of extra points was also higher than expected.

The percentages presented here are higher than the 8% shown above, which is due to the generally more evenly matched line-ups in matches for which betting is offered.

How does catching up influence winning probabilities?

We will now look at the winning probabilities depending on the number of game points that were not converted before the score reached 20-all. The underlying question is whether missed game points affect the outcome of the game once the score reaches 20-all and the lead has vanished.

For this analysis we can only include matches for which the progression of scores is known, so fewer matches are eligible than for the analysis above.

Histogram of win percentage after catching up

First we take a look at the probability that a player who came back from a deficit to a score of 20-all then goes on to win the game. Naturally the bigger the initial difference was, the more unlikely it is to reach a score of 20-all.

The following table contains the number of games that reached a score of 20-all, differentiated by the score at the moment one side first reached 20 points. It also includes the number of these games that were then won by the side that came back from the deficit.

Distance | Caught up from | Won | Games | Percentage Won
9 | 11-20 | 10 | 17 | 58.82 ± 11.94%
8 | 12-20 | 6 | 22 | 27.27 ± 9.50%
7 | 13-20 | 40 | 92 | 43.48 ± 5.17%
6 | 14-20 | 123 | 256 | 48.05 ± 3.12%
5 | 15-20 | 321 | 659 | 48.71 ± 1.95%
4 | 16-20 | 810 | 1706 | 47.48 ± 1.21%
3 | 17-20 | 1713 | 3704 | 46.25 ± 0.82%
2 | 18-20 | 4072 | 8574 | 47.49 ± 0.54%
1 | 19-20 | 9043 | 18656 | 48.47 ± 0.37%

We see that the percentage of won games fluctuates around 47 to 48 per cent. The percentages for the three largest differences deviate from this, but very few games featured a comeback from such a large deficit. We see no dependence of the winning percentage on the initial distance: after coming back from 14-20 the winning percentage is 48.1%, after coming back from only 19-20 it is 48.5%. Thus we see no hint of an influence of momentum. There seems to be no difference between saving six game points to come back from 14-20 and saving a single game point at 19-20.
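The ± values in the table are consistent with the binomial standard error \(\sqrt{p(1-p)/n}\) of a proportion, where n is the number of games. Assuming that this is how they were computed, the table can be reproduced as follows (the helper name is illustrative):

```python
from math import sqrt

# (distance, won, games) from the table of comebacks to 20-all
rows = [(9, 10, 17), (8, 6, 22), (7, 40, 92), (6, 123, 256), (5, 321, 659),
        (4, 810, 1706), (3, 1713, 3704), (2, 4072, 8574), (1, 9043, 18656)]

def percentage_with_error(won, games):
    """Winning percentage and its binomial standard error, both in percent."""
    p = won / games
    return 100 * p, 100 * sqrt(p * (1 - p) / games)

for d, won, games in rows:
    pct, err = percentage_with_error(won, games)
    print(f"{d}: {pct:.2f} ± {err:.2f} %")
```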

The following plot shows the winning percentages as pie charts. The lowest row shows the percentage over all final scores, and the rows above show the winning percentages for specific final scores. The solid part of each circle represents the percentage of won games, and the colour of the circle represents the number of matches: if there was only one match, the circle is grey; otherwise the colour is given by the palette on the right.

Plot

Again we see that for most entries, the winning probability is slightly less than 50 per cent. Also there is no visible difference between games that are decided after a short overtime, i.e. finishing 22-20, and games taking longer until a decision is reached.

Histogram of win percentage after catching up to 19-19

To separate the influence of the game points we can now analyse games where one opponent came back to a score of 19-all. Numerically the scores of 19-all and 20-all represent almost the same situation. In both cases the side first to gain a lead of two points will win the game. The only difference is that the final rally at 29-all is two rallies closer for a score of 20-all, but this is a minute difference.

Emotionally, on the other hand, the scores differ. At 20-all, one side just had one or more game or even match points that couldn’t be converted. The question that arises is whether these missed game points affect the subsequent performance. We will thus compare the winning percentages after catching up to these scores to separate the influence of missed game points.
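The claim that 19-all and 20-all are numerically almost equivalent can be checked under a simplified model: independent rallies won with a fixed probability p, no serve effect, win by two and a single deciding rally at 29-all. The function name `win_from_level` is illustrative.

```python
def win_from_level(p, level, cap=29):
    """Win probability from level-all for a side winning each rally independently
    with probability p (win by two, one deciding rally at cap-all, no serve effect)."""
    q = 1.0 - p
    prob, reach = 0.0, 1.0
    for _ in range(level, cap):
        prob += reach * p * p        # two rallies in a row end the game
        reach *= 2 * p * q           # one rally each leads to the next level score
    return prob + reach * p          # one deciding rally at cap-all

for p in (0.48, 0.52, 0.55):
    diff = win_from_level(p, 19) - win_from_level(p, 20)
    print(p, f"{diff:.1e}")
```

For the rally-win probabilities tried here, the win probabilities from 19-all and from 20-all differ by less than 0.0001.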

The following table gives the equivalent of the previous table for games where a score of 19-all occurred. Here we compare the winning probabilities of the side that trailed when the opponent reached 19 points and subsequently levelled at 19-all.

Distance | Caught up from | Won | Games | Percentage Won
9 | 10-19 | 17 | 31 | 54.84 ± 8.94%
8 | 11-19 | 52 | 109 | 47.71 ± 4.78%
7 | 12-19 | 114 | 239 | 47.70 ± 3.23%
6 | 13-19 | 267 | 549 | 48.63 ± 2.13%
5 | 14-19 | 639 | 1325 | 48.23 ± 1.37%
4 | 15-19 | 1265 | 2650 | 47.74 ± 0.97%
3 | 16-19 | 2471 | 5187 | 47.64 ± 0.69%
2 | 17-19 | 4414 | 9116 | 48.42 ± 0.52%
1 | 18-19 | 7087 | 14466 | 48.99 ± 0.42%

We can then compare the winning probabilities with the games with the same initial distance from the previous table. The following plot shows these winning percentages after catching up to 19-all and 20-all respectively.

Plot

We see that there are almost no differences. The only visible difference is that for small initial distances, a player or pair catching up to 19-all has a higher winning percentage than one catching up to 20-all. Consequently, an opponent who has just missed one to three game points has a higher chance of winning the game than an opponent who has just lost the same number of rallies and thereby allowed the other side to level at 19-all. Thus it seems that missed game points increase the chance of winning the game, which seems counter-intuitive.
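Whether such differences exceed the statistical uncertainty can be gauged by dividing each difference by the combined standard error of the two percentages. The sketch below does this for distances 1 to 3, using the counts from the two tables and treating the two samples as independent, which is an approximation.

```python
from math import sqrt

# (won, games) for the side catching up, from the 19-all and 20-all tables
catch_19 = {1: (7087, 14466), 2: (4414, 9116), 3: (2471, 5187)}
catch_20 = {1: (9043, 18656), 2: (4072, 8574), 3: (1713, 3704)}

def rate_and_error(won, games):
    """Winning rate and its binomial standard error."""
    p = won / games
    return p, sqrt(p * (1 - p) / games)

z_scores = {}
for d in (1, 2, 3):
    p19, e19 = rate_and_error(*catch_19[d])
    p20, e20 = rate_and_error(*catch_20[d])
    # Difference in units of its combined standard error (samples treated as independent)
    z_scores[d] = (p19 - p20) / sqrt(e19**2 + e20**2)
    print(f"d={d}: {100 * (p19 - p20):+.2f} percentage points, {z_scores[d]:.1f} standard errors")
```

Each of the three differences amounts to roughly one standard error.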

Comparison of numbers of games

Until now, we have only compared the winning percentages. We can also compare how often one side led by a given margin and the other side then caught up.

Assuming that each player or pair wins each rally with a probability of \(p\approx0.5\), we can compute the probability of a comeback from a score of \((19-d) - 19\) to \(19 - 19\) and from \((20-d) - 20\) to \(20 - 20\), respectively. (We use the minus sign both as a mathematical operator and to separate the scores. Where confusion is possible, parentheses make clear which minus sign is the operator.) Here \(d\) is the difference in points at the moment the leader first reaches 19 or 20, respectively. The probability \(P_c\) of the comeback is then given by

\[P_c(19-d,19) = P(0,0 \rightarrow 19-d, 18) \cdot P(19-d,18 \rightarrow 19-d, 19) \cdot P(19-d, 19 \rightarrow 19, 19).\]

The first two factors give the probability that \((19-d) - 19\) is the first score at which the leader has 19 points. The leader must therefore have won the last rally, so this probability is the product of the probability of reaching \((19-d) - 18\) and the probability that the leader then wins the next rally. The third factor is the probability that the player or pair catching up wins the next \(d\) rallies. The formula for a comeback from \((20-d) - 20\) follows in the same manner.

We can then compare the expected ratios

\[\begin{eqnarray} R(d) &=& \frac{P_c(19-d,19)}{P_c(20-d,20)} \nonumber \\ &=& \frac{P(0,0 \rightarrow 19-d, 18) \cdot P(19-d,18 \rightarrow 19-d, 19) \cdot P(19-d, 19 \rightarrow 19, 19)}{P(0,0 \rightarrow 20-d, 19) \cdot P(20-d,19 \rightarrow 20-d, 20) \cdot P(20-d, 20 \rightarrow 20, 20)} \nonumber \\ &=& \frac{\binom{19-d+18}{18} p^{19-d+18} \cdot p \cdot p^{d}}{\binom{20-d+19}{19} p^{20-d+19} \cdot p \cdot p^{d}} \nonumber \\ &=& \frac{\frac{(19-d+18)!}{(19-d)! \cdot 18!}}{\frac{(20-d+19)!}{(20-d)! \cdot 19!} \cdot p^2} \nonumber \\ &=& \frac{(20-d)\cdot 19}{(20-d+19) \cdot (20-d+19-1) \cdot p^2 } \nonumber \\ &\approx& 4 \cdot \frac{(20-d)\cdot 19}{(39-d) \cdot (38-d) } \nonumber \end{eqnarray}\]
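The algebra in the derivation can be checked numerically by evaluating the product for \(P_c\) directly, with every rally treated as a coin flip (\(p = 0.5\)) as in the text. The helper names `comeback_probability` and `closed_form_ratio` are hypothetical; the direct ratio and the closed form agree for every distance.

```python
from math import comb

HALF = 0.5  # every rally is treated as a coin flip, p = 0.5

def comeback_probability(target, d):
    """Direct evaluation of P_c((target-d), target): the leader first reaches
    `target` while the trailer has target-d points, then the trailer wins d rallies."""
    trailer, leader = target - d, target - 1
    rallies = trailer + leader
    reach = comb(rallies, leader) * HALF**rallies  # P(0,0 -> target-d, target-1)
    return reach * HALF * HALF**d                  # leader scores, trailer wins d in a row

def closed_form_ratio(d):
    """Final line of the derivation with p = 0.5."""
    return 4 * (20 - d) * 19 / ((39 - d) * (38 - d))

for d in range(1, 10):
    direct = comeback_probability(19, d) / comeback_probability(20, d)
    print(d, round(direct, 4), round(closed_form_ratio(d), 4))
```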

We find that the expected ratio is very close to 1 for all relevant distances. The following table shows the observed ratios as well as the expected ratios for the different distances.

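Distance | Caught up from (19-all) | Caught up from (20-all) | Games (19-all) | Games (20-all) | Observed Ratio | Expected Ratio
9 | 10-19 | 11-20 | 31 | 17 | 1.82 | 0.96
8 | 11-19 | 12-20 | 109 | 22 | 4.95 | 0.98
7 | 12-19 | 13-20 | 239 | 92 | 2.60 | 1.00
6 | 13-19 | 14-20 | 549 | 256 | 2.14 | 1.01
5 | 14-19 | 15-20 | 1325 | 659 | 2.01 | 1.02
4 | 15-19 | 16-20 | 2650 | 1706 | 1.55 | 1.02
3 | 16-19 | 17-20 | 5187 | 3704 | 1.40 | 1.03
2 | 17-19 | 18-20 | 9116 | 8574 | 1.06 | 1.03
1 | 18-19 | 19-20 | 14466 | 18656 | 0.78 | 1.03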
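To judge how far the observed ratios lie from the expected ones, a rough uncertainty can be attached to each observed ratio. The sketch below treats both counts as independent Poisson counts, which is only an approximation since the same game can appear in both columns, and uses the standard relative error \(\sqrt{1/n_{19} + 1/n_{20}}\) for a ratio of counts.

```python
from math import sqrt

# (distance d, games caught up to 19-all, games caught up to 20-all), from the table
rows = [(9, 31, 17), (8, 109, 22), (7, 239, 92), (6, 549, 256), (5, 1325, 659),
        (4, 2650, 1706), (3, 5187, 3704), (2, 9116, 8574), (1, 14466, 18656)]

def expected_ratio(d):
    """Expected ratio R(d) for p = 0.5, from the derivation above."""
    return 4 * (20 - d) * 19 / ((39 - d) * (38 - d))

results = {}
for d, n19, n20 in rows:
    observed = n19 / n20
    # Rough uncertainty, treating both counts as independent Poisson counts
    error = observed * sqrt(1 / n19 + 1 / n20)
    results[d] = (observed, error, expected_ratio(d))
    print(f"d={d}: observed {observed:.2f} ± {error:.2f}, expected {expected_ratio(d):.2f}")
```

For a distance of 1, the gap between 0.78 and the expected 1.03 is many times this uncertainty, so the deviation is unlikely to be a mere statistical fluctuation.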

We find that the observed ratio is much greater than the expected ratio for larger initial distances. The most obvious explanation is that players sometimes give up when trailing, for example, 12-20, thereby giving away the chance to catch up to 20-all. At the corresponding score of 11-19, players give up less frequently, which increases the chance of scoring 8 points in a row and catching up to 19-all.

As the difference shrinks, the observed ratio also approaches the expected value of about 1, indicating that players give up less and less often. For a difference of 1, though, the observed ratio falls to 0.78, below 1. I don’t know why the sequence of scores 18-all, 19-18, 19-all should occur less frequently than the sequence 19-all, 20-19, 20-all. Maybe there is some factor I am missing.

Conclusion

We found that about 8% of games in the database required extra points. The number rose to about 10% when only including matches with odds.

Higher scores occur less frequently, by a factor of slightly above 0.5 for each additional pair of points. The most likely explanation for this factor being above 0.5 is that serving is a disadvantage for the server.

We found that wasted game points did not affect the winning probability once the score reached 20-all. Players who had just missed one or more game points still won more than 50% of these games, which agrees with the i.i.d. assumption combined with a disadvantage for the server, since at 20-all the side that has just caught up is serving.

However, we found indications that players tend to give up when the opponent has multiple game points, which lowers the number of comebacks to 20-all from larger deficits compared with comebacks to 19-all from the same distance.

One question we cannot answer in this analysis is whether as soon as Gillian Clark says:

I have to remind everyone that as soon as the score reaches 29-all, it’s sudden death.

the game will be over after the next game point.