Football Prediction Performance: How to Calculate Hit-ratio and Log-loss (2024)

Measuring the performance of a model is an essential step whether you are doing research, betting, or simply comparing predictions. This short article shows how to compute the hit ratio and the log-loss for 1x2 football predictions, with code and examples. These two metrics are commonly used in football analytics and in machine learning more generally. The data used in this article are provided by Sportmonks.

Before measuring the performance of a model we need to understand what problem we are dealing with. We want to predict the result of a football (soccer) match between team A and team B. The result and the prediction can take three different values, called classes: team A wins, team B wins, and draw. Since there are only three categories, this problem is a classification task. However, we are also interested in the probability of each outcome, because these probabilities tell us how confident the prediction is. This makes it a probabilistic classification problem.

Each match has exactly three outcomes and therefore three probabilities, which always sum to one. As a consequence, the largest of the three, the one associated with the predicted class, is always greater than or equal to 1/3: if all three were below 1/3, they would sum to less than one.

In the 1x2 prediction case, the predicted result of the match between team A and team B has two linked answers:

  • Prediction of the probabilities: this is the task done by a mathematical model, the bookmakers, votes, and so on.
  • Prediction of the class: this is the result associated with the largest probability.

For instance, if team A has a 30% probability of winning, team B has 45% and the draw is 25%, the predicted class (result) is team B wins. Note that we have 30%+45%+25%=100%.
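The step from probabilities to a predicted class can be sketched in a few lines of Python. The labels "1", "X" and "2" follow the 1x2 convention; mapping "1" to team A and "2" to team B is an assumption made for this illustration.

```python
# Probabilities from the example above, keyed by 1x2 outcome label.
# Assumed labelling: "1" = team A wins, "X" = draw, "2" = team B wins.
probabilities = {"1": 0.30, "X": 0.25, "2": 0.45}

# The three probabilities must sum to one (up to floating-point error).
total = sum(probabilities.values())
assert abs(total - 1.0) < 1e-9

# The predicted class is the outcome with the largest probability.
predicted_class = max(probabilities, key=probabilities.get)
print(predicted_class)  # prints 2, i.e. team B wins
```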

Since we have two slightly different answers to the prediction problem, we have two different performance measures. The log-loss measures the quality of the predicted probabilities, and the hit ratio measures the correctness of the predicted class.

The hit ratio (or accuracy) is probably the most widely used and most intuitive metric for evaluating classification models. A statement such as "10% of your predictions are correct" is easy to understand. In our case, we want to measure the hit ratio of a three-class model.

Say we have 200 matches with their actual results and predicted classes. All we need to do is compare the predicted class with the actual result: if they are the same, count the match as 1; if not, count it as 0. We do that for the 200 matches, take the average, and obtain the hit ratio. Mathematically we can write:

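$$\text{hit ratio} = \frac{1}{N}\sum_{i=1}^{N}\mathbb{1}\left[\hat{c}_i = c_i\right]$$

where $N$ is the number of matches (200 here), $\hat{c}_i$ is the predicted class for match $i$, $c_i$ is the actual result, and $\mathbb{1}[\cdot]$ equals 1 when the two agree and 0 otherwise.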

We want the hit ratio to be as high as possible; it always lies between 0 and 1.
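The counting procedure described above can be sketched as follows. The function name `hit_ratio` and the five matches are made up for illustration; they are not taken from the article's code or data.

```python
def hit_ratio(predicted_classes, actual_results):
    """Share of matches whose predicted class equals the actual result."""
    if len(predicted_classes) != len(actual_results):
        raise ValueError("both inputs must have the same length")
    if not predicted_classes:
        raise ValueError("at least one match is required")
    # 1 for a correct prediction, 0 otherwise, then take the average.
    hits = [1 if p == a else 0 for p, a in zip(predicted_classes, actual_results)]
    return sum(hits) / len(hits)


# Five made-up matches (illustrative data only).
predicted = ["1", "2", "1", "X", "2"]
actual = ["1", "X", "1", "2", "2"]

example_hit_ratio = hit_ratio(predicted, actual)
print(example_hit_ratio)  # 3 correct predictions out of 5 -> 0.6
```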

The log-loss, on the other hand, is a bit more complicated. It measures the quality of the predicted probabilities. A way to think about it is to take each outcome separately.

For instance, take the "team A wins" prediction. If the actual result is that team A wins, then any model that gives "team A wins" the largest of the three probabilities makes the correct prediction (a probability above 50% guarantees this). But a model that gave this outcome a probability of 95% should be seen as better than one that gave it 55%, even though both predicted the correct outcome.

As you can see, all we need is to measure how far the predicted probability is from 1 when the outcome occurred, and from 0 when it did not. Indeed, if team A wins, the known probability is 1 and we want to know how far our prediction was from that. The "how far" can be calculated using different methods, but we will focus on the log-loss. In our "team A wins" example it is simply:

$$\ell_A = y_A \ln(p_A)$$

where $p_A$ is the predicted probability that team A wins, and $y_A$ equals 1 if team A actually wins and 0 otherwise.

It is 0 if team A does not win the game, but it becomes interesting if A wins. In this case, the loss value is -2.30 for a probability of 10%, -0.69 for a probability of 50%, and -0.11 for a probability of 90%. If the probability is 100% the loss is 0. We want the log-loss to be as close as possible to zero. With this sign convention the log-loss always lies between minus infinity and 0. The closer the probability is to the actual result, the better the log-loss.
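These values are easy to check numerically. The sketch below uses the natural logarithm and the sign convention of this article (values at or below zero); the function name `outcome_log_loss` is hypothetical.

```python
import math


def outcome_log_loss(probability, occurred):
    """Log-loss term for one outcome, in the article's sign convention (<= 0)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if not occurred:
        # An outcome that did not happen contributes nothing.
        return 0.0
    if probability == 0.0:
        # The outcome happened although it was given no chance at all.
        return float("-inf")
    return math.log(probability)


# Team A wins, with different predicted probabilities for that outcome.
for p in (0.10, 0.50, 0.90, 1.00):
    print(p, round(outcome_log_loss(p, True), 2))
# 0.1 -2.3
# 0.5 -0.69
# 0.9 -0.11
# 1.0 0.0
```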

In this example, we know team A won the game, so the log-loss on the two other results is 0. For each game, therefore, only the probability associated with the actual result matters for the loss.

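If we take the 200 matches again, the overall log-loss is the average over all matches:

$$\text{log-loss} = \frac{1}{N}\sum_{i=1}^{N}\sum_{k \in \{1, X, 2\}} y_{i,k}\,\ln(p_{i,k})$$

where $N$ is the number of matches (200 here), $p_{i,k}$ is the predicted probability of outcome $k$ for match $i$, and $y_{i,k}$ equals 1 if outcome $k$ is the actual result of match $i$ and 0 otherwise.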

Both the hit ratio and the log-loss are estimates computed from a finite sample of matches. The larger the number of matches, the better these estimates will be. This is especially true for the log-loss, where a single badly wrong probability can have a large negative impact on the average.
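To see how much a single confident but wrong probability can move the average on a small sample, here is a minimal sketch. The function name `average_log_loss` and all probabilities are made up for illustration, and the code assumes the actual result never received a probability of exactly zero.

```python
import math


def average_log_loss(match_probabilities, actual_results):
    """Mean of ln(probability given to the actual result), article sign convention."""
    if len(match_probabilities) != len(actual_results) or not actual_results:
        raise ValueError("inputs must be non-empty and of equal length")
    # Only the probability of the actual result contributes for each match.
    terms = [math.log(probs[result])
             for probs, result in zip(match_probabilities, actual_results)]
    return sum(terms) / len(terms)


# Made-up probabilities for three matches (illustrative only).
matches = [
    {"1": 0.60, "X": 0.25, "2": 0.15},
    {"1": 0.30, "X": 0.25, "2": 0.45},
    {"1": 0.50, "X": 0.30, "2": 0.20},
]
results = ["1", "2", "1"]
baseline = average_log_loss(matches, results)

# Add a fourth match whose actual result was given a probability of only 1%.
surprise = {"1": 0.94, "X": 0.05, "2": 0.01}
with_surprise = average_log_loss(matches + [surprise], results + ["2"])

print(round(baseline, 3), round(with_surprise, 3))  # -0.667 -1.652
```

With only a handful of matches, one surprise more than doubles the magnitude of the average; over a few hundred matches the same surprise would weigh far less.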

Let’s calculate an example using data. The Python code to compute both metrics is available on GitHub. The functions are very simple and take the same inputs. The first argument is probabilities, a pandas dataframe in which each row is a match and the columns contain the probabilities of the “1”, “2”, and “X” results. The second argument, true_results, is a pandas series that contains the actual result of each match. The inputs are checked to make sure the correct value is returned. The two functions are:

def compute_1x2_log_loss(probabilities, true_results):
    '''
    Compute the log-loss for 1x2 football results.
    '''

def compute_1x2_hit_ratio(probabilities, true_results):
    '''
    Compute the hit-ratio for 1x2 football results.
    '''

For instance, the probabilities dataframe will look as follows:

[Figure: example probabilities dataframe, one row per match, with columns “1”, “X”, and “2”]

while true_results will be:

[Figure: example true_results series, one actual result per match]

Both metrics can then be computed as follows:

log_loss = compute_1x2_log_loss(probabilities, true_results)
hit_ratio = compute_1x2_hit_ratio(probabilities, true_results)
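Only the signatures of the two functions are shown in this article, so the bodies below are a guess at a minimal implementation, not the code from the GitHub repository. The helper `_check_inputs`, the column labels in `OUTCOMES`, the specific checks, and the three example matches are all assumptions made for illustration. The log-loss keeps the article's sign convention (values at or below zero).

```python
import numpy as np
import pandas as pd

OUTCOMES = ["1", "X", "2"]  # assumed column labels for the 1x2 results


def _check_inputs(probabilities, true_results):
    """Basic sanity checks (hypothetical; the original checks are not shown)."""
    if set(probabilities.columns) != set(OUTCOMES):
        raise ValueError("probabilities needs exactly the columns '1', 'X', '2'")
    if not probabilities.index.equals(true_results.index):
        raise ValueError("probabilities and true_results must share the same index")
    if not true_results.isin(OUTCOMES).all():
        raise ValueError("true_results may only contain '1', 'X' or '2'")
    if not np.allclose(probabilities.sum(axis=1), 1.0):
        raise ValueError("each row of probabilities must sum to one")


def compute_1x2_log_loss(probabilities, true_results):
    """Average of ln(probability given to the actual result); always <= 0."""
    _check_inputs(probabilities, true_results)
    # For every match, pick the probability assigned to the actual result.
    picked = [probabilities.at[idx, res] for idx, res in true_results.items()]
    return float(np.mean(np.log(picked)))


def compute_1x2_hit_ratio(probabilities, true_results):
    """Share of matches where the most probable outcome is the actual result."""
    _check_inputs(probabilities, true_results)
    predicted = probabilities.idxmax(axis=1)  # column label of the row maximum
    return float((predicted == true_results).mean())


# Three made-up matches (illustrative data only).
probabilities = pd.DataFrame({
    "1": [0.60, 0.30, 0.50],
    "X": [0.25, 0.25, 0.30],
    "2": [0.15, 0.45, 0.20],
})
true_results = pd.Series(["1", "X", "1"])

log_loss = compute_1x2_log_loss(probabilities, true_results)
hit_ratio = compute_1x2_hit_ratio(probabilities, true_results)
print(round(log_loss, 3), round(hit_ratio, 3))
```

Note that some libraries, scikit-learn's `log_loss` for example, report the same quantity with the opposite sign, so that smaller positive values are better.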

A full example is available in a notebook here.

This article shows how to compute the log-loss and the hit ratio for 1x2 probability models. We also provide the code to make the calculation easier to understand and to reproduce yourself. The functions presented can be extended to other types of prediction, such as both teams to score or over/under markets.

Author: Otha Schamberger