# Actuarial Data Science

Bridging the Gap between Actuarial and Data Science

Fri 10 November 2017

# How to explain ensemble models to your grandma

Posted by Pieter Marres in Articles

An ensemble model can significantly improve the reliability of your decisions. No wonder this powerful concept drives common machine learning models like random forest and gradient boosting. But how can you explain to your grandma why ensemble models are so useful?

Let’s assume you have just arrived at a place you have never been before. You reach a T-junction and are not sure whether to turn left or right to get to your hotel. You decide to ask a local. He tells you he is 80% certain that you should turn left. What should you do? If you follow his advice, your chance of going in the wrong direction is still 20%. That’s why you decide to ask two other locals who are passing by. Both tell you, again with 80% certainty, to turn right. This contradictory information may seem confusing at first, but then you realize that the majority (2 out of 3 locals) recommends turning right. How does this reduce your risk of going in the wrong direction? Well, let’s calculate the probability that X locals (X = 3, 2, 1, 0) happen to send you in the wrong direction, assuming each local is wrong 20% of the time:

• P[X=3] = 20% x 20% x 20% = 0.8%
• P[X=2] = 20% x 20% x 80% x 3 = 9.6%
• P[X=1] = 80% x 80% x 20% x 3 = 38.4%
• P[X=0] = 80% x 80% x 80% = 51.2%

Note that these probabilities add up to 100%, as you would expect.

Since you follow the majority vote, you will only go the wrong way (and arrive late at your hotel for dinner) if at least 2 out of 3 locals give you the wrong information. The probability that this happens is only 0.8% + 9.6% = 10.4%.
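For readers who prefer code to arithmetic, the calculation above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `wrong_count_pmf` is made up for this illustration, and it assumes, as the example does, that each local is wrong 20% of the time independently of the others.

```python
from math import comb

def wrong_count_pmf(n, p_wrong):
    """Probability that exactly k of n independent locals are wrong, for k = 0..n.

    Hypothetical helper for this article; it is the binomial distribution.
    """
    return [comb(n, k) * p_wrong**k * (1 - p_wrong)**(n - k) for k in range(n + 1)]

pmf = wrong_count_pmf(3, 0.2)

# The majority vote is wrong when 2 or 3 of the 3 locals are wrong.
p_majority_wrong = pmf[2] + pmf[3]

for k, p in enumerate(pmf):
    print(f"P[X={k}] = {p:.1%}")
print(f"P[majority wrong] = {p_majority_wrong:.1%}")
```

The factor 3 in the bullets for X=2 and X=1 is `comb(3, 2)` and `comb(3, 1)`: the number of ways to pick which locals are the wrong ones.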

Conclusion: by asking three different individuals and deciding based on the majority, you cut your error rate almost in half, from 20% to 10.4%!

From this simplified example we can conclude the following:

• using several sources of information for decision making can reduce the error rate;
• this approach assumes that the sources of information are independent of each other.
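To see how the first point plays out with more than three locals, here is a small sketch under the same assumptions (independent locals, each wrong 20% of the time). The function name `majority_error` is hypothetical, and an odd number of locals is used so that the vote can never end in a tie.

```python
from math import comb

def majority_error(n, p_wrong):
    """Probability that more than half of n independent locals are wrong.

    Hypothetical helper for this article; n is assumed to be odd.
    """
    k_min = n // 2 + 1  # smallest number of wrong locals that forms a majority
    return sum(
        comb(n, k) * p_wrong**k * (1 - p_wrong)**(n - k)
        for k in range(k_min, n + 1)
    )

for n in (1, 3, 5, 7):
    print(f"{n} locals: error rate {majority_error(n, 0.2):.2%}")
```

The error rate keeps falling as more independent locals are asked. The second point matters just as much: if all three locals were simply repeating the same rumor, their answers would always agree, and the majority would be wrong as often as a single local, 20% of the time.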

Congratulations! You did your best explaining an ensemble model to your grandma. Feel free to drop me a line with any suggestions you (or your grandma) may have.