Unveiling Mathematics behind XGBoost

This article is targeted at people who found it difficult to grasp the original paper. Follow me till the end, and I assure you that you will at least get a sense of what is happening underneath this revolutionary machine learning model. Let's get started.

John and Emily embarked on an adventure to pluck apples from the tree located at the lowest point of the mountain. While Emily is a smart and adventurous girl, John is a bit cautious and sluggish. However, only John can climb the tree and pluck the apples. We will see how they achieve their goal.

John and Emily start at point ‘a’ on the mountain, and the tree is located at point ‘g’. There are two ways they can reach point ‘g’:

1.) They can look at the slope under their feet and keep taking small steps in the downhill direction until they eventually reach the bottom.

John is skeptical about this approach and doesn’t want to walk more than necessary if, by chance, they end up at the wrong point. So Emily comes up with a new approach:

2.) Emily suggests that she will first go to the neighboring candidate points and calculate the value of the loss function at each of them. She keeps a note of the value of the function at all those points and signals John to come to the point where the value is lowest. John loves this approach, as he will never end up at a wrong point. However, poor Emily needs to explore every point in her neighborhood and calculate the value of the function at each one. The beauty of XGBoost is that it intelligently tackles both of these problems.

Please be advised that I may use the terms function, base learner, and tree interchangeably.

Many implementations of Gradient Boosting follow approach 1 to minimize the objective function. At every iteration, we fit a base learner to the negative gradient of the loss function, multiply its prediction by a constant (the learning rate), and add the result to the value from the previous iteration.

The intuition is that fitting a base learner to the negative gradient at each iteration essentially performs gradient descent on the loss function. The negative gradients are often called pseudo-residuals, as they indirectly help us minimize the objective function.
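To make approach 1 concrete, here is a minimal illustrative sketch of that loop in Python, assuming squared loss, for which the negative gradient is simply the residual. The function name, the learning rate of 0.1, and the use of scikit-learn's DecisionTreeRegressor as the base learner are choices made for this example, not part of any particular library's API.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_rounds=100, learning_rate=0.1, max_depth=3):
    """Plain gradient boosting (approach 1) with squared loss.

    For squared loss l(y, p) = (y - p)^2 / 2, the negative gradient with
    respect to the current prediction p is the residual (y - p), so each
    round fits a small tree to the residuals."""
    pred = np.full(len(y), float(np.mean(y)))   # initial constant prediction
    trees = []
    for _ in range(n_rounds):
        pseudo_residuals = y - pred             # negative gradient = residuals
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, pseudo_residuals)           # fit base learner to the negative gradient
        pred = pred + learning_rate * tree.predict(X)  # scaled step added to previous value
        trees.append(tree)
    return trees, pred
```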

XGBoost was the result of research by Tianqi Chen, a Ph.D. student at the University of Washington. It is an ensemble additive model composed of several base learners.
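In symbols, the additive model makes its prediction by summing the outputs of all the base learners (I use M for the number of learners here, since ‘K’ is used later in this article for the number of leaf nodes):

```latex
\hat{y}_i = \sum_{m=1}^{M} f_m(x_i), \qquad f_m \in \mathcal{F},
```

where each f_m is a regression tree drawn from the space of trees F.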

How should we choose a function at each iteration? We choose the function that minimizes the overall loss.

In the gradient boosting algorithm stated above, we obtained ft(xi) at each iteration by fitting a base learner to the negative gradient of the loss function with respect to the previous iteration’s predictions. In XGBoost, we explore several base learners or functions and pick the function that minimizes the loss (Emily’s second approach). As I stated above, there are two problems with this approach: 1.) exploring different base learners, and 2.) calculating the value of the loss function for all those base learners.
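The equations from the original article are not reproduced in this extract, but the objective being minimized at iteration t has the standard form from the XGBoost paper (written here with K leaves and leaf weights wj, to match the notation used later; the second term is the paper's regularization penalty on tree complexity):

```latex
\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\!\left(y_i,\; \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t),
\qquad
\Omega(f_t) = \gamma K + \frac{1}{2}\,\lambda \sum_{j=1}^{K} w_j^2
```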

XGBoost uses a Taylor series to approximate the value of the loss function for a base learner ft(xi), thus reducing the load on Emily, who would otherwise have to calculate the exact loss for every possible base learner.

It uses the expansion only up to the second-order derivative, assuming this level of approximation will suffice. The first term ‘C’ is constant irrespective of which ft(xi) we pick. ‘gi’ is the first-order derivative of the loss at the previous iteration with respect to the predictions at the previous iteration, and ‘hi’ is the second-order derivative of the loss at the previous iteration with respect to the predictions at the previous iteration. Therefore, Emily can calculate ‘gi’ and ‘hi’ before she starts exploring different base learners, as everything thereafter is just a matter of multiplications. Plug and play, isn’t it?
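Concretely, the second-order expansion this paragraph describes (the article's equation 2) takes the following form in the notation of the original paper, with the constant term ‘C’ marked explicitly:

```latex
\mathcal{L}^{(t)} \approx
\underbrace{\sum_{i=1}^{n} l\!\left(y_i,\, \hat{y}_i^{(t-1)}\right)}_{C}
+ \sum_{i=1}^{n} \left[ g_i\, f_t(x_i) + \frac{1}{2}\, h_i\, f_t(x_i)^2 \right]
+ \Omega(f_t),

\qquad
g_i = \frac{\partial\, l\!\left(y_i,\, \hat{y}_i^{(t-1)}\right)}{\partial\, \hat{y}_i^{(t-1)}},
\qquad
h_i = \frac{\partial^2\, l\!\left(y_i,\, \hat{y}_i^{(t-1)}\right)}{\partial\!\left(\hat{y}_i^{(t-1)}\right)^{2}}
```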

So XGBoost has reduced one of the burdens on Emily. Still, we have the problem of exploring different base learners.

Assume Emily fixed a base learner ft that has ‘K’ leaf nodes. Let Ij be the set of instances belonging to node ‘j’, and let ‘wj’ be the prediction for that node. So for an instance ‘i’ belonging to Ij, ft(xi) = wj. Therefore, we obtain equation 3 from equation 2 by substituting ft(xi) with the respective leaf predictions. After substituting the predictions, we can take the derivative of the loss function with respect to each leaf node’s weight to get the optimal weight.
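Written out, the per-leaf regrouping (the article's equation 3) and the optimal weight it yields are, in the paper's notation:

```latex
\mathcal{L}^{(t)} \approx
\sum_{j=1}^{K} \left[ \left(\sum_{i \in I_j} g_i\right) w_j
+ \frac{1}{2} \left(\sum_{i \in I_j} h_i + \lambda\right) w_j^2 \right]
+ \gamma K + C,

\qquad
\frac{\partial \mathcal{L}^{(t)}}{\partial w_j} = 0
\;\Longrightarrow\;
w_j^{*} = -\,\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}
```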

Substituting the optimal weights back into equation 3 gives equation 4. However, this is the optimal loss only for a fixed tree structure, and there can be an enormous number of possible tree structures.
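Plugging the optimal weights back in gives the quantity the paper calls the structure score (the article's equation 4), which measures how good a fixed tree structure is; the smaller it is, the better the structure:

```latex
\tilde{\mathcal{L}}^{(t)} =
-\frac{1}{2} \sum_{j=1}^{K}
\frac{\left(\sum_{i \in I_j} g_i\right)^{2}}{\sum_{i \in I_j} h_i + \lambda}
+ \gamma K
```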

Let's take stock of Emily's position. She now knows how to ease the burden of computing the loss using the Taylor expansion, and she knows what the optimal leaf-node weights are for a fixed tree structure. The only remaining concern is how to explore all the different possible tree structures.

Instead of exploring all possible tree structures, XGBoost builds a tree greedily: the split that results in the maximum loss reduction is chosen. In the above picture, the tree starts at node ‘I’. Based on some split criterion, the node is divided into left and right branches, so some instances fall into the left leaf node and the others fall into the right leaf node. Now we can calculate the reduction in loss for each candidate split and pick the split that gives the best reduction.
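The reduction in loss for splitting a node into left and right children I_L and I_R follows directly from equation 4. In the paper's notation, with G and H denoting the sums of gi and hi over a node:

```latex
\text{Gain} = \frac{1}{2} \left[
\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda}
- \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}
\right] - \gamma,
\qquad
G_L = \sum_{i \in I_L} g_i,\quad H_L = \sum_{i \in I_L} h_i
```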

But Emily still has one more problem: how does she choose the split criterion? XGBoost uses different strategies to come up with candidate split points, such as the exact greedy and histogram-based methods. I will not be dealing with that part, as this article has already grown in size.
