A data set is called imbalanced when it contains several classes and the number of samples per class differs greatly. Traditional classification algorithms struggle on such data: they have difficulty correctly classifying the minority class, which is often the important one. How to classify minority samples better therefore remains a central difficulty.
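To make "differs greatly" concrete, the sketch below counts the samples per class and reports the ratio between the largest and smallest class. It also computes inverse-frequency class weights, a common heuristic for giving minority samples more influence during training. The function names and the 95/5 toy labels are made up for illustration and do not come from any particular library.

```python
from collections import Counter


def class_distribution(labels):
    """Return per-class counts and the majority/minority imbalance ratio."""
    counts = Counter(labels)
    majority = max(counts.values())
    minority = min(counts.values())
    return counts, majority / minority


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical toy labels: 95 majority samples (0) and 5 minority samples (1).
labels = [0] * 95 + [1] * 5

counts, ratio = class_distribution(labels)
weights = balanced_class_weights(labels)

print("counts:", dict(counts))
print("imbalance ratio:", ratio)
print("class weights:", weights)
```

With these weights, each minority sample counts roughly as much as nineteen majority samples, so both classes contribute equally in total. Whether that is the right trade-off depends on the cost of each kind of error.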

This class of problems is known as the ranking problem, and the most popular family of supervised machine learning methods that aim to solve it is called "Learning to Rank" (LTR). I am trying out XGBoost, which uses gradient boosting machines (GBMs) to do pairwise ranking. The XGBoost Python API comes with a simple wrapper around its ranking functionality.

Imbalanced data classification is inherently difficult because there are so few minority samples to learn from. Always start with the data: do your best to collect as many samples as possible, and give substantial thought to which features may be relevant, so that the model can get the most out of the minority class.
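The pairwise ranking idea mentioned above can be illustrated without any library. A pairwise method does not learn from individual labels. It learns from pairs of items within the same query, where one item should be ranked above the other. The sketch below only builds those preference pairs. It is not XGBoost's implementation, and the function name, query ids, and relevance grades are hypothetical.

```python
from itertools import combinations


def make_preference_pairs(query_ids, relevance):
    """Build (preferred_index, other_index) pairs within each query group.

    Items are only compared with items from the same query, and pairs
    with equal relevance are skipped because they carry no preference.
    """
    # Group item indices by query id, preserving first-seen order.
    groups = {}
    for idx, qid in enumerate(query_ids):
        groups.setdefault(qid, []).append(idx)

    pairs = []
    for indices in groups.values():
        for i, j in combinations(indices, 2):
            if relevance[i] > relevance[j]:
                pairs.append((i, j))
            elif relevance[j] > relevance[i]:
                pairs.append((j, i))
    return pairs


# Hypothetical toy data: three documents for query q1, two for q2.
query_ids = ["q1", "q1", "q1", "q2", "q2"]
relevance = [2, 0, 1, 1, 1]

pairs = make_preference_pairs(query_ids, relevance)
print(pairs)
```

A pairwise learner is then trained so that the first item of each pair receives a higher score than the second. Note that query q2 contributes no pairs here, because both of its documents have the same relevance grade.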