In statistics and machine learning, random multinomial logit (RMNL) is a technique for (multi-class) statistical classification using repeated multinomial logit analyses via Leo Breiman's random forests.

Rationale for the new method

Several learning algorithms have been proposed to handle multiclass classification. While some algorithms are extensions or combinations of intrinsically binary classification methods (e.g., multiclass classifiers built from one-versus-one or one-versus-all binary classifiers), other algorithms, such as multinomial logit (MNL), are specifically designed to map features to a multiclass output vector. MNL has a proven track record of stability in many disciplines, including transportation research and customer relationship management (CRM). Unfortunately, MNL cannot overcome the curse of dimensionality, which implicitly necessitates feature selection, i.e., the selection of a best subset of variables from the input feature set. In contrast to binary logit, software packages to date mostly lack any feature selection algorithm for MNL. This absence constitutes a problem for several application areas.

Random forests (i.e., classifiers combining a forest of decision trees grown on random input vectors and splitting nodes on a random subset of features) have more recently been introduced for the classification of binary and multiclass outputs. Feature selection is implicitly incorporated during the construction of each tree. RMNL, a random forest of multinomial logit models, attempts to overcome the feature selection difficulty of MNL.

Application

The developers of the RMNL technique (Prinzie & Van den Poel, 2008) show in their application paper the usefulness of the technique for cross-sell analysis in customer relationship management.
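
The idea of RMNL as a random forest of multinomial logit models can be illustrated with a minimal sketch. The code below is not the authors' implementation: it assumes that each MNL model is fit on a bootstrap sample of the observations and on a random subset of the features, and that the ensemble prediction is the average of the members' class probabilities. All function names (fit_mnl, fit_rmnl, predict_rmnl), parameter values, and the toy data are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(scores):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_mnl(W, row):
    """Class probabilities of one MNL model; the last weight of each class is the bias."""
    xb = list(row) + [1.0]
    return softmax([sum(w * v for w, v in zip(Wk, xb)) for Wk in W])


def fit_mnl(X, y, n_classes, epochs=100, lr=0.1):
    """Fit a multinomial logit by plain batch gradient descent (illustrative, not optimized)."""
    n_feat = len(X[0])
    W = [[0.0] * (n_feat + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * (n_feat + 1) for _ in range(n_classes)]
        for row, label in zip(X, y):
            xb = list(row) + [1.0]
            p = predict_mnl(W, row)
            for k in range(n_classes):
                err = p[k] - (1.0 if k == label else 0.0)
                for j, v in enumerate(xb):
                    grad[k][j] += err * v
        for k in range(n_classes):
            for j in range(n_feat + 1):
                W[k][j] -= lr * grad[k][j] / len(X)
    return W


def fit_rmnl(X, y, n_classes, n_models=10, n_sub=2, seed=0):
    """Assumed RMNL scheme: bootstrap the rows and draw a random feature subset per model."""
    rng = random.Random(seed)
    n, n_feat = len(X), len(X[0])
    ensemble = []
    for _ in range(n_models):
        rows = [rng.randrange(n) for _ in range(n)]   # bootstrap sample of observations
        feats = rng.sample(range(n_feat), n_sub)      # random subset of features
        Xb = [[X[i][f] for f in feats] for i in rows]
        yb = [y[i] for i in rows]
        ensemble.append((feats, fit_mnl(Xb, yb, n_classes)))
    return ensemble


def predict_rmnl(ensemble, row, n_classes):
    """Average the class probabilities of all MNL models in the ensemble."""
    avg = [0.0] * n_classes
    for feats, W in ensemble:
        p = predict_mnl(W, [row[f] for f in feats])
        for k in range(n_classes):
            avg[k] += p[k] / len(ensemble)
    return avg


# Toy data (hypothetical): class k has a high value on feature k, plus Gaussian noise.
rng = random.Random(42)
X, y = [], []
for label in range(3):
    for _ in range(15):
        X.append([(3.0 if j == label else 0.0) + rng.gauss(0, 0.3) for j in range(3)])
        y.append(label)

ensemble = fit_rmnl(X, y, n_classes=3)
print(predict_rmnl(ensemble, [3.0, 0.0, 0.0], n_classes=3))
```

Because every member model sees only a few features, no single model has to cope with the full feature set. This is the sense in which an ensemble of this kind sidesteps explicit feature selection for MNL.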