How does random forest work?

RF is a powerful non-parametric ensemble method that can be used both for classification and for identifying important variables (Breiman, 2001). RF grows an ensemble of classification trees, each built on a bootstrap sample drawn from the original data set, with a random subset of features considered at each split. Class prediction is based on the majority vote of the ensemble. By default, 500 trees are used to build the RF classifier. Because each tree is trained on a bootstrap sample, about one-third of the instances are left out of that sample. These "left-out" data serve as test data for the tree, yielding an unbiased estimate of the classification error known as the 'out-of-bag' (OOB) error.
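As an illustration, the procedure described above can be sketched with scikit-learn (assumed here; the document does not specify an implementation). The `oob_score=True` option enables the out-of-bag error estimate, and `feature_importances_` provides the variable-importance scores mentioned above:

```python
# Minimal sketch of a random forest with OOB error estimation,
# using scikit-learn (an assumed library choice) and the Iris data
# as a stand-in data set.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# 500 trees (the default mentioned above); oob_score=True evaluates each
# tree on the ~1/3 of samples left out of its bootstrap sample.
rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
rf.fit(X, y)

# OOB error = 1 - OOB accuracy: an unbiased estimate of classification error.
oob_error = 1.0 - rf.oob_score_
print(f"OOB error: {oob_error:.3f}")

# Per-feature importance scores support the variable-selection use of RF.
print(rf.feature_importances_)
```

The OOB error makes a separate validation set unnecessary for a quick error estimate, which is one reason RF is convenient in practice.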

RF is a robust method (resistant to overfitting) and a versatile one (it supports regression and classification, and handles missing values and variable interactions). Its main drawback is that the resulting model and its predictions are hard to interpret.

For a more detailed description of random forests, click here.