Fuzzy Forests: Extending Random Forests for Correlated, High-Dimensional Data

Authors Daniel Conn, Tuck Ngun, Gang Li, Christina M. Ramirez
Journal/Conference Name Journal of Statistical Software
Paper Category
Paper Abstract In this paper we introduce fuzzy forests, a novel machine learning algorithm for ranking the importance of features in high-dimensional classification and regression problems. Fuzzy forests is specifically designed to provide relatively unbiased rankings of variable importance in the presence of highly correlated features, especially when p ≫ n. We introduce our implementation of fuzzy forests in the R package fuzzyforest. Fuzzy forests works by taking advantage of the network structure between features. First, the features are partitioned into separate modules such that the correlation within modules is high and the correlation between modules is low. The package fuzzyforest allows for easy use of Weighted Gene Coexpression Network Analysis (WGCNA) to form modules of features such that the modules are roughly uncorrelated. Then recursive feature elimination random forests (RFE-RFs) are used on each module separately. From the surviving features, a final group is selected and ranked using one last round of RFE-RFs. This procedure results in a ranked variable importance list whose size is pre-specified by the user. The selected features can then be used to construct a predictive model.
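The two-stage procedure described in the abstract (screen within each module, then select and rank among the survivors) can be illustrated with a rough sketch. This is not the fuzzyforest package or its API: it is written in Python with the standard library only, it replaces random forest variable importance with a simple absolute-correlation score, and it takes the module assignment as given instead of deriving it with WGCNA. The function names and the parameters `drop_fraction`, `keep_per_module`, and `n_final` are hypothetical and chosen only for illustration.

```python
import random
import statistics


def abs_correlation(xs, ys):
    """Stand-in importance score: |Pearson correlation| of a feature with y.

    Assumption: the paper uses random forest variable importance here.
    """
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def recursive_elimination(features, X, y, n_keep, drop_fraction=0.25):
    """Re-score the surviving features each round and drop the weakest ones.

    With a random forest the scores would change as features are removed;
    this marginal stand-in score does not, so only the control flow is
    representative.
    """
    def score(f):
        return abs_correlation(X[f], y)

    survivors = list(features)
    while len(survivors) > n_keep:
        ranked = sorted(survivors, key=score, reverse=True)
        n_drop = max(1, int(drop_fraction * len(ranked)))
        survivors = ranked[: max(n_keep, len(ranked) - n_drop)]
    return sorted(survivors, key=score, reverse=True)


def fuzzy_forest_sketch(X, y, modules, keep_per_module, n_final):
    """Screen each module separately, then select and rank the survivors."""
    screened = []
    for members in modules.values():
        screened += recursive_elimination(members, X, y, keep_per_module)
    return recursive_elimination(screened, X, y, n_final)


# Toy data: module A tracks the outcome, module B is unrelated noise.
rng = random.Random(0)
n = 200
signal = [rng.gauss(0, 1) for _ in range(n)]
nuisance = [rng.gauss(0, 1) for _ in range(n)]
X = {}
for j in range(5):
    X[f"a{j}"] = [s + rng.gauss(0, 0.5) for s in signal]
    X[f"b{j}"] = [b + rng.gauss(0, 0.5) for b in nuisance]
y = [s + rng.gauss(0, 0.5) for s in signal]
modules = {
    "A": [f"a{j}" for j in range(5)],
    "B": [f"b{j}" for j in range(5)],
}

top = fuzzy_forest_sketch(X, y, modules, keep_per_module=2, n_final=3)
print(top)
```

The returned list has exactly `n_final` entries, mirroring the user-specified list size mentioned in the abstract. Screening each module on its own keeps a block of highly correlated features from crowding out features in other modules before the final ranking.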
Date of publication 2015
Code Programming Language R