The Arborist

The Arborist is a scalable, high-performance implementation of the Random Forest algorithm.  It supports both regression and classification decision tree models, with either numerical or factor data.  The number of categories per factor is essentially unlimited.  The Arborist can take advantage of multicore parallelism in both training and testing, and is cluster-friendly.  A version tuned to Nvidia GPUs is also being made available; timings on preliminary builds indicate that a 50x speedup is achievable over versions tuned for multicore performance.


The Arborist is designed to be versatile, with provisions for multiple front ends as well as for inclusion in standalone software.  The Arborist can also easily be specialized for higher performance on data sets having fixed characteristics.


A key innovation of the Arborist is the inclusion of common workflows within the software.  Quantile regression, for example, is performed without the need to invoke additional commands.  Similarly, resampling statistics can be computed within a single invocation, a sort of "loopless" invocation feature that spares the user from writing an explicit loop.
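
To make the single-invocation idea concrete, here is a minimal Python sketch of how one prediction call could return both a mean and a set of quantiles.  It is not the Arborist's API or its implementation: the function name `predict_with_quantiles` and its input layout are hypothetical, and the weighting scheme (each training response in a leaf weighted by 1 / (number of trees x leaf size)) is an assumed, commonly used approach to forest-based quantile estimation.

```python
def predict_with_quantiles(leaves, quantiles=(0.1, 0.5, 0.9)):
    """Hypothetical helper, not part of the Arborist.

    `leaves` holds, for a single query point, one list per tree containing
    the training responses of the leaf that tree routes the query to.
    Returns the forest mean and the requested quantiles in one call.
    """
    if not leaves or any(len(leaf) == 0 for leaf in leaves):
        raise ValueError("every tree must contribute a non-empty leaf")

    n_trees = len(leaves)

    # Standard forest prediction: average of the per-tree leaf means.
    mean = sum(sum(leaf) / len(leaf) for leaf in leaves) / n_trees

    # Assumed weighting: each response gets 1 / (n_trees * leaf size),
    # so the weights across all trees sum to 1.
    weighted = sorted(
        (y, 1.0 / (n_trees * len(leaf))) for leaf in leaves for y in leaf
    )

    def weighted_quantile(q):
        # Smallest response whose cumulative weight reaches q.
        cumulative = 0.0
        for y, w in weighted:
            cumulative += w
            if cumulative >= q - 1e-12:  # tolerance for float rounding
                return y
        return weighted[-1][0]

    return {
        "mean": mean,
        "quantiles": {q: weighted_quantile(q) for q in quantiles},
    }


# Toy input: two trees, each routing the query to a two-response leaf.
result = predict_with_quantiles([[1.0, 3.0], [2.0, 4.0]], quantiles=(0.5, 0.9))
print(result)
```

The point of the sketch is only that the quantiles fall out of information already gathered for the mean prediction, so no second command or user-written loop is required.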


The Arborist project is hosted on GitHub.