Skip to content

mlondschien/biosphere

Repository files navigation

Biosphere

Simple, fast random forests.

Random forests with a runtime of O(n d log(n) + n_estimators d n max_depth) instead of O(n_estimators mtry n log(n) max_depth).

biosphere is available as a rust crate and as a Python package.

Benchmarks

Ran on an M1 Pro with n_jobs=4. Wall-time to fit a Random Forest including OOB score with 400 trees to the NYC Taxi dataset, minimum over 10 runs. After feature engineering, the dataset consists of 5 numerical and 7 one-hot encoded features.

model 1000 2000 4000 8000 16000 32000 64000 128000 256000 512000 1024000 2048000
biosphere 0.04s 0.08s 0.15s 0.32s 0.65s 1.40s 2.97s 6.48s 15.53s 37.91s 96.69s 231.82s
scikit-learn 0.28s 0.34s 0.46s 0.69s 1.23s 2.47s 4.99s 10.49s 22.11s 51.04s 118.95s 271.03s