defmodule EXGBoost do
  @moduledoc """
- Elixir bindings for the XGBoost library. `EXGBoost` provides an implementation of XGBoost that works with
- [Nx](https://hexdocs.pm/nx/Nx.html) tensors.
-
- Xtreme Gradient Boosting (XGBoost) is an optimized distributed gradient
- boosting library designed to be highly efficient, flexible, and portable.
- It implements machine learning algorithms under the [Gradient Boosting](https://en.wikipedia.org/wiki/Gradient_boosting)
- framework. XGBoost provides a parallel tree boosting (also known as GBDT or GBM)
- that solves many data science problems in a fast and accurate way. The same code
- runs on major distributed environments (Hadoop, SGE, MPI) and can solve problems beyond
- billions of examples.
-
- ## Installation
-
- ```elixir
- def deps do
-   [
-     {:exgboost, "~> 0.5"}
-   ]
- end
- ```
-
- ## API Data Structures
-
- EXGBoost's top-level `EXGBoost` API works directly and only with `Nx.Tensor` for data
- representation, using `EXGBoost.Booster` structs as its internal model representation.
- Direct manipulation of `EXGBoost.Booster` structs is discouraged.
-
- ## Basic Usage
-
-     key = Nx.Random.key(42)
-     {x, key} = Nx.Random.normal(key, 0, 1, shape: {10, 5})
-     {y, key} = Nx.Random.normal(key, 0, 1, shape: {10})
-     model = EXGBoost.train(x, y)
-     EXGBoost.predict(model, x)
-
- ## Training
-
- EXGBoost is designed to feel familiar to users of the Python XGBoost library. `EXGBoost.train/2` is the
- primary entry point for training a model. It accepts an Nx tensor for the features and an Nx tensor for the labels.
- `EXGBoost.train/2` returns a trained `EXGBoost.Booster` struct that can be used for prediction. `EXGBoost.train/2` also
- accepts a keyword list of options that can be used to configure the training process. See the
- [XGBoost documentation](https://xgboost.readthedocs.io/en/latest/parameter.html) for the full list of options.
-
- `EXGBoost.train/2` also allows you to train with a custom objective function.
- This is done by passing a function to the `:obj` option. See `EXGBoost.Booster.update/4` for more information.
-
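- As a sketch, a squared-error objective could look like the following. This assumes the
- objective is called with the current predictions and the training labels as Nx tensors
- and must return a `{gradient, hessian}` tuple; see `EXGBoost.Booster.update/4` for the
- exact calling convention.
-
- ```elixir
- # Hedged sketch: squared error. The gradient of 0.5 * (preds - labels)^2
- # with respect to preds is (preds - labels); its Hessian is constant 1.
- squared_error = fn preds, labels ->
-   grad = Nx.subtract(preds, labels)
-   hess = Nx.broadcast(1.0, preds)
-   {grad, hess}
- end
-
- model = EXGBoost.train(x, y, obj: squared_error)
- ```
-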
- Another feature of `EXGBoost.train/2` is the ability to provide a validation set for early stopping. This is done
- by passing a list of 3-tuples to the `:evals` option. Each 3-tuple should contain an Nx tensor for the features, an Nx tensor
- for the labels, and a string name for the validation set. The validation set will be used to calculate the validation
- error at each iteration of the training process. If the validation error does not improve for `:early_stopping_rounds` iterations,
- the training process will stop. See the [XGBoost documentation](https://xgboost.readthedocs.io/en/latest/tutorials/param_tuning.html)
- for a more detailed explanation of early stopping.
-
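- A minimal early-stopping sketch, reusing the `x` and `y` tensors from Basic Usage and
- holding out the last rows as a validation set (the split here is purely illustrative):
-
- ```elixir
- {x_train, y_train} = {x[0..7], y[0..7]}
- {x_val, y_val} = {x[8..9], y[8..9]}
-
- # Training stops if the "validation" error fails to improve for 2 rounds.
- model =
-   EXGBoost.train(x_train, y_train,
-     evals: [{x_val, y_val, "validation"}],
-     early_stopping_rounds: 2
-   )
- ```
-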
- Early stopping is achieved through the use of callbacks. `EXGBoost.train/2` accepts a list of callbacks that will be called
- at each iteration of the training process. Callbacks can be used to implement custom logic: for example, a callback
- could print the validation error at each iteration of the training process, or provide a custom
- setup function for training. See `EXGBoost.Training.Callback` for more information on callbacks.
-
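- As a sketch, a logging callback might look like the following. This assumes
- `EXGBoost.Training.Callback.new/2` takes an event atom and a function that receives the
- training state and returns `{:cont, state}`; check `EXGBoost.Training.Callback` for the
- actual constructor, events, and state shape.
-
- ```elixir
- # Hypothetical callback that logs after every boosting iteration.
- log_cb =
-   EXGBoost.Training.Callback.new(:after_iteration, fn state ->
-     IO.puts("finished iteration #{state.iteration}")
-     {:cont, state}
-   end)
-
- model = EXGBoost.train(x, y, callbacks: [log_cb])
- ```
-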
- Please note that callbacks are called in the order that they are provided. If you provide multiple callbacks that modify
- the same parameter, the last callback wins. For example, if you provide a callback that
- sets the `:early_stopping_rounds` parameter to 10 and then provide a callback that sets the `:early_stopping_rounds` parameter
- to 20, the `:early_stopping_rounds` parameter will be set to 20.
-
- You are also able to pass parameters to be applied to the Booster model using the `:params` option. These parameters will
- be applied to the Booster model before training begins. This allows you to set parameters that are not available as options
- to `EXGBoost.train/2`. See the [XGBoost documentation](https://xgboost.readthedocs.io/en/latest/parameter.html) for a full
- list of parameters.
-
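- For example, a sketch that sets two Booster parameters directly (the names here,
- `max_depth` and `eta`, are taken from the XGBoost parameter docs and are illustrative):
-
- ```elixir
- model = EXGBoost.train(x, y, params: [max_depth: 6, eta: 0.3])
- ```
-
- A fuller example combining several of the options described above:
-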
-     EXGBoost.train(
-       x,
-       y,
-       obj: :multi_softprob,
-       evals: [{x_test, y_test, "test"}],
-       learning_rates: fn i -> i / 10 end,
-       num_boost_round: 10,
-       early_stopping_rounds: 3,
-       max_depth: 3,
-       eval_metric: [:rmse, :logloss]
-     )
-
- ## Prediction
-
- `EXGBoost.predict/2` is the primary entry point for making predictions with a trained model.
- It accepts an `EXGBoost.Booster` struct (which is the output of `EXGBoost.train/2`).
- `EXGBoost.predict/2` returns an Nx tensor containing the predictions and also accepts
- a keyword list of options that can be used to configure the prediction process.
-
- ```elixir
- preds = EXGBoost.train(x, y) |> EXGBoost.predict(x)
- ```
-
- ## Serialization
-
- A Booster can be serialized to a file using `EXGBoost.write_*` and loaded from a file
- using `EXGBoost.read_*`. The file format can be specified using the `:format` option,
- which can be either `:json` or `:ubj`. The default is `:json`. If the file already exists, it will NOT
- be overwritten by default. Boosters can be serialized either to a file (`write_*`/`read_*`) or to a
- binary string (`dump_*`/`load_*`). Boosters can also be serialized in three different ways:
- configuration only, model parameters only, or both. Functions named with `weights` will serialize
- the model's trained parameters only. This is best used when the model is already trained and only
- inferences/predictions are going to be performed. Functions named with `config` will serialize the
- configuration only. Functions that specify `model` will serialize both the model parameters
- and the configuration.
-
- ### Output Formats
- - `read`/`write` - File.
- - `load`/`dump` - Binary buffer.
-
- ### Output Contents
- - `config` - Save the configuration only.
- - `weights` - Save the model parameters only. Use this when you want to save the model to a format that can be ingested by other XGBoost APIs.
- - `model` - Save both the model parameters and the configuration.
-
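- Combining the two axes, a round-trip sketch (the function names follow the naming
- scheme above; exact signatures and path handling are documented on each function):
-
- ```elixir
- model = EXGBoost.train(x, y)
-
- # File round-trip: write the full model (parameters + configuration),
- # then read it back. The .json extension assumes the default :format.
- EXGBoost.write_model(model, "my_model")
- model = EXGBoost.read_model("my_model.json")
-
- # Binary round-trip: dump to an in-memory buffer, then load it back.
- buffer = EXGBoost.dump_model(model)
- model = EXGBoost.load_model(buffer)
- ```
-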
- ## Plotting
-
- `EXGBoost.plot_tree/2` is the primary entry point for plotting a tree from a trained model.
- It accepts an `EXGBoost.Booster` struct (which is the output of `EXGBoost.train/2`).
- `EXGBoost.plot_tree/2` returns a VegaLite spec that can be rendered in a notebook or saved to a file.
- `EXGBoost.plot_tree/2` also accepts a keyword list of options that can be used to configure the plotting process.
-
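- For example, a minimal sketch (the `:style` value here is illustrative; pick one from
- `EXGBoost.Plotting.get_styles()`):
-
- ```elixir
- model = EXGBoost.train(x, y)
- vl = EXGBoost.plot_tree(model, style: :solarized_light)
- # `vl` is a VegaLite spec: render it in a Livebook cell, or save it to a
- # file with the optional VegaLite export tooling.
- ```
-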
- See `EXGBoost.Plotting` for more detail on plotting.
-
- You can see available styles by running `EXGBoost.Plotting.get_styles()` or refer to the `EXGBoost.Plotting.Styles`
- documentation for a gallery of the styles.
-
- ## Kino & Livebook Integration
-
- `EXGBoost` integrates with [Kino](https://hexdocs.pm/kino/Kino.html) and [Livebook](https://livebook.dev/)
- to provide a rich interactive experience for data scientists.
-
- EXGBoost implements the `Kino.Render` protocol for `EXGBoost.Booster` structs. This allows you to render
- a Booster in a Livebook notebook. Under the hood, `EXGBoost` uses [Vega-Lite](https://vega.github.io/vega-lite/)
- and [Kino Vega-Lite](https://hexdocs.pm/kino_vega_lite/Kino.VegaLite.html) to render the Booster.
-
- See the [`Plotting in EXGBoost`](notebooks/plotting.livemd) Notebook for an example of how to use `EXGBoost` with `Kino` and `Livebook`.
-
- ## Examples
-
- See the example Notebooks in the left sidebar (under the `Pages` tab) for more examples and tutorials
- on how to use EXGBoost.
+ #{File.cwd!() |> Path.join("README.md") |> File.read!() |> then(&Regex.run(~r/.*<!-- BEGIN MODULEDOC -->(?P<body>.*)<!-- END MODULEDOC -->.*/s, &1, capture: :all_but_first)) |> hd()}
  """

  alias EXGBoost.ArrayInterface