Commit 380c1a0

cleanup docs

1 parent 86e12b9 commit 380c1a0

4 files changed: +124 −183 lines changed

README.md

Lines changed: 92 additions & 11 deletions
@@ -1,11 +1,12 @@
- [![Documentation](https://img.shields.io/badge/-Documentation-blueviolet)](https://hexdocs.pm/exgboost)
-
  # EXGBoost

+ [![EXGBoost version](https://img.shields.io/hexpm/v/exgboost.svg)](https://hex.pm/packages/exgboost)
+ [![Hex Docs](https://img.shields.io/badge/hex-docs-lightgreen.svg)](https://hexdocs.pm/exgboost/)
+ [![Hex Downloads](https://img.shields.io/hexpm/dt/exgboost)](https://hex.pm/packages/exgboost)
+ [![Twitter Follow](https://img.shields.io/twitter/follow/ac_alejos?style=social)](https://twitter.com/ac_alejos)
+ <!-- BEGIN MODULEDOC -->
  Elixir bindings to the [XGBoost C API](https://xgboost.readthedocs.io/en/latest/c.html) using [Native Implemented Functions (NIFs)](https://www.erlang.org/doc/man/erl_nif.html).

- EXGBoost is currently based off of [this](https://github.com/dmlc/xgboost/tree/08ce495b5de973033160e7c7b650abf59346a984) commit for the upcoming `2.0.0` release of XGBoost.
-
  `EXGBoost` provides an implementation of XGBoost that works with
  [Nx](https://hexdocs.pm/nx/Nx.html) tensors.

@@ -118,26 +119,106 @@ It accepts a `Booster` struct (which is the output of `EXGBoost.train/2`).
  preds = EXGBoost.train(X, y) |> EXGBoost.predict(X)
  ```

+ ## Serialization
+
+ A Booster can be serialized to a file using `EXGBoost.write_*` and loaded from a file
+ using `EXGBoost.read_*`. The file format can be specified using the `:format` option,
+ which can be either `:json` or `:ubj` (the default is `:json`). If the file already exists, it will NOT
+ be overwritten by default. Boosters can be serialized either to a file or to a binary buffer,
+ and in three different ways: configuration only, model only, or both. `dump` functions serialize
+ the Booster to a binary buffer. Functions named with `weights` serialize the model's trained parameters only
+ (best used when the model is already trained and only inference/prediction will be performed), functions
+ named with `config` serialize the configuration only, and functions named with `model` serialize both
+ the model parameters and the configuration.
+
+ ### Output Formats
+
+ - `read`/`write` - File.
+ - `load`/`dump` - Binary buffer.
+
+ ### Output Contents
+
+ - `config` - Save the configuration only.
+ - `weights` - Save the model parameters only. Use this when you want to save the model to a format that can be ingested by other XGBoost APIs.
+ - `model` - Save both the model parameters and the configuration.
+ ## Plotting
+
+ `EXGBoost.plot_tree/2` is the primary entry point for plotting a tree from a trained model.
+ It accepts an `EXGBoost.Booster` struct (the output of `EXGBoost.train/2`) and returns a
+ VegaLite spec that can be rendered in a notebook or saved to a file. It also accepts a
+ keyword list of options that can be used to configure the plotting process.
+
+ See `EXGBoost.Plotting` for more details on plotting.
+
+ You can see the available styles by running `EXGBoost.Plotting.get_styles()` or refer to the
+ `EXGBoost.Plotting.Styles` documentation for a gallery of the styles.
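
A short sketch of that workflow (the `:style` value and the use of `VegaLite.Export.to_json/1` from the `vega_lite` package are illustrative assumptions, not prescribed by this README):

```elixir
booster = EXGBoost.train(x, y)

# Returns a VegaLite struct; in Livebook this renders inline.
vl = EXGBoost.plot_tree(booster, style: :solarized_dark)

# Persist the spec as JSON so it can be rendered elsewhere.
File.write!("tree.vl.json", VegaLite.Export.to_json(vl))
```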
+ ## Kino & Livebook Integration
+
+ `EXGBoost` integrates with [Kino](https://hexdocs.pm/kino/Kino.html) and [Livebook](https://livebook.dev/)
+ to provide a rich interactive experience for data scientists.
+
+ EXGBoost implements the `Kino.Render` protocol for `EXGBoost.Booster` structs, which allows a
+ Booster to be rendered directly in a Livebook notebook. Under the hood, `EXGBoost` uses
+ [Vega-Lite](https://vega.github.io/vega-lite/) and [Kino Vega-Lite](https://hexdocs.pm/kino_vega_lite/Kino.VegaLite.html)
+ to render the Booster.
+
+ See the [`Plotting in EXGBoost`](notebooks/plotting.livemd) notebook for an example of how to use `EXGBoost` with `Kino` and `Livebook`.
+ ## Examples
+
+ See the example notebooks in the left sidebar (under the `Pages` tab) for more examples and tutorials
+ on how to use EXGBoost.
  ## Requirements

+ ### Precompiled Distribution
+
+ We currently offer the following precompiled packages for EXGBoost:
+
+ ```elixir
+ %{
+   "exgboost-nif-2.16-aarch64-apple-darwin-0.5.0.tar.gz" => "sha256:c659d086d07e9c209bdffbbf982951c6109b2097c4d3008ef9af59c3050663d2",
+   "exgboost-nif-2.16-x86_64-apple-darwin-0.5.0.tar.gz" => "sha256:05256238700456c57e279558765b54b5b5ed4147878c6861cd4c937472abbe52",
+   "exgboost-nif-2.16-x86_64-linux-gnu-0.5.0.tar.gz" => "sha256:ad3ba6aba8c3c2821dce4afc05b66a5e529764e0cea092c5a90e826446653d99",
+   "exgboost-nif-2.17-aarch64-apple-darwin-0.5.0.tar.gz" => "sha256:745e7e970316b569a10d76ceb711b9189360b3bf9ab5ee6133747f4355f45483",
+   "exgboost-nif-2.17-x86_64-apple-darwin-0.5.0.tar.gz" => "sha256:73948d6f2ef298e3ca3dceeca5d8a36a2d88d842827e1168c64589e4931af8d7",
+   "exgboost-nif-2.17-x86_64-linux-gnu-0.5.0.tar.gz" => "sha256:a0b5ff0b074a9726c69d632b2dc0214fc7b66dccb4f5879e01255eeb7b9d4282",
+ }
+ ```
+
+ The correct package will be downloaded and installed (if supported) when you install
+ the dependency through Mix (as shown above); otherwise you will need to compile manually.
+
+ **NOTE**: On MacOS, you still need to install `libomp` even when using the precompiled libraries:
+
+ `brew install libomp`
+ ### Dev Requirements
+
  If you are contributing to the library and need to compile locally, or choose not to use the precompiled libraries, you will need the following:

- * Make
- * CMake
- * If MacOS: `brew install libomp`
+ - Make
+ - CMake
+ - If MacOS: `brew install libomp`

  When you run `mix compile`, the `xgboost` shared library will be compiled, so the first time you compile your project will take longer than subsequent compilations.

  You also need to set `CC_PRECOMPILER_PRECOMPILE_ONLY_LOCAL=true` before the first local compilation, otherwise you will get an error related to a missing checksum file.

  ## Known Limitations

- The XGBoost C API uses C function pointers to implement streaming data types. The Python ctypes library is able to pass function pointers to the C API which are then executed by XGBoost. Erlang/Elixir NIFs do not have this capability, and as such, streaming data types are not supported in EXGBoost.
+ - The XGBoost C API uses C function pointers to implement streaming data types. The Python ctypes library is able to pass function pointers to the C API, which are then executed by XGBoost. Erlang/Elixir NIFs do not have this capability, and as such, streaming data types are not supported in EXGBoost.
+ - Currently, EXGBoost only works with tensors from the `Nx.BinaryBackend`. If you are using any other backend, you will need to perform an `Nx.backend_transfer` or `Nx.backend_copy` before training an `EXGBoost.Booster`. This is because Nx tensors are JSON-encoded and serialized before being sent to XGBoost, and the binary backend is required for proper JSON-encoding of the underlying tensor.
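
A minimal illustration of the workaround for the backend limitation above (`EXLA` is used here only as an example of a non-binary backend; `x_exla`/`y_exla` are hypothetical tensors):

```elixir
# Tensors created under a non-binary backend (e.g. EXLA) must be moved to
# Nx.BinaryBackend before training, since EXGBoost JSON-encodes them.
x = Nx.backend_transfer(x_exla, Nx.BinaryBackend)
y = Nx.backend_transfer(y_exla, Nx.BinaryBackend)
booster = EXGBoost.train(x, y)
```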
+ <!-- END MODULEDOC -->
  ## Roadmap

- * [ ] CUDA support
- * [ ] [Collective API](https://xgboost.readthedocs.io/en/latest/c.html#collective)?
+ - [ ] CUDA support
+ - [ ] [Collective API](https://xgboost.readthedocs.io/en/latest/c.html#collective)?

  ## License

lib/exgboost.ex

Lines changed: 1 addition & 141 deletions
@@ -1,146 +1,6 @@
  defmodule EXGBoost do
    @moduledoc """
- Elixir bindings for the XGBoost library. `EXGBoost` provides an implementation of XGBoost that works with
- [Nx](https://hexdocs.pm/nx/Nx.html) tensors.
-
- Xtreme Gradient Boosting (XGBoost) is an optimized distributed gradient
- boosting library designed to be highly efficient, flexible, and portable.
- It implements machine learning algorithms under the [Gradient Boosting](https://en.wikipedia.org/wiki/Gradient_boosting)
- framework. XGBoost provides a parallel tree boosting (also known as GBDT, GBM)
- that solves many data science problems in a fast and accurate way. The same code
- runs on major distributed environments (Hadoop, SGE, MPI) and can solve problems beyond
- billions of examples.
-
- ## Installation
-
- ```elixir
- def deps do
-   [
-     {:exgboost, "~> 0.5"}
-   ]
- end
- ```
- ## API Data Structures
-
- EXGBoost's top-level `EXGBoost` API works directly and only with `Nx.Tensor` for data
- representation and with `EXGBoost.Booster` structs as an internal representation.
- Direct manipulation of `EXGBoost.Booster` structs is discouraged.
-
- ## Basic Usage
-
-     key = Nx.Random.key(42)
-     {x, key} = Nx.Random.normal(key, 0, 1, shape: {10, 5})
-     {y, key} = Nx.Random.normal(key, 0, 1, shape: {10})
-     model = EXGBoost.train(x, y)
-     EXGBoost.predict(model, x)
- ## Training
-
- EXGBoost is designed to feel familiar to users of the Python XGBoost library. `EXGBoost.train/2` is the
- primary entry point for training a model. It accepts an Nx tensor for the features and an Nx tensor for the labels.
- `EXGBoost.train/2` returns a trained `EXGBoost.Booster` struct that can be used for prediction. It also
- accepts a keyword list of options that can be used to configure the training process. See the
- [XGBoost documentation](https://xgboost.readthedocs.io/en/latest/parameter.html) for the full list of options.
-
- `EXGBoost.train/2` also allows the user to provide a custom training function that will be used to train the model.
- This is done by passing a function to the `:obj` option. See `EXGBoost.Booster.update/4` for more information.
-
- Another feature of `EXGBoost.train/2` is the ability to provide a validation set for early stopping. This is done
- by passing a list of 3-tuples to the `:evals` option. Each 3-tuple should contain an Nx tensor for the features, an Nx tensor
- for the labels, and a string label for the validation set name. The validation set will be used to calculate the validation
- error at each iteration of the training process. If the validation error does not improve for `:early_stopping_rounds` iterations,
- the training process will stop. See the [XGBoost documentation](https://xgboost.readthedocs.io/en/latest/tutorials/param_tuning.html)
- for a more detailed explanation of early stopping.
-
- Early stopping is achieved through the use of callbacks. `EXGBoost.train/2` accepts a list of callbacks that will be called
- at each iteration of the training process. The callbacks can be used to implement custom logic, for example to print the
- validation error at each iteration or to provide a custom setup function for training. See `EXGBoost.Training.Callback`
- for more information on callbacks.
-
- Please note that callbacks are called in the order that they are provided. If you provide multiple callbacks that modify
- the same parameter, the last callback wins. For example, if one callback sets `:early_stopping_rounds` to 10 and a later
- callback sets it to 20, then `:early_stopping_rounds` will be 20.
-
- You can also pass parameters to be applied to the Booster model using the `:params` option. These parameters will
- be applied to the Booster model before training begins. This allows you to set parameters that are not available as options
- to `EXGBoost.train/2`. See the [XGBoost documentation](https://xgboost.readthedocs.io/en/latest/parameter.html) for a full
- list of parameters.
-
-     EXGBoost.train(
-       x,
-       y,
-       obj: :multi_softprob,
-       evals: [{x_test, y_test, "test"}],
-       learning_rates: fn i -> i / 10 end,
-       num_boost_round: 10,
-       early_stopping_rounds: 3,
-       max_depth: 3,
-       eval_metric: [:rmse, :logloss]
-     )
- ## Prediction
-
- `EXGBoost.predict/2` is the primary entry point for making predictions with a trained model.
- It accepts an `EXGBoost.Booster` struct (which is the output of `EXGBoost.train/2`).
- `EXGBoost.predict/2` returns an Nx tensor containing the predictions and also accepts
- a keyword list of options that can be used to configure the prediction process.
-
- ```elixir
- preds = EXGBoost.train(X, y) |> EXGBoost.predict(X)
- ```
- ## Serialization
-
- A Booster can be serialized to a file using `EXGBoost.write_*` and loaded from a file
- using `EXGBoost.read_*`. The file format can be specified using the `:format` option,
- which can be either `:json` or `:ubj` (the default is `:json`). If the file already exists, it will NOT
- be overwritten by default. Boosters can be serialized either to a file or to a binary buffer,
- and in three different ways: configuration only, model only, or both. `dump` functions serialize
- the Booster to a binary buffer. Functions named with `weights` serialize the model's trained parameters only
- (best used when the model is already trained and only inference/prediction will be performed), functions
- named with `config` serialize the configuration only, and functions named with `model` serialize both
- the model parameters and the configuration.
-
- ### Output Formats
- - `read`/`write` - File.
- - `load`/`dump` - Binary buffer.
-
- ### Output Contents
- - `config` - Save the configuration only.
- - `weights` - Save the model parameters only. Use this when you want to save the model to a format that can be ingested by other XGBoost APIs.
- - `model` - Save both the model parameters and the configuration.
-
- ## Plotting
-
- `EXGBoost.plot_tree/2` is the primary entry point for plotting a tree from a trained model.
- It accepts an `EXGBoost.Booster` struct (the output of `EXGBoost.train/2`) and returns a VegaLite
- spec that can be rendered in a notebook or saved to a file. It also accepts a keyword list of
- options that can be used to configure the plotting process.
-
- See `EXGBoost.Plotting` for more details on plotting.
-
- You can see the available styles by running `EXGBoost.Plotting.get_styles()` or refer to the `EXGBoost.Plotting.Styles`
- documentation for a gallery of the styles.
-
- ## Kino & Livebook Integration
-
- `EXGBoost` integrates with [Kino](https://hexdocs.pm/kino/Kino.html) and [Livebook](https://livebook.dev/)
- to provide a rich interactive experience for data scientists.
-
- EXGBoost implements the `Kino.Render` protocol for `EXGBoost.Booster` structs, which allows a Booster
- to be rendered directly in a Livebook notebook. Under the hood, `EXGBoost` uses [Vega-Lite](https://vega.github.io/vega-lite/)
- and [Kino Vega-Lite](https://hexdocs.pm/kino_vega_lite/Kino.VegaLite.html) to render the Booster.
-
- See the [`Plotting in EXGBoost`](notebooks/plotting.livemd) notebook for an example of how to use `EXGBoost` with `Kino` and `Livebook`.
-
- ## Examples
-
- See the example notebooks in the left sidebar (under the `Pages` tab) for more examples and tutorials
- on how to use EXGBoost.
+ #{File.cwd!() |> Path.join("README.md") |> File.read!() |> then(&Regex.run(~r/.*<!-- BEGIN MODULEDOC -->(?P<body>.*)<!-- END MODULEDOC -->.*/s, &1, capture: :all_but_first)) |> hd()}
  """

  alias EXGBoost.ArrayInterface
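
The single added line replaces the 141 deleted lines by reading `README.md` at compile time and extracting the span between the `<!-- BEGIN MODULEDOC -->` and `<!-- END MODULEDOC -->` markers, so the docs live in one place. The same pattern written as a standalone sketch (module name is illustrative; `String.split` is used here instead of the commit's regex):

```elixir
defmodule MyLib do
  # Recompile this module when the README changes.
  @external_resource "README.md"

  # Keep only the section between the BEGIN/END MODULEDOC markers.
  @moduledoc "README.md"
             |> File.read!()
             |> String.split("<!-- BEGIN MODULEDOC -->")
             |> List.last()
             |> String.split("<!-- END MODULEDOC -->")
             |> List.first()
end
```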

notebooks/iris_classification.livemd

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
  ```elixir
  Mix.install([
-   {:exgboost, "~> 0.4"},
+   {:exgboost, "~> 0.5"},
    {:nx, "~> 0.5"},
    {:scidata, "~> 0.1"},
    {:scholar, "~> 0.1"}
