|
9 | 9 | #' method (`tree_method = "hist"`, which is the default algorithm), but is not usable for the
|
10 | 10 | #' sorted-indices method (`tree_method = "exact"`), nor for the approximate method
|
11 | 11 | #' (`tree_method = "approx"`).
|
| 12 | +#' |
12 | 13 | #' @param data Data from which to create a DMatrix, which can then be used for fitting models or
|
13 | 14 | #' for getting predictions out of a fitted model.
|
14 | 15 | #'
|
15 |
| -#' Supported input types are as follows:\itemize{ |
16 |
| -#' \item `matrix` objects, with types `numeric`, `integer`, or `logical`. |
17 |
| -#' \item `data.frame` objects, with columns of types `numeric`, `integer`, `logical`, or `factor`. |
| 16 | +#' Supported input types are as follows: |
| 17 | +#' - `matrix` objects, with types `numeric`, `integer`, or `logical`. |
| 18 | +#' - `data.frame` objects, with columns of types `numeric`, `integer`, `logical`, or `factor` |
18 | 19 | #'
|
19 | 20 | #' Note that xgboost uses base-0 encoding for categorical types, hence `factor` types (which use base-1
|
20 | 21 | #' encoding) will be converted inside the function call. Be aware that the encoding used for `factor`
|
|
23 | 24 | #' was constructed.
|
24 | 25 | #'
|
25 | 26 | #' Other column types are not supported.
|
26 |
| -#' \item CSR matrices, as class `dgRMatrix` from package `Matrix`. |
27 |
| -#' \item CSC matrices, as class `dgCMatrix` from package `Matrix`. These are **not** supported for |
28 |
| -#' 'xgb.QuantileDMatrix'. |
29 |
| -#' \item Single-row CSR matrices, as class `dsparseVector` from package `Matrix`, which is interpreted |
30 |
| -#' as a single row (only when making predictions from a fitted model). |
31 |
| -#' \item Text files in a supported format, passed as a `character` variable containing the URI path to |
32 |
| -#' the file, with an optional format specifier. |
33 |
| -#' |
34 |
| -#' These are **not** supported for `xgb.QuantileDMatrix`. Supported formats are:\itemize{ |
35 |
| -#' \item XGBoost's own binary format for DMatrices, as produced by [xgb.DMatrix.save()]. |
36 |
| -#' \item SVMLight (a.k.a. LibSVM) format for CSR matrices. This format can be signaled by suffix |
37 |
| -#' `?format=libsvm` at the end of the file path. It will be the default format if not |
38 |
| -#' otherwise specified. |
39 |
| -#' \item CSV files (comma-separated values). This format can be specified by adding suffix |
40 |
| -#' `?format=csv` at the end ofthe file path. It will **not** be auto-deduced from file extensions. |
41 |
| -#' } |
| 27 | +#' - CSR matrices, as class `dgRMatrix` from package `Matrix`. |
| 28 | +#' - CSC matrices, as class `dgCMatrix` from package `Matrix`. |
42 | 29 | #'
|
43 |
| -#' Be aware that the format of the file will not be auto-deduced - for example, if a file is named 'file.csv', |
44 |
| -#' it will not look at the extension or file contents to determine that it is a comma-separated value. |
45 |
| -#' Instead, the format must be specified following the URI format, so the input to `data` should be passed |
46 |
| -#' like this: `"file.csv?format=csv"` (or `"file.csv?format=csv&label_column=0"` if the first column |
47 |
| -#' corresponds to the labels). |
| 30 | +#' These are **not** supported by `xgb.QuantileDMatrix`. |
| 31 | +#' - XGBoost's own binary format for DMatrices, as produced by [xgb.DMatrix.save()]. |
| 32 | +#' - Single-row CSR matrices, as class `dsparseVector` from package `Matrix`, which is interpreted |
| 33 | +#' as a single row (only when making predictions from a fitted model). |
48 | 34 | #'
|
49 |
| -#' For more information about passing text files as input, see the articles |
50 |
| -#' \href{https://xgboost.readthedocs.io/en/stable/tutorials/input_format.html}{Text Input Format of DMatrix} and |
51 |
| -#' \href{https://xgboost.readthedocs.io/en/stable/python/python_intro.html#python-data-interface}{Data Interface}. |
52 |
| -#' } |
53 | 35 | #' @param label Label of the training data. For classification problems, should be passed encoded as
|
54 | 36 | #' integers with numeration starting at zero.
|
55 | 37 | #' @param weight Weight for each instance.
|
|
95 | 77 | #' @param label_lower_bound Lower bound for survival training.
|
96 | 78 | #' @param label_upper_bound Upper bound for survival training.
|
97 | 79 | #' @param feature_weights Set feature weights for column sampling.
|
98 |
| -#' @param data_split_mode When passing a URI (as R `character`) as input, this signals |
99 |
| -#' whether to split by row or column. Allowed values are `"row"` and `"col"`. |
100 |
| -#' |
101 |
| -#' In distributed mode, the file is split accordingly; otherwise this is only an indicator on |
102 |
| -#' how the file was split beforehand. Default to row. |
103 |
| -#' |
104 |
| -#' This is not used when `data` is not a URI. |
105 |
| -#' @return An 'xgb.DMatrix' object. If calling 'xgb.QuantileDMatrix', it will have additional |
106 |
| -#' subclass 'xgb.QuantileDMatrix'. |
| 80 | +#' @param data_split_mode Not used yet. This parameter is for distributed training, which is not yet available for the R package. |
| 81 | +#' @return An 'xgb.DMatrix' object. If calling `xgb.QuantileDMatrix`, it will have additional |
| 82 | +#' subclass `xgb.QuantileDMatrix`. |
107 | 83 | #'
|
108 | 84 | #' @details
|
109 | 85 | #' Note that DMatrix objects are not serializable through R functions such as [saveRDS()] or [save()].
|
@@ -145,6 +121,9 @@ xgb.DMatrix <- function(
|
145 | 121 | if (!is.null(group) && !is.null(qid)) {
|
146 | 122 | stop("Either one of 'group' or 'qid' should be NULL")
|
147 | 123 | }
|
| 124 | + if (data_split_mode != "row") { |
| 125 | + stop("'data_split_mode' is not supported yet.") |
| 126 | + } |
148 | 127 | nthread <- as.integer(NVL(nthread, -1L))
|
149 | 128 | if (typeof(data) == "character") {
|
150 | 129 | if (length(data) > 1) {
|
|
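The guard added in the last hunk can be tried in isolation. The following is a minimal sketch, not the package's implementation: `check_split_mode` is a hypothetical helper name, and the `"row"` default is an assumption. It shows that any value other than `"row"` now stops with the error message from the diff.

```r
# Hypothetical standalone helper mirroring the check added to xgb.DMatrix().
# The default value "row" is an assumption made for this sketch.
check_split_mode <- function(data_split_mode = "row") {
  if (data_split_mode != "row") {
    stop("'data_split_mode' is not supported yet.")
  }
  invisible(TRUE)
}

# The accepted value passes silently.
ok <- check_split_mode("row")

# Any other value raises an error; capture its message instead of aborting.
msg <- tryCatch(
  check_split_mode("col"),
  error = function(e) conditionMessage(e)
)
print(msg)
```

Failing early on unsupported values keeps callers from assuming that column-wise splitting takes effect in the R package.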