Commit 23aadda

[R] Drop support for text inputs. (#11026)
Co-authored-by: david-cortes <[email protected]>
1 parent 91a6bb8 commit 23aadda

2 files changed: +30 -71 lines changed

R-package/R/xgb.DMatrix.R

+16 -37
@@ -9,12 +9,13 @@
 #' method (`tree_method = "hist"`, which is the default algorithm), but is not usable for the
 #' sorted-indices method (`tree_method = "exact"`), nor for the approximate method
 #' (`tree_method = "approx"`).
+#'
 #' @param data Data from which to create a DMatrix, which can then be used for fitting models or
 #' for getting predictions out of a fitted model.
 #'
-#' Supported input types are as follows:\itemize{
-#' \item `matrix` objects, with types `numeric`, `integer`, or `logical`.
-#' \item `data.frame` objects, with columns of types `numeric`, `integer`, `logical`, or `factor`.
+#' Supported input types are as follows:
+#' - `matrix` objects, with types `numeric`, `integer`, or `logical`.
+#' - `data.frame` objects, with columns of types `numeric`, `integer`, `logical`, or `factor`
 #'
 #' Note that xgboost uses base-0 encoding for categorical types, hence `factor` types (which use base-1
 #' encoding') will be converted inside the function call. Be aware that the encoding used for `factor`
@@ -23,33 +24,14 @@
 #' was constructed.
 #'
 #' Other column types are not supported.
-#' \item CSR matrices, as class `dgRMatrix` from package `Matrix`.
-#' \item CSC matrices, as class `dgCMatrix` from package `Matrix`. These are **not** supported for
-#' 'xgb.QuantileDMatrix'.
-#' \item Single-row CSR matrices, as class `dsparseVector` from package `Matrix`, which is interpreted
-#' as a single row (only when making predictions from a fitted model).
-#' \item Text files in a supported format, passed as a `character` variable containing the URI path to
-#' the file, with an optional format specifier.
-#'
-#' These are **not** supported for `xgb.QuantileDMatrix`. Supported formats are:\itemize{
-#' \item XGBoost's own binary format for DMatrices, as produced by [xgb.DMatrix.save()].
-#' \item SVMLight (a.k.a. LibSVM) format for CSR matrices. This format can be signaled by suffix
-#' `?format=libsvm` at the end of the file path. It will be the default format if not
-#' otherwise specified.
-#' \item CSV files (comma-separated values). This format can be specified by adding suffix
-#' `?format=csv` at the end ofthe file path. It will **not** be auto-deduced from file extensions.
-#' }
+#' - CSR matrices, as class `dgRMatrix` from package `Matrix`.
+#' - CSC matrices, as class `dgCMatrix` from package `Matrix`.
 #'
-#' Be aware that the format of the file will not be auto-deduced - for example, if a file is named 'file.csv',
-#' it will not look at the extension or file contents to determine that it is a comma-separated value.
-#' Instead, the format must be specified following the URI format, so the input to `data` should be passed
-#' like this: `"file.csv?format=csv"` (or `"file.csv?format=csv&label_column=0"` if the first column
-#' corresponds to the labels).
+#' These are **not** supported by `xgb.QuantileDMatrix`.
+#' - XGBoost's own binary format for DMatrices, as produced by [xgb.DMatrix.save()].
+#' - Single-row CSR matrices, as class `dsparseVector` from package `Matrix`, which is interpreted
+#' as a single row (only when making predictions from a fitted model).
 #'
-#' For more information about passing text files as input, see the articles
-#' \href{https://xgboost.readthedocs.io/en/stable/tutorials/input_format.html}{Text Input Format of DMatrix} and
-#' \href{https://xgboost.readthedocs.io/en/stable/python/python_intro.html#python-data-interface}{Data Interface}.
-#' }
 #' @param label Label of the training data. For classification problems, should be passed encoded as
 #' integers with numeration starting at zero.
 #' @param weight Weight for each instance.
@@ -95,15 +77,9 @@
 #' @param label_lower_bound Lower bound for survival training.
 #' @param label_upper_bound Upper bound for survival training.
 #' @param feature_weights Set feature weights for column sampling.
-#' @param data_split_mode When passing a URI (as R `character`) as input, this signals
-#' whether to split by row or column. Allowed values are `"row"` and `"col"`.
-#'
-#' In distributed mode, the file is split accordingly; otherwise this is only an indicator on
-#' how the file was split beforehand. Default to row.
-#'
-#' This is not used when `data` is not a URI.
-#' @return An 'xgb.DMatrix' object. If calling 'xgb.QuantileDMatrix', it will have additional
-#' subclass 'xgb.QuantileDMatrix'.
+#' @param data_split_mode Not used yet. This parameter is for distributed training, which is not yet available for the R package.
+#' @return An 'xgb.DMatrix' object. If calling `xgb.QuantileDMatrix`, it will have additional
+#' subclass `xgb.QuantileDMatrix`.
 #'
 #' @details
 #' Note that DMatrix objects are not serializable through R functions such as [saveRDS()] or [save()].
@@ -145,6 +121,9 @@ xgb.DMatrix <- function(
   if (!is.null(group) && !is.null(qid)) {
     stop("Either one of 'group' or 'qid' should be NULL")
   }
+  if (data_split_mode != "row") {
+    stop("'data_split_mode' is not supported yet.")
+  }
   nthread <- as.integer(NVL(nthread, -1L))
   if (typeof(data) == "character") {
     if (length(data) > 1) {
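
Note on the new guard: the three added lines can be exercised in isolation. The sketch below is a minimal stand-in, not the package code. `check_data_split_mode` is a hypothetical helper name; only the condition and the error message are taken from the diff, and the `"row"` default is an assumption based on the removed documentation ("Default to row").

```r
# Hypothetical stand-in that mirrors the guard added to xgb.DMatrix().
# The helper name is an assumption; the condition and message come from the diff.
check_data_split_mode <- function(data_split_mode = "row") {
  if (data_split_mode != "row") {
    stop("'data_split_mode' is not supported yet.")
  }
  invisible(TRUE)
}

# The default value passes silently.
check_data_split_mode()

# Any other value raises an error; capture its message instead of aborting.
msg <- tryCatch(
  check_data_split_mode("col"),
  error = function(e) conditionMessage(e)
)
print(msg)
```

With this change, a caller that previously passed `data_split_mode = "col"` would presumably now get an error up front rather than a silently ignored argument.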

R-package/man/xgb.DMatrix.Rd

+14 -34 (generated file; diff not rendered)
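
Migration note: callers that previously passed a text file such as `"file.csv?format=csv&label_column=0"` could instead parse the file in R and hand a `matrix` to `xgb.DMatrix()`, which remains a supported input type per the updated documentation. A minimal sketch, assuming a headerless numeric CSV whose first column holds the labels; the file contents are made up for illustration, and the final call runs only if the xgboost package happens to be installed.

```r
# Write a tiny headerless CSV to a temporary file (illustrative data only).
path <- tempfile(fileext = ".csv")
writeLines(c("0,1.5,2.0", "1,0.5,3.0", "0,2.5,1.0"), path)

# Parse the file with base R instead of passing a URI to xgb.DMatrix().
df <- read.csv(path, header = FALSE)

# First column is the label (the old `label_column=0`); the rest are features.
y <- df[[1]]
x <- as.matrix(df[, -1, drop = FALSE])

# Build the DMatrix only when xgboost is available.
if (requireNamespace("xgboost", quietly = TRUE)) {
  dm <- xgboost::xgb.DMatrix(data = x, label = y)
}
```

Sparse LibSVM-style data would need a different reader and a `dgRMatrix` or `dgCMatrix` from package `Matrix`; that path is not sketched here.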
