[experiment] ENH: using only raw inputs for onedal backend #2153

Status: Open. Wants to merge 88 commits into base: main, from samir-nasibli's enh/raw_inputs branch.

The diff below shows changes from 1 commit (f0d92ae) out of the 88 listed here.
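For context: the branch threads a new use_raw_input option through the onedal backend's config. When it is enabled, estimators skip the sklearn-style validation helpers (_check_array, _check_X_y) and pass device arrays (dpctl/dpnp) straight into oneDAL tables, inferring the SYCL queue from the data itself. A minimal usage sketch, assuming the option ends up exposed through sklearnex's config_context as the test-related commits below suggest; the surrounding data setup is illustrative only:

import dpctl.tensor as dpt
from sklearnex import config_context
from sklearnex.linear_model import LinearRegression

# X and y already live on a SYCL device; with use_raw_input=True the backend
# consumes them as-is instead of round-tripping through host-side checks.
X = dpt.reshape(dpt.arange(20.0, device="gpu"), (10, 2))
y = dpt.arange(10.0, device="gpu")

with config_context(use_raw_input=True):
    model = LinearRegression().fit(X, y)
    pred = model.predict(X)  # expected to come back as a usm_ndarray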
Commits (88):

daed528  ENH: using only raw inputs for onedal backend (samir-nasibli, Nov 5, 2024)
1be2ffb  minor fix (samir-nasibli, Nov 5, 2024)
a23b677  lin (samir-nasibli, Nov 5, 2024)
664e140  fix usw_raw_input True/False with dpctl tensor on device (ahuber21, Nov 5, 2024)
518dceb  Add hacks to kmeans (ahuber21, Nov 5, 2024)
df9d930  Basic statistics online (samir-nasibli, Nov 5, 2024)
2954913  Merge branch 'enh/raw_inputs' of https://github.com/samir-nasibli/sci… (samir-nasibli, Nov 5, 2024)
3ef345c  Covariance support (ethanglaser, Nov 5, 2024)
f1c9233  Merge branch 'enh/raw_inputs' of https://github.com/samir-nasibli/sci… (ethanglaser, Nov 5, 2024)
66d7b2d  DBSCAN support (samir-nasibli, Nov 5, 2024)
c5d26a4  Merge branch 'enh/raw_inputs' of https://github.com/samir-nasibli/sci… (samir-nasibli, Nov 5, 2024)
1350c10  minor fix for dbscan (samir-nasibli, Nov 5, 2024)
8aaaa70  minor fix for DBSCAN (samir-nasibli, Nov 5, 2024)
f0d92ae  Apply raw input for batch linear and logistic regression (Alexsandruss, Nov 5, 2024)
3b58beb  Apply linters (Alexsandruss, Nov 5, 2024)
d7f2c3c  fix for DBSCAN (samir-nasibli, Nov 5, 2024)
1aca420  support for Random Forest (samir-nasibli, Nov 5, 2024)
362930a  PCA support (batch) (ethanglaser, Nov 5, 2024)
bc37391  Merge branch 'enh/raw_inputs' of https://github.com/samir-nasibli/sci… (ethanglaser, Nov 5, 2024)
102dcae  minor fix for dbscan and rf (samir-nasibli, Nov 5, 2024)
6edab5b  fully fixed DBSCAN (samir-nasibli, Nov 6, 2024)
e153a28  Add Incremental Linear Regression (Alexsandruss, Nov 6, 2024)
37d32c9  Linting (Alexsandruss, Nov 6, 2024)
71c5135  add modification to knn (ahuber21, Nov 6, 2024)
db9f021  minor update for RF (samir-nasibli, Nov 6, 2024)
bc353da  fix for RandomForestClassifier (samir-nasibli, Nov 7, 2024)
e873205  minor for RF (samir-nasibli, Nov 7, 2024)
fe3222a  Update online algos (olegkkruglov, Nov 7, 2024)
5b3ad17  Merge branch 'enh/raw_inputs' of https://github.com/samir-nasibli/sci… (samir-nasibli, Nov 7, 2024)
eaaab32  fix for RF regressor (samir-nasibli, Nov 7, 2024)
a7f0c2d  fix workaround for knn (ahuber21, Nov 7, 2024)
d9a2966  kmeans predict support (ethanglaser, Nov 12, 2024)
3562c69  Merge remote-tracking branch 'origin/main' into enh/raw_inputs (ahuber21, Dec 16, 2024)
42c3614  fix merge errors (ahuber21, Dec 16, 2024)
53bcc7b  fix some tests (ahuber21, Dec 17, 2024)
9964c5a  fixup (ahuber21, Dec 17, 2024)
84afb62  undo more changes that broke tests (ahuber21, Dec 17, 2024)
cf5b736  format (ahuber21, Dec 17, 2024)
92393b9  restore original behavior when running without raw inputs (ahuber21, Dec 18, 2024)
13471e5  restore original behavior when running without raw inputs (ahuber21, Dec 18, 2024)
a8f3f19  align code (ahuber21, Dec 18, 2024)
2b07c00  restore original from_table (ahuber21, Dec 19, 2024)
6104736  add use_raw_input tests for incremental covariance (ahuber21, Dec 19, 2024)
df03233  Add basic statistics testing (ahuber21, Dec 19, 2024)
8a166b7  add incremental basic statistics (ahuber21, Dec 19, 2024)
fb5f5fa  add dbscan (ahuber21, Dec 19, 2024)
7072041  Merge remote-tracking branch 'origin/main' into dev/ahuber/raw-inputs… (ahuber21, Dec 19, 2024)
91384ed  add kmeans (ahuber21, Dec 20, 2024)
6dec57d  add covariance (ahuber21, Dec 20, 2024)
529a7b8  align get_config() import and use_raw_input retrieval (ahuber21, Dec 20, 2024)
9f78cbd  add incremental_pca (ahuber21, Dec 20, 2024)
658ccc1  add pca (ahuber21, Dec 20, 2024)
5e74a54  add incremental linear (ahuber21, Dec 20, 2024)
dfbf223  add linear_model (ahuber21, Dec 22, 2024)
c4094fb  Merge branch 'dev/ahuber/raw-inputs-dispatching' into enh/raw_inputs (ahuber21, Dec 22, 2024)
bb5206f  raw inputs updates for functional forest predict (ethanglaser, Jan 9, 2025)
8211a23  fixes for logreg predict_proba, knnreg, inc cov, inc pca (ethanglaser, Jan 18, 2025)
e3425bf  dbscan + inc linreg changes (ethanglaser, Jan 20, 2025)
0630bc1  Merge 'upstream/main' into enh/raw_inputs (ethanglaser, Jan 20, 2025)
52ba18a  black (ethanglaser, Jan 20, 2025)
90b7175  temporary for CI (ethanglaser, Jan 21, 2025)
f4d18cd  isorted (ethanglaser, Jan 21, 2025)
d84a559  tuple indices safeguarding (ethanglaser, Jan 22, 2025)
2daeeb7  incremental bs fit fixes (ethanglaser, Jan 22, 2025)
fb3d0bc  dbscan CI fixes (ethanglaser, Jan 22, 2025)
a7bd2cd  use xp to take samples to avoid data copying (ahuber21, Jan 29, 2025)
d64c6fe  align setting of use_raw_input (ethanglaser, Feb 4, 2025)
7dbf8df  Merge remote-tracking branch 'upstream/main' into enh/raw_inputs (ethanglaser, Feb 13, 2025)
4e0ec33  isort (ethanglaser, Feb 13, 2025)
a8c9fe0  unify and clean up onedal4py changes, remove raw ridge (ethanglaser, Feb 13, 2025)
83342f7  remove unnecessary sklearnex raw inputs, move tests to spmd (ethanglaser, Feb 14, 2025)
87746b2  minor followup (ethanglaser, Feb 14, 2025)
12e0f08  cleanup of remaining sklearnex changes (ethanglaser, Feb 14, 2025)
bb9bdec  oops (ethanglaser, Feb 15, 2025)
8b7725f  Merge remote-tracking branch 'upstream/main' into enh/raw_inputs (ethanglaser, Feb 17, 2025)
b84c129  switch to config_context only on spmd estimator in tests (ethanglaser, Feb 18, 2025)
1249d44  CI fixes for PCA and linreg (ethanglaser, Feb 19, 2025)
d77a6b2  logistic regression CI fixes (ethanglaser, Feb 19, 2025)
5f28a69  fix forest CI and unify logreg and forest n_classes_ (ethanglaser, Feb 20, 2025)
7af5977  switch back to dpep for dpnp tests (ethanglaser, Feb 20, 2025)
e155695  forest cleanup and accuracy issue resolution (ethanglaser, Feb 25, 2025)
17d152f  add back astype conditions (ethanglaser, Feb 25, 2025)
e3b09cc  more attempted ci fixes (logreg np and forest) (ethanglaser, Feb 25, 2025)
5848248  format (ethanglaser, Feb 25, 2025)
6bdd227  remove unnecessary online attributes (ethanglaser, Feb 25, 2025)
3a83708  knnreg workaround (ethanglaser, Feb 25, 2025)
aba7e40  cleanup (ethanglaser, Feb 26, 2025)
f7f9fe8  add queue setting to rf predict_proba (ethanglaser, Feb 26, 2025)
Apply raw input for batch linear and logistic regression
Alexsandruss committed Nov 5, 2024
Verified: this commit was created on GitHub.com and signed with GitHub's verified signature (the key has since expired).
commit f0d92aecce0bc08688675ce48a10122b05c2b585
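Before the file-by-file diff, it helps to see the one pattern this commit applies everywhere: probe the input for a SYCL USM namespace, read use_raw_input from the config, gate all host-side validation behind it, and thread sua_iface/sycl_queue/xp through to_table/from_table so results come back in the caller's array namespace. A condensed sketch of that skeleton follows; the wrapper function and its signature are invented for illustration, while the helper calls mirror the diffs below:

import numpy as np

from onedal._config import _get_config
from onedal.datatypes import from_table, to_table
from onedal.utils._array_api import _get_sycl_namespace


def raw_aware_infer(module, policy, params, model, X, queue=None):
    # _get_sycl_namespace returns a (sua_iface, xp, is_sua) triple; sua_iface
    # is None for plain host arrays, in which case we fall back to numpy.
    sua_iface, xp, _ = _get_sycl_namespace(X)
    if xp is None:
        xp = np
    if _get_config().get("use_raw_input") is True and sua_iface is not None:
        queue = X.sycl_queue  # reuse the queue the data already lives on

    X_table = to_table(X, sua_iface=sua_iface)  # zero-copy for USM-backed arrays
    result = module.infer(policy, params, model, X_table)
    # Hand the result back in the same namespace the input arrived in.
    return from_table(result.responses, sua_iface=sua_iface, sycl_queue=queue, xp=xp)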
87 changes: 56 additions & 31 deletions onedal/linear_model/linear_model.py
@@ -21,11 +21,13 @@

 from daal4py.sklearn._utils import daal_check_version, get_dtype, make2d

+from .._config import _get_config
 from ..common._base import BaseEstimator
 from ..common._estimator_checks import _check_is_fitted
 from ..common.hyperparameters import get_hyperparameters
 from ..datatypes import _convert_to_supported, from_table, to_table
 from ..utils import _check_array, _check_n_features, _check_X_y, _num_features
+from ..utils._array_api import _get_sycl_namespace


 class BaseLinearRegression(BaseEstimator, metaclass=ABCMeta):
@@ -119,28 +121,35 @@ def predict(self, X, queue=None):

         _check_is_fitted(self)

+        sua_iface, xp, _ = _get_sycl_namespace(X)
+        if xp is None:
+            xp = np
+        use_raw_input = _get_config().get("use_raw_input") is True
+
         policy = self._get_policy(queue, X)

-        X = _check_array(
-            X, dtype=[np.float64, np.float32], force_all_finite=False, ensure_2d=False
-        )
+        if not use_raw_input:
+            X = _check_array(
+                X, dtype=[np.float64, np.float32], force_all_finite=False, ensure_2d=False
+            )
+            X = make2d(X)

         _check_n_features(self, X, False)

         if hasattr(self, "_onedal_model"):
             model = self._onedal_model
         else:
             model = self._create_model(policy)

-        X = make2d(X)
         X = _convert_to_supported(policy, X)
         params = self._get_onedal_params(get_dtype(X))

-        X_table = to_table(X)
+        X_table = to_table(X, sua_iface=sua_iface)
         result = module.infer(policy, params, model, X_table)
-        y = from_table(result.responses)
+        y = from_table(result.responses, sua_iface=sua_iface, sycl_queue=queue, xp=xp)

         if y.shape[1] == 1 and self.coef_.ndim == 1:
-            return y.ravel()
+            return xp.reshape(y, (-1,))
         else:
             return y
@@ -194,26 +203,32 @@ def fit(self, X, y, queue=None):
         """
         module = self._get_backend("linear_model", "regression")

-        # TODO Fix _check_X_y to make sure this conversion is there
-        if not isinstance(X, np.ndarray):
-            X = np.asarray(X)
+        sua_iface, xp, _ = _get_sycl_namespace(X)
+        if xp is None:
+            xp = np
+        use_raw_input = _get_config().get("use_raw_input") is True
+
+        if not use_raw_input:
+            # TODO Fix _check_X_y to make sure this conversion is there
+            if not isinstance(X, np.ndarray):
+                X = np.asarray(X)

-        dtype = get_dtype(X)
-        if dtype not in [np.float32, np.float64]:
-            dtype = np.float64
-            X = X.astype(dtype, copy=self.copy_X)
+            dtype = get_dtype(X)
+            if dtype not in [np.float32, np.float64]:
+                dtype = np.float64
+                X = X.astype(dtype, copy=self.copy_X)

-        y = np.asarray(y).astype(dtype=dtype)
+            y = np.asarray(y).astype(dtype=dtype)

-        X, y = _check_X_y(X, y, force_all_finite=False, accept_2d_y=True)
+            X, y = _check_X_y(X, y, force_all_finite=False, accept_2d_y=True)

         policy = self._get_policy(queue, X, y)

         self.n_features_in_ = _num_features(X, fallback_1d=True)

         X, y = _convert_to_supported(policy, X, y)
         params = self._get_onedal_params(get_dtype(X))
-        X_table, y_table = to_table(X, y)
+        X_table, y_table = to_table(X, y, sua_iface=sua_iface)

         hparams = get_hyperparameters("linear_regression", "train")
         if hparams is not None and not hparams.is_default:
@@ -223,14 +238,16 @@ def fit(self, X, y, queue=None):

         self._onedal_model = result.model

-        packed_coefficients = from_table(result.model.packed_coefficients)
+        packed_coefficients = from_table(
+            result.model.packed_coefficients, sua_iface=sua_iface, sycl_queue=queue, xp=xp
+        )
         self.coef_, self.intercept_ = (
             packed_coefficients[:, 1:],
             packed_coefficients[:, 0],
         )

         if self.coef_.shape[0] == 1 and y.ndim == 1:
-            self.coef_ = self.coef_.ravel()
+            self.coef_ = xp.reshape(self.coef_, (-1,))
             self.intercept_ = self.intercept_[0]

         return self
@@ -293,37 +310,45 @@ def fit(self, X, y, queue=None):
         """
         module = self._get_backend("linear_model", "regression")

-        X = _check_array(
-            X,
-            dtype=[np.float64, np.float32],
-            force_all_finite=False,
-            ensure_2d=False,
-            copy=self.copy_X,
-        )
+        sua_iface, xp, _ = _get_sycl_namespace(X)
+        if xp is None:
+            xp = np
+        use_raw_input = _get_config().get("use_raw_input") is True

-        y = np.asarray(y).astype(dtype=get_dtype(X))
+        if not use_raw_input:
+            X = _check_array(
+                X,
+                dtype=[np.float64, np.float32],
+                force_all_finite=False,
+                ensure_2d=False,
+                copy=self.copy_X,
+            )

-        X, y = _check_X_y(X, y, force_all_finite=False, accept_2d_y=True)
+            y = np.asarray(y).astype(dtype=get_dtype(X))
+
+            X, y = _check_X_y(X, y, force_all_finite=False, accept_2d_y=True)

         policy = self._get_policy(queue, X, y)

         self.n_features_in_ = _num_features(X, fallback_1d=True)

         X, y = _convert_to_supported(policy, X, y)
         params = self._get_onedal_params(get_dtype(X))
-        X_table, y_table = to_table(X, y)
+        X_table, y_table = to_table(X, y, sua_iface=sua_iface)

         result = module.train(policy, params, X_table, y_table)
         self._onedal_model = result.model

-        packed_coefficients = from_table(result.model.packed_coefficients)
+        packed_coefficients = from_table(
+            result.model.packed_coefficients, sua_iface=sua_iface, sycl_queue=queue, xp=xp
+        )
         self.coef_, self.intercept_ = (
             packed_coefficients[:, 1:],
             packed_coefficients[:, 0],
         )

         if self.coef_.shape[0] == 1 and y.ndim == 1:
-            self.coef_ = self.coef_.ravel()
+            self.coef_ = xp.reshape(self.coef_, (-1,))
             self.intercept_ = self.intercept_[0]

         return self
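A small but recurring detail in the hunks above: every .ravel() becomes xp.reshape(..., (-1,)). ravel is a NumPy method that dpctl's usm_ndarray does not provide, whereas reshape with a -1 dimension is part of the array API standard, so the same line works for every namespace xp can be here (numpy, dpnp, dpctl.tensor). A quick numpy-only illustration; the dpctl lines in the comment are assumptions about the device case:

import numpy as np

y = np.arange(6.0).reshape(3, 2)
flat = np.reshape(y, (-1,))  # array-API-friendly spelling of y.ravel()
assert flat.shape == (6,)

# The identical call works with a device namespace, e.g.:
#   import dpctl.tensor as xp
#   flat = xp.reshape(xp.asarray(y), (-1,))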
109 changes: 73 additions & 36 deletions onedal/linear_model/logistic_regression.py
@@ -21,6 +21,7 @@

 from daal4py.sklearn._utils import daal_check_version, get_dtype, make2d

+from .._config import _get_config
 from ..common._base import BaseEstimator as onedal_BaseEstimator
 from ..common._estimator_checks import _check_is_fitted
 from ..common._mixin import ClassifierMixin
@@ -33,6 +34,8 @@
     _num_features,
     _type_of_target,
 )
+from ..utils._array_api import _get_sycl_namespace
+from ..utils._dpep_helpers import get_unique_values_with_dpep


 class BaseLogisticRegression(onedal_BaseEstimator, metaclass=ABCMeta):
@@ -63,29 +66,38 @@ def _get_onedal_params(self, is_csr, dtype=np.float32):
         }

     def _fit(self, X, y, module, queue):
+        use_raw_input = _get_config().get("use_raw_input") is True
+        if use_raw_input and _get_sycl_namespace(X)[0] is not None:
+            queue = X.sycl_queue
+
         sparsity_enabled = daal_check_version((2024, "P", 700))
-        X, y = _check_X_y(
-            X,
-            y,
-            accept_sparse=sparsity_enabled,
-            force_all_finite=True,
-            accept_2d_y=False,
-            dtype=[np.float64, np.float32],
-        )
-        is_csr = _is_csr(X)
+        if not use_raw_input:
+            X, y = _check_X_y(
+                X,
+                y,
+                accept_sparse=sparsity_enabled,
+                force_all_finite=True,
+                accept_2d_y=False,
+                dtype=[np.float64, np.float32],
+            )
+            if _type_of_target(y) != "binary":
+                raise ValueError("Only binary classification is supported")
+
+            self.classes_, y = np.unique(y, return_inverse=True)
+            y = y.astype(dtype=np.int32)
+        else:
+            self.classes_ = get_unique_values_with_dpep(y)
+            n_classes = len(self.classes_)
+            if n_classes != 2:
+                raise ValueError("Only binary classification is supported")

         self.n_features_in_ = _num_features(X, fallback_1d=True)

-        if _type_of_target(y) != "binary":
-            raise ValueError("Only binary classification is supported")
-
-        self.classes_, y = np.unique(y, return_inverse=True)
-        y = y.astype(dtype=np.int32)
-
+        is_csr = _is_csr(X)
         policy = self._get_policy(queue, X, y)
         X, y = _convert_to_supported(policy, X, y)
         params = self._get_onedal_params(is_csr, get_dtype(X))
-        X_table, y_table = to_table(X, y)
+        sua_iface = _get_sycl_namespace(X, y)[0]
+        X_table, y_table = to_table(X, y, sua_iface=sua_iface)

         result = module.train(policy, params, X_table, y_table)

@@ -152,22 +164,29 @@ def _create_model(self, module, policy):

         return m

-    def _infer(self, X, module, queue):
+    def _infer(self, X, module, queue, sua_iface):
         _check_is_fitted(self)

+        use_raw_input = _get_config().get("use_raw_input") is True
+        if use_raw_input and _get_sycl_namespace(X)[0] is not None:
+            queue = X.sycl_queue
+
         sparsity_enabled = daal_check_version((2024, "P", 700))

-        X = _check_array(
-            X,
-            dtype=[np.float64, np.float32],
-            accept_sparse=sparsity_enabled,
-            force_all_finite=True,
-            ensure_2d=False,
-            accept_large_sparse=sparsity_enabled,
-        )
-        is_csr = _is_csr(X)
+        if not use_raw_input:
+            X = _check_array(
+                X,
+                dtype=[np.float64, np.float32],
+                accept_sparse=sparsity_enabled,
+                force_all_finite=True,
+                ensure_2d=False,
+                accept_large_sparse=sparsity_enabled,
+            )
+            X = make2d(X)

         _check_n_features(self, X, False)
+        is_csr = _is_csr(X)

-        X = make2d(X)
         policy = self._get_policy(queue, X)

         if hasattr(self, "_onedal_model"):
@@ -178,26 +197,44 @@ def _infer(self, X, module, queue):
         X = _convert_to_supported(policy, X)
         params = self._get_onedal_params(is_csr, get_dtype(X))

-        X_table = to_table(X)
+        X_table = to_table(X, sua_iface=sua_iface)

         result = module.infer(policy, params, model, X_table)
         return result

     def _predict(self, X, module, queue):
-        result = self._infer(X, module, queue)
-        y = from_table(result.responses)
-        y = np.take(self.classes_, y.ravel(), axis=0)
+        use_raw_input = _get_config().get("use_raw_input") is True
+        sua_iface, xp, _ = _get_sycl_namespace(X)
+        if xp is None:
+            xp = np
+        if use_raw_input and sua_iface is not None:
+            queue = X.sycl_queue
+
+        result = self._infer(X, module, queue, sua_iface)
+        y = from_table(result.responses, sua_iface=sua_iface, sycl_queue=queue, xp=xp)
+        y = xp.take(xp.asarray(self.classes_), xp.reshape(y, (-1,)), axis=0)
         return y

     def _predict_proba(self, X, module, queue):
-        result = self._infer(X, module, queue)
+        use_raw_input = _get_config().get("use_raw_input") is True
+        sua_iface, xp, _ = _get_sycl_namespace(X)
+        if xp is None:
+            xp = np
+        if use_raw_input and sua_iface is not None:
+            queue = X.sycl_queue
+
+        result = self._infer(X, module, queue, sua_iface)

-        y = from_table(result.probabilities)
+        y = from_table(result.probabilities, sua_iface=sua_iface, sycl_queue=queue, xp=xp)
         y = y.reshape(-1, 1)
-        return np.hstack([1 - y, y])
+        return xp.hstack([1 - y, y])

     def _predict_log_proba(self, X, module, queue):
+        _, xp, _ = _get_sycl_namespace(X)
+        if xp is None:
+            xp = np
         y_proba = self._predict_proba(X, module, queue)
-        return np.log(y_proba)
+        return xp.log(y_proba)


 class LogisticRegression(ClassifierMixin, BaseLogisticRegression):
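One behavioral asymmetry worth flagging in _fit above: the validated path re-encodes labels via np.unique(..., return_inverse=True), while the raw-input path only collects self.classes_ with get_unique_values_with_dpep(y) and forwards y unmodified, which implies callers in raw mode are expected to supply labels already encoded as {0, 1}. A small numpy-only illustration of the difference (the label values are hypothetical):

import numpy as np

y = np.array([3, 7, 7, 3])

# Validated path: classes_ are extracted AND y is re-encoded to 0..n_classes-1.
classes_, y_enc = np.unique(y, return_inverse=True)  # classes_=[3, 7], y_enc=[0, 1, 1, 0]

# Raw-input path: only the class values are collected (via dpnp/dpctl in the
# real code); y itself reaches oneDAL as-is, so [3, 7, 7, 3] would be wrong
# unless the caller encodes it to [0, 1, 1, 0] first.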
15 changes: 15 additions & 0 deletions onedal/utils/_dpep_helpers.py
@@ -54,3 +54,18 @@ def is_dpnp_available(version=None):

 dpctl_available = is_dpctl_available()
 dpnp_available = is_dpnp_available()
+
+
+if dpnp_available:
+    import dpnp
+if dpctl_available:
+    import dpctl.tensor as dpt
+
+
+def get_unique_values_with_dpep(X):
+    if dpnp_available:
+        return dpnp.unique(X)
+    elif dpctl_available:
+        return dpt.unique_values(X)
+    else:
+        raise RuntimeError("No DPEP package available to provide `unique` function.")
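A hedged usage sketch of the new helper. Assumption: dpctl is installed and dpnp is not, so the dpt.unique_values branch runs; note the dpnp-first ordering means that on a system with both packages installed even a dpctl tensor is routed through dpnp.unique:

import dpctl.tensor as dpt

from onedal.utils._dpep_helpers import get_unique_values_with_dpep

y = dpt.asarray([0, 1, 1, 0], dtype="int32")
classes = get_unique_values_with_dpep(y)  # dpt.unique_values(y) -> [0, 1] on the device
assert len(classes) == 2  # the binary check in _fit counts these values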