Add sklearn-compatible interface #94

# Sklearn-Compatible Interface

This document describes the new simplified sklearn-compatible interface for XGBoostLSS, which addresses the key issues identified in the sktime integration.

## Overview

The sklearn-compatible interface provides:

- **Simplified workflow**: Standard `fit()`/`predict()` methods instead of a complex multi-step process
- **Automatic distribution detection**: Intelligent selection based on target characteristics
- **sklearn ecosystem compatibility**: Works with pipelines, cross-validation, and model selection
- **Python 3.12 support**: Updated dependency management and optional dependencies
- **Better user experience**: Sensible defaults and an intuitive API
## Quick Start

### Basic Usage

```python
from xgboostlss import XGBoostLSSRegressor
import numpy as np

# Generate sample data
X = np.random.randn(1000, 5)
y = np.random.randn(1000)

# Simple 2-step workflow
model = XGBoostLSSRegressor()  # Auto-detects distribution
model.fit(X, y)
y_pred = model.predict(X)
```
### With Specific Distribution

```python
# Specify distribution explicitly
model = XGBoostLSSRegressor(
    distribution='gamma',  # For positive-valued targets
    n_estimators=200,
    learning_rate=0.1
)
model.fit(X, np.abs(y))  # Gamma requires positive values
```
## Automatic Distribution Detection

The interface can automatically select appropriate distributions based on target characteristics:

| Data Characteristics | Detected Distribution | Use Case |
|----------------------|-----------------------|----------|
| Values in [0, 1] | Beta | Proportions, probabilities |
| Positive values, skewed | Gamma | Waiting times, skewed positive data |
| Heavy tails (high kurtosis) | Student's t | Robust to outliers |
| General real values | Gaussian | Default fallback |
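The detection itself comes down to simple checks on the target's range and shape. The sketch below is illustrative only: the `detect_distribution` helper, its thresholds, and the returned labels are assumptions that mirror the table above, not the package's actual implementation.

```python
import numpy as np
from scipy import stats

def detect_distribution(y, skew_threshold=1.0, kurtosis_threshold=3.0):
    """Illustrative heuristic mirroring the table above (not XGBoostLSS's actual code)."""
    y = np.asarray(y, dtype=float)
    if np.all((y >= 0) & (y <= 1)):
        return 'beta'        # bounded in [0, 1]: proportions, probabilities
    if np.all(y > 0) and stats.skew(y) > skew_threshold:
        return 'gamma'       # strictly positive and right-skewed
    if stats.kurtosis(y) > kurtosis_threshold:
        return 'student_t'   # excess kurtosis suggests heavy tails
    return 'gaussian'        # default fallback for general real values
```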
### Example

```python
import numpy as np
from xgboostlss import XGBoostLSSRegressor

X = np.random.randn(1000, 5)  # Features, as in the Quick Start example

model = XGBoostLSSRegressor()

# Beta data (values in [0, 1])
y_beta = np.random.beta(2, 2, 1000)
model.fit(X, y_beta)  # Automatically detects 'beta' distribution

# Gamma data (positive, skewed)
y_gamma = np.random.gamma(2, 2, 1000)
model.fit(X, y_gamma)  # Automatically detects 'gamma' distribution
```
## sklearn Ecosystem Integration

### Pipeline Compatibility

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('regressor', XGBoostLSSRegressor(distribution='gaussian'))
])

pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_test)
```
### Cross-Validation

```python
from sklearn.model_selection import cross_val_score

model = XGBoostLSSRegressor(n_estimators=100)
scores = cross_val_score(model, X, y, cv=5, scoring='neg_mean_squared_error')
print(f"CV Score: {scores.mean():.3f} (+/- {scores.std() * 2:.3f})")
```
### Hyperparameter Tuning

```python
from sklearn.model_selection import GridSearchCV

param_grid = {
    'n_estimators': [50, 100, 200],
    'learning_rate': [0.1, 0.3, 0.5],
    'max_depth': [3, 6, 9]
}

grid_search = GridSearchCV(
    XGBoostLSSRegressor(),
    param_grid,
    cv=3,
    scoring='neg_mean_squared_error'
)

grid_search.fit(X_train, y_train)
print(f"Best params: {grid_search.best_params_}")
```
## Prediction Types

The interface supports multiple prediction types for uncertainty quantification:

### Point Predictions (Mean)

```python
# Default: returns mean of predictive distribution
y_mean = model.predict(X_test)
```
### Quantile Predictions

```python
# Get prediction intervals
y_quantiles = model.predict(X_test, return_type='quantiles')
# Returns 10th, 50th (median), 90th percentiles by default

# Custom quantiles
y_custom = model.predict(
    X_test,
    return_type='quantiles',
    quantiles=[0.05, 0.25, 0.5, 0.75, 0.95]
)
```
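As a usage sketch, the requested quantiles can be turned into a central prediction interval and checked for empirical coverage. This assumes the quantiles come back column-wise in the order requested and that a held-out `y_test` is available; both are assumptions for illustration rather than guarantees about the return format.

```python
import numpy as np

# Request an 80% central interval plus the median (column order assumed to match the request)
y_q = model.predict(X_test, return_type='quantiles', quantiles=[0.1, 0.5, 0.9])
lower, median, upper = y_q[:, 0], y_q[:, 1], y_q[:, 2]

# Fraction of held-out targets inside the interval (ideally close to 0.8)
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"Empirical 80% interval coverage: {coverage:.2f}")
```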
### Sampling from Predictive Distribution

```python
# Sample from the predictive distribution
y_samples = model.predict(X_test, return_type='samples', n_samples=100)
# Returns array of shape (n_test_samples, n_samples)
```
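Given the shape documented above, summary statistics and intervals can be computed directly from the samples. A small illustrative snippet, assuming only that shape:

```python
import numpy as np

y_samples = model.predict(X_test, return_type='samples', n_samples=100)

# Summaries across the sample axis (axis=1, per the shape above)
y_mean = y_samples.mean(axis=1)
y_std = y_samples.std(axis=1)
lower, upper = np.percentile(y_samples, [5, 95], axis=1)  # 90% interval per test point
```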
### Probabilistic Interface

```python
# sklearn-style probabilistic predictions
y_proba = model.predict_proba(X_test, n_samples=50)

# Dedicated quantile method
y_intervals = model.predict_quantiles(X_test, quantiles=[0.1, 0.9])
```
## Installation and Dependencies

### Core Installation (Lightweight)

```bash
pip install xgboostlss
```

Installs only core dependencies: `xgboost`, `scikit-learn`, `numpy`, `pandas`, `scipy`, `tqdm`
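PyTorch is not part of this core list (the review comments below flag this). The snippet below is an illustrative check, not part of the package, for confirming whether the optional PyTorch dependency is available before using distribution-related features:

```python
# Illustrative check (not part of XGBoostLSS): is the optional PyTorch dependency installed?
try:
    import torch  # noqa: F401
    print(f"PyTorch available: {torch.__version__}")
except ImportError:
    print("PyTorch not installed; install it separately before using distributional features.")
```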
**Comment on lines +173 to +174** (suggested change):

Current:

> Installs only core dependencies: `xgboost`, `scikit-learn`, `numpy`, `pandas`, `scipy`, `tqdm`

Suggested:

> Installs core dependencies: `xgboost`, `scikit-learn`, `numpy`, `pandas`, `scipy`, `tqdm`
>
> **Note:** PyTorch is required for the package to function in most use cases (e.g., for probabilistic distributions). You must install PyTorch manually (see [PyTorch installation instructions](https://pytorch.org/get-started/locally/)) or use the `[torch]` extra as shown below.
**Copilot AI** (Dec 12, 2025):

Inconsistent claim about Python 3.12 compatibility. The documentation claims Python 3.12 support (lines 201-223), but the `pyproject.toml` specifies `requires-python = ">=3.10"` (line 10), which is actually more permissive than just 3.12. More importantly, whether PyTorch 2.1.0+ actually supports Python 3.12 needs verification - the documentation should be updated to reflect tested compatibility rather than assumed compatibility based on version constraints.

**Second review comment:**

Misleading documentation. The code example shows using `XGBoostLSSRegressor()` with auto distribution detection (lines 19-30), but this will fail without PyTorch installed due to the dependency issue. The "Quick Start" should either include installation instructions with the `[torch]` extra, or the dependency configuration needs to be fixed to include PyTorch in core dependencies.