
Commit 942042f

Pushing the docs to dev/ for branch: main, commit 4b64e3fb64cf4885cc6d552fabea9cf642682963
1 parent 6636174 commit 942042f

1,304 files changed: +5731 −5731 lines


dev/.buildinfo

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: affa80f8cb8d5c1c5e087aa5055c7767
+config: 22332a40d68b565d5e7e7c04d5291efe
 tags: 645f666f9bcd5a90fca523b33c5a78b7

dev/_downloads/7012baed63b9a27f121bae611b8285c2/plot_cyclical_feature_engineering.ipynb

Lines changed: 2 additions & 2 deletions
@@ -275,7 +275,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"All is well. We are now ready to do some predictive modeling!\n\n## Gradient Boosting\n\nGradient Boosting Regression with decision trees is often flexible enough to\nefficiently handle heteorogenous tabular data with a mix of categorical and\nnumerical features as long as the number of samples is large enough.\n\nHere, we use the modern\n:class:`~sklearn.ensemble.HistGradientBoostingRegressor` with native support\nfor categorical features. Therefore, we only do minimal ordinal encoding for\nthe categorical variables and then\nlet the model know that it should treat those as categorical variables by\nusing a dedicated tree splitting rule. Since we use an ordinal encoder, we\npass the list of categorical values explicitly to use a logical order when\nencoding the categories as integers instead of the lexicographical order.\nThis also has the added benefit of preventing any issue with unknown\ncategories when using cross-validation.\n\nThe numerical variables need no preprocessing and, for the sake of simplicity,\nwe only try the default hyper-parameters for this model:\n\n"
+"All is well. We are now ready to do some predictive modeling!\n\n## Gradient Boosting\n\nGradient Boosting Regression with decision trees is often flexible enough to\nefficiently handle heterogenous tabular data with a mix of categorical and\nnumerical features as long as the number of samples is large enough.\n\nHere, we use the modern\n:class:`~sklearn.ensemble.HistGradientBoostingRegressor` with native support\nfor categorical features. Therefore, we only do minimal ordinal encoding for\nthe categorical variables and then\nlet the model know that it should treat those as categorical variables by\nusing a dedicated tree splitting rule. Since we use an ordinal encoder, we\npass the list of categorical values explicitly to use a logical order when\nencoding the categories as integers instead of the lexicographical order.\nThis also has the added benefit of preventing any issue with unknown\ncategories when using cross-validation.\n\nThe numerical variables need no preprocessing and, for the sake of simplicity,\nwe only try the default hyper-parameters for this model:\n\n"
 ]
 },
 {
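
The markdown cell above describes the encoding strategy only in prose. The following is a minimal runnable sketch of that strategy, with placeholder column names, category lists, and toy data rather than the example's bike-sharing frame: ordinal-encode the categorical columns with explicit, logically ordered category lists, then point HistGradientBoostingRegressor's categorical_features at the encoded columns so they get the dedicated splitting rule.

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder

# Toy stand-in data; the real example uses the bike-sharing demand dataset.
X = pd.DataFrame(
    {
        "weather": ["clear", "rain", "misty", "clear"],
        "season": ["spring", "summer", "fall", "winter"],
        "temperature": [12.0, 25.5, 9.8, 3.1],
    }
)
y = [120, 310, 95, 40]

# Explicit category lists give a logical (rather than lexicographic) integer
# encoding and ensure no category is unknown in any cross-validation split.
ordinal_encoder = OrdinalEncoder(
    categories=[
        ["clear", "misty", "rain"],
        ["spring", "summer", "fall", "winter"],
    ]
)

model = make_pipeline(
    ColumnTransformer(
        [("categorical", ordinal_encoder, ["weather", "season"])],
        remainder="passthrough",  # numerical features need no preprocessing
    ),
    HistGradientBoostingRegressor(
        # The encoded columns come first in the transformed array, so indices
        # 0 and 1 receive the dedicated categorical tree-splitting rule.
        categorical_features=[0, 1],
    ),
)
model.fit(X, y)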
@@ -682,7 +682,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"First, note that trees can naturally model non-linear feature interactions\nsince, by default, decision trees are allowed to grow beyond a depth of 2\nlevels.\n\nHere, we can observe that the combinations of spline features and non-linear\nkernels works quite well and can almost rival the accuracy of the gradient\nboosting regression trees.\n\nOn the contrary, one-hot encoded time features do not perform that well with\nthe low rank kernel model. In particular, they significantly over-estimate\nthe low demand hours more than the competing models.\n\nWe also observe that none of the models can successfully predict some of the\npeak rentals at the rush hours during the working days. It is possible that\naccess to additional features would be required to further improve the\naccuracy of the predictions. For instance, it could be useful to have access\nto the geographical repartition of the fleet at any point in time or the\nfraction of bikes that are immobilized because they need servicing.\n\nLet us finally get a more quantative look at the prediction errors of those\nthree models using the true vs predicted demand scatter plots:\n\n"
+"First, note that trees can naturally model non-linear feature interactions\nsince, by default, decision trees are allowed to grow beyond a depth of 2\nlevels.\n\nHere, we can observe that the combinations of spline features and non-linear\nkernels works quite well and can almost rival the accuracy of the gradient\nboosting regression trees.\n\nOn the contrary, one-hot encoded time features do not perform that well with\nthe low rank kernel model. In particular, they significantly over-estimate\nthe low demand hours more than the competing models.\n\nWe also observe that none of the models can successfully predict some of the\npeak rentals at the rush hours during the working days. It is possible that\naccess to additional features would be required to further improve the\naccuracy of the predictions. For instance, it could be useful to have access\nto the geographical repartition of the fleet at any point in time or the\nfraction of bikes that are immobilized because they need servicing.\n\nLet us finally get a more quantitative look at the prediction errors of those\nthree models using the true vs predicted demand scatter plots:\n\n"
 ]
 },
 {
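
This cell refers to models that combine periodic spline features with a non-linear kernel. The sketch below is one plausible implementation consistent with the periodic_spline_transformer(period, n_splines=None, degree=3) signature visible in the .py diff further down: evenly spaced knots over one period, periodic extrapolation, and a Nystroem polynomial-kernel approximation stacked on top. The knot placement and kernel parameters are assumptions for illustration, not a verbatim copy of the example.

import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer

def periodic_spline_transformer(period, n_splines=None, degree=3):
    # Encode a cyclical feature (e.g. hour of day) with periodic B-splines;
    # with periodic extrapolation, n_splines basis functions need
    # n_splines + 1 knots spread evenly over one full period.
    if n_splines is None:
        n_splines = period
    n_knots = n_splines + 1
    return SplineTransformer(
        degree=degree,
        n_knots=n_knots,
        knots=np.linspace(0, period, n_knots).reshape(n_knots, 1),
        extrapolation="periodic",
        include_bias=True,
    )

# One smooth periodic basis function per output column:
hours = np.arange(24).reshape(-1, 1)
print(periodic_spline_transformer(24, n_splines=12).fit_transform(hours).shape)  # (24, 12)

# Spline features followed by a low-rank polynomial kernel approximation,
# letting a linear model capture interactions between the spline features.
spline_kernel_model = make_pipeline(
    periodic_spline_transformer(24, n_splines=12),
    Nystroem(kernel="poly", degree=2, coef0=1, n_components=20),
    RidgeCV(alphas=np.logspace(-6, 6, 25)),
)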

dev/_downloads/9fcbbc59ab27a20d07e209a711ac4f50/plot_cyclical_feature_engineering.py

Lines changed: 2 additions & 2 deletions
@@ -167,7 +167,7 @@
 # -----------------
 #
 # Gradient Boosting Regression with decision trees is often flexible enough to
-# efficiently handle heteorogenous tabular data with a mix of categorical and
+# efficiently handle heterogenous tabular data with a mix of categorical and
 # numerical features as long as the number of samples is large enough.
 #
 # Here, we use the modern
@@ -795,7 +795,7 @@ def periodic_spline_transformer(period, n_splines=None, degree=3):
 # to the geographical repartition of the fleet at any point in time or the
 # fraction of bikes that are immobilized because they need servicing.
 #
-# Let us finally get a more quantative look at the prediction errors of those
+# Let us finally get a more quantitative look at the prediction errors of those
 # three models using the true vs predicted demand scatter plots:
 from sklearn.metrics import PredictionErrorDisplay
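
The import in that hunk hints at how the "true vs predicted" scatter plots are drawn. Below is a minimal, self-contained sketch of that diagnostic (requires scikit-learn >= 1.2), using synthetic stand-in data and a single model rather than the three bike-demand models compared in the example.

import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import PredictionErrorDisplay
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the example itself plots the bike-demand models.
X, y = make_regression(n_samples=1000, n_features=5, noise=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = HistGradientBoostingRegressor().fit(X_train, y_train)

fig, ax = plt.subplots(figsize=(5, 5))
PredictionErrorDisplay.from_predictions(
    y_true=y_test,
    y_pred=model.predict(X_test),
    kind="actual_vs_predicted",  # the true vs predicted scatter plot
    scatter_kwargs={"alpha": 0.3},
    ax=ax,
)
ax.set_title("True vs predicted")
plt.show()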

dev/_downloads/scikit-learn-docs.zip

-15 Bytes
Binary file not shown.
