
Commit 942042f

Pushing the docs to dev/ for branch: main, commit 4b64e3fb64cf4885cc6d552fabea9cf642682963
1 parent 6636174 commit 942042f

1,304 files changed: +5731 −5731 lines


dev/.buildinfo

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: affa80f8cb8d5c1c5e087aa5055c7767
+config: 22332a40d68b565d5e7e7c04d5291efe
 tags: 645f666f9bcd5a90fca523b33c5a78b7

dev/_downloads/7012baed63b9a27f121bae611b8285c2/plot_cyclical_feature_engineering.ipynb

Lines changed: 2 additions & 2 deletions
@@ -275,7 +275,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"All is well. We are now ready to do some predictive modeling!\n\n## Gradient Boosting\n\nGradient Boosting Regression with decision trees is often flexible enough to\nefficiently handle heteorogenous tabular data with a mix of categorical and\nnumerical features as long as the number of samples is large enough.\n\nHere, we use the modern\n:class:`~sklearn.ensemble.HistGradientBoostingRegressor` with native support\nfor categorical features. Therefore, we only do minimal ordinal encoding for\nthe categorical variables and then\nlet the model know that it should treat those as categorical variables by\nusing a dedicated tree splitting rule. Since we use an ordinal encoder, we\npass the list of categorical values explicitly to use a logical order when\nencoding the categories as integers instead of the lexicographical order.\nThis also has the added benefit of preventing any issue with unknown\ncategories when using cross-validation.\n\nThe numerical variables need no preprocessing and, for the sake of simplicity,\nwe only try the default hyper-parameters for this model:\n\n"
+"All is well. We are now ready to do some predictive modeling!\n\n## Gradient Boosting\n\nGradient Boosting Regression with decision trees is often flexible enough to\nefficiently handle heterogenous tabular data with a mix of categorical and\nnumerical features as long as the number of samples is large enough.\n\nHere, we use the modern\n:class:`~sklearn.ensemble.HistGradientBoostingRegressor` with native support\nfor categorical features. Therefore, we only do minimal ordinal encoding for\nthe categorical variables and then\nlet the model know that it should treat those as categorical variables by\nusing a dedicated tree splitting rule. Since we use an ordinal encoder, we\npass the list of categorical values explicitly to use a logical order when\nencoding the categories as integers instead of the lexicographical order.\nThis also has the added benefit of preventing any issue with unknown\ncategories when using cross-validation.\n\nThe numerical variables need no preprocessing and, for the sake of simplicity,\nwe only try the default hyper-parameters for this model:\n\n"
 ]
 },
 {
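
The markdown cell above describes the encoding strategy only in prose. The following is a minimal runnable sketch of that strategy, with placeholder column names, category lists, and toy data rather than the example's bike-sharing frame: ordinal-encode the categorical columns with explicit, logically ordered category lists, then point HistGradientBoostingRegressor's categorical_features at the encoded columns so they get the dedicated splitting rule.

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder

# Toy stand-in data; the real example uses the bike-sharing demand dataset.
X = pd.DataFrame(
    {
        "weather": ["clear", "rain", "misty", "clear"],
        "season": ["spring", "summer", "fall", "winter"],
        "temperature": [12.0, 25.5, 9.8, 3.1],
    }
)
y = [120, 310, 95, 40]

# Explicit category lists give a logical (rather than lexicographic) integer
# encoding and ensure no category is unknown in any cross-validation split.
ordinal_encoder = OrdinalEncoder(
    categories=[
        ["clear", "misty", "rain"],
        ["spring", "summer", "fall", "winter"],
    ]
)

model = make_pipeline(
    ColumnTransformer(
        [("categorical", ordinal_encoder, ["weather", "season"])],
        remainder="passthrough",  # numerical features need no preprocessing
    ),
    HistGradientBoostingRegressor(
        # The encoded columns come first in the transformed array, so indices
        # 0 and 1 receive the dedicated categorical tree-splitting rule.
        categorical_features=[0, 1],
    ),
)
model.fit(X, y)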
@@ -682,7 +682,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"First, note that trees can naturally model non-linear feature interactions\nsince, by default, decision trees are allowed to grow beyond a depth of 2\nlevels.\n\nHere, we can observe that the combinations of spline features and non-linear\nkernels works quite well and can almost rival the accuracy of the gradient\nboosting regression trees.\n\nOn the contrary, one-hot encoded time features do not perform that well with\nthe low rank kernel model. In particular, they significantly over-estimate\nthe low demand hours more than the competing models.\n\nWe also observe that none of the models can successfully predict some of the\npeak rentals at the rush hours during the working days. It is possible that\naccess to additional features would be required to further improve the\naccuracy of the predictions. For instance, it could be useful to have access\nto the geographical repartition of the fleet at any point in time or the\nfraction of bikes that are immobilized because they need servicing.\n\nLet us finally get a more quantative look at the prediction errors of those\nthree models using the true vs predicted demand scatter plots:\n\n"
+"First, note that trees can naturally model non-linear feature interactions\nsince, by default, decision trees are allowed to grow beyond a depth of 2\nlevels.\n\nHere, we can observe that the combinations of spline features and non-linear\nkernels works quite well and can almost rival the accuracy of the gradient\nboosting regression trees.\n\nOn the contrary, one-hot encoded time features do not perform that well with\nthe low rank kernel model. In particular, they significantly over-estimate\nthe low demand hours more than the competing models.\n\nWe also observe that none of the models can successfully predict some of the\npeak rentals at the rush hours during the working days. It is possible that\naccess to additional features would be required to further improve the\naccuracy of the predictions. For instance, it could be useful to have access\nto the geographical repartition of the fleet at any point in time or the\nfraction of bikes that are immobilized because they need servicing.\n\nLet us finally get a more quantitative look at the prediction errors of those\nthree models using the true vs predicted demand scatter plots:\n\n"
 ]
 },
 {
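
This cell refers to models that combine periodic spline features with a non-linear kernel. The sketch below is one plausible implementation consistent with the periodic_spline_transformer(period, n_splines=None, degree=3) signature visible in the .py diff further down: evenly spaced knots over one period, periodic extrapolation, and a Nystroem polynomial-kernel approximation stacked on top. The knot placement and kernel parameters are assumptions for illustration, not a verbatim copy of the example.

import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer

def periodic_spline_transformer(period, n_splines=None, degree=3):
    # Encode a cyclical feature (e.g. hour of day) with periodic B-splines;
    # with periodic extrapolation, n_splines basis functions need
    # n_splines + 1 knots spread evenly over one full period.
    if n_splines is None:
        n_splines = period
    n_knots = n_splines + 1
    return SplineTransformer(
        degree=degree,
        n_knots=n_knots,
        knots=np.linspace(0, period, n_knots).reshape(n_knots, 1),
        extrapolation="periodic",
        include_bias=True,
    )

# One smooth periodic basis function per output column:
hours = np.arange(24).reshape(-1, 1)
print(periodic_spline_transformer(24, n_splines=12).fit_transform(hours).shape)  # (24, 12)

# Spline features followed by a low-rank polynomial kernel approximation,
# letting a linear model capture interactions between the spline features.
spline_kernel_model = make_pipeline(
    periodic_spline_transformer(24, n_splines=12),
    Nystroem(kernel="poly", degree=2, coef0=1, n_components=20),
    RidgeCV(alphas=np.logspace(-6, 6, 25)),
)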

dev/_downloads/9fcbbc59ab27a20d07e209a711ac4f50/plot_cyclical_feature_engineering.py

Lines changed: 2 additions & 2 deletions
@@ -167,7 +167,7 @@
 # -----------------
 #
 # Gradient Boosting Regression with decision trees is often flexible enough to
-# efficiently handle heteorogenous tabular data with a mix of categorical and
+# efficiently handle heterogenous tabular data with a mix of categorical and
 # numerical features as long as the number of samples is large enough.
 #
 # Here, we use the modern
@@ -795,7 +795,7 @@ def periodic_spline_transformer(period, n_splines=None, degree=3):
 # to the geographical repartition of the fleet at any point in time or the
 # fraction of bikes that are immobilized because they need servicing.
 #
-# Let us finally get a more quantative look at the prediction errors of those
+# Let us finally get a more quantitative look at the prediction errors of those
 # three models using the true vs predicted demand scatter plots:
 from sklearn.metrics import PredictionErrorDisplay
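
The import in that hunk hints at how the "true vs predicted" scatter plots are drawn. Below is a minimal, self-contained sketch of that diagnostic (requires scikit-learn >= 1.2), using synthetic stand-in data and a single model rather than the three bike-demand models compared in the example.

import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import PredictionErrorDisplay
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the example itself plots the bike-demand models.
X, y = make_regression(n_samples=1000, n_features=5, noise=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = HistGradientBoostingRegressor().fit(X_train, y_train)

fig, ax = plt.subplots(figsize=(5, 5))
PredictionErrorDisplay.from_predictions(
    y_true=y_test,
    y_pred=model.predict(X_test),
    kind="actual_vs_predicted",  # the true vs predicted scatter plot
    scatter_kwargs={"alpha": 0.3},
    ax=ax,
)
ax.set_title("True vs predicted")
plt.show()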

dev/_downloads/scikit-learn-docs.zip

-15 Bytes
Binary file not shown.
