
Commit b9ee46f

Merge pull request #7 from matthewgillett/bias-variance
Bias-variance decomposition feature
2 parents 44a6876 + bb74639 commit b9ee46f

37 files changed: +3477 / -272 lines

.pre-commit-config.yaml

Lines changed: 5 additions & 5 deletions

@@ -9,20 +9,20 @@

 repos:

-  - repo: https://github.com/ambv/black
-    rev: 22.3.0
+  - repo: https://github.com/psf/black
+    rev: 23.11.0
     hooks:
       - id: black
         language_version: python3.8
         args: [--line-length=88, tests, mvtk]

-  - repo: https://gitlab.com/pycqa/flake8
-    rev: 4.0.1
+  - repo: https://github.com/pycqa/flake8
+    rev: 6.1.0
     hooks:
       - id: flake8
         args: [--max-line-length=88, '--per-file-ignores=__init__.py:F401,F403', tests, mvtk]
   - repo: https://github.com/pre-commit/mirrors-mypy
-    rev: v0.942
+    rev: v1.7.1
     hooks:
       - id: mypy
         files: mvtk/

README.md

Lines changed: 1 addition & 0 deletions

@@ -32,6 +32,7 @@ You can import:
 - `mvtk.sobol` for Sobol sensitivity analysis
 - `mvtk.supervisor` for divergence analysis
 - `mvtk.metrics` for specialised metrics
+- `mvtk.bias_variance` for bias-variance decomposition

 # Documentation
 You can run `make -C docs html` on a Mac or `make.bat -C docs html` on a PC to just rebuild the docs. In this case, point your browser to ```docs/_build/html/index.html``` to view the homepage. If your browser was already pointing to documentation that you changed, you can refresh the page to see the changes.
docs/bias_variance.bias_variance.rst

Lines changed: 8 additions & 0 deletions

@@ -0,0 +1,8 @@
bias_variance
=============

.. automodule:: mvtk.bias_variance.bias_variance
   :members:
   :imported-members:
   :undoc-members:
   :show-inheritance:
docs/bias_variance.bias_variance_parallel.rst

Lines changed: 8 additions & 0 deletions

@@ -0,0 +1,8 @@
bias_variance_parallel
======================

.. automodule:: mvtk.bias_variance.bias_variance_parallel
   :members:
   :imported-members:
   :undoc-members:
   :show-inheritance:

docs/bias_variance.estimators.rst

Lines changed: 8 additions & 0 deletions

@@ -0,0 +1,8 @@
estimators
==========

.. automodule:: mvtk.bias_variance.estimators
   :members:
   :imported-members:
   :undoc-members:
   :special-members: __init__, __call__

docs/bias_variance.rst

Lines changed: 11 additions & 0 deletions

@@ -0,0 +1,11 @@
bias_variance
=============

Subpackages
-----------

.. toctree::

   bias_variance.estimators
   bias_variance.bias_variance
   bias_variance.bias_variance_parallel

docs/bias_variance_user_guide.rst

Lines changed: 225 additions & 0 deletions

@@ -0,0 +1,225 @@
########################
Bias-Variance User Guide
########################

**********
Motivation
**********

Statistical Bias vs. "Fairness"
===============================

For this user guide and associated submodule, we are referring to
`statistical bias <https://en.wikipedia.org/wiki/Bias_(statistics)>`_ rather
than the "fairness" type of bias.

Why should we care about bias and variance?
===========================================

Bias and variance are two indicators of model performance, and together they
account for two of the three components of model error (the third is the
irreducible "noise" error that comes from the data set itself). We can define
bias and variance as follows by training a model with multiple `bootstrap
sampled <https://en.wikipedia.org/wiki/Bootstrapping_(statistics)>`_ training
sets, resulting in multiple instances of the model.

.. topic:: Bias and variance defined over multiple training sets:

   * Bias represents the average difference between the prediction a model makes and the correct prediction.
   * Variance represents the average variability of the predictions a model makes.
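
To make these definitions concrete, here is a minimal NumPy sketch (purely
illustrative, not part of ``mvtk``) that computes the bias and variance of a
squared-error model for a single test example, given predictions from five
bootstrap-trained instances of the model:

.. code-block:: python

   import numpy as np

   # Hypothetical predictions for one test example, one from each of five
   # models trained on different bootstrap samples of the training set.
   predictions = np.array([2.9, 3.1, 3.0, 3.3, 2.7])
   y_true = 2.0  # the correct prediction for this example

   main_prediction = predictions.mean()    # mean prediction of the ensemble
   bias = (y_true - main_prediction) ** 2  # squared-loss bias: 1.0
   variance = ((predictions - main_prediction) ** 2).mean()  # spread: 0.04

Here the models are consistently off by about 1 (high bias) while agreeing
closely with one another (low variance).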

Typically, a model with high bias is "underfit" and a model with high variance
is "overfit," but keep in mind that this is not always the case: there can be
many reasons why a model has high bias or high variance. An "underfit" model
is oversimplified and performs poorly even on the training data, whereas an
"overfit" model sticks too closely to the training data and performs poorly on
unseen examples. See Scikit-Learn's `Underfitting vs. Overfitting
<https://scikit-learn.org/stable/auto_examples/model_selection/plot_underfitting_overfitting.html>`_
for a clear example of an "underfit" model vs. an "overfit" model.

The `"bias-variance tradeoff"
<https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff>`_ describes the
relationship between high bias and high variance in a model. Our ultimate goal
is to find the ideal balance where bias and variance are both at a minimum. It
is also important, from a business standpoint, to decide whether the model
error we are unable to reduce should favor bias or variance.

*****************************************
Visualize Bias and Variance With Examples
*****************************************

To make the concepts of bias and variance easier to understand, we will show
one example model for each of the four combinations of high and low bias and
variance. These are extreme, engineered cases built so that the bias and
variance are clearly visible.

Before we begin, let's look at the distribution of the labels. Notice that the
majority of label values are around 1 and 2, with far fewer around 5.

.. figure:: images/bias_variance_label_distribution.png
   :align: center
   :alt: alternate text
   :figclass: align-center

First we have a model with high bias and low variance. We artificially
introduce bias by adding 10 to every training label while leaving the test
labels as is. Given that values greater than 5 in the entire label set are
considered outliers, we are fitting the model against outliers.

.. figure:: images/high_bias_low_variance.png
   :align: center
   :alt: alternate text
   :figclass: align-center

Five sets of mean squared error results on the test set, one from each of the
five bootstrap-sample trainings of the model. Notice the model error is very
consistent across the trials and is not centered around 0.

Next we have a model with low bias and high variance. We simulate this by
introducing 8 random "noise" features to the data set. We also reduce the size
of the training set and train a neural network over a low number of epochs.

.. figure:: images/low_bias_high_variance.png
   :align: center
   :alt: alternate text
   :figclass: align-center

Five sets of mean squared error results on the test set, one from each of the
five bootstrap-sample trainings of the model. Notice the model error is
distributed differently across the trials and centers mainly around 0.

Next we have a model with high bias and high variance, simulated by combining
the techniques from the two previous examples and training another neural
network.

.. figure:: images/high_bias_high_variance.png
   :align: center
   :alt: alternate text
   :figclass: align-center

Five sets of mean squared error results on the test set, one from each of the
five bootstrap-sample trainings of the model. Notice the model error is
distributed differently across the trials and is not centered around 0.

Finally we have a model with low bias and low variance. This is a simple
linear regression model with no modifications to the training or test labels.

.. figure:: images/low_bias_low_variance.png
   :align: center
   :alt: alternate text
   :figclass: align-center

Five sets of mean squared error results on the test set, one from each of the
five bootstrap-sample trainings of the model. Notice the model error is very
consistent across the trials and centers mainly around 0.

***************************
Bias-Variance Decomposition
***************************

.. currentmodule:: mvtk.bias_variance

There are formulas for breaking down total model error into three parts: bias,
variance, and noise. They can be applied to both regression loss functions
(mean squared error) and classification loss functions (0-1 loss). In a paper
by Pedro Domingos, a unified decomposition was proposed for both types of
problems :cite:`domingos2000decomp`.

First, let's define :math:`y` as a single prediction, :math:`D` as the set of
training sets used to train the models, :math:`Y` as the set of predictions
from the models trained on :math:`D`, and :math:`L` as a loss function that
calculates the error between a prediction :math:`y` and the correct prediction.
The main prediction :math:`y_m` is the prediction whose average loss relative
to the set of predictions :math:`Y` is smallest: the mean of :math:`Y` for mean
squared error and the mode of :math:`Y` for 0-1 loss :cite:`domingos2000decomp`.

Bias can now be defined for a single example :math:`x` over the set of models
trained on :math:`D` as the loss calculated between the main prediction
:math:`y_m` and the correct prediction :math:`y_*` :cite:`domingos2000decomp`.

.. math::
   B(x) = L(y_*, y_m)

Variance can now be defined for a single example :math:`x` over the set of
models trained on :math:`D` as the average loss calculated between all
predictions and the main prediction :math:`y_m` :cite:`domingos2000decomp`.

.. math::
   V(x) = E_D[L(y_m, y)]

We will need to take the average of the bias over all examples as
:math:`E_x[B(x)]` and the average of the variance over all examples as
:math:`E_x[V(x)]` :cite:`domingos2000decomp`.

With :math:`N(x)` representing the irreducible error from observation noise,
we can decompose the average expected loss as :cite:`domingos2000decomp`

.. math::
   E_x[N(x)] + E_x[B(x)] + E_x[cV(x)]

In other words, the average loss over all examples is equal to the noise plus
the average bias plus the net variance (the :math:`c` factor included with the
variance when calculating average variance gives us the net variance).
.. note::
   We are generalizing the actual value of :math:`N(x)`, as the Model
   Validation Toolkit's implementation of bias-variance decomposition does not
   include noise in the average expected loss. This noise represents error in
   the data itself rather than error related to the model. If you would like
   to dive deeper into the noise representation, please consult the `Pedro
   Domingos paper <https://homes.cs.washington.edu/~pedrod/papers/mlc00a.pdf>`_.

For mean squared loss functions, :math:`c = 1`, meaning that average variance
is equal to net variance.

For zero-one loss functions, :math:`c = 1` when :math:`y_m = y_*`, otherwise
:math:`c = -P_D(y = y_* \mid y \neq y_m)` :cite:`domingos2000decomp`. In other
words, :math:`c` is 1 when the main prediction is the correct prediction. When
the main prediction is not the correct prediction, :math:`c` is the negative
of the probability that a single prediction is correct given that it differs
from the main prediction, so on these examples variance actually reduces the
expected loss.
184+
185+
Usage
186+
=====
187+
188+
:meth:`bias_variance_compute` will return the average loss, average bias, average
189+
variance, and net variance for an estimator trained and tested over a specified number
190+
of training sets. This was inspired and modeled after Sebastian Raschka's
191+
`bias_variance_decomp
192+
<https://github.com/rasbt/mlxtend/blob/master/mlxtend/evaluate/bias_variance_decomp.py>`_
193+
function :cite:`mlxtenddecomp`.
194+
We use the `bootstrapping <https://en.wikipedia.org/wiki/Bootstrapping_(statistics)>`_
195+
method to produce our sets of training data from the original training set. By default
196+
it will use mean squared error as the loss function, but it will accept the following
197+
functions for calculating loss.
198+
199+
* :meth:`bias_variance_mse` for mean squared error
200+
* :meth:`bias_variance_0_1_loss` for 0-1 loss

Since :meth:`bias_variance_compute` trains an estimator over multiple
iterations, it also expects the estimator to be wrapped in a class that
extends :class:`estimators.EstimatorWrapper`, which provides uniform fit and
predict methods, since not all estimator implementations conform to the same
interface. The following estimator wrappers are provided; a usage sketch
follows the list.

* :class:`estimators.PyTorchEstimatorWrapper` for `PyTorch <https://pytorch.org/>`_
* :class:`estimators.SciKitLearnEstimatorWrapper` for `Scikit-Learn <https://scikit-learn.org/stable/>`_
* :class:`estimators.TensorFlowEstimatorWrapper` for `TensorFlow <https://www.tensorflow.org/>`_
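
Below is a sketch of how these pieces fit together with a scikit-learn
estimator. The keyword argument names are illustrative assumptions, not the
exact signature; consult the API reference for the authoritative one.

.. code-block:: python

   import numpy as np
   from sklearn.linear_model import LinearRegression
   from sklearn.model_selection import train_test_split

   from mvtk.bias_variance import bias_variance_compute, bias_variance_mse
   from mvtk.bias_variance.estimators import SciKitLearnEstimatorWrapper

   # Synthetic regression data, purely for illustration.
   rng = np.random.default_rng(0)
   X = rng.normal(size=(500, 3))
   y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=500)
   X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

   # Wrap the estimator so bias_variance_compute can fit and predict
   # through a uniform interface.
   model = SciKitLearnEstimatorWrapper(LinearRegression())

   # Hypothetical call: argument names are assumptions, not the exact API.
   avg_loss, avg_bias, avg_var, net_var = bias_variance_compute(
       model, X_train, y_train, X_test, y_test,
       iterations=5, loss_func=bias_variance_mse,
   )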

:meth:`bias_variance_compute` works well for smaller data sets and less
complex models, but what happens when you have a very large data set, a very
complex model, or both? :meth:`bias_variance_compute_parallel` performs the
same calculation but leverages `Ray <https://www.ray.io/>`_ to parallelize the
bootstrapping, training, and prediction, allowing faster calculation over a
distributed architecture.
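
Assuming the parallel variant mirrors the serial signature (an assumption
worth checking against the API reference), switching over is a small change:

.. code-block:: python

   from mvtk.bias_variance import bias_variance_compute_parallel

   # Requires Ray to be installed; same inputs as the sketch above.
   avg_loss, avg_bias, avg_var, net_var = bias_variance_compute_parallel(
       model, X_train, y_train, X_test, y_test,
       iterations=5, loss_func=bias_variance_mse,
   )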

.. topic:: Tutorials:

   * :doc:`Bias-Variance Visualization <notebooks/bias_variance/BiasVarianceVisualization>`
   * :doc:`Bias-Variance Regression <notebooks/bias_variance/BiasVarianceRegression>`
   * :doc:`Bias-Variance Classification <notebooks/bias_variance/BiasVarianceClassification>`

.. bibliography:: refs.bib
   :cited:

docs/conf.py

Lines changed: 3 additions & 3 deletions

@@ -23,7 +23,7 @@
 author = "Model Validation Toolkit Team"

 # The full version, including alpha/beta/rc tags
-release = "0.0.1"
+release = "0.2.0"


 # -- General configuration ---------------------------------------------------
@@ -73,8 +73,8 @@
 # A fix for Sphinx error contents.rst not found
 master_doc = "index"

-# increase the timeout for long running notebooks
-nbsphinx_timeout = 180
+# increase the timeout for long-running notebooks
+nbsphinx_timeout = 900

 # Don't show full paths
 add_module_names = False
Two binary image files added (10.7 KB and 16.5 KB), not shown.
