Skip to content

Commit 3030121

Browse files
Implement a better confidence interval computation
This commit introduces a new way of computing the confidence interval for a non-time-dependent GEV distribution. The new functions follow the book by Coles (2001). The function that was previously used to compute the confidence interval is removed, as it is not needed in any of the sample scripts. The function to print the GEV parameters is moved from tools_errors to advanced_GEV_analysis. The former file is removed because it became empty. The new way to calculate confidence intervals is mathematically more sound and generally leads to smaller confidence intervals, thus it improves the results. The old way of computing confidence intervals (now removed) gave a more conservative estimate. Note that this change does not impact time-dependent GEV distributions as discussed in the paper. This modification has been proposed by Marvin Lorenz (IOW), who also implemented a first version of the new methods. A new sample script that uses annual maxima instead of monthly maxima is added with this commit. Since no conversion between return periods in years and months is necessary for annual maxima, this code is slightly cleaner and makes the principles of this new method clearer.
1 parent 214973b commit 3030121

File tree

6 files changed

+187
-71
lines changed

6 files changed

+187
-71
lines changed

README.md

Lines changed: 6 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -46,9 +46,10 @@ the script [advanced_GEV_analysis.py](advanced_GEV_analysis.py) may be
4646
useful. Its main part is an implementation of the methods described in
4747
the book “An Introduction to Statistical Modeling of Extreme Values” by
4848
Stuart Coles (2001). Example usage of this library is shown for
49-
[time-independent GEV models](Time-independent_GEV_fit.py) and for
50-
[time-dependent GEV models](Time-dependent_GEV_fit.py). With the surge
51-
levels for Brest from the GESLA-2 dataset, the time-independent GEV
52-
model looks like this:
49+
[time-independent GEV models](Time-independent_GEV_fit.py), for
50+
[time-dependent GEV models](Time-dependent_GEV_fit.py), and for
51+
[GEV models of annual maxima](Time-independent_GEV_fit_for_annual_maxima.py).
52+
With the surge levels for Brest from the GESLA-2 dataset, the
53+
time-independent GEV model of monthly maxima looks like this:
5354

54-
![Figure of a time-independent GEV fit to extreme surge levels in Brest](results/GEV_fit_Brest.png)
55+
![Figure of a time-independent GEV fit to extreme surge levels (monthly maxima) in Brest](results/GEV_fit_Brest.png)

Time-independent_GEV_fit.py

Lines changed: 18 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -1,16 +1,15 @@
11
"""Make a time-independent GEV fit to monthly maxima (MM).
22
3-
Written by Markus Reinert, June 2020–July 2021.
3+
Written by Markus Reinert, June 2020–March 2022.
44
"""
55

66
import numpy as np
77
from scipy import optimize
88
from matplotlib import pyplot as plt
99
from matplotlib.ticker import ScalarFormatter
1010

11-
from advanced_GEV_analysis import negative_log_likelihood, GEV
12-
from advanced_GEV_analysis import get_month_selection, get_year_selection
13-
from tools_errors import format_GEV_parameters, get_error_bounds
11+
from advanced_GEV_analysis import negative_log_likelihood, get_year_selection, get_month_selection
12+
from advanced_GEV_analysis import GEV_return_level, GEV_standard_error, format_GEV_parameters
1413
from tools_surge import load_data, Timeseries
1514

1615

@@ -36,30 +35,23 @@
3635
h_MM = np.array(h_MM)
3736
n_months = len(h_MM)
3837

39-
# Define a function to compute the return period from a CDF
40-
return_period = lambda P: 1 / (1 - P**(n_months / n_years))
41-
4238
# Fit a GEV to the extreme values
4339
result = optimize.minimize(negative_log_likelihood, [10, 15, -0.1], args=(h_MM,))
4440
if not result.success:
4541
print("Warning:", result.message)
4642
params = result.x
47-
errors = np.sqrt(np.diag(result.hess_inv))
48-
49-
# Calculate the CDF and the return periods of the fit
50-
x_axis = np.linspace(min(h_MM), max(h_MM), 1000)
51-
P_MM = GEV(x_axis, *params)
52-
T_MM = return_period(P_MM)
43+
covars = result.hess_inv
44+
errors = np.sqrt(np.diag(covars))
5345

54-
# Calculate the 95 % confidence interval of the fit
55-
n_sigma = 1.96
56-
P_MM_bounds = get_error_bounds(GEV, x_axis, params, errors, n_sigma)
57-
T_MM_bounds = [return_period(P_bound) for P_bound in P_MM_bounds]
46+
# Calculate the graph and the standard error of the fitted GEV model
47+
t_axis = np.logspace(0, 3, 10_000)[1:] # exclude t = 1 to avoid value -infinity
48+
h_model = GEV_return_level(t_axis, *params, values_per_year=n_months/n_years)
49+
h_std_error = GEV_standard_error(t_axis, *params, covars, values_per_year=n_months/n_years)
5850

5951
# Calculate the CDF and the return periods of the empirical data
6052
h_empirical = sorted(h_MM)
6153
P_empirical = (1 + np.arange(n_months)) / (1 + n_months)
62-
T_empirical = return_period(P_empirical)
54+
T_empirical = 1 / (1 - P_empirical**(n_months / n_years))
6355

6456

6557
fig, ax = plt.subplots()
@@ -77,9 +69,14 @@
7769
T_empirical, h_empirical, "k.",
7870
label="Empirical return periods ({} data points)".format(n_months),
7971
)
80-
ax.semilogx(T_MM, x_axis, label="GEV fit: " + format_GEV_parameters(params, errors))
81-
assert n_sigma == 1.96, "label 95 % CI is not correct"
82-
ax.fill_betweenx(x_axis, *T_MM_bounds, alpha=0.3, label="95 % confidence interval")
72+
ax.semilogx(t_axis, h_model, label="GEV fit: " + format_GEV_parameters(params, errors))
73+
ax.fill_between(
74+
t_axis,
75+
h_model + 1.96 * h_std_error,
76+
h_model - 1.96 * h_std_error,
77+
alpha=0.3,
78+
label="95 % confidence interval",
79+
)
8380

8481
ax.legend()
8582
ax.set_xlim(0.9, 200)
Lines changed: 79 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,79 @@
1+
"""Make a time-independent GEV fit to annual maxima (AM).
2+
3+
Written by Markus Reinert, June 2020–March 2022.
4+
"""
5+
6+
import numpy as np
7+
from scipy import optimize
8+
from matplotlib import pyplot as plt
9+
from matplotlib.ticker import ScalarFormatter
10+
11+
from advanced_GEV_analysis import negative_log_likelihood, get_year_selection
12+
from advanced_GEV_analysis import GEV_return_level, GEV_standard_error, format_GEV_parameters
13+
from tools_surge import load_data, Timeseries
14+
15+
16+
data = load_data("Brest", Timeseries.SKEW_SURGE_GESLA)
17+
18+
# Get the maximum value in every calendar year
19+
h_AM = []
20+
for year in range(data["year_start"], data["year_end"] + 1):
21+
sel = get_year_selection(year, data["t"])
22+
if any(sel):
23+
h_AM.append(max(data["h"][sel]))
24+
h_AM = np.array(h_AM)
25+
n_years = len(h_AM)
26+
27+
# Fit a GEV to the extreme values
28+
result = optimize.minimize(negative_log_likelihood, [40, 15, -0.1], args=(h_AM,))
29+
if not result.success:
30+
print("Warning:", result.message)
31+
params = result.x
32+
covars = result.hess_inv
33+
errors = np.sqrt(np.diag(covars))
34+
35+
# Calculate the graph and the standard error of the fitted GEV model
36+
t_axis = np.logspace(0, 3, 100_000)[1:] # exclude t = 1 to avoid value -infinity
37+
h_model = GEV_return_level(t_axis, *params)
38+
h_std_error = GEV_standard_error(t_axis, *params, covars)
39+
40+
# Calculate the CDF and the return periods of the empirical data
41+
h_empirical = sorted(h_AM, reverse=True)
42+
P_empirical = (1 + np.arange(n_years)) / (1 + n_years)
43+
T_empirical = 1 / P_empirical
44+
45+
46+
fig, ax = plt.subplots()
47+
48+
ax.set_title(
49+
"GEV fit to annual surge maxima in {} from {} to {}".format(
50+
data["city"], data["year_start"], data["year_end"]
51+
),
52+
weight="bold",
53+
)
54+
ax.set_xlabel("Return period in years")
55+
ax.set_ylabel("Return level in cm")
56+
57+
ax.semilogx(
58+
T_empirical, h_empirical, "k.",
59+
label="Empirical return periods ({} data points)".format(n_years),
60+
)
61+
ax.semilogx(t_axis, h_model, label="GEV fit: " + format_GEV_parameters(params, errors))
62+
ax.fill_between(
63+
t_axis,
64+
h_model + 1.96 * h_std_error,
65+
h_model - 1.96 * h_std_error,
66+
alpha=0.3,
67+
label="95 % confidence interval",
68+
)
69+
70+
ax.legend()
71+
ax.set_xlim(0.9, 200)
72+
# Limit the y-extent to the observed range +/- 5%
73+
delta_h = (h_AM.max() - h_AM.min()) * 0.05
74+
ax.set_ylim(h_AM.min() - delta_h, h_AM.max() + delta_h)
75+
ax.grid(linestyle=":")
76+
ax.xaxis.set_major_formatter(ScalarFormatter())
77+
78+
plt.savefig("results/GEV_fit_{}_annual.png".format(data["city"]))
79+
plt.show()

advanced_GEV_analysis.py

Lines changed: 84 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -93,6 +93,7 @@
9393
oscillation with a phase.
9494
9595
Written by Markus Reinert, August 2020, February 2021.
96+
Extended by Marvin Lorenz and Markus Reinert, March 2022.
9697
"""
9798

9899
import calendar
@@ -295,6 +296,89 @@ def GEV(x, mu, sigma, xi):
295296
return y
296297

297298

299+
def GEV_return_level(t, mu, sigma, xi, values_per_year=1.0):
300+
"""Compute return level z for return period t of a GEV distribution.
301+
302+
A random variable with the distribution GEV(mu, sigma, xi) will
303+
typically reach a value as high as z once every t years, where z is
304+
the value returned by this function.
305+
306+
That means, for any t >= 1, the following call returns t:
307+
return_period(GEV(GEV_return_level(t, mu, sigma, xi), mu, sigma, xi))
308+
where return_period(p) = 1 / (1 - p).
309+
310+
If there is more than one value per year, i.e., not annual maxima
311+
but for example monthly maxima are used, then the optional argument
312+
values_per_year specifies how many values there are on average per
313+
year (12 in the case of monthly maxima, or less if individual months
314+
are missing). In this case, the return_period (see above) is
315+
defined as return_period(p) = 1 / (1 - p**values_per_year).
316+
317+
Reference: Equations (3.4) and (3.10) of Coles (2001)
318+
"""
319+
p = 1 / t
320+
if values_per_year != 1.0:
321+
# If there is more than one value per year, we need to correct p
322+
p = 1 - (1 - p)**(1/values_per_year)
323+
if abs(xi) < XI_THRESHOLD:
324+
# Gumbel distribution
325+
return mu - sigma * np.log(-np.log(1 - p))
326+
else:
327+
# non-Gumbel GEV distribution
328+
return mu - sigma / xi * (1 - (-np.log(1 - p)) ** (-xi))
329+
330+
331+
def GEV_standard_error(t, mu, sigma, xi, V, values_per_year=1.0):
332+
"""Compute the standard error of the return level for return period t.
333+
334+
When the distribution of a random variable is estimated to be
335+
GEV(mu, sigma, xi) with the variance-covariance matrix V, then the
336+
uncertainty associated with the return level z that belongs to the
337+
return period t (see function GEV_return_level) is the value
338+
returned by this function. The upper (resp. lower) bound of the 95%
339+
confidence interval can be approximated by adding (subtracting) 1.96
340+
times the value returned by this function to z.
341+
342+
The main computation in this function is grad_z_p * V * grad_z_p,
343+
where * is a matrix multiplication if grad_z_p is 1-dimensional.
344+
This is basically a scalar product between grad_z_p and V * grad_z_p
345+
and can be written as <grad_z_p|V|grad_z_p> in the bra-ket notation
346+
occasionally used in physics. The square root of this product is
347+
returned by this function, in order to have the standard error and
348+
not the variance.
349+
350+
This function can be used with t as a vector instead of a single
351+
value, in which case a vector of the same size as t is returned.
352+
Note that in this case, grad_z_p is a matrix instead of a vector, so
353+
grad_z_p * V * grad_z_p differs from regular matrix multiplication.
354+
355+
Reference: Equation (3.11) of Coles (2001)
356+
"""
357+
p = 1 / t
358+
if values_per_year != 1.0:
359+
# If there is more than one value per year, we need to correct p
360+
p = 1 - (1 - p)**(1/values_per_year)
361+
y_p = -np.log(1 - p)
362+
grad_z_p = np.array([
363+
np.ones_like(y_p),
364+
-xi**(-1) * (1 - y_p**(-xi)),
365+
sigma * xi**(-2) * (1 - y_p**(-xi)) - sigma * xi**(-1) * y_p**(-xi) * np.log(y_p),
366+
])
367+
return np.sqrt(np.sum(grad_z_p * np.dot(V, grad_z_p), axis=0))
368+
369+
370+
def format_GEV_parameters(parameters, errors, join_str=", "):
371+
"""Create a string that contains the GEV parameters in a nice format."""
372+
names = ["\\mu", "\\sigma", "\\xi"]
373+
# Write the error with one significant digit
374+
digits = [int(-np.floor(np.log10(e))) if e < 1 else 0 for e in errors]
375+
return join_str.join(
376+
"${name} = {param:.{digits}f} \\pm {std:.{digits}f}$".format(
377+
name=n, param=p, std=e, digits=d
378+
) for n, p, e, d in zip(names, parameters, errors, digits)
379+
)
380+
381+
298382
def get_year_selection(year, time_array):
299383
"""Get the Boolean array that selects ‘year’ from ‘time_array’.
300384

results/GEV_fit_Brest.png

-1.54 KB
Loading

tools_errors.py

Lines changed: 0 additions & 45 deletions
This file was deleted.

0 commit comments

Comments
 (0)