Some NumPy ufuncs, and some related tools.
ufunclab uses meson-python as its build system. Currently no pre-built wheels are provided, so to install it, you'll have to check out the git repository and, in the top-level directory, run
pip install .
in the terminal or command prompt. To run the unit tests after installing, run
pytest --pyargs ufunclab
To build ufunclab, a C99-compatible C compiler and a C++17-compatible C++ compiler are required.
The unit tests require pytest.
The test suite is run with Python versions 3.9 to 3.12 and with several recent releases of NumPy, including NumPy 2.0.
Links to reference material related to NumPy's C API for ufuncs and gufuncs are given below.
Element-wise ufuncs
Most of the element-wise ufuncs are implemented by writing the core
calculation as a templated C++ function, and using some Python code to
automate the generation of all the necessary boilerplate and wrappers
that implement a ufunc around the core calculation. The exceptions
are debye1, logfactorial, log1p, loggamma1p, issnan, and cabssq, which are implemented in C, with all boilerplate code written "by hand" in the C file.
Function | Description |
---|---|
logfactorial | Log of the factorial of integers |
issnan | Like isnan, but for signaling nans only. |
next_less | Equivalent to np.nextafter(x, -inf) |
next_greater | Equivalent to np.nextafter(x, inf) |
abs_squared | Squared absolute value |
cabssq | Squared absolute value for complex input only |
deadzone | Deadzone function |
step | Step function |
linearstep | Piecewise linear step function |
smoothstep3 | Smooth step using a cubic polynomial |
invsmoothstep3 | Inverse of smoothstep3 |
smoothstep5 | Smooth step using a degree 5 polynomial |
trapezoid_pulse | Trapezoid pulse function |
pow1pm1 | Compute (1 + x)**y - 1 |
debye1 | Compute the Debye function D₁(x) |
expint1 | Exponential integral E₁ for real inputs |
log1p_doubledouble | log(1 + z) for complex z. |
log1p_theorem4 | log(1 + z) for complex z. |
logexpint1 | Logarithm of the exponential integral E₁ |
loggamma1p | Logarithm of gamma(1 + x) for real x > -1. |
logistic | The standard logistic sigmoid function |
logistic_deriv | Derivative of the standard logistic sigmoid function |
log_logistic | Logarithm of the standard logistic sigmoid function |
swish | The 'swish' function, a smoothed ramp |
hyperbolic_ramp | A smoothed ramp whose graph is a hyperbola |
exponential_ramp | A smoothed ramp with exponential convergence to asymptotes |
yeo_johnson | Yeo-Johnson transformation |
inv_yeo_johnson | Inverse of the Yeo-Johnson transformation |
erfcx | Scaled complementary error function |
Normal distribution functions | Functions for the normal distribution: cdf, sf, logcdf, logsf |
Semivariograms | Several semivariograms used in kriging interpolation |
Generalized ufuncs
Note, for anyone looking at the source code: some of these implementations are in C and use the legacy NumPy templating language (look for filenames that end in .src); others use templated C++ functions combined with code generation tools that can be found in tools/cxxgen. The .src files are processed with the script in ufunclab/tools/conv_template.py.
Functions that reduce a one-dimensional array to one or two numbers.
Function | Description |
---|---|
first | First value that matches a target comparison |
argfirst | Index of the first occurrence of a target comparison |
argmin | Like numpy.argmin, but a gufunc |
argmax | Like numpy.argmax, but a gufunc |
minmax | Minimum and maximum |
argminmax | Indices of the min and the max |
min_argmin | Minimum value and its index |
max_argmax | Maximum value and its index |
searchsortedl | Find position for given element in sorted seq. |
searchsortedr | Find position for given element in sorted seq. |
peaktopeak | Alternative to numpy.ptp |
all_same | Check all values are the same |
gmean | Geometric mean |
hmean | Harmonic mean |
meanvar | Mean and variance |
mad | Mean absolute difference (MAD) |
rmad | Relative mean absolute difference (RMAD) |
gini | Gini coefficient |
rms | Root-mean-square for real and complex inputs |
vnorm | Vector norm |
percentileofscore | Percentile of score (like scipy.stats.percentileofscore) |
Functions that reduce two one-dimensional arrays to a number.
Function | Description |
---|---|
vdot | Vector dot product for real floating point arrays |
pearson_corr | Pearson's product-moment correlation coefficient |
wjaccard | Weighted Jaccard index. |
Functions that transform a one-dimensional array to another one-dimensional array.
Function | Description |
---|---|
fillnan1d | Replace nan using linear interpolation |
backlash | Backlash operator |
backlash_sum | Sum of backlash operators (Prandtl-Ishlinskii) |
hysteresis_relay | Relay with hysteresis (Schmitt trigger) |
sosfilter | SOS (second order sections) linear filter |
sosfilter_ic | SOS linear filter with initial condition |
sosfilter_ic_contig | SOS linear filter with contiguous array inputs |
Other generalized ufuncs.
Function | Description |
---|---|
cross2 | 2-d vector cross product (returns scalar) |
cross3 | 3-d vector cross product |
linear_interp1d | Linear interpolation, like numpy.interp |
tri_area | Area of triangles in n-dimensional space |
tri_area_indexed | Area of triangles in n-dimensional space |
multivariate_logbeta | Logarithm of the multivariate beta function |
Wrapped generalized ufuncs
These are Python functions that wrap a gufunc. The wrapper allows the function to provide a capability that is not possible with a gufunc.
Function | Description |
---|---|
bincount | Like np.bincount, but gufunc-based |
convert_to_base | Convert an integer to a given base. |
nextn_greater | Next n values greater than the given x. |
nextn_less | Next n values less than the given x. |
one_hot | Create 1-d array that is 1 at index k and 0 elsewhere. |
Other tools
Function | Description |
---|---|
gendot | Create a new gufunc that composes two ufuncs |
ufunc_inspector | Display ufunc information |
logfactorial is a ufunc that computes the natural logarithm of the factorial of the nonnegative integer x. (nan is returned for negative input.)
For example,
>>> from ufunclab import logfactorial
>>> logfactorial([1, 10, 100, 1000])
array([ 0. , 15.10441257, 363.73937556, 5912.12817849])
issnan
is an element-wise ufunc with a single input that acts like
the standard isnan
function, but it returns True only for
signaling nans.
The current implementation only handles the floating point types np.float16
,
np.float32
and np.float64
.
>>> import numpy as np
>>> from ufunclab import issnan
>>> x = np.array([12.5, 0.0, np.inf, 999.0, np.nan], dtype=np.float32)
Put a signaling nan in x[1]
. (The nan in x[4]
is a quiet nan, and
we'll leave it that way.)
>>> v = x.view(np.uint32)
>>> v[1] = 0b0111_1111_1000_0000_0000_0000_0000_0011
>>> x
array([ 12.5, nan, inf, 999. , nan], dtype=float32)
>>> np.isnan(x)
array([False, True, False, False, True])
Note that NumPy displays both quiet and signaling nans as just nan
,
and np.isnan(x)
returns True for both quiet and signaling nans (as
it should).
issnan(x)
indicates which values are signaling nans:
>>> issnan(x)
array([False, True, False, False, False])
next_less
is an element-wise ufunc with a single input that
is equivalent to np.nextafter
with the second argument set to -inf
.
>>> import numpy as np
>>> from ufunclab import next_less
>>> next_less(np.array([-12.5, 0, 1, 1000], dtype=np.float32))
array([-1.2500001e+01, -1.4012985e-45, 9.9999994e-01, 9.9999994e+02],
dtype=float32)
next_greater
is an element-wise ufunc with a single input that
is equivalent to np.nextafter
with the second argument set to inf
.
>>> import numpy as np
>>> from ufunclab import next_greater
>>> next_greater(np.array([-12.5, 0, 1, 1000], dtype=np.float32))
array([-1.24999990e+01, 1.40129846e-45, 1.00000012e+00, 1.00000006e+03],
dtype=float32)
abs_squared(z)
computes the squared absolute value of z
.
This is an element-wise ufunc with types 'f->f'
, 'd->d'
,
'g->g'
, 'F->f'
, 'D->d'
, and 'G->g'
. For real input,
the result is z**2
. For complex input, it is
z.real**2 + z.imag**2
, which is equivalent to z*conj(z)
,
where conj(z)
is the complex conjugate of z
.
>>> import numpy as np
>>> from ufunclab import abs_squared
>>> abs_squared.types
['f->f', 'd->d', 'g->g', 'F->f', 'D->d', 'G->g']
>>> x = np.array([-1.5, 3.0, 9.0, -10.0], dtype=np.float32)
>>> abs_squared(x)
array([ 2.25, 9. , 81. , 100. ], dtype=float32)
>>> z = np.array([-3+4j, -1, 1j, 13, 0.5-1.5j])
>>> abs_squared(z)
array([ 25. , 1. , 1. , 169. , 2.5])
cabssq(z)
computes the squared absolute value of z
for complex input only.
This is the same calculation as abs_squared
, but the implementation is
different. cabssq
is implemented in C with the inner loop functions
implemented "by hand", with no C++ or NumPy templating. cabssq
is generally
faster than abs_squared
, because it avoids some of the overhead that occurs
in the code generated in the implementation of abs_squared
, and it allows
the compiler to optimize the code more effectively.
deadzone(x, low, high)
is a ufunc with three inputs and one output.
It computes the "deadzone" response of a signal:
{ 0 if low <= x <= high
f(x) = { x - low if x < low
{ x - high if x > high
The function is similar to the deadzone block of Matlab's Simulink library. The function is also known as a soft threshold.
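For example, the values expected from the piecewise definition above, with low = -0.25 and high = 0.5:
>>> from ufunclab import deadzone
>>> deadzone([-1.0, -0.3, 0.0, 0.4, 0.6, 1.0], -0.25, 0.5)
array([-0.75, -0.05,  0.  ,  0.  ,  0.1 ,  0.5 ])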
Here's a plot of deadzone(x, -0.25, 0.5)
:
The script deadzone_demo2.py
in the examples
directory generates
the plot
The ufunc step(x, a, flow, fa, fhigh) returns flow for x < a, fhigh for x > a, and fa for x = a. The Heaviside function can be implemented as step(x, 0, 0, 0.5, 1).
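For example, the Heaviside values expected from the definition above:
>>> from ufunclab import step
>>> step([-1.0, 0.0, 0.5], 0, 0, 0.5, 1)
array([0. , 0.5, 1. ])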
The script step_demo.py
in the examples
directory generates
the plot
The ufunc linearstep(x, a, b, fa, fb) returns fa for x <= a, fb for x >= b, and uses linear interpolation from fa to fb in the interval a < x < b.
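For example, with a=0, b=1, fa=0 and fb=1, the definition above gives:
>>> from ufunclab import linearstep
>>> linearstep([-1.0, 0.0, 0.25, 0.5, 1.0, 2.0], 0, 1, 0, 1)
array([0.  , 0.  , 0.25, 0.5 , 1.  , 1.  ])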
The script linearstep_demo.py
in the examples
directory generates
the plot
The ufunc smoothstep3(x, a, b, fa, fb)
returns fa
for
x <= a
, fb
for x >= b
, and uses a cubic polynomial in
the interval a < x < b
to smoothly transition from fa
to fb
.
The script smoothstep3_demo.py
in the examples
directory generates
the plot
The ufunc invsmoothstep3(y, a, b, fa, fb)
is the inverse of
smoothstep3(x, a, b, fa, fb)
.
The script invsmoothstep3_demo.py
in the examples
directory generates
the plot
The function smoothstep5(x, a, b, fa, fb)
returns fa
for
x <= a
, fb
for x >= b
, and uses a degree 5 polynomial in
the interval a < x < b
to smoothly transition from fa
to fb
.
The script smoothstep5_demo.py
in the examples
directory generates
the plot
trapezoid_pulse(x, a, b, c, d, amp) is a ufunc that computes a trapezoid pulse. The function is 0 for x <= a or x >= d, amp for b <= x <= c, and a linear ramp in the intervals [a, b] and [c, d].
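For example, with the parameters used in the plot below, the definition above gives:
>>> from ufunclab import trapezoid_pulse
>>> trapezoid_pulse([0.0, 2.0, 3.5, 4.5, 6.0], 1, 3, 4, 5, 2)
array([0., 1., 2., 1., 0.])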
Here's a plot of trapezoid_pulse(x, 1, 3, 4, 5, 2)
(generated by
the script trapezoid_pulse_demo.py
in the examples
directory):
pow1pm1(x, y)
computes (1 + x)**y - 1
for x >= -1
.
The calculation is formulated to avoid loss of precision when
(1 + x)**y
is close to 1.
>>> from ufunclab import pow1pm1
The following result provides full machine precision:
>>> x = -0.125
>>> y = 3.25e-12
>>> pow1pm1(x, y)
-4.3397702602960437e-13
The naive calculation provides less than six digits of precision:
>>> (1 + x)**y - 1
-4.339861803259737e-13
debye1(x)
computes the Debye function D₁(x).
See the wikipedia article https://en.wikipedia.org/wiki/Debye_function for more details.
>>> from ufunclab import debye1
>>> debye1([-2, -1.5, 0, 0.25, 0.5, 2.5, 50, 100])
array([1.60694728, 1.43614531, 1. , 0.93923503, 0.88192716,
0.53878957, 0.03289868, 0.01644934])
The script debye1_demo.py
in the examples
directory generates
the plot
expint1(x)
computes the exponential integral E₁ for the real input x.
>>> from ufunclab import expint1
>>> expint1([0.25, 2.5, 25])
array([1.04428263e+00, 2.49149179e-02, 5.34889976e-13])
log1p_doubledouble(z)
computes log(1 + z)
for complex z
. This is an alternative
to numpy.log1p
and scipy.special.log1p
.
The function uses double-double numbers for some intermediate calculations to avoid loss of precision near the circle |z + 1| = 1 in the complex plane.
>>> from ufunclab import log1p_doubledouble
>>> z = -0.57113-0.90337j
>>> log1p_doubledouble(z)
(3.4168883248419116e-06-1.1275564209486122j)
Currently, the only "inner loop" implemented for this ufunc is for
the data type np.complex128
, so it will always return a complex result:
>>> log1p_doubledouble.types
['D->D']
>>> log1p_doubledouble([-6e-3, 0, 1e-12, 5e-8, 0.25, 1])
array([-6.01807233e-03+0.j, 0.00000000e+00+0.j, 1.00000000e-12+0.j,
4.99999988e-08+0.j, 2.23143551e-01+0.j, 6.93147181e-01+0.j])
log1p_theorem4(z)
computes log(1 + z)
for complex z
. This is an alternative
to numpy.log1p
and scipy.special.log1p
.
The function uses the "trick" given as Theorem 4 of Goldberg's paper "What every computer scientist should know about floating-point arithmetic".
The precision provided by this function depends on the precision of the function
clog
(complex logarithm) provided by the underlying C math library that is used
to build ufunclab
.
>>> from ufunclab import log1p_theorem4
>>> z = -0.57113-0.90337j
>>> log1p_theorem4(z)
(3.4168883248419116e-06-1.1275564209486122j)
Currently, the only "inner loop" implemented for this ufunc is for
the data type np.complex128
, so it will always return a complex result:
>>> log1p_theorem4.types
['D->D']
>>> log1p_theorem4([-6e-3, 0, 1e-12, 5e-8, 0.25, 1])
array([-6.01807233e-03+0.j, 0.00000000e+00+0.j, 1.00000000e-12+0.j,
4.99999988e-08+0.j, 2.23143551e-01+0.j, 6.93147181e-01+0.j])
logexpint1(x)
computes the logarithm of the exponential integral E₁ for the real input x.
expint1(x)
underflows to 0 for sufficiently large x:
>>> from ufunclab import expint1, logexpint1
>>> expint1([650, 700, 750, 800])
array([7.85247922e-286, 1.40651877e-307, 0.00000000e+000, 0.00000000e+000])
logexpint1
avoids the underflow by computing the logarithm of the value:
>>> logexpint1([650, 700, 750, 800])
array([-656.47850729, -706.55250586, -756.62140388, -806.68585939])
loggamma1p
computes log(gamma(1 + x))
for real x
> -1.
It avoids the loss of precision in the expression 1 + x
that occurs
when x
is very small.
>>> from ufunclab import loggamma1p
>>> x = -3e-11
>>> loggamma1p(x)
1.7316469947786207e-11
That result is accurate to machine precision. The naive calculation
loses precision in the sum 1 + x
; in the following result that uses
math.lgamma
, only about five digits are correct:
>>> import math
>>> math.lgamma(1 + x)
1.7316037492776104e-11
logistic(x)
computes the standard logistic sigmoid function.
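For example, the expected values of 1/(1 + exp(-x)):
>>> from ufunclab import logistic
>>> logistic([-1.0, 0.0, 1.0])
array([0.26894142, 0.5       , 0.73105858])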
The script logistic_demo.py
in the examples
directory generates
this plot:
logistic_deriv(x)
computes the derivative of the standard logistic sigmoid
function.
See logistic
(above) for a plot.
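For example, the expected values of logistic(x)*(1 - logistic(x)):
>>> from ufunclab import logistic_deriv
>>> logistic_deriv([-1.0, 0.0, 1.0])
array([0.19661193, 0.25      , 0.19661193])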
log_logistic(x)
computes the logarithm of the standard logistic sigmoid
function.
>>> import numpy as np
>>> from ufunclab import log_logistic
>>> x = np.array([-800, -500, -0.5, 10, 250, 500])
>>> log_logistic(x)
array([-8.00000000e+002, -5.00000000e+002, -9.74076984e-001,
-4.53988992e-005, -2.66919022e-109, -7.12457641e-218])
Compare that to the output of log(expit(x))
, which triggers a warning
and loses all precision for inputs with large magnitudes:
>>> from scipy.special import expit
>>> np.log(expit(x))
<stdin>:1: RuntimeWarning: divide by zero encountered in log
array([ -inf, -5.00000000e+02, -9.74076984e-01,
-4.53988992e-05, 0.00000000e+00, 0.00000000e+00])
swish(x, beta)
computes x * logistic(beta*x)
, where logistic(x)
is the standard logistic sigmoid function. The function is a type
of smoothed ramp.
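For example, with beta=1 the expected values are x*logistic(x):
>>> from ufunclab import swish
>>> swish([-2.0, 0.0, 2.0], 1.0)
array([-0.23840584,  0.        ,  1.76159416])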
hyperbolic_ramp(x, a)
computes the function
hyperbolic_ramp(x, a) = (x + sqrt(x*x + 4*a*a))/2
It is a smoothed ramp function. The scaling of the parameters is chosen
so that hyperbolic_ramp(0, a)
is a
.
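For example, with a=1 the formula above gives:
>>> from ufunclab import hyperbolic_ramp
>>> hyperbolic_ramp([-3.0, 0.0, 3.0], 1.0)
array([0.30277564, 1.        , 3.30277564])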
exponential_ramp(x, a)
computes the function
exponential_ramp(x, a) = a*log_2(1 + 2**(x/a))
It is a smoothed ramp function that converges exponentially fast to
the asymptotes. The scaling of the parameters is chosen so that
exponential_ramp(0, a)
is a
.
The function is also known as the "softplus" function.
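For example, with a=1 the formula above gives:
>>> from ufunclab import exponential_ramp
>>> exponential_ramp([-3.0, 0.0, 3.0], 1.0)
array([0.169925, 1.      , 3.169925])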
yeo_johnson
computes the Yeo-Johnson transform.
>>> import numpy as np
>>> from ufunclab import yeo_johnson
>>> yeo_johnson([-1.5, -0.5, 2.8, 7, 7.1], 2.3)
array([-0.80114069, -0.38177502, 8.93596922, 51.4905317 , 52.99552905])
>>> yeo_johnson.outer([-1.5, -0.5, 2.8, 7, 7.1], [-2.5, 0.1, 2.3, 3])
array([[-13.50294123, -2.47514321, -0.80114069, -0.6 ],
[ -1.15561576, -0.61083954, -0.38177502, -0.33333333],
[ 0.38578977, 1.42821388, 8.93596922, 17.95733333],
[ 0.39779029, 2.31144413, 51.4905317 , 170.33333333],
[ 0.39785786, 2.32674755, 52.99552905, 176.81366667]])
inv_yeo_johnson
computes the inverse of the Yeo-Johnson transform.
>>> import numpy as np
>>> from ufunclab import inv_yeo_johnson, yeo_johnson
>>> x = inv_yeo_johnson([-1.5, -0.5, 2.8, 7, 7.1], 2.5)
>>> x
array([-15. , -0.77777778, 1.29739671, 2.21268904,
2.22998502])
>>> yeo_johnson(x, 2.5)
array([-1.5, -0.5, 2.8, 7. , 7.1])
erfcx(x)
computes the scaled complementary error function,
exp(x**2) * erfc(x)
. The function is implemented for NumPy types
float32
, float64
and longdouble
(also known as float128
):
>>> from ufunclab import erfcx
>>> erfcx.types
['f->f', 'd->d', 'g->g']
This example is run on a platform where the longdouble
type
corresponds to a float128
with 80 bits of precision:
>>> import numpy as np
>>> b = np.longdouble('1.25e2000')
>>> x = np.array([-40, -1.0, 0, 1000.0, b])
>>> x
array([-4.00e+0001, -1.00e+0000, 0.00e+0000, 1.00e+0003, 1.25e+2000],
dtype=float128)
>>> erfcx(x)
array([1.48662366e+0695, 5.00898008e+0000, 1.00000000e+0000,
5.64189301e-0004, 4.51351667e-2001], dtype=float128)
The submodule normal defines several functions for the standard normal probability distribution.
normal.cdf(x)
computes the cumulative distribution function of the standard
normal distribution.
normal.logcdf(x)
computes the natural logarithm of the CDF of the standard
normal distribution.
normal.sf(x)
computes the survival function of the standard normal distribution.
This function is also known as the complementary CDF, and is often abbreviated
as ccdf
.
normal.logsf(x)
computes the natural logarithm of the survival function of the
standard normal distribution.
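For example, the expected values for the standard normal distribution (the submodule is imported from ufunclab):
>>> from ufunclab import normal
>>> normal.cdf([-1.0, 0.0, 1.0])
array([0.15865525, 0.5       , 0.84134475])
>>> normal.sf([-1.0, 0.0, 1.0])
array([0.84134475, 0.5       , 0.15865525])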
The submodule ufunclab.semivar
defines the ufuncs exponential
, linear
,
spherical
and parabolic
. The script semivar_demo.py
in the examples
directory generates the following plot.
semivar.exponential(h, nugget, sill, rng)
computes the exponential semivariogram.
semivar.linear(h, nugget, sill, rng)
computes the linear semivariogram.
semivar.spherical(h, nugget, sill, rng)
computes the spherical semivariogram.
semivar.parabolic(h, nugget, sill, rng)
computes the parabolic semivariogram.
first
is a gufunc with signature (n),(),(),()->()
that returns the first
value that matches a given comparison. The function signature is
first(x, op, target, otherwise)
, where op
is one of the values in
ufunclab.op
that specifies the comparison to be made. otherwise
is the
value to be returned if no value in x
satisfies the given relation with
target
.
Find the first nonzero value in a
:
>>> import numpy as np
>>> from ufunclab import first, op
>>> a = np.array([0, 0, 0, 0, 0, -0.5, 0, 1, 0.1])
>>> first(a, op.NE, 0.0, 0.0)
-0.5
Find the first value in each row of b
that is less than 0.
If there is no such value, return 0:
>>> b = np.array([[10, 23, -10, 0, -9],
... [18, 28, 42, 33, 71],
... [17, 29, 16, 14, -7]], dtype=np.int8)
...
>>> first(b, op.LT, 0, 0)
array([-10, 0, -7], dtype=int8)
The function stops at the first occurrence, so it will be faster when the condition is met at the beginning of the array than when the condition doesn't occur until near the end of the array.
In the following, the condition occurs in the last element of x
, and
the result of the timeit
call is 0.6 seconds.
>>> from timeit import timeit
>>> x = np.ones(100000)
>>> x[-1] = 0
>>> first(x, op.LT, 1.0, np.nan)
0.0
>>> timeit('first(x, op.LT, 1.0, np.nan)', number=20000, globals=globals())
0.6052625320153311
The time is reduced to 0.05 seconds when the first occurrence of the
condition is in the first element of x
.
>>> x[0] = -1.0
>>> first(x, op.LT, 1.0, np.nan)
-1.0
>>> timeit('first(x, op.LT, 1.0, np.nan)', number=20000, globals=globals())
0.049594153999350965
argfirst
is a gufunc (signature (n),(),()->()
) that finds the index of
the first true value of a comparison of an array with a target value. If no
value is found, -1 is returned. Some examples follow.
>>> import numpy as np
>>> from ufunclab import argfirst, op
Find the index of the first occurrence of 0 in x
:
>>> x = np.array([10, 35, 19, 0, -1, 24, 0])
>>> argfirst(x, op.EQ, 0)
3
Find the index of the first nonzero value in a
:
>>> a = np.array([0, 0, 0, 0, 0, -0.5, 0, 1, 0.1])
>>> argfirst(a, op.NE, 0.0)
5
argfirst
is a gufunc, so it can handle higher-dimensional
array arguments, and among its gufunc-related parameters is
axis
. By default, the gufunc operates along the last axis.
For example, here we find the location of the first nonzero
element in each row of b
:
>>> b = np.array([[0, 8, 0, 0], [0, 0, 0, 0], [0, 0, 9, 2]],
... dtype=np.uint8)
>>> b
array([[0, 8, 0, 0],
[0, 0, 0, 0],
[0, 0, 9, 2]])
>>> argfirst(b, op.NE, np.uint8(0))
array([ 1, -1, 2])
If we give the argument axis=0
, we tell argfirst
to
operate along the first axis, which in this case is the
columns:
>>> argfirst(b, op.NE, np.uint8(0), axis=0)
array([-1, 0, 2, 2])
argmin
is a gufunc
with signature (n)->()
that is similar to numpy.argmin
.
>>> from ufunclab import argmin
>>> x = np.array([[11, 10, 10, 23, 31],
... [19, 20, 21, 22, 22],
... [16, 15, 16, 14, 14]])
>>> argmin(x, axis=1) # same as argmin(x)
array([1, 0, 3])
>>> argmin(x, axis=0)
array([0, 0, 0, 2, 2])
argmax
is a gufunc
with signature (n)->()
that is similar to numpy.argmax
.
>>> from ufunclab import argmax
>>> x = np.array([[11, 10, 10, 23, 31],
... [19, 20, 21, 22, 22],
... [16, 15, 16, 14, 14]])
>>> argmax(x, axis=1) # same as argmax(x)
array([4, 3, 0])
>>> argmax(x, axis=0)
array([1, 1, 1, 0, 0])
minmax
is a gufunc
(signature (n)->(2)
) that simultaneously computes
the minimum and maximum of a NumPy array.
The function handles the standard integer and floating point types, and
object arrays. The function will not accept complex arrays, nor arrays with
the data types datetime64
or timedelta64
. Also, the function does not
implement any special handling of nan
, so the behavior of this function
with arrays containing nan
is undefined.
For an input with more than one dimension, minmax
is applied to the
last axis. For example, if a
has shape (L, M, N), then minmax(a)
has
shape (L, M, 2).
>>> x = np.array([5, -10, -25, 99, 100, 10], dtype=np.int8)
>>> minmax(x)
array([-25, 100], dtype=int8)
>>> np.random.seed(12345)
>>> y = np.random.randint(-1000, 1000, size=(3, 3, 5)).astype(np.float32)
>>> y
array([[[-518., 509., 309., -871., 444.],
[ 449., -618., 381., -454., 565.],
[-231., 142., 393., 339., -346.]],
[[-895., 115., -241., 398., 232.],
[-118., -287., -733., 101., 674.],
[-919., 746., -834., -737., -957.]],
[[-769., -977., 53., -48., 463.],
[ 311., -299., -647., 883., -145.],
[-964., -424., -613., -236., 148.]]], dtype=float32)
>>> mm = minmax(y)
>>> mm
array([[[-871., 509.],
[-618., 565.],
[-346., 393.]],
[[-895., 398.],
[-733., 674.],
[-957., 746.]],
[[-977., 463.],
[-647., 883.],
[-964., 148.]]], dtype=float32)
>>> mm.shape
(3, 3, 2)
>>> z = np.array(['foo', 'xyz', 'bar', 'abc', 'def'], dtype=object)
>>> minmax(z)
array(['abc', 'xyz'], dtype=object)
>>> from fractions import Fraction
>>> f = np.array([Fraction(1, 3), Fraction(3, 5),
... Fraction(22, 7), Fraction(5, 2)], dtype=object)
>>> minmax(f)
array([Fraction(1, 3), Fraction(22, 7)], dtype=object)
argminmax
is a gufunc
(signature (n)->(2)
) that simultaneously
computes the argmin
and argmax
of a NumPy array.
>>> y = np.array([[-518, 509, 309, -871, 444, 449, -618, 381],
... [-454, 565, -231, 142, 393, 339, -346, -895],
... [ 115, -241, 398, 232, -118, -287, -733, 101]],
... dtype=np.float32)
>>> argminmax(y)
array([[3, 1],
[7, 1],
[6, 2]])
>>> argminmax(y, axes=[0, 0])
array([[0, 2, 1, 0, 2, 2, 2, 1],
[2, 1, 2, 2, 0, 0, 1, 0]])
min_argmin
is a gufunc (signature (n)->(),()
) that returns both
the extreme value and the index of the extreme value.
>>> x = np.array([[ 1, 10, 18, 17, 11],
... [15, 11, 0, 4, 8],
... [10, 10, 12, 11, 11]])
>>> min_argmin(x, axis=1)
(array([ 1, 0, 10]), array([0, 2, 0]))
max_argmax
is a gufunc (signature (n)->(),()
) that returns both
the extreme value and the index of the extreme value.
>>> x = np.array([[ 1, 10, 18, 17, 11],
... [15, 11, 0, 4, 8],
... [10, 10, 12, 11, 11]])
>>> max_argmax(x, axis=1)
(array([18, 15, 12]), array([2, 0, 2]))
>>> from fractions import Fraction as F
>>> y = np.array([F(2, 3), F(3, 4), F(2, 7), F(2, 5)])
>>> max_argmax(y)
(Fraction(3, 4), 1)
searchsortedl
is a gufunc with signature (n),()->()
. The function
is equivalent to numpy.searchsorted
with side='left'
, but as a gufunc,
it supports broadcasting of its arguments. (Note that searchsortedl
does not provide the sorter
parameter.)
>>> import numpy as np
>>> from ufunclab import searchsortedl
>>> searchsortedl([1, 1, 2, 3, 5, 8, 13, 21], [1, 4, 15, 99])
array([0, 4, 7, 8])
>>> arr = np.array([[1, 1, 2, 3, 5, 8, 13, 21],
... [1, 1, 1, 1, 2, 2, 10, 10]])
>>> searchsortedl(arr, [7, 8])
array([5, 6])
>>> searchsortedl(arr, [[2], [5]])
array([[2, 4],
[4, 6]])
searchsortedr
is a gufunc with signature (n),()->()
. The function
is equivalent to numpy.searchsorted
with side='right'
, but as a gufunc,
it supports broadcasting of its arguments. (Note that searchsortedr
does not provide the sorter
parameter.)
>>> import numpy as np
>>> from ufunclab import searchsortedr
>>> searchsortedr([1, 1, 2, 3, 5, 8, 13, 21], [1, 4, 15, 99])
array([2, 4, 7, 8])
>>> arr = np.array([[1, 1, 2, 3, 5, 8, 13, 21],
... [1, 1, 1, 1, 2, 2, 10, 10]])
>>> searchsortedr(arr, [7, 8])
array([5, 6])
>>> searchsortedr(arr, [[2], [5]])
array([[3, 6],
[5, 6]])
peaktopeak
is a gufunc
(signature (n)->()
) that computes the
peak-to-peak range of a NumPy array. It is like the ptp
method
of a NumPy array, but when the input is signed integers, the output
is an unsigned integer with the same bit width.
The function does not accept complex arrays. Also, the function does not
implement any special handling of nan
, so the behavior of this function
with arrays containing nan
is undefined (i.e. it might not do what you
want, and the behavior might change in the next update of the software).
>>> x = np.array([85, 125, 0, -75, -50], dtype=np.int8)
>>> p = peaktopeak(x)
>>> p
200
>>> type(p)
numpy.uint8
Compare that to the ptp
method, which returns a value with the
same data type as the input:
>>> q = x.ptp()
>>> q
-56
>>> type(q)
numpy.int8
f
is an object array of Fraction
s and has shape (2, 4).
>>> from fractions import Fraction
>>> f = np.array([[Fraction(1, 3), Fraction(3, 5),
... Fraction(22, 7), Fraction(5, 2)],
... [Fraction(-2, 9), Fraction(1, 3),
... Fraction(2, 3), Fraction(5, 9)]], dtype=object)
>>> peaktopeak(f)
array([Fraction(59, 21), Fraction(8, 9)], dtype=object)
all_same
is a gufunc (signature (n)->()
) that tests that all the
values in the array along the given axis are the same.
(Note: handling of datetime64
, timedelta64
and complex data types
are not implemented yet.)
>>> x = np.array([[3, 2, 2, 3, 2, 2, 3, 1, 3],
... [1, 2, 2, 2, 2, 2, 3, 1, 1],
... [2, 3, 3, 1, 2, 3, 3, 1, 2]])
>>> all_same(x, axis=0)
array([False, False, False, False, True, False, True, True, False])
>>> all_same(x, axis=1)
array([False, False, False])
Object arrays are handled.
>>> a = np.array([[None, "foo", 99], [None, "bar", "abc"]])
>>> a
array([[None, 'foo', 99],
[None, 'bar', 'abc']], dtype=object)
>>> all_same(a, axis=0)
array([ True, False, False])
gmean
is a gufunc (signature (n)->()
) that computes the
geometric mean.
For example,
>>> import numpy as np
>>> from ufunclab import gmean
>>> x = np.array([1, 2, 3, 5, 8], dtype=np.uint8)
>>> gmean(x)
2.992555739477689
>>> y = np.arange(1, 16).reshape(3, 5)
>>> y
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15]])
>>> gmean(y, axis=1)
array([ 2.60517108, 7.87256685, 12.92252305])
hmean
is a gufunc (signature (n)->()
) that computes the
harmonic mean.
For example,
>>> import numpy as np
>>> from ufunclab import hmean
>>> x = np.array([1, 2, 3, 5, 8], dtype=np.uint8)
>>> hmean(x)
2.316602316602317
>>> y = np.arange(1, 16).reshape(3, 5)
>>> y
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15]])
>>> hmean(y, axis=1)
array([ 2.18978102, 7.74431469, 12.84486077])
meanvar
is a gufunc (signature (n),()->(2)
) that computes both
the mean and variance in one function call.
For example,
>>> import numpy as np
>>> from ufunclab import meanvar
>>> meanvar([1, 2, 4, 5], 0) # Use ddof=0.
array([3. , 2.5])
Apply meanvar
with ddof=1
to the rows of a 2-d array.
The output has shape (4, 2)
; the first column holds the
means, and the second column holds the variances.
>>> x = np.array([[1, 4, 4, 2, 1, 1, 2, 7],
... [0, 0, 9, 4, 1, 0, 0, 1],
... [8, 3, 3, 3, 3, 3, 3, 3],
... [5, 5, 5, 5, 5, 5, 5, 5]])
>>> meanvar(x, 1) # Use ddof=1.
array([[ 2.75 , 4.5 ],
[ 1.875, 10.125],
[ 3.625, 3.125],
[ 5. , 0. ]])
Compare to the results of numpy.mean
and numpy.var
:
>>> np.mean(x, axis=1)
array([2.75 , 1.875, 3.625, 5. ])
>>> np.var(x, ddof=1, axis=1)
array([ 4.5 , 10.125, 3.125, 0. ])
mad(x, unbiased)
computes the mean absolute difference
of a 1-d array (gufunc signature is (n),()->()
). When the second parameter
is False, mad
is the standard calculation (sum of the absolute differences
divided by n**2
). When the second parameter is True, mad
is the unbiased
estimator (sum of the absolute differences divided by n*(n-1)
).
For example,
>>> import numpy as np
>>> from ufunclab import mad
>>> x = np.array([1.0, 1.0, 2.0, 3.0, 5.0, 8.0])
>>> mad(x, False)
2.6666666666666665
>>> y = np.linspace(0, 1, 21).reshape(3, 7)**2
>>> y
array([[0. , 0.0025, 0.01 , 0.0225, 0.04 , 0.0625, 0.09 ],
[0.1225, 0.16 , 0.2025, 0.25 , 0.3025, 0.36 , 0.4225],
[0.49 , 0.5625, 0.64 , 0.7225, 0.81 , 0.9025, 1. ]])
>>> mad(y, False, axis=1)
array([0.03428571, 0.11428571, 0.19428571])
When the second parameter is True
, the calculation is the unbiased
estimate of the mean absolute difference.
>>> mad(x, True)
3.2
>>> mad(y, True, axis=1)
array([0.04 , 0.13333333, 0.22666667])
rmad(x, unbiased)
computes the relative mean absolute difference (gufunc
signature is (n),()->()
).
rmad
is twice the Gini coefficient.
For example,
>>> import numpy as np
>>> from ufunclab import rmad
>>> x = np.array([1.0, 1.0, 2.0, 3.0, 5.0, 8.0])
>>> rmad(x, False)
0.7999999999999999
>>> y = np.linspace(0, 1, 21).reshape(3, 7)**2
>>> y
array([[0. , 0.0025, 0.01 , 0.0225, 0.04 , 0.0625, 0.09 ],
[0.1225, 0.16 , 0.2025, 0.25 , 0.3025, 0.36 , 0.4225],
[0.49 , 0.5625, 0.64 , 0.7225, 0.81 , 0.9025, 1. ]])
>>> rmad(y, False, axis=1)
array([1.05494505, 0.43956044, 0.26523647])
When the second parameter is True
, the calculation is based on the
unbiased estimate of the mean absolute difference (MAD).
>>> rmad(x, True)
0.96
>>> rmad(y, True, axis=1)
array([1.23076923, 0.51282051, 0.30944255])
gini(x, unbiased)
is a gufunc with signature (n),()->()
that computes the
Gini coefficient of the data in x
.
>>> from ufunclab import gini
>>> gini([1, 2, 3, 4], False)
0.25
>>> income = [20, 30, 40, 50, 60, 70, 80, 90, 120, 150]
>>> gini(income, False)
0.3028169014084507
When the second parameter is True
, the calculation is based on the
unbiased estimate of the mean absolute difference (MAD).
>>> gini([1, 2, 3, 4], True)
0.33333333333333337
>>> income = [20, 30, 40, 50, 60, 70, 80, 90, 120, 150]
>>> gini(income, True)
0.3364632237871674
rms(x)
computes the root-mean-square value for a collection of values.
It is a gufunc with signature (n)->()
. The implementation is for
float and complex types; integer types are cast to float.
>>> import numpy as np
>>> from ufunclab import rms
>>> x = np.array([1, 2, -1, 0, 3, 2, -1, 0, 1])
>>> rms(x)
1.5275252316519468
Compare to:
>>> np.sqrt(np.mean(x**2))
1.5275252316519468
A complex example:
>>> z = np.array([1-1j, 2+1.5j, -3-2j, 0.5+1j, 2.5j], dtype=np.complex64)
>>> rms(z)
2.3979158
An equivalent NumPy expression:
>>> np.sqrt(np.mean(z.real**2 + z.imag**2))
2.3979158
vnorm(x, p)
computes the vector p-norm of 1D arrays. It is a gufunc with
signature (n), () -> ()
.
For example, to compute the 2-norm of [3, 4]:
>>> import numpy as np
>>> from ufunclab import vnorm
>>> vnorm([3, 4], 2)
5.0
Compute the p-norm of [3, 4] for several values of p:
>>> vnorm([3, 4], [1, 2, 3, np.inf])
array([7. , 5. , 4.49794145, 4. ])
Compute the 2-norm of four 2-d vectors:
>>> vnorm([[3, 4], [5, 12], [0, 1], [1, 1]], 2)
array([ 5. , 13. , 1. , 1.41421356])
For the same vectors, compute the p-norm for p = [1, 2, inf]:
>>> vnorm([[3, 4], [5, 12], [0, 1], [1, 1]], [[1], [2], [np.inf]])
array([[ 7. , 17. , 1. , 2. ],
[ 5. , 13. , 1. , 1.41421356],
[ 4. , 12. , 1. , 1. ]])
vnorm
handles complex numbers. Here we compute the norm of z
with orders 1, 2, 3, and inf. (Note that abs(z)
is [2, 5, 0, 14].)
>>> z = np.array([-2j, 3+4j, 0, 14])
>>> vnorm(z, [1, 2, 3, np.inf])
array([21. , 15. , 14.22263137, 14. ])
percentileofscore(x, score, kind)
, a gufunc with signature (n),(),()->()
,
performs the same calculation as scipy.stats.percentileofscore
. As a gufunc,
it broadcasts all of its arguments, including kind
.
Unlike the kind
parameter of scipy.stats.percentileofscore
, here the third
argument is required, and it must be an integer type. The allowed values
are ufunclab.op.KIND_RANK
, ufunclab.op.KIND_WEAK
, ufunclab.op.KIND_STRICT
and ufunclab.op.KIND_MEAN
.
>>> import numpy as np
>>> from ufunclab import percentileofscore, op
>>> a = [1, 2, 3, 3, 4]
>>> percentileofscore(a, [2, 2.5, 3, 3.5], op.KIND_RANK)
array([40., 40., 70., 80.])
With broadcasting, all four kinds of the calculation can be performed with one function call.
>>> percentileofscore(a, 3, [op.KIND_RANK, op.KIND_WEAK, op.KIND_STRICT, op.KIND_MEAN])
array([70., 80., 40., 60.])
A more common application of broadcasting might be to compute the percentile of a score in several arrays.
>>> rng = np.random.default_rng(121263137472525314065)
>>> x = rng.integers(0, 20, size=(3, 16))
>>> x
array([[19, 0, 13, 0, 6, 19, 17, 9, 0, 9, 11, 14, 6, 15, 18, 3],
[ 7, 0, 16, 3, 5, 1, 1, 16, 0, 16, 3, 14, 5, 2, 10, 6],
[17, 10, 5, 17, 8, 11, 13, 18, 15, 7, 3, 3, 6, 0, 17, 15]])
Get the percentile of 10 in each row of x:
>>> percentileofscore(x, 10, op.KIND_RANK)
array([50., 75., 50.])
Get the percentiles of 9 and 10 for each row. For broadcasting, the score
has shape (2, 1).
>>> percentileofscore(x, [[9], [10]], op.KIND_RANK)
array([[46.875, 68.75 , 43.75 ],
[50. , 75. , 50. ]])
vdot(x, y)
is the vector dot product of the real floating point vectors
x
and y
. It is a gufunc with signature (n),(n)->()
.
>>> import numpy as np
>>> from ufunclab import vdot
>>> x = np.array([[1, -2, 3],
... [4, 5, 6]])
>>> y = np.array([[-1, 0, 3],
... [1, 1, 1]])
>>> vdot(x, y) # Default axis is -1.
array([ 8., 15.])
>>> vdot(x, y, axis=0)
array([ 3., 5., 15.])
pearson_corr(x, y)
computes Pearson's product-moment correlation coefficient.
It is a gufunc with shape signature (n),(n)->()
.
>>> import numpy as np
>>> from ufunclab import pearson_corr
>>> x = np.array([1.0, 2.0, 3.5, 7.0, 8.5, 10.0, 11.0])
>>> y = np.array([10, 11.5, 11.4, 13.6, 15.1, 16.7, 15.0])
>>> pearson_corr(x, y)
0.9506381287828245
In the following example, a trivial dimension is added to the array a
before
passing it to pearson_corr
, so the inputs are compatible for broadcasting.
The correlation coefficient of each row of a
with each row of b
is computed,
giving a result with shape (3, 2).
>>> a = np.array([[2, 3, 1, 3, 5, 8, 8, 9],
... [3, 3, 1, 2, 2, 4, 4, 5],
... [2, 5, 1, 2, 2, 3, 3, 8]])
>>> b = np.array([[9, 8, 8, 7, 4, 4, 1, 2],
... [8, 9, 9, 6, 5, 7, 3, 4]])
>>> pearson_corr(np.expand_dims(a, 1), b)
array([[-0.92758645, -0.76815464],
[-0.65015428, -0.53015896],
[-0.43575108, -0.32925148]])
wjaccard
is a gufunc with shape signature (n),(n)->()
that computes
the weighted Jaccard index, which is defined to be
Jw(x, y) = min(x, y).sum() / max(x, y).sum()
>>> import numpy as np
>>> from ufunclab import wjaccard
>>> x = np.array([0.9, 1.0, 0.7, 0.0, 0.8, 0.6])
>>> y = np.array([0.3, 1.0, 0.9, 0.6, 1.0, 0.2])
>>> wjaccard(x, y)
0.6
nextn_greater(x, n, out=None, axis=-1)
is a Python function that wraps a
gufunc with signature ()->(n)
. Given a floating point scalar x
, it
computes the next n
values greater than x
.
>>> import numpy as np
>>> from ufunclab import nextn_greater
>>> x = np.float32(2.5)
>>> nextn_greater(x, 5)
array([2.5000002, 2.5000005, 2.5000007, 2.500001 , 2.5000012],
dtype=float32)
nextn_less(x, n, out=None, axis=-1)
is a Python function that wraps a
gufunc with signature ()->(n)
. Given a floating point scalar x
, it
computes the next n
values less than x
.
>>> import numpy as np
>>> from ufunclab import nextn_less
>>> x = np.float32(2.5)
>>> nextn_less(x, 5)
array([2.4999998, 2.4999995, 2.4999993, 2.499999 , 2.4999988],
dtype=float32)
one_hot(k, n, out=None, axis=-1)
is a Python function that wraps
a gufunc with signature ()->(n)
. Given integers k
and n
,
it returns a 1-d integer array with length n
, where the value is
1 at index k
and 0 elsewhere. If k
is less than 0 or greater
than n - 1
, the array will be all zeros.
>>> from ufunclab import one_hot
>>> one_hot(3, 10)
array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0])
>>> one_hot([3, 7, 8], 10)
array([[0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 1, 0]])
cross2(u, v)
is a gufunc with signature (2),(2)->()
. It computes
the 2-d cross product that returns a scalar. That is, cross2([u0, u1], [v0, v1])
is u0*v1 - u1*v0
. The calculation is the same as that of numpy.cross
,
but cross2
is restricted to 2-d inputs.
For example,
>>> import numpy as np
>>> from ufunclab import cross2
>>> cross2([1, 2], [5, 3])
-7
>>> cross2([[1, 2], [6, 0]], [[5, 3], [2, 3]])
array([-7, 18])
In the following, a
and b
are object arrays; a
has shape (2,),
and b
has shape (3, 2). The result of cross2(a, b)
has shape
(3,).
>>> from fractions import Fraction as F
>>> a = np.array([F(1, 3), F(2, 7)])
>>> b = np.array([[F(7, 4), F(6, 7)], [F(2, 5), F(-3, 7)], [1, F(1, 4)]])
>>> cross2(a, b)
array([Fraction(-3, 14), Fraction(-9, 35), Fraction(-17, 84)],
dtype=object)
cross3(u, v)
is a gufunc with signature (3),(3)->(3)
. It computes
the 3-d vector cross product (like numpy.cross
, but specialized to the
case of 3-d vectors only).
For example,
>>> import numpy as np
>>> from ufunclab import cross3
>>> u = np.array([1, 2, 3])
>>> v = np.array([2, 2, -1])
>>> cross3(u, v)
array([-8, 7, -2])
In the following, x
has shape (5, 3), and y
has shape (2, 1, 3).
The result of cross3(x, y)
has shape (2, 5, 3).
>>> x = np.arange(15).reshape(5, 3)
>>> y = np.round(10*np.sin(np.linspace(0, 2, 6))).reshape(2, 1, 3)
>>> x
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14]])
>>> y
array([[[ 0., 4., 7.]],
[[ 9., 10., 9.]]])
>>> cross3(x, y)
array([[[ -1., 0., 0.],
[ 8., -21., 12.],
[ 17., -42., 24.],
[ 26., -63., 36.],
[ 35., -84., 48.]],
[[-11., 18., -9.],
[-14., 18., -6.],
[-17., 18., -3.],
[-20., 18., 0.],
[-23., 18., 3.]]])
tri_area(p)
is a gufunc with signature (3, n) -> ()
. It computes the
area of a triangle defined by three points in n-dimensional space.
>>> import numpy as np
>>> from ufunclab import tri_area
`p` has shape (2, 3, 4). It contains the vertices
of two triangles in 4-dimensional space.
>>> p = np.array([[[0.0, 0.0, 0.0, 6.0],
... [1.0, 2.0, 3.0, 6.0],
... [0.0, 2.0, 2.0, 6.0]],
... [[1.5, 1.0, 2.5, 2.0],
... [4.0, 1.0, 0.0, 2.5],
... [2.0, 1.0, 2.0, 2.5]]])
>>> tri_area(p)
array([1.73205081, 0.70710678])
tri_area_indexed(p, i)
is a gufunc with signature (m, n),(3) -> ()
.
It computes the area of a triangle defined by three points in n-dimensional
space. The first argument, p
, is an array with shape (m, n)
holding m
points in n-dimensional space. The second argument, i
, is a 1-d array with
length three that holds indices into p
. The core calculation is equivalent
to tri_area(p[i])
.
>>> import numpy as np
>>> from ufunclab import tri_area_indexed, tri_area
>>> p = np.array([[0.0, 0.0, 0.0, 6.0],
... [1.0, 2.0, 3.0, 6.0],
... [0.0, 2.0, 2.0, 6.0],
... [1.5, 1.0, 2.5, 2.0],
... [4.0, 1.0, 0.0, 2.5],
... [2.0, 1.0, 2.0, 2.5]])
>>> tri_area_indexed(p, [0, 2, 3])
6.224949798994367
>>> tri_area(p[[0, 2, 3]])
6.224949798994367
Compute the areas of several triangles formed from points in `p`.
Note that the last two are the same triangles.
>>> tri_area_indexed(p, [[0, 2, 3], [1, 3, 4], [3, 4, 5], [-1, -2, -3]])
array([6.2249498 , 7.46449931, 0.70710678, 0.70710678])
fillnan1d(x)
is a gufunc with signature (n)->(n)
. It uses linear
interpolation to replace occurrences of nan
in x
.
>>> import numpy as np
>>> from ufunclab import fillnan1d
>>> x = np.array([1.0, 2.0, np.nan, np.nan, 3.5, 5.0, np.nan, 7.5])
>>> fillnan1d(x)
array([1. , 2. , 2.5 , 3. , 3.5 , 5. , 6.25, 7.5 ])
nan
values at the ends of x
are replaced with the nearest non-nan
value:
>>> x = np.array([np.nan, 2.0, np.nan, 5.0, np.nan, np.nan])
>>> fillnan1d(x)
array([2. , 2. , 3.5, 5. , 5. , 5. ])
This plot of the result of applying fillnan1d(x)
to a bigger sample is
generated by the script examples/fillnan1d_demo.py
:
linear_interp1d(x, xp, fp)
is a gufunc with signature (),(n),(n)->()
. It
is like numpy.interp
, but as a gufunc, it will broadcast its arguments.
Currently the function provides just the basic interpolation function. It does
not extrapolate, so any x
values outside of the interval [xp[0], xp[-1]]
will
result in nan
. The period
option of numpy.interp
is also not implemented.
The function has a special code path in which the computation of the indices
where the values in x
lie in xp
is done once for each input. To activate
this code path, the appropriate shapes of the input arrays must be used. Some
familiarity with the broadcasting behavior of gufuncs is necessary to make the
most of this fast code path. The arrays must be shaped so that in the internal
gufunc loop, the strides associated with the x
and xp
arrays are 0.
Some examples of the use of linear_interp1d
follow.
>>> import numpy as np
>>> from ufunclab import linear_interp1d
This example is the same as numpy.interp
. xp
and fp
are the known values,
and we want to evaluate the interpolated function at x = [0, 0.25, 1, 2, 5, 6.5]
.
>>> xp = np.array([0, 1, 3, 5, 8])
>>> fp = np.array([10, 15, 15, 20, 80])
>>> x = np.array([0, 0.25, 1, 2, 5, 6.5])
>>> linear_interp1d(x, xp, fp)
array([10. , 11.25, 15. , 15. , 20. , 50. ])
>>> np.interp(x, xp, fp)
array([10. , 11.25, 15. , 15. , 20. , 50. ])
With broadcasting, we can interpolate multiple functions (i.e. multiple 1-d arrays) in one call.
For example, in the array fp3
below, we want to treat each column as a
separate function to be interpolated (note that the first column is the
same as fp
above):
>>> fp3 = np.array([[10, 15, 10],
... [15, 14, 20],
... [15, 13, 40],
... [20, 12, 10],
... [80, 11, 0]])
We can do this with linear_interp1d
, but we have to adjust the shapes
of the inputs to broadcast correctly:
>>> linear_interp1d(x[:, None], xp, fp3.T)
array([[10. , 15. , 10. ],
[11.25, 14.75, 12.5 ],
[15. , 14. , 20. ],
[15. , 13.5 , 30. ],
[20. , 12. , 10. ],
[50. , 11.5 , 5. ]])
The script linear_interp1d_demo.py
in the examples
directory
provides a demonstration of the use of linear_interp1d
to interpolate
a parametric curve in two dimensions using arclength as the parameter.
It generates the following plot:
backlash(x, deadband, initial)
, a gufunc with signature (n),(),()->(n)
,
computes the "backlash" response of a signal; see the Wikipedia article
Backlash (engineering).
The function emulates the
backlash block
of Matlab's Simulink library.
For example,
>>> import numpy as np
>>> from ufunclab import backlash
>>> x = np.array([0, 0.5, 1, 1.1, 1.0, 1.5, 1.4, 1.2, 0.5])
>>> deadband = 0.4
>>> initial = 0
>>> backlash(x, deadband, initial)
array([0. , 0.3, 0.8, 0.9, 0.9, 1.3, 1.3, 1.3, 0.7])
The script backlash_demo.py
in the examples
directory generates
the plot
backlash_sum
computes the linear combination of m
backlash
operations.
This is known as the Prandtl-Ishlinskii hysteresis model. The function is a gufunc
with signature (n),(m),(m),(m)->(n),(m)
. The second return value is the final value of
each of the backlash processes.
For example,
>>> import numpy as np
>>> from ufunclab import backlash_sum
>>> x = np.array([0.0, 0.2, 0.5, 1.1, 1.25, 1.0, 0.2, -1])
Here the weights are all the same and happen to sum to 1, but that
is not required in general.
>>> w = np.array([0.25, 0.25, 0.25, 0.25])
>>> deadband = np.array([0.2, 0.4, 0.6, 0.8])
>>> initial = np.zeros(4)
>>> y, final = backlash_sum(x, w, deadband, initial)
>>> y
array([ 0. , 0.025 , 0.25 , 0.85 , 1. , 0.9875, 0.45 , -0.75 ])
Another example is in the script backlash_sum_demo.py
in the examples
directory. It passes two cycles of a sine wave with amplitude 2 through a
Prandtl-Ishlinskii model with three backlash operations. The weights are
w = [0.125, 0.25, 0.25]
, the deadband values are [0.5, 1.0, 1.5]
, and
the initial values are all zero.
hysteresis_relay(x, low_threshold, high_threshold, low_value, high_value, init)
,
a gufunc with signature (n),(),(),(),(),()->(n)
, passes x
through a relay with
hysteresis (like a Schmitt trigger).
The function is similar to the
relay block
of Matlab's Simulink library.
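For example, assuming the usual Schmitt-trigger behavior (the output switches to high_value when x rises above high_threshold, switches to low_value when x falls below low_threshold, and otherwise holds its previous value, starting from init), we expect:
>>> import numpy as np
>>> from ufunclab import hysteresis_relay
>>> x = np.array([0.0, 0.3, 0.6, 0.4, 0.1, 0.3, 0.7])
>>> hysteresis_relay(x, 0.2, 0.5, 0.0, 1.0, 0.0)
array([0., 0., 1., 1., 0., 0., 1.])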
The script hysteresis_relay_demo.py
in the examples
directory generates
the plot
sosfilter(sos, x)
is a gufunc with signature (m,6),(n)->(n)
.
The function applies a discrete time linear filter to the input
array x
. The array sos
with shape (m,6)
represents the
linear filter using the second order sections format.
The function is like scipy.signal.sosfilt
, but this version does
not accept the zi
parameter. See sosfilter_ic
for a function
that accepts zi
.
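For example, the single second order section [0.5, 0.5, 0, 1, 0, 0] implements the two-term moving average y[n] = 0.5*x[n] + 0.5*x[n-1] (with zero initial state assumed), so we expect:
>>> import numpy as np
>>> from ufunclab import sosfilter
>>> sos = np.array([[0.5, 0.5, 0.0, 1.0, 0.0, 0.0]])
>>> x = np.array([1.0, 2.0, 3.0, 4.0])
>>> sosfilter(sos, x)
array([0.5, 1.5, 2.5, 3.5])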
The script sosfilter_demo.py
in the examples
directory generates
the plot
sosfilter_ic(sos, x, zi)
is a gufunc with signature
(m,6),(n),(m,2)->(n),(m,2)
. Like sosfilter
, the function applies
a discrete time linear filter to the input array x
. The array sos
with shape (m,6)
represents the linear filter using the second order
sections format.
This function is like scipy.signal.sosfilt
, but for sosfilter_ic
,
the zi
parameter is required. Also, because sosfilter_ic
is a gufunc,
it uses the gufunc rules for broadcasting. scipy.signal.sosfilt
handles
broadcasting of the zi
parameter differently.
sosfilter_ic_contig(sos, x, zi)
is a gufunc with signature
(m,6),(n),(m,2)->(n),(m,2)
. This function has the same inputs and
performs the same calculation as sosfilter_ic
, but it assumes that the
array inputs are all C-contiguous. It does not verify this; if an array
input is not C-contiguous, the results will be incorrect, and the program
might crash.
multivariate_logbeta(x)
is a gufunc with signature (n)->()
that computes
the logarithm of the multivariate beta function.
>>> import numpy as np
>>> from ufunclab import multivariate_logbeta
>>> x = np.array([1, 2.5, 7.25, 3])
>>> multivariate_logbeta(x)
-13.87374699005739
Compare to
>>> from scipy.special import gammaln
>>> gammaln(x).sum() - gammaln(x.sum())
-13.87374699005739
bincount(x, m=None, weights=None, out=None, axis=-1)
is a Python function
that wraps two gufuncs, one with shape signature (n)->(m)
(no weights) and
the other with shape signature (n),(n)->(m)
. The function is like
np.bincount
, but it accepts n-dimensional arrays. When the input has more
than one dimension, the operation is applied along the given axis.
>>> import numpy as np
>>> from ufunclab import bincount
Create an array to work with. x
is an array with shape (3, 12)
.
>>> rng = np.random.default_rng(121263137472525314065)
>>> x = rng.integers(0, 8, size=(3, 12))
>>> x
array([[7, 0, 5, 0, 2, 7, 7, 3, 0, 3, 4, 5],
[2, 6, 7, 1, 3, 0, 6, 1, 2, 0, 0, 6],
[0, 6, 1, 5, 2, 1, 4, 2, 6, 4, 2, 6]])
By default, bincount
operates along the last axis. The default
value of m
is one more than maximum value in x
, so in this case
the output length of the counts will be 8. That is, the output
array will have shape (3, 8)
.
>>> bincount(x)
array([[3, 0, 1, 2, 1, 2, 0, 3],
[3, 2, 2, 1, 0, 0, 3, 1],
[1, 2, 3, 0, 2, 1, 3, 0]], dtype=uint64)
If we given a value for m
that is larger than 8, the final values
will be 0.
>>> bincount(x, 10)
array([[3, 0, 1, 2, 1, 2, 0, 3, 0, 0],
[3, 2, 2, 1, 0, 0, 3, 1, 0, 0],
[1, 2, 3, 0, 2, 1, 3, 0, 0, 0]], dtype=uint64)
If the given value of m
is smaller than np.max(x) + 1
, the values
greater than or equal to m
are ignored.
>>> bincount(x, 4)
array([[3, 0, 1, 2],
[3, 2, 2, 1],
[1, 2, 3, 0]], dtype=uint64)
The axis
parameter selects the axis of x
along which bincount
is applied. In the following example, since x
has shape (3, 12)
,
the output has shape (8, 12)
when axis=0
is given.
>>> bincount(x, axis=0)
array([[1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0],
[0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 2, 0, 0, 1, 1, 0, 1, 0],
[0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0],
[0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 2, 0, 0, 0, 0, 1, 0, 1, 0, 0, 2],
[1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0]], dtype=uint64)
Some examples with weights.
>>> x = np.array([3, 4, 5, 1, 1, 0, 4])
>>> w = np.array([1.0, 0.25, 1.5, 0.5, 0.75, 1.0, 1.5])
>>> bincount(x, weights=w)
array([1. , 1.25, 0. , 1. , 1.75, 1.5 ])
>>> x = np.array([[1, 0, 2, 2],
... [0, 0, 0, 2]])
>>> w = np.array([0.25, 0.75, 0.75, 0.5])
>>> bincount(x, weights=w)
array([[0.75, 0.25, 1.25],
[1.75, 0. , 0.5 ]])
The weights
array can be integer, float, double, complex float or
complex double.
>>> x = np.array([[1, 0, 2, 2, 1, 4],
... [0, 0, 0, 2, 3, 3]])
>>> w = np.array([0.25-1j, 0.75+3j, 0.75+0.5j, 0.5+1j, 1.0, -3j],
... dtype=np.complex64)
>>> bincount(x, weights=w)
array([[0.75+3.j , 1.25-1.j , 1.25+1.5j, 0. +0.j , 0. -3.j ],
[1.75+2.5j, 0. +0.j , 0.5 +1.j , 1. -3.j , 0. +0.j ]],
dtype=complex64)
convert_to_base(k, base, ndigits, out=None, axis=-1)
is a Python function
that wraps a gufunc. The function converts an integer to a given base, using
ndigits
"digits". The output "digits" are the coefficients of powers of
base
that sum to k
.
A gufunc cannot accept an argument such as ndigits
that determines the
size of one of the outputs dimensions. This function is a wrapper of a
gufunc with signature (),()->(n)
. The wrapper interprets the inputs to
produce an out
array of the appropriate shape that is passed to the gufunc.
>>> import numpy as np
>>> from ufunclab import convert_to_base
>>> convert_to_base(1249, 8, ndigits=4)
array([1, 4, 3, 2])
That result follows from 1249 = 1*8**0 + 4*8**1 + 3*8**2 + 2*8**3.
Broadcasting applies to `k` and `base`:
>>> x = np.array([10, 24, 85]) # shape is (3,)
>>> base = np.array([[8], [16]]) # shape is (2, 1)
>>> convert_to_base(x, base, ndigits=4) # output shape is (2, 3, 4)
array([[[ 2, 1, 0, 0],
[ 0, 3, 0, 0],
[ 5, 2, 1, 0]],
[[10, 0, 0, 0],
[ 8, 1, 0, 0],
[ 5, 5, 0, 0]]])
gendot
creates a new gufunc (with signature (n),(n)->()
) that is
the composition of two ufuncs. The first ufunc must be an element-wise
ufunc with two inputs and one output. The second must be either another
element-wise ufunc with two inputs and one output, or a gufunc with
signature (n)->()
.
The name gendot
is from "generalized dot product". The standard
dot product is the composition of element-wise multiplication and
reduction with addition. The prodfunc
and sumfunc
arguments of
gendot
take the place of multiplication and addition.
For example, to take the element-wise minimum of two 1-d arrays, and then take the maximum of the result:
>>> import numpy as np
>>> from ufunclab import gendot
>>> minmaxdot = gendot(np.minimum, np.maximum)
>>> a = np.array([1.0, 2.5, 0.3, 1.9, 3.0, 1.8])
>>> b = np.array([0.5, 1.1, 0.9, 2.1, 0.3, 3.0])
>>> minmaxdot(a, b)
1.9
minmaxdot
is a gufunc with signature (n),(n)->()
; the type
signatures of the gufunc loop functions were derived by matching
the signatures of the ufunc loop functions for np.minimum
and
np.maximum
:
>>> minmaxdot.signature
'(n),(n)->()'
>>> print(minmaxdot.types)
['??->?', 'bb->b', 'BB->B', 'hh->h', 'HH->H', 'ii->i', 'II->I', 'll->l',
'LL->L', 'qq->q', 'QQ->Q', 'ee->e', 'ff->f', 'dd->d', 'gg->g', 'FF->F',
'DD->D', 'GG->G', 'mm->m', 'MM->M']
gendot
is experimental, and might not be useful in many applications.
We could do the same calculation as minmaxdot
with, for example,
np.maximum.reduce(np.minimum(a, b))
, and in fact, the pure NumPy
version is faster than minmaxdot(a, b)
for large (and even moderately
sized) 1-d arrays. An advantage of the gendot
gufunc is that it does
not create an intermediate array when broadcasting takes place. For
example, with inputs x
and y
with shapes (20, 10000000)
and
(10, 1, 10000000)
, the equivalent of minmaxdot(x, y)
can be computed
with np.maximum.reduce(np.minimum(x, y), axis=-1)
, but np.minimum(x, y)
creates an array with shape (10, 20, 10000000)
. Computing the result
with minmaxdot(x, y)
does not create the temporary intermediate array.
ufunc_inspector(func)
prints information about a NumPy ufunc.
For example,
>>> import numpy as np
>>> from ufunclab import ufunc_inspector
>>> np.__version__
'1.24.0'
>>> ufunc_inspector(np.hypot)
'hypot' is a ufunc.
nin = 2, nout = 1
ntypes = 5
loop types:
0: ( 23, 23) -> 23 (ee->e) PyUFunc_ee_e_As_ff_f
1: ( 11, 11) -> 11 (ff->f) PyUFunc_ff_f
2: ( 12, 12) -> 12 (dd->d) PyUFunc_dd_d
3: ( 13, 13) -> 13 (gg->g) PyUFunc_gg_g
4: ( 17, 17) -> 17 (OO->O) PyUFunc_OO_O_method
(The output will likely change as the code develops.)
>>> ufunc_inspector(np.sqrt)
'sqrt' is a ufunc.
nin = 1, nout = 1
ntypes = 10
loop types:
0: 23 -> 23 (e->e) PyUFunc_e_e_As_f_f
1: 11 -> 11 (f->f) not generic (or not in the checked generics)
2: 12 -> 12 (d->d) not generic (or not in the checked generics)
3: 11 -> 11 (f->f) PyUFunc_f_f
4: 12 -> 12 (d->d) PyUFunc_d_d
5: 13 -> 13 (g->g) PyUFunc_g_g
6: 14 -> 14 (F->F) PyUFunc_F_F
7: 15 -> 15 (D->D) PyUFunc_D_D
8: 16 -> 16 (G->G) PyUFunc_G_G
9: 17 -> 17 (O->O) PyUFunc_O_O_method
>>> ufunc_inspector(np.add)
'add' is a ufunc.
nin = 2, nout = 1
ntypes = 22
loop types:
0: ( 0, 0) -> 0 (??->?) not generic (or not in the checked generics)
1: ( 1, 1) -> 1 (bb->b) not generic (or not in the checked generics)
2: ( 2, 2) -> 2 (BB->B) not generic (or not in the checked generics)
3: ( 3, 3) -> 3 (hh->h) not generic (or not in the checked generics)
4: ( 4, 4) -> 4 (HH->H) not generic (or not in the checked generics)
5: ( 5, 5) -> 5 (ii->i) not generic (or not in the checked generics)
6: ( 6, 6) -> 6 (II->I) not generic (or not in the checked generics)
7: ( 7, 7) -> 7 (ll->l) not generic (or not in the checked generics)
8: ( 8, 8) -> 8 (LL->L) not generic (or not in the checked generics)
9: ( 9, 9) -> 9 (qq->q) not generic (or not in the checked generics)
10: ( 10, 10) -> 10 (QQ->Q) not generic (or not in the checked generics)
11: ( 23, 23) -> 23 (ee->e) not generic (or not in the checked generics)
12: ( 11, 11) -> 11 (ff->f) not generic (or not in the checked generics)
13: ( 12, 12) -> 12 (dd->d) not generic (or not in the checked generics)
14: ( 13, 13) -> 13 (gg->g) not generic (or not in the checked generics)
15: ( 14, 14) -> 14 (FF->F) not generic (or not in the checked generics)
16: ( 15, 15) -> 15 (DD->D) not generic (or not in the checked generics)
17: ( 16, 16) -> 16 (GG->G) not generic (or not in the checked generics)
18: ( 21, 22) -> 21 (Mm->M) not generic (or not in the checked generics)
19: ( 22, 22) -> 22 (mm->m) not generic (or not in the checked generics)
20: ( 22, 21) -> 21 (mM->M) not generic (or not in the checked generics)
21: ( 17, 17) -> 17 (OO->O) PyUFunc_OO_O
Here's a collection of resources for learning about the C API for ufuncs.
- Universal functions (ufunc)
- UFunc API
- Generalized Universal Function API
- NEP 5 — Generalized Universal Functions
- NEP 20 — Expansion of Generalized Universal Function Signatures
- Universal functions,
part of the NumPy C Code Explanations
- In particular, the section "Function call" explains when the GIL is released.
- Some relevant NumPy source code, if you want to dive deep:
  - PyUFuncObject, along with related C types and macros, is defined in numpy/numpy/_core/include/numpy/ufuncobject.h.
  - PyUFunc_FromFuncAndData and PyUFunc_FromFuncAndDataAndSignatureAndIdentity are defined in the file numpy/numpy/core/src/umath/ufunc_object.c.
- Section of the SciPy Lecture Notes on ufuncs:
- Data Type API -- a handy reference.
- When implementing inner loops for many NumPy dtypes, the NumPy distutils template preprocessor was a useful tool, but NumPy's distutils module and the corresponding template processor are now deprecated; do not use them in new code. (See the "Other files" section for the syntax that would be used in, say, a C file.)