Review of codebase and docs - Probabilities and Encodings- Datseris #213

Merged Dec 25, 2022 (36 commits)
Changes from 9 commits

Commits
edf5da6
update probabilities table
Datseris Dec 22, 2022
b8a416d
CountOccurrences works with `Any` input
Datseris Dec 22, 2022
789c5d4
better terminology header
Datseris Dec 22, 2022
d6a327a
simpler headers in probabilities
Datseris Dec 22, 2022
fac8c8a
Add encodings page
Datseris Dec 22, 2022
5dfbc98
simplify SymbolicPermutation docstring
Datseris Dec 22, 2022
d57c4e6
reference complexity measures
Datseris Dec 22, 2022
7150a6e
correct dosctring to reference isrand
Datseris Dec 22, 2022
fb9f13f
more organized tests for symbolic permutat
Datseris Dec 22, 2022
6815cad
full rewrite of `SymbolicPermutation` and proper `encode` for Ordinal.
Datseris Dec 22, 2022
e8d5ade
type optimization in making the embedding
Datseris Dec 22, 2022
bb9e450
remove entropy!
Datseris Dec 22, 2022
90376c4
simplifi probabilities! even more
Datseris Dec 22, 2022
2d1f455
move fasthist to encoding folder
Datseris Dec 22, 2022
b2572bd
complete unification of symbolic perm methods
Datseris Dec 22, 2022
995fcca
docstring for weighted fversion
Datseris Dec 23, 2022
1da7c87
add docstring to amplkitude aware
Datseris Dec 23, 2022
9f56b1d
delete ALL other files
Datseris Dec 23, 2022
48143e6
fix all symbolic permutation tests
Datseris Dec 23, 2022
3dea113
fix all permutation tests (and one file only)
Datseris Dec 23, 2022
8d75e24
clarify source code of encode Gaussian
Datseris Dec 23, 2022
de128a1
better docstring for GaussEncod
Datseris Dec 23, 2022
5acdd86
simplify docstring of Dispersion
Datseris Dec 23, 2022
20867aa
more tests for naivekernel
Datseris Dec 23, 2022
772af38
Zhu -> Correa
Datseris Dec 23, 2022
c5a5cb8
shorter docstring for spatial permutation
Datseris Dec 23, 2022
17dd4e6
port spatial permutation example to Examples
Datseris Dec 23, 2022
04265cb
re-write SpatialSymb to have encoding as field. All tests pass.
Datseris Dec 24, 2022
69fadb7
better display of exampels in decode
Datseris Dec 24, 2022
17f9a17
better doc for ordinal encoding
Datseris Dec 24, 2022
a1c9c65
Some typos/nitpickery
kahaaga Dec 25, 2022
e78231a
Probabilities can't compute.
kahaaga Dec 25, 2022
3c54910
Don't duplicate `SpatialDispersion`
kahaaga Dec 25, 2022
da35469
Clarify docstrings a bit
kahaaga Dec 25, 2022
99131b7
Typo
kahaaga Dec 25, 2022
7f77993
Cross-reference spatial estimators
kahaaga Dec 25, 2022
1 change: 0 additions & 1 deletion docs/Project.toml
@@ -2,7 +2,6 @@
CairoMakie = "13f3f980-e62b-5c42-98c6-ff1f3baf88f0"
ChaosTools = "608a59af-f2a3-5ad4-90b4-758bdf3122a7"
CoordinateTransformations = "150eb455-5306-5404-9cee-2592286d6298"
DelayEmbeddings = "5732040d-69e3-5649-938a-b6b4f237613f"
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
DocumenterTools = "35a29f4d-8980-5a13-9543-d66fff28ecb8"
3 changes: 2 additions & 1 deletion docs/make.jl
@@ -2,10 +2,10 @@ cd(@__DIR__)
using Pkg
CI = get(ENV, "CI", nothing) == "true" || get(ENV, "GITHUB_TOKEN", nothing) !== nothing
using Entropies
using DelayEmbeddings
using Documenter
using DocumenterTools: Themes
using CairoMakie
using Entropies.DelayEmbeddings
import Entropies.Wavelets

# %% JuliaDynamics theme
@@ -35,6 +35,7 @@ ENV["JULIA_DEBUG"] = "Documenter"
ENTROPIES_PAGES = [
"index.md",
"probabilities.md",
"encodings.md",
"entropies.md",
"complexity.md",
"multiscale.md",
2 changes: 1 addition & 1 deletion docs/src/devdocs.md
@@ -11,7 +11,7 @@ Good practices in developing a code base apply in every Pull Request. The [Good
5. If suitable, the estimator may be able to operate based on [`Encoding`]s. If so, it is preferred to implement an `Encoding` subtype and extend the methods [`encode`](@ref) and [`decode`](@ref). This will allow your probabilities estimator to be used with a larger span of entropy and complexity methods without additional effort.
6. Implement dispatch for [`probabilities_and_outcomes`](@ref) and your probabilities estimator type.
7. Implement dispatch for [`outcome_space`](@ref) and your probabilities estimator type.
8. Add your probabilities estimator type to the list in the docstring of [`ProbabilitiyEstimator`](@ref), and if you also made an encoding, add it to the [`Encoding`](@ref) docstring.
8. Add your probabilities estimator type to the table list in the documentation page of probabilities. If you made an encoding, also add it to corresponding table in the encodings section.

### Optional steps
You may extend any of the following functions if there are potential performance benefits in doing so:
20 changes: 20 additions & 0 deletions docs/src/encodings.md
@@ -0,0 +1,20 @@
# Encodings

## Encoding API

Some probability estimators first "encode" input data into an intermediate representation indexed by the positive integers. This intermediate representation is called an "encoding" and its API is defined by the following:

```@docs
Encoding
encode
decode
```

## Available encodings

```@docs
OrdinalPatternEncoding
GaussianCDFEncoding
RectangularBinEncoding
```
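
Note on the encoding API above: the `OrdinalPatternEncoding` docstring in this PR says that permutation patterns are mapped to integers via the Lehmer code. As a rough illustration of what an `encode`/`decode` pair of that kind can look like, here is a minimal sketch in Base Julia. The names `lehmer_encode` and `lehmer_decode` are hypothetical and are not part of Entropies.jl, and the package's actual conventions (ordering, tie handling) may differ.

```julia
# Hypothetical helpers for illustration only; not the Entropies.jl implementation.

# Map a permutation of 1:m to an integer in 1:factorial(m) via its Lehmer code.
function lehmer_encode(perm::AbstractVector{<:Integer})
    m = length(perm)
    code = 0
    for i in 1:m
        # Number of later entries that are smaller than perm[i].
        smaller = count(j -> perm[j] < perm[i], (i + 1):m)
        code += smaller * factorial(m - i)
    end
    return code + 1  # shift so that symbols are positive integers
end

# Invert `lehmer_encode`: recover the permutation of 1:m from its integer symbol.
function lehmer_decode(s::Integer, m::Integer)
    code = s - 1
    available = collect(1:m)          # entries not yet placed
    perm = Vector{Int}(undef, m)
    for i in 1:m
        digit, code = divrem(code, factorial(m - i))
        perm[i] = popat!(available, digit + 1)
    end
    return perm
end

s = lehmer_encode([3, 1, 2])   # s == 5
p = lehmer_decode(s, 3)        # p == [3, 1, 2]
```

With this construction the integers `1:factorial(m)` enumerate the permutations of `1:m` in lexicographic order, so `[1, 2, 3]` maps to `1` and `[3, 2, 1]` maps to `6`.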

33 changes: 18 additions & 15 deletions docs/src/index.md
@@ -8,23 +8,22 @@ Entropies
You are reading the development version of the documentation of Entropies.jl,
that will become version 2.0.

## API & terminology
## Terminology

!!! note
The documentation here follows (loosely) chapter 5 of
[Nonlinear Dynamics](https://link.springer.com/book/10.1007/978-3-030-91032-7),
Datseris & Parlitz, Springer 2022.

In the literature, the term "entropy" is used (and abused) in multiple contexts.
The API and documentation of Entropies.jl aim to clarify some aspects of its usage, and
to provide a simple way to obtain probabilities, entropies, or other complexity measures.
The API and documentation of Entropies.jl aim to clarify some aspects of its usage, and to provide a simple way to obtain probabilities, entropies, or other complexity measures.

### Probabilities

Entropies and other complexity measures are typically computed based on _probability distributions_.
These are obtained from [Input data for Entropies.jl](@ref) in a plethora of different ways.
The central API function that returns a probability distribution (in fact, just a vector of probabilities) is [`probabilities`](@ref), which takes in a subtype of [`ProbabilitiesEstimator`](@ref) to specify how the probabilities are computed.
All estimators available in Entropies.jl can be found in the [estimators page](@ref probabilities_estimators).
These can be obtained from input data in a plethora of different ways.
The central API function that returns a probability distribution (or more precisely a probability mass function) is [`probabilities`](@ref), which takes in a subtype of [`ProbabilitiesEstimator`](@ref) to specify how the probabilities are computed.
All available estimators can be found in the [estimators page](@ref probabilities_estimators).

### Entropies

@@ -40,24 +39,28 @@ Thus, any of the implemented [probabilities estimators](@ref probabilities_estim

These names are commonplace, and so in Entropies.jl we provide convenience functions like [`entropy_wavelet`](@ref). However, it should be noted that these functions really aren't anything more than 2-lines-of-code wrappers that call [`entropy`](@ref) with the appropriate [`ProbabilitiesEstimator`](@ref).

In addition to `ProbabilitiesEstimators`, we also provide [`EntropyEstimator`](@ref)s,
which compute entropies via alternate means, without explicitly computing some
In addition to `ProbabilitiesEstimators`, we also provide [`EntropyEstimator`](@ref)s,
which compute entropies via alternate means, without explicitly computing some
probability distribution. Differential/continuous entropy, for example, is computed
using a dedicated [`EntropyEstimator`](@ref). For example, the [`Kraskov`](@ref)
estimator computes Shannon differential entropy via a nearest neighbor algorithm, while
using a dedicated [`EntropyEstimator`](@ref). For example, the [`Kraskov`](@ref)
estimator computes Shannon differential entropy via a nearest neighbor algorithm, while
the [`Zhu`](@ref) estimator computes Shannon differential entropy using order statistics.

### Other complexity measures

Other complexity measures, which strictly speaking don't compute entropies, and may or may
not explicitly compute probability distributions, are found in
[Complexity.jl](https://github.com/JuliaDynamics/Complexity.jl) package. This includes
measures like sample entropy and approximate entropy.
Other complexity measures, which strictly speaking don't compute entropies, and may or may not explicitly compute probability distributions, are found in
[Complexity measures](@ref) page.
This includes measures like sample entropy and approximate entropy.

## [Input data for Entropies.jl](@id input_data)

The input data type typically depend on the probability estimator chosen. In general though, the standard DynamicalSystems.jl approach is taken and as such we have three types of input data:
The input data type typically depend on the probability estimator chosen.
In general though, the standard DynamicalSystems.jl approach is taken and as such we have three types of input data:

- _Timeseries_, which are `AbstractVector{<:Real}`, used in e.g. with [`WaveletOverlap`](@ref).
- _Multi-dimensional timeseries, or datasets, or state space sets_, which are [`Dataset`](@ref), used e.g. with [`NaiveKernel`](@ref).
- _Spatial data_, which are higher dimensional standard `Array`s, used e.g. with [`SpatialSymbolicPermutation`](@ref).

```@docs
Dataset
```
40 changes: 26 additions & 14 deletions docs/src/probabilities.md
@@ -1,4 +1,4 @@
# [Probabilities](@id probabilities_estimators)
# Probabilities

## Probabilities API

@@ -8,51 +8,63 @@ The probabilities API is defined by
- [`probabilities`](@ref)
- [`probabilities_and_outcomes`](@ref)

and related functions that you will find in the following documentation blocks:

### Probabilities

```@docs
ProbabilitiesEstimator
probabilities
probabilities!
Probabilities
```

### Outcomes

```@docs
probabilities_and_outcomes
outcomes
outcome_space
total_outcomes
missing_outcomes
```

## Overview
## [Overview of probabilities estimators](@id probabilities_estimators)

Any of the following estimators can be used with [`probabilities`](@ref).
Any of the following estimators can be used with [`probabilities`](@ref)
(in the column "input data" it is assumed that the `eltype` of the input is `<: Real`).

| Estimator | Principle | Input data |
| ------------------------------------------- | --------------------------- | ------------------- |
| [`CountOccurrences`](@ref) | Frequencies | `Vector`, `Dataset` |
|:--------------------------------------------|:----------------------------|:--------------------|
| [`CountOccurrences`](@ref) | Count of unique elements | `Any` |
| [`ValueHistogram`](@ref) | Binning (histogram) | `Vector`, `Dataset` |
| [`TransferOperator`](@ref) | Binning (transfer operator) | `Vector`, `Dataset` |
| [`NaiveKernel`](@ref) | Kernel density estimation | `Dataset` |
| [`SymbolicPermutation`](@ref) | Ordinal patterns | `Vector` |
| [`SymbolicWeightedPermutation`](@ref) | Ordinal patterns | `Vector` |
| [`SymbolicAmplitudeAwarePermutation`](@ref) | Ordinal patterns | `Vector` |
| [`SymbolicPermutation`](@ref) | Ordinal patterns | `Vector`, `Dataset` |
| [`SymbolicWeightedPermutation`](@ref) | Ordinal patterns | `Vector`, `Dataset` |
| [`SymbolicAmplitudeAwarePermutation`](@ref) | Ordinal patterns | `Vector`, `Dataset` |
| [`SpatialSymbolicPermutation`](@ref) | Ordinal patterns in space | `Array` |
| [`Dispersion`](@ref) | Dispersion patterns | `Vector` |
| [`SpatialDispersion`](@ref) | Dispersion patterns in space | `Array` |
| [`Diversity`](@ref) | Cosine similarity | `Vector` |
| [`WaveletOverlap`](@ref) | Wavelet transform | `Vector` |
| [`PowerSpectrum`](@ref) | Fourier spectra | `Vector`, `Dataset` |
| [`PowerSpectrum`](@ref) | Fourier transform | `Vector` |

## Count occurrences (counting)
## Count occurrences

```@docs
CountOccurrences
```

## Visitation frequency (histograms)
## Histograms

```@docs
ValueHistogram
RectangularBinning
FixedRectangularBinning
```

## Permutation (symbolic)
## Symbolic permutations

```@docs
SymbolicPermutation
@@ -61,14 +73,14 @@ SymbolicAmplitudeAwarePermutation
SpatialSymbolicPermutation
```

## Dispersion (symbolic)
## Dispersion patterns

```@docs
Dispersion
SpatialDispersion
```

## Transfer operator (binning)
## Transfer operator

```@docs
TransferOperator
5 changes: 1 addition & 4 deletions src/encoding/ordinal_pattern.jl
@@ -1,13 +1,10 @@
using StaticArrays: MVector
using StateSpaceSets: AbstractDataset

export OrdinalPatternEncoding
#TODO: The docstring here, and probably the source code, needs a full re-write
# based on new `encode` interface.

"""
OrdinalPatternEncoding <: Encoding
OrdinalPatternEncoding(; m::Int, lt = est.lt)
OrdinalPatternEncoding(; m::Int, lt = Entropies.isless_rand)

An encoding scheme that [`encode`](@ref)s `m`-dimensional permutation/ordinal patterns to
integers and [`decode`](@ref)s these integers to permutation patterns based on the Lehmer
18 changes: 3 additions & 15 deletions src/probabilities.jl
@@ -54,7 +54,7 @@ assigning to each outcome ``\\omega_i`` a probability ``p(\\omega_i)``, such tha
1. Define ``\\Omega``, the "outcome space", which is the set of all possible outcomes over
which probabilities are estimated. The cardinality of this set can be obtained using
[`total_outcomes`](@ref).
2. Define how probabilities ``p_i = p(\\omega_i)` are assigned to outcomes ``\\omega_i``.
2. Define how probabilities ``p_i = p(\\omega_i)`` are assigned to outcomes ``\\omega_i``.

In practice, probability estimation is done by calling [`probabilities`](@ref) with some
input data and one of the following probabilities estimators. The result is a
@@ -72,20 +72,8 @@ outcome space when instantiated. For some estimators this means that the input d
`x` must be provided both when instantiating the estimator, but also when computing
the probabilities.

All currently implemented probability estimators are:

- [`CountOccurrences`](@ref).
- [`ValueHistogram`](@ref).
- [`TransferOperator`](@ref).
- [`Dispersion`](@ref).
- [`SpatialDispersion`](@ref).
- [`WaveletOverlap`](@ref).
- [`PowerSpectrum`](@ref).
- [`SymbolicPermutation`](@ref).
- [`SymbolicWeightedPermutation`](@ref).
- [`SymbolicAmplitudeAwarePermutation`](@ref).
- [`SpatialSymbolicPermutation`](@ref).
- [`NaiveKernel`](@ref).
All currently implemented probability estimators are listed in a nice table in the
[probabilities estimators](@ref probabilities_estimators) section of the online documentation.
"""
abstract type ProbabilitiesEstimator end

6 changes: 3 additions & 3 deletions src/probabilities_estimators/counting/count_occurences.jl
@@ -15,7 +15,7 @@ struct CountOccurrences{X} <: ProbabilitiesEstimator
x::X
end

function probabilities_and_outcomes(::CountOccurrences, x::Array_or_Dataset)
function probabilities_and_outcomes(::CountOccurrences, x)
z = copy(x)
probs = Probabilities(fasthist!(z))
# notice that `z` is now sorted within `fasthist!` so we can skip sorting
@@ -24,8 +24,8 @@ end

outcome_space(est::CountOccurrences) = sort!(unique(est.x))

probabilities(::CountOccurrences, x::Array_or_Dataset) = probabilities(x)
function probabilities(x::Array_or_Dataset)
probabilities(::CountOccurrences, x) = probabilities(x)
function probabilities(x)
# Fast histograms code is in the `histograms` folder
return Probabilities(fasthist!(copy(x)))
end
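
Note on the hunk above: the change drops the `Array_or_Dataset` restriction so that counting works on any input. The idea of counting occurrences (copy the input, sort it, and turn the counts of equal elements into relative frequencies) can be sketched in Base Julia as below. The function `count_probabilities` is a hypothetical stand-in and not the package's `fasthist!`; it also assumes the elements can be sorted with `isless`.

```julia
# Hypothetical helper for illustration only; independent of Entropies.jl.
# Returns relative frequencies of the unique elements of `x`, plus the
# sorted unique elements themselves.
function count_probabilities(x)
    z = sort(collect(x))          # sorted copy, the input is left untouched
    outcomes = unique(z)          # unique elements, already in sorted order
    counts = [count(==(o), z) for o in outcomes]
    return counts ./ length(z), outcomes
end

probs, outs = count_probabilities([3, 1, 3, 2, 3, 1])
# outs == [1, 2, 3] and probs ≈ [1/3, 1/6, 1/2]
```

Because nothing here depends on the element type beyond sorting and equality, the same call also works on, for example, a vector of strings.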
@@ -7,60 +7,52 @@ export SymbolicPermutation

A probabilities estimator based on ordinal permutation patterns.

The quantity computed depends on the input data:
When passed to [`probabilities`](@ref) the output depends on the input data type:

- **Univariate data**. If applied to a univariate time series, then the time series
- **Univariate data**. If applied to a univariate timeseries (`Vector`), then the timeseries
is first embedded using embedding delay `τ` and dimension `m`, resulting in embedding
vectors ``\\{ \\bf{x}_i \\}_{i=1}^{N-(m-1)\\tau}``. Then, for each ``\\bf{x}_i``,
we find its permutation pattern ``\\pi_{i}``, which we internally encode a an integer
``s_i \\in \\mathbb{N}^+`` for efficient computation (integer symbols are obtained by
using [`encode`](@ref) with [`OrdinalPatternEncoding`](@ref)).
Probabilities are then
estimated as naive frequencies over the encoded permutation symbols
``\\{ s_i \\}_{i=1}^{N-(m-1)\\tau}`` by using [`CountOccurrences`](@ref).
The resulting probabilities can be used to compute permutation entropy (PE;
Bandt & Pompe, 2002[^BandtPompe2002]).
we find its permutation pattern ``\\pi_{i}``. Probabilities are then
estimated as the frequencies of the encoded permutation symbols
by using [`CountOccurrences`](@ref). The resulting probabilities, when given to
[`entropy`](@ref), compute the original permutation entropy[^BandtPompe2002].
- **Multivariate data**. If applied to a an `D`-dimensional `Dataset`,
then it is assumed that the input data represents ``N`` observations of a multivariate
system ``\\{ \\bf{x}_i \\}_{i=1}^N``, and no embedding is constructed.
For each ``\\bf{x}_i \\in \\mathbb{R}^D``, we direct find its permutation pattern
``\\pi_{i}`` and encode it as ``s_i \\in \\mathbb{N}^+`` (i.e. `est.τ` and `est.m` are
ignored, and we set `m = D` instead). Finally, probabilities are estimated as relative
frequencies of occurrences of the encoded permutation symbols.
then no embedding is constructed. For each vector ``\\bf{x}_i`` of the dataset,
we directly map it to its permutation pattern ``\\pi_{i}`` by comparing the
elements in the vector. Like above, probabilities are estimated as the
frequencies of the permutation symbols. In this case, `m` is ignored,
but `m` must still match the dimension of the dataset for optimization.
The resulting probabilities can be used to compute multivariate permutation
entropy (MvPE; He et al., 2016[^He2016]), but here we don't perform any subdivision
of the permutation patterns (see Figure 3 in He et al., 2016).
entropy[^He2016], although here we don't perform any further subdivision
of the permutation patterns (as in Figure 3 of[^He2016]).

Internally, [`SymbolicPermutation`](@ref) uses the [`OrdinalPatternEncoding`](@ref)
to represent ordinal patterns as integers for efficient computations.

## Outcome space

The outcome space `Ω` for `SymbolicPermutation` is the set of length-`m` ordinal
patterns (i.e. permutations) that can be formed by the integers `1, 2, …, m`,
ordered lexicographically. There are `factorial(m)` such patterns.

For example, the outcome `[3, 1, 2]` corresponds to the ordinal pattern of having
first the largest value, then the lowest value, and then the value in between.

## In-place symbolization

`SymbolicPermutation` also implements the in-place [`entropy!`](@ref) and
[`probabilities!`](@ref). The length of the pre-allocated symbol vector must match the
length of the embedding: `N - (m-1)τ` for univariate time series, and `M` for length-`M`
`Dataset`s), i.e.
length of the embedding: `N - (m-1)τ` for univariate timeseries, and `M` for length-`M`
`Dataset`s). For example

```julia
using DelayEmbeddings, Entropies
m, τ, N = 2, 1, 100
est = SymbolicPermutation(; m, τ)

# For a time series
x_ts = rand(N)
πs_ts = zeros(Int, N - (m - 1)*τ)
x_ts = rand(N) # timeseries example
πs_ts = zeros(Int, N - (m - 1)*τ) # length must match length of delay embedding
p = probabilities!(πs_ts, est, x_ts)
h = entropy!(πs_ts, Renyi(), est, x_ts)

# For a pre-discretized `Dataset`
x_symb = outcomes(x_ts, OrdinalPatternEncoding(m = 2, τ = 1))
x_d = genembed(x_symb, (0, -1, -2))
πs_d = zeros(Int, length(x_d))
p = probabilities!(πs_d, est, x_d)
h = entropy!(πs_d, Renyi(), est, x_d)
```

See [`SymbolicWeightedPermutation`](@ref) and [`SymbolicAmplitudeAwarePermutation`](@ref)
@@ -76,9 +68,9 @@ information about within-state-vector amplitudes.
`lt = Base.isless`).

[^BandtPompe2002]: Bandt, Christoph, and Bernd Pompe. "Permutation entropy: a natural
complexity measure for time series." Physical review letters 88.17 (2002): 174102.
complexity measure for timeseries." Physical review letters 88.17 (2002): 174102.
[^Zunino2017]: Zunino, L., Olivares, F., Scholkmann, F., & Rosso, O. A. (2017).
Permutation entropy based time series analysis: Equalities in the input signal can
Permutation entropy based timeseries analysis: Equalities in the input signal can
lead to false conclusions. Physics Letters A, 381(22), 1883-1892.
[^He2016]:
He, S., Sun, K., & Wang, H. (2016). Multivariate permutation entropy and its
@@ -101,7 +93,7 @@ end

function probabilities!(πs::AbstractVector{Int}, est::SymbolicPermutation, x::Vector_or_Dataset)
encodings_from_permutations!(πs, est, x)
probabilities(πs)
return probabilities(πs)
end

function probabilities_and_outcomes(est::SymbolicPermutation, x::Vector_or_Dataset)
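
Note on the `SymbolicPermutation` docstring above: for univariate input it describes a delay embedding with delay `τ` and dimension `m`, which yields `N - (m-1)τ` embedding vectors, followed by counting the permutation pattern of each vector. A minimal Base Julia sketch of that pipeline is shown below. The function `ordinal_pattern_probabilities` is hypothetical, it uses `sortperm` as the pattern (one common convention, assumed here), and it breaks ties by position rather than with the `isless_rand` default mentioned in the diff.

```julia
# Hypothetical sketch for illustration only; not the SymbolicPermutation source.
function ordinal_pattern_probabilities(x::AbstractVector{<:Real}; m::Int = 3, τ::Int = 1)
    n = length(x) - (m - 1) * τ            # number of embedding vectors
    n > 0 || throw(ArgumentError("timeseries too short for the chosen m and τ"))
    counts = Dict{Vector{Int}, Int}()
    for i in 1:n
        window = x[i:τ:(i + (m - 1) * τ)]  # delay vector of length m starting at x[i]
        pattern = sortperm(window)         # indices that sort the window
        counts[pattern] = get(counts, pattern, 0) + 1
    end
    # Relative frequency of each observed pattern.
    return Dict(p => c / n for (p, c) in counts)
end

probs = ordinal_pattern_probabilities([1.0, 2.0, 3.0, 2.0, 1.0]; m = 2, τ = 1)
# Two rising and two falling windows: probs[[1, 2]] == 0.5 and probs[[2, 1]] == 0.5
```

Only patterns that actually occur get an entry here, whereas the outcome space described in the docstring contains all `factorial(m)` patterns.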