
Design discussion #178

Closed
kahaaga opened this issue Dec 18, 2022 · 10 comments
Labels
design-discussion Decisions must be made

Comments

@kahaaga
Member

kahaaga commented Dec 18, 2022

@Datseris, I decided to answer your comment in Entropies.jl here instead, because the discussion doesn't really apply to Entropies.jl, where the API is much simpler.

mutualinfo(def::MITsallisTypeA, measure::MITsallis, est::SomeEstimator, x, y)
off-topic, but I think that this is too much for a user. I wouldn't define MIType at all and just use different functions altogether. Having three different types to decide the "algorithm" used is probably a good hint that the function name is where the specialization should occur.

Actually, in the CausalityTools v2 branch, at the moment, all information theoretic measures are implemented using a single function

estimate([definition::InfoMeasureDefinition], measure::InformationMeasure, est, args...)

The functions mutualinfo, entropy_relative, conditional_mutualinfo, etc., are just wrappers around estimate. The wrappers can of course be tweaked to any level of simplicity for the sake of the user-exposed API, e.g.

  • mutualinfo(m::MITsallis, est, x, y) = estimate(DefaultTsallisMIType(), m, est, x, y).
  • mutualinfo_tsallis(est, x, y) = estimate(MITsallisType1(), MITsallis(), est, x, y),
  • or any other variant, depending on what we want the API to look like.

Example

The nice thing about the underlying estimate is that it is trivial to compute higher-level info measures in terms of the lower-level ones, because they all use the same estimate function, with the same signature (definition, measure, estimator, input_data...).

Say we want to estimate differential/continuous Shannon conditional mutual information (CMI). There are many ways of defining Shannon CMI: as a sum of entropy terms, as a sum of mutual information terms, as a KL divergence, or in other ways. Thus, EntropyEstimators, MutualInformationEstimators or KlDivergenceEstimators are all valid estimators for the same quantity. We therefore define:

  • estimate(def::CMI4H, measure::CMIShannon, est, x, y, z) computes Shannon CMI as a sum of four entropies.
  • estimate(def::CMI2MI, measure::CMIShannon, est, x, y, z) computes Shannon CMI as a sum of two mutual informations.
  • estimate(def::CMIKLDiv, measure::CMIShannon, est, x, y, z) computes Shannon CMI as a divergence.
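
Below is a hedged sketch (not actual CausalityTools code) of how two of these definitions could be written on top of lower-level estimate calls. The (definition, measure, estimator, data...) signature is taken from above; MIShannon, MI2H, Shannon(base = ...) and the use of Dataset to form joint variables are assumptions made purely for illustration.

# I(X; Z | Y) = I(X; (Y, Z)) - I(X; Y), by the chain rule for Shannon mutual information.
# `MIShannon`/`MI2H` are hypothetical lower-level measure/definition names; `Dataset(y, z)` stands for the joint variable (Y, Z).
function estimate(def::CMI2MI, measure::CMIShannon, est, x, y, z)
    mi = MIShannon(base = measure.base)
    return estimate(MI2H(), mi, est, x, Dataset(y, z)) - estimate(MI2H(), mi, est, x, y)
end

# I(X; Z | Y) = H(X, Y) + H(Y, Z) - H(X, Y, Z) - H(Y), the four-entropies decomposition.
function estimate(def::CMI4H, measure::CMIShannon, est, x, y, z)
    e = Shannon(base = measure.base)
    return entropy(e, est, Dataset(x, y)) + entropy(e, est, Dataset(y, z)) -
           entropy(e, est, Dataset(x, y, z)) - entropy(e, est, y)
end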

Motivation

A common use of CMI is conditional independence testing, for example in the context of causal graphs. A popular independence test is Runge (2018)'s local permutation test for conditional independence. In CausalityTools v2, this test is implemented (roughly) as

Base.@kwdef struct LocalPermutation <: IndependenceTest
    definition::InfoMeasureDefinition = CMI4H()
    measure::InfoMeasure = CMIShannon()
    est = Kraskov(k = 10)
    nsurr::Int = 100
    # ... more test parameters
end

function independence(test::LocalPermutation, x::AbstractDataset, y::AbstractDataset, z::AbstractDataset)
    # ... create permutations of the data
    # ... compute the information measure in question from `test.definition`, `test.measure`, `test.est`
end

Say I have the data X = randn(1000); Y = X .+ randn(1000); Z = Y .+ randn(1000) (i.e. X -> Y -> Z), and I want to check how well the LocalPermutation test correctly identifies that X and Z are conditionally independent, given Y.

I can now write a single for loop to test how well the test performs, for all valid combinations of estimation methods, definitions and measures. For example, I could do

measures = [CMIShannon(), CMITsallis(q = 2), CMIRenyi(q = 1.5), CMIRenyi(q = 0.5)]
definitions = [CMI4H(), CMI2MI()]
estimators = [Kraskov(), KozachenkoLeonenko(), ZhuSingh(), ...]

for definition in definitions
    for measure in measures
        for est in estimators
            test = LocalPermutation(; definition, measure, est)  # plus any other test parameters
            result = independence(test, X, Z, Y)
        end
    end
end

Armed with this machinery, a user can essentially perform a literature-wide method sensitivity analysis for any analysis they do on their own data, using any sort of null hypothesis testing scheme, such as LocalPermutation or GlobalPermutation (i.e. traditional surrogates), with just ten-ish lines of code.
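
For completeness, a hedged sketch of how the loop above could collect its outcomes for the sensitivity analysis; this is pure bookkeeping, and no CausalityTools API beyond what is already used above is assumed.

# Collect one named tuple per (definition, measure, estimator) combination.
results = NamedTuple[]
for definition in definitions, measure in measures, est in estimators
    test = LocalPermutation(; definition, measure, est)
    push!(results, (definition = definition, measure = measure, est = est,
                    result = independence(test, X, Z, Y)))
end
# `results` can then be filtered, summarized, or turned into a table for inspection.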

Alternative (defining explicit methods for different definitions)

We could of course also do

mutualinfo_tsallis_methodT1(est, x, y), or mutualinfo_methodT1(::MITsallis, est, x, y)
mutualinfo_tsallis_methodT2(est, x, y), or mutualinfo_methodT2(::MITsallis, est, x, y)
mutualinfo_renyi_methodR1(est, x, y), ...
mutualinfo_renyi_methodR2(est, x, y), ...
mutualinfo_renyi_methodR3(est, x, y), ...
mutualinfo_shannon_methodA(est, x, y), ...
(yes, there are actually that many variants in the literature)

Your point about expectations from the user is valid. However, there is obviously a trade-off to be made between what to expect from a user and the messiness of the code.

The conditional independence example above is enough for me to conclude that the modular design with the single estimate function far outweighs any confusion for entry-level users. Appropriate wrappers with good default values trivially circumvent any such confusion, e.g.

conditional_mutualinfo(est::Union{EntropyEstimator, ProbabilitiesEstimator}, x, y, z; base = 2) = estimate(CMI4H(), CMIShannon(; base), est, x, y, z)
conditional_mutualinfo(est::MutualInformationEstimator, x, y, z; base = 2) = estimate(CMI2MI(), CMIShannon(; base), est, x, y, z)
... 
@kahaaga kahaaga added the design-discussion Decisions must be made label Dec 18, 2022
@Datseris
Member

Datseris commented Jan 5, 2023

Alright, I really need to review CausalityTools fast because the workshop is coming up real fast. I take it that I should check out and review the v2 branch, right?

There are many ways of defining Shannon CMI, either as a sum of entropy terms, as a sum of mutual information terms, as a KL divergence, and other ways

What do you mean here? According to Wikipedia there is one definition of CMI, which may take other equivalent forms. So what's the use of providing different equivalent definitions here? They are all supposed to be identities of each other.

@Datseris
Member

Datseris commented Jan 5, 2023

Well, there isn't any v2 branch, but a bunch of them, so I need to know where to focus the review effort.

@kahaaga
Member Author

kahaaga commented Jan 5, 2023

@Datseris There have been a bunch of changes since the last time I pushed anything. I've had to re-do a bunch of stuff since we changed the ComplexityMeasures API.

I think I'll be done tonight or tomorrow morning with something you can review. I've been working non-stop since we tagged ComplexityMeasures, so it's just about ready. I'll tag you explicitly when I've committed everything.

@kahaaga
Member Author

kahaaga commented Jan 5, 2023

What do you mean here? According to Wikipedia there is one definition of CMI, which may take other equivalent forms. So what's the use of providing different equivalent definitions here? They are all supposed to be identities of each other.

Take-away: the different definitions are equivalent in theory, but we don't compute the CMI. We compute an estimate of it. And depending on which (equivalent, in theory) definition of CMI we use, estimation is done differently, and leads to different biases in the final result. For other types of CMI, such as Renyi, there are actually non-equivalent definitions.

Anyways, all the questions you may have regarding this should be clarified by the updated docs. Again, I'll tag you when ready.

@Datseris
Member

Datseris commented Jan 5, 2023

Sure, I get this, but then you only need to specify the estimator. Why would one need to specify both an estimator as well as the "definition" it estimates? The latter is included in the former.

@kahaaga
Member Author

kahaaga commented Jan 5, 2023

Sure, I get this, but then you only need to specify the estimator. Why would one need to specify both an estimator as well as the "definition" it estimates? The latter is included in the former.

See the comment below also. But here's some pseudocode based on the Tsallis mutual information. Type names are totally experimental.

abstract type MutualInformation end
abstract type MutualInformationDefinition end

# `ContingencyMatrix` is an estimator of an empirical joint density of multiple variables.
# Can be used as an estimator for any discrete information measure. It has to be precomputed.
struct ContingencyMatrix <: ProbabilitiesEstimator end 

struct MITsallis{D <: MutualInformationDefinition} <: MutualInformation
    e::Tsallis
    definition::D
end
# Both of the following definitions are referred to as "Tsallis mutual information" in the literature.
# They are different, non-equivalent variants of the same concept (the formulas differ), but both have the Tsallis entropy as a starting point.
struct MITsallisFuruichi <: MutualInformationDefinition end
struct MITsallisMartin <: MutualInformationDefinition end


# Works on *any* input data (also categorical).
estimate(m::MITsallis{<:MITsallisFuruichi}, c::ContingencyMatrix) = ... # returns some quantity
estimate(m::MITsallis{<:MITsallisMartin}, c::ContingencyMatrix) = ... # returns a *different* quantity (different formula)

# Only works for numerical data, because it calls `entropy` directly (some variant of `H(X) + H(Y) - H(X, Y)`)
estimate(m::MITsallis{<:MITsallisFuruichi}, est::ProbabilitiesEstimator, x, y) = ... # returns some quantity
estimate(m::MITsallis{<:MITsallisMartin}, est::ProbabilitiesEstimator, x, y) = ... # returns a different quantity, because the definition is different

# common alias (`estimate` is used generically elsewhere in hypothesis tests, so define everything in terms of it)
mutualinfo(m::MutualInformation, args...) = estimate(m, args...)

##############################
# A user would do something like:
##############################
x = rand(100); y = rand(100)
est = SymbolicPermutation()
c = contingency_matrix(est, x, y)

# in general, mi_f != mi_m
mf = MITsallis(def = MITsallisFuruichi(), base = 2)
mm = MITsallis(def = MITsallisMartin(), base = 2)
mi_f = mutualinfo(mf, c)
mi_m = mutualinfo(mm, c)

y = rand(["yes please", "no thanks"], 100)
z = rand(["cheese", "potatoes", "hamburgers"], 100)
w = rand(1:2, 100)
ZW = [(zi, wi) for (zi, wi) in zip(z, w)]
c = contingency_matrix(y, ZW)
mf = MITsallis(def = MITsallisFuruichi(), base = 10)
mutualinfo(mf, c)

Alternatively, instead of using definitions, we can instead define

struct MITsallisMartin <: MutualInformation end
struct MITsallisFuruichi <: MutualInformation end
...

estimate(::MITsallisFuruichi, args...)
estimate(::MITsallisMartin, args...)

I like the former, because for some methods, it allows common dispatch and saves a bunch of lines of code for computations that are common.

More will follow when I push the latest changes.

@kahaaga
Member Author

kahaaga commented Jan 5, 2023

Sure, I get this, but then you only need to specify the estimator. Why would one need to specify both an estimator as well as the "definition" it estimates? The latter is included in the former.

The latter is not included in the former. This is the same discussion as we had when dealing with the differential entropy estimators. A particular estimator can be used to compute multiple types of differential entropies. Not all, but some can.

Example: A ValueHistogram estimator can estimate discrete mutual information, but it contains no information whatsoever about the definition of mutual information. It can also estimate entropies of various kinds, but ValueHistogram contains no information about those entropies either. The only information in ValueHistogram is instructions about how input data are discretized to compute probabilities.

In the above sketched solution, it is the MutualInformationDefinition that contains the information about the formula. Defining estimate(::MITsallis{<:MITsallisFuruichi}, est::ValueHistogram) and estimate(::MITsallis{<:MITsallisMartin}, est::ValueHistogram) methods is what defines the computation. This is precisely analogous to how entropy(::Renyi, ::ValueHistogram, x) and entropy(::Tsallis, ::ValueHistogram, x) define how Renyi and Tsallis entropies are computed.

@kahaaga
Member Author

kahaaga commented Jan 5, 2023

Hm. I think it actually might be easier to just make separate MITsallisFuruichi, MITsallisMartin, MIShannon, MIRenyiTypeA, MIRenyiTypeB, etc., and drop the definition entirely. It saves one extra keyword in null hypothesis test functions.

However, I'll keep it as described above for now, so you can get something to review. Merging measure/definition should be pretty quick to do if we settle on that approach.

@kahaaga
Member Author

kahaaga commented Jan 6, 2023

I'm porting everything to kah_v2_minimal_wip, but it is extremely WIP and I'm going to need one more day before it's ready to review. The docs aren't all updated, and tests fail. There are probably a bunch of missing methods, rogue files, notebooks, etc. too.
Just ignore everything that is not explicitly included. Everything will be squashed down in the end, once fixed.

You can have a look in the src/entropy_conditional, src/mutualinfo and src/condmutualinfo folders and the files therein to see what you think about the syntax. These methods should be pretty up to date. The docs should build locally if you install the dev branch of ComplexityMeasures.

This will be in a much better state tomorrow evening, though, so I'd hold off on analyzing it in any particular detail if I were you.

@Datseris
Member

Datseris commented Jan 6, 2023

estimate

this is a problematic name I am not happy with. At least according to the principles I taught in Good Scientific Code, this name shows me that it is time to separate into simpler/smaller functions. estimate is a generic name that could be applied to anything in this world. You could estimate anything. Here we are more specific. So, I think, given your "aliasing" code here:

# common alias (`estimate` is used generically elsewhere in hypothesis tests, so define everything in terms of it)
mutualinfo(m::MutualInformation, args...) = estimate(m, args...)

estimate shouldn't exist because it doesn't provide any useful benefit over mutualinfo. This also eliminates m from the picture, because the function name itself conveys the information of what is being estimated. Or, m becomes the definition of the mutual info one estimates, like in entropy.


m::MITsallis{<:MITsallisFuruichi}

Well, I see here a clear duplication of MITsallis, so here clearly the latter is included in the former, as MITsallisFuruichi very much includes MITsallis, and hence you don't need two types to express this, but only one. So MITsallis{<:MITsallisFuruichi} becomes MITsallisFuruichi instead.

So, I am 100% on the side of the alternative:

Alternatively, instead of using definitions, we can instead define

struct MITsallisMartin <: MutualInformation end
struct MITsallisFuruichi <: MutualInformation end
...
estimate(::MITsallisFuruichi, args...)
estimate(::MITsallisMartin, args...)

I like the former, because for some methods, it allows common dispatch and saves a bunch of lines of code for computations that are common.

This argument doesn't stand in my eyes. You can eliminate code duplication with internal functions that are called by the high-level functions. The elimination of code duplication isn't a feature of multiple dispatch; it is a feature of functional programming. You can achieve it without any dispatch by clever internal design of the code base. So, eliminating code duplication is not an argument for adding more complexity to our type hierarchy.

To give an example: the simplification of the source code of all the SymbolicPermutation variants didn't use multiple dispatch. It used internal functions (the encodings, which by now are no longer internal, but anyway).
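
As a minimal, self-contained illustration of this point (all names and the plug-in approach below are hypothetical, for illustration only, and not CausalityTools API): two public functions implementing two equivalent definitions of Shannon mutual information share one internal helper, so no extra definition types or dispatch are needed to avoid duplication.

# Internal helper: naive plug-in joint and marginal probabilities from two categorical series.
function _joint_and_marginals(x, y)
    n = length(x)
    pxy = Dict{Tuple{eltype(x), eltype(y)}, Float64}()
    for (xi, yi) in zip(x, y)
        pxy[(xi, yi)] = get(pxy, (xi, yi), 0.0) + 1 / n
    end
    px = Dict{eltype(x), Float64}()
    py = Dict{eltype(y), Float64}()
    for ((xi, yi), p) in pxy
        px[xi] = get(px, xi, 0.0) + p
        py[yi] = get(py, yi, 0.0) + p
    end
    return pxy, px, py
end

_plugin_entropy(p; base = 2) = -sum(pk * log(base, pk) for pk in values(p))

# Definition 1: I(X; Y) = H(X) + H(Y) - H(X, Y).
function mutualinfo_from_entropies(x, y; base = 2)
    pxy, px, py = _joint_and_marginals(x, y)
    return _plugin_entropy(px; base) + _plugin_entropy(py; base) - _plugin_entropy(pxy; base)
end

# Definition 2: I(X; Y) as the KL divergence between the joint and the product of the marginals.
function mutualinfo_from_kl(x, y; base = 2)
    pxy, px, py = _joint_and_marginals(x, y)
    return sum(p * log(base, p / (px[xi] * py[yi])) for ((xi, yi), p) in pxy)
end

For example, mutualinfo_from_entropies(rand(1:3, 1000), rand(1:3, 1000)) and mutualinfo_from_kl on the same data return identical values, yet neither function needed an extra type to share the probability-estimation code.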


The latter is not included in the former. This is the same discussion as we had when dealing with the differential entropy estimators. A particular estimator can be used to compute multiple types of differential entropies. Not all, but some can.

Yes, but I will still repeat that these different definitions should be fields of the estimator itself rather than yet more types and input arguments to the highest-level function, especially given that in the majority of cases each estimator is valid only for one definition, eliminating the need to put the definition as an argument in the first place.


This is the same discussion as we had when dealing with the differential entropy estimators. A particular estimator can be used to compute multiple types of differential entropies. Not all, but some can. Example: A ValueHistogram estimator can estimate discrete mutual information

Yes, but in this example we are talking about the discrete estimators that estimate probabilities, not other quantities that are functions of probabilities. So yes, the design we have for the discrete entropies there is great, and I definitely support the same design here. What is not great is the design of the DifferentialEntropyEstimators, which I very deeply regret not vetoing out completely, where we allow both the estimator and the definition, even though we have 11 estimators that only compute a specific definition... We should really instead have the definition be a field of the estimator, which for these 11 estimators means nothing, as there is only one option for the definition anyway.
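
To make the suggestion concrete, here is a hedged sketch (a hypothetical estimator type, not any actual CausalityTools or ComplexityMeasures struct) of what "definition as a field of the estimator" could look like:

# Hypothetical nearest-neighbor differential-entropy estimator; the single definition it
# supports is simply stored as a field, with a sensible default.
Base.@kwdef struct SomeNNEntropyEstimator{D}
    definition::D = Shannon()  # the only definition this estimator can compute
    k::Int = 1                 # number of neighbors
end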


In the above sketched solution, it is the MutualInformationDefinition that contains the information about the formula. Defining estimate(::MITsallis{<:MITsallisFuruichi}, est::ValueHistogram) and estimate(::MITsallis{<:MITsallisMartin}, est::ValueHistogram) methods is what defines the computation. This is precisely analogous to how entropy(::Renyi, ::ValueHistogram, x) and entropy(::Tsallis, ::ValueHistogram, x) define how Renyi and Tsallis entropies are computed.

Not precisely. entropy(::Renyi, ::ValueHistogram, x) doesn't define how a Renyi entropy is computed. entropy(::Renyi, ::Probabilities) does. And this is the important difference. The (probabilities) estimator has no effect on the entropy definition. It is very important here that we don't confuse dispatches. There is no dispatch on the entropy type in the call entropy(::EntropyDef, ::Estimator, x)! Nowhere in the source code can you find such a dispatch on the first type in this function.
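
Schematically, the pattern being described is the following (a hedged sketch with placeholder abstract type names taken from the comment above, not the literal ComplexityMeasures source):

# The generic method only converts data to probabilities; no dispatch on the concrete entropy type here.
entropy(e::EntropyDef, est::ProbabilitiesEstimator, x) = entropy(e, probabilities(est, x))

# The definition-specific formulas dispatch on the definition and the probabilities alone, e.g.
# entropy(e::Renyi, p::Probabilities)   # Renyi formula
# entropy(e::Tsallis, p::Probabilities) # Tsallis formula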

Defining estimate(::MITsallis{<:MITsallisFuruichi}, est::ValueHistogram) and estimate(::MITsallis{<:MITsallisMartin}, est::ValueHistogram) methods is what defines the computation

The dispatch here happens on the type parameter MITsallisFuruichi or MITsallisMartin. Even though I have seen no code, I can 99.9% guarantee that the dispatch would happen on these two names, and not on the MITsallis name, which makes that name obsolete, at least for dispatch purposes.


Hm. I think it actually might be easier to just make separate MITsallisFuruichi, MITsallisMartin, MIShannon, MIRenyiTypeA, MIRenyiTypeB, etc., and drop the definition entirely. It saves one extra keyword in null hypothesis test functions.

I agree, although my reasons to prefer this are different.


This will be in a much better state tomorrow evening, though, so I'd hold off on analyzing it in any particular detail if I were you.

Just tag me; I won't look at it yet.

@kahaaga kahaaga closed this as completed Feb 7, 2023