Design discussion #178
Alright, I really need to review CausalityTools quickly because the workshop is coming up fast. I take it that I should check out and review the v2 branch?
What do you mean here? According to Wikipedia there is one definition of CMI, which may take other equivalent forms. So what's the use of providing different equivalent definitions here? They are all supposed to be identities of each other.
Well, there isn't any v2 branch, but a bunch of them, so I need to know where to focus the review effort.
@Datseris There have been a bunch of changes since the last time I pushed anything. I've had to re-do a bunch of stuff since we changed the ComplexityMeasures API. I think I'll be done tonight or tomorrow morning with something you can review. I've been working non-stop since we tagged ComplexityMeasures, so it's just about ready. I'll tag you explicitly when I've committed everything.
Take-away: the different definitions are equivalent in theory, but we don't compute the CMI. We compute an estimate of it. And depending on which (theoretically equivalent) definition of CMI we use, estimation is done differently, and this leads to different biases in the final result. For other types of CMI, such as Renyi, there are actually non-equivalent definitions. Anyway, all the questions you may have regarding this should be clarified by the updated docs. Again, I'll tag you when ready.
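For concreteness, two of the equivalent ways of writing Shannon CMI (standard identities, independent of any package):

I(X; Y | Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z)
            = I(X; (Y,Z)) - I(X; Z)

An estimator that plugs entropy estimates into the first form and one that plugs mutual information estimates into the second form target the same quantity, but their finite-sample biases differ.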
Sure, I get this, but then you only need to specify the estimator. Why would one need to specify both an estimator and the "definition" it estimates? The latter is included in the former.
See comment below also. But here's some pseudocode based on the Tsallis mutual information. Typenames are totally experimental.

abstract type MutualInformation end
abstract type MutualInformationDefinition end
# `ContingencyMatrix` is an estimator of an empirical joint density of multiple variables.
# Can be used as an estimator for any discrete information measure. It has to be precomputed.
struct ContingencyMatrix <: ProbabilitiesEstimator end
struct MITsallis{D <: MutualInformationDefinition} <: MutualInformation
e::Tsallis
definition::D
end
# Both of the following definitions are referred to as "Tsallis mutual information" in the literature. They are
# different, non-equivalent variants of the same concept (the formulas differ), but both have the Tsallis entropy
# as a starting point.
struct MITsallisFuruichi <: MutualInformationDefinition end
struct MITsallisMartin <: MutualInformationDefinition end
# Works on *any* input data (also categorical).
estimate(m::MITsallis{<:MITsallisFuruichi}, c::ContingencyMatrix) = ... # returns some quantity
estimate(m::MITsallis{<:MITsallisMartin}, c::ContingencyMatrix) = ... # return a *different* quantity (different formula)
# Only works for numerical data, because it calls `entropy` directly (some variant of `H(X) + H(Y) - H(X, Y)`)
estimate(m::MITsallis{<:MITsallisFuruichi}, est::ProbabilitiesEstimator, x, y) = ... # returns some quantity
estimate(m::MITsallis{<:MITsallisMartin}, est::ProbabilitiesEstimator, x, y) = ...   # returns a different quantity, because the definition is different
# common alias (`estimate` is used generically elsewhere in hypothesis tests, so define everything in terms of it)
mutualinfo(m::MutualInformation, args...) = estimate(m, args...)
##############################
# A user would do something like:
##############################
x = rand(100); y = rand(100)
est = SymbolicPermutation()
c = contingency_matrix(est, x, y)
# in general, mi_f != mi_m
mf = MITsallis(def = MITsallisFuruichi(), base = 2)
mm = MITsallis(def = MITsallisMartin(), base = 2)
mi_f = mutualinfo(mf, c)
mi_m = mutualinfo(mm, c)
y = rand(["yes please", "no thanks"], 100)
z = rand(["cheese", "potatoes", "hamburgers"], 100)
w = rand(1:2, 100)
ZW = [(zi, wi) for (zi, wi) in zip(z, w)]
c = contingency_matrix(y, ZW)
mf = MITsallis(def = MITsallisFuruichi(), base = 10)
mutualinfo(mf, c)

Alternatively, instead of using definitions, we could define

struct MITsallisMartin <: MutualInformation end
struct MITsallisFuruichi <: MutualInformation end
...
estimate(::MITsallisFuruichi, args...)
estimate(::MITsallisMartin, args...)

I like the former, because for some methods it allows common dispatch and saves a bunch of lines of code for computations that are common. More will follow when I push the latest changes.
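To illustrate the "common dispatch" point, here is a minimal sketch under the parametric MITsallis design above; probabilities, marginals and tsallis_mi_formula are hypothetical helper names, not an existing API.

# Sketch: the shared (potentially expensive) work is written once for all MITsallis measures,
# and only the definition-specific formula needs its own tiny method.
function estimate(m::MITsallis, c::ContingencyMatrix)
    pxy = probabilities(c)    # joint probabilities (hypothetical accessor)
    px, py = marginals(c)     # marginal probabilities (hypothetical accessor)
    return tsallis_mi_formula(m.definition, m.e, pxy, px, py)
end
tsallis_mi_formula(::MITsallisFuruichi, e::Tsallis, pxy, px, py) = ... # Furuichi's formula
tsallis_mi_formula(::MITsallisMartin, e::Tsallis, pxy, px, py) = ...   # Martin et al.'s formula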
The latter is not included in the former. This is the same discussion as we had when dealing with the differential entropy estimators. A particular estimator can be used to compute multiple types of differential entropies. Not all, but some can. In the above sketched solution, it is the definition that determines which formula is actually estimated.
Hm. I think it actually might be easier to just make separate types for each definition. However, I'll keep it as described above for now, so you can get something to review. Merging measure/definition should be pretty quick to do if we settle on that approach.
I'm porting everything to [...]. You can have a look in [...]. This will be in a much better state tomorrow evening, though, so I'd hold off on analyzing it in any particular detail if I were you.
This is a problematic name I am not happy with. At least according to the principles from which I taught Good Scientific Code, this name shows me that it is time to separate into simpler/smaller functions.

# common alias (`estimate` is used generically elsewhere in hypothesis tests, so define everything in terms of it)
mutualinfo(m::MutualInformation, args...) = estimate(m, args...)
Well, I see here a clear duplication of functionality. So, I am 100% on the side of the alternative:
This argument doesn't stand in my eyes. You can eliminate code duplication with internal functions that are called by the high-level functions. The elimination of code duplication isn't a feature of multiple dispatch, it is a feature of functional programming. You can achieve it without any dispatch by clever internal design of the code base. So, eliminating code duplication is not an argument for adding more complexity to our type hierarchy. To give an example: the simplification of the source code of all the SymbolicPermutation variants didn't use multiple dispatch. It used internal functions (the encodings, which by now are no longer internal, but anyway).
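As a minimal sketch of that point (hypothetical names, not an existing API): the shared work lives in one internal function, and each user-facing variant is a thin wrapper around it, with no extra types or dispatch involved.

# Shared, potentially expensive preprocessing, written once as a plain internal function.
function _joint_and_marginals(est, x, y)
    ...
end

# Each variant is a thin wrapper; no multiple dispatch is needed to avoid duplication.
function mutualinfo_furuichi(est, x, y)
    pxy, px, py = _joint_and_marginals(est, x, y)
    ... # apply Furuichi's formula
end

function mutualinfo_martin(est, x, y)
    pxy, px, py = _joint_and_marginals(est, x, y)
    ... # apply Martin et al.'s formula
end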
Yes, but I will still repeat that these different definitions should be fields of the estimator itself, rather than yet more types and input arguments to the highest-level function, especially given that in the majority of cases each estimator is valid only for one definition, eliminating the need to pass the definition as an argument in the first place.
Yes, but in this example we are talking about the discrete estimators that estimate probabilities, not other quantities that are functions of probabilities. So yes, the design we have for the discrete entropies there is great and I definitely support the same design here. What is not great is the design of the …
Not precisely.
The dispatch here happens on the type parameter D of MITsallis.
I agree, although my reasons to prefer this are different.
Just tag me; I won't look at it yet.
@Datseris, I decided to answer your comment in Entropies.jl here instead, because the discussion doesn't really apply to Entropies.jl, where the API is much simpler.
Actually, in the CausalityTools v2 branch, at the moment, all information theoretic measures are implemented using a single function
estimate([definition::InfoMeasureDefinition], measure::InformationMeasure, est, args...)
The functions mutualinfo, entropy_relative, conditional_mutualinfo, etc., are just wrappers around estimate. The wrappers can of course be tweaked to any level of "simpleness" for the sake of the user-exposed API, e.g. mutualinfo(::MITsallis, est, x, y) = estimate(DefaultTsallisMIType(), ::MITsallis, est, x, y), or mutualinfo_tsallis(est, x, y) = estimate(MITsallisType1(), MITsallis(), est, x, y).

Example
The nice thing about the underlying estimate is that it is trivial to compute higher-level info measures in terms of the lower-level ones, because they all use the same estimate function, with the same signature (definition, measure, estimator, input_data...).

Say we want to estimate differential/continuous Shannon conditional mutual information (CMI). There are many ways of defining Shannon CMI: as a sum of entropy terms, as a sum of mutual information terms, as a KL divergence, and other ways. Thus, EntropyEstimators, MutualInformationEstimators or KlDivergenceEstimators are all valid estimators for the same quantity. We therefore define:

estimate(def::CMIH4, measure::CMIShannon, est, x, y, z) computes Shannon CMI as a sum of four entropies.
estimate(def::CMI2MI, measure::CMIShannon, est, x, y, z) computes Shannon CMI as a sum of two mutual informations.
estimate(def::CMIKLDiv, measure::CMIShannon, est, x, y, z) computes Shannon CMI as a KL divergence.
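As a sketch of what two of these methods could look like internally (hypothetical: it assumes an entropy(est, data) method, a mutualinfo(est, x, y) method, and something like Dataset to glue marginals together; the decompositions themselves are standard Shannon identities):

# Shannon CMI as a sum of four entropies: I(X; Y | Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z)
function estimate(def::CMIH4, measure::CMIShannon, est, x, y, z)
    # (base, q, etc. would be taken from `measure`; omitted here)
    return entropy(est, Dataset(x, z)) + entropy(est, Dataset(y, z)) -
           entropy(est, Dataset(z)) - entropy(est, Dataset(x, y, z))
end

# Shannon CMI in terms of two mutual informations: I(X; Y | Z) = I(X; (Y,Z)) - I(X; Z)
function estimate(def::CMI2MI, measure::CMIShannon, est, x, y, z)
    return mutualinfo(est, x, Dataset(y, z)) - mutualinfo(est, x, z)
end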
Motivation

A common use of CMI is conditional independence testing, for example in the context of causal graphs. A popular independence test is Runge (2018)'s local permutation test for conditional independence. In CausalityTools v2, this test is implemented (roughly) as sketched below.
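A minimal sketch, assuming hypothetical names consistent with this thread (LocalPermutation, the estimate signature above, and an assumed shuffle_locally helper):

# Sketch of a local permutation test for conditional independence, built on `estimate`.
struct LocalPermutation{M, D, E}
    measure::M      # e.g. CMIShannon()
    definition::D   # e.g. CMIH4()
    est::E          # any estimator compatible with the definition/measure
    nshuffles::Int  # number of local permutations for the null distribution
end

function independence_test(test::LocalPermutation, x, y, z)
    cmi_obs = estimate(test.definition, test.measure, test.est, x, y, z)
    # Null distribution: locally permute x within neighborhoods of z and re-estimate
    # (the neighborhood bookkeeping is what `shuffle_locally` would do; details omitted).
    cmi_null = [estimate(test.definition, test.measure, test.est, shuffle_locally(x, z), y, z)
                for _ in 1:test.nshuffles]
    return count(>=(cmi_obs), cmi_null) / test.nshuffles   # p-value
end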
Say I have the data X = randn(1000); Y = X .+ randn(1000); Z = Y .+ randn(1000) (i.e. X -> Y -> Z), and I want to check how well the LocalPermutation test correctly identifies that X and Z are conditionally independent, given Y. I can now write a single for loop to test how well the test performs, for all valid combinations of estimation methods, definitions and measures. For example, I could do something like the sketch below.
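A sketch of such a loop, reusing the hypothetical LocalPermutation/independence_test names from above (the list of estimators is just an example; any compatible estimators would do):

# Compare how different (definition, estimator) combinations judge the conditional
# (in)dependence of X and Z given Y.
definitions = [CMIH4(), CMI2MI(), CMIKLDiv()]
estimators = [SymbolicPermutation(), Kraskov()]  # example estimators; anything compatible works

pvalues = Dict()
for def in definitions, est in estimators
    test = LocalPermutation(CMIShannon(), def, est, 100)
    pvalues[(nameof(typeof(def)), nameof(typeof(est)))] = independence_test(test, X, Z, Y)
end
pvalues  # large p-values mean the test correctly finds X and Z conditionally independent given Y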
Armed with this machinery, a user can essentially perform literature-wide method sensitivity analysis for any analysis they do on their own data, using any sort of null hypothesis testing scheme, such as LocalPermutation or GlobalPermutation (i.e. traditional surrogates), with just ten-ish lines of code.

Alternative (defining explicit methods for different definitions)
We could of course also do

mutualinfo_tsallis_methodT1(est, x, y), or mutualinfo_methodT1(::MITsallis, est, x, y)
mutualinfo_tsallis_methodT2(est, x, y), or mutualinfo_methodT2(::MITsallis, est, x, y)
mutualinfo_renyi_methodR1(est, x, y), ...
mutualinfo_renyi_methodR2(est, x, y), ...
mutualinfo_renyi_methodR3(est, x, y), ...
mutualinfo_shannon_methodA(est, x, y), ...

(yes, there are actually that many variants in the literature)
Your point about expectations from the user is valid. However, there is obviously a trade-off to be made between what to expect from a user, and messiness of the code.
The conditional independence example above is enough for me to conclude that the modular design with the single estimate function far outweighs any confusion for entry-level users. Appropriate wrappers with good default values trivially circumvent any such confusion, e.g. along the lines sketched below.
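For instance, a one-line sketch (the choice of CMIH4 as the default is hypothetical):

# Entry-level wrapper: a sensible default definition, so casual users never see the extra argument.
conditional_mutualinfo(est, x, y, z) = estimate(CMIH4(), CMIShannon(), est, x, y, z)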