Review of codebase and docs - Probabilities and Encodings - Datseris #213
Conversation
Codecov Report
```
@@            Coverage Diff            @@
##             main     #213      +/-  ##
=========================================
- Coverage   84.63%   84.47%   -0.17%
=========================================
  Files          49       45       -4
  Lines        1191     1140      -51
=========================================
- Hits         1008      963      -45
+ Misses        183      177       -6
```
Maybe just say

aaaaaaaaaaaaaaaaah haha my bad. I didn't think about it and thought it was the internal function that does the Lehmer code thingy...

@kahaaga I would like to remove

I would also remove
@kahaaga the

EDIT: In fact, I am convinced that once the encoding does as it's supposed to do, we will save a lot of lines of source code. Just like we do in the histogram encoding. In essence, applying the estimator

Here is my plan on how to change it. First, the entirety of the function

```julia
function encode(encoding::OrdinalPatternEncoding, x::SVector{m}) where {m}
    # x is already an element of the input dataset
    perm = sortperm!(encoding.dummyperm, x, lt = encoding.lt)
    n = 0
    for i = 1:m-1
        for j = i+1:m
            n += perm[i] > perm[j] ? 1 : 0
        end
        n = (m - i) * n
    end
    # The Lehmer code actually results in 0 being an encoded symbol. Shift by 1,
    # so that encodings are positive integers.
    return n + 1
end
```

and BOOM, we saved 200 lines of code. Hence, like in
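To illustrate the idea, here is a minimal self-contained sketch of the Lehmer-code encoding; the struct, constructor, and field names below are simplified stand-ins for illustration, not the package's actual definitions:

```julia
# Illustrative sketch only: a toy encoding type with a pre-allocated
# permutation buffer, mirroring the snippet above.
struct ToyOrdinalEncoding
    m::Int
    lt::Function
    dummyperm::Vector{Int}
end
ToyOrdinalEncoding(m::Int) = ToyOrdinalEncoding(m, isless, zeros(Int, m))

function toy_encode(encoding::ToyOrdinalEncoding, x::AbstractVector)
    # sortperm! writes the sorting permutation into the pre-allocated buffer
    perm = sortperm!(encoding.dummyperm, x, lt = encoding.lt)
    m, n = encoding.m, 0
    for i in 1:m-1
        for j in i+1:m
            n += perm[i] > perm[j] ? 1 : 0   # count inversions at position i
        end
        n *= m - i   # accumulate in the factorial number system
    end
    return n + 1     # shift so encodings lie in 1:factorial(m)
end

enc = ToyOrdinalEncoding(3)
toy_encode(enc, [1.0, 2.0, 3.0])   # ascending pattern encodes to 1
toy_encode(enc, [3.0, 2.0, 1.0])   # descending pattern encodes to factorial(3) = 6
```

Every length-`m` state vector thus maps to a unique integer in `1:factorial(m)`, so computing probabilities reduces to mapping `encode` over the dataset and counting symbols.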
Ok with me! We just need to remember to remove all references to
Also fine with me.
Excellent. A much better solution. Go ahead!
src/probabilities_estimators/permutation_ordinal/SymbolicPermutation.jl (outdated review thread, resolved)
@kahaaga have a look at the new

maybe the loop

```julia
# TODO: The following loop can probably be parallelized!
@inbounds for (i, χ) in enumerate(x)
    πs[i] = encode(est.encoding, χ)
end
```

is altered to add weights to

EDIT: Seems to me that the weight of any estimator is obtained from the data point itself and does not need additional data points. So, each data point
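A hedged sketch of that point, assuming a hypothetical symbol function and using the state vector's variance as the weight (as in weighted permutation entropy); none of these names are the package's actual API:

```julia
using Statistics: var

# Toy symbol function: a stand-in for the real `encode(est.encoding, χ)`.
toy_encode(χ) = argmax(χ)

# Each data point χ yields both its symbol and its weight; the weight is a
# function of χ alone, so no additional data points are needed.
function symbolize_weighted(x)
    πs = Vector{Int}(undef, length(x))
    wts = Vector{Float64}(undef, length(x))
    @inbounds for (i, χ) in enumerate(x)
        πs[i] = toy_encode(χ)
        wts[i] = var(χ; corrected = false)   # weight from the point itself
    end
    return πs, wts
end

πs, wts = symbolize_weighted([[1.0, 3.0, 2.0], [5.0, 5.0, 5.0]])
# the second point is constant, so its variance-based weight is zero
```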
src/probabilities_estimators/permutation_ordinal/SymbolicPermutation.jl (two outdated review threads, resolved)
Yes, the point of the weighted and amplitude-adjusted variants is that they use the same encoded symbols, but weight each symbol differently according to some property of the state vector from which the symbol was constructed. Now, the weights are computed using the

Notice that there is sorting going on in
What's the point of the in-place method if weights are recomputed in every call to it? One would need an in-place method with pre-allocated weight and symbol vectors...? E.g., I see in the current
Yes, the returned value from The question is: should we be strict (only allow scalar input
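One way such an in-place method could look, sketched with hypothetical names (the function name, stand-in symbol, and weight functions are illustrative, not the package's actual API): both buffers are pre-allocated by the caller, so repeated calls recompute nothing into fresh allocations.

```julia
# Hedged sketch: an in-place variant taking pre-allocated symbol and weight
# buffers, then normalizing weighted counts into probabilities.
function toy_probabilities!(πs::Vector{Int}, wts::Vector{Float64}, x)
    @inbounds for (i, χ) in enumerate(x)
        πs[i] = argmax(χ)        # stand-in for the real `encode`
        wts[i] = sum(abs2, χ)    # stand-in weight, from χ itself
    end
    total = sum(wts)
    # Accumulate weighted mass per symbol, normalized to probabilities.
    probs = Dict{Int,Float64}()
    for (s, w) in zip(πs, wts)
        probs[s] = get(probs, s, 0.0) + w / total
    end
    return probs
end

πs = Vector{Int}(undef, 2)
wts = Vector{Float64}(undef, 2)
toy_probabilities!(πs, wts, [[1.0, 2.0], [2.0, 1.0]])
```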
No, just call

I'm not sure if I follow. Am I right that you want to be strict about requiring scalar inputs and require the user to broadcast/map manually? :D
Why? This sounds like an arbitrary requirement. My suggestion is that things should be in folders if they require more than 1 file, which is unlikely for a probability estimator, generally speaking.

It's just a preference I have, but I have no problems with not sticking to it. Your version is also a natural choice. Feel free to drop the folders if you like.
@Datseris Are you fixing the remaining tests (some fail after the
I haven't finished yet, I still need to go through Transfer Operator and this will take a lot of time |
As part of #55, I'll have to re-think the whole transfer operator estimator API anyway in light of all our recent updates. So I suggest we don't dive too deeply into it now, except cleaning up the documentation and the source code. However, I would be very hesitant to do any fundamental changes to the structure of the source code now (i.e. remove/add existing functions/types), because this code is directly used in scripts related to publications that are already out, and in other materials I use, which I won't have time to re-do for a while.
Okay, then I won't do TraOp at the moment. Well then, this PR is finished from my end and should be reviewed. Minor bugfixes for the tests can be corrected over time but shouldn't hold back the review.

But the publications use Entropies v1.2, which is already out; not only that, but modifications to the code will be published under a package with a different name! So why would this matter for your existing publications...?
As you saw, many of the old estimators such as the ValueHistogram or symbolic stuff saw a complete rewrite that led to much simpler code. They are so easy to follow now. Probably worth doing the same with TE. |
The scripts mostly investigate the transfer operator in the context of causal inference. I still want users to be able to use the existing material, and still be able to use the latest stuff from CausalityTools v2 when it is released, without having to check out different versions (and remember differences in syntax) and manually use manifests. From experience, that will quickly become a mess, unless you know very well what you're doing.
Yes, we definitely should do something similar. But I'll think about that once I have time to address #55.
Ok, excellent. Thanks! I'll have a look at this as soon as possible and follow up with a last review of my own, but probably not until early next week. Some holiday days are incoming now. Merry Christmas, @Datseris!
Sounds good to me, happy holidays! |
This is done now and all tests pass. I am re-writing the top-level comment to outline the changes. I don't remember all of them. There were some small changes regarding clarity here and there. I won't write them.
Computations are done with probabilities as *input*
@Datseris I've now reviewed this PR in detail, and I must say your changes look very, very good! The simplification of the symbolic estimators is ridiculous :D I only had some minor docstring changes (typos, slight clarifications/reformulations).

The existing probabilities estimators are now mature enough that I think we can guarantee their v2-stability. There comes a point where further optimisation at the expense of delayed publication becomes a bit silly. I don't think we're beyond the threshold yet, but we're approaching it :D The only thing I'm a bit unsure of is the multiscale API, which may change. I have still to test whether it makes sense for upstream methods. However, there is no reason to delay Entropies v2 because of that. We can always release a minor breaking version later if necessary. That said, I'm now merging, so we can get on with the rest of the review, and I can get on with solidifying the probabilities-based estimators in CausalityTools.
There is no such thing as a minor breaking version. If you think multiscale isn't ready, it shouldn't go into v2.

For the applications in this package, it is ready. I can always take care of any future issues with a simple wrapper upstreams.
Well, in any case, I haven't reviewed that part yet, so we'll see soon how it is. I'm now going to make my entropies review PR.
This is a review of all probabilities estimators except the Transfer Operator for v2. Once this PR is merged, all probabilities estimators except the Transfer Operator are "set in stone" for v2 and will not get any further changes. So it is important, @kahaaga, that once we merge this, we both agree that this is "done" and commit to this doneness, so that we can actually get some v2 out at some point during our lifespans, because if we don't commit, I am sure I will keep finding things to change every week, given my history with the repo...
Alright, here I outline the major changes I made:
`entropy!` method completely.