Adding FramedDataset class for framed RSA analysis #464
This pull request adds a FramedDataset class for performing Framed RSA analysis. Together with my earlier pull request that adds functions for computing `sigma_k` and `V`, this provides the remaining functionality needed to do Framed RSA with the toolbox. Note that it uses the `sigmak_from_data` function from the other pull request, so that one should be evaluated/merged before this one.

Specifically, the FramedDataset class adds the following functionality:

- Adds the framed patterns `all-z` and `all-c` for the `obs_descriptor` that the user specifies (which has the variable name `cond_descriptor`; I am not attached to this). To make sure that any downstream analyses involving crossvalidation remain kosher, repeats of these patterns are added for every combination of values of the remaining `obs_descriptors` (such that no matter which of the remaining descriptors might be used for crossvalidation, the user will be covered). An alternative option would be to have the user explicitly provide the cv_descriptor here; I'm open to this too. The all-c pattern is automatically tuned such that its norm is equal to the mean norm of the stimulus patterns times a constant (this scaling constant is a free parameter in framed RSA). If a noise matrix is provided, this scaling is done based on the whitened patterns (i.e., the value of `c` is set such that after whitening, the all-c vector has norm equal to the mean norm of the whitened patterns times a constant); this noise matrix should be the same one used for downstream RSA analysis.
- `sigma_k` for the dataset, with an option whether or not to estimate this matrix from the data. If so, the `sigmak_from_data` function is used, and the user must provide a cv_descriptor. If not, `sigma_k` is assumed to be an identity matrix, except that the entries involving the framed patterns are set to zero (since they have zero variance).
- […] `cond_descriptor`) involve the framed patterns. The rationale for this is that when using framed RSA in conjunction with whitened RDM comparators, it is sometimes useful to incorporate the distances involving the framed patterns into the estimation of `V`, but not the distances involving only stimulus patterns.

Together, this functionality is sufficient to implement Framed RSA as it currently exists.
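For reviewers, here is a minimal sketch of the all-c tuning and the default `sigma_k` described in the list above. It is illustrative only and is not the code in this PR. The function names `tune_all_c` and `default_sigma_k` and their arguments are hypothetical. It also assumes that the noise matrix is a precision (inverse covariance) matrix, so that the whitened norm of a pattern `x` is `sqrt(x^T P x)`, and that the two framed patterns are appended after the stimulus patterns.

```python
import numpy as np


def tune_all_c(patterns, scale=1.0, noise_precision=None):
    """Hypothetical helper: pick c so that the all-c vector has norm
    equal to `scale` times the mean norm of the stimulus patterns.

    patterns: (n_conditions, n_channels) array of stimulus patterns.
    noise_precision: optional (n_channels, n_channels) matrix, ASSUMED to be
        an inverse noise covariance; if given, all norms are taken after
        whitening, i.e. ||x|| = sqrt(x^T P x).
    """
    patterns = np.asarray(patterns, dtype=float)
    n_channels = patterns.shape[1]
    if noise_precision is None:
        # No noise matrix: plain Euclidean norms.
        noise_precision = np.eye(n_channels)

    # Squared (whitened) norm of each stimulus pattern: x_i^T P x_i.
    sq_norms = np.einsum("ij,jk,ik->i", patterns, noise_precision, patterns)
    target_norm = scale * np.sqrt(sq_norms).mean()

    # (Whitened) norm of the all-ones vector; the all-c vector is c times this.
    ones = np.ones(n_channels)
    ones_norm = np.sqrt(ones @ noise_precision @ ones)
    return target_norm / ones_norm


def default_sigma_k(n_stimuli, n_framed=2):
    """Hypothetical helper: identity over the stimulus conditions, zeros for
    every entry involving a framed pattern (ASSUMED to come last)."""
    n_total = n_stimuli + n_framed
    sigma_k = np.zeros((n_total, n_total))
    sigma_k[:n_stimuli, :n_stimuli] = np.eye(n_stimuli)
    return sigma_k


# Example usage with random data (values are arbitrary).
rng = np.random.default_rng(0)
stimuli = rng.normal(size=(5, 10))            # 5 conditions, 10 channels
c = tune_all_c(stimuli, scale=1.0)
all_z = np.zeros(stimuli.shape[1])            # all-zeros framed pattern
all_c = np.full(stimuli.shape[1], c)          # tuned all-c framed pattern
sigma_k = default_sigma_k(n_stimuli=stimuli.shape[0])
```

With a precision matrix passed as `noise_precision`, the same call sets `c` from the whitened norms, which is why the sketch expects the same noise matrix that is used for the downstream RSA analysis.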
Possible future changes could include alternative methods for tuning the all-c vector (e.g., based on the mean norm of the projection of the stimulus vectors onto the all-c vector); I think these could be encompassed with one additional argument to `__init__` for the class.

I made the choice to implement these changes via subclassing `Dataset`. However, an alternative approach would have been to subclass `RDM` instead, providing a function such as `calc_framed_rdm`. I'm not sure about the relative tradeoffs of these two options; I went with subclassing Dataset since there's already another `Dataset` subclass (`TemporalDataset`) but currently no `RDM` subclass, and since subclassing `RDM` wouldn't provide a natural way to calculate `sigma_k` from the data for FramedRSA (since this requires having the actual augmented dataset). In the event that we want any downstream functionality (e.g. MDS plotting) to change for framed RSA, I did add an `is_framed` dataset descriptor that should carry over to any downstream RDMs.

Finally, one random thing: currently `self.n_obs` corresponds to the number of observations for the original stimuli, before the framed entries are added. I don't know if this would be problematic for any downstream functionality.

Happy to provide any other information or testing that would be useful.