As a developer, I want to optimise feature annotation processing because it will reduce computation time during the L2G training and prediction phases.
Background
L2G currently annotates all features of the input credible sets at execution time. We defined it like this for two reasons:
the feature matrix is purely an intermediate dataset, only useful in the process of L2G training/prediction
reliability: by not persisting the feature matrix, we don't introduce a codependence between two files
However, although sensible, in practice this approach makes training L2G under different scenarios inconvenient. Most of the step's computation time goes into feature annotation, so every single L2G training run, in which we annotate all credible sets, takes about 25 minutes.
This also affects prediction. In that step, only the credible sets for which we want to extract L2G scores are annotated; even so, I experienced unreasonably long runtimes extracting predictions for just 30 loci.
Tasks
Ensure the business logic of the colocalisation factories doesn't have big bottlenecks that might be slowing the process down
If there aren't any, consider writing the feature matrix as a separate dataset. To ensure credible set/feature matrix compatibility, we must assert that all credible sets are part of the feature matrix (see the sketch after this list).
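A minimal sketch of what that assertion could look like, assuming a PySpark feature matrix keyed by a `studyLocusId` column (the function, dataframe and column names here are illustrative, not the existing gentropy API):

```python
from pyspark.sql import DataFrame


def assert_feature_matrix_covers_credible_sets(
    credible_sets: DataFrame,
    feature_matrix: DataFrame,
    id_col: str = "studyLocusId",  # hypothetical key column shared by both datasets
) -> None:
    """Fail fast if any credible set is missing from the precomputed feature matrix."""
    # left_anti keeps only credible sets with no matching row in the feature matrix
    missing = (
        credible_sets.select(id_col)
        .distinct()
        .join(feature_matrix.select(id_col).distinct(), on=id_col, how="left_anti")
    )
    n_missing = missing.count()
    if n_missing > 0:
        raise ValueError(
            f"{n_missing} credible sets are not annotated in the feature matrix."
        )
```

Running a check like this at the start of training/prediction would catch a stale feature matrix before any model code runs.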
After my tests, I concluded that the majority of the computation time goes into feature extraction (the generation of the long dataframe). There is a lot of logic there, but any improvement will speed up the whole process.
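One generic optimisation worth trying is sketched below, under assumptions about the shape of that long dataframe (all function, column and feature names are illustrative, not the current feature-factory code): cache the joined input that several features are derived from, so Spark doesn't re-evaluate the same lineage once per feature.

```python
from pyspark.sql import DataFrame, functions as f


def build_long_feature_df(
    credible_sets: DataFrame, colocalisation: DataFrame
) -> DataFrame:
    """Build a long (studyLocusId, geneId, featureName, featureValue) dataframe."""
    # This join feeds several features; cache it once instead of
    # recomputing the full lineage for every feature definition.
    base = credible_sets.join(colocalisation, on="studyLocusId").cache()

    # Two hypothetical colocalisation-derived features built from the same base.
    max_h4 = (
        base.groupBy("studyLocusId", "geneId")
        .agg(f.max("h4").alias("featureValue"))
        .withColumn("featureName", f.lit("maxColocH4"))
    )
    mean_h4 = (
        base.groupBy("studyLocusId", "geneId")
        .agg(f.mean("h4").alias("featureValue"))
        .withColumn("featureName", f.lit("meanColocH4"))
    )
    return max_h4.unionByName(mean_h4)
```

Whether this helps depends on how the real factories are wired, but profiling where the long dataframe's lineage is re-evaluated seems like the first thing to check.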