Optimise feature matrix management to accelerate L2G Training and Prediction #3252

Closed · 2 tasks
ireneisdoomed opened this issue Mar 11, 2024 · 3 comments · Fixed by opentargets/gentropy#745

ireneisdoomed commented Mar 11, 2024

As a developer, I want to optimise feature annotation processing because it will reduce computation time during the L2G training and prediction phases.

Background

L2G currently annotates all features of the input credible sets at execution time. We designed it this way because:

  • the feature matrix is purely an intermediate dataset, only useful in the process of L2G training/prediction;
  • it is more reliable: we don't introduce a codependency between two files.

However sensible, in practice this approach makes training L2G under different scenarios inconvenient. Most of the step's computation time goes into feature annotation, so every single L2G training run, in which we annotate all credible sets, takes about 25 minutes.
This also affects prediction. In that step, only the credible sets for which we want to extract L2G scores are annotated, yet I experienced unreasonably long times extracting predictions for just 30 loci.

Tasks

  • Ensure the business logic of the colocalisation factories doesn't have big bottlenecks that might be slowing the process down.
  • If not, consider writing the feature matrix as another dataset. To ensure credible set/feature matrix compatibility, we must assert that all credible sets are part of the feature matrix (see the sketch after this list).
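
A minimal sketch of that assertion, assuming hypothetical PySpark DataFrames `credible_sets` and `feature_matrix` that both carry a `studyLocusId` column (the function and column names are assumptions, not the gentropy API):

```python
from pyspark.sql import DataFrame


def assert_feature_matrix_covers_credible_sets(
    credible_sets: DataFrame, feature_matrix: DataFrame
) -> None:
    """Fail fast if any credible set is missing from the feature matrix."""
    # The anti-join keeps credible sets with no matching row in the feature matrix.
    missing = credible_sets.select("studyLocusId").distinct().join(
        feature_matrix.select("studyLocusId").distinct(),
        on="studyLocusId",
        how="left_anti",
    )
    n_missing = missing.count()
    if n_missing > 0:
        raise ValueError(
            f"{n_missing} credible sets are missing from the feature matrix; "
            "regenerate the feature matrix before training/prediction."
        )
```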
@ireneisdoomed (Author) commented:

This PR is relevant to the issue described here: opentargets/gentropy#544

After my tests, I concluded that the majority of the computation time goes into feature extraction (the generation of the long dataframe). There is a lot of logic there, so any improvement will speed up the process.
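
As a generic illustration of one way to attack that bottleneck (not the gentropy implementation; `extract_features` and all column names here are hypothetical), persisting the long dataframe before pivoting it into the wide feature matrix stops Spark from recomputing the expensive extraction for every downstream action:

```python
from pyspark import StorageLevel
from pyspark.sql import functions as f

# Hypothetical expensive step: one row per (credible set, gene, feature).
long_features = extract_features(credible_sets)

# Cache the long dataframe so later actions reuse it instead of re-running
# the whole feature extraction lineage.
long_features.persist(StorageLevel.MEMORY_AND_DISK)

# Pivot into the wide matrix consumed by the L2G model:
# one row per (credible set, gene), one column per feature.
feature_matrix = (
    long_features.groupBy("studyLocusId", "geneId")
    .pivot("featureName")
    .agg(f.first("featureValue"))
)
```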

@addramir commented:

Can we close this issue since it is duplicated in other issues?

@ireneisdoomed (Author) commented:

Yes. Closing, as there are no specific actions.
