[Feature] Fast Gapfill #1449

@djinnome

Problem

I want to be able to run a fast (but approximate) gapfill over a large universal matrix and get a quick and dirty answer that I can use to reduce the size of the universal matrix for slow (but accurate) gap filling.

Solution

The proposed solution is to add an optional boolean parameter fast_gapfilling to gapfill() that relaxes the gapfilling problem: instead of binary indicator variables, it uses continuous indicator variables constrained to lie between zero and one, as described in the supplementary material of:

Dreyfuss, J. M., Zucker, J. D., Hood, H. M., Ocasio, L. R., Sachs, M. S., & Galagan, J. E. (2013). Reconstruction and validation of a genome-scale metabolic model for the filamentous fungus Neurospora crassa using FARM. PLoS Computational Biology, 9(7), e1003126. https://doi.org/10.1371/journal.pcbi.1003126

Then you run gapfill() by minimizing the sum of the indicator variables (an L1 metric), and voilà: an approximate solution. The result may contain more reactions than are strictly necessary, but decreasing the integer threshold can help resolve this.
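
To make the relaxation concrete, here is one way it might be written down. This is a generic sketch in my own notation, not the exact formulation from the cited supplementary material or from the current gapfill() code. Assumed symbols: M is the set of reactions already in the model, U the set of universal reactions, S the combined stoichiometric matrix, v the flux vector with bounds l_j <= 0 <= u_j, c_j the penalty on universal reaction j, z_j its indicator, mu_min the minimum required objective flux, and tau a cutoff applied after solving.

```latex
\[
\begin{aligned}
\min_{v,\,z}\quad & \sum_{j \in U} c_j z_j \\
\text{s.t.}\quad & S v = 0, \\
& l_j \le v_j \le u_j, && j \in M, \\
& l_j z_j \le v_j \le u_j z_j, && j \in U, \\
& v_{\text{obj}} \ge \mu_{\min}, \\
& z_j \in \{0, 1\} \;\longrightarrow\; 0 \le z_j \le 1, && j \in U.
\end{aligned}
\]

% Candidate reactions kept for the slow, exact gapfill:
\[
\hat{U}_{\tau} = \{\, j \in U : z_j^{\ast} > \tau \,\}
\]
```

Replacing the binary constraint with the interval turns the mixed-integer program into a plain linear program. With positive penalties, each optimal z_j is only as large as the flux through reaction j requires, so the objective behaves like a weighted L1 norm of the added fluxes. The choice of tau then controls how many universal reactions survive into the reduced set.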

Alternatives

No response

Anything else?

There are many enzyme prediction algorithms that provide the likelihood that a protein sequence and/or structure catalyzes a particular reaction, such as:

Yang, Y., Jerger, A., Feng, S., Wang, Z., Brasfield, C., Cheung, M. S., Zucker, J., & Guan, Q. (2024). Improved enzyme functional annotation prediction using contrastive learning with structural inference. Communications Biology, 7(1), 1690. https://doi.org/10.1038/s42003-024-07359-z

These likelihoods could be used to provide (negative) penalties for the indicator variables. Furthermore, many reactions carry thermodynamic information, which could be combined with the enzyme evidence to produce an overall probability that a reaction should be included in the model.
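
As a rough illustration of the idea above, the sketch below turns per-reaction evidence into penalties. Everything in it is an assumption made for illustration: the function names are hypothetical, the two probabilities are treated as independent and simply multiplied, and the penalty is taken as the negative log of the combined probability (so likely reactions are cheap to add). The reaction IDs and numbers are made up.

```python
import math


def combined_probability(p_enzyme, p_thermo):
    """Combine enzyme and thermodynamic evidence into one probability.

    Assumption: the two sources are independent, so they multiply.
    """
    for p in (p_enzyme, p_thermo):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return p_enzyme * p_thermo


def penalty_from_probability(p, floor=1e-6):
    """Map a probability to a non-negative penalty via -log(p).

    The floor keeps the penalty finite when p is zero.
    """
    return -math.log(max(p, floor))


def reaction_penalties(evidence, floor=1e-6):
    """Build {reaction_id: penalty} from {reaction_id: (p_enzyme, p_thermo)}."""
    return {
        rxn_id: penalty_from_probability(combined_probability(p_e, p_t), floor)
        for rxn_id, (p_e, p_t) in evidence.items()
    }


# Made-up evidence for three hypothetical universal reactions.
evidence = {
    "RXN_A": (0.9, 0.8),  # well supported
    "RXN_B": (0.2, 0.5),  # weakly supported
    "RXN_C": (0.0, 1.0),  # no enzyme evidence at all
}
penalties = reaction_penalties(evidence)

# Print reactions from cheapest to most expensive to add.
for rxn_id, cost in sorted(penalties.items(), key=lambda kv: kv[1]):
    print(f"{rxn_id}: {cost:.3f}")
```

A dictionary like this might then be supplied as per-reaction penalties to gapfill(), assuming its penalties argument accepts reaction identifiers as keys. Other mappings (for example 1 - p) would work just as well; the negative log is only one convenient choice.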
