Problem
I want to be able to run a fast (but approximate) gapfill over a large universal matrix and get a quick-and-dirty answer that I can use to shrink the universal matrix before running the slow (but accurate) gapfill.
Solution
The proposed solution is to add an optional boolean parameter, fast_gapfilling, to gapfill(). When it is set, the gapfilling problem is relaxed from a mixed-integer program to a linear program: instead of binary indicator variables, the indicators are continuous and constrained to lie between zero and one, as described in the supplementary material of:
Dreyfuss, J. M., Zucker, J. D., Hood, H. M., Ocasio, L. R., Sachs, M. S., & Galagan, J. E. (2013). Reconstruction and validation of a genome-scale metabolic model for the filamentous fungus Neurospora crassa using FARM. PLoS Computational Biology, 9(7), e1003126. https://doi.org/10.1371/journal.pcbi.1003126
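For concreteness, here is a generic big-M sketch of the relaxed problem. It is an assumed, standard GapFill-style formulation, not necessarily the exact one in the supplementary material. U is the set of candidate reactions from the universal model, S the stoichiometric matrix, v the flux vector, z the indicator variables, c_i a per-reaction penalty (c_i = 1 gives the plain L1 sum), M a large flux bound, and t the minimum required objective flux:

$$
\begin{aligned}
\min_{v,\,z}\quad & \sum_{i \in U} c_i z_i \\
\text{s.t.}\quad & S v = 0, \\
& v_{\text{obj}} \ge t, \\
& lb_i \le v_i \le ub_i \quad \forall i, \\
& -M z_i \le v_i \le M z_i \quad \forall i \in U, \\
& 0 \le z_i \le 1 \quad \forall i \in U \qquad (\text{instead of } z_i \in \{0, 1\}).
\end{aligned}
$$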
Then gapfill() minimizes the sum of the indicator variables (an L1 metric), and voilà: an approximate solution. It may contain more reactions than are strictly necessary, but adjusting the integer threshold can help resolve this.
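A minimal sketch of how the relaxed result could be used to shrink the universal matrix, assuming the LP has already been solved and its continuous indicator values are available as a plain dict keyed by reaction ID. The function reduce_universal and its max_reactions option are hypothetical; the integer_threshold parameter name only echoes the "integer threshold" mentioned above and here simply means the cutoff above which a relaxed indicator counts as selected.

```python
# Hypothetical post-processing for the proposed fast_gapfilling mode.
# Assumes the relaxed LP was already solved and its continuous indicator
# values (each in [0, 1]) are available as {reaction_id: value}.
# None of these names are part of COBRApy.


def reduce_universal(indicators, integer_threshold=1e-6, max_reactions=None):
    """Return IDs of universal reactions worth keeping for the exact MILP.

    A reaction is kept if its relaxed indicator exceeds integer_threshold.
    Lowering the threshold keeps more reactions (a larger but safer reduced
    universal model); raising it prunes more aggressively. If max_reactions
    is given, only that many reactions with the largest indicators are kept.
    """
    if not 0.0 <= integer_threshold < 1.0:
        raise ValueError("integer_threshold must lie in [0, 1)")
    if max_reactions is not None and max_reactions < 0:
        raise ValueError("max_reactions must be non-negative")
    kept = [(z, rxn) for rxn, z in indicators.items() if z > integer_threshold]
    # Largest indicator values first; ties broken by reaction ID so the
    # result is deterministic.
    kept.sort(key=lambda item: (-item[0], item[1]))
    if max_reactions is not None:
        kept = kept[:max_reactions]
    return [rxn for _, rxn in kept]


relaxed = {"R1": 1.0, "R2": 0.35, "R3": 2e-9, "R4": 0.35, "R5": 0.0}
print(reduce_universal(relaxed))                   # ['R1', 'R2', 'R4']
print(reduce_universal(relaxed, max_reactions=2))  # ['R1', 'R2']
```

The returned IDs would then be used to build a smaller universal model for the regular (binary) gapfill() call; that step depends on the final API and is left out here.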
Alternatives
No response
Anything else?
There are many enzyme prediction algorithms that estimate the likelihood that a protein sequence and/or structure catalyzes a particular reaction, such as:
Yang, Y., Jerger, A., Feng, S., Wang, Z., Brasfield, C., Cheung, M. S., Zucker, J., & Guan, Q. (2024). Improved enzyme functional annotation prediction using contrastive learning with structural inference. Communications Biology, 7(1), 1690. https://doi.org/10.1038/s42003-024-07359-z
These predictions could be used to provide (negative) penalties for the indicator variables. Furthermore, many reactions have associated thermodynamic information (e.g. estimated Gibbs free energies), which could be combined with the enzyme evidence to produce an overall probability that a reaction should be included in the model.
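One possible reading of this idea, sketched under loudly stated assumptions: treat the enzyme-prediction likelihood and a thermodynamic score as independent probabilities, multiply them, and use the negative log of the product as the reaction's indicator penalty, so the optimizer prefers well-supported reactions. The function names, the independence assumption, the logistic mapping from ΔG to a probability, and its 10 kJ/mol scale are all hypothetical illustrations; a real implementation would need calibrated probabilities.

```python
import math

# Hypothetical sketch: turn enzyme-prediction likelihoods and thermodynamic
# estimates into per-reaction indicator penalties. The independence
# assumption, the logistic mapping and its scale are illustrative choices,
# not an established method.


def thermo_probability(delta_g, scale=10.0):
    """Map a Gibbs energy estimate (kJ/mol, in the direction the reaction
    would be added) to [0, 1]: more negative values give values nearer 1."""
    x = max(min(delta_g / scale, 700.0), -700.0)  # keep math.exp in range
    return 1.0 / (1.0 + math.exp(x))


def evidence_penalties(evidence, floor=1e-9):
    """evidence: {reaction_id: (p_enzyme, delta_g)} -> {reaction_id: penalty}.

    The two evidence sources are treated as independent, so their
    probabilities are multiplied. The penalty is the negative log of that
    probability: well-supported reactions cost close to 0, poorly supported
    ones cost more. floor avoids taking log(0).
    """
    penalties = {}
    for rxn_id, (p_enzyme, delta_g) in evidence.items():
        if not 0.0 <= p_enzyme <= 1.0:
            raise ValueError(f"p_enzyme for {rxn_id} must lie in [0, 1]")
        p = p_enzyme * thermo_probability(delta_g)
        penalties[rxn_id] = -math.log(max(p, floor))
    return penalties


pens = evidence_penalties({"RXN_A": (0.9, -20.0), "RXN_B": (0.1, 15.0)})
print(pens["RXN_A"] < pens["RXN_B"])  # True
```

The resulting dict could then serve as per-reaction costs on the indicator variables; how gapfill() would accept such costs is left open here.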