[Feature Request] Generate all HiFi reads in a single file per sample #17

jwalewski · 2024-02-23T12:18:47Z

Hello again,

Thanks for your incredible help in using this software. While everything is working now, I think another significant increase in efficiency could come from generating one output file per input file, rather than the current behavior, which, to my knowledge, generates one output file per chromosome/DNA fragment. Perhaps this could be enabled as an option?

Currently I am (and I assume other users are as well) concatenating all of the output files into a single file with cat. An example follows:

            sed -i '1,2d' "$INTERMEDIATE_DIR$OUTPUTNAME"*.sam # remove the first two (header) lines from each file
            cat "$INTERMEDIATE_DIR$OUTPUTNAME"*.sam > "$INTERMEDIATE_DIR$OUTPUTNAME.sam" # concatenate everything into one file
            rm "$INTERMEDIATE_DIR$OUTPUTNAME"_0* # delete the per-fragment files
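
The three steps above read and write every file twice (once for the in-place sed, once for cat). A minimal sketch of a single-pass alternative follows. It assumes that every header line starts with `@` (the SAM specification does not allow read names to contain `@`) and that the merged file should contain no header lines at all, as in the snippet above. The function name `merge_sam_records` is hypothetical and not part of PBSIM, and `-h` is a grep option available in GNU and BSD grep.

```shell
# Hypothetical helper (not part of PBSIM): merge SAM files in a single pass.
# Usage: merge_sam_records OUTPUT INPUT...
merge_sam_records() {
    out=$1
    shift
    # -v '^@' drops every header line; -h suppresses the filename prefix
    # that grep would otherwise add when given several input files.
    grep -hv '^@' "$@" > "$out"
}

# Example call, reusing the variable names from the snippet above:
# merge_sam_records "$INTERMEDIATE_DIR$OUTPUTNAME.sam" "$INTERMEDIATE_DIR$OUTPUTNAME"_0*.sam
```

Because the inputs are only read and never modified, the per-fragment files can still be deleted afterwards with the same rm command. Unlike `sed '1,2d'`, this drops lines by their `@` prefix rather than by position, so it does not depend on the header being exactly two lines long.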

Please let me know if you have any other questions about the desired functionality. And once again, thanks so much for an incredibly useful algorithm!

All the best,

Joe

yukiteruono · 2024-02-27T01:48:37Z

Thank you for your suggestion.
Since having one output file per chromosome/DNA fragment is only a small burden on the user, we will not add this functionality right away. However, we plan to improve the usability of PBSIM as demand for long-read simulators changes in the future.

jwalewski · 2024-03-01T11:30:45Z

Hey, thank you for your timely reply, and a huge thanks for considering this as a future feature. In the meantime, for highly fragmented or large genomes, do you have any suggestions for how to make the process more efficient? For instance, are there any ways to multithread sed (to remove file headers), or ways to speed up cat? I assume the primary limiting factor is disk read/write speed?

Any help is much appreciated.
