

[Feature Request] Generate all HiFi reads in a single file per sample #17

Open
jwalewski opened this issue Feb 23, 2024 · 2 comments

@jwalewski

Hello again,

Thanks for your incredible help in using this software. Now that everything is working, I think another significant gain in efficiency could come from writing one output file per input file (rather than the current behavior, which, to my knowledge, writes one output file per chromosome/DNA fragment). Perhaps this could be enabled as an option?

Currently I am (and I assume other users are as well) concatenating all of the output files into a single file with cat. For example:

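            sed -i '/^@/d' "${INTERMEDIATE_DIR}${OUTPUTNAME}"_0*.sam # remove SAM header lines (they start with '@') from each per-sequence file
            cat "${INTERMEDIATE_DIR}${OUTPUTNAME}"_0*.sam > "${INTERMEDIATE_DIR}${OUTPUTNAME}.sam" # merge; the _0* glob keeps the merged file out of its own input
            rm "${INTERMEDIATE_DIR}${OUTPUTNAME}_0"* # delete the per-sequence intermediates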
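
Dropping every header line leaves a headerless SAM. If downstream tools need a valid header, a variant could keep one @HD line plus every @SQ line before the reads. The sketch below is a hypothetical helper (merge_sam_with_header is not part of PBSIM), and it assumes each per-sequence file carries an @HD line and its own @SQ line(s), as SAM headers usually do:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of PBSIM: merge per-sequence SAM files into one
# SAM that keeps a single combined header.
# Assumption: each input has an @HD line and one or more @SQ lines.
merge_sam_with_header() {
    local out=$1
    shift
    {
        # One @HD line, taken from the first input file only
        grep -m 1 '^@HD' "$1"
        # Every @SQ line from every input, so all reference sequences are declared
        grep -h '^@SQ' "$@"
        # All alignment records; a SAM QNAME cannot start with '@', so only reads remain
        grep -hv '^@' "$@"
    } > "$out"
}
```

Usage, with the same naming pattern as above: merge_sam_with_header "${INTERMEDIATE_DIR}${OUTPUTNAME}.sam" "${INTERMEDIATE_DIR}${OUTPUTNAME}"_0*.sam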

Please let me know if you have any other questions about the desired functionality. And once again, thanks so much for an incredibly useful algorithm!

All the best,

Joe

@yukiteruono
Owner

Thank you for your suggestion.
Since writing one output file per chromosome/DNA fragment places only a small burden on the user, we will not add this functionality right away, but we plan to improve the usability of PBSIM as demand for long-read simulators changes in the future.

@jwalewski
Author

Hey, thank you for your timely reply, and a huge thanks for considering this as a future feature. In the meantime, for highly fragmented or large genomes, do you have any suggestions for making the process more efficient? For instance, is there a way to multithread sed (to remove the file headers), or to speed up cat? I assume the main limiting factor is disk read/write speed?

Any help is much appreciated.
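
If the job is indeed I/O-bound, reducing the number of passes over the data is likely to help more than parallelizing sed. Running sed -i rewrites every per-sequence file, and cat then reads them all again; a single grep can drop the header lines and concatenate in one pass. A minimal sketch, where concat_sam_records is a hypothetical helper name and LC_ALL=C is used because byte-oriented matching in the C locale is often faster than locale-aware matching:

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip SAM header lines and concatenate in a single pass.
# Each input byte is read once and written once.
concat_sam_records() {
    local out=$1
    shift
    # -h: do not prefix output lines with file names
    # -v '^@': keep only lines that do not start with '@' (i.e. drop the header)
    LC_ALL=C grep -hv '^@' "$@" > "$out"
}
```

Usage: concat_sam_records "${INTERMEDIATE_DIR}${OUTPUTNAME}.sam" "${INTERMEDIATE_DIR}${OUTPUTNAME}"_0*.sam, after which the per-sequence files can be removed as before.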
