
Gzipping all training files results in a nice size reduction: add a feature that allows scripts/modules to handle this #28


Description

@dkoslicki

For example, using the Metalign default training database (199807 genomes) and running

python MakeStreamingDNADatabase.py ${trainingFiles} ${outputDir}/${cmashDatabase} -n ${numHashes} -k 60 -v
python MakeStreamingPrefilter.py ${outputDir}/${cmashDatabase} ${outputDir}/${prefilterName} 30-60-10

results in the following uncompressed files:

16G Mar 22 03:39 cmash_db_n1000_k60.h5
9.3G Mar 22 08:07 cmash_db_n1000_k60_30-60-10.bf
6.9G Mar 22 04:34 cmash_db_n1000_k60.tst

yet after gzipping:

4.6G Mar 22 03:39 cmash_db_n1000_k60.h5.gz
3.6G Mar 22 08:07 cmash_db_n1000_k60_30-60-10.bf.gz
3.6G Mar 22 04:34 cmash_db_n1000_k60.tst.gz

so roughly 2-3.5x compression.

Would need to either:

  • Enable MakeStreamingDNADatabase.py and MakeStreamingPrefilter.py to detect compressed training data and decompress it within the script, or (better yet)
  • Enable decompression in the modules MinHash.py and Query.py themselves (see the sketch after this list).
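
A minimal sketch of the second option, assuming a hypothetical helper named open_maybe_gzipped (not an existing CMash function): sniff the two-byte gzip magic number and open the file transparently, so the modules could accept plain or gzipped inputs without a new flag.

import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

def open_maybe_gzipped(path, mode="rt"):
    # Check the magic number rather than the file extension,
    # so renamed or extensionless files are still handled.
    with open(path, "rb") as fh:
        is_gzipped = fh.read(2) == GZIP_MAGIC
    return gzip.open(path, mode) if is_gzipped else open(path, mode)

Text inputs such as the FASTA training files could then be read through this helper wherever open() is used today; the binary .h5/.tst/.bf artifacts would additionally need the consuming libraries to accept a file-like object, or a decompress-to-a-temporary-file fallback.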
