Releases: sgkit-dev/vcztools
Major feature release
Major feature release adding new output formats, cloud store support, a Python API, performance improvements and bug fixes.
New output formats:
- Add `view-plink` command for PLINK 1 binary output (.bed/.bim/.fam), following the semantics of `plink2 --vcf X --make-bed`. Use `--no-bim`/`--no-fam` to suppress sidecars.
- Add `view-bgen` command for Oxford BGEN output, streamed to stdout or written with `.bgen.bgi`/`.sample` sidecars (`-o STEM`). Supports haploid and mixed-ploidy data.
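To make the PLINK 1 binary output more concrete, the sketch below packs one variant's genotype calls into `.bed` bytes using the 2-bit codes of the PLINK 1 format. It is an illustration only, not the vcztools encoder: the function name `encode_variant` is hypothetical, and the mapping of A1 to ALT and A2 to REF is an assumption about the `--make-bed` convention.

```python
# Illustrative only: packs ALT-allele dosages into PLINK 1 .bed bytes.
# Assumption: A1 = ALT and A2 = REF. This is not vcztools' actual encoder.

BED_MAGIC = bytes([0x6C, 0x1B, 0x01])  # header of a variant-major .bed file

# 2-bit genotype codes defined by the PLINK 1 .bed format
HOM_A1, MISSING, HET, HOM_A2 = 0b00, 0b01, 0b10, 0b11


def encode_variant(alt_dosages):
    """Pack one variant's calls (0, 1, 2 or None) four samples per byte."""
    codes = {2: HOM_A1, 1: HET, 0: HOM_A2, None: MISSING}
    out = bytearray((len(alt_dosages) + 3) // 4)  # last byte is zero-padded
    for i, dosage in enumerate(alt_dosages):
        # The first sample occupies the two lowest-order bits of each byte.
        out[i // 4] |= codes[dosage] << (2 * (i % 4))
    return bytes(out)


# Five samples need two bytes; the unused bits of the second byte stay zero.
row = encode_variant([0, 1, 2, None, 2])
print(row.hex())  # 4b00
```

A complete `.bed` file would be `BED_MAGIC` followed by one such row per variant.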
Cloud and remote storage:
- Read VCZ from cloud and remote stores via the optional `[obstore]` and `[icechunk]` extras; fsspec and HTTP stores are also supported.
- Add support for `.vcz.zip` files (#280).
- Support "proportional chunking" across arrays (#356).
Filtering and CLI:
- Support `-R/--regions-file` and `-T/--targets-file` (#268).
- Add `-v/--types`, `-V/--exclude-types`, `-m/--min-alleles`, and `-M/--max-alleles` to `view` and `view-plink`, matching bcftools view, along with a new `N_ALT` filter identifier.
- Add `N_MISSING` and `F_MISSING` filter variables.
- Add `--fill-tags` to emulate `bcftools +fill-tags`.
- Add `--log-level` and `--log-file` options; report throughput in MiB/s.
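As a rough illustration of what the missingness filter variables measure, the sketch below counts samples with a missing genotype at one variant and derives the fraction from that count. It assumes the integer sentinels -1 (missing) and -2 (fill) and assumes a partially missing call counts as missing; the helper `missing_stats` is hypothetical and is not how vcztools computes these values.

```python
# Sketch of N_MISSING / F_MISSING for a single variant.
# Assumptions: -1 marks a missing allele, -2 pads lower-ploidy calls,
# and a call with any missing allele counts as a missing sample.
MISSING = -1
FILL = -2


def missing_stats(genotypes):
    """genotypes: one list of allele indexes per sample."""
    n_missing = 0
    for call in genotypes:
        alleles = [a for a in call if a != FILL]  # ignore ploidy padding
        if any(a == MISSING for a in alleles):
            n_missing += 1
    f_missing = n_missing / len(genotypes) if genotypes else 0.0
    return n_missing, f_missing


# Four samples: het, fully missing, hom-alt, and a haploid call padded with fill.
calls = [[0, 1], [-1, -1], [1, 1], [0, -2]]
n_missing, f_missing = missing_stats(calls)
print(n_missing, f_missing)  # 1 0.25
```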
Data types:
- Add support for float16 and float64 data stored in Zarr (#413, #414).
- Add support for 64-bit integer POS and INFO/FORMAT fields.
Python API:
- Add a public Python API, documented under {ref}`sec-python-api`:
  - `VczReader` for variant data access, with sample selection by ID via `set_samples(sample_ids, complement=, ignore_missing_samples=)` (or by raw index via `set_sample_indexes`).
  - One-shot writers `write_vcf`/`write_plink`/`write_bgen` and streaming `BedEncoder`/`BgenEncoder` byte encoders.
  - Sidecar writers `write_bim`/`write_fam`/`write_sample`/`write_bgi`.
  - Click option bundles (`ViewBgenOptions`, `ViewPlinkOptions`, `SelectionOptions`, `ZarrStoreOptions`, `ReaderOptions`, `LogOptions`) and `GroupedCommand` for downstream CLI reuse.
  - Array-sentinel helpers `is_missing`/`is_fill`/`trim_fill` for interpreting missing and end-of-vector (fill) values in iterated arrays.
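The distinction between missing and fill values can be sketched in plain Python. The functions below are hypothetical stand-ins written for this note, not the library's helpers, and their signatures are not taken from the vcztools API. They assume the integer sentinels -1 for missing and -2 for end-of-vector padding.

```python
# Hypothetical stand-ins for the sentinel helpers; not the vcztools API.
INT_MISSING = -1  # assumed sentinel for a missing value (".") in integer fields
INT_FILL = -2     # assumed sentinel for end-of-vector padding


def is_missing_int(value):
    return value == INT_MISSING


def is_fill_int(value):
    return value == INT_FILL


def trim_fill_list(values):
    """Drop trailing fill padding, keeping real and missing entries."""
    end = len(values)
    while end > 0 and is_fill_int(values[end - 1]):
        end -= 1
    return values[:end]


# A field padded to length 4: two real entries (one missing), two fill slots.
padded = [12, -1, -2, -2]
trimmed = trim_fill_list(padded)
print(trimmed)  # [12, -1]
```

Missing entries survive trimming because they represent a recorded "." rather than padding.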
Platform and packaging:
- Add Windows support.
- Provide prebuilt wheels for Linux, macOS, and Windows (CPython 3.11-3.13).
- New required dependencies: `pandas` and `humanfriendly`; `pyparsing>=3.1.0`. Genomic-range operations now use `ruranges_py` on Python 3.12+ and `pyranges` on earlier versions.
Documentation:
- Add a documentation website covering installation, storage backends, the CLI, PLINK/BGEN output, and the Python API.
Deprecations:
- `--zarr-backend-storage` is deprecated in favour of `--backend-storage`; the old name still works and emits a warning.
Bug fixes:
- vcztools query silently truncated output on multi-chunk stores (#283)
- vcztools view crash on FILTER expressions on multi-chunk stores (#282)
- Incorrect output on vcztools query on FORMAT fields (#286, #287)
- Per-allele INFO field + -s crashes (#295)
- VCF header emitted even when filter expression is invalid (#221)
- Incorrect behaviour for bcftools query with FORMAT scoped filters and sample subsetting (#297)
- Incorrect output for filtering with missing Number=A INFO field + numeric (#299)
- Noise on stderr when performing arithmetic with missing values (#301)
- Null samples included in plink output (#310)
- Fill values handled incorrectly by query (#415)
- Crash evaluating non-scalar INFO field expressions (#435, #436)
Minor feature, maintenance and bugfix release.
Features:
- Add -N/--disable-automatic-newline option (#261)
- Support -S/--samples-file in query (#264)
- Ignore missing samples (#258)
Bug fixes:
- Fix region edge cases and improve test coverage (#262). Region queries or views were in some cases omitting variants that should have been returned.
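The kind of edge case involved can be sketched as follows, assuming bcftools-style semantics: coordinates are 1-based and inclusive, a region (`-r`) matches any variant whose REF span overlaps it, and a target (`-t`) matches on POS alone. The function names are hypothetical and this is not the vcztools implementation.

```python
# Sketch of region vs. target matching under assumed bcftools-style semantics.
# Coordinates are 1-based and inclusive at both ends.


def variant_span(pos, ref):
    """A variant covers POS through POS + len(REF) - 1."""
    return pos, pos + len(ref) - 1


def overlaps_region(pos, ref, start, end):
    """Region-style match: any overlap between the variant span and the interval."""
    v_start, v_end = variant_span(pos, ref)
    return v_start <= end and v_end >= start


def pos_in_target(pos, start, end):
    """Target-style match: only POS is compared with the interval."""
    return start <= pos <= end


# A deletion at POS 98 with REF "ACGT" covers 98..101, so it overlaps 100-200
# even though its POS lies before the interval.
print(overlaps_region(98, "ACGT", 100, 200))  # True
print(pos_in_target(98, 100, 200))            # False
# The interval end is inclusive.
print(overlaps_region(200, "A", 100, 200))    # True
print(overlaps_region(201, "A", 100, 200))    # False
```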
Breaking:
Feature release
Improvements:
- Support filtering by FILTER (#217), CHROM (#223) and general string values (#220)
- Support regions (-r/-t), filter expressions (-i/-e) and samples (-s) in query command (#205)
- Various improvements to support VCZ datasets produced from tskit and plink files by bio2zarr.
- Use a fully dynamically generated header via vcf_meta_information attributes (#208). Requires vcf-zarr version >= 0.4 (bio2zarr >= 0.1.6) to fully recover the original header.
- Add --version (#197)
Breaking:
Update minimum Click version to 8.2.0 (#206)
Recommended bugfix release
Important bugfixes for filtering language and sample subsetting.
All users are recommended to upgrade ASAP.
Filtering mini-language
Clarify the implementation status of the filtering mini-language in view/query. Version 0.0.1 contained several data-corrupting bugs, including incorrect missing data handling (#163), incorrect matching on FILTER (#164) and CHROM (#178) columns, and incorrect per-sample filtering in query (#179). These issues have been resolved by raising informative errors on aspects of the query language that are not implemented correctly.
The filtering mini-language now consists of arbitrary arithmetic expressions on 1-dimensional fields.
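As a loose illustration of what an expression over 1-dimensional fields means, the sketch below treats each field as one value per variant and evaluates an arithmetic comparison elementwise to obtain a per-variant mask. The expression `QUAL / DP > 2`, the field values, and the evaluation strategy are all assumptions made for this example; it does not reflect how vcztools parses or evaluates filters.

```python
# Sketch: a filter over 1-dimensional (per-variant) fields yields one
# boolean per variant. Field values and the expression are made up.
qual = [50.0, 12.5, 30.0]  # one QUAL value per variant
depth = [10, 25, 5]        # one INFO/DP value per variant

# Hypothetical expression "QUAL / DP > 2", evaluated variant by variant.
mask = [q / d > 2 for q, d in zip(qual, depth)]
print(mask)  # [True, False, True]

# Variants where the mask is True would be kept by an include filter.
kept = [i for i, keep in enumerate(mask) if keep]
print(kept)  # [0, 2]
```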
Sample subsetting
Add support for specifying samples via -s/-S options
Initial release
0.0.1a3
Alpha pre-release for testing
Alpha pre-release for testing
Early pre-release for testing.