@whitwham
Contributor

This is the NEWS update for the end of 2025 release.

It is open for review, and for further updates, until the release.

Member

@daviesrob left a comment

Added a few suggested changes to make things clearer. Feel free to make adjustments if you think they need improving.

NEWS (outdated)
Comment on lines 32 to 34
* S3 reading code now reads in `chunks` to minimise S3 reading length when
doing a range request. Also this combines the reading, writing and
authorisation code into a single file.
Member

I think this might be a better description?

Suggested change
* S3 reading code now reads in `chunks` to minimise S3 reading length when
doing a range request. Also this combines the reading, writing and
authorisation code into a single file.
* S3 reading code now reads in "chunks" to limit the amount of data read (and
therefore egress costs) from the object store when doing a range request.
It also combines the reading, writing and authorisation code into a single
file.

NEWS (outdated)
Comment on lines 23 to 24
* Improved operation of filters that work with header data. Filter expressions
such as rname, mrname, rnext and library were not working well with iterators.
Member

Maybe this should count as a bug? The description looked a bit vague; this might be better:

Suggested change
* Improved operation of filters that work with header data. Filter expressions
such as rname, mrname, rnext and library were not working well with iterators.
* Improved operation of filters that work with header data. Filter expressions
set as an `HTS_OPT_FILTER` on a BAM or CRAM iterator failed to return
records matching on `rname`, `mrname`, `rnext` or `library`.

NEWS (outdated)
with multiple secondary alignments. See also samtools/hts-specs#842.
(PR #1951, fixes #1948. Reported by Matt Sexton)

* Recognise the tabix comment character (-c) when reading records.
Member

Suggested change
* Recognise the tabix comment character (-c) when reading records.
* Make tabix skip comments (-c) wherever they occur, not just at the start of
the file.

NEWS (outdated)
Comment on lines 76 to 77
* Fix embed_ref=2 on SEQ * and MD:Z tag. The combination of no sequence and
MD:Z with embed_ref=2 caused the slice extents to be miscalculated.
Member

Suggested change
* Fix embed_ref=2 on SEQ * and MD:Z tag. The combination of no sequence and
MD:Z with embed_ref=2 caused the slice extents to be miscalculated.
* Fix embed_ref=2 on SEQ * and MD:Z tag. The combination of no sequence and
MD:Z with embed_ref=2 caused the slice extents to be miscalculated,
resulting in invalid CRAM output.

NEWS (outdated)
Comment on lines 80 to 81
* Internally store phase in VCF4.4 format irrespective of input file format.
This should prevent problems when dealing with different VCF versions.
Member

I think this should be in the Updates section, as it's a fairly significant change in behaviour. This came out a bit long, but I think it's important to highlight exactly what changed here.

Suggested change
* Internally store phase in VCF4.4 format irrespective of input file format.
This should prevent problems when dealing with different VCF versions.
* HTSlib 1.22 changed the VCF reader so that it stored GT prefixed phasing
information, but only for files specifying `fileformat=VCFv4.4` or higher.
This caused problems when merging files with different versions, so the
VCF reader will now store prefixed phasing information irrespective of
the VCF version listed in the file headers. For files up to VCFv4.3, the
first phasing bit will be set if all other alleles are phased, and cleared
otherwise (following the rules for VCFv4.4 onwards where no explicit
phasing symbol is present). This will also happen when reading BCF.
When accessing GT data, it is no longer safe to assume that the first
allele's phasing bit is zero, even if the file reports a version earlier
than VCFv4.4.
Interfaces such as `bcf_gt_allele()` should always be used to access
GT allele data.
For compatibility, prefixed phasing will be stripped when writing VCF
files with version 4.3 or earlier.

NEWS (outdated)
Comment on lines 89 to 90
* Prevent the dropping of in-flight decode jobs when seeking in
cram_next_slice().
Member

Suggested change
* Prevent the dropping of in-flight decode jobs when seeking in
cram_next_slice().
* Fix bug where multi-threaded CRAM iterators could drop long alignments
starting significantly before, but overlapping, the region of interest.

@whitwham marked this pull request as ready for review on December 9, 2025 10:00