Inquiry About HGSVC3 PacBio HiFi Data Access and BAM File Usage #54

@CaryStar01

Hi there,

I am working on a project involving human genome sequencing data and have been using the statement below (with citation to Logsdon et al. 2024):

“We obtained PacBio HiFi long-read sequencing data for five individuals from the 1000 Genomes Project (HG02282, HG02769, HG02953, HG03452, and HG03520) from https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGSVC3 (Logsdon et al. 2024).”

I have two questions related to the HGSVC3 data access and usage:

Difficulty locating PacBio HiFi data for the five individuals
I have tried accessing the HGSVC3 dataset via the FTP link: https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGSVC3

However, I have not been able to locate the PacBio HiFi long-read sequencing data corresponding to these five individuals:

HG02282
HG02769
HG02953
HG03452
HG03520

Could you please provide the specific, direct links (or the exact directory paths) to the PacBio HiFi data for each of these samples? For example, if there is a standard subdirectory structure (e.g. under “sequencing”, “pacbio”, “hifi”, etc.), pointers to the correct subfolders would be extremely helpful.
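
The search described above can be sketched as follows. This is only a minimal illustration, assuming a plain-text listing of file paths from the FTP site has been saved locally, one path per line. The function name `find_sample_paths` is hypothetical, and the keywords "hifi" and "pacbio" are guesses at the naming, not a documented HGSVC3 directory layout.

```python
# Sample IDs taken from the statement quoted in this issue.
SAMPLES = ["HG02282", "HG02769", "HG02953", "HG03452", "HG03520"]


def find_sample_paths(listing_lines, samples=SAMPLES, keywords=("hifi", "pacbio")):
    """Group paths from a directory listing by sample ID.

    Only paths that mention at least one keyword (case-insensitive) are kept.
    The keywords are assumptions about naming, not a known HGSVC3 convention.
    """
    hits = {sample: [] for sample in samples}
    for line in listing_lines:
        path = line.strip()
        if not path:
            continue  # skip blank lines in the listing
        lowered = path.lower()
        if not any(keyword in lowered for keyword in keywords):
            continue  # path does not look like PacBio HiFi data
        for sample in samples:
            if sample in path:
                hits[sample].append(path)
    return hits
```

Calling `find_sample_paths(open("listing.txt"))` on such a saved listing (the file name is a placeholder) would return a dictionary keyed by sample ID. An empty list for a sample would suggest the data sits under a different name or location than the keywords assume.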

Question about using pre-aligned BAM files vs. re-aligning reads
I noticed that the HGSVC3 FTP directory already contains pre-aligned BAM files. For example (an ONT run for HG03520):

https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGSVC3/working/20240816_JAX_ONT_guppy6_Rebasecalled/HG03520/20211102_211020_21-lee-006_PCT0053_2-A11-D11/

Given that such pre-aligned BAM files already exist, I am trying to understand best practices:

In your view, is it recommended for downstream users to directly use these BAMs?
Or do you consider these BAMs mainly as intermediate/working files, and would you instead recommend users download the raw reads (FASTQ/FAST5) and perform their own alignments?
If you recommend re-alignment, could you briefly indicate the reasons? (For example: to use a specific reference genome version, a different aligner or parameter set, updated basecalling, or to ensure consistency with other datasets.)
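
One way to check whether a given BAM is already aligned, and against which reference, is to inspect its header, for example the text printed by `samtools view -H`. The sketch below assumes that header text is available as a string; the function name `summarize_sam_header` is hypothetical. A header with no @SQ lines usually indicates an unaligned BAM, but this is a hint rather than proof.

```python
def summarize_sam_header(header_text):
    """Summarize reference sequences and programs recorded in a SAM/BAM header.

    header_text is the tab-delimited header, e.g. from `samtools view -H`.
    """
    references = []
    programs = []
    for line in header_text.splitlines():
        fields = line.split("\t")
        record_type = fields[0]
        # Header fields have the form TAG:value; split on the first colon only.
        tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        if record_type == "@SQ" and "SN" in tags:
            references.append(tags["SN"])  # reference sequence name
        elif record_type == "@PG":
            # Prefer the program name, fall back to the program record ID.
            programs.append(tags.get("PN") or tags.get("ID", ""))
    return {
        "has_reference_sequences": bool(references),
        "references": references,
        "programs": programs,
    }
```

The `references` list shows which contig names the file was aligned to, which helps identify the reference genome version. The `programs` list shows the basecaller or aligner recorded by the producer, if any.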

Thank you very much for your time and for making these data available. Any guidance on (1) where to find the PacBio HiFi data for the five 1000 Genomes samples and (2) how you recommend using the pre-aligned BAM files would be extremely helpful for my project.

Best regards,
hyguo
