Quality header must start with +: CGGCCTCAGTGAGCGA... at line 1 in fastq #38

dlbowie0 · 2025-02-17T15:26:31Z

I generated reads using the following command line:
`pbsim --strategy wgs --method qshmm --qshmm ../../../../pbsim/pbsim3/data/QSHMM-ONT-HQ.model --length-min 2050 --length-max 2050 --depth 6000 --difference-ratio 0:0:0 --genome ../file.fa --prefix ssaav --accuracy-mean 1`

However, when I try to run quality control with FastQC or to convert the FASTQ file to BAM using picard-tools, the same error is generated: "Quality header must start with +: CGGCCTCAGTGAGCGAGCGAGCGCGCAGAGAGGGAGTGGCCAACTCCATCACTAGGGGTTCCTCAGATCAATTCCCGCCT at line 1 in fastq".

I would like to know whether this error stems from the parameters I have chosen or from the tool itself.

yukiteruono · 2025-02-18T01:13:43Z

Thank you for using PBSIM3.
I ran your command with some changes (--depth 6 --genome sample/sample.fasta).
The generated FASTQ file was fine.
Both FastQC and picard-tools ran successfully.
I suspect your FASTQ file was corrupted. If the nucleotide sequence in the FASTQ file is split into two lines, your error message will be output.
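
To see how a split sequence line produces this message, here is a minimal sketch of a structural check on four-line FASTQ records. It is an illustration only, not the parser used by FastQC or picard-tools; the function name `check_fastq` and the wording of the reasons are made up for this example.

```python
def check_fastq(lines):
    """Return (line_number, reason) for the first structural problem, or None.

    Assumes the common four-line FASTQ layout: @header, sequence,
    +separator, quality. `check_fastq` is a hypothetical helper.
    """
    # Strip trailing line-break characters so lengths can be compared.
    lines = [line.rstrip("\r\n") for line in lines]
    for i in range(0, len(lines), 4):
        record = lines[i:i + 4]
        if len(record) < 4:
            return (i + 1, "truncated record")
        header, seq, plus, qual = record
        if not header.startswith("@"):
            return (i + 1, "header must start with @")
        if not plus.startswith("+"):
            return (i + 3, "quality header must start with +")
        if len(seq) != len(qual):
            return (i + 4, "sequence and quality lengths differ")
    return None
```

When the sequence is spread over two lines, the third line of the record holds bases instead of `+`, so the check fails at that line, which matches the symptom reported above.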

dlbowie0 · 2025-02-18T08:26:18Z

This is what the FASTQ file looks like:

I have re-run the command line several times and the files are all corrupted. Do you think the issue could be linked to the input FASTA file? I checked the sample FASTA file that you provided and compared it with my FASTA file. The difference is that in yours the bases are all written in capital letters, while mine are in lowercase.

yukiteruono · 2025-02-18T09:58:35Z

PBSIM converts lowercase letters in the genome sequence to uppercase. Non-ATGC letters are randomly changed to ATGC in the reads.
Your FASTQ nucleotide sequences are too short, which is strange. The error may be caused by line break codes in the genome sequence. PBSIM only recognizes LF (\n) as the line break code. If CR (\r) or CRLF (\r\n) is used, change it to LF.
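
One way to check which line break codes a genome file uses, and to convert them to LF, is sketched below. This is a standalone helper and not part of PBSIM; the function names are made up for illustration.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare CR, and bare LF line breaks in raw file bytes."""
    crlf = data.count(b"\r\n")
    # Every CRLF also contains one CR and one LF, so subtract them.
    cr = data.count(b"\r") - crlf
    lf = data.count(b"\n") - crlf
    return {"CRLF": crlf, "CR": cr, "LF": lf}


def to_lf(data: bytes) -> bytes:
    """Convert CRLF and bare CR line breaks to LF."""
    # Replace CRLF first so its CR is not converted a second time.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
```

To use it, read the FASTA file in binary mode (`open(path, "rb")`), pass the bytes to `count_line_endings`, and if CRLF or CR is non-zero, write the result of `to_lf` to a new file in binary mode before running PBSIM on it.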

dlbowie0 · 2025-02-18T13:30:27Z

My FASTA was indeed using CRLF (\r\n). I changed it and the FastQC analysis was successful. Is this requirement noted in the README? If so, could you point me to where exactly?

yukiteruono · 2025-02-18T15:27:45Z

This is something that needs to be noted, so I've added it to the README.
