Hi, when I run the command
fastx_quality_stats -i input.fastq -o output.stats
where input.fastq consists of:
@0 <unknown description>
A
+
]
@1 <unknown description>
A
+
]
then output.stats consists of:
column count min max sum mean Q1 med Q3 IQR lW rW A_Count C_Count G_Count T_Count N_Count Max_count
1 2 60 60 120 60.00 60 50 50 -10 75 35 2 0 0 0 0 2
Note that the med column has a value of 50, whereas the mean column has a value of 60, even though the two quality scores in input.fastq are identical (both are ']'). Q1 (60) is also larger than Q3 (50), which gives a negative IQR (-10).
I believe this is a bug.
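For comparison, below is a minimal Python sketch that recomputes the column-1 statistics from the input above, following the OLD-format field definitions in the help text. It is not how fastx_quality_stats is implemented. The assumptions are mine: Phred+33 decoding (consistent with the min/max of 60 reported, since ord(']') is 93 and 93 - 33 = 60), quartiles from Python's statistics.quantiles with method="inclusive", and Tukey-style whiskers clamped to [min, max]. With two identical scores, any quartile convention gives the same result.

```python
# Sketch only: recompute column-1 statistics for the two-read FASTQ above.
# Assumptions (not taken from FASTX Toolkit): Phred+33 offset,
# statistics.quantiles(method="inclusive") quartiles, Tukey whiskers.
import statistics

FASTQ = """@0 <unknown description>
A
+
]
@1 <unknown description>
A
+
]
"""

def read_qualities(text, offset=33):
    """Yield one list of integer quality scores per 4-line FASTQ record."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines), 4):
        # Line 4 of each record is the quality string.
        yield [ord(ch) - offset for ch in lines[i + 3]]

def column_stats(scores):
    """Return the OLD-format statistics for one column of quality scores."""
    q1, med, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "count": len(scores),
        "min": min(scores),
        "max": max(scores),
        "sum": sum(scores),
        "mean": sum(scores) / len(scores),
        "Q1": q1,
        "med": med,
        "Q3": q3,
        "IQR": iqr,
        # Tukey-style whiskers, clamped to the observed range (assumption).
        "lW": max(min(scores), q1 - 1.5 * iqr),
        "rW": min(max(scores), q3 + 1.5 * iqr),
    }

reads = list(read_qualities(FASTQ))
column1 = [qual[0] for qual in reads]  # first cycle of every read
stats = column_stats(column1)
# Every statistic comes out as 60 (IQR 0); in particular med equals mean.
print(stats)
```

In other words, for this input I would expect Q1, med and Q3 to all be 60 and IQR to be 0.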
PS: fastx_quality_stats -h prints:
usage: fastx_quality_stats [-h] [-N] [-i INFILE] [-o OUTFILE]
Part of FASTX Toolkit 0.0.14 by A. Gordon ([email protected])
[-h] = This helpful help screen.
[-i INFILE] = FASTQ input file. default is STDIN.
[-o OUTFILE] = TEXT output file. default is STDOUT.
[-N] = New output format (with more information per nucleotide/cycle).
The *OLD* output TEXT file will have the following fields (one row per column):
column = column number (1 to 36 for a 36-cycles read solexa file)
count = number of bases found in this column.
min = Lowest quality score value found in this column.
max = Highest quality score value found in this column.
sum = Sum of quality score values for this column.
mean = Mean quality score value for this column.
Q1 = 1st quartile quality score.
med = Median quality score.
Q3 = 3rd quartile quality score.
IQR = Inter-Quartile range (Q3-Q1).
lW = 'Left-Whisker' value (for boxplotting).
rW = 'Right-Whisker' value (for boxplotting).
A_Count = Count of 'A' nucleotides found in this column.
C_Count = Count of 'C' nucleotides found in this column.
G_Count = Count of 'G' nucleotides found in this column.
T_Count = Count of 'T' nucleotides found in this column.
N_Count = Count of 'N' nucleotides found in this column.
max-count = max. number of bases (in all cycles)
The *NEW* output format:
cycle (previously called 'column') = cycle number
max-count
For each nucleotide in the cycle (ALL/A/C/G/T/N):
count = number of bases found in this column.
min = Lowest quality score value found in this column.
max = Highest quality score value found in this column.
sum = Sum of quality score values for this column.
mean = Mean quality score value for this column.
Q1 = 1st quartile quality score.
med = Median quality score.
Q3 = 3rd quartile quality score.
IQR = Inter-Quartile range (Q3-Q1).
lW = 'Left-Whisker' value (for boxplotting).
rW = 'Right-Whisker' value (for boxplotting).
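The reported row can also be checked against relationships implied by the OLD-format definitions above: IQR = Q3 - Q1, mean = sum / count, and min <= Q1 <= med <= Q3 <= max. Below is a short Python sketch of such a check. The check is my own and is not part of FASTX Toolkit. The header and row are copied from output.stats above.

```python
# Sketch: check an OLD-format output row against the invariants implied
# by the help text's field definitions (my own check, not FASTX's).
from itertools import combinations

HEADER = ("column count min max sum mean Q1 med Q3 IQR lW rW "
          "A_Count C_Count G_Count T_Count N_Count Max_count")
ROW = "1 2 60 60 120 60.00 60 50 50 -10 75 35 2 0 0 0 0 2"

def parse_row(header, row):
    """Map each column name to its numeric value."""
    return {name: float(value) for name, value in zip(header.split(), row.split())}

def violations(r):
    """Return a list of the invariants this row breaks."""
    problems = []
    if r["IQR"] != r["Q3"] - r["Q1"]:
        problems.append("IQR != Q3 - Q1")
    if r["mean"] != r["sum"] / r["count"]:
        problems.append("mean != sum / count")
    # Order statistics must be non-decreasing: min <= Q1 <= med <= Q3 <= max.
    order = ["min", "Q1", "med", "Q3", "max"]
    for lo, hi in combinations(order, 2):
        if r[lo] > r[hi]:
            problems.append(f"{lo} ({r[lo]:g}) > {hi} ({r[hi]:g})")
    return problems

row = parse_row(HEADER, ROW)
for problem in violations(row):
    print(problem)
```

On the row from this report, the IQR and mean checks pass. The check then reports four ordering violations: med and Q3 (50) fall below both min and Q1 (60). The lW (75) and rW (35) values match Q1 - 1.5*IQR = 60 + 15 and Q3 + 1.5*IQR = 50 - 15. This suggests the odd whiskers follow from the incorrect quartiles rather than being a separate problem.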