FASTQ quality scores
Overview
A FASTQ record consists of four lines: a header, a sequence, a separator, and a quality string. The quality string must contain exactly one character per base in the sequence.
@read1 optional description
ACGTN
+
I?I5I
Quality characters must be interpreted using the specific encoding used by the basecaller or upstream pipeline.
Phred scores
FASTQ quality values are reported on the Phred scale, a logarithmic transformation of an estimated probability that the base call is incorrect (p):
Q = -10 × log10(p)
p = 10^(-Q/10)
Q30 corresponds to an estimated error probability of 10⁻³ (approximately one error per thousand bases). Higher Q means lower estimated error probability; every +10 in Q corresponds to a 10× decrease in p.
Interpreting Phred values
The table below shows common reference points on the Phred scale. These values follow directly from the definition and should not be treated as universal cutoffs.
| Phred (Q) | Error probability (p) | Implied accuracy (%) |
|---|---|---|
| 10 | 0.1 | 90% |
| 20 | 0.01 | 99% |
| 30 | 0.001 | 99.9% |
| 40 | 0.0001 | 99.99% |
Aggregate metrics such as mean or median quality are derived from per-base values and are not directly comparable across platforms or basecalling pipelines.
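
The reference points in the table follow from the two formulas in the Phred scores section, and a short script can reproduce them. This is a minimal sketch; the function names `phred_to_error_prob` and `error_prob_to_phred` are illustrative and do not come from any particular library.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Return p = 10^(-Q/10) for a Phred score Q."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Return Q = -10 * log10(p) for an error probability 0 < p <= 1."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)


# Reproduce the reference points: Q, error probability, implied accuracy.
for q in (10, 20, 30, 40):
    p = phred_to_error_prob(q)
    print(f"Q{q}: p = {p:g}, implied accuracy = {100 * (1 - p):.2f}%")
```

Because the scale is logarithmic, each step of 10 in Q shifts p by one decimal place, which is what the loop prints.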
ASCII encoding (Phred+33)
FASTQ stores quality values as printable ASCII characters. In the widely used Sanger convention, the encoded character is chr(Q + 33). This encoding is commonly referred to as Phred+33.
Q0 → '!'
Q10 → '+'
Q20 → '5'
Q30 → '?'
Q40 → 'I'
The FASTQ file stores the characters (e.g., ?), not the numeric Q values.
Non-printable characters, unexpected whitespace, or replacement glyphs usually indicate corruption introduced by editors, copy/paste, or character re-encoding.
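
The Phred+33 mapping described above can be sketched as a pair of helper functions. This is an illustrative example, not a reference implementation: the names `decode_phred33` and `encode_phred33` are hypothetical, and the accepted character range ('!' through '~', ASCII 33 to 126) is an assumption about what counts as a valid printable quality character.

```python
PHRED33_OFFSET = 33
MAX_PRINTABLE = 126  # '~', the last printable ASCII character


def decode_phred33(quality: str) -> list[int]:
    """Convert a Phred+33 quality string into a list of integer Q values."""
    scores = []
    for position, char in enumerate(quality):
        code = ord(char)
        # Reject whitespace, control characters, and non-ASCII glyphs.
        if not PHRED33_OFFSET <= code <= MAX_PRINTABLE:
            raise ValueError(
                f"invalid quality character {char!r} at position {position}"
            )
        scores.append(code - PHRED33_OFFSET)
    return scores


def encode_phred33(scores: list[int]) -> str:
    """Convert integer Q values into a Phred+33 quality string."""
    for q in scores:
        if not 0 <= q <= MAX_PRINTABLE - PHRED33_OFFSET:
            raise ValueError(f"Q value {q} cannot be encoded as Phred+33")
    return "".join(chr(q + PHRED33_OFFSET) for q in scores)


print(decode_phred33("I?I5I"))  # [40, 30, 40, 20, 40]
print(encode_phred33([0, 10, 20, 30, 40]))  # !+5?I
```

Raising on the first out-of-range character, rather than skipping it, makes corruption from editors or re-encoding visible early.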
Short reads vs long reads
FASTQ is used across sequencing technologies, but the operational meaning of quality scores depends on the basecaller and processing stage. Treat Q-scores as platform-specific estimates rather than a universal currency.
Common pitfalls
- Incomplete records due to truncated files.
- Sequence and quality strings with mismatched lengths.
- Line wrapping introduced by editors or copy/paste.
- Unexpected characters due to encoding conversions or copy/paste.
- Legacy FASTQ conventions in older datasets.
Practical validation
A strict FASTQ validation typically checks:
- complete 4-line records
- proper header and separator lines
- exact sequence/quality length matching
- quality characters consistent with the expected encoding (e.g., Phred+33)
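
The checks listed above can be sketched as a small strict validator. This is a minimal illustration under stated assumptions: unwrapped 4-line records, Phred+33 quality characters in the ASCII range 33 to 126, and a hypothetical function name `validate_fastq`. It does not represent the behavior of any specific validation tool.

```python
import io


def validate_fastq(handle) -> int:
    """Validate a text stream of 4-line FASTQ records.

    Returns the number of records read; raises ValueError on the first problem.
    """
    count = 0
    while True:
        lines = [handle.readline() for _ in range(4)]
        if lines[0] == "":
            return count  # clean end of input
        count += 1
        # readline() returns "" only at end of input, so this detects truncation.
        if any(line == "" for line in lines[1:]):
            raise ValueError(f"record {count}: incomplete (fewer than 4 lines)")
        header, sequence, separator, quality = (
            line.rstrip("\r\n") for line in lines
        )
        if not header.startswith("@"):
            raise ValueError(f"record {count}: header does not start with '@'")
        if not separator.startswith("+"):
            raise ValueError(f"record {count}: separator does not start with '+'")
        if len(sequence) != len(quality):
            raise ValueError(
                f"record {count}: sequence length {len(sequence)} "
                f"does not match quality length {len(quality)}"
            )
        for char in quality:
            # Assumed Phred+33 range: '!' (33) through '~' (126).
            if not 33 <= ord(char) <= 126:
                raise ValueError(
                    f"record {count}: unexpected quality character {char!r}"
                )


example = "@read1 optional description\nACGTN\n+\nI?I5I\n"
print(validate_fastq(io.StringIO(example)))  # 1
```

The sketch is deliberately strict: a trailing blank line or a wrapped sequence line is reported as an error rather than tolerated.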
- For FASTQ record structure and paired-end layouts, see FASTA/FASTQ formats.
- For strict structural checks, use the FASTA/FASTQ Validator.
- Cock PJA et al. The Sanger FASTQ file format for sequences with quality scores. PMC2847217
- NCBI SRA: submission formats