Sequence stats

Deterministic FASTA/FASTQ statistics with interactive length distributions: sequence/read counts, N50/L50, N90/L90, and GC% for nucleotide data.

Limits: up to 45 MB (uncompressed) • up to 220,000 records

About this tool

Sequence Stats computes fast, reproducible summary statistics for FASTA and FASTQ inputs, including sequence/read counts, total length, length distribution metrics (N50/L50, N90/L90), and GC% for nucleotide sequences. For definitions, interpretation, and examples, see the Sequence statistics reference. For protein sequences, lengths and length statistics are reported in amino acids (aa), and GC% is omitted.
It is intended for descriptive statistics on small to moderately sized FASTA and FASTQ inputs, for inspection and exploratory analysis. The tool does not modify the input or infer biological meaning. For strict format checks and optional DNA/RNA/protein alphabet validation, use the FASTA/FASTQ Validator tool. For IUPAC nucleotide and protein codes, see Sequence alphabets.
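The Python sketch below illustrates how FASTA and FASTQ records can be read and counted to produce the record count and total length. It assumes the whole input fits in memory as a list of lines and that FASTQ records use the strict four-line layout, with no wrapped sequence or quality lines. The function names `parse_fasta`, `parse_fastq`, and `basic_counts` are hypothetical and are not the tool's actual implementation. Malformed records raise `ValueError` rather than being repaired, which matches the tool's no-auto-correction policy.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>' or '@', sequence)


def parse_fasta(lines: List[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs; wrapped sequence lines are joined."""
    header = None
    chunks: List[str] = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines carry no sequence data
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError(f"line {lineno}: sequence data before first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def parse_fastq(lines: List[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from strict four-line FASTQ records."""
    body = [line.rstrip("\r\n") for line in lines]
    while body and body[-1] == "":
        body.pop()  # tolerate trailing empty lines only
    if len(body) % 4 != 0:
        raise ValueError("FASTQ line count is not a multiple of 4")
    for i in range(0, len(body), 4):
        head, seq, plus, qual = body[i:i + 4]
        rec = i // 4 + 1
        if not head.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"record {rec}: malformed FASTQ header or separator")
        if len(seq) != len(qual):
            raise ValueError(f"record {rec}: sequence and quality lengths differ")
        yield head[1:], seq


def basic_counts(records: Iterable[Record]) -> Tuple[int, int]:
    """Return (number of records, total sequence length)."""
    lengths = [len(seq) for _, seq in records]
    return len(lengths), sum(lengths)
```

For example, `basic_counts(parse_fasta([">seq1", "ACGT", "ACG", ">seq2", "GGCC"]))` returns `(2, 11)`: two records, with the wrapped lines of `seq1` joined into one 7-base sequence.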

For larger datasets, multi-file runs, or more involved workflows, the same statistics can be computed separately as a custom analysis.

Tool guarantees

✓ No hidden transformations or guessing
✓ Input is processed once and not stored
✓ Shareable result pages are optional and include only derived statistics and sequence headers, never raw sequence content
✓ Shared result pages are temporary and expire automatically after 20 minutes

Results

Results will include summary tables, interactive plots, and downloadable exports.

Definitions & notes

N50: the sequence length such that sequences of this length or longer together contain at least 50% of the total length.
L50: number of sequences needed to reach 50% of total length when sequences are sorted by length (descending).
N90 / L90: analogous to N50/L50, but using 90% of total length. Useful for understanding tail fragmentation in length distributions.
N50/N90 (protein sets): descriptive length-distribution statistics only; they do not reflect assembly contiguity or biological completeness.
GC%: computed from A/C/G/T only; ambiguous or non-standard bases are excluded from GC% calculation (nucleotide sequences only).
Input validation: malformed FASTA/FASTQ input is rejected with an error; nothing is auto-corrected.
See the Sequence statistics reference for extended explanations and examples.
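The N50/L50, N90/L90, and GC% definitions above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's own code. `nx_lx` (a hypothetical name) takes the percentage as an integer so the threshold comparison stays in exact integer arithmetic. `gc_percent` (also hypothetical) upper-cases its input, which assumes lowercase soft-masked bases count as ordinary bases, and counts only A/C/G/T in the denominator. For a file-level GC%, per-base counts are pooled across records, which is what passing the concatenated sequences does.

```python
from typing import Iterable, Tuple


def nx_lx(lengths: Iterable[int], percent: int) -> Tuple[int, int]:
    """Return (Nx, Lx) for an integer percent in 1..100, e.g. 50 for N50/L50."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in 1..100")
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    total = sum(ordered)
    if total == 0:
        raise ValueError("total length is zero")
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point error in percent * total / 100.
        if running * 100 >= percent * total:
            return length, count
    raise AssertionError("unreachable: running sum always reaches total")


def gc_percent(seq: str) -> float:
    """GC% over A/C/G/T only; N and other IUPAC codes are excluded entirely."""
    s = seq.upper()  # assumption: lowercase (soft-masked) bases are counted
    gc = s.count("G") + s.count("C")
    acgt = gc + s.count("A") + s.count("T")
    return 100.0 * gc / acgt if acgt else float("nan")
```

With lengths `[2, 3, 4, 5, 6, 10]` (total 30), the cumulative sums in descending order are 10, 16, 21, 25, 28, 30. So `nx_lx(lengths, 50)` returns `(6, 2)` and `nx_lx(lengths, 90)` returns `(3, 5)`. `gc_percent("ACGTNNGC")` ignores the two N bases and returns 4/6 of 100, about 66.67.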

Common use cases

Quickly confirm a FASTA/FASTQ file is the expected size and read/sequence count.
Sanity-check length distributions after trimming, filtering, or deduplication.
Spot unexpected GC shifts that can indicate contamination or wrong reference data.
Compare simple QC metrics across samples before running heavier pipelines.
Interactively explore sequence length distributions and assess how length filtering affects N50/N90, total bases, and the length profile before exporting subsets for downstream analysis.
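The following Python sketch shows why N50/N90 need care after length filtering. It recomputes count, total bases, N50, and N90 before and after a minimum-length filter. The helpers `summarize` and `filter_min_length` are hypothetical and do not represent the tool's interactive filter. They use the cumulative-length definition of N50/N90 given on this page.

```python
from typing import Dict, List


def summarize(lengths: List[int]) -> Dict[str, int]:
    """Count, total bases, N50 and N90 for a list of sequence lengths."""
    ordered = sorted(lengths, reverse=True)  # longest first
    total = sum(ordered)
    stats = {"count": len(ordered), "total": total}
    for pct in (50, 90):
        running, value = 0, 0
        for length in ordered:
            running += length
            if running * 100 >= pct * total:  # integer threshold check
                value = length
                break
        stats[f"N{pct}"] = value
    return stats


def filter_min_length(lengths: List[int], min_len: int) -> List[int]:
    """Keep only sequences at least min_len long (hypothetical filter)."""
    return [n for n in lengths if n >= min_len]


lengths = [2, 3, 4, 5, 6, 10]
print(summarize(lengths))
print(summarize(filter_min_length(lengths, 4)))
```

Before filtering, `summarize` reports 6 sequences, 30 total bases, N50 6, and N90 3. After `filter_min_length(lengths, 4)`, it reports 4 sequences, 25 total bases, N50 6, and N90 4. N90 rises only because short sequences were dropped, not because any sequence became longer, so it is worth comparing total bases alongside the Nx values.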

Background

Length statistics, GC%, and N50/N90 are commonly used to assess basic properties of sequence collections. They help detect obvious issues early (unexpected read counts, extreme lengths, or unusual composition) before investing time in larger workflows.

FASTA and FASTQ appear throughout routine bioinformatics work: raw sequencing output, trimmed reads, reference sets, and intermediate QC steps. This tool provides a deterministic, browser-based summary suitable for quick checks and exploratory validation.
