FASTA ID deduplicator

Find duplicate FASTA record IDs, remove duplicates (keep first/longest/shortest), or rename duplicates to unique IDs.

Limits: up to 15 MB (uncompressed) • up to 500,000 records

About this tool

This tool detects duplicate FASTA record identifiers and lets you report the duplicates, remove redundant records, or rename duplicated IDs. The record ID is the first non-empty token after >, ending at the first whitespace character (space or tab). Anything after that is treated as a description and is preserved. When renaming, only the ID token is changed, by appending a deterministic suffix (e.g. seq1_dup2, seq1_dup3). Duplicate IDs frequently arise when FASTA files are concatenated, in transcriptome assemblies, or when references are merged. FASTA/FASTQ record structure is described in the FASTA and FASTQ formats reference.
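The record ID rule and the report-only mode described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tool's actual implementation: the names record_id and duplicate_ids are hypothetical, and whitespace is taken to mean spaces and tabs only.

```python
import re
from collections import Counter

# Assumed pattern: '>' followed by optional spaces/tabs, then the ID token,
# which ends at the first space or tab.
_ID_RE = re.compile(r"^>[ \t]*([^ \t]+)")


def record_id(header: str) -> str:
    """Return the ID token of a FASTA header line, or '' if there is none."""
    match = _ID_RE.match(header.rstrip("\r\n"))
    return match.group(1) if match else ""


def duplicate_ids(fasta_text: str) -> dict:
    """Map each ID that occurs more than once to its number of occurrences."""
    counts = Counter(
        record_id(line) for line in fasta_text.splitlines() if line.startswith(">")
    )
    return {rid: n for rid, n in counts.items() if n > 1}
```

For example, duplicate_ids(">seq1 first\nACGT\n>seq2\tdesc\nGG\n>seq1 second copy\nAC\n") returns {'seq1': 2}. The descriptions ("first", "second copy") play no part in the comparison, so two records count as duplicates even when their descriptions differ.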

For larger datasets, multi-file runs, or more involved workflows, the same deduplication can be run separately as a custom analysis.

Tool guarantees

✓ No hidden transformations
✓ Input processed only for this request
✓ FASTA structure preserved in output

Definitions & notes

Record ID: the first non-empty token after >, ending at the first whitespace character. Whitespace includes spaces and tabs. Everything after the first whitespace is treated as a description.
Duplicate ID: an ID that appears in more than one FASTA record.
Report: detects duplicate IDs and reports them, without modifying headers or sequences.
Remove: keeps exactly one record per ID. When multiple records share the same ID, you can keep the first occurrence, the longest sequence, or the shortest sequence.
Rename: keeps all records but makes IDs unique by appending a deterministic suffix (e.g. seq1__dup2, seq1__dup3). Only the ID token is modified; the rest of the header and the sequence content are preserved unchanged.
Determinism: no sequences are merged, reordered, or altered. All actions follow explicit, reproducible rules.
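The Remove and Rename rules above can be sketched as follows. This is a hedged illustration, not the tool's own code. All function names are hypothetical. Ties in length are assumed to go to the earliest record, kept records are assumed to stay in input order, and the first occurrence of an ID is assumed to keep its original name. The suffix separator is a parameter because the examples in this text show both _dup and __dup; the default here is a guess.

```python
import re

# Assumed ID rule: first token after '>', ending at the first space or tab.
_ID_RE = re.compile(r"^>[ \t]*([^ \t]+)")


def get_id(header):
    """Return the ID token of a header line, or '' if there is none."""
    match = _ID_RE.match(header)
    return match.group(1) if match else ""


def read_records(text):
    """Split FASTA text into (header, sequence_lines) pairs, in input order."""
    records = []
    for line in text.splitlines():
        if line.startswith(">"):
            records.append((line, []))
        elif records:
            records[-1][1].append(line)
    return records


def remove_duplicates(records, keep="first"):
    """Keep exactly one record per ID: first seen, longest, or shortest."""
    if keep not in ("first", "longest", "shortest"):
        raise ValueError("keep must be 'first', 'longest', or 'shortest'")
    best = {}  # ID -> (index of kept record, its sequence length)
    for i, (header, seq_lines) in enumerate(records):
        rid = get_id(header)
        length = sum(len(s.strip()) for s in seq_lines)
        if rid not in best:
            best[rid] = (i, length)
        elif keep == "longest" and length > best[rid][1]:
            best[rid] = (i, length)
        elif keep == "shortest" and length < best[rid][1]:
            best[rid] = (i, length)
    # Emit survivors in their original input order (assumption).
    return [records[i] for i in sorted(i for i, _ in best.values())]


def rename_duplicates(records, sep="__dup"):
    """Keep every record; append sep + occurrence number to repeated IDs."""
    seen = {}
    renamed = []
    for header, seq_lines in records:
        rid = get_id(header)
        seen[rid] = seen.get(rid, 0) + 1
        if seen[rid] > 1:
            # Replace only the ID token; description and sequence are untouched.
            start = header.index(rid, 1)
            new_id = f"{rid}{sep}{seen[rid]}"
            header = header[:start] + new_id + header[start + len(rid):]
        renamed.append((header, seq_lines))
    return renamed
```

One case this sketch does not handle: a renamed ID such as seq1__dup2 could collide with an ID that already exists in the input. A stricter version would check each new ID against the full set of IDs before emitting it.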

Common use cases

Fix “duplicate sequence name” errors before running aligners, assemblers, or annotation tools.
Merge FASTA files from multiple sources where IDs collide.
Keep only the longest (or shortest) representative sequence per ID.
Create a “unique IDs” FASTA for downstream tools that require distinct headers.

Background

FASTA headers can include free text, but most downstream tools treat the first token as the record ID. If IDs are duplicated, many tools fail or silently overwrite records.
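As a toy illustration of the "silently overwrite" failure mode, consider a naive loader that indexes sequences in a dictionary keyed by ID. The function naive_index is hypothetical and does not reproduce the behavior of any specific tool; it only shows how a repeated ID can discard an earlier record without any error.

```python
def naive_index(fasta_text):
    """Index sequences by ID with no duplicate check (deliberately naive)."""
    index = {}
    current = None
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # The first token after '>' is used as the dictionary key.
            current = line[1:].split(None, 1)[0]
            index[current] = ""  # a repeated ID resets the existing entry
        elif current is not None:
            index[current] += line.strip()
    return index


print(naive_index(">seq1 sample A\nACGT\n>seq1 sample B\nTTTT\n"))
# {'seq1': 'TTTT'}
```

The record from "sample A" is lost and nothing signals it, which is why checking for duplicate IDs before indexing is useful.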

This tool is intentionally deterministic: it reports duplicates clearly and applies simple, explicit rules for removal or renaming. It does not attempt to infer which record is “correct”.
