Encode (Forward Burrows-Wheeler Transform)
Enter plain text to compute its BWT output and primary index. Use an end marker (sentinel) that does not occur anywhere in the text; $ is the common choice.
Decode (Inverse Burrows-Wheeler Transform)
Use the transformed string and primary index to recover the original text.
What this Burrows-Wheeler Transform calculator does
This page gives you a practical Burrows-Wheeler Transform (BWT) tool that works in both directions: forward transform (encode) and inverse transform (decode). If you are learning data compression, bioinformatics indexing, suffix-based algorithms, or simply exploring string algorithms, this calculator helps you validate examples quickly.
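The BWT is a reversible permutation of a string's characters. Characters that are followed by similar contexts end up next to each other, so the output tends to contain runs of identical characters. The transform does not shrink the data by itself, but it makes the data easier to compress with methods such as run-length encoding and entropy coding.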
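To make the clustering effect concrete, here is a minimal Python sketch that counts runs of identical characters before and after the transform. The helper name run_lengths is illustrative and not part of the calculator; the transformed string is the one from this page's BANANA example. On such a short input the gain is small, and real text usually shows a much larger difference.

```python
from itertools import groupby


def run_lengths(s: str) -> list[tuple[str, int]]:
    """Collapse consecutive identical characters into (char, count) pairs."""
    return [(ch, sum(1 for _ in group)) for ch, group in groupby(s)]


original = "BANANA$"
transformed = "ANNB$AA"  # BWT of BANANA$ as given on this page

print(len(run_lengths(original)), "runs before the BWT")    # 7 runs
print(len(run_lengths(transformed)), "runs after the BWT")  # 5 runs
print(run_lengths(transformed))
```

Fewer, longer runs are what a run-length encoder exploits.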
Quick example
For input BANANA with sentinel $, the calculator computes:
- Transformed string L = ANNB$AA
- Primary index I = 4 (rows are counted from 0, and $ sorts before the letters, as in ASCII)
Using L and I, the inverse BWT reconstructs BANANA$ and then strips the sentinel to return BANANA.
How the algorithm works
Forward BWT
The forward transform is typically described in four steps:
- Append a unique sentinel character to the end of the input.
- Generate all cyclic rotations of the string.
- Sort all rotations lexicographically.
- Take the last column of the sorted matrix; that is the transformed output.
The index of the row that holds the original string (including its sentinel) in the sorted list is stored as the primary index.
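The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual code: the function name bwt_encode is hypothetical, rows are indexed from 0, and sorting uses Python's default code-point order (in which $ sorts before letters).

```python
def bwt_encode(text: str, sentinel: str = "$") -> tuple[str, int]:
    """Return (transformed string, primary index) using explicit rotations."""
    if len(sentinel) != 1:
        raise ValueError("sentinel must be a single character")
    if sentinel in text:
        raise ValueError("sentinel must not appear in the input text")

    s = text + sentinel                                   # step 1: append sentinel
    rotations = [s[i:] + s[:i] for i in range(len(s))]    # step 2: all cyclic rotations
    rotations.sort()                                      # step 3: lexicographic sort
    last_column = "".join(row[-1] for row in rotations)   # step 4: last column
    primary_index = rotations.index(s)                    # row holding the original string
    return last_column, primary_index


print(bwt_encode("BANANA"))  # ('ANNB$AA', 4)
```

Because the sentinel is unique, all rotations are distinct, so the sorted order and the primary index are unambiguous.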
Inverse BWT
To reverse BWT, we rebuild the sorted rotation table iteratively:
- Start with a table of n empty rows, where n is the length of the transformed string.
- Prepend the i-th character of the transformed string to the i-th row, for every row.
- Sort rows after each prepend round.
- After n rounds, select row I.
The selected row contains the original text followed by the sentinel.
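The table-rebuilding procedure above translates almost line for line into Python. The sketch below is a plain illustration under the same assumptions as the example on this page (0-based primary index, sentinel at the end of the original string, code-point sort order); the name bwt_decode_naive is hypothetical.

```python
def bwt_decode_naive(last_column: str, primary_index: int, sentinel: str = "$") -> str:
    """Rebuild the sorted rotation table and return the original text."""
    n = len(last_column)
    if not 0 <= primary_index < n:
        raise ValueError("primary index is out of range")

    table = [""] * n                                   # n empty rows
    for _ in range(n):                                 # n prepend-and-sort rounds
        table = sorted(last_column[i] + table[i] for i in range(n))

    row = table[primary_index]                         # original text plus sentinel
    return row[:-1] if row.endswith(sentinel) else row


print(bwt_decode_naive("ANNB$AA", 4))  # BANANA
```

Each round sorts all n rows again, which is why this approach is suited to demonstrations rather than large inputs.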
Why BWT is important
- Compression pipelines: groups repeated symbols for better downstream compression.
- FM-index and search: foundational in full-text indexing and genomic sequence tools.
- Algorithm education: demonstrates how sorting and permutation can expose structure in data.
Practical usage tips
- Choose a sentinel character that does not appear in the input text.
- Keep the same sentinel when decoding.
- If decoding fails, first verify the primary index is correct.
- For very long strings, matrix-based demonstrations are educational but not optimal for speed.
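Some of these checks can be automated before decoding. With the conventions used on this page (sentinel appended to the end of the text, primary index counted from 0), the row at the primary index is the original string, so its last character, and therefore the character of the transformed string at that position, must be the sentinel. The sketch below relies on that observation; check_bwt_inputs is a hypothetical helper, not a function of the calculator.

```python
def check_bwt_inputs(last_column: str, primary_index: int, sentinel: str = "$") -> list[str]:
    """Return a list of problems found in the decode inputs (empty if none)."""
    problems = []

    if len(sentinel) != 1:
        problems.append("sentinel must be a single character")
    elif last_column.count(sentinel) != 1:
        problems.append("sentinel must occur exactly once in the transformed string")

    if not 0 <= primary_index < len(last_column):
        problems.append("primary index is out of range")
    elif last_column[primary_index] != sentinel:
        # Row I is the original string, which ends with the sentinel.
        problems.append("character at the primary index is not the sentinel")

    return problems


print(check_bwt_inputs("ANNB$AA", 4))  # []
print(check_bwt_inputs("ANNB$AA", 3))  # ['character at the primary index is not the sentinel']
```

An empty list does not prove the inputs are valid, but a non-empty one points at the most common mistakes: a wrong index or a mismatched sentinel.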
Complexity notes
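This educational calculator favors clarity over heavy optimization. It explicitly builds and sorts rotations (or table rows), which is great for understanding but expensive for large inputs: storing n rotations of length n needs O(n^2) memory, a comparison sort over them can take O(n^2 log n) time in the worst case, and the table-based inverse repeats a sort in each of its n rounds. Production systems typically use suffix arrays, suffix trees, or LF-mapping for scale.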
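As an illustration of the LF-mapping idea, the following Python sketch inverts the transform without rebuilding the rotation table. It is one common formulation, not necessarily what any particular production tool does. It assumes a 0-based primary index and code-point ordering of characters, and the name bwt_decode_lf is hypothetical.

```python
from collections import Counter


def bwt_decode_lf(last_column: str, primary_index: int, sentinel: str = "$") -> str:
    """Invert the BWT by walking the last-to-first (LF) mapping."""
    n = len(last_column)
    if not 0 <= primary_index < n:
        raise ValueError("primary index is out of range")

    # first_row[c]: index of the first sorted row that starts with character c.
    counts = Counter(last_column)
    first_row = {}
    total = 0
    for ch in sorted(counts):
        first_row[ch] = total
        total += counts[ch]

    # lf[i]: row whose first character is the same occurrence as last_column[i].
    seen = Counter()
    lf = []
    for ch in last_column:
        lf.append(first_row[ch] + seen[ch])
        seen[ch] += 1

    # Starting at the original row, each step yields the preceding character.
    out = []
    row = primary_index
    for _ in range(n):
        out.append(last_column[row])
        row = lf[row]

    original = "".join(reversed(out))   # original text followed by the sentinel
    return original[:-1] if original.endswith(sentinel) else original


print(bwt_decode_lf("ANNB$AA", 4))  # BANANA
```

Apart from sorting the distinct characters, every step is a single pass over the input, so the work grows linearly with the string length instead of requiring n sorts.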