Average linear time and compressed space construction of the burrows-wheeler transform


© Springer International Publishing Switzerland 2015. The Burrows-Wheeler Transform is a text permutation that has revolutionized the fields of pattern matching and text compression, bridging the gap existing between the two. In this paper we approach the BWT-construction problem generalizing a well-known algorithm—based on backward search and dynamic strings manipulation—to work in a context-wise fashion, using automata on words. Let n, σ, and Hk be the text length, the alphabet size, and the k-th order empirical entropy of the text, respectively. Moreover, let Hk = min{Hk +1, [log σ]}. Under the word RAM model with word size w ∈ Θ(log n), our algorithm builds theBWTin averageO(nHk) time using nHk+o(nHk) bits of space,where k = logσ(n/ log2 n) − 1. We experimentally show that our algorithm has very good performances (essentially linear time) on DNA sequences, using about 2.6 bits per input symbol in RAM.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)