Fast randomized approximate string matching with succinct hash data structures

2015-01-01

Abstract

The high throughput of modern NGS sequencers, coupled with the huge size of the genomes currently analysed, poses increasingly demanding algorithmic challenges for aligning short reads quickly and accurately against a reference sequence. A crucial additional requirement is that the data structures used be lightweight. Available solutions are usually a compromise between these constraints: in particular, indexes based on the Burrows-Wheeler transform offer reduced memory requirements at the price of lower sensitivity, while hash-based text indexes guarantee high sensitivity at the price of significant memory consumption.

Type

Journal article

Publication

BMC Bioinformatics

Sequence Analysis DNA; Algorithms; Genome; hash; BWT
