RazerS - Fast Read Mapping with Sensitivity Control
David Weese, Anne-Katrin Emde, Tobias Rausch, Andreas Döring, and Knut Reinert
Genome Research, Sep 2009, 19: pp. 1646-1654
Abstract
Second-generation sequencing technologies deliver DNA sequence data at unprecedentedly high throughput. Common to most biological applications is the mapping of reads to an almost identical or highly similar reference genome. Due to the large amounts of data, efficient algorithms and implementations are crucial for this task. We present an efficient read mapping tool called RazerS. It allows the user to align sequencing reads of arbitrary length using either the Hamming distance or the edit distance. Our tool can work either losslessly or, at higher speed, with a user-defined loss rate. Given the loss rate, we present an approach that guarantees not to lose more reads than specified. This enables the user to adapt to the problem at hand and provides a seamless tradeoff between sensitivity and running time.
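Lossless q-gram filtering, as in the lossless mode described above, is commonly justified by the q-gram lemma: if a read of length n differs from a reference window by at most k mismatches, each mismatch destroys at most q of the n - q + 1 overlapping q-grams, so at least n - q + 1 - kq q-grams remain shared. The minimal sketch below covers only the Hamming-distance case with ungapped q-grams and a reference window of the same length as the read. The function names are hypothetical, and this is not RazerS's internal filter.

```python
def qgram_threshold(n: int, q: int, k: int) -> int:
    """q-gram lemma: minimum number of shared q-grams for <= k mismatches.
    A result <= 0 means the filter cannot exclude anything."""
    return n - q + 1 - k * q


def shared_qgrams(read: str, window: str, q: int) -> int:
    """Count offsets i where read[i:i+q] equals window[i:i+q]."""
    return sum(read[i:i + q] == window[i:i + q] for i in range(len(read) - q + 1))


def is_candidate(read: str, window: str, q: int, k: int) -> bool:
    """Pass the window on to verification only if it reaches the threshold.
    Since each mismatch destroys at most q q-grams, no window with
    <= k mismatches is ever rejected (lossless)."""
    return shared_qgrams(read, window, q) >= qgram_threshold(len(read), q, k)


read = "ACGTTGCAACGTTGCAACGT"
window = "ACGTTGCAACCTTGCAACGT"  # single mismatch at offset 10
print(qgram_threshold(len(read), 4, 1))  # 13
print(shared_qgrams(read, window, 4))    # 13
print(is_candidate(read, window, 4, 1))  # True
```

Windows that fail the threshold are discarded without any alignment; only the survivors need the more expensive verification step.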
Main Features
- import of MultiFASTA read and genome files
- reads can be of arbitrary length
- supports Hamming and edit distance read mapping with configurable error rates
- supports paired-end read mapping
- configurable and predictable sensitivity (runtime/sensitivity tradeoff)
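The two distance modes in the list above differ in how a candidate region is verified. A minimal sketch, assuming the error budget is derived from an integer error percentage as floor(n * percent / 100); that rule is an assumption for illustration, not necessarily the one RazerS uses. Hamming distance compares the read position by position with an equal-length window, while edit distance is computed semi-globally with Sellers' dynamic programming, so the whole read must align but may start and end anywhere in the reference text. All function names are hypothetical.

```python
def max_errors(read_len: int, error_percent: int) -> int:
    """Error budget for one read (assumed rule: floor(len * percent / 100))."""
    return read_len * error_percent // 100


def hamming(read: str, window: str) -> int:
    """Mismatches between the read and an equal-length reference window."""
    if len(read) != len(window):
        raise ValueError("Hamming distance needs equal lengths")
    return sum(a != b for a, b in zip(read, window))


def semiglobal_edit_distance(read: str, text: str) -> int:
    """Smallest edit distance between the whole read and any substring of text.
    Row 0 is all zeros, so the alignment may start anywhere in text;
    taking the minimum of the last row lets it end anywhere."""
    prev = [0] * (len(text) + 1)
    for i, r in enumerate(read, start=1):
        cur = [i] + [0] * len(text)  # column 0: first i read chars unaligned
        for j, t in enumerate(text, start=1):
            cur[j] = min(prev[j - 1] + (r != t),  # match or substitution
                         prev[j] + 1,             # read char against a gap
                         cur[j - 1] + 1)          # text char against a gap
        prev = cur
    return min(prev)


read = "ACGTACGT"
text = "TTACGTCGTTT"  # contains the read with its fifth base deleted
k = max_errors(len(read), 15)
print(k)                                     # 1
print(semiglobal_edit_distance(read, text))  # 1
print(hamming(read, "ACGAACGT") <= k)        # True
```

The edit-distance check costs O(n * m) per candidate region, which is why it pays to run it only on regions that have already passed a filter.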
RazerS Binaries
You can find our implementation in the download section. Supported platforms are Windows, Linux (32-bit and 64-bit), and Mac OS X. Please take a look at the README file for usage instructions.
Version History
2010-06-18: v1.1
- added: memory-efficient support for large q-grams (up to 31)
- added: optimized mapping onto many short contigs, deferred Swift post-processing
- fixed: minor bug fixes
2009-07-10: v1.0
- first official release of RazerS
- added: paired-end mapping
- added: Eland and GFF output formats
- added: minor optimizations
2008-10-29:
- dramatically decreased memory consumption
- added: "--purge-ambiguous" and "--distance-range" options
- changed: "--max-hits" behavior
2008-09-25:
- important: if your version doesn't contain the 'gapped_params' folder, please re-download it
- added: "--recognition-rate" option to control the sensitivity of RazerS
- added: automatic configuration of the filter depending on the recognition rate
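The recognition-rate option trades sensitivity for speed by raising the filter threshold above its lossless value. RazerS configures the filter automatically from the recognition rate; the sketch below does not reproduce that method but swaps in a simple Monte Carlo estimate under stated assumptions: Hamming distance only, ungapped q-grams, and exactly k mismatches placed uniformly at random in the read. The names `estimate_sensitivity` and `choose_threshold` are hypothetical.

```python
import random


def estimate_sensitivity(n: int, q: int, k: int, threshold: int,
                         trials: int = 1000, seed: int = 0) -> float:
    """Monte Carlo estimate: fraction of reads with exactly k random mismatches
    that still share >= threshold aligned q-grams with their true location."""
    rng = random.Random(seed)
    found = 0
    for _ in range(trials):
        errors = set(rng.sample(range(n), k))
        # a q-gram survives if none of its q positions carries a mismatch
        shared = sum(all(p not in errors for p in range(i, i + q))
                     for i in range(n - q + 1))
        found += shared >= threshold
    return found / trials


def choose_threshold(n: int, q: int, k: int, recognition_rate: float,
                     trials: int = 1000, seed: int = 0) -> int:
    """Raise the threshold above the lossless q-gram lemma value for as long as
    the estimated sensitivity stays at or above the target recognition rate."""
    t = n - q + 1 - k * q  # lossless by the q-gram lemma
    while (t < n - q + 1 and
           estimate_sensitivity(n, q, k, t + 1, trials, seed) >= recognition_rate):
        t += 1
    return t


print(estimate_sensitivity(32, 8, 2, 9))  # 1.0 (lemma threshold is lossless)
print(choose_threshold(32, 8, 2, 0.99))
```

Fixing the seed makes every threshold see the same simulated reads, so the estimated sensitivity never increases with the threshold and the loop can stop at the first failure. A higher threshold lets fewer candidate regions through, which is where the speedup comes from.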
2008-09-18:
- added: "--hamming-only" option to ignore Indels and consider only mismatches
- added: "--match-N" option allows 'N' to match with all characters
2008-09-04:
- added: "--max-hits" option to ignore reads with too many hits or non-unique reads
- added: "--shape" option to change the underlying (un)gapped k-mer shape
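A k-mer shape selects which positions of each sliding window are sampled, so a gapped shape simply skips some positions. A minimal sketch, assuming a shape is written as a string of '1' (position used) and '0' (position ignored); that notation is a common convention and is assumed here rather than taken from the option's documentation. The function name is hypothetical.

```python
def gapped_qgrams(seq: str, shape: str) -> list[str]:
    """Slide the shape over seq; keep characters at '1' positions, skip '0's."""
    care = [i for i, c in enumerate(shape) if c == "1"]
    return ["".join(seq[i + p] for p in care)
            for i in range(len(seq) - len(shape) + 1)]


print(gapped_qgrams("ACGTAC", "1101"))  # ['ACT', 'CGA', 'GTC']
print(gapped_qgrams("ACGTAC", "111"))   # ['ACG', 'CGT', 'GTA', 'TAC']
```

An all-'1' shape reproduces ordinary ungapped q-grams, while a mismatch that falls on a '0' position of a window leaves that window's gapped q-gram intact.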
2008-08-21:
- fixed: matches at the beginning of the genome were not found on 32bit machines
- added: optimized verifications and increased performance
2008-08-20:
- fixed: matches were sorted primarily by their orientation and only secondarily by the selected sort order
2008-08-18:
- added: "--repeat-length" option to ignore single character repeats in the genome
- added: "--overabundance-cut" option to remove overabundant read k-mers
- added: "--position-format" option to specify how match positions are denoted
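The "--repeat-length" and "--overabundance-cut" options above both prune work before verification. A minimal sketch under two assumptions: repeat-length is taken to mark maximal runs of a single character at least that long so they can be skipped, and overabundance-cut is interpreted as dropping read k-mers whose share of all read k-mer occurrences exceeds a given fraction. The exact criteria RazerS applies are not stated here, and both function names are hypothetical.

```python
from collections import Counter


def single_char_runs(genome: str, min_len: int) -> list[tuple[int, int]]:
    """Half-open (start, end) intervals of single-character runs of length >= min_len."""
    runs, start = [], 0
    for i in range(1, len(genome) + 1):
        # a run ends at the end of the genome or where the character changes
        if i == len(genome) or genome[i] != genome[start]:
            if i - start >= min_len:
                runs.append((start, i))
            start = i
    return runs


def overabundant_kmers(reads: list[str], k: int, cutoff: float) -> set[str]:
    """k-mers whose share of all read k-mer occurrences exceeds cutoff
    (assumed criterion; the real option may use a different rule)."""
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    total = sum(counts.values())
    return {kmer for kmer, c in counts.items() if c / total > cutoff}


print(single_char_runs("ACAAAAAGTTTT", 4))               # [(2, 7), (8, 12)]
print(overabundant_kmers(["AAAAAA", "ACGTAC"], 3, 0.3))  # {'AAA'}
```

Both steps target the same cost: low-complexity sequence produces huge numbers of q-gram hits that would otherwise all have to be verified.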
2008-08-07:
- added: "--sort-order" option, which allows selecting the sort order of matches
- fixed: many non-repeat regions were causing a significant performance drop
2008-08-05:
- added: RepeatMasker masks all Ns of the reference genomes automatically
- fixed: precision bug relating to the percent identity
- fixed: closely adjacent matches were lost
2008-08-01:
- first pre-release of RazerS using a fixed k=11 and manual recognition/performance parametrization
Contact
For questions, comments, or suggestions feel free to contact David Weese.