We present Masai, a read mapper representing the state of the art in terms of speed and accuracy. Masai is an order of magnitude faster than RazerS 3 and mrFAST, and 2–4 times faster and more accurate than Bowtie 2 and BWA. The novelties of our read mapper are filtration with approximate seeds and a method for multiple backtracking. Compared to exact seeds, approximate seeds increase filtration specificity while preserving sensitivity. Multiple backtracking amortizes the cost of searching a large set of seeds by taking advantage of the repetitiveness of next-generation sequencing data. Combined, these two methods significantly speed up approximate search on genomic datasets. Masai is implemented in C++ using the SeqAn library.
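To make the idea of seed filtration more concrete, here is a minimal sketch in C++. It is not Masai's implementation: it uses plain strings and a linear scan instead of an index, counts mismatches only (Hamming distance), and the names mapRead and mismatches are hypothetical. It relies on the pigeonhole argument that a read matching with at most k errors, when split into s non-overlapping seeds, must contain at least one seed with at most floor(k/s) errors. Each seed is therefore searched with that reduced error budget, and every seed hit is verified against the full read.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hamming distance between `pattern` and text[pos, pos + pattern.size()),
// stopping early once the distance exceeds `limit`.
static std::size_t mismatches(const std::string& text, std::size_t pos,
                              const std::string& pattern, std::size_t limit)
{
    std::size_t d = 0;
    for (std::size_t i = 0; i < pattern.size() && d <= limit; ++i)
        if (text[pos + i] != pattern[i]) ++d;
    return d;
}

// Hypothetical helper: returns the sorted start positions in `ref` where
// `read` occurs with at most `maxErrors` mismatches.
// Step 1 (filtration): search each seed with floor(maxErrors / numSeeds) mismatches.
// Step 2 (verification): check the whole read at every candidate position.
std::vector<std::size_t> mapRead(const std::string& ref, const std::string& read,
                                 std::size_t maxErrors, std::size_t numSeeds)
{
    assert(numSeeds > 0 && numSeeds <= read.size());
    if (read.size() > ref.size()) return {};

    const std::size_t seedErrors = maxErrors / numSeeds;  // floor(k / s)
    const std::size_t seedLen = read.size() / numSeeds;
    std::set<std::size_t> hits;  // deduplicates positions found via several seeds

    for (std::size_t s = 0; s < numSeeds; ++s) {
        const std::size_t begin = s * seedLen;
        // The last seed absorbs the remainder of the read.
        const std::size_t len = (s + 1 == numSeeds) ? read.size() - begin : seedLen;
        const std::string seed = read.substr(begin, len);

        // Linear scan stands in for the index lookup a real mapper would use.
        for (std::size_t p = 0; p + len <= ref.size(); ++p) {
            if (mismatches(ref, p, seed, seedErrors) > seedErrors) continue;
            if (p < begin) continue;  // read would start before the reference
            const std::size_t start = p - begin;
            if (start + read.size() > ref.size()) continue;  // read would overrun
            if (mismatches(ref, start, read, maxErrors) <= maxErrors)
                hits.insert(start);
        }
    }
    return std::vector<std::size_t>(hits.begin(), hits.end());
}
```

With numSeeds greater than maxErrors the seed budget is zero and the sketch degenerates to exact seeds; choosing fewer, longer seeds gives each seed a nonzero error budget, which is the trade-off between specificity and search cost that approximate seeds exploit.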
Compilation from Source Code
Follow the Getting Started section and check out the latest Masai version. Instead of creating a project file in Debug mode, switch to Release mode (-DCMAKE_BUILD_TYPE=Release) and compile masai. This can be done as follows:
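svn co http://svn.seqan.de/seqan/tags/masai-0.6.1 masai-0.6.1
# out-of-source build directory; the name "build" is arbitrary
mkdir -p masai-0.6.1/build && cd masai-0.6.1/build
cmake .. -DCMAKE_BUILD_TYPE=Release
make masai_indexer masai_mapper masai_output_se masai_output_pe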
After successful compilation, copy the binaries to a folder in your PATH variable, e.g. /usr/local/bin:
sudo cp bin/masai_* /usr/local/bin
Masai is implemented in C++ using the SeqAn library, is distributed under the BSD license, and supports Linux, Mac OS X, and Windows. Please take a look at the README file for usage instructions.
- Fixed a problem in the SeqAn library that occurred when zlib and libbz2 could not be found
- Improved the suffix array index, which is now used by default
- Added support for FM-index
- Indices are memory mapped by default
- Improved command line interface
- Improved read loading
- Fixed writing of big SAM files
- This version was used in the revised manuscript
- Moved the sources of Masai from a private sandbox to seqan/extras; see the README for installation instructions
- Reference genomes can now also be indexed with a suffix array or a q-gram index
- Switched from deprecated CommandLineParser to the new ArgumentParser
- Updated command line interface
- Updated README to the new features of Masai
- First official release of Masai
- This version was used in the submitted publication
For questions, comments, or suggestions, feel free to contact Enrico Siragusa.
Siragusa, E., Weese, D., & Reinert, K. (2013). Fast and accurate read mapping with approximate seeds and multiple backtracking. Nucleic Acids Research, 2013, 1–8.
Last update: 8 May 2013