Masai

Update

Instead of Masai, check out Yara (Yet Another Read Aligner), available on GitHub.
Yara features multi-threading, support for the paired-end protocol, direct output to SAM or BAM, a significantly lower memory footprint, and much more!

Abstract

We present Masai, a read mapper representing the state of the art in terms of speed and accuracy. Our tool is an order of magnitude faster than RazerS 3 and mrFAST, and 2–4 times faster and more accurate than Bowtie 2 and BWA. The novelties of our read mapper are filtration with approximate seeds and a method for multiple backtracking. Approximate seeds, compared to exact seeds, increase filtration specificity while preserving sensitivity. Multiple backtracking amortizes the cost of searching a large set of seeds by taking advantage of the repetitiveness of next-generation sequencing data. Combined, these two methods significantly speed up approximate search on genomic datasets. Masai is implemented in C++ using the SeqAn library.
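The approximate-seed filtration mentioned above rests on the pigeonhole principle: if a read is split into s non-overlapping seeds and occurs with at most k errors, then at least one seed occurs with at most floor(k/s) errors. Choosing s = k + 1 yields classic exact seeds; choosing fewer, longer seeds that tolerate a few errors gives approximate seeds. The following is only a minimal sketch of that idea, not Masai's implementation: it assumes Hamming distance (mismatches only), replaces the index search with a naive scan of the text, and the function names `hamming` and `approximateSeedSearch` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative sketch (hypothetical names, not Masai's API).
// Assumption: Hamming distance only; a naive scan stands in for an index.

// Count mismatches between pattern p and text t starting at position pos.
std::size_t hamming(const std::string& t, std::size_t pos, const std::string& p) {
    std::size_t d = 0;
    for (std::size_t i = 0; i < p.size(); ++i)
        if (t[pos + i] != p[i]) ++d;
    return d;
}

// Return all start positions where `read` occurs in `text` with <= k mismatches,
// using s approximate seeds, each searched with <= floor(k / s) mismatches.
std::vector<std::size_t> approximateSeedSearch(const std::string& text,
                                               const std::string& read,
                                               std::size_t k, std::size_t s) {
    assert(s > 0 && s <= read.size());
    const std::size_t seedErrors = k / s;         // pigeonhole threshold per seed
    const std::size_t seedLen = read.size() / s;  // last seed absorbs the remainder
    std::vector<bool> isCandidate(text.size(), false);

    // Filtration: find approximate seed occurrences and mark implied read starts.
    for (std::size_t j = 0; j < s; ++j) {
        const std::size_t begin = j * seedLen;
        const std::size_t len = (j + 1 == s) ? read.size() - begin : seedLen;
        const std::string seed = read.substr(begin, len);
        for (std::size_t pos = 0; pos + len <= text.size(); ++pos)
            if (pos >= begin && hamming(text, pos, seed) <= seedErrors)
                isCandidate[pos - begin] = true;
    }

    // Verification: check the whole read at each candidate start.
    std::vector<std::size_t> hits;
    for (std::size_t start = 0; start + read.size() <= text.size(); ++start)
        if (isCandidate[start] && hamming(text, start, read) <= k)
            hits.push_back(start);
    return hits;
}
```

For example, `approximateSeedSearch("AAAACCCCGGGGTTTT", "CCCCGGGA", 1, 2)` returns `{4}`: with k = 1 and s = 2 each seed must match exactly, the seed `CCCC` proposes start 4, and verification accepts it with one mismatch. The filter never misses a valid occurrence, because total mismatches of at most k cannot be spread so that every one of the s seeds exceeds floor(k/s). Fewer, longer seeds produce fewer spurious candidates, which is the specificity gain the abstract attributes to approximate seeds.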

Compilation from Source Code

Follow the Getting Started section to check out the latest Masai version. Instead of creating a project file in Debug mode, configure CMake in Release mode (-DCMAKE_BUILD_TYPE=Release) and build the Masai binaries. This can be done as follows:

svn co http://svn.seqan.de/seqan/tags/masai-0.6.1 masai-0.6.1
mkdir masai-0.6.1/build
cd masai-0.6.1/build
cmake .. -DCMAKE_BUILD_TYPE=Release
make masai_indexer masai_mapper masai_output_se masai_output_pe


After successful compilation, copy the binaries to a folder listed in your PATH variable, e.g. /usr/local/bin:

sudo cp bin/masai_* /usr/local/bin

Binaries Download

Masai is implemented in C++ under the BSD license using the SeqAn library and supports Linux, Mac OS X, and Windows. Please see the README file for usage instructions.

Version History

2012-11-20: v0.6.1

  • Fixed a problem in the SeqAn library that occurred when zlib and libbz2 could not be found.

2012-11-16: v0.6

  • Improved suffix array index, which is now used by default
  • Added support for FM-index
  • Indices are memory mapped by default
  • Improved command line interface
  • Improved read loading
  • Fixed writing of big SAM files
  • This version was used in the revised manuscript

2012-10-25: v0.4

  • Moved the sources of Masai from a private sandbox to seqan/extras; see README for installation instructions
  • Reference genomes can now also be indexed with a suffix array or a q-gram index
  • Switched from deprecated CommandLineParser to the new ArgumentParser
  • Updated command line interface
  • Updated README to reflect the new features of Masai

2012-07-22: v0.2

  • First official release of Masai
  • This version was used in the submitted publication

Contact

For questions, comments, or suggestions feel free to contact Enrico Siragusa.

References

Siragusa, E., Weese, D., & Reinert, K. (2013). Fast and accurate read mapping with approximate seeds and multiple backtracking. Nucleic Acids Research, 1–8.

Last update: 31 March 2014