Raptor

Abstract

Raptor is a system for approximately searching many queries, such as NGS reads or transcripts, in large collections of nucleotide sequences. Raptor uses winnowing minimizers to define a set of representative k-mers, an extension of the Interleaved Bloom Filter (IBF) as a set membership data structure, and probabilistic thresholding for minimizers. Our approach allows the IBF to be compressed and partitioned, which enables the effective use of secondary memory.
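
To make the two main ingredients concrete, the following is a minimal Python sketch of winnowing minimizers and a toy interleaved Bloom filter. It is an illustration under stated assumptions, not Raptor's implementation: the names `kmer_hash`, `minimizers`, and `ToyIBF` are hypothetical, the window size `w` is taken here to mean the number of consecutive k-mers per window, BLAKE2b is a stand-in hash, and the fixed 70% threshold in the demo is only a placeholder for the probabilistic thresholding described above.

```python
import hashlib


def kmer_hash(kmer):
    """Stable 64-bit hash of a k-mer (stand-in hash, chosen for this sketch)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def minimizers(seq, k, w):
    """Set of minimizer hashes: the smallest k-mer hash in every window
    of w consecutive k-mers (assumed window definition)."""
    hashes = [kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    if not hashes:
        return set()
    if len(hashes) < w:
        return {min(hashes)}
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


class ToyIBF:
    """Toy interleaved Bloom filter: one Bloom filter per bin, stored so that
    the bits of all bins for the same hash position sit together in one row."""

    def __init__(self, bins, size, num_hashes=2):
        self.bins = bins
        self.size = size
        self.num_hashes = num_hashes
        self.rows = [0] * size  # each row is an integer used as a bit vector over bins

    def _positions(self, value):
        # Derive num_hashes row indices by appending a seed byte to the value.
        for seed in range(self.num_hashes):
            data = value.to_bytes(8, "big") + bytes([seed])
            digest = hashlib.blake2b(data, digest_size=8).digest()
            yield int.from_bytes(digest, "big") % self.size

    def insert(self, bin_index, value):
        for pos in self._positions(value):
            self.rows[pos] |= 1 << bin_index

    def query(self, value):
        """Bitmask of all bins that may contain value (false positives possible)."""
        mask = (1 << self.bins) - 1
        for pos in self._positions(value):
            mask &= self.rows[pos]
        return mask

    def count(self, values):
        """Per-bin number of queried values reported as present."""
        counts = [0] * self.bins
        for value in values:
            mask = self.query(value)
            for b in range(self.bins):
                counts[b] += (mask >> b) & 1
        return counts


if __name__ == "__main__":
    references = ["ACGTACGTTAGCTAGCTAGGATCCGATCGATTACG",
                  "TTTTGGGGCCCCAAAATTTTGGGGCCCCAAAATTT"]
    ibf = ToyIBF(bins=len(references), size=1024)
    for bin_index, ref in enumerate(references):
        for m in minimizers(ref, k=5, w=4):
            ibf.insert(bin_index, m)

    read_minimizers = minimizers(references[0][5:30], k=5, w=4)
    counts = ibf.count(read_minimizers)
    # Placeholder threshold: report bins containing at least 70% of the read's minimizers.
    hits = [b for b, c in enumerate(counts) if c >= 0.7 * len(read_minimizers)]
    print(counts, hits)
```

Because a single row lookup answers the membership question for all bins at once, one pass over the query's minimizers yields a count per bin. Storing minimizers instead of all k-mers reduces the number of inserted values, which is why the threshold has to account for how many minimizers a query can be expected to share with a matching sequence.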

Please Cite

  • Enrico Seiler, Svenja Mehringer, Mitra Darvish, Etienne Turc, Knut Reinert, “Raptor: A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences”, 2020.
    BibTeX entry:
    @unpublished{fu_mi_publications2519,
     abstract = {We present Raptor, a tool for approximately searching many queries in large collections of nucleotide sequences. In comparison with similar tools like Mantis and COBS, Raptor is 12-144 times faster and uses up to 30 times less memory. Raptor uses winnowing minimizers to define a set of representative k-mers, an extension of the Interleaved Bloom Filters (IBF) as a set membership data structure, and probabilistic thresholding for minimizers. Our approach allows compression and a partitioning of the IBF to enable the effective use of secondary memory.
    Competing Interest Statement: The authors have declared no competing interest.},
     author = {Enrico Seiler and Svenja Mehringer and Mitra Darvish and Etienne Turc and Knut Reinert},
     booktitle = {Raptor: A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences},
     journal = {bioRxiv},
     title = {Raptor: A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences},
     url = {http://publications.imp.fu-berlin.de/2519/},
     year = {2020}
    }

Contact

For questions, comments, or suggestions please contact:

Enrico Seiler enrico.seiler@fu-berlin.de