Raptor

Abstract

Raptor is a system for approximately searching many queries, such as NGS reads or transcripts, in large collections of nucleotide sequences. Raptor uses winnowing minimizers to define a set of representative k-mers, an extension of the Interleaved Bloom Filter (IBF) as a set membership data structure, and probabilistic thresholding for minimizers. This approach allows compression and partitioning of the IBF, enabling effective use of secondary memory.
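
The two core ideas named above, winnowing minimizers and an interleaved Bloom filter, can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not Raptor's implementation: the hash function (BLAKE2b), the class and function names, the definition of the window size w as a number of consecutive k-mers, and the omission of canonical k-mers are all choices made here for illustration. The probabilistic threshold is not modelled; the sketch only produces the per-bin minimizer counts that such a threshold would be applied to.

```python
import hashlib


def kmer_hash(kmer, seed=0):
    """Deterministic 64-bit hash of a k-mer (illustrative choice, not Raptor's hash)."""
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(kmer.encode(), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")


def minimizers(seq, k, w):
    """Winnowing: keep the smallest-hash k-mer of every window of w consecutive k-mers."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return set()
    w = min(w, len(kmers))  # short sequences form a single window
    return {min(kmers[i:i + w], key=kmer_hash) for i in range(len(kmers) - w + 1)}


class InterleavedBloomFilter:
    """Toy IBF: one Bloom filter per bin, stored interleaved.

    rows[p] is an integer bitmask whose bit b is set when bit position p
    of bin b's Bloom filter is set, so one lookup covers all bins at once.
    """

    def __init__(self, bins, bits_per_bin, hashes=2):
        self.bins = bins
        self.bits_per_bin = bits_per_bin
        self.hashes = hashes
        self.rows = [0] * bits_per_bin

    def _positions(self, kmer):
        return [kmer_hash(kmer, seed) % self.bits_per_bin for seed in range(self.hashes)]

    def insert(self, kmer, bin_index):
        for p in self._positions(kmer):
            self.rows[p] |= 1 << bin_index

    def query(self, kmer):
        """Return a bitmask of bins that may contain kmer (false positives possible)."""
        mask = (1 << self.bins) - 1
        for p in self._positions(kmer):
            mask &= self.rows[p]
        return mask


def count_hits(ibf, query_minimizers):
    """Count, per bin, how many query minimizers the IBF reports as present."""
    counts = [0] * ibf.bins
    for m in query_minimizers:
        mask = ibf.query(m)
        for b in range(ibf.bins):
            counts[b] += (mask >> b) & 1
    return counts


# Hypothetical example data: two reference sequences, one bin each.
K, W = 4, 5
references = [
    "ACGTACGTTGCAAGCTTAGCGATCGATCGGATTACA",
    "TTGACCGGTTAACCGGTATATGCGCGCATTAGGCCA",
]
ibf = InterleavedBloomFilter(bins=len(references), bits_per_bin=1024)
for bin_index, ref in enumerate(references):
    for m in minimizers(ref, K, W):
        ibf.insert(m, bin_index)

query = references[0][5:30]  # a substring of the first reference
query_minimizers = minimizers(query, K, W)
hits = count_hits(ibf, query_minimizers)
```

In this sketch a bin would be reported when its count in `hits` reaches a threshold. Because every window of the query is also a window of the first reference, all query minimizers are found in bin 0; the count for bin 1 reflects only chance k-mer sharing and Bloom filter false positives.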

Please Cite

  • Enrico Seiler, Svenja Mehringer, Mitra Darvish, Etienne Turc, Knut Reinert, “Raptor: A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences”, iScience, vol. 24, iss. 7, 2021-07-23.
    @article{fu_mi_publications2519,
     abstract = {We present Raptor, a tool for approximately searching many queries in large collections of nucleotide sequences. In comparison with similar tools like Mantis and COBS, Raptor is 12-144 times faster and uses up to 30 times less memory. Raptor uses winnowing minimizers to define a set of representative k-mers, an extension of the Interleaved Bloom Filters (IBF) as a set membership data structure, and probabilistic thresholding for minimizers. Our approach allows compression and a partitioning of the IBF to enable the effective use of secondary memory.},
     author = {Enrico Seiler and Svenja Mehringer and Mitra Darvish and Etienne Turc and Knut Reinert},
     booktitle = {Raptor: A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences},
     journal = {iScience},
     month = {July},
     number = {7},
     pages = {102782},
     publisher = {Elsevier},
     title = {Raptor: A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences},
     url = {http://publications.imp.fu-berlin.de/2519/},
     volume = {24},
     year = {2021}
    }
  • Svenja Mehringer, Enrico Seiler, Felix Droop, Mitra Darvish, René Rahn, Martin Vingron, Knut Reinert, “Hierarchical Interleaved Bloom Filter: enabling ultrafast, approximate sequence queries”, Genome Biology, vol. 24, iss. 131, 2023-05-31.
    @article{fu_mi_publications2846,
     abstract = {We present a novel data structure for searching sequences in large databases: the Hierarchical Interleaved Bloom Filter (HIBF). It is extremely fast and space efficient, yet so general that it could serve as the underlying engine for many applications. We show that the HIBF is superior in build time, index size, and search time while achieving a comparable or better accuracy compared to other state-of-the-art tools (Mantis and Bifrost). The HIBF builds an index up to 211 times faster, using up to 14 times less space, and can answer approximate membership queries faster by a factor of up to 129. This can be considered a quantum leap that opens the door to indexing complete sequence archives like the European Nucleotide Archive or even larger metagenomics data sets.},
     author = {Svenja Mehringer and Enrico Seiler and Felix Droop and Mitra Darvish and Ren{\'e} Rahn and Martin Vingron and Knut Reinert},
     booktitle = {Hierarchical Interleaved Bloom Filter: Enabling ultrafast, approximate sequence queries},
     journal = {Genome Biology},
     month = {May},
     number = {131},
     publisher = {BioMed Central},
     title = {Hierarchical Interleaved Bloom Filter: enabling ultrafast, approximate sequence queries},
     url = {http://publications.imp.fu-berlin.de/2846/},
     volume = {24},
     year = {2023}
    }

Contact

For questions, comments, or suggestions please contact:

Enrico Seiler enrico.seiler@fu-berlin.de