TetRex

Abstract

Regular Expressions provide a convenient way to model protein signatures, domains, and interaction sites. However, despite the efficiency of modern tools for regular expression search, their runtime is often dominated by the size of the input text. We present TetRex, a novel algorithm for regular expression matching of biological motifs that leverages the (Hierarchical) Interleaved Bloom Filter as an index. This allows for ultra-fast queries of large sequence collections.

Links

Please Cite

  • Remy Marcel Schwab, Simon G Gottlieb, Knut Reinert, “TetRex: a novel algorithm for index-accelerated search of highly conserved motifs”, vol. 7, iss. 2, 2025-04-17.
    cite this publication
    @article{fu_mi_publications3284,
     abstract = {The scale of modern datasets has necessitated innovations to solve even the most traditional and fundamental of computational problems. Set membership and set cardinality are both examples of simple queries that, for large enough datasets, quickly become challenging using traditional approaches. Interestingly, we find a need for these innovations within the field of biology. Despite the vast differences among living organisms, there exist functions so critical to life that they are conserved in the genomes and proteomes across seemingly unrelated species. Regular expressions (regexes) can serve as a convenient way to represent these conserved sequences compactly. However, despite the strong theoretical foundation and maturity of tools available, the state-of-the-art regex search falls short of what is necessary to quickly scan an enormous database of biological sequences. In this work, we describe a novel algorithm for regex search that reduces the search space by leveraging a recently developed compact data structure for set membership, the hierarchical interleaved bloom filter. We show that the runtime of our method combined with a traditional search outperforms state-of-the-art tools.},
     author = {Remy Marcel Schwab and Simon G Gottlieb and Knut Reinert},
     journal = {NAR Genomics and Bioinformatics},
     month = {April},
     number = {2},
     publisher = {Oxford University Press},
     title = {TetRex: a novel algorithm for index-accelerated search of highly conserved motifs},
     url = {http://publications.imp.fu-berlin.de/3284/},
     volume = {7},
     year = {2025}
    }

Contact

For questions, comments, or suggestions please contact:

Remy Schwab