ANISE and BASIL

Abstract

Motivation: Large insertions of novel sequence are an important type of structural variant. Previous studies used traditional de novo assemblers for assembling non-mapping high-throughput sequencing (HTS) or capillary reads and then tried to anchor them in the reference using paired read information.

Results: We present approaches for detecting insertion breakpoints and for targeted assembly of large insertions from paired HTS data: BASIL and ANISE. For near-identical repeats, which are hard for assemblers, ANISE employs a repeat resolution step that yields far better reconstructions than those obtained by ABYSS. On simulated data, we found our insert assembler to be competitive with the de novo assembler ABYSS while yielding inserted sequence that is already anchored, rather than unanchored contigs as produced by ABYSS. On real-world data, we detected novel sequence in a human individual and thoroughly validated the assembled sequence.

Please Cite

  • M. Holtgrewe, L. Kuchenbecker, K. Reinert, “Methods for the Detection and Assembly of Novel Sequence in High-Throughput Sequencing Data”, Bioinformatics, vol. 31, iss. 12, pp. 1904-1912, 2015.
    @article{fu_mi_publications1506,
     abstract = {Motivation: 
    Large insertions of novel sequence are an important type of structural variants. Previous studies used traditional de novo assemblers for assembling non-mapping high-throughput sequencing (HTS) or capillary reads and then tried to anchor them in the reference using paired read information.
    
    Results: 
    We present approaches for detecting insertion breakpoints and targeted assembly of large insertions from HTS paired data: BASIL and ANISE. On near identity repeats that are hard for assemblers, ANISE employs a repeat resolution step. This results in far better reconstructions than obtained by the compared methods. On simulated data, we found our insert assembler to be competitive with the de novo assemblers ABYSS and SGA while yielding already anchored inserted sequence as opposed to unanchored contigs as from ABYSS/SGA. On real-world data, we detected novel sequence in a human individual and thoroughly validated the assembled sequence. ANISE was found to be superior to the competing tool MindTheGap on both simulated and real-world data.
    
    Availability and implementation: ANISE and BASIL are available for download at http://www.seqan.de/projects/herbarium under a permissive open source license. 
    
    Contact: manuel.holtgrewe@fu-berlin.de or knut.reinert@fu-berlin.de},
     author = {M. Holtgrewe and L. Kuchenbecker and K. Reinert},
     journal = {Bioinformatics},
     number = {12},
     pages = {1904--1912},
     title = {Methods for the Detection and Assembly of Novel Sequence in High-Throughput Sequencing Data},
     url = {http://publications.imp.fu-berlin.de/1506/},
     volume = {31},
     year = {2015}
    }

Contact

For questions, comments, or suggestions please contact:

Manuel Holtgrewe manuel.holtgrewe@fu-berlin.de