Kathrin Trappe, Anne-Katrin Emde, Hans-Christian Ehrlich and Knut Reinert
Large-scale population and disease association studies have shown the importance as well as the difficulty of detecting structural variants (SVs) in genomic and transcriptomic sequencing data. Although very fast and precise, current read mapping tools usually fail to map sequencing reads that cross SV breakpoints or exon-exon boundaries. These events cause one or more splits in the read-to-reference alignment, with parts of the read mapping to different locations on the reference sequence.
We present GUSTAF, a sound generic multi-split detection method. GUSTAF uses SeqAn’s exact local aligner Stellar to find partial read alignments. Compatible partial alignments are identified, and a split-read graph storing all compatibility information is constructed for each read. Vertices in the graph represent partial alignments, and edges represent possible split positions. Using an exact dynamic programming approach, we refine the alignments around possible split positions to determine precise breakpoint locations at single-nucleotide level. We use a DAG shortest path algorithm to determine the best combination of refined alignments, and report those breakpoints supported by multiple reads.
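The split-read graph and the shortest path step can be illustrated with a minimal Python sketch. It is not GUSTAF's implementation: the `PartialAlignment` record, the function names, the compatibility thresholds (`max_gap`, `max_overlap`), and the cost model (edit errors plus unaligned read bases plus a fixed `split_penalty`) are all assumptions chosen for illustration. The exact refinement of alignments around each split is omitted, so the reported positions are only approximate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartialAlignment:
    """One local match of a read segment (hypothetical record, not a SeqAn type)."""
    read_begin: int  # start in the read, inclusive
    read_end: int    # end in the read, exclusive
    ref_begin: int   # start on the reference
    errors: int      # edit distance of the local match


def best_split_chain(alignments, read_length,
                     max_gap=5, max_overlap=5, split_penalty=1):
    """Return (cost, chain) for the cheapest chain of compatible alignments.

    Vertices are partial alignments plus an implicit start and end vertex.
    An edge a -> b exists when b follows a in the read and the gap or overlap
    between them stays within the (assumed) thresholds.
    """
    if not alignments:
        return read_length, []

    # Sorting by read position yields a topological order of the DAG,
    # because every edge points from an earlier to a later read segment.
    alns = sorted(alignments, key=lambda a: (a.read_begin, a.read_end))
    dist = [0] * len(alns)     # cheapest cost of a chain ending at vertex j
    pred = [None] * len(alns)  # predecessor on that cheapest chain

    for j, b in enumerate(alns):
        # Edge from the start vertex: the unaligned read prefix is charged.
        dist[j] = b.read_begin + b.errors
        for i in range(j):
            a = alns[i]
            if a.read_begin >= b.read_begin or a.read_end >= b.read_end:
                continue  # b does not extend the read beyond a
            offset = b.read_begin - a.read_end  # > 0 is a gap, < 0 an overlap
            if -max_overlap <= offset <= max_gap:
                cost = dist[i] + split_penalty + abs(offset) + b.errors
                if cost < dist[j]:
                    dist[j], pred[j] = cost, i

    # Edge to the end vertex: the unaligned read suffix is charged.
    def total(j):
        return dist[j] + (read_length - alns[j].read_end)

    best = min(range(len(alns)), key=total)
    chain, j = [], best
    while j is not None:
        chain.append(alns[j])
        j = pred[j]
    chain.reverse()
    return total(best), chain


def breakpoints(chain):
    """Approximate reference positions on both sides of each split.

    Assumes indel-free partial alignments, so the reference end of a segment
    is ref_begin plus the aligned read length.
    """
    return [(a.ref_begin + (a.read_end - a.read_begin), b.ref_begin)
            for a, b in zip(chain, chain[1:])]
```

Calling `best_split_chain` with the partial alignments of a single read returns the chain with the lowest total cost, and `breakpoints` lists one candidate breakpoint pair per split in that chain. Collecting these pairs over all reads and keeping only those seen in several reads would correspond to the "supported by multiple reads" filter described above.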
Supported platforms are Windows, 64-bit Linux, and Mac OS X. Please take a look at the README file for usage instructions.
Please refer to SeqAn's Git repository for the latest developer version.
- added: paired-end support
- added: SV classification of duplications and translocations
- first release of GUSTAF Version 0.1
For questions, comments, or suggestions, please contact Kathrin Trappe.
Last Update 28. August 2014