Fiona: a parallel and automatic strategy for read error correction

Abstract

Motivation: Fiona is a tool for the automatic correction of sequencing errors in reads produced by high-throughput sequencing experiments. It uses an efficient implementation of suffix arrays to detect read overlaps with different seed lengths in parallel. Compared with state-of-the-art methods on several real datasets, Fiona showed superior overall correction accuracy and was among the fastest tools. Additionally, Fiona has several unique characteristics that make it a good choice over existing programs:

  • No parameters for the user to set. You only need to know the length of the genome.
  • Correction of both substitution and indel errors.
  • Optimal correction over a range of seed values.
  • Multicore parallelization using OpenMP.
  • Efficient, memory-saving implementation.
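
The idea of detecting read overlaps through a suffix array at several seed lengths can be illustrated with a small sketch. This is not Fiona's implementation (Fiona is built on the SeqAn library and uses a partial suffix array with multicore parallelization); it is a naive, single-threaded Python illustration, and the function names `build_suffix_array` and `seed_groups` are hypothetical names chosen for this example.

```python
def build_suffix_array(reads):
    """Return every (read_id, offset) suffix of every read, sorted
    lexicographically. Naive construction, for illustration only."""
    suffixes = [
        (rid, off)
        for rid, read in enumerate(reads)
        for off in range(len(read))
    ]
    suffixes.sort(key=lambda s: reads[s[0]][s[1]:])
    return suffixes


def seed_groups(reads, suffix_array, seed_len):
    """Group suffixes that share the same first seed_len characters.

    Suffixes with a common prefix are adjacent in the sorted array,
    so one linear scan is enough. Returns a dict mapping each shared
    seed to the list of (read_id, offset) positions where it occurs.
    """
    groups = {}
    current_seed, current = None, []
    for rid, off in suffix_array:
        seed = reads[rid][off:off + seed_len]
        if len(seed) < seed_len:
            continue  # suffix is shorter than the seed, skip it
        if seed != current_seed:
            if len(current) > 1:
                groups[current_seed] = current
            current_seed, current = seed, []
        current.append((rid, off))
    if len(current) > 1:
        groups[current_seed] = current
    return groups


if __name__ == "__main__":
    # Toy reads (assumed data): the first two overlap, the third does not.
    reads = ["ACGTAC", "CGTACG", "TTTT"]
    sa = build_suffix_array(reads)
    # The same suffix array serves every seed length.
    for k in (4, 5):
        print(k, seed_groups(reads, sa, k))
```

Reads that land in the same group share a seed and are therefore candidates for overlap; a real corrector would then compare the bases around the seed to decide which read carries the error. Because each seed length only requires a scan over the same sorted array, the scans are independent of one another, which is what makes processing several seed lengths in parallel natural.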

Please Cite

  • M. H. Schulz, D. Weese, M. Holtgrewe, V. Dimitrova, S. Niu, K. Reinert, H. Richard, “Fiona: a parallel and automatic strategy for read error correction”, Bioinformatics, vol. 30, iss. 17, pp. i356–i363, 2014.
    @article{fu_mi_publications1451,
     abstract = {Motivation: Automatic error correction of high-throughput sequencing data can have a dramatic impact on the amount of usable base pairs and their quality. It has been shown that the performance of tasks such as de novo genome assembly and SNP calling can be dramatically improved after read error correction. While a large number of methods specialized for correcting substitution errors as found in Illumina data exist, few methods for the correction of indel errors, common to technologies like 454 or Ion Torrent, have been proposed. Results: We present Fiona, a new stand-alone read error-correction method. Fiona provides a new statistical approach for sequencing error detection and optimal error correction and estimates its parameters automatically. Fiona is able to correct substitution, insertion and deletion errors and can be applied to any sequencing technology. It uses an efficient implementation of the partial suffix array to detect read overlaps with different seed lengths in parallel. We tested Fiona on several real datasets from a variety of organisms with different read lengths and compared its performance with state-of-the-art methods. Fiona shows a constantly higher correction accuracy over a broad range of datasets from 454 and Ion Torrent sequencers, without compromise in speed. Conclusion: Fiona is an accurate parameter-free read error-correction method that can be run on inexpensive hardware and can make use of multicore parallelization whenever available. Fiona was implemented using the SeqAn library for sequence analysis and is publicly available for download at http://www.seqan.de/projects/fiona. Contact: mschulz@mmci.uni-saarland.de or hugues.richard@upmc.fr. Supplementary information: Supplementary data are available at Bioinformatics online.},
     author = {M. H. Schulz and D. Weese and M. Holtgrewe and V. Dimitrova and S. Niu and K. Reinert and H. Richard},
     journal = {Bioinformatics},
     number = {17},
     pages = {i356--i363},
     title = {Fiona: a parallel and automatic strategy for read error correction},
     url = {http://publications.imp.fu-berlin.de/1451/},
     volume = {30},
     year = {2014}
    }

Contact

For questions, comments, or suggestions please contact:

Marcel Schulz