• E. Audain, J. Uszkoreit, T. Sachsenberg, J. Pfeuffer, X. Liang, H. Hermjakob, A. Sanchez, M. Eisenacher, K. Reinert, D. L. Tabb, O. Kohlbacher, and Y. Perez-Riverol, “In-depth analysis of protein inference algorithms using multiple search engines and well-defined metrics,” Journal of Proteomics, vol. 150, pp. 170-182, 2017.
    volume = {150},
    pages = {170--182},
    title = {In-depth analysis of protein inference algorithms using multiple search engines and well-defined metrics},
    year = {2017},
    author = {Enrique Audain and Julian Uszkoreit and Timo Sachsenberg and Julianus Pfeuffer and Xiao Liang and Henning
    Hermjakob and Aniel Sanchez and Martin Eisenacher and Knut Reinert and David L. Tabb and Oliver Kohlbacher and Yasset
    Perez-Riverol},
    journal = {Journal of Proteomics},
    month = {January},
    publisher = {Elsevier},
    abstract = {In mass spectrometry-based shotgun proteomics, protein identifications are usually the desired result.
    However, most of the analytical methods are based on the identification of reliable peptides and not the direct
    identification of intact proteins. Thus, assembling peptides identified from tandem mass spectra into a list of
    proteins, referred to as protein inference, is a critical step in proteomics research. Currently, different protein
    inference algorithms and tools are available for the proteomics community. Here, we evaluated five software tools for
    protein inference (PIA, ProteinProphet, Fido, ProteinLP, MSBayesPro) using three popular database search engines:
    Mascot, X!Tandem, and MS-GF+. All the algorithms were evaluated using a highly customizable KNIME workflow using four
    different public datasets with varying complexities (different sample preparation, species and analytical instruments).
    We defined a set of quality control metrics to evaluate the performance of each combination of search engines, protein
    inference algorithm, and parameters on each dataset. We show that the results for complex samples vary not only
    regarding the actual numbers of reported protein groups but also concerning the actual composition of groups.
    Furthermore, the robustness of reported proteins when using databases of differing complexities is strongly dependent on
    the applied inference algorithm. Finally, merging the identifications of multiple search engines does not necessarily
    increase the number of reported proteins, but does increase the number of peptides per protein and thus can generally be
    url = {http://publications.imp.fu-berlin.de/1939/}