Abstract
Sequence-based methods for predicting signal peptides share a common problem: the difficulty of distinguishing between signal peptides and transmembrane helices. We present here a new version of SignalP, the most widely used tool for signal peptide prediction, which has been constructed specifically to address this problem. By extensive benchmarking on realistic data sets in which transmembrane proteins are abundant, we show that SignalP 4.0 outperforms ten other current methods, including transmembrane helix topology predictors with built-in signal peptide models. The comparison is performed using novel data that have entered UniProt since the construction of SignalP 3.0, and the performance of SignalP 4.0 has been evaluated using a strict "nested cross-validation" approach to avoid any form of overfitting. The new version of SignalP is available at http://www.cbs.dtu.dk/services/SignalP/.