Going from where to why
Date
2010-03-17
Abstract
Motivation: Protein subcellular localization is pivotal in understanding
a protein’s function. Computational prediction of subcellular
localization has become a viable alternative to experimental
approaches. While current machine learning-based methods yield
good prediction accuracy, most of them suffer from two key
problems: a lack of interpretability and difficulty handling proteins
with multiple locations.
Results: We present YLoc, a novel method for predicting protein
subcellular localization that addresses these issues. Due to its
simple architecture, YLoc can identify the relevant features of a
protein sequence contributing to its subcellular localization, e.g.
localization signals or motifs relevant to protein sorting. We present
several example applications where YLoc identifies the sequence
features responsible for protein localization, and thus reveals not
only where a protein is transported, but also why
it is transported there. YLoc also provides a confidence estimate
for the prediction. Thus, the user can decide what level of error is
acceptable for a prediction. Owing to its probabilistic approach and the
use of several thousand dual-targeted proteins, YLoc is able to
predict multiple locations per protein. YLoc was benchmarked using
several independent datasets for protein subcellular localization and
performs on par with other state-of-the-art predictors. When
low-confidence predictions are disregarded, YLoc can achieve prediction accuracies
of over 90%. Moreover, we show that YLoc is able to reliably predict
multiple locations and outperforms the best predictors in this area.
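
The two ideas in the abstract, reporting more than one location from probabilistic output and trading coverage for accuracy by discarding low-confidence predictions, can be illustrated with a minimal sketch. This is not YLoc's implementation. The function names, the 0.3 multi-location threshold, the use of the top probability as the confidence score, and the example probabilities are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

Prediction = Tuple[Set[str], float]  # (predicted locations, confidence)


def predict_locations(probs: Dict[str, float],
                      multi_threshold: float = 0.3) -> Prediction:
    """Turn per-location probabilities into a (possibly multi-location) call.

    Hypothetical rule: always report the most probable location, plus any
    other location whose probability reaches `multi_threshold`. The top
    probability is used as a stand-in confidence score.
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    top_location, top_prob = ranked[0]
    predicted = {top_location}
    predicted.update(loc for loc, p in ranked[1:] if p >= multi_threshold)
    return predicted, top_prob


def accuracy_above_confidence(predictions: List[Prediction],
                              truths: List[Set[str]],
                              min_confidence: float) -> Tuple[float, float]:
    """Return (accuracy, coverage) over predictions with enough confidence.

    A prediction counts as correct only if its location set matches the
    true location set exactly.
    """
    kept = [(pred, truth)
            for (pred, conf), truth in zip(predictions, truths)
            if conf >= min_confidence]
    coverage = len(kept) / len(predictions) if predictions else 0.0
    if not kept:
        return 0.0, coverage
    correct = sum(1 for pred, truth in kept if pred == truth)
    return correct / len(kept), coverage


# Made-up probabilities for three hypothetical proteins.
example_probs = [
    {"nucleus": 0.92, "cytoplasm": 0.05, "mitochondrion": 0.03},
    {"cytoplasm": 0.50, "nucleus": 0.45, "mitochondrion": 0.05},
    {"mitochondrion": 0.40, "cytoplasm": 0.35, "nucleus": 0.25},
]
example_truths = [
    {"nucleus"},
    {"cytoplasm", "nucleus"},
    {"mitochondrion"},
]

example_predictions = [predict_locations(p) for p in example_probs]
all_acc, all_cov = accuracy_above_confidence(example_predictions, example_truths, 0.0)
high_acc, high_cov = accuracy_above_confidence(example_predictions, example_truths, 0.45)
```

In this toy data, raising `min_confidence` drops the least certain prediction, which happens to be the wrong one, so accuracy rises while coverage falls. That is the trade-off a user makes when deciding what level of error is acceptable.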