UniLoc: a universal protein localization site predictor for eukaryotes and prokaryotes

Knowing the subcellular localization of a protein provides crucial information for revealing its biological function, annotating newly sequenced genomes, verifying experimental results, and identifying potential drug targets [1]. However, determining protein localization sites through biochemical experiments is time-consuming, costly, and labor-intensive. Computational approaches, in contrast, offer an efficient way to assign subcellular localizations to uncharacterized proteins with good accuracy using only primary sequence information.

We have developed a subcellular localization site predictor, UniLoc, which can predict single and multiple localization sites for both eukaryotic and prokaryotic proteins. Trained on a proteome-scale non-redundant data set, UniLoc considerably outperforms existing methods in prediction accuracy. Compared with state-of-the-art eukaryotic protein localization site predictors, UniLoc achieves improvements of at least 6.5%, 3%, and 15.6% in top-1 score, precision, and recall, respectively. Compared with state-of-the-art prokaryotic protein localization site predictors, UniLoc achieves improvements of at least 6.9%, 7%, and 12.5% in top-1 score, precision, and recall, respectively.
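Because UniLoc can assign several sites to one protein, its evaluation uses multi-label metrics. The text does not define them, so the Python sketch below assumes common example-based definitions: the top-1 score is the fraction of proteins whose highest-ranked predicted site is a true site, and precision and recall are computed per protein and then averaged. These definitions and the function names are assumptions for illustration, not necessarily the exact formulas used to obtain the figures above.

```python
from typing import List, Sequence, Set, Tuple


def top1_score(ranked_predictions: Sequence[List[str]],
               true_sites: Sequence[Set[str]]) -> float:
    """Assumed definition: fraction of proteins whose highest-ranked
    predicted site is one of the protein's true sites."""
    hits = sum(1 for ranked, truth in zip(ranked_predictions, true_sites)
               if ranked and ranked[0] in truth)
    return hits / len(true_sites)


def precision_recall(predicted_sites: Sequence[Set[str]],
                     true_sites: Sequence[Set[str]]) -> Tuple[float, float]:
    """Assumed example-based precision and recall, averaged over proteins.
    Every protein is assumed to have at least one true site."""
    n = len(true_sites)
    # A protein with no predicted site contributes zero precision.
    precision = sum(len(p & t) / len(p) if p else 0.0
                    for p, t in zip(predicted_sites, true_sites)) / n
    recall = sum(len(p & t) / len(t)
                 for p, t in zip(predicted_sites, true_sites)) / n
    return precision, recall
```

For example, predicting {nucleus, cytoplasm} for a protein that is only nuclear gives that protein a precision of 0.5 and a recall of 1.0. Averaging per protein (rather than pooling all site counts) keeps proteins with many sites from dominating the score.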


The flowchart of UniLoc works as follows. First, the target sequence is searched against SP-homo with BLAST to identify similar sequences. If one or more similar sequences are found, the prediction is made by scoring the localization site(s) of the top five similar sequences. If not, a set of high-scoring segment pairs (HSPs, i.e., local sequence alignments) is generated by a PSI-BLAST search against the NCBI nr database. Synonymous words are extracted from the HSPs and used to query a synonymous word dictionary that records tens of millions of synonymous words. All matched synonymous words vote for localization sites, and a scoring function selects the top N sites as the prediction result. The prediction output, the weighted voting scores of all sites, and the significant template proteins are shown on the website.
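The two-branch voting scheme described above can be sketched in Python as follows. This is a minimal illustration under loud assumptions: BLAST and PSI-BLAST are not run here, and their results are passed in as precomputed hits and HSPs. The record layouts, the word length `w`, the bit-score weighting of homolog votes, the dictionary keyed by (query word, subject word) pairs, and the threshold rule in `select_top_sites` are all hypothetical; UniLoc's actual scoring function is not specified in this text.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical record types (assumptions, not UniLoc's data formats):
# a BLAST hit is (subject_id, bit_score, localization sites of the subject);
# an HSP is (aligned query segment, aligned subject segment), '-' marks gaps.
Hit = Tuple[str, float, Sequence[str]]
HSP = Tuple[str, str]
WordDictionary = Dict[Tuple[str, str], Counter]


def vote_by_homologs(hits: Iterable[Hit], top_k: int = 5) -> Counter:
    """Homology branch: the top_k best-scoring hits vote for their sites,
    each vote weighted by the hit's bit score (assumed weighting)."""
    best = sorted(hits, key=lambda h: h[1], reverse=True)[:top_k]
    votes: Counter = Counter()
    for _subject_id, score, sites in best:
        for site in sites:
            votes[site] += score
    return votes


def extract_synonymous_words(hsp: HSP, w: int = 3) -> List[Tuple[str, str]]:
    """Slide a window of length w along an HSP and keep the gap-free
    (query word, subject word) pairs as candidate synonymous words."""
    query, subject = hsp
    words = []
    for i in range(len(query) - w + 1):
        q, s = query[i:i + w], subject[i:i + w]
        if "-" not in q and "-" not in s:
            words.append((q, s))
    return words


def vote_by_synonymous_words(hsps: Iterable[HSP], dictionary: WordDictionary,
                             w: int = 3) -> Counter:
    """Synonymous-word branch: each word found in the (hypothetical)
    dictionary adds its recorded per-site counts to the vote."""
    votes: Counter = Counter()
    for hsp in hsps:
        for word in extract_synonymous_words(hsp, w):
            votes.update(dictionary.get(word, Counter()))
    return votes


def select_top_sites(votes: Counter, n: int = 3, ratio: float = 0.5) -> List[str]:
    """Assumed scoring rule: keep up to n sites whose vote is at least
    `ratio` times the best site's vote."""
    if not votes:
        return []
    ranked = votes.most_common()
    best_score = ranked[0][1]
    return [site for site, score in ranked[:n] if score >= ratio * best_score]


def predict(hits: Sequence[Hit], hsps: Sequence[HSP],
            dictionary: WordDictionary, n: int = 3) -> List[str]:
    """Use homologs when any were found; otherwise fall back to synonymous words."""
    if hits:
        votes = vote_by_homologs(hits)
    else:
        votes = vote_by_synonymous_words(hsps, dictionary)
    return select_top_sites(votes, n)
```

With hits scoring 200 (nucleus), 150 (nucleus, cytoplasm), and 10 (membrane), the votes are 350, 150, and 10, so only the nucleus clears the assumed 50% threshold. Returning the full `votes` counter alongside the selected sites would correspond to the weighted voting scores that the website reports.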

Contact: Sung Hsu, IASL