A set of 538,010 proteins was obtained from the UniProtKB/SwissProt website. Proteins shorter than 50 amino acids and those without protein subcellular localization (PSL) annotations were removed. To construct a non-redundant benchmark data set, we further removed proteins whose PSL annotations were not experimentally determined, and used the PSI-CD-HIT program to remove proteins sharing more than 30% sequence identity with any other protein (SP_NR30). We then removed localization sites with fewer than 30 protein instances. Consequently, we compiled two subsets: one contains 14,754 eukaryotic proteins (SP-Euk), and the other contains 2,585 prokaryotic proteins (SP-Prok).
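The filtering steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used to build SP-Euk and SP-Prok. The `ProteinRecord` type and its `experimental` flag are hypothetical simplifications of a SwissProt entry. The 30% identity clustering is assumed to run outside the sketch with PSI-CD-HIT, which supplies the accessions of the cluster representatives. Proteins are also assumed to carry one or more location labels, so that a protein whose labels all fall below the 30-instance threshold is dropped.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field

# Thresholds taken from the text above.
MIN_LENGTH = 50      # proteins shorter than this are removed
MIN_SITE_SIZE = 30   # localization sites with fewer proteins are removed


@dataclass
class ProteinRecord:
    """Hypothetical, simplified view of one SwissProt entry."""
    accession: str
    sequence: str
    locations: list[str] = field(default_factory=list)  # PSL labels
    experimental: bool = False  # assumed flag: PSL annotation is experimental


def prefilter(records):
    """Drop short proteins, unannotated proteins, and non-experimental PSL."""
    return [
        r for r in records
        if len(r.sequence) >= MIN_LENGTH and r.locations and r.experimental
    ]


def keep_representatives(records, representative_ids):
    """Keep only the cluster representatives reported by an external
    PSI-CD-HIT run at a 30% identity threshold (run outside this sketch)."""
    keep = set(representative_ids)
    return [r for r in records if r.accession in keep]


def drop_rare_sites(records, min_size=MIN_SITE_SIZE):
    """Remove sites seen in fewer than `min_size` proteins, then drop
    proteins left with no location label (assumed multi-label handling)."""
    counts = Counter(loc for r in records for loc in set(r.locations))
    kept_sites = {loc for loc, n in counts.items() if n >= min_size}
    result = []
    for r in records:
        locs = [loc for loc in r.locations if loc in kept_sites]
        if locs:
            result.append(ProteinRecord(r.accession, r.sequence, locs, r.experimental))
    return result


def build_benchmark(records, representative_ids, min_size=MIN_SITE_SIZE):
    """Apply the filters in the order described in the text."""
    nonredundant = keep_representatives(prefilter(records), representative_ids)
    return drop_rare_sites(nonredundant, min_size)


demo = [
    ProteinRecord("P1", "M" * 60, ["nucleus"], True),
    ProteinRecord("P2", "M" * 40, ["nucleus"], True),     # too short
    ProteinRecord("P3", "M" * 80, [], True),              # no PSL annotation
    ProteinRecord("P4", "M" * 80, ["cytoplasm"], False),  # not experimental
]
print([r.accession for r in prefilter(demo)])  # ['P1']
```

The site-size filter runs last, after redundancy reduction, so that the 30-instance threshold is counted on the non-redundant set, matching the order of the steps in the text.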

There are 16 subcellular locations for eukaryotic proteins: nucleus, cytoplasm, plasma membrane, extracellular, mitochondrion, cytoskeleton, endoplasmic reticulum, plastid, chloroplast, Golgi apparatus, centriole, vacuole, lysosome, peroxisome, cell wall, and microsome. There are 5 subcellular locations for prokaryotic proteins: plasma membrane, cytoplasm, extracellular, periplasm, and cell wall.
