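Researchers from East China University of Science and Technology and the Shanghai Center for Bioinformation Technology recently published a bioinformatics research paper titled "Feature-matching pattern-based support vector machines for robust peptide mass fingerprinting" in Molecular & Cellular Proteomics (MCP; 2010 SCI impact factor 8.35), a leading international proteomics journal.
Peptide mass fingerprinting (PMF) is an important protein identification method in proteomics. Compared with tandem mass spectrometry (tandem MS, MS/MS), it offers high throughput, high specificity for single peptides and low sensitivity to post-translational modifications. This study aims to improve the accuracy and robustness of PMF algorithms. It divides the protein identification process into three independent yet related objects (the theoretical spectrum, the experimental spectrum and spectrum alignment) and, by addressing the specific attributes and key issues of each, derives a total of 35,640 features. A machine learning method, the support vector machine (SVM), is trained on a standard dataset of 1733 items. Compared with four existing PMF identification algorithms (Mascot, MS-Fit, ProFound and Aldente), the new algorithm shows significant improvements in sensitivity, precision and robustness. A dedicated protein identification website has also been built on the theoretical basis of the new algorithm. Reviewers considered the study conceptually novel and well suited to practical application.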
Feature-matching pattern-based support vector machines for robust peptide mass fingerprinting
Youyuan Li, Pei Hao, Siliang Zhang and Yixue Li
Peptide mass fingerprinting (PMF), despite having become complementary to tandem mass spectrometry (MS/MS) for protein identification, is still the subject of in-depth study because of its higher sample throughput, higher level of specificity for single peptides and lower level of sensitivity to unexpected post-translational modifications. In this study, we propose, implement and evaluate a uniform approach using support vector machines (SVMs) to incorporate individual concepts and conclusions for accurate PMF. We focus on the inherent attributes and critical issues of the theoretical spectrum, the experimental spectrum and spectrum alignment. Eighty-one feature-matching patterns (FMPs), derived from cleavage type, uniqueness and variable masses of theoretical peptides together with the intensity rank of experimental peaks, were proposed to characterize the matching profile of the PMF procedure. We developed a new strategy to handle shared peak intensities, and 440 parameters were generated to digitalize each FMP. A high performance for an evaluation dataset of 137 items was finally achieved by the optimal multi-criteria SVM approach, with 491 final features out of a feature vector of 35,640 normalized features, through cross training and validating a publicly available "gold standard" PMF dataset of 1733 items. Compared with Mascot, MS-Fit, ProFound and Aldente, the FMP algorithm has a greater ability to identify correct proteins, with the highest values for sensitivity (82%), precision (97%) and F1-measure (89%). Several conclusions have been reached via this research. Firstly, inherent attributes showed comparable or even greater robustness than other explicit attributes. Peak intensity, an inherent attribute, should receive considerable attention during protein identification. Secondly, alignment between intense experimental peaks and properly digested, unique or non-modified theoretical peptides is very likely to occur in positive PMFs. Finally, normalization by several types of harmonic factors, including missed cleavages and mass modification, can make important contributions to the performance of the procedure.
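
The three performance figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes the conventional definition of the F1-measure as the harmonic mean of precision and sensitivity (recall), which the abstract does not spell out. The function name `f1_measure` is illustrative and does not come from the paper.

```python
def f1_measure(sensitivity: float, precision: float) -> float:
    """Harmonic mean of sensitivity (recall) and precision.

    Assumes the standard F1 definition; the paper's exact formula
    is not given in the abstract.
    """
    if sensitivity + precision == 0:
        return 0.0  # avoid division by zero when both scores are 0
    return 2 * sensitivity * precision / (sensitivity + precision)


# Values reported in the abstract for the FMP algorithm.
f1 = f1_measure(sensitivity=0.82, precision=0.97)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.889"
```

Under that assumption, the reported sensitivity of 82% and precision of 97% give an F1 of about 0.889, which agrees with the 89% stated in the abstract.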