Title: A Novel Feature Extraction Model for Protein Sequence Comparison
Authors: Jian Jin, Jie Feng
doi:
Volume: 93
Issue: 3
Year: 2025
Pages: 579-597
Abstract In this paper, we introduce a novel feature extraction model for protein sequence comparison. First, we cluster the 20 natural amino acids into 8 groups based on their physicochemical properties using the K-Means algorithm. A 36-dimensional feature vector is then extracted from the frequency and the mean absolute error of the positions of amino acids in the reduced amino acid sequences, together with the order information of the 20 amino acids in the original sequences. Finally, the Euclidean distance is used to measure the similarity and evolutionary distance between protein sequences. Tests indicate that our method is fast and accurate for classifying proteins and inferring their phylogeny.
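
The pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, and several details are assumptions made only for illustration: the 8-group partition in `GROUPS` is hypothetical (the paper derives its groups with K-Means), the 36 dimensions are assumed to split into 8 group frequencies, 8 positional terms, and 20 order terms, the "mean absolute error of the position" is read as the mean absolute deviation of positions from their mean, and the order information is approximated by the normalized mean position of each amino acid. The function names are likewise hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical 8-group partition, for illustration only; the paper obtains
# its groups by K-Means clustering on physicochemical properties.
GROUPS = ["AVLIM", "FWY", "ST", "NQ", "DE", "KRH", "C", "GP"]
GROUP_OF = {aa: g for g, members in enumerate(GROUPS) for aa in members}


def position_stats(positions, length):
    """Return (frequency, length-normalized mean absolute deviation of positions)."""
    if not positions:
        return 0.0, 0.0
    mean = sum(positions) / len(positions)
    mad = sum(abs(p - mean) for p in positions) / len(positions)
    return len(positions) / length, mad / length


def feature_vector(seq):
    """Build a 36-dimensional vector: 8 frequencies + 8 deviations + 20 order terms."""
    # Keep only the 20 standard residues (assumption: others are ignored).
    residues = [aa for aa in seq.upper() if aa in GROUP_OF]
    n = len(residues)
    if n == 0:
        raise ValueError("sequence contains no standard amino acids")

    group_pos = [[] for _ in GROUPS]          # positions in the reduced sequence
    aa_pos = {aa: [] for aa in AMINO_ACIDS}   # positions in the original sequence
    for i, aa in enumerate(residues, start=1):
        group_pos[GROUP_OF[aa]].append(i)
        aa_pos[aa].append(i)

    freqs, deviations = [], []
    for positions in group_pos:
        f, d = position_stats(positions, n)
        freqs.append(f)
        deviations.append(d)

    # Assumed order feature: normalized mean position of each amino acid.
    order = [
        sum(aa_pos[aa]) / (len(aa_pos[aa]) * n) if aa_pos[aa] else 0.0
        for aa in AMINO_ACIDS
    ]
    return freqs + deviations + order


def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.dist(u, v)
```

Under these assumptions, two sequences are compared by calling `euclidean(feature_vector(a), feature_vector(b))`, and a matrix of such pairwise distances could then be passed to any standard tree-building method.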