
Abstract

In this paper, we propose a new, fast alignment-free method for protein sequence similarity and evolutionary analysis. First, the 20 natural amino acids are clustered into 6 groups based on their physicochemical properties. A 12-dimensional vector is then constructed from the frequency and the average position of occurrence of the amino acids in each reduced amino acid sequence. Finally, the Euclidean distance is used to measure the similarity and evolutionary distance between protein sequences. Tests on three datasets show that our method clusters the protein sequences accurately, which demonstrates its effectiveness.
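
The pipeline described above can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not settle: the six-group partition in `GROUPS` is a hypothetical grouping chosen only for illustration, the frequency of a group is taken as its count divided by the sequence length, and the average position is taken as the mean 1-based position of the group's residues, also divided by the sequence length. The function names `feature_vector` and `euclidean` are likewise illustrative.

```python
import math

# Hypothetical 6-group reduction of the 20 natural amino acids.
# The abstract does not list the groups; this partition is an assumption.
GROUPS = {
    "AVLIMC": 0,
    "FWYH": 1,
    "STNQ": 2,
    "KR": 3,
    "DE": 4,
    "GP": 5,
}
AA_TO_GROUP = {aa: g for letters, g in GROUPS.items() for aa in letters}


def feature_vector(seq):
    """Return a 12-dimensional vector: 6 group frequencies, then 6 average positions.

    Both normalizations (dividing by the sequence length) are assumptions.
    """
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")
    counts = [0] * 6
    pos_sums = [0] * 6
    for i, aa in enumerate(seq.upper(), start=1):
        g = AA_TO_GROUP.get(aa)
        if g is None:
            continue  # skip non-standard residues such as X or B
        counts[g] += 1
        pos_sums[g] += i
    freqs = [c / n for c in counts]
    # Mean 1-based position of each group, scaled by length; 0.0 if the group is absent.
    avg_pos = [(s / c) / n if c else 0.0 for s, c in zip(pos_sums, counts)]
    return freqs + avg_pos


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    d = euclidean(feature_vector("AK"), feature_vector("KA"))
    print(round(d, 4))
```

Under these assumptions, two sequences with the same composition but a different residue order, such as `AK` and `KA`, have identical frequency components and differ only in the average-position components, which is what allows the vector to capture order information without an alignment.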