Title:
Protein Sequence Comparison Method Based on 3-ary Huffman Coding
Authors:
Zhaohui Qi, Yingqiang Ning, Yinmei Huang
doi:
Volume: 90
Issue: 2
Year: 2023
Pages: 357-380
Abstract:
Based on the 3-ary Huffman coding algorithm, we propose a digital mapping method for protein sequences. First, a 3-ary Huffman tree is built from the frequencies of the 20 amino acids in the given protein sequences. The tree assigns each amino acid a code over the digits 0, 1 and 2 (a 0-2 code), and these codes convert a long protein sequence into a 0-2 digital sequence in a one-to-one manner. From the frequencies of the 20 amino acids and the distribution of their 0-2 codes within the digital sequences, we construct a 40-dimensional vector to characterize each protein sequence. The method is then applied in three separate studies: similarity comparison of nine ND6 proteins, evolutionary trend analysis of the 2009 pandemic human influenza A (H1N1) viruses from January 2020 to June 2022, and evolutionary analysis of 95 coronavirus genes. The results illustrate the utility of the proposed method.
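The abstract describes the pipeline only in outline. The Python sketch below shows how the mapping steps might look under stated assumptions. The 3-ary Huffman construction is the standard textbook algorithm: pad the leaves with zero-weight dummies until (leaves - 1) is even, then repeatedly merge the three lightest subtrees, labelling their branches 0, 1 and 2. Because the resulting codes are prefix-free, the 0-2 digital sequence decodes back to exactly one protein sequence. All names here (`ternary_huffman_codes`, `encode`, `decode`, `feature_vector`) and the toy sequences are hypothetical. The abstract does not define the "distribution information" behind the 40-dimensional vector, so `feature_vector` is a hypothetical stand-in (20 relative frequencies plus 20 mean relative code positions), not the authors' descriptor. Euclidean distance is likewise an assumed similarity measure.

```python
import heapq
import math
from collections import Counter
from itertools import count

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def ternary_huffman_codes(freqs):
    """Build a 3-ary Huffman code over the digits 0, 1, 2 from {symbol: weight}."""
    tie = count()  # unique tie-breaker so the heap never compares dicts
    # Heap entries: (subtree weight, tie-breaker, {symbol: code suffix so far})
    heap = [(w, next(tie), {sym: ""}) for sym, w in freqs.items()]
    # A full 3-ary tree needs (leaves - 1) % 2 == 0, so pad with
    # zero-weight dummy leaves; every merge then takes exactly 3 nodes.
    while (len(heap) - 1) % 2 != 0:
        heap.append((0, next(tie), {}))
    heapq.heapify(heap)
    while len(heap) > 1:
        merged, total = {}, 0
        # Pop the three lightest subtrees and label their branches 0, 1, 2.
        for digit in "012":
            w, _, codes = heapq.heappop(heap)
            total += w
            for sym, code in codes.items():
                merged[sym] = digit + code
        heapq.heappush(heap, (total, next(tie), merged))
    return heap[0][2]


def encode(seq, codes):
    """Map a protein sequence to its 0-2 digital sequence."""
    return "".join(codes[aa] for aa in seq)


def decode(digits, codes):
    """Invert encode(); this works because Huffman codes are prefix-free."""
    inverse = {code: sym for sym, code in codes.items()}
    out, buf = [], ""
    for d in digits:
        buf += d
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return "".join(out)


def feature_vector(seq, codes):
    """HYPOTHETICAL 40-D descriptor, not the paper's definition:
    20 relative frequencies + 20 mean relative start positions of each
    amino acid's code within the 0-2 digital sequence (0.0 if absent).
    Assumes a non-empty sequence of the 20 standard letters only."""
    total_digits = sum(len(codes[aa]) for aa in seq)
    counts = Counter(seq)
    pos_sum = dict.fromkeys(AMINO_ACIDS, 0.0)
    offset = 0
    for aa in seq:
        pos_sum[aa] += offset / total_digits
        offset += len(codes[aa])
    freq_part = [counts[aa] / len(seq) for aa in AMINO_ACIDS]
    dist_part = [pos_sum[aa] / counts[aa] if counts[aa] else 0.0
                 for aa in AMINO_ACIDS]
    return freq_part + dist_part


if __name__ == "__main__":
    seqs = ["MNSVLLLSLTLLL", "MNYVLFLLSLVL"]  # toy strings, not real ND6 data
    pooled = Counter("".join(seqs))
    # One shared code table, built from the pooled amino acid frequencies.
    codes = ternary_huffman_codes({aa: pooled[aa] for aa in AMINO_ACIDS})
    print(encode(seqs[0], codes))
    v1, v2 = (feature_vector(s, codes) for s in seqs)
    print(math.dist(v1, v2))  # assumed distance; smaller = more similar
```

Building one code table from the pooled frequencies of all compared sequences keeps their 0-2 encodings on a common footing. Amino acids that never occur still receive a (long) code because they enter the tree with zero weight.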