Sequence Entropy And Markov Modelling Of Mutated CFTR Genes: Insights From Multiple Sequence Alignment
Keywords:
Cystic Fibrosis, CFTR Gene, Gene Mutation, Multiple Sequence Alignment, Shannon Entropy, Markov Chain Model, Genotype-Phenotype Correlation
Abstract
Mutations in the Cystic Fibrosis Transmembrane Conductance Regulator (CFTR) gene are central to the development of cystic fibrosis, a severe genetic disorder affecting epithelial function. This study analyses 35 mutated CFTR gene sequences to explore their sequence variation and structural patterns. Multiple sequence alignment was performed to place the sequences in a common positional frame, followed by Shannon entropy analysis to identify highly variable and highly conserved regions. Hierarchical clustering was used to examine relationships among the mutated sequences, while sequence logo plots displayed the nucleotide distribution at each alignment position. To model sequence behaviour statistically, a first-order Markov chain was constructed, capturing the transition probabilities between adjacent nucleotides across the aligned sequences. Together, these methods characterise the mutational landscape of the CFTR gene at the sequence level. The findings improve our understanding of sequence-level mutation dynamics and provide a foundation for further computational modelling and genotype-phenotype correlation studies in cystic fibrosis research.
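Two of the steps described above, per-position Shannon entropy and a first-order Markov chain, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes the sequences are already aligned as equal-length strings with `-` for gaps, ignores gaps when computing entropy, and counts only adjacent pairs in which both characters are nucleotides. The function names and the toy alignment are hypothetical and are not CFTR data; hierarchical clustering and sequence logos are not shown.

```python
import math
from collections import Counter

NUCLEOTIDES = "ACGT"


def column_entropy(alignment):
    """Shannon entropy (bits) of each alignment column.

    Assumption: gap characters '-' are excluded before computing frequencies.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    entropies = []
    for i in range(length):
        counts = Counter(seq[i].upper() for seq in alignment if seq[i] != "-")
        total = sum(counts.values())
        h = 0.0
        for c in counts.values():
            p = c / total
            h -= p * math.log2(p)  # H = -sum p * log2(p)
        entropies.append(h)
    return entropies


def transition_matrix(alignment, alphabet=NUCLEOTIDES):
    """First-order Markov chain: P(next = b | current = a), pooled over all sequences.

    Assumption: a pair is counted only if both adjacent characters are in the
    alphabet, so no transition is counted across a gap or an ambiguous base.
    """
    counts = {a: {b: 0 for b in alphabet} for a in alphabet}
    for seq in alignment:
        s = seq.upper()
        for a, b in zip(s, s[1:]):
            if a in alphabet and b in alphabet:
                counts[a][b] += 1
    probs = {}
    for a in alphabet:
        row_total = sum(counts[a].values())
        # Rows with no observed transitions are left as all zeros.
        probs[a] = {b: (counts[a][b] / row_total if row_total else 0.0) for b in alphabet}
    return probs


if __name__ == "__main__":
    toy = ["ACGT-A", "ACGTTA", "ACCTTA", "ATGTTA"]  # hypothetical aligned sequences
    print([round(h, 3) for h in column_entropy(toy)])  # [0.0, 0.811, 0.811, 0.0, 0.0, 0.0]
    print(transition_matrix(toy)["A"])  # {'A': 0.0, 'C': 0.75, 'G': 0.0, 'T': 0.25}
```

Columns with entropy near 0 are conserved, and higher values mark variable positions (the maximum for four nucleotides is 2 bits). Each row of the transition matrix sums to 1 whenever that nucleotide has at least one observed successor.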
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
You are free to:
- Share — copy and redistribute the material in any medium or format
- Adapt — remix, transform, and build upon the material for any purpose, even commercially.
Terms:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.