Dysarthria Detection and Speech-to-Text Transcription Using Deep Learning and Audio Processing
DOI:
https://doi.org/10.52783/jns.v14.2276
Keywords:
Deep Convolutional Neural Networks, Dysarthria, Mel Frequency Logarithmic Spectrograms
Abstract
Dysarthria is a motor speech disorder, caused by neurological damage, that impairs articulation, pitch, and rhythm. Early detection is crucial for effective therapy. This study presents a novel dysarthria detection approach using Mel Frequency Logarithmic Spectrograms (MFLS) and Deep Convolutional Neural Networks (DCNN). Speech signals are preprocessed to extract MFLS, which capture essential frequency and temporal features. These spectrograms serve as input to a DCNN, which identifies patterns associated with dysarthric speech.
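The MFLS extraction step can be illustrated with a minimal NumPy sketch. This is not the authors' pipeline: the sampling rate, FFT size, hop length, number of mel bands, and the function names (`mel_filterbank`, `log_mel_spectrogram`) are all assumptions chosen for illustration, and a synthetic 440 Hz tone stands in for a speech recording.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style mel mapping (an assumed choice).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose edges are equally spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_pts = mel_to_hz(mel_pts)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, ctr, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=40):
    # Frame the signal, apply a Hann window, and take the power spectrum.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[t * hop : t * hop + n_fft] * window for t in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Project onto mel bands, then compress with a logarithm.
    mel_energy = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel_energy + 1e-10).T  # shape: (n_mels, n_frames)

# Synthetic one-second tone used as a placeholder for real speech.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
mfls = log_mel_spectrogram(tone, sr=sr)
print(mfls.shape)  # (40, 97)
```

The resulting two-dimensional array (mel bands by time frames) is the kind of image-like input a convolutional network would consume; in practice a library such as librosa would typically replace the hand-written filterbank.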
The model was trained on publicly available datasets, achieving high accuracy and robustness across different severity levels. It performed well under varying conditions such as speech duration, speaker age, and recording quality. Integrating spectrogram-based feature extraction with deep learning enhances automated speech disorder diagnosis.
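Robustness across severity levels is usually checked by breaking accuracy down per group instead of reporting a single figure. The sketch below shows one way to do that; the helper `accuracy_by_group` is hypothetical, and the labels, predictions, and severity tags are toy values invented for illustration, not results from this study.

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    # Count correct predictions and totals separately for each group label.
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}

# Toy data (invented): 1 = dysarthric, 0 = control.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0]
severity = ["mild", "mild", "severe", "severe", "control", "control"]

per_group = accuracy_by_group(y_true, y_pred, severity)
print(per_group)  # {'mild': 0.5, 'severe': 1.0, 'control': 1.0}
```

The same breakdown could be keyed by speaker age band or recording condition to probe the other factors the abstract mentions.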
This study highlights the potential of advanced signal processing for reliable dysarthria detection. Future work may explore additional speech features, multilingual datasets, and real-time applications to improve clinical utility.
License

This work is licensed under a Creative Commons Attribution 4.0 International License.