Dysarthria Detection and Speech-to-Text Transcription Using Deep Learning and Audio Processing

Authors

Garaga Srilakshmi
Vadakattu Sai Harsha
Kurakula Nitin
Bera Vamsi Krishna
Osipilli David Raju

DOI:

https://doi.org/10.52783/jns.v14.2276

Keywords:

Deep Convolutional Neural Networks, Dysarthria, Mel Frequency Logarithmic Spectrograms

Abstract

Dysarthria is a motor speech disorder, caused by neurological damage, that impairs articulation, pitch, and rhythm. Early detection is crucial for effective therapy. This study presents a novel dysarthria detection approach that combines Mel Frequency Logarithmic Spectrograms (MFLS) with Deep Convolutional Neural Networks (DCNN). Speech signals are preprocessed to extract MFLS, which capture the essential frequency and temporal features. These spectrograms are then fed to a DCNN, which identifies patterns associated with dysarthric speech.
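
As a rough illustration of the feature-extraction step described above, the sketch below computes a log-scaled mel spectrogram from a raw waveform using only NumPy. It is not the authors' implementation: the function names (hz_to_mel, mel_to_hz, mel_filterbank, log_mel_spectrogram) and all parameter values (16 kHz sampling rate, 25 ms frames, 10 ms hop, 40 mel bands) are assumptions chosen for illustration, since the abstract does not state them.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style mel formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale.

    Returns an array of shape (n_mels, n_fft // 2 + 1).
    """
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_edges = mel_to_hz(mel_edges)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=400, hop=160, n_mels=40, eps=1e-10):
    """Return a log mel spectrogram of shape (n_mels, n_frames).

    All default values are illustrative assumptions, not taken from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size < n_fft:
        # Zero-pad very short inputs so at least one frame exists.
        signal = np.pad(signal, (0, n_fft - signal.size))
    n_frames = 1 + (signal.size - n_fft) // hop
    # Slice the signal into overlapping frames and apply a Hann window.
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    # Power spectrum per frame, shape (n_frames, n_fft // 2 + 1).
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Pool FFT bins into mel bands, then compress with a logarithm.
    mel_energy = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel_energy + eps).T

# Usage: one second of a synthetic 1 kHz tone stands in for a speech recording.
t = np.arange(16000) / 16000.0
features = log_mel_spectrogram(np.sin(2 * np.pi * 1000.0 * t))
print(features.shape)  # (40, 98)
```

The resulting two-dimensional array can be treated like a single-channel image, which is what makes it a natural input for a convolutional network of the kind the abstract describes.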

The model was trained on publicly available datasets, achieving high accuracy and robustness across different severity levels. It performed well under varying conditions such as speech duration, speaker age, and recording quality. Integrating spectrogram-based feature extraction with deep learning enhances automated speech disorder diagnosis.

This study highlights the potential of advanced signal processing for reliable dysarthria detection. Future work may explore additional speech features, multilingual datasets, and real-time applications to improve clinical utility.

Published

2025-03-18

How to Cite

Srilakshmi G, Sai Harsha V, Nitin K, Krishna BV, David Raju O. Dysarthria Detection and Speech-to-Text Transcription Using Deep Learning and Audio Processing. J Neonatal Surg [Internet]. 2025 Mar 18 [cited 2025 Nov 18];14(6S):567-73. Available from: https://www.jneonatalsurg.com/index.php/jns/article/view/2276

Issue

Vol. 14 No. 6S (2025): Journal of Neonatal Surgery

Section

Original Article

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

You are free to:

Share — copy and redistribute the material in any medium or format
Adapt — remix, transform, and build upon the material for any purpose, even commercially.

Terms:

Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.