Analysis of Emotional Speech using Excitation Source Information: A Comparative Study of Machine Learning and Deep Learning Approaches
Abstract
This study examines how effectively excitation source information can be used to interpret emotional speech by contrasting traditional machine learning (ML) approaches with deep learning (DL) techniques. Spectral and prosodic features are extracted from speech data, with particular attention to excitation source characteristics such as pitch contour, jitter, shimmer, and harmonic-to-noise ratio. Several DL architectures, including Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, and hybrid models, are evaluated against ML methods such as Support Vector Machines, Random Forest, and Gradient Boosting on standardized emotional speech datasets. Experimental results show that DL techniques outperform conventional ML approaches, with the hybrid CNN-LSTM model achieving the highest accuracy of 92.7% on emotion classification tasks. Incorporating excitation source features substantially improves classification performance, particularly for distinguishing between similar emotional states. The study contributes to the field by developing a comprehensive framework for emotional speech analysis and offering a systematic comparison of modern classification methods.
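For illustration, the following is a minimal sketch of how the excitation source features named in the abstract could be extracted, assuming the librosa and numpy Python libraries. The jitter and shimmer estimates here are simple period-to-period and amplitude-variation approximations, and harmonic-to-noise ratio (usually obtained from a dedicated tool such as Praat) is omitted; this is not the authors' published pipeline.

```python
# Illustrative extraction of excitation-source features (pitch contour,
# jitter, shimmer) with librosa + numpy. The jitter/shimmer estimates are
# rough approximations for demonstration, not a validated implementation.
import librosa
import numpy as np

def excitation_features(path, fmin=65.0, fmax=500.0):
    y, sr = librosa.load(path, sr=None)

    # Pitch contour via probabilistic YIN; f0 is NaN in unvoiced frames.
    f0, voiced_flag, _ = librosa.pyin(y, fmin=fmin, fmax=fmax, sr=sr)
    f0_voiced = f0[voiced_flag]

    # Jitter: mean absolute cycle-to-cycle period variation / mean period.
    periods = 1.0 / f0_voiced
    jitter = np.mean(np.abs(np.diff(periods))) / np.mean(periods)

    # Shimmer (approximate): frame-to-frame RMS amplitude variation in
    # voiced frames, relative to the mean amplitude.
    rms = librosa.feature.rms(y=y)[0]
    rms_voiced = rms[: len(voiced_flag)][voiced_flag]
    shimmer = np.mean(np.abs(np.diff(rms_voiced))) / np.mean(rms_voiced)

    return {
        "f0_mean": float(np.mean(f0_voiced)),
        "f0_std": float(np.std(f0_voiced)),
        "jitter": float(jitter),
        "shimmer": float(shimmer),
    }
```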
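Similarly, a hybrid CNN-LSTM classifier of the kind reported to reach 92.7% accuracy might be structured as below. This PyTorch sketch uses illustrative layer sizes and assumes 40 frame-level input features and seven emotion classes; the abstract does not specify the published architecture.

```python
# Minimal PyTorch sketch of a hybrid CNN-LSTM emotion classifier. Layer
# sizes, input dimensionality (40 features per frame), and the 7-class
# output are illustrative assumptions only.
import torch
import torch.nn as nn

class CNNLSTMEmotion(nn.Module):
    def __init__(self, n_features=40, n_classes=7):
        super().__init__()
        # 1-D convolutions over time capture local spectral/prosodic patterns.
        self.cnn = nn.Sequential(
            nn.Conv1d(n_features, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(64, 128, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        # A bidirectional LSTM models longer-range temporal dynamics.
        self.lstm = nn.LSTM(128, 64, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * 64, n_classes)

    def forward(self, x):
        # x: (batch, n_features, time), e.g. stacked frame-level features.
        z = self.cnn(x).transpose(1, 2)   # -> (batch, time/4, 128)
        out, _ = self.lstm(z)             # -> (batch, time/4, 128)
        return self.fc(out[:, -1, :])     # logits from the final time step

# Example: a batch of 8 utterances, 40 features x 200 frames each.
logits = CNNLSTMEmotion()(torch.randn(8, 40, 200))  # -> shape (8, 7)
```

The usual motivation for such hybrids, consistent with the abstract's claims, is that the convolutional layers learn local spectro-temporal patterns while the recurrent layers aggregate them over the utterance.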
License
Copyright (c) 2025 Dulla Srinivas, Siva Rama Krishna Sarma Veerubhotla

This work is licensed under a Creative Commons Attribution 4.0 International License.