Analysis of Emotional Speech using Excitation Source Information: A Comparative Study of Machine Learning and Deep Learning Approaches

Authors

  • Dulla Srinivas
  • Siva Rama Krishna Sarma Veerubhotla

Abstract

This study examines how well excitation source information supports the interpretation of emotional speech by contrasting traditional machine learning (ML) approaches with deep learning (DL) techniques. The study extracts spectral and prosodic features from speech data, concentrating on excitation source characteristics such as pitch contour, jitter, shimmer, and harmonic-to-noise ratio. We evaluate several DL architectures, including Convolutional Neural Networks, Long Short-Term Memory networks, and hybrid models, alongside ML methods, including Support Vector Machines, Random Forest, and Gradient Boosting, using standardized emotional speech datasets. Experimental findings show that DL techniques outperform conventional ML approaches, with the hybrid CNN-LSTM model attaining the highest accuracy of 92.7% in emotion classification tasks. Incorporating excitation source characteristics substantially improves classification performance, particularly for differentiating between similar emotional states. By developing a thorough framework for emotional speech analysis and offering a systematic comparison of modern classification methods, this study contributes to the field.
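The abstract does not name the toolkit used for feature extraction. As a minimal sketch, assuming the praat-parselmouth Python bindings to Praat, the excitation source features named above (pitch contour, jitter, shimmer, and harmonic-to-noise ratio) could be extracted roughly as follows; the file path, pitch range, and thresholds are illustrative placeholders, not values from the paper.

```python
# Sketch: extracting excitation source features with praat-parselmouth.
# Assumes: pip install praat-parselmouth; "speech.wav" is a placeholder path.
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("speech.wav")

# Pitch contour (F0 per frame; unvoiced frames are reported as 0 Hz).
pitch = snd.to_pitch()
f0 = pitch.selected_array["frequency"]
f0_voiced = f0[f0 > 0]  # drop unvoiced frames before summarising the contour

# Glottal pulse marks, needed for jitter and shimmer (75-500 Hz search range).
point_process = call(snd, "To PointProcess (periodic, cc)", 75, 500)

# Local jitter: mean absolute cycle-to-cycle period difference / mean period.
jitter_local = call(point_process, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)

# Local shimmer: mean absolute cycle-to-cycle amplitude difference / mean amplitude.
shimmer_local = call([snd, point_process], "Get shimmer (local)",
                     0, 0, 0.0001, 0.02, 1.3, 1.6)

# Mean harmonic-to-noise ratio (dB) from a cross-correlation harmonicity analysis.
harmonicity = call(snd, "To Harmonicity (cc)", 0.01, 75, 0.1, 1.0)
hnr_db = call(harmonicity, "Get mean", 0, 0)

print(f0_voiced.mean(), jitter_local, shimmer_local, hnr_db)
```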
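Similarly, the exact layer configuration of the hybrid CNN-LSTM that reached 92.7% is not given in this abstract. The Keras sketch below is only an assumed illustration of that general architecture class (convolutional feature extraction over frame-level feature sequences, followed by an LSTM and a softmax classifier); all filter counts, layer sizes, and input shapes are hypothetical.

```python
# Sketch: a generic hybrid CNN-LSTM emotion classifier in Keras.
# All hyperparameters here are illustrative assumptions, not the paper's.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_lstm(n_frames: int, n_features: int, n_classes: int) -> tf.keras.Model:
    inputs = layers.Input(shape=(n_frames, n_features))  # frame-level feature sequence
    x = layers.Conv1D(64, 5, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling1D(2)(x)
    x = layers.Conv1D(128, 5, padding="same", activation="relu")(x)
    x = layers.MaxPooling1D(2)(x)
    x = layers.LSTM(128)(x)   # temporal modelling of the pooled feature maps
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Example: 300 frames of 40 features each, 7 emotion classes (hypothetical shapes).
model = build_cnn_lstm(n_frames=300, n_features=40, n_classes=7)
model.summary()
```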

Published

2025-07-30

How to Cite

Srinivas D, Krishna Sarma Veerubhotla SR. Analysis of Emotional Speech using Excitation Source Information: A Comparative Study of Machine Learning and Deep Learning Approaches. J Neonatal Surg [Internet]. 2025 Jul. 30 [cited 2025 Oct. 13];14(32S). Available from: https://www.jneonatalsurg.com/index.php/jns/article/view/8386