Deep Learning Approaches in Video Compression and Transmission Efficiency
Keywords: Dataset, Feature Enhancement, Hyper-Prior, Light Field Video Compression

Abstract
In recent years, video files have grown substantially in size, posing challenges for storage and transmission. One effective solution is to reduce the amount of data needed to represent a video. In this study, we introduce Deep Bi VC, a dual-branch framework designed to improve video compression through two complementary compression strategies. Video sequences are first pre-processed by segmenting them into groups of five consecutive frames to enable efficient processing. In the first branch, an Invertible Neural Network (INN) implements an image compression module that compresses the first and last frame of each group. In the second branch, a video compression module uses motion prediction to interpolate the intermediate frames between these key frames. Experimental evaluations using the PSNR and MS-SSIM metrics show that Deep Bi VC outperforms several state-of-the-art methods: on the VUG dataset, the model achieves a PSNR of 37.16 and an MS-SSIM of 0.98 at 3.2 bits per pixel, indicating superior compression performance.
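The pre-processing and evaluation steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that consecutive five-frame groups share a boundary key frame (the abstract only states that sequences are segmented into groups of five consecutive frames), and the function names are our own. The INN-based key-frame codec and the motion-prediction interpolator are left as placeholders.

```python
import math

def group_frames(frames, group_size=5):
    """Split a frame sequence into groups of `group_size` consecutive
    frames. Assumption (not stated in the abstract): adjacent groups
    share one boundary frame, so the last key frame of one group is
    the first key frame of the next."""
    step = group_size - 1  # overlap of one frame between groups
    return [frames[i:i + group_size]
            for i in range(0, len(frames) - group_size + 1, step)]

def psnr(reference, reconstruction, peak=255.0):
    """Peak signal-to-noise ratio (dB) between two frames given as
    flat lists of pixel intensities in [0, peak]."""
    mse = sum((r - x) ** 2
              for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)
```

In such a pipeline, each group's first and last frames would go through the INN image-compression branch, while `group_frames` supplies the key-frame pairs between which the motion-prediction branch interpolates the three intermediate frames; `psnr` is the standard distortion metric reported alongside MS-SSIM.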
License

This work is licensed under a Creative Commons Attribution 4.0 International License.