Laplacian Operators for Scientific Computing: A Comparative Analysis of CPU and GPU Implementations

Authors

  • Aswini J
  • Marinto Richee J
  • M. G. Dinesh
  • A. Gayathri

Keywords:

Image Processing, GPU Acceleration, Performance Benchmarking

Abstract

This paper presents a benchmarking study of a 2D Laplacian filter implemented on both CPU and GPU architectures for image processing applications. The Laplacian filter is a fundamental tool in edge detection and feature extraction, and it plays a crucial role in many computer vision tasks.
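
As a rough illustration of the kind of filter the abstract describes, the sketch below applies a 5-point 2D Laplacian stencil to a small grayscale image on the CPU in plain Python. The kernel weights (neighbors +1, center -4), the zero-valued border, and the function name `laplacian_2d` are assumptions made for this example; the abstract does not state which kernel or boundary treatment the authors benchmarked.

```python
def laplacian_2d(image):
    """Apply a 5-point Laplacian stencil to a 2D list of numbers.

    Assumption: border pixels are left at 0, since the stencil
    needs all four neighbors and the border lacks some of them.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    out = [[0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Sum of the four direct neighbors minus four times the center.
            out[i][j] = (
                image[i - 1][j] + image[i + 1][j]
                + image[i][j - 1] + image[i][j + 1]
                - 4 * image[i][j]
            )
    return out


if __name__ == "__main__":
    # A flat image with a single bright pixel in the center.
    img = [
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 9, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
    ]
    for row in laplacian_2d(img):
        print(row)
```

Each output pixel depends only on input pixels, so the two loops carry no dependencies between iterations. This is what makes the filter a natural candidate for a GPU version that assigns one thread per pixel.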

Published

2025-05-29

How to Cite

1. J A, Richee J M, Dinesh MG, Gayathri A. Laplacian Operators for Scientific Computing: A Comparative Analysis of CPU and GPU Implementations. J Neonatal Surg [Internet]. 2025 May 29 [cited 2025 Oct 8];14(29S):75-84. Available from: https://www.jneonatalsurg.com/index.php/jns/article/view/6714