Laplacian Operators for Scientific Computing: A Comparative Analysis of CPU and GPU Implementations
Keywords:
Image Processing, GPU Acceleration, Performance Benchmarking
Abstract
This paper presents a comprehensive benchmarking study of a 2D Laplacian filter implemented on both CPU and GPU architectures for image processing applications. The Laplacian filter is a fundamental tool in edge detection and feature extraction and plays a crucial role in various computer vision tasks.
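As a point of reference for the filter described above, the following is a minimal CPU sketch of a 2D Laplacian in pure Python. It assumes the common 5-point stencil (the four direct neighbors minus four times the center) and leaves border pixels at zero; the abstract does not state which kernel or boundary handling the paper uses, so both choices, along with the function name `laplacian_2d`, are illustrative assumptions rather than the authors' implementation.

```python
def laplacian_2d(image):
    """Apply a 5-point Laplacian stencil to a 2D list of numbers.

    Assumptions (not taken from the paper): the kernel is
    [[0, 1, 0], [1, -4, 1], [0, 1, 0]] and border pixels stay 0.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    out = [[0] * cols for _ in range(rows)]
    # Visit interior pixels only, so every neighbor index is valid.
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = (
                image[i - 1][j] + image[i + 1][j]
                + image[i][j - 1] + image[i][j + 1]
                - 4 * image[i][j]
            )
    return out


# A vertical step edge: the response is nonzero only beside the edge.
step = [[0, 0, 1, 1] for _ in range(4)]
response = laplacian_2d(step)
print(response[1])  # [0, 1, -1, 0]
```

Each output pixel depends only on the input, so the two loops have no dependencies between iterations. That independence is what makes this kind of stencil a natural candidate for a GPU version that assigns one thread per pixel.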
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
You are free to:
- Share — copy and redistribute the material in any medium or format
- Adapt — remix, transform, and build upon the material for any purpose, even commercially.
Terms:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.