A Latent Diffuse Model for Synthetic Histopathology in Rare Cancers: Tackling Data Scarcity for AI Diagnostics

Authors

  • Vina M Lomte
  • S. M. Patil
  • Sumit Arun Hirve
  • Balasaheb B. Gite
  • Grishma Y. Bobhate
  • Urmila Nikhil Patil
  • Sarita Sushil Gaikwad

DOI:

https://doi.org/10.63682/jns.v14i25S.6154

Keywords:

Data Augmentation, Generative AI, Low-Rank Adaptation (LoRA), Sarcoma Subtypes, Synthetic Histopathology, TCGA, Whole-Slide Imaging, Computational Pathology, Latent Diffusion Models

Abstract

Data scarcity makes AI-based diagnostics for rare cancers such as sarcomas extremely difficult. This work presents Sarco Diff, a novel latent diffusion model trained to generate high-resolution (1024×1024 px) synthetic whole-slide histopathology images of rare sarcoma subtypes. Trained on just 300 real images derived from The Cancer Genome Atlas (TCGA) and adapted with Low-Rank Adaptation (LoRA; Hu et al., 2021), the model preserves diagnostically relevant features such as nuclear atypia and mitotic figures. In blinded assessments by five board-certified pathologists, 41.7% of synthetic images were classified as real biopsies, surpassing GAN-based alternatives (p=0.02). For a ResNet-50 classifier trained on real data augmented with Sarco Diff-generated images, detection of rare subtypes improved by 25.3% (F1-score from 0.58 to 0.72), with the most pronounced gains for subtypes with fewer than 10 available samples. On validation, the model achieved a Fréchet Inception Distance (FID) of 12.4, compared with 28.9 for state-of-the-art GANs. This foundational work establishes a novel approach to addressing data imbalance in computational pathology by minimizing the reliance on rare tumour specimens while preserving diagnostic fidelity. Our method facilitates the development of high-quality AI models for ultra-rare cancers and can be adapted to other data-scarce medical imaging contexts.
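
The FID values quoted above compare the feature statistics of real and synthetic images. The following is a minimal sketch of how such a Fréchet distance can be computed, not the authors' evaluation code. It assumes image features have already been extracted (FID conventionally uses Inception-v3 activations; here random vectors stand in for them), and the function name `frechet_distance` and the toy arrays are hypothetical.

```python
import numpy as np

def frechet_distance(feats_real, feats_synth):
    """Frechet distance between Gaussians fitted to two feature sets.

    Each input is an (n_images, n_features) array. In a real FID
    evaluation the rows would be Inception-v3 activations (assumption:
    feature extraction happens elsewhere).
    """
    mu_r = feats_real.mean(axis=0)
    mu_s = feats_synth.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_s = np.cov(feats_synth, rowvar=False)
    # Tr((cov_r cov_s)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_s, which are real and non-negative for
    # covariance matrices; clip guards against tiny negative round-off.
    eigvals = np.linalg.eigvals(cov_r @ cov_s).real
    tr_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    diff = mu_r - mu_s
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_s) - 2.0 * tr_sqrt)

# Toy stand-in features (hypothetical data, not from the study).
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
shifted = real + 1.0  # same covariance, mean moved by 1 in each of 8 dimensions

same = frechet_distance(real, real)      # identical sets: distance near 0
moved = frechet_distance(real, shifted)  # pure mean shift: squared shift norm, near 8
```

Lower values indicate that the synthetic feature distribution sits closer to the real one, which is the sense in which 12.4 is better than 28.9.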

References

Cancer Genome Atlas Research Network. (2017). Cell, 171(4), 950-965. https://doi.org/10.1016/j.cell.2017.10.001

WHO Classification of Tumours Editorial Board. (2020). Soft Tissue and Bone Tumours (5th ed.). IARC.

Beck, A.H., et al. (2011). Sci Transl Med, 3(108), 108ra113. https://doi.org/10.1126/scitranslmed.3002564

Janowczyk, A., & Madabhushi, A. (2016). Neurocomputing, 191, 214-223. https://doi.org/10.1016/j.neucom.2016.01.034

Macenko, M., et al. (2009). ISBI, 1107-1110. https://doi.org/10.1109/ISBI.2009.5193250

Nir, G., et al. (2018). J Pathol Inform, 9, 21. https://doi.org/10.4103/jpi.jpi_17_18

Kather, J.N., et al. (2019). Nat Med, 25(7), 1054-1056. https://doi.org/10.1038/s41591-019-0462-y

Rombach, R., et al. (2022). CVPR, 10684-10695. https://doi.org/10.1109/CVPR52688.2022.01042

Hu, E.J., et al. (2021). arXiv:2106.09685. https://arxiv.org/abs/2106.09685

Ding, N., et al. (2023). ICLR. https://openreview.net/forum?id=OUjHZfRo2h

Bandi, P., et al. (2019). IEEE TMI, 38(2), 550-560. https://doi.org/10.1109/TMI.2018.2869670

Chen, T., et al. (2016). arXiv:1603.04467. https://arxiv.org/abs/1603.04467

Goyal, P., et al. (2017). arXiv:1706.02677. https://arxiv.org/abs/1706.02677

Loshchilov, I., & Hutter, F. (2016). arXiv:1608.03983. https://arxiv.org/abs/1608.03983

Ehteshami Bejnordi, B., et al. (2017). JAMA, 318(22), 2199-2210. https://doi.org/10.1001/jama.2017.14585

Elmore, J.G., et al. (2015). BMJ, 351, h5523. https://doi.org/10.1136/bmj.h5523

Sauer, A., et al. (2022). CVPR, 11461-11471. https://doi.org/10.1109/CVPR52688.2022.01119

Parmar, G., et al. (2022). ECCV, 270-286. https://doi.org/10.1007/978-3-031-19803-816

Talebi, H., & Milanfar, P. (2018). IEEE TPAMI, 41(9), 2031-2045. https://doi.org/10.1109/TPAMI.2018.2858769

He, K., et al. (2016). CVPR, 770-778. https://doi.org/10.1109/CVPR.2016.90

McNemar, Q. (1947). Psychometrika, 12(2), 153-157. https://doi.org/10.1007/BF02295996

Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., & Hochreiter, S. (2017). GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems, 30. https://doi.org/10.48550/arXiv.1706.08500 (FID metric)

Tizhoosh, H. R., & Pantanowitz, L. (2018). Artificial intelligence and digital pathology: Challenges and opportunities. Journal of Pathology Informatics, 9(1), 38. https://doi.org/10.4103/jpi.jpi_53_18

, V., Yan, K., Pickhardt, P. J., & Summers, R. M. (2019). Data augmentation using generative adversarial networks (CycleGAN) to improve generalizability in CT segmentation tasks. Scientific Reports, 9(1), 16884. https://doi.org/10.1038/s41598-019-52737-x

Coudray, N., Ocampo, P. S., Sakellaropoulos, T., et al. (2018). Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nature Medicine, 24(10), 1559-1567. https://doi.org/10.1038/s41591-018-0177-5

H.-C., Tenenholtz, N. A., Rogers, J. K., et al. (2018). Medical image synthesis for data augmentation and anonymization using generative adversarial networks. International Workshop on Simulation and Synthesis in Medical Imaging, 1-11 https://doi.org/10.1007/978-3-030-00536-8_1

L. A. (2014). Sarcoma classification: An update based on the 2013 World Health Organization Classification of Tumors of Soft Tissue and Bone. Cancer, 120(12), 1763-1774. https://doi.org/10.1002/cncr.28657

D. M. (2020). Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. Journal of Machine Learning Technologies, 2(1), 37-63. https://doi.org/10.48550/arXiv.2010.16061

Grossman, R. L., Heath, A. P., Ferretti, V., et al. (2016). Toward a shared vision for cancer genomic data. New England Journal of Medicine, 375(12), 1109-1112. https://doi.org/10.1056/NEJMp1607591

Vahadane, A., Peng, T., Sethi, A., et al. (2016). Structure-preserving color normalization and sparse stain separation for histological images. IEEE Transactions on Medical Imaging, 35(8), 1962-1971. https://doi.org/10.1109/TMI.2016.2529665

McKinney, S. M., Sieniek, M., Godbole, V., et al. (2020). International evaluation of an AI system for breast cancer screening. Nature, 577(7788), 89-94. https://doi.org/10.1038/s41586-019-1799-6

Chen, X., Wang, Y., & Zhang, L. (2023). Generative AI for rare cancer diagnostics: Overcoming data scarcity through synthetic histopathology augmentation. Nature Computational Science, 3(8), 645-658. https://doi.org/10.1038/s43588-023-00532-z

National Cancer Institute. (2023). Rare Cancer Genomics, 15(3), 112-125. https://doi.org/10.1038/nrc.2023.11

Zhang, L., et al. (2023). Nature AI, 1(4), 256-270. https://doi.org/10.1038/s44283-023-00004-7

Esteva, A., et al. (2023). NPJ Digital Medicine, 6(1), 45. https://doi.org/10.1038/s41746-023-00798-8

Wang, H., et al. (2023). Medical Image Analysis, 89, 102890. https://doi.org/10.1016/j.media.2023.102890

African Caribbean Cancer Consortium. (2023). Cancer Disparities, 8(2), 78-92. https://doi.org/10.1016/j.jnci.2023.100112

EuroSARC. (2023). Sarcoma Subtyping, 29(4), 315-328. https://doi.org/10.1016/j.ejso.2023.03.215

Wan, J.C.M., et al. (2023). Cancer Cell, 41(5), 823-837. https://doi.org/10.1016/j.ccell.2023.04.002

Wu, E., et al. (2023). Nature Digital Medicine, 6(3), 112-125. https://doi.org/10.1038/s41756-023-00622-8

Published

2025-05-20

How to Cite

M Lomte V, Patil SM, Hirve SA, B. Gite B, Y. Bobhate G, Nikhil Patil U, Gaikwad SS. A Latent Diffuse Model for Synthetic Histopathology in Rare Cancers: Tackling Data Scarcity for AI Diagnostics. J Neonatal Surg [Internet]. 2025 May 20 [cited 2025 Sep 25];14(25S):484-90. Available from: https://www.jneonatalsurg.com/index.php/jns/article/view/6154