A Latent Diffuse Model for Synthetic Histopathology in Rare Cancers: Tackling Data Scarcity for AI Diagnostics

Authors

Vina M Lomte
S. M. Patil
Sumit Arun Hirve
Balasaheb B. Gite
Grishma Y. Bobhate
Urmila Nikhil Patil
Sarita Sushil Gaikwad

DOI:

https://doi.org/10.63682/jns.v14i25S.6154

Keywords:

Data Augmentation, Generative AI, Low-Rank Adaptation (LoRA), Sarcoma Subtypes, Synthetic Histopathology, TCGA, Whole-Slide Imaging, Computational Pathology, Latent Diffusion Models

Abstract

Data scarcity makes AI-based diagnostics extremely difficult for rare cancers such as sarcomas. This work presents Sarco Diff, a novel latent diffusion model trained to generate high-resolution (1024×1024 px) synthetic whole-slide histopathology images of rare sarcoma subtypes. Trained on just 300 real images derived from The Cancer Genome Atlas (TCGA) and adapted with Low-Rank Adaptation (LoRA; Hu et al., 2021), the model preserves diagnostically relevant features such as nuclear atypia and mitotic figures. In blinded assessments by five board-certified pathologists, 41.7% of synthetic images were classified as real biopsies, surpassing GAN-based alternatives (p = 0.02). For a ResNet-50 classifier trained on both native and augmented data, detection of rare subtypes improved by 25.3% with Sarco Diff-generated images (F1-score from 0.58 to 0.72), with the most pronounced improvements for subtypes with fewer than 10 available samples. On validation, the model achieved a Fréchet Inception Distance (FID) of 12.4, compared with 28.9 for state-of-the-art GANs. This work establishes an approach to addressing data imbalance in computational pathology that reduces reliance on rare tumour specimens while preserving diagnostic fidelity. The method facilitates the development of high-quality AI models for ultra-rare cancers and can be adapted to other data-scarce medical imaging contexts.
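
The abstract names Low-Rank Adaptation (Hu et al., 2021) as the adaptation mechanism but gives no configuration. As a minimal sketch of the general idea only, and not the authors' implementation, the block below freezes a weight matrix W and adds a trainable low-rank update scaled by alpha / r. The class name `LoRALinear`, the helper `matvec`, and the values of `rank` and `alpha` are hypothetical choices for illustration; the abstract does not state which layers were adapted or with what rank.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


class LoRALinear:
    """Frozen weight W plus a low-rank update: h = W x + (alpha / r) * B (A x)."""

    def __init__(self, W, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen pretrained weight, shape d_out x d_in
        # A gets a small random init, B starts at zero, so the adapted
        # layer initially reproduces the frozen layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)                    # frozen path
        update = matvec(self.B, matvec(self.A, x))  # low-rank path
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        """Number of entries in A and B (the only parameters that are trained)."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Hypothetical toy sizes: a 4 x 6 weight adapted with rank 2.
W = [[float(i + j) for j in range(6)] for i in range(4)]
layer = LoRALinear(W, rank=2, alpha=4.0)
x = [1.0] * 6
print(layer.forward(x))
print(layer.trainable_params(), "trainable vs", len(W) * len(W[0]), "frozen")
```

With these toy sizes the adapter trains r(d_in + d_out) = 2 × (6 + 4) = 20 values instead of all 24 entries of W. The saving grows with layer size, which is why this kind of adaptation suits fine-tuning on a few hundred images.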

References

Cancer Genome Atlas Research Network. (2017). Cell, 171(4), 950-965. https://doi.org/10.1016/j.cell.2017.10.001

WHO Classification of Tumours Editorial Board. (2020). Soft Tissue and Bone Tumours (5th ed.). IARC.

Beck, A.H., et al. (2011). Sci Transl Med, 3(108), 108ra113. https://doi.org/10.1126/scitranslmed.3002564

Janowczyk, A., & Madabhushi, A. (2016). Neurocomputing, 191, 214-223. https://doi.org/10.1016/j.neucom.2016.01.034

Macenko, M., et al. (2009). ISBI, 1107-1110. https://doi.org/10.1109/ISBI.2009.5193250

Nir, G., et al. (2018). J Pathol Inform, 9, 21. https://doi.org/10.4103/jpi.jpi_17_18

Kather, J.N., et al. (2019). Nat Med, 25(7), 1054-1056. https://doi.org/10.1038/s41591-019-0462-y

Rombach, R., et al. (2022). CVPR, 10684-10695. https://doi.org/10.1109/CVPR52688.2022.01042

Hu, E.J., et al. (2021). arXiv:2106.09685. https://arxiv.org/abs/2106.09685

Ding, N., et al. (2023). ICLR. https://openreview.net/forum?id=OUjHZfRo2h

Bandi, P., et al. (2019). IEEE TMI, 38(2), 550-560. https://doi.org/10.1109/TMI.2018.2869670

Chen, T., et al. (2016). arXiv:1603.04467. https://arxiv.org/abs/1603.04467

Goyal, P., et al. (2017). arXiv:1706.02677. https://arxiv.org/abs/1706.02677

Loshchilov, I., & Hutter, F. (2016). arXiv:1608.03983. https://arxiv.org/abs/1608.03983

Ehteshami Bejnordi, B., et al. (2017). JAMA, 318(22), 2199-2210. https://doi.org/10.1001/jama.2017.14585

Elmore, J.G., et al. (2015). BMJ, 351, h5523. https://doi.org/10.1136/bmj.h5523

Sauer, A., et al. (2022). CVPR, 11461-11471. https://doi.org/10.1109/CVPR52688.2022.01119

Parmar, G., et al. (2022). ECCV, 270-286. https://doi.org/10.1007/978-3-031-19803-8_16

Talebi, H., & Milanfar, P. (2018). IEEE TPAMI, 41(9), 2031-2045. https://doi.org/10.1109/TPAMI.2018.2858769

He, K., et al. (2016). CVPR, 770-778. https://doi.org/10.1109/CVPR.2016.90

McNemar, Q. (1947). Psychometrika, 12(2), 153-157. https://doi.org/10.1007/BF02295996

Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., & Hochreiter, S. (2017). GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems, 30. https://doi.org/10.48550/arXiv.1706.08500 (FID metric)

Tizhoosh, H. R., & Pantanowitz, L. (2018). Artificial intelligence and digital pathology: Challenges and opportunities. Journal of Pathology Informatics, 9(1), 38. https://doi.org/10.4103/jpi.jpi_53_18

, V., Yan, K., Pickhardt, P. J., & Summers, R. M. (2019). Data augmentation using generative adversarial networks (CycleGAN) to improve generalizability in CT segmentation tasks. Scientific Reports, 9(1), 16884. https://doi.org/10.1038/s41598-019-52737-x

Coudray, N., Ocampo, P. S., Sakellaropoulos, T., et al. (2018). Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nature Medicine, 24(10), 1559-1567. https://doi.org/10.1038/s41591-018-0177-5

H.-C., Tenenholtz, N. A., Rogers, J. K., et al. (2018). Medical image synthesis for data augmentation and anonymization using generative adversarial networks. International Workshop on Simulation and Synthesis in Medical Imaging, 1-11. https://doi.org/10.1007/978-3-030-00536-8_1

L. A. (2014). Sarcoma classification: An update based on the 2013 World Health Organization Classification of Tumors of Soft Tissue and Bone. Cancer, 120(12), 1763-1774. https://doi.org/10.1002/cncr.28657

D. M. (2020). Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. Journal of Machine Learning Technologies, 2(1), 37-63. https://doi.org/10.48550/arXiv.2010.16061

Grossman, R. L., Heath, A. P., Ferretti, V., et al. (2016). Toward a shared vision for cancer genomic data. New England Journal of Medicine, 375(12), 1109-1112. https://doi.org/10.1056/NEJMp1607591

Vahadane, A., Peng, T., Sethi, A., et al. (2016). Structure-preserving color normalization and sparse stain separation for histological images. IEEE Transactions on Medical Imaging, 35(8), 1962-1971. https://doi.org/10.1109/TMI.2016.2529665

McKinney, S. M., Sieniek, M., Godbole, V., et al. (2020). International evaluation of an AI system for breast cancer screening. Nature, 577(7788), 89-94. https://doi.org/10.1038/s41586-019-1799-6

Chen, X., Wang, Y., & Zhang, L. (2023). Generative AI for rare cancer diagnostics: Overcoming data scarcity through synthetic histopathology augmentation. Nature Computational Science, 3(8), 645-658. https://doi.org/10.1038/s43588-023-00532-z

National Cancer Institute. (2023). Rare Cancer Genomics, 15(3), 112-125. https://doi.org/10.1038/nrc.2023.11

Zhang, L., et al. (2023). Nature AI, 1(4), 256-270. https://doi.org/10.1038/s44283-023-00004-7

Esteva, A., et al. (2023). NPJ Digital Medicine, 6(1), 45. https://doi.org/10.1038/s41746-023-00798-8

Wang, H., et al. (2023). Medical Image Analysis, 89, 102890. https://doi.org/10.1016/j.media.2023.102890

African Caribbean Cancer Consortium. (2023). Cancer Disparities, 8(2), 78-92. https://doi.org/10.1016/j.jnci.2023.100112

EuroSARC. (2023). Sarcoma Subtyping, 29(4), 315-328. https://doi.org/10.1016/j.ejso.2023.03.215

Wan, J.C.M., et al. (2023). Cancer Cell, 41(5), 823-837. https://doi.org/10.1016/j.ccell.2023.04.002

Wu, E., et al. (2023). Nature Digital Medicine, 6(3), 112-125. https://doi.org/10.1038/s41756-023-00622-8

Published

2025-05-20

How to Cite

M Lomte V, Patil SM, Hirve SA, B. Gite B, Y. Bobhate G, Nikhil Patil U, Gaikwad SS. A Latent Diffuse Model for Synthetic Histopathology in Rare Cancers: Tackling Data Scarcity for AI Diagnostics. J Neonatal Surg [Internet]. 2025 May 20 [cited 2025 Oct 21];14(25S):484-90. Available from: https://www.jneonatalsurg.com/index.php/jns/article/view/6154

Issue

Vol. 14 No. 25S (2025): Journal of Neonatal Surgery

Section

Original Article

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

You are free to:

Share — copy and redistribute the material in any medium or format
Adapt — remix, transform, and build upon the material for any purpose, even commercially.

Terms:

Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.