Text-to-Image (TTI) generation has advanced considerably in recent years. However, TTI generation approaches still face critical challenges in ensuring semantic consistency, high visual fidelity, efficient data utilization and fast generation speed. Specifically, semantic drift between textual descriptions and generated images, limited robustness under data imbalance and high computational cost during inference remain unresolved. To address these challenges, this research proposes a Semantic-Aware Latent Diffusion framework with Text-Guided Data Augmentation (SALD-TGA) for enhanced TTI generation. The proposed SALD-TGA framework introduces a semantic-aware text encoding strategy that disentangles attribute-, object- and context-level information, which allows precise text-image alignment. To improve data efficiency and semantic diversity, a novel text-guided latent data augmentation mechanism is incorporated, which performs attribute-consistent perturbations in the latent space. Furthermore, an accelerated latent diffusion generator is designed to significantly decrease inference time while supporting high-resolution, structurally coherent image synthesis. The results demonstrate the robustness of integrating semantic-aware representation learning, latent-space augmentation and fast diffusion sampling for next-generation TTI synthesis systems. Compared with the Context-Aware Generative Adversarial Network (CA-GAN), the proposed SALD-TGA framework achieves a Fréchet Inception Distance (FID) of 9.36 and an Inception Score (IS) of 5.48 ± 0.06 on the CUB-200-2011 dataset.
Keywords: Data augmentation, latent diffusion, semantic-aware learning, text-guided generation, text-to-image synthesis.
How to cite this article: Bankar SA, Ket S. Semantic-Aware Latent Diffusion With Text-Guided Data Augmentation For Text-To-Image Generation. Int J Drug Deliv Technol. 2026;16(2s):288-299. DOI: 10.25258/ijddt.16.288-299