1*Associate Professor, Department of Electronics and Communication Engineering, Sir M. Visvesvaraya Institute of Technology, Bengaluru-562157, India. Email: nataraja_ec@sirmvit.edu
2Professor, Department of Electronics and Communication Engineering, Sir M. Visvesvaraya Institute of Technology, Bengaluru-562157, India. Email: sugursg_ec@sirmvit.edu
Breast cancer computer-aided diagnosis systems are commonly developed as separate models for each imaging modality, such as mammography, ultrasound and magnetic resonance imaging. This modality-dependent design increases system complexity and restricts practical deployment in routine clinical environments. To address this limitation, a unified deep learning framework is developed for breast cancer prediction across multiple imaging modalities using a single convolutional neural network architecture. The proposed model employs a shared ResNet-50 backbone combined with a modality-conditional attention module that enables adaptive feature learning for different input types. Adaptive instance normalization and gated feature recalibration are incorporated to balance modality-invariant representations with modality-specific characteristics. The framework is trained and evaluated using three publicly available datasets: INbreast for mammography, BUSI for ultrasound and TCIA-BC for magnetic resonance imaging, comprising more than 2,300 annotated images in total. The unified network achieves an overall Area Under the Curve (AUC) of 0.973 with a 95% confidence interval of 0.966–0.979. This performance is comparable to, and in several cases exceeds, that of modality-specific models, which obtain AUC values in the range of 0.945–0.968. In addition, the proposed approach reduces model parameter redundancy by approximately 65% compared with maintaining independent networks for each modality. Ablation experiments demonstrate that the modality-conditional attention mechanism plays a critical role, yielding a sensitivity improvement of 4.7% for ultrasound images, which are particularly challenging due to low contrast and speckle noise. To support clinical interpretability, Grad-CAM++ visualizations are generated for each modality. These maps indicate that the network focuses on diagnostically meaningful regions, including spiculated lesion margins in mammograms and posterior acoustic shadowing in ultrasound images. The results indicate that a single adaptive network can provide reliable and interpretable breast cancer predictions across heterogeneous imaging modalities, thereby reducing system complexity and supporting scalable deployment in clinical radiology workflows.
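As an illustration only, the interplay of adaptive instance normalization and gated feature recalibration described above can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the trained model: the function names, the two-channel toy feature map, the parameter values in MODALITY_PARAMS and the specific gating rule (a sigmoid of the channel average that blends modality-specific and shared features) are all hypothetical and are not taken from the reported implementation.

```python
import math

def instance_norm(channel, eps=1e-5):
    """Normalize one channel's spatial activations to zero mean, unit variance."""
    mean = sum(channel) / len(channel)
    var = sum((v - mean) ** 2 for v in channel) / len(channel)
    return [(v - mean) / math.sqrt(var + eps) for v in channel]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical per-modality parameters for a 2-channel toy feature map.
# In a real network these would be learned; the values here are made up.
MODALITY_PARAMS = {
    "mammography": {"gamma": [1.0, 0.5], "beta": [0.0, 0.1], "gate_w": [0.0, 1.0]},
    "ultrasound":  {"gamma": [1.5, 0.8], "beta": [0.2, 0.0], "gate_w": [0.5, -1.0]},
    "mri":         {"gamma": [0.7, 1.2], "beta": [-0.1, 0.3], "gate_w": [1.0, 0.5]},
}

def modality_conditional_block(features, modality):
    """Apply an assumed AdaIN-plus-gate step to shared backbone features.

    features: list of channels, each a flat list of spatial activations.
    modality: key into MODALITY_PARAMS.
    """
    params = MODALITY_PARAMS[modality]
    out = []
    for c, channel in enumerate(features):
        # AdaIN-style step: normalize, then apply modality-specific scale and shift.
        normed = instance_norm(channel)
        styled = [params["gamma"][c] * v + params["beta"][c] for v in normed]
        # Gate: squeeze the channel to its global average, map it into (0, 1).
        squeeze = sum(channel) / len(channel)
        gate = sigmoid(params["gate_w"][c] * squeeze)
        # Blend modality-specific (styled) and shared (unchanged) features.
        out.append([gate * s + (1.0 - gate) * v for s, v in zip(styled, channel)])
    return out

# Toy shared features: 2 channels, 4 spatial positions each.
shared = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 2.0, 2.0]]
out_mammography = modality_conditional_block(shared, "mammography")
out_ultrasound = modality_conditional_block(shared, "ultrasound")
```

In this sketch the same shared features yield different outputs for each modality key, while a gate near zero leaves the shared features almost unchanged; this is one plausible way to balance modality-invariant and modality-specific information.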
Keywords: Breast Cancer, Mammography, Grad-CAM++, Modalities, Medical Imaging.
How to cite this article: Nataraja R, Sundaraguru R. A Unified Modality-Agnostic CNN Framework with Attention Mechanisms for Breast Cancer Detection across Multiple Imaging Modalities. Int J Drug Deliv Technol. 2026;16(3): 241-257. DOI: 10.25258/ijddt.16.3.30
Source of support: Nil.
Conflict of interest: None.