International Journal of Drug Delivery Technology
Volume 16, Issue 12s, 2026

Longitudinal Multi-Modal Self-Supervised Transformer Framework For Early Prediction And Progression Modeling Of Alzheimer's Disease

Swati K. Mohod1, Rajesh D. Thakare2

1Research Scholar, Department of Electronics Engineering, Yeshwantrao Chavan College of Engineering, Nagpur, India. Email: swatimohod6882@gmail.com

2Department of Electronics Engineering, Yeshwantrao Chavan College of Engineering, Nagpur, India. Email: rdt2909@gmail.com


ABSTRACT

Background: Early prediction of Alzheimer's disease (AD) progression remains a pressing clinical challenge owing to the heterogeneity of disease presentation, the scarcity of labeled longitudinal data, and the insufficient integration of multi-modal biomarkers. Existing cross-sectional imaging models fail to capture the temporal dynamics of neurodegeneration and multi-omics interactions.

Objective: This paper proposes LMST-ADNet, a Longitudinal Multi-Modal Self-Supervised Transformer architecture for predicting the onset and progression of AD. The aim is to integrate structural MRI, FDG/amyloid PET, resting-state fMRI connectivity, cognitive assessments (MMSE, CDR), and genetic markers (APOE ε4) into a single deep learning model that reduces dependence on annotation and enables individual-level risk stratification.

Methodology: The proposed framework employs a 3D Swin Transformer for structural MRI, a 3D vision backbone for PET metabolic patterns, and a Graph Neural Network for fMRI connectivity modeling. Clinical and genetic metadata are encoded with a TabTransformer-based encoder. Self-supervised pretraining combines masked autoencoding, cross-modal contrastive learning, and temporal consistency regularization to improve representation robustness. A time-aware attention Transformer then resolves longitudinal changes and estimates the probability of MCI-to-AD conversion, disease stage, cognitive decline, and time-to-event risk.
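To illustrate the time-aware attention idea described above, the following minimal NumPy sketch scores attention between longitudinal visits and penalizes attention logits by the inter-visit time gap before softmax normalization. All names (`time_aware_attention`, `conversion_probability`), the linear exponential-decay penalty, the random projection weights, and the mean-pooled logistic head are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def time_aware_attention(visits, times, d_k=16, decay=0.1, seed=0):
    """Self-attention over per-visit embeddings with a time-gap penalty.

    visits: (n_visits, d) fused multi-modal embeddings, one row per visit
    times:  (n_visits,) visit times (e.g. months since baseline)
    """
    rng = np.random.default_rng(seed)
    d = visits.shape[1]
    # Random projections stand in for learned Q/K/V weights.
    Wq = rng.normal(scale=d ** -0.5, size=(d, d_k))
    Wk = rng.normal(scale=d ** -0.5, size=(d, d_k))
    Wv = rng.normal(scale=d ** -0.5, size=(d, d_k))
    Q, K, V = visits @ Wq, visits @ Wk, visits @ Wv
    logits = (Q @ K.T) / np.sqrt(d_k)
    # Down-weight attention between temporally distant visits.
    gaps = np.abs(times[:, None] - times[None, :])
    attn = softmax(logits - decay * gaps, axis=-1)
    return attn @ V, attn


def conversion_probability(fused, w=None):
    """Toy logistic head: mean-pool visit features, map to a probability."""
    pooled = fused.mean(axis=0)
    if w is None:
        w = np.ones_like(pooled) / pooled.size
    return 1.0 / (1.0 + np.exp(-(w @ pooled)))


# Example: four visits at months 0, 6, 12, 24
rng = np.random.default_rng(1)
visits = rng.normal(size=(4, 32))          # fused per-visit embeddings
times = np.array([0.0, 6.0, 12.0, 24.0])   # months since baseline
fused, attn = time_aware_attention(visits, times)
p = conversion_probability(fused)
```

Each row of `attn` is a probability distribution over visits, so nearby visits dominate the aggregation while distant ones are attenuated by the gap penalty; the scalar `p` mimics the MCI-to-AD conversion probability output.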

Results: Longitudinal experiments under 5-fold cross-validation show that the proposed framework outperforms imaging-only and cross-sectional baselines, achieving 96.8% classification accuracy, 0.982 AUC, a 0.91 C-index for survival prediction, and an 18.6% improvement in early MCI-to-AD conversion prediction. Self-supervised pretraining improves feature stability by 12.4% and reduces labeled-data requirements by about 30%.

Conclusion: The results demonstrate the effectiveness of longitudinal multi-modal fusion and self-supervised learning for clinically applicable, patient-specific modeling of Alzheimer's disease progression.

Keywords: Alzheimer's disease; Longitudinal modeling; Multi-modal deep learning; Self-supervised learning; Transformer networks; Disease progression prediction

How to cite this article: Mohod SK, Thakare RD. Longitudinal Multi-Modal Self-Supervised Transformer Framework for Early Prediction and Progression Modeling of Alzheimer's Disease. Int J Drug Deliv Technol. 2026;16(12s): 627-641. DOI: 10.25258/ijddt.16.12s.75

Source of support: Nil.

Conflict of interest: None.