International Journal of Drug Delivery Technology
Volume 16, Issue 12s, 2026

Longitudinal Multi-Modal Self-Supervised Transformer Framework For Early Prediction And Progression Modeling Of Alzheimer's Disease

Swati K. Mohod1, Rajesh D. Thakare2

1Research Scholar, Department of Electronics Engineering, Yeshwantrao Chavan College of Engineering, Nagpur, India. Email: swatimohod6882@gmail.com

2Department of Electronics Engineering, Yeshwantrao Chavan College of Engineering, Nagpur, India. Email: rdt2909@gmail.com


ABSTRACT

Background: Early prediction of Alzheimer's disease (AD) progression remains a pressing clinical challenge owing to the heterogeneity of disease presentation, the scarcity of labeled longitudinal data, and the insufficient integration of multi-modal biomarkers. Existing cross-sectional imaging models fail to capture the temporal dynamics of neurodegeneration and multi-omics interactions.

Objective: This paper proposes LMST-ADNet, a Longitudinal Multi-Modal Self-Supervised Transformer architecture for predicting the onset and progression of AD. The aim is to integrate structural MRI, FDG/amyloid PET, resting-state fMRI connectivity, cognitive assessments (MMSE, CDR), and genetic markers (APOE ε4) into a single deep learning model that reduces dependence on annotation and enables individual-level risk stratification.

Methodology: The proposed framework employs a 3D Swin Transformer for structural MRI, a 3D vision backbone for PET metabolic patterns, and a Graph Neural Network for fMRI connectivity modeling. Clinical and genetic metadata are encoded with a TabTransformer-based encoder. Self-supervised pretraining combines masked autoencoding, cross-modal contrastive learning, and temporal consistency regularization to improve representation robustness. A time-aware attention Transformer then resolves longitudinal changes and estimates the probability of MCI-to-AD conversion, disease stage, cognitive decline, and time-to-event risk.
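To illustrate the time-aware attention idea described above, the following minimal NumPy sketch scores attention between longitudinal visits and penalizes attention logits by the inter-visit time gap before softmax normalization. All names (`time_aware_attention`, `conversion_probability`), the linear exponential-decay penalty, the random projection weights, and the mean-pooled logistic head are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def time_aware_attention(visits, times, d_k=16, decay=0.1, seed=0):
    """Self-attention over per-visit embeddings with a time-gap penalty.

    visits: (n_visits, d) fused multi-modal embeddings, one row per visit
    times:  (n_visits,) visit times (e.g. months since baseline)
    """
    rng = np.random.default_rng(seed)
    d = visits.shape[1]
    # Random projections stand in for learned Q/K/V weights.
    Wq = rng.normal(scale=d ** -0.5, size=(d, d_k))
    Wk = rng.normal(scale=d ** -0.5, size=(d, d_k))
    Wv = rng.normal(scale=d ** -0.5, size=(d, d_k))
    Q, K, V = visits @ Wq, visits @ Wk, visits @ Wv
    logits = (Q @ K.T) / np.sqrt(d_k)
    # Down-weight attention between temporally distant visits.
    gaps = np.abs(times[:, None] - times[None, :])
    attn = softmax(logits - decay * gaps, axis=-1)
    return attn @ V, attn


def conversion_probability(fused, w=None):
    """Toy logistic head: mean-pool visit features, map to a probability."""
    pooled = fused.mean(axis=0)
    if w is None:
        w = np.ones_like(pooled) / pooled.size
    return 1.0 / (1.0 + np.exp(-(w @ pooled)))


# Example: four visits at months 0, 6, 12, 24
rng = np.random.default_rng(1)
visits = rng.normal(size=(4, 32))          # fused per-visit embeddings
times = np.array([0.0, 6.0, 12.0, 24.0])   # months since baseline
fused, attn = time_aware_attention(visits, times)
p = conversion_probability(fused)
```

Each row of `attn` is a probability distribution over visits, so nearby visits dominate the aggregation while distant ones are attenuated by the gap penalty; the scalar `p` mimics the MCI-to-AD conversion probability output.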

Results: Longitudinal experiments under 5-fold cross-validation show that the proposed framework outperforms imaging-only and cross-sectional baselines, achieving 96.8% classification accuracy, 0.982 AUC, a 0.91 C-index for survival prediction, and an 18.6% improvement in early MCI-to-AD conversion prediction. Self-supervised pretraining improves feature stability by 12.4% and reduces labeled-data requirements by about 30%.

Conclusion: The results demonstrate the effectiveness of longitudinal multi-modal fusion and self-supervised learning for clinically applicable, patient-specific modeling of Alzheimer's disease progression.

Keywords: Alzheimer's disease; Longitudinal modeling; Multi-modal deep learning; Self-supervised learning; Transformer networks; Disease progression prediction

How to cite this article: Mohod SK, Thakare RD. Longitudinal Multi-Modal Self-Supervised Transformer Framework for Early Prediction and Progression Modeling of Alzheimer's Disease. Int J Drug Deliv Technol. 2026;16(12s): 627-641. DOI: 10.25258/ijddt.16.12s.75

Source of support: Nil.

Conflict of interest: None.