International Journal of Drug Delivery Technology
Volume 16, Issue 10s, 2026

Transformer-Based Spatio-Temporal Real-Time Human Activity Recognition Using Skeleton Data

1 Deepak S, 2 Dr. D. Anandan, 3 Kapilan P.C

1Department of Artificial Intelligence and Data Science, V.S.B. Engineering College, Karur. Email: deepak5441828@gmail.com

2Assistant Professor, Department of Artificial Intelligence and Data Science, V.S.B. Engineering College, Karur

3Department of Artificial Intelligence and Data Science, V.S.B. Engineering College, Karur. Email: kapilanchellamuthu@gmail.com


Abstract

Human activity recognition (HAR) from skeleton data is an active research area with applications in healthcare monitoring, human–computer interaction, sports analytics, and surveillance. Convolutional and recurrent neural networks have shown promising results, but they struggle to capture the long-range spatio-temporal dependencies that characterize human motion. In this paper, we propose TST-HAR, a hybrid framework that combines Graph Attention Networks (GAT) with Transformer encoders for skeleton-based HAR. GAT layers model spatial dependencies among body joints by adaptively learning anatomical and semantic relationships within each skeletal frame, while a multi-head Transformer encoder captures long-range temporal dynamics across the frame sequence. We further introduce a multi-scale temporal attention mechanism that handles activities of varying duration by aggregating temporal features at multiple granularities. Experiments on two large-scale benchmarks show that TST-HAR achieves state-of-the-art performance: 92.7% and 96.3% accuracy on NTU RGB+D 60 under the cross-subject and cross-view protocols, respectively, and 88.4% and 89.6% on NTU RGB+D 120 under the cross-subject and cross-setup protocols, respectively. These results indicate that modeling spatial and temporal dependencies within a single attention-based framework improves skeleton-based human activity recognition.
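The data flow described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: all function names (`spatial_attention`, `temporal_attention`, `pool_frames`, `encode_clip`), the toy three-joint skeleton, and the window sizes are hypothetical, and plain scaled dot-product scores stand in for the learned projections, multiple heads, and classifier that a trained GAT and Transformer encoder would use. The sketch only shows the order of operations: attention among joints within each frame, then attention across frames at several temporal granularities.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_vector(vectors):
    """Channel-wise mean of a list of equal-length vectors."""
    d = len(vectors[0])
    return [sum(v[c] for v in vectors) / len(vectors) for c in range(d)]

def spatial_attention(frame, neighbors):
    """Graph-attention-style update for one frame (assumption: no learned weights).

    frame: list of V joint feature vectors.
    neighbors: dict mapping each joint index to the joints it may attend to.
    Each joint becomes an attention-weighted average of its neighbors.
    """
    d = len(frame[0])
    updated = []
    for i, feat in enumerate(frame):
        nbrs = neighbors[i]
        weights = softmax([dot(feat, frame[j]) / math.sqrt(d) for j in nbrs])
        updated.append([sum(w * frame[j][c] for w, j in zip(weights, nbrs))
                        for c in range(d)])
    return updated

def temporal_attention(seq):
    """Self-attention across time: every frame token attends to all tokens."""
    d = len(seq[0])
    out = []
    for query in seq:
        weights = softmax([dot(query, key) / math.sqrt(d) for key in seq])
        out.append([sum(w * v[c] for w, v in zip(weights, seq)) for c in range(d)])
    return out

def pool_frames(seq, window):
    """Average consecutive frames in non-overlapping windows (coarser time scale)."""
    return [mean_vector(seq[s:s + window]) for s in range(0, len(seq), window)]

def encode_clip(clip, neighbors, windows=(1, 2, 4)):
    """clip: T frames x V joints x C channels -> one clip-level feature vector."""
    # Spatial stage: attend among joints in each frame, then average over joints.
    frame_tokens = [mean_vector(spatial_attention(f, neighbors)) for f in clip]
    # Temporal stage: attend at each granularity and concatenate the summaries.
    features = []
    for w in windows:
        features.extend(mean_vector(temporal_attention(pool_frames(frame_tokens, w))))
    return features

# Toy input (hypothetical): 8 frames, a 3-joint chain, 2 channels per joint.
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
clip = [[[math.sin(0.3 * t + v), math.cos(0.3 * t + v)] for v in range(3)]
        for t in range(8)]
feature = encode_clip(clip, neighbors)
print(len(feature))  # 3 temporal scales x 2 channels = 6
```

In a trained model the concatenated multi-scale feature would feed a classification head; here it is left as a raw vector so that the spatial-then-temporal ordering stays visible.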

Keywords: Human Activity Recognition, Skeleton Data, Transformer, Graph Attention Networks, Spatio-Temporal Modeling, NTU RGB+D

How to cite this article: Deepak S, Anandan D, Kapilan PC. Transformer-Based Spatio-Temporal Real-Time Human Activity Recognition Using Skeleton Data. Int J Drug Deliv Technol. 2026;16(10s): 842-853; DOI: 10.25258/ijddt.16.10s.99

Source of support: Nil.

Conflict of interest: None.