Abstract: This study addresses the prevailing reliance on visible light for action recognition. Exploiting infrared imagery's insensitivity to changes in lighting conditions, a lightweight algorithm for action recognition in thermal infrared videos is proposed. The contributions of this research include the use of YOLOv7-tiny for target detection, Alphapose for pose estimation and dynamic skeleton modeling, and a graph convolutional network (GCN) for the extraction of spatiotemporal features. Rather than treating the entire video as a single entity, the algorithm extracts individual actions at different time intervals, which significantly enhances robustness. To further improve recognition accuracy, a two-stream shifted graph convolutional network (2s-ShiftGCN) is introduced. Experimentally, 2s-ShiftGCN achieves a Top-1 accuracy of 88.06% and a Top-5 accuracy of 98.28% on the InfAR-skeleton dataset; on the filtered Kinetics-skeleton dataset, it achieves a Top-1 accuracy of 55.26% and a Top-5 accuracy of 83.98%.
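The shift operation underlying Shift-GCN replaces the learned adjacency of a standard graph convolution with a channel-wise circular shift of features across the skeleton's joints, after which a pointwise (1x1) convolution mixes channels. The following is a minimal illustrative sketch (plain Python, with hypothetical shapes and names not taken from the paper): for a frame's feature map of V joints by C channels, channel c of joint v takes its value from joint (v + c) mod V, so every joint receives information from every other joint at zero parameter cost.

```python
def spatial_shift(x):
    """Non-local spatial shift (sketch): x is a list of V joints,
    each a list of C channel values. Channel c of output joint v
    is taken from input joint (v + c) mod V, circularly mixing
    features across all joints before a pointwise convolution."""
    V, C = len(x), len(x[0])
    return [[x[(v + c) % V][c] for c in range(C)] for v in range(V)]


# Tiny worked example: 3 joints, 2 channels.
features = [[0, 1],
            [2, 3],
            [4, 5]]
shifted = spatial_shift(features)
# Channel 0 is shifted by 0 joints (unchanged); channel 1 by 1 joint.
```

In the full model, this shift would be applied per frame inside each block, with an analogous temporal shift along the frame axis; the "two-stream" (2s) design runs the same network on joint coordinates and on bone vectors and fuses the two score streams.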