In recent years, many Vision Transformer (ViT)-based methods have become popular in the field of Human Pose Estimation (HPE) and have achieved excellent results. However, Convolutional Neural ...