A pose-based action recognition machine learning framework for basketball analytics
Abstract
Human pose estimation and action recognition are two distinct computer vision tasks that share a significant amount of information. However, the challenges inherent to each have kept them from being tackled together more often. In this work, we present two machine learning frameworks that can be used together to jointly perform pose estimation and action recognition. For pose estimation, we employed stacked hourglass networks, which produce pose predictions for 16 body joints in the form of heatmaps. Evaluation with the Percentage of Correct Keypoints (PCK) metric yielded an overall accuracy of 79%. For action recognition, we propose distance spreads, a dimensionality reduction method for representing spatio-temporal data. From a sequence of 10 video frames of body poses, the distances of each of the 16 body joints from the body center were stacked together as row vectors. A straightforward 1D Convolutional Neural Network was then used for action recognition with distance spreads as inputs. The network was trained to classify three action classes: shoot, run, and defense. Evaluation of the network yielded an accuracy of 96%.
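To make the distance-spread representation concrete, the sketch below shows one way it could be built and consumed by a 1D CNN. The abstract does not specify the shape convention, the definition of the body center, or the network architecture, so the following is a minimal sketch under stated assumptions: a 16 × 10 matrix (one row per joint, one column per frame), a body center taken as the per-frame mean of all joints, and a small PyTorch 1D CNN. The names `distance_spread` and `DistanceSpreadCNN`, and every architectural detail, are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np
import torch
import torch.nn as nn

def distance_spread(pose_sequence):
    """
    Build a distance-spread matrix from a sequence of 2D poses.

    pose_sequence: array of shape (T, J, 2) -- T frames, J joints, (x, y).
    Returns an array of shape (J, T): one row per joint, one column per
    frame, each entry the Euclidean distance of that joint from the
    body center.
    """
    poses = np.asarray(pose_sequence, dtype=np.float32)
    # Assumption: the body center is the per-frame mean of all joints.
    centers = poses.mean(axis=1, keepdims=True)        # (T, 1, 2)
    dists = np.linalg.norm(poses - centers, axis=-1)   # (T, J)
    return dists.T                                     # (J, T)

class DistanceSpreadCNN(nn.Module):
    """Minimal 1D CNN over the temporal axis of a distance spread."""
    def __init__(self, n_joints=16, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_joints, 32, kernel_size=3, padding=1),  # joints as channels
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):              # x: (batch, n_joints, n_frames)
        h = self.features(x).squeeze(-1)
        return self.classifier(h)      # logits over {shoot, run, defense}

# Example: one 10-frame clip of 16 estimated joints.
clip = np.random.rand(10, 16, 2)                        # stand-in for pose-estimator output
spread = distance_spread(clip)                          # (16, 10)
model = DistanceSpreadCNN()
logits = model(torch.from_numpy(spread).unsqueeze(0))   # (1, 3)
```

Treating joints as channels and frames as the 1D spatial axis lets the convolutions capture temporal patterns in each joint's distance from the body center; other conventions (e.g., frames as channels) are equally plausible given only the abstract.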