This repository is part of a larger project on human activity and emotion recognition in a multi-view system. It contains the classifier for automatic annotation. The algorithm combines a CNN and an SVM, and the annotation is based on the head poses of the individuals.
Before the annotation, a person is detected and their face is localised with a Kinect v2 sensor. The poses are then annotated based on the three head angles: yaw, pitch and roll. The pose images are stored in the database together with the corresponding angles. Finally, the hybrid CNN+SVM network classifies the head pose images.
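As a rough illustration of the angle-based annotation step, the sketch below maps yaw/pitch/roll readings to a coarse pose label. The function name and the 15-degree threshold are illustrative assumptions, not the values used in the project.

```python
# Illustrative sketch of angle-based pose annotation.
# The 15-degree threshold and the label set are assumptions
# for demonstration, not the project's actual binning.

def annotate_pose(yaw, pitch, roll, threshold=15.0):
    """Map head angles (in degrees) to a coarse pose label."""
    if abs(yaw) <= threshold and abs(pitch) <= threshold:
        return "frontal"
    # Whichever rotation dominates decides the label.
    if abs(yaw) > abs(pitch):
        return "left" if yaw < 0 else "right"
    return "down" if pitch < 0 else "up"

print(annotate_pose(0, 5, 0))    # frontal
print(annotate_pose(-30, 2, 0))  # left
```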
The neural network was tested first on the CroppedYale B dataset and then on a dataset recorded with the Kinect.
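To make the hybrid idea concrete, here is a minimal NumPy sketch: a fixed (untrained) convolutional feature extractor followed by a linear SVM fitted with hinge-loss subgradient descent. The kernels, learning rate, and toy "bright vs dark" data are assumptions for demonstration; the repository's actual architecture lives in cnn_svm.py.

```python
import numpy as np

def conv_feature(img, kernel):
    """Valid 2-D cross-correlation, ReLU, global average pooling -> scalar."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.array([[np.sum(img[i:i + kh, j:j + kw] * kernel)
                     for j in range(w - kw + 1)]
                    for i in range(h - kh + 1)])
    return np.maximum(out, 0.0).mean()

def extract_features(img, kernels):
    """One scalar feature per convolution kernel."""
    return np.array([conv_feature(img, k) for k in kernels])

def train_linear_svm(X, y, lr=0.01, epochs=100, lam=0.01):
    """Hinge-loss subgradient descent; labels y must be +/-1."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) < 1:      # margin violated
                w += lr * (yi * xi - lam * w)
                b += lr * yi
            else:                          # only regularisation decay
                w -= lr * lam * w
    return w, b

# Toy stand-in for two pose classes: "bright" vs "dark" 6x6 images.
rng = np.random.default_rng(0)
bright = [rng.uniform(0.7, 1.0, (6, 6)) for _ in range(10)]
dark = [rng.uniform(0.0, 0.3, (6, 6)) for _ in range(10)]
kernels = [np.ones((3, 3)) / 9.0, np.eye(3) / 3.0]  # fixed example kernels
X = np.array([extract_features(im, kernels) for im in bright + dark])
Xc = X - X.mean(axis=0)  # centre features so a near-zero bias suffices
y = np.array([1] * 10 + [-1] * 10)
w, b = train_linear_svm(Xc, y)
acc = (np.sign(Xc @ w + b) == y).mean()
print(f"training accuracy: {acc:.2f}")
```

In the real pipeline the CNN layers would be learned and the SVM would replace the network's softmax output; this toy keeps only the structural split between feature extraction and margin-based classification.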
More information about the algorithm for pose detection and annotation can be found here.
A detailed explanation of the algorithms is given in the papers listed in the Paper section.
The data is loaded in data_load.py; either the CroppedYale dataset or the dataset generated with the Kinect can be used. The neural network, a combination of CNN and SVM, is defined in cnn_svm.py. The main file of the repository is facerec.py, where everything comes together; metrics such as the confusion matrix and accuracy are also implemented there to validate the network's performance. Principal Component Analysis (PCA) can be applied to the images for improved performance; the code is in pca2d.py.
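The file name pca2d.py suggests the two-dimensional PCA (2D-PCA) variant, which works on image matrices directly instead of flattened vectors. The sketch below is a minimal NumPy implementation of that idea under this assumption; function names and shapes are illustrative, not the repository's code.

```python
import numpy as np

def pca2d_fit(images, n_components):
    """2D-PCA: top eigenvectors of the image covariance matrix G.

    images: array of shape (n, h, w).
    Returns the mean image and a (w, n_components) projection matrix.
    """
    mean = images.mean(axis=0)
    centred = images - mean
    # G = (1/n) * sum_i (A_i - mean)^T (A_i - mean), shape (w, w)
    G = np.einsum('nhw,nhv->wv', centred, centred) / len(images)
    eigvals, eigvecs = np.linalg.eigh(G)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalues first
    return mean, eigvecs[:, order[:n_components]]

def pca2d_transform(images, mean, components):
    """Project each (h, w) image to an (h, n_components) feature matrix."""
    return (images - mean) @ components

# Example on random data standing in for face images.
imgs = np.random.default_rng(1).normal(size=(12, 8, 6))
mean, comps = pca2d_fit(imgs, 2)
feats = pca2d_transform(imgs, mean, comps)
print(feats.shape)  # (12, 8, 2)
```

Compared with classical PCA on flattened images, 2D-PCA keeps the row structure of each image and only needs the much smaller w-by-w covariance matrix.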