A study of automatic face recognition algorithms tested on real-world data examined both their accuracy and their bias with respect to gender and skin colour. It found that some demographic groups show higher false-positive or false-negative rates than others.
To investigate this, researchers from the Human Pose Recovery and Behaviour Analysis Group at the Computer Vision Centre (CVC) in Spain and the University of Barcelona (UB) organised a challenge at the European Conference on Computer Vision (ECCV) 2020. The challenge evaluated the accuracy of the participants' algorithms on the face verification task in the presence of other confounding attributes.
The challenge was a success, since “it attracted 151 participants, who made more than 1,800 submissions in total, exceeding our expectations regarding the number of participants and submissions,” explained Sergio Escalera of UB, who led the study.
As part of the study, participants used an imbalanced image dataset. It simulated a real-world scenario in which AI-based models are trained and evaluated on imbalanced data, with considerably more light-skinned males than dark-skinned females. In total, participants worked with 152,917 images of 6,139 identities.
The images were annotated with two protected attributes, gender and skin colour, and five legitimate attributes: age group (0-34, 35-64, 65+), head pose (frontal, other), image source (still image, video frame), whether the person wears glasses, and bounding box size.
The researchers found that the top-ranked solutions exceeded 99.9 per cent accuracy while also scoring very low on the proposed bias metrics. Julio C S Jacques Jr, a researcher at the CVC and at the Open University of Catalonia, said that such results “can be considered a step toward the development of fairer face recognition methods.”
An analysis of the top 10 teams showed higher false-positive rates for females with darker skin tones and for pairs in which both individuals wear glasses. A false positive is a pair of different people wrongly accepted as a match. In contrast, false-negative rates were higher for males with lighter skin tones and for pairs in which both individuals were younger than 35. A false negative is a genuine match that the algorithm missed.
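The false-positive and false-negative rates described above are computed separately for each demographic group, so a model can be highly accurate overall while its errors are unevenly distributed. Below is a minimal sketch of that per-group computation. It assumes each verification pair carries a ground-truth label, the model's decision and a group label. The `Pair` structure, the `error_rates_by_group` helper and the group names are hypothetical illustrations. This is not the challenge's own bias metric, and the toy data is invented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Pair:
    """One face-verification trial (hypothetical structure for illustration)."""
    group: str             # demographic label for the pair, e.g. "both wear glasses"
    same_person: bool      # ground truth: do both images show the same identity?
    predicted_match: bool  # the algorithm's accept/reject decision


def error_rates_by_group(pairs):
    """Return {group: (false_positive_rate, false_negative_rate)}.

    FPR = impostor pairs wrongly accepted / all impostor pairs in the group
    FNR = genuine pairs wrongly rejected / all genuine pairs in the group
    A rate is NaN when the group has no pairs of the relevant kind.
    """
    counts = defaultdict(lambda: {"fp": 0, "neg": 0, "fn": 0, "pos": 0})
    for p in pairs:
        c = counts[p.group]
        if p.same_person:
            c["pos"] += 1          # genuine pair
            if not p.predicted_match:
                c["fn"] += 1       # missed a true match
        else:
            c["neg"] += 1          # impostor pair
            if p.predicted_match:
                c["fp"] += 1       # accepted two different people
    rates = {}
    for group, c in counts.items():
        fpr = c["fp"] / c["neg"] if c["neg"] else float("nan")
        fnr = c["fn"] / c["pos"] if c["pos"] else float("nan")
        rates[group] = (fpr, fnr)
    return rates


if __name__ == "__main__":
    # Toy data, invented purely to exercise the function.
    toy = [
        Pair("A", same_person=False, predicted_match=True),
        Pair("A", same_person=False, predicted_match=False),
        Pair("A", same_person=True, predicted_match=True),
        Pair("B", same_person=True, predicted_match=False),
        Pair("B", same_person=True, predicted_match=True),
        Pair("B", same_person=False, predicted_match=False),
    ]
    for group, (fpr, fnr) in sorted(error_rates_by_group(toy).items()):
        print(f"{group}: FPR={fpr:.2f} FNR={fnr:.2f}")
```

With this toy data, group A shows FPR=0.50 and FNR=0.00, while group B shows the reverse. This mirrors, in miniature, how one group can suffer mostly false positives and another mostly false negatives.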
The study also found that, in the dataset, individuals younger than 35 wear glasses less often than older individuals, so the effects of these two attributes are intertwined.
“This was not a surprise, since the adopted dataset was not balanced regarding the different demographic attributes. However, it shows that overall accuracy is not enough when the goal is to build fair face recognition methods, and that future works on the topic must take into account accuracy and bias mitigation together,” concluded Jacques Jr.