Face Recognition Performance: Role of Demographic Information

Brendan F. Klare, Member, IEEE, Mark J. Burge, Senior Member, IEEE, Joshua C. Klontz, Richard W. Vorder Bruegge, Member, IEEE, and Anil K. Jain, Fellow, IEEE

Abstract—This paper studies the influence of demographics on the performance of face recognition algorithms. The recognition accuracies of six different face recognition algorithms (three commercial, two nontrainable, and one trainable) are computed on a large-scale gallery that is partitioned so that each partition consists entirely of specific demographic cohorts. Eight total cohorts are isolated based on gender (male and female), race/ethnicity (Black, White, and Hispanic), and age group (18–30, 30–50, and 50–70 years old). Experimental results demonstrate that both the commercial and the nontrainable algorithms consistently have lower matching accuracies on the same cohorts (females, Blacks, and the 18–30 age group) than on the remaining cohorts within their demographic. Additional experiments investigate the impact of the demographic distribution in the training set on the performance of a trainable face recognition algorithm. We show that the matching accuracy for race/ethnicity and age cohorts can be improved by training exclusively on that specific cohort. Operationally, this leads to a scenario, called dynamic face matcher selection, where multiple face recognition algorithms (each trained on different demographic cohorts) are available for a biometric system operator to select based on the demographic information extracted from a probe image. This procedure should lead to improved face recognition accuracy in many intelligence and law enforcement face recognition scenarios. Finally, we show that an alternative to dynamic face matcher selection is to train face recognition algorithms on datasets that are evenly distributed across demographics, as this approach offers consistently high accuracy across all cohorts.

Index Terms—Age, demographics, dynamic face matcher selection, face recognition, gender, race/ethnicity, training.

Manuscript received January 13, 2012; revised May 29, 2012; accepted August 05, 2012. Date of publication October 09, 2012; date of current version November 15, 2012. This work was supported by the Office of the Director of National Intelligence (ODNI). The work of R. W. Vorder Bruegge was supported in part by the Director of National Intelligence (DNI) Science and Technology (S&T) Fellows program. The work of A. K. Jain was supported in part by the World Class University (WCU) program funded by the Ministry of Education, Science and Technology through the National Research Foundation of Korea (R31-10008). The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Alex ChiChung Kot.

B. F. Klare is with Noblis, Falls Church, VA 22042 USA. M. J. Burge and J. C. Klontz are with The MITRE Corporation, McLean, VA 22102 USA. R. W. Vorder Bruegge is with the Science and Technology Branch, Federal Bureau of Investigation, Quantico, VA 22135 USA. A. K. Jain is with the Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824 USA, and also with the Department of Brain and Cognitive Engineering, Korea University, Seoul 136713, Korea.

Digital Object Identifier 10.1109/TIFS.2012.2214212

I. INTRODUCTION

Sources of errors in automated face recognition algorithms are generally attributed to the well-studied variations in pose, illumination, and expression [1], collectively known as PIE. Other factors such as image quality (e.g., resolution, compression, blur), time lapse (facial aging), and occlusion also contribute to face recognition errors [2]. Previous studies have also shown that, within a specific demographic group (e.g., race/ethnicity, gender, or age), certain cohorts are more susceptible to errors in the face matching process [3], [4]. However, there has yet to be a comprehensive study that investigates whether or not we can train face recognition algorithms to exploit knowledge regarding the demographic cohort of a probe subject.

This study presents a large-scale analysis of face recognition performance on three different demographics (see Fig. 1): (i) race/ethnicity, (ii) gender, and (iii) age. For each of these demographics, we study the performance of six face recognition algorithms belonging to three different types of systems: (i) three commercial off-the-shelf (COTS) face recognition systems (FRS), (ii) face recognition algorithms that do not utilize training data, and (iii) a trainable face recognition algorithm. While the COTS FRS algorithms leverage training data, we are not able to retrain these algorithms; instead, they are black box systems that output a measure of similarity between a pair of face images. The nontrainable algorithms use common feature representations to characterize face images, and similarities are measured within these feature spaces. The trainable face recognition algorithm used in this study also outputs a measure of similarity between a pair of face images. However, different versions of this algorithm can be generated by training it with different sets of face images, where the sets have been separated based on demographics. Both the trainable algorithm, and (presumably) the COTS FRS, initially use some variant of the nontrainable representations.

The study of COTS FRS performance on each of the demographics considered is intended to augment previous experiments [3], [4] on whether these algorithms, as used in government and other applications, exhibit biases. Such biases would cause the performance of commercial algorithms to vary across demographic cohorts. In evaluating three different COTS FRS, we confirmed that not only do these algorithms perform worse on certain demographic cohorts, they consistently perform worse on the same cohorts (females, Blacks, and younger subjects).

Even though biases of COTS FRS on various cohorts were observed in this study, these algorithms are black boxes that offer little insight into why such errors manifest on specific demographic cohorts. To understand this, we also study the performance of noncommercial trainable and nontrainable face recognition algorithms, and whether statistical learning methods can leverage this phenomenon.

Fig. 1. Examples of the different demographics studied. (a)–(c) Age demographic. (d), (e) Gender demographic. (f)–(h) Race/ethnicity demographic. Within each demographic, the following cohorts were isolated: (a) ages 18–30, (b) ages 30–50, (c) ages 50–70, (d) female gender, (e) male gender, (f) Black race, (g) White race, and (h) Hispanic ethnicity.
The first row shows the “mean face” for each cohort. A “mean face” consists of the average pixel values computed from all the aligned face images in a cohort. The second and third rows show different sample images within the cohorts. By studying nontrainable face recognition algorithms, we gain an understanding of whether or not the errors are inherent to the specific demographics. This is because nontrainable algorithms operate by measuring the (dis)similarity of face images based on a specific feature representation that, ideally, encodes the structure and shape of the face. This similarity is measured independent of any knowledge of how face images vary for the same subject and between different subjects. Thus, cases in which the nontrainable algorithms have the same relative performance within a demographic group as the COTS FRS indicates that the errors are likely due to one of the cohorts being inherently more difficult to recognize. Relative differences in performance between the nontrainable algorithms and the COTS FRS indicate that the lower performance of COTS FRS on a particular cohort may be due to imbalanced training of the COTS algorithm. We explore this hypothesis by training the Spectrally Sampled Structural Subspace Features (4SF) face recognition algorithm [5] (i.e., the trainable face recognition algorithm used in this study) on image sets that consist exclusively of a particular cohort (e.g., Whites only). The learned subspaces in 4SF are applied to test sets from different cohorts to understand how unbalanced training with respect to a particular demographic impacts face recognition accuracy. The 4SF trained subspaces also help answer the following question: to what extent can statistical learning improve accuracy on a demographic cohort? For example, it will be shown that females are more difficult to recognize than males. We will investigate how much training on only females, for example, can improve face recognition accuracy when matching females. The remainder of this paper is organized as follows. In Section II we discuss previous studies on demographic introduced biases in face recognition algorithms and the design of face recognition algorithms. Section III discusses the data corpus that was utilized in this study. Section IV identifies the different face recognition algorithms that were used in this study (commercial systems, trainable and nontrainable algorithms). Section V describes the matching experiments conducted on each demographic. Section VI provides analysis of the results in each experiment and summarizes the contributions of this paper. II. PRIOR STUDIES AND RELATED WORK Over the last twenty years the National Institute of Standards and Technology (NIST) has run a series of evaluations to quantify the performance of automated face recognition algorithms. Under certain imaging constraints these tests have measured a relative improvement of over two orders of magnitude in performance over the last two decades [4]. Despite these improvements, there are still many factors known to degrade face recognition performance (e.g., PIE, image quality, aging). In order to maximize the potential benefit of face recognition in forensics and law enforcement applications, we need to improve the ability of face recognition to sort through facial images more accurately and in a manner that will allow us to perform more specialized or targeted searches. Facial searches leveraging demographics represents one such avenue for performance improvement. 
While there is no standard approach to automated face recognition, most face recognition algorithms follow a similar pipeline [6]: face detection, alignment, appearance normalization, feature representation (e.g., local binary patterns [7], Gabor features [8]), feature extraction [9], [10], and matching [11]. Feature extraction generally relies on an offline training stage that utilizes exemplar data to learn improved feature combinations. For example, variants of the linear discriminant analysis (LDA) algorithm [9], [10] use training data to compute between-class and within-class scatter matrices. Subspace projections are then computed to maximize the separability of subjects based on these scatter matrices.

This study examines the impact of training on face recognition performance. Without leveraging training data, face recognition algorithms are not able to discern between noisy facial features and facial features that offer consistent cues to a subject's identity. As such, automated face recognition algorithms are ultimately based on statistical models of the variance between individual faces. These algorithms seek to minimize the measured distance between facial images of the same subject, while maximizing the distance between the subject's images and those of the rest of the population. However, the feature combinations discovered are functions of the data used to train the recognition system. If the training set is not representative of the population a face recognition algorithm will be operating on, then the performance of the resulting system may deteriorate. For example, the most distinguishing features for Black subjects may differ from those for White subjects. As such, if a system was predominantly trained on White faces, and later operated on Black faces, the learned representation may discard information useful for discerning Black faces.

The observation that the performance of face recognition algorithms could suffer if the training data is not representative of the population is not new. One of the earliest studies reporting this phenomenon is not in the automated face recognition literature, but instead in the context of human face recognition. In what has been coined the "other-race effect," humans have consistently demonstrated a decreased ability to recognize subjects from races different from their own [12], [13]. While there is no generally agreed upon explanation for this phenomenon, many researchers believe the decreased performance on other races is explained by the "contact" hypothesis, which postulates that the lower performance on other races is due to decreased exposure to them [14]. While the validity of the contact hypothesis has been disputed [15], the presence of the "other-race effect" has not.

From the perspective of automated face recognition, the 2002 NIST Face Recognition Vendor Test (FRVT) is believed to be the first study to show that face recognition algorithms have different recognition accuracies depending on a subject's demographic cohort [3]. Among other findings, this study demonstrated, for commercial face recognition algorithms on a dataset containing roughly 120,000 images, that (i) female subjects were more difficult to recognize than male subjects, and (ii) younger subjects were generally more difficult to recognize than older subjects. More recently, Grother et al.
measured the performance of seven commercial face recognition algorithms and three academic face recognition algorithms in the 2010 NIST Multi-Biometric Evaluation [4]. The results of their experiments also concluded that females were more difficult to recognize than males. This study also measured the recognition accuracy of different races and ages. Other studies have also investigated the impact of the distribution of a training set on recognition accuracy. Furl et al. [16] and Phillips et al. [17] conducted studies to investigate the impact of cross training and matching on White and Asian races. Similar training biases were investigated by Klare and Jain [18], showing that aging-invariant face recognition algorithms suffer from decreased performance in nonaging scenarios. 1791 The study in [17] was motivated by a rather surprising result in the 2006 NIST Face Recognition Vendor Test (FRVT) [19]. In this test, the various commercial and academic face recognition algorithms tested exhibited a common characteristic: algorithms which originated in East Asia performed better on Asian subjects than did algorithms developed in the Western hemisphere. The reverse was true for White subjects: algorithms developed in the western hemisphere performed better. O’Toole et al. suggested that this discrepancy was due to the different racial distribution in the training sets for the Western and Asian algorithms. The impact of these training sets on face recognition algorithms cannot be overemphasized; face recognition algorithms do not generally rely upon explicit physiological models of the human face for determining match or nonmatch between two faces. Instead, the measure of similarity between face images is based on statistical learning, generally in the feature extraction stage [10], [20]–[23] or during the matching stage [11]. In this work, we expand on previous studies to better demonstrate and understand the impact of a training set on the performance of face recognition algorithms. While previous studies [16], [17] only isolated the race variate, and only considered two races (i.e., Asian and White), this study explores both the inherent biases and training biases across gender, race (three different races/ethnicities) and age. To our knowledge, no studies have investigated the impact of gender or subject age for training face recognition algorithms. III. FACE DATABASE This study was enabled by a collection of over one million mug shot face images from the Pinellas County Sheriff’s Office (PCSO)1 (examples of these images can be found in Fig. 1). Accompanying these images are complete subject demographics. The demographics provide the race/ethnicity, gender, and age of the subject in each image, as well as a subject ID number. The images in this dataset have been acquired since the year 1994 (when PCSO began capturing digital mug shots) to the present. Images were acquired across different cameras at several stations (one camera at intake, two cameras at booking, and two cameras at release). For the images acquired between 1994 to 2001 the specifications of the capture cameras are not known. Starting in 2001, all images were acquired using Sony D100 cameras. The cameras are mounted vertically to capture at 480 600 resolution, and images adhere to the ANSI/NIST-ITL 1-2000 face image standard [24]. Despite being captured in a controlled setting with subject cooperation, some face images exhibit minor pose and expression variations. The images in Fig. 
1 are commensurate with the pose and expression variations in this dataset. Illumination was controlled using three-point lighting and the background was set to 18% gray.

Given this large corpus of face images, we were able to use the metadata provided to control the three demographics studied: race/ethnicity, gender, and age. For gender, we partitioned image sets into cohorts of (i) male only, and (ii) female only. For age, we partitioned the sets into three cohorts: (i) young (18 to 30 years old), (ii) middle-age (30 to 50 years old), and (iii) old (50 to 70 years old). There were very few individuals in this database younger than 18 or older than 70. For race/ethnicity,2 we partitioned the sets into cohorts of (i) White, (ii) Black, and (iii) Hispanic.3 A summary of these cohorts and the number of subjects available for each cohort can be found in Table I. Asian, Indian, and Unknown race/ethnicities were not considered because an insufficient number of samples were available.

1 The mug shot data used in this study was acquired in the public domain through Florida's "Sunshine" laws. Subjects shown in this manuscript may or may not have been convicted of a criminal charge, and thus should be presumed innocent of any wrongdoing.
2 Racial identifiers (i.e., White, Black, and Hispanic) follow the FBI's National Crime Information Center code manual.
3 Hispanic is not technically a race, but instead an ethnic category.

TABLE I. NUMBER OF SUBJECTS USED FOR TRAINING AND TESTING FOR EACH DEMOGRAPHIC CATEGORY. TWO IMAGES PER SUBJECT WERE USED. TRAINING AND TEST SETS WERE DISJOINT. A TOTAL OF 102,942 FACE IMAGES WERE USED IN THIS STUDY.

For each of the eight cohorts (i.e., male, female, young, middle-aged, old, White, Black, and Hispanic), we created independent training and test sets of face images. Each set contains a maximum of 8,000 subjects, with two images (one probe and one gallery) for each subject. Table I lists the number of subjects included for each set. Cohorts with far fewer than 8,000 subjects (i.e., the Hispanic and old cohorts) reflect a lack of data available to us. Cohorts containing only slightly fewer than 8,000 subjects are the result of removing a few images that could not be successfully enrolled in the COTS FRS.

For each demographic that was controlled (e.g., gender), the other demographics were left uncontrolled so that they have roughly the same distribution across the compared cohorts (following the findings in [25]). For example, the percentage of Black subjects and White subjects is roughly the same for both the female-controlled and male-controlled datasets. Thus, the relative recognition accuracy between cohorts within a controlled demographic is not a function of some other demographic being distributed differently. From a random sample of 150,000 subjects in the PCSO dataset, we have the following distribution for each cohort: 78.6% male, 21.4% female, 67.7% White, 28.0% Black, 3.7% Hispanic, 44.8% young, 47.5% middle-aged, and 7.7% old.

The dataset of mug shot images did not contain a large enough number of Asian subjects to measure that particular race/ethnicity cohort. However, studies by Furl et al. [16] and O'Toole et al. [17] investigated cross-race training and matching on White and East Asian subjects. As previously discussed, these studies concluded that algorithms developed in the Western Hemisphere did better on White subjects and Asian algorithms did better on Asian subjects.
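To make the partitioning protocol above concrete, the following sketch shows how disjoint per-cohort training and test sets, with one probe and one gallery image per subject, could be assembled. It is illustrative only: the record fields and the toy data are hypothetical placeholders, not the PCSO metadata schema.

```python
"""Sketch of the cohort partitioning described in Section III (illustrative only)."""
import random
from collections import defaultdict

def partition_cohort(records, cohort_filter, n_train=8000, n_test=8000, seed=0):
    """Build disjoint training and test sets for one demographic cohort.

    records: iterable of dicts with keys 'subject_id', 'gender', 'race',
             'age', and 'image' (path to one mug shot) -- hypothetical fields.
    cohort_filter: predicate selecting records belonging to the cohort,
                   e.g. lambda r: r['gender'] == 'F'.
    Returns (train, test): lists of (probe_path, gallery_path) pairs, one pair
    per subject, with disjoint subject sets.
    """
    by_subject = defaultdict(list)
    for r in records:
        if cohort_filter(r):
            by_subject[r['subject_id']].append(r['image'])

    # Keep only subjects with at least two images (one probe, one gallery).
    subjects = [s for s, imgs in by_subject.items() if len(imgs) >= 2]
    random.Random(seed).shuffle(subjects)

    def make_pairs(subject_ids):
        return [(by_subject[s][0], by_subject[s][1]) for s in subject_ids]

    train_ids = subjects[:n_train]
    test_ids = subjects[n_train:n_train + n_test]   # disjoint from train_ids
    return make_pairs(train_ids), make_pairs(test_ids)

if __name__ == '__main__':
    # Toy record list; real metadata would come from the operational database.
    toy = [{'subject_id': i // 2, 'gender': 'F' if (i // 2) % 3 else 'M',
            'race': 'White', 'age': 25, 'image': f'img_{i}.png'}
           for i in range(40)]
    train, test = partition_cohort(toy, lambda r: r['gender'] == 'F',
                                   n_train=5, n_test=5)
    print(len(train), len(test))
```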
IV. FACE RECOGNITION ALGORITHMS

In this section we discuss each of the six face recognition algorithms used in this study. We have organized these algorithms into commercial algorithms (Section IV-A), nontrainable algorithms (Section IV-B), and trainable algorithms (Section IV-C).

A. Commercial Face Recognition Algorithms

Three commercial face recognition algorithms were evaluated in this study: (i) Cognitec's FaceVACS v8.2, (ii) PittPatt v5.2.2, and (iii) Neurotechnology's MegaMatcher v3.1. The results in this study obfuscate the names of the three commercial matchers. These commercial algorithms are three of the ten algorithms evaluated in the NIST-sponsored Multi-Biometric Evaluation (MBE) [4]. As such, these algorithms are representative of state-of-the-art performance in face recognition technology.

B. Non-Trainable Face Recognition Algorithms

Two nontrainable face recognition algorithms were used in this study: (i) local binary patterns (LBP), and (ii) Gabor features. Both of these methods operate by representing the face with Level 2 facial features (LBP and Gabor), i.e., features that encode the structure and shape of the face and are critical to face recognition algorithms [26]. These nontrainable algorithms perform an initial geometric normalization step (also referred to as alignment) by using the automatically detected eye coordinates (eyes were detected using the FaceVACS SDK) to scale, rotate, and crop a face image. After this step, the face image has a height and width of 128 pixels. Both algorithms are custom implementations by the authors.

1) Local Binary Patterns: A seminal method in face recognition is the use of local binary patterns [7] (LBP) to represent the face [27]. Local binary patterns represent small patches across the face with histograms of binary patterns that encode the structure and texture of the face. Local binary patterns describe each pixel using a P-bit binary number. Each bit is determined by sampling pixel values at P uniformly spaced locations along a circle of radius R, centered at the pixel being described. For each sampling location, the corresponding bit receives the value 1 if it is greater than or equal to the center pixel, and 0 otherwise. A special case of LBP, called the uniform LBP [7], is generally used in face recognition. Uniform LBP assigns every nonuniform binary number to the same value, where uniformity is defined by whether more than two transitions between the values 0 and 1 occur in the binary number. For P = 8, the uniform LBP has 58 uniform binary numbers, and a 59th value is reserved for the remaining nonuniform binary numbers. Thus, each pixel will take on a value ranging from 1 to 59. Two different radii R are used, resulting in two different local binary pattern representations that are subsequently concatenated together (called Multiscale Local Binary Patterns, or MLBP).

In the context of face recognition, LBP values are first computed at each pixel in the (normalized) face image as previously described. The image is tessellated into patches with a height and width of 12 pixels. For each patch, a histogram of the LBP values within the patch is computed and normalized to unit length. Finally, we concatenate the normalized patch histograms into a single feature vector. In our implementation, the illumination filter proposed by Tan and Triggs [28] is applied prior to computing the LBP codes in order to suppress nonuniform illumination variations. This filter resulted in improved recognition performance.
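The following sketch illustrates the LBP representation just described, under simplifying assumptions: P = 8 neighbors sampled at integer offsets on a radius-1 ring (no interpolation), 12 x 12 patches on a 128 x 128 aligned face, and unit-length patch histograms. It is not the authors' implementation, and it omits the Tan-Triggs filter and the second radius of the multiscale (MLBP) variant.

```python
"""Minimal sketch of the uniform LBP face representation described above."""
import numpy as np

# Offsets of the 8 neighbors, ordered around the radius-1 ring.
_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def _uniform_lookup():
    """Map each 8-bit pattern to one of 59 labels (58 uniform + 1 catch-all)."""
    table = np.zeros(256, dtype=np.int32)
    next_label = 0
    for code in range(256):
        bits = [(code >> i) & 1 for i in range(8)]
        transitions = sum(bits[i] != bits[(i + 1) % 8] for i in range(8))
        if transitions <= 2:
            table[code] = next_label
            next_label += 1
        else:
            table[code] = 58          # all nonuniform patterns share one bin
    assert next_label == 58
    return table

_TABLE = _uniform_lookup()

def lbp_histograms(face, patch=12):
    """Return the concatenated, unit-length patch histograms of a face image."""
    face = np.asarray(face, dtype=np.float32)
    h, w = face.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.int32)
    center = face[1:-1, 1:-1]
    for bit, (dy, dx) in enumerate(_OFFSETS):
        neighbor = face[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= ((neighbor >= center).astype(np.int32) << bit)
    labels = _TABLE[codes]

    feats = []
    for y in range(0, labels.shape[0] - patch + 1, patch):
        for x in range(0, labels.shape[1] - patch + 1, patch):
            hist = np.bincount(labels[y:y + patch, x:x + patch].ravel(),
                               minlength=59).astype(np.float64)
            norm = np.linalg.norm(hist)
            feats.append(hist / norm if norm > 0 else hist)
    return np.concatenate(feats)

if __name__ == '__main__':
    rng = np.random.default_rng(0)
    f1, f2 = rng.random((128, 128)), rng.random((128, 128))
    d = np.linalg.norm(lbp_histograms(f1) - lbp_histograms(f2))
    print('dissimilarity:', d)   # smaller distance = more similar faces
```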
2) Gabor Features: Gabor features are among the Level 2 facial features [26] that have been used with wide success in representing facial images [8], [20], [29]. One reason Gabor features are popular for representing both facial and natural images is their similarity to human neurological receptor fields [30], [31]. A Gabor image representation is computed by convolving a set of Gabor filters with an image (in this case, a face image). The Gabor filters are defined as

\psi(x, y) = \frac{f^2}{\pi \gamma \eta} \exp\left(-\left(\frac{f^2}{\gamma^2} x'^2 + \frac{f^2}{\eta^2} y'^2\right)\right) \exp\left(j 2\pi f x'\right)   (1)

x' = x \cos\theta + y \sin\theta   (2)

y' = -x \sin\theta + y \cos\theta   (3)

where f sets the filter scale (or frequency), \theta is the filter orientation along the major axis, \gamma controls the filter sharpness along the major axis, and \eta controls the sharpness along the minor axis. Typically, combinations across a fixed set of values for the scale f and the orientation \theta are used. This creates a set (or bank) of filters with different scales and orientations. Given the bank of Gabor filters, the input image is convolved with each filter, which results in a Gabor image for each filter. The combination of these scale and orientation values results in 40 different Gabor filters, which in turn results in 40 Gabor images (for example).

In this paper, the recognition experiments using a Gabor image representation involve: (i) performing illumination correction using the method proposed by Tan and Triggs [28], (ii) computing the phase response of each Gabor image, (iii) tessellating the Gabor image(s) into patches of size 12 x 12, (iv) quantizing the phase response (which ranges over an interval of 2\pi) into 24 values and computing the histogram within each patch, and (v) concatenating the histogram vectors into a single feature vector. Given two (aligned) face images, the distance between their corresponding Gabor feature vectors is used to measure the dissimilarity between the two face images.
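A sketch of this pipeline is given below. The filter follows the form of (1)–(3); the particular scales, orientations, and sharpness values (gamma = eta = 1) are illustrative assumptions, since the exact values used in the paper are not reproduced here.

```python
"""Sketch of the Gabor phase representation described above (illustrative)."""
import numpy as np

def gabor_kernel(f, theta, gamma=1.0, eta=1.0, size=31):
    """Complex Gabor filter psi(x, y) of the form in (1)-(3), on a size x size grid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    xp = x * np.cos(theta) + y * np.sin(theta)       # rotated major axis
    yp = -x * np.sin(theta) + y * np.cos(theta)      # rotated minor axis
    envelope = np.exp(-((f ** 2 / gamma ** 2) * xp ** 2 +
                        (f ** 2 / eta ** 2) * yp ** 2))
    carrier = np.exp(1j * 2.0 * np.pi * f * xp)
    return (f ** 2 / (np.pi * gamma * eta)) * envelope * carrier

def convolve_same(image, kernel):
    """'Same'-size linear convolution via zero-padded FFTs (NumPy only)."""
    H, W = image.shape
    kh, kw = kernel.shape
    shape = (H + kh - 1, W + kw - 1)
    out = np.fft.ifft2(np.fft.fft2(image, shape) * np.fft.fft2(kernel, shape))
    top, left = (kh - 1) // 2, (kw - 1) // 2
    return out[top:top + H, left:left + W]

def gabor_phase_features(face, patch=12, bins=24):
    """Concatenated histograms of quantized Gabor phase, per filter and patch."""
    face = np.asarray(face, dtype=np.float64)
    scales = [0.1, 0.14, 0.2, 0.28, 0.4]                 # illustrative values
    orientations = [k * np.pi / 8 for k in range(8)]     # illustrative values
    feats = []
    for f in scales:
        for theta in orientations:
            phase = np.angle(convolve_same(face, gabor_kernel(f, theta)))
            # Map phase from (-pi, pi] to bin indices 0..bins-1.
            q = np.minimum((phase + np.pi) / (2 * np.pi) * bins, bins - 1)
            q = q.astype(np.int32)
            for yy in range(0, q.shape[0] - patch + 1, patch):
                for xx in range(0, q.shape[1] - patch + 1, patch):
                    hist = np.bincount(q[yy:yy + patch, xx:xx + patch].ravel(),
                                       minlength=bins).astype(np.float64)
                    feats.append(hist / (np.linalg.norm(hist) or 1.0))
    return np.concatenate(feats)

if __name__ == '__main__':
    face = np.random.default_rng(1).random((128, 128))
    print('feature length:', gabor_phase_features(face).shape[0])
```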
C. Trainable Face Recognition Algorithm

The trainable algorithm used in this study is the Spectrally Sampled Structural Subspace Features algorithm [5], which is abbreviated as 4SF. This algorithm (which was developed in-house) uses multiple discriminative subspaces to perform face recognition. After geometric normalization of a face image using the automatically detected eye coordinates (eyes were detected using the FaceVACS SDK), illumination correction is performed using the illumination correction filter presented by Tan and Triggs [28]. Face images are then represented using histograms of local binary patterns at densely sampled face patches [27] (to this point, 4SF is the same as the nontrainable LBP algorithm described in Section IV-B1). For each face patch, principal component analysis (PCA) is performed so that 98.0% of the variance is retained. Given a training set of subjects, multiple stages of weighted random sampling are performed, where the spectral densities (i.e., the eigenvalues) from each face patch are used for weighting. The randomly sampled subspaces are based on Ho's original method [32]; however, the approach used here is unique in that the sampling is weighted based on the spectral densities. For each stage of random sampling, LDA [10] is performed on the randomly sampled components. The LDA subspaces are learned using subjects randomly sampled from the training set (i.e., bagging [33]). Finally, distance-based recognition is performed by projecting the LBP representation of face images into the per-patch PCA subspaces, and then into each of the learned LDA subspaces. The sum of the Euclidean distances in the individual subspaces is the dissimilarity between two face images. The 4SF algorithm is summarized in Fig. 2.

Fig. 2. Overview of the Spectrally Sampled Structural Subspace Features (4SF) algorithm. This custom algorithm is representative of state-of-the-art methods in face recognition. By changing the demographic distribution of the training sets input into the 4SF algorithm, we are able to analyze the impact the training distribution has on various demographic cohorts.

As shown in the experiments conducted in this study, the 4SF algorithm performs on par with several commercial face recognition algorithms. Because 4SF initially uses the same approach as the nontrainable LBP matcher, the improvement in recognition accuracy (in this study) of the 4SF algorithm over the nontrainable LBP matcher clearly demonstrates the ability of 4SF to leverage training data. Thus, a high matching accuracy and the ability to leverage training data make 4SF an ideal face recognition algorithm to study the effects of training data on face recognition performance.
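The sketch below illustrates the flavor of such a trainable matcher under strong simplifications: a single global PCA instead of per-patch PCA, no bagging over subjects, and a lightly regularized LDA. It is not the released 4SF implementation; it only shows how eigenvalue-weighted random sampling of PCA components, followed by per-stage LDA, yields subspaces whose summed Euclidean distances serve as the dissimilarity score.

```python
"""Simplified sketch of a 4SF-style trainable matcher (not the released 4SF)."""
import numpy as np

def pca(X, var_kept=0.98):
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    var = S ** 2
    k = int(np.searchsorted(np.cumsum(var) / var.sum(), var_kept)) + 1
    return mean, Vt[:k].T, var[:k]            # mean, components (d x k), spectrum

def lda(X, y, reg=1e-3):
    classes = np.unique(y)
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)                       # within-class scatter
        Sb += len(Xc) * np.outer(mc - mean, mc - mean)      # between-class scatter
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + reg * np.eye(d), Sb))
    order = np.argsort(-evals.real)[:len(classes) - 1]
    return evecs.real[:, order]               # d x (C-1) projection

class SubspaceMatcher:
    def __init__(self, n_stages=5, frac=0.5, seed=0):
        self.n_stages, self.frac = n_stages, frac
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        """X: (n, d) feature vectors (e.g., LBP histograms); y: subject labels."""
        self.mean, self.W_pca, spectrum = pca(np.asarray(X, dtype=np.float64))
        Z = (np.asarray(X) - self.mean) @ self.W_pca
        k = Z.shape[1]
        p = spectrum / spectrum.sum()          # spectral densities as weights
        self.stages = []
        for _ in range(self.n_stages):
            idx = self.rng.choice(k, size=max(2, int(self.frac * k)),
                                  replace=False, p=p)
            self.stages.append((idx, lda(Z[:, idx], np.asarray(y))))
        return self

    def distance(self, x1, x2):
        z1 = (x1 - self.mean) @ self.W_pca
        z2 = (x2 - self.mean) @ self.W_pca
        return sum(np.linalg.norm((z1[idx] - z2[idx]) @ W)
                   for idx, W in self.stages)

if __name__ == '__main__':
    rng = np.random.default_rng(2)
    # Toy data: 50 "subjects", 2 images each, 200-dimensional features.
    centers = rng.random((50, 200))
    X = np.repeat(centers, 2, axis=0) + 0.05 * rng.standard_normal((100, 200))
    y = np.repeat(np.arange(50), 2)
    m = SubspaceMatcher().fit(X, y)
    print('genuine:', m.distance(X[0], X[1]), 'impostor:', m.distance(X[0], X[2]))
```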
While 4SF is intended to be representative of learning-based methods in face recognition, it could be the case that other learning-based algorithms (such as [20]–[23]) exhibit different amounts of sensitivity to the demographic distribution of the training data. However, unlike most pattern classification tasks, which train on samples from the same classes (i.e., subjects) that the algorithms are being tested on, the recognition scenarios in this study (and face recognition in general) operate in a transfer learning scenario. Thus, because we are training on different classes/subjects than those being tested, the relative performance of 4SF's training-based results in this study is generally considered to be a function of the data and not of the 4SF algorithm itself.

V. EXPERIMENTAL RESULTS

For each demographic (gender, race/ethnicity, and age), three separate matching experiments are conducted. The results of these experiments are presented per demographic. Fig. 3 delineates the results for all the experiments on the gender demographic. Fig. 4 delineates the results for all experiments on the race/ethnicity demographic. Finally, Fig. 5 delineates the results for all experiments on the age demographic. The true accept rates at a fixed false accept rate of 0.1% for all the plots in Figs. 3 to 5 are summarized in Table II.

TABLE II. LISTED ARE THE TRUE ACCEPT RATES (%) AT A FIXED FALSE ACCEPT RATE OF 0.1% FOR EACH MATCHER AND DEMOGRAPHIC DATASET.

Fig. 3. Performance of the six face recognition systems on datasets separated by cohorts within the gender demographic. (a) COTS-A, (b) COTS-B, (c) COTS-C, (d) Local binary patterns (nontrainable), (e) Gabor (nontrainable), (f) 4SF trained on an equal number of samples from each gender, (g) 4SF algorithm (trainable) on the Females cohort, and (h) 4SF algorithm (trainable) on the Males cohort.

Fig. 4. Performance of the six face recognition systems on datasets separated by cohorts within the race demographic. (a) COTS-A, (b) COTS-B, (c) COTS-C, (d) Local binary patterns (nontrainable), (e) Gabor (nontrainable), (f) 4SF trained on an equal number of samples from each race, (g) 4SF algorithm (trainable) on the Black cohort, (h) 4SF algorithm (trainable) on the White cohort, and (i) 4SF algorithm (trainable) on the Hispanic cohort.

Fig. 5. Performance of the six face recognition systems on datasets separated by cohorts within the age demographic. (a) COTS-A, (b) COTS-B, (c) COTS-C, (d) Local binary patterns (nontrainable), (e) Gabor (nontrainable), (f) 4SF trained on an equal number of samples from each age group, (g) 4SF algorithm (trainable) on the Ages 18–30 cohort, (h) 4SF algorithm (trainable) on the Ages 30–50 cohort, and (i) 4SF algorithm (trainable) on the Ages 50–70 cohort.

The first experiment conducted on each demographic measures the relative performance within the demographic cohort for each COTS FRS. That is, for a particular commercial matcher (e.g., COTS-A), we compare its matching accuracy on each cohort within that demographic. For example, on the gender demographic, this experiment will measure the difference in recognition accuracy for commercial matchers on males versus females. The results from this set of experiments can be found in Figs. 3(a)–(c) for the gender demographic, Figs. 4(a)–(c) for the race/ethnicity demographic, and Figs. 5(a)–(c) for the age demographic.

The second experiment conducted on each demographic cohort measures the relative performance within the cohort for nontrainable face recognition algorithms. Because the nontrainable algorithms do not leverage statistical variability in faces, they are not susceptible to training biases. Instead, they reflect the inherent (or a priori) difficulty in recognizing cohorts of subjects within a specific demographic group. The results from this set of experiments can be found in Figs. 3(d), (e) for the gender demographic, Figs. 4(d), (e) for the race/ethnicity demographic, and Figs. 5(d), (e) for the age demographic.

The final experiment investigates the influence of the training set on recognition performance. Within each demographic cohort, we train several versions of the 4SF algorithm (one for each cohort). These differently trained versions of the 4SF algorithm are then applied to separate testing sets from each cohort within the particular demographic. This enables us to understand, within the gender demographic (for example), how much training exclusively on females (i) improves performance on females, and (ii) decreases performance on males. In addition to training 4SF exclusively on each cohort, we also use a version of 4SF trained on an equal representation of the specific demographic cohorts (referred to as "Trained on All"). For example, in the gender demographic, this would mean that for "All", 4SF was trained on 4,000 male subjects and 4,000 female subjects. The results from this set of experiments can be found in Figs. 3(f)–(h) for the gender demographic, Figs. 4(f)–(i) for the race/ethnicity demographic, and Figs. 5(f)–(i) for the age demographic.
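Since all results below are reported as the true accept rate (TAR) at a fixed false accept rate (FAR) of 0.1% (Table II), the following small helper shows how such an operating point can be computed from genuine and impostor score arrays. The score arrays in the example are hypothetical stand-ins for the similarity scores produced by any of the six matchers.

```python
"""Helper for the operating point used in Table II: TAR at a fixed FAR."""
import numpy as np

def tar_at_far(genuine_scores, impostor_scores, far=0.001):
    """Fraction of genuine comparisons accepted at the threshold for which the
    impostor acceptance rate does not exceed `far` (higher score = more similar)."""
    impostor = np.sort(np.asarray(impostor_scores, dtype=float))[::-1]  # descending
    n_accept = int(np.floor(far * impostor.size))
    # Threshold chosen so that at most `far` of the impostor scores exceed it.
    threshold = impostor[n_accept - 1] if n_accept > 0 else impostor[0]
    return float((np.asarray(genuine_scores, dtype=float) > threshold).mean())

if __name__ == '__main__':
    rng = np.random.default_rng(3)
    genuine = rng.normal(2.0, 1.0, 8000)       # e.g., 8,000 probe-gallery pairs
    impostor = rng.normal(0.0, 1.0, 200000)
    print('TAR @ 0.1% FAR:', round(tar_at_far(genuine, impostor), 4))
```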
VI. ANALYSIS

In this section we provide an analysis of the findings of the experiments described in Section V.

A. Gender

Each of the three commercial face recognition algorithms performed significantly worse on the female cohort than on the male cohort (see Figs. 3(a)–(c)). Additionally, both nontrainable algorithms (LBP and Gabor) performed significantly worse on females (see Figs. 3(d), (e)).

The agreement in relative accuracies of the COTS FRS and the nontrainable LBP method on the gender demographic suggests that the female cohort is more difficult to recognize than the male cohort. That is, if the results of the COTS algorithms were due to imbalanced training sets (i.e., training on more males than females), then the LBP matcher should have yielded similar matching accuracies on males and females. Instead, the nontrained LBP and Gabor matchers performed worse on the female cohort. When training on males and females equally (Fig. 3(h)), the 4SF algorithm also did significantly worse on the female cohort. Together, these results strongly suggest that the female cohort is inherently more difficult to recognize. The results of the 4SF algorithm on the female cohort (Fig. 3(f)) offer additional evidence about the nature of the discrepancy: the performance of training on only females is not higher than the performance of training on a mix of males and females (labeled "All").

Different factors may explain why females are more difficult to recognize than males. One explanation may be the use of cosmetics by females (i.e., makeup), which results in a higher within-class variance for females than for males. This hypothesis is supported by the match score distributions for males and females (see Fig. 6). A greater difference between the cohorts is observed in the true match distributions than in the false match distributions. The increased dissimilarities between images of the same female subjects demonstrate increased intraclass variability. Again, a cause of this may be the application of cosmetics.

Fig. 6. Match score distributions for the male and female genders using the 4SF system trained with an equal number of male and female subjects. The increased distances (dissimilarities) for the true match comparisons in the female cohort suggest increased within-class variance in the female cohort. All histograms are aligned on the same horizontal axis.

One could postulate that the reason female face images are more difficult to recognize than male faces is that the size of the female face is generally smaller than that of the male face. However, the mean and standard deviation of the interpupillary distances (IPD) for the female face images in this study was 108.2 ± 19.0 pixels, while the mean and standard deviation of the IPDs for all the male face images was 110.3 ± 17.0 pixels. Thus, it is quite unlikely that such a large discrepancy in recognition accuracy between males and females would be due to this 2 pixel difference in size.
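The comparison of genuine (true match) and impostor score distributions discussed above (Fig. 6) can be summarized with a simple separation statistic such as d-prime. The sketch below uses hypothetical score arrays to show how a wider genuine distribution (higher within-class variance) lowers the separation for one cohort.

```python
"""Sketch of the per-cohort score-distribution comparison (cf. Fig. 6).

The four score arrays are hypothetical placeholders for genuine and impostor
distances produced by a matcher on the female and male test sets.
"""
import numpy as np

def d_prime(genuine, impostor):
    """Separation between genuine and impostor score distributions."""
    g, i = np.asarray(genuine, float), np.asarray(impostor, float)
    return abs(g.mean() - i.mean()) / np.sqrt(0.5 * (g.var() + i.var()))

def summarize(cohort, genuine, impostor):
    print(f"{cohort}: genuine mean={np.mean(genuine):.2f} "
          f"var={np.var(genuine):.2f}  d'={d_prime(genuine, impostor):.2f}")

if __name__ == '__main__':
    rng = np.random.default_rng(4)
    # Toy distances: a larger within-class variance for the female cohort, as
    # hypothesized in the text, widens the genuine distribution and lowers d'.
    female_gen = rng.normal(1.2, 0.45, 8000)
    female_imp = rng.normal(2.5, 0.40, 50000)
    male_gen = rng.normal(1.2, 0.30, 8000)
    male_imp = rng.normal(2.5, 0.40, 50000)
    summarize('female', female_gen, female_imp)
    summarize('male', male_gen, male_imp)
```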
B. Race

When examining the race/ethnicity cohorts, all three commercial face recognition algorithms achieved the lowest matching accuracy on the Black cohort (see Figs. 4(a)–(c)). The two nontrained algorithms had similar results (Figs. 4(d), (e)). When matching against only Black subjects (Fig. 4(f)), 4SF has a higher accuracy when trained exclusively on Black subjects (about a 5% improvement over the system trained on Whites and Hispanics only). Similarly, when evaluating 4SF on only White subjects (Fig. 4(g)), the system trained on only the White cohort had the highest accuracy. However, when comparing the 4SF algorithm trained equally on all race/ethnicity cohorts (Fig. 4(i)), we see that the performance on the Black cohort is still lower than on the White cohort. Thus, even with balanced training, the Black cohort is still more difficult to recognize.

The key finding in the training results shown in Figs. 4(f)–(i) is the ability to improve recognition accuracy by training exclusively on subjects of the same race/ethnicity. Compared to balanced training (i.e., training on "All"), the performance of 4SF when trained on the same race/ethnicity as it is recognizing is higher. Thus, by merely changing the distribution of the training set, we can improve the recognition rate by nearly 2% on the Black cohort and 1.5% on the White cohort (see Table II). The inability to effectively train on the Hispanic cohort is likely due to the insufficient number of training samples available for this cohort. However, the biogeographic ancestry of the Hispanic ethnicity is generally attributed to a three-way admixture of Native American, European, and West African populations [34]. Even with an increased number of training samples, we believe this mixture of ancestral populations would limit the ability to improve recognition accuracy through race/ethnicity-specific training.

C. Age Demographic

All three commercial algorithms had the lowest matching accuracy on subjects grouped in the ages 18 to 30 (see Figs. 5(a)–(c)). The COTS-A matcher performed nearly the same on the 30 to 50 year old cohort as on the 50 to 70 year old cohort. However, COTS-B had slightly higher accuracy on the 30 to 50 age group than on the 50 to 70 age group, while COTS-C performed slightly better on the 50 to 70 age group than on the 30 to 50 age group. The nontrainable algorithms (Figs. 5(d), (e)) also performed the worst on the 18 to 30 age cohort.

When evaluating 4SF on only the 18 to 30 year old cohort (Fig. 5(f)) and the 30 to 50 year old cohort (Fig. 5(g)), the highest performance was achieved when training on the same cohort. Table II lists the exact accuracies. Similar to race, we were able to improve recognition accuracy for the age cohorts by merely changing the distribution of the training set. When comparing the 4SF system that is trained with an equal number of subjects from all age cohorts, the performance on the 18 to 30 year old cohort is the lowest. This is consistent with the accuracies of the commercial face recognition algorithms. The less effective results from training on the 50 to 70 year old cohort are likely due to the small number of training subjects. This is consistent with the training results on the Hispanic cohort, which also had a small number of training subjects.
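The race and age analyses above rest on a train-on-one-cohort, test-on-another comparison (Figs. 4(f)–(i) and 5(f)–(i)). The sketch below shows how such a grid of results could be produced; the matcher is a deliberately simple stand-in (PCA whitening plus cosine distance), not 4SF, and the cohort data are toy placeholders.

```python
"""Sketch of a cross-cohort train/test accuracy grid (illustrative only)."""
import numpy as np

class WhitenedCosineMatcher:
    """Toy trainable matcher: whiten with PCA learned on the training cohort."""
    def fit(self, X):
        X = np.asarray(X, float)
        self.mean = X.mean(axis=0)
        U, S, Vt = np.linalg.svd(X - self.mean, full_matrices=False)
        keep = S > S[0] * 1e-3                      # drop near-null directions
        self.W = Vt[keep].T / S[keep]               # d x k whitening projection
        return self

    def distance(self, a, b):
        za, zb = (a - self.mean) @ self.W, (b - self.mean) @ self.W
        return 1.0 - za @ zb / (np.linalg.norm(za) * np.linalg.norm(zb) + 1e-12)

def rank1_accuracy(matcher, probes, gallery):
    """Fraction of probes whose nearest gallery entry has the same index."""
    hits = 0
    for i, p in enumerate(probes):
        dists = [matcher.distance(p, g) for g in gallery]
        hits += int(np.argmin(dists) == i)
    return hits / len(probes)

def cross_cohort_grid(train_sets, test_sets):
    """train_sets: {cohort: training features}; test_sets: {cohort: (probes, gallery)}."""
    grid = {}
    for train_name, X_train in train_sets.items():
        matcher = WhitenedCosineMatcher().fit(X_train)
        for test_name, (probes, gallery) in test_sets.items():
            grid[(train_name, test_name)] = rank1_accuracy(matcher, probes, gallery)
    return grid

if __name__ == '__main__':
    rng = np.random.default_rng(5)
    def toy_cohort(shift, n_subjects=40, dim=60):
        ids = rng.random((n_subjects, dim)) + shift
        noise = lambda: 0.1 * rng.standard_normal((n_subjects, dim))
        return ids + noise(), (ids + noise(), ids + noise())   # train, (probes, gallery)
    cohorts = {'A': toy_cohort(0.0), 'B': toy_cohort(0.5)}
    grid = cross_cohort_grid({k: v[0] for k, v in cohorts.items()},
                             {k: v[1] for k, v in cohorts.items()})
    for (tr, te), acc in sorted(grid.items()):
        print(f'train {tr} -> test {te}: rank-1 = {acc:.2f}')
```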
D. Impact of Training

The demographic distribution of the training set generally had a clear impact on the performance of different demographic groups. Particularly in the case of race/ethnicity, we see that training on a set of subjects from the same demographic cohort as the one being matched offers an increase in the true accept rate (TAR). This finding on the 4SF algorithm is particularly important because in most operational scenarios, particularly those dealing with forensics and law enforcement, face recognition is not performed in a fully automated, "lights out" mode. Instead, an operator is usually interacting with a face recognition system, performing a one-to-one verification task, or exploring the gallery to group together candidates for further exploitation. Provided such results generalize to other learning algorithms (as we postulate), each of these scenarios can benefit from the use of demographic-enhanced matching algorithms, as described below.

1) Scenario 1—1:N Search: In many large face recognition database searches, the objective is to have the true match candidates ranked high enough to be found by the analyst performing the candidate adjudication. COTS algorithms used for such searches should be trained on datasets where the demographic distributions are evenly distributed across the different cohorts. In cases where a successful match is not found, the analyst will often be able to categorize the demographics of the probe image based on age, gender, and/or race/ethnicity. In such a situation, if the analyst has the option to select a different matching algorithm that has been trained for that specific demographic group, then improved matching results should be expected. A schematic of this process is shown in Fig. 8. The individual in the probe image in Fig. 8 could be searched using an algorithm trained on males, Whites, and ages 18 to 30. If a true match is not found using that algorithm, then a more generic algorithm might be used as a follow-up to further search the gallery. Note that this scenario does not require that the gallery images be preclassified based on specific demographic information. Instead, the algorithm should simply generate higher match scores for subjects that share the characteristics of that demographic cohort. We call this method of face search dynamic face matcher selection. In cases where the demographic of the probe subject is unclear (e.g., a mixed race/ethnicity subject), the matcher trained on all cohorts equally can be used. Examples of improved retrieval instances through applying this technique can be found in Fig. 7.

Fig. 7. Shown are examples where dynamic face matcher selection improved the retrieval accuracy. The last two columns show the less frequent cases where such a technique reduced the retrieval accuracy. Retrieval ranks are out of (a) 8,000 gallery subjects for the White cohort, and (b) 7,992 for the Black cohort. Leveraging demographic information (such as race/ethnicity in this example) allows a face recognition system to perform the matching using statistical models that are tuned to the differences within the specific cohort.

2) Scenario 2—1:1 Verification: It is often the case that investigators will identify a possible match to a known subject and will request an analyst to perform a 1:1 verification of the match. This also happens as a result of a 1:N search, once a potential match to a probe is identified. In either case, the analyst must reach a determination of match or no-match. In fully automated systems, this decision is based on a numerical similarity threshold. In some environments, the analyst is prevented from seeing the similarity score out of concern that his judgment will be biased. But in others, the analyst is permitted to incorporate the match score into his analysis. In either case, it is anticipated that an algorithm trained on a specific demographic group will return higher match scores for true matches than one that was more generic. As a result, the analyst is more likely to get a hit and the 1:1 matching process will be improved.

3) Scenario 3—Verification at Border Crossings: The results presented here provide support for further testing of additional demographic groups, potentially including specific country or geographic region of origin. Assuming such demographics proved effective at improving match scores, the use of dynamic face matcher selection could be extended to immigration or border checks on entering subjects to verify that their passports or other documents accurately reflect their true demographics.
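A minimal sketch of the dynamic face matcher selection logic behind Scenarios 1 and 2 is given below: cohort-specific matchers are indexed by the demographic estimate for the probe, with a fall-back to the matcher trained on all cohorts when the demographics are unclear. The matcher objects and the demographic estimate are hypothetical placeholders.

```python
"""Minimal sketch of dynamic face matcher selection (Scenarios 1 and 2).

The cohort-specific matchers and the demographic estimate for the probe are
placeholders: in practice each matcher would be a face recognition system
trained exclusively on one demographic cohort, and the probe's demographics
would be supplied by an analyst or an automatic classifier.
"""
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Matcher = Callable[[object, object], float]   # (probe, gallery_img) -> similarity

@dataclass
class DemographicEstimate:
    gender: Optional[str] = None      # e.g., 'male' / 'female' / None if unclear
    race: Optional[str] = None        # e.g., 'white' / 'black' / 'hispanic'
    age_group: Optional[str] = None   # e.g., '18-30' / '30-50' / '50-70'

def select_matcher(matchers: Dict[Tuple[str, str, str], Matcher],
                   fallback: Matcher,
                   est: DemographicEstimate) -> Matcher:
    """Pick the matcher trained on the probe's cohort, else the 'All' matcher."""
    key = (est.gender, est.race, est.age_group)
    return matchers.get(key, fallback)

def search(probe, gallery: List[Tuple[str, object]], matcher: Matcher, top_k=50):
    """Rank the gallery by similarity to the probe (1:N search)."""
    scored = [(matcher(probe, img), subject_id) for subject_id, img in gallery]
    scored.sort(reverse=True)
    return scored[:top_k]

if __name__ == '__main__':
    # Toy similarity functions standing in for trained matchers.
    cohort_matcher = lambda probe, img: -abs(probe - img) + 1.0   # sharper scores
    generic_matcher = lambda probe, img: -abs(probe - img)
    matchers = {('male', 'white', '18-30'): cohort_matcher}

    est = DemographicEstimate(gender='male', race='white', age_group='18-30')
    chosen = select_matcher(matchers, generic_matcher, est)
    gallery = [('subj_%d' % i, float(i)) for i in range(10)]
    print(search(3.2, gallery, chosen, top_k=3))
```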
4) Scenario 4—Face Clustering: Another analyst-driven application involves the exploitation of large sets of uncontrolled face imagery. Images encountered in intelligence or investigative applications often include large sets of videos or arbitrary photographs taken with no intention of enrolling them in a face recognition environment. Such image sets offer a great potential for development of intelligence leads by locating multiple pictures of specific individuals and giving analysts an opportunity to link subjects who may be found within the same photographs. Clustering methods are now being used on these datasets to group faces that appear to represent the same subject. Implementations of such clustering methods today usually rely upon a single algorithm to perform the grouping, and an analyst must perform the quality control step to determine if a particular cluster contains only a single individual. By combining multiple demographic-based algorithms into a sequential analysis, it may be possible to improve the clustering of large sets of face images and thereby reduce the time required for the analyst to perform the adjudication of individual clusters.

VII. CONCLUSIONS

This paper examined face recognition performance on different demographic cohorts on a large operational database of 102,942 face images. Three demographics were analyzed: gender (male and female), race/ethnicity (White, Black, and Hispanic), and age (18 to 30 years old, 30 to 50 years old, and 50 to 70 years old).

Fig. 8. Dynamic face matcher selection. The findings in this study suggest that many face recognition scenarios may benefit from multiple face recognition systems that are trained exclusively on different demographic cohorts. Demographic information extracted from a probe image may be used to select the appropriate matcher, and improve face recognition accuracy.

For each demographic cohort, the performances of three commercial face recognition algorithms were measured. The performances of all three commercial algorithms were consistent in that they all exhibited lower recognition accuracies on the following cohorts: females, Blacks, and younger subjects (18 to 30 years old). Additional experiments were conducted to measure the performance of nontrainable face recognition algorithms (local binary pattern-based and Gabor-based), and a trainable subspace method (the Spectrally Sampled Structural Subspace Features (4SF) algorithm). The results of these experiments offered additional evidence to form hypotheses about the observed discrepancies between certain demographic cohorts.

Some of the key findings in this study are:
• The female, Black, and younger cohorts are more difficult to recognize for all matchers used in this study (commercial, nontrainable, and trainable).
• Training face recognition systems on datasets well distributed across all demographics is critical to reduce face matcher vulnerabilities on specific demographic cohorts.
• Face recognition performance on race/ethnicity and age cohorts generally improves when training exclusively on that same cohort.
• In forensic scenarios, the above findings suggest the use of dynamic face matcher selection, where multiple face recognition systems, trained on different demographic cohorts, are available as a suite of systems for operators to select based on the demographic information of a given query image (see Fig. 8).
Finally, as with any empirical study, additional ways to exploit the findings of this research are likely to be found. Of particular interest is the observation that women appear to be more difficult to identify through facial recognition than men. If we can determine the cause of this difference, it may be possible to use that information to improve the overall matching performance. The experiments conducted in this paper should have a significant impact on the design of face recognition algorithms. Similar to the large body of research on algorithms that improve face recognition performance in the presence of other variates known to compromise recognition accuracy (e.g., pose, illumination, and aging), the results in this study should motivate the design of algorithms that specifically target different demographic cohorts within the race/ethnicity, gender and age demographics. By focusing on improving the recognition accuracy on such confounding cohorts (i.e., females, Blacks, and younger subjects), researchers should be able to further reduce the error rates of state of the art face recognition algorithms and reduce the vulnerabilities of such systems used in operational environments. Future studies will seek to confirm the training-based results in this work on other learning algorithms (such as [20]–[23]), as well as study cohorts from multiple demographics (such as White males, Black females, etc.). ACKNOWLEDGMENT The authors would like to thank S. McCallum and the Pinellas County Sheriff’s Office for providing a large database of face images. This study would not have been possible without their invaluable support. Feedback provided by N. Orlans was instrumental in the completion this paper. The authors appreciate P. Grother’s insightful comments on this study. REFERENCES [1] P. Phillips, J. Beveridge, B. Draper, G. Givens, A. O’Toole, D. Bolme, J. Dunlop, Y. M. Lui, H. Sahibzada, and S. Weimer, “An introduction to the good, the bad, & the ugly face recognition challenge problem,” in Proc. Automatic Face Gesture Recognition, 2011, pp. 346–353. [2] A. K. Jain, B. Klare, and U. Park, “Face matching and retrieval in forensics applications,” IEEE Multimedia, vol. 19, no. 1, Jan. 2012, 20 pp.. [3] P. J. Phillips, P. J. Grother, R. J. Micheals, D. Blackburn, E. Tabassi, and J. M. Bone, “Face recognition vendor test 2002: Evaluation report,” National Institute of Standards and Technology (NISTIR), vol. 6965, pp. 1–54, 2003. [4] P. J. Grother, G. W. Quinn, and P. J. Phillips, “MBE 2010: Report on the evaluation of 2D still-image face recognition algorithms,” National Institute of Standards and Technology (NISTIR), vol. 7709, pp. 1–61, 2010. [5] B. Klare, Spectrally sampled structural subspace features (4SF), Michigan State University, Tech. Rep. MSUCSE-11-16, 2011. [6] Handbook of Face Recognition, S. Z. Li and A. K. Jain, Eds., 2nd ed. New York: Springer, 2011. [7] T. Ojala, M. Pietikäinen, and T. Mäenpää, “Multiresolution gray-scale and rotation invariant texture classification with local binary patterns,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 7, pp. 971–987, Jul. 2002. [8] L. Wiskott, J. Fellous, N. Kuiger, and C. von der Malsburg, “Face recognition by elastic bunch graph matching,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 775–779, Jul. 1997. [9] X. Wang and X. Tang, “Random sampling for subspace face recognition,” Int. J. Comput. Vis., vol. 70, no. 1, pp. 91–104, 2006. [10] P. Belhumeur, J. Hespanda, and D. Kriegman, “Eigenfaces vs. 
Fisherfaces: Recognition using class specific linear projection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 711–720, Jul. 1997. [11] B. Moghaddam, T. Jebara, and A. Pentland, “Bayesian face recognition,” Pattern Recognit., vol. 33, no. 11, pp. 1771–1782, 2000. [12] R. K. Bothwell, J. Brigham, and R. Malpass, “Cross-racial identification,” Personality Social Psychol. Bull., vol. 15, pp. 19–25, 1989. [13] P. N. Shapiro and S. D. Penrod, “Meta-analysis of face identification studies,” Psychol. Bull., vol. 100, pp. 139–156, 1986. [14] P. Chiroro and T. Valentine, “An investigation of the contact hypothesis of the own-race bias in face recognition,” Quarterly J. Experimental Psychol., Human Experimental Psychol., vol. 48A, pp. 879–894, 1995. [15] W. Ng and R. C. Lindsay, “Cross-race facial recognition: Failure of the contact hypothesis,” J. Cross-Cultural Psychol., vol. 25, pp. 217–232, 1994. [16] N. Furl, P. J. Phillips, and A. J. O’Toole, “Face recognition algorithms and the other-race effect: Computational mechanisms for a developmental contact hypothesis,” Cognitive Sci., vol. 26, no. 6, pp. 797–815, 2002. KLARE et al.: FACE RECOGNITION PERFORMANCE: ROLE OF DEMOGRAPHIC INFORMATION [17] P. Phillips, A. Narvekar, F. Jiang, and A. OToole, “An other-race effect for face recognition algorithms,” ACM Trans. Appl. Perception, vol. 8, no. 2, pp. 1–11, 2010. [18] B. Klare and A. K. Jain, “Face recognition across time lapse: On learning feature subspaces,” in Proc. Int. Joint Conf. Biometrics, 2011, pp. 1–8. [19] P. Phillips, W. Scruggs, A. O’Toole, P. Flynn, K. Bowyer, C. Schott, and M. Sharpe, “FRVT 2006 and ICE 2006 large-scale experimental results,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 5, pp. 831–846, May 2010. [20] C. Liu and H. Wechsler, “Gabor feature based classification using the enhanced fisher linear discriminant model for face recognition,” IEEE Trans. Image Process., vol. 11, no. 4, pp. 467–476, Apr. 2002. [21] X. Jiang, B. Mandal, and A. Kot, “Eigenfeature regularization and extraction in face recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 3, pp. 383–394, Mar. 2008. [22] X. Jiang, “Linear subspace learning-based dimensionality reduction,” IEEE Signal Process. Mag., vol. 28, no. 2, pp. 16–26, Mar. 2011. [23] X. Wang and X. Tang, “A unified framework for subspace face recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 9, pp. 1222–1228, Sep. 2004. [24] R. McCabe, “Data format for the interchange of fingerprint, facial, scar mark & tattoo (SMT) information,” in American National Standard ANSI/NIST-ITL 1-2000, 2000. [25] A. J. O’Toole, P. J. Phillips, X. An, and J. Dunlop, “Demographic effects on estimates of automatic face recognition performance,” Image Vis. Comput., vol. 30, no. 3, pp. 169–176, 2012. [26] B. Klare and A. Jain, “On a taxonomy of facial features,” in Proc. IEEE Conf. Biometrics: Theory, Applications and Systems, 2010, pp. 1–8. [27] T. Ahonen, A. Hadid, and M. Pietikainen, “Face description with local binary patterns: Application to face recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 12, pp. 2037–2041, Dec. 2006. [28] X. Tan and B. Triggs, “Enhanced local texture feature sets for face recognition under difficult lighting conditions,” IEEE Trans. Image Process., vol. 19, no. 6, pp. 1635–1650, Jun. 2010. [29] L. Shen and L. Bai, “A review on Gabor wavelets for face recognition,” Pattern Anal. Applicat., vol. 9, pp. 273–292, 2006. [30] M. Riesenhuber and T. 
Poggio, “Hierarchical models of object recognition in cortex,” Nature Neurosci., vol. 2, no. 11, pp. 1019–1025, 1999. [31] E. Meyers and L. Wolf, “Using biologically inspired features for face processing,” Int. J. Comput. Vis., vol. 76, no. 1, pp. 93–104, 2008. [32] T. K. Ho, “The random subspace method for constructing decision forests,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 8, pp. 832–844, Aug. 1998. [33] L. Breiman, “Bagging predictors,” Machine Learning, vol. 24, pp. 123–140, 1996. [34] X. Mao et al., “A genomewide admixture mapping panel for hispanic/latino populations,” Amer. J. Human Genetics, vol. 80, no. 6, pp. 1171–1178, 2007. Brendan F. Klare (S’09–M’11) received the B.S. and M.S. degrees in computer science from the University of South Florida in 2007 and 2008, respectively, and the Ph.D. degree in computer science from Michigan State University in 2012. From 2001 to 2005 he served as an airborne ranger infantryman in the 75th Ranger Regiment. He has authored several papers on the topic of face recognition, and he received the Honeywell Best Student Paper Award at the 2010 IEEE Conference on Biometrics: Theory, Applications and Systems (BTAS). His other research interests include pattern recognition, image processing, and computer vision. 1801 Mark J. Burge (M’99–SM’05) is a scientist with The MITRE Corporation, McLean, VA. His research interests include image processing, pattern recognition, and biometric authentication. He served as a Program Director at the National Science Foundation, a research scientist at the Swiss Federal Institute of Science (ETH) in Zurich, Switzerland, and the TUWien in Vienna, Austria. While a tenured computer science professor, he coauthored the three-volume set, Principles of Digital Image Processing which has been translated into both German and Chinese. He holds patents in multispectral iris recognition and is the coeditor of Springer Verlag’s forthcoming Handbook of Iris Recognition. Joshua C. Klontz is a scientist with The MITRE Corporation, McLean, VA. He received the B.S. degree in computer science from Harvey Mudd College in 2010. His research interests include pattern recognition and cross-platform C++ software development. He also has an academic interest in epidemiology and is recently published in the Journal of Food Protection. Richard W. Vorder Bruegge (M’08) is a Senior Level Photographic Technologist for the Federal Bureau of Investigation. In this role, he is responsible for overseeing FBI science and technology developments in the imaging sciences. He serves as the FBI’s subject matter expert for face recognition and is the current chair of the Facial Identification Scientific Working Group. He has multiple publications on forensic image analysis and biometrics and he was coeditor of Computer-Aided Forensic Facial Identification (2010). He has testified as an expert witness over 60 times in criminal cases in the United States and abroad. Dr. Vorder Bruegge is a fellow of the American Academy of Forensic Sciences and was named a Director of National Intelligence Science and Technology Fellow in January 2010. Anil K. Jain (S’70–M’72–SM’86–F’91) is a university distinguished professor in the Department of Computer Science and Engineering at Michigan State University. His research interests include pattern recognition and biometric authentication. He served as the Editor-in-Chief of the IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (1991–1994). 
The holder of six patents in the area of fingerprints, he is the author of a number of books, including Introduction to Biometrics (2011), Handbook of Fingerprint Recognition (2009), Handbook of Biometrics (2007), Handbook of Multibiometrics (2006), Handbook of Face Recognition (2005), Biometrics: Personal Identification in Networked Society (1999), and Algorithms for Clustering Data (1988). He served as a member of the Defense Science Board and The National Academies committees on Whither Biometrics and Improvised Explosive Devices. Dr. Jain received the 1996 IEEE TRANSACTIONS ON NEURAL NETWORKS Outstanding Paper Award and the Pattern Recognition Society best paper awards in 1987, 1991, and 2005. He is a fellow of the AAAS, ACM, IAPR, and SPIE. He has received Fulbright, Guggenheim, Alexander von Humboldt, IEEE Computer Society Technical Achievement, IEEE Wallace McDowell, ICDM Research Contributions, and IAPR King-Sun Fu awards. ISI has designated him a highly cited researcher. According to Citeseer, his book Algorithms for Clustering Data (Englewood Cliffs, NJ: Prentice-Hall, 1988) is ranked #93 in most cited articles in computer science.