Proc. CVPR 2011 Workshop on Biometrics

Facial Marks as Biometric Signatures to Distinguish between Identical Twins

Nisha Srinivas, Gaurav Aggarwal, Patrick J. Flynn
Department of Computer Science and Engineering, University of Notre Dame
{nsriniva, gaggarwa, flynn}@nd.edu

Richard W. Vorder Bruegge
Federal Bureau of Investigation, Digital Evidence Laboratory
Building 27958A, Pod E, Quantico, VA 22135
Richard.VorderBruegge@ic.fbi.gov

Abstract

Identical twins have highly similar facial appearance, which makes it difficult for even state-of-the-art face matching systems to distinguish between them. Twin births have increased consistently in recent decades. We therefore need alternative ways to characterize facial appearance that can address this challenging task, one that has eluded even humans. In this paper, we investigate the usefulness of facial marks as biometric signatures, with a focus on distinguishing between identical twins. We define and characterize a set of facial marks that are manually annotated by multiple observers. The geometric distribution of the annotated facial marks, along with their respective categories, is used to characterize twin face images. The analysis is conducted on 295 twin face images acquired at the Twins Days Festival in Twinsburg, Ohio, in 2009. The results of our analysis indicate that the distribution of facial marks is useful as a biometric signature. In addition, contrary to prior research, our results indicate some degree of correlation between the positions of facial marks belonging to identical twins.

Figure 1: A pair of identical twins from the dataset: (a) Twin A and (b) Twin B. A high degree of facial similarity can be observed.

1. Introduction

Humans find it difficult to distinguish between monozygotic (identical) twins because their facial appearances are highly similar. Even state-of-the-art face recognition systems perform poorly when trying to distinguish between identical twins [1]. Traditionally, this has been considered a problem of only academic interest. However, twin births have increased consistently in recent decades (the total increase in twin births since 1980 is 70%) [2], and the problem has become a pertinent challenge for law enforcement agencies. In several criminal cases, either both or neither of the identical twins were convicted because the correct identity of the perpetrator could not be determined.

This has led to recent interest in the usefulness of biometric traits such as facial appearance, fingerprints and iris for distinguishing between identical twins [1] [3]. These studies indicate that iris and fingerprints perform reasonably well on twin images. In contrast, automatic face matching degrades significantly when asked to distinguish between images of identical twins. Novel characterizations developed for this task could also enrich the facial characterizations used in automatic systems and improve face matching performance in general.

In this paper, we propose to differentiate between identical twins using facial marks. Facial marks are considered to be unique and inherent characteristics of an individual. High-resolution images enable us to observe more micro-scale details on the face [4]. Facial marks are defined as visible changes in the skin that differ in texture, shape and color from the surrounding skin [5]. We have defined eleven types of facial marks for the analysis, including moles, freckles, freckle groups, darkened skin and lightened skin.

Figure 2: Proposed approach to differentiate between identical twins.

The paper is organized as follows. Section 2 briefly discusses related work. Section 3 describes the different categories of facial marks.
Section 4 describes the manual annotation process, and Section 5 describes the proposed matching process. Section 6 presents the dataset, the experimental setup and the results. The paper concludes with a brief summary and discussion.

Figure 2 shows the approach to distinguishing between monozygotic twins based on facial marks. First, each image in the dataset is manually annotated to identify the different types of perceptible facial marks. The distribution of the annotated facial marks is then used to characterize the face image. The mark locations are converted from image pixel coordinates to barycentric coordinates so that the geometric distribution of marks can be compared across images. The similarity of two such distributions determines the similarity between the two face images. It is computed by formulating a bipartite graph matching problem that puts the facial marks in one image into correspondence with those in the other.

Extensive experiments are conducted on 295 twin face images belonging to 157 unique subjects. The data used for the investigation was acquired at the Twins Days Festival in Twinsburg, Ohio, in 2009. Figure 1 presents images of a pair of identical twins from this dataset. The presented results highlight the merit of the proposed facial characterization for distinguishing between identical twins. Prior research has claimed that twins have a similar number of facial marks but a different distribution of facial marks [6]. We therefore also check whether our results agree with these conjectures. As suggested earlier, the number of facial marks does appear to be correlated across identical twins. However, contrary to the commonly held belief, our results indicate non-trivial correlation between the distributions of facial marks across identical twins.

2. Previous Work

Traditionally, biometrics research has focused on developing robust characterizations and systems. These must cope with variations in acquisition conditions (such as pose, illumination and distance from the sensor) and with noise in the acquired data. Only recently have researchers begun to examine the challenges involved in distinguishing between identical twins. Here we provide pointers to a few of the relevant investigations. Kong et al. [7] observed that palmprints from identical twins have correlated features, though they were able to distinguish between them using other non-genetic information. Jain et al. [8] made the same observation for fingerprints: although fingerprints appear to be more correlated for identical twins, fingerprint matching systems can still distinguish between them. Daugman and Downing [3] compared genetically identical irises and found them to be as uncorrelated as the iris patterns of unrelated persons. Kodate et al. [9] experimented with 10 sets of identical twins using a 2D face recognition system. Recently, Sun et al. [1] studied the distinctiveness of fingerprint, face and iris biometrics in identical twins. They observed that iris and fingerprints show little to no performance degradation on identical twins, whereas face matchers find it hard to distinguish between them. All these studies were either conducted on very small twin biometric datasets or evaluated using existing in-house or commercial matchers. We build on these efforts and present an approach that characterizes faces using facial marks to address this challenging task. Jain and Park [10] recently used facial marks as a soft biometric for face recognition. They fused a commercial face matcher with facial marks and observed a small improvement in matching performance on non-twin datasets.

3. Facial Marks used for Biometric Characterization

The objective is to discriminate between identical twins solely on the basis of features extracted from facial marks. A facial mark can be defined as a region of skin or superficial growth that does not resemble the surrounding skin. We have identified and defined the following facial marks (shown in Figure 3):

1. Mole: A small flat spot less than 1 cm in diameter. Its color differs from that of the nearby skin. It appears in a variety of shapes and is normally black in color.
2. Freckle: A small flat spot less than 1 cm in diameter that appears in a variety of shapes. It is usually brown in color.
3. Freckle group: A cluster of freckles.
4. Lightened patch: A flat spot more than 1 cm in diameter that appears in different shapes. It is usually brown, red, or lighter in color than its surroundings.
5. Darkened patch: A flat spot more than 1 cm in diameter that appears in different shapes. It is darker in color than its surroundings because of a local concentration of melanin.
6. Birthmark: A persistent visible mark on the skin that is evident at birth or shortly thereafter. Birthmarks are generally large and pink, red, or brown in color.
7. Splotchiness: An irregularly shaped spot, stain, or colored or discolored area.
8. Raised skin: A solid, raised mark less than 1 cm across. It has a rough texture and is red, pink, or brown in color.
9. Scar: Discolored tissue that permanently replaces normal skin after destruction of the epidermis.
10. Pockmark: A hollow area caused by scratching or picking of a primary scar.
11. Pimple: A raised lesion that is temporary in nature.

Figure 3: The different categories of facial marks defined in this work.

4. Facial Annotation

The first step of the proposed approach is to manually annotate the different types of facial marks in each image of the dataset. The twins dataset was annotated using MarkIt¹, a facial annotation tool developed at our laboratory to help users annotate images manually. MarkIt has three main components:

1. Display component: Displays the images so that the various facial marks can be observed and annotated.
2. Annotation component: Contains a list of pre-defined facial marks for annotation.
3. Tools component: Provides bounding boxes of different shapes for performing the actual annotations.

The metadata associated with each image consists of the number, type and location of the annotated facial marks. The manual annotations give insight into the categories and number of facial marks in the dataset and their locations. They also show how human observers perceive and differentiate between facial marks.

Three different individuals, referred to as observers, annotated the identical twins dataset. Using several observers reduces individual bias and lets us measure how consistently the various types of facial marks are annotated. All observers were given the definitions of the facial marks used in this investigation, along with example markings on a few sample images. Figure 4 shows the facial marks identified and annotated by a single observer on an image from the dataset. Figure 5 shows the category-wise distribution of the number of facial marks annotated by each observer.

Figure 4: Facial marks identified and annotated by a single observer (Observer-3).

Figure 5: Variation in the number of facial marks of different types identified by each observer.

Table 1: The total number of facial marks annotated by each observer for the twins dataset.

Observer ID    Number of Facial Marks Annotated
Observer-1     3785
Observer-2     2311
Observer-3     5100

Table 1 shows the total number of facial marks annotated by each observer on the entire dataset. Even these gross statistics show large variation in how the three observers perceived facial marks: Observer-3 annotated more than twice as many marks as Observer-2. All observers appear to have annotated the prominent facial marks, but some failed to annotate the less prominent yet still visible marks.

¹ MarkIt was designed and implemented by Matthew Pruitt at the Computer Vision Research Lab (CVRL), University of Notre Dame.

5. Facial Marks based Matching

The goal of this effort is to show that facial marks are useful for characterizing face images and for discriminating between identical twins. We therefore propose a very simple matching approach that characterizes each face image by the geometric distribution of its annotated facial marks. The similarity between two face images is computed as the similarity between their distributions of annotated marks. The details of the matching process are as follows.

Each facial mark is characterized by its category (one of the eleven categories described in Section 3) and its geometric location on the face image. To compare mark locations across images, they are transformed from image space to a normalized barycentric coordinate space. Barycentric coordinates are normalized homogeneous coordinates that are invariant to scale, rotation and translation. They are computed with respect to a reference triangle that forms the basis of the transformation. In this work, the reference triangle is defined by the center of each eye and the tip of the nose, which are localized automatically using STASM [11]. Given the reference triangle, any point, inside or outside it, can be written as a weighted combination of the triangle's three vertices with weights that sum to one. These weights are the point's barycentric coordinates.

5.1. Bipartite Graph Matching

The similarity between two sets of facial marks is computed by formulating the matching problem as a weighted bipartite graph. A weighted bipartite graph is defined as G = (S, T; E), where S and T are two disjoint sets of vertices and E is the set of connecting edges, each with a nonnegative cost. A bipartite graph with N1 + N2 nodes is constructed, corresponding to the N1 facial marks in image I1 and the N2 facial marks in image I2. The edges correspond to potential matches between facial marks in I1 and I2. The nonnegative weight of each edge is a function of the Euclidean distance between the normalized locations of the two marks. Only facial marks of the same category should be compared, e.g., mole against mole and freckle against freckle. Edges connecting marks of different categories are therefore given infinite weight, which prevents correspondences between categories.

Facial marks in I1 are compared to facial marks in I2 by computing the Euclidean distance between the normalized centroids of the two marks. A pair of marks of the same category is a potential match if this distance is less than a threshold λ. The optimal correspondences are then established by running the standard Hungarian bipartite graph matching algorithm [12] on the set of potential matches. The normalized similarity score between images Ii and Ij is defined as

$$ s_{i,j} = \frac{M}{\max(N_i, N_j)} \qquad (1) $$

where M is the total number of correspondences established between the two images, and Ni and Nj are the numbers of facial marks in Ii and Ij. This similarity is used as the match score between the two images.

6. Experimental Evaluation

This section presents the experiments conducted to evaluate the proposed facial-marks-based characterization of faces, and their results. All experiments are conducted on the twins dataset, which consists of 295 images of 157 unique subjects. The query set is defined as the images of unidentified persons that are compared against the target set.
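The matching procedure of Section 5, which produces every score evaluated in this section, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. Each mark is a hypothetical (category, (x, y)) pair, and the eye centers and nose tip are assumed to be already localized (the paper uses STASM [11] for this step). All function names are illustrative. The Hungarian step is a small self-contained version of the classic potentials-based algorithm. One detail deliberately departs from the text: the "infinite" weight on invalid edges (different categories, or a distance of at least λ) is replaced by a large finite penalty, because infinities inside the Hungarian potentials would produce undefined values. The penalty exceeds any possible total of valid costs, so the minimum-cost assignment still uses as many valid pairs as possible, and M is the number of assigned pairs that are valid.

```python
import math

# Sketch only. Assumptions (not from the paper): a mark is a hypothetical
# (category, (x, y)) pair in pixel coordinates, and the reference triangle
# (left eye centre, right eye centre, nose tip) is already localized.


def barycentric(p, a, b, c):
    """Barycentric coordinates of point p w.r.t. the triangle (a, b, c).

    The weights sum to one and reproduce p as l1*a + l2*b + l3*c, so they are
    unchanged when p and the triangle are scaled, rotated or translated.
    """
    (x, y), (x1, y1), (x2, y2), (x3, y3) = p, a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    l1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    l2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    return (l1, l2, 1.0 - l1 - l2)


def normalize(marks, left_eye, right_eye, nose_tip):
    """Map (category, (x, y)) marks into barycentric coordinate space."""
    return [(cat, barycentric(xy, left_eye, right_eye, nose_tip))
            for cat, xy in marks]


def hungarian(cost):
    """Minimum-cost assignment for an n x m cost matrix with n <= m.

    Returns the column assigned to each row.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)
    p, way = [0] * (m + 1), [0] * (m + 1)  # p[j]: row (1-based) on column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:  # grow an alternating tree until a free column is hit
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def count_correspondences(marks1, marks2, lam):
    """M: same-category mark pairs matched at a distance below lam."""
    if not marks1 or not marks2:
        return 0
    if len(marks1) > len(marks2):  # hungarian() needs rows <= columns
        marks1, marks2 = marks2, marks1
    # Finite stand-in for the "infinite" weight on invalid edges. It exceeds
    # any total of valid costs (each below lam), so the optimal assignment
    # uses as many valid pairs as possible.
    big = lam * (len(marks1) + len(marks2)) + 1.0

    def edge_cost(c1, p1, c2, p2):
        d = math.dist(p1, p2)
        return d if c1 == c2 and d < lam else big

    cost = [[edge_cost(c1, p1, c2, p2) for c2, p2 in marks2]
            for c1, p1 in marks1]
    assignment = hungarian(cost)
    return sum(1 for i, j in enumerate(assignment) if cost[i][j] < big)


def similarity(marks1, marks2, lam):
    """Normalized score of Eq. (1): s = M / max(Ni, Nj)."""
    n = max(len(marks1), len(marks2))
    return count_correspondences(marks1, marks2, lam) / n if n else 0.0
```

For example, `similarity(normalize(marks_a, *tri_a), normalize(marks_b, *tri_b), lam)` scores two annotated images. Here `tri_a` and `tri_b` hold each image's (left eye, right eye, nose tip) pixel positions, and `lam` plays the role of the threshold λ, which the user must choose in barycentric units.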
6.1. Dataset

The twins dataset used in this investigation consists of face images of identical twins acquired in 2009 at the Twins Days Festival in Twinsburg, Ohio [13]. Face images were captured under different conditions: controlled and uncontrolled lighting, with and without eyeglasses, different facial expressions (smiling or neutral), and different poses with yaw ranging from 0 to 180 degrees. The experiments use only frontal (yaw = 0) face images with no glasses, no facial hair and a neutral expression. These images were captured indoors. The dataset consists of 295 images corresponding to 157 subjects and 76 pairs of twins. For a few subjects (five out of 157), no image of the twin is present in the dataset. The number of images per subject varies from one to six. The image resolution is 2868 × 4310 pixels, with the face occupying a large region of the image. These high-resolution images reveal more micro-scale details on the face than low-resolution images would.

6.2. Experimental Setup

We perform two types of experiments to evaluate the proposed characterization: individual observer experiments and inter-observer experiments. In each experiment, the query set and the target set each consist of all 295 available face images. The target set is defined as the gallery of persons to be recognized.

In the individual observer experiment, the query set is compared against the target set annotated by the same observer. Since three observers annotated the identical twins dataset, this experiment is executed three times. In the inter-observer experiment, the query set annotated by one observer is compared against the target set annotated by another observer. For example, comparing a query set annotated by Observer-1 against a target set annotated by Observer-2 is denoted Observer-1 vs Observer-2. The other pairings are Observer-1 vs Observer-3 and Observer-2 vs Observer-3.

For each of these experiments, we define two scenarios for comparing the query set against the target (gallery) set and generating performance curves. The first scenario performs only twin comparisons, and the second performs twin-unrelated persons comparisons. The two scenarios differ only in how the impostor scores are generated. In both scenarios, the genuine scores are obtained by comparing face images of a subject in the query set against the other images of the same subject in the target set. In the twin comparison scenario, the impostor scores are obtained by comparing a face image of a subject only against the images of that subject's twin. In the twin-unrelated persons scenario, the impostor scores are obtained by comparing a face image of a subject against the images of all other subjects.

The experiments are first executed with all the defined categories of facial marks. To determine whether some categories improve verification performance more than others, the following subsets of facial marks are also considered:

1. FM1 = moles, freckles, freckle group, birthmark, darkened patch, lightened patch, splotchiness, raised skin, pockmark, scar round, scar linear (i.e., all categories except pimple)
2. FM2 = moles, freckles
3. FM3 = moles, freckles, pimple

6.3. Individual Observer Experiments

In these experiments, the query images and the target images are annotated by the same observer. Figure 6 and Figure 7 show the Receiver Operating Characteristic (ROC) curves for the twin comparison and unrelated-persons comparison scenarios. The curves cover the different subsets of facial marks annotated by one of the observers (Observer-3).
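The two comparison scenarios of Section 6.2, and the Equal Error Rate (EER) reported in Tables 2 to 5, can be sketched as below. This is an illustrative sketch, not the authors' evaluation code. The image and subject identifiers, the twin map and the `score` callable are hypothetical inputs, and the category names passed to `restrict` are assumed labels. The EER is approximated by sweeping thresholds over the observed scores and taking the operating point where the false accept rate (FAR) and the false reject rate (FRR) are closest. Following the text, the unrelated-persons impostor set contains comparisons against all other subjects, which includes the subject's twin.

```python
from itertools import product

# Sketch only: `images` is a hypothetical list of (image_id, subject_id)
# pairs, `twin_of` maps a subject to its twin (absent for subjects without a
# twin image), and `score(q, t)` is any matcher, e.g. Eq. (1).


def restrict(marks, categories):
    """Keep only marks whose category is in a subset such as FM2."""
    return [(cat, loc) for cat, loc in marks if cat in categories]


def score_sets(images, twin_of, score):
    """Return genuine, twin-impostor and unrelated-impostor score lists."""
    genuine, twin_impostor, unrelated_impostor = [], [], []
    for (q_img, q_subj), (t_img, t_subj) in product(images, repeat=2):
        if q_img == t_img:
            continue  # never compare an image with itself
        s = score(q_img, t_img)
        if q_subj == t_subj:
            genuine.append(s)  # other images of the same subject
        else:
            unrelated_impostor.append(s)  # all other subjects
            if twin_of.get(q_subj) == t_subj:
                twin_impostor.append(s)  # the subject's twin only
    return genuine, twin_impostor, unrelated_impostor


def equal_error_rate(genuine, impostor):
    """Approximate EER (both lists must be non-empty; accept if s >= t).

    Sweeps thresholds over the observed scores and returns the mean of FAR
    and FRR at the threshold where the two are closest.
    """
    best_gap, eer = float("inf"), None
    for t in sorted(set(genuine) | set(impostor)) + [float("inf")]:
        far = sum(s >= t for s in impostor) / len(impostor)
        frr = sum(s < t for s in genuine) / len(genuine)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

Calling `equal_error_rate(genuine, twin_impostor)` and `equal_error_rate(genuine, unrelated_impostor)` on the same genuine scores gives the twin and unrelated-persons EERs for one observer and one facial mark subset. To get subset results, apply `restrict` to every image's marks before scoring, with a subset such as `{"mole", "freckle"}` for FM2.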
Using all facial mark categories for verification gives better performance than using any of the subsets. Similar trends are observed in both comparison scenarios. Figure 9 shows the distributions of match and non-match scores for both scenarios.

Figure 6: ROC curves of twin comparisons for various sets of facial marks annotated by Observer-3.

Figure 7: ROC curves of non-twin comparisons for various sets of facial marks annotated by Observer-3.

Figure 8 compares performance in the twin comparison experiment against the unrelated-persons comparison. Performance is better when comparing facial marks across unrelated persons than across identical twins. This suggests that the geometric distribution of facial marks is more similar across twins than across unrelated persons. In other words, the distribution of facial marks appears to be correlated across twins. Prior research claims that the distribution of facial marks differs across identical twins [6], but our results do not exhibit this phenomenon. To assess the statistical significance of this observation, the experiment is repeated multiple times with 50 twin pairs randomly selected in each run. The error bars in Figure 8 show the range of performance curves obtained. The two sets of error bars do not appear to overlap for false accept rates of 0.4 and lower, which indicates that the performance difference is statistically significant. Hence, the distribution of facial marks appears to be correlated across identical twins.

Figure 8: Comparing performance between twin comparisons and non-twin comparisons for Observer-3 annotations. The error bars indicate the range of performance curves obtained by repeating the experiments with randomly selected subjects from the dataset.

Figure 9: The distributions of match and non-match scores for twin and non-twin comparisons.

Table 2 and Table 3 present the Equal Error Rates (EER) obtained for all three observers in the individual observer experiments. The difference in EERs between the two tables again supports our observation that facial marks are more similarly distributed across identical twins than across any two unrelated persons. For example, with all facial mark categories (FM), the EER for Observer-3 rises from 6.96% for unrelated-persons comparisons to 13.13% for twin comparisons.

Table 2: Equal error rates computed for unrelated-persons comparisons for each observer. FM is the set of all facial mark categories, and FM1, FM2 and FM3 are the subsets defined in Section 6.2.

Facial Mark Set   EER Observer-1   EER Observer-2   EER Observer-3
FM                6.98%            10.87%           6.96%
FM1               8.03%            11.98%           8.27%
FM2               10.99%           15.96%           11.87%
FM3               11.16%           16.40%           9.42%

Table 3: Equal error rates computed for twin comparisons for each observer. FM is the set of all facial mark categories, and FM1, FM2 and FM3 are the subsets defined in Section 6.2.

Facial Mark Set   EER Observer-1   EER Observer-2   EER Observer-3
FM                14.12%           18.27%           13.13%
FM1               15.78%           20.84%           14.35%
FM2               20.09%           21.68%           19.58%
FM3               19.06%           20.22%           16.27%

6.4. Inter-Observer Experiments

In these experiments, biometric scores are computed by comparing a query set annotated by one observer to a target set annotated by a different observer. Figure 10 shows the performance curves for Observer-1 vs Observer-3, for both twin comparisons and twin-unrelated comparisons. Tables 4 and 5 present the EERs for Observer-1 vs Observer-2, Observer-1 vs Observer-3 and Observer-2 vs Observer-3, for twin-unrelated comparisons and twin comparisons, respectively. The ROCs are obtained using all categories of facial marks for both the twin and unrelated-persons scenarios. Performance degrades considerably when facial marks annotated by different observers are compared against each other. For twin comparisons, for example, the EER ranges from 13.13% to 18.27% in the individual observer experiments but from 23.71% to 35.08% across observers. This degradation is caused by variation in the facial marks the observers annotated: all observers annotated the prominently visible marks, but some failed to annotate the less prominent ones. Even in these experiments, however, non-twin comparisons perform better than twin comparisons. This indicates some similarity in the distribution of facial marks across twins that is absent between unrelated persons.

Figure 10: ROC curves for inter-observer analysis for twin and non-twin comparisons for Observer-1 vs Observer-3.

Table 4: Equal error rates obtained for unrelated-persons comparisons for facial mark annotations across observers.

Comparison of Annotations across Observers   EER
Observer-1 vs Observer-2                     28.11%
Observer-1 vs Observer-3                     18.64%
Observer-2 vs Observer-3                     28.17%

Table 5: Equal error rates obtained for twin comparisons for facial mark annotations across observers.

Comparison of Annotations across Observers   EER
Observer-1 vs Observer-2                     35.08%
Observer-1 vs Observer-3                     23.71%
Observer-2 vs Observer-3                     32.71%

7. Summary and Discussion

In this paper, we analyzed the usefulness of facial marks as a potential biometric signature for face verification. We proposed a system that distinguishes between identical twins solely on the basis of the geometric distribution of facial marks. The experiments were designed to determine whether facial marks are more correlated across twins than across unrelated persons. The results suggest such a correlation: certain facial marks appear to be in similar positions for twins. Nevertheless, the investigation makes a case for using facial marks in biometric characterization. Facial mark information can be combined with existing textural features to enrich facial characterizations and improve performance. Manual annotation of facial marks is a difficult task, which is why performance degrades when comparing facial marks annotated by different observers. Robust automatic facial mark detection methods could reduce this degradation. In future work, we aim to develop automatic facial mark detection techniques and to explore richer facial mark characteristics such as texture, shape and color. We also aim to extract other features associated with facial marks.

Acknowledgement

This research was supported by the U.S. Department of Justice/National Institute of Justice under grant 2009-DNBX-K231.

References

[1] Z. Sun, A. A. Paulino, J. Feng, Z. Chai, T. Tan, and A. K. Jain, "A study of multibiometric traits of identical twins," in Proc. SPIE, Biometric Technology for Human Identification VII, April 2010.
[2] A. J. Martin, H.-C. Kuang, T. J. Mathews, D. L. Hoyert, D. M. Strobino, B. Guyer, and S. R. Sutton, "Annual summary of vital statistics: 2006," Pediatrics, pp. 788–801, 2008.
[3] J. Daugman and C. Downing, "Epigenetic randomness, complexity and singularity of human iris patterns," Proceedings: Biological Sciences, vol. 268, no. 1477, pp. 1737–1740, 2001.
[4] D. Lin and X. Tang, "Recognize high resolution faces: From macrocosm to microcosm," in Proc. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 1355–1362, 2006.
[5] http://skin-care.health-cares.net/skin-lesions.php. Last accessed: 07/2010.
[6] G. Zhu, D. L. Duffy, A. Eldridge, M. Grace, C. Mayne, L. O'Gorman, J. F. Aitken, M. C. Neale, N. K. Hayward, A. C. Green, and N. G. Martin, "A major quantitative-trait locus for mole density is linked to the familial melanoma gene CDKN2A: A maximum-likelihood combined linkage and association analysis in twins and their sibs," American Journal of Human Genetics, vol. 65, pp. 483–492, Aug. 1999.
[7] A. Kong, D. Zhang, and G. Lu, "A study of identical twins palmprints for personal authentication."
[8] A. K. Jain, S. Prabhakar, and S. Pankanti, "On the similarity of identical twin fingerprints," 2002.
[9] E. W. Kashiko Kodate, Rieko Inaba and T. Kamiya, "Facial recognition by a compact parallel optical correlator," Measurement Science and Technology, vol. 13, Nov. 2002.
[10] A. K. Jain and U. Park, "Facial marks: Soft biometric for face recognition," in Proc. ICIP, pp. 37–40, 2009.
[11] S. Milborrow and F. Nicolls, "Locating facial features with an extended active shape model," in Proc. ECCV, 2008. http://www.milbo.users.sonic.net/stasm.
[12] H. W. Kuhn, "The Hungarian method for the assignment problem," in 50 Years of Integer Programming 1958–2008 (M. Jünger, T. M. Liebling, D. Naddef, G. L. Nemhauser, W. R. Pulleyblank, G. Reinelt, G. Rinaldi, and L. A. Wolsey, eds.), pp. 29–47, Springer Berlin Heidelberg, 2010.
[13] http://www.twinsdays.org/.