Analysis of Facial Marks to Distinguish Between Identical Twins

Nisha Srinivas, Gaurav Aggarwal, Patrick J. Flynn, Fellow, IEEE, and Richard W. Vorder Bruegge

Abstract—Identical twin face recognition is a challenging task due to the existence of a high degree of correlation in overall facial appearance. Commercial face recognition systems exhibit poor performance in differentiating between identical twins under practical conditions. In this paper, we study the usability of facial marks as biometric signatures to distinguish between identical twins. We propose a multiscale automatic facial mark detector based on a gradient-based operator known as the fast radial symmetry transform. The transform detects bright or dark regions with high radial symmetry at different scales. Next, the detections are tracked across scales to determine the prominence of facial marks. Extensive experiments are performed both on manually annotated and on automatically detected facial marks to evaluate the usefulness of facial marks as biometric signatures. Experimental results are based on identical twin images acquired at the 2009 Twins Days Festival in Twinsburg, Ohio. The results of our analysis signify the usefulness of the distribution of facial marks as a biometric signature. In addition, our results indicate the existence of some degree of correlation between the geometric distributions of facial marks across identical twins.

Index Terms—Face recognition, facial marks, identical twins.

Manuscript received December 08, 2011; revised June 04, 2012; accepted June 11, 2012. Date of publication June 29, 2012; date of current version September 07, 2012. This work was supported by the U.S. Department of Justice/National Institute of Justice under Grant 2009-DN-BX-K231. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Jaihie Kim. N. Srinivas, G. Aggarwal, and P. J. Flynn are with the Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556 USA (e-mail: nsriniva@nd.edu; gaggarwa@nd.edu; flynn@nd.edu). R. W. Vorder Bruegge is with the Federal Bureau of Investigation, Digital Evidence Laboratory, Quantico, VA 22135 USA (e-mail: Richard.VorderBruegge@ic.fbi.gov). Digital Object Identifier 10.1109/TIFS.2012.2206027

I. INTRODUCTION

THE ability to distinguish between identical twins based on different biometric modalities such as face, iris, fingerprint, etc., is a challenging and interesting problem in the biometrics area [1]–[5]. Identical twins (also known as monozygotic twins) are formed when a zygote splits and forms two embryos. They cannot be discriminated based on DNA. Therefore, other biometric traits are needed to distinguish between identical twins. Using face recognition to differentiate between identical twins is very difficult [3] because of the high degree of similarity in their overall facial appearance. In this paper we focus on distinguishing between monozygotic twins based on localized facial features known as facial marks.

Traditionally, biometrics research has focused primarily on developing robust characterizations and systems to deal with challenges posed by variations in acquisition conditions (such as pose, illumination condition, distance from the sensor, etc.) and the presence of noise in the acquired data [6]. Only recently have researchers started to look at the challenges involved in the task of distinguishing between identical twins [1], [3]. Developing techniques and systems that improve twin face recognition should also improve generic face recognition systems. Although identical twins represent only 0.5% of the global population [2], failure to correctly identify each twin has led to problems for law enforcement agencies [1]. There have been several criminal cases in which either both or neither of the identical twins was convicted due to the difficulty in determining the correct identity of the perpetrator [1].

Fig. 1. A pair of identical twins from the identical twins dataset. We observe a high degree of overall facial similarity and a difference in the number and type of facial marks.

In this paper, we propose to differentiate between identical twins using facial marks alone. Facial marks are considered to be unique and inherent characteristics of an individual. Fig. 1 shows a pair of identical twins from the dataset. Although they are similar in appearance, they can be distinguished using facial marks. High-resolution images enable us to capture these finer details on the face [7]. Facial marks are defined as visible changes in the skin that differ in texture, shape, and color from the surrounding skin [8]. Facial marks appear at random positions on the face. By extracting different facial mark features we aim to differentiate between identical twins. We have defined eleven types of facial marks, including moles, freckles, freckle groups, darkened skin, lightened skin, etc., for the analysis.

Fig. 2. Approach to distinguish between identical twins using manually detected facial marks.

Fig. 3. Overview of the proposed multiscale automatic facial mark detection process.

Initially, each image in the identical twin dataset is manually annotated by multiple observers to determine the different types of perceptible facial marks. Manually annotated facial marks are characterized both by location and category. The approach to distinguish between monozygotic twins based on manually detected facial marks is shown in Fig. 2. Next, we propose a multiscale automatic facial mark detector based on the fast radial symmetry transform (FRST) [9]. The transform detects dark regions with high radial symmetry. An overview of the proposed multiscale automatic facial mark detector is shown in Fig. 3. Initially, an image is represented at multiple scales in the form of a Gaussian pyramid. An Active Shape Model (ASM) [10] is used to detect the contours of the primary facial features like eyes, lips, nostrils, and eyebrows. Using the output of the ASM, a mask is created to remove the primary facial features. Next, the FRST is applied to the masked image to detect dark regions with radial symmetry. The aforementioned steps are applied to all images in the Gaussian pyramid. Finally, the detections are tracked across scales. Automatically detected facial marks are characterized only by geometric location. The locations of facial marks are converted from image pixel coordinates to barycentric coordinates [11] to facilitate interimage comparison of landmark locations. The similarity in the distribution of facial marks is used to determine the similarity between two face images. The similarity is computed by formulating a bipartite graph matching problem.
Extensive experiments are conducted on both the manually and the automatically detected facial marks. The data used for the investigation was acquired at the Twins Days Festival in Twinsburg, Ohio, in 2009 [3]. The dataset consists of 477 images corresponding to 178 subjects and 89 pairs of twins. The presented results indicate the need for an automatic facial mark detector and demonstrate that facial marks can be used as biometric signatures to distinguish between identical twins. Prior research has claimed that the number of facial marks between twins is similar but the distribution of facial marks across twins is different [12]. We also analyze this conjecture. Contrary to the commonly held belief, our results indicate nontrivial correlation between the distributions of facial marks across identical twins.

A preliminary version of this investigation was presented in [13]. In [13], we presented the use of facial marks as biometric signatures by analyzing facial marks annotated by multiple observers. The dataset used in [13] consisted of only 295 twin face images and 76 pairs of twins. In this paper we introduce a multiscale automatic facial mark detector based on the fast radial symmetry transform and evaluate its performance. We also compare performance across multiple annotation sessions and multiple observers. The different experiments are executed on a larger dataset.

The paper is organized as follows. Section II discusses related work. Section III provides a description of the different categories of facial marks. Section IV describes the identical twin dataset used in this investigation. Section V describes the manual annotation process, Section VI describes the matching process, and Section VII describes the proposed multiscale automatic facial mark detector. The details of the experimental setup and results are presented in Section VIII. The paper concludes with a brief summary and discussion.

II. RELATED WORK

We first discuss prior research related to facial mark detection and then focus on identical twin face recognition research.

A. Related Work on Facial Mark Detection

Lin et al. [7] represented the face at multiple layers in terms of global appearance, facial features, skin texture, and irregularities that contribute towards identification. Global appearance and facial features are modeled using a multilevel PCA (principal component analysis) followed by regularized LDA (linear discriminant analysis). A Scale Invariant Feature Transform (SIFT) is employed to detect and describe details of irregular skin regions, which is combined with elastic graph matching for recognition. Improved performance was achieved by fusing facial features at multiple levels. Pierrard et al. [14] presented a framework to localize prominent facial skin irregularities, like moles and birthmarks, using a multiscale template matching algorithm, for face recognition. A discriminative factor is computed for each point using skin segmentation and a local saliency measure and is used to filter points. Recently, Zhang et al. [15] designed a facial skin mark matcher based on a region growing algorithm. Each facial mark is described in terms of position, color intensity, and size. The results of the facial skin mark matcher are fused with the results of a PCA-based matcher to evaluate the performance of the system.
A drawback of the method is that it requires prior knowledge of the locations of facial marks in order to apply the region growing algorithm. Park et al. [16] proposed to use facial marks as soft biometrics. They first map each face image contour obtained from an Active Appearance Model (AAM) into a mean shape using barycentric texture mapping. The mean shape images are then filtered using a Laplacian of Gaussian filter, which acts as a blob detector. Once the facial marks are detected, matching is performed based on Euclidean distance with a set threshold. The number of matches represents the similarity score between images. However, they do not use facial marks by themselves to evaluate performance; they fuse the scores from the facial mark matcher and a commercial face recognition system, and observed a marginal improvement in performance by fusing the two scores.

We have not compared the proposed method with previously published methods, mainly because previous approaches fuse other face features with the features obtained from facial marks. The work proposed in this paper uses only facial marks as biometric signatures to distinguish between individuals. Most previously published works use low-resolution images. The number of facial marks detected in low-resolution images is generally lower than the number detected in high-resolution images. Since a higher number of facial marks are detected in high-resolution images, we are able to represent an individual by a larger and more distinctive feature set. Therefore, the performance of the low-resolution approaches is not directly comparable to that of the proposed approach.

B. Related Work on Identical Twin Biometrics

Recently, researchers have started to look at the challenges involved in distinguishing between identical twins. Kong et al. [17] observed that palm prints from identical twins have correlated features (though they were able to distinguish between them based on other nongenetic information). The same observation was made by Jain et al. [4] for fingerprints. They observed that though fingerprints appear to be more similar for identical twins than for unrelated persons, fingerprint matching systems can distinguish between them. Genetically identical irises were compared by Daugman and Downing [5] and were found to be as uncorrelated as the patterns of irises from unrelated persons. Kodate et al. [18] experimented with ten sets of identical twins using a 2-D face recognition system. Recently, Sun et al. [1] presented a study of the distinctiveness of biometric characteristics in identical twins using fingerprint, face, and iris biometrics. They observed that though iris and fingerprint matchers show little to no degradation in performance when dealing with identical twins, face matchers experienced problems in distinguishing between identical twins. All of these studies were either conducted on very small twin biometric datasets or evaluated using existing in-house or commercial matchers. Phillips et al. [3] presented the first detailed study on discrimination of identical twins using different face recognition algorithms. They compared three different commercial face recognition algorithms on the identical twins dataset acquired at the Twins Days Festival in Twinsburg, Ohio. The dataset consists of images acquired under varying conditions such as facial pose, illumination, facial expression, etc.
They observed that it is easier to distinguish between identical twins under controlled studio-like settings than under uncontrolled settings.

III. TYPES OF FACIAL MARKS

A facial mark is defined as a region of skin or superficial growth that does not resemble the skin in the surrounding area. Facial marks represent finer details on the face. They contain information useful to discriminate between identical twins. The availability of high-resolution images enables us to view facial marks in greater detail for analysis. We have identified and defined the following facial marks (shown in Fig. 4):
1) Mole: A small flat spot less than 1 cm in diameter. The color of a mole is not the same as that of the nearby skin. It appears in a variety of shapes and is normally black in color.
2) Freckle: A small flat spot less than 1 cm in diameter that appears in a variety of shapes. It is usually brown in color.
3) Freckle group: A cluster of freckles.
4) Lightened patch: A flat spot that is more than 1 cm in diameter and appears in different shapes. It is lighter in color than its surroundings.
5) Darkened patch: A flat spot that is more than 1 cm in diameter and appears in different shapes. These spots are darker in color than their surroundings.
6) Birthmark: A persistent visible mark on the skin that is evident at birth or shortly thereafter. Birthmarks are generally pink, red, or brown in color.
7) Splotchiness: An irregularly shaped spot, stain, or colored or discolored area.
8) Raised skin: A solid, raised mark less than 1 cm across. It has a rough texture and appears red, pink, or brown in color.
9) Scar: Discolored tissue that permanently replaces normal skin after destruction of the epidermis.
10) Pockmark: A hollow area or small indentation.
11) Pimple: A raised lesion that is temporary in nature.

Fig. 4. The different categories of facial marks defined.

IV. DATA

The dataset consists of face images of identical twins acquired over two days in August 2009 at the Twins Days Festival in Twinsburg, Ohio [3]. Face images were captured under different scenarios and conditions: controlled and uncontrolled lighting, presence and absence of eyeglasses, different facial expressions (smile or neutral), and different poses with yaw ranging from −90 to 90 degrees, where 0 degrees is a frontal view. The dataset used for the proposed experiments consists of only frontal face images with no glasses, no facial hair, and a neutral expression. These images were captured under controlled lighting. The 2009 dataset consists of 477 images corresponding to 178 subjects and 89 pairs of twins. Fig. 1 shows images of a set of identical twins from the dataset. The guidelines used to capture facial images at the Twins Days Festival match the requirements defined by SAP level 51 [19]. The resolution of the images is 4310 × 2868 pixels. The average interpupillary distance is 567 pixels. These high-resolution images enable us to observe finer details on the face when compared to low-resolution images.

V. MANUAL ANNOTATIONS

To evaluate the usefulness of the facial marks, the twins dataset was initially annotated by multiple observers. Manual annotations enable us to gain insight into the different categories and number of facial marks present in the dataset and the locations of facial marks, and to develop an understanding of how human observers visualize and differentiate between facial marks.
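Each manual annotation ultimately reduces to a category label from Section III plus a location (and bounding box) on the face image. A minimal record type for such annotations might look like the following sketch; the names (MarkCategory, FacialMark) are our own illustrative choices and do not reflect the internal format of the annotation tool described below.

```python
# Minimal illustrative record for a manual annotation: a category from
# Sec. III plus a pixel location on the face image. Names are our own.
from dataclasses import dataclass
from enum import Enum, auto

class MarkCategory(Enum):
    MOLE = auto()
    FRECKLE = auto()
    FRECKLE_GROUP = auto()
    LIGHTENED_PATCH = auto()
    DARKENED_PATCH = auto()
    BIRTHMARK = auto()
    SPLOTCHINESS = auto()
    RAISED_SKIN = auto()
    SCAR = auto()
    POCKMARK = auto()
    PIMPLE = auto()

@dataclass
class FacialMark:
    category: MarkCategory
    x: float             # centroid column in image pixels
    y: float             # centroid row in image pixels
    width: float = 0.0   # optional bounding-box size
    height: float = 0.0
```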
The manual annotation process is accomplished using MarkIt, a facial annotation tool developed at our laboratory by Matthew Pruitt. MarkIt was designed to aid users in manually annotating images and has three main components:
1) Display component: displays the images so that the user may observe and annotate the various facial marks.
2) Annotation component: contains a list of predefined facial mark categories for annotation.
3) Tools component: presents different shapes of bounding boxes to perform the actual annotations.
The metadata produced by MarkIt for each image consists of the number, types, and locations of the annotated facial marks.

We conducted two sessions of manual annotations with a time lapse of six months. In each session, a different set of images from the identical twins dataset was annotated. In the first session 275 images were annotated and in the second session 202 images were annotated, for a total of 477 annotated images. Four observers, denoted 1 through 4, annotated the images in the two sessions. Observers 1, 2, and 3 did the first session and observers 1, 2, and 4 did the second session. The observers had no prior experience with facial mark annotation. Hence, the observers were provided with the definitions of the facial marks characterized in this investigation along with example markings on a few sample images. Fig. 7 shows examples of different facial marks manually annotated by observer 1, observer 2, and observer 3. We observe that the number and categorization of annotated facial marks differ between observers. Figs. 5 and 6 show the category-wise distribution of the number of facial marks annotated by each observer in session 1 and session 2. Table I indicates the total number of facial marks annotated by the observers in session 1 and session 2. Based on these gross statistics, there is large variation in how the four observers perceived facial marks in the dataset.

Fig. 5. Variation in the number of facial marks of different types identified by each observer during session 1.

Fig. 6. Variation in the number of facial marks of different types identified by each observer during session 2.

Difficulties in manual annotation noted by the observers and by us include:
1) Manual annotation is a difficult task because it involves training and familiarizing an individual with the definitions and characteristics of the different types of facial marks (apart from learning how to use the facial annotation tool).
2) Observers experienced difficulty in differentiating between categories of facial marks, especially in the case of moles and freckles.
3) Although all observers appear to have annotated the prominent facial marks, some observers failed to annotate the less prominent but visible marks.
4) It is a time-consuming and expensive task.
Hence, there is a need for an automatic facial mark detector to overcome these problems. Such a method is discussed in Section VII.

VI. FACIAL MARKS BASED MATCHING

We propose a matching approach that characterizes each face image based on the corresponding mark locations between the gallery and probe images. Each facial mark is characterized by a facial mark category and a geometric location on the face image. Mark locations are transformed from image space to normalized barycentric coordinate space. Barycentric coordinates are normalized homogeneous coordinates that are scale, rotation, and translation invariant. They are computed with respect to a reference triangle that forms the basis for the transformation. In this work, the reference triangle is determined by the center of each eye and the tip of the nose, which are automatically localized using STASM [20]. Given the reference triangle, points inside and outside the reference triangle can be expressed as a function of the vertices of the triangle. Since barycentric coordinates are computed using three vertices, they provide robustness to changes in facial pitch.

A. Bipartite Graph Matching

Similarity between two sets of facial marks is computed by representing the matching problem using a weighted bipartite graph. A weighted bipartite graph is defined as G = (U, V, E), where U and V denote two disjoint sets of vertices and E denotes the connecting edges with corresponding nonnegative costs. A bipartite graph with nodes corresponding to facial marks in image I_a and facial marks in image I_b is constructed. The edges correspond to potential matches between facial marks in I_a and I_b. The nonnegative weight associated with each edge is a function of the Euclidean distance between the normalized feature locations being compared. Facial marks of the same category should be compared against each other (e.g., mole against mole, freckle against freckle, etc.). Therefore, to avoid correspondence between facial marks of different categories, edges connecting marks belonging to different categories are given infinite weight. A potential match is declared if the Euclidean distance between a pair of feature centroids (belonging to the same category) is less than a threshold τ. The optimal correspondences are then established by executing the standard Hungarian bipartite graph matching algorithm [21] on the set of potential matches. The normalized similarity score for the comparison between image I_a and image I_b is given by

s(I_a, I_b) = 2 N_c / (N_a + N_b)    (1)

where N_c is the total number of correspondences established across the two images being compared, and N_a and N_b are the numbers of facial marks in I_a and I_b. The similarity metric is used as the match score between the two images.
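The barycentric normalization and the category-constrained matching described above can be prototyped compactly. The sketch below is illustrative only: it uses scipy's linear_sum_assignment as the Hungarian solver, a placeholder threshold value for τ, and the 2N_c/(N_a + N_b) normalization reconstructed in (1); none of these parameter choices are the authors' tuned settings.

```python
# Illustrative prototype of Sec. VI: barycentric normalization plus
# category-constrained bipartite matching. TAU and BIG are our own choices.
import numpy as np
from scipy.optimize import linear_sum_assignment

def to_barycentric(p, tri):
    """Express point p in barycentric coordinates w.r.t. triangle tri
    (rows: left eye center, right eye center, nose tip)."""
    a, b, c = (np.asarray(v, dtype=float) for v in tri)
    T = np.column_stack((b - a, c - a))        # 2x2 basis of the triangle
    l2, l3 = np.linalg.solve(T, np.asarray(p, dtype=float) - a)
    return np.array([1.0 - l2 - l3, l2, l3])   # coordinates sum to 1

def match_score(marks_a, marks_b, tri_a, tri_b, tau=0.05):
    """Normalized similarity as in (1); marks_*: (point_xy, category) pairs."""
    if not marks_a or not marks_b:
        return 0.0
    A = np.array([to_barycentric(p, tri_a) for p, _ in marks_a])
    B = np.array([to_barycentric(p, tri_b) for p, _ in marks_b])
    BIG = 1e9                                   # stand-in for infinite weight
    cost = np.full((len(A), len(B)), BIG)
    for i, (_, cat_i) in enumerate(marks_a):
        for j, (_, cat_j) in enumerate(marks_b):
            d = np.linalg.norm(A[i] - B[j])
            if cat_i == cat_j and d < tau:      # potential match only
                cost[i, j] = d
    rows, cols = linear_sum_assignment(cost)    # Hungarian algorithm [21]
    n_c = int(np.sum(cost[rows, cols] < BIG))   # established correspondences
    return 2.0 * n_c / (len(A) + len(B))
```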
B. Preliminary Results: Manual Annotations

We perform extensive experiments on the manually annotated dataset. Each experiment is designed and implemented to evaluate the usefulness of facial marks as a biometric signature to distinguish between identical twins. The outcomes of the experiments provide answers to the following questions:
1) How well do facial marks annotated by a single observer distinguish between identical twins?
2) Is performance consistent when comparing annotations from different observers against each other?
3) Is performance consistent when comparing annotations from the same observer in different sessions?

The different experiments are designed as follows:
1) Experiment 1
a) Experiment Setup: Compare the query set against the target set annotated by the same observer in the same session.
b) Target Set: Session k Data, where k represents the session number and k = 1 or 2.
c) Query Set: Session k Data
d) Inference: Facial marks can be used as potential biometric signatures to distinguish between identical twins.
2) Experiment 2
a) Experiment Setup: Compare the query set against the target set annotated by the same observer in different sessions.
b) Target Set: Session 1 Data
c) Query Set: Session 2 Data
d) Inference: Performance degrades due to inconsistency in annotations made by an observer across sessions.
3) Experiment 3
a) Experiment Setup: Compare the query set annotated by one observer against the target set annotated by another observer. For example, a query set annotated by observer 1 is compared against a target set annotated by observer 2; this comparison is referred to as observer 1 versus observer 2.
b) Target Set: Session k Data, where k represents the session number and k = 1 or 2.
c) Query Set: Session k Data
d) Inference: Performance degrades due to inconsistency in how different annotators perceive facial marks.

Fig. 7. Different facial marks manually annotated by each observer in session 1 for a given image. (a) observer 1; (b) observer 2; (c) observer 3.

TABLE I: TOTAL NUMBER OF FACIAL MARKS ANNOTATED BY EACH OBSERVER DURING SESSION 1 AND SESSION 2 OF MANUAL ANNOTATIONS

Fig. 8 compares the Receiver Operating Characteristic (ROC) curves for Experiments 1, 2, and 3. We observe that the best performance is obtained when comparing annotations made by a single observer in the same session, i.e., Experiment 1. This indicates that facial marks are useful in distinguishing between identical twins. However, the performance curves obtained for Experiments 2 and 3 exhibit a significant degradation in performance, indicating that individual observers perceive facial marks differently over time and that the annotations are inconsistent. Similarly, different observers view facial marks differently, leading to inconsistency; an observer's annotation style also varies over time. The inconsistency in performance observed in Experiment 2 and Experiment 3 is a major drawback of using manually annotated facial marks to differentiate between identical twins. Hence, in order to obtain consistency, there is a need for a robust and efficient automatic facial mark detector. In the following sections we present such a multiscale automatic facial mark detector.

Fig. 8. Performance curves obtained for the different types of experiments executed on the manually annotated dataset.

VII. AUTOMATIC FACIAL MARK DETECTION

Fig. 9 presents a detailed overview of the proposed automatic facial mark detector. Facial mark detection is performed at different scales, which is achieved by constructing a Gaussian pyramid [22]. For each image in the pyramid the following steps are applied:
1) The primary facial features (eyes, eyebrows, lips, and nostrils) are localized using an Active Shape Model (ASM).
2) Individual masks are created based on the output of the ASM to mask out the primary features.
3) A gradient-based interest operator (the fast radial symmetry transform) is applied to the masked image. The transform detects regions of high radial symmetry.
4) Applying a threshold to the output of the fast radial symmetry transform detects bright or dark regions of high radial symmetry, which correspond to potential facial marks.
The potential facial marks are detected at each level of the image pyramid. Those present in two or more levels are considered to be reliable facial marks. Presently, we do not categorize the detections into different types; we treat them as point features. In the following sections we describe the process of image pyramid construction, localization of primary facial features, mask generation, and the fast radial symmetry transform.

Fig. 9. Diagram highlighting the main steps of the proposed multiscale automatic facial mark detection process.
A. Gaussian Pyramid Construction

The objective is to detect facial marks that are stable across different scales. This can be achieved by using a Gaussian pyramid. The Gaussian pyramid consists of a set of low-pass filtered and subsampled images. The original image is defined at the base level. The successive levels of the pyramid are obtained by filtering the image in the previous level and downsampling it by a factor of 2. The Gaussian pyramid [22] is defined by

G_0(i, j) = I(i, j)    (2)

G_l(i, j) = Σ_{m=−2..2} Σ_{n=−2..2} w(m, n) · G_{l−1}(2i + m, 2j + n),  1 ≤ l < L    (3)

where G_0 is the base image I, G_l represents the image at level l, l is the level number, and w is a Gaussian filter of size 5 × 5. The number of levels in the Gaussian pyramid is denoted by L; in this study we use a five-level pyramid. Facial marks are detected at each level and then tracked across levels to signify their prominence.
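The filter-then-decimate step of (2)–(3) is exactly what OpenCV's cv2.pyrDown implements (a 5 × 5 Gaussian blur followed by downsampling by 2), so a pyramid of the kind described above can be sketched in a few lines; the level count and use of OpenCV are our illustrative choices.

```python
# Illustrative construction of the Gaussian pyramid in (2)-(3).
# cv2.pyrDown blurs with a 5x5 Gaussian kernel and downsamples by 2.
import cv2

def build_pyramid(image, num_levels=5):
    pyramid = [image]                             # G_0: the base image
    for _ in range(1, num_levels):
        pyramid.append(cv2.pyrDown(pyramid[-1]))  # G_l from G_{l-1}
    return pyramid
```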
B. Detection of Primary Facial Features

The contours of primary facial features like the eyes, eyebrows, nostrils, and lips are detected using an Active Shape Model. The primary facial features must be masked before the detection process to avoid detections caused by their presence. The Active Shape Model was first presented by Cootes et al. [10]. Once an ASM is trained on the data in a training set, it iteratively deforms a contour to fit a new image. The ASM defines two components of an object: the model shape and the profile. The model shape defines the shape of the contour. The profile is defined for each contour point and describes what the image looks like around that point in the model. We use an open source implementation of the ASM called STASM [20], which detects 68 facial landmark points corresponding to the contours of the primary facial features. Using these landmark points, a mask is created for each image to mask out the primary facial features; the results are referred to as masked images.

C. Fast Radial Symmetry Detector

We propose to apply the fast radial symmetry transform to detect the desired facial marks. The FRST was defined by Loy and Zelinsky [9]; it is similar to the generalized symmetry transform [23] and the circular Hough transform [24]. The output of the transform highlights radially symmetrical regions and suppresses regions that are asymmetrical. For a given image I, the FRST determines the contribution of each pixel to the symmetry over a set of radii N. The radii set N is defined based on the sizes of the different facial marks that appear on the face; a different set N is defined at each image level. Fig. 9 illustrates the multiscale automatic detection process. Initially, images are represented at different scales. Next, for each image, the gradient image g is computed using a 3 × 3 Sobel operator. We compute the orientation projection image O_n and the magnitude projection image M_n at each point p of the gradient image for every radius n in N. These images are determined by observing the gradient at each point, from which corresponding positively-affected and negatively-affected pixels are determined. Positively-affected pixels are defined as pixels that lie along the direction of the gradient vector, and negatively-affected pixels are pixels in the direction directly opposite to the gradient vector, as shown in Fig. 10. The coordinates of the positively-affected and negatively-affected pixels are defined as

p_{+ve}(p) = p + round( n · g(p) / ||g(p)|| )    (4)

p_{−ve}(p) = p − round( n · g(p) / ||g(p)|| )    (5)

where g(p) is the gradient vector at p and ||g(p)|| is the magnitude of the gradient vector. Initially the orientation projection image O_n and the magnitude projection image M_n are zero.

Fig. 10. Positively and negatively affected pixels of the gradient image influenced by the gradient at point p.

The points in O_n and M_n corresponding to a pair of positively-affected and negatively-affected pixels are updated by

O_n(p_{+ve}(p)) = O_n(p_{+ve}(p)) + 1    (6)

O_n(p_{−ve}(p)) = O_n(p_{−ve}(p)) − 1    (7)

M_n(p_{+ve}(p)) = M_n(p_{+ve}(p)) + ||g(p)||    (8)

M_n(p_{−ve}(p)) = M_n(p_{−ve}(p)) − ||g(p)||    (9)

The orientation projection image and the magnitude projection image capture the radial features of a particular region. The radial symmetry contribution S_n at radius n is given by

S_n = F_n * A_n    (10)

where

F_n(p) = ( M_n(p) / k_n ) · ( |Õ_n(p)| / k_n )^α    (11)

Õ_n(p) = O_n(p) if O_n(p) < k_n, and k_n otherwise    (12)

where A_n is the Gaussian kernel, α is the degree of radial strictness, and k_n is the normalizing factor. The final symmetry image is formed by averaging the radial symmetry distributions over all radii as

S = (1 / |N|) Σ_{n ∈ N} S_n    (13)

After the radial symmetry image is computed, we apply hysteresis thresholding to detect dark or bright regions of high radial symmetry, resulting in a binary image. Then we apply connected component analysis to detect connected regions, which correspond to potential facial marks. For each image, we carry out the above procedure at the different levels of the pyramid; hence, we have a set of potential facial marks at each level. Next, we track the potential facial marks across levels to form the final set of detections. Detections present at two or more levels are included in the final detection set.

Once the final detection set is calculated, we perform postprocessing to reduce the number of false positives. False positives occur mainly due to the presence of hair on the forehead. These are eliminated by using dominant orientation information obtained from the local gradients of the detected regions [25]. The dominant orientation information for each detected region is computed by calculating the Singular Value Decomposition (SVD) of the gradient matrix of the region. The region gradient matrix G is defined as [26]

G = [ g_x(p_1) g_y(p_1) ; g_x(p_2) g_y(p_2) ; … ; g_x(p_m) g_y(p_m) ]    (14)

where g_x(p_i) and g_y(p_i) are the horizontal and vertical gradients at the ith pixel of the region, and the local dominant orientation information is obtained by computing the SVD of G:

G = U S V^T = U [ s_1 0 ; 0 s_2 ] [ v_1 v_2 ]^T    (15)

where U is orthonormal and v_1 and v_2 are the right singular vectors. The singular values s_1 and s_2 describe the energy in the directions of vectors v_1 and v_2. If the energy s_1 along the dominant orientation v_1 is much larger than s_2 for a given region, then we consider that region to be a hair region and it is eliminated. Also, if the size of a detection is greater than a predefined value, the detection is eliminated. The remaining detections are characterized as facial marks.

Fig. 11. (a) Potential facial marks detected by the multiscale automatic facial mark detector. The different colors indicate the number of scales at which the facial marks are detected. Red is two levels, blue is three levels, green is four levels, and yellow is five levels. (b) Detections represented by circles represent potential false positives and are removed using gradient-based SVD.

Fig. 12. Facial marks detected by the multiscale automatic facial mark detector for a pair of twins. The different colors indicate the number of scales at which the facial marks are detected. Red is two levels, blue is three levels, green is four levels, and yellow is five levels.

In Fig. 11, the detections represented by square markers are the potential facial marks detected before postprocessing. The detections represented by circular markers are eliminated using the SVD test, and the detections represented by square markers indicate the true positives. The different colors indicate the number of scales at which the facial marks were detected. The facial marks finally detected are shown in Fig. 12. Currently, each facial mark is defined based only on its geometric location and we do not classify detections into different categories; they are treated as point features. Once the point features are computed, we perform facial mark matching.
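Under the definitions (4)–(13), the transform is a few dozen lines of NumPy. The sketch below is a simplified single-scale implementation together with the SVD hair test of (14)–(15); the radii set, the k_n values (taken from [9]), the Gaussian width, and the hair-filter ratio are illustrative assumptions rather than the paper's tuned settings.

```python
# Illustrative NumPy implementation of the FRST of (4)-(13) and the
# SVD-based hair filter of (14)-(15). Parameter values are assumptions.
import numpy as np
from scipy import ndimage

def frst(img, radii=(1, 2, 3, 5), alpha=2.0):
    gx = ndimage.sobel(img.astype(float), axis=1)   # horizontal gradient
    gy = ndimage.sobel(img.astype(float), axis=0)   # vertical gradient
    mag = np.hypot(gx, gy)
    rows, cols = np.nonzero(mag > 1e-6)             # skip flat pixels
    ux = gx[rows, cols] / mag[rows, cols]           # unit gradient directions
    uy = gy[rows, cols] / mag[rows, cols]
    S = np.zeros_like(img, dtype=float)
    for n in radii:
        O = np.zeros_like(S)
        M = np.zeros_like(S)
        # positively- and negatively-affected pixels, (4) and (5)
        pr = np.round(rows + uy * n).astype(int), np.round(cols + ux * n).astype(int)
        nr = np.round(rows - uy * n).astype(int), np.round(cols - ux * n).astype(int)
        for (r, c), sign in ((pr, 1.0), (nr, -1.0)):
            ok = (r >= 0) & (r < S.shape[0]) & (c >= 0) & (c < S.shape[1])
            np.add.at(O, (r[ok], c[ok]), sign)                        # (6), (7)
            np.add.at(M, (r[ok], c[ok]), sign * mag[rows, cols][ok])  # (8), (9)
        k = 9.9 if n > 1 else 8.0          # normalizing factors k_n from [9]
        O = np.clip(O, -k, k)              # caps O_n at k_n, as in (12)
        F = (M / k) * (np.abs(O) / k) ** alpha                        # (11)
        S += ndimage.gaussian_filter(F, sigma=0.25 * n)               # (10)
    return S / len(radii)                                             # (13)

def is_hair_region(gx_patch, gy_patch, ratio=3.0):
    """SVD test of (14)-(15): a strongly dominant orientation suggests hair."""
    G = np.column_stack((gx_patch.ravel(), gy_patch.ravel()))         # (14)
    s = np.linalg.svd(G, compute_uv=False)                            # (15)
    return s[0] > ratio * max(s[1], 1e-9)
```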
D. Bipartite Graph Matching for Automatically Detected Facial Marks

The process for matching facial marks detected by the multiscale automatic facial mark detector is similar to that described in Section VI. In the case of automatically detected facial marks, each facial mark is characterized only by its geometric location on the corresponding face image. Therefore, automatically detected facial marks are treated as point features and can be viewed as all belonging to the same category. Locations of facial marks are compared across images in the barycentric coordinate system as described in Section VI. Similarity is computed by formulating the matching process in terms of a weighted bipartite graph. The normalized similarity between two images I_a and I_b is defined as

s(I_a, I_b) = Σ_{(u,v) ∈ M} (w_u + w_v) / ( Σ_{u ∈ I_a} w_u + Σ_{v ∈ I_b} w_v )    (16)

where the optimal matches are represented by a set M, and w_u and w_v are weighting factors that correspond to the number of levels at which the facial marks u and v are detected.
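The level-weighted normalization in (16) is a small change to the matcher sketched in Section VI-A. The sketch below is a hypothetical reading of (16) consistent with the prose (the exact form of the original equation did not survive), with each weight taken to be the number of pyramid levels at which a mark was detected.

```python
# Hypothetical reading of the level-weighted similarity (16): each mark u
# carries a weight w_u (the number of pyramid levels at which it was
# detected); matched weight is normalized by the total weight in both images.
def weighted_similarity(matches, weights_a, weights_b):
    """matches: list of (i, j) index pairs into the two mark sets;
    weights_*: per-mark level counts (2..5 for this paper's detector)."""
    matched = sum(weights_a[i] + weights_b[j] for i, j in matches)
    total = sum(weights_a) + sum(weights_b)
    return matched / total if total else 0.0
```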
VIII. EXPERIMENTAL SETUP AND RESULTS

We perform extensive experiments on both the manually annotated facial marks and the automatically detected facial marks. In addition to the experiments mentioned in Section VI (Experiment 1, Experiment 2, and Experiment 3), we include Experiment 4 and Experiment 5, which evaluate the performance of the multiscale automatic facial mark detector. These new experiments are described as follows:
1) Experiment 4
a) Experiment Setup: Match the facial marks detected by the multiscale automatic facial mark detector. The query set and the target set are composed of detections from the automatic facial mark detector on the session-1 or session-2 dataset.
b) Target Set: Detections from Session-k data, where k represents the session number and k = 1 or 2.
c) Query Set: Detections from Session-k data
d) Inference: Performance is lower when compared to Experiment 1 but is more consistent across changes in scenario.
2) Experiment 5
a) Experiment Setup: Match the facial marks detected by the multiscale automatic facial mark detector. The query set and the target set are composed of detections from the automatic facial mark detector from different sessions.
b) Target Set: Detections from Session-1 Data
c) Query Set: Detections from Session-2 Data
d) Inference: Consistency in performance is observed when comparing facial marks across sessions, unlike the trend observed with manual detections in Experiment 2 and Experiment 3.

For each of the aforementioned experiments, we define two different scenarios (Twins versus Twins and All versus All) for comparing the target set and the query set to generate performance curves. Fig. 13 depicts both of these scenarios. The main difference between the two scenarios lies in the generation of the impostor scores. For both scenarios the genuine scores are obtained by comparing face images of a subject in the query set against other images of the same subject in the target set. However, for the Twins versus Twins scenario, the impostor scores are obtained by comparing a face image of a subject in the target set only against the images of the subject's twin. The impostor scores for All versus All are obtained by comparing a face image of a subject in the target set against images of all other subjects in the query set.

Fig. 13. Twins versus Twins and All versus All scenarios used for comparing the target set and the query set.

Finally, one last variation, added only to Experiment 1, is that it is executed for different subsets of the facial mark categories. Each manually annotated facial mark is characterized by its geometric location and a facial mark category. Amongst the different categories of facial marks defined, moles, freckles, and pimples are the most common and prominent. Therefore, we define different subsets of facial mark categories to determine whether some facial mark categories are more useful than others. Fig. 14 shows the distribution of moles, freckles, and pimples for the first 50 subjects (25 pairs of twins). The subsets of facial marks considered are restricted to these prominent categories, individually and in combination, together with the set of all facial marks with the category label ignored, i.e., each manually annotated facial mark characterized only by its geometric location.

Fig. 14. Distribution of moles, freckles, and acne across the first 50 subjects in the dataset, i.e., 25 pairs of twins.

Since the automatically detected facial marks are characterized only by geometric location, we do not execute the subset experiments for them. We use Receiver Operating Characteristic (ROC) curves and the Equal Error Rate (EER) as performance measures to evaluate the different experiments. An ROC curve closer to the upper-left corner indicates a better performing system and, correspondingly, a lower value of EER.
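The EER reported throughout the results can be computed directly from the genuine and impostor score lists produced by either matcher. A minimal sketch, assuming higher scores mean more similar:

```python
# Minimal EER computation from genuine and impostor similarity scores
# (higher score = more similar). Sweeps thresholds over all observed scores.
import numpy as np

def equal_error_rate(genuine, impostor):
    genuine, impostor = np.asarray(genuine), np.asarray(impostor)
    best = (2.0, None)                  # (|FAR - FRR| gap, EER estimate)
    for t in np.unique(np.concatenate([genuine, impostor])):
        frr = np.mean(genuine < t)      # false rejects at threshold t
        far = np.mean(impostor >= t)    # false accepts at threshold t
        if abs(far - frr) < best[0]:
            best = (abs(far - frr), (far + frr) / 2.0)
    return best[1]
```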
A. Results: Experiment 1

Figs. 15 and 17 present the ROC curves for Experiment 1 on the session 1 and session 2 datasets for the Twins versus Twins comparison using different subsets of facial marks. Similarly, Figs. 16 and 18 present the ROC curves for Experiment 1 on the session 1 and session 2 datasets for the All versus All comparison using different subsets of facial marks. Considering subsets of facial marks does not significantly improve performance; in fact, the best performance is achieved by considering all categories of facial marks. We observe that there is no significant degradation in performance when facial mark categories are ignored. This indicates that the geometric location of facial marks is a robust feature that can be used to differentiate between identical twins. A similar trend is observed in both scenarios.

Fig. 15. ROC curves of Twins versus Twins comparison for various sets of facial marks annotated by observer 1 in session 1.

Fig. 16. Performance of All versus All comparison for various sets of facial marks annotated by observer 1 in session 1.

Fig. 17. ROC curves of Twins versus Twins comparison for various sets of facial marks annotated by observer 1 in session 2.

Fig. 18. Performance of All versus All comparison for various sets of facial marks annotated by observer 1 in session 2.

An interesting result is observed when comparing the performance between the Twins versus Twins and All versus All comparisons. Performance is significantly better when comparing facial marks across unrelated individuals (the All versus All scenario) than when comparing facial marks across identical twins. This leads to the inference that the distributions of facial marks across identical twins appear to be correlated. The distributions of genuine and impostor scores for both scenarios from session 1 are shown in Fig. 19. There exists a larger overlap between the match and nonmatch scores for the Twins versus Twins comparison than for the All versus All comparison, leading to higher error and lower performance. Hence, it is easier to differentiate unrelated persons using the facial mark distribution than identical twins. Although results are provided based on the facial marks annotated by observer 1, similar trends are seen in the case of facial marks annotated by the other observers.

Fig. 19. Distributions of match and nonmatch scores for Twins versus Twins and All versus All comparisons.

B. Results: Experiment 2

In Experiment 2 we compare manually annotated facial marks across sessions. Only two observers, observer 1 and observer 2, took part in both sessions; hence, we can compare annotations made only by them. Fig. 20 presents the ROC curves for both the Twins versus Twins and All versus All comparisons considering all facial marks (FM). Again, we observe that All versus All outperforms Twins versus Twins. Ideally, the performance should be similar to the results of Experiment 1. However, there is a degradation in overall performance when comparing manually detected facial marks across sessions, indicating that observers are not consistent over time. Table II lists the EERs computed for both the twin and unrelated-person comparisons across sessions for observer 1 and observer 2.

Fig. 20. Comparing the performance curves of observer 1 for both All versus All and Twins versus Twins comparisons, for Experiment 2.

TABLE II: EQUAL ERROR RATES COMPUTED FOR TWINS VERSUS TWINS AND ALL VERSUS ALL COMPARISONS ACROSS SESSIONS FOR OBSERVER 1 AND OBSERVER 2
C. Results: Experiment 3

In this experiment, similarity scores are computed by comparing a query set annotated by one observer to a target set annotated by a different observer. Performance curves for these experiments for each session are shown in Fig. 21 for observer 1 versus observer 3 and in Fig. 22 for observer 1 versus observer 4, for both the Twins versus Twins and All versus All scenarios. The performance degrades considerably when facial marks annotated by different observers are compared against each other. This occurs due to the variation in facial mark annotations between observers; there is a lack of uniformity in facial mark annotations across observers. Though all observers annotated the prominently visible facial marks, a few failed to annotate the less prominent marks. However, even in these experiments, the performance of the All versus All comparison is better than that of Twins versus Twins, suggesting similarity in the distribution of facial marks across twins compared to unrelated persons.

Fig. 21. Performance curves for Experiment 3 for Twins versus Twins and All versus All comparisons for observer 1 versus observer 3, session 1.

Fig. 22. Performance curves for Experiment 3 for Twins versus Twins and All versus All comparisons for observer 1 versus observer 4, session 2.

D. Results: Experiment 4 and Experiment 5

Experiment 4 and Experiment 5 evaluate the performance of the proposed multiscale automatic facial mark detector. A total of 4602 facial marks were detected in the session 1 dataset and 3012 facial marks were detected in the session 2 dataset. Figs. 23 and 24 show the performance curves for Experiment 4 for both the Twins versus Twins and All versus All comparisons for each dataset. The performance of the proposed detector is somewhat worse than the results obtained for Experiment 1, which uses manual annotations. The performance curves for Experiment 5 are shown in Fig. 25. Facial marks detected in the session 2 dataset are compared against facial marks detected in the session 1 dataset; in principle, this is analogous to Experiment 2. We obtain similar results for both the Twins versus Twins and All versus All comparisons in Experiment 4 and Experiment 5; they do not exhibit the large variation in performance seen in Experiment 2 and Experiment 3.

Fig. 23. ROC curves representing the performance of the proposed multiscale automatic facial mark detector for Experiment 4, for the session 1 dataset, for both Twins versus Twins and All versus All comparisons.

Fig. 24. ROC curves representing the performance of the proposed multiscale automatic facial mark detector for Experiment 4, for the session 2 dataset, for both Twins versus Twins and All versus All comparisons.

Fig. 25. ROC curves representing the performance of the proposed multiscale automatic facial mark detector for Experiment 5, for both Twins versus Twins and All versus All comparisons.

Finally, we compare the performance between manually detected facial marks and automatically detected facial marks. Fig. 26 compares performance across Experiment 1, Experiment 2, Experiment 4, and Experiment 5 for the Twins versus Twins comparison. Although the best performance is achieved by comparing facial marks annotated by an observer at a given time, there is a large degradation in performance when we compare facial marks annotated by an observer over time, indicating inconsistency. This variation in performance is not seen when comparing facial marks detected by the proposed multiscale automatic facial mark detector. The automatic facial mark detector performs better than manual annotation by an observer at different instances of time. Similarly, we observe that the performance of the automatic facial mark detector is better than that of Experiment 3, i.e., when we compare facial marks annotated by different observers, as shown in Fig. 27. The performance of the proposed automatic facial mark detector is relatively consistent and uniform. The same trend is observed in the All versus All comparison.

Fig. 26. Performance comparison between the manually detected facial mark experiments (Experiment 1 and Experiment 2) and the automatically detected facial mark experiments (Experiment 4 and Experiment 5), for Twins versus Twins comparisons.

Fig. 27. Performance comparison between the manually detected facial mark experiment (Experiment 3) and the automatically detected facial mark experiment (Experiment 4), for Twins versus Twins comparisons.

The results obtained can be summarized as follows:
1) In every experiment, we observe that the All versus All comparison performs better than the Twins versus Twins comparison. This indicates that the facial mark distributions across identical twins appear to be correlated.
2) The manual annotation process is a difficult and time-consuming task, and hence there is a need for a robust automatic facial mark detector.
3) Though the performance of the proposed multiscale automatic facial mark detector is slightly lower than that of manual annotations for a single session, it exhibits greater consistency over time.

IX. CONCLUSION

In this paper, we analyzed the usefulness of facial marks as a potential biometric signature for distinguishing between identical twins. We proposed a multiscale automatic facial mark detection system for distinguishing between identical twins based solely on the geometric distribution of facial marks. Different experiments were designed and implemented to highlight the performance of the proposed automatic facial mark detector compared with the performance achieved by using manually detected facial marks. From the results, there appears to be a correlation in the distribution of facial marks across twins; this phenomenon is observed across all experiments. In the future, we will explore using richer facial mark characteristics like texture, shape, and color to improve performance. We hope to further explore the use of different matching algorithms and compare them to the proposed matching algorithm. Facial mark features can be fused with other facial features to enrich facial characterizations for improved performance. The results of this investigation make a case for the use of facial marks in biometric characterization.

REFERENCES

[1] Z. Sun, A. A. Paulino, J. Feng, Z. Chai, T. Tan, and A. K. Jain, "A study of multibiometric traits of identical twins," in Proc. SPIE, Biometric Technology for Human Identification VII, Apr. 2010, pp. 1–12.
[2] K. W. Bowyer, "What surprises do identical twins have for identity science?," Computer, vol. 44, pp. 100–102, Jul. 2011.
[3] P. Phillips, P. Flynn, K. Bowyer, R. Vorder Bruegge, P. Grother, G. Quinn, and M. Pruitt, "Distinguishing identical twins by face recognition," in Proc. IEEE Int. Conf. Automatic Face and Gesture Recognition and Workshops, Mar. 2011, pp. 185–192.
[4] A. K. Jain, S. Prabhakar, and S. Pankanti, "On the similarity of identical twin fingerprints," Pattern Recognit., vol. 35, no. 11, pp. 2653–2663, 2002.
[5] J. Daugman and C. Downing, "Epigenetic randomness, complexity and singularity of human iris patterns," Proc. Roy. Soc. Lond. B, Biological Sciences, vol. 268, no. 1477, pp. 1737–1740, 2001.
[6] W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld, "Face recognition: A literature survey," ACM Comput. Surv., vol. 35, pp. 399–458, Dec. 2003.
[7] D. Lin and X. Tang, "Recognize high resolution faces: From macrocosm to microcosm," in Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition (CVPR'06), 2006, vol. 2, pp. 1355–1362.
[8] Skin lesions [Online]. Available: http://skin-care.health-cares.net/skin-lesions.php. Last accessed: 07/2010.
[9] G. Loy and A. Zelinsky, "A fast radial symmetry transform for detecting points of interest," in Proc. 7th Eur. Conf. Computer Vision—Part I (ECCV'02), London, U.K., 2002, pp. 358–368, Springer-Verlag.
[10] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, "Active shape models—their training and application," Comput. Vis. Image Understand., vol. 61, pp. 38–59, Jan. 1995.
[11] C. Bradley, The Algebra of Geometry: Cartesian, Areal and Projective Co-Ordinates. Bath, U.K.: Highperception, 2007.
[12] G. Zhu, D. L. Duffy, A. Eldridge, M. Grace, C. Mayne, L. O'Gorman, J. F. Aitken, M. C. Neale, N. K. Hayward, A. C. Green, and N. G. Martin, "A major quantitative-trait locus for mole density is linked to the familial melanoma gene CDKN2A: A maximum-likelihood combined linkage and association analysis in twins and their sibs," Amer. J. Human Genetics, vol. 65, pp. 483–492, Aug. 1999.
[13] N. Srinivas, G. Aggarwal, P. Flynn, and R. Vorder Bruegge, "Facial marks as biometric signatures to distinguish between identical twins," in Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition Workshops (CVPRW 2011), Jun. 2011, pp. 106–113.
[14] J. S. Pierrard and T. Vetter, "Skin detail analysis for face recognition," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR'07), Jun. 2007, pp. 1–8.
[15] Z. Zhang, S. Tulyakov, and V. Govindaraju, "Combining facial skin mark and eigenfaces for face recognition," in Proc. Third Int. Conf. Biometrics (ICB'09), Berlin, Heidelberg, Germany, 2009, pp. 424–433, Springer-Verlag.
[16] A. Jain and U. Park, "Facial marks: Soft biometric for face recognition," in Proc. 16th IEEE Int. Conf. Image Processing (ICIP 2009), Nov. 2009, pp. 37–40.
[17] A. W. K. Kong, D. Zhang, and G. Lu, "A study of identical twins' palmprints for personal verification," Pattern Recognit., vol. 39, no. 11, pp. 2149–2156, 2006.
[18] K. Kodate, R. Inaba, E. Watanabe, and T. Kamiya, "Facial recognition by a compact parallel optical correlator," Meas. Sci. Technol., vol. 13, no. 11, p. 1756, Nov. 2002.
[19] American National Standard for Information Systems—Data Format for the Interchange of Fingerprint, Facial, and Other Biometric Information, Part 2: XML Version [Online]. Available: http://www.nist.gov/itl/ansi/upload/Approved-XML-Std-20080828.pdf. Last accessed: 07/2010.
[20] S. Milborrow and F. Nicolls, "Locating facial features with an extended active shape model," in Proc. Eur. Conf. Computer Vision (ECCV), 2008, pp. 504–513 [Online]. Available: http://www.milbo.users.sonic.net/stasm
[21] H. W. Kuhn, "The Hungarian method for the assignment problem," in 50 Years of Integer Programming 1958–2008. Berlin, Heidelberg, Germany: Springer, 2010, pp. 29–47.
[22] P. Burt and E. Adelson, "The Laplacian pyramid as a compact image code," IEEE Trans. Commun., vol. 31, no. 4, pp. 532–540, Apr. 1983.
[23] D. Reisfeld, H. Wolfson, and Y. Yeshurun, "Context free attentional operators: The generalized symmetry transform," Int. J. Comput. Vis., vol. 14, pp. 119–130, 1995.
[24] C. Kimme, D. Ballard, and J. Sklansky, "Finding circles by an array of accumulators," Commun. ACM, vol. 18, pp. 120–122, Feb. 1975.
[25] X. Zhu and P. Milanfar, "A no-reference sharpness metric sensitive to blur and noise," in Proc. Int. Workshop on Quality of Multimedia Experience, Jul. 2009, pp. 64–69.
[26] X. Feng and P. Milanfar, "Multiscale principal components analysis for image local orientation estimation," in Proc. Thirty-Sixth Asilomar Conf. Signals, Systems and Computers, Nov. 2002, vol. 1, pp. 478–482.

Nisha Srinivas is a Graduate Research Assistant enrolled in the Ph.D. program in computer science and engineering at the University of Notre Dame. She received the M.S. degree in electrical engineering from Syracuse University in 2009 and the B.E. degree in electronics and communication from the Bangalore Institute of Technology, Bangalore, in 2006. Her current work focuses on studying the usefulness of facial marks as biometric signatures to distinguish between individuals.
Gaurav Aggarwal is a scientist with Yahoo! Labs Bangalore. He received the B.Tech. degree in computer science and engineering from the Indian Institute of Technology, Madras, in 2002, and the M.S. and Ph.D. degrees in computer science from the University of Maryland, College Park, in 2004 and 2008, respectively. He held a Research Scientist position with ObjectVideo, Inc., and a Research Assistant Professor position with the Department of Computer Science and Engineering, University of Notre Dame, before joining Yahoo!. His research interests are in image and video processing, computer vision, pattern recognition, and machine learning.

Patrick J. Flynn (F'12) is Professor of Computer Science and Engineering and Concurrent Professor of Electrical Engineering at the University of Notre Dame. He received the B.S. degree in electrical engineering (1985), the M.S. degree in computer science (1986), and the Ph.D. degree in computer science (1990) from Michigan State University, East Lansing. He has held faculty positions at Notre Dame (1990–1991, 2001–present), Washington State University (1991–1998), and Ohio State University (1998–2001). His research interests include computer vision, biometrics, and image processing. Dr. Flynn is an IAPR Fellow, an ACM Distinguished Scientist, a past Associate Editor-in-Chief of IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, and a past Associate Editor of IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, IEEE TRANSACTIONS ON IMAGE PROCESSING, IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, Pattern Recognition, and Pattern Recognition Letters. He has received outstanding teaching awards from Washington State University and the University of Notre Dame.

Richard W. Vorder Bruegge is the Senior Photographic Technologist at the Federal Bureau of Investigation (FBI) Digital Evidence Laboratory, where he is responsible for overseeing science and technology developments in the imaging sciences. He received the B.Sc. degree in engineering (1985), the M.Sc. degree in geological sciences (1987), and the Ph.D. degree in geological sciences (1991) from Brown University. He has been with the FBI since 1995, where he has performed forensic analyses of image and video evidence, testifying in state, federal, and international courts as an expert witness over 60 times. His research interests include the forensic analysis of image evidence, with a particular interest in face and iris recognition. Dr. Vorder Bruegge was chair of the Scientific Working Group on Imaging Technology (SWGIT) from 2000 to 2006 and is the current chair of the Facial Identification Scientific Working Group (FISWG). He is a fellow of the American Academy of Forensic Sciences (AAFS) and in 2010 he was named a Director of National Intelligence (DNI) Science and Technology Fellow.