Proc. CVPR 2011 Workshop on Biometrics

Facial Marks as Biometric Signatures to Distinguish between Identical Twins
Nisha Srinivas, Gaurav Aggarwal, Patrick J Flynn
Department of Computer Science and Engineering, University of Notre Dame
{nsriniva, gaggarwa, flynn}@nd.edu

Richard W. Vorder Bruegge
Federal Bureau of Investigation, Digital Evidence Laboratory
Building 27958A, Pod E, Quantico, VA 22135
Richard.VorderBruegge@ic.fbi.gov

Abstract
There exists a high degree of similarity in facial appearance between identical twins that makes it difficult
for even state-of-the-art face matching systems to distinguish between them. Given the consistent increase in
the number of twin births in recent decades, there is a need to develop alternate approaches to characterize
facial appearance that can address this challenging task, which has eluded even humans. In this paper, we
investigate the usefulness of facial marks as biometric signatures, with a focus on the task of distinguishing
between identical twins. We define and characterize a set of facial marks that are manually annotated by
multiple observers. The geometric distribution of annotated facial marks, along with their respective
categories, is used to characterize twin face images. The analysis is conducted on 295 twin face images
acquired at the Twins Days Festival at Twinsburg, Ohio, in 2009. The results of our analysis indicate the
usefulness of the distribution of facial marks as a biometric signature. In addition, contrary to prior
research, our results indicate the existence of some degree of correlation between positions of facial marks
belonging to identical twins.

1. Introduction
Humans face difficulty in distinguishing between monozygotic (identical) twins because of the high degree
of similarity in their facial appearances. Even the state of the art face recognition systems exhibit poor
performance when trying to distinguish between identical twins [1]. Traditionally, distinguishing between
identical twins has been considered to be a problem of only academic interest but due to consistent increase
in twin births in recent decades (the total increase in twin births since 1980 is 70%) [2], this has become a
pertinent challenge for law enforcement agencies. There have been several criminal cases in which either
both or none of the identical twins were convicted due to the difficulty in determining the correct identity
of the perpetrator. This has led to recent interest in the usefulness of biometric traits like facial
appearance, fingerprints, iris, etc. to distinguish between identical twins [1] [3]. These studies have
indicated that though biometrics like iris and fingerprints perform reasonably well when dealing with twin
images, automatic face matching performance shows significant performance degradation when asked to
distinguish between images of identical twins. The quest for novel characterizations for this task has the
potential to enrich existing facial characterizations used in automatic systems to improve face matching
performance in general.

Figure 1: A pair of identical twins from the dataset. a) Twin A and b) Twin B. We can observe a high degree
of facial similarity.

In this paper, we propose to differentiate between identical twins using facial marks. Facial marks are
considered to be unique and inherent characteristics of an individual. High-resolution images enable us to
observe more micro-scale details on the face [4]. Facial marks are defined as visible changes in the skin and
they differ in texture, shape and color from the surrounding skin [5]. We have defined eleven types of facial
marks including moles, freckles, freckle groups, darkened skin, lightened skin, etc. for the analysis.

Figure 2: Proposed approach to differentiate between identical twins.

The approach to distinguish between monozygotic twins based on facial marks is shown in Figure 2. Initially,
each image in the dataset is manually annotated to determine the different types of perceptible facial marks.
The distribution of annotated facial marks is used to characterize face images. The locations of facial marks
are converted from image pixel coordinates to barycentric coordinates, to facilitate inter-image comparison of
the geometric distribution of marks. The similarity of these distributions is used to determine the similarity
between two face images. The similarity is computed by formulating a bipartite graph matching problem that
puts facial marks in one image in correspondence with those in another.

Extensive experiments are conducted on 295 twin face images belonging to 157 unique subjects. The data used
for the investigation was acquired at the Twins Days Festival at Twinsburg, Ohio in 2009. Figure 1 presents
images of a pair of identical twins from this dataset. The presented results highlight the merit of the
proposed facial characterization to distinguish between identical twins. Prior research has claimed that the
number of facial marks between twins is similar but the distribution of facial marks across twins is
different [6]. Hence we also analyze whether our investigation agrees with these conjectures. We observe that,
as suggested earlier, the number of facial marks does appear to be correlated across identical twins. On the
other hand, contrary to the commonly held belief, our results indicate non-trivial correlation between
distributions of facial marks across identical twins.

The paper is organized as follows. Section 2 briefly discusses related work. Section 3 describes the different
categories of facial marks. The manual annotation process is described in Section 4, and Section 5 describes
the proposed matching process. The details of the dataset, experimental setup and results are presented in
Section 6. The paper concludes with a brief summary and discussion.

2. Previous Work
Traditionally, biometrics research has focused primarily on developing robust characterizations and systems to
deal with challenges posed by variations in acquisition conditions (like pose, illumination condition, distance
from sensor, etc.) and presence of noise in the acquired data. Only recently have researchers started to look
at the challenges involved in dealing with the task of distinguishing between identical twins. Here we provide
pointers to a few of the relevant investigations.
Kong et al. [7] observed that palmprints from identical twins have correlated features (though they were able
to distinguish between them based on other non-genetic information). The same observation was made by Jain et
al. [8] for fingerprints. They observed that though fingerprints appear to be more correlated for identical
twins, fingerprint matching systems can be used to distinguish between them. Genetically identical irises were
compared by Daugman and Downing [3] and were found to be as uncorrelated as the patterns of irises from
unrelated persons. Kodate et al. [9] experimented with 10 sets of identical twins using a 2D face recognition
system. Recently, Sun et al. [1] presented a study of distinctiveness of biometric characteristics in identical
twins using fingerprint, face and iris biometrics. They observed that though iris and fingerprints show little
to no degradation in performance when dealing with identical twins, face matchers find it hard to distinguish
between identical twins. All these studies were either conducted on very small twin biometric datasets or
evaluated using existing in-house or commercial matchers. We build on these efforts and present a facial marks
based approach to characterize faces to address this challenging task. Facial marks were recently used as a
soft biometric for face recognition by Jain and Park [10], who fuse a commercial face matcher with facial marks
and observe a small improvement in matching performance on non-twin datasets.

3. Facial Marks used for Biometric Characterization
The objective is to discriminate between identical twins solely based on the features extracted from facial
marks. Facial marks can be defined as a region of skin or superficial growth that does not resemble the skin in
the surrounding area. We have identified and defined the following facial marks (shown in Figure 3):

1. Mole: A small flat spot less than 1 cm in diameter. The color of a mole is not the same as the nearby skin.
It appears in a variety of shapes and is normally black in color.
2. Freckle: A small flat spot less than 1 cm in diameter that appears in a variety of shapes. It is usually
brown in color.
3. Freckle group: A cluster of freckles.
4. Lightened patch: A flat spot that is more than 1 cm in diameter and appears in different shapes. It is
usually brown, red, or lighter in color than its surroundings.
5. Darkened patch: A flat spot that is more than 1 cm in diameter and appears in different shapes. These spots
are darker in color than the surroundings due to a local concentration of melanin.
6. Birthmark: A persistent visible mark on the skin that is evident at birth or shortly thereafter. Birthmarks
are generally pink, red, or brown in color and are large.
7. Splotchiness: An irregularly shaped spot, stain, or colored or discolored area.
8. Raised skin: A solid, raised mark less than 1 cm across. It has a rough texture and appears red, pink, or
brown in color.
9. Scar: Discolored tissue that permanently replaces normal skin after destruction of the epidermis.
10. Pockmark: A hollow area caused by scratching or picking of a primary scar.
11. Pimple: A raised lesion that is temporary in nature.

Figure 3: The different categories of facial marks defined

4. Facial Annotation
The initial step in the proposed approach to distinguish between identical twins is to manually annotate the
different types of facial marks in each image of the dataset. The twins dataset was annotated using MarkIt
(footnote 1), a facial annotation tool developed at our laboratory. MarkIt was designed to aid users in
manually annotating images. MarkIt has three main components:

1. Display component: The images are displayed to observe and annotate the various facial marks.
2. Annotation component: Contains a list of pre-defined facial marks for annotations.
3. Tools component: Presents different shapes of bounding boxes to perform the actual annotations.

Footnote 1: Matthew Pruitt designed and implemented MarkIt, a face annotation tool, at the Computer Vision
Research Lab (CVRL), University of Notre Dame.

The metadata associated with each image consists of the number, type and location of annotated facial marks.
Manual annotations enable us to gain insight into the different categories and number of facial marks present
in the dataset, the location of facial marks, and how human observers visualize and differentiate between
facial marks. Three different individuals, referred to as observers, annotated the identical twins dataset.
This was done to reduce individual bias and to determine the consistency in annotating the various types of
facial marks. All the observers were provided with the definitions of the facial marks characterized in this
investigation along with example markings on a few sample images. Figure 4 presents the facial marks identified
and annotated by a single observer on an image from the dataset. Figure 5 presents the category-wise
distribution of the number of facial marks annotated by each observer.
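The per-image metadata described in Section 4 (number, type and location of marks, recorded per observer) could be held in a structure like the one below. This is only a sketch: the class names, field names and the use of a centroid in pixel coordinates are assumptions made for illustration and do not describe MarkIt's actual storage format.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FacialMark:
    """One annotated mark (hypothetical layout, not MarkIt's format)."""
    category: str   # e.g. "mole", "freckle", "scar"
    x: float        # centroid column of the bounding shape, in pixels
    y: float        # centroid row of the bounding shape, in pixels


@dataclass
class ImageAnnotation:
    """All marks one observer annotated on one image."""
    image_id: str
    observer: str
    marks: list = field(default_factory=list)

    def count_by_category(self) -> Counter:
        # Number of marks per category, as summarized per observer in Figure 5.
        return Counter(m.category for m in self.marks)


ann = ImageAnnotation("img_0001", "Observer-3")
ann.marks.append(FacialMark("mole", 1200.0, 2100.5))
ann.marks.append(FacialMark("freckle", 1410.0, 1980.0))
ann.marks.append(FacialMark("mole", 900.0, 2500.0))
```

Summing `len(ann.marks)` over all images annotated by one observer would give per-observer totals of the kind reported in Table 1.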

Figure 5: Variation in number of facial marks of different types identified by each Observer

Observer ID     Number of Facial Marks Annotated
Observer-1      3785
Observer-2      2311
Observer-3      5100

Table 1: The total number of facial marks annotated by each observer for the twins dataset.

Table 1 shows the total number of facial marks annotated by each observer on the entire dataset. Based on
these gross statistics, there is large variation in how the three observers perceived marks on faces in the
dataset: Observer-3 annotated more than twice as many marks as Observer-2 (5100 versus 2311). Although all
observers appear to have annotated the prominent facial marks, some observers failed to annotate the less
prominent but visible marks.

5. Facial Marks based Matching
The goal of this effort is to highlight the usefulness of facial marks to characterize face images to
discriminate between identical twins. Therefore, we propose a very simple matching approach that characterizes
each face image based on the geometric distribution of the annotated facial marks. Similarity between two face
images is computed using the similarity between the corresponding distributions of annotated marks. The details
of the matching process are as follows.
Each facial mark is characterized using its facial mark category (one of eleven categories as described in
Section 3) and its geometric location on the corresponding face image. For comparing locations of facial marks
across images, mark locations are transformed from image space to normalized barycentric coordinate space.
Barycentric coordinates are normalized homogeneous coordinates that are scale, rotation and translation
invariant. They are computed with respect to a reference triangle that forms the basis for the transformation.
In this work, the reference triangle is determined by the center of each eye and the tip of the nose, which are
automatically localized using STASM [11]. Given the reference triangle, points inside and outside the reference
triangle can be expressed as a function of the vertices of the triangle.

5.1. Bipartite Graph Matching
Similarity between two sets of facial marks is computed by formulating the matching problem in terms of a
weighted bipartite graph. A weighted bipartite graph is defined as G = (S, T; E), where S and T denote two
disjoint sets of vertices and E denotes the connecting edges with corresponding nonnegative costs. A bipartite
graph with N1 + N2 nodes corresponding to N1 facial marks in image I1 and N2 facial marks in image I2 is
constructed. The edges correspond to potential matches between facial marks in I1 and I2. The nonnegative
weight associated with each edge is a function of the Euclidean distance between the normalized feature
locations being compared. Only facial marks of the same category should be compared against each other, e.g.,
mole against mole, freckle against freckle, etc. Therefore, to avoid correspondence between facial marks of
different categories, edges connecting marks belonging to different categories are given infinite weight.
Facial marks in I1 are compared to facial marks in I2 by computing the Euclidean distance between the
normalized feature centroids corresponding to the facial marks being compared. A potential match is declared if
the Euclidean distance between a pair of feature centroids (belonging to the same category) is less than a
threshold λ. The optimal correspondences are then established by executing the standard Hungarian bipartite
graph matching algorithm [12] on the set of potential matches. The normalized similarity score s_{i,j} between
images I_i and I_j is defined as

s_{i,j} = \frac{M}{\max(N_i, N_j)}    (1)

where M is the total number of correspondences established across the two images being compared, and N_i and
N_j are the numbers of facial marks in the two images. This similarity metric is used as the match score
between two images.
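The matching steps of Section 5 can be sketched in a few lines of Python. The sketch makes several assumptions that are not stated in the paper: the reference triangle (eye centers and nose tip) is passed in as three pixel coordinates rather than obtained from STASM, the function names are hypothetical, the infinite edge weight is replaced by a large finite constant `BIG`, and SciPy's `linear_sum_assignment` stands in for the Hungarian algorithm. It is an illustration of the idea, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def barycentric(points, tri):
    """Barycentric coordinates of 2-D points w.r.t. triangle tri (3x2 array)."""
    a, b, c = np.asarray(tri, dtype=float)
    basis = np.column_stack((a - c, b - c))           # 2x2 matrix [a-c, b-c]
    rhs = (np.asarray(points, dtype=float) - c).T     # 2xN right-hand sides
    l12 = np.linalg.solve(basis, rhs).T               # first two weights per point
    return np.column_stack((l12, 1.0 - l12.sum(axis=1)))


def match_score(marks_i, marks_j, lam):
    """Equation (1): M / max(N_i, N_j).

    marks_*: list of (category, barycentric coordinate vector).
    lam: distance threshold for a potential match (value is an assumption).
    """
    n_i, n_j = len(marks_i), len(marks_j)
    if n_i == 0 or n_j == 0:
        return 0.0
    BIG = 1e6  # finite stand-in for the infinite weight; must exceed any sum of valid distances
    cost = np.full((n_i, n_j), BIG)
    for r, (cat_r, p_r) in enumerate(marks_i):
        for c, (cat_c, p_c) in enumerate(marks_j):
            d = np.linalg.norm(np.asarray(p_r) - np.asarray(p_c))
            if cat_r == cat_c and d < lam:            # same category and close enough
                cost[r, c] = d
    rows, cols = linear_sum_assignment(cost)          # minimum-cost assignment
    m = int(np.sum(cost[rows, cols] < BIG))           # keep only real correspondences
    return m / max(n_i, n_j)
```

Because barycentric coordinates are unchanged by a similarity transform applied to both the triangle and the points, marks from images at different scales or in-plane rotations can be compared directly; the threshold `lam` then acts in this normalized space.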

6. Experimental Evaluation
In this section, we provide the details and results of the
experiments conducted to evaluate the usefulness of the proposed facial marks based characterization of faces. All the
experiments are conducted on twins dataset consisting of
295 images from 157 unique subjects.

Figure 4: Facial marks identified and annotated by a single observer (Observer-3)

6.1. Dataset
The twins dataset used in this investigation consists of face images of identical twins and was acquired in
2009 at the Twins Days Festival in Twinsburg, Ohio [13]. Face images were captured under different scenarios
and conditions, such as controlled and uncontrolled lighting, presence and absence of eyeglasses, different
facial expressions (smile or neutral), and different poses with yaw ranging from 0 to 180 degrees. The dataset
used for the proposed experiments consists of only frontal (yaw = 0) face images with no glasses, no facial
hair and a neutral expression. These images were captured indoors. The dataset consists of 295 images
corresponding to 157 subjects and 76 pairs of twins. Note that for a few subjects (five out of 157) used in the
experiments, there is no twin image present in the dataset (76 × 2 + 5 = 157). The number of images per subject
varies from one to six. The resolution of the images is 2868 × 4310 pixels, with the face occupying a large
region of the image. The high-resolution images enable us to observe more micro-scale details on the face when
compared to low-resolution images.

6.2. Experimental Setup
We perform two different types of experiments to evaluate the performance of the proposed characterization,
namely individual observer experiments and inter-observer experiments. In each experiment, the query set and
the target set are each composed of the available 295 face images. The target set is defined as the gallery of
persons to be recognized. The query set is defined as images of unidentified persons to be compared against the
target set. In the individual observer experiment, performance is evaluated by comparing the query set against
the target set annotated by the same observer. Three different observers annotated the identical twins dataset,
hence the experiment is executed three times. The performance for the inter-observer experiment is evaluated by
comparing the query set annotated by one observer against the target set annotated by another observer. For
example, a query set annotated by Observer-1 is compared against a target set annotated by Observer-2, and the
process is referred to as Observer-1 vs Observer-2. Similarly, we have Observer-1 vs Observer-3 and Observer-2
vs Observer-3.

For each of the aforementioned experiments, we define two different scenarios of comparing the query set and
the target set to generate performance curves. In the first scenario, we perform only twin comparisons, and in
the second scenario we perform twin-unrelated persons comparisons. The main difference between these two
scenarios lies in the generation of the impostor scores. For both scenarios the genuine scores are obtained by
comparing face images of a subject in the query set against other images of the same subject in the target set.
However, for the twin comparison scenario, the impostor scores are obtained by comparing a face image of a
subject in the target set only against the images of the subject's twin. The impostor scores for twin-unrelated
persons comparisons are obtained by comparing a face image of a subject in the target set against images of all
other subjects in the query set.
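The two scenarios differ only in which image pairs contribute impostor scores, which can be made concrete as follows. This is a sketch under stated assumptions: the function name and the `(image_id, subject_id)` input layout are hypothetical, each unordered pair of distinct images is compared once, and the unrelated-persons scenario follows the text literally by pairing each image with images of all other subjects.

```python
from itertools import combinations


def score_pairs(images, twin_of, scenario):
    """Split image pairs into genuine and impostor comparisons.

    images:   list of (image_id, subject_id) tuples (hypothetical layout)
    twin_of:  dict mapping a subject_id to the subject_id of its twin
    scenario: "twin" or "unrelated"
    """
    if scenario not in ("twin", "unrelated"):
        raise ValueError("scenario must be 'twin' or 'unrelated'")
    genuine, impostor = [], []
    for (img_a, subj_a), (img_b, subj_b) in combinations(images, 2):
        if subj_a == subj_b:
            genuine.append((img_a, img_b))        # same subject in both scenarios
        elif scenario == "unrelated":
            impostor.append((img_a, img_b))       # any other subject
        elif twin_of.get(subj_a) == subj_b:
            impostor.append((img_a, img_b))       # only the subject's own twin
    return genuine, impostor


images = [("a1", "A"), ("a2", "A"), ("b1", "B"), ("c1", "C")]
twin_of = {"A": "B", "B": "A"}                    # subject C has no twin in the set
gen_twin, imp_twin = score_pairs(images, twin_of, "twin")
gen_unrel, imp_unrel = score_pairs(images, twin_of, "unrelated")
```

In this toy set the genuine pairs are identical in both scenarios, while the impostor list shrinks to the twin pairs alone in the twin scenario.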

Initially the experiments are executed by considering all categories of facial marks defined. However, to
determine whether some facial mark categories improve the verification performance compared to other facial
mark categories, the following subsets of facial marks are also considered:
1. FM1 = moles, freckles, freckle group, birthmark, darkened patch, lightened patch, splotchiness, raised skin,
pockmark, scar round, scar linear (no pimple)
2. FM2 = moles, freckles
3. FM3 = moles, freckles, pimple
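Restricting a comparison to one of these subsets amounts to filtering the annotated marks by category before matching. The sketch below assumes hypothetical string labels for the categories (taken from the FM1 list above) and a hypothetical `(category, location)` tuple per mark; the annotation tool's actual labels are not given in the paper.

```python
# Category labels are illustrative; FM is taken to be FM1 plus "pimple".
FM1 = {
    "mole", "freckle", "freckle group", "birthmark", "darkened patch",
    "lightened patch", "splotchiness", "raised skin", "pockmark",
    "scar round", "scar linear",
}
FM = FM1 | {"pimple"}
FM2 = {"mole", "freckle"}
FM3 = {"mole", "freckle", "pimple"}


def filter_marks(marks, subset):
    """Keep only marks whose category (first tuple element) is in subset."""
    return [m for m in marks if m[0] in subset]


marks = [("mole", (0.2, 0.3)), ("pimple", (0.5, 0.1)), ("birthmark", (0.4, 0.4))]
fm1_marks = filter_marks(marks, FM1)
fm2_marks = filter_marks(marks, FM2)
fm3_marks = filter_marks(marks, FM3)
```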

6.3. Individual Observer Experiments
In these experiments, the query images and target images are annotated by the same observer. Figure 6 and
Figure 7 show the Receiver Operating Characteristic (ROC) curves for the twin comparison and unrelated-persons
comparison scenarios for the different subsets of facial marks annotated by one of the observers. Using all
facial mark categories for verification gives better performance than any of the subsets, and similar trends
are observed in both scenarios. The distributions of match and non-match scores for both experiments are shown
in Figure 9. Figure 8 compares the performance in the twin comparison experiment against the unrelated-persons
comparison. Performance is better when comparing facial marks across unrelated persons than when comparing
facial marks across identical twins. This suggests that the geometric distribution of facial marks is more
similar across twins than across unrelated persons. In other words, there appears to be a correlation in the
distribution of facial marks across twins. Prior research claims that the distribution of facial marks is
different across identical twins [6], but the results obtained do not exhibit this phenomenon. To assess the
statistical significance of this observation, the experiment is repeated multiple times with 50 twin pairs
randomly selected in each run. The error bars in Figure 8 reflect the range of performance curves obtained.
There does not appear to be any overlap between the two sets of error bars for false accept rates of 0.4 and
lower, which indicates that the performance difference is statistically significant. Hence, there appears to be
a correlation in the distribution of facial marks across identical twins. Table 2 and Table 3 present the Equal
Error Rates (EER) obtained for all three observers in the individual observer experiments. With all mark
categories (FM), the EER roughly doubles from 6.98%, 10.87% and 6.96% for unrelated-persons comparisons to
14.12%, 18.27% and 13.13% for twin comparisons. The difference in EERs across the two tables again
substantiates our observation that facial marks are more similarly distributed across identical twins than
across any two unrelated persons.
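For readers unfamiliar with the metric, the equal error rate is the operating point at which the false accept rate equals the false reject rate. The sketch below shows one common way to estimate it from genuine and impostor similarity scores, together with a simple resampling loop in the spirit of the repeated 50-pair runs. The function names, the input layout and the use of an EER range (rather than a range of full ROC curves, as in Figure 8) are assumptions for illustration.

```python
import numpy as np


def equal_error_rate(genuine, impostor):
    """Estimate the EER for similarity scores (higher means more similar)."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    best_gap, eer = np.inf, None
    for t in np.unique(np.concatenate((genuine, impostor))):
        far = np.mean(impostor >= t)      # impostors accepted at threshold t
        frr = np.mean(genuine < t)        # genuine comparisons rejected at t
        if abs(far - frr) < best_gap:     # closest crossing of the two rates
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer


def eer_range(per_pair_scores, n_pairs, n_runs, seed=0):
    """Min and max EER over random subsets of twin pairs.

    per_pair_scores: one (genuine_scores, impostor_scores) tuple per twin pair.
    """
    rng = np.random.default_rng(seed)
    eers = []
    for _ in range(n_runs):
        idx = rng.choice(len(per_pair_scores), size=n_pairs, replace=False)
        gen = np.concatenate([per_pair_scores[k][0] for k in idx])
        imp = np.concatenate([per_pair_scores[k][1] for k in idx])
        eers.append(equal_error_rate(gen, imp))
    return min(eers), max(eers)
```

Non-overlapping ranges for the twin and unrelated-persons scenarios would correspond to the separation of error bars discussed above, although a range is a descriptive summary rather than a formal significance test.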

Figure 6: ROC curves of twin comparisons for various sets
of facial marks annotated by Observer-3

Figure 7: ROC curves of non-twin comparisons for various
sets of facial marks annotated by Observer-3

Facial Mark Sets    EER (Observer-1)    EER (Observer-2)    EER (Observer-3)
FM                  6.98%               10.87%              6.96%
FM1                 8.03%               11.98%              8.27%
FM2                 10.99%              15.96%              11.87%
FM3                 11.16%              16.40%              9.42%

Table 2: Equal error rates computed for unrelated-persons comparison for various experiments for each observer,
where FM is the set consisting of all categories of facial marks and FM1, FM2 and FM3 are the different subsets
defined.

Facial Mark Sets    EER (Observer-1)    EER (Observer-2)    EER (Observer-3)
FM                  14.12%              18.27%              13.13%
FM1                 15.78%              20.84%              14.35%
FM2                 20.09%              21.68%              19.58%
FM3                 19.06%              20.22%              16.27%

Table 3: Equal error rates computed for twin comparison for various experiments for each observer, where FM is
the set consisting of all categories of facial marks and FM1, FM2 and FM3 are the different subsets defined.

Figure 8: Comparing performance between twin comparisons and non-twin comparisons for Observer-3 annotations.
The error bars indicate the range of performance curves obtained by repeating the experiments with randomly
selected subjects from the dataset.

Figure 9: The distributions of match and non-match scores for twin and non-twin comparisons.

6.4. Inter-Observer Experiments
In these experiments, biometric scores are computed by comparing a query set annotated by one observer to a
target set annotated by a different observer. Performance curves for these experiments are shown in Figure 10
for Observer-1 vs Observer-3 for both twin comparisons and twin-unrelated comparisons. Tables 4 and 5 present
the EER for Observer-1 vs Observer-2, Observer-2 vs Observer-3 and Observer-1 vs Observer-3, for both
twin-unrelated comparisons and twin comparisons. The ROCs are obtained by considering all categories of facial
marks for both the twin and unrelated-persons scenarios. The performance degrades considerably when facial
marks annotated by different observers are compared against each other. This is due to the variation in the
facial marks annotated by the observers: although all observers annotated the prominently visible facial marks,
some failed to annotate the less prominent marks. However, even in these experiments, the performance of
non-twin comparisons is better than that of twin comparisons, indicating that the distribution of facial marks
is more similar across twins than across unrelated persons.

Comparison of Annotations across Observers    EER
Observer-1 vs Observer-2                      28.11%
Observer-1 vs Observer-3                      18.64%
Observer-2 vs Observer-3                      28.17%

Table 4: Equal error rates obtained for unrelated-persons comparison for facial mark annotations across
Observers.

Figure 10: ROC curves for inter-observer analysis for twin and non-twin comparisons for Observer-1 vs
Observer-3.

Comparison of Annotations across Observers    EER
Observer-1 vs Observer-2                      35.08%
Observer-1 vs Observer-3                      23.71%
Observer-2 vs Observer-3                      32.71%

Table 5: Equal error rates obtained for twin comparisons for facial mark annotations across Observers.

7. Summary and Discussion
In this paper, we analyzed the usefulness of facial marks as a potential biometric signature for face
verification. We proposed a system for distinguishing between identical twins solely based on the geometric
distribution of facial marks. The experiments were designed and implemented to observe whether there exists a
correlation between facial marks across twins when compared to unrelated persons. From the results there
appears to be a correlation in the distribution of facial marks across twins, that is, the position of certain
facial marks appears to be similar for twins. In future, we will explore using richer facial mark
characteristics like texture, shape and color to improve performance. However, the results of the conducted
investigation make a case for the use of facial marks in biometric characterization. Facial mark information
can be used together with existing textural features to enrich facial characterizations for improved
performance. Manual annotation of facial marks is a difficult task, hence we observe a degradation in
performance when comparing facial marks annotated by different observers. This could be improved by developing
robust automatic facial mark detection methods. For future work, we aim to develop automatic facial mark
detection techniques and extract other features associated with facial marks.

References

[1] Z. Sun, A. A. Paulino, J. Feng, Z. Chai, T. Tan, and A. K. Jain, “A study of multibiometric traits of
identical twins,” in Proc. SPIE, Biometric Technology for Human Identification VII, April 2010.

[2] A. J. Martin, H.-C. Kuang, T. J. Mathews, D. L. Hoyert, D. M. Strobino, B. Guyer, and S. R. Sutton,
“Annual summary of vital statistics: 2006,” in Pediatrics, pp. 788–801, 2008.

[3] J. Daugman and C. Downing, “Epigenetic randomness, complexity and singularity of human iris patterns,”
Proceedings: Biological Sciences, vol. 268, no. 1477, pp. 1737–1740, 2001.

[4] D. Lin and X. Tang, “Recognize high resolution faces: From macrocosm to microcosm,” in Computer Vision and
Pattern Recognition, 2006 IEEE Computer Society Conference on, vol. 2, pp. 1355–1362, 2006.

[5] http://skin-care.health-cares.net/skin-lesions.php. Last accessed: 07/2010.

[6] G. Zhu, D. L. Duffy, A. Eldridge, M. Grace, C. Mayne, L. O’Gorman, J. F. Aitken, M. C. Neale,
N. K. Hayward, A. C. Green, and N. G. Martin, “A major quantitative-trait locus for mole density is linked to
the familial melanoma gene cdkn2a: A maximum-likelihood combined linkage and association analysis in twins and
their sibs,” American journal of human genetics, vol. 65, pp. 483–492, 08 1999.

[7] A. Kong, D. Zhang, and G. Lu, “A study of identical twins palmprints for personal authentication.”

[8] A. K. Jain, S. Prabhakar, and S. Pankanti, “On the similarity of identical twin fingerprints,” 2002.

[9] E. W. Kashiko Kodate, Rieko Inaba and T. Kamiya, “Facial recognition by a compact parallel optical
correlator,” Measurement Science and Technology, vol. 13, Nov 2002.

[10] A. K. Jain and U. Park, “Facial marks: Soft biometric for face recognition,” in ICIP, pp. 37–40, 2009.

[11] S. Milborrow and F. Nicolls, “Locating facial features with an extended active shape model,” ECCV, 2008.
http://www.milbo.users.sonic.net/stasm.

[12] H. W. Kuhn, “The hungarian method for the assignment problem,” in 50 Years of Integer Programming
1958-2008 (M. Jünger, T. M. Liebling, D. Naddef, G. L. Nemhauser, W. R. Pulleyblank, G. Reinelt, G. Rinaldi,
and L. A. Wolsey, eds.), pp. 29–47, Springer Berlin Heidelberg, 2010.

[13] http://www.twinsdays.org/.

Acknowledgement
This research was supported by the U.S. Department of Justice/National Institute of Justice under grant
2009-DN-BX-K231.