Investigating sources of PII used in Facebook's targeted advertising

Giridhari Venkatadri*, Elena Lucherini, Piotr Sapiezynski, and Alan Mislove

*Corresponding Author: Giridhari Venkatadri: Northeastern University, E-mail: venkatadri.g@husky.neu.edu; Elena Lucherini: Princeton University; Piotr Sapiezynski: Northeastern University; Alan Mislove: Northeastern University

Abstract: Online social networking services have become the gateway to the Internet for millions of users, accumulating rich databases of user data that form the basis of their powerful advertising platforms. Today, these services frequently collect various kinds of personally identifying information (PII), such as phone numbers, email addresses, and names and dates of birth. Since this PII often represents extremely accurate, unique, and verified user data, these services have the incentive to exploit it for other purposes, including to provide advertisers with more accurate targeting. Indeed, most popular services have launched PII-based targeting features that allow advertisers to target users with ads directly by uploading the intended targets' PII. Unfortunately, these services often do not make such usage clear to users, and it is often impossible for users to determine how they are actually being targeted by advertisers.

In this paper, we focus on Facebook and investigate the sources of PII used for its PII-based targeted advertising feature. We develop a novel technique that uses Facebook's advertiser interface to check whether a given piece of PII can be used to target some Facebook user, and use this technique to study how Facebook's advertising service obtains users' PII. We investigate a range of potential sources of PII, finding that phone numbers and email addresses added as profile attributes, those provided for security purposes such as two-factor authentication, those provided to the Facebook Messenger app for the purpose of messaging, and those included in friends' uploaded contact databases are all used by Facebook to allow advertisers to target users. These findings hold despite all the relevant privacy controls on our test accounts being set to their most private settings. Overall, our paper highlights the need for the careful design of usable privacy controls for, and detailed disclosure about, the use of sensitive PII in targeted advertising.

1 Introduction

Users conduct an increasingly large fraction of their everyday activities online, often via online social network services such as Twitter and Facebook. By virtue of being free, these services have become extremely popular; this has allowed them to collect data about an extensive set of users. These services use this data for various purposes, most notably to build advertising platforms through which advertisers can target platform users.

In particular, these services collect significant amounts of personally identifiable information (PII)—information such as email addresses or phone numbers that uniquely identifies users—for a variety of uses. For example, on Facebook, many of these uses are user-facing features: email addresses serve as login usernames, phone numbers allow users to find each other on Messenger, and users can "sync" their address books to find others they are not yet "friends" with. However, there are other uses of PII that primarily benefit third parties.
Most notably, many services have recently deployed PII-based targeting features on their advertising platforms [3, 37, 43], which allow advertisers to directly choose which users see their ads by providing a list of those users' PII. This feature—called custom audiences on Facebook's platform—is now popular among advertisers, as it allows them to exploit the existing PII they have about their customers (such as email addresses, phone numbers, or names and addresses) and target them with advertisements.

Recent events have brought the issue of how user data is collected, used, and made available to third parties to the forefront. In particular, it was recently revealed that tens of millions of users' Facebook profile data was collected by an innocuous Facebook app, and then later shared with Cambridge Analytica (a data mining consultancy) for use in targeting political ads in the 2016 U.S. election [7]; custom audiences potentially played a significant role in accomplishing this targeting [18, 22, 25]. In response to the resulting uproar among both the press and lawmakers, Facebook changed certain aspects of how apps can collect data [1, 34].

While the Cambridge Analytica story received significant attention, the resulting privacy debate focused largely on third-party apps and other such vectors of data leakage, and not on the advertising platforms that these companies use to exploit such data. For example, even though Facebook removed the functionality that allowed users to find other users using phone numbers as part of its response to the Cambridge Analytica story [34], advertisers can still use phone numbers for targeting ads. Unfortunately, we have little understanding of how Facebook collects user PII, associates PII with user accounts, and makes PII available for use by advertisers via custom audiences.

In this paper, we address this situation by developing a novel methodology to study how Facebook obtains the PII that it uses to provide custom audiences to advertisers. We test whether PII that Facebook obtains through a variety of methods (e.g., directly from the user, from two-factor authentication services, etc.) is used for targeted advertising, whether any such use is clearly disclosed to users, and whether controls are provided to users to help them limit such use.

Developing such a methodology presents two challenges: First, how do we verify whether PII added to an account has actually been used by Facebook for PII-based targeting? Second, how do we select "fresh" pieces of PII that are not already associated with some other Facebook user, in order to prevent incorrect inferences?

We solve both of these challenges by developing a technique to check whether a given piece of PII can be used to target some Facebook user (i.e., is targetable). Our technique exploits the size estimates that reveal how many users in a custom audience can be targeted with ads; these estimates are a fundamental feature of many advertising platforms, as they help advertisers budget their ad campaigns.
We first reverse-engineer one such size estimate: potential reach [4], which reports the number of users in an audience who are active on a daily basis. (While these size estimates were reverse-engineered in prior work [40] and were found then to be computed by simple rounding, we found that Facebook now uses a more sophisticated way of obfuscating them, potentially to defend against the privacy vulnerabilities discovered by that work.) We show that Facebook obfuscates potential reach using a combination of rounding and noise seeded by the uploaded records. (This was the case during the time period of our experiments in early 2018; as discussed in more detail in Section 5, Facebook has since temporarily removed these statistics, but provides other related statistics that could likely be used instead.) Despite this obfuscation, we develop a technique of uploading lists of PII of consecutive sizes, adding "dummy" padding records to each list to get multiple samples at each size, and then using these samples to conclude whether the true number of matched users is different across consecutive sizes. We demonstrate that this approach is able to effectively negate the effect of Facebook's obfuscation, allowing us to check whether a single piece of PII can be used to target a Facebook user with ads via custom audiences.

We then use our technique to check which of a variety of potential sources of PII are actually used by Facebook to gather PII for targeted advertising. For example, if we wish to study whether phone numbers provided for two-factor authentication (2FA) are used for targeted advertising, we first obtain a new phone number. We then verify (using the technique above) that no user is currently targetable via this phone number; if so, we add it as a 2FA number to a control account. We then repeatedly check over the subsequent month (again using the technique above) to see whether the phone number becomes targetable. Finally, if the number does become targetable, we verify our results by running a controlled advertisement targeting the phone number and confirming that our ads are received by the control account.

We examine seven different sources of PII to see which are used for targeted advertising: (1) PII added directly to a user's Facebook profile, (2) PII provided to the Facebook Messenger app, (3) PII provided to WhatsApp, (4) PII shared with Facebook when sharing a phone's contacts, (5) PII uploaded by advertisers to target customers via custom audiences, (6) PII added to user accounts for 2FA, and (7) PII added for login alerts. We find that five of these result in the PII being used for advertising: all except PII provided to WhatsApp and PII uploaded by advertisers.

Unfortunately, we find that Facebook does not directly disclose its PII practices beyond generic statements such as [13]:

"We use the information we have about you—including information about your interests, actions and connections—to select and personalize ads, offers and other sponsored content that we show you."

Worse, we found no privacy settings that directly let a user view or control which PII is used for advertising; indeed, we found that Facebook was using the above PII for advertising even when our control account had set the existing PII-related privacy settings to their most private configurations. Finally, some of the phone numbers that could be used to target users did not even appear in Facebook's "Access Your Data" feature, which allows users to download a copy of all of their Facebook data as a ZIP file.
Taken together, our results highlight the need to make the uses of collected PII clear to users, and to provide them with easy-to-use privacy controls over their PII. The sources of PII that we investigated, while being the most straightforward ones, are by no means exhaustive. However, since our method relies only on the provision of size estimates by PII-based advertising platforms, and since size estimates are an integral part of these platforms (as they are valuable to advertisers), a methodology similar to ours can potentially be used to investigate other sources of PII and other services.

Ethics: All the experiments in this paper were conducted according to community ethical standards. All were performed using only the authors' accounts (or one fake account we created for this paper) and did not interact with other Facebook accounts in any way. When experimenting with the Facebook interface, we only used email addresses and phone numbers that we controlled, email addresses that Facebook already had, or publicly-available data. In no instance did we reveal any user PII that we did not already have, or disclose any PII to Facebook that Facebook did not already possess. We also ensured that we put a minimal load on Facebook's systems: we only created one fake Facebook advertiser account, made a limited number of API queries, and respected rate limits. Additionally, we responsibly disclosed our discovery of a method to check whether a given piece of PII is targetable to Facebook's security team. Facebook responded by stating this was not a security vulnerability [9] and closed our bug report:

"Enumeration vulnerabilities which demonstrate that a given e-mail address or mobile phone number is tied to 'a Facebook account' are not eligible under the bug bounty program. This is true whether the endpoint returns only confirmed items, or both confirmed and unconfirmed. In absence of the user ID that the e-mail/mobile number is linked to, this behavior is considered extremely low risk."

Overall, we believe that any de minimis harm to Facebook as a result of our experiments is outweighed by the benefits to users in terms of increased transparency and understanding of how their PII is used.

2 Background

We begin by providing background on online social network advertising. In this paper, we focus on the Facebook advertising platform, as it is the largest and most mature. However, other competing services now offer similar features (e.g., PII-based user targeting), including Google's Customer Match [3] and Twitter's Tailored Audiences [37]. Thus, similar issues may exist on these sites as well; we leave a full exploration to future work.

2.1 PII-based targeting

Advertising on Facebook traditionally relied on selecting the attributes of the users to whom the advertiser wished to show ads. For example, an advertiser might specify that they wish to show ads to 30–35 year-old males who live in Chicago and who like a particular TV show. To allow the advertiser to place multiple ads with the same set of attributes, Facebook automatically creates a "group" of users after the advertiser selects the attributes of interest; these groups are called audiences.

Recently, Facebook—along with many other sites [40]—has introduced a new feature: custom audiences. Unlike traditional audiences, custom audiences allow the advertiser to target specific users. To do so, the advertiser uploads user PII to Facebook; Facebook then matches the provided PII against platform users. Facebook currently allows advertisers to upload 15 different types of PII, including phone numbers, email addresses, names, and even advertising tracking IDs [40]. Facebook then creates an audience consisting of the matched users and allows the advertiser to target this specific audience.

Figure 1 shows an example of the audience creation flow. In panel A, the advertiser is prompted to select the source of user information for targeting. The first option allows them to upload a list of users (e.g., existing customers). (The data is uploaded "in an encrypted format to maintain privacy" [12]; in reality, it is hashed using SHA-256 with no salt.) In panel B, the advertiser is instructed on the types of data available for targeting, including email addresses, phone numbers, mobile advertising IDs, etc.

Fig. 1. Flow of creating custom audiences using the Facebook advertising site. The advertiser can select to upload data (panel A), and then can choose from among 15 supported types of PII (panel B). Once the data has been uploaded, the advertiser is provided with coarse-grained statistics about the users who matched (panel C).
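For concreteness, the sketch below shows how an advertiser-side tool might prepare records for upload. The unsalted SHA-256 matches the behavior noted above, while the normalization rules (lowercasing email addresses, reducing phone numbers to digits) are our assumptions about typical practice; the helper names are ours.

    import hashlib

    def normalize_email(email):
        # Assumed normalization: trim whitespace and lowercase.
        return email.strip().lower()

    def normalize_phone(phone):
        # Assumed normalization: keep digits only, including the country code.
        return ''.join(c for c in phone if c.isdigit())

    def hash_pii(value):
        # Unsalted SHA-256, as observed in the upload flow; identical PII
        # always yields an identical digest, which is what enables
        # server-side matching.
        return hashlib.sha256(value.encode('utf-8')).hexdigest()

Because there is no salt, the same email address or phone number always produces the same digest, so the platform can match uploaded records against its own hashed copies of user PII.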
Audience statistics: Facebook does not reveal to the advertiser which users were actually in the matched group, but it does provide two statistics: the total number of matched users, called the audience size; and the number of daily active matched users, called the potential reach [40]. An example of the reported potential reach is shown in Figure 1, panel C. Facebook also allows advertisers to combine multiple custom audiences, and will provide the potential reach of the combination; we refer to this as the union of the audiences.

We recently demonstrated that even these coarse-grained audience size approximations could allow advertisers to infer the PII of particular Facebook users by observing changes in size statistics [40]. In brief, this attack worked because Facebook "deduplicated" PII that referred to the same underlying Facebook user in a list of uploaded PII. Facebook claims to have mitigated this issue by refusing to provide audience size and potential reach statistics when the advertiser uploads PII of multiple types; irrespective of whether this is the case, we show in the next section that we are still able to infer whether specific uploaded PII is targetable.

2.2 Site features

There are a few features that we investigate as potential sources of PII for advertising.

Profiles: Users are allowed to provide PII such as email addresses and phone numbers as part of their basic profile, both to serve as their login username and to be revealed to friends. Such user-reported PII could be used to match against advertiser-uploaded PII.

Login Alerts: If the user opts into Login Alerts, they are notified whenever anyone successfully logs in to the account from a "new" device. The alerts can be presented as a Facebook notification, a Messenger notification, an email, or an SMS. The two latter channels require the user to provide the email address or phone number (i.e., PII) to which the notification should be sent; as we will see later, this information could then be used to match against advertiser-uploaded PII.
Two-Factor Authentication (2FA): Facebook allows adding a variety of second-factor (what-you-have) authentication methods: SMS messages, USB security keys, code generators, and one-time-use recovery codes. The most commonly used of these is the SMS message, which requires a user to provide Facebook with a phone number to send the SMS to. As with login alerts, this PII could then be used to match against advertiser-uploaded PII.

Address book synchronization: Facebook users can find their friends on the platform by allowing the Facebook app to access their phone's address book. Each contact in the address book can have multiple pieces of PII, for example a name, an email address, and a phone number. Hence, Facebook could potentially match contacts based on partial PII (just the email address), but still learn new PII (a phone number of a person whose email address is known to the platform).

WhatsApp: Users are identified in WhatsApp by their phone numbers. If a user has both the WhatsApp and Facebook apps installed on the same (Android) phone, Facebook could use the Android advertising ID to learn that the two disconnected accounts belong to the same user, and thus associate the phone number with the Facebook account as well.

Messenger: Upon installation of the Messenger app, the user is prompted to upload their address book (potentially leaking contacts' PII) and to use it as the default SMS app. Granting the latter permission reveals the user's phone number to Facebook.

3 Methodology

We now develop a methodology to check whether Facebook uses PII from a given source for targeted advertising.

3.1 Datasets and setup

In order to reverse-engineer the size estimates provided by Facebook's PII-based advertising platform, similar to prior work [40], we collect 103 email addresses and phone numbers corresponding to friends and family who have Facebook accounts and had previously provided this PII to Facebook (they had already done so of their own accord; we did not ask them to upload it). There were no requirements (activity or otherwise) asked of these users; thus, these users were not affected in any way by our experiments.

To differentiate between PII that matches Facebook users and PII that does not, we also create dummy PII designed not to match any Facebook user. We generate dummy phone numbers by appending a random sequence of 20 digits to the Italian country code (+39); since Italian phone numbers do not exceed 12 digits, these dummy numbers cannot correspond to any Facebook user. Similarly, we generate dummy email addresses by using randomly generated alphabetic strings as usernames and (long) dummy domain names. (To confirm that the dummy PII we created do not correspond to any Facebook user, we created two audiences containing 1,000 dummy phone numbers and 1,000 dummy email addresses; both audiences had a potential reach of 20, the smallest value it can take, meaning that the dummy PII indeed do not correspond to any Facebook user.)

We then automate the process of uploading lists of PII to create custom audiences, and of collecting potential reach estimates, using scripts that make appropriate calls to Facebook's marketing API [21].
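The dummy-record generation is simple enough to sketch. The following is a minimal illustration of the scheme just described; the +39 prefix and 20-digit suffix are as in our setup, while the helper names and the exact email format are ours:

    import random
    import string

    def dummy_phone():
        # Italian country code followed by 20 random digits; Italian numbers
        # never exceed 12 digits, so this cannot match any real account.
        return '+39' + ''.join(random.choice(string.digits) for _ in range(20))

    def dummy_email():
        # Random alphabetic username at a long, nonexistent domain.
        user = ''.join(random.choice(string.ascii_lowercase) for _ in range(16))
        domain = ''.join(random.choice(string.ascii_lowercase) for _ in range(30))
        return user + '@' + domain + '.example'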
3.2 Reverse-engineering potential reach

As described in Section 2, Facebook's advertising platform provides two audience size estimates; for this paper, we only use the potential reach, which measures the number of active users in the audience. Below, we reverse-engineer how the displayed potential reach estimates are computed.

3.2.1 Are size estimates obfuscated?

Our prior work [40] demonstrated that Facebook's advertising platform obfuscated the potential reach by simple rounding; to do so, we showed that the potential reach estimates were granular (rounded in steps of 10 up to 1,000; we do not discuss or investigate larger audience sizes, as they are not necessary for this paper), consistent over short periods of time while occasionally varying over longer time periods (which is expected, since some active users might become inactive and vice versa), and monotonic. However, Facebook's interface has changed significantly since that work was conducted; thus, we revisit these findings to see whether Facebook still obfuscates size estimates via simple rounding.

Granularity: We create 10,000 different custom audiences by uploading sets of varying sizes containing either phone numbers or email addresses (from the 103 that we collected), and obtain their corresponding potential reach estimates. Consistent with prior work [40], we find that the estimates are still granular and increase in steps of 10, always returning one of {20, 30, 40, ..., 80}. (Note that the potential reach may not be near 100 even though we collected around 100 PII records, both because we uploaded subsets of varying size and because potential reach only counts "daily active" users.)

Consistency: To check the consistency of potential reach estimates over short periods of time, we create two audiences by uploading 70 and 89 phone numbers, respectively; for each audience, we make 1,000 potential reach queries back-to-back. We found that all 1,000 queries returned the same potential reach for each audience, with estimates of 40 and 80, respectively. We repeated the above experiment for various lists, both of phone numbers and of email addresses, and at different times, and found that the potential reach estimates for a given audience were always consistent over short periods of time. This is also consistent with prior work [40].

To check the consistency of potential reach estimates over longer periods of time, we take three audiences created by uploading 60, 80, and 103 phone numbers, respectively, and repeatedly obtain the potential reach for each audience every five minutes, over a period of around 14 hours, giving us 164 samples in total for each audience. We find that the estimates were consistent over this period of time, with values of 40, 60, and 80, respectively. We repeated the above experiment for other audiences and across longer periods of time, and found that the size estimates were generally consistent, sometimes changing over a period of hours. This is also consistent with prior findings, and is expected, as whether a given user is "daily active" (and counts towards potential reach) may change over time.
Finally, to check the consistency of potential reach across multiple uploads of the same list of PII, we repeatedly upload a list of 70 phone numbers 100 times over a period of three hours, and obtain the corresponding potential reach estimates; we find that the estimates are generally consistent across uploads, with 99 of the custom audiences having a potential reach of 40, with only one having a different potential reach of 50.7 Monotonicity: Prior work [40] found that the potential reach was monotonic, meaning adding additional records to an uploaded list would never reduce the potential reach. To check whether the potential reach estimates are still monotonic, we upload a series of lists of phone numbers, starting at 70 numbers and successively adding one number until we reach 89 numbers. Surprisingly, we find that the potential reach does not increase monotonically! For example, uploading a list of 77 phone numbers resulted in a potential reach of 70; adding three more records to these 77 and uploading the resulting list resulted in a potential reach of 50. This indicates Facebook’s potential reach computation has changed, and that they are likely obfuscating the potential reach estimates by randomly perturbing them. We repeated this experiment with other series of lists of phone numbers and email addresses, and found that similar lack of monotonicity holds. Summary: We find that the potential reach estimates remain granular, rounded to the nearest 10 (for the range of values that we observed), and remain consistent for a given audience across short periods of time, as observed in prior work [40]. However, we find that the potential reach estimates are no longer monotonic, indicating that Facebook might be additionally perturbing potential reach estimates by randomly perturbing them with noise. Therefore, we move on to reverse-engineer the updated—potentially more sophisticated—way in which Facebook obfuscates potential reach estimates. 7 While we are not sure about why this one upload resulted in a different value, we believe this could either be because of an occasional error in the process of creating an audience, or because of the variation of potential reach over longer periods of time. 3.2.2 Properties of noisy estimates Since Facebook appears to be using noise to perturb the potential reach estimates, we move on to study how the noise is seeded, and to characterize the relationship of the noisy estimates corresponding to a given custom audience with the true value. What seeds the noise? Since the potential reach estimates are consistent across multiple repeated queries to the same custom audience, this indicates that a fresh sample of noise is not generated corresponding to each query, and that the noise is fixed for a given custom audience (perhaps to limit a malicious advertiser’s ability to generate multiple noise samples). Additionally, since multiple uploads of the same list of PII records have the same potential reach, this indicates that the same seed is used to compute the noise sample whenever a given list of PII records is uploaded (indicating that this seed is computed using the list of PII records uploaded, for example by using a hash of the list contents). 
In order to check whether all the PII records in a given list are used to determine this seed, or whether only records that match some Facebook user are used, we take a list of 60 phone numbers and upload it 400 times, each time with a different dummy phone number added (i.e., a phone number that we know cannot match any user). This gives us 400 custom audiences, each containing the same set of users (since they were created using the same list of valid PII records) but created from different lists of PII records (since each list contains a different dummy record). We find that the potential reach varies across the audiences, with values 20 (appearing once), 30 (appearing 42 times), 40 (appearing 192 times), and 50 (appearing 165 times). We find that the result holds even if we separately create one audience corresponding to the 60 phone numbers, create 400 audiences corresponding to one different dummy record each, and then dynamically ask for the potential reach of the union of the large audience with each of the dummy audiences. This result indicates that Facebook considers all the uploaded PII records when deterministically calculating the noise to add, regardless of whether they are valid records or not.

Summary: We find that Facebook obfuscates the potential reach estimates corresponding to a given custom audience using a fixed noise value; the seed for this noise is computed based on the list of (both valid and invalid) PII records uploaded. However, this suggests a method to obtain multiple noisy estimates corresponding to a given audience, and potentially overcome the effect of the noise: upload the same list of PII records multiple times with a different dummy record added each time, and obtain the corresponding potential reach values. We can then examine the distribution of potential reach values to say something about the true underlying value.

Obtaining a large number of samples: To measure the distribution of potential reach values, we need a way to easily obtain a large number of samples of noisy estimates without having to upload a large number of dummy audiences (since Facebook only allows us to maintain 500 custom audiences in a given advertising account). To accomplish this, we extend the idea of combining dummy audiences proposed above by creating 50 audiences with a different dummy phone number each, and then dynamically taking the union of two dummy audiences at a time with the given custom audience, as illustrated in Figure 2. Since the list of PII records corresponding to a given combination of audiences is different (each combination corresponds to a different combination of dummy records), each combination of two dummy audiences should give us a different sample of the noisy estimate. Using 50 dummy audiences gives us 1,225 samples corresponding to all possible pairs of dummy audiences; it takes up to 20 minutes to obtain all samples once all the audiences are uploaded and ready (at a rate of about one query per second).

Fig. 2. Process of using combinations of a small number of audiences to obtain a large number of samples. In our case, we combined the target audience with pairs from a set of 50 dummy audiences, resulting in 1,225 samples for the target.
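A minimal sketch of this sampling procedure follows; the get_union_reach helper is a hypothetical stand-in for the marketing API call that returns the potential reach of a union of audiences:

    from collections import Counter
    from itertools import combinations

    def sample_reach_distribution(target_audiences, dummy_audiences, get_union_reach):
        # Each pair of dummy audiences changes the underlying list of PII
        # records, and hence the noise seed, yielding a fresh noisy sample.
        samples = Counter()
        for d1, d2 in combinations(dummy_audiences, 2):  # C(50, 2) = 1,225 pairs
            samples[get_union_reach(list(target_audiences) + [d1, d2])] += 1
        return samples

With 50 dummy audiences this issues 1,225 queries, which matches the roughly 20-minute runtime at about one query per second.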
Distribution of noisy estimates: To characterize the distribution of noisy estimates, we upload consecutive lists of 70 to 90 phone numbers, each with one phone number beyond the previous one; we then obtain the distribution of potential reach for each of the audiences (1,225 samples per audience), and plot the variation in the histogram of the distribution against the number of phone numbers in the first part of Figure 3. We observe a number of interesting trends.

Fig. 3. How the distribution of noisy potential reach values varies with uploaded sets of phone numbers of increasing size. The first part of the figure shows how the histogram and average of potential reach values vary, while the second part shows how the average changes between subsequent distributions. Error bars correspond to 95% confidence intervals.

Is noise bounded? We notice from the first part of Figure 3 that the observed potential reach values corresponding to a given set of phone numbers are always drawn from a set of two or three consecutive multiples of 10; for example, the distribution corresponding to the set of 70 uploaded phone numbers has values in the set {30, 40, 50}. (For four of the phone number lists, we occasionally notice a very small number of samples with an outlier value, the maximum observed being 4; we believe this might be because of occasional inconsistencies when combining a custom audience with different dummy records.) We experimented with phone number lists of various sizes (up to 100) and observed that this result held irrespective of the size of the phone number list. This experiment shows that the noise is bounded, that noisy estimates are drawn from a range of at most thirty consecutive values (since each rounded potential reach value could correspond to one of ten unrounded values), and that these bounds do not depend on the magnitude of the actual potential reach (at least within the range of values that we study).

Fig. 4. How the frequency of observed potential reach values changes with uploaded sets of phone numbers of increasing size. Whenever three values are observed for a given set of phone numbers, the bar at the top shows the median value. Error bars correspond to 95% confidence intervals.

Is noise added before or after rounding? Having observed that the true value is both obfuscated via rounding (in steps of ten) and perturbed by noise, we move on to examine whether the true value is first perturbed with noise and then rounded, or first rounded and then perturbed with noise. To do so, we study how the distribution of observed values shifts as the size of the corresponding list of phone numbers increases. In the upper panel of Figure 3 we see that the histogram of potential reach values shifts towards higher values as the size of the phone number list increases, and that the frequency with which any potential reach value occurs changes in discrete steps as the size of the phone number list increases.
To further characterize the steps by which these frequencies change, Figure 4 shows how the frequency of occurrence of a particular potential reach value in a distribution varies with the size of the corresponding list of phone numbers. From the figure, we see that the frequencies of occurrence change in steps of uniform size (of about 0.1). First, note that such uniform steps are what would be expected if noise is added to (or subtracted from) the (rounded or unrounded) true value; if the noise were instead multiplied in, the steps would be non-uniform. Second, if the noise were added to the true value after rounding it (in steps of ten), then the distribution would shift if and only if the underlying rounded value had changed (by ten); thus, we would expect that with every "shift" in the distribution, the set of observed values would shift by 10. For example, assume that the value after rounding (before adding noise) is 60 and the corresponding set of observed values is {30, 40, 50}. This distribution would only "shift" when the value after rounding "shifts" to the next multiple of 10 (say, to 70), in which case the corresponding set of observed values would be expected to be {40, 50, 60}. However, this is contrary to the much finer steps with which the observed distribution shifts, showing that noise is added before rounding.

Is the distribution uniform or non-uniform? To study whether the noise is uniformly drawn from uniformly spaced values, we study how the frequency of occurrence of the median of the three observed potential reach values changes as the true value increases (when there are only two observed potential reach values, either could be considered the median). For example, assume the noise is uniformly drawn from a range of m contiguous values, such as the set {0, ..., m−1} (where m ≤ 30). Of the range of contiguous values obtained by adding the noise to the true value, the smallest few values will be rounded to the smallest of the three observed potential reach values, the next ten values will be rounded to the median of the three observed potential reach values, and the remaining (largest) values will be rounded to the largest of the three observed potential reach values. Whatever the true value, we would therefore expect the median observed potential reach to always have exactly ten distinct noise values corresponding to it. Therefore, assuming the noise is uniformly distributed over {0, ..., m−1}, the expected frequency of occurrence of the median potential reach is 10/m, irrespective of the true value. On the other hand, if the noise is non-uniformly distributed, the expected frequency of occurrence of the median potential reach would change with the true value. Similarly, if the noise is drawn from values that are non-uniformly spaced, the expected frequency of occurrence of the median potential reach would also change with the true value.
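Restating this argument compactly (our notation, not additional analysis): exactly ten of the m equally likely noise values round to the median bucket, so

    \Pr[\text{median value observed}] \;=\; \frac{10}{m},

and if the observed median frequency is f, then m = 10/f; a frequency near 0.5 would imply m = 20, in which case each unit increase in the true value should move a fraction 1/m = 0.05 of the samples between adjacent buckets.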
To study this, we show the median potential reach value for each size of phone number list in Figure 4, via a bar at the top colored to match the line of the appropriate value (we only show this when three distinct values are observed). (Figure 4 also sometimes shows a fourth observed potential reach value for some sizes of phone number lists. As previously described, we believe these to be because of occasional inconsistencies when combining different dummy records with a given custom audience; we disregard these when finding the median bin.) We observe that the frequency of the median potential reach remains constant (around 0.5) regardless of the increase in the number of phone numbers uploaded, and despite the fact that the frequencies of the other two potential reach values show multiple changes over the range shown; this shows that the noise values are uniformly distributed (and uniformly spaced). Moreover, since the expected frequency of occurrence of the median potential reach is 10/m, we can determine that Facebook has chosen m = 20 and therefore that the noise is uniformly distributed over 20 consecutive values.

However, if the noise were indeed uniformly distributed over twenty consecutive values, then whenever the true value increases by one, we would expect the frequency of the smallest observed potential reach value to decrease in steps of 1/20 (i.e., of 0.05). However, as previously observed, the step sizes in Figure 4 are close to 0.1, approximately double the expected value.

Investigating the unexpected step size: To investigate why the frequencies of observed potential reach values change in steps twice as large as expected (i.e., in steps of 0.1 rather than 0.05), we check whether the true value of the potential reach (obtained by averaging the different samples) increases in steps of one as we increase the number of phone numbers uploaded. The lower panel of Figure 3 shows the changes in the average of the observed potential reach values between consecutive sizes of phone number lists. We see that all non-zero changes are close to two in magnitude, showing that the true value increases in steps of two rather than one, suggesting that the true value is first rounded in steps of two (before adding noise and rounding in steps of ten).

To further confirm this, we upload a list of 61 phone numbers, and similarly create six custom audiences, each containing one of six different phone numbers corresponding to users we know to be active Facebook users. We obtain the distribution of potential reach estimates corresponding to the 61 phone numbers; we then take the union of this audience with the audiences corresponding to each of the six phone numbers, adding them in one by one and finding the distribution of potential reach each time. We find that the distribution shifted with the addition of the first phone number, did not shift with the addition of the second phone number, and so on, shifting only with every alternate phone number added. Repeating the experiment with different phone numbers, we also find that the distribution shifted with the addition of the first phone number, irrespective of which of the six phone numbers was chosen first. This confirms that Facebook rounds the true value of the potential reach in steps of two, before obfuscating it further.

Summary: Taken together, we find that Facebook first rounds the true value of the potential reach estimate in steps of two, then adds (or subtracts) uniform pseudorandom noise seeded by the uploaded PII records and drawn from a range of 20 consecutive values, and finally rounds the result in steps of ten. Given this understanding of how potential reach is calculated, we can now revisit our original goal of determining whether a piece of uploaded PII can be used to target a Facebook user (i.e., is targetable).
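The inferred pipeline is compact enough to simulate. The sketch below is a model consistent with our measurements, not Facebook's actual code: the hash-based seed derivation, the noise range {0, ..., 19}, and the rounding directions are assumptions that reproduce the observed behavior (values spanning two or three consecutive multiples of ten, a median frequency near 0.5, and step sizes near 0.1).

    import hashlib
    import random

    def observed_potential_reach(true_reach, uploaded_records):
        # Step 1: round the true value in steps of two.
        value = 2 * round(true_reach / 2)
        # Step 2: add noise drawn uniformly from 20 consecutive values,
        # seeded deterministically by the full uploaded list (valid and
        # dummy records alike); the exact seed derivation is our assumption.
        digest = hashlib.sha256('\n'.join(sorted(uploaded_records)).encode()).hexdigest()
        value += random.Random(digest).randrange(20)
        # Step 3: round the result in steps of ten.
        return 10 * round(value / 10)

Under this model, re-uploading the same list always yields the same estimate, while changing any record (even an invalid dummy) draws a fresh noise value, exactly as observed.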
3.3 Determining whether PII is targetable

Since Facebook first rounds the true value of the potential reach prior to adding noise (and then subsequently rounds the resulting value in steps of ten), we need to overcome the first layer of rounding by finding a threshold audience At whose true size falls right on the rounding threshold (i.e., adding another user to it would cause the value to be rounded to the next higher value). This idea of finding a threshold audience is adapted from our prior work [40].

Finding a threshold audience: To find a threshold audience, we first upload a series of PII lists to Facebook (call them {L1, L2, ..., Ln}), where each list consists of the previous list with one record added to it. We then check the potential reach distributions for the resulting audiences {A1, A2, ..., An}, and find an audience At such that the distributions for At and At+1 are different. At is then our threshold audience (if At were not a threshold audience, the true size estimates of At and At+1 would have been rounded to the same value, leading to identical distributions for the potential reach). In all our experiments in the previous section, we noticed that the change in the number of occurrences (out of 1,225 samples) of the lowest observed potential reach estimate across consecutive PII lists was either very small (never more than 60) or large (never smaller than 90). Therefore, to check whether the distribution shifts between At and At+1, we check whether the number of occurrences of the smallest observed potential reach estimate drops by more than 90 (in expectation, we would expect a shift to cause a drop of 123 in the lowest bucket, or 10% of the 1,225 samples).

Checking whether PII is targetable: In order to check whether a given piece of PII V is targetable, we compare the potential reach distributions of At versus At ∪ V. If these come from different underlying distributions, then V matches an active Facebook user and is targetable (as adding V changed the distribution); otherwise, it is not. We check whether the distribution shifts between At and At ∪ V in the same manner as above, checking whether the number of occurrences of the smallest observed potential reach estimate drops by more than 90.

Validation: To validate the above methodology, we generate ten dummy phone numbers and email addresses and check whether they can be used to target some user. We then check, for three phone numbers and two email addresses belonging to the authors (who have active Facebook accounts), whether they can be used to target some user on Facebook. Using the technique proposed in this section, we find that none of the dummy records are targetable, while all of the PII corresponding to the authors is targetable.
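A sketch of the resulting decision procedure, built on the sampling helper sketched earlier, follows. The 90-sample threshold comes directly from our measurements; the helper names (including the hypothetical upload function that creates a one-record audience) are ours.

    def distribution_shifted(before, after):
        # 'before' and 'after' map potential reach values to counts over
        # 1,225 samples. A real shift moves about 10% of the mass
        # (~123 samples) out of the lowest bucket; 90 is a conservative cutoff.
        lowest = min(before)
        return before[lowest] - after.get(lowest, 0) > 90

    def is_targetable(pii, threshold_audience, dummies, get_union_reach, upload):
        # Compare the distributions of A_t and A_t union {pii}: a shift
        # means 'pii' matched an active Facebook user.
        before = sample_reach_distribution([threshold_audience], dummies,
                                           get_union_reach)
        after = sample_reach_distribution([threshold_audience, upload([pii])],
                                          dummies, get_union_reach)
        return distribution_shifted(before, after)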
3.4 Determining if a source of PII is used

We now describe a methodology that uses the technique developed in the previous section to check whether PII gathered by Facebook via a given source is actually used by Facebook for PII-based advertising. The methodology can be summarized as follows:

1. Pick a piece of PII (e.g., a new phone number) that we control (call it the test PII) to use for the experiment. Check whether the test PII is targetable to begin with; if so, it is already associated with some Facebook account (and thus might interfere with the experiment), so pick another piece of PII instead.

2. Take a Facebook account that we control (call it the control account) and the test PII from the previous step. Using the given source from which Facebook gathers PII about users, provide the test PII in a way that allows Facebook to associate it with the control account. This could be direct (e.g., adding the test PII directly to the control account as a 2FA number) or indirect (e.g., syncing a contact list containing the test PII from some other Facebook account, so that Facebook can link the test PII to the control account). We describe in detail how we do this for different sources in Section 4.3.

3. Check daily over the next month whether the test PII becomes targetable. If it does, confirm that it is associated with the control account by running an ad targeting the test PII, and verify that the control account receives it. (This may not be guaranteed to succeed, owing to the complexity of the ad delivery process [23].) If so, we can conclude that the given source is a vector for PII-based advertising.

4 Experimental Results

We continue by using the methodology developed above to investigate which of various potential sources of PII are actually used by Facebook to gather information for its PII-based advertising feature.

4.1 Facebook's data use policy

We first analyze Facebook's data use policy [13] (last revised on April 19, 2018) to understand what it reveals to users about the potential uses of PII, and about the sources from which PII is collected. Facebook's data policy covers the information it processes to support what it terms the "Facebook Products" or "Products" [42], which include services offered to users, such as Facebook (including the mobile app), Messenger, and Instagram; and services offered to business partners, such as the Facebook Business Tools [38] (including various APIs, Facebook social plugins, etc.). Other companies owned by Facebook [39] (currently eight are listed), such as WhatsApp, have their own privacy policies. Here, we focus on the information that is disclosed in Facebook's data policy.

Potential uses of PII: First, we note that Facebook's data policy [13] describes the potential uses of the information collected only at a high level, and in general does not differentiate among different types of information or the sources from which it is obtained. Regarding advertising, it does not directly refer to PII, but says:

"... we use the information we have about you—including information about your interests, actions and connections—to select and personalize ads, offers and other sponsored content that we show you."

Potential sources of PII: To understand potential sources of PII that Facebook could use, we analyze the sources of information listed in Facebook's data policy and describe how the various sources listed there could potentially be used to collect PII.
In general, we find that the sources of information are also described only at a high level, making it hard for users to understand which of them could potentially be used to collect PII, and what PII might be collected from each source.

"Information and content you provide": The policy mentions that Facebook collects "the content, communications and other information you provide when you use our Products, including when you sign up for an account, create or share content, and message or communicate with others." This indicates that PII directly provided to Facebook (e.g., the email address or phone number you use when you sign up for an account), or PII mentioned in messages with other users, might potentially be collected and used for advertising.

"Things others do and information they provide": The policy mentions that Facebook collects "information that other people provide when they use our Products," which "can include information about you," such as when they "upload, sync or import your contact information." In addition, the policy mentions that contact information collected from such uploading, syncing, or importing is used for any of the purposes listed in the policy (of which advertising is one). This indicates that PII provided about you to Facebook by other users might potentially be collected by Facebook and used for advertising; in our context, this is particularly worrying, as the user may not even be aware that such PII about them has been collected by Facebook and is being used to target advertisements to them.

"Device information": The policy mentions that Facebook collects device information, including PII such as location and device IDs, and connection information such as language and mobile phone number.

"Information from partners": The policy mentions that Facebook receives information "about your activities off Facebook" and "about your online and offline actions and purchases" from third-party partners. While the policy mentions that PII is never shared with advertising partners (without user permission), or with measurement or analytics partners, it never mentions whether PII is received from these partners. Thus, PII about users could potentially also be obtained by Facebook from third-party partners (such as advertisers, app developers, third-party data providers, etc.).

Summary: Facebook's data use policy reveals that a variety of potential sources of information, including sources where the user is not directly involved, could be used by Facebook to collect PII for advertising. However, we find that the data use policy describes the sources of information at a high level, making it hard for a user to understand which sources might be used to collect PII. Moreover, the policy simply mentions that all collected information might be used to target advertisements; this is likely insufficient for users to understand which sources of PII are used for targeted advertising.

4.2 Privacy controls for PII

We examined Facebook's interface and found that only the three following privacy options help control the usage of PII (we limit ourselves to high-fidelity PII: email addresses and phone numbers).

First, users can specify who can see each piece of PII listed on their profile; the current list of possible general settings is: Public, Friends, Only Me (see Figure 5).

Fig. 5. Editing PII using the Facebook interface. The user can decide who to make the information available to and whether or not it should appear on their timeline.
In addition, users can specify a custom list of users who can see or not see the PII, or choose from preset groups of people (computed by Facebook) who match the user's workplace, location, university, etc. We call this the profile privacy control.

Second, Facebook allows users to restrict the set of users who can look them up using their email address or phone numbers; users can choose from the following options: Everyone, Friends of Friends, and Friends (see Figure 6). (Recently, in response to data leakage attacks, Facebook disabled the ability to look up users via email addresses or phone numbers; however, these controls still remain.) We call this the lookup privacy control. Note that this control does not refer to any particular phone number or email address; it is one global setting for phone numbers and one for email addresses.

Fig. 6. In the privacy settings, the user can decide who can look their profile up using the provided email address and phone number. The most restrictive option available is "Friends". We find that even when the user sets the PII visibility to "Only me" and searchability to "Friends", advertisers can still use that piece of information for targeting.

Third, on the ads preferences page [11], Facebook shows users a list of advertisers who have included them in a custom audience using their contact information. Users can opt out of receiving ads from individual advertisers listed here; however, they cannot see what PII is used by each advertiser. Additionally, Facebook does not let users directly control which PII is used to target advertisements to them.

4.3 PII sources for PII-based advertising

We move on to use the methodology proposed in Section 3 to study which of a number of potential sources of PII are actually used in PII-based advertising.

4.3.1 Setup

In order to obtain phone numbers to use for our experiments, we purchased SIM cards and plans from various mobile operators. We verified whether any of the numbers were already targetable (as per our methodology proposed in Section 3); we discarded those that were, and used only the numbers that were not targetable before our experiments. In addition, we used other email addresses belonging to the authors that they had not previously provided to Facebook (and which were similarly double-checked to not be associated with active Facebook accounts). We use the accounts of the three authors with active Facebook accounts for all our experiments. We performed a factory reset on the Android phone we used for these experiments before inserting each new SIM card, in order to wipe out any context that might lead to interference between experiments. (While Facebook could potentially use the device's immutable identifiers, such as the IMEI number, to link data obtained across factory resets, this is unlikely, as it contravenes Google's Android developer best practices [6].)

4.3.2 PII provided as profile data

Facebook allows users to add contact information (email addresses and phone numbers) to their profiles. While any arbitrary email address or phone number can be added, it is not displayed to other users unless verified (through a confirmation email or confirmation SMS message, respectively). Since this is the most direct and explicit way of providing PII, we first study it to obtain a baseline estimate of how quickly Facebook makes newly collected PII available for targeted advertising. We added and verified an email address and a phone number on one author's account, and found that both became targetable within six days.
We also added an unverified email address and phone number to one of the authors' accounts (i.e., we did not complete the SMS/email verification process), and found that neither the email address nor the phone number became targetable after one month, suggesting that only verified phone numbers and email addresses are used for advertising. Note that, for the purposes of this baseline experiment, we had set the least restrictive options for both the profile privacy control and the lookup privacy control. For all remaining experiments, we assume that a user is privacy-conscious, and turn both PII-level privacy controls to their most restrictive settings.

Disclosure and privacy controls: When users add mobile phone numbers directly to their profile, no information about potential uses of that number is directly disclosed to them, as shown in Figure 7 (panels A and C show the interfaces for adding phone numbers on the website and the Facebook app, respectively); the same holds for email addresses. Thus, users adding contact information for their friends' convenience may not be aware that their PII will then be used for targeting ads.

Fig. 7. Screenshots of interfaces used to add mobile phone numbers on Facebook's main website (A and B) and on Facebook's mobile app (C and D). Interfaces A and C come up when a user directly adds phone numbers to their Facebook profile, while interfaces B and D arise when Facebook adds a phone number for a security feature.

4.3.3 PII provided for security

We move on to examine whether PII provided by users for security purposes, such as two-factor authentication (2FA) or login alerts, is used for targeted advertising. Users may naturally provide this data with only security purposes in mind; if used for advertising, this may significantly violate a user's privacy expectations.

Two-factor authentication: We added and verified a phone number for 2FA on one of the authors' accounts. We found that the phone number became targetable after 22 days, showing that a phone number provided for 2FA was indeed used for PII-based advertising, despite our account having set the privacy controls to the most restrictive choices.

Unrecognized login alerts: Facebook allows users to add email addresses or phone numbers to receive alerts about logins from unrecognized devices. We added a phone number and an email address to an author's account to receive login alerts, and found that both the email address and the phone number became targetable after 17 days.

Disclosure and privacy controls: Information about potential uses of a mobile phone number added for security purposes is only disclosed to users when adding a number from the Facebook website (and not from the Facebook mobile app). This can be seen from panels B and D of Figure 7, which show the interfaces for adding mobile phone numbers for security features using the website and the app, respectively (no disclosure about potential uses happens elsewhere during the process). The interface informs users that "confirming your mobile number helps you reset your password if you ever need to, find friends, get SMS updates and more. Only you will see your number."
Only you will see your number.” The text “and more” is hyperlinked to Facebook’s data policies page, as discussed in Section 4.1. There is no disclosure on either the website or the mobile app when email addresses are added to receive unrecognized login alerts. Finally, as with adding PII to the profile, there is no indication to users that there exist other relevant privacy controls that they might want to revisit. Thus, it is highly likely that users are unaware that enabling security features through this interface will enable advertisers to target them directly with ads.

4.3.4 PII provided to other Facebook services

We move on to study whether PII provided to Facebook-owned services other than the main website and app is used for advertising.

Facebook Messenger: Users must provide and verify a mobile phone number with the Facebook Messenger app if they want to use its SMS functionality. We installed the Facebook Messenger app on a freshly wiped phone, added a phone number to it (verified with an SMS message), and checked whether the phone number became targetable. We found that it did indeed become targetable, after nine days. Again, the use of these phone numbers for targeted advertising can be counter-intuitive to users and violate their privacy expectations, since the phone number is provided with a specific purpose in mind (SMS messaging), and in the specific context of the Facebook Messenger app.

Disclosure: The first page of the process of adding a phone number to the Messenger app discloses to the user that setting up Messenger “lets Friends find each other on Facebook and helps us create a better experience for everyone.” However, apart from this generic description, no other details are provided to the user about potential uses of the data collected.

WhatsApp: Users are generally identified by their phone numbers on WhatsApp. To study whether these numbers are used for advertising, we first installed the Facebook app on a freshly wiped phone and logged in with one of the authors’ accounts. We then installed the WhatsApp app on the same phone, providing a new phone number. We found that the new phone number did not become targetable even after a month. Note that our experimental setup is not exhaustive, and there may be other situations in which Facebook would use WhatsApp phone numbers that we did not consider.

Disclosure: The first page of the process of adding a phone number to the WhatsApp app includes a link to WhatsApp’s Terms of Service and Privacy Policy; these make the generic statement that Facebook “may use information from us to improve your experiences within their services,” including for showing ads.

4.3.5 PII obtained without the user’s knowledge

Finally, we investigate whether PII obtained without a user’s knowledge, such as by some other user syncing their phone contacts or by an advertiser uploading it to create a custom audience, is used for PII-based advertising. Such use would be particularly pernicious, because it involves PII that the user is not even aware Facebook has, and which could additionally be inaccurate (as it is not verified by the user).

Phone contacts: One way Facebook could learn users’ PII from their friends would be by scanning friends’ contact databases, linking contacts to existing Facebook accounts, and then augmenting those accounts with any additional PII found in the contact databases.
For example, if a Facebook user has a phone contact containing an email address corresponding to some Facebook user, along with a phone number that does not correspond to any Facebook user, Facebook might link the new phone number to the account corresponding to the email address. We used a factory-reset Android phone and created a contact containing the full name and email address of one of the authors (both of which Facebook already had), as well as a new phone number that we controlled and had verified was not targetable. We then installed the Facebook Messenger app, giving it permission to sync the list of phone contacts. We found that the previously-unused phone number became targetable within 36 days (an upper bound, owing to a short gap in our testing), showing that it had indeed been linked to the corresponding author’s account without their knowledge. Making this situation worse, the matched phone number was listed neither on the account’s profile nor in the “Download Your Information” archive obtained from Facebook [5]; thus, the target user in this scenario was given no information about, or control over, how this phone number was used to target them with ads.

Information provided by advertisers: As described in Section 2, in order to use PII-based advertising, advertisers first upload lists of PII belonging to their customers, and then target the resulting set of matched Facebook users with advertisements. This information is “encrypted” [12] (in reality, hashed) prior to upload. However, because Facebook uses SHA-256 with no salt added, it could potentially determine what PII was uploaded via techniques like rainbow tables. Even without reversing the uploaded hashes, Facebook could potentially use this data to enrich the PII it uses to match users for targeted advertising, as illustrated in the following example. Assume Facebook knows the hashed value h_a of a PII attribute a for a particular user u, and that an advertiser uploads a record (h_a, h_b) consisting of the hashed values of attributes a and b. Using the value of h_a, Facebook can determine that the corresponding user is u, and learn that the hashed value of attribute b for u is h_b; in the future, Facebook can then match h_b to user u without ever knowing the actual value of b.

To study whether Facebook uses this source, we uploaded a list consisting of just a single record containing an email address that one of the authors uses to log in to Facebook, together with another email address that we verified was not targetable. We then checked whether the second email address became targetable; we found that it did not, even after a month, suggesting that this source is not used by Facebook to infer PII for advertising.
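To make the linking in the example above concrete, the following minimal Python sketch shows how unsalted SHA-256 digests can be joined across uploads. The email address, phone number, and normalization step are hypothetical stand-ins for illustration, not details confirmed by Facebook.

```python
import hashlib

def sha256_hex(value: str) -> str:
    # Hypothetical normalization (trim + lowercase) before hashing; with
    # no salt, the same PII always produces the same digest everywhere.
    return hashlib.sha256(value.strip().lower().encode("utf-8")).hexdigest()

# Hashes the platform already knows: h_a (a hashed email) -> user u.
known_hashes = {sha256_hex("author@example.com"): "u"}

# An advertiser-uploaded record (h_a, h_b): hashed email, hashed phone.
h_a = sha256_hex("author@example.com")
h_b = sha256_hex("+15555550123")

if h_a in known_hashes:
    user = known_hashes[h_a]
    # The platform can now match h_b to user u in any future upload,
    # without ever learning the plaintext phone number behind it.
    known_hashes[h_b] = user
```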
4.4 Verification by running ads

Of the seven potential sources of PII studied above, we found that five were indeed used to enable PII-based advertising. Most worrisome, we found that phone numbers uploaded as part of syncing contacts, which were never provided by the target user and never listed on their account, were in fact used to enable PII-based advertising. In all cases, we found either no disclosure about the potential uses of the data, or insufficient disclosure in the form of generic statements.

To confirm that a given piece of PII became targetable as a result of our providing it to Facebook through a given source, we ran PII-based targeted advertisements targeting each piece of PII found to be targetable (the final step of the methodology described in Section 3). As described in that section, this is not guaranteed to succeed even when the PII does correspond to the user being targeted, owing to the complexity of the ad delivery process [23]. To increase the chances of our ads winning the auction process (which decides which ad among a set of competing ads is shown to a user), we used a bid amount four times higher than the default bid shown by Facebook. We then searched for our ads by scrolling through the Facebook feeds of the corresponding accounts, and identified them by looking for the custom text that we put in each ad. We were able to successfully target and receive ads targeting the phone numbers added for two-factor authentication, added for security notifications, and provided to Facebook as part of uploading phone contacts; we were not able to successfully target and receive ads for the phone numbers added directly to a profile or via Facebook Messenger. Moreover, even in the ad campaigns where we were successful, we received fewer than half of the distinct ads targeted at each account. While this result confirms some of our most surprising findings, it also underscores why our methodology for inferring whether a user is targetable is necessary: relying on placing ads alone is a potentially expensive signal prone to false-negative errors.

5 Discussion

We briefly discuss a few issues that our methodology and findings bring up.

Why use potential reach estimates? At first glance, a simpler methodology to check whether a source of PII is used for PII-based advertising would seem to be: (i) add the PII to a target user’s account via the given source, and then (ii) target an ad to an audience built from that PII and check whether the target user receives the ad. However, this method has a number of drawbacks. First, such an experiment could easily become expensive if Facebook imposed a large minimum audience size for an ad to run. Second, as previously mentioned, confounding variables (such as competing ads and Facebook’s estimates of ad quality and relevance [23]) might interfere with the results of the experiments; for example, false negatives may arise if the ad launched after adding the PII fails to reach the target user due to competing ads (with better bids, or from more reputable advertisers). Third, for sources of PII that the user does not verify (such as phone contacts synced by another user), PII that is already associated with some other user may never be associated with the target user at all. It is therefore essential to be able to check that no other user can be targeted with a given piece of PII, which we are able to do by exploiting potential reach estimates using the proposed methodology.
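The following self-contained Python sketch illustrates the core of this targetability check. The Platform class is a toy simulation of an advertiser interface, not Facebook’s actual API: it returns audience size estimates rounded to the nearest multiple of 10, mirroring the simple rounding observed in prior work [40]; the PII values and padding size are illustrative assumptions.

```python
class Platform:
    """Toy simulation of an advertiser interface (not the real API)."""

    def __init__(self, targetable_pii, rounding=10):
        self.targetable = set(targetable_pii)
        self.rounding = rounding

    def potential_reach(self, audience_pii):
        # Count how many uploaded records match targetable users, then
        # round the estimate to the nearest multiple of `rounding`.
        true_count = len(self.targetable & set(audience_pii))
        return round(true_count / self.rounding) * self.rounding

def is_targetable(platform, candidate_pii, padding_pii):
    """Check whether candidate_pii can be used to target some user.

    padding_pii holds records already known to be targetable, sized so
    that the true count sits at a rounding boundary; a single extra
    matching record then visibly flips the rounded estimate.
    """
    baseline = platform.potential_reach(padding_pii)
    probe = platform.potential_reach(padding_pii + [candidate_pii])
    return probe > baseline

# 25 known-targetable records sit on a boundary: 25 rounds down to 20
# (Python rounds halves to even), while 26 rounds up to 30.
padding = [f"user{i}@example.com" for i in range(25)]
platform = Platform(targetable_pii=padding + ["fresh@example.com"])
print(is_targetable(platform, "fresh@example.com", padding))    # True
print(is_targetable(platform, "unknown@example.com", padding))  # False
```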
Limitations and challenges: Our methodology can only check whether data is used for PII-based advertising, and not for advertising in general. Another challenge, common to studying whether any given source of PII is used, is that the service might run sophisticated algorithms to determine whether or not to use a particular piece of PII for advertising, especially in cases where a user does not directly provide and verify it. For example, when using hashed PII records provided by advertisers, Facebook might require a new hashed PII value to occur multiple times in records that match a given user before associating the value with that user. It can thus be challenging to provide PII via a given source in such a way that it passes the checks imposed by any such algorithms. Conversely, a positive result indicating that Facebook does use a given source of PII for PII-based advertising does not mean that Facebook will always use any PII provided via that source; further controlled experiments might be necessary to reveal the exact conditions under which Facebook uses PII provided via that source.

Changes to the advertising interface: The experiments described in this paper that use potential reach estimates were all conducted before March 2018. Subsequently, we found that potential reach estimates could be used to leak users’ attributes; in response to our disclosure, Facebook removed potential reach estimates for custom audiences [31]. However, when uploading records to create a custom audience, Facebook still provides an estimate of the number of users matched (the audience size estimate mentioned in Section 2). Since these estimates also capture whether a given piece of PII is used by the advertising platform, and were found by prior work [40] to be obfuscated in similar ways (by simple rounding), our methodology could be modified to use these estimates rather than the potential reach. We leave a full exploration to future work.

Generalizability: Size estimates are a fundamental feature of any advertising platform, as they help advertisers tailor their ad campaigns and plan their ad budgets. Thus, a methodology similar to the one proposed in this paper can potentially be used across different advertising platforms to study which sources of PII are used for their PII-based advertising features. Moreover, since our procedure for reverse-engineering the potential reach estimates dealt with multiple common forms of obfuscation (noise, rounding, etc.), it could illuminate the process of reverse-engineering other size estimates that are obfuscated differently.
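As a simple illustration of such reverse-engineering, the sketch below infers the rounding step from a series of observed size estimates. It assumes pure rounding with no added noise (an assumption that would not hold on a noised interface), and the observed values are hypothetical.

```python
from functools import reduce
from math import gcd

def infer_rounding_step(estimates):
    # Under pure rounding, every reported estimate is a multiple of the
    # rounding step, so the gcd of the observations recovers the step
    # (or a multiple of it, if too few distinct values are seen).
    return reduce(gcd, estimates)

# Hypothetical estimates observed while growing an audience record by record:
print(infer_rounding_step([20, 30, 30, 40, 60]))  # -> 10
```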
6 Related work

We now overview prior work related to this study.

Transparency of targeted advertising: Much attention has been dedicated to shedding light on what factors influence the targeting of a particular ad on the web [29, 30, 32, 44] and on specific services [8]. Ad transparency has also been studied in the context of Facebook [2] (with a focus on the explanations that Facebook provides to users as to why they were shown an ad) and Google [41] (showing that Google does not reveal all the categories inferred about a user).

Privacy: Because of the lack of information provided by the interfaces where users’ data is input, it is often unclear how the PII will be used. For example, Facebook’s two-factor authentication interface does not specify the privacy terms that apply to the provided numbers, nor does it offer the ability to opt out of certain kinds of use. Nonetheless, the company has been using numbers obtained through this interface to send notifications as text messages and to make friend suggestions as part of its People You May Know feature [19]. Likewise, Facebook has been suspected of using phone numbers collected from users’ contact lists to populate the People You May Know feature [24]. Counterintuitively, Tucker [36] shows that giving users control over their privacy can be beneficial for advertising: according to Tucker’s study on Facebook, users are more likely to click on an ad if their perception of control over their personal information is higher.

Malicious and discriminatory advertising: In 2011, Korolova [27] found that malicious advertisers could infer personally identifying information about users who click on an ad. The attack was based on Facebook’s attribute-based “microtargeting”, which has since been disallowed by the imposition of a minimum audience size of 20. However, subsequent studies showed that targeted advertising platforms are still subject to leaks of personal information [15, 17, 28, 40] and potential discrimination [14, 16]. While Facebook took action to fix these issues [20, 26, 33], the deployment of discriminatory ads was still possible in November 2017 [10]. Speicher et al. [35] demonstrated that the measures taken by Facebook against discrimination (i.e., banning the use of certain attributes, such as ‘ethnic affinity’) are insufficient; their work proposes an alternative solution based on a measure of discriminatory targeting that is independent of the attributes used in the ad.

7 Conclusion

Given the incentive advertising platforms have to obtain high-fidelity PII (phone numbers and email addresses) to enhance their services, there is strong reason to expect the re-purposing of collected PII for targeted advertising. This incentive is heightened by the recent introduction of PII-based targeting, which allows advertisers to specify exactly which users to target by providing a list of their PII. This paper was the first to propose a methodology that uses size estimates to study which sources of PII are used for PII-based targeted advertising. We applied the proposed methodology to investigate which of a range of potential sources of PII were actually used by Facebook for its PII-based targeted advertising platform, confirming that Facebook uses at least five different sources of PII to enable PII-based advertising. We also examined what is disclosed to users and what controls users have over the PII that is used to target them. We showed that there is often very little disclosure to users, and that what disclosure exists often takes the form of generic statements that do not mention the particular PII being collected or the fact that it may be used to allow advertisers to target users. Our paper highlights the need to further study the sources of PII used for advertising, and shows that users need to be given more disclosure and transparency.

8 Acknowledgements

We thank the anonymous reviewers for their helpful comments. This research was supported in part by the Data Transparency Lab and NSF grant CNS-1616234.

References

[1] Graph API: Changelog Version 3.0. https://developers.facebook.com/docs/graph-api/changelog/version3.0.
[2] A. Andreou, G. Venkatadri, O. Goga, K. P. Gummadi, P. Loiseau, and A. Mislove. Investigating Ad Transparency Mechanisms in Social Media: A Case Study of Facebook’s Explanations. NDSS, 2018.
[3] About Customer Match. https://support.google.com/adwords/answer/6379332?hl=en.
[4] About Potential Reach. https://www.facebook.com/business/help/1665333080167380?helpref=faq_content.
[5] Accessing Your Facebook Data. https://www.facebook.com/help/405183566203254.
[6] Best practices for unique identifiers. https://developer.android.com/training/articles/user-data-ids.
[7] C. Cadwalladr and E. Graham-Harrison. Revealed: 50 million Facebook profiles harvested for Cambridge Analytica in major data breach. https://www.theguardian.com/news/2018/mar/17/cambridge-analytica-facebook-influence-us-election.
[8] A. Datta, M. C. Tschantz, and A. Datta. Automated Experiments on Ad Privacy Settings: A Tale of Opacity, Choice, and Discrimination. PETS, 2015.
[9] Facebook. Personal Communication.
[10] Facebook (Still) Letting Housing Advertisers Exclude Users by Race. https://www.propublica.org/article/facebook-advertising-discrimination-housing-race-sex-national-origin.
[11] Facebook Ads Preferences. https://www.facebook.com/ads/preferences.
[12] Facebook Custom Audiences. https://developers.facebook.com/docs/marketing-api/custom-audiences-targeting/v3.1.
[13] Facebook Data Policy. https://www.facebook.com/about/privacy/update.
[14] Facebook Enabled Advertisers to Reach ‘Jew Haters’. https://www.propublica.org/article/facebook-enabled-advertisers-to-reach-jew-haters.
[15] Facebook Leaks Usernames, User IDs, and Personal Details to Advertisers. http://www.benedelman.org/news/0520101.html.
[16] Facebook Lets Advertisers Exclude Users by Race. https://www.propublica.org/article/facebook-lets-advertisers-exclude-users-by-race/.
[17] Facebook Messenger Chatbots Can Leak Your Private Information. https://www.techworm.net/2016/09/facebook-messenger-chatbots-can-leak-private-information.html.
[18] Facebook Response to Questions from Committee on Commerce, Science, and Transportation. https://www.commerce.senate.gov/public/_cache/files/9d8e069d-2670-4530-bcdc-d3a63a8831c4/7C8DE61421D13E86FC6855CC2EA7AEA7.senate-commerce-committee-combined-qfrs-06.11.2018.pdf.
[19] Facebook Turned Its Two-Factor Security ‘Feature’ Into the Worst Kind of Spam. https://gizmodo.com/facebook-turned-its-two-factor-security-feature-into-th-1823006334.
[20] Facebook adds human reviewers after ‘Jew haters’ ad scandal. http://www.bbc.com/news/technology-41342642.
[21] Facebook marketing API. https://developers.facebook.com/docs/marketing-apis.
[22] Facebook plans crackdown on ad targeting by email without consent. https://techcrunch.com/2018/03/31/custom-audiences-certification/.
[23] Facebook: About the delivery system: Ad auctions. https://www.facebook.com/business/help/430291176997542.
[24] How Facebook Figures Out Everyone You’ve Ever Met. https://gizmodo.com/how-facebook-figures-out-everyone-youve-ever-met-1819822691.
[25] How Trump Conquered Facebook—Without Russian Ads. https://www.wired.com/story/how-trump-conquered-facebookwithout-russian-ads/.
[26] Improving Enforcement and Promoting Diversity: Updates to Ads Policies and Tools. http://newsroom.fb.com/news/2017/02/improving-enforcement-and-promoting-diversity-updates-to-ads-policies-and-tools/.
[27] A. Korolova. Privacy Violations Using Microtargeted Ads: A Case Study. Journal of Privacy and Confidentiality, 3(1), 2011.
[28] B. Krishnamurthy, K. Naryshkin, and C. E. Wills. Privacy leakage vs. protection measures: the growing disconnect. IEEE W2SP, 2011.
[29] M. Lecuyer, G. Ducoffe, F. Lan, A. Papancea, T. Petsios, R. Spahn, A. Chaintreau, and R. Geambasu. XRay: Enhancing the Web’s Transparency with Differential Correlation. USENIX Security, 2014.
[30] M. Lecuyer, R. Spahn, Y. Spiliopolous, A. Chaintreau, R. Geambasu, and D. Hsu. Sunlight: Fine-grained Targeting Detection at Scale with Statistical Confidence. CCS, 2015.
[31] G. Marvin. Exclusive: Facebook will no longer show audience reach estimates for Custom Audiences after vulnerability detected. 2018. https://marketingland.com/exclusive-facebook-will-no-longer-show-audience-reach-estimates-for-custom-audiences-after-vulnerability-detected-236923/.
[32] J. Parra-Arnau, J. P. Achara, and C. Castelluccia. MyAdChoices: Bringing Transparency and Control to Online Advertising. ACM TWEB, 11, 2017.
[33] Protecting Privacy with Referrers. Facebook Engineering’s Notes. http://www.facebook.com/notes/facebook-engineering/protecting-privacy-with-referrers/392382738919.
[34] M. Schroepfer. An Update on Our Plans to Restrict Data Access on Facebook. 2018. https://newsroom.fb.com/news/2018/04/restricting-data-access/.
[35] T. Speicher, M. Ali, G. Venkatadri, F. N. Ribeiro, G. Arvanitakis, F. Benevenuto, K. P. Gummadi, P. Loiseau, and A. Mislove. On the Potential for Discrimination in Online Targeted Advertising. FAT*, 2018.
[36] C. E. Tucker. Social Networks, Personalized Advertising, and Privacy Controls. Journal of Marketing Research, 2014.
[37] Target Custom Groups of Twitter Users. https://business.twitter.com/en/targeting/tailored-audiences.html.
[38] The Facebook Business Tools. https://www.facebook.com/help/331509497253087.
[39] The Facebook Companies. https://www.facebook.com/help/111814505650678.
[40] G. Venkatadri, Y. Liu, A. Andreou, O. Goga, P. Loiseau, A. Mislove, and K. P. Gummadi. Privacy Risks with Facebook’s PII-based Targeting: Auditing a Data Broker’s Advertising Interface. IEEE S&P, 2018.
[41] C. E. Wills and C. Tatar. Understanding What They Do with What They Know. WPES, 2012.
[42] What are the Facebook Products? https://www.facebook.com/help/1561485474074139.
[43] What’s a Custom Audience from a Customer List? https://www.facebook.com/business/help/341425252616329/.
[44] eyeWnder_Experiment. http://www.eyewnder.com/.