arXiv:1609.02943v1 [cs.CR] 9 Sep 2016

Stealing Machine Learning Models via Prediction APIs

Florian Tramèr (EPFL), Fan Zhang (Cornell University), Ari Juels (Cornell Tech, Jacobs Institute), Michael K. Reiter (UNC Chapel Hill), Thomas Ristenpart (Cornell Tech)

This is an extended version of a paper that appeared at USENIX Security, 2016.

Abstract

Machine learning (ML) models may be deemed confidential due to their sensitive training data, commercial value, or use in security applications. Increasingly often, confidential ML models are being deployed with publicly accessible query interfaces. ML-as-a-service ("predictive analytics") systems are an example: Some allow users to train models on potentially sensitive data and charge others for access on a pay-per-query basis. The tension between model confidentiality and public access motivates our investigation of model extraction attacks. In such attacks, an adversary with black-box access, but no prior knowledge of an ML model's parameters or training data, aims to duplicate the functionality of (i.e., "steal") the model. Unlike in classical learning theory settings, ML-as-a-service offerings may accept partial feature vectors as inputs and include confidence values with predictions. Given these practices, we show simple, efficient attacks that extract target ML models with near-perfect fidelity for popular model classes including logistic regression, neural networks, and decision trees. We demonstrate these attacks against the online services of BigML and Amazon Machine Learning. We further show that the natural countermeasure of omitting confidence values from model outputs still admits potentially harmful model extraction attacks. Our results highlight the need for careful ML model deployment and new model extraction countermeasures.

1 Introduction

Machine learning (ML) aims to provide automated extraction of insights from data by means of a predictive model. A predictive model is a function that maps feature vectors to a categorical or real-valued output. In a supervised setting, a previously gathered data set consisting of possibly confidential feature-vector inputs (e.g., digitized health records) with corresponding output class labels (e.g., a diagnosis) serves to train a predictive model that can generate labels on future inputs. Popular models include support vector machines (SVMs), logistic regressions, neural networks, and decision trees.

ML algorithms' success in the lab and in practice has led to an explosion in demand. Open-source frameworks such as PredictionIO and cloud-based services offered by Amazon, Google, Microsoft, BigML, and others have arisen to broaden and simplify ML model deployment.

Cloud-based ML services often allow model owners to charge others for queries to their commercially valuable models. This pay-per-query deployment option exemplifies an increasingly common tension: The query interface of an ML model may be widely accessible, yet the model itself and the data on which it was trained may be proprietary and confidential. Models may also be privacy-sensitive because they leak information about training data [4, 23, 24]. For security applications such as spam or fraud detection [9, 29, 36, 55], an ML model's confidentiality is critical to its utility: An adversary that can learn the model can also often evade detection [4, 36].

In this paper we explore model extraction attacks, which exploit the tension between query access and confidentiality in ML models. We consider an adversary that can query an ML model (a.k.a.
a prediction API) to obtain predictions on input feature vectors. The model may be viewed as a black box. The adversary may or may not know the model type (logistic regression, decision tree, etc.) or the distribution over the data used to train the model. The adversary's goal is to extract an equivalent or near-equivalent ML model, i.e., one that achieves (close to) 100% agreement on an input space of interest.

We demonstrate successful model extraction attacks against a wide variety of ML model types, including decision trees, logistic regressions, SVMs, and deep neural networks, and against production ML-as-a-service (MLaaS) providers, including Amazon and BigML.1 In nearly all cases, our attacks yield models that are functionally very close to the target. In some cases, our attacks extract the exact parameters of the target (e.g., the coefficients of a linear classifier or the paths of a decision tree). For some targets employing a model type, parameters, or features unknown to the attacker, we additionally show a successful preliminary attack step involving reverse-engineering these model characteristics.

1 We simulated victims by training models in our own accounts. We have disclosed our results to affected services in February 2016.

Service   Model Type            Data set        Queries   Time (s)
Amazon    Logistic Regression   Digits              650         70
Amazon    Logistic Regression   Adult             1,485        149
BigML     Decision Tree         German Credit     1,150        631
BigML     Decision Tree         Steak Survey      4,013      2,088

Table 1: Results of model extraction attacks on ML services. For each target model, we report the number of prediction queries made to the ML API in an attack that extracts a 100% equivalent model. The attack time is primarily influenced by the service's prediction latency (≈100 ms/query for Amazon and ≈500 ms/query for BigML).

Our most successful attacks rely on the information-rich outputs returned by the ML prediction APIs of all cloud-based services we investigated. Those of Google, Amazon, Microsoft, and BigML all return high-precision confidence values in addition to class labels. They also respond to partial queries lacking one or more features. Our setting thus differs from traditional learning-theory settings [3, 7, 8, 15, 30, 33, 36, 53] that assume only membership queries, i.e., outputs consisting of a class label only. For example, for logistic regression, the confidence value is a simple log-linear function 1/(1 + e^{−(w·x+β)}) of the d-dimensional input vector x.
By querying d + 1 random d-dimensional inputs, an attacker can with high probability solve for the unknown d + 1 parameters w and β defining the model. We emphasize that while this model extraction attack is simple and non-adaptive, it affects all of the ML services we have investigated.

Such equation-solving attacks extend to multiclass logistic regressions and neural networks, but do not work for decision trees, a popular model choice. (BigML, for example, initially offered only decision trees.) For decision trees, a confidence value reflects the number of training data points labeled correctly on an input's path in the tree; simple equation-solving is thus inapplicable. We show how confidence values can nonetheless be exploited as pseudo-identifiers for paths in the tree, facilitating discovery of the tree's structure. We demonstrate successful model extraction attacks that use adaptive, iterative search algorithms to discover paths in a tree.

We experimentally evaluate our attacks by training models on an array of public data sets suitable as stand-ins for proprietary ones. We validate the attacks locally using standard ML libraries, and then present case studies on BigML and Amazon. For both services, we show computationally fast attacks that use a small number of queries to extract models matching the targets on 100% of tested inputs. See Table 1 for a quantitative summary.

Having demonstrated the broad applicability of model extraction attacks to existing services, we consider the most obvious potential countermeasure ML services might adopt: omission of confidence values, i.e., output of class labels only. This approach would place model extraction back in the membership query setting of prior work in learning theory [3, 8, 36, 53]. We demonstrate a generalization of an adaptive algorithm by Lowd and Meek [36] from binary linear classifiers to more complex model types, and also propose an attack inspired by the agnostic learning algorithm of Cohn et al. [18]. Our new attacks extract models matching targets on >99% of the input space for a variety of model classes, but need up to 100× more queries than equation-solving attacks (specifically for multiclass linear regression and neural networks). While less effective than equation-solving, these attacks remain attractive for certain types of adversary. We thus discuss further ideas for countermeasures.

In summary, we explore model extraction attacks, a practical kind of learning task that, in particular, affects emerging cloud-based ML services being built by Amazon, Google, Microsoft, BigML, and others. We show:

• Simple equation-solving model extraction attacks that use non-adaptive, random queries to solve for the parameters of a target model. These attacks affect a wide variety of ML models that output confidence values. We show their success against Amazon's service (using our own models as stand-ins for victims'), and also report successful reverse-engineering of the (only partially documented) model type employed by Amazon.

• A new path-finding algorithm for extracting decision trees that abuses confidence values as quasi-identifiers for paths. To our knowledge, this is the first example of practical "exact" decision tree learning. We demonstrate the attack's efficacy via experiments on BigML.

• Model extraction attacks against models that output only class labels, the obvious countermeasure against extraction attacks that rely on confidence values. We show slower, but still potentially dangerous, attacks in this setting that build on prior work in learning theory.

We additionally make a number of observations about the implications of extraction. For example, attacks against Amazon's system indirectly leak various summary statistics about a private training set, while extraction against kernel logistic regression models [57] recovers significant information about individual training data points. The source code for our attacks is available online at https://github.com/ftramer/Steal-ML.

2 Background

For our purposes, an ML model is a function f : X → Y. An input is a d-dimensional vector in the feature space X = X1 × X2 × · · · × Xd. Outputs lie in the range Y. We distinguish between categorical features, which assume one of a finite set of values (whose set size is the arity of the feature), and continuous features, which assume a value in a bounded subset of the real numbers. Without loss of generality, for a categorical feature of arity k, we let Xi = Zk. For a continuous feature taking values between bounds a and b, we let Xi = [a, b] ⊂ R.
Inputs to a model may be pre-processed to perform feature extraction. In this case, inputs come from a space M, and feature extraction involves application of a function ex : M → X that maps inputs into a feature space. Model application then proceeds by composition in the natural way, taking the form f(ex(M)). Generally, feature extraction is many-to-one. For example, M may be a piece of English language text and the extracted features counts of individual words (so-called "bag-of-words" feature extraction). Other examples are input scaling and one-hot-encoding of categorical features.

We focus primarily on classification settings in which f predicts a nominal variable ranging over a set of classes. Given c classes, we use as class labels the set Zc. If Y = Zc, the model returns only the predicted class label. In some applications, however, additional information is often helpful, in the form of real-valued measures of confidence on the labels output by the model; these measures are called confidence values. The output space is then Y = [0, 1]^c. For a given x ∈ X and i ∈ Zc, we denote by fi(x) the ith component of f(x) ∈ Y. The value fi(x) is a model-assigned probability that x has associated class label i. The model's predicted class is defined by the value argmax_i fi(x), i.e., the most probable label.

We associate with Y a distance measure dY. We drop the subscript Y when it is clear from context. For Y = Zc we use 0-1 distance, meaning d(y, y′) = 0 if y = y′ and d(y, y′) = 1 otherwise. For Y = [0, 1]^c, we use the 0-1 distance when comparing predicted classes; when comparing class probabilities directly, we instead use the total variation distance, given by d(y, y′) = ½ ∑_i |y[i] − y′[i]|. In the rest of this paper, unless explicitly specified otherwise, dY refers to the 0-1 distance over class labels.

Training algorithms. We consider models obtained via supervised learning. These models are generated by a training algorithm T that takes as input a training set {(xi, yi)}_i, where (xi, yi) ∈ X × Y is an input with an associated (presumptively correct) class label. The output of T is a model f defined by a set of parameters, which are model-specific, and hyper-parameters, which specify the type of models T generates. Hyper-parameters may be viewed as distinguished parameters, often taken from a small number of standard values; for example, the kernel type used in an SVM, of which only a small set are used in practice, may be seen as a hyper-parameter.

Figure 1: Diagram of ML model extraction attacks. A data owner has a model f trained on its data and allows others to make prediction queries. An adversary uses q prediction queries to extract an fˆ ≈ f.

3 Model Extraction Attacks

An ML model extraction attack arises when an adversary obtains black-box access to some target model f and attempts to learn a model fˆ that closely approximates, or even matches, f (see Figure 1). As mentioned previously, the restricted case in which f outputs class labels only matches the membership query setting considered in learning theory, e.g., PAC learning [53] and other previous works [3, 7, 8, 15, 30, 33, 36]. Learning theory algorithms have seen only limited study in practice, e.g., in [36], and our investigation may be viewed as a practice-oriented exploration of this branch of research. Our initial focus, however, is on a different setting common in today's MLaaS services, which we now explain in detail.
Models trained by these services emit data-rich outputs that often include confidence values, and in which partial feature vectors may be considered valid inputs. As we show later, this setting greatly advantages adversaries.

Machine learning services. A number of companies have launched or are planning to launch cloud-based ML services. A common denominator is the ability of users to upload data sets, have the provider run training algorithms on the data, and make the resulting models generally available for prediction queries. Simple-to-use Web APIs handle the entire interaction. This service model lets users capitalize on their data without having to set up their own large-scale ML infrastructure.

Details vary greatly across services. We summarize a number of them in Table 2 and now explain some of the salient features. A model is white-box if a user may download a representation suitable for local use. It is black-box if accessible only via a prediction query interface. Amazon and Google, for example, provide black-box-only services. Google does not even specify what training algorithm their service uses, while Amazon provides only partial documentation for its feature extraction ex (see Section 5). Some services allow users to monetize trained models by charging others for prediction queries.

Service             White-box   Monetize   Confidence Scores   Logistic Regression   SVM   Neural Network   Decision Tree
Amazon [1]          ✗           ✗          ✓                   ✓                     ✗     ✗                ✗
Microsoft [38]      ✗           ✗          ✓                   ✓                     ✓     ✓                ✓
BigML [11]          ✓           ✓          ✓                   ✓                     ✗     ✗                ✓
PredictionIO [43]   ✓           ✗          ✗                   ✓                     ✓     ✗                ✓
Google [25]         ✗           ✓          ✓                   ✓                     ✓     ✓                ✓

Table 2: Particularities of major MLaaS providers. 'White-box' refers to the ability to download and use a trained model locally, and 'Monetize' means that a user may charge other users for black-box access to her models. Model support for each service is obtained from available documentation. The models listed for Google's API are a projection based on the announced support of models in standard PMML format [25]. Details on ML models are given in Appendix A.

To use these services, a user uploads a data set and optionally applies some data pre-processing (e.g., field removal or handling of missing values). She then trains a model by either choosing one of many supported model classes (as in BigML, Microsoft, and PredictionIO) or having the service choose an appropriate model class (as in Amazon and Google). Two services have also announced upcoming support for users to upload their own trained models (Google) and their own custom learning algorithms (PredictionIO). When training a model, users may tune various parameters of the model or training algorithm (e.g., regularizers, tree size, learning rates) and control feature-extraction and transformation methods.

For black-box models, the service provides users with information needed to create and interpret predictions, such as the list of input features and their types. Some services also supply the model class, chosen training parameters, and training data statistics (e.g., BigML gives the range, mean, and standard deviation of each feature).

To get a prediction from a model, a user sends one or more input queries. The services we reviewed accept both synchronous requests and asynchronous 'batch' requests for multiple predictions. We further found varying degrees of support for 'incomplete' queries, in which some input features are left unspecified [46]. We will show that exploiting incomplete queries can drastically improve the success of some of our attacks.
Apart from PredictionIO, all of the services we examined respond to prediction queries with not only class labels, but a variety of additional information, including confidence scores (typically class probabilities) for the predicted outputs. Google and BigML allow model owners to monetize their models by charging other users for predictions. Google sets a minimum price of $0.50 per 1,000 queries. On BigML, 1,000 queries consume at least 100 credits, costing $0.10–$5, depending on the user’s subscription. Attack scenarios. We now describe possible motivations for adversaries to perform model extraction attacks. We then present a more detailed threat model informed by characteristics of the aforementioned ML services. Avoiding query charges. Successful monetization of prediction queries by the owner of an ML model f requires confidentiality of f . A malicious user may seek to launch what we call a cross-user model extraction attack, stealing f for subsequent free use. More subtly, in blackbox-only settings (e.g., Google and Amazon), a service’s business model may involve amortizing up-front training costs by charging users for future predictions. A model extraction attack will undermine the provider’s business model if a malicious user pays less for training and extracting than for paying per-query charges. Violating training-data privacy. Model extraction could, in turn, leak information about sensitive training data. Prior attacks such as model inversion [4, 23, 24] have shown that access to a model can be abused to infer information about training set points. Many of these attacks work better in white-box settings; model extraction may thus be a stepping stone to such privacy-abusing attacks. Looking ahead, we will see that in some cases, significant information about training data is leaked trivially by successful model extraction, because the model itself directly incorporates training set points. Stepping stone to evasion. In settings where an ML model serves to detect adversarial behavior, such as identification of spam, malware classification, and network anomaly detection, model extraction can facilitate evasion attacks. An adversary may use knowledge of the ML model to avoid detection by it [4, 9, 29, 36, 55]. In all of these settings, there is an inherent assumption of secrecy of the ML model in use. We show that this assumption is broken for all ML APIs that we investigate. Threat model in detail. Two distinct adversarial models arise in practice. An adversary may be able to make direct queries, providing an arbitrary input x to a model f and obtaining the output f (x). Or the adversary may be able to make only indirect queries, i.e., queries on points in input space M yielding outputs f (ex(M)). The feature extraction mechanism ex may be unknown to the adversary. In Section 5, we show how ML APIs can further be exploited to “learn” feature extraction mechanisms. Both direct and indirect access to f arise in ML services. (Direct query interfaces arise when clients are expected to perform feature extraction locally.) In either case, the output value can be a class label, a confidence value vector, or some data structure revealing various levels of information, depending on the exposed API. We model the adversary, denoted by A, as a randomized algorithm. The adversary’s goal is to use as few queries as possible to f in order to efficiently compute an approximation fˆ that closely matches f . 
We formalize "closely matching" using two different error measures:

• Test error Rtest: This is the average error over a test set D, given by Rtest(f, fˆ) = ∑_{(x,y)∈D} d(f(x), fˆ(x)) / |D|. A low test error implies that fˆ matches f well for inputs distributed like the training data samples.2

• Uniform error Runif: For a set U of vectors uniformly chosen in X, let Runif(f, fˆ) = ∑_{x∈U} d(f(x), fˆ(x)) / |U|. Thus Runif estimates the fraction of the full feature space on which f and fˆ disagree. (In our experiments, we found |U| = 10,000 was sufficiently large to obtain stable error estimates for the models we analyzed.)

2 Note that for some D, it is possible that fˆ predicts true labels better than f, yet Rtest(f, fˆ) is large, because fˆ does not closely match f.

We define the extraction accuracy under test and uniform error as 1 − Rtest(f, fˆ) and 1 − Runif(f, fˆ). Here we implicitly refer to accuracy under 0-1 distance. When assessing how close the class probabilities output by fˆ are to those of f (with the total-variation distance), we use the notations R^TV_test(f, fˆ) and R^TV_unif(f, fˆ).

An adversary may know any of a number of pieces of information about a target f: What training algorithm T generated f, the hyper-parameters used with T, the feature extraction function ex, etc. We will investigate a variety of settings in this work corresponding to different APIs seen in practice. We assume that A has no more information about a model's training data than what is provided by an ML API (e.g., summary statistics). For simplicity, we focus on proper model extraction: If A believes that f belongs to some model class, then A's goal is to extract a model fˆ from the same class. We discuss some intuition in favor of proper extraction in Appendix D, and leave a broader treatment of improper extraction strategies as an interesting open problem.

4 Extraction with Confidence Values

We begin our study of extraction attacks by focusing on prediction APIs that return confidence values. As per Section 2, the output of a query to f thus falls in a range [0, 1]^c where c is the number of classes. To motivate this, we recall that most ML APIs reveal confidence values for models that support them (see Table 2). This includes logistic regressions (LR), neural networks, and decision trees, defined formally in Appendix A. We first introduce a generic equation-solving attack that applies to all logistic models (LR and neural networks). In Section 4.2, we present two novel path-finding attacks on decision trees.

Data set         Synthetic   # records   # classes   # features
Circles          Yes             5,000           2            2
Moons            Yes             5,000           2            2
Blobs            Yes             5,000           3            2
5-Class          Yes             1,000           5           20
Adult (Income)   No             48,842           2          108
Adult (Race)     No             48,842           5          105
Iris             No                150           3            4
Steak Survey     No                331           5           40
GSS Survey       No             16,127           3          101
Digits           No              1,797          10           64
Breast Cancer    No                683           2           10
Mushrooms        No              8,124           2          112
Diabetes         No                768           2            8

Table 3: Data sets used for extraction attacks. We train two models on the Adult data, with targets 'Income' and 'Race'. SVMs and binary logistic regressions are trained on data sets with 2 classes. Multiclass regressions and neural networks are trained on multiclass data sets. For decision trees, we use a set of public models shown in Table 5.

4.1 Equation-Solving Attacks

Many ML models we consider directly compute class probabilities as a continuous function of the input x and real-valued model parameters.
In this case, an API that reveals these class probabilities provides an adversary A with samples (x, f(x)) that can be viewed as equations in the unknown model parameters. For a large class of models, these equation systems can be efficiently solved, thus recovering f (or some good approximation of it).

Our approach for evaluating attacks will primarily be experimental. We use a suite of synthetic or publicly available data sets to serve as stand-ins for proprietary data that might be the target of an extraction attack. Table 3 displays the data sets used in this section, which we obtained from various sources: the synthetic ones we generated; the others are taken from public surveys (Steak Survey [26] and GSS Survey [49]), from scikit [42] (Digits) or from the UCI ML library [35]. More details about these data sets are in Appendix B. Before training, we remove rows with missing values, apply one-hot-encoding to categorical features, and scale all numeric features to the range [−1, 1]. We train our models over a randomly chosen subset of 70% of the data, and keep the rest for evaluation (i.e., to calculate Rtest). We discuss the impact of different pre-processing and feature extraction steps in Section 5, when we evaluate equation-solving attacks on production ML services.

4.1.1 Binary logistic regression

As a simple starting point, we consider the case of logistic regression (LR). An LR model performs binary classification (c = 2), by estimating the probability of a binary response, based on a number of independent features. LR is one of the most popular binary classifiers, due to its simplicity and efficiency. It is widely used in many scientific fields (e.g., medical and social sciences) and is supported by all the ML services we reviewed. Formally, an LR model is defined by parameters w ∈ R^d, β ∈ R, and outputs a probability f1(x) = σ(w·x + β), where σ(t) = 1/(1 + e^{−t}).

LR is a linear classifier: it defines a hyperplane in the feature space X (defined by w·x + β = 0) that separates the two classes. Given an oracle sample (x, f(x)), we get a linear equation w·x + β = σ^{−1}(f1(x)). Thus, d + 1 samples are both necessary and sufficient (if the queried x are linearly independent) to recover w and β. Note that the required samples are chosen non-adaptively, and can thus be obtained from a single batch request to the ML service.

We stress that while this extraction attack is rather straightforward, it directly applies, with possibly devastating consequences, to all cloud-based ML services we considered. As an example, recall that some services (e.g., BigML and Google) let model owners monetize black-box access to their models. Any user who wishes to make more than d + 1 queries to a model would then minimize the prediction cost by first running a cross-user model extraction attack, and then using the extracted model for personal use, free of charge. As mentioned in Section 3, attackers with a final goal of model-inversion or evasion may also have incentives to first extract the model. Moreover, for services with black-box-only access (e.g., Amazon or Google), a user may abuse the service's resources to train a model over a large data set D (i.e., |D| ≫ d), and extract it after only d + 1 predictions. Crucially, the extraction cost is independent of |D|. This could undermine a service's business model, should prediction fees be used to amortize the high cost of training.

For each binary data set shown in Table 3, we train an LR model and extract it given d + 1 predictions.
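As an illustration, the equation-solving step reduces to inverting the sigmoid and solving a small linear system. The snippet below is a minimal, hedged sketch (not the implementation evaluated in this paper); query_api is a hypothetical prediction oracle returning the confidence value f1(x), and the uniform probe distribution is an arbitrary choice.

```python
# Sketch of equation-solving extraction of a binary logistic regression.
# Assumption: `query_api(x)` returns the target's positive-class probability f1(x).
import numpy as np

def extract_logistic_regression(query_api, d, n_queries=None):
    """Recover (w, beta) of f1(x) = sigmoid(w.x + beta) from d + 1 probe queries."""
    n = n_queries or (d + 1)
    X = np.random.uniform(-1.0, 1.0, size=(n, d))   # random, non-adaptive probes
    p = np.array([query_api(x) for x in X])          # confidence values from the API
    z = np.log(p / (1.0 - p))                        # sigma^{-1}(p) = w.x + beta
    A = np.hstack([X, np.ones((n, 1))])              # unknowns stacked as [w, beta]
    sol, *_ = np.linalg.lstsq(A, z, rcond=None)      # solve the linear system
    return sol[:-1], sol[-1]                         # (w_hat, beta_hat)
```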
In all cases, we achieve Rtest = Runif = 0. If we compare the probabilities output by f and fˆ, R^TV_test and R^TV_unif are lower than 10^{−9}. For these models, the attack requires only 41 queries on average, and 113 at most. On Google's platform for example, an extraction attack would cost less than $0.10, and subvert any further model monetization.

4.1.2 Multiclass LRs and Multilayer Perceptrons

We now show that such equation-solving attacks broadly extend to all model classes with a 'logistic' layer, including multiclass (c > 2) LR and deeper neural networks. We define these models formally in Appendix A. A multiclass logistic regression (MLR) combines c binary models, each with parameters wi, βi, to form a multiclass model. MLRs are available in all ML services we reviewed. We consider two types of MLR models: softmax and one-vs-rest (OvR), that differ in how the c binary models are trained and combined: A softmax model fits a joint multinomial distribution to all training samples, while an OvR model trains a separate binary LR for each class, and then normalizes the class probabilities.

An MLR model f is defined by parameters w ∈ R^{cd}, β ∈ R^c. Each sample (x, f(x)) gives c equations in w and β. The equation system is non-linear however, and has no analytic solution. For softmax models for instance, the equations take the form e^{wi·x+βi} / (∑_{j=0}^{c−1} e^{wj·x+βj}) = fi(x). A common method for solving such a system is by minimizing an appropriate loss function, such as the logistic loss. With a regularization term, the loss function is strongly convex, and the optimization thus converges to a global minimum (i.e., a function fˆ that predicts the same probabilities as f for all available samples). A similar optimization (over class labels rather than probabilities) is actually used for training logistic models. Any MLR implementation can thus easily be adapted for model extraction with equation-solving.

Model     Unknowns   Queries   1 − Rtest   1 − Runif   Time (s)
Softmax        530       265      99.96%      99.75%        2.6
Softmax        530       530     100.00%     100.00%        3.1
OvR            530       265      99.98%      99.98%        2.8
OvR            530       530     100.00%     100.00%        3.5
MLP          2,225     1,112      98.17%      94.32%        155
MLP          2,225     2,225      98.68%      97.23%        168
MLP          2,225     4,450      99.89%      99.82%        195
MLP          2,225    11,125      99.96%      99.99%         89

Table 4: Success of equation-solving attacks. Models to extract were trained on the Adult data set with multiclass target 'Race'. For each model, we report the number of unknown model parameters, the number of queries used, and the running time of the equation solver. The attack on the MLP with 11,125 queries converged after 490 epochs.

This approach naturally extends to deeper neural networks. We consider multilayer perceptrons (MLP), that first apply a non-linear transform to all inputs (the hidden layer), followed by a softmax regression in the transformed space. MLPs are becoming increasingly popular due to the continued success of deep learning methods; the advent of cloud-based ML services is likely to further boost their adoption. For our attacks, MLPs and MLRs mainly differ in the number of unknowns in the system to solve. For perceptrons with one hidden layer, we have w ∈ R^{dh+hc}, β ∈ R^{h+c}, where h is the number of hidden nodes (h = 20 in our experiments). Another difference is that the loss function for MLPs is not strongly convex. The optimization may thus converge to a local minimum, i.e., a model fˆ that does not exactly match f's behavior.

To illustrate our attack's success, we train a softmax regression, an OvR regression and an MLP on the Adult data set with target 'Race' (c = 5).
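For concreteness, the following is a hedged sketch of one way to set up the logistic-loss minimization for a softmax target. It is an illustration only, not the tooling used for the experiments below (which rely on BFGS in scikit and SGD in theano); X and P are placeholders for the queried inputs and the class-probability vectors returned by the API.

```python
# Sketch of equation-solving for a softmax (multiclass) regression by minimizing
# the logistic loss between the API's probabilities P and a local softmax model.
import numpy as np
from scipy.optimize import minimize

def extract_softmax(X, P, c, reg=1e-6):
    n, d = X.shape

    def unpack(theta):
        W = theta[: c * d].reshape(c, d)
        b = theta[c * d :]
        return W, b

    def loss(theta):
        W, b = unpack(theta)
        Z = X @ W.T + b                                 # logits, shape (n, c)
        Z = Z - Z.max(axis=1, keepdims=True)            # numerical stability
        logQ = Z - np.log(np.exp(Z).sum(axis=1, keepdims=True))
        # cross-entropy against the observed probabilities, plus a small
        # regularizer that makes the objective strongly convex
        return -(P * logQ).sum() / n + reg * (theta ** 2).sum()

    theta0 = np.zeros(c * d + c)
    res = minimize(loss, theta0, method="L-BFGS-B")     # gradient approximated numerically
    return unpack(res.x)                                # (W_hat, b_hat)
```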
For the non-linear equation systems we obtain, we do not know a priori how many samples we need to find a solution (in contrast to linear systems where d + 1 samples are necessary and sufficient). We thus explore various query budgets of the form α · k, where k is the number of unknown model parameters, and α is a budget scaling factor. For MLRs, we solve the equation system with BFGS [41] in scikit [42]. For MLPs, we use theano [51] to run stochastic gradient descent for 1,000 epochs. Our experiments were performed on a commodity laptop (2-core Intel CPU @3.1GHz, 16GB RAM, no GPU acceleration).

Table 4 shows the extraction success for each model, as we vary α from 0.5 to at most 5. For MLR models (softmax and OvR), the attack is extremely efficient, requiring around one query per unknown parameter of f (each query yields c = 5 equations). For MLPs, the system to solve is more complex, with about 4 times more unknowns. With a sufficiently over-determined system, we converge to a model fˆ that very closely approximates f. As for LR models, queries are chosen non-adaptively, so A may submit a single 'batch request' to the API.

Figure 2: Training data leakage in KLR models. (a) Displays 5 of 20 training samples used as representers in a KLR model (top) and 5 of 20 extracted representers (bottom). (b) For a second model, shows the average of all 1,257 representers that the model classifies as a 3, 4, 5, 6 or 7 (top) and 5 of 10 extracted representers (bottom).

We further evaluated our attacks over all multiclass data sets from Table 3. For MLR models with k = c·(d + 1) parameters (c is the number of classes), k queries were sufficient to achieve perfect extraction (Rtest = Runif = 0, R^TV_test and R^TV_unif below 10^{−7}). We use 260 samples on average, and 650 for the largest model (Digits). For MLPs with 20 hidden nodes, we achieved >99.9% accuracy with 5,410 samples on average and 11,125 at most (Adult). With 54,100 queries on average, we extracted a fˆ with 100% accuracy over tested inputs. As for binary LRs, we thus find that cross-user model extraction attacks for these model classes can be extremely efficient.

4.1.3 Training Data Leakage for Kernel LR

We now move to a less mainstream model class, kernel logistic regression [57], that illustrates how extraction attacks can leak private training data, when a model's outputs are directly computed as a function of that data. Kernel methods are commonly used to efficiently extend support vector machines (SVM) to nonlinear classifiers [14], but similar techniques can be applied to logistic regression [57]. Compared to kernel SVMs, kernel logistic regressions (KLR) have the advantage of computing class probabilities, and of naturally extending to multiclass problems. Yet, KLRs have not reached the popularity of kernel SVMs or standard LRs, and are not provided by any MLaaS provider at the time. We note that KLRs could easily be constructed in any ML library that supports both kernel functions and LR models.

A KLR model is a softmax model, where we replace the linear components wi·x + βi by a mapping ∑_{r=1}^{s} α_{i,r} K(x, xr) + βi. Here, K is a kernel function, and the representers x1, . . . , xs are a chosen subset of the training points [57]. More details are in Appendix A. Each sample (x, f(x)) from a KLR model yields c equations over the parameters α ∈ R^{sc}, β ∈ R^c and the representers x1, . . . , xs. Thus, by querying the model, A obtains a non-linear equation system, the solution of which leaks training data.
This assumes that A knows the exact number s of representers sampled from the data. However, we can relax this assumption: First, note that f's outputs are unchanged by adding 'extra' representers, with weights α = 0. Thus, over-estimating s still results in a consistent system of equations, of which a solution is the model f, augmented with unused representers. We will also show experimentally that training data may leak even if A extracts a model fˆ with s′ ≪ s representers.

We build two KLR models with a radial-basis function (RBF) kernel for a data set of handwritten digits. We select 20 random digits as representers for the first model, and all 1,257 training points for the second. We extract the first model, assuming knowledge of s, by solving a system of 50,000 equations in 1,490 unknowns. We use the same approach as for MLPs, i.e., logistic-loss minimization using gradient descent. We initialize the extracted representers to uniformly random vectors in X, as we assume A does not know the training data distribution. In Figure 2a, we plot 5 of the model's representers from the training data, and the 5 closest (in l1 norm) extracted representers. The attack clearly leaks information on individual training points.

We measure the attack's robustness to uncertainty about s, by attacking the second model with only 10 local representers (10,000 equations in 750 unknowns). Figure 2b shows the average image of training points classified as a 3, 4, 5, 6 or 7 by the target model f, along with 5 extracted representers of fˆ. Surprisingly maybe, the attack seems to be leaking the 'average representer' of each class in the training data.

4.1.4 Model Inversion Attacks on Extracted Models

Access to a model may enable inference of privacy-damaging information, particularly about the training set [4, 23, 24]. The model inversion attack explored by Fredrikson et al. [23] uses access to a classifier f to find the input xopt that maximizes the class probability for class i, i.e., xopt = argmax_{x∈X} fi(x). This was shown to allow recovery of recognizable images of training set members' faces when f is a facial recognition model. Their attacks work best in a white-box setting, where the attacker knows f and its parameters. Yet, the authors also note that in a black-box setting, remote queries to a prediction API, combined with numerical approximation techniques, enable successful, albeit much less efficient, attacks. Furthermore, their black-box attacks inherently require f to be queried adaptively. They leave as an open question making black-box attacks more efficient.

We explore composing an attack that first attempts to extract a model fˆ ≈ f, and then uses it with the [23] white-box inversion attack. Our extraction techniques replace adaptive queries with a non-adaptive "batch" query to f, followed by local computation. We show that extraction plus inversion can require fewer queries and less time than performing black-box inversion directly.

As a case study, we use the softmax model from [23], trained over the AT&T Faces data [5]. The data set consists of images of faces (92 × 112 pixels) of 40 people. The black-box attack from [23] needs about 20,600 queries to reconstruct a recognizable face for a single training set individual. Reconstructing the faces of all 40 individuals would require around 800,000 online queries. The trained softmax model is much larger than those considered in Section 4.1, with 412,160 unknowns (d = 10,304 and c = 40).
We solve an under-determined system with 41,216 equations (using gradient descent with 200 epochs), and recover a model fˆ achieving R^TV_test and R^TV_unif on the order of 10^{−3}. Note that the number of model parameters to extract is linear in the number of people c, whose faces we hope to recover. By using fˆ in white-box model inversion attacks, we obtain results that are visually indistinguishable from the ones obtained using the true f. Given the extracted model fˆ, we can recover all 40 faces using white-box attacks, incurring around 20× fewer remote queries to f than with 40 black-box attacks.

For black-box attacks, the authors of [23] estimate a query latency of 70 milliseconds (a little less than in our own measurements of ML services, see Table 1). Thus, it takes 24 minutes to recover a single face (the inversion attack runs in seconds), and 16 hours to recover all 40 images. In contrast, solving the large equation system underlying our model-extraction attack took 10 hours. The 41,216 online queries would take under one hour if executed sequentially and even less with a batch query. The cost of the 40 local white-box attacks is negligible. Thus, if the goal is to reconstruct faces for all 40 training individuals, performing model inversion over a previously extracted model results in an attack that is both faster and requires 20× fewer online queries.

4.2 Decision Tree Path-Finding Attacks

Contrary to logistic models, decision trees do not compute class probabilities as a continuous function of their input. Rather, decision trees partition the input space into discrete regions, each of which is assigned a label and confidence score. We propose a new path-finding attack that exploits API particularities to extract the 'decisions' taken by a tree when classifying an input.

Prior work on decision tree extraction [7, 12, 33] has focused on trees with Boolean features and outputs. While of theoretical importance, such trees have limited practical use. Kushilevitz and Mansour [33] showed that Boolean trees can be extracted using membership queries (arbitrary queries for class labels), but their algorithm does not extend to more general trees. Here, we propose attacks that exploit ML API specificities, and that apply to decision tree models used in MLaaS platforms.

Our tree model, defined formally in Appendix A, allows for binary and multi-ary splits over categorical features, and binary splits over numeric features. Each leaf of the tree is labeled with a class label and a confidence score. We note that our attacks also apply (often with better results) to regression trees. In regression trees, each leaf is labeled with a real-valued output and confidence.

The key idea behind our attack is to use the rich information provided by APIs on a prediction query as a pseudo-identifier for the path that the input traversed in the tree. By varying the value of each input feature, we then find the predicates to be satisfied for an input to follow a given path in the tree. We will also exploit the ability to query incomplete inputs, in which each feature xi is chosen from a space Xi ∪ {⊥}, where ⊥ encodes the absence of a value. One way of handling such inputs ([11, 46]) is to label each node in the tree with an output value. On an input, we traverse the tree until we reach a leaf or an internal node with a split over a missing feature, and output the value of that leaf or node.
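The following sketch illustrates this missing-value semantics on a toy tree data structure. The representation is hypothetical (it is not BigML's internal format); it only shows that every node carries its own output and that traversal halts at the first split on an unset feature.

```python
# Sketch of tree traversal with missing features: each node stores (label,
# confidence), and an unset feature stops the traversal at that node.
class Node:
    def __init__(self, output, feature=None, predicate=None, children=None):
        self.output = output          # (label, confidence) stored at this node
        self.feature = feature        # splitting feature index (None for a leaf)
        self.predicate = predicate    # maps a feature value to a child key
        self.children = children or {}

def predict_with_missing(node, x):
    """Traverse until a leaf, or until a split on a feature that x leaves unset."""
    while node.feature is not None:
        value = x.get(node.feature)   # x is a dict; missing features are absent
        if value is None:
            break                     # split over a missing feature: halt here
        node = node.children[node.predicate(value)]
    return node.output                # output value of the reached node or leaf
```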
We formalize these notions by defining oracles that A can query to obtain an identifier for the leaf or internal node reached by an input. In practice, we instantiate these oracles using prediction API peculiarities.

Definition 1 (Identity Oracles). Let each node v of a tree T be assigned some identifier idv. A leaf-identity oracle O takes as input a query x ∈ X and returns the identifier of the leaf of the tree T that is reached on input x. A node-identity oracle O⊥ takes as input a query x ∈ (X1 ∪ {⊥}) × · · · × (Xd ∪ {⊥}) and returns the identifier of the node or leaf of T at which the tree computation halts.

4.2.1 Extraction Algorithms

We now present our path-finding attack (Algorithm 1), that assumes a leaf-identity oracle that returns unique identifiers for each leaf. We will relax the uniqueness assumption further on. The attack starts with a random input x and gets the leaf id from the oracle. We then search for all constraints on x that have to be satisfied to remain in that leaf, using procedures LINE SEARCH (for continuous features) and CATEGORY SPLIT (for categorical features) described below. From this information, we then create new queries for unvisited leaves. Once all leaves have been found, the algorithm returns, for each leaf, the corresponding constraints on x. We analyze the algorithm's correctness and complexity in Appendix C.

Algorithm 1 The path-finding algorithm. The notation id ← O(x) means querying the leaf-identity oracle O with an input x and obtaining a response id. By x[i] ⇒ v we denote the query x′ obtained from x by replacing the value of xi by v.

 1: xinit ← {x1, . . . , xd}                        ▷ random initial query
 2: Q ← {xinit}                                     ▷ set of unprocessed queries
 3: P ← {}                                          ▷ set of explored leaves with their predicates
 4: while Q not empty do
 5:     x ← Q.POP()
 6:     id ← O(x)                                   ▷ call to the leaf-identity oracle
 7:     if id ∈ P then                              ▷ check if leaf already visited
 8:         continue
 9:     end if
10:     for 1 ≤ i ≤ d do                            ▷ test all features
11:         if IS CONTINUOUS(i) then
12:             for (α, β] ∈ LINE SEARCH(x, i, ε) do
13:                 if xi ∈ (α, β] then
14:                     P[id].ADD('xi ∈ (α, β]')    ▷ current interval
15:                 else
16:                     Q.PUSH(x[i] ⇒ β)            ▷ new leaf to visit
17:                 end if
18:             end for
19:         else
20:             S, V ← CATEGORY SPLIT(x, i, id)
21:             P[id].ADD('xi ∈ S')                 ▷ values for current leaf
22:             for v ∈ V do
23:                 Q.PUSH(x[i] ⇒ v)                ▷ new leaves to visit
24:             end for
25:         end if
26:     end for
27: end while

We illustrate our algorithm with a toy example of a tree over continuous feature Size and categorical feature Color (see Figure 3). The current query is x = {Size = 50, Color = R} and O(x) = id2. Our goal is two-fold: (1) Find the predicates that x has to satisfy to end up in leaf id2 (i.e., Size ∈ (40, 60], Color = R), and (2) create new inputs x′ to explore other paths in the tree.

The LINE SEARCH procedure (line 12) tests continuous features. We start from bounds on the range of a feature Xi = [a, b]. In our example, we have Size ∈ [0, 100]. We set the value of Size in x to 0 and 100, query O, and obtain id1 and id5. As the ids do not match, a split on Size occurs on the path to id2. With a binary search over feature Size (and all other features in x fixed), we find all intervals that lead to different leaves, i.e., [0, 40], (40, 60], (60, 100]. From these intervals, we find the predicate for the current leaf (i.e., Size ∈ (40, 60]) and build queries to explore new tree paths. To ensure termination of the line search, we specify some precision ε. If a split is on a threshold t, we find the value t̃ that is the unique multiple of ε in the range (t − ε, t]. For values xi with granularity ε, splitting on t̃ is then equivalent to splitting on t.
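One possible rendering of LINE SEARCH as a recursive binary search is sketched below. It is an illustration under the stated precision ε, not the exact procedure evaluated in this paper; oracle plays the role of the leaf-identity oracle O, and splits between endpoints that already share an id are not detected (the duplicate-id caveat discussed later).

```python
# Sketch of LINE_SEARCH: binary search (to precision eps) over one continuous
# feature i of query x, returning the intervals that reach distinct leaves.
def line_search(oracle, x, i, lo, hi, eps):
    def query(v):
        q = dict(x)
        q[i] = v
        return oracle(q)              # leaf identifier for the modified query

    def split_points(lo, hi, id_lo, id_hi):
        if id_lo == id_hi:
            return []                 # same leaf on both ends: no split detected
        if hi - lo <= eps:
            return [hi]               # threshold isolated up to precision eps
        mid = (lo + hi) / 2.0
        id_mid = query(mid)
        return (split_points(lo, mid, id_lo, id_mid) +
                split_points(mid, hi, id_mid, id_hi))

    thresholds = split_points(lo, hi, query(lo), query(hi))
    bounds = sorted(set([lo] + thresholds + [hi]))
    # consecutive bounds delimit the intervals (alpha, beta] used by Algorithm 1
    return list(zip(bounds[:-1], bounds[1:]))
```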
The CATEGORY SPLIT procedure (line 20) finds splits on categorical features. In our example, we vary the value of Color in x and query O to get a leaf id for each value. We then build a set S of values that lead to the current leaf, i.e., S = {R}, and a set V of values to set in x to explore other leaves (one representative per leaf). In our example, we could have V = {B, G, Y} or V = {B, G, O}.

Using these two procedures, we thus find the predicates defining the path to leaf id2, and generate new queries x′ for unvisited leaves of the tree.

Figure 3: Decision tree over features Color and Size. Shows the path (thick green) to leaf id2 on input x = {Size = 50, Color = R}.

Data set           # records   # classes   # features
IRS Tax Patterns     191,283          51           31
Steak Survey             430           5           12
GSS Survey            51,020           3            7
Email Importance       4,709           2           14
Email Spam             4,601           2           46
German Credit          1,000           2           11
Medical Cover        163,065       Y = R           13
Bitcoin Price          1,076       Y = R            7

Table 5: Data sets used for decision tree extraction. Trained trees for these data sets are available in BigML's public gallery. The last two data sets are used to train regression trees.

A top-down approach. We propose an empirically more efficient top-down algorithm that exploits queries over partial inputs. It extracts the tree 'layer by layer', starting at the root: We start with an empty query (all features set to ⊥) and get the root's id by querying O⊥. We then set each feature in turn and query O⊥ again. For exactly one feature (the root's splitting feature), the input will reach a different node. With similar procedures as described previously, we extract the root's splitting criterion, and recursively search lower layers of the tree.

Duplicate identities. As we verify empirically, our attacks are resilient to some nodes or leaves sharing the same id. We can modify line 7 in Algorithm 1 to detect id duplicates, by checking not only whether a leaf with the current id was already visited, but also whether the current query violates that leaf's predicates. The main issue with duplicate ids comes from the LINE SEARCH and CATEGORY SPLIT procedures: if two queries x and x′ differ in a single feature and reach different leaves with the same id, the split on that feature will be missed.

4.2.2 Attack Evaluation

Our tree model (see Appendix A) is the one used by BigML. Other ML services use similar tree models. For our experiments, we downloaded eight public decision trees from BigML (see Table 5), and queried them locally using available API bindings. More details on these models are in Appendix B. We show online extraction attacks on black-box models from BigML in Section 5. To emulate black-box model access, we first issue online queries to BigML, to determine the information contained in the service's responses.
We then simulate black-box access locally, by discarding any extra information returned by the local API. Specifically, we make use of the following fields in query responses:

• Prediction. This entry contains the predicted class label (classification) or real-valued output (regression).

• Confidence. For classification and regression trees, BigML computes confidence scores based on a confidence interval for predictions at each node [11]. The prediction and confidence value constitute a node's id.

• Fields. Responses to black-box queries contain a 'fields' property that lists all features that appear either in the input query or on the path traversed in the tree. If a partial query x reaches an internal node v, this entry tells us which feature v splits on (the feature is in the 'fields' entry, but not in the input x). We make use of this property for the top-down attack variant.

Model              Leaves   Unique IDs   Depth    Without incomplete queries           With incomplete queries
                                                  1 − Rtest   1 − Runif   Queries      1 − Rtest   1 − Runif   Queries
IRS Tax Patterns      318          318       8     100.00%     100.00%    101,057       100.00%     100.00%     29,609
Steak Survey          193           28      17      92.45%      86.40%      3,652       100.00%     100.00%      4,013
GSS Survey            159          113       8      99.98%      99.61%      7,434       100.00%      99.65%      2,752
Email Importance      109           55      17      99.13%      99.90%     12,888        99.81%      99.99%      4,081
Email Spam            219           78      29      87.20%     100.00%     42,324        99.70%     100.00%     21,808
German Credit          26           25      11     100.00%     100.00%      1,722       100.00%     100.00%      1,150
Medical Cover          49           49      11     100.00%     100.00%      5,966       100.00%     100.00%      1,788
Bitcoin Price         155          155       9     100.00%     100.00%     31,956       100.00%     100.00%      7,390

Table 6: Performance of extraction attacks on public models from BigML. For each model, we report the number of leaves in the tree, the number of unique identifiers for those leaves, and the maximal tree depth. The chosen granularity ε for continuous features is 10^{−3}.

Table 6 displays the results of our attacks. For each tree, we give its number of leaves, the number of unique leaf ids, and the tree depth. We display the success rate for Algorithm 1 and for the "top-down" variant with incomplete queries. Querying partial inputs vastly improves our attack: we require far fewer queries (except for the Steak Survey model, where Algorithm 1 only visits a fraction of all leaves and thus achieves low success) and achieve higher accuracy for trees with duplicate leaf ids. As expected, both attacks achieve perfect extraction when all leaves have unique ids. While this is not always the case for classification trees, it is far more likely for regression trees, where both the label and confidence score take real values. Surprisingly maybe, the top-down approach also fully extracts some trees with a large number of duplicate leaf ids. The attacks are also efficient: The top-down approach takes less than 10 seconds to extract a tree, and Algorithm 1 takes less than 6 minutes for the largest tree. For online attacks on ML services, discussed next, this cost is trumped by the delay for the inherently adaptive prediction queries that are issued.

5 Online Model Extraction Attacks

In this section, we showcase online model extraction attacks against two ML services: BigML and Amazon. For BigML, we focus on extracting models set up by a user who wishes to charge for predictions.
For Amazon, our goal is to extract a model trained by ourselves, to which we only get black-box access. Our attacks only use exposed APIs, and do not in any way attempt to bypass the services' authentication or access-control mechanisms. We only attack models trained in our own accounts.

5.1 Case Study 1: BigML

BigML currently only allows monetization of decision trees [11]. We train a tree on the German Credit data, and set it up as a black-box model. The tree has 26 leaves, two of which share the same label and confidence score. From another account, we extract the model using the two attacks from Section 4.2. We first find the tree's number of features, their type and their range, from BigML's public gallery. Our attacks (Algorithm 1 and the top-down variant) extract an exact description of the tree's paths, using respectively 1,722 and 1,150 queries. Both attacks' duration (1,030 seconds and 631 seconds) is dominated by query latency (≈500 ms/query).

The monetary cost of the attack depends on the per-prediction fee set by the model owner. In any case, a user who wishes to make more than 1,150 predictions has economic incentives to run an extraction attack.

5.2 Case Study 2: Amazon Web Services

Amazon uses logistic regression for classification, and provides black-box-only access to trained models [1]. By default, Amazon uses two feature extraction techniques: (1) Categorical features are one-hot-encoded, i.e., the input space Mi = Zk is mapped to k binary features encoding the input value. (2) Quantile binning is used for numeric features. The training data values are split into k-quantiles (k equally-sized bins), and the input space Mi = [a, b] is mapped to k binary features encoding the bin that a value falls into. Note that the feature space X has more dimensions than the input space M, i.e., ex increases the number of features. If A reverse-engineers ex, she can query the service on samples M in input space, compute x = ex(M) locally, and extract f in feature space using equation-solving.

Model     OHE   Binning   Queries   Time (s)   Price ($)
Circles   No    Yes           278         28        0.03
Digits    No    No            650         70        0.07
Iris      No    Yes           644         68        0.07
Adult     Yes   Yes         1,485        149        0.15

Table 7: Results of model extraction attacks on Amazon. OHE stands for one-hot-encoding. The reported query count is the number used to find quantile bins (at a granularity of 10^{−3}), plus those queries used for equation-solving. Amazon charges $0.0001 per prediction [1].

We apply this approach to models trained by Amazon. Our results are summarized in Table 7. We first train a model with no categorical features, and quantile binning disabled (this is a manually tunable parameter), over the Digits data set. The attack is then identical to the one considered in Section 4.1.2: using 650 queries to Amazon, we extract a model that achieves Rtest = Runif = 0.

We now consider models with feature extraction enabled. We assume that A knows the input space M, but not the training data distribution. For one-hot-encoding, knowledge of M suffices to apply the same encoding locally. For quantile binning however, applying ex locally requires knowledge of the training data quantiles. To reverse-engineer the binning transformation, we use line searches similar to those we used for decision trees: For each numeric feature, we search the feature's range in input space for thresholds (up to a granularity ε) where f's output changes. This indicates our value landed in an adjacent bin, with a different learned regression coefficient. Note that learning the bin boundaries may be interesting in its own right, as it leaks information about the training data distribution. Having found the bin boundaries, we can apply both one-hot-encoding and binning locally, and extract f over its feature space.
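As a hedged illustration of the bin-boundary search, the sketch below uses a simple scan at granularity ε rather than the more query-efficient line search described above; query_model is a hypothetical prediction oracle (not a specific Amazon API call), and base_input is an otherwise fixed query.

```python
# Sketch of reverse-engineering quantile-bin boundaries for one numeric feature:
# scan the feature's input-space range and record where the output changes.
import numpy as np

def find_bin_boundaries(query_model, base_input, feature, lo, hi, eps=1e-3):
    boundaries = []
    previous = None
    for v in np.arange(lo, hi + eps, eps):
        q = dict(base_input)
        q[feature] = float(v)
        out = query_model(q)              # e.g., the returned class probability
        if previous is not None and out != previous:
            boundaries.append(float(v))   # output changed: we crossed a bin edge
        previous = out
    return boundaries                     # bin edges, up to granularity eps
```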
As we are restricted to queries over M, we cannot define an arbitrary system of equations over X . Building a well-determined and consistent system can be difficult, as the encoding ex generates sparse inputs over X . However, Amazon facilitates this process with the way it handles queries with missing features: if a feature is omitted from a query, all corresponding features in X are set to 0. For a linear model for instance, we can trivially re-construct the model by issuing queries with a single feature specified, such as to obtain equations with a single unknown in X . We trained models for the Circles, Iris and Adult data sets, with Amazon’s default feature-extraction settings. Table 7 shows the results of our attacks, for the reverseengineering of ex and extraction of f . For binary models (Circles and Adult), we use d + 1 queries to solve a linear equation-system over X . For models with c > 2 classes, we use c · (d + 1) queries. In all cases, the extracted model matches f on 100% of tested inputs. To optimize the query complexity, the queries we use to find quantile bins are re-used for equation-solving. As line searches require adaptive queries, we do not use batch predictions. However, even for the Digits model, we resorted to using real-time predictions, because of the service’s significant overhead in evaluating batches. For attacks that require a large number of non-adaptive queries, we expect batch predictions to be faster than real-time predictions. 5.3 Discussion Additional feature extractors. In some ML services we considered, users may enable further feature extractors. A common transformation is feature scaling or normalization. If A has access to training data statistics (as provided by BigML for instance), applying the transformation locally is trivial. More generally, for models with a linear input layer (i.e., logistic regressions, linear SVMs, MLPs) the scaling or normalization can be seen as being applied to the learned weights, rather than the input features. We can thus view the composition f ◦ ex as a model f that operates over the ‘un-scaled’ input space M and extract f directly using equation-solving. Further extractors include text analysis (e.g., bag-ofwords or n-gram models) and Cartesian products (grouping many features into one). We have not analyzed these in this work, but we believe that they could also be easily reverse-engineered, especially given some training data statistics and the ability to make incomplete queries. Learning unknown model classes or hyper-parameters. For our online attacks, we obtained information about the model class of f , the enabled feature extraction ex, and other hyper-parameters, directly from the ML service or its documentation. More generally, if A does not have full certainty about certain model characteristics, it may be able to narrow down a guess to a small range. Model hyper-parameters for instance (such as the free parameter of an RBF kernel) are typically chosen through cross-validation over a default range of values. Given a set of attack strategies with varying assumptions, A can use a generic extract-and-test approach: each attack is applied in turn, and evaluated by computing Rtest or Runif over a chosen set of points. The adversary succeeds if any of the strategies achieves a low error. Note that A needs to interact with the model f only once, to obtain responses for a chosen set of extraction samples and test samples, that can be re-used for each strategy. 
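Schematically, this extract-and-test approach can be written as the following loop. It is an illustrative sketch only; query_batch, distance conventions, and the candidate strategies are placeholders rather than a specific service API.

```python
# Sketch of extract-and-test: query the target once for an extraction set and a
# test set, run each candidate extraction strategy offline, keep the best one.
def extract_and_test(query_batch, strategies, extraction_inputs, test_inputs):
    extraction_labels = query_batch(extraction_inputs)   # one-time interaction
    test_labels = query_batch(test_inputs)                # with the target f

    best_model, best_error = None, float("inf")
    for strategy in strategies:                           # e.g., guesses about the model
        candidate = strategy(extraction_inputs, extraction_labels)  # class or about ex
        error = sum(candidate(x) != y for x, y in zip(test_inputs, test_labels))
        error = error / len(test_inputs)                   # empirical R_test under 0-1 distance
        if error < best_error:
            best_model, best_error = candidate, error
    return best_model, best_error
```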
Our attacks on Amazon's service followed this approach: we first formulated guesses for model characteristics left unspecified by the documentation (e.g., we found no mention of one-hot-encoding, or of how missing inputs are handled), and then evaluated our assumptions with successive extraction attempts. Our results indicate that Amazon uses softmax regression and does not create binary predictors for missing values. Interestingly, BigML takes the 'opposite' approach (i.e., BigML uses OvR regression and adds predictors for missing values).

6 Extraction Given Class Labels Only

The successful attacks given in Sections 4 and 5 show the danger of revealing confidence values. While current ML services have been designed to reveal rich information, our attacks may suggest that returning only labels would be safer. Here we explore model extraction in a setting with no confidence scores. We will discuss further countermeasures in Section 7. We primarily focus on settings where A can make direct queries to an API, i.e., queries for arbitrary inputs x ∈ X. We briefly discuss indirect queries in the context of linear classifiers.

Figure 4: Average error of extracted linear models. Results are for different extraction strategies applied to models trained on all binary data sets from Table 3. The left shows Rtest and the right shows Runif.

The Lowd-Meek attack. We start with the prior work of Lowd and Meek [36]. They present an attack on any linear classifier, assuming black-box oracle access with membership queries that return just the predicted class label. A linear classifier is defined by a vector w ∈ R^d and a constant β ∈ R, and classifies an instance x as positive if w · x + β > 0 and negative otherwise. SVMs with linear kernels and binary LRs are examples of linear classifiers. Their attack uses line searches to find points arbitrarily close to f's decision boundary (points for which w · x + β ≈ 0), and extracts w and β from these samples. This attack only works for linear binary models.

We describe a straightforward extension to some non-linear models, such as polynomial kernel SVMs. Extracting a polynomial kernel SVM can be reduced to extracting a linear SVM in the transformed feature space. Indeed, for any kernel Kpoly(x, x') = (x^T · x' + 1)^d, we can derive a projection function φ(·) such that Kpoly(x, x') = φ(x)^T · φ(x'). This transforms the kernel SVM into a linear one, since the decision boundary now becomes w_F · φ(x) + β = 0, where w_F = ∑_{i=1}^{t} αi φ(xi). We can use the Lowd-Meek attack to extract w_F and β as long as φ(x) and its inverse are feasible to compute; this is unfortunately not the case for the more common RBF kernels.3

The retraining approach. In addition to evaluating the Lowd-Meek attack against ML APIs, we introduce a number of other approaches based on the broad strategy of re-training a model locally, given input-output examples. Informally, our hope is that by extracting a model that achieves low training error over the queried samples, we would effectively approximate the target model's decision boundaries. We consider three retraining strategies, described below. We apply these to the model classes that we previously extracted using equation-solving attacks, as well as to SVMs.4

3 We did explore using approximations of φ, but found that the adaptive re-training techniques discussed in this section perform better.

4 We do not expect retraining attacks to work well for decision trees, because of the greedy approach taken by learning algorithms. We have not evaluated extraction of trees, given class labels only, in this work.
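The three strategies are listed next. As a common building block of both the Lowd-Meek attack and the line-search retraining strategy below, the following is a minimal sketch of a label-only line search for a point near a decision boundary; `predict` is a hypothetical wrapper around the prediction API that returns only the class label.

```python
import numpy as np

# Sketch: label-only binary search for a point close to a decision boundary.
# `predict(x)` is a hypothetical API wrapper returning only the class label.

def boundary_point(predict, x_a, x_b, eps=1e-4):
    """x_a and x_b must receive different labels; returns a point on the
    segment between them that lies within eps of the boundary."""
    x_a, x_b = np.asarray(x_a, dtype=float), np.asarray(x_b, dtype=float)
    assert predict(x_a) != predict(x_b)
    while np.linalg.norm(x_b - x_a) > eps:
        mid = (x_a + x_b) / 2.0
        if predict(mid) == predict(x_b):
            x_b = mid
        else:
            x_a = mid
    return (x_a + x_b) / 2.0
```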
(1) Retraining with uniform queries. This baseline strategy simply consists in sampling m points xi ∈ X uniformly at random, querying the oracle, and training a model fˆ on these samples.

(2) Line-search retraining. This strategy can be seen as a model-agnostic generalization of the Lowd-Meek attack. It issues m adaptive queries to the oracle, using line search techniques to find samples close to the decision boundaries of f. A model fˆ is then trained on the m queried samples.

(3) Adaptive retraining. This strategy applies techniques from active learning [18, 47]. For some number r of rounds and a query budget m, it first queries the oracle on m/r uniform points and trains a model fˆ. Over a total of r rounds, it then selects m/r new points along the decision boundary of fˆ (intuitively, these are points fˆ is least certain about), and sends those to the oracle before retraining fˆ.

6.1 Linear Binary Models

We first explore how well the various approaches work in settings where the Lowd-Meek attack can be applied. We evaluate their attack and our three retraining strategies for logistic regression models trained over the binary data sets shown in Table 3. These models have d + 1 parameters, and we vary the query budget as α · (d + 1), for 0.5 ≤ α ≤ 100.

Figure 4 displays the average errors Rtest and Runif over all models, as a function of α. The retraining strategies that search for points near the decision boundary clearly perform better than simple uniform retraining. The adaptive strategy is the most efficient of our three strategies. For relatively low budgets, it even outperforms the Lowd-Meek attack. However, for budgets large enough to run line searches in each dimension, the Lowd-Meek attack is clearly the most efficient. For the models we trained, about 2,050 queries on average, and 5,650 at most, are needed to run the Lowd-Meek attack effectively. This is 50× more queries than we needed for equation-solving attacks. With 827 queries on average, adaptive retraining yields a model fˆ that matches f on over 99% of tested inputs. Thus, even if an ML API only provides class labels, efficient extraction attacks on linear models remain possible.

We further consider a setting where feature extraction (specifically one-hot-encoding of categorical features) is applied by the ML service, rather than by the user. A is then limited to indirect queries in input space. Lowd and Meek [36] note that their extraction attack does not work in this setting, as A cannot run line searches directly over X. In contrast, for the linear models we trained, we observed no major difference in extraction accuracy for the adaptive retraining strategy when limited to queries over M. We leave an in-depth study of model extraction with indirect queries, and class labels only, for future work.

6.2 Multiclass LR Models

The Lowd-Meek attack is not applicable in multiclass (c > 2) settings, even when the decision boundary is a combination of linear boundaries (as in multiclass regression) [39, 50]. We thus focus on evaluating the three retraining attacks we introduced, for the type of ML models we expect to find in real-world applications. We focus on softmax models here, as softmax and one-vs-rest models have identical output behaviors when only class labels are provided: in both cases, the class label for an input x is given by argmax_i (wi · x + βi). From an extractor's perspective, it is thus irrelevant whether the target was trained using a softmax or OvR approach.

We evaluate our attacks on softmax models trained on the multiclass data sets shown in Table 3. We again vary the query budget as a factor α of the number of model parameters, namely α · c · (d + 1). Results are displayed in Figure 5. We observe that the adaptive strategy clearly performs best, and that the line-search strategy does not improve over uniform retraining, possibly because the line searches have to be split across multiple decision boundaries. We further note that all strategies achieve lower Rtest than Runif. It thus appears that for the models we trained, points from the test set are on average 'far' from the decision boundaries of f (i.e., the trained models separate the different classes with large margins). For all models, 100 · c · (d + 1) queries resulted in extraction accuracy above 99.9%. This represents 26,000 queries on average, and 65,000 at the most (Digits data set). Our equation-solving attacks achieved similar or better results with 100× fewer queries. Yet, for scenarios with high monetary incentives (e.g., intrusion detector evasion), extraction attacks on MLR models may be attractive, even if APIs only provide class labels.
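The following is a minimal sketch of the adaptive retraining strategy evaluated above, under stated assumptions: `query_labels` is a hypothetical label-only wrapper around the prediction API, `sample_uniform` draws points uniformly from the input space, the first round is assumed to return at least two distinct labels, and uncertainty sampling over a random candidate pool stands in for the selection of points along the decision boundary of fˆ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Sketch of adaptive retraining with label-only oracle access.
# query_labels(X)   -> class labels from the target's prediction API (hypothetical wrapper).
# sample_uniform(n) -> n points drawn uniformly from the input space.

def adaptive_retrain(query_labels, sample_uniform, budget_m, rounds_r=10):
    per_round = budget_m // rounds_r
    X = sample_uniform(per_round)
    y = query_labels(X)
    f_hat = LogisticRegression(max_iter=1000).fit(X, y)
    for _ in range(rounds_r - 1):
        # Keep the candidate points the current local model is least certain
        # about (smallest gap between its two highest class probabilities).
        pool = sample_uniform(50 * per_round)
        probs = f_hat.predict_proba(pool)
        top2 = np.sort(probs, axis=1)[:, -2:]
        margin = top2[:, 1] - top2[:, 0]
        picked = pool[np.argsort(margin)[:per_round]]
        X = np.vstack([X, picked])
        y = np.concatenate([y, query_labels(picked)])
        f_hat = LogisticRegression(max_iter=1000).fit(X, y)
    return f_hat
```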
Figure 5: Average error of extracted softmax models. Results are for three retraining strategies applied to models trained on all multiclass data sets from Table 3. The left shows Rtest and the right shows Runif.

Figure 6: Average error of extracted RBF kernel SVMs. Results are for three retraining strategies applied to models trained on all binary data sets from Table 3. The left shows Rtest and the right shows Runif.

6.3 Neural Networks

We now turn to attacks on more complex deep neural networks. We expect these to be harder to retrain than multiclass regressions, as deep networks have more parameters and non-linear decision boundaries; we may therefore need to find a large number of points close to a decision boundary in order to extract it accurately. We evaluated our attacks on the multiclass models from Table 3. For the tested query budgets, line-search and adaptive retraining gave little benefit over uniform retraining. For a budget of 100 · k, where k is the number of model parameters, the extracted models match the targets on 99.16% of test inputs and 98.24% of uniformly sampled inputs (i.e., 1 − Rtest = 99.16% and 1 − Runif = 98.24%), using 108,200 queries per model on average. Our attacks might improve for higher budgets, but it is unclear whether they would then provide any monetary advantage over using ML APIs in an honest way.

6.4 RBF Kernel SVMs

Another class of nonlinear models that we consider are support vector machines (SVMs) with radial basis function (RBF) kernels. A kernel SVM first maps inputs into a higher-dimensional space, and then finds the hyperplane that maximally separates the two classes. As mentioned earlier in this section, SVMs with polynomial kernels can be extracted using the Lowd-Meek attack in the transformed feature space. For RBF kernels, this is not possible because the transformed space has infinite dimension. SVMs do not provide class probability estimates. Our only applicable attack is thus retraining.
As for linear models, we vary the query budget as α · (d + 1), where d is the input dimension. We further use the extract-and-test approach from Section 5 to find the value of the RBF kernel's hyper-parameter. Results of our attacks are shown in Figure 6. Again, we see that adaptive retraining performs best, even though the decision boundary to extract is nonlinear (in input space) here. Kernel SVM models are overall harder to retrain than models with linear decision boundaries. Yet, for our largest budgets (2,050 queries on average), we do extract models with over 99% accuracy, which may suffice in certain adversarial settings.

7 Extraction Countermeasures

We have shown in Sections 4 and 5 that adversarial clients can effectively extract ML models given access to rich prediction APIs. Given that this undermines the financial models targeted by some ML cloud services, and potentially leaks confidential training data, we believe researchers should seek countermeasures. In Section 6, we analyzed the most obvious defense against our attacks: prediction API minimization. The constraint here is that the resulting API must still be useful in (honest) applications. For example, it is simple to change APIs so that they do not return confidences and do not respond to incomplete queries, assuming applications can get by without them. This will prevent many of our attacks, most notably the ones described in Section 4, as well as the feature discovery techniques used in our Amazon case study (Section 5). Yet, we showed that even if we strip an API to only provide class labels, successful attacks remain possible (Section 6), albeit at a much higher query cost. We discuss further potential countermeasures below.

Rounding confidences. Applications might need confidences, but only at lower granularity. A possible defense is to round confidence scores to some fixed precision [23]. We note that ML APIs already work with some finite precision when answering queries: for instance, BigML reports confidences with 5 decimal places, and Amazon provides values with 16 significant digits. To understand the effects of limiting precision further, we re-evaluate equation-solving and decision tree path-finding attacks with confidence scores rounded to a fixed decimal place. For equation-solving attacks, rounding the class probabilities means that the solution to the obtained system of equations might not be the target f, but some truncated version of it. For decision trees, rounding confidence scores increases the chance of node id collisions, and thus decreases our attacks' success rate.

Figure 7: Effect of rounding on model extraction. Shows the average test error of equation-solving attacks on softmax models trained on the benchmark suite (Table 3), as we vary the number of significant digits in reported class probabilities. Extraction with no rounding and with class labels only (adaptive retraining) are added for comparison.

Figure 7 shows the results of experiments on softmax models, with class probabilities rounded to 2–5 decimals. We plot only Rtest, the results for Runif being similar. We observe that class probabilities rounded to 4 or 5 decimal places (as done already in BigML) have no effect on the attack's success. When rounding further to 3 or 2 decimal places, the attack is weakened, but still vastly outperforms adaptive retraining using class labels only.
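A minimal sketch of this rounding countermeasure is shown below, assuming a hypothetical server-side wrapper around a scikit-learn-style classifier; `decimals` is the number of decimal places retained in the reported class probabilities.

```python
import numpy as np

# Sketch of the rounding countermeasure: truncate class probabilities to a
# fixed number of decimal places before returning them to the client,
# limiting the precision available for equation-solving attacks.

def rounded_prediction(model, x, decimals=3):
    probs = model.predict_proba(np.asarray(x).reshape(1, -1))[0]
    label = int(np.argmax(probs))
    return label, np.round(probs, decimals).tolist()
```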
For regression trees, rounding has no effect on our attacks: for the models we considered, the output itself is unique in each leaf (we could also round outputs, but the impact on utility may be more critical). For classification trees, we re-evaluated our top-down attack with confidence scores rounded to fewer than 5 decimal places. The attacks on the 'IRS Tax Patterns' and 'Email Importance' models are the most resilient, and suffer no success degradation before scores are rounded to 2 decimal places. For the other models, rounding confidences to 3 or 4 decimal places severely undermines our attack.

Differential privacy. Differential privacy (DP) [22] and its variants [34] have been explored as mechanisms for protecting, in particular, the privacy of ML training data [54]. DP learning has been applied to regressions [17, 56], SVMs [44], decision trees [31] and neural networks [48]. As some of our extraction attacks leak training data information (Section 4.1.3), one may ask whether DP can prevent extraction, or at least reduce the severity of the privacy violations that extraction enables.

Consider a naïve application of DP to protect individual training data elements. This should, in theory, decrease the ability of an adversary A to learn information about training set elements, when given access to prediction queries. One would not expect, however, that this prevents model extraction, as DP is not defined to do so: consider a trivially useless learning algorithm for binary logistic regression that discards the training data and sets w and β to 0. This algorithm is differentially private, yet w and β can easily be recovered using equation-solving. A more appropriate strategy would be to apply DP directly to the model parameters, which would amount to saying that a query should not allow A to distinguish between closely neighboring model parameters. How exactly this would work, and what privacy budgets would be required, is left as an open question by our work.

Ensemble methods. Ensemble methods such as random forests return as their prediction an aggregation of the predictions of a number of individual models. While we have not experimented with ensemble methods as targets, we suspect that they may be more resilient to extraction attacks, in the sense that attackers will only be able to obtain relatively coarse approximations of the target function. Nevertheless, ensemble methods may still be vulnerable to other attacks, such as model evasion [55].

8 Related Work

Our work is related to the extensive literature on learning theory, such as PAC learning [53] and its variants [3, 8]. Indeed, extraction can be viewed as a type of learning, in which an unknown instance of a known hypothesis class (model type) provides labels (without error). This is often called learning with membership queries [3]. Our setting differs from these in two ways. The first is conceptual: in PAC learning one builds algorithms to learn a concept; the terminology belies the motivation of formalizing learning from data. In model extraction, an attacker is literally given a function oracle that it seeks to illicitly determine. The second difference is more pragmatic: prediction APIs reveal richer information than assumed in prior learning theory work, and we exploit that. Algorithms for learning with membership queries have been proposed for Boolean functions [7, 15, 30, 33] and various binary classifiers [36, 39, 50].
The latter line of work, initiated by Lowd and Meek [36], studies strategies for model evasion in the context of spam or fraud detectors [9, 29, 36, 37, 55]. Intuitively, model extraction seems harder than evasion, and this is corroborated by results from both theory [36, 39, 50] and practice [36, 55]. Evasion attacks fall into the larger field of adversarial machine learning, which studies machine learning in general adversarial settings [6, 29]. In that context, a number of authors have considered strategies and defenses for poisoning attacks, which consist in injecting maliciously crafted samples into a model's training or test data so as to decrease the learned model's accuracy [10, 21, 32, 40, 45]. In a non-malicious setting, improper model extraction techniques have been applied for interpreting [2, 19, 52] and compressing [16, 27] complex neural networks.

9 Conclusion

We demonstrated how the flexible prediction APIs exposed by current ML-as-a-service providers enable new model extraction attacks that could subvert model monetization, violate training-data privacy, and facilitate model evasion. Through local experiments and online attacks on two major providers, BigML and Amazon, we illustrated the efficiency and broad applicability of attacks that exploit common API features, such as the availability of confidence scores or the ability to query arbitrary partial inputs. We presented a generic equation-solving attack for models with a logistic output layer and a novel path-finding algorithm for decision trees.

We further explored potential countermeasures to these attacks, the most obvious being a restriction on the information provided by ML APIs. Building upon prior work from learning theory, we showed how an attacker that only obtains class labels for adaptively chosen inputs may launch less effective, yet potentially harmful, retraining attacks. Evaluating these attacks, as well as more refined countermeasures, on production-grade ML services is an interesting avenue for future work.

Acknowledgments. We thank Martín Abadi and the anonymous reviewers for their comments. This work was supported by NSF grants 1330599, 1330308, and 1546033, as well as a generous gift from Microsoft.

References

[1] Amazon Web Services. https://aws.amazon.com/machine-learning. Accessed Feb. 10, 2016.
[2] Andrews, R., Diederich, J., and Tickle, A. Survey and critique of techniques for extracting rules from trained artificial neural networks. KBS 8, 6 (1995), 373–389.
[3] Angluin, D. Queries and concept learning. Machine Learning 2, 4 (1988), 319–342.
[4] Ateniese, G., Mancini, L. V., Spognardi, A., Villani, A., Vitali, D., and Felici, G. Hacking smart machines with smarter ones: How to extract meaningful data from machine learning classifiers. IJSN 10, 3 (2015), 137–150.
[5] AT&T Laboratories Cambridge. The ORL database of faces. http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html.
[6] Barreno, M., Nelson, B., Sears, R., Joseph, A. D., and Tygar, J. D. Can machine learning be secure? In ASIACCS (2006), ACM, pp. 16–25.
[7] Bellare, M. A technique for upper bounding the spectral norm with applications to learning. In COLT (1992), ACM, pp. 62–70.
[8] Benedek, G. M., and Itai, A. Learnability with respect to fixed distributions. TCS 86, 2 (1991), 377–389.
[9] Biggio, B., Corona, I., Maiorca, D., Nelson, B., Šrndić, N., Laskov, P., Giacinto, G., and Roli, F. Evasion attacks against machine learning at test time. In ECML PKDD (2013), Springer, pp. 387–402.
[10] Biggio, B., Nelson, B., and Laskov, P. Poisoning attacks against support vector machines. In ICML (2012).
[11] BigML. https://www.bigml.com. Accessed Feb. 10, 2016.
[12] Blum, A. L., and Langley, P. Selection of relevant features and examples in machine learning. Artificial Intelligence 97, 1 (1997), 245–271.
[13] Blumer, A., Ehrenfeucht, A., Haussler, D., and Warmuth, M. K. Occam's razor. Readings in Machine Learning (1990), 201–204.
[14] Boser, B. E., Guyon, I. M., and Vapnik, V. N. A training algorithm for optimal margin classifiers. In COLT (1992), ACM, pp. 144–152.
[15] Bshouty, N. H. Exact learning boolean functions via the monotone theory. Inform. Comp. 123, 1 (1995), 146–153.
[16] Buciluă, C., Caruana, R., and Niculescu-Mizil, A. Model compression. In KDD (2006), ACM, pp. 535–541.
[17] Chaudhuri, K., and Monteleoni, C. Privacy-preserving logistic regression. In NIPS (2009), pp. 289–296.
[18] Cohn, D., Atlas, L., and Ladner, R. Improving generalization with active learning. Machine Learning 15, 2 (1994), 201–221.
[19] Craven, M. W., and Shavlik, J. W. Extracting tree-structured representations of trained networks. In NIPS (1996).
[20] Cybenko, G. Approximation by superpositions of a sigmoidal function. MCSS 2, 4 (1989), 303–314.
[21] Dalvi, N., Domingos, P., Sanghai, S., Verma, D., et al. Adversarial classification. In KDD (2004), ACM, pp. 99–108.
[22] Dwork, C. Differential privacy. In ICALP (2006), Springer.
[23] Fredrikson, M., Jha, S., and Ristenpart, T. Model inversion attacks that exploit confidence information and basic countermeasures. In CCS (2015), ACM, pp. 1322–1333.
[24] Fredrikson, M., Lantz, E., Jha, S., Lin, S., Page, D., and Ristenpart, T. Privacy in pharmacogenetics: An end-to-end case study of personalized Warfarin dosing. In USENIX Security (2014), pp. 17–32.
[25] Google Prediction API. https://cloud.google.com/prediction. Accessed Feb. 10, 2016.
[26] Hickey, W. How Americans Like their Steak. http://fivethirtyeight.com/datalab/how-americans-like-their-steak, 2014. Accessed Feb. 10, 2016.
[27] Hinton, G., Vinyals, O., and Dean, J. Distilling the knowledge in a neural network. arXiv:1503.02531 (2015).
[28] Hornik, K., Stinchcombe, M., and White, H. Multilayer feedforward networks are universal approximators. Neural Networks 2, 5 (1989), 359–366.
[29] Huang, L., Joseph, A. D., Nelson, B., Rubinstein, B. I., and Tygar, J. Adversarial machine learning. In AISec (2011), ACM, pp. 43–58.
[30] Jackson, J. An efficient membership-query algorithm for learning DNF with respect to the uniform distribution. In FOCS (1994), IEEE, pp. 42–53.
[31] Jagannathan, G., Pillaipakkamnatt, K., and Wright, R. N. A practical differentially private random decision tree classifier. In ICDMW (2009), IEEE, pp. 114–121.
[32] Kloft, M., and Laskov, P. Online anomaly detection under adversarial impact. In AISTATS (2010), pp. 405–412.
[33] Kushilevitz, E., and Mansour, Y. Learning decision trees using the Fourier spectrum. SICOMP 22, 6 (1993), 1331–1348.
[34] Li, N., Qardaji, W., Su, D., Wu, Y., and Yang, W. Membership privacy: A unifying framework for privacy definitions. In CCS (2013), ACM.
[35] Lichman, M. UCI machine learning repository, 2013.
[36] Lowd, D., and Meek, C. Adversarial learning. In KDD (2005), ACM, pp. 641–647.
[37] Lowd, D., and Meek, C. Good word attacks on statistical spam filters. In CEAS (2005).
[38] Microsoft Azure. https://azure.microsoft.com/services/machine-learning. Accessed Feb. 10, 2016.
[39] Nelson, B., Rubinstein, B. I., Huang, L., Joseph, A. D., Lee, S. J., Rao, S., and Tygar, J. Query strategies for evading convex-inducing classifiers. JMLR 13, 1 (2012), 1293–1332.
[40] Newsome, J., Karp, B., and Song, D. Paragraph: Thwarting signature learning by training maliciously. In RAID (2006), Springer, pp. 81–105.
[41] Nocedal, J., and Wright, S. Numerical Optimization. Springer Science & Business Media, 2006.
[42] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. Scikit-learn: Machine learning in Python. JMLR 12 (2011), 2825–2830.
[43] PredictionIO. http://prediction.io. Accessed Feb. 10, 2016.
[44] Rubinstein, B. I., Bartlett, P. L., Huang, L., and Taft, N. Learning in a large function space: Privacy-preserving mechanisms for SVM learning. JPC 4, 1 (2012), 4.
[45] Rubinstein, B. I., Nelson, B., Huang, L., Joseph, A. D., Lau, S.-h., Rao, S., Taft, N., and Tygar, J. Antidote: Understanding and defending against poisoning of anomaly detectors. In IMC (2009), ACM, pp. 1–14.
[46] Saar-Tsechansky, M., and Provost, F. Handling missing values when applying classification models. JMLR (2007).
[47] Settles, B. Active learning literature survey. University of Wisconsin, Madison 52, 55-66 (1995), 11.
[48] Shokri, R., and Shmatikov, V. Privacy-preserving deep learning. In CCS (2015), ACM, pp. 1310–1321.
[49] Smith, T. W., Marsden, P., Hout, M., and Kim, J. General social surveys, 1972-2012, 2013.
[50] Stevens, D., and Lowd, D. On the hardness of evading combinations of linear classifiers. In AISec (2013), ACM, pp. 77–86.
[51] Theano Development Team. Theano: A Python framework for fast computation of mathematical expressions. arXiv:1605.02688 (2016).
[52] Towell, G. G., and Shavlik, J. W. Extracting refined rules from knowledge-based neural networks. Machine Learning 13, 1 (1993), 71–101.
[53] Valiant, L. G. A theory of the learnable. Communications of the ACM 27, 11 (1984), 1134–1142.
[54] Vinterbo, S. Differentially private projected histograms: Construction and use for prediction. In ECML-PKDD (2012).
[55] Šrndić, N., and Laskov, P. Practical evasion of a learning-based classifier: A case study. In Security and Privacy (SP) (2014), IEEE, pp. 197–211.
[56] Zhang, J., Zhang, Z., Xiao, X., Yang, Y., and Winslett, M. Functional mechanism: Regression analysis under differential privacy. In VLDB (2012).
[57] Zhu, J., and Hastie, T. Kernel logistic regression and the import vector machine. In NIPS (2001), pp. 1081–1088.

A Some Details on Models

SVMs. Support vector machines (SVMs) perform binary classification (c = 2) by defining a maximally separating hyperplane in d-dimensional feature space. A linear SVM is a function f(x) = sign(w · x + β), where 'sign' outputs 0 for all negative inputs and 1 otherwise. Linear SVMs are not suitable for non-linearly separable data. Here one uses instead kernel techniques [14]. A kernel is a function K : X × X → R. Typical kernels include the quadratic kernel Kquad(x, x') = (x^T · x' + 1)^2 and the Gaussian radial basis function (RBF) kernel Krbf(x, x') = e^(−γ‖x − x'‖²), parameterized by a value γ ∈ R. A kernel's projection function is a map φ defined by K(x, x') = φ(x) · φ(x').
We do not use φ explicitly; indeed, for RBF kernels it produces an infinite-dimensional vector. Instead, classification is defined using a "kernel trick": f(x) = sign([∑_{i=1}^{t} αi K(x, xi)] + β), where β is again a learned threshold, α1, ..., αt are learned weights, and x1, ..., xt are feature vectors of inputs from a training set. The xi for which αi ≠ 0 are called support vectors. Note that for non-zero αi, it is the case that αi < 0 if the training-set label of xi was zero, and αi > 0 otherwise.

Logistic regression. SVMs do not directly generalize to multiclass settings c > 2, nor do they output class probabilities. Logistic regression (LR) is a popular classifier that does. A binary LR model is defined as f1(x) = σ(w · x + β) = 1/(1 + e^−(w·x+β)) and f0(x) = 1 − f1(x). A class label of 1 is chosen iff f1(x) > 0.5. When c > 2, one fixes c weight vectors w0, ..., w_{c−1}, each in R^d, and thresholds β0, ..., β_{c−1} in R, and defines fi(x) = e^(wi·x+βi) / ∑_{j=0}^{c−1} e^(wj·x+βj) for i ∈ Zc. The class label is taken to be argmax_i fi(x). Multiclass regression is referred to as multinomial or softmax regression. An alternative approach to softmax regression is to build a binary model σ(wi · x + βi) per class in a one-vs-rest fashion and then set fi(x) = σ(wi · x + βi) / ∑_j σ(wj · x + βj). These are log-linear models, and may not be suitable for data that is not linearly separable in X. Again, one may use kernel techniques to deal with more complex data relationships (cf. [57]). One then replaces wi · x + βi with ∑_{r=1}^{t} αi,r K(x, xr) + βi. As written, this uses the entire set of training data points x1, ..., xt as so-called representors (here analogous to support vectors). Unlike with SVMs, where most training data points never end up as support vectors, here all training set points are potentially representors. In practice, one uses a random subset of the training data of size s < t [57].

Deep neural networks. A popular way of extending softmax regression to handle data that is not linearly separable in X is to first apply one or more non-linear transformations to the input data. The goal of these hidden layers is to map the input data into a (typically) lower-dimensional space in which the classes are separable by the softmax layer. We focus here on fully connected networks, also known as multilayer perceptrons, with a single hidden layer. The hidden layer consists of a number h of hidden nodes, with associated weight vectors w0^(1), ..., w_{h−1}^(1) in R^d and thresholds β0^(1), ..., β_{h−1}^(1) in R. The i-th hidden unit applies a non-linear transformation hi(x) = g(wi^(1) · x + βi^(1)), where g is an activation function such as tanh or σ. The vector h(x) ∈ R^h is then input into a softmax output layer with weight vectors w0^(2), ..., w_{c−1}^(2) in R^h and thresholds β0^(2), ..., β_{c−1}^(2) in R.

Decision trees. A decision tree T is a labeled tree. Each internal node v is labeled by a feature index i ∈ {1, ..., d} and a splitting function ρ : Xi → Z_kv, where kv ≥ 2 denotes the number of outgoing edges of v. On an input x = (x1, x2, ..., xd), a tree T defines a computation as follows, starting at the root. When we reach a node v, labeled by {i, ρ}, we proceed to the child of v indexed by ρ(xi). We consider three types of splitting functions ρ that are typically used in practice [11]:

(1) The feature xi is categorical with Xi = Zk. Let {S, T} be some partition of Zk. Then kv = 2, and ρ(xi) = 0 if xi ∈ S and ρ(xi) = 1 if xi ∈ T.
This is a binary split on a categorical feature.

(2) The feature xi is categorical with Xi = Zk. We have kv = k and ρ(xi) = xi. This corresponds to a k-ary split on a categorical feature of arity k.

(3) The feature xi is continuous with Xi = [a, b]. Let a < t < b be a threshold. Then kv = 2, and ρ(xi) = 0 if xi ≤ t and ρ(xi) = 1 if xi > t. This is a binary split on a continuous feature with threshold t.

When we reach a leaf, we terminate and output that leaf's value. This value can be a class label, or a class label and a confidence score. This defines a function f : X → Y.

B Details on Data Sets

Here we give some more information about the data sets we used in this work. Refer back to Table 3 and Table 5.

Synthetic data sets. We used 4 synthetic data sets from scikit [42]. The first two data sets are classic examples of non-linearly separable data, consisting of two concentric Circles, or two interleaving Moons. The next two synthetic data sets, Blobs and 5-Class, consist of Gaussian clusters of points assigned to either 3 or 5 classes.

Public data sets. We gathered a varied set of data sets representative of the type of data we would expect ML service users to use to train logistic and SVM-based models. These include famous data sets used for supervised learning, obtained from the UCI ML repository (Adult, Iris, Breast Cancer, Mushrooms, Diabetes). We also consider the Steak and GSS data sets used in prior work on model inversion [23]. Finally, we add a data set of digits available in scikit, to visually illustrate training data leakage in kernelized logistic models (cf. Section 4.1.3).

Public data sets and models from BigML. For experiments on decision trees, we chose a varied set of models publicly available on BigML's platform. These models were trained by real MLaaS users and cover a wide range of application scenarios, thus providing a realistic benchmark for the evaluation of our extraction attacks. The IRS model predicts a US state, based on administrative tax records. The Steak and GSS models respectively predict a person's preferred steak preparation and happiness level, from survey and demographic data. These two models were also considered in [23]. The Email Importance model predicts whether Gmail classifies an email as 'important' or not, given message metadata. The Email Spam model classifies emails as spam, given the presence of certain words in its content. The German Credit data set was taken from the UCI library [35] and classifies a user's loan risk. Finally, two regression models respectively predict Medical Charges in the US based on state demographics, and the Bitcoin Market Price from daily opening and closing values.

C Analysis of the Path-Finding Algorithm

In this section, we analyze the correctness and complexity of the decision tree extraction algorithm given in Algorithm 1. We assume that all leaves are assigned a unique id by the oracle O, and that no continuous feature is split into intervals of width smaller than ε. We may use id to refer directly to the leaf with identity id.

Correctness. Termination of the algorithm follows immediately from the fact that new queries are only added to Q when a new leaf is visited. As the number of leaves in the tree is bounded, the algorithm must terminate. We prove by contradiction that all leaves are eventually visited. Let the depth of a node v denote the length of the path from v to the root (the root has depth 0).
For two leaves id and id′, let A be their deepest common ancestor (the deepest node appearing on both the path of id and that of id′), and denote the depth of A by ∆(id, id′). Suppose Algorithm 1 terminates without visiting all leaves, and let (id, id′) be a pair of leaves with maximal ∆(id, id′) such that id was visited but id′ was not. Let xi be the feature that their deepest common ancestor A splits on. When id is visited, the algorithm calls LINE SEARCH or CATEGORY SPLIT on feature xi. As all leaf ids are unique and there are no intervals smaller than ε, we will discover a leaf in each sub-tree rooted at A, including the one that contains id′. Thus, we visit some leaf id″ in that sub-tree for which ∆(id″, id′) > ∆(id, id′), a contradiction.

Complexity. Let m denote the number of leaves in the tree. Each leaf is visited exactly once, and for each leaf we check all d features. Suppose continuous features have range [0, b] and categorical features have arity k. For continuous features, finding one threshold takes at most log2(b/ε) queries. As the total number of splits on one feature is at most m (i.e., when all nodes split on the same feature), finding all thresholds uses at most m · log2(b/ε) queries. Testing a categorical feature uses k queries. The total query complexity is O(m · (dcat · k + dcont · m · log2(b/ε))), where dcat and dcont denote respectively the number of categorical and continuous features. For the special case of boolean trees, the complexity is O(m · d).

In comparison, the algorithm of [33], which uses membership queries only, has a complexity polynomial in d and 2^δ, where δ is the tree depth. For degenerate trees, 2^δ can be exponential in m, implying that the assumption of unique leaf identities (obtained from confidence scores, for instance) provides an exponential speedup over the best-known approach with class labels only. The algorithm from [33] can be extended to regression trees, with a complexity polynomial in the size of the output range Y. Again, under the assumption of unique leaf identities (which could here be obtained solely from the output values), we obtain a much more efficient algorithm, with a complexity independent of the output range.

The Top-Down Approach. The correctness and complexity of the top-down algorithm from Section 4.2 (which uses incomplete queries) follow from a similar analysis. The main difference is that we assume that all nodes, rather than only the leaves, have a unique id.

D A Note on Improper Extraction

To extract a model f without knowledge of the model class, a simple strategy is to extract a multilayer perceptron fˆ with a large enough hidden layer. Indeed, feedforward networks with a single hidden layer can, in principle, closely approximate any continuous function over a bounded subset of R^d [20, 28]. However, this strategy intuitively does not appear to be optimal: even if we know that we can find a multilayer perceptron fˆ that closely matches f, fˆ might have a far more complex representation (more parameters) than f. Thus, tailoring the extraction to the 'simpler' model class of the target f appears more efficient. In learning theory, the problem of finding a succinct representation of some target model f is known as Occam Learning [13]. Our experiments indicate that such generic improper extraction indeed appears sub-optimal in the context of equation-solving attacks. We train a softmax regression over the Adult data set with target "Race". The model f is defined by 530 real-valued parameters.
As shown in Section 4.1.2, using only 530 queries, we extract a model fˆ from the same model class that closely matches f: fˆ and f predict the same labels on 100% of tested inputs, and produce class probabilities that differ by less than 10^-7 in total variation (TV) distance. We also extracted the same model assuming a multilayer perceptron target class. Even with 1,000 hidden nodes (this model has 111,005 parameters) and 10× more queries (5,300), the extracted model fˆ is a weaker approximation of f (99.5% agreement on class labels and a TV distance of 10^-2 for class probabilities).
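The following is a minimal sketch of this kind of comparison, under assumptions that depart from the experiment above: the target is a locally trained scikit-learn softmax model standing in for a prediction API, and both extracted models are simply retrained on labeled queries (the proper extraction in our experiments instead uses equation-solving on confidence scores). The data set, query counts and layer sizes are illustrative.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Sketch: proper (same model class) vs. improper (MLP) extraction of a
# softmax target, with both extracted models retrained on labeled queries.
X, y = load_digits(return_X_y=True)
target = LogisticRegression(max_iter=5000).fit(X, y)   # stand-in for the target f

rng = np.random.RandomState(0)
queries = rng.uniform(X.min(), X.max(), size=(5000, X.shape[1]))
labels = target.predict(queries)                       # oracle responses

proper = LogisticRegression(max_iter=5000).fit(queries, labels)
improper = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000).fit(queries, labels)

test = rng.uniform(X.min(), X.max(), size=(2000, X.shape[1]))
print("same-class agreement:", np.mean(proper.predict(test) == target.predict(test)))
print("MLP agreement:       ", np.mean(improper.predict(test) == target.predict(test)))
```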