J Exp Criminol (2016) 12:347–371
DOI 10.1007/s11292-016-9272-0

Predictions put into practice: a quasi-experimental
evaluation of Chicago’s predictive policing pilot
Jessica Saunders 1 & Priscillia Hunt 1 &
John S. Hollywood 1

Published online: 12 August 2016
© Springer Science+Business Media Dordrecht 2016

Abstract
Objectives In 2013, the Chicago Police Department conducted a pilot of a predictive
policing program designed to reduce gun violence. The program included development
of a Strategic Subjects List (SSL) of people estimated to be at highest risk of gun
violence who were then referred to local police commanders for a preventive intervention. The purpose of this study is to identify the impact of the pilot on individual- and
city-level gun violence, and to test possible drivers of results.
Methods The SSL consisted of 426 people estimated to be at highest risk of gun
violence. We used ARIMA models to estimate impacts on city-level homicide trends,
and propensity score matching to estimate the effects of being placed on the list on five
measures related to gun violence. A mediation analysis and interviews with police
leadership and COMPSTAT meeting observations help understand what is driving
results.
Results Individuals on the SSL are not more or less likely to become a victim of a
homicide or shooting than the comparison group, and this is further supported by city-level analysis. The treated group is more likely to be arrested for a shooting.
Conclusions It is not clear how the predictions should be used in the field. One
potential reason why being placed on the list resulted in an increased chance of being
arrested for a shooting is that some officers may have used the list as leads to closing
shooting cases. The results provide for a discussion about the future of individual-based
predictive policing programs.
Keywords Predictive policing · Program evaluation · Propensity score matching · Quasi-experimental design · Risk assessment · Time series analysis

* Jessica Saunders
jsaunders@rand.org

1 RAND Corporation, 1776 Main Street, Santa Monica, CA 90407, USA

Introduction
The term “predictive policing” has garnered significant interest in the law enforcement
community as a potential way to increase the likelihood of preventing crime before it
occurs. While the term has various operational definitions, predictive policing is
typically comprised of two elements: a prediction model that uses an algorithm to
identify instances of increased crime risk, and an associated prevention strategy to
mitigate and/or reduce those risks (Perry et al. 2013; Ridgeway 2013). With the
progress of more advanced analytics (also commonly referred to as predictive analytics,
machine learning, or data mining methods), statistical approaches have been shown to make
more accurate predictions than traditional crime analysis methods in the lab (Berk
2011; Berk and Bleich 2013; Cohen et al. 2007; Gorr and Harries 2003). By leveraging
advanced analytics, police departments may be able to more effectively identify future
crime targets for preemptive intervention. However, there is little experimental evidence from the field demonstrating whether implementing an advanced analytics
predictive model along with a prevention strategy (that is, “predictive policing”) works to
reduce crime, particularly compared to other policing practices in the field.
The impact of both prediction and prevention, that is, a predictive policing program,
needs to be tested because there are plenty of reasons to believe improvements in the
accuracy of predictions alone may not result in a reduction in crime. First, the
predictions may not be actionable because the location and/or time are not precise
enough. For example, predictions may identify census blocks at increased risk of crime
in the following week, but for most police departments, this period is too long to
efficiently and effectively implement a strategy to prevent a crime. Second, since the
baseline accuracy of predictions is still relatively low, small improvements can be made
to appear as large percentage improvements, when they are rather insufficient to make a
difference in the real world. For instance, a method or model may improve the
prediction of homicide perpetrators in a city in a year from 1 out of 100,000 people
to 6 out of 100,000—a 500 % improvement—but using the average homicide levels in
cities, the new approach will still fail to identify nearly 99.5 % of homicide perpetrators.
Third, the crime prevention strategies may not work. So in our example, even if a
prediction model identifies five more future homicide perpetrators than traditional
analysis, the prevention program may not stop them from committing a homicide.
Not to mention, they may not even receive the prevention program—the new method
identified 6 out of 100,000, meaning 100,000 people would need to receive prevention
to avoid 6 homicides. Finally, law enforcement may choose not to use the predictions
(noncompliance). And these are only a few of the many reasons that an enhanced
prediction model or method may not lead to crime reduction in the field, so there is still
a need for research studies to better understand if and how predictive policing programs
work in practice.
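The gap between relative and absolute improvement in the second point can be checked with simple arithmetic. The following Python sketch uses only the two identification rates quoted above (1 and 6 per 100,000 people); the function names are illustrative and do not come from the study.

```python
def relative_improvement(old_count, new_count):
    """Percentage change in identifications, relative to the old count."""
    return (new_count - old_count) / old_count * 100.0


def absolute_improvement(old_count, new_count):
    """Additional identifications, in the same units as the inputs."""
    return new_count - old_count


# Both counts are expressed per 100,000 people, as in the text.
old_per_100k = 1
new_per_100k = 6

rel = relative_improvement(old_per_100k, new_per_100k)
extra = absolute_improvement(old_per_100k, new_per_100k)
print(f"relative improvement: {rel:.0f} %")
print(f"additional perpetrators identified per 100,000 people: {extra}")
```

The same change reads as a 500 % gain in relative terms and as five additional people per 100,000 in absolute terms, which is why a percentage alone can overstate practical value.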
There is some experimental evidence of the impact of predictive policing strategies
on crime, albeit limited. Two peer-reviewed field experiments of explicitly formulated
predictive policing programs suggest results are mixed. One study, co-authored by
predictive policing software developers, compared the use of predictive policing
software to identify micro-places at high risk of crime to crime analysts manually
labeling high-risk micro-places; police then conducted additional patrols in the identified places (Mohler et al. 2015). The study found that the predictive policing tool better
recognized future crime risk and reduced crime in comparison to the manual labeling
by the crime analysts. But the other study, identifying blocks at high risk within an
intelligence-led policing paradigm, did not result in a reduction of property crime
compared to business-as-usual hot spots mapping and policing (Hunt et al. 2014).
The authors concluded that the failure to identify an effect could be due to the
program not in fact working (theory failure), low statistical power, and/or a lack of
program fidelity in some treatment units (implementation failure). The contradictory
findings may be due to differences in study design, experimental control, prediction
accuracy, prevention implementation, or a dozen other factors, which are discussed in
more detail later.
In order to better understand the effects of an individual-focused predictive policing
program in the field, this study analyzes a pilot program implemented in Chicago in
2013 aimed at reducing gun violence. The theory behind the program is not dissimilar
to prior efforts to identify, monitor, and deter or incapacitate high-risk or highly active
offenders to reduce crime (Abrahamse et al. 1991; Martin and Sherman 1986;
Ridgeway et al. 2011; Sherman and Berk 1984). One difference, however, is that the
individuals at high risk of being involved in crime in the future were identified using a
predictive policing strategy based on a statistical model of co-arrest networks in a
policing context. Importantly, the high-risk individuals were not necessarily under
official criminal justice supervision nor were they identified through intelligence to
be particularly criminally active.
The predictive policing strategy examined in this study refers to the pilot, or first
phase, of the Chicago Police Department’s (CPD) larger predictive policing program
where individuals at highest risk for gun violence were placed on a Strategic Subjects
List (SSL). The SSL was disseminated by central command and the prevention strategy
was deferred to district commanders who decided the relevant policing intervention
strategy for SSL individuals in their district. We test whether the introduction of the
SSL affected city-level homicide rates. Furthermore, using individual-level data, we
apply propensity score matching methods to estimate the impact of the SSL on the
likelihood of high-risk individuals being involved in gun violence. Lastly, we test
hypotheses for why the program may or may not have worked.
The rest of this paper is structured as follows: in section two, we provide a review of
previous literature on the prediction and prevention of criminal behavior at the individual level, focusing on linking predictions to practice within the intelligence-led
policing paradigm. Section three provides a description of Chicago’s SSL program
and how it was implemented during the pilot period. In section four, we present the data
and quasi-experimental methods used to evaluate program effectiveness and to identify
the mechanisms driving outcomes. Section five presents results, and section six
concludes with a discussion of what the findings mean for police departments looking
to develop and implement similar programs.

Literature review
Predictive policing is a proactive policing model that was popularized, in part, by the
development of advanced analytics that were lauded as highly successful in other fields
(Beck and McCue 2009; Berry and Linoff 2004; Zikopoulos and Eaton 2011). Now
that analysts have increasing methodological and computational sophistication to
predict future crime patterns (Berk and Bleich 2013; Cohen et al. 2007; Perry et al.
2013), many have claimed that police should be using these models to reorient their
officers toward future, rather than current, problems (Beck and McCue 2009). The units
of crime predictions can fall across a continuum of targets, ranging from large to small
geographically sized areas (e.g., predict when and where crime is more likely to occur),
all the way down to individuals (e.g., predict who is more likely to be involved in
criminal activity). As the models differ in their predicted targets, we might expect these
models to be tied to policing strategies for crime control purposes. The policing
strategies would likely differ in terms of specificity of the prevention mechanism
(e.g., from general prevention to targeted prevention of particular crime types or of
specific behaviors) and of the time period (e.g., how far into the future the strategy
might operate). However, the best ways to translate predictions into practice are still
underdeveloped (Ridgeway 2013).
Predictive policing comes from a long history of proactive policing strategies
focused on getting ahead of crime before it occurs (Bordua and Reiss 1966). Since
the 1980s, a large body of robust literature has grown on the effectiveness of proactive
policing techniques for reducing crime (Braga et al. 2012; Mazerolle et al. 1998, 2000;
Sherman and Weisburd 1995; Weisburd and Mazerolle 2000). The evidence-based
policing field is replete with studies demonstrating that the police can proactively
prevent and reduce crime. Each shares, in the broadest sense, a commonality with the
others and with predictive policing: a basis for the selection of a particular target and a
mechanism to prevent and reduce crime. Multiple examples of these studies can be
found on the Evidence-Based Policing Matrix (Lum et al. 2011), and one can conceive
of integrating predictive analytics into most of these programs.
Predicting crime using individual-focused modeling
To date, much of the interest in predictive policing has focused on using geospatial
modeling to predict future hot spots (Beck and McCue 2009; Groff and La Vigne 2002;
Hunt et al. 2014; Mohler et al. 2015; Perry et al. 2013). In fact, researchers started
assessing computational methods for what they called “predictive crime mapping” over
10 years ago (Gorr and Harries 2003; Groff and La Vigne 2002). Crime control
programs that focus on geographic targets have been met with the most success
(Braga 2005; Braga and Weisburd 2010; Sherman 1986; Sherman et al. 1997;
Sherman and Weisburd 1995; Weisburd and Mazerolle 2000). Models and methods
to predict the criminal behavior of individuals are less prominent in the criminal justice
crime prevention literature.
While more recent advanced analytic approaches have focused on improving
geographic predictions, predicting future dangerousness of individuals is nothing
new. For decades, researchers have been developing clinical predictions of future
dangerousness using subjective approaches, such as intelligence or expert opinion
assessments of “high risk” (Abrahamse et al. 1991; Bar-Hillel 1980; Martin and
Sherman 1986; Pate et al. 1976) and static crime analysis (Braga et al. 2012; Eck
et al. 2005; Ridgeway 2013; Sherman and Weisburd 1995). Subjective-based predictions have been replaced by more modern, reliable, and valid actuarial methods (Grove
and Meehl 1996; Litwack 2001). Generally, these methods apply mathematical models
to administrative data in order to conduct risk assessments and predict future dangerousness. Examples here include models predicting the risk of persons under community
supervision reoffending (Berk 2008, 2011; Berk and Bleich 2013; Berk et al. 2009;
Wright et al. 1984) and models assessing the risk that a gang affiliate will be involved
in violence as a function of their social relationships (Papachristos 2009; Papachristos
et al. 2011, 2012). Thus far, these models have been found to have a moderate level of
predictive accuracy (Yang et al. 2010).
These actuarial risk assessments have not been used in a policing context until now.
Police traditionally use intelligence and subjective assessments to identify and monitor
high-risk individuals, whereas actuarial risk assessments are relatively standard practice
in other parts of the criminal system, such as for correctional placement, court sentencing, and probation decisions (Dvoskin and Heilbrun 2001; Quinsey et al. 2000; Wright
et al. 1984). These tools and methods are routinely integrated into decisions about
supervision and sentencing, but researchers have warned that they were designed to
facilitate efficient management of institutional resources, not to target individuals, and
caution should be applied to ensure their proper use. The challenges of predicting future
offending behavior and people’s misunderstanding of predictions have been well
documented (Bar-Hillel 1980). For the models assessing a person’s future risk of
offending or victimization, a key complication is that, while the models can identify
increased risk, the overall risk can still be very low. Indeed, a “very high risk” person
for homicide might have a risk rate of 1 % per year. According to some scholars, this
may still lead to cost-effective policing strategies, as a “false positive,” or
someone who is incorrectly identified as a potential offender, is likely less costly than a
“false negative,” or someone who was incorrectly identified as a non-offender (Berk
2011). To further complicate the issue, researchers note that decisions using actuarial
predictions may advance the continued marginalization of economically and
politically disenfranchised populations (Silver and Miller 2002), which could be more
detrimental in a policing context compared to someone in custody who has already
been found guilty.
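The base-rate problem described above can be made concrete with a small calculation. The Python sketch below takes the 1 % annual risk from the text and assumes a hypothetical group of 1,000 flagged people. The break-even cost ratio further assumes that an intervention fully prevents the harm, which is a simplification made here for illustration and is not a claim from the literature cited.

```python
def expected_outcomes(n_flagged, annual_risk):
    """Expected true and false positives among flagged people in one year."""
    true_positives = n_flagged * annual_risk
    false_positives = n_flagged - true_positives
    return true_positives, false_positives


def break_even_cost_ratio(annual_risk):
    """Ratio of false-negative cost to false-positive cost at which intervening breaks even.

    Intervening on everyone flagged pays off when
    risk * cost_fn > (1 - risk) * cost_fp, that is, when
    cost_fn / cost_fp > (1 - risk) / risk.
    Assumes (hypothetically) that the intervention always prevents the harm.
    """
    return (1.0 - annual_risk) / annual_risk


tp, fp = expected_outcomes(n_flagged=1000, annual_risk=0.01)
ratio = break_even_cost_ratio(0.01)
print(f"expected true positives: {tp:.0f}, false positives: {fp:.0f}")
print(f"break-even cost ratio (false negative : false positive): {ratio:.0f}")
```

Under these assumptions, a false negative would have to be roughly 99 times as costly as a false positive for flagging at a 1 % risk level to break even, which shows how strongly the cost-effectiveness argument depends on the relative costs assigned to the two errors.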
Prevention: intelligence-led policing paradigm
While the prevention strategies connected to any particular prediction may vary in
specificity (e.g., general vs. targeted prevention) and the level of proactivity, the process
of matching them is most closely described as an Intelligence-Led Policing (ILP)
approach. A precise definition of Intelligence-Led Policing is difficult to nail down,
but most broadly, it is a strategy that integrates data analysis and intelligence to help
police prioritize their targets and activities (Ratcliffe 2002). Intelligence-led policing
grew in popularity throughout the 1990s and early 2000s (Cope 2004), and according
to Ratcliffe (2002, 2012), the defining characteristic is that intelligence is used as a
“decision-making tool” to help the police prioritize their work effectively and efficiently to reduce crime, sometimes using external partnerships as a force multiplier.
Intelligence-led policing can be situated at a variety of different organizational levels
because “intelligence” can be used to inform programs, strategies, and even larger
administrative priorities and policies. There are three components of an intelligence-led
policing strategy, described in Ratcliffe’s “3i” model (Ratcliffe 2005): interpreting the
available information (in predictive policing, through the use of a predictive model);
using the information to influence an agency’s decision-makers to adapt a crime-reduction strategy; and executing the strategy to impact criminal behavior and, ideally,
reduce crime (in predictive policing, through the use of a prevention program).
Evidence on the general effectiveness of a well-implemented ILP paradigm
is limited. ILP is a framework or model and not a
singular program, which makes evaluation very challenging because it can be implemented very differently across different settings (treatment heterogeneity). Research
tends to focus on organizational issues in adopting a true ILP model at different levels
(Ratcliffe 2002, 2005; Ratcliffe and Guidetti 2008). Ratcliffe (2012) reviewed the
effectiveness of a few different programs in ILP frameworks with different methods,
targets, and settings, and found that the frameworks met with some success. However,
it may be that ILP alone is not enough to create success. It is likely the framework must
be paired with evidence-based programs to create positive results (Ratcliffe 2002).
This is not unlike other disciplines, such as health (Chinman et al. 2004) and education
(Sailor et al. 2008) in which evidence-based frameworks or models are built to include
evidence-based programs that can vary over time and across locations.
Connecting individual-focused predictions to practice
ILP is a framework for applying interventions in which specific interventions must be
developed separately. There are a number of interventions that can be directed at
individual-focused predictions of gun crime because intervening with high-risk individuals is not a new concept. There is research evidence that targeting individuals who
are the most criminally active can result in significant reductions in crime (Braga and
Weisburd 2012; Gendreau et al. 1996; Lipsey 1999; Loeber and Farrington 1998;
Martin and Sherman 1986; Sherman et al. 1997). Additionally, successful programs
that work with these high-risk offenders have also demonstrated cost-effectiveness
(Caldwell et al. 2006; Foster and Jones 2006). For example, Martin and Sherman
(1986) found that Washington, DC’s selective apprehension program did arrest repeat
offenders more frequently than they would have been arrested otherwise, but the study
did not examine whether the increased arrests reduced crime.
Conversely, some research shows that interventions targeting individuals can
sometimes backfire (McCord 2003; Sherman 1992; Welsh and Rocque 2014). As
an example, some previous proactive interventions, including increased arrest of
individuals perceived to be at high risk (selective apprehension) and longer
incarceration periods (selective incapacitation), have led to negative social and
economic unintended consequences. Auerhahn (1999) found that a selective
incapacitation model (Greenwood and Abrahamse 1982) generated a large number
of persons falsely predicted to be high-risk offenders, although it did reasonably
well at identifying those who were low risk. At the extreme, “Three Strikes and
You’re Out” laws were intended to incapacitate indefinitely those labeled as
lifetime high-rate offenders (“predators”). Gottfredson and Hirschi (1986) found
that such lifetime offender groups do not exist (i.e., propensity to engage in crime
drops dramatically with age regardless of risk factors present earlier), and thus
their incapacitation was unlikely to have prevented crime. Indeed, one study found
three strikes was positively associated with homicides and had no statistically
significant impact on crime rates (Kovandzic et al. 2004).

Models engaging small groups of criminally active offenders with policing actions
have seen more recent interest. A meta-analysis of focused deterrence (also known as
“pulling levers” models, e.g., Kennedy 1996 and McGarrell et al. 2006) found such
interventions promising at reducing community crime and violence (Braga and
Weisburd 2012). However, those selected for focused deterrence interventions were
identified through manual police and community efforts, not through predictive analytics. And, notably, the studies did not follow the individuals who were targeted, but
instead examined their impact on community crime rates, with the exception of one
evaluation that examined reduction of violence within targeted gang sets, but not
necessarily program participants (Papachristos and Kirk 2015).
Prevention strategies could affect the likelihood of criminal activity through four
potential mechanisms: (1) treatment, which would change the internal motivation to
offend (for high-risk people); (2) specific deterrence, which would change the external
motivation to offend (for high-risk people); (3) incapacitation, which would limit the
ability to offend (for high-risk people); and (4) general deterrence through perceiving
more credible deterrence messaging and changing the social environment (for all
people); see Fig. 1 for an illustration.
Three of the four crime control mechanisms focus directly on the high-risk individual, which according to research, is likely to be a much more effective way to target
scarce resources because a small minority of offenders commit the majority of crimes
(Blumstein 1986). Treatment works through changing the internal motivations for
committing a crime and providing high-risk individuals with the skills they need to
succeed (Gendreau et al. 1996; Sherman et al. 1997). Deterrence works through three
factors associated with punishment: certainty, swiftness, and severity (Becker 1993;
Cornish and Clarke 2014). People are assumed to act as if they are trading off the gains
from crime with the costs of punishment and to weigh the present more heavily than the
future (discounting). So when the expected benefit of committing a crime is greater
than the expected costs, people commit crime. The greater the probability the crime
prompts swift, certain, and severe punishment, for example, arrest or citation, the less
likely someone will commit that crime, thus deterring the would-be criminal. The
program may also reduce crime through an incapacitation effect whereby police may
immediately increase incapacitation focusing on serving warrants or on violations of
probation or parole for predicted individuals, which would remove their opportunity to
commit crimes. Removing or reducing some of the criminally active population can
even generate an overall “cooling” effect on others who were not targeted, thus leading
to further reduction in crime by those indirectly impacted by the crime control strategy
(Braga and Weisburd 2012; Kennedy 1996).

Fig. 1 Crime control mechanisms for individual-focused predictive policing program
[Diagram elements: Treatment; Deterrence; Incapacitation; Cooling Effect; High Risk Persons; Reduction in Crimes Committed by High Risk Person; Reduction in Crimes in General]
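The deterrence reasoning above (certainty, swiftness, severity, and discounting of future costs) can be written as a simple decision rule. The Python sketch below is one possible formalization, not a model used in this study: the exponential discount factor, the units, and every parameter value are hypothetical.

```python
def expected_punishment_cost(certainty, severity, delay_years, discount=0.9):
    """Present value of the expected punishment.

    certainty   -- probability that the crime is punished (0 to 1)
    severity    -- cost of the punishment if it occurs (arbitrary units)
    delay_years -- time until punishment; swifter means smaller delay
    discount    -- hypothetical yearly discount factor (present weighted more)
    """
    return certainty * severity * discount ** delay_years


def would_offend(benefit, certainty, severity, delay_years, discount=0.9):
    """True when the expected benefit exceeds the discounted expected cost."""
    cost = expected_punishment_cost(certainty, severity, delay_years, discount)
    return benefit > cost


# Low certainty and delayed punishment: the expected cost is small.
print(would_offend(benefit=10, certainty=0.1, severity=50, delay_years=1))
# Higher certainty and immediate punishment: the expected cost dominates.
print(would_offend(benefit=10, certainty=0.5, severity=50, delay_years=0))
```

In this formulation, raising certainty or severity, or shortening the delay, each increases the expected cost and so makes offending less attractive, which mirrors the three factors named in the text.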

The Chicago SSL predictive policing program
Overview
This study investigates the predictive policing pilot program developed in collaboration
between the Chicago Police Department (CPD) and the Illinois Institute of Technology
(IIT), funded by the National Institute of Justice. 1 The aim of the pilot was to more
efficiently and effectively target limited resources towards subjects in a community at
risk for participation in gun violence, either as victims or perpetrators. IIT was
responsible for the prediction model and used data contained within the CPD data
warehouse to identify prior arrestees at a heightened risk for homicide (i.e., developed
automated intelligence). The CPD led the prevention strategy, acting on the computer-generated intelligence.
The strategy included all five stages of ILP: (1) acquiring information, (2) analyzing intelligence/information, (3) reviewing and prioritizing, (4) acting on intelligence and tasking responsible parties with the plan, and (5) evaluating the impact of that action.
CPD worked on step 1 by upgrading their IT infrastructure to allow them
to automate the data collection from their data warehouse (Cope 2004). IIT worked on
step 2, using the data to estimate an empirical model to generate data-driven intelligence. CPD worked on steps 3 and 4, with the Department of Operations first
reviewing the list produced by IIT, and then tasking the district commanders to act on
the intelligence. While IIT continues to make model improvements and is currently on
their fourth version (Lewin and Wernick 2015), this study evaluates implementation of
model version 1.0.
CPD prediction model (interpreting)
The prediction model uses social network links (in the form of co-arrests) to previous
homicide victims to predict the likelihood of someone becoming a victim of a homicide, thereby essentially automating the intelligence production in the ILP
strategy. The model focuses heavily on the recent body of literature examining correlations between victimization and the social connections to others who were victims of
homicide (Papachristos 2009; Papachristos et al. 2011, 2012). The first model specification, version 1.0, estimates the relative risk of a subject being a homicide victim
based on two input variables—the number of first-degree co-arrest links and the
number of second-degree co-arrest links with previous homicide victims. A “first-degree link” refers to a relationship between a subject and an individual with whom the
subject was previously co-arrested who later became a homicide victim. A “second-degree link” refers to a relationship in which a subject was co-arrested with another
person who, in turn, was co-arrested with a later homicide victim (see Fig. 2).
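The two link definitions can be illustrated with a small graph computation. The Python sketch below is not the IIT implementation: the data layout, the function names, and the decision to count distinct victims (rather than paths) are assumptions made for illustration, and the timing requirement that the person became a victim after the co-arrest is ignored for brevity.

```python
from collections import defaultdict


def build_graph(co_arrests):
    """Undirected co-arrest graph from (person_a, person_b) pairs."""
    graph = defaultdict(set)
    for a, b in co_arrests:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def count_links(graph, subject, victims):
    """Return (first_degree, second_degree) link counts for one subject.

    first_degree  -- victims the subject was directly co-arrested with
    second_degree -- other victims reached through one co-arrested intermediary
    """
    neighbors = graph.get(subject, set())
    first = {p for p in neighbors if p in victims}
    second = set()
    for intermediary in neighbors:
        for p in graph.get(intermediary, set()):
            if p in victims and p != subject and p not in first:
                second.add(p)
    return len(first), len(second)


# Toy data: A was arrested with V1 and with B; B was arrested with V2.
co_arrests = [("A", "V1"), ("A", "B"), ("B", "V2"), ("C", "V2")]
victims = {"V1", "V2"}
graph = build_graph(co_arrests)
print(count_links(graph, "A", victims))
```

In the toy data, subject A has one first-degree link (to V1) and one second-degree link (to V2, through B).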
1 This is only one component of the larger long-term collaboration, which broadly explores whether and how
crime can be predicted.

Fig. 2 First- and second-degree co-arrest links

The counts of co-arrest links were summed over the past 5 years and weighted for
recency. A quadratic model was fitted to the data, meaning the probability of being a
homicide victim increased at an increasing rate with respect to the count of links. Early
model tests found that for some subjects (those with many links), the odds of being a
homicide victim were thousands of times the risk of the general public over a 2-year
follow-up window. To address the numerical instability 2 reflected by these results, the
maximum assigned risk multiplier was capped at “500+”. This study does not evaluate the
validity or reliability of the model, but rather focuses on the impact of the predictions
being used in practice.
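The general shape of such a score (recency-weighted counts, a quadratic response, and a cap) can be sketched as follows. The actual weights and coefficients of model version 1.0 are not reported here, so the linear recency decay and all four coefficients in this Python sketch are hypothetical placeholders; only the 5-year window and the cap of 500 come from the text.

```python
RISK_CAP = 500.0  # the text reports a maximum assigned multiplier of "500+"


def recency_weight(years_ago, window_years=5.0):
    """Hypothetical linear decay: 1.0 for a link formed now, 0.0 outside the window."""
    if years_ago < 0 or years_ago >= window_years:
        return 0.0
    return 1.0 - years_ago / window_years


def weighted_count(link_ages_years):
    """Sum of recency weights over a subject's links."""
    return sum(recency_weight(age) for age in link_ages_years)


def risk_multiplier(first_degree, second_degree,
                    b1=5.0, c1=10.0, b2=1.0, c2=2.0):
    """Quadratic in each weighted count; every coefficient is invented for illustration."""
    raw = (1.0
           + b1 * first_degree + c1 * first_degree ** 2
           + b2 * second_degree + c2 * second_degree ** 2)
    return min(raw, RISK_CAP)


# Links formed 0 and 2.5 years ago count; one formed 6 years ago does not.
print(weighted_count([0.0, 2.5, 6.0]))
# With positive quadratic terms, each extra link adds more than the last.
print([risk_multiplier(n, 0) for n in range(4)])
# Large counts hit the cap.
print(risk_multiplier(10, 10))
```

The sketch shows why a cap is needed: a convex curve fitted to small counts extrapolates to very large multipliers for the few subjects with many links.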
CPD’s prevention strategy (influencing and impacting)
The list of subjects and their risk scores were then forwarded to the CPD
Deployment Operations Center (DOC). The DOC did a subjective review of the
list, assigning subjects from different police districts to the SSL based on their
scores and human intelligence collected by CPD. Each district was to be
assigned a list of the 20 people in their district with the top risk scores in
addition to all subjects with risk scores of “500+”. A total of 426 individuals, drawn
from among those at highest risk, 3 were put onto the SSL version 1.0 on March 26,
2013.
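The stated assignment rule (the top 20 scores in each district plus every subject at the cap) can be expressed as a short selection routine. The Python sketch below covers only that mechanical rule; it does not model the DOC's subjective review, and the record layout (`id`, `district`, `score`) is a hypothetical one chosen for illustration.

```python
from collections import defaultdict

CAP = 500.0  # scores reported as "500+" are represented here by the cap value


def select_candidates(subjects, per_district=20, cap=CAP):
    """Top `per_district` scores in each district, plus everyone at the cap."""
    by_district = defaultdict(list)
    for s in subjects:
        by_district[s["district"]].append(s)

    selected = []
    for district in sorted(by_district):
        ranked = sorted(by_district[district],
                        key=lambda s: s["score"], reverse=True)
        top = ranked[:per_district]
        # Capped subjects who did not already make the top group.
        at_cap = [s for s in ranked[per_district:] if s["score"] >= cap]
        selected.extend(top + at_cap)
    return selected


# Toy example with a top-2 rule instead of top-20 to keep the data small.
subjects = [
    {"id": "a", "district": 1, "score": 500.0},
    {"id": "b", "district": 1, "score": 500.0},
    {"id": "c", "district": 1, "score": 500.0},
    {"id": "d", "district": 1, "score": 10.0},
    {"id": "e", "district": 2, "score": 5.0},
    {"id": "f", "district": 2, "score": 3.0},
    {"id": "g", "district": 2, "score": 1.0},
]
chosen = select_candidates(subjects, per_district=2)
print(sorted(s["id"] for s in chosen))
```

Because capped subjects are added on top of each district's quota, list sizes can differ across districts, which is consistent with a total that is not a multiple of 20.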
The individuals on the SSL were considered to be “persons of interest” to the CPD.
District commanders were responsible for directing their personnel to act on this
intelligence and were to be held accountable at regular COMPSTAT meetings. Commanders were not given specific guidance on what treatments to apply to their SSL
members; instead, they were expected to tailor interventions appropriately. The main
guidance provided by central leadership was to use the programs detailed in Chicago’s
Gang Violence Reduction Strategy (Chicago Police Department 2014) on SSL members when
possible, 4 but commanders were left wide discretion as to what actions their units
should take.

2 Here, “numerical instability” refers to the likelihood that an estimate that an SSL member is
thousands of times more likely to be killed is due more to statistical artifacts from fitting the
quadratic curve than to an accurate estimate of risk.
3 Initially, CPD said they would put all the highest-risk individuals on the list; however, they decided to vet the
list through their Deployment Operations Center (DOC), which made some changes to who would appear on the
list, and therefore, the 426 individuals did not represent the highest scoring individuals based on the model.
4 Details of GVRS are available at http://directives.chicagopolice.org/directives/data/a7a57bf0-136d1d3116513-6d1d-382b311ddf65fd3a.html

Since the prevention strategy was decentralized to district commanders, we collected
qualitative evidence to identify the prevention and intervention strategies used in the
field. Specifically, from October 2013 to June 2014, the research team conducted
interviews with district commanders (or their executive officers) and observed
COMPSTAT meetings. A total of 43 semi-structured interviews were conducted across
all 22 police districts. Commanders or executive officers from each district were
interviewed at least once and a maximum of three times over the course of 9 months.
Interviews lasted between 10 and 30 minutes and consisted of questions regarding the
quality of the SSL, uses of the SSL in practice, and suggestions to improve the SSL.
The research team also observed 48 COMPSTAT presentations by both area and
individual district commanders across 17 COMPSTAT meetings. Structured data
collection protocols were completed for both the interviews and the COMPSTAT
observations, which were then entered into a spreadsheet. The detailed notes were then
coded and content analyzed.
The COMPSTAT observations indicate that most districts did not focus on
intervening with SSL subjects. In over two-thirds (68.8 %) of presentations
observed, there was no mention of the SSL. In another 12.5 % of presentations,
there was an acknowledgment of the SSL, with no further discussion. In less than
one in five (18.7 %) presentations, there was both a discussion and executive
guidance, which consisted of: (1) allow beat officers to take the lead in contacting
SSL subjects, (2) consider using fugitive location and district intelligence teams to
locate SSL subjects, and/or (3) change the focus from arresting SSL subjects for
minor offenses (for which they would be immediately released) to finding ways to
detain SSL subjects over the long term. There was no evidence of executive
follow-up on these recommendations at the meetings.
Interview findings indicate two common themes for how commanders recommended addressing SSL subjects to beat officers during the pilot. First, officers (or
teams of officers) were assigned to make contact with SSL subjects on varying
schedules, usually by going to their home addresses or other locations (7 of 22
districts, or 31.8 %). Second, officers were provided information about the identities of the SSL subjects, and the officers were to make contact with the SSL
subjects “if noticed”, especially if the subjects were acting suspiciously (10
districts, 45.5 %). The remaining cluster of districts (5 districts, 22.7 %) reported
some combination of both approaches. Interviews indicated that directing officers
to increase contact with SSL subjects was likely the extent of the preventive intervention
strategy for the majority of districts. In interviews with CPD staff, it was noted
that the central command encouraged the local commanders to take advantage of
enhanced legal sanctions authorized through the Gang Violence Reduction Strategy (Chicago Police Department 2014), but officers reported using these programs
for a very small subset of SSL subjects.
In sum, findings from the observations of COMPSTAT meetings and interviews
with district-level administration suggest the topic of SSL subjects received relatively
little attention. Overall, the observations and interview respondents indicate there was
no practical direction about what to do with individuals on the SSL, little executive or
administrative attention paid to the pilot, and little to no follow-up with district
commanders. These findings led the research team to question whether this should be
considered a prevention strategy.

Methods
This study uses an intent-to-treat analysis to estimate the impact of the CPD SSL
predictive policing program version 1.0 on city-wide homicide levels and on incidence
rates of individual-level involvement with gun crime. Outcomes of the pilot are
analyzed using two quasi-experimental methods. First, we perform an interrupted
time-series analysis of city-level data to determine whether the introduction of the
SSL changed the aggregate homicide crime trend. Second, we conduct propensity score
analysis using individual-level data in order to determine whether being on the SSL
affected the likelihood of being involved with gun violence. To do so, we exploit the
fact that some of the subjects on the SSL identified as being at highest risk were not
treated (i.e., their score met the criteria for inclusion on the SSL but they were excluded
for reasons we describe later). Last, we test hypotheses regarding the mechanisms
driving the effects.
Data
For the city-level analysis, this study used publicly available data on CPD’s website
(gis.chicagopolice.org). The outcome was monthly homicides from January 2004
through April 2014. Examining a 10.75-year period, there was an average of 38
monthly homicides (SD = 10.65). Homicide counts followed a distinct seasonal pattern
and a general linear trend downward over the entire period.
For the individual-level analysis, we used two datasets provided by CPD that cannot be
made public due to the sensitive nature of the data (for aggregated descriptive statistics, see
Table 1). The first dataset was a person-level file of 873,281 individuals with arrest
histories prior to March 2013, including data on (1) demographics (gender, age at most
proximate arrest, race), (2) arrest history (number and type), (3) social network variables
(number of first- and second-degree co-arrestees who were victims of homicide), and (4)
the risk score generated by IIT. This file also contained four outcomes of interest for the 1-year follow-up period: (1) murder victim, (2) shooting victim, (3) arrest for murder, and (4)
arrest for a shooting. The second dataset contained all recorded contact between law
enforcement and the 17,754 arrestees with at least one first- or second-degree association
with a homicide victim, from 1980 through the end of the observation window. This
file contained 542,636 records, with 464,006 dating before the intervention period.
Analysis
City-level outcomes: interrupted time-series design (ARIMA)
This paper uses a time-series analysis to better understand whether the introduction of the
SSL version 1.0 affected the homicide trend in Chicago. Specifically, we tested whether it
led to a reduction of gun violence at an aggregate level. The ARIMA models were based
on two components: (1) a model of homicides as a time series, which makes inferences
about the underlying process in the dependent outcome using different time-series
variational components, and (2) an impact assessment of the introduction of the SSL
version 1.0 (McCleary et al. 1980). The first step was to identify the model for the time
series, which characterizes autoregressive, non-seasonal differences, and lagged forecast
errors, and the second step was to model in intervention effects (Bruinsma and Weisburd
2014). Once the appropriate model was selected, we also conducted a series of sensitivity
analyses to test if there were other breaks in the time series not related to SSL.5

Table 1 Sample descriptive statistics and balance table
                        SSLs (n = 426)      Unweighted comp group   Weighted comparison group
                                            (n = 17,754)            (ESS = 273)
Variables               Mean or %   SD      Mean or %   SD          Mean or %   SD        Std. effect size   KS statistic   p
Demographics
  Male                  95.8 %              90.4 %                  96.9 %                −0.06              .01            .29
  African-American      77 %                77.5 %                  76.9 %                0.00               0.00           1
  Age at last arrest    22.57       4.76    26.23       9.02        22.91       5.05      −0.70              .05            .74
Prior arrests
  Murder                .05         .25     .03         .19         .06         .24       −0.15              .01            1
  Sexual assault        .02         .15     .02         .16         .02         .13       0.02               0              1
  Robbery               .49         .91     .37         .77         .53         .88       −.04               .03            1
  Aggravated battery    .25         .57     .17         .45         .32         .65       −.12               .04            1
  Burglary              .35         .81     .31         .81         .48         .97       −.17               .06            .51
  Theft                 .73         2.08    .77         2.81        .87         2.19      −.07               .02            1
  MVT                   .77         1.24    .61         1.16        .75         1.18      .02                .02            1
  Arrest                3.76        4.62    3.60        4.51        3.98        4.82      −.04               .03            1
  Outstanding warrant   1.21        1.73    .88         1.47        1.18        1.75      .02                .04            .88
Prior contact with CPD
  All arrests           18.87       12.19   11.28       9.51        18.13       12.74     .07                .06            1.00
  Contact cards         40.83       84.12   12.18       20.18       34.46       58.55     .08                .07            .40
  Victimizations        .85         1.18    0.82        2.13        .95         1.56      −.08               .03            .31
Risk
  # 1st Degree          1.16        1.08    .24         .46         1.04        .92       .12                .07            .42
  # 2nd Degree          7.28        5.70    1.46        1.40        6.25        5.31      .18                .12            .01
  Risk score            268.58      195.15  30.48       34.61       208.62      177.14    .31                .16            <.01

This study uses the ARIMAX R procedure (Ohri 2013) to test the impact of the SSL
on overall city-level homicides in Chicago. ARIMAX identified the best fitting model
of the entire time trend as ARIMA (0,0,0)(2,0,1), meaning the homicide series does not
display non-seasonal variation, yet it is characterized by the following seasonal
processes: two seasonal autoregressive terms, no differences across seasons, and one
seasonal moving average term.

5 It is important to note that Chicago has gone through transformative changes over this time period, including
a new Superintendent in 2011 and the integration of COMPSTAT to provide oversight. In addition to changes
in leadership and management style, CPD has implemented a large number of homicide reduction strategies
during this time period, including the multiple changes to the Gang Violence Reduction Strategy, and gang
call-ins across different districts starting in 2010.

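Written out, the selected specification with an intervention term might look like the following. This is a sketch under assumptions the text does not state: a seasonal period of 12 months (the data are monthly), a step intervention coded 1 from April 2013 onward, and one common sign convention for the moving average term. Here $y_t$ is the monthly homicide count, $B$ is the backshift operator, and $\omega$ is the intervention coefficient.

```latex
y_t = \mu + \omega I_t + N_t, \qquad
\left(1 - \Phi_1 B^{12} - \Phi_2 B^{24}\right) N_t = \left(1 + \Theta_1 B^{12}\right) \varepsilon_t,
\qquad
I_t =
\begin{cases}
0, & t < \text{April 2013} \\
1, & t \ge \text{April 2013}
\end{cases}
```

With no non-seasonal terms, the noise component $N_t$ depends only on values 12 and 24 months earlier and on the forecast error 12 months earlier.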
Individual level outcomes: propensity score matching design
Chicago PD implemented the program citywide and was not willing to randomize high-risk subjects to the SSL, so an experimental setting for evaluating program effectiveness
at the individual level was not achieved. CPD originally stated that they would treat the
top 20 individuals in each district, and everyone who scored above a certain threshold
(500+), which would lend itself to statistical approaches for quasi-experimental settings
such as a regression discontinuity design or instrumental variable approach. However,
there was, in fact, quite a substantial overlap in risk scores between the SSL and a group of
individuals that did not end up on the list, who became our pool of non-treated potential
comparison cases (see Fig. 3). This overlap of treated (i.e., on the SSL) and untreated (i.e.,
not on the SSL) happened for two reasons: (1) some districts did not have a large number
of the highest-risk people in their area of operation, so individuals with slightly lower
scores appeared on their list; and (2) the DOC had some discretion on who to put on the
SSL, particularly when there were a lot of very high-risk individuals, so not all of the
highest scoring individuals were ultimately placed on the SSL. This allowed us to match
risk scores, along with other observed features that research indicates are associated with
predicting future criminal offending, in order to generate a weighted comparison group
that was almost statistically indistinguishable from the SSL on all observable measures.6
Individuals were selected into treatment by their risk scores and, therefore, they
differed systematically from eligible comparison cases. In order to control for these
differences, case weights, $w_i$, were estimated to alter the covariate distribution (including the risk scores) for the comparison group. This results in equivalent comparison and
treatment groups (to create treatment effects on the treated estimates, as opposed to
average treatment effects). These weights were created using boosted regression to
provide the conditional odds of receiving treatment, where $x_i$ is the vector of control
variables and $p(x_i)$ is the estimated conditional probability of receiving treatment for an
individual with control variables equal to $x_i$, also known as the propensity score
(McCaffrey et al. 2004; Rosenbaum and Rubin 1983):
$$w_i = \frac{p(x_i)}{1 - p(x_i)}.$$

6 While there is always the possibility that the groups are different on unobservable variables, we have
captured many of the important research-validated criminogenic factors. That is why we specify the approach
reduces, rather than eliminates, bias.

Fig. 3 Overlap of risk scores between SSLs and comparison group (histogram; x-axis: risk score; y-axis: number of people; series: Comp, SSL)

Using the weighting scores generated in the TWANG R package (Ridgeway et al.
2014), the following equation reduces the bias of estimates of the effective average
treatment effect on the treated (EATT):
$$\mathrm{EATT} = \frac{\sum_{i=1}^{n} t_i y_i}{\sum_{i=1}^{n} t_i} - \frac{\sum_{i=1}^{n} y_i w_i (1 - t_i)}{\sum_{i=1}^{n} w_i (1 - t_i)},$$
where y is the outcome variable and t is the treatment indicator, for all individuals i.
After applying the propensity score weights, the comparison group was reduced from
17,754 unweighted cases to an effective sample size of 273, and the only significantly
different pre-treatment covariates were the number of second-degree associates and the
risk scores. Since these two variables were still unbalanced (e.g., the treatment group
still had higher scores than the weighted comparison), we used these variables as
covariates in all weighted regressions and estimated doubly-robust models (Funk
et al. 2011; Huber 1973). Doubly-robust models add remaining unbalanced variables
as predictors to control for any residual between-group differences to create the closest
statistical match between treatment and control groups as possible. According to
scholars, this is a valid way of controlling for unbalanced and missing variables for
causal modeling with propensity score matching (Kang and Schafer 2007).
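The weighting and estimation steps described above can be sketched in a few lines. This is a minimal illustration, not the TWANG implementation: the propensity scores are taken as given (the paper estimates them with boosted regression), all function names and the toy data are hypothetical, and two conventions are assumed rather than taken from the text, namely the Kish approximation for the effective sample size, (Σw)²/Σw², and a standardized effect size defined as the difference in means divided by the treated-group standard deviation.

```python
def odds_weight(p):
    """Case weight w_i = p(x_i) / (1 - p(x_i)) for a comparison case."""
    if not 0.0 <= p < 1.0:
        raise ValueError("propensity score must lie in [0, 1)")
    return p / (1.0 - p)


def eatt(y, t, w):
    """Treated mean minus weighted comparison mean (the EATT equation).

    y: outcomes, t: treatment indicators (1 = on the SSL), w: case weights.
    Weights of treated cases drop out because of the (1 - t_i) factor.
    """
    treated_mean = sum(ti * yi for yi, ti in zip(y, t)) / sum(t)
    comp_num = sum(yi * wi * (1 - ti) for yi, ti, wi in zip(y, t, w))
    comp_den = sum(wi * (1 - ti) for ti, wi in zip(t, w))
    return treated_mean - comp_num / comp_den


def effective_sample_size(weights):
    """Kish approximation (an assumption here): (sum w)^2 / sum w^2."""
    total = sum(weights)
    return total * total / sum(wi * wi for wi in weights)


def standardized_effect_size(mean_treated, mean_comparison, sd_treated):
    """Difference in means scaled by the treated-group SD (assumed convention)."""
    return (mean_treated - mean_comparison) / sd_treated


# Toy data, made up for illustration: 2 treated cases, 3 comparison cases.
y = [1, 0, 1, 0, 0]
t = [1, 1, 0, 0, 0]
p = [0.8, 0.6, 0.5, 0.2, 0.2]          # hypothetical propensity scores
w = [odds_weight(pi) for pi in p]

toy_eatt = eatt(y, t, w)
toy_ess = effective_sample_size([wi for wi, ti in zip(w, t) if ti == 0])

# Risk score row of Table 1: SSL mean, weighted comparison mean, SSL SD.
risk_score_es = standardized_effect_size(268.58, 208.62, 195.15)
```

Under the assumed convention, the risk score row gives a value of about 0.31, which agrees with the standardized effect size reported for that row in Table 1.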
Estimates of the impact of the predictive policing strategy could be biased if there
were other programs targeting our control and/or treatment groups. While there were
many violence reduction initiatives taking place in Chicago, including call-ins, almost
none of the SSL subjects were included. An exception was one district (out of 22),
which was a pilot area of the program that issued focused deterrence notification letters
to SSL subjects. The commander (or her designee), along with a representative from a
non-profit that coordinates services to ex-offenders, visited the SSL subjects or their
families to let the subject know that he or she was identified as being at heightened risk
for homicide victimization. This was the only place where a formal “treatment” was
offered as a way to prevent gun crime, but we had no way to track if any SSL subject
received services. As only one district participated, less than 5 % of the SSLs were subject
to this intervention, so we would not expect this to be driving our results.
Mediation analysis
In an effort to better understand what may be driving an association between inclusion
on the SSL and our individual-level outcomes (involvement in gun violence), we
conducted a mediation analysis. The mediation analysis allows us to explain the
mechanisms that underlie an observed relationship between the SSL and involvement
in the commission of a gun crime via the contacts with police. That is, we hypothesize
the program was designed to enhance the deterrence message (through the probability
of getting caught) not only directly but also by delivering prevention strategies that
could affect decision-making that would lead to a weapons offense (as either an
offender or a victim). It may also have an incapacitation effect by removing the SSLs
from the community at a higher rate. In this analysis, we try to tease out whether the
individual-level effects of being on the SSL are based on deterrence or incapacitation.
Since the prevention strategy most consistently described for every district of
Chicago was contact with SSL members, we considered contact with police as a
mediator variable by first testing whether those on the SSL experienced greater contact
with police than the comparison group. We then conducted a mediation analysis in
which the likelihood of involvement with a gun crime as either a victim or offender
depends on being on the SSL (direct effect) and on the extent of contact with police,
which is a function of being on the SSL (mediation effect).

Results
Impact
City-level homicide rates
Examining the raw data, it is clear that homicide trends display a high degree of
seasonality, with more homicides occurring in the warm months.7 There is a negative
linear trend across the series (b = −0.0267), shown as the dotted line in Fig. 4, with
overall monthly homicides decreasing from January 2004 through September 2014.
Visually, it appears the homicide trend was falling prior to the introduction of the SSL
in April 2013; the SSL version 1.0 was released to the district commanders at the end of
March. The ARIMA analysis was conducted to statistically test whether the introduction of the SSL affected the monthly homicide levels in Chicago.
When we entered April 2013, the month the SSL was introduced, as the intervention in the ARIMA (0,0,0)(2,0,1)
model, we found that there was a decrease in monthly homicides of 3.90 (95 % CI:
0.22, 7.58). However, sensitivity analyses examining decreases in homicide that pre- and
post-date the SSL program show that this decrease is likely to be part of an overall trend
downward, and not specifically due to the SSL intervention. Specifically, we conducted a
set of “placebo tests”, where we modeled in a fake intervention date and tested whether
there was still an effect. The analysis indicated a statistically significant reduction in monthly
homicides when using 7 months pre- and post-SSL introduction. This demonstrates there
was a statistically significant reduction in monthly homicides each month above and beyond
the longer-term linear trend prior to the intervention (Table 2). With this sensitivity analysis
in mind, we conclude that the statistically significant reduction in monthly homicides
predated the introduction of the SSL, and that the SSL did not cause further reduction in
the average number of monthly homicides above and beyond the pre-existing trend.
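The reported interval for the April 2013 estimate is consistent with a normal-approximation interval built from the coefficient and standard error in Table 2. The 1.96 multiplier is an assumption, since the text does not state how the interval was computed, and the interval is expressed for the size of the decrease, so signs are flipped relative to the coefficient of −3.90.

```latex
3.90 \pm 1.96 \times 1.88 = 3.90 \pm 3.68 \;\Longrightarrow\; (0.22,\ 7.58)
```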

Fig. 4 Monthly homicides in Chicago from January 2004 to September 2014 (line chart; x-axis: month, 1/2004 to 9/2014; y-axis: 0 to 70; annotation: introduction of SSL)

7 With the exception of the winter of 2012, which did not experience the same degree of a “cooling off” period.

Impact on individual risk of homicide
Between March 2013 and March 2014, there were 405 homicides in Chicago. Seventy-nine percent of the homicide victims had a criminal history in the years prior to their
deaths, 16 % had at least one association with a homicide victim, and 1 % of them were
listed on the SSL (see Fig. 5). Looking at this from another perspective, 0.7 % of the
426 SSL subjects were homicide victims, 0.4 % of the 17,754 associates were homicide
victims, 0.029 % of the 855,527 former arrestees with no associates were homicide
victims, and 0.003 % of the rest of the almost 2 million Chicago residents without any
criminal record were victims of homicide.
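The group-level rates quoted above can be recomputed from the victim counts shown in Fig. 5 and the group sizes given in the text. The sketch below is illustrative only: the variable names are hypothetical, and pairing the count of 253 with former arrestees who had no associates is inferred from the 0.029 % rate rather than stated directly.

```python
# (homicide victims, group size); counts from Fig. 5, sizes from the text.
groups = {
    "SSL": (3, 426),
    "1+ associates": (64, 17_754),
    "arrestees, no associates": (253, 855_527),  # pairing inferred, see lead-in
}


def rate_pct(victims, size):
    """Victimization rate expressed as a percentage of the group."""
    return 100.0 * victims / size


rates = {name: rate_pct(v, n) for name, (v, n) in groups.items()}

# How much higher the SSL rate is than the rate among arrestees with associates.
ssl_vs_associates = rates["SSL"] / rates["1+ associates"]

for name, pct in rates.items():
    print(f"{name}: {pct:.3f} %")
print(f"SSL vs. 1+ associates: {ssl_vs_associates:.2f}x")
```

The unadjusted SSL rate is roughly twice the rate for arrestees with at least one association, which is the raw gap that the propensity score weighting is meant to account for.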
A greater proportion of individuals on the SSL were involved in a shooting as either
a victim or an arrestee, 6.8 % (n = 29), than comparable former arrestees who were not
placed on the list but had at least one first- or second-degree association with a
homicide victim (3.2 %, n = 529), and 0.2 % of former arrestees with no linkages to
prior homicide victims (n = 1,939). However, once other demographics, criminal history variables, and social network risk have been controlled for using propensity score
weighting and doubly-robust regression modeling, being on the SSL did not significantly reduce the likelihood of being a murder or shooting victim, or being arrested for
murder. Results indicate those placed on the SSL were 2.88 times more likely to be
arrested for a shooting (Table 3).
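The “2.88 times more likely” figure is the exponentiated regression coefficient reported in the Exp(b) column of Table 3. A short check, assuming the estimates are on the log-odds scale (which the Exp(b) column suggests but the text does not state explicitly):

```python
import math

# Coefficients as reported in the Estimate column of Table 3.
estimates = {
    "Shooting victim": -0.221,
    "Shooting arrest": 1.06,
    "Murder victim": 0.037,
    "Murder arrest": 0.453,
    "Any weapon outcome": 0.168,
}

# Exponentiating a log-odds coefficient gives the corresponding ratio.
odds_ratios = {outcome: math.exp(b) for outcome, b in estimates.items()}

for outcome, ratio in odds_ratios.items():
    print(f"{outcome}: exp(b) = {ratio:.3f}")
```

Values below 1 (shooting victim) point to lower odds for SSL subjects and values above 1 to higher odds; only the shooting arrest estimate is statistically significant in Table 3.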

Table 2 Placebo sensitivity analyses: all homicides
Intervention month   Intervention coefficient   SE
Oct 2012             −4.23*                     2.15
Nov 2012             −4.13*                     2.18
Dec 2012             −4.52**                    1.39
Jan 2013             −4.46**                    1.68
Feb 2013             −6.33**                    1.89
March 2013           −5.54**                    1.93
April 2013           −3.90**                    1.88
May 2013             −3.00                      2.36
*p < 0.05, **p < 0.01

Fig. 5 Homicides in Chicago from March 2013 to March 2014 by risk group (No CJ Record: N = 85, 21 %; No Associates: N = 253, 62 %; 1+ Associates: N = 64, 16 %; SSL: N = 3, 1 %)

Mediation analysis
Seventy-seven percent of the SSL subjects had at least one contact card over the year
following the intervention, with a mean of 8.6 contact cards, and 60 % were arrested at
some point, with a mean of 1.53 arrests. In fact, almost 90 % had some sort of interaction
with the Chicago PD (mean = 10.72 interactions) during the year-long observation window. This increased surveillance does appear to be caused by being placed on the SSL.
Individuals on SSL were 50 % more likely to have at least one contact card and 39 % more
likely to have any interaction (including arrests, contact cards, victimizations, court
appearances, etc.) with the Chicago PD than their matched comparisons in the year
following the intervention. There was no statistically significant difference in their
probability of being arrested or incapacitated8 (see Table 4). One possibility for this result,
however, is that, given the emphasis by commanders to make contact with this group,
these differences are due to increased reporting of contact cards for SSL subjects.
Results of the impact analysis show the only statistically significant difference
between the comparison and SSL group was an increase in arrest for shooting.
Therefore, we analyzed whether the impact of SSL membership on being arrested for
a shooting was mediated by additional contact. Results indicate that the total relationship between treatment and the outcome is 1.18, and most of that comes from the direct
effect—not mediation (Fig. 6). In other words, the additional contact with police did
not result in an increased likelihood for arrests for shooting, that is, the list was not a
catalyst for arresting people for shootings. Rather, individuals on the list were people
more likely to be arrested for a shooting regardless of the increased contact.
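Under the common product-of-coefficients reading of a mediation model, the indirect effect is the product of the two paths that run through the mediator. The calculation below assumes, as a reading of Fig. 6 that the text does not confirm, that 0.41 is the path from SSL placement to police contact and 0.024 is the path from police contact to arrest for a shooting; the direct path of 1.06 matches the shooting arrest estimate in Table 3.

```latex
\text{indirect effect} = a \times b = 0.41 \times 0.024 \approx 0.0098,
\qquad
\text{direct effect} = c' = 1.06
```

On this reading the mediated path is about two orders of magnitude smaller than the direct path, in line with the statement that most of the relationship comes from the direct effect.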

Discussion
Several important findings have emerged from the current pilot. First, while using
arrestee social networks improved the identification of future homicide victims, the
number was still too low in the pilot to make a meaningful impact on crime. The pilot
version 1.0 of the model identified less than 1 % of homicide victims (3 out of 405) so
there is certainly room for improvement. Second, the prevention strategy associated
with the predictive strategy was not well developed and only led to increased contact
with a group of people already in relatively frequent contact with police. As such, the
main result of this study is that at-risk individuals were not more or less likely to
become victims of a homicide or shooting as a result of the SSL, and this is further
supported by city-level analysis finding no effect on the city homicide trend.

8 Most arrestees were not incapacitated for any significant period of time, but rather were booked into the
Cook County jail and released within a few hours to a few days.

Table 3 Doubly robust treatment estimates
Outcome              Estimate (SD)   t test            Exp(b)
Shooting victim      −.221 (.397)    −.558, p = .58    .802
Shooting arrest      1.06 (.430)     2.46, p = .01*    2.88
Murder victim        .037 (.669)     .055, p = .96     1.04
Murder arrest        .453 (.462)     .981, p = .33     1.57
Any weapon outcome   .168 (.300)     .559, p = .58     1.18


We do find, however, that SSL subjects were more likely to be arrested for a
shooting. The effect size was rather large, such that those placed on the SSL were
2.88 times more likely than their matched counterparts to be arrested for a shooting,
although this is based on a small absolute number of shootings: only 9 individuals
from the SSL were arrested for a shooting in the year after being placed on the list,
against 5 from the matched control group (or 84 from the unmatched comparison
group). This raises some questions as to whether this is a “positive” or “negative”
finding: did the program lead to an increase in shootings perpetrated by SSL subjects
due to some sort of backfire or harmful effect, similar to one that has been identified in
prior research (McCord 2003; Sherman 1992; Welsh and Rocque 2014), or were they
more likely to be arrested for a shooting they perpetrated because they were under more
intense surveillance and intelligence-gathering activities?
We explore this question by examining the date of the shooting associated with the
arrests, hypothesizing that a backfire effect could not occur before the introduction of
the SSL. Unfortunately, the data do not lead to much clarification on this point because
of missingness—there was a shooting date associated with the arrest in only 56 %
(n = 5) of the SSLs (total n = 9) and 81 % (n = 68) of the unweighted comparison group
(total n = 84). In the SSL group, 80 % of the shootings occurred after the intervention
date compared to the 88 % of the eligible unweighted comparison cases. Therefore,
based on these available data, there does not appear to be a difference in the timing of
the shootings that resulted in arrests during our observation window between SSL
members and the unweighted comparison group (see Table 5).

Table 4 Doubly-robust difference in contact with CPD between SSL subjects and their matched comparisons
Outcome           Estimate (SD)   t test            Exp(b)
Contact cards     .393 (.147)     2.68, p = .007    1.48
Arrest            .195 (.145)     1.34, p = .18     1.22
Any interaction   .332 (.134)     2.48, p = .01     1.39

Fig. 6 Mediation analysis results (path diagram linking SSL 1.0, Police Contact, and Arrested for Shooting; path coefficients shown: 0.41, 0.024, and 1.06)

While missing data make it impossible to confirm if the shootings associated with
arrests occurred before or after treatment for 44 % of the SSL subjects, there are some
reasons to believe the program did not increase shootings. If the existence of the SSL
caused a backfire of shootings that would not have happened otherwise, committed by
a group of people that the Chicago PD were more closely monitoring (and had
significantly more contact with), we would have expected more mentions of the SSL
subjects during interviews and COMPSTAT meetings. We found no evidence such a
surge occurred, either in the media or in our interviews, and interviewees were typically
forthright about problems with the pilot.
Second, there is a large overlap between victims and offenders, and we find a
program impact only on arrest for shootings and on none of the other indicators.
Indeed, the risk of an SSL subject being a victim was lower than that of the control
group, just not to a statistically significant level. Further evidence disputing the backfire
theory is that we find no indirect effect of SSL placement on being
arrested for a shooting through increased police contact. Rather, it appears the list had a direct effect of increasing the
probability of arrest, which did not work through police contact. We offer one explanation of this provided to us by the Chicago PD. They emphasize that the list was used
as an intelligence-gathering source. That is, when there was a shooting, the police
looked at the members of the SSL as possible suspects. This suggests that the impact of
the SSL was on clearing shootings, but not on gun violence in general, during the
observation window. However, a key question is why this did not lead to a reduction in
the perpetration of gun violence.
A further discussion of drivers of the null, city-level result is also warranted. The
effect on the city-wide homicide trend is difficult to detect because homicide was
trending downward before the introduction of the SSL. In the year after the introduction
of the SSL, a greater proportion of the 426 SSL individuals were victims of homicides
(0.7 %) than the 17,754 individuals who were previously known to law enforcement
but were not placed on the list (0.36 %) or the over 2 million Chicago residents
without criminal records (0.003 %). However, controlling for demographics, criminal
history, and co-arrest social network characteristics, those on the SSL were no less
likely to be the victims of a homicide or a shooting than those who were not placed on
the list. Again, it is not clear if this is due to the absence of a defined prevention strategy
or a lack of impact because this sort of approach cannot work (e.g., theory failure vs.
implementation failure).

Table 5 Timing of shooting associated with post-treatment arrest
Group                                      Total arrests   Arrests with valid   # Shootings after   % Shootings after
                                           for shooting    shooting date        intervention        intervention
Treatment (n = 426)                        9               5                    4                   80 %
Unweighted comparison group (n = 17,754)   84              68                   60                  88 %

The prediction model in this study was a first version model based on a simple
calculation of the number of first- and second-degree co-arrestees who were homicide
victims, inspired by prior work by Papachristos and colleagues (Papachristos 2009;
Papachristos et al. 2011, 2012). Although outside the scope of this study, there is clearly
a question about how well the model performed in predicting homicide victimization.
Those placed on the SSL were twice as likely to be victims of a homicide as others with
arrest records, and 233 times more likely to be victims of homicide than the average
Chicago resident. However, even with those increased odds, the individuals on SSL
still only experienced a 0.7 % homicide rate over 12 months, illustrating how difficult it
is to predict low-incidence events. Since the first list development, the algorithm has
been improved, and now, according to IIT, 29 % of the top 400 subjects were
accurately predicted to be involved in gun violence over an 18-month window
(Lewin and Wernick 2015).
In terms of how to operationalize the predictions, the algorithm did not differentiate
between “high-risk” versus “high-threat” individuals, which was found to be problematic in the field. There were no formal differentiations in the SSL predictive model
between persons who were “high threat” in terms of being violent threats to their local
community and persons who were “high risk” of becoming a victim based on their
associations or lifestyle attributes, for example, substance use disorders and gambling,
but who were not violent themselves. Assuming these two groups require different
interventions, group identification would be necessary information for police to devise
targeted prevention strategies. According to IIT and CPD, in the newer iteration of the
SSL, specific risk factors are presented along with the list of individuals to facilitate the
appropriate selection of an intervention.

Conclusions
Although the findings of this study are more relevant for cities, particularly those
experiencing gang-related homicides, the conclusions do offer important, general
insights into the potential implications of predictive policing programs. The pilot effort
does not appear to have been successful in reducing gun violence, although it may have
improved justice by identifying more perpetrators. It does not appear that there were
any unintended crime consequences, such as a violent backfire effect. The SSL
program continues to evolve through improvements in the statistical algorithms of
the prediction model.
In terms of how the SSL program could be improved, it may be that both better
predictions for the likelihood of being involved in gun violence and prevention
strategies are necessary in order for an individual-based predictive policing program
to work in the field. The discrepancy between observed outcomes and predicted risk is
operationally significant, but statistically reasonably small given the difficulty of
predicting a low-probability event. Also, it may be that the absolute quantification of
the risk factor is less important in practice than the relative ranking of the subjects,
which we did not attempt to measure. Since this study, based on version 1.0 of the
model, the researchers who developed the SSL algorithm have progressively improved
the predictive performance of the model, which is currently in version 4 (Lewin and
Wernick 2015).
Regardless of how “well” a model performs, there will always be a concern with
misidentifying people as not going to commit or be a victim of gun violence (false
negatives) and misidentification of people as going to commit or be a victim of gun
violence (false positives). This problem is well researched, and the balance between false
negatives and false positives must be weighed against the purpose the predictions serve
(Aitchison and Dunsmore 1980; Benjamini and Hochberg 1995). Models can be
specified to penalize false negatives or positives more strongly, and it could be that a
criminal justice application would have a different threshold than, for instance, a
medical screening tool (Berk 2011). The problem is actually a fairly standard issue of
modern statistical practice of large-scale multiple testing in the analysis of high-dimensional data. In their study to identify potentially problematic officers, for example, Ridgeway and MacDonald (2009) used a false discovery rate approach and
selected a threshold of 0.50, which implies that the cost of failing to identify a problem
officer equals the cost of flagging a good officer. Analysts and program developers
should consider this trade-off for their purposes. Additionally, the consideration of this
trade-off and thresholds should explicitly take into account the prevention strategy—if
risk scores are used for a benign (and inexpensive) treatment, false positives are less of
a problem; however, once the treatment becomes invasive, detrimental, and expensive,
these false positives can become a huge problem. Conversely, false negatives in this
application can be very costly, as they represent a missed opportunity to prevent
shootings.
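The link between a flagging threshold and the relative cost of the two error types can be made concrete with a small expected-cost calculation. The sketch below is a generic decision rule, not the procedure used by Ridgeway and MacDonald (2009) or by the SSL model; the function names and cost values are hypothetical. Flagging a person costs (1 − p) × cost_false_positive in expectation and not flagging costs p × cost_false_negative, so flagging is the cheaper option when p exceeds cost_false_positive / (cost_false_positive + cost_false_negative).

```python
def flag_threshold(cost_false_positive, cost_false_negative):
    """Predicted probability above which flagging has lower expected cost."""
    if cost_false_positive <= 0 or cost_false_negative <= 0:
        raise ValueError("costs must be positive")
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def should_flag(prob, cost_false_positive, cost_false_negative):
    """True when the expected cost of flagging is below that of not flagging."""
    return prob > flag_threshold(cost_false_positive, cost_false_negative)


# Equal costs reproduce the 0.50 threshold mentioned in the text.
equal_costs = flag_threshold(1.0, 1.0)

# Hypothetical invasive treatment: a false positive costs 4x a false negative,
# so only people with a predicted probability above 0.8 would be flagged.
invasive_treatment = flag_threshold(4.0, 1.0)

# Hypothetical benign treatment: a false negative costs 4x a false positive,
# so the threshold drops to 0.2 and many more people are flagged.
benign_treatment = flag_threshold(1.0, 4.0)
```

The design point is that the threshold is a property of the intervention as much as of the model: the same risk scores imply different lists depending on how costly the treatment is to someone wrongly flagged.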
Second, and perhaps more importantly, law enforcement needs better information
about what to do with the predictions, namely the “prevention” part of predictive policing.
Indeed, the district commanders may have been cautious about intervening without the
explicit direction of more senior administration, since these individuals were initially
identified as potential victims. As this study shows, district commanders and police
officers in the field received almost no guidance, either up front or on an ongoing
basis, and the commanders generally opted to recommend that their officers increase
“contact” with individuals on the list in varying forms and levels of effort. And our analysis
shows the officers did just that—we find a statistically significant increase in police
contacts. However, it is not at all evident that contacting people at greater risk of being
involved in violence—especially without further guidance on what to say to them or
otherwise how to follow up—is the relevant strategy to reduce violence. Alternatively,
we did not find convincing evidence that increased contact resulted in a backfiring
effect, but the possibility cannot be ruled out.
The finding that the list had a direct effect on arrest, rather than victimization, raises
privacy and civil rights considerations that must be carefully weighed, especially for
predictions that are targeted at vulnerable groups at high risk of victimization. Both
local and national media openly ask whether the CPD SSL pilot constitutes racial
profiling (Erbentraut 2014; Llenas 2014; Stroud 2014). A review of the legal and
constitutional issues involved in using predictions for criminal justice purposes notes
that, while it is not legal to use protected classes as predictors (Starr 2014), classifications that have differential impact on different protected classes, such as racial groups,
that are not designed to have this impact, are legal (Tonry 1987). Tonry (1987) also
argues that the ethical issues with using prediction in a policing context are less
controversial than in other criminal justice settings because they are necessary for the
cost-effective distribution of scarce resources, and their decisions will ultimately be
reviewed by impartial judges before punishment is delivered. However, using
predictions to identify individuals in the community for increased police scrutiny has not
been subject to judicial review.

Acknowledgments We would like to thank the Chicago Police Department and Dr. Miles Wernick from the
Illinois Institute of Technology for their participation and support of this evaluation. We would also like to
acknowledge research assistance provided by Sam Cooper and Alessandra Sienra-Canas. This publication was
made possible by Award Number 2009-IJ-CX-K114 - Predictive Policing Analytic & Evaluation Research
Support awarded by the National Institute of Justice, Office of Justice Programs. The opinions, findings,
conclusions and recommendations expressed in this publication are those of the authors and do not necessarily
reflect the views of the Department of Justice.

References
Abrahamse, A. F., Ebener, P. A., Greenwood, P. W., Fitzgerald, N., & Kosin, T. E. (1991). An experimental
evaluation of the Phoenix repeat offender program. Justice Quarterly, 8(2), 141–168.
Aitchison, J., & Dunsmore, I. R. (1980). Statistical prediction analysis. CUP Archive.
Auerhahn, K. (1999). Selective incapacitation and the problem of prediction. Criminology, 37(4), 703–734.
Bar-Hillel, M. (1980). The base-rate fallacy in probability judgments. Acta Psychologica, 44(3), 211–233.
Beck, C., & McCue, C. (2009). Predictive policing: what can we learn from Wal-Mart and Amazon about
fighting crime in a recession? Police Chief, 76(11), 18.
Becker, G. S. (1993). Nobel lecture: the economic way of looking at behavior. Journal of Political Economy,
101, 385–409.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach
to multiple testing. Journal of the Royal Statistical Society: Series B: Methodological, 57, 289–300.
Berk, R. (2008). Forecasting methods in crime and justice. Annual Review of Law and Social Science, 4, 219–
238.
Berk, R. (2011). Asymmetric loss functions for forecasting in criminal justice settings. Journal of Quantitative
Criminology, 27(1), 107–123.
Berk, R., & Bleich, J. (2013). Statistical procedures for forecasting criminal behavior. Criminology and Public
Policy, 12(3), 513–544.
Berk, R., Sherman, L., Barnes, G., Kurtz, E., & Ahlman, L. (2009). Forecasting murder within a population of
probationers and parolees: a high stakes application of statistical learning. Journal of the Royal Statistical
Society: Series A (Statistics in Society), 172(1), 191–211.
Berry, M. J., & Linoff, G. S. (2004). Data mining techniques: for marketing, sales, and customer relationship
management. Indianapolis, IN: John Wiley & Sons.
Blumstein, A. (1986). Criminal Careers and “Career Criminals” (Vol. 2). Washington, DC: National
Academies.
Bordua, D. J., & Reiss, A. J., Jr. (1966). Command, control, and charisma: reflections on police bureaucracy.
American Journal of Sociology, 72, 68–76.
Braga, A. (2005). Hot spots policing and crime prevention: a systematic review of randomized controlled
trials. Journal of Experimental Criminology, 1(3), 317–342.
Braga, A., & Weisburd, D. L. (2010). Policing problem places: Crime hot spots and effective prevention.
New York, NY: Oxford University Press on Demand.
Braga, A., & Weisburd, D. L. (2012). The effects of focused deterrence strategies on crime: a systematic
review and meta-analysis of the empirical evidence. Journal of Research in Crime and Delinquency,
49(3), 323–358.
Braga, A., Papachristos, A. V., & Hureau, D. M. (2012). The effects of hot spots policing on crime: An
updated systematic review and meta-analysis. Justice Quarterly, 1–31.
Bruinsma, G., & Weisburd, D. (2014). Encyclopedia of Criminology and Criminal Justice. New York:
Springer.
Caldwell, M. F., Vitacco, M., & Van Rybroek, G. J. (2006). Are violent delinquents worth treating? A cost–
benefit analysis. Journal of Research in Crime and Delinquency, 43(2), 148–168.
Chicago Police Department. (2014). Gang Violence Reduction Strategy: General Order G10-01. Chicago, IL.
Chinman, M., Imm, P., & Wandersman, A. (2004). Getting To Outcomes™ 2004. Santa Monica: Rand
Corporation.
Cohen, J., Gorr, W. L., & Olligschlaeger, A. M. (2007). Leading indicators and spatial interactions: a crime‐
forecasting model for proactive police deployment. Geographical Analysis, 39(1), 105–127.
Cope, N. (2004). ‘Intelligence led policing or policing led intelligence?’ Integrating volume crime analysis into
policing. British Journal of Criminology, 44(2), 188–203.
Cornish, D. B., & Clarke, R. V. (2014). The reasoning criminal: Rational choice perspectives on offending.
New Brunswick, NJ: Transaction Publishers.
Dvoskin, J. A., & Heilbrun, K. (2001). Risk assessment and release decision-making: Toward resolving the
great debate. American Academy of Psychiatry and the Law, 29, 6–10.
Eck, J., Chainey, S., Cameron, J., & Wilson, R. (2005). Mapping crime: Understanding hotspots (Vol. NCJ
209393). Washington, DC: National Institute of Justice.
Erbentraut, J. (2014). Chicago’s controversial new police program prompts fear of racial profiling. The
Huffington Post.
Foster, E. M., & Jones, D. (2006). Can a costly intervention be cost-effective?: an analysis of violence
prevention. Archives of General Psychiatry, 63(11), 1284–1291.
Funk, M. J., Westreich, D., Wiesen, C., Stürmer, T., Brookhart, M. A., & Davidian, M. (2011). Doubly robust
estimation of causal effects. American Journal of Epidemiology, 173(7), 761–767.
Gendreau, P., Little, T., & Goggin, C. (1996). A meta-analysis of the predictors of adult offender recidivism:
what works! Criminology, 34(4), 575–608.
Gorr, W., & Harries, R. (2003). Introduction to crime forecasting. International Journal of Forecasting, 19(4),
551–555.
Gottfredson, M., & Hirschi, T. (1986). The true value of Lambda would appear to be zero: an essay on career
criminals, criminal careers, selective incapacitation, cohort studies, and related topics*. Criminology,
24(2), 213–234.
Greenwood, P. W., & Abrahamse, A. F. (1982). Selective incapacitation. Santa Monica: Rand Corporation.
Groff, E. R., & La Vigne, N. G. (2002). Forecasting the future of predictive crime mapping. Crime Prevention
Studies, 13, 29–58.
Grove, W. M., & Meehl, P. E. (1996). Comparative efficiency of informal (subjective, impressionistic) and
formal (mechanical, algorithmic) prediction procedures: the clinical–statistical controversy. Psychology,
Public Policy, and Law, 2(2), 293.
Huber, P. J. (1973). Robust regression: asymptotics, conjectures and Monte Carlo. The Annals of Statistics, 1,
799–821.
Hunt, P., Saunders, J., & Hollywood, J. S. (2014). Evaluation of the Shreveport Predictive Policing
Experiment. Santa Monica: RAND Corporation.
Kang, J. D., & Schafer, J. L. (2007). Demystifying double robustness: a comparison of alternative strategies
for estimating a population mean from incomplete data. Statistical Science, 25, 523–539.
Kennedy, D. M. (1996). Pulling levers: chronic offenders, high-crime settings, and a theory of prevention.
Valparaiso University Law Review, 31, 449.
Kovandzic, T. V., Sloan, J. J., III, & Vieraitis, L. M. (2004). “Striking out” as crime reduction policy: the
impact of “three strikes” laws on crime rates in US cities. Justice Quarterly, 21(2), 207–239.
Lewin, J., & Wernick, M. (2015). Chicago Police Department Data Analytics and Predictive Policing. Paper
presented at the International Association of Chiefs of Police, Chicago, IL.
Lipsey, M. W. (1999). Can intervention rehabilitate serious delinquents? The Annals of the American Academy
of Political and Social Science, 564(1), 142–166.
Litwack, T. R. (2001). Actuarial versus clinical assessments of dangerousness. Psychology, Public
Policy, and Law, 7, 409.
Llenas, B. (2014). Brave New World of “Predictive Policing” Raises Specter of High-Tech Racial Profiling,
Fox News Latino.
Loeber, R., & Farrington, D. P. (1998). Serious and violent juvenile offenders: Risk factors and successful
interventions. Thousand Oaks, CA: Sage Publications.
Lum, C., Koper, C. S., & Telep, C. W. (2011). The evidence-based policing matrix. Journal of Experimental
Criminology, 7(1), 3–26.
Martin, S. E., & Sherman, L. W. (1986). Selective apprehension: a police strategy for repeat offenders.
Criminology, 24, 155.
Mazerolle, L. G., Kadleck, C., & Roehl, J. (1998). Controlling drug and disorder problems: the role of place
managers. Criminology, 36(2), 371–404.
Mazerolle, L. G., Ready, J., Terrill, W., & Waring, E. (2000). Problem-oriented policing in public housing: the
Jersey City evaluation. Justice Quarterly, 17(1), 129–158.
McCaffrey, D. F., Ridgeway, G., & Morral, A. R. (2004). Propensity score estimation with boosted regression
for evaluating causal effects in observational studies. Psychological Methods, 9(4), 403.
McCleary, R., Hay, R. A., Meidinger, E. E., & McDowall, D. (1980). Applied time series analysis for the
social sciences. Beverly Hills: Sage Publications.
McCord, J. (2003). Cures that harm: unanticipated outcomes of crime prevention programs. The Annals of the
American Academy of Political and Social Science, 587(1), 16–30.
McGarrell, E. F., Chermak, S., Wilson, J. M., & Corsaro, N. (2006). Reducing homicide through a
“lever-pulling” strategy. Justice Quarterly, 23(02), 214–231.
Mohler, G. O., Short, M. B., Malinowski, S., Johnson, M., Tita, G. E., Bertozzi, A. L., & Brantingham, P. J.
(2015). Randomized controlled field trials of predictive policing. Journal of the American Statistical
Association, 110(512), 1399–1411.
Ohri, A. (2013). Forecasting and Time Series Models. In R for Business Analytics (pp. 241–258). Springer.
Papachristos, A. (2009). Murder by structure: dominance relations and the social structure of gang homicide.
American Journal of Sociology, 115(1), 74–128.
Papachristos, A. V., & Kirk, D. S. (2015). Changing the street dynamic. Criminology and Public Policy, 14(3),
525–558.
Papachristos, A., Braga, A., & Hureau, D. (2011). Six-degrees of violent victimization: Social networks and
the risk of gunshot injury.
Papachristos, A., Braga, A., & Hureau, D. (2012). Social networks and the risk of gunshot injury. Journal of
Urban Health, 89(6), 992–1003.
Pate, T., Bowers, R. A., & Parks, R. (1976). Three approaches to criminal apprehension in Kansas City: An
evaluation report. Washington, DC: Police Foundation.
Perry, W. L., McInnis, B., Price, C. C., Smith, S. C., & Hollywood, J. S. (2013). Predictive policing: The role
of crime forecasting in law enforcement operations. Santa Monica, CA: Rand Corporation.
Quinsey, V. L., Harris, G. T., & Rice, M. E. (2000). Violent Offenders: Appraising and Managing Risk.
Psychiatric Services, 51(3), 395.
Ratcliffe, J. (2002). Intelligence-led policing and the problems of turning rhetoric into practice. Policing and
Society, 12(1), 53–66.
Ratcliffe, J. (2005). The effectiveness of police intelligence management: a New Zealand case study. Police
Practice and Research, 6(5), 435–451.
Ratcliffe, J. H. (2012). Intelligence-led policing. New York, NY: Routledge.
Ratcliffe, J. H., & Guidetti, R. (2008). State police investigative structure and the adoption of intelligence-led
policing. Policing: An International Journal of Police Strategies and Management, 31(1), 109–128.
Ridgeway, G. (2013). Linking prediction and prevention. Criminology and Public Policy, 12(3), 545–550.
Ridgeway, G., & MacDonald, J. M. (2009). Doubly robust internal benchmarking and false discovery rates for
detecting racial bias in police stops. Journal of the American Statistical Association, 104(486), 661–668.
Ridgeway, G., Braga, A. A., Tita, G., & Pierce, G. L. (2011). Intervening in gun markets: an experiment to
assess the impact of targeted gun-law messaging. Journal of Experimental Criminology, 7(1), 103–109.
Ridgeway, G., McCaffrey, D., Morral, A., Burgette, L., & Griffin, B. A. (2014). Toolkit for Weighting and
Analysis of Nonequivalent Groups: A tutorial for the twang package. R vignette. RAND.
Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for
causal effects. Biometrika, 70(1), 41–55.
Sailor, W., Dunlap, G., Sugai, G., & Horner, R. (2008). Handbook of positive behavior support. New York,
NY: Springer.
Sherman, L. W. (1986). Policing communities: what works? Crime and justice, 343–386.
Sherman, L. W. (1992). The influence of criminology on criminal law: evaluating arrests for misdemeanor
domestic violence. Journal of Criminal Law and Criminology, 83, 1–45.
Sherman, L. W., & Berk, R. A. (1984). The specific deterrent effects of arrest for domestic assault. American
Sociological Review, 49, 261–272.
Sherman, L. W., & Weisburd, D. (1995). General deterrent effects of police patrol in crime “hot spots”: a
randomized, controlled trial. Justice Quarterly, 12(4), 625–648.
Sherman, L. W., Gottfredson, D., MacKenzie, D., Eck, J., Reuter, P., & Bushway, S. (1997). Preventing crime:
What works, what doesn’t, what’s promising: A report to the United States Congress. Washington, DC:
US Department of Justice, Office of Justice Programs.
Silver, E., & Miller, L. L. (2002). A cautionary note on the use of actuarial risk assessment tools for social
control. Crime and Delinquency, 48(1), 138–161.
Starr, S. B. (2014). Evidence-based sentencing and the scientific rationalization of discrimination. Stanford
Law Review, 66, 803–953.
Stroud, M. (2014). The Minority Report: Chicago’s new police computer predicts crimes, but is it racist? The
Verge.
Tonry, M. (1987). Prediction and classification: legal and ethical issues. Crime and Justice, 9, 367–413.
Weisburd, D., & Mazerolle, L. G. (2000). Crime and disorder in drug hot spots: implications for theory and
practice in policing. Police Quarterly, 3(3), 331–349.
Welsh, B. C., & Rocque, M. (2014). When crime prevention harms: a review of systematic reviews. Journal of
Experimental Criminology, 10(3), 245–266.
Wright, K. N., Clear, T. R., & Dickson, P. (1984). Universal applicability of probation risk‐assessment
instruments. Criminology, 22(1), 113–134.
Yang, M., Wong, S. C., & Coid, J. (2010). The efficacy of violence prediction: a meta-analytic comparison of
nine risk assessment tools. Psychological Bulletin, 136(5), 740.
Zikopoulos, P., & Eaton, C. (2011). Understanding big data: Analytics for enterprise class hadoop and
streaming data. McGraw-Hill Osborne Media.
Jessica Saunders is a Senior Criminologist at RAND and Professor with Pardee RAND Graduate School. Her
research focuses on policing and policing reform, school safety, and criminal justice policy evaluation.
Priscillia Hunt is an Economist at RAND, Professor with Pardee RAND Graduate School, and Research
Fellow of the Institute for the Study of Labor (IZA). Her research interests focus on the economics of crime,
including studies of criminal behavior, assessing operations of the criminal justice system, and evaluating
impacts of criminal justice policy.
John S. Hollywood is a senior operations researcher at the RAND Corporation. He conducts research on
criminal justice technologies, to include leading a general technical assessment of predictive policing for the
National Institute of Justice. He has also led multiple studies to identify and prioritize top science and
technology-related needs for criminal justice for NIJ.