Case 2:12-cv-01282-JLR Document 554-1 Filed 04/30/19 Page 1 of 31

DISPARITY REVIEW – PART I

Disparity Review – Part I
Using Propensity Score Matching to
Analyze
Racial Disparity in Police Data

April 2019

SEATTLE POLICE DEPARTMENT


EXECUTIVE SUMMARY

The Consent Decree included several requirements related to Bias-Free Policing, including the
general mandate that the Seattle Police Department (“SPD”) “deliver police services that are
equitable, respectful, and free of unlawful bias, in a manner that promotes broad community
engagement and confidence in the Department.” To that end, it also required updates to SPD’s
Bias-Free Policing policies and training. These requirements were satisfied during Phase I of the
Consent Decree. [CITE]. However, the Consent Decree also stated that, in consultation with the
Community Police Commission (“CPC”), SPD should consider whether to revise SPD Manual
5.140. In [YEAR], SPD did amend Policy 5.140 (Bias-Free Policing), which now contains a
“Disparate Impacts” provision.
Under that policy, the Seattle Police Department committed to eliminating policies and practices
that have an unwarranted disparate impact on certain protected classes. While recognizing that
it is possible that the long-term impacts of historical inequality and institutional bias could result
in disproportionate enforcement, even in the absence of intentional bias, the Department’s
policy is to identify ways to protect public safety and public order without engaging in
unwarranted or unnecessary disproportionate enforcement.
As the Monitor found in its Tenth Annual Systemic Assessment on Stops and Detentions, disparity
in law enforcement activities exists in the City of Seattle with respect to Stops and Detentions.
SPD’s own analysis, including in preparation of this report, confirms this fact. The question,
therefore, becomes whether the disparities are “unwarranted.” A common example in thinking
about warranted vs. unwarranted disparity is enforcement activities with respect to a criminal
gang. A gang may be formed by members with the same race or national origin. If that gang
engages in criminal activity and law enforcement addresses that activity, the statistical results
may indicate a higher rate of enforcement against that particular race or ethnicity. Such disparity
would not be considered “unwarranted.”
The challenge, of course, is in discerning what observable disparities are warranted vs.
unwarranted. As the Monitor recognized in its Tenth Annual Systemic Assessment on Stops and
Detentions, quantifying unwarranted racial disparities in police contacts is difficult and there is
no academic consensus about the best method to do so. In that assessment, using various
approaches, the Monitor’s team sought to examine, both quantitatively and qualitatively, the
nature of the disparity observed and, broadly, the extent to which stops, and outcomes, were
supported by policy and law.
Among the methodologies selected, the Monitor used the technique of Propensity Score
Matching (PSM) – an approach that has gained popularity in social science fields, including
criminal justice – that uses regression to “score” how similar events are to each other across a
variety of factors and match them for comparison. For example, PSM was used to match a Terry
stop where the stopped individual was Black to a stop where the subject was White, but all other
known factors (those available in fielded data) were as similar as possible.
In the current report, SPD builds upon and extends the Monitor’s application of Propensity Score
Matching.1 However, SPD can do so in an even more robust manner through its Consent Decree-driven investments in data systems and analytic tools and professionals. From these systems,
SPD has created the capacity to routinely analyze its processes and actions in a way historically
restricted to multi-year engagements with research partners. Accordingly, under this review and
report, SPD was able to (1) match additional factors that now are available through SPD’s data
analytics platform but were not available to the Monitoring Team at the time of data production
for the Tenth Systemic Assessment; and (2) apply Propensity Score Matching to examine the role
of race in Use of Force data relating to the pointing of a firearm – an area that, like stops and
frisks, is often a matter of substantial officer discretion.
As shown in the reports SPD continues to publish, and as can be analyzed directly in the
public dashboards SPD has created, the behaviors under review here
are relatively rare events. To put the numbers presented here in context, over the study period,
SPD answered 978,608 calls for service, comprising 2,175,181 associated officer dispatches.
Across these events, officers engaged in 19,544 Terry stops and 4,371 uses of force –
approximately three-quarters of which involved no greater than low-level, Type I force.
The Department emphasizes this context not to minimize the occurrence of these events, nor
the potential disparity that exists in their occurrence. The intent is quite the opposite: to highlight
the need to examine these events with a precise instrument capable of isolating true differences
between relatively rare events occurring in a sea of activity across an entire complex and ever-changing city.
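To make that scale concrete, the counts quoted above can be converted into simple rates. The short Python sketch below performs only that arithmetic on the four figures stated in this summary; the variable and function names are illustrative and are not drawn from any SPD system.

```python
# Counts as stated in the summary for the study period.
calls_for_service = 978_608
officer_dispatches = 2_175_181
terry_stops = 19_544
uses_of_force = 4_371


def per_thousand(events, base):
    """Return the number of events per 1,000 units of the base count."""
    return 1000 * events / base


print(f"Terry stops per 1,000 dispatches:   {per_thousand(terry_stops, officer_dispatches):.1f}")
print(f"Uses of force per 1,000 dispatches: {per_thousand(uses_of_force, officer_dispatches):.1f}")
print(f"Terry stops per 1,000 calls:        {per_thousand(terry_stops, calls_for_service):.1f}")
print(f"Uses of force per 1,000 calls:      {per_thousand(uses_of_force, calls_for_service):.1f}")
```

On these figures, Terry stops occur roughly 9 times per 1,000 officer dispatches and uses of force roughly 2 times per 1,000, which is the sense in which the events are rare.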
As SPD has committed in its policy, if and when SPD identifies and verifies unwarranted disparate
impacts, the Department will consult with neighborhoods, businesses, community groups,
1 See O’Toole, K (2018, pp. 70-71) The Garda Inspectorate: Driving Collaborative Reform Through a Model of
Equilibrated Governance. (Submitted and accepted in fulfillment of PhD requirements of the School of Business,
Trinity University Dublin), describing the model for good research in non-traditional scientific method:
The learner works through a cyclical process of consciously and deliberately (1) assessing a situation which
is calling for change; (2) planning to take action; (3) taking action; and (4) evaluating the action, leading to
further cycles of planning, taking action, and so on. Just as the empirical researcher may adjust a
methodology to account for confounding variables that may surface, research cycles are repeated and
modified to account for knowledge learned from the prior iteration, until such time the researcher
determines the process is complete.

and/or the Community Police Commission, and the Office of the Inspector General for Public
Safety, to explore equally effective alternative practices that would result in less disproportionate
impact. Alternative enforcement practices may include addressing the targeted behavior in a
different way, de‐emphasizing the practice in question, or other measures.
This report moves SPD one step closer toward that end: it solidifies findings of disparity regarding
frisk rates for weapons and the pointing of firearms. As discussed in further detail below, SPD
will focus additional research and analytics on these issues in the next iteration of its study of
disparity, which is due to be filed with the Court in December 2019.

Summary of Major Findings
The report examines racial and ethnic disparities in a variety of police actions where research
suggests there is more of an opportunity for disparate effects. The data for this report cover two-and-a-half years (2016 through mid-2018) of interactions. The primary results of these analyses,
which are based on a relatively low number of observations, along with initial thoughts and/or
next steps for analyses and action, are summarized below:
1. Frisks
a. Our overall findings demonstrate that minorities are frisked at a higher rate than
non-minorities.
b. There were 3,021 frisks2 conducted during the 14,036 stops (21.5%) reported and
matched in the study time period:
i. Asians were frisked (frisks = 123; rate = 25.7%) more than whites (frisks =
92; rate = 19.2%) in matched stops
ii. Blacks were frisked (frisks = 1,183; rate = 24.7%) more than whites (frisks
= 1,002; rate = 20.9%) in matched stops
iii. Hispanics were frisked (frisks = 229; rate = 27.2%) more than whites (frisks
= 177; rate = 21%) in matched stops
iv. American Indians/Alaska Natives were frisked (frisks = 127; rate = 10.6%)
more than whites (frisks = 122; rate = 10.2%) in matched stops
v. Non-white3 individuals were frisked (frisks = 1,636; rate = 23.3%) more than
whites (frisks = 1,385; rate = 19.7%) in matched stops
2. “Hit” rate (finding a weapon during a frisk)
a. We found that frisks of minorities produce weapons about 1/3 less frequently
than frisks of similarly-situated whites.

2 Note: these incident counts differ from the total counts reported later in the report because these are the total events
within each matched population.
3 Subjects reported to be of “unknown,” “missing” or “other” perceived race by officers.

b. Of the 3,021 frisks in the study, a weapon was found in 627 incidents, an overall
hit rate of 20.7%
i. Asians had a weapon found (hits = 21; rate = 17.1%) in fewer similar frisks
than whites (hits = 20; rate = 21.7%)
ii. Blacks had a weapon found (hits = 191; rate = 16.1%) in fewer similar frisks
than whites (hits = 247; rate = 24.7%)
iii. Hispanics had a weapon found (hits = 42; rate = 18.3%) in fewer similar
frisks than whites (hits = 40; rate = 22.6%)
iv. American Indians/Alaska Natives had a weapon found (hits = 26; rate =
20.5%) in fewer similar frisks than whites (hits = 42; rate = 34.4%)
v. Non-whites4 had a weapon found (hits = 275; rate = 16.8%) in fewer similar
frisks than whites (hits = 352; rate = 25.4%)
3. Stop Durations
a. Of the 14,036 stops in the study, the average duration was 10.9 minutes.
i. Asians were stopped (average time = 11.1 mins) for a shorter period than
whites (average time = 11.9 mins) in similar stops
ii. Blacks were stopped (average time = 11.0 mins) for a shorter period than
whites (average time = 11.3 mins) in similar stops
iii. Hispanics were stopped (average time = 15.4 mins) for a shorter period
than whites (average time = 15.6 mins) in similar stops
iv. American Indians/Alaska Natives were stopped (average time = 10.9 mins)
for a longer period of time than whites (average time = 10.5 mins) in similar
stops
v. Non-whites5 were stopped (average time = 11.1 mins) for a longer period
of time than whites (average time = 10.8 mins) in similar stops
4. Use of Force – Pointing of Firearm
a. Firearms were pointed at non-whites about 30% more often than at similarly
situated whites.
b. In the 3,138 incidents in this study where force was used, a firearm was pointed
in 675 (21.5%)
i. Asians had a firearm pointed at them (point = 47; rate = 14.2%) more
frequently than whites (point = 33; rate = 10%) in similar incidents

4 Subjects reported to be of “unknown,” “missing” or “other” perceived race by officers.
5 Ibid.

ii. Blacks had a firearm pointed at them (point = 290; rate = 13.8%) more
frequently than whites (point = 203; rate = 9.7%) in similar incidents
iii. Hispanics had a firearm pointed at them (point = 54; rate = 19.4%) more
frequently than whites (point = 29; rate = 10.4%) in similar incidents
iv. Those individuals who were not specified or not in a large enough
population to separately examine had a firearm pointed at them (point =
149; rate = 12.4%) more frequently than whites (point = 184; rate = 10.1%)
in similar incidents
v. Non-whites6 had a firearm pointed at them (point = 384; rate = 12.2%)
more than whites (point = 291; rate = 9.3%) in similar incidents
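The relative comparisons in findings 2(a) and 4(a) (“about 1/3 less frequently” and “about 30% more often”) follow from the counts and rates listed above. The Python sketch below reproduces that arithmetic using only the non-white and white figures reported in this summary; the function and variable names are illustrative.

```python
def rate(events, total):
    """Share of `total` in which the event occurred."""
    return events / total


# Finding 2(b)(v): weapons found ("hits") among matched frisks.
hit_rate_non_white = rate(275, 1_636)
hit_rate_white = rate(352, 1_385)

# Relative shortfall in hit rate for non-whites compared with whites.
hit_gap = 1 - hit_rate_non_white / hit_rate_white

# Finding 4(b)(v): reported firearm-pointing rates in matched incidents.
point_rate_non_white = 0.122
point_rate_white = 0.093

# Relative excess in firearm pointing for non-whites compared with whites.
point_excess = point_rate_non_white / point_rate_white - 1

print(f"Hit rate, non-white: {hit_rate_non_white:.1%}")
print(f"Hit rate, white:     {hit_rate_white:.1%}")
print(f"Relative shortfall:  {hit_gap:.1%}")
print(f"Relative excess in firearm pointing: {point_excess:.1%}")
```

The shortfall comes out at roughly 34 percent and the excess at roughly 31 percent, consistent with the rounded characterizations in the findings.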
Taken together, these results are generally in line with those produced by the Monitoring Team
in the Tenth Assessment. The current analysis did find a lower rate of disparity in the
decision to frisk an individual during a stop; this difference likely is the result of the SPD model
having additional data available to it to ensure the stops were more comparable.
It should also be noted that SPD recently conducted an audit of its stop and detention practices
for the period from January 1 to June 30, 2018. It determined that in 93.5% of all stops during
the study period, the stop template filled out by the officer contained a narrative that
adequately documented reasonable suspicion for the stop;7 in over half of a random sample of
the remaining 6.5% of cases, adequate reasonable suspicion was identified in the accompanying
police report. The Monitor and the U.S. Department of Justice each performed an independent
validation of these findings and concluded that SPD continues to maintain compliance with the
relevant Consent Decree requirements. This audit was submitted to the Court on March 7, 2019.
While SPD has limited its approach to Propensity Score Matching for purposes of this report, SPD
does not intend to suggest that there are no other ways to examine these issues; the
robust debate in academic circles as to how best to address the issue of disparity illustrates
as much. Rather, given the nature of SPD’s data, in the interest of focusing this
report, and with a good foundation for future steps laid using this methodology in the Monitor’s
Tenth Assessment, SPD has selected Propensity Score Matching as the most appropriate
approach for the purposes of this research cycle and this particular inquiry.
The PSM methodology allows SPD to know that these are real and true differences in how the
department interacted with these populations while providing police services. They do not,
however, explain what was occurring in each of these interactions, including a variety of very
important factors not yet available to the model – officer prior knowledge of the individual,
behaviors and words expressed during the incident, and recent crime trends in the specific area.
SPD plans to include more of these factors in Phase II report work, as well as to undertake the
qualitatively intensive analyses of “scoring” the actual interactions. SPD is working with experts
in assessing police behavior and disparity to develop the tools for this analysis so the nature of
6 Subjects reported to be of “unknown,” “missing” or “other” perceived race by officers.
7 As noted previously, the lack of clearly documented reasonable suspicion does not mean the remaining 6.5% of
stops were not proper.

each interpersonal interaction can be understood beyond administrative data. SPD will leverage
the results of these analyses to help prioritize the areas where it should begin additional work in
understanding what may be leading to the outcomes of the interaction and how to potentially
address those factors through policies, training, and operational guidelines.
While all the matches that showed signs of disparity will be further examined, SPD will prioritize
its analytical work for those pairs that showed larger differences, namely: (i) the disparate rate
of frisk for Asian, Black, and Hispanic individuals compared to Whites; (ii) the lower rate of finding
a weapon during those frisks for Asians, Blacks, Hispanics, or American Indian/Alaska Natives
compared to Whites; and (iii) the higher rate of pointing firearms at Asian, Black and Hispanic
individuals compared to Whites. It is possible that the differences in these behaviors are
attributable to circumstances not captured currently in fielded data (such as officers’ prior
knowledge of the suspect, etc.).

INTRODUCTION
Under Paragraph 223 of the Consent Decree, the Court retains jurisdiction over this matter “until
such time as the City has achieved full and effective compliance and maintained such compliance
for no less than two years.” On January 10, 2018, the Court entered an order finding the
Department to be in “full and effective compliance” as of the date of the Order, thus commencing
the two-year “sustainment period.” Dkt. #439. The Court further ordered the parties and the
monitor to “meet, confer, and prepare a plan for discharging their obligations under the Consent
Decree” during this two-year period.
On March 13, 2018, the Court entered an order approving the Sustainment Plan developed
pursuant to the Court’s January 10th order. This plan, and an attached matrix of deadlines,
became the governing documents for this Sustainment Period. To a large extent, the
Sustainment Plan follows the scope of the Consent Decree itself, as it requires SPD to self-assess
those core topical areas that, collectively, cover the entirety of SPD’s obligations as set forth in
Section III of the Consent Decree (“Commitments”) and assessed, comprehensively, in Phase I.
The Sustainment Plan also includes commitments by SPD not specifically called for under the
Consent Decree. SPD is committed to continuing the work it has done to further itself as a
“learning” institution and driver, nationally, of innovative and data-informed policing. This
analytic project is a part of that promise.
Included within the Sustainment Plan is SPD’s commitment to continue its emphasis on impartial,
bias-free policing, to regularly examine disparities across its data to ensure that this emphasis is
carried out in practice, and to provide the Court with a fuller view of “disparities with respect to
stops, searches, and seizure; use of force; and other law enforcement activity.” SPD also
demonstrates that commitment with its Bias-Free Policing policy (Seattle Police Manual, 5.140) and
by requiring training on minimizing bias that is coordinated with the state’s Criminal Justice Training
Center.
The Monitor noted in his Tenth Systemic Assessment,
Sorting out whether disparity on the basis of suspect classifications, like race, is
the result of intentional discrimination, the result of unknowing or subconscious
bias, or is the effect of one or many factors having nothing to do with race or that
are tangled up with race is challenging. When there are reasonable and legitimate
reasons for a practice that produces disparities with respect to whom the practice
is applied, the courts have been historically reluctant to invalidate government
actions as discriminatory and impermissible.
Consequently, neither the Consent Decree nor the Court-approved policies on
stops and bias-free policing demand that SPD immediately stop practices that it
may determine are linked to disparate impacts. Instead, and importantly, it
requires that SPD determine whether such disparities are warranted or
unwarranted and, where “unwarranted disparate impacts are identified” with
respect to a given SPD practice or policy, “the Department will consult as
appropriate with neighborhood, business and community groups, including the
Community Police Commission, to explore equally effective alternative practices
that would not result in disproportionate impact.”
Tenth Systemic Assessment, Dkt. #394, pp. 40-41 (internal citations omitted).
Consistent with this understanding, SPD’s work over the past six years to identify disparities, and
its evaluation as to whether disparities are “unwarranted,” has taken many forms. As routine
practice, SPD includes in each of its annual reports on use of force, crisis response, and stops and
detentions, a section that details the subject demographics in each area. To foster further
transparency and encourage exploration of its data, SPD presents data in each area in public-facing, queryable dashboards;8 to foster the analytic inquiry of the many researchers who
regularly seek this data, SPD offers public links to the raw data underlying the same. In meeting
a key deliverable under the Sustainment Plan, SPD presented in its Stops and Detentions Audit,
filed with the Court on March 7, 2019, a qualitative analysis of an agreed-upon subset of its stops
data, for which the Monitor and DOJ verified continuing compliance with Consent Decree
obligations; a quantitative effort (Pearson’s Chi-square) to test the relationship between the
outcome of the audit and the perceived race and gender of the subject failed to achieve statistical
significance.
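For readers unfamiliar with the test named above, the following Python sketch shows how a Pearson's Chi-square statistic is computed for a 2x2 table. The counts are entirely hypothetical and are not the audit data, which this report does not reproduce; the function name is illustrative, and the sketch applies no continuity correction.

```python
import math


def chi_square_2x2(table):
    """Pearson's chi-square statistic and p-value for a 2x2 table of counts.

    With one degree of freedom, the upper-tail probability of the chi-square
    distribution equals erfc(sqrt(statistic / 2)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# HYPOTHETICAL counts: rows are two subject groups; columns are
# (documentation adequate, documentation not adequate).
hypothetical_table = [[470, 30], [465, 35]]
statistic, p_value = chi_square_2x2(hypothetical_table)
print(f"chi-square = {statistic:.3f}, p = {p_value:.3f}")
```

A p-value above the conventional 0.05 threshold, as in this made-up table, is what “failed to achieve statistical significance” means: the data do not allow the hypothesis of no relationship to be rejected.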
In the Tenth Systemic Assessment, the Monitor undertook a two-pronged approach to examine
SPD’s stops and detentions: a quantitative approach, which included several independent
statistical approaches to examining disparity, and a qualitative approach, which involved a review
of a statistically valid subset of stops documentation to determine whether the stops, and actions
taken thereafter (length of the stop and any subsequent weapons frisk) were supported by
articulated reasonable suspicion. As to the latter, the Monitor found that while there was
documented disparity relative to population demographics as represented in tract-level
census data, SPD was in compliance with relevant Consent Decree requirements relating to both
the documentation and constitutionality of its stops over that study period. In that regard, the
Monitor reported as follows:
• The vast majority of stops were adequately justified. SPD officers have reasonable,
articulable suspicion that the involved subject had been, was, or would soon be
engaged in criminal activity – which is the required legal standard for initiating a so-called Terry stop – in 99 percent of stops. (In the remaining one percent, the
Monitoring Team was unable to reach a determination based on the level of
articulation in the documentation it reviewed.)

8 Accordingly, for those interested in examining the findings presented in this report in the context of SPD’s overall
stops and detentions, or use of force, crisis response activity, or overall crime stats, SPD refers those readers to the
data and information pages of its website.

• The vast majority (97 percent) of frisks were adequately justified. In 97 percent of
frisks conducted during Terry stops, officers had appropriate and separate grounds
for conducting a minimally-invasive search for a weapon during a Terry stop and were
not automatically conducting a frisk of subjects simply because they were stopped.
(In the remaining three percent, the Monitoring Team was unable to reach a
determination based on the level of articulation in the documentation it reviewed.)

• Most stops were appropriately limited to a reasonable scope and reasonable
duration, as required under law and SPD policy. Race did not impact the odds of being
subjected to a stop of an unreasonable scope or unreasonable duration.

• Few additional policy issues with respect to initiating or conducting the stops were
identified.

Tenth Systemic Assessment, Dkt. #394, pp. 4-6 (emphasis in original).
As to the quantitative analysis, the Monitoring Team prefaced its approach:
Evaluating the extent to which an individual’s race or ethnicity affects the likelihood
that he or she will be stopped by police is a challenging analytical task. To do so,
researchers must develop a benchmark against which to compare the racial
distribution of actual stop data with an individual’s risk of being stopped in the
absence of bias. An appropriate benchmark must incorporate the various legal and
non-legal factors that shape stop risk, including when, where, and how often a person
is out in public, and the nature of their appearance, activity, and demeanor while
engaging in public activity, among several other relevant variables.
As reliable records of these variables are difficult – if not impossible – to obtain,
analysts are forced to develop statistical proxies based on rough assumptions about
the profile and activity of a jurisdiction’s residents and the priorities of the relevant
law enforcement actors. This has led to a robust discussion in academic and social
science literature about both the merits and disadvantages of a host of statistical tests
and benchmarks.
Tenth Systemic Assessment, Dkt. #394, pp. 50-51 (internal citations omitted).
Explicitly avoiding the debate as to “what type of statistical analysis is best or most accurate, or
what ‘benchmark’ is most appropriate or analytically powerful” (Id. at p. 51), and cautioning as
to interpretation of the findings given the limitations of both the data and the analyses
themselves, the Monitoring Team chose to examine the question of “whether stop activity
affects some people more than others” through three distinct approaches, carried across 13,124
administrative records of stops (Terry templates) generated by Seattle Police Officers between
July 1, 2015 and January 31, 2017, described as follows:

1. Overall, population-based analysis that compared the racial distribution of stops to
Seattle racial population data.
2. Statistical modeling of the effects of beat-level variation in resident demographics and
crime incident on the distribution of Terry stops at the patrol beat level, to provide some
control for crime factors that may influence the findings above, through a series of multivariate regressions.
3. A series of statistical tests intended to analyze post-treatment stops and outcomes (stop
disposition, duration, and rate of frisk), where the point of comparison is among the
population of those who have been stopped, rather than the general population.
See, generally, Tenth Systemic Assessment, Dkt. #394, pp. 51-77.
SPD recognizes the Monitor’s Tenth Systemic Assessment as a comprehensive foundation from
which to launch further inquiries. SPD’s purpose in presenting this report, pursuant to its general
obligation under the Sustainment Plan to further examine disparity in its data, is not to revisit or
attempt to replicate the many sophisticated analyses that either were done by experts on the
Monitoring Team or that have formed the vast body of academic literature around disparity
throughout the criminal justice system – efforts that notwithstanding the dedicated careers of
many within the social sciences still, as the Monitoring Team acknowledged, have not settled on
either a complete or agreed-upon approach to examining this complex subject.9 SPD is proud of
its committed partnerships, nationally and internationally, to advancing the study of the social
science of policing, but SPD is not itself an academic institution with the expertise or resources
to undertake analyses equivalent in scope or sophistication to those performed by professional
researchers. Nor does SPD seek to wade into what the Monitor referred to as the “robust
discussion in academic and social science literature” as to the “correct” approach.
Rather, SPD’s purpose in presenting this report, and the report that will follow, is three-fold: (1)
primarily, to critically examine the data currently available to the SPD to determine if its actions
are applied in a disparate manner; (2) to build upon the Monitor’s findings by leveraging
additional data now available within SPD’s Data Analytics Platform (DAP) to further refine, by
accounting for additional potential confounding variables, analyses around the role of race in
9 This report expressly does not seek to provide an aggregate overview of its stops data; for those interested in
exploring that data specifically, SPD directs readers to its public-facing, navigable dashboards through which the user
can explore demographic information relating to subjects across a broad range of police encounters, including stops
and detentions, use of force, and crisis response. To the extent that the reader is interested in a more qualitative
assessment as to the nature, and constitutionality, of SPD’s stops and detentions, the Department refers the reader
to its Stops and Detentions Audit, Dkt. No. 547-1, filed with the Court on March 7, 2019, in which the Monitor found
that SPD was sustaining its compliance with Consent Decree requirements with respect to its stops and detentions.

post-stop outcomes using Propensity Score Matching (PSM) in order to identify whether
identified disparities are “unwarranted”; and (3) to eventually use the findings of this iterative
study to examine the implication of its findings on any SPD policies, training, and/or operations
as an organization committed to data informed decision-making and continuous improvement.
Consistent with principles of critical, iterative review, this report builds on the prior work
of the Monitor and offers direction for further work, but, as is true of research across this field of
study, it should not be understood to be exhaustive or conclusive as to any cause(s) of the
disparity noted. As those who study this complex topic professionally will note, there are
inevitably circumstantial factors present in these encounters that are not captured through
fielded data that may be discerned through a qualitative review of the facts of these cases. For
example, in their validation of SPD’s recent Stops and Detentions Audit, both the Monitor and
DOJ observed that a substantial number of SPD’s reported stops appeared not to be investigative
stops, but rather supported by the higher bar of probable cause. One question raised, that may
bear on training, is whether some frisks documented on Terry templates are more appropriately
categorized as non-discretionary searches incident to arrest.
In its follow-up report, SPD will analyze any descriptive information obtained from a structured
review of a weighted sample of cases, including examinations of body-worn video, report narratives, and
audio, that may yield additional insight into the numbers reported here and
provide yet further direction for future research. Additionally, this work will be informed by
structured discussions with community members regarding their perceptions of what occurred
in the interactions and how they could have gone differently. These discussions will be grounded
in the additional analyses in the Phase II report and in work with a national expert on
evaluating disparity in police actions and with the CPC.

METHODOLOGY10
Propensity Score Matching
Propensity Score Matching (PSM) allows a researcher to estimate the effect of a condition,
intervention, or policy, while accounting for factors that predict the likelihood of receiving that
treatment. In this case, the “treatment” is the subject’s perceived race, and the effects of interest
are on (1) certain Terry stop outcomes
and (2) the likelihood that an individual will be the subject of a Type I, firearm-pointing use of
force. PSM accomplishes this by pairing treatment and control units that have similar propensity
scores – based on the presence or absence of known variables – when random assignment is not
feasible.
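The pairing step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the model SPD or the Monitor used: the two covariates (`night`, `weapon_call`), the synthetic data, the greedy one-to-one matching rule, and the 0.05 caliper are all hypothetical choices made for the example.

```python
import math
import random


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow on extreme values.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_propensity(X, treated, lr=0.5, epochs=500):
    """Fit a logistic regression P(treated = 1 | x) by batch gradient descent
    and return a function that scores a covariate vector."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(X, treated):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - t
            grad_b += err
            for j in range(d):
                grad_w[j] += err * x[j]
        b -= lr * grad_b / n
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
    return lambda x: sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def match_pairs(scores, treated, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on the score, without replacement.
    A treated unit is left unmatched if no control lies within the caliper."""
    controls = [i for i, t in enumerate(treated) if t == 0]
    pairs = []
    for i in [k for k, t in enumerate(treated) if t == 1]:
        if not controls:
            break
        j = min(controls, key=lambda c: abs(scores[c] - scores[i]))
        if abs(scores[j] - scores[i]) <= caliper:
            pairs.append((i, j))
            controls.remove(j)
    return pairs


# SYNTHETIC data: group membership and the outcome both depend on the
# hypothetical covariates, but the outcome does not depend on group.
random.seed(0)
X, group, frisked = [], [], []
for _ in range(400):
    night, weapon_call = random.randint(0, 1), random.randint(0, 1)
    X.append([night, weapon_call])
    group.append(1 if random.random() < 0.3 + 0.3 * night else 0)
    frisked.append(1 if random.random() < 0.1 + 0.3 * weapon_call else 0)

score = fit_propensity(X, group)
scores = [score(x) for x in X]
pairs = match_pairs(scores, group)

if pairs:
    rate_group = sum(frisked[i] for i, _ in pairs) / len(pairs)
    rate_comparison = sum(frisked[j] for _, j in pairs) / len(pairs)
    print(f"{len(pairs)} matched pairs; outcome rate {rate_group:.1%} vs {rate_comparison:.1%}")
```

Because the synthetic outcome here depends only on the covariates, the two matched rates should be similar. In real data, a gap that persists after matching is a difference that the matched factors do not account for.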

10 This is not intended to be a technical report on the methodology itself. A comprehensive discussion of PSM theory,
the mathematical underpinnings of determining statistically valid matches, and the application is contained in the
Monitor’s Tenth Systemic Assessment. For purposes of readability, SPD refers the reader interested in a fuller
analytic discussion to that well-articulated explanation or to the many studies cited herein.

In the social sciences, involving dynamic human interactions in the real world where latent
variability is inevitable, PSM helps to focus the analysis by holding constant factors that can be
held constant – in other words, it “eliminates many things from being causes, and this is probably
very good, since it gives more specificity to the meaning of the word cause.” Holland (1986, p.
959).11 PSM has been applied consistently in medical and psychological/psychiatric settings since
the 1980s, but has only relatively recently been applied to the criminal justice sciences. Slade et
al., (2008)12, for example, used PSM to explore the role of substance use disorders in adult
incarceration in a study of 780 juveniles. Gibson et al., (2009)13 examined gang membership and
victimization. A 2010 chapter in Springer’s Handbook of Quantitative Criminology cites myriad
studies and a growing popularity of the approach in criminology and criminal justice research.14
While there are a number of statistical approaches that researchers have employed to examine
disparate impact, an advantage of applying PSM in observational studies, at least one researcher
has noted, is that “adjusting for confounding variables using the propensity score offers an
alternative to multivariate regression that is more interpretable, less prone to errors in model
assumptions, and ultimately easier to present to stakeholders[.]”15; see also the Monitor’s Tenth

11

Holland, P.W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 945-960.

12

Slade, E., Stuart, E. A., Salkever, D. S., Krakus, M., Green, K. M., & Ialongo, N. (2008). Impacts of age of onset of
substance use disorders on risk of adult incarceration among disadvantaged urban youth: A propensity score
matching approach. Drug and alcohol dependence, 1-13.
13

Gibson, C. L., Miller, J. M., Jennings, W. G., Swatt, M., & Gover, A. (2009). Using propensity score matching to
understand the relationship between gang membership and violent victimization: A research note. Justice Quarterly,
625-643.
14

Apel, R. J., & Sweeten, G. (2010). Propensity score matching in criminology and criminal justice. In Handbook of
quantitative criminology (pp. 543-562). New York, NY: Springer.
15

Ridgeway, G. (2006). Assessing the effects of race bias in post-traffic stop outcomes using propensity scores.
Journal of Quantitative Criminology, 1-29.


Systemic Assessment, discussing the advantages of PSM over linear regression.16 As another
researcher argued,
Propensity score (PS) methods offer certain advantages over more traditional
regression methods to control for confounding [variables] by indication in
observational studies. Although multivariate regression models adjust for
confounders by modeling the relationship between covariates and outcome, the
PS methods estimate the treatment effect by modelling the relationship between
confounders and treatment assignment. Therefore, methods based on PS are not
limited by the number of events, and their use may be warranted when the
number of confounders is large, or the number of outcomes is small.17
Within the intersection of policing and disparate impact research specifically, two publications
have applied PSM to estimate the effects of “decisions to search.” Holding constant other
variables, Higgins, Jennings, Jordan & Gabbidon (2011) found that subjects identified as Black
were more likely to be frisked than White subjects, but found no differences between Hispanics
and Whites, suggesting that “race, but not ethnicity, appears to be a causal factor in a police
officer’s decision to search.”18 Applying PSM to examine the effect of race in traffic stop
outcomes in Oakland, Ridgeway (2006) concluded, among other findings, that while non-white
drivers were treated equitably in terms of traffic stop outcomes such as citation rates and
consent search rates, white drivers were less likely to be frisked, and that race appeared to have
the strongest influence on the duration of the stop, with stops of Black drivers less likely to last
less than ten minutes. As to the methodology itself, Ridgeway noted that PSM provides “a
transparent, intuitive, and easily implemented method for assessing race” in outcome
determination.19
SPD also recognizes, however, that, as with any single method of data analysis, PSM has
limitations. In particular, PSM may result in false negatives for disparity. Because this report
found positive results for disparity, this concern is not currently an issue. However, in order to
prevent the possibility of false negatives in the second phase of this analysis (due to be filed in
December 2019), SPD commits to providing basic summary statistics without adjustment for
comparison groups, as well as standard (probit/logit/logistic/linear probability model)
16

As the Monitor explained, “The quantity of interest is the effect of a subject’s race or ethnicity on the likelihood
of experiencing certain post-stop outcomes, which can be estimated using marginal effects in the case of linear fixed
effects model, and average treatment effects for the matching models [such as that used here].”
17

Benedetto et al. (2018). Statistical primer: propensity score matching and its alternatives. Eur J Cardiothoracic
Surg. 53(6), 1112-1117 (arguing for application of PSM in observational studies where the number of outcomes is
low and the number of confounding variables high).
18

Higgins, G. E., Jennings, W. G., Jordan, K. L., & Gabbidon, S. L. (2011). Racial profiling in decisions to search: A
preliminary analysis using propensity-score matching. International Journal of Police Science & Management, 336-347.
19

Ridgeway, G. (2006). Assessing the effects of race bias in post-traffic stop outcomes using propensity scores.
Journal of Quantitative Criminology, 1-29.


regressions for comparative purposes, in order to help ensure against a misleading negative
result.

I.

Investigative Stops – Data

The data population selected for purposes of examining outcomes relating to investigative stops
comprised a total of 19,544 Terry stops reported over a two-and-a-half-year time (2016 through
mid-2018). Controlled variables were selected based upon the information available in the
Department’s Data Analytics Platform (DAP), which includes not only the information available
in the Terry templates but additional information relating to characteristics of the officer.
Variables that were controlled in the match are presented in Table 1. These variables were
selected for their theoretical effect on the outcome. Note: Again, a more complete discussion of
aggregated data associated with many of these variables is presented in the Department’s Annual
Stops and Detentions Reports for the years in question; the data itself is available for download
or exploration through the Department’s Terry Stops Dashboard.
Table 1: Controlled Variables – Stops

A. Event Data
Call Type:20 All activities begin with either a call from the public (call for service, or CFS) or an
officer-initiated action in response to activity or behavior they observe. Broadly, stops are
characterized (call-typed) as either dispatched or on-viewed, depending on how they originated.
In total, 68.5% of stops in this data set were associated with a CFS from the public; 27.5% were
associated with an on-viewed incident. In either case, whether reasonable suspicion for a stop
20

Approximately 4% (n = 793) of stops could not be related to the record used to identify call type. The SPD Computer
Aided Dispatch (CAD) system logs and tracks all calls, whether officer initiated or the result of a CFS. The logistic
regression underlying the PSM procedure requires either imputation of missing values or elimination of the
observation from the analysis. There is no way to reconstruct or otherwise impute the call type. The 793 stops with
missing CAD information (e.g. Call Type, Priority) were eliminated to control for missing data.


is relayed by a member of the public or a matter of officer observation, the decision of an officer
to initiate the stop is ultimately subject to officer discretion. Generally, the distribution of
dispatched and on-viewed stops remains stable in this data set when viewed across subject
perceived race (even in demographics represented at or below 5%, where lensing, or
exaggeration of a rate, is often a concern).
Priority: As part of the call taking and dispatch process, information available from the
community member calling (dispatch) or the officer (on-view) is used to determine both initial
call type and priority. While a total of 205 distinct case types were observed during the study
period, and while PSM can be a powerful tool for the balancing of many variables, for purposes
of assuring adequate match this project utilized a higher level of aggregation – call priority – as a
dimensionally scaled21 representation of case type.
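The events-per-variable (EPV) heuristic cited in the accompanying note suggests why 205 case types were aggregated. The sketch below is a worked illustration only. It assumes that the "event" count is the size of the smaller class in the propensity model (here, the 601 stops in the smallest treatment group reported in the matching section) and that a categorical variable with k levels consumes k - 1 indicator variables. Both are assumptions, not statements of how SPD counted.

```python
def events_per_variable(n_events, n_parameters):
    """EPV ratio: observations in the rarer class per model parameter."""
    return n_events / n_parameters


def max_parameters(n_events, min_epv=10):
    """Largest number of parameters that keeps EPV at or above min_epv."""
    return n_events // min_epv


smallest_treatment_group = 601       # figure reported in the matching section
case_type_indicators = 205 - 1       # one reference level dropped (assumption)

epv_with_case_type = events_per_variable(smallest_treatment_group,
                                         case_type_indicators)
parameter_budget = max_parameters(smallest_treatment_group)
```

Under these assumptions, case type alone would push the ratio to roughly 3 events per variable, well under the threshold of 10, while the whole model would have a budget of about 60 parameters.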
Call types are prioritized according to level of emergency. Priority 1 calls are incidents that
require an immediate response, including incidents that involve obvious immediate danger to
the life of a citizen or an officer. Priority 2 calls are noted as urgent, or incidents which if not
policed quickly could develop into a more serious issue (such as a threat of violence, injury, or
damage). Priority 3 calls are investigations or minor incidents where response time is not critical
to public safety. Priority 4 calls involve nuisance complaints, such as fireworks or loud music.
Priority 7 calls are officer-initiated events, such as traffic stops. Priority 9 is used to indicate
administrative tasks or downtime.
The clear majority (more than 80%) of active calls (events requiring some level of officer
response) were classified as either a Priority 1, 2 or 3. Priority 7 calls comprised just over 12% of
the sample. The remaining 7% of stops were observed across a variety of administrative types.
Sector: Sector is included as a unit of both patrol deployment and optimal22 representation of
geography. The SPD map is broken into five (5) Precincts, seventeen (17) Sectors and fifty-one
(51) Beats. Squads are deployed to Sectors and individual officers or teams of officers (two person
cars) are assigned to be primary in their “districts” or Beats. Sectors were found to represent the
geographic variability of the city and align with units of management under the patrol
deployment strategy.23 In addition to controlling for neighborhood level characteristics, the use
21

Caution must be exercised when matching too many variables. Risk of overfitting the model or regression to the
mean applies. The Event Per Variable (EPV) ratio of 10 heuristic was used. See Peduzzi, P., Concato, J., Kemper, E.,
Holford, T. R., & Feinstein, A. R. (1996). A simulation study of the number of events per variable in logistic regression
analysis. Journal of clinical epidemiology, 49(12), 1373-1379 (finding that for values of EPV “10 or greater, no major
problems occurred.”).
22

Beat, the most granular unit of patrol deployment, was piloted. R (the software package for running logistic
regressions for purposes of achieving matched sets) performed poorly rendering balance tables when Beat was
included. Sector, as the next highest level of aggregation was determined to be sufficient without sacrificing
geographic variability.
23

Similar to Ridgeway’s (2006) findings in Oakland, this study represents citywide measures of disparity. Although
Sectors represent squads, the PSM controls for squad level management variability. A study of disparity at various
levels of management and supervision may yield additional insight into individuals or groups of individuals.


of Sector balances the effect of temporary assignment to a different span of control (e.g., an
officer who is assigned to an administrative unit may work overtime in patrol; their administrative
assignment may not be relevant, but the geographic location of the stop and the temporary span of
control are).
Date/Time: Three date/time variables or combinations of variables can be related to stops. The
CAD system applies date/time stamps at several points as an event is processed (e.g. queued,
dispatched, first arrived, arrived, cleared, etc.). The officer indicates the “occurred” date and time
on the report used to document the stop, and the RMS logs the date/time the report is
submitted. Although the occurred date/time would ideally be split into its date parts for this
analysis, implementation of the original report template24 resulted in an unusable and
inconsistent format for occurred date and time (free text). Further complicating this analysis is
that “original time queued,” which is the preferred CAD event date/time, is often reflective of
the report of an offense or crime under investigation, but is not necessarily the time of the stop
itself. For this reason, date and time were matched according to reported date (year, month, day
of week) and Officer shift.
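As a minimal sketch of the date handling described above, the reported date can be split into year, month, and day of week and combined with the officer's shift to form a matching key. The function name, the key layout, and the shift label are hypothetical; the source states only which components were matched.

```python
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def match_key(reported, shift):
    """Split a reported date into its parts and attach the officer shift.

    reported: datetime.date on which the stop report was submitted.
    shift: label for the officer's shift (hypothetical labels here).
    """
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    return (reported.year, reported.month,
            DAY_NAMES[reported.weekday()], shift)


key = match_key(date(2017, 7, 4), "2nd Watch")
```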

B. Officer Demographics and Assignment
Officer Ordinance Title:25 During the study period, SPD employed 1,614 sworn officers; of those,
902 reported an investigative stop during this period. Over three-quarters (77.3%) of all stops
were reported by officers holding the ordinance title of “Police Officer,” followed by “Police
Student Officers.” Both Police Officers and Police Student Officers are most commonly assigned
to the Operations Bureau, in Precincts, in Patrol or 911 response functions. Probationary officers
- those within one (1) year of hire, between the end of Field Training (e.g. student officers) and
permanent status, comprised another 8.7%.
Officer Gender: Approximately 85% of sworn officers were male; 15% were female. Male
officers reported 88.9% of all stops; female officers reported 11.1% of stops. In eight (8) stops,

24

Versaterm, SPD’s current RMS, allows for customized “templates” to be created. Ad hoc implementations of these
templates are often limited in their ability to integrate with other RMS structures and utilities, in the fields available
(date/time stamps versus free text fields), and in the format of the report once processed and stored (i.e. xml). The
NRMS, scheduled to go-live on March 31st, 2019, natively integrates the investigative stop report as a “field contact”
report.
25

Ordinance titles are the City of Seattle permanent job titles for a position.


officer gender could not be accounted for, and those stops were accordingly omitted from the
analysis.
Officer Race: Officer race is obtained at the time of hiring and reported under the standard used
to transmit employment eligibility verification to the Department of Homeland Security (DHS),
Form I-9. Overall, nearly 80% of all stops were conducted by officers identifying as “White.”
Officers identifying as “Hispanic or Latino” accounted for approximately 5% of all stops. Black
officers accounted for almost 5% of stops. Asian officers accounted for almost 4% of stops. All
other officer race demographics accounted for less than 5% of stops. The same eight (8) stops
where the officer gender was missing were identified in the race demographic data and were
thus controlled.
Officer Years of Experience: Officer Years of Experience (YOE) is calculated as the difference
between the officer’s hire date and the date of the stop. The average26 YOE was 6.3 years, with an
asymmetrical distribution (SD = 7.19). Overall, 16% of all stops were conducted by officers with between one and
two YOE.
Officer Age: Officer age is calculated from officer date of birth (DOB) and the reported date of
the stop. The average age of officers reporting stops was 35, along a symmetrical, approaching
left-modal curve (SD = 8.5). Approximately one-third of all stops (30%) were reported by officers
between 30 and 35 years of age. Officers between 25 and 30 accounted for 23.2% of stops;
officers between 35 and 40 years of age accounted for 15.4% of stops. In all, officers between 25
and 40 reported nearly 70% of all stops.
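The two derived officer variables above are both date differences. The following sketch shows one way to compute them. The use of fractional years with a 365.25-day year is an assumption (the source does not say whether whole or fractional years were used), and the dates are hypothetical.

```python
from datetime import date


def years_between(start, end):
    """Elapsed time in fractional years, assuming a 365.25-day year."""
    return (end - start).days / 365.25


# Hypothetical officer record and stop date, for illustration only.
hire_date = date(2012, 3, 15)
birth_date = date(1983, 1, 10)
stop_date = date(2018, 6, 30)

years_of_experience = years_between(hire_date, stop_date)  # hire date to stop
officer_age = years_between(birth_date, stop_date)         # DOB to stop
```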
Reporting Squad by Bureau: The functional organization of the SPD contains seven levels of
management, each of which is defined by its relationship to or distance from the Chief of Police
(COP). The broadest level of structured management27 below the COP are the Bureaus, each of
which is led by an Assistant Chief. Precincts/Sections roll up under Bureaus and report to either
Captains or civilian managers. Within the Operations Bureau, out of which most stops are
reported, geographic precinct areas are administered by Captains; the Patrol/911 response

26

Average is reported here for information only. PSM is a non-parametric technique and is not dependent on
satisfaction of distributional assumptions.
27

Virtual structures exist immediately below the COP. These structures are led by the Deputy Chief and civilian
Executive Directors.


function, which maintains 24-hour coverage, is separated into Watches, led by Lieutenants.28
Squads are the most granular units of management and are administered by Sergeants.29
In total, 130 squads reported investigative stops during the study period. Of these, 78.3%
reported to the Operations Bureau; 10.1% reported to the Investigations Bureau; and 9.3%
reported to the Homeland Security and Special Operations Bureau.
With 130 squads, adding each squad as a discrete variable was impractical and tested the
demonstrated validity of the PSM method. Squad groups were accordingly constructed by
grouping squads that share similar functions to dimensionally scale the squad variable. Units
assigned to 911 response reported 87.6% of stops. Officers assigned to special
“beats” or emphasis patrols and bikes reported 8% of all stops. Tactical units, including the Anti-Crime Team (ACT) and SWAT, reported just 2.7% of stops; another 1.7% of stops were reported by
“Other” units, primarily units with an investigative function (Gang, Narcotics, etc.).
Officer Crisis Intervention Certification Status: In addition to descriptive characteristics of the
officer, officer certification as a member of the Crisis Intervention Team (CIT) was included in the
match. It was hypothesized that holding and maintaining certification as a member of the CIT
would constitute both an officer attitude and advanced training that would affect an officer’s
decision-making and thus the outcome of the stop. In 9,116 stops (51.7% of the total), the
reporting officer was CIT-certified.30

C. Subject Demographics
Under Washington State law, a community member who is subject to an investigative stop is not
generally required to identify themselves. By SPD policy, “officers may request identification;
however, subjects are not obligated to provide identification or information upon request.”
(Seattle Police Manual, 6.220). When reporting a stop, officers are accordingly asked to
28

Functional watches are differentiated from the temporal watch period. SPD maintains twenty-four-hour coverage
of 911 response and some specialty unit functions. 911 response / patrol squads (the most granular unit of
organization) are organized around six (6) nine-and-a-half (9.5) hour overlapping watches in three eight hour periods,
roughly: 0300 to 1100, 1100 – 1900 and 1900 – 0300, the next day (1st, 2nd and 3rd Watch, respectively).
29

Units are an intermediate level of management, between Precinct/Sector and Squad. Units are led by Lieutenants
and, while all watches correspond to Units, not all Units occupy a standard watch, as described above.
30

CIT certification is voluntary and requires 40 hours of specialized training and self-selection in the certification
group. Some officers may have received the specialized training but are no longer part of the CIT certification group.


document their “perception” of the race, age and gender of the subject they contact. When the
Terry template for reporting was constructed, it was believed that prompting the officers for their
perception would render a data point closer to their decision-making and could aid in analysis,
such as this. Confounding factors influence the consistency of this assumption, however; for
example, although the officer is being asked for their perception, their response may be
influenced by information they obtained during their investigation. An officer may have been
exposed to statements by the subject or been provided voluntarily with identification indicating
the subject’s self-reported identifying characteristics. If an investigative stop produces probable
cause leading to an arrest and booking, the officer may have confirmed identification of the
subject from biometric scans conducted while processing into a Department of Adult and Juvenile
Detention (DAJD) facility.
Overall, slightly more than 50% of all stops involved a subject the officer reported perceiving as
“White.” Subjects perceived to be “Black” comprised another 30.6%. All other perceived
demographic groups, comprising in total 13.2% of stops, each represented less than 5% of the
distribution. In 4.4% of stops, the officer reported the subject’s perceived race as “Unknown.”
Stops involving subjects reported to be of “unknown,” “missing” or “other” perceived race are
included in the analysis, but only as part of the White/Non-White binary group. See Table 2.
Blacks are vastly overrepresented in the stop data relative to the 2010 U.S. Census or the 2012-2016 American Community Survey, where they are estimated to comprise 7.7% and 7.0% of
Seattle’s population, respectively.31 While some of this 4-fold disparity in stop rates could be due
to non-racial factors, given the magnitude of the disparity it will receive considerable focus in the
next report.
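The "4-fold" figure follows from dividing the share of stops by the share of population. A short worked sketch, using only the percentages quoted above (the function name is illustrative):

```python
def representation_ratio(share_of_stops, share_of_population):
    """Ratio of a group's share of stops to its share of the population."""
    return share_of_stops / share_of_population


black_share_of_stops = 30.6  # percent of stops, as reported above

ratio_census = representation_ratio(black_share_of_stops, 7.7)  # 2010 Census
ratio_acs = representation_ratio(black_share_of_stops, 7.0)     # 2012-2016 ACS
```

The two benchmarks give ratios of roughly 4.0 and 4.4. This is an unadjusted comparison against residential population and does not account for any of the variables controlled in the match.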

Table 2: Stops by Perceived Subject Race

Subject Age: Officers indicate perceived subject age by category. Overwhelmingly, the majority
of subjects are perceived to be within the age brackets of 18 to 45 years of age, as shown in Table
3.
31

https://www.seattle.gov/opcd/population-and-demographics/about-seattle#raceethnicity


Table 3: Stops by Perceived Subject Age32

Subject Gender: Officers are provided the option to indicate perceived subject gender as male,
female, and unable to determine. Overwhelmingly, the majority of subjects are perceived to be
male. See Table 4.
Table 4: Stops by Perceived Subject Gender

II.

Investigative Stops – Matching

Five separate matches were conducted: White/Non-White33, White/Black (WB), White/Hispanic,
White/Asian, and White/American Indian Alaska Native. In each case, the match balanced the
variables across the classification (treatment/control), where a minority race (Asian, American
Indian Alaska Native, Black, Hispanic) is the treatment and majority race (White) is the control.
After controlling for missing data, 17,628 observations (94% of the universe) were involved in the
match. Valid, balanced matched populations were generated across all five treatment/control
conditions. See Table 5.
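"Controlling for missing data" here means dropping any stop that lacks a value for a matched variable, since the underlying regression cannot use incomplete rows. A minimal sketch of that complete-case filter follows. The field names and records are hypothetical and stand in for the variables listed in Table 1.

```python
# Hypothetical field names standing in for the matched variables.
REQUIRED_FIELDS = ("call_type", "priority", "officer_gender", "subject_age")


def complete_cases(records, required=REQUIRED_FIELDS):
    """Keep only records with a non-missing value in every required field."""
    return [r for r in records
            if all(r.get(f) not in (None, "") for f in required)]


# Toy records, for illustration only.
stops = [
    {"id": 1, "call_type": "dispatched", "priority": 2,
     "officer_gender": "M", "subject_age": "26-35"},
    {"id": 2, "call_type": None, "priority": None,      # no CAD record
     "officer_gender": "F", "subject_age": "18-25"},
    {"id": 3, "call_type": "on-view", "priority": 7,
     "officer_gender": "M", "subject_age": ""},         # age not indicated
]
usable = complete_cases(stops)
share_retained = len(usable) / len(stops)
```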

32

This table, and the analysis, exclude 518 stops where the subject age was not indicated.

33

The Non-White group includes subjects reported to be of “unknown,” “missing” or “other” perceived race by officers.


Table 5: Match Summary – Stops

Across the five observation groups, the match rate ranged between 80% and 99.5% of the
treatment groups’ representation in the population of complete observations (n= 17,628).
Logically, a smaller number of observations resulted in a reduced number of permutable
combinations of variables in smaller subsets (e.g. Asian, American Indian Alaska Native, Hispanic).
The availability of observed control permutations (e.g. stops involving control subjects) resulted
in a high match rate. Unsurprisingly, the highest match rate was observed in the American Indian
Alaska Native match, where only 601 control matches were possible. A nearly 15:1 control (n =
8,861) to treatment (n = 601) ratio resulted in 99.5% (n = 598) of stops involving a subject
perceived to be American Indian Alaska Native finding a control match. Predictably, the lowest match rate (80%) was
observed in the Non-White treatment.
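The match rate and the control-to-treatment ratio quoted above are simple quotients. The sketch below reproduces them from the reported counts (598 matched of 601 treatment stops, against 8,861 control stops); the function name is illustrative.

```python
def match_rate(n_matched, n_treatment):
    """Share of treatment observations that found a control match."""
    return n_matched / n_treatment


n_treatment = 601   # treatment stops in the match
n_control = 8861    # control (White) stops available
n_matched = 598     # treatment stops that found a control

rate = match_rate(n_matched, n_treatment)
control_to_treatment = n_control / n_treatment
```

The results, about 99.5% and about 14.7 to 1, agree with the figures in the text.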
The goal of the PSM is to replicate, to the extent possible, the controlling effect of randomization
by balancing the confounding effects of variables theoretically likely to affect an outcome.
Several “model fit” statistics have been proposed for PSM; however, generally the standardized
mean difference (SMD) is considered to be the most reliable diagnostic for assessing whether the
match achieved balance.
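For a single covariate, the standardized mean difference can be sketched as the difference in group means divided by a pooled standard deviation. The pooling formula and the 0.1 threshold used below are common conventions assumed for illustration; the source does not state which variant or cutoff was applied. The data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """(mean_t - mean_c) / sqrt((var_t + var_c) / 2), with sample variances."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    if pooled_sd == 0:
        # Degenerate case: both groups are constant.
        return 0.0 if mean(treated) == mean(control) else float("inf")
    return (mean(treated) - mean(control)) / pooled_sd


# Hypothetical officer years of experience in a matched sample.
treated_yoe = [2, 4, 6, 8, 10]
control_yoe = [2, 4, 6, 8, 11]

smd = standardized_mean_difference(treated_yoe, control_yoe)
is_balanced = abs(smd) < 0.1  # assumed threshold
```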
Table 6: Balance Summary - Stops

Table 6 shows a summary, for each treatment group (subject race), of the number of balanced
variables, the maximum balanced variable, and the adjusted difference. For each match, a
balance table was generated to assess the quality of the resulting matched population.


III.

Use of Force - Data34

During the study period, 4,371 uses of force were reported. More than three-quarters (76.8%) of
force was classified at the lowest level, Type 1. One-fifth of force (21.5%) was classified at the
next highest level, Type 2. Cumulatively, 98.3% of all force was reported at these two levels,
comprising complaints of “transitory pain” and firearms pointing (Type 1) and minor cuts, bruises
and strains (Type 2). 1.7% of force was reported as Type 3 or Type 3 – OIS (Officer Involved
Shooting). These highest classifications of force represent reports of serious injury and lethal or
potentially lethal force. See Tables 7a and 7b.
Table 7a: Use of Force by Year – Percent of Total

Table 7b: Use of Force by Year – Total Count

In contrast to the reporting of race in relation to investigative stops, the race of a subject in a use
of force report is, generally, not limited to the officer’s perception.35 Force often involves
physical restraint and some limited custody of the subject, if not a full booking into a King County
Department of Adult and Juvenile Detention (DAJD) facility. The process of booking a subject
involves reporting, to DAJD, a subject identity (e.g. the detainee’s self-report or government
34

A more complete discussion of the distribution of SPD’s use of force data is presented in the Department’s Annual
Use of Force Reports for the years in question, and the data are available for download or exploration through the
Department’s Use of Force Dashboard.
35
Where force is used to disperse a crowd, for example, or apprehend a fleeing subject believed to pose a danger,
there may not be an identifiable subject associated with the report.


identification). Additionally, DAJD may utilize biometric data (e.g. fingerprints) to confirm a
subject’s identity during intake.
Roughly 40% of all reported force involved a white subject; 32% of subjects were Black.
Approximately one-fifth (20.1%) of subjects were listed as “Not Specified” (in contrast to the
cumulative 5.7% of subjects of stops).36 This analysis treats Not Specified as a discrete treatment
group. Asian subjects (3.5%) were combined with Nat Hawaiian/Other Pac Islander (.8%) for
cross-analysis consistency, for a cumulative 4.4%. Hispanic subjects were indicated in 3.4% of
force.
Only demographics representing, cumulatively, more than 100 observations were handled as
treatment (match) groups. American Indian / Alaska Native subjects (AIAN) were involved in 37
(.8%) uses of force and are represented in the White/Non-White match, only.
Of the 3,357 Type I uses of force reported over the study period, 1,049 involved the pointing of
a firearm. A demographic breakdown of these uses of force is shown in Table 8.
Table 8a: Distribution of Force by Subject Race

Table 8b: Distribution of Type I, Firearm Pointing by Subject Race

Variables controlled for in the analysis are presented in Table 9; descriptions provided above
apply.
36

Unknown and Missing subjects were assumed to be minority and included in the Non-White treatment group.


Table 9: Controlled Variables – Use of Force

IV.

Data Matching – Use of Force

Propensity scores were generated using logistic regression for each treatment group (black,
Hispanic, Asian, and non-white) and matched to the nearest propensity score in the control group
(white). Five matches were conducted: white/non-white, white/black, white/not specified,
white/Hispanic, and white/Asian. Because of the extremely low number of available
observations, Native American/Alaska Native was not included as a treatment group. In total,
14,371 observations were involved in the match. Valid, balanced matched populations were
generated across all five treatment/control conditions. See Table 10.
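The score-generation step described above can be sketched with a small logistic regression fitted by gradient descent. This is an illustration in Python with a single hypothetical binary covariate. The learning rate, iteration count, and data are assumptions, and this is not the code used for this analysis, which a footnote states was run in R.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def propensity(weights, x):
    """Predicted probability of treatment; weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))


def fit_logistic(rows, labels, learning_rate=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent.

    rows: list of covariate lists; labels: 1 = treatment, 0 = control.
    """
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grads = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            error = propensity(weights, x) - y
            grads[0] += error
            for j, v in enumerate(x):
                grads[j + 1] += error * v
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, grads)]
    return weights


# Hypothetical data: one binary covariate, treatment indicator as the label.
rows = [[1], [1], [1], [0], [0], [0], [1], [0]]
labels = [1, 1, 0, 0, 0, 1, 1, 0]

weights = fit_logistic(rows, labels)
score_if_1 = propensity(weights, [1])
score_if_0 = propensity(weights, [0])
```

With one binary covariate, the fitted scores converge to the observed treatment shares in each covariate group (0.75 and 0.25 here). Each treatment unit would then be paired with the control unit whose score is closest.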
Table 10: Match Summary – Use of Force

Across the five treatment groups, the match rate ranged between 60% and 94% of the treatment
group’s representation in the population of complete observations (n= 14,371).
While not every factor was matched, every variable was balanced and no variables were found
to be out of balance, post-match. See Table 11.


Table 11: Balance Summary – Use of Force


RESULTS
Using PSM, with subject race as the treatment variable, SPD examined disparity in four separate
outcomes across two areas of police/community interactions. Outcomes examined in the
context of investigatory stops were (1) whether the subject was frisked (frisk rate); (2) whether
a weapon was recovered pursuant to a frisk (hit rate); and (3) the duration of the stop. The
outcome examined in the context of use of force was whether a subject had a firearm pointed
at them (Type I) during the encounter.

I.

Investigative Stops
A. Frisk Rate

An officer may decide to conduct a light search (“frisk”) for weapons of a subject of an
investigative stop whom they reasonably suspect to be “armed and presently dangerous.”
See Seattle Police Manual, 6.220.
During the study period, frisks were reported in 21.58% (4,220) of the 19,554 stops reported.
The distribution of frisks, across subject demographics, is presented in Table 12.
Table 12: Frisk Rate by Subject Race

A frisk rate comparison between control (white) and treatment (non-white) group subjects is
presented in Table 13.


Table 13: Frisk Rate Comparison (Treatment/Control)

Treatment Group                 Frisks –      Frisks –    Frisk Rate –   Frisk Rate –   Difference
                                Treatment     Control     Treatment      Control        (Points)
                                (Non-White)   (White)     (Non-White)    (White)
Black                           1,183         1,002       24.7%          20.9%          3.8
Hispanic                        229           177         27.2%          21%            6.2
American Indian/Alaska Native   127           122         10.6%          10.2%          0.4
Asian                           123           92          25.7%          19.2%          6.5
Non-White                       1,636         1,385       23.3%          19.7%          3.6

In the matched-set comparison of the decision to frisk a subject, some disparity can
be observed across all five (5) matched groups. Controlling for other variables, and although
Asians account for only a small fraction of stops overall, subjects perceived by the officer as Asian
were frisked approximately 6.5 percentage points more frequently than control (white) subjects.
Subjects perceived to be Hispanic were frisked 6.2 percentage points more frequently, and subjects
perceived to be black were frisked 3.8 percentage points more frequently. The lowest frisk
disparity was observed in subjects perceived to be American Indian/Alaska Native, at 0.4
percentage points.
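The distinction between a gap in percentage points and a gap relative to the control rate can be illustrated with a short calculation. The sketch below uses the frisk rates reported in Table 13; the function names are illustrative and are not part of SPD's analysis.

```python
# Frisk rates (percent) as reported in Table 13: (treatment, control).
FRISK_RATES = {
    "Black": (24.7, 20.9),
    "Hispanic": (27.2, 21.0),
    "American Indian/Alaska Native": (10.6, 10.2),
    "Asian": (25.7, 19.2),
    "Non-White": (23.3, 19.7),
}


def point_difference(treatment_pct, control_pct):
    """Absolute gap between the two rates, in percentage points."""
    return round(treatment_pct - control_pct, 1)


def relative_difference(treatment_pct, control_pct):
    """Gap expressed as a percentage of the control (white) rate."""
    return round(100.0 * (treatment_pct - control_pct) / control_pct, 1)


for group, (treatment, control) in FRISK_RATES.items():
    points = point_difference(treatment, control)
    relative = relative_difference(treatment, control)
    print(f"{group}: {points} points ({relative}% relative to control)")
```

A gap of a few percentage points can correspond to a much larger relative gap when the control rate is low, so it helps to state which measure is being used.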

B. Hit Rate (Weapon Recovery During a Frisk)
A hit rate comparison between control (White) and treatment (Non-White) group subjects is
presented in Table 14.
Table 14: Hit Rate Comparison (Treatment/Control)

Treatment Group                 Hits –        Hits –      Hit Rate –     Hit Rate –     Difference
                                Treatment     Control     Treatment      Control        (Points)
                                (Non-White)   (White)     (Non-White)    (White)
Black                           191           247         16.1%          24.7%          8.6
Hispanic                        42            40          18.3%          22.6%          4.3
American Indian/Alaska Native   26            42          20.5%          34.4%          13.9
Asian                           21            20          17.1%          21.7%          4.6
Non-White                       275           352         16.8%          25.4%          8.6


When frisked, White subjects (control group) were found with weapons (a “hit”) more frequently
than minority subjects (treatment group). Subjects perceived to be Asian were found with weapons
at a rate 4.6 percentage points lower than White subjects in the same situations. The largest
disparity in hit rate was found among subjects perceived to be American Indian/Alaska Native, who
were found with weapons at a rate 13.9 percentage points lower than White subjects.
While some caution is necessary given the relatively low number of observations, which follows
from the fairly low frisk rate within each treatment group overall, these findings are generally
consistent with those reported by the Monitor in the Tenth Systemic Assessment, which found that
despite being frisked at higher rates, Non-White subjects are less likely to be found in
possession of a weapon.
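As a check on how the hit rate relates to the frisk counts, the following sketch recomputes each hit rate as hits divided by frisks. It assumes that the denominators are the frisk counts in Table 13, an assumption that is consistent with the rates reported in Table 14; the names used are illustrative only.

```python
# (frisks, hits) per matched group, copied from Tables 13 and 14.
TREATMENT = {
    "Black": (1183, 191),
    "Hispanic": (229, 42),
    "American Indian/Alaska Native": (127, 26),
    "Asian": (123, 21),
    "Non-White": (1636, 275),
}
CONTROL = {
    "Black": (1002, 247),
    "Hispanic": (177, 40),
    "American Indian/Alaska Native": (122, 42),
    "Asian": (92, 20),
    "Non-White": (1385, 352),
}


def hit_rate(frisks, hits):
    """Percentage of frisks in which a weapon was recovered."""
    return round(100.0 * hits / frisks, 1)


def hit_rate_gap(group):
    """Control hit rate minus treatment hit rate, in percentage points."""
    treatment = hit_rate(*TREATMENT[group])
    control = hit_rate(*CONTROL[group])
    return round(control - treatment, 1)


for group in TREATMENT:
    print(group, hit_rate(*TREATMENT[group]), hit_rate(*CONTROL[group]),
          hit_rate_gap(group))
```

Because the smaller groups have only a few dozen hits, a change of a handful of recoveries would move these rates by several points, which is the reason for the caution noted above.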

C. Stop Duration
Officers note the time a subject was detained in a dropdown box on the stop report. These data
are not ideal for analysis; the bins are not equal (some span five (5) minutes, some span ten
(10) minutes, and some are indefinite), and an officer’s perception of time is subject to many
factors. SPD’s new RMS system, which again comes online this spring, captures the beginning
and end time of the “seizure,” thus enabling more accurate future analyses.
For the purposes of this analysis, a weighted mean was calculated by finding the midpoint of each
bin, multiplying it by the number of observations in that duration group, summing the products
across bins, and dividing by the total number of observations.
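The weighted-mean calculation can be sketched as follows. The bin boundaries and counts in the example are hypothetical, since the dropdown values are not listed in this report, and an indefinite bin would need an assumed upper bound before a midpoint could be taken.

```python
def weighted_mean_duration(bins):
    """Estimate mean stop duration from binned counts.

    bins: list of (low_minutes, high_minutes, count) tuples. Every
    observation in a bin is treated as falling at the bin midpoint.
    """
    total = sum(count for _, _, count in bins)
    if total == 0:
        raise ValueError("no observations")
    weighted_sum = sum(((low + high) / 2.0) * count
                       for low, high, count in bins)
    return weighted_sum / total


# Hypothetical bins and counts, for illustration only.
example_bins = [(0, 5, 40), (5, 10, 30), (10, 20, 20), (20, 30, 10)]
print(weighted_mean_duration(example_bins))
```

Because each stop is placed at its bin midpoint, the estimate is sensitive to the width of the bins and to whatever bound is assumed for an indefinite bin.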
Table 15: Weighted Mean Duration Comparison

Treatment Group                 Treatment        Control          Difference
                                Average Time     Average Time
                                (minutes)        (minutes)
Black                           11.0             11.3             -0.1
Hispanic                        15.4             15.6             -0.1
American Indian/Alaska Native   10.9             10.5             0.4
Asian                           11.1             11.9             -0.8
Non-White                       11.1             10.8             0.3

II. Use of Force – Firearm Pointing (Type I)

The decision to point a firearm can depend on many things – and in some circumstances, as a
matter of training, it is required. However, firearm pointing is also closely related to officer
decision-making and is near the genesis of the force event. The officer will decide to point a
firearm, for example, based on a perception of an imminent threat of danger from the subject, if
they believe the subject to be armed, or in response to a set of conditions calling for the use of
lethal cover (felony vehicle stops, for example). Subject race is not a conscious part of that
calculus; no training or policy allows an officer to judge the safety of a contact based on the race
or perceived race of the subject. Subconsciously, however, race may influence an officer’s
perception of safety, and thus the decision to use lethal cover. For this reason, the pointing of a
firearm was also selected as an outcome measure for analysis.
The PSM match procedure balances the observations within an area of common probabilistic
support, but the quality of the inference relies on the appropriate selection of variables used in
the match. While this procedure attempted to match the type of call the officers were
responding to (priority and call type), it was unable to match on all 205 detailed call types without
exceeding the demonstrated performance of the PSM procedure. Priority and Call Type
(on-view/dispatch) were thus selected as a dimensionally scaled representation of the nature of the
call. Neither priority nor type is exclusive of the other or of Call Type Initial, and so they are
partially independent.
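To illustrate what matching within an area of common support means in practice, the sketch below pairs each treatment record with the control record that has the closest propensity score, and leaves a treatment record unmatched when no control falls within a tolerance (a "caliper"). This section does not specify the matching algorithm SPD used, so the greedy 1:1 approach, the caliper value, and all names and scores here are assumptions made for illustration.

```python
def match_nearest(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, controls: dicts mapping a record id to a propensity score.
    Returns a list of (treated_id, control_id) pairs. Treated records
    with no control within the caliper are left unmatched.
    """
    available = dict(controls)
    pairs = []
    # Process treated records in score order so results are deterministic.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs


# Hypothetical propensity scores, for illustration only.
treated = {"t1": 0.30, "t2": 0.62, "t3": 0.95}
controls = {"c1": 0.28, "c2": 0.60, "c3": 0.10}
print(match_nearest(treated, controls))
```

In this example, record t3 has no comparable control and is dropped, which is how restricting to common support reduces the sample. Adding more matching variables, such as all 205 detailed call types, tends to leave more records without a close match.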
The comparison between groups with respect to the pointing of a firearm is presented in Table
16.
Table 16: Comparison of Treatment Groups – Pointing of a Firearm

Treatment Group   Pointing –    Pointing –   Pointing Rate –   Pointing Rate –   Difference
                  Treatment     Control      Treatment         Control           (Points)
                  (Non-White)   (White)      (Non-White)       (White)
Non-White         384           291          12.2%             9.3%              2.9
Black             290           203          13.8%             9.7%              4.1
Asian             47            33           14.2%             10%               4.2
Not Specified     149           184          12.4%             10.1%             2.3
Hispanic          54            29           19.4%             10.4%             9.0

There is notable disparity in the rates at which a firearm is pointed, with the caveat
that firearm pointing occurs in a low percentage of incidents overall. The greatest disparity in
firearm pointing was observed with Hispanic subjects, who had a firearm pointed at them in 19.4%
of Type I force cases in which they were involved, compared to Whites, who had a firearm pointed
at them in 10.4% of similar incidents. Subjects whose race was not specified showed the lowest
disparity in firearm pointing, 2.3 points. However, a preliminary review of associated use of force reports
reflects that a majority of these instances occur in circumstances (e.g. felony warrant arrests,
stolen vehicle (high risk felony) stops, shootings) where the pointing of a firearm is less a matter
of officer discretion than a trained tactical expectation. Potential related factors not accounted
for in the model, such as known suspect criminal history, local crime trends, and
statements/behaviors expressed during the incident will be examined in the Phase II report.


CONCLUSION
This report is the first in a two-part series called for under the Sustainment Plan for SPD to further
review disparity in its stops and detentions. In this report, SPD built upon the good work done
by the Monitoring Team in the Tenth Assessment and employed the technique of Propensity
Score Matching to account for potential confounding variables inherent in investigative stops to
better understand the disparity observed with respect to frisks, hit rate, and duration of stop.
This report also meets one of the Monitor’s recommendations in the Ninth Systemic Assessment
on Use of Force in that it applies a more sophisticated approach to examining disparity in certain
force outcomes – the decision to point a firearm.
Our main findings are that there are racial disparities in all three of the outcomes examined.
Minorities are frisked at a rate 15% higher than whites, even when using PSM to match on
observable characteristics. For Hispanics, the frisk rate is 30% higher than for whites. Although
minorities are stopped and frisked at higher rates, weapons are found 1/3 less frequently in frisks
of minorities, relative to the matched sample of white detainees. Additionally, minorities have
firearms pointed at them by police far more frequently (30%) than white detainees.
In the second disparity report, due to be filed in December 2019, SPD anticipates using additional
enhancements to the data set to allow for an even more precise estimation of disparity in the
areas identified by this report: namely, frisk hit rates and the pointing of a firearm. SPD is
presently evaluating additional variables that it might factor into the next iteration of this review,
to be presented in the Phase II report later this year. In light of continued improvements
in SPD’s data collection and analytic capabilities, SPD anticipates potentially being able to include
data regarding recent crime trends in a neighborhood, officers’ knowledge of a subject’s history
and prior weapon-carrying behavior, and/or the behaviors/statements exhibited during the
encounter.
SPD also plans to engage the services of a national expert on measuring, analyzing, and
responding to disparity in law enforcement practices. This work will be informed by the findings
of the Phase II report. This partnership will be used to create, test, and implement a structured
instrument for turning such data sources as behaviors and statements captured in body-worn
video, officer narratives, and victim statements, into quantitative and qualitative data that can
be analyzed in aggregate. SPD invites the input of the Office of the Inspector General and CPC in
this process.
As more information becomes available about the dynamics of these interactions, it is SPD’s
intent to be able to isolate the key situational differences and begin the difficult work of
determining which law enforcement and other social policies and practices could be adjusted to
reduce the disparity. Ultimately, SPD views this report as another powerful statement that it
is an innovative organization committed to self-evaluation, continuous improvement, and
data-driven decision-making, and one that has earned recognition as a national leader in
policing.
