New College of Florida
Natural Sciences Division
5800 Bay Shore Road, Sarasota, Florida 34243-2109
941-487-4370 | Fax: 941-487-4396

March 8, 2017

Chief Judge Charles Williams
Twelfth Judicial Circuit Court of Florida

Dear Chief Judge Williams,

I write to report on the methodology used in the recent Sarasota Herald Tribune series entitled "Bias on the Bench." The series claims to have analyzed more than 84 million criminal records in reaching the conclusion that many judges in the Florida Criminal Justice System sentence black defendants to significantly longer periods of incarceration than white defendants for the same crime. While an analysis of bias in the Florida Criminal Justice System is long overdue, I have concluded that the methodology employed by the Herald Tribune is deeply flawed and that their conclusions cannot be adopted with confidence. What the Herald Tribune has done, however, has the potential to inform future work intended to identify and remove bias. Accordingly, I offer more detailed comments below.

The Herald Tribune built a database of records obtained from the Florida Offender Based Tracking System (OBTS) and the Florida Department of Corrections (DOC) spanning the years 2004-2016. In making pronouncements regarding bias on the part of individual judges, they restricted attention to third degree felonies, for which they produced rough statistical aggregates. My criticisms of their approach are primarily of two kinds: shortcomings involving transparency and shortcomings involving control.

Because the charge of bias is leveled at individuals and is serious, it is essential that the methods by which conclusions are reached be transparent. As a minimum standard, the Herald Tribune should have provided

- Access to the raw data used to come to the conclusion;
- The software and code used to clean and analyze the data;
- A discussion of how the results of the analysis are being interpreted;
- A measure of confidence in their conclusions.

Because the Herald Tribune did not provide this information, it is impossible to know precisely what they did, let alone check the validity and reproducibility of their results. Starting at the top, without access to the data it is impossible to assess the quality of the data, and the quality of the data limits the confidence one might have in any subsequent conclusion. The data required extensive cleaning, a process that involves making decisions as to what part of the data is retained and interpreted as part of a judge's sentencing record. Without a detailed description of how this was done, it is not possible to provide any measure of confidence in what constitutes the record, and without knowing the record it is not possible to assess in what sense it is biased. It appears that the Herald Tribune relied on some type of averaging, but without precise information about what constitutes a record, it is impossible to compute an average, let alone determine whether the particular average computed is a good representation of the behavior it purportedly describes. Each bullet point thus represents a serious problem that must be addressed in order for one to independently make sense of and assess the Herald Tribune's claims.

In addition to problems of transparency, there are a number of problems involving control, a technical term that refers to the extent to which the measurements (sentence length) can be attributed to a variable of interest (an individual judge's bias).
My most serious objections to the Herald Tribune's methodology from the point of view of control are

- Failure to control for confounding by plea deals;
- Failure to control for time-dependent guidelines and sentencing norms;
- Failure to control for strong coupling of charges in the computation of aggregates.

Each point above describes a well-known feature of the criminal justice system. For example, it is well known that upwards of 95% of all sentences involve no trial, and that in the overwhelming majority of such cases a plea deal has been reached independent of the judge in whose court the plea is formally accepted. Not controlling for this variable and subsequently ascribing bias to the judge in their sentencing is, at face value, a very bad idea: at best it misconstrues how the criminal justice system works; at worst it lays the blame for possible bias on the part of other agents in the criminal justice system at the feet of the judge who happened to be the last stop in the criminal justice system before incarceration. In both cases, the net outcome may well be a failure to recognize where reform would do the most good. The other two bullet points similarly undermine confidence in conclusions: in the case of time-dependent guidelines and sentencing norms, it is well known that the war on drugs took, and continues to take, a disproportionate toll on different ethnic groups with different intensities over the course of the last twenty years. Comparing records that terminate in 2004 with records that terminate much later therefore introduces a confounder that undermines confidence.

These concerns are not just theoretical, nor are they exhaustive: I have examined the OBTS data for the Twelfth Circuit and witnessed instances where each of the above problems (and a number of others) surfaced (e.g., the data includes missing fields and incorrect entries, the data required extensive cleaning, the structure of the distribution of felony charges by degree and race is completely different when accounting for plea deals, for cases involving multiple charges versus cases involving a single charge, and for judges whose records terminate in 2004 compared to judges active from 2004-2014, and small-sample problems arise when seeking to determine the same or sufficiently similar crimes). These observations undermine confidence in the Herald Tribune's conclusions, and, in particular, the charge of individual bias is unsupported.

While the claims in the "Bias on the Bench" series are unsupportable, it is clear that the Herald Tribune has done a valuable service in calling attention to the existence of bias in the criminal justice system and in putting together a database intended to address the problem. Hopefully, the latter could be used as a springboard to provide a more comprehensive, methodologically sound investigation of the Florida Criminal Justice System with an eye toward improving the status quo.

Sincerely,

Patrick McDonald
Director of Data Science
Professor of Mathematics


Preliminary Bias Report

C. Dowdy, C. Edelson, C. Leonard and P. McDonald
February 17, 2017

Introduction

Our criminal justice system is sustained by the belief that all citizens are to be afforded fair and equal treatment under the law. It is no secret, however, that our prison population is skewed by race: people of color are over-represented as a proportion of those incarcerated.
Investigating why this is the status quo and how the status quo can be addressed is an important and difficult problem, one deserving careful attention.

Recently, the Sarasota Herald Tribune ran a four-part series addressing the issue of bias in the Florida Criminal Justice System. Their series, entitled "Bias On The Bench", lays the blame for the status quo squarely at the feet of Florida's judges.1 The series purports to be data-driven, the result of a year of work involving millions of cases. It names names: the series ascribes biased sentencing behavior to specific judges and in so doing impugns their credibility and the credibility of the system they represent. It does not provide the raw data from which the conclusions are drawn, it does not provide the algorithms or software used to clean and process the data, and it does not provide a clear discussion of how the processed data is being interpreted. Because the charge of bias is serious, it is essential that it be well founded.

In this note we review the data associated to two of the judges named in the Sarasota Herald Tribune series, Judge Lee Haworth and Judge Charles Williams. We provide both a description of the data used for our study and a link to the raw data. We provide a description of how we process the data and the software and algorithms we use. We provide a discussion of how we interpret the results of our processed data. Our report is thus fully reproducible: anyone who downloads the data can check the results we report. We have constructed the code to be reusable and we have provided an example of how those interested might use our tools to investigate data associated to other judges. Our main conclusions are as follows:

- Including data involving cases where a plea bargain is accepted undermines any confidence one might have in the inferences drawn regarding bias in the sentence (see Figures 1 and 2).
- Inferences involving bias on the part of a sentencing judge that are based on data aggregated on felony degree can carry no confidence (because such aggregation treats each felony charge as an independent event).
- Identifying a common notion of "same crime" can lead to problems of small sample size (see Figures 6, 7, and 8). A clear statement of the protocol used to treat such problems, and transparency with processing, is required for there to be any confidence in inferences drawn.

Because the Sarasota Herald Tribune series did not describe precisely how they treated the data, reproduction of their results is not possible. It appears that their analysis did not isolate the data involving plea bargains. It also seems to have relied on precisely the aggregate measures that cannot provide inferences that can be trusted. In trying to reproduce the Tribune's results for the data involving Judge Williams, we computed the aggregate statistics suggested in the Herald Tribune series (see Figure 4). As is clear from the visualization, there is no evidence in the data we examined that, for defendants tried and found guilty of the same crime, Judge Williams assigns longer sentences to black defendants than to white defendants. We provide an analysis of the data for Judge Haworth with analogous findings.

The remainder of this report is structured as follows. In the next section we discuss the data with which we work, the software we use, and the initial data processing protocols.
Following that we provide a discussion of what constitutes bias in sentencing, a discussion which drives the code that is subsequently constructed. Next, we provide code which produces a preliminary analysis of the subset of the data corresponding to observations related to Judge Williams. Following this we produce code which allows us to analyze data involving length of sentence. The next section involves a discussion of how to construct identifiers for cases which might provide a basis for establishing meaningful measures of bias, and, more importantly, why this is necessary. A second example involving data associated to Judge Haworth follows and indicates how the code for Judge Williams can be used to validate other claims made in our analysis. We conclude with a discussion of future work.

Data, software and initial processing

Data maintained by the Offender Based Transaction System (OBTS)2 was obtained for cases heard in the Florida Twelfth Circuit via a Freedom of Information Act request. The raw data consisted of a PostgreSQL database. A separate communication provided a data dictionary (OBTS Criminal Justice Data Element Dictionary). The data dictionary included a description of the 105 features (columns) associated to each of the roughly four million observations (rows) contained in the database.

The raw data was provided in the form of a compressed SQL dump file, so recovering the database followed the typical procedure outlined below. We begin by creating an empty database to be used during the restoration process:

    CREATE DATABASE obts;

Once the empty database has been created, we can exit PostgreSQL and use the utility program pg_restore to facilitate the recovery from our dump file:3

    pg_restore -d [database name] [dump file name]
    pg_restore -d obts obts_back.psql

With the database recovered, we subset the records such that the judge at the time of sentencing was Charles Williams. To account for clerical errors upon data entry, we ran the following query before proceeding:

    SELECT "SP_JudgeatSentencing", count(*) AS "Frequency"
    FROM obts_data
    WHERE "SP_JudgeatSentencing" ILIKE 'williams%'
    GROUP BY 1;

The results of the query are shown below:

    SP_JudgeatSentencing    Frequency
    WILLIAMS, CHARLES I     18574
    WILLIAMS, CHARLES       29537

There appear to be two variations for our judge of interest; we will address this redundancy later in our analysis. Finally, we exported all of the records involving Charles Williams to a CSV file to be further processed using R. The syntax required to export the results of a query to a CSV file is shown below:4

    \copy (SELECT * FROM obts_data WHERE "SP_JudgeatSentencing" ILIKE 'williams%') to [output file] with CSV HEADER

The quality of the raw data leaves much to be desired. A quick scan reveals a number of missing values, fields where values have been incorrectly entered, fields in which information has been hashed, and fields where coded information might be coerced and corrupted unless care is taken in processing. Following the above protocol, all data corresponding to Charles Williams and Lee Haworth was extracted as a CSV file and saved to disk (Judge Williams appeared under two names, Judge Haworth under three). While this data contained a number of missing values, the fields coding for length of confinement were intact. This data was the starting point of the initial analysis.

To process the data we employed the statistical programming language R and the RStudio IDE. Both R and RStudio are available for free download.5 They represent the state of the art for statistical processing packages.
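As a point of reference, loading one of the exported files into R might look like the following minimal sketch; the file name williams_obts.csv is a placeholder for whatever name was supplied to the \copy command above, not the name used in our workflow.

    ## Read the exported OBTS records into R.
    ## NOTE: "williams_obts.csv" is a placeholder file name; adjust the path to
    ## match your own export.
    library(readr)

    williams_raw <- read_csv("williams_obts.csv")

    ## The export should contain one row per OBTS observation and 105 columns.
    dim(williams_raw)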
Thus, we begin with raw data consisting of 68286 observations, each exhibiting 105 features.

Defining the problem to be studied: What constitutes bias?

The initial analysis sought to address the hypothesis "Judge Williams' sentencing record indicates a bias against black defendants." To address the hypothesis we were forced to clarify a number of points:

1. What constitutes Judge Williams' sentencing record?
2. What constitutes racial bias?
3. What constitutes racial bias on the part of Judge Williams?

To address the first point, we followed the Sarasota Herald Tribune's lead and focused on observations involving felonies. For our initial study we limited our analysis to those observations for which there was a trial, the trial involved a felony, the felony was committed by an adult, and the trial resulted in a guilty outcome.

"Bias" is a charged term deserving careful treatment. We use the word "bias" to mean "a pattern of behavior for which the same crime is treated differently and the difference is causally related to race." Causal relations are the gold standard in matters statistical and they are notoriously difficult to establish. As our analysis is preliminary, we instead investigate the correlation between race and length of sentence for the same crime, noting that if there is a causal relationship, the data will exhibit a correlation between race and sentence length. In particular, if there is racial bias in sentencing, we would expect to see a significant difference in mean sentence length, and there are standard statistical tools to test such a proposition. Because we have a complete record of the trials over which Judge Williams presided, we can augment these rough measures with a more nuanced investigation of the data.

Given the above, we can formulate what we mean by "racial bias on the part of Judge Williams": to demonstrate racial bias, we will study cases in which Judge Williams is free to exercise discretion in sentencing, and we will require that there be a significant pattern in which Judge Williams sentences black defendants to longer sentences than he sentences white defendants for the same crime.

Defining what is meant by "the same crime" is a challenging problem that requires a trade-off between granularity in the information that is being processed and the creation of general tools designed to be used to study the sentencing behavior of any judge in the Florida Criminal Justice System. As a first pass, one might consider aggregating observations according to charge and degree. There are serious problems with such an approach: there are cases for which the charges against the defendant are not restricted to the aggregate classes (for example, cases in which a defendant is charged with multiple felony counts of different degree). It is obvious that such cases do not involve "the same crime" as cases involving a single charge, and therefore drawing inferences on data that has been aggregated on charge and degree does not necessarily provide information on bias as defined above. To address this feature of the data would require the construction of unique case identifiers, that is, collections of observations in the data that define unique cases; a sketch of the idea follows.
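The following is a minimal sketch of what such a construction might look like, not the construction we ultimately intend to use. It assumes the cleaned data frame Williams_clean built later in this report, and the column names OBTSNumber, CourtDocketNumber, SentenceDate and FELONY_DEGREE are placeholders for the corresponding fields in the data dictionary.

    ## Sketch: group observations into candidate "cases" using the identifiers
    ## that tie charges to a single proceeding, then record each case's charge
    ## profile so that cases with identical profiles can be compared.
    ## Column names below are placeholders for the data dictionary fields.
    library(dplyr)

    cases <- Williams_clean %>%
      group_by(OBTSNumber, CourtDocketNumber, SentenceDate) %>%
      summarise(
        n_charges   = n(),                                          # charges in the case
        profile     = paste(sort(FELONY_DEGREE), collapse = "-"),   # e.g. "S-T-T"
        race        = first(RACE),
        total_years = max(STOTAL),                                  # sentence length attached to the case
        .groups     = "drop"
      )

    ## Cases sharing the same profile are candidates for "same crime" comparisons.
    cases %>% count(profile, race)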
Once unique cases have been defined, cases which are the same (for example, involve the same charges and counts) can be compared. While these constructions are beyond the scope of this preliminary analysis, they are necessary to establish a claim of bias in sentencing.

Initial data processing: studying felony and degree

Having established what we mean by bias, we turn to the data. We refer to each row of the data as an observation. Each observation has associated values for 105 features (though some rows may not contain values for all features). It is important to realize that it is very rare for an observation to correspond to a crime or a trial: crimes and trials almost always involve more than one observation. Sentences are associated to trials, which are collections of observations defining a crime. Our goal is to produce collections of trials that lend themselves to an analysis that addresses whether or not there is bias in the sentencing record of Judge Williams. This will require a great deal of work.

Our first course of action is to replace column numbers with names appearing in the data dictionary:

    ## change column names

Next, we filter the data to obtain the records for which Charles Williams is the sentencing judge, the prosecutor files charges, the defendant is not a juvenile, the crime is a felony and the defendant is found guilty.

    ## The processing below uses the dplyr and ggplot2 packages.
    library(dplyr)
    library(ggplot2)

    ## remove white space
    Data[, 89] <- trimws(Data[, 89])   # strip stray whitespace from the judge name field
                                       # (the specific call is assumed; the original is garbled)

    ## initial data filter:
    ## first reduce to data for Charles Williams, then reduce to prosecutor files
    ## charges, then reduce to "not juvenile", then reduce to felony charges,
    ## then reduce to guilty outcomes
    Williams_data <- Data %>% filter( ... )   # conditions use the coded fields named
                                              # in the data dictionary

    ## check size
    dim(Williams_data)
    ## 18301 105

Our initial filtration results in 18301 observations of 105 features. We can produce a preliminary view of the data:

    ggplot(Williams_data, aes(x = FELONY_DEGREE, fill = RACE)) +   # FELONY_DEGREE is a placeholder name
      geom_bar() +
      facet_grid(TRIAL ~ .) +
      labs(title = "Judge Charles Williams: Distribution of felonies by degree, faceted on trial") +
      labs(y = "Number of records") +
      labs(x = "Degree of felony") +
      labs(caption = "Figure 1: Comparison of cases involving trials (2 and 3) and cases involving no trial (1)")

[Figure 1: Comparison of cases involving trials (2 and 3) and cases involving no trial (1). Stacked bar chart of the number of records by degree of felony, faceted on trial type.]

Figure 1 provides a look at how the felony data is distributed across degree (F - first degree felonies, L - life, P - first degree felonies punishable by life, S - second degree felonies, T - third degree felonies), faceted on trial (1 - no trial occurred, 2 - trial by jury, 3 - trial by judge). The top plot, labeled 1, displays the data on race in the case that no trial occurred; the middle one, labeled 2, displays the data when there was a jury trial; and the lower plot, labeled 3, displays the data when the judge tried the case. There is a common scale for the plots. The relative size of the boxes in the top plot (no trial) versus the boxes in the lower two plots gives an accurate visual display of the collapse in the data: it is hard to detect the boxes in the middle and bottom plots because the amount of data in the top plot far, far exceeds the amount of data in the other two. The plots include a description of how the felony data is distributed according to race. We note that for third degree felonies (the largest of the categories for plead cases), white defendants dominate the observations.
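As a cross-check on the visual impression from Figure 1, the underlying counts can be tabulated directly; the sketch below uses the same placeholder column names as the plotting call above.

    ## Tabulate observations by trial type and felony degree (wide layout).
    ## TRIAL and FELONY_DEGREE are the placeholder names used above.
    library(tidyr)

    Williams_data %>%
      count(TRIAL, FELONY_DEGREE) %>%
      pivot_wider(names_from = FELONY_DEGREE, values_from = n, values_fill = 0)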
While it is important to keep in mind that we have yet to construct a unique identifier for each case, it is possible to get some idea of how much of the data involves plea bargains, in which the judge has minimal discretion with respect to sentencing. To do so, we filter on cases that involved a trial. The data dictionary indicates that the value 1 in the TRIAL field indicates that no trial occurred.

    w_data <- Williams_data %>% filter(TRIAL != 1)
    dim(w_data)
    ## 544 105

We call attention to how the data collapses when one restricts to cases that were tried: there are 544 observations remaining from the 18301 observations with which we began (i.e., we are working with about 3 percent of the felony data). This is only part of the story: when restricting to cases in which a trial occurred, the distributions by race change dramatically:

    ggplot(w_data, aes(x = FELONY_DEGREE, fill = RACE)) +
      geom_bar() +
      labs(title = "Judge Charles Williams: Felonies by degree for observations with trial") +
      labs(y = "Number of records") +
      labs(x = "Degree of felony") +
      labs(caption = "Figure 2: Stacked barplot giving distribution of observations involving felony trials by race")

[Figure 2: Stacked barplot giving the distribution of observations involving felony trials by race.]

Figure 2 provides detail that is suppressed when including data involving plead cases. There are two important features to note:

1. The number of third degree felony observations relative to the number of observations of either first or second degree felonies has changed dramatically.
2. The relative distribution by race of third degree felony observations has changed dramatically.

These observations and the questions they engender suggest that a very careful treatment of the data is in order. We begin by including information involving length of confinement.

Length of confinement

The length of confinement is coded as a fourteen character sequence. This sequence is comprised of two subfields of length seven. The first subfield codes for the length of the minimal time to be served; the second for the maximal time to be served. For each subfield,

- The first three characters represent the number of years in the sentence.
- The fourth and fifth characters represent the number of months in the sentence.
- The last two characters represent the number of days in the sentence.

We will focus attention on the second subfield. It is easy to check that all of the data is formatted correctly. The data dictionary indicates that when length of confinement is not applicable, the code "8888888" is to be entered. We count and remove these instances.

    ## Trim to work with the second subfield
    ## (CONFINEMENT is a placeholder name for the length-of-confinement field)
    w_data$CONFINEMENT <- gsub("^.......", "", w_data$CONFINEMENT)   # drop the first seven characters

    ## Deal with the NA data by building a tag
    ## Write a function to generate a logical vector that marks "8888888"
    mark_na <- function(x) {
      as.character(x) == "8888888"
    }

    ## Use the function to mark the NA observations

Thus, we have eliminated 59 observations in which the length of sentence field recorded "Not Applicable." Next, we isolate death sentences. From the data dictionary, observations involving death sentences correspond to codes of the form "9999999":

    a <- sum(w_data$CONFINEMENT == "9999999")
    sprintf("There are %s observations involving death sentences.", a)
    ## "There are 0 observations involving death sentences."

This indicates that there are 0 observations related to death sentences recorded in the data involving Judge Williams.
According to the data dictionary, observations associated to life sentences are coded as "9999998":

    a <- sum(w_data$CONFINEMENT == "9999998")
    sprintf("There are %s observations involving life sentences.", a)
    ## "There are 2 observations involving life sentences."

This indicates that there are 2 observations associated to life sentences recorded in the data involving Judge Williams. We record the felony degree and race corresponding to the life sentence data, as well as the associated OBTS number and docket number:

    ## OBTSNumber and CourtDocketNumber are placeholder names for the
    ## corresponding data dictionary fields
    LIFE <- w_data %>% filter(CONFINEMENT == "9999998")
    a <- n_distinct(LIFE$OBTSNumber)
    b <- n_distinct(LIFE$CourtDocketNumber)
    sprintf("There are %s distinct OBTS numbers and %s distinct court docket numbers associated to life sentences.", a, b)
    ## "There are 1 distinct OBTS numbers and 1 distinct court docket numbers associated to life sentences."

What this means is that the two observations in the data corresponding to life sentences are associated to a single OBTS identification number and court docket number. Thus, the two observations involve the same trial, illustrating, as acknowledged above, the need to find unique identifiers for cases before making inferences involving the data.

The entry "9999998", if converted to years, is larger than 999. We can filter the data for other observations with sentences in excess of 500 years:

    LONG <- w_data %>% filter(as.numeric(substring(CONFINEMENT, 1, 3)) > 500)
    a <- n_distinct(LONG$OBTSNumber)
    b <- n_distinct(LONG$CourtDocketNumber)
    c <- n_distinct(LONG$SentenceDate)
    sprintf("There are %s distinct OBTS numbers, %s distinct court docket numbers and %s distinct sentence dates associated to sentences in excess of 500 years.", a, b, c)
    ## "There are 2 distinct OBTS numbers, 2 distinct court docket numbers and 2 distinct sentence dates associated to sentences in excess of 500 years."

These results indicate that the observations not involving the previously discovered life sentence are all tied to the same OBTS number, the same court docket number and the same sentencing date. It is possible that Judge Williams sentenced the defendant in question to 999 years (as indicated in the data). It is also possible that there was an error in data entry. In either case, if we filter the data to study sentences less than 500 years, we know that we lose two cases amounting to 9 observations.

To continue our initial investigation of length of confinement, we write a function which converts the length of confinement data to years. We start with the first three characters. Our function reads the first three characters, converts them to numeric and writes a column called SYEARS.

    ## Williams_clean: w_data with the 59 "Not Applicable" observations removed
    Years <- function(x) {
      y <- substring(x, 1, 3)
      y <- as.numeric(y)
      return(y)
    }
    Williams_clean <- Williams_clean %>% mutate(SYEARS = Years(CONFINEMENT))

As mentioned above, there are at least 9 observations that involve sentences of more than 500 years. We filter on occurrences of such sentences and deal with them as special cases:

    Williams_clean <- Williams_clean %>% filter(SYEARS < 500)

To convert months to years, drop the first three characters and convert the next two by dividing by 12:

    Months <- function(x) {
      y <- substring(x, 4, 5)
      y <- as.numeric(y) / 12
      return(y)
    }
    Williams_clean <- Williams_clean %>% mutate(SMONTHS = Months(CONFINEMENT))

Now finish with the days:

    Days <- function(x) {
      y <- substring(x, 6, 7)
      y <- as.numeric(y) / 365   # divisor assumed
      return(y)
    }
    Williams_clean <- Williams_clean %>%
      mutate(SDAYS = Days(CONFINEMENT)) %>%
      mutate(STOTAL = SYEARS + SMONTHS + SDAYS)

Because there are 476 observations in the clean data, there were 9 observations corresponding to sentences over 500 years. Thus, all the observations involving sentences of more than 500 years have been accounted for. In fact, only observations with an associated sentence of less than 40 years remain:

    summary(Williams_clean$STOTAL)
    ##  Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    ##  0.00    5.00   25.00   21.51   40.00   40.00
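For reference, the piecewise conversion above can be collected into a single helper that maps the fourteen-character confinement code directly to a sentence length in years. This is a sketch, not part of our processing pipeline: the handling of the special codes and the divisor for days are assumptions consistent with the data dictionary description given earlier, and the example value is invented for illustration.

    ## Convert the fourteen-character confinement code to years, using the
    ## second seven-character subfield (the maximal time to be served).
    confinement_to_years <- function(code) {
      maxfield <- substring(code, 8, 14)            # second subfield
      special  <- maxfield %in% c("8888888",        # not applicable
                                  "9999998",        # life sentence
                                  "9999999")        # death sentence
      years  <- as.numeric(substring(maxfield, 1, 3))
      months <- as.numeric(substring(maxfield, 4, 5))
      days   <- as.numeric(substring(maxfield, 6, 7))
      out <- years + months / 12 + days / 365       # day divisor assumed
      out[special] <- NA                            # special codes carry no length
      out
    }

    ## Invented example: no minimum term, maximum term of 5 years and 6 months.
    confinement_to_years("00000000050600")
    ## 5.5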
We can visualize how charges are distributed in our clean data set:

    ggplot(Williams_clean, aes(x = FELONY_DEGREE, fill = RACE)) +
      geom_bar() +
      labs(title = "Judge Williams: Felonies by degree for observations with trial and sentence") +
      labs(y = "Number of records") +
      labs(x = "Degree of felony") +
      labs(caption = "Figure 3: Distribution of observations involving felony trials by race for clean data")

[Figure 3: Distribution of observations involving felony trials by race for clean data. Stacked bar chart of the number of records by degree of felony.]

Treating charges as independent, we can visualize the distribution of sentence length for observations as a function of race, faceted on felony degree:

    ggplot(Williams_clean, aes(x = RACE, y = STOTAL, fill = RACE)) +
      geom_boxplot() +
      facet_wrap(~ FELONY_DEGREE, scales = "free_y") +   # exact faceting call assumed
      labs(title = "Judge Charles Williams sentence length by race, faceted on felony degree") +
      labs(y = "Length of sentence in years") +
      labs(caption = "Figure 4: Sentence length as a function of race for felony degree: aggregate data.")

[Figure 4: Sentence length as a function of race for felony degree: aggregate data. Paired boxplots of sentence length in years by race, faceted on felony degree.]

As mentioned in the Introduction, data aggregated on degree of felony cannot lead to inferences involving sentencing bias. Nonetheless, such aggregates provide information concerning how sentences are distributed and they are easy to compute: we do so in Figure 4. Figure 4 features paired boxplots: shaded regions represent the interquartile range, dark horizontal lines represent median values, vertical bars indicate the range of the data, and points represent outliers. We include Figure 4 because we are interested in displaying what happens to aggregates when one restricts attention to data that excludes plea deals. The intended message is: even if there were reason to believe that such statistics provide a signal in which we can have confidence, in the case of Judge Williams the aggregates still provide no meaningful signal for detecting bias.
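To make concrete the "standard statistical tools" mentioned in the bias-definition section, a naive comparison of mean sentence length by race on these aggregates could be run as below. We stress that such a test inherits all of the problems of the aggregation (coupled charges, no common notion of "same crime"), so its output should not by itself be read as evidence for or against bias; the felony-degree column name and the race codes "B" and "W" are assumptions.

    ## Naive aggregate comparison: sentence length by race within third degree
    ## felony observations, charges treated as independent.
    ## FELONY_DEGREE and the race codes are placeholder assumptions.
    third_degree <- Williams_clean %>%
      filter(FELONY_DEGREE == "T", RACE %in% c("B", "W"))

    ## Welch two-sample t-test of mean sentence length by race.
    t.test(STOTAL ~ RACE, data = third_degree)

    ## A rank-based alternative, less sensitive to the skew visible in Figure 4.
    wilcox.test(STOTAL ~ RACE, data = third_degree)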
In order to address questions of bias, we need to study how observations are tied to trials and how trials are tied to sentencing. We begin such a study, highlighting obstructions as we proceed.

Connecting crimes to sentence outcomes: towards constructing unique identifiers and establishing case comparisons

As indicated in Figure 4, there are 476 observations involving felony charges for which a sentence was imposed. As previously mentioned, Figure 4 is the result of an analysis that treats these charges as independent events. Unfortunately, from the point of view of trial and sentencing, they are not independent: defendants are tried for crimes they commit, crimes usually involve a suite of charges, and trials often involve a suite of crimes. In an effort to keep track of what occurs during the course of an investigation, the Florida Criminal Justice System mandates that a number of date stamps, case identification numbers and personal identifiers be maintained. To the extent that they are maintained, these data provide a means of associating a sentence to a crime. To carry out a rigorous analysis of how sentences depend on crimes committed as a function of race is beyond the scope of this preliminary investigation. Instead, we will provide evidence that such an analysis is required if there is to be any confidence in inferences connecting sentences to race.

Our first order of business is to identify which observations constituting our clean data are associated to an individual trial. There is a feature of the data that is intended to accomplish precisely this: the court docket number. According to the data dictionary, a court docket number is assigned "to each case heard by the court to identify it in its files." There are 37 unique court docket numbers, and each value is either a hash or it is missing. This is a surprisingly small number, suggesting that issues of small sample size and missing data will likely play a role in any meaningful analysis.

We begin by visualizing how observations are distributed over court docket numbers and race. To do this we partition the data into subsets given by unique pairs of court docket numbers and race, and we associate a size to every such pair: the number of observations that correspond to the pair. We plot by placing a colored bar of height one at the location on the horizontal axis that records how many observations are associated to the court docket number being plotted. Bars of height greater than 1 therefore represent trials with the same number of associated charges:

    by_CD    <- Williams_clean %>% group_by(CourtDocketNumber, RACE)
    count_CD <- by_CD %>% tally()

    ggplot(count_CD, aes(x = n, fill = RACE)) +
      geom_histogram(bins = 196) +
      labs(title = "Judge Charles Williams: number of observations by court docket number and race") +
      labs(x = "Number of observations associated to a race-court docket pair") +
      labs(y = "Number of court docket numbers") +
      labs(caption = "Figure 5: Studying the number of observations associated to race-court docket pairs")

[Figure 5: Studying the number of observations associated to race-court docket pairs. Histogram of the number of court docket numbers by the number of observations associated to each race-court docket pair.]

Figure 5 gives some measure of how bad things can be. There are arrests that result in dozens of charges, all of which are tried during a single trial. Such charges are coupled, and the conglomerate of charges defines the crime. Because the total number of observations is small, cases that involve such a large number of charges make comparative analysis difficult: such cases define crimes for which there is no other comparable crime, and thus they play no role in the analysis of bias (in Figure 5 these correspond to cases where the bars have height 1). In fact, from Figure 5 it is clear that for any attempt to analyze bias, small sample problems will abound: the richest sample of comparable crimes seems to be a collection of 4 black defendants and 4 white defendants. It might be the case that things will improve if we can resolve issues involving missing data, but it is unlikely. Nonetheless, we give it a try. We begin with an investigation of the subset of the court docket numbers involving more than 20 of the observations:

    count_CD %>% filter(n > 20)
    ## Source: local data frame [6 x 3]
    ## Groups: RACE

This suggests that observations involving missing court docket numbers account for a significant portion of the data.
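The share of observations affected can be checked directly; as above, CourtDocketNumber is a placeholder field name and missing values are assumed to be read as NA.

    ## Proportion of clean-data observations with no court docket number recorded.
    mean(is.na(Williams_clean$CourtDocketNumber))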
To address this problem we begin by filtering the data with missing court docket numbers and grouping by OBTS number and race:

    ## Split the clean data according to whether a court docket number is present.
    ## OBTSNumber and CourtDocketNumber are placeholder field names, and missing
    ## values are assumed to be read as NA.
    missing_CD <- Williams_clean %>% filter(is.na(CourtDocketNumber))  %>% group_by(OBTSNumber, RACE)
    present_CD <- Williams_clean %>% filter(!is.na(CourtDocketNumber)) %>% group_by(OBTSNumber, RACE)

    count_missing_CD <- missing_CD %>% tally()
    count_present_CD <- present_CD %>% tally()

    ggplot(count_present_CD, aes(x = n, fill = RACE)) +
      geom_histogram(bins = 106) +
      labs(title = "Judge Charles Williams trials by race and counts per trial") +
      labs(x = "Number of OBTS numbers associated to court docket number") +
      labs(y = "Number of unique OBTS numbers") +
      labs(caption = "Figure 6: The number of OBTS numbers associated to unique court docket numbers")

[Figure 6: The number of OBTS numbers associated to unique court docket numbers. Histogram by race of counts per OBTS number for observations with a court docket number.]

In Figure 6 we plot data for which there is a court docket number and we group by OBTS number. There are 34 groups and there are 36 court docket numbers, suggesting OBTS numbers are a good proxy for the court docket numbers which are present. In fact, every court docket number has only one OBTS number associated to it (there are single arrests that are tried over more than one trial). Note that the largest group which constitutes "same crime" corresponds to a collection of 8 cases (4 white defendants and 4 black defendants).

    ggplot(count_missing_CD, aes(x = n, fill = RACE)) +
      geom_histogram(bins = 65) +
      labs(title = "Judge Charles Williams trials by race and counts per trial") +
      labs(x = "Number of observations associated to missing court docket number") +
      labs(y = "Number of unique OBTS numbers") +
      labs(caption = "Figure 7: The number of OBTS numbers without court docket numbers")

[Figure 7: The number of OBTS numbers without court docket numbers. Histogram by race of counts per OBTS number for observations with no court docket number.]

In Figure 7 we plot data for which there is no court docket number and we group by OBTS number and race. There are 23 OBTS numbers plotted in this data. Since there are a total of 48 unique OBTS numbers, this suggests that some of the data without court docket numbers should be folded into the data plotted in Figure 6. From the point of view of improving sample size, the largest possible increase would result in a sample of size 14 with 6 white defendants and 8 black defendants. But is that what happens? We isolate the OBTS numbers that occur in both lists and plot:

    present_OBTS    <- unique(count_present_CD$OBTSNumber)
    move_OBTS       <- missing_CD %>% filter(OBTSNumber %in% present_OBTS)
    count_move_OBTS <- move_OBTS %>% tally()

    ggplot(count_move_OBTS, aes(x = n, fill = RACE)) +
      geom_histogram(bins = 65) +
      labs(title = "Judge Charles Williams trials with OBTS numbers with and without docket numbers") +
      labs(x = "Number of observations associated to missing court docket number, OBTS") +
      labs(y = "Number of unique OBTS numbers") +
      labs(caption = "Figure 8: The number of OBTS numbers with and without court docket numbers")

[Figure 8: The number of OBTS numbers with and without court docket numbers. Histogram by race of counts per OBTS number for observations lacking a court docket number whose OBTS number also appears with one.]

Figure 8 indicates that additional analysis would yield a maximal sample of at most 9 defendants. In particular, sample sizes available for analysis of bias in sentencing will remain very small, even when we restrict to the rough level of equating cases with the same number of charges.
This analysis indicates that an aggregation of data that informs questions of bias in sentencing must involve a schema for dealing with small samples. It is beyond the scope of this investigation to construct such a schema, implement it and analyze the result; we will return to these tasks in a later investigation. Our purpose is simply to demonstrate that meaningful bias analysis cannot occur without transparent treatment of the above.

It is possible that these concerns arise because we have chosen to focus on data associated to Judge Williams. To address such concerns, we run the same analysis on data associated to Judge Lee Haworth. In addition to addressing the extent to which the problems will arise in a general analysis, we provide a guide to how to use our code to investigate the data associated to any judge whose name appears in the data.

A second example

We filter the original data to obtain the records for which Lee Haworth is the sentencing judge, the prosecutor files charges, the defendant is not a juvenile, the crime is a felony and the defendant is found guilty.

    ## remove white space
    Data[, 86] <- trimws(Data[, 86])   # as before, the specific call is assumed

    ## initial data filter (same conditions as for Judge Williams)
    Haworth_data <- Data %>% filter( ... )

Our filtration results in 16607 observations of 105 features. We can produce a preliminary view of the data:

    ggplot(Haworth_data, aes(x = FELONY_DEGREE, fill = RACE)) +
      geom_bar() +
      facet_grid(TRIAL ~ .) +
      labs(title = "Judge Lee Haworth: Distribution of felonies by degree, faceted on trial") +
      labs(y = "Number of records") +
      labs(x = "Degree of felony") +
      labs(caption = "Figure 9: Comparison of cases involving trials (2 and 3) and cases involving no trial (1)")

[Figure 9: Comparison of cases involving trials (2 and 3) and cases involving no trial (1). Stacked bar chart of the number of records by degree of felony, faceted on trial type.]

As was the case with Judge Williams, the columns labeled F, L, S and T correspond to "First", "Life", "Second" and "Third" degree felonies. The column labeled C corresponds to "Capital" cases, those for which the death penalty applies. We filter the data to study observations in which a trial occurred:

    w_data <- Haworth_data %>% filter(TRIAL != 1)
    dim(w_data)
    ## 1500 105

As was the case when studying the data associated to Judge Williams, the data collapses when one restricts to cases that were tried: there are 1500 observations remaining from the 16607 observations with which we began (i.e., we are working with about 9 percent of the felony data). As before, we study to what extent distributions change when restricting to cases in which a trial occurred:

    ggplot(w_data, aes(x = FELONY_DEGREE, fill = RACE)) +
      geom_bar() +
      labs(title = "Judge Lee Haworth: Felonies by degree for observations with trial") +
      labs(y = "Number of records") +
      labs(x = "Degree of felony") +
      labs(caption = "Figure 10: Stacked barplot giving distribution of observations involving felony trials by race")

[Figure 10: Stacked barplot giving the distribution of observations involving felony trials by race.]

As before, the data again collapses: only about 9% of the observations involve trials. We clean the data as we did before:

    ## Trim to work with the second subfield
    w_data$CONFINEMENT <- gsub("^.......", "", w_data$CONFINEMENT)

    ## Use the function to mark the NA observations

Thus, we have eliminated 114 observations in which the length of sentence field is recorded as "Not Applicable." Next, we isolate observations involving death sentences.
From the data dictionary, these correspond to sentences of the form "9999999":

    a <- sum(w_data$CONFINEMENT == "9999999")
    sprintf("There are %s observations associated to death sentences.", a)
    ## "There are 8 observations associated to death sentences."

This indicates that there are 8 observations with a death sentence recorded in the field for length of confinement in the data involving Judge Haworth.

    a <- n_distinct(w_data$OBTSNumber[w_data$CONFINEMENT == "9999999"])
    sprintf("There are %s distinct OBTS numbers associated to death sentences.", a)
    ## "There are 1 distinct OBTS numbers associated to death sentences."

Thus, the 8 observations involving death sentences are associated to 1 OBTS number. Next we consider observations involving life sentences. As before, according to the data dictionary, observations involving life sentences are coded as "9999998":

    a <- sum(w_data$CONFINEMENT == "9999998")
    sprintf("There are %s observations associated to life sentences.", a)
    ## "There are 217 observations associated to life sentences."

This indicates that there are 217 observations with life sentences recorded in the field for length of confinement in the data involving Judge Haworth. We record the felony degree and race corresponding to the life sentence data, as well as the associated OBTS number and court docket number:

    LIFE <- w_data %>% filter(CONFINEMENT == "9999998")
    a <- n_distinct(LIFE$OBTSNumber)
    b <- n_distinct(LIFE$CourtDocketNumber)
    sprintf("There are %s distinct OBTS numbers and %s distinct court docket numbers associated to life sentences.", a, b)
    ## "There are 12 distinct OBTS numbers and 11 distinct court docket numbers associated to life sentences."

What this means is that the observations in the data corresponding to life sentences are associated to 12 OBTS numbers and 11 court docket numbers. The entry "9999998", if converted to years, is larger than 999. We can filter the data for other observations with sentences in excess of 500 years:

    LONG <- w_data %>% filter(as.numeric(substring(CONFINEMENT, 1, 3)) > 500)
    a <- n_distinct(LONG$OBTSNumber)
    b <- n_distinct(LONG$CourtDocketNumber)
    c <- n_distinct(LONG$SentenceDate)
    sprintf("There are %s distinct OBTS numbers, %s distinct court docket numbers and %s distinct sentence dates associated to life sentences.", a, b, c)
    ## "There are 13 distinct OBTS numbers, 12 distinct court docket numbers and 13 distinct sentence dates associated to life sentences."

These results indicate that we have accounted for all observations associated to Judge Haworth involving life sentences and death sentences. If we filter on sentences longer than 500 years, we know that we lose data associated to 12 trials and 13 OBTS numbers. We convert the length of confinement data to years using our previously defined functions:

    Haworth_clean <- Haworth_clean %>% mutate(SYEARS = Years(CONFINEMENT))

We filter on occurrences of such outcomes greater than 500 years and deal with them as separate cases:

    Haworth_clean <- Haworth_clean %>% filter(SYEARS < 500)

    ## Continue conversion
    Haworth_clean <- Haworth_clean %>%
      mutate(SMONTHS = Months(CONFINEMENT)) %>%
      mutate(SDAYS = Days(CONFINEMENT)) %>%
      mutate(STOTAL = SYEARS + SMONTHS + SDAYS)

Because there are 1161 observations in the clean data, there were 225 observations corresponding to sentences over 500 years. Thus, all the observations involving sentences of more than 500 years have been accounted for. In fact, only observations with an associated sentence of less than 100.5 years remain:

    summary(Haworth_clean$STOTAL)
    ##  Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    ##  0.00    5.00    9.33   12.68   16.59  100.50

We can visualize how charges are distributed in our clean data set:

    ggplot(Haworth_clean, aes(x = FELONY_DEGREE, fill = RACE)) +
      geom_bar() +
      labs(title = "Judge Haworth: Felonies by degree for observations with trial and sentence") +
      labs(y = "Number of records") +
      labs(x = "Degree of felony") +
      labs(caption = "Figure 11: Distribution of observations involving felony trials by race for clean data")
[Figure 11: Distribution of observations involving felony trials by race for clean data. Stacked bar chart of the number of records by degree of felony.]

Treating charges as independent, we can visualize the distribution of sentence length for observations as a function of race, faceted on felony degree (again, this information does not help with the analysis of bias, but it is information we can collect with ease):

    ggplot(Haworth_clean, aes(x = RACE, y = STOTAL, fill = RACE)) +
      geom_boxplot() +
      facet_wrap(~ FELONY_DEGREE, scales = "free_y") +   # exact faceting call assumed
      labs(title = "Judge Lee Haworth sentence length by race, faceted on felony degree") +
      labs(y = "Length of sentence in years") +
      labs(caption = "Figure 12: Boxplot for sentence length as a function of race for felony degree.")

[Figure 12: Boxplot for sentence length as a function of race for felony degree.]

Finally, we investigate how observations are distributed over trials. We begin by counting the number of trials using unique court docket numbers:

    a <- n_distinct(Haworth_clean$CourtDocketNumber)
    sprintf("There are %s distinct court docket numbers.", a)
    ## "There are 56 distinct court docket numbers."

Once again, there is a relatively small number of trials (56) given the number of observations (1161). We can get an initial plot of the data:

    by_CD    <- Haworth_clean %>% group_by(CourtDocketNumber, RACE)
    count_CD <- by_CD %>% tally()

    ggplot(count_CD, aes(x = n, fill = RACE)) +
      geom_histogram(bins = 256) +
      labs(title = "Judge Lee Haworth: number of observations by court docket number and race") +
      labs(x = "Number of observations associated to a race-court docket pair") +
      labs(y = "Number of court docket numbers") +
      labs(caption = "Figure 12: Studying the number of observations associated to race-court docket pairs")

[Figure 12: Studying the number of observations associated to race-court docket pairs. Histogram of the number of court docket numbers by the number of observations associated to each race-court docket pair.]

We begin with an investigation of the subset of the court docket numbers involving more than 60 of the observations:

    count_CD %>% filter(n > 60)
    ## Source: local data frame [4 x 3]
    ## Groups: RACE

This suggests that observations involving missing court docket numbers account for a significant portion of the data. To address this problem we begin by filtering the data with missing court docket numbers and grouping by OBTS number and race:

    missing_CD <- Haworth_clean %>% filter(is.na(CourtDocketNumber))  %>% group_by(OBTSNumber, RACE)
    present_CD <- Haworth_clean %>% filter(!is.na(CourtDocketNumber)) %>% group_by(OBTSNumber, RACE)

    count_missing_CD <- missing_CD %>% tally()
    count_present_CD <- present_CD %>% tally()

    ggplot(count_present_CD, aes(x = n, fill = RACE)) +
      geom_histogram(bins = 111) +
      labs(title = "Judge Lee Haworth trials by race and counts per trial") +
      labs(x = "Number of OBTS numbers associated to court docket number") +
      labs(y = "Number of unique OBTS numbers") +
      labs(caption = "Figure 13: The number of OBTS numbers associated to unique court docket numbers")

[Figure 13: The number of OBTS numbers associated to unique court docket numbers. Histogram by race of counts per OBTS number for observations with a court docket number.]

In Figure 13 we plot data for which there is a court docket number and we group by OBTS number. There are 56 groups and there are 56 distinct court docket numbers, suggesting OBTS numbers are a good proxy for the court docket numbers which are present. Note that the largest group which constitutes "same crime" corresponds to a collection of 14 cases (8 white defendants and 6 black defendants). As before,
we can investigate how missing data is partitioned amongst OBTS numbers and plot:

    ggplot(count_missing_CD, aes(x = n, fill = RACE)) +
      geom_histogram(bins = 111) +
      labs(title = "Judge Lee Haworth trials by race and counts per trial") +
      labs(x = "Number of observations associated to missing court docket number") +
      labs(y = "Number of unique OBTS numbers") +
      labs(caption = "Figure 14: The number of OBTS numbers without court docket numbers")

[Figure 14: The number of OBTS numbers without court docket numbers. Histogram by race of counts per OBTS number for observations with no court docket number.]

In Figure 14 we plot data for which there is no court docket number and we group by OBTS number and race. There are 30 OBTS numbers plotted in this data. Since there are a total of 80 unique OBTS numbers, this suggests that some of the data without court docket numbers should be folded into the data plotted in Figure 13. From the point of view of improving sample size, the largest possible increase would result in a sample of size 18 with 8 white defendants and 10 black defendants. But is that what happens? We isolate the OBTS numbers that occur in both lists and plot:

    present_OBTS    <- unique(count_present_CD$OBTSNumber)
    move_OBTS       <- missing_CD %>% filter(OBTSNumber %in% present_OBTS)
    count_move_OBTS <- move_OBTS %>% tally()

    ggplot(count_move_OBTS, aes(x = n, fill = RACE)) +
      geom_histogram(bins = 111) +
      labs(title = "Judge Lee Haworth trials with OBTS numbers with and without docket numbers") +
      labs(x = "Number of observations associated to missing court docket number, OBTS") +
      labs(y = "Number of unique OBTS numbers") +
      labs(caption = "Figure 15: The number of OBTS numbers with and without court docket numbers")

[Figure 15: The number of OBTS numbers with and without court docket numbers. Histogram by race of counts per OBTS number for observations lacking a court docket number whose OBTS number also appears with one.]

Thus, the best improvement we can hope for results in a sample of size 17. Again, in this case the sample involves cases with the same number of charges. Further refinement is likely to result in significantly smaller sample sizes, again resulting in the necessity of a method to deal with small samples.

Conclusions and future directions

Our investigation reveals several important features of the data as it relates to questions of bias in sentencing:

- Including the data associated to plea deals greatly distorts any analysis of sentence bias that might be associated to judges.
- Simple aggregates across charge and degree treat each observation as an independent event, contrary to what is known to be the case. Relying on such aggregates to measure bias in sentencing results in inference without confidence.
- When aggregating data across trials to define crimes which can reasonably be compared, one runs into problems involving small samples.

Our first two observations are easy to address: filter the data appropriately and do not use the aggregates that are seductively easy to obtain. The third observation requires much more careful consideration. Dealing with small sample sizes will require a protocol which

- Identifies cases in which similar crimes are being tried;
- Identifies if there is a difference in sentencing for such cases;
- Identifies whether there is a pattern in the sentencing differences.

We intend to develop and implement an algorithm which addresses these three design constraints and we intend to use it to continue our analysis; one possible ingredient is sketched below.
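By way of illustration only, one ingredient such a protocol might use for the second step is an exact permutation test, which remains valid at the sample sizes seen above (for example, 4 black and 4 white defendants). This is a sketch, not the algorithm we intend to develop: it assumes a data frame named comparable holding one row per comparable case, with the RACE and STOTAL columns constructed earlier, and the race codes "B" and "W" are assumptions.

    ## Exact permutation test of the difference in mean sentence length between
    ## black and white defendants within a set of comparable cases. With samples
    ## this small the full permutation distribution can be enumerated, so no
    ## large-sample approximation is needed.
    permutation_test <- function(comparable) {
      x <- comparable$STOTAL[comparable$RACE == "B"]   # race codes assumed
      y <- comparable$STOTAL[comparable$RACE == "W"]
      observed <- mean(x) - mean(y)
      pooled   <- c(x, y)
      k        <- length(x)
      ## enumerate every way of assigning k of the pooled sentences to one group
      splits <- combn(length(pooled), k)
      diffs  <- apply(splits, 2, function(idx) mean(pooled[idx]) - mean(pooled[-idx]))
      ## two-sided p-value: how often a relabelling is at least as extreme as observed
      mean(abs(diffs) >= abs(observed))
    }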
The problem of constructing an appropriate tool to study small sample problems was not the only interesting question that arose in our study that proved to be beyond the scope of a preliminary analysis. Indeed, our preliminary analysis provides far fewer answers than it does questions. For example,

- In studying the difference between Figure 1 and Figure 2 and the difference between Figure 9 and Figure 10, one notes that there is a difference in the proportion of defendants taking plea deals which is correlated to race and charge. Is this really a feature of the data?
- What do the patterns we observe in sentencing mean about the criminal justice system? Are there other correlates for which we see patterns similar to those involving race (for example, representation, location, etc.)?
- Is there a better way to collect data if one is interested in providing transparency for outcomes?

We hope to return to these questions soon.

We have written our code in hopes that others will use it. Our second example was intended in part to show how this might be done: by changing the names of judges we produce new results. We can do the same with other fields. For example, we can study how having a public defender as counsel affects outcomes by changing "race" to an appropriately modified field. It is our hope that others will expand on what we have done to improve the system we share.

1. The Sarasota Herald Tribune series, "Bias on the Bench".
2. More information about the OBTS can be found in the OBTS Criminal Justice Data Element Dictionary.
3. For more information on dumping and restoring a PostgreSQL database, please refer to the PostgreSQL documentation.
4. More on exporting query results can be found in the PostgreSQL documentation.
5. R and RStudio are available for free download.