Criminal Discount Factors and Deterrence*

Giovanni Mastrobuoni, University of Essex and IZA
David A. Rivers, University of Western Ontario

IZA Discussion Paper No. 9769, February 2016
Institute for the Study of Labor (IZA), Bonn
This version is available at: http://hdl.handle.net/10419/141528

ABSTRACT

The trade-off between the immediate returns from committing a crime and the future costs of punishment depends on an offender's time discounting. We exploit quasi-experimental variation in sentence length generated by a large collective pardon in Italy and provide nonparametric evidence on the extent of discounting from the raw data on recidivism and sentence length.
Using a discrete-choice model of recidivism, we estimate an average annual discount factor of 0.74, although there is heterogeneity based on age, education, crime type, and nationality. Our estimates imply that the majority of deterrence is derived from the first few years in prison.

JEL Classification: D9, K4
Keywords: discounting, deterrence, collective pardon, recidivism

Corresponding author: Giovanni Mastrobuoni, University of Essex, Department of Economics, Wivenhoe Park, CO4 3SQ Colchester, United Kingdom. E-mail: gmastrob@essex.ac.uk

* We would like to thank Philip Cook, Igor Livshits, Lance Lochner, Salvador Navarro, Daniel Nagin, as well as seminar participants at the University of Essex, the University of Western Ontario, and the University of Trier.

1 Introduction

Criminal activity involves an inherent trade-off between the immediate returns associated with committing the crime and the possibility of future punishments if caught. Since punishments are received in the future, the value that a potential criminal places on them depends on how that individual values future events. One implication is that individuals who discount the future more heavily will be more likely to engage in crime. Knowing the degree to which potential criminals discount the future is therefore key to understanding what drives criminal behavior and to designing crime policy. This information is of substantial interest to policymakers and those involved in law enforcement, as it relates directly to the deterrent power of criminal punishments. The issue is pressing: policymakers are currently deciding whether to reform some of the mandatory minimum sentences enacted in the 1970s and 1980s, which contributed to a 400 percent increase in the prison population (Hunt (2015)).[1]

Discounting has two important implications for deterrence. First, the more an individual discounts future events, the less deterrent power future punishments have.
Second, discounting shapes the relative deterrent power of punishments received at different points in the future. For example, consider an offender who is choosing whether to use a modus operandi that might aggravate his or her crime (e.g., using a firearm, being violent, damaging property). As long as criminals discount the future, individuals considering crimes with longer baseline sentences will be less deterred by the same sentence enhancement than those considering crimes with shorter sentences, because the enhancement would be served further into the future. The magnitude of this differential effect increases with the degree of discounting.

Discounting also has implications for the debate over the effect of the severity of punishment on deterrence. In the original economic model of crime developed by Becker (1968), certainty and severity of punishment combine to form the expected cost of committing a crime. Since severity is typically associated with sentence length, it is commonly assumed that doubling the sentence length doubles the "cost" paid by offenders, and is therefore equivalent to doubling the probability of punishment (certainty). But this holds only if criminals do not discount the future, since the additional years of a longer sentence are served at the end. Discounting of future consequences fundamentally changes this relationship, as it implies decreasing marginal deterrence of sentence length (severity). As a consequence, the marginal cost to society of a given increase in deterrence is increasing, rather than constant, as sentences are extended.

[1] The US House of Representatives has recently unveiled sentencing reform legislation as a companion to a bipartisan bill introduced in the Senate on October 1, 2015, the Sentencing Reform and Corrections Act of 2015 (S. 2123).
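To see why doubling severity is not equivalent to doubling certainty once punishments are discounted, consider the following minimal Python sketch (an illustration, not the authors' code). It assumes, as the model in Section 3 does, a constant per-year disutility of prison and exponential discounting, so that a sentence of $T$ years costs $d(1 - \delta^T)/(1 - \delta)$. The probability of punishment, the sentence length, and the discount factors are hypothetical values chosen for illustration.

```python
def sentence_cost(years: float, delta: float, d: float = 1.0) -> float:
    """Discounted disutility of a sentence of `years` years, assuming a constant
    per-year disutility d and exponential discounting with factor delta."""
    if delta == 1.0:
        return d * years  # no discounting: every year counts fully
    return d * (1 - delta ** years) / (1 - delta)


def expected_cost(p: float, years: float, delta: float) -> float:
    """Becker-style expected cost: probability of punishment times discounted severity."""
    return p * sentence_cost(years, delta)


p, years = 0.2, 3  # hypothetical certainty and severity
for delta in (1.0, 0.95, 0.74):
    base = expected_cost(p, years, delta)
    certainty_ratio = expected_cost(2 * p, years, delta) / base  # doubling P
    severity_ratio = expected_cost(p, 2 * years, delta) / base   # doubling T
    print(f"delta={delta}: double P -> x{certainty_ratio:.2f}, double T -> x{severity_ratio:.2f}")
```

Doubling the probability always doubles the expected cost, whereas doubling a $T$-year sentence multiplies it by $1 + \delta^T$: a factor of 2 only when $\delta = 1$, about 1.86 when $\delta = 0.95$, and about 1.41 when $\delta = 0.74$ in this example.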
A related consequence is that as discounting increases, the deterrent power of the severity of punishment decreases relative to that of the certainty of punishment. The notion that swiftness and certainty of the penalty outweigh severity was articulated at least 250 years ago by Beccaria (1764). In a survey of the recent evidence for deterrent effects of imprisonment, Durlauf and Nagin (2011) find little evidence for strong marginal deterrent effects of increasing the severity of punishment, but considerable support in the literature for marginal deterrent effects of increasing the certainty of punishment (see e.g., Hawken and Kleiman (2009)). One explanation that they offer is low discount factors among criminals.

The idea that discounting is an important component of criminal behavior has been recognized in the literature at least since Ehrlich (1973). Cook (1980) highlights that, with discounting and a constant per-period disutility of prison, increasing the severity of punishments will have a greater deterrent effect when the initial punishment is mild. Davis (1988) and Polinsky and Shavell (1999) develop formal models of intertemporal crime decisions that explicitly take into account the effect of discounting. Wilson and Herrnstein (1985) and Katz, Levitt, and Shustorovich (2003) suggest that one explanation for criminal behavior itself is low discount factors among those individuals who decide to commit crimes. However, despite prior recognition in the literature of the importance of discounting for criminal behavior (see also McCrary (2010)), there is very limited empirical evidence about the extent of discounting among criminals.

In this paper, we provide direct quantitative estimates of discount factors for criminals by taking advantage of a unique dataset related to a large-scale collective pardon passed in Italy at the end of July 2006.
In an attempt to reduce prison overcrowding, more than 20,000 inmates, corresponding to over one-third of the entire prison population, were released over a period of a few weeks. Inmates who had a residual sentence below three years at the time of the collective pardon were immediately released. A key condition of the pardon was that, if a released inmate is found guilty of a crime in the future, the pardoned sentence (or residual sentence) is automatically added to the new sentence. Our dataset consists of individual-level data on each released inmate. In addition to detailed information on the characteristics of each inmate, we observe the length of the original sentence that led to their pre-pardon incarceration, their residual sentence length, and whether or not they recidivate over a period of 17 months following the pardon.

Our identification strategy for estimating the discount factor is based on measuring how recidivism rates decrease with the length of the sentence. Specifically, it is the shape of this relationship that identifies the discount factor. The presence of discounting implies that the marginal effect on offending is decreasing in sentence length, as individuals discount the later years of a sentence more heavily. This leads to a convex decreasing relationship between offending (recidivism) and total sentence length. The more convex the relationship is, the lower the discount factor.

A key challenge for identification arises from the fact that prison sentences are not randomly assigned. Judges impose sentences based on characteristics of the offender and the offender's criminal history, many of which will be unobserved by the researcher. This correlation between unobserved drivers of crime and observed sentence lengths generates an endogeneity problem, which has long been recognized in the deterrence literature (see Ehrlich (1973) for an early discussion and Levitt and Miles (2007) and Durlauf and Nagin (2011) for more recent summaries).
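The convexity argument can be made concrete with a stylized Python sketch (ours, not the authors'). It drops covariates from the model of Section 3, writes the log-odds of recidivism as $\alpha + \beta P (1 - \delta^T)/(1 - \delta)$ for a total sentence of $T$ years, and assumes a logistic error purely for illustration. The values of `alpha` and `beta_p` (standing in for $\beta_0 P$) are hypothetical.

```python
import math


def log_odds(total_years: int, alpha: float, beta_p: float, delta: float) -> float:
    """Index of equation (2) without covariates: alpha + beta_p * (1 - delta**T) / (1 - delta).
    beta_p plays the role of beta0 * P (negative: prison is costly); requires delta < 1."""
    return alpha + beta_p * (1 - delta ** total_years) / (1 - delta)


def recidivism_prob(total_years: int, alpha: float, beta_p: float, delta: float) -> float:
    """Pr(recidivate) under an assumed logistic error distribution."""
    return 1 / (1 + math.exp(-log_odds(total_years, alpha, beta_p, delta)))


def successive_ratios(alpha: float, beta_p: float, delta: float, max_years: int = 6) -> list:
    """Ratios of the effects of consecutive extra years on the log-odds.
    Each extra year t shifts the log-odds by beta_p * delta**t, so every ratio equals delta."""
    index = [log_odds(t, alpha, beta_p, delta) for t in range(max_years)]
    effects = [b - a for a, b in zip(index, index[1:])]
    return [e1 / e0 for e0, e1 in zip(effects, effects[1:])]


alpha, beta_p = -0.5, -0.3  # hypothetical values, not estimates from the paper
for delta in (0.95, 0.74):
    probs = [round(recidivism_prob(t, alpha, beta_p, delta), 3) for t in range(6)]
    print(delta, probs, [round(r, 4) for r in successive_ratios(alpha, beta_p, delta)])
```

Recidivism falls with each additional year, but by less each time, and the effects decay faster when $\delta$ is lower. In this stylized setting the ratio of consecutive year effects on the log-odds recovers $\delta$ exactly, which is one way to see how the shape of the relationship can identify the discount factor.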
The natural experiment generated by the Italian pardon, however, provides a solution. Under the conditions of the pardon, inmates whose original incarceration began at different times face different expected total sentences for the same future crime. Conditional on the length of the original sentence, the residual sentence depends only on the original date of entry into prison, which is plausibly exogenous and can be exploited to estimate the discount factor.[2] This source of exogenous variation in sentence length has been used previously by Drago, Galbiati, and Vertova (2009) to measure the average deterrent effect of imprisonment.

An additional advantage of our analysis is that the policy we study is likely to be particularly salient. Our dataset consists entirely of individuals who have been previously incarcerated, and who are therefore particularly likely to be aware of sanctioning rules (Kaplow (1990)). The California Assembly Study (1968) shows, not surprisingly, that knowledge of the maximum penalty for various FBI index crimes is far better among incarcerated individuals (62 percent) than in the rest of the population (25 percent).[3] Furthermore, the same policy was applied simultaneously to a large number of inmates, which should have helped circulate knowledge and understanding of the policy. Finally, one key aspect of the policy (early release) was almost certainly noticed by all affected inmates.

[2] Conditioning on the original sentence is important, because individuals with longer original sentences are more likely to have longer residual sentences, as well as higher recidivism rates. This is because individuals who are deemed more likely to recidivate are more likely to be given longer original sentences, and, mechanically, individuals with longer original sentences are more likely to have longer residual sentences at the time of the pardon.

[3] See also Nagin (2013) and Lochner (2007) on the importance of criminals' perceptions in shaping crime.

Our estimates imply annual discount factors of 0.74 among previously convicted criminals. This degree of discounting suggests that while these individuals place a significantly lower value on the future, punishments, even those received several years in the future, still entail non-negligible costs to a potential offender. To highlight the importance of knowing the magnitude of discounting among criminals, it is useful to compare the effect of increasing sentence length for different values of the discount factor. Our estimated discount factor of 0.74 implies that doubling a 5-year sentence increases the disutility of prison by 22%, whereas doubling a 10-year sentence increases it by only 5%. If, instead, we employ a more traditional discount factor of 0.95, these increases would be 77% and 60%, respectively, suggesting a much larger potential role for sentence increases in deterrence, even at relatively long sentence levels. Our estimates, therefore, imply that increases in sentence length, whether through mandatory minimums, sentence enhancements, or more severe sentencing, are unlikely to have much deterrent effect when the baseline sentence is already long.

Since our dataset contains detailed information on the characteristics of each inmate, we are able to examine differences in discount factors across many dimensions of individual heterogeneity. We find evidence of substantial heterogeneity in discount factors, suggesting that the marginal deterrent effect of imprisonment diverges across groups of individuals as sentences become longer. The largest differences in discount factors are associated with differences in educational attainment, age, nationality, and crime type.
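The sentence-doubling figures quoted above can be reproduced under the model's assumption of a constant per-year disutility of prison with exponential discounting. A sentence of $T$ years then has discounted disutility proportional to $(1 - \delta^T)/(1 - \delta)$, so doubling it raises the disutility by a fraction $\delta^T$. The short Python check below is our sketch, not the authors' code; it also applies the group-specific estimates reported in the next paragraph to a 5-year sentence.

```python
def doubling_increase(years: float, delta: float) -> float:
    """Percent increase in the discounted disutility of prison from doubling a
    sentence of `years` years. Algebraically this equals 100 * delta**years."""
    base = (1 - delta ** years) / (1 - delta)
    doubled = (1 - delta ** (2 * years)) / (1 - delta)
    return 100 * (doubled / base - 1)


for delta in (0.74, 0.95):
    print(delta, [round(doubling_increase(t, delta)) for t in (5, 10)])
# 0.74 [22, 5]
# 0.95 [77, 60]

# Group-specific estimates reported in the next paragraph (doubling a 5-year sentence)
groups = {"high education": 0.99, "prostitution-related": 0.92, "drug offenders": 0.70, "immigrants": 0.66}
for name, delta in groups.items():
    print(name, round(doubling_increase(5, delta)))
```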
While the average discount factor is estimated to be 0.74 in our baseline specification, for some groups (inmates with high education and those who commit crimes related to organizing prostitution) the estimated discount factors are much closer to 1 (0.99 and 0.92, respectively). The lowest discount factors are found for immigrants (0.66) and drug offenders (0.70). This heterogeneity in discount factors implies important differences in the deterrent effect of imprisonment across the population. For example, given our estimates, longer prison sentences are likely to generate more effective deterrence for offenders committing prostitution-related crimes than for drug offenders.

There are a few studies in the literature that have taken advantage of survey questions designed to elicit time preferences or measures of impulsivity, and relate these to criminal behavior (see e.g., Nagin and Pogarsky (2004); Jolliffe and Farrington (2009); Akerlund et al. (2014); and Mancino, Navarro, and Rivers (2015)). These papers generally find that individuals who are more present-oriented, or impulsive, are more likely to commit a crime. One drawback of these approaches is that it can be difficult to quantify the magnitude (as opposed to just the sign) of these relationships, given the qualitative measures of time preference that are often used. There are also concerns regarding the measurement of time preference in laboratory/experimental studies (Coller and Williams (1999)).

The paper most closely related to ours is Lee and McCrary (2009), who analyze recidivism rates for a group of released juvenile inmates using a regression discontinuity design around the age of majority (age 18), at which the severity of sentencing increases. They find only a small decrease in offending at age 18, when expected sentence length increases, which is consistent with low discount factors.
We also contribute to the non-experimental literature that has estimated discount factors for consumers (see the early studies by Friedman (1957) and Heckman (1976)) as well as for various other subpopulations (US military personnel, Warner and Pleeter (2001); US purchasers of durable and storable goods, Hausman (1979); Ching and Osborne (2015); UK homeowners, Giglio, Maggiori, and Stroebel (2015); and US workers, Viscusi and Moore (1989)).[4] Our dataset contains only individuals who have been convicted of at least one crime, and therefore our results correspond to the subset of the population with a criminal record. Relative to the literature that estimates discount factors for the broader population, our results are at the low end of the range of estimates, particularly among papers that use observational or non-experimental data. Thus our estimates are consistent with high discounting being a determinant of crime.

The rest of the paper is organized as follows. In Section 2 we describe our dataset covering the collective pardon in Italy. Section 3 contains the description of our model and Section 4 details our identification strategy. In Section 5 we present our results and perform several robustness checks. In Section 6 we discuss the implications of our results for measuring deterrence. Section 7 concludes.

[4] A series of papers also estimate discount factors using laboratory experiments (see e.g., Harrison, Lau, and Williams (2002); Harrison, Lau, and Rutström (2010); Meier and Sprenger (2010); Coller and Williams (1999)).

2 Data

Collective pardons are deeply rooted in Italian history. Since World War II there has been on average one pardon or amnesty every five years (Barbarino and Mastrobuoni (2014)), although in more recent years there have been only two, in 1990 and in 2006. Such pardons eliminate part of the inmates' sentences, typically two or three years, and inmates whose new net sentence drops below zero are immediately released.
Given their wide reach, these pardons generate sudden releases of large numbers of inmates.[5] On July 31, 2006 the Italian Parliament passed the most recent pardon, which was enacted shortly thereafter. We were given access to the incarceration spells of all prison inmates released on this occasion, including the exact dates of release and re-incarceration through December 2007. A key condition of the release is that if pardoned inmates are rearrested and convicted, the pardoned sentence is added to the new sentence. Conditional on the initial sentence, this generates a plausibly exogenous source of variation in the expected severity of punishment.[6] This variation in expected future sentences allows us to measure how recidivism varies with sentencing and thus to identify the discount factor. We also have information on inmates' nationality, age, education, and some other individual characteristics. These are the same data used in Mastrobuoni and Pinotti (2015), and, compared to Drago, Galbiati, and Vertova (2009), they allow us to follow the inmates for an additional 10 months after release, for a total of 17 months. A large number of inmates were released (over 20,000 individuals, representing more than one-third of the prison population).[7]

In Table 1 we present summary statistics for the variables in our dataset. The average recidivism rate (within 17 months of release) is about 22%. Because our sample consists of individuals who have previously been convicted of a crime and released from prison, the mean age in our sample (38) is higher than in many datasets on criminal offenders. One consequence is that these inmates are more likely to be familiar with Italian criminal law and the rules governing their release. In other words, the deterrent effect of the law is likely to be salient. Overall, the sample has low educational attainment.
As shown in Table 1, about 27% of the individuals in our data have an education at or below primary school, and only 3% completed secondary school. About 62% of inmates are of Italian origin, 26% are married, 5% are female, and 15% were permanently employed prior to being imprisoned. Seventy-four percent have a definitive sentence (i.e., have exhausted all appeals). Table 2 provides a breakdown by crime type. Crime types are not mutually exclusive, as a crime can be both a property crime and a violent crime (e.g., robbery), and therefore the means of the indicator variables for each crime type do not sum to 1. Property crime is the most common type (58%), and prostitution and mafia-related crimes are the least common (2% each).

[5] In the 2006 pardon a few crimes were excluded, including those related to the mafia, terrorism, kidnapping, and felony sex offenses, while other types of violent offenses, such as murder and armed robbery, were eligible.

[6] Technically, only individuals who receive a future sentence of at least two years are subject to serving their residual (pardoned) sentence. We discuss this in more detail in Section 5.3.

[7] As in Drago, Galbiati, and Vertova (2009), we drop individuals who were released while still awaiting trial, as we have no information about their sentence. In addition, we lose a small number of observations when matching the sample of released inmates used in Drago, Galbiati, and Vertova (2009) with a data extraction made by the Italian Prison Administration 17 months after the collective pardon.

We supplement our main dataset with auxiliary data on clearance rates and transition probabilities across crime types. Province-level data on the fraction of cleared crimes come from ISTAT (2005)'s criminal law statistics and are merged by province (there are 103 provinces in Italy) and crime type with the individual-level recidivism data. In these data, a crime counts as cleared if at least one suspect has been identified.
As shown in Table 3, the province-level clearance rate is 23% on average, but with considerable variation both across locations and across crime types.

While we have information on the crime committed before the pardon, for inmates who are reincarcerated all we observe for the new offense is the date of re-incarceration. Our main dataset also does not contain information on previous incarcerations that could be used to construct transition probabilities across crime types. Instead, we take advantage of a dataset containing information on all inmates who served time in one of two prisons located in Milan (Bollate and Opera) between 2001 and 2012. Using the entire history of offenses for these individuals (spanning 1972 to 2012), we construct transition probabilities between crime types and merge these into our dataset. For inmates who commit a crime that falls into more than one category, we base the transition probabilities on the most serious crime, as measured by either the average or the median sentence for each crime type. Table 4 shows that both measures generate similar transition rates. The large probabilities on the main diagonal show that criminals tend to recommit the same types of crimes.

3 Model

3.1 Baseline Model

Consider the decision problem of an individual $i$ who has been pardoned and released from prison after committing a crime of type $j$. Given the conditions of the collective pardon described in Section 2, if this person commits another crime of type $k$ and is caught, their total prison sentence, denoted $ts_{ik}$, will be equal to the new sentence, $ns_{ik}$, plus a residual sentence $rs_i$, so that $ts_{ik} = ns_{ik} + rs_i$. Each individual decides whether or not to recidivate, and if so, which crime to commit. For now, let us assume that $j = k$, i.e., the new crime is of the same type as the original crime. We relax this assumption in Section 3.2. Let $c_{ij} = 1$ denote that person $i$ commits a crime of type $j$, and $c_{ij} = 0$ otherwise.
We normalize the utility associated with $c_{ij} = 0$ to be equal to 0. The net expected utility associated with committing a crime is given by:

$$V(c_{ij} = 1) = u\left(X^u_{ij}\right) + P^c_{jl}\, D\left(X^d_{ij}, ts_{ij}, \delta\left(X^\delta_{ij}\right)\right) + \varepsilon_{ij}. \qquad (1)$$

The functions $u$ and $D$ capture the utility of committing a crime and the (dis)utility of being caught, sentenced to prison, and serving that time in prison.[8] $X^u_{ij}$ denotes observed characteristics of the individual and the crime that affect the utility received from committing a crime. $X^d_{ij}$ denotes observed characteristics of the disutility of going to prison. $X^\delta_{ij}$ are observable characteristics related to differences in discount factors. $P^c_{jl}$ is the probability of being caught and sentenced to prison for committing crime $j$ in location $l$, and $\varepsilon_{ij}$ captures unobserved drivers of the crime decision. $\delta$ is the discount factor, and the main object of interest here.

We assume that each year in prison leads to a constant flow disutility[9] of $d\left(X^d_{ij}\right)$ and that discounting is exponential.[10] Given that $\sum_{t=0}^{T-1} \delta^t = \frac{1 - \delta^T}{1 - \delta}$, we have that[11]

$$D\left(X^d_{ij}, ts_{ij}, \delta\left(X^\delta_{ij}\right)\right) = d\left(X^d_{ij}\right) \frac{1 - \delta\left(X^\delta_{ij}\right)^{ts_{ij}}}{1 - \delta\left(X^\delta_{ij}\right)}.$$

Letting $u\left(X^u_{ij}\right) = \alpha_0 + \alpha_1 X^u_{ij}$ and $d\left(X^d_{ij}\right) = \beta_0 + \beta_1 X^d_{ij}$, the expected (net) utility of crime can be written as:

$$V(c_{ij} = 1) = \alpha_0 + \alpha_1 X^u_{ij} + \left[\beta_0 + \beta_1 X^d_{ij}\right] P^c_{jl} \frac{1 - \delta\left(X^\delta_{ij}\right)^{ts_{ij}}}{1 - \delta\left(X^\delta_{ij}\right)} + \varepsilon_{ij}. \qquad (2)$$

The probability that a crime of type $j$ is committed by individual $i$ is then given by $\Pr(c_{ij} = 1) = \Pr\left[V(c_{ij} = 1) > 0\right]$, which is equal to

$$\Pr\left[\varepsilon_{ij} > -\left(\alpha_0 + \alpha_1 X^u_{ij} + \left[\beta_0 + \beta_1 X^d_{ij}\right] P^c_{jl} \frac{1 - \delta\left(X^\delta_{ij}\right)^{ts_{ij}}}{1 - \delta\left(X^\delta_{ij}\right)}\right)\right].$$

This equation is the basis for our estimates in Section 5.

[8] In Appendix A, we show that the model for utility in equation (1) can be obtained from a more general dynamic utility model through a restriction on the option value of future crimes.

[9] This disutility includes the pain and suffering due to being imprisoned, the lack of freedom, as well as the opportunity cost of spending time in prison. We discuss this in more detail in Section 4.

[10] In Section 5.3 we consider alternative forms of discounting, such as hyperbolic discounting.

[11] The expression for $D$ implicitly assumes that the first month in prison is served immediately ($t = 0$). Our results are not sensitive to this.

3.2 Allowing for the New Crime to Be Different from the Old Crime

We now consider the case in which the new crime is not of the same type as the original crime (i.e., $j \neq k$). Let $c_{ijk} = 1$ denote that individual $i$, who previously committed a crime of type $j$, commits a crime of type $k$ after being released from prison. The net utility associated with this choice is

$$V(c_{ijk} = 1) = \alpha_0 + \alpha_1 X^u_{ik} + \left[\beta_0 + \beta_1 X^d_{ik}\right] P^c_{kl} \frac{1 - \delta\left(X^\delta_{ik}\right)^{ts_{ik}}}{1 - \delta\left(X^\delta_{ik}\right)} + \varepsilon_{ik},$$

where $ts_{ik} = ns_{ik} + rs_i$. The probability of $c_{ijk} = 1$ is then given by $\Pr(c_{ijk} = 1) = \Pr\left[V(c_{ijk} = 1) > 0\right]$.

Recall that we only observe whether an individual is caught committing another crime and the day on which that occurred. We do not observe the type of the new crime (e.g., violent, property, etc.). Moreover, even if such information were available, it would still be unobserved for inmates who do not recidivate. With a slight abuse of notation, we can write the probability that individual $i$, who previously committed crime $j$, commits another crime of any type as a sum over all possible future crime choices $k$:

$$\Pr(c_{ij} = 1) = \sum_k \Pr(c_{ijk} = 1) \Pr(k \mid j),$$

where $\Pr(k \mid j)$ are the probabilities of transitioning from crime type $j$ to crime type $k$.

In the data, we directly observe the residual sentence $rs_i$ and the original sentence $os_{ij}$. However, we do not observe the new sentence that an individual would receive for committing a new crime, $ns_{ik}$. The new sentence is likely to be a function of both the individual $i$ and the crime type $k$.
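The aggregation over future crime types can be illustrated with a short Python sketch (ours, not the authors'). It estimates $\Pr(k \mid j)$ as row-normalized counts of consecutive offenses in individual histories, which is one simple way to build the transition probabilities described in Section 2 and is an assumption here, and then forms $\Pr(c_{ij} = 1) = \sum_k \Pr(c_{ijk} = 1) \Pr(k \mid j)$. The crime labels, histories, and per-type probabilities are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(histories):
    """Estimate Pr(k | j) from offense histories (crime types in chronological
    order) as row-normalized counts of j -> k transitions."""
    counts = defaultdict(Counter)
    for history in histories:
        for prev, nxt in zip(history, history[1:]):
            counts[prev][nxt] += 1
    return {j: {k: n / sum(row.values()) for k, n in row.items()} for j, row in counts.items()}


def prob_any_crime(prob_by_type, trans, j):
    """Pr(c_ij = 1) = sum_k Pr(c_ijk = 1) * Pr(k | j)."""
    return sum(prob_by_type[k] * p for k, p in trans[j].items())


# Hypothetical offense histories, each conviction labeled by its most serious crime
histories = [
    ["property", "property", "drug"],
    ["drug", "drug"],
    ["property", "violent", "property"],
    ["drug", "property"],
]
trans = transition_probabilities(histories)
prob_by_type = {"property": 0.25, "drug": 0.20, "violent": 0.10}  # hypothetical Pr(c_ijk = 1)
print(trans["property"])
print(prob_any_crime(prob_by_type, trans, "property"))
```

In practice each $\Pr(c_{ijk} = 1)$ would itself come from the model, evaluated at the crime-specific total sentence $ts_{ik}$; the fixed numbers above only stand in for those model probabilities.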
To construct $ns_{ik}$, we assume that the new sentence can be decomposed as $ns_{ik} = \theta_i\, as_k$, where $as_k$ is the average sentence for a crime of type $k$, and $\theta_i$ is an individual-specific component of severity, reflecting violent attitudes of the offender, as well as any possible systematic sentencing differences based on characteristics of offenders that are observable to judges. We can compute $as_k = \frac{1}{I_k} \sum_{i=1}^{I_k} os_{ik}$ from our data on original sentences, where $I_k$ is the total number of people who committed crime $k$. We can then compute the individual component as $\theta_i = os_{ij} / as_j$. This implies that the new sentence for each possible crime choice $k$ for a given individual $i$ who previously committed crime $j$ can be constructed as

$$ns_{ik} = os_{ij} \frac{as_k}{as_j}. \qquad (3)$$

In the baseline model, for which $j = k$, this implies that $ns_{ij} = os_{ij}$.

3.3 Relationship to Deterrence Models Without Discounting

The primary focus of our paper is estimating criminal discount factors. However, the model described above also has implications for the literature that attempts to measure the deterrent effect of imprisonment. To make this comparison clear, we compare our model to the one used by Drago, Galbiati, and Vertova (2009), who use a dataset very similar to ours to estimate the magnitude of deterrence, but the points we make apply more generally.

For simplicity, if we drop the characteristics $X$ and, using $ts_{ij} = os_{ij} + rs_i$ in the baseline model, express $\frac{1 - \delta^{ts_{ij}}}{1 - \delta}$ as $\frac{1 - \delta^{os_{ij}}}{1 - \delta} + \frac{\delta^{os_{ij}}\left(1 - \delta^{rs_i}\right)}{1 - \delta}$, we can rewrite equation (2) as:

$$V(c_{ij} = 1) = \alpha_0 + \beta_0 P^c_{jl} \frac{1 - \delta^{os_{ij}}}{1 - \delta} + \beta_0 P^c_{jl} \frac{\delta^{os_{ij}}\left(1 - \delta^{rs_i}\right)}{1 - \delta} + \varepsilon_{ij}. \qquad (4)$$

This equation resembles the baseline equation that Drago, Galbiati, and Vertova (2009) estimate, which is given by the following:

$$y_i = \alpha + \beta_0\, os_i + \beta_1\, rs_i + \varepsilon_i, \qquad (5)$$

where $y_i$ is a binary variable for whether individual $i$ commits another crime after being released and $\varepsilon_i$ is assumed to follow a logistic distribution.[12] There are two key differences between equation (4) and equation (5).
First, the original sentence $os$ and the residual sentence $rs$ enter equation (4) non-linearly. Second, $rs$ does not enter equation (4) separately from $os$. The importance of this is that, in Drago, Galbiati, and Vertova (2009), the effect of the residual sentence does not depend (directly) on the length of the original sentence.[13] Put differently, the first year of the sentence is assumed to have the same effect as the last year. This is a general feature of most models of deterrence, in which the effect of additional years of imprisonment is not allowed to vary with the length of the sentence to which they are added.

In equation (4), however, the effect of $rs$ depends directly on $os$ (recall that $ns = os$ in our baseline setup). The intuition for this is straightforward. If an individual is caught committing another crime, the offender will receive a new sentence $ns$. In addition, the sentence will be extended by $rs$. In other words, after the $ns$ months are served, the individual will need to serve an additional $rs$ months. As long as $\delta < 1$, this distinction matters: individuals discount the future, and the effect of an additional $rs$ months in prison depends on when that extra time is served, which depends on $ns$. The larger $ns$ is, the further into the future the $rs$ months are served, and the lower their effect on current behavior.

By L'Hôpital's rule, $\lim_{\delta \to 1} \frac{\delta^{os_i}\left(1 - \delta^{rs_i}\right)}{1 - \delta} = rs_i$ and $\lim_{\delta \to 1} \frac{1 - \delta^{os_i}}{1 - \delta} = os_i$. Therefore, when $\delta = 1$, equation (4) becomes

$$V(c_{ij} = 1) = \alpha_0 + \beta_0 P^c_{jl}\, os_i + \beta_0 P^c_{jl}\, rs_i + \varepsilon_{ij},$$

which matches the form of the regression equation in Drago, Galbiati, and Vertova (2009). Thus their model can be seen as implicitly assuming that the discount factor is equal to 1. Since discount factors less than 1 generate marginal deterrent effects of imprisonment that are decreasing in sentence length, this implies that the estimate of deterrence in Drago, Galbiati, and Vertova (2009) is an average measure of deterrence across sentence lengths.

This may help to explain one of their results. They find that for individuals with the longest original sentences, the effect of the residual sentence is negligible (pp. 275–276). They attribute this to the fact that long sentences are given for serious crimes, and they suggest that more dangerous inmates (those who commit more serious crimes) are not deterred by prison. However, given the derivations above, an alternative explanation is that these individuals, if caught again, are likely to face a lengthy new sentence. The residual sentence will be served after that, and therefore far into the future. If individuals discount the future, then the residual sentence will have a small effect on utility, and thus little effect on decisions to recidivate.[14]

[12] Drago, Galbiati, and Vertova (2009) also estimate specifications which include characteristics of the individual and crime fixed effects.

[13] Because they estimate the model using a logit specification, this is not exactly true. Since the logit model is non-linear, the original sentence length will impact the effect of the residual sentence, but in the same way as any other covariate included in the regression. The original sentence length also has this effect in equation (4), but has an additional effect via its interaction with the residual sentence.

[14] More generally, our model predicts that when the discount factor is less than 1, the effect of the residual sentence on recidivism should be decreasing in the length of the original sentence, where recall that the original sentence is our proxy for

This analysis has important implications for studies of deterrence. To the extent that punishments are applied in the future (as is the case with imprisonment), failing to account for discounting might mask important heterogeneity in deterrent effects.
For example, sentence enhancements or other policies designed to lengthen sentences for very serious offenses (where baseline sentences tend to be long) will potentially have very small deterrent effects on crime. 4 Identification and Estimation Under the assumption that εi j is independent of Xiuj , Xidj , Pjlc ,tsi j , one could estimate the model described by equation (2) via maximum likelihood. However, as stated earlier, the total sentence tsi j = nsi j + rsi is potentially correlated with the error εi j . There may be characteristics of the individual, which make that person more likely to commit crimes, that are unobserved to the econometrician, but observed by judges. As a result, individuals with higher εi j ’s may also receive longer sentences, which generates a correlation between εi j and nsi j (and osi j ). This also generates a correlation between εi j and rsi , since individuals with longer original sentences are likely to have longer residual sentences as well. Fortunately, it is still possible to consistently estimate the parameters of the model. The key identifying assumption is that conditional on the original sentence received osi j , the total sentence tsi j is independent of the error. This conditional independence assumption is based on the fact that the timing of the release is dictated by an (unanticipated) mass prison release. Therefore, the only source of endogeneity in the residual sentence rsi is via the original sentence length. Once we condition on the length of the original sentence, the residual sentence is independent of the error.15 Since nsik = osi j as j ask (or nsi j = osi j in the baseline model), controlling for the original sentence also controls for the endogeneity of the new sentence, since as j and ask do not vary by individual. To summarize, the error in equation (2) is assumed to be correlated with the original sentence, but conditional on the original sentence, independent of the residual sentence. 
Therefore, as we discuss in more detail in Appendix B, we can control for the endogeneity by expressing ε as a function of the original sentence: εi j = h (osi j ; γ) + ui j , the new sentence. Although they do not highlight this, this is exactly what Drago, Galbiati, and Vertova (2009) find (see their Table 4). 15 See Drago, Galbiati, and Vertova (2009), which employs a similar assumption in order to estimate the deterrent effect of imprisonment, for a detailed justification of this assumption. 13 where h is a flexible function of os, and u is is assumed to be independent of os. By replacing this expression in to equation (2), we have the following equation:  tsi j  h i 1 − δ Xiδj u d c  V (ci j = 1) = α0 + α1 Xi j + β0 + β1 Xi j Pjl   + h (osi j ; γ) + ui j , 1 − δ Xiδj (6) where u is independent of Xiuj , Xidj , Pjlc ,tsi j , osi j . The additional term h (osi j ; γ) controls for the dependence between the error and the total sentence tsi j that would otherwise bias our estimates.16 Intuitively, identification of the discount factor is obtained by measuring how each additional month of a prison sentence deters future crimes, conditional on the total length of the sentence. If the discount factor was equal to 0, then variation in the residual sentence would have no impact on the probability of recidivism, as individuals would place no value on future prison sentences. As the discount factor increases towards 1, each additional month of residual sentence should have an increasing (negative) effect on the probability of recidivism. The rate at which the marginal deterrent effect of imprisonment changes is what identifies the discount factor. 
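Section 5.1 estimates equation (6) by maximum likelihood, with a logistic error $u$ and a cubic polynomial in the log of the original sentence for $h$. Under those choices the likelihood is a standard logit likelihood with a non-linear index. The Python sketch below writes it out for the simplest case: no covariates and a constant probability of being caught, so that $\beta_0 P^c_{jl}$ collapses into a single coefficient `b`.

This is an illustration under stated assumptions, not the authors' estimation code. The function names, parameter ordering and toy data are hypothetical. No optimizer is included. The polynomial is written without its own intercept, on the assumption that $\alpha_0$ absorbs it. The helper's $\delta = 1$ branch is the L'Hôpital limit discussed above.

```python
import math
from typing import Iterable, Sequence


def discounted_months(delta: float, months: float) -> float:
    """(1 - delta**months) / (1 - delta); equals `months` in the delta -> 1 limit."""
    if delta == 1.0:
        return float(months)  # no discounting: every month counts fully
    return (1.0 - delta**months) / (1.0 - delta)


def index_eq6(params: Sequence[float], ts: float, os_: float) -> float:
    """Latent index of eq. (6) without covariates (hypothetical parameterization).

    params = (alpha0, b, delta, g1, g2, g3), where b stands in for beta0 * P
    (P assumed constant) and h(os) = g1*x + g2*x**2 + g3*x**3 with x = log(os).
    """
    alpha0, b, delta, g1, g2, g3 = params
    x = math.log(os_)
    h = g1 * x + g2 * x**2 + g3 * x**3
    return alpha0 + b * discounted_months(delta, ts) + h


def log_likelihood(params: Sequence[float], data: Iterable[tuple[int, float, float]]) -> float:
    """Logit log-likelihood: P(recidivate) = 1 / (1 + exp(-index)) when u is logistic.

    Each observation is (y, ts, os): y = 1 if the person recidivated, sentences in months.
    """
    total = 0.0
    for y, ts, os_ in data:
        v = index_eq6(params, ts, os_)
        # numerically stable log of the logistic cdf
        log_p = -math.log1p(math.exp(-v)) if v >= 0 else v - math.log1p(math.exp(v))
        log_1mp = log_p - v  # log(1 - p) = log(p) - v for the logistic cdf
        total += log_p if y == 1 else log_1mp
    return total


if __name__ == "__main__":
    toy = [(1, 18, 12), (0, 40, 36), (0, 30, 24), (1, 10, 8)]  # made-up (y, ts, os) rows
    params = (0.5, -0.05, 0.74 ** (1 / 12), 0.1, 0.0, 0.0)     # hypothetical values
    print(log_likelihood(params, toy))
```

Maximizing this function over the six parameters, for example with any numerical optimizer, would give maximum likelihood estimates. Because `delta` here is monthly, the corresponding annual factor is `delta**12`.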
A key identifying assumption is that the flow disutility from prison does not depend on the sentence length.[17] Suppose instead that the per-period flow disutility decreased by a constant fraction each period, mimicking the discount factor. We would then be unable to separately identify the discount factor from the rate of decay of the disutility of prison.

We are not aware of any estimates in the economics literature of the relationship between prison time and the per-period disutility of imprisonment. However, studies in criminology and psychiatry have found no evidence of a correlation between prison time and various measures of prison well-being, e.g., subjective quality of life, depression, anxiety, and post-traumatic stress symptoms (see Bukstel and Kilmann (1980); Hochstetler, Murphy, and Simons (2004); Gullone, Jones, and Cummins (2000)). Furthermore, given our baseline estimates of the discount factor in Section 5 below, the value of the 15th year in prison is only about 1% of the value of the first year in prison. If this were purely the result of a decline in the flow disutility of imprisonment over time, rather than of discounting, criminals would be close to indifferent between spending that 15th year in prison and being released. We believe this is unlikely to be the case.

While we believe that the assumption of a constant flow disutility of prison is a reasonable one, it may not even be necessary from a policy perspective. A more general interpretation of our results would be that of decreasing marginal returns to additional years of imprisonment.

[16] Essentially, $h(os_{ij};\gamma)$ acts as a control function (Heckman and Robb (1985)).
[17] This assumption is also employed in Lee and McCrary (2009).
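The year-by-year comparison above rests on a short calculation. With a constant flow disutility, the weight of the $t$-th year in prison relative to the first is $\delta^{t-1}$ for an annual discount factor $\delta$. The indexing convention (first year weighted by $\delta^0$) is an assumption made here.

The Python sketch below computes these weights. Under this convention, $\delta = 0.74$ gives about 30% and 7% for years 5 and 10, and roughly 1.5% for year 15. $\delta = 0.95$ gives just under 50% for year 15.

```python
def relative_year_weight(annual_delta: float, year: int) -> float:
    """Weight of the `year`-th year in prison relative to the first (year >= 1).

    Assumes a constant per-year flow disutility, so only discounting lowers the weight,
    and indexes the first year as annual_delta**0 (an illustrative convention).
    """
    if year < 1:
        raise ValueError("year must be >= 1")
    return annual_delta ** (year - 1)


if __name__ == "__main__":
    for delta in (0.74, 0.95):
        row = ", ".join(f"year {t}: {relative_year_weight(delta, t):.1%}" for t in (1, 5, 10, 15))
        print(f"delta={delta}: {row}")
```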
The policy-maker cares about the deterrent effect of various sentence lengths when designing policies to reduce crime. The policy-maker may not care whether the diminishing marginal returns to imprisonment come from discounting or from a decreasing negative value placed on subsequent periods of punishment.

Drago, Galbiati, and Vertova (2009) find that each additional month of residual sentence decreases the probability of recidivism, which they interpret as evidence of a deterrent effect of imprisonment. An alternative explanation, offered by Philip Cook (see Durlauf and Nagin (2011)), is that each additional month of imprisonment (and therefore one less month of residual sentence) increases a prisoner's criminal capital, which leads to a higher recidivism rate. Since both theories generate a negative relationship between residual sentence and the propensity to recommit crime, it is difficult to disentangle the two mechanisms. However, by considering the relationship between discounting and deterrence, we show that it is possible to distinguish between them.

Under the theory of deterrence, as long as discount factors are strictly below 1, the first months of the residual sentence should affect behavior the most, since they are discounted the least. Alternatively, under the theory that being imprisoned is a treatment that increases the propensity to commit crime, one would expect the first months of imprisonment to have the strongest effects, as inmates acquire the most knowledge capital initially (i.e., decreasing marginal returns). Since the amount of time served in prison and the residual sentence are negatively correlated, the last few months of the residual sentence correspond to the earliest months of imprisonment. These months should therefore have the largest impact on behavior under the treatment hypothesis, and the smallest under the deterrence hypothesis. We illustrate this idea in Figure 1 for a hypothetical original sentence of 3 years.
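The shape argument can be checked with a small deterministic sketch in Python. It uses the hypothetical 3-year original sentence and builds a recidivism index, up to an unknown positive scale, under each mechanism:

- Deterrence: minus the discounted weight of $rs$ residual months, served after an $os$-month new sentence.
- Treatment: the criminal capital accumulated over the $os - rs$ months actually served, with each month worth a constant fraction of the previous one.

Both mechanisms use the annual rate 0.74, converted to a monthly rate. The monthly conversion and the functional forms are illustrative assumptions, not the paper's code. First and second differences check that both curves fall with $rs$, that the deterrence curve is convex, and that the treatment curve is concave.

```python
OS_MONTHS = 36                # hypothetical 3-year original sentence
MONTHLY = 0.74 ** (1 / 12)    # assumed monthly analogue of the annual rate 0.74


def geometric_sum(rate: float, months: int) -> float:
    """Sum of rate**t for t = 0..months-1."""
    return (1.0 - rate**months) / (1.0 - rate)


def recidivism_deterrence(rs: int) -> float:
    """Deterrence: more discounted prison time (served after ns = os) lowers recidivism."""
    return -(MONTHLY**OS_MONTHS) * geometric_sum(MONTHLY, rs)


def recidivism_treatment(rs: int) -> float:
    """Treatment: criminal capital from the os - rs months served raises recidivism."""
    return geometric_sum(MONTHLY, OS_MONTHS - rs)


def differences(f, order: int) -> list[float]:
    """First (order=1) or second (order=2) differences of f over rs = 0..OS_MONTHS."""
    values = [f(rs) for rs in range(OS_MONTHS + 1)]
    for _ in range(order):
        values = [b - a for a, b in zip(values, values[1:])]
    return values


if __name__ == "__main__":
    for name, f in (("deterrence", recidivism_deterrence), ("treatment", recidivism_treatment)):
        slope, curvature = differences(f, 1), differences(f, 2)
        print(f"{name}: decreasing={all(d < 0 for d in slope)}, "
              f"convex={all(c > 0 for c in curvature)}, concave={all(c < 0 for c in curvature)}")
```

Both indices fall as the residual sentence grows, which is why the sign of the relationship alone cannot separate the two mechanisms. Only the curvature differs.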
In the left panel, we plot the hypothetical marginal effects of an additional month of residual sentence under both the deterrence and treatment mechanisms. For deterrence, we assume a discount factor of 0.74. For treatment, we assume that the value of treatment decreases exponentially at a rate of 74% (i.e., the criminal capital acquired in prison next year is 74% of that acquired in the current year). In the right panel, we plot the cumulative effects. The first thing to note about the right panel is that both mechanisms generate a negative relationship between residual sentence and recidivism, which is what makes it challenging to distinguish between the two theories. The key difference is in the shape of the relationship, which allows us to disentangle them: for deterrence it is convex, while for treatment it is concave.

Intuitively, the shape of the relationship between the (dis)utility of imprisonment (which drives recidivism in the data) and sentence length is what identifies the discount factor in our model. We illustrate this in Figure 2 by graphing the cumulative utility from prison as a function of sentence length for a discount factor of 0.74. The more convex the relationship between residual sentence length and the probability of recidivism, the larger the degree of discounting. In the case of no discounting (a discount factor of 1), this relationship would be linear. Since we find estimates of the discount factor of around 0.74, this indicates that the relationship in the data is in fact convex. Under the assumption of decreasing marginal returns to criminal capital accumulation, the relationship between residual sentence and recidivism is concave. For the treatment hypothesis to rationalize the data, the criminal capital returns to being in prison would need to be convex over time.
This would imply that individuals learn very little when they first enter prison, but that this learning accelerates over time, with the most criminal capital gained in the very last part of incarceration. We believe this scenario is less likely. Overall, this insight connecting deterrence and discounting helps us to distinguish between the two hypotheses, and it lends support to the deterrence hypothesis.

5 Empirical Results

Before we present the estimates from our model, we first provide some evidence from the raw data that illustrates how the discount factor is identified. In Table 5, we present linear probability model estimates of recidivism on residual sentences, conditional on the inmates' original sentence, for various bins of the original sentence. The three cases make different predictions:

- If people did not care at all about the future (discount factors of 0), the length of the residual sentence would not matter for recidivism, and we would expect all of the coefficients on residual sentence to be zero.
- If they had discount factors of 1, meaning that they did not discount the future at all, the coefficient would be negative, but it would be the same regardless of the original sentence.
- If the discount factor were between 0 and 1, the coefficient should decrease in absolute value towards 0 as the original sentence increases (recall that the original sentence proxies for the new sentence).

We do indeed find that the group with the shortest original sentences (below one year) has the steepest marginal reduction in recidivism: as the residual sentence increases by one month, recidivism drops by 1.46 percentage points (pp). The second steepest slope is found for inmates with the next longest sentences (between one and two years), with a slope indicating a drop in recidivism of 0.50 pp, and so on. For the last group of inmates, those with sentences longer than six years, the regression line is almost flat.
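The coefficient pattern predicted for Table 5 follows from the discounted-sum form of the model. One extra month of residual sentence is served after the new sentence (equal to the original one in the baseline), so its weight in the prison term is $\delta^{os+rs}$ with a monthly $\delta$.

The Python sketch below shows the three cases listed above: zero weight for $\delta = 0$, a constant weight for $\delta = 1$, and a weight that shrinks with the original sentence for $0 < \delta < 1$. It is only an illustration. The bin midpoints of 6, 18 and 30 months are assumptions, and the actual linear-probability slopes also depend on $\beta_0$, the probability of being caught and the error distribution.

```python
def extra_residual_month_weight(monthly_delta: float, os_months: int, rs_months: int = 0) -> float:
    """Discount weight of one more month of residual sentence.

    In the baseline model ns = os, so the extra residual month is served after
    os + rs months and is weighted by monthly_delta**(os + rs).
    """
    return monthly_delta ** (os_months + rs_months)


BIN_MIDPOINTS = (6, 18, 30)  # hypothetical midpoints (months) of original-sentence bins

if __name__ == "__main__":
    for label, delta in (("delta=0", 0.0), ("delta=1", 1.0), ("annual 0.74", 0.74 ** (1 / 12))):
        weights = [extra_residual_month_weight(delta, os_m) for os_m in BIN_MIDPOINTS]
        print(label, [round(w, 3) for w in weights])
```

With an annual factor of 0.74, moving up one 12-month bin multiplies the weight by 0.74. The shrinking weights mirror the flattening slopes, but their exact ratios need not match the estimated coefficients.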
The slope for the overall sample, -0.20 pp, is a weighted average of the various slopes. When considering the longer recidivism window, it is similar in size to the estimated marginal effect of imprisonment of -0.16 pp found in Drago, Galbiati, and Vertova (2009). The coefficient on residual sentence for the shortest sentences is almost an order of magnitude larger than the overall value. The relationship between the probability of recidivism and residual sentence therefore weakens substantially as the original sentence length increases. This is consistent with non-trivial amounts of discounting by criminals.

As a more direct way to see how the discount factor is identified in the data, we provide some nonparametric plots of the relationship between recidivism and total expected sentence length. Specifically, we compute the average recidivism rate for each of 40 quantiles of sentence length and plot them in the left panel of Figure 3.[18] The figure shows that as the total sentence length increases, the recidivism rate decreases, which is consistent with deterrence. Furthermore, the relationship is convex, suggesting that the marginal deterrent effect of imprisonment is decreasing, consistent with discounting.

In the right panel, we add the original sentence as a control in order to deal with the endogeneity of the total expected sentence.[19] To do this, we regress the recidivism indicator on a flexible function of the original sentence (a cubic polynomial in logs) and dummies for each of the quantiles of sentence length.[20] We then plot the relative coefficients on each of the dummies. There are two key changes in the figure. First, the overall magnitude of deterrence increases by a factor of almost four. Since sentence length is likely correlated with unobservable drivers of recidivism, we would expect the effect of increased sentence length to be biased towards zero.
Once we add controls for the original sentence to correct for this bias, we see much larger estimated deterrent effects. Second, the predicted recidivism rates for each bin of sentence length are much more tightly concentrated along a curve that closely resembles Figure 2. The curvature of this line is what pins down the discount factor. Overall, this descriptive evidence suggests non-trivial amounts of discounting that are apparent even in the raw data on recidivism and sentence length. The tight relationship between recidivism and sentence length in Figure 3 foreshadows the high statistical precision of our estimates and the robustness of our results to alternative model specifications. To estimate the specific value of the discount factor, we now turn to the estimates from our model.

[18] We trim the top and bottom 1% of total sentence length to make the figure easier to read.
[19] Conditional on the original sentence, the remaining variation in total sentence is due to the exogenous variation in residual sentence.
[20] Using splines and dummy variables based on the months of original sentence length led to very similar results.

5.1 Baseline Results

We estimate our baseline specification described in equation (6) by maximum likelihood, using a logistic distribution for the error $u$, and we model $h(os)$ using a cubic polynomial in the logarithm of the original sentence.[21] In Table 6, we present estimates of the baseline model in equation (6). In our main dataset we do not observe the probability of being caught $P^c_{jl}$, and in our baseline estimates we assume that it is constant. In Section 5.3 below, we bring in additional external data on clearance rates to capture this probability, and find that our results are qualitatively unaffected. We start in column 1 with the simplest specification, which includes no covariates.
In other words, we investigate the effect of the residual sentence on recidivism, conditioning only on the original sentence length to control for the endogeneity of the residual sentence. Our estimates imply an annual discount factor of 0.71.[22] In columns 2-5 we add controls to the estimating equation, so that the utility from committing a crime and the disutility of imprisonment can vary across individuals.

In column 2, we add controls for age, permanent employment status, gender, marital status, whether the inmate is Italian, and whether the original sentence is definitive (all appeals have been exhausted). We also add binary indicators for low education and high education. We define low education as having at most a primary education (corresponding to less than 5 years of education). We define high education as an education at or beyond secondary school (corresponding to at least 13 years of education).

The first thing to note is that the estimated discount factor is relatively unchanged, having increased slightly to 0.74. Age is associated with a lower utility of committing a crime and a higher disutility of going to prison, although the former effect is not precisely estimated. One explanation is that, as people age, the expected benefit from committing crime falls, perhaps because they become less productive in criminal activity, and serving time in prison becomes more unpleasant. Having a definitive sentence is associated with a higher disutility of prison. This is intuitive given that those without definitive sentences face some chance of having their residual sentence commuted or shortened upon appeal.

[21] Model selection criteria show that this is the preferred specification for the original sentence. We also tried using splines and dummy variables based on the months of original sentence length, which led to very similar results.
[22] If $\delta$ is the monthly discount factor, then $\delta_A = \delta^{12}$ is the annual one.
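Footnote 22's conversion, together with the delta method used for the annual confidence intervals, can be sketched as follows. If the model is estimated with a monthly $\delta$ and standard error $se(\delta)$, the annual factor $\delta_A = \delta^{12}$ has an approximate standard error of $12\,\delta^{11}\,se(\delta)$. The monthly standard error below is a made-up number for illustration, not an estimate from Table 6.

```python
from statistics import NormalDist


def annual_from_monthly(monthly_delta: float) -> float:
    """Annual discount factor implied by a monthly one: delta_A = delta**12."""
    return monthly_delta ** 12


def annual_se_delta_method(monthly_delta: float, monthly_se: float) -> float:
    """Delta-method standard error of delta**12: (d/d delta of delta**12) * se = 12 * delta**11 * se."""
    return 12 * monthly_delta ** 11 * monthly_se


if __name__ == "__main__":
    monthly = 0.74 ** (1 / 12)          # monthly factor consistent with an annual value of 0.74
    monthly_se = 0.004                  # hypothetical standard error, for illustration only
    annual = annual_from_monthly(monthly)
    se = annual_se_delta_method(monthly, monthly_se)
    z = NormalDist().inv_cdf(0.975)     # two-sided 95% normal critical value
    print(f"annual={annual:.3f}, se={se:.4f}, 95% CI=[{annual - z * se:.3f}, {annual + z * se:.3f}]")
```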
Being Italian is not associated with the utility from committing a crime, but it is associated with a lower disutility of prison. This suggests that going to prison in Italy is worse for foreigners than for Italians. This makes sense because Italians have a higher chance of receiving alternative sentences such as home arrest and parole. For example, according to ISTAT (2015), in 2013 foreigners represented 35 percent of the prison population but only 13 percent of the population receiving alternative sentences. Furthermore, in Italy inmates are required by law to be imprisoned in the prison closest to their home, so that family can most easily visit. However, many foreign inmates are illegal immigrants,[23] who by definition have no official home. They are therefore likely to be imprisoned further from friends and family. We also find that being married is associated with a lower utility of committing a crime and a lower disutility of going to prison. Gender, permanent employment prior to the original incarceration, and the education level of the individual are not associated with either component of utility.

In columns 3 and 4 of Table 6, we estimate the model using alternative definitions of low education and high education. We first restrict low education to include only individuals with no completed education (column 3). In column 4, we redefine high education as a high school degree or higher.[24] These changes have very little effect on the other estimates, including the discount factor. The point estimates on the indicators for low and high education do change somewhat, but none of them is statistically different from zero. The differences between these education measures do matter, however, when we interact them with the discount factors in Section 5.2.
In column 5 we add dummies for the various crime categories, since the returns to committing crimes and/or the opportunity cost of going to prison might vary by crime type. The other coefficients are largely unchanged. As in the previous results, the estimated discount factor is relatively unaffected, with a point estimate of 0.70. Overall, the results in Table 6 indicate that the (annual) discount factor among our sample of previous criminal offenders is between 0.70 and 0.74. In line with the nonparametric evidence shown in the second panel of Figure 3, the discount factor is precisely estimated. The 95% confidence interval for our baseline result in column 2 is [0.65, 0.83], and thus we can easily reject the hypothesis that the discount factor is either 0 or 1.[25]

[23] In Italy, 70% of foreign inmates are illegal immigrants (see Italian Ministry of Internal Affairs (2007)).
[24] Compared to our original measure of high education, we exclude individuals who went to vocational school.
[25] Standard errors, and therefore confidence intervals, for the annual discount factors can be computed via the delta method.

To illustrate how discounting affects the relationship between utility and imprisonment, Figure 4 graphs the present discounted flow disutility of prison as a function of sentence length, for various values of the discount factor. We normalize the disutility to 100 in period 0. Under no discounting (a discount factor of 1), the relationship would be a horizontal line at 100. As the figure illustrates, even small changes in the discount factor lead to large relative changes in utility, and therefore in behavior, which is what allows us to obtain precise estimates of the discount factor. Our estimates imply that the disutility of an additional year of imprisonment is 74% of that of the previous year.
This indicates a fairly large degree of discounting by criminal offenders, but not so much that they act myopically and ignore all future consequences of their actions. Relative to the first year in prison, the marginal discounted annual disutility of prison drops to 30%, 7%, and 1% after 5, 10, and 15 years, respectively. The effect diminishes at a reasonably fast rate, but not so fast that the difference between short- and medium-term sentences becomes meaningless from a deterrence perspective. In other words, there is scope for a deterrent effect of imprisonment.

For a more traditional discount factor of 0.95, the discounted marginal disutility of imprisonment falls to 50% of its original value after 15 years. For smaller discount factors of 0.10 and 0.30, the effect drops below 1% after only 3 and 5 years, respectively. Such values would imply negligible marginal deterrent effects of imprisonment for sentences of more than a few years. This figure helps to illustrate that our estimates of the criminal discount factor are quite reasonable. At a discount factor of 0.74, there is some drop-off in the deterrent effect, but it does not vanish immediately. If discount factors were much smaller, there would be almost no deterrent effect of imprisonment beyond the first year or two.

Estimates of the individual discount factor for the general population vary considerably in the literature, ranging from values close to 0 to close to 1. These estimates come both from lab experiments (e.g., Coller and Williams (1999); Harrison, Lau, and Williams (2002); Andersen et al. (2014)) and from observational studies (e.g., Viscusi and Moore (1989); Warner and Pleeter (2001); Cagetti (2003)). However, the majority of the estimates range from 0.7 to close to 1. Our estimates are at the low end of those found in the literature, particularly compared to those from other observational or non-experimental studies.
This is consistent with the idea that individuals who decide to commit crimes have lower discount factors than people in the population at large. It also suggests that placing little weight on the future could be an important driver of criminal behavior.

The only other paper that we are aware of that attempts to elicit individual discount factors for criminals is Lee and McCrary (2009). Using detailed panel data on arrests in Florida, they exploit the discontinuity in the severity of punishment at the age of majority (18). Despite large increases in the average severity of punishment at age 18, they find very small drops in the probability of arrest. Using a dynamic extension of Becker's (1968) model, they map these estimates into plausible values of the discount factor, and find that the youths they study are essentially myopic. Specifically, their point estimates are inconsistent with annual discount factors larger than 0.022, although they note that they cannot statistically rule out larger ones.

Several factors likely contribute to the difference between our estimates and those in Lee and McCrary (2009). First, the effect of the policy in our sample is likely much more salient. Individuals in our dataset are older, with an average age of 38 compared to 18 in Lee and McCrary (2009). They are therefore likely to be more familiar with the criminal justice system, particularly the adult system.[26] Individuals in our sample are also quite well informed about the future consequences of their actions, both in terms of the legal implications (a longer sentence) and the costs associated with being imprisoned. Their early release from prison, which should be apparent to them, comes with the condition that they must serve the residual sentence if they are convicted again.
Furthermore, the massive scale of the pardon should have increased the flow of information about it among prisoners. So should the homogeneity of the policy across individuals, with the important exception of the length of the residual sentence. Second, given the age differences between the samples, younger people may act more myopically (particularly those who decide to commit criminal acts), perhaps due to problems with self-control or impulsivity (Wilson and Herrnstein (1985); Jolliffe and Farrington (2009)). A direct comparison is difficult for us, as no one in our sample is younger than 19, and fewer than 1% are below the age of 22.

[26] Using NLSY data for the US, Hjalmarsson (2009) finds that youth perceptions of the change in sanctions at the age of majority (age 18) are smaller than the true changes observed in the data.

5.2 Heterogeneity in the Discount Factor

To investigate whether there are important differences in discounting behavior across individuals, we also allow discount factors to vary based on observables. We first allow the discount factor to vary with characteristics of the inmates, such as gender, age, education, and nationality. Second, we allow the discount factor to vary with the category of the original crime committed (e.g., violent, property, drugs). This analysis describes how discount factors vary (or do not vary) across individuals. It also provides a check on our results, as it facilitates comparison to other results in the literature on individual discount factors.

We first estimate the model allowing the discount factors to depend on individual characteristics. We do this by interacting the discount factor $\delta$ with indicator variables for marital status, gender, low education, high education, and employment status, as well as with age and measures of nationality. These results are presented in Tables 7 and 8.
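To illustrate what a log-linear age interaction implies, the sketch below makes a simplifying assumption of its own: that the annual discount factor is linear in $\ln(\text{age} - 18)$. The exact parameterization is not spelled out here, so this form is an assumption. The sketch backs out the intercept and slope that reproduce the log-linear figures reported for Table 7: 0.78 at age 26 and 0.68 at age 51. The implied profile falls steeply at younger ages and then flattens.

```python
import math


def fit_log_age_line(age_lo: float, delta_lo: float, age_hi: float, delta_hi: float) -> tuple[float, float]:
    """Solve delta(age) = a + b * ln(age - 18) through two (age, delta) points (assumed form)."""
    x_lo, x_hi = math.log(age_lo - 18), math.log(age_hi - 18)
    b = (delta_hi - delta_lo) / (x_hi - x_lo)
    a = delta_lo - b * x_lo
    return a, b


def delta_at_age(age: float, a: float, b: float) -> float:
    """Annual discount factor at a given age (age > 18) under the assumed log-linear form."""
    return a + b * math.log(age - 18)


if __name__ == "__main__":
    a, b = fit_log_age_line(26, 0.78, 51, 0.68)  # reported 10th and 90th age percentiles
    print(f"a={a:.3f}, b={b:.4f}")
    for age in (22, 26, 31, 41, 51, 61):
        print(age, round(delta_at_age(age, a, b), 3))
```

Because the logarithm is concave and the slope is negative, equal age steps produce shrinking drops in the discount factor, which is the "decrease at a decreasing rate" pattern.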
In columns 1 and 2 of Table 7, the point estimates suggest higher annual discount factors for inmates who are married and for women (by 6 percentage points (pp) and 9 pp, respectively). Although the differences are not precisely estimated, our results with respect to gender are consistent with the literature (see, e.g., Cagetti (2003); Coller and Williams (1999)). We are not aware of any results in the literature on a systematic relationship between discount factors and marital status. We find little evidence of differences based on whether or not an individual was permanently employed at the time of the imprisonment that preceded the pardon.

We find evidence that discount factors decrease with age. In columns 4 and 5 we report results in which the discount factor depends on age in a log-linear[27] and a quadratic fashion, respectively.[28] In both cases we find evidence that discount factors decrease with age at a decreasing rate. For the log-linear specification, the discount factor decreases from 0.78 at age 26 (the 10th percentile of the age distribution) to 0.68 at age 51 (the 90th percentile). With the quadratic model, the discount factor drops from 0.76 at age 26 to 0.68 at age 51. Overall, there seems to be a somewhat steep decline in discount factors early on, followed by a flattening out. There is no consensus in the literature on the relationship between time preference and age. Some papers find that discount factors decrease with age (e.g., Meier and Sprenger (2010)), and others find the opposite (e.g., Warner and Pleeter (2001)). One potential explanation for the negative relationship in our dataset is selection. To the extent that low discount factors are an important driver of crime, individuals who are still committing crimes later in life are likely those with the lowest discount factors.

For education, we consider two definitions of both low and high education, following the definitions used previously.
For low education (columns 6 and 7), the differences in discount factors for both definitions are small (less than 2%) and statistically insignificant. The signs are nonetheless what we would expect (lower educational attainment is associated with lower discount factors). One likely explanation is that the average education level of individuals in our dataset is quite low. Therefore, distinguishing between low and very low education levels has negligible effects on the estimates of the discount factor.

[27] Specifically, we interact the discount factor with the natural log of the age of the offender minus 18 years.
[28] We also tried interacting the discount factor with dummies based on different age bins, and found similar results.

In contrast to the measures of low education, we do find sizable differences for high education. The results in columns 8 and 9 indicate that a higher education is associated with significantly larger discount factors. For individuals with at least 13 years of education, the estimated discount factor is 0.89, compared to 0.73 for the rest of the released inmates. When we exclude those who attended vocational school from the high education measure, the difference is even larger. The estimated discount factor for those with high education is then 0.99, a difference of 26 percentage points. Less than 2% of the sample have this level of education, yet the difference is very precisely estimated, which suggests that particularly high levels of education are strongly related to discount factors. Given the nature of our data, we cannot determine whether education actually increases discount factors (Becker and Mulligan (1997); Lochner and Moretti (2004)) or whether people with high discount factors are more likely to invest in education.
Regardless of the explanation, and consistent with the literature, we find strong evidence of a positive relationship between (high) education and discount factors (Viscusi and Moore (1989); Warner and Pleeter (2001); Cagetti (2003)). The last set of individual characteristics that we analyze is based on nationality. Recall that we observe the country of origin for each individual. We first estimate a specification in which we allow the discount factor to depend on whether the person is Italian or not. The results in column 1 of Table 8 indicate a discount factor of 0.78 for Italians compared to 0.66 for non-natives. This result could be driven by a number of factors. First, the discount factors of the populations from which the immigrants originate could be lower than in Italy. It could also be the case that those individuals who both immigrate and commit crimes have particularly low discount factors. Salience might be important here as well. Immigrants may be less familiar with the Italian justice system and may not speak Italian, both of which could lead them to underestimate the consequences of their future criminal activities, and therefore to lower estimated discount factors in the data. Regardless of the explanation, one implication of our estimates is that foreigners are less likely to be deterred by imprisonment, particularly by longer sentences. In addition to examining differences between Italian natives and immigrants, we examine how criminal discount factors vary among immigrants by country of origin. A recent literature has emerged documenting an empirical cross-country relationship between measures of time preference and many aggregate behaviors and outcomes such as national income, investment in human and physical capital, and savings rates (e.g., Galor and Özak (2014) and Dohmen et al. (2015)).
In a recent attempt to determine the cause of these underlying differences in time preference, Chen (2013) examines differences across countries based on the extent to which their languages distinguish between current and future events, which linguists refer to as future time reference (FTR). His hypothesis is that speaking in a way that places a larger distinction between current and future events causes people to value the future less. In support of this, Chen finds a negative relationship between strong FTR and future-oriented behaviors such as saving and various health-related activities. In other words, in countries in which the language places a stronger distinction between the present and the future, individuals exhibit less forward-looking behavior on average.29 To investigate differences in discounting across immigrants based on country of origin, we merged in the FTR data from Chen's paper. We matched the primary languages spoken in each country30 to the country of origin of each individual. Each language is coded by linguists as having either strong or weak FTR. If all major languages spoken in a country had strong FTR, or all had weak FTR, we classified the country accordingly. For example, the primary language in Italy is Italian, which is classified as a strong-FTR language. If some languages spoken had strong FTR and others weak FTR, we classified the country as “both”. For example, in Belgium the primary languages are Flemish (weak FTR) and French (strong FTR). We estimate two specifications corresponding to different ways of modeling countries with an FTR of “both”. The results from these models are reported in columns 2 and 3 of Table 8. In each specification we also include an indicator for whether an individual is an Italian native, to allow for differences in discount factors between native Italians and immigrants from other countries with strong FTR.
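The country classification rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the dictionary contains only the three languages named in the text, the function names are hypothetical, and the encoding of “both” follows the two treatments used in Table 8 (a value of 0.5 for the strong-FTR indicator, or dropping the observation, represented here by `None`).

```python
# Hypothetical FTR codes for the languages named in the text; a full coding
# would come from the linguistic data merged in from Chen (2013).
LANGUAGE_FTR = {
    "Italian": "strong",
    "French": "strong",
    "Flemish": "weak",
}


def classify_country(primary_languages):
    """Classify a country as 'strong', 'weak', or 'both' from its primary languages."""
    if not primary_languages:
        raise ValueError("a country needs at least one primary language")
    codes = {LANGUAGE_FTR[lang] for lang in primary_languages}
    if codes == {"strong"}:
        return "strong"
    if codes == {"weak"}:
        return "weak"
    return "both"  # some primary languages are strong-FTR, others weak-FTR


def strong_ftr_indicator(category, both_value=0.5):
    """Strong-FTR regressor; both_value=None stands for 'drop the observation'."""
    if category == "strong":
        return 1.0
    if category == "weak":
        return 0.0
    return both_value


print(classify_country(["Italian"]))             # Italy
print(classify_country(["Flemish", "French"]))   # Belgium
```

Keeping the language-level coding separate from the country-level rule makes it easy to switch between the two treatments of “both” without reclassifying countries.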
In column 2, for countries with an FTR of “both”, we assign a value of 0.5 to the strong-FTR indicator, and in column 3, we drop these observations. In line with the results in column 1, we find that native Italians have the largest discount factors. Our results are also consistent with those in Chen (2013), as we find that discount factors are strongly correlated with FTR: immigrants from countries with weak FTR have significantly higher discount factors than those from countries with strong FTR (12 to 14 pp, depending on the specification).
29 See also Falk et al. (2015), who find a systematic relationship between patience and FTR.
30 This information was collected from the CIA World Factbook.
An alternative interpretation of these results is that linguistic differences across countries could be proxying for other cultural factors that lead to differences in time preference. We do not take a stand on the particular mechanism at work. Instead, we emphasize that the patterns in discounting behavior across country of origin among criminals in our data are consistent with those in other forward-looking behaviors in the population at large. This gives additional credibility to our results and suggests that they also reflect patterns in the general population. Lastly, in addition to examining differences in time preference based on individual characteristics, we also examine whether there are differences across crime types. In Table 9, we report results from 10 different specifications (one for each crime category) in which we interact the discount factor with an indicator for whether the person was originally imprisoned for a crime of that type. For most crime categories, the estimated differences in discount factors are small, both statistically and economically. However, for two crime categories we find large, statistically significant differences: prostitution and drug-related crimes.
For prostitution crimes, we find a discount factor of 0.92 compared to 0.72 for all other crimes. In Italy, prostitution is legal, but organized prostitution is not. Therefore, the prostitution crimes in our data relate to the supply of prostitutes. It is interesting that individuals participating in this type of crime have discount factors closer to those we typically assign to economic agents, as these crimes involve running a business (albeit an illegal one), as opposed to the theft of property or acts of violence.31 We find discount factors that are 10 percentage points lower for violations of the law regarding the use and sale of drugs. Selection of more myopic individuals into drug use might explain this result. It may also be the case that drug use directly affects time preferences and/or decision making.
31 Of course, this is not to say that such crimes do not involve violence, but rather that violence is not the primary activity.
5.3 Robustness Checks
In this section we discuss a number of robustness checks to our baseline specification. In particular, we control for differences in clearance rates and allow individuals to transition to different crimes, following the model described in Section 3.2. We also examine the impact of the fact that only individuals who receive future sentences of at least two years are subject to serving their residual sentence, and we explore the evidence for hyperbolic discounting behavior.
5.3.1 Clearance Rates
Our main dataset does not contain any information on clearance rates. To determine whether variation in clearance rates, across either location or crime type, could affect our estimates of discounting, we perform two sets of robustness checks. In the first set we attempt to control for regional differences in clearance rates using location-specific and crime-specific dummy variables. The results are reported in columns 1-4 of Table 10.
In the first column we model the clearance rates using a set of three dummy variables, corresponding to the three areas of Italy (North, Center, and South). In column 2 we add crime-type dummies. In columns 3 and 4 we repeat the exercise using region dummies (there are 20 regions in Italy). The coefficients on the dummy variables are consistent with clearance rates being highest in the North, and the estimated discount factors, which range from 0.70 to 0.74, are extremely similar to our baseline estimate. We also collected separate data by province on both the total number of crimes and the number of crimes in which the perpetrator is known, using data from ISTAT (2005), to compute clearance rates. It is important to note that identifying a suspect and clearing a crime via arrest are not necessarily the same, and therefore these data serve only as proxies for the true clearance rates. Quite surprisingly, the richest regions (e.g., Lombardy, Piedmont, Veneto, Lazio, and Liguria) appear to have, by far, the lowest average clearance rates. One possible explanation is that crime reporting rates differ across regions. The criminal justice system is widely recognized to function less efficiently in the South of Italy (see Jappelli, Pagano, and Bianco (2005)), which could make people less willing to report crimes,32 thus inflating the measured clearance rates. We are therefore cautious not to rely too heavily on these estimates.33 We first estimate our baseline model using the province-specific average clearance rate, and then estimate a version using the province- and crime-specific clearance rate. In columns 5 and 6 we report estimates from the models using the clearance rate data. The point estimates of the discount factor increase slightly (to 0.81) in these specifications, but so do the standard errors, and the resulting confidence intervals almost completely cover that of the baseline model.
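The proxy clearance rates described above are ratios of crimes with a known perpetrator to total crimes. A minimal Python sketch follows; the input layout, the function name, and the counts are all made up for illustration, and pooling across crime types is only one possible definition of the province-level "average" rate.

```python
from collections import defaultdict


def clearance_rates(records):
    """Proxy clearance rates from (province, crime_type, total, known) rows.

    'known' counts crimes with a known perpetrator, which only proxies for
    crimes actually cleared by arrest.
    """
    known_p, total_p = defaultdict(int), defaultdict(int)
    known_pc, total_pc = defaultdict(int), defaultdict(int)
    for province, crime_type, total, known in records:
        known_p[province] += known
        total_p[province] += total
        known_pc[(province, crime_type)] += known
        total_pc[(province, crime_type)] += total
    # Province rate pools all crime types (one possible definition of the average).
    by_province = {p: known_p[p] / total_p[p] for p in total_p}
    by_province_crime = {key: known_pc[key] / total_pc[key] for key in total_pc}
    return by_province, by_province_crime


# Made-up counts, purely for illustration.
rows = [
    ("Milano", "theft", 1000, 50),
    ("Milano", "robbery", 200, 40),
    ("Napoli", "theft", 800, 80),
]
by_province, by_province_crime = clearance_rates(rows)
print(by_province)
print(by_province_crime)
```

The two dictionaries correspond to the two variants estimated in columns 5 and 6 (province-level and province-by-crime rates); with real data, the same caveat about reporting rates applies to both.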
Overall, we conclude that accounting for heterogeneity in clearance rates has at most a small impact on our estimates of discounting.
32 For most property crimes, the most recent victimization survey shows that reporting rates are approximately 10 percentage points lower in the South than in the North (see Muratore et al. (2004)).
33 Cook (1979) highlights the endogenous nature of clearance rates (they are mediated by the choices made by criminals) and advises against using them as a measure of criminal justice system effectiveness.
5.3.2 Transitioning to Different Crimes
In this section we discuss estimates of the model described in Section 3.2, in which we take into account that the next crime a pardoned individual commits may differ from the one for which he was previously incarcerated, and may thus carry a different sentence length. Using data on the prison population from two prisons in Milan, we constructed transition probabilities between crimes. For cases in which multiple crime types are committed, we categorized crimes based on the most serious offense (defined based on either the mean or median sentence). We constructed predicted sentences for all inmates over all possible future crime choices based on equation (3). We then integrated the likelihood over these possible future sentences, using the transition probabilities as weights. The estimates for this model are reported in Table 11. The results are similar both to each other and to our baseline estimates: the discount factor is 0.76 using the mean sentence to define the most serious crime, and 0.75 using the median. Not surprisingly, the standard errors increase slightly, but the results are still fairly precisely estimated. Overall, we conclude that using the original crime type as a proxy for the future crime does not bias our estimates of time preference.
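Integrating the likelihood over possible next crimes amounts to a finite mixture: each individual's likelihood contribution is a transition-weighted average of the contributions evaluated at each predicted sentence. The Python sketch below shows this under simplifying assumptions that are ours, not the paper's: a logistic error, a single apprehension probability `P`, a negative per-period disutility `d`, and the index u(c) + P·d·(1 − δ^s)/(1 − δ) from Appendix A. All names and numbers are hypothetical.

```python
import math


def crime_prob(sentence, u_c, P, d, delta):
    """Pr(crime) with a logistic error and index u(c) + P*d*(1 - delta**s)/(1 - delta).

    The logistic error, a single P, and d < 0 are simplifying assumptions.
    """
    index = u_c + P * d * (1 - delta ** sentence) / (1 - delta)
    return 1.0 / (1.0 + math.exp(-index))


def mixture_loglik(observations, transitions, predicted_sentence, u_c, P, d, delta):
    """Log-likelihood that integrates over possible next crimes.

    observations: list of (person_id, original_crime, recidivated: bool)
    transitions[j][k]: Pr(next crime is k | original crime j); each row sums to 1
    predicted_sentence[(person_id, k)]: expected total sentence if the person commits k
    """
    total = 0.0
    for person, crime_j, recidivated in observations:
        lik = 0.0
        for crime_k, weight in transitions[crime_j].items():
            p = crime_prob(predicted_sentence[(person, crime_k)], u_c, P, d, delta)
            lik += weight * (p if recidivated else 1.0 - p)
        total += math.log(lik)
    return total


# Made-up example; delta = 0.975 is roughly the monthly equivalent of an annual 0.74.
transitions = {"theft": {"theft": 0.7, "drugs": 0.3}}
sentences = {(1, "theft"): 24, (1, "drugs"): 60, (2, "theft"): 36, (2, "drugs"): 72}
obs = [(1, "theft", True), (2, "theft", False)]
print(mixture_loglik(obs, transitions, sentences, u_c=0.5, P=0.3, d=-0.1, delta=0.975))
```

When a transition row puts all its weight on the original crime, the mixture collapses to the baseline likelihood, which is one way to see why the two sets of estimates can be so close.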
5.3.3 Sentences Less Than Two Years
Under the Collective Clemency Bill passed in 2006, only individuals who receive a future sentence of at least two years are subject to serving their residual sentence. About one-third of the individuals in our sample had original sentences of less than two years, which suggests that many individuals in our data may not face the residual sentence enhancement. One natural way to deal with this would be to set the residual sentence equal to zero for these individuals. One potential problem with this approach is that a common reason for shorter sentences is sentence reductions, and repeat offenders are less likely to obtain sentence reductions.34 This suggests that not all individuals with original sentences below two years should expect to receive similarly low sentences for their next offense. To investigate whether this two-year cutoff is driving any of our results, we estimated three additional specifications that deal with this issue in different ways. First, we simply set the residual sentence equal to zero for individuals with an original sentence below two years. Second, for these individuals we set the expected new sentence equal to two years (plus the residual sentence). Third, we dropped all observations with an original sentence below two years. The results are presented in Table 12. For the first two specifications, the results are very similar to our baseline estimates, with estimated discount factors of 0.77 and 0.74. In the third specification, the point estimate drops to 0.66, but it is much less precisely estimated due to the smaller number of observations, and the resulting confidence interval completely covers that of the baseline model. These results indicate that our estimates are not driven by the two-year threshold in the law.
34 There are several alternative sentencing arrangements that can reduce an individual’s incarceration spell. The two most common are probation and home arrest.
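The three alternative treatments of the two-year rule can be summarized as a single rule for constructing the expected total sentence. The Python sketch below is a hypothetical helper, not the authors' code; it assumes all quantities are in months and that `new_sentence` is the predicted sentence for the next offense, and it returns `None` when an observation would be dropped.

```python
TWO_YEARS = 24  # months


def total_expected_sentence(new_sentence, residual, original, rule="baseline"):
    """Expected total sentence (months) under alternative treatments of the two-year rule.

    Returns None when the observation should be dropped.
    """
    rules = {"baseline", "zero_residual", "two_year_floor", "drop"}
    if rule not in rules:
        raise ValueError(f"unknown rule: {rule!r}")
    if rule == "baseline" or original >= TWO_YEARS:
        return new_sentence + residual   # residual enhancement always applies
    if rule == "zero_residual":          # specification 1: no residual enhancement
        return new_sentence
    if rule == "two_year_floor":         # specification 2: new sentence set to two years
        return TWO_YEARS + residual
    return None                          # specification 3: drop the observation


# An inmate with an 18-month original sentence and 10 months of residual sentence.
for rule in ("baseline", "zero_residual", "two_year_floor", "drop"):
    print(rule, total_expected_sentence(18, 10, 18, rule))
```

Inmates with original sentences of at least two years are treated identically under all four rules, so the specifications differ only for the roughly one-third of the sample below the cutoff.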
5.3.4 Hyperbolic Discounting
Since criminal acts generate immediate rewards but delayed costs that are spread over time, it has been argued that present bias, or a tendency to focus on immediate gratification, might explain engagement in criminal activities (Jolls, Sunstein, and Thaler (1998)). This kind of impatience, which is very strong for near rewards but declines over time, is often referred to as “hyperbolic discounting”.35 While there is growing evidence that at least some individuals exhibit time-inconsistent preferences,36 for criminal behavior such evidence is still mainly speculative, with the possible exception of very young offenders (Lee and McCrary (2009)). To investigate the presence of time inconsistency in preferences among criminals, we estimate a version of our model under hyperbolic discounting. For mathematical convenience we model hyperbolic preferences in continuous time, where such preferences are commonly expressed as
$$DF_{\text{Hyperbolic-Continuous}}(t) = \frac{1}{1 + kt}. \qquad (7)$$
The parameter k governs the rate of discounting. In the continuous-time formulation, hyperbolic discounting differs from the exponential version in two ways. First, under the hyperbolic model, individuals discount the future to a much larger extent early on. Second, the rate of decay of discount factors is slower, leading to larger discount factors (relative to exponential discounting) in later periods. Although we have little data with short sentences, we do have data with longer sentences, which allow us to evaluate the hyperbolic model relative to the exponential model. For convenience, researchers have often used a quasi-hyperbolic discounting specification, which is a discrete-time approximation to the continuous-time version. These functions have been adopted for analytic tractability by Laibson (1997); O’Donoghue and Rabin (1999); DellaVigna and Paserman (2005); Mastrobuoni and Weinberg (2009), among others.
35 An overview of hyperbolic discounting can be found in Rabin (1998). In the criminology literature, a related notion is referred to as impulsivity (Wilson and Herrnstein (1985); Jolliffe and Farrington (2009)).
36 For a recent overview of the empirical literature see DellaVigna (2009).
With quasi-hyperbolic discounting, discount factors fall quickly in the first period (or periods), and afterward decay at a geometric rate. Under the specification due to Laibson (1997) the discount factors are given by
$$DF_{\text{Quasi-Hyperbolic-Discrete}}(t) = \beta \delta^{t}, \qquad t \geq 1,$$
with a discount factor of 1 for the current period, where β < 1 denotes the initial drop in the discount factor. After the first period, discount factors continue to decline in the same way as under exponential discounting. Due to the nature of our data, we do not observe decisions for individuals facing extremely short sentences of only a month (or even a few months). As a result, we are not able to separately identify β from the mean disutility of prison, and the quasi-hyperbolic discounting model is observationally equivalent to the exponential discounting model.37 Instead, we focus on the continuous-time formulation. In Table 13, we provide estimates from the hyperbolic form of discounting in equation (7). For comparison we also estimate a continuous-time version of exponential discounting, which is given by
$$DF_{\text{Exponential-Continuous}}(t) = e^{-rt},$$
where r is the monthly discount rate. (The estimates for the discrete- and continuous-time versions of exponential discounting are essentially identical.) Comparing the likelihood values in Table 13, we can see that the model with exponential discounting outperforms the model with hyperbolic discounting. Furthermore, the hyperbolic discounting parameter k is not precisely estimated. A value of k equal to zero in equation (7) implies no discounting, and the 95% confidence interval easily covers zero. The top end of the confidence interval implies a discount factor in the first year (t = 12 months) of 0.24.
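The two continuous-time discount functions, equation (7) and e^{-rt}, are easy to compare numerically. The Python sketch below backs out monthly parameters r and k from 1-year (t = 12 months) discount factors of 0.74 and 0.44, the implied 1-year values the paper reports for the estimated exponential and hyperbolic models, and then locates the month at which the two curves cross by bisection. The function names are hypothetical, and the backed-out r and k are derived from the rounded reported values rather than taken from Table 13.

```python
import math


def df_exponential(t, r):
    """Continuous-time exponential discount factor e^{-rt}; t in months."""
    return math.exp(-r * t)


def df_hyperbolic(t, k):
    """Continuous-time hyperbolic discount factor 1 / (1 + kt), equation (7)."""
    return 1.0 / (1.0 + k * t)


# Back out monthly parameters from the 1-year (t = 12) discount factors.
r = -math.log(0.74) / 12   # exponential: e^{-12 r} = 0.74
k = (1 / 0.44 - 1) / 12    # hyperbolic: 1 / (1 + 12 k) = 0.44


def crossing_month(r, k, lo=12.0, hi=600.0, tol=1e-9):
    """Bisection on f(t) = ln(1 + kt) - rt.

    f > 0 means the hyperbolic factor is below the exponential one; f changes
    sign where the two curves intersect.
    """
    f = lambda t: math.log1p(k * t) - r * t
    if f(lo) <= 0 or f(hi) >= 0:
        raise ValueError("no sign change on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


t_star = crossing_month(r, k)
print(f"curves cross after about {t_star / 12:.1f} years")  # prints: curves cross after about 8.0 years
```

The crossing point of roughly 96 months is consistent with the intersection at about 8 years visible in Figure 5: the hyperbolic curve starts much lower but decays more slowly, so it ends up above the exponential curve for long horizons.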
Together, these results reflect a very large range of potential discount factors under hyperbolic discounting that are consistent with the data. Because the parameters of the two forms of discounting (k and r) affect discount rates in different ways, it is not straightforward to compare the implied discount factors. To facilitate a comparison, we have computed the implied discount factors under both estimated models and plotted them in Figure 5. One feature of hyperbolic discounting is that, compared to exponential discounting, discount factors fall more quickly in early periods but then flatten out. The implied 1-year discount factors are 0.74 and 0.44 for the exponential and hyperbolic models, respectively. The two curves intersect at about 8 years, with hyperbolic discount factors being higher afterward. Similar to several recent papers in the literature on estimating time preference (e.g., Warner and Pleeter (2001); Harrison, Lau, and Williams (2002); Sutter et al. (2013)), we find no evidence of hyperbolic discounting behavior, though we cannot rule out that the discount function might be hyperbolic over very short periods, giving rise to “quasi-hyperbolic” discount functions.
37 From a policy perspective, separately identifying the quasi-hyperbolic parameter β from the mean disutility of prison may not even matter, as long as one is not interested in the effects of very short incarceration spells, as both models would generate the same implications for utility, and therefore behavior, over such horizons.
6 Implications for Estimates of Deterrence
Our finding of significant discounting behavior among criminals implies that the marginal deterrent effect of imprisonment is decreasing in sentence length. Estimates that fail to take this into account capture an “average” effect that is biased downward for early periods and upward for later periods.
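To make the decreasing marginal effect concrete, the Python sketch below computes the percentage change in present discounted disutility used in this section, ((1 − δ^{s+Δs}) − (1 − δ^s))/(1 − δ^s), together with the absolute extra discounted disutility d(δ^s − δ^{s+Δs})/(1 − δ) from an enhancement. It assumes sentences measured in years with the baseline annual discount factor δ = 0.74, and uses the pardon's 39-month baseline sentence and 37% sentence increase reported in this section; the helper names are hypothetical.

```python
def pct_change_disutility(s, ds, delta):
    """Percentage change in present discounted disutility from extending sentence s by ds.

    Implements ((1 - delta**(s + ds)) - (1 - delta**s)) / (1 - delta**s); s and ds
    are in years when delta is an annual discount factor.
    """
    base = 1.0 - delta ** s
    return ((1.0 - delta ** (s + ds)) - base) / base


def added_disutility(s, ds, delta, d=1.0):
    """Extra discounted disutility d * (delta**s - delta**(s + ds)) / (1 - delta)."""
    return d * (delta ** s - delta ** (s + ds)) / (1.0 - delta)


delta = 0.74                   # baseline annual discount factor
s = 39 / 12                    # baseline sentence of 39 months, in years
pardon = pct_change_disutility(s, 0.37 * s, delta)   # 37% sentence increase
print(f"pardon: {pardon:.1%}")  # roughly 18%

# The same one-year enhancement adds less discounted disutility to longer sentences.
for base_years in (1, 3, 5, 10):
    print(base_years, round(added_disutility(base_years, 1.0, delta), 3))
```

With d normalized to 1, a one-year enhancement on top of a baseline of s years adds exactly δ^s, so its value shrinks geometrically with the baseline sentence. An estimate that ignores discounting averages over these very different marginal effects.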
This is particularly important to consider when comparing the effectiveness of deterrence across countries and policies. For example, the elasticity of recidivism with respect to sentence length has been found to vary considerably in the literature, suggesting large differences in deterrent power. To illustrate the importance of discounting in driving estimates of deterrence, we computed this elasticity both for the Italian pardon in our data and for the estimates in Helland and Tabarrok (2007), a study of the deterrent effect of three strikes legislation in the US. We find elasticities of -0.45 and -0.055, respectively.38,39 The fact that the elasticity is almost an order of magnitude larger for the Italian pardon is, at face value, suggestive of large differences in the deterrent power of increases in prison sentences between Italy and the US. However, when we simulate the effect of the three strikes policy in our data using our estimated model,40 the predicted drop in recidivism implies an elasticity of only -0.10, much closer to the Helland and Tabarrok (2007) estimates. While both policies led to percentage decreases in recidivism of about 17%, the large apparent difference in deterrence is caused by the fact that the sentence enhancements under three strikes are substantially longer. The percentage increase in sentence length was 37% in the Italian pardon, and 313% under three strikes. However, the later months of the sentence are heavily discounted, leading to small incremental effects on recidivism. Using our baseline estimates of the discount factor, the percentage changes in present discounted (dis)utility from imprisonment are 18% and only 24% for the pardon and three strikes, respectively. In other words, the true force of the three strikes policy is only marginally larger than that of the Italian pardon. For a baseline sentence s and an additional sentence Δs, we can use equation (2) to compute the percentage change in the present discounted disutility of prison,
$$\%\Delta\text{disutility}(\delta) = \frac{(1-\delta^{s+\Delta s}) - (1-\delta^{s})}{1-\delta^{s}}.$$
Compared to the elasticity with respect to sentence length, the only additional information needed to compute the elasticity with respect to the total disutility of prison, %Δrecidivism/%Δdisutility(δ), is the discount factor δ. Using our discount factor, increasing the disutility of prison by 10% reduces recidivism by 9.3% in our data and by 7.2% in Helland and Tabarrok (2007). This analysis may also help to explain why the literature has failed to find systematic, convincing evidence of the deterrent effect of the severity of punishment (compared to the certainty of punishment). The magnitude of the effect depends on the length of the punishments studied, and for long sentences the effect is likely to be quite small once discounting is taken into account.
38 We compute all elasticities using the following formula:
$$\frac{\%\Delta\text{recidivism}}{\%\Delta\text{sentence}} = \frac{\text{total change in recidivism rate}}{\text{baseline recidivism}} \times \frac{\text{baseline sentence}}{\text{months of additional sentence}}.$$
39 Drago, Galbiati, and Vertova (2009) report an elasticity of -0.74. The difference is due to the fact that, in order to facilitate comparison with other results in the literature, we compute the elasticity with respect to the baseline sentence (39 months) and the baseline recidivism (0.14), as opposed to values that include the changes in sentence and recidivism induced by the policy. If we follow their formula, we obtain an almost identical estimate to theirs of -0.75.
40 That is, we set all future sentences to be equal to 20 years (the minimum sentence under three strikes).
7 Conclusion
A large number of studies have cited excessive discounting as a potential explanation both for engagement in crime and for the (lack of) responsiveness to increased severity of punishment. Empirical evidence on the discounting behavior of criminals, by contrast, is quite scarce.
We provide new evidence on the extent of discounting among criminals by exploiting a quasi-experiment driven by a mass release of Italian prison inmates that took place in 2006. Conditional on the original sentence, the collective pardon generates a distribution of exogenous sentence enhancements. The raw data show a monotonic reduction in marginal deterrence as the sentence increases (thereby reducing the additional disutility of incarceration), suggestive of the presence of discounting. To identify discount factors for the large number of pardoned inmates, we estimate a basic intertemporal model of criminal behavior, and our findings are very robust to a series of different specifications. Overall, our baseline estimate of criminal discount factors of 0.74 is on the low end of the range of individual-level discount factors found in the literature for the general population. This supports the hypothesis that low future time preference is a driver of criminal behavior. However, the estimates are still far from zero, suggesting that imprisonment does have the potential to deter crime, as criminals do not fully discount future punishments. While it has been recognized that low future time preference could be important in shaping behavioral responses to future punishments for crimes, this is rarely taken into account either in empirical studies of deterrence or in policy debates. Given the delayed nature of imprisonment, discounting and deterrence go hand in hand. As we highlight above, this can have important implications for estimates of the deterrent effect of imprisonment, and can help explain the lack of consensus on the magnitude of deterrence. Our paper provides one of the first empirical investigations of this issue; it finds significant, but not complete, discounting and illustrates that accounting for discounting is critical for designing optimal deterrence policy.
Our estimates suggest that while increased sentence length can have quite strong deterrent effects for low initial sentence lengths, the incremental effects are much smaller for longer sentence lengths. From a deterrence perspective, it might be beneficial to trade off the costs of incarceration associated with long prison sentences against policies aimed at increasing the certainty of punishment, either through a higher probability of apprehension or through improved efficiency of the criminal justice system.
Appendix A: Dynamic Utility Model
In this appendix we show how our main estimating equation can be obtained from a more general dynamic utility model via restrictions on the option value of future crimes. The value of being free and facing sentence s if caught committing another crime is given by
$$V^{f}(s) = \max\Big\{ u(nc) + \delta E\big[V^{f}(s)\big],\; u(c) + P\big(d + \delta E\big[V^{p}(s-1)\big]\big) + \delta(1-P)E\big[V^{f}(s)\big] + \varepsilon \Big\},$$
where u(nc) is the utility from not committing a crime, u(c) is the utility from committing a crime, P is the probability of being apprehended, V^p(s) is the utility of being in prison with a sentence of s years, d is the per-period disutility of prison, and ε is a random utility term associated with committing a crime. Without loss of generality, we normalize u(nc) to be equal to zero. If an individual commits a crime and is apprehended, they are placed in prison for s total periods (the first of which is served immediately), and then released. Therefore the value of being in prison with a remaining sentence of s − 1 periods can be written as
$$V^{p}(s-1) = d\,\frac{1-\delta^{s-1}}{1-\delta} + \delta^{s-1} E\big[V^{f}(s)\big].$$
One way to generate our main estimating equation is to assume that the utility shocks ε vary across individuals but are constant over time. Under this assumption, there will be a cutoff value of ε (which depends on observables) that determines whether or not an individual commits a crime when not incarcerated.
Individuals with values of ε below the cutoff will not commit crime, which, given our normalization u(nc) = 0, implies that the value of their option to commit future crime is zero, E[V^f(s)] = 0, and we obtain the model in equation (2). For individuals with an ε above the cutoff, we can write their value functions as follows. Since ε is fixed over time, V^f(s) = E[V^f(s)], and since ε is above the cutoff, these individuals will choose to commit crime, which implies
$$V^{f}(s) = u(c) + P\big(d + \delta V^{p}(s-1)\big) + \delta(1-P)V^{f}(s) + \varepsilon.$$
Plugging in for the value of being incarcerated, V^p(s − 1), we have
$$V^{f}(s) = u(c) + P\Big[d + \delta d\,\frac{1-\delta^{s-1}}{1-\delta} + \delta^{s} V^{f}(s)\Big] + \delta(1-P)V^{f}(s) + \varepsilon,$$
which we can solve for
$$V^{f}(s) = \frac{u(c) + Pd\,\frac{1-\delta^{s}}{1-\delta} + \varepsilon}{1 - P\delta^{s} - \delta(1-P)}.$$
Since these individuals choose to commit crime, it must be the case that they prefer to do it earlier rather than later (due to discounting), meaning that
$$0 + \delta\,\frac{u(c) + Pd\,\frac{1-\delta^{s}}{1-\delta} + \varepsilon}{1 - P\delta^{s} - \delta(1-P)} < u(c) + Pd\,\frac{1-\delta^{s}}{1-\delta} + \varepsilon + \big(P\delta^{s} + \delta(1-P)\big)\,\frac{u(c) + Pd\,\frac{1-\delta^{s}}{1-\delta} + \varepsilon}{1 - P\delta^{s} - \delta(1-P)}.$$
Subtracting $\delta\,\frac{u(c) + Pd\,\frac{1-\delta^{s}}{1-\delta} + \varepsilon}{1 - P\delta^{s} - \delta(1-P)}$ from both sides, we have
$$0 < u(c) + Pd\,\frac{1-\delta^{s}}{1-\delta} + \varepsilon + \big(P\delta^{s} + \delta(1-P) - \delta\big)\,\frac{u(c) + Pd\,\frac{1-\delta^{s}}{1-\delta} + \varepsilon}{1 - P\delta^{s} - \delta(1-P)},$$
which can be rewritten as
$$0 < \frac{1-\delta}{1 - P\delta^{s} - \delta(1-P)}\Big[u(c) + Pd\,\frac{1-\delta^{s}}{1-\delta} + \varepsilon\Big].$$
For δ < 1, $\frac{1-\delta}{1 - P\delta^{s} - \delta(1-P)}$ is always positive, and this inequality holds if and only if $u(c) + Pd\,\frac{1-\delta^{s}}{1-\delta} + \varepsilon > 0$. Therefore, we have shown that for individuals with random utility draws ε both above and below the cutoff, their decisions are consistent with the model we employ in Section 3.1. If we plug in for u(c) and d, and add back in subscripts, this is equivalent to
$$\alpha_0 + \alpha_1 X_{ij} + \big(\beta_0 + \beta_1 X_{ij}\big) P_{jlc}\,\frac{1-\delta^{s_{ij}}}{1-\delta} + \varepsilon_{ij} > 0,$$
which is the model in equation (2).
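The algebra above can be checked numerically. The Python sketch below, a minimal illustration with hypothetical helper names and arbitrary parameter values, verifies that the closed-form V^f(s) satisfies its Bellman equation and that "commit now rather than wait one period" holds exactly when u(c) + Pd(1 − δ^s)/(1 − δ) + ε > 0.

```python
def v_prison(remaining, d, delta, ev_free):
    """V^p(s-1) = d (1 - delta**(s-1)) / (1 - delta) + delta**(s-1) E[V^f(s)]."""
    return d * (1 - delta ** remaining) / (1 - delta) + delta ** remaining * ev_free


def crime_index(u_c, P, d, delta, s, eps):
    """u(c) + P d (1 - delta**s) / (1 - delta) + eps, the decision index."""
    return u_c + P * d * (1 - delta ** s) / (1 - delta) + eps


def v_free(u_c, P, d, delta, s, eps):
    """Closed-form V^f(s) for an individual who commits crime whenever free."""
    return crime_index(u_c, P, d, delta, s, eps) / (1 - P * delta ** s - delta * (1 - P))


def fixed_point_residual(u_c, P, d, delta, s, eps):
    """V^f(s) minus the right-hand side of its Bellman equation (should be ~0)."""
    v = v_free(u_c, P, d, delta, s, eps)
    rhs = u_c + P * (d + delta * v_prison(s - 1, d, delta, v)) + delta * (1 - P) * v + eps
    return v - rhs


def prefers_crime_now(u_c, P, d, delta, s, eps):
    """Waiting one period is worth 0 + delta V^f; committing now is worth V^f."""
    v = v_free(u_c, P, d, delta, s, eps)
    return delta * v < v


# Arbitrary illustrative parameters: d < 0 is the per-period disutility of prison.
params = dict(u_c=0.5, P=0.3, d=-0.4, delta=0.9, s=6)
for eps in (-3.0, -0.4, 0.3, 2.0):
    print(eps, fixed_point_residual(eps=eps, **params),
          prefers_crime_now(eps=eps, **params), crime_index(eps=eps, **params) > 0)
```

The residual is zero up to floating-point error, and the last two columns always agree, which mirrors the result that the dynamic model reduces to the sign condition in equation (2).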
An alternative way to motivate our model is to set E[V^f(s)] = 0, which essentially sets the value of having the option to commit future crime to zero. For example, suppose that individuals make a once-and-for-all decision of whether to commit a crime, and then receive zero payoff afterward. In this case the model simplifies to
$$V^{f}(s) = \max\Big\{0,\; u(c) + Pd\,\frac{1-\delta^{s}}{1-\delta} + \varepsilon\Big\}.$$
We then have that an individual commits a crime if and only if $u(c) + Pd\,\frac{1-\delta^{s}}{1-\delta} + \varepsilon > 0$, which again is consistent with our model in equation (2).
Appendix B: Conditional Independence Assumption
Recall that c_ij is an indicator for whether individual i commits a crime of type j. For notational simplicity, we drop the subscripts and let c be an indicator for committing a crime. The probability that a crime is committed conditional on the original sentence os, the residual sentence rs, the other observables X, and a vector of parameters to be estimated θ can be written as
$$\begin{aligned}
\Pr(c = 1 \mid os, rs, X; \theta) &= \Pr\big(g(os, rs, X; \theta) + \varepsilon > 0\big) \\
&= \Pr\big(\varepsilon > -g(os, rs, X; \theta)\big) \\
&= 1 - F_{\varepsilon \mid os, rs}\big(-g(os, rs, X; \theta)\big) \\
&= 1 - F_{\varepsilon \mid os}\big(-g(os, rs, X; \theta)\big),
\end{aligned}$$
where g is the utility model as a function of observables and F is the distribution function for ε. The last equality follows from the key identifying assumption that, conditional on the original sentence os, the error ε is independent of the residual sentence rs. Recognizing the dependence between ε and os, we can write ε = h(os; γ) + u, where h is an unknown (nonparametric) function. Assuming u is independent of os, and using a simple change of variables from ε to u, we can then write the probability of crime as
$$\begin{aligned}
\Pr(c = 1 \mid os, rs, X; \theta) &= 1 - F_{\varepsilon \mid os}\big(-g(os, rs, X; \theta)\big) \\
&= 1 - F_{u \mid os}\big(-g(os, rs, X; \theta) - h(os; \gamma)\big) \\
&= 1 - F_{u}\big(-g(os, rs, X; \theta) - h(os; \gamma)\big),
\end{aligned}$$
where the second equality is due to the change of variables, and the third equality follows from the independence of u and os.
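The change of variables can be checked by simulation. The Python sketch below assumes, purely for illustration, a logistic distribution for u and a linear dependence h(os) = −0.02·os (os in months); neither choice comes from the paper, whose h is left nonparametric. It compares a Monte Carlo estimate of Pr(g + ε > 0) with the closed form 1 − F_u(−g − h(os)).

```python
import math
import random


def logistic_cdf(x):
    """CDF of the standard logistic distribution (assumed here for u)."""
    return 1.0 / (1.0 + math.exp(-x))


def h(os):
    """Hypothetical dependence of the error on the original sentence (months)."""
    return -0.02 * os


def prob_closed_form(g, os):
    """1 - F_u(-g - h(os)), using the change of variables eps = h(os) + u."""
    return 1.0 - logistic_cdf(-g - h(os))


def prob_simulated(g, os, n=100_000, seed=0):
    """Monte Carlo estimate of Pr(g + eps > 0) with eps = h(os) + u."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        p = rng.random()
        while p == 0.0:            # avoid log(0) in the inverse CDF
            p = rng.random()
        u = math.log(p / (1.0 - p))  # inverse logistic CDF draw
        if g + h(os) + u > 0:
            hits += 1
    return hits / n


for g, os in [(0.3, 24), (-0.5, 60), (1.0, 6)]:
    print(g, os, round(prob_simulated(g, os), 3), round(prob_closed_form(g, os), 3))
```

Because h(os) enters only through the original sentence, any part of g that depends on os alone is absorbed into it, while terms involving the residual sentence are not.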
This implies that, in addition to our model of the crime choice $g$, the model must include an additional term, $h(os; \gamma)$, which controls for the dependence between the error $\varepsilon$ and the original sentence $os$. We include this term as a flexible function because we do not know the exact form of the dependence between $\varepsilon$ and $os$; in particular, $os$ may enter $h$ differently from how it enters $g$. The model can then be estimated by maximum likelihood, with the caveat that any part of $g$ that is separable in $os$ is not separately identified from $h$. Since our only interest is in identifying the part of $g$ related to the residual sentence, this is not a problem for identification of the discount factor.

References

Akerlund, D., B. Golsteyn, H. Gronqvist, and L. Lindahl. 2014. “Time Preferences and Criminal Behaviour.” Tech. Rep. 8168, IZA.
Andersen, Steffen, Glenn W. Harrison, Morten I. Lau, and E. Elisabet Rutström. 2014. “Discounting behavior: A reconsideration.” European Economic Review 71:15–33.
Barbarino, Alessandro and Giovanni Mastrobuoni. 2014. “The Incapacitation Effect of Incarceration: Evidence from Several Italian Collective Pardons.” American Economic Journal: Economic Policy 6 (1):1–37.
Beccaria, Cesare. 1764. On Crimes and Punishments. Philadelphia: Philip H. Nicklin, 1819, 2nd American edition. Translated from the Italian, anonymous ed.
Becker, G. S. and C. B. Mulligan. 1997. “The Endogenous Determination of Time Preference.” The Quarterly Journal of Economics 112 (3):729–758.
Becker, Gary S. 1968. “Crime and Punishment: An Economic Approach.” The Journal of Political Economy 76 (2):169–217.
Bukstel, Lee H. and Peter R. Kilmann. 1980. “Psychological effects of imprisonment on confined individuals.” Psychological Bulletin 88 (2):469.
Cagetti, Marco. 2003. “Wealth accumulation over the life cycle and precautionary savings.” Journal of Business & Economic Statistics 21 (3):339–353.
California Assembly. 1968. “Deterrent Effects of Criminal Sanctions: Progress Report of the Assembly Committee on Criminal Procedure.”
Chen, M. K. 2013. “The Effect of Language on Economic Behavior: Evidence from Saving Rates, Health Behaviors and Retirement Access.” American Economic Review 103 (2):690–731.
Ching, Andrew and Matthew Osborne. 2015. “Identification and Estimation of Forward-Looking Behavior: The Case of Consumer Stockpiling.” Unpublished working paper, University of Toronto.
Coller, Maribeth and Melonie B. Williams. 1999. “Eliciting individual discount rates.” Experimental Economics 2 (2):107–127.
Cook, Philip J. 1979. “The clearance rate as a measure of criminal justice system effectiveness.” Journal of Public Economics 11 (1):135–142.
Cook, Philip J. 1980. “Research in criminal deterrence: Laying the groundwork for the second decade.” Crime and Justice, 211–268.
Davis, M. L. 1988. “Time and Punishment: An Intertemporal Model of Crime.” Journal of Political Economy 96 (2):383–390.
DellaVigna, Stefano. 2009. “Psychology and Economics: Evidence from the Field.” Journal of Economic Literature 47 (2):315–372.
DellaVigna, Stefano and Daniele M. Paserman. 2005. “Job Search and Impatience.” Journal of Labor Economics 23 (3):527–587.
Dohmen, Thomas, Benjamin Enke, Armin Falk, David Huffman, and Uwe Sunde. 2015. “Patience and the Wealth of Nations.” Working paper.
Drago, Francesco, Roberto Galbiati, and Pietro Vertova. 2009. “The deterrent effects of prison: Evidence from a natural experiment.” Journal of Political Economy 117 (2):257–280.
Durlauf, Steven N. and Daniel S. Nagin. 2011. “The Deterrent Effect of Imprisonment.” In Controlling Crime: Strategies and Tradeoffs.
Ehrlich, Isaac. 1973. “Participation in illegitimate activities: A theoretical and empirical investigation.” The Journal of Political Economy, 521–565.
Falk, Armin, Anke Becker, Thomas Dohmen, Benjamin Enke, David Huffman, and Uwe Sunde. 2015. “The Nature and Predictive Power of Preferences: Global Evidence.” Unpublished manuscript, available at SSRN 2691910.
Friedman, Milton. 1957. A Theory of the Consumption Function. Princeton, NJ: Princeton University Press.
Galor, Oded and Ömer Özak. 2014. “The Agricultural Origins of Time Preference.” Tech. rep., National Bureau of Economic Research.
Giglio, Stefano, Matteo Maggiori, and Johannes Stroebel. 2015. “Very Long-Run Discount Rates.” The Quarterly Journal of Economics 130 (1):1–53.
Gullone, Eleonora, Tessa Jones, and Robert Cummins. 2000. “Coping styles and prison experience as predictors of psychological well-being in male prisoners.” Psychiatry, Psychology and Law 7 (2):170–181.
Harrison, Glenn W., Morten I. Lau, and E. Elisabet Rutström. 2010. “Individual discount rates and smoking: Evidence from a field experiment in Denmark.” Journal of Health Economics 29 (5):708–717.
Harrison, Glenn W., Morten I. Lau, and Melonie B. Williams. 2002. “Estimating Individual Discount Rates in Denmark: A Field Experiment.” American Economic Review 92 (5):1606–1617.
Hausman, Jerry A. 1979. “Individual Discount Rates and the Purchase and Utilization of Energy-Using Durables.” Bell Journal of Economics 10 (1):33–54.
Hawken, Angela and Mark Kleiman. 2009. “Managing Drug Involved Probationers with Swift and Certain Sanctions: Evaluating Hawaii’s HOPE: Executive Summary.” Washington, DC: National Criminal Justice Reference Services.
Heckman, James J. 1976. “A Life-Cycle Model of Earnings, Learning, and Consumption.” Journal of Political Economy 84 (4):S11–S44.
Heckman, James J. and Richard Robb. 1985. “Alternative Methods for Evaluating the Impact of Interventions: An Overview.” Journal of Econometrics 30 (1-2):239–267.
Helland, Eric and Alexander Tabarrok. 2007. “Does Three Strikes Deter?: A Nonparametric Estimation.” Journal of Human Resources 42 (2).
Hjalmarsson, Randi. 2009. “Crime and Expected Punishment: Changes in Perceptions at the Age of Criminal Majority.” American Law and Economics Review 11 (1):209–248.
Hochstetler, Andy, Daniel S. Murphy, and Ronald L. Simons. 2004. “Damaged goods: Exploring predictors of distress in prison inmates.” Crime & Delinquency 50 (3):436–457.
Hunt, Albert R. 2015. “Reforming the Criminal Justice System Is Not Assured.” New York Times.
ISTAT. 2005. “Statistiche Giudiziarie Penali.” Tech. rep., Istituto Nazionale di Statistica.
ISTAT. 2015. “I detenuti nelle carceri italiane.” Report, Istituto Nazionale di Statistica. URL http://www.istat.it/it/archivio/153369.
Italian Ministry of Internal Affairs. 2007. Rapporto sulla criminalità in Italia. Analisi, Prevenzione, Contrasto.
Jappelli, Tullio, Marco Pagano, and Magda Bianco. 2005. “Courts and Banks: Effects of Judicial Enforcement on Credit Markets.” Journal of Money, Credit and Banking 37 (2):223–244.
Jolliffe, Darrick and David P. Farrington. 2009. “A systematic review of the relationship between childhood impulsiveness and later violence.” Personality, Personality Disorder, and Violence, 41–61.
Jolls, Christine, Cass R. Sunstein, and Richard Thaler. 1998. “A behavioral approach to law and economics.” Stanford Law Review, 1471–1550.
Kaplow, Louis. 1990. “Optimal deterrence, uninformed individuals, and acquiring information about whether acts are subject to sanctions.” Journal of Law, Economics, & Organization, 93–128.
Katz, Lawrence, Steven D. Levitt, and Ellen Shustorovich. 2003. “Prison Conditions, Capital Punishment, and Deterrence.” American Law and Economics Review 5 (2):318–343.
Laibson, David I. 1997. “Golden Eggs and Hyperbolic Discounting.” The Quarterly Journal of Economics 112 (2):443–477.
Lee, David S. and Justin McCrary. 2009. “The Deterrence Effect of Prison: Dynamic Theory and Evidence.” NBER Working Papers, National Bureau of Economic Research, Inc.
Levitt, Steven D. and Thomas J. Miles. 2007. “Empirical study of criminal punishment.” Handbook of Law and Economics 1:455–495.
Lochner, Lance. 2007. “Individual Perceptions of the Criminal Justice System.” American Economic Review 97 (1):444–460.
Lochner, Lance J. and Enrico Moretti. 2004. “The Effect of Education on Crime: Evidence from Prison Inmates, Arrests, and Self-Reports.” American Economic Review 94 (1):155–189.
Mancino, Maria Antonella, Salvador Navarro, and David A. Rivers. 2015. “Separating State Dependence, Experience, and Heterogeneity in a Model of Youth Crime and Education.” Centre for Human Capital and Productivity Working Papers 20151, University of Western Ontario.
Mastrobuoni, Giovanni and Paolo Pinotti. 2015. “Legal status and the criminal activity of immigrants.” American Economic Journal: Applied Economics 7 (2):175–206.
Mastrobuoni, Giovanni and Matthew Weinberg. 2009. “Heterogeneity in intra-monthly consumption patterns, self-control, and savings at retirement.” American Economic Journal: Economic Policy 1 (2):163–189.
McCrary, Justin. 2010. “Dynamic Perspectives on Crime.” Handbook on the Economics of Crime, 82.
Meier, Stephan and Charles Sprenger. 2010. “Present-biased preferences and credit card borrowing.” American Economic Journal: Applied Economics, 193–210.
Muratore, Maria Giuseppina, Isabella Corazziari, Giovanna Tagliacozzo, Alessandro Martini, Alessandra Federici, Manuela Michelini, Agostina Loconte, Roberta Barletta, and Anna Costanza Baldry. 2004. La Sicurezza dei Cittadini. 2002 Reati Vittime, Percezione della Sicurezza e Sistemi di Protezione. Indagine Multiscopo sulle Famiglie “Sicurezza dei cittadini” - Anno 2002. Rome: Istituto Nazionale di Statistica.
Nagin, Daniel S. 2013. “Deterrence: A Review of the Evidence by a Criminologist for Economists.” Annual Review of Economics 5 (1):83–105.
Nagin, Daniel S. and Greg Pogarsky. 2004. “Time and punishment: Delayed consequences and criminal behavior.” Journal of Quantitative Criminology 20 (4):295–317.
O’Donoghue, Ted and Matthew Rabin. 1999. “Doing It Now or Later.” American Economic Review 89 (1):103–124.
Polinsky, A. Mitchell and Steven M. Shavell. 1999. “On the Disutility and Discounting of Imprisonment and the Theory of Deterrence.” Journal of Legal Studies 28 (1):1.
Rabin, Matthew. 1998. “Psychology and economics.” Journal of Economic Literature, 11–46.
Sutter, Matthias, Martin G. Kocher, Daniela Glätzle-Rüetzler, and Stefan T. Trautmann. 2013. “Impatience and Uncertainty: Experimental Decisions Predict Adolescents’ Field Behavior.” American Economic Review 103 (1):510–531.
Viscusi, W. Kip and Michael J. Moore. 1989. “Rates of time preference and valuations of the duration of life.” Journal of Public Economics 38 (3):297–317.
Warner, John T. and Saul Pleeter. 2001. “The Personal Discount Rate: Evidence from Military Downsizing Programs.” American Economic Review 91 (1):33–53.
Wilson, James Q. and Richard J. Herrnstein. 1985. Crime and Human Nature: The Definitive Study of the Causes of Crime. New York: Simon and Schuster.

[Figure 1 image. Left panel: Marginal Effect on Recidivism; right panel: Cumulative Effect on Recidivism. Horizontal axes: Residual Sentence and Time Served (months, 0 to 36). Curves: Deterrence, 74%; Treatment, 74%.]
Figure 1: Discounting against Prison Treatment
Notes: The left (right) panel shows the marginal (cumulative) effect of deterrence (solid line) and of prison treatment (dashed line) on recidivism. For both deterrence and treatment, a longer prison sentence lowers recidivism. The annual discount factor is set to 0.74 (74 percent), and prison treatment effects exhibit decreasing returns to scale, meaning that they are larger at the beginning of the time served than at the end. An extra month of residual sentence corresponds to one less month spent in prison.
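The deterrence curves in Figure 1 hinge on how heavily each additional month of prison is discounted. A minimal sketch, assuming the annual discount factor of 0.74 used in the figure and the relation annual = monthly^12 (which the monthly and annual columns of the results tables satisfy), computes the weight placed on the disutility of a given prison month and the cumulative weight $(1-\delta^s)/(1-\delta)$ that appears in the Appendix A expression. The helper names are illustrative; this is not the code behind the figure.

```python
import numpy as np

ANNUAL_DELTA = 0.74                          # annual discount factor used in Figure 1
MONTHLY_DELTA = ANNUAL_DELTA ** (1.0 / 12)   # monthly factor, assuming annual = monthly**12


def marginal_weight(month, delta=MONTHLY_DELTA):
    """Discount weight on the disutility of prison month `month` (month 0 is the first)."""
    return np.power(delta, month)


def cumulative_weight(months, delta=MONTHLY_DELTA):
    """Total weight of the first `months` months: (1 - delta**s) / (1 - delta)."""
    return (1.0 - np.power(delta, months)) / (1.0 - delta)


if __name__ == "__main__":
    for m in range(0, 37, 6):
        print(f"month {m:2d}: marginal {marginal_weight(m):.3f}, "
              f"cumulative {cumulative_weight(m):6.2f}")
```

Under this assumption, a month that begins three years into the sentence carries a weight of $0.74^3 \approx 0.41$ relative to the first month, which is why the marginal deterrence effect of extra residual sentence fades with its length.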
[Figure 2 image. Vertical axis: Cumulative Utility of Prison; horizontal axis: Years of Sentence (0 to 15).]
Figure 2: Cumulative Utility of Prison as a Function of Sentence Length
Notes: The figure shows the cumulative utility of prison time as a function of sentence length assuming an annual discount factor of 0.74. Note that all values of cumulative utility are negative.

[Figure 3 image. Left panel: Unconditional; right panel: Conditional on Original Sentence. Vertical axes: Change in Recidivism Rate; horizontal axes: Total Sentence Length (Months), 0 to 200.]
Figure 3: Change in Average Recidivism by Total Sentence Length
Notes: The left panel plots the average recidivism rate for each of 40 quantiles of the total expected sentence length. The right panel adds controls for original sentence length. In order to control for original sentence length we regress the recidivism indicator on a cubic polynomial in the log of original sentence and dummies for each of the quantiles of total sentence length. The right panel plots the coefficients on each of the dummies. For both panels the differences in average recidivism are normalized relative to the maximum recidivism rate across the quantiles.

[Figure 4 image. Vertical axis: Discounted Flow Disutility (0 to 100); horizontal axis: Years of Sentence (0 to 15). Curves for annual discount factors of 10%, 30%, 50%, 74%, and 95%.]
Figure 4: Flow Disutility of Prison as a Function of Sentence Length–Different Discount Factors
Notes: The figure shows the discounted flow of disutility of prison time (normalized to be 100 at the beginning of incarceration) for different annual discount factors.
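The construction described in the notes to Figure 3 can be sketched as follows. Everything about the data is a hypothetical assumption (simulated sentences, treating the total expected sentence as original plus residual sentence, and a recidivism probability that falls with the residual sentence), the function name is illustrative, and the normalization subtracts the largest bin value so that the highest-recidivism quantile sits at zero, which is one reading of "normalized relative to the maximum recidivism rate".

```python
import numpy as np


def recidivism_by_quantile(total, orig, recid, n_bins=40):
    """Bin-level recidivism differences by quantile of total sentence, max bin set to 0.

    Returns (unconditional, conditional): raw bin means, and bin-dummy coefficients from
    an OLS regression that also includes a cubic in log original sentence.
    """
    edges = np.quantile(total, np.linspace(0.0, 1.0, n_bins + 1))
    bins = np.digitize(total, edges[1:-1])                  # bin index 0 .. n_bins-1
    dummies = (bins[:, None] == np.arange(n_bins)).astype(float)

    # Left panel: average recidivism within each quantile bin
    uncond = dummies.T @ recid / dummies.sum(axis=0)

    # Right panel: full set of bin dummies (no separate intercept) plus a cubic in log(os)
    log_os = np.log(orig)
    X = np.column_stack([dummies, log_os, log_os**2, log_os**3])
    coef, *_ = np.linalg.lstsq(X, recid, rcond=None)
    cond = coef[:n_bins]

    return uncond - uncond.max(), cond - cond.max()


# Hypothetical simulated data (not the paper's sample)
rng = np.random.default_rng(1)
n = 20_000
orig = np.exp(rng.normal(3.3, 0.8, size=n))               # original sentence (months)
resid = np.minimum(rng.uniform(size=n) * orig, 36.0)      # residual sentence (months)
total = orig + resid                                      # assumed total expected sentence
recid = (rng.uniform(size=n) < 0.35 - 0.005 * resid).astype(float)

uncond, cond = recidivism_by_quantile(total, orig, recid)
```

Including all 40 dummies and dropping the constant keeps every quantile's coefficient directly interpretable, so the normalization only shifts the curve and does not depend on choosing a reference bin.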
[Figure 5 image. Vertical axis: Discounted Flow Disutility (0 to 100); horizontal axis: Years of Sentence (0 to 15). Curves: Exponential: Continuous; Hyperbolic: Continuous.]
Figure 5: Flow Disutility of Prison as a Function of Sentence Length–Different Forms of Discounting
Notes: The figure shows the discounted flow of disutility of prison time (normalized to be 100 at the beginning of incarceration) for exponential and hyperbolic discounting. These curves are based on the estimates in Table 13.

Table 1: Summary Statistics
Variable | Mean | Std. Dev. | Min. | Max.
Recidivism (re-incarceration) | 0.22 | 0.41 | 0 | 1
Original sentence | 39.90 | 32.67 | 1 | 360
Residual sentence | 15.01 | 10.52 | 0 | 36
Definitive sentence | 0.74 | 0.44 | 0 | 1
Age | 37.72 | 9.79 | 19 | 70
Female | 0.05 | 0.21 | 0 | 1
Married | 0.26 | 0.44 | 0 | 1
Permanently employed | 0.15 | 0.36 | 0 | 1
Primary education or less | 0.27 | 0.44 | 0 | 1
No education | 0.06 | 0.24 | 0 | 1
Secondary school or above | 0.03 | 0.18 | 0 | 1
High school or above | 0.02 | 0.13 | 0 | 1
Italian | 0.62 | 0.49 | 0 | 1
Notes: This table presents summary statistics for the individual characteristics of the pardoned inmates. Sentence durations are expressed in months. The total number of observations is 19,616.

Table 2: List of Crime Types
Variable | Mean | Std. Dev. | Min. | Max.
Property | 0.58 | 0.49 | 0 | 1
Violent | 0.26 | 0.44 | 0 | 1
Illegal detention of weapons | 0.09 | 0.29 | 0 | 1
Drug-related | 0.39 | 0.49 | 0 | 1
Prostitution | 0.02 | 0.13 | 0 | 1
Illegal migration | 0.07 | 0.26 | 0 | 1
Organized crime (mafia) | 0.02 | 0.13 | 0 | 1
Against the economy and state | 0.26 | 0.44 | 0 | 1
Fraud | 0.10 | 0.29 | 0 | 1
Other crimes | 0.19 | 0.39 | 0 | 1
Notes: This table presents summary statistics for the crime types committed by the pardoned inmates. The total number of observations is 19,616.

Table 3: Clearance Rates
Crime Type | Mean | Std. Dev. | Min. | Max.
Property | 0.11 | 0.08 | 0.03 | 0.39
Violent | 0.58 | 0.17 | 0.14 | 0.94
Drug-related | 0.73 | 0.17 | 0.19 | 1.00
Organized crime (mafia) | 0.88 | 0.16 | 0.00 | 1.00
Against the economy and state | 0.50 | 0.11 | 0.21 | 0.83
Fraud | 0.33 | 0.13 | 0.08 | 0.82
Other crimes | 0.77 | 0.31 | 0.00 | 1.00
All | 0.23 | 0.11 | 0.08 | 0.63
Notes: The province-level clearance rates are based on ISTAT’s criminal statistics (ISTAT, 2005).

Table 4: Transition Probabilities

Panel A: Most Serious Crime Based on Median Sentence
From \ To | Property | Violent | Drug-related | Prostitution | Organized crime | Against economy/state | Fraud | Other crimes | Sum
Property | 0.47 | 0.14 | 0.09 | 0.00 | 0.01 | 0.17 | 0.03 | 0.09 | 1
Violent | 0.24 | 0.33 | 0.12 | 0.00 | 0.02 | 0.15 | 0.03 | 0.11 | 1
Drug-related | 0.13 | 0.10 | 0.51 | 0.00 | 0.01 | 0.08 | 0.03 | 0.14 | 1
Prostitution | 0.25 | 0.00 | 0.17 | 0.25 | 0.00 | 0.17 | 0.00 | 0.17 | 1
Organized crime (mafia) | 0.10 | 0.14 | 0.10 | 0.00 | 0.37 | 0.13 | 0.04 | 0.12 | 1
Against the economy and state | 0.28 | 0.15 | 0.10 | 0.00 | 0.02 | 0.29 | 0.05 | 0.10 | 1
Fraud | 0.21 | 0.13 | 0.15 | 0.00 | 0.04 | 0.18 | 0.22 | 0.08 | 1
Other crimes | 0.22 | 0.15 | 0.25 | 0.00 | 0.03 | 0.15 | 0.04 | 0.17 | 1

Panel B: Most Serious Crime Based on Mean Sentence
From \ To | Property | Violent | Drug-related | Prostitution | Organized crime | Against economy/state | Fraud | Other crimes | Sum
Property | 0.47 | 0.13 | 0.10 | 0.00 | 0.01 | 0.17 | 0.02 | 0.09 | 1
Violent | 0.24 | 0.32 | 0.13 | 0.01 | 0.02 | 0.15 | 0.02 | 0.11 | 1
Drug-related | 0.13 | 0.10 | 0.52 | 0.00 | 0.01 | 0.08 | 0.02 | 0.14 | 1
Prostitution | 0.22 | 0.09 | 0.09 | 0.13 | 0.00 | 0.17 | 0.04 | 0.26 | 1
Organized crime (mafia) | 0.10 | 0.13 | 0.11 | 0.01 | 0.37 | 0.13 | 0.03 | 0.12 | 1
Against the economy and state | 0.28 | 0.15 | 0.11 | 0.00 | 0.02 | 0.29 | 0.04 | 0.10 | 1
Fraud | 0.23 | 0.14 | 0.11 | 0.00 | 0.03 | 0.20 | 0.21 | 0.07 | 1
Other crimes | 0.22 | 0.14 | 0.27 | 0.00 | 0.03 | 0.15 | 0.03 | 0.17 | 1

Notes: The transition probabilities are based on the entire criminal history of a sample of inmates who served at least some time in one of two different Milan prisons between 2001 and 2012. Since criminals may be imprisoned for an offense related to multiple crime types, the transitions are based on the crimes with either the highest median sentence (Panel A) or the highest mean sentence (Panel B).
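A transition matrix such as Table 4 can be used to average a forward-looking quantity over the type of the next offense. The sketch below computes the expected discounted length of the next sentence, $\sum_j p_j (1-\delta^{s_j})/(1-\delta)$, for an inmate whose previous crime is a property crime, using the Property row of Panel A. The per-type sentence lengths are hypothetical placeholders, the monthly discount factor 0.97488 is taken from column 2 of Table 6, and the code only illustrates the idea of integrating over future crime types; it is not the paper's Section 3.2 implementation.

```python
import numpy as np

CRIME_TYPES = ["Property", "Violent", "Drug-related", "Prostitution", "Organized crime",
               "Against economy/state", "Fraud", "Other crimes"]
# Table 4, Panel A, row "Property": probability that the next offense is of each type
PROPERTY_ROW = np.array([0.47, 0.14, 0.09, 0.00, 0.01, 0.17, 0.03, 0.09])
# Hypothetical expected sentence (months) for each next-offense type; placeholders only
HYPOTHETICAL_SENTENCES = np.array([24, 36, 48, 24, 72, 30, 24, 24], dtype=float)
MONTHLY_DELTA = 0.97488  # Table 6, column 2


def expected_discounted_sentence(probs, sentences, delta):
    """Sum_j p_j * (1 - delta**s_j) / (1 - delta), with p renormalized to sum to one."""
    probs = np.asarray(probs, dtype=float)
    probs = probs / probs.sum()  # published rows are rounded to two decimals
    discounted = (1.0 - delta ** np.asarray(sentences, dtype=float)) / (1.0 - delta)
    return float(probs @ discounted)


value = expected_discounted_sentence(PROPERTY_ROW, HYPOTHETICAL_SENTENCES, MONTHLY_DELTA)
print(f"Expected discounted months of the next sentence: {value:.2f}")
```

Because $(1-\delta^s)/(1-\delta)$ is concave in $s$, this expectation is never larger than the discounted value of the average sentence, so ignoring the spread of possible next offenses would overstate deterrence in this illustration.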
Table 5: Recidivism and Sentence Length
Original Sentence Length | Less than 1 Year | 1–2 Years | 2–4 Years | 4–6 Years | More than 6 Years | All
Coefficient on Residual Sentence | -1.459 (0.072) | -0.502 (0.042) | -0.257 (0.018) | -0.139 (0.041) | -0.005 (0.075) | -0.196 (0.016)
Notes: Each column contains the estimated coefficient (and standard error) on residual sentence from a linear regression of recidivism on original sentence and residual sentence, for various bins of original sentence length.

Table 6: Baseline Results Specification Monthly Discount Factor (δ ) Annual Discount Factor (1) No Covariates (2) Baseline Covariates (3) Alt. Low Education (4) Alt. High Education (5) Crime Dummies 0.97142 (0.00537) 0.706 0.97488 (0.00519) 0.737 0.97546 (0.00511) 0.742 0.97469 (0.00526) 0.735 0.97104 (0.00549) 0.703 0.366 (0.409) 0.199 (0.145) -0.032 (0.016) 0.252 (0.398) 0.174 (0.146) -0.026 (0.016) 0.174 (0.393) 0.203 (0.146) -0.028 (0.016) 0.279 (0.399) 0.163 (0.147) -0.025 (0.016) 0.372 (0.425) 0.133 (0.152) -0.025 (0.017) -1.841 (0.379) -1.169 (0.439) -0.010 (0.006) -0.331 (0.195) -0.024 (0.123) -1.144 (0.435) -0.010 (0.006) -0.336 (0.193) -1.190 (0.440) -0.009 (0.006) -0.333 (0.196) -0.028 (0.124) -1.153 (0.459) -0.010 (0.006) -0.337 (0.209) -0.013 (0.131) Controls for Original Sentence: Log-sentence Log-sentence sq. Log-sentence cu.
Utility of Crime: Constant Age Permanently employed Primary education or less No education Secondary education and above -0.511 (0.361) 0.333 (0.216) -0.467 (0.356) 0.026 (0.114) -0.405 (0.288) -0.597 (0.137) 0.055 (0.115) -0.429 (0.286) -0.609 (0.136) -0.035 (0.014) -0.001 (0.000) 0.001 (0.007) 0.007 (0.005) -0.034 (0.014) -0.001 (0.000) 0.001 (0.006) High school and above Italian Female Married -0.562 (0.387) -0.568 (0.441) 0.025 (0.115) -0.412 (0.289) -0.600 (0.138) Crime dummies Disutility of Prison: Constant -0.068 (0.018) Age Permanently employed Primary education No education Secondary education and above 0.007 (0.013) -0.006 (0.008) 0.004 (0.012) -0.014 (0.002) 0.020 (0.005) -0.020 (0.012) 0.012 (0.005) -0.014 (0.002) 0.019 (0.005) -0.019 (0.011) 0.012 (0.005) 0.010 (0.018) -0.014 (0.002) 0.020 (0.005) -0.020 (0.012) 0.012 (0.005) 19616 -9846.72 19616 -9854.02 19616 -9848.48 High school and above Definitive sentence Italian Female Married -0.035 (0.014) -0.001 (0.000) 0.000 (0.007) 0.008 (0.005) 19616 -10197.64 -0.051 (0.019) -0.001 (0.000) 0.002 (0.008) 0.007 (0.005) 0.011 (0.015) Crime dummies Obs. log-likelihood 0.065 (0.136) -0.443 (0.306) -0.613 (0.147) √ -0.012 (0.002) 0.014 (0.006) -0.018 (0.013) 0.014 (0.006) √ 19616 -9746.41 Notes: This table contains estimates from our baseline model in equation (6). In column 1 we control only for original sentence using a polynomial in log of original sentence. In column 2 we add our baseline controls. Columns 3 and 4 use alternative measures of education, and column 5 allows the utility of committing a crime, as well as the disutility of prison, to depend on a full set of crime-type dummies. Standard errors are reported in parentheses below the point estimates. 
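The annual discount factors in Table 6 are the twelfth powers of the monthly estimates. The sketch below reproduces that conversion and, as an addition that the table does not report, attaches an approximate standard error to each annual factor using the delta method, $\mathrm{SE}(\delta^{12}) \approx 12\,\delta^{11}\,\mathrm{SE}(\delta)$. The dictionary simply transcribes the monthly estimates, their standard errors, and the reported annual factors from Table 6; the function name is illustrative.

```python
# Monthly discount factor, its standard error, and the reported annual factor (Table 6)
TABLE6 = {
    "(1) No Covariates": (0.97142, 0.00537, 0.706),
    "(2) Baseline Covariates": (0.97488, 0.00519, 0.737),
    "(3) Alt. Low Education": (0.97546, 0.00511, 0.742),
    "(4) Alt. High Education": (0.97469, 0.00526, 0.735),
    "(5) Crime Dummies": (0.97104, 0.00549, 0.703),
}


def annual_from_monthly(delta_m: float, se_m: float) -> tuple[float, float]:
    """Annual factor delta_m**12 and its delta-method standard error 12 * delta_m**11 * se_m."""
    return delta_m ** 12, 12 * delta_m ** 11 * se_m


for name, (delta_m, se_m, reported) in TABLE6.items():
    annual, se_annual = annual_from_monthly(delta_m, se_m)
    print(f"{name}: annual {annual:.3f} (reported {reported}), approx. SE {se_annual:.3f}")
```

The delta method is a first-order approximation; with monthly standard errors around 0.005, the implied annual standard errors are roughly nine times larger, which is worth keeping in mind when comparing annual factors across columns.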
Table 7: Heterogeneity in Discount Factors
Specification | (1) Married | (2) Female | (3) Permanently employed | (4) Log(age-18) | (5) Age, age sq. | (6) Primary education or less | (7) No education | (8) Secondary education and above | (9) High school and above
Monthly Discount Factor (δ):
Constant | 0.97280 (0.00547) | 0.97390 (0.00523) | 0.97506 (0.00547) | 0.99526 (0.00837) | 1.00842 (0.01459) | 0.97508 (0.00527) | 0.97567 (0.00519) | 0.97456 (0.00535) | 0.97440 (0.00514)
Interaction coefficient | 0.00674 (0.00417) | 0.00979 (0.00753) | -0.00051 (0.00506) | -0.00780 (0.00271) | -0.00161 (0.00063) | -0.00086 (0.00389) | -0.00177 (0.00639) | 0.01529 (0.00694) | 0.02520 (0.00892)
Age squared | | | | | 0.000016 (0.000006) | | | |
Annual Discount Factor:
Baseline | 0.718 | 0.728 | 0.739 | | | 0.739 | 0.744 | 0.734 | 0.733
With interaction | 0.780 | 0.821 | 0.734 | | | 0.731 | 0.728 | 0.885 | 0.995
Difference | 0.062 | 0.093 | -0.005 | | | -0.008 | -0.016 | 0.151 | 0.263
At age 26 | | | | 0.776 | 0.760 | | | |
At age 51 | | | | 0.677 | 0.676 | | | |
Obs. | 19616 | 19616 | 19616 | 19616 | 19616 | 19616 | 19616 | 19616 | 19616
Log-likelihood | -9845.52 | -9846.04 | -9846.72 | -9842.45 | -9842.26 | -9846.70 | -9853.98 | -9845.14 | -9846.62
Notes: In these specifications we allow the discount factor to differ by the variables shown in the first row. Standard errors are reported in parentheses.

Table 8: Heterogeneity in Discount Factors
Specification | (1) Nationality | (2) Nationality-Language 1 | (3) Nationality-Language 2
Monthly Discount Factor (δ):
Constant | 0.96547 (0.00872) | 0.97551 (0.00977) | 0.97956 (0.00851)
Italian | 0.01442 (0.00613) | 0.01886 (0.00915) | 0.01418 (0.00602)
Strong FTR | | -0.01674 (0.00751) | -0.01408 (0.00601)
Annual Discount Factor:
Italian | 0.784 | 0.762 | 0.780
Immigrant (All) | 0.656 | |
Immigrant (Strong FTR) | | 0.603 | 0.656
Immigrant (Weak FTR) | | 0.743 | 0.780
Obs. | 19616 | 19571 | 19220
Log-likelihood | -9842.21 | -9821.16 | -9700.85
Notes: In these specifications we allow the discount factor to differ by nationality and language. FTR refers to future time reference. There are 45 observations for which either the country of origin was unknown or we were unable to classify the FTR for the country. Standard errors are reported in parentheses.

Table 9: Heterogeneity in Discount Factors by Crime Type
Interaction variable | (1) Violent | (2) Property | (3) Arms | (4) Drug-related | (5) Prostitution | (6) Migrant Law | (7) Organized crime (mafia) | (8) Against the economy and state | (9) Fraud | (10) Other crimes
Monthly Discount Factor (δ):
Constant | 0.97191 (0.00513) | 0.97715 (0.00525) | 0.97637 (0.00497) | 0.98190 (0.00511) | 0.97284 (0.00530) | 0.97041 (0.00566) | 0.97571 (0.00543) | 0.97529 (0.00510) | 0.97382 (0.00507) | 0.97243 (0.00533)
Interaction coefficient | 0.00364 (0.00385) | -0.00407 (0.00595) | -0.00463 (0.00793) | -0.01160 (0.00387) | 0.02052 (0.00649) | -0.04047 (0.03629) | -0.00825 (0.01619) | -0.00841 (0.00564) | -0.01133 (0.00971) | 0.00843 (0.00409)
Annual Discount Factor:
Overall | 0.710 | 0.758 | 0.751 | 0.803 | 0.719 | 0.697 | 0.744 | 0.741 | 0.727 | 0.715
Difference | 0.033 | -0.037 | -0.042 | -0.107 | 0.205 | -0.279 | -0.072 | -0.073 | -0.095 | 0.078
Obs. | 19616 | 19616 | 19616 | 19616 | 19616 | 19616 | 19616 | 19616 | 19616 | 19616
Log-likelihood | -9841.44 | -9805.33 | -9842.98 | -9837.12 | -9837.18 | -9841.82 | -9845.78 | -9800.75 | -9843.39 | -9836.25
Notes: In these specifications we allow the discount factor to differ by the types of crimes committed by the inmates before the pardon. Standard errors are reported in parentheses.

Table 10: Robustness Results Using Various Controls for Clearance Rates
Specification | (1) | (2) | (3) | (4) | (5) | (6)
Monthly Discount Factor (δ) | 0.97291 (0.00515) | 0.97524 (0.00551) | 0.97111 (0.00504) | 0.97392 (0.00554) | 0.98263 (0.00798) | 0.98269 (0.00729)
Annual Discount Factor | 0.719 | 0.740 | 0.703 | 0.728 | 0.810 | 0.811
Area dummies | √ | √ | | | |
Region dummies | | | √ | √ | |
Crime-type dummies | | √ | | √ | |
Clearance rates by province | | | | | √ |
Clearance rates by province and crime type | | | | | | √
Obs. | 19616 | 19616 | 19616 | 19616 | 19616 | 19616
Log-likelihood | -9838.81 | -9752.81 | -9821.16 | -9734.41 | -9881.57 | -9882.37
Notes: In column 1 we proxy for clearance rates using dummy variables for each of the three areas of Italy (North, Center, and South). In column 2 we include area dummies as well as dummies for the crime type of the original offense. In column 3 we use dummies for each region (20 regions) instead of areas. Column 4 adds crime-type dummies. In columns 5 and 6 we use province-level clearance rates (taken from ISTAT, 2005), overall and crime-specific, respectively. Standard errors are reported in parentheses.

Table 11: Incorporating Transition Probabilities Across Crime Types
Specification | (1) Mean | (2) Median
Monthly Discount Factor (δ) | 0.97750 (0.00616) | 0.97613 (0.00641)
Annual Discount Factor | 0.761 | 0.748
Obs. | 19616 | 19616
Log-likelihood | -9854.88 | -9854.98
Notes: These results are based on the model in Section 3.2, in which we integrate against the distribution of future crime choices, conditional on the original crime type. We classify offenses into the most serious crime type, based on either the mean (column 1) or the median (column 2) sentence for each type. Standard errors are reported in parentheses.

Table 12: Robustness Regarding the Expected New Sentence
Specification | (1) | (2) | (3)
Monthly Discount Factor (δ) | 0.97860 (0.00604) | 0.97477 (0.00771) | 0.96577 (0.01321)
Annual Discount Factor | 0.771 | 0.736 | 0.658
Obs. | 19616 | 19616 | 13241
Log-likelihood | -9854.23 | -9852.64 | -6449.57
Notes: Each column presents a different approach to dealing with original sentences below two years. Each is based on our baseline specification. In column 1, we set the residual sentence equal to zero for observations with original sentences below two years. In column 2, we set the expected new sentence equal to two years (plus the residual sentence). In column 3, we drop these observations. Standard errors are reported in parentheses.
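The annual factors in Table 7 can be reproduced, up to rounding, by evaluating the monthly discount factor as the constant plus the interaction terms times the relevant variable and raising the result to the twelfth power. A minimal sketch for the married interaction (column 1) and the two age specifications (columns 4 and 5); the coefficients are transcribed from the table and the function names are illustrative.

```python
import math


def annual(monthly: float) -> float:
    """Annual discount factor implied by a monthly one."""
    return monthly ** 12


def delta_log_age(age: float) -> float:
    """Table 7, column (4): constant + coefficient * log(age - 18)."""
    return 0.99526 - 0.00780 * math.log(age - 18)


def delta_age_quadratic(age: float) -> float:
    """Table 7, column (5): constant + coefficient * age + age-squared coefficient * age**2."""
    return 1.00842 - 0.00161 * age + 0.000016 * age ** 2


married = annual(0.97280 + 0.00674)  # column (1), with the married indicator switched on
for age in (26, 51):
    print(age, round(annual(delta_log_age(age)), 3), round(annual(delta_age_quadratic(age)), 3))
```

Small discrepancies in the third decimal are expected because the coefficients in the table are themselves rounded to five or six decimals.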
Table 13: Alternative Forms of Discounting
Specification | (1) Exponential–Continuous | (2) Hyperbolic–Continuous
r | 0.02543 (0.00532) |
k | | 0.10150 (0.07441)
Obs. | 19616 | 19616
Log-likelihood | -9846.72 | -9848.08
Notes: Standard errors are reported in parentheses.
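Table 13 can be read alongside Figure 5. The sketch below evaluates the two discount functions, assuming the continuous exponential form $e^{-rt}$ and the standard hyperbolic form $1/(1+kt)$, with $t$ measured in months for both. The month unit for $r$ is supported by the fact that $e^{-0.02543} \approx 0.9749$, matching the monthly factor in column 2 of Table 6 (consistent with the identical log-likelihoods); the hyperbolic functional form and the time unit for $k$ are assumptions. Multiplying by 100 gives curves normalized as in Figures 4 and 5.

```python
import math

R_EXPONENTIAL = 0.02543  # Table 13, column (1)
K_HYPERBOLIC = 0.10150   # Table 13, column (2)


def exponential_discount(t_months: float, r: float = R_EXPONENTIAL) -> float:
    """Continuous exponential discounting: exp(-r t)."""
    return math.exp(-r * t_months)


def hyperbolic_discount(t_months: float, k: float = K_HYPERBOLIC) -> float:
    """Assumed hyperbolic form 1 / (1 + k t), with t in months (an assumption)."""
    return 1.0 / (1.0 + k * t_months)


for years in (0, 1, 5, 10, 15):
    t = 12 * years
    print(f"{years:2d} years: exponential {100 * exponential_discount(t):5.1f}, "
          f"hyperbolic {100 * hyperbolic_discount(t):5.1f}")
```

Under these assumptions, the hyperbolic curve falls faster within the first year but flattens out, ending above the exponential curve at a 15-year horizon.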