Statistical Review of the Judicial Performance Evaluation Program
Commonwealth of Massachusetts
Supreme Judicial Court Committee on Judicial Performance Evaluation
June 2018

Acknowledgments

This research project was funded by the Committee on Judicial Performance Evaluation of the Commonwealth of Massachusetts' Supreme Judicial Court to identify any potential concerns regarding the current methodology of the survey, identify questionnaire issues, and identify potential improvements. The report was prepared by the research team at Market Decisions Research (MDR) of Portland, Maine.

Brian Robertson, Ph.D., Vice President Research
John Charles, MS, Research Analyst
Lindsay Gannon, MPA, Evaluation Associate
Paige Lewis, Research Assistant

Table of Contents

Acknowledgments
Executive Summary
  Survey Review and Recommendations
  Bias Analysis Results and Recommendations
  Bottom Line
Background
Methodology Report
  Literature Review
  Data Verifications
  Variables Used in the Main Analyses
  Statistical Methods
  Survey Response
  Glossary of Terms
  Limitations
Survey Review and Recommendations
  Review of Attorney Survey Materials
  Review of Survey Questions
  Review of Survey Questionnaire and Materials for Potential Bias Based on Gender or Race/Ethnicity
    Overview
    Survey Review
    General Recommendations on Gender and Race/Ethnic Bias
Detailed Results
  Descriptive Analysis
    Bottom Line
  Factor Analysis
    What is Factor Analysis?
    Factor Analysis of the Massachusetts Judicial Performance Evaluation Survey
  Regression Analysis
    Overall Results (2014-2017)
    Results by Court
  Content Analysis
    Overview
    Content Analysis Results
Conclusions and Recommendations
Appendices
  Include "summary question" in future survey administrations
  Recommendations for reporting judicial performance results to judges
  Additional Analyses
    The Scope of Gender and Racial/Ethnic Bias in the Results
    Analysis of Judges who received less than 70% in Categories Usually/Always
  Detailed Regression Results
  Detailed Content Analysis Results

Table of Tables

Table 1. Number of Judges with Completed Surveys by Court Type and Demographics
Table 2. Overall Average Outcome Scores by Judge Gender and Race/Ethnicity
Table 3. Overall Average Outcome Scores by Judge Gender and Race/Ethnicity
Table 4. Average Outcome Scores by Court Type
Table 5. Components within the MA JPE Survey
Table 6. Variables used in regression analysis
Table 7. Overall Average Outcome Scores by Judge Gender and Race/Ethnicity
Table 8. Interactions between Judge Gender and Judge Race/Ethnicity
Table 9. Average Attorney Ratings by Race/Ethnicity
Table 10. Regression Significant Results by Court (Legal Ability and Courtroom Management)
Table 11. Regression Significant Results by Court (Integrity & Judicial Temperament)
Table 12. Likelihood of Judges Receiving Comments

Executive Summary

This executive summary provides high-level findings and recommendations for the 2018 statistical review of the Commonwealth of Massachusetts' Judicial Performance Evaluation (JPE) program, a comprehensive review of survey materials, methods, and data from the 2014-2017 Massachusetts judicial performance survey. The survey was administered to attorneys, and every trial court judge was evaluated at least once. It is important to note that not all attorneys who appeared before a judge evaluated that judge. Hence, the findings presented in this report are based only on data from the sample of attorneys who completed the survey. We do not know whether that sample is representative of the entire population of attorneys in the Commonwealth.
Survey Review and Recommendations

Market Decisions Research (MDR) was tasked with a review of the survey materials and questionnaire for the Massachusetts JPE Program. The goal of this review was to identify any potential concerns regarding the current methodology of the survey, identify questionnaire issues, and identify potential improvements. To improve the survey process, the following recommendations were made:

- Shorten the email invitation and highlight the login instructions.
- Modify the survey design and layout to simplify the process of evaluating a judge.
- For questions about a specific judge, include design elements that clearly identify the judge, and simplify some of the questions.

Market Decisions Research was also tasked with reviewing the survey materials and questions to identify the presence of language that demonstrated implicit bias. Based on this review, the following recommendations were made:

- Recognize that implicit bias exists; acknowledging it is an important first step in combatting it.
- Make minor updates to the survey tool that may help mitigate implicit bias.
- Advocate for training and education on implicit bias for judges, judiciary staff, other court personnel, and attorneys.

Bias Analysis Results and Recommendations

The primary objective of this research was to identify any potential bias toward certain groups of judges. MDR used a series of advanced statistical analyses to analyze the survey data and attorneys' in-depth open-ended comments. For the purpose of these analyses, survey questions were grouped into judicial performance categories.1 Judicial performance scores were then calculated as the average score among all the survey questions included in the specific category. The following conclusions were derived from the analyses:

There is evidence of bias favoring white male judges in reviews of trial court judges during 2014-2017. The bias is stronger against racial and ethnic minority male judges. There is also evidence of bias against female judges.
Overall average outcome scores by judge gender and race/ethnicity:

Outcome | White Male Judges | Racial/Ethnic Minority Male Judges | White Female Judges | Racial/Ethnic Minority Female Judges
Legal Ability and Courtroom Management | 4.54 | 4.29 | 4.42 | 4.33
Integrity & Judicial Temperament | 4.55 | 4.39 | 4.44 | 4.47

The bias is also noticeably stronger among white attorneys, who tend to score white judges higher than other groups do.

Outcome | White Attorneys: White Judges | White Attorneys: Racial/Ethnic Minority Judges | Racial/Ethnic Minority Attorneys: White Judges | Racial/Ethnic Minority Attorneys: Racial/Ethnic Minority Judges
Legal Ability and Courtroom Management | 4.51 (4.55) | 4.31 (4.29) | 4.43 (4.48) | 4.38 (4.41)
Integrity & Judicial Temperament | 4.53 (4.57) | 4.43 (4.40) | 4.42 (4.46) | 4.45 (4.44)

Values in parentheses are the average scores received by male judges within that specific group.

1 Judicial performance categories were determined using factor analysis. Two main categories or constructs emerged from the factor analysis: 1) Legal Ability and Courtroom Management; 2) Integrity & Judicial Temperament. More information about factor analysis can be found in the methods section of this report.

It is difficult to measure the extent to which the bias observed overall is present within different court departments, given the lack of diversity in certain courts. However, in the Boston Municipal Court and the District and Superior Courts, which are by and large the more diverse courts in Massachusetts, there is evidence of bias against racial/ethnic minority male judges and some evidence of bias against female judges.

MDR conducted a content analysis of more than 13,000 in-depth comments provided by attorneys on the 335 judges. The purpose of these comments was to allow attorneys to clarify their views on a judge's overall judicial performance.
The following conclusions were derived from this analysis:

- Female judges are more likely to be the subject of biased comments, to receive comments about their physical appearance, to have their intelligence questioned, and to receive negative comments about their communication skills, legal ability, administrative capacity, integrity and impartiality, and professionalism and temperament.
- Racial and ethnic minority judges were the most likely to have their intelligence questioned. They were also more likely to be perceived as being too emotional, to be the subject of biased comments, and to receive negative comments about their legal ability.

The table below displays the likelihood of judges receiving certain comments from attorneys. A likelihood of 1 based on judge gender means that male and female judges are equally likely to be the subject of a specific type of comment; a likelihood greater than 1 means that one group is more likely to receive that type of comment.
Nature of Attorney Comments | Total Mentions | Female Judges | Male Judges | Racial/Ethnic Minority Judges | White Judges
Negative comment about judge's legal ability | 1160 | 1.19 | 0.88 | 1.33 | 0.96
Negative comment about judge's integrity and impartiality | 985 | 1.04 | 0.97 | 0.89 | 1.01
Negative comment about judge's communication skills | 216 | 1.25 | 0.84 | 0.89 | 1.01
Negative comment about judge's professionalism and temperament | 1996 | 1.03 | 0.98 | 0.93 | 1.01
Negative comment about judge's administrative capacity | 1165 | 1.08 | 0.95 | 0.82 | 1.02
Attorney comments about judge's physical appearance (looks, eye rolling, posture) | 27 | 2.31 | 0.18 | 0.31 | 1.09
Attorney comments questioning judge's intelligence | 152 | 2.12 | 0.30 | 2.20 | 0.84
Attorney comments about judge being emotional | 46 | 1.98 | 0.39 | 1.82 | 0.89
Implicit attorney biased comments (stereotypes, attitudes) | 450 | 1.68 | 0.57 | 1.14 | 0.98
Explicit attorney biased comments | 11 | 1.89 | 0.44 | 1.52 | 0.93

Looking at the issue from a different angle: currently, results from the JPE survey are reviewed as a component of the overall evaluation of judicial performance, and every judge is required to have a professional development plan. Prior to 2014, the results were used to help determine whether an improvement plan should be developed for a judge, based on whether 70% or more of the attorney responses for that judge fell within the categories "always" and "usually" across the 17 survey questions. Based on this former threshold:

One in four (25%) racial and ethnic minority judges and one in six (16%) female judges would have been subject to an improvement plan. This is compared to only one in ten male judges and a similar proportion of white judges. The lowest rate was observed among white male judges, with only 9% receiving a score below the 70% threshold.
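As a minimal illustration of the former screening rule (this is our sketch, not MDR's actual procedure, and the response data are invented), a judge would be flagged when fewer than 70% of pooled attorney responses across the 17 questions were "usually" or "always":

```python
# Sketch of the former screening rule (illustrative only): flag a judge when
# fewer than 70% of all attorney responses across the 17 survey questions
# fall in the "usually" or "always" categories.

def below_threshold(responses, threshold=0.70):
    """responses: rating labels pooled across questions and attorneys."""
    favorable = sum(1 for r in responses if r in ("usually", "always"))
    return favorable / len(responses) < threshold

# Invented example: 65 of 100 pooled responses are favorable -> flagged
sample = ["always"] * 60 + ["usually"] * 5 + ["sometimes"] * 30 + ["rarely"] * 5
print(below_threshold(sample))  # -> True
```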
Judges who received less than 70% in categories usually/always:

Judge Demographic | Judges Below Threshold | Total Judges | Percent
Male | 22 | 206 | 11%
Female | 20 | 128 | 16%
Racial/Ethnic Minority | 10 | 40 | 25%
White | 32 | 294 | 11%
White Male Judges | 16 | 185 | 9%
Overall | 32 | 334 | 10%

Bottom line

We believe that gender and race/ethnicity bias is a societal problem, and overcoming subconscious stereotypes that certain attorneys may harbor about certain groups of judges will require a pragmatic and multifaceted approach. To combat hidden and unconscious bias among attorneys, Market Decisions Research proposes:

1. Advocate that the Bar provide customized training to attorneys based on the results presented in this report; the first step in combating unconscious and hidden bias is to become aware of it.
2. The Committee could also mathematically correct for the observed bias on a court-by-court basis by adjusting the scores of groups for which evidence of bias has been found in judicial performance ratings. However, any decision to adjust scores mathematically must be subject to careful consideration by the Committee, as the complexity and cost may not make this a viable choice for the Massachusetts program.

Background

The Massachusetts Judicial Performance Evaluation Program was established in 2001 to evaluate the performance of all judges serving in the seven departments of the trial court. One component of the evaluation was a survey of attorneys who had appeared before a judge in the past two years. Attorneys were asked to evaluate one or more judges based on their experiences during these appearances. The initial attorney survey instrument was revised, and a new survey was introduced in 2014. This survey was administered to attorneys during the pilot program period from 2014 to 2017, during which all trial court judges were evaluated by attorneys at least once. The current review focuses on responses to the revised survey instrument.
Market Decisions Research (MDR), a survey research, analytics, and program evaluation firm based in Portland, Maine, was contracted by the Committee on Judicial Performance Evaluation (Committee) to evaluate survey responses, with the goals of identifying potential bias in the results and identifying possible improvements to the questionnaire and to the process of administering the survey as a whole.

Research Questions

According to the initial Request for Proposals, the Committee was looking for answers to the following questions:

1. Is the current attorney questionnaire a valid instrument for evaluating the Commonwealth's Trial Court judges?
2. Is the questionnaire demonstrative of, or susceptible to, bias, and if so, what type or types of bias?
   - Is there bias in the questionnaire? If so, how can the questions be changed?
   - If the bias is response bias, what steps could be taken to correct for this problem? For example, would changing the evaluation process or instrument help, and if so, in what ways?
3. Is there a relationship between respondents' individual demographic profiles and how they evaluate the judges?
4. What additions or changes can be made to the attorney questionnaire to obtain a more comprehensive picture of the judge's performance?

Methodology Report

Literature Review

The research and evaluation team at MDR conducted a thorough review of existing literature on judicial performance evaluation to determine whether the Massachusetts survey captures the totality of the judicial performance evaluation experience.

Data Verifications

Before conducting analysis, MDR merged datasets from the nine databases containing responses from the nine rounds of evaluations in the pilot program and conducted data verifications to check and report on the quality of the data. This step involved checks to ensure that the data were complete and consistent.
Any data quality issue discovered by MDR was reported to the Committee for resolution before starting the analysis. It is important to note that all data containing confidential information or individual records were stored on a secure partition of the MDR file server, accessible only to the senior researchers who worked on this project.

Variables Used in the Main Analyses

In addition to survey and sample data provided by the Committee, MDR developed appropriate variables for use in the statistical analyses. The tables below detail each variable used in the main statistical analyses. To simplify reporting, the types of analysis are abbreviated as follows:

DA = Descriptive Analysis
FA = Factor Analysis
RA = Reliability Analysis
RE = Regression Analysis

A full description of these types of analyses can be found in the next section.

The following variables were used as outcome/analytical variables. Each is a survey question used in all four analyses (DA, FA, RA, RE):

Variable Name | Variable Label
Q03_Arguments | The Judge considers arguments from all parties before ruling.
Q04_Relevant | The Judge recognizes relevant issues.
Q05_Oral | The Judge issues oral rulings that are clear.
Q06_Evidence | The Judge demonstrates knowledge of rules of evidence.
Q07_Patient | The Judge is patient.
Q08_Proceedings | The Judge adequately frames the nature of the proceedings.
Q09_Decorum | The Judge maintains decorum in the courtroom.
Q10_Respect | The Judge shows respect to all courtroom participants.
Q11_Disposes | The Judge disposes of matters in a timely manner.
Q12_Prepared | The Judge appears appropriately prepared for court proceedings.
Q13_Listens | The Judge listens attentively during court proceedings.
Q14_Written | The Judge issues written rulings and decisions that are clear.
Q15_Procedure | The Judge follows the applicable Rules of Procedure.
Q16_Pro_Se | The Judge adequately explains the nature of the proceedings to pro se litigants.
Q17_Fairly | The Judge treats all parties fairly.
Q18_Substantive | The Judge demonstrates knowledge of substantive law.
Q19_Progress | The Judge effectively manages progress of the case.

The following variables were used for classification purposes. Each is a sample variable used in DA and RE:

Variable Name | Label
JudgeDept | Court Department
JudgeGender | Judge Gender
JudgeEthnicity | Judge Race/Ethnicity

The variables below were used as control variables in regression analyses. Control variables are potential confounding variables that may affect how judges are rated. Each is a sample variable used in RE:

Variable Name | Label
JudgeBench | Judge's Time on Bench
AttyGender | Attorney Gender
AttyEthnicity | Attorney Race/Ethnicity
Q01_Appears | Number of Attorney Appearances Before Judge
Q02_Hours | Hours Attorney Before Judge
AttyLitigation | Attorney Percent of Litigation in Practice
AttyPractice | Attorney Years in Practice

Statistical Methods

The following statistical analyses were conducted to investigate the validity of the survey questionnaire and identify any potential bias toward certain groups of judges.

Descriptive Analysis

Descriptive analysis was used to provide simple summaries of the data and measures: responses broken out by judge gender and race/ethnicity, along with tests of whether the differences between groups are statistically significant.
MDR produced tables of means (averages) for each survey question, classified by court type and judge demographics. T-tests were used to test whether group and sub-group average ratings differ statistically. A t-test is a statistical procedure that compares the averages of two groups and determines whether the difference between them is larger than would be expected by chance.

Factor Analysis

Factor analysis was used to determine whether the responses to the survey questions are independent of one another, each measuring a distinct aspect of a judge's performance, or whether attorneys answer the questions based on some other factor that drives answers to two or more survey questions. While the questions included in a survey may seem to offer a straightforward answer to a direct question, responses to individual questions are often derived from some underlying concept, and that concept may be something that cannot be directly measured. An underlying construct (also called a latent variable) is something that cannot be directly measured but can be inferred from responses to survey questions. In survey research and other fields, constructs are often used to explain behavior: they represent the way people group information and thoughts when evaluating their experiences. For example, attorneys may think of their experiences as a set of discrete factors such as a judge's legal ability, administrative skill, or temperament. Factor analysis was used to determine whether key factors underlie the answers attorneys give to each of the survey questions. Factor analysis is a data reduction tool that can help identify these underlying constructs.
If there is a strong relationship between groups of items, those items can be combined into scales that measure judicial performance along an underlying latent construct. A strong relationship would imply that the individual questions are not independent, that is, each assessing a unique aspect of a judge's performance. Rather, it would suggest that there are underlying constructs that attorneys use in answering the survey questions. If such a pattern exists, then we would propose focusing the bias analysis on any identified constructs rather than on the individual questions.

Reliability Analysis

Reliability analysis was used to confirm whether the bias analysis should focus on each of the individual survey items or on the factors identified through factor analysis. Reliability analysis assesses the overall properties of any factors identified through factor analysis: it allows researchers to study the properties of a scale and the items that compose the scale. If there are important constructs used in evaluating judicial performance, reliability analysis determines how consistently the survey questions capture each construct.

Regression Analysis

Regression analysis was used to examine differences in responses to determine whether there is any apparent bias based on the judge's gender and race/ethnicity, controlling for other factors that may influence differences in scores. To evaluate the possibility of bias in judicial performance reviews while controlling for factors such as the number of years the judge has been on the bench and attorneys' demographics, among others, the data were analyzed using regression. A regression analysis is a statistical procedure that uses a set of independent variables to explain changes in a dependent variable. This technique is useful when one wants to examine the effect of a single variable on an outcome while statistically controlling for other variables.
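In outline, such a model can be sketched as an ordinary least squares regression. The snippet below is a simplified illustration with simulated data; the variable names, control set, and effect sizes are our assumptions, not the survey's data or MDR's actual model specification:

```python
import numpy as np

# Simplified sketch of a bias regression: model a rating as a linear
# function of a minority-judge indicator plus one control (time on bench).
# All data here are simulated for illustration.
rng = np.random.default_rng(0)
n = 500
minority = rng.integers(0, 2, n)       # 1 = racial/ethnic minority judge
years = rng.uniform(1, 25, n)          # control: judge's time on bench
# Simulated outcome with a built-in -0.15 effect for minority judges
score = 4.5 - 0.15 * minority + 0.005 * years + rng.normal(0, 0.2, n)

X = np.column_stack([np.ones(n), minority, years])  # intercept + predictors
coef, *_ = np.linalg.lstsq(X, score, rcond=None)
print(coef[1])  # estimated minority-judge effect; should land near -0.15
```

Holding the control constant, the coefficient on the minority indicator isolates the group difference, which is the logic applied in the analyses reported here (with many more controls and interaction terms).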
To isolate the source of any potential bias, regression analyses were run separately for each court department (BMC, District, Housing, Juvenile, Land, Probate & Family, Superior). These analyses assessed the effect of the judge's demographics while controlling for selected factors. Interactions between respondent and judge demographic characteristics were also included in these analyses.

Content Analysis

Content analysis helped determine whether a pattern of bias exists in open-ended comments and served as supporting analysis for the regression analysis. MDR coders reviewed and blind coded responses, flagging:

- Instances of positive commentary.
- Instances of negative commentary.
- Instances of explicit bias based on gender and race/ethnicity.
- Instances of implicit bias (charged words) based on gender and race/ethnicity.

MDR also used regression analysis to assess the impact of attorneys' comments on how they rate judges' judicial performance.

Survey Response

The table below shows the number of judges for whom surveys were completed, by court type and judge demographics.

Table 1. Number of Judges with Completed Surveys by Court Type and Demographics

Court Department | Female | Male | Asian | Black | Hispanic | Other | White | Racial/Ethnic Minority | Total Judges
BMC | 11 | 17 | 1 | 5 | 1 | 1 | 20 | 8 | 28
District Court | 48 | 99 | 2 | 11 | 3 | 1 | 130 | 17 | 147
Housing Court | 5 | 4 | 0 | 1 | 0 | 0 | 8 | 1 | 9
Juvenile Court | 22 | 14 | 1 | 3 | 1 | 0 | 31 | 5 | 36
Land Court | 2 | 4 | 0 | 0 | 0 | 0 | 6 | 0 | 6
Probate & Family Court | 20 | 18 | 0 | 0 | 0 | 0 | 38 | 0 | 38
Superior Court | 21 | 50 | 3 | 6 | 0 | 0 | 62 | 9 | 71
Total | 129 | 206 | 7 | 26 | 5 | 2 | 295 | 40 | 335

It is important to note that the Housing, Land, and Probate & Family courts do not contain enough data for sub-group analyses by race/ethnicity.
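The likelihood index reported in the content analysis tables can be read as a rate ratio: the rate at which a group receives a comment type, divided by the overall rate, so that 1.0 means parity. The function below is our illustrative construction of such an index, and the counts in the example are invented:

```python
# Illustrative construction of a likelihood index for the content analysis:
# a group's rate of receiving a comment type relative to the overall rate.
# This is our reading of the index, not MDR's actual code, and the example
# counts are invented.

def likelihood(group_mentions, group_judges, total_mentions, total_judges):
    """Rate ratio: 1.0 means the group is no more or less likely than
    judges overall to receive this type of comment."""
    group_rate = group_mentions / group_judges
    overall_rate = total_mentions / total_judges
    return group_rate / overall_rate

# Hypothetical example: 18 of 27 appearance-related comments concern the
# 129 female judges (out of 335 judges total)
print(round(likelihood(18, 129, 27, 335), 2))  # -> 1.73
```

Under this construction, the group values for any comment type average to 1 when weighted by group size, which is consistent with the figures reported in the table above.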
Glossary of Terms

The following terms are defined in the context of this report to provide clarity and avoid confusion around statistical terms that are often misused or defined differently across industries.

Bias and Disparity
For the purpose of this research, bias and disparity are defined as statistically significant results that are consistently in favor of or against one demographic group.

Response Bias
Technically, response bias is a type of cognitive bias that affects the results of a statistical survey when participants answer questions in the way they think the questioner or society wants them to answer, rather than according to their true beliefs.

Validity
Validity is equivalent to accuracy. A valid instrument is a questionnaire that accurately measures what it claims to measure.

Limitations

It is important to note that not all attorneys who appeared before a judge evaluated that judge. Hence, the findings presented in this report are based on quantitative and qualitative data from the sample of attorneys who completed the survey. While Market Decisions Research took extensive precautions to ensure that the findings presented in this report are statistically sound and accurate, it is not possible to verify whether the sample used in the survey is representative of the population of attorneys in the Commonwealth of Massachusetts, because response is voluntary and anonymous.

Survey Review and Recommendations

Review of Attorney Survey Materials

A key design goal of the survey process is to minimize the burden on attorneys when they are asked to respond. Based on a review of the survey materials, MDR recommends changes to the current methodology that would shorten invitations and instructions and consolidate the number of steps required to respond to the survey.

- Shorten the email invitation and highlight the login information.
The current email invitation sent to attorneys has four paragraphs of text that describe the goals of the project and provide instructions for responding. It takes approximately one minute to read.

- Simplify the language of the email to state the reason for the invitation, the goal of the survey, why recipients are being asked to participate, and how to participate.
- If possible, state in the email how many judges the attorney will be asked to evaluate.
- If possible, embed the account number and login ID in the survey link, so the attorney can access the survey in one click.
- Highlight the actual instructions for logging in to complete the survey.
- Provide more detailed information in an FAQ document that can be accessed through the survey.

Simplify the process of evaluating a judge. Based on a review of the screenshots of the attorney survey, completing a survey involves:

1. Signing in with an account number and password;
2. Clicking through to an initial survey page that lists the counties included in the survey and asks the attorney to "Proceed to Demographics";
3. Entering demographic items (or checking "decline all") and clicking "Save";
4. Going to a screen to select the judge to evaluate;
5. Reading through the instructions for a recall exercise; and
6. Finally answering questions about a judge.

Thus, an attorney must enter information, read text, and click through six screens before answering a single question about a judge. The process requires about 3.5 minutes. This places a substantial burden on an attorney who may already have been reluctant to reply, and it offers a convenient reason not to participate.

- Minimize the number of screens an attorney must click through before answering questions about a judge. Embedding a link directly in the email gives the attorney a way to access the survey quickly.
SAMPLE INVITATION

Chief Justice Ralph D. Gants and Chief Justice Paula M. Carey invite you to participate in the ongoing evaluation of trial court judges. The goal of this evaluation is [FILL IN SHORT RATIONALE]. Your responses are a critical part of this program.

You are being asked to evaluate one or more judges because court records indicated that you were associated with a case before this judge during the past two years. Each survey should take only a few minutes to complete. You need not complete all your surveys at one time.

To complete your surveys:
1. Your login is [LOGINID].
2. Your password is [PASSWORD].
3. Click here to log in and access your surveys.

Please be assured your responses are anonymous. Questions? Please contact [FILL NAME]. You will also find information about the Massachusetts Judicial Evaluation Program at [FILL WEBPAGE].

This is your chance to make your voice heard and help improve our judiciary. Thank you for responding to this important survey.

• Move the judge evaluation screen immediately after the welcome screen and make the process of selecting a judge to evaluate simpler.

The goal is to get the attorney to the point where he or she is completing actual evaluations as quickly as possible. Provide the attorney with the list of judges in alphabetical order (with the court department after each name in parentheses). If the software allows, set up a grid that lets an attorney select the judges he or she feels qualified to evaluate:

There are [XX] judges to evaluate. For each, please indicate if you have sufficient experience to evaluate his or her performance.

| | I have worked with this judge since [DATE] and feel qualified to evaluate his or her performance. | I do not feel qualified to evaluate his or her performance. |
| Judge A (court type) | ☐ | ☐ |
| Judge B (court type) | ☐ | ☐ |
| Judge C (court type) | ☐ | ☐ |

This will allow the attorney to select only those judges for whom the attorney has sufficient knowledge.
By clicking "Next," the table could update to include only those judges the attorney has sufficient knowledge to evaluate.

• While important in framing the evaluation, simplify the recall task for the attorney, or at least shorten the language used to describe it. Another option is to ask an attorney to evaluate only one judge.

When evaluating an experience, a recall exercise is important: it helps link the evaluation to specific events rather than to general impressions of or feelings about a judge. If the attorney were evaluating one judge, the recall task would be relatively easy to complete. It becomes more difficult, however, when an attorney is asked to evaluate more than one judge. Depending on the number of judges to evaluate, an attorney could see this as a significant burden and choose not to complete the evaluation(s). Limiting the process so that an attorney evaluates only one judge would reduce this burden, but it is likely not practical given the need for a sufficiently large population of attorneys evaluating each judge.

Barring this, it is important to simplify the judge evaluation instruction screen. We recommend removing the first paragraph on rationale entirely, leaving the second paragraph, and eliminating the bullet list of items. While the bullet list is useful for framing the context of the recall, it imposes a burden on the attorney; there is a tradeoff between providing this level of detail for the recall exercise and the attorney's willingness to evaluate a judge or multiple judges. Further, with lengthy lists, people tend to recall either the first few or last few items, which may change the context in which attorneys evaluate the judge. An alternative is to reduce the list elements to a sentence or two each. We also suggest removing the last sentence, "Some people find it helpful…," as respondents will decide what method works for them.
Another reason to shorten the language is that it will have to be repeated when an attorney is evaluating multiple judges. The survey design cannot use one instruction screen prior to all evaluations; the instructions must be given with each evaluation (think of each judge's evaluation as a separate survey). As a lead-in to each evaluation after the first, use language similar to: "As with the prior judge(s), please think about your appearances before [JUDGE XX] and specifically recall details of the judge's performance during those appearances."

• In cases where an attorney is asked to evaluate more than one judge, include a method by which the attorney can see progress in completing all evaluations.

In a conventional online survey, a status bar keeps a respondent informed about his or her progress. Depending on the software and its ability to accurately track status in surveys involving multiple evaluations, we recommend incorporating a status bar. We also suggest the use of a progress page. The progress page should list all the judges an attorney is being asked to evaluate (only those for whom the attorney indicated sufficient experience), with color coding or another scheme to indicate which evaluations have been completed. After completing each evaluation, the attorney should automatically be returned to the progress page. The progress page can also remind attorneys that they can complete any remaining evaluations at a later time.

• Make responding to the demographics the last survey task.

Because responding to the demographics is an invasive task that interrupts the key aspect of the survey, answering the evaluation questions, we suggest putting the demographics on the last survey page. Once the attorney has completed all judge evaluations, the survey should skip to a page asking about demographics.
Review of Survey Questions

• Include a clear identifier for each judge at the top of each page of the evaluation.

When an attorney is asked to evaluate more than one judge, it is possible for the attorney to lose track of the specific judge he or she is evaluating. In reviewing attorney comments, we found at least one instance where an attorney was evaluating the incorrect judge. At the top of each screen asking questions about a specific judge, include the judge's name in large bold print. It is also helpful to add a picture of the judge. Another option is to include the name of the judge in each question rather than using the generic "judge."

• Simplify appearance demographics.

Simplify and ask only one question about either the number of appearances (preferred) or the hours before the judge. While not perfectly related, the two are strongly correlated.

• For the supporting information provided (left column links), identify which items are PDFs and which will take the attorney to a new web page.

In addition, provide a way for attorneys to link directly back to the survey page they left when they access the links for "Home," "More info," "Law and Rule," "Anonymity," and "FAQ." A "Return to Survey" button should be placed at the bottom and, for longer pages, at the top of each of these web pages.

• Make a clear distinction between "close account" and "sign out," or remove the close account option altogether.

Attorneys may easily confuse the purposes of these two functions and inadvertently close the survey before finishing. It would be best to remove the close account option altogether. In most online survey software, the login would be deactivated after the person has completed the survey or after a set period of time.
If you choose not to remove the close account option, then clarify both functions:
o "Close account and end my evaluations."
o "Sign out to return later and complete the evaluations."

Evaluation Questions

• The current survey was revised to focus on behaviors that an attorney could observe in the courtroom.
• The question scale (always, usually, sometimes, rarely, never) is appropriate.
• The survey has eliminated double-barreled items (questions that ask respondents to evaluate two or more aspects of behavior at once). The one possible exception is "The Judge issues written rulings and decisions that are clear," since decisions are a subset of rulings. Given their relationship, this should not cause problems or confusion for respondents.
• There are instances of vague or ambiguous language in the survey questions, but minor changes would improve some questions.

Questions should have a clear and distinct meaning to respondents. That is, the question language should limit any interpretation or definition on the part of the attorneys answering the questions. The survey items include a number of words that, while they have a common meaning, could potentially be defined in slightly different ways by attorneys. The list below contains the text of each question, with words highlighted in the original report that could be interpreted differently by attorneys.

Survey Questions
- The Judge considers arguments from all parties before ruling.
- The Judge recognizes relevant issues.
- The Judge issues oral rulings that are clear.
- The Judge demonstrates knowledge of rules of evidence.
- The Judge is patient.
- The Judge adequately frames the nature of the proceedings.
- The Judge maintains decorum in the courtroom.
- The Judge shows respect to all courtroom participants.
- The Judge disposes of matters in a timely manner.
- The Judge appears appropriately prepared for court proceedings.
- The Judge listens attentively during court proceedings.
- The Judge issues written rulings and decisions that are clear.
- The Judge follows the applicable Rules of Procedure.
- The Judge adequately explains the nature of the proceedings to pro se litigants.
- The Judge treats all parties fairly.
- The Judge demonstrates knowledge of substantive law.
- The Judge effectively manages progress of the case.

However, without completely restructuring the survey to focus only on directly observable events (for example, the judge begins court on time), it is impossible to craft questions that would not be subject to slightly different interpretations. Thus, while the evaluation questions do use some vague or ambiguous terms, the context for each is reasonably clear and the questions measure the behaviors to which they refer. A few minor changes that can add clarity:

The Judge [adequately] frames the nature of the proceedings.
The Judge appears [appropriately] prepared for court proceedings.
The Judge [adequately] explains the nature of the proceedings to pro se litigants.

Comments Regarding the Judge

• Make it clear to respondents that their comments will be shared with the judges.
• Remove the word "serious." It is not appropriate to imply that the attorney may not be serious.
• As with all other aspects, shorten the language that asks attorneys to comment on a judge's performance. Remove the additional burden of having to read through a paragraph before the attorney enters his or her comments. Get straight to the point of the task:
o "Judges find comments very helpful in improving their performance. Please provide any comments that explain your views of the judge's performance.
Remember to comment in a way that does not reveal anyone's identity."
o "If you choose, you can provide specific examples that explain your answers to the questions above."

Review of Survey Questionnaire and Materials for Potential Bias Based on Gender or Race/Ethnicity

Lindsay Gannon, Market Decisions Research's expert on implicit bias, reviewed the survey materials and questions to identify any language that demonstrates implicit bias.

Overview

It can be difficult to categorize race/ethnicity or gender-based trigger words beyond the explicit and unacceptable terms we know to be offensive. A body of research covering a variety of professional fields (business, technology, teaching, criminal justice, medicine, etc.) shows how race/ethnicity and gender stereotypes are ingrained in ways of thinking (Staats, 2016). For example, feminine qualities of communicating, such as speaking softly or being perceived as emotional, are traits that can be seen as negative for women in traditionally male roles and professions. Likewise, the more masculine traits of being direct, aggressive, taking the lead, and being an authority figure are seen as positive professional attributes for men but negative characteristics for women, with aggressive women being labeled high maintenance, difficult, or bossy (Heilman, 2001).

We all carry bias in our thoughts and interactions. It is embedded in our brains and serves an important evolutionary purpose. At any given time, we are being bombarded with as many as 11 million pieces of information. To deal with this challenge, our brains build mental models, or unconscious perceptual filters based on our personal and cultural experiences, in order to be efficient. Implicit bias is activated quickly and often unknowingly by cues that influence our behaviors, memories, and understanding of content.
Discussing gender, race, and ethnicity as an element of implicit bias can be uncomfortable, but it is important to understand how implicit bias works, how it impacts outcomes and relationships, and the specific strategies that can be used to address it. In the workplace, recognizing the personal biases that each person brings, processing situations with the team or a supervisor, thinking of people as individuals, and challenging stereotypes are all strategies that can minimize bias. Using these steps to increase awareness during difficult or stressful situations may limit stereotyping and improve responses.

Market Decisions Research was tasked with reviewing the responses to the Massachusetts Judicial Performance Evaluation survey to identify words or specific language that appeared to be rooted in gender or other biases. If the goal is to use objective words, no matter the context, in an effort to remove any unconscious triggers, then please consider the suggestions and rationale listed below. Given the herculean task of removing any and all bias in subjective survey responses, it is possible that these small survey changes will not drastically change bias in outcomes for female judges and judges of color. The Massachusetts Committee on Judicial Performance Evaluation has shown impressive initiative in addressing this issue with survey reviews, stakeholder and expert input, attention to best practice, and adherence to American Bar Association guidelines. Any recommendations for improving the survey tool or procedures should be considered part of a comprehensive quality improvement process that is continuous and ongoing.

• Overall, the questions were found to be objective in wording and language, with some minor edits suggested.
• The current practice of instructing survey respondents to recall specific details before evaluating a judge helps to counteract stereotypes, which are often automatic (the brain's path of least resistance).
Survey Review

Judge Evaluation - Instructions

• The care with which the judge reviewed the arguments and evidence presented by the parties.
• Examples of the judge's command of substantive law and legal procedure.

Suggestions:
o Use of the word "care": the term could be applied favorably to women (caring and nurturing) but unfavorably to men who do not show emotion (even while treating people with respect and rendering fair decisions), and may be considered a gendered term in certain contexts. Suggested alternatives: attention, consideration, regard.
o Use of the word "command": the term could be applied favorably to men (aggressive) but unfavorably to a woman on the bench (unfeminine or bossy), and may be considered a gendered term in certain contexts. Suggested alternatives: understanding, knowledge, expertise.

Evaluation Questions

• No suggested edits based on gender/race/ethnicity.
• A note on Q4, "The Judge is patient": the idea of patience was found in open-ended comments to be viewed as a positive trait by some but a negative one by others; for example, impatience was viewed positively if it meant the judge was moving cases along quickly. Scoring on this question was internally consistent with other measures within the construct of integrity and judicial temperament, so after some consideration, the recommendation is to leave this question and wording unchanged. Temperament and corresponding biased/negative comments are addressed in the open-ended comments section review later in this report.

General Recommendations on Gender and Race/Ethnic Bias

• In the event that our analysis of responses identifies bias based on gender and race/ethnicity, and if the Committee decides to adjust scores to control for bias, there is justification and rationale for doing so. Bias exists, it can be analyzed and controlled for to make scores equitable, and because of the confidential nature of this process, there are safeguards in place to support such an adjustment.
• Increase survey frequency and review fewer judges per round. A two-year span of memory recall, combined with asking for reviews of more than two or three judges at a time, increases reliance on stereotypes and on memory failure or rewriting of events, even with the best practice of prompting for memory recall before each survey.
• Adding a photo: this is a neutral recommendation. It is possible that a photo would help with recall. Since respondents already know the judge's gender and race/ethnicity from appearing before them in court, a photo should not increase or decrease any already present stereotypes or implicit bias. It may be a worthwhile exercise in the quality improvement process to add a photo if the survey tool allows for it.
• Training and education: if not already included, advocate that an implicit bias/cultural competency component be added to the required "Practicing with Professionalism" attorney course. Advocate for implicit bias education and training in statewide CLE programs. Embed an implicit bias/cultural competency module in orientation and/or annual court employee training.
• Final note: the Massachusetts project summary provided to Market Decisions Research notes that the "departmental Chief Justices can edit for bias." This may be an area of opportunity to address these issues; however, the summary does not detail how this is done or whether it has been effective to date.

Works Cited

Heilman, M. E. (2001). Description and prescription: How gender stereotypes prevent women's ascent up the organizational ladder. Journal of Social Issues, 57(4).

Staats, C. (2016). State of the Science: Implicit Bias Review. Columbus, OH: The Kirwan Institute.

Detailed Results

Descriptive Analysis

The table below displays average scores by gender and race/ethnicity across all surveys administered during the pilot program period from 2014 to 2017.
Statistical analysis was conducted to determine whether there are differences between groups.

Table 2. Overall Average Outcome Scores by Judge Gender and Race/Ethnicity

Question | Male Judges | Female Judges | White Judges | Racial/Ethnic Minority Judges
The Judge considers arguments from all parties before ruling. | 4.54 | 4.45 | 4.52 | 4.41
The Judge recognizes relevant issues. | 4.53 | 4.51 | 4.40 | 4.41
The Judge issues oral rulings that are clear. | 4.51 | 4.49 | 4.28 | 4.33
The Judge demonstrates knowledge of rules of evidence. | 4.53 | 4.39 | 4.51 | 4.21
The Judge is patient. | 4.32 | 4.23 | 4.29 | 4.27
The Judge adequately frames the nature of the proceedings. | 4.52 | 4.42 | 4.50 | 4.34
The Judge maintains decorum in the courtroom. | 4.69 | 4.61 | 4.67 | 4.60
The Judge shows respect to all courtroom participants. | 4.56 | 4.46 | 4.53 | 4.47
The Judge disposes of matters in a timely manner. | 4.46 | 4.36 | 4.44 | 4.32
The Judge appears appropriately prepared for court proceedings. | 4.62 | 4.53 | 4.60 | 4.45
The Judge listens attentively during court proceedings. | 4.59 | 4.52 | 4.57 | 4.48
The Judge issues written rulings and decisions that are clear. | 4.51 | 4.42 | 4.49 | 4.30
The Judge follows the applicable Rules of Procedure. | 4.54 | 4.45 | 4.53 | 4.32
The Judge adequately explains the nature of the proceedings to pro se litigants. | 4.65 | 4.59 | 4.64 | 4.52
The Judge treats all parties fairly. | 4.50 | 4.38 | 4.47 | 4.34
The Judge demonstrates knowledge of substantive law. | 4.54 | 4.40 | 4.51 | 4.24
The Judge effectively manages progress of the case. | 4.51 | 4.40 | 4.49 | 4.34

Judges' scores are, on average, statistically significantly different depending on the judge's gender, and statistically significantly different depending on the judge's race/ethnicity. Overall, male judges tend to have significantly higher scores than female judges. In addition, white judges also tend to have higher scores than judges of other races or ethnicities in nearly all judicial performance ratings.
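Group mean comparisons of the kind reported above are commonly checked with a two-sample test. The sketch below is illustrative only: it uses synthetic 1-to-5 ratings generated with assumed means, not the survey data, and a hand-rolled Welch's t statistic rather than the report's actual procedure.

```python
import random
import statistics

def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic and approximate degrees of freedom.

    A large absolute t (roughly above 2 for large samples) indicates the
    group means differ by more than chance variation would suggest.
    """
    na, nb = len(sample_a), len(sample_b)
    ma, mb = statistics.fmean(sample_a), statistics.fmean(sample_b)
    va, vb = statistics.variance(sample_a), statistics.variance(sample_b)
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    t = (ma - mb) / se2 ** 0.5
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Synthetic ratings on a 1-5 scale: group A is generated with a slightly
# higher assumed mean than group B (both are invented for the example).
random.seed(1)
group_a = [min(5, max(1, round(random.gauss(4.55, 0.6)))) for _ in range(400)]
group_b = [min(5, max(1, round(random.gauss(4.30, 0.6)))) for _ in range(400)]

t, df = welch_t(group_a, group_b)
print(f"mean A = {statistics.fmean(group_a):.2f}, mean B = {statistics.fmean(group_b):.2f}")
print(f"Welch t = {t:.2f} on ~{df:.0f} df")
```

With samples of this size, a mean gap of a tenth of a point or so on a five-point scale is already enough to produce a clearly significant t statistic, which is why differences as small as those in Table 2 can be statistically significant.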
The results below indicate that the race and ethnicity of a judge had a significant impact on his or her judicial performance scores, with racial/ethnic minority judges having lower scores, on average, than their white counterparts. The table also suggests a possible interaction between a judge's gender and race/ethnicity. Overall, white male judges tend to receive considerably higher scores than all other demographic groups. Interestingly, among racial/ethnic minority judges, women tend to have higher scores than men on most items except those that evaluate the legal ability of a judge, such as "The Judge demonstrates knowledge of rules of evidence" and "The Judge demonstrates knowledge of substantive law."

Table 3. Overall Average Outcome Scores by Judge Gender and Race/Ethnicity

Question | White Males | White Females | Racial/Ethnic Minority Males | Racial/Ethnic Minority Females
The Judge considers arguments from all parties before ruling. | 4.56 | 4.45 | 4.39 | 4.44
The Judge recognizes relevant issues. | 4.55 | 4.42 | 4.30 | 4.26
The Judge issues oral rulings that are clear. | 4.53 | 4.41 | 4.32 | 4.35
The Judge demonstrates knowledge of rules of evidence. | 4.56 | 4.42 | 4.24 | 4.18
The Judge is patient. | 4.33 | 4.21 | 4.22 | 4.34
The Judge adequately frames the nature of the proceedings. | 4.54 | 4.43 | 4.33 | 4.36
The Judge maintains decorum in the courtroom. | 4.70 | 4.61 | 4.58 | 4.63
The Judge shows respect to all courtroom participants. | 4.57 | 4.45 | 4.42 | 4.54
The Judge disposes of matters in a timely manner. | 4.48 | 4.36 | 4.26 | 4.41
The Judge appears appropriately prepared for court proceedings. | 4.64 | 4.53 | 4.42 | 4.49
The Judge listens attentively during court proceedings. | 4.60 | 4.52 | 4.45 | 4.52
The Judge issues written rulings and decisions that are clear. | 4.53 | 4.43 | 4.28 | 4.34
The Judge follows the applicable Rules of Procedure. | 4.57 | 4.47 | 4.30 | 4.35
The Judge adequately explains the nature of the proceedings to pro se litigants. | 4.66 | 4.60 | 4.50 | 4.54
The Judge treats all parties fairly. | 4.52 | 4.38 | 4.31 | 4.38
The Judge demonstrates knowledge of substantive law. | 4.57 | 4.43 | 4.26 | 4.21
The Judge effectively manages progress of the case. | 4.53 | 4.41 | 4.31 | 4.38

Note: This analysis did not include determining differences between groups.

Table 4 shows that white male judges tend to receive higher scores across all departments and across nearly all variables.

Table 4. Average Outcome Scores by Court Type
(For each court type, the first figure is the overall average score and the second is the average score for white male judges.)

Question | BMC | District Court | Housing Court | Juvenile Court
The Judge considers arguments from all parties before ruling. | 4.46 / 4.54 | 4.46 / 4.48 | 4.61 / 4.67 | 4.46 / 4.70
The Judge recognizes relevant issues. | 4.43 / 4.57 | 4.45 / 4.50 | 4.55 / 4.61 | 4.51 / 4.68
The Judge issues oral rulings that are clear. | 4.46 / 4.57 | 4.48 / 4.51 | 4.51 / 4.57 | 4.48 / 4.65
The Judge demonstrates knowledge of rules of evidence. | 4.38 / 4.56 | 4.43 / 4.49 | 4.55 / 4.61 | 4.46 / 4.67
The Judge is patient. | 4.21 / 4.26 | 4.27 / 4.24 | 4.45 / 4.59 | 4.27 / 4.58
The Judge adequately frames the nature of the proceedings. | 4.44 / 4.55 | 4.47 / 4.50 | 4.53 / 4.59 | 4.49 / 4.66
The Judge maintains decorum in the courtroom. | 4.63 / 4.70 | 4.64 / 4.64 | 4.74 / 4.77 | 4.60 / 4.70
The Judge shows respect to all courtroom participants. | 4.46 / 4.56 | 4.49 / 4.48 | 4.64 / 4.76 | 4.50 / 4.76
The Judge disposes of matters in a timely manner. | 4.45 / 4.53 | 4.49 / 4.53 | 4.34 / 4.38 | 4.33 / 4.37
The Judge appears appropriately prepared for court proceedings. | 4.58 / 4.65 | 4.58 / 4.61 | 4.62 / 4.68 | 4.60 / 4.73
The Judge listens attentively during court proceedings. | 4.53 / 4.59 | 4.53 / 4.52 | 4.65 / 4.71 | 4.57 / 4.76
The Judge issues written rulings and decisions that are clear. | 4.46 / 4.54 | 4.45 / 4.48 | 4.47 / 4.48 | 4.51 / 4.66
The Judge follows the applicable Rules of Procedure. | 4.44 / 4.54 | 4.47 / 4.51 | 4.50 / 4.51 | 4.45 / 4.62
The Judge adequately explains the nature of the proceedings to pro se litigants. | 4.61 / 4.67 | 4.60 / 4.61 | 4.66 / 4.73 | 4.65 / 4.80
The Judge treats all parties fairly. | 4.41 / 4.53 | 4.42 / 4.43 | 4.47 / 4.53 | 4.43 / 4.69
The Judge demonstrates knowledge of substantive law. | 4.41 / 4.55 | 4.44 / 4.50 | 4.58 / 4.62 | 4.53 / 4.73
The Judge effectively manages progress of the case. | 4.47 / 4.58 | 4.50 / 4.53 | 4.46 / 4.50 | 4.41 / 4.49

Table 4. Average Outcome Scores by Court Type (continued)
(For each court type, the first figure is the overall average score and the second is the average score for white male judges.)

Question | Land Court | Probate & Family Court | Superior Court
The Judge considers arguments from all parties before ruling. | 4.73 / 4.77 | 4.51 / 4.58 | 4.57 / 4.64
The Judge recognizes relevant issues. | 4.64 / 4.68 | 4.50 / 4.57 | 4.50 / 4.58
The Judge issues oral rulings that are clear. | 4.53 / 4.60 | 4.43 / 4.48 | 4.51 / 4.58
The Judge demonstrates knowledge of rules of evidence. | 4.59 / 4.65 | 4.50 / 4.56 | 4.55 / 4.62
The Judge is patient. | 4.54 / 4.67 | 4.26 / 4.32 | 4.31 / 4.40
The Judge adequately frames the nature of the proceedings. | 4.61 / 4.69 | 4.46 / 4.52 | 4.52 / 4.59
The Judge maintains decorum in the courtroom. | 4.85 / 4.88 | 4.64 / 4.71 | 4.72 / 4.77
The Judge shows respect to all courtroom participants. | 4.77 / 4.85 | 4.51 / 4.59 | 4.56 / 4.64
The Judge disposes of matters in a timely manner. | 4.07 / 4.17 | 4.32 / 4.33 | 4.49 / 4.58
The Judge appears appropriately prepared for court proceedings. | 4.73 / 4.81 | 4.56 / 4.61 | 4.60 / 4.69
The Judge listens attentively during court proceedings. | 4.74 / 4.79 | 4.54 / 4.60 | 4.62 / 4.68
The Judge issues written rulings and decisions that are clear. | 4.59 / 4.64 | 4.44 / 4.48 | 4.53 / 4.61
The Judge follows the applicable Rules of Procedure. | 4.64 / 4.68 | 4.53 / 4.59 | 4.59 / 4.66
The Judge adequately explains the nature of the proceedings to pro se litigants. | 4.81 / 4.83 | 4.59 / 4.62 | 4.75 / 4.80
The Judge treats all parties fairly. | 4.73 / 4.76 | 4.46 / 4.55 | 4.51 / 4.59
The Judge demonstrates knowledge of substantive law. | 4.68 / 4.72 | 4.54 / 4.60 | 4.50 / 4.60
The Judge effectively manages progress of the case. | 4.44 / 4.59 | 4.40 / 4.45 | 4.52 / 4.61

Bottom line

It is clear that, in general, female and racial/ethnic minority judges tend to receive lower scores than their white male counterparts.

Factor Analysis

What is Factor Analysis?

A factor analysis is a statistical procedure commonly used in the social sciences and market research to assess the structure of a dataset by evaluating the relationships between the variables in that dataset. Factor analysis condenses highly correlated variables into a smaller set of constructs that cannot be measured or observed directly but that may be easier to interpret or understand. By finding the variables that respondents rate in similar ways across all the data, these underlying concepts are revealed. It is these broad underlying concepts (also known as constructs or components) that respondents draw upon when answering survey questions.

In the case of the Massachusetts Judicial Performance Evaluation Survey, the procedure can help to identify what is important when an attorney evaluates a judge's performance. These broad underlying concepts are the main factors an attorney uses in evaluating the performance of judges. An attorney's responses to the individual questions are not independent but rather are informed and guided by these underlying concepts. What are these concepts? Some examples might include the legal ability of the judge or perceptions of his or her administrative skills. The key takeaway is that the individual questions on the survey are not measuring discrete, independent aspects of a judge's performance; they are measuring the underlying concepts by which attorneys evaluate the judge. However, this raises a question: why ask the individual questions at all?
It might seem more valid and useful to ask about the underlying concepts directly once they are identified. Unfortunately, this is rarely possible. Underlying concepts are often broad ideas that are difficult to evaluate as a whole. By asking a series of questions, researchers can zero in on the underlying concept, since that concept influences the answer to each question. Think of the underlying concept as a bullseye on a special target. We cannot see where the bullseye is; we can only measure the hits on the target. By analyzing how the hits are grouped across all targets, we can deduce the location of the bullseye. The answers to the individual questions fall all around the bullseye, but together they average out to hit it. For example, one cannot simply ask an attorney to rate a judge's "legal ability." It is simply too complex a concept. Instead, the survey might ask a series of questions about knowledge of rules, understanding of case precedents, following rules and procedures, the applicability of written rules to the case, and use of applicable legal rules when making a decision. Together, the questions provide the attorney's assessment of a judge's "legal ability."

Understanding these concepts is valuable because it offers greater insight into responses and provides a more accurate way to present data. First, it provides insight into how attorneys are actually evaluating judges. Second, by combining the results from the individual questions into a scale score, the results provide a more accurate assessment of performance, as they are now related back to the concepts attorneys use when making their evaluations. Finally, it provides a way to present results that are easier to understand and interpret. An important caveat is that the analysis can sometimes group together themes that are conceptually different.
This happens when attorneys answer the questions in the same way, giving judges the same ratings across all questions.

Factor Analysis of the Massachusetts Judicial Performance Evaluation Survey

A factor analysis was conducted on the Massachusetts Judicial Performance Evaluation Survey data to identify the primary factors behind the survey questions. The analysis revealed two underlying or latent concepts (components), that is, two primary ways in which attorneys evaluated judges in Massachusetts. The first is the judge's legal ability and courtroom management; the second is the judge's integrity and judicial temperament. Table 5 shows how the procedure grouped the survey questions into these two components.

Table 5: Components within the MA JPE Survey

Component 1: Legal Ability and Courtroom Management
- The Judge considers arguments from all parties before ruling.
- The Judge recognizes relevant issues.
- The Judge issues oral rulings that are clear.
- The Judge demonstrates knowledge of rules of evidence.
- The Judge adequately frames the nature of the proceedings.
- The Judge listens attentively during court proceedings.
- The Judge disposes of matters in a timely manner.
- The Judge appears appropriately prepared for court proceedings.
- The Judge issues written rulings and decisions that are clear.
- The Judge follows the applicable Rules of Procedure.
- The Judge demonstrates knowledge of substantive law.
- The Judge effectively manages progress of the case.

Component 2: Integrity & Judicial Temperament
- The Judge is patient.
- The Judge maintains decorum in the courtroom.
- The Judge shows respect to all courtroom participants.
- The Judge adequately explains the nature of the proceedings to pro se litigants.
- The Judge treats all parties fairly.

These results mean that survey questions measuring the same general construct produce similar scores.
For instance, a judge who receives a score of 5 for the question "The Judge recognizes relevant issues" is also likely to receive a 5 for the questions "The Judge issues oral rulings that are clear," "The Judge demonstrates knowledge of rules of evidence," and so on. Similarly, a judge who receives a score of 3 for the question "The Judge is patient" is likely to receive a similar score for the question "The Judge adequately explains the nature of the proceedings to pro se litigants." In this case, the factor analysis groups two ideas (legal ability and courtroom management) into one factor because all of the questions in this component were answered similarly.

Given the high correlations between survey variables and the clear separation of survey questions into the two general constructs, Market Decisions Research conducted the bias analysis using these two components rather than running it for each of the 17 individual questions. For this analysis we kept component one intact with all of its questions. Although it contains two thematically different ideas, the response patterns were the same across all items; the results would have been the same whether it was run as one component or broken into separate legal ability and courtroom management components.

Benefits of running analyses on components

By running analyses on the components, we can present the patterns in the data more concisely while gaining a better and broader understanding of how judges are being rated. The components also reduce statistical noise. In statistics, noise refers to unexplained variation or randomness found within a data set. For instance, a judge may receive scores of 4s and 5s from a specific attorney throughout the survey but a score of 1 on one particular question. Running advanced analyses at the component level mitigates this unexplained variation in that judge's scores.
How component scores are calculated

Component scores were calculated as the average score across all of the survey questions included in a specific component.

Regression Analysis

A regression analysis is a statistical procedure used to examine the effect of a single variable on an outcome while mathematically controlling for the other variables. Importantly, regression controls for every variable included in the model, which allows us to estimate the effect of a variable of interest without the results being confounded by the effects of the other variables. Regression analysis lets us determine which factors or characteristics matter most, shows how those factors and characteristics interact with one another, and, perhaps most importantly, helps determine how certain factors or characteristics actually influence the outcomes observed in the survey results.

To assess possible gender and race/ethnicity bias in judicial evaluations, the data were analyzed using multiple linear regression. Regressions were run separately for each type of court on each of the two identified component scores (Legal Ability and Courtroom Management, and Integrity and Judicial Temperament). The regressions assess the effect of the judge's gender and race/ethnicity while controlling for the demographics of the attorney giving the review, the number of years the judge has been on the bench, the number of attorney appearances before the judge, and the attorney's number of years in practice, among others. Table 6 shows the variables included in these analyses.
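To make the setup concrete, here is a small sketch of a multiple linear regression with main-effect, control, and interaction terms, fitted by ordinary least squares. It is illustrative only: the data are simulated, the effect sizes are invented, and the actual MDR analysis used the full variable list in Table 6 along with formal significance tests.

```python
import random

random.seed(7)

# Simulated evaluations. Hypothetical coding: 1 = female judge,
# 1 = racial/ethnic minority judge. Effect sizes below are invented.
rows, scores = [], []
for _ in range(500):
    female = random.randint(0, 1)
    minority = random.randint(0, 1)
    years_on_bench = random.uniform(1, 20)       # a control variable
    score = (4.5 - 0.10 * female - 0.15 * minority
             - 0.08 * female * minority          # interaction effect
             + 0.005 * years_on_bench + random.gauss(0, 0.1))
    # Design matrix row: intercept, main effects, interaction, control.
    rows.append([1.0, female, minority, female * minority, years_on_bench])
    scores.append(score)

def solve(a, b):
    """Gauss-Jordan elimination for the normal equations (small systems)."""
    n = len(a)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

# Ordinary least squares via the normal equations: (X'X) beta = X'y.
k = len(rows[0])
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
xty = [sum(r[i] * s for r, s in zip(rows, scores)) for i in range(k)]
coef = solve(xtx, xty)  # intercept, female, minority, female*minority, years
print([round(c, 3) for c in coef])
```

Because the model includes the interaction column, the fitted coefficients recover the separate gender effect, race/ethnicity effect, and the additional penalty that applies only to the cross pairing, which is exactly the structure the report's regressions test for.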
Table 6: Variables used in regression analysis

Variable | Type
Judge Gender | Main Effect Variable
Judge Race/Ethnicity | Main Effect Variable
Judge Gender x Judge Race/Ethnicity | Interaction Effect Variable
Attorney Gender | Control Variable
Attorney Race/Ethnicity | Control Variable
Attorney Gender x Attorney Race/Ethnicity | Interaction Effect Variable (Control)
Attorney Gender x Judge Gender | Interaction Effect Variable
Attorney Race/Ethnicity x Judge Race/Ethnicity | Interaction Effect Variable
Judge's Time on Bench | Control Variable
Attorney Appearances Before Judge | Control Variable
Attorney Hours Before Judge | Control Variable
Attorney Percent of Litigation in Practice | Control Variable
Attorney Years in Practice | Control Variable

As seen above, interactions between judge gender and race/ethnicity, attorney gender and race/ethnicity, judge gender and attorney gender, and judge race/ethnicity and attorney race/ethnicity are included in the regression analyses. These interaction variables examine the effects of cross pairings of two variables. For example, the regression results may show a statistically significant effect of judge gender on the Legal Ability & Courtroom Management score, where female judges are rated differently than male judges (this is called a main effect); however, there may also be an interaction effect between a judge's gender and race/ethnicity, such as a difference in scores between racial/ethnic minority female judges and racial/ethnic minority male judges. For the purposes of this gender and racial/ethnic bias analysis report, only gender and race/ethnicity main effects and interactions are discussed. The regression results for all the variables can be found in the appendix.

Definitions

A main effect is the statistically significant effect of a variable (gender or race/ethnicity) on an outcome variable (judicial score), ignoring all other potential confounding variables.
By statistically significant, we mean an effect that is consistent throughout the survey and is not attributable to random variation in responses. An interaction effect occurs when the relation between (at least) two variables is modified by (at least) one other variable. In other words, the strength or direction of a relation between two variables differs depending on the level of some other variable. For example, when the analysis looks at both the gender and the race/ethnicity of a judge, we are measuring an interaction effect.

How are differences determined?

Differences are not determined by comparing the averages between groups. Rather, the regression looks at the pattern of responses for each respondent to see if it is significantly different from that of other respondents. The averages presented below simply illustrate these differences.

Overall Results (2014-2017)

Table 7 displays average scores on both components by judge gender and race/ethnicity across all survey cycles. Overall, there is a main effect for both judge gender and judge race/ethnicity in the Integrity and Judicial Temperament component, as male judges and white judges tend to have consistently higher ratings than other judges.

Table 7: Overall Average Outcome Scores by Judge Gender and Race/Ethnicity

Outcome | Male Judges | Female Judges | White Judges | Racial/Ethnic Minority Judges
Legal Ability and Courtroom Management | 4.52 | 4.41 | 4.50 | 4.30
Integrity & Judicial Temperament | 4.54 | 4.44 | 4.51 | 4.42

Judges' scores are statistically significantly different on average depending on the judge's gender, and statistically significantly different on average depending on the judge's race/ethnicity. In addition, there are strong interaction effects between judge race/ethnicity and judge gender in both scores. The interaction effect can be seen when looking at scores for white male judges.
Throughout the survey, white male judges tend to receive consistently higher scores than all other demographic groups, including white female judges.

Table 8: Interactions between Judge Gender and Judge Race/Ethnicity

Outcome | White Male Judges | Racial/Ethnic Minority Male Judges | White Female Judges | Racial/Ethnic Minority Female Judges
Legal Ability and Courtroom Management | 4.54 | 4.29 | 4.42 | 4.33
Integrity & Judicial Temperament | 4.55 | 4.39 | 4.44 | 4.47

Moreover, we found evidence of bias among white attorneys. Table 9 illustrates the statistically significant interaction effect between judge race/ethnicity and attorney race/ethnicity: white attorneys rate white judges in general significantly higher than racial and ethnic minority judges (e.g., 4.51 on average for the Legal Ability & Courtroom Management component compared to 4.31, a 0.20-point gap). They also tend to rate white male judges even higher and racial/ethnic minority male judges lower. Table 9 further shows that these wide disparities in scores do not appear in racial/ethnic minority attorneys' evaluations.

Table 9: Average Attorney Ratings by Race/Ethnicity

White Attorneys
Outcome | White Judges (male) | Racial/Ethnic Minority Judges (male)
Legal Ability and Courtroom Management | 4.51 (4.55) | 4.31 (4.29)
Integrity & Judicial Temperament | 4.53 (4.57) | 4.43 (4.40)

Racial/Ethnic Minority Attorneys
Outcome | White Judges (male) | Racial/Ethnic Minority Judges (male)
Legal Ability and Courtroom Management | 4.43 (4.48) | 4.38 (4.41)
Integrity & Judicial Temperament | 4.42 (4.46) | 4.45 (4.44)

The values in parentheses are the average scores received by male judges within that specific group.

Results by Court

Regression analyses were run separately for each court department (BMC, District, Housing, Juvenile, Land, Probate & Family, Superior). It is important to note that the Housing, Land, and Probate & Family courts did not contain enough data for sub-group analyses by race/ethnicity.
Boston Municipal Court

There is a main effect for judge gender in the Boston Municipal Court, with male judges receiving significantly higher scores than female judges on average. In addition, there is a significant interaction effect between attorney race/ethnicity and judge race/ethnicity, with white attorneys rating white judges consistently higher than racial/ethnic minority judges.

District Court

In the District Courts, contrary to what we observed overall, there is a significant effect in favor of female judges in the Integrity & Judicial Temperament score. This effect is mainly driven by white female judges, who tend to receive significantly higher scores than all other demographic groups. There is also an interaction effect between judge gender and judge race/ethnicity in the Legal Ability and Courtroom Management score. Moreover, there is a significant interaction effect between attorney race/ethnicity and judge race/ethnicity in both scores, and a significant interaction effect between attorney gender and judge gender in the Integrity and Judicial Temperament score.

Housing Court

In the Housing Courts, there is an interaction effect between attorney gender and judge gender in the Integrity and Judicial Temperament score, with male attorneys rating male judges consistently higher than female judges (4.67 on average for male judges compared to 4.53 for female judges).

Juvenile Court

There are main effects for both judge gender and judge race/ethnicity in the Juvenile Courts, with male judges receiving significantly higher scores than female judges and racial/ethnic minority judges receiving significantly higher scores than white judges. It is important to note that this latter trend is unique to the Juvenile Court system. In addition, there are strong interaction effects between judge race/ethnicity and judge gender in both scores.
The interaction effect can be seen when looking at scores for white male judges and racial/ethnic minority female judges. Throughout the survey, racial/ethnic minority female judges and white male judges tend to receive consistently higher scores than all other demographic groups, including white female judges.

Land Court

In the Land Court, although raw averages suggest that male judges are more likely to receive higher scores than female judges, our regression models could not establish that these differences are statistically significant. This is likely due to the low number of judges in that court (6 judges).

Probate & Family Court

In the Probate & Family Court, the differences in scores based on gender were not significant after we controlled for judges' time on the bench.

Superior Court

In the Superior Courts, there is a main effect for judge gender in the Integrity and Judicial Temperament score, with male judges receiving significantly higher scores than their female counterparts. In addition, there is a significant interaction effect between judge gender and race/ethnicity in both scores. This interaction effect can be seen in the scores of racial/ethnic minority female judges, who tend to score significantly higher than racial/ethnic minority male judges on average.
Table 10: Regression Significant Results by Court (Legal Ability and Courtroom Management)

Variable | BMC | District | Housing | Juvenile | Land | Probate | Superior
Judge Gender | X | X | ○ | X | ○ | ○ | ○
Judge Race/Ethnicity | ○ | ○ | NA | X | NA | NA | ○
Judge Gender x Judge Race/Ethnicity | ○ | X | NA | X | NA | NA | X
Attorney Gender | ○ | ○ | ○ | ○ | ○ | X | ○
Attorney Race/Ethnicity | ○ | ○ | ○ | ○ | ○ | ○ | ○
Attorney Gender x Attorney Race/Ethnicity | ○ | ○ | ○ | ○ | ○ | ○ | ○
Attorney Gender x Judge Gender | ○ | ○ | ○ | ○ | ○ | ○ | ○
Attorney Race/Ethnicity x Judge Race/Ethnicity | X | X | NA | ○ | NA | NA | ○

Table 11: Regression Significant Results by Court (Integrity & Judicial Temperament)

Variable | BMC | District | Housing | Juvenile | Land | Probate | Superior
Judge Gender | X | X | ○ | X | ○ | ○ | X
Judge Race/Ethnicity | ○ | ○ | NA | X | NA | NA | ○
Judge Gender x Judge Race/Ethnicity | ○ | ○ | NA | X | NA | NA | X
Attorney Gender | X | ○ | ○ | ○ | ○ | X | ○
Attorney Race/Ethnicity | ○ | ○ | ○ | ○ | ○ | ○ | ○
Attorney Gender x Attorney Race/Ethnicity | ○ | ○ | ○ | ○ | ○ | X | ○
Attorney Gender x Judge Gender | ○ | X | X | ○ | ○ | ○ | ○
Attorney Race/Ethnicity x Judge Race/Ethnicity | X | X | NA | ○ | NA | NA | ○

X indicates that the effect is statistically significant using multiple linear regression. ○ indicates that the effect is NOT statistically significant. NA indicates that a significance test is not available due to lack of data.

Content Analysis

Overview

Qualitative data were collected during the Massachusetts Judicial Performance Evaluation, with attorneys providing comments on 335 judges. The objective of the survey was to produce quantitative and qualitative data on attorneys' opinions of the overall performance of Massachusetts' trial court judges. The survey asked all attorneys a basic set of 17 questions to rate a judge's judicial performance.
It then allowed attorneys to provide in-depth and constructive comments to clarify their views on the judge's performance. Attorneys provided over 13,000 open-ended comments in response to the in-depth probing question in the survey; not all attorneys shared comments for this question. The Market Decisions Research (MDR) team conducted a content analysis by meticulously reviewing and coding all open-ended comments for similar and recurrent themes. We suggest using this content analysis as a supporting analysis to the initial bias analysis conducted by MDR.

Content analysis is a research method used to make inferences by interpreting and coding oral or written material, such as open-ended comments in surveys. It allows us to investigate socio-cognitive and perceptual constructs that may be difficult to study via traditional quantitative methods. To analyze the open-ended comments provided by attorneys during the 2014-2017 judicial performance evaluation period, Market Decisions Research created a coding list that enabled us to classify each individual comment into specific categories. The majority of comments were classified into well-defined judicial performance categories such as legal ability, judicial temperament, and impartiality. These categories also align with the underlying constructs used in the survey and with the ABA guidelines. In addition, MDR analysts flagged comments for any explicit or implicit reference to a judge's demographics or physical appearance, among other attributes, with particular attention to negative mentions. The table below shows how attorney comments were grouped together for coding purposes.
Nature of comment: Comment about Judge's Legal Ability
Matters discussed: Knowledge of the law; Knowledge of rules of procedure and evidence; Keeping current with developments in law; Experience level

Nature of comment: Comment about Judge's Integrity and Impartiality
Matters discussed: Dignity and respect; Favor or disfavor toward anyone; Consideration of both sides of argument; Open-minded; Fairness

Nature of comment: Comment about Judge's Communication Skills
Matters discussed: Clear and understandable; Communication with kids; Explanation of procedures and ruling; Hand-writing

Nature of comment: Comment about Judge's Professionalism and Temperament
Matters discussed: Dignified; Treating people with courtesy; Patience and self-control; Physical appearance; Levelheadedness (hotheaded, emotional)

Nature of comment: Comment about Judge's Administrative Capacity
Matters discussed: Punctuality; Maintaining control; Makes rulings in a timely manner; Well prepared for cases; Scheduling

Nature of comment: Attorney Bias
Matters discussed: Judge's physical appearance (looks, eye rolling, posture); Judge is emotional; Questioning judge's intelligence; Implicit bias (stereotypes, attitudes); Explicit bias

Nature of comment: Judge Bias
Matters discussed: Judge is biased - gender; Judge is biased - race/ethnicity; Judge is biased - general

The primary goal of this content analysis was to examine whether the comments showed evidence of bias against certain groups of judges. To do this, Market Decisions Research coders, under the guidance of MDR's implicit bias expert, kept track of words and phrases that are often negatively associated with certain groups of individuals. It is important to note that codes were applied uniformly for every judge, regardless of race/ethnicity or gender. Bias can change individuals' perceptions, attitudes, and behavior, and even how they listen to and interpret what a person or group is saying. Biased comments were separated into several areas, including explicit, implicit, physical appearance, intelligence, and emotional. Explicit bias is when someone is conscious of their bias toward a particular person or group of people.
This includes the use of racial or sexist terms, hate speech, and discrimination. Implicit bias, also known as "unconscious bias," consists of the underlying thoughts and feelings one has about a person, whether positive or negative, that one is not consciously aware of. For example, when we associate stereotypes with someone or hold a particular attitude toward them, this can be projected in comments we make to them or about them to others. Words such as unpredictable, erratic, condescending, hostile, weak, and short-tempered were used to identify where an attorney could have underlying bias.

Physical appearance, intelligence, and emotional categories were used as subcategories of implicit bias. These were identified because they are areas through which bias typically surfaces, such as the objectification of women based on their physical attributes, body language, and expressions. Bias regarding minority races/ethnicities and women can also appear in assumptions about intelligence or emotional reaction.

Once all open-ended responses were coded, they were merged with survey responses based on attorney and judge identification numbers. These indicators were then analyzed by the frequency of the codes according to the race/ethnicity and gender of the judges. MDR also used regression to investigate the relationship between coded comments and judicial ratings.

Content Analysis Results

Table 12 displays the likelihood that groups of judges receive certain types of comments from attorneys. If all things were equal, we would expect those figures to be equal to, or as close as possible to, 1. For instance, a likelihood of 1 based on judge gender simply means that male and female judges are equally likely to be the subject of a specific type of comment.
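The report does not state the exact formula behind these likelihood figures, but a common construction, shown here as a hypothetical sketch with invented counts, is the group's rate of receiving a comment type divided by the overall rate, so that 1.0 means the group is neither over- nor under-represented.

```python
# Hypothetical counts for one comment type, by judge gender.
# A likelihood of 1.0 means the group receives the comment type at the
# same rate as judges overall; values above 1.0 mean "more likely".
comments = {"female": 18, "male": 9}     # comments of this type received
judges = {"female": 120, "male": 215}    # judges per group (invented split)

overall_rate = sum(comments.values()) / sum(judges.values())
likelihood = {g: (comments[g] / judges[g]) / overall_rate for g in comments}
print({g: round(v, 2) for g, v in likelihood.items()})
# -> {'female': 1.86, 'male': 0.52}
```

Under this construction the two group figures move in opposite directions around 1, which matches the paired values seen in Table 12 (e.g., a high female likelihood alongside a low male likelihood for the same comment type).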
Table 12: Likelihood of Judges Receiving Comments

Nature of Attorney Comments | Total Mentions | Female Judges | Male Judges | Racial/Ethnic Minority Judges | White Judges
Negative comment about judge's legal ability | 1160 | 1.19 | 0.88 | 1.33 | 0.96
Negative comment about judge's integrity and impartiality | 985 | 1.04 | 0.97 | 0.89 | 1.01
Negative comment about judge's communication skills | 216 | 1.25 | 0.84 | 0.89 | 1.01
Negative comment about judge's professionalism and temperament | 1996 | 1.03 | 0.98 | 0.93 | 1.01
Negative comment about judge's administrative capacity | 1165 | 1.08 | 0.95 | 0.82 | 1.02
Positive comment about judge's legal ability | 3718 | 0.98 | 1.01 | 0.47 | 1.07
Positive comment about judge's integrity and impartiality | 3910 | 0.92 | 1.05 | 0.68 | 1.04
Positive comment about judge's communication skills | 800 | 1.00 | 1.00 | 1.01 | 1.00
Positive comment about judge's professionalism and temperament | 4613 | 0.96 | 1.02 | 0.79 | 1.03
Positive comment about judge's administrative capacity | 2569 | 0.92 | 1.05 | 0.81 | 1.03
Attorney comments about judge's physical appearance (looks, eye rolling, posture) | 27 | 2.31 | 0.18 | 0.31 | 1.09
Attorney comments questioning judge's intelligence | 152 | 2.12 | 0.30 | 2.20 | 0.84
Attorney comments about judge being emotional | 46 | 1.98 | 0.39 | 1.82 | 0.89
Implicit attorney biased comments (stereotypes, attitudes) | 450 | 1.68 | 0.57 | 1.14 | 0.98
Explicit attorney biased comments | 11 | 1.89 | 0.44 | 1.52 | 0.93
Judge is biased - gender | 90 | 0.99 | 1.01 | 0.60 | 1.05
Judge is biased - race/ethnicity | 22 | 1.06 | 0.96 | 1.52 | 0.93
Judge is biased - general | 426 | 1.17 | 0.89 | 1.04 | 0.99
Positive comments about judge (general) | 878 | 0.92 | 1.05 | 0.87 | 1.02
Negative comments about judge (general) | 75 | 0.89 | 1.07 | 0.81 | 1.03
Other (comments that could not be classified as any of the above) | 515 | 0.87 | 1.08 | 1.12 | 0.98

Overall, the meticulous review of the open-ended comments provided by attorneys to the
Massachusetts Judicial Performance Evaluation survey corroborated previous findings from the regression analysis: there is strong evidence of racial and gender bias against female and racial/ethnic minority judges in the survey results. While MDR coders identified only 11 comments that could be classified as directly, explicitly biased, both female and racial/ethnic minority judges were more likely to be the subject of those comments. The results also show that female judges were 2.3 times more likely to be the subject of derogatory comments involving a judge's physical appearance, look, or posture. Female judges were more likely to receive comments about their physical appearance (looks, eye rolling, posture), to have their intelligence questioned, to be perceived as being too emotional, to be the subject of both explicit and implicit biased comments, and to receive negative comments about their communication skills, legal ability, administrative capacity, integrity and impartiality, and professionalism and temperament. Moreover, female judges were more likely to be called biased.

Racial/ethnic minority judges, for their part, were the most likely to have their intelligence questioned. They were also more likely to be perceived as being too emotional, to be the subject of explicit and implicit biased comments, and to receive negative comments about their legal ability. Moreover, racial/ethnic minority judges were more likely to be called biased in general as well as to be called biased because of their race/ethnicity. It is important to note that a regression analysis showed that the nature of a comment provided by an attorney did have a significant impact on how that attorney rated that judge.

Conclusions and Recommendations

The results of this research clearly suggest some degree of bias in reviews of Massachusetts trial court judges during the 2014-2017 period.
- Overall, white male judges tend to receive higher judicial scores than any other group.
- The bias is noticeably stronger against racial/ethnic minority male judges overall.
- The bias is also noticeably stronger among white attorneys.
- In the Boston Municipal Courts, there is strong evidence of bias against female judges and among white attorneys.
- In the District Courts, there is strong evidence of bias against female judges and racial/ethnic minority male judges.
- In the Housing Courts, there is some evidence of bias among male attorneys.
- In the Juvenile Courts, there is evidence of bias in favor of white male judges and racial/ethnic minority female judges.
- In the Superior Courts, there is evidence of bias against racial/ethnic minority male judges.
- Furthermore, the content analysis showed that, overall, female judges were more likely to receive comments about their physical appearance (looks, eye rolling, posture), to have their intelligence questioned, to be perceived as being too emotional, to be the subject of both explicit and implicit biased comments, and to receive negative comments about their communication skills, legal ability, administrative capacity, integrity and impartiality, and professionalism and temperament. Moreover, female judges were more likely to be called biased.
- Racial/ethnic minority judges were the most likely to have their intelligence questioned. They were also more likely to be perceived as being too emotional, to be the subject of explicit and implicit biased comments, and to receive negative comments about their legal ability. Moreover, racial/ethnic minority judges were more likely to be called biased in general as well as to be called biased because of their race/ethnicity.
The data show some degree of implicit bias in nearly every court in the Massachusetts judicial system, with racial/ethnic minority male judges faring worse on average than all other demographic groups, including racial/ethnic minority female judges.

To address the issue of bias in judicial ratings, Market Decisions Research recommends that the Committee on Judicial Performance Evaluation advocate that the Bar offer customized training to attorneys based on the results presented in this report, as we believe that the first step in combatting unconscious and hidden bias is to become aware of it in the first place. The Committee could also opt to mathematically correct for the observed bias on a court-by-court basis by adjusting the scores of groups for which evidence of bias has been found in judicial performance ratings. This process would require applying a weight adjustment to survey responses. It is, however, important to note that this process is quite complex: current numbers cannot be used to adjust past rounds and vice versa, so each round of evaluation must be adjusted independently. The analysis required for adjustment would entail the Court hiring an outside consultant for each round of evaluation. In addition, each metric must be adjusted independently. Any decision to adjust scores mathematically must be subject to careful consideration by the Committee, as the complexity and cost may not make this a viable choice for the Massachusetts program.

It is not surprising that bias exists in the survey responses and scores. We know that bias is an unconscious internalization of experiences and patterns that influence our thoughts and actions. It is also generally accepted that there are benefits to promoting equality in the workforce and that data-informed solutions can bring positive impact to important institutions like the judiciary.
Leveling the playing field by addressing bias in evaluation scores can promote wellbeing at work, open a productive dialogue for colleagues and constituents, and increase a sense of inclusiveness. A proactive decision to recognize and resolve bias in survey results is one step toward larger accomplishments in policy and practice. Adopting a lens of "equality in all policies," and acting on it, will attract and retain a qualified workforce representing a spectrum of viewpoints and expertise. It will also encourage similar practices in the courtroom and build trust between legal professionals and under-represented communities.

We encourage the Committee to continue to discuss these issues and to consider sustainability plans for data analysis in future evaluations. This will reassure attorneys and judges that the scores are important, consistent, and comparable. The ultimate goal is that, through training, awareness, and changing societal views, these biases may become less prevalent over time. While MDR recognizes that such a recommendation can be controversial, we believe that implicit bias is a societal problem, and overcoming subconscious stereotypes that certain attorneys may harbor about certain groups of judges will require a pragmatic and multifaceted approach.

Appendices

Include "summary question" in future survey administrations

MDR recommends that the Committee include a "summary question" in future administrations of the Massachusetts JPE survey. A summary question, as the name suggests, allows attorneys to provide a global, final assessment of a judge's overall performance. This question would most likely be the last question in the questionnaire and could be a variant of the following:

Based on your entire experience with Judge X and your responses to the previous questions related to the performance evaluation criteria, do you think Judge X meets judicial performance standards?
o Yes, meets performance standards
o No, does not meet performance standards
o No opinion

Such a question is very useful because it would give the Committee a better understanding of how well the survey measures performance; it would also allow the Committee to assess the overall performance of the survey questionnaire itself. For instance, if attorneys provide high ratings on the 17 performance evaluation questions but conclude that the judge does not meet performance standards, this would indicate that there may exist certain aspects of judicial performance that the survey is not adequately measuring or not measuring at all.

Recommendations for reporting judicial performance results to judges

One of the main conclusions from the factor analysis is that the answers to the individual questions are related; that is, answers to the individual questions are not independent of one another. The answers are guided and informed by important concepts that attorneys use in evaluating judicial performance. To provide a measure of how attorneys evaluate judges on these important concepts, we propose that the Judicial Performance Evaluation Program of Massachusetts calculate three scores based on the 17 survey questions and report these scores along with the results from the individual questions. While the factor analysis grouped the 17 questions into two primary factors, we recommend dividing the Legal Ability and Courtroom Management factor in two and reporting separate scores for legal ability and courtroom management, as we believe these represent distinct areas of performance evaluation.[2] The questions included in each of the three scores are listed below.

Legal Ability Score Questions
The Judge recognizes relevant issues.
The Judge issues oral rulings that are clear.
The Judge demonstrates knowledge of rules of evidence.
The Judge issues written rulings and decisions that are clear.
The Judge follows the applicable Rules of Procedure.
The Judge demonstrates knowledge of substantive law.

Courtroom Management Score Questions
The Judge adequately frames the nature of the proceedings.
The Judge disposes of matters in a timely manner.
The Judge appears appropriately prepared for court proceedings.
The Judge effectively manages progress of the case.

[2] While the questions about legal ability and courtroom management were grouped together by the factor analysis, reliability analysis indicates that, as separate scores, both are highly reliable measures.

Integrity & Judicial Temperament Score Questions
The Judge considers arguments from all parties before ruling.
The Judge is patient.
The Judge maintains decorum in the courtroom.
The Judge shows respect to all courtroom participants.
The Judge listens attentively during court proceedings.
The Judge adequately explains the nature of the proceedings to pro se litigants.
The Judge treats all parties fairly.

Calculations of Summary Scores

We recommend calculating a score only when at least one-half of the questions are answered; if fewer than half of the questions are answered, we recommend not calculating the score.

Threshold for calculating each score (minimum number of questions with a valid response required):
Legal Ability: 3
Courtroom Management: 2
Integrity & Judicial Temperament: 4

Each of the three summary scores should be computed for each respondent meeting the threshold for the number of questions. We recommend using the average of all answered questions, normalized back to the original five-point scale. This is calculated by adding all of the individual question scores and then dividing by the number of questions with a valid response.
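The scoring rule above can be written down directly. This sketch follows the recommendation as described (average the valid responses, and return no score when fewer than the threshold number of items are answered); the function and variable names are ours, not part of the program.

```python
# Minimum number of valid (1-5) responses required for each score,
# per the report's recommendation (half the items, rounded up).
THRESHOLDS = {
    "legal_ability": 3,            # 6 items total
    "courtroom_management": 2,     # 4 items total
    "integrity_temperament": 4,    # 7 items total
}

def component_score(responses, threshold):
    """responses: item ratings on a 1-5 scale, with None for unanswered items."""
    valid = [r for r in responses if r is not None and 1 <= r <= 5]
    if len(valid) < threshold:
        return None                # too few answers to compute a score
    return sum(valid) / len(valid)

# Example: an attorney answered five of the six legal ability items.
legal = [5, 4, None, 5, 4, 4]
print(component_score(legal, THRESHOLDS["legal_ability"]))  # 4.4
```

Because the average is taken only over answered items, the result stays on the original five-point scale regardless of how many items were skipped, which is the normalization the report recommends.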
SCORE_x = ( Σ Item_ix(valid) ) / n_x(valid)

where:
SCORE_x is summary score x
Item_ix is the score for each item i included in score x
n_x(valid) is the number of items in score x with a valid response

For example, to calculate the legal ability score given to a judge, the calculation would add together the valid responses (any with a rating from one to five) to the six questions asked about legal ability and divide by the number of questions with a valid response. That is, if an attorney answers all six questions, the score is calculated by adding together the six survey responses and then dividing this total by six. This yields a rating scale from 1 to 5 that corresponds to the question categories; for example, an average score of 4 would correspond to the category "usually." The individual items can also be calculated as averages and presented along with the scores for Legal Ability, Courtroom Management, and Integrity & Judicial Temperament.

Quality Improvement Process

Considering that the purpose of the Judicial Performance Evaluation program is the continuous improvement of judges, MDR believes that these three scores will give a judge a better understanding of which aspects of his or her performance need improvement. A judge can easily compare his or her three scores and see which is low, indicating where improvement may be needed. If, for instance, a judge's rating is much lower on the Courtroom Management score than on the other two scores, the judge may need to improve his or her management skills. Once the area of improvement is identified, the judge can go back to the individual questions included in the Courtroom Management score to determine which specific traits are being rated negatively; finally, he or she can refer to the open-ended comments to understand what attorneys are saying about these traits.
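The score calculation and minimum-response thresholds recommended above can be sketched in code. This is an illustrative sketch only, not part of the JPE program: the question counts and thresholds are taken from the report's recommendations, while the function and variable names are our own.

```python
# Minimum number of valid responses required to calculate each score
# (at least one-half of that score's questions must be answered).
THRESHOLDS = {
    "Legal Ability": 3,                     # 6 questions
    "Courtroom Management": 2,              # 4 questions
    "Integrity & Judicial Temperament": 4,  # 7 questions
}

def summary_score(responses, threshold):
    """Average the valid responses (ratings 1 to 5), normalized back to
    the original five-point scale; return None when too few questions
    were answered to calculate the score."""
    valid = [r for r in responses if r is not None and 1 <= r <= 5]
    if len(valid) < threshold:
        return None
    return sum(valid) / len(valid)

# An attorney who answers five of the six Legal Ability questions:
score = summary_score([5, 4, 4, None, 5, 4], THRESHOLDS["Legal Ability"])
# score = (5 + 4 + 4 + 5 + 4) / 5 = 4.4
```

An average of 4.4 would fall between the categories "usually" (4) and "always" (5), while an attorney answering only two of the six legal ability questions would receive no score for that area.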
Additional Analyses

The Scope of Gender and Racial/Ethnic Bias in the Results

If the Committee decides to modify scores to adjust for the bias in the results, it would be appropriate to apply adjustments to judicial scores (e.g., the Legal Ability score, the Integrity & Judicial Temperament score). We do not recommend adjusting responses to individual survey questions. The Committee should be aware that adjusting scores can be very complex and is beyond the scope of this project.

The tables below show the scope of the mathematical adjustments that could be made, based on the current results, to account for the observed bias in judicial scores. For instance, the results suggest that in the Boston Municipal Court the Legal Ability and Courtroom Management score of female judges would have to be adjusted by 8% to account for the observed main effect in that court in favor of male judges. In most cases the figures presented below constitute only a slight change in a judge's scores. The disparity in scores due to bias should be taken into account when a judge's score is close to the passing score (we suggest between 65% and 70%).

Scope of Bias based on Regression Coefficients
Legal Ability and Courtroom Management Score

Court                    Female Judges   White Male Judges   Racial/Ethnic Minority Judges
BMC                      8%
District Court           -3%             3%
Housing Court            NA              NA                  NA
Juvenile Court           5%              -7%                 13%
Land Court               NA              NA                  NA
Probate & Family Court   NA              NA                  NA
Superior Court                           5%

Scope of Bias based on Regression Coefficients
Integrity & Judicial Temperament Score

Court                    Female Judges   White Male Judges   Racial/Ethnic Minority Judges
BMC                      5%
District Court           -3%
Housing Court            NA              NA                  NA
Juvenile Court           7%              -7%                 20%
Land Court               NA              NA                  NA
Probate & Family Court   NA              NA                  NA
Superior Court           4%              8%

Analysis of Judges who Received Less than 70% in Categories Usually/Always

Currently, results from the JPE survey are reviewed as a component of the overall evaluation of judicial performance.
Prior to 2014, the results were used to help determine whether an improvement plan should be developed for a judge, based on whether 70% or more of the attorney responses for that judge fell within the categories "always" and "usually" on the 17 survey questions. Currently, every judge is required to have a professional development plan. To assess differences between judges based on gender and race/ethnicity, MDR calculated a score for each judge based on this 70% metric and analyzed these scores by judge characteristics. A summary is provided in the table below.

Number of Judges who Received Less than 70% in Categories Usually/Always

Judge Demographic        Judges Below 70%   Total Judges   Percent
Male                     22                 206            11%
Female                   20                 128            16%
Racial/Ethnic Minority   10                 40             25%
White                    32                 294            11%
White Male Judges        16                 185            9%
Overall                  32                 334            10%

The results of this analysis clearly support the findings of our bias and content analysis. We found, based on the former 70% threshold, that 1 in 4 racial/ethnic minority judges and 1 in 6 female judges would have been subject to an improvement plan, compared with only 1 in 10 male judges and a similar proportion of white judges. The lowest rate was observed among white male judges, only 9% of whom received a score below the 70% threshold.

Detailed Regression Results

Overall Regression Results: Component 1 (Legal Ability and Courtroom Management)

Variable                                       Significance   Where are the discrepancies in scores coming from?
Judge Gender                                   ○
Judge Race/Ethnicity                           ○
Judge Gender*Judge Race/Ethnicity              X              White Male Judges
Attorney Gender                                X              Male Attorneys
Attorney Race/Ethnicity                        ○
Attorney Gender*Attorney Race/Ethnicity        ○
Attorney Gender*Judge Gender                   ○
Attorney Race/Ethnicity*Judge Race/Ethnicity   X              White Attorneys vs. White Judges
Judge's Time on Bench                          ○
Attorney Appearances Before Judge              ○
Attorney Hours Before Judge                    ○
Attorney Percent of Litigation in Practice     X              -
Attorney Years in Practice                     X              -

X indicates that the above effects are statistically significant using Multiple Linear Regressions.
○ indicates that the above effects are NOT statistically significant using Multiple Linear Regressions.

Overall Regression Results: Component 2 (Integrity & Judicial Temperament)

Variable                                       Significance   Where are the discrepancies in scores coming from?
Judge Gender                                   X              Male Judges
Judge Race/Ethnicity                           X              White Judges
Judge Gender*Judge Race/Ethnicity              X              White Male Judges
Attorney Gender                                X              Male Attorneys
Attorney Race/Ethnicity                        ○
Attorney Gender*Attorney Race/Ethnicity        ○
Attorney Gender*Judge Gender                   ○
Attorney Race/Ethnicity*Judge Race/Ethnicity   X              White Attorneys vs. White Judges
Judge's Time on Bench                          X              -
Attorney Appearances Before Judge              X              -
Attorney Hours Before Judge                    ○
Attorney Percent of Litigation in Practice     X              -
Attorney Years in Practice                     X              -

X indicates that the above effects are statistically significant using Multiple Linear Regressions.
○ indicates that the above effects are NOT statistically significant using Multiple Linear Regressions.

Regression Significant Results by Court: Component 1 (Legal Ability and Courtroom Management)

Variable                                       BMC   District   Housing   Juvenile   Land   Probate   Superior
Judge Gender                                   X     X          ○         X          ○      ○         ○
Judge Race/Ethnicity                           ○     ○          NA        X          NA     NA        ○
Judge Gender*Judge Race/Ethnicity              ○     X          NA        X          NA     NA        X
Attorney Gender                                ○     ○          ○         ○          ○      X         ○
Attorney Race/Ethnicity                        ○     ○          ○         ○          ○      ○         ○
Attorney Gender*Attorney Race/Ethnicity        ○     ○          ○         ○          ○      ○         ○
Attorney Gender*Judge Gender                   ○     ○          ○         ○          ○      ○         ○
Attorney Race/Ethnicity*Judge Race/Ethnicity   X     X          NA        ○          NA     NA        ○
Judge's Time on Bench                          ○     X          ○         ○          ○      X         X
Attorney Appearances Before Judge              ○     ○          ○         ○          X      ○         ○
Attorney Hours Before Judge                    ○     ○          ○         ○          ○      ○         ○
Attorney Percent of Litigation in Practice     ○     ○          X         X          X      ○         ○
Attorney Years in Practice                     X     X          X         ○          ○      X         ○

X indicates that the above effects are statistically significant using Multiple Linear Regressions.
○ indicates that the above effects are NOT statistically significant using Multiple Linear Regressions.
NA indicates that significance test is not available due to lack of data.

Regression Significant Results by Court: Component 2 (Integrity & Judicial Temperament)

Variable                                       BMC   District   Housing   Juvenile   Land   Probate   Superior
Judge Gender                                   X     X          ○         X          ○      ○         X
Judge Race/Ethnicity                           ○     ○          NA        X          NA     NA        ○
Judge Gender*Judge Race/Ethnicity              ○     ○          NA        X          NA     NA        X
Attorney Gender                                X     ○          ○         ○          ○      X         ○
Attorney Race/Ethnicity                        ○     ○          ○         ○          ○      ○         ○
Attorney Gender*Attorney Race/Ethnicity        ○     ○          ○         ○          ○      X         ○
Attorney Gender*Judge Gender                   ○     X          X         ○          ○      ○         ○
Attorney Race/Ethnicity*Judge Race/Ethnicity   X     X          NA        ○          NA     NA        ○
Judge's Time on Bench                          X     X          ○         X          ○      X         ○
Attorney Appearances Before Judge              ○     ○          ○         ○          ○      ○         ○
Attorney Hours Before Judge                    X     ○          ○         ○          ○      ○         ○
Attorney Percent of Litigation in Practice     ○     X          X         ○          ○      ○         X
Attorney Years in Practice                     X     X          X         X          ○      X         ○

X indicates that the above effects are statistically significant using Multiple Linear Regressions.
○ indicates that the above effects are NOT statistically significant using Multiple Linear Regressions.
NA indicates that significance test is not available due to lack of data.

Detailed Content Analysis Results

Frequency table of attorney comments by judge gender.
Nature of Attorney Comments                                                         Total Mentions   Female Judges   Male Judges   Diff
Negative comment about judge's legal ability                                        1160             530 (46%)       630 (54%)     9%
Negative comment about judge's integrity and impartiality                           985              392 (40%)       593 (60%)     20%
Negative comment about judge's communication skills                                 216              104 (48%)       112 (52%)     4%
Negative comment about judge's professionalism and temperament                      1996             802 (40%)       1194 (60%)    20%
Negative comment about judge's administrative capacity                              1165             486 (42%)       679 (58%)     17%
Positive comment about judge's legal ability                                        3718             1378 (37%)      2340 (63%)    26%
Positive comment about judge's integrity and impartiality                           3910             1378 (35%)      2532 (65%)    30%
Positive comment about judge's communication skills                                 800              305 (38%)       495 (62%)     24%
Positive comment about judge's professionalism and temperament                      4613             1638 (36%)      2975 (64%)    29%
Positive comment about judge's administrative capacity                              2569             909 (35%)       1660 (65%)    29%
Attorney comments about judge's physical appearance (looks, eye rolling, posture)   27               24 (89%)        3 (11%)       -78%
Attorney comments questioning judge's intelligence                                  152              124 (82%)       28 (18%)      -63%
Attorney comments about judge being emotional                                       46               35 (76%)        11 (24%)      -52%
Implicit attorney biased comments (stereotypes, attitudes)                          450              291 (65%)       159 (35%)     -29%
Explicit attorney biased comments                                                   11               8 (73%)         3 (27%)       -45%
Judge is biased - gender                                                            90               34 (38%)        56 (62%)      24%
Judge is biased - race/ethnicity                                                    22               9 (41%)         13 (59%)      18%
Judge is biased - general                                                           426              192 (45%)       234 (55%)     10%
Positive comments about judge (general)                                             878              302 (34%)       576 (66%)     31%
Negative comments about judge (general)                                             75               25 (33%)        50 (67%)      33%
Other                                                                               515              199 (39%)       316 (61%)     23%

Frequency table of attorney comments by judge race/ethnicity.
Nature of Attorney Comments                                                         Total Mentions   Racial/Ethnic Minority Judges   White Judges   Diff
Negative comment about judge's legal ability                                        1160             184 (16%)                       976 (84%)      68%
Negative comment about judge's integrity and impartiality                           985              109 (11%)                       876 (89%)      78%
Negative comment about judge's communication skills                                 216              23 (11%)                        193 (89%)      79%
Negative comment about judge's professionalism and temperament                      1996             212 (11%)                       1784 (89%)     79%
Negative comment about judge's administrative capacity                              1165             114 (10%)                       1051 (90%)     80%
Positive comment about judge's legal ability                                        3718             350 (9%)                        3368 (91%)     81%
Positive comment about judge's integrity and impartiality                           3910             406 (10%)                       3504 (90%)     79%
Positive comment about judge's communication skills                                 800              57 (7%)                         743 (93%)      86%
Positive comment about judge's professionalism and temperament                      4613             448 (10%)                       4165 (90%)     81%
Positive comment about judge's administrative capacity                              2569             209 (8%)                        2360 (92%)     84%
Attorney comments about judge's physical appearance (looks, eye rolling, posture)   27               1 (4%)                          26 (96%)       93%
Attorney comments questioning judge's intelligence                                  152              40 (26%)                        112 (74%)      47%
Attorney comments about judge being emotional                                       46               10 (22%)                        36 (78%)       57%
Implicit attorney biased comments (stereotypes, attitudes)                          450              61 (14%)                        389 (86%)      73%
Explicit attorney biased comments                                                   11               2 (18%)                         9 (82%)        64%
Judge is biased - gender                                                            90               5 (6%)                          85 (94%)       89%
Judge is biased - race/ethnicity                                                    22               4 (18%)                         18 (82%)       64%
Judge is biased - general                                                           426              53 (12%)                        373 (88%)      75%
Positive comments about judge (general)                                             878              85 (10%)                        793 (90%)      81%
Negative comments about judge (general)                                             75               10 (13%)                        65 (87%)       73%
Other                                                                               515              62 (12%)                        453 (88%)      76%
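The Count, %, and Diff columns in the frequency tables above follow a simple calculation: each group's share of total mentions, and the percentage-point gap between the two groups. A minimal sketch (the function name is our own):

```python
def comment_shares(count_a, count_b):
    """Given the mention counts for two groups of judges, return each
    group's share of total mentions (rounded to whole percent) and the
    percentage-point difference (group B share minus group A share)."""
    total = count_a + count_b
    pct_a = 100 * count_a / total
    pct_b = 100 * count_b / total
    return round(pct_a), round(pct_b), round(pct_b - pct_a)

# Negative comments about legal ability: 530 mentions for female judges
# versus 630 for male judges.
comment_shares(530, 630)   # -> (46, 54, 9)
```

Note that the Diff column is computed from the unrounded shares before rounding, which is why a row can show shares of 46% and 54% but a difference of 9 points rather than 8.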