Office of Inspector General
Chicago Board of Education
Nicholas Schuler, Inspector General
567 West Lake Street, Suite 1120, Chicago, Illinois 60661-1405
Phone (773) 534-9400 / Fax (773) 534-9401 / inspectorgeneral@cpsoig.org

EXECUTIVE MEMO

To: The Chicago Board of Education:
    Miguel del Valle, President
    Sendhil Revuluri, Vice President
    Luisiana Melendez
    Amy Rome
    Lucino Sotelo
    Elizabeth Todd-Breland
    Dwayne Truss
    Dr. Janice Jackson, Chief Executive Officer
From: Nick Schuler, Inspector General
Date: September 26, 2019 (Revised October 18, 2019)
Re: NWEA Test Administration

A performance review by the CPS Office of Inspector General has found a concerning level of unusually long test durations, high pause counts and other irregularities during CPS's Spring 2018 administration of a high-stakes exam produced by the Northwest Evaluation Association. As a result, the OIG is recommending that CPS overhaul its NWEA test administration procedures.

A series of data analyses by the OIG's Performance Analysis Unit uncovered unusual patterns in these untimed, computer-based tests. These analyses, combined with interviews of students and teachers at some schools with unusual results, indicated that proper test administration procedures were repeatedly violated during the Spring 2018 NWEAs. This occurred in a minority of cases, but often enough to be worrisome and to warrant action.

Students and teachers described a variety of improper practices, including gaming and cheating techniques, that could have added to test durations. And test durations emerged as an issue of major concern in OIG analyses.

During the Spring 2018 testing season, nearly 83,000 CPS NWEA tests — or more than one out of every four — took at least double the national average duration to complete, according to an OIG analysis. More than 24,000 third- through eighth-grade Math and Reading NWEAs took CPS students at least three times as long as their peers nationally to finish.
Nearly 7,500 tests took four or more times the national norm to complete, as shown below.

Spring 2018 CPS Test Durations vs. National Norm

                                        # of Tests   % of Tests   Students*   Schools**
All CPS 3rd-8th grade tests             320,561      100%         160,906     498
At least 2 times the national norm      82,824       25.8%        55,630      495
At least 3 times the national norm      24,269       7.6%         17,853      482
At least 4 times the national norm      7,448        2.3%         5,832       401
At least 5 times the national norm      2,388        0.7%         1,966       258

*Reflects number of students with the indicated duration ratio on at least one test.
**Reflects number of schools with at least one test with the indicated duration.
Source: OIG Analysis of CPS Data

As a result, in some CPS schools and grades, a maximum 53-question test that the average student nationally completed in roughly an hour turned into a multi-day and even a week-long event.

A degree of longer durations could be expected in those CPS NWEAs that carry student stakes involving promotion or admission to selective-enrollment high schools. However, Spring 2018 Reading and Math tests in all grades — including those with no student stakes — averaged longer durations than the national norms, the OIG found. In addition, CPS non-Diverse Learners were more likely to take long tests than Diverse Learners, an OIG analysis showed.

Notably, tests with long durations often were concentrated in certain schools. Of the more than 24,000 tests with durations that were at least three times the national norm, more than 4,700 (or almost 20 percent) were clustered in just 14 schools out of roughly 500 (or almost 3 percent of all schools). At each of these 14 schools, more than a third of all tests took three times the national norm to complete. This is especially worrisome because excessive durations can make it difficult to accurately compare CPS results to national norms, NWEA has warned.
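The duration-ratio tally summarized above can be illustrated with a short sketch. This is not the OIG's actual code; the records and field layout below are hypothetical stand-ins, and the real analysis ran over roughly 320,000 test records.

```python
# Illustrative sketch only -- the data and layout are hypothetical;
# the memo does not specify the OIG's actual analysis tooling.

# Each record: (test duration in minutes, national norm duration for that grade/subject)
tests = [
    (55, 60),    # faster than the norm
    (130, 60),   # just over 2x the norm
    (200, 60),   # over 3x
    (250, 60),   # over 4x
    (310, 60),   # over 5x
]

def count_at_least(tests, multiple):
    """Count tests whose duration is at least `multiple` times the national norm."""
    return sum(1 for duration, norm in tests if duration >= multiple * norm)

for m in (2, 3, 4, 5):
    n = count_at_least(tests, m)
    print(f"At least {m}x the national norm: {n} of {len(tests)} tests")
```

Because a test counted in the "at least 2 times" row is also counted in every lower row it qualifies for, the rows in the table overlap rather than sum to the total, as in the table above.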
According to NWEA:

    The data from CPS's MAP tests is compared to the norms for growth and performance from NWEA's nationally representative sample. For the inferences from that comparison to be accurate, CPS testing conditions should be reasonably reflective of testing conditions of other schools that administer MAP throughout the United States. . . . Test durations that vary excessively from the norms, or test durations that differ significantly between terms, may pose a risk to the accuracy of inferences made from the results of NWEA assessments.

If test durations exceed the norms by an excessive degree, NWEA believes it may be reasonable to take steps to bring testing durations closer to norms.

Importantly, CPS's duration averages1 are getting worse over time. Although the NWEA is untimed, NWEA said it releases its Average MAP Growth Test Durations to help schools understand typical durations. However, CPS average durations were longer than the national norm in every grade tested in Reading and Math in Spring 2016. In each of the next two Spring administrations, every average duration increased, to the point that, between 2016 and 2018, the average duration of CPS Reading and Math tests in every grade, third through eighth, saw double-digit percent increases. Thus, if no action is taken, CPS durations could continue to move farther and farther from national norms, putting CPS results at increasing risk.

In addition, more than 12,000 CPS tests were paused at least five times during the Spring 2018 NWEAs, as indicated below.
CPS Spring 2018 NWEA Tests by Times Paused or Timed Out

            Total     0         1-4       5-9      10-14    15-19    20+
Tests       302,993   145,424   145,388   10,524   1,149    290      218
Students*   152,128   91,221    94,309    8,904    1,053    268      180
Schools*    463       459       462       401      165      60       24

*Reflects students and schools with at least one Reading or Math test in the indicated pause range.
Note: The OIG did not receive pause data for some tests. Those tests are excluded from this analysis.
Source: OIG Analysis of NWEA data of CPS Spring 2018 Grade 3-8 Tests

1 The data CPS receives from NWEA includes the duration of each test taken, so CPS already has the ability to monitor duration data without any extra information from NWEA.

When asked which CPS durations were likely problematic, NWEA responded that this was an "empirical question" and "NWEA has not published any research on this issue."

In fact, NWEA data indicated some CPS tests were paused 20, 30, 50 and even 60 times (see Appendix F). The heaviest pause rates occurred in seventh and eighth grades, where at least seven percent of all tests in each of those grades were paused at least five times. As with long durations, tests with at least five pauses tended to be clustered in certain schools.

Pausing a test replaces a pending question with a new question of similar difficulty in the same area, according to NWEA. However, if questions are paused because students don't know the answer, students "have a 50 percent chance of getting a new question that might be more favorable," NWEA told the OIG. Consecutive pauses of the same question number increase those odds "considerably," NWEA said. Yet OIG interviews indicated that when some students were stumped by questions, some proctors paused and then resumed student tests in order to produce new questions. The OIG considers this an attempt to game the test.
NWEA told the OIG that this tactic is improper. In addition, in its latest Guidance for Administering MAP Growth Assessments, published in April 2019, NWEA specifically labeled a "high number of pauses" as "outside the bounds of what is expected." NWEA conceded to the OIG that "this language did not appear in prior iterations of the Guidelines." Appropriate uses for pauses would be a bathroom, water or wiggle break, NWEA says.

Other CPS students were intentionally letting hard questions "time out" after 25 minutes of inactivity, OIG interviews with students and teachers indicated. This paused the tests and, once resumed by a proctor, produced new questions. For example, one student said kids in her seventh-grade Math classroom were so worried about how their NWEAs would impact their high school admission chances that they didn't guess at questions that stumped them. Instead, they just sat and waited for the questions to time out. This student's Math test was paused or timed out 18 times over her four days of testing, custom-ordered NWEA data showed.

Intentional time-outs are an improper testing practice, according to NWEA. "In general, there is no reason for a student or proctor to allow a question to 'time out,'" NWEA told the OIG. NWEA added at another point: "The assessment's validity may be compromised if a student intentionally allows a question to time out because he/she does not know the answer and would like a new question."

NWEA also added that "When testing conditions are manipulated to inflate artificially student scores, the RIT3 and percentile scores are higher than they would be under normal conditions." And, NWEA cautioned, "An inflated score is also less useful as a diagnostic score."

Excessive durations and pauses can occur for benign reasons. However, in untimed tests like the NWEAs that carry stakes for students, teachers, principals and schools, they also can be indicative of attempts to game the test to win higher scores or gains.
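NWEA's point about repeated pauses can be made concrete with a bit of arithmetic. Under the simplifying assumption that each replacement question is independently "more favorable" with probability 0.5 (the figure NWEA cited), the chance of drawing at least one more favorable question grows quickly with the number of consecutive pauses. This is a simplified model, not NWEA's own analysis:

```python
# Simplified model, not NWEA's analysis: assume each pause replaces the pending
# question with a new one that is "more favorable" with probability 0.5,
# independently of earlier replacements.

def chance_of_favorable_question(pauses: int, p: float = 0.5) -> float:
    """Probability of seeing at least one more favorable question in `pauses` tries."""
    return 1 - (1 - p) ** pauses

for k in (1, 2, 3, 5):
    print(f"{k} pause(s): {chance_of_favorable_question(k):.1%}")
```

Under these assumptions, one pause gives a 50 percent chance of a more favorable question, three consecutive pauses raise that to 87.5 percent, and five to roughly 97 percent, which is consistent with NWEA's statement that consecutive pauses increase the odds "considerably."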
In fact, the OIG found both high durations and high pause counts were associated with a greater likelihood of achieving unusually high 2018 gains.4 The association between unusually high growth and test durations is indicated in the chart below.5

[Chart: likelihood of unusually high growth, by test duration]

For example, the above chart shows that the longer students took on their Reading and Math tests, the more likely they were to produce the kind of unusually high gains seen among only 1.9 percent of all Spring 2018 CPS tests. A similar pattern was observed as pause counts increased. And both these general trends were true among Diverse Learners as well as non-Diverse Learners, OIG analyses indicated.

OIG interviews with 20 students in schools with long durations, high pauses or unusually high gains, as well as with 10 teachers from a cross section of schools, indicated that proctors and/or students were breaking proper test procedures in several ways that added to the duration of Spring 2018 NWEA exams. These tactics, as described, included everything from attempts to game the test to coaching and outright cheating, although not all were specifically prohibited in CPS training materials.

3 NWEA scores its tests on its RIT, or Rasch Unit, scale.
4 The OIG compared each student's Reading and Math growth from Spring 2017 to Spring 2018 to the average growth of CPS students in the same grade with the same Spring 2017 score on the same subject test. Students whose gains were two or more standard deviations above the average growth (1.9 percent of all CPS tests) were considered to have unusually high growth.
5 The OIG compared each student's Reading and Math growth from Spring 2017 to Spring 2018 to the average growth of CPS students in the same grade with the same Spring 2017 score on the same subject test. Only students with normal grade progressions were counted. Students whose gains were two or more standard deviations above the average growth (1.9 percent of all CPS tests) were considered to have unusually high growth.
Such tactics included:

o Students who sat on a question for 25 minutes, without answering it, so that the question would automatically pause, or "time out," and be replaced with a new question once a proctor resumed the test.

o Proctors who allowed students to skip hard questions by giving them a certain number of pauses per test day or per test question.

o Teachers who asked students to write down both the questions and all of the answers for any difficult questions. This information was collected at the end of the test day and sometimes used in later lessons.

o Classic forms of coaching or cheating, such as proctors who told students to re-read questions, nodded or shook their heads at student answers, rephrased questions, or gave students math formulas.

STAKES

Spring NWEA results carry stakes for many CPS parties. They impact, to varying degrees: student promotions in third, sixth and eighth grades; for seventh graders, admission to selective-enrollment high schools and selective programs; the evaluations of Math and Reading teachers; principal evaluations; Independent School Principal status; and school SQRP levels. NWEA results also help drive curricular decisions that can involve CPS resources at the school and district level. Thus, NWEA is so integral to so many aspects of CPS that the accuracy of its results is paramount.

However, the OIG found that CPS NWEA tests are being administered with insufficient security protocols for a test with so many different stakes. Among other things, this includes allowing Reading and Math teachers whose evaluations depend in part on NWEA results to be the sole proctor of their students' tests.
In its guidance for administering high-stakes NWEAs, even NWEA recommends as a best practice that a student's teacher be assisted by a second proctor with "no direct investment in the performance of those students being tested."

CAPPING NWEA DURATIONS

Some test experts told the OIG they would not recommend using an untimed test as a high-stakes exam. Untimed tests with multiple breaks, or exams that stretch over multiple days, are vulnerable to testing irregularities during down times, including coaching students on questions "harvested" by proctors as they walk around testing rooms or as they question students afterward, one test security expert said. Another expert noted that because the NWEA carries stakes for teachers, teachers who proctor their own students have no incentive to encourage kids to finish the test in a reasonable time period.

The NWEA also is an adaptive test, meaning it adapts to each student's ability level by following correct answers with harder questions and incorrect answers with easier questions. It is constructed so that a student is expected to get half the questions at his or her achievement level right and half of them wrong.

Based on student and teacher accounts, some general-education students, faced with high-stakes, adaptive, untimed tests, may be spending an excessively long time struggling with and stressing over questions they are not even expected to get correct. This can be unhealthy. (For example, one teacher said one high-scoring eighth grader went through "mental torture" while taking more than a week to complete her NWEA.) It also is a questionable use of instructional time.

Thus, the OIG's recommendations include moving toward a reduction in NWEA durations, preferably by setting a time limit for non-Diverse Learners.
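The adaptive mechanism described above can be sketched as a simple loop. This toy model is not NWEA's actual item-selection algorithm (which is built on the RIT scale); it only illustrates the harder-after-correct, easier-after-incorrect behavior the memo describes:

```python
# Toy model of an adaptive test, not NWEA's actual algorithm: difficulty rises
# after a correct answer and falls after an incorrect one, so for a student
# answering at their ability level (about half right, half wrong), difficulty
# hovers near that level rather than climbing or falling steadily.

def run_adaptive_test(responses, start_difficulty=50, step=5):
    """Track question difficulty as it adapts to a sequence of right/wrong answers."""
    difficulty = start_difficulty
    history = []
    for correct in responses:
        history.append(difficulty)          # difficulty of the question just asked
        difficulty += step if correct else -step
    return history, difficulty

# A student at the starting level, alternating right and wrong answers:
history, final = run_adaptive_test([True, False, True, True, False, False])
print(history, final)
```

The practical point for CPS is visible even in this sketch: the test keeps serving questions near the edge of a student's ability, so a student who agonizes over every hard question is agonizing over items the test expects them to miss about half the time.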
PRIOR AUDIT

An April 2018 audit by the CPS Office of Internal Audit and Compliance found that CPS's NWEA "detective" controls for detecting test irregularities, as well as its "preventive" controls for preventing test irregularities, needed strengthening. Audit recommended a series of reforms, but the OIG found that some key suggestions were never executed, were partially executed or were poorly executed.

For example, as part of its "detective" overhaul, Audit suggested that CPS create a formal NWEA Assessment Monitoring Process that would use at least seven NWEA data points, as well as other criteria, to flag subject-grade combinations within schools for unusual test results. However, ultimately only three NWEA data points were used, and two of the three (changes in RIT scores and changes in percentile scores) are so closely aligned that, preferably, only RIT scores should be used, according to one test security expert.

The Audit report also recommended that one criterion for flagging unusual test results should be the number of pauses per test because "a high number of pauses could indicate that the test administrator is encouraging students to pause the exam in order to skip difficult questions." However, this criterion was not included in CPS's NWEA Assessment Monitoring Process.

In addition, some schools with Reading or Math tests, by grade, that had multiple Assessment Monitoring flags based on 2018 results were never visited during audits of 2019 testing procedures, according to Audit documents provided by the Department of Student Assessment.6 In other cases, schools with flags were visited — but not the subjects and grades in those schools that drew the flags, documents showed. Audit documents also indicated that the protocol used to audit classrooms did not assess whether general education students were being tested in small groups, which is generally not allowed.
This omission is concerning because some general education students described to the OIG being tested in small groups, rather than with their Reading or Math class, by proctors who used improper testing procedures.

As a "preventive" control, Audit recommended that CPS develop mandatory NWEA training. This was done, but the PowerPoint used in the proctor training did not adequately cover clear examples of unacceptable behavior, as at least one Test Security Guidebook7 recommends. The five-question post-training "exit slip" that had to be passed (by answering four of five questions correctly) to proctor NWEAs did not sufficiently cover testing irregularities, even though testing irregularities are what Audit said CPS needed to prevent. Instead, questions included "When is the last day of testing for benchmark grades?" and "How many units are in the NWEA test?"

The PowerPoint advised employees to anonymously report "cheating or fraud allegations" via an Anonymous Fraud Reporting Form that told filers that the name associated with their Google accounts would be recorded once they submitted their forms. In truth, names were not recorded, but some complainants could have been discouraged from filing complaints based on the misinformation on the form, one CPS official said. The OIG also should be listed in training documents as an office that can be contacted by phone or email, anonymously or not, with test irregularity concerns.

6 The Office of Internal Audit and Compliance audited many classrooms with unusual test results in 2018 and 2019, but is not expected to do so in 2020, one Student Assessment official said. The OIG hopes these important annual audits will continue, but with improved protocols.
7 See Test Security Guidebook: Preventing, Detecting and Investigating Security Irregularities, 2013, produced by the Council of Chief State School Officers Technical Issues in Large-Scale Assessment (TILSA) State Collaborative.
Audit also recommended that the penalties for breaking test administration rules, which can include termination, be listed in the Test Security Agreement that all proctors must sign. This was never done. The OIG agrees it should be done.

Finally, Audit suggested that CPS determine the feasibility of rotating proctors so teachers are not proctoring their own students. Nevertheless, many CPS Reading and Math teachers continue to be the sole proctor of their own students — something the OIG and even NWEA oppose when teachers have a stake in students' results.

OIG interviews with students and teachers indicated CPS teachers often proctor their own students, but sometimes they don't. Among schools with excessive durations, unusual numbers of pauses or high test gains, the OIG heard of several situations in which coaches, school deans or classroom assistants proctored general-education students — sometimes in small groups, which is not allowed under existing CPS rules.

Therefore, the OIG strongly recommends that CPS find an auditable way of documenting who proctors each test. The OIG found numerous testing irregularities during the 2018 NWEA tests. It only makes sense that the proctor in the room during any irregularities be recorded. This would deter test gaming and cheating and also aid the investigation of any testing irregularities.

Also, audits should be based on the proctor who was in the room when testing irregularities occurred. Instead, CPS generally has been identifying unusual results from one test year, but not visiting classrooms in the subject and grade where the unusual results occurred until the following test year. As a result, CPS auditors usually are, in effect, flying blind because they have no idea if the proctors they are auditing were in the test rooms the previous year when unusual results occurred.
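The kind of flagging that drives these audits can be made concrete with a small sketch of the rule the OIG describes in its own methodology: compare each student's gain to the average gain of students in the same grade with the same prior score on the same subject, and flag gains two or more standard deviations above that average. The data below is hypothetical, the single cohort is artificially small, and grade-progression checks and real field names are omitted:

```python
# Illustrative sketch of a 2-standard-deviation flagging rule; the data and
# field layout are hypothetical, not the OIG's actual records or code.
from collections import defaultdict
from statistics import mean, pstdev

# Each record: (grade, subject, Spring 2017 score, Spring 2017 -> 2018 gain)
records = [
    (5, "Math", 210, 8), (5, "Math", 210, 9), (5, "Math", 210, 10),
    (5, "Math", 210, 11), (5, "Math", 210, 9), (5, "Math", 210, 10),
    (5, "Math", 210, 8), (5, "Math", 210, 11), (5, "Math", 210, 10),
    (5, "Math", 210, 40),   # an unusually large gain
]

# Group gains into cohorts: same grade, same subject, same prior score.
cohorts = defaultdict(list)
for grade, subject, prior, gain in records:
    cohorts[(grade, subject, prior)].append(gain)

# Flag gains at least two standard deviations above their cohort's mean gain.
flagged = []
for grade, subject, prior, gain in records:
    gains = cohorts[(grade, subject, prior)]
    mu, sigma = mean(gains), pstdev(gains)
    if sigma > 0 and gain >= mu + 2 * sigma:
        flagged.append((grade, subject, prior, gain))

print(flagged)
```

Under a rule like this, only the outlying 40-point gain is flagged; the memo's approach then directs audit attention to the classrooms, and ideally the proctors, behind such flags.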
The OIG credits the Departments of Student Assessment and School Quality Measurement and Research for proactively requesting an audit of CPS testing procedures, but finds that the reforms that emanated from it did not go far enough. The OIG recommends that CPS hire a test security expert to examine its NWEA testing procedures and help it implement improvements such as those listed below. This is a highly technical area; it warrants the help of an independent, highly experienced expert. The OIG should play a role in hiring the test security expert, preferably in time to institute reforms for the Spring 2020 testing season.

RECOMMENDATIONS

Based on its performance review, the OIG recommends that CPS:

1. Reduce its duration averages, preferably by setting a reasonable time limit for general education students. CPS should analyze its durations annually to monitor this situation.

2. Take concrete steps to limit pauses and develop clear instructions to proctors and test administrators about the right and wrong ways to use pauses.

3. Find an auditable way to record the proctors of each test, preferably in a test data field.

4. Focus audits on the proctors who were in the classrooms that previously produced unusual test results, rather than on the grades and subjects in a school that may be using different proctors by the time audits are conducted.

5. Prohibit Reading and Math teachers whose evaluations are tied in part to their students' NWEA results from being the sole proctors of their students. One solution is adding a second proctor who has no stake in the test. One proctor should be responsible for the integrity of the test session, preferably the proctor without any stakes in the test.

6. Bolster NWEA administration training and the "exit-slip" mini-test that must be passed to proctor a test. Include advice on how to guard against improper pauses and unusually long durations, among other things.
During training, cite the OIG as an office that should be called, anonymously or not, with test irregularity concerns.

7. Insert penalties for test cheating in the Test Security Agreement that all proctors must sign, as recommended by Audit in April 2018.

8. Hire a test security expert for help and guidance in overhauling NWEA test administration and security procedures and in addressing the concerns outlined in this report, preferably in time for the Spring 2020 tests.

The current NWEA contract, worth up to $2.2 million, ends June 30, 2020. CPS has one remaining option to renew. If CPS cannot adequately address its duration issue and obtain an agreement for NWEA to provide additional needed data, CPS may want to consider a different test as a high-stakes assessment. If so, the test security expert can help CPS write the Request for Proposal for a new test vendor while, if possible, helping CPS and NWEA improve the Spring 2020 test administration. The OIG should be involved in the hiring of the test security expert and be kept apprised of any NWEA contracting changes or RFPs for a new test vendor resulting from this report.

A Summary Report that discusses the evidence and contains the OIG's full findings and recommendations accompanies this memo. We would appreciate any comments, as well as any corrective actions the Board or CPS administration is planning, within 45 days, or by Monday, December 2, so we can fold them into any public report we release on NWEA test administration. If you have any questions, please feel free to contact me at (773) 534-9410.

CC: Pedro Soto, Chief of Staff to the Chief Executive Officer
    Kathryn Ellis, Acting Chief of Staff to the Board of Education
    LaTanya McDade, Chief Education Officer
    Joseph Moriarty, General Counsel
    Ronald DeNard, Senior Vice President of Finance
    Arnaldo Rivera, Chief Operating Officer

18-01670 / Performance Review-NWEA Test Administration