Consortium 2018 State-by-State Comparisons
D. J. McRae, October 11, 2018 [Updated Late November, 2018]

When two consortiums of states were chosen by the US Department of Education in 2010 to develop new statewide assessment systems, one of the purposes was to generate state-by-state comparable achievement data. Roughly 45 states initially signed up for potential use of consortium tests, but by 2015, when the new tests were ready for their first "operational" use, only 18 states administered the Smarter Balanced tests and 11 states (plus the District of Columbia) administered the PARCC tests, representing just under 50 percent of total K-12 enrollments across the country. In 2018, 11 states administered Smarter Balanced tests and 4 states (plus DC) administered PARCC tests, representing 29 percent of K-12 enrollments across the country. In addition, Louisiana, Massachusetts, Colorado, and Rhode Island (PARCC) and Michigan (Smarter Balanced) used public domain test questions for their statewide tests but did not use full consortium protocols, and hence are not included in this set of state-by-state comparisons. A footnote on page 5 notes this will be the last year that comparable information will be available for PARCC states. The data charts on pages 2 and 4 provide state-by-state results for the Smarter Balanced and PARCC states, respectively, for spring 2018 testing. The results are expressed as "percents meeting target" grade-by-grade for English Language Arts and Mathematics, along with average percents across grades. On pages 3 and 5, the average gain scores for each state for 2015 through 2018 are provided, respectively, for Smarter Balanced and PARCC states (plus DC). Notes describing the data in the charts are provided at the bottom of pages 3 and 5, respectively, for SBAC and PARCC. In many cases, results represent preliminary data released by states, with final data to be released later.
Annual percent-meeting-target results from Smarter Balanced states are not comparable to results from PARCC states, and even within consortiums there may be some differences in test administration or reporting practices across states. However, within consortiums, the comparability of scores is sufficient for general comparisons. Finally, it is fair game to average gain scores for ELA and Math for each state to produce an annual overall gain score for 2016, 2017, and 2018 results. In the early 2000s, highly respected educational measurement expert Bob Linn testified before Congress that 3- to 4-percentage-point annual gains for statewide testing programs could be characterized as good to very good, and 2-point annual gains were typical. With this background, it is fair game to interpret each annual gain score as a letter grade based on a 4.0 grade point average (GPA) metric, with 4.0 being an A, 3.0 being a B, 2.0 being a C, 1.0 being a D, and 0.0 being an F. Annual gain scores for 2016 through 2018 for all consortium states (plus DC) are provided on page 6. The year-to-year gain scores and assigned letter grades on page 6 are comparable across consortiums.

Smarter Balanced 2018 State-by-State Comparisons [Level 3 and Above Percents]
Compiled by D. J. McRae

English/Language Arts
    Grade          3    4    5    6    7    8    Ave
 1 California     48   49   49   48   50   49   48.8
 2 Connecticut    53   55   58   54   55   56   55.2
 3 Delaware       52   55   58   52   54   53   54.0
 4 Hawaii         53   51   56   53   52   55   53.3
 5 Idaho          50   50   55   53   54   54   52.7
 6 Montana        50   49   53   51   51   49   50.5
 7 Nevada         46   48   50   44   47   47   47.0
 8 Oregon         47   50   55   52   57   56   52.8
 9 South Dakota   51   50   53   52   56   56   53.0
10 Vermont        50   53   55   53   57   57   54.2
11 Washington     56   57   59   56   60   59   57.8
   Averages       51   52   55   52   54   54   52.7

Mathematics
    Grade          3    4    5    6    7    8    Ave
 1 California     49   43   36   38   37   37   40.0
 2 Connecticut    54   51   45   44   44   43   46.8
 3 Delaware       54   50   43   40   39   39   44.2
 4 Hawaii         55   47   43   42   37   38   43.7
 5 Idaho          52   48   43   44   44   41   45.3
 6 Montana        49   45   40   39   39   37   41.5
 7 Nevada         48   42   36   32   31   30   36.5
 8 Oregon         46   43   40   38   42   41   41.7
 9 South Dakota   55   49   41   44   48   48   47.5
10 Vermont        52   49   42   41   44   42   45.0
11 Washington     58   54   49   48   49   48   51.0
   Averages       52   47   42   41   41   40   43.9

Smarter Balanced 2015-18 Gain Scores

English/Language Arts
                  2015  2016  2017  2018  15-16  16-17  17-18
   State           ELA   ELA   ELA   ELA   Gain   Gain   Gain
 1 California     42.3  46.7  46.8  48.8   +4.4   +0.1   +2.0
 2 Connecticut     xx*  55.8  54.2  55.2    xx*   -1.6   +1.0
 3 Delaware       51.7  54.8  53.8  54.0   +3.1   -1.0   +0.2
 4 Hawaii         47.7  50.5  49.2  53.3   +2.8   -1.3   +4.1
 5 Idaho          49.7  51.8  51.2  52.7   +2.1   -0.6   +1.5
 6 Montana         xx*  50.0  49.8  50.5    xx*   -0.2   +0.7
 7 Nevada          xx*  48.3  46.2  47.0    xx*   -2.1   +0.8
 8 Oregon         53.8  53.3  51.5  52.8   -0.5   -1.8   +1.3
 9 South Dakota   47.5  51.2  49.8  53.0   +3.7   -1.4   +3.2
10 Vermont        53.7  56.5  52.5  54.4   +2.8   -4.0   +1.9
11 Washington     55.5  57.8  57.0  57.8   +2.3   -0.8   +0.8

Mathematics
                  2015  2016  2017  2018  15-16  16-17  17-18
   State          Math  Math  Math  Math   Gain   Gain   Gain
 1 California     34.2  37.3  38.2  40.0   +3.1   +0.9   +1.8
 2 Connecticut    40.3  44.2  45.8  46.8   +3.9   +1.6   +1.0
 3 Delaware       40.7  43.7  44.5  44.2   +3.0   +0.8   -0.3
 4 Hawaii         42.2  43.0  43.2  43.7   +0.8   +0.2   +0.5
 5 Idaho          40.8  43.3  43.3  45.3   +2.5    0.0   +2.0
 6 Montana         xx*  41.0  41.2  41.5    xx*   +0.2   +0.3
 7 Nevada          xx*  33.8  33.3  36.5    xx*   -0.5   +3.2
 8 Oregon         43.5  42.8  41.8  41.7   -1.2   -1.0   -0.1
 9 South Dakota   41.2  44.5  45.8  47.5   +3.3   +1.3   +1.7
10 Vermont        43.2  46.7  44.2  45.0   +3.4   -2.5   +0.8
11 Washington     49.8  51.5  51.2  51.0   +1.7   -0.3   -0.2

Notes for Smarter Balanced Data: All averages and gains are based on Grade 3-8 data. Only selected states use Smarter Balanced HS tests. Montana and Nevada participated in Smarter Balanced testing in 2015, but both states experienced technology difficulties that prevented generation of representative scores for the entire state. This circumstance prevents calculation of selected gain scores. Connecticut discontinued the Performance Task for the ELA test in 2016, so for comparability reasons the 15-16 ELA gain score is not recorded.

PARCC 2018 State-by-State Comparisons [Level 4 and Above Percents]
Compiled by D. J. McRae

English Language Arts
     Grade          3    4    5    6    7    8    Ave
1. Dist Columbia   31   35   34   31   39   33   33.8
2. Illinois        37   39   36   34   40   36   37.0
3. Maryland        39   43   42   39   46   41   41.7
4. New Jersey      52   58   58   56   63   60   57.8
5. New Mexico      29   29   31   28   29   29   29.2
   Averages        38   41   40   38   43   40   39.9

Mathematics
     Grade          3    4    5    6    7    8    Ave
1. Dist Columbia   41   34   33   24   25   xx   31.4
2. Illinois        38   32   31   27   31   31   31.7
3. Maryland        42   39   38   32   xx   xx   37.8
4. New Jersey      53   49   49   44   43   xx   47.6
5. New Mexico      32   26   28   21   20   xx   25.4
   Averages        41   36   36   30   30   xx   34.8

PARCC 2015-18 Gain Scores

English/Language Arts
                   2015  2016  2017  2018  15-16  16-17  17-18
   State            Ave   Ave   Ave   Ave   Gain   Gain   Gain
1. Dist Columbia   26.2  26.5  32.5  33.8   +0.3   +6.0   +1.3
2. Illinois        36.0  36.5  37.0  37.0   +0.5   +0.5    0.0
3. Maryland        38.8  38.7  40.5  41.7   -0.1   +1.8   +1.2
4. New Jersey      50.0  52.8  56.0  57.8   +2.8   +3.2   +1.8
5. New Mexico      23.2  24.5  26.7  29.2   +1.3   +2.2   +2.5

Mathematics
                   2015  2016  2017  2018  15-16  16-17  17-18
   State            Ave   Ave   Ave   Ave   Gain   Gain   Gain
1. Dist Columbia   25.3  29.5  30.2  31.4   +4.2   +0.7   +1.2
2. Illinois        28.8  31.3  31.2  31.7   +2.5   -0.1   +0.5
3. Maryland        31.8  37.3  37.0  37.8   +5.5   -0.3   +0.8
4. New Jersey      41.0  45.6  45.8  47.6   +4.6   +0.2   +1.8
5. New Mexico      21.4  23.0  22.4  25.4   +1.6   -0.6   +3.0

Notes for PARCC Data: Most PARCC states utilized PARCC End-of-Course Math tests for the High School level rather than PARCC grade level tests. Since course taking patterns differ from state to state, HS results are not included. All averages and gains reflect grades 3-8 only, and the xx under a grade means the state does not uniformly administer PARCC grade level tests to all students at that grade level. All xx's reflect administrations of Algebra and/or Geometry End-of-Course tests to grade 7 and 8 students taking these courses, rather than the regular grade level tests. The pattern of xx's in the chart for 2018 results is identical to the patterns for 2015 through 2017. Averages are based on grade levels with comparable results. The PARCC consortium provided item retirement, item replacement, and resulting cut score adjustment services to their member states in the past, but individual states will be responsible for these services in future years. As a result, states using PARCC items as a base for future tests will no longer have comparable gain scores beyond 2018.

ELA and Math Gain Score Averages by Year, with GPA Letter Grades

                   AveGain   GPA     AveGain   GPA     AveGain   GPA
                    15-16   Letter    16-17   Letter    17-18   Letter
SBAC
 1. California      3.75     A        0.50     D-       1.90     C
 2. Connecticut      xx      --       0.00     F        1.00     D
 3. Delaware        3.05     B       -0.10     F       -0.05     F
 4. Hawaii          1.80     C       -0.55     F        2.30     C+
 5. Idaho           2.30     C+      -0.30     F        1.75     C
 6. Montana          xx      --       0.00     F        0.50     D-
 7. Nevada           xx      --      -1.30     F        2.00     C
 8. Oregon                           -1.40     F        0.60     D-
 9. South Dakota    3.50     A-      -0.05     F        2.45     C+
10. Vermont         3.65     A-      -3.25     F        1.35     D+
11. Washington      2.00     C       -0.55     F        0.30     F
    Averages        2.46     C+      -0.90     F        1.28     D+
PARCC
 1. Dist Columbia   2.25     C+       3.35     B+       1.25     D+
 2. Illinois        1.50     C-       0.20     F        0.25     F
 3. Maryland        2.70     B-       0.75     D        1.00     D
 4. New Jersey      3.70     A-       1.70     C-       1.80     C
 5. New Mexico      1.45     D+       0.80     D        2.75     B-
    Averages        2.40     C+       1.17     D        1.41     D+

GPA to Letter Conversions: A = 3.50 to 4.49, B = 2.50 to 3.49, C = 1.50 to 2.49, D = 0.50 to 1.49, F = less than 0.50. Within each range, the higher range of x.25 to x.49 merits a plus sign, the lower range of x.50 to x.74 merits a minus sign.
A GPA equal to or greater than 4.50 merits an A+.

Observations for Smarter Balanced and PARCC 2018 State-by-State Comparison Scores

Smarter Balanced vs PARCC Results, ELA vs Math Results, Trends Across Grades, and Gain Results

It is clear students score better on Smarter Balanced tests than on PARCC tests. Smarter Balanced states averaged 53 percent meeting targets for ELA and 44 percent for Math, while PARCC states averaged 40 percent meeting targets for ELA and 35 percent for Math. In 2004, Bob Linn noted differences of 3 to 4 percent are clearly meaningful. Differences of close to 10 percent for ELA are clearly very meaningful differences. In addition, consortium ELA tests averaged 46.5 percent meeting target, while consortium Math tests averaged 39.5 percent meeting target, another meaningful difference. While there may be demographic differences between the two cohorts of states, or there may be differences in implementation of common core instruction, it is unlikely either of these reasons would cause the large differences in Smarter Balanced scores vs PARCC scores. Rather, it is likely that the differences between Smarter Balanced and PARCC results are due to the tests themselves, either in the difficulty of the items or in the setting of threshold scores for the respective targets upon which the data in the charts are based. Perhaps the best way to describe the differences between Smarter Balanced and PARCC results is simply that PARCC has the more difficult set of tests. A look at trends across grades shows no obvious trends for ELA results for either consortium, but does show declining results for Math as the grades increase for both consortiums. These trends across grades are very similar to the trends across grades found for both Smarter Balanced and PARCC for 2015, 2016, and 2017 results. The gain scores on pages 3 and 5 as well as the gain scores on page 6 are comparable across Smarter Balanced and PARCC.
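The gain-score-to-letter-grade scheme used in this report (overall gain = average of the ELA and Math annual gains, read on a 4.0 GPA scale with the plus/minus bands from the page 6 conversion note) can be sketched in a few lines of Python. This is an unofficial illustration of the stated rule, not part of either consortium's reporting; the function names are mine.

```python
import math

def overall_gain(ela_gain: float, math_gain: float) -> float:
    """Average the ELA and Math annual gains into one GPA-style gain score."""
    return (ela_gain + math_gain) / 2

def letter_grade(gpa: float) -> str:
    """Apply the report's conversion: A 3.50-4.49, B 2.50-3.49, C 1.50-2.49,
    D 0.50-1.49, F below 0.50; the top of a range (x.25-x.49 above the whole
    number) earns a plus, the bottom (x.50-x.74 of the range) earns a minus."""
    if gpa >= 4.50:
        return "A+"
    if gpa < 0.50:
        return "F"
    whole = math.floor(gpa + 0.5)   # nearest whole GPA, 1..4
    base = "DCBA"[whole - 1]
    frac = gpa - whole
    if frac >= 0.25:
        return base + "+"
    if frac < -0.25:
        return base + "-"
    return base

# District of Columbia, 2016-17: ELA gain +6.0, Math gain +0.7 (page 5 charts)
gain = overall_gain(6.0, 0.7)
print(gain, letter_grade(gain))   # -> 3.35 B+
```

The same function reproduces the letter grades printed on page 6 (for example, 3.50 and 3.65 both fall in the lower band of the A range and yield A-, while 3.75 is a plain A).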
The annual gains for 2016 through 2018 show both consortiums had somewhat better than typical gains for 2016, considerably lower gains for 2017 [with most Smarter Balanced states showing actual declines], and recovery to less-than-typical gains for 2018. The pattern of gains for Smarter Balanced shows extreme changes from year to year, notably different from the more modest pattern for PARCC. One might describe the 2016 through 2018 gains for Smarter Balanced states as reflective of a Level 5 roller coaster, while the gains for PARCC states reflect a less extreme Level 2 roller coaster. The GPA metrics for the overall patterns of annual gains translate into letter grades that communicate these differences very accurately. Due to the number of student scores entering into these consortium-wide calculations, increases or declines in results of perhaps 0.1 to 0.2 percentage points may be considered "statistically significant." However, the use of theoretical statistical significance calculations for these analyses of statewide test results is questionable. From a practical perspective, increases or decreases of 0.5 percentage points may be considered "meaningful" changes, and increases/decreases of more than 1.0 percentage points should be considered "very meaningful" changes, similar to typical interpretation of 4.0 GPAs.

Other Considerations

It should be noted that changes (or lack of needed changes) in the tests between 2015 and 2018 may substantially affect the gain scores displayed in the charts. For example, the Smarter Balanced submission for federal peer review covering spring 2015 tests "revealed some gaps in item coverage at the low end of the performance spectrum." In January-February 2018, Smarter Balanced released information that the operational item bank used for the spring 2017 testing cycle changed considerably, in an attempt to add easier items to improve coverage at the lower end of the achievement spectrum.
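The point above about tiny changes reaching "statistical significance" follows directly from the very large student counts behind consortium-wide averages. A back-of-the-envelope standard error for the difference between two independent proportions, using an assumed one million test takers per year (my illustrative figure, not a count from this report), shows why:

```python
import math

def se_diff_pct(p1: float, n1: int, p2: float, n2: int) -> float:
    """Standard error of the difference between two independent proportions,
    returned in percentage points."""
    var = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
    return 100 * math.sqrt(var)

# Assume roughly 1,000,000 scores each year with about 50 percent meeting target.
se = se_diff_pct(0.50, 1_000_000, 0.50, 1_000_000)
print(round(se, 3))   # about 0.071 percentage points
# Under a two-standard-error criterion, a year-to-year change of roughly
# 0.15 percentage points would already be flagged as "significant,"
# even though such a change is trivial in practical terms.
```

This is why the report relies on practical thresholds (0.5 and 1.0 percentage points) rather than statistical significance.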
Based on Smarter Balanced internal technical data dated October 2016 but not released until January-February 2018, it appeared this effort was not entirely successful. In late March 2018, Smarter Balanced released additional technical information based on analysis of actual spring 2017 item performance. However, this technical information was not consistent with the October 2016 technical information upon which anticipated 2017 item bank performance was based, and the March 2018 technical information did not thoroughly address the differential 2017 Smarter Balanced consortium-wide scores in a way that explained the extremely meaningful declines in scores for 2017. The March 2018 information released by Smarter Balanced, as well as reviews of that information by the author, are available upon request. Smarter Balanced has not released information to date on whether the adaptive item bank and/or adaptive algorithm changed for the 2018 test administration. If there were changes to either the adaptive item bank or adaptive algorithm, that information would inform the interpretation of the data on page 6 considerably. In the absence of such information, one can only conclude that the comparability of the Smarter Balanced testing system from year to year remains suspect in terms of generating quality year-to-year change data, with extreme declines in gain scores for 2017 and partial recovery for 2018, substantially different from the PARCC year-to-year gains and at odds with expected annual gains for statewide testing systems. Finally, it should be noted that the overall consortium testing gain scores for 2016 through 2018 are lower than the typical gains described by Bob Linn 15 years ago. Over these three years, the Smarter Balanced gains averaged less than 1.0 percentage point (0.95 to be exact), while the PARCC gains averaged 1.67 percentage points.
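The three-year averages cited above can be checked against the consortium-wide yearly gain averages on page 6. The sketch below uses those rounded page 6 figures, so the last digit can differ slightly from numbers computed from unrounded state data:

```python
from statistics import mean

# Consortium-wide average gain scores by year (page 6 "Averages" rows)
sbac_yearly = [2.46, -0.90, 1.28]    # 2015-16, 2016-17, 2017-18
parcc_yearly = [2.40, 1.17, 1.41]

print(round(mean(sbac_yearly), 2))   # -> 0.95
print(round(mean(parcc_yearly), 2))
```

Both averages fall well short of the 2-point "typical" annual gain Linn described.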
The letter grades reflect a reasonable mix of A's and B's for 2016, but for 2017 and 2018 only a single state each year received a letter grade of B, and no state received an A. Why this is the case is unknown. It may be due to the characteristics of each testing system, or to the demanding nature of implementing challenging academic content standards in our schools over the past five years.

Author Tagline

Doug McRae is a retired educational measurement specialist living in Monterey, California. In his almost 50 years in the K-12 testing field, he has served as an educational testing company executive in charge of the design and development of K-12 tests widely used across the country, as well as an advisor for the design and development of California's STAR statewide testing system, which was used from 1998 through 2013. He has a PhD in Quantitative Psychology from the L. L. Thurstone Laboratory at the University of North Carolina, Chapel Hill.