Memo

Date: October 2, 2013
To: Ms. Jessie Montano, Deputy Commissioner, Minnesota Department of Education
From: Jon Cohen, American Institutes for Research
Re: Response to Letter Dated September 16, 2013

AIR is committed to ensuring reliable and accurate testing, in accordance with the terms of the existing scope of work, for the duration of this contract. I agree that there have been significant communication problems, and seek to correct them. Similarly, I agree that timely adherence to project schedules is critical to the success of the project.

As we will demonstrate in our detailed response to the specific complaints in your letter, most of the significant problems cited in your letter arose as a result of MDE actions, including failure to meet schedules, an unwillingness to make decisions in a timely manner, revision of requirements well after the specifications were completed and work commenced, poor decision making, and unreasonable demands for out-of-scope deliverables. Successful delivery of the assessments has required that AIR exceed the contract scope to protect the program and prevent MDE actions from impacting the field. These efforts have generally been successful. We address each specific complaint below.

Regarding next steps, the detailed scope-planning document requested in your letter is outside the scope of the contract. We would be happy to price the development of such a document, but will not otherwise undertake such a large, out-of-scope effort. Instead, we recommend quickly identifying and closely monitoring the milestones along the critical path to the success of this project.

Concerns and Responses

Failure to Adequately Prepare for Meetings (Scope of Work Section 1.1.12.1). AIR staff have been poorly prepared for several weekly meetings with MDE. Examples include, but are not limited to, the following: On July 10, July 24, August 7, and August 14, 2013, MDE and AIR convened to discuss the Mathematics and Reading MCA and GRAD assessments. AIR was inadequately prepared, as evidenced by:

• Failure to Timely Confirm Functionality of MCA Adaptive Algorithm. During each of the calls on July 10, July 24, and August 7, AIR was asked to confirm that its computer adaptive algorithm ensured proper initial item selection in the adaptive test, based on an individual student's prior MCA demonstrated ability, if such information is available, or based on the prior year's state average score. This issue also was discussed at the Minnesota Technical Advisory Committee (TAC) meeting on July 16, 2013, which AIR attended. On August 9, 2013, AIR confirmed that its algorithm properly selects the initial item for students with a prior test score in the database, but, where no prior score is available, AIR has yet to confirm that its algorithm selects a student's first item using the state average.

• Failure to Prepare for August 14 Meeting. During the August 14 meeting, AIR staff did not include action items in the agenda, nor was AIR prepared to discuss these items with MDE. MDE staff had to remind AIR of the action items and the status of each.

MDE has claimed "inadequate preparation for meetings" but provides two examples that are inaccurate. In contrast to MDE's assertions, AIR has provided documentation about the functionality of our adaptive algorithm.
AIR prepared documentation about the functionality of our adaptive algorithm and sent it to MDE on June 16, 2011. The document was discussed on a conference call with MDE on June 17, 2011. AIR discussed the adaptive algorithm as part of a presentation to the TAC in December 2011. The adaptive algorithm document was provided by MDE to the TAC during the October 2012 TAC meeting. (See the email from Patricia Olson to TAC members on December 12, titled Algorithm.) AIR has also demonstrated this functionality in operational tests.

AIR has provided documentation about the functionality of the adaptive algorithm and has repeatedly confirmed functionality when questioned. From the earliest TAC meeting presentations, we have provided documentation on the functionality of the adaptive algorithm, including providing MDE and its TAC with proprietary information describing the item selection engine (December 2011), and subsequent simulation technical reports describing the configuration of the adaptive algorithm for operational deployment (February 2012, October 2012, January 2013). For example, during the December 2012 meeting with MDE's TAC, AIR presented an overview of the functioning of the adaptive algorithm. The minutes note:

The TAC asked AIR to clarify where the initial student ability estimate is derived--AIR noted that for a student's first opportunity, the initial ability estimate is the state average from the previous school year. Thereafter, the initial ability estimate for each student is based on their previous attempts within the same administration. In future administrations, it will be possible for a student's ability estimate for opportunity one to be derived from their previous year's highest performing attempt.

Summaries of the simulations for the 2013 administration of the assessments were discussed starting in the October 2012 meeting with MDE's Technical Advisory Committee, and again in the January 2013 meeting with the TAC. Specifically, during discussion of the Test Administration Plan in Mathematics, the meeting minutes note the following under the heading "Confirming rules for the initial item selection for [MCA]":

There was unanimous agreement that the highest score from Spring 2012 should be used for MCA. MDE confirmed that the first item selected for each student tested in 2013 would be chosen using the student's highest score attained in 2012.

MDE insists that to date it still has not received confirmation about the start value for the adaptive assessments, but nevertheless engaged in an email exchange about whether the OLPA start values should be based on the mean of the spring accountability administration or based on the Fall 2012 OLPA administration. (See email from John Denbleyker sent September 2, 2013, titled Re: Simulation Update.)
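To make the documented rule concrete, the start-value logic described in the minutes above reduces to a simple fallback: use the student's highest prior-year score when one exists, otherwise the prior year's state average. The sketch below is illustrative only; the function name and data structures are hypothetical and this is not AIR's item selection engine:

```python
def initial_ability_estimate(student_id, prior_scores, state_average):
    """Choose the starting ability estimate for adaptive item selection.

    prior_scores: dict mapping student_id to the student's highest score
    from the prior year's administration (hypothetical layout).
    state_average: the previous school year's state average, used as the
    fallback when no prior score exists for the student.
    """
    if student_id in prior_scores:
        # Per the January 2013 TAC minutes: use the student's highest
        # score attained in the prior year's administration.
        return prior_scores[student_id]
    # Otherwise, start from the prior year's state average.
    return state_average


# Example with made-up values on an arbitrary scale:
prior = {"S001": 0.42}
print(initial_ability_estimate("S001", prior, 0.0))  # 0.42 (prior score)
print(initial_ability_estimate("S002", prior, 0.0))  # 0.0 (state average)
```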
The reference to failure to provide action items is not correct. We have attached a copy of the agenda from the meeting in question, and note that it does, in fact, contain action items. These agendas are the result of a collaborative activity. AIR provides the draft agenda in advance, generally on Mondays (occasionally on Tuesdays), and MDE often adds to it. Therefore, if MDE notices an issue, it should raise it prior to the meeting.

We note that MDE often adds agenda items only a few hours before the meeting, or raises issues not on the agenda, which may contribute to MDE staff dissatisfaction with those meetings. For example, on August 28, 2013, an item was added at 9:46 a.m., and at 8:30 a.m. on August 7, 2013, MDE requested a detailed update on two topics for the meeting that day. One to two additional topics were raised at each meeting on 7/18, 7/24, 8/14, 9/4, and 9/18.

Failure to Timely Provide Agreed-Upon Deliverables (Scope of Work Sections 1.1.12.1, 1.2.11). AIR has failed to follow up on and/or timely provide agreed-upon deliverables after agreeing to specific timelines. Examples include, but are not limited to:

• Late Provision of Administration Exposure Rates (Scope of Work Line 5.5.1). Under the MDE contract, AIR was to provide administration exposure rates for Minnesota's OLPA and MCA tests by May 1 and May 24, 2013, respectively (see project schedule, lines 6.2.1.4 and 6.2.1.5). AIR failed to meet either of those deadlines. As early as March 27, 2013, MDE reminded AIR of the importance of timely provision of these data. MDE and AIR discussed these overdue data and listed provision of the data as an action item in all four of the weekly meetings listed above, yet AIR did not provide this information until August 16, 2013. This gave MDE only 24 hours to review the data in order to meet AIR's deadline for finalizing the 2013-14 OLPA test item pool. Because of AIR's late submission of the data, MDE was unable to perform a complete analysis of the exposure rates and was forced to retain the existing OLPA item pool for the 2013-14 test year.

MDE claims that decisions regarding items to be included in the interim assessment (OLPA) were hindered by a failure to provide data on exposure rates. AIR disagrees for the following reasons:

• In accordance with AIR's quality assurance plan, exposure rates are available to MDE on demand, and the exposure rate reports can be requested at any time; and

• Failure to establish a pool of items for the OLPA reflected MDE indecision and unwillingness to establish and abide by item selection criteria.

MDE and its TAC have reviewed AIR's quality assurance plan on at least two occasions (February 2012, June 2012) and therefore understand that AIR produces a series of QA reports, including item statistic, blueprint match, and item exposure analysis reports, that can be generated at any time on request by MDE. Following TAC recommendation, AIR generates and reviews these QA reports at intervals across the test administration windows to evaluate items for possible mis-scoring of test items or blueprint match violations.

The exposure rate report requested by MDE was not the standard, within-scope report, but rather an out-of-scope report that gathered exposure rate information across administrations, which requires time-consuming additional programming. Our schedule for helping MDE select the OLPA pool was rejected by MDE as too aggressive, indicating that MDE staff would be unable to attend to the task during the month of June (see comments from George Henley on the schedule sent April 25, 2013).

Furthermore, MDE indicated that it would not be able to establish criteria for selection of items in the pool, but instead wanted to make decisions based on individual items:

The decision about which option to pursue must be made at the item level. Some items appear unusable, whereas others need recalibration, based either on appropriate current data, or additional field test administration.

--email from George Henley to E. Ayers dated 9/23/13, one week after AIR received the September 16 letter

With regard to the specific MDE comment, please note that the need for field testing or recalibration has nothing to do with exposure rates.
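For reference, an item's exposure rate within an administration is simply the proportion of delivered tests on which the item appeared, which is why the standard within-scope report can be generated on demand. A minimal sketch of the computation, assuming a hypothetical record layout (this is not AIR's QA reporting code):

```python
from collections import Counter

def exposure_rates(delivered_tests):
    """Compute per-item exposure rates.

    delivered_tests: iterable of item-ID lists, one list per delivered test.
    Returns a dict mapping each item ID to the fraction of tests on which
    it appeared. (Hypothetical layout, for illustration only.)
    """
    counts = Counter()
    n_tests = 0
    for items in delivered_tests:
        n_tests += 1
        counts.update(set(items))  # count each item at most once per test
    return {item: seen / n_tests for item, seen in counts.items()}


# Example with made-up data: each item appears on 2 of the 3 tests.
tests = [["A", "B"], ["A", "C"], ["B", "C"]]
print(exposure_rates(tests))  # {'A': 0.67, 'B': 0.67, 'C': 0.67} approx.
```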
• Failure to Develop Plan for Provision of Immediate Results for 2014 Test Administrations. During a call on July 24, 2013, MDE requested AIR provide a detailed update regarding the provision of immediate results to districts during the spring 2014 test administrations. The need for advanced planning in this area also had been discussed in the TAC meeting the previous week, which AIR attended. On July 25, 2013, AIR provided minutes to the July 24 call that noted the importance of the information for the upcoming Assessment Conference. MDE spoke with AIR representatives on July 25, but AIR did not provide the requested update. In the July 25 call, MDE reiterated its request for the update and requested that AIR provide the information in a table detailing the plan for immediate scoring by test and subject. MDE asked that the table be provided with the minutes of the July 25 call, and reiterated the need for this information in order to prepare for upcoming Assessment Conference presentations on July 31 and August 1. On July 26, AIR provided MDE with the minutes of the July 25 call, but did not include the requested table. Due to the time-sensitive nature of the information and the need to prepare MDE's presentation for July 31, MDE was forced to determine a plan, on its own, for providing immediate results during the spring 2014 test administration. AIR eventually provided the requested table two weeks later, on August 7, but it did not correctly reflect the reporting status regarding paper reporting of the reading test and AIR's data entry interface. Specifically, in the TAC meeting held on July 16, 2013, MDE and AIR decided not to provide immediate results for paper and data entry interface tests, but the table indicated immediate reporting.

This comment asserts that AIR did not provide a brief table summarizing MDE intentions in a timely manner. While that is true, it is because MDE was unclear about what it was looking for, and seemed to be seeking some sort of technical plan. Ultimately, AIR learned that the plan MDE was requesting was a simple table summarizing MDE's own intentions about reporting. This, however, was a miscommunication, since AIR did not understand that all MDE wanted was an acknowledgement of MDE's views.

• Failure to Timely Provide Science Standard-Setting Technical Report (Scope of Work 11.2.13). The United States Department of Education requires MDE to post all technical reports associated with the program in a timely manner. The Science Standard Setting took place in June 2012, and the technical report was not finalized for posting until August 2013.
On June 5, 2013, MDE determined that the last version of the technical report, dated October 23, 2012, contained references to secure testing material that needed to be removed in preparation for posting. On June 7, 2013, AIR agreed to provide an update at the June 12, 2013 phone call. AIR did not provide an update on the status of the 2012 Science Standard-Setting Technical Report during the June 12, 2013 phone call. On July 9, 2013, MDE asked AIR when it expected to provide an updated draft of the technical report. On July 11, 2013, AIR indicated that its technical team would complete its edits on July 12, but that the document still needed to be reviewed for compliance with state accessibility requirements. On July 17, 2013, AIR informed MDE that it would take five weeks to finish the report, pushing the date out to August 21, 2013. As a result of the delay, MDE was not able to post this document to its website until September 4, 2013.

MDE claims that AIR did not provide a timely Science Standard Setting Technical Report. In fact, AIR provided technical reporting for the science standard setting meetings in a timely manner. Standard setting activities were completed on June 28, 2012, and AIR provided MDE with a completed technical report for the commissioner's review on July 3, 2012. Outcomes of the standard setting were discussed with MDE's TAC in its October 10-11, 2012, meeting.

The dates referred to in this comment pertain to summer 2013, a full year following delivery of the technical report to MDE. MDE waited one year to request some presentation changes in the report, and to request that an accessible version of the report be provided. Our accessibility team did require five weeks to create the requested deliverable. AIR delivered the document two weeks later than agreed; however, this delay followed a delay of one year on the part of MDE.

• Failure to Timely Provide Forms for 2012 Science Test Administration. Test forms for the 2012 Science test administration were not completed and available for students when the test administration window opened on March 26, 2012. These forms should have been ready two weeks prior to the opening of the test window per the Scope of Work. Instead, the forms were not available until April 9, 2012. This delay jeopardized MDE's ability to use these test items operationally in future test administrations because of the risk that an inadequate number of students would take the field test items on the proper forms.

In this comment, MDE asserts that AIR was unable to get all of the science simulations running in time for the test deployment. In fact, the forms were deployed on time, and every student saw a full form. In this case, MDE asked for rapid development of an out-of-scope feature--a feature that AIR delivered successfully and at no additional cost to the State. Science simulations were not part of the RFP, nor were they included in the sample items provided to bidders. After we signed the contract, MDE asked if AIR could support this new item type. MDE was informed that AIR was nearing completion of the tools to support the item type, but would rush them into production. This was done without an increase in cost or a modification to the contract. MDE understood that there was risk that the new software would not be ready in time.
As in other cases, MDE requested work that was not in scope, had the risks explained, and accepted those risks. A year and a half later, the history of goodwill, extra effort, and ultimate success has been forgotten. However, contrary to MDE's assertion above, MDE was in fact able to use those items successfully in subsequent administrations, which was entirely in line with expectations at the time.

• Failure to Timely Develop Science Test Items. In July 2012, AIR's Science Content Lead informed MDE that it would not be able to meet the deadline for full development of new science items set forth in the Project Schedule (line 1792; Section 5.23.1.3.1) and the Item Development Plan (May 7, 2012; Scope of Work 3.3.9). Specifically, AIR informed MDE that it was not able to complete development of 107 science test items prior to the deadline for review by a Minnesota item review panel. AIR informed MDE of this failure on July 10, 2012, only one week before the panel's scheduled review. This required MDE to completely revise the item development schedule (including creation of items, MDE review of items, scheduling a new item review panel, and final review prior to administration) at a time when MDE and AIR were occupied with other tasks in the test development cycle. AIR did not complete writing the 107 new items until September 10, 2012, and the earliest a teacher review panel could review these items was September 25-27, 2012. As a result of AIR's delay, these items were not finalized with all edits in sufficient time to be field tested in Spring 2013 as originally planned.

MDE indicates that AIR's science development lagged behind schedule in 2012. This is true. AIR reorganized our science team to address the problem. It is important to note the following:

• All science items were completed in time to field test. For reasons unrelated to scheduling, MDE chose to field test a different set of items. On September 18, 2012, MDE notified AIR of the intention to change the approach to the selection of field-test items. MDE decided to field test "up to [#] forms of MCA-II scenarios in order to get useful data and put back into our pool of options."

• All science items were developed in time to field test; AIR mapped out a reasonable schedule that would have had the additional 107 items completed by October 26. The lockdown date for items to be included on forms was November 27.

• Failure to Timely Load Legacy Test Items. The transition plan for AIR to load legacy test items from the previous contractor into its item management system called for the work to be completed no later than three months from time of delivery of the items (Scope of Work 3.2.20). Even though these items were delivered to AIR by March 2011, AIR still has not completely and correctly loaded all legacy items from the previous contractor into its item management system. AIR did not complete its import of science items until August 30, 2013, more than 14 months after the deadline. To date, there still is no projected date for completing the import of all legacy items for mathematics and reading. Rather, MDE is to identify the missing items or information, per AIR's instructions, and AIR will make the updates as MDE identifies them.
This has complicated test construction efforts for spring 2014 administrations because not all necessary information is included in the item bank, and it will require additional MDE staff time to locate the information in other sources. The item import and this specific concern were addressed in several meetings, including on June 6, 2012, November 28, 2012, and April 24, 2013.

MDE asserts that AIR was late importing items from the prior contractor. In fact, MDE failed to deliver the items on time, failed to deliver the items to AIR in a reasonably importable condition, and failed to deliver accurate information about the items. Despite these failures, AIR has imported all available items and worked with MDE to correct the inaccurate information transmitted to us. All of this was outside the scope of the contract and resulted in considerable additional cost to AIR but was completed at no additional cost to the State. As documented in AIR's letter to Linda Sams dated June 9, 2011, AIR had not received the vast majority of items from MDE by that date. At that point, AIR considered all scheduled testing windows at risk. These well-documented facts stand in contrast to the unsupported claims made in MDE's letter.

• AIR has successfully completed the import of the approximately 24,000 items delivered by MDE, a fact acknowledged by MDE's payment of the invoice for this work. The bulk of those deliveries took place during or after July 2011.

• In contrast, the items and attendant metadata received from MDE:
  - did not include any list or catalogue of the items to be imported;
  - were not delivered in a reasonable, importable format;
  - were delivered with incorrect information about items; and
  - did not maintain any apparent version control over items created by the prior contractor.

These items were imported according to a phased schedule, importing first the items most critical to upcoming administrations. This plan enabled us to meet critical operational deadlines endangered by MDE's extremely late and imperfect delivery of items. As AIR imported the items, it performed a series of quality checks on those items, ensuring that they were properly formatted, contained clear graphics, had correct keys, and reflected the version as we received it from MDE. During the import activities, MDE often asked AIR to make out-of-scope changes to the items received, sharpen graphics, tweak scoring logic, and upload new item alignments. AIR met all of those out-of-scope requests at no additional cost to the State.

The files that we received from MDE were a challenge to work with. Many files were missing, and the metadata was incomplete and in some cases inaccurate (even containing incorrect keys on operational items). AIR received files in a variety of formats, which complicated the upload process. Item rationales, for instance, were in Word documents and had to be converted to XML so that they could be stored as item attributes. In some cases, such as the Modified Assessment, MDE had the most recent item versions only on hard copy, which AIR had to manually type into ITS and put through an additional QC process.
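The import-time quality checks described above can be pictured as a simple per-item validation pass. The following sketch uses hypothetical field names chosen to mirror the checks named in the text (formatting, graphics, answer keys, and version tracking); the actual ITS import pipeline is not shown here:

```python
def check_imported_item(item):
    """Run basic QC on one imported item record (a dict of attributes).

    Returns a list of problems found; an empty list means the item passed.
    Field names are hypothetical, for illustration only.
    """
    problems = []
    if not item.get("stem_xml"):
        problems.append("missing or empty item stem (formatting check)")
    for graphic in item.get("graphics", []):
        if not graphic.get("resolution_ok", False):
            problems.append("unclear graphic: %s" % graphic.get("name"))
    if item.get("key") not in item.get("options", []):
        problems.append("answer key does not match any option")
    if not item.get("source_version"):
        problems.append("no version information from the prior contractor")
    return problems


# Example: a made-up record whose key does not match its options.
record = {"stem_xml": "<stem>...</stem>", "graphics": [],
          "key": "E", "options": ["A", "B", "C", "D"],
          "source_version": "2010-03"}
print(check_imported_item(record))  # ['answer key does not match any option']
```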
Despite the fact that MDE failed to deliver items in a reasonable and customary format, AIR imported and corrected all items at no additional cost. AIR did so according to mutually agreed-upon schedules and in sufficient time for those items to appear on each test administration.

• Failure to Timely Provide Follow-Through. Agenda items reside on the agenda for many months and/or are not competently managed:

2012
• April 18--MDE notified AIR of an issue with science item sampler attributes and instructed AIR to update the items in the bank.
• June 7--MDE reminded AIR that the above work needed to be done.
• June 13--AIR committed to doing the clean-up of the science item sampler pool items and attributes.
• September 12--AIR put a timeline of one week after redeployment of item samplers to complete the task described above.
• November 21--User Acceptance Training of item samplers for redeployment occurred.
• December 5--AIR removed the item sampler pool clean-up from the weekly meeting agenda.

2013
• February 6 and March 13--MDE reminded AIR that the science item sampler clean-up needed to be on the agenda.
• May 14--AIR committed to completing the science item sampler pool clean-up by July 29, 2013.
• August 21--AIR committed to providing a schedule for completing the science item sampler pool clean-up by the week of August 26.
• September 16--AIR will send a spreadsheet of corrected attributes to MDE.
• End of September--Work will be finalized.

In this comment, MDE confuses appropriate prioritization with a failure to provide timely follow-through. In this case, the timeline details a matter of internal record keeping, without any impact whatsoever on the program or the field. The science item sampler referred to in the comment is a collection of released items. The "attributes" referred to in the comment are data about these items stored in an AIR system that is informational only, is of no consequence to the program, and is at most a housekeeping matter that can be addressed at the time of the transition. A variety of day-to-day issues, which might have impacted work flow or the field, took priority over this matter of internal record keeping. Both AIR and MDE content staff have been working on higher priority, scheduled project deliverables. It is only important that all records be accurate in time for the transition, and the work is now complete.

Inadequate Communication (Scope of Work Sections 1.1.1 and […]). Communication has been an ongoing problem, especially in the past year. Concerns range from the failure to inform MDE of known problems with test delivery to not anticipating risks with approaching deadlines. MDE has been forced to make decisions with incomplete information, with consequences that could have been avoided if communication had been clear and timely. Examples include, but are not limited to:

• Failure to Alert MDE About Browser Problems. On July 25, 2013, AIR failed to raise a flaw with the newly released Secure Browser, the web browser developed by AIR that is used exclusively to deliver the online administrations. The installation tool for the new Secure Browser was supposed to automatically remove the old secure browser from Windows workstations. This functionality did not operate as described in the specifications.
This required extra effort and rework for school districts to remove previous versions of the Secure Browser or to reload an updated Secure Browser. A district notified AIR's help desk of the problem on July 23, which created a help desk case. MDE reviewed AIR's weekly help desk cases as part of its routine procedure. MDE staff noticed the July 23 district complaint and asked AIR twice for an update on the status of the case, on July 24 and July 25. AIR only acknowledged the statewide problem with the Secure Browser 50 minutes into a July 25 call, and, even then, only when MDE asked about it.

• Failure to Notify MDE of Website Link Changes. On July 30, 2013, MDE was finalizing documents for the August 1 Assessment Conference and discovered that AIR web links within resource materials for districts on AIR's web site were no longer working. Upon inquiry, AIR confirmed that several internal webpage links had been changed. According to AIR, this was to allow for faster webpage changes on AIR's side. The change impacted the manuals, trainings, and modules in various stages of development, but no one at AIR had informed MDE. MDE attempted to update the URLs in its manuals and training documents and discovered that all URLs directed the user to the same webpage rather than to the specific material referenced in the training materials. AIR's initial solution was a multi-step process requiring MDE to insert a specific number into the URL that mapped back to a specific page. MDE was concerned with this approach because it would require MDE staff to go into all of the places MDE linked to the AIR portal (help desk emails, MDE emails to districts, documents, trainings, modules, etc.) and manually type in the link, and this process would have to be repeated every time materials were updated. MDE also was concerned that school districts would experience increased difficulty developing specific training materials with AIR links all tied to generic resource pages. When MDE told AIR that this was going to be unacceptable, AIR reversed the change the same day, ultimately alleviating the need to alter URL references. The time wasted investigating and communicating after the fact on this issue all could have been avoided if AIR had simply contacted MDE in advance about potential website changes.

MDE asserts inadequate communication and cites two examples. The first example actually documents the timeliness and effectiveness of communication. The second was, in fact, a minor internal communication oversight at AIR.

In the first example, the help desk case was escalated to technical experts who diagnosed the problem on 7/25, the same day the problem was reported to MDE. Between the first report on 7/23 and the escalation on 7/25, our help desk agent worked with the original caller to resolve the problem independently. The absence of the uninstall routine represents a flaw, but a relatively minor one. The automatic uninstall feature does not exist for other operating systems, so many users have to do the manual uninstall by design.

In the second example, AIR's team upgraded the portal to improve our ability to respond to MDE requests for updates. Our software team that implemented the upgrade did not know that the MDE manuals contained references to specific URLs within the site, and our management team did not know that such references would change. As a result, our software team did not communicate the potential issue to our management team. We regret the oversight.
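For context, the interim workaround described above amounted to an id-based lookup: a number embedded in the URL mapped back to one specific page, so the number had to be typed into every existing link. A sketch of the idea, with hypothetical ids and paths (not AIR's portal code):

```python
# Hypothetical mapping from stable numeric ids to current page locations.
PAGE_MAP = {
    101: "/resources/test-administration-manual",
    102: "/resources/secure-browser-installation",
}

def resolve(url):
    """Map a stable URL such as '/go/101' to its current page path."""
    try:
        page_id = int(url.rsplit("/", 1)[-1])
    except ValueError:
        return "/resources"  # unknown form: fall back to the generic page
    return PAGE_MAP.get(page_id, "/resources")


print(resolve("/go/101"))   # /resources/test-administration-manual
print(resolve("/go/oops"))  # /resources (malformed id falls back)
```

The drawback MDE identified is visible in the sketch: until a link is rewritten into the numbered form, it can only resolve to the generic resource page.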
• Failure to Timely Communicate Staff Changes and Effectively Train AIR Replacement Staff. Per Scope of Work (line 1.1.6), AIR must notify MDE of any staff changes. MDE retains the right to approve any changes to leads assigned to the project. A change in scope that necessitates additional staff or reduction in staff requires approval of staff by MDE. In critical areas, AIR has not followed these requirements in its implementation of MDE's contract. For example, as discussed during a conference call on August 8, 2013, and during Pre-Review meetings with AIR and MDE, MDE has ongoing concerns about receiving MCA-Modified items that are not appropriate for the population; MDE has informed AIR that the Alternate Assessment lacks content oversight. In February 2013, AIR's Jess Unger indicated he would oversee the project. On August 7, 2013, MDE notified AIR that it had not heard from Unger since March 2013. In a project call on August 8, 2013, AIR informed MDE that Kevin Minkoff was stepping in for Unger, because Unger was busy working with another state. On August 15, 2013, after MDE asked AIR to include contract staffing on its next conference call agenda, AIR's Kevin Murphy officially notified MDE that Unger was no longer assigned to the Minnesota project. Because of staffing changes in critical areas, AIR has not been able to develop a knowledge base about the Minnesota program. MDE work flow is impacted due to lack of consistency in AIR staffing and the additional training MDE must provide to new AIR staff. An effective transition plan for staffing changes has not been executed by AIR under this contract.

MDE has experienced turnover of content leads. In several cases, AIR staff members objected to working with some MDE staff members because they felt that treatment by MDE staff was disrespectful and unprofessional. At least two content leads stated that they would leave AIR rather than continue to work with MDE.

In the case cited above, AIR notified MDE of Jess Unger's departure, albeit late. AIR acknowledged MDE's right to approve new team members and submitted Kevin Minkoff's resume for their review. MDE replied that it did not oppose this change. Contrary to MDE's claim that "it had not heard from Jess since March," Jess attended the content planning meeting at MDE's offices in late April and standard setting in late June. Jess continues to provide transition support to Kevin. However, it should be noted that MDE has communicated that the MCA-Modified project would be ending, and at one point stated that it was ending. Specifically, on March 28, 2013, MDE communicated that MCA-Modified was being eliminated and no further item development was needed. This communication signaled to staff that they ought to begin searching for other internal opportunities.

• Failure to Timely Provide Requested Staffing and Contract Data (Scope of Work Section 1.2.1 and Contract). On June 27, 2013, MDE requested that AIR provide certain information with respect to the AIR staff assigned to Minnesota's contract and about the gains resulting from certain contract amendments.
Specifically, MDE requested: the percent of time staff listed on the Communications Guidelines are dedicated to the Minnesota project; their total hours attributed to the Minnesota project for the past year, as reflected in timesheets; and documentation of the gains AIR effected using the increased project management fee MDE agreed to in the February 4, 2013 contract amendment, including more effective communication with MDE and the districts, proactive planning, and improved adherence to deadlines. MDE requested this information by July 15. On August 27, AIR provided time documentation for a very limited number of project management staff. On August 28, 2013, MDE reiterated that its request was for time documentation for all staff on the Minnesota project, and requested this information by September 5. AIR did not meet this deadline. MDE requested a status update on September 6, 2013, and, to date, has received no response from AIR.

MDE claims that AIR failed to provide required information about staffing levels. However, AIR provided exactly the required information. On August 27, AIR provided employee charging data directly from our financial systems. AIR normally would not provide this information on a fixed-price contract, but provided it because of the modification specifically calling for dedicated project managers. As indicated in the email, the managers devoted 97% of their project time to Minnesota. We have clearly met our commitment.

SECTION 2--TEST DESIGN

Failure to Timely Provide Test Design Support (Scope of Work Section […]). As a result of AIR's poor scheduling and struggles designing fixed forms, MDE experienced difficulties in determining and executing the test design for the spring 2013 administration of the reading MCA. AIR also failed to initiate preliminary discussions for the spring 2014 test administration. As a result of these two things, MDE decided it was in the best interest of the program for MDE to take the lead in planning reading test design for the spring 2014 tests. Below are only a few examples of the numerous difficulties MDE experienced during spring 2013 reading test construction efforts:

• July 11, 2012--AIR agreed to provide information about meeting targets and test specifications, and never provided this information.

• August 14, 2012--Difficulties in getting the reading form construction underway led AIR and MDE to set new procedures and timelines. The result was a compressed review schedule.

• September 19, 2012--AIR submitted the first set of reading forms, for grade 3. The word counts ranged from 2,453 to 3,483 words, which exceeded MDE's initial test specifications of 2,000 words. The forms were rejected, and MDE adjusted the test specifications accordingly.

• Between October 10, 2012 and October 26, 2012, AIR submitted the following base operational forms. MDE reviewed proposed test forms and rejected forms for a variety of reasons, including lack of measurement precision at targeted cut points, forms outside of equivalence bounds on some constructed forms, and not meeting revised word count ranges in test specifications. The specific grades and forms reviewed and rejected by MDE are listed below.
  - Grade 4: [#] forms submitted; [#] forms rejected by MDE for failure to meet specifications
  - Grade 6: [#] forms submitted; [#] forms rejected by MDE.
    AIR resubmitted [#] revised forms, which were again rejected by MDE for […] concerns.
  - Grade 8: [#] forms submitted; [#] rejected by MDE
  - Grade 10: [#] forms submitted; [#] forms rejected by MDE

• January 10, 2013--MDE advised AIR of multiple errors discovered in test maps created by AIR.

• January 16, 2013--In the midst of reviewing several resubmitted complete forms and supporting documentation from AIR to address word count concerns, MDE discovered that information within the documentation about the plan to link across grades was inconsistent with previously approved versions and associated documents. The materials also included vague language about AIR's recommendations that required clarification. As a result, MDE scheduled a January 17 telephone conference with AIR.

• January 17, 2013--For the first time, AIR informed MDE of errors in AIR's implementation of the agreed-upon vertical linking design. Incorrect passages were placed on some forms. Some passages designated to be used in the vertical linking design were not placed in the proper adjacent grades. This error involved passages from grades 4, 5, 6, and 8 and resulted in significant rework on several forms across the grades. All online forms were to be finalized January 29, 2013, so the final design could be configured in the online testing system (2012-13 schedule line 3.5.6). We highlight this event because the process was pushed very close to the deadline and there was a risk that it would not be met and the spring testing window would not open on schedule.

MDE asserts that the test forms for the 2013 online reading test were far behind schedule. This is true, but the situation was directly the result of MDE's own actions:

• MDE and MDE's National Technical Advisory Committee approved a plan for forms construction, which MDE abandoned after the forms were constructed.

• After the forms were constructed, MDE introduced new and arbitrary criteria for forms construction that were outside the criteria outlined in the approved plan.

• MDE's new criteria were incompatible with the item bank, which MDE had selected earlier.

MDE's unwillingness to adhere to the approved plan and last-minute imposition of new and arbitrary constraints put the project schedules at risk and created substantial additional and out-of-scope work for AIR, which it performed in good faith and at no additional cost to the State. The new criteria added at the last minute reduced the size and robustness of the item bank, ultimately reducing the quality of the test.

AIR and MDE agreed to a two-phase test construction plan. During the first phase, a block of items would be assembled into modules; in the second, those modules would be assembled into forms according to a predetermined design.
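In outline, such a two-phase assembly can be pictured as follows. This is a schematic sketch with hypothetical structures; the actual plan specified the real blueprint constraints and linking design:

```python
import itertools

def build_modules(items, module_size):
    """Phase 1: group approved items into fixed-size modules."""
    it = iter(items)
    modules = []
    while True:
        module = list(itertools.islice(it, module_size))
        if not module:
            return modules
        modules.append(module)

def build_forms(modules, design):
    """Phase 2: assemble modules into forms per a predetermined design.

    design: one tuple of module indices per form; reusing a module index
    across forms yields the common (linking) material.
    """
    return [[item for idx in spec for item in modules[idx]] for spec in design]


# Example: 8 items -> 4 modules of 2; the two forms share module 1.
modules = build_modules(range(8), module_size=2)
print(build_forms(modules, design=[(0, 1), (1, 2)]))
# [[0, 1, 2, 3], [2, 3, 4, 5]]
```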
The plan was endorsed by MDE's Technical Advisory Committee on several occasions, and accepted by MDE. After completing the first phase of the plan, MDE abandoned the plan, instead working toward unstated requirements and arbitrary, unarticulated, or inconsistent criteria. MDE's failure to adhere to approved plans has been a continuing challenge on this program.

While implementing the approved plan, MDE failed to meet a single one of its deadlines for reviewing the reading test forms. From August 2012 through October 2012, MDE reviewed item blocks (which required approval before forms could be built). MDE did not return any of these materials on time. In several instances, materials were returned as many as 12-14 business days late. This delayed AIR's ability to build the forms. AIR adjusted the schedule on a regular basis and proposed solutions to expedite the process. However, continued delays put forms further at risk and put end dates in jeopardy. AIR was in daily contact with MDE about the increasing risk to the forms schedule. On 9/21, AIR sent the following email to MDE:

As we discussed this morning, the reading forms production work is at serious risk. We're now at the point where our end production dates are in jeopardy. We appreciate that MDE content staff throughout have worked, and continue to work, diligently to complete this task. In turn, we have tried to be responsive by adjusting the process based on MDE requests, revising the production schedule to make up for the delay, developing tracking documents like the priority list to help identify delayed deliverables quickly, and changing our communication protocols to help ensure a more efficient workflow. Unfortunately, despite everyone's good intentions, over the past 4.5 weeks, we have not received any of the forms materials on time. The schedule continues to slip and we continue to receive communications indicating additional changes to the process.

In spite of the MDE delays, AIR continued building forms based on the agreed-upon form construction criteria. As AIR was completing the second phase of the plan, MDE decided to abandon the plan completely. This action not only imperiled schedules but also increased risk to the program and lowered the quality of the test by imposing arbitrary and out-of-scope technical requirements. When MDE abandoned the plan, it added new (and arbitrary) statistical criteria on October 4, 2012, after the forms had been created and well into the forms-building schedule (see email from John Denbleyker titled Targets for 2013 Reading Forms). MDE's new statistical specifications were incompatible with the bank (which MDE had selected) and resulted in the removal of many sets of items from the bank, reducing the robustness of the item bank available for the assessment.

On September 27, 2012, after AIR had built over [#] linked and equivalent forms, MDE added a requirement establishing the total number of words a student must read on reading passages. This date was six weeks after AIR had begun building and delivering forms. The requirements for word count, which were later revised, were incompatible with the passages that MDE chose to include in the pool, with the average number of words in the passages in the pool far exceeding the average number of words set in the new requirements. This further reduced the robustness of the bank. Each of these actions, and many more like them, required substantial rework, outside the scope of the contract. AIR and our committed staff members worked diligently to meet these shifting requirements and protect the program from management-induced failure. AIR did not insist on contract modifications to cover this rework.

After receiving the first round of forms, MDE rejected nearly half of them based on word count criteria that had never been articulated. MDE apparently wanted to apply the word count requirements from a different test, despite MDE's selection of longer passages for the new bank. After carefully reviewing the bank to determine the feasibility of this request, AIR reported back to MDE that this was not reasonable or mathematically achievable. MDE wanted the total word count on grade 3 forms, for instance, to be 2,000 words or less. The average number of words per passage in the grade 3 bank is over 500 words. Each form contains six passages. As a result, most forms had a total word count of approximately 3,000 words.
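The arithmetic behind "not mathematically achievable" is easy to verify: if even the six shortest passages in the pool together exceed the cap, no six-passage form can meet it. A sketch with hypothetical passage lengths consistent with the roughly 500-word average described above:

```python
def min_form_word_count(passage_word_counts, passages_per_form=6):
    """Smallest total word count any form could achieve from this pool."""
    return sum(sorted(passage_word_counts)[:passages_per_form])


# Hypothetical grade 3 pool averaging roughly 500 words per passage.
pool = [420, 450, 480, 500, 510, 530, 560, 590]
best = min_form_word_count(pool)
print(best)          # 2890
print(best <= 2000)  # False: even the six shortest passages exceed the cap
```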
It became apparent that MDE was also applying other unstated criteria to the forms. AIR requested several conference calls with MDE leadership and team to clarify the criteria for the forms (which were in a constant state of flux) and to reach consensus on the linking design. Since the linking design approved earlier was no longer possible because of the newly introduced criteria, AIR spent many staff hours building and rebuilding forms based on changing directives from MDE. AIR made a good-faith effort to meet each successive set of shifting requirements at no additional cost to the State, despite the additional costs that these shifting requirements imposed on AIR.

MDE cannot in good faith complain about the supposed "error" in the vertical scaling forms. By January 2013, the forms schedule had been so compromised that quality assurance steps had to be compressed into a very small window of time. As a result, MDE review of the forms took place at the same time as some of AIR's internal QC of the forms. During this overlap period, AIR discovered an error in the vertical linking design and recommended a simple change to correct it. The change affected [#] out of [#] forms. The draft forms were corrected.

The difficulties MDE experienced with form construction were self-inflicted. As with many of the issues raised in the MDE letter, the problems arise from MDE's unwillingness to abide by agreed-upon plans. MDE rejected forms and then rejected the plan on which they were based.

SECTION 3--ITEM DEVELOPMENT

Failure to Maintain Minnesota's Item Bank (Scope of Work Section 3.2, highlighting 3.2.1 and 3.2.5). MDE has ongoing concerns regarding AIR's maintenance of Minnesota's item bank. These concerns range from ensuring Minnesota items are easily searchable to ensuring Minnesota items are properly categorized. Examples of contract noncompliance include, but are not limited to:

• Failure to Implement Specific Requested Corrections. MDE has compiled lists of attribute edits to the reading item bank to ensure that test items are stored with correct and complete information. Many of these edits were necessary due to errors or omissions during AIR's import that MDE identified during test construction efforts in early fall of 2012. More than 120 of the originally 132 requested edits have not been made. MDE sent lists of these items that still need attributes edited in May and August of 2013, with a request that the edits be completed prior to test construction efforts that were beginning August 8, 2013.

MDE asserts that AIR has not corrected incorrect or missing information provided by MDE when the items were imported. AIR completed the import process, and MDE deemed the work satisfactory, as evidenced by the payment of the invoice.
MDE apparently believes that some of the information provided to AIR was not imported. AIR disagrees. However, on August 9, AIR emailed MDE offering to import additional attributes if MDE could point out the location of the attributes in the files it sent to AIR and if the data was in a format that could be imported without rework. On August 29, MDE declined to accept this offer.

• Failure to Ensure Item Bank Import Correct and Complete. Associating test items with correct and complete information is critical to the test development process. Because of AIR's failure to enter test item edits in a timely manner, the test construction process for 2013-14 has been complicated. Specifically, MDE staff have been forced to search through historical and other outside documentation to identify missing information that, if edits were complete, should be stored with the items in the item bank.

  - November 28, 2012--MDE staff met face-to-face with AIR staff to discuss content issues. One of the topics discussed in detail was missing information that should be associated with MDE's items stored in AIR's item banking system and how it adversely affected MDE's ability to complete work efficiently and completely. The minutes of this meeting included 23 action items for AIR that were specifically related to this topic. In the edited minutes of this meeting, which MDE returned to AIR on December 27, 2012, MDE requested follow-up to many of these action items by January 18, 2013, and reiterated the need for follow-up during 5 conference calls in January and February 2013. To date, 10 action items remain unresolved.

  - November 29, 2012--MDE Statewide Testing leadership clarified expectations and Scope of Work requirements (3.2.2, 3.2.3, 3.2.4) with AIR's leadership regarding the requirements of item attributes in the item banking system.

  - February 10, 2013--After months of no follow-up on this issue, MDE sent AIR a spreadsheet with MDE's desired review levels for individual test items.

  - March 20, 2013--AIR acknowledged receipt of the spreadsheet and stated work would begin.

  - March 25, 2013--AIR submitted revised item review levels and a plan for completion of the work. MDE continued discussions with AIR over the next six weeks, including setting forth specific expectations regarding item bank attributes. MDE specified that item attributes were to be identified and corrected within the item banking system in time for test construction to begin in August.

  - May [#], 2013--MDE provided feedback to AIR on the plan for updating item bank attributes. At that time, AIR indicated the work should be completed by June 28, 2013.

  - Early June 2013--MDE asked for a prioritized schedule in an attempt to make sure item bank work was completed in time for test construction, which originally was scheduled to begin on May 20, 2013.

  - June 6, 2013--AIR provided a schedule and indicated the work might be completed around the end of July.

  - July 11, 2013--AIR activated new item review levels.

  - July and August 2013--MDE and AIR discussed updated attributes and insertion of missing data during weekly phone calls.
To date, significant test item development concerns remain, including: test item data is still missing or inaccurate on approximately 6,000 items; rationales are attached to the wrong items or missing in at least 62 cases; and item and passage level attributes needed for test construction have not yet been updated, even though the levels were available as of July 11, 2013. Despite MDE's repeated requests for such work since November 28, 2012, this work has not been completed. This has caused additional work by MDE staff and impaired test construction.

AIR imported and performed a quality check of all the item metadata that it received from the previous vendor. In some cases, metadata from the previous vendor was incomplete or inaccurate. While the scope of the contract does not include correcting errors in the MDE-supplied item bank, AIR has been working with MDE to resolve these issues at no additional cost to the State. AIR began with the attributes that are the most critical for test delivery. In some cases, this process took longer because the data were not delivered in any well-structured form that could be imported. In some cases, the "missing" data was not missing at all; it was provided by the previous vendor and reviewed by MDE content staff who wanted to change it. In MCA Reading, for instance, MDE content staff has asked to re-classify passages or realign items. This work has been clearly out of scope, but AIR has tried to fulfill these requests as quickly as possible without compromising other important tasks, such as test construction, test delivery, and item development.

However, AIR imported everything that it received from MDE. If attributes are still missing or incorrect, it is because they were missing or incorrect in the files AIR received. It is a challenging task to research the metadata (which AIR received in the form of 83 different files) and compare it to tens of thousands of attributes in ITS. It is true that this has been a very time- and labor-intensive process. MDE has mentioned on numerous occasions that there are still missing or inaccurate attributes. If MDE can provide a list of these, along with the correct attributes, we can import them into ITS.

Failure to Implement Minnesota Item Bank Functionality (Scope of Work Section 3.2, highlighting 3.2.2, 3.2.3, and 3.2.12). During contract negotiations in June 2011, MDE revised the Scope of Work to permit AIR to delay some item bank functionality until March 2012. Even with this additional negotiated time, there are several aspects of the Scope of Work related to AIR's item banking system that have not been met. Additionally, the current quality of the item bank is not satisfactory. Examples include, but are not limited to:

• Despite the contract requirement that MDE be able to filter or search items by administration year and item statistics, this functionality does not exist in AIR's item bank (Scope of Work 3.2.2). This requirement was discussed on November 28, 2012, in a meeting with AIR staff. AIR does not project any functionality to search items by item status until some time in winter 2014, and even then, searchability may only be partial.

MDE would like to be able to search AIR's item bank in ways that are not currently supported.
AIR agreed to extend the functionality of the item bank to support this request, and that agreement was added to the scope of work. AIR and MDE were to meet March 7-8 to discuss the solution and gather requirements. That meeting was canceled due to inclement weather. AIR is happy to reschedule.

• The Scope of Work requires AIR's item bank system to allow construction of customized item cards (Scope of Work 3.2.12). In numerous conversations, including meetings in November 2012 and April 2013, MDE explained its need for an item card on a single page, especially for technology-enhanced items. In June 2013, AIR notified MDE that this functionality would not be available in 2013-14, despite previous communications to the contrary. Instead, AIR stated this feature would be "Under Research and will look to add it in the future." To date, AIR still has not provided MDE with this contractually required item.

The contract calls for customized item cards, and AIR does deliver customized item cards. The contract does not create an unbounded obligation to provide functionality beyond that which is currently configurable in AIR's systems. AIR's item cards are configurable and can include any information stored as attributes in ITS. Technology-enhanced items are designed to be displayed online and viewed/approved by MDE using the ITS web preview feature. Of course, the technology-enhanced items, which include simulation item types, render differently when printed on an item card. Each year, AIR adds many new enhancements and features to its online applications free of charge to its clients. An enhancement that would improve the representation of TE items when printed on an item card was planned; however, this out-of-scope and low-priority enhancement was not adopted by AIR in 2013-14.

• On August 14, 2013, during activities related to test construction, MDE discovered that AIR had replaced attributes of several Mathematics MCA items in the Minnesota item bank with Reading MCA-Modified item attributes. On August 23, 2013, AIR explained the cause of the error but did not indicate a date when it would be corrected. MDE received confirmation on September 5, 2013 that the item attributes had been corrected.

This is true. AIR confirmed that attributes were temporarily mis-associated in ITS and took steps to quickly resolve the issue. We apologize for the inconvenience it caused.

SECTION 4--TEST CONSTRUCTION

Putting Test Construction for 2013-14 Administrations at Risk (Scope of Work Sections 4.1.4 and 1.1.11.2). MDE is concerned about AIR's lack of progress on test construction. In addition to the problems noted elsewhere in this letter that impact test construction tasks (item bank maintenance and exposure rates), AIR has not finalized the schedule for test construction. The Scope of Work clearly states that schedules for test construction should be complete by February and approved before the start of any tasks. The timelines in developing the schedule are summarized below:

• January 28, 2013--MDE received the first draft schedule from AIR. This draft was 3,885 lines and 53 pages in length.

• February 15, 2013--MDE provided feedback, including extensive edits to the test construction schedule.

• February 26, 2013--AIR provided a revised schedule, which was discussed at a face-to-face meeting.
• March 4, 2013--AIR provided a revised schedule; this version did not include updates to test construction.
• April 18, 2013--The 2013-14 test administration schedule was moved from a standing item to "Topics with No Updates" on weekly Cross-Project Call agendas.
• April 24, 2013--The schedule was updated; this version still did not include revisions to test construction.
• April 25, 2013--A draft test construction schedule was sent to MDE in an email to begin discussion of dates and durations.
• September 4, 2013--Conference call with MDE and AIR to discuss paper test construction for MCA reading and mathematics. Dates were finalized for the paper forms. Initial dates for online test construction for MCA reading and mathematics were discussed. MDE instructed AIR to baseline the MCA reading and mathematics portions of the test construction schedule. This is six months behind schedule for paper tests, and although AIR has now included the dates for online test construction, they have not been finalized.

MDE asserts that it received a satisfactory test construction schedule six months late. In fact, MDE received a schedule early, and has continued to revise it. Many of the revisions resulted from late policy decisions on the part of MDE. For example, MDE had intended to deliver an adaptive reading test. On April 17, MDE made the far-reaching decision to deliver the online test using fixed forms, which introduced the need to construct online forms. Other late MDE decisions include an April decision that MDE would construct forms in-house and a May decision [...]. It is true that the schedule has been in flux; however, that flux has been caused by ever-changing decisions on the part of MDE. Our team has diligently tried to keep the schedule in line with the changing plans. Below, we provide additional details and dates about the schedule review process:

• January 28--AIR provided first draft of schedule to MDE, at 3,885 lines.
• Feb 15--AIR received first round of feedback, the 2013-2014 Important Dates sheet (detailing test windows and milestone dates), and MDE's Materials Development Stages documentation.
• Feb 26--AIR provided second version to MDE per previously agreed-upon schedule, at 3,623 lines. This version included updates to the Item Development, Test Construction, and Test Materials development sections, among other section edits, and indicated concerns about meeting vendor Approval-to-Print dates with current task projections.
• March 4--AIR received MDE's second round of feedback, including Test Construction section edits.
• March--AIR provided third version to MDE per previously agreed-upon schedule, with responses to MDE's first round of feedback, at 3,601 lines.
• March 25--AIR received MDE's third round of feedback, including Test Construction section edits.
• April--AIR provided updates to the master schedule, including responses to MDE's second and third rounds of feedback. With this revision, AIR asked for dates in the following week that MDE could meet for a focused schedule call to begin to bring closure to the issue. An alternate proposal included adding topics to the current week's Content call agendas.
• April 10--AIR received MDE's fourth round of feedback, including Test Construction section edits.
• April--AIR received MDE's 2014 Reading Operational Test documentation, detailing tasks and durations for Reading MCA test construction and indicating that grade 11 math MCA should follow this plan as well.
• April--AIR provided an updated version to MDE, including revisions based on the Reading Operational Test documentation, at 3,762 lines.
• April 29--MDE provided feedback on the Test Construction section.
• May--Test construction schedule discussed at weekly content call, per meeting minutes.
• June 30--AIR provided an updated version to MDE and posted the file on Knowledge Tree for better tracking.
• July 3--Test construction schedule discussed at weekly content call, per meeting minutes.
• July--AIR provided an updated version to MDE, including Test Materials development updates.
• July--MDE approved recycling grades 3-8 math MCA paper forms.
• July 23--AIR provided extensive feedback and schedule revisions (including grades 3-8 math MCA TAC feedback), with proposals on how to make outstanding materials meet the vendor's Approval-to-Print dates. These revisions addressed issues from MDE's feedback on 4/29 and weekly content call discussions.
• Aug 6--MDE provided additional edits on the Test Construction sections.
• Aug--AIR provided an updated version to MDE, incorporating MDE's feedback on 8/6 as well as feedback provided during the weekly content calls.
• Sept 4--Conference call with MDE and AIR to discuss outstanding questions on Item Development and Test Construction schedules. AIR made live edits while sharing the computer screen with MDE. AIR prepared the baseline of the Test Construction section per MDE's request. AIR provided an updated version of the master schedule to MDE.

SECTION 5--ONLINE ADMINISTRATIVE SYSTEM AND TEST ENGINE

Failure to Finalize Mathematics Adaptive Algorithm for 2013-14 School Year (Scope of Work Sections 5.5.1 and 5.6.6). Under the schedule (Section 4.4.2.7), AIR was to finalize the mathematics adaptive algorithm for the 2013-14 test administrations by August 27, 2013. AIR failed to meet this deadline. MDE was not able to review adaptive simulations for Mathematics OLPA until September 3, 2013, for grade 3. On September 4, AIR informed MDE the simulation results for three more grades would be available at the close of business on September 5. AIR failed to provide the promised material on September 5 and failed to contact MDE about any delay. On the morning of September 6, when no additional simulation results had been provided, MDE requested an update. AIR provided simulation results for all grades that afternoon. The project timeline originally set MDE approval for the adaptive algorithm as August 30. AIR twice revised this deadline when it was unable to deliver the simulations on time, first to September 9 and then to September 12, 2013. This compressed review schedule jeopardizes the opening of the OLPA testing window and compromises MDE's ability to thoroughly review the simulation results and request appropriate changes. These concerns are a continuation of, and directly tied to, AIR's failure to satisfy other portions of the contract discussed in this letter, including item exposure, item bank maintenance, scheduling, and project management.

• Moreover, the mathematics MCA adaptive algorithm work was scheduled to begin on July 30 and end on August 27 (Project Schedule 4.4.2.7.2).
On September 16, 2013, AIR inquired about finalizing the pool, and MDE has not received any adaptive simulation results.

Here, MDE asserts that it has not received results of simulations of the performance of the adaptive algorithm for mathematics. However, as MDE knows, simulations cannot be conducted without knowing the item pool; MDE's actions have caused serious delays. The results of the simulations depend entirely on the pool of items available to the algorithm. MDE's failure to make timely decisions about which items to include in the pool makes it impossible to run meaningful simulations.

To begin simulations for the online adaptive systems, an item bank must first be defined. To that end, AIR has proposed a set of procedures for evaluating items in the accountability bank each year and identifying items for migration to the OLPA item banks. In spring 2012, AIR implemented a misfit statistic to evaluate items for potential drift so those items could be flagged for recalibration and placed in the OLPA item bank. MDE expressed concern that the item fit statistic was not ideal in the context of the 3PL IRT model, so AIR invested significant effort in summer 2013 to develop and implement an item fit statistic that could be used to evaluate item misfit in the context of an adaptively administered test (a generic sketch of this kind of drift screening appears at the end of this section). AIR provided MDE with the analysis of the misfitting items, as well as item exposure rates, and proposed a set of items for recalibration and migration to the OLPA item banks. MDE said that they wanted additional time to evaluate the misfit statistic and decided not to migrate any items to the OLPA bank, which nominally meant that AIR could begin to run simulations.

However, MDE has also decided that continued post-equating of the paper math forms, which AIR recommended against, is not feasible, and decided to move to a pre-equated design for scoring and reporting paper test forms. AIR provided MDE with recommendations for how paper forms could be constructed and scored for future test administrations. This is another example of an unanticipated consequence of MDE decision making that has forced AIR to work hard to accommodate it. Although AIR wants to accommodate the desires of its clients, MDE must share some responsibility for the compressed timeline for receipt and review of simulation results.

AIR is still waiting for MDE's feedback on the analysis of misfitting items, and for direction on whether those items will be administered as part of the 2014 MCA assessments; only then can simulations of the spring assessments begin, and the configuration of the adaptive algorithm be finalized thereafter. MDE decisions, actions, or lack of actions have schedule impacts on final deliverables. Simulations cannot begin until the item bank is finalized. MDE continued to delay decisions needed to finalize the item bank. Revising decisions consumes extra time and affects the timeline.
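The memo does not specify the misfit statistic AIR implemented. Purely to illustrate the general technique of screening banked items for drift under a 3PL model, here is a minimal, hypothetical sketch: observed performance is compared with model-predicted performance within ability strata, and items with large standardized residuals are flagged for recalibration before migration to a reusable pool. The function names, stratification, and threshold are assumptions, not AIR's procedure.

```python
# Hypothetical illustration of item-drift screening under a 3PL IRT model,
# not AIR's proprietary misfit statistic (which the memo does not describe).
import numpy as np

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response at ability theta."""
    return c + (1.0 - c) / (1.0 + np.exp(-1.7 * a * (theta - b)))

def flag_drifting_item(theta, responses, a, b, c, n_strata=10, z_crit=3.0):
    """Compare observed vs. model-predicted proportion correct within ability
    strata; return True if any stratum shows a large standardized residual."""
    edges = np.quantile(theta, np.linspace(0, 1, n_strata + 1))
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (theta >= lo) & (theta <= hi)
        n = mask.sum()
        if n < 30:                      # skip sparse strata
            continue
        p_hat = responses[mask].mean()  # observed proportion correct
        p_exp = p_3pl(theta[mask], a, b, c).mean()
        se = np.sqrt(max(p_exp * (1 - p_exp) / n, 1e-9))
        if abs(p_hat - p_exp) / se > z_crit:
            return True
    return False

# Toy usage: simulate an item whose true difficulty has drifted from its
# banked parameter (b=0.6 in the data vs. b=0.1 in the bank).
rng = np.random.default_rng(0)
theta = rng.normal(size=5000)
responses = (rng.random(5000) < p_3pl(theta, a=1.2, b=0.6, c=0.2)).astype(int)
print(flag_drifting_item(theta, responses, a=1.2, b=0.1, c=0.2))  # likely True
```

The dependency AIR's response describes is visible even in this toy version: the check cannot be run, and no simulation built on its output can begin, until the pool of items and their banked parameters have been fixed.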
Failure to Provide Complete Documentation for 2013-14 School Year Online Administrations (Scope of Work Sections 5.6.1 and 5.6.7). AIR was late in providing documentation related to load testing and proactive steps that could be taken to prepare for the 2013-14 administrations. It was due on August 6, 2013, and was not provided until August 8. Moreover, the documentation AIR provided was lacking critical technical details, such as load simulator specifications, as well as exact dates of load testing. On August 29, 2013, MDE provided feedback and asked AIR to revise the document by Monday, September 9. AIR provided an updated document Thursday, September 12. Finalizing this document is important to communicate load testing opportunities to school districts, as well as to allow for any adjustments that need to occur at AIR or the school district level.

MDE requested information about AIR's load-testing plan, and AIR provided those details. This planning document is not called for in the SOW, and was provided in a spirit of collaboration. The document did not lack any critical technical documentation. A subsequent request by MDE sought proprietary technical information that is not relevant to the load-testing activity. Since MDE has no contractual basis to request such proprietary information, AIR will not provide it.

SECTION 6--MATERIALS PRODUCTION

Inadequate Development of Manuals for 2013-14 (Scope of Work Sections 6.3.2 and [...]). Though MDE has explained several times to AIR that manual directions and screenshots must be accurate and consistent with AIR's current systems, AIR continues to present manuals to MDE for final approval without confirming that manual information matches the systems.

• Failure to Provide High-Quality Draft Materials. MDE's role at the Blackline 1 stage should be to review content. The Scope of Work (line 6.3.2) states the vendor will use experienced and skilled writers to develop manuals or new procedures. These documents are to reflect the current testing processes, and their development should include input and review by the vendor's Project Management staff who are most familiar with the Minnesota project. AIR's draft manuals repeatedly contain errors, which AIR could prevent with simple fact-checking. For example, MDE recently reviewed AIR's Blackline 1 copy of the GRAD Test Monitor Directions for Online Administration. While reviewing for content, MDE found several buttons/steps that did not match the AIR systems. AIR indicated that the button cross-references in the first draft were "mostly" correct. This check should have been completed prior to review of Blackline 1, and the first draft should have been consistent with AIR systems. MDE was forced to spend time pointing out cross-reference deficiencies instead of focusing on program content. During their first draft review of the GRAD Test Monitor Directions (three separate documents), MDE noticed that the incorrect versions had been used as the basis for the first draft and notified AIR.

A staff member incorrectly archived final copies of the 2012-13 documents. AIR acknowledged the regrettable error and immediately worked to create new drafts of the GRAD Test Monitor Directions. We note that subsequent drafts of the GRAD Test Monitor Directions went smoothly, and all three documents were completed approximately three weeks ahead of schedule, so the impact of this error was minimal.

• Failure to Provide Systems in a Timely Manner to Release Documentation. MDE also has experienced problems with AIR's development of test User Guides. MDE must approve the test User Guides before the testing system is made available to school districts.
The schedule for developing the Guides was designed so that the Guides would be finalized in time to post them on the same day the system is opened to school districts. Because of MDE's ongoing concerns with AIR's provision of inaccurate screenshots (described above), MDE does not have confidence that the User Guides will be accurate. As a result, MDE has insisted upon seeing the final screenshots in the system before approving these documents, which puts the posting date in jeopardy.

In this comment, MDE asserts that it would like systems available earlier than they are scheduled to be completed. The driving consideration is, apparently, having early screenshots for manuals. However, screenshots cannot be generated for a system that is not complete. AIR cannot change its development schedule. However, MDE has the option of not updating the system used in Minnesota until a year later. In a few instances, MDE has made this decision. For example, all of our other clients are now supporting iPads and Chromebooks (which involves changing screenshots in the manuals), and MDE has declined that upgrade.

SECTION 10--SCORING

Limitations of AIR's Scoring System and AIR Quality Control of Responses (Scope of Work Sections 8.2.4 and 10.2.3). In 2012, MDE made changes to the scoring rules for some mathematics online technology-enhanced items. Some of these items also were included on the paper forms in spring 2013. During MDE's scoring verification process, it was discovered that AIR carried the 2012 changes to the online scoring rules over to all paper administrations. Because of limitations in AIR's scoring system that failed to recognize all correct paper answers, AIR initially scored some spring 2013 paper tests incorrectly. AIR had not made MDE aware that its scoring system worked this way. To ensure accurate scoring, MDE was forced to capture all correct student responses, which required manual processes by AIR. This delayed reporting results from the originally scheduled June 26, 2013 to July 11, 2013. Test construction activities and schedules for 2013-14 have been impacted. As a result of the limitations of AIR's scoring system, MDE has been forced to remove any items from the paper tests that AIR's system is not able to electronically score correctly, which took additional time.

This comment, addressed earlier, is not well founded or fair. MDE specified the rules to be applied when scoring items. AIR implemented those rules with fidelity. MDE was unsatisfied with the implications of those rules and wanted to change them, which resulted in cost and schedule impacts. This topic was previously addressed with the department in a memo from Dr. Cohen to Ms. Montano on June 21, in response to Ms. Montano's memo sent on June 20. It remains uncontested that AIR's system applied the agreed-upon scoring rules. MDE's late decision to change the scoring rules resulted in schedule delays.

The problems MDE sought to address were not only foreseeable, but were foreseen. AIR recommended against including symbols on the paper test form that were not available on the online form. Against AIR's guidance, MDE continued to include symbols on its answer documents that could be selected by the student. The same option was not provided to online students. This is not a limitation of AIR's scoring system; it is a direct result of MDE's decisions and direction, for which MDE must accept responsibility.
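To make the dispute concrete, here is a hypothetical sketch of the kind of mismatch at issue: if the paper answer document accepts response forms the online item never allowed (for example, a fraction where the online item only offered a decimal), a scoring key carried over from the online rules will reject correct paper answers unless responses are normalized first. The code below is an illustration under those assumptions, not AIR's scoring engine.

```python
# Hypothetical illustration of the paper/online scoring mismatch described
# above, not AIR's actual scoring engine: equivalent numeric responses are
# reduced to one canonical value before comparison with the key.
from fractions import Fraction
from typing import Optional

def normalize(response: str) -> Optional[Fraction]:
    """Reduce a numeric response to a canonical value, or None if unparseable."""
    text = response.strip().replace(" ", "")
    try:
        return Fraction(text)            # accepts "1/2", "0.5", ".5", "3"
    except (ValueError, ZeroDivisionError):
        return None

def score(response: str, key: str) -> int:
    """Score 1 if the response is mathematically equivalent to the key."""
    got, want = normalize(response), normalize(key)
    return int(got is not None and got == want)

# A key stored as "0.5" (the only form the online item allowed) still credits
# the paper responses "1/2" and ".5" once both sides are normalized.
assert score("1/2", "0.5") == 1 and score(".5", "0.5") == 1
assert score("0.6", "0.5") == 0
```

A key built without such normalization exhibits exactly the behavior the letter describes: the online rules score online responses correctly while rejecting mathematically correct paper responses.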
SECTION 11--STANDARD SETTING

Inadequate Quality Control of Materials (Scope of Work Section 11.2.7). The materials AIR presented to Standard Setting panelists were not properly reviewed to ensure accuracy of cut score recommendations. During the MTAS Anchor Grade Articulation Meeting, incorrect data were presented twice to panelists. Specifically, panel scores were not correctly summarized for group discussions. Fortunately, panelists and other AIR staff realized the errors before they could have impacted the panel's final recommendation to the commissioner. However, the lack of quality control during such an important process is concerning.

In this comment, MDE refers to an incident in which incorrect information was presented to panelists during standard setting, information that was identified as incorrect by AIR staff and corrected immediately, with no consequences. Furthermore, formal quality control procedures were in place that would have caught and corrected the error later, had AIR not corrected it.

MDE accepted the increased risk of issues arising in standard setting by insisting on a compressed schedule. Both AIR and MDE's Technical Advisory Committee recommended allowing more time for the standard setting. MDE was apprised of the risks of a compressed schedule by both AIR and MDE's technical advisors. Hence, MDE's insistence on the shorter timeline constituted an acceptance of this risk. AIR's effective quality assurance mitigated the risk and avoided any consequences of a human error made during the standard setting.

AIR recommended that standard setting meetings for reading, including MCA (grades 3-8, 10), Modified (grades 5-8, 10), and MTAS (grades 3-8, 10), take place over a two-week period to ensure that there was sufficient time to carry out all activities thoroughly and accurately. MDE's TAC also recommended that the standard setting activities not be forced into a single week, and they expressed concern that too much was being scheduled into too short a time frame. In particular, insufficient time was scheduled between the conclusion of round activities and the convening of table leaders for moderation. In AIR's expedited effort to produce graphs for the moderation meeting, the graphs read data from a previous round rather than the current one. We note that QC procedures being conducted during the moderation meeting also identified the mistake, so panelists would have been provided the correct information even if the mistake had not been identified by AIR staff in attendance. MDE knowingly increased the risk of an error by insisting on an unreasonably short time frame, notwithstanding contrary technical advice. Although this short-lived error occurred, AIR managed to deliver the standard settings without errors in the standards.

SECTION 12--REPORTING

Late delivery of the Student Detail File for Reading MCA (Scope of Work Section 12.3.1). The Project Schedule states that AIR will provide student detail files for individual tests to MDE no later than July 31, and because of the critical nature of these files for producing final reports to the districts on the publicized date, the contract permits MDE to seek liquidated damages ($10,000 per day) if this deadline is not met.
The Reading MCA SDF was due to MDE on July 31, 2013. On July 24, AIR confirmed the file would be delivered by the close of business on July 31. It was not delivered until August 3, 2013. There was a regularly scheduled conference call between MDE and AIR on July 30 to discuss tasks related to scoring and reporting, yet AIR did not mention that the SDF file would be late. MDE was not made aware of the potential delay in receiving the files until the morning of July 31. The file was more than two days late, which caused significant overtime work by MDE staff (and extra overtime costs to MDE). The delayed delivery of this file put the department's credibility at risk if scores were not reported on the announced date. As noted below, MDE reserves the right to enforce all liquidated damages provisions in the contract.

This compressed timeline on MDE's part introduced additional risk. MDE delayed the processing of the reading data by failing to make a decision about which students to include or exclude in the analysis. MDE halted all analyses due to concerns about impacts related to item load times students experienced during the online administrations on target test administration dates. AIR demonstrated that item parameter estimates were not impacted by the inclusion or exclusion of students participating on the affected dates. Yet decisions about what students to include in the final calibrations were delayed by MDE for several critical weeks. A subsequent independent analysis by [...] supported the AIR conclusion. The impact of the MDE decision to delay processing was that AIR quality assurance procedures were pushed back against the delivery date. AIR's final quality assurance procedures found problems, which required several days to correct and confirm. Again, MDE must accept this sort of schedule risk when it causes such delays.

SECTION 13--ANALYSIS AND SUPPORT

Late delivery of the Lexile Scores for Reading MCA (Scope of Work Sections 13.1.11 and 2.3.7). According to the project schedule developed by MDE and AIR in early 2013, AIR was to provide its initial analysis of student Lexiles by June 12, 2013. AIR did not provide any Lexile information until July 29, 2013. AIR's complete initial analysis of student Lexiles was not provided until August 7, 2013. As a result of this delay, Lexile scores were not included in the initial data file provided to districts. Incorporating this information afterward caused additional work on the part of MDE staff and districts.

Late delivery of Lexile scores to MDE was entirely driven by MDE decisions and actions:

• MDE directed AIR to delay the analysis; and
• At the last minute, MDE abandoned the approved plan for conducting the analysis.

All analyses associated with the reading assessments were originally scheduled for completion prior to conducting the standard setting workshops at the end of June 2013. However, MDE halted all analyses due to concerns about impacts related to item load times students experienced during the online administrations on target test administration dates. Even though AIR demonstrated that item parameter estimates were not impacted by the inclusion of students participating on the affected dates, decisions about what students to include in the final calibrations were delayed by MDE for several critical weeks.
Further delays by MDE in reviewing and signing off on ordered item books for standard setting not only placed in jeopardy the successful execution of the standard setting workshops, but made it impossible to maintain any other analysis deliverable related to the reading assessments. We note that MDE has made no further progress on the development of the vertical scale beyond what was executed by AIR following the TAC-approved plan.

When MDE turned its attention to the Lexile analyses, analysis plans that had been reviewed by the TAC on multiple occasions were abandoned, and decision-making was started anew. This is counterproductive, precipitates schedule slippage, and unfortunately is characteristic of how the program was managed by MDE.

Late delivery of the 2012 Technical Manual and Yearbook (Scope of Work Section 13.3). According to the Project Schedule (Section 10.2), the draft 2012 Technical Manual was to be provided to MDE by October 2012. To date, the manual and yearbook still are not posted to the MDE web site. AIR's 61-page draft was provided on January 14, 2013. MDE provided feedback on the draft and met with AIR in April 2013 to go over the material. At that meeting, MDE indicated that AIR's draft was inadequate and needed to more closely follow the 2011 Technical Manual (230 pages), which MDE provided to AIR in April 2013. The most recent delivery date, proposed by AIR, was September 6. MDE received an updated draft of the 2012 Technical Manual Thursday, September 12, 2013. MDE is required to post all technical reports associated with the program in a timely manner.

AIR submitted 2012 Yearbook results to MDE in December 2012, as well as a draft technical manual in January 2013. During the January 9 kickoff meeting, AIR, MDE, and [...] discussed a plan for moving forward with Yearbook analyses. As the meeting minutes show, MDE was tasked with providing specifications for the additional analyses they had requested. The next communication that AIR received from MDE concerning the Yearbook was a March 8, 2013, phone call in which MDE directed AIR and [...] to work with MDE staff to develop specifications for additional analyses requested by MDE. AIR posted materials related to the call on March 12. On March 21, MDE stated that they would not be able to review the material until the following week, and finally provided feedback on April 2 (see emails from John Denbleyker titled Re: Call to discuss Yearbook on 3/21/13 and 4/2/13).

The delay in the technical manual results directly from MDE's failure to adhere to established plans and from its practice of reacting to completed deliverables with new and often unclear requirements. The planned format for the technical manual was presented in our original proposal and known to MDE for years before the requested change in format. MDE requested a different format three months after the complete technical manual was delivered. This late decision led to significant additional, out-of-scope costs to AIR; nonetheless, we made the changes at no additional cost to the state.

Conclusion

In conclusion, we believe that MDE and the students of Minnesota would benefit from better planning and an adherence to process and standards. The current situation allows individual MDE staff members to make and change decisions without accountability. From AIR's
perspective, this increases both risk and costs, and puts AIR at risk in terms of both our finances and our mission. For these reasons we chose not to bid on your testing program.

Please be assured that AIR will continue to work diligently to implement the complete scope of the contract and to provide reliable, accurate, and timely testing services for the remainder of the contract. We will also stand by our commitments to support your transition to a new contractor. To help ensure the success of your program, I recommend that our teams work together to identify the milestones on the critical path to this year's successful test delivery, and make the meeting of those milestones our highest priority. Our Chief Operating Officer, Steve Kromer, and I are available to work directly with you to help see that the milestones are identified and adhered to.

Attachment: Reading and Math Meeting Agenda 2013-2014

Date/Location: 8/14/13, 3:00-4:00 p.m. EDT (2:00-3:00 p.m. Central). Conference Call Number: 800-503-2899, Code: 4035939.
Project Managers: Kayla Convery, Rivka Gates

Attended:
MDE: Angie Norburg, Dave Onsrud, Diana Moore, George Henly, Jennifer Dugan, Johnny Denbleyker, Julie Nielsen-Fuhrmann, Kyle Weaver, Linda Sams, Margarita Alvarez, Patricia Olson, Rosemary Heinitz, Sarah Schroyer, Tony Aarts
AIR: Beth Ayers, Catherine Kugler, Evelyn Chester, Hagit Sela, Jess Unger, Kayla Convery, Kevin Murphy, Rivka Gates, Stephan Ahadi, Wendy Pickett, Yuan Hong, Naomi Lang, Peggy Holland, Kristina Swamy, Joshua Smith, Meredith Durgin, Angie Karn, Ben Leer, Dave Payne, Laura Bollinger, Neil Athmann, Pete Tressel, Tom Boatman, John Koby
Names in bold are the staff that attended.

Agenda items (topic / desired outcome(s)):
• Item Samplers -- Discussion
• Lexiles -- Update
• Update on Action Items from Last Week -- Standing. AIR provided an update on the status of ITS on Friday, 8/9. Operational item lists for math will be delivered to MDE by the end of the week. Operational item lists for reading will be delivered to MDE by noon on 8/8/13; an update on math will be provided at the same time. AIR will propose a schedule for deployment of the revised Reading MCA item samplers. AIR will update the project schedule to incorporate MDE's revised dates for the Reading new item review panels. AIR will send a list of math items that were previously operational to MDE for review. MDE will complete this review within a week.
• ITS Update
• Test Construction Update
• Deliverables
• GRAD Topics -- Standing
• New Item Review Panels -- Discussion
• Finalizing item pool -- Discussion
• Finalizing adaptive algorithm for 2013-2014 OLPA and MCA -- Discussion
• Spring 2014 grades 3-8 paper forms -- Discussion

Upcoming Meetings/Events:
• Reading New Item Reviews (grades 3-6) -- September 30-October 1
• Reading New Item Reviews (grades 7-8 and 10) -- October 2-3

Confidential