Monday, May 18, 2020 at 18:30:25 Eastern Daylight Time

Subject: Re: Final CNN request for comment re: Worldometer story
Date: Monday, May 18, 2020 at 4:58:09 PM Eastern Daylight Time
From: Jill Rosen
To: Hernandez, Sergio, Douglas Donovan
CC: McLean, Scott, Perez Maestro, Laura

Attribute to a Johns Hopkins University spokesperson:

At Johns Hopkins more than two dozen people from multiple divisions – the Center for Systems Science and Engineering, the Applied Physics Lab, the Bloomberg School of Public Health, and the library’s data services, and others – have been working around the clock every day making sure the site's automated data feeds are accurate and up to date so that policy makers and the public have a free and open resource for tracking the virus.

Everything below can be attributed to Lauren Gardner:

How did JHU make the initial decision to use Worldometer as a data source for its Covid-19 data? What was its selection criteria and vetting process?

All sources we use in the dashboard – and there are dozens – are either primary health authorities or data aggregation websites that provide sources for their data that can be validated. Before incorporating any new source, we validate their data by comparing it against other references. It is important to recognize there are differences in reporting across sources because reporting varies across (and even within) countries.

For some countries we choose to report data differently than other data aggregation sites, and for these countries we source our data directly from the relevant primary health authority for that country/region. This includes, but is not limited to, data for Spain, France, Germany, Australia, US, China, Hong Kong, Macau, Taiwan, Kosovo, Serbia and the West Bank and Gaza.

France is one example of the significant challenges in reporting. We source our data directly from the French Health Ministry and relevant regional health ministries.
This data includes (confirmed and probable) cases, deaths, and recoveries in overseas regions and “departments” such as Guadeloupe and Mayotte, as well as a breakdown of cases from nursing homes. The departments are included as separate entries so our numbers have to be corrected to avoid double counting. This distinction separates ours from other data aggregation sites. The logic is discussed in depth at this link on our GitHub repository.

Which specific data points does JHU rely on Worldometer for?

We try not to use a single source for any of our data. We use a wide collection of data sources to ensure the best-level of consistency and accuracy as possible. We use reporting from public health agencies and sources of aggregation to cross-validate numbers.

We understand that JHU scrapes Worldometer and other sources for new data. Once sources have been scraped and new data identified, is the data automatically published to the ArcGIS dashboard or GitHub repository, or does a human validate it first? (If so, what is that validation process?)

We scrape data from multiple sources, but before data is published in the dashboard it is processed through a two-stage anomaly detection system we have developed specifically for this dashboard. The first layer alerts us to moderate changes in any updated value, so we can double check them in real time. A second more conservative layer holds back updates that exceed a certain pre-defined threshold. This second layer of the anomaly detection system requires a human to manually check and approve the values before publication to the dashboard. Any anomalies are then followed up and addressed immediately.

You have referred to a “process of continual cross-checking and refinement.” Can you provide more detail on what that process is?

We regularly compare our dashboard data against reporting by independent sources such as WHO, and validate the data trends for each country and state.
An example of this is illustrated in the Lancet Inf Dis publication detailing the dashboard in its early stages. This same approach has been in place throughout, and presented in multiple (recorded) presentations I have given on the architecture of the dashboard. When inconsistencies between sources are identified we investigate them, and if warranted, we self-correct the github timeseries files, and document any change in the github under our ‘data modification records’.

RE: JHU has declined to say what specific data points it relies on Worldometer for, but issues with the counter site’s data have caused at least one notable error: On April 8 JHU’s global tally of confirmed Covid-19 cases briefly crossed 1.5 million before dropping by more than 30,000. JHU told CNN at the time, that the error appeared to come from a double-counting of French nursing home cases. Johns Hopkins’s figure appeared to come directly from Worldometer, which did not cite a source for its tally of cases in France.

-- The drop in cases was explained in April here, on our Github: https://github.com/CSSEGISandData/COVID-19/issues/2094

From: "Hernandez, Sergio"
Date: Monday, May 18, 2020 at 9:53 AM
To: Jill Rosen, Douglas Donovan
Cc: "McLean, Scott", "Perez Maestro, Laura"
Subject: Re: Final CNN request for comment re: Worldometer story

Hi Jill,

Thanks for getting back to us so quickly. We understand your position.

We could extend our deadline until close of business (5 p.m.) today. That would give the University a full business day. We think that’s fair, especially given that most of what we’re outlining below should just be a recap/reiteration of CNN’s questions from the last couple of weeks.

(Specifically, Scott and Laura's questions from late April and May 10-12, and some of the exchanges we've had on sourcing/vetting when we've noticed errors in the dashboard data.)
Thanks,

Sergio Hernandez
Data Editor, CNN | desk: [illegible] mobile: (646) [illegible]

From: Jill Rosen
Date: Monday, May 18, 2020 at 9:38 AM
To: "Hernandez, Sergio", Douglas Donovan
Cc: "McLean, Scott", "Perez Maestro, Laura"
Subject: Re: Final CNN request for comment re: Worldometer story

Thanks Sergio. We're going to need more time to be able to look at all of this and respond thoughtfully. Since you had originally mentioned we'd get these questions on Friday, we'd like to have a day or two.

From: "Hernandez, Sergio"
Date: Monday, May 18, 2020 at 9:18 AM
To: Jill Rosen, Douglas Donovan
Cc: "McLean, Scott", "Perez Maestro, Laura"
Subject: Final CNN request for comment re: Worldometer story

Hi,

As you are aware, we are preparing a piece for this week about data. The story focuses on Worldometer and how governments (most recently the Spanish government) and institutions (including Johns Hopkins) have relied on it as a source for some of their statistics.

Below is a list of our questions, followed by a list of statements and quotes about JHU that we currently plan to include in our story. Please let us know if you see anything here that is factually inaccurate or that you'd like to respond to or comment on. If you'd prefer to speak by phone, we'd be happy to record an interview and include those comments in the story.

We're hoping to post this story tonight/early tomorrow. If you could please get back to us with your responses by 2 p.m. ET today, we would be grateful.

Thanks as always,
Sergio

Sergio Hernandez
Data Editor, CNN | desk: [illegible] mobile: [illegible]

Questions:

How did JHU make the initial decision to use Worldometer as a data source for its Covid-19 data? What was its selection criteria and vetting process?

Which specific data points does JHU rely on Worldometer for?

We understand that JHU scrapes Worldometer and other sources for new data. Once sources have been scraped and new data identified, is the data automatically published to the ArcGIS dashboard or GitHub repository, or does a human validate it first?
(If so, what is that validation process?)

You have referred to a “process of continual cross-checking and refinement.” Can you provide more detail on what that process is?

Statements about JHU:

On April 28, Spanish Prime Minister Pedro Sánchez said: “we have found out about another study, from the Johns Hopkins University, that […] ranks us fifth in the world in total tests carried out.”

Jill Rosen, a spokesperson for the university told CNN that JHU could not identify a report matching Spanish Prime Minister Pedro Sánchez’ description. JHU has not published an international data on Covid-19 testing. The study Sánchez cited does not exist.

JHU lists Worldometer as one of several data sources for its Coronavirus dashboard.

JHU has declined to say what specific data points it relies on Worldometer for, but issues with the counter site’s data have caused at least one notable error: On April 8 JHU’s global tally of confirmed Covid-19 cases briefly crossed 1.5 million before dropping by more than 30,000. JHU told CNN at the time, that the error appeared to come from a double-counting of French nursing home cases. Johns Hopkins’s figure appeared to come directly from Worldometer, which did not cite a source for its tally of cases in France.

Wikipedia editor James Heilman said Wikipedia volunteers have noticed persistent errors with both Worldometer and JHU.

It’s not clear why the Spanish government continues to insist the testing data published by Worldometer was put out by JHU.

Quotes about JHU:

Jill Rosen, on why JHU has chosen to rely on Worldometer given concerns about its accuracy or who operates it: “We use many sources to corroborate the data we publish.
It’s a process of continual crosschecking and refinement to ensure that the data we are present [sic] is as accurate and timely as possible.”

Spanish Health Minister Salvador Illa, on the provenance of testing data cited by Spanish PM Sánchez on April 28: “It is data given by the John Hopkins University […] taken from, as a fundamental source of information, the website Worldometer.”

James Heilman, a clinical assistant professor of emergency medicine at the University of British Columbia and a Wikipedia editor: Noticed persistent errors with Worldometer and also with “a more reputable name with a long history of accuracy” (referring to Johns Hopkins). “We hope they also double check the numbers.”

Phil Beaver, data scientist at University of Denver, on whether JHU should be using Worldometer: “I am not sure, that is a great question, I kind of got the impression that Worldometer was relying on [Johns] Hopkins”

Eduardo Mathieu, data manager for Our World in Data: “I think JHU has been under a lot of pressure to update their numbers” “Because of this pressure they have been forced to or incentivized to get data from places that they shouldn’t have, but in general I would expect JHU to be a fairly reliable source.”

Spokeswoman for the Embassy of Spain in London: “Back in April Mr. Sánchez mentioned analysis of statistical data carried out by Johns Hopkins University that are based upon data published by Worldometer.”

Spokesperson for a UK government office: “Both Worldometers and John Hopkins provided comprehensive and well respected data. As the situation developed, we transferred from Worldometers to John Hopkins as John Hopkins relies more on official sources”