Year 3 report: SPP evaluation

nieer.org

YEAR 3 REPORT: SEATTLE PRE-K PROGRAM EVALUATION

Milagros Nores, Ph.D., Steve Barnett, Ph.D., Kwanghee Jung, Ph.D., Gail Joseph, Ph.D.,
Lea Bachman, Ph.D. & Janet S. Soderberg, Ph.D.
The National Institute for Early Education Research & Cultivate Learning

September 2018

NIEER Technical Report

About the Authors
Milagros Nores, Ph.D. Dr. Nores is Co-Director for Research at The National Institute for Early
Education Research (NIEER) at Rutgers University. Dr. Nores conducts research at NIEER on
issues related to early childhood policy, programs, and evaluation, both nationally and
internationally. She is also on staff with the Center for Enhancing Early Learning Outcomes
(CEELO), a federally funded comprehensive center that provides technical assistance to state
agencies around early childhood.
W. Steve Barnett, Ph.D. Dr. Barnett is a Senior Co-Director of the National Institute for Early
Education Research (NIEER) and a Board of Governors Professor at Rutgers University. He is
also Principal Investigator of the Center for Enhancing Early Learning Outcomes (CEELO). His
research includes studies of the economics of early care and education including costs and
benefits, the long-term effects of preschool programs on children's learning and development, the
economics of human development, practical policies for translating research findings into
effective public investments and the distribution of educational opportunities.
Kwanghee Jung, Ph.D. Dr. Jung is an Assistant Professor at The National Institute for Early
Education Research (NIEER) at Rutgers University. Her expertise is in quantitative data analysis
and the effect of participation in child care and early education on children’s learning and
development.
Gail Joseph, Ph.D. Dr. Joseph is the Bezos Family Distinguished Professor in Early Learning at
the University of Washington. She teaches courses, advises students, provides service and
conducts research on topics related to early care and education. She is the Founding Executive
Director of Cultivate Learning at the University of Washington (previously known as the
Childcare Quality and Early Learning Center for Research and Professional Development,
CQEL).
Lea Bachman, Ph.D. Dr. Bachman is a Research Associate at Cultivate Learning (CL) at the
University of Washington. She leads CL's work on the SPP Evaluation Study and conducts
research on topics related to early childhood education and assessment. She is a psychologist
with significant experience in classroom observation, data collection and management.
Janet S. Soderberg, Ph.D. Dr. Soderberg is a Senior Research Scientist at Cultivate Learning at
the University of Washington. Dr. Soderberg's research includes exploration of the association
between classroom quality and children’s development, QRIS program evaluation, refinement
and support of kindergarten entrance assessments, and dissemination of research. Her interests
include child development, assessment, childcare quality, and multi-systems alignment.

Grateful acknowledgment is made to Erica Johnson and the Seattle Preschool Program for their
support on this project. The authors are also grateful to Lea Bachman and Ran Guo for their
assistance in producing this report.
Correspondence regarding this report should be addressed to Milagros Nores at the National
Institute for Early Education Research. Email: mnores@nieer.org.
Permission is granted to reprint this material if you acknowledge NIEER and the authors. For
more information, call the Communications contact at (848) 932-4350, or visit NIEER at
nieer.org.

Suggested citation: Nores, M., Barnett, W.S., Jung, K., Joseph, G., Bachman, L., & Soderberg,
J.S. (2018). Year 3 report: Seattle Pre-K program evaluation. New Brunswick, NJ: National
Institute for Early Education Research & Seattle, WA: Cultivate Learning.

Table of Contents
Table of Contents ... 4
Executive Summary ... 5
Introduction ... 7
Study Methods ... 7
  Sample ... 8
  Measures ... 10
    Measures on Children ... 10
    Measures on Classrooms ... 11
  Procedures ... 12
  Methods ... 13
Results ... 13
  1. Who enrolled in SPP in 2017–18, and how do they compare demographically to children in Seattle more generally? ... 13
  2. What was the observed quality of children’s SPP classroom experiences in 2017–18, and did it improve over the prior year? ... 14
    Average ECERS-3 Results ... 14
    Average CLASS Scores ... 15
    Distribution of Classroom Quality across Classrooms ... 16
    ECERS-3 subscales ... 21
  3. How does quality vary within SPP and do children from different backgrounds experience different quality? ... 24
    Classroom quality for Classrooms and FCCs separately ... 24
    Classroom quality for children from different backgrounds ... 26
    Classroom quality by year of entry into SPP ... 27
    Associations between program features and quality ... 28
  4. How did children in SPP classrooms and family child care providers progress in 2017–18, and how did it vary with classroom quality? Other program characteristics? How did it vary with child characteristics? ... 29
  Sensitivity Analyses ... 35
Summary ... 35
References ... 37
Appendices ... 40
Appendix A. ECERS-3 and CLASS, additional details ... 41
Appendix B. Child Scores, pre, post and gains ... 44
Appendix C. Sensitivity Analyses ... 52
Appendix D. P-values for tests of differences in means ... 60

Executive Summary
Third Year Evaluation (2017-18) of the Seattle Preschool Program (SPP)
In 2017–18 SPP grew to 48 classrooms and 13 family child care providers from 32 classrooms in
the prior year. SPP quality continued to improve and now reaches levels associated with strong
gains in children’s learning and development. We recommend that SPP build on its success by
seeking further improvements in the quality of instruction with particular attention to language
and literacy, integration of content across domains in children’s activities, and supports for
sustained, reflective thinking as well as personal care routines that contribute to health.
In Year 3 of the evaluation we addressed four specific questions. Below, we summarize
key findings for each question and note specific sections of the report where additional
information about the findings can be found.
1. Who enrolled in SPP in 2017–18, and how do they compare demographically to children in
Seattle more generally?
SPP children closely resemble the general public-school population in Seattle with respect to
gender, language, and income (p. 13). SPP children are somewhat more likely to identify as
African American or Black and Asian (and less likely to identify as White) than the overall
public-school population. Overall, 74% of the children enrolled in SPP were 4-year-olds, 29%
were dual language learners, and they were identified as 12% Hispanic, 22% White, 27%
African American, and 28% Asian.
2. What was the observed quality of children’s SPP classroom experiences in 2017–18, and did it
improve over the prior year?
SPP quality has continued to improve on two separate measures, the Early Childhood
Environment Rating Scale, Third Edition (ECERS-3) and the Classroom Assessment Scoring
System (CLASS). The average ECERS-3 score increased from 3.89 to 3.99 (on a 7-point scale)
and this increase was statistically significant (p. 14). CLASS scores also increased, with a
particularly large gain for instructional supports that are important for building academic
success, moving from 3.06 to 3.42 (also on a 7-point scale) (p. 15). Emotional Support moved
from 6.29 to 6.38 and Classroom Organization moved from 5.55 to 5.96.
SPP quality as measured by the ECERS-3 and CLASS now exceeds that in some other
major city and state pre-k and/or childcare systems. SPP quality is similar to that of the widely
recognized New York City and San Antonio programs. Quality must continue to improve if it is
to reach the levels in the states and cities with the highest levels of quality observed in research
(p. 20).
3. How does quality vary within SPP, and do children from different backgrounds experience
different quality?
Average quality does not differ significantly between classrooms and family child care providers
(FCCs), which were added this year as part of a pilot (p. 25). Controlling for center and
classroom characteristics (lead teacher qualifications, class size, among others), quality is lower
for FCCs on the Emotional Support and Classroom Organization dimensions of CLASS.
There is some variation in quality among classrooms, and continuous improvement
should seek to raise the bottom end of the distribution. Average quality as measured by the
ECERS-3 and average quality of instructional support as measured by the CLASS are not
significantly different by race and ethnicity. Modest differences in quality of classroom
organization and emotional supports are observed by children’s race and ethnicity, but average
quality in those two domains was high for all groups (p. 26).
4. How did children in SPP classrooms and family child care providers progress in 2017–18, and
how did it vary with classroom quality? Other program characteristics? How did it vary with
child characteristics?
Children in SPP made gains in every domain measured (p. 28). Gains in language, literacy and
mathematics were larger than would be expected based on maturation (increased age) alone.
There were no large differences in gains by gender, race or ethnicity, or language. Children
identified as Asian made smaller gains in receptive vocabulary, but larger gains in executive
functions, when accounting for other child and school characteristics. No systematic differences
in gains were found by income.
Differences in classroom size or curriculum were not found to relate to children’s
performance (p. 31). Better quality of classroom organization was associated with strong gains in
math. (Last year’s evaluation indicated that better classroom organization also improves literacy
scores.) Teacher qualifications were not found to be associated with test score gains.
African-American and Asian teachers’ students had larger gains in vocabulary, reinforcing the
importance of teacher diversity in SPP where most students identify as African-American, Black,
or Asian.
The SPP evaluation was conducted by the National Institute for Early Education Research at Rutgers
University and Cultivate Learning at the University of Washington. This report focuses on the
2017–2018 school year and includes information on the prior two years.

Introduction
The City of Seattle has concluded the third year of its four-year demonstration phase for the
Seattle Preschool Program (SPP). SPP was established after voter approval in 2014 of a four-year,
$58 million property tax levy. The levy proposed “accessible high-quality
preschool services for Seattle children designed to improve their readiness for school and to
support their subsequent academic achievement.” SPP was subsequently launched in 2015 by the
City of Seattle’s Department of Education and Early Learning (DEEL) with 14 classrooms
operating that year. In the 2016–17 school year, the program more than doubled, operating in 32
classrooms, and by 2017–18, it had expanded to 48 classrooms and 13 family child care
providers.
The four-year demonstration phase of SPP has included, from its very beginning, an evaluation
component to inform its viability and, most significantly, to support its quality improvement
processes. The National Institute for Early Education Research at Rutgers University and
Cultivate Learning at the University of Washington have concluded the third year of the evaluation of
the demonstration phase of SPP.
This report presents findings for the third year (2017–18) of the program and is centered
on classroom quality and children’s learning. The report includes information on the children
served, children’s learning and development during the school year, and program quality across
the three years of SPP so far. It also looks at specific subgroups of children and classrooms and
examines associations between SPP children’s learning gains and their classroom experiences
including observed quality.

Study Methods
The SPP evaluation study is a multi-site study that combines various components in order to
provide a comprehensive appraisal of the program’s quality and its impact on children through
the four-year demonstration period. The third year of the study included collection of child and
classroom information to address the following four questions:
1. Who enrolled in SPP in 2017–18, and how do they compare demographically to children
in Seattle more generally?
2. What was the observed quality of children’s SPP classroom experiences in 2017–18, and
did it improve over the prior year?
3. How does quality vary within SPP, and do children from different backgrounds
experience different quality?
4. How did children in SPP classrooms and family child care providers progress in 2017–
18, and how did it vary with classroom quality? Other program characteristics? How did
it vary with child characteristics?
The SPP evaluation was framed to understand SPP children’s learning and development, as well
as how classroom processes evolved over time. In Year 1, the research team measured learning
and development at the beginning and at the end of the year, as well as classroom quality. In
Year 2, the research team repeated this process, and also recruited a non-equivalent comparison
group composed of children on the waiting list for SPP together with children attending
centers where some waiting-list children ended up enrolled. The team continued to conduct
classroom observations. In Year 3, the research team measured learning and development at the
beginning and at the end of the year as well as classroom quality in SPP classrooms and in family
child care providers (FCCs), which were incorporated this year into SPP as a pilot. FCCs were
brought into the program through two hubs, which were tasked with managing up to eight FCCs.
In the end, one of the hubs contracted with six FCCs and the other one with seven, for a total
of 13 FCCs being brought into SPP in the 2017–18 school year. Measures and procedures used
across all centers, FCCs and children are described below.
Children were first assessed this year in the Fall of 2017 and assessed again at the end of
the school year in 2018. Direct observations of classroom practices were performed to assess
overall quality, teacher-child interactions, and engagement. Classroom quality observations were
completed between February and March. Quality was assessed using observation protocols
widely established in the field. Figure 1 reports the data collection timeline for the 2017–18
school year.
Figure 1. Data Collection Timeline
2017 (September, October, November), activities in chronological order:
• Training for data collectors
• Initial SPP site information gathered
• Fall assessment visit scheduling
• Fall child assessment visits begin
• Fall child assessment visits continue
• Fall assessment visits completed (by December 8)
2018 (January, February, March, April - June), activities in chronological order:
• Communications to directors to discuss classroom observations (CLASS & ECERS-3)
• Unannounced CLASS & ECERS-3 observations
• Unannounced CLASS & ECERS-3 observations continue
• Unannounced CLASS & ECERS-3 observations completed
• Spring assessment visit scheduling (early April)
• Spring child assessment visits

Sample
In the school year 2017–18 the research team assessed 761 children in 48 SPP classrooms and 13
SPP family child care providers at pre- and post-test (615 with the full battery). To recruit
children into the study, consent forms were distributed to parents or guardians of all 943
children enrolled in these classrooms. Consent to participate in the study was obtained for 913
children. We randomly selected 10 children per classroom for the full battery. Figure 1 below
shows the study attrition tree. Seven children required language accommodations.

Figure 1. Pre-Post Sample Attrition Tree (includes children assessed with PPVT only)
N=943 SPP enrolled children (at some point during the year)
    N=30 without consent
    N=913 with consent
        N=93 without pre-test
        N=820 with pre-test (666 Full, 154 PPVT Only)
            N=59 without post-test (51 Full, 8 PPVT)
            N=761 with post-test (615 Full, 146 PPVT Only)
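The counts in the attrition tree can be cross-checked with simple arithmetic. The following is a minimal sketch in Python; the variable and function names are illustrative and not part of the study's own tooling, and the retention share is a derived figure rather than one reported in the text.

```python
# Counts taken from the attrition tree above.
enrolled = 943
consented = 913
pre_test = {"full": 666, "ppvt_only": 154}
post_test = {"full": 615, "ppvt_only": 146}


def attrition(before: int, after: int) -> int:
    """Number of children lost between two consecutive stages."""
    return before - after


n_pre = sum(pre_test.values())
n_post = sum(post_test.values())

no_consent = attrition(enrolled, consented)
no_pre = attrition(consented, n_pre)
no_post = attrition(n_pre, n_post)

# Share of all enrolled children assessed at both pre- and post-test.
retention = n_post / enrolled

print(no_consent, no_pre, no_post)  # 30 93 59
print(f"{retention:.1%}")
```

Each stage's loss matches the "without" branch in the tree (30, 93, and 59 children).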

We conducted classroom observations in the 48 SPP classrooms and 13 SPP family child
care providers (FCCs) in the Spring of 2018. SPP classrooms and FCCs are described in Table 1.
Classrooms in SPP in Year 3 used either Creative Curriculum or HighScope Curriculum,
reported an average class size of about 18 (17.77 in the Spring and 17.31 in the Fall), and
were distributed across eleven agencies, with about 4 classrooms per agency. FCCs are smaller,
with average class sizes of about 9 (not all preschool children), and all of them use
Creative Curriculum. Teacher qualifications and race and ethnicity are also reported in Table 1.

Table 1. SPP Classroom characteristics, N=48
| Classroom characteristic | SPP Classroom: Frequency or Mean (SD¹) | SPP FCC: Frequency or Mean (SD¹) |
| --- | --- | --- |
| Curriculum: Creative | 18 | 7 |
| Curriculum: HighScope | 30 | 6 |
| Class Sizeᵃ | 17.77 (2.13) | 8.92 (2.18) |
| Agencies/Hubs | 11 | 2 |
| Teacher Qualifications: Less than AA/unspecified | 20.84% | 76.92% |
| Teacher Qualifications: AA | 6.25% | 23.08% |
| Teacher Qualifications: BA | 45.83% | 0.00% |
| Teacher Qualifications: MA | 27.08% | 0.00% |
| Teacher Race and Ethnicityᵇ: Black | 18.75% | 84.62% |
| Teacher Race and Ethnicityᵇ: Hispanic | 16.68% | 0.00% |
| Teacher Race and Ethnicityᵇ: White | 33.33% | 7.69% |
| Teacher Race and Ethnicityᵇ: Asian | 14.58% | 0.00% |
| Average No. Classrooms per Agency/Hub | 4.36 (4.46) | 6.50 (0.71) |

ᵃ Number of children in classroom as reported by director/roster in the Spring (and for FCCs, in the Winter).
ᵇ Percentages do not add to 100% as information was not available for all teachers.
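As a consistency check on the last row of Table 1, the short Python sketch below recomputes the averages. The split of the 13 FCCs into hubs of 6 and 7 is an assumption made here for illustration (it is not stated in the table); it is chosen because it reproduces the reported mean and standard deviation.

```python
import statistics

# Reported in Table 1: 48 classrooms spread across 11 agencies.
classrooms, agencies = 48, 11
per_agency = classrooms / agencies

# Assumption (not stated in the table): the two hubs hold 6 and 7 FCCs.
hub_sizes = [6, 7]
hub_mean = statistics.mean(hub_sizes)
hub_sd = statistics.stdev(hub_sizes)  # sample standard deviation

print(round(per_agency, 2))               # 4.36
print(hub_mean, round(hub_sd, 2))         # 6.5 0.71
print(sum(hub_sizes))                     # 13
```

Under that assumption the figures agree with the reported 4.36 classrooms per agency and 6.50 (0.71) FCCs per hub.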

Measures
Measures on Children
The Peabody Picture Vocabulary Test—Fourth Edition (PPVT-IV; Dunn & Dunn, 2007) is a
228-item test of receptive vocabulary in standard English predictive of general cognitive
abilities. The test is adaptive and can be used with populations aged 2.5 and above. The test has
proven reliability based on reported split-half reliabilities or test-retest reliabilities, as well as
concurrent validity (e.g., Qi, Kaiser, Milan, & Hancock, 2006). Results on the PPVT have been
found to be strongly correlated with school success (Blair & Razza, 2007; Early, et al., 2007).
The test is standardized to a mean of 100 and a standard deviation of 15.
The Woodcock-Johnson Psycho-Educational Battery—Third Edition (WJ-III; Woodcock,
McGrew, Mather, & Schrank, 2001) includes several subtests. Two of these were used in this
study: the Applied Problems and Letter-Word Identification subtests. WJ is also adaptive and for
use with populations above the age of 3. The WJ has shown correlations with other tests of
cognitive ability and achievement ranging between 0.60 and 0.70. This measure has been used in
numerous large-scale preschool studies (e.g., Early, et al., 2007; Wong, Cook, Barnett, & Jung,
2008). The test is standardized to a mean of 100 and a standard deviation of 15.
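Because both the PPVT and the WJ are standardized to a mean of 100 and a standard deviation of 15, a score or a pre-post gain can be expressed in standard deviation units. The sketch below illustrates the conversion; the function names and the example scores are hypothetical and are not taken from the study data.

```python
NORM_MEAN = 100.0  # standardization mean stated for the PPVT and WJ
NORM_SD = 15.0     # standardization standard deviation


def to_z(standard_score: float) -> float:
    """Distance from the norm mean, in standard deviation units."""
    return (standard_score - NORM_MEAN) / NORM_SD


def gain_in_sd_units(pre: float, post: float) -> float:
    """Pre-to-post change in standard scores, in standard deviation units."""
    return (post - pre) / NORM_SD


# Hypothetical child: standard score of 94 in the fall and 100 in the spring.
print(to_z(85.0))                      # -1.0
print(gain_in_sd_units(94.0, 100.0))   # 0.4
```

Since standard scores are age-normed, a positive gain on this scale indicates growth beyond what the norming sample shows for children of the same age.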
The Dimensional Change Card Sort Task (DCCS; Zelazo, 2006) engages children in
reverse categorization by sorting a set of cards based on different criteria provided by the
examiner. The test assesses attention-shifting, as well as short-term memory. Scores on the
DCCS reflect a pass/fail system on three levels of increasing difficulty, and raw scores range
between 0 and 3 based on these levels. There are no standard score equivalents. However, in a
study of test-retest reliability, means by age for children age 48 months or younger were 1.14, for
48–50 months they were 1.33, for 51–53 months they were 1.42, and for 54–56 months they
were 1.58 (Meador et al., 2013).

¹ SD stands for standard deviation, which is a measure of variation in the data. That is, it measures how close
together or spread apart the classrooms are relative to the mean. The larger the value, the farther apart from the mean
classrooms are, and the smaller the value, the closer to the mean classrooms are, in a specific indicator, such as
classroom size.
The Peg Tapping Test (PT; Diamond & Taylor, 1996) asks children to tap a peg twice
when the experimenter taps once and vice versa. The task requires children to inhibit a natural
tendency to mimic the experimenter while remembering the rule for the correct response. Sixteen
trials are conducted with 8 one-tap and 8 two-tap trials in random sequence. The task requires
two abilities: (a) the ability to hold two things in mind—the rule to tap once when experimenter
taps twice and the rule to tap twice when experimenter taps once, and (b) the ability to exercise
inhibitory control over one’s prepotent behavior, the natural tendency to mimic what the
experimenter does. The final score for Peg Tapping is a sum of all the 16 items that comprise the
test. Again, while there are no standard score equivalents, in a study of test-retest reliability,
means by age for children age 48 months or younger were 4.05, for 48–50 months they were
4.57, for 51–53 months they were 6.02, and for 54–56 months they were 7.87 (Meador et al.,
2013).
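The Peg Tapping scoring rule described above can be sketched as follows. This is an illustration only: it assumes each of the 16 trials is scored 1 when the child gives the opposite number of taps and 0 otherwise, and the session data are hypothetical.

```python
def peg_tapping_score(examiner_taps, child_taps):
    """Count correct trials: two taps after one tap, one tap after two taps."""
    if len(examiner_taps) != len(child_taps):
        raise ValueError("each trial needs an examiner tap count and a child response")
    score = 0
    for shown, response in zip(examiner_taps, child_taps):
        expected = 2 if shown == 1 else 1  # the opposite of what the examiner did
        if response == expected:
            score += 1
    return score


# Hypothetical session: 8 one-tap and 8 two-tap trials (fixed order for brevity;
# the actual task presents them in random sequence).
examiner = [1, 2] * 8
# The child answers the first 10 trials correctly, then mimics the examiner.
child = [2, 1] * 5 + [1, 2] * 3

print(peg_tapping_score(examiner, child))  # 10
```

A child who always mimics the examiner would score 0, which is why the task is read as a measure of inhibitory control.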
Measures on Classrooms
Early Childhood Environment Rating Scale—Third Ed. (ECERS-3; Harms, Clifford & Cryer,
2014). The ECERS-3 is an observation and rating tool for preschool and kindergarten classrooms
measuring environmental factors and teacher-child interactions. It emphasizes the role of the
teacher in relation to environment and children’s developmental gains. The overall ECERS-3
score is an average of 35 items under 6 domains, each rated on a scale between 1 and
7. A rating of 1 indicates inadequate quality, a rating of 3 indicates minimal quality, a rating of 5
indicates good quality, and a rating of 7 indicates excellent quality. A general description of each
of the 35 items on the ECERS-3 is provided in Appendix Table A.1. A recent validation paper
(Early et al., 2018) reports a four-factor structure for the ECERS-3 (Learning Opportunities,
Gross Motor, Teacher Interactions, and Math Activities), moderate correlations with
the three CLASS Pre-K domains, and positive associations with growth in children’s executive
functions (though not with children’s cognitive measures). The ECERS-3 was only used in
classrooms in center-based care.
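The overall ECERS-3 score described above is a plain average of the 35 item ratings. The sketch below shows that calculation; the item ratings are hypothetical, and the helper that names the highest quality anchor reached (1, 3, 5, or 7) is an illustrative convenience, not an official ECERS-3 banding rule.

```python
from statistics import mean

# Quality anchors named in the text, from highest to lowest.
ANCHORS = [(7, "excellent"), (5, "good"), (3, "minimal"), (1, "inadequate")]


def ecers3_overall(item_ratings):
    """Average of the 35 item ratings, each on the 1 to 7 scale."""
    if len(item_ratings) != 35:
        raise ValueError("the ECERS-3 overall score averages exactly 35 items")
    if any(not 1 <= r <= 7 for r in item_ratings):
        raise ValueError("item ratings must fall between 1 and 7")
    return mean(item_ratings)


def highest_anchor_reached(score):
    """Label of the highest anchor at or below the score (illustrative only)."""
    for threshold, label in ANCHORS:
        if score >= threshold:
            return label
    raise ValueError("score is below the scale minimum of 1")


# Hypothetical classroom: 20 items rated 4, 10 rated 3, and 5 rated 5.
ratings = [4] * 20 + [3] * 10 + [5] * 5
overall = ecers3_overall(ratings)
print(round(overall, 2), highest_anchor_reached(overall))  # 3.86 minimal
```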
Classroom Assessment Scoring System Pre-K (CLASS Pre-K; Pianta, La Paro, & Hamre, 2008).
The CLASS Pre-K is an observational tool that identifies the classroom interactions that promote
children's development and learning. Observations consist of four 20-minute cycles, with 10-minute
coding periods between each cycle, which are then averaged for an overall quality score.
Interactions are measured through 10 dimensions in three domains. The Emotional Support
domain is measured by four dimensions: Positive Climate, Negative Climate, Teacher
Sensitivity, and Regard for Student Perspectives. The Classroom Organization domain is
measured by three dimensions: Productivity, Behavior Management, and Instructional Learning
Formats. The Instructional Support domain is measured by three dimensions: Concept
Development, Quality of Feedback, and Language Modeling. Each dimension is rated on a 7-point
Likert-type scale, for which a score of 1 or 2 indicates low quality, and a score of 6 or 7 indicates high
quality. The CLASS domains and dimensions are outlined in Appendix Table A.2.
Because a CLASS instrument does not exist for mixed-age groupings, family child care
providers were observed with three CLASS instruments using a Combined CLASS Protocol
(Joseph, Feldman, Phillips & Jackson, 2010),² which was designed to be used in any child care
facility in a home, with multiple age groups. This protocol integrates the dimensions from Infant,
Toddler, and Pre-K CLASS. There are three dimensions that apply only to pre-K children:
Productivity, Instructional Learning Formats, and Concept Development. All other dimensions
apply to children of different age groups, depending on which children are present.³ In addition,
the combined protocol includes a new dimension, that of Facilitation of Learning and
Development, from the CLASS protocols for children under 3. Observers using the combined
protocol are trained and reliable in all three CLASS instruments, and the items on the combined
protocol draw from the corresponding Infant CLASS Manual, Toddler CLASS Manual,
and Pre-K CLASS Manual, which are used by observers throughout the process. The
protocol requires paying attention to children of all ages. Therefore, if differentiation by age does
not adequately occur (e.g., adequate language modeling is observed for infants and toddlers but
not for preschoolers), scores will reflect the average for the whole age group served, rather than
only preschool children.⁴ Further information is provided in Appendix Table A.3.

Procedures
Data collection processes were conducted by Cultivate Learning (CL) at the University of
Washington. The center trained data collectors on standardized child assessments and classroom
observation measures. Data collectors received a two-day training on the measures for child
assessments, were given several days to practice, and were then tested for reliability on the
assessments before starting data collection.
Observations of classroom quality were conducted by trained and reliable observers.
Initial training in administering the observation protocol included the ECERS-3 and the CLASS
protocols. ECERS-3 observers were trained by an ECERS-3 certified trainer and met the ERSI⁵
reliability requirements for observer certification: the trainee must complete three observations
with the trainer, with an average of 85% or more of scores matching the true score exactly or
within one point. All data collectors met the ECERS-3 reliability requirements with agreement
percentages ranging between 89–94%. CLASS observers were trained by a CLASS certified trainer
and met the Teachstone reliability certification requirements. CLASS reliability⁶ agreement
percentages ranged between 93–100%. Assessment and observation score sheets were cleaned and
entered at CL by trained staff. Language accommodations were made as necessary in the requested
language (N=29). Assessment procedures incorporated culturally sensitive attitudes, knowledge,
interview skills, intervention strategies and evaluation practices specifically informed by the age
of the children in the study. Satisfaction surveys were delivered to providers after data collection
to follow up on the procedures followed by data collectors, their interactions with the sites, and
whether the experience was positive overall. Responses to these questions were quite
positive.⁷

² Protocol designed for Washington State’s QRIS, Early Achievers. Also used in Oregon, see Tout et al. (2017).
³ If a given age group is not present or sleeping during the observation, the particular age group will not be
considered when scoring.
⁴ Although there may be benefits of mixed age-grouping that the CLASS is not designed to capture.
⁵ ERSI is the company that sells ECERS-3; for information on the tool and reliability go to http://www.ersi.info/
⁶ Teachstone is the company that sells CLASS products and manages CLASS certifications. All training activity is
monitored and reported to them. http://www.teachstone.com/about-teachstone/.
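The ECERS-3 reliability criterion described above (an average of at least 85% of scores exact or one away from the true score, over three observations) can be sketched as follows. The function names and the ratings are hypothetical, and the sketch assumes agreement is computed item by item against the trainer's scores.

```python
def within_one_agreement(trainee, trainer):
    """Share of items where the trainee is exact or one point away from the trainer."""
    if len(trainee) != len(trainer):
        raise ValueError("both observers must rate the same items")
    hits = sum(1 for a, b in zip(trainee, trainer) if abs(a - b) <= 1)
    return hits / len(trainer)


def meets_requirement(observations, threshold=0.85):
    """True when the average agreement across observations reaches the threshold."""
    rates = [within_one_agreement(trainee, trainer) for trainee, trainer in observations]
    return sum(rates) / len(rates) >= threshold


# Hypothetical data: three joint observations of 10 items each.
trainer_scores = [4] * 10
observations = [
    ([4, 5, 3, 4, 4, 4, 4, 4, 4, 6], trainer_scores),  # 9 of 10 within one point
    ([4, 4, 4, 4, 4, 4, 4, 4, 1, 7], trainer_scores),  # 8 of 10 within one point
    ([5, 5, 5, 5, 5, 5, 5, 5, 5, 2], trainer_scores),  # 9 of 10 within one point
]

print(meets_requirement(observations))  # True
```

Here the three agreement rates (90%, 80%, 90%) average about 86.7%, so the hypothetical trainee would meet the 85% criterion even though one observation falls below it.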

Methods
To address the descriptive questions on classroom quality and change over time, or differences
across types of providers, data were collected and analyzed from the ECERS-3 and the CLASS.
Two-tailed, two-sample t-tests assuming unequal variances were used to test changes in quality
between years, differences between FCCs and classrooms, and differences in the quality received
by males versus females. One-way ANOVAs with Bonferroni multiple-comparison tests were used to test for
differences in quality experienced by different subgroups of children (across race and ethnicity,
by language indicators, and by federal poverty level (FPL)).⁸
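For readers unfamiliar with these tests, the sketch below shows the two calculations in plain Python: the t statistic and approximate degrees of freedom for a two-sample test with unequal variances (Welch's test), and a Bonferroni-adjusted significance level. The scores are hypothetical, the function names are illustrative, and the Bonferroni helper assumes all pairwise group comparisons are tested; the report does not specify the software or the exact comparison set used.

```python
import math
import statistics


def welch_t(a, b):
    """t statistic and Welch-Satterthwaite degrees of freedom for two samples."""
    va = statistics.variance(a) / len(a)  # squared standard error, sample a
    vb = statistics.variance(b) / len(b)  # squared standard error, sample b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def bonferroni_alpha(alpha, n_groups):
    """Per-comparison alpha when all pairwise group comparisons are tested."""
    n_comparisons = n_groups * (n_groups - 1) // 2
    return alpha / n_comparisons


# Hypothetical quality scores for two sets of classrooms.
year_a = [1.0, 2.0, 3.0, 4.0, 5.0]
year_b = [2.0, 4.0, 6.0, 8.0, 10.0]

t_stat, dof = welch_t(year_a, year_b)
print(round(t_stat, 3), round(dof, 2))        # -1.897 5.88
print(round(bonferroni_alpha(0.05, 4), 4))    # 0.0083
```

The p-value would then come from a t distribution with the computed degrees of freedom, which requires a statistics library and is omitted here.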
To address the question concerning children’s development over the school year, the
child assessments collected from a randomly selected group of children are first described across
subgroups and over the years (in terms of standard gains) and then analyzed using multivariate
analyses to explore the relationship between children’s growth and child demographic
information, as well as school and classroom features.

Results
Each of the research questions is addressed individually. Analyses draw from all the SPP
classrooms. SPP FCCs are incorporated into the comparisons under question 3, as well as into the
analyses under question 4. Questions 3 and 4 also incorporate information on the sample of children
in SPP classrooms (although all children were assessed with the PPVT).
1. Who enrolled in SPP in 2017–18, and how do they compare demographically to children
in Seattle more generally?
Children’s demographics9 are summarized in Table 2, below, which also summarizes similar
demographics for children enrolled in Seattle Public Schools (as these children embody the SPP
program target population). Children in the sample were mostly 4-year-olds (74%) and
predominantly from English-speaking households (57%), with 29% speaking other languages,
including Vietnamese, Amharic, Mandarin, Somali, and Oromo, among others. A larger share of
children in the sample were non-White than in Seattle Public Schools: 22% were White,
27% Black (slightly increasing from last year’s 24%), 28% Asian (increasing from last year’s
17%), 12% Hispanic (also increasing from 8%), and 11% Multiracial/Other. About 77% of the
children were under 300% of the Federal Poverty Level (FPL).

7 Only 17 sites answered the survey: 100% agreed data collectors entered the facility and checked in as requested;
data collectors' interactions were courteous and professional, and data collectors arrived during the dates/times I
expected. 94% found the experience working with the team positive (with one site reporting the testing time was too
long).
8 These categories are limited by what can be identified in this dataset. This is not indicative of importance over
other categorizations, nor that there may not be important intersectional groupings as well.
9 Demographics were provided by DEEL.

Table 2. Child demographics for SPP study children relative to children in Seattle Public Schools

Child Characteristics     SPP Children 2017–18    Seattle Public
                            N        %            Schools
Gender
  Female                   386     50.7%          51.3%a
  Male                     375     49.3%          48.7%a
Age at Pre-Test                                   -
  3-Year-Olds              196     25.8%
  4-Year-Olds              565     74.2%
Primary Language
  English                  437     57.4%
  Non-English              219     28.8%          21.7%a
  Unknown                  105     13.8%          -
Income                                            -
  20,000 or Less           203     26.7%
  21,000-40,000            157     20.6%
  41,000-60,000            127     16.7%
  61,000-80,000             88     11.6%
  81,000 or more           175     23.0%
  Unknown                   11      1.4%
FPL Percentage                                    33.9%a,c
  Less than 100%           236     31.0%
  100 – 199%               157     20.6%
  200 – 299%               190     25.0%
  ≥ 300%                   175     23.0%
  Unknown                    3      0.4%
Race/Ethnicity
  White                    164     21.8%          47.2%a
  Black                    200     26.6%          15.0%a
  Asian                    214     28.4%          14.0%a
  Hispanic                  93     12.4%          12.1%a
  Multi-Racial/Other        82     10.9%          11.7%a

a Seattle Public Schools as reported in http://www.seattleschools.org/district/district_quick_facts.
b Students attending Seattle Public Schools, as reported in Rivers (2016).
c Based on Free and Reduced Lunch, which is for families <185% FPL.

2. What was the observed quality of children’s SPP classroom experiences in 2017–18, and
did it improve over the prior year?
Average ECERS-3 Results
ECERS-3 scores for SPP classrooms for 2016 through 2018 are reported in Table 3 below. Mean
scores, standard deviations, and minimum and maximum scores are reported for the six ECERS-3
subscales and overall scores. Average ECERS-3 scores and subscale scores in 2018 slightly
increased relative to 2017 (a 0.19 SD increase) even with the program continuing to grow in
number of classrooms. Variation also increased. Statistically significant differences in the
average compared to the previous year are marked with an asterisk.10
Table 3. ECERS-3 Item, Subscale, and Overall Means and Ranges, 2016-2018

ECERS-3 Item              Spring 2016 (N=14)         Spring 2017 (N=32)          Spring 2018 (N=48)
and Subscales             Mean (SD)    Min   Max     Mean (SD)     Min   Max     Mean (SD)    Min   Max
Overall                   3.57 (0.46)  2.94  4.50    3.89* (0.55)  2.74  5.44    3.99 (0.63)  2.47  4.94
Space and Furnishings     3.88 (0.55)  2.86  4.57    3.94 (0.61)   2.71  5.29    4.25 (0.80)  2.43  5.86
Personal Care Routines    3.14 (0.65)  1.75  4.25    3.41 (0.86)   1.50  5.50    2.67 (0.85)  1.00  4.25
Language & Literacy       3.47 (0.83)  2.40  5.20    3.93 (0.82)   2.40  6.00    4.22 (0.92)  2.40  5.80
Learning Activities       2.87 (0.56)  2.10  4.00    3.26 (0.57)   2.40  4.70    3.45 (0.66)  2.18  4.60
Interaction               4.49 (0.90)  3.20  5.80    4.99 (1.07)   2.40  6.80    5.12 (0.99)  2.60  6.60
Program Structure         4.43 (0.97)  2.67  6.00    4.67 (0.88)   3.00  6.33    4.76 (1.01)  2.67  6.33


Average CLASS Scores
Classrooms were observed using the CLASS pre-K. Scores reported below only include overall
means for the pre-K classrooms in the SPP program for the spring 2016 through 2018 (scores for
the 13 FCCs added as a pilot this year are reported separately below). Table 4 reports mean
scores, standard deviations, and minimum and maximum scores for three CLASS domains. All
three domains increased in mean scores relative to 2017 (increases of 0.04 SD, 0.41 SD and
0.44 SD, respectively). Statistically significant differences in the average scores compared to the
previous year are indicated in Table 4 by an asterisk.11
Table 4. CLASS Domain Means and Ranges, 2016, 2017 and 2018

CLASS Domains             Spring 2016 (N=14)         Spring 2017 (N=32)         Spring 2018 (N=48)
                          Mean (SD)    Min   Max     Mean (SD)    Min   Max     Mean (SD)     Min   Max
Emotional Support         6.14 (0.53)  4.88  6.81    6.29 (0.47)  5.19  7.00    6.38 (0.57)   4.19  7.00
Classroom Organization    5.67 (0.74)  4.17  6.58    5.55 (0.76)  3.42  6.83    5.96* (0.77)  3.75  6.92
Instructional Support     2.65 (0.71)  1.50  4.25    3.06 (0.88)  1.67  5.75    3.42* (1.05)  1.75  6.33


In sum, the program has shown continuous improvement in quality as measured by the ECERS-3 (in
all areas but the Personal Care Routines subscale) and all three CLASS domains. A particularly large
gain is observed for instructional supports, which are central for building academic success.
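
The year-over-year changes discussed in this section are expressed in standard deviation units. The report does not state which SD serves as the denominator, so the sketch below shows two common conventions (the prior year's SD and the pooled SD of both years) purely as an assumption. The function names are hypothetical, the inputs are the rounded Instructional Support values from Table 4, and the results need not match the reported values, which were presumably computed from unrounded data.

```python
import math

def gain_in_prior_sd(mean_prev, sd_prev, mean_now):
    """Mean change divided by the prior year's SD."""
    return (mean_now - mean_prev) / sd_prev

def gain_in_pooled_sd(mean_prev, sd_prev, n_prev, mean_now, sd_now, n_now):
    """Mean change divided by the pooled SD of both years (Cohen's d)."""
    pooled_var = ((n_prev - 1) * sd_prev ** 2 + (n_now - 1) * sd_now ** 2) / (
        n_prev + n_now - 2
    )
    return (mean_now - mean_prev) / math.sqrt(pooled_var)

# CLASS Instructional Support: 2017 = 3.06 (0.88), N=32; 2018 = 3.42 (1.05), N=48
d_prior = gain_in_prior_sd(3.06, 0.88, 3.42)
d_pooled = gain_in_pooled_sd(3.06, 0.88, 32, 3.42, 1.05, 48)
print(round(d_prior, 2), round(d_pooled, 2))
```

The two conventions give somewhat different answers whenever the spread of scores changes between years, which is one reason to state the denominator explicitly when reporting gains in SD units.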

10 Two-tailed two-sample t-tests assuming unequal variances were used; p-values are reported in Appendix D.
11 Two-tailed two-sample t-tests assuming unequal variances were used; p-values are reported in Appendix D.

Distribution of Classroom Quality across Classrooms
The distributions of classroom quality for the ECERS-3 and the CLASS domains are depicted in Figures 2
and 3 below. While in the spring of 2018, on average, classrooms scored below the good quality
threshold of 5 in the ECERS-3, the percentage of classrooms scoring above it increased from
38% in 2017 to 54% in 2018. Classrooms scored high on Emotional Support or ES (92%
scored above 5.5). Classroom Organization (CO) also had moderately high scores, with a large
portion of classrooms scoring above 5.5 (77%). Following national patterns, classrooms scored
lower on Instructional Support (IS), with 41% of the classrooms scoring above 3.5. This
percentage increased from 25% in 2017.
Figures 2 and 3 present normalized distributions for the ECERS-3 and the CLASS domains
for the spring of 2016 (dotted line), 2017 (striped line) and 2018 (solid line). The ECERS-3
distribution shows a larger portion of classrooms scoring higher on the scale
but also lower maximum and minimum scores.
Figure 2. ECERS-3 distributions of normalized scores, 2016-2018
[Figure: normalized distributions of ECERS-3 scores on the 1-7 scale. Series: Spring 2016 (N=14),
Spring 2017 (N=32), Spring 2018 (N=48).]

The 2018 CLASS score distributions show an increase in the number of classrooms with higher CLASS
ES scores, which drives the increase in average scores even with the minimum
scores having decreased (panel a). For CLASS CO (panel b) the distribution shows a shift
towards more classrooms scoring in the 5-7 range. For CLASS IS (panel c) there is a shift
towards higher scores, and the distribution is starting to spread across the 3-6 range, driving the
increase in the mean.


Figure 3. CLASS Domain distributions of normalized scores, 2016-2018
[Figure: three panels of normalized score distributions on the 1-7 scale: a. CLASS Emotional Support;
b. CLASS Classroom Organization; c. CLASS Instructional Support. Series in each panel: Spring 2016
(N=14), Spring 2017 (N=32), Spring 2018 (N=48).]


Table 5 and Figure 4 contextualize SPP ECERS-3 scores in relation to four other
programs/studies: in GA, PA, UW state pre-K and childcare centers and NJ Abbott districts.12
These are also reported for each subscale (and standard deviations are included when available).
SPP classrooms score, on average, close to NJ Abbott’s average score for Space and
Furnishings, Interaction and Program Structure. The areas that underperform the most relative to NJ
Abbott are Personal Care Routines and Learning Activities. This is depicted in Figure 4.

12 The ECERS-3 is still not as widely used as the ECERS-R, which does not allow for comparisons with many
high-quality programs.

Table 5. Studies with reported ECERS-3 scores

Study                                           Space/        Personal Care  Language      Learning      Interaction   Program       Average
                                                Furnishing    Routines       & Literacy    Activities                  Structure     Total
SPP 2018 (N=48)                                 4.25 (0.80)   2.67 (0.85)    4.22 (0.92)   3.45 (0.66)   5.12 (0.99)   5.12 (1.01)   3.99 (0.63)
SPP 2017 (N=32)                                 3.94 (0.61)   3.40 (0.86)    3.93 (0.82)   3.26 (0.57)   4.99 (1.07)   4.67 (0.86)   3.89 (0.55)
SPP 2016 (N=12)                                 3.88 (0.55)   3.14 (0.65)    3.47 (0.83)   2.87 (0.56)   4.49 (0.90)   4.43 (0.97)   3.57 (0.46)
GA1                                             3.49          3.14           3.36          3.14          4.31          3.64          3.46
UW state pre-K & childcare (2013-14) (N=299)2   3.45          2.89           3.40          2.68          3.88          3.63          3.23
PA3                                             3.74          3.77           3.77          2.93          4.72          4.10          3.68
GA, PA, WA (2015-16) (N=1063)4                  3.62          3.36           3.62          2.97          4.41          3.92          3.53
NJ Abbott 2016–17 (N=300)5                      4.20 (0.84)   4.26 (1.14)    4.70 (1.10)   4.17 (1.11)   5.17 (1.30)   5.02 (1.38)   4.48 (0.92)
NJ Abbott 2015–16 (N=293)6                      4.43 (1.02)   4.36 (1.33)    4.86 (1.26)   4.22 (1.17)   5.26 (1.34)   5.20 (1.31)   4.61 (1.03)

1 Jenson (2015); 2 CQEL (Unpublished); 3 PAKEYS (Unpublished); 4 Early et al. (2018), subscales estimated from paper;
5 NIEER (2017); 6 NIEER (2016).

Figure 4. SPP ECERS-3 scores by subscale in relation to other programs
[Figure: bar chart of mean ECERS-3 scores for Space and Furnishings, Personal Care Routines, Language
and Literacy, Learning Activities, Interaction, Program Structure, and Overall. Series: SPP 2018 (N=48),
SPP 2017 (N=32), SPP 2016 (N=12), GA, UW, PA, 3-state, NJ 2017, NJ 2016. Plotted values are the rounded
means reported in Table 5.]

Table 6 and Figure 5 below report CLASS scores for SPP classrooms, 2016-2018, and for
selected preschool programs. Seattle is on par with the highest scoring programs in the
CLASS Emotional Support and Classroom Organization domains (New York and San Antonio)
and has increased its CLASS Instructional Support domain scores enough to be just below
previous levels in San Antonio PreK (which actually dropped in scores last year) and the Boston
Pre-K program.


Table 6. Classroom quality across the nation, and for selected programs

Study                                        Emotional     Classroom      Instructional
                                             Support       Organization   Support
SPP classrooms 2018 (N=48)                   6.38 (0.57)   5.96 (0.77)    3.42 (1.05)
SPP classrooms 2017 (N=32)                   6.29 (0.47)   5.55 (0.76)    3.06 (0.88)
SPP classrooms 2016 (N=14)                   6.14 (0.53)   5.67 (0.74)    2.65 (0.71)
Tulsa1
  TPS pre-k (N=77)                           5.23 (0.57)   4.96 (0.69)    3.21 (0.93)
  CAP Head Start (N=28)                      5.22 (0.78)   4.80 (0.84)    3.26 (0.94)
Boston2 (N=83) (2009-2010)                   5.63 (0.60)   5.10 (0.68)    4.30 (0.84)
NYC (N=1,570) (2016–17)3                     6.40          6.20           3.10
NYC (N=1,134) (2015–16)4                     6.20          6.10           3.30
NYC (N=555) (2012-13 to 2014-15)5            6.00          5.80           3.60
National Head Start Overview 20156           6.03 (0.28)   5.80 (0.36)    2.88 (0.54)
Head Start FACES 20097                       5.30          4.70           2.30
EA Validation study (N=75) (2013-2014)8      5.96 (0.66)   5.26 (0.77)    2.34 (0.71)
NJ Abbott 2013-2014 (N=163)9                 5.97 (0.63)   5.32 (0.89)    3.15 (0.96)
San Antonio (N=89) (2017)10                  6.24 (0.52)   5.60 (0.79)    3.55 (1.32)
San Antonio (N=89) (2016)11                  6.44 (0.51)   5.98 (0.81)    3.67 (1.23)
San Antonio (N=76) (2015)12                  6.34 (0.64)   5.93 (0.97)    3.02 (1.14)
San Antonio (N=36) (2014)13                  6.28 (0.35)   5.75 (0.60)    2.82 (0.82)

1 Phillips et al. (2009); 2 Weiland et al. (2013); 3 NYC Department of Education (2018); 4 NYC Department of
Education (2017); 5 NYC Department of Education (2016); 6 Office of Head Start (2015); 7 Aikens et al. (2013);
8 CQEL (Unpublished); 9 NIEER (2014); 10 EDVANCE (2017); 11 EDVANCE (2016); 12 EDVANCE (2015); 13 EDVANCE (2014).

Figure 5. SPP CLASS scores by domain in relation to other programs
[Figure: bar chart of mean CLASS scores for Emotional Support, Classroom Organization and Instructional
Support. Series: SPP 2018, SPP 2017, SPP 2016, TPS pre-k, CAP Head Start, Boston 2009-10, NYC 2016-17,
NYC 2015-16, NHS 2015, FACES 2009, EA Validation, NJ Abbott 2013-14, SA PreK '17, SA PreK '16,
SA PreK '15. Plotted values are rounded means that also appear in Table 6.]

ECERS-3 subscales
Items and subscales for the ECERS-3 are reported in Table 7 for 2016, 2017, and 2018,
including the average scores and the ranges, which illustrate the minimum and maximum scored
by classrooms.
The Space and Furnishings subscale incorporates whether children have enough space
and furniture, whether the arrangement of the furniture allows for learning and exploration and
whether displays are meaningful and representative of the children in the class. The items for
“space for gross motor play” and “gross motor equipment” had the lowest scores
in this subscale. “Indoor space” scores have decreased consistently as more programs have been
added. All other items have increased scores through the years. Four items under this subscale
continue to range starting at 1, indicating classrooms scoring at the inadequate rating. In contrast,
this year five items under this subscale showed classrooms scoring at the excellent level.13
The Personal Care Routines subscale addresses health, hygiene and safety practices in
the classroom. Under personal care routines, only “safety practices” is above the minimal
threshold score of 3. All other items score, on average, under it, at the inadequate level. All items in
this subscale show reductions in scores with the addition of classrooms. In all items in this scale
there were classrooms scoring at “1” (inadequate) and at “7” (excellent).
Language and Literacy focuses on how staff direct activities and materials towards
supporting children’s development of their language and literacy skills. All but one item under
this subscale have continued a positive trend in relation to the previous two years. “Becoming
Familiar with Print” remains the lowest scoring item (3.44).14 The item for “Staff Use of Books”
averaged 3.79 this year (up from 3.07 in 2016).15 On three items in this scale there were
classrooms scoring at “1” (inadequate), while in the other two the minimums were at 2 and 3
(minimal). Maximum scores were “7” (excellent) in four items this year and only “5” (good) in
one item.
Learning Activities includes the presence, variety, and accessibility of learning materials
in the classroom for children, and at the same time captures the extent to which teachers actively
engage children with different types of materials. Under this subscale, the average for “fine
motor,” “art,” and “math in daily events” were the highest, 4.88, 4.15, and 4.29, respectively.
While still not reaching the level of “good” (5.00), “math in daily events” increased one full
point. In the other areas, scores are lower, and three items remain under the minimal score of 3:
“nature/science,” “math materials and activities,” and “understanding written numbers.” In seven
of the ten items there was improvement relative to 2017.
The Interaction subscale assesses children’s supervision during gross motor time,
teachers’ individualization of teaching and learning and interactions between children and
teachers. Three items under this subscale showed continued positive trends since 2016, and
two items now score at the good level: “individualized teaching and learning” and “staff-child
interaction.” For all items there are classrooms scoring at 7 (excellent).

13 “Space for gross motor” and “gross motor equipment” have a time requirement of 15 minutes to receive credit in
the “minimal” category of scoring and 30 minutes for “good.” It does not count e.g. walking time to the playground.
14 This item expects observing visible print being combined with pictures and staff taking dictation of children’s
words in a way that is interesting and engaging to children for the purpose of showing print as a useful tool.
15 A score in the good (5) to excellent (7) on this item is attained when all children are observed to be actively
engaged during story time.

The Program Structure subscale is centered on the general formats of the classroom and
how children spend their time. “Transitions and waiting times” and “free play” showed increases
on average scores, now averaging 5.21 and 4.58.
Table 7. ECERS-3 Item, Subscale, and Overall Means and Ranges by Item, 2016, 2017 and 2018

ECERS-3 Item and Subscales                          2016 Mean     2017 Mean     2018 Mean
                                                    (Range)       (Range)       (Range)
                                                    N=14          N=32          N=48
Space and Furnishings
  1. Indoor space                                   6.43 (4-7)    5.47 (2-7)    5.40 (2-7)
  2. Furnishings for care, play and learning        4.36 (4-7)    4.56 (3-7)    4.44 (3-7)
  3. Room arrangement for play and learning         3.64 (2-7)    4.72 (2-7)    5.04 (2-7)
  4. Space for privacy                              4.14 (2-6)    4.53 (1-7)    4.63 (1-7)
  5. Child-related display                          3.36 (1-5)    3.09 (1-4)    4.29 (1-7)
  6. Space for gross motor play                     3.14 (1-4)    3.06 (1-6)    3.10 (1-4)
  7. Gross motor equipment                          2.07 (1-4)    2.13 (1-5)    2.81 (1-6)
Personal Care Routines
  8. Meals/snacks                                   3.07 (1-4)    3.88 (1-7)    2.90 (1-5)
  9. Toileting/diapering                            2.21 (1-3)    3.19 (1-7)    2.79 (1-6)
  10. Health practices                              2.93 (2-4)    2.69 (1-5)    1.88 (1-5)
  11. Safety practices                              4.36 (2-7)    3.88 (1-7)    3.13 (1-7)
Language and Literacy
  12. Helping children expand vocabulary            3.50 (3-5)    3.63 (1-7)    4.63 (3-7)
  13. Encouraging children to use language          4.36 (3-7)    4.84 (3-7)    5.15 (2-7)
  14. Staff use of books with children              3.07 (1-6)    3.50 (1-6)    3.79 (1-7)
  15. Encouraging children’s use of books           4.21 (1-7)    4.41 (3-6)    4.08 (1-7)
  16. Becoming familiar with print                  2.21 (1-4)    3.25 (1-6)    3.44 (1-5)
Learning Activities
  17. Fine motor                                    4.36 (2-5)    4.47 (2-7)    4.88 (2-7)
  18. Art                                           3.71 (2-6)    4.28 (1-7)    4.15 (1-7)
  19. Music and movement                            3.50 (2-5)    3.47 (2-6)    3.58 (1-5)
  20. Blocks                                        2.00 (1-4)    2.97 (1-5)    3.13 (1-7)
  21. Dramatic Play                                 2.79 (1-6)    3.50 (1-7)    3.77 (1-7)
  22. Nature/science                                2.50 (1-4)    2.28 (1-5)    2.73 (1-6)
  23. Math materials and activities                 1.71 (1-3)    2.25 (1-4)    2.42 (1-6)
  24. Math in daily events                          2.86 (1-5)    3.34 (1-5)    4.29 (1-7)
  25. Understanding written numbers                 1.29 (1-2)    1.69 (1-5)    1.44 (1-3)
  26. Promoting acceptance of diversity             4.21 (3-6)    4.34 (2-6)    4.06 (3-6)
Interaction
  27. Appropriate use of technology                 N/A (1-1)*    N/A           N/A
  28. Supervision of gross motor                    3.71 (1-7)    4.56 (1-7)    4.67 (1-7)
  29. Individualized teaching and learning          4.21 (3-7)    4.94 (2-7)    5.33 (3-7)
  30. Staff-child interaction                       4.93 (3-7)    5.66 (3-7)    5.96 (2-7)
  31. Peer interaction                              5.00 (3-7)    4.84 (1-7)    4.85 (1-7)
  32. Discipline                                    4.57 (2-7)    4.97 (2-7)    4.77 (2-7)
Program Structure
  33. Transitions and waiting times                 4.86 (3-7)    4.75 (3-7)    5.21 (2-7)
  34. Free play                                     4.50 (3-6)    4.44 (2-7)    4.58 (3-7)
  35. Whole-group activities for play and learning  3.93 (2-5)    4.81 (2-6)    4.50 (2-6)

Note: (*) Only 2 classrooms received a score for #27, both were 1. All others were N/A.

CLASS: Emotional Support Domain
Table 8 shows the scores for dimensions under the three CLASS domains. The Emotional
Support (ES) domain assesses teachers’ promotion of a nurturing and safe environment for
children to learn. All dimensions in this domain scored on average above 6. The “Positive
Climate” and “Negative Climate” dimensions focus on the emotional connection between
teachers and students.16 Negative Climate scores have been inverted in this report, and this
dimension scores the highest (6.94), indicating a lack of expressed negativity. The dimension on “Teacher Sensitivity”
assesses whether teachers anticipate problems and are able to support children effectively
(average 6.23, increasing from 6.04 in 2017). This high range score implies consistency in
teachers’ awareness of children who need assistance or support, responsiveness to their needs,
abilities, problems and emotions, providing individualized support, and generally helping
children feel comfortable to seek support and share thoughts. “Regard for Student Perspectives”
(average 6.04, with a slight increase from 5.96 in 2017) assesses the degree to which teachers
follow children’s interests, motivations, and perspectives and encourage student responsibility
and autonomy. More consistent opportunities for children to have time to express themselves and
move about freely in the classroom, to receive encouragement from the teacher, and to have their
interests acknowledged by the teacher, would bring this score even higher.
CLASS: Classroom Organization Domain
The Classroom Organization domain focuses on how teachers manage and redirect behavior,
how they manage instructional time and routines, and how they manage activities to expand
students’ interests. “Behavior Management” assesses whether teachers provide clear behavioral
expectations and enforce them consistently, whether they are proactive in preventing problems
from arising and effectively redirect misbehavior by focusing on the positive. “Productivity”
measures teachers’ time management, pacing, and transitions throughout the day and across
activities and teachers’ preparation for activities. “Instructional Learning Formats” measures
how teachers facilitate student learning during activities, including how effective questions are,
having clear learning objectives, and using modalities and materials to engage children. All three
dimensions in this domain increased in relation to 2017. “Productivity” scored above 6 this year,
and the other two dimensions increased by about 0.40 points each.

16 Positive Climate “reflects the emotional connection between the teacher and students and among students and the
warmth, respect, and enjoyment communicated by verbal and nonverbal interactions” (Pianta, La Paro & Hamre,
p.23). Negative Climate “reflects the overall level of expressed negativity in the classroom” (p. 28).

CLASS: Instructional Supports Domain
The Instructional Supports Domain assesses the interactions through which teachers enable
high-order thinking skills, provide feedback, encourage creativity and reasoning, and promote
language development. This domain is the most important in terms of teacher practices that
affect students’ learning. It has also proven to be the most challenging. In every published
study of pre-K quality, scores on the Instructional Support domain lag the other two domains.
Therefore, the pattern has been that it scores lower than the other two across programs as seen
above. Two of the three dimensions under this domain maintained their positive trend and
continued to increase this year.
“Concept Development” gauges teachers’ use of discussions to stimulate reasoning,
analysis, and understanding. It also measures teachers’ ability to ask questions that encourage
children to plan, to connect concepts to their lives, and to integrate information with prior
knowledge. Consistency and intentionality are central. Concept Development scored the lowest
(average 2.63). Increasing scores in this dimension requires more consistent use of discussions
and activities to foster problem solving, prediction, comparison, planning and real-world
applications. “Quality of Feedback” (average 3.40) assesses the degree to which teachers
scaffold, engage in feedback loops, and utilize metacognitive approaches with children,
encouraging children to think and explain their thinking. “Language Modeling” measures the
quality and quantity of teachers’ language used to promote children’s language development
(average 4.19, up from 3.57). This dimension increased the most under IS, with some classrooms
now scoring at 7 (excellent).
Table 8. CLASS Domains and Dimensions Means and Range by Item, 2016, 2017 & 2018

CLASS Dimensions and Domains          2016 Mean (Range)   2017 Mean (Range)   2018 Mean (Range)
                                      N=14                N=32                N=48
Emotional Support Domain              6.14 (4.88-6.81)    6.29 (5.19-7.00)    6.38 (4.19-7.00)
  1. Positive Climate                 5.80 (4.25-7.00)    6.33 (5.25-7.00)    6.23 (3.00-7.00)
  2. Negative Climate*                6.86 (5.75-7.00)    6.95 (6.63-7.00)    6.94 (5.00-7.00)
  3. Teacher Sensitivity              5.91 (4.25-6.75)    6.04 (4.25-7.00)    6.23 (4.00-7.00)
  4. Regard for Student Perspectives  5.96 (4.25-7.00)    5.96 (4.25-7.00)    6.04 (4.00-7.00)
Classroom Organization Domain         5.67 (4.17-6.58)    5.55 (3.42-6.83)    5.96 (3.75-6.92)
  5. Behavior Management              5.73 (3.75-7.00)    5.46 (3.50-6.75)    5.98 (3.00-7.00)
  6. Productivity                     6.05 (4.50-7.00)    5.91 (3.50-7.00)    6.06 (4.00-7.00)
  7. Instructional Learning Formats   5.21 (3.50-6.50)    5.21 (3.00-6.75)    5.69 (3.00-7.00)
Instructional Support Domain          2.65 (1.50-4.25)    3.06 (1.67-5.75)    3.42 (1.75-6.33)
  8. Concept Development              2.07 (1.25-3.50)    2.64 (1.25-5.50)    2.63 (1.00-6.00)
  9. Quality of Feedback              2.61 (1.50-4.25)    3.03 (1.50-5.50)    3.40 (2.00-6.00)
  10. Language Modeling               3.29 (1.75-5.00)    3.57 (1.75-6.25)    4.19 (2.00-7.00)

Note: (*) The Negative Climate dimension was inverted so that here a high score represents “good”.

3. How does quality vary within SPP and do children from different backgrounds
experience different quality?
Classroom quality for Classrooms and FCCs separately
We also looked at whether classrooms and FCCs differed in quality as measured by the CLASS.
While the measures differ somewhat, the standard for what levels are necessary for quality is
consistent across all versions of the CLASS.17 Average CLASS scores by domains are reported
in Table 9 for classrooms and for FCCs. Distributions are depicted in Figure 6. CLASS CO and
ES were higher on average in SPP classrooms than in SPP FCCs, and these distributions are further
to the right. Overall, there were no statistically significant differences in mean scores across
domains or dimensions.18 This was also the case for the three dimensions that are scored only
with pre-K children in the protocol used in FCCs; that is, productivity, instructional learning
formats and concept development.
Table 9. CLASS Domain and Dimension scores for classrooms in centers and FCCs

                                       Classrooms in Centers        FCCs
                                       Mean (SD)     Min.  Max.     Mean (SD)     Min.  Max.
Emotional Support                      6.38 (0.57)   4.19  7.00     6.03 (0.70)   4.50  6.80
  1. Positive Climate                  6.23 (0.88)   3.00  7.00     6.00 (0.91)   4.00  7.00
  2. Negative Climate*                 6.94 (0.32)   5.00  7.00     6.92 (0.28)   6.00  7.00
  3. Teacher Sensitivity               6.23 (0.83)   4.00  7.00     5.85 (0.80)   4.00  7.00
  4. Regard for Student Perspectives   6.04 (0.62)   4.00  7.00     5.85 (0.80)   4.00  7.00
Classroom Organization                 5.96 (0.77)   3.75  6.92     5.52 (0.83)   3.50  6.38
  5. Behavior Management               5.98 (1.04)   3.00  7.00     5.46 (1.13)   3.00  7.00
  6. Productivitya                     6.06 (0.78)   4.00  7.00     5.85 (0.69)   4.00  7.00
  7. Instructional Learning Formatsa   5.69 (0.83)   3.00  7.00     5.31 (0.95)   4.00  6.00
  8. Facilitation of Learning & Dev.   n/a (n/a)     n/a   n/a      4.15 (1.63)   2.00  7.00
Instructional Support                  3.42 (1.05)   1.75  6.33     3.53 (0.96)   2.25  5.50
  9. Concept Developmenta              2.63 (1.20)   1.00  6.00     2.54 (1.13)   1.00  5.00
  10. Quality of Feedback              3.40 (1.25)   2.00  6.00     3.31 (1.25)   2.00  6.00
  11. Language Modeling                4.19 (1.18)   2.00  7.00     4.38 (0.87)   3.00  6.00

Note: (*) The Negative Climate dimension was inverted so that here a high score represents “good”. a These three are
scored only for pre-K children in the combined protocol used in FCCs.
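
Footnote 17 reports alphas for consistency within CLASS domains. Assuming these are Cronbach's alpha (the report does not name the statistic), a minimal sketch of the calculation follows. The function name `cronbach_alpha` and the small score matrix are hypothetical illustrations, not SPP data.

```python
from statistics import variance

def cronbach_alpha(rows):
    """rows: one list of dimension scores per classroom (equal lengths)."""
    k = len(rows[0])                       # number of dimensions in the domain
    columns = list(zip(*rows))             # scores regrouped by dimension
    item_var = sum(variance(col) for col in columns)
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_var / total_var)

# Hypothetical scores for four classrooms on three dimensions of one domain
scores = [
    [2.0, 3.0, 4.0],
    [3.0, 3.5, 4.5],
    [1.5, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 2))
```

Alpha approaches 1 when the dimensions within a domain rise and fall together across classrooms, which is the sense in which the footnote describes the two CLASS versions as equally consistent.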

17 We also estimated alphas for consistency within domains within the CLASS Pre-K used in the 48 classrooms and
the CLASS combined used in the 13 FCCs. Both of these were equally consistent (with alphas between 80%-93%).
18 Two-tailed two-sample t-test assuming unequal variances. P-values in Appendix D.

Figure 6. CLASS Domain distributions of normalized scores for classrooms and FCCs
[Figure: normalized score distributions on the 1-7 scale. Series: Classrooms CLASS ES, Classrooms
CLASS CO, Classrooms CLASS IS, FCC CLASS ES, FCC CLASS CO, FCC CLASS IS.]

Classroom quality for children from different backgrounds
Figure 7 depicts the quality of care by children’s gender, ethnicity/race, language background
and FPL for the SPP children in the sample. Tests of statistical significance between groups
found no significant differences in quality by gender or language. There were, however, some
differences by race and ethnicity. While there were no differences in average quality as
measured by ECERS by race/ethnicity, there were statistically significant differences in CLASS
ES and CLASS CO.19 On average, children identified as African American or Black experience
statistically significantly lower levels of CLASS ES and CLASS CO (at a 5% level) relative to
children identified as White. Children identified as Asian, multi-racial or other do so as well for
CLASS CO. Children identified as Hispanic experience higher levels of CLASS CO on average
than children identified as African American or Black and children identified as multi-racial
or other. Children identified as Hispanic also experience higher average levels of CLASS ES
than Black children. For FPL, a statistically significant difference was present for CLASS ES and
CLASS CO between families under 100% FPL and families above 300% FPL, with children of
families above 300% FPL experiencing slightly higher levels of CLASS ES and CLASS CO.
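
The subgroup comparisons above rest on one-way ANOVA followed by Bonferroni-adjusted pairwise tests (footnote 19). The sketch below, with hypothetical group data and hypothetical function names, computes the one-way F statistic and the Bonferroni-adjusted significance threshold that applies when every pair of groups is compared. It illustrates the technique only and is not the report's analysis code.

```python
from statistics import mean

def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Variation of observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_b = len(groups) - 1
    df_w = len(all_values) - len(groups)
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w

def bonferroni_alpha(n_groups, alpha=0.05):
    """Per-comparison threshold when testing every pair of groups."""
    n_pairs = n_groups * (n_groups - 1) // 2
    return alpha / n_pairs

# Hypothetical CLASS scores experienced by children in three subgroups
groups = [[6.0, 6.5, 6.2, 6.4], [5.8, 6.0, 5.9, 6.1], [6.3, 6.6, 6.5, 6.4]]
f_stat, df_b, df_w = one_way_f(groups)
print(f_stat, df_b, df_w, bonferroni_alpha(5))
```

With five race/ethnicity categories there are ten pairwise comparisons, so under this correction each pair would be judged against a threshold of 0.05 / 10 rather than 0.05.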

19 One-way ANOVA with Bonferroni multiple-comparison tests for race/ethnicity, DLL and FPL, and two-tailed t-test
with unequal variances for gender. P-values in Appendix D.

Figure 7. ECERS and CLASS Domain scores by Child Characteristics (N=859 for CLASS,
N=910 for ECERS)
[Figure: bar chart of mean scores (0.00-7.00) for ECERS, CLASS_ES, CLASS_CO and CLASS_IS, grouped by:
Total; Gender (Female, Male); Ethnicity (White, Black, Asian, Hispanic, Other); Language (English,
Bilingual, Unknown); FPL (<100, 100-300, >300).]
Note: Includes classrooms and FCCs.

Classroom quality by year of entry into SPP
We inquired into whether there were differences in quality between new classrooms in the
program, and those with two or three years in the program. Tables 10 and 11 describe ECERS-3
and CLASS scores for classrooms grouped according to the number of years in SPP. Classrooms
with three years in the program scored slightly higher on the overall ECERS-3 score than those
with two years in the program but did not score higher than new classrooms. Classrooms with
three years in the program also scored higher in CLASS ES than classrooms with fewer years in
SPP. No clear pattern emerges between year cohort of entry into SPP and scores for the rest of
the CLASS domains. Without information on teacher turnover, leadership turnover or other
factors that may define individual classroom growth, we cannot identify within this report what
factors may be contributing to this lack of patterns.
Table 10. ECERS-3 Subscale, and Overall Means and Ranges, 2017 (N=48)
ECERS-3 Item and         3 years in SPP   2 years in SPP   1 year in SPP
Subscales                (N=9)            (N=27)           (N=12)
                         Mean (SD)        Mean (SD)        Mean (SD)
Overall                  4.11 (0.59)      3.91 (0.65)      4.11 (0.63)
Space and Furnishings    4.38 (0.79)      4.15 (0.78)      4.37 (0.88)
Personal Care Routines   3.08 (0.79)      2.70 (0.76)      2.29 (0.99)
Language and Literacy    4.07 (0.95)      4.10 (0.94)      4.58 (0.80)
Learning Activities      3.58 (0.68)      3.36 (0.62)      3.57 (0.74)
Interaction              5.36 (0.85)      4.99 (1.05)      5.22 (0.96)
Program Structure        4.59 (0.97)      4.63 (1.03)      5.19 (0.97)


Table 11. CLASS Domain Means and Ranges, 2018 (N=61)
CLASS Domains            3 years in SPP   2 years in SPP   1 year in SPP
                         (N=9)            (N=27)           (N=25)
                         Mean (SD)        Mean (SD)        Mean (SD)
Emotional Support        6.40 (0.46)      6.38 (0.54)      6.20 (0.72)
Classroom Organization   5.79 (0.80)      5.97 (0.80)      5.77 (0.80)
Instructional Support    3.42 (1.11)      3.22 (0.87)      3.70 (1.13)
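
To give a rough sense of how small the cohort differences are relative to their spread, the sketch below computes an unequal-variance (Welch) t statistic directly from the means, SDs and Ns reported in Table 10. This is an illustration only: the report does not state that it ran this test on the years-in-SPP groups, and the function name `welch_from_summary` is hypothetical.

```python
import math


def welch_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for Welch's unequal-variance t-test from summary statistics."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1 mean
    v2 = sd2 ** 2 / n2  # squared standard error of group 2 mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Overall ECERS-3 from Table 10: 3 years in SPP (N=9) vs 2 years in SPP (N=27).
t_stat, df = welch_from_summary(4.11, 0.59, 9, 3.91, 0.65, 27)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

The resulting t statistic is below 1, which is in line with the text's description of the three-year classrooms scoring only "slightly higher" than the two-year classrooms.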

Associations between program features and quality
Lastly, we also estimated the association between program features and classroom quality
through multi-level regression models that accounted for classrooms clustering at the agency
level. First, we assessed these associations for the ECERS-3, then for CLASS pre-K only
(classrooms) and then for CLASS pre-K and the CLASS combined (in classrooms and FCCs).
None of the indicators included showed any statistically significant association with quality, with
the exception of a positive association between a teacher meeting or exceeding required
qualifications and the ECERS-3, and between missing teacher information and CLASS CO and IS levels.
Results varied little whether we constrained analyses to classrooms (assessed with the
CLASS Pre-K protocol) or included the FCCs (assessed with the combined
protocol).20 It is critical to acknowledge that the modest number of classrooms provides low
statistical power to detect relationships between classroom characteristics and classroom quality.
FCCs, however, did show a negative association with CLASS ES and CO quality after
controlling for classroom characteristics.
Table 12. Association between classroom quality and program features (N=61)
                      ECERS           CLASS ES        CLASS ES        CLASS CO        CLASS CO        CLASS IS        CLASS IS
Class Size            0.039 (0.05)    -0.032 (0.05)   -0.044 (0.04)   -0.058 (0.06)   -0.080 (0.05)   -0.055 (0.08)   -0.052 (0.07)
Creative Curriculum   0.265 (0.19)    -0.020 (0.18)   0.002 (0.16)    -0.262 (0.25)   -0.281 (0.21)   -0.370 (0.31)   -0.274 (0.26)
Teacher Qual. Meets   0.483 (0.28)    0.306 (0.27)    0.223 (0.27)    0.450 (0.36)    0.361 (0.35)    0.432 (0.47)    0.558 (0.45)
Teacher Qual. Exc.    0.513* (0.23)   0.124 (0.22)    0.127 (0.23)    0.509 (0.29)    0.526 (0.29)    0.297 (0.38)    0.280 (0.38)
Missing T. Qual.      0.618 (0.34)    0.644 (0.33)    0.679* (0.34)   0.859* (0.43)   0.898* (0.43)   1.313* (0.57)   1.290* (0.56)
T Black               0.130 (0.33)    0.029 (0.31)    -0.172 (0.26)   -0.181 (0.41)   -0.393 (0.33)   -0.637 (0.54)   -0.464 (0.43)
T Hispanic            -0.104 (0.30)   0.042 (0.29)    -0.058 (0.28)   -0.129 (0.37)   -0.231 (0.35)   -0.298 (0.50)   -0.194 (0.47)
T Asian               0.029 (0.33)    0.109 (0.32)    -0.008 (0.31)   -0.242 (0.42)   -0.377 (0.40)   -0.512 (0.55)   -0.400 (0.51)
FCC                                                   -1.144* (0.49)                  -1.453* (0.62)                  -0.990 (0.80)
N                     48              48              61              48              61              48              61
Note: Standard errors in parentheses. Omitted groups are teacher not meeting qualifications, teacher identifies as White and classroom is center-based. * p<0.05; ** p<0.01; *** p<0.001.
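
The star notation in the table note can be related to each coefficient and its standard error (the value in parentheses). The sketch below is a rough reading aid under a stated assumption: it uses a standard normal approximation for the ratio of coefficient to standard error, whereas the report's multi-level models may rely on a different reference distribution. The helper names `two_sided_p` and `stars` are hypothetical.

```python
import math


def two_sided_p(coef, se):
    """Two-sided p-value for coef/se under a standard normal approximation."""
    z = coef / se
    return math.erfc(abs(z) / math.sqrt(2))


def stars(p):
    """Map a p-value to the star notation used in the table note."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Two cells read from Table 12: (coefficient, standard error).
examples = {
    "Teacher Qual. Meets, ECERS": (0.483, 0.28),
    "Missing T. Qual., CLASS IS (N=48)": (1.313, 0.57),
}

for label, (coef, se) in examples.items():
    p = two_sided_p(coef, se)
    print(f"{label}: z={coef / se:.2f}, p={p:.3f} {stars(p)}")
```

Under this approximation the first cell falls short of the 5% level and the second clears it, matching the absence and presence of a star in the table.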

20 While not shown, estimations without including program features showed a negative association between home
provision and CLASS ES and CO. However, this negative association disappeared once we accounted for class
sizes, and for teacher education.


4. How did children in SPP classrooms and family child care providers progress in 2017–
18, and how did it vary with classroom quality? Other program characteristics? How did it
vary with child characteristics?
This evaluation measured child outcomes in receptive vocabulary (using the Peabody
Picture Vocabulary Test), literacy (using the Woodcock-Johnson Tests of Achievement Letter-Word subtest), and math (using the Woodcock-Johnson Tests of Achievement Applied Problems
subtest). In addition, it measured executive functioning (EF) using two measures: the
Dimensional Change Card Sort Game (DCCS) and the Peg Tapping task (PT). The latter two
assess a combination of short-term memory, the ability to inhibit automatic response tendencies
that can interfere with achieving a task, and the capacity for set shifting.
Child gains for the 2017–18 school year for all children in the SPP sample (all
children were assessed with PPVT and only a random sample was assessed with the rest of the
battery) and then for various child subgroups of interest are reported in Appendix B. The PPVT
(vocabulary) and Woodcock-Johnson (literacy and math) assessments provide standardized
scores that provide comparisons to expected gains after controlling for age. Positive gains in
these standard scores indicate that children gained more than other children from a similar
background adjusting for age. Overall, children’s standard scores increased on all three
measures. Children also improved on the executive function measures. The other trends that stand
out are: (a) growth in gains across all measures compared to the prior year, except for math
(panel c), and (b) larger fall to spring gains for children identified as Black, Hispanic, DLL, and low
FPL. Gains for the 2016-17 and 2017-18 school years are reported by race/ethnicity, language
and FPL in Figure 8 below.
Figure 8. Child gains across the different measures by child demographics
[Figure: five bar-chart panels comparing 2016–17 and 2017–18 gains for Total and by Ethnicity (White, Black, Asian, Hispanic, Other), Language (English, DLL, Unknown) and FPL (<100, 100-300, >300). Panels: a. Standard Score PPVT gains; b. Standard Score LW gains; c. Standard Score AP gains; d. DCCS gains; e. PT gains.]
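
The standard-score gains summarized in Figure 8 are fall-to-spring differences in age-normed scores. The sketch below shows that arithmetic on made-up data. It assumes the usual norming of these instruments to a mean of 100 and a standard deviation of 15; the child records and function names are hypothetical and are not drawn from the SPP sample.

```python
NORM_SD = 15.0  # assumption: standard scores normed to mean 100, SD 15


def standard_score_gain(fall, spring):
    """Fall-to-spring change in an age-normed standard score."""
    return spring - fall


def gain_in_sd_units(gain, norm_sd=NORM_SD):
    """Express a standard-score gain as a fraction of the norming SD."""
    return gain / norm_sd


# Hypothetical (child id, fall score, spring score) records.
children = [("child_a", 92, 96), ("child_b", 105, 104), ("child_c", 88, 95)]

gains = [standard_score_gain(fall, spring) for _, fall, spring in children]
mean_gain = sum(gains) / len(gains)

print("gains:", gains)
print(f"mean gain: {mean_gain:.2f} points ({gain_in_sd_units(mean_gain):.2f} SD)")
```

Because the scores are already adjusted for age, a gain of zero means a child kept pace with the norming sample, and a positive gain means the child grew faster than expected for their age.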

The next section assesses whether children’s school-year trajectories differ across these
subgroups, using estimations that relate children’s characteristics to their gains on the
measures included in the study while controlling for school features.
Multivariate analyses also allow exploring whether there are associations between
children’s learning gains and program features while taking into account children’s
characteristics. We incorporate demographics on the children such as their age, gender, race and
ethnicity, and home language, as well as household demographics such as income, household
size and Federal Poverty Level (FPL). Program features for SPP include class size, agency,
curriculum used (whether it is Creative Curriculum or High Scope), teacher race and ethnicity, teacher degree
and classroom quality. We also account for the fact that children that are grouped together in the
same classroom or FCC program should not be considered to be independent of each other.
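
One common way to see why children in the same classroom cannot be treated as independent is the design effect, which shrinks the effective sample size as within-classroom similarity grows. The sketch below uses the standard formula 1 + (m - 1) * ICC with hypothetical inputs; the report does not state an intraclass correlation or average cluster size, so the numbers are assumptions for illustration only.

```python
def design_effect(cluster_size, icc):
    """Variance inflation from clustering: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_children, cluster_size, icc):
    """Number of independent observations the clustered sample is worth."""
    return n_children / design_effect(cluster_size, icc)


# Hypothetical inputs: 700 children, about 12 assessed per classroom, ICC of 0.10.
deff = design_effect(12, 0.10)
n_eff = effective_sample_size(700, 12, 0.10)
print(f"design effect = {deff:.2f}, effective N = {n_eff:.0f}")
```

Ignoring this inflation would make standard errors too small, which is the reason the estimates cluster errors by classroom.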
Tables 13-15 present the estimates of the associations of program features and child
characteristics with children’s development. We performed separate analyses with the two
measures of quality, one controlling for quality as measured by the ECERS-3 (Table 13), and the
other for quality as measured by the CLASS dimensions for classrooms in centers only (Table
14), as well as including FCCs (Table 15). Statistically significant results are highlighted in bold.
For categorical variables, such as female, the results need to be interpreted in relation to the
omitted group (i.e. males).
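
The role of the omitted group can be illustrated with a small dummy-coding sketch. The function `dummy_code` and the sample values are hypothetical; the point is only that the reference category gets no column of its own, so every estimated coefficient is a difference from that category.

```python
def dummy_code(values, reference):
    """One indicator column per non-reference level; the reference is all zeros."""
    levels = sorted(set(values) - {reference})
    return [{f"is_{lvl}": int(v == lvl) for lvl in levels} for v in values]


# Hypothetical race/ethnicity values with White as the omitted group.
rows = dummy_code(["White", "Asian", "Black", "White", "Hispanic"], reference="White")

for row in rows:
    print(row)
```

Read this way, the Asian coefficient of -2.923 on receptive vocabulary in Table 13 indicates roughly 2.9 fewer standard-score points of gain than White children, with the other covariates held fixed.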


In terms of children’s characteristics, this year21 we do not find evidence of disadvantages
for children that identify as Black or Hispanic across any of the outcomes. Children identified as
Asian evidence lower receptive vocabulary gains (standard and raw) than their peers, yet they
outperform their peers in one of the executive function measures (Peg Tapping, or PT).22 The
latter is also the case for children identified as other. No systematic differences were found for
dual language children (in comparison to English speaking children), by income or FPL. Agency
selected children (usually enrolled by the agency to maintain continuity with previous years)
showed higher receptive vocabulary scores.23 These results provide no evidence that the program
creates consistent patterns of advantage or disadvantage for the
analyzed subgroups of children across the areas of development measured.
In terms of program or classroom features, there are no differences by curriculum, with
results shown for HighScope in relation to the omitted group being classrooms implementing
Creative Curriculum. No association was found between classroom size and children’s
performance. There are some positive associations between teachers who identify as people of
color24 and children’s vocabulary and literacy gains. No associations are observed between lead
teacher qualifications and children’s outcomes, or between the ECERS-3 measure of quality and
the different measures of child progress. Positive associations were found between CLASS CO
scores and math standard and raw gains (see Appendix Tables C.2 and C.3). Results are quite
consistent in estimations with and without family child care providers (Tables 14 and 15). FCCs
(Table 15) on average show smaller gains in their children’s executive function levels (as
measured by the DCCS), although in estimations including FCCs CLASS CO is positively
associated with changes in DCCS.
Table 13. Multivariate analyses of children’s 2017–18 standard score gains in relation to child
and site or classroom characteristics and overall ECERS-3, excluding FCCs
Variables               Rec. Vocabulary  Literacy         Math             Executive Function
                        (PPVT/TVIP)      (WJ/WM-LW)       (WJ/WM-AP)       DCCS             PT
3-year-olds             -1.271 (1.12)    3.435*** (1.01)  0.679 (1.10)     -0.129* (0.06)   -1.432* (0.65)
Returning Status        -2.232 (1.37)    1.283 (1.33)     -0.664 (1.45)    -0.018 (0.08)    -0.060 (0.82)
Asian                   -2.923* (1.37)   -0.682 (1.24)    0.501 (1.37)     -0.037 (0.07)    1.693* (0.77)
Black                   -0.852 (1.40)    -0.222 (1.29)    -1.214 (1.44)    -0.117 (0.07)    -0.544 (0.81)
Hispanic                -0.043 (1.52)    -1.167 (1.43)    1.565 (1.59)     0.044 (0.08)     1.014 (0.89)
Other                   -2.426 (1.50)    0.594 (1.36)     1.177 (1.51)     0.002 (0.08)     2.448** (0.85)
DLL                     -0.467 (1.15)    0.831 (1.01)     0.926 (1.11)     0.030 (0.06)     -0.588 (0.63)
Agency Selected         2.360* (1.08)    -0.006 (1.02)    0.270 (1.08)     0.007 (0.06)     -0.211 (0.61)
HH Income<20k           -1.805 (2.64)    -3.456 (2.55)    2.420 (2.80)     -0.157 (0.15)    -0.736 (1.59)
HH Income 21-40k        -1.785 (2.07)    -1.462 (1.88)    -1.196 (2.09)    -0.188 (0.11)    -1.369 (1.18)
HH Income 41-60k        -0.381 (1.99)    -1.348 (1.85)    0.223 (2.04)     -0.098 (0.11)    -0.666 (1.15)
HH Income 61-80k        0.551 (1.99)     -2.166 (1.81)    0.132 (1.99)     -0.060 (0.11)    -0.075 (1.13)
FPL < 100               0.289 (2.62)     2.275 (2.57)     -4.232 (2.80)    0.097 (0.15)     0.994 (1.58)
FPL 100 to 300          -0.563 (1.83)    -1.280 (1.68)    -1.666 (1.84)    0.015 (0.10)     -0.208 (1.04)
High Scope              0.518 (1.09)     0.094 (1.04)     -0.281 (1.07)    -0.042 (0.06)    0.087 (0.61)
Class Size              0.396 (0.27)     -0.020 (0.25)    -0.370 (0.26)    -0.001 (0.01)    -0.080 (0.15)
Teacher Qual. Exceeds   2.697 (1.52)     2.173 (1.51)     1.106 (1.55)     0.010 (0.08)     -0.127 (0.88)
Teacher Qual. Meets     2.471 (1.38)     -0.419 (1.30)    0.634 (1.35)     0.035 (0.07)     0.907 (0.76)
Teacher Black           4.869** (1.66)   1.747 (1.65)     -1.194 (1.69)    -0.066 (0.09)    -0.504 (0.96)
Teacher Hispanic        1.932 (1.54)     -0.536 (1.51)    -1.147 (1.54)    -0.130 (0.08)    -1.015 (0.87)
Teacher Asian           4.527* (1.87)    2.154 (1.80)     -0.094 (1.86)    -0.082 (0.10)    -1.784 (1.05)
Teacher Other           2.000 (1.30)     3.312* (1.30)    0.843 (1.33)     -0.002 (0.07)    -0.580 (0.75)
ECERS-3                 0.451 (0.76)     -0.237 (0.74)    0.492 (0.77)     -0.052 (0.04)    -0.694 (0.43)
N                       702              573              573              571              574
* p<0.05; ** p<0.01; *** p<0.001. Note: Standard errors in parentheses. Reference groups omitted from the estimation are Males, White, English,
FPL 300%+, Income>80 thousand, and Creative Curriculum. Other controls are pre-test, age in months, days
between tests and an indicator for missing language, income, race, FPL, and teacher qualifications and race.
Standardized scores are used for PPVT, and WJ or WM. Errors are clustered by classroom.

21 Last year Blacks and Hispanics evidenced lower gains in receptive vocabulary and children categorized as Other
evidenced lower literacy scores (Nores et al., 2017).
22 Intersectional estimations for gender and race (not shown) found that the negative receptive vocabulary effect observed for
children identified as Asian is driven by males and the positive executive functions effect is driven by females.
23 This may be due to accumulated benefits of programming, or to selection biases when programs “select” children.
24 Self identification as having an ethnic or minority background.

Table 14. Multivariate analyses of children’s 2017–18 standard score gains in relation to child
and site or classroom characteristics and CLASS dimensions, excluding FCCs
Variables               Rec. Vocabulary  Literacy         Math             Executive Function
                        (PPVT/TVIP)      (WJ/WM-LW)       (WJ/WM-AP)       DCCS             PT
3-year-olds             -1.255 (1.12)    3.217** (1.01)   0.555 (1.11)     -0.125* (0.06)   -1.349* (0.65)
Returning Status        -2.200 (1.38)    1.236 (1.32)     -0.481 (1.45)    -0.007 (0.08)    0.045 (0.83)
Asian                   -2.867* (1.37)   -0.819 (1.24)    0.553 (1.37)     -0.025 (0.07)    1.817* (0.77)
Black                   -0.830 (1.41)    -0.135 (1.29)    -1.030 (1.44)    -0.106 (0.07)    -0.477 (0.81)
Hispanic                -0.053 (1.52)    -1.304 (1.43)    1.362 (1.58)     0.040 (0.08)     1.004 (0.89)
Other                   -2.372 (1.50)    0.650 (1.36)     1.380 (1.50)     0.007 (0.08)     2.475** (0.85)
DLL                     -0.466 (1.15)    0.904 (1.00)     0.977 (1.11)     0.030 (0.06)     -0.618 (0.63)
Agency Selected         2.433* (1.06)    0.118 (0.98)     0.425 (1.06)     -0.016 (0.06)    -0.533 (0.60)
HH Income<20k           -1.860 (2.64)    -3.203 (2.54)    2.717 (2.79)     -0.137 (0.15)    -0.584 (1.59)
HH Income 21-40k        -1.809 (2.07)    -1.201 (1.88)    -1.026 (2.08)    -0.184 (0.11)    -1.398 (1.18)
HH Income 41-60k        -0.410 (2.00)    -0.997 (1.85)    0.371 (2.03)     -0.094 (0.11)    -0.680 (1.15)
HH Income 61-80k        0.535 (1.99)     -1.946 (1.81)    0.279 (1.99)     -0.058 (0.10)    -0.099 (1.13)
HH Income Missing       4.936 (4.18)     3.699 (3.77)     4.288 (4.14)     -0.056 (0.22)    0.317 (2.36)
FPL < 100               0.385 (2.61)     2.036 (2.56)     -4.456 (2.78)    0.068 (0.15)     0.745 (1.58)
FPL 100 to 300          -0.511 (1.83)    -1.412 (1.67)    -1.669 (1.83)    0.005 (0.10)     -0.296 (1.04)
High Scope              0.396 (1.11)     -0.480 (1.03)    -1.000 (1.09)    -0.042 (0.06)    0.322 (0.62)
Class Size              0.425 (0.27)     -0.011 (0.25)    -0.284 (0.26)    0.000 (0.01)     -0.082 (0.15)
Teacher Qual. Exceeds   2.777 (1.48)     1.751 (1.42)     0.670 (1.52)     -0.024 (0.08)    -0.461 (0.87)
Teacher Qual. Meets     2.632 (1.34)     -1.208 (1.23)    -0.128 (1.32)    -0.029 (0.07)    0.402 (0.75)
Teacher Black           4.943** (1.67)   2.267 (1.59)     -0.821 (1.69)    -0.090 (0.09)    -0.873 (0.96)
Teacher Hispanic        1.889 (1.56)     -0.117 (1.45)    -0.924 (1.55)    -0.130 (0.08)    -1.098 (0.88)
Teacher Asian           4.493* (1.91)    2.930 (1.77)     0.428 (1.89)     -0.085 (0.10)    -2.015 (1.07)
Teacher Other           1.976 (1.33)     2.711* (1.26)    0.413 (1.35)     -0.002 (0.07)    -0.442 (0.77)
CLASS ES average        0.322 (1.25)     -2.054 (1.20)    -1.610 (1.28)    -0.100 (0.07)    -0.353 (0.73)
CLASS CO average        0.146 (1.00)     1.029 (0.95)     2.063* (1.01)    0.103 (0.05)     0.516 (0.57)
CLASS IS average        -0.043 (0.52)    0.930 (0.48)     0.213 (0.51)     -0.041 (0.03)    -0.521 (0.29)
N                       702              573              573              571              574
* p<0.05; ** p<0.01; *** p<0.001. Note: Standard errors in parentheses. Reference groups omitted from the estimation are Males, White, English,
FPL 300%+, Income>80 thousand, and Creative Curriculum. Other controls are pre-test, age in months, days
between tests and an indicator for missing language, income, race, FPL, and teacher qualifications and race.
Standardized scores are used for PPVT, and WJ or WM. Errors are clustered by classroom.

Table 15. Multivariate analyses of children’s 2017–18 standard score gains in relation to child
and site or classroom characteristics and CLASS dimensions, including FCCs
Variables               Rec. Vocabulary  Literacy         Math             Executive Function
                        (PPVT/TVIP)      (WJ/WM-LW)       (WJ/WM-AP)       DCCS             PT
3-year-olds             -1.476 (1.08)    2.520** (0.97)   0.560 (1.05)     -0.128* (0.06)   -1.671** (0.62)
Returning Status        -2.450 (1.37)    0.835 (1.32)     -0.514 (1.43)    0.003 (0.08)     -0.043 (0.81)
Asian                   -2.691* (1.36)   -0.117 (1.23)    0.857 (1.34)     -0.041 (0.07)    1.788* (0.75)
Black                   -0.848 (1.38)    0.273 (1.26)     -0.640 (1.39)    -0.138~ (0.07)   -0.360 (0.78)
Hispanic                -0.585 (1.49)    -0.894 (1.40)    1.726 (1.53)     0.014 (0.08)     0.955 (0.86)
Other                   -2.380 (1.48)    1.219 (1.34)     1.714 (1.47)     -0.005 (0.08)    2.442** (0.83)
DLL                     -0.606 (1.11)    0.622 (0.97)     0.529 (1.06)     0.046 (0.06)     -0.595 (0.60)
Agency Selected         2.238* (1.06)    -0.077 (1.00)    0.415 (1.05)     -0.023 (0.06)    -0.576 (0.60)
HH Income<20k           -1.537 (2.58)    -2.598 (2.46)    1.594 (2.67)     -0.194 (0.14)    -0.980 (1.52)
HH Income 21-40k        -1.297 (2.05)    -1.164 (1.87)    -0.526 (2.05)    -0.188 (0.11)    -1.342 (1.16)
HH Income 41-60k        -0.438 (1.98)    -1.163 (1.84)    0.409 (2.01)     -0.102 (0.11)    -0.531 (1.14)
HH Income 61-80k        0.356 (1.99)     -1.806 (1.81)    0.300 (1.97)     -0.079 (0.11)    -0.085 (1.12)
FPL < 100               -0.622 (2.55)    1.107 (2.47)     -3.517 (2.66)    0.101 (0.14)     1.027 (1.51)
FPL 100 to 300          -0.687 (1.82)    -1.454 (1.68)    -1.841 (1.82)    -0.006 (0.10)    -0.309 (1.03)
FCC                     0.429 (3.38)     -5.721 (3.06)    -1.494 (3.18)    -0.349* (0.17)   -2.276 (1.81)
High Scope              -0.463 (1.06)    -0.720 (1.00)    -1.250 (1.02)    -0.043 (0.05)    0.283 (0.58)
Class Size              0.389 (0.25)     -0.111 (0.24)    -0.230 (0.25)    -0.005 (0.01)    -0.108 (0.14)
Teacher Qual. Exceeds   2.290 (1.47)     0.941 (1.45)     0.699 (1.49)     -0.060 (0.08)    -0.629 (0.85)
Teacher Qual. Meets     2.474 (1.34)     -1.315 (1.27)    -0.218 (1.31)    -0.035 (0.07)    0.291 (0.74)
Teacher Black           3.432* (1.59)    0.773 (1.54)     -0.851 (1.58)    -0.150 (0.08)    -1.138 (0.90)
Teacher Hispanic        1.268 (1.54)     -0.752 (1.49)    -0.921 (1.52)    -0.160* (0.08)   -1.183 (0.86)
Teacher Asian           3.615 (1.87)     1.863 (1.78)     0.717 (1.83)     -0.137 (0.10)    -2.103* (1.04)
Teacher Other           1.958 (1.32)     2.646* (1.31)    0.371 (1.34)     -0.010 (0.07)    -0.534 (0.76)
CLASS ES average        0.257 (1.22)     -1.583 (1.20)    -1.697 (1.23)    -0.076 (0.07)    -0.447 (0.70)
CLASS CO average        0.362 (0.98)     1.077 (0.96)     2.202* (0.98)    0.112* (0.05)    0.662 (0.56)
CLASS IS average        -0.175 (0.51)    0.824 (0.49)     0.221 (0.51)     -0.045 (0.03)    -0.530 (0.29)
N                       735              606              606              604              607
* p<0.05; ** p<0.01; *** p<0.001. Note: Standard errors in parentheses. Reference groups omitted from the estimation are Males, White, English,
FPL 300%+, Income>80 thousand, and Creative Curriculum. Other controls are pre-test, age in months, days
between tests and an indicator for missing language, income, race, FPL, and teacher qualifications and race.
Standardized scores are used for PPVT, and WJ or WM. Errors are clustered by classroom.


Sensitivity Analyses 
We also conducted three types of sensitivity checks to assess the robustness of findings. First, we
repeated the analyses with raw scores because imperfections in the standardization could affect
results. Second, we investigated whether a quality threshold made a difference. Third, we
replicated the analyses with fixed effects for agencies, which can be interpreted as capturing
differences within agencies.
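
The "within agencies" interpretation of fixed effects can be sketched as a demeaning step: each classroom's score is expressed as a deviation from its own agency's average, so only within-agency variation remains. The agency labels and scores below are hypothetical, and `demean_within` is a name introduced for this illustration; the report does not describe its estimation code.

```python
from collections import defaultdict


def demean_within(groups, values):
    """Subtract each group's mean from its members' values."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for g, v in zip(groups, values):
        sums[g] += v
        counts[g] += 1
    means = {g: sums[g] / counts[g] for g in sums}
    return [v - means[g] for g, v in zip(groups, values)]


# Hypothetical classrooms: agency label and ECERS-3 score.
agencies = ["A", "A", "B", "B", "B"]
ecers = [3.0, 5.0, 4.0, 4.5, 5.0]

within = demean_within(agencies, ecers)
print(within)
```

After this transformation an agency with uniformly high scores contributes nothing to the estimate, which is why a within-agency association can differ in sign or size from the pooled one.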
The results of the three types of sensitivity analyses are summarized as follows.
(1) Results of analyses on raw scores for the PPVT, LW and AP measures (Table C.1
using ECERS and Tables C.2 and C.3 using CLASS) are consistent with the standard score analyses.
(2) Analyses investigating thresholds of quality are reported in Appendix Tables C.4 for
ECERS and C.5 for CLASS.25 We find no association between the ECERS-3 threshold above 3
and children’s standard score gains (or raw score gains, either, although these are not reported).26
We also find no associations between the CLASS thresholds and children’s outcomes.
(3) Analyses with agency fixed effects (Tables C.6 and C.7) revealed that, on average,
a few agencies under- or over-performed in a few specific areas of development (not shown),
while the majority seem to have no specific effects on children. That is, for the most part,
children attending most agencies did not perform any differently than children attending other
agencies. However, within agencies, ECERS scores were negatively associated with the
DCCS measure. On the other hand, CLASS IS differences showed a statistically significant
positive association with letter-word identification changes in children.
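
The threshold check in item (2) amounts to replacing each continuous quality score with an indicator for exceeding a cut point. The sketch below uses the cut points stated in the accompanying footnote (3 for ECERS-3, 5.5 for CLASS ES and CO, 3 for CLASS IS). Whether the report treats a score exactly at the cut point as above it is not stated, so the strict inequality here is an assumption, and the classroom scores are hypothetical.

```python
# Cut points as stated in the report's footnote on thresholds.
THRESHOLDS = {"ECERS-3": 3.0, "CLASS ES": 5.5, "CLASS CO": 5.5, "CLASS IS": 3.0}


def above_threshold(scores, thresholds=THRESHOLDS):
    """Return a 0/1 indicator per measure (strict inequality is an assumption)."""
    return {m: int(scores[m] > thresholds[m]) for m in thresholds}


# Hypothetical classroom scores.
classroom = {"ECERS-3": 4.1, "CLASS ES": 6.4, "CLASS CO": 5.4, "CLASS IS": 2.9}
flags = above_threshold(classroom)
print(flags)
```

These indicators would then stand in for the continuous quality scores in the gain models, which tests whether crossing a quality level matters more than incremental differences in score.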

Summary
The evaluation finds that SPP quality has continued to improve on two separate
measures, the ECERS-3 and the CLASS. SPP quality as measured by the ECERS-3 and CLASS
now exceeds that in some other major city and state pre-k and/or childcare systems. Average
quality does not differ significantly between classrooms and family child care providers, the
latter having been added to SPP this year as a pilot. Average quality as measured by the ECERS-3
and the CLASS instructional support does not significantly differ by race and ethnicity. Modest
differences in the CLASS classroom organization and emotional supports were observed by race
and ethnicity, although levels were high for all children. Children in SPP made gains in all measured
domains, with gains in language, literacy and mathematics larger than expected based on
maturation. High CLASS classroom organization was associated with strong gains in math for
children in the program. African-American and Asian teachers’ students had larger gains in
vocabulary, pointing to the importance of teacher diversity in SPP. We recommend that the
Seattle Preschool Program build on its success by focusing further improvement efforts on the
quality of instruction, with particular attention to language and literacy, integration of content
across domains in children’s activities, and supports for sustained, reflective thinking, as well as
personal care routines that contribute to health.

25 Burchinal et al. (2010) found evidence of CLASS IS thresholds at 3.25, and CLASS ES in the 5-7 range, and
Hatfield et al. (2016) found evidence of CLASS IS threshold at 3 and CLASS ES and CO at 6. Given the
distributions of quality in the sample, we chose to use a level of 3 for the ECERS and levels of 5.5 for CLASS
emotional support and classroom organization scales, and a level of 3 for CLASS instructional supports.
26 We also tested the higher level of 5 considered high in the instrument and did not find positive associations either.
These are not reported.


References
Aikens, N., Klein, A. K., Tarullo, L., & West, J. (2013). Getting ready for kindergarten:
Children's progress during Head Start. FACES 2009 Report. (OPRE Report 2013–21a)
Office of Planning, Research and Evaluation, Administration for Children and Families.
Washington, D.C.: US Department of Health and Human Services.
Barnett, W.S. (2013). Expanding Access to Quality Pre-K is Sound Public Policy. New
Brunswick, NJ: National Institute for Early Education Research.
Blair, C., & Razza, R. P. (2007). Relating effortful control, executive function, and false belief
understanding to emerging math and literacy ability in kindergarten. Child development,
78(2), 647-663.
Burchinal, M., Vandergrift, N., Pianta, R., & Mashburn, A. (2010). Threshold analysis of
association between child care quality and child outcomes for low-income children in prekindergarten programs. Early Childhood Research Quarterly, 25(2), 166-176.
Childcare Quality & Early Learning Center for Research & Professional Development
(Unpublished). Early Achievers Standards Validation Study. Seattle: University of
Washington
Childcare Quality & Early Learning Center for Research & Professional Development
(Unpublished). Large Scale Psychometric Assessment of ECERS 3. Seattle: University of
Washington
Diamond, A., & Taylor, C. (1996). Development of an aspect of executive control: Development
of the abilities to remember what I said and to “Do as I say, not as I do”. Developmental
psychobiology, 29(4), 315-334.
Dunn, L. M., & Dunn, D. M. (2007). PPVT-4: Peabody picture vocabulary test. Pearson
Assessments.
Early, D. M., Maxwell, K. L., Burchinal, M., Alva, S., Bender, R. H., Bryant, D. Cai, K.,
Clifford, R.M., Ebanks, C., Griffin, J.A & Henry, G. T. (2007). Teachers' education,
classroom quality, and young children's academic skills: Results from seven studies of
preschool programs. Child development, 78(2), 558-580.
Early, D. M., Sideris, J., Neitzel, J., LaForett, D. R., & Nehler, C. G. (2018). Factor structure and
validity of the Early Childhood Environment Rating Scale–Third Edition (ECERS-3). Early Childhood Research Quarterly, 44, 242-256.
Edvance Research (2014). Pre-K 4 SA Evaluation Report. YEAR 1. Final Report Submitted to
Early Childhood Education Municipal Development Corporation. San Antonio, TX:
Author.
Edvance Research (2015). Pre-K 4 SA Evaluation Report. YEAR 2. Final Report Submitted to
Early Childhood Education Municipal Development Corporation. San Antonio, TX:
Author.
Edvance Research (2016). Pre-K 4 SA Evaluation Report. YEAR 3. Final Report Submitted to
Early Childhood Education Municipal Development Corporation. San Antonio, TX:
Author.
Harms, T., Clifford, R. M., & Cryer, D. (2014). Early childhood environment rating scale.
Teachers College Press.
Hatfield, B. E., Burchinal, M. R., Pianta, R. C., & Sideris, J. (2016). Thresholds in the
association between quality of teacher–child interactions and preschool children’s school
readiness skills. Early Childhood Research Quarterly, 36, 561-571.
Jenson, D. (2015) ECERS-3: One year out. Four States’ Experiences with Planning and
Implementing Use of ECERS-3. Presented at the QRIS National Meeting 2015. Available
at http://www.qrisnetwork.org/sites/all/files/conferencesession/resources/703ECERS3_0.pdf
Joseph, G. E., Feldman, E., Phillips, J. J., & Jackson, E. (2010). The combined CLASS:
Assessing the adult-child interactions in mixed age family childcare. A procedure
manual. Designed for Washington State’s QRIS, Early Achievers. Seattle, WA: Cultivate
Learning.
Lamy, C.E., Frede, E. Seplocha, H., Strasser, J., Jambunathan, S., Juncker, J. A., Ferrar, H.
Wiley, L., & Wolock, E. (2004). Inch by Inch, Row by Row Gonna Make This Garden
Grow. Classroom Quality and Language Skills in the Abbott Preschool Program. Year
One Report, 2002-2003 Early Learning Improvement Consortium. New Jersey: Early
Learning Improvement Consortium. Available at
http://www.state.nj.us/education/ece/research/inch.pdf
Meador, D. N., Turner, K. A., Lipsey, M. W., & Farran, D. C. (2013). Administering Measures
from the PRI Learning-Related Cognitive Self- Regulation Study. Nashville, TN: Peabody
Research Institute. Available at
https://my.vanderbilt.edu/cogselfregulation/files/2012/11/SR-Measure-Training-Manualfinal.pdf
Nores, M., Barnett, W.S., Joseph, G., Stull, S., Jung, K. & Soderberg, J.S. (2017). Year 2 report:
Seattle Pre-k program evaluation. New Brunswick, NJ: National Institute for Early
Education Research & Seattle, WA: Cultivate Learning.
NIEER (2014). New Jersey Abbott Preschool Quality Evaluation Study. Summary Report. New
Brunswick, NJ: National Institute for Early Education Research.
NIEER (2016). New Jersey Abbott Preschool Quality Evaluation Study. Summary Report. New
Brunswick, NJ: National Institute for Early Education Research.
NYC Department of Education (2017). Pre-K Program Assessments Early Childhood
Environmental Rating Scale –Revised (ECERS-R) and Classroom Assessment Scoring
System (CLASS) Release. New York: Author. Available at
http://schools.nyc.gov/NR/rdonlyres/5FEA3D5B-E615-4E16-83A8C4E58A4D6F02/0/201516ProgramAssessmentResultsSummary.pdf
NYC Department of Education. (2015). Pre-K Program Assessments Classroom Assessment
Scoring System (CLASS) and Early Childhood Environmental Rating Scale – Revised
(ECERS-R) Release. New York: Author. Available at
http://schools.nyc.gov/NR/rdonlyres/A8A27BFE-7C58-4F03-8EB7B90E01BA3D0D/0/CLASSandECERSRReleaseDeckFinal.pdf
NYC Department of Education. (December 18, 2015). Mayor de Blasio Announces Over 68,500
Students Enrolled in Pre-K for All. Press office. New York: Author. Available at
http://www1.nyc.gov/office-of-the-mayor/news/954-15/mayor-de-blasio-over-68-500students-enrolled-pre-k-all
Office of Head Start. U.S. A National Overview of Grantee CLASS® Scores in 2015.
Washington, D.C.: Department of Health and Human Services. Available at
http://eclkc.ohs.acf.hhs.gov/hslc/data/class-reports/docs/national-class-2015-data.pdf
PAKEYS (Unpublished). What does the data tell us? The evolution of environment rating scale
(ERS) use within QRIS. Available at https://qrisnetwork.org/sites/all/files/conferencesession/resources/651DataTellsUs.pdf.
Phillips, D. A., Gormley, W. T., & Lowenstein, A. E. (2009). Inside the pre-kindergarten door:
Classroom climate and instructional time allocation in Tulsa's pre-K programs. Early
Childhood Research Quarterly, 24(3), 213-228.
Pianta, R. C., La Paro, K. M., & Hamre, B. K. (2008). Classroom Assessment Scoring System:
Manual Pre-K. Education Review//Reseñas Educativas.
Qi, C. H., Kaiser, A. P., Milan, S., & Hancock, T. (2006). Language performance of low-income
African American and European American preschool children on the PPVT–III.
Language, Speech, and Hearing Services in Schools, 37(1), 5-16.
Rivers, N. M. (2016). Seattle Public Schools and Housing Report. Seattle: Seattle Public
Schools. Available at
https://www.seattleschools.org/cms/one.aspx?portalId=627&pageId=15652.
Tout, K., Magnuson, K. Lipscomb, S., Karoly, L, Starr, R., Quick H., Early, D. Epstein, D.,
Joseph, G., Maxwell, K, Roberts, J., Swanson, C. & Wenner, J. (2017). Validation of the
Quality Ratings Used in Quality Rating and Improvement Systems (QRIS): A Synthesis of
State Studies. OPRE Report #2017-92. Washington, DC: Office of Planning, Research and
Evaluation, Administration for Children and Families, U.S. Department of Health and
Human Services.
Weiland, C., Ulvestad, K., Sachs, J., & Yoshikawa, H. (2013). Associations between classroom
quality and children's vocabulary and executive function skills in an urban public
prekindergarten program. Early Childhood Research Quarterly, 28(2), 199-209.
Wong, V. C., Cook, T. D., Barnett, W. S., & Jung, K. (2008). An effectiveness-based evaluation
of five state pre-kindergarten programs. Journal of policy Analysis and management,
27(1), 122-154.
Woodcock, R. W., McGrew, K. S., Mather, N., & Schrank, F. (2001). Woodcock-Johnson III
NU tests of achievement. Rolling Meadows, IL: Riverside Publishing.
Zelazo, P. D. (2006). The dimensional change card sort (DCCS): A method of assessing
executive function in children. Nature Protocols, 1, 297-301.


Appendices
Appendix A. ECERS-3 and CLASS, additional details.
Appendix B. Child Scores, pre, post and gains.
Appendix C. Sensitivity Analyses.
Appendix D. P-values for tests of differences in means.


Appendix A. ECERS-3 and CLASS, additional details.
Table A.1. ECERS-3 Subscale and Item Descriptions.
Subscale | Item | Description
Space and Furnishings | 1. Indoor space | Considers enough indoor space for children, staff, and basic furnishings for routines, play, and learning.
Space and Furnishings | 2. Furnishings for care, play, and learning | Focuses on ample furniture for routine care, play, and learning, including convenient cubbies for individual use.
Space and Furnishings | 3. Room arrangement for play and learning | Space is arranged so that classroom pathways generally do not interrupt play and supervision.
Space and Furnishings | 4. Space for privacy | Considers an indoor space for privacy available and set up physically in the classroom to discourage interruptions.
Space and Furnishings | 5. Child-related display | Focuses on appropriate materials displayed for children throughout the classroom, including simple pictures, posters, and artwork.
Space and Furnishings | 6. Space for gross motor play | Gross motor area is spacious, generally safe, and easily accessible to children.
Space and Furnishings | 7. Gross motor equipment | Equipment is age appropriate, accessible, and ample enough to interest every child.
Personal Care Routines | Meals/Snacks | Schedule and sanitary procedures are appropriate during meal times. Staff sit with children to encourage learning.
Personal Care Routines | Toileting/diapering | Proper sanitary procedures usually followed with pleasant supervision.
Personal Care Routines | Health practices | Proper sanitary procedures used consistently as needed, with a few lapses.
Personal Care Routines | Safety practices | Considers no more than 2 major safety hazards present indoors or outdoors.
Language and Literacy | Helping children expand vocabulary | Measures how frequently staff use specific words for objects and actions and descriptive words as children experience routines and play.
Language and Literacy | Encouraging children to use language | Assesses how frequently staff ask questions that children are interested in answering and that require longer answers. Includes many conversations during gross motor free play and routines.
Language and Literacy | Staff use of books with children | Staff read appropriate books to children that relate to current classroom activities or themes, showing interest and enjoyment while doing so.
Language and Literacy | Encouraging children’s use of books | Many books are accessible and organized in a defined interest center.
Language and Literacy | Becoming familiar with print | Focuses on how most visible print is combined with pictures, relates to current classroom topics, and shows a variety of words.
Learning Activities | Fine motor | Focuses on the accessibility for children of fine motor materials, including interlocking building materials, manipulatives, puzzles, and art materials.
Learning Activities | Art | Art materials, including drawing materials, paints, 3D objects, collage materials, and tools, must be accessible for children.
Learning Activities | Music and movement | Measures how many music materials and activities are accessible for children during free play.
Learning Activities | Blocks | Enough space, unit blocks and accessories from 3 different categories for 2-3 children to build at once.
Learning Activities | Dramatic play | Many and varied dramatic play materials, including dolls, furniture, play food and dress-up clothes must be accessible for children during free play.
Learning Activities | Nature/science | At least 15 nature/science materials, including living things, natural objects, factual books, tools, or sand/water must be accessible for children.
Learning Activities | Math materials and activities | At least 10 different appropriate math materials accessible, including materials to count/compare quantities, measure/compare sizes, and familiarize children with shapes.
Learning Activities | Math in daily events | Assesses how staff encourage math learning as part of daily routines.
Learning Activities | Understanding written numbers | At least 3-5 different materials should be present in the classroom that show children the meaning of print numbers.
Learning Activities | Promoting acceptance of diversity | At least 10 examples of diversity accessible, including books, displayed pictures and materials.
Learning Activities | Appropriate use of technology | All observed materials used are appropriate and limited to 10-15 minutes per child during the observation.
Interaction | Supervision of gross motor | Focuses on careful supervision in order to ensure children’s safety.
Interaction | Individualized teaching and learning | Many activities observed are open-ended and most allow children to be successful.
Interaction | Staff-child interaction | Evaluates frequent positive staff-child interactions, with no long periods of no interaction.
Interaction | Peer interaction | Captures positive peer interactions during at least half of the observation.
Interaction | Discipline | Children appear to be aware of classroom rules, and generally follow them with a reasonable amount of teacher control.
Program Structure | Transitions and waiting times | Classroom transitions are usually smooth and productively engaging.
Program Structure | Free play | Free play takes place for 1 hour during observation, including some time indoors and some time outdoors (weather permitting).
Program Structure | Whole-group activities for play and learning | Staff are responsive and flexible in ways that maximize child engagement during whole group activities.

Table A.2. CLASS Domains and Dimension Descriptions.
Domain | Dimension | Description
Emotional Support | Positive Climate | Reflects the emotional connection between teachers and children and among children, and the warmth, respect, and enjoyment communicated by verbal and nonverbal interactions.
Emotional Support | Negative Climate | Reflects the overall level of expressed negativity in the classroom. The frequency, quality, and intensity of teacher and peer negativity are key to this dimension.
Emotional Support | Teacher Sensitivity | Encompasses the teacher’s awareness of and responsiveness to students’ academic and emotional needs.
Emotional Support | Regard for Student Perspectives | Captures the degree to which the classroom activities and teacher’s interactions with students place an emphasis on students’ interests, motivations, and points of view and encourage student responsibility and autonomy.
Classroom Organization | Behavior Management | Encompasses the teacher’s ability to provide clear behavior expectations and use effective methods to prevent and redirect misbehavior.
Classroom Organization | Productivity | Considers how well the teacher manages instructional time and routines and provides activities for students so that they have the opportunity to be involved in learning activities.
Classroom Organization | Instructional Learning Formats | Focuses on the ways in which teachers maximize students’ interest, engagement, and abilities to learn from lessons and activities.
Instructional Support | Concept Development | Measures the teacher’s use of instructional discussions and activities to promote students’ higher-order thinking skills and cognition and the teacher’s focus on understanding rather than on rote instruction.
Instructional Support | Quality of Feedback | Assesses the degree to which the teacher provides feedback that expands learning and understanding and encourages continued participation.
Instructional Support | Language Modeling | Captures the effectiveness and amount of teacher’s use of language-stimulation and language-facilitation techniques.

Table A.3. Considerations on The Combined CLASS protocol
The protocol for using the Combined CLASS manuals (Joseph, Feldman, Phillips & Jackson,
2010) integrates dimensions from all three CLASS tools (Infant, Toddler, and Pre-K) to allow
for multi-age groupings, which are most often present in family child care homes. The individual
CLASS protocols contain differing numbers of dimensions: Infant has 4, Toddler has 8,
and Pre-K has 10. Therefore, some dimensions in the Combined CLASS process apply only to
certain age groups. For example, three dimensions apply only to preschool children, four
dimensions apply only to toddlers and preschoolers, and the remaining four dimensions apply
to all age groups. When coding dimensions that span all age groups, consideration is given to
how many children are present within an age group and the relative breadth and depth of
interactions that affect each group. For example, imagine half the attendees are infants and half
are preschoolers. In such a scenario, if the caregiver provides appropriate language stimulation
to infants but only low-level language modeling for Pre-K children, the score on this
dimension may fall in the mid-range when using the Combined CLASS process even though it
may fall in the low range under the Pre-K CLASS alone. Only children present are counted, and
sleeping infants are not considered “present.” Please note that this process is a hybrid model
designed for Washington State’s QRIS and utilized in this study. For information about other
Family Child Care CLASS models, please see “Using the CLASS Measure in Family Child
Care Homes” (Vitiello, 2014) via Teachstone.com.
Table A.3. ECERS and CLASS Dimension and Domain Means by Child Demographics, 2018
Group | Category | ECERS N | ECERS Mean | ECERS SD | CLASS N | CLASS_ES Mean | CLASS_ES SD | CLASS_CO Mean | CLASS_CO SD | CLASS_IS Mean | CLASS_IS SD
Total | | 859 | 4.00 | 0.62 | 910 | 6.35 | 0.59 | 5.91 | 0.78 | 3.41 | 1.04
Gender | Female | 409 | 3.99 | 0.63 | 435 | 6.34 | 0.60 | 5.93 | 0.77 | 3.36 | 0.98
Gender | Male | 450 | 4.01 | 0.61 | 475 | 6.35 | 0.58 | 5.89 | 0.79 | 3.46 | 1.09
Ethnicity | White | 181 | 4.05 | 0.58 | 192 | 6.45 | 0.41 | 6.12 | 0.60 | 3.56 | 0.97
Ethnicity | Black | 228 | 3.99 | 0.66 | 257 | 6.26 | 0.69 | 5.75 | 0.87 | 3.29 | 0.97
Ethnicity | Asian | 240 | 3.95 | 0.63 | 242 | 6.33 | 0.62 | 5.88 | 0.81 | 3.45 | 1.17
Ethnicity | Hispanic | 109 | 4.09 | 0.54 | 115 | 6.45 | 0.51 | 6.08 | 0.66 | 3.49 | 1.07
Ethnicity | Other | 89 | 3.90 | 0.62 | 92 | 6.24 | 0.55 | 5.75 | 0.81 | 3.21 | 0.88
Language | English | 479 | 4.01 | 0.62 | 520 | 6.35 | 0.54 | 5.91 | 0.79 | 3.38 | 0.94
Language | Bilingual | 259 | 4.02 | 0.57 | 268 | 6.35 | 0.54 | 5.89 | 0.77 | 3.33 | 0.99
Language | Unknown | 121 | 3.89 | 0.70 | 122 | 6.33 | 0.83 | 5.98 | 0.80 | 3.73 | 1.41
FPL | <100 | 271 | 4.05 | 0.62 | 289 | 6.28 | 0.67 | 5.85 | 0.79 | 3.36 | 0.97
FPL | 100-300 | 401 | 4.01 | 0.60 | 427 | 6.36 | 0.56 | 5.89 | 0.80 | 3.38 | 1.04
FPL | >300 | 181 | 3.89 | 0.65 | 188 | 6.42 | 0.49 | 6.03 | 0.73 | 3.56 | 1.13


Appendix B. Child Scores, pre, post and gains.
Receptive vocabulary results
Table B.1 reports children’s receptive vocabulary scores for the fall (pre-test) and spring
(post-test) and fall-to-spring gains. Standardized scores, which are adjusted for age, are reported
in this section (raw scores are reported in a section further below). The mean standard score for
this measure is set at 100, which represents the average child in the U.S. population at any age.
The standard deviation is 15. Thus, positive gains indicate that children improved more over
the course of the preschool year than would be expected based on the change in age alone. Only
valid scores for children assessed in both fall and spring of the school year are included.
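As a worked illustration of the scale described above, a gain on a standard-score scale can be re-expressed in standard deviation units by dividing by the norming SD of 15. The short Python sketch below does this for the overall fall and spring PPVT means reported in Table B.1. The function and variable names are illustrative assumptions and are not part of the evaluation's own analysis code.

```python
# Illustrative sketch: express standard-score gains in SD units.
# Assumption: scores are on the norm-referenced scale described above
# (mean 100, SD 15). All names here are hypothetical, chosen for this example.

NORM_MEAN = 100.0
NORM_SD = 15.0


def gain_in_sd_units(fall_mean: float, spring_mean: float) -> float:
    """Fall-to-spring change divided by the norming SD."""
    return (spring_mean - fall_mean) / NORM_SD


def distance_from_norm(score: float) -> float:
    """How far a standard score sits from the national mean, in SD units."""
    return (score - NORM_MEAN) / NORM_SD


# Overall PPVT means from Table B.1 (fall 2017 and spring 2018).
fall_ppvt, spring_ppvt = 96.17, 99.33

total_gain_sd = gain_in_sd_units(fall_ppvt, spring_ppvt)
fall_gap_sd = distance_from_norm(fall_ppvt)
spring_gap_sd = distance_from_norm(spring_ppvt)

print(f"Gain: {total_gain_sd:.2f} SD")
print(f"Fall gap to norm: {fall_gap_sd:.2f} SD; spring gap: {spring_gap_sd:.2f} SD")
```

Under these assumptions, the overall gain of 3.16 points corresponds to roughly 0.21 SD on the norming scale.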
Table B.1. Receptive vocabulary means and gains by child characteristics

Group | Category | N | PPVT Fall 2017 Mean | PPVT Fall 2017 SD | PPVT Spring 2018 Mean | PPVT Spring 2018 SD | PPVT Gains 2017–18 Mean | PPVT Gains 2017–18 SD
Total | | 735 | 96.17 | 19.27 | 99.33 | 18.31 | 3.16 | 11.66
Gender | Male | 371 | 95.02 | 19.36 | 99.04 | 18.13 | 4.02 | 11.46
Gender | Female | 364 | 97.35 | 19.13 | 99.63 | 18.52 | 2.28 | 11.80
Age | 3-Year-Old Cohort | 185 | 92.70 | 16.69 | 95.63 | 15.55 | 2.93 | 10.83
Age | 4-Year-Old Cohort | 550 | 97.34 | 19.95 | 100.58 | 19.01 | 3.24 | 11.93
Ethnicity | White | 161 | 108.79 | 17.69 | 111.15 | 15.08 | 2.36 | 11.89
Ethnicity | Black | 190 | 89.07 | 15.17 | 93.88 | 16.52 | 4.81 | 12.91
Ethnicity | Asian | 203 | 89.67 | 18.98 | 93.08 | 17.43 | 3.41 | 10.24
Ethnicity | Hispanic | 92 | 95.63 | 18.49 | 99.28 | 18.54 | 3.65 | 9.65
Ethnicity | Other | 81 | 104.17 | 16.13 | 104.05 | 17.55 | -0.12 | 13.07
Language | English | 432 | 102.80 | 17.36 | 104.51 | 17.40 | 1.71 | 12.45
Language | DLL | 200 | 83.88 | 15.70 | 89.09 | 14.84 | 5.21 | 9.83
Language | Unknown | 103 | 92.26 | 20.51 | 97.53 | 19.54 | 5.27 | 10.59
FPL | <100 | 228 | 89.24 | 17.58 | 93.29 | 17.23 | 4.06 | 12.31
FPL | 100-300 | 334 | 95.34 | 18.74 | 98.50 | 17.94 | 3.16 | 11.82
FPL | >300 | 170 | 107.26 | 17.68 | 109.17 | 16.51 | 1.91 | 10.41

Children’s pre-test and post-test vocabulary standard scores for selected center
characteristics are reported in Table B.2 (raw scores are reported further below).
Table B.2. Receptive vocabulary means and gains by center characteristics

Group | Category | N | PPVT Fall 2017 Mean | PPVT Fall 2017 SD | PPVT Spring 2018 Mean | PPVT Spring 2018 SD | PPVT Gains 2017–18 Mean | PPVT Gains 2017–18 SD
Total | | 735 | 96.17 | 19.27 | 99.33 | 18.31 | 3.16 | 11.66
Curriculum | High Scope | 464 | 96.85 | 19.63 | 99.97 | 18.51 | 3.12 | 11.28
Curriculum | Creative Curriculum | 271 | 95.01 | 18.62 | 98.25 | 17.96 | 3.24 | 12.29
ECERS | Less than 3 | 72 | 93.60 | 17.14 | 96.29 | 16.68 | 2.69 | 10.32
ECERS | 3 or More | 630 | 96.72 | 19.57 | 99.97 | 18.46 | 3.24 | 11.87
CLASS ES | Less than 5.5 | 38 | 88.87 | 15.85 | 92.53 | 16.00 | 3.66 | 8.40
CLASS ES | 5.5 or More | 697 | 96.57 | 19.37 | 99.71 | 18.37 | 3.13 | 11.81
CLASS CO | Less than 5.5 | 158 | 94.39 | 16.05 | 97.49 | 15.40 | 3.10 | 10.12
CLASS CO | 5.5 or More | 577 | 96.66 | 20.05 | 99.84 | 19.02 | 3.18 | 12.05
CLASS IS | Less than 3 | 270 | 93.88 | 17.37 | 97.53 | 17.62 | 3.65 | 12.33
CLASS IS | 3 or More | 465 | 97.50 | 20.19 | 100.38 | 18.65 | 2.88 | 11.25


Literacy results
Children’s WJ-III letter-word (LW) identification scores for the overall sample and by selected
child characteristics are reported in Table B.3. The LW subtest measures children’s ability to
identify letters and subsequently read a list of words of increasing difficulty. The test also has a
mean standard score (i.e., age-adjusted score) of 100 and a standard deviation of 15 (raw scores
are reported further below).
Table B.3. Literacy means and gains by child characteristics

Group | Category | N | WJ-LW Fall 2017 Mean | WJ-LW Fall 2017 SD | WJ-LW Spring 2018 Mean | WJ-LW Spring 2018 SD | WJ-LW Gains 2017–18 Mean | WJ-LW Gains 2017–18 SD
Total | | 606 | 101.20 | 15.32 | 102.68 | 14.92 | 1.48 | 9.77
Gender | Male | 298 | 101.43 | 16.04 | 102.47 | 16.12 | 1.04 | 9.85
Gender | Female | 308 | 100.97 | 14.60 | 102.87 | 13.69 | 1.90 | 9.70
Age | 3-Year-Old Cohort | 167 | 102.40 | 15.29 | 105.31 | 14.78 | 2.91 | 12.34
Age | 4-Year-Old Cohort | 439 | 100.74 | 15.32 | 101.68 | 14.87 | 0.94 | 8.55
Ethnicity | White | 129 | 103.09 | 13.87 | 103.97 | 12.99 | 0.88 | 7.94
Ethnicity | Black | 152 | 100.16 | 16.04 | 102.07 | 14.71 | 1.91 | 9.44
Ethnicity | Asian | 178 | 102.52 | 17.33 | 103.98 | 17.27 | 1.47 | 11.49
Ethnicity | Hispanic | 70 | 97.56 | 13.37 | 98.91 | 14.36 | 1.36 | 8.65
Ethnicity | Other | 71 | 99.94 | 12.06 | 102.39 | 12.26 | 2.45 | 9.68
Language | English | 357 | 102.31 | 15.07 | 103.03 | 15.11 | 0.72 | 8.47
Language | DLL | 171 | 100.61 | 15.66 | 102.56 | 14.85 | 1.95 | 11.56
Language | Unknown | 78 | 97.41 | 15.17 | 101.35 | 14.30 | 3.94 | 10.67
FPL | <100 | 176 | 97.32 | 15.65 | 100.28 | 14.91 | 2.96 | 11.48
FPL | 100-300 | 288 | 101.16 | 14.05 | 101.67 | 13.70 | 0.51 | 8.86
FPL | >300 | 140 | 106.10 | 16.16 | 107.93 | 16.12 | 1.83 | 8.72

Table B.4 reports SPP children’s pre- and post-test letter-word identification standard
scores across selected center characteristics (raw scores are reported further below).
Table B.4. Literacy means and gains by center characteristics

Group | Category | N | WJ-LW Fall 2017 Mean | WJ-LW Fall 2017 SD | WJ-LW Spring 2018 Mean | WJ-LW Spring 2018 SD | WJ-LW Gains 2017–18 Mean | WJ-LW Gains 2017–18 SD
Total | | 606 | 101.20 | 15.32 | 102.68 | 14.92 | 1.48 | 9.77
Curriculum | High Scope | 382 | 100.40 | 14.00 | 102.29 | 13.90 | 1.89 | 9.83
Curriculum | Creative Curriculum | 224 | 102.56 | 17.27 | 103.34 | 16.53 | 0.78 | 9.65
ECERS | Less than 3 | 58 | 100.19 | 13.74 | 101.34 | 13.48 | 1.16 | 11.06
ECERS | 3 or More | 515 | 101.36 | 15.66 | 102.96 | 15.14 | 1.61 | 9.67
CLASS ES | Less than 5.5 | 30 | 99.10 | 12.59 | 101.00 | 10.85 | 1.90 | 9.61
CLASS ES | 5.5 or More | 576 | 101.31 | 15.45 | 102.76 | 15.10 | 1.46 | 9.79
CLASS CO | Less than 5.5 | 129 | 100.39 | 14.06 | 100.65 | 14.00 | 0.26 | 9.62
CLASS CO | 5.5 or More | 477 | 101.42 | 15.64 | 103.22 | 15.13 | 1.81 | 9.79
CLASS IS | Less than 3 | 227 | 100.02 | 14.18 | 101.45 | 13.99 | 1.43 | 10.24
CLASS IS | 3 or More | 379 | 101.90 | 15.93 | 103.41 | 15.42 | 1.51 | 9.49


Early math results
Children’s pre- and post-test math scores, as measured by the applied problems (AP) subscale of
the WJ-III, are reported in Table B.5. Like the two measures above, AP is normed with a mean of
100 and a standard deviation of 15.
Table B.5. Math means and gains by child characteristics

Group | Category | N | WJ-AP Fall 2017 Mean | WJ-AP Fall 2017 SD | WJ-AP Spring 2018 Mean | WJ-AP Spring 2018 SD | WJ-AP Gains 2017–18 Mean | WJ-AP Gains 2017–18 SD
Total | | 606 | 101.54 | 14.40 | 103.60 | 13.84 | 2.06 | 10.77
Gender | Male | 298 | 100.84 | 14.98 | 102.72 | 14.55 | 1.88 | 11.45
Gender | Female | 308 | 102.22 | 13.80 | 104.46 | 13.08 | 2.24 | 10.08
Age | 3-Year-Old Cohort | 167 | 99.90 | 14.53 | 103.49 | 13.70 | 3.58 | 11.16
Age | 4-Year-Old Cohort | 439 | 102.16 | 14.31 | 103.65 | 13.91 | 1.49 | 10.57
Ethnicity | White | 129 | 110.60 | 11.21 | 109.66 | 10.93 | -0.94 | 9.32
Ethnicity | Black | 152 | 97.22 | 13.08 | 99.01 | 12.71 | 1.79 | 9.34
Ethnicity | Asian | 178 | 98.10 | 15.35 | 102.07 | 13.90 | 3.97 | 11.21
Ethnicity | Hispanic | 70 | 99.91 | 12.80 | 103.91 | 14.57 | 4.00 | 11.48
Ethnicity | Other | 71 | 103.94 | 12.99 | 106.04 | 15.38 | 2.10 | 12.97
Language | English | 357 | 104.77 | 13.04 | 105.71 | 13.19 | 0.94 | 10.20
Language | DLL | 171 | 95.85 | 14.82 | 100.20 | 14.15 | 4.35 | 10.90
Language | Unknown | 78 | 99.22 | 15.22 | 101.40 | 14.37 | 2.18 | 12.24
FPL | <100 | 176 | 97.64 | 13.79 | 100.09 | 14.74 | 2.45 | 11.76
FPL | 100-300 | 288 | 100.38 | 14.25 | 102.71 | 12.28 | 2.33 | 9.60
FPL | >300 | 140 | 108.81 | 12.95 | 109.94 | 13.80 | 1.13 | 11.75

Table B.6 shows children’s pre- and post-test standardized math scores and gains by
selected center characteristics (raw scores are reported below).
Table B.6. Math means and gains by center characteristics
Group | Category | N | WJ-AP Fall 2017 Mean | WJ-AP Fall 2017 SD | WJ-AP Spring 2018 Mean | WJ-AP Spring 2018 SD | WJ-AP Gains 2017–18 Mean | WJ-AP Gains 2017–18 SD
Total | | 606 | 101.54 | 14.40 | 103.60 | 13.84 | 2.06 | 10.77
Curriculum | High Scope | 382 | 101.68 | 14.46 | 103.68 | 13.88 | 2.00 | 11.43
Curriculum | Creative Curriculum | 224 | 101.30 | 14.31 | 103.47 | 13.79 | 2.17 | 9.55
ECERS | Less than 3 | 58 | 97.69 | 13.87 | 100.36 | 11.51 | 2.67 | 10.20
ECERS | 3 or More | 515 | 102.17 | 14.58 | 104.22 | 14.10 | 2.05 | 10.96
CLASS ES | Less than 5.5 | 30 | 93.87 | 9.19 | 97.00 | 11.76 | 3.13 | 7.80
CLASS ES | 5.5 or More | 576 | 101.94 | 14.51 | 103.95 | 13.86 | 2.01 | 10.90
CLASS CO | Less than 5.5 | 129 | 98.51 | 12.74 | 99.71 | 12.83 | 1.19 | 11.35
CLASS CO | 5.5 or More | 477 | 102.36 | 14.72 | 104.66 | 13.93 | 2.30 | 10.60
CLASS IS | Less than 3 | 227 | 100.09 | 12.83 | 101.97 | 13.68 | 1.88 | 10.68
CLASS IS | 3 or More | 379 | 102.41 | 15.21 | 104.58 | 13.86 | 2.17 | 10.83


Executive functions
We used two measures of executive functions. The DCCS is an attention-shifting test which taps
into a child’s short-term memory. Table B.7 reports children’s pre- and post-test DCCS scores
by selected child characteristics. As a reference, the Learning-Related Cognitive Self-Regulation
School Readiness Measures for Preschool Children Study (aka the Self-Regulation Measurement
Study) (Meador et al., 2013) tested alternative measures of executive functions and included the
DCCS. The authors found average DCCS scores of 1.42 at 51–53 months and 1.62 at 57–59
months (an average difference of 0.20 between these two ages). These age ranges roughly
correspond to the average ages at fall and spring testing in this study (53.2 months in the fall
and 59.3 months in the spring). Table B.8 reports children’s pre- and post-test DCCS scores by
selected center characteristics.
Table B.7. DCCS means and gains by child characteristics

Group | Category | N | DCCS Fall 2017 Mean | DCCS Fall 2017 SD | DCCS Spring 2018 Mean | DCCS Spring 2018 SD | DCCS Gains 2017–18 Mean | DCCS Gains 2017–18 SD
Total | | 604 | 1.41 | 0.61 | 1.63 | 0.63 | 0.23 | 0.59
Gender | Male | 298 | 1.36 | 0.60 | 1.59 | 0.65 | 0.23 | 0.62
Gender | Female | 306 | 1.45 | 0.62 | 1.67 | 0.60 | 0.22 | 0.57
Age | 3-Year-Old Cohort | 168 | 1.09 | 0.53 | 1.33 | 0.57 | 0.24 | 0.60
Age | 4-Year-Old Cohort | 436 | 1.53 | 0.60 | 1.75 | 0.61 | 0.22 | 0.59
Ethnicity | White | 129 | 1.69 | 0.58 | 1.86 | 0.57 | 0.17 | 0.53
Ethnicity | Black | 151 | 1.19 | 0.56 | 1.39 | 0.59 | 0.20 | 0.64
Ethnicity | Asian | 178 | 1.37 | 0.62 | 1.60 | 0.63 | 0.24 | 0.60
Ethnicity | Hispanic | 69 | 1.30 | 0.60 | 1.65 | 0.56 | 0.35 | 0.54
Ethnicity | Other | 71 | 1.58 | 0.55 | 1.79 | 0.65 | 0.21 | 0.63
Language | English | 357 | 1.52 | 0.61 | 1.71 | 0.64 | 0.19 | 0.59
Language | DLL | 170 | 1.21 | 0.61 | 1.51 | 0.59 | 0.31 | 0.63
Language | Unknown | 77 | 1.34 | 0.50 | 1.56 | 0.60 | 0.22 | 0.53
FPL | <100 | 175 | 1.25 | 0.56 | 1.53 | 0.60 | 0.27 | 0.62
FPL | 100-300 | 287 | 1.38 | 0.62 | 1.57 | 0.62 | 0.19 | 0.59
FPL | >300 | 140 | 1.66 | 0.58 | 1.90 | 0.59 | 0.24 | 0.57

Table B.8. DCCS means and gains by center characteristics

Group | Category | N | DCCS Fall 2017 Mean | DCCS Fall 2017 SD | DCCS Spring 2018 Mean | DCCS Spring 2018 SD | DCCS Gains 2017–18 Mean | DCCS Gains 2017–18 SD
Total | | 604 | 1.41 | 0.61 | 1.63 | 0.63 | 0.23 | 0.59
Curriculum | High Scope | 380 | 1.43 | 0.60 | 1.66 | 0.63 | 0.22 | 0.58
Curriculum | Creative Curriculum | 224 | 1.37 | 0.64 | 1.60 | 0.62 | 0.23 | 0.63
ECERS | Less than 3 | 91 | 1.32 | 0.58 | 1.48 | 0.70 | 0.16 | 0.64
ECERS | 3 or More | 513 | 1.42 | 0.62 | 1.66 | 0.61 | 0.24 | 0.59
CLASS ES | Less than 5.5 | 30 | 1.27 | 0.58 | 1.50 | 0.73 | 0.23 | 0.82
CLASS ES | 5.5 or More | 574 | 1.42 | 0.61 | 1.64 | 0.62 | 0.22 | 0.58
CLASS CO | Less than 5.5 | 129 | 1.36 | 0.62 | 1.54 | 0.67 | 0.19 | 0.68
CLASS CO | 5.5 or More | 475 | 1.42 | 0.61 | 1.66 | 0.61 | 0.24 | 0.57
CLASS IS | Less than 3 | 226 | 1.37 | 0.59 | 1.59 | 0.65 | 0.23 | 0.62
CLASS IS | 3 or More | 378 | 1.43 | 0.62 | 1.66 | 0.61 | 0.22 | 0.58


Children were also assessed with the Peg Tapping (PT) measure. PT is a measure of
inhibitory control. Table B.9 reports children’s pre- and post-test Peg Tapping scores by selected
child characteristics. No norms exist for this measure, either. The Self-Regulation Measurement
Study (Meador et al., 2013) included this measure as well. The authors reported average scores of
6.02 at 51–53 months and 8.80 at 57–59 months, a difference of 2.78. SPP children
advanced similarly throughout the preschool year. Table B.10 reports pre- and post-test Peg
Tapping scores for children in the sample across selected center characteristics.
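The comparison with the Self-Regulation Measurement Study can be laid out as simple arithmetic. The sketch below is a rough illustration only: it takes the benchmark means quoted in the text for the DCCS and Peg Tapping and the overall SPP mean gains from Tables B.7 and B.9, and reports how far each SPP gain sits from the benchmark age difference. The dictionary names are hypothetical, and the two samples are assumed to be only approximately comparable in age.

```python
# Illustrative sketch: compare SPP fall-to-spring gains with the age-based
# differences reported by the Self-Regulation Measurement Study.
# Assumption: the two samples are only roughly comparable in age
# (benchmark: 51-53 vs. 57-59 months; SPP: 53.2 vs. 59.3 months on average).

# Benchmark means at 51-53 and 57-59 months, as quoted in the text.
benchmark_means = {
    "DCCS": (1.42, 1.62),
    "Peg Tapping": (6.02, 8.80),
}

# Overall SPP mean gains as reported in Tables B.7 and B.9.
spp_gains = {
    "DCCS": 0.23,
    "Peg Tapping": 2.88,
}

# Change between the younger and older benchmark age bands.
benchmark_change = {
    measure: older - younger
    for measure, (younger, older) in benchmark_means.items()
}

# Positive values mean SPP children gained more than the benchmark difference.
gain_minus_benchmark = {
    measure: spp_gains[measure] - benchmark_change[measure]
    for measure in spp_gains
}

for measure in spp_gains:
    print(
        f"{measure}: SPP gain {spp_gains[measure]:.2f}, "
        f"benchmark change {benchmark_change[measure]:.2f}, "
        f"difference {gain_minus_benchmark[measure]:+.2f}"
    )
```

Because neither measure is normed, a difference of this kind is descriptive only and does not by itself indicate a program effect.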
Table B.9. Peg Tapping means and gains by child characteristics

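Group | Category | N | PT Fall 2017 Mean | PT Fall 2017 SD | PT Spring 2018 Mean | PT Spring 2018 SD | PT Gains 2017–18 Mean | PT Gains 2017–18 SD
Total | | 607 | 5.50 | 5.97 | 8.38 | 6.52 | 2.88 | 6.15
Gender | Male | 299 | 5.74 | 5.84 | 8.02 | 6.04 | 2.28 | 5.28
Gender | Female | 308 | 5.27 | 6.09 | 8.73 | 6.95 | 3.47 | 6.85
Age | 3-Year-Old Cohort | 167 | 1.83 | 4.00 | 5.10 | 5.14 | 3.27 | 5.01
Age | 4-Year-Old Cohort | 440 | 6.89 | 6.00 | 9.63 | 6.57 | 2.74 | 6.53
Ethnicity | White | 130 | 8.18 | 5.93 | 9.42 | 5.77 | 1.25 | 5.23
Ethnicity | Black | 152 | 3.52 | 5.19 | 6.07 | 5.71 | 2.55 | 5.37
Ethnicity | Asian | 178 | 4.72 | 5.86 | 8.46 | 5.91 | 3.74 | 5.16
Ethnicity | Hispanic | 70 | 5.06 | 5.78 | 8.26 | 5.80 | 3.20 | 5.07
Ethnicity | Other | 71 | 7.21 | 5.85 | 11.32 | 9.47 | 4.11 | 10.58
Language | English | 357 | 6.41 | 6.02 | 9.20 | 6.79 | 2.79 | 6.58
Language | DLL | 171 | 3.85 | 5.58 | 7.16 | 5.94 | 3.30 | 5.31
Language | Unknown | 79 | 4.94 | 5.79 | 7.33 | 5.96 | 2.39 | 5.82
FPL | <100 | 176 | 3.51 | 5.22 | 7.58 | 7.77 | 4.07 | 7.83
FPL | 100-300 | 289 | 5.26 | 5.82 | 7.78 | 5.92 | 2.52 | 5.24
FPL | >300 | 140 | 8.53 | 6.01 | 10.68 | 5.38 | 2.15 | 5.29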

Table B.10. Peg-Tapping means and gains by center characteristics

Group | Category | N | PT Fall 2017 Mean | PT Fall 2017 SD | PT Spring 2018 Mean | PT Spring 2018 SD | PT Gains 2017–18 Mean | PT Gains 2017–18 SD
Total | | 607 | 5.50 | 5.97 | 8.38 | 6.52 | 2.88 | 6.15
Curriculum | High Scope | 383 | 5.87 | 6.07 | 8.66 | 6.80 | 2.79 | 6.51
Curriculum | Creative Curriculum | 224 | 4.87 | 5.74 | 7.91 | 6.01 | 3.04 | 5.50
ECERS | Less than 3 | 58 | 5.22 | 6.40 | 8.00 | 5.85 | 2.78 | 5.99
ECERS | 3 or More | 516 | 5.64 | 5.95 | 8.61 | 6.60 | 2.97 | 6.21
CLASS ES | Less than 5.5 | 30 | 3.53 | 5.31 | 6.40 | 5.60 | 2.87 | 5.26
CLASS ES | 5.5 or More | 577 | 5.60 | 5.98 | 8.49 | 6.55 | 2.89 | 6.20
CLASS CO | Less than 5.5 | 129 | 4.99 | 5.97 | 7.61 | 5.75 | 2.62 | 5.42
CLASS CO | 5.5 or More | 478 | 5.64 | 5.96 | 8.59 | 6.71 | 2.96 | 6.33
CLASS IS | Less than 3 | 227 | 5.00 | 5.92 | 8.30 | 7.31 | 3.31 | 7.28
CLASS IS | 3 or More | 380 | 5.80 | 5.98 | 8.43 | 6.01 | 2.63 | 5.35


Raw Scores
Table B.11. Receptive vocabulary raw score means and gains by child characteristics

Group | Category | N | PPVT Fall 2017 Mean | PPVT Fall 2017 SD | PPVT Spring 2018 Mean | PPVT Spring 2018 SD | PPVT Gains 2017–18 Mean | PPVT Gains 2017–18 SD
Total | | 735 | 66.31 | 27.31 | 79.41 | 26.56 | 13.10 | 14.26
Gender | Male | 371 | 65.60 | 27.55 | 79.58 | 26.71 | 13.99 | 15.05
Gender | Female | 364 | 67.04 | 27.08 | 79.24 | 26.44 | 12.20 | 13.36
Age | 3-Year-Old Cohort | 185 | 48.58 | 20.93 | 62.17 | 20.59 | 13.59 | 13.20
Age | 4-Year-Old Cohort | 550 | 72.27 | 26.62 | 85.21 | 25.82 | 12.94 | 14.60
Ethnicity | White | 161 | 84.43 | 25.86 | 96.81 | 22.94 | 12.37 | 15.48
Ethnicity | Black | 190 | 55.50 | 21.35 | 71.07 | 22.74 | 15.57 | 14.66
Ethnicity | Asian | 203 | 57.31 | 25.85 | 69.93 | 25.11 | 12.62 | 12.80
Ethnicity | Hispanic | 92 | 66.27 | 26.45 | 79.75 | 27.76 | 13.48 | 12.30
Ethnicity | Other | 81 | 77.70 | 23.69 | 87.56 | 24.20 | 9.85 | 15.87
Language | English | 432 | 75.52 | 25.09 | 87.32 | 24.63 | 11.80 | 15.30
Language | DLL | 200 | 48.70 | 22.04 | 63.52 | 22.84 | 14.82 | 11.80
Language | Unknown | 103 | 61.86 | 27.76 | 77.12 | 26.98 | 15.25 | 13.56
FPL | <100 | 228 | 56.77 | 23.73 | 70.50 | 24.33 | 13.73 | 14.72
FPL | 100-300 | 334 | 64.19 | 26.50 | 77.55 | 25.81 | 13.36 | 14.23
FPL | >300 | 170 | 83.58 | 25.67 | 95.28 | 24.09 | 11.70 | 13.78

Table B.12. Receptive vocabulary raw score means and gains by center characteristics

Group | Category | N | PPVT Fall 2017 Mean | PPVT Fall 2017 SD | PPVT Spring 2018 Mean | PPVT Spring 2018 SD | PPVT Gains 2017–18 Mean | PPVT Gains 2017–18 SD
Total | | 735 | 66.31 | 27.31 | 79.41 | 26.56 | 13.10 | 14.26
Curriculum | High Scope | 464 | 68.28 | 27.96 | 80.99 | 27.24 | 12.71 | 14.32
Curriculum | Creative Curriculum | 271 | 62.93 | 25.85 | 76.71 | 25.16 | 13.78 | 14.15
ECERS | Less than 3 | 72 | 63.31 | 26.51 | 75.64 | 26.50 | 12.33 | 13.78
ECERS | 3 or More | 630 | 67.05 | 27.57 | 80.30 | 26.52 | 13.25 | 14.34
CLASS ES | Less than 5.5 | 38 | 58.26 | 23.89 | 71.03 | 25.26 | 12.76 | 11.27
CLASS ES | 5.5 or More | 697 | 66.75 | 27.43 | 79.87 | 26.57 | 13.12 | 14.41
CLASS CO | Less than 5.5 | 158 | 64.11 | 24.71 | 77.14 | 24.31 | 13.03 | 13.47
CLASS CO | 5.5 or More | 577 | 66.91 | 27.97 | 80.03 | 27.13 | 13.12 | 14.48
CLASS IS | Less than 3 | 270 | 63.15 | 25.25 | 77.13 | 24.81 | 13.98 | 14.45
CLASS IS | 3 or More | 465 | 68.14 | 28.30 | 80.74 | 27.46 | 12.59 | 14.14


Table B.13. Literacy raw score means and gains by child characteristics

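Group | Category | N | WJ-LW Fall 2017 Mean | WJ-LW Fall 2017 SD | WJ-LW Spring 2018 Mean | WJ-LW Spring 2018 SD | WJ-LW Gains 2017–18 Mean | WJ-LW Gains 2017–18 SD
Total | | 606 | 8.05 | 6.13 | 11.12 | 7.53 | 3.06 | 4.56
Gender | Male | 298 | 8.39 | 6.58 | 11.23 | 7.37 | 2.84 | 3.25
Gender | Female | 308 | 7.72 | 5.65 | 11.00 | 7.69 | 3.28 | 5.55
Age | 3-Year-Old Cohort | 167 | 6.07 | 4.48 | 8.75 | 5.31 | 2.68 | 3.48
Age | 4-Year-Old Cohort | 439 | 8.81 | 6.49 | 12.02 | 8.04 | 3.21 | 4.91
Ethnicity | White | 129 | 8.76 | 6.59 | 11.71 | 6.85 | 2.95 | 2.99
Ethnicity | Black | 152 | 7.56 | 5.62 | 10.31 | 6.25 | 2.75 | 3.04
Ethnicity | Asian | 178 | 8.64 | 7.13 | 11.71 | 7.80 | 3.07 | 3.88
Ethnicity | Hispanic | 70 | 6.49 | 4.26 | 9.30 | 5.74 | 2.81 | 3.18
Ethnicity | Other | 71 | 7.80 | 4.80 | 12.11 | 11.07 | 4.31 | 9.65
Language | English | 357 | 8.50 | 6.62 | 11.44 | 8.48 | 2.94 | 5.12
Language | DLL | 171 | 7.60 | 5.50 | 10.54 | 5.81 | 2.95 | 3.57
Language | Unknown | 78 | 6.99 | 4.81 | 10.87 | 6.11 | 3.88 | 3.65
FPL | <100 | 176 | 6.70 | 5.40 | 10.30 | 8.84 | 3.60 | 6.86
FPL | 100-300 | 288 | 7.68 | 5.26 | 10.18 | 5.66 | 2.50 | 2.90
FPL | >300 | 140 | 10.54 | 7.78 | 14.15 | 8.32 | 3.61 | 3.50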

Table B.14. Literacy raw score means and gains by center characteristics

Group | Category | N | WJ-LW Fall 2017 Mean | WJ-LW Fall 2017 SD | WJ-LW Spring 2018 Mean | WJ-LW Spring 2018 SD | WJ-LW Gains 2017–18 Mean | WJ-LW Gains 2017–18 SD
Total | | 606 | 8.05 | 6.13 | 11.12 | 7.53 | 3.06 | 4.56
Curriculum | High Scope | 382 | 7.89 | 5.53 | 11.23 | 7.39 | 3.34 | 5.18
Curriculum | Creative Curriculum | 224 | 8.33 | 7.03 | 10.92 | 7.77 | 2.60 | 3.21
ECERS | Less than 3 | 58 | 7.53 | 4.91 | 10.52 | 5.69 | 2.98 | 3.64
ECERS | 3 or More | 515 | 8.16 | 6.35 | 11.27 | 7.77 | 3.11 | 4.69
CLASS ES | Less than 5.5 | 30 | 7.13 | 4.18 | 10.40 | 4.93 | 3.27 | 3.84
CLASS ES | 5.5 or More | 576 | 8.10 | 6.21 | 11.15 | 7.64 | 3.05 | 4.60
CLASS CO | Less than 5.5 | 129 | 7.74 | 5.37 | 10.26 | 5.99 | 2.52 | 3.18
CLASS CO | 5.5 or More | 477 | 8.14 | 6.32 | 11.35 | 7.88 | 3.21 | 4.86
CLASS IS | Less than 3 | 227 | 7.52 | 5.39 | 10.68 | 8.11 | 3.17 | 6.00
CLASS IS | 3 or More | 379 | 8.37 | 6.51 | 11.37 | 7.16 | 3.00 | 3.44


Table B.15. Math raw score means and gains by child characteristics

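Group | Category | N | WJ-AP Fall 2017 Mean | WJ-AP Fall 2017 SD | WJ-AP Spring 2018 Mean | WJ-AP Spring 2018 SD | WJ-AP Gains 2017–18 Mean | WJ-AP Gains 2017–18 SD
Total | | 606 | 10.30 | 5.22 | 13.50 | 6.63 | 3.19 | 5.21
Gender | Male | 298 | 10.33 | 5.44 | 13.26 | 5.60 | 2.93 | 3.58
Gender | Female | 308 | 10.28 | 5.01 | 13.73 | 7.49 | 3.45 | 6.41
Age | 3-Year-Old Cohort | 167 | 7.03 | 4.29 | 10.70 | 9.24 | 3.67 | 8.21
Age | 4-Year-Old Cohort | 439 | 11.55 | 5.00 | 14.56 | 4.92 | 3.01 | 3.44
Ethnicity | White | 129 | 13.57 | 4.44 | 15.78 | 4.11 | 2.22 | 3.12
Ethnicity | Black | 152 | 8.44 | 4.57 | 11.98 | 9.61 | 3.54 | 8.45
Ethnicity | Asian | 178 | 9.29 | 5.52 | 12.78 | 5.35 | 3.48 | 3.43
Ethnicity | Hispanic | 70 | 9.50 | 4.35 | 13.17 | 5.14 | 3.67 | 3.81
Ethnicity | Other | 71 | 11.49 | 4.72 | 14.75 | 5.39 | 3.25 | 3.92
Language | English | 357 | 11.47 | 4.86 | 14.47 | 7.25 | 3.00 | 6.11
Language | DLL | 171 | 8.13 | 5.24 | 11.76 | 5.28 | 3.63 | 3.40
Language | Unknown | 78 | 9.71 | 5.22 | 12.82 | 5.35 | 3.12 | 3.83
FPL | <100 | 176 | 8.81 | 4.79 | 11.99 | 5.22 | 3.18 | 3.81
FPL | 100-300 | 288 | 9.66 | 5.05 | 13.08 | 7.64 | 3.42 | 6.43
FPL | >300 | 140 | 13.54 | 4.75 | 16.31 | 4.96 | 2.77 | 3.76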

Table B.16. Math raw score means and gains by center characteristics
Group | Category | N | WJ-AP Fall 2017 Mean | WJ-AP Fall 2017 SD | WJ-AP Spring 2018 Mean | WJ-AP Spring 2018 SD | WJ-AP Gains 2017–18 Mean | WJ-AP Gains 2017–18 SD
Total | | 606 | 10.30 | 5.22 | 13.50 | 6.63 | 3.19 | 5.21
Curriculum | High Scope | 382 | 10.60 | 5.40 | 13.65 | 5.19 | 3.04 | 3.61
Curriculum | Creative Curriculum | 224 | 9.79 | 4.86 | 13.24 | 8.55 | 3.45 | 7.16
ECERS | Less than 3 | 58 | 9.17 | 5.32 | 12.26 | 5.34 | 3.09 | 3.16
ECERS | 3 or More | 515 | 10.53 | 5.24 | 13.76 | 6.82 | 3.23 | 5.49
CLASS ES | Less than 5.5 | 30 | 7.73 | 3.79 | 11.20 | 5.36 | 3.47 | 3.12
CLASS ES | 5.5 or More | 576 | 10.44 | 5.25 | 13.61 | 6.67 | 3.18 | 5.30
CLASS CO | Less than 5.5 | 129 | 9.42 | 4.82 | 12.07 | 5.27 | 2.65 | 3.52
CLASS CO | 5.5 or More | 477 | 10.54 | 5.30 | 13.88 | 6.90 | 3.34 | 5.58
CLASS IS | Less than 3 | 227 | 9.77 | 4.94 | 13.14 | 8.59 | 3.37 | 7.29
CLASS IS | 3 or More | 379 | 10.62 | 5.36 | 13.71 | 5.10 | 3.09 | 3.42


Appendix C. Sensitivity Analyses.
Table C.1. Multivariate analyses of children’s 2017–18 raw score gains in relation to child and
site or classroom characteristics and ECERS-3, excluding FCCs
Variables | Rec. Vocabulary (PPVT/TVIP) | Literacy (WJ/WM-LW) | Math (WJ/WM-AP)
3-year-olds | -4.260** (1.44) | 0.720 (0.53) | -0.378 (0.61)
Returning Status | -3.730* (1.71) | 1.674* (0.68) | -0.943 (0.78)
Asian | -3.205 (1.70) | -0.249 (0.64) | 0.725 (0.74)
Black | 0.209 (1.75) | -0.412 (0.66) | 1.013 (0.77)
Hispanic | 0.217 (1.89) | -0.128 (0.73) | 0.715 (0.85)
Other | -2.175 (1.87) | 1.027 (0.70) | 0.748 (0.81)
DLL | -1.115 (1.43) | 0.319 (0.51) | 0.035 (0.60)
Agency Selected | 2.540 (1.34) | 0.156 (0.53) | 0.269 (0.58)
HH Income<20k | -1.731 (3.30) | -1.436 (1.30) | 0.486 (1.51)
HH Income 21-40k | -2.294 (2.58) | -0.658 (0.96) | -0.638 (1.12)
HH Income 41-60k | 0.312 (2.49) | -0.704 (0.95) | -0.307 (1.09)
HH Income 61-80k | 1.303 (2.48) | -0.761 (0.92) | 1.381 (1.07)
FPL < 100 | -0.385 (3.27) | 1.885 (1.31) | -0.970 (1.50)
FPL 100 to 300 | -0.855 (2.28) | -0.217 (0.86) | -0.162 (0.99)
High Scope | 0.076 (1.36) | 0.430 (0.55) | -0.410 (0.58)
Class Size | 0.413 (0.33) | 0.042 (0.13) | -0.148 (0.14)
Teacher Qual. Exceeds | 1.810 (1.89) | 1.250 (0.80) | -1.309 (0.84)
Teacher Qual. Meets | 2.389 (1.72) | 0.019 (0.69) | -0.403 (0.72)
Teacher Black | 4.486* (2.08) | 0.659 (0.87) | -1.863* (0.91)
Teacher Hispanic | 1.328 (1.92) | -0.312 (0.80) | -1.578 (0.83)
Teacher Asian | 3.828 (2.34) | 0.641 (0.95) | -1.277 (1.00)
Teacher Other | 2.134 (1.62) | 1.078 (0.69) | 0.045 (0.72)
ECERS-3 | 1.011 (0.94) | -0.244 (0.39) | 0.350 (0.41)
N | 702 | 573 | 573
* p<0.05; ** p<0.01; *** p<0.001. Note: Reference groups omitted from the estimation are Males, White, English,
FPL 300%+, Income>80 thousand, and Creative Curriculum. Other controls are pre-test, age in months, days
between tests and an indicator for missing language, income, race, FPL, and teacher qualification and race.
Standardized scores are used for PPVT, and WJ or WM. Errors are clustered by classroom.
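To make the star notation in the table note concrete, the sketch below approximates a two-sided p-value from a coefficient and the value reported in parentheses beneath it, then maps it to the thresholds in the note. It rests on two assumptions that the report does not state explicitly: that the parenthesized values are standard errors, and that a standard normal reference distribution is an adequate approximation. The function names are hypothetical, and the report's own tests (with errors clustered by classroom) may differ slightly for borderline cases.

```python
import math

# Illustrative sketch: approximate two-sided p-values from a coefficient and
# its standard error, then map them to the star notation in the table notes.
# Assumptions: values in parentheses are standard errors, and a standard normal
# reference distribution is adequate; the report's own tests may differ.


def two_sided_p(coef: float, se: float) -> float:
    """Two-sided p-value for coef / se under a standard normal."""
    z = abs(coef / se)
    return math.erfc(z / math.sqrt(2.0))


def stars(p: float) -> str:
    """Star notation from the table note: * p<0.05; ** p<0.01; *** p<0.001."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# (coefficient, standard error) pairs taken from Table C.1.
examples = {
    "3-year-olds, vocabulary": (-4.260, 1.44),
    "Returning Status, literacy": (1.674, 0.68),
    "Asian, vocabulary": (-3.205, 1.70),
}

for label, (coef, se) in examples.items():
    p = two_sided_p(coef, se)
    print(f"{label}: z = {coef / se:.2f}, p = {p:.3f} {stars(p)}")
```

For these three entries the approximation reproduces the markers shown in Table C.1: two stars, one star, and none, respectively.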

Table C.2. Multivariate analyses of children’s 2017–18 raw score gains in relation to child and
site or classroom characteristics and CLASS dimensions, excluding FCCs
Variables | Rec. Vocabulary (PPVT/TVIP) | Literacy (WJ/WM-LW) | Math (WJ/WM-AP)
3-year-olds | -4.192** (1.45) | 0.645 (0.53) | -0.448 (0.62)
Returning Status | -3.686* (1.72) | 1.671* (0.68) | -0.880 (0.78)
Asian | -3.079 (1.71) | -0.290 (0.64) | 0.772 (0.74)
Black | 0.267 (1.76) | -0.389 (0.66) | 1.132 (0.77)
Hispanic | 0.250 (1.89) | -0.178 (0.73) | 0.677 (0.85)
Other | -2.081 (1.88) | 1.042 (0.70) | 0.820 (0.81)
DLL | -1.102 (1.43) | 0.325 (0.51) | 0.067 (0.60)
Agency Selected | 2.726* (1.32) | 0.158 (0.52) | 0.403 (0.57)
HH Income<20k | -1.889 (3.29) | -1.343 (1.30) | 0.597 (1.50)
HH Income 21-40k | -2.364 (2.59) | -0.583 (0.96) | -0.527 (1.12)
HH Income 41-60k | 0.231 (2.49) | -0.601 (0.95) | -0.232 (1.09)
HH Income 61-80k | 1.244 (2.48) | -0.691 (0.92) | 1.463 (1.07)
FPL < 100 | -0.160 (3.26) | 1.799 (1.31) | -1.098 (1.50)
FPL 100 to 300 | -0.741 (2.28) | -0.269 (0.86) | -0.196 (0.99)
High Scope | -0.167 (1.39) | 0.321 (0.56) | -0.826 (0.59)
Class Size | 0.469 (0.34) | 0.038 (0.13) | -0.108 (0.14)
Teacher Qual. Exceeds | 2.114 (1.85) | 1.048 (0.78) | -1.352 (0.82)
Teacher Qual. Meets | 2.823 (1.68) | -0.293 (0.67) | -0.726 (0.71)
Teacher Black | 4.612* (2.09) | 0.767 (0.87) | -1.660 (0.91)
Teacher Hispanic | 1.207 (1.94) | -0.192 (0.80) | -1.442 (0.83)
Teacher Asian | 3.771 (2.39) | 0.836 (0.96) | -0.914 (1.02)
Teacher Other | 2.123 (1.66) | 0.928 (0.69) | -0.170 (0.72)
CLASS ES average | 0.509 (1.57) | -0.569 (0.66) | -1.285 (0.69)
CLASS CO average | 0.362 (1.25) | 0.256 (0.52) | 1.246* (0.54)
CLASS IS average | -0.199 (0.65) | 0.253 (0.26) | 0.033 (0.28)
N | 702 | 573 | 573
* p<0.05; ** p<0.01; *** p<0.001. Note: Reference groups omitted from the estimation are Males, White, English,
FPL 300%+, Income>80 thousand, and Creative Curriculum. Other controls are pre-test, age in months, days
between tests and an indicator for missing language, income, race, FPL, and teacher qualification and race.
Standardized scores are used for PPVT, and WJ or WM. Errors are clustered by classroom.

Table C.3. Multivariate analyses of children’s 2017–18 raw score gains in relation to child and
site or classroom characteristics and CLASS dimensions, including FCCs
Variables | Rec. Vocabulary (PPVT/TVIP) | Literacy (WJ/WM-LW) | Math (WJ/WM-AP)
3-year-olds | -4.359** (1.39) | 0.290 (0.50) | -0.464 (0.58)
Returning Status | -3.947* (1.71) | 1.471* (0.67) | -0.863 (0.76)
Asian | -2.900 (1.70) | 0.033 (0.63) | 0.821 (0.71)
Black | 0.195 (1.73) | -0.247 (0.64) | 1.188 (0.74)
Hispanic | -0.472 (1.86) | -0.059 (0.71) | 0.852 (0.81)
Other | -2.177 (1.86) | 1.233 (0.68) | 0.917 (0.78)
DLL | -1.145 (1.38) | 0.226 (0.49) | -0.046 (0.56)
Agency Selected | 2.461 (1.32) | 0.050 (0.53) | 0.409 (0.56)
HH Income<20k | -1.568 (3.23) | -1.045 (1.25) | 0.176 (1.42)
HH Income 21-40k | -1.695 (2.57) | -0.642 (0.95) | -0.403 (1.09)
HH Income 41-60k | 0.147 (2.48) | -0.668 (0.94) | -0.182 (1.07)
HH Income 61-80k | 0.993 (2.49) | -0.659 (0.92) | 1.464 (1.05)
FPL < 100 | -1.386 (3.19) | 1.233 (1.25) | -0.688 (1.41)
FPL 100 to 300 | -0.960 (2.29) | -0.317 (0.85) | -0.240 (0.97)
FCC | 0.103 (4.24) | -0.802 (1.64) | -0.103 (1.69)
High Scope | -1.248 (1.32) | 0.080 (0.54) | -0.828 (0.54)
Class Size | 0.436 (0.32) | -0.015 (0.13) | -0.093 (0.13)
Teacher Qual. Exceeds | 1.498 (1.84) | 0.633 (0.79) | -1.342 (0.79)
Teacher Qual. Meets | 2.623 (1.68) | -0.341 (0.69) | -0.740 (0.69)
Teacher Black | 2.656 (1.99) | -0.124 (0.84) | -1.611 (0.84)
Teacher Hispanic | 0.387 (1.93) | -0.577 (0.81) | -1.417 (0.81)
Teacher Asian | 2.604 (2.35) | 0.207 (0.96) | -0.805 (0.97)
Teacher Other | 2.106 (1.66) | 0.934 (0.72) | -0.187 (0.71)
CLASS ES average | 0.461 (1.53) | -0.320 (0.65) | -1.241 (0.65)
CLASS CO average | 0.623 (1.23) | 0.252 (0.52) | 1.256* (0.52)
CLASS IS average | -0.404 (0.64) | 0.188 (0.27) | 0.045 (0.27)
N | 735 | 606 | 606
* p<0.05; ** p<0.01; *** p<0.001. Note: Reference groups omitted from the estimation are Males, White, English,
FPL 300%+, Income>80 thousand, and Creative Curriculum. Other controls are pre-test, age in months, days
between tests and an indicator for missing language, income, race, FPL, and teacher qualification and race.
Standardized scores are used for PPVT, and WJ or WM. Errors are clustered by classroom.

Table C.4. Multivariate analyses of children’s 2017–18 standard score gains in relation to child
and site or classroom characteristics and ECERS-3 threshold, excluding FCCs
                       Rec. Vocabulary  Literacy         Math             Executive Function
Variables              (PPVT/TVIP)      (WJ/WM-LW)       (WJ/WM-AP)       DCCS             PT
3-year-olds            -1.238 (1.12)    3.471*** (1.01)  0.719 (1.11)     -0.132* (0.06)   -1.448* (0.65)
Returning Status       -2.272 (1.37)    1.346 (1.33)     -0.673 (1.45)    -0.015 (0.08)    0.021 (0.83)
Asian                  -2.944* (1.37)   -0.652 (1.24)    0.523 (1.37)     -0.038 (0.07)    1.708* (0.77)
Black                  -0.849 (1.40)    -0.215 (1.29)    -1.216 (1.44)    -0.118 (0.07)    -0.535 (0.81)
Hispanic               0.042 (1.52)     -1.258 (1.43)    1.591 (1.59)     0.038 (0.08)     0.886 (0.89)
Other                  -2.436 (1.50)    0.591 (1.36)     1.151 (1.51)     0.005 (0.08)     2.477** (0.85)
DLL                    -0.436 (1.15)    0.820 (1.01)     0.938 (1.11)     0.028 (0.06)     -0.620 (0.63)
Agency Selected        2.564* (1.06)    -0.177 (1.00)    0.386 (1.06)     -0.009 (0.06)    -0.489 (0.60)
HH Income<20k          -1.917 (2.64)    -3.374 (2.55)    2.322 (2.80)     -0.148 (0.15)    -0.575 (1.59)
HH Income 21-40k       -1.802 (2.07)    -1.482 (1.88)    -1.227 (2.09)    -0.189 (0.11)    -1.375 (1.18)
HH Income 41-60k       -0.432 (2.00)    -1.258 (1.86)    0.205 (2.04)     -0.095 (0.11)    -0.550 (1.16)
HH Income 61-80k       0.481 (1.99)     -2.112 (1.81)    0.134 (1.99)     -0.059 (0.11)    -0.008 (1.13)
FPL < 100              0.423 (2.61)     2.166 (2.57)     -4.144 (2.79)    0.087 (0.15)     0.807 (1.58)
FPL 100 to 300         -0.510 (1.83)    -1.320 (1.68)    -1.631 (1.84)    0.012 (0.10)     -0.283 (1.04)
High Scope             0.416 (1.08)     0.144 (1.03)     -0.402 (1.06)    -0.030 (0.06)    0.245 (0.60)
Class Size             0.398 (0.27)     -0.019 (0.25)    -0.357 (0.26)    -0.002 (0.01)    -0.088 (0.15)
Teacher Qual. Exceeds  3.066* (1.46)    1.858 (1.46)     1.333 (1.50)     -0.022 (0.08)    -0.673 (0.86)
Teacher Qual. Meets    2.844* (1.28)    -0.664 (1.21)    0.915 (1.26)     0.001 (0.07)     0.387 (0.71)
Teacher Black          4.995** (1.65)   1.666 (1.63)     -1.095 (1.68)    -0.078 (0.09)    -0.679 (0.96)
Teacher Hispanic       1.901 (1.54)     -0.565 (1.50)    -1.166 (1.54)    -0.130 (0.08)    -1.029 (0.88)
Teacher Asian          4.595* (1.87)    2.055 (1.80)     -0.048 (1.86)    -0.090 (0.10)    -1.926 (1.06)
Teacher Other          1.977 (1.30)     3.357** (1.30)   0.873 (1.33)     -0.003 (0.07)    -0.551 (0.76)
ECERS-3 above 3.0      -0.348 (1.40)    0.889 (1.37)     0.427 (1.42)     -0.007 (0.07)    0.535 (0.80)
N                      702              573              573              571              574

* p<0.05; ** p<0.01; *** p<0.001. Note: Reference groups omitted from the estimation are Males, White, English,
FPL 300%+, Income>80 thousand, and Creative Curriculum. Other controls are pre-test, age in months, days
between tests and an indicator for missing language, income, race, FPL, and teacher qualification and race.
Standardized scores are used for PPVT, and WJ or WM. Errors are clustered by classroom.

Table C.5. Multivariate analyses of children’s 2017–18 standard score gains in relation to child
and site or classroom characteristics and CLASS dimensions’ thresholds, including FCCs
                       Rec. Vocabulary  Literacy         Math             Executive Function
Variables              (PPVT/TVIP)      (WJ/WM-LW)       (WJ/WM-AP)       DCCS             PT
3-year-olds            -1.387 (1.07)    2.615** (0.97)   0.571 (1.05)     -0.139* (0.06)   -1.847** (0.62)
Returning Status       -2.480 (1.37)    0.850 (1.32)     -0.752 (1.43)    -0.011 (0.08)    0.017 (0.81)
Asian                  -2.817* (1.35)   0.020 (1.24)     0.789 (1.35)     -0.061 (0.07)    1.829* (0.75)
Black                  -0.918 (1.37)    0.184 (1.26)     -0.880 (1.39)    -0.155* (0.07)   -0.393 (0.78)
Hispanic               -0.518 (1.48)    -0.881 (1.40)    1.728 (1.54)     0.006 (0.08)     0.805 (0.86)
Other                  -2.553 (1.48)    1.234 (1.35)     1.662 (1.47)     -0.009 (0.08)    2.552** (0.83)
DLL                    -0.569 (1.11)    0.549 (0.97)     0.486 (1.06)     0.044 (0.06)     -0.590 (0.60)
Agency Selected        2.338* (1.05)    -0.264 (1.00)    0.380 (1.04)     -0.011 (0.06)    -0.550 (0.59)
HH Income<20k          -1.518 (2.58)    -2.799 (2.46)    1.272 (2.67)     -0.205 (0.14)    -0.877 (1.52)
HH Income 21-40k       -1.355 (2.05)    -1.306 (1.87)    -0.621 (2.05)    -0.184 (0.11)    -1.308 (1.16)
HH Income 41-60k       -0.505 (1.98)    -1.278 (1.85)    0.402 (2.01)     -0.093 (0.11)    -0.406 (1.13)
HH Income 61-80k       0.288 (1.99)     -1.918 (1.82)    0.184 (1.97)     -0.076 (0.11)    0.083 (1.12)
FPL < 100              -0.640 (2.55)    1.228 (2.47)     -3.307 (2.66)    0.114 (0.14)     1.032 (1.50)
FPL 100 to 300         -0.673 (1.82)    -1.464 (1.68)    -1.886 (1.83)    -0.011 (0.10)    -0.328 (1.03)
FCC                    -0.489 (3.31)    -5.807 (3.03)    -2.451 (3.11)    -0.393* (0.16)   -1.777 (1.76)
High Scope             -0.371 (1.03)    -0.299 (0.99)    -0.870 (1.00)    -0.037 (0.05)    0.132 (0.57)
Class Size             0.320 (0.25)     -0.090 (0.24)    -0.240 (0.24)    -0.007 (0.01)    -0.122 (0.14)
Teacher Qual. Exceeds  2.882 (1.48)     0.834 (1.50)     0.644 (1.51)     -0.078 (0.08)    -0.646 (0.86)
Teacher Qual. Meets    3.122* (1.35)    -1.319 (1.30)    -0.039 (1.31)    -0.029 (0.07)    0.231 (0.74)
Teacher Black          3.219* (1.61)    0.755 (1.59)     -0.480 (1.60)    -0.122 (0.09)    -0.847 (0.91)
Teacher Hispanic       1.344 (1.58)     -1.050 (1.56)    -1.210 (1.56)    -0.168* (0.08)   -0.706 (0.88)
Teacher Asian          3.506 (1.86)     1.410 (1.79)     0.557 (1.81)     -0.128 (0.10)    -2.135* (1.02)
Teacher Other          2.145 (1.33)     2.990* (1.35)    0.573 (1.35)     -0.029 (0.07)    -0.525 (0.76)
CLASS ES above 5.5     1.435 (2.31)     -1.930 (2.28)    -2.354 (2.32)    -0.144 (0.12)    0.458 (1.31)
CLASS CO above 5.5     -0.848 (1.29)    2.068 (1.26)     2.389 (1.27)     0.080 (0.07)     1.408 (0.72)
CLASS IS above 3.0     -0.281 (1.01)    -0.012 (0.99)    0.545 (0.99)     0.012 (0.05)     -1.080 (0.56)
N                      735              606              606              604              607

* p<0.05; ** p<0.01; *** p<0.001. Note: Reference groups omitted from the estimation are Males, White, English,
FPL 300%+, Income>80 thousand, and Creative Curriculum. Other controls are pre-test, age in months, days
between tests and an indicator for missing language, income, race, FPL, and teacher qualification and race.
Standardized scores are used for PPVT, and WJ or WM. Errors are clustered by classroom.

Table C.6. Multivariate analyses of children’s 2017–18 standard score gains in relation to child
and site or classroom characteristics and overall ECERS-3 with Agency Fixed Effects, excluding
FCCs
                       Rec. Vocabulary  Literacy         Math             Executive Function
Variables              (PPVT/TVIP)      (WJ/WM-LW)       (WJ/WM-AP)       DCCS             PT
3-year-olds            -1.471 (1.16)    3.628*** (1.04)  0.739 (1.14)     -0.113~ (0.06)   -1.386* (0.68)
Returning Status       -2.299 (1.40)    1.732 (1.34)     0.356 (1.48)     0.017 (0.08)     0.280 (0.85)
Asian                  -2.856* (1.37)   -0.937 (1.23)    0.354 (1.37)     -0.032 (0.07)    1.563* (0.77)
Black                  -0.868 (1.40)    -0.255 (1.29)    -1.050 (1.44)    -0.094 (0.08)    -0.355 (0.81)
Hispanic               0.309 (1.51)     -1.267 (1.42)    1.138 (1.58)     0.046 (0.08)     0.908 (0.89)
Other                  -2.353 (1.50)    0.341 (1.35)     1.416 (1.50)     0.016 (0.08)     2.396** (0.85)
DLL                    -0.681 (1.16)    0.330 (1.01)     0.795 (1.12)     0.010 (0.06)     -0.720 (0.64)
Agency Selected        3.914** (1.35)   1.432 (1.24)     1.801 (1.37)     0.063 (0.07)     -0.145 (0.78)
HH Income<20k          -1.913 (2.67)    -3.396 (2.55)    2.575 (2.80)     -0.163 (0.15)    -0.561 (1.61)
HH Income 21-40k       -2.490 (2.07)    -1.714 (1.88)    -1.502 (2.09)    -0.221* (0.11)   -1.424 (1.19)
HH Income 41-60k       -1.094 (1.99)    -1.407 (1.84)    -0.080 (2.04)    -0.128 (0.11)    -0.753 (1.16)
HH Income 61-80k       0.094 (1.99)     -1.975 (1.80)    0.231 (1.99)     -0.084 (0.11)    -0.107 (1.14)
FPL < 100              -0.209 (2.66)    1.752 (2.57)     -4.728 (2.79)    0.078 (0.15)     0.839 (1.60)
FPL 100 to 300         -0.061 (1.83)    -1.682 (1.67)    -1.710 (1.84)    0.015 (0.10)     -0.284 (1.05)
High Scope             1.268 (10.51)    11.346 (8.64)    -10.407 (9.52)   0.486 (0.50)     -1.157 (5.45)
Class Size             0.112 (0.39)     0.090 (0.35)     0.444 (0.38)     0.019 (0.02)     -0.017 (0.22)
Teacher Qual. Exceeds  -4.322 (10.99)   -13.185 (9.13)   8.327 (10.06)    -0.601 (0.53)    1.237 (5.75)
Teacher Qual. Meets    1.505 (1.69)     -0.868 (1.46)    1.522 (1.61)     -0.013 (0.08)    0.547 (0.92)
Teacher Black          3.368 (2.16)     -0.103 (1.98)    -1.156 (2.18)    0.033 (0.11)     -0.140 (1.25)
Teacher Hispanic       1.758 (1.66)     -0.467 (1.48)    -1.463 (1.63)    -0.110 (0.09)    -0.997 (0.93)
Teacher Asian          3.972 (2.81)     -2.884 (2.45)    -3.638 (2.72)    -0.319* (0.14)   -4.250** (1.55)
Teacher Other          1.875 (1.36)     3.965** (1.26)   1.750 (1.38)     0.047 (0.07)     -0.424 (0.79)
ECERS-3                0.322 (0.86)     -1.345 (0.79)    -1.157 (0.87)    -0.096* (0.05)   -0.887 (0.50)
N                      702              573              573              571              574

* p<0.05; ** p<0.01; *** p<0.001. Note: Reference groups omitted from the estimation are Males, White, English,
FPL 300%+, Income>80 thousand, and Creative Curriculum. Other controls are pre-test, age in months, days
between tests and an indicator for missing language, income, race, FPL, and teacher qualifications, race and agency
fixed effects. Standardized scores are used for PPVT, and WJ or WM. Errors are clustered by classroom.

Table C.7. Multivariate analyses of children’s 2017–18 standard score gains in relation to child
and site or classroom characteristics and overall CLASS dimensions with Agency Fixed Effects,
including FCCs
                       Rec. Vocabulary  Literacy         Math             Executive Function
Variables              (PPVT/TVIP)      (WJ/WM-LW)       (WJ/WM-AP)       DCCS             PT
3-year-olds            -1.702 (1.11)    2.410* (1.00)    0.476 (1.09)     -0.114 (0.06)    -1.597* (0.64)
Returning Status       -2.481 (1.39)    1.029 (1.33)     0.229 (1.45)     0.017 (0.08)     0.184 (0.83)
Asian                  -2.925* (1.36)   -0.564 (1.22)    0.384 (1.35)     -0.039 (0.07)    1.673* (0.76)
Black                  -0.850 (1.37)    0.128 (1.27)     -0.693 (1.39)    -0.127 (0.07)    -0.252 (0.79)
Hispanic               -0.175 (1.48)    -0.998 (1.39)    1.346 (1.53)     0.022 (0.08)     0.904 (0.87)
Other                  -2.346 (1.47)    0.846 (1.33)     1.770 (1.46)     -0.001 (0.08)    2.375** (0.83)
DLL                    -0.523 (1.13)    0.068 (0.98)     0.501 (1.08)     0.025 (0.06)     -0.706 (0.61)
Agency Selected        3.892** (1.31)   0.979 (1.22)     1.491 (1.33)     0.009 (0.07)     -0.633 (0.76)
HH Income<20k          -1.379 (2.60)    -2.406 (2.48)    1.811 (2.71)     -0.212 (0.15)    -0.909 (1.55)
HH Income 21-40k       -1.965 (2.04)    -1.282 (1.87)    -0.747 (2.06)    -0.221* (0.11)   -1.387 (1.17)
HH Income 41-60k       -1.145 (1.98)    -0.936 (1.84)    0.221 (2.02)     -0.127 (0.11)    -0.632 (1.15)
HH Income 61-80k       -0.052 (1.98)    -1.436 (1.81)    0.458 (1.98)     -0.109 (0.11)    -0.182 (1.13)
FPL < 100              -1.040 (2.58)    0.586 (2.48)     -4.061 (2.68)    0.099 (0.14)     1.039 (1.53)
FPL 100 to 300         -0.051 (1.82)    -1.839 (1.68)    -2.122 (1.83)    0.001 (0.10)     -0.335 (1.04)
FCC                    1.240 (6.64)     -12.659* (5.63)  9.340 (6.14)     -0.320 (0.33)    -1.382 (3.51)
High Scope             -9.193 (5.63)    3.303 (4.67)     -8.112 (5.10)    0.101 (0.27)     -1.049 (2.91)
Class Size             -0.035 (0.36)    -0.168 (0.32)    0.325 (0.35)     -0.007 (0.02)    -0.173 (0.20)
Teacher Qual. Exceeds  4.726 (6.52)     -7.581 (5.56)    5.307 (6.06)     -0.303 (0.32)    0.947 (3.46)
Teacher Qual. Meets    1.721 (1.59)     -2.296 (1.39)    0.426 (1.53)     -0.102 (0.08)    -0.181 (0.87)
Teacher Black          1.430 (1.99)     -1.572 (1.81)    -1.344 (1.98)    -0.097 (0.11)    -0.866 (1.13)
Teacher Hispanic       1.138 (1.66)     -0.837 (1.47)    -1.431 (1.61)    -0.144 (0.09)    -1.166 (0.92)
Teacher Asian          3.234 (2.91)     -2.971 (2.54)    -2.797 (2.79)    -0.303* (0.15)   -3.997* (1.59)
Teacher Other          1.695 (1.39)     3.123* (1.28)    1.037 (1.40)     0.028 (0.07)     -0.457 (0.80)
CLASS ES average       0.471 (1.26)     -1.616 (1.17)    -1.809 (1.28)    -0.062 (0.07)    -0.417 (0.73)
CLASS CO average       -0.072 (1.11)    0.316 (1.03)     0.764 (1.12)     0.100 (0.06)     0.706 (0.64)
CLASS IS average       -0.097 (0.54)    1.115* (0.49)    0.649 (0.53)     -0.054 (0.03)    -0.585 (0.30)
N                      735              606              606              604              607

* p<0.05; ** p<0.01; *** p<0.001. Note: Reference groups omitted from the estimation are Males, White, English,
FPL 300%+, Income>80 thousand, and Creative Curriculum. Other controls are pre-test, age in months, days
between tests and an indicator for missing language, income, race, FPL, and teacher qualifications, race and agency
fixed effects. Standardized scores are used for PPVT, and WJ or WM. Errors are clustered by classroom.

Appendix D. P-values for tests of differences in means.
Table D.1. P-values for T-tests comparing distributions
P(T<=t) two-tail    16' vs. 17'    17' vs. 18'    17' vs. 18' including FCCs
ECERS-3             0.049          0.442          n/a
CLASS ES            0.346          0.444          0.894
CLASS CO            0.620          0.021          0.064
CLASS IS            0.107          0.099          0.062

Table D.2. P-values for T-tests for comparisons of CLASS means between classrooms and FCCs
Domains and Dimensions                P-value
Emotional Support                     0.113
1. Positive Climate                   0.429
2. Negative Climate*                  0.874
3. Teacher Sensitivity                0.145
4. Regard for Student Perspectives    0.426
Classroom Organization                0.105
5. Behavior Management                0.153
6. Productivity                       0.341
7. Instructional Learning Formats     0.206
8. Facilitation of Learning & Dev.    n/a
Instructional Support                 0.733
9. Concept Development                0.811
10. Quality of Feedback               0.824
11. Language Modeling                 0.510

Table D.3. P-values for T-tests and Bonferroni tests comparing quality across children
subgroups, includes FCCs
            Ethnicity        Gender           FPL              DLL
            Bonferroni       T-Test           Bonferroni       Bonferroni
            Prob>chi2        Pr(|T| > |t|)    Prob>chi2        Prob>chi2
ECERS-3     0.116            0.604            0.459            0.028
CLASS ES    0.794            0.000            0.000            0.000
CLASS CO    0.384            0.401            0.886            0.000
CLASS IS    0.152            0.006            0.077            0.000
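
Table D.3 refers to Bonferroni tests for subgroup comparisons, but the appendix does not spell out the procedure. As a minimal sketch, assuming the standard Bonferroni adjustment for pairwise comparisons (each raw p-value is multiplied by the number of comparisons and capped at 1), the idea can be illustrated as follows. The group names and raw p-values below are hypothetical and are not taken from the report.

```python
from itertools import combinations


def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical subgroups; four groups give 4 * 3 / 2 = 6 pairwise comparisons.
groups = ["Group A", "Group B", "Group C", "Group D"]
pairs = list(combinations(groups, 2))

# Hypothetical unadjusted p-values, one per pairwise comparison.
raw_p = [0.004, 0.030, 0.200, 0.012, 0.500, 0.049]
adjusted_p = bonferroni_adjust(raw_p)

for (a, b), raw, adj in zip(pairs, raw_p, adjusted_p):
    print(f"{a} vs. {b}: raw p = {raw:.3f}, Bonferroni-adjusted p = {adj:.3f}")
```

With six comparisons, a raw p-value must fall below 0.05 / 6 (about 0.0083) to remain below 0.05 after adjustment, which is why a raw value such as 0.049 no longer counts as significant.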
