BLOG

Can nudging mentors weaken student support? Experimental evidence from a virtual communication intervention

Does reminding mentors to reach out support virtual communication?

Not necessarily. This study tested the effect of a low-cost, light-touch intervention (mentor reminders) designed to bolster virtual outreach. The study examined impacts on the frequency of student-mentor communication and subsequent impacts on mentoring relationship quality. Unexpectedly, students of mentors who received reminders did not report receiving more (or less) outreach than students of mentors who did not receive reminders, yet treated mentors reported that their students initiated less outreach. In addition, students of treated mentors reported that they were less likely to respond to messages they received from their mentors.

 

Read More

The Uncertain Role of Educational Software in Remediating Student Learning

What is the potential of educational software for remediation?

Educators must balance the need to remediate students who are performing behind grade level with their obligation to teach grade-appropriate content to all. Educational software programs could help them strike this balance by incorporating below-grade-level content into an existing curriculum, allowing students to learn at their own pace while remaining in the same classroom as their peers. If effective, this practice could save school systems the high costs of more intensive remedial interventions like high-dosage tutoring, summer school, extra coursework, and grade retention.

How did this study examine the effectiveness of educational software for remediating below-grade-level students?

This study estimates the causal effects of providing low-performing students in grades 3-6 with below-grade-level math content via an online software program. Students who scored below a designated cutoff on a prior-year math assessment were assigned a modified version of the software program. The modified software included below-grade-level content before the grade-level material. Students who scored above the cutoff received only the grade-level curriculum. We examined whether receiving the modified curriculum affected students’ completion of grade-level learning objectives, pre- and post-objective quiz scores, and math test scores.

Read More

Item Response Theory Models for Difference-in-Difference Estimates (and Whether They Are Worth the Trouble)

When randomized control trials are not possible, quasi-experimental methods like Regression Discontinuity and Difference-in-Difference (DiD) often represent the best alternatives for high quality evaluation. Researchers using such methods frequently conduct exhaustive robustness checks to make sure the assumptions of the model are met, and that results aren’t sensitive to specific choices made in the analysis process. However, often there is less thought applied to how the outcomes for many quasi-experimental studies are created. For example, in studies that rely on survey data, scores may be created by adding up the item responses to produce total scores, or achievement tests may rely on scores produced by test vendors. In this study, several item response theory (IRT) models specific to the DiD design are presented to see if they improve on simpler scoring approaches in terms of the bias and statistical significance of impact estimates.

Why might using a simple scoring approach do harm in the quasi-experimental/DiD context?

While most researchers are aware that measurement error can impact the precision of treatment effect estimates, they may be less aware that measurement model misspecification can introduce bias into scores and, thereby, treatment effect estimates. Total/sum scores do not technically involve a measurement model, and therefore may seem almost free of assumptions. But in fact, they resemble a constrained measurement model that oftentimes makes unsupported assumptions, including that all items should be given the same weight when producing a score. For instance, on a depression survey, total scores would assume that items asking about trouble sleeping and self-harm should get the same weight in the score. Giving all items the same weight can bias scores. For example, if patterns of responses differ between treated and control groups, faulty total score assumptions could bias treatment effect estimates and mute variability in the outcome researchers wish to quantify.
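The equal-weight assumption can be made concrete with a small sketch. The item names and weights below are hypothetical values chosen for illustration, not estimates from the study; the point is only that two respondents with identical total scores can look very different once items carry unequal weights.

```python
# Hypothetical item weights, standing in for loadings from a fitted
# measurement model. Illustrative assumptions, not study estimates.
ITEM_WEIGHTS = {"trouble_sleeping": 0.5, "low_energy": 0.75, "self_harm": 2.0}


def sum_score(responses):
    """Total score: every item implicitly receives a weight of 1."""
    return sum(responses.values())


def weighted_score(responses, weights=ITEM_WEIGHTS):
    """Score in which each item contributes according to its weight."""
    return sum(weights[item] * value for item, value in responses.items())


# Two hypothetical respondents with the same total but different patterns.
person_a = {"trouble_sleeping": 3, "low_energy": 0, "self_harm": 0}
person_b = {"trouble_sleeping": 0, "low_energy": 0, "self_harm": 3}

print(sum_score(person_a), sum_score(person_b))            # 3 3
print(weighted_score(person_a), weighted_score(person_b))  # 1.5 6.0
```

If response patterns like these differ systematically between treated and control groups, the two scoring rules would rank the groups differently, which is one way a scoring choice can leak into an impact estimate.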

What decisions involved in more sophisticated scoring approaches impact treatment estimates?

Read More

Using a Multi-Site RCT to Predict Impacts for a Single Site: Do Better Data and Methods Yield More Accurate Predictions?

Multi-site randomized controlled trials (RCTs) produce rigorous evidence on whether educational interventions “work.” However, principals and superintendents need evidence that applies to their students and schools. This paper examines whether the average impact of an intervention in a particular site—school or district—can be accurately predicted using evidence from a multi-site RCT.

What Methods Did the Study Use to Predict Impacts?

This paper used three methods to predict the average impact in individual sites: (1) the average of the impact estimates in the other sites, (2) lasso regression, and (3) Bayesian Additive Regression Trees (BART). Lasso and BART used a variety of moderators as predictors, including characteristics of participating students, participating schools, the intervention as implemented, and the counterfactual condition.  
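The first of these methods is simple enough to sketch directly. The version below assumes an unweighted mean and uses made-up site estimates; the paper may weight sites differently (for example, by precision), so treat this as an illustration of the idea rather than the authors' procedure.

```python
def leave_one_out_predictions(site_impacts):
    """Predict each site's impact as the unweighted mean of the
    impact estimates from all other sites."""
    total = sum(site_impacts.values())
    n_other = len(site_impacts) - 1
    return {
        site: (total - impact) / n_other
        for site, impact in site_impacts.items()
    }


# Hypothetical impact estimates (standard deviation units) for four sites.
impacts = {"A": 0.10, "B": 0.30, "C": 0.20, "D": 0.40}
predictions = leave_one_out_predictions(impacts)

for site, predicted in predictions.items():
    print(site, round(predicted, 3), "vs. estimated", impacts[site])
```

Lasso and BART replace this single average with a model that lets the prediction depend on site-level moderators.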

How Was the Accuracy of These Predictions Gauged?

Read More

Supporting Teachers in Argument Writing Instruction at Scale: A Replication Study of the College, Career, and Community Writers Program (C3WP)

This large-scale randomized experiment found that the National Writing Project’s (NWP’s) College, Career, and Community Writers Program (C3WP) improved secondary students’ ability to write arguments drawing from nonfiction texts.

What impacts did C3WP have on student achievement?

The study team collected and scored student writing from an on-demand argument writing task similar to those in some state assessments. At the end of the year, students in C3WP districts outscored students in comparison districts by about 0.24 on a 1- to 6-point scale on each of the four measured attributes (see graph). On average, these effects are equivalent to moving a student from the 50th percentile of achievement to the 58th percentile of achievement.
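For readers who want to see how a percentile statement like this maps onto a standardized effect size, here is a minimal sketch. It assumes outcomes are normally distributed, which is a common convention for this kind of translation and not something the summary itself states.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def effect_size_for_percentile_shift(start_percentile, end_percentile):
    """Standardized effect (in SD units) implied by a percentile shift,
    assuming normally distributed outcomes."""
    z_start = STD_NORMAL.inv_cdf(start_percentile / 100)
    z_end = STD_NORMAL.inv_cdf(end_percentile / 100)
    return z_end - z_start


def percentile_after_effect(effect_size_sd, start_percentile=50.0):
    """Percentile reached after adding an effect (in SD units)."""
    z_start = STD_NORMAL.inv_cdf(start_percentile / 100)
    return 100 * STD_NORMAL.cdf(z_start + effect_size_sd)


implied = effect_size_for_percentile_shift(50, 58)
print(round(implied, 2))                        # 0.2
print(round(percentile_after_effect(implied)))  # 58
```

Under that normality assumption, a move from the 50th to the 58th percentile corresponds to an effect of roughly 0.20 standard deviations.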

Read More

Which Students Benefit from Computer-Based Individualized Instruction? Experimental Evidence from Public Schools in India

Does computer-based individualized instruction boost math learning?

 Yes. In public schools in Rajasthan, India, students who scored in the bottom 25% of their class improved by 22% of a standard deviation in math test scores (top chart). However, the average student in grades 6-8 who had access to individualized instruction did not outperform those who did not over nine months. Our results suggest that computer-based individualized instruction is most beneficial for low performers.

What is computer-based individualized instruction?

 We provided all students with a computer-adaptive math learning software called “Mindspark.” When students first log in, they take a diagnostic test, which identifies what they know and can do, and the areas in which they can improve. Then, the software presents them with exercises appropriate for their preparation level based on the diagnostic test. The difficulty and topic covered by subsequent exercises dynamically adjust to each student’s progress.

Read More

Effect of Active Learning Professional Development Training on College Student Outcomes

Is there an effect of participating in Active Learning Professional Development (ALPD) training on student performance?

Students who took a course with an ALPD instructor were three percentage points more likely to take additional classes in the same subject area compared to students who were taught by a non-participant. Students of non-participants persisted at a rate of about 68%, so a three percentage point increase represents a 5% improvement. Importantly, ALPD training is related to a higher likelihood of implementing active learning instructional practices in the classroom. We do not find any differences in students’ current course grade or performance in the next class.

 

How to read this chart: This figure shows that students who took a course with an ALPD-trained instructor were three percentage points more likely to take another course in the same field of study in the immediate next term (p<0.05). No clear difference in course grades was evident either in the ALPD-instructed course or in the next course taken.

Read More

Experimental Design and Statistical Power for Cluster Randomized Cost-Effectiveness Trials

Cluster randomized trials (CRTs) are commonly used to evaluate educational effectiveness. Recently there has been greater emphasis on using these trials to explore cost-effectiveness. However, methods for establishing the power of cluster randomized cost-effectiveness trials (CRCETs) are limited. This study developed power computation formulas and statistical software to help researchers design two- and three-level CRCETs.

Why are cost-effectiveness analysis and statistical power for CRCETs important?

Policymakers and administrators commonly strive to identify interventions that have maximal effectiveness for a given budget or aim to achieve a target improvement in effectiveness at the lowest possible cost (Levin et al., 2017). Evaluations without a credible cost analysis can lead to misleading judgments regarding the relative benefits of alternative strategies for achieving a particular goal. CRCETs link the cost of implementing an intervention to its effect and thus help researchers and policymakers adjudicate the degree to which an intervention is cost-effective. One key consideration when designing CRCETs is statistical power analysis. It allows researchers to determine the conditions needed to guarantee a strong chance (e.g., power > 0.80) of correctly detecting whether an intervention is cost-effective.
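As background, the sketch below shows the familiar power calculation for a balanced two-level cluster randomized trial of effectiveness (not cost-effectiveness), using a normal approximation. It is not the formula developed in this study, which also has to account for cost; the function name and all input values are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def crt_power(effect_size, icc, cluster_size, n_clusters, alpha=0.05):
    """Approximate two-sided power for a balanced two-level CRT.

    Relies on a normal approximation, so it is slightly optimistic
    when the number of clusters is small.
    """
    std_normal = NormalDist()
    # Variance of the standardized treatment effect when half of the
    # clusters are assigned to each condition.
    variance = 4 * (icc + (1 - icc) / cluster_size) / n_clusters
    ratio = effect_size / sqrt(variance)
    critical = std_normal.inv_cdf(1 - alpha / 2)
    return std_normal.cdf(ratio - critical) + std_normal.cdf(-ratio - critical)


# Hypothetical design: effect of 0.25 SD, intraclass correlation of 0.20,
# and 20 students per cluster, with a varying number of clusters.
for n_clusters in (40, 80, 160):
    power = crt_power(0.25, 0.20, 20, n_clusters)
    print(n_clusters, "clusters -> power", round(power, 2))
```

The same logic carries over to cost-effectiveness trials in spirit: power depends on how precisely the quantity of interest can be estimated, and the number of clusters usually matters more than the number of students per cluster.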

How to compute statistical power when designing CRCETs?

Read More

The Impact of a Virtual Coaching Program to Improve Instructional Alignment to State Standards

What is the virtual coaching program tested in this study?

Feedback on Alignment and Support for Teachers (FAST) is a virtual coaching program designed to help teachers better align their instruction to state standards and foster student learning. Key components of this 2-year program include collaborative meetings with grade-level teams, individual coaching sessions, instructional logs and video recordings of teachers’ own instruction, and models of aligned instruction provided by an online library of instructional resources. During the collaborative meetings and coaching sessions, teachers and coaches use the logs, video recordings, and models of aligned instruction to discuss ways of improving alignment of their instruction to state standards. Teachers were expected to complete 5 collaborative meetings, 5 individual coaching sessions, 5 video recordings of their instruction, and 5 instructional logs per year.

 How did we assess the impact of the virtual coaching program?

 We assessed the impact of the FAST program on teachers’ instructional alignment and students’ achievement through a multisite school-level randomized controlled trial, which took place in 56 elementary schools spanning five districts and three states. We randomly assigned 29 of the 56 schools to the treatment group and 27 to the control group. The study focused on Grade 4 math and Grade 5 English language arts (ELA) and used the respective state test scores as student achievement outcomes. We used an instructional survey to measure teachers’ instructional alignment. Teacher attendance, FAST coaching logs, teachers’ instructional logs, and video recordings of teachers’ instruction were collected to describe the implementation of the FAST program.

 What did we find?

Read More

Conjuring power from a theory of change: The PWRD method for trials with anticipated variation in effects

Timothy Lycurgus, Ben B. Hansen, and Mark White

PDF Version

Many efficacy trials are conducted only after careful vetting in national funding competitions. As part of these competitions, applications must justify the intervention’s theory of change: how and why do the desired improvements in outcomes occur? In scenarios with repeated measurements on participants, some of the measurements may be more likely to manifest a treatment effect than others; the theory of change may provide guidance as to which of those observations are most likely to be affected by the treatment.


Figure 1: Power for the various methods across increasing effect sizes when the theory of change is correct.

Read More

ICUE Intervention Improves Children’s Understanding of Mathematical Equivalence

Jodi L. Davenport, Yvonne Kao, Kristen Johannes, Caroline Byrd Hornburg, and Nicole M. McNeil

PDF Version

Does the ICUE intervention improve math learning?

Yes, second grade students in classrooms using the Improving Children’s Understanding of Equivalence (ICUE) materials and lessons scored higher on measures related to mathematical equivalence, including equation solving and conceptual problem solving. These higher scores came with no observable trade-offs in computational fluency.

Read More

A Cautionary Tale of Tutoring Hard-to-Reach Students in Kenya

Beth Schueler, Daniel Rodriguez-Segura

PDF Version

What was this study about?

Covid-19 school closures have generated significant interest in tutoring to make up for lost learning time. Tutoring is backed by rigorous research, but it is unclear whether it can be delivered effectively remotely. We study the effect of teacher-student phone calls in Kenya when schools were closed. Schools (j=105) were randomly assigned to the control group or to have their 3rd, 5th, and 6th graders (n=8,319) receive one of two versions of a 7-week weekly math intervention: 5-minute accountability checks or 15-minute mini-tutoring sessions.

Read More

Selecting Districts and Schools for Impact Studies in Education: A Simulation Study of Different Strategies

Daniel Litwok, Austin Nichols, Azim Shivji, and Robert Olsen

PDF Version

Experimental studies of educational interventions are rarely designed to produce impact evidence, justified by statistical inference, that generalizes to populations of interest to education policymakers.  This simulation study explores whether formal sampling strategies for selecting districts and schools improve the generalizability of impact evidence from experimental studies.

Which selection strategies produced samples with the greatest generalizability to the target population?

Read More

How Do the Impacts of Healthcare Training Vary with Credential Length? Evidence from the Health Profession Opportunity Grants Program

Daniel Litwok, Laura R. Peck, and Douglas Walton

PDF Version

How do the earnings impacts of healthcare training vary?

This article explores how earnings impacts vary in an experimental evaluation of a sectoral job training program. We find that over the first two years in the study, those who completed long-term credentials (defined as college degrees or certificates that require a year or more of classes to earn) had program impacts that were about $2,000 larger per year than those who did not complete long-term credentials (whether they completed a short-term credential or no credential at all). A possible explanation for this finding is that those who earned a long-term credential had different experiences in the program, including more engagement with support services, and different post-program outcomes, such as greater employment in high-wage healthcare occupations like registered nurse.

Read More

The AIC and aBIC Work Best For Identifying the Correct Number of Profiles in Latent Transition Analysis Applied to Typical Educational Settings

Peter A. Edelsbrunner, Maja Flaig, Michael Schneider

PDF Version

How can we best tell how many different learning patterns there are in our data?

Latent transition analysis is used to describe different learner patterns. However, it is often hard to tell how many patterns there are. Is there a pattern of learners who have little knowledge, another pattern of learners with a specific misconception, and another pattern of learners who have properly understood everything that we tried to teach them? Or are there some of these patterns but not all, or even additional ones? This is really hard to tell, and different indicators (called “relative fit indices”) are available for helping us determine how many patterns there really are. We compare the performance of several relative fit indices. We find that the Bayesian information criterion (BIC), which is commonly used to determine the number of learning patterns, is not very accurate in finding the right number of patterns in comparison to other indices.
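To make the indices concrete, here is a minimal sketch of the standard formulas for the AIC, BIC, and sample-size-adjusted BIC (aBIC), applied to made-up model fits. The log-likelihoods, parameter counts, and sample size are hypothetical and are not results from the study.

```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion."""
    return -2 * log_lik + 2 * n_params


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion."""
    return -2 * log_lik + n_params * math.log(n_obs)


def abic(log_lik, n_params, n_obs):
    """Sample-size-adjusted BIC: replaces n with (n + 2) / 24."""
    return -2 * log_lik + n_params * math.log((n_obs + 2) / 24)


# Hypothetical fits: number of profiles -> (log-likelihood, free parameters).
fits = {2: (-2450.0, 9), 3: (-2400.0, 14), 4: (-2390.0, 19)}
n_obs = 300  # hypothetical sample size


def best_by(criterion):
    """Number of profiles with the lowest value of the given criterion."""
    return min(fits, key=lambda k: criterion(*fits[k]))


best_aic = best_by(aic)
best_bic = best_by(lambda ll, p: bic(ll, p, n_obs))
best_abic = best_by(lambda ll, p: abic(ll, p, n_obs))
print(best_aic, best_bic, best_abic)  # 4 3 4
```

In this invented example the BIC's heavier penalty per parameter leads it to prefer fewer profiles than the AIC and aBIC, which shows how the indices can disagree on the same set of fitted models.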

Read More

Effects of Cross-Age Peer Mentoring Program Within a Randomized Controlled Trial

Eric Jenner, Katherine Lass, Sarah Walsh, Hilary Demby, Rebekah Leger, and Gretchen Falk

PDF Version

How does a cross-age peer mentoring program affect ninth-grade outcomes?

Ninth-grade students who were offered Peer Group Connection High School (PGC-HS) were less likely to receive a suspension or disciplinary referral and self-reported higher levels of school engagement and postsecondary expectations. However, offering the program had no effect on academics (credit attainment, attendance at school, GPA) and other non-cognitive skills (e.g., decision-making skills).

Read More

Examining the Impact of a First Grade Whole Number Intervention by Group Size

Ben Clarke, Christian Doabler, Marah Sutherland, Derek Kosty, Jessica Turtura, and Keith Smolkowski

PDF Version

The importance of early mathematics

A successful start to learning mathematics has been a national priority for several decades. Mounting evidence indicates that trajectories of mathematics performance are established early and remain relatively stable across time. This may in part be due to substantial disparities in young students’ access to early mathematics experiences and instruction, with preschool-aged students from upper- and middle-class backgrounds already outperforming their economically disadvantaged peers.

Read More

Raising Teacher Retention in Online Courses through Personalized Support. Evidence from a Cross-national Randomized Controlled Trial

Davide Azzolini, Sonia Marzadro, Enrico Rettore, Katja Engelhardt, Benjamin Hertz, Patricia Wastiau

PDF Version

Does providing teachers with personalized support help them complete online training courses?

Yes, but not for all and not everywhere. The TeachUP policy experimentation found large effects of personalized support on course completion in nine European Union Member States among professional (i.e., in-service) teachers (+10.6 percentage points), but not among student teachers. Moreover, no effects are found in Turkey. More studies are needed to investigate the contextual and learner characteristics that drive the heterogeneous effects.

Read More

Does Early Mathematics Intervention Change the Processes Underlying Children’s Learning?

Summary by: Wen Wen

PDF Version

What are “state-” and “trait-” math achievements in early education?

Interventions can boost early math skills, but the role of these early skills in later math achievement is unclear. Consider that students who demonstrate stronger early math skills tend to demonstrate stronger later math achievement, yet some interventions that improve early math skills do not improve later math achievement; that is, the early benefits fade substantially after 2 or 3 years.

Read More

Design and Analytic Features for Reducing Biases in Skill-Building Intervention Impact Forecasts

Daniela Alvarez-Vargas, Sirui Wan, Lynn S. Fuchs, Alice Klein, & Drew H. Bailey

PDF Version

Despite their policy relevance, long-term evaluations of educational interventions are rare relative to the number of end-of-treatment evaluations. A common approach to this problem is to use statistical models to forecast the long-term effects of an intervention based on the estimated shorter-term effects. Such forecasts typically rely on the correlation between children’s early skills (e.g., preschool numeracy) and medium-term outcomes (e.g., 1st grade math achievement), calculated from longitudinal data available outside the evaluation. This approach sometimes over- or under-predicts the longer-term effects of early academic interventions, raising concerns about how best to forecast the long-term effects of such interventions. The present paper provides a methodological approach to assessing the types of research design and analysis specifications that may reduce biases in such forecasts.
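The basic forecasting logic described above can be sketched in a few lines. This is a deliberately simplified version that assumes standardized variables and a single early skill; the function name and the numbers are hypothetical, and the specifications examined in the paper are more elaborate.

```python
def forecast_later_effect(short_term_effect_sd, skill_outcome_correlation):
    """Forecast a later effect (in SD units) by scaling the end-of-treatment
    effect by the correlation between the early skill and the later outcome.

    Treating that correlation as causal is the assumption that can
    produce over- or under-prediction.
    """
    return short_term_effect_sd * skill_outcome_correlation


# Hypothetical inputs: a 0.30 SD end-of-treatment effect and a correlation
# of 0.50 between the early skill and the later outcome.
forecast = forecast_later_effect(0.30, 0.50)
print(round(forecast, 2))  # 0.15
```

Comparing a forecast like this with the effect actually observed at follow-up is what reveals whether a given specification over- or under-predicts.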

What did we do?

Read More