Know Your Options: Phonics Interventions

While a comprehensive reading program has many critical components, we know that phonics and phonemic awareness are foundational. A kiddo cannot comprehend what they cannot read. Unfortunately, one of the major gaps in our school system today is a lack of sufficient, explicit instruction in the foundational reading skills: Phonics, Phonemic Awareness, and Decoding.

If your child is struggling in these areas, you’ll want to look into intervention. If you have a formal diagnosis and an IEP, this can happen in a school setting. Alternatively, you may opt to hire a private tutor or even DIY. These are all valid options. 


But…


The intervention has to be GOOD. 


So how do we know which phonics interventions are better than others? Over the years, a few key elements have emerged as essential to a foundational phonics program. Phonics, phonemic awareness, and decoding skills need to be taught explicitly and systematically, following a scope and sequence that focuses on the individual sound ("phoneme") level, with the teacher modelling each concept, a high amount of student practice, and immediate corrective feedback.


Here's the problem.


Theoretically, many phonics programs have all these components. But in practice? Some of them are far more effective than others.


So what does the real-world evidence say?



REALLY Know Your Options:

What Does the Scientific Research Say?


Just because a phonics program has a strong theoretical basis, and is “aligned” to the science of reading, does not mean that it has evidence of effectiveness. Things that sound great in theory don’t always work out the way we’d hoped once we test them in the real world.


When my own kiddos struggled to learn to read, I went on a deep dive into the research on the real-world effectiveness of phonics programs. And what I found wasn’t always pretty. 


What did I find? I’ll walk you through it all below.

 

I did the deep dive so you can do the snorkel.

(.... okay, let's be honest. It's more like a snorkel tour; it's an extended snorkel).


Understanding the Research

If you’re well-versed in reading academic journal articles, are familiar with scientific jargon, and already know all about the importance of statistical significance, effect sizes, and control groups, then feel free to skip this next part and hop down to my summary of the research in the Evidence of Effectiveness section.


If you’re not familiar with these terms, you’ll definitely want to stick around for a minute, because these concepts are critical to understanding the research on the effectiveness of phonics programs. I’ll take you through each of these concepts. If you have some extra time, you might also find this video on Reading Program Effectiveness: Science vs. Snake Oil quite helpful. 




Scientific Research Crash Course


Okay, there are a few terms that are important to know before diving into the scientific research on reading programs, and my summaries below: Control Groups, "Significance," and "Effect Size." Here comes a crash course.



Control Groups: What Are They, and Why Do They Matter?


When diving into the research on the evidence-base for any program, it is very very very important to check if the researchers compared their data to that of a “control group” or control population. Here’s why:


A Control Group is basically a group that did not receive the intervention being studied. The group who received the intervention (usually called the “treatment” group) is compared to the control group. Why is this important? 


If a study does not compare the progress of the students who DID use the program to the progress of similar students who did NOT use the program, there is no way to tell whether or not the program had an impact, or if the students would have improved without it. Sure, students who received the intervention might have improved, but did they improve more than students who used something else? Without a control or comparison group, we quite simply don’t know the answer to that question. Intervention students might have had less progress than kids who received regular instruction. Without a control, we just don’t know.


Let’s put this another way:


Say we ran a reading intervention study and didn’t use a control group. During the course of the study, we measured reading outcomes and also children’s height. At the end of the intervention study, we noted that there were gains in reading and also growth in height. Would we say that the children’s increase in height was due to the reading program? No, of course not, that would be ridiculous. 


If, in that reading intervention & height study, we had included a Control Group (a group not receiving reading intervention), after the study we would have compared the data between the two groups. It would have been crystal clear that no, the reading intervention had nothing to do with the increase in height - we would quickly see that the Control Group children increased in height just as much as the Treatment Group who received the reading intervention.


The same goes for any increase in reading ability. 


Without a Control Group, we can measure reading progress, but we simply have no way of knowing whether the intervention had anything to do with it. We cannot definitively attribute any growth we see to the intervention program itself. It could be due to some other factor, such as ‘natural’ progress due to self-teaching, or business-as-usual classroom teaching.


Sidenote: Studies that use Control Groups are called experimental or quasi-experimental studies. You will often see this mentioned in the “Abstract” section of the journal article or dissertation that summarizes the research. These are the studies that are most valuable to us if we want to look at the effectiveness of a reading program. And within these, the gold standard is what is called a Randomized Controlled Trial (RCT).
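
EXTRA INFO FOR THE CURIOUS: If it helps to see the control group logic in action, here is a tiny toy simulation, written in Python with completely made-up numbers (a sketch of the idea, not data from any real study). Every child gets some "natural" growth over the year; the control group is what lets us see how much the program added on top of that.

    import random

    random.seed(42)  # so the "random" numbers are repeatable

    def year_of_growth(n_students, natural_gain=10.0, program_boost=0.0):
        # One school year of reading-score gains for n_students.
        # Every child gets some "natural" growth (classroom teaching,
        # time, self-teaching); the program adds program_boost on top.
        return [random.gauss(natural_gain + program_boost, 5.0)
                for _ in range(n_students)]

    def average(scores):
        return sum(scores) / len(scores)

    treatment = year_of_growth(30, program_boost=4.0)  # got the intervention
    control = year_of_growth(30)                       # business-as-usual

    print("Treatment group average gain:", round(average(treatment), 1))
    print("Control group average gain:  ", round(average(control), 1))

    # On its own, the treatment group's ~14-point gain looks impressive.
    # The control line reveals that ~10 of those points would have
    # happened anyway -- the program only contributed the difference.

Run it a few times with different seeds and the point becomes obvious: the treatment group’s raw gain always looks good, but only the comparison against the control tells you what the program itself contributed.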



Scientific Jargon: “Significance” vs “Effect Size”


Beware the word “significant” in academic research. It may not mean what you think it means.


Usually when we see the word "Significant" we think "large" or "substantial." So if we read the phrase “The program had significant impact on reading scores,” we might intuitively think that this means: “The program had a LARGE impact on children’s reading scores.”


This is NOT how the term "significance" is used in academic research.


Significant DOES NOT mean large.


Instead, there is a second meaning that is much much more common in academic literature: Statistical Significance. In these cases …


Significant = The result you got would not likely occur unless the treatment itself was having a REAL impact.


Yeah, it’s a mouthful. And THIS is the way that the term “significant” is used most often in academic literature with an experimental design. It refers to statistical significance. So in these instances, the phrase “The program had significant impact on reading scores” does not necessarily mean large impacts; instead it means: “The program DID, ITSELF, APPEAR TO HAVE A REAL impact on reading scores.”


In this usage, the phrase indicates NOTHING as to whether the impact the reading program had was a large effect, a medium/moderate effect, or a small effect … or even if the program had a positive or negative effect! “Significant” only indicates that, statistically speaking, the reading program probably had a real impact, and the result probably wouldn't occur due to chance or an external factor.


IMPORTANT NOTE: I will only use the word “significant” in this second, statistical way on this page. Whenever I use it, just think: REAL RESULT or LIKELY NOT DUE TO CHANCE.


EXTRA INFO FOR THE CURIOUS: In my research summaries below, I haven’t bothered to list the numbers that researchers use to report statistical significance. I may add them in later, but for now, to keep things as simple as possible, I have just summarized the findings as significant or not. Typically, significance is measured by Confidence Intervals or a P-Value. P-Values are more commonly reported, but this is shifting. If you are interested in looking at the actual research studies yourself, know that significance is reported on a continuum. A p-value of 0.05 basically means that there is a 5% chance that the results could have occurred even if the treatment (i.e. reading intervention) had no real effect. Basically, it’s the percent chance that we thought our treatment had a real impact when it didn't (technically “spurious” / false positive). A p-value of 0.01 means that there is a 1% chance of this error. For p-values, smaller is better. Typically, researchers only accept a maximum of 5% chance of error, i.e. a maximum p-value of 0.05. Anything greater than this and the result is dubbed "not significant," and we can't really use it as evidence.
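
EXTRA INFO FOR THE SUPER CURIOUS: If you’d like to see where a p-value comes from in practice, here is a minimal sketch in Python, using made-up post-test scores (not from any real study) and a standard two-sample t-test from the scipy library.

    from scipy import stats

    # Hypothetical post-test reading scores -- NOT from any real study:
    treatment = [88, 92, 85, 91, 87, 90, 94, 86, 89, 93]
    control = [84, 88, 83, 86, 82, 87, 85, 84, 88, 83]

    # The t-test asks: how likely is a gap this big between the group
    # averages if the intervention actually had no real effect?
    t_stat, p_value = stats.ttest_ind(treatment, control)
    print("p-value:", round(p_value, 3))

    if p_value <= 0.05:
        print("'Significant': unlikely to be due to chance alone.")
    else:
        print("'Not significant': we can't rule out chance.")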


If a result is significant, it means we can probably rely on it to be real evidence.

If a result is not significant, it means we probably can't rely on it as evidence.


Typically, it is difficult to reach statistical significance levels in small studies (i.e. with few students).



Effect Size = Amount of Impact


Okay, so significant means real evidence, and not significant means we can't really use the results as solid evidence.

But what we really want to know is: of the results that were real/significant, how much impact did the reading intervention have? When academic researchers discuss the amount of impact that a treatment (i.e. a reading intervention) had, instead of using the term “significant,” they will usually refer to the “effect size.”


Effect sizes are calculated in many different ways, depending on the statistical analysis that is performed. All you really need to know is that effect sizes are usually described in three categories (1) Large/Strong, (2) Medium/Moderate, and (3) Small/Weak. You should also know that in education research, large effect sizes are rare, and teacher effect size is right at the small/medium threshold (see also this video). In other words, what we want to see is an effect size beyond the “teacher effect” - that is, we want to see at least a Medium effect size, and we should keep in mind that it is rare to see Large effects.


Effect sizes are not always calculated in the research I have reviewed. At this time I have opted not to calculate them myself, but I may do so at a later date, because this information would be valuable to have.


EXTRA INFO FOR THE CURIOUS: In my research summaries below, I have not included the exact numbers calculated for effect size. I may add them in later, but for now, to keep things as simple as possible, I just refer to the effect sizes by category: [Large/Strong], [Medium/Moderate], and [Small/Weak]. The reason I don't provide numbers is that effect sizes are calculated differently depending on the statistical analysis required by the study and these are reported on different scales.


For example: Cohen's d (a common effect size calculation) reports Effect Sizes in this way: at least 0.2 = Small effect, at least 0.5 = Medium effect, and at least 0.8 = Large effect .... but for Pearson's r (another common effect size calculation): at least 0.1 = Small effect, at least 0.3 = Medium effect and at least 0.5 = Large effect.
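
EXTRA INFO FOR THE CURIOUS: Cohen's d is simple enough to compute yourself: it is just the difference between the two group means, divided by the pooled standard deviation. Here is a minimal sketch in Python, again with made-up scores (not from any real study):

    import statistics

    def cohens_d(group_a, group_b):
        # Difference in means, in units of pooled standard deviation,
        # so results from different tests land on one common scale.
        n_a, n_b = len(group_a), len(group_b)
        mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
        var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
        pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b)
                     / (n_a + n_b - 2)) ** 0.5
        return (mean_a - mean_b) / pooled_sd

    # Hypothetical post-test scores -- NOT from any real study:
    treatment = [88, 92, 85, 91, 87, 90, 94, 86, 89, 93]
    control = [84, 88, 83, 86, 82, 87, 85, 84, 88, 83]

    print("Cohen's d:", round(cohens_d(treatment, control), 2))
    # Rough Cohen's d benchmarks: 0.2 small, 0.5 medium, 0.8 large.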


Comparing numbers across studies can clearly be deceiving. Which is why I don't include them here.


Whew. Okay.


As a recap:



Control Groups are very important. 

“Significant” = Real Evidence of Impact


“Effect Size” = Amount of Impact



Enough with the scientific jargon and onward to the research summaries!



Evidence of Effectiveness: Broad Research


So, how do we choose a phonics intervention program? How well does each type of program help dyslexic and otherwise struggling readers? What does the evidence say?


Which programs have a Good Evidence Base?


While we don’t have many experimental studies comparing one phonics program to another directly, we do have experimental studies which examine phonics interventions compared to control groups. And we have Meta-Analyses of these studies. Meta-Analyses pool the results of many individual studies onto a standardized scale of effectiveness (for example, significance and effect sizes), which lets us compare the effectiveness of a program in one study to the effectiveness of another program in a different study. This sort of research can be extremely helpful in sorting through all the noise.
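
EXTRA INFO FOR THE CURIOUS: At its core, the arithmetic of a simple (fixed-effect) meta-analysis is just a weighted average: each study's effect size is weighted by its precision, so large, tight studies count for more than small, noisy ones. Here is a toy sketch in Python with invented numbers. Real meta-analyses involve much more (study selection, quality screening, heterogeneity checks), so treat this as the flavor, not the recipe.

    # Each tuple: (effect size d from one study, variance of that estimate).
    # All numbers are invented purely for illustration.
    studies = [
        (0.45, 0.04),  # a mid-sized study
        (0.20, 0.09),  # a small, noisy study
        (0.38, 0.02),  # a large, precise study
    ]

    # Weight each study by its precision (1 / variance): bigger, tighter
    # studies pull the pooled estimate harder than small, noisy ones.
    weights = [1.0 / variance for _, variance in studies]
    pooled = sum(w * d for (d, _), w in zip(studies, weights)) / sum(weights)
    print("Pooled effect size:", round(pooled, 2))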


So, What Do the Meta-Analyses and Literature Reviews say?


In 2001, the National Reading Panel conducted a meta-analysis of phonics interventions and found that while, overall, the phonics interventions studied had positive effects compared to controls, with moderate effect sizes on average (d = 0.41), Orton-Gillingham approaches had the lowest effect sizes by far (d = 0.22).


Similarly, in 2006, Ritchey & Goeke looked at studies examining Orton-Gillingham based interventions, and their results showed “inconclusive findings of the effectiveness of OG programs" due to mixed results and a lack of well-designed research. A subsequent meta-analysis of OG research was carried out a decade and a half later by Stevens et al (2021). Unfortunately, they also found that “Orton-Gillingham reading interventions do not statistically significantly improve foundational skill outcomes (i.e., phonological awareness, phonics, fluency, spelling … vocabulary and comprehension outcomes)” when compared to other interventions or business-as-usual controls.


For more in-depth summaries of and commentary on this body of research, see Dr. Ginsberg’s “Is OG The Gold Standard?” … Solari et al’s “What does science say about OG?” and “Which aspects are supported?” as well as evidence that Syllable Division rules are not very useful and Nathaniel Hansford’s summaries of Orton-Gillingham research, including Multi-Sensory research, Wilson: Fundations research and Sonday research. 


It all amounts to the same thing. Mixed, underwhelming results for Orton-Gillingham methods. As Dr. Tim Shanahan puts it, “so much for being the gold standard.”


Whoa, whoa, whoa. Okay, if you are anything like me when I dove into the academic research on phonics interventions, you might be starting to freak out. I mean, Orton-Gillingham is THE thing, right? I won’t lie, I started to panic a little. I mean, where does this leave us?


Hang On. Don’t Panic.


We’re just getting started. Keep reading.


Here’s the thing. While a meta-analytic approach can give us a good sense as to how well a general approach to intervention is working (or NOT working), the issue is that it groups programs together which may have quite different approaches in practice. So I decided to investigate a bit further. 



What DOES Work, and What Doesn’t?


Program by Program


It’s important to keep in mind that with just a few exceptions, the vast majority of phonics programs examined in research studies DO result in positive outcomes for students. It’s just that some of them are more effective than others, and some of the most popular pre-packaged programs don’t appear to be the best ones.


I dug up the original research studies that were summarized by Stevens et al 2021 in the most recent Meta-Analysis of phonics research, grouped them by the specific phonics program used, and summarized the findings of each one below. I also included research studies examining several other popular phonics programs. This is ongoing work, so as I find more that meet the criteria outlined below, I will review and add those.


Below, I have organized the program summaries into four categories based on the evidence of effectiveness for the program: (1) Poor, (2) Mixed/Uncertain, (3) Fair, or (4) Good.


This organization gives us a clearer picture as to what the research says about an individual program, rather than a generic approach.


Criteria for Inclusion


On this page, I’ve ONLY included research studies which used a Control Group. Why? Remember our Reading Intervention & Child Height example from earlier? Without a control group to compare to, we have no idea whether or not it is the program which led to reading growth, or some other factor (such as standard classroom instruction, outside tutoring, passage of time etc). With one exception, I've only included research that has been peer-reviewed ("peer-review" basically means that people from competing universities fact-checked the work of the researchers). I have only included research studies that looked at how interventions specifically performed with dyslexic or otherwise struggling readers, not the broader student population. I love our non-struggling readers, but quite frankly, they aren't who I'm worried about.


Below, I will summarize my findings, but if you want to dig deeper, search for a phonics program on Google Scholar, DOAJ, or Core UK and see if you can find an experimental or quasi-experimental study that used a control group. Some academic journals are open access now, so even if you don’t have a university login, you might find what you need. Reading the Abstract (which is usually open access) and Discussion sections of a paper can often give you a fair idea regarding the results of the study without wading into complex methods and tables of confusing data.


For an even easier overview of some of the reading program research, visit Pedagogy Non Grata’s research summaries. They cover many more programs than I do here, including HMH, SPIRE, SIPPS, Logic of English (sorta), and Read 180, to name a few. They include programs designed for classroom use, not just intervention. Other sources of information on the effectiveness of reading interventions include the What Works Clearinghouse and Evidence for ESSA. The Reading League also provides commentary on a program’s theoretical base. Be very careful how you interpret the results from WWC and ESSA, however. They have two ratings for each program. One rates the quality of the research, and the other measures the average effect size of the program. So a program could, for example, have a rating of "Strong 0.02," which actually means that there is strong evidence of a weak effect (i.e. probably not a great program).


ONE FINAL NOTE: Please note that though I hold a PhD, and have read many many research articles in my day, my degree is not in Education. I am not an educational researcher myself. I have summarized the research findings below to the best of my ability, but it is always wise to cross-check this information by digging into the original sources, as well as peer-reviewed articles and meta-analyses written by educational researchers who have examined these studies.


All right, without further ado, here is the summary of the scientific research on Phonics Interventions. Below each program summary, I include an even more concise tl;dr section: "My Takeaway."



Poor Evidence of Relative Effectiveness:


These are programs that have had a number of studies carried out to examine their effectiveness with struggling readers. Unfortunately, across multiple studies, these programs underperform compared to others on this list. They have weak evidence of effectiveness relative to the controls or the alternative programs that they were studied alongside.



Wilson 


Wilson Fundations is an OG-based reading and spelling program that can be used as the Core Classroom or “Tier 1” program in schools, and can also be used for small group intervention (“Tier 2”). It covers phonics, phonemic awareness, fluency, handwriting, and spelling, but not reading comprehension. It is a lengthy program which takes 3 years to complete, with 80 hours of instruction delivered each year, for a total of 240 intervention hours (Wilson 2025).

The Wilson Reading System is an intensive, OG-based intervention delivered for 2-8 hours a week. It is a lengthy program, often taking students several years to complete. In one district-wide study, 21% of students graduated from the program within 3 years (Stamm 2017). The rest had to continue intervention. While intervention hours appear to vary, it seems that 200+ hours of intervention would be quite commonly needed with the WRS program.

There are 5 research studies of Wilson products which include control groups. In these studies, the average effect size was small to moderate, the findings were usually not statistically significant, other interventions or controls sometimes performed better than Wilson, and on a few measures there were negative effects.

In the first study, Young 2001 compared Wilson to classroom instruction for a group of 31 high schoolers. After 28 sessions, they found no statistically significant impacts on reading or spelling measures. For a subset of the children who were given an additional tracing treatment, there were large positive effects on sound spelling, letter-word identification, nonsense word reading, and fluency, and a small effect on spelling. Those with a writing treatment saw smaller effects. Unfortunately, in all cases (as is common with small studies), the effects were not significant.

Another small study (Reuter 2006) of 26 middle school students with reading deficits carried out a 14-week intervention, with 13 students as a control and the other 13 receiving a total of ~48 intervention hours of Wilson. No significant differences were found for most measures. The intervention had large positive effects on Word Attack, ORF, and reading comprehension, but these results were not significant. The Word Identification post-test found a large and significant but negative effect for the Wilson group.

In a larger scale study of 779 struggling readers, Torgesen et al. 2007 compared four reading interventions to controls. The four interventions were Corrective Reading, Failure Free Reading, Spell Read PAT, and Wilson Reading System. After 80-93 hours of 1:3 intervention, impacts were assessed one year after the intervention year. For 3rd Graders, Wilson had a small but positive effect as compared to the control group in terms of Nonsense Word Reading, Nonsense Word Fluency, Word Identification, and Sight Words. These results were considered statistically significant. Compared to the other interventions, Wilson performed relatively well on most measures, but was second lowest for Nonsense Word Fluency, and the lowest of the four interventions in Oral Reading Fluency results. For 5th Grade, the results were very mixed. Here, the only significant positive impact Wilson had was in Nonsense Word Reading (medium effect size). Compared to controls, Wilson had a negative impact on Oral Reading Fluency, and this was statistically significant for the students who had struggled the most prior to intervention. None of the interventions in the study resulted in improvements in statewide standardized testing. Based on this study, Evidence for ESSA categorized the effect of Wilson as weak (ESSA, 2025).

Furthermore, Wanzek & Roberts 2012 studied a group of 101 students who had been diagnosed with or had symptoms of dyslexia. After 70 hours of Wilson instruction, there were no significant differences between the Wilson treatment group and the control group. There were moderate but non-significant positive effects on word attack, and small but non-significant positive effects on word identification. There was a large but non-significant negative effect on comprehension.

In the most recent study, Fritts 2016 looked at students identified as having learning disabilities and compared Wilson Fundations and Wilson Reading System to the Direct Instruction program Corrective Reading, as well as a Business-As-Usual condition. The study showed no significant differences between the three programs after 30 hours of delivery, but those receiving Wilson did slightly worse than the others. Those who received DI: Corrective Reading saw the best gains on the NWEA MAP test, though the effect size was small. When adjusted for pre-test scores, those receiving Wilson had the lowest average score on the NWEA post-test.


Casual, observational studies (e.g. Stebbins et al 2012, Duff et al 2015) of Wilson programs indicate improvements in some reading measures after 2 years, but because there were no control or comparison groups in these studies, the effects cannot be decisively attributed to the Wilson program, just as we could not attribute a change in student height over the course of the study to the Wilson program.


Notably, when discussing its alignment to Structured Literacy and the Science of Reading, Wilson does not highlight the above research studies which examined its effectiveness. Rather, it focuses on its theoretical alignment to the science of reading. 


My takeaway? I would not reach for Wilson programs to help my child unless I had exhausted several other avenues. Wilson programs, though founded on very solid theoretical principles, often take a long time to complete, and simply do not have strong evidence of effectiveness in practice. Out of the five studies, only one showed statistically significant gains in any reading measures; these gains were small, and that study had negative results at the fifth grade level. In fact, four of the five studies had at least one measure where Wilson had a negative effect compared to no intervention. A phonics intervention having negative impacts compared to no intervention is rare. This program has the most negative outcomes of any program on this list. Can Wilson help some children? Certainly. But the research suggests that the intervention will often take a long time (upwards of 200 hours), and that you will likely have greater success with other options.





Mixed or Uncertain Evidence of Relative Effectiveness:


These are programs that have had some studies carried out to examine their effectiveness with struggling readers. Unfortunately, due to mixed or limited results, we just can’t conclude much about their effectiveness. 



Alphabetic Phonics


Alphabetic Phonics is an Orton-Gillingham derived program. It is an older program which formed the basis for Take Flight (see my discussion of Take Flight below). It is a lengthy program, usually requiring well over 100 hours of intervention.


Dooley (1994) studied MIRC, an adaptation of Alphabetic Phonics for middle school remedial reading students, for one semester. The adaptation included reading comprehension components. Compared to a control group using a standard basal reading program, the MIRC students performed better in terms of word attack, reading rate, and various reading comprehension and writing metrics. The findings were statistically significant. No effect sizes were calculated, but the biggest differences in post-test scores were in reading rate and writing.

Kuveke 1996 did a small study. They found that a group of six students receiving the Alphabetic Phonics program made greater, but mostly not statistically significant, progress compared to controls. No effect sizes were calculated, but after 2 years, intervention students had 3-7 month score gains in auditory discrimination, phonics, and word reading. Reading comprehension gains were 1-2 years, and were the only measures to reach statistical significance.


My takeaway? The OG-based Alphabetic Phonics program has a good theoretical base, and it has some statistically significant results. This is promising. However, it is difficult to draw strong conclusions from two studies, especially when one of them only examined the results from 6 students. This is an older phonics program, and some of its components may not be well-aligned with newer Science of Reading research findings. Alphabetic Phonics formed the basis of what has now evolved into the Take Flight program (see below). 




Barton


Barton Reading and Spelling System is an OG-based reading and spelling intervention that begins with phonemic awareness, then teaches phonics, decoding, and spelling via a series of rules and morphology. It is a lengthy program, requiring 2-3 years to complete at a dose of 2 hours per week, for a total of about 150-220 intervention hours for the entire program.


The research on the effectiveness of the Barton system comes with mixed results. In a study of 68 Iranian school children, results indicated that the Barton system had statistically significant positive impacts on reading fluency as compared to control groups - this following a pilot study that indicated positive impacts on reading comprehension. Effect sizes were not provided (Azizfar et al 2019, Mihandoost 2011).

An earlier study of high school students in Florida also suggested greater gains in decoding, word attack, and spelling measures for Barton students as compared to control groups, though the research questions and data analysis were somewhat unconventional, and therefore difficult to compare to the other studies on this page. The treatment group scored higher in post-testing than controls by 0.46 - 3.6 standard score gains, but only performed significantly better than their pre-test score in Word Attack (i.e. nonsense word reading). The study’s very small size (two groups of 9 students) was noted as an issue (Giess 2005).

A fourth study of 21 students found that although students receiving 36 hours of Barton generally had higher scores than those using HMH’s Reading Tool Kit, none reached the level of statistical significance. No effect sizes were calculated in the original study, but a subsequent analysis found them to be medium effect sizes, albeit not statistically significant (Bisplinghoff 2015, PedagogyNonGrata 2021).

A fifth study compared 22 students receiving Barton to those who hadn't on standardized state tests of reading comprehension. There was no statistically significant difference in reading scores between Barton and the controls. Effect sizes were not calculated, but in general, students in Grades 1-2 who received Barton scored worse than controls, while those in Grades 4-6 who did Barton scored better than controls. Grade 3 had mixed results. None of the differences were statistically significant (Gearhart 2017).


Barton lists 15 case studies on its website which provide stories and data in support of its effectiveness. These do offer promising results, and more data than many programs provide consumers with, I might add! However, because these case studies were not designed with experimental controls, we unfortunately cannot definitively attribute the reading growth in these studies to the Barton System (just as we would not attribute an increase in a child’s height during the study to the Barton intervention).


My takeaway? Barton has a solid theoretical base, and some evidence of positive impacts, but also some mixed results, and with the exception of two international studies, most of this evidence was not found to be statistically significant, so it is difficult to say whether the program itself is what is moving the needle in these cases. The research studies were all quite small. The Barton System can take a long time for students to progress through. Overall I would say that its effectiveness seems to be muted compared to other options. It does not appear to be the most effective or time-efficient of the programs available.


Similar Programs:  Logic of English, All About Reading, Sonday


Corrective Reading


Corrective Reading is a Direct Instruction phonics and reading comprehension program designed for Grades 3-12. Direct Instruction approaches (note the capitalization) are also known as the “DI” or “Engelmann" approaches. These are explicit, highly scripted, fast-paced phonics programs structured for high levels of student engagement via call-and-response. Of the intervention studies I have reviewed thus far, the programs are of middling length for a phonics program. The interventions ranged from 35 to 200 hours, spanning 1-3 years. Below is a summary of the research on the Direct Instruction program Corrective Reading:

Gunn et al. 2000 did a study with 256 students. After about 130 hours of intervention, they compared the progress of struggling students who received intervention with Reading Mastery or Corrective Reading with those who did not. Intervention students had significantly higher gains in word reading, nonsense word reading, vocabulary, and passage comprehension, and a near-significant gain in ORF. Effect sizes were not calculated.

McCollum-Rogers 2004 analyzed data from several schools to compare Direct Instruction (Reading Mastery or Corrective Reading, depending on the age of the student) to the reading program Success for All, as well as a basal reader control group. After 3 years of intervention, students who received intervention via DI programs had the lowest reading test scores (as measured by the WRAT assessment). The effect size was small, but statistically significant.

Conversely, Benner 2005 studied the use of Corrective Reading as an intervention for middle-schoolers in special education programs. After 30+ hours of intervention, students who received Corrective Reading demonstrated higher gains in Letter and Word Identification, Nonsense Word Reading, and ORF. Gains were statistically significant and effect sizes were moderate to large.

Jackson 2005 carried out a small study of 30 students that examined the impact of approximately 180 hours (1 year) of intervention via Corrective Reading. Those who received Corrective Reading had slightly higher scores than controls on the STAR reading test following treatment. The calculated p-value (0.18) indicates that the results were not significant. Note, however, that the researcher stated that the results were significant. So there is a discrepancy in interpretation here.

Torgesen et al. 2007, in a fairly large scale study of 779 struggling readers, compared four reading interventions to controls. The four interventions were Corrective Reading, Failure Free Reading, Spell Read PAT, and Wilson Reading System. After 80-93 hours of 1:3 intervention, impacts were assessed one year after the intervention year. For 3rd Graders, there were significant positive impacts on Nonsense Word Reading, Nonsense Word Fluency, and Word Identification. Compared to the other interventions, Corrective Reading had the highest impact of any of them on Nonsense Word Fluency, but ranked second lowest for effects on Nonsense Word Reading, Word Reading Fluency, Oral Reading Fluency, and Passage Comprehension. There were no significant effects on standardized test reading scores for the Third Grade cohort. There were also no significant impacts compared to controls at the Fifth Grade level on any reading measures. Unfortunately, most of the impacts were negative. For the Fifth Grade cohort, the impact on standardized test reading scores was significant and negative. At the fifth grade level, Corrective Reading had the most negative impacts of any of the programs studied except Failure Free Reading (though many of these negative effects were not statistically significant). For the students who entered the study with the lowest decoding scores, Corrective Reading had mixed impacts, with several negative but statistically insignificant effects.

Reid 2010 compared a group of 110 special education students who received Corrective Reading to a control group. After one year of instruction, those who received Corrective Reading saw more improvement in reading scores than controls. The difference in gains was statistically significant. Effect size was not calculated.

Joseph 2011 compared Corrective Reading to an HMH Reading intervention for a group of 180 ELL students. After 80-90 hours of intervention, there were no significant differences between the groups in terms of ORF and STAR reading scores. Note that the group of students studied in this case was ELL students, and not specifically struggling readers.

Young 2012 analyzed data from a few schools with students who had received Corrective Reading vs those who had not. Those who received Corrective Reading had significantly higher reading scores on a standardized test (small effect size).

Sawyer 2015 examined the impact of Corrective Reading as compared to a control group. After one year, those who received Corrective Reading had no significant differences in letter and word identification, nonsense word fluency, or reading comprehension as measured on the WRMT, but there was a significant and positive effect on standardized test reading scores.

Fritts 2016 looked at students identified as having learning disabilities and compared Wilson Fundations and Wilson Reading System to the Direct Instruction program Corrective Reading, as well as a Business-As-Usual condition. After ten weeks (30 hours) of intervention, the study showed no significant differences between the three programs, but those receiving DI: Corrective Reading saw the best gains on the NWEA MAP test, though the effect size was small. When adjusted for pre-test scores, those receiving Corrective Reading had the highest average score on the NWEA post-test, though the difference was very small.


PedagogyNonGrata has a meta-analysis of Corrective Reading here. In a 2002 meta-analysis in the Journal of Direct Instruction, the authors found that out of 17 older studies comparing DI to other programs for struggling readers: 10 studies found DI to have the best outcomes, 3 had higher results for another program, while the remaining 4 were inconclusive.


The IES What Works Clearinghouse has reviewed one study of Corrective Reading and found the effectiveness to be "Potentially Positive." Evidence for ESSA rated the Corrective Reading program based on one study and determined it to have a positive but "Weak" effect.


My Takeaway? The evidence on the effectiveness of Corrective Reading is somewhat mixed. In two studies it did have some negative results, which is not a great sign. However, it also has more studies with statistically significant positive results than many others on this list. The program can also be delivered more rapidly than most and is scripted, which can be a bonus for some. I am still reviewing some literature, so I'll withhold final judgement until then, but at the moment, this would not be the first intervention I'd reach for.




Orton-Gillingham Approaches: Other


Many of those who are trained in Orton-Gillingham (O.G. / “OG”) approaches do not use a specific curriculum. Rather, they use the scope and sequence provided by their particular training (for example, IMSE OG), or derive their own for a given student. For the purposes of this discussion, I have grouped these studies of “un-branded” or other O.G. approaches together. I address the “branded” Orton-Gillingham derived programs Alphabetic Phonics, Barton, Take Flight, and Wilson separately elsewhere on this page.


An early experimental study of an Orton-Gillingham intervention (Litcher & Roberge, 1979) found that after about 500 hours of instruction (3 hours per day for the entire school year), students who were given OG instruction outperformed controls in terms of word analysis, word knowledge, vocabulary, reading, and reading comprehension. The results were considered statistically significant. No effect sizes were calculated, and due to the abbreviated summary of the research available, it was difficult to interpret the results further.

Another study, Simpson et al (1992), found that overall, a 1 year OG intervention had a statistically significant impact on incarcerated students’ reading (nearly 1 year growth for the OG treatment, but virtually no growth for the control). However, the researchers noted that the impacts varied widely with time of intervention. Students who received less than 23 hours of instruction made little to no progress, and students who received under 50 hours of instruction did not make statistically significant progress. Only students who received more than 50 hours of intervention made significant progress compared to their peers. The effect size by amount of treatment was moderate.

Another study also had mixed results. A small intervention study carried out over a summer suggested that OG had a slightly weaker effect on phonemic awareness than the auditory training program FastForWord, but a statistically significant positive effect on word attack (large effect size) - though this was due to the fact that FFW actually resulted in declines in word attack scores (Hook et al. 2001 - also this link).

A study of 1st Grade students found that compared to a basal reader program, 45 hours of OG intervention had a statistically significant and moderately positive effect on decoding skills: word attack and word identification (Stewart 2011).

Meanwhile, a 1993 study of 72 students found that students receiving OG approaches made gains in nonsense word decoding, and also in word identification. A caveat: I only had access to a partial copy, so was only able to access the statistical significance via a meta-analysis by Ritchey and Goeke, who noted that the result was not statistically significant (Westrich-Bond, 1993, Ritchey and Goeke 2006).


For further evaluation of Orton-Gillingham programs, see also this extended analysis of OG programs at PedagogyNonGrata, though note that not all the studies there focus on dyslexic or struggling readers.


My takeaway? Un-branded Orton-Gillingham approaches have a solid theoretical base. It is difficult to draw broad conclusions from studies of these approaches because the specific scope, sequence, and lessons may be wildly different between each of them. With a high enough dosage (45-500 hours), some of these interventions seem to be effective at getting results. Others, less so.


It’s a bit of a dice-roll here. If deciding to pursue an un-branded OG intervention, I would want to know more about the track-record of the individual tutor or interventionist. Specifically I would ask them how long it typically takes their students to go from square one to mastering all phonics, phonemic awareness, and decoding/word-attack skills and graduate from their tutoring program into independent reading.


Other OG Programs:  SPIRE, PAF Reading, Sonday




Reading Mastery (formerly known as DISTAR)


Reading Mastery is a Direct Instruction approach. Direct Instruction approaches (note the capitalization) are also known as the “DI” or “Engelmann" approaches. These are explicit, highly scripted phonics programs. They include Teach Your Child to Read in 100 Easy Lessons, DISTAR, Reading Mastery, Corrective Reading, Read Well, PHAB/DI, PHAST, Empower Reading TM, Phonics for Reading, and Early Interventions in Reading. Of the intervention studies I have reviewed thus far, the programs are of middling length for a phonics program. The interventions ranged from 35 to 151 hours, spanning 1-2 years. Below is a summary of the research on Reading Mastery:


O'Connor et al. 1993 (see alt pdf) compared DI Reading Mastery to the phonics program Superkids. There were many complications that occurred over the course of the study, resulting in variable hours of intervention (though an average of 90 hours), and ultimately very small sample sizes. The study found no statistically significant differences in reading between the two interventions, with the exception that, among students who performed above average, DI students tended to be further above the average. DI students also had more gains in spelling, with a statistically significant and moderate effect size.

Gunn et al. 2000 did a study with 256 students. After about 130 hours of intervention, they compared the progress of struggling students who received intervention with Reading Mastery or Corrective Reading with those who did not. Intervention students had significantly higher gains in word reading, nonsense word reading, vocabulary, and passage comprehension, and a near-significant gain in ORF. Effect sizes were not calculated.

Butler 2001 carried out a study of 34 students over the course of 5 months. Unfortunately, the study found that students receiving Reading Mastery performed worse than those receiving basal reader instruction, and in fact the intervention appeared to have a negative impact. Those receiving Reading Mastery unfortunately declined across all reading areas tested (nonsense word reading, word identification, vocabulary, and passage comprehension). The Reading Mastery group did see gains on standardized reading test scores, but the control group had higher scores. The differences between the intervention and control group were statistically significant for overall reading scores, passage comprehension, and standardized reading test scores, with Reading Mastery having a negative impact compared to the control group.

Jones 2002 studied the impact of Reading Mastery on phonemic awareness with 36 kindergarten students who were at risk for reading difficulties. After 20 hours of intervention, there was a significant difference in the Phonemic Awareness measures between the groups. Those who received Reading Mastery had significantly higher PA scores (moderate effect size).

McCollum-Rogers 2004 analyzed data from several schools to compare Direct Instruction (Reading Mastery or Corrective Reading, depending on the age of the student) to the reading program Success for All, as well as a basal reader control group. After 3 years of intervention, students who received intervention via DI programs had the lowest reading test scores (as measured by the WRAT assessment). The effect size was small, but statistically significant.

Kamps et al. 2008 examined the impact of several different reading programs as compared to a balanced literacy "Guided Reading" approach. Several of the intervention programs were Direct Instruction (DI) programs: Reading Mastery, Early Intervention in Reading, and Read Well. The other programs included Open Court and Programmed Reading. After 120-151 hours of intervention over the course of 2 years, all of the interventions performed better than the Guided Reading control on measures of Nonsense Word Fluency and Oral Reading Fluency. The significance level was not provided, but in terms of impact, the DI interventions had a medium effect size. The difference between DI and the "Guided Reading" approach was statistically significant in terms of Nonsense Word Reading, Word Identification, and Passage Reading, though effect sizes were not calculated for these measures. The researchers concluded that structured, explicit phonics instruction was what moved the needle, as opposed to a specific program.


Despite Reading Mastery's relatively good performance against control groups, the meta-analysis linked here found that the research on Reading Mastery has been a bit mixed when it is compared to other phonics programs.


The IES What Works Clearinghouse has reviewed a few studies on Reading Mastery for teens and rated the program as "Potentially Positive" for fluency. It also reviewed one study of Reading Mastery for ELLs and again rated it as "Potentially Positive." The older DISTAR program was rated by the WWC via one study and was found to have "No discernible effect."


My Takeaway? The evidence on the effectiveness of Reading Mastery is somewhat mixed. In two studies it did have some negative results, which is not a great sign. However, it also has more studies with statistically significant positive results than many others on this list. The program can also be delivered more rapidly than most and is scripted, which can be a bonus for some. I am still reviewing some literature, so I'll withhold final judgement until then, but at the moment, this would not be the first intervention I'd reach for.

Similar Programs: "Teach Your Child to Read in 100 Easy Lessons" "100 Easy Lessons" was derived from DISTAR, the same parent program that Reading Mastery derived from. Thus far, I have only found three research studies of the "100 Easy Lessons" program that included control groups (Kay 2003, Fjortfort et al 2014, McConnell & Kubina 2016). The results were positive, but the sample sizes were so small (in one case, 2 students), and the study designs and control groups were often very different from the rest of the research on this page, so I have opted not to do a full review at this time. Hopefully I will be able to find more studies soon. Because "100 Easy Lessons"




Take Flight


Take Flight is a 2 year phonics, phonemic awareness, vocabulary, fluency, and reading comprehension program designed on OG principles and based off of earlier programs such as Alphabetic Phonics (see my summary above). It is designed to be implemented by highly trained Certified Academic Language Therapists (CALTs), a certification that requires hundreds of hours of training and hundreds more hours of practicum.

There have only been a few studies of Take Flight which included control groups.

Oakland et al 1998 studied a closely related program, the 350-hour “Dyslexia Training Program” (DTP). This program was derived from Alphabetic Phonics, and was in many ways a precursor to Take Flight. In this study of 48 students, 2-year DTP interventions had statistically significant impacts on word recognition, multi-syllable word decoding, and reading comprehension compared to business-as-usual controls. Neither the DTP nor the Control group improved in spelling. Effect sizes were not calculated, but the researchers labelled them "modest... given the intensity and duration of the intervention." Students’ word recognition levels were still below average after 2 years of intervention.

Ring et al 2017 studied a group of 12 students before and after a minimum of 280 hours of Take Flight or adapted DTP intervention with a CALT. The study showed strong effects on comprehension but moderate to weak effects on decoding, word reading, and fluency. Unfortunately, the significance and reliability of these results are complicated by the fact that there were problems with the study design, as the intervention group had higher reading skills to begin with. As a result, these findings must be interpreted with caution (see PedagogyNonGrata for further discussion).

Another study by Rauch et al (2017) compared Take Flight to a district-designed program which combined Rite Flight, SIPPS, and LLI. After an average of 60 weeks of Take Flight, or 55 weeks of the district-designed program, there was no statistically significant difference in student reading outcomes between the two approaches, and neither was able to get students’ reading scores up to those of the general population on standardized tests, but the district-designed program performed somewhat better than Take Flight (see also this link).


The Take Flight program has a brochure on its website which describes data that it has collected over the years, including follow-up data with 22 children which shows reading comprehension and word recognition scores in the average range 4 years after intervention. It also shows Pre-test / Post-test scores indicating growth following intervention, though word efficiency and oral reading fluency were still below average. Unfortunately, there were no control groups in this study, so although the data is promising, we can’t definitively attribute the effects to the Take Flight program, in the same way that we could not attribute changes in children's height during the study period to the program.

My takeaway? Take Flight has a solid theoretical base, and is often administered by highly trained professionals (Certified Academic Language Therapists - CALTs) who have spent hundreds of hours tutoring struggling students under the guidance of a mentor. CALTs will have seen a lot, and will have a lot of experience. That’s worth a great deal. That said, the Take Flight program itself seems to get fairly muted results, even after 280-350 hours of intervention. In one case, it didn't even perform as well as a district-designed program. This is one of the most time-intensive interventions I’ve ever seen. I suspect you could do a fair bit better with a CALT using a different program. 


Fair Evidence of Relative Effectiveness:


These programs nearly always have good outcomes in the research. However, there may be too few studies to draw firm conclusions about their performance, or they may target a very specific subset of struggling students, rather than the broader struggling student population.



PHAB / PHAST / Empower Reading TM


PHAB or "Phonological Analysis and Blending" Instruction is a Canadian program designed by medical researchers at the Hospital for Sick Children and modelled after Direct Instruction approaches (note the capitalization). The program was later combined with a reading strategy program, becoming PHAST, and eventually has been branded as Empower Reading TM. It is only available via training. Also known as the “DI” or “Engelmann Approaches,” Direct Instruction programs are explicit, highly scripted phonics programs. Of the intervention studies I have reviewed thus far, DI programs are of middling length for a phonics program. The interventions ranged from 35 to 80 hours, spanning 1-2 years.

Lovett & Steinbach 1997 studied two interventions: the PHAB/DI program for phonological awareness and phonics, which was modeled after the DI program Reading Mastery and Corrective Reading, as well as the WIST program, which taught students to tackle words in chunks (such as morpheme chunks) and to adjust vowel sounds (set for variability). After 35 hours of intervention, both the PHAB and WIST interventions performed better than controls with statistically significant results across all reading measures. No effect sizes were calculated. When comparing the two interventions to one another, the WIST tended to have a bigger impact on word identification, while PHAB/DI tended to have a bigger impact on reading nonsense words and sound-spellings accurately. In a followup study, Lovett et al. 2000 examined the PHAB/DI and WIST interventions again, as well as combined PHAB + WIST interventions, alongside Control Groups. After 70 hours of intervention, researchers found that students who received intervention performed better than controls in terms of Nonsense Word Reading and Word Identification, and the results were statistically significant. Effect sizes were not calculated. Researchers also found that those who received 35 hours of PHAB and 35 hours of WIST had greater improvement than those who received 70 hours of a single program.



My Takeaway? Of the Direct Instruction programs that I have reviewed, the PHAB / PHAST / Empower Reading approaches seem to get the best results. However, as the program is only available in some Canadian schools, or by group training sessions with your school, it might be hard to access. Efficiency-wise, these are middling programs. They don't take as long to accomplish their work as many OG-based programs, which is a plus, but they also aren't nearly as swift as Speech-to-Print programs.


LiPS: Lindamood Phoneme Sequencing Program for Reading
          (formerly Auditory Discrimination in Depth - ADD )


Lindamood-Bell has three core programs which target different aspects of reading. LiPS is its phonemic awareness and phonics strand which addresses things from a primarily auditory and articulation perspective. Seeing Stars approaches phonics and phonemic awareness using visualization techniques, and Visualizing and Verbalizing focuses on reading comprehension via visualization and schema building. Given that the programs are very different, I will keep them separate in this analysis. Here, I only discuss the research on LiPS. This program focuses primarily on articulation, phonemic awareness, and some decoding. In terms of intervention time, it is of middling length, and usually done in a burst of intensive intervention over the course of just a few months (20-100 hours).

Gunn 1996 (alt link) found that 20 hours of small group LiPS (then called ADD) intervention for readers with poor phonological awareness did not significantly improve reading measures compared to a control group who were given basal reader instruction. None of the students were able to reach average grade level on these measures. The researchers highlighted a high degree of variability in the results. That is to say, the intervention seemed to help some students quite a lot, and other students not much at all. This sort of result could suggest many things, including, perhaps, that the intervention targets a specific underlying weakness which not all students share. This would not be surprising given that the intervention is designed to specifically emphasize phonemic awareness and articulation.

Meanwhile, Kutrumbos 1993 (alt link) studied an adapted version of LiPS and found that when combined with an OG-derived phonics approach, the hybrid program had a significant impact on student decoding ability after 45 hours of instruction.

Torgesen 2001 studied two interventions with students who had severe reading disabilities (the lowest 2%), and found that students receiving LiPS (then called ADD) instruction saw statistically significant growth in reading outcomes including word attack and word identification, though not reading rate. While students were still below average in reading rate at the end of a 2-year follow up, their decoding, accuracy, and reading comprehension were all in the average range.

Pokorni et al 2004 carried out a small study of 54 students which examined the impact of three different interventions: LiPS, Earobics, and Fast ForWord. After a 4 week (60 hour) intervention, those who received the LiPS intervention performed significantly better than the other groups on phoneme blending measures, and also had higher PA gains, though the difference was not significant. All effect sizes were large. On a 6-week follow-up, this was not found to transfer to significant improvement on other reading posttests. While the intervention was associated with improvements on segmenting and blending phonemes and nonsense word reading, there appeared to be negative impacts on real word reading and passage comprehension with the intervention. (Sidenote: this is perhaps unsurprising, given that the intervention was abbreviated and primarily taught phonemic awareness and letter-sounds, with "little exposure to decoding.")

Torgesen et al 2010 examined two interventions: LiPS and Read, Write, Type (RWT). Both interventions are built on similar theories of reading. They were compared to a business-as-usual control. In two schools the control classrooms used Open Court. After approximately 80-84 hours of intervention, both the LiPS intervention and the Read, Write, Type interventions had substantial impacts. Unsurprisingly, given their similarities, the two intervention programs had similarly positive results across the board. The interventions moved the students' average scores on two key measures from far below average to above average: from the 16th to the 73rd percentile on word reading accuracy, and from the 5th to the 77th for decoding. Control students also improved, but were only able to reach the 50th percentile. Both LiPS and RWT had significantly higher positive impacts compared to controls in all measures: in word accuracy, word reading fluency, nonsense word reading, nonsense word fluency, phonological awareness, reading comprehension, and spelling. Effect sizes were mostly moderate, though some approached large.


One study of LiPS meets What Works Clearinghouse standards, and based on the evidence from this study, the WWC rates LiPS as Potentially Positive. Similarly, Evidence for ESSA found that it had promising evidence and a moderate to strong effect size.


My takeaway? On its own, LiPS is a unique program which focuses heavily on the articulation of sounds and on phonemic awareness without letters, as well as some phonics. It’s an older program, and given that we now know it is best to do phonemic awareness activities WITH letters, the theoretical basis for some aspects of LiPS is a bit shaky. That said, phonemic awareness is important, and for some children, articulation exercises may be exactly the intervention that helps them distinguish sounds. LiPS did have statistically significant positive impacts in several of the studies. However, it performed no better than another phonics intervention in two studies, and in another study it was integrated with phonics exercises. There is some indication that, due to its sequential design, LiPS may solve PA issues but not always phonics issues when the program is not done in its entirety. It also does not incorporate passage reading. Of note: LiPS has also been extensively studied in the neuroscience literature. While I have not had a chance to include these studies yet, I know that some of them (e.g. Simos 2002) have had very good results. Stay tuned.


For students who need a heavy amount of articulation practice and PA work, LiPS activities may be just the thing, but on their own they may not help all students. It seems wise to pair LiPS with a full phonics and reading program, or to choose a program that integrates these skills more deeply. The Lindamood-Bell approach would be to follow up with its Seeing Stars and Visualizing and Verbalizing programs, but these would add a lot of time to the intervention sequence.


UFLI Foundations


UFLI is a new program. I wouldn’t normally review a program designed for classroom use rather than intervention, but given the substantial recent buzz around it, I am including a discussion of it here!

UFLI Foundations is a recently developed program based on the Orton-Gillingham method, but it incorporates some of the articulation emphasis of LiPS, an emphasis on word chaining, and a de-emphasis on some of the “phonics rules,” à la Speech-To-Print programs such as Reading Simplified (see below). It is a lengthy program designed for whole-classroom use. The program consists of 148 one-hour lessons (usually split into 30-minute teaching increments), and the scope and sequence takes 2-3 years to complete (at 30 minutes a day, 148 hours works out to roughly 300 school days of instruction).


UFLI Foundations has had one quasi-experimental study of its effectiveness. The study was carried out by an independent researcher but has not been published in a peer-reviewed journal. (“Peer-reviewed” basically means that other education researchers from competing universities/businesses have fact-checked the study.)


Gage 2023 compared 1,670 students in grades K-1 who received UFLI in the 2021/2022 school year to a demographically matched control group from the previous (2020/2021) school year who did not receive it. Both groups had performed below average on a DIBELS pre-test. The study found that students who received UFLI had significantly higher post-test DIBELS composite scores than the control students, with a large effect size. The total instructional time, if carried out to fidelity, was approximately 90 instructional hours at 30 min/day, though teachers varied in how many hours they actually taught. The results are promising, albeit with some caveats. While the study was large, it was not randomized, and studies whose control groups come from a different time period run the risk of comparing apples to oranges. That risk is particularly acute in the years studied: the control group may have been impacted by the COVID school closures, which could lower its post-test reading scores and thereby artificially inflate the gains observed in the experimental year. Declines in reading scores were seen nationwide during the 2020-2021 school year (see Kuhman et al. 2022 and 2023, and Domingue et al. 2021), so it seems highly possible that the study was affected by this. Long story short, the real effects of UFLI might not be as large as this particular study indicates. Subsequent studies will help resolve this.

While UFLI has not been studied as an intervention, an earlier version of the program was piloted as one in a small 3-week summer study. Though it was not the full UFLI program, it showed promise: students who received the 1:1 intervention scored significantly higher than controls on two measures, Consonant Blends with Short Vowels and R-Controlled Vowels, with moderate effect sizes (Contesse et al. 2021).


My Takeaway? The UFLI Foundations program has a solid theoretical basis and promising results in one large study and one smaller study. The effect size in the larger study was substantial and significant, which is a good sign. However, because the control group may have been impacted by the COVID-19 school closures, and because the results have not, to my knowledge, been peer-reviewed, they must be interpreted with some caution. The program is not currently designed for intervention, and delivering it in full requires upwards of 140 instructional hours. So while the program has a lot of positive buzz around it, and some promising evidence of effectiveness at producing reading gains, it is not especially time-efficient at getting kids from point A to point B, and therefore would not be my first choice, especially for intervention.




Good Evidence of Relative Effectiveness:


These are phonics programs which have repeatedly and consistently gotten significant positive results with struggling readers.
 

Reading Simplified
      (formerly called the Targeted Reading Intervention)


Reading Simplified is a Speech-To-Print phonics, phonemic awareness, decoding, and fluency-building program. It is designed to teach its entire scope and sequence over the course of a 12-16 week reading intervention, and it can also be used whole-class in Kindergarten or First Grade. Reading Simplified is the publicly available offshoot of the Targeted Reading Instruction program (formerly called the Targeted Reading Intervention, or TRI). Both programs were designed by Dr. Marnie Ginsberg, and the word work activities within them are modelled after the speech-to-print approach developed by Dr. Diane McGuinness. The full Reading Simplified scope and sequence takes 3 to 6 months to complete as an intervention, but can also be delivered over the course of 1-1.5 years, if preferred, as a whole-classroom Tier 1 program.


In total, 6 research studies with control groups have examined the effectiveness of the Reading Simplified instructional approach (known in the research literature as the Targeted Reading Intervention, or TRI), making it one of the most thoroughly studied programs on this page. Most of the studies used some form of randomized controlled trial (RCT) design, which is considered the gold standard in clinical research.


A preliminary study of Reading Simplified was carried out in 2010, when it was called the Targeted Reading Intervention (TRI). In this randomized controlled trial, researchers found that struggling Kindergarten students who received approximately 9 hours of TRI instruction made significantly higher gains on word identification measures than the control group, with a large effect size. No significant effects were found for word attack or vocabulary, nor for First Grade, where a small sample size and lack of fidelity were noted as complications (Vernon-Feagans et al. 2010).

A 2011 study used a cluster randomized control design to examine the impact of the TRI on 112 struggling readers in Title 1 schools in the southwest. Intervention times per student varied, but most teachers provided a total of 2 to 5 hours of TRI instruction over the course of several weeks. The study found that students who received the TRI scored significantly higher than controls across the board. Effect sizes were medium for word attack and letter-word identification, medium-high for passage comprehension, and smaller for spelling. Furthermore, these impacts extended beyond the target intervention students: students in classrooms whose teacher received TRI training performed significantly better than students in classrooms whose teacher did not (Amendum et al. 2011).

A subsequent study of struggling readers in Title 1 schools in the rural southeast found that overall, students who received the TRI saw significant positive impacts on letter-sound knowledge and word reading (medium effect sizes), and positive but non-significant impacts on word attack. The study also found that students with slower processing skills benefited more than those with faster processing skills (Vernon-Feagans et al. 2012).

A similar RCT study of 635 students in poor rural counties found that struggling readers who received 3-5 hours of TRI outperformed controls in all areas, though they were only able to fully catch up to their non-struggling peers in spelling. The intervention had medium effects on letter and word identification and on spelling, slightly smaller effects on word attack and passage comprehension, and a small effect on vocabulary. All effects were statistically significant (Vernon-Feagans et al. 2013).

In a more recent study, Amendum et al. 2018 used a randomized control design to examine the impact of the TRI on English Language Learners (ELLs) who had been identified as struggling readers. The study found that after 5 hours of instruction (delivered over 9 weeks), the TRI had statistically significant positive effects on letter, word, and nonsense word reading measures (all effect sizes medium), and positive, though non-significant, impacts on passage reading.

Vernon-Feagans et al. 2018 carried out a 3-year RCT study with 556 at-risk students (305 treatment, 251 control). Each student received approximately 4 total hours of intervention over the course of 6 to 8 weeks. The study found that students who received the TRI performed better than controls, with small but significant effect sizes across the board: in letter-sound knowledge, nonsense words, word reading, passage comprehension, and spelling. The researchers noted that fidelity to time-of-intervention was an issue: students should have received 7-10 hours of intervention but received only 4.

Bratsch-Hines et al. 2020 further analyzed the data collected for the Vernon-Feagans 2018 study and found that the TRI seemed to be equally effective for students across a range of pre-intervention phonological awareness skills. With regard to differences in oral vocabulary, the TRI, while effective across the board, was most effective for students who came to the table with low vocabulary, as opposed to average or high vocabulary.


The IES What Works Clearinghouse reviewed the Targeted Reading Intervention and lists it as having “Positive Effects.” Due to its promising results, the IES recently awarded researchers a $5 million grant to replicate the TRI studies and examine the program further; this research will be completed in 2026. Evidence for ESSA also lists the TRI as having strong research and a moderate effect size that is above average for tutoring. For further discussion of Reading Simplified / TRI, see also this meta-analysis at PedagogyNonGrata, which gave it the highest ranking of any program it has reviewed.


My Takeaway? Reading Simplified consistently gets very good results in a very short amount of time. I won’t mince words: it is, hands-down, the best phonics program I have ever seen. It has very good supporting evidence, replicated in multiple large randomized controlled trials. Furthermore, students see these gains after an incredibly short amount of intervention. (If you look back at the TRI studies, this is typically 5-10 hours, carried out over several weeks; Reading Simplified extends this to 12-16 hours, over a few months, to deliver its entire scope and sequence.)


Many people get hung up on this short intervention time. We have become so used to phonics interventions that take an incredibly long time that we even have a mantra for it: “it’s a marathon, not a sprint.” But here, we have to look at what the evidence is telling us. And what it indicates is that teaching phonics, phonemic awareness, and decoding to struggling readers really does not have to take as long as we thought, so long as the program is well structured. Based on my research thus far (and, fwiw, my personal experience as a mother and reading tutor), Reading Simplified / TRI should really be the first phonics program parents and educators reach for when teaching a child to read … especially if they have a struggling reader.



Similar Speech-to-Print Programs: Reading Simplified is what is known colloquially as a “Speech-To-Print” program. Not sure what that is? Here is my Speech-To-Print 101 Primer doc. While Reading Simplified is certainly the best studied of these programs, there are others that follow a similar approach to teaching phonics. (I have summarized some research on them, but many of those studies do not include control groups. See also this extended discussion of the research on S2P programs at PedagogyNonGrata.) Below, I list the major Speech-to-Print intervention programs, along with any research studies that used an experimental control design and the approximate dosage for each intervention.

Reading Simplified (Dosage: 12-16 hours, 1 hr/wk)
        Studies with controls: several; see the discussion above.

Phono-Graphix / Reading Reflex Book (Dosage: 12-24 hours, 1 hr/wk)
        It is important to mention that this program is the grandparent of the other programs listed here. While it hasn’t had any educational studies with experimental controls, it has been the subject of several experimental neuroscience studies, which show significant positive impacts on reading as well as on neural reading networks. In Simos 2002, for example, the researchers concluded that after 2 months of Phono-Graphix, “dyslexia can be reversed.”

SPELL-Links (Dosage: 12-24 hours)
        Studies with controls: Wanzek 2017 (significant positive impact).

Sounds-Write (Dosage: 3 years for classroom; intervention dose unknown)
        Studies with controls: Alfrey 2009 (positive impact, albeit similar to other interventions).

EBLI (Dosage: 12-24 hours, 1 hr/wk)

That Reading Thing (Dosage: 12+ hours, 1 hr/wk)

Phonics for Pupils with SEN (Dosage: varies)

The Reading Foundation (Dosage: varies)

Rooted in Language (Dosage: 1-2 years for homeschool curriculum)

Sharpen / ABeCaDerian (Dosage: 2-12 months for homeschool curriculum)





On My “To Review” List:


There are a few programs that I am interested in exploring further but have not yet had time to review. In the meantime, the research on some of these programs is explored at PedagogyNonGrata.



Structured Word Inquiry

Structured Word Inquiry (SWI) is a newer approach to teaching spelling and reading. Rather than teaching via a systematic phonics scope and sequence, it analyzes words via a series of four questions: 1) What does the word mean? 2) How is it constructed? 3) What are its morphological and etymological relatives? 4) How do the graphemes and phonemes function in the word? The approach embeds a good deal of explicit, direct instruction within a broader, inquiry-led framework. The theory underlying it is that students who struggle to learn phonics (letter-sound relationships) may find greater success attacking words from a meaning-based standpoint: because morphemes (meaning chunks) have fairly regular spellings, the relationship between letter strings and meaning chunks could provide a more stable foundation for children who really struggle with sound. Personal sidenote: While I currently use some SWI methods to teach spelling, I am not personally convinced that this would be a very efficient way to teach most struggling children to read. That said, if your kiddo has really struggled with phonics even after trying a variety of approaches (including a speech-to-print approach), then you might give SWI a look, as it is very good for spelling and vocabulary.


95 Phonics

This program was designed as a core classroom program, so I may not review it on this intervention-focused page.


Jolly Phonics

The research on Jolly Phonics is largely not peer-reviewed, so it needs to be taken with a grain of salt, but it appears to show strongly positive outcomes.


Lexia
While there is a fair bit of high-quality research on Lexia showing positive results, there are no known studies of the intervention on students with dyslexia or reading struggles, so I have opted not to review it at this time.


Lippincott Basic Reading Series

This is an older program, but it had the highest effect size in the NRP 2001 report, so I would eventually like to review it. From what I can gather, the approaches used in Lippincott influenced the thinking of Diane McGuinness and, subsequently, the Reading Simplified program, which I reviewed above and which has the strongest evidence of effectiveness of any phonics program I have found.


Phonics for Reading

This is a phonics program for older readers, designed by Anita Archer, which follows the principles of the direct and explicit instruction models. While grounded in strong theoretical research, it currently has no real-world studies of its effectiveness. One study is underway and will be completed in 2026.


RAVE-O

This intervention program was designed by reading neuroscientist Maryanne Wolf and specifically targets students who have deficits in RAN. It is not a full phonics program; rather, it is intended to be taught alongside or following a phonics program, so I have opted not to review it yet. Related and offshoot programs include EUREKA! (Italian). RAVE-O is often studied alongside the more phonics-oriented multicomponent programs PHAB / WIST / PHAST and Empower Reading (which are derived from the Direct Instruction model - see above). Studies include Lovett et al. 2017, Lovett 2000, and Morris 2011, and Wolf et al. 2009 summarizes some of the earlier research.


Seeing Stars

This is the visualization-oriented phonics program from Lindamood-Bell. While I did find one recent study that included a control, I am still looking for more. In that study, the outcomes were positive and significant, but only half the intervention group made gains, suggesting that perhaps the intervention only helps students with a particular underlying condition or profile (Christodoulou et al. 2015). This program, along with LiPS, led to an offshoot program called NOW! Neuro-development of Words. I have some theoretical concerns with this program: it relies a great deal on visualization, and given that visualization relies heavily on the brain's right hemisphere while expert readers rely heavily on the left hemisphere, this approach may not be the optimal way to train the brain to read (and it could produce slower-than-necessary reading). However, it may help some kiddos. I will reserve my final judgement until I have reviewed more research on this.


Spalding

This is another offshoot of the Orton-Gillingham approach. It shares some similarities with the “speech-to-print” methods such as Reading Simplified, in that it has a heavy emphasis on writing and teaches multiple grapheme-phoneme correspondences at once in order to organize and streamline instruction. I personally learned to read using a combination of this method and Play N’ Talk. However, I was a precocious reader, and the theoretical basis for Spalding's specific way of organizing the phonetic code (by letter/spelling) is not as strong as that of the Speech-To-Print method (organized by sound/phoneme). Pre-readers come to the table with knowledge of sounds and word meanings, NOT letters and spellings … so I would suspect that Spalding would not be as effective as S2P at helping struggling readers. I’ll have to see what the research says.

Slingerland

This is an Orton-Gillingham based approach. Given that it is carried out whole classroom rather than as intervention, I may or may not opt to review it on this page.



Research Studies:


Alfrey, J. (2009). An Evaluation of the Sounds-Write Approach to Initial Literacy Development in Schools. Dissertation. The University of Manchester (United Kingdom). Google Scholar

Amendum, S. J., Vernon-Feagans, L., & Ginsberg, M. C. (2011). The effectiveness of a technologically facilitated classroom-based early reading intervention: The targeted reading intervention. The Elementary School Journal, 112(1), 107-131. Google Scholar

Austin, C., Stevens, L., Demchack, A., & Solari, E. (2023). Orton Gillingham: Which aspects are supported by research and which require additional research. The Reading League Journal, 4(3), 5-15. The Reading League (pdf)

Azizifar, A., Salamati, M., Mohamadian, F., Veisani, Y., Cheraghi, F., Alirahmi, M., & Aibod, S. (2019). The effectiveness of an intervention program (Barton Intervention Program) on reading fluency of Iranian students with dyslexia. Journal of education and health promotion, 8(1), 167. Google Scholar

Bisplinghoff, S. E. (2015). The effectiveness of an Orton-Gillingham-Stillman-influenced approach to reading intervention for low-achieving first-grade students (Doctoral dissertation, Florida Gulf Coast University). Google Scholar

Bratsch-Hines, M., Vernon-Feagans, L., Pedonti, S., & Varghese, C. (2020). Differential effects of the targeted reading intervention for students with low phonological awareness and/or vocabulary. Learning Disability Quarterly, 43(4), 214-226. Google Scholar

Dooley, B. (1994). Multisensorially integrating reading and composition: Effects on achievement of remedial readers in middle school. Dissertation. Texas Woman's University. Google Scholar

Duff, D., Stebbins, M. S., Stormont, M., Lembke, E. S., & Wilson, D. J. (2016). Using curriculum-based measurement data to monitor the effectiveness of the Wilson Reading System for students with disabilities: an exploratory study. International Journal on Disability and Human Development, 15(1), 93-100. Google Scholar

Ehri, L. C., Nunes, S. R., Stahl, S. A., & Willows, D. M. (2001). Systematic phonics instruction helps students learn to read: Evidence from the National Reading Panel’s meta-analysis. Review of educational research, 71(3), 393-447. Google Scholar

Fritts, J. L. (2016). Direct instruction and Orton-Gillingham reading methodologies: Effectiveness of increasing reading achievement of elementary school students with learning disabilities. Dissertation. Northeastern University. Google Scholar

Gage, N. (2023). Districtwide Pilot Study of UFLI Foundations. UFLI | WestEd

Gearhart, S. L. (2017). Reading Comprehension through Incidental Learning: Efficacy of an After-School Literacy Program Utilizing the Barton Reading & Spelling System®. Thesis. Arkansas State University. AState (pdf) | Google Scholar

Giess, S. (2005). Effectiveness of a multisensory, Orton-Gillingham-influenced approach to reading intervention for high school students with reading disability. Dissertation. University of Florida. Google Scholar

Gunn, B. K. (1996). An investigation of three approaches to teaching phonological awareness to first-grade students and the effects on word recognition. University of Oregon. Google Scholar

Hook, P. E., Macaruso, P., & Jones, S. (2001). Efficacy of Fast ForWord training on facilitating acquisition of reading skills by children with reading difficulties—A longitudinal study. Annals of Dyslexia, 51, 73-96. Google Scholar

Kutrumbos, B. M. (1993). The effect of phonemic training on unskilled readers: A school-based study. Dissertation. University of Denver. Google Scholar

Kuveke, S. H. (1996). Effecting Instructional Change: A Collaborative Approach. ERIC (pdf) | Google Scholar

Litcher, J. H., & Roberge, L. P. (1979). First grade intervention for reading achievement of high risk children. Bulletin of the Orton Society, 29, 238-244. Google Scholar

Mihandoost, Z., & Elias, H. (2011). The effectiveness of the Barton’s intervention program on reading comprehension and reading attitude of students with dyslexia. Iranian Journal of Psychiatry and Behavioral Sciences, 5(2), 43. Google Scholar

NRP - National Reading Panel (2001). See Ehri, Nunes, Stahl, & Willows (2001), above.

Oakland, T., Black, J. L., Stanford, G., Nussbaum, N. L., & Balise, R. R. (1998). An evaluation of the dyslexia training program: A multisensory method for promoting reading in students with reading disabilities. Journal of learning disabilities, 31(2), 140-147. Google Scholar

Rauch, A. L. I. (2017). An analysis of two dyslexia interventions (Doctoral dissertation, Texas Woman's University). Google Scholar

Reuter, H. B. (2006). Phonological awareness instruction for middle school students with disabilities: A scripted multisensory intervention. Dissertation 3251867. ProQuest | Google Scholar

Ring, J. J., Avrit, K. J., & Black, J. L. (2017). Take flight: The evolution of an Orton Gillingham-based curriculum. Annals of Dyslexia, 67, 383-400. Google Scholar

Ritchey, K. D., & Goeke, J. (2006). Orton-Gillingham and Orton-Gillingham-Based Reading Instruction: A Review of the Literature. The Journal of Special Education. ERIC (pdf) | Google Scholar

Simos, P. G., Fletcher, J. M., Bergman, E., Breier, J. I., Foorman, B. R., Castillo, E. M., ... & Papanicolaou, A. C. (2002). Dyslexia-specific brain activation profile becomes normal following successful remedial training. Neurology, 58(8), 1203-1213. Phono-Graphix (pdf) | Google Scholar

Simpson, S. B., Swanson, J. M., & Kunkel, K. (1992). The impact of an intensive multisensory reading program on a population of learning-disabled delinquents. Annals of Dyslexia, 42, 54-66. Google Scholar

Solari, E., Petscher, Y., & Hall, C. (2021). What Does Science Say About Orton-Gillingham Interventions? An Explanation and Commentary on the Stevens et al. (2021) Meta-Analysis. PsyArXiv. March, 29.  The Reading League (pdf) | Google Scholar

Stamm, A. H. (2017). A Program Evaluation: Fidelity of Implementation of the Wilson Reading System in A Mid-Atlantic School District. Core UK (pdf) | Google Scholar

Stebbins, M. S., Stormont, M., Lembke, E. S., Wilson, D. J., & Clippard, D. (2012). Monitoring the effectiveness of the Wilson reading system for students with disabilities: One district's example. Exceptionality, 20(1), 58-70. Google Scholar

Stevens, E. A., Austin, C., Moore, C., Scammacca, N., Boucher, A. N., & Vaughn, S. (2021). Current state of the evidence: Examining the effects of Orton-Gillingham reading interventions for students with or at risk for word-level reading disabilities. Exceptional children, 87(4), 397-417.  NIH | Google Scholar

Stewart, E. D. (2011). The impact of systematic multisensory phonics instructional design on the decoding skills of struggling readers. Dissertation. Walden University. Google Scholar

Torgesen, J. K., Alexander, A. W., Wagner, R. K., Rashotte, C. A., Voeller, K. K., & Conway, T. (2001). Intensive remedial instruction for children with severe reading disabilities: Immediate and long-term outcomes from two instructional approaches. Journal of learning disabilities, 34(1), 33-58. LMB (pdf) | Google Scholar

Torgesen, J., Schirm, A., Castner, L., Vartivarian, S., Mansfield, W., Myers, D., ... & Haan, C. (2007). National Assessment of Title I. Final Report. Volume II: Closing the Reading Gap--Findings from a Randomized Trial of Four Reading Interventions for Striving Readers. NCEE 2008-4013. National Center for Education Evaluation and Regional Assistance. IES (pdf) | Google Scholar

Vernon-Feagans, L., Gallagher, K., Ginsberg, M. C., Amendum, S., Kainz, K., Rose, J., & Burchinal, M. (2010). A diagnostic teaching intervention for classroom teachers: Helping struggling readers in early elementary school. Learning Disabilities Research & Practice, 25(4), 183-193. Google Scholar

Vernon-Feagans, L., Kainz, K., Amendum, S., Ginsberg, M., Wood, T., & Bock, A. (2012). Targeted reading intervention: A coaching model to help classroom teachers with struggling readers. Learning Disability Quarterly, 35(2), 102-114.  Google Scholar

Vernon-Feagans, L., Kainz, K., Hedrick, A., Ginsberg, M., & Amendum, S. (2013). Live webcam coaching to help early elementary classroom teachers provide effective literacy instruction for struggling readers: The Targeted Reading Intervention. Journal of Educational Psychology, 105(4), 1175.  Google Scholar

Vernon-Feagans, L., Bratsch-Hines, M., Varghese, C., Cutrer, E. A., & Garwood, J. D. (2018). Improving struggling readers’ early literacy skills through a Tier 2 professional development program for rural classroom teachers: The targeted reading intervention. The Elementary School Journal, 118(4), 525-548. Google Scholar

Wanzek, J., & Roberts, G. (2012). Reading interventions with varying instructional emphases for fourth graders with reading difficulties. Learning Disability Quarterly, 35(2), 90-101. ERIC (pdf) | Google Scholar

Wanzek, J., Gatlin, B., Al Otaiba, S., & Kim, Y. S. G. (2017). The impact of transcription writing interventions for first-grade students. Reading & Writing Quarterly, 33(5), 484-499. Google Scholar

Westrich-Bond, A. (1993). The effect of direct instruction of a synthetic sequential phonics program on the decoding abilities of elementary school learning-disabled students. Dissertation. Rutgers - The State University of New Jersey, School of Graduate Studies. Google Scholar

Young, C. A. (2001). Comparing the effects of tracing to writing when combined with Orton-Gillingham methods on spelling achievement among high school students with reading disabilities. Dissertation. The University of Texas at Austin. Google Scholar 

