
The 28 Percent That Matters

An AI education platform just earned one of the first evidence-based certifications in the field. The finding that earned it tells us exactly what kind of AI belongs in schools.

April 24, 2026 · 6 min read · Koan Team

Yesterday, an AI education company called SchoolAI announced something that, on the surface, looks like a routine press release. They earned an ESSA Tier 3 evidence certification based on a two-year study conducted with Jordan School District in Utah. The headline number: students using their platform demonstrated a 28% improvement in critical thinking.¹

In a field flooded with bold claims and thin evidence, this is worth pausing on. Not because 28% is a magic number, but because of what had to be true for that number to emerge at all.

What ESSA Tier 3 Actually Means

The Every Student Succeeds Act established four tiers of evidence quality for education interventions. Tier 1 requires a large randomized controlled trial. Tier 4 requires only a logic model. Most education technology products operate at Tier 4 or below, which is a polite way of saying they have a theory about why they should work but no rigorous evidence that they do.

Tier 3, which ESSA labels "promising evidence," sits between those extremes: it requires a well-designed correlational study with statistical controls for selection bias. It is not the gold standard, but it is the floor of credibility. And in the AI education space, where companies routinely cite engagement metrics or user satisfaction surveys as "evidence," reaching Tier 3 is genuinely rare.

SchoolAI is one of a small number of AI education platforms to earn this designation. That fact alone tells you something about the gap between the promises the industry makes and the evidence it provides.

The Metric That Matters

Here is the detail that caught my attention. The study did not measure test score improvements. It did not measure time on task or assignment completion rates. It measured critical thinking.

That choice is significant. Most AI education tools optimize for efficiency: faster completion, higher scores, more content covered. Those metrics are easy to move in the short term. The OECD documented this precisely in their Digital Education Outlook 2026: students using general-purpose AI tools showed a 48% performance boost that collapsed into a 17% deficit once the AI was removed.² The tool did the work. The student's own capacity did not grow.

Critical thinking is harder to fake. It requires the student to evaluate, synthesize, and reason independently. You cannot boost it by giving better hints or generating cleaner summaries. You can only build it by creating conditions where the student has to think harder than they otherwise would.

Deanna Taylor, Digital Learning Specialist at Jordan School District, put it simply: "What we saw over two years wasn't students becoming dependent on AI for answers, but rather students learning to think harder because the AI was designed to expect more of them."¹

That sentence deserves to be read twice. The AI was designed to expect more of them.

The Design Philosophy Behind the Number

There is a fork in the road that every AI education tool encounters in its earliest design decisions. One path leads toward answer-giving: the AI helps the student produce better work, faster, with less friction. The other path leads toward question-asking: the AI creates productive friction, scaffolding the student's reasoning without doing the reasoning for them.

The first path produces impressive short-term metrics. The second path produces something harder to measure but far more valuable: a student who can think without the tool.

The Jordan School District study suggests that SchoolAI chose the second path, at least in how they designed the interactions that produced the critical thinking gains. The AI guided reasoning rather than replacing it. This aligns with a growing body of research. A February study in the Journal of Computer Assisted Learning found that students using Socratic AI developed "cyclical, reflective practices," while those using direct-answer AI engaged in "superficial mimicry." Georgia Tech's Socratic Mind pilot found that 77.8% of students considered the questioning approach more educational than traditional assessments.³

The pattern is consistent across independent research teams: AI that asks questions builds thinking. AI that gives answers replaces it.

What Schools Should Ask

The timing of this certification matters. Right now, 134 bills related to AI in education have been introduced across 31 states.⁴ School districts are scrambling to write AI policies, often without any framework for evaluating whether the tools they adopt actually work. Sixty-one percent of elementary school educators say their students struggle significantly to distinguish between AI-generated and non-AI-generated content.⁵ The ground is shifting fast, and schools need something to hold onto.

Evidence-based certifications like ESSA Tier 3 are one handhold. Not because they guarantee a tool will work in every context, but because they establish a minimum threshold of accountability. They force the question: did you actually study this, and what did you find?

But schools should push further. The right question is not just "does this tool have evidence?" It is "evidence of what?" A tool that improves test scores but atrophies independent thinking has evidence. It just has evidence of the wrong thing.

At Koan, we think about this distinction constantly. Our AI tutor, Aidan, is built around the Socratic method: asking calibrated questions rather than providing answers, creating the productive friction that develops genuine reasoning. And our WorkHub captures every revision, every pause, every shift in a student's thinking, not as surveillance but as a visible record of the learning process itself. Because the question we want our evidence to answer is not "did the student perform?" but "did the student grow?"

That is a harder question to answer. It requires watching the process, not just grading the product. It requires tools that make thinking visible.

The Quiet Standard

The SchoolAI announcement will likely pass through the news cycle quickly. It does not have the drama of an AI completing an entire course autonomously or a major city banning ChatGPT. It is a two-year study in a Utah school district that found a 28% improvement in how students think.

But that quietness is part of what makes it important. The best evidence in education is rarely dramatic. It accumulates slowly, through patient observation, through controlled studies, through the careful work of watching what actually happens when students encounter well-designed tools over time.

If we measured every AI education tool not by the scores it produces, but by the thinking it develops, how many would survive the test?


References

Sources cited in order of appearance.