The Crutch Effect: When AI Helps Too Much
A landmark OECD study finds that AI boosts student performance by 48%, then leaves it 17% below baseline the moment the AI is taken away. But one approach breaks the pattern.
This week, the American Federation of Teachers launched something quietly historic. In a New York City conference room on March 18th, the first cohort of the National Academy for AI Instruction sat down to learn not how to use AI as a search engine or a lesson-plan generator, but how to build what the program calls "agentic" tools: autonomous AI systems designed to reason alongside students across subjects and grades.
The ambition is striking: 400,000 teachers trained to move beyond surface-level AI use toward something more intentional, more pedagogically grounded. It is a recognition that the question is no longer whether AI belongs in classrooms, but whether we are building the right kind.
That distinction matters more than ever now, because we finally have the data to prove it.
The Number That Should Worry Us
The OECD's Digital Education Outlook 2026, released earlier this year and widely discussed this month, contains a finding that deserves to be printed on the wall of every school board office in the world. Students using general-purpose AI tools were 48% more successful at completing tasks. That sounds like a triumph. But when the AI was removed, their performance dropped 17% below where they started.
The OECD researchers have a name for this: "metacognitive laziness." It is the quiet atrophy that happens when a tool does your thinking for you so seamlessly that you stop noticing you have stopped thinking at all. The student writes a better essay with AI. Then writes a worse one without it. The tool did not teach. It carried.
This is the crutch effect. And it is not a hypothetical. It is the measured reality of how most students interact with AI right now.
The Exception That Proves the Rule
But here is where the OECD report becomes genuinely hopeful. Purpose-built educational AI tools, specifically those designed around Socratic questioning and scaffolded guidance, showed sustained learning benefits. Not temporary boosts. Not performance that vanished when the tool was removed. Real, persistent gains in critical thinking and comprehension.
This finding does not stand alone. A February study in the Journal of Computer Assisted Learning compared students using AI that gives direct answers with students using AI that asks Socratic questions. The direct-answer group engaged in what the researchers called "superficial mimicry." The Socratic group developed "cyclical, reflective practices." Same technology. Same subject matter. Radically different learning.
Georgia Tech's Socratic Mind pilot, running with 600 students, found that 77.8% considered the questioning approach more educational than traditional assessments. Not just more engaging. More educational. Students themselves can feel the difference between being carried and being coached.
The Design Choice That Matters
What separates a crutch from a coach? It is not the sophistication of the model or the size of the training data. It is a single design decision: does this tool give answers, or does it ask questions?
A crutch absorbs the weight. A student leans on it, moves forward, and never builds the muscle to walk alone. A coach stands beside the student and asks: Where are you trying to go? What is making this hard? What would happen if you tried it differently?
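The difference often comes down to a single configuration choice: the instruction handed to the model. As an illustrative sketch only (the prompts and the `select_prompt` helper below are hypothetical examples, not Koan's or any product's actual implementation), the same model becomes a crutch or a coach depending on one string:

```python
# Hypothetical sketch: the crutch/coach distinction can live in a single
# design decision -- the system instruction given to the model.

CRUTCH_PROMPT = (
    "You are a homework assistant. When the student asks a question, "
    "give the complete, correct answer as directly as possible."
)

COACH_PROMPT = (
    "You are a Socratic tutor. Never give the answer directly. "
    "Respond with one guiding question that helps the student notice "
    "their own assumption or take the next step themselves."
)

def select_prompt(mode: str) -> str:
    """Return the system instruction for the chosen tutoring mode."""
    prompts = {"crutch": CRUTCH_PROMPT, "coach": COACH_PROMPT}
    if mode not in prompts:
        raise ValueError(f"unknown mode: {mode!r}")
    return prompts[mode]

# Same model, same student question -- only the system prompt differs.
student_question = "Why does my essay's second paragraph feel weak?"
for mode in ("crutch", "coach"):
    messages = [
        {"role": "system", "content": select_prompt(mode)},
        {"role": "user", "content": student_question},
    ]
    # `messages` would be sent to any chat-completion API here.
    print(mode, "->", messages[0]["content"][:45])
```

Everything else about the system can be identical; the pedagogical intent is encoded in that one instruction.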
The OECD data confirms what good teachers have always known intuitively. The struggle is the learning. When you remove the struggle, you remove the growth. When you scaffold the struggle, making it visible, giving it structure, asking the right questions at the right moment, you get something that lasts.
What This Means for Schools
The AFT's new academy is a promising step because it signals a shift in how educators think about AI. Not as a content delivery mechanism, but as something that requires pedagogical intent. The 400,000 teachers who move through this program will be asked to think about what "agentic" really means: an AI that reasons, not one that retrieves.
But training teachers to build better AI tools is only half the equation. Schools also need platforms built from the ground up around these principles. Platforms where the AI does not generate essays but generates questions. Where every revision, every pause, every shift in a student's thinking is captured. Not as surveillance, but as evidence of the learning process itself.
This is the work we are doing at Koan. Our AI tutor, Aidan, is built on the same Socratic foundation that the OECD data validates. Aidan never writes for students. It asks them what they are trying to say. It notices when they rush past an assumption. It remembers the pattern from their last essay and asks whether they see it too. And the WorkHub captures all of it (the drafts, the revisions, the moments of genuine breakthrough) so teachers can see not just what a student submitted but how they got there.
The crutch effect is real. But it is not inevitable. It is the predictable outcome of AI designed without pedagogical intent. When you design with intent, when you build AI that asks instead of tells, you get the opposite: students who think more deeply, not less.
A Question Worth Sitting With
Stanford recently reviewed over 800 studies on AI in education and found that only about 20 met the standard for rigorous causal research. But across those 20, a clear pattern held: scaffolding thinking works, and giving answers does not.
Twenty studies, from independent teams across the world, all pointing in the same direction. The evidence is not ambiguous. The question is whether we will build accordingly.
If we know that AI designed to ask questions produces deeper learning than AI designed to give answers, why are we still building, buying, and deploying the kind that gives answers?