Modern AI agents can browse the web, write code, manage databases, and coordinate multi-step workflows. Yet for all their apparent sophistication, they share a curious limitation: they do not get better with experience. An agent that fails to parse a particular API response on Monday will fail in exactly the same way on Friday. The thousandth customer service interaction teaches the system nothing that the first one did not.
This stands in sharp contrast to how humans develop expertise. A junior developer does not merely execute instructions from a textbook; she builds intuitions from mistakes, develops mental shortcuts from repeated patterns, and gradually internalizes a model of what good code looks like. The gap between a novice and an expert is not primarily one of knowledge—it is one of practiced judgment.
At Etude AI, we believe this gap represents one of the most important unsolved problems in AI agent design. Not because agents need more parameters or longer context windows, but because they lack the fundamental machinery for learning from their own experience in a structured, deliberate way.
The Static Agent Problem
Today's AI agents operate in what we might call a frozen competence regime. Their capabilities are fixed at deployment time, determined entirely by their training data and the scaffolding code that surrounds them. When an agent encounters a novel situation, it must rely on whatever generalizations its base model can muster. When it makes an error, that error vanishes into the void—no trace remains to prevent the same mistake from recurring.
This limitation manifests in several concrete ways. Agents repeatedly attempt strategies that have already failed in similar contexts. They cannot adapt their communication style based on what has worked with a particular user. They treat every task as if it were their first, unable to build on the momentum of prior successes. And when the environment shifts—a new API version, a changed website layout, an updated company policy—they have no mechanism for incremental adaptation.
The standard industry response has been to periodically retrain or fine-tune the underlying model. But retraining is expensive, slow, and coarse-grained. It cannot capture the kind of rapid, situated learning that characterizes human expertise. What we need is something fundamentally different: a way for agents to learn during deployment, from the very tasks they are trying to accomplish.
Lessons from Deliberate Practice
The cognitive science of expertise offers a compelling framework for thinking about this problem. Beginning in the 1990s, psychologist K. Anders Ericsson and his colleagues developed a theory of deliberate practice that transformed our understanding of how humans develop expert performance. Their central finding was that raw experience alone is insufficient for improvement. What matters is the structure of that experience.
The difference between an expert and a novice is not merely ten thousand hours of doing—it is ten thousand hours of doing with intention, feedback, and progressive challenge.
Adapted from K. Anders Ericsson, "The Role of Deliberate Practice in the Acquisition of Expert Performance"
Ericsson identified several key characteristics of deliberate practice that distinguish it from mere repetition. First, deliberate practice involves well-defined tasks with clear goals—not vague aspirations to "get better," but specific objectives like "improve accuracy on minor key transitions" or "reduce response latency for compound queries." Second, it requires immediate, informative feedback—the practitioner must know not just whether they succeeded or failed, but why, and what specific adjustments to make. Third, it demands progressive difficulty—tasks must be calibrated to sit just beyond the practitioner's current ability, in what Vygotsky called the zone of proximal development.
Finally, and perhaps most importantly, deliberate practice involves focused attention on weaknesses. Experts do not simply repeat what they already do well. They identify their specific failure modes, design exercises that target those weaknesses, and concentrate their effort where it will produce the greatest marginal improvement.
These principles map onto the AI agent problem with remarkable precision. Current agents lack all four components: they have no mechanism for defining targeted sub-goals, no structured feedback beyond binary success or failure, no curriculum of progressive difficulty, and no ability to identify and concentrate on their own weaknesses.
A Framework for Agent Practice
Over the past year, our research at Etude AI has focused on translating these cognitive science principles into a concrete architectural framework. The result is what we call the practice loop—a runtime mechanism that allows agents to improve their performance on specific task categories through structured experience.
Structured Task Decomposition
The first component addresses the "well-defined tasks" criterion. Rather than treating each request as a monolithic challenge, the practice loop decomposes agent activity into a hierarchy of discrete skills. A customer service agent, for example, does not simply "handle tickets." It exercises distinct sub-skills: understanding user intent, retrieving relevant documentation, formulating clear explanations, knowing when to escalate, and managing emotional tone.
This decomposition serves two purposes. It creates a granular map of the agent's competencies, allowing us to measure performance at the level of individual skills rather than overall task completion. And it provides natural units for practice—an agent can focus on improving its intent classification without simultaneously trying to improve everything else.
We implement this through a skill taxonomy that is partially hand-designed and partially discovered. Domain experts define the top-level skill categories, but the agent itself learns to identify sub-skills and failure modes within those categories through a reflection mechanism we describe below.
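To make the decomposition concrete, here is a minimal sketch of how such a skill taxonomy might be represented. The class and field names are illustrative assumptions, not our actual implementation; the point is the shape: hand-designed top-level skills, with runtime-proposed sub-skills flagged for review.

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    """Node in the skill taxonomy. Top-level categories are hand-designed
    by domain experts; sub-skills may be proposed at runtime by the
    agent's reflection mechanism and flagged for review."""
    name: str
    hand_designed: bool = True
    children: list["Skill"] = field(default_factory=list)

    def add_subskill(self, name: str, hand_designed: bool = False) -> "Skill":
        child = Skill(name, hand_designed=hand_designed)
        self.children.append(child)
        return child

    def leaves(self) -> list["Skill"]:
        # Leaf skills are the granular units that practice targets.
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

# The customer service example from the text, hand-designed at the top.
root = Skill("handle_tickets")
for name in ["understand_intent", "retrieve_docs", "explain_clearly",
             "escalate_appropriately", "manage_tone"]:
    root.add_subskill(name, hand_designed=True)

# A finer failure mode the reflection mechanism might later propose:
intent = root.children[0]
intent.add_subskill("disambiguate_compound_requests")
```

Because performance is measured at the leaves, adding a discovered sub-skill automatically refines where the agent's competency is tracked, without disturbing the hand-designed categories above it.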
Self-Evaluation Loops
The second component provides the "immediate, informative feedback" that deliberate practice requires. After each task attempt, the agent enters a structured self-evaluation phase. This is not the simple "chain-of-thought" reflection that has become common in prompt engineering. It is a multi-step process with specific outputs.
The agent first generates a performance trace—a structured record of what it did, what it expected to happen, and what actually happened. It then performs a causal analysis, attempting to identify the specific decision points where its choices led to suboptimal outcomes. Finally, it produces a corrective strategy—a concrete description of what it would do differently if faced with the same situation again.
These corrective strategies are stored in a practice memory—a specialized knowledge base that the agent consults before attempting similar tasks in the future. Unlike the agent's base model weights, which are frozen, the practice memory is a living document that grows and evolves with experience. Critically, entries in the practice memory are not just examples or rules; they are contextualized strategies that capture both what to do and why.
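The trace, causal analysis, and corrective strategy described above can be sketched as follows. This is a deliberately toy version, with illustrative names throughout; in the real system the causal analysis step is performed by a model, not a string comparison.

```python
from dataclasses import dataclass

@dataclass
class TraceStep:
    action: str       # what the agent did
    expected: str     # what it expected to happen
    observed: str     # what actually happened

@dataclass
class Strategy:
    skill: str        # leaf skill the strategy refines
    context: str      # when it applies (the "what")
    rationale: str    # why it works (the "why")

class PracticeMemory:
    """Living store of contextualized strategies, kept separate from
    the frozen base-model weights."""
    def __init__(self) -> None:
        self._entries: list[Strategy] = []

    def record(self, strategy: Strategy) -> None:
        self._entries.append(strategy)

    def recall(self, skill: str) -> list[Strategy]:
        # Consulted before the agent attempts a similar task.
        return [s for s in self._entries if s.skill == skill]

def self_evaluate(trace: list[TraceStep], skill: str) -> "Strategy | None":
    """Toy causal analysis: locate the first step where expectation and
    outcome diverged and turn it into a corrective strategy. A real
    system would use a model (or a separate coach model) here."""
    for step in trace:
        if step.expected != step.observed:
            return Strategy(
                skill=skill,
                context=f"after '{step.action}'",
                rationale=f"expected {step.expected!r} but observed {step.observed!r}",
            )
    return None  # task went as planned; nothing to record this round

# Usage: one failed retrieval becomes one contextualized strategy.
memory = PracticeMemory()
trace = [TraceStep("call search API", "list of docs", "empty result")]
strategy = self_evaluate(trace, "retrieve_docs")
if strategy:
    memory.record(strategy)
```

The design choice worth noting is that `recall` is keyed by skill, not by verbatim task text: the memory is meant to surface strategies for situations of the same kind, which is what separates judgment from memorization.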
The goal is not an agent that memorizes answers, but one that develops judgment—the ability to recognize which situations call for which approaches, and why.
Progressive Difficulty Scaling
The third component addresses the curriculum problem. In a deployment setting, the agent does not control which tasks it receives. But it can control how it practices between tasks. We introduce a practice scheduler that generates synthetic practice scenarios calibrated to the agent's current skill profile.
The scheduler maintains a model of the agent's competency across its skill taxonomy. When it detects a skill area where performance is below threshold, it generates practice scenarios that target that specific weakness. These scenarios begin at a difficulty level slightly above the agent's demonstrated competence and gradually increase as performance improves.
This approach draws directly from the concept of spaced repetition in human learning. Skills that the agent has recently practiced and demonstrated mastery of are revisited less frequently. Skills that remain weak receive more attention. The result is an efficient allocation of practice time that mirrors the strategies used by expert human learners.
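A minimal sketch of the scheduling logic, under assumed names and parameter values: a competency estimate per skill, a mastery threshold, and a "just beyond current ability" difficulty increment. A full spaced-repetition policy over mastered skills is elided here.

```python
class PracticeScheduler:
    """Sketch: track a competency estimate per skill, pick the weakest
    skill below threshold, and calibrate exercise difficulty slightly
    beyond demonstrated competence."""
    def __init__(self, threshold: float = 0.8, step: float = 0.05):
        self.competency: dict[str, float] = {}   # skill -> score in [0, 1]
        self.threshold = threshold               # mastery cutoff
        self.step = step                         # "just beyond" increment

    def update(self, skill: str, score: float, alpha: float = 0.3) -> None:
        # Exponential moving average of observed performance.
        if skill not in self.competency:
            self.competency[skill] = score
        else:
            prev = self.competency[skill]
            self.competency[skill] = (1 - alpha) * prev + alpha * score

    def next_exercise(self) -> "tuple[str, float] | None":
        # Weak skills get attention; mastered skills would be revisited
        # on a spaced-repetition schedule instead (not shown).
        weak = {s: c for s, c in self.competency.items() if c < self.threshold}
        if not weak:
            return None
        skill = min(weak, key=weak.get)                  # weakest first
        difficulty = min(1.0, weak[skill] + self.step)   # slightly harder
        return skill, difficulty

# Usage: the scheduler targets the weak skill, not the strong one.
scheduler = PracticeScheduler()
scheduler.update("understand_intent", 0.5)
scheduler.update("manage_tone", 0.9)
exercise = scheduler.next_exercise()
```

The moving average matters: a single good outcome nudges the estimate rather than resetting it, so a skill only exits the practice rotation after sustained evidence of mastery.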
Architecture: Putting the Pieces Together
The complete architecture integrates these three components into a coherent runtime system. During normal operation, the agent handles tasks using both its base model capabilities and its practice memory. After each task, the self-evaluation loop runs asynchronously, updating the practice memory and the competency model. During idle periods, the practice scheduler generates targeted exercises.
The key architectural decision is the separation between the agent's base competence (its pretrained model weights) and its practiced competence (the accumulated strategies in its practice memory). This separation provides several benefits. It allows the practice memory to be inspected, edited, and version-controlled by human operators. It makes the agent's learned behaviors transparent and auditable. And it allows the system to fall back gracefully to base competence if the practice memory is corrupted or inapplicable.
We also maintain a skill hierarchy that evolves over time. As the agent encounters new task types, it proposes additions to its skill taxonomy. These proposals are reviewed (currently by humans, eventually through automated validation) and integrated into the competency model. This allows the agent's understanding of its own capabilities to grow alongside those capabilities themselves.
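The runtime described above can be condensed into a single step function. Everything here is an illustrative stand-in (the component functions are stubs, and evaluation is run inline purely to keep the sketch deterministic; in the actual system it runs asynchronously, off the critical path).

```python
import queue

# Illustrative stand-ins for the three components; names are assumptions.
def handle_task(task, practice_memory):
    # Base-model capabilities plus practice memory would cooperate here.
    return {"task": task, "outcome": "ok"}

def self_evaluate(result, practice_memory, competency_model):
    # Trace -> causal analysis -> corrective strategy, heavily simplified.
    practice_memory.append(f"strategy for {result['task']}")
    competency_model[result["task"]] = 1.0

def generate_exercise(competency_model):
    # Practice scheduler: target the weakest known skill, if any.
    weak = [s for s, c in competency_model.items() if c < 0.8]
    return f"drill:{weak[0]}" if weak else None

def agent_step(tasks, practice_memory, competency_model):
    """One iteration of the runtime loop: prefer a live task; fall back
    to a targeted practice exercise when idle."""
    try:
        task = tasks.get_nowait()
    except queue.Empty:
        task = generate_exercise(competency_model)
    if task is None:
        return None  # idle and nothing weak enough to drill
    result = handle_task(task, practice_memory)
    self_evaluate(result, practice_memory, competency_model)
    return result

# Usage: one live task, then an idle step that triggers practice.
tasks = queue.Queue()
tasks.put("classify_intent")
memory, competency = [], {"manage_tone": 0.5}
live = agent_step(tasks, memory, competency)
drill = agent_step(tasks, memory, competency)
```

The separation the text emphasizes is visible even in this toy: `memory` is an ordinary inspectable object that a human operator could read, edit, or version-control, entirely apart from the model that produced it.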
Early Observations
Our experiments with practice-enabled agents are still in early stages, but several patterns have already emerged that we find encouraging.
Skill-specific improvement is real and measurable. In our test environments, agents with practice loops show statistically significant improvement on targeted skill areas within 50-100 task interactions. This is not merely memorization of specific cases; improvement generalizes to novel instances within the same skill category. An agent that practices handling ambiguous user queries, for example, develops strategies that transfer to new types of ambiguity it has not previously encountered.
Reflection quality matters enormously. The most important predictor of improvement is not the volume of practice but the quality of the self-evaluation phase. Agents that generate vague or superficial corrective strategies ("try harder," "be more careful") show minimal improvement. Those that generate specific, actionable strategies ("when the user mentions both a deadline and a budget constraint, address the deadline first because it is typically the harder constraint") improve rapidly. We are actively researching how to improve reflection quality, including using separate "coach" models that specialize in generating high-quality feedback.
Catastrophic forgetting is a real risk. Just as human experts can lose skills they do not practice, agents can lose previously learned strategies when their practice memory grows large or when new strategies conflict with old ones. Our spaced repetition scheduler helps mitigate this, but we have not fully solved the problem. This is an area of active research.
Practice memories are surprisingly interpretable. One unexpected benefit of our approach is that the practice memory provides a readable record of what the agent has learned and why. This has proven valuable for debugging and for building trust with human operators who need to understand the agent's decision-making process.
Implications for Safety and Alignment
An agent that learns from experience raises important questions about safety and alignment. If an agent can modify its own behavior through practice, how do we ensure that those modifications remain aligned with human values and intentions?
We believe the practice loop framework actually improves the safety picture relative to alternatives. The key insight is that practice memories are explicit, inspectable artifacts. Unlike the opaque weight changes that result from fine-tuning, practice memories are written in natural language and can be reviewed, approved, or rejected by human overseers. An agent's learned behaviors are not hidden inside billions of parameters; they are recorded in a structured knowledge base that admits direct human oversight.
Furthermore, the skill taxonomy provides a natural mechanism for constraining what the agent is allowed to learn. By defining boundaries on the skill hierarchy, operators can ensure that practice remains focused on approved competency areas. An agent authorized to improve its customer service skills cannot use the practice loop to develop capabilities in areas it has not been authorized to explore.
That said, we are cautious about overstating these safety properties. As practice memories grow in complexity and as agents become more capable of self-directed learning, the oversight challenge will scale accordingly. We view this as a problem that must be solved incrementally, with careful empirical work at each stage, rather than one that admits a simple theoretical solution.
Transparency is not a feature we add to the system—it is a structural property of how the system learns. Every adaptation the agent makes is recorded, readable, and reversible.
Looking Forward
The analogy to musical practice that gives our company its name is more than a metaphor. A pianist does not become great by reading about music theory or by playing pieces they have already mastered. They become great by identifying the passages that challenge them, practicing those passages with focused attention, and gradually expanding the boundary of what they can play with fluency and expression.
We believe AI agents should develop in the same way. Not through ever-larger training runs or ever-longer context windows, but through structured, deliberate engagement with their own performance. The practice loop is our first step toward agents that earn their competence through experience—agents that do not merely execute, but genuinely learn.
The road ahead is long. We are still in the early measures of this particular composition. But the initial results suggest that the principles of deliberate practice, refined over decades of cognitive science research, have deep relevance to the systems we are building today. We look forward to sharing more detailed technical results as our work progresses.
If you are interested in this line of research, we encourage you to follow our work and reach out. The best ideas in this space will come from the intersection of cognitive science, machine learning, and practical agent engineering—and we are always looking for collaborators who bring fresh perspectives to the problem.