Practice makes intelligence.

We build AI agents that improve through deliberate practice, the same mechanism that drives elite human performance.

Benchmarks · Agents · Open Source · Published at ICML, NeurIPS

3 open-source models · 6 research papers · Apache 2.0

01

Evaluation

Rigorous benchmarks and metrics that reveal true model capabilities beyond surface-level performance.

How do we know if AI is actually getting smarter? We build the tests.

3 benchmarks

02

Agents

Autonomous systems that learn and improve through deliberate practice and structured self-evaluation.

AI that learns from practice, just like humans do.

2 frameworks

03

Vision

Multimodal understanding that bridges visual perception with deep reasoning and world knowledge.

Teaching AI to understand what it sees, not just describe it.

1 toolkit

Evaluation
EtudeEval-7B
Spots when AI is cheating on tests, and builds harder ones on the fly
7B parameters
Contamination detection: 94.2% · Adaptive difficulty: ±0.3 calibration

Dynamic evaluation model. Adversarial benchmark generation that adapts to model capabilities in real time.

Agents
PracticeAgent-13B
Helps AI learn from its mistakes, like a coach running drills
13B parameters
SWE-bench: 45.2% · 40% faster skill acquisition vs. static training

Self-improving agent. Deliberate practice loops that drive compounding performance gains over time.

Vision
VisPractice-3B
Teaches AI to truly understand what it sees in photos and diagrams
3B parameters
MMMU: 51.8% · Compositional VQA: 67.3%

Multimodal vision model. Compositional reasoning that bridges visual perception with structured world knowledge.

Stanford NLP · MIT CSAIL · Berkeley AI · CMU LTI · Oxford NLP · ETH Zürich
3 Open-Source Models
6 Research Publications
7 Blog Posts & Technical Essays
8 Open Roles · Hiring Now

We're hiring researchers and engineers

Join a small team doing frontier research on agents that learn through deliberate practice.

Research notes, technical essays, and dispatches from the frontier.