We build AI agents that improve through deliberate practice: the same mechanism behind every elite human performer.
3 open-source models · 6 research papers · Apache 2.0

Evaluation
Rigorous benchmarks and metrics that reveal true model capabilities beyond surface-level performance.
How do we know if AI is actually getting smarter? We build the tests.
3 benchmarks

Agents
Autonomous systems that learn and improve through deliberate practice and structured self-evaluation.
AI that learns from practice, just like humans do.
2 frameworks

Multimodal
Multimodal understanding that bridges visual perception with deep reasoning and world knowledge.
Teaching AI to understand what it sees, not just describe it.
1 toolkit

Dynamic evaluation model
Adversarial benchmark generation that adapts to model capabilities in real time.
View on GitHub →

Self-improving agent
Deliberate practice loops that drive compounding performance gains over time (a toy sketch of such a loop follows the model list).
View on GitHub →

Multimodal vision model
Compositional reasoning that bridges visual perception with structured world knowledge.
View on GitHub →
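
To make the two ideas above concrete, here is a minimal, hypothetical sketch of a deliberate practice loop paired with adaptive task generation. Nothing here reflects the released code: Agent, make_task, and practice_loop are illustrative stand-ins, and the "task" is a toy arithmetic problem.

```python
import random


def make_task(difficulty: float) -> dict:
    """Adaptive task generation stand-in: an addition problem whose
    operand size grows with the requested difficulty."""
    hi = int(10 ** (1 + 3 * difficulty))  # harder -> larger operands
    a, b = random.randint(1, hi), random.randint(1, hi)
    return {"prompt": f"{a} + {b} = ?", "answer": a + b}


class Agent:
    """Toy agent whose answer noise shrinks as it accumulates practice."""

    def __init__(self) -> None:
        self.practice_count = 0

    def attempt(self, task: dict) -> int:
        # Error magnitude decays with practice, standing in for learning.
        noise = random.gauss(0, 10 / (1 + self.practice_count))
        return task["answer"] + round(noise)

    def learn_from(self, task: dict, answer: int) -> None:
        self.practice_count += 1  # a real agent would update weights here


def practice_loop(agent: Agent, episodes: int = 200) -> float:
    """Deliberate practice: attempt, self-evaluate, adapt difficulty."""
    difficulty, correct = 0.1, 0
    for _ in range(episodes):
        task = make_task(difficulty)
        got = agent.attempt(task)
        success = got == task["answer"]  # structured self-evaluation
        agent.learn_from(task, got)
        # Keep tasks near the frontier of the agent's current ability.
        if success:
            difficulty = min(1.0, difficulty + 0.02)
        else:
            difficulty = max(0.05, difficulty - 0.02)
        correct += success
    return correct / episodes


if __name__ == "__main__":
    random.seed(0)
    print(f"success rate over practice: {practice_loop(Agent()):.2%}")
```

The design point the sketch tries to show: difficulty tracks success, so tasks stay at the edge of the agent's current ability, which is what distinguishes deliberate practice from mere repetition.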

Research notes, technical essays, and dispatches from the frontier.

Most AI agents don't actually learn from experience. We argue the next breakthrough isn't a bigger model — it's a better learning process.
Read more

A benchmark framework that measures skill acquisition rate: how quickly agents improve through structured practice (a minimal sketch of the metric follows these posts).
Read more

AlphaZero's revolution wasn't about scale; it was about the quality of the learning process.
Read more
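
As a rough, unofficial illustration of the skill-acquisition-rate idea mentioned above: one plausible definition is the slope of benchmark score against practice episodes. The function name and the example scores below are made up, and the framework's actual metric may differ.

```python
def skill_acquisition_rate(scores: list[float]) -> float:
    """Slope of a least-squares fit of benchmark score against practice
    episode index, i.e. score gained per episode of practice."""
    n = len(scores)
    mean_x = (n - 1) / 2  # episodes are indexed 0, 1, ..., n-1
    mean_y = sum(scores) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(scores))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


# Made-up example: an agent climbing from 40 to 70 points over ten
# practice episodes acquires skill at roughly 3.3 points per episode.
print(skill_acquisition_rate([40, 44, 47, 51, 55, 58, 61, 64, 67, 70]))
```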