EvalAI Leaderboard

Interactive Leaderboard

EvalAI-hosted rankings with track filters and score breakdowns.

Benchmark version: v1.0 | Last updated: 2026-03-30 | Source: leaderboard_public_mock.json

How to Read This Leaderboard

  • Public: the EvalAI phase used for ranking during development.
  • Private: the hidden EvalAI phase used for the final ranking.
  • Total: weighted aggregate across latent, mechanism, support, and intervention tasks.
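
As a rough illustration of how a weighted aggregate like Total could be computed, here is a minimal sketch. The weights are not stated on this page, so the equal weights below are an assumption, and the names `EXAMPLE_WEIGHTS` and `total_score` are hypothetical rather than part of EvalAI or the benchmark code.

```python
# Task groups named in the Total description above.
TASKS = ("latent", "mechanism", "support", "intervention")

# ASSUMPTION: equal weights, for illustration only. The real weights
# used by the leaderboard are not given in this document.
EXAMPLE_WEIGHTS = {task: 0.25 for task in TASKS}


def total_score(task_scores, weights=EXAMPLE_WEIGHTS):
    """Return the weighted mean of per-task scores.

    task_scores: dict mapping each task name to a numeric score.
    weights: dict mapping each task name to a non-negative weight.
    """
    missing = [task for task in TASKS if task not in task_scores]
    if missing:
        raise ValueError(f"missing task scores: {missing}")

    weight_sum = sum(weights[task] for task in TASKS)
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")

    # Normalize by the weight sum so weights need not add up to 1.
    weighted = sum(weights[task] * task_scores[task] for task in TASKS)
    return weighted / weight_sum


if __name__ == "__main__":
    example = {"latent": 0.8, "mechanism": 0.6, "support": 0.4, "intervention": 0.2}
    print(round(total_score(example), 3))
```

Dividing by the weight sum keeps the result on the same scale as the per-task scores, which makes it easier to compare Total against individual task columns.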

Reference Baselines

Random Attribution

Sanity-check floor expected to perform near chance across all tasks.

Linear Probe + Heuristics

Simple baseline used to validate the submission schema and pipeline behavior.

Causal Rollout Family

Stronger baseline for competitive public-track performance.