EvalAI Leaderboard

Interactive Leaderboard

EvalAI-hosted rankings with track filters and score breakdowns.

Benchmark version: v1.0 | Last updated: 2026-03-30 | Source: leaderboard_public_mock.json

How to Read This Leaderboard

Public: EvalAI phase for development-facing ranking.
Private: EvalAI hidden phase used for final ranking.
Total: weighted aggregate across latent, mechanism, support, and intervention tasks.

Track:

Sanity-check floor expected to perform near chance across all tasks.

Simple baseline that validates schema and submission pipeline behavior.

Stronger baseline for competitive public-track performance.