Official Leaderboards
EPOB results, evidence first.
The paper package is the baseline. EPOB periodically publishes EPOB-standard runs for new agent frameworks, each with comparable scores, evidence packets, and a published data snapshot.
| Rank | Framework | Score | Deliverable | Evidence | Failure label | Model | Artifact |
|---|---|---|---|---|---|---|---|
Matrix View
Framework-by-family evidence map
Result Viewer
Selected run evidence
Select a leaderboard row to inspect score components, package provenance, and evidence scope.
Dynamic Score Profile
Selected row breakdown
Evidence
Frozen artifacts remain the source of truth.
Paper-package rows under docs/paper/ define the baseline. New market-framework evaluations should preserve raw runtime, judge, and bundle metadata before entering public snapshots.
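The preservation rule above can be pictured as a gate that a run must pass before it is added to a public snapshot. The following is a minimal sketch only, not the EPOB tooling: the `RunRecord` class, the `REQUIRED_METADATA` keys, and both helper functions are hypothetical names chosen for illustration, and the actual metadata layout may differ.

```python
from dataclasses import dataclass, field

# Hypothetical keys, named after the three metadata kinds listed above.
REQUIRED_METADATA = ("runtime", "judge", "bundle")


@dataclass
class RunRecord:
    """Hypothetical record for one framework evaluation run."""
    framework: str
    metadata: dict = field(default_factory=dict)


def missing_metadata(record: RunRecord) -> list[str]:
    """Return the required metadata groups that are absent or empty."""
    return [key for key in REQUIRED_METADATA if not record.metadata.get(key)]


def ready_for_snapshot(record: RunRecord) -> bool:
    """A run qualifies only when every required metadata group is present."""
    return not missing_metadata(record)
```

Under these assumptions, a run that carries runtime and judge metadata but no bundle metadata would report `["bundle"]` from `missing_metadata` and would be held back until the bundle metadata is attached.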
Submit
Email evaluation inquiries to info@epob.us.
Public comparable submissions, artifact-handoff questions, and evaluator-access requests can now be started by email. Regular market-framework evaluations continue under the EPOB protocol.