EPOB: End-to-End Project Orchestration Benchmark

Official Leaderboards

EPOB results, evidence first.

The paper package is the baseline. EPOB publishes periodic EPOB-standard runs for new agent frameworks, with comparable scores, evidence packets, and published data snapshots.

Rank | Framework | Score | Deliverable | Evidence | Failure label | Model | Artifact

Matrix View

Framework by family evidence map

Result Viewer

Selected run evidence

Select a leaderboard row to inspect score components, package provenance, and evidence scope.

Dynamic Score Profile

Selected row breakdown

Evidence

Frozen artifacts remain the source of truth.

Paper-package rows under docs/paper/ define the baseline. New market-framework evaluations should preserve raw runtime, judge, and bundle metadata before entering public snapshots.
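As a rough illustration of the preservation rule above, the following sketch checks that a run record still carries its raw runtime, judge, and bundle metadata before it is admitted to a public snapshot. The key names and both helper functions are hypothetical and chosen only for this example; the EPOB protocol may structure its records differently.

```python
from typing import Any, List, Mapping

# Hypothetical section names; the actual EPOB record layout may differ.
REQUIRED_METADATA = ("runtime", "judge", "bundle")


def missing_metadata(run: Mapping[str, Any]) -> List[str]:
    """Return the required metadata sections that are absent or empty."""
    return [key for key in REQUIRED_METADATA if not run.get(key)]


def ready_for_snapshot(run: Mapping[str, Any]) -> bool:
    """A run qualifies only when every raw metadata section is preserved."""
    return not missing_metadata(run)


# Example: a run that lost its bundle metadata is held back.
incomplete_run = {
    "runtime": {"log": "runtime.log"},
    "judge": {"verdicts": "judge_raw.json"},
}
print(missing_metadata(incomplete_run))    # ['bundle']
print(ready_for_snapshot(incomplete_run))  # False
```

A check of this kind would run before a row enters a snapshot, so that a record missing any raw section is rejected rather than published with partial evidence.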

Submit

Email evaluation inquiries to info@epob.us.

Submissions for public, comparable evaluation, artifact handoff questions, and evaluator access requests can now start by email. Regular market-framework evaluations continue under the EPOB protocol.