Commit 111258a

Added eval metric explanation
1 parent 7633325 commit 111258a

1 file changed (+41 / -1 lines)

src/pages/LeaderboardPage.tsx

Lines changed: 41 additions & 1 deletion
@@ -166,7 +166,7 @@ const LeaderboardPage: React.FC = () => {
 className={`metric-filter-btn ${activeMetric === 'optimalAcc' ? 'active' : ''}`}
 onClick={() => setActiveMetric('optimalAcc')}
 >
-Opt. Acc
+Opt. Acc.
 </button>
 <button
 className={`metric-filter-btn ${activeMetric === 'latency' ? 'active' : ''}`}
@@ -398,6 +398,46 @@ const LeaderboardPage: React.FC = () => {
 </div>
 </div>
 
+<div className="metric-card">
+<div className="metric-summary">
+<h3>Accuracy Score</h3>
+<p>The average correctness across all of our dataset's queries.</p>
+</div>
+
+<div className="metric-details">
+<h4>Definition</h4>
+<p>
+We calculate accuracy as the average correctness of the answers generated by the router's selected models across all of our dataset's queries.
+</p>
+
+
+<p>
+<strong>Range:</strong> [0, 100]
+</p>
+</div>
+</div>
+
+<div className="metric-card">
+<div className="metric-summary">
+<h3>Cost/1k Queries</h3>
+<p>Measures the cost incurred by a router’s routing decisions per 1000 queries.</p>
+</div>
+
+<div className="metric-details">
+<h4>Definition</h4>
+<p>
+This is the average token cost incurred by the router's selected models for 1000 queries from our dataset.
+<br />
+We obtain the per-token cost for the specific models a router
+chooses using the official API pricing published by their providers. For less common models that are not served by commercial providers, we deploy them ourselves for our experiments.
+In such cases, we approximate their costs using the pricing tiers published by commercial hosting
+platforms.
+</p>
+
+</div>
+</div>
+
+
 {/* 2️⃣ Optimal Selection Score */}
 <div className="metric-card">
 <div className="metric-summary">
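
The two metric definitions added in this commit reduce to simple averages. The TypeScript sketch below shows one way to compute them; it is an illustration, not the leaderboard's actual scoring code. It assumes hypothetical `RoutedQuery` and `ModelPricing` shapes, a per-query correctness value in [0, 1] (scaled by 100 to match the stated [0, 100] range), and prices quoted in USD per 1M input and output tokens. For self-hosted models, the pricing table would hold the rates approximated from commercial hosting platforms.

```typescript
// Hypothetical record of one routed query; field names are illustrative only.
interface RoutedQuery {
  model: string;        // model the router selected for this query
  correctness: number;  // assumed per-query correctness in [0, 1]
  inputTokens: number;
  outputTokens: number;
}

// Assumed pricing unit: USD per 1M input / output tokens.
interface ModelPricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

// Accuracy Score: mean per-query correctness, scaled to [0, 100].
function accuracyScore(queries: RoutedQuery[]): number {
  if (queries.length === 0) throw new Error("no queries to score");
  const total = queries.reduce((sum, q) => sum + q.correctness, 0);
  return (total / queries.length) * 100;
}

// Token cost of a single query under the selected model's pricing.
function queryCost(q: RoutedQuery, pricing: Record<string, ModelPricing>): number {
  const p = pricing[q.model];
  if (!p) throw new Error(`no pricing for model ${q.model}`);
  return (q.inputTokens * p.inputPerMillion + q.outputTokens * p.outputPerMillion) / 1_000_000;
}

// Cost/1k Queries: mean per-query cost, scaled to 1000 queries.
function costPer1kQueries(queries: RoutedQuery[], pricing: Record<string, ModelPricing>): number {
  if (queries.length === 0) throw new Error("no queries to price");
  const total = queries.reduce((sum, q) => sum + queryCost(q, pricing), 0);
  return (total / queries.length) * 1000;
}

// Usage with made-up models and prices.
const pricing: Record<string, ModelPricing> = {
  "model-a": { inputPerMillion: 1, outputPerMillion: 2 },
  "model-b": { inputPerMillion: 2, outputPerMillion: 4 },
};
const runs: RoutedQuery[] = [
  { model: "model-a", correctness: 1, inputTokens: 1000, outputTokens: 500 },
  { model: "model-a", correctness: 1, inputTokens: 1000, outputTokens: 500 },
  { model: "model-b", correctness: 0, inputTokens: 1000, outputTokens: 500 },
  { model: "model-b", correctness: 1, inputTokens: 1000, outputTokens: 500 },
];
console.log(accuracyScore(runs));                        // 75
console.log(costPer1kQueries(runs, pricing).toFixed(2)); // "3.00"
```

Taking the mean per-query cost first and then multiplying by 1000 makes the figure independent of how many queries the dataset actually contains.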
