feat: Add multi-language translation leaderboard system #65
Closed
- Add core leaderboard components (Leaderboard.js, SubmitToLeaderboard.js)
- Add leaderboard API endpoints (get_leaderboard)
- Add evaluation functions (BLEU, BERTScore)
- Add test files and documentation in docs/leaderboard/
- Add minimal required dependencies
- No changes to existing DevOps, agents, or infrastructure

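The BLEU evaluation mentioned above can be illustrated with a minimal, dependency-free sketch of sentence-level BLEU: clipped n-gram precision for n = 1..4, uniform weights, a brevity penalty, and no smoothing. This is not the PR's code (a later commit indicates the backend uses NLTK for BLEU); `sentence_bleu_sketch` is a hypothetical name, and whitespace tokenization is an assumption that would not suit Japanese, Chinese, or Korean text without a proper tokenizer.

```python
import math
from collections import Counter


def _ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu_sketch(reference, hypothesis, max_n=4):
    """Hypothetical helper: BLEU for one hypothesis against one reference.

    Uses clipped n-gram precision for n = 1..max_n with uniform weights and
    the standard brevity penalty. No smoothing: any zero precision yields 0.0.
    Tokenization is plain whitespace splitting (an assumption).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not hyp:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = _ngrams(hyp, n)
        ref_counts = _ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or matches == 0:
            return 0.0
        log_precision_sum += math.log(matches / total)

    # Brevity penalty: penalize hypotheses shorter than the reference.
    if len(hyp) > len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(hyp))
    # Geometric mean of the precisions, scaled by the brevity penalty.
    return bp * math.exp(log_precision_sum / max_n)
```

For example, `sentence_bleu_sketch("the cat sat on the mat", "the cat sat on the mat")` returns 1.0, while a hypothesis sharing no 4-gram with the reference scores 0.0 because no smoothing is applied.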
095de0d to 00bdef4

- Remove hardcoded dataset sorting, use dynamic Object.values approach
- Fix backend port mapping to 5001 to avoid macOS AirPlay conflict
- Update frontend API URLs to use environment variables
- Add missing /public/submit_model endpoint to backend
- Create missing test files for all language/metric combinations
- Update documentation to reflect port change from 3001 to 3000
- Fix CORS configuration for proper frontend-backend communication

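The port and environment-variable changes above can be sketched on the backend side as a single settings resolver. The variable names `BACKEND_PORT`, `API_BASE_URL`, and `CORS_ORIGINS` and the function `resolve_settings` are hypothetical; only the port numbers 5001 and 3000 come from the commit message.

```python
import os

# Defaults mirroring the ports named in the commit: backend on 5001
# (avoiding the macOS AirPlay conflict), frontend dev server on 3000.
DEFAULT_BACKEND_PORT = 5001
DEFAULT_FRONTEND_ORIGIN = "http://localhost:3000"


def resolve_settings(env=None):
    """Read port, API base URL, and CORS origins from environment variables.

    BACKEND_PORT, API_BASE_URL and CORS_ORIGINS are illustrative names,
    not confirmed variables from the PR.
    """
    env = os.environ if env is None else env
    port = int(env.get("BACKEND_PORT", DEFAULT_BACKEND_PORT))
    api_base_url = env.get("API_BASE_URL", f"http://localhost:{port}")
    raw_origins = env.get("CORS_ORIGINS", DEFAULT_FRONTEND_ORIGIN)
    # Comma-separated list; strip whitespace and drop empty entries.
    cors_origins = [o.strip() for o in raw_origins.split(",") if o.strip()]
    return {"port": port, "api_base_url": api_base_url, "cors_origins": cors_origins}
```

Keeping the frontend origin in the CORS allow-list configurable means a port change like 3001 to 3000 needs an environment update rather than a code change.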
- Remove leaderboard route from RouteConstants.js and Dashboard.js
- Delete Leaderboard.js component completely
- Move FAQs section to Evaluations.js
- Remove 'View all models' button; all models are now shown directly
- Simplify to single /evaluations page with all functionality

- Show only top 5 models by default to keep UI clean
- Add expand/collapse functionality to view all models in-place
- Remove messy full model list display
- Maintain original clean UI design

- Add back /leaderboard route and simple Leaderboard.js component
- Evaluations page shows only top 5 models (clean UI)
- 'View all X models →' button navigates to dedicated leaderboard page
- Leaderboard page shows full list for selected dataset only
- FAQs remain on evaluations page
- Maintains clean UI separation

- Add benchmark_datasets table with translation datasets
- Add model_submissions table for storing model results
- Add evaluation_results table for storing scores
- Insert initial dataset entries for Spanish, Arabic, Japanese, Chinese, Korean (BLEU + BERTScore)
- Add proper indexes for performance

Fixes: Table 'agents.benchmark_datasets' doesn't exist error

- Add evaluation_details column to evaluation_results table
- Fix evaluation_metric values to be lowercase (bleu, bertscore)
- Verify model submission and leaderboard API working correctly
- Database tables now properly created and functional

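The tables described in the two commits above can be sketched with Python's built-in `sqlite3` as a stand-in for the project's actual database (the error message references an `agents` schema, which suggests MySQL or a similar server). Column names other than `evaluation_metric` and `evaluation_details`, the index names, the dataset naming scheme, and placing `evaluation_metric` on `benchmark_datasets` are all assumptions; the CHECK constraint shows one way to enforce the lowercase metric values.

```python
import itertools
import sqlite3

LANGUAGES = ["spanish", "arabic", "japanese", "chinese", "korean"]
METRICS = ["bleu", "bertscore"]  # stored lowercase, per the commit above

SCHEMA = """
CREATE TABLE benchmark_datasets (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    language TEXT NOT NULL,
    evaluation_metric TEXT NOT NULL
        CHECK (evaluation_metric IN ('bleu', 'bertscore'))
);
CREATE TABLE model_submissions (
    id INTEGER PRIMARY KEY,
    model_name TEXT NOT NULL,
    submitted_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE evaluation_results (
    id INTEGER PRIMARY KEY,
    submission_id INTEGER NOT NULL REFERENCES model_submissions(id),
    dataset_id INTEGER NOT NULL REFERENCES benchmark_datasets(id),
    score REAL NOT NULL,
    evaluation_details TEXT  -- free-form details, e.g. a JSON string
);
-- Hypothetical indexes for "top N per dataset" and per-submission lookups.
CREATE INDEX idx_results_dataset_score ON evaluation_results(dataset_id, score);
CREATE INDEX idx_results_submission ON evaluation_results(submission_id);
"""


def create_schema(conn):
    """Create the three tables and seed one dataset per language/metric pair."""
    conn.executescript(SCHEMA)
    rows = [
        (f"{lang}_{metric}", lang, metric)
        for lang, metric in itertools.product(LANGUAGES, METRICS)
    ]
    conn.executemany(
        "INSERT INTO benchmark_datasets (name, language, evaluation_metric) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
```

Running `create_schema(sqlite3.connect(":memory:"))` seeds the ten language/metric datasets, and inserting an uppercase metric such as `'BLEU'` is rejected with `sqlite3.IntegrityError`.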
- Add NLTK punkt download to Dockerfile for BLEU calculation
- Add curl to Dockerfile for healthcheck functionality
- Make OpenAI/Stripe API key initialization more robust with fallbacks
- Add comprehensive .env example file (env-example.txt)
- Update documentation with troubleshooting section
- Add specific guidance for new project setup and database recreation
- Prevent startup failures when API keys are missing

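The fallbacks for missing OpenAI/Stripe keys can be sketched as a helper that logs a warning and returns `None` instead of raising at startup. The helper name `optional_api_key` is hypothetical, as is any Stripe variable name passed to it; `OPENAI_API_KEY` is the variable the official OpenAI SDK reads by default.

```python
import logging
import os

logger = logging.getLogger(__name__)


def optional_api_key(name, env=None):
    """Return the API key stored in env var `name`, or None if unset/blank.

    Hypothetical helper: logs a warning instead of raising, so the service
    can still start with the features that depend on this key disabled.
    """
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        logger.warning("%s is not set; dependent features are disabled", name)
        return None
    return value
```

Callers then check for `None` before constructing a client, e.g. `openai_key = optional_api_key("OPENAI_API_KEY")`, so a fresh checkout without a populated .env still starts.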
- Update backend API to include empty datasets with is_empty flag
- Update frontend to handle and display empty datasets with 'No submissions yet' message
- Ensure all 10 leaderboards (Spanish, Arabic, Japanese, Chinese, Korean × BLEU/BERTScore) are always visible

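The leaderboard payload implied by these commits (every dataset always present, empty ones flagged with `is_empty`, the top five models shown with a count for the "View all X models" link) can be sketched as follows. `build_leaderboard` and its input and output field names are hypothetical, not the PR's actual `get_leaderboard` response, and the sketch assumes higher scores are better, which holds for both BLEU and BERTScore.

```python
from collections import defaultdict

TOP_N = 5  # the evaluations page shows the top 5 models per dataset


def build_leaderboard(datasets, results, top_n=TOP_N):
    """Group results per dataset, keeping datasets that have no submissions.

    datasets: list of dicts with "id" and "name".
    results:  list of dicts with "dataset_id", "model_name", "score".
    Field names are hypothetical; they are not taken from the PR's API.
    """
    by_dataset = defaultdict(list)
    for r in results:
        by_dataset[r["dataset_id"]].append(r)

    board = []
    for ds in datasets:
        # Highest score first; datasets without results get an empty list.
        entries = sorted(by_dataset.get(ds["id"], []), key=lambda r: r["score"], reverse=True)
        board.append({
            "dataset": ds["name"],
            "is_empty": not entries,  # frontend renders "No submissions yet"
            "total": len(entries),    # drives the "View all X models" link
            "top": [
                {"rank": i + 1, "model_name": r["model_name"], "score": r["score"]}
                for i, r in enumerate(entries[:top_n])
            ],
        })
    return board
```

Iterating over `datasets` rather than over `results` is what keeps empty leaderboards in the response; the dedicated leaderboard page could call the same function with `top_n=None` to get the full list, since `entries[:None]` returns every entry.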
- Remove placeholder Google Scripts URL that was never implemented
- Clean up non-functional code that was creating unnecessary network requests
- Keep only the working backend API submission functionality

Collaborator: This code isn't relevant to this GitHub repo.