A platform to evaluate Large Language Models (LLMs) for specific tasks.
- User queries generate responses from multiple hosted LLMs (see the current model list below)
- Users can view past experiments in a side panel and click one to pull up its analytics
- Cumulative statistics for each LLM are available on the "LLM Statistics" page
- The LLM Statistics page also includes insights into each LLM's performance, generated with Meta's llama-3.1-8b-instant model
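The insight generation might look roughly like the sketch below, assuming the model is reached through Groq's Python SDK (the model IDs listed later match Groq's catalog); the prompt wording and the shape of the `stats` dict are illustrative assumptions, not the actual implementation.

```python
# Sketch: generate a performance insight for one LLM from its cumulative
# statistics. Prompt text and stats shape are assumptions.
from groq import Groq  # assumes GROQ_API_KEY is set in the environment

client = Groq()

def generate_insight(model_name: str, stats: dict) -> str:
    # stats could look like {"avg_response_ms": 412, "avg_bleu_pct": 18.3}
    prompt = (
        f"Summarize the strengths and weaknesses of {model_name} given "
        f"these cumulative evaluation statistics: {stats}"
    )
    completion = client.chat.completions.create(
        model="llama-3.1-8b-instant",  # the insight model named above
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content
```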
- MongoDB used for storage (see the persistence sketch below)
- React / Tailwind CSS frontend
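As a sketch of the storage layer, an experiment document could be persisted with pymongo as below; the database/collection names and document shape are assumptions, not the project's actual schema.

```python
# Sketch: persist one experiment (a prompt plus per-model results) in
# MongoDB. Names and document shape are illustrative assumptions.
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
experiments = client["evallm"]["experiments"]  # assumed db/collection names

def save_experiment(user_id: str, prompt: str, results: list[dict]) -> None:
    # `results` would hold each model's response and its metric scores.
    experiments.insert_one({
        "user_id": user_id,
        "prompt": prompt,
        "results": results,
        "created_at": datetime.now(timezone.utc),
    })
```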
- Current list of models used for responses (queried as sketched below):
  - llama-3.3-70b-versatile
  - llama-3.1-8b-instant
  - openai/gpt-oss-20b
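These model IDs match Groq's hosted catalog, so a plausible sketch of fanning a user query out to each model (and capturing the response time reported in the metrics below) is:

```python
# Sketch: send one prompt to every configured model and time each call.
# Assumes the models are served through Groq's OpenAI-compatible API.
import time
from groq import Groq

MODELS = [
    "llama-3.3-70b-versatile",
    "llama-3.1-8b-instant",
    "openai/gpt-oss-20b",
]
client = Groq()

def query_models(prompt: str) -> list[dict]:
    results = []
    for model in MODELS:
        start = time.perf_counter()
        completion = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        elapsed_ms = (time.perf_counter() - start) * 1000  # response time (ms)
        results.append({
            "model": model,
            "response": completion.choices[0].message.content,
            "response_time_ms": round(elapsed_ms),
        })
    return results
```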
- Google auth
- LLM responses evaluated on the following metrics (see the scoring sketch after this list):
  - Response time (ms)
  - Cosine similarity (%)
  - BLEU score (%)
  - ROUGE score (%)
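A minimal sketch of computing the text-similarity metrics for a response against a reference answer, using nltk, rouge-score, and scikit-learn; the library choices and the TF-IDF approach to cosine similarity are assumptions, and the project may compute these differently.

```python
# Sketch: score one response against a reference, reported as percentages.
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu
from rouge_score import rouge_scorer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def score_response(response: str, reference: str) -> dict:
    # Cosine similarity over TF-IDF vectors of the two texts.
    vecs = TfidfVectorizer().fit_transform([response, reference])
    cos_pct = cosine_similarity(vecs[0:1], vecs[1:2])[0][0] * 100

    # BLEU with smoothing so short responses don't collapse to zero.
    bleu_pct = sentence_bleu(
        [reference.split()],
        response.split(),
        smoothing_function=SmoothingFunction().method1,
    ) * 100

    # ROUGE-L F1 as the reported ROUGE score.
    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
    rouge_pct = scorer.score(reference, response)["rougeL"].fmeasure * 100

    return {
        "cosine_pct": round(cos_pct, 2),
        "bleu_pct": round(bleu_pct, 2),
        "rouge_pct": round(rouge_pct, 2),
    }
```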
- Future plans:
  - Experiment with letting users implement their own system prompts
  - Statistical visualizations, such as line graphs, for the cumulative analytics
  - More analysis metrics
  - An adjusted method for computing the BLEU score so that its outputs are more distinct
Note: Evallm v1, with the old UI, is no longer public.