Scaled evals #159

EMZEDI · 2025-04-30T20:03:19Z

Just a few new scripts to run evals with vLLM.

Copilot

Pull Request Overview

This PR scales evaluations by updating dependencies and modifying the deepspeed accelerate configuration for vllm-based eval scripts.

Added additional dependencies (setuptools, ipykernel, matplotlib) in pyproject.toml.
Increased num_processes and added main_process_port in the deepspeed configuration file.
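
As a rough illustration of what the two config fields control, the sketch below builds the equivalent `accelerate launch` command line as a dry run. The process count, port number, and `eval_script.py` are assumptions for illustration only; the summary above does not state the actual values. A distinct `main_process_port` is typically set so that concurrent jobs on one node do not collide on the default rendezvous port.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed values for illustration; the PR summary does not give the real numbers.
NUM_PROCESSES=4
MAIN_PROCESS_PORT=29501
CONFIG=benchmarks/ppo/accelerate_configs/deepspeed_zero2.yaml

# Build the launch command as a string instead of executing it (dry run).
# Any arguments passed in are appended as the script and its options.
build_launch_cmd() {
  local cmd=(accelerate launch
    --config_file "$CONFIG"
    --num_processes "$NUM_PROCESSES"
    --main_process_port "$MAIN_PROCESS_PORT"
    "$@")
  echo "${cmd[*]}"
}

# eval_script.py is a hypothetical placeholder name.
build_launch_cmd eval_script.py
```

Command-line flags passed to `accelerate launch` take precedence over the same keys in the YAML file, so the values can be changed per job without editing the config.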

Reviewed Changes

Copilot reviewed 3 out of 5 changed files in this pull request and generated no comments.

File	Description
pyproject.toml	Added dependencies to support additional functionality.
benchmarks/ppo/accelerate_configs/deepspeed_zero2.yaml	Updated parallel process count and added a process port setting.

Files not reviewed (2)

jobs/validate_all_static.sh: Language not supported
jobs/validate_all_static_diversity.sh: Language not supported

Jacob-Chmura · 2025-04-30T23:02:19Z

jobs/validate_all_static.sh

+
+# list all sub‐tasks
+tasks=(
+  ultra-hh-sampled


Why do we no longer need the full list of datasets that was here before? The loop is much better; I'm just trying to understand the reason for the change.
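
For context, the array-plus-loop pattern under discussion can be sketched as follows. Only `ultra-hh-sampled` appears in the diff excerpt; the second task name and the `run_eval` helper are hypothetical stand-ins, since the actual per-task command is not shown here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# List all sub-tasks. Only the first name comes from the diff;
# the second is a made-up placeholder.
tasks=(
  ultra-hh-sampled
  example-task-b
)

# Hypothetical stand-in for the real per-task eval command.
run_eval() {
  echo "evaluating task: $1"
}

# Quoting "${tasks[@]}" keeps each task name as a single word.
for task in "${tasks[@]}"; do
  run_eval "$task"
done
```

With this layout, adding or dropping a dataset is a one-line change to the array rather than a copy of the whole command.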

Jacob-Chmura · 2025-04-30T23:03:00Z

benchmarks/ppo/accelerate_configs/deepspeed_zero2.yaml

Should these changes be propagated to the other benchmark deepspeed configs?

Jacob-Chmura · 2025-04-30T23:04:07Z

pyproject.toml

Are the dependency changes temporary?

EMZEDI added 4 commits April 27, 2025 11:42

full validation scripts add

fbbb4bc

full validation scripts add

3f068d7

configs for validation

b6817e3

validation scripts add

173b3a9

EMZEDI requested review from Jacob-Chmura and Copilot April 30, 2025 20:03

Copilot AI reviewed Apr 30, 2025

Jacob-Chmura requested changes Apr 30, 2025

EMZEDI added 4 commits May 2, 2025 22:56

embedding diversity calculation with mistral

a742880

unkillable diversity for CC

8d4bf9c

addition of notebooks

d398e86

Merge branch 'scaled-evals' of github.com:ComplexData-MILA/AIF-Gen in…

495d464

…to scaled-evals

3 participants
