forked from FedML-AI/FedML
Marcos/benchmarks #1
Draft
mdvillagra
wants to merge 171 commits into master from marcos/benchmarks
Conversation
…odelLLMAggregator
…nclude timeout for launch_fedllm_custom.py execution
…FullModelLLMTrainer and FullModelLLMAggregator
…gging in FedMLServerManager
…LLMAggregator by removing commented-out code and ensuring consistent execution of checkpointing after aggregation.
…ess and format compliance scoring. Introduce DataFormatting and Evaluation classes for data handling and numerical extraction. Update FullModelLLMTrainer to utilize the new reward function.
…nd adjust communication rounds in grpo_gsm8k_test_config.yaml from 1 to 2 for testing.
…tion class to streamline input parameters and improve clarity. Update FullModelLLMTrainer to utilize the revised combined_reward function.
…amline reward calculation, focusing solely on correctness scoring in the combined_reward method.
…mproved model output capacity. Update grpo_gsm8k_test_config.yaml to change client setup from 1 to 2 clients for enhanced testing scenarios.
…LLMTrainer to enhance experiment tracking.
…rd_fn instead of combined_reward, enhancing clarity and consistency in reward calculation.
… clarity and maintainability. Update reward_fn to utilize class attributes for reward values, improving consistency in reward calculations.
…_config.yaml for improved testing. Adjust timeout duration in run_fedml_client_custom.sh and run_fedml_server_custom.sh scripts to accommodate longer execution times.
…r faster testing iterations.
…ng and improve logging of missing/unexpected keys. Adjust grpo_gsm8k_test_config.yaml for testing parameters, reducing communication rounds and training steps for quicker iterations.
…s base model state. Enhance logging to provide clearer output of missing and unexpected keys during model loading for improved debugging.
…gFace's save_pretrained method for improved compatibility. Implement fallback mechanism for models lacking this method, ensuring robust checkpointing during training.
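The commit above describes checkpointing through HuggingFace's `save_pretrained` with a fallback for models that lack it. A minimal sketch of that pattern (the function name `save_checkpoint` and the fallback filename are hypothetical; only the save-with-fallback idea comes from the commit message):

```python
import os

def save_checkpoint(model, out_dir):
    """Checkpoint via a HuggingFace-style save_pretrained when the model
    provides it; otherwise fall back to a raw state_dict dump."""
    os.makedirs(out_dir, exist_ok=True)
    if hasattr(model, "save_pretrained"):
        # HF-compatible directory layout (config + weights)
        model.save_pretrained(out_dir)
    else:
        import torch  # only needed on the fallback path
        torch.save(model.state_dict(),
                   os.path.join(out_dir, "pytorch_model.bin"))
```

The `hasattr` probe keeps the trainer agnostic to whether the wrapped model is a HuggingFace `PreTrainedModel` or a plain `torch.nn.Module`.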
…30 for extended testing iterations.
…r improved troubleshooting during answer validation.
…ingle client setup for testing, adjusting client_num_in_total and client_num_per_round to 1.
… print statements to display completions and answers for better troubleshooting during answer validation.
… FullModelLLMTrainer. Refactor reward function to improve answer validation by incorporating numeric equivalence checks for better accuracy.
… for clarity in numeric equivalence checks, enhancing accuracy in answer validation.
… training parameters for improved testing. Increase client_num_in_total and client_num_per_round to 2, extend comm_round to 300, and raise grpo_max_steps to 150 for more comprehensive evaluation.
…from 256 to 512 for enhanced response handling.
…ForCausalLM for improved flexibility and evaluation. Removed deprecated torch_dtype handling and ensured dropout is disabled for the reference model.
…iner to avoid CPU↔GPU mismatch. Retain commented line for CPU off-loading during debugging.
… improved performance and compatibility.
…Q-Int8 for enhanced performance and compatibility.
…om 512 to 256 for optimized response handling.
…from 256 to 512 for enhanced response handling.
…iner.py and ensure reference model is moved to CPU for consistency in TimedGRPOTrainer.
…ine imports and improve code clarity.
… improved performance and compatibility.
…to model's device in _get_per_token_logps_and_entropies method to prevent CPU↔GPU mismatch errors.
…le tensor inputs in _get_per_token_logps_and_entropies method, ensuring compatibility with both tensor and mapping types.
…ssary tensor conversion logic, simplifying the process of moving inputs to the model's device.
…ties and entropies are moved to the appropriate device after computation in _get_per_token_logps_and_entropies method, improving device compatibility.
…or tensor types before moving log probabilities and entropies to the appropriate device, enhancing robustness and preventing potential errors.
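The series of commits above converges on a helper that moves tensor inputs (and mappings of tensors) onto the model's device while passing other values through, to prevent the CPU↔GPU mismatch errors mentioned earlier. A duck-typed sketch of that pattern (the name `to_model_device` is hypothetical):

```python
def to_model_device(inputs, device):
    """Move a tensor, or a mapping of tensors, onto the given device.
    Values without a .to(...) method (e.g. ints, strings) pass through
    unchanged, matching the type-check the commits describe."""
    if hasattr(inputs, "to"):
        return inputs.to(device)
    if isinstance(inputs, dict):
        return {k: (v.to(device) if hasattr(v, "to") else v)
                for k, v in inputs.items()}
    return inputs
```

Checking for a `.to` method rather than `torch.is_tensor` keeps the helper robust when batches mix tensors with plain Python metadata.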
…strategy to move policy outputs to CPU when the reference model is on CPU, ensuring efficient memory usage and preventing GPU memory spikes during rollouts.
…og probabilities and entropies to float16 and ensuring they are moved to the appropriate device, enhancing performance and memory efficiency during training.
… 256 for improved performance and resource management during training.
…mentation and clarity, and update max completion length and new tokens in FullModelLLMTrainer to 512 for enhanced training performance.
…additional parameters for enhanced performance and compatibility, and clean up commented code for better readability.
…ning performance and stability.
…h size in GRPO test config to 2 for improved testing efficiency.
…zed testing configuration.
…timized testing efficiency.
…or enhanced testing scalability.
…"Qwen/Qwen3-0.6" for improved compatibility and performance.
…Qwen/Qwen3-0.6B" for enhanced performance and compatibility.
…cstrings, and update GRPO test configuration by increasing max steps from 20 to 50 and batch size from 1 to 2 for enhanced testing efficiency.
…2 in FullModelLLMAggregator for improved checkpoint management.
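If the commit above caps the number of retained checkpoints at 2 in FullModelLLMAggregator, a pruning helper could look like this (everything here, including the name `prune_checkpoints` and the lexicographic ordering assumption, is a hypothetical illustration):

```python
import os
import shutil

def prune_checkpoints(ckpt_root: str, keep_last: int = 2) -> None:
    """Delete all but the newest `keep_last` checkpoint directories,
    assuming directory names sort oldest-to-newest."""
    ckpts = sorted(d for d in os.listdir(ckpt_root)
                   if os.path.isdir(os.path.join(ckpt_root, d)))
    for stale in ckpts[:-keep_last]:
        shutil.rmtree(os.path.join(ckpt_root, stale))
```

Bounding retained checkpoints keeps disk usage flat across the 300-round runs configured elsewhere in this PR.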
No description provided.