@mdvillagra

No description provided.

…nclude timeout for launch_fedllm_custom.py execution
…FullModelLLMTrainer and FullModelLLMAggregator
…LLMAggregator by removing commented-out code and ensuring consistent execution of checkpointing after aggregation.
…ess and format compliance scoring. Introduce DataFormatting and Evaluation classes for data handling and numerical extraction. Update FullModelLLMTrainer to utilize the new reward function.
…nd adjust communication rounds in grpo_gsm8k_test_config.yaml from 1 to 2 for testing.
…tion class to streamline input parameters and improve clarity. Update FullModelLLMTrainer to utilize the revised combined_reward function.
…amline reward calculation, focusing solely on correctness scoring in the combined_reward method.
…mproved model output capacity. Update grpo_gsm8k_test_config.yaml to change client setup from 1 to 2 clients for enhanced testing scenarios.
…rd_fn instead of combined_reward, enhancing clarity and consistency in reward calculation.
… clarity and maintainability. Update reward_fn to utilize class attributes for reward values, improving consistency in reward calculations.
…_config.yaml for improved testing. Adjust timeout duration in run_fedml_client_custom.sh and run_fedml_server_custom.sh scripts to accommodate longer execution times.
…ng and improve logging of missing/unexpected keys. Adjust grpo_gsm8k_test_config.yaml for testing parameters, reducing communication rounds and training steps for quicker iterations.
…s base model state. Enhance logging to provide clearer output of missing and unexpected keys during model loading for improved debugging.
…gFace's save_pretrained method for improved compatibility. Implement fallback mechanism for models lacking this method, ensuring robust checkpointing during training.
…r improved troubleshooting during answer validation.
…ingle client setup for testing, adjusting client_num_in_total and client_num_per_round to 1.
… print statements to display completions and answers for better troubleshooting during answer validation.
… FullModelLLMTrainer. Refactor reward function to improve answer validation by incorporating numeric equivalence checks for better accuracy.
… for clarity in numeric equivalence checks, enhancing accuracy in answer validation.
… training parameters for improved testing. Increase client_num_in_total and client_num_per_round to 2, extend comm_round to 300, and raise grpo_max_steps to 150 for more comprehensive evaluation.
…from 256 to 512 for enhanced response handling.
…ForCausalLM for improved flexibility and evaluation. Removed deprecated torch_dtype handling and ensured dropout is disabled for the reference model.
…iner to avoid CPU↔GPU mismatch. Retain commented line for CPU off-loading during debugging.
…Q-Int8 for enhanced performance and compatibility.
…om 512 to 256 for optimized response handling.
…from 256 to 512 for enhanced response handling.
…iner.py and ensure reference model is moved to CPU for consistency in TimedGRPOTrainer.
…to model's device in _get_per_token_logps_and_entropies method to prevent CPU↔GPU mismatch errors.
…le tensor inputs in _get_per_token_logps_and_entropies method, ensuring compatibility with both tensor and mapping types.
…ssary tensor conversion logic, simplifying the process of moving inputs to the model's device.
…ties and entropies are moved to the appropriate device after computation in _get_per_token_logps_and_entropies method, improving device compatibility.
…or tensor types before moving log probabilities and entropies to the appropriate device, enhancing robustness and preventing potential errors.
…strategy to move policy outputs to CPU when the reference model is on CPU, ensuring efficient memory usage and preventing GPU memory spikes during rollouts.
…og probabilities and entropies to float16 and ensuring they are moved to the appropriate device, enhancing performance and memory efficiency during training.
… 256 for improved performance and resource management during training.
…mentation and clarity, and update max completion length and new tokens in FullModelLLMTrainer to 512 for enhanced training performance.
…additional parameters for enhanced performance and compatibility, and clean up commented code for better readability.
…h size in GRPO test config to 2 for improved testing efficiency.
…"Qwen/Qwen3-0.6" for improved compatibility and performance.
…Qwen/Qwen3-0.6B" for enhanced performance and compatibility.
…cstrings, and update GRPO test configuration by increasing max steps from 20 to 50 and batch size from 1 to 2 for enhanced testing efficiency.
…2 in FullModelLLMAggregator for improved checkpoint management.
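
Several commits above refactor the reward function to validate answers with numeric equivalence checks. The PR's actual code is not shown on this page, so the following is only a minimal sketch of that idea, assuming GSM8K-style references that end in `#### <number>`. The helper names (`extract_last_number`, `correctness_reward`), the tolerance, and the reward values 1.0 and 0.0 are all hypothetical and are not taken from the PR.

```python
import math
import re
from typing import Optional

# Hypothetical pattern: an optional sign, digits with optional thousands
# separators, and an optional decimal part.
_NUMBER_RE = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_last_number(text: str) -> Optional[float]:
    """Return the last number that appears in text, or None if there is none."""
    matches = _NUMBER_RE.findall(text)
    if not matches:
        return None
    # Drop thousands separators before converting, e.g. "1,234" -> 1234.0.
    return float(matches[-1].replace(",", ""))


def correctness_reward(completion: str, reference: str, tol: float = 1e-6) -> float:
    """Score a completion by numeric equivalence with the reference answer.

    Returns 1.0 (assumed reward value) when the last number in the completion
    matches the last number in the reference within tol, otherwise 0.0.
    """
    predicted = extract_last_number(completion)
    expected = extract_last_number(reference)
    if predicted is None or expected is None:
        return 0.0
    # Compare as floats so that "72", "72.0" and "72.00" count as equal.
    if math.isclose(predicted, expected, rel_tol=tol, abs_tol=tol):
        return 1.0
    return 0.0
```

Comparing parsed numbers rather than raw strings means a completion ending in `72.0` or `1,234` can still match a reference of `72` or `1234`. Taking the last number in the completion is a simplification; a stricter version might require a specific answer tag.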