Add minimal multi turn examples based on the new rollout functions (#493) by GuanxingLu · Pull Request #577 · radixark/miles

GuanxingLu · 2026-02-10T16:38:17Z

No description provided.

gemini-code-assist · 2026-02-10T16:38:47Z

Summary of Changes

This pull request significantly enhances the framework's capabilities for tool-enabled language model training by introducing comprehensive examples for multi-turn and agentic generation. It provides a robust and secure environment for tool execution, alongside refined logging mechanisms to capture the nuances of multi-step interactions. These additions aim to streamline the development and evaluation of models that leverage external tools for complex tasks.

Highlights

- New Retool-v2 Examples: Introduced a new retool_v2 example directory containing scripts for multi-turn and agentic tool-enabled RL training, complete with a detailed README.
- Safe Python Sandbox and Tool Registry: Added tool_sandbox.py which provides a PythonSandbox for secure code execution with memory/time limits and module restrictions, and a ToolRegistry for managing and executing tools like a code interpreter.
- Enhanced Rollout Functions: Implemented new rollout functions for multi-turn and agentic tool calling, allowing models to interact with tools over multiple steps.
- Improved Logging and Metadata Tracking: Updated logging utilities to support multi-turn data and pass rate logging, and added round_number and tool_call_count to sample metadata for better tracking of tool interactions.

Changelog

examples/retool_v2/README.md
- Added a new README file detailing the retool_v2 example, including its overview, file structure, usage instructions for setup, model conversion, multi-turn and agentic RL training, tool format, and safety features.
examples/retool_v2/run_agentic.sh
- Added a new shell script to execute agentic RL training, configuring custom generation functions, tool specifications, and execution paths, along with various training and performance arguments.
examples/retool_v2/run_multi_turn.sh
- Added a new shell script to execute multi-turn RL training, setting up custom generation functions, tool specifications, and execution paths, including evaluation arguments and specific GRPO and optimizer configurations.
examples/retool_v2/tool_sandbox.py
- Added a new Python module that implements a PythonSandbox for safe code execution, a ToolRegistry for managing and executing tools, and a reward_func that incorporates tool usage into the reward calculation.
miles/backends/training_utils/log_utils.py
- Modified the log_rollout_data function to pass the parallel_state argument to log_passrate when args.log_passrate is enabled.
miles/rollout/base_types.py
- Removed a trailing newline character from the end of the file.
miles/rollout/generate_hub/multi_turn.py
- Added round_number to the sample metadata, tracking the current turn in multi-turn generation.
miles/rollout/generate_utils/tool_call_utils.py
- Updated sample metadata to include tool_call_count, incrementing it based on the number of tool messages processed.

gemini-code-assist

Code Review

This pull request introduces valuable examples for multi-turn and agentic tool usage, complete with a Python sandbox for code execution. The overall structure is good, but there are several areas for improvement to enhance robustness, security, and usability. My review focuses on making the shell scripts more portable by removing hardcoded paths, improving the security and correctness of the Python sandbox, and increasing the robustness of shell script logic. I've provided specific suggestions for each of these points in the comments below.

gemini-code-assist · 2026-02-10T16:53:31Z

examples/retool_v2/tool_sandbox.py

+    def _check_code_safety(self, code: str) -> tuple[bool, str]:
+        """Check code safety by scanning for dangerous patterns"""


The _check_code_safety method relies on a blacklist of dangerous patterns. This approach is fundamentally insecure as it can often be bypassed with obfuscation techniques (e.g., using string concatenation like 'ev' + 'al', or character encodings). For any production-like environment, this poses a significant security risk. It would be much safer to use a proper sandboxing library or execute the code in a more isolated environment like a dedicated, heavily restricted container.

gemini-code-assist · 2026-02-10T16:53:31Z

examples/retool_v2/tool_sandbox.py

+# Set memory limit (4GB)
+try:
+    resource.setrlimit(resource.RLIMIT_AS, (4 * 1024 * 1024 * 1024, -1))
+except Exception:
+    pass


The memory limit for the sandboxed process is hardcoded to 4GB, which ignores the self.memory_limit attribute passed to the PythonSandbox constructor. This is a bug that prevents the memory limit from being configurable. You should parse the self.memory_limit string (e.g., '4GB', '512MB') into bytes and use that value to set the resource limit.
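
The parsing step suggested here could look roughly like the sketch below. The helper names `parse_memory_limit` and `apply_memory_limit` are hypothetical and not part of the PR; the sketch assumes binary units (1 KB = 1024 bytes) and that `memory_limit` is a string such as '4GB' or '512MB', as the comment describes.

```python
import re

# Assumed binary multipliers; the PR does not say whether 1 GB means 1024**3 or 10**9.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_memory_limit(limit: str) -> int:
    """Convert a string such as '4GB' or '512MB' into a byte count."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?B)\s*", limit, re.IGNORECASE)
    if match is None:
        raise ValueError(f"Unrecognized memory limit: {limit!r}")
    value, unit = match.groups()
    return int(float(value) * _UNITS[unit.upper()])


def apply_memory_limit(limit: str) -> None:
    """Set the address-space soft limit of the current process (Unix only)."""
    import resource  # imported lazily because the module does not exist on Windows

    limit_bytes = parse_memory_limit(limit)
    # Keep the existing hard limit; only the soft limit is lowered.
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))


print(parse_memory_limit("4GB"))
print(parse_memory_limit("512MB"))
```

In the sandbox, the parsed value would replace the literal `4 * 1024 * 1024 * 1024` in the `resource.setrlimit` call quoted above.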

gemini-code-assist · 2026-02-10T16:53:31Z

examples/retool_v2/README.md

+hf download --repo-type dataset zhuzilin/dapo-math-17k --local-dir /root/dapo-math-17k
+hf download Qwen/Qwen3-4B --local-dir /root/Qwen3-4B
+```
+
+### 2. Convert Model to Megatron-LM Format
+
+```bash
+source scripts/models/qwen3-4B.sh
+PYTHONPATH=/root/Megatron-LM python tools/convert_hf_to_torch_dist.py \
+    ${MODEL_ARGS[@]} \
+    --hf-checkpoint /root/Qwen3-4B \
+    --rotary-base 5000000 \
+    --save /root/Qwen3-4B_torch_dist


The setup instructions contain hardcoded paths like /root/dapo-math-17k and /root/Qwen3-4B. This makes the example less portable and assumes a specific user environment, likely a particular Docker container. To improve usability for a wider audience, consider using environment variables for these paths or adding a note to instruct users to replace them with their own local paths.

gemini-code-assist · 2026-02-10T16:53:31Z

examples/retool_v2/run_agentic.sh

+pkill -9 sglang
+sleep 3
+ray stop --force
+pkill -9 ray
+pkill -9 python
+sleep 3
+pkill -9 ray
+pkill -9 python


The cleanup block at the start of the script is quite aggressive, immediately using pkill -9. It's also repetitive. It's generally better to first attempt a graceful shutdown (e.g., pkill without -9) before forcing termination. The repetition of pkill -9 ray and pkill -9 python suggests that processes might not be terminating correctly on the first attempt, which could be investigated. For cleaner code, you could wrap this logic in a function.

gemini-code-assist · 2026-02-10T16:53:31Z

examples/retool_v2/run_agentic.sh

+   --hf-checkpoint /root/Qwen3-4B
+   --ref-load /root/Qwen3-4B_torch_dist
+   --save /root/Qwen3-4B_miles/retool_v2_agentic/
+   --save-interval 20
+   --rotary-base 1000000
+)
+
+ROLLOUT_ARGS=(
+   --prompt-data /root/dapo-math-17k/dapo-math-17k.jsonl


The script contains hardcoded paths like /root/Qwen3-4B and /root/dapo-math-17k/dapo-math-17k.jsonl. This limits portability and reusability. It's better to use environment variables to define base paths for models and data, with /root as a default value. This would make the script more flexible for users with different environment setups.

Suggested change

-   --hf-checkpoint /root/Qwen3-4B
-   --ref-load /root/Qwen3-4B_torch_dist
-   --save /root/Qwen3-4B_miles/retool_v2_agentic/
-   --save-interval 20
-   --rotary-base 1000000
-)
-
-ROLLOUT_ARGS=(
-   --prompt-data /root/dapo-math-17k/dapo-math-17k.jsonl
+   --hf-checkpoint "${DATA_DIR:-/root}/Qwen3-4B"
+   --ref-load "${DATA_DIR:-/root}/Qwen3-4B_torch_dist"
+   --save "${DATA_DIR:-/root}/Qwen3-4B_miles/retool_v2_agentic/"
+   --save-interval 20
+   --rotary-base 1000000
+)
+
+ROLLOUT_ARGS=(
+   --prompt-data "${DATA_DIR:-/root}/dapo-math-17k/dapo-math-17k.jsonl"

gemini-code-assist · 2026-02-10T16:53:31Z

examples/retool_v2/run_agentic.sh

+   WANDB_ARGS=(
+      --use-wandb
+      --wandb-project miles-dev-retool-v2
+      --wandb-group qwen3-4B-multi-turn


The wandb group is named qwen3-4B-multi-turn, but this script is for agentic training (run_agentic.sh). This appears to be a copy-paste error from run_multi_turn.sh. For better clarity and experiment tracking, you should consider changing it to qwen3-4B-agentic or something similar that reflects the script's purpose.

Suggested change

-      --wandb-group qwen3-4B-multi-turn
+      --wandb-group qwen3-4B-agentic

gemini-code-assist · 2026-02-10T16:53:31Z

examples/retool_v2/run_agentic.sh

+RUNTIME_ENV_JSON="{
+  \"env_vars\": {
+    \"PYTHONPATH\": \"/root/Megatron-LM/:${SCRIPT_DIR}:/root/miles\",
+    \"CUDA_DEVICE_MAX_CONNECTIONS\": \"1\",
+    \"NCCL_NVLS_ENABLE\": \"${HAS_NVLINK}\"
+  }
+}"


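Constructing JSON by embedding shell variables directly into a backslash-escaped string is fragile and hard to read. A here document removes the escaped quotes, which makes the JSON more readable and easier to edit correctly. Note that a here document does not escape the interpolated values: if a variable like ${SCRIPT_DIR} contains characters that need to be escaped in JSON (such as a double quote or a backslash), the resulting string is still invalid.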

Suggested change

-RUNTIME_ENV_JSON="{
-  \"env_vars\": {
-    \"PYTHONPATH\": \"/root/Megatron-LM/:${SCRIPT_DIR}:/root/miles\",
-    \"CUDA_DEVICE_MAX_CONNECTIONS\": \"1\",
-    \"NCCL_NVLS_ENABLE\": \"${HAS_NVLINK}\"
-  }
-}"
+RUNTIME_ENV_JSON=$(cat <<EOF
+{
+  "env_vars": {
+    "PYTHONPATH": "/root/Megatron-LM/:${SCRIPT_DIR}:/root/miles",
+    "CUDA_DEVICE_MAX_CONNECTIONS": "1",
+    "NCCL_NVLS_ENABLE": "${HAS_NVLINK}"
+  }
+}
+EOF
+)
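
If escaping of the interpolated values matters, one option is to let a JSON serializer build the string rather than a shell template. The sketch below is an illustration only: `build_runtime_env` is a hypothetical helper, not something in the PR, and it simply reproduces the three environment variables from the diff above.

```python
import json


def build_runtime_env(script_dir: str, has_nvlink: str) -> str:
    """Serialize the runtime environment; json.dumps escapes quotes and backslashes."""
    runtime_env = {
        "env_vars": {
            "PYTHONPATH": f"/root/Megatron-LM/:{script_dir}:/root/miles",
            "CUDA_DEVICE_MAX_CONNECTIONS": "1",
            "NCCL_NVLS_ENABLE": has_nvlink,
        }
    }
    return json.dumps(runtime_env)


# A directory name containing a double quote still yields valid JSON.
encoded = build_runtime_env('/tmp/my "odd" dir', "1")
print(json.loads(encoded)["env_vars"]["PYTHONPATH"])
```

A shell script could capture the output of such a helper into RUNTIME_ENV_JSON; how the script directory and NVLink flag are passed in is left open here.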

gemini-code-assist · 2026-02-10T16:53:31Z

examples/retool_v2/run_multi_turn.sh

+   --hf-checkpoint /root/Qwen3-4B
+   --ref-load /root/Qwen3-4B_torch_dist
+   --save /root/Qwen3-4B_miles/retool_v2_multi_turn/
+   --save-interval 1000
+)
+
+ROLLOUT_ARGS=(
+   --prompt-data /root/dapo-math-17k/dapo-math-17k.jsonl


The script contains hardcoded paths like /root/Qwen3-4B and /root/dapo-math-17k/dapo-math-17k.jsonl. This limits portability. It's recommended to use environment variables to define base paths for models and data, with /root as a default value, to make the script more flexible.

Suggested change

-   --hf-checkpoint /root/Qwen3-4B
-   --ref-load /root/Qwen3-4B_torch_dist
-   --save /root/Qwen3-4B_miles/retool_v2_multi_turn/
-   --save-interval 1000
-)
-
-ROLLOUT_ARGS=(
-   --prompt-data /root/dapo-math-17k/dapo-math-17k.jsonl
+   --hf-checkpoint "${DATA_DIR:-/root}/Qwen3-4B"
+   --ref-load "${DATA_DIR:-/root}/Qwen3-4B_torch_dist"
+   --save "${DATA_DIR:-/root}/Qwen3-4B_miles/retool_v2_multi_turn/"
+   --save-interval 1000
+)
+
+ROLLOUT_ARGS=(
+   --prompt-data "${DATA_DIR:-/root}/dapo-math-17k/dapo-math-17k.jsonl"

gemini-code-assist · 2026-02-10T16:53:31Z

examples/retool_v2/run_multi_turn.sh

+RUNTIME_ENV_JSON="{
+  \"env_vars\": {
+    \"PYTHONPATH\": \"/root/Megatron-LM/:${SCRIPT_DIR}:/root/miles\",
+    \"CUDA_DEVICE_MAX_CONNECTIONS\": \"1\",
+    \"NCCL_NVLS_ENABLE\": \"${HAS_NVLINK}\"
+  }
+}"


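Constructing JSON by embedding shell variables directly into a backslash-escaped string is fragile and hard to read. A here document removes the escaped quotes, which makes the JSON more readable and easier to edit correctly. Note that a here document does not escape the interpolated values: if a variable like ${SCRIPT_DIR} contains characters that need to be escaped in JSON (such as a double quote or a backslash), the resulting string is still invalid.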

Suggested change

-RUNTIME_ENV_JSON="{
-  \"env_vars\": {
-    \"PYTHONPATH\": \"/root/Megatron-LM/:${SCRIPT_DIR}:/root/miles\",
-    \"CUDA_DEVICE_MAX_CONNECTIONS\": \"1\",
-    \"NCCL_NVLS_ENABLE\": \"${HAS_NVLINK}\"
-  }
-}"
+RUNTIME_ENV_JSON=$(cat <<EOF
+{
+  "env_vars": {
+    "PYTHONPATH": "/root/Megatron-LM/:${SCRIPT_DIR}:/root/miles",
+    "CUDA_DEVICE_MAX_CONNECTIONS": "1",
+    "NCCL_NVLS_ENABLE": "${HAS_NVLINK}"
+  }
+}
+EOF
+)

gemini-code-assist · 2026-02-10T16:53:32Z

examples/retool_v2/tool_sandbox.py

+            r"type\s*\(",
+            r"isinstance\s*\(",
+            r"issubclass\s*\(",
+            r"super\s*\(",
+            r"property\s*\(",
+            r"staticmethod\s*\(",
+            r"classmethod\s*\(",
+            r"__\w+__",  # double underscore methods


The safety check blacklists many fundamental Python features, including type(), isinstance(), property(), staticmethod(), classmethod(), and all dunder methods (__\w+__). This is overly restrictive and will break a lot of valid, safe, object-oriented code, which severely limits the utility of the code interpreter. The blacklist should be more targeted to allow for common and safe language constructs.
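
To make the over-restriction concrete, here is a small sketch that runs a few of the quoted patterns against harmless code. The helper `first_blocked_pattern` is hypothetical; it mirrors the `re.search(pattern, code, re.IGNORECASE)` loop from the sandbox but is not the PR's code.

```python
import re
from typing import Optional

# A subset of the denylist quoted in the diff above.
PATTERNS = [r"type\s*\(", r"isinstance\s*\(", r"__\w+__"]


def first_blocked_pattern(code: str) -> Optional[str]:
    """Return the first denylist pattern that matches, or None if the code passes."""
    for pattern in PATTERNS:
        if re.search(pattern, code, re.IGNORECASE):
            return pattern
    return None


# An ordinary class definition is rejected because of __init__.
CLASS_CODE = """
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
"""

# The patterns have no word boundaries, so 'dtype(' also matches 'type\\s*\\('.
DTYPE_CODE = "import numpy as np\narr = np.zeros(3, dtype=np.dtype('float32'))\n"

print(first_blocked_pattern(CLASS_CODE))
print(first_blocked_pattern(DTYPE_CODE))
```

Both snippets are rejected even though neither does anything dangerous, which is the usability cost the comment describes.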

gemini-code-assist

Code Review

This pull request introduces new examples for multi-turn and agentic tool use, including new run scripts, a README, and a Python sandbox for code execution. However, a critical sandbox escape vulnerability has been identified in the newly added tool_sandbox.py file. The current denylist-based sandboxing mechanism is insecure and can be bypassed, potentially leading to Remote Code Execution (RCE). Beyond this, the Python sandbox implementation is overly restrictive, and there are also areas for improvement in the example scripts and documentation for clarity and correctness.

gemini-code-assist · 2026-02-10T16:55:09Z

examples/retool_v2/tool_sandbox.py

+    def _check_code_safety(self, code: str) -> tuple[bool, str]:
+        """Check code safety by scanning for dangerous patterns"""
+        # Check for dangerous operations
+        dangerous_patterns = [
+            r"import\s+os",
+            r"import\s+sys",
+            r"import\s+subprocess",
+            r"import\s+shutil",
+            r"import\s+glob",
+            r"import\s+pathlib",
+            r"__import__",
+            r"eval\s*\(",
+            r"exec\s*\(",
+            r"open\s*\(",
+            r"file\s*\(",
+            r"input\s*\(",
+            r"raw_input\s*\(",
+            r"compile\s*\(",
+            r"execfile\s*\(",
+            r"getattr\s*\(",
+            r"setattr\s*\(",
+            r"delattr\s*\(",
+            r"hasattr\s*\(",
+            r"globals\s*\(",
+            r"locals\s*\(",
+            r"vars\s*\(",
+            r"dir\s*\(",
+            r"type\s*\(",
+            r"isinstance\s*\(",
+            r"issubclass\s*\(",
+            r"super\s*\(",
+            r"property\s*\(",
+            r"staticmethod\s*\(",
+            r"classmethod\s*\(",
+            r"__\w+__",  # double underscore methods
+        ]
+
+        for pattern in dangerous_patterns:
+            if re.search(pattern, code, re.IGNORECASE):
+                return False, f"Code contains dangerous pattern: {pattern}"
+
+        # Check imported modules
+        import_pattern = r"import\s+(\w+)"
+        from_pattern = r"from\s+(\w+)"
+
+        imports = re.findall(import_pattern, code)
+        froms = re.findall(from_pattern, code)
+
+        all_imports = set(imports + froms)
+        for imp in all_imports:
+            if imp not in self.allowed_modules:
+                return False, f"Import of '{imp}' is not allowed"
+
+        return True, "Code is safe"


The PythonSandbox.execute_code method, specifically the _check_code_safety function, relies on a denylist of regular expressions (e.g., r"__\w+__") to detect dangerous patterns. This denylist approach is fundamentally insecure and prone to bypasses, creating a critical sandbox escape vulnerability that could lead to Remote Code Execution (RCE). An attacker could craft malicious code to bypass these regex checks and achieve arbitrary code execution, leading to system compromise.

The pattern r"__\w+__" is also overly restrictive, blocking valid Python constructs like __init__ or __name__ == "__main__", which severely limits the utility of the interpreter.

Remediation: Replace the denylist-based validation with a more robust sandboxing solution. Consider using technologies like:
- Containers: Run the code in a minimal, isolated Docker container with no network access and limited resources.
- gVisor or Firecracker: Use a container runtime or microVM that provides a secure kernel abstraction layer.
- RestrictedPython: A tool that restricts the available modules and objects within a Python environment.
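
One concrete bypass can be reproduced with the check as quoted in the diff. The sketch below copies the denylist and the import check into a standalone function; the contents of `ALLOWED_MODULES` are an assumption, since the PR's `allowed_modules` set is not shown here. The `imported_modules` helper is not from the PR: it uses the standard `ast` module only to show what a parser sees, and is not a substitute for the isolation options listed above.

```python
import ast
import re

# Denylist copied from the _check_code_safety diff above.
DANGEROUS_PATTERNS = [
    r"import\s+os", r"import\s+sys", r"import\s+subprocess", r"import\s+shutil",
    r"import\s+glob", r"import\s+pathlib", r"__import__", r"eval\s*\(", r"exec\s*\(",
    r"open\s*\(", r"file\s*\(", r"input\s*\(", r"raw_input\s*\(", r"compile\s*\(",
    r"execfile\s*\(", r"getattr\s*\(", r"setattr\s*\(", r"delattr\s*\(", r"hasattr\s*\(",
    r"globals\s*\(", r"locals\s*\(", r"vars\s*\(", r"dir\s*\(", r"type\s*\(",
    r"isinstance\s*\(", r"issubclass\s*\(", r"super\s*\(", r"property\s*\(",
    r"staticmethod\s*\(", r"classmethod\s*\(", r"__\w+__",
]
ALLOWED_MODULES = {"math"}  # assumption: the real allowed_modules set is not shown


def check_code_safety(code: str) -> tuple:
    """Standalone copy of the regex-based check from the diff."""
    for pattern in DANGEROUS_PATTERNS:
        if re.search(pattern, code, re.IGNORECASE):
            return False, f"Code contains dangerous pattern: {pattern}"
    imports = re.findall(r"import\s+(\w+)", code)
    froms = re.findall(r"from\s+(\w+)", code)
    for imp in set(imports + froms):
        if imp not in ALLOWED_MODULES:
            return False, f"Import of '{imp}' is not allowed"
    return True, "Code is safe"


def imported_modules(code: str) -> set:
    """Collect top-level module names from the parsed syntax tree."""
    names = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module.split(".")[0])
    return names


# 'import math, os' hides os behind an allowed module name.
SNEAKY = "import math, os\nprint(os.getcwd())\n"

print(check_code_safety(SNEAKY))
print(sorted(imported_modules(SNEAKY)))
```

The regex check accepts the snippet because `import\s+os` never sees `os` directly after `import`, and `import\s+(\w+)` captures only the first name in the list. Parsing the code reveals both modules, but even an AST-based check is still validation inside the same interpreter rather than isolation.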

gemini-code-assist · 2026-02-10T16:55:09Z

examples/retool_v2/README.md

+cd miles
+pip install -e . --no-deps


The setup instructions have a directory navigation issue. After cd miles, the shell's working directory is changed, which will cause subsequent commands that expect to be run from the repository root to fail. Using a subshell (...) for the installation steps will prevent changing the current directory and ensure the following commands execute correctly from the repository root.

Suggested change

-cd miles
-pip install -e . --no-deps
+(cd miles && pip install -e . --no-deps)

gemini-code-assist · 2026-02-10T16:55:09Z

examples/retool_v2/run_agentic.sh

+pkill -9 sglang
+sleep 3
+ray stop --force
+pkill -9 ray
+pkill -9 python
+sleep 3
+pkill -9 ray
+pkill -9 python


The process cleanup logic at the beginning of the script is redundant: the pkill -9 ray and pkill -9 python commands are repeated. This can be simplified for better readability and maintainability. Because pkill takes a single pattern, interpreted as an extended regular expression, one call with an alternation can cover both process names.

Suggested change

-pkill -9 sglang
-sleep 3
-ray stop --force
-pkill -9 ray
-pkill -9 python
-sleep 3
-pkill -9 ray
-pkill -9 python
+pkill -9 sglang
+ray stop --force
+sleep 3 # Wait for processes to terminate
+pkill -9 'ray|python' # Force kill any remaining processes

gemini-code-assist · 2026-02-10T16:55:09Z

examples/retool_v2/run_agentic.sh

+   WANDB_ARGS=(
+      --use-wandb
+      --wandb-project miles-dev-retool-v2
+      --wandb-group qwen3-4B-multi-turn


The wandb-group appears to be copied from run_multi_turn.sh. For better experiment tracking and to avoid confusion, this should be unique to the agentic run.

Suggested change

-      --wandb-group qwen3-4B-multi-turn
+      --wandb-group qwen3-4B-agentic

gemini-code-assist · 2026-02-10T16:55:09Z

examples/retool_v2/run_multi_turn.sh

+pkill -9 sglang
+sleep 3
+ray stop --force
+pkill -9 ray
+pkill -9 python
+sleep 3
+pkill -9 ray
+pkill -9 python


The process cleanup logic at the beginning of the script is redundant: the pkill -9 ray and pkill -9 python commands are repeated. This can be simplified for better readability and maintainability. Because pkill takes a single pattern, interpreted as an extended regular expression, one call with an alternation can cover both process names.

Suggested change

-pkill -9 sglang
-sleep 3
-ray stop --force
-pkill -9 ray
-pkill -9 python
-sleep 3
-pkill -9 ray
-pkill -9 python
+pkill -9 sglang
+ray stop --force
+sleep 3 # Wait for processes to terminate
+pkill -9 'ray|python' # Force kill any remaining processes

gemini-code-assist

Code Review

This pull request introduces new examples for multi-turn and agentic generation with tool usage, which is a great addition. The changes include new run scripts, a README file, and a Python tool sandbox, along with updates to track multi-turn and tool call metadata. My review focuses on improving the clarity of the documentation, correcting a potential configuration error, and addressing a significant issue in the tool sandbox's security implementation that could block valid code.

gemini-code-assist · 2026-02-10T17:01:21Z

examples/retool_v2/tool_sandbox.py

+            r"property\s*\(",
+            r"staticmethod\s*\(",
+            r"classmethod\s*\(",
+            r"__\w+__",  # double underscore methods


The regex pattern r"__\w+__" for checking code safety is too broad and will block valid Python code that uses dunder methods, such as __init__ in class definitions. This will cause the sandbox to reject legitimate code, limiting its usability. It's better to remove this overly restrictive pattern.

gemini-code-assist · 2026-02-10T17:01:22Z

examples/retool_v2/README.md

+### 1. Setup and download datasets
+
+```bash
+cd miles


The cd miles command is likely to cause confusion or errors. Users typically run setup commands like pip install -e . from the project's root directory. Changing directories may break subsequent commands that rely on relative paths from the root.

gemini-code-assist · 2026-02-10T17:01:22Z

examples/retool_v2/run_agentic.sh

+   WANDB_ARGS=(
+      --use-wandb
+      --wandb-project miles-dev-retool-v2
+      --wandb-group qwen3-4B-multi-turn


The wandb-group is set to qwen3-4B-multi-turn, which appears to be a copy-paste from the run_multi_turn.sh script. Using a group name specific to the agentic run will improve experiment tracking and clarity.

Suggested change

-      --wandb-group qwen3-4B-multi-turn
+      --wandb-group qwen3-4B-agentic

fzyzcjy · 2026-02-12T01:33:33Z

examples/retool_v2/run_multi_turn.sh

nit: maybe we can use python script, and maybe run_agentic.sh and run_multi_turn.sh can become one single file w/ a simple bool flag and a short if-else
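
A rough sketch of what such a merged launcher might look like is shown below. Everything structural here is hypothetical: the `--agentic` flag, the `build_args` helper, and the `DATA_DIR` environment variable (taken from the bot's earlier suggestion) are illustrations, and only the argument values quoted in the diffs above are reproduced. Arguments that the diffs do not show for both scripts are left out.

```python
import argparse
import os


def build_args(agentic: bool, data_dir: str) -> list:
    """Return the arguments that differ between the two shell scripts, plus shared paths."""
    if agentic:
        run_name, save_interval, group = "retool_v2_agentic", "20", "qwen3-4B-agentic"
    else:
        run_name, save_interval, group = "retool_v2_multi_turn", "1000", "qwen3-4B-multi-turn"
    return [
        "--hf-checkpoint", f"{data_dir}/Qwen3-4B",
        "--ref-load", f"{data_dir}/Qwen3-4B_torch_dist",
        "--save", f"{data_dir}/Qwen3-4B_miles/{run_name}/",
        "--save-interval", save_interval,
        "--prompt-data", f"{data_dir}/dapo-math-17k/dapo-math-17k.jsonl",
        "--use-wandb",
        "--wandb-project", "miles-dev-retool-v2",
        "--wandb-group", group,
    ]


def main(argv=None) -> list:
    parser = argparse.ArgumentParser(description="Hypothetical merged retool_v2 launcher")
    parser.add_argument("--agentic", action="store_true", help="use the agentic settings")
    options = parser.parse_args(argv)
    # DATA_DIR falls back to /root, matching the paths hardcoded in the scripts.
    return build_args(options.agentic, os.environ.get("DATA_DIR", "/root"))


print(" ".join(main(["--agentic"])))
```

The returned list would still need to be combined with the remaining training arguments and handed to the actual job submission step, which this sketch does not attempt.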

fzyzcjy

the general direction LGTM, i.e. almost zero lines of code change, except for copy-pasting the sandbox code

fzyzcjy added 30 commits January 17, 2026 13:30

more

25819c8

Revert "more"

a34fd40

This reverts commit 25819c8.

Revert "more"

73a8ad4

This reverts commit 881d5a3.

more

887814b

more

8d00a0c

more

5832c22

more

8bd8459

more

90c934a

more

5a15002

more

80ba47d

more

885aff8

more

075aa93

more

8d50974

more

6651808

more

546db91

more

ef42bd4

more

8d469d2

more

b0bd948

more

d1ab853

more

d745520

more

d3b9088

more

5379bb9

revert

6aab421

more

9988ce0

more

ec5334d

more

0bd4900

more

f7afe9e

more

ccb96a1

more

ec14934

more

017e56b

fzyzcjy and others added 23 commits January 18, 2026 10:50

chore: separate pytest from main package installation

091577f

more

b269de0

more

3b31227

rm

51dd13f

more

9127f4f

more

6ab64c7

more

bf0a3b4

more

0697448

more

d6e522e

more

6ab728b

fmt

d964184

fix: typo args.device -> args.devices

775552f

fix: skip gated llama model and fix tool_index expectations

083f676

Merge branch 'feat/ac8113aw' into feat/ac8113ax

3a64139

Merge branch 'feat/ac8113ax' into feat/ac8113ay

3689e4f

Merge branch 'main' into feat/ac8113ay

e389808

Implement simple multi-turn and agent examples based on the new code

f17fb65

Co-authored-by: PopSoda2002 [zhouhp.me@gmail.com](mailto:zhouhp.me@gmail.com)

Merge remote-tracking branch 'upstream/main' into feature/multi_turn_…

bb37837

…and_agent_examples

Add examples/retool_v2/README.md

f0b6353

Co-authored-by: PopSoda2002 [zhouhp.me@gmail.com](mailto:zhouhp.me@gmail.com)

Add tool call metrics

ce05e49

Co-authored-by: PopSoda2002 [zhouhp.me@gmail.com](mailto:zhouhp.me@gmail.com)

Fix log_passrate

4ea29cc

Co-authored-by: PopSoda2002 [zhouhp.me@gmail.com](mailto:zhouhp.me@gmail.com)

Merge remote-tracking branch 'upstream/main' into feature/multi_turn_…

a5e4855

…and_agent_examples

Add base types

5319098

gemini-code-assist bot reviewed Feb 10, 2026

Impl SFT multi turn

1fb55c7

fzyzcjy reviewed Feb 12, 2026
