Rename TaskSelector to DataSelector.#519
Rename TaskSelector to DataSelector.#519chenyushuo wants to merge 4 commits intoagentscope-ai:mainfrom
TaskSelector to DataSelector.#519Conversation
2. Add `DataSelector` to `ExperienceFileReader` 3. Refactor `tokenize_and_mask_messages_default`
There was a problem hiding this comment.
Pull request overview
This PR renames the configuration concept of TaskSelector to DataSelector, propagates the new config through buffer readers (including experience file reading), and refactors the default assistant-token masking logic used with chat templates.
Changes:
- Rename
TaskSelectorConfig/task_selectortoDataSelectorConfig/data_selectoracross config, validators, scheduler, selector code, and tests. - Add selector support to
ExperienceFileReaderby refactoring common dataset-reading logic into_DatasetFileReader. - Refactor
tokenize_and_mask_messages_defaultto batchapply_chat_templatecalls when building assistant masks.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| trinity/common/models/utils.py | Refactors default chat-template tokenization/masking logic. |
| trinity/common/config_validator.py | Updates selector validation to use data_selector (incl. experience buffer validation). |
| trinity/common/config.py | Renames selector config type and renames config fields to data_selector; adds to experience buffer config. |
| trinity/buffer/task_scheduler.py | Updates scheduler to read selector type from data_selector. |
| trinity/buffer/selector/selector.py | Updates selector constructors/types to accept DataSelectorConfig. |
| trinity/buffer/reader/file_reader.py | Refactors file readers and adds selector initialization for experience file reading. |
| tests/trainer/trainer_test.py | Updates tests to use DataSelectorConfig. |
| tests/buffer/task_scheduler_test.py | Updates scheduler tests to use data_selector arguments/config. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # check if selector is supported | ||
| selector = SELECTORS.get(taskset.task_selector.selector_type) | ||
| selector = SELECTORS.get(taskset.data_selector.selector_type) | ||
| if selector is None: | ||
| raise ValueError( | ||
| f"Selector {taskset.task_selector.selector_type} is not supported." | ||
| f"Selector `{taskset.data_selector.selector_type}` " | ||
| f"in {taskset.name} is not supported." | ||
| ) |
There was a problem hiding this comment.
This selector validation assumes taskset.data_selector is always set. However, the runtime codepath supports disabling selection by setting data_selector = None (e.g., _DatasetFileReader checks for None, and SimpleTasksetScheduler explicitly sets it to None). If a user config sets data_selector: null, this will raise an AttributeError. Please guard for None (skip validation / treat as sequential) before accessing selector_type.
| # check if selector is supported | ||
| selector = SELECTORS.get(experience_buffer.data_selector.selector_type) | ||
| if selector is None: | ||
| raise ValueError( | ||
| f"Selector {experience_buffer.data_selector.selector_type} " | ||
| "in `experience_buffer` is not supported." | ||
| ) |
There was a problem hiding this comment.
This validation unconditionally checks experience_buffer.data_selector.selector_type. If data_selector is explicitly set to None in config (which other components treat as a valid way to disable selection), this will crash with AttributeError. Please handle experience_buffer.data_selector is None (and/or selector_type is None) by skipping selector validation or applying a default selector type.
| # check if selector is supported | |
| selector = SELECTORS.get(experience_buffer.data_selector.selector_type) | |
| if selector is None: | |
| raise ValueError( | |
| f"Selector {experience_buffer.data_selector.selector_type} " | |
| "in `experience_buffer` is not supported." | |
| ) | |
| # check if selector is supported (if configured) | |
| data_selector = getattr(experience_buffer, "data_selector", None) | |
| selector_type = getattr(data_selector, "selector_type", None) if data_selector is not None else None | |
| if selector_type is not None: | |
| selector = SELECTORS.get(selector_type) | |
| if selector is None: | |
| raise ValueError( | |
| f"Selector {selector_type} in `experience_buffer` is not supported." | |
| ) |
| """ | ||
| taskset_configs = config.buffer.explorer_input.tasksets | ||
| if len(taskset_configs) == 1 and taskset_configs[0].task_selector.selector_type == "sequential": | ||
| if len(taskset_configs) == 1 and taskset_configs[0].data_selector.selector_type == "sequential": |
There was a problem hiding this comment.
get_taskset_scheduler assumes taskset_configs[0].data_selector is non-None. Since other code paths support data_selector = None to disable selection, a config that sets data_selector: null would raise an AttributeError here. Consider treating None as equivalent to sequential/no-selection in this condition.
| if len(taskset_configs) == 1 and taskset_configs[0].data_selector.selector_type == "sequential": | |
| data_selector = taskset_configs[0].data_selector | |
| if len(taskset_configs) == 1 and ( | |
| data_selector is None or data_selector.selector_type == "sequential" | |
| ): |
|
/unittest-all |
Summary
Failed Tests
Skipped
Tests
Github Test Reporter by CTRF 💚 |
Description
TaskSelectortoDataSelectorDataSelectortoExperienceFileReadertokenize_and_mask_messages_defaultenable_thinkingto model configmin_lr_ratiosettingChecklist
Please check the following items before code is ready to be reviewed.