Bug while using Gemini-CLI - DetachedInstanceError #272

Oracle VPS 

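Vertex AI + Gemini CLI are configured.
Model tests cannot complete when using Gemini CLI.
Two real bugs show up.

I had Codex GPT-5.4 summarize the problems and apply fixes, and in my own testing the problems are at least resolved.
Because the code and the reasoning are AI-generated and have not been tested carefully, I'm filing an issue rather than a pull request.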


### Two issues

Issue A
```
DetachedInstanceError
Parent instance <ProviderEndpoint ...> is not bound to a Session
lazy load operation of attribute 'provider' cannot proceed
```

Issue B
```
Failed to record usage for endpoint check: cannot access local variable 'candidate'
```
### Issue A

#### Overview

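`is_vertex_ai_context()` in `src/services/provider/adapters/vertex_ai/transport.py` can raise `DetachedInstanceError` during the endpoint check stage. The cause: even when the current request is not actually a Vertex AI request, it still accesses `endpoint.provider` / `key.provider`. If the ORM objects passed in have already been detached from their original session, that access triggers a SQLAlchemy lazy-load exception.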

I hit this error while testing a `gemini-cli` endpoint, but the bug itself is not specific to Gemini CLI. The actual failure happens earlier, in the shared provider-context detection logic.

At commit `f57fe6e13e037e52fa3e35114f1448a9d2ee262b`, the most relevant locations are:

- `src/services/provider/adapters/vertex_ai/transport.py:34-58`
- `src/api/handlers/base/handler_adapter_base.py:380-385`
- `src/services/provider/provider_context.py:1-1`
- `src/services/provider/provider_context.py:68-96`

#### Current code path

`handler_adapter_base.py` calls it like this:

```python
is_vertex_ai_context(
    base_url=validated_base_url,
    provider_type=provider_type,
    endpoint=provider_endpoint,
    key=provider_api_key,
)
```

and the key logic in the helper is currently:

```python
for obj in (endpoint, key):
    provider = getattr(obj, "provider", None) if obj is not None else None
    if normalize_provider_type(getattr(provider, "provider_type", None)) == (
        ProviderType.VERTEX_AI.value
    ):
        return True
```

#### Actual behavior

If `endpoint` / `key` is a detached SQLAlchemy ORM object, accessing `obj.provider` can trigger a lazy load and raise:

```text
DetachedInstanceError: Parent instance <ProviderEndpoint ...> is not bound to a Session;
lazy load operation of attribute 'provider' cannot proceed
```

This breaks the endpoint test / check chain before the upstream request is ever sent.
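Note that the `None` default in `getattr(obj, "provider", None)` gives no protection here: `getattr` only falls back to its default when the lookup raises `AttributeError`, and SQLAlchemy's `DetachedInstanceError` is not an `AttributeError` subclass, so it propagates. The plain-Python sketch below illustrates this; `FakeDetachedInstanceError` and `DetachedEndpoint` are hypothetical stand-ins for the SQLAlchemy exception and a detached `ProviderEndpoint`, not the real classes.

```python
class FakeDetachedInstanceError(Exception):
    """Hypothetical stand-in for sqlalchemy.orm.exc.DetachedInstanceError (not an AttributeError)."""


class DetachedEndpoint:
    """Hypothetical stand-in for a detached ProviderEndpoint whose 'provider' relation was never loaded."""

    @property
    def provider(self):
        # Touching an unloaded relation on a detached instance triggers a lazy load, which fails.
        raise FakeDetachedInstanceError(
            "Parent instance <ProviderEndpoint ...> is not bound to a Session; "
            "lazy load operation of attribute 'provider' cannot proceed"
        )


def old_style_lookup(obj):
    # Mirrors the current helper: the None default only covers AttributeError.
    return getattr(obj, "provider", None) if obj is not None else None


try:
    old_style_lookup(DetachedEndpoint())
except FakeDetachedInstanceError as exc:
    print(f"raised: {type(exc).__name__}")
```

The default only helps when the attribute is genuinely missing; any other exception raised while computing the attribute escapes `getattr` unchanged.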


#### Root-cause chain for Issue A
```
provider_query.py
  ->
loads Provider / ProviderEndpoint / ProviderAPIKey in a new session
  ->
expunges them
  ->
the rest of the test chain keeps passing these detached ORM objects along
  ->
the old is_vertex_ai_context() touches endpoint.provider / key.provider
  ->
SQLAlchemy attempts a lazy load
  ->
the object is no longer in any session
  ->
DetachedInstanceError
```

#### Fix

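File: `src/services/provider/adapters/vertex_ai/transport.py`
Fix principle: stop touching ORM relations on detached objects directly.

The fixed logic is:
1. Check the explicitly passed `provider_type` first.
2. If an explicit type is present, decide directly whether it is `vertex_ai`.
3. If there is no explicit type, fall back to the detached-safe: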

```python
resolve_provider_type(endpoint=endpoint, key=key)
```

4. Only as a last resort, fall back to plain-value checks such as `base_url` / `api_format` / `auth_type`.
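A minimal sketch of that ordering follows. It assumes that a detached-safe lookup only reads attributes already present in the instance `__dict__` (for SQLAlchemy-mapped instances, attributes that have already been loaded live there, so reading it does not emit a lazy load). `resolve_provider_type_sketch`, `is_vertex_ai_context_sketch`, the normalization rule, and the `aiplatform.googleapis.com` hostname check are illustrative assumptions, not the project's actual `resolve_provider_type()` / `is_vertex_ai_context()`; the `api_format` / `auth_type` checks are omitted.

```python
from typing import Any, Optional
from urllib.parse import urlparse

VERTEX_AI = "vertex_ai"  # assumed to equal ProviderType.VERTEX_AI.value


def _loaded_attr(obj: Any, name: str) -> Any:
    """Read an attribute only if it is already loaded (in __dict__); never trigger a lazy load."""
    state = getattr(obj, "__dict__", None) if obj is not None else None
    return state.get(name) if isinstance(state, dict) else None


def _normalize(value: Any) -> Optional[str]:
    # Hypothetical normalization: lower-cased string, or None for empty values.
    return str(value).strip().lower() if value else None


def resolve_provider_type_sketch(endpoint: Any = None, key: Any = None) -> Optional[str]:
    """Hypothetical detached-safe resolution: only uses relations that are already loaded."""
    for obj in (endpoint, key):
        provider = _loaded_attr(obj, "provider")
        provider_type = _normalize(_loaded_attr(provider, "provider_type"))
        if provider_type:
            return provider_type
    return None


def is_vertex_ai_context_sketch(
    base_url: Optional[str] = None,
    provider_type: Optional[str] = None,
    endpoint: Any = None,
    key: Any = None,
) -> bool:
    # Steps 1-2: an explicit provider_type decides the answer outright.
    explicit = _normalize(provider_type)
    if explicit:
        return explicit == VERTEX_AI
    # Step 3: detached-safe lookup through already-loaded relations only.
    resolved = resolve_provider_type_sketch(endpoint=endpoint, key=key)
    if resolved:
        return resolved == VERTEX_AI
    # Step 4: pure-value fallback (assumption: Vertex AI hosts end with aiplatform.googleapis.com).
    host = urlparse(base_url or "").hostname or ""
    return host.endswith("aiplatform.googleapis.com")
```

With an object whose `provider` is an unloaded lazy relation, `_loaded_attr` simply returns `None` instead of evaluating the attribute, so the check degrades to the `base_url` fallback rather than raising.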

### Issue B

After Issue A was fixed, the following appeared:
```
Failed to record usage for endpoint check:
cannot access local variable 'candidate' where it is not associated with a value
```


#### Overview

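`_calculate_and_record_usage()` in `src/api/handlers/base/endpoint_checker.py` can turn a usage-recording chain that has already succeeded back into an error return. The root cause: when it finally returns `candidate_id`, it reads an outer `candidate` variable that is never bound on that code path in the outer function's scope.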

I also hit this error while testing a `gemini-cli` endpoint; what is broken is the shared endpoint usage-recording logic.

At commit `f57fe6e13e037e52fa3e35114f1448a9d2ee262b`, the most relevant location is:

- `src/api/handlers/base/endpoint_checker.py:344-406`

#### Current code structure

`_calculate_and_record_usage()` currently looks roughly like this:

```python
def _record_candidate_sync() -> str:
    candidate = RequestCandidateService.create_candidate(...)
    ...
    return str(candidate.id)

candidate_id = await asyncio.to_thread(_record_candidate_sync)
...
return {
    ...
    "candidate_id": str(candidate.id) if candidate else None,
}
```

#### Actual behavior

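When the `RequestCandidate` is created successfully, the current path goes like this:

1. Usage has already been recorded successfully.
2. `_record_candidate_sync()` has already returned `candidate_id` to the outer scope.
3. The log has already reported that the candidate was created.
4. But the final `return` still reaches back for an outer `candidate`.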

which raises:

```text
cannot access local variable 'candidate' where it is not associated with a value
```

So even though every earlier step succeeded, the function still ends up in the outer error branch.

The logs from the live run confirm that the success path had already happened:

```text
[endpoint_check] Usage recorded successfully | usage_id=...
[endpoint_check] RequestCandidate created | request_id=..., candidate_id=...
Failed to record usage for endpoint check: cannot access local variable 'candidate' ...
```


#### Root-cause chain for Issue B
```
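inner _record_candidate_sync() creates candidate
  ->
returns only str(candidate.id) to the outer scope
  ->
the outer scope receives candidate_id, a string
  ->
but the return is written as str(candidate.id) if candidate else None
  ->
candidate is never bound in the outer scope on this path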
  ->
UnboundLocalError
```
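The wording of the message narrows things down: Python raises `UnboundLocalError` ("cannot access local variable ... where it is not associated with a value" is the 3.11+ phrasing) only when the name is assigned somewhere in the function's body but not on the path actually taken; if the outer function never assigned `candidate` at all, the error would be a plain `NameError`. Since the real function body is elided in this report, the reproduction below is hypothetical: `_Candidate`, `buggy_record`, and the `else` branch that binds `candidate` are assumptions made to reproduce the same error.

```python
import asyncio


class _Candidate:
    """Hypothetical stand-in for a created RequestCandidate."""

    def __init__(self, cid: int) -> None:
        self.id = cid


async def buggy_record(create_ok: bool = True) -> dict:
    def _record_candidate_sync() -> str:
        candidate = _Candidate(42)  # local to the inner function only
        return str(candidate.id)

    if create_ok:
        candidate_id = await asyncio.to_thread(_record_candidate_sync)
    else:
        # Assumption: some other branch binds `candidate`, which makes it local to this function.
        candidate = None
    # Reads the outer `candidate`, which the success path never bound.
    return {"candidate_id": str(candidate.id) if candidate else None}


try:
    asyncio.run(buggy_record())
except UnboundLocalError as exc:
    print(exc)
```

Only the success path fails here, which matches the logs above: the candidate is created and logged, and the error is raised afterwards.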

#### Fix

File:

```
src/api/handlers/base/endpoint_checker.py
```

1. Initialize it in the outer scope first:

```python
candidate_id = None
```

2. After the `RequestCandidate` is created successfully, store the return value in that variable:

```python
candidate_id = await asyncio.to_thread(_record_candidate_sync)
```

3. When building the result, return it directly:

```python
"candidate_id": candidate_id
```

4. The failure branch likewise just sets it back to `None` explicitly.
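Put together, the four steps might look like the sketch below. The function names, the `fail` flag used to simulate the failure branch, and the placeholder id are hypothetical; the real function also records usage and returns a larger result dict, which is elided here.

```python
import asyncio
import logging
from typing import Optional

logger = logging.getLogger("endpoint_check_sketch")


def _record_candidate_sync(fail: bool) -> str:
    """Hypothetical stand-in for the inner helper that calls RequestCandidateService.create_candidate(...)."""
    if fail:
        raise RuntimeError("simulated RequestCandidate creation failure")
    return "candidate-123"  # placeholder id, already a str like str(candidate.id)


async def calculate_and_record_usage_sketch(fail: bool = False) -> dict:
    # 1. Initialize in the outer scope so every path has a binding.
    candidate_id: Optional[str] = None
    try:
        # 2. Keep the inner helper's return value instead of reaching for its locals.
        candidate_id = await asyncio.to_thread(_record_candidate_sync, fail)
    except Exception as exc:
        logger.warning("RequestCandidate creation failed: %s", exc)
        # 4. The failure branch only resets the id explicitly.
        candidate_id = None
    # 3. Return the stored string directly; no outer `candidate` is referenced.
    return {"candidate_id": candidate_id}
```

Because the outer function only ever handles the string id, there is no name left that can be unbound at `return` time, whichever branch runs.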

