
Conversation

@Anti-Entrophic
Contributor

No description provided.

@KaiLv69 self-requested a review April 13, 2024 12:42
)

self.num_heads_tp = query_states.shape[2]
self.tp_size = self.num_heads // self.num_heads_tp

KaiLv69 (Collaborator):

tp_size can be obtained via self.config.tp_size.
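
A minimal sketch of the suggested change, assuming self.config.tp_size is set as the comment states:

self.tp_size = self.config.tp_size                   # read the TP degree from the config
self.num_heads_tp = self.num_heads // self.tp_size   # heads held by this tensor-parallel rank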


attn_weights = torch.matmul(query_states, key_states.transpose(2, 3)) / math.sqrt(self.head_dim)

if attn_weights.size() != (bsz, self.num_heads_tp, q_len, kv_seq_len):

KaiLv69 (Collaborator):

This check should also be done via self.config.tp_size and self.num_heads.
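
A sketch of the check rewritten against the config values, under the same assumption that self.config.tp_size is set (the error message is illustrative):

num_heads_tp = self.num_heads // self.config.tp_size  # expected number of heads on this tensor-parallel rank
if attn_weights.size() != (bsz, num_heads_tp, q_len, kv_seq_len):
    raise ValueError(
        f"Attention weights should be of size {(bsz, num_heads_tp, q_len, kv_seq_len)}, "
        f"but is {attn_weights.size()}"
    )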

rearrange(value_states, "b n (h d) -> b n h d", d=self.head_dim),
)

self.num_heads_tp = query_states.shape[2]

KaiLv69 (Collaborator):

Same as in Qwen2Attention: obtain it via config.tp_size.
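
The same one-line sketch applies here, again assuming config.tp_size is available:

self.num_heads_tp = self.num_heads // self.config.tp_size  # derive from config instead of query_states.shape[2]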

"unexpected results may be encountered."
)
# self.self_attn = QWEN2_ATTENTION_CLASSES[config._attn_implementation](config, layer_idx)
self.self_attn = Qwen2FlashAttention2(config, layer_idx)

@KaiLv69 (Collaborator), Apr 13, 2024:

Write it like this, otherwise use_flash in the config won't be able to control the attention implementation here:

if config.attn_implementation == "flash_attention_2" or config.use_flash:
    self.attention = InternLM2FlashAttention2(config=config)
else:
    self.attention = InternLM2Attention(config=config)
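
For reference, the same pattern adapted to the Qwen2 classes touched in this PR might look like the sketch below (Qwen2FlashAttention2, Qwen2Attention, and the layer_idx argument are taken from the surrounding diff; this is a sketch, not the final code):

if config._attn_implementation == "flash_attention_2" or config.use_flash:
    self.self_attn = Qwen2FlashAttention2(config, layer_idx)
else:
    self.self_attn = Qwen2Attention(config, layer_idx)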

KaiLv69 (Collaborator):

Qwen2Attention also needs to be tested.
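
One way to exercise that path, as a sketch: only the two config flags that appear elsewhere in this diff are assumed, and "eager" is just an illustrative non-flash value.

config.use_flash = False                  # select Qwen2Attention instead of Qwen2FlashAttention2
config._attn_implementation = "eager"     # illustrative: any value other than "flash_attention_2"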

attention_mask: Optional[torch.Tensor] = None,
position_ids: Optional[torch.LongTensor] = None,
past_key_values: Optional[List[torch.FloatTensor]] = None,

KaiLv69 (Collaborator):

This blank line should be deleted.

self.self_attn = Qwen2FlashAttention2(config, layer_idx)
# self.self_attn = Qwen2SdpaAttention(config, layer_idx)

if config._attn_implementation == "flash_attention_2" or config.use_flash:

KaiLv69 (Collaborator):

Line 842 assigns _attn_implementation to "flash_attention_2", so isn't this or condition always True here?
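
A minimal illustration of the concern, with the assigned value taken from this comment:

attn_implementation = "flash_attention_2"   # reportedly assigned at line 842
use_flash = False                           # even with use_flash turned off...
print(attn_implementation == "flash_attention_2" or use_flash)  # ...prints True, so the else branch can never run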

@KaiLv69 (Collaborator) commented Apr 24, 2024:

Testing test_generation.py: the generation results with pp_size=2 and tp_size=2 are different. It's probably a KV cache issue.

)
from collie.models.utils import inputs_to_kv_cache_for_layer, kv_cache_to_inputs_for_layer, kv_cache_to_inputs_for_model, inputs_to_kv_cache_for_model

if is_flash_attn_2_available():

KaiLv69 (Collaborator):

If the flash-attn version is 2.0 or earlier, this will be False, and an error is raised when config.use_flash=True; the error message could be improved.

The error I saw is:

File "/fs-computility/llm/shared/lvkai/workspace/collie/tests/models/qwen2/../../../collie/models/qwen2/model.py", line 488, in forward
    _flash_supports_window_size
NameError: name '_flash_supports_window_size' is not defined

It could be changed to tell the user that the flash-attn version must be at least 2.1.
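
A possible shape for a clearer check, as a sketch (the 2.1 lower bound comes from this comment; reading flash_attn.__version__ and using packaging for the comparison are assumptions):

import flash_attn
from packaging import version

# hedged sketch: fail early with an actionable message instead of the NameError above
if version.parse(flash_attn.__version__) < version.parse("2.1.0"):
    raise ImportError(
        f"config.use_flash=True requires flash-attn >= 2.1, but flash-attn {flash_attn.__version__} "
        "is installed; please upgrade flash-attn or set config.use_flash=False."
    )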
