
Question about predicted token in final layer outputs #1

@yileitu

Hi Zhenyu, thanks a lot for this very useful repo!

My question may be less about the code implementation itself and more about the logit lens method and its interpretation.

I noticed in the visualization all_layers_step_0.png that the rank-0 token of Layer31's block_output is 1. Does this mean that for the prompt “The best food in Japan is”, LLaMA2’s final prediction (the highest-probability token) is literally token 1? Of course, with nucleus sampling it might still generate other tokens, but the caption at the bottom of the figure also says “Predicted Token: 1”.
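
For context, here is how I would sanity-check the model's own greedy (rank-0) next-token prediction for the same prompt, independent of the visualization. This is only a minimal sketch using the Hugging Face transformers API; the meta-llama/Llama-2-7b-hf checkpoint name is my assumption and may differ from what the repo actually loads:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint; adjust to the one used in the repo
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)  # fp32 by default; use a smaller dtype / GPU if memory is tight
model.eval()

prompt = "The best food in Japan is"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # [1, seq_len, vocab_size]

# Greedy (rank-0) next token at the last position -- this is what I would expect
# the "Predicted Token" caption to correspond to
next_id = logits[0, -1].argmax().item()
print(next_id, repr(tokenizer.decode([next_id])))
```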

Interestingly, in your paper arXiv:2503.11667, Figure 1 shows a very similar situation: for the prompt “The capital of France is”, the rank-0 token of Layer31 block_output is again 1. But the caption at the bottom there says “Predicted Token: 256”. Token 256 is not even among the top-15 tokens shown, so it seems quite unlikely to be sampled.

Both cases suggest that LLaMA2-7B predicts a number token (e.g. 1 or 256) as the next token, even for simple factual prompts like “The best food in Japan is” or “The capital of France is”. Is this expected behavior?

I also noticed that with logit-lens techniques, some mid-to-late layers (before the final one) actually predict the seemingly correct tokens (“Japanese” and “Paris”), but by the final layer this collapses into predicting a numeric token instead. Could you help explain why this happens?
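
To make the question concrete, this is roughly what I mean by the per-layer logit-lens readout: decode every block's output at the last position through the final RMSNorm and lm_head. Again just a sketch under the same assumptions as above (Hugging Face transformers, an assumed meta-llama/Llama-2-7b-hf checkpoint), not your repo's actual implementation:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states is a tuple of (num_layers + 1) tensors of shape [1, seq_len, hidden_dim]:
# the embedding output followed by each transformer block's output.
for layer, h in enumerate(out.hidden_states):
    # Logit lens: project the last position through the final RMSNorm and the unembedding matrix
    layer_logits = model.lm_head(model.model.norm(h[0, -1]))
    top_id = layer_logits.argmax().item()
    print(f"layer {layer:2d}: {tokenizer.decode([top_id])!r}")
```

In my understanding, the last iteration of this loop (the final block's output) should agree with the model's actual greedy prediction, which is why the numeric rank-0 token at Layer31 surprised me.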

Thanks again for making this repo and for clarifying!
