There is a redundant tensor slice in the `compute_log_probabilities` function.
Current Behavior
Suppose the input sequence is `["A", "B", "C", "D", "E", "F"]`,
where the prompt tokens are `["A", "B", "C"]` and the completion tokens are `["D", "E", "F"]`.
1- Get the output logits for `logits_to_keep + 1` tokens:

```python
logits_to_keep = 3
logits = model(
    input_ids=input_ids,
    attention_mask=attention_mask,
    logits_to_keep=logits_to_keep + 1  # Request one extra logit for proper alignment.
).logits
```

This computes the logits at positions `["C", "D", "E", "F"]`.
2- We do not need the logits for token `"F"`, since it has no next token:

```python
logits = logits[:, :-1, :]
```

This keeps the logits at positions `["C", "D", "E"]`.
3- Keep only the completion token ids:

```python
input_ids = input_ids[:, -logits_to_keep:]
```

This keeps `["D", "E", "F"]`.
4- Slice out the logits of the completion tokens:

```python
logits = logits[:, -logits_to_keep:, :]
```

This is meant to keep the logits at positions `["C", "D", "E"]`, but after step 2 `logits` already contains exactly those three positions, so this slice is a no-op.
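The redundancy can be checked with a small sketch that mirrors the slicing above. This uses NumPy with hypothetical shapes and random values (a batch of 1 and a vocab size of 8); PyTorch advanced slicing behaves the same way:

```python
import numpy as np

# Toy setup for the walkthrough above: sequence ["A", "B", "C", "D", "E", "F"],
# completion tokens ["D", "E", "F"], vocab size 8 (all values hypothetical).
vocab = 8
logits_to_keep = 3  # number of completion tokens

rng = np.random.default_rng(0)
# Stand-in for the model output: logits at positions ["C", "D", "E", "F"].
full_logits = rng.standard_normal((1, logits_to_keep + 1, vocab))

step2 = full_logits[:, :-1, :]         # drop "F" -> positions ["C", "D", "E"]
step4 = step2[:, -logits_to_keep:, :]  # -3: on a length-3 axis -> no-op

assert step2.shape == (1, 3, vocab)
assert np.array_equal(step2, step4)  # the extra slice changes nothing
```

Since the second axis of `step2` already has length `logits_to_keep`, taking the last `logits_to_keep` entries returns the same tensor, which is why the slice in step 4 can be dropped.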