
Extra tensor slicing in compute_log_probabilities function #14

@mustaphabenhajm

Description


There's an extra tensor slicing in compute_log_probabilities function.

Current Behavior

Suppose the input sequence is ["A", "B", "C", "D", "E", "F"],
where the prompt tokens are ["A", "B", "C"] and the completion tokens are ["D", "E", "F"].

1- Get the output logits for logits_to_keep + 1 tokens, with logits_to_keep = 3:

logits = model(
        input_ids=input_ids,
        attention_mask=attention_mask,
        logits_to_keep=logits_to_keep + 1  # Request one extra logit for proper alignment.
    ).logits

This computes the logits at the positions of ["C", "D", "E", "F"].

2- We do not need the logits for token "F", since there is no next token to predict:

logits = logits[:, :-1, :]

This keeps the logits for ["C", "D", "E"].

3- Keep only the completion token ids:

input_ids = input_ids[:, -logits_to_keep:]

This keeps ["D", "E", "F"].

4- Slice the logits for the completion tokens:

logits = logits[:, -logits_to_keep:, :]

This keeps the logits for ["C", "D", "E"], but after step 2 logits already contains exactly those logits_to_keep entries, so this slice is a no-op and can be removed.
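The redundancy can be checked with a minimal sketch. Plain Python lists stand in for the sequence dimension of the logits tensor, and each entry is labelled by the token whose position it corresponds to, mirroring the example above (illustrative only, not real model output):

```python
logits_to_keep = 3

# Step 1: the model returns logits for logits_to_keep + 1 positions.
logits = ["C", "D", "E", "F"]

# Step 2: drop the last position ("F" has no next token to predict).
logits = logits[:-1]  # ["C", "D", "E"] -- already logits_to_keep entries

# Step 4: the extra slice keeps the last logits_to_keep entries,
# which is the whole sequence again, so the slice is a no-op.
assert logits[-logits_to_keep:] == logits == ["C", "D", "E"]
```

The same holds for the real tensor: after `logits = logits[:, :-1, :]` the sequence dimension already has length logits_to_keep, so `logits[:, -logits_to_keep:, :]` returns the tensor unchanged.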
