Code Stuck Due to Retry in Greedy Mode #5

@huiyeruzhou

Hi, thanks for this wonderful dataset~ I ran into some issues while running the tests.

Problem and Analysis

While running the test code, the program appeared to hang. Debugging showed that it was repeatedly retrying greedy (temperature=0) ChatGPT calls. This happens when extract_answer asks the model to extract an answer and the model returns a blank reply because no answer is found.

    def get_chat_response(self, prompt, temperature=0, max_tokens=256, n=1, patience=1000, sleep_time=0):
        messages = [
            {"role": "user", "content": prompt},
        ]
        payload = {"model": self.gpt_model, "messages": messages, "temperature": temperature, "max_tokens": max_tokens, "n":n}

        while patience > 0:
            patience -= 1
            try:
                response = self._post_request(payload)
                if n == 1:
                    prediction = response["choices"][0]["message"]["content"].strip()
                    if prediction and prediction != "":
                        return prediction

In addition, score_answer contains an infinite loop that retries for as long as the output is not 0 or 1.

            judgement = match_answer(save_inst, args.api_key, args.quick_match)
            while True:
                if judgement.strip() not in ['0', '1']:
                    print('Wrong return format: ', judgement)
                    judgement = match_answer(save_inst, args.api_key, args.quick_match)
                else:
                    save_inst['judgement'] = int(judgement)
                    break

In both cases the number of retries is very large (1000 in the first case, unbounded in the second). Because decoding is greedy (temperature=0), each retry will almost certainly return the same output, so the expected response never arrives. The code therefore stalls and consumes a lot of API quota.
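The retry behaviour described above can be bounded with a small helper. This is only a minimal sketch, not the repository's code: `call_with_retry`, `call_model`, `is_valid`, and `max_retries` are hypothetical names, and it assumes that a call at temperature 0 returns essentially the same output on every attempt.

```python
from typing import Callable


def call_with_retry(call_model: Callable[[], str],
                    is_valid: Callable[[str], bool],
                    temperature: float = 0.0,
                    max_retries: int = 3) -> str:
    """Call the model until `is_valid` accepts the reply or the budget runs out.

    Hypothetical helper: with temperature == 0 a retry is assumed to repeat
    the same output, so only a single attempt is made.
    """
    attempts = 1 if temperature == 0 else max(1, max_retries)
    reply = ""
    for _ in range(attempts):
        reply = call_model().strip()
        if is_valid(reply):
            return reply
    # Hand back the last (invalid) reply instead of looping forever.
    return reply


# Usage with a stub that always returns a blank reply, as in the stalled run.
calls = []


def blank_model() -> str:
    calls.append(1)
    return ""


result = call_with_retry(blank_model, bool, temperature=0)
```

With the stub above, the greedy call is attempted once and the blank reply is returned to the caller, who can then record the instance as failed instead of spending more quota.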

I also noticed that the prompt doesn't explicitly ask the model to output null when it can't extract an answer, or to output only 0/1 without explanation when scoring. The few-shot examples alone do not seem to be enough to constrain the format.

What I tried to fix

I added a format requirement to the prompts:

For extract:

Directly output the extracted answer with no explanation. 

For score:

Output the Judgement (0 or 1) DIRECTLY without any explanation.
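As a rough illustration of how such requirements might be attached, the sketch below appends the instruction to the end of an existing prompt. The helper name `with_format_requirement` and the choice to append at the end are assumptions; the actual prompt templates in the repository may place the instruction elsewhere.

```python
# The two instruction strings are the ones quoted above.
EXTRACT_FORMAT = "Directly output the extracted answer with no explanation."
SCORE_FORMAT = "Output the Judgement (0 or 1) DIRECTLY without any explanation."


def with_format_requirement(prompt: str, requirement: str) -> str:
    """Append a format instruction after the prompt body (hypothetical helper)."""
    return f"{prompt.rstrip()}\n\n{requirement}"


# Example with a placeholder prompt body.
score_prompt = with_format_requirement("<few-shot examples and question>", SCORE_FORMAT)
```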

In the code:

    def get_chat_response(self, prompt, temperature=0, max_tokens=256, n=1, patience=1000, sleep_time=0):
        messages = [
            {"role": "user", "content": prompt},
        ]
        payload = {"model": self.gpt_model, "messages": messages, "temperature": temperature, "max_tokens": max_tokens, "n": n}

        while patience > 0:
            patience -= 1
            try:
                response = self._post_request(payload)
                if n == 1:
                    prediction = response["choices"][0]["message"]["content"].strip()
                    if prediction and prediction != "":
                        return prediction
                    else:
+                       if temperature == 0:
+                           # no need to retry: greedy decoding returns the same result each time
+                           return ""

            judgement = match_answer(save_inst, args.api_key, args.quick_match)
-           while True:
-               if judgement.strip() not in ['0', '1']:
-                   print('Wrong return format: ', judgement)
-                   judgement = match_answer(save_inst, args.api_key, args.quick_match)
-               else:
-                   save_inst['judgement'] = int(judgement)
-                   break
+           if judgement[0] not in ['0', '1']:
+               print('Wrong return format: ', judgement)
+           else:
+               save_inst['judgement'] = int(judgement)

With these changes, both issues are alleviated.
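One caveat about the scoring check: `judgement[0]` raises an IndexError when the reply is empty, and `int(judgement)` raises a ValueError when the leading digit is followed by an explanation such as "1. The answers match". A slightly more defensive parser could look like the sketch below; `parse_judgement` is a hypothetical name, and returning None for malformed replies is an assumption about how the caller wants to handle them.

```python
from typing import Optional


def parse_judgement(raw: str) -> Optional[int]:
    """Return 0 or 1 if the reply starts with that single digit, else None."""
    text = raw.strip()
    # Accept a leading "0" or "1", but not a longer number such as "10".
    if text[:1] in ("0", "1") and not text[1:2].isdigit():
        return int(text[0])
    return None
```

A caller could then store the result when it is not None and log the raw reply otherwise, without retrying the greedy call.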

Discussion

I am curious whether others have encountered this issue; I would expect it to be common, since greedy decoding makes retries ineffective. I would also appreciate the developers' feedback on my modifications, and I am willing to submit a PR if my understanding and fix are correct.

Thanks!
