Code Stuck Due to Retry in Greedy Mode

Hi, thanks for this wonderful dataset~ I ran into some issues while running the tests.

## Problem and Analysis
During the execution of the testing code, the program appeared to hang. While debugging, I found that the code keeps retrying the greedy ChatGPT call. This happens when [extract_answer](https://github.com/ZrrSkywalker/MathVerse/blob/main/evaluation/extract_answer_s1.py) tries to extract an answer and the model returns a blank reply because no answer is found.

```python
    def get_chat_response(self, prompt, temperature=0, max_tokens=256, n=1, patience=1000, sleep_time=0):
        messages = [
            {"role": "user", "content": prompt},
        ]
        payload = {"model": self.gpt_model, "messages": messages, "temperature": temperature, "max_tokens": max_tokens, "n":n}

        while patience > 0:
            patience -= 1
            try:
                response = self._post_request(payload)
                if n == 1:
                    prediction = response["choices"][0]["message"]["content"].strip()
                    if prediction and prediction != "":
                        return prediction
```

Similarly, [score_answer](https://github.com/ZrrSkywalker/MathVerse/blob/main/evaluation/score_answer_s2.py#L89) contains an infinite loop that retries as long as the output is not 0 or 1.

```python
            judgement = match_answer(save_inst, args.api_key, args.quick_match)
            while True:
                if judgement.strip() not in ['0', '1']:
                    print('Wrong return format: ', judgement)
                    judgement = match_answer(save_inst, args.api_key, args.quick_match)
                else:
                    save_inst['judgement'] = int(judgement)
                    break
```

In both cases, the number of retries is very large (up to 1000 in the first case, unbounded in the second). Because decoding is greedy (`temperature=0`), a retry is expected to return the same response, so the model never produces the expected output. This stalls the code and consumes a lot of API quota.

I also noticed that the prompts don't explicitly ask the model to output null when it can't extract an answer, or to output only 0/1 without explanation when scoring. It seems the few-shot examples alone are not enough to constrain the format.

## What I tried to fix

I added format requirements to the prompts:

For extract:
```
Directly output the extracted answer with no explanation. 
```

For score:
```
Output the Judgement (0 or 1) DIRECTLY without any explanation.
```

In the code:

```diff
    def get_chat_response(self, prompt, temperature=0, max_tokens=256, n=1, patience=1000, sleep_time=0):
        messages = [
            {"role": "user", "content": prompt},
        ]
        payload = {"model": self.gpt_model, "messages": messages, "temperature": temperature, "max_tokens": max_tokens, "n": n}

        while patience > 0:
            patience -= 1
            try:
                response = self._post_request(payload)
                if n == 1:
                    prediction = response["choices"][0]["message"]["content"].strip()
                    if prediction and prediction != "":
                        return prediction
                    else:
+                       if temperature == 0:
+                           # no need to retry: greedy decoding always returns the same result
+                           return ""
```
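
For anyone who wants to try the idea in isolation, here is a minimal, self-contained sketch of the same early exit. It is not the repository's code: `post_request` is a hypothetical stand-in for `self._post_request`, the default model name is a placeholder, and the default `patience` of 5 is my own assumption.

```python
import time
from typing import Any, Callable, Dict


def get_chat_response(
    post_request: Callable[[Dict[str, Any]], Dict[str, Any]],  # hypothetical stand-in for self._post_request
    prompt: str,
    model: str = "placeholder-model",  # placeholder, not a real model name
    temperature: float = 0,
    max_tokens: int = 256,
    patience: int = 5,  # assumed small budget instead of 1000
    sleep_time: float = 0,
) -> str:
    """Return the stripped reply, or "" if no usable reply was obtained."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
        "n": 1,
    }
    while patience > 0:
        patience -= 1
        try:
            response = post_request(payload)
            prediction = response["choices"][0]["message"]["content"].strip()
            if prediction:
                return prediction
            if temperature == 0:
                # Greedy decoding is expected to repeat the same blank reply,
                # so give up immediately instead of spending more quota.
                return ""
        except Exception as exc:
            # Transport errors may be transient, so these are still retried.
            print(f"Request failed: {exc}")
        if sleep_time > 0:
            time.sleep(sleep_time)
    return ""
```

With a sampling temperature above 0 the loop still retries blank replies, but only until the `patience` budget runs out.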

```diff
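            judgement = match_answer(save_inst, args.api_key, args.quick_match)
-           while True:
-               if judgement.strip() not in ['0', '1']:
-                   print('Wrong return format: ', judgement)
-                   judgement = match_answer(save_inst, args.api_key, args.quick_match)
-               else:
-                   save_inst['judgement'] = int(judgement)
-                   break
+           if judgement.strip()[:1] not in ['0', '1']:
+               # an empty or malformed reply is reported once instead of retried
+               print('Wrong return format: ', judgement)
+           else:
+               # parse only the leading digit so a reply like "1. Correct" does not crash int()
+               save_inst['judgement'] = int(judgement.strip()[0])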
```
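
If some retrying is still wanted on the scoring side (for example when sampling with a non-zero temperature), a bounded loop with a tolerant parser might look like the sketch below. `parse_judgement` and `score_with_retries` are hypothetical helpers of mine, not functions from the repository, and `match_fn` stands in for a call such as `match_answer(save_inst, args.api_key, args.quick_match)`.

```python
import re
from typing import Callable, Optional

# A leading 0 or 1 that is not followed by another digit (so "10" is rejected).
_JUDGEMENT = re.compile(r"[01](?!\d)")


def parse_judgement(raw: str) -> Optional[int]:
    """Return 0 or 1 if the reply starts with that digit, otherwise None."""
    match = _JUDGEMENT.match(raw.strip())
    return int(match.group()) if match else None


def score_with_retries(match_fn: Callable[[], str], max_retries: int = 3) -> Optional[int]:
    """Call match_fn at most max_retries times and return the first valid judgement."""
    for _ in range(max_retries):
        judgement = parse_judgement(match_fn())
        if judgement is not None:
            return judgement
    # Nothing parseable came back; the caller decides how to log or skip the item.
    return None
```

An item that comes back as `None` can be logged and left unscored, so a single malformed reply cannot stall the whole run.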

With these changes, both issues are alleviated.

## Discussion

I am curious whether others have encountered this issue; I would expect it to be common, since greedy decoding makes retries ineffective. I also look forward to the developers' feedback on my modifications, and I am willing to submit a PR if my understanding and fix are correct.

Thanks!

