Description
This wasn't tested in the original paper, but I've found that GPT-4 with Python (Advanced Data Analysis) can often solve these kinds of problems by writing a Python program that searches the solution space.
Here, GPT wrote a Python program that tries every instance of the expression (A^B)^(C^D), where each "^" stands for one of the four basic arithmetic operators (+, -, *, /) and A, B, C, D are a permutation of the given numbers.
It then found the solution (14 - 8) * (8 / 2) = 24, which is correct. It also did this in a relatively small number of tokens (112 input tokens + 512 output tokens + the prompt for Advanced Data Analysis, so at least 624 tokens in total), whereas AoT would likely require far more (the openai.logs file in this repo, for instance, is 15,306 tokens).
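For reference, the kind of search described above can be sketched in a few lines. This is my own minimal reconstruction, not the program GPT actually wrote: the function name `solve_24`, the `OPS` table, and the use of `Fraction` (to avoid floating-point rounding in divisions) are all assumptions. It only covers the single parenthesization (A^B)^(C^D), so it checks at most 24 permutations × 64 operator choices = 1,536 expressions.

```python
from fractions import Fraction
from itertools import permutations, product
import operator

# Hypothetical operator table; the symbols match the four basic operators.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def solve_24(numbers, target=24):
    """Return one expression of the form (A op B) op (C op D) equal to target, or None."""
    for a, b, c, d in permutations(numbers):
        for o1, o2, o3 in product(OPS, repeat=3):
            try:
                # Exact rational arithmetic, so 8 / 3 * 3 compares equal to 8.
                left = OPS[o1](Fraction(a), Fraction(b))
                right = OPS[o3](Fraction(c), Fraction(d))
                value = OPS[o2](left, right)
            except ZeroDivisionError:
                continue  # skip expressions that divide by zero
            if value == target:
                return f"({a} {o1} {b}) {o2} ({c} {o3} {d})"
    return None


if __name__ == "__main__":
    print(solve_24([14, 8, 8, 2]))
```

The search returns the first matching expression it encounters, which may differ from the one GPT reported. Puzzles whose only solutions need a different nesting, such as ((A^B)^C)^D, would be missed by this sketch.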