Would love to see results for GPT-4o. There have been some claims that its abilities improved: http://nian.llmonpy.ai/