SOP-bench

SOP-bench is a benchmark for evaluating LLM agents on real-world standard operating procedures (SOPs). We will release the dataset and metrics in 2024.