Bug in GRPO advantage calculation: Extra brackets in std computation

There's a bug in the GRPO advantage calculation at line 159 of `verl/trainer/core_algos.py`. The standard deviation computation wraps the score list in an extra pair of brackets, which gives the resulting tensor an unintended extra leading dimension.

# Line 159
id2std[idx] = torch.std(torch.tensor([id2score[idx]]))

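`id2score[idx]` is already a list, so wrapping it in additional brackets as `[id2score[idx]]` creates a nested list. `torch.tensor` then builds a 2-D tensor of shape `(1, N)` instead of a 1-D tensor of shape `(N,)`, where `N` is the number of scores in the list.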
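
The shape difference can be reproduced outside the trainer with a minimal sketch. The `scores` list below is a hypothetical stand-in for `id2score[idx]`, assuming it holds plain scalar scores; it is not taken from the verl code.

```python
import torch

# Hypothetical stand-in for id2score[idx]: a list of per-sample scores.
scores = [1.0, 2.0, 3.0]

# With the extra brackets, as on line 159: a nested list becomes a 2-D tensor.
nested = torch.tensor([scores])
# Without the extra brackets: a flat list becomes a 1-D tensor.
flat = torch.tensor(scores)

nested_shape = tuple(nested.shape)
flat_shape = tuple(flat.shape)
print(nested_shape)  # (1, 3)
print(flat_shape)    # (3,)

# Standard deviation of each tensor, called the same way as on line 159.
std_nested = torch.std(nested)
std_flat = torch.std(flat)
print(std_nested, std_flat)
```

Note that `torch.std` called without a `dim` argument reduces over all elements, so the two values agree for this toy list. The sketch only demonstrates the shape difference; it does not show the effect on the computed advantages.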