@endolith

Add support for structured outputs (JSON schema, regex, choice, grammar,
and structural_tag constraints) using vLLM 0.11.0+'s
StructuredOutputsParams API.

Changes:

  • Import StructuredOutputsParams from vllm.sampling_params
  • Extract structured_outputs from extra_body in sampling_params
  • Create StructuredOutputsParams instances for json, regex, choice,
    grammar, and structural_tag constraint types
  • Reject deprecated guided_json parameter (from vLLM <0.11.0) with
    clear error message pointing to migration guide

This enables users to enforce JSON schemas and other structured output
formats on model responses, as documented in vLLM 0.11.0.

https://docs.vllm.ai/en/v0.11.0/features/structured_outputs.html

Fixes #235
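
The extraction and rejection steps listed under "Changes" could look roughly like the sketch below. It is not the code from this PR: the function name `extract_structured_outputs`, the error wording, and the "exactly one constraint" rule are assumptions made for illustration, and the final hand-off to `StructuredOutputsParams` is left as a comment so the sketch runs without vLLM installed.

```python
from typing import Any, Optional

# Constraint types named in the PR description.
CONSTRAINT_KEYS = ("json", "regex", "choice", "grammar", "structural_tag")

# The PR description names only guided_json as a rejected legacy parameter.
DEPRECATED_KEYS = ("guided_json",)


def extract_structured_outputs(sampling_params: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the keyword arguments for a single constraint, or None if none was requested.

    Hypothetical helper: the name and the validation rules are assumptions.
    """
    extra_body = sampling_params.get("extra_body") or {}

    # Fail loudly on the pre-0.11.0 parameter instead of silently ignoring it.
    for key in DEPRECATED_KEYS:
        if key in sampling_params or key in extra_body:
            raise ValueError(
                f"'{key}' is not supported with vLLM 0.11.0+; "
                "use extra_body.structured_outputs instead"
            )

    spec = extra_body.get("structured_outputs")
    if not spec:
        return None

    chosen = {k: v for k, v in spec.items() if k in CONSTRAINT_KEYS and v is not None}
    if len(chosen) != 1:
        # Assumption of this sketch: exactly one constraint type per request.
        raise ValueError(f"expected exactly one of {CONSTRAINT_KEYS}, got {sorted(chosen)}")

    # In the worker, this dict would presumably be passed on as
    # StructuredOutputsParams(**chosen), imported from vllm.sampling_params.
    return chosen


request = {"max_tokens": 16, "extra_body": {"structured_outputs": {"choice": ["yes", "no"]}}}
print(extract_structured_outputs(request))
```

Keeping the validation separate from the vLLM call makes the rejection path easy to test without a GPU or a model.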

@endolith
Author

endolith commented Nov 15, 2025

This was written by AI but seems to work, with the possible exception of the structural_tag. I need to check more carefully. It also doesn't enforce field order in JSON, which might be a vLLM issue.

@endolith
Author

endolith commented Nov 17, 2025

> with the possible exception of the structural_tag.

I think this was just the AI misunderstanding things when it verified the outputs? structural_tag only constrains the text between the start and end tags, but allows arbitrary text outside of them?
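
If that reading is right, a verification script should only validate the text between the tags and ignore everything else. A minimal sketch of such a check, assuming hypothetical `<answer>` and `</answer>` tags and JSON content inside them (neither comes from this PR):

```python
import json
import re


def tagged_spans(text: str, begin: str, end: str) -> list[str]:
    """Return the text found between each begin/end tag pair."""
    pattern = re.escape(begin) + r"(.*?)" + re.escape(end)
    return re.findall(pattern, text, flags=re.DOTALL)


def spans_are_json(text: str, begin: str, end: str) -> bool:
    """True if at least one tagged span exists and every span parses as JSON."""
    spans = tagged_spans(text, begin, end)
    if not spans:
        return False
    for span in spans:
        try:
            json.loads(span)
        except json.JSONDecodeError:
            return False
    return True


# Free text before and after the tags is deliberately not checked.
output = 'Sure, here you go: <answer>{"city": "Paris"}</answer> Hope that helps!'
print(tagged_spans(output, "<answer>", "</answer>"))
print(spans_are_json(output, "<answer>", "</answer>"))
```

A checker that instead tries to parse the whole response would report a failure here even though the constrained region is valid.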

I tested it with a bunch of contradictory prompts to prove that the output was actually being constrained to the schema:

There are some discrepancies, like:

  • In the "JSON schema with optional fields" example, the LLM never seems to use the optional fields, even when specifically told to.
  • Also, the "JSON object" mode seems to allow extraneous text outside the JSON object, and I'm not sure that is intended.

But those are probably all vLLM or Outlines/LMFE issues, not worker-vllm issues?

There may also be an issue with newlines not being escaped correctly in JSON output? I'm not sure if that's the fault of my script, or worker-vllm, or vLLM itself, or Outlines/LMFE.

If I go to the Runpod Requests tab and send this request:

{
  "input": {
    "messages": [
      {
        "role": "system",
        "content": "IMPORTANT: When outputting JSON, use actual newlines within string values, not escaped \\n characters. For example, use:\n{\n  \"text\": \"This is line one\nThis is line two\nThis is line three\"\n}\nNever use \\n escapes in your JSON string values."
      },
      {
        "role": "user",
        "content": "Output a JSON object with a multi-line text field describing the Eiffel Tower"
      }
    ],
    "sampling_params": {
      "max_tokens": 200,
      "temperature": 0.1,
      "extra_body": {
        "structured_outputs": {
          "json": {
            "type": "object",
            "properties": {
              "description": {
                "type": "string",
                "description": "Multi-line description"
              },
              "year_built": {
                "type": "integer"
              }
            },
            "required": [
              "description",
              "year_built"
            ]
          }
        }
      }
    }
  }
}

it ignores the prompt and still escapes newlines:

{
  "delayTime": 61,
  "executionTime": 4479,
  "id": "7d5bacca-3d9f-4bc8-84bf-f1d44d078bd8-u2",
  "output": [
    {
      "choices": [
        {
          "tokens": [
            "{\n  \"description\": \"The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France. It is named after the engineer Gustave Eiffel, whose company designed and built the tower.\\n\\nStanding at 330 meters tall, it was the tallest man-made structure in the world until the completion of the Chrysler Building in New York in 1930. The Eiffel Tower is now a global cultural icon of France and one of the most recognizable structures in the world.\\n\\nThe tower has three levels for visitors, with restaurants on the first and second levels. The top level's upper platform is 276 meters above the ground, accessible by staircase or elevator. The tower's design includes a series of open-lattice platforms, which allow wind to pass through and reduce the swaying of the structure.\\n\\"
          ]
        }
      ],
      "usage": {
        "input": 90,
        "output": 200
      }
    }
  ],
  "status": "COMPLETED",
  "workerId": "1ut1hgflhf242q"
}

so I'm not sure why I'm seeing unescaped newlines occasionally. I made tests for them and they all passed.
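
One possible source of confusion is that there are two layers of JSON here: the worker response is JSON, and the model's text is a second JSON document stored inside a string. The sketch below uses a shortened, made-up payload (not the real response) to show how the same text contains both raw newlines, as whitespace between fields, and escaped newlines, inside string values. It also shows that a raw newline inside a string value is rejected by Python's `json` parser, which would be consistent with a JSON-constrained decoder escaping them regardless of the prompt.

```python
import json


def parses(text: str) -> bool:
    """True if text is a complete, valid JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Outer layer: hypothetical, trimmed stand-in for the worker response.
outer = r'{"tokens": ["{\n  \"description\": \"line one\\nline two\",\n  \"year_built\": 1889\n}"]}'

# Decoding the outer layer yields the model's text, which is itself JSON.
inner_text = json.loads(outer)["tokens"][0]
print(inner_text)

# Raw newlines appear only as whitespace between fields, which JSON allows.
print(inner_text.count("\n"))

# Inside the string value the newline is the two characters backslash + n.
print("\\n" in inner_text)

# Decoding the inner layer turns that escape into a real newline.
inner = json.loads(inner_text)
print(repr(inner["description"]))

# A raw newline inside a string value is not valid JSON.
raw_newline = '{"description": "line one\nline two"}'
print(parses(raw_newline))
```

A script that decodes only the outer layer will therefore see raw newlines in the text even when every string value is escaped correctly. Separately, the response above reports 200 output tokens, which matches max_tokens, and its text stops mid-string, so that particular output would not parse as complete JSON whatever the escaping.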

Development

Successfully merging this pull request may close these issues.

Guided decoding backends not working - both old and new APIs fail
