Serialize: Respect attribute type for empty arrays #8789

Open: wants to merge 5 commits into base: trunk

Contributor

@ockham ockham commented May 9, 2025

In serialize_attributes, if the value of an attribute is an empty array(), it will always be JSON-encoded into [] even though it could be an associative array (which should be encoded into {}) — but that distinction just doesn’t exist for empty arrays in PHP. In other words, PHP simply doesn’t have enough (meta) information to know what it should encode an empty array() into.
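The ambiguity described above can be reproduced with plain `json_decode()` and `json_encode()`, without WordPress. A minimal sketch; the attribute names `style` and `ids` are made up for illustration:

```php
<?php
// Decode attribute-like JSON in associative mode, then encode it again.
$original = '{"style":{},"ids":[]}';
$decoded  = json_decode( $original, true );

// Both values are now the same PHP value: an empty array.
var_dump( $decoded['style'] === $decoded['ids'] ); // bool(true)

// Re-encoding cannot recover the distinction; both become [].
$reencoded = json_encode( $decoded );
echo $reencoded, "\n"; // {"style":[],"ids":[]}

// Only an explicit object is encoded as {}.
$explicit = json_encode( array( 'style' => new stdClass(), 'ids' => array() ) );
echo $explicit, "\n"; // {"style":{},"ids":[]}
```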

So our best bet is to infer the attribute type from the block definition. This isn't great, since serialize_attributes is a low-level function that was previously able to operate without any knowledge about higher-level block semantics, but alas.
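As a rough illustration of inferring the type from the block definition (not the PR's actual code), the sketch below takes the attribute definitions as a plain array shaped like the `attributes` section of block.json. The helper name `example_fix_empty_object_attributes` and the sample attributes are hypothetical:

```php
<?php
/**
 * Hypothetical helper: replaces top-level empty arrays with empty objects
 * when the attribute definition declares the type "object".
 *
 * @param array $attributes            Parsed block attributes.
 * @param array $attribute_definitions Map of attribute name => definition.
 * @return array Attributes ready for json_encode().
 */
function example_fix_empty_object_attributes( array $attributes, array $attribute_definitions ) {
    foreach ( $attributes as $name => $value ) {
        $declared_type = $attribute_definitions[ $name ]['type'] ?? null;
        if ( array() === $value && 'object' === $declared_type ) {
            $attributes[ $name ] = new stdClass();
        }
    }
    return $attributes;
}

// Assumed sample data, for illustration only.
$definitions = array(
    'style' => array( 'type' => 'object' ),
    'ids'   => array( 'type' => 'array' ),
);
$attributes = array(
    'style'   => array(),
    'ids'     => array(),
    'unknown' => array(), // No definition available: left as [].
);

$encoded = json_encode( example_fix_empty_object_attributes( $attributes, $definitions ) );
echo $encoded, "\n"; // {"style":{},"ids":[],"unknown":[]}
```

Only top-level attributes with a known definition are corrected here; nested values and unregistered blocks are left untouched.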

Note that this also violates function signature parity of serialize_attributes (PHP) and serializeAttributes (JS), as the PHP version needs to know the block name in order to look up its block definition. Maybe the violation is permissible here, but I’d like to hear @dmsnell’s opinion before proceeding. Alternatively, we can move the logic I’m introducing here one level up, i.e. into get_comment_delimited_block_content. This should still be sufficient for all practical purposes; it is also currently the only call site of serialize_attributes in the entire Core codebase.

Trac ticket: https://core.trac.wordpress.org/ticket/63325
Gutenberg issue: WordPress/gutenberg#69959
Prior art: #8735 h/t @margolisj


This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

github-actions bot commented May 9, 2025

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props bernhard-reiter, mamaduka, gziolo, wildworks, getsyash, dmsnell.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

// An empty `array()` is encoded as `[]` in JSON. However, it's possible
// that the attribute type is really an object (associative array in PHP),
// so we need to check for that.
$block_type = WP_Block_Type_Registry::get_instance()->get_registered( $block_name );
Member

Nit: It's probably better to move this outside of the loop. There's no reason to get a block type instance on each iteration.

Contributor Author

Ohh that's right 😅
The reason I put it there was to avoid the block type lookup altogether if there was no empty array at all. But we can probably just cache it after it's first been looked up.

Member

The registry is already working like a memory cache, right? The lookup shouldn't be a problem.

@Mamaduka Mamaduka requested a review from aaronjorbin May 9, 2025 08:44
@ockham ockham self-assigned this May 9, 2025
@gziolo
Member

gziolo commented May 9, 2025

> Alternatively, we can move the logic I’m introducing here one level up, i.e. into get_comment_delimited_block_content. This should still be sufficient for all practical purposes; it is also currently the only callsite of serialize_attributes in the entire Core codebase.

That sounds more compelling to me, as this would leave serialize_attributes with the old signature, helping to avoid adding somewhat confusing behavior. I was looking at the code, and it doesn't look like it would be straightforward to map an empty array to an empty object during block parsing. It feels like that would be the best place to ensure proper type recognition.

@ockham
Contributor Author

ockham commented May 9, 2025

BTW this still doesn't solve the problem for empty arrays that are nested inside objects, which was the original example in WordPress/gutenberg#69959, e.g.

<!-- wp:nectar-blocks/button {"blockId":"block-RX3iloq4aK","bgColor":{"desktop":{"type":"solid","solidValue":"#2c80af","gradientValue":""},"tablet":{},"mobile":{},"hover":{}}} -->
<div class="wp-block-nectar-blocks-button nectar-blocks-button nectar-font-label" id="block-RX3iloq4aK"><a href="#" class="nectar__link nectar-blocks-button__inner"><span class="nectar-blocks-button__text"></span></a></div>
<!-- /wp:nectar-blocks/button -->

Note that bgColor is an object; its tablet, mobile, and hover properties are all empty objects ({}). These will currently get transformed into []s, as block.json doesn't contain "full" schema information about object-type attributes.

The only piece of information that we might be able to use is the default value, e.g.

"query": {
    "type": "object",
    "default": {
        "perPage": null,
        "pages": 0,
        "offset": 0,
        "postType": "post",
        "order": "desc",
        "orderBy": "date",
        "author": "",
        "search": "",
        "exclude": [],
        "sticky": "",
        "inherit": true,
        "taxQuery": null,
        "parents": [],
        "format": []
    }
},

(In this example, we can assume that the exclude, parents, and format types are all arrays.)

Might be material for a follow-up, though, as it gets us into slightly more "heuristic" territory. Also, ironically, this is plagued by the same problem as the original bug: If the default is a {}, PHP will transform it into array(). So we'll need to do some extra legwork to figure out the exact property type in the default value; and we can only do so if it's coming from a block.json rather than an "ad-hoc" register_block_type.
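A sketch of what that extra legwork might look like, assuming the raw block.json text is available and is decoded in object mode so that `{}` and `[]` in the default stay distinguishable. The helper name `example_collect_container_types`, the dot-separated path format, and the sample JSON are all hypothetical:

```php
<?php
/**
 * Hypothetical helper: walks a default value decoded in object mode and
 * records, per dot-separated path, whether it was a JSON object or array.
 * Arrays are recorded but not descended into.
 */
function example_collect_container_types( $value, $path = '', array &$types = array() ) {
    if ( $value instanceof stdClass ) {
        $types[ $path ] = 'object';
        foreach ( get_object_vars( $value ) as $key => $child ) {
            $child_path = '' === $path ? (string) $key : $path . '.' . $key;
            example_collect_container_types( $child, $child_path, $types );
        }
    } elseif ( is_array( $value ) ) {
        $types[ $path ] = 'array';
    }
    return $types;
}

// Assumed excerpt of a block.json file, for illustration only.
$block_json = '{"attributes":{"bgColor":{"type":"object","default":'
    . '{"desktop":{"type":"solid"},"tablet":{},"exclude":[]}}}}';

// Object mode (no second argument) keeps JSON objects as stdClass.
$schema = json_decode( $block_json );
$types  = example_collect_container_types( $schema->attributes->bgColor->default );

print_r( $types );
// The root path '' and the paths 'desktop' and 'tablet' map to 'object';
// 'exclude' maps to 'array'.
```

This stays heuristic: it only knows about properties that appear in the default, and only for blocks whose definition comes from a block.json file.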

@t-hamano

t-hamano commented May 9, 2025

> This should still be sufficient for all practical purposes; it is also currently the only callsite of serialize_attributes in the entire Core codebase.

If I understand correctly, if consumers do not specify the optional second argument, the return value will be the same as before, right? Because this function seems to be used by some plugins.

https://wpdirectory.net/search/01JTTHDK6PXHAWX44Z2HXE3SEH

@getsyash

Thanks for tackling this — it's a subtle edge case with real consequences, especially for block attributes that depend on accurate type preservation.
I agree that introducing $block_name into serialize_attributes() complicates its previously low-level, stateless nature. Moving this logic into get_comment_delimited_block_content() seems like a cleaner solution and keeps serialize_attributes() generic and reusable.
Also appreciate the note about nested empty arrays — definitely sounds like it warrants a follow-up, maybe with enhancements to how block schemas are defined or parsed (especially from block.json).

@gziolo
Member

gziolo commented May 12, 2025

> BTW this still doesn't solve the problem for empty arrays that are nested inside objects, which was the original example in WordPress/gutenberg#69959, e.g.

One way to solve it would be to introduce another field similar to enum that would allow defining the JSON schema describing the shape of the objects and arrays. In fact, it wouldn't differ that much from what enum offers today, but with greater flexibility.

@gziolo
Member

gziolo commented May 12, 2025

The root cause lies in how block attributes get decoded from the serialized JSON:

? json_decode( $matches['attrs'][0], /* as-associative */ true )

In effect, empty arrays and empty objects become indistinguishable in PHP.

The same case in JavaScript gets handled differently:

https://github.com/WordPress/gutenberg/blob/918d152f25248a53e3fedd1ec3af2eadeb038e9d/packages/block-serialization-default-parser/src/index.js#L415

https://github.com/WordPress/gutenberg/blob/918d152f25248a53e3fedd1ec3af2eadeb038e9d/packages/block-serialization-default-parser/src/index.js#L373

Example:

Screenshot 2025-05-12 at 11 27 27

By the way, I don't think we can change the associative flag in PHP at this point as a measure to fix the issue. The return shape of the parsed JSON changes too substantially, as illustrated here (true vs false):

Screenshot 2025-05-12 at 11 36 38
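The difference in return shape between the two modes of `json_decode()` can be seen in a small standalone example. The sample JSON is made up and is not taken from the screenshots:

```php
<?php
$json = '{"bgColor":{"tablet":{},"stops":[]}}';

$as_arrays  = json_decode( $json, true );  // Associative mode.
$as_objects = json_decode( $json, false ); // Object mode.

// Associative mode: every container is a PHP array.
var_dump( is_array( $as_arrays['bgColor'] ) );           // bool(true)
var_dump( $as_arrays['bgColor']['tablet'] === array() ); // bool(true)
var_dump( $as_arrays['bgColor']['stops'] === array() );  // bool(true)

// Object mode: JSON objects become stdClass, so {} and [] stay distinct,
// but array access such as $attrs['bgColor'] no longer works.
var_dump( $as_objects->bgColor instanceof stdClass );         // bool(true)
var_dump( $as_objects->bgColor->tablet instanceof stdClass ); // bool(true)
var_dump( is_array( $as_objects->bgColor->stops ) );          // bool(true)

$from_arrays  = json_encode( $as_arrays );
$from_objects = json_encode( $as_objects );
echo $from_arrays, "\n";  // {"bgColor":{"tablet":[],"stops":[]}}
echo $from_objects, "\n"; // {"bgColor":{"tablet":{},"stops":[]}}
```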

@ockham
Contributor Author

ockham commented May 14, 2025

> > BTW this still doesn't solve the problem for empty arrays that are nested inside objects, which was the original example in WordPress/gutenberg#69959, e.g.
>
> One way to solve it would be to introduce another field similar to enum that would allow defining the JSON schema describing the shape of the objects and arrays. In fact, it wouldn't differ that much from what enum offers today, but with greater flexibility.

Indeed. But that sounds like more of a longer-term solution to me, as it'll require updating the block.json schema (to allow a JSON schema for its attribute types), updating Core blocks to specify that attribute type schema where appropriate, and 3rd party block authors adopting it as well.

@dmsnell
Member

dmsnell commented May 14, 2025

What happens if we skip serializing empty arrays? I’m unnerved for the same reasons you are about reading the block registry on serialization:

  • it doesn’t solve this problem; it only helps in some cases where the block attribute definitions are available and where the attributes are at the top-level
  • it couples a behavior that works universally with one that’s now highly dependent on the runtime environment, because my system will serialize blocks differently than yours and my system might even serialize differently based on what plugins are activated, meaning this is different now than it was an hour ago

I’d be interested in approaching this from the other side: let’s declare a hidden attribute on all parsed map types and use the presence of that attribute upon serialization. Though, this is still incomplete because people will be running things like $block['attrs']['settings'] = []

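// decoding attributes on parse
// (walk_parsed_attributes() is a placeholder for a traversal that yields each value by reference)
foreach ( walk_parsed_attributes() as $name => &$value ) {
   if ( $value instanceof stdClass ) {
      // Cast to an array first; stdClass does not support array access.
      $value = (array) $value;
      $value['__wp_meta__is_js_map'] = true;
   }
}
unset( $value ); // break the reference left over from the loop

// serializing
if ( is_array( $value ) && isset( $value['__wp_meta__is_js_map'] ) ) {
   unset( $value['__wp_meta__is_js_map'] );
   // PHP has no (stdClass) cast; (object) produces a stdClass instance.
   $value = (object) $value;
}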

@ockham
Contributor Author

ockham commented May 15, 2025

@dmsnell Interesting idea 🤔 I was trying to find a way to add that sort of meta information (i.e. array or object?) but didn't think of that. I like it, it's pretty straightforward!

I'll give that a try 👍 (Not sure when I'll get around to it; I'm about to wrap up my week and will be pretty busy next week.)

@gziolo
Member

gziolo commented May 15, 2025

Very interesting idea. Unless I misunderstood what you proposed, my only concern is that the problem exists at the JSON parsing level in PHP, so this meta information would have to be serialized in HTML on the client. In effect, it would also need to get removed during parsing on the client before passing the data to the block editor. It might require very detailed testing to ensure it doesn't break anything when this special property leaks somewhere.

@dmsnell
Member

dmsnell commented May 19, 2025

@gziolo the JS side doesn’t have this problem because it does differentiate between empty arrays and empty objects. In fact, we also don’t have this problem when calling json_decode() in PHP because we can tell json_decode() to return stdClass for objects and array for arrays. so if we use json_decode( $json, false ) we now retrieve a nested object hierarchy which we can convert into numeric and associative arrays. it does involve some overhead in the conversion, but I would think not much

on serialize we erase any residual of this meta-typing
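Erasing the marker on serialize could look roughly like the sketch below. It reuses the `__wp_meta__is_js_map` key from the earlier snippet in this thread; the function name `example_restore_js_maps` and the sample attributes are hypothetical:

```php
<?php
/**
 * Hypothetical inverse of the parse-time marking: arrays carrying the marker
 * key become stdClass (with the marker removed); other arrays stay arrays.
 */
function example_restore_js_maps( $data ) {
    if ( ! is_array( $data ) ) {
        return $data;
    }

    $is_map = array_key_exists( '__wp_meta__is_js_map', $data );
    unset( $data['__wp_meta__is_js_map'] );

    foreach ( $data as $key => $value ) {
        $data[ $key ] = example_restore_js_maps( $value );
    }

    return $is_map ? (object) $data : $data;
}

// Assumed sample: attributes as they might look after marked parsing.
$attrs = array(
    'bgColor'              => array(
        'tablet'               => array( '__wp_meta__is_js_map' => true ),
        'stops'                => array(),
        '__wp_meta__is_js_map' => true,
    ),
    '__wp_meta__is_js_map' => true,
);

$serialized = json_encode( example_restore_js_maps( $attrs ) );
echo $serialized, "\n"; // {"bgColor":{"tablet":{},"stops":[]}}
```

An array assigned later without the marker, such as `$block['attrs']['settings'] = []`, would still be encoded as `[]`, which is the incompleteness noted earlier in the thread.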


care of mlx-community/Qwen3-32B

/**
 * Recursively converts a nested stdClass object into an associative array.
 * Every stdClass object converted into an array will include a "__js_map" => true
 * marker key to indicate that conversion took place.
 *
 * @param mixed $data The input data structure which may include:
 *                    - stdClass objects
 *                    - arrays
 *                    - scalars (string, bool, null, etc.)
 *
 * @return mixed The transformed structure with stdClass objects converted into arrays
 */
function convertToAssociativeArray($data) {
    if (is_object($data) && get_class($data) === 'stdClass') {
        // Initialize result array
        $arrayResult = [];

        // Recursively convert each property of the object
        foreach ($data as $key => $value) {
            $arrayResult[$key] = convertToAssociativeArray($value);
        }

        // Add the "__js_map" marker after processing all properties
        $arrayResult['__js_map'] = true;

        return $arrayResult;
    }

    if (is_array($data)) {
        // Recursively process each element of the array
        $result = [];
        foreach ($data as $key => $value) {
            $result[$key] = convertToAssociativeArray($value);
        }
        return $result;
    }

    // Return scalars, nulls, etc., unchanged
    return $data;
}

$obj = new stdClass();
$obj->name = "Alice";
$obj->age = 30;
$obj->tags = ['developer', 'PHP'];
$obj->metadata = new stdClass();
$obj->metadata->role = "Senior";
$obj->metadata->active = true;

$result = convertToAssociativeArray($obj);

print_r($result);

Array
(
    [name] => Alice
    [age] => 30
    [tags] => Array
        (
            [0] => developer
            [1] => PHP
        )

    [metadata] => Array
        (
            [role] => Senior
            [active] => 1
            [__js_map] => 1
        )

    [__js_map] => 1
)

we would ideally try to think of a non-recursive version, but this would be okay with better naming.
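One possible shape for a non-recursive version is an explicit stack of references that is processed in a loop. This is only a sketch: the function name `example_convert_without_recursion` is hypothetical, the `__js_map` marker follows the example above, and `array_key_last()` requires PHP 7.3 or later:

```php
<?php
/**
 * Hypothetical iterative variant: converts nested stdClass objects into
 * arrays marked with "__js_map" => true, using a stack instead of recursion.
 */
function example_convert_without_recursion( $data ) {
    $root    = $data;
    $stack   = array();
    $stack[] = &$root;

    while ( ! empty( $stack ) ) {
        // Pop the most recently pushed reference.
        $last = array_key_last( $stack );
        $node = &$stack[ $last ];
        unset( $stack[ $last ] );

        if ( $node instanceof stdClass ) {
            // Writes through the reference, replacing the object in place.
            $node             = get_object_vars( $node );
            $node['__js_map'] = true;
        }

        if ( is_array( $node ) ) {
            foreach ( $node as &$child ) {
                if ( is_array( $child ) || $child instanceof stdClass ) {
                    $stack[] = &$child; // Visit nested containers later.
                }
            }
            unset( $child );
        }
        unset( $node );
    }

    return $root;
}

$input  = json_decode( '{"name":"Alice","tags":["developer","PHP"],"metadata":{"role":"Senior","empty":{}}}' );
$result = example_convert_without_recursion( $input );

echo json_encode( $result ), "\n";
// {"name":"Alice","tags":["developer","PHP"],"metadata":{"role":"Senior","empty":{"__js_map":true},"__js_map":true},"__js_map":true}
```

The input object is not modified, since each stdClass is replaced by an array copy of its properties in the working structure.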

6 participants