array_unique() with SORT_REGULAR returns duplicate values #20262

@jmarble

Description

The following code:

<?php
$units = ['5', '10', '5', '3A', '5', '5'];
$unique = array_unique($units, SORT_REGULAR);
print_r($unique);

Resulted in this output:

Array
(
    [0] => 5
    [1] => 10
    [3] => 3A
    [4] => 5
)

But I expected this output instead:

Array
(
    [0] => 5
    [1] => 10
    [3] => 3A
)

The value "5" appears at both indices [0] and [4] in the result.

This array represents apartment/unit numbers (e.g., Unit 5, Unit 10, Unit 3A), which is a standard format in building management systems where letter suffixes denote sub-units or variations.

Demonstration: https://3v4l.org/M5lcP


Root Cause

From analyzing PHP source code (ext/standard/array.c, Zend/zend_operators.c):

The algorithm:

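  1. Sort pointers to the array's elements using PHP's standard comparison, which for two strings is zendi_smart_strcmp(); that function calls is_numeric_string_ex()
  2. Walk through the sorted elements, comparing each one only with the last element that was kept
  3. Delete duplicates from a copy of the original array, which is returned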
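The three steps above can be approximated in userland PHP. This is a minimal sketch, not the C implementation: `unique_regular_sketch()` is a hypothetical name, `usort()` with the `<=>` operator stands in for the internal sort and comparison function, and when two values compare equal the later occurrence is dropped.

```php
<?php
// Hypothetical userland sketch of the sort-then-compare-with-last-kept strategy.
// It mirrors the idea of PHP_FUNCTION(array_unique), not its exact C code.
function unique_regular_sketch(array $input): array
{
    // Remember each value's key and its position in the input.
    $pairs = [];
    $pos = 0;
    foreach ($input as $key => $value) {
        $pairs[] = ['key' => $key, 'pos' => $pos++, 'value' => $value];
    }
    if ($pairs === []) {
        return [];
    }

    // Step 1: sort by value with PHP's standard comparison (what SORT_REGULAR uses).
    usort($pairs, fn(array $a, array $b): int => $a['value'] <=> $b['value']);

    // Steps 2 and 3: compare each element only with the last kept one and
    // delete duplicates from a copy of the input.
    $result = $input;
    $lastKept = $pairs[0];
    for ($i = 1, $n = count($pairs); $i < $n; $i++) {
        $current = $pairs[$i];
        if (($lastKept['value'] <=> $current['value']) !== 0) {
            $lastKept = $current; // different value: keep it
            continue;
        }
        // Equal values: drop whichever appeared later in the input.
        if ($lastKept['pos'] > $current['pos']) {
            unset($result[$lastKept['key']]);
            $lastKept = $current;
        } else {
            unset($result[$current['key']]);
        }
    }
    return $result;
}

$units = ['5', '10', '5', '3A', '5', '5'];
print_r(unique_regular_sketch($units));
```

Run on `$units`, it should keep keys 0, 1, 3 and 4, the same four elements as the reported output: the walk only ever looks at the previous kept element, so it inherits whatever order the sort produced.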

The bug:

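is_numeric_string_ex() (as called from zendi_smart_strcmp()) only reports a string as numeric when the whole string is numeric. "3A" is merely leading-numeric, so whenever "3A" is involved the comparison falls back to a byte-wise string comparison:

  • "5" vs "10" → both numeric → 5 < 10
  • "10" vs "3A" → string comparison → "10" < "3A"
  • "3A" vs "5" → string comparison → "3A" < "5"

Together these form a cycle ("5" < "10" < "3A" < "5"), so the comparison is not transitive and no order of these values satisfies every pairwise result. The sort therefore cannot guarantee that equal values end up adjacent. For this input it produces:

Sorted: ["5", "5", "10", "3A", "5", "5"]   (original indices 0, 2, 1, 3, 4, 5)

"3A" ends up AFTER "10" and before the last two "5" values, separating them from the first "5".

The deduplication walks through the sorted array, comparing each element with lastkept:

lastkept = position_0;  // "5" (index 0)
position_1 "5"  == "5"  → delete (index 2)
position_2 "10" != "5"  → keep, lastkept = position_2
position_3 "3A" != "10" → keep, lastkept = position_3
position_4 "5"  != "3A" → keep  ← Bug! Never compared to position_0
position_5 "5"  == "5"  → delete (index 5)

The flaw: The algorithm only compares with lastkept (the last unique value), not with all previous values. That is enough when the comparison is transitive, because equal values are then always adjacent after sorting, but here position 4's "5" is never compared back to position 0's "5".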
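The pairwise comparisons behind this can be checked directly with the `<=>` operator, which applies the same standard comparison that SORT_REGULAR uses for two strings. A small sketch:

```php
<?php
// Pairwise results of PHP's standard comparison for values from this report.
// "5" and "10" are both numeric strings, so they compare as numbers;
// "3A" is not a numeric string, so comparisons involving it are byte-wise.
$pairs = [['5', '10'], ['10', '3A'], ['3A', '5']];
foreach ($pairs as [$a, $b]) {
    printf("%-4s <=> %-4s = %2d\n", "\"$a\"", "\"$b\"", $a <=> $b);
}
// All three results are -1, i.e. "5" < "10" < "3A" < "5": a cycle, so no
// ordering of these values agrees with every pairwise comparison.
```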

Source files:

  • ext/standard/array.c - PHP_FUNCTION(array_unique)
  • Zend/zend_operators.c - zendi_smart_strcmp(), is_numeric_string_ex()

Comparison with SORT_STRING

<?php
$units = ['5', '10', '5', '3A', '5', '5'];
echo count(array_unique($units, SORT_REGULAR)) . "\n"; // 4 ✗ Wrong
echo count(array_unique($units, SORT_STRING)) . "\n";  // 3 ✓ Correct

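SORT_STRING compares values only as strings, with no numeric interpretation, so the comparison is consistent and every "5" is recognized as equal to the first one. In current PHP versions, array_unique() handles SORT_STRING with a hash table of already-seen strings rather than by sorting, so it does not depend on a sort order at all.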
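Because de-duplicating by string value only needs equality, not an ordering, it can be done without sorting. A minimal sketch assuming scalar values; `unique_by_string()` is a hypothetical helper, not a PHP function. It keeps the first occurrence of each value together with its original key, which for this input gives the same result as `array_unique($units, SORT_STRING)`.

```php
<?php
// Hypothetical helper: order-preserving de-duplication keyed on each value's
// string form, keeping the first occurrence. Assumes scalar values; since no
// sorting is involved, the non-transitive comparison never comes into play.
function unique_by_string(array $values): array
{
    $seen = [];   // string form => true
    $result = [];
    foreach ($values as $key => $value) {
        $str = (string) $value;
        if (!isset($seen[$str])) {
            $seen[$str] = true;
            $result[$key] = $value; // keep the original key, like array_unique()
        }
    }
    return $result;
}

print_r(unique_by_string(['5', '10', '5', '3A', '5', '5']));
```

Casting to string means that, for example, `"5"` and `5` count as duplicates, which matches how SORT_STRING treats them.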


Workaround

<?php
$units = ['5', '10', '5', '3A', '5', '5'];
$unique = array_unique($units, SORT_STRING);
print_r($unique); // [0] => 5, [1] => 10, [3] => 3A

PHP Version

PHP 8.4.13 (cli) (built: Sep 26 2025 00:45:36) (NTS clang 15.0.0)
Copyright (c) The PHP Group
Built by Laravel Herd
Zend Engine v4.4.13, Copyright (c) Zend Technologies
    with Zend OPcache v8.4.13, Copyright (c), by Zend Technologies

Operating System

No response
