Description
The following code:
<?php
$units = ['5', '10', '5', '3A', '5', '5'];
$unique = array_unique($units, SORT_REGULAR);
print_r($unique);
Resulted in this output:
Array
(
[0] => 5
[1] => 10
[3] => 3A
[4] => 5
)
But I expected this output instead:
Array
(
[0] => 5
[1] => 10
[3] => 3A
)
The value "5" appears at both indices [0] and [4] in the result.
This array represents apartment/unit numbers (e.g., Unit 5, Unit 10, Unit 3A), which is a standard format in building management systems where letter suffixes denote sub-units or variations.
Demonstration: https://3v4l.org/M5lcP
Root Cause
From analyzing the PHP source code (ext/standard/array.c, Zend/zend_operators.c):
The algorithm:
- Sort the elements using the SORT_REGULAR comparison; for two strings this is zendi_smart_strcmp(), which calls is_numeric_string_ex()
- Walk through the sorted array, comparing only adjacent elements
- Delete the duplicates found from the original array
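The three steps above can be modeled in userland PHP for experimentation. This is a minimal sketch, not the C implementation: unique_sort_adjacent() is a hypothetical helper, and it assumes that usort() with the <=> operator, with ties broken by original key, reproduces the comparison array_unique() uses for SORT_REGULAR.

```php
<?php
// Hypothetical userland model of array_unique($input, SORT_REGULAR);
// the real implementation is C code in ext/standard/array.c.
function unique_sort_adjacent(array $input): array
{
    // Step 1: sort (key, value) pairs by value using PHP's standard
    // comparison (<=>), breaking ties by original key.
    $pairs = [];
    foreach ($input as $key => $value) {
        $pairs[] = [$key, $value];
    }
    usort($pairs, fn(array $x, array $y): int =>
        ($x[1] <=> $y[1]) ?: ($x[0] <=> $y[0]));

    // Steps 2 and 3: compare each value only with the last kept value and
    // remove the duplicates found from a copy of the original array.
    $result = $input;
    $lastKept = null;
    foreach ($pairs as $i => [$key, $value]) {
        if ($i > 0 && $value == $lastKept) {
            unset($result[$key]);   // equal to its sorted neighbour: drop it
        } else {
            $lastKept = $value;     // start of a new group of equal values
        }
    }
    return $result;
}

$units = ['5', '10', '5', '3A', '5', '5'];
print_r(unique_sort_adjacent($units));
```

On PHP 8.x this model is expected to keep the same keys as array_unique($units, SORT_REGULAR), including the second "5", because both go through the same sort routine with the same comparison results.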
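The bug:
zendi_smart_strcmp() compares two strings numerically only when is_numeric_string_ex() accepts both of them as numeric strings. "5" and "10" are numeric strings; "3A" is not (the trailing letter makes it non-numeric), so every comparison involving "3A" falls back to a byte-wise string comparison:
- "5" vs "10" → numeric: 5 < 10
- "10" vs "3A" → string: "1" < "3", so "10" < "3A"
- "3A" vs "5" → string: "3" < "5", so "3A" < "5"
Taken together: "5" < "10" < "3A" < "5". The comparison is not transitive, so no ordering of the array is consistent with it, and the sorted result depends on which pairs the sort happens to compare. For this input the sort produces:
Sorted: ["5", "5", "10", "3A", "5", "5"] (original keys 0, 2, 1, 3, 4, 5)
"3A" ends up after "10" and before the last two "5" values, separating them from the first group of "5" values.
The deduplication walks through comparing adjacent elements:
lastkept = position_0; // "5" (key 0)
position_1 "5" (key 2) == "5" → delete
position_2 "10" (key 1) != "5" → keep, lastkept = position_2
position_3 "3A" (key 3) != "10" → keep, lastkept = position_3
position_4 "5" (key 4) != "3A" → keep ← Bug! Never compared to position_0
position_5 "5" (key 5) == "5" → delete
This leaves keys 0, 1, 3 and 4, matching the output above.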
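The comparison cycle can be checked directly, without array_unique(). Assuming the <=> operator uses the same comparison as SORT_REGULAR (both go through zend_compare()), this minimal sketch prints the pairwise results for the three distinct values in $units.

```php
<?php
// Pairwise results of PHP's standard comparison for the three distinct
// values in $units. "5" and "10" are numeric strings and compare as numbers;
// "3A" is not numeric, so pairs involving it compare as plain strings.
$pairs = [['5', '10'], ['10', '3A'], ['3A', '5']];

$results = [];
foreach ($pairs as [$a, $b]) {
    $cmp = $a <=> $b;
    $results[] = $cmp;
    echo "\"$a\" <=> \"$b\" = $cmp\n";
}
// Each line ends in -1, so "5" < "10" < "3A" < "5": a cycle.
```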
The flaw: the algorithm only compares each element with lastkept (the last unique value), not with all previous values. Position 4's "5" is never compared back to position 0's "5".
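For contrast, a deduplication that compares each value with every value kept so far, rather than only with lastkept, does not depend on sort order at all. This sketch uses a hypothetical helper, unique_pairwise(), and loose equality (==), which for two strings applies the same numeric-or-string rule as SORT_REGULAR.

```php
<?php
// Hypothetical helper: O(n^2) deduplication that checks each value against
// every value kept so far. Keeps the first occurrence and preserves keys,
// like array_unique().
function unique_pairwise(array $input): array
{
    $kept = [];
    foreach ($input as $key => $value) {
        foreach ($kept as $seen) {
            if ($seen == $value) {
                continue 2;   // duplicate of an earlier value: skip it
            }
        }
        $kept[$key] = $value;
    }
    return $kept;
}

$units = ['5', '10', '5', '3A', '5', '5'];
print_r(unique_pairwise($units));   // keys 0, 1 and 3: "5", "10", "3A"
```

Being quadratic, it only illustrates the expected semantics; for string data such as unit numbers, the SORT_STRING workaround is the practical fix.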
Source files:
ext/standard/array.c - PHP_FUNCTION(array_unique)
Zend/zend_operators.c - zendi_smart_strcmp(), is_numeric_string_ex()
Comparison with SORT_STRING
<?php
$units = ['5', '10', '5', '3A', '5', '5'];
echo count(array_unique($units, SORT_REGULAR)) . "\n"; // 4 ✗ Wrong
echo count(array_unique($units, SORT_STRING)) . "\n"; // 3 ✓ Correct
SORT_STRING compares the values purely as strings, with no numeric interpretation. That comparison is consistent (transitive), so equal values are always recognized as duplicates.
Workaround
<?php
$units = ['5', '10', '5', '3A', '5', '5'];
$unique = array_unique($units, SORT_STRING); // SORT_STRING is also array_unique()'s default flag
PHP Version
PHP 8.4.13 (cli) (built: Sep 26 2025 00:45:36) (NTS clang 15.0.0)
Copyright (c) The PHP Group
Built by Laravel Herd
Zend Engine v4.4.13, Copyright (c) Zend Technologies
with Zend OPcache v8.4.13, Copyright (c), by Zend Technologies
Operating System
No response