⚡️ Speed up function `_negotiate_grid_size` by 328% #788

codeflash-ai · 2025-12-03T23:11:32Z

📄 328% (3.28x) speedup for `_negotiate_grid_size` in `inference/core/utils/drawing.py`

⏱️ Runtime : 452 microseconds → 106 microseconds (best of 41 runs)

📝 Explanation and details

The optimized code achieves a 328% speedup through three key optimizations:

**1. Replaced `np.sqrt()` with `math.sqrt()`**
The original code used `math.ceil(np.sqrt(len(images)))`, which is significantly slower than `math.ceil(math.sqrt(images_len))`. NumPy's `sqrt` carries overhead for dtype checking and array handling even when called on a scalar, while `math.sqrt()` is optimized for scalar operations.

**2. Replaced the while loop with direct arithmetic**
The original code used an iterative approach to find the minimum number of rows:

while proposed_columns * (proposed_rows - 1) >= len(images):
    proposed_rows -= 1

This was replaced with a direct calculation using ceiling division:

proposed_rows = (images_len + proposed_columns - 1) // proposed_columns

This eliminates 1-39 loop iterations (averaging ~0.36 iterations per call based on profiler data).

**3. Cached `len(images)` in `images_len`**
The original code called `len(images)` multiple times (3-4 times per function call). Caching the value in a local variable avoids the repeated calls.
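
The equivalence claimed in the second optimization can be checked with a small sketch. Both functions below are reconstructions from the description above, not the code in `inference/core/utils/drawing.py`: the names `negotiate_grid_size_loop`, `negotiate_grid_size_direct`, and `variants_agree` are hypothetical, the constant value is assumed from the generated tests, and the loop variant uses `math.sqrt()` so the sketch needs no NumPy.

```python
import math
from typing import Sequence, Tuple

# Assumed value, mirroring the constant the generated tests in this PR define.
MAX_COLUMNS_FOR_SINGLE_ROW_GRID = 3


def negotiate_grid_size_loop(images: Sequence) -> Tuple[int, int]:
    """Loop-based variant, reconstructed from the description (hypothetical name)."""
    if len(images) <= MAX_COLUMNS_FOR_SINGLE_ROW_GRID:
        return 1, len(images)
    nearest_sqrt = math.ceil(math.sqrt(len(images)))
    proposed_columns = nearest_sqrt
    proposed_rows = nearest_sqrt
    # Drop a row while the grid would still hold every image without it.
    while proposed_columns * (proposed_rows - 1) >= len(images):
        proposed_rows -= 1
    return proposed_rows, proposed_columns


def negotiate_grid_size_direct(images: Sequence) -> Tuple[int, int]:
    """Closed-form variant using ceiling division (hypothetical name)."""
    images_len = len(images)  # cached once instead of calling len() repeatedly
    if images_len <= MAX_COLUMNS_FOR_SINGLE_ROW_GRID:
        return 1, images_len
    proposed_columns = math.ceil(math.sqrt(images_len))
    # Ceiling division: the smallest row count whose slots cover all images.
    proposed_rows = (images_len + proposed_columns - 1) // proposed_columns
    return proposed_rows, proposed_columns


def variants_agree(max_images: int = 2000) -> bool:
    """Compare both variants for every list length from 0 to max_images."""
    return all(
        negotiate_grid_size_loop(range(n)) == negotiate_grid_size_direct(range(n))
        for n in range(max_images + 1)
    )
```

Only the length of the input matters here, so `range(n)` stands in for a list of `n` images. The loop stops at the first row count for which `columns * (rows - 1) < n`, which is exactly `ceil(n / columns)`, so the two variants should agree for every `n`.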

**Performance Impact Analysis:**

- For small image counts (≤3), only the fast path executes, so the difference stays within measurement noise (from about 3.5% slower to about 7% faster in the generated tests).
- For larger image counts (≥4), speedups are dramatic (roughly 250-580% in the generated tests) because the expensive `np.sqrt()` call and the loop are avoided.
- The function is called from `_establish_grid_size()`, which handles grid layout calculations, so the optimization should benefit any drawing/visualization workflow that arranges multiple images in a grid.
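
To reproduce the scalar `sqrt` comparison locally, a micro-benchmark along these lines may help. It is a minimal sketch, not the harness Codeflash used: the function name `time_scalar_sqrt` and its defaults are hypothetical, NumPy is treated as optional, and absolute timings will vary by machine.

```python
import math
import timeit

try:
    import numpy as np  # optional: only needed for the NumPy side of the comparison
except ImportError:
    np = None


def time_scalar_sqrt(value: int = 997, number: int = 100_000) -> dict:
    """Return total seconds spent on `number` scalar calls, keyed by implementation."""
    timings = {
        "math.sqrt": timeit.timeit(lambda: math.ceil(math.sqrt(value)), number=number)
    }
    if np is not None:
        # Same expression as the original code path, applied to a plain Python int.
        timings["np.sqrt"] = timeit.timeit(
            lambda: math.ceil(np.sqrt(value)), number=number
        )
    return timings
```

Calling `time_scalar_sqrt()` and printing the returned dictionary gives one total per implementation; dividing by `number` yields a per-call estimate to compare against the per-test timings reported below.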

**Test Case Insights:**
The optimization is most effective for cases requiring grid calculations (4+ images), which likely represent the majority of real-world usage, where multiple images need to be arranged in a grid layout.

✅ Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 131 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime

import numpy as np
import pytest  # used for our unit tests
from inference.core.utils.drawing import _negotiate_grid_size

MAX_COLUMNS_FOR_SINGLE_ROW_GRID = 3


# Helper for creating dummy images
def make_images(n):
    # Each image is a 1x1 numpy array
    return [np.zeros((1, 1)) for _ in range(n)]


# -------------------------
# Basic Test Cases
# -------------------------


def test_zero_images():
    # Edge case: zero images
    images = []
    rows, cols = _negotiate_grid_size(images)  # 699ns -> 687ns (1.75% faster)


def test_one_image():
    # Single image should be 1 row, 1 column
    images = make_images(1)
    rows, cols = _negotiate_grid_size(images)  # 581ns -> 550ns (5.64% faster)


def test_two_images():
    # Two images should be 1 row, 2 columns
    images = make_images(2)
    rows, cols = _negotiate_grid_size(images)  # 562ns -> 542ns (3.69% faster)


def test_three_images():
    # Three images should be 1 row, 3 columns
    images = make_images(3)
    rows, cols = _negotiate_grid_size(images)  # 547ns -> 544ns (0.551% faster)


def test_four_images():
    # Four images: sqrt(4)=2, so 2x2 grid
    images = make_images(4)
    rows, cols = _negotiate_grid_size(images)  # 11.4μs -> 1.68μs (580% faster)


def test_five_images():
    # Five images: ceil(sqrt(5))=3, so start with 3x3, reduce rows if possible
    images = make_images(5)
    rows, cols = _negotiate_grid_size(images)  # 6.83μs -> 1.27μs (438% faster)


def test_six_images():
    # Six images: ceil(sqrt(6))=3, 3x2=6
    images = make_images(6)
    rows, cols = _negotiate_grid_size(images)  # 5.93μs -> 1.21μs (388% faster)


def test_seven_images():
    # Seven images: ceil(sqrt(7))=3, 3x3=9, 3x2=6<7, so should be 3x3
    images = make_images(7)
    rows, cols = _negotiate_grid_size(images)  # 5.90μs -> 1.19μs (397% faster)


def test_nine_images():
    # Nine images: sqrt(9)=3, so 3x3
    images = make_images(9)
    rows, cols = _negotiate_grid_size(images)  # 5.85μs -> 1.14μs (414% faster)


# -------------------------
# Edge Test Cases
# -------------------------


def test_max_columns_for_single_row_grid():
    # Test at the boundary of MAX_COLUMNS_FOR_SINGLE_ROW_GRID
    images = make_images(MAX_COLUMNS_FOR_SINGLE_ROW_GRID)
    rows, cols = _negotiate_grid_size(images)  # 504ns -> 505ns (0.198% slower)


def test_one_more_than_max_columns_for_single_row_grid():
    # Test just above the boundary
    images = make_images(MAX_COLUMNS_FOR_SINGLE_ROW_GRID + 1)
    rows, cols = _negotiate_grid_size(images)  # 6.69μs -> 1.27μs (425% faster)


def test_perfect_square_images():
    # Test for a perfect square number of images (16)
    images = make_images(16)
    rows, cols = _negotiate_grid_size(images)  # 5.84μs -> 1.22μs (381% faster)


def test_just_above_perfect_square():
    # Test for 17 images (just above 16)
    images = make_images(17)
    rows, cols = _negotiate_grid_size(images)  # 5.94μs -> 1.21μs (390% faster)


def test_just_below_perfect_square():
    # Test for 15 images (just below 16)
    images = make_images(15)
    rows, cols = _negotiate_grid_size(images)  # 5.69μs -> 1.16μs (392% faster)


def test_large_prime_number_images():
    # Test for a large prime number (e.g., 97)
    images = make_images(97)
    rows, cols = _negotiate_grid_size(images)  # 5.70μs -> 1.17μs (388% faster)


def test_non_square_but_rectangular():
    # Test for 20 images
    images = make_images(20)
    rows, cols = _negotiate_grid_size(images)  # 5.79μs -> 1.18μs (393% faster)


# -------------------------
# Large Scale Test Cases
# -------------------------


@pytest.mark.timeout(2)
def test_large_number_of_images_just_below_1000():
    # Test for 999 images
    images = make_images(999)
    rows, cols = _negotiate_grid_size(images)  # 6.71μs -> 1.47μs (358% faster)


@pytest.mark.timeout(2)
def test_large_number_of_images_perfect_square():
    # Test for 900 images (30x30)
    images = make_images(900)
    rows, cols = _negotiate_grid_size(images)  # 6.95μs -> 1.44μs (384% faster)


@pytest.mark.timeout(2)
def test_large_number_of_images_just_above_square():
    # Test for 821 images (sqrt=28.6, ceil=29, 29x29=841, 29x28=812<821, so rows=29, cols=29)
    images = make_images(821)
    rows, cols = _negotiate_grid_size(images)  # 6.40μs -> 1.41μs (355% faster)


@pytest.mark.timeout(2)
def test_large_number_of_images_max_columns_for_single_row_grid():
    # Test for 3 images (should be 1x3, even for large tests)
    images = make_images(3)
    rows, cols = _negotiate_grid_size(images)  # 534ns -> 545ns (2.02% slower)


# -------------------------
# Additional Edge Cases
# -------------------------


def test_non_ndarray_elements():
    # Should work, function doesn't check type of elements
    images = [1, 2, 3]
    rows, cols = _negotiate_grid_size(images)  # 716ns -> 673ns (6.39% faster)


# -------------------------
# Property-based Test
# -------------------------


@pytest.mark.parametrize("n", [0, 1, 2, 3, 4, 10, 50, 100, 500, 999])
def test_grid_covers_all_images(n):
    # For any n, grid must have at least n slots
    images = make_images(n)
    rows, cols = _negotiate_grid_size(images)  # 47.2μs -> 10.5μs (349% faster)
    assert rows * cols >= n
    # For n <= 3, must be 1 row
    if n <= MAX_COLUMNS_FOR_SINGLE_ROW_GRID:
        assert rows == 1


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import numpy as np
import pytest  # used for our unit tests
from inference.core.utils.drawing import _negotiate_grid_size

MAX_COLUMNS_FOR_SINGLE_ROW_GRID = 3

# unit tests


def make_images(n, shape=(1, 1)):
    """Helper function to create a list of n dummy np.ndarray images."""
    return [np.zeros(shape, dtype=np.uint8) for _ in range(n)]


# -------------------- Basic Test Cases --------------------


def test_empty_list_returns_1_0():
    # Edge: 0 images should return (1, 0)
    images = []
    codeflash_output = _negotiate_grid_size(images)  # 605ns -> 566ns (6.89% faster)


def test_single_image_returns_1_1():
    # Basic: 1 image should return (1, 1)
    images = make_images(1)
    codeflash_output = _negotiate_grid_size(images)  # 533ns -> 548ns (2.74% slower)


def test_two_images_returns_1_2():
    # Basic: 2 images should return (1, 2)
    images = make_images(2)
    codeflash_output = _negotiate_grid_size(images)  # 510ns -> 512ns (0.391% slower)


def test_three_images_returns_1_3():
    # Basic: 3 images should return (1, 3)
    images = make_images(3)
    codeflash_output = _negotiate_grid_size(images)  # 506ns -> 521ns (2.88% slower)


def test_four_images_returns_2_2():
    # Basic: 4 images, sqrt(4)=2, should produce 2x2 grid
    images = make_images(4)
    codeflash_output = _negotiate_grid_size(images)  # 9.09μs -> 1.48μs (514% faster)


def test_five_images_returns_2_3():
    # Basic: 5 images, ceil(sqrt(5))=3, should produce 2x3 grid
    images = make_images(5)
    codeflash_output = _negotiate_grid_size(images)  # 6.40μs -> 1.17μs (446% faster)


def test_six_images_returns_2_3():
    # Basic: 6 images, ceil(sqrt(6))=3, should produce 2x3 grid
    images = make_images(6)
    codeflash_output = _negotiate_grid_size(images)  # 5.95μs -> 1.14μs (422% faster)


def test_seven_images_returns_3_3():
    # Basic: 7 images, ceil(sqrt(7))=3, should produce 3x3 grid
    images = make_images(7)
    codeflash_output = _negotiate_grid_size(images)  # 5.27μs -> 1.16μs (356% faster)


def test_nine_images_returns_3_3():
    # Basic: 9 images, perfect square, should produce 3x3 grid
    images = make_images(9)
    codeflash_output = _negotiate_grid_size(images)  # 5.43μs -> 1.15μs (371% faster)


def test_ten_images_returns_3_4():
    # Basic: 10 images, ceil(sqrt(10))=4, should produce 3x4 grid
    images = make_images(10)
    codeflash_output = _negotiate_grid_size(images)  # 5.39μs -> 1.19μs (355% faster)


# -------------------- Edge Test Cases --------------------


def test_max_columns_single_row_grid_boundary():
    # Edge: 3 images is the max for single row
    images = make_images(3)
    codeflash_output = _negotiate_grid_size(images)  # 511ns -> 510ns (0.196% faster)


def test_just_above_max_columns_single_row_grid():
    # Edge: 4 images should trigger grid logic, not single row
    images = make_images(4)
    codeflash_output = _negotiate_grid_size(images)  # 6.99μs -> 1.31μs (434% faster)


def test_prime_number_of_images():
    # Edge: 17 images (prime), ceil(sqrt(17))=5, so should be 4x5 grid
    images = make_images(17)
    codeflash_output = _negotiate_grid_size(images)  # 6.84μs -> 1.55μs (341% faster)


def test_perfect_square_images():
    # Edge: 16 images, perfect square, should be 4x4
    images = make_images(16)
    codeflash_output = _negotiate_grid_size(images)  # 6.48μs -> 1.32μs (392% faster)


def test_just_below_perfect_square():
    # Edge: 15 images, ceil(sqrt(15))=4, so start from a 4x4 grid (16 slots)
    # Rows stay at 4 because 4*3=12 < 15, so should be 4x4
    images = make_images(15)
    codeflash_output = _negotiate_grid_size(images)  # 5.90μs -> 1.21μs (389% faster)


def test_just_above_perfect_square():
    # Edge: 17 images, ceil(sqrt(17))=5, should be 4x5
    images = make_images(17)
    codeflash_output = _negotiate_grid_size(images)  # 5.61μs -> 1.13μs (397% faster)


def test_large_gap_between_rows_and_columns():
    # Edge: 23 images, ceil(sqrt(23))=5, so start from a 5x5 grid
    # 5x4=20 < 23, so should be 5x5
    images = make_images(23)
    codeflash_output = _negotiate_grid_size(images)  # 5.50μs -> 1.13μs (388% faster)


def test_images_not_square_shape():
    # Edge: Images of different shapes should not affect grid size
    images = [np.zeros((10, 20)), np.zeros((5, 7)), np.zeros((1, 1)), np.zeros((3, 4))]
    codeflash_output = _negotiate_grid_size(images)  # 5.33μs -> 1.19μs (347% faster)


def test_images_are_different_types():
    # Edge: Images of different dtypes should not affect grid size
    images = [np.zeros((1, 1), dtype=np.uint8), np.zeros((1, 1), dtype=np.float32)]
    codeflash_output = _negotiate_grid_size(images)  # 498ns -> 516ns (3.49% slower)


# -------------------- Large Scale Test Cases --------------------


def test_large_number_of_images_perfect_square():
    # Large: 900 images (30x30 grid)
    images = make_images(900)
    codeflash_output = _negotiate_grid_size(images)  # 7.67μs -> 1.52μs (404% faster)


def test_large_number_of_images_just_above_perfect_square():
    # Large: 901 images, sqrt(901)≈30.02, so ceil(sqrt(901))=31
    images = make_images(901)
    # proposed_columns=31, proposed_rows=31
    # 31*30=930 >= 901, so the loop drops rows to 30
    # 31*29=899 < 901, so the loop stops and returns (30, 31)
    codeflash_output = _negotiate_grid_size(images)  # 6.88μs -> 1.43μs (381% faster)


def test_large_number_of_images_not_perfect_square():
    # Large: 997 images, ceil(sqrt(997))=32, so should be 32x32 or 32x31
    images = make_images(997)
    # 32*31=992 < 997, so should be (32, 32)
    codeflash_output = _negotiate_grid_size(images)  # 6.88μs -> 1.39μs (395% faster)


def test_large_prime_number_of_images():
    # Large: 997 images (prime), see above
    images = make_images(997)
    codeflash_output = _negotiate_grid_size(images)  # 6.84μs -> 1.55μs (341% faster)


def test_large_number_of_images_just_below_perfect_square():
    # Large: 999 images, ceil(sqrt(999))=32, 32x31=992 < 999, so (32, 32)
    images = make_images(999)
    codeflash_output = _negotiate_grid_size(images)  # 6.48μs -> 1.36μs (376% faster)


def test_large_number_images_boundary():
    # Large: 1000 images, ceil(sqrt(1000))=32, 32x32=1024 >= 1000, 32x31=992 < 1000
    images = make_images(1000)
    codeflash_output = _negotiate_grid_size(images)  # 6.58μs -> 1.44μs (358% faster)


# -------------------- Mutation Testing / Robustness --------------------


@pytest.mark.parametrize(
    "n",
    [
        0,
        1,
        2,
        3,
        4,
        5,
        6,
        7,
        8,
        9,
        10,
        16,
        17,
        23,
        30,
        31,
        32,
        33,
        100,
        256,
        900,
        997,
        1000,
    ],
)
def test_monotonicity_of_grid_size(n):
    # For every n, the number of cells in the grid should be >= n
    images = make_images(n)
    rows, cols = _negotiate_grid_size(images)  # 119μs -> 26.0μs (360% faster)
    assert rows * cols >= n
    if n <= 3:
        # For n <= 3, rows should be 1
        assert rows == 1
    else:
        # For n > 3, rows and cols should both be >= 2
        assert rows >= 2 and cols >= 2


def test_no_extra_rows():
    # The grid should not contain more rows than necessary
    for n in range(4, 20):
        images = make_images(n)
        rows, cols = _negotiate_grid_size(images)  # 23.2μs -> 6.11μs (280% faster)
        assert cols * (rows - 1) < n


def test_no_extra_columns():
    # The grid should not contain more columns than necessary
    for n in range(4, 20):
        images = make_images(n)
        rows, cols = _negotiate_grid_size(images)  # 21.8μs -> 5.83μs (275% faster)
        assert (cols - 1) * rows < n


def test_grid_is_as_square_as_possible():
    # The grid should be as close to square as possible (difference between rows and columns minimized)
    for n in [
        4,
        5,
        6,
        7,
        8,
        9,
        10,
        16,
        17,
        23,
        30,
        31,
        32,
        33,
        100,
        256,
        900,
        997,
        1000,
    ]:
        images = make_images(n)
        rows, cols = _negotiate_grid_size(images)  # 25.1μs -> 7.17μs (251% faster)


def test_non_ndarray_elements_are_accepted():
    # If input is not a list of np.ndarray, should still work as long as list length is valid
    # This test is to ensure type hints are not enforced at runtime
    images = [1, 2, 3]
    codeflash_output = _negotiate_grid_size(images)  # 522ns -> 531ns (1.69% slower)


def test_none_input_raises_type_error():
    # None has no len(), so the function should raise TypeError
    with pytest.raises(TypeError):
        _negotiate_grid_size(None)  # 1.16μs -> 1.14μs (1.49% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
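
The invariants that the comments in the generated tests describe (enough slots, single row for small inputs, no spare row) can be collected in one helper. This is a sketch only: `check_grid_invariants` is a hypothetical name that does not exist in the repository, and the constant value is assumed from the tests above.

```python
from typing import Tuple

# Assumed value, mirroring the constant defined in the generated tests.
MAX_COLUMNS_FOR_SINGLE_ROW_GRID = 3


def check_grid_invariants(n: int, grid: Tuple[int, int]) -> None:
    """Raise AssertionError if a (rows, cols) grid is not valid for n images."""
    rows, cols = grid
    # The grid must have a slot for every image.
    assert rows * cols >= n, f"{rows}x{cols} grid cannot hold {n} images"
    if n <= MAX_COLUMNS_FOR_SINGLE_ROW_GRID:
        # Small inputs take the single-row fast path.
        assert (rows, cols) == (1, n), f"expected (1, {n}), got {grid}"
    else:
        # No extra rows: dropping one row would leave too few slots.
        assert cols * (rows - 1) < n, f"{rows}x{cols} grid has a spare row for {n} images"
```

Inside a parametrized test this could be called as `check_grid_invariants(n, _negotiate_grid_size(make_images(n)))`, which turns the comment-only expectations into explicit assertions.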

To edit these changes, run `git checkout codeflash/optimize-_negotiate_grid_size-miqmfhet` and push.


codeflash-ai bot requested a review from grzegorz-roboflow as a code owner December 3, 2025 23:11

codeflash-ai bot requested a review from mashraf-222 December 3, 2025 23:11

codeflash-ai bot added the `⚡️ codeflash` (Optimization PR opened by Codeflash AI) and `🎯 Quality: High` (Optimization Quality according to Codeflash) labels Dec 3, 2025
