@codeflash-ai codeflash-ai bot commented Dec 3, 2025

📄 75% (0.75x) speedup for _establish_grid_size in inference/core/utils/drawing.py

⏱️ Runtime: 162 microseconds → 92.2 microseconds (best of 42 runs)

📝 Explanation and details

The optimization achieves a 75% speedup through two changes that reduce per-call overhead:

What was optimized:

  1. Replaced np.sqrt() with math.sqrt() in _negotiate_grid_size() - NumPy's sqrt is designed for arrays and has overhead when used on scalars
  2. Cached len(images) as num_images so the length is read once per call instead of at every use
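
As a rough illustration of the two changes, the sketch below shows what the optimized helper might look like. The function body does not appear in this report, so the name negotiate_grid_size_sketch, the control flow, the row-shrinking loop, and the constant MAX_COLUMNS_FOR_SINGLE_ROW_GRID = 3 (inferred from the generated tests, where 3 images give (1, 3) and 4 images give (2, 2)) are assumptions, not the actual code in inference/core/utils/drawing.py.

```python
import math
from typing import Tuple

# Assumption: inferred from the generated tests (3 images -> one row, 4 -> 2x2).
MAX_COLUMNS_FOR_SINGLE_ROW_GRID = 3


def negotiate_grid_size_sketch(images: list) -> Tuple[int, int]:
    """Hypothetical reconstruction of the optimized _negotiate_grid_size()."""
    # Change 2: read the length once and reuse the local name.
    num_images = len(images)
    if num_images <= MAX_COLUMNS_FOR_SINGLE_ROW_GRID:
        return 1, num_images
    # Change 1: math.sqrt() on a plain int instead of np.sqrt().
    nearest_sqrt = math.ceil(math.sqrt(num_images))
    proposed_rows = proposed_columns = nearest_sqrt
    # Assumed: drop rows while the remaining grid still holds every image.
    while proposed_columns * (proposed_rows - 1) >= num_images:
        proposed_rows -= 1
    return proposed_rows, proposed_columns


if __name__ == "__main__":
    for n in (0, 3, 4, 5, 1000):
        print(n, negotiate_grid_size_sketch([None] * n))
```

Only inputs with more than three images reach the math.sqrt() line, which is consistent with the timings in the generated tests: the large gains appear from 4 images upward.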

Why this leads to speedup:

  • math.sqrt() is significantly faster than np.sqrt() for scalar values (avoiding NumPy's array handling overhead)
  • Caching len(images) avoids repeated len() calls. Because len() on a list is O(1) and does not traverse the list, this saves a small constant amount that does not depend on how many images there are
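
The two claims above can be checked locally with a small timeit micro-benchmark. This is a minimal sketch, not part of the PR: per_call_ns is a hypothetical helper, the lambda wrapper adds the same fixed overhead to every measurement, and absolute numbers will vary by machine and NumPy version. NumPy is treated as optional so the script still runs without it.

```python
import math
import timeit

try:
    import numpy as np  # optional: only needed for the np.sqrt timing
except ImportError:
    np = None


def per_call_ns(func, arg, repeats: int = 5, number: int = 100_000) -> float:
    """Best-of-`repeats` average time of func(arg), in nanoseconds per call."""
    timer = timeit.Timer(lambda: func(arg))
    return min(timer.repeat(repeat=repeats, number=number)) / number * 1e9


if __name__ == "__main__":
    images = [None] * 1000
    print(f"math.sqrt(1000): {per_call_ns(math.sqrt, 1000):8.1f} ns")
    if np is not None:
        print(f"np.sqrt(1000):   {per_call_ns(np.sqrt, 1000):8.1f} ns")
    # len() on a list reads a stored size, so 3 or 1000 items cost about the same.
    print(f"len(3 items):    {per_call_ns(len, images[:3]):8.1f} ns")
    print(f"len(1000 items): {per_call_ns(len, images):8.1f} ns")
```

Taking the minimum over several repeats follows the usual timeit advice, since the fastest run is the one least disturbed by other activity on the machine.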

Performance impact by test case:

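  • Largest gains (roughly 280-480% faster) occur when _negotiate_grid_size() reaches the square-root calculation, that is, with 4 or more images and no explicit grid dimensions
  • Minimal impact (within about ±7%, consistent with measurement noise) for cases that never reach the sqrt call: fully or partially specified grid dimensions, and lists of 3 or fewer images, which get a single-row grid
  • Gains do not grow with image count: with grid_size=None, 1000 images measured 373-384% faster against 419-482% for 4 images, because the sqrt call is a fixed per-call cost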
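
The percentages in this report appear to follow (original − optimized) / optimized, which reproduces the nanosecond-level figures in the generated tests. The helper below is a sketch under that assumption; percent_faster is a hypothetical name, not a Codeflash function.

```python
def percent_faster(original: float, optimized: float) -> float:
    """Relative change (original - optimized) / optimized, as a percentage.

    Positive means the optimized code is faster, negative means slower.
    Both timings must use the same unit.
    """
    return (original - optimized) / optimized * 100.0


if __name__ == "__main__":
    # Timings copied from this report.
    print(f"{percent_faster(874, 828):.2f}")   # nanoseconds; reported as 5.56% faster
    print(f"{percent_faster(958, 972):.2f}")   # nanoseconds; reported as 1.44% slower
    print(f"{percent_faster(162, 92.2):.1f}")  # microseconds; headline, reported as 75%
```

Rows shown in microseconds are rounded to three significant figures, so recomputing them this way can differ from the reported percentage by a few points (11.9μs -> 2.17μs gives about 448%, against the reported 446%).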

How this impacts existing workloads:
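Based on the function reference, _establish_grid_size() is called from create_tiles(), a utility for arranging multiple images into a grid layout. The measured saving is roughly 8 to 10 microseconds per call on the automatic-negotiation path, so it is most noticeable where create_tiles() is called often, for example when: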

  • Processing batches of images for visualization dashboards
  • Generating image grids for inference results display
  • Creating tiled layouts where grid dimensions need to be calculated automatically

The optimization maintains identical behavior while delivering substantial performance gains for the common case of automatic grid size negotiation.
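
Calls that pass a partially specified grid_size do not touch the optimized path. Judging by the expected values noted in the generated tests (for example, ceil(7/3) = 3 rows for 7 images and 3 columns), the missing dimension is filled in by ceiling division. A minimal sketch under that assumption, with fill_missing_dimension as a hypothetical name rather than a function from the library:

```python
import math
from typing import Optional, Tuple


def fill_missing_dimension(
    num_images: int, grid_size: Tuple[Optional[int], Optional[int]]
) -> Tuple[int, int]:
    """Hypothetical helper: complete a (rows, columns) hint with ceil division.

    Assumes at most one entry of grid_size is None. A fully specified grid
    is returned unchanged, without checking it against num_images.
    """
    rows, columns = grid_size
    if rows is None:
        return math.ceil(num_images / columns), columns
    if columns is None:
        return rows, math.ceil(num_images / rows)
    return rows, columns


if __name__ == "__main__":
    print(fill_missing_dimension(7, (None, 3)))
    print(fill_missing_dimension(1000, (31, None)))
```

Neither branch calls a square root, which matches the near-zero timing differences reported for these inputs.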

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 54 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import math
from typing import List, Optional, Tuple

import numpy as np

# imports
import pytest
from inference.core.utils.drawing import _establish_grid_size

# unit tests


# Helper function to generate dummy images
def make_images(n):
    # Each image is a 1x1 numpy array
    return [np.zeros((1, 1), dtype=np.uint8) for _ in range(n)]


# 1. Basic Test Cases


def test_grid_size_none_returns_negotiate_grid_size():
    # grid_size is None, should negotiate grid size
    images = make_images(4)
    codeflash_output = _establish_grid_size(
        images, None
    )  # 11.9μs -> 2.17μs (446% faster)


def test_grid_size_all_none_returns_negotiate_grid_size():
    # grid_size is (None, None), should negotiate grid size
    images = make_images(2)
    codeflash_output = _establish_grid_size(
        images, (None, None)
    )  # 1.67μs -> 1.72μs (2.62% slower)


def test_grid_size_both_specified():
    # grid_size is (rows, cols), both specified
    images = make_images(5)
    codeflash_output = _establish_grid_size(
        images, (2, 3)
    )  # 1.28μs -> 1.36μs (6.25% slower)


def test_grid_size_rows_none():
    # grid_size is (None, cols), should compute rows
    images = make_images(7)
    # ceil(7/3) = 3 rows, 3 columns
    codeflash_output = _establish_grid_size(
        images, (None, 3)
    )  # 2.10μs -> 2.13μs (1.55% slower)


def test_grid_size_cols_none():
    # grid_size is (rows, None), should compute columns
    images = make_images(8)
    # ceil(8/2) = 4 columns, 2 rows
    codeflash_output = _establish_grid_size(
        images, (2, None)
    )  # 1.82μs -> 1.93μs (5.45% slower)


# 2. Edge Test Cases


def test_zero_images():
    # No images, should return (1, 0) as per negotiate logic
    images = []
    codeflash_output = _establish_grid_size(
        images, None
    )  # 958ns -> 972ns (1.44% slower)


def test_one_image():
    # Single image, should be (1, 1)
    images = make_images(1)
    codeflash_output = _establish_grid_size(
        images, None
    )  # 874ns -> 828ns (5.56% faster)


def test_max_columns_single_row_grid():
    # 3 images, should be (1, 3)
    images = make_images(3)
    codeflash_output = _establish_grid_size(
        images, None
    )  # 805ns -> 804ns (0.124% faster)


def test_just_over_single_row_grid():
    # 4 images, should be (2, 2)
    images = make_images(4)
    codeflash_output = _establish_grid_size(
        images, None
    )  # 10.00μs -> 1.85μs (440% faster)


def test_grid_size_rows_none_not_divisible():
    # grid_size is (None, 4), 10 images, should ceil(10/4)=3 rows
    images = make_images(10)
    codeflash_output = _establish_grid_size(
        images, (None, 4)
    )  # 2.15μs -> 2.12μs (1.56% faster)


def test_grid_size_cols_none_not_divisible():
    # grid_size is (3, None), 10 images, should ceil(10/3)=4 columns
    images = make_images(10)
    codeflash_output = _establish_grid_size(
        images, (3, None)
    )  # 1.95μs -> 1.87μs (4.34% faster)


def test_grid_size_both_none_and_empty_images():
    # grid_size is (None, None) and images is empty
    images = []
    codeflash_output = _establish_grid_size(
        images, (None, None)
    )  # 1.60μs -> 1.71μs (5.92% slower)


def test_grid_size_both_specified_smaller_than_images():
    # grid_size is (2, 2), but 5 images, should still return (2, 2) as per function contract
    images = make_images(5)
    codeflash_output = _establish_grid_size(
        images, (2, 2)
    )  # 1.30μs -> 1.27μs (2.37% faster)


def test_grid_size_both_specified_larger_than_images():
    # grid_size is (5, 5), but only 2 images, should still return (5, 5)
    images = make_images(2)
    codeflash_output = _establish_grid_size(
        images, (5, 5)
    )  # 1.17μs -> 1.22μs (3.85% slower)


def test_grid_size_rows_zero():
    # grid_size is (0, 3), should return (0, 3)
    images = make_images(4)
    codeflash_output = _establish_grid_size(
        images, (0, 3)
    )  # 1.19μs -> 1.21μs (1.65% slower)


def test_grid_size_cols_zero():
    # grid_size is (3, 0), should return (3, 0)
    images = make_images(4)
    codeflash_output = _establish_grid_size(
        images, (3, 0)
    )  # 1.19μs -> 1.16μs (2.32% faster)


def test_large_number_of_images_negotiate():
    # 1000 images, should negotiate grid size
    images = make_images(1000)
    rows, cols = _establish_grid_size(images, None)  # 11.9μs -> 2.46μs (384% faster)


def test_large_number_of_images_rows_none():
    # grid_size is (None, 25), 999 images
    images = make_images(999)
    rows, cols = _establish_grid_size(
        images, (None, 25)
    )  # 2.46μs -> 2.30μs (6.92% faster)


def test_large_number_of_images_cols_none():
    # grid_size is (25, None), 999 images
    images = make_images(999)
    rows, cols = _establish_grid_size(
        images, (25, None)
    )  # 2.02μs -> 2.03μs (0.543% slower)


def test_large_number_of_images_both_specified():
    # grid_size is (30, 30), 900 images
    images = make_images(900)
    codeflash_output = _establish_grid_size(
        images, (30, 30)
    )  # 1.38μs -> 1.35μs (1.93% faster)


def test_large_number_of_images_both_none():
    # grid_size is (None, None), 1000 images
    images = make_images(1000)
    rows, cols = _establish_grid_size(
        images, (None, None)
    )  # 11.3μs -> 2.96μs (280% faster)


# Additional edge: grid_size with negative numbers
def test_grid_size_negative_rows():
    # grid_size is (-2, 5)
    images = make_images(10)
    codeflash_output = _establish_grid_size(
        images, (-2, 5)
    )  # 1.36μs -> 1.37μs (0.803% slower)


def test_grid_size_negative_cols():
    # grid_size is (5, -2)
    images = make_images(10)
    codeflash_output = _establish_grid_size(
        images, (5, -2)
    )  # 1.18μs -> 1.25μs (5.54% slower)


def test_grid_size_rows_none_negative_cols():
    # grid_size is (None, -2)
    images = make_images(10)
    codeflash_output = _establish_grid_size(
        images, (None, -2)
    )  # 2.06μs -> 2.10μs (1.91% slower)


def test_grid_size_negative_rows_cols_none():
    # grid_size is (-2, None)
    images = make_images(10)
    codeflash_output = _establish_grid_size(
        images, (-2, None)
    )  # 1.88μs -> 1.81μs (3.64% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import math
from typing import List, Optional, Tuple

import numpy as np

# imports
import pytest  # used for our unit tests
from inference.core.utils.drawing import _establish_grid_size

# unit tests


# Helper function to generate dummy images
def make_images(n):
    return [np.zeros((2, 2), dtype=np.uint8) for _ in range(n)]


# -----------------------------
# BASIC TEST CASES
# -----------------------------


def test_grid_size_none_with_1_image():
    # grid_size is None, 1 image: should return (1,1)
    images = make_images(1)
    codeflash_output = _establish_grid_size(
        images, None
    )  # 1.36μs -> 1.34μs (1.26% faster)


def test_grid_size_none_with_3_images():
    # grid_size is None, 3 images: should return (1,3)
    images = make_images(3)
    codeflash_output = _establish_grid_size(
        images, None
    )  # 876ns -> 916ns (4.37% slower)


def test_grid_size_none_with_4_images():
    # grid_size is None, 4 images: should return (2,2) (since sqrt(4)=2)
    images = make_images(4)
    codeflash_output = _establish_grid_size(
        images, None
    )  # 11.8μs -> 2.03μs (482% faster)


def test_grid_size_explicit_both():
    # grid_size is (2,3), 6 images: should return (2,3)
    images = make_images(6)
    codeflash_output = _establish_grid_size(
        images, (2, 3)
    )  # 1.43μs -> 1.37μs (4.39% faster)


def test_grid_size_explicit_rows_none():
    # grid_size is (None, 2), 5 images: should return (3,2)
    images = make_images(5)
    codeflash_output = _establish_grid_size(
        images, (None, 2)
    )  # 2.09μs -> 2.01μs (4.08% faster)


def test_grid_size_explicit_cols_none():
    # grid_size is (2, None), 5 images: should return (2,3)
    images = make_images(5)
    codeflash_output = _establish_grid_size(
        images, (2, None)
    )  # 1.85μs -> 1.82μs (1.93% faster)


def test_grid_size_tuple_all_none():
    # grid_size is (None, None), 7 images: should negotiate
    images = make_images(7)
    codeflash_output = _establish_grid_size(
        images, (None, None)
    )  # 10.3μs -> 2.60μs (296% faster)


# -----------------------------
# EDGE TEST CASES
# -----------------------------


def test_zero_images():
    # Zero images, grid_size None: should return (1,0)
    images = make_images(0)
    codeflash_output = _establish_grid_size(
        images, None
    )  # 885ns -> 905ns (2.21% slower)


def test_one_image_explicit_grid():
    # One image, grid_size (1,1): should return (1,1)
    images = make_images(1)
    codeflash_output = _establish_grid_size(
        images, (1, 1)
    )  # 1.33μs -> 1.35μs (1.18% slower)


def test_rows_none_with_zero_images():
    # grid_size (None, 2), 0 images: should return (0,2)
    images = make_images(0)
    codeflash_output = _establish_grid_size(
        images, (None, 2)
    )  # 2.01μs -> 1.91μs (5.51% faster)


def test_cols_none_with_zero_images():
    # grid_size (2, None), 0 images: should return (2,0)
    images = make_images(0)
    codeflash_output = _establish_grid_size(
        images, (2, None)
    )  # 1.81μs -> 1.74μs (4.38% faster)


def test_invalid_grid_size_too_small():
    # grid_size (1,1) with 5 images: should return (1,1) as per implementation
    # (function does not validate grid_size vs. image count)
    images = make_images(5)
    codeflash_output = _establish_grid_size(
        images, (1, 1)
    )  # 1.19μs -> 1.16μs (2.67% faster)


def test_rows_none_with_exact_division():
    # grid_size (None, 2), 4 images: should return (2,2)
    images = make_images(4)
    codeflash_output = _establish_grid_size(
        images, (None, 2)
    )  # 2.01μs -> 1.92μs (4.91% faster)


def test_cols_none_with_exact_division():
    # grid_size (2, None), 4 images: should return (2,2)
    images = make_images(4)
    codeflash_output = _establish_grid_size(
        images, (2, None)
    )  # 1.79μs -> 1.70μs (5.77% faster)


def test_rows_none_with_non_exact_division():
    # grid_size (None, 2), 5 images: should return (3,2)
    images = make_images(5)
    codeflash_output = _establish_grid_size(
        images, (None, 2)
    )  # 1.82μs -> 1.82μs (0.055% faster)


def test_cols_none_with_non_exact_division():
    # grid_size (2, None), 5 images: should return (2,3)
    images = make_images(5)
    codeflash_output = _establish_grid_size(
        images, (2, None)
    )  # 1.75μs -> 1.77μs (0.903% slower)


def test_large_single_row():
    # grid_size None, 3 images: should be (1,3) (max single row)
    images = make_images(3)
    codeflash_output = _establish_grid_size(
        images, None
    )  # 1.04μs -> 981ns (6.32% faster)


def test_large_single_row_over_limit():
    # grid_size None, 4 images: should be (2,2) (no longer single row)
    images = make_images(4)
    codeflash_output = _establish_grid_size(
        images, None
    )  # 9.69μs -> 1.87μs (419% faster)


def test_grid_size_with_large_column_hint():
    # grid_size (None, 1000), 999 images: should return (1,1000)
    images = make_images(999)
    codeflash_output = _establish_grid_size(
        images, (None, 1000)
    )  # 2.24μs -> 2.15μs (4.14% faster)


def test_grid_size_with_large_row_hint():
    # grid_size (1000, None), 999 images: should return (1000,1)
    images = make_images(999)
    codeflash_output = _establish_grid_size(
        images, (1000, None)
    )  # 2.08μs -> 2.02μs (2.92% faster)


# -----------------------------
# LARGE SCALE TEST CASES
# -----------------------------


def test_large_scale_negotiate_grid_size():
    # grid_size None, 1000 images: should negotiate a nearly square grid
    images = make_images(1000)
    rows, cols = _establish_grid_size(images, None)  # 9.95μs -> 2.10μs (373% faster)


def test_large_scale_rows_none():
    # grid_size (None, 30), 900 images: should return (30,30)
    images = make_images(900)
    codeflash_output = _establish_grid_size(
        images, (None, 30)
    )  # 2.17μs -> 2.04μs (6.21% faster)


def test_large_scale_cols_none():
    # grid_size (30, None), 900 images: should return (30,30)
    images = make_images(900)
    codeflash_output = _establish_grid_size(
        images, (30, None)
    )  # 2.04μs -> 1.98μs (3.19% faster)


def test_large_scale_rows_none_non_exact():
    # grid_size (None, 33), 1000 images: should return (31,33)
    images = make_images(1000)
    codeflash_output = _establish_grid_size(
        images, (None, 33)
    )  # 2.17μs -> 2.10μs (3.05% faster)


def test_large_scale_cols_none_non_exact():
    # grid_size (31, None), 1000 images: should return (31, 33)
    images = make_images(1000)
    codeflash_output = _establish_grid_size(
        images, (31, None)
    )  # 2.08μs -> 2.02μs (2.62% faster)


def test_large_scale_explicit_grid():
    # grid_size (25, 40), 1000 images: should return (25,40)
    images = make_images(1000)
    codeflash_output = _establish_grid_size(
        images, (25, 40)
    )  # 1.34μs -> 1.30μs (2.76% faster)


# -----------------------------
# ADDITIONAL EDGE CASES
# -----------------------------


def test_grid_size_tuple_short():
    # grid_size is a tuple of length 1, should raise an error
    images = make_images(5)
    with pytest.raises(IndexError):
        _establish_grid_size(images, (2,))  # 2.13μs -> 2.09μs (1.72% faster)


def test_images_is_none():
    # images is None, should raise an error
    with pytest.raises(TypeError):
        _establish_grid_size(None, None)  # 1.75μs -> 1.86μs (6.01% slower)


def test_images_is_empty_list_with_explicit_grid():
    # images is empty, grid_size (2,2): should return (2,2)
    images = []
    codeflash_output = _establish_grid_size(
        images, (2, 2)
    )  # 1.42μs -> 1.44μs (1.18% slower)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes git checkout codeflash/optimize-_establish_grid_size-miqm4k49 and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 December 3, 2025 23:03
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Dec 3, 2025