@codeflash-ai codeflash-ai bot commented Dec 4, 2025

📄 7% (0.07x) speedup for _normalize_json_dumps in skyvern/forge/sdk/core/security.py

⏱️ Runtime: 1.78 milliseconds -> 1.66 milliseconds (best of 250 runs)
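
As a quick sanity check on the headline number, the two runtimes can be turned into a relative speedup. This is a minimal sketch that assumes the "(0.07x)" figure is computed as original runtime divided by optimized runtime, minus one; the helper name `relative_speedup` is hypothetical and not part of Codeflash.

```python
def relative_speedup(original_ms: float, optimized_ms: float) -> float:
    """Fraction by which the original runtime exceeds the optimized one.

    Assumed definition: original / optimized - 1.
    """
    return original_ms / optimized_ms - 1.0


ratio = relative_speedup(1.78, 1.66)
print(f"{ratio:.2f}x speedup, about {ratio:.0%}")
```

Under that assumption, 1.78 ms and 1.66 ms give roughly 0.07, which matches the reported 7%.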

📝 Explanation and details

The optimization achieves a 7% speedup by replacing isinstance() calls with direct type() comparisons and caching function references to avoid repeated lookups.

Key optimizations:

  1. type(x) is float instead of isinstance(x, float) - An exact type comparison skips the subclass check that isinstance() performs, which makes it slightly cheaper for built-in types. The trade-off is that instances of float subclasses no longer match the check.

  2. Local function reference caching - Storing normalize = _normalize_numbers before loops avoids repeated global function lookups during recursive calls, which becomes significant in nested structures.

  3. Intermediate variable assignment - Using items = x.items() separates the method call from the comprehension, potentially reducing overhead.
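
The three changes above can be sketched side by side. The PR diff itself is not shown in this conversation, so both functions below are assumed reconstructions for illustration only: the names `normalize_numbers_before` and `normalize_numbers_after` are hypothetical, and the real `_normalize_numbers` may differ in detail.

```python
import json
from typing import Any


def normalize_numbers_before(x: Any) -> Any:
    """Assumed shape of the original: isinstance() checks, global recursion."""
    if isinstance(x, float):
        return int(x) if x.is_integer() else x
    if isinstance(x, dict):
        return {k: normalize_numbers_before(v) for k, v in x.items()}
    if isinstance(x, list):
        return [normalize_numbers_before(v) for v in x]
    return x


def normalize_numbers_after(x: Any) -> Any:
    """Assumed shape of the optimized version."""
    # 1. Exact type comparison instead of isinstance() for the float check.
    if type(x) is float:
        return int(x) if x.is_integer() else x
    # 2. Cache the function in a local name to avoid a global lookup per element.
    normalize = normalize_numbers_after
    if isinstance(x, dict):
        # 3. Pull the method call out of the comprehension.
        items = x.items()
        return {k: normalize(v) for k, v in items}
    if isinstance(x, list):
        return [normalize(v) for v in x]
    return x


sample = {"a": 1, "b": 2.0, "c": [3.0, 4.5, {"d": -0.0}], "e": True}
before = normalize_numbers_before(sample)
after = normalize_numbers_after(sample)
print(json.dumps(after, ensure_ascii=False))


class Celsius(float):
    """Hypothetical float subclass, used only to show the behavioral difference."""


sub_before = normalize_numbers_before(Celsius(2.0))  # matched by isinstance()
sub_after = normalize_numbers_after(Celsius(2.0))    # skipped by type() is float
```

For plain floats, dicts, and lists the two variants return the same result. The `Celsius` case shows the one place they diverge in this sketch: the exact type check leaves a float subclass untouched, so the change is only behavior-preserving if payloads never contain such values.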

Performance impact by workload:

  • Large flat structures see the biggest gains (10-16% faster) as they trigger many type checks
  • Nested structures benefit from cached function references (5-7% faster)
  • Small/simple structures stay within measurement noise (roughly 2% slower to 4% faster in the per-test timings), because fixed per-call overhead dominates
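
To see how the type-check cost scales with the number of elements, a micro-benchmark along these lines can be run locally. It is a rough sketch, not the Codeflash measurement harness; the helper names are hypothetical, and the timings depend on the interpreter version and machine, so no expected numbers are given.

```python
import timeit

# 1000 values, alternating float and int, similar to the large flat test data.
values = [float(i) if i % 2 == 0 else i for i in range(1000)]


def count_with_isinstance(xs):
    """Count floats using isinstance(), which also accepts subclasses."""
    return sum(1 for x in xs if isinstance(x, float))


def count_with_type_is(xs):
    """Count floats using an exact type comparison."""
    return sum(1 for x in xs if type(x) is float)


# Take the best of several repeats to reduce scheduling noise.
t_isinstance = min(timeit.repeat(lambda: count_with_isinstance(values), number=200, repeat=5))
t_type_is = min(timeit.repeat(lambda: count_with_type_is(values), number=200, repeat=5))

print(f"isinstance:    {t_isinstance:.4f}s")
print(f"type() is:     {t_type_is:.4f}s")
```

With only a handful of elements, the difference between the two checks is usually smaller than run-to-run variation, which is consistent with the small test cases landing on both sides of zero.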

Context significance:
The function is called from generate_skyvern_webhook_signature() for payload serialization, suggesting it's used in API/webhook processing where consistent performance matters. The optimization is particularly valuable for large JSON payloads containing many numeric values, which are common in API responses and webhook data.
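
The sketch below illustrates why number normalization matters when a serialized payload is signed. The implementation of `generate_skyvern_webhook_signature()` is not shown in this conversation, so the HMAC-SHA256 scheme, the `sign_payload` helper, and the example secret are all assumptions made for illustration.

```python
import hashlib
import hmac
import json


def sign_payload(payload_json: str, secret: str) -> str:
    """Hypothetical signer: HMAC-SHA256 over the UTF-8 bytes of the JSON string."""
    return hmac.new(
        secret.encode("utf-8"),
        payload_json.encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()


secret = "example-secret"  # placeholder value, not a real key

# The same logical payload serializes differently depending on the number type.
as_float = json.dumps({"amount": 2.0}, separators=(",", ":"))
as_int = json.dumps({"amount": 2}, separators=(",", ":"))

sig_float = sign_payload(as_float, secret)
sig_int = sign_payload(as_int, secret)

print(as_float, as_int)
print(sig_float == sig_int)
```

Because a signature covers exact bytes, `2.0` and `2` produce different digests even though they represent the same number. Converting whole-number floats to ints before serialization gives sender and receiver a single canonical form to sign.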

Test case patterns:
The optimization performs best on test cases with repeated numeric conversions in large data structures, while maintaining identical behavior for all edge cases including Unicode handling and type preservation.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 34 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import json
from typing import Any

# imports
import pytest  # used for our unit tests
from skyvern.forge.sdk.core.security import _normalize_json_dumps

# unit tests

# --------------------
# Basic Test Cases
# --------------------

def test_simple_integer_and_float():
    # Test that integer values are unchanged and float values with no fraction are converted to int
    data = {"a": 1, "b": 2.0, "c": 3.5}
    codeflash_output = _normalize_json_dumps(data); result = codeflash_output # 9.16μs -> 9.29μs (1.37% slower)

def test_nested_dict_and_list():
    # Test nested structures with floats and ints
    data = {"x": [1.0, 2, 3.1], "y": {"z": 4.0}}
    codeflash_output = _normalize_json_dumps(data); result = codeflash_output # 9.78μs -> 9.85μs (0.741% slower)

def test_string_and_bool_types():
    # Test that strings and booleans are not affected
    data = {"s": "hello", "b": True, "f": 5.0}
    codeflash_output = _normalize_json_dumps(data); result = codeflash_output # 7.62μs -> 7.46μs (2.12% faster)

def test_empty_dict_and_list():
    # Test empty dict and list
    data = {"a": {}, "b": []}
    codeflash_output = _normalize_json_dumps(data); result = codeflash_output # 7.03μs -> 7.08μs (0.720% slower)

def test_no_floats():
    # Test a dict with no floats at all
    data = {"x": 1, "y": [2, 3], "z": {"a": 4}}
    codeflash_output = _normalize_json_dumps(data); result = codeflash_output # 8.60μs -> 8.67μs (0.750% slower)

# --------------------
# Edge Test Cases
# --------------------

def test_float_in_list_of_lists():
    # Test float normalization inside nested lists
    data = {"a": [[1.0, 2.2], [3.0, 4]]}
    codeflash_output = _normalize_json_dumps(data); result = codeflash_output # 8.82μs -> 8.54μs (3.24% faster)

def test_large_and_small_floats():
    # Test with very large and very small floats, including negative
    data = {"big": 1e10, "small": 1e-10, "negative": -2.0}
    codeflash_output = _normalize_json_dumps(data); result = codeflash_output # 10.5μs -> 10.1μs (3.79% faster)

def test_zero_and_negative_zero():
    # Test with 0.0, -0.0, and 0
    data = {"zero": 0.0, "negzero": -0.0, "intzero": 0}
    codeflash_output = _normalize_json_dumps(data); result = codeflash_output # 6.95μs -> 7.07μs (1.70% slower)

def test_unicode_and_ensure_ascii():
    # Test that ensure_ascii=False allows unicode characters
    data = {"text": "你好", "num": 1.0}
    codeflash_output = _normalize_json_dumps(data); result = codeflash_output # 10.8μs -> 10.6μs (1.46% faster)

def test_large_flat_dict():
    # Test with a large flat dict of 1000 elements
    data = {str(i): float(i) if i % 2 == 0 else i for i in range(1000)}
    codeflash_output = _normalize_json_dumps(data); result = codeflash_output # 163μs -> 147μs (10.7% faster)
    # All even keys should be int, odd remain int
    loaded = json.loads(result)
    for i in range(1000):
        v = loaded[str(i)]

def test_large_nested_structure():
    # Test with a large nested structure
    data = {"outer": [{"inner": [float(i), i + 0.5]} for i in range(500)]}
    codeflash_output = _normalize_json_dumps(data); result = codeflash_output # 389μs -> 367μs (6.06% faster)
    loaded = json.loads(result)
    for i, item in enumerate(loaded["outer"]):
        pass

def test_large_list_of_dicts():
    # Test with a large list of dicts
    data = [{"a": float(i), "b": i} for i in range(800)]
    codeflash_output = _normalize_json_dumps({"lst": data}); result = codeflash_output # 389μs -> 343μs (13.7% faster)
    loaded = json.loads(result)
    for i, item in enumerate(loaded["lst"]):
        pass

def test_performance_large_data():
    # Test that function completes in reasonable time for large data
    import time
    data = {"nums": [float(i) for i in range(1000)]}
    start = time.time()
    codeflash_output = _normalize_json_dumps(data); result = codeflash_output # 90.7μs -> 87.6μs (3.58% faster)
    duration = time.time() - start
    loaded = json.loads(result)
    # All values should be int
    for i, v in enumerate(loaded["nums"]):
        pass
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import json
from typing import Any

# imports
import pytest  # used for our unit tests
from skyvern.forge.sdk.core.security import _normalize_json_dumps

# unit tests

# -------------------- BASIC TEST CASES --------------------

def test_empty_dict():
    # Test that an empty dict is dumped as '{}'
    codeflash_output = _normalize_json_dumps({}) # 5.88μs -> 5.90μs (0.322% slower)

def test_simple_integers():
    # Test that integers are serialized as-is
    data = {'a': 1, 'b': 2}
    codeflash_output = _normalize_json_dumps(data) # 6.96μs -> 6.82μs (2.02% faster)

def test_simple_floats():
    # Test that floats that are not integers are serialized as floats
    data = {'a': 1.5, 'b': 2.0}
    # 2.0 should be normalized to 2 (int), 1.5 should remain float
    codeflash_output = _normalize_json_dumps(data); result = codeflash_output # 7.64μs -> 7.67μs (0.391% slower)
    # Parse back to check
    loaded = json.loads(result)

def test_nested_dict():
    # Test nested dict with float normalization
    data = {'a': {'b': 2.0, 'c': 3.3}}
    codeflash_output = _normalize_json_dumps(data); result = codeflash_output # 7.98μs -> 7.90μs (0.936% faster)
    loaded = json.loads(result)

def test_list_of_numbers():
    # Test list containing floats and ints
    data = {'lst': [1, 2.0, 3.5, 4.0]}
    codeflash_output = _normalize_json_dumps(data); result = codeflash_output # 8.28μs -> 8.39μs (1.36% slower)
    loaded = json.loads(result)

def test_string_and_bool():
    # Test with string and boolean values
    data = {'s': "hello", 'b': True, 'f': 2.0}
    codeflash_output = _normalize_json_dumps(data); result = codeflash_output # 7.15μs -> 7.21μs (0.874% slower)
    loaded = json.loads(result)

def test_none_value():
    # Test with None value
    data = {'x': None, 'y': 2.0}
    codeflash_output = _normalize_json_dumps(data); result = codeflash_output # 6.51μs -> 6.59μs (1.24% slower)
    loaded = json.loads(result)

# -------------------- EDGE TEST CASES --------------------

def test_float_negative_zero():
    # Python's -0.0 is a float, but should normalize to 0
    data = {'negzero': -0.0}
    codeflash_output = _normalize_json_dumps(data); result = codeflash_output # 6.27μs -> 6.26μs (0.208% faster)
    loaded = json.loads(result)

def test_large_float():
    # Large float that is an integer
    data = {'big': 1e10}
    codeflash_output = _normalize_json_dumps(data); result = codeflash_output # 6.21μs -> 6.36μs (2.31% slower)
    loaded = json.loads(result)

def test_small_float():
    # Small float that is not an integer
    data = {'small': 1e-10}
    codeflash_output = _normalize_json_dumps(data); result = codeflash_output # 9.04μs -> 9.06μs (0.298% slower)
    loaded = json.loads(result)

def test_nested_list_dict():
    # Nested lists and dicts
    data = {'a': [{'b': 2.0}, {'c': [3.0, 4.5]}]}
    codeflash_output = _normalize_json_dumps(data); result = codeflash_output # 9.48μs -> 9.53μs (0.545% slower)
    loaded = json.loads(result)

def test_tuple_and_set():
    # Sets are not JSON serializable (tuples are, as arrays), so the set value should raise TypeError
    data = {'t': (1, 2.0), 's': {1, 2.0}}
    with pytest.raises(TypeError):
        _normalize_json_dumps(data) # 8.19μs -> 8.01μs (2.30% faster)

def test_non_ascii_characters():
    # Non-ASCII characters should be preserved (ensure_ascii=False)
    data = {'greeting': 'こんにちは', 'num': 2.0}
    codeflash_output = _normalize_json_dumps(data); result = codeflash_output # 7.51μs -> 7.52μs (0.106% slower)
    loaded = json.loads(result)

def test_keys_with_ints():
    # JSON keys must be strings, so int keys will be converted to strings by json.dumps
    data = {1: 'a', 2.0: 'b'}
    # 2.0 key should be normalized to 2
    codeflash_output = _normalize_json_dumps(data); result = codeflash_output # 7.51μs -> 7.45μs (0.846% faster)
    loaded = json.loads(result)

def test_empty_list_and_dict():
    # Test with empty list and dict
    data = {'empty_list': [], 'empty_dict': {}, 'num': 2.0}
    codeflash_output = _normalize_json_dumps(data); result = codeflash_output # 11.0μs -> 10.9μs (0.201% faster)
    loaded = json.loads(result)

def test_bool_vs_int():
    # bools should not be normalized to ints
    data = {'true': True, 'false': False, 'one': 1.0, 'zero': 0.0}
    codeflash_output = _normalize_json_dumps(data); result = codeflash_output # 8.39μs -> 8.36μs (0.431% faster)
    loaded = json.loads(result)

# -------------------- LARGE SCALE TEST CASES --------------------

def test_large_flat_dict():
    # Test with a large flat dict of floats and ints
    data = {f'k{i}': float(i) if i % 2 == 0 else i for i in range(500)}
    codeflash_output = _normalize_json_dumps(data); result = codeflash_output # 86.0μs -> 77.6μs (10.8% faster)
    loaded = json.loads(result)
    for i in range(500):
        key = f'k{i}'
        if i % 2 == 0:
            pass
        else:
            pass

def test_large_nested_list():
    # Test with a large list of dicts containing floats
    data = {'lst': [{'v': float(i)} for i in range(500)]}
    codeflash_output = _normalize_json_dumps(data); result = codeflash_output # 185μs -> 174μs (5.94% faster)
    loaded = json.loads(result)
    for i, item in enumerate(loaded['lst']):
        pass

def test_large_mixed_structure():
    # Large mixed structure of lists and dicts with floats and ints
    data = {
        'outer': [
            {'a': float(i), 'b': [float(j) for j in range(5)]}
            for i in range(100)
        ]
    }
    codeflash_output = _normalize_json_dumps(data); result = codeflash_output # 110μs -> 103μs (7.13% faster)
    loaded = json.loads(result)
    for i, d in enumerate(loaded['outer']):
        pass

def test_large_unicode():
    # Test with large number of unicode strings
    data = {f'ключ{i}': f'значение{i}' for i in range(200)}
    codeflash_output = _normalize_json_dumps(data); result = codeflash_output # 45.1μs -> 38.8μs (16.1% faster)
    loaded = json.loads(result)
    for i in range(200):
        pass

def test_large_float_precision():
    # Test that floats with decimals are not normalized
    data = {f'k{i}': i + 0.5 for i in range(500)}
    codeflash_output = _normalize_json_dumps(data); result = codeflash_output # 116μs -> 114μs (2.21% faster)
    loaded = json.loads(result)
    for i in range(500):
        pass
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, `git checkout codeflash/optimize-_normalize_json_dumps-mird3ltr` and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 December 4, 2025 11:38
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Dec 4, 2025