codeflash-ai bot commented on Dec 4, 2025

📄 12% (0.12x) speedup for encode_query in skyvern/client/core/query_encoder.py

⏱️ Runtime : 9.27 milliseconds → 8.24 milliseconds (best of 206 runs)

📝 Explanation and details

The optimization achieves a 12% speedup by restructuring type checking logic to minimize redundant isinstance() calls and reduce branch prediction overhead.

Key optimizations applied:

  1. Reordered type checking: Moved dict check first since it's more common in query structures, avoiding the expensive protocol lookup for pydantic models in the common case.

  2. Eliminated redundant compound conditions: Split the original isinstance(query_value, pydantic.BaseModel) or isinstance(query_value, dict) into separate elif branches, reducing the number of type checks when the first condition fails.

  3. Streamlined list processing: In the list handling branch, removed nested conditional logic and directly handled dict and pydantic.BaseModel cases separately, eliminating duplicate isinstance() calls within the loop.

  4. Direct method calls: For pydantic models in lists, directly call .dict(by_alias=True) instead of storing in an intermediate variable, reducing memory allocations.
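The restructured control flow described above can be sketched as follows. This is a minimal, dependency-free illustration consistent with the regression tests below, not the actual skyvern source; the pydantic.BaseModel branches (which call .dict(by_alias=True) and recurse) are shown as comments so the sketch runs with the standard library only.

```python
from typing import Any, Dict, List, Optional, Tuple


def traverse_query_dict(dict_flat: Dict[str, Any], key_prefix: str = "") -> List[Tuple[str, Any]]:
    """Flatten a nested dict into (key, value) pairs using bracket notation."""
    result: List[Tuple[str, Any]] = []
    for k, v in dict_flat.items():
        key = f"{key_prefix}[{k}]" if key_prefix else k
        if isinstance(v, dict):
            # dict checked first: the common case, and a cheap exact-type check
            result.extend(traverse_query_dict(v, key))
        elif isinstance(v, list):
            for item in v:
                if isinstance(item, dict):
                    result.extend(traverse_query_dict(item, key))
                # elif isinstance(item, pydantic.BaseModel):  # elided branch:
                #     result.extend(traverse_query_dict(item.dict(by_alias=True), key))
                else:
                    result.append((key, item))
        # elif isinstance(v, pydantic.BaseModel):  # elided branch, checked after dict
        #     result.extend(traverse_query_dict(v.dict(by_alias=True), key))
        else:
            result.append((key, v))
    return result


def encode_query(query: Optional[Dict[str, Any]]) -> Optional[List[Tuple[str, Any]]]:
    if query is None:
        return None
    return traverse_query_dict(query)
```

For plain dicts and lists this reproduces the behavior exercised by the tests below, e.g. encode_query({'user': {'name': 'Alice', 'age': 30}}) yields [('user[name]', 'Alice'), ('user[age]', 30)].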

Why this leads to speedup:

  • isinstance() calls are relatively expensive in Python, especially for protocol-based types like pydantic models
  • Branch prediction is improved with simpler conditional structures
  • Fewer temporary variable assignments reduce memory pressure
  • The reordering takes advantage of the fact that plain dicts are more common than pydantic models in typical query parameters
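The cost asymmetry between these checks can be illustrated with a toy benchmark. ModelLike below is a hypothetical stand-in for a metaclass-driven instance check like pydantic's (pydantic itself is not imported), and absolute timings vary by machine:

```python
import timeit


class ProtocolLikeMeta(type):
    # Simulates a metaclass with a custom __instancecheck__, similar in
    # spirit to how pydantic models can hook into isinstance(); this makes
    # each isinstance() call run Python-level code instead of the fast
    # C-level exact-type check that dict gets.
    def __instancecheck__(cls, obj: object) -> bool:
        return hasattr(obj, "__fields__")


class ModelLike(metaclass=ProtocolLikeMeta):
    pass


value = {"a": 1}

cheap = timeit.timeit(lambda: isinstance(value, dict), number=100_000)
costly = timeit.timeit(lambda: isinstance(value, ModelLike), number=100_000)
print(f"isinstance(..., dict):      {cheap:.4f}s")
print(f"isinstance(..., ModelLike): {costly:.4f}s")
```

Putting the cheap dict check first means the slower model check only runs for values that are not plain dicts.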

Impact on workloads:
The function is called from HTTP client methods (request() and stream()) for encoding query parameters in API calls. Since these are hot paths that may process many requests with complex nested data structures, the 12% improvement becomes significant at scale. The optimization particularly benefits workloads with:

  • Large lists of dictionaries (64.9% faster in tests)
  • Nested dictionary structures (up to 23.3% faster)
  • Mixed data structures with many type checks (40.4% faster for large mixed structures)

The optimization maintains identical behavior while significantly improving performance for dictionary-heavy query encoding scenarios.
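To show where the encoded pairs end up on the hot path, here is a hypothetical sketch of a client assembling a request URL from already-encoded (key, value) pairs; build_url and the example URL are illustrative names, not the actual skyvern client code:

```python
import urllib.parse
from typing import Any, List, Optional, Tuple


def build_url(base: str, encoded: Optional[List[Tuple[str, Any]]]) -> str:
    """Append already-encoded (key, value) pairs as a percent-encoded query string."""
    if not encoded:
        return base
    # urlencode accepts a list of pairs, preserving repeated keys like tags=a&tags=b
    qs = urllib.parse.urlencode([(k, v) for k, v in encoded if v is not None])
    return f"{base}?{qs}" if qs else base


print(build_url("https://api.example.com/tasks",
                [("user[name]", "Alice"), ("user[age]", 30)]))
```

The bracketed keys produced by the encoder are percent-encoded by urlencode (for example user[name] becomes user%5Bname%5D).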

Correctness verification report:

Test                           Status
⚙️ Existing Unit Tests         🔘 None Found
🌀 Generated Regression Tests  58 Passed
⏪ Replay Tests                🔘 None Found
🔎 Concolic Coverage Tests     🔘 None Found
📊 Tests Coverage              100.0%
🌀 Generated Regression Tests and Runtime
from typing import Any, Dict, List, Optional, Tuple

import pydantic
import pytest  # used for our unit tests

from skyvern.client.core.query_encoder import encode_query

# unit tests

# Helper Pydantic model for testing
class Address(pydantic.BaseModel):
    street: str
    city: str

class User(pydantic.BaseModel):
    name: str
    age: int
    address: Address

class AliasModel(pydantic.BaseModel):
    the_id: int

    class Config:
        fields = {'the_id': {'alias': 'id'}}

# 1. Basic Test Cases

def test_encode_query_none():
    # Test that None input returns None
    codeflash_output = encode_query(None) # 312ns -> 313ns (0.319% slower)

def test_encode_query_empty_dict():
    # Test that empty dict returns empty list
    codeflash_output = encode_query({}) # 514ns -> 517ns (0.580% slower)

def test_encode_query_simple_flat_dict():
    # Test encoding of a simple flat dict
    query = {'a': 1, 'b': 'foo'}
    expected = [('a', 1), ('b', 'foo')]
    codeflash_output = encode_query(query) # 1.88μs -> 1.93μs (2.34% slower)

def test_encode_query_list_of_scalars():
    # Test encoding of a list of scalars
    query = {'tags': ['python', 'pytest']}
    expected = [('tags', 'python'), ('tags', 'pytest')]
    codeflash_output = encode_query(query) # 2.16μs -> 2.15μs (0.464% faster)

def test_encode_query_nested_dict():
    # Test encoding of a nested dict
    query = {'user': {'name': 'Alice', 'age': 30}}
    expected = [('user[name]', 'Alice'), ('user[age]', 30)]
    codeflash_output = encode_query(query) # 2.48μs -> 2.08μs (19.5% faster)

def test_encode_query_list_of_dicts():
    # Test encoding of a list of dicts
    query = {'users': [{'name': 'Alice'}, {'name': 'Bob'}]}
    expected = [('users[name]', 'Alice'), ('users[name]', 'Bob')]
    codeflash_output = encode_query(query) # 3.84μs -> 3.12μs (23.0% faster)

def test_encode_query_pydantic_model():
    # Test encoding of a Pydantic model
    address = Address(street='Main', city='Springfield')
    user = User(name='Alice', age=30, address=address)
    query = {'user': user}
    expected = [
        ('user[name]', 'Alice'),
        ('user[age]', 30),
        ('user[address][street]', 'Main'),
        ('user[address][city]', 'Springfield')
    ]
    codeflash_output = encode_query(query); result = codeflash_output # 18.7μs -> 19.2μs (2.43% slower)

def test_encode_query_list_of_pydantic_models():
    # Test encoding of a list of Pydantic models
    users = [
        User(name='Alice', age=30, address=Address(street='Main', city='Springfield')),
        User(name='Bob', age=25, address=Address(street='Second', city='Shelbyville'))
    ]
    query = {'users': users}
    expected = [
        ('users[name]', 'Alice'),
        ('users[age]', 30),
        ('users[address][street]', 'Main'),
        ('users[address][city]', 'Springfield'),
        ('users[name]', 'Bob'),
        ('users[age]', 25),
        ('users[address][street]', 'Second'),
        ('users[address][city]', 'Shelbyville')
    ]
    codeflash_output = encode_query(query) # 21.1μs -> 20.1μs (4.85% faster)

def test_encode_query_empty_list():
    # Test encoding of an empty list
    query = {'tags': []}
    expected = []
    codeflash_output = encode_query(query) # 1.45μs -> 1.48μs (2.36% slower)

def test_encode_query_empty_nested_dict():
    # Test encoding of an empty nested dict
    query = {'user': {}}
    expected = []
    codeflash_output = encode_query(query) # 1.51μs -> 1.08μs (39.6% faster)

def test_encode_query_list_of_empty_dicts():
    # Test encoding of a list of empty dicts
    query = {'users': [{}, {}]}
    expected = []
    codeflash_output = encode_query(query) # 3.00μs -> 2.21μs (35.5% faster)

def test_encode_query_dict_with_none_value():
    # Test encoding of dict with None value
    query = {'a': None}
    expected = [('a', None)]
    codeflash_output = encode_query(query) # 1.40μs -> 1.40μs (0.214% slower)

def test_encode_query_nested_none():
    # Test encoding of nested None value
    query = {'user': {'name': None}}
    expected = [('user[name]', None)]
    codeflash_output = encode_query(query) # 2.00μs -> 1.66μs (20.5% faster)

def test_encode_query_list_with_none():
    # Test encoding of a list containing None
    query = {'tags': ['python', None, 'pytest']}
    expected = [('tags', 'python'), ('tags', None), ('tags', 'pytest')]
    codeflash_output = encode_query(query) # 2.17μs -> 2.25μs (3.56% slower)

def test_encode_query_mixed_types():
    # Test encoding of mixed types in list
    query = {'values': [1, 'two', None, {'three': 3}]}
    expected = [
        ('values', 1),
        ('values', 'two'),
        ('values', None),
        ('values[three]', 3)
    ]
    codeflash_output = encode_query(query) # 3.59μs -> 3.18μs (12.8% faster)

def test_encode_query_deeply_nested_dict():
    # Test encoding of a deeply nested dict
    query = {'a': {'b': {'c': {'d': 1}}}}
    expected = [('a[b][c][d]', 1)]
    codeflash_output = encode_query(query) # 2.76μs -> 2.29μs (20.9% faster)

def test_encode_query_deeply_nested_list_of_dicts():
    # Test encoding of a dict with a list of dicts, each containing a list
    query = {'groups': [{'members': ['Alice', 'Bob']}, {'members': ['Carol']}]}
    expected = [
        ('groups[members]', 'Alice'),
        ('groups[members]', 'Bob'),
        ('groups[members]', 'Carol')
    ]
    codeflash_output = encode_query(query) # 4.02μs -> 3.34μs (20.4% faster)

def test_encode_query_dict_with_various_scalar_types():
    # Test encoding of dict with various scalar types
    query = {'int': 1, 'float': 2.5, 'bool_true': True, 'bool_false': False, 'none': None}
    expected = [('int', 1), ('float', 2.5), ('bool_true', True), ('bool_false', False), ('none', None)]
    codeflash_output = encode_query(query) # 2.93μs -> 3.13μs (6.27% slower)

def test_encode_query_list_of_lists():
    # Test encoding of a list of lists (should flatten only one level)
    query = {'matrix': [[1, 2], [3, 4]]}
    expected = [
        ('matrix', [1, 2]),
        ('matrix', [3, 4])
    ]
    codeflash_output = encode_query(query) # 1.84μs -> 1.87μs (1.28% slower)

def test_encode_query_dict_with_empty_string():
    # Test encoding of dict with empty string value
    query = {'a': ''}
    expected = [('a', '')]
    codeflash_output = encode_query(query) # 1.28μs -> 1.30μs (1.76% slower)

def test_encode_query_dict_with_zero():
    # Test encoding of dict with zero value
    query = {'a': 0}
    expected = [('a', 0)]
    codeflash_output = encode_query(query) # 1.26μs -> 1.31μs (4.11% slower)

def test_encode_query_dict_with_false():
    # Test encoding of dict with False value
    query = {'a': False}
    expected = [('a', False)]
    codeflash_output = encode_query(query) # 1.40μs -> 1.41μs (1.20% slower)

def test_encode_query_dict_with_true():
    # Test encoding of dict with True value
    query = {'a': True}
    expected = [('a', True)]
    codeflash_output = encode_query(query) # 1.39μs -> 1.32μs (4.91% faster)

def test_encode_query_nested_dict_with_list_of_dicts_and_scalars():
    # Test encoding of nested dict with list of dicts and scalars
    query = {'outer': {'inner': [{'x': 1}, {'x': 2}], 'y': 'z'}}
    expected = [('outer[inner][x]', 1), ('outer[inner][x]', 2), ('outer[y]', 'z')]
    codeflash_output = encode_query(query) # 3.94μs -> 3.37μs (16.7% faster)

def test_encode_query_dict_with_unicode_and_special_characters():
    # Test encoding of dict with unicode and special characters
    query = {'emoji': '😀', 'special': '@#$%^&*'}
    expected = [('emoji', '😀'), ('special', '@#$%^&*')]
    codeflash_output = encode_query(query) # 1.73μs -> 1.75μs (0.971% slower)

# 3. Large Scale Test Cases

def test_encode_query_large_flat_dict():
    # Test encoding of a large flat dict (up to 1000 items)
    query = {f'key{i}': i for i in range(1000)}
    expected = [(f'key{i}', i) for i in range(1000)]
    codeflash_output = encode_query(query) # 139μs -> 138μs (0.827% faster)

def test_encode_query_large_list_of_scalars():
    # Test encoding of a large list of scalars
    query = {'numbers': list(range(1000))}
    expected = [('numbers', i) for i in range(1000)]
    codeflash_output = encode_query(query) # 84.4μs -> 84.8μs (0.495% slower)

def test_encode_query_large_list_of_dicts():
    # Test encoding of a large list of dicts
    query = {'users': [{'id': i, 'name': f'user{i}'} for i in range(1000)]}
    expected = []
    for i in range(1000):
        expected.append(('users[id]', i))
        expected.append(('users[name]', f'user{i}'))
    codeflash_output = encode_query(query) # 660μs -> 400μs (64.9% faster)

def test_encode_query_large_nested_dict():
    # Test encoding of a large nested dict (depth=3, width=10)
    query = {f'level1_{i}': {f'level2_{j}': {f'level3_{k}': k for k in range(10)} for j in range(10)} for i in range(10)}
    expected = []
    for i in range(10):
        for j in range(10):
            for k in range(10):
                expected.append((f'level1_{i}[level2_{j}][level3_{k}]', k))
    codeflash_output = encode_query(query) # 118μs -> 116μs (1.84% faster)

def test_encode_query_large_mixed_structure():
    # Test encoding of a large mixed structure
    query = {
        'numbers': list(range(500)),
        'users': [{'id': i, 'tags': [f'tag{i}', f'tag{i+1}']} for i in range(10)],
        'meta': {'count': 10, 'desc': 'large test'}
    }
    expected = [('numbers', i) for i in range(500)]
    for i in range(10):
        expected.append(('users[id]', i))
        expected.append(('users[tags]', f'tag{i}'))
        expected.append(('users[tags]', f'tag{i+1}'))
    expected.append(('meta[count]', 10))
    expected.append(('meta[desc]', 'large test'))
    codeflash_output = encode_query(query) # 58.0μs -> 54.9μs (5.47% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from typing import Any, Dict, List, Optional, Tuple

import pydantic
import pytest  # used for our unit tests

from skyvern.client.core.query_encoder import encode_query

# unit tests

# Basic Test Cases

def test_encode_query_none():
    # Test None input returns None
    codeflash_output = encode_query(None) # 302ns -> 298ns (1.34% faster)

def test_encode_query_empty_dict():
    # Test empty dict returns empty list
    codeflash_output = encode_query({}) # 542ns -> 524ns (3.44% faster)

def test_encode_query_simple_flat_dict():
    # Test a simple flat dict
    input_query = {'a': 1, 'b': 'foo'}
    expected = [('a', 1), ('b', 'foo')]
    codeflash_output = encode_query(input_query) # 1.95μs -> 2.05μs (5.31% slower)

def test_encode_query_simple_nested_dict():
    # Test a simple nested dict
    input_query = {'a': {'b': 2}}
    expected = [('a[b]', 2)]
    codeflash_output = encode_query(input_query) # 2.28μs -> 1.85μs (23.3% faster)

def test_encode_query_list_of_scalars():
    # Test a list of scalar values
    input_query = {'a': [1, 2, 3]}
    expected = [('a', 1), ('a', 2), ('a', 3)]
    codeflash_output = encode_query(input_query) # 2.23μs -> 2.29μs (2.62% slower)

def test_encode_query_list_of_dicts():
    # Test a list of dicts
    input_query = {'a': [{'b': 1}, {'b': 2}]}
    expected = [('a[b]', 1), ('a[b]', 2)]
    codeflash_output = encode_query(input_query) # 3.81μs -> 3.14μs (21.5% faster)

def test_encode_query_mixed_types():
    # Test mixed types in dict
    input_query = {'a': 1, 'b': {'c': [2, 3]}, 'd': [4, {'e': 5}]}
    expected = [
        ('a', 1),
        ('b[c]', 2),
        ('b[c]', 3),
        ('d', 4),
        ('d[e]', 5)
    ]
    codeflash_output = encode_query(input_query) # 4.61μs -> 4.31μs (6.75% faster)

# Edge Test Cases

def test_encode_query_empty_list():
    # Test an empty list
    input_query = {'a': []}
    expected = []
    codeflash_output = encode_query(input_query) # 1.44μs -> 1.45μs (0.829% slower)

def test_encode_query_empty_dict_in_list():
    # Test a list containing an empty dict
    input_query = {'a': [{}]}
    expected = []
    codeflash_output = encode_query(input_query) # 2.22μs -> 1.90μs (17.1% faster)

def test_encode_query_deeply_nested_dict():
    # Test deeply nested dict
    input_query = {'a': {'b': {'c': {'d': 42}}}}
    expected = [('a[b][c][d]', 42)]
    codeflash_output = encode_query(input_query) # 2.72μs -> 2.29μs (18.9% faster)

def test_encode_query_list_of_lists():
    # Test a list containing lists (should treat inner lists as scalars)
    input_query = {'a': [[1, 2], [3, 4]]}
    # Lists inside lists are not flattened recursively, so they're treated as scalars
    expected = [('a', [1, 2]), ('a', [3, 4])]
    codeflash_output = encode_query(input_query) # 1.93μs -> 2.00μs (3.75% slower)

def test_encode_query_dict_with_none_value():
    # Test dict with None value
    input_query = {'a': None}
    expected = [('a', None)]
    codeflash_output = encode_query(input_query) # 1.29μs -> 1.35μs (4.67% slower)

def test_encode_query_dict_with_bool_and_float():
    # Test dict with bool and float values
    input_query = {'a': True, 'b': False, 'c': 1.23}
    expected = [('a', True), ('b', False), ('c', 1.23)]
    codeflash_output = encode_query(input_query) # 2.32μs -> 2.53μs (8.37% slower)

def test_encode_query_dict_with_empty_string():
    # Test dict with empty string
    input_query = {'a': ''}
    expected = [('a', '')]
    codeflash_output = encode_query(input_query) # 1.31μs -> 1.36μs (3.61% slower)

def test_encode_query_dict_with_special_characters():
    # Test keys and values with special characters
    input_query = {'a b': 'c&d', 'e/f': {'g.h': 'i=j'}}
    expected = [('a b', 'c&d'), ('e/f[g.h]', 'i=j')]
    codeflash_output = encode_query(input_query) # 2.51μs -> 2.23μs (12.6% faster)

def test_encode_query_list_of_empty_dicts():
    # Test a list of empty dicts
    input_query = {'a': [{}, {}]}
    expected = []
    codeflash_output = encode_query(input_query) # 3.01μs -> 2.25μs (34.0% faster)

def test_encode_query_nested_list_of_dicts():
    # Test nested list of dicts
    input_query = {'a': [{'b': [{'c': 1}, {'c': 2}]}]}
    expected = [('a[b][c]', 1), ('a[b][c]', 2)]
    codeflash_output = encode_query(input_query) # 3.88μs -> 3.52μs (10.5% faster)

def test_encode_query_dict_with_tuple_value():
    # Tuples are treated as scalars
    input_query = {'a': (1, 2)}
    expected = [('a', (1, 2))]
    codeflash_output = encode_query(input_query) # 1.39μs -> 1.49μs (6.72% slower)

def test_encode_query_dict_with_set_value():
    # Sets are treated as scalars
    input_query = {'a': {1, 2}}
    expected = [('a', {1, 2})]
    codeflash_output = encode_query(input_query) # 1.23μs -> 1.31μs (6.09% slower)

# Pydantic Model Test Cases

class SimpleModel(pydantic.BaseModel):
    x: int
    y: str

def test_encode_query_pydantic_model():
    # Test encoding a pydantic model
    model = SimpleModel(x=1, y='foo')
    input_query = {'model': model}
    expected = [('model[x]', 1), ('model[y]', 'foo')]
    codeflash_output = encode_query(input_query) # 17.0μs -> 17.2μs (1.41% slower)

def test_encode_query_list_of_pydantic_models():
    # Test encoding a list of pydantic models
    model1 = SimpleModel(x=1, y='foo')
    model2 = SimpleModel(x=2, y='bar')
    input_query = {'models': [model1, model2]}
    expected = [('models[x]', 1), ('models[y]', 'foo'), ('models[x]', 2), ('models[y]', 'bar')]
    codeflash_output = encode_query(input_query) # 17.9μs -> 17.3μs (3.22% faster)

class NestedModel(pydantic.BaseModel):
    a: SimpleModel
    b: int

def test_encode_query_nested_pydantic_model():
    # Test encoding a nested pydantic model
    model = NestedModel(a=SimpleModel(x=5, y='baz'), b=10)
    input_query = {'nm': model}
    expected = [('nm[a][x]', 5), ('nm[a][y]', 'baz'), ('nm[b]', 10)]
    codeflash_output = encode_query(input_query) # 11.7μs -> 12.3μs (4.24% slower)

def test_encode_query_large_flat_dict():
    # Test a large flat dict
    input_query = {f'key{i}': i for i in range(1000)}
    expected = [(f'key{i}', i) for i in range(1000)]
    codeflash_output = encode_query(input_query) # 145μs -> 144μs (0.691% faster)

def test_encode_query_large_nested_dict():
    # Test a large nested dict
    input_query = {'outer': {f'inner{i}': i for i in range(1000)}}
    expected = [(f'outer[inner{i}]', i) for i in range(1000)]
    codeflash_output = encode_query(input_query) # 91.8μs -> 90.7μs (1.19% faster)

def test_encode_query_large_list_of_dicts():
    # Test a large list of dicts
    input_query = {'a': [{'b': i} for i in range(1000)]}
    expected = [('a[b]', i) for i in range(1000)]
    codeflash_output = encode_query(input_query) # 461μs -> 221μs (108% faster)

def test_encode_query_large_list_of_scalars():
    # Test a large list of scalars
    input_query = {'a': list(range(1000))}
    expected = [('a', i) for i in range(1000)]
    codeflash_output = encode_query(input_query) # 84.7μs -> 84.8μs (0.093% slower)

def test_encode_query_large_mixed_structure():
    # Test a large mixed structure
    input_query = {
        'a': [i for i in range(500)],
        'b': [{'c': i, 'd': [i, i+1]} for i in range(250)],
        'e': {'f': [i for i in range(100)]}
    }
    expected = []
    expected.extend([('a', i) for i in range(500)])
    for i in range(250):
        expected.append(('b[c]', i))
        expected.append(('b[d]', i))
        expected.append(('b[d]', i+1))
    for i in range(100):
        expected.append(('e[f]', i))
    codeflash_output = encode_query(input_query) # 220μs -> 156μs (40.4% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run git checkout codeflash/optimize-encode_query-mira9h0d and push.


codeflash-ai bot requested a review from mashraf-222 on Dec 4, 2025 at 10:18, and added the labels ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash).