Skip to content

Conversation

@codeflash-ai
Copy link

@codeflash-ai codeflash-ai bot commented Dec 4, 2025

📄 139% (1.39x) speedup for UniversalBaseModel.model_construct in skyvern/client/core/pydantic_utilities.py

⏱️ Runtime : 28.2 milliseconds 11.8 milliseconds (best of 77 runs)

📝 Explanation and details

The optimized code achieves a 139% speedup through three key optimizations that reduce redundant function calls and improve type checking efficiency:

1. Single Type Origin Calculation
The original code repeatedly called typing_extensions.get_origin(clean_type) for each type check (Dict, List, Set, Sequence, Union), resulting in expensive introspection overhead. The optimized version calculates origin once and reuses it throughout, eliminating redundant calls that were consuming ~25% of the function's runtime based on the profiler data.

2. Set-Based Type Membership Checks
Instead of multiple or-chained comparisons like origin == typing.Dict or origin == dict or clean_type == typing.Dict, the code now uses efficient set membership: origin in {dict, typing.Dict} or clean_type in {dict, typing.Dict}. This reduces the number of equality operations and makes type detection faster, especially for the commonly-used container types.

3. Eliminated Redundant Processing in construct Method
The critical optimization removes the duplicate call to convert_and_respect_annotation_metadata in the construct method. The original code called this expensive function twice - once in model_construct and again in construct - effectively doubling the conversion overhead. The optimized version passes **values directly to the parent constructor after the initial conversion, cutting the function's execution time by ~80%.

Performance Impact by Test Case:

  • Large collections benefit most: 1000-element lists/dicts see 261-272% speedups due to reduced per-item overhead
  • Basic models: 40-60% improvements from eliminating redundant type checks
  • Nested/complex models: 50-60% gains from avoiding duplicate conversions
  • Union types: More efficient early-exit logic prevents unnecessary recursive calls

The optimization is particularly effective for Pydantic model construction workflows where convert_and_respect_annotation_metadata processes complex type annotations - a common pattern in API serialization libraries like this one.

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 🔘 None Found
🌀 Generated Regression Tests 25 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 100.0%
🌀 Generated Regression Tests and Runtime
import datetime as dt
import typing
from typing import Any, ClassVar, Dict, List, Optional, Set, Type

import pydantic
# imports
import pytest
from skyvern.client.core.pydantic_utilities import UniversalBaseModel
from typing_extensions import Annotated

# ---- Function to test ----
# The UniversalBaseModel and dependencies are assumed already imported as above.

# ---- Test Models ----

class SimpleModel(UniversalBaseModel):
    x: int
    y: str

class ModelWithOptional(UniversalBaseModel):
    a: int
    b: Optional[str] = None

class ModelWithAlias(UniversalBaseModel):
    x: int
    y: Annotated[str, pydantic.Field(alias="z")]

class ModelWithDefaults(UniversalBaseModel):
    a: int = 1
    b: str = "default"

class NestedModel(UniversalBaseModel):
    inner: SimpleModel
    value: int

class ModelWithList(UniversalBaseModel):
    items: List[int]

class ModelWithDict(UniversalBaseModel):
    mapping: Dict[str, int]

class ModelWithSet(UniversalBaseModel):
    tags: Set[str]

class ModelWithUnion(UniversalBaseModel):
    value: typing.Union[int, str]

class ModelWithDatetime(UniversalBaseModel):
    timestamp: dt.datetime

# ---- Basic Test Cases ----

def test_simple_model_construct_basic():
    # Test basic construction with required fields
    codeflash_output = SimpleModel.model_construct(x=1, y="foo"); obj = codeflash_output # 112μs -> 79.0μs (42.6% faster)

def test_model_with_optional_field_present():
    # Optional field present
    codeflash_output = ModelWithOptional.model_construct(a=10, b="hello"); obj = codeflash_output # 114μs -> 76.5μs (50.2% faster)

def test_model_with_optional_field_absent():
    # Optional field omitted
    codeflash_output = ModelWithOptional.model_construct(a=20); obj = codeflash_output # 93.4μs -> 64.6μs (44.6% faster)

def test_model_with_alias_field():
    # Should respect alias on input (simulate dict with alias key)
    codeflash_output = ModelWithAlias.model_construct(x=7, z="aliased"); obj = codeflash_output # 177μs -> 109μs (61.9% faster)

def test_model_with_defaults():
    # Should use defaults if not provided
    codeflash_output = ModelWithDefaults.model_construct(); obj = codeflash_output # 71.9μs -> 51.1μs (40.6% faster)

def test_model_with_defaults_override():
    # Should override defaults if provided
    codeflash_output = ModelWithDefaults.model_construct(a=5, b="overridden"); obj = codeflash_output # 81.5μs -> 54.0μs (50.9% faster)

def test_nested_model():
    # Nested UniversalBaseModel construction
    codeflash_output = NestedModel.model_construct(inner={"x": 2, "y": "bar"}, value=99); obj = codeflash_output # 134μs -> 88.1μs (52.6% faster)

def test_model_with_list():
    # List field
    codeflash_output = ModelWithList.model_construct(items=[1, 2, 3]); obj = codeflash_output # 95.3μs -> 60.1μs (58.5% faster)

def test_model_with_dict():
    # Dict field
    codeflash_output = ModelWithDict.model_construct(mapping={"a": 1, "b": 2}); obj = codeflash_output # 86.9μs -> 56.8μs (52.8% faster)

def test_model_with_set():
    # Set field
    codeflash_output = ModelWithSet.model_construct(tags={"foo", "bar"}); obj = codeflash_output # 81.8μs -> 55.9μs (46.2% faster)

def test_model_with_union_int():
    # Union field, int
    codeflash_output = ModelWithUnion.model_construct(value=123); obj = codeflash_output # 98.6μs -> 65.3μs (50.9% faster)

def test_model_with_union_str():
    # Union field, str
    codeflash_output = ModelWithUnion.model_construct(value="hello"); obj = codeflash_output # 85.5μs -> 60.4μs (41.6% faster)

def test_model_with_datetime():
    # Datetime field
    now = dt.datetime(2023, 1, 1, 12, 0, 0)
    codeflash_output = ModelWithDatetime.model_construct(timestamp=now); obj = codeflash_output # 70.4μs -> 50.3μs (40.1% faster)

# ---- Edge Test Cases ----

def test_extra_fields_are_ignored():
    # Extra fields should be ignored (pydantic construct allows extra)
    codeflash_output = SimpleModel.model_construct(x=1, y="foo", extra="bar"); obj = codeflash_output # 111μs -> 78.0μs (43.3% faster)

def test_alias_and_field_both_given():
    # If both alias and field name are given, alias should take precedence
    codeflash_output = ModelWithAlias.model_construct(x=1, y="should be ignored", z="aliased"); obj = codeflash_output # 186μs -> 115μs (62.2% faster)

def test_nested_model_with_alias():
    # Nested model with alias field
    codeflash_output = NestedModel.model_construct(inner={"x": 5, "z": "aliased"}, value=42); obj = codeflash_output # 138μs -> 89.1μs (55.3% faster)

def test_model_with_empty_list():
    # Empty list should be accepted
    codeflash_output = ModelWithList.model_construct(items=[]); obj = codeflash_output # 106μs -> 76.0μs (40.1% faster)

def test_model_with_empty_dict():
    # Empty dict should be accepted
    codeflash_output = ModelWithDict.model_construct(mapping={}); obj = codeflash_output # 81.5μs -> 59.5μs (37.0% faster)

def test_model_with_empty_set():
    # Empty set should be accepted
    codeflash_output = ModelWithSet.model_construct(tags=set()); obj = codeflash_output # 70.5μs -> 54.8μs (28.6% faster)

# ---- Large Scale Test Cases ----

def test_large_list():
    # Large list (up to 1000 elements)
    data = list(range(1000))
    codeflash_output = ModelWithList.model_construct(items=data); obj = codeflash_output # 4.51ms -> 1.21ms (272% faster)

def test_large_dict():
    # Large dict (up to 1000 elements)
    data = {str(i): i for i in range(1000)}
    codeflash_output = ModelWithDict.model_construct(mapping=data); obj = codeflash_output # 4.58ms -> 1.27ms (261% faster)

def test_large_set():
    # Large set (up to 1000 elements)
    data = set(str(i) for i in range(1000))
    codeflash_output = ModelWithSet.model_construct(tags=data); obj = codeflash_output # 2.39ms -> 901μs (165% faster)
import collections
import datetime as dt
import inspect
import typing

import pydantic
# imports
import pytest
import typing_extensions
from skyvern.client.core.pydantic_utilities import UniversalBaseModel
from typing_extensions import Annotated, NotRequired, TypedDict

# ---- Unit Tests ----

# ---- Basic Test Cases ----

def test_model_construct_invalid_type_raises():
    # Edge: wrong type for field
    class M(UniversalBaseModel):
        a: int
    with pytest.raises(Exception):
        M.model_construct(a="not an int")

def test_model_construct_missing_required_raises():
    # Edge: missing required field
    class M(UniversalBaseModel):
        a: int
    with pytest.raises(Exception):
        M.model_construct()

def test_model_construct_nested_invalid_type_raises():
    # Edge: nested wrong type
    class Inner(UniversalBaseModel):
        x: int
    class Outer(UniversalBaseModel):
        inner: Inner
    with pytest.raises(Exception):
        Outer.model_construct(inner={"x": "not an int"})
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes git checkout codeflash/optimize-UniversalBaseModel.model_construct-mireyo7o and push.

Codeflash Static Badge

The optimized code achieves a **139% speedup** through three key optimizations that reduce redundant function calls and improve type checking efficiency:

**1. Single Type Origin Calculation**
The original code repeatedly called `typing_extensions.get_origin(clean_type)` for each type check (Dict, List, Set, Sequence, Union), resulting in expensive introspection overhead. The optimized version calculates `origin` once and reuses it throughout, eliminating redundant calls that were consuming ~25% of the function's runtime based on the profiler data.

**2. Set-Based Type Membership Checks** 
Instead of multiple `or`-chained comparisons like `origin == typing.Dict or origin == dict or clean_type == typing.Dict`, the code now uses efficient set membership: `origin in {dict, typing.Dict} or clean_type in {dict, typing.Dict}`. This reduces the number of equality operations and makes type detection faster, especially for the commonly-used container types.

**3. Eliminated Redundant Processing in `construct` Method**
The critical optimization removes the duplicate call to `convert_and_respect_annotation_metadata` in the `construct` method. The original code called this expensive function twice - once in `model_construct` and again in `construct` - effectively doubling the conversion overhead. The optimized version passes `**values` directly to the parent constructor after the initial conversion, cutting the function's execution time by ~80%.

**Performance Impact by Test Case:**
- **Large collections benefit most**: 1000-element lists/dicts see 261-272% speedups due to reduced per-item overhead
- **Basic models**: 40-60% improvements from eliminating redundant type checks
- **Nested/complex models**: 50-60% gains from avoiding duplicate conversions
- **Union types**: More efficient early-exit logic prevents unnecessary recursive calls

The optimization is particularly effective for Pydantic model construction workflows where `convert_and_respect_annotation_metadata` processes complex type annotations - a common pattern in API serialization libraries like this one.
@codeflash-ai codeflash-ai bot requested a review from mashraf-222 December 4, 2025 12:30
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: Medium Optimization Quality according to Codeflash labels Dec 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: Medium Optimization Quality according to Codeflash

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant