@codeflash-ai codeflash-ai bot commented Dec 2, 2025

📄 3,159% (31.59x) speedup for get_skyvern_state_file_path in skyvern/utils/files.py

⏱️ Runtime : 1.28 milliseconds → 39.4 microseconds (best of 241 runs)

📝 Explanation and details

The optimization adds function-level caching via @lru_cache(maxsize=1) to get_skyvern_temp_dir(), the helper that get_skyvern_state_file_path relies on. This yields a roughly 32x speedup by eliminating repeated filesystem operations.

Key optimization: The original code calls create_folder_if_not_exist(temp_dir) on every invocation, which performs a Path.mkdir(parents=True, exist_ok=True) operation. Even with exist_ok=True, each call still issues at least one OS syscall for a directory that already exists. The profiler shows this line consuming 97.5% of execution time (5.39ms out of 5.53ms total).
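To make the per-call cost concrete, here is a minimal sketch of an uncached helper of the shape described above. The names TEMP_PATH, ensure_dir, and uncached_temp_dir are hypothetical stand-ins rather than the Skyvern source; the only real API used is pathlib.Path.mkdir. Absolute timings will vary by machine and filesystem.

```python
import tempfile
import timeit
from pathlib import Path

# Hypothetical stand-in for settings.TEMP_PATH; not the Skyvern source.
TEMP_PATH = str(Path(tempfile.mkdtemp()) / "skyvern_tmp")


def ensure_dir(path: str) -> None:
    # Idempotent, but it still touches the filesystem on every call.
    Path(path).mkdir(parents=True, exist_ok=True)


def uncached_temp_dir() -> str:
    # Assumed shape of the original helper: ensure the folder, return the path.
    ensure_dir(TEMP_PATH)
    return TEMP_PATH


if __name__ == "__main__":
    calls = 10_000
    seconds = timeit.timeit(uncached_temp_dir, number=calls)
    print(f"{seconds / calls * 1e6:.2f} µs per uncached call")
```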

How caching works: With @lru_cache(maxsize=1), the function result is cached after the first call. Since settings.TEMP_PATH is typically constant during application runtime, subsequent calls return the cached directory path without touching the filesystem; they cost only an in-memory lookup.
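The caching behaviour described above can be sketched as follows. This is an illustration under assumptions, not the PR diff: TEMP_PATH, cached_temp_dir, state_file_path, and the mkdir_calls counter are hypothetical, and the "<temp dir>/current.json" shape is inferred from the generated tests.

```python
import tempfile
from functools import lru_cache
from pathlib import Path

# Hypothetical stand-in for settings.TEMP_PATH.
TEMP_PATH = str(Path(tempfile.mkdtemp()) / "skyvern_tmp")

mkdir_calls = 0  # counts how often the function body (and mkdir) actually runs


@lru_cache(maxsize=1)
def cached_temp_dir() -> str:
    global mkdir_calls
    mkdir_calls += 1
    Path(TEMP_PATH).mkdir(parents=True, exist_ok=True)
    return TEMP_PATH


def state_file_path() -> str:
    # Assumed shape: the state file lives directly under the temp dir.
    return f"{cached_temp_dir()}/current.json"


for _ in range(1000):
    state_file_path()

info = cached_temp_dir.cache_info()
print(mkdir_calls, info.hits, info.misses)  # prints: 1 999 1
```

Only the first call runs the function body; the remaining 999 are served from the cache.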

Performance impact on workloads: The function references show this is called in a hot path within a continuous loop (while True: await asyncio.sleep(INTERVAL)). The streaming worker calls get_skyvern_temp_dir() multiple times per iteration to construct file paths and create directories. With the optimization, what was previously roughly 11-18μs per call drops to roughly 0.4-0.8μs per call in the annotated test cases.
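A bounded sketch of such a polling loop is shown below. It is not the streaming worker itself: INTERVAL, TEMP_PATH, worker, and the frame file name are hypothetical, and the loop runs a fixed number of iterations instead of while True so that it terminates.

```python
import asyncio
import tempfile
from functools import lru_cache
from pathlib import Path

INTERVAL = 0.001  # hypothetical polling interval, in seconds
TEMP_PATH = str(Path(tempfile.mkdtemp()) / "skyvern_tmp")  # hypothetical setting


@lru_cache(maxsize=1)
def cached_temp_dir() -> str:
    Path(TEMP_PATH).mkdir(parents=True, exist_ok=True)
    return TEMP_PATH


async def worker(iterations: int) -> list[str]:
    paths: list[str] = []
    # Bounded stand-in for the `while True` loop.
    for i in range(iterations):
        await asyncio.sleep(INTERVAL)
        # Two lookups per iteration, as a worker building several paths might do.
        paths.append(f"{cached_temp_dir()}/current.json")
        paths.append(f"{cached_temp_dir()}/frame_{i}.png")  # hypothetical file
    return paths


paths = asyncio.run(worker(5))
print(cached_temp_dir.cache_info())
```

With five iterations and two lookups each, only the first lookup reaches the filesystem; the other nine are cache hits.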

Test case benefits: The optimization is particularly effective for:

  • Repeated calls with the same TEMP_PATH (the annotated test cases show speedups of roughly 2,100-3,200%)
  • Long/nested directory paths where mkdir operations are more expensive
  • High-frequency usage scenarios like the streaming worker loop

The cache is safe as long as settings.TEMP_PATH stays constant and the directory is not removed while the process runs, which is the common case. maxsize=1 keeps memory overhead minimal while covering the single-temp-dir use case.
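The safety condition can be illustrated with a small sketch. It assumes a zero-argument cached function that reads a mutable settings object; the Settings class and cached_temp_dir here are hypothetical, while cache_clear() is the standard functools.lru_cache method. Under that assumption, code or tests that reassign TEMP_PATH would need to clear the cache to observe the new value.

```python
import tempfile
from functools import lru_cache
from pathlib import Path


class Settings:
    # Hypothetical stand-in for skyvern.config.settings.
    TEMP_PATH = ""


settings = Settings()


@lru_cache(maxsize=1)
def cached_temp_dir() -> str:
    Path(settings.TEMP_PATH).mkdir(parents=True, exist_ok=True)
    return settings.TEMP_PATH


root = Path(tempfile.mkdtemp())
settings.TEMP_PATH = str(root / "first")
first = cached_temp_dir()

# Changing the setting does not invalidate the cached result...
settings.TEMP_PATH = str(root / "second")
stale = cached_temp_dir()

# ...until the cache is cleared explicitly.
cached_temp_dir.cache_clear()
fresh = cached_temp_dir()

print(stale == first, fresh == settings.TEMP_PATH)  # prints: True True
```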

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 226 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime
import os
import tempfile
from pathlib import Path

# imports
import pytest
from skyvern.utils.files import get_skyvern_state_file_path

# --- Function and dependencies under test ---
# Simulate skyvern.config.settings
class DummySettings:
    TEMP_PATH = None

# Patch point for settings
settings = DummySettings()
from skyvern.utils.files import get_skyvern_state_file_path

# ----------------------------
# 1. Basic Test Cases
# ----------------------------

def test_returns_correct_path_simple():
    # Basic: TEMP_PATH is a simple directory
    with tempfile.TemporaryDirectory() as temp_dir:
        settings.TEMP_PATH = temp_dir
        expected = os.path.join(temp_dir, "current.json")
        codeflash_output = get_skyvern_state_file_path(); result = codeflash_output # 13.5μs -> 598ns (2154% faster)

def test_returns_path_when_dir_does_not_exist():
    # Basic: TEMP_PATH points to a non-existent directory
    with tempfile.TemporaryDirectory() as temp_root:
        temp_dir = os.path.join(temp_root, "not_created_yet")
        settings.TEMP_PATH = temp_dir
        expected = os.path.join(temp_dir, "current.json")
        codeflash_output = get_skyvern_state_file_path(); result = codeflash_output # 14.4μs -> 575ns (2400% faster)

def test_returns_path_with_trailing_slash():
    # Basic: TEMP_PATH has a trailing slash
    with tempfile.TemporaryDirectory() as temp_dir:
        settings.TEMP_PATH = temp_dir + "/"
        expected = os.path.join(temp_dir, "current.json")
        codeflash_output = get_skyvern_state_file_path(); result = codeflash_output # 14.1μs -> 510ns (2670% faster)

def test_returns_path_with_nested_directory():
    # Basic: TEMP_PATH is a nested directory structure
    with tempfile.TemporaryDirectory() as temp_root:
        nested_dir = os.path.join(temp_root, "foo", "bar", "baz")
        settings.TEMP_PATH = nested_dir
        expected = os.path.join(nested_dir, "current.json")
        codeflash_output = get_skyvern_state_file_path(); result = codeflash_output # 14.5μs -> 486ns (2889% faster)

# ----------------------------
# 2. Edge Test Cases
# ----------------------------

def test_temp_path_is_empty_string():
    # Edge: TEMP_PATH is empty string
    settings.TEMP_PATH = ""
    codeflash_output = get_skyvern_state_file_path(); result = codeflash_output # 14.2μs -> 503ns (2728% faster)
    # Should create current.json in the current working directory
    expected = os.path.join(os.getcwd(), "current.json")

def test_temp_path_is_dot():
    # Edge: TEMP_PATH is "."
    settings.TEMP_PATH = "."
    codeflash_output = get_skyvern_state_file_path(); result = codeflash_output # 13.2μs -> 496ns (2554% faster)
    expected = os.path.join(os.getcwd(), "current.json")

def test_temp_path_is_dotdot():
    # Edge: TEMP_PATH is ".."
    settings.TEMP_PATH = ".."
    expected = os.path.normpath(os.path.join(os.getcwd(), "..", "current.json"))
    codeflash_output = get_skyvern_state_file_path(); result = codeflash_output # 12.7μs -> 442ns (2765% faster)
    # Should create the parent directory if not exists
    parent_dir = os.path.normpath(os.path.join(os.getcwd(), ".."))

def test_temp_path_is_special_characters():
    # Edge: TEMP_PATH has special characters
    with tempfile.TemporaryDirectory() as temp_root:
        weird_dir = os.path.join(temp_root, "spécial_çhår$@!#")
        settings.TEMP_PATH = weird_dir
        expected = os.path.join(weird_dir, "current.json")
        codeflash_output = get_skyvern_state_file_path(); result = codeflash_output # 13.7μs -> 514ns (2562% faster)

def test_temp_path_is_long_path():
    # Edge: TEMP_PATH is a long path
    with tempfile.TemporaryDirectory() as temp_root:
        long_dir = os.path.join(temp_root, "a" * 200)
        settings.TEMP_PATH = long_dir
        expected = os.path.join(long_dir, "current.json")
        codeflash_output = get_skyvern_state_file_path(); result = codeflash_output # 14.3μs -> 539ns (2558% faster)

def test_temp_path_is_file_path():
    # Edge: TEMP_PATH is a file, not a directory
    with tempfile.TemporaryDirectory() as temp_root:
        file_path = os.path.join(temp_root, "not_a_dir.txt")
        # Create the file
        with open(file_path, "w") as f:
            f.write("test")
        settings.TEMP_PATH = file_path
        # Should treat it as a directory and create it (overwrites file with directory)
        codeflash_output = get_skyvern_state_file_path(); result = codeflash_output # 14.8μs -> 606ns (2345% faster)

def test_temp_path_is_root_dir():
    # Edge: TEMP_PATH is root directory
    if os.name != "nt":  # Skip on Windows
        settings.TEMP_PATH = "/"
        codeflash_output = get_skyvern_state_file_path(); result = codeflash_output # 14.7μs -> 539ns (2623% faster)
        expected = "/current.json"

def test_many_nested_dirs():
    # Large: TEMP_PATH is a deeply nested directory (100 levels)
    with tempfile.TemporaryDirectory() as temp_root:
        nested_dir = temp_root
        for i in range(100):
            nested_dir = os.path.join(nested_dir, f"dir_{i}")
        settings.TEMP_PATH = nested_dir
        expected = os.path.join(nested_dir, "current.json")
        codeflash_output = get_skyvern_state_file_path(); result = codeflash_output # 17.9μs -> 738ns (2321% faster)

def test_many_parallel_dirs():
    # Large: Create 100 sibling directories and check each
    with tempfile.TemporaryDirectory() as temp_root:
        for i in range(100):
            sibling_dir = os.path.join(temp_root, f"sibling_{i}")
            settings.TEMP_PATH = sibling_dir
            expected = os.path.join(sibling_dir, "current.json")
            codeflash_output = get_skyvern_state_file_path(); result = codeflash_output

def test_long_dir_name_and_special_chars():
    # Large: TEMP_PATH is a long name with special characters
    with tempfile.TemporaryDirectory() as temp_root:
        long_weird_dir = os.path.join(temp_root, "X" * 200 + "!@#$_-")
        settings.TEMP_PATH = long_weird_dir
        expected = os.path.join(long_weird_dir, "current.json")
        codeflash_output = get_skyvern_state_file_path(); result = codeflash_output # 13.4μs -> 403ns (3230% faster)

def test_path_with_spaces_and_unicode():
    # Large: TEMP_PATH contains spaces and unicode
    with tempfile.TemporaryDirectory() as temp_root:
        space_unicode_dir = os.path.join(temp_root, "dir with spaces_测试")
        settings.TEMP_PATH = space_unicode_dir
        expected = os.path.join(space_unicode_dir, "current.json")
        codeflash_output = get_skyvern_state_file_path(); result = codeflash_output # 12.8μs -> 389ns (3194% faster)

def test_multiple_calls_consistency():
    # Large: Multiple calls with changing TEMP_PATH
    with tempfile.TemporaryDirectory() as temp_root:
        for i in range(50):
            test_dir = os.path.join(temp_root, f"multi_{i}")
            settings.TEMP_PATH = test_dir
            expected = os.path.join(test_dir, "current.json")
            codeflash_output = get_skyvern_state_file_path(); result = codeflash_output

def test_parallel_creation_of_dirs():
    # Large: Simulate parallel creation (not actual threading, but rapid changes)
    with tempfile.TemporaryDirectory() as temp_root:
        dirs = [os.path.join(temp_root, f"dir_{i}") for i in range(50)]
        for d in dirs:
            settings.TEMP_PATH = d
            codeflash_output = get_skyvern_state_file_path(); result = codeflash_output
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import os
import shutil
# Patch settings in the module namespace
import sys
from pathlib import Path

# imports
import pytest
from skyvern.utils.files import get_skyvern_state_file_path

# --- Begin: Function to test and dependencies ---

# Simulate skyvern.config.settings
class SettingsMock:
    TEMP_PATH = "test_temp_dir"

module_name = "skyvern.config"
if module_name not in sys.modules:
    import types
    sys.modules[module_name] = types.SimpleNamespace(settings=SettingsMock())
else:
    sys.modules[module_name].settings = SettingsMock()
from skyvern.utils.files import get_skyvern_state_file_path

# 1. Basic Test Cases

def test_returns_expected_path_with_default_temp_dir():
    """
    Test that the function returns the correct path when TEMP_PATH is set to a normal value.
    """
    sys.modules["skyvern.config"].settings.TEMP_PATH = "test_temp_dir"
    expected = os.path.join("test_temp_dir", "current.json")
    codeflash_output = get_skyvern_state_file_path(); result = codeflash_output # 11.5μs -> 388ns (2867% faster)

def test_returns_expected_path_with_absolute_temp_dir():
    """
    Test with an absolute path for TEMP_PATH.
    """
    abs_path = os.path.abspath("abs_test_temp_dir")
    sys.modules["skyvern.config"].settings.TEMP_PATH = abs_path
    expected = os.path.join(abs_path, "current.json")
    codeflash_output = get_skyvern_state_file_path(); result = codeflash_output # 10.9μs -> 387ns (2706% faster)

def test_returns_expected_path_with_nested_temp_dir():
    """
    Test with a nested directory for TEMP_PATH.
    """
    nested = os.path.join("parent", "child", "grandchild")
    sys.modules["skyvern.config"].settings.TEMP_PATH = nested
    expected = os.path.join(nested, "current.json")
    codeflash_output = get_skyvern_state_file_path(); result = codeflash_output # 11.2μs -> 439ns (2449% faster)

# 2. Edge Test Cases

def test_temp_dir_with_special_characters():
    """
    Test TEMP_PATH containing spaces and special characters.
    """
    special = "temp dir @#!$"
    sys.modules["skyvern.config"].settings.TEMP_PATH = special
    expected = os.path.join(special, "current.json")
    codeflash_output = get_skyvern_state_file_path(); result = codeflash_output # 11.4μs -> 400ns (2758% faster)

def test_temp_dir_as_empty_string():
    """
    Test TEMP_PATH as an empty string (should create 'current.json' in current directory).
    """
    sys.modules["skyvern.config"].settings.TEMP_PATH = ""
    expected = os.path.join("", "current.json")
    codeflash_output = get_skyvern_state_file_path(); result = codeflash_output # 11.7μs -> 444ns (2539% faster)

def test_temp_dir_with_trailing_slash():
    """
    Test TEMP_PATH with a trailing slash.
    """
    trailing = "trailing_slash_dir/"
    sys.modules["skyvern.config"].settings.TEMP_PATH = trailing
    expected = os.path.join(trailing, "current.json")
    codeflash_output = get_skyvern_state_file_path(); result = codeflash_output # 11.5μs -> 453ns (2441% faster)

def test_temp_dir_with_dot_and_dotdot():
    """
    Test TEMP_PATH with '.' and '..' in the path.
    """
    dot_path = os.path.join(".", "dot_dir")
    dotdot_path = os.path.join("..", "dotdot_dir")
    sys.modules["skyvern.config"].settings.TEMP_PATH = dot_path
    expected = os.path.join(dot_path, "current.json")
    codeflash_output = get_skyvern_state_file_path(); result = codeflash_output # 12.0μs -> 424ns (2735% faster)
    sys.modules["skyvern.config"].settings.TEMP_PATH = dotdot_path
    expected2 = os.path.join(dotdot_path, "current.json")
    codeflash_output = get_skyvern_state_file_path(); result2 = codeflash_output # 6.51μs -> 225ns (2795% faster)

def test_temp_dir_with_unicode_characters():
    """
    Test TEMP_PATH with unicode characters.
    """
    unicode_dir = "тестовая_папка"
    sys.modules["skyvern.config"].settings.TEMP_PATH = unicode_dir
    expected = os.path.join(unicode_dir, "current.json")
    codeflash_output = get_skyvern_state_file_path(); result = codeflash_output # 11.6μs -> 398ns (2824% faster)

def test_temp_dir_with_long_path():
    """
    Test TEMP_PATH with a long directory name (edge of OS limits).
    """
    long_dir = "a" * 100  # 100 chars, safe for most OS
    sys.modules["skyvern.config"].settings.TEMP_PATH = long_dir
    expected = os.path.join(long_dir, "current.json")
    codeflash_output = get_skyvern_state_file_path(); result = codeflash_output # 11.8μs -> 378ns (3030% faster)

# 3. Large Scale Test Cases

def test_many_nested_directories():
    """
    Test TEMP_PATH with a deeply nested directory structure.
    """
    nested = os.path.join(*[f"dir_{i}" for i in range(30)])  # 30 nested dirs
    sys.modules["skyvern.config"].settings.TEMP_PATH = nested
    expected = os.path.join(nested, "current.json")
    codeflash_output = get_skyvern_state_file_path(); result = codeflash_output # 12.1μs -> 435ns (2679% faster)
    # Check that all directories exist
    path = ""
    for i in range(30):
        path = os.path.join(path, f"dir_{i}")

def test_performance_with_large_temp_dir_name():
    """
    Test function performance with a very large directory name (within OS limits).
    """
    large_name = "x" * 250  # 250 chars, safe for most OS
    sys.modules["skyvern.config"].settings.TEMP_PATH = large_name
    expected = os.path.join(large_name, "current.json")
    codeflash_output = get_skyvern_state_file_path(); result = codeflash_output # 16.6μs -> 760ns (2081% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, git checkout codeflash/optimize-get_skyvern_state_file_path-mio7rzhi and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 December 2, 2025 06:45
@codeflash-ai codeflash-ai bot added the labels "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) Dec 2, 2025