@codeflash-ai codeflash-ai bot commented Dec 4, 2025

📄 22% (0.22x) speedup for database_exists in skyvern/cli/database.py

⏱️ Runtime: 271 milliseconds → 223 milliseconds (best of 30 runs)

📝 Explanation and details

The optimization achieves a roughly 21-22% speedup (271 ms to 223 ms overall) by combining two separate console.print() calls into a single call during error handling in the run_command function.

Key optimization:

  • Merged error messages: Instead of two separate console.print() calls for error messages, the optimized version concatenates the messages with a newline and makes a single call to console.print().
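The change can be sketched as follows. This is a minimal illustration, not the actual code in skyvern/cli/database.py: the message strings, the helper names `report_error_original` and `report_error_merged`, and the `RecordingConsole` stand-in are assumptions introduced here so the example runs without Rich installed.

```python
import subprocess
from typing import List, Optional


class RecordingConsole:
    """Hypothetical stand-in for a Rich console; it only records print calls."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def print(self, message: str, style: Optional[str] = None) -> None:
        # A real console would format the markup and write to the terminal here.
        self.calls.append(message)


def report_error_original(console, command: str, err: subprocess.CalledProcessError) -> None:
    # Before: two calls, so the formatting and I/O work happens twice.
    console.print(f"[red]Error executing command: [bold]{command}[/bold][/red]", style="red")
    console.print(f"[red]Stderr: {err.stderr.strip()}[/red]", style="red")


def report_error_merged(console, command: str, err: subprocess.CalledProcessError) -> None:
    # After: the same two lines joined with a newline and printed in one call.
    console.print(
        f"[red]Error executing command: [bold]{command}[/bold][/red]\n"
        f"[red]Stderr: {err.stderr.strip()}[/red]",
        style="red",
    )


# Assumed error shape, mirroring the failure tests in this report.
error = subprocess.CalledProcessError(returncode=2, cmd="psql", stderr="database does not exist\n")
before, after = RecordingConsole(), RecordingConsole()
report_error_original(before, "psql mydb", error)
report_error_merged(after, "psql mydb", error)
print(len(before.calls), len(after.calls))
```

The merged version emits the same text, so the visible output should be unchanged as long as the console treats the embedded newline as a line break.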

Why this improves performance:
The line profiler shows that error handling dominated execution time in the original code: the two console.print() statements consumed 92.7% of total runtime (53.2% + 39.5%). Each console.print() call involves:

  • Rich text formatting and style processing
  • Terminal I/O operations
  • Internal buffer management

By combining these into one call, the optimization eliminates the overhead of one complete formatting/I/O cycle.

Impact on workloads:
Based on the function_references, this function is called extensively in setup_postgresql() for database connectivity checks and Docker container management. The optimization is particularly beneficial for:

  • Error-prone scenarios (wrong credentials, missing databases), where the failure tests show 21-22% speedups
  • Batch operations during database setup where multiple checks may fail
  • CI/CD pipelines where database setup errors are common

The test results confirm this: successful database checks see minimal improvement (0.1-2.6%), while error cases show significant gains (21-22%). The optimization is therefore most valuable when database connectivity issues occur during Skyvern's PostgreSQL setup process.
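As a quick sanity check, the reported percentages can be recomputed from the raw timings. The sketch below assumes speedup is defined as `before / after - 1`, which appears consistent with the figures in this report. The helper name `speedup_pct` is introduced here for illustration only.

```python
def speedup_pct(before: float, after: float) -> float:
    """Speedup as a percentage, assuming the definition before / after - 1."""
    return (before / after - 1.0) * 100.0


# Timings copied from this report.
overall = speedup_pct(271.0, 223.0)       # milliseconds, best of 30 runs
failure_case = speedup_pct(393.0, 324.0)  # microseconds, test_database_exists_failure
success_case = speedup_pct(13.6, 13.5)    # microseconds, test_database_exists_empty_output

for label, value in [
    ("overall", overall),
    ("failure path", failure_case),
    ("success path", success_case),
]:
    print(f"{label}: {value:.1f}%")
```

Because the per-test timings are rounded to three significant figures, the recomputed values can differ slightly from the percentages printed next to each test.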

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 3013 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime
import subprocess
from unittest.mock import patch

import pytest

# Function under test
from skyvern.cli.database import database_exists

# Helper for mocking subprocess.run
class DummyResult:
    def __init__(self, stdout, returncode):
        self.stdout = stdout
        self.returncode = returncode

# ----------- BASIC TEST CASES -----------

def test_database_exists_success():
    """Test when the database exists and the command returns output."""
    # Simulate successful psql command
    with patch("subprocess.run") as mock_run:
        mock_run.return_value = DummyResult(stdout="psql (12.7)", returncode=0)
        codeflash_output = database_exists("mydb", "myuser") # 15.9μs -> 15.9μs (0.075% faster)

def test_database_exists_failure():
    """Test when the database does not exist and the command fails."""
    # Simulate psql command failure (no stdout)
    def raise_error(*args, **kwargs):
        raise subprocess.CalledProcessError(returncode=2, cmd=args[0], stderr="database does not exist")
    with patch("subprocess.run", side_effect=raise_error):
        codeflash_output = database_exists("nonexistent_db", "myuser") # 393μs -> 324μs (21.1% faster)

def test_database_exists_empty_output():
    """Test when the command runs but returns empty output (should be True)."""
    with patch("subprocess.run") as mock_run:
        mock_run.return_value = DummyResult(stdout="", returncode=0)
        codeflash_output = database_exists("mydb", "myuser") # 13.6μs -> 13.5μs (0.815% faster)

def test_database_exists_wrong_user():
    """Test when the user does not exist."""
    def raise_error(*args, **kwargs):
        raise subprocess.CalledProcessError(returncode=2, cmd=args[0], stderr="role does not exist")
    with patch("subprocess.run", side_effect=raise_error):
        codeflash_output = database_exists("mydb", "wronguser") # 362μs -> 296μs (22.1% faster)

# ----------- EDGE TEST CASES -----------

def test_database_exists_command_injection():
    """Test for command injection vulnerability."""
    # The function should not execute unintended commands; this test checks behavior.
    injected_dbname = "mydb; echo hacked"
    with patch("subprocess.run") as mock_run:
        mock_run.return_value = DummyResult(stdout="psql (12.7)", returncode=0)
        codeflash_output = database_exists(injected_dbname, "myuser") # 13.1μs -> 12.8μs (2.55% faster)

def test_database_exists_many_calls_success():
    """Test function called many times with valid dbnames/users."""
    with patch("subprocess.run") as mock_run:
        mock_run.return_value = DummyResult(stdout="psql (12.7)", returncode=0)
        for i in range(1000):
            dbname = f"db_{i}"
            user = f"user_{i}"
            codeflash_output = database_exists(dbname, user)

def test_database_exists_many_calls_failure():
    """Test function called many times with invalid dbnames/users."""
    def raise_error(*args, **kwargs):
        raise subprocess.CalledProcessError(returncode=2, cmd=args[0], stderr="database does not exist")
    with patch("subprocess.run", side_effect=raise_error):
        for i in range(1000):
            dbname = f"nonexistent_db_{i}"
            user = f"user_{i}"
            codeflash_output = database_exists(dbname, user)

def test_database_exists_large_strings():
    """Test with maximum allowed string lengths for dbname and user."""
    max_dbname = "x" * 1000
    max_user = "y" * 1000
    with patch("subprocess.run") as mock_run:
        mock_run.return_value = DummyResult(stdout="psql (12.7)", returncode=0)
        codeflash_output = database_exists(max_dbname, max_user) # 16.5μs -> 16.3μs (1.05% faster)

def test_database_exists_performance():
    """Test performance for 1000 calls (should not take too long)."""
    import time
    with patch("subprocess.run") as mock_run:
        mock_run.return_value = DummyResult(stdout="psql (12.7)", returncode=0)
        start = time.time()
        for i in range(1000):
            codeflash_output = database_exists(f"db{i}", f"user{i}")
        duration = time.time() - start

# ----------- DETERMINISM AND ERROR HANDLING -----------

def test_database_exists_unexpected_exception():
    """Test if subprocess.run raises an unexpected exception."""
    def raise_error(*args, **kwargs):
        raise RuntimeError("unexpected error")
    with patch("subprocess.run", side_effect=raise_error):
        # Should propagate the exception
        with pytest.raises(RuntimeError):
            database_exists("mydb", "myuser")

def test_database_exists_returncode_nonzero_but_stdout():
    """Test when command returns nonzero code but has stdout (should be True)."""
    with patch("subprocess.run") as mock_run:
        mock_run.return_value = DummyResult(stdout="psql (12.7)", returncode=1)
        codeflash_output = database_exists("mydb", "myuser") # 13.6μs -> 13.5μs (0.437% faster)

To edit these changes, run `git checkout codeflash/optimize-database_exists-mir3payk` and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 December 4, 2025 07:15
@codeflash-ai codeflash-ai bot added the "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) labels Dec 4, 2025