@codeflash-ai codeflash-ai bot commented Dec 4, 2025

📄 65% (0.65x) speedup for is_valid_email in skyvern/forge/sdk/services/bitwarden.py

⏱️ Runtime: 1.07 milliseconds → 648 microseconds (best of 250 runs)

📝 Explanation and details

The optimization compiles the email validation regex once into a module-level constant, _EMAIL_PATTERN, so each call reuses the compiled pattern object instead of handing the pattern string to re.match().

Key Changes:

  • Regex pre-compilation: The pattern r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$" is compiled once at module import time using re.compile() and stored in _EMAIL_PATTERN
  • Direct pattern matching: Uses the pre-compiled pattern's match() method instead of re.match() with a string pattern
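The two changes above can be sketched as a before/after pair. This is a minimal sketch, not the code from bitwarden.py: the function bodies, the `_before`/`_after` names, and the falsy-input guard are assumptions inferred from the description and the test timings; only the pattern and the `_EMAIL_PATTERN` name come from this PR.

```python
import re
from typing import Optional

# Pattern quoted in the PR description, compiled once at import time.
_EMAIL_PATTERN = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")


def is_valid_email_before(email: Optional[str]) -> bool:
    """Assumed original shape: a string pattern passed to re.match on each call."""
    if not email:  # assumed guard for None and ""
        return False
    pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
    return re.match(pattern, email) is not None


def is_valid_email_after(email: Optional[str]) -> bool:
    """Assumed optimized shape: reuse the module-level compiled pattern."""
    if not email:  # same assumed guard
        return False
    return _EMAIL_PATTERN.match(email) is not None
```

Both versions apply the same pattern with the same anchoring, so they should return identical results for any input; only the per-call overhead differs.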

Why This Creates a 65% Speedup:
In the original code, re.match(pattern, email) has to resolve the pattern string to a compiled pattern on every call. Python's re module caches compiled patterns, so after the first call this is a cache lookup rather than a full recompile, but that lookup still dominates a function this small: the line profiler attributes 82.9% of the function's execution time to this line (5.63ms out of 6.80ms total). With the pattern compiled once at import time, the matching line accounts for 69.2% of a much smaller total.
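A rough way to see the per-call difference locally is a small timeit comparison. This is an illustrative micro-benchmark, not the harness Codeflash used; the helper name `time_call` and the iteration counts are arbitrary choices, and absolute numbers will vary by machine and Python version. Note that the re module caches compiled patterns, so the module-level call pays for a cache lookup rather than a full recompile.

```python
import re
import timeit

PATTERN = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
COMPILED = re.compile(PATTERN)
EMAIL = "user@example.com"


def time_call(fn, number=20_000, repeat=5):
    """Return the best-of-`repeat` time per call, in nanoseconds."""
    best = min(timeit.repeat(fn, number=number, repeat=repeat))
    return best / number * 1e9


# Module-level function: resolves the pattern string on every call.
module_level_ns = time_call(lambda: re.match(PATTERN, EMAIL))
# Pre-compiled pattern object: goes straight to matching.
precompiled_ns = time_call(lambda: COMPILED.match(EMAIL))

print(f"re.match(pattern, s): {module_level_ns:.0f} ns/call")
print(f"compiled.match(s):    {precompiled_ns:.0f} ns/call")
```

The lambda wrapper adds the same overhead to both measurements, so the gap between the two figures is what matters rather than either absolute value.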

Impact on Workloads:
Based on the function reference, is_valid_email() is called within a credential selection loop in Bitwarden's _get_secret_value_from_url() method. When multiple credentials are found, the function iterates through them to find valid email usernames. This optimization will significantly improve performance when:

  • Multiple Bitwarden credentials exist for a domain
  • The function processes large batches of email validations
  • The service handles high-frequency credential lookups
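The kind of loop described above might look like the following. This is a hypothetical sketch: `pick_email_credential`, the dictionary shape of a credential, and the stand-in `is_valid_email` body are all assumptions for illustration, not the code in `_get_secret_value_from_url()`.

```python
import re
from typing import Dict, List, Optional

_EMAIL_PATTERN = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")


def is_valid_email(email: Optional[str]) -> bool:
    # Stand-in for the function under test; the falsy-input guard is an assumption.
    return bool(email) and _EMAIL_PATTERN.match(email) is not None


def pick_email_credential(credentials: List[Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Hypothetical helper: return the first credential whose username looks like an email."""
    for cred in credentials:
        if is_valid_email(cred.get("username")):
            return cred
    return None


# Example: the second entry is chosen because its username matches the pattern.
candidates = [
    {"username": "alice", "password": "x"},
    {"username": "alice@example.com", "password": "y"},
]
chosen = pick_email_credential(candidates)
```

Each iteration calls the validator once, so the per-call saving scales with the number of credentials returned for a domain.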

Test Case Performance:
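Most test cases run roughly 45-70% faster. Invalid emails often gain the most (up to 88% faster): the match itself fails quickly, so the fixed per-call overhead that was removed made up a larger share of their runtime. Empty-string and None inputs are essentially unchanged (about 260 to 310 ns before and after), which suggests they return before the regex is reached. Bulk operations see 58-73% improvements, which matters most in high-throughput scenarios.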

Correctness verification report:

Test | Status
⚙️ Existing Unit Tests | 🔘 None Found
🌀 Generated Regression Tests | ✅ 2468 Passed
⏪ Replay Tests | 🔘 None Found
🔎 Concolic Coverage Tests | 🔘 None Found
📊 Tests Coverage | 100.0%
🌀 Generated Regression Tests and Runtime
import re

# imports
import pytest  # used for our unit tests
from skyvern.forge.sdk.services.bitwarden import is_valid_email

# unit tests

# -------------------- BASIC TEST CASES --------------------

def test_valid_simple_email():
    # Standard valid email
    codeflash_output = is_valid_email("user@example.com") # 2.14μs -> 1.49μs (43.5% faster)

def test_valid_email_with_dot_in_local():
    # Email with dot in local part
    codeflash_output = is_valid_email("first.last@example.com") # 1.68μs -> 1.09μs (53.7% faster)

def test_valid_email_with_plus():
    # Email with plus in local part
    codeflash_output = is_valid_email("user+tag@example.com") # 1.68μs -> 1.11μs (51.0% faster)

def test_valid_email_with_numbers():
    # Email with numbers in local and domain
    codeflash_output = is_valid_email("user123@domain456.com") # 1.63μs -> 1.06μs (53.2% faster)

def test_valid_email_with_underscore_and_dash():
    # Email with underscore and dash in local and domain
    codeflash_output = is_valid_email("first_last@my-domain.com") # 1.63μs -> 1.04μs (57.0% faster)

def test_basic_invalid_no_at():
    # Email missing '@'
    codeflash_output = is_valid_email("userexample.com") # 1.46μs -> 909ns (60.4% faster)

def test_basic_invalid_no_domain():
    # Email missing domain
    codeflash_output = is_valid_email("user@") # 1.46μs -> 777ns (87.6% faster)

def test_basic_invalid_no_local():
    # Email missing local part
    codeflash_output = is_valid_email("@example.com") # 1.54μs -> 874ns (76.4% faster)

def test_basic_invalid_no_tld():
    # Email missing TLD
    codeflash_output = is_valid_email("user@example") # 1.55μs -> 959ns (62.0% faster)

def test_basic_invalid_tld_too_short():
    # TLD too short (only 1 character)
    codeflash_output = is_valid_email("user@example.c") # 1.66μs -> 1.01μs (64.5% faster)

# -------------------- EDGE TEST CASES --------------------

def test_empty_string():
    # Empty string should be invalid
    codeflash_output = is_valid_email("") # 285ns -> 278ns (2.52% faster)

def test_none_input():
    # None input should be invalid
    codeflash_output = is_valid_email(None) # 295ns -> 285ns (3.51% faster)

def test_local_starts_with_dot():
    # Local part starts with a dot (not RFC-valid, but this regex accepts it)
    codeflash_output = is_valid_email(".user@example.com") # 2.05μs -> 1.32μs (55.3% faster)

def test_local_ends_with_dot():
    # Local part ends with a dot (not RFC-valid, but this regex accepts it)
    codeflash_output = is_valid_email("user.@example.com") # 1.83μs -> 1.17μs (55.9% faster)

def test_local_double_dot():
    # Local part has consecutive dots (not RFC-valid, but this regex accepts it)
    codeflash_output = is_valid_email("first..last@example.com") # 1.79μs -> 1.13μs (58.0% faster)

def test_domain_starts_with_dash():
    # Domain starts with a dash (not a valid hostname, but this regex accepts it)
    codeflash_output = is_valid_email("user@-example.com") # 1.70μs -> 1.10μs (54.7% faster)

def test_domain_ends_with_dash():
    # Domain label ends with a dash (not a valid hostname, but this regex accepts it)
    codeflash_output = is_valid_email("user@example-.com") # 1.72μs -> 1.13μs (52.2% faster)

def test_domain_double_dot():
    # Domain has consecutive dots (not a valid hostname, but this regex accepts it)
    codeflash_output = is_valid_email("user@domain..com") # 1.76μs -> 1.06μs (65.8% faster)

def test_domain_with_subdomain():
    # Email with subdomain
    codeflash_output = is_valid_email("user@mail.server.com") # 1.72μs -> 1.09μs (57.4% faster)

def test_email_with_long_tld():
    # Email with long TLD
    codeflash_output = is_valid_email("user@example.technology") # 1.70μs -> 1.11μs (53.8% faster)

def test_email_with_invalid_characters():
    # Email with invalid character (comma)
    codeflash_output = is_valid_email("us,er@example.com") # 1.57μs -> 854ns (84.3% faster)

def test_email_with_space():
    # Email with space in local part
    codeflash_output = is_valid_email("us er@example.com") # 1.51μs -> 856ns (76.3% faster)

def test_email_with_unicode_characters():
    # Email with unicode character (should be invalid for this regex)
    codeflash_output = is_valid_email("usér@example.com") # 1.54μs -> 870ns (76.8% faster)

def test_email_with_trailing_space():
    # Email with trailing space
    codeflash_output = is_valid_email("user@example.com ") # 1.82μs -> 1.22μs (48.9% faster)

def test_email_with_leading_space():
    # Email with leading space
    codeflash_output = is_valid_email(" user@example.com") # 1.53μs -> 899ns (70.4% faster)

def test_email_with_multiple_ats():
    # Email with multiple '@' symbols
    codeflash_output = is_valid_email("user@@example.com") # 1.66μs -> 966ns (71.8% faster)

def test_email_with_special_chars_in_domain():
    # Email with invalid special character in domain
    codeflash_output = is_valid_email("user@exam!ple.com") # 1.58μs -> 980ns (61.5% faster)

def test_email_with_quoted_local_part():
    # Email with quoted local part (valid in RFC, but not in this regex)
    codeflash_output = is_valid_email('"user"@example.com') # 1.38μs -> 856ns (60.6% faster)

def test_email_with_ipv4_domain():
    # Email with IPv4 address as domain (not allowed by this regex)
    codeflash_output = is_valid_email("user@127.0.0.1") # 1.75μs -> 1.14μs (53.6% faster)

def test_email_with_ipv6_domain():
    # Email with IPv6 address as domain (not allowed by this regex)
    codeflash_output = is_valid_email("user@[IPv6:2001:db8::1]") # 1.64μs -> 956ns (71.5% faster)

def test_email_with_tld_all_numbers():
    # TLD is all numbers (invalid)
    codeflash_output = is_valid_email("user@example.123") # 1.69μs -> 1.03μs (64.0% faster)

def test_email_with_tld_mixed_numbers_letters():
    # TLD is mixed numbers and letters (invalid)
    codeflash_output = is_valid_email("user@example.c0m") # 1.69μs -> 1.06μs (60.0% faster)

def test_email_with_hyphen_in_tld():
    # TLD with hyphen (invalid)
    codeflash_output = is_valid_email("user@example.co-m") # 1.74μs -> 1.14μs (53.4% faster)

# -------------------- LARGE SCALE TEST CASES --------------------

def test_bulk_valid_emails():
    # Test a batch of valid emails for performance and correctness
    valid_emails = [
        f"user{i}@example{i%10}.com"
        for i in range(100)
    ]
    for email in valid_emails:
        codeflash_output = is_valid_email(email) # 45.5μs -> 28.7μs (58.4% faster)

def test_bulk_invalid_emails():
    # Test a batch of invalid emails for performance and correctness
    invalid_emails = [
        f"user{i}example.com" if i % 2 == 0 else "user@.com"
        for i in range(100)
    ]
    for email in invalid_emails:
        codeflash_output = is_valid_email(email) # 40.9μs -> 24.2μs (69.1% faster)

def test_long_local_and_domain():
    # Test emails with very long local and domain parts (within reasonable limits)
    local = "a" * 64
    domain = "b" * 63
    tld = "com"
    email = f"{local}@{domain}.{tld}"
    codeflash_output = is_valid_email(email) # 1.80μs -> 1.17μs (53.9% faster)

def test_max_length_email():
    # Test an email at the maximum allowed length (254 chars)
    local = "a" * 64
    domain = "b" * 185  # 64 + 1 + 185 + 1 + 3 = 254
    tld = "com"
    email = f"{local}@{domain}.{tld}"
    codeflash_output = is_valid_email(email) # 1.90μs -> 1.22μs (56.3% faster)

def test_email_exceeds_max_length():
    # Test an email exceeding the 254-character limit; the regex has no length check, so it still matches
    local = "a" * 65
    domain = "b" * 188
    tld = "com"
    email = f"{local}@{domain}.{tld}"
    codeflash_output = is_valid_email(email) # 1.89μs -> 1.29μs (46.6% faster)
import re
import string  # used for generating large scale test cases

# imports
import pytest  # used for our unit tests
from skyvern.forge.sdk.services.bitwarden import is_valid_email

# unit tests

# -------------------- Basic Test Cases --------------------

def test_valid_standard_email():
    # Typical valid email
    codeflash_output = is_valid_email("user@example.com") # 2.79μs -> 1.71μs (63.0% faster)

def test_valid_email_with_dot_in_local():
    # Local part contains dot
    codeflash_output = is_valid_email("first.last@example.com") # 2.10μs -> 1.26μs (66.3% faster)

def test_valid_email_with_plus():
    # Local part contains plus
    codeflash_output = is_valid_email("user+tag@example.com") # 2.01μs -> 1.19μs (69.4% faster)

def test_valid_email_with_numbers():
    # Local and domain contain numbers
    codeflash_output = is_valid_email("user123@domain456.com") # 1.96μs -> 1.16μs (68.3% faster)

def test_valid_email_with_underscore():
    # Local part contains underscore
    codeflash_output = is_valid_email("user_name@example.com") # 1.86μs -> 1.17μs (59.1% faster)

def test_valid_email_with_dash():
    # Local and domain contain dash
    codeflash_output = is_valid_email("user-name@sub-domain.example.com") # 1.80μs -> 1.18μs (52.6% faster)

def test_invalid_email_missing_at():
    # Missing '@' symbol
    codeflash_output = is_valid_email("userexample.com") # 1.61μs -> 955ns (68.5% faster)

def test_invalid_email_missing_domain():
    # Missing domain part
    codeflash_output = is_valid_email("user@") # 1.47μs -> 778ns (88.4% faster)

def test_invalid_email_missing_local():
    # Missing local part
    codeflash_output = is_valid_email("@example.com") # 1.60μs -> 892ns (79.4% faster)

def test_invalid_email_missing_dot_in_domain():
    # Missing '.' in domain
    codeflash_output = is_valid_email("user@examplecom") # 1.67μs -> 997ns (67.1% faster)

def test_invalid_email_with_space():
    # Email contains space
    codeflash_output = is_valid_email("user @example.com") # 1.58μs -> 867ns (82.0% faster)

def test_invalid_email_with_multiple_ats():
    # More than one '@'
    codeflash_output = is_valid_email("user@@example.com") # 1.62μs -> 972ns (66.6% faster)

# -------------------- Edge Test Cases --------------------

def test_empty_string():
    # Empty string should be invalid
    codeflash_output = is_valid_email("") # 264ns -> 267ns (1.12% slower)

def test_none_input():
    # None input should be invalid
    codeflash_output = is_valid_email(None) # 311ns -> 305ns (1.97% faster)

def test_email_with_leading_trailing_spaces():
    # Spaces at ends should make it invalid (no strip in function)
    codeflash_output = is_valid_email(" user@example.com") # 1.65μs -> 956ns (72.2% faster)
    codeflash_output = is_valid_email("user@example.com ") # 1.10μs -> 884ns (24.1% faster)

def test_email_with_consecutive_dots_in_local():
    # Consecutive dots in local part are technically invalid, but regex allows it
    # The regex here allows it, so test for True
    codeflash_output = is_valid_email("user..name@example.com") # 1.89μs -> 1.19μs (59.1% faster)

def test_email_with_consecutive_dots_in_domain():
    # Consecutive dots in domain are technically invalid, but regex allows it
    codeflash_output = is_valid_email("user@ex..ample.com") # 1.75μs -> 1.12μs (56.2% faster)

def test_email_with_invalid_character():
    # Local part contains invalid character '!'
    codeflash_output = is_valid_email("user!@example.com") # 1.50μs -> 845ns (78.0% faster)

def test_email_with_short_tld():
    # TLD is only one character, should be invalid (regex requires at least 2)
    codeflash_output = is_valid_email("user@example.c") # 1.62μs -> 938ns (72.8% faster)

def test_email_with_long_tld():
    # TLD with many characters, valid as long as >=2
    codeflash_output = is_valid_email("user@example.technology") # 1.83μs -> 1.19μs (53.4% faster)

def test_email_with_subdomains():
    # Multiple subdomains in domain
    codeflash_output = is_valid_email("user@mail.server.example.com") # 1.75μs -> 1.12μs (55.1% faster)

def test_email_with_invalid_domain_start_end_dash():
    # Domain starts or ends with dash, which is technically invalid, but regex allows it
    codeflash_output = is_valid_email("user@-example.com") # 1.77μs -> 1.10μs (61.5% faster)
    codeflash_output = is_valid_email("user@example-.com") # 769ns -> 489ns (57.3% faster)

def test_email_with_unicode_characters():
    # Unicode in local part should be invalid (regex only allows ASCII)
    codeflash_output = is_valid_email("usér@example.com") # 1.48μs -> 844ns (75.1% faster)
    # Unicode in domain part should be invalid
    codeflash_output = is_valid_email("user@exámple.com") # 749ns -> 552ns (35.7% faster)

def test_email_with_multiple_dots_in_tld():
    # Multi-label suffix such as .co.uk; this regex accepts it because the domain character class allows dots
    codeflash_output = is_valid_email("user@example.co.uk") # 1.72μs -> 1.06μs (61.9% faster)

def test_email_with_invalid_tld_characters():
    # TLD contains numbers, should be invalid (regex only allows letters)
    codeflash_output = is_valid_email("user@example.c0m") # 1.59μs -> 1.01μs (57.1% faster)

def test_email_with_special_chars_in_domain():
    # Domain contains underscore, which is not allowed in regex
    codeflash_output = is_valid_email("user@exam_ple.com") # 1.55μs -> 900ns (72.4% faster)

# -------------------- Large Scale Test Cases --------------------

def test_large_local_part_valid():
    # Local part up to 64 characters is allowed by RFC, regex allows any length
    local = "a" * 64
    email = f"{local}@example.com"
    codeflash_output = is_valid_email(email) # 1.77μs -> 1.23μs (44.3% faster)

def test_large_domain_part_valid():
    # Domain up to 255 characters is allowed by RFC, regex allows any length
    domain = "b" * 63 + "." + "c" * 63 + "." + "d" * 61 + ".com"
    email = f"user@{domain}"
    codeflash_output = is_valid_email(email) # 1.86μs -> 1.24μs (50.0% faster)

def test_very_long_email_invalid():
    # Email that's extremely long but still matches regex (no explicit length check)
    local = "a" * 1000
    domain = "b" * 100 + ".com"
    email = f"{local}@{domain}"
    codeflash_output = is_valid_email(email) # 2.60μs -> 1.98μs (31.2% faster)

def test_batch_valid_emails():
    # Test a list of 100 valid emails
    for i in range(1, 101):
        email = f"user{i}@example{i}.com"
        codeflash_output = is_valid_email(email) # 45.2μs -> 28.5μs (58.8% faster)

def test_batch_invalid_emails():
    # Test a list of 100 invalid emails (missing '@')
    for i in range(1, 101):
        email = f"user{i}example{i}.com"
        codeflash_output = is_valid_email(email) # 40.0μs -> 24.1μs (66.3% faster)

def test_performance_large_batch():
    # Test performance with 1000 valid emails
    for i in range(1, 1001):
        email = f"user{i}@example.com"
        codeflash_output = is_valid_email(email) # 419μs -> 262μs (59.7% faster)

def test_performance_large_batch_invalid():
    # Test performance with 1000 invalid emails (missing domain)
    for i in range(1, 1001):
        email = f"user{i}@"
        codeflash_output = is_valid_email(email) # 364μs -> 210μs (73.3% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
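
The comparison described in the comment above can be sketched as follows. This is an illustrative check under stated assumptions, not Codeflash's actual harness: `original`, `optimized`, `find_mismatches`, and the sample list are hypothetical, and the function bodies (including the falsy-input guard) are assumed from the PR description.

```python
import re

PATTERN_TEXT = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
EMAIL_PATTERN = re.compile(PATTERN_TEXT)


def original(email):
    # Assumed pre-optimization shape: pattern string resolved on each call.
    if not email:
        return False
    return re.match(PATTERN_TEXT, email) is not None


def optimized(email):
    # Assumed post-optimization shape: module-level compiled pattern.
    if not email:
        return False
    return EMAIL_PATTERN.match(email) is not None


def find_mismatches(inputs):
    """Return the inputs on which the two implementations disagree."""
    return [value for value in inputs if original(value) != optimized(value)]


SAMPLES = [
    "user@example.com",
    "user@",
    "",
    None,
    "first..last@example.com",  # accepted by this regex despite the double dot
    "user@example.co.uk",       # accepted: the domain class allows dots
    "user@example.c0m",         # rejected: the final label must be letters only
]

mismatches = find_mismatches(SAMPLES)
print(mismatches)  # an empty list means both versions agree on every sample
```

Because the generated tests above record outputs rather than asserting expected values, a check of this shape confirms equivalence between the two versions, not that either one matches RFC rules.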

To edit these changes git checkout codeflash/optimize-is_valid_email-mirgot18 and push.


@codeflash-ai codeflash-ai bot requested a review from mashraf-222 December 4, 2025 13:18
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Dec 4, 2025