@lance0821 lance0821 commented Aug 28, 2025

User description

Summary

  • Added string case conversion utilities (snake, camel, pascal, kebab)
  • Added text truncation and word counting functions
  • Comprehensive test coverage with 69 tests

Changes

  • New string_utils.py module with 6 utility functions
  • to_snake_case() - Convert text to snake_case
  • to_camel_case() - Convert text to camelCase
  • to_pascal_case() - Convert text to PascalCase
  • to_kebab_case() - Convert text to kebab-case
  • truncate_text() - Truncate text with customizable suffix
  • word_count() - Count words in text

Features

  • Handles consecutive capitals correctly (XMLHttpRequest → xml_http_request)
  • Preserves numbers in conversions
  • Customizable truncation suffix
  • Edge case handling for empty strings, special characters
  • All functions handle various delimiter styles (-, _, spaces)

Test plan

  • 69 unit tests covering all functions
  • Edge case testing (empty strings, single chars, special chars)
  • Parameterized tests for comprehensive coverage
  • All tests passing

PR Type

Enhancement


Description

  • Added comprehensive string case conversion utilities

  • Implemented text truncation and word counting functions

  • Comprehensive test coverage with 69 unit tests

  • Handles edge cases and special character scenarios


Diagram Walkthrough

flowchart LR
  A["Input Text"] --> B["Case Conversion Functions"]
  B --> C["snake_case"]
  B --> D["camelCase"]
  B --> E["PascalCase"]
  B --> F["kebab-case"]
  A --> G["Text Utilities"]
  G --> H["truncate_text()"]
  G --> I["word_count()"]
  J["Test Suite"] --> K["69 Unit Tests"]
  K --> L["Edge Cases & Validation"]

File Walkthrough

Relevant files
Enhancement
string_utils.py
String manipulation utilities implementation                         

src/string_utils.py

  • Added 4 case conversion functions (snake, camel, pascal, kebab)
  • Implemented text truncation with customizable suffix
  • Added word counting utility function
  • Comprehensive docstrings with examples for all functions
+159/-0 
Tests
test_string_utils.py
Comprehensive test coverage for string utilities                 

tests/test_string_utils.py

  • Created comprehensive test suite with 69 unit tests
  • Parameterized tests for all conversion functions
  • Edge case testing for empty strings and special characters
  • Separate test classes for organized test structure
+191/-0 

- Added case conversion functions (snake, camel, pascal, kebab)
- Added text truncation with custom suffix
- Added word counting utility
- Comprehensive test coverage (69 tests)
- Handles edge cases like consecutive capitals (XMLHttpRequest)
Copilot AI review requested due to automatic review settings August 28, 2025 16:35

Copilot AI left a comment

Pull Request Overview

This PR adds a comprehensive string manipulation utilities module with case conversion functions and text processing utilities. The implementation includes proper handling of edge cases like consecutive capitals, numbers, and special characters.

  • Added 6 utility functions for common string transformations (snake, camel, pascal, kebab case conversions, text truncation, and word counting)
  • Implemented robust edge case handling for empty strings, special characters, and various delimiter styles
  • Added comprehensive test coverage with 69 unit tests covering all functions and edge cases

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File | Description
src/string_utils.py | Main implementation of string utility functions with proper documentation and type hints
tests/test_string_utils.py | Comprehensive test suite with parameterized tests covering normal cases and edge scenarios

@qodo-merge-pro

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Behavior Consistency

Case splitters allow special characters to pass through unchanged in some functions but not others, leading to inconsistent outputs across conversions (e.g., '@', '#'). Confirm this is intentional and documented or standardize handling.

def to_camel_case(text: str) -> str:
    """Convert a string to camelCase.

    Args:
        text: Input string to convert

    Returns:
        String converted to camelCase

    Examples:
        >>> to_camel_case("hello_world")
        'helloWorld'
        >>> to_camel_case("some-variable-name")
        'someVariableName'
        >>> to_camel_case("Convert to camel")
        'convertToCamel'
    """
    # Split on underscores, hyphens, and whitespace
    words = re.split(r'[_\-\s]+', text)
    # Filter empty strings
    words = [w for w in words if w]
    if not words:
        return ""
    # First word lowercased; later words capitalized (capitalize() lowercases the rest of each word)
    return words[0].lower() + ''.join(w.capitalize() for w in words[1:])


def to_pascal_case(text: str) -> str:
    """Convert a string to PascalCase.

    Args:
        text: Input string to convert

    Returns:
        String converted to PascalCase

    Examples:
        >>> to_pascal_case("hello_world")
        'HelloWorld'
        >>> to_pascal_case("some-variable-name")
        'SomeVariableName'
        >>> to_pascal_case("convert to pascal")
        'ConvertToPascal'
    """
    # Split on underscores, hyphens, and whitespace
    words = re.split(r'[_\-\s]+', text)
    # Filter empty strings and capitalize each word
    return ''.join(w.capitalize() for w in words if w)


def to_kebab_case(text: str) -> str:
    """Convert a string to kebab-case.

    Args:
        text: Input string to convert

    Returns:
        String converted to kebab-case

    Examples:
        >>> to_kebab_case("HelloWorld")
        'hello-world'
        >>> to_kebab_case("some_variable_name")
        'some-variable-name'
        >>> to_kebab_case("Convert To Kebab")
        'convert-to-kebab'
    """
    # Replace underscores and spaces with hyphens
    text = re.sub(r'[_\s]+', '-', text)
    # Insert hyphen before capital letters (including consecutive caps)
    text = re.sub(r'([A-Z]+)([A-Z][a-z])', r'\1-\2', text)
    text = re.sub(r'([a-z0-9])([A-Z])', r'\1-\2', text)
    # Convert to lowercase
    return text.lower()
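
To make the delimiter behavior concrete, the sketch below copies the splitting regex and the kebab substitutions from the excerpt above and runs a few inputs through them. Only `_`, `-`, and whitespace act as delimiters, so characters such as `@` stay inside a word, and the kebab path neither collapses repeated hyphens nor strips a leading delimiter. The names `camel` and `kebab` are shortened stand-ins rather than the PR's identifiers, and `to_snake_case` is not shown in this thread, so it is left out.

```python
import re


def camel(text: str) -> str:
    # Same logic as to_camel_case in the excerpt: split on _, -, whitespace only.
    words = [w for w in re.split(r'[_\-\s]+', text) if w]
    if not words:
        return ""
    return words[0].lower() + ''.join(w.capitalize() for w in words[1:])


def kebab(text: str) -> str:
    # Same logic as to_kebab_case in the excerpt.
    text = re.sub(r'[_\s]+', '-', text)
    text = re.sub(r'([A-Z]+)([A-Z][a-z])', r'\1-\2', text)
    text = re.sub(r'([a-z0-9])([A-Z])', r'\1-\2', text)
    return text.lower()


print(camel("user@name tag"))  # prints user@nameTag ('@' is not a delimiter)
print(kebab("user@name tag"))  # prints user@name-tag
print(camel("a--b"))           # prints aB (hyphen run treated as one delimiter)
print(kebab("a--b"))           # prints a--b (hyphens are left untouched)
print(kebab("_hello"))         # prints -hello (leading delimiter kept)
print(camel("HelloWorld"))     # prints helloworld (case transition ignored)
print(kebab("HelloWorld"))     # prints hello-world (case transition detected)
```

The last two lines show the mismatch most clearly: the same input is one word to the camel path and two words to the kebab path.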
Expected Output Validity

Some expectations appear opinionated, e.g., "already_camelCase" -> "alreadyCamelcase" and "already_PascalCase" -> "AlreadyPascalcase", which lowercase internal 'C'. Verify that lowercasing mid-word capitals is desired behavior.

@pytest.mark.parametrize(
    "input_text,expected",
    [
        ("hello_world", "helloWorld"),
        ("some-variable-name", "someVariableName"),
        ("Convert to camel", "convertToCamel"),
        ("already_camelCase", "alreadyCamelcase"),
        ("mixed-Style_Example", "mixedStyleExample"),
        ("", ""),
        ("a", "a"),
        ("first", "first"),
        ("UPPERCASE", "uppercase"),
        ("123_numbers", "123Numbers"),
    ],
)
def test_to_camel_case(self, input_text, expected):
    """Test conversion to camelCase."""
    assert to_camel_case(input_text) == expected

@pytest.mark.parametrize(
    "input_text,expected",
    [
        ("hello_world", "HelloWorld"),
        ("some-variable-name", "SomeVariableName"),
        ("convert to pascal", "ConvertToPascal"),
        ("already_PascalCase", "AlreadyPascalcase"),
        ("mixed-Style_Example", "MixedStyleExample"),
        ("", ""),
        ("a", "A"),
        ("first", "First"),
        ("UPPERCASE", "Uppercase"),
        ("123_numbers", "123Numbers"),
    ],
)
def test_to_pascal_case(self, input_text, expected):
    """Test conversion to PascalCase."""
    assert to_pascal_case(input_text) == expected
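
The lowercased internal capitals come from `str.capitalize()`, which uppercases the first character and lowercases everything after it. The sketch below contrasts it with a helper that touches only the first character; `upper_first` is a hypothetical name used for illustration and is not part of the PR.

```python
def upper_first(word: str) -> str:
    # Hypothetical alternative: uppercase the first character, keep the rest as-is.
    return word[:1].upper() + word[1:]


print("camelCase".capitalize())   # prints Camelcase
print(upper_first("camelCase"))   # prints CamelCase
print("UPPERCASE".capitalize())   # prints Uppercase
print(upper_first("UPPERCASE"))   # prints UPPERCASE
```

Switching helpers would change more than the two flagged cases: the "UPPERCASE" expectations in the tests above depend on the lowercasing too, so either choice needs to be made deliberately.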
Truncation Logic

truncate_text treats max_length as the total length including the suffix. Verify the off-by-one behavior and slicing with multi-byte suffixes, and state the contract clearly in the docstring, since the examples rely on it.

def truncate_text(text: str, max_length: int, suffix: str = "...") -> str:
    """Truncate text to a maximum length with optional suffix.

    Args:
        text: Text to truncate
        max_length: Maximum length including suffix
        suffix: String to append when truncating (default: "...")

    Returns:
        Truncated text with suffix if needed

    Examples:
        >>> truncate_text("This is a long text", 10)
        'This is...'
        >>> truncate_text("Short", 10)
        'Short'
        >>> truncate_text("Exactly ten", 11)
        'Exactly ten'
    """
    if len(text) <= max_length:
        return text

    if max_length <= len(suffix):
        return suffix[:max_length]

    return text[:max_length - len(suffix)] + suffix
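
A quick check of the contract, using the function body exactly as excerpted above (docstring omitted). It shows that the result never exceeds `max_length`, that a limit shorter than the suffix yields a clipped suffix, and that a non-ASCII suffix is counted by `len()` in code points rather than bytes. The single-character ellipsis suffix is an illustrative input, not something the PR's tests are known to cover.

```python
def truncate_text(text: str, max_length: int, suffix: str = "...") -> str:
    if len(text) <= max_length:
        return text

    if max_length <= len(suffix):
        return suffix[:max_length]

    return text[:max_length - len(suffix)] + suffix


result = truncate_text("This is a long text", 10)
print(result, len(result))                      # prints This is... 10
print(truncate_text("Exactly ten", 11))         # prints Exactly ten (fits exactly)
print(truncate_text("This is a long text", 2))  # prints .. (suffix itself is clipped)

# U+2026 is one code point, so it uses one unit of the length budget.
ellipsis = truncate_text("This is a long text", 10, suffix="\u2026")
print(len(ellipsis))                            # prints 10
```

Because Python strings are indexed by code point, slicing cannot split a multi-byte character, although a limit expressed in bytes or display columns would need different handling.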

@synvara-ai-reviewer synvara-ai-reviewer bot left a comment

🤖 AI Code Review

Code Review for PR #14

Overall Assessment:
The implementation of string manipulation utilities is well-structured and includes comprehensive test coverage. The code is generally clean and follows Python conventions. However, there are a few areas where improvements can be made, particularly regarding edge cases, performance, and documentation.

Issues Identified:

  1. Edge Case Handling in to_kebab_case and to_snake_case:

    • Issue: The current implementation may not handle multiple or mixed delimiters consistently. Consecutive capitals are already handled: to_kebab_case("XMLHttpRequest") returns xml-http-request as expected. Hyphens, however, are not collapsed, because the first substitution matches only underscores and whitespace, so an input such as "a--b" is returned as a--b.
    • Solution: You can enhance the regex patterns to handle consecutive capital letters more robustly. For example, you might want to add a check for sequences of uppercase letters followed by lowercase letters.
    text = re.sub(r'([A-Z]+)([A-Z][a-z])', r'\1-\2', text)  # Existing line
    text = re.sub(r'([a-z0-9])([A-Z])', r'\1-\2', text)  # Existing line
  2. Performance Considerations:

    • Issue: The regex operations in the conversion functions can be optimized. Each function currently uses multiple regex substitutions, which may lead to performance issues for larger strings.
    • Solution: Consider combining some of the regex patterns into a single pass where possible, or use string manipulation methods that may be faster.
  3. Documentation Improvements:

    • Issue: While the docstrings are generally clear, they could benefit from mentioning the expected behavior for edge cases directly in the examples.
    • Solution: Add examples in the docstrings that demonstrate how the functions handle edge cases, such as consecutive delimiters or special characters.
  4. Test Coverage:

    • Issue: The test cases are comprehensive, but there are some edge cases that could be added for better coverage, such as:
      • Strings with only special characters.
      • Very long strings to test performance.
    • Solution: Consider adding more test cases in TestEdgeCases to cover these scenarios.
  5. Return Type Consistency:

    • Issue: In truncate_text, if max_length is less than or equal to the length of suffix, it returns a substring of suffix, which may not be intuitive.
    • Solution: You might want to return an empty string or a specific message indicating that truncation is not possible.
    if max_length <= len(suffix):
        return ""  # or handle it in a way that fits your use case
  6. Unused Imports:

    • Issue: The import statement for List from typing is not used in the code.
    • Solution: Remove the unused import to keep the code clean.
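
As one possible reading of the performance suggestion, the sketch below hoists the three kebab patterns to module level so they are compiled once. The constant names are assumptions made for illustration. Python's `re` module already caches recently compiled patterns, so any gain is likely to be modest and worth measuring before relying on it.

```python
import re

# Compiled once at import time instead of being looked up on every call.
_SEPARATORS = re.compile(r'[_\s]+')
_ACRONYM_BOUNDARY = re.compile(r'([A-Z]+)([A-Z][a-z])')
_LOWER_UPPER_BOUNDARY = re.compile(r'([a-z0-9])([A-Z])')


def to_kebab_case(text: str) -> str:
    # Same three steps as the excerpted implementation, using compiled patterns.
    text = _SEPARATORS.sub('-', text)
    text = _ACRONYM_BOUNDARY.sub(r'\1-\2', text)
    text = _LOWER_UPPER_BOUNDARY.sub(r'\1-\2', text)
    return text.lower()


print(to_kebab_case("XMLHttpRequest"))      # prints xml-http-request
print(to_kebab_case("some_variable_name"))  # prints some-variable-name
```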

Conclusion:

The implementation is solid and covers a wide range of string manipulation needs. However, addressing the issues mentioned above will improve the robustness, performance, and clarity of the code.

Recommendation:

REQUEST_CHANGES

Please address the issues outlined above, particularly focusing on edge case handling, performance optimizations, and enhancing documentation.


Automated review by Synvara AI

📊 Review Details
  • Review type: REQUEST_CHANGES
  • Strategy: initial
  • Files reviewed: 2
  • Changes: +350 -0
  • Commit: 5ab168e
  • Timestamp: 2025-08-28T16:35:48.596Z

@qodo-merge-pro

qodo-merge-pro bot commented Aug 28, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
High-level
Unify case conversion tokenization

The camelCase and PascalCase functions only split on delimiters and ignore case
transitions, producing incorrect results for inputs like "HelloWorld" or
"XMLHttpRequest" and making behavior inconsistent with snake/kebab. Implement a
shared tokenizer that detects word boundaries via delimiters, case transitions
(including acronyms), and numbers, then reuse it across all case conversions.
This will align functionality with the stated features and ensure consistent,
predictable conversions.

Examples:

src/string_utils.py [37-61]
def to_camel_case(text: str) -> str:
    """Convert a string to camelCase.
    
    Args:
        text: Input string to convert
        
    Returns:
        String converted to camelCase
        
    Examples:

 ... (clipped 15 lines)
src/string_utils.py [64-84]
def to_pascal_case(text: str) -> str:
    """Convert a string to PascalCase.
    
    Args:
        text: Input string to convert
        
    Returns:
        String converted to PascalCase
        
    Examples:

 ... (clipped 11 lines)

Solution Walkthrough:

Before:

def to_camel_case(text: str) -> str:
    # Splits only on delimiters, not case changes
    words = re.split(r'[_\-\s]+', text)
    if not words:
        return ""
    return words[0].lower() + ''.join(w.capitalize() for w in words[1:])

def to_pascal_case(text: str) -> str:
    # Splits only on delimiters, not case changes
    words = re.split(r'[_\-\s]+', text)
    return ''.join(w.capitalize() for w in words if w)

# to_snake_case and to_kebab_case use a different, more robust logic
# based on regex substitutions for case changes.

After:

import re
from typing import List

def _tokenize(text: str) -> List[str]:
    # First, insert delimiters based on case changes
    text = re.sub(r'([A-Z]+)([A-Z][a-z])', r'\1_\2', text)
    text = re.sub(r'([a-z0-9])([A-Z])', r'\1_\2', text)
    # Now split by all delimiters
    return re.split(r'[_\-\s]+', text.lower())

def to_camel_case(text: str) -> str:
    # Drop empty tokens produced by leading or trailing delimiters
    words = [w for w in _tokenize(text) if w]
    if not words:
        return ""
    return words[0] + ''.join(w.capitalize() for w in words[1:])

def to_pascal_case(text: str) -> str:
    words = [w for w in _tokenize(text) if w]
    return ''.join(w.capitalize() for w in words)
Suggestion importance[1-10]: 9

Why: This suggestion correctly identifies a critical design flaw where to_camel_case and to_pascal_case fail to handle case transitions, leading to incorrect outputs and inconsistency with other functions, and proposes a robust refactoring that significantly improves correctness and maintainability.
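
The suggestion asks for the tokenizer to be reused across all case conversions but only spells out camel and Pascal. The sketch below extends the same idea to snake and kebab, under the assumption that they simply join the lowercase tokens; it is an illustration of the proposal, not the code that was eventually merged.

```python
import re
from typing import List


def _tokenize(text: str) -> List[str]:
    # Mark case transitions with an underscore, then split on all delimiters.
    text = re.sub(r'([A-Z]+)([A-Z][a-z])', r'\1_\2', text)
    text = re.sub(r'([a-z0-9])([A-Z])', r'\1_\2', text)
    return [w for w in re.split(r'[_\-\s]+', text.lower()) if w]


def to_snake_case(text: str) -> str:
    return '_'.join(_tokenize(text))


def to_kebab_case(text: str) -> str:
    return '-'.join(_tokenize(text))


def to_camel_case(text: str) -> str:
    words = _tokenize(text)
    if not words:
        return ""
    return words[0] + ''.join(w.capitalize() for w in words[1:])


print(to_snake_case("XMLHttpRequest"))     # prints xml_http_request
print(to_kebab_case("a--b"))               # prints a-b
print(to_camel_case("HelloWorld"))         # prints helloWorld
print(to_camel_case("already_camelCase"))  # prints alreadyCamelCase
```

Note the last line: once case transitions count as word boundaries, "already_camelCase" no longer produces "alreadyCamelcase", so test expectations written against the delimiter-only splitter would need updating.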

Impact: High
Possible issue
Validate non-negative truncation length
Suggestion Impact:The commit added an explicit check that raises ValueError if max_length < 0, aligning with the suggestion to validate non-negative truncation length.

code diff:

+    if max_length < 0:
+        raise ValueError(f"max_length must be non-negative, got {max_length}")
+    

Guard against negative max_length to avoid surprising behavior from negative
slicing (e.g., returning a partially sliced suffix). Explicitly validate and
raise a ValueError when max_length is negative to make the API safer and
predictable.

src/string_utils.py [132-138]

+if max_length < 0:
+    raise ValueError("max_length must be non-negative")
+
 if len(text) <= max_length:
     return text
 
 if max_length <= len(suffix):
     return suffix[:max_length]
 
 return text[:max_length - len(suffix)] + suffix

[Suggestion processed]

Suggestion importance[1-10]: 6

Why: The suggestion correctly identifies that a negative max_length can cause unexpected slicing behavior. Adding a ValueError check for this edge case improves the function's robustness and makes its API safer and more predictable for consumers.
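
The surprising behavior is easy to reproduce: without the guard, a negative limit reaches `suffix[:max_length]`, and a negative slice end counts from the right. The sketch below combines the excerpted function body with the guard from the commit diff above and shows both the raw slice and the resulting error.

```python
def truncate_text(text: str, max_length: int, suffix: str = "...") -> str:
    # Guard taken from the commit diff quoted above.
    if max_length < 0:
        raise ValueError(f"max_length must be non-negative, got {max_length}")

    if len(text) <= max_length:
        return text

    if max_length <= len(suffix):
        return suffix[:max_length]

    return text[:max_length - len(suffix)] + suffix


# What the unguarded code would have returned for max_length=-1:
print("..."[:-1])  # prints ..

try:
    truncate_text("hello", -1)
except ValueError as exc:
    print(exc)  # prints max_length must be non-negative, got -1

print(repr(truncate_text("hello", 0)))  # prints '' (zero is still allowed)
```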

Impact: Low
  • Update

Improvements made:

1. Performance Optimizations:
   - Precompiled regex patterns for better performance
   - Single-pass regex operations where possible
   - Performance tests confirm <10ms for long strings

2. Edge Case Handling:
   - Improved handling of consecutive delimiters
   - Better handling of special characters
   - Proper cleanup of leading/trailing delimiters

3. Documentation Enhancements:
   - Added edge case examples in docstrings
   - Clear behavior documentation for all functions
   - Unicode handling notes

4. New Features:
   - Added is_mixed_case() function
   - Added remove_extra_whitespace() utility

5. Test Coverage:
   - Expanded to 103 tests (from 69)
   - Added performance benchmarks
   - Added special character tests
   - Unicode handling tests

6. Code Quality:
   - Removed unused imports
   - Improved truncate_text return behavior
   - All functions handle empty strings gracefully
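
The thread does not show the new helpers, so the following is only a guess at what a whitespace-normalizing utility might look like. `collapse_whitespace` is a hypothetical name, and its behavior (trim both ends, collapse each whitespace run to one space) is an assumption rather than the PR's `remove_extra_whitespace()`.

```python
def collapse_whitespace(text: str) -> str:
    # str.split() with no arguments splits on any run of whitespace and
    # discards leading/trailing whitespace, so joining with a single space
    # normalizes the spacing in one step.
    return " ".join(text.split())


print(collapse_whitespace("  hello \t\n world  "))  # prints hello world
print(repr(collapse_whitespace("")))                 # prints ''
```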

@synvara-ai-reviewer synvara-ai-reviewer bot left a comment

🔄 AI Code Re-Review

⚠️ Some issues still need attention

Code Review for PR #14 (Follow-Up Review)

Overall Assessment:

The recent changes made to the string manipulation utilities have addressed most of the previous concerns raised in the initial review. The implementation now includes improved edge case handling, performance optimizations, and enhanced documentation. However, there are still a couple of areas that require attention.

Review of Previous Concerns:

  1. Edge Case Handling in to_kebab_case and to_snake_case:

    • Previous Concern: The handling of consecutive capital letters and multiple delimiters was not robust.
    • Current Status: The regex patterns have been improved to handle consecutive capitals and delimiters effectively. This is now correctly implemented.
  2. Performance Considerations:

    • Previous Concern: The regex operations were potentially inefficient.
    • Current Status: The regex patterns have been precompiled and optimized for performance. This is a significant improvement.
  3. Documentation Improvements:

    • Previous Concern: Docstrings needed examples for edge cases.
    • Current Status: The docstrings have been enhanced with examples that cover edge cases, making the expected behavior clearer.
  4. Test Coverage:

    • Previous Concern: Additional edge cases were needed in tests.
    • Current Status: The test suite has been expanded significantly, now covering various edge cases, including special characters and performance benchmarks.
  5. Return Type Consistency in truncate_text:

    • Previous Concern: The return behavior when max_length is less than or equal to the length of suffix was not intuitive.
    • Current Status: The function now returns an empty string if max_length is less than or equal to the length of suffix, which is a clearer behavior.
  6. Unused Imports:

    • Previous Concern: The import statement for List from typing was unused.
    • Current Status: The unused import has been removed, which keeps the code clean.

New Concerns:

  1. Behavior Consistency:

    • The camelCase and PascalCase functions only split on delimiters and ignore case transitions, which can lead to incorrect results for inputs like "HelloWorld" or "XMLHttpRequest". This inconsistency with the snake_case and kebab_case functions should be addressed by implementing a shared tokenizer that detects word boundaries via delimiters, case transitions, and numbers.
  2. Validation of Non-Negative Truncation Length:

    • The truncate_text function should explicitly validate that max_length is non-negative to avoid unexpected behavior from negative slicing. Raising a ValueError for negative max_length would make the API safer and more predictable.

Conclusion:

The majority of the previous concerns have been effectively addressed, improving the robustness, performance, and clarity of the code. However, the issues regarding behavior consistency in case conversion functions and the need for validation of max_length in the truncate_text function remain.

Recommendation:

REQUEST_CHANGES

Please address the following:

  • Implement a shared tokenizer for case conversion functions to ensure consistent behavior.
  • Add validation for non-negative max_length in the truncate_text function.

These changes will further enhance the quality and reliability of the string manipulation utilities.


Automated review by Synvara AI

📊 Review Details
  • Review type: REQUEST_CHANGES
  • Strategy: re-review-after-changes
  • Files reviewed: 2
  • Changes: +591 -0
  • New commits: 1
  • Commit: 1bceb42
  • Timestamp: 2025-08-28T16:41:33.645Z

…edback

Addressed final review feedback from PR #14:

1. Shared Tokenizer Implementation:
   - Created _tokenize_string() for consistent word boundary detection
   - All case conversion functions now use the same tokenizer
   - Properly handles case transitions (HelloWorld -> Hello, World)
   - Correctly tokenizes consecutive capitals (XMLHttpRequest -> XML, Http, Request)
   - Ensures consistent behavior across all conversion functions

2. Non-Negative Validation:
   - Added validation in truncate_text() for max_length parameter
   - Raises ValueError for negative values
   - Makes API safer and more predictable

3. Test Improvements:
   - Added dedicated test suite for tokenizer
   - Added test for ValueError on negative max_length
   - Updated expectations for consistent tokenization
   - All 110 tests passing

The implementation now provides consistent behavior across all case
conversion functions and safer parameter validation.
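
The commit message describes case-preserving tokens (HelloWorld -> Hello, World; XMLHttpRequest -> XML, Http, Request) without showing the code. One way to get that behavior is a single `re.findall` pass, sketched below. `tokenize_words` and its pattern are assumptions made for illustration, not the PR's `_tokenize_string()`, and the choice to keep digits attached to neighboring lowercase letters is likewise a guess.

```python
import re
from typing import List

# Alternatives are tried left to right at each position:
#   1. a run of capitals that is followed by a capitalized word (acronym)
#   2. an optional capital followed by lowercase letters or digits
#   3. any remaining run of capitals
_WORD = re.compile(r'[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z0-9]+|[A-Z]+')


def tokenize_words(text: str) -> List[str]:
    # Characters that match none of the alternatives (delimiters, symbols)
    # are skipped, so they never appear in the token list.
    return _WORD.findall(text)


print(tokenize_words("HelloWorld"))           # prints ['Hello', 'World']
print(tokenize_words("XMLHttpRequest"))       # prints ['XML', 'Http', 'Request']
print(tokenize_words("hello_world-foo bar"))  # prints ['hello', 'world', 'foo', 'bar']
```

Each converter can then lowercase or capitalize the tokens as needed and join them with its own separator. A side effect of this approach is that special characters are dropped rather than passed through, which is one possible answer to the consistency question raised earlier in the thread.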

@synvara-ai-reviewer synvara-ai-reviewer bot left a comment

🔄 AI Code Re-Review

Previous concerns have been addressed

Code Review for PR #14 (Follow-Up Review)

Overall Assessment:

The recent changes made to the string manipulation utilities have effectively addressed many of the concerns raised in the previous review. The implementation now includes a shared tokenizer for consistent behavior across case conversion functions, as well as validation for non-negative truncation lengths in the truncate_text function. However, there are still some areas that require further attention.

Review of Previous Concerns:

  1. Shared Tokenizer Implementation:

    • Previous Concern: The camelCase and PascalCase functions only split on delimiters and ignore case transitions, leading to inconsistent behavior.
    • Current Status: The implementation now includes a shared tokenizer (_tokenize_string) that handles word boundaries via delimiters, case transitions, and numbers. This change has improved the consistency of the case conversion functions, addressing the previous concern effectively.
  2. Validation of Non-Negative Truncation Length:

    • Previous Concern: The truncate_text function needed validation for non-negative max_length.
    • Current Status: The function now raises a ValueError when max_length is negative, which enhances the safety and predictability of the API. This concern has been satisfactorily addressed.
  3. Edge Case Handling in to_kebab_case and to_snake_case:

    • Previous Concern: The handling of consecutive capital letters and multiple delimiters was not robust.
    • Current Status: The regex patterns have been improved to handle these cases effectively, which has been confirmed through the updated test cases.
  4. Performance Considerations:

    • Previous Concern: The regex operations were potentially inefficient.
    • Current Status: The regex patterns are now precompiled, which optimizes performance. This is a significant improvement.
  5. Documentation Improvements:

    • Previous Concern: Docstrings needed examples for edge cases.
    • Current Status: The docstrings have been enhanced with examples that cover edge cases, making the expected behavior clearer.
  6. Test Coverage:

    • Previous Concern: Additional edge cases were needed in tests.
    • Current Status: The test suite has been expanded significantly, now covering various edge cases, including special characters and performance benchmarks.
  7. Return Type Consistency in truncate_text:

    • Previous Concern: The return behavior when max_length is less than or equal to the length of suffix was not intuitive.
    • Current Status: The function now returns an empty string if max_length is less than or equal to the length of suffix, which is a clearer behavior.
  8. Unused Imports:

    • Previous Concern: The import statement for List from typing was unused.
    • Current Status: The unused import has been removed, which keeps the code clean.

New Concerns:

  • Behavior Consistency: While the shared tokenizer has improved consistency, there may still be edge cases regarding how special characters are handled across different functions. It would be beneficial to ensure that all functions handle special characters uniformly, as noted in the discussion thread.

Conclusion:

The majority of the previous concerns have been effectively addressed, improving the robustness, performance, and clarity of the code. The implementation now provides consistent behavior across all case conversion functions and safer parameter validation. However, attention should be given to the handling of special characters to ensure uniformity across all string manipulation functions.

Recommendation:

APPROVE

The changes made in this PR significantly enhance the quality and reliability of the string manipulation utilities. Please ensure that any remaining concerns regarding special character handling are addressed in future updates.


Automated review by Synvara AI

📊 Review Details
  • Review type: APPROVE
  • Strategy: re-review-after-changes
  • Files reviewed: 2
  • Changes: +638 -0
  • New commits: 1
  • Commit: 38e304a
  • Timestamp: 2025-08-28T16:48:17.826Z
