feat: Add string manipulation utilities #14

lance0821 · 2025-08-28T16:35:19Z

User description

Summary

- Added string case conversion utilities (snake, camel, pascal, kebab)
- Added text truncation and word counting functions
- Comprehensive test coverage with 69 tests

Changes

- New string_utils.py module with 6 utility functions:
  - to_snake_case(): convert text to snake_case
  - to_camel_case(): convert text to camelCase
  - to_pascal_case(): convert text to PascalCase
  - to_kebab_case(): convert text to kebab-case
  - truncate_text(): truncate text with a customizable suffix
  - word_count(): count words in text

Features

- Handles consecutive capitals correctly (XMLHttpRequest → xml_http_request)
- Preserves numbers in conversions
- Customizable truncation suffix
- Edge case handling for empty strings and special characters
- All functions handle various delimiter styles (-, _, spaces)

Test plan

- 69 unit tests covering all functions
- Edge case testing (empty strings, single chars, special chars)
- Parameterized tests for comprehensive coverage
- All tests passing

PR Type

Enhancement

Description

- Added comprehensive string case conversion utilities
- Implemented text truncation and word counting functions
- Comprehensive test coverage with 69 unit tests
- Handles edge cases and special character scenarios

Diagram Walkthrough

```mermaid
flowchart LR
  A["Input Text"] --> B["Case Conversion Functions"]
  B --> C["snake_case"]
  B --> D["camelCase"]
  B --> E["PascalCase"]
  B --> F["kebab-case"]
  A --> G["Text Utilities"]
  G --> H["truncate_text()"]
  G --> I["word_count()"]
  J["Test Suite"] --> K["69 Unit Tests"]
  K --> L["Edge Cases & Validation"]
```

File Walkthrough

Relevant files

Enhancement

src/string_utils.py: `String manipulation utilities implementation` (+159/-0)
- Added 4 case conversion functions (snake, camel, pascal, kebab)
- Implemented text truncation with customizable suffix
- Added word counting utility function
- Comprehensive docstrings with examples for all functions

Tests

tests/test_string_utils.py: `Comprehensive test coverage for string utilities` (+191/-0)
- Created comprehensive test suite with 69 unit tests
- Parameterized tests for all conversion functions
- Edge case testing for empty strings and special characters
- Separate test classes for organized test structure

- Added case conversion functions (snake, camel, pascal, kebab)
- Added text truncation with custom suffix
- Added word counting utility
- Comprehensive test coverage (69 tests)
- Handles edge cases like consecutive capitals (XMLHttpRequest)

Copilot

Pull Request Overview

This PR adds a comprehensive string manipulation utilities module with case conversion functions and text processing utilities. The implementation includes proper handling of edge cases like consecutive capitals, numbers, and special characters.

- Added 6 utility functions for common string transformations (snake, camel, pascal, kebab case conversions, text truncation, and word counting)
- Implemented robust edge case handling for empty strings, special characters, and various delimiter styles
- Added comprehensive test coverage with 69 unit tests covering all functions and edge cases

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
src/string_utils.py	Main implementation of string utility functions with proper documentation and type hints
tests/test_string_utils.py	Comprehensive test suite with parameterized tests covering normal cases and edge scenarios


src/string_utils.py

qodo-merge-pro · 2025-08-28T16:35:43Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Behavior Consistency

Case splitters let special characters (e.g., '@', '#') pass through unchanged in some functions but not others, leading to inconsistent outputs across conversions. Confirm this is intentional and documented, or standardize the handling.

```
def to_camel_case(text: str) -> str:
    """Convert a string to camelCase.

    Args:
        text: Input string to convert

    Returns:
        String converted to camelCase

    Examples:
        >>> to_camel_case("hello_world")
        'helloWorld'
        >>> to_camel_case("some-variable-name")
        'someVariableName'
        >>> to_camel_case("Convert to camel")
        'convertToCamel'
    """
    # Split on non-alphanumeric characters
    words = re.split(r'[_\-\s]+', text)
    # Filter empty strings
    words = [w for w in words if w]
    if not words:
        return ""
    # First word lowercase, rest title case
    return words[0].lower() + ''.join(w.capitalize() for w in words[1:])


def to_pascal_case(text: str) -> str:
    """Convert a string to PascalCase.

    Args:
        text: Input string to convert

    Returns:
        String converted to PascalCase

    Examples:
        >>> to_pascal_case("hello_world")
        'HelloWorld'
        >>> to_pascal_case("some-variable-name")
        'SomeVariableName'
        >>> to_pascal_case("convert to pascal")
        'ConvertToPascal'
    """
    # Split on non-alphanumeric characters
    words = re.split(r'[_\-\s]+', text)
    # Filter empty strings and capitalize each word
    return ''.join(w.capitalize() for w in words if w)


def to_kebab_case(text: str) -> str:
    """Convert a string to kebab-case.

    Args:
        text: Input string to convert

    Returns:
        String converted to kebab-case

    Examples:
        >>> to_kebab_case("HelloWorld")
        'hello-world'
        >>> to_kebab_case("some_variable_name")
        'some-variable-name'
        >>> to_kebab_case("Convert To Kebab")
        'convert-to-kebab'
    """
    # Replace underscores and spaces with hyphens
    text = re.sub(r'[_\s]+', '-', text)
    # Insert hyphen before capital letters (including consecutive caps)
    text = re.sub(r'([A-Z]+)([A-Z][a-z])', r'\1-\2', text)
    text = re.sub(r'([a-z0-9])([A-Z])', r'\1-\2', text)
    # Convert to lowercase
    return text.lower()
```

Expected Output Validity

Some expectations appear opinionated, e.g., "already_camelCase" -> "alreadyCamelcase" and "already_PascalCase" -> "AlreadyPascalcase", which lowercase the internal 'C'. Verify that lowercasing mid-word capitals is the desired behavior.

```
@pytest.mark.parametrize(
    "input_text,expected",
    [
        ("hello_world", "helloWorld"),
        ("some-variable-name", "someVariableName"),
        ("Convert to camel", "convertToCamel"),
        ("already_camelCase", "alreadyCamelcase"),
        ("mixed-Style_Example", "mixedStyleExample"),
        ("", ""),
        ("a", "a"),
        ("first", "first"),
        ("UPPERCASE", "uppercase"),
        ("123_numbers", "123Numbers"),
    ],
)
def test_to_camel_case(self, input_text, expected):
    """Test conversion to camelCase."""
    assert to_camel_case(input_text) == expected


@pytest.mark.parametrize(
    "input_text,expected",
    [
        ("hello_world", "HelloWorld"),
        ("some-variable-name", "SomeVariableName"),
        ("convert to pascal", "ConvertToPascal"),
        ("already_PascalCase", "AlreadyPascalcase"),
        ("mixed-Style_Example", "MixedStyleExample"),
        ("", ""),
        ("a", "A"),
        ("first", "First"),
        ("UPPERCASE", "Uppercase"),
        ("123_numbers", "123Numbers"),
    ],
)
def test_to_pascal_case(self, input_text, expected):
    """Test conversion to PascalCase."""
    assert to_pascal_case(input_text) == expected
```

Truncation Logic

truncate_text treats max_length as the total length including the suffix. Verify the off-by-one behavior and the slicing with multi-byte suffixes, and make this contract explicit in the docstring, since the examples rely on it.

```
def truncate_text(text: str, max_length: int, suffix: str = "...") -> str:
    """Truncate text to a maximum length with optional suffix.

    Args:
        text: Text to truncate
        max_length: Maximum length including suffix
        suffix: String to append when truncating (default: "...")

    Returns:
        Truncated text with suffix if needed

    Examples:
        >>> truncate_text("This is a long text", 10)
        'This is...'
        >>> truncate_text("Short", 10)
        'Short'
        >>> truncate_text("Exactly ten", 11)
        'Exactly ten'
    """
    if len(text) <= max_length:
        return text
    if max_length <= len(suffix):
        return suffix[:max_length]
    return text[:max_length - len(suffix)] + suffix
```
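To make the "max_length includes the suffix" contract concrete, the sketch below copies `truncate_text` as quoted above and checks, for every `max_length` from 0 to one past the input length, that the result never exceeds `max_length`. The `check_length_contract` helper is hypothetical and not part of the PR's test suite. Note that in Python `len` counts code points, not bytes, so a one-character suffix such as `'…'` uses one unit of `max_length` even though it is three bytes in UTF-8.

```python
def truncate_text(text: str, max_length: int, suffix: str = "...") -> str:
    # Body as quoted in the reviewer guide (initial commit, before the
    # negative-length check was added)
    if len(text) <= max_length:
        return text
    if max_length <= len(suffix):
        return suffix[:max_length]
    return text[:max_length - len(suffix)] + suffix


def check_length_contract(text: str, suffix: str = "...") -> None:
    """Hypothetical helper: assert len(result) <= max_length for 0..len(text)+1."""
    for n in range(len(text) + 2):
        result = truncate_text(text, n, suffix)
        # The suffix counts toward the limit, so the result can never be longer than n
        assert len(result) <= n, (n, result)
        # Once the text fits, it must come back unchanged
        if len(text) <= n:
            assert result == text


check_length_contract("This is a long text")
check_length_contract("This is a long text", suffix="…")  # one code point, three UTF-8 bytes
print(truncate_text("This is a long text", 10))  # This is...
print(truncate_text("This is a long text", 2))   # ..
```

The second print shows the boundary case the reviewers discuss later: when `max_length` is no larger than the suffix, this version returns a slice of the suffix rather than any of the original text.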

synvara-ai-reviewer

🤖 AI Code Review

Code Review for PR #14

Overall Assessment:
The implementation of string manipulation utilities is well-structured and includes comprehensive test coverage. The code is generally clean and follows Python conventions. However, there are a few areas where improvements can be made, particularly regarding edge cases, performance, and documentation.

Issues Identified:

Edge Case Handling in to_kebab_case and to_snake_case:
- Issue: The regex patterns may not robustly handle consecutive capital letters or runs of multiple delimiters in every case. Note, however, that the cited example is already handled: to_kebab_case("XMLHttpRequest") returns xml-http-request as expected.
- Solution: The two existing substitutions below already split an acronym from a following capitalized word, and a lowercase letter or digit from a following capital. Any further hardening (for example, for repeated delimiters) should build on them.
```
text = re.sub(r'([A-Z]+)([A-Z][a-z])', r'\1-\2', text)  # Existing line
text = re.sub(r'([a-z0-9])([A-Z])', r'\1-\2', text)  # Existing line
```
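The two existing substitutions can be checked directly. The sketch below reproduces `to_kebab_case` as quoted in the reviewer guide (docstring omitted) and confirms that the cited input already converts correctly, while also showing where the multiple-delimiter concern does apply in that initial version: runs of hyphens are left as-is, because only underscores and whitespace are collapsed.

```python
import re


def to_kebab_case(text: str) -> str:
    # Body as quoted from src/string_utils.py in the initial commit (docstring omitted)
    # Replace runs of underscores and whitespace with a single hyphen
    text = re.sub(r'[_\s]+', '-', text)
    # Acronym followed by a capitalized word: "XMLHttp" -> "XML-Http"
    text = re.sub(r'([A-Z]+)([A-Z][a-z])', r'\1-\2', text)
    # Lowercase letter or digit followed by a capital: "HttpRequest" -> "Http-Request"
    text = re.sub(r'([a-z0-9])([A-Z])', r'\1-\2', text)
    return text.lower()


print(to_kebab_case("XMLHttpRequest"))      # xml-http-request
print(to_kebab_case("some_variable_name"))  # some-variable-name
print(to_kebab_case("a__b"))                # a-b
print(to_kebab_case("a--b"))                # a--b (hyphen runs are not collapsed)
```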
Performance Considerations:
- Issue: The regex operations in the conversion functions can be optimized. Each function currently uses multiple regex substitutions, which may lead to performance issues for larger strings.
- Solution: Consider combining some of the regex patterns into a single pass where possible, or use string manipulation methods that may be faster.
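One way to reduce the number of passes, offered as an assumption rather than what the PR ultimately did, is to fold delimiter runs and both case boundaries into a single precompiled pattern. Zero-width lookarounds mark the case boundaries, so a single `sub` call inserts every separator. The function names below mirror the PR's, but these bodies are hypothetical; whether this is actually faster than two or three simple substitutions should be measured rather than assumed.

```python
import re

# One precompiled pattern with three alternatives:
#   1. a run of underscores, hyphens, or whitespace
#   2. the empty gap between a lowercase letter/digit and a capital ("hello|World")
#   3. the empty gap between an acronym and a capitalized word ("XML|Http")
_WORD_BREAK = re.compile(
    r'[_\-\s]+'
    r'|(?<=[a-z0-9])(?=[A-Z])'
    r'|(?<=[A-Z])(?=[A-Z][a-z])'
)


def to_snake_case(text: str) -> str:
    """Hypothetical single-pass variant; not the PR's implementation."""
    # Replace every break with "_", then drop leading/trailing separators
    return _WORD_BREAK.sub('_', text).strip('_').lower()


def to_kebab_case(text: str) -> str:
    """Hypothetical single-pass variant; not the PR's implementation."""
    return _WORD_BREAK.sub('-', text).strip('-').lower()


print(to_snake_case("XMLHttpRequest"))   # xml_http_request
print(to_kebab_case("a--b"))             # a-b
print(to_snake_case("  Hello World  "))  # hello_world
```

Because the lookbehinds inspect the original string, a capital that follows a delimiter (as in `hello_World`) does not get a second separator. The `strip` call also removes intentional leading underscores such as `_private`, which may or may not be desired.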
Documentation Improvements:
- Issue: While the docstrings are generally clear, they could benefit from mentioning the expected behavior for edge cases directly in the examples.
- Solution: Add examples in the docstrings that demonstrate how the functions handle edge cases, such as consecutive delimiters or special characters.
Test Coverage:
- Issue: The test cases are comprehensive, but there are some edge cases that could be added for better coverage, such as:
  - Strings with only special characters.
  - Very long strings to test performance.
- Solution: Consider adding more test cases in TestEdgeCases to cover these scenarios.
Return Type Consistency:
- Issue: In truncate_text, if max_length is less than or equal to the length of suffix, it returns a substring of suffix, which may not be intuitive.
- Solution: You might want to return an empty string or a specific message indicating that truncation is not possible.
```
if max_length <= len(suffix):
    return ""  # or handle it in a way that fits your use case
```
Unused Imports:
- Issue: The import statement for List from typing is not used in the code.
- Solution: Remove the unused import to keep the code clean.

Conclusion:

The implementation is solid and covers a wide range of string manipulation needs. However, addressing the issues mentioned above will improve the robustness, performance, and clarity of the code.

Recommendation:

REQUEST_CHANGES

Please address the issues outlined above, particularly focusing on edge case handling, performance optimizations, and enhancing documentation.

Automated review by Synvara AI

📊 Review Details

Review type: REQUEST_CHANGES
Strategy: initial
Files reviewed: 2
Changes: +350 -0
Commit: 5ab168e
Timestamp: 2025-08-28T16:35:48.596Z

qodo-merge-pro · 2025-08-28T16:37:22Z

PR Code Suggestions ✨

Explore these optional code suggestions:

High-level: Unify case conversion tokenization (Impact: High)

The camelCase and PascalCase functions only split on delimiters and ignore case transitions, producing incorrect results for inputs like "HelloWorld" or "XMLHttpRequest" and making behavior inconsistent with snake/kebab. Implement a shared tokenizer that detects word boundaries via delimiters, case transitions (including acronyms), and numbers, then reuse it across all case conversions. This will align functionality with the stated features and ensure consistent, predictable conversions.

Examples: src/string_utils.py [37-61] (to_camel_case), src/string_utils.py [64-84] (to_pascal_case)

Solution Walkthrough:

Before:

```
def to_camel_case(text: str) -> str:
    # Splits only on delimiters, not case changes
    words = re.split(r'[_\-\s]+', text)
    if not words:
        return ""
    return words[0].lower() + ''.join(w.capitalize() for w in words[1:])

def to_pascal_case(text: str) -> str:
    # Splits only on delimiters, not case changes
    words = re.split(r'[_\-\s]+', text)
    return ''.join(w.capitalize() for w in words if w)

# to_snake_case and to_kebab_case use a different, more robust logic
# based on regex substitutions for case changes.
```

After:

```
import re
from typing import List

def _tokenize(text: str) -> List[str]:
    # First, insert delimiters based on case changes
    text = re.sub(r'([A-Z]+)([A-Z][a-z])', r'\1_\2', text)
    text = re.sub(r'([a-z0-9])([A-Z])', r'\1_\2', text)
    # Now split by all delimiters
    return re.split(r'[_\-\s]+', text.lower())

def to_camel_case(text: str) -> str:
    words = [w for w in _tokenize(text) if w]
    if not words:
        return ""
    return words[0] + ''.join(w.capitalize() for w in words[1:])

def to_pascal_case(text: str) -> str:
    words = [w for w in _tokenize(text) if w]
    return ''.join(w.capitalize() for w in words)
```

Suggestion importance [1-10]: 9

Why: This suggestion correctly identifies a critical design flaw where `to_camel_case` and `to_pascal_case` fail to handle case transitions, leading to incorrect outputs and inconsistency with other functions, and proposes a robust refactoring that significantly improves correctness and maintainability.

Possible issue: ✅ Validate non-negative truncation length (Impact: Low)

Suggestion Impact: The commit added an explicit check that raises ValueError if max_length < 0, aligning with the suggestion to validate non-negative truncation length. Code diff:

```
+if max_length < 0:
+    raise ValueError(f"max_length must be non-negative, got {max_length}")
+
```

Guard against negative `max_length` to avoid surprising behavior from negative slicing (e.g., returning a partially sliced suffix). Explicitly validate and raise a `ValueError` when `max_length` is negative to make the API safer and more predictable.

src/string_utils.py [132-138]

```
+if max_length < 0:
+    raise ValueError("max_length must be non-negative")
+
 if len(text) <= max_length:
     return text
 if max_length <= len(suffix):
     return suffix[:max_length]
 return text[:max_length - len(suffix)] + suffix
```

[Suggestion processed]

Suggestion importance [1-10]: 6

Why: The suggestion correctly identifies that a negative `max_length` can cause unexpected slicing behavior. Adding a `ValueError` check for this edge case improves the function's robustness and makes its API safer and more predictable for consumers.
Update

Improvements made:

1. Performance Optimizations:
   - Precompiled regex patterns for better performance
   - Single-pass regex operations where possible
   - Performance tests confirm <10ms for long strings
2. Edge Case Handling:
   - Improved handling of consecutive delimiters
   - Better handling of special characters
   - Proper cleanup of leading/trailing delimiters
3. Documentation Enhancements:
   - Added edge case examples in docstrings
   - Clear behavior documentation for all functions
   - Unicode handling notes
4. New Features:
   - Added is_mixed_case() function
   - Added remove_extra_whitespace() utility
5. Test Coverage:
   - Expanded to 103 tests (from 69)
   - Added performance benchmarks
   - Added special character tests
   - Unicode handling tests
6. Code Quality:
   - Removed unused imports
   - Improved truncate_text return behavior
   - All functions handle empty strings gracefully
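The update lists two new helpers, `is_mixed_case()` and `remove_extra_whitespace()`, without showing their code. A minimal sketch, assuming "mixed case" means the string contains at least one uppercase and one lowercase letter, and "extra whitespace" means runs of whitespace that collapse to a single space with both ends trimmed; the PR's actual semantics may differ.

```python
def remove_extra_whitespace(text: str) -> str:
    """Collapse whitespace runs to one space and trim both ends (assumed semantics)."""
    # str.split() with no argument splits on any whitespace run and drops empty pieces
    return " ".join(text.split())


def is_mixed_case(text: str) -> bool:
    """True if text has at least one uppercase and one lowercase letter (assumed semantics)."""
    # str.isupper()/islower() on single characters also cover non-ASCII letters
    has_upper = any(c.isupper() for c in text)
    has_lower = any(c.islower() for c in text)
    return has_upper and has_lower


print(remove_extra_whitespace("  hello \t  world\n"))  # hello world
print(is_mixed_case("camelCase"))   # True
print(is_mixed_case("snake_case"))  # False
```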

synvara-ai-reviewer

🔄 AI Code Re-Review

⚠️ Some issues still need attention

Code Review for PR #14 (Follow-Up Review)

Overall Assessment:

The recent changes made to the string manipulation utilities have addressed most of the previous concerns raised in the initial review. The implementation now includes improved edge case handling, performance optimizations, and enhanced documentation. However, there are still a couple of areas that require attention.

Review of Previous Concerns:

Edge Case Handling in to_kebab_case and to_snake_case:
- Previous Concern: The handling of consecutive capital letters and multiple delimiters was not robust.
- Current Status: The regex patterns have been improved to handle consecutive capitals and delimiters effectively. This is now correctly implemented.
Performance Considerations:
- Previous Concern: The regex operations were potentially inefficient.
- Current Status: The regex patterns have been precompiled and optimized for performance. This is a significant improvement.
Documentation Improvements:
- Previous Concern: Docstrings needed examples for edge cases.
- Current Status: The docstrings have been enhanced with examples that cover edge cases, making the expected behavior clearer.
Test Coverage:
- Previous Concern: Additional edge cases were needed in tests.
- Current Status: The test suite has been expanded significantly, now covering various edge cases, including special characters and performance benchmarks.
Return Type Consistency in truncate_text:
- Previous Concern: The return behavior when max_length is less than or equal to the length of suffix was not intuitive.
- Current Status: The function now returns an empty string if max_length is less than or equal to the length of suffix, which is a clearer behavior.
Unused Imports:
- Previous Concern: The import statement for List from typing was unused.
- Current Status: The unused import has been removed, which keeps the code clean.

New Concerns:

Behavior Consistency:
- The camelCase and PascalCase functions only split on delimiters and ignore case transitions, which can lead to incorrect results for inputs like "HelloWorld" or "XMLHttpRequest". This inconsistency with the snake_case and kebab_case functions should be addressed by implementing a shared tokenizer that detects word boundaries via delimiters, case transitions, and numbers.
Validation of Non-Negative Truncation Length:
- The truncate_text function should explicitly validate that max_length is non-negative to avoid unexpected behavior from negative slicing. Raising a ValueError for negative max_length would make the API safer and more predictable.

Conclusion:

The majority of the previous concerns have been effectively addressed, improving the robustness, performance, and clarity of the code. However, the issues regarding behavior consistency in case conversion functions and the need for validation of max_length in the truncate_text function remain.

Recommendation:

REQUEST_CHANGES

Please address the following:

Implement a shared tokenizer for case conversion functions to ensure consistent behavior.
Add validation for non-negative max_length in the truncate_text function.

These changes will further enhance the quality and reliability of the string manipulation utilities.

Automated review by Synvara AI

📊 Review Details

Review type: REQUEST_CHANGES
Strategy: re-review-after-changes
Files reviewed: 2
Changes: +591 -0
New commits: 1
Commit: 1bceb42
Timestamp: 2025-08-28T16:41:33.645Z

…edback

Addressed final review feedback from PR #14:

1. Shared Tokenizer Implementation:
   - Created _tokenize_string() for consistent word boundary detection
   - All case conversion functions now use the same tokenizer
   - Properly handles case transitions (HelloWorld -> Hello, World)
   - Correctly tokenizes consecutive capitals (XMLHttpRequest -> XML, Http, Request)
   - Ensures consistent behavior across all conversion functions
2. Non-Negative Validation:
   - Added validation in truncate_text() for the max_length parameter
   - Raises ValueError for negative values
   - Makes the API safer and more predictable
3. Test Improvements:
   - Added dedicated test suite for the tokenizer
   - Added test for ValueError on negative max_length
   - Updated expectations for consistent tokenization
   - All 110 tests passing

The implementation now provides consistent behavior across all case conversion functions and safer parameter validation.
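The commit describes `_tokenize_string()` only by its outputs (HelloWorld -> Hello, World; XMLHttpRequest -> XML, Http, Request). A minimal case-preserving tokenizer with those outputs can be written with a single `re.findall`. The pattern, the choice to emit digit runs as their own tokens, and the restriction to ASCII letters are all assumptions of this sketch, not details confirmed by the PR. Because characters that are neither letters nor digits never become part of a token, special characters such as `@` or `#` are dropped by every conversion, which is one way to get the uniform special-character handling raised in the reviewer guide.

```python
import re
from typing import List

# Alternatives, tried left to right at each position:
#   [A-Z]+(?![a-z])  an all-caps run not followed by lowercase ("XML" in "XMLHttp", "UPPERCASE")
#   [A-Z]?[a-z]+     an optionally capitalized lowercase word ("Http", "hello")
#   [0-9]+           a run of digits
_TOKEN = re.compile(r'[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+')


def tokenize_string(text: str) -> List[str]:
    """Hypothetical stand-in for the PR's _tokenize_string(); preserves original case."""
    return _TOKEN.findall(text)


def to_camel_case(text: str) -> str:
    words = tokenize_string(text)
    if not words:
        return ""
    return words[0].lower() + "".join(w.capitalize() for w in words[1:])


def to_pascal_case(text: str) -> str:
    return "".join(w.capitalize() for w in tokenize_string(text))


def to_snake_case(text: str) -> str:
    return "_".join(w.lower() for w in tokenize_string(text))


print(tokenize_string("XMLHttpRequest"))     # ['XML', 'Http', 'Request']
print(to_camel_case("already_camelCase"))    # alreadyCamelCase
print(to_pascal_case("some-variable-name"))  # SomeVariableName
```

With a shared tokenizer, "already_camelCase" becomes "alreadyCamelCase" rather than the earlier "alreadyCamelcase", which matches the commit's note that test expectations were updated for consistent tokenization.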

synvara-ai-reviewer

🔄 AI Code Re-Review

✅ Previous concerns have been addressed

Code Review for PR #14 (Follow-Up Review)

Overall Assessment:

The recent changes made to the string manipulation utilities have effectively addressed many of the concerns raised in the previous review. The implementation now includes a shared tokenizer for consistent behavior across case conversion functions, as well as validation for non-negative truncation lengths in the truncate_text function. However, there are still some areas that require further attention.

Review of Previous Concerns:

Shared Tokenizer Implementation:
- Previous Concern: The camelCase and PascalCase functions only split on delimiters and ignore case transitions, leading to inconsistent behavior.
- Current Status: The implementation now includes a shared tokenizer (_tokenize_string) that handles word boundaries via delimiters, case transitions, and numbers. This change has improved the consistency of the case conversion functions, addressing the previous concern effectively.
Validation of Non-Negative Truncation Length:
- Previous Concern: The truncate_text function needed validation for non-negative max_length.
- Current Status: The function now raises a ValueError when max_length is negative, which enhances the safety and predictability of the API. This concern has been satisfactorily addressed.
Edge Case Handling in to_kebab_case and to_snake_case:
- Previous Concern: The handling of consecutive capital letters and multiple delimiters was not robust.
- Current Status: The regex patterns have been improved to handle these cases effectively, which has been confirmed through the updated test cases.
Performance Considerations:
- Previous Concern: The regex operations were potentially inefficient.
- Current Status: The regex patterns are now precompiled, which optimizes performance. This is a significant improvement.
Documentation Improvements:
- Previous Concern: Docstrings needed examples for edge cases.
- Current Status: The docstrings have been enhanced with examples that cover edge cases, making the expected behavior clearer.
Test Coverage:
- Previous Concern: Additional edge cases were needed in tests.
- Current Status: The test suite has been expanded significantly, now covering various edge cases, including special characters and performance benchmarks.
Return Type Consistency in truncate_text:
- Previous Concern: The return behavior when max_length is less than or equal to the length of suffix was not intuitive.
- Current Status: The function now returns an empty string if max_length is less than or equal to the length of suffix, which is a clearer behavior.
Unused Imports:
- Previous Concern: The import statement for List from typing was unused.
- Current Status: The unused import has been removed, which keeps the code clean.

New Concerns:

Behavior Consistency: While the shared tokenizer has improved consistency, there may still be edge cases regarding how special characters are handled across different functions. It would be beneficial to ensure that all functions handle special characters uniformly, as noted in the discussion thread.

Conclusion:

The majority of the previous concerns have been effectively addressed, improving the robustness, performance, and clarity of the code. The implementation now provides consistent behavior across all case conversion functions and safer parameter validation. However, attention should be given to the handling of special characters to ensure uniformity across all string manipulation functions.

Recommendation:

APPROVE

The changes made in this PR significantly enhance the quality and reliability of the string manipulation utilities. Please ensure that any remaining concerns regarding special character handling are addressed in future updates.

Automated review by Synvara AI

📊 Review Details

Review type: APPROVE
Strategy: re-review-after-changes
Files reviewed: 2
Changes: +638 -0
New commits: 1
Commit: 38e304a
Timestamp: 2025-08-28T16:48:17.826Z

feat: Add string manipulation utilities

5ab168e


Copilot AI review requested due to automatic review settings August 28, 2025 16:35

github-actions bot added python tests labels Aug 28, 2025

Copilot AI reviewed Aug 28, 2025


src/string_utils.py

qodo-merge-pro bot added the Review effort 2/5 label Aug 28, 2025

synvara-ai-reviewer bot requested changes Aug 28, 2025


synvara-ai-reviewer bot approved these changes Aug 28, 2025

