-
Notifications
You must be signed in to change notification settings - Fork 30
FIX: Fix for deprecated lib function wstring_convert #365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
📊 Code Coverage Report
Diff CoverageDiff: main...HEAD, staged and unstaged changes
Summary
mssql_python/pybind/ddbc_bindings.h📋 Files Needing Attention📉 Files with overall lowest coverage (click to expand)mssql_python.pybind.logger_bridge.hpp: 58.8%
mssql_python.pybind.logger_bridge.cpp: 59.2%
mssql_python.pybind.ddbc_bindings.cpp: 66.2%
mssql_python.row.py: 66.2%
mssql_python.helpers.py: 67.5%
mssql_python.pybind.connection.connection.cpp: 73.6%
mssql_python.ddbc_bindings.py: 79.6%
mssql_python.connection.py: 83.7%
mssql_python.cursor.py: 84.2%
mssql_python.logging.py: 85.3%🔗 Quick Links
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request removes the deprecated std::wstring_convert and std::codecvt utilities and replaces them with manual UTF-8/UTF-16/UTF-32 conversion implementations for Unix platforms (macOS/Linux). The changes aim to modernize the codebase and improve performance through direct encoding/decoding without intermediate buffers.
Key changes:
- Refactored
SQLWCHARToWStringandWStringToSQLWCHARinunix_utils.cppto manually handle UTF-16 ↔ UTF-32 conversions with explicit surrogate pair encoding/decoding - Implemented manual UTF-8 to UTF-32 decoder in
Utf8ToWStringfunction inddbc_bindings.h
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| mssql_python/pybind/unix_utils.cpp | Replaces codecvt-based conversions with manual UTF-16 ↔ UTF-32 conversion logic including surrogate pair handling |
| mssql_python/pybind/ddbc_bindings.h | Adds manual UTF-8 decoder with multi-byte sequence handling to replace deprecated wstring_convert |
Critical Issues Identified:
- Security: UTF-8 decoder lacks validation for overlong encodings and malformed continuation bytes, which could lead to security vulnerabilities
- Correctness: Invalid surrogate code points (0xD800-0xDFFF) are not properly validated in
WStringToSQLWCHAR, allowing invalid Unicode to be generated - Robustness: Flawed logic in invalid sequence detection at line 516 of
ddbc_bindings.hmay cause incorrect behavior
The implementation introduces several critical bugs that deviate from the existing, more robust implementation already present in ddbc_bindings.h (lines 79-167). The existing implementation properly validates Unicode scalars and replaces invalid sequences with the Unicode replacement character (0xFFFD), while the new code in unix_utils.cpp silently passes through or skips invalid values inconsistently.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once the Alpine PR is merged, please update the x86_64 Alpine pipeline in this PR to use alpine:latest. This PR will resolve the compiler warning issue and allow x86 to move back to the latest image.
The ARM64 pipeline can be updated to latest once the ODBC fix is in (not part of this PR).
bewithgaurav
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
requesting to put tests for uncovered logic
419b024 to
ac56363
Compare
ab15ef9
@gaurav, i have addressed the review comments related with Diff code coverage. Added required test and it is now 74%. As you are not available for quick review now, hence removing you from the mandatory reviewer list in-order to merge the PR.
Work Item / Issue Reference
Summary
This pull request refactors and optimizes the string conversion utilities in
unix_utils.cppfor converting betweenSQLWCHARarrays andstd::wstringon macOS/Linux. The new implementation eliminates intermediate buffers and reliance oncodecvt, resulting in more efficient and robust conversions, especially for Unicode characters outside the Basic Multilingual Plane (BMP).String conversion optimizations:
SQLWCHARToWStringimplementation with a direct UTF-16 to UTF-32 conversion, handling surrogate pairs explicitly and removing the use ofstd::wstring_convertand intermediate buffers.WStringToSQLWCHARfunction to convertstd::wstring(UTF-32) to UTF-16, encoding surrogate pairs manually and streamlining the conversion logic for better performance and branch prediction.Robustness and correctness improvements:
Code simplification: