FIX: Decode Raw UTF-16 data from Conn.getinfo() #340
📊 Code Coverage Report
Diff Coverage
Diff: main...HEAD, staged and unstaged changes
Summary
mssql_python/connection.py

Lines 1189-1198
  1189  # Try encodings in order: UTF-16LE (Windows), UTF-8, Latin-1
  1190  for encoding in ("utf-16-le", "utf-8", "latin1"):
  1191      try:
  1192          return actual_data.decode(encoding).rstrip("\0")
! 1193      except UnicodeDecodeError:
! 1194          continue
  1195
  1196  # All decodings failed
  1197  logger.debug(
  1198      "error",

Lines 1198-1206
  1198      "error",
  1199      "Failed to decode string in getinfo with any supported encoding. "
  1200      "Returning None to avoid silent corruption.",
  1201  )
! 1202  return None
  1203  else:
  1204  # If it's not bytes, return as is
  1205  return data
  1206  elif is_yn_type:

📋 Files Needing Attention
📉 Files with overall lowest coverage
mssql_python.pybind.logger_bridge.cpp: 59.2%
mssql_python.row.py: 66.2%
mssql_python.pybind.ddbc_bindings.cpp: 67.1%
mssql_python.helpers.py: 67.5%
mssql_python.pybind.connection.connection.cpp: 73.6%
mssql_python.pybind.ddbc_bindings.h: 76.9%
mssql_python.ddbc_bindings.py: 79.6%
mssql_python.pybind.connection.connection_pool.cpp: 79.6%
mssql_python.connection.py: 82.2%
mssql_python.cursor.py: 83.8%
mssql_python/connection.py
Outdated
except UnicodeDecodeError:
    # SQLGetInfoW returns UTF-16LE encoded strings
    # Try encodings in order: UTF-16LE (Windows), UTF-8, Latin-1
    for encoding in ("utf-16-le", "utf-8", "latin1"):
Why latin1? :) Asking out of curiosity. IIUC, utf-16-le and utf-8 might be sufficient. If you have other reasons to add latin1, it would be good to know.
You're right. I had included this as a kind of fallback strategy. However, the expected encoding is UTF-16LE, and UTF-8 is a reasonable fallback for edge cases where data might come pre-decoded.
Better to remove it. I'll make this change. Thanks for the suggestion!
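For readers following the thread, here is a minimal sketch of why the Latin-1 entry matters. Latin-1 maps every byte value to a character, so a decode with it never raises, and any fallback placed after it (including the `return None` branch) cannot be reached. The helper name `decode_info_bytes` and the sample bytes are hypothetical and are not part of `mssql_python`; the loop only mirrors the shape of the code under review.

```python
from typing import Optional, Sequence


def decode_info_bytes(raw: bytes, encodings: Sequence[str]) -> Optional[str]:
    """Hypothetical helper: try each encoding in order, None if all fail."""
    for encoding in encodings:
        try:
            return raw.decode(encoding).rstrip("\0")
        except UnicodeDecodeError:
            continue
    return None


# Odd length, so it is truncated UTF-16LE; 0xFF is never a valid UTF-8 byte.
bad = b"\xff\xfe\xff"

# Latin-1 accepts any byte sequence, so this returns a (meaningless) string.
with_latin1 = decode_info_bytes(bad, ("utf-16-le", "utf-8", "latin1"))

# Without Latin-1 the failure is visible to the caller as None.
without_latin1 = decode_info_bytes(bad, ("utf-16-le", "utf-8"))
```

With Latin-1 in the list the caller gets a string for undecodable input; without it the caller gets `None`, which matches the stated goal of avoiding silent corruption.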
Work Item / Issue Reference
Summary
This pull request introduces improvements to the handling of string encoding in the getinfo method for SQL Server connections, adds support for profiling builds in the Windows build script, and enhances test coverage for string decoding. The most important changes are grouped below:

String Decoding Improvements
The getinfo method in connection.py now attempts to decode string results from SQL Server using multiple encodings in order: UTF-16LE (Windows default), UTF-8, and Latin-1. This improves robustness when handling driver responses and avoids silent data corruption by returning None if all decoding attempts fail.

Test Coverage
test_getinfo_string_encoding_utf16 in test_003_connection.py verifies that string values returned by getinfo are properly decoded from UTF-16, contain no null bytes, and are non-empty, helping catch encoding mismatches early.

Build Script Cleanup
Cleanup in build.bat related to copying the msvcp140.dll redistributable, simplifying the post-build process.
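The three checks named under Test Coverage (decoded from UTF-16, no null bytes, non-empty) can be illustrated without a database. The sketch below is not the test from the PR: the buffer is simulated, and `check_info_string` is a hypothetical helper. It shows why the null-byte check is useful: ASCII-range text encoded as UTF-16LE is also valid UTF-8, so a wrong decode does not raise and is only detectable by the embedded NUL characters it leaves behind.

```python
# Simulated driver buffer: UTF-16LE text plus a two-byte null terminator.
raw = "Microsoft SQL Server".encode("utf-16-le") + b"\x00\x00"


def check_info_string(value):
    """Hypothetical helper mirroring the three checks described above."""
    assert isinstance(value, str)
    assert "\x00" not in value  # embedded NULs signal an encoding mismatch
    assert len(value) > 0


# Correct decode: the terminator is stripped and the text is clean.
good = raw.decode("utf-16-le").rstrip("\0")
check_info_string(good)

# Wrong decode: no exception is raised, but every other character is NUL.
# rstrip only removes the trailing NULs, not the interleaved ones.
wrong = raw.decode("utf-8").rstrip("\0")
```

Passing `wrong` to `check_info_string` fails on the null-byte assertion, which is the kind of mismatch the test is meant to catch early.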