SNOW-3440288: Enhance schema string parser for quotes by sfc-gh-wshangguan · Pull Request #4206 · snowflakedb/snowpark-python

sfc-gh-wshangguan · 2026-04-29T22:00:18Z

Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

Fixes SNOW-3440288

Fill out the following pre-review checklist:
- I am adding a new automated test(s) to verify correctness of my new code
  - If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
- I am adding new logging messages
- I am adding a new telemetry message
- I am adding new credentials
- I am adding a new dependency
- If this is a new feature/behavior, I'm adding the Local Testing parity changes.
- I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
- If adding any arguments to public Snowpark APIs or creating new public Snowpark APIs, I acknowledge that I have ensured my changes include AST support. Follow the link for more information: AST Support Guidelines

Please describe how your code solves the related issue.

Problem

Concrete examples of INFER_SCHEMA outputs that the existing parser broke on (a space, comma, parenthesis, or mixed case inside a "..." field name), and why those names need quoting per Snowflake's identifier grammar.
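
To illustrate the failure mode, here is a minimal sketch, assuming a naive parser that takes everything before the first space as the field name and treats every comma as a field separator. The helpers `naive_split_field` and `naive_split_fields` and the sample strings are hypothetical; they are not the PR's code or captured INFER_SCHEMA output.

```python
from typing import List, Tuple


def naive_split_field(field_def: str) -> Tuple[str, str]:
    """Hypothetical naive splitter: the name is everything before the first space."""
    name, _, remainder = field_def.strip().partition(" ")
    return name, remainder


def naive_split_fields(fields: str) -> List[str]:
    """Hypothetical naive splitter: every comma separates two fields."""
    return [part.strip() for part in fields.split(",")]


# A bare name splits as intended.
print(naive_split_field("city VARCHAR"))      # ('city', 'VARCHAR')

# A quoted name containing a space is cut in the middle of the name.
print(naive_split_field('"my col" NUMBER'))   # ('"my', 'col" NUMBER')

# A quoted name containing a comma is reported as two separate fields.
print(naive_split_fields('"a,b" NUMBER, c VARCHAR'))
```

In both broken cases the name token no longer starts and ends with a double quote, so anything downstream that relies on the quoted name sees a different identifier.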

Solution

The two new helpers (_scan_quoted_identifier, _split_object_field) and the three updated callers (split_top_level_comma_fields, _extract_paren_content, and the OBJECT branch of _sf_type_to_type_object), referencing the server-side SFSqlLexer.g / SqlIdentifierUtils.java grammar that pins the "" escape.
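
As a rough illustration of quote-aware splitting, the sketch below splits a field list on top-level commas while ignoring commas and parentheses inside double-quoted names. It is a simplification under stated assumptions, not the PR's implementation: the function name `split_top_level_commas` is hypothetical, and instead of a dedicated scanner it toggles an `in_quotes` flag on every `"`, which handles the `""` escape because two consecutive toggles cancel out.

```python
from typing import List


def split_top_level_commas(s: str) -> List[str]:
    """Split on commas that are outside parentheses and outside "..." names."""
    parts: List[str] = []
    depth = 0          # current parenthesis nesting level
    in_quotes = False  # True while inside a double-quoted identifier
    start = 0          # start index of the field being collected
    for i, ch in enumerate(s):
        if ch == '"':
            # An escaped "" toggles twice, leaving the state unchanged.
            in_quotes = not in_quotes
        elif in_quotes:
            continue  # commas and parentheses inside a name are literal
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(s[start:i].strip())
            start = i + 1
    parts.append(s[start:].strip())
    return parts


fields = '"a,b" NUMBER(10, 2), "c(d" VARCHAR, e OBJECT("x y" NUMBER, z VARCHAR)'
for field in split_top_level_commas(fields):
    print(field)
```

The toggle approach does not report an unterminated quote; a scanner that raises on a missing closing quote, as the PR's helper is documented to do, gives clearer errors.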

Backward compatibility

Bare names still take the original split path; non-OBJECT structured strings are unchanged.

graphite-app · 2026-04-29T22:04:36Z

+        # "a""b" is the 7-char span 0..6 inclusive; index past it is 7
+        s = '"a""b" rest'
+        assert _scan_quoted_identifier(s, 0) == 6


The test comment contains an error. The comment states "a""b" is a "7-char span 0..6 inclusive" with "index past it is 7", but:

"a""b" is 6 characters (positions 0-5): ", a, ", ", b, "

The index just past it is 6 (not 7)

The assertion assert _scan_quoted_identifier(s, 0) == 6 is correct, but the comment should read:

# "a""b" is a 6-char span (positions 0-5); index just past it is 6

While this is only a comment error and won't cause production failures, it could confuse future maintainers debugging this code.

Suggested change

-        # "a""b" is the 7-char span 0..6 inclusive; index past it is 7
+        # "a""b" is a 6-char span (positions 0-5); index just past it is 6
         s = '"a""b" rest'
         assert _scan_quoted_identifier(s, 0) == 6
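
The index arithmetic in this comment can be checked directly in Python without calling the parser. The snippet below is only a sanity check on the test string; the variable names are illustrative.

```python
s = '"a""b" rest'
quoted = '"a""b"'  # the quoted identifier at the start of s

# List each character of the identifier with its position.
for index, ch in enumerate(quoted):
    print(index, ch)

print(len(quoted))            # 6 characters, occupying positions 0-5
print(repr(s[len(quoted)]))   # ' ' -> the character at index 6, just past the span
```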

codecov-commenter · 2026-04-29T23:26:54Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.04%. Comparing base (75260b9) to head (5857d24).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4206      +/-   ##
==========================================
- Coverage   95.42%   95.04%   -0.38%     
==========================================
  Files         171      171              
  Lines       43801    43835      +34     
  Branches     7505     7513       +8     
==========================================
- Hits        41795    41665     -130     
- Misses       1226     1345     +119     
- Partials      780      825      +45

sfc-gh-yuwang · 2026-04-30T22:11:37Z

can you also run this change against SCOS's regression test?

sfc-gh-yuwang · 2026-04-30T22:17:33Z

+
+    Raises ``ValueError`` if the closing quote is missing.
+    """
+    assert s[start] == '"'


Is this assert necessary? Is there a case where this function is called on a string whose first character is not a double quote?
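
For context on what the assert guards, here is a sketch of the scanner reconstructed from the fragments quoted in this review (the docstring line, the assert, and the loop body). It is an approximation under that assumption, not the merged code; the name `scan_quoted_identifier` (without the leading underscore) and the error message are hypothetical.

```python
def scan_quoted_identifier(s: str, start: int) -> int:
    """Return the index just past the closing quote of the name opening at ``start``.

    Raises ``ValueError`` if the closing quote is missing.
    """
    # Caller contract: only invoked when s[start] is the opening quote.
    assert s[start] == '"'
    i = start + 1
    while i < len(s):
        if s[i] == '"':
            if i + 1 < len(s) and s[i + 1] == '"':
                i += 2  # escaped "" inside the name; keep scanning
                continue
            return i + 1  # index just past the closing quote
        i += 1
    raise ValueError(f"missing closing quote in {s!r}")


print(scan_quoted_identifier('"a""b" rest', 0))  # 6

try:
    scan_quoted_identifier('"abc', 0)
except ValueError as exc:
    print("ValueError:", exc)
```

In this sketch the assert documents a precondition rather than validating input: if every caller checks for the opening quote before calling, it never fires, and it is skipped entirely when Python runs with `-O`.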

sfc-gh-yuwang · 2026-04-30T22:29:01Z

+
+
+def _split_object_field(field_def: str) -> Tuple[str, str]:
+    """Split a single OBJECT field definition into ``(name_token, remainder)``.


Is it possible for a malformed field_def such as "a NUM"BER to reach here, or is this already handled upstream?

sfc-gh-wshangguan · 2026-05-01

Yes, it's possible. I added a test verifying that we raise an exception with a clear error message.
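
One way such a malformed definition could be rejected is sketched below. This is a hypothetical illustration, not the PR's `_split_object_field`: it uses a regular expression for the quoted name instead of the PR's scanner, the function name and messages are invented, and the thread does not state which exception type the PR raises for this case (`ValueError` is assumed here).

```python
import re
from typing import Tuple

# A double-quoted name: any run of non-quote characters or "" escapes.
_QUOTED_NAME = re.compile(r'"(?:[^"]|"")*"')


def split_object_field(field_def: str) -> Tuple[str, str]:
    """Split one field definition into (name_token, remainder)."""
    field_def = field_def.strip()
    if not field_def.startswith('"'):
        # Bare name: split on the first space, as a naive parser would.
        name, _, remainder = field_def.partition(" ")
        return name, remainder.strip()
    match = _QUOTED_NAME.match(field_def)
    if match is None:
        raise ValueError(f"missing closing quote in field definition: {field_def!r}")
    end = match.end()
    # The closing quote must be followed by whitespace or the end of input.
    if end < len(field_def) and not field_def[end].isspace():
        raise ValueError(f"unexpected text after quoted name in: {field_def!r}")
    return field_def[:end], field_def[end:].strip()


print(split_object_field("city VARCHAR"))             # ('city', 'VARCHAR')
print(split_object_field('"my col" NUMBER(10, 2)'))   # ('"my col"', 'NUMBER(10, 2)')

try:
    split_object_field('"a NUM"BER')
except ValueError as exc:
    print("ValueError:", exc)
```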

sfc-gh-joshi · 2026-05-01T20:37:11Z

+        if s[i] == '"':
+            if i + 1 < len(s) and s[i + 1] == '"':
+                i += 2  # escaped "" inside the name; keep scanning
+                continue
+            return i + 1  # index just past the closing quote


nit: prefer writing like this so it's slightly easier to tell what this check is doing

Suggested change

-        if s[i] == '"':
-            if i + 1 < len(s) and s[i + 1] == '"':
-                i += 2  # escaped "" inside the name; keep scanning
-                continue
-            return i + 1  # index just past the closing quote
+        if s[i:i + 2] == '""':  # check for a "" escape sequence in the name
+            i += 2
+            continue
+        elif s[i] == '"':
+            # found closing quote, return the index just past it
+            return i + 1
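
The slice-based form relies on two properties of Python slicing. The slice has to span two characters (`s[i:i + 2]`) to be comparable with `'""'`, because `s[i:i + 1]` yields a single character. Slicing past the end of a string also returns a shorter string instead of raising, which is why no explicit bounds check is needed. A quick check, using the test string from earlier in this review:

```python
s = '"a""b" rest'
i = 2  # position of the first quote of the "" escape

print(repr(s[i:i + 1]))  # '"'  -> one character, can never equal '""'
print(repr(s[i:i + 2]))  # '""' -> the escape sequence

# At the end of the string the slice is simply shorter; no IndexError.
t = '"x"'
print(repr(t[2:4]))      # '"'  -> closing quote, not an escape
```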

sfc-gh-yuwang

LGTM, please check SCOS regression test before merge

sfc-gh-wshangguan requested review from a team as code owners April 29, 2026 22:00

sfc-gh-wshangguan requested review from sfc-gh-bkogan, sfc-gh-joshi and sfc-gh-yuwang April 29, 2026 22:00

sfc-gh-wshangguan marked this pull request as draft April 29, 2026 22:00

graphite-app Bot reviewed Apr 29, 2026

sfc-gh-wshangguan changed the title ~~first attemp with tests~~ SNOW-3440288: Enhance schema string parser for quotes Apr 30, 2026

sfc-gh-wshangguan marked this pull request as ready for review April 30, 2026 22:00

sfc-gh-yuwang reviewed Apr 30, 2026

sfc-gh-joshi approved these changes May 1, 2026

sfc-gh-yuwang approved these changes May 1, 2026

sfc-gh-wshangguan added 3 commits May 1, 2026 16:48

first attemp with tests

be58522

add code cov

ce0a9cb

resolve comments

5857d24

sfc-gh-wshangguan force-pushed the wshangguan-SNOW-3440288-enhance-schema-string-parser-for-quotes branch from 6a2cf59 to 5857d24 Compare May 1, 2026 23:48

sfc-gh-wshangguan wants to merge 3 commits into main from wshangguan-SNOW-3440288-enhance-schema-string-parser-for-quotes

4 participants


