feat: Support VARCHAR(max_length) in NeuG Type System#25
Open
feat: Support VARCHAR(max_length) in NeuG Type System#25
VARCHAR(max_length) in NeuG Type System#25Conversation
Committed-by: Xiaoli Zhou from Dev container
Committed-by: Xiaoli Zhou from Dev container
Committed-by: Xiaoli Zhou from Dev container
Collaborator
|
@greptile |
Committed-by: xiaolei.zl from Dev container
Committed-by: xiaolei.zl from Dev container
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Committed-by: Xiaoli Zhou from Dev container
What do these changes do?
Related issue number
Fixes #26
Greptile Summary
This PR adds
VARCHAR(max_length)syntax to NeuG's Cypher type system, introducing aStringTypeInfoextra-type-info class, newLogicalType::STRING(size_t)factory, a grammar rule inCypher.g4, and a broad refactoring of DDL property-handling code fromDataTypeIdto the richerDataType(which now carriesmax_length).Key changes:
StringTypeInfoclass added to storemax_length;LogicalType::STRING()now defaults to 256 instead of the old 65536 constantparseStringType()added to parseVARCHAR(N)from string;endPtrvalidation was added (fixing a prior review thread), but non-positive values are not rejected —VARCHAR(0)silently creates an invalid type andVARCHAR(-1)silently wraps toVARCHAR(65536)due tolong long → size_tconversioncreate_vertex_type,create_edge_type,add_vertex_property,add_edge_property) updated frompair<string, Value>totuple<DataType, string, Value>to propagatemax_lengththrough to storagepb_utils.ccrefactored fromDataTypeIdtoDataType;temporal_type_to_property_typewas partially updated (signature changed, body left usingDataTypeIdenums)normalize()introduced to stripmax_lengthvalues from comparison strings, avoiding brittle test failuresConfidence Score: 3/5
maxLen > 0guard allows zero and negative lengths to produce silent, incorrect results that could persist in the schema.parseStringTypedoes not validate that the parsed length is positive, soVARCHAR(0)andVARCHAR(-5)are silently accepted with incorrect behavior rather than returning a clear error. All other previously raised concerns have been addressed or acknowledged.src/compiler/common/types/types.cpp— theparseStringTypefunction needs amaxLen > 0guard before callingLogicalType::STRING(maxLen).Important Files Changed
StringTypeInfo,LogicalType::STRING(size_t)factory, andparseStringType. TheparseStringTypefunction fails to validate thatmax_length > 0, allowingVARCHAR(0)to create an invalid type and negative values to silently wrap-cast to the maximum limit (65536).StringTypeInfoclass and the twoSTRING()factory overloads. Design looks sound;serializeInternalis a no-op but this was acknowledged as intentional in a previous thread.DataTypeIdtoDataTypeto carrymax_length.string_type_to_property_typeis correctly updated, buttemporal_type_to_property_typestill assigns rawDataTypeIdenum values to the newDataType&output parameter — compiles via implicit conversion but is inconsistent with the rest of the PR's refactoring.max_lengthforSTRINGtypes. Logic correctly falls back togetDefaultStringMaxLen()when noextraTypeInfois present.max_lengthfromStringTypeInfoand fall back to the default gracefully. No issues found.VARCHARlexer token andVARCHAR(integer)production rule tonEUG_DataType. Grammar change is straightforward and correct.normalize()helper that stripsmax_lengthvalues before comparing plan/YAML output in tests, avoiding brittle string-equality failures. Approach is clean.property_def_tfrompair<string, Value>totuple<DataType, string, Value>so the explicit type (withmax_length) flows through to storage. Change is consistent and correct.Sequence Diagram
sequenceDiagram participant User participant Parser as Cypher Parser<br/>(Cypher.g4) participant Binder as LogicalType::convertFromString participant PST as parseStringType() participant STR as LogicalType::STRING(size_t) participant STI as StringTypeInfo User->>Parser: CREATE NODE TABLE n(id INT64 PRIMARY KEY, name VARCHAR(128)) Parser->>Binder: trimmedStr = "VARCHAR(128)" Binder->>PST: parseStringType("VARCHAR(128)") PST->>PST: strtoll → maxLen=128, validate endPtr Note over PST: ⚠️ No check for maxLen ≤ 0 PST->>STR: LogicalType::STRING(128) STR->>STR: clamp to maxLimit (65536) if needed STR->>STI: new StringTypeInfo(128) STI-->>STR: StringTypeInfo{max_length=128} STR-->>PST: LogicalType(STRING, StringTypeInfo{128}) PST-->>Binder: LogicalType(STRING, StringTypeInfo{128}) Binder-->>User: VARCHAR(128) type createdComments Outside Diff (2)
src/compiler/common/types/types.cpp, line 1567-1576 (link)Exceeding max length is silently clamped — consider throwing instead
When a caller passes a
max_lengthlarger thangetMaxStringMaxLen()(65536), the value is silently reduced with only aLOG(WARNING). A user who writesVARCHAR(100000)will receive a column withVARCHAR(65536)without any query-level error, which is surprising.Consider throwing a binder exception (consistent with how invalid lengths are handled in
parseStringType) rather than silently accepting invalid input:src/utils/pb_utils.cc, line 168-191 (link)Inconsistent refactoring —
DataTypeIdstill assigned toDataTypeoutputThe function signature was changed from
DataTypeId& out_typetoDataType& out_typeas part of this PR's refactoring, but the assignments inside theswitchstill use rawDataTypeIdenum values (e.g.,DataTypeId::kDate,DataTypeId::kTimestampMs,DataTypeId::kInterval) rather thanDataTypefactory methods.This compiles only because
DataTypehas an implicit constructor fromDataTypeId. In contrast,primitive_type_to_property_type— updated in the same PR — consistently usesDataType::INT32,DataType::UINT64, etc. The inconsistency makes it harder to reason about the code and may cause confusion for future maintainers.Consider updating the assignments to use the
DataTypefactory style, for example:Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Last reviewed commit: 535e074