fix: Fix incorrect edge table state when transforming between bundled and unbundled #28
zhanglei1949 wants to merge 1 commit into main
- void EdgeTable::dropAndCreateNewBundledCSR() {
+ void EdgeTable::dropAndCreateNewBundledCSR(
+     std::shared_ptr<ColumnBase> remaining_col) {
    CHECK(meta_->properties.size() == 1);
CHECK fails when called from DeleteProperties
dropAndCreateNewBundledCSR is now called from DeleteProperties (line 817), but at that call site meta_ has not yet been updated: it still reflects the pre-deletion property list. The surrounding production code in property_graph.cc (lines 601–602) intentionally calls edge_table.DeleteProperties before schema_.DeleteEdgeProperties, because the edge table's delete logic depends on the old schema state.
Concretely, if an unbundled edge table has 2 properties and one is deleted (leaving 1), meta_->properties.size() is still 2 when this function is entered from DeleteProperties, so the CHECK always fires. Even if CHECK is a no-op in release mode, create_csr at line 952 would then receive the first original type (which may not match the remaining column's type), causing wrong-type CSR construction or a LOG(FATAL) if that type is kVarchar.
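The ordering problem can be reproduced in isolation. The sketch below is a simplified stand-in, not the project's code: Meta, Table, TypeId, and the four helper functions are hypothetical names that only model a property list which is updated after the table has already dropped a column.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the schema metadata and the column table.
enum class TypeId { kInt32, kDouble, kVarchar };

struct Meta {
  std::vector<TypeId> properties;  // updated by the schema, after the table
};

struct Table {
  std::vector<TypeId> columns;  // updated first, by the delete itself
  void deleteColumn(std::size_t idx) {
    columns.erase(columns.begin() + static_cast<std::ptrdiff_t>(idx));
  }
};

// Mirrors the precondition the review flags: it inspects the stale metadata.
bool metaCheckPasses(const Meta& meta) { return meta.properties.size() == 1; }

// Alternative precondition based on the table, which is already up to date.
bool tableCheckPasses(const Table& table) { return table.columns.size() == 1; }

// Type that a metadata-based lookup would pass on to CSR construction.
TypeId typeFromMeta(const Meta& meta) { return meta.properties[0]; }

// Type of the column that actually remains in the table.
TypeId typeFromTable(const Table& table) { return table.columns[0]; }
```

With two properties (kInt32, kDouble) and the first one deleted from the table only, the metadata check fails while the table check passes, and the metadata lookup returns the type of the deleted column rather than the remaining one.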
A minimal fix is to derive the remaining property type from remaining_col->type() instead of relying on meta_->properties[0], and to replace the strict equality CHECK with an assertion against the table column count:
// instead of:
CHECK(meta_->properties.size() == 1);
new_out_csr = create_csr(meta_->oe_mutable, meta_->oe_strategy,
                         meta_->properties[0].id());
// ...
// consider:
// DataTypeId remaining_type = (remaining_col != nullptr)
//                                 ? remaining_col->type()
//                                 : meta_->properties[0].id();
// new_out_csr = create_csr(meta_->oe_mutable, meta_->oe_strategy, remaining_type);

if (table_->col_num() == 0) {
  dropAndCreateNewUnbundledCSR(true);
} else if (table_->col_num() == 1) {
  auto remaining_col = table_->get_column_by_id(0);
  dropAndCreateNewBundledCSR(remaining_col);
}
Incorrect transition for kVarchar remaining property
dropAndCreateNewBundledCSR is called whenever exactly one column remains after deletion, but EdgeSchema::is_bundled() (in schema.cc:241–252) explicitly returns false when the single property is kVarchar. A kVarchar edge table is always unbundled, so attempting to bundle it will hit LOG(FATAL) inside create_csr (which dispatches via FOR_EACH_DATA_TYPE_NO_STRING, deliberately excluding strings).
The guard should use the remaining column's actual type — or equivalently check whether the edge should truly become bundled — rather than unconditionally calling dropAndCreateNewBundledCSR for any single-column result:
if (table_->col_num() == 0) {
  dropAndCreateNewUnbundledCSR(true);
} else if (table_->col_num() == 1) {
  auto remaining_col = table_->get_column_by_id(0);
  // Only transition to bundled for non-varchar types
  if (remaining_col->type() != DataTypeId::kVarchar) {
    dropAndCreateNewBundledCSR(remaining_col);
  }
  // For kVarchar the table stays unbundled; no CSR rebuild needed
}

void batch_put_edges_with_edata(const std::vector<vid_t>& src_lid,
                                const std::vector<vid_t>& dst_lid,
                                DataTypeId property_type,
                                const std::vector<Property>& edge_data,
                                CsrBase* out_csr) {
  switch (property_type) {
#define TYPE_DISPATCHER(enum_val, type)                             \
  case DataTypeId::enum_val: {                                      \
    std::vector<type> typed_data;                                   \
    typed_data.reserve(edge_data.size());                           \
    for (const auto& prop : edge_data) {                            \
      typed_data.emplace_back(PropUtils<type>::to_typed(prop));     \
    }                                                               \
    dynamic_cast<TypedCsrBase<type>*>(out_csr)->batch_put_edges(    \
        src_lid, dst_lid, typed_data);                              \
    break;                                                          \
  }
    TYPE_DISPATCHER(kBoolean, bool);
    TYPE_DISPATCHER(kInt32, int32_t);
    TYPE_DISPATCHER(kUInt32, uint32_t);
    TYPE_DISPATCHER(kInt64, int64_t);
    TYPE_DISPATCHER(kUInt64, uint64_t);
    TYPE_DISPATCHER(kFloat, float);
    TYPE_DISPATCHER(kDouble, double);
    TYPE_DISPATCHER(kDate, Date);
    TYPE_DISPATCHER(kTimestampMs, DateTime);
    TYPE_DISPATCHER(kInterval, Interval);
#undef TYPE_DISPATCHER
  case DataTypeId::kEmpty: {
    dynamic_cast<TypedCsrBase<EmptyType>*>(out_csr)->batch_put_edges(
        src_lid, dst_lid, {});
    break;
  }
  default:
    LOG(FATAL) << "Unsupported edge property type "
               << static_cast<int>(property_type);
  }
kVarchar / string type not handled
batch_put_edges_with_edata uses a hand-rolled TYPE_DISPATCHER that mirrors FOR_EACH_DATA_TYPE_NO_STRING but does not include kVarchar. batch_put_edges_with_default_edata (the existing counterpart, line 137) handles the same set and documents this gap with a // TODO(zhanglei) comment.
While kVarchar is currently never stored in a bundled CSR (and dropAndCreateNewBundledCSR is the only caller), the default branch silently calls LOG(FATAL), making the failure mode hard to trace. At a minimum, the comment from batch_put_edges_with_default_edata should be replicated here, and ideally a THROW_NOT_SUPPORTED_EXCEPTION (consistent with the existing helper) could be used in place of LOG(FATAL) to give callers a chance to recover.
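The difference between aborting and throwing can be illustrated with a small standalone dispatch. This is a sketch under assumptions, not the project's helper: TypeId, bundledPayloadSize, and tryBundle are hypothetical, and std::invalid_argument stands in for THROW_NOT_SUPPORTED_EXCEPTION, whose definition is not shown in this review.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical, reduced set of property types.
enum class TypeId { kInt32, kInt64, kDouble, kVarchar };

// Returns the per-edge payload size for types that fit a fixed-width layout.
// Throws instead of aborting so the caller can fall back or report the error.
std::size_t bundledPayloadSize(TypeId type) {
  switch (type) {
    case TypeId::kInt32:
      return sizeof(std::int32_t);
    case TypeId::kInt64:
      return sizeof(std::int64_t);
    case TypeId::kDouble:
      return sizeof(double);
    case TypeId::kVarchar:
      // Documented gap: string data has no fixed-width representation here.
      throw std::invalid_argument("kVarchar cannot be stored in a bundled CSR");
  }
  throw std::invalid_argument("unknown edge property type");
}

// Caller-side recovery: report the failure instead of terminating the process.
bool tryBundle(TypeId type, std::string* error) {
  try {
    bundledPayloadSize(type);
    return true;
  } catch (const std::invalid_argument& e) {
    if (error != nullptr) {
      *error = e.what();
    }
    return false;
  }
}
```

A caller that receives false from tryBundle can keep the table in its current layout and surface the message, which is not possible once LOG(FATAL) has terminated the process.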
Use cibuildwheel to build wheels for all platforms and archs.
When deleting properties, an unbundled table could become bundled, and a bundled table could become unbundled.
When adding properties, a bundled table could become unbundled.
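These transitions can be written down as a small decision function. The sketch below is illustrative only: TypeId, isBundled, Transition, and classify are hypothetical names, and the rule "bundled means exactly one non-varchar property" is an assumption inferred from this page (a single kVarchar property is reported as unbundled, and an empty table is rebuilt through dropAndCreateNewUnbundledCSR). The real EdgeSchema::is_bundled() may differ, in particular for zero properties.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, reduced set of property types.
enum class TypeId { kInt32, kDouble, kVarchar };

// Assumed rule: bundled only when a single non-string property exists.
bool isBundled(const std::vector<TypeId>& props) {
  return props.size() == 1 && props[0] != TypeId::kVarchar;
}

enum class Transition { kNone, kToBundled, kToUnbundled };

// Compares the layout before and after a property change.
// This models only the layout flag, not whether a CSR rebuild is needed.
Transition classify(const std::vector<TypeId>& before,
                    const std::vector<TypeId>& after) {
  const bool was = isBundled(before);
  const bool now = isBundled(after);
  if (was == now) {
    return Transition::kNone;
  }
  return now ? Transition::kToBundled : Transition::kToUnbundled;
}
```

Under this assumed rule, deleting one of two properties yields kToBundled only when the remaining property is not kVarchar, while deleting the sole property of a bundled table or adding a second property yields kToUnbundled.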
Greptile Summary
This PR fixes incorrect EdgeTable state when transitioning between bundled and unbundled CSR storage by introducing a remaining_col parameter to dropAndCreateNewBundledCSR so existing edge data is preserved during the transition. It also adds missing table_idx_/capacity_ resets, a missing break in a switch, and a null-safety guard in AddEdge.

Key changes and issues found:
- dropAndCreateNewBundledCSR(remaining_col) correctly migrates real edge data into the new bundled CSR when a column is present; the nullptr path (used from AddProperties) fills with defaults, which is correct.
- CHECK(meta_->properties.size() == 1) inside dropAndCreateNewBundledCSR fires when called from DeleteProperties, because the production call site (property_graph.cc:601) intentionally updates the schema after calling DeleteProperties, leaving meta_->properties at its pre-deletion size.
- DeleteProperties unconditionally calls dropAndCreateNewBundledCSR when one column remains, but kVarchar properties are never bundled (EdgeSchema::is_bundled() returns false for a single kVarchar property). Calling create_csr with kVarchar hits LOG(FATAL) since FOR_EACH_DATA_TYPE_NO_STRING excludes string types.
- The batch_put_edges_with_edata helper also omits kVarchar, consistent with the existing TODO but undocumented.
- The new tests in test_edge_table.cc cover the described scenarios and are good additions, but TestDeletePropertiesTransitionFromUnbundledToBundled exercises the two logic bugs above and will fail.

Confidence Score: 2/5
- The CHECK(meta_->properties.size() == 1) assertion in dropAndCreateNewBundledCSR will fire whenever it is called from DeleteProperties because the schema is updated after the edge table, leaving meta_ stale.
- Additionally, the unconditional call to dropAndCreateNewBundledCSR for any single remaining column fails for kVarchar types, which create_csr explicitly does not support. Both paths are exercised by the new tests added in this PR.
- src/storages/graph/edge_table.cc, specifically dropAndCreateNewBundledCSR and the new DeleteProperties branch.

Important Files Changed
- CHECK(meta_->properties.size() == 1) fires when called from DeleteProperties (meta not yet updated), and kVarchar remaining columns incorrectly route to dropAndCreateNewBundledCSR, which doesn't support string types.
- dropAndCreateNewBundledCSR signature accepting a shared_ptr<ColumnBase> parameter.
- TestDeletePropertiesTransitionFromUnbundledToBundled test will fail at runtime because of the two logic bugs in the production code it exercises.

Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[DeleteProperties called] --> B{meta_->is_bundled?}
    B -- yes --> C{property in col_names?}
    C -- yes --> D[dropAndCreateNewUnbundledCSR true]
    C -- no --> E[return]
    B -- no --> F[table_->delete_column for each col]
    F --> G{table_->col_num after deletion}
    G -- 0 --> H[dropAndCreateNewUnbundledCSR true\nresets table_idx_ and capacity_]
    G -- 1 --> I[remaining_col = get_column_by_id 0]
    I --> J[dropAndCreateNewBundledCSR remaining_col]
    J --> K{CHECK meta properties.size == 1\n⚠️ FAILS if meta not yet updated}
    K -- pass --> L{remaining_col != nullptr?}
    L -- yes --> M[batch_export row_id_col\nbuild remaining_data vector]
    M --> N[batch_put_edges_with_edata\n⚠️ FATAL if kVarchar]
    L -- no --> O[batch_put_edges_with_default_edata]
    N --> P[drop table, reset table_idx_ / capacity_\nswap new CSRs]
    O --> P
    G -- other --> Q[no CSR rebuild needed]

Last reviewed commit: eed8593