Specific improvements#33

Merged
offx-zinth merged 1 commit into main from master
Apr 25, 2026

@offx-zinth
Owner

No description provided.

@offx-zinth offx-zinth merged commit 28cd564 into main Apr 25, 2026
1 check failed

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements the core infrastructure for the SMP Graph Engine, an ingest-free, memory-mapped graph database for code analysis. The changes include low-level file management with a Write-Ahead Log (WAL), indexing structures (Crit-bit and Radix trees), a deduplicated string pool, and a tree-sitter-based parsing engine for Python. While the architectural foundation is solid, the current implementation contains several critical bugs: serialized node and edge data are packed but never actually written to the memory-mapped file, and the indexing logic uses dummy pointers instead of real offsets. Additionally, there are logic errors regarding tree-sitter sibling traversal for decorators, case-sensitivity issues with NodeType mapping, and unsafe handling of header CRC mismatches.

graph_nodes.append(
    GraphNode(
        id=pnode.node_id,
        type=NodeType(pnode.type.upper()),

critical

The NodeType enum in smp.core.models uses TitleCase values (e.g., "Function", "Class"). Calling .upper() on pnode.type (which is already "Function" or "Class") will result in "FUNCTION" or "CLASS", causing NodeType() to raise a ValueError because it is a StrEnum with case-sensitive matching.

Suggested change
-        type=NodeType(pnode.type.upper()),
+        type=NodeType(pnode.type),
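
The case-sensitivity problem can be reproduced in isolation. The following is a minimal sketch, assuming an enum with the TitleCase values described above; this `NodeType` is a hypothetical stand-in built on `str, Enum` (which looks up values the same way as a `StrEnum`), not the real class from smp.core.models, and the helper names are made up for illustration.

```python
from enum import Enum


class NodeType(str, Enum):
    """Hypothetical stand-in for smp.core.models.NodeType (TitleCase values)."""

    FUNCTION = "Function"
    CLASS = "Class"


def to_node_type(raw: str) -> NodeType:
    """Look the value up as-is; enum value lookup is case-sensitive."""
    return NodeType(raw)


def upper_lookup_fails(raw: str) -> bool:
    """Return True if upper-casing the value makes the lookup raise ValueError."""
    try:
        NodeType(raw.upper())
    except ValueError:
        return True
    return False
```

With these definitions, `to_node_type("Function")` resolves to `NodeType.FUNCTION`, while `NodeType("FUNCTION")` raises because no member has that value.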

Comment thread smp/store/graph/parser.py
def _extract_decorators(self, node: Any, content: bytes) -> list[str]:
    """Extract decorators."""
    decorators: list[str] = []
    for child in node.prev_sibling:

critical

The node.prev_sibling property in tree-sitter returns a single Node object or None; it is not an iterable. Attempting to iterate over it will raise a TypeError. You should use a while loop to traverse previous siblings if you are looking for multiple decorators.

Suggested change
-    for child in node.prev_sibling:
+    curr = node.prev_sibling
+    while curr and curr.type == "decorator":
+        text = content[curr.start_byte : curr.end_byte].decode("utf-8", errors="replace")
+        decorators.append(text.strip())
+        curr = curr.prev_sibling
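
The sibling walk can be exercised without a parser. This is a sketch under stated assumptions: `FakeNode` is a hypothetical stand-in that exposes only the attributes used here (`type`, `start_byte`, `end_byte`, `prev_sibling`), and the final `reverse()` is an addition not present in the suggestion, included because walking backwards collects the decorators nearest-first.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeNode:
    """Hypothetical stand-in exposing the node attributes used by the walk."""

    type: str
    start_byte: int
    end_byte: int
    prev_sibling: Optional["FakeNode"] = None


def extract_decorators(node: FakeNode, content: bytes) -> list[str]:
    """Collect consecutive decorator siblings that precede `node`."""
    decorators: list[str] = []
    curr = node.prev_sibling
    while curr is not None and curr.type == "decorator":
        text = content[curr.start_byte : curr.end_byte].decode("utf-8", errors="replace")
        decorators.append(text.strip())
        curr = curr.prev_sibling
    decorators.reverse()  # restore source order (assumption: callers want top-down)
    return decorators


source = b"@staticmethod\n@cache\ndef f(): pass\n"
first = FakeNode("decorator", 0, 13)
second = FakeNode("decorator", 14, 20, prev_sibling=first)
func = FakeNode("function_definition", 21, 34, prev_sibling=second)
```

Here `extract_decorators(func, source)` yields the two decorator strings in source order, and a node with no previous sibling yields an empty list.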

Comment on lines +148 to +149
if actual_crc != stored_crc:
    pass

high

Ignoring a CRC mismatch in the file header is dangerous for a database as it indicates potential data corruption. This should raise an exception to prevent further operations on a corrupted file.

Suggested change
-if actual_crc != stored_crc:
-    pass
+if actual_crc != stored_crc:
+    raise ValueError("Header CRC mismatch: file may be corrupted")
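
For illustration, a fail-fast header check might look like the sketch below. The header layout (4-byte magic, uint32 version, uint32 CRC32 over the first 8 bytes), the `MAGIC` value, and the function names are all assumptions made for this example; the actual header format in this PR may differ.

```python
import struct
import zlib

# Hypothetical layout: 4-byte magic, uint32 version, then a uint32 CRC32
# computed over the first 8 bytes. Not taken from the PR.
MAGIC = b"SMPG"
BODY_FMT = "<4sI"
BODY_SIZE = struct.calcsize(BODY_FMT)  # 8 bytes


def build_header(version: int) -> bytes:
    """Pack the header body and append its CRC32."""
    body = struct.pack(BODY_FMT, MAGIC, version)
    return body + struct.pack("<I", zlib.crc32(body))


def read_header(raw: bytes) -> int:
    """Validate the stored CRC before trusting any header field."""
    body = raw[:BODY_SIZE]
    (stored_crc,) = struct.unpack_from("<I", raw, BODY_SIZE)
    actual_crc = zlib.crc32(body)
    if actual_crc != stored_crc:
        raise ValueError("Header CRC mismatch: file may be corrupted")
    magic, version = struct.unpack(BODY_FMT, body)
    if magic != MAGIC:
        raise ValueError("Unexpected magic bytes in header")
    return version
```

Raising here stops the caller before any offsets from a possibly corrupted header are used to read the rest of the file.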

Comment on lines +23 to +33
struct.pack(
    "<BIII III I",
    1,
    name_off,
    sig_off,
    file_off,
    node.structural.start_line,
    node.structural.end_line or 0,
    0,
    0,
)

high

The result of struct.pack is being discarded. This method is intended to serialize the node data into the memory-mapped file, but it currently does nothing with the packed bytes.
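
One way to make the packed record reach the file is to pack directly into the mapping. The sketch below assumes an anonymous `mmap` and a caller-supplied offset; `write_node` and its parameters are hypothetical names, and the format string is the one from the snippet above with the ignored whitespace removed.

```python
import mmap
import struct
from typing import Optional

NODE_FMT = "<BIIIIIII"  # same fields as "<BIII III I"; struct ignores the spaces
NODE_SIZE = struct.calcsize(NODE_FMT)  # 1 + 7 * 4 = 29 bytes, no padding with "<"


def write_node(
    mm: mmap.mmap,
    offset: int,
    name_off: int,
    sig_off: int,
    file_off: int,
    start_line: int,
    end_line: Optional[int],
) -> int:
    """Pack one node record into the mapping at `offset`; return the next free offset."""
    struct.pack_into(
        NODE_FMT, mm, offset,
        1, name_off, sig_off, file_off, start_line, end_line or 0, 0, 0,
    )
    return offset + NODE_SIZE
```

Returning the next free offset lets the caller keep the record's starting offset for indexing and chain further writes.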

Comment on lines +19 to +21
payload = struct.pack("<I", count)
for target_off, etype in targets:
    payload += struct.pack("<II", target_off, etype)

high

The payload containing the edge data is constructed but never written to the underlying mmap file. This results in edges not being persisted.
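
A possible shape for persisting the edge list, sketched with an anonymous `mmap`: the payload keeps the layout from the snippet (a uint32 count followed by `(target_off, etype)` pairs), while `write_edges`, `read_edges`, and the bounds check are assumptions added for the example.

```python
import mmap
import struct


def write_edges(mm: mmap.mmap, offset: int, targets: list) -> int:
    """Serialize the edge list and copy it into the mapping; return the end offset."""
    parts = [struct.pack("<I", len(targets))]
    parts.extend(struct.pack("<II", target_off, etype) for target_off, etype in targets)
    payload = b"".join(parts)
    end = offset + len(payload)
    if end > len(mm):
        raise ValueError("edge payload does not fit in the mapped region")
    mm[offset:end] = payload  # this assignment is the step the review found missing
    return end


def read_edges(mm: mmap.mmap, offset: int) -> list:
    """Read back the count, then each (target_off, etype) pair."""
    (count,) = struct.unpack_from("<I", mm, offset)
    return [struct.unpack_from("<II", mm, offset + 4 + 8 * i) for i in range(count)]
```

Reading the edges back from the same offset is a cheap round-trip check that the data actually landed in the mapping.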

Comment on lines +97 to +99
self.index.insert(node.id, 0)
if self.radix:
    self.radix.insert(node.file_path, 0)

high

Using a dummy pointer of 0 for index insertions makes the Crit-bit and Radix indices non-functional for data retrieval from the mmap file. The actual offset returned by the NodeStore should be used here.
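
The intended data flow, where the store returns an offset and the indices record it, can be sketched as follows. `DictIndex` and this `NodeStore` are hypothetical dictionary and bytearray stand-ins for the Crit-bit/Radix indices and the mmap-backed store; only the wiring between them is the point.

```python
from typing import Optional


class DictIndex:
    """Hypothetical stand-in for a Crit-bit or Radix index: key -> offset."""

    def __init__(self) -> None:
        self._map: dict = {}

    def insert(self, key: str, offset: int) -> None:
        self._map[key] = offset

    def lookup(self, key: str) -> Optional[int]:
        return self._map.get(key)


class NodeStore:
    """Hypothetical append-only store that reports where each record was written."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def append(self, record: bytes) -> int:
        offset = len(self._buf)
        self._buf += record
        return offset

    def read(self, offset: int, size: int) -> bytes:
        return bytes(self._buf[offset : offset + size])


def add_node(store, index, radix, node_id: str, file_path: str, record: bytes) -> int:
    """Write the record first, then index it under its real offset."""
    offset = store.append(record)
    index.insert(node_id, offset)
    if radix is not None:  # explicit None check, so an empty index is still used
        radix.insert(file_path, offset)
    return offset
```

With real offsets in the index, a lookup by node id leads straight to the bytes that were written for that node.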

Comment thread smp/store/graph/parser.py
Comment on lines +321 to +323
if name and name[0].islower():
    # Likely a module name, not function
    continue

medium

This heuristic incorrectly skips module-level imports in Python, as most module names (e.g., os, sys, requests) start with a lowercase letter. This will cause the engine to miss a significant number of valid IMPORTS relationships.

Suggested change
-if name and name[0].islower():
-    # Likely a module name, not function
-    continue
+if not name:
+    continue
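
The effect of the two filters can be compared on a small list of names. Both helpers below are hypothetical reductions of the loop body to a list comprehension, written only to show which names survive each check.

```python
def old_filter(names: list) -> list:
    """Original heuristic: drop any name that starts with a lowercase letter."""
    return [n for n in names if not (n and n[0].islower())]


def new_filter(names: list) -> list:
    """Suggested check: drop only empty names."""
    return [n for n in names if n]


imported = ["os", "sys", "requests", "Path", ""]
```

On this input the original heuristic keeps only `Path` and the empty string, while the suggested check keeps every non-empty name, including the lowercase module names.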
