-
Notifications
You must be signed in to change notification settings - Fork 0
Enable configurable cross-project parsing #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable configurable cross-project parsing #1
Conversation
WalkthroughThe changes introduce project scoping, color coding, and source URL generation for graph nodes in the parsing and graph creation process. Configuration values, including Neo4j credentials and project metadata, are now sourced from environment variables. The parser and node handlers are updated to include project, color, and URL properties, with a root node established as the container for all nodes. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Main
participant Parser
participant Neo4j
User->>Main: Set environment variables (NEO4J_URI, NEO4J_USER, etc.)
Main->>Main: Validate env variables
Main->>Neo4j: Initialize driver with credentials
Main->>Parser: Create TreeSitterParser(driver, rootName, baseURL, projectRoot)
Main->>Parser: AnalyzeDirectory(dirPath)
Parser->>Neo4j: Cleanup project nodes (scoped)
Parser->>Neo4j: Create/Merge Root node (with project, color)
loop For each file
Parser->>Parser: New parser instance (with filePath)
Parser->>Neo4j: Create nodes (Function, Namespace, Type) with project, color, url
Parser->>Neo4j: Link nodes to Root with CONTAINS
end
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🔇 Additional comments (10)
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @TheApeMachine, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello team, gemini-code-assist here to provide a summary of this pull request. The primary goal of this PR, as indicated by the title and description, is to enable the code parsing process to handle multiple projects independently. This is achieved by making the parser configuration external (via environment variables), enriching the graph nodes with project-specific information, creating a root node for each project in the Neo4j database, and modifying the cleanup process to only delete nodes belonging to the current project rather than wiping the entire database. This lays the groundwork for parsing and visualizing code from different projects within the same graph database instance.
Highlights
- Configurable Parsing: The application now reads Neo4j connection details, project name (
ROOT_NAME), and base URL (BASE_URL) from environment variables, making the parsing process configurable for different projects without code changes. - Project-Specific Data: Graph nodes (Function, Namespace, Type) are updated to include a
projectproperty, linking them explicitly to the project they belong to. A dedicatedRootnode is created for each project to serve as an entry point. - Project-Specific Cleanup: The database cleanup logic is modified to delete only the nodes associated with the current project (identified by the
projectproperty), preventing accidental deletion of data from other projects. - Source Code URL Generation: A new method
createURLis added to generate links to the specific lines of source code for nodes, using the configuredBASE_URLand file path.
Changelog
Click here to see the changelog
- graphlang/graphlang.go
- Added
NodeColorsmap for potential visualization hints. - Added fields (
filePath,rootName,baseURL,projectRoot,rootID) toTreeSitterParserstruct to hold project-specific configuration. - Updated
NewTreeSitterParserto acceptrootName,baseURL, andprojectRoot. - Added
createURLmethod to generate source code links. - Modified
cleanupfunction to read Neo4j credentials from environment variables and added validation. - Changed the Neo4j query in
cleanupto delete nodes only for the current project (MATCH (n {project: $project})). - Added logic in
AnalyzeDirectorytoMERGE(create or update) aRootnode for the project. - Passed project-specific configuration (
filePath,rootName,baseURL,projectRoot,rootID) when creatingfileParserinstances inprocessFile. - Updated
handleFunction,handleNamespace, andhandleTypeto use the project'srootNamein node IDs, addproject,color, andurlproperties, and link the created nodes to the project'sRootnode.
- Added
- main.go
- Imported the
ospackage. - Modified
mainfunction to read Neo4j connection details,ROOT_NAME, andBASE_URLfrom environment variables. - Added default values for
rootNameandbaseURLif not provided. - Added checks to ensure required Neo4j environment variables are set.
- Removed the explicit Neo4j session creation in
main(now handled internally by the parser). - Updated the call to
graphlang.NewTreeSitterParserto pass the read configuration (driver,rootName,baseURL,dirPath).
- Imported the
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Code flows like a stream,
Parsed into a graph dream.
Nodes connect and grow,
Where the functions go.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request effectively implements configurable cross-project parsing. Key changes include sourcing configuration from environment variables, enriching graph nodes with project-specific data (including URLs), creating dedicated root nodes for each project, and scoping database cleanup operations to the current project. These are excellent enhancements for managing multiple codebases.
I've noted a few areas for improvement, mainly around error handling in database transactions and configuration flexibility. Addressing these will enhance the robustness of the new functionality.
Summary of Findings
- Error Handling for Node Linking: In
handleFunction,handleNamespace, andhandleType, errors fromtx.Runwhen linking these nodes to the project's Root node are currently ignored. This could lead to silent failures and graph inconsistencies. These errors should be checked and handled, potentially by logging and returning the error to ensure transaction rollback. - Neo4j Session Management: In
AnalyzeDirectory, the Neo4j session for creating the root node is not closed usingdefer. If an error occurs duringsession.Run, the explicitsession.Close()call might be skipped, leading to a potential resource leak. - Configuration Flexibility: In
main.go, the directory path for analysis (dirPath) is hardcoded. Making this configurable via an environment variable would improve the tool's flexibility.
Merge Readiness
The pull request introduces valuable functionality for cross-project parsing. However, there are several medium-severity issues related to error handling, and configuration flexibility that should be addressed to ensure the robustness and maintainability of the code. I recommend that these changes be made before merging. As an AI, I am not authorized to approve pull requests; please ensure further review and approval from authorized team members.
| _, _ = tx.Run(context.Background(), | ||
| `MATCH (r:Root {id:$rid}), (f:Function {id:$fid}) MERGE (r)-[:CONTAINS]->(f)`, | ||
| map[string]any{"rid": parser.rootID, "fid": fmt.Sprintf("%s:%s", parser.rootName, qualifiedName)}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error returned by tx.Run when attempting to merge the CONTAINS relationship between the Root node and the Function node is currently ignored (using _, _ = ...). If this operation fails, the function node might not be correctly linked to its project root, which could lead to inconsistencies or issues in graph traversal later. Should this error be checked and potentially logged or even propagated to ensure the transaction rolls back if the linking fails?
_, linkErr := tx.Run(context.Background(),
`MATCH (r:Root {id:$rid}), (f:Function {id:$fid}) MERGE (r)-[:CONTAINS]->(f)`,
map[string]any{"rid": parser.rootID, "fid": fmt.Sprintf("%s:%s", parser.rootName, qualifiedName)})
if linkErr != nil {
log.Printf("Error linking function '%s' to root '%s': %v", qualifiedName, parser.rootID, linkErr)
// Consider returning linkErr to ensure transaction rollback if this link is critical
return linkErr
}| _, _ = tx.Run(context.Background(), | ||
| `MATCH (r:Root {id:$rid}), (n:Namespace {id:$nid}) MERGE (r)-[:CONTAINS]->(n)`, | ||
| map[string]any{"rid": parser.rootID, "nid": fmt.Sprintf("%s:%s", parser.rootName, namespaceName)}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the handleFunction case, the error from tx.Run when linking the Namespace node to the Root node is ignored. This could lead to a Namespace node existing without being properly connected to its project's Root. Would it be better to check this error and handle it, perhaps by logging and/or returning it to ensure transactional integrity?
_, linkErr := tx.Run(context.Background(),
`MATCH (r:Root {id:$rid}), (n:Namespace {id:$nid}) MERGE (r)-[:CONTAINS]->(n)`,
map[string]any{"rid": parser.rootID, "nid": fmt.Sprintf("%s:%s", parser.rootName, namespaceName)})
if linkErr != nil {
log.Printf("Error linking namespace '%s' to root '%s': %v", namespaceName, parser.rootID, linkErr)
// Consider returning linkErr
return linkErr
}| _, _ = tx.Run(context.Background(), | ||
| `MATCH (r:Root {id:$rid}), (t:Type {id:$tid}) MERGE (r)-[:CONTAINS]->(t)`, | ||
| map[string]any{"rid": parser.rootID, "tid": fmt.Sprintf("%s:%s", parser.rootName, qualifiedName)}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error from tx.Run when linking the Type node to the Root node is also ignored here. Consistent with the feedback for handleFunction and handleNamespace, it's advisable to check this error. If the link fails, it could impact graph consistency. Should this error be logged and possibly returned?
_, linkErr := tx.Run(context.Background(),
`MATCH (r:Root {id:$rid}), (t:Type {id:$tid}) MERGE (r)-[:CONTAINS]->(t)`,
map[string]any{"rid": parser.rootID, "tid": fmt.Sprintf("%s:%s", parser.rootName, qualifiedName)})
if linkErr != nil {
log.Printf("Error linking type '%s' to root '%s': %v", qualifiedName, parser.rootID, linkErr)
// Consider returning linkErr
return linkErr
}| defer session.Close(context.Background()) | ||
|
|
||
| dirPath := "/app" // Replace with the actual directory path | ||
| dirPath := "/app" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dirPath is hardcoded to "/app". While this might be suitable for a specific Docker deployment, making it configurable (e.g., via an environment variable like ANALYSIS_PATH) would offer more flexibility for running the analyzer in different environments or against different source directories without rebuilding the container or changing the code. Consider adding an environment variable for this path with "/app" as a default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🔭 Outside diff range comments (2)
main.go (1)
29-31:⚠️ Potential issueAvoid logging secrets – redact the Neo4j password before printing
log.Printfcurrently prints the plain-text password, which ends up in stdout / logs and can be harvested from container logs or CI pipelines.
Replace it with a masked or omitted value.-log.Printf("Connecting to Neo4j at %s with user %s and password %s\n", neo4jURI, neo4jUser, neo4jPassword) +log.Printf("Connecting to Neo4j at %s with user %s\n", neo4jURI, neo4jUser)graphlang/graphlang.go (1)
289-304:⚠️ Potential issueReuse the existing Neo4j driver instead of creating a second one & stop logging the password
cleanup()ignoresparser.driverand spins up a new driver with credentials
read again from the environment. This doubles the number of TCP connections,
duplicates TLS handshakes, and risks mismatching credentials at runtime. It
also logs the password in clear text.-func (parser *TreeSitterParser) cleanup() error { - neo4jURI := os.Getenv("NEO4J_URI") - neo4jUser := os.Getenv("NEO4J_USER") - neo4jPassword := os.Getenv("NEO4J_PASSWORD") - ... - driver, err := neo4j.NewDriverWithContext(neo4jURI, neo4j.BasicAuth(neo4jUser, neo4jPassword, "")) - ... - defer driver.Close(context.Background()) +func (parser *TreeSitterParser) cleanup() error { + driver := parser.driver // reuse the already-configured, shared driverand change the logging line to:
-log.Printf("Connecting to Neo4j at %s with user %s and password %s\n", neo4jURI, neo4jUser, neo4jPassword) +log.Printf("Running cleanup against project %s", parser.rootName)Finally, remove the extra
driver.Closebecause the owner (main.go) already
defers a single close.
🧹 Nitpick comments (3)
main.go (2)
36-38: Make the analysed directory configurable instead of hard-coding “/app”Hard-coding the directory path limits portability (e.g. running locally on macOS/Windows).
Expose it as an env-var or CLI flag so the same binary can analyse any checkout.-dirPath := "/app" +dirPath := os.Getenv("PROJECT_ROOT") +if dirPath == "" { + dirPath, _ = os.Getwd() // fall back to current working dir +}
30-34: Verify connectivity early and fail fastAfter constructing the driver, call
driver.VerifyConnectivity(ctx)so that
network / auth problems surface immediately instead of several minutes into the
scan.driver, err := neo4j.NewDriverWithContext(neo4jURI, neo4j.BasicAuth(neo4jUser, neo4jPassword, "")) if err != nil { log.Fatalf("Failed to create Neo4j driver: %v", err) } +if err := driver.VerifyConnectivity(context.Background()); err != nil { + log.Fatalf("Neo4j unreachable: %v", err) +}graphlang/graphlang.go (1)
373-381: Capturesession.Runerror before closing the session
session.Closemay overwriteerr; capture the error first to avoid losing it.-session := parser.driver.NewSession(...) -_, err := session.Run(...) /* close afterwards */ -session.Close(context.Background()) +session := parser.driver.NewSession(...) +result, runErr := session.Run(...) +closeErr := session.Close(context.Background()) +if runErr != nil { + return fmt.Errorf("failed to create root node: %w", runErr) +} +if closeErr != nil { + return fmt.Errorf("failed to close session: %w", closeErr) +}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting
📒 Files selected for processing (2)
graphlang/graphlang.go(10 hunks)main.go(2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
main.go (1)
graphlang/graphlang.go (1)
NewTreeSitterParser(93-102)
graphlang/graphlang.go (2)
go/main.go (1)
NodeColors(20-28)js/analyzer.js (1)
NodeColors(7-15)
| func (parser *TreeSitterParser) createURL(line int) string { | ||
| rel := strings.TrimPrefix(parser.filePath, parser.projectRoot) | ||
| base := parser.baseURL | ||
| if !strings.HasSuffix(base, "/") { | ||
| base += "/" | ||
| } | ||
| return fmt.Sprintf("%s%s#%d", base, strings.TrimPrefix(rel, "/"), line) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
createURL breaks on Windows paths and needs URL-escaping
strings.TrimPrefix(parser.filePath, parser.projectRoot) leaves back-slashes on
Windows and doesn’t encode #, spaces, etc. Use filepath.ToSlash,
path.Join, and url.PathEscape (from net/url) to generate a portable,
clickable link.
import (
"context"
@@
+ "net/url"
"fmt"
@@
-func (parser *TreeSitterParser) createURL(line int) string {
- rel := strings.TrimPrefix(parser.filePath, parser.projectRoot)
- base := parser.baseURL
- if !strings.HasSuffix(base, "/") {
- base += "/"
- }
- return fmt.Sprintf("%s%s#%d", base, strings.TrimPrefix(rel, "/"), line)
+func (parser *TreeSitterParser) createURL(line int) string {
+ rel, _ := filepath.Rel(parser.projectRoot, parser.filePath)
+ rel = filepath.ToSlash(rel)
+ escaped := url.PathEscape(rel)
+ return fmt.Sprintf("%s/%s#L%d", strings.TrimRight(parser.baseURL, "/"), escaped, line)
}Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In graphlang/graphlang.go around lines 105 to 112, the createURL function does
not handle Windows paths correctly and fails to URL-escape special characters.
To fix this, convert the relative file path to use forward slashes with
filepath.ToSlash, join the base URL and relative path using path.Join for proper
URL path construction, and apply url.PathEscape to the path and line fragment to
ensure all special characters are correctly escaped, producing a portable and
clickable URL.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…ose` Docstrings generation was requested by @TheApeMachine. * #1 (comment) The following files were modified: * `graphlang/graphlang.go` * `main.go`
|
Note Generated docstrings for this pull request at #2 |
Summary
main.goto read config from environment variablesgraphlangparser with project info and URL generationTesting
go vet ./...(fails: proxyconnect tcp dial tcp 172.25.0.3:8080: connect: no route to host)Summary by CodeRabbit