Improved CRD discovery to automatically scan repositories#886
Improved CRD discovery to automatically scan repositories#886CWAbhi wants to merge 8 commits intomeshery:masterfrom
Conversation
|
Yay, your first pull request! 👍 A contributor will be by to give feedback soon. In the meantime, you can find updates in the #github-notifications channel in the community Slack. |
Summary of ChangesHello @CWAbhi, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the CRD discovery mechanism by introducing automatic repository scanning for Custom Resource Definitions. Instead of requiring precise file paths, users can now provide a GitHub repository URL, and the system will intelligently locate CRDs, improving usability and making the process more resilient to repository structure changes. This change streamlines configuration and ensures long-term maintainability of the model generator. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
Signed-off-by: Abhijeet Dey <dey20405@gmail.com>
|
@leecalcote can you please check the PR and review it |
There was a problem hiding this comment.
Code Review
This pull request enhances the GitHub generator to automatically discover Custom Resource Definitions (CRDs) within repositories by refactoring URL parsing to support various GitHub URL formats and introducing a crdAwareFileInterceptor for recursive traversal and identification of CRD-containing YAML/JSON files. Review comments indicate that the CRD detection logic, particularly the string-based checks for kind: CustomResourceDefinition and the multi-document YAML splitting, is fragile and complex, suggesting improvements like using regular expressions for robustness. Additionally, a minor refactoring is recommended to remove an unused scheme parameter from the isGitHubURL helper function.
generators/github/git_repo.go
Outdated
| if strings.Contains(doc, "kind: CustomResourceDefinition") { | ||
| hasCRD = true | ||
| break | ||
| } | ||
| // Check for JSON format | ||
| if strings.Contains(doc, "\"kind\":\"CustomResourceDefinition\"") || | ||
| strings.Contains(doc, `"kind":"CustomResourceDefinition"`) { | ||
| hasCRD = true | ||
| break | ||
| } |
There was a problem hiding this comment.
The current string-based checks for kind: CustomResourceDefinition are fragile and may fail with minor formatting variations, such as extra whitespace. Additionally, the check for JSON on lines 226-227 contains a redundant condition, as both sides of the || operator are identical.
To improve robustness, consider using regular expressions to handle whitespace variations. For example:
// Note: This requires importing the `regexp` package.
// For YAML
if match, _ := regexp.MatchString(`kind:\s*CustomResourceDefinition`, doc); match {
hasCRD = true
break
}
// For JSON
if match, _ := regexp.MatchString(`"kind"\s*:\s*"CustomResourceDefinition"`, doc); match {
hasCRD = true
break
}Alternatively, a full unmarshal would be even more robust, though potentially slower.
| documents := strings.Split(content, "\n---\n") | ||
| // Also handle documents separated by "---" at the start of a line | ||
| if len(documents) == 1 { | ||
| // Try splitting by lines starting with "---" | ||
| lines := strings.Split(content, "\n") | ||
| var docs []string | ||
| var currentDoc strings.Builder | ||
| for _, line := range lines { | ||
| if strings.TrimSpace(line) == "---" && currentDoc.Len() > 0 { | ||
| docs = append(docs, currentDoc.String()) | ||
| currentDoc.Reset() | ||
| } else { | ||
| if currentDoc.Len() > 0 { | ||
| currentDoc.WriteString("\n") | ||
| } | ||
| currentDoc.WriteString(line) | ||
| } | ||
| } | ||
| if currentDoc.Len() > 0 { | ||
| docs = append(docs, currentDoc.String()) | ||
| } | ||
| if len(docs) > 1 { | ||
| documents = docs | ||
| } | ||
| } |
There was a problem hiding this comment.
The logic for splitting multi-document YAML files is complex and has a few issues:
- The initial split
strings.Split(content, "\n---\n")on line 187 is not robust for all multi-document YAML files. - In the fallback logic, the condition
&& currentDoc.Len() > 0on line 195 can cause the---separator at the beginning of a file to be incorrectly included in the first document. - The
elseblock on line 198 will append the---line to the next document if theifcondition on line 195 is false. - On line 208,
if len(docs) > 1prevents processing files with a single document that might be discovered by this splitting logic.
Consider simplifying this entire block with a more robust method, such as using a regular expression to split documents, which would be cleaner and less error-prone.
|
|
||
| // isGitHubURL checks if the URL is a GitHub repository URL | ||
| // This enables automatic CRD discovery for standard GitHub URLs | ||
| func isGitHubURL(scheme string, url *url.URL) bool { |
There was a problem hiding this comment.
|
This is a good item to add to the weekly Meshery Dev meeting agenda. You can add this item in the doc, attend, and present it. Meeting details at https://meshery.io/calendar. |
|
@leecalcote Sounds good . I’ll add this to the agenda and join the meeting to present it. Thanks for the direction! |
Signed-off-by: Abhijeet Dey <dey20405@gmail.com>
|
@leecalcote i have fixed the version issue kindly review it. |
| if doc == "" { | ||
| continue | ||
| } | ||
| // Check for YAML format |
There was a problem hiding this comment.
Use libraries, not string parsing.
generators/github/git_repo.go
Outdated
| repo = parts[1] | ||
|
|
||
| // Remove .git suffix from repo name if present | ||
| repo = strings.TrimSuffix(repo, ".git") |
There was a problem hiding this comment.
Let's find a library that offers this functionality.
There was a problem hiding this comment.
sure i will look for a library that offers this and update it
There was a problem hiding this comment.
Hi @CWAbhi you can check this library : https://github.com/git-download-manager/git-url-parse
generators/github/git_repo.go
Outdated
| if strings.Contains(doc, "kind: CustomResourceDefinition") { | ||
| hasCRD = true | ||
| break | ||
| } | ||
| // Check for JSON format | ||
| if strings.Contains(doc, "\"kind\":\"CustomResourceDefinition\"") || | ||
| strings.Contains(doc, `"kind":"CustomResourceDefinition"`) { | ||
| hasCRD = true | ||
| break | ||
| } |
|
|
||
| // isGitHubURL checks if the URL is a GitHub repository URL | ||
| // This enables automatic CRD discovery for standard GitHub URLs | ||
| func isGitHubURL(scheme string, url *url.URL) bool { |
There was a problem hiding this comment.
Using a library instead of manual parsing would be more robust.
There was a problem hiding this comment.
sure i will change from manual parsing of this to a library
| linters-settings: | ||
| staticcheck: | ||
| go: "1.25" | ||
| checks: ["all", "-ST1005"] | ||
|
|
||
| issues: | ||
| exclude: | ||
| - "ST1005" No newline at end of file |
There was a problem hiding this comment.
Please consider reverting these changes since they are not focused on the specific issue and also they have been taken care in another PR.
There was a problem hiding this comment.
sure i ll revert the latest commit
Signed-off-by: Abhijeet Dey <dey20405@gmail.com>
|
@CWAbhi have you addressed the changes requested by gemini? And also I cannot see any usage of libraries as requested. |
|
@parthivsaikia i have not addressed the changes requested by gemini . but i have used the libraries as requested i have used different library but ill check it once again today and address the changes requested by gemini too and commit the changes. |
Signed-off-by: Abhijeet Dey <dey20405@gmail.com>
|
@parthivsaikia can you check it once i have fixed the gemini and uses git parser library. |
|
@leecalcote @parthivsaikia any updates on this? |
Description
This PR fixes #881
Notes for Reviewers
This PR enhances the CRD discovery mechanism by allowing users to specify a repository URL as the source instead of a static directory path. The generator will automatically discover CRDs within the repository, regardless of their location in the file structure.
Implementation Details
Introduces a more robust CRD discovery strategy inspired by the logic used in crdsdev/doc.
Replaces static directory assumptions with recursive repository scanning and pattern-based CRD detection.
Improves resilience against repository structure changes without impacting existing functionality.
Benefits
Reduces configuration complexity for users.
Makes CRD discovery resilient to repository refactors.
Improves usability and long-term maintainability of the model generator.
Example Usage
Source: https://github.com/crossplane/crossplaneThe generator automatically locates CRDs without requiring users to specify internal paths.
Signed commits