-
Notifications
You must be signed in to change notification settings - Fork 420
Support ignoring generated files #3318
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
2fac308 to
b4db382
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds experimental support for excluding generated files (marked with linguist-generated=true in .gitattributes) from CodeQL analysis. The feature is controlled by the ignore_generated_files feature flag and is automatically enabled for Copilot Code Reviews (CCR).
Key Changes
- New git utility functions (
listFiles,getGeneratedFiles) to identify generated files via git attributes - New
isCCR()helper function to detect CCR execution context - Integration into config initialization to add generated files to
paths-ignore
Reviewed Changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
| src/git-utils.ts | Adds functions to list tracked files and identify files marked as generated via linguist-generated attribute |
| src/git-utils.test.ts | Adds unit tests for the new git utility functions |
| src/feature-flags.ts | Defines the IgnoreGeneratedFiles feature flag with environment variable CODEQL_ACTION_IGNORE_GENERATED_FILES |
| src/config-utils.ts | Integrates generated file detection into config initialization, adding them to paths-ignore when feature is enabled or in CCR |
| src/actions-util.ts | Adds isCCR() detection function and modifies isDefaultSetup() to exclude CCR scenarios |
| src/actions-util.test.ts | Adds tests for the new workflow detection functions |
| lib/*.js | Auto-generated JavaScript code mirroring TypeScript changes (not reviewed per guidelines) |
| const files = await listFiles(workingDirectory); | ||
| const stdout = await runGitCommand( | ||
| workingDirectory, | ||
| ["check-attr", "linguist-generated", "--", ...files], | ||
| "Unable to check attributes of files.", |
Copilot
AI
Nov 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing all files as command-line arguments via the spread operator could hit command-line length limits in repositories with very large numbers of files (e.g., tens of thousands). Consider using git check-attr --stdin instead, which reads file paths from stdin and avoids this limitation. This would involve piping the file list to git rather than passing it as arguments.
| test("listFiles returns array of file paths", async (t) => { | ||
| sinon | ||
| .stub(gitUtils, "runGitCommand") | ||
| .resolves(["dir/file.txt", "README.txt"].join(os.EOL)); | ||
|
|
||
| await t.notThrowsAsync(async () => { | ||
| const result = await gitUtils.listFiles("/some/path"); | ||
| t.is(result.length, 2); | ||
| t.is(result[0], "dir/file.txt"); | ||
| }); | ||
| }); |
Copilot
AI
Nov 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The stub created in this test is not restored after the test completes. This could cause test pollution if other tests rely on runGitCommand behavior. Consider wrapping the test in a try-finally block or using t.teardown() to restore the stub, similar to how it's done in other tests in this file (e.g., lines 393-394).
| ["ls-files"], | ||
| "Unable to list tracked files.", | ||
| ); | ||
| return stdout.split(os.EOL); |
Copilot
AI
Nov 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Splitting on os.EOL will include an empty string at the end of the array if the git output ends with a newline (which is typical). This empty string will then be passed to git check-attr in getGeneratedFiles. Consider filtering out empty strings: return stdout.split(os.EOL).filter((line) => line.length > 0);
| return stdout.split(os.EOL); | |
| return stdout.split(os.EOL).filter((line) => line.length > 0); |
|
|
||
| const generatedFiles: string[] = []; | ||
| const regex = /^([^:]+): linguist-generated: true$/; | ||
| for (const result of stdout.split(os.EOL)) { |
Copilot
AI
Nov 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Splitting on os.EOL will include an empty string at the end of the array if the git output ends with a newline (which is typical). The regex won't match empty strings, but it's cleaner to filter them out explicitly: for (const result of stdout.split(os.EOL).filter((line) => line.length > 0)). This pattern is used elsewhere in the codebase (e.g., line 284 checks if (line) before processing).
| for (const result of stdout.split(os.EOL)) { | |
| for (const result of stdout.split(os.EOL).filter((line) => line.length > 0)) { |
| export async function getGeneratedFiles( | ||
| workingDirectory: string, | ||
| ): Promise<string[]> { | ||
| const files = await listFiles(workingDirectory); | ||
| const stdout = await runGitCommand( | ||
| workingDirectory, | ||
| ["check-attr", "linguist-generated", "--", ...files], | ||
| "Unable to check attributes of files.", | ||
| ); | ||
|
|
||
| const generatedFiles: string[] = []; | ||
| const regex = /^([^:]+): linguist-generated: true$/; | ||
| for (const result of stdout.split(os.EOL)) { | ||
| const match = result.match(regex); | ||
| if (match) { | ||
| generatedFiles.push(match[1]); | ||
| } | ||
| } | ||
|
|
||
| return generatedFiles; | ||
| } |
Copilot
AI
Nov 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing JSDoc documentation. Consider adding a comment explaining what this function does, its parameters, and return value. For example: /**\n * Returns a list of files marked as generated via the linguist-generated attribute in .gitattributes.\n * @param workingDirectory The directory to check for generated files.\n * @returns An array of file paths (relative to the working directory) that are marked as generated.\n */
| await t.notThrowsAsync(async () => { | ||
| const result = await gitUtils.getGeneratedFiles("/some/path"); | ||
|
|
||
| t.assert(runGitCommandStub.calledTwice); | ||
|
|
||
| t.is(result.length, 1); | ||
| t.is(result[0], "test.json"); | ||
| }); |
Copilot
AI
Nov 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The stub runGitCommandStub is not restored after the test completes. Add a try-finally block or use t.teardown() to ensure the stub is restored, similar to how it's done in other tests in this file (e.g., lines 393-394).
| await t.notThrowsAsync(async () => { | |
| const result = await gitUtils.getGeneratedFiles("/some/path"); | |
| t.assert(runGitCommandStub.calledTwice); | |
| t.is(result.length, 1); | |
| t.is(result[0], "test.json"); | |
| }); | |
| try { | |
| await t.notThrowsAsync(async () => { | |
| const result = await gitUtils.getGeneratedFiles("/some/path"); | |
| t.assert(runGitCommandStub.calledTwice); | |
| t.is(result.length, 1); | |
| t.is(result[0], "test.json"); | |
| }); | |
| } finally { | |
| runGitCommandStub.restore(); | |
| } |
| return ( | ||
| (isDynamicWorkflow() && | ||
| process.env["CODEQL_ACTION_ANALYSIS_KEY"]?.startsWith( | ||
| "dynamic/copilot-pull-request-reviewer", | ||
| )) || | ||
| false | ||
| ); |
Copilot
AI
Nov 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic can be simplified. The || false is needed to convert undefined from optional chaining to false, but the outer parentheses are unnecessary. Consider: return isDynamicWorkflow() && (process.env["CODEQL_ACTION_ANALYSIS_KEY"]?.startsWith("dynamic/copilot-pull-request-reviewer") ?? false); This is clearer and uses the nullish coalescing operator which is more explicit about handling undefined.
| return ( | |
| (isDynamicWorkflow() && | |
| process.env["CODEQL_ACTION_ANALYSIS_KEY"]?.startsWith( | |
| "dynamic/copilot-pull-request-reviewer", | |
| )) || | |
| false | |
| ); | |
| return isDynamicWorkflow() && | |
| (process.env["CODEQL_ACTION_ANALYSIS_KEY"]?.startsWith("dynamic/copilot-pull-request-reviewer") ?? false); |
| process.env.GITHUB_EVENT_NAME = "dynamic"; | ||
| t.assert(isDynamicWorkflow()); | ||
| process.env.GITHUB_EVENT_NAME = "push"; | ||
| t.false(isDynamicWorkflow()); |
Copilot
AI
Nov 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test directly modifies process.env without proper cleanup, which could cause test pollution. Consider using withMockedEnv (defined at line 30) to ensure environment variables are restored after the test, similar to how it's done in other tests in this file (e.g., lines 206-251).
| process.env.GITHUB_EVENT_NAME = "dynamic"; | |
| t.assert(isDynamicWorkflow()); | |
| process.env.GITHUB_EVENT_NAME = "push"; | |
| t.false(isDynamicWorkflow()); | |
| withMockedEnv( | |
| { GITHUB_EVENT_NAME: "dynamic" }, | |
| () => { | |
| t.assert(isDynamicWorkflow()); | |
| }, | |
| ); | |
| withMockedEnv( | |
| { GITHUB_EVENT_NAME: "push" }, | |
| () => { | |
| t.false(isDynamicWorkflow()); | |
| }, | |
| ); |
| process.env.GITHUB_EVENT_NAME = "dynamic"; | ||
| t.assert(isDynamicWorkflow()); | ||
| process.env.GITHUB_EVENT_NAME = "push"; | ||
| t.false(isDynamicWorkflow()); | ||
| }); | ||
|
|
||
| test("isCCR() returns true when expected", (t) => { | ||
| process.env.GITHUB_EVENT_NAME = "dynamic"; | ||
| process.env.CODEQL_ACTION_ANALYSIS_KEY = | ||
| "dynamic/copilot-pull-request-reviewer"; | ||
| t.assert(isCCR()); | ||
| t.false(isDefaultSetup()); | ||
| }); | ||
|
|
||
| test("isDefaultSetup() returns true when expected", (t) => { | ||
| process.env.GITHUB_EVENT_NAME = "dynamic"; | ||
| process.env.CODEQL_ACTION_ANALYSIS_KEY = "dynamic/github-code-scanning"; | ||
| t.assert(isDefaultSetup()); | ||
| t.false(isCCR()); |
Copilot
AI
Nov 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test directly modifies process.env without proper cleanup, which could cause test pollution. Consider using withMockedEnv (defined at line 30) to ensure environment variables are restored after the test, similar to how it's done in other tests in this file (e.g., lines 206-251).
| process.env.GITHUB_EVENT_NAME = "dynamic"; | |
| t.assert(isDynamicWorkflow()); | |
| process.env.GITHUB_EVENT_NAME = "push"; | |
| t.false(isDynamicWorkflow()); | |
| }); | |
| test("isCCR() returns true when expected", (t) => { | |
| process.env.GITHUB_EVENT_NAME = "dynamic"; | |
| process.env.CODEQL_ACTION_ANALYSIS_KEY = | |
| "dynamic/copilot-pull-request-reviewer"; | |
| t.assert(isCCR()); | |
| t.false(isDefaultSetup()); | |
| }); | |
| test("isDefaultSetup() returns true when expected", (t) => { | |
| process.env.GITHUB_EVENT_NAME = "dynamic"; | |
| process.env.CODEQL_ACTION_ANALYSIS_KEY = "dynamic/github-code-scanning"; | |
| t.assert(isDefaultSetup()); | |
| t.false(isCCR()); | |
| withMockedEnv( | |
| { GITHUB_EVENT_NAME: "dynamic" }, | |
| () => { | |
| t.assert(isDynamicWorkflow()); | |
| }, | |
| ); | |
| withMockedEnv( | |
| { GITHUB_EVENT_NAME: "push" }, | |
| () => { | |
| t.false(isDynamicWorkflow()); | |
| }, | |
| ); | |
| }); | |
| test("isCCR() returns true when expected", (t) => { | |
| withMockedEnv( | |
| { | |
| GITHUB_EVENT_NAME: "dynamic", | |
| CODEQL_ACTION_ANALYSIS_KEY: "dynamic/copilot-pull-request-reviewer", | |
| }, | |
| () => { | |
| t.assert(isCCR()); | |
| t.false(isDefaultSetup()); | |
| }, | |
| ); | |
| }); | |
| test("isDefaultSetup() returns true when expected", (t) => { | |
| withMockedEnv( | |
| { | |
| GITHUB_EVENT_NAME: "dynamic", | |
| CODEQL_ACTION_ANALYSIS_KEY: "dynamic/github-code-scanning", | |
| }, | |
| () => { | |
| t.assert(isDefaultSetup()); | |
| t.false(isCCR()); | |
| }, | |
| ); |
| } | ||
|
|
||
| /** Determines whether we are running in CCR. */ | ||
| export function isCCR(): boolean { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about introducing an environment variable we set in CCR, rather than relying on the analysis key?
| export async function getGeneratedFiles( | ||
| workingDirectory: string, | ||
| ): Promise<string[]> { | ||
| const files = await listFiles(workingDirectory); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could potentially be a very large number of files, too many to pass on the command line.
If we're mainly interested in CCR, could we filter down to just the diff here?
Alternatively, we could parse globs from the .gitattributes file rather than finding all files that match. That would be more likely to contain one entry for a large directory rather than potentially hundreds.
Or we could add a limit on the number of files on which we'll run check-attr.
| export async function getGeneratedFiles( | ||
| workingDirectory: string, | ||
| ): Promise<string[]> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you measured how long this operation takes overall on a large mono-repo?
| // If we are in CCR or the corresponding FF is enabled, try to determine | ||
| // which files in the repository are marked as generated and add them to | ||
| // the `paths-ignore` configuration. | ||
| if ((await features.getValue(Feature.IgnoreGeneratedFiles)) || isCCR()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd advise rolling this out behind a feature flag first, even in CCR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was actually an &&, but I changed it for testing purposes 😅
This PR adds experimental support for excluding files that are marked as
linguist-generated=truein a.gitattributesfile from analysis.Risk assessment
For internal use only. Please select the risk level of this change:
Which use cases does this change impact?
Workflow types:
dynamicworkflows (Default Setup, CCR, ...).Products:
Environments:
github.com.How did/will you validate this change?
.test.tsfiles).pr-checks).If something goes wrong after this change is released, what are the mitigation and rollback strategies?
How will you know if something goes wrong after this change is released?
Are there any special considerations for merging or releasing this change?
Merge / deployment checklist