Support ignoring generated files #3318

mbg · 2025-11-19T19:26:32Z

This PR adds experimental support for excluding files that are marked as linguist-generated=true in a .gitattributes file from analysis.

Risk assessment

For internal use only. Please select the risk level of this change:

Low risk: Changes are fully under feature flags, or have been fully tested and validated in pre-production environments and are highly observable, or are documentation or test only.

Which use cases does this change impact?

Workflow types:

Managed - Impacts users with dynamic workflows (Default Setup, CCR, ...).

Products:

CCR - The changes impact analyses for Copilot Code Reviews.

Environments:

Dotcom - Impacts CodeQL workflows on github.com.

How did/will you validate this change?

Test repository - This change will be tested on a test repository before merging.
Unit tests - I am depending on unit test coverage (i.e. tests in .test.ts files).
End-to-end tests - I am depending on PR checks (i.e. tests in pr-checks).

If something goes wrong after this change is released, what are the mitigation and rollback strategies?

Feature flags - All new or changed code paths can be fully disabled with corresponding feature flags.

How will you know if something goes wrong after this change is released?

Telemetry - I rely on existing telemetry or have made changes to the telemetry.
Dashboards - I will watch relevant dashboards for issues after the release. Consider whether this requires this change to be released at a particular time rather than as part of a regular release.
Alerts - New or existing monitors will trip if something goes wrong with this change.

Are there any special considerations for merging or releasing this change?

No special considerations - This change can be merged at any time.

Merge / deployment checklist

Confirm this change is backwards compatible with existing workflows.
Consider adding a changelog entry for this change.
Confirm the readme and docs have been updated if necessary.

Copilot

Pull Request Overview

This PR adds experimental support for excluding generated files (marked with linguist-generated=true in .gitattributes) from CodeQL analysis. The feature is controlled by the ignore_generated_files feature flag and is automatically enabled for Copilot Code Reviews (CCR).

Key Changes

New git utility functions (listFiles, getGeneratedFiles) to identify generated files via git attributes
New isCCR() helper function to detect CCR execution context
Integration into config initialization to add generated files to paths-ignore

Reviewed Changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 12 comments.

Show a summary per file

File	Description
src/git-utils.ts	Adds functions to list tracked files and identify files marked as generated via linguist-generated attribute
src/git-utils.test.ts	Adds unit tests for the new git utility functions
src/feature-flags.ts	Defines the IgnoreGeneratedFiles feature flag with environment variable CODEQL_ACTION_IGNORE_GENERATED_FILES
src/config-utils.ts	Integrates generated file detection into config initialization, adding them to paths-ignore when feature is enabled or in CCR
src/actions-util.ts	Adds isCCR() detection function and modifies isDefaultSetup() to exclude CCR scenarios
src/actions-util.test.ts	Adds tests for the new workflow detection functions
lib/*.js	Auto-generated JavaScript code mirroring TypeScript changes (not reviewed per guidelines)

Copilot · 2025-11-19T19:51:26Z

src/git-utils.ts

+  const files = await listFiles(workingDirectory);
+  const stdout = await runGitCommand(
+    workingDirectory,
+    ["check-attr", "linguist-generated", "--", ...files],
+    "Unable to check attributes of files.",


Passing all files as command-line arguments via the spread operator could hit command-line length limits in repositories with very large numbers of files (e.g., tens of thousands). Consider using git check-attr --stdin instead, which reads file paths from stdin and avoids this limitation. This would involve piping the file list to git rather than passing it as arguments.
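
A minimal sketch of the `--stdin` approach follows. It assumes a hypothetical `GitRunner` callback that runs git with the given arguments, writes a payload to its stdin, and resolves with stdout; the PR's `runGitCommand` is not shown accepting stdin, so that signature is an assumption. The sketch also passes `-z`, which makes `git check-attr` read NUL-separated paths and emit NUL-terminated `path`, `attribute`, `value` fields, so paths containing `:` or newlines are not misparsed.

```typescript
// Hypothetical runner (not part of the PR): executes `git <args>` in `cwd`,
// writes `stdin` to the process, and resolves with its stdout.
type GitRunner = (
  cwd: string,
  args: string[],
  stdin: string,
) => Promise<string>;

// Parses `git check-attr -z` output, which is a flat sequence of
// <path> NUL <attribute> NUL <value> NUL fields.
function parseCheckAttrZ(stdout: string, attribute: string): string[] {
  const fields = stdout.split("\0");
  const matches: string[] = [];
  for (let i = 0; i + 2 < fields.length; i += 3) {
    const path = fields[i];
    const attr = fields[i + 1];
    const value = fields[i + 2];
    // Mirrors the PR's regex, which only accepts the value "true".
    // A bare `linguist-generated` entry is reported by git as "set"
    // and is not matched here.
    if (path !== undefined && attr === attribute && value === "true") {
      matches.push(path);
    }
  }
  return matches;
}

async function getGeneratedFilesViaStdin(
  runGit: GitRunner,
  workingDirectory: string,
  files: string[],
): Promise<string[]> {
  if (files.length === 0) {
    return [];
  }
  // Paths travel over stdin, so the argument list stays short
  // regardless of how many files the repository tracks.
  const stdout = await runGit(
    workingDirectory,
    ["check-attr", "-z", "--stdin", "linguist-generated"],
    files.join("\0"),
  );
  return parseCheckAttrZ(stdout, "linguist-generated");
}
```

Injecting the runner keeps the parsing logic testable without spawning git, which may also make the stub-restoration concerns raised in the test comments easier to avoid.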


Copilot · 2025-11-19T19:51:27Z

src/git-utils.test.ts

+test("listFiles returns array of file paths", async (t) => {
+  sinon
+    .stub(gitUtils, "runGitCommand")
+    .resolves(["dir/file.txt", "README.txt"].join(os.EOL));
+
+  await t.notThrowsAsync(async () => {
+    const result = await gitUtils.listFiles("/some/path");
+    t.is(result.length, 2);
+    t.is(result[0], "dir/file.txt");
+  });
+});


The stub created in this test is not restored after the test completes. This could cause test pollution if other tests rely on runGitCommand behavior. Consider wrapping the test in a try-finally block or using t.teardown() to restore the stub, similar to how it's done in other tests in this file (e.g., lines 393-394).

Copilot · 2025-11-19T19:51:27Z

src/git-utils.ts

+    ["ls-files"],
+    "Unable to list tracked files.",
+  );
+  return stdout.split(os.EOL);


Splitting on os.EOL will include an empty string at the end of the array if the git output ends with a newline (which is typical). This empty string will then be passed to git check-attr in getGeneratedFiles. Consider filtering out empty strings: return stdout.split(os.EOL).filter((line) => line.length > 0);

Suggested change

-  return stdout.split(os.EOL);
+  return stdout.split(os.EOL).filter((line) => line.length > 0);
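
A small sketch of a shared line-splitting helper in the spirit of this suggestion. `splitLines` is a hypothetical name, not a function in the PR. It splits on `\r?\n` instead of `os.EOL`, on the assumption that git may emit bare `\n` even on platforms where `os.EOL` is `\r\n`; that assumption is worth checking against the runners the action supports.

```typescript
// Hypothetical helper: splits command output into non-empty lines,
// accepting both LF and CRLF line endings.
function splitLines(stdout: string): string[] {
  return stdout.split(/\r?\n/).filter((line) => line.length > 0);
}
```

Using one helper in both `listFiles` and `getGeneratedFiles` would cover this comment and the similar one on the `check-attr` loop.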

Copilot · 2025-11-19T19:51:27Z

src/git-utils.ts

+
+  const generatedFiles: string[] = [];
+  const regex = /^([^:]+): linguist-generated: true$/;
+  for (const result of stdout.split(os.EOL)) {


Splitting on os.EOL will include an empty string at the end of the array if the git output ends with a newline (which is typical). The regex won't match empty strings, but it's cleaner to filter them out explicitly: for (const result of stdout.split(os.EOL).filter((line) => line.length > 0)). This pattern is used elsewhere in the codebase (e.g., line 284 checks if (line) before processing).

Suggested change

-  for (const result of stdout.split(os.EOL)) {
+  for (const result of stdout.split(os.EOL).filter((line) => line.length > 0)) {

Copilot · 2025-11-19T19:51:28Z

src/git-utils.ts

+export async function getGeneratedFiles(
+  workingDirectory: string,
+): Promise<string[]> {
+  const files = await listFiles(workingDirectory);
+  const stdout = await runGitCommand(
+    workingDirectory,
+    ["check-attr", "linguist-generated", "--", ...files],
+    "Unable to check attributes of files.",
+  );
+
+  const generatedFiles: string[] = [];
+  const regex = /^([^:]+): linguist-generated: true$/;
+  for (const result of stdout.split(os.EOL)) {
+    const match = result.match(regex);
+    if (match) {
+      generatedFiles.push(match[1]);
+    }
+  }
+
+  return generatedFiles;
+}


Missing JSDoc documentation. Consider adding a comment explaining what this function does, its parameters, and return value. For example:

/**
 * Returns a list of files marked as generated via the linguist-generated attribute in .gitattributes.
 * @param workingDirectory The directory to check for generated files.
 * @returns An array of file paths (relative to the working directory) that are marked as generated.
 */

Copilot · 2025-11-19T19:51:29Z

src/git-utils.test.ts

+  await t.notThrowsAsync(async () => {
+    const result = await gitUtils.getGeneratedFiles("/some/path");
+
+    t.assert(runGitCommandStub.calledTwice);
+
+    t.is(result.length, 1);
+    t.is(result[0], "test.json");
+  });


The stub runGitCommandStub is not restored after the test completes. Add a try-finally block or use t.teardown() to ensure the stub is restored, similar to how it's done in other tests in this file (e.g., lines 393-394).

Suggested change

-  await t.notThrowsAsync(async () => {
-    const result = await gitUtils.getGeneratedFiles("/some/path");
-
-    t.assert(runGitCommandStub.calledTwice);
-
-    t.is(result.length, 1);
-    t.is(result[0], "test.json");
-  });
+  try {
+    await t.notThrowsAsync(async () => {
+      const result = await gitUtils.getGeneratedFiles("/some/path");
+
+      t.assert(runGitCommandStub.calledTwice);
+
+      t.is(result.length, 1);
+      t.is(result[0], "test.json");
+    });
+  } finally {
+    runGitCommandStub.restore();
+  }

Copilot · 2025-11-19T19:51:29Z

src/actions-util.ts

+  return (
+    (isDynamicWorkflow() &&
+      process.env["CODEQL_ACTION_ANALYSIS_KEY"]?.startsWith(
+        "dynamic/copilot-pull-request-reviewer",
+      )) ||
+    false
+  );


The logic can be simplified. The || false is needed to convert undefined from optional chaining to false, but the outer parentheses are unnecessary. Consider: return isDynamicWorkflow() && (process.env["CODEQL_ACTION_ANALYSIS_KEY"]?.startsWith("dynamic/copilot-pull-request-reviewer") ?? false); This is clearer and uses the nullish coalescing operator which is more explicit about handling undefined.

Suggested change

-  return (
-    (isDynamicWorkflow() &&
-      process.env["CODEQL_ACTION_ANALYSIS_KEY"]?.startsWith(
-        "dynamic/copilot-pull-request-reviewer",
-      )) ||
-    false
-  );
+  return isDynamicWorkflow() &&
+    (process.env["CODEQL_ACTION_ANALYSIS_KEY"]?.startsWith("dynamic/copilot-pull-request-reviewer") ?? false);
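
To illustrate how the `?? false` form behaves, here is a sketch that takes its inputs as parameters rather than calling `isDynamicWorkflow()` and reading `process.env`. `isCCRFrom` and its parameters are hypothetical names used only for illustration.

```typescript
const CCR_KEY_PREFIX = "dynamic/copilot-pull-request-reviewer";

// Hypothetical pure variant of isCCR(): `isDynamic` stands in for
// isDynamicWorkflow(), `analysisKey` for CODEQL_ACTION_ANALYSIS_KEY.
function isCCRFrom(
  isDynamic: boolean,
  analysisKey: string | undefined,
): boolean {
  // `?.` yields undefined when the key is unset; `?? false` converts
  // only that undefined into a boolean, leaving true/false untouched.
  return isDynamic && (analysisKey?.startsWith(CCR_KEY_PREFIX) ?? false);
}
```

With an unset key, the optional chain short-circuits to `undefined` and the function returns `false`, which matches the behaviour of the original `|| false` form.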

Copilot · 2025-11-19T19:51:29Z

src/actions-util.test.ts

+  process.env.GITHUB_EVENT_NAME = "dynamic";
+  t.assert(isDynamicWorkflow());
+  process.env.GITHUB_EVENT_NAME = "push";
+  t.false(isDynamicWorkflow());


This test directly modifies process.env without proper cleanup, which could cause test pollution. Consider using withMockedEnv (defined at line 30) to ensure environment variables are restored after the test, similar to how it's done in other tests in this file (e.g., lines 206-251).

Suggested change

-  process.env.GITHUB_EVENT_NAME = "dynamic";
-  t.assert(isDynamicWorkflow());
-  process.env.GITHUB_EVENT_NAME = "push";
-  t.false(isDynamicWorkflow());
+  withMockedEnv(
+    { GITHUB_EVENT_NAME: "dynamic" },
+    () => {
+      t.assert(isDynamicWorkflow());
+    },
+  );
+  withMockedEnv(
+    { GITHUB_EVENT_NAME: "push" },
+    () => {
+      t.false(isDynamicWorkflow());
+    },
+  );
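
For readers without the test file open, the following is a sketch of how a helper of this kind could work. It is not the repository's `withMockedEnv`, whose implementation is not shown in this thread; `withEnv` is a hypothetical stand-in that takes the environment object explicitly so that it can be exercised without touching `process.env`.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: applies `overrides` to `env`, runs `fn`, and then
// restores the previous values even if `fn` throws. It only supports a
// synchronous `fn`; an async callback would see the environment restored
// before its promise settles.
function withEnv<T>(env: Env, overrides: Env, fn: () => T): T {
  const keys = Object.keys(overrides);
  const hadKey: Record<string, boolean> = {};
  const previous: Env = {};
  for (const key of keys) {
    hadKey[key] = Object.prototype.hasOwnProperty.call(env, key);
    previous[key] = env[key];
    const value = overrides[key];
    if (value === undefined) {
      delete env[key];
    } else {
      env[key] = value;
    }
  }
  try {
    return fn();
  } finally {
    for (const key of keys) {
      if (hadKey[key]) {
        env[key] = previous[key];
      } else {
        delete env[key];
      }
    }
  }
}
```

The `finally` block is what prevents the test pollution described above: a failed assertion inside the callback still leaves the environment as it was found.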

Copilot · 2025-11-19T19:51:29Z

src/actions-util.test.ts

+  process.env.GITHUB_EVENT_NAME = "dynamic";
+  t.assert(isDynamicWorkflow());
+  process.env.GITHUB_EVENT_NAME = "push";
+  t.false(isDynamicWorkflow());
+});
+
+test("isCCR() returns true when expected", (t) => {
+  process.env.GITHUB_EVENT_NAME = "dynamic";
+  process.env.CODEQL_ACTION_ANALYSIS_KEY =
+    "dynamic/copilot-pull-request-reviewer";
+  t.assert(isCCR());
+  t.false(isDefaultSetup());
+});
+
+test("isDefaultSetup() returns true when expected", (t) => {
+  process.env.GITHUB_EVENT_NAME = "dynamic";
+  process.env.CODEQL_ACTION_ANALYSIS_KEY = "dynamic/github-code-scanning";
+  t.assert(isDefaultSetup());
+  t.false(isCCR());


This test directly modifies process.env without proper cleanup, which could cause test pollution. Consider using withMockedEnv (defined at line 30) to ensure environment variables are restored after the test, similar to how it's done in other tests in this file (e.g., lines 206-251).

Suggested change

-  process.env.GITHUB_EVENT_NAME = "dynamic";
-  t.assert(isDynamicWorkflow());
-  process.env.GITHUB_EVENT_NAME = "push";
-  t.false(isDynamicWorkflow());
-});
-
-test("isCCR() returns true when expected", (t) => {
-  process.env.GITHUB_EVENT_NAME = "dynamic";
-  process.env.CODEQL_ACTION_ANALYSIS_KEY =
-    "dynamic/copilot-pull-request-reviewer";
-  t.assert(isCCR());
-  t.false(isDefaultSetup());
-});
-
-test("isDefaultSetup() returns true when expected", (t) => {
-  process.env.GITHUB_EVENT_NAME = "dynamic";
-  process.env.CODEQL_ACTION_ANALYSIS_KEY = "dynamic/github-code-scanning";
-  t.assert(isDefaultSetup());
-  t.false(isCCR());
+  withMockedEnv(
+    { GITHUB_EVENT_NAME: "dynamic" },
+    () => {
+      t.assert(isDynamicWorkflow());
+    },
+  );
+  withMockedEnv(
+    { GITHUB_EVENT_NAME: "push" },
+    () => {
+      t.false(isDynamicWorkflow());
+    },
+  );
+});
+
+test("isCCR() returns true when expected", (t) => {
+  withMockedEnv(
+    {
+      GITHUB_EVENT_NAME: "dynamic",
+      CODEQL_ACTION_ANALYSIS_KEY: "dynamic/copilot-pull-request-reviewer",
+    },
+    () => {
+      t.assert(isCCR());
+      t.false(isDefaultSetup());
+    },
+  );
+});
+
+test("isDefaultSetup() returns true when expected", (t) => {
+  withMockedEnv(
+    {
+      GITHUB_EVENT_NAME: "dynamic",
+      CODEQL_ACTION_ANALYSIS_KEY: "dynamic/github-code-scanning",
+    },
+    () => {
+      t.assert(isDefaultSetup());
+      t.false(isCCR());
+    },
+  );

henrymercer · 2025-11-20T09:57:09Z

src/actions-util.ts

+}
+
+/** Determines whether we are running in CCR. */
+export function isCCR(): boolean {


What about introducing an environment variable we set in CCR, rather than relying on the analysis key?

henrymercer · 2025-11-20T10:02:05Z

src/git-utils.ts

+export async function getGeneratedFiles(
+  workingDirectory: string,
+): Promise<string[]> {
+  const files = await listFiles(workingDirectory);


This could potentially be a very large number of files, too many to pass on the command line.

If we're mainly interested in CCR, could we filter down to just the diff here?

Alternatively, we could parse globs from the .gitattributes file rather than finding all files that match. That would be more likely to contain one entry for a large directory rather than potentially hundreds.

Or we could add a limit on the number of files on which we'll run check-attr.
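
The second alternative above, reading patterns from `.gitattributes` directly, could be sketched roughly as follows. This is a deliberately simplified illustration and not the PR's approach: it ignores quoted patterns, macro attributes, nested `.gitattributes` files, and later lines overriding earlier ones, and gitattributes patterns are not guaranteed to mean the same thing as `paths-ignore` globs. `generatedPatterns` is a hypothetical name.

```typescript
// Hypothetical helper: returns the patterns in a .gitattributes file that
// set linguist-generated=true. Simplified; see the caveats in the lead-in.
function generatedPatterns(gitattributes: string): string[] {
  const patterns: string[] = [];
  for (const rawLine of gitattributes.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and comments.
    if (line === "" || line.charAt(0) === "#") {
      continue;
    }
    // Each line is a pattern followed by whitespace-separated attributes.
    const parts = line.split(/\s+/);
    const pattern = parts[0];
    const attrs = parts.slice(1);
    // Only the explicit `=true` form is matched, as in the PR description.
    // A bare `linguist-generated` entry is not handled here.
    if (
      pattern !== undefined &&
      attrs.some((attr) => attr === "linguist-generated=true")
    ) {
      patterns.push(pattern);
    }
  }
  return patterns;
}
```

The appeal of this route is that one entry such as `dist/**` stands in for every file under that directory, so the resulting list stays small even in a large repository.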

henrymercer · 2025-11-20T10:02:58Z

src/git-utils.ts

+export async function getGeneratedFiles(
+  workingDirectory: string,
+): Promise<string[]> {


Have you measured how long this operation takes overall on a large mono-repo?

henrymercer · 2025-11-20T10:03:48Z

src/config-utils.ts

+  // If we are in CCR or the corresponding FF is enabled, try to determine
+  // which files in the repository are marked as generated and add them to
+  // the `paths-ignore` configuration.
+  if ((await features.getValue(Feature.IgnoreGeneratedFiles)) || isCCR()) {


I'd advise rolling this out behind a feature flag first, even in CCR.

mbg · 2025-11-20

This was actually an &&, but I changed it for testing purposes 😅
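
A sketch of the gating discussed in this thread, assuming the intended condition is the `&&` form mentioned in the reply, so that CCR alone does not enable the behaviour. The function names and the shape of the `paths-ignore` list are hypothetical; the real config object in `src/config-utils.ts` is not shown here.

```typescript
// Hypothetical: both the feature flag and the CCR context are required.
function shouldIgnoreGeneratedFiles(
  flagEnabled: boolean,
  inCCR: boolean,
): boolean {
  return flagEnabled && inCCR;
}

// Hypothetical: appends generated files to an existing paths-ignore list
// without duplicating entries and without mutating the input.
function mergePathsIgnore(
  existing: string[] | undefined,
  generatedFiles: string[],
): string[] {
  const merged = [...(existing ?? [])];
  for (const file of generatedFiles) {
    if (merged.indexOf(file) === -1) {
      merged.push(file);
    }
  }
  return merged;
}
```

Keeping the condition in one small function makes it harder for a temporary change made for testing to slip through review unnoticed.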

mbg added 3 commits November 19, 2025 15:35

Add function to query git for all generated files

1512f40

Add isCCR helper, and update isDefaultSetup

3eaf000

Add IgnoreGeneratedFiles FF

846f859

github-actions bot added the size/M label (Should be of average difficulty to review) Nov 19, 2025

Add generated files to paths-ignore, if FF is enabled

b4db382

mbg force-pushed the mbg/ignore-generated branch from 2fac308 to b4db382 November 19, 2025 19:42

mbg marked this pull request as ready for review November 19, 2025 19:44

mbg requested a review from a team as a code owner November 19, 2025 19:44

Copilot AI review requested due to automatic review settings November 19, 2025 19:44

Copilot started reviewing on behalf of mbg November 19, 2025 19:45

Copilot finished reviewing on behalf of mbg November 19, 2025 19:49

Copilot AI reviewed Nov 19, 2025

henrymercer reviewed Nov 20, 2025

