central detect sensitive files #40

nilanjan-sikdar · 2026-01-22T07:24:54Z

Changes

Separated Code and Config: Created .github/workflows/scripts/sensitive_file_check.py and .github/workflows/config/sensitive_files.txt to decouple the checking logic from the regex patterns.
Updated Workflow: Modified .github/workflows/check-sensitive-files.yml to use the new Python script instead of inline Bash commands.
Added Tests: Implemented unit tests in .github/workflows/scripts/test/test_sensitive_file_check.py to ensure the script works correctly.

Resolves #34

Summary by CodeRabbit

Chores
- Improved CI security with a new Python-based sensitive-file validator driven by a curated pattern list.
- Replaced large static checks with dynamic changed-file detection; enforcement now follows the validator's exit code and retains override messaging.
- Preserved early exit for non-PR contexts and simplified workflow flow.
Tests
- Added unit tests covering pattern loading, file discovery, and sensitive-file detection behavior.


github-actions · 2026-01-22T07:25:04Z

Our Pull Request Approval Process

This PR will be reviewed according to our:

Your PR may be automatically closed if:

Our PR template isn't filled in correctly
You haven't correctly linked your PR to an issue

👉 https://docs.github.com/en/issues/tracking-your-work-with-issues/using-issues/linking-a-pull-request-to-an-issue

Thanks for contributing!

coderabbitai · 2026-01-22T07:25:10Z

Walkthrough

Refactors the sensitive-file check: moves pattern list into .github/workflows/config/sensitive_files.txt, adds .github/workflows/scripts/sensitive_file_check.py to validate changed files, and updates the GitHub Actions workflow to detect changed files via git and invoke the script; script exit code now enforces failures.

Changes

Cohort / File(s)	Summary
Workflow `.github/workflows/check-sensitive-files.yml`	Replaced inline sensitive-pattern loop with git-based changed-file detection and a single invocation of `sensitive_file_check.py`; removed the previous `GITHUB_OUTPUT` emission and separate failure step; retains non-PR early exit and override message on script failure.
Config `.github/workflows/config/sensitive_files.txt`	New canonical list of sensitive file/glob/regex patterns used by the script.
Script `.github/workflows/scripts/sensitive_file_check.py`	New CLI Python utility: `load_patterns()`, `get_files_to_check()`, `check_files()`, and `main()` using argparse; loads regex config, expands directories/files, reports matches and exits non‑zero on sensitive matches.
Tests `.github/workflows/scripts/test/test_sensitive_file_check.py`	New unit tests covering pattern loading, path resolution, regex matching, error handling; uses mocks for filesystem and IO.
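For orientation, here is a minimal sketch of how a script built from the four functions named above (`load_patterns()`, `get_files_to_check()`, `check_files()`, `main()`) could fit together. It is not the PR's actual code: skipping blank and `#` lines in the config, the printed message, and the handling of paths that do not exist are assumptions. Only the `--config`/`--files` flags, CWD-relative normalization, and the non-zero exit on a match come from the walkthrough and review comments.

```python
"""Hypothetical sketch of a sensitive-file checker CLI; not the PR's actual code."""
import argparse
import os
import re
import sys


def load_patterns(config_path):
    """Compile one regex per non-blank line of the config (blank/'#' skipping is assumed)."""
    try:
        with open(config_path, encoding="utf-8") as handle:
            lines = [line.strip() for line in handle]
    except (OSError, UnicodeError) as error:
        print(f"Error reading configuration file: {error}")
        sys.exit(1)
    return [re.compile(line) for line in lines if line and not line.startswith("#")]


def get_files_to_check(paths):
    """Expand directories and make every path relative to the current directory."""
    cwd = os.getcwd()
    files_to_check = []
    for path in paths:
        if os.path.isfile(path):
            files_to_check.append(os.path.relpath(path, cwd))
        elif os.path.isdir(path):
            for root, _, files in os.walk(path):
                for name in files:
                    files_to_check.append(os.path.relpath(os.path.join(root, name), cwd))
        else:
            # Paths that no longer exist (e.g. files deleted in the PR) are kept
            # as given here; the PR's own handling of this branch is not shown.
            files_to_check.append(os.path.normpath(path))
    return files_to_check


def check_files(files, patterns):
    """Return the files that match at least one sensitive pattern."""
    return [path for path in files if any(p.search(path) for p in patterns)]


def main(argv=None):
    """Parse arguments, report matches, and return the process exit code."""
    parser = argparse.ArgumentParser(description="Detect changes to sensitive files.")
    parser.add_argument("--config", required=True, help="Path to the pattern list")
    parser.add_argument("--files", nargs="+", required=True, help="Files or directories to check")
    args = parser.parse_args(argv)

    matches = check_files(get_files_to_check(args.files), load_patterns(args.config))
    for path in matches:
        print(f"Sensitive file changed: {path}")
    return 1 if matches else 0


if __name__ == "__main__":
    sys.exit(main())
```

Returning the exit code from `main()` instead of calling `sys.exit` inside it keeps the function easy to unit-test, while the `__main__` guard still gives the workflow the non-zero status it relies on to fail the job.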

Sequence Diagram(s)

sequenceDiagram
  participant Actions as GitHub Actions
  participant Git as git
  participant Runner as Runner FS
  participant Script as sensitive_file_check.py

  Actions->>Git: git diff --name-only origin/main...HEAD
  Git-->>Actions: list of changed file paths (CHANGED_FILES)
  Actions->>Runner: pass CHANGED_FILES + config path
  Runner->>Script: execute sensitive_file_check.py --config config/... --files <paths>
  Script->>Runner: open/read config file
  Script->>Runner: walk filesystem for provided paths
  Script->>Script: match files against compiled regex patterns
  alt matches found
    Script-->>Actions: exit code 1 + printed override instructions
    Actions-->>Actions: job fails (script exit enforces failure)
  else no matches
    Script-->>Actions: exit code 0 (no sensitive files)
    Actions-->>Actions: job continues
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hopped through configs and scripts with delight,
Patterns now tidy and kept out of sight.
Git lists the changes, the Python inspects,
Regex finds secrets and kindly directs.
A rabbit-approved check—swift, simple, and bright!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The title 'central detect sensitive files' is vague and generic, using imprecise terminology that doesn't clearly convey the main change.	Improve the title to be more specific and clear, such as 'Refactor sensitive file detection into centralized Python script' or 'Extract sensitive file patterns to configuration and Python script'.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check	✅ Passed	The pull request fulfills all core coding requirements from issue `#34`: creates central Python script with argparse, stores patterns in config file, implements multi-file/directory support, adds unit tests, and maintains functional parity with existing check.
Out of Scope Changes check	✅ Passed	All changes are directly related to issue `#34` objectives. No out-of-scope modifications detected beyond the intended refactoring.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.


coderabbitai

Actionable comments posted: 5

🤖 Fix all issues with AI agents

In @.github/workflows/check-sensitive-files.yml:
- Around line 50-68: Add the sensitive-file check as a prerequisite to both PR
workflows by making the existing check workflow reusable and invoking it from
pull-request-review.yml and pull-request-target.yml: update
.github/workflows/check-sensitive-files.yml to support being called (on:
workflow_call) if it isn’t already, then in both pull-request-review.yml and
pull-request-target.yml add a job that uses
./.github/workflows/check-sensitive-files.yml with id sensitive-check (or
preserve the job id "sensitive-check") and make the rest of the PR jobs depend
on it via needs: [sensitive-check] so the "Detect sensitive file changes" job
runs and blocks the PR jobs when it fails.
- Around line 62-68: CHANGED_FILES is built by joining filenames with spaces,
which breaks filenames that contain spaces. Read the git diff output into a
bash array instead (git diff --name-only --diff-filter=ACMRD "$BASE_SHA"
"$HEAD_SHA" piped into a null- or line-safe reader such as mapfile -t), then
call the Python checker with the array expanded as "${CHANGED_FILES[@]}" so
each filename is a single argument. Also update the emptiness check to test the
array length (e.g. [ "${#CHANGED_FILES[@]}" -gt 0 ]) before invoking
sensitive_file_check.py with --files, so filenames containing spaces are
preserved and handled correctly.
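To see why joining the `git diff` output with spaces is fragile, the snippet below mimics both approaches in Python: splitting a space-joined string (roughly what an unquoted `$CHANGED_FILES` gets from shell word splitting, ignoring globbing) versus keeping one list element per line (what `mapfile -t` plus `"${CHANGED_FILES[@]}"` does). The file names are made up for illustration.

```python
# Simulated `git diff --name-only` output: one path per line (hypothetical names).
git_output = "docs/My Notes.md\n.github/workflows/ci.yml\n"

# Old approach: `tr '\n' ' '` followed by an unquoted $CHANGED_FILES, which the
# shell word-splits on whitespace, so "My Notes.md" turns into two arguments.
space_joined = git_output.replace("\n", " ")
word_split_args = space_joined.split()

# Array approach: one element per line, as `mapfile -t` would produce.
array_args = git_output.splitlines()

print(word_split_args)  # ['docs/My', 'Notes.md', '.github/workflows/ci.yml']
print(array_args)       # ['docs/My Notes.md', '.github/workflows/ci.yml']
```

With the split version, the checker would receive `docs/My` and `Notes.md`, neither of which is the real file, so a sensitive file with a space in its name could slip past the patterns.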

In @.github/workflows/config/sensitive_files.txt:
- Around line 2-23: The regex entries in sensitive_files.txt use unescaped dots
and broad patterns (e.g., ".flake8$", ".pydocstyle$", ".env..*$", "^.github/.*",
"^.husky/.*", "scripts/.*", "src/style/.*", ".gitignore$", "env.example$",
".eslintignore$", ".prettierrc$", ".prettierignore$", and any ".*.pem$" style
entries) which causes over‑matching; update each pattern to escape literal dots
(e.g., use "\." for ".") and ensure anchors are correct (keep leading "^" and
trailing "$" where intended) so entries match exact filenames or paths rather
than any character sequences, and apply this change consistently across all
similar lines mentioned in the diff.
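A quick illustration of the over-matching, assuming the script applies patterns with `re.search` (which mirrors the old `grep -E` loop). The paths are hypothetical look-alikes.

```python
import re

# The first path is the real config file; the others are look-alikes that an
# unescaped "." would also accept.
paths = [".flake8", "xflake8", "src/myflake8"]

unescaped = re.compile(r".flake8$")  # "." matches any single character
escaped = re.compile(r"\.flake8$")   # "\." matches only a literal dot

loose = [p for p in paths if unescaped.search(p)]
strict = [p for p in paths if escaped.search(p)]

print(loose)   # ['.flake8', 'xflake8', 'src/myflake8']
print(strict)  # ['.flake8']
```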

In @.github/workflows/scripts/sensitive_file_check.py:
- Around line 55-66: The code currently normalizes only directory-walked file
paths (using rel_path) causing direct file paths to remain absolute and miss
regex matches; update the loop where you handle os.path.isfile(path) and where
you append full_path/rel_path so that you call os.path.relpath (and optionally
os.path.normpath) on both file and directory entries before appending to
files_to_check; look for symbols files_to_check, paths, os.path.isfile,
os.path.isdir, full_path and rel_path and ensure all appended paths are
consistently normalized relative to os.getcwd().
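A small illustration of the false negative described above. It calls `os.path.relpath` with an explicit start directory so the result does not depend on where it runs; the checkout path `/home/runner/work/repo` is a made-up example of a POSIX runner's working directory.

```python
import os.path
import re

pattern = re.compile(r"^\.github/.*")
checkout = "/home/runner/work/repo"  # hypothetical repository root (the CWD)
absolute = "/home/runner/work/repo/.github/workflows/ci.yml"

# Without normalization, the anchored pattern never sees ".github/" at the start.
before = bool(pattern.search(absolute))

# After relpath, the path starts with ".github/" and the pattern matches.
relative = os.path.relpath(absolute, checkout)
after = bool(pattern.search(relative))

print(before, relative, after)  # False .github/workflows/ci.yml True
```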

In @.github/workflows/scripts/test/test_sensitive_file_check.py:
- Around line 51-53: The test patch for os.path.relpath uses an unused lambda
argument named start which causes lint failures; update the mock in the test
(the patch around os.path.relpath used when calling
sensitive_file_check.get_files_to_check) to rename the unused parameter to
_start (or *_), e.g. change lambda p, start: p.replace("/root/", "") to lambda
p, _start: p.replace("/root/", "") so Ruff stops flagging the unused argument.
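A minimal sketch of the pattern being linted: `unittest.mock.patch` with a `side_effect` lambda that ignores its second argument. Renaming that parameter to `_start` keeps the signature the same (the caller still passes it positionally) while marking it as intentionally unused. The `normalize` helper is hypothetical and stands in for the `relpath` call inside `get_files_to_check`.

```python
import os
from unittest import mock


def normalize(path):
    """Hypothetical stand-in for the script's relpath call."""
    return os.path.relpath(path, os.getcwd())


# The second parameter is required by the call signature but deliberately
# unused, so it gets a leading underscore.
with mock.patch("os.path.relpath", side_effect=lambda p, _start: p.replace("/root/", "")):
    result = normalize("/root/.github/workflows/ci.yml")

print(result)  # .github/workflows/ci.yml
```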

🧹 Nitpick comments (1)

.github/workflows/scripts/sensitive_file_check.py (1)
26-38: Narrow the broad exception in load_patterns.
Catching Exception (Line 36) can mask programming errors and is typically discouraged by linters. Prefer OSError/UnicodeError for file I/O and decoding issues.
♻️ Proposed fix
-    except Exception as error:  # pylint: disable=broad-except
+    except (OSError, UnicodeError) as error:
         print(f"Error reading configuration file: {error}")
         sys.exit(1)
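To show what narrowing the handler buys, here is a hedged sketch of a `load_patterns` that catches only `(OSError, UnicodeError)`. Returning the stripped, non-blank lines is an assumption about the config format, not the PR's exact behavior. A missing file or an undecodable file still produces the friendly message and exit code 1, but a programming error such as passing `None` raises its normal `TypeError` instead of being reported as a config problem.

```python
import sys


def load_patterns(config_path):
    """Return the non-blank lines of the config file.

    Only I/O and decoding failures count as configuration errors; anything
    else (e.g. a TypeError from a bad argument) surfaces as a normal traceback.
    """
    try:
        with open(config_path, encoding="utf-8") as handle:
            lines = handle.read().splitlines()
    except (OSError, UnicodeError) as error:
        print(f"Error reading configuration file: {error}")
        sys.exit(1)
    return [line.strip() for line in lines if line.strip()]
```

`UnicodeDecodeError` is a subclass of `UnicodeError`, so a config file that is not valid UTF-8 is still handled by the narrower clause.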

coderabbitai · 2026-01-22T07:32:34Z

.github/workflows/check-sensitive-files.yml

      - name: Detect sensitive file changes
        if: steps.check-labels.outputs.skip != 'true'
        id: sensitive-check
        run: |
          # Skip if not a PR
          if [ -z "${{ github.event.pull_request.base.sha }}" ]; then
-            echo "any_changed=false" >> $GITHUB_OUTPUT
            exit 0
          fi

          HEAD_SHA="${{ github.event.pull_request.head.sha || github.sha }}"
          BASE_SHA=$(git merge-base "${{ github.event.pull_request.base.sha }}" "$HEAD_SHA")

-          SENSITIVE_PATTERNS=(
-            '.flake8$'
-            '.pydocstyle$'
-            'pyproject.toml$'
-            '.env..*$'
-            'vitest.config.js$'
-            'src/App.tsx$'
-            '^.github/.*'
-            '^.husky/.*'
-            '^scripts/.*'
-            '^src/style/.*'
-            'schema.graphql$'
-            'package.json$'
-            'package-lock.json$'
-            'tsconfig.json$'
-            '^.gitignore$'
-            '^env.example$'
-            '.node-version$'
-            '.eslintrc.json$'
-            '.eslintignore$'
-            '.prettierrc$'
-            '.prettierignore$'
-            'vite.config.ts$'
-            '^docker/docker-compose.prod.yaml$'
-            '^docker/docker-compose.dev.yaml$'
-            '^docker/Dockerfile.dev$'
-            '^docker/Dockerfile.prod$'
-            '^config/docker/setup/nginx.conf$'
-            '^config/docker/setup/nginx.prod.conf$'
-            'CODEOWNERS$'
-            'LICENSE$'
-            'setup.ts$'
-            '.coderabbit.yaml$'
-            'CODE_OF_CONDUCT.md$'
-            'CODE_STYLE.md$'
-            'CONTRIBUTING.md$'
-            'DOCUMENTATION.md$'
-            'INSTALLATION.md$'
-            'ISSUE_GUIDELINES.md$'
-            'PR_GUIDELINES.md$'
-            'README.md$'
-            'index.html$'
-            '.*.pem$'
-            '.*.key$'
-            '.*.cert$'
-            '.*.password$'
-            '.*.secret$'
-            '.*.credentials$'
-            '.nojekyll$'
-            'yarn.lock$'
-            'knip.json$'
-            'knip.deps.json$'
-            '^docs/docusaurus.config.ts$'
-            '^docs/sidebar..*'
-            'CNAME$'
-          )
+          CHANGED_FILES="$(git diff --name-only --diff-filter=ACMRD "$BASE_SHA" "$HEAD_SHA" | tr '\n' ' ')"

-          CHANGED_FILES="$(git diff --name-only --diff-filter=ACMRD "$BASE_SHA" "$HEAD_SHA")"
-          UNAUTHORIZED=""
-
-          for pattern in "${SENSITIVE_PATTERNS[@]}"; do
-            MATCHES=$(echo "$CHANGED_FILES" | grep -E "$pattern" || true)
-            if [ -n "$MATCHES" ]; then
-              UNAUTHORIZED="$UNAUTHORIZED $MATCHES"
+          if [ -n "$CHANGED_FILES" ]; then
+            if ! python3 .github/workflows/scripts/sensitive_file_check.py --config .github/workflows/config/sensitive_files.txt --files $CHANGED_FILES; then
+              echo ""
+              echo "To override, add the '${{ inputs.skip-label }}' label to this PR."
+              exit 1


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify pull-request.yml includes the check-sensitive-files prerequisite.
fd -a '^pull-request\.yml$' .github/workflows
rg -n "Pre-Test-Checks-Pass|check-sensitive-files" .github/workflows/pull-request.yml

Repository: PalisadoesFoundation/.github

Length of output: 146

🏁 Script executed:

#!/bin/bash
# List all workflow files
fd -a '\.ya?ml$' .github/workflows | head -20
# Search for files mentioning Pre-Test-Checks-Pass or check-sensitive-files across all workflows
rg "Pre-Test-Checks-Pass|check-sensitive-files" .github/workflows/ --type yaml

Repository: PalisadoesFoundation/.github

Length of output: 825

🏁 Script executed:

#!/bin/bash
# Check the pull-request-review.yml and pull-request-target.yml files
echo "=== pull-request-review.yml ==="
cat .github/workflows/pull-request-review.yml
echo ""
echo "=== pull-request-target.yml ==="
cat .github/workflows/pull-request-target.yml

Repository: PalisadoesFoundation/.github

Length of output: 5119

The check-sensitive-files workflow must be integrated as a prerequisite to the PR workflow.
The file .github/workflows/pull-request.yml does not exist. The actual PR workflows are pull-request-review.yml and pull-request-target.yml, neither of which currently reference or run check-sensitive-files. This workflow needs to be wired into the PR process to ensure sensitive file changes are detected before PR approval.



coderabbitai · 2026-01-22T07:32:35Z

.github/workflows/scripts/sensitive_file_check.py

+    files_to_check = []
+    for path in paths:
+        if os.path.isfile(path):
+            files_to_check.append(path)
+        elif os.path.isdir(path):
+            for root, _, files in os.walk(path):
+                for file in files:
+                    full_path = os.path.join(root, file)
+                    # Normalize path to be relative to CWD if possible
+                    rel_path = os.path.relpath(full_path, os.getcwd())
+                    files_to_check.append(rel_path)
+        else:


⚠️ Potential issue | 🟡 Minor

Normalize file paths consistently before pattern matching.
Right now, directories are normalized to CWD, but direct file paths are not (Line 57). If callers pass absolute file paths, regex patterns like ^\.github/.* won’t match, leading to false negatives.

💡 Suggested fix

-        if os.path.isfile(path):
-            files_to_check.append(path)
+        if os.path.isfile(path):
+            files_to_check.append(os.path.relpath(path, os.getcwd()))


Suggested change

     files_to_check = []
     for path in paths:
         if os.path.isfile(path):
-            files_to_check.append(path)
+            files_to_check.append(os.path.relpath(path, os.getcwd()))
         elif os.path.isdir(path):
             for root, _, files in os.walk(path):
                 for file in files:
                     full_path = os.path.join(root, file)
                     # Normalize path to be relative to CWD if possible
                     rel_path = os.path.relpath(full_path, os.getcwd())
                     files_to_check.append(rel_path)
         else:


coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

.github/workflows/check-sensitive-files.yml (1)

3-10: Add this workflow as a prerequisite job to the PR workflows.

The check-sensitive-files workflow is correctly configured as a reusable workflow (workflow_call), but it is not currently called from any PR workflows. According to issue #34, it must be integrated as a prerequisite to both pull-request-review.yml and pull-request-target.yml. Add a job that calls this workflow in both files using the uses directive.

🤖 Fix all issues with AI agents

In @.github/workflows/check-sensitive-files.yml:
- Around line 62-64: Fix the bash array-length test for CHANGED_FILES: the
conditional contains literal backticks (${`#CHANGED_FILES`[@]}), which is not a
valid parameter expansion. Replace it with the proper array-length expansion
${#CHANGED_FILES[@]} so the if [ ... -gt 0 ] check works, and update the
conditional that invokes python3 sensitive_file_check.py accordingly (the symbol
to locate is the CHANGED_FILES array and the surrounding if [ ... ] test).

🧹 Nitpick comments (1)

.github/workflows/config/sensitive_files.txt (1)
1-55: LGTM overall – patterns are properly escaped based on past review feedback.

Most patterns now correctly escape literal dots. A few minor inconsistencies remain:

Line 5: ^\.env..*$ – the dot before .* should be escaped as ^\.env\..*$ to match files like .env.local rather than .envXlocal.

Line 33: .coderabbit.yaml$ – missing ^ anchor, and both dots are unescaped. Should be ^\.coderabbit\.yaml$.

Line 54: ^docs/sidebar..* – unescaped dot before .*. Should be ^docs/sidebar\..*.

Inconsistent anchoring: Some patterns lack the ^ start anchor (e.g., lines 6-7, 12-15, 23, 30-32, etc.), which means they'll match anywhere in the path. This may be intentional for flexibility, but worth confirming.
💡 Suggested fixes for remaining issues
-^\.env..*$
+^\.env\..*$

-.coderabbit.yaml$
+^\.coderabbit\.yaml$

-^docs/sidebar..*
+^docs/sidebar\..*
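The effect of the remaining fixes and of the anchoring question can be checked quickly, again assuming `re.search` semantics like the old `grep -E` loop. The paths are hypothetical; whether unanchored patterns such as `README.md$` are meant to match in every directory is the open question raised above, and this only shows the difference.

```python
import re


def matches(pattern, path):
    """Search anywhere in the path, like the old `grep -E` loop (assumed semantics)."""
    return re.search(pattern, path) is not None


# Escaping the second dot means only ".env.<something>" names qualify.
env_loose = matches(r"^\.env..*$", ".envXlocal")    # True: "." accepts "X"
env_strict = matches(r"^\.env\..*$", ".envXlocal")  # False: needs a literal "."
env_target = matches(r"^\.env\..*$", ".env.local")  # True: intended match kept

# Without "^", a pattern matches the file name in any directory.
readme_anywhere = matches(r"README\.md$", "docs/README.md")    # True
readme_root_only = matches(r"^README\.md$", "docs/README.md")  # False

print(env_loose, env_strict, env_target, readme_anywhere, readme_root_only)
```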

coderabbitai · 2026-01-22T08:05:21Z

.github/workflows/check-sensitive-files.yml

+          mapfile -t CHANGED_FILES < <(git diff --name-only --diff-filter=ACMRD "$BASE_SHA" "$HEAD_SHA")
+          if [ "${`#CHANGED_FILES`[@]}" -gt 0 ]; then
+            if ! python3 .github/workflows/scripts/sensitive_file_check.py --config .github/workflows/config/sensitive_files.txt --files "${CHANGED_FILES[@]}"; then


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the actual syntax in the workflow file
sed -n '62,64p' .github/workflows/check-sensitive-files.yml | cat -A

Repository: PalisadoesFoundation/.github

Length of output: 398

🏁 Script executed:

#!/bin/bash
# Also get broader context around those lines
sed -n '60,66p' .github/workflows/check-sensitive-files.yml

Repository: PalisadoesFoundation/.github

Length of output: 598

Fix syntax error in array length check on line 63.

The file contains literal backticks in the array-length expression (${`#CHANGED_FILES`[@]}), which is invalid bash and makes the workflow step fail. Change it to:

if [ "${#CHANGED_FILES[@]}" -gt 0 ]; then


nilanjan-sikdar added 2 commits January 22, 2026 12:19

feat: Central Python Script to detect sensitive files

f2fac0c

feat: Central Python Script to detect sensitive files-2

da81058

coderabbitai bot requested changes Jan 22, 2026


feat : resolved coderabbit suggestion

9a21408

coderabbitai bot requested changes Jan 22, 2026

