Skip to content

Conversation

@nilanjan-sikdar
Copy link

@nilanjan-sikdar nilanjan-sikdar commented Jan 22, 2026

Add

  1. Separated Code and Config: Created .github/workflows/scripts/sensitive_file_check.py and .github/workflows/config/sensitive_files.txt to decouple the checking logic from the regex patterns.

  2. Updated Workflow: Modified .github/workflows/check-sensitive-files.yml to use the new Python script instead of inline Bash commands.

  3. Added Tests: Implemented unit tests in .github/workflows/scripts/test/test_sensitive_file_check.py to ensure the script works correctly.

Resolves #34

Summary by CodeRabbit

  • Chores

    • Improved CI security with a new Python-based sensitive-file validator driven by a curated pattern list.
    • Replaced large static checks with dynamic changed-file detection; enforcement now follows the validator's exit code and retains override messaging.
    • Preserved early exit for non-PR contexts and simplified workflow flow.
  • Tests

    • Added unit tests covering pattern loading, file discovery, and sensitive-file detection behavior.

✏️ Tip: You can customize this high-level summary in your review settings.

@github-actions
Copy link

Our Pull Request Approval Process

This PR will be reviewed according to our:

  1. Palisadoes Contributing Guidelines

  2. AI Usage Policy

Your PR may be automatically closed if:

  1. Our PR template isn't filled in correctly
  2. You haven't correctly linked your PR to an issue

👉 https://docs.github.com/en/issues/tracking-your-work-with-issues/using-issues/linking-a-pull-request-to-an-issue

Thanks for contributing!

@coderabbitai
Copy link

coderabbitai bot commented Jan 22, 2026

Walkthrough

Refactors the sensitive-file check: moves pattern list into .github/workflows/config/sensitive_files.txt, adds .github/workflows/scripts/sensitive_file_check.py to validate changed files, and updates the GitHub Actions workflow to detect changed files via git and invoke the script; script exit code now enforces failures.

Changes

Cohort / File(s) Summary
Workflow
​.github/workflows/check-sensitive-files.yml
Replaced inline sensitive-pattern loop with git-based changed-file detection and a single invocation of sensitive_file_check.py; removed previous GH_OUTPUT emission and separate failure step; retains non-PR early exit and override message on script failure.
Config
​.github/workflows/config/sensitive_files.txt
New canonical list of sensitive file/glob/regex patterns used by the script.
Script
​.github/workflows/scripts/sensitive_file_check.py
New CLI Python utility: load_patterns(), get_files_to_check(), check_files(), and main() using argparse; loads regex config, expands directories/files, reports matches and exits non‑zero on sensitive matches.
Tests
​.github/workflows/scripts/test/test_sensitive_file_check.py
New unit tests covering pattern loading, path resolution, regex matching, error handling; uses mocks for filesystem and IO.

Sequence Diagram(s)

sequenceDiagram
  participant Actions as GitHub Actions
  participant Git as git
  participant Runner as Runner FS
  participant Script as sensitive_file_check.py

  Actions->>Git: git diff --name-only origin/main...HEAD
  Git-->>Actions: list of changed file paths (CHANGED_FILES)
  Actions->>Runner: pass CHANGED_FILES + config path
  Runner->>Script: execute sensitive_file_check.py --config config/... --files <paths>
  Script->>Runner: open/read config file
  Script->>Runner: walk filesystem for provided paths
  Script->>Script: match files against compiled regex patterns
  alt matches found
    Script-->>Actions: exit code 1 + printed override instructions
    Actions-->>Actions: job fails (script exit enforces failure)
  else no matches
    Script-->>Actions: exit code 0 (no sensitive files)
    Actions-->>Actions: job continues
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hopped through configs and scripts with delight,
Patterns now tidy and kept out of sight.
Git lists the changes, the Python inspects,
Regex finds secrets and kindly directs.
A rabbit-approved check—swift, simple, and bright!

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'central detect sensitive files' is vague and generic, using imprecise terminology that doesn't clearly convey the main change. Improve the title to be more specific and clear, such as 'Refactor sensitive file detection into centralized Python script' or 'Extract sensitive file patterns to configuration and Python script'.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed The pull request fulfills all core coding requirements from issue #34: creates central Python script with argparse, stores patterns in config file, implements multi-file/directory support, adds unit tests, and maintains functional parity with existing check.
Out of Scope Changes check ✅ Passed All changes are directly related to issue #34 objectives. No out-of-scope modifications detected beyond the intended refactoring.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In @.github/workflows/check-sensitive-files.yml:
- Around line 50-68: Add the sensitive-file check as a prerequisite to both PR
workflows by making the existing check workflow reusable and invoking it from
pull-request-review.yml and pull-request-target.yml: update
.github/workflows/check-sensitive-files.yml to support being called (on:
workflow_call) if it isn’t already, then in both pull-request-review.yml and
pull-request-target.yml add a job that uses
./.github/workflows/check-sensitive-files.yml with id sensitive-check (or
preserve the job id "sensitive-check") and make the rest of the PR jobs depend
on it via needs: [sensitive-check] so the "Detect sensitive file changes" job
runs and blocks the PR jobs when it fails.
- Around line 62-68: CHANGED_FILES is built by joining filenames with spaces
which breaks filenames that contain spaces; change the logic to read git diff
output into a bash array (use git diff --name-only --diff-filter=ACMRD
"$BASE_SHA" "$HEAD_SHA" piped into a null/line-safe reader) and then call the
Python checker with the array expanded so each filename is a single argument
(use "${CHANGED_FILES[@]}"); also update the emptiness check to test the array
length (e.g., ${`#CHANGED_FILES`[@]} > 0) before invoking sensitive_file_check.py
with --files to ensure filenames containing spaces are preserved and handled
correctly.

In @.github/workflows/config/sensitive_files.txt:
- Around line 2-23: The regex entries in sensitive_files.txt use unescaped dots
and broad patterns (e.g., ".flake8$", ".pydocstyle$", ".env..*$", "^.github/.*",
"^.husky/.*", "scripts/.*", "src/style/.*", ".gitignore$", "env.example$",
".eslintignore$", ".prettierrc$", ".prettierignore$", and any ".*.pem$" style
entries) which causes over‑matching; update each pattern to escape literal dots
(e.g., use "\." for ".") and ensure anchors are correct (keep leading "^" and
trailing "$" where intended) so entries match exact filenames or paths rather
than any character sequences, and apply this change consistently across all
similar lines mentioned in the diff.

In @.github/workflows/scripts/sensitive_file_check.py:
- Around line 55-66: The code currently normalizes only directory-walked file
paths (using rel_path) causing direct file paths to remain absolute and miss
regex matches; update the loop where you handle os.path.isfile(path) and where
you append full_path/rel_path so that you call os.path.relpath (and optionally
os.path.normpath) on both file and directory entries before appending to
files_to_check; look for symbols files_to_check, paths, os.path.isfile,
os.path.isdir, full_path and rel_path and ensure all appended paths are
consistently normalized relative to os.getcwd().

In @.github/workflows/scripts/test/test_sensitive_file_check.py:
- Around line 51-53: The test patch for os.path.relpath uses an unused lambda
argument named start which causes lint failures; update the mock in the test
(the patch around os.path.relpath used when calling
sensitive_file_check.get_files_to_check) to rename the unused parameter to
_start (or *_), e.g. change lambda p, start: p.replace("/root/", "") to lambda
p, _start: p.replace("/root/", "") so Ruff stops flagging the unused argument.
🧹 Nitpick comments (1)
.github/workflows/scripts/sensitive_file_check.py (1)

26-38: Narrow the broad exception in load_patterns.
Catching Exception (Line 36) can mask programming errors and is typically discouraged by linters. Prefer OSError/UnicodeError for file I/O and decoding issues.

♻️ Proposed fix
-    except Exception as error:  # pylint: disable=broad-except
+    except (OSError, UnicodeError) as error:
         print(f"Error reading configuration file: {error}")
         sys.exit(1)

Comment on lines 50 to 68
- name: Detect sensitive file changes
if: steps.check-labels.outputs.skip != 'true'
id: sensitive-check
run: |
# Skip if not a PR
if [ -z "${{ github.event.pull_request.base.sha }}" ]; then
echo "any_changed=false" >> $GITHUB_OUTPUT
exit 0
fi
HEAD_SHA="${{ github.event.pull_request.head.sha || github.sha }}"
BASE_SHA=$(git merge-base "${{ github.event.pull_request.base.sha }}" "$HEAD_SHA")
SENSITIVE_PATTERNS=(
'.flake8$'
'.pydocstyle$'
'pyproject.toml$'
'.env..*$'
'vitest.config.js$'
'src/App.tsx$'
'^.github/.*'
'^.husky/.*'
'^scripts/.*'
'^src/style/.*'
'schema.graphql$'
'package.json$'
'package-lock.json$'
'tsconfig.json$'
'^.gitignore$'
'^env.example$'
'.node-version$'
'.eslintrc.json$'
'.eslintignore$'
'.prettierrc$'
'.prettierignore$'
'vite.config.ts$'
'^docker/docker-compose.prod.yaml$'
'^docker/docker-compose.dev.yaml$'
'^docker/Dockerfile.dev$'
'^docker/Dockerfile.prod$'
'^config/docker/setup/nginx.conf$'
'^config/docker/setup/nginx.prod.conf$'
'CODEOWNERS$'
'LICENSE$'
'setup.ts$'
'.coderabbit.yaml$'
'CODE_OF_CONDUCT.md$'
'CODE_STYLE.md$'
'CONTRIBUTING.md$'
'DOCUMENTATION.md$'
'INSTALLATION.md$'
'ISSUE_GUIDELINES.md$'
'PR_GUIDELINES.md$'
'README.md$'
'index.html$'
'.*.pem$'
'.*.key$'
'.*.cert$'
'.*.password$'
'.*.secret$'
'.*.credentials$'
'.nojekyll$'
'yarn.lock$'
'knip.json$'
'knip.deps.json$'
'^docs/docusaurus.config.ts$'
'^docs/sidebar..*'
'CNAME$'
)
CHANGED_FILES="$(git diff --name-only --diff-filter=ACMRD "$BASE_SHA" "$HEAD_SHA" | tr '\n' ' ')"
CHANGED_FILES="$(git diff --name-only --diff-filter=ACMRD "$BASE_SHA" "$HEAD_SHA")"
UNAUTHORIZED=""
for pattern in "${SENSITIVE_PATTERNS[@]}"; do
MATCHES=$(echo "$CHANGED_FILES" | grep -E "$pattern" || true)
if [ -n "$MATCHES" ]; then
UNAUTHORIZED="$UNAUTHORIZED $MATCHES"
if [ -n "$CHANGED_FILES" ]; then
if ! python3 .github/workflows/scripts/sensitive_file_check.py --config .github/workflows/config/sensitive_files.txt --files $CHANGED_FILES; then
echo ""
echo "To override, add the '${{ inputs.skip-label }}' label to this PR."
exit 1
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify pull-request.yml includes the check-sensitive-files prerequisite.
fd -a '^pull-request\.yml$' .github/workflows
rg -n "Pre-Test-Checks-Pass|check-sensitive-files" .github/workflows/pull-request.yml

Repository: PalisadoesFoundation/.github

Length of output: 146


🏁 Script executed:

#!/bin/bash
# List all workflow files
fd -a '\.ya?ml$' .github/workflows | head -20

# Search for files mentioning Pre-Test-Checks-Pass or check-sensitive-files across all workflows
rg "Pre-Test-Checks-Pass|check-sensitive-files" .github/workflows/ --type yaml

Repository: PalisadoesFoundation/.github

Length of output: 825


🏁 Script executed:

#!/bin/bash
# Check the pull-request-review.yml and pull-request-target.yml files
echo "=== pull-request-review.yml ===" 
cat .github/workflows/pull-request-review.yml

echo ""
echo "=== pull-request-target.yml ==="
cat .github/workflows/pull-request-target.yml

Repository: PalisadoesFoundation/.github

Length of output: 5119


The check-sensitive-files workflow must be integrated as a prerequisite to the PR workflow.
The file .github/workflows/pull-request.yml does not exist. The actual PR workflows are pull-request-review.yml and pull-request-target.yml, neither of which currently reference or run check-sensitive-files. This workflow needs to be wired into the PR process to ensure sensitive file changes are detected before PR approval.

🤖 Prompt for AI Agents
In @.github/workflows/check-sensitive-files.yml around lines 50 - 68, Add the
sensitive-file check as a prerequisite to both PR workflows by making the
existing check workflow reusable and invoking it from pull-request-review.yml
and pull-request-target.yml: update .github/workflows/check-sensitive-files.yml
to support being called (on: workflow_call) if it isn’t already, then in both
pull-request-review.yml and pull-request-target.yml add a job that uses
./.github/workflows/check-sensitive-files.yml with id sensitive-check (or
preserve the job id "sensitive-check") and make the rest of the PR jobs depend
on it via needs: [sensitive-check] so the "Detect sensitive file changes" job
runs and blocks the PR jobs when it fails.

Comment on lines 55 to 66
files_to_check = []
for path in paths:
if os.path.isfile(path):
files_to_check.append(path)
elif os.path.isdir(path):
for root, _, files in os.walk(path):
for file in files:
full_path = os.path.join(root, file)
# Normalize path to be relative to CWD if possible
rel_path = os.path.relpath(full_path, os.getcwd())
files_to_check.append(rel_path)
else:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Normalize file paths consistently before pattern matching.
Right now, directories are normalized to CWD, but direct file paths are not (Line 57). If callers pass absolute file paths, regex patterns like ^\.github/.* won’t match, leading to false negatives.

💡 Suggested fix
-        if os.path.isfile(path):
-            files_to_check.append(path)
+        if os.path.isfile(path):
+            files_to_check.append(os.path.relpath(path, os.getcwd()))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
files_to_check = []
for path in paths:
if os.path.isfile(path):
files_to_check.append(path)
elif os.path.isdir(path):
for root, _, files in os.walk(path):
for file in files:
full_path = os.path.join(root, file)
# Normalize path to be relative to CWD if possible
rel_path = os.path.relpath(full_path, os.getcwd())
files_to_check.append(rel_path)
else:
files_to_check = []
for path in paths:
if os.path.isfile(path):
files_to_check.append(os.path.relpath(path, os.getcwd()))
elif os.path.isdir(path):
for root, _, files in os.walk(path):
for file in files:
full_path = os.path.join(root, file)
# Normalize path to be relative to CWD if possible
rel_path = os.path.relpath(full_path, os.getcwd())
files_to_check.append(rel_path)
else:
🤖 Prompt for AI Agents
In @.github/workflows/scripts/sensitive_file_check.py around lines 55 - 66, The
code currently normalizes only directory-walked file paths (using rel_path)
causing direct file paths to remain absolute and miss regex matches; update the
loop where you handle os.path.isfile(path) and where you append
full_path/rel_path so that you call os.path.relpath (and optionally
os.path.normpath) on both file and directory entries before appending to
files_to_check; look for symbols files_to_check, paths, os.path.isfile,
os.path.isdir, full_path and rel_path and ensure all appended paths are
consistently normalized relative to os.getcwd().

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/check-sensitive-files.yml (1)

3-10: Add this workflow as a prerequisite job to the PR workflows.

The check-sensitive-files workflow is correctly configured as a reusable workflow (workflow_call), but it is not currently called from any PR workflows. According to issue #34, it must be integrated as a prerequisite to both pull-request-review.yml and pull-request-target.yml. Add a job that calls this workflow in both files using the uses directive.

🤖 Fix all issues with AI agents
In @.github/workflows/check-sensitive-files.yml:
- Around line 62-64: Fix the bash array-length test for CHANGED_FILES: replace
the invalid backticked expression used in the conditional (the if test
referencing `${`#CHANGED_FILES`[@]}`) with the proper array-length expansion
`${`#CHANGED_FILES`[@]}` so the if [ ... -gt 0 ] check works; update the
conditional that invokes python3 sensitive_file_check.py accordingly (the symbol
to locate is the CHANGED_FILES array and the surrounding if [ ... ] test).
🧹 Nitpick comments (1)
.github/workflows/config/sensitive_files.txt (1)

1-55: LGTM overall – patterns are properly escaped based on past review feedback.

Most patterns now correctly escape literal dots. A few minor inconsistencies remain:

  1. Line 5: ^\.env..*$ – the dot before .* should be escaped as ^\.env\..*$ to match files like .env.local rather than .envXlocal.

  2. Line 33: .coderabbit.yaml$ – missing ^ anchor and unescaped leading dot. Should be ^\.coderabbit\.yaml$.

  3. Line 54: ^docs/sidebar..* – unescaped dot before .*. Should be ^docs/sidebar\..*.

  4. Inconsistent anchoring: Some patterns lack the ^ start anchor (e.g., lines 6-7, 12-15, 23, 30-32, etc.), which means they'll match anywhere in the path. This may be intentional for flexibility, but worth confirming.

💡 Suggested fixes for remaining issues
-^\.env..*$
+^\.env\..*$

-.coderabbit.yaml$
+^\.coderabbit\.yaml$

-^docs/sidebar..*
+^docs/sidebar\..*

Comment on lines +62 to +64
mapfile -t CHANGED_FILES < <(git diff --name-only --diff-filter=ACMRD "$BASE_SHA" "$HEAD_SHA")
if [ "${`#CHANGED_FILES`[@]}" -gt 0 ]; then
if ! python3 .github/workflows/scripts/sensitive_file_check.py --config .github/workflows/config/sensitive_files.txt --files "${CHANGED_FILES[@]}"; then
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the actual syntax in the workflow file
sed -n '62,64p' .github/workflows/check-sensitive-files.yml | cat -A

Repository: PalisadoesFoundation/.github

Length of output: 398


🏁 Script executed:

#!/bin/bash
# Also get broader context around those lines
sed -n '60,66p' .github/workflows/check-sensitive-files.yml

Repository: PalisadoesFoundation/.github

Length of output: 598


Fix syntax error in array length check on line 63.

The file contains backticks in the array length syntax: ${#CHANGED_FILES[@]}, which is invalid bash. Backticks trigger command substitution, causing the workflow to fail. Change to:

if [ "${`#CHANGED_FILES`[@]}" -gt 0 ]; then
🤖 Prompt for AI Agents
In @.github/workflows/check-sensitive-files.yml around lines 62 - 64, Fix the
bash array-length test for CHANGED_FILES: replace the invalid backticked
expression used in the conditional (the if test referencing
`${`#CHANGED_FILES`[@]}`) with the proper array-length expansion
`${`#CHANGED_FILES`[@]}` so the if [ ... -gt 0 ] check works; update the
conditional that invokes python3 sensitive_file_check.py accordingly (the symbol
to locate is the CHANGED_FILES array and the surrounding if [ ... ] test).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Central Python Script to detect sensitive files

1 participant