-
Notifications
You must be signed in to change notification settings - Fork 7
central detect sensitive files #40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
central detect sensitive files #40
Conversation
Our Pull Request Approval ProcessThis PR will be reviewed according to our: Your PR may be automatically closed if:
Thanks for contributing! |
WalkthroughRefactors the sensitive-file check: moves pattern list into Changes
Sequence Diagram(s)sequenceDiagram
participant Actions as GitHub Actions
participant Git as git
participant Runner as Runner FS
participant Script as sensitive_file_check.py
Actions->>Git: git diff --name-only origin/main...HEAD
Git-->>Actions: list of changed file paths (CHANGED_FILES)
Actions->>Runner: pass CHANGED_FILES + config path
Runner->>Script: execute sensitive_file_check.py --config config/... --files <paths>
Script->>Runner: open/read config file
Script->>Runner: walk filesystem for provided paths
Script->>Script: match files against compiled regex patterns
alt matches found
Script-->>Actions: exit code 1 + printed override instructions
Actions-->>Actions: job fails (script exit enforces failure)
else no matches
Script-->>Actions: exit code 0 (no sensitive files)
Actions-->>Actions: job continues
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In @.github/workflows/check-sensitive-files.yml:
- Around line 50-68: Add the sensitive-file check as a prerequisite to both PR
workflows by making the existing check workflow reusable and invoking it from
pull-request-review.yml and pull-request-target.yml: update
.github/workflows/check-sensitive-files.yml to support being called (on:
workflow_call) if it isn’t already, then in both pull-request-review.yml and
pull-request-target.yml add a job that uses
./.github/workflows/check-sensitive-files.yml with id sensitive-check (or
preserve the job id "sensitive-check") and make the rest of the PR jobs depend
on it via needs: [sensitive-check] so the "Detect sensitive file changes" job
runs and blocks the PR jobs when it fails.
- Around line 62-68: CHANGED_FILES is built by joining filenames with spaces
which breaks filenames that contain spaces; change the logic to read git diff
output into a bash array (use git diff --name-only --diff-filter=ACMRD
"$BASE_SHA" "$HEAD_SHA" piped into a null/line-safe reader) and then call the
Python checker with the array expanded so each filename is a single argument
(use "${CHANGED_FILES[@]}"); also update the emptiness check to test the array
length (e.g., ${`#CHANGED_FILES`[@]} > 0) before invoking sensitive_file_check.py
with --files to ensure filenames containing spaces are preserved and handled
correctly.
In @.github/workflows/config/sensitive_files.txt:
- Around line 2-23: The regex entries in sensitive_files.txt use unescaped dots
and broad patterns (e.g., ".flake8$", ".pydocstyle$", ".env..*$", "^.github/.*",
"^.husky/.*", "scripts/.*", "src/style/.*", ".gitignore$", "env.example$",
".eslintignore$", ".prettierrc$", ".prettierignore$", and any ".*.pem$" style
entries) which causes over‑matching; update each pattern to escape literal dots
(e.g., use "\." for ".") and ensure anchors are correct (keep leading "^" and
trailing "$" where intended) so entries match exact filenames or paths rather
than any character sequences, and apply this change consistently across all
similar lines mentioned in the diff.
In @.github/workflows/scripts/sensitive_file_check.py:
- Around line 55-66: The code currently normalizes only directory-walked file
paths (using rel_path) causing direct file paths to remain absolute and miss
regex matches; update the loop where you handle os.path.isfile(path) and where
you append full_path/rel_path so that you call os.path.relpath (and optionally
os.path.normpath) on both file and directory entries before appending to
files_to_check; look for symbols files_to_check, paths, os.path.isfile,
os.path.isdir, full_path and rel_path and ensure all appended paths are
consistently normalized relative to os.getcwd().
In @.github/workflows/scripts/test/test_sensitive_file_check.py:
- Around line 51-53: The test patch for os.path.relpath uses an unused lambda
argument named start which causes lint failures; update the mock in the test
(the patch around os.path.relpath used when calling
sensitive_file_check.get_files_to_check) to rename the unused parameter to
_start (or *_), e.g. change lambda p, start: p.replace("/root/", "") to lambda
p, _start: p.replace("/root/", "") so Ruff stops flagging the unused argument.
🧹 Nitpick comments (1)
.github/workflows/scripts/sensitive_file_check.py (1)
26-38: Narrow the broad exception inload_patterns.
CatchingException(Line 36) can mask programming errors and is typically discouraged by linters. PreferOSError/UnicodeErrorfor file I/O and decoding issues.♻️ Proposed fix
- except Exception as error: # pylint: disable=broad-except + except (OSError, UnicodeError) as error: print(f"Error reading configuration file: {error}") sys.exit(1)
| - name: Detect sensitive file changes | ||
| if: steps.check-labels.outputs.skip != 'true' | ||
| id: sensitive-check | ||
| run: | | ||
| # Skip if not a PR | ||
| if [ -z "${{ github.event.pull_request.base.sha }}" ]; then | ||
| echo "any_changed=false" >> $GITHUB_OUTPUT | ||
| exit 0 | ||
| fi | ||
| HEAD_SHA="${{ github.event.pull_request.head.sha || github.sha }}" | ||
| BASE_SHA=$(git merge-base "${{ github.event.pull_request.base.sha }}" "$HEAD_SHA") | ||
| SENSITIVE_PATTERNS=( | ||
| '.flake8$' | ||
| '.pydocstyle$' | ||
| 'pyproject.toml$' | ||
| '.env..*$' | ||
| 'vitest.config.js$' | ||
| 'src/App.tsx$' | ||
| '^.github/.*' | ||
| '^.husky/.*' | ||
| '^scripts/.*' | ||
| '^src/style/.*' | ||
| 'schema.graphql$' | ||
| 'package.json$' | ||
| 'package-lock.json$' | ||
| 'tsconfig.json$' | ||
| '^.gitignore$' | ||
| '^env.example$' | ||
| '.node-version$' | ||
| '.eslintrc.json$' | ||
| '.eslintignore$' | ||
| '.prettierrc$' | ||
| '.prettierignore$' | ||
| 'vite.config.ts$' | ||
| '^docker/docker-compose.prod.yaml$' | ||
| '^docker/docker-compose.dev.yaml$' | ||
| '^docker/Dockerfile.dev$' | ||
| '^docker/Dockerfile.prod$' | ||
| '^config/docker/setup/nginx.conf$' | ||
| '^config/docker/setup/nginx.prod.conf$' | ||
| 'CODEOWNERS$' | ||
| 'LICENSE$' | ||
| 'setup.ts$' | ||
| '.coderabbit.yaml$' | ||
| 'CODE_OF_CONDUCT.md$' | ||
| 'CODE_STYLE.md$' | ||
| 'CONTRIBUTING.md$' | ||
| 'DOCUMENTATION.md$' | ||
| 'INSTALLATION.md$' | ||
| 'ISSUE_GUIDELINES.md$' | ||
| 'PR_GUIDELINES.md$' | ||
| 'README.md$' | ||
| 'index.html$' | ||
| '.*.pem$' | ||
| '.*.key$' | ||
| '.*.cert$' | ||
| '.*.password$' | ||
| '.*.secret$' | ||
| '.*.credentials$' | ||
| '.nojekyll$' | ||
| 'yarn.lock$' | ||
| 'knip.json$' | ||
| 'knip.deps.json$' | ||
| '^docs/docusaurus.config.ts$' | ||
| '^docs/sidebar..*' | ||
| 'CNAME$' | ||
| ) | ||
| CHANGED_FILES="$(git diff --name-only --diff-filter=ACMRD "$BASE_SHA" "$HEAD_SHA" | tr '\n' ' ')" | ||
| CHANGED_FILES="$(git diff --name-only --diff-filter=ACMRD "$BASE_SHA" "$HEAD_SHA")" | ||
| UNAUTHORIZED="" | ||
| for pattern in "${SENSITIVE_PATTERNS[@]}"; do | ||
| MATCHES=$(echo "$CHANGED_FILES" | grep -E "$pattern" || true) | ||
| if [ -n "$MATCHES" ]; then | ||
| UNAUTHORIZED="$UNAUTHORIZED $MATCHES" | ||
| if [ -n "$CHANGED_FILES" ]; then | ||
| if ! python3 .github/workflows/scripts/sensitive_file_check.py --config .github/workflows/config/sensitive_files.txt --files $CHANGED_FILES; then | ||
| echo "" | ||
| echo "To override, add the '${{ inputs.skip-label }}' label to this PR." | ||
| exit 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify pull-request.yml includes the check-sensitive-files prerequisite.
fd -a '^pull-request\.yml$' .github/workflows
rg -n "Pre-Test-Checks-Pass|check-sensitive-files" .github/workflows/pull-request.ymlRepository: PalisadoesFoundation/.github
Length of output: 146
🏁 Script executed:
#!/bin/bash
# List all workflow files
fd -a '\.ya?ml$' .github/workflows | head -20
# Search for files mentioning Pre-Test-Checks-Pass or check-sensitive-files across all workflows
rg "Pre-Test-Checks-Pass|check-sensitive-files" .github/workflows/ --type yamlRepository: PalisadoesFoundation/.github
Length of output: 825
🏁 Script executed:
#!/bin/bash
# Check the pull-request-review.yml and pull-request-target.yml files
echo "=== pull-request-review.yml ==="
cat .github/workflows/pull-request-review.yml
echo ""
echo "=== pull-request-target.yml ==="
cat .github/workflows/pull-request-target.ymlRepository: PalisadoesFoundation/.github
Length of output: 5119
The check-sensitive-files workflow must be integrated as a prerequisite to the PR workflow.
The file .github/workflows/pull-request.yml does not exist. The actual PR workflows are pull-request-review.yml and pull-request-target.yml, neither of which currently reference or run check-sensitive-files. This workflow needs to be wired into the PR process to ensure sensitive file changes are detected before PR approval.
🤖 Prompt for AI Agents
In @.github/workflows/check-sensitive-files.yml around lines 50 - 68, Add the
sensitive-file check as a prerequisite to both PR workflows by making the
existing check workflow reusable and invoking it from pull-request-review.yml
and pull-request-target.yml: update .github/workflows/check-sensitive-files.yml
to support being called (on: workflow_call) if it isn’t already, then in both
pull-request-review.yml and pull-request-target.yml add a job that uses
./.github/workflows/check-sensitive-files.yml with id sensitive-check (or
preserve the job id "sensitive-check") and make the rest of the PR jobs depend
on it via needs: [sensitive-check] so the "Detect sensitive file changes" job
runs and blocks the PR jobs when it fails.
| files_to_check = [] | ||
| for path in paths: | ||
| if os.path.isfile(path): | ||
| files_to_check.append(path) | ||
| elif os.path.isdir(path): | ||
| for root, _, files in os.walk(path): | ||
| for file in files: | ||
| full_path = os.path.join(root, file) | ||
| # Normalize path to be relative to CWD if possible | ||
| rel_path = os.path.relpath(full_path, os.getcwd()) | ||
| files_to_check.append(rel_path) | ||
| else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normalize file paths consistently before pattern matching.
Right now, directories are normalized to CWD, but direct file paths are not (Line 57). If callers pass absolute file paths, regex patterns like ^\.github/.* won’t match, leading to false negatives.
💡 Suggested fix
- if os.path.isfile(path):
- files_to_check.append(path)
+ if os.path.isfile(path):
+ files_to_check.append(os.path.relpath(path, os.getcwd()))📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| files_to_check = [] | |
| for path in paths: | |
| if os.path.isfile(path): | |
| files_to_check.append(path) | |
| elif os.path.isdir(path): | |
| for root, _, files in os.walk(path): | |
| for file in files: | |
| full_path = os.path.join(root, file) | |
| # Normalize path to be relative to CWD if possible | |
| rel_path = os.path.relpath(full_path, os.getcwd()) | |
| files_to_check.append(rel_path) | |
| else: | |
| files_to_check = [] | |
| for path in paths: | |
| if os.path.isfile(path): | |
| files_to_check.append(os.path.relpath(path, os.getcwd())) | |
| elif os.path.isdir(path): | |
| for root, _, files in os.walk(path): | |
| for file in files: | |
| full_path = os.path.join(root, file) | |
| # Normalize path to be relative to CWD if possible | |
| rel_path = os.path.relpath(full_path, os.getcwd()) | |
| files_to_check.append(rel_path) | |
| else: |
🤖 Prompt for AI Agents
In @.github/workflows/scripts/sensitive_file_check.py around lines 55 - 66, The
code currently normalizes only directory-walked file paths (using rel_path)
causing direct file paths to remain absolute and miss regex matches; update the
loop where you handle os.path.isfile(path) and where you append
full_path/rel_path so that you call os.path.relpath (and optionally
os.path.normpath) on both file and directory entries before appending to
files_to_check; look for symbols files_to_check, paths, os.path.isfile,
os.path.isdir, full_path and rel_path and ensure all appended paths are
consistently normalized relative to os.getcwd().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/check-sensitive-files.yml (1)
3-10: Add this workflow as a prerequisite job to the PR workflows.The
check-sensitive-filesworkflow is correctly configured as a reusable workflow (workflow_call), but it is not currently called from any PR workflows. According to issue#34, it must be integrated as a prerequisite to bothpull-request-review.ymlandpull-request-target.yml. Add a job that calls this workflow in both files using theusesdirective.
🤖 Fix all issues with AI agents
In @.github/workflows/check-sensitive-files.yml:
- Around line 62-64: Fix the bash array-length test for CHANGED_FILES: replace
the invalid backticked expression used in the conditional (the if test
referencing `${`#CHANGED_FILES`[@]}`) with the proper array-length expansion
`${`#CHANGED_FILES`[@]}` so the if [ ... -gt 0 ] check works; update the
conditional that invokes python3 sensitive_file_check.py accordingly (the symbol
to locate is the CHANGED_FILES array and the surrounding if [ ... ] test).
🧹 Nitpick comments (1)
.github/workflows/config/sensitive_files.txt (1)
1-55: LGTM overall – patterns are properly escaped based on past review feedback.Most patterns now correctly escape literal dots. A few minor inconsistencies remain:
Line 5:
^\.env..*$– the dot before.*should be escaped as^\.env\..*$to match files like.env.localrather than.envXlocal.Line 33:
.coderabbit.yaml$– missing^anchor and unescaped leading dot. Should be^\.coderabbit\.yaml$.Line 54:
^docs/sidebar..*– unescaped dot before.*. Should be^docs/sidebar\..*.Inconsistent anchoring: Some patterns lack the
^start anchor (e.g., lines 6-7, 12-15, 23, 30-32, etc.), which means they'll match anywhere in the path. This may be intentional for flexibility, but worth confirming.💡 Suggested fixes for remaining issues
-^\.env..*$ +^\.env\..*$ -.coderabbit.yaml$ +^\.coderabbit\.yaml$ -^docs/sidebar..* +^docs/sidebar\..*
| mapfile -t CHANGED_FILES < <(git diff --name-only --diff-filter=ACMRD "$BASE_SHA" "$HEAD_SHA") | ||
| if [ "${`#CHANGED_FILES`[@]}" -gt 0 ]; then | ||
| if ! python3 .github/workflows/scripts/sensitive_file_check.py --config .github/workflows/config/sensitive_files.txt --files "${CHANGED_FILES[@]}"; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check the actual syntax in the workflow file
sed -n '62,64p' .github/workflows/check-sensitive-files.yml | cat -ARepository: PalisadoesFoundation/.github
Length of output: 398
🏁 Script executed:
#!/bin/bash
# Also get broader context around those lines
sed -n '60,66p' .github/workflows/check-sensitive-files.ymlRepository: PalisadoesFoundation/.github
Length of output: 598
Fix syntax error in array length check on line 63.
The file contains backticks in the array length syntax: ${#CHANGED_FILES[@]}, which is invalid bash. Backticks trigger command substitution, causing the workflow to fail. Change to:
if [ "${`#CHANGED_FILES`[@]}" -gt 0 ]; then🤖 Prompt for AI Agents
In @.github/workflows/check-sensitive-files.yml around lines 62 - 64, Fix the
bash array-length test for CHANGED_FILES: replace the invalid backticked
expression used in the conditional (the if test referencing
`${`#CHANGED_FILES`[@]}`) with the proper array-length expansion
`${`#CHANGED_FILES`[@]}` so the if [ ... -gt 0 ] check works; update the
conditional that invokes python3 sensitive_file_check.py accordingly (the symbol
to locate is the CHANGED_FILES array and the surrounding if [ ... ] test).
Add
Separated Code and Config: Created .github/workflows/scripts/sensitive_file_check.py and .github/workflows/config/sensitive_files.txt to decouple the checking logic from the regex patterns.
Updated Workflow: Modified .github/workflows/check-sensitive-files.yml to use the new Python script instead of inline Bash commands.
Added Tests: Implemented unit tests in .github/workflows/scripts/test/test_sensitive_file_check.py to ensure the script works correctly.
Resolves #34
Summary by CodeRabbit
Chores
Tests
✏️ Tip: You can customize this high-level summary in your review settings.