Skip to content

Add Git History Secret Scanner#19

Open
MikeeBuilds wants to merge 9 commits intomainfrom
auto-claude/012-add-git-history-secret-scanner
Open

Add Git History Secret Scanner#19
MikeeBuilds wants to merge 9 commits intomainfrom
auto-claude/012-add-git-history-secret-scanner

Conversation

@MikeeBuilds
Copy link
Owner

Create a new scan_git_history.sh scanner that searches git commit history for accidentally committed secrets using the existing SECRET_VALUE_PATTERNS from scan_secrets.py. This catches secrets that were committed and later removed but still exist in git history.

MikeeBuilds and others added 9 commits February 7, 2026 00:21
…cture an

Created scripts/scan_git_history.sh following the established scanner pattern:
- Added shebang and strict mode (set -euo pipefail)
- Added comprehensive header documentation
- Implemented SCRIPT_DIR resolution
- Sourced common.sh helpers with fallback emit_finding
- Added git repository and git command validation
- Initialized FINDINGS array for future checks
- Added JSON output logic (empty array for now)
- Added placeholders for CHK-GIT-001 through CHK-GIT-008 checks

Verification: Outputs valid JSON (jq empty exits 0)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…tern matching

- Add redact_secret() helper to hide sensitive values (show last 4 chars)
- Define 14 secret patterns adapted from scan_secrets.py:
  * Slack tokens (bot/app/user)
  * JWT tokens
  * Discord/Telegram bot tokens
  * OpenAI API keys
  * Ethereum private keys
  * Generic bearer tokens
  * AWS access keys
  * GitHub tokens
  * Generic API keys
- Implement scan_git_history() function:
  * Scan last 100 commits (1000 in deep mode)
  * Use git log -p to get commit diffs
  * Parse commit hashes and file paths from diff output
  * Match secret patterns in added lines (+)
  * Skip env var references ($VAR, ${VAR})
  * Emit CHK-SEC-008 findings with redacted values
- Tested with multiple secret types: Slack, OpenAI, JWT
- All findings include commit hash, file path, and remediation

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…e handli

Performance optimizations:
- Skip binary/media files by extension (images, videos, archives, etc.)
- Limit line length to 10000 chars (skip likely binary data)
- Add deduplication to avoid duplicate secret findings
- Add safety limit on total lines scanned (50k default, 500k in deep mode)
- Use --no-merges and --diff-filter=A for git log efficiency
- Skip empty lines early

Edge case handling:
- Return empty array for non-git directories (not error)
- Check if repo has any commits (handle empty repos)
- Detect shallow clones and add warning to remediation
- Skip placeholder/example values (test, sample, dummy, etc.)
- Better handling of missing file context
- Use bash 3.2 compatible deduplication (string-based, not associative array)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…e handli

Add performance optimizations:
- Skip lockfiles, minified files, test fixtures (SKIP_PATHS pattern)
- Repository size detection and warnings for large repos (>10k commits)
- Timeout protection with configurable limits (5/15 min)
- Filter very short matches (<8 chars) to reduce false positives
- Null byte and binary data detection

Add edge case handling:
- Git worktree support (.git as file, not just directory)
- Timeout detection and warnings (exit codes 124/137)
- Case-insensitive placeholder matching
- Additional placeholder patterns (todo, fixme, redacted)
- Binary data detection in diff lines
- Improved shallow clone documentation

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…trator

Integration verified via auto-discovery mechanism:
- clawpinch.sh automatically discovers scripts/scan_*.sh files
- scan_git_history.sh matches the pattern and is executable
- No code changes to orchestrator needed

Verification results:
✅ bash clawpinch.sh --json runs without errors
✅ Git history scanner included in full scan output
✅ Scanner outputs valid JSON array (currently empty, expected)
✅ Integration test passes: length >= 0 returns true

The scanner created in subtask-1-1 through subtask-1-3 is now
fully integrated with the main ClawPinch orchestrator.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Added git history scanner category mapping to report.sh for proper display.
Scanner now appears as "📜 Git History" instead of generic "🔍 Scanner".

Verification completed:
- Git history scanner discovered and executed by clawpinch.sh
- No crashes, completes successfully (0 findings in clean repos)
- Integration tests all pass (6/6)
- Full scan shows 9 scanners running including git history
- Scan completes in ~32s with proper categorization

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Verified git history scanner integration:
- Scanner outputs valid JSON standalone
- Integrated with clawpinch.sh orchestrator
- Appears in interactive output with proper formatting
- JSON and deep scan modes work correctly
- No crashes or errors
- Performance within acceptable limits

All verification checks passed.
…tory scanner

- Added scan_git_history.sh to CLAUDE.md architecture diagram
- Updated SKILL.md deep scan description to mention extended git history (100 commits normal, 1000 commits deep)
- Documentation now accurately reflects new CHK-SEC-008 git history scanner
@gemini-code-assist
Copy link

Summary of Changes

Hello @MikeeBuilds, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the project's security scanning capabilities by introducing a dedicated scanner for Git commit history. This new feature addresses the critical issue of secrets that may have been accidentally committed and later removed, but still persist within the repository's history. By integrating this scanner, the system can now proactively identify and flag such vulnerabilities, providing a more thorough security posture.

Highlights

  • New Git History Scanner: Introduced a new scan_git_history.sh script designed to detect accidentally committed secrets within the Git commit history, even if they were later removed from the current codebase.
  • Enhanced Secret Detection: The new scanner leverages existing SECRET_VALUE_PATTERNS for comprehensive secret identification, expanding the scope of secret scanning beyond the current working directory.
  • Robust Testing: Added dedicated integration tests (scripts/helpers/test_git_history.sh) and end-to-end tests (test_e2e_git_scanner.sh) to ensure the reliability and proper functioning of the Git history scanner, covering various edge cases and deep scan modes.
  • Documentation and Reporting Updates: Updated CLAUDE.md and SKILL.md to reflect the new scanner and its deep scan capabilities, and integrated it into the reporting mechanism via scripts/helpers/report.sh.
Changelog
  • CLAUDE.md
    • Added scan_git_history.sh to the list of security scanners.
  • SKILL.md
    • Updated the description for "Deep scan" to include "extended git history".
  • scripts/helpers/report.sh
    • Added a new category "📜|Git History" for scan_git_history in the _scanner_category function.
  • scripts/helpers/test_git_history.sh
    • Added a new bash script for integration testing of the scan_git_history.sh scanner.
  • scripts/scan_git_history.sh
    • Added a new bash script that scans Git repository history for secrets using defined patterns.
  • test_e2e_git_scanner.sh
    • Added a new end-to-end test script to verify the integration and functionality of the Git history scanner.
Activity
  • No specific activity (comments, reviews, progress updates) has been recorded for this pull request yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new scanner for detecting secrets in git history, which is a valuable addition to the security toolkit. The implementation is robust, handling many edge cases like non-git directories, shallow clones, and timeouts. The PR also includes a good set of integration and end-to-end tests.

My review focuses on a few key areas for improvement:

  • Performance and Memory Usage: The main scanner script can be optimized to use less memory and run faster, especially on large repositories.
  • Test Coverage: The integration tests are missing a positive test case to verify that secrets are actually found.
  • Test Diagnostics: The test runner can be improved to provide more information when a test fails.

Overall, this is a solid contribution. Addressing the feedback will make the new scanner more robust and maintainable.


printf "\n${C_BLUE}${C_BOLD}Functionality${C_RESET}\n"
run_test "Valid JSON output format" "test_json_output"
run_test "Deep scan mode" "test_deep_mode"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The integration tests cover several important edge cases like non-git directories, empty repos, and binary files. However, there is no test case that verifies the scanner's primary function: detecting a secret in the git history. A positive test case should be added to ensure the scanner correctly identifies a committed secret.

Here is an example test you could add:

# Test: Find a secret in commit history
test_find_secret() {
    local repo="${TEST_DIR}/found_secret"
    mkdir -p "$repo"
    cd "$repo"
    git init -q
    git config user.email "test@test.test"
    git config user.name "Test"

    # Commit a file with a secret
    echo 'const SLACK_BOT_TOKEN = "xoxb-12345-67890-abcdef";' > secrets.js
    git add secrets.js
    git commit -q -m "Add initial file with secret"

    # Remove the secret in a later commit
    echo 'const SLACK_BOT_TOKEN = process.env.SLACK_TOKEN;' > secrets.js
    git add secrets.js
    git commit -q -m "Remove hardcoded secret"

    local output
    output=$(GIT_REPO_PATH="$repo" bash "$SCANNER")
    # Check that at least one finding was returned
    echo "$output" | jq -e 'type == "array" and length > 0' >/dev/null 2>&1 && \
    # Check that the finding is the correct one
    echo "$output" | jq -e '.[0].id == "CHK-SEC-008" and .[0].severity == "critical"' >/dev/null 2>&1
}

# ...and then call it in the test suite:
# run_test "Finds a committed secret" "test_find_secret"

Comment on lines +198 to +357
git_output=$(cd "$REPO_PATH" && $timeout_cmd git log -p --all --no-textconv --no-merges --diff-filter=A -n "$max_commits" $time_limit --format="COMMIT:%H" 2>/dev/null || true)

# Edge case: Check if command was killed by timeout
local git_exit_code=$?
if [[ $git_exit_code -eq 124 ]] || [[ $git_exit_code -eq 137 ]]; then
# 124 = timeout killed the process, 137 = SIGKILL
local finding
finding=$(emit_finding \
"CHK-SEC-010" \
"warn" \
"Git history scan timed out" \
"The git history scan exceeded the time limit. Repository may be too large for complete scan." \
"exit_code=$git_exit_code" \
"Consider scanning a smaller time range or using --shallow-since with git clone" \
"")
FINDINGS+=("$finding")
return 0
fi

if [[ -z "$git_output" ]]; then
# Empty history or no commits
return 0
fi

local current_commit=""
local current_file=""
local lines_scanned=0
local max_lines=50000 # Safety limit to prevent runaway scans

# Performance optimization: Early exit if we've scanned too many lines
if [[ "${CLAWPINCH_DEEP:-0}" == "1" ]]; then
max_lines=500000
fi

# Process git log output line by line
while IFS= read -r line; do
# Safety limit: Exit if we've scanned too many lines
((lines_scanned++))
if [[ $lines_scanned -gt $max_lines ]]; then
break
fi

# Extract commit hash
if [[ "$line" =~ ^COMMIT:([a-f0-9]{40}) ]]; then
current_commit="${BASH_REMATCH[1]}"
current_file=""
continue
fi

# Extract file path from diff header
if [[ "$line" =~ ^\+\+\+[[:space:]]b/(.+)$ ]]; then
current_file="${BASH_REMATCH[1]}"
# Performance optimization: Skip binary/media files early
if should_skip_file "$current_file"; then
current_file="" # Mark as skipped
fi
continue
fi

# Only check added lines (starting with +)
if [[ ! "$line" =~ ^\+[^+] ]]; then
continue
fi

# Skip if we don't have commit context or file was skipped
if [[ -z "$current_commit" ]] || [[ -z "$current_file" ]]; then
continue
fi

# Performance optimization: Skip very long lines (likely binary data)
if [[ ${#line} -gt $MAX_LINE_LENGTH ]]; then
continue
fi

# Remove the leading + from the diff line
local content="${line:1}"

# Edge case: Skip empty lines
if [[ -z "${content// /}" ]]; then
continue
fi

# Edge case: Skip lines with null bytes or other binary indicators
# (some binary data may slip through extension filtering)
if [[ "$content" == *$'\x00'* ]] || [[ "$content" =~ [[:cntrl:]]{10,} ]]; then
continue
fi

# Check each secret pattern
for pattern_entry in "${SECRET_PATTERNS[@]}"; do
# Parse "type|pattern" format
local secret_type="${pattern_entry%%|*}"
local pattern="${pattern_entry#*|}"

# Use grep -oE to extract matching secrets
local matches
matches=$(echo "$content" | grep -oE "$pattern" 2>/dev/null || true)

if [[ -n "$matches" ]]; then
while IFS= read -r secret_value; do
# Skip empty matches
[[ -z "$secret_value" ]] && continue

# Performance optimization: Skip very short matches (likely false positives)
# Exception: private keys can have short markers
if [[ ${#secret_value} -lt 8 ]] && [[ "$secret_type" != "Private key" ]]; then
continue
fi

# Edge case: Skip environment variable references (${VAR} or $VAR)
if [[ "$secret_value" =~ ^\$\{.*\}$ ]] || [[ "$secret_value" =~ ^\$[A-Z_][A-Z0-9_]*$ ]]; then
continue
fi

# Edge case: Skip placeholder/example values (case-insensitive)
local lower_value
lower_value=$(echo "$secret_value" | tr '[:upper:]' '[:lower:]')
if [[ "$lower_value" =~ (your|example|test|sample|placeholder|dummy|fake|xxx|yyy|zzz|000|111|abc|123|todo|fixme|redacted) ]]; then
continue
fi

# Performance optimization: Deduplicate findings
# Create a unique key for this secret
local secret_key="${secret_type}:${secret_value}"
if echo "$FOUND_SECRETS" | grep -qF "$secret_key"; then
continue # Already reported this secret
fi
FOUND_SECRETS="${FOUND_SECRETS}${secret_key}"$'\n'

local redacted_value
redacted_value=$(redact_secret "$secret_value")

local evidence="commit=${current_commit:0:8} file=$current_file secret_type=\"$secret_type\" value=$redacted_value"

local title="$secret_type found in git history"
local description="A $secret_type was detected in commit $current_commit in file $current_file. This secret exists in the repository history even if it was later removed from current files."

local remediation="Remove secret from git history using git filter-repo or BFG Repo-Cleaner. Rotate the exposed credential immediately. See: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/removing-sensitive-data-from-a-repository"

# Add shallow clone warning to remediation if applicable
if [[ $IS_SHALLOW -eq 1 ]]; then
remediation="$remediation NOTE: This is a shallow clone - full history may contain additional secrets. Run 'git fetch --unshallow' for complete scan."
fi

# Emit finding
local finding
finding=$(emit_finding \
"CHK-SEC-008" \
"critical" \
"$title" \
"$description" \
"$evidence" \
"$remediation" \
"")

FINDINGS+=("$finding")
done <<< "$matches"
fi
done
done <<< "$git_output"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The script loads the entire git log output into memory before processing. This can be inefficient for large repositories and lead to high memory consumption. A more memory-efficient approach is to stream the output from git log directly into the processing loop using a pipeline.

For example:

cd "$REPO_PATH" && $timeout_cmd git log ... | while IFS= read -r line
do
  # ... processing logic
done

Note that variables modified inside a pipeline's while loop are in a subshell. Since FINDINGS is modified, you'll need to handle this, for example by using process substitution (while ... done < <(...)) or by having the subshell write its results for the parent shell to collect.

Comment on lines +50 to +58
if eval "$test_cmd" &>/dev/null; then
printf "${C_GREEN}✓ PASS${C_RESET}\n"
TESTS_PASSED=$((TESTS_PASSED + 1))
return 0
else
printf "${C_RED}✗ FAIL${C_RESET}\n"
TESTS_FAILED=$((TESTS_FAILED + 1))
return 1
fi

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The run_test helper redirects all command output to /dev/null using &>/dev/null. While this keeps the test output clean for passing tests, it makes debugging failures very difficult as there is no information about what went wrong. It would be more helpful to capture the output of failed tests and display it.

Suggested change
if eval "$test_cmd" &>/dev/null; then
printf "${C_GREEN}✓ PASS${C_RESET}\n"
TESTS_PASSED=$((TESTS_PASSED + 1))
return 0
else
printf "${C_RED}✗ FAIL${C_RESET}\n"
TESTS_FAILED=$((TESTS_FAILED + 1))
return 1
fi
local output
if output=$(eval "$test_cmd" 2>&1); then
printf "${C_GREEN}✓ PASS${C_RESET}\n"
TESTS_PASSED=$((TESTS_PASSED + 1))
return 0
else
printf "${C_RED}✗ FAIL${C_RESET}\n"
# Indent the captured output for readability
echo "$output" | sed 's/^/ /' >&2
TESTS_FAILED=$((TESTS_FAILED + 1))
return 1
fi

Comment on lines +319 to +325
# Performance optimization: Deduplicate findings
# Create a unique key for this secret
local secret_key="${secret_type}:${secret_value}"
if echo "$FOUND_SECRETS" | grep -qF "$secret_key"; then
continue # Already reported this secret
fi
FOUND_SECRETS="${FOUND_SECRETS}${secret_key}"$'\n'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current deduplication logic using echo "$FOUND_SECRETS" | grep ... can be inefficient, as it repeatedly scans a growing string in memory. This can lead to performance degradation if many potential secrets are found. A more performant approach for bash 3.2 compatibility would be to use a temporary file to track seen secrets. You would also need to remove the FOUND_SECRETS variable initialization on line 53.

Suggested change
# Performance optimization: Deduplicate findings
# Create a unique key for this secret
local secret_key="${secret_type}:${secret_value}"
if echo "$FOUND_SECRETS" | grep -qF "$secret_key"; then
continue # Already reported this secret
fi
FOUND_SECRETS="${FOUND_SECRETS}${secret_key}"$'\n'
# Performance optimization: Deduplicate findings
# Create a unique key for this secret
local secret_key="${secret_type}:${secret_value}"
if grep -qFx -- "$secret_key" "$DEDUPE_FILE"; then
continue # Already reported this secret
fi
echo "$secret_key" >> "$DEDUPE_FILE"

@greptile-apps
Copy link

greptile-apps bot commented Feb 10, 2026

Greptile Overview

Greptile Summary

This PR adds a new scripts/scan_git_history.sh scanner that inspects git log -p output for secret-like values in added lines, emits findings as JSON, and integrates the scanner into the report UI as a “Git History” category. It also adds an integration test harness (scripts/helpers/test_git_history.sh) and an E2E script (test_e2e_git_scanner.sh) to validate the scanner is executable, produces JSON, and is auto-discovered by clawpinch.sh.

Key issues to address before merging are in the new scanner: timeout handling is currently ineffective due to masked exit codes, shallow-clone detection won’t work correctly for worktrees, and the scanner reuses CHK-SEC-008 which already identifies a different secrets check in scan_secrets.py/catalogs (causing report ambiguity). The scanner’s pattern list also diverges from the PR description’s claim of reusing SECRET_VALUE_PATTERNS.

Confidence Score: 2/5

  • Not safe to merge until scanner correctness issues are fixed.
  • The new git-history scanner has at least two correctness bugs that change runtime behavior (timeout detection is unreachable due to || true, and shallow-clone detection misses worktrees), plus a definite check-ID collision (CHK-SEC-008) with an existing secrets check that will conflate reports/catalogs.
  • scripts/scan_git_history.sh

Important Files Changed

Filename Overview
scripts/scan_git_history.sh Adds a new git-history secret scanner, but timeout detection is currently unreachable due to `
scripts/helpers/test_git_history.sh Adds integration tests for the new scanner; works as a basic harness but uses eval for running tests (unnecessary/brittle).
scripts/helpers/report.sh Adds a category mapping for the new scan_git_history scanner so reports display it under “Git History”.
test_e2e_git_scanner.sh Adds an end-to-end script to verify the new scanner is executable, returns JSON, integrates into clawpinch.sh, and supports deep scan mode.
CLAUDE.md Updates architecture docs to list the new scan_git_history.sh scanner.
SKILL.md Updates CLI docs to note that --deep includes extended git history scanning.

Sequence Diagram

sequenceDiagram
  participant User
  participant Orchestrator as clawpinch.sh
  participant Report as scripts/helpers/report.sh
  participant Scanner as scripts/scan_git_history.sh
  participant Git as git

  User->>Orchestrator: run clawpinch (--deep/--json)
  Orchestrator->>Report: print_section_header("scan_git_history.sh")
  Orchestrator->>Scanner: execute (env: CLAWPINCH_DEEP, GIT_REPO_PATH)

  Scanner->>Git: rev-parse HEAD
  alt not a git repo OR no commits OR git missing
    Scanner-->>Orchestrator: "[]"
  else history scan
    Scanner->>Git: git log -p --all --no-merges --diff-filter=A
    Git-->>Scanner: diff stream (added lines)
    loop for each added line
      Scanner->>Scanner: match against SECRET_PATTERNS
      alt match found
        Scanner-->>Orchestrator: finding objects (emit_finding)
      end
    end
    Scanner-->>Orchestrator: JSON array of findings
  end

  Orchestrator->>Orchestrator: merge + sort findings
  Orchestrator-->>User: report UI or JSON output
Loading

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6 files reviewed, 5 comments

Edit Code Review Agent Settings | Greptile

Comment on lines +198 to +201
git_output=$(cd "$REPO_PATH" && $timeout_cmd git log -p --all --no-textconv --no-merges --diff-filter=A -n "$max_commits" $time_limit --format="COMMIT:%H" 2>/dev/null || true)

# Edge case: Check if command was killed by timeout
local git_exit_code=$?
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timeout detection is broken

git_exit_code=$? is taken after git_output=$(... || true), so it will always be 0 (the || true masks the real exit status). This makes the timeout branch (124/137) unreachable, so timeouts won’t produce the intended CHK-SEC-010 finding and the scan will silently return partial/no results.

A straightforward fix is to capture $? before || true, or avoid || true and instead handle non-zero exit codes explicitly.

Prompt To Fix With AI
This is a comment left during a code review.
Path: scripts/scan_git_history.sh
Line: 198:201

Comment:
**Timeout detection is broken**

`git_exit_code=$?` is taken after `git_output=$(... || true)`, so it will always be `0` (the `|| true` masks the real exit status). This makes the timeout branch (`124`/`137`) unreachable, so timeouts won’t produce the intended `CHK-SEC-010` finding and the scan will silently return partial/no results.

A straightforward fix is to capture `$?` *before* `|| true`, or avoid `|| true` and instead handle non-zero exit codes explicitly.


How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +345 to +346
"CHK-SEC-008" \
"critical" \
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate check ID collision

This scanner emits findings with id CHK-SEC-008, but scripts/scan_secrets.py already uses CHK-SEC-008 for a different check (“stale secrets across backups”). IDs appear to be used as unique check identifiers in reports/catalogs, so reusing one will conflate unrelated findings and make reports ambiguous.

You’ll want to assign the git-history scanner a new unused ID (and update docs/catalog accordingly) to avoid collisions.

Prompt To Fix With AI
This is a comment left during a code review.
Path: scripts/scan_git_history.sh
Line: 345:346

Comment:
**Duplicate check ID collision**

This scanner emits findings with id `CHK-SEC-008`, but `scripts/scan_secrets.py` already uses `CHK-SEC-008` for a different check (“stale secrets across backups”). IDs appear to be used as unique check identifiers in reports/catalogs, so reusing one will conflate unrelated findings and make reports ambiguous.

You’ll want to assign the git-history scanner a new unused ID (and update docs/catalog accordingly) to avoid collisions.


How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +88 to +91
IS_SHALLOW=0
if [[ -f "$REPO_PATH/.git/shallow" ]]; then
IS_SHALLOW=1
fi
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shallow clone check incorrect

In worktrees or linked workdirs, $REPO_PATH/.git is a file pointing at the actual git dir. Checking only $REPO_PATH/.git/shallow will miss shallow status (and skip the remediation warning) for those repos. Use git rev-parse --git-dir to locate the real git directory before checking for shallow.

Prompt To Fix With AI
This is a comment left during a code review.
Path: scripts/scan_git_history.sh
Line: 88:91

Comment:
**Shallow clone check incorrect**

In worktrees or linked workdirs, `$REPO_PATH/.git` is a *file* pointing at the actual git dir. Checking only `$REPO_PATH/.git/shallow` will miss shallow status (and skip the remediation warning) for those repos. Use `git rev-parse --git-dir` to locate the real git directory before checking for `shallow`.


How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +144 to +148
"Private key|-----BEGIN[[:space:]]+(RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"
"Generic Bearer token|[Bb]earer[[:space:]]+[A-Za-z0-9_.~+/-]+=*"
"AWS Access Key|AKIA[0-9A-Z]{16}"
"GitHub Token|ghp_[A-Za-z0-9]{36}"
"Generic API key|api[_-]?key[[:space:]]*[:=][[:space:]]*[\"'][A-Za-z0-9_-]{20,}[\"']"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pattern set diverges from source

This script claims to reuse SECRET_VALUE_PATTERNS from scan_secrets.py, but it adds patterns (Private key, AWS Access Key, GitHub Token, Generic API key) that aren’t in scan_secrets.py’s SECRET_VALUE_PATTERNS (and it renames “Bearer token” to “Generic Bearer token”). That mismatch means the two scanners will behave inconsistently and the comment/docs are inaccurate.

Either actually source the canonical pattern list, or update the PR/docs to reflect that this scanner uses a separate/extended pattern set.

Prompt To Fix With AI
This is a comment left during a code review.
Path: scripts/scan_git_history.sh
Line: 144:148

Comment:
**Pattern set diverges from source**

This script claims to reuse `SECRET_VALUE_PATTERNS` from `scan_secrets.py`, but it adds patterns (`Private key`, `AWS Access Key`, `GitHub Token`, `Generic API key`) that aren’t in `scan_secrets.py`’s `SECRET_VALUE_PATTERNS` (and it renames “Bearer token” to “Generic Bearer token”). That mismatch means the two scanners will behave inconsistently and the comment/docs are inaccurate.

Either actually source the canonical pattern list, or update the PR/docs to reflect that this scanner uses a separate/extended pattern set.


How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +50 to +56
if eval "$test_cmd" &>/dev/null; then
printf "${C_GREEN}✓ PASS${C_RESET}\n"
TESTS_PASSED=$((TESTS_PASSED + 1))
return 0
else
printf "${C_RED}✗ FAIL${C_RESET}\n"
TESTS_FAILED=$((TESTS_FAILED + 1))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test harness uses eval

run_test executes test_cmd via eval, so any unexpected characters in the command string will be interpreted by the shell. Here the commands are currently hardcoded, but this still makes the test harness unnecessarily unsafe and brittle.

Prefer calling the test functions directly (e.g., pass function name and invoke it) to avoid eval.

Prompt To Fix With AI
This is a comment left during a code review.
Path: scripts/helpers/test_git_history.sh
Line: 50:56

Comment:
**Test harness uses `eval`**

`run_test` executes `test_cmd` via `eval`, so any unexpected characters in the command string will be interpreted by the shell. Here the commands are currently hardcoded, but this still makes the test harness unnecessarily unsafe and brittle.

Prefer calling the test functions directly (e.g., pass function name and invoke it) to avoid `eval`.


How can I resolve this? If you propose a fix, please make it concise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant