Add Git History Secret Scanner by MikeeBuilds · Pull Request #19 · MikeeBuilds/clawpinch

MikeeBuilds · 2026-02-10T16:04:14Z

Create a new scan_git_history.sh scanner that searches git commit history for accidentally committed secrets using the existing SECRET_VALUE_PATTERNS from scan_secrets.py. This catches secrets that were committed and later removed but still exist in git history.

…cture an Created scripts/scan_git_history.sh following the established scanner pattern: - Added shebang and strict mode (set -euo pipefail) - Added comprehensive header documentation - Implemented SCRIPT_DIR resolution - Sourced common.sh helpers with fallback emit_finding - Added git repository and git command validation - Initialized FINDINGS array for future checks - Added JSON output logic (empty array for now) - Added placeholders for CHK-GIT-001 through CHK-GIT-008 checks Verification: Outputs valid JSON (jq empty exits 0) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

…tern matching - Add redact_secret() helper to hide sensitive values (show last 4 chars) - Define 14 secret patterns adapted from scan_secrets.py: * Slack tokens (bot/app/user) * JWT tokens * Discord/Telegram bot tokens * OpenAI API keys * Ethereum private keys * Generic bearer tokens * AWS access keys * GitHub tokens * Generic API keys - Implement scan_git_history() function: * Scan last 100 commits (1000 in deep mode) * Use git log -p to get commit diffs * Parse commit hashes and file paths from diff output * Match secret patterns in added lines (+) * Skip env var references ($VAR, ${VAR}) * Emit CHK-SEC-008 findings with redacted values - Tested with multiple secret types: Slack, OpenAI, JWT - All findings include commit hash, file path, and remediation Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

…e handli Performance optimizations: - Skip binary/media files by extension (images, videos, archives, etc.) - Limit line length to 10000 chars (skip likely binary data) - Add deduplication to avoid duplicate secret findings - Add safety limit on total lines scanned (50k default, 500k in deep mode) - Use --no-merges and --diff-filter=A for git log efficiency - Skip empty lines early Edge case handling: - Return empty array for non-git directories (not error) - Check if repo has any commits (handle empty repos) - Detect shallow clones and add warning to remediation - Skip placeholder/example values (test, sample, dummy, etc.) - Better handling of missing file context - Use bash 3.2 compatible deduplication (string-based, not associative array) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

…e handli Add performance optimizations: - Skip lockfiles, minified files, test fixtures (SKIP_PATHS pattern) - Repository size detection and warnings for large repos (>10k commits) - Timeout protection with configurable limits (5/15 min) - Filter very short matches (<8 chars) to reduce false positives - Null byte and binary data detection Add edge case handling: - Git worktree support (.git as file, not just directory) - Timeout detection and warnings (exit codes 124/137) - Case-insensitive placeholder matching - Additional placeholder patterns (todo, fixme, redacted) - Binary data detection in diff lines - Improved shallow clone documentation Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

…trator Integration verified via auto-discovery mechanism: - clawpinch.sh automatically discovers scripts/scan_*.sh files - scan_git_history.sh matches the pattern and is executable - No code changes to orchestrator needed Verification results: ✅ bash clawpinch.sh --json runs without errors ✅ Git history scanner included in full scan output ✅ Scanner outputs valid JSON array (currently empty, expected) ✅ Integration test passes: length >= 0 returns true The scanner created in subtask-1-1 through subtask-1-3 is now fully integrated with the main ClawPinch orchestrator. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

…anner

Added git history scanner category mapping to report.sh for proper display. Scanner now appears as "📜 Git History" instead of generic "🔍 Scanner". Verification completed: - Git history scanner discovered and executed by clawpinch.sh - No crashes, completes successfully (0 findings in clean repos) - Integration tests all pass (6/6) - Full scan shows 9 scanners running including git history - Scan completes in ~32s with proper categorization Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Verified git history scanner integration: - Scanner outputs valid JSON standalone - Integrated with clawpinch.sh orchestrator - Appears in interactive output with proper formatting - JSON and deep scan modes work correctly - No crashes or errors - Performance within acceptable limits All verification checks passed.

…tory scanner - Added scan_git_history.sh to CLAUDE.md architecture diagram - Updated SKILL.md deep scan description to mention extended git history (100 commits normal, 1000 commits deep) - Documentation now accurately reflects new CHK-SEC-008 git history scanner

gemini-code-assist · 2026-02-10T16:04:39Z

Summary of Changes

Hello @MikeeBuilds, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the project's security scanning capabilities by introducing a dedicated scanner for Git commit history. This new feature addresses the critical issue of secrets that may have been accidentally committed and later removed, but still persist within the repository's history. By integrating this scanner, the system can now proactively identify and flag such vulnerabilities, providing a more thorough security posture.

Highlights

New Git History Scanner: Introduced a new scan_git_history.sh script designed to detect accidentally committed secrets within the Git commit history, even if they were later removed from the current codebase.
Enhanced Secret Detection: The new scanner leverages existing SECRET_VALUE_PATTERNS for comprehensive secret identification, expanding the scope of secret scanning beyond the current working directory.
Robust Testing: Added dedicated integration tests (scripts/helpers/test_git_history.sh) and end-to-end tests (test_e2e_git_scanner.sh) to ensure the reliability and proper functioning of the Git history scanner, covering various edge cases and deep scan modes.
Documentation and Reporting Updates: Updated CLAUDE.md and SKILL.md to reflect the new scanner and its deep scan capabilities, and integrated it into the reporting mechanism via scripts/helpers/report.sh.

Changelog

CLAUDE.md
- Added scan_git_history.sh to the list of security scanners.
SKILL.md
- Updated the description for "Deep scan" to include "extended git history".
scripts/helpers/report.sh
- Added a new category "📜|Git History" for scan_git_history in the _scanner_category function.
scripts/helpers/test_git_history.sh
- Added a new bash script for integration testing of the scan_git_history.sh scanner.
scripts/scan_git_history.sh
- Added a new bash script that scans Git repository history for secrets using defined patterns.
test_e2e_git_scanner.sh
- Added a new end-to-end test script to verify the integration and functionality of the Git history scanner.

Activity

No specific activity (comments, reviews, progress updates) has been recorded for this pull request yet.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a new scanner for detecting secrets in git history, which is a valuable addition to the security toolkit. The implementation is robust, handling many edge cases like non-git directories, shallow clones, and timeouts. The PR also includes a good set of integration and end-to-end tests.

My review focuses on a few key areas for improvement:

Performance and Memory Usage: The main scanner script can be optimized to use less memory and run faster, especially on large repositories.
Test Coverage: The integration tests are missing a positive test case to verify that secrets are actually found.
Test Diagnostics: The test runner can be improved to provide more information when a test fails.

Overall, this is a solid contribution. Addressing the feedback will make the new scanner more robust and maintainable.

gemini-code-assist · 2026-02-10T16:13:48Z

scripts/helpers/test_git_history.sh

+
+printf "\n${C_BLUE}${C_BOLD}Functionality${C_RESET}\n"
+run_test "Valid JSON output format" "test_json_output"
+run_test "Deep scan mode" "test_deep_mode"


The integration tests cover several important edge cases like non-git directories, empty repos, and binary files. However, there is no test case that verifies the scanner's primary function: detecting a secret in the git history. A positive test case should be added to ensure the scanner correctly identifies a committed secret.

Here is an example test you could add:

# Test: Find a secret in commit history test_find_secret() { local repo="${TEST_DIR}/found_secret" mkdir -p "$repo" cd "$repo" git init -q git config user.email "test@test.test" git config user.name "Test" # Commit a file with a secret echo 'const SLACK_BOT_TOKEN = "xoxb-12345-67890-abcdef";' > secrets.js git add secrets.js git commit -q -m "Add initial file with secret" # Remove the secret in a later commit echo 'const SLACK_BOT_TOKEN = process.env.SLACK_TOKEN;' > secrets.js git add secrets.js git commit -q -m "Remove hardcoded secret" local output output=$(GIT_REPO_PATH="$repo" bash "$SCANNER") # Check that at least one finding was returned echo "$output" | jq -e 'type == "array" and length > 0' >/dev/null 2>&1 && \ # Check that the finding is the correct one echo "$output" | jq -e '.[0].id == "CHK-SEC-008" and .[0].severity == "critical"' >/dev/null 2>&1 } # ...and then call it in the test suite: # run_test "Finds a committed secret" "test_find_secret"

gemini-code-assist · 2026-02-10T16:13:48Z

scripts/scan_git_history.sh

+    git_output=$(cd "$REPO_PATH" && $timeout_cmd git log -p --all --no-textconv --no-merges --diff-filter=A -n "$max_commits" $time_limit --format="COMMIT:%H" 2>/dev/null || true)
+
+    # Edge case: Check if command was killed by timeout
+    local git_exit_code=$?
+    if [[ $git_exit_code -eq 124 ]] || [[ $git_exit_code -eq 137 ]]; then
+        # 124 = timeout killed the process, 137 = SIGKILL
+        local finding
+        finding=$(emit_finding \
+            "CHK-SEC-010" \
+            "warn" \
+            "Git history scan timed out" \
+            "The git history scan exceeded the time limit. Repository may be too large for complete scan." \
+            "exit_code=$git_exit_code" \
+            "Consider scanning a smaller time range or using --shallow-since with git clone" \
+            "")
+        FINDINGS+=("$finding")
+        return 0
+    fi
+
+    if [[ -z "$git_output" ]]; then
+        # Empty history or no commits
+        return 0
+    fi
+
+    local current_commit=""
+    local current_file=""
+    local lines_scanned=0
+    local max_lines=50000  # Safety limit to prevent runaway scans
+
+    # Performance optimization: Early exit if we've scanned too many lines
+    if [[ "${CLAWPINCH_DEEP:-0}" == "1" ]]; then
+        max_lines=500000
+    fi
+
+    # Process git log output line by line
+    while IFS= read -r line; do
+        # Safety limit: Exit if we've scanned too many lines
+        ((lines_scanned++))
+        if [[ $lines_scanned -gt $max_lines ]]; then
+            break
+        fi
+
+        # Extract commit hash
+        if [[ "$line" =~ ^COMMIT:([a-f0-9]{40}) ]]; then
+            current_commit="${BASH_REMATCH[1]}"
+            current_file=""
+            continue
+        fi
+
+        # Extract file path from diff header
+        if [[ "$line" =~ ^\+\+\+[[:space:]]b/(.+)$ ]]; then
+            current_file="${BASH_REMATCH[1]}"
+            # Performance optimization: Skip binary/media files early
+            if should_skip_file "$current_file"; then
+                current_file=""  # Mark as skipped
+            fi
+            continue
+        fi
+
+        # Only check added lines (starting with +)
+        if [[ ! "$line" =~ ^\+[^+] ]]; then
+            continue
+        fi
+
+        # Skip if we don't have commit context or file was skipped
+        if [[ -z "$current_commit" ]] || [[ -z "$current_file" ]]; then
+            continue
+        fi
+
+        # Performance optimization: Skip very long lines (likely binary data)
+        if [[ ${#line} -gt $MAX_LINE_LENGTH ]]; then
+            continue
+        fi
+
+        # Remove the leading + from the diff line
+        local content="${line:1}"
+
+        # Edge case: Skip empty lines
+        if [[ -z "${content// /}" ]]; then
+            continue
+        fi
+
+        # Edge case: Skip lines with null bytes or other binary indicators
+        # (some binary data may slip through extension filtering)
+        if [[ "$content" == *$'\x00'* ]] || [[ "$content" =~ [[:cntrl:]]{10,} ]]; then
+            continue
+        fi
+
+        # Check each secret pattern
+        for pattern_entry in "${SECRET_PATTERNS[@]}"; do
+            # Parse "type|pattern" format
+            local secret_type="${pattern_entry%%|*}"
+            local pattern="${pattern_entry#*|}"
+
+            # Use grep -oE to extract matching secrets
+            local matches
+            matches=$(echo "$content" | grep -oE "$pattern" 2>/dev/null || true)
+
+            if [[ -n "$matches" ]]; then
+                while IFS= read -r secret_value; do
+                    # Skip empty matches
+                    [[ -z "$secret_value" ]] && continue
+
+                    # Performance optimization: Skip very short matches (likely false positives)
+                    # Exception: private keys can have short markers
+                    if [[ ${#secret_value} -lt 8 ]] && [[ "$secret_type" != "Private key" ]]; then
+                        continue
+                    fi
+
+                    # Edge case: Skip environment variable references (${VAR} or $VAR)
+                    if [[ "$secret_value" =~ ^\$\{.*\}$ ]] || [[ "$secret_value" =~ ^\$[A-Z_][A-Z0-9_]*$ ]]; then
+                        continue
+                    fi
+
+                    # Edge case: Skip placeholder/example values (case-insensitive)
+                    local lower_value
+                    lower_value=$(echo "$secret_value" | tr '[:upper:]' '[:lower:]')
+                    if [[ "$lower_value" =~ (your|example|test|sample|placeholder|dummy|fake|xxx|yyy|zzz|000|111|abc|123|todo|fixme|redacted) ]]; then
+                        continue
+                    fi
+
+                    # Performance optimization: Deduplicate findings
+                    # Create a unique key for this secret
+                    local secret_key="${secret_type}:${secret_value}"
+                    if echo "$FOUND_SECRETS" | grep -qF "$secret_key"; then
+                        continue  # Already reported this secret
+                    fi
+                    FOUND_SECRETS="${FOUND_SECRETS}${secret_key}"$'\n'
+
+                    local redacted_value
+                    redacted_value=$(redact_secret "$secret_value")
+
+                    local evidence="commit=${current_commit:0:8} file=$current_file secret_type=\"$secret_type\" value=$redacted_value"
+
+                    local title="$secret_type found in git history"
+                    local description="A $secret_type was detected in commit $current_commit in file $current_file. This secret exists in the repository history even if it was later removed from current files."
+
+                    local remediation="Remove secret from git history using git filter-repo or BFG Repo-Cleaner. Rotate the exposed credential immediately. See: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/removing-sensitive-data-from-a-repository"
+
+                    # Add shallow clone warning to remediation if applicable
+                    if [[ $IS_SHALLOW -eq 1 ]]; then
+                        remediation="$remediation NOTE: This is a shallow clone - full history may contain additional secrets. Run 'git fetch --unshallow' for complete scan."
+                    fi
+
+                    # Emit finding
+                    local finding
+                    finding=$(emit_finding \
+                        "CHK-SEC-008" \
+                        "critical" \
+                        "$title" \
+                        "$description" \
+                        "$evidence" \
+                        "$remediation" \
+                        "")
+
+                    FINDINGS+=("$finding")
+                done <<< "$matches"
+            fi
+        done
+    done <<< "$git_output"


The script loads the entire git log output into memory before processing. This can be inefficient for large repositories and lead to high memory consumption. A more memory-efficient approach is to stream the output from git log directly into the processing loop using a pipeline.

For example:

cd "$REPO_PATH" && $timeout_cmd git log ... | while IFS= read -r line do # ... processing logic done

Note that variables modified inside a pipeline's while loop are in a subshell. Since FINDINGS is modified, you'll need to handle this, for example by using process substitution (while ... done < <(...)) or by having the subshell write its results for the parent shell to collect.

gemini-code-assist · 2026-02-10T16:13:49Z

scripts/helpers/test_git_history.sh

+    if eval "$test_cmd" &>/dev/null; then
+        printf "${C_GREEN}✓ PASS${C_RESET}\n"
+        TESTS_PASSED=$((TESTS_PASSED + 1))
+        return 0
+    else
+        printf "${C_RED}✗ FAIL${C_RESET}\n"
+        TESTS_FAILED=$((TESTS_FAILED + 1))
+        return 1
+    fi


The run_test helper redirects all command output to /dev/null using &>/dev/null. While this keeps the test output clean for passing tests, it makes debugging failures very difficult as there is no information about what went wrong. It would be more helpful to capture the output of failed tests and display it.

Suggested change

if eval "$test_cmd" &>/dev/null; then

printf "${C_GREEN}✓ PASS${C_RESET}\n"

TESTS_PASSED=$((TESTS_PASSED + 1))

return 0

else

printf "${C_RED}✗ FAIL${C_RESET}\n"

TESTS_FAILED=$((TESTS_FAILED + 1))

return 1

fi

local output

if output=$(eval "$test_cmd" 2>&1); then

printf "${C_GREEN}✓ PASS${C_RESET}\n"

TESTS_PASSED=$((TESTS_PASSED + 1))

return 0

else

printf "${C_RED}✗ FAIL${C_RESET}\n"

# Indent the captured output for readability

echo "$output" | sed 's/^/ /' >&2

TESTS_FAILED=$((TESTS_FAILED + 1))

return 1

fi

gemini-code-assist · 2026-02-10T16:13:49Z

scripts/scan_git_history.sh

+                    # Performance optimization: Deduplicate findings
+                    # Create a unique key for this secret
+                    local secret_key="${secret_type}:${secret_value}"
+                    if echo "$FOUND_SECRETS" | grep -qF "$secret_key"; then
+                        continue  # Already reported this secret
+                    fi
+                    FOUND_SECRETS="${FOUND_SECRETS}${secret_key}"$'\n'


The current deduplication logic using echo "$FOUND_SECRETS" | grep ... can be inefficient, as it repeatedly scans a growing string in memory. This can lead to performance degradation if many potential secrets are found. A more performant approach for bash 3.2 compatibility would be to use a temporary file to track seen secrets. You would also need to remove the FOUND_SECRETS variable initialization on line 53.

Suggested change

# Performance optimization: Deduplicate findings

# Create a unique key for this secret

local secret_key="${secret_type}:${secret_value}"

if echo "$FOUND_SECRETS" | grep -qF "$secret_key"; then

continue # Already reported this secret

fi

FOUND_SECRETS="${FOUND_SECRETS}${secret_key}"$'\n'

# Performance optimization: Deduplicate findings

# Create a unique key for this secret

local secret_key="${secret_type}:${secret_value}"

if grep -qFx -- "$secret_key" "$DEDUPE_FILE"; then

continue # Already reported this secret

fi

echo "$secret_key" >> "$DEDUPE_FILE"

greptile-apps · 2026-02-10T16:22:30Z

Greptile Overview

Greptile Summary

This PR adds a new scripts/scan_git_history.sh scanner that inspects git log -p output for secret-like values in added lines, emits findings as JSON, and integrates the scanner into the report UI as a “Git History” category. It also adds an integration test harness (scripts/helpers/test_git_history.sh) and an E2E script (test_e2e_git_scanner.sh) to validate the scanner is executable, produces JSON, and is auto-discovered by clawpinch.sh.

Key issues to address before merging are in the new scanner: timeout handling is currently ineffective due to masked exit codes, shallow-clone detection won’t work correctly for worktrees, and the scanner reuses CHK-SEC-008 which already identifies a different secrets check in scan_secrets.py/catalogs (causing report ambiguity). The scanner’s pattern list also diverges from the PR description’s claim of reusing SECRET_VALUE_PATTERNS.

Confidence Score: 2/5

Not safe to merge until scanner correctness issues are fixed.
The new git-history scanner has at least two correctness bugs that change runtime behavior (timeout detection is unreachable due to || true, and shallow-clone detection misses worktrees), plus a definite check-ID collision (CHK-SEC-008) with an existing secrets check that will conflate reports/catalogs.
scripts/scan_git_history.sh

Important Files Changed

Filename	Overview
scripts/scan_git_history.sh	Adds a new git-history secret scanner, but timeout detection is currently unreachable due to `
scripts/helpers/test_git_history.sh	Adds integration tests for the new scanner; works as a basic harness but uses `eval` for running tests (unnecessary/brittle).
scripts/helpers/report.sh	Adds a category mapping for the new `scan_git_history` scanner so reports display it under “Git History”.
test_e2e_git_scanner.sh	Adds an end-to-end script to verify the new scanner is executable, returns JSON, integrates into `clawpinch.sh`, and supports deep scan mode.
CLAUDE.md	Updates architecture docs to list the new `scan_git_history.sh` scanner.
SKILL.md	Updates CLI docs to note that `--deep` includes extended git history scanning.

Sequence Diagram

sequenceDiagram
  participant User
  participant Orchestrator as clawpinch.sh
  participant Report as scripts/helpers/report.sh
  participant Scanner as scripts/scan_git_history.sh
  participant Git as git

  User->>Orchestrator: run clawpinch (--deep/--json)
  Orchestrator->>Report: print_section_header("scan_git_history.sh")
  Orchestrator->>Scanner: execute (env: CLAWPINCH_DEEP, GIT_REPO_PATH)

  Scanner->>Git: rev-parse HEAD
  alt not a git repo OR no commits OR git missing
    Scanner-->>Orchestrator: "[]"
  else history scan
    Scanner->>Git: git log -p --all --no-merges --diff-filter=A
    Git-->>Scanner: diff stream (added lines)
    loop for each added line
      Scanner->>Scanner: match against SECRET_PATTERNS
      alt match found
        Scanner-->>Orchestrator: finding objects (emit_finding)
      end
    end
    Scanner-->>Orchestrator: JSON array of findings
  end

  Orchestrator->>Orchestrator: merge + sort findings
  Orchestrator-->>User: report UI or JSON output

greptile-apps

_{6 files reviewed, 5 comments}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-02-10T16:22:34Z

scripts/scan_git_history.sh

+    git_output=$(cd "$REPO_PATH" && $timeout_cmd git log -p --all --no-textconv --no-merges --diff-filter=A -n "$max_commits" $time_limit --format="COMMIT:%H" 2>/dev/null || true)
+
+    # Edge case: Check if command was killed by timeout
+    local git_exit_code=$?


Timeout detection is broken

git_exit_code=$? is taken after git_output=$(... || true), so it will always be 0 (the || true masks the real exit status). This makes the timeout branch (124/137) unreachable, so timeouts won’t produce the intended CHK-SEC-010 finding and the scan will silently return partial/no results.

A straightforward fix is to capture $? before || true, or avoid || true and instead handle non-zero exit codes explicitly.

Prompt To Fix With AI

This is a comment left during a code review. Path: scripts/scan_git_history.sh Line: 198:201 Comment: **Timeout detection is broken** `git_exit_code=$?` is taken after `git_output=$(... || true)`, so it will always be `0` (the `|| true` masks the real exit status). This makes the timeout branch (`124`/`137`) unreachable, so timeouts won’t produce the intended `CHK-SEC-010` finding and the scan will silently return partial/no results. A straightforward fix is to capture `$?` *before* `|| true`, or avoid `|| true` and instead handle non-zero exit codes explicitly. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-02-10T16:22:35Z

scripts/scan_git_history.sh

+                        "CHK-SEC-008" \
+                        "critical" \


Duplicate check ID collision

This scanner emits findings with id CHK-SEC-008, but scripts/scan_secrets.py already uses CHK-SEC-008 for a different check (“stale secrets across backups”). IDs appear to be used as unique check identifiers in reports/catalogs, so reusing one will conflate unrelated findings and make reports ambiguous.

You’ll want to assign the git-history scanner a new unused ID (and update docs/catalog accordingly) to avoid collisions.

Prompt To Fix With AI

This is a comment left during a code review. Path: scripts/scan_git_history.sh Line: 345:346 Comment: **Duplicate check ID collision** This scanner emits findings with id `CHK-SEC-008`, but `scripts/scan_secrets.py` already uses `CHK-SEC-008` for a different check (“stale secrets across backups”). IDs appear to be used as unique check identifiers in reports/catalogs, so reusing one will conflate unrelated findings and make reports ambiguous. You’ll want to assign the git-history scanner a new unused ID (and update docs/catalog accordingly) to avoid collisions. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-02-10T16:22:36Z

scripts/scan_git_history.sh

+IS_SHALLOW=0
+if [[ -f "$REPO_PATH/.git/shallow" ]]; then
+    IS_SHALLOW=1
+fi


Shallow clone check incorrect

In worktrees or linked workdirs, $REPO_PATH/.git is a file pointing at the actual git dir. Checking only $REPO_PATH/.git/shallow will miss shallow status (and skip the remediation warning) for those repos. Use git rev-parse --git-dir to locate the real git directory before checking for shallow.

Prompt To Fix With AI

This is a comment left during a code review. Path: scripts/scan_git_history.sh Line: 88:91 Comment: **Shallow clone check incorrect** In worktrees or linked workdirs, `$REPO_PATH/.git` is a *file* pointing at the actual git dir. Checking only `$REPO_PATH/.git/shallow` will miss shallow status (and skip the remediation warning) for those repos. Use `git rev-parse --git-dir` to locate the real git directory before checking for `shallow`. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-02-10T16:22:37Z

scripts/scan_git_history.sh

+    "Private key|-----BEGIN[[:space:]]+(RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"
+    "Generic Bearer token|[Bb]earer[[:space:]]+[A-Za-z0-9_.~+/-]+=*"
+    "AWS Access Key|AKIA[0-9A-Z]{16}"
+    "GitHub Token|ghp_[A-Za-z0-9]{36}"
+    "Generic API key|api[_-]?key[[:space:]]*[:=][[:space:]]*[\"'][A-Za-z0-9_-]{20,}[\"']"


Pattern set diverges from source

This script claims to reuse SECRET_VALUE_PATTERNS from scan_secrets.py, but it adds patterns (Private key, AWS Access Key, GitHub Token, Generic API key) that aren’t in scan_secrets.py’s SECRET_VALUE_PATTERNS (and it renames “Bearer token” to “Generic Bearer token”). That mismatch means the two scanners will behave inconsistently and the comment/docs are inaccurate.

Either actually source the canonical pattern list, or update the PR/docs to reflect that this scanner uses a separate/extended pattern set.

Prompt To Fix With AI

This is a comment left during a code review. Path: scripts/scan_git_history.sh Line: 144:148 Comment: **Pattern set diverges from source** This script claims to reuse `SECRET_VALUE_PATTERNS` from `scan_secrets.py`, but it adds patterns (`Private key`, `AWS Access Key`, `GitHub Token`, `Generic API key`) that aren’t in `scan_secrets.py`’s `SECRET_VALUE_PATTERNS` (and it renames “Bearer token” to “Generic Bearer token”). That mismatch means the two scanners will behave inconsistently and the comment/docs are inaccurate. Either actually source the canonical pattern list, or update the PR/docs to reflect that this scanner uses a separate/extended pattern set. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-02-10T16:22:38Z

scripts/helpers/test_git_history.sh

+    if eval "$test_cmd" &>/dev/null; then
+        printf "${C_GREEN}✓ PASS${C_RESET}\n"
+        TESTS_PASSED=$((TESTS_PASSED + 1))
+        return 0
+    else
+        printf "${C_RED}✗ FAIL${C_RESET}\n"
+        TESTS_FAILED=$((TESTS_FAILED + 1))


Test harness uses eval

run_test executes test_cmd via eval, so any unexpected characters in the command string will be interpreted by the shell. Here the commands are currently hardcoded, but this still makes the test harness unnecessarily unsafe and brittle.

Prefer calling the test functions directly (e.g., pass function name and invoke it) to avoid eval.

Prompt To Fix With AI

This is a comment left during a code review. Path: scripts/helpers/test_git_history.sh Line: 50:56 Comment: **Test harness uses `eval`** `run_test` executes `test_cmd` via `eval`, so any unexpected characters in the command string will be interpreted by the shell. Here the commands are currently hardcoded, but this still makes the test harness unnecessarily unsafe and brittle. Prefer calling the test functions directly (e.g., pass function name and invoke it) to avoid `eval`. How can I resolve this? If you propose a fix, please make it concise.

MikeeBuilds and others added 9 commits February 7, 2026 00:21

auto-claude: subtask-2-1 - Create integration test for git history sc…

987521f

…anner

gemini-code-assist bot reviewed Feb 10, 2026

View reviewed changes

greptile-apps bot reviewed Feb 10, 2026

View reviewed changes

Conversation

MikeeBuilds commented Feb 10, 2026

Uh oh!

gemini-code-assist bot commented Feb 10, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot commented Feb 10, 2026

Greptile Overview

Greptile Summary

Confidence Score: 2/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant