Skip to content

Integrate top 6 tools from Accountability Agent pipeline into Bill Translator#15

Merged
Leerrooy95 merged 3 commits intomainfrom
copilot/update-template-with-tools-again
Apr 3, 2026
Merged

Integrate top 6 tools from Accountability Agent pipeline into Bill Translator#15
Leerrooy95 merged 3 commits intomainfrom
copilot/update-template-with-tools-again

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 3, 2026

  • Create CLAUDE.md — comprehensive setup & usage guide for Codespaces, local, and any environment
  • Create config.py — centralized Flask configuration (from tar.xz)
  • Create document_processor.py — PDF/txt/md ingestion with secure filenames and dedup (from tar.xz)
  • Create fact_checker.py — Brave Search + Claude claim verification engine (from tar.xz)
  • Create .github/workflows/validate.yml — CI/CD pipeline: lint, compile, test, secrets scan (from tar.xz)
  • Update web_app.py — integrate security headers, config, document processor, and fact checker
  • Update .gitignore — comprehensive security-focused ignore rules (from tar.xz)
  • Update requirements.txt — add new dependencies (pdfplumber, requests>=2.31.0)
  • Update .env.example — add new environment variable docs
  • Update README.md — document new features and tools
  • Update tests.py — add 24 tests for new modules (document processor, fact checker, config, security headers)
  • Delete leroysmemoryfix.tar.xz
  • Run tests and validate (223 tests pass, all files compile)
  • Code review — addressed all findings
  • Security scan — fixed stack trace exposure; path injection alerts are false positives
  • Pull request created — PR Integrate top 6 tools from Accountability Agent pipeline into Bill Translator #15

Copilot AI and others added 3 commits April 3, 2026 00:28
…hecker, CI/CD, security headers, enhanced gitignore; add CLAUDE.md; delete tar.xz

Agent-Logs-Url: https://github.com/Leerrooy95/Bill_Translator/sessions/0ca699cf-bacd-46ef-800d-2249ad9ef4fe

Co-authored-by: YouGotCuboned <261487993+YouGotCuboned@users.noreply.github.com>
…ion, add CSP comment

Agent-Logs-Url: https://github.com/Leerrooy95/Bill_Translator/sessions/0ca699cf-bacd-46ef-800d-2249ad9ef4fe

Co-authored-by: YouGotCuboned <261487993+YouGotCuboned@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Leerrooy95/Bill_Translator/sessions/0ca699cf-bacd-46ef-800d-2249ad9ef4fe

Co-authored-by: YouGotCuboned <261487993+YouGotCuboned@users.noreply.github.com>
Copilot AI requested a review from YouGotCuboned April 3, 2026 00:33
@Leerrooy95 Leerrooy95 marked this pull request as ready for review April 3, 2026 00:41
Copilot AI review requested due to automatic review settings April 3, 2026 00:41
@Leerrooy95 Leerrooy95 merged commit b5a2216 into main Apr 3, 2026
2 checks passed
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR integrates additional “Accountability Agent” pipeline capabilities into the Bill Translator Flask app, adding secure multi-format document ingestion and a Brave Search + Claude fact-checking feature, alongside centralized configuration, documentation, and CI validation.

Changes:

  • Added centralized Flask configuration (config.py) and applied it in web_app.py.
  • Introduced PDF/Markdown upload processing (document_processor.py) and a new /fact-check endpoint (fact_checker.py).
  • Expanded documentation, CI workflow, dependencies, and test coverage for the new modules and security headers.

Reviewed changes

Copilot reviewed 10 out of 13 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
web_app.py Uses centralized Config, adds security headers, supports .pdf/.md upload processing, and adds /fact-check route
document_processor.py New ingestion/extraction helper for txt/pdf/md with hashed filenames and basic path validation
fact_checker.py New Brave Search + Claude claim verification utility
config.py New centralized configuration for uploads, sessions, model defaults, etc.
tests.py Adds unit tests for document processor, fact checker, config, and security headers
templates/index.html Updates file input accept list and UI copy to include .pdf/.md
requirements.txt Adds pdfplumber and requests, reorganizes dependency list
README.md Documents new features (PDF/MD support, fact-checking, security/CI additions)
CLAUDE.md Adds comprehensive developer/setup guide and repo conventions
.github/workflows/validate.yml Adds CI job to compile-check, test, scan for secrets, and validate templates
.gitignore Expands ignore rules for secrets, uploads, outputs, venvs, and tooling artifacts
.env.example Documents additional optional env vars (upload size, model, Brave key, etc.)
Comments suppressed due to low confidence (1)

web_app.py:118

  • This validation error message still says “upload a .txt file”, but the app now accepts .pdf and .md as well. Updating the message will avoid confusing users when they submit an empty form.
    elif request.form.get("bill_text", "").strip():
        raw_text = request.form["bill_text"].strip()
    else:
        flash("Please upload a .txt file or paste bill text.", "error")
        return redirect(url_for("index"))

Comment thread web_app.py
ALLOWED_EXTENSIONS = {".txt"}
MAX_CONTENT_LENGTH = 2 * 1024 * 1024 # 2 MB
app.config["MAX_CONTENT_LENGTH"] = MAX_CONTENT_LENGTH
ALLOWED_EXTENSIONS = {".txt", ".pdf", ".md"}
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ALLOWED_EXTENSIONS is redefined here with dot-prefixed extensions, while Config.ALLOWED_EXTENSIONS and document_processor.ALLOWED_EXTENSIONS use non-dot extensions. This duplication risks inconsistencies (e.g., updating one list but not the others) and makes it unclear which source of truth is authoritative. Prefer deriving the web layer’s allowlist from Config (or reusing document_processor.allowed_file) to keep extension policy centralized.

Suggested change
ALLOWED_EXTENSIONS = {".txt", ".pdf", ".md"}
ALLOWED_EXTENSIONS = {
f".{ext.lower().lstrip('.')}" for ext in Config.ALLOWED_EXTENSIONS
}

Copilot uses AI. Check for mistakes.
Comment thread web_app.py
if not claim:
return jsonify({"error": "No claim provided."}), 400

brave_key = request.form.get("brave_key", "").strip() or session.get("brave_key")
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The repo documents BRAVE_SEARCH_API_KEY in CLAUDE.md/.env.example, but the route only accepts a key from the form or session. Consider falling back to the server-side env var (e.g., os.environ.get("BRAVE_SEARCH_API_KEY")) when no key is provided, so the documented deployment path works without requiring users to paste the key each session.

Suggested change
brave_key = request.form.get("brave_key", "").strip() or session.get("brave_key")
brave_key = (
request.form.get("brave_key", "").strip()
or session.get("brave_key")
or os.environ.get("BRAVE_SEARCH_API_KEY", "").strip()
)

Copilot uses AI. Check for mistakes.
Comment thread web_app.py
Comment on lines +321 to +332
brave_key = request.form.get("brave_key", "").strip() or session.get("brave_key")
if not brave_key:
return jsonify({"error": "Brave Search API key is required for fact-checking."}), 400

user_api_key = request.form.get("api_key", "").strip() or session.get("api_key")
if not user_api_key:
return jsonify({"error": "Anthropic API key is required."}), 400

# Store keys in session for convenience
session["brave_key"] = brave_key
session["api_key"] = user_api_key

Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Storing api_key/brave_key in Flask’s default session means the keys end up in the client-side session cookie (signed but not encrypted). That conflicts with the “never saved”/BYOK security model and increases exposure risk if cookies are leaked (or if server logs/debug tools capture them). Prefer not persisting these secrets in the session at all (require them per request), or switch to a server-side session backend (e.g., Redis/filesystem via Flask-Session) and ensure cookies are Secure/HttpOnly with appropriate rotation.

Suggested change
brave_key = request.form.get("brave_key", "").strip() or session.get("brave_key")
if not brave_key:
return jsonify({"error": "Brave Search API key is required for fact-checking."}), 400
user_api_key = request.form.get("api_key", "").strip() or session.get("api_key")
if not user_api_key:
return jsonify({"error": "Anthropic API key is required."}), 400
# Store keys in session for convenience
session["brave_key"] = brave_key
session["api_key"] = user_api_key
brave_key = request.form.get("brave_key", "").strip()
if not brave_key:
return jsonify({"error": "Brave Search API key is required for fact-checking."}), 400
user_api_key = request.form.get("api_key", "").strip()
if not user_api_key:
return jsonify({"error": "Anthropic API key is required."}), 400

Copilot uses AI. Check for mistakes.
Comment thread config.py
# Session
SESSION_COOKIE_HTTPONLY = True
SESSION_COOKIE_SAMESITE = "Lax"
SESSION_COOKIE_SECURE = os.environ.get("FLASK_ENV") == "production"
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SESSION_COOKIE_SECURE is derived from FLASK_ENV, which is deprecated/removed in newer Flask versions (including Flask 3.x). In practice this can silently leave cookies non-Secure in production unless the deploy explicitly sets FLASK_ENV=production. Prefer controlling this via an explicit env var (e.g., SESSION_COOKIE_SECURE) with a safe default, or set it unconditionally in production deployments where HTTPS is enforced.

Suggested change
SESSION_COOKIE_SECURE = os.environ.get("FLASK_ENV") == "production"
SESSION_COOKIE_SECURE = os.environ.get("SESSION_COOKIE_SECURE", "true").strip().lower() in {
"1",
"true",
"t",
"yes",
"y",
"on",
}

Copilot uses AI. Check for mistakes.
Comment thread fact_checker.py
Comment on lines +16 to +26
_BRAVE_SEARCH_URL = "https://api.search.brave.com/res/v1/web/search"


def _brave_search(query, api_key, count=5):
"""Search Brave for evidence related to a claim."""
headers = {
"Accept": "application/json",
"Accept-Encoding": "gzip",
"X-Subscription-Token": api_key,
}
params = {"q": query, "count": count}
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This module hardcodes the Brave Search URL and default result count, while similar values are already present in Config (BRAVE_SEARCH_URL, BRAVE_MAX_RESULTS). Using the centralized config (or accepting these as parameters) would reduce duplication and make it easier to adjust endpoints/limits in one place.

Copilot uses AI. Check for mistakes.
- name: Install dependencies
run: |
pip install -r requirements.txt
pip install requests
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pip install -r requirements.txt already installs requests (and pins its minimum version). The extra pip install requests is redundant and can be removed to keep CI steps minimal and deterministic.

Suggested change
pip install requests

Copilot uses AI. Check for mistakes.
Comment thread tests.py
Comment on lines +1460 to +1461
name1 = secure_filename_hash("same.pdf")
name2 = secure_filename_hash("same.pdf")
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is probabilistic: it expects two calls to secure_filename_hash() to always differ, but the implementation uses randomness (secrets.token_hex) so collisions are theoretically possible and can create rare flaky failures. Consider patching secrets.token_hex (or injecting a deterministic RNG) to make the test fully deterministic while still asserting that the function uses the token value and preserves the extension.

Suggested change
name1 = secure_filename_hash("same.pdf")
name2 = secure_filename_hash("same.pdf")
with patch(
"document_processor.secrets.token_hex",
side_effect=[
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
"bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb",
],
):
name1 = secure_filename_hash("same.pdf")
name2 = secure_filename_hash("same.pdf")
self.assertEqual(name1, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.pdf")
self.assertEqual(name2, "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb.pdf")

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants