-
Notifications
You must be signed in to change notification settings - Fork 1
Added boilerplate code, auth, validators and guardrails API #11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughComplete backend project scaffolding with FastAPI, SQLModel ORM, Alembic migrations, Postgres database, HTTP Bearer authentication, guardrails-based input/output validators, Docker Compose orchestration, and deployment infrastructure with supporting documentation and CI/CD scripts. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Areas requiring special attention:
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 20
🧹 Nitpick comments (28)
backend/app/core/middleware.py (1)
17-21: Use extractedclient_ipin logging or remove it.The
client_ipvariable is extracted on Line 17 but never used in the log statement on lines 19-21. Consider including it in the log output for better observability.🔎 Apply this diff to include client IP in logs:
logger.info( - f"{request.method} {request.url.path} - {response.status_code} [{process_time:.2f}ms]" + f"{request.method} {request.url.path} - {response.status_code} [{process_time:.2f}ms] - {client_ip}" )copier.yml (1)
11-17: Document that security defaults must be changed before deployment.The template uses
changethisas the default forsecret_key,first_superuser_password, andpostgres_password. While acceptable for a template, consider adding prominent warnings in deployment documentation that these MUST be changed before production use.A comment in the template itself could help:
secret_key: type: str help: | - 'The secret key for the project, used for security, + '⚠️ REQUIRED: The secret key for the project, used for security, stored in .env, you can generate one with: python -c "import secrets; print(secrets.token_urlsafe(32))"' default: changethisAlso applies to: 24-27, 49-55
README.md (1)
14-14: Consider formatting the URL as a markdown link.Line 14 contains a bare URL. For better markdown compliance and readability, consider formatting it as a proper link.
🔎 Suggested fix:
-- [uv](https://docs.astral.sh/uv/) for Python package and environment management. +- [uv](https://docs.astral.sh/uv/) for Python package and environment management(The URL is already a proper markdown link - the linter may be confused. If you meant to add more context, consider:
[uv package manager](https://docs.astral.sh/uv/))backend/app/load_env.py (1)
5-13: Document the expected execution context for relative paths.The function uses relative paths (
../.env) which assume the script is executed from thebackend/directory. Consider:
- Adding a docstring explaining the execution context requirement
- Using
Path(__file__).parent.parent / ".env"for more robust path resolution- Adding file existence validation before calling
load_dotenv🔎 More robust implementation:
+from pathlib import Path import os from dotenv import load_dotenv def load_environment(): + """Load environment variables from .env file. + + Expects to be called from the backend/ directory. + Uses ../.env for development/production, ../.env.test for testing. + """ env = os.getenv("ENVIRONMENT", "development") - # Use the same path as config.py expects (one level above backend/) - env_file = "../.env" + base_path = Path(__file__).parent.parent.parent + env_file = base_path / ".env" if env == "testing": - env_file = "../.env.test" + env_file = base_path / ".env.test" - load_dotenv(env_file) + if env_file.exists(): + load_dotenv(env_file) + else: + # Silent fail or log warning depending on requirements + passbackend/app/initial_data.py (1)
7-8: Consider removing module-level logging configuration.Calling
basicConfigat the module level can interfere with the application's logging configuration (likely defined inmain.py). Module-level logging setup affects the global logging state and may override application-wide settings.🔎 Consider this approach:
-logging.basicConfig(level=logging.INFO) logger = logging.getLogger(__name__)Let the application's main entry point handle logging configuration, and this module will inherit those settings.
scripts/test-local.sh (2)
8-8: Quote the command substitution to prevent word splitting.The unquoted
$(uname -s)could cause issues if the command returns unexpected output with spaces, though this is unlikely withuname -s.🔎 Apply this fix:
-if [ $(uname -s) = "Linux" ]; then +if [ "$(uname -s)" = "Linux" ]; then
6-15: Consider migrating todocker compose(v2).The script uses the deprecated
docker-compose(v1) command. Docker Compose v2 usesdocker compose(without hyphen) and is the recommended approach going forward.🔎 Suggested migration:
-docker-compose down -v --remove-orphans +docker compose down -v --remove-orphans if [ "$(uname -s)" = "Linux" ]; then echo "Remove __pycache__ files" sudo find . -type d -name __pycache__ -exec rm -r {} \+ fi -docker-compose build -docker-compose up -d -docker-compose exec -T backend bash scripts/tests-start.sh "$@" +docker compose build +docker compose up -d +docker compose exec -T backend bash scripts/tests-start.sh "$@"scripts/build-push.sh (1)
10-10: Consider migrating todocker compose(v2).Similar to other scripts, this uses the deprecated
docker-composecommand. Consider usingdocker composefor consistency with Docker Compose v2.🔎 Apply this change:
-docker-compose -f docker-compose.yml push +docker compose -f docker-compose.yml pushdevelopment.md (1)
105-116: Fix heading level increment for proper document structure.The heading jumps from h2 (##) to h4 (####), skipping h3. This violates proper heading hierarchy and can affect document navigation and accessibility tools.
🔎 Apply this fix:
-#### Install pre-commit to run automatically +### Install pre-commit to run automaticallyAlso consider adjusting the next heading at line 128 if needed:
-#### Running pre-commit hooks manually +### Running pre-commit hooks manually.env.example (1)
50-52: Consider using placeholder format consistently for sensitive keys.The API keys use
<ADD-KEY>and<ADD-TOKEN>placeholders which is good practice. Also, add a trailing newline at end of file.🔎 Suggested improvement:
# require as a env if you want to use kaapi guardrail apis OPENAI_API_KEY=<ADD-KEY> GUARDRAILS_HUB_API_KEY=<ADD-KEY> -AUTH_TOKEN=<ADD-TOKEN> +AUTH_TOKEN=<ADD-TOKEN> +backend/app/alembic/versions/d98dd8ec85a3_edit_replace_id_integers_in_all_models_.py (2)
8-11: Unused import:sqlmodel.sql.sqltypes.This import is not used in the migration file.
🔎 Apply this diff:
from alembic import op import sqlalchemy as sa -import sqlmodel.sql.sqltypes from sqlalchemy.dialects import postgresql
57-90: Downgrade creates sequences correctly but has a subtle issue.Line 59-61:
autoincrement=Trueonadd_columndoesn't automatically create/attach a sequence in PostgreSQL via Alembic. The subsequent sequence creation (lines 65-66) handles this, but theautoincrement=Trueparameter is misleading and has no effect here.🔎 Consider removing the misleading parameter:
def downgrade(): # Reverse the upgrade process - op.add_column('user', sa.Column('old_id', sa.Integer, autoincrement=True)) - op.add_column('item', sa.Column('old_id', sa.Integer, autoincrement=True)) + op.add_column('user', sa.Column('old_id', sa.Integer)) + op.add_column('item', sa.Column('old_id', sa.Integer)) op.add_column('item', sa.Column('old_owner_id', sa.Integer, nullable=True))backend/Dockerfile (1)
1-44: Dockerfile follows uv best practices with good layer caching.The Dockerfile correctly implements the recommended uv Docker integration patterns with cache mounts and intermediate layers for efficient builds.
A few optional improvements to consider:
- Add
EXPOSE 8000for documentation purposes (FastAPI default port).- Consider adding a
HEALTHCHECKinstruction for container orchestration.- Consider using a multi-stage build or slim base image to reduce final image size.
🔎 Optional: Add EXPOSE and HEALTHCHECK
ENV PYTHONPATH=/app +EXPOSE 8000 + COPY ./scripts /app/scriptsRUN --mount=type=cache,target=/root/.cache/uv \ uv sync +HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \ + CMD ["python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"] + CMD ["fastapi", "run", "--workers", "4", "app/main.py"]backend/app/alembic/versions/1a31ce608336_add_cascade_delete_relationships.py (1)
10-10: Unused import:sqlmodel.sql.sqltypes.This import is not used in the migration and can be removed.
🔎 Apply this diff:
from alembic import op import sqlalchemy as sa -import sqlmodel.sql.sqltypesbackend/app/core/db.py (2)
1-1: Unused imports:Sessionandselectare imported but not used.Only
create_engineis used in this module. TheSessionandselectimports appear to be leftover from template code.🔎 Apply this diff to remove unused imports:
-from sqlmodel import Session, create_engine, select +from sqlmodel import create_engine
13-22: Add trailing newline and consider documenting the placeholder.The
init_dbfunction is currently a no-op placeholder. Per the static analysis hint, add a trailing newline. Also, consider whether this function should be removed if unused, or documented for future implementation.🔎 Apply this diff:
# This works because the models are already imported and registered from app.models # SQLModel.metadata.create_all(engine) pass +backend/app/tests_pre_start.py (2)
16-29: Code duplication withbackend_pre_start.py.This implementation is nearly identical to
backend/app/backend_pre_start.py. Consider extracting the shared retry logic into a common utility function to reduce duplication and ensure consistent behavior.🔎 Example shared utility approach:
Create a shared module (e.g.,
app/core/startup.py):# app/core/startup.py import logging from sqlalchemy import Engine from sqlmodel import Session, select from tenacity import after_log, before_log, retry, stop_after_attempt, wait_fixed logger = logging.getLogger(__name__) max_tries = 60 * 5 # 5 minutes wait_seconds = 1 @retry( stop=stop_after_attempt(max_tries), wait=wait_fixed(wait_seconds), before=before_log(logger, logging.INFO), after=after_log(logger, logging.WARNING), ) def wait_for_db(db_engine: Engine) -> None: """Wait for database to be available.""" try: with Session(db_engine) as session: session.exec(select(1)) except Exception as e: logger.error(e) raise eThen both
backend_pre_start.pyandtests_pre_start.pycan import and usewait_for_db.
20-20: Uselogging.WARNINGinstead of deprecatedlogging.WARN.
logging.WARNis deprecated in favor oflogging.WARNING.🔎 Apply this diff:
@retry( stop=stop_after_attempt(max_tries), wait=wait_fixed(wait_seconds), before=before_log(logger, logging.INFO), - after=after_log(logger, logging.WARN), + after=after_log(logger, logging.WARNING), )backend/app/core/exception_handlers.py (1)
26-33: Consider logging the exception before returning the generic 500 response.For the generic exception handler, the error is silently converted to a response. In production, unhandled exceptions should be logged for debugging and monitoring purposes. Also, exposing raw exception messages (
str(exc)) in responses could leak sensitive implementation details.🔎 Suggested improvement
@app.exception_handler(Exception) async def generic_error_handler(request: Request, exc: Exception): + logger.exception("Unhandled exception occurred") return JSONResponse( status_code=HTTP_500_INTERNAL_SERVER_ERROR, content=APIResponse.failure_response( - str(exc) or "An unexpected error occurred." + "An unexpected error occurred." ).model_dump(), )This requires adding a logger import and instantiation at the top of the file. Consider exposing the actual error message only in non-production environments via a configuration flag.
backend/app/backend_pre_start.py (1)
22-29: Consider extracting shared DB health-check logic.This
initfunction is nearly identical to the one inbackend/app/tests_pre_start.py. Both perform the same DB connectivity check with retry logic.You could extract the shared logic into a common utility (e.g., in
app/core/db.py) to avoid duplication:# In app/core/db.py def check_db_connection(db_engine: Engine) -> None: with Session(db_engine) as session: session.exec(select(1))Then both pre-start scripts can import and use this with their own retry decorators.
backend/app/alembic/versions/e2412789c190_initialize_models.py (1)
40-44: Consider addingON DELETE CASCADEand an index on the foreign key.The foreign key constraint on
owner_iddoesn't specify cascade behavior. Deleting a user with associated items will fail with a foreign key violation. Additionally, an index onowner_idwould improve query performance for user-item lookups.🔎 Suggested improvement
sa.ForeignKeyConstraint( ["owner_id"], ["user.id"], + ondelete="CASCADE", ), sa.PrimaryKeyConstraint("id"), ) + op.create_index(op.f("ix_item_owner_id"), "item", ["owner_id"], unique=False)And in
downgrade():def downgrade(): # ### commands auto generated by Alembic - please adjust! ### + op.drop_index(op.f("ix_item_owner_id"), table_name="item") op.drop_table("item")Note: The AI summary mentions cascade delete behavior may be addressed in a subsequent migration. If so, disregard this comment.
backend/app/api/deps.py (3)
4-4: Remove unused importHeader.The
Headerimport is not used in this module.-from fastapi import Depends, Header, HTTPException, status, Security +from fastapi import Depends, HTTPException, status, Security
8-8: Remove commented-out code.Commented-out imports should be removed to keep the codebase clean. If this is needed later, version control will preserve the history.
-# from app.core import security
17-17: Consider renamingsecurityto avoid shadowing.The variable name
securitycould shadow the commented-out import fromapp.core. Consider a more descriptive name likebearer_schemeorhttp_bearer.-security = HTTPBearer(auto_error=False) +bearer_scheme = HTTPBearer(auto_error=False)This would also require updating the reference on line 22.
backend/app/utils.py (2)
6-6: Use modern type hintdictinstead oftyping.Dict.Python 3.9+ supports using built-in
dictdirectly for type hints.🔎 Apply this diff:
-from typing import Any, Dict, Generic, Optional, TypeVar +from typing import Any, Generic, Optional, TypeVarAs per static analysis hints, also update line 17:
- metadata: Optional[Dict[str, Any]] = None + metadata: Optional[dict[str, Any]] = NoneAnd lines 21, 30 where
Dict[str, Any]appears in method signatures.
37-37: Add trailing newline.As per static analysis, the file is missing a newline at the end.
backend/app/core/config.py (1)
17-22: Clarify the CORS parsing logic and improve error handling.The logic has potential issues:
- Line 20: The condition
isinstance(v, list | str)after the line 18 check is redundant sincevmust be astrat that point if it wasn't a list initially.- Line 22: Raising
ValueError(v)without a descriptive message makes debugging difficult.- The function doesn't appear to be used in the provided Settings class fields.
🔎 Apply this diff to improve the function:
def parse_cors(v: Any) -> list[str] | str: if isinstance(v, str) and not v.startswith("["): return [i.strip() for i in v.split(",") if i.strip()] - elif isinstance(v, list | str): + elif isinstance(v, (list, str)): return v - raise ValueError(v) + raise ValueError(f"Invalid CORS value: {v}. Expected string or list.")docker-compose.yml (1)
27-36: Pin the Adminer image version.Line 28 uses
image: adminerwithout a version tag, which defaults tolatest. This can lead to unexpected behavior when the image updates. Consider pinning to a specific version for reproducibility.🔎 Example with version pinning:
- image: adminer + image: adminer:4.8.1
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
backend/uv.lockis excluded by!**/*.lock
📒 Files selected for processing (46)
.env.example(1 hunks)README.md(1 hunks)SECURITY.md(1 hunks)backend/.dockerignore(1 hunks)backend/.gitignore(1 hunks)backend/Dockerfile(1 hunks)backend/README.md(1 hunks)backend/alembic.ini(1 hunks)backend/app/alembic/README(1 hunks)backend/app/alembic/env.py(1 hunks)backend/app/alembic/script.py.mako(1 hunks)backend/app/alembic/versions/1a31ce608336_add_cascade_delete_relationships.py(1 hunks)backend/app/alembic/versions/9c0a54914c78_add_max_length_for_string_varchar_.py(1 hunks)backend/app/alembic/versions/d98dd8ec85a3_edit_replace_id_integers_in_all_models_.py(1 hunks)backend/app/alembic/versions/e2412789c190_initialize_models.py(1 hunks)backend/app/api/deps.py(1 hunks)backend/app/api/main.py(1 hunks)backend/app/api/routes/utils.py(1 hunks)backend/app/backend_pre_start.py(1 hunks)backend/app/core/config.py(1 hunks)backend/app/core/db.py(1 hunks)backend/app/core/exception_handlers.py(1 hunks)backend/app/core/middleware.py(1 hunks)backend/app/initial_data.py(1 hunks)backend/app/load_env.py(1 hunks)backend/app/main.py(1 hunks)backend/app/tests_pre_start.py(1 hunks)backend/app/utils.py(1 hunks)backend/pyproject.toml(1 hunks)backend/scripts/format.sh(1 hunks)backend/scripts/lint.sh(1 hunks)backend/scripts/prestart.sh(1 hunks)backend/scripts/test.sh(1 hunks)backend/scripts/tests-start.sh(1 hunks)copier.yml(1 hunks)deployment.md(1 hunks)development.md(1 hunks)docker-compose.yml(1 hunks)hooks/post_gen_project.py(1 hunks)release-notes.md(1 hunks)scripts/build-push.sh(1 hunks)scripts/build.sh(1 hunks)scripts/deploy.sh(1 hunks)scripts/generate-client.sh(1 hunks)scripts/test-local.sh(1 hunks)scripts/test.sh(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (10)
backend/app/alembic/versions/1a31ce608336_add_cascade_delete_relationships.py (1)
backend/app/alembic/versions/e2412789c190_initialize_models.py (2)
upgrade(19-46)downgrade(49-54)
backend/scripts/prestart.sh (4)
backend/app/alembic/versions/1a31ce608336_add_cascade_delete_relationships.py (1)
upgrade(20-27)backend/app/alembic/versions/9c0a54914c78_add_max_length_for_string_varchar_.py (1)
upgrade(20-43)backend/app/alembic/versions/d98dd8ec85a3_edit_replace_id_integers_in_all_models_.py (1)
upgrade(21-55)backend/app/alembic/versions/e2412789c190_initialize_models.py (1)
upgrade(19-46)
backend/app/initial_data.py (3)
backend/app/core/db.py (1)
init_db(13-22)backend/app/backend_pre_start.py (2)
init(22-29)main(32-35)backend/app/tests_pre_start.py (2)
init(22-29)main(32-35)
backend/app/tests_pre_start.py (1)
backend/app/backend_pre_start.py (2)
init(22-29)main(32-35)
backend/app/alembic/env.py (1)
backend/app/core/config.py (1)
SQLALCHEMY_DATABASE_URI(47-55)
backend/app/core/exception_handlers.py (1)
backend/app/utils.py (2)
APIResponse(13-37)failure_response(26-37)
backend/app/alembic/versions/9c0a54914c78_add_max_length_for_string_varchar_.py (3)
backend/app/alembic/versions/1a31ce608336_add_cascade_delete_relationships.py (2)
upgrade(20-27)downgrade(30-37)backend/app/alembic/versions/d98dd8ec85a3_edit_replace_id_integers_in_all_models_.py (2)
upgrade(21-55)downgrade(57-90)backend/app/alembic/versions/e2412789c190_initialize_models.py (2)
upgrade(19-46)downgrade(49-54)
backend/app/core/db.py (1)
backend/app/core/config.py (1)
SQLALCHEMY_DATABASE_URI(47-55)
backend/app/alembic/versions/d98dd8ec85a3_edit_replace_id_integers_in_all_models_.py (2)
backend/app/alembic/versions/1a31ce608336_add_cascade_delete_relationships.py (2)
upgrade(20-27)downgrade(30-37)backend/app/alembic/versions/e2412789c190_initialize_models.py (2)
upgrade(19-46)downgrade(49-54)
backend/app/backend_pre_start.py (2)
backend/app/initial_data.py (2)
init(11-13)main(16-19)backend/app/tests_pre_start.py (2)
init(22-29)main(32-35)
🪛 dotenv-linter (4.0.0)
.env.example
[warning] 17-17: [QuoteCharacter] The value has quote characters (', ")
(QuoteCharacter)
[warning] 22-22: [UnorderedKey] The FIRST_SUPERUSER key should go before the SECRET_KEY key
(UnorderedKey)
[warning] 23-23: [UnorderedKey] The FIRST_SUPERUSER_PASSWORD key should go before the SECRET_KEY key
(UnorderedKey)
[warning] 24-24: [QuoteCharacter] The value has quote characters (', ")
(QuoteCharacter)
[warning] 24-24: [UnorderedKey] The EMAIL_TEST_USER key should go before the FIRST_SUPERUSER key
(UnorderedKey)
[warning] 34-34: [UnorderedKey] The POSTGRES_PORT key should go before the POSTGRES_SERVER key
(UnorderedKey)
[warning] 35-35: [UnorderedKey] The POSTGRES_DB key should go before the POSTGRES_PORT key
(UnorderedKey)
[warning] 37-37: [UnorderedKey] The POSTGRES_PASSWORD key should go before the POSTGRES_PORT key
(UnorderedKey)
[warning] 46-46: [SpaceCharacter] The line has spaces around equal sign
(SpaceCharacter)
[warning] 47-47: [SpaceCharacter] The line has spaces around equal sign
(SpaceCharacter)
[warning] 51-51: [UnorderedKey] The GUARDRAILS_HUB_API_KEY key should go before the OPENAI_API_KEY key
(UnorderedKey)
[warning] 52-52: [EndingBlankLine] No blank line at the end of the file
(EndingBlankLine)
[warning] 52-52: [UnorderedKey] The AUTH_TOKEN key should go before the GUARDRAILS_HUB_API_KEY key
(UnorderedKey)
🪛 LanguageTool
SECURITY.md
[style] ~3-~3: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: # Security Policy Security is very important for this project and its community. 🔒 ...
(EN_WEAK_ADJECTIVE)
development.md
[grammar] ~13-~13: Use a hyphen to join words.
Context: ...interact with these URLs: Backend, JSON based web API based on OpenAPI: <http://...
(QB_NEW_EN_HYPHEN)
[style] ~55-~55: In American English, abbreviations like “etc.” require a period.
Context: ...ch service (backend, frontend, adminer, etc). When you deploy it to production (or...
(ETC_PERIOD)
README.md
[style] ~84-~84: Using many exclamation marks might seem excessive (in this case: 5 exclamation marks for a text that’s 2167 characters long)
Context: ...or creating and maintaining the template!!!
(EN_EXCESSIVE_EXCLAMATION)
deployment.md
[uncategorized] ~96-~96: The official name of this software platform is spelled with a capital “H”.
Context: ...ub * Add Docker permissions to the `github` user: bash sudo usermod -aG docker...
(GITHUB)
[uncategorized] ~102-~102: The official name of this software platform is spelled with a capital “H”.
Context: ...github * Temporarily switch to the `github` user: bash sudo su - github ``` *...
(GITHUB)
[uncategorized] ~108-~108: The official name of this software platform is spelled with a capital “H”.
Context: ...bash sudo su - github ``` * Go to the github` user's home directory: bash cd ...
(GITHUB)
[style] ~120-~120: Consider a more expressive alternative.
Context: ...ng, you can install it as a service. To do that, exit the github user and go bac...
(DO_ACHIEVE)
[uncategorized] ~120-~120: The official name of this software platform is spelled with a capital “H”.
Context: ...l it as a service. To do that, exit the github user and go back to the root user: ...
(GITHUB)
[uncategorized] ~128-~128: The official name of this software platform is spelled with a capital “H”.
Context: ...that user. Before being able to go the github user directory, you need to become the...
(GITHUB)
[style] ~134-~134: This phrase is redundant. Consider using “inside”.
Context: ...r, go to the actions-runner directory inside of the github user's home directory: ``...
(OUTSIDE_OF)
[uncategorized] ~134-~134: The official name of this software platform is spelled with a capital “H”.
Context: ...actions-runnerdirectory inside of thegithub` user's home directory: ```bash cd /ho...
(GITHUB)
[uncategorized] ~140-~140: The official name of this software platform is spelled with a capital “H”.
Context: ...osted runner as a service with the user github: bash ./svc.sh install github ...
(GITHUB)
[uncategorized] ~164-~164: The official name of this software platform is spelled with a capital “H”.
Context: ...secrets-for-a-repository). The current Github Actions workflows expect these secrets:...
(GITHUB)
[uncategorized] ~180-~180: The official name of this software platform is spelled with a capital “H”.
Context: ...here are GitHub Action workflows in the .github/workflows directory already configured...
(GITHUB)
backend/README.md
[grammar] ~44-~44: Use a hyphen to join words.
Context: ...ose watch ``` There is also a commented out command override, you can uncommen...
(QB_NEW_EN_HYPHEN)
[grammar] ~78-~78: Use a hyphen to join words.
Context: ...runs the live reloading server that auto reloads when it detects code changes. N...
(QB_NEW_EN_HYPHEN)
🪛 markdownlint-cli2 (0.18.1)
SECURITY.md
19-19: Dollar signs used before commands without showing output
(MD014, commands-show-output)
25-25: Dollar signs used before commands without showing output
(MD014, commands-show-output)
development.md
105-105: Heading levels should only increment by one level at a time
Expected: h3; Actual: h4
(MD001, heading-increment)
README.md
15-15: Bare URL used
(MD034, no-bare-urls)
release-notes.md
574-574: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
575-575: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
576-576: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
577-577: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
578-578: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
579-579: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
580-580: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
581-581: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
582-582: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
583-583: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
691-691: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
692-692: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
693-693: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
694-694: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
695-695: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
696-696: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
697-697: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
698-698: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
699-699: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
705-705: Bare URL used
(MD034, no-bare-urls)
737-737: Bare URL used
(MD034, no-bare-urls)
backend/README.md
19-19: Dollar signs used before commands without showing output
(MD014, commands-show-output)
25-25: Dollar signs used before commands without showing output
(MD014, commands-show-output)
41-41: Dollar signs used before commands without showing output
(MD014, commands-show-output)
49-49: Dollar signs used before commands without showing output
(MD014, commands-show-output)
55-55: Dollar signs used before commands without showing output
(MD014, commands-show-output)
69-69: Dollar signs used before commands without showing output
(MD014, commands-show-output)
89-89: Dollar signs used before commands without showing output
(MD014, commands-show-output)
125-125: Dollar signs used before commands without showing output
(MD014, commands-show-output)
133-133: Dollar signs used before commands without showing output
(MD014, commands-show-output)
141-141: Dollar signs used before commands without showing output
(MD014, commands-show-output)
153-153: Dollar signs used before commands without showing output
(MD014, commands-show-output)
🪛 Ruff (0.14.8)
backend/app/core/middleware.py
12-12: Local variable e is assigned to but never used
Remove assignment to unused variable e
(F841)
17-17: Local variable client_ip is assigned to but never used
Remove assignment to unused variable client_ip
(F841)
backend/app/core/exception_handlers.py
13-13: Unused function argument: request
(ARG001)
20-20: Unused function argument: request
(ARG001)
27-27: Unused function argument: request
(ARG001)
backend/app/utils.py
6-6: typing.Dict is deprecated, use dict instead
(UP035)
37-37: No newline at end of file
Add trailing newline
(W292)
backend/app/core/db.py
22-22: No newline at end of file
Add trailing newline
(W292)
backend/app/api/routes/utils.py
10-10: No newline at end of file
Add trailing newline
(W292)
🪛 Shellcheck (0.11.0)
scripts/test-local.sh
[warning] 8-8: Quote this to prevent word splitting.
(SC2046)
🔇 Additional comments (38)
scripts/build.sh (1)
1-10: LGTM!The build script correctly enforces the TAG variable requirement and provides a sensible default for FRONTEND_ENV. Error handling with
set -eensures the script exits on failures.backend/scripts/test.sh (1)
1-9: LGTM!The test script correctly runs pytest with coverage tracking and generates multiple report formats (console, HTML, XML) for different use cases. Error handling is appropriate.
backend/scripts/tests-start.sh (1)
1-15: LGTM!The test startup script correctly orchestrates the test workflow: environment setup → pre-start health check → database migrations → test execution. Error handling for migration failures is explicit and appropriate.
backend/app/alembic/versions/9c0a54914c78_add_max_length_for_string_varchar_.py (3)
20-43: Migration logic is correct.The upgrade function properly adds 255-character length constraints to the string fields in User and Item tables. The implementation is clean and the constraints are appropriate for typical email addresses, names, titles, and short descriptions.
46-69: Downgrade logic is correctly symmetric.The downgrade function properly reverts all four field length constraints back to unlimited strings, maintaining consistency with the upgrade path.
14-15: The migration chain is correct. The migration9c0a54914c78properly depends one2412789c190, and all migrations form a valid linear chain:e2412789c190→9c0a54914c78→d98dd8ec85a3→1a31ce608336. No branching or ordering issues detected.backend/app/initial_data.py (1)
11-19: LGTM! Initial data setup structure is well-organized.The structure correctly creates a session and delegates to
init_db(). Whileinit_db()is currently a no-op (as noted in the PR description about implementing guardrails separately), this provides the proper scaffolding for future data initialization.backend/scripts/lint.sh (1)
1-8: LGTM! Standard linting workflow.The script correctly implements a comprehensive linting workflow with type checking (mypy), linting (ruff check), and format validation (ruff format --check). Error handling with
set -eensures failures are caught.backend/scripts/format.sh (1)
1-5: LGTM! Standard formatting workflow.The script correctly applies automatic fixes and formatting using ruff. The use of POSIX shell (
/bin/sh) is appropriate for this simple script.development.md (1)
1-168: Excellent development documentation!The documentation provides comprehensive guidance on Docker Compose setup, local development workflows, environment configuration, and pre-commit tooling. This will be very helpful for contributors getting started with the project.
deployment.md (1)
1-205: Comprehensive deployment guide!Excellent documentation covering server preparation, environment configuration, secret management, Docker Compose deployment, and GitHub Actions CI/CD setup. The step-by-step instructions for setting up self-hosted runners are particularly helpful.
scripts/test.sh (1)
1-11: LGTM! Well-structured test workflow.The script correctly orchestrates the test workflow with Docker Compose v2 commands. It ensures a clean state by tearing down containers before and after tests, and properly forwards arguments to the test runner.
backend/app/alembic/script.py.mako (1)
1-25: LGTM!Standard Alembic Mako template with appropriate imports for SQLModel integration. The template correctly includes
sqlmodel.sql.sqltypesto support SQLModel-specific column types in auto-generated migrations.backend/app/api/main.py (1)
1-10: LGTM with a minor note on the commented code.The router setup is clean. Note that the commented-out block (lines 9-10) references
settingsandprivatewhich are not imported in this file. If you plan to enable this later, ensure the necessary imports are added.scripts/deploy.sh (1)
1-15: LGTM!The deployment script properly validates required environment variables and follows a standard Docker Swarm deployment pattern.
Note: The script depends on
docker-auto-labels(line 13), which is a third-party tool. Ensure this is documented in deployment prerequisites or consider adding a check for its availability.backend/app/main.py (2)
20-21: Verify environment value consistency for Sentry initialization.The condition checks for
"local"environment, but.env.exampledefinesENVIRONMENT=development. Ensure the intended behavior is correct—Sentry will be initialized indevelopmentenvironment with this logic.
29-30: Middleware ordering note.
CorrelationIdMiddlewareis added after the HTTP logger middleware. With ASGI middleware, the last-added middleware runs first on requests. This means correlation IDs will be available to the logger, which is the correct order.backend/app/alembic/versions/d98dd8ec85a3_edit_replace_id_integers_in_all_models_.py (1)
21-55: Migration logic is sound but verify on non-empty databases.The upgrade correctly:
- Creates the
uuid-osspextension- Adds new UUID columns and populates them
- Properly remaps the foreign key relationship before dropping old columns
One consideration: dropping and recreating primary keys on tables with data in production requires careful testing. Ensure this migration is tested against a database with representative data.
backend/alembic.ini (1)
1-71: Standard Alembic configuration looks good.The configuration follows standard Alembic patterns. The
sqlalchemy.urlis intentionally omitted here since it's configured programmatically viaenv.pyusing the application settings, which is the correct approach for environment-based configuration.backend/app/tests_pre_start.py (1)
32-35: LGTM on main() structure.The main function correctly logs initialization status and invokes the retry-enabled init function with the engine. The module guard pattern is appropriate for a standalone script.
backend/app/core/exception_handlers.py (3)
1-8: LGTM on imports and module structure.The imports are appropriate and the module is well-organized for centralized exception handling.
11-17: LGTM for validation error handler.The handler correctly processes Pydantic validation errors and formats them using the centralized
APIResponse.failure_response. Therequestparameter is required by FastAPI's exception handler signature despite being unused (ignore the static analysis hint).
19-24: Add type guard for dict details inAPIResponse.failure_response.HTTPException.detail accepts any JSON-serializable value, including dicts, but
APIResponse.failure_responseis typed to accept onlystr | listand assumes list items havelocandmsgkeys. If an HTTPException is raised with a dict detail, it would either be assigned as-is (semantically incorrect) or cause a KeyError when the code tries to access expected keys. Add a type guard to handle dict details or document that only str/list details are supported.backend/README.md (1)
1-156: Well-structured documentation for the backend workflow.The README provides comprehensive guidance covering Docker Compose setup, dependency management with
uv, VS Code integration, testing, and Alembic migrations. This will be valuable for onboarding developers.backend/app/backend_pre_start.py (1)
32-39: LGTM on the main entry point.The main function and script entry point are correctly structured.
backend/app/alembic/versions/e2412789c190_initialize_models.py (1)
19-46: LGTM on overall migration structure.The migration correctly creates the
useranditemtables with appropriate columns, constraints, and a unique index onuser.email. Thedowngrade()function properly reverses the operations in the correct order.backend/app/api/deps.py (1)
12-14: LGTM on the database session dependency.The
get_db()generator correctly manages the SQLModel session lifecycle using a context manager.backend/app/alembic/env.py (3)
32-33: LGTM! Clean database URL retrieval.The function properly converts the PostgresDsn object to a string for Alembic configuration.
36-54: LGTM! Standard Alembic offline migration pattern.The offline migration configuration is correct with appropriate options for SQL generation and type comparison.
57-78: LGTM! Proper online migration setup.The online migration configuration correctly uses
NullPoolfor migrations and maintains consistency with offline mode settings.backend/app/utils.py (2)
13-17: LGTM! Well-designed generic response model.The use of generics provides type safety while maintaining flexibility for different response types.
19-23: LGTM! Clean success response factory.The method provides a clear, type-safe way to construct successful API responses.
backend/app/core/config.py (1)
57-58: LGTM! Proper superuser configuration.The use of
EmailStrfor validation and required fields ensures proper superuser setup.docker-compose.yml (5)
5-26: LGTM! Well-configured database service.The service properly uses required environment variables, healthchecks, and persistent storage. The port exposure (5432:5432) allows direct database access from the host, which is useful for development but consider removing in production configurations.
38-52: LGTM! Proper prestart service configuration.The service correctly waits for database health before running migrations/setup scripts, and the profile makes it opt-in.
75-76: Development-only configuration noted.The
--reloadflag is appropriate for development (as indicated by the PR description), but ensure this docker-compose.yml is not used in production deployments.
77-87: LGTM! Well-configured development watch.The watch configuration properly syncs source changes and rebuilds on dependency updates, providing a good developer experience.
70-74: No action needed. The python:3.10 image is based on buildpack-deps, which includes a large number of extremely common Debian packages, and this variant includes curl, wget, and ca-certificates packages. The healthcheck will function correctly as curl is available in the backend container.Likely an incorrect or invalid review comment.
backend/app/alembic/versions/1a31ce608336_add_cascade_delete_relationships.py
Show resolved
Hide resolved
backend/app/alembic/versions/1a31ce608336_add_cascade_delete_relationships.py
Show resolved
Hide resolved
| token = credentials.credentials | ||
|
|
||
| if token != settings.AUTH_TOKEN: | ||
| raise HTTPException( | ||
| status_code=status.HTTP_401_UNAUTHORIZED, | ||
| detail="Invalid token", | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use constant-time comparison for token validation to prevent timing attacks.
Direct string comparison with != is vulnerable to timing attacks, where an attacker can infer the correct token character-by-character based on response times. Use secrets.compare_digest() for secure comparison.
🔎 Apply this diff
Add the import at the top of the file:
+import secrets
from collections.abc import GeneratorThen update the comparison:
token = credentials.credentials
- if token != settings.AUTH_TOKEN:
+ if not secrets.compare_digest(token, settings.AUTH_TOKEN):
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Invalid token",
)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| token = credentials.credentials | |
| if token != settings.AUTH_TOKEN: | |
| raise HTTPException( | |
| status_code=status.HTTP_401_UNAUTHORIZED, | |
| detail="Invalid token", | |
| ) | |
| token = credentials.credentials | |
| if not secrets.compare_digest(token, settings.AUTH_TOKEN): | |
| raise HTTPException( | |
| status_code=status.HTTP_401_UNAUTHORIZED, | |
| detail="Invalid token", | |
| ) |
🤖 Prompt for AI Agents
In backend/app/api/deps.py around lines 31 to 37, the current token check uses a
direct string comparison ("token != settings.AUTH_TOKEN") which is vulnerable to
timing attacks; import the secrets module and replace the direct comparison with
secrets.compare_digest(token, settings.AUTH_TOKEN) (ensuring both operands are
strings) and use its result to raise the same HTTPException when the comparison
returns False.
| "emoji", | ||
| "ftfy", | ||
| "presidio_analyzer", | ||
| "presidio_anonymizer", | ||
| "transformers", | ||
| "torch", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pin heavy ML/NLP dependencies to tested versions.
The unpinned dependencies—emoji, ftfy, presidio_analyzer, presidio_anonymizer, transformers, and torch—pose real risks:
- Torch and transformers have documented compatibility issues across versions, with transformers requiring specific torch versions
- Unpinned versions in production environments can lead to unexpected breakage
- Mismatches between transformers, torch, and related libraries cause installation errors or runtime issues; explicit version pinning is standard practice
Instead of allowing pip to pull any compatible version, consider pinning to tested ranges (e.g., torch>=2.1,<2.2 and transformers==4.35.2) to maintain reproducibility across environments and prevent silent failures during dependency updates.
🤖 Prompt for AI Agents
In backend/pyproject.toml around lines 26-31, the listed ML/NLP deps are
unpinned; update those entries to explicit tested versions/ranges (e.g., set
transformers==4.35.2 and torch>=2.1,<2.2 as suggested) and pin the others to
known working releases or constrained ranges (for example pin emoji and ftfy to
the tested minor versions and pin presidio_analyzer/presidio_anonymizer to the
versions you validated); then update the lockfile (poetry.lock or equivalent)
and run the test suite and installation in a clean environment to confirm
compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
backend/app/utils.py (1)
5-6: Avoid configuring logging in library modules.Calling
logging.basicConfig()in a utility module can override the application's logging configuration. This should be handled at the application entry point (e.g., inmain.py).🔎 Apply this diff to remove the basicConfig call:
-logging.basicConfig(level=logging.INFO) logger = logging.getLogger(__name__)
🧹 Nitpick comments (4)
.env.example (1)
51-51: Add blank line at end of file.Environment files should end with a newline character for standard file formatting conventions.
🔎 Apply this diff:
# require as a env if you want to use doc transformation OPENAI_API_KEY=<ADD-KEY> GUARDRAILS_HUB_API_KEY=<ADD-KEY> +backend/app/backend_pre_start.py (2)
12-13: Consider making retry configuration environment-driven.The hardcoded 5-minute timeout (300 attempts × 1 second) may be excessive for some deployment environments. Consider making these values configurable via environment variables for greater flexibility across different contexts.
🔎 Example implementation
+import os + -max_tries = 60 * 5 # 5 minutes -wait_seconds = 1 +max_tries = int(os.getenv("DB_MAX_RETRIES", "300")) # Default: 5 minutes +wait_seconds = int(os.getenv("DB_RETRY_WAIT_SECONDS", "1"))
22-29: Simplify redundant exception handling.The current pattern of catching, logging, and re-raising the exception is redundant with tenacity's retry mechanism. The
@retrydecorator already handles retries and logs viaafter_log. Consider either removing the try-except block or adding meaningful context to the error message rather than just re-raising.🔎 Option 1: Remove redundant try-except (preferred)
def init(db_engine: Engine) -> None: - try: - with Session(db_engine) as session: - # Try to create session to check if DB is awake - session.exec(select(1)) - except Exception as e: - logger.error(e) - raise e + with Session(db_engine) as session: + # Try to create session to check if DB is awake + session.exec(select(1))🔎 Option 2: Add context to error message
try: with Session(db_engine) as session: # Try to create session to check if DB is awake session.exec(select(1)) except Exception as e: - logger.error(e) - raise e + logger.error(f"Failed to connect to database: {e}", exc_info=True) + raisebackend/app/utils.py (1)
3-3: Modernize type hint to use built-indict.The
typing.Dictis deprecated in favor of the built-indictfor Python 3.9+.🔎 Apply this diff to use built-in `dict`:
-from typing import Any, Dict, Generic, Optional, TypeVar +from typing import Any, Generic, Optional, TypeVarThen update line 14:
- metadata: Optional[Dict[str, Any]] = None + metadata: Optional[dict[str, Any]] = NoneAnd line 18:
- cls, data: T, metadata: Optional[Dict[str, Any]] = None + cls, data: T, metadata: Optional[dict[str, Any]] = NoneAnd line 27:
- metadata: Optional[Dict[str, Any]] = None, + metadata: Optional[dict[str, Any]] = None,
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
.env.example(1 hunks)backend/app/backend_pre_start.py(1 hunks)backend/app/main.py(1 hunks)backend/app/utils.py(1 hunks)backend/pyproject.toml(1 hunks)backend/scripts/prestart.sh(1 hunks)release-notes.md(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- release-notes.md
🚧 Files skipped from review as they are similar to previous changes (3)
- backend/app/main.py
- backend/scripts/prestart.sh
- backend/pyproject.toml
🧰 Additional context used
🪛 dotenv-linter (4.0.0)
.env.example
[warning] 17-17: [QuoteCharacter] The value has quote characters (', ")
(QuoteCharacter)
[warning] 22-22: [UnorderedKey] The FIRST_SUPERUSER key should go before the SECRET_KEY key
(UnorderedKey)
[warning] 23-23: [UnorderedKey] The FIRST_SUPERUSER_PASSWORD key should go before the SECRET_KEY key
(UnorderedKey)
[warning] 24-24: [QuoteCharacter] The value has quote characters (', ")
(QuoteCharacter)
[warning] 24-24: [UnorderedKey] The EMAIL_TEST_USER key should go before the FIRST_SUPERUSER key
(UnorderedKey)
[warning] 34-34: [UnorderedKey] The POSTGRES_PORT key should go before the POSTGRES_SERVER key
(UnorderedKey)
[warning] 35-35: [UnorderedKey] The POSTGRES_DB key should go before the POSTGRES_PORT key
(UnorderedKey)
[warning] 37-37: [UnorderedKey] The POSTGRES_PASSWORD key should go before the POSTGRES_PORT key
(UnorderedKey)
[warning] 51-51: [EndingBlankLine] No blank line at the end of the file
(EndingBlankLine)
[warning] 51-51: [UnorderedKey] The GUARDRAILS_HUB_API_KEY key should go before the OPENAI_API_KEY key
(UnorderedKey)
🪛 Ruff (0.14.8)
backend/app/utils.py
3-3: typing.Dict is deprecated, use dict instead
(UP035)
36-36: No newline at end of file
Add trailing newline
(W292)
🔇 Additional comments (6)
backend/app/backend_pre_start.py (4)
1-10: LGTM!The imports and logging setup are appropriate for a database readiness check utility. The logging configuration is clear and straightforward.
16-21: LGTM!The retry decorator is properly configured. The past review comment regarding the deprecated
logging.WARNhas been correctly addressed—the code now useslogging.WARNINGas recommended.
32-35: LGTM!The main orchestration function is clean and provides clear logging for the initialization sequence.
38-39: LGTM!Standard entry point pattern is correctly implemented.
backend/app/utils.py (2)
8-20: Well-designed generic response model.The
APIResponseclass with generic typing provides a clean, type-safe structure for standardizing API responses. Thesuccess_responseclassmethod correctly constructs success cases with appropriate defaults.
22-36: Error handling properly addresses previous concerns.The
failure_responsemethod now uses.get()with appropriate defaults to safely access dictionary keys, preventingKeyErrorexceptions for malformed error dictionaries. The method correctly handles both string and list error formats.
|
|
||
| - [docker](https://docs.docker.com/get-started/get-docker/) Docker | ||
| - [uv](https://docs.astral.sh/uv/) for Python package and environment management. | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add how to get GUARDRAILS_HUB_API_KEY from guardrails hub
|
would be good to at least have a small feature/functionality in; doing a review just on the scaffolding may result in adding few stale files |
|
|
||
|
|
||
| def upgrade(): | ||
| # ### commands auto generated by Alembic - please adjust! ### |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try to keep the migration files clean, meaning removing all these comments added by alembic. Only leave the comments really really needed to understand the context of something in migration file
|
|
||
|
|
||
| # revision identifiers, used by Alembic. | ||
| revision = '1a31ce608336' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is something we recently modified in our codebase, would be helpful for you if you follow this as well - ProjectTech4DevAI/kaapi-backend#479
| op.create_primary_key('item_pkey', 'item', ['id']) | ||
|
|
||
| # Recreate foreign key constraint | ||
| op.create_foreign_key('item_owner_id_fkey', 'item', 'user', ['owner_id'], ['id']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this migration file, as well as all the others have redundent comments, do remove them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now realize that these migration files are not our actual tables and just boilerplate code being used, so for when migration files for the actual tables will be created, do remember to not add redundent comments and sequentially numbering migration files
|
|
||
| # Use the same path as config.py expects (one level above backend/) | ||
| env_file = "../.env" | ||
| if env == "testing": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you will also be maintaining seperate environments for testing and development, then please add .env.test.example
| raise ValueError(v) | ||
|
|
||
|
|
||
| class Settings(BaseSettings): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add guardrail hub api key here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 10
♻️ Duplicate comments (6)
.env.example (2)
26-30: Clarify cron functionality requirements.A previous reviewer questioned whether cron functionality is needed. The presence of
API_BASE_URLandCRON_INTERVAL_MINUTESsuggests background job scheduling, but the use case isn't documented.
5-5: Remove Traefik reference per previous feedback.Line 5 still contains a Traefik comment, but previous feedback requested removing all Traefik mentions from the repository.
🔎 Proposed fix
-# used by Traefik to transmit traffic and aqcuire TLS certificates +# Domain used for deployment and routing configurationbackend/app/core/config.py (4)
33-33: SECRET_KEY regenerates on each restart, invalidating all tokens.The
SECRET_KEYis evaluated at module import time usingsecrets.token_urlsafe(32), causing a new key on every application restart. This invalidates all existing JWT tokens and sessions.🔎 Proposed fix
- SECRET_KEY: str = secrets.token_urlsafe(32) + SECRET_KEY: strEnsure
SECRET_KEYis set via environment variable in.envfiles.
43-44: Empty password defaults create security risks.
POSTGRES_PASSWORDandPOSTGRES_DBdefault to empty strings, which could allow unauthenticated connections in certain configurations. The validator at lines 90-98 only checks for"changethis"and won't catch these empty defaults.🔎 Proposed fix
- POSTGRES_PASSWORD: str = "" - POSTGRES_DB: str = "" + POSTGRES_PASSWORD: str + POSTGRES_DB: str
90-98: Validator logic doesn't align with actual security risks.The validator has multiple issues:
- Line 92 checks
SECRET_KEYfor"changethis", but the default issecrets.token_urlsafe(32), so this check never triggers.- Line 93 checks
POSTGRES_PASSWORDfor"changethis"but doesn't catch the empty string default ("").🔎 Proposed fix
Update the validator to check for empty strings and remove the SECRET_KEY check (or validate minimum length):
@model_validator(mode="after") def _enforce_non_default_secrets(self) -> Self: - self._check_default_secret("SECRET_KEY", self.SECRET_KEY) self._check_default_secret("POSTGRES_PASSWORD", self.POSTGRES_PASSWORD) self._check_default_secret( "FIRST_SUPERUSER_PASSWORD", self.FIRST_SUPERUSER_PASSWORD ) + + # Validate non-empty passwords + if not self.POSTGRES_PASSWORD: + raise ValueError("POSTGRES_PASSWORD cannot be empty") + if not self.FIRST_SUPERUSER_PASSWORD: + raise ValueError("FIRST_SUPERUSER_PASSWORD cannot be empty") return self
38-38: Add GUARDRAILS_HUB_API_KEY configuration field.Based on previous feedback and the presence of
GUARDRAILS_HUB_API_KEYin.env.example, this field should be added to the Settings model.🔎 Proposed addition
AUTH_TOKEN: str + GUARDRAILS_HUB_API_KEY: str | None = None PROJECT_NAME: str
🧹 Nitpick comments (29)
backend/scripts/install_guardrails_from_hub.sh (2)
24-27: Quote command substitutions to prevent potential word splitting.The static analysis warning is valid. While the current implementation likely works given that each substitution produces a single flag, quoting the command substitutions is a best practice that prevents unexpected word splitting.
Based on static analysis hints (Shellcheck SC2046).
🔎 Proposed fix for quoting command substitutions
guardrails configure \ --token "$GUARDRAILS_HUB_API_KEY" \ - $( [[ "$ENABLE_METRICS" == "true" ]] && echo "--enable-metrics" || echo "--disable-metrics" ) \ - $( [[ "$ENABLE_REMOTE_INFERENCING" == "true" ]] && echo "--enable-remote-inferencing" || echo "--disable-remote-inferencing" ) + "$( [[ "$ENABLE_METRICS" == "true" ]] && echo "--enable-metrics" || echo "--disable-metrics" )" \ + "$( [[ "$ENABLE_REMOTE_INFERENCING" == "true" ]] && echo "--enable-remote-inferencing" || echo "--disable-remote-inferencing" )"
48-51: Consider a more robust iteration pattern for better safety.The loop iterates over an unquoted
$HUB_SOURCESvariable, which relies on word splitting. While validator source URIs are unlikely to contain spaces, using a safer iteration pattern (e.g., reading line-by-line or using an array) would follow bash best practices.🔎 Alternative: Read jq output line-by-line
-for SRC in $HUB_SOURCES; do - echo "Installing Guardrails hub validator: $SRC" - guardrails hub install "$SRC" -done +while IFS= read -r SRC; do + echo "Installing Guardrails hub validator: $SRC" + guardrails hub install "$SRC" +done < <(jq -r ' + .validators[] + | select(.source != "local") + | .source +' "$MANIFEST_FILE")This approach eliminates the intermediate
HUB_SOURCESvariable and safely handles sources with special characters.backend/scripts/entrypoint.sh (1)
1-8: Use absolute paths in the entrypoint.sh script for clarity and consistency with Docker best practices.The WORKDIR is correctly set to
/app/in the Dockerfile, so the relative paths in entrypoint.sh will resolve correctly. However, for clarity and reliability, absolute paths should always be used. Update the script to use absolute paths:/app/scripts/install_guardrails_from_hub.shand for the FastAPI command:
exec fastapi run --workers 4 /app/app/main.pyWhile relative paths are technically allowed, they are interpreted based on the last WORKDIR value, which can create confusion or bugs.
.env.example (1)
52-52: Add trailing newline.The file is missing a trailing newline, which is a common convention for text files.
🔎 Proposed fix
AUTH_TOKEN=<ADD-TOKEN> +backend/app/api/routes/utils.py (2)
2-2: Remove unused import.
AuthDepis imported but never used in this module.🔎 Proposed fix
from fastapi import APIRouter -from app.api.deps import AuthDep
8-8: Add trailing newline.🔎 Proposed fix
return True +backend/app/models/base_validator_config.py (1)
27-27: Add trailing newline.🔎 Proposed fix
) +backend/app/core/api_response.py (1)
16-16: Add trailing newline.🔎 Proposed fix
return cls(success=False, error=error, metadata=metadata) +backend/app/core/guardrail_controller.py (3)
5-7: Add type hints for better type safety.The
build_guardfunction lacks type annotations for its parameter and return value.🔎 Proposed improvement
-def build_guard(validator_items): +from typing import Sequence + +def build_guard(validator_items: Sequence[ValidatorConfigItem]) -> Guard: validators = [v_item.build() for v_item in validator_items] return Guard().use_many(*validators)
9-12: Add type hints and consider defensive checks for introspection logic.The
get_validator_config_modelsfunction lacks return type annotation and makes assumptions about the structure ofValidatorConfigItemwithout validation.🔎 Proposed improvement
-def get_validator_config_models(): +from typing import Sequence, Type + +def get_validator_config_models() -> tuple[Type, ...]: annotated_args = get_args(ValidatorConfigItem) + if not annotated_args: + return () union_type = annotated_args[0] return get_args(union_type)
12-12: Add trailing newline.🔎 Proposed fix
return get_args(union_type) +backend/app/core/constants.py (1)
6-6: Add trailing newline.🔎 Proposed fix
SCORE = "score" +backend/app/core/enum.py (1)
13-13: Add trailing newline.🔎 Proposed fix
All = "all" +backend/app/api/main.py (1)
9-10: Commented-out code is incomplete.If this conditional is intended for future use, the commented code references
settingsandprivatewhich are not imported. Consider either removing this dead code or adding a TODO with context, along with the required imports when ready.backend/app/models/gender_assumption_bias_safety_validator_config.py (2)
1-1: Remove unused importClassVar.
ClassVaris imported but not used in this file.🔎 Proposed fix
-from typing import ClassVar, List, Literal, Optional +from typing import List, Literal, Optional
6-8: Optional: Consider usingField(default_factory=...)for clarity with mutable list defaults.While using
default_factorycan be done in Pydantic, it is not required because Pydantic will create a deep copy of the default value when creating each instance. The current codecategories: Optional[List[BiasCategories]] = [BiasCategories.All]is safe and will not cause shared state between instances. However, usingField(default_factory=...)is more idiomatic and clearer for tooling support:+from sqlmodel import Field + class GenderAssumptionBiasSafetyValidatorConfig(BaseValidatorConfig): type: Literal["gender_assumption_bias"] - categories: Optional[List[BiasCategories]] = [BiasCategories.All] + categories: Optional[List[BiasCategories]] = Field(default_factory=lambda: [BiasCategories.All])backend/app/core/validators/pii_remover.py (1)
46-46: Setting environment variable in__init__affects global state.Setting
os.environ["TOKENIZERS_PARALLELISM"]in the constructor is a side effect that persists across the entire process. Consider setting this at module load time or in application startup instead.backend/app/models/guardrail_config.py (1)
4-5: Acknowledge TODO for auto-discovery.The TODO for an auto-discovery mechanism is a good improvement to consider for reducing boilerplate when adding new validators.
Would you like me to open an issue to track this improvement, or help design an auto-discovery pattern using Python's entry points or module introspection?
backend/app/models/lexical_slur_safety_validator_config.py (1)
1-1: Use modernlisttype hint instead oftyping.List.Per static analysis,
typing.Listis deprecated in Python 3.9+. Use the built-inlisttype hint.🔎 Proposed fix
-from typing import List, Literal +from typing import LiteralAnd update line 8:
- languages: List[str] = ["en", "hi"] + languages: list[str] = ["en", "hi"]backend/app/models/ban_list_safety_validator_config.py (1)
1-3: Remove unused imports and use modern type hints.
get_validatorandClassVarare imported but never used. Additionally,typing.Listis deprecated in Python 3.9+.🔎 Proposed fix
from guardrails.hub import BanList -from guardrails.utils.validator_utils import get_validator -from typing import ClassVar, List, Literal +from typing import Literal from app.models.base_validator_config import BaseValidatorConfig from app.core.constants import BAN_LIST class BanListSafetyValidatorConfig(BaseValidatorConfig): type: Literal[f"{BAN_LIST}"] - banned_words: List[str] + banned_words: list[str]backend/app/core/validators/utils/language_detector.py (1)
15-21: Unused instance variableself.label.
self.labelis assigned toNonebut never used elsewhere in the class.🔎 Proposed fix
def __init__(self): self.lid = pipeline( task = "text-classification", model="papluca/xlm-roberta-base-language-detection", top_k=1 ) - self.label = Nonebackend/app/api/routes/guardrails.py (2)
45-56: Consider narrowing exception handling and sanitizing error output.Catching all
Exceptiontypes may mask programming errors. Additionally,str(e)in the response could leak internal details (stack traces, file paths) in production.Consider:
- Catching specific expected exceptions (e.g.,
ValidationError,ConfigurationError)- Logging the full exception server-side
- Returning a generic message to the client in production
59-104: Significant duplication withrun_input_guardrails.The input and output endpoints share nearly identical logic. Consider extracting a shared helper to reduce duplication.
🔎 Example refactor
async def _run_guardrails(text: str, validators: list, result_key: str): response_id = str(uuid.uuid4()) try: guard = build_guard(validators) result = guard.validate(text) if result.validated_output is not None: return APIResponse.success_response( data={"response_id": response_id, result_key: result.validated_output} ) # ... validation failure handling except Exception as e: # ... error handling @router.post("/input/") async def run_input_guardrails(payload: GuardrailInputRequest, _: AuthDep): return await _run_guardrails(payload.input, payload.validators, "safe_input") @router.post("/output") async def run_output_guardrails(payload: GuardrailOutputRequest, _: AuthDep): return await _run_guardrails(payload.output, payload.validators, "safe_output")backend/app/core/validators/gender_assumption_bias.py (1)
10-10: Use modern type hints.
typing.Callableandtyping.Listare deprecated. Usecollections.abc.Callableand built-inlist.🔎 Proposed fix
-from typing import Callable, List, Optional +from collections.abc import Callable +from typing import OptionalThen update the type hints:
- categories: Optional[List[BiasCategories]] = None, + categories: Optional[list[BiasCategories]] = None,backend/app/core/validators/lexical_slur.py (5)
9-9: Modernize import to usecollections.abc.Callable.For Python 3.9+, prefer importing
Callablefromcollections.abcrather thantyping.🔎 Suggested fix
-from typing import Callable, Optional +from collections.abc import Callable +from typing import Optional
36-36: Remove unnecessary initialization.The
self.textattribute is set toNonehere but immediately overwritten in the_validatemethod (line 40). This initialization serves no purpose.🔎 Suggested fix
self.languages = languages or ["en", "hi"] self.slur_list = self.load_slur_list() - self.text = None super().__init__(on_fail=on_fail, search_words=self.slur_list)
52-61: Consolidate redundant length checks.The condition
len(detected_slurs) > 0is checked twice (lines 52 and 57). The first check and replacement loop can be merged with the final return logic for cleaner code.🔎 Suggested refactor
- if len(detected_slurs) > 0: - for word in words: - if word in detected_slurs: - self.text = self.text.replace(word, "[REDACTED_SLUR]") - if len(detected_slurs) > 0: + for word in words: + if word in detected_slurs: + self.text = self.text.replace(word, "[REDACTED_SLUR]") return FailResult( error_message=f"Mentioned toxic words: {', '.join(detected_slurs)}", fix_value=self.text ) return PassResult(value=self.text)
92-92: Track the TODO for language filtering.The language filtering feature is not yet implemented. This could lead to false positives if users specify language preferences that aren't respected.
Would you like me to open a new issue to track implementing language-based filtering for the slur list?
86-100: Consider class-level caching for the slur list CSV to optimize performance.The slur list CSV is loaded during every validator instantiation. Since validators are created per-request in the guardrail controller, this causes repeated CSV parsing on each validation call. Moving the loaded slur list to a class-level cached attribute would eliminate this overhead and align with the pattern of other similar configuration-heavy validators.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
backend/app/core/validators/utils/files/curated_slurlist_hi_en.csvis excluded by!**/*.csvbackend/app/core/validators/utils/files/gender_assumption_bias_words.csvis excluded by!**/*.csvbackend/uv.lockis excluded by!**/*.lock
📒 Files selected for processing (24)
.env.example(1 hunks)backend/.gitignore(1 hunks)backend/Dockerfile(1 hunks)backend/app/api/main.py(1 hunks)backend/app/api/routes/guardrails.py(1 hunks)backend/app/api/routes/utils.py(1 hunks)backend/app/core/api_response.py(1 hunks)backend/app/core/config.py(1 hunks)backend/app/core/constants.py(1 hunks)backend/app/core/enum.py(1 hunks)backend/app/core/guardrail_controller.py(1 hunks)backend/app/core/validators/gender_assumption_bias.py(1 hunks)backend/app/core/validators/lexical_slur.py(1 hunks)backend/app/core/validators/pii_remover.py(1 hunks)backend/app/core/validators/utils/language_detector.py(1 hunks)backend/app/core/validators/validators.json(1 hunks)backend/app/models/ban_list_safety_validator_config.py(1 hunks)backend/app/models/base_validator_config.py(1 hunks)backend/app/models/gender_assumption_bias_safety_validator_config.py(1 hunks)backend/app/models/guardrail_config.py(1 hunks)backend/app/models/lexical_slur_safety_validator_config.py(1 hunks)backend/app/models/pii_remover_safety_validator_config.py(1 hunks)backend/scripts/entrypoint.sh(1 hunks)backend/scripts/install_guardrails_from_hub.sh(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- backend/app/core/validators/validators.json
🚧 Files skipped from review as they are similar to previous changes (2)
- backend/.gitignore
- backend/Dockerfile
🧰 Additional context used
🧬 Code graph analysis (11)
backend/app/core/guardrail_controller.py (1)
backend/app/models/pii_remover_safety_validator_config.py (1)
build(13-19)
backend/app/core/validators/pii_remover.py (1)
backend/app/core/validators/utils/language_detector.py (3)
LanguageDetector(6-49)predict(28-43)is_hindi(45-46)
backend/app/models/base_validator_config.py (4)
backend/app/models/pii_remover_safety_validator_config.py (1)
build(13-19)backend/app/models/gender_assumption_bias_safety_validator_config.py (1)
build(10-14)backend/app/models/lexical_slur_safety_validator_config.py (1)
build(11-16)backend/app/models/ban_list_safety_validator_config.py (1)
build(12-16)
backend/app/models/pii_remover_safety_validator_config.py (5)
backend/app/models/base_validator_config.py (3)
BaseValidatorConfig(7-27)build(24-27)resolve_on_fail(12-22)backend/app/core/validators/pii_remover.py (1)
PIIRemover(20-74)backend/app/models/gender_assumption_bias_safety_validator_config.py (1)
build(10-14)backend/app/models/lexical_slur_safety_validator_config.py (1)
build(11-16)backend/app/models/ban_list_safety_validator_config.py (1)
build(12-16)
backend/app/api/routes/guardrails.py (3)
backend/app/models/guardrail_config.py (2)
GuardrailInputRequest(22-24)GuardrailOutputRequest(26-28)backend/app/core/api_response.py (3)
APIResponse(4-16)success_response(11-12)failure_response(15-16)backend/app/core/guardrail_controller.py (2)
build_guard(5-7)get_validator_config_models(9-12)
backend/app/models/gender_assumption_bias_safety_validator_config.py (6)
backend/app/models/base_validator_config.py (3)
BaseValidatorConfig(7-27)build(24-27)resolve_on_fail(12-22)backend/app/core/enum.py (1)
BiasCategories(9-13)backend/app/core/validators/gender_assumption_bias.py (1)
GenderAssumptionBias(18-78)backend/app/models/pii_remover_safety_validator_config.py (1)
build(13-19)backend/app/models/lexical_slur_safety_validator_config.py (1)
build(11-16)backend/app/models/ban_list_safety_validator_config.py (1)
build(12-16)
backend/app/models/lexical_slur_safety_validator_config.py (4)
backend/app/core/enum.py (1)
SlurSeverity(3-7)backend/app/core/validators/lexical_slur.py (1)
LexicalSlur(22-101)backend/app/models/base_validator_config.py (2)
build(24-27)resolve_on_fail(12-22)backend/app/models/ban_list_safety_validator_config.py (1)
build(12-16)
backend/app/models/guardrail_config.py (4)
backend/app/models/ban_list_safety_validator_config.py (1)
BanListSafetyValidatorConfig(8-16)backend/app/models/gender_assumption_bias_safety_validator_config.py (1)
GenderAssumptionBiasSafetyValidatorConfig(6-14)backend/app/models/lexical_slur_safety_validator_config.py (1)
LexicalSlurSafetyValidatorConfig(6-16)backend/app/models/pii_remover_safety_validator_config.py (1)
PIIRemoverSafetyValidatorConfig(7-19)
backend/app/models/ban_list_safety_validator_config.py (4)
backend/app/models/base_validator_config.py (3)
BaseValidatorConfig(7-27)build(24-27)resolve_on_fail(12-22)backend/app/models/pii_remover_safety_validator_config.py (1)
build(13-19)backend/app/models/gender_assumption_bias_safety_validator_config.py (1)
build(10-14)backend/app/models/lexical_slur_safety_validator_config.py (1)
build(11-16)
backend/app/core/validators/gender_assumption_bias.py (2)
backend/app/core/config.py (1)
Settings(26-98)backend/app/core/enum.py (1)
BiasCategories(9-13)
backend/app/core/validators/lexical_slur.py (4)
backend/app/core/config.py (1)
Settings(26-98)backend/app/core/enum.py (1)
SlurSeverity(3-7)backend/app/core/validators/pii_remover.py (1)
_validate(51-65)backend/app/core/validators/gender_assumption_bias.py (1)
_validate(32-55)
🪛 dotenv-linter (4.0.0)
.env.example
[warning] 17-17: [QuoteCharacter] The value has quote characters (', ")
(QuoteCharacter)
[warning] 22-22: [UnorderedKey] The FIRST_SUPERUSER key should go before the SECRET_KEY key
(UnorderedKey)
[warning] 23-23: [UnorderedKey] The FIRST_SUPERUSER_PASSWORD key should go before the SECRET_KEY key
(UnorderedKey)
[warning] 24-24: [QuoteCharacter] The value has quote characters (', ")
(QuoteCharacter)
[warning] 24-24: [UnorderedKey] The EMAIL_TEST_USER key should go before the FIRST_SUPERUSER key
(UnorderedKey)
[warning] 34-34: [UnorderedKey] The POSTGRES_PORT key should go before the POSTGRES_SERVER key
(UnorderedKey)
[warning] 35-35: [UnorderedKey] The POSTGRES_DB key should go before the POSTGRES_PORT key
(UnorderedKey)
[warning] 37-37: [UnorderedKey] The POSTGRES_PASSWORD key should go before the POSTGRES_PORT key
(UnorderedKey)
[warning] 51-51: [UnorderedKey] The GUARDRAILS_HUB_API_KEY key should go before the OPENAI_API_KEY key
(UnorderedKey)
[warning] 52-52: [EndingBlankLine] No blank line at the end of the file
(EndingBlankLine)
[warning] 52-52: [UnorderedKey] The AUTH_TOKEN key should go before the GUARDRAILS_HUB_API_KEY key
(UnorderedKey)
🪛 Ruff (0.14.8)
backend/app/core/validators/utils/language_detector.py
6-6: Unnecessary parentheses after class definition
Remove parentheses
(UP039)
27-27: Use of functools.lru_cache or functools.cache on methods can lead to memory leaks
(B019)
49-49: No newline at end of file
Add trailing newline
(W292)
backend/app/core/guardrail_controller.py
12-12: No newline at end of file
Add trailing newline
(W292)
backend/app/core/constants.py
6-6: No newline at end of file
Add trailing newline
(W292)
backend/app/core/validators/pii_remover.py
13-13: Import from collections.abc instead: Callable
Import from collections.abc
(UP035)
74-74: No newline at end of file
Add trailing newline
(W292)
backend/app/models/base_validator_config.py
27-27: No newline at end of file
Add trailing newline
(W292)
backend/app/core/enum.py
13-13: No newline at end of file
Add trailing newline
(W292)
backend/app/models/pii_remover_safety_validator_config.py
2-2: typing.List is deprecated, use list instead
(UP035)
backend/app/api/routes/guardrails.py
122-122: No newline at end of file
Add trailing newline
(W292)
backend/app/models/gender_assumption_bias_safety_validator_config.py
1-1: typing.List is deprecated, use list instead
(UP035)
backend/app/models/lexical_slur_safety_validator_config.py
1-1: typing.List is deprecated, use list instead
(UP035)
backend/app/models/guardrail_config.py
2-2: typing.List is deprecated, use list instead
(UP035)
13-18: Use X | Y for type annotations
Convert to X | Y
(UP007)
28-28: No newline at end of file
Add trailing newline
(W292)
backend/app/core/api_response.py
16-16: No newline at end of file
Add trailing newline
(W292)
backend/app/models/ban_list_safety_validator_config.py
3-3: typing.List is deprecated, use list instead
(UP035)
16-16: No newline at end of file
Add trailing newline
(W292)
backend/app/core/validators/gender_assumption_bias.py
10-10: Import from collections.abc instead: Callable
Import from collections.abc
(UP035)
10-10: typing.List is deprecated, use list instead
(UP035)
25-25: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
78-78: No newline at end of file
Add trailing newline
(W292)
backend/app/core/validators/lexical_slur.py
9-9: Import from collections.abc instead: Callable
Import from collections.abc
(UP035)
backend/app/api/routes/utils.py
8-8: No newline at end of file
Add trailing newline
(W292)
🪛 Shellcheck (0.11.0)
backend/scripts/install_guardrails_from_hub.sh
[warning] 26-26: Quote this to prevent word splitting.
(SC2046)
[warning] 27-27: Quote this to prevent word splitting.
(SC2046)
🔇 Additional comments (13)
backend/app/core/api_response.py (1)
4-16: LGTM! Clean standardized response model.The
APIResponsemodel provides a consistent structure for API responses with convenient factory methods.backend/app/core/constants.py (1)
1-6: LGTM! Clear constant definitions.The constants are well-named and provide clear semantic meaning for use across the codebase.
backend/app/core/enum.py (1)
3-13: LGTM! Well-structured enum definitions.Both
SlurSeverityandBiasCategoriesenums provide clear categorization with consistent naming and the usefulAlloption for comprehensive filtering.backend/app/models/base_validator_config.py (1)
8-8: Type annotation incompatibility: enum member assigned to string literal annotation.The
on_failfield is annotated asOptional[ON_FAIL_STR]but assignedOnFailAction.FIX. Literal types may contain enum values, but ifON_FAIL_STRis aLiteraltype containing only string values (e.g.,Literal["fix", "reask"]), then assigning an enum member creates a type mismatch. Change the default to the string literal"fix"to align with the annotation.backend/app/api/main.py (1)
1-7: LGTM!The centralized API router setup follows standard FastAPI patterns and correctly aggregates the route modules.
backend/app/models/gender_assumption_bias_safety_validator_config.py (1)
10-14: LGTM!The
build()method correctly follows the established pattern, passingcategoriesand resolvingon_failvia the base class method.backend/app/models/pii_remover_safety_validator_config.py (1)
1-19: LGTM!The config class follows the established pattern correctly. The
build()method properly constructs aPIIRemoverwith all required parameters.Optional: The static analysis tool notes
typing.Listis deprecated in favor of the built-inlist(Python 3.9+), but this is a minor style preference and can be addressed in a future cleanup if desired.backend/app/models/guardrail_config.py (1)
11-28: LGTM!The discriminated union pattern using
Field(discriminator="type")is correctly implemented for polymorphic validator configuration. The request models are well-structured.Minor: Static analysis suggests using modern type syntax (
listinstead ofList,X | Yinstead ofUnion), but this is optional and can be addressed in a future cleanup.backend/app/models/lexical_slur_safety_validator_config.py (1)
11-16: LGTM!The
build()method correctly constructs theLexicalSlurvalidator with proper parameter mapping. TheSlurSeverity(self.severity)conversion aligns with the enum values defined inapp.core.enum.backend/app/models/ban_list_safety_validator_config.py (1)
12-16: LGTM!The
build()method correctly constructs theBanListvalidator. Add a trailing newline at end of file per static analysis.backend/app/core/validators/utils/language_detector.py (1)
45-49: LGTM!The helper methods are clean and straightforward. Add a trailing newline at end of file per static analysis.
backend/app/api/routes/guardrails.py (1)
106-122: LGTM with minor robustness consideration.The endpoint correctly lists validators. Consider adding error handling for the schema key access on line 116 if validator schemas could vary.
backend/app/core/validators/lexical_slur.py (1)
65-84: Well-structured text normalization utilities.The helper methods for text cleaning and normalization are well-implemented, using appropriate libraries (
ftfy,unicodedata,emoji) to handle edge cases like mojibake, fancy Unicode fonts, and emoji removal.
| def __init__( | ||
| self, | ||
| categories: Optional[List[BiasCategories]] = [BiasCategories.All], | ||
| on_fail: Optional[Callable] = OnFailAction.FIX | ||
| ): | ||
| self.gender_bias_list = self.load_gender_bias_list(categories) | ||
| self.text = None | ||
| super().__init__(on_fail=on_fail) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mutable default argument is a Python anti-pattern.
Using categories: Optional[List[BiasCategories]] = [BiasCategories.All] creates a single list object shared across all calls. If mutated, this causes unexpected behavior. Use None as default and initialize inside the function.
🔎 Proposed fix
def __init__(
self,
- categories: Optional[List[BiasCategories]] = [BiasCategories.All],
+ categories: Optional[List[BiasCategories]] = None,
on_fail: Optional[Callable] = OnFailAction.FIX
):
+ if categories is None:
+ categories = [BiasCategories.All]
self.gender_bias_list = self.load_gender_bias_list(categories)
self.text = None
super().__init__(on_fail=on_fail)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def __init__( | |
| self, | |
| categories: Optional[List[BiasCategories]] = [BiasCategories.All], | |
| on_fail: Optional[Callable] = OnFailAction.FIX | |
| ): | |
| self.gender_bias_list = self.load_gender_bias_list(categories) | |
| self.text = None | |
| super().__init__(on_fail=on_fail) | |
| def __init__( | |
| self, | |
| categories: Optional[List[BiasCategories]] = None, | |
| on_fail: Optional[Callable] = OnFailAction.FIX | |
| ): | |
| if categories is None: | |
| categories = [BiasCategories.All] | |
| self.gender_bias_list = self.load_gender_bias_list(categories) | |
| self.text = None | |
| super().__init__(on_fail=on_fail) |
🧰 Tools
🪛 Ruff (0.14.8)
25-25: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
🤖 Prompt for AI Agents
In backend/app/core/validators/gender_assumption_bias.py around lines 23 to 30,
the __init__ uses a mutable default list for categories which can be shared
across instances; change the signature to use categories:
Optional[List[BiasCategories]] = None and inside __init__ set categories =
[BiasCategories.All] if categories is None before using it (then call
self.load_gender_bias_list(categories)); keep the rest of initialization and
preserve the on_fail handling.
| def load_gender_bias_list(self, categories): | ||
| file_path = Settings.GENDER_BIAS_LIST_FILEPATH | ||
| neutral_term_col = 'neutral-term' | ||
| gender_bias_list = [] | ||
|
|
||
| df = pandas.read_csv(file_path) | ||
| df['word'] = df['word'].str.lower() | ||
| df[neutral_term_col] = df[neutral_term_col].str.lower() | ||
|
|
||
| for category in categories: | ||
| if category == BiasCategories.All: | ||
| temp = df | ||
| else: | ||
| temp = df[df['type'] == category] | ||
|
|
||
| rows = temp.to_dict(orient="records") | ||
| for row in rows: | ||
| gender_bias_list.append({ | ||
| "word": row["word"], | ||
| neutral_term_col: row[neutral_term_col] | ||
| }) | ||
| return gender_bias_list No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential duplicate entries when BiasCategories.All is combined with other categories.
If categories contains BiasCategories.All alongside other specific categories (e.g., [BiasCategories.All, BiasCategories.Healthcare]), all entries are added multiple times. This causes redundant processing and potentially double-replacements.
🔎 Proposed fix: early return for All or deduplicate
def load_gender_bias_list(self, categories):
file_path = Settings.GENDER_BIAS_LIST_FILEPATH
neutral_term_col = 'neutral-term'
- gender_bias_list = []
df = pandas.read_csv(file_path)
df['word'] = df['word'].str.lower()
df[neutral_term_col] = df[neutral_term_col].str.lower()
+ # If All is present, just return all entries
+ if BiasCategories.All in categories:
+ return df[['word', neutral_term_col]].rename(
+ columns={neutral_term_col: 'neutral-term'}
+ ).to_dict(orient='records')
+
+ gender_bias_list = []
for category in categories:
- if category == BiasCategories.All:
- temp = df
- else:
- temp = df[df['type'] == category]
-
+ temp = df[df['type'] == category.value]
rows = temp.to_dict(orient="records")
for row in rows:
gender_bias_list.append({
"word": row["word"],
neutral_term_col: row[neutral_term_col]
})
return gender_bias_list📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def load_gender_bias_list(self, categories): | |
| file_path = Settings.GENDER_BIAS_LIST_FILEPATH | |
| neutral_term_col = 'neutral-term' | |
| gender_bias_list = [] | |
| df = pandas.read_csv(file_path) | |
| df['word'] = df['word'].str.lower() | |
| df[neutral_term_col] = df[neutral_term_col].str.lower() | |
| for category in categories: | |
| if category == BiasCategories.All: | |
| temp = df | |
| else: | |
| temp = df[df['type'] == category] | |
| rows = temp.to_dict(orient="records") | |
| for row in rows: | |
| gender_bias_list.append({ | |
| "word": row["word"], | |
| neutral_term_col: row[neutral_term_col] | |
| }) | |
| return gender_bias_list | |
| def load_gender_bias_list(self, categories): | |
| file_path = Settings.GENDER_BIAS_LIST_FILEPATH | |
| neutral_term_col = 'neutral-term' | |
| df = pandas.read_csv(file_path) | |
| df['word'] = df['word'].str.lower() | |
| df[neutral_term_col] = df[neutral_term_col].str.lower() | |
| # If All is present, just return all entries | |
| if BiasCategories.All in categories: | |
| return df[['word', neutral_term_col]].rename( | |
| columns={neutral_term_col: 'neutral-term'} | |
| ).to_dict(orient='records') | |
| gender_bias_list = [] | |
| for category in categories: | |
| temp = df[df['type'] == category.value] | |
| rows = temp.to_dict(orient="records") | |
| for row in rows: | |
| gender_bias_list.append({ | |
| "word": row["word"], | |
| neutral_term_col: row[neutral_term_col] | |
| }) | |
| return gender_bias_list |
🧰 Tools
🪛 Ruff (0.14.8)
78-78: No newline at end of file
Add trailing newline
(W292)
🤖 Prompt for AI Agents
In backend/app/core/validators/gender_assumption_bias.py around lines 57 to 78,
the load_gender_bias_list function will append duplicate entries when
BiasCategories.All is present together with other categories; either
short-circuit: if BiasCategories.All in categories, read the CSV, normalize
columns and immediately return the full list (skip per-category filtering), or
after building gender_bias_list deduplicate entries by word+neutral-term (e.g.,
use a seen set to keep first occurrence) before returning so no duplicates are
produced.
| if self.severity == SlurSeverity.Low: | ||
| return df[df['severity'].isin(['L', 'M', 'H'])]['label'].tolist() | ||
| elif self.severity == SlurSeverity.Medium: | ||
| return df[df['severity'].isin(['M', 'H'])]['label'].tolist() | ||
| elif self.severity == SlurSeverity.High: | ||
| return df[df['severity'] == 'H']['label'].tolist() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarify severity filtering logic.
The severity filtering appears counterintuitive: SlurSeverity.Low returns slurs of all severities (L, M, H), which suggests "Low" means "low threshold" (catch everything), but the enum naming implies "Low" should mean "only low-severity slurs."
Consider either:
- Renaming the enum values to reflect threshold semantics (e.g.,
StrictThreshold,ModerateThreshold,PermissiveThreshold) - Inverting the logic so
Lowreturns only low-severity slurs - Adding clear documentation explaining that severity levels represent filter thresholds, not slur categories
Based on the enum definition in backend/app/core/enum.py, the current naming is ambiguous and could confuse API users.
🤖 Prompt for AI Agents
In backend/app/core/validators/lexical_slur.py around lines 94-99, the current
severity-to-filter mapping treats SlurSeverity.Low as an inclusive threshold
(L,M,H) which conflicts with the enum naming; change the logic so each enum maps
to its single matching severity (Low -> ['L'], Medium -> ['M'], High -> ['H'])
and add a brief docstring above the method clarifying that the enum represents
slur categories (not thresholds); run/update unit tests that depend on this
behavior to reflect the new semantics.
| def __init__( | ||
| self, | ||
| entity_types=None, | ||
| threshold=0.5, | ||
| language="en", | ||
| on_fail: Optional[Callable] = OnFailAction.FIX | ||
| ): | ||
| super().__init__(on_fail=on_fail) | ||
|
|
||
| self.entity_types = entity_types or ["ALL"] | ||
| self.threshold = threshold | ||
| self.language = language | ||
| self.language_detector = LanguageDetector() | ||
|
|
||
| if self.language not in ALL_SUPPORTED_LANGUAGES: | ||
| raise Exception( | ||
| f"Language must be in {ALL_SUPPORTED_LANGUAGES}" | ||
| ) | ||
|
|
||
| os.environ["TOKENIZERS_PARALLELISM"] = "false" # Disables huggingface/tokenizers warning | ||
|
|
||
| self.analyzer = AnalyzerEngine() | ||
| self.anonymizer = AnonymizerEngine() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
entity_types and threshold are initialized but never used.
The entity_types and threshold parameters are stored on the instance but are not passed to self.analyzer.analyze() in run_english_presidio(). This means the validator ignores these configuration options.
🔎 Proposed fix to use entity_types and threshold
def run_english_presidio(self, text: str):
- results = self.analyzer.analyze(text=text,
- language="en")
+ results = self.analyzer.analyze(
+ text=text,
+ language="en",
+ entities=self.entity_types if self.entity_types != ["ALL"] else None,
+ score_threshold=self.threshold
+ )
anonymized = self.anonymizer.anonymize(text=text, analyzer_results=results)
return anonymized.textCommittable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In backend/app/core/validators/pii_remover.py around lines 27 to 49,
entity_types and threshold are stored on the instance but never used; update
run_english_presidio() to pass these into the Presidio AnalyzerEngine.analyze
call (use entities=self.entity_types and score_threshold=self.threshold) so the
analyzer respects the configured entity filter and confidence threshold; ensure
the default ["ALL"] remains acceptable to the analyze call or map it to None if
Presidio expects None for all entities.
| def run_hinglish_presidio(self, text: str): | ||
| return text No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hinglish PII removal is a no-op stub.
run_hinglish_presidio returns the original text without any anonymization. If Hindi text contains PII, it won't be detected or removed. Consider adding a warning log or raising an exception until this is implemented, so users are aware of the limitation.
🔎 Proposed improvement
+import logging
+
+logger = logging.getLogger(__name__)
+
def run_hinglish_presidio(self, text: str):
+ logger.warning("Hinglish PII removal is not yet implemented; returning original text.")
return text
+Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.14.8)
74-74: No newline at end of file
Add trailing newline
(W292)
🤖 Prompt for AI Agents
In backend/app/core/validators/pii_remover.py around lines 73-74, the
run_hinglish_presidio method is a no-op that returns the original text, so
Hinglish/Hindi PII will not be anonymized; replace this silent stub with an
explicit failure mode by either (a) logging a clear warning with contextual info
that Hinglish PII removal is not implemented and returning the text only if
caller can accept it, or (b) raising a NotImplementedError (or a custom
exception) to force callers to handle the missing functionality until a real
Hinglish PII pipeline is implemented; ensure the chosen approach includes a
concise message indicating the limitation and guidance to use the English PII
remover or implement Hinglish support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
♻️ Duplicate comments (5)
backend/app/core/validators/gender_assumption_bias.py (2)
22-29: Mutable default argument remains unfixed.This issue was flagged in a previous review. Using a mutable list
[BiasCategories.All]as a default argument creates a shared object across all calls, leading to potential bugs if mutated.🔎 Proposed fix
def __init__( self, - categories: Optional[List[BiasCategories]] = [BiasCategories.All], + categories: Optional[List[BiasCategories]] = None, on_fail: Optional[Callable] = OnFailAction.FIX ): + if categories is None: + categories = [BiasCategories.All] self.gender_bias_list = self.load_gender_bias_list(categories) self.text = None super().__init__(on_fail=on_fail)
71-83: Duplicate entries can still occur whenBiasCategories.Allis mixed with specific categories.This issue was flagged in a previous review. If
categoriescontains bothBiasCategories.Alland other specific categories, entries will be appended multiple times, causing redundant processing.🔎 Proposed fix: early return for All
def load_gender_bias_list(self, categories): file_path = Settings.GENDER_BIAS_LIST_FILEPATH neutral_term_col = 'neutral-term' - gender_bias_list = [] try: df = pandas.read_csv(file_path) except FileNotFoundError: raise FileNotFoundError(f"Gender bias file not found at {file_path}") except Exception as e: raise ValueError(f"Failed to load gender bias list from {file_path}: {e}") df['word'] = df['word'].str.lower() df[neutral_term_col] = df[neutral_term_col].str.lower() + # If All is present, return all entries immediately + if BiasCategories.All in categories: + rows = df.to_dict(orient="records") + return [{ + "word": row["word"], + neutral_term_col: row[neutral_term_col] + } for row in rows] + + gender_bias_list = [] for category in categories: - if category == BiasCategories.All: - temp = df - else: - temp = df[df['type'] == category.value] + temp = df[df['type'] == category.value] rows = temp.to_dict(orient="records") for row in rows: gender_bias_list.append({ "word": row["word"], neutral_term_col: row[neutral_term_col] }) return gender_bias_listbackend/app/core/validators/lexical_slur.py (1)
100-107: Severity filtering semantics remain counterintuitive.This issue was flagged in a previous review.
SlurSeverity.Lowcurrently returns all severities (L, M, H), which conflicts with the enum naming. The enum name implies "Low severity slurs only," but the logic treats it as "low threshold (catch everything)."Consider renaming the enum values to reflect threshold semantics (e.g.,
PermissiveThreshold,ModerateThreshold,StrictThreshold) or inverting the logic soLowreturns only low-severity slurs.backend/app/core/validators/pii_remover.py (2)
36-37:entity_typesandthresholdremain unused.This issue was flagged in a previous review. The parameters are stored on the instance but never passed to
self.analyzer.analyze()inrun_english_presidio(), so the validator ignores these configuration options.🔎 Proposed fix
def run_english_presidio(self, text: str): - results = self.analyzer.analyze(text=text, - language="en") + results = self.analyzer.analyze( + text=text, + language="en", + entities=self.entity_types if self.entity_types != ["ALL"] else None, + score_threshold=self.threshold + ) anonymized = self.anonymizer.anonymize(text=text, analyzer_results=results) return anonymized.textAlso applies to: 66-70
72-73: Hinglish PII removal remains a no-op stub.This issue was flagged in a previous review. The method returns the original text without any anonymization. If Hindi text contains PII, it won't be detected or removed, creating a security gap.
Consider adding a warning log or raising a
NotImplementedErroruntil this is implemented, so users are aware of the limitation.
🧹 Nitpick comments (12)
.env.example (1)
9-41: Reorder environment variables alphabetically and add trailing newline.The static analysis tool (dotenv-linter) flags multiple key-ordering inconsistencies. Reorganizing variables alphabetically improves consistency and predictability, especially for template files that serve as deployment guidance. Additionally, add a blank line at the end of the file.
🔎 Proposed reordering
-DOMAIN=localhost - -# DOMAIN=localhost.tiangolo.com - -# Environment: "development", "testing", "staging", "production" - -ENVIRONMENT=development - -PROJECT_NAME="Kaapi-Guardrails" -STACK_NAME=Kaapi-Guardrails - -#Backend -SECRET_KEY=changethis -FIRST_SUPERUSER=superuser@example.com -FIRST_SUPERUSER_PASSWORD=changethis -EMAIL_TEST_USER="test@example.com" - -# API Base URL for cron scripts (defaults to http://localhost:8000 if not set) -API_BASE_URL=http://localhost:8000 - -# Postgres -POSTGRES_SERVER=localhost -POSTGRES_PORT=5432 -POSTGRES_DB=kaapi-guardrails -POSTGRES_USER=postgres -POSTGRES_PASSWORD=postgres - -SENTRY_DSN= - -# Configure these with your own Docker registry images - -DOCKER_IMAGE_BACKEND=kaapi-guardrails-backend - -# Callback Timeouts (in seconds) -CALLBACK_CONNECT_TIMEOUT=3 -CALLBACK_READ_TIMEOUT=10 - -# require as a env if you want to use doc transformation -OPENAI_API_KEY=<ADD-KEY> -GUARDRAILS_HUB_API_KEY=<ADD-KEY> -AUTH_TOKEN=<ADD-TOKEN> +# Environment: "development", "testing", "staging", "production" +ENVIRONMENT=development + +API_BASE_URL=http://localhost:8000 +AUTH_TOKEN=<ADD-TOKEN> +CALLBACK_CONNECT_TIMEOUT=3 +CALLBACK_READ_TIMEOUT=10 +DOMAIN=localhost +# DOMAIN=localhost.tiangolo.com +DOCKER_IMAGE_BACKEND=kaapi-guardrails-backend +EMAIL_TEST_USER="test@example.com" +FIRST_SUPERUSER=superuser@example.com +FIRST_SUPERUSER_PASSWORD=changethis +GUARDRAILS_HUB_API_KEY=<ADD-KEY> +OPENAI_API_KEY=<ADD-KEY> +POSTGRES_DB=kaapi-guardrails +POSTGRES_PASSWORD=postgres +POSTGRES_PORT=5432 +POSTGRES_SERVER=localhost +POSTGRES_USER=postgres +PROJECT_NAME="Kaapi-Guardrails" +SECRET_KEY=changethis +SENTRY_DSN= +STACK_NAME=Kaapi-Guardrailscopier.yml (1)
1-40: LGTM with optional improvement suggestion.The template variables are well-structured with appropriate types, help text, and defaults. The "changethis" defaults are acceptable for template scaffolding, and generation instructions are provided for
secret_keyandpostgres_password.💡 Optional: Add password generation instructions for first_superuser_password
For consistency with
secret_keyandpostgres_password, consider adding generation instructions tofirst_superuser_password:first_superuser_password: type: str - help: The password of the first superuser (in .env) + help: | + 'The password of the first superuser, stored in .env, + you can generate one with: + python -c "import secrets; print(secrets.token_urlsafe(32))"' default: changethisbackend/app/core/validators/gender_assumption_bias.py (1)
83-83: Add trailing newline.}) - return gender_bias_list + return gender_bias_listbackend/app/core/validators/lexical_slur.py (2)
98-98: TODO: Language filtering not yet implemented.Language filtering is deferred, which is acceptable for boilerplate scaffolding. However, ensure this is tracked.
Do you want me to open a new issue to track this language filtering implementation?
107-107: Add trailing newline.- return df['label'].tolist() + return df['label'].tolist()backend/app/core/validators/pii_remover.py (1)
73-73: Add trailing newline.def run_hinglish_presidio(self, text: str): - return text + return textbackend/app/core/validators/utils/language_detector.py (4)
14-20: Unused instance variableself.label.The
self.labelattribute is initialized but never used in the class. This appears to be leftover from an earlier implementation.🔎 Proposed fix
def __init__(self): self.lid = pipeline( task = "text-classification", model="papluca/xlm-roberta-base-language-detection", top_k=1 ) - self.label = None
26-41: Consider caching for repeated language detection calls.The
predictmethod calls the transformer model on every invocation. If the same text is validated multiple times (e.g., in a guardrails pipeline), this could impact performance. However, be mindful of memory leaks when caching instance methods.Consider implementing a module-level cache or a bounded per-instance cache (e.g., using
cachetools.LRUCache) if performance profiling shows this is a bottleneck. Avoid@lru_cacheon instance methods as it prevents garbage collection.
5-5: Remove unnecessary parentheses after class definition.-class LanguageDetector(): +class LanguageDetector:
47-47: Add trailing newline.def is_english(self, text: str): - return self.predict(text)[LABEL] == LANG_ENGLISH + return self.predict(text)[LABEL] == LANG_ENGLISHbackend/app/api/routes/guardrails.py (2)
24-29: Inconsistent response patterns between success and error paths.Success responses return
APIResponse.success_response()directly (lines 24-29), while error responses wrapAPIResponse.failure_response()in aJSONResponsewith.model_dump()(lines 32-44). Consider using a consistent pattern throughout.💡 Option 1: Use APIResponse directly for both
- return JSONResponse( - status_code=status.HTTP_400_BAD_REQUEST, - content=APIResponse.failure_response( + response = APIResponse.failure_response( error={ "response_id": response_id, "type": "validation_error", "action": "reask" if result.failures else "fail", "failures": [ f.failure_message for f in (result.failures or []) ], } - ).model_dump(), ) + return JSONResponse( + status_code=status.HTTP_400_BAD_REQUEST, + content=response.model_dump(), + )Or wrap both success and error in JSONResponse with appropriate status codes.
Also applies to: 32-44
123-123: Add missing newline at end of file.Static analysis tools expect a newline character at the end of the file per POSIX standards.
return {"validators": validators} +
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
backend/app/core/validators/utils/files/gender_assumption_bias_words.csvis excluded by!**/*.csv
📒 Files selected for processing (8)
.env.examplebackend/app/api/routes/guardrails.pybackend/app/core/validators/gender_assumption_bias.pybackend/app/core/validators/lexical_slur.pybackend/app/core/validators/pii_remover.pybackend/app/core/validators/utils/language_detector.pycopier.ymldeployment.md
🧰 Additional context used
🧬 Code graph analysis (4)
backend/app/core/validators/pii_remover.py (3)
backend/app/core/validators/utils/language_detector.py (2)
LanguageDetector(5-47)is_hindi(43-44)backend/app/core/validators/gender_assumption_bias.py (1)
_validate(31-54)backend/app/core/validators/lexical_slur.py (1)
_validate(39-63)
backend/app/api/routes/guardrails.py (3)
backend/app/models/guardrail_config.py (2)
GuardrailInputRequest(22-24)GuardrailOutputRequest(26-28)backend/app/core/api_response.py (3)
APIResponse(4-16)success_response(11-12)failure_response(15-16)backend/app/core/guardrail_controller.py (2)
build_guard(5-7)get_validator_config_models(9-12)
backend/app/core/validators/lexical_slur.py (2)
backend/app/core/config.py (1)
Settings(26-98)backend/app/core/enum.py (1)
SlurSeverity(3-7)
backend/app/core/validators/gender_assumption_bias.py (1)
backend/app/core/enum.py (1)
BiasCategories(9-13)
🪛 dotenv-linter (4.0.0)
.env.example
[warning] 9-9: [QuoteCharacter] The value has quote characters (', ")
(QuoteCharacter)
[warning] 14-14: [UnorderedKey] The FIRST_SUPERUSER key should go before the SECRET_KEY key
(UnorderedKey)
[warning] 15-15: [UnorderedKey] The FIRST_SUPERUSER_PASSWORD key should go before the SECRET_KEY key
(UnorderedKey)
[warning] 16-16: [QuoteCharacter] The value has quote characters (', ")
(QuoteCharacter)
[warning] 16-16: [UnorderedKey] The EMAIL_TEST_USER key should go before the FIRST_SUPERUSER key
(UnorderedKey)
[warning] 23-23: [UnorderedKey] The POSTGRES_PORT key should go before the POSTGRES_SERVER key
(UnorderedKey)
[warning] 24-24: [UnorderedKey] The POSTGRES_DB key should go before the POSTGRES_PORT key
(UnorderedKey)
[warning] 26-26: [UnorderedKey] The POSTGRES_PASSWORD key should go before the POSTGRES_PORT key
(UnorderedKey)
[warning] 40-40: [UnorderedKey] The GUARDRAILS_HUB_API_KEY key should go before the OPENAI_API_KEY key
(UnorderedKey)
[warning] 41-41: [EndingBlankLine] No blank line at the end of the file
(EndingBlankLine)
[warning] 41-41: [UnorderedKey] The AUTH_TOKEN key should go before the GUARDRAILS_HUB_API_KEY key
(UnorderedKey)
🪛 Ruff (0.14.8)
backend/app/core/validators/pii_remover.py
13-13: Import from collections.abc instead: Callable
Import from collections.abc
(UP035)
73-73: No newline at end of file
Add trailing newline
(W292)
backend/app/api/routes/guardrails.py
123-123: No newline at end of file
Add trailing newline
(W292)
backend/app/core/validators/utils/language_detector.py
5-5: Unnecessary parentheses after class definition
Remove parentheses
(UP039)
47-47: No newline at end of file
Add trailing newline
(W292)
backend/app/core/validators/lexical_slur.py
9-9: Import from collections.abc instead: Callable
Import from collections.abc
(UP035)
backend/app/core/validators/gender_assumption_bias.py
9-9: Import from collections.abc instead: Callable
Import from collections.abc
(UP035)
9-9: typing.List is deprecated, use list instead
(UP035)
24-24: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
83-83: No newline at end of file
Add trailing newline
(W292)
🔇 Additional comments (6)
deployment.md (1)
1-7: Placeholder documentation aligns with prior feedback.The file intentionally provides only section outlines, consistent with the previous review suggestion to keep it minimal and add a "coming soon" note. Once guardrails validators and additional features are implemented, this file can be populated with concrete deployment steps.
.env.example (1)
18-19: Clarify the necessity and scope of theAPI_BASE_URL(cron) variable.A previous reviewer questioned whether cron functionality is necessary for this PR. Since this PR focuses on boilerplate and authentication scaffolding, and given feedback that functionality should take precedence over scaffolding, please confirm whether the cron-related configuration should be included or deferred to a follow-up PR when cron jobs are actually implemented.
backend/app/core/validators/lexical_slur.py (1)
89-94: Good error handling for CSV loading.The CSV loading now includes proper error handling as recommended in the previous review. This prevents unhandled exceptions during validator initialization.
backend/app/core/validators/pii_remover.py (1)
54-57: Language detection logic fixed correctly.The previous critical bug (comparing a dict to a boolean) has been resolved. The code now directly uses
is_hindi(text)to determine the processing path, which is correct and clean.backend/app/api/routes/guardrails.py (2)
1-10: Imports and router setup look good.The uuid import properly addresses the past review comment about hardcoded response_id values. All dependencies are appropriately imported.
38-38: Clarify the logic for "reask" vs "fail" action.Lines 38 and 86 set
action="fail"whenresult.failuresis empty, but this code is only reached whenresult.validated_output is None. This implies the guard returned neither a validated output nor failure messages, which seems like an edge case worth documenting or handling explicitly.Please verify the expected behavior when the guardrails
validate()method returns a result with bothvalidated_output=Noneandfailures=None/[]. Should this default to "fail" or does it indicate a different condition?Also applies to: 86-86
| @router.post("/input/") | ||
| async def run_input_guardrails( | ||
| payload: GuardrailInputRequest, | ||
| _: AuthDep, | ||
| ): | ||
| response_id = str(uuid.uuid4()) | ||
|
|
||
| try: | ||
| guard = build_guard(payload.validators) | ||
| result = guard.validate(payload.input) | ||
|
|
||
| if result.validated_output is not None: | ||
| return APIResponse.success_response( | ||
| data={ | ||
| "response_id": response_id, | ||
| "safe_input": result.validated_output, | ||
| } | ||
| ) | ||
|
|
||
| # validation failure → 400 | ||
| return JSONResponse( | ||
| status_code=status.HTTP_400_BAD_REQUEST, | ||
| content=APIResponse.failure_response( | ||
| error={ | ||
| "response_id": response_id, | ||
| "type": "validation_error", | ||
| "action": "reask" if result.failures else "fail", | ||
| "failures": [ | ||
| f.failure_message for f in (result.failures or []) | ||
| ], | ||
| } | ||
| ).model_dump(), | ||
| ) | ||
|
|
||
| except Exception as e: | ||
| # config/runtime error → 500 | ||
| return JSONResponse( | ||
| status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, | ||
| content=APIResponse.failure_response( | ||
| error={ | ||
| "response_id": response_id, | ||
| "type": "config_error", | ||
| "reason": str(e), | ||
| } | ||
| ).model_dump(), | ||
| ) | ||
|
|
||
|
|
||
| @router.post("/output") | ||
| async def run_output_guardrails( | ||
| payload: GuardrailOutputRequest, | ||
| _: AuthDep, | ||
| ): | ||
| response_id = str(uuid.uuid4()) | ||
|
|
||
| try: | ||
| guard = build_guard(payload.validators) | ||
| result = guard.validate(payload.output) | ||
|
|
||
| if result.validated_output is not None: | ||
| return APIResponse.success_response( | ||
| data={ | ||
| "response_id": response_id, | ||
| "safe_output": result.validated_output, | ||
| } | ||
| ) | ||
|
|
||
| # validation failure → 400 | ||
| return JSONResponse( | ||
| status_code=status.HTTP_400_BAD_REQUEST, | ||
| content=APIResponse.failure_response( | ||
| error={ | ||
| "response_id": response_id, | ||
| "type": "validation_error", | ||
| "action": "reask" if result.failures else "fail", | ||
| "failures": [ | ||
| f.failure_message for f in (result.failures or []) | ||
| ], | ||
| } | ||
| ).model_dump(), | ||
| ) | ||
|
|
||
| except Exception as e: | ||
| # config/runtime error → 500 | ||
| return JSONResponse( | ||
| status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, | ||
| content=APIResponse.failure_response( | ||
| error={ | ||
| "response_id": response_id, | ||
| "type": "config_error", | ||
| "reason": str(e), | ||
| } | ||
| ).model_dump(), | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Significant code duplication between input and output endpoints.
The run_input_guardrails and run_output_guardrails endpoints share ~90% identical logic. Extract the common validation flow into a helper function to improve maintainability.
🔎 Proposed refactor to eliminate duplication
Add a helper function before the route handlers:
async def _validate_with_guard(
data: str,
validators: list,
response_field: str # "safe_input" or "safe_output"
) -> JSONResponse | APIResponse:
response_id = str(uuid.uuid4())
try:
guard = build_guard(validators)
result = guard.validate(data)
if result.validated_output is not None:
return APIResponse.success_response(
data={
"response_id": response_id,
response_field: result.validated_output,
}
)
return JSONResponse(
status_code=status.HTTP_400_BAD_REQUEST,
content=APIResponse.failure_response(
error={
"response_id": response_id,
"type": "validation_error",
"action": "reask" if result.failures else "fail",
"failures": [
f.failure_message for f in (result.failures or [])
],
}
).model_dump(),
)
except Exception as e:
return JSONResponse(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
content=APIResponse.failure_response(
error={
"response_id": response_id,
"type": "config_error",
"reason": str(e),
}
).model_dump(),
)Then simplify the endpoints:
@router.post("/input/")
async def run_input_guardrails(
payload: GuardrailInputRequest,
_: AuthDep,
):
- response_id = str(uuid.uuid4())
-
- try:
- guard = build_guard(payload.validators)
- result = guard.validate(payload.input)
- # ... rest of implementation
- except Exception as e:
- # ... error handling
+ return await _validate_with_guard(
+ payload.input,
+ payload.validators,
+ "safe_input"
+ )
@router.post("/output")
async def run_output_guardrails(
payload: GuardrailOutputRequest,
_: AuthDep,
):
- response_id = str(uuid.uuid4())
-
- try:
- guard = build_guard(payload.validators)
- result = guard.validate(payload.output)
- # ... rest of implementation
- except Exception as e:
- # ... error handling
+ return await _validate_with_guard(
+ payload.output,
+ payload.validators,
+ "safe_output"
+ )Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In backend/app/api/routes/guardrails.py around lines 12-105, both
run_input_guardrails and run_output_guardrails duplicate the same validation
flow; extract that shared logic into a single helper (e.g.,
_validate_with_guard(data: str, validators: list, response_field: str) ->
JSONResponse | APIResponse) that: generates response_id, builds the guard, runs
validate, returns APIResponse.success_response with response_field set to
validated_output on success, returns a 400 JSONResponse with the same failure
payload on validation failures, and returns a 500 JSONResponse with the
config_error payload on exceptions; then replace the two route handlers to call
this helper with payload.input/payload.output and "safe_input"/"safe_output"
respectively.
| for model in validator_config_models: | ||
| schema = model.model_json_schema() | ||
| validator_type = schema["properties"]["type"]["const"] | ||
| validators.append({ | ||
| "type": validator_type, | ||
| "config": schema, | ||
| }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling for schema access to prevent KeyError.
Line 117 assumes the schema has a specific nested structure (["properties"]["type"]["const"]). If a validator model's schema doesn't match this structure, a KeyError will be raised, resulting in a 500 error.
🔎 Proposed fix with defensive access
for model in validator_config_models:
schema = model.model_json_schema()
- validator_type = schema["properties"]["type"]["const"]
+ validator_type = schema.get("properties", {}).get("type", {}).get("const")
+ if not validator_type:
+ # Log warning or skip invalid validators
+ continue
validators.append({
"type": validator_type,
"config": schema,
})Alternatively, wrap the loop in a try-except to gracefully handle schema extraction failures:
for model in validator_config_models:
+ try:
schema = model.model_json_schema()
validator_type = schema["properties"]["type"]["const"]
validators.append({
"type": validator_type,
"config": schema,
})
+ except (KeyError, TypeError) as e:
+ # Log the error and continue with other validators
+ continue📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| for model in validator_config_models: | |
| schema = model.model_json_schema() | |
| validator_type = schema["properties"]["type"]["const"] | |
| validators.append({ | |
| "type": validator_type, | |
| "config": schema, | |
| }) | |
| for model in validator_config_models: | |
| schema = model.model_json_schema() | |
| validator_type = schema.get("properties", {}).get("type", {}).get("const") | |
| if not validator_type: | |
| # Log warning or skip invalid validators | |
| continue | |
| validators.append({ | |
| "type": validator_type, | |
| "config": schema, | |
| }) |
🤖 Prompt for AI Agents
In backend/app/api/routes/guardrails.py around lines 115 to 121, the code
assumes schema["properties"]["type"]["const"] exists which can raise KeyError;
update the loop to defensively access nested keys (e.g., use dict.get at each
level or a helper that returns None if missing), validate that the extracted
validator_type is present and skip or log/raise a controlled error for that
model instead of crashing, and ensure validators.append only when a valid type
and config are obtained.
| _exclude: | ||
| # Global | ||
| - .vscode | ||
| - .mypy_cache | ||
| # Python | ||
| - __pycache__ | ||
| - app.egg-info | ||
| - "*.pyc" | ||
| - .mypy_cache | ||
| - .coverage | ||
| - htmlcov | ||
| - .cache | ||
| - .venv | ||
| # Frontend | ||
| # Logs | ||
| - logs | ||
| - "*.log" | ||
| - npm-debug.log* | ||
| - yarn-debug.log* | ||
| - yarn-error.log* | ||
| - pnpm-debug.log* | ||
| - lerna-debug.log* | ||
| - node_modules | ||
| - dist | ||
| - dist-ssr | ||
| - "*.local" | ||
| # Editor directories and files | ||
| - .idea | ||
| - .DS_Store | ||
| - "*.suo" | ||
| - "*.ntvs*" | ||
| - "*.njsproj" | ||
| - "*.sln" | ||
| - "*.sw?" | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove duplicate entry and reorganize comments for clarity.
Two issues affect maintainability:
- Duplicate exclusion:
.mypy_cacheappears on both lines 45 and 50. - Misleading comment: The "# Frontend" comment on line 55 suggests frontend exclusions follow, but the next comment is "# Logs". Frontend-related patterns (node_modules, dist, etc.) appear later without clear grouping.
🔎 Proposed fix
_exclude:
# Global
- .vscode
- .mypy_cache
# Python
- __pycache__
- app.egg-info
- "*.pyc"
- - .mypy_cache
- .coverage
- htmlcov
- .cache
- .venv
- # Frontend
# Logs
- logs
- "*.log"
- npm-debug.log*
- yarn-debug.log*
- yarn-error.log*
- pnpm-debug.log*
- lerna-debug.log*
+ # Frontend
- node_modules
- dist
- dist-ssr
- "*.local"
# Editor directories and files
- .idea
- .DS_Store
- "*.suo"
- "*.ntvs*"
- "*.njsproj"
- "*.sln"
- "*.sw?"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| _exclude: | |
| # Global | |
| - .vscode | |
| - .mypy_cache | |
| # Python | |
| - __pycache__ | |
| - app.egg-info | |
| - "*.pyc" | |
| - .mypy_cache | |
| - .coverage | |
| - htmlcov | |
| - .cache | |
| - .venv | |
| # Frontend | |
| # Logs | |
| - logs | |
| - "*.log" | |
| - npm-debug.log* | |
| - yarn-debug.log* | |
| - yarn-error.log* | |
| - pnpm-debug.log* | |
| - lerna-debug.log* | |
| - node_modules | |
| - dist | |
| - dist-ssr | |
| - "*.local" | |
| # Editor directories and files | |
| - .idea | |
| - .DS_Store | |
| - "*.suo" | |
| - "*.ntvs*" | |
| - "*.njsproj" | |
| - "*.sln" | |
| - "*.sw?" | |
| _exclude: | |
| # Global | |
| - .vscode | |
| - .mypy_cache | |
| # Python | |
| - __pycache__ | |
| - app.egg-info | |
| - "*.pyc" | |
| - .coverage | |
| - htmlcov | |
| - .cache | |
| - .venv | |
| # Logs | |
| - logs | |
| - "*.log" | |
| - npm-debug.log* | |
| - yarn-debug.log* | |
| - yarn-error.log* | |
| - pnpm-debug.log* | |
| - lerna-debug.log* | |
| # Frontend | |
| - node_modules | |
| - dist | |
| - dist-ssr | |
| - "*.local" | |
| # Editor directories and files | |
| - .idea | |
| - .DS_Store | |
| - "*.suo" | |
| - "*.ntvs*" | |
| - "*.njsproj" | |
| - "*.sln" | |
| - "*.sw?" |
🤖 Prompt for AI Agents
In copier.yml around lines 42 to 76, remove the duplicate ".mypy_cache" entry
(currently listed twice) and reorganize the comment blocks so that related
exclusions are grouped correctly: place the "# Frontend" comment immediately
above frontend patterns (node_modules, dist, dist-ssr, "*.local"), keep "# Logs"
above log-related patterns (logs, "*.log", npm-debug.log*, yarn-debug.log*,
etc.), and ensure global/python/editor groups remain intact and not interleaved;
update spacing/comments accordingly for clear grouping.
| _tasks: | ||
| - ["{{ _copier_python }}", .copier/update_dotenv.py] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if .copier/update_dotenv.py exists
if fd -t f "update_dotenv.py" .copier/ 2>/dev/null; then
echo "File found"
else
echo "File not found"
fi
# Also check if .copier directory exists
if [ -d ".copier" ]; then
echo ".copier directory exists"
ls -la .copier/ | head -20
else
echo ".copier directory does not exist"
fiRepository: ProjectTech4DevAI/kaapi-guardrails
Length of output: 125
Create the missing .copier/update_dotenv.py script or remove the task reference.
The _tasks section references .copier/update_dotenv.py, but this script and the .copier directory do not exist in the repository. The copier template will fail during execution. Either include the script in this PR or remove the task reference from lines 79-80.
🤖 Prompt for AI Agents
In copier.yml around lines 79-80, the _tasks entry references a non-existent
file .copier/update_dotenv.py which will break template execution; either add
the missing script at .copier/update_dotenv.py implementing the intended dotenv
update logic and include it in the repo, or remove (or comment out) the task
entry from _tasks so the copier template no longer attempts to run it; if adding
the script, ensure it is executable and documented in the template README, and
if removing, verify no other templates depend on it and run a local copier
dry-run to confirm the template succeeds.
Summary
This pull request contains the boilerplate code and bearer token based authentication for Kaapi-Guardrails.
Target issue is #2
Checklist
Before submitting a pull request, please ensure that you mark these task.
fastapi run --reload app/main.pyordocker compose upin the repository root and test.Notes
We are explicitly looking for feedback on the boilerplate code and integration. The code for guardrails and validators will be taken up in a separate PR later on.
Summary by CodeRabbit
Release Notes
New Features
Documentation
Infrastructure & Configuration
✏️ Tip: You can customize this high-level summary in your review settings.