Pull request overview
This PR introduces deployment and operations support for an OpenClaw (“clawbot”) service in the remedios Kubernetes stack, including OCI-backed state snapshotting and a UI ingress, plus a few API/runtime adjustments.
Changes:
- Add OpenClaw Kubernetes manifests (Deployment/PVC/Service, UI Ingress, daily backup CronJob) and wire them into the Ansible `services.yml` flow.
- Add an OCI snapshot/restore utility for OpenClaw state, container build artifacts, a CI build workflow, and accompanying docs/tests.
- Improve WhatsApp outbound text handling (splitting) and tighten type parsing for `job_result` fields; adjust the persistence model for `audio_duration_seconds`.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `zordon/deploy/openclaw.yaml` | Deploys OpenClaw with persisted state, restore initContainer, and snapshot sidecar. |
| `zordon/deploy/openclaw-ui-ingress.yaml` | Exposes the OpenClaw UI via Traefik Ingress using a templated host. |
| `zordon/deploy/openclaw-backup-cronjob.yaml` | Adds a daily backup CronJob for OpenClaw snapshots. |
| `zordon/deploy/audio_whisperturbo.yaml` | Reduces Whisper Turbo replicas from 3 → 2. |
| `zordon/ansible/services.yml` | Adds OpenClaw deployment/render/apply steps and OCI secret gating tweaks. |
| `remedios/services/openclaw/openclaw_snapshot.py` | Implements OCI snapshot backup/restore/prune/loop utilities. |
| `remedios/services/openclaw/Dockerfile` | Base OpenClaw runtime image for multi-arch builds. |
| `remedios/services/openclaw/snapshot.Dockerfile` | Snapshot helper image bundling oci + snapshot script. |
| `remedios/test/test_openclaw_snapshot.py` | Tests retry behavior, loop resilience, and tar extraction behavior. |
| `remedios/core/api/server.py` | Adds WhatsApp message splitting + stronger numeric parsing in `job_result`. |
| `remedios/test/test_api_send_text_answer.py` | Tests WhatsApp splitting and partial failure handling. |
| `remedios/core/api/persistence/models.py` | Changes the `audio_duration_seconds` column mapping to `Float`. |
| `README.md` | Refactors top-level docs to point to dedicated guides. |
| `docs/deployment.md` | Adds a deployment guide including OpenClaw-related `.secrets` variables. |
| `docs/openclaw.md` | Adds an OpenClaw build/deploy/onboarding/token/pairing guide. |
| `docs/runbook.md` | Adds an operational runbook including OpenClaw troubleshooting. |
| `.github/workflows/build-and-push-openclaw.yml` | Adds CI to build/push OpenClaw images on `develop` changes. |
| """Parte texto en chunks priorizando corte por línea/espacio, con fallback duro.""" | ||
| normalized = str(text or "").strip() | ||
| if not normalized: | ||
| return [""] |
`_split_text_for_whatsapp()` returns `[""]` when the text is empty or whitespace-only (because of the `.strip()`). That makes `_send_text_answer()` try to send an empty message to Graph, which is usually invalid. Better to return `[]` (or `None`) for empty text and have `_send_text_answer()` early-return without sending anything (and/or without marking the message as read if no text was sent).
| return [""] | |
| return [] |
```python
chunk_budget = max(1, max_chars - PART_PREFIX_RESERVE)
parts: list[str] = []
pending = normalized
while pending:
    if len(pending) <= chunk_budget:
        piece = pending.strip()
        if piece:
            parts.append(piece)
```
The `chunk_budget` calculation uses a fixed `PART_PREFIX_RESERVE`, but the actual `"(i/total) "` prefix grows with the number of digits in `i`/`total`. With many chunks the prefix can exceed the reserve and the final body can exceed `WHATSAPP_TEXT_LIMIT`, causing 400 failures. Suggest computing the prefix dynamically (or doing a 2-pass: estimate the total, compute the worst-case prefix, and split with that margin).
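The 2-pass idea could be sketched like this (illustrative only; the `WHATSAPP_TEXT_LIMIT` constant and the simple character-based split are assumptions, while the PR's splitter prefers line/space breaks):

```python
# Two-pass split: estimate the chunk count, size the worst-case
# "(i/total) " prefix from its digit width, then re-split with that margin
# until the estimate is stable. Every final message fits max_chars.
WHATSAPP_TEXT_LIMIT = 4096

def split_with_dynamic_prefix(text: str, max_chars: int = WHATSAPP_TEXT_LIMIT) -> list[str]:
    if not text:
        return []
    total_estimate = 1
    while True:
        # Worst-case prefix for this estimate, e.g. "(99/99) " for 2 digits.
        prefix_len = len(f"({total_estimate}/{total_estimate}) ")
        budget = max(1, max_chars - prefix_len)
        chunks = [text[i:i + budget] for i in range(0, len(text), budget)]
        if len(chunks) <= total_estimate:
            break
        total_estimate = len(chunks)  # prefix got wider; re-split
    total = len(chunks)
    if total == 1:
        return [text]
    return [f"({i}/{total}) {chunk}" for i, chunk in enumerate(chunks, start=1)]
```

The loop converges quickly because the prefix width grows only with the number of digits in the total.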
```python
def _extract_tar(tar_path: Path, target_dir: Path):
    tmp_dir = target_dir.parent / f".restore-{_ts()}"
    if tmp_dir.exists():
        shutil.rmtree(tmp_dir)
    tmp_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_path, mode="r:gz") as tar:
        tar.extractall(path=tmp_dir)
```
`tar.extractall()` without validation allows path traversal (entries like `"../../..."`) or extraction of symlinks outside `tmp_dir` if the tar has been tampered with. Implement safe extraction that validates each member resolves inside `tmp_dir` and rejects absolute/`..` paths (or use a `safe_extract` library/helper).
```diff
-def _extract_tar(tar_path: Path, target_dir: Path):
-    tmp_dir = target_dir.parent / f".restore-{_ts()}"
-    if tmp_dir.exists():
-        shutil.rmtree(tmp_dir)
-    tmp_dir.mkdir(parents=True, exist_ok=True)
-    with tarfile.open(tar_path, mode="r:gz") as tar:
-        tar.extractall(path=tmp_dir)
+def _safe_extract_tar(tar: tarfile.TarFile, target_dir: Path):
+    root = target_dir.resolve()
+
+    def _is_within_root(path: Path) -> bool:
+        try:
+            path.relative_to(root)
+            return True
+        except ValueError:
+            return False
+
+    members = tar.getmembers()
+    for member in members:
+        member_path = Path(member.name)
+        if member_path.is_absolute():
+            raise RuntimeError(f"Snapshot corrupto: ruta absoluta no permitida en tar: {member.name}")
+        resolved_member_path = (root / member_path).resolve()
+        if not _is_within_root(resolved_member_path):
+            raise RuntimeError(f"Snapshot corrupto: path traversal detectado en tar: {member.name}")
+        if member.issym() or member.islnk():
+            link_target = Path(member.linkname)
+            if link_target.is_absolute():
+                raise RuntimeError(
+                    f"Snapshot corrupto: link absoluto no permitido en tar: {member.name} -> {member.linkname}"
+                )
+            link_base = resolved_member_path.parent
+            resolved_link_target = (link_base / link_target).resolve()
+            if not _is_within_root(resolved_link_target):
+                raise RuntimeError(
+                    f"Snapshot corrupto: link fuera del directorio de extracción: {member.name} -> {member.linkname}"
+                )
+    tar.extractall(path=target_dir, members=members)
+
+
+def _extract_tar(tar_path: Path, target_dir: Path):
+    tmp_dir = target_dir.parent / f".restore-{_ts()}"
+    if tmp_dir.exists():
+        shutil.rmtree(tmp_dir)
+    tmp_dir.mkdir(parents=True, exist_ok=True)
+    with tarfile.open(tar_path, mode="r:gz") as tar:
+        _safe_extract_tar(tar, tmp_dir)
```
```diff
 shell: |
   tmp=$(mktemp)
-  grep -Ev '^(ORACLE_WALLET_PATH|ORACLE_DSN|OCI_CONFIG_PATH|OCI_API_KEY_PATH)=' {{ app_root }}/.secrets > "$tmp"
+  grep -Ev '^(ORACLE_WALLET_PATH|ORACLE_DSN|OCI_CONFIG_PATH|OCI_API_KEY_PATH|OPENCLAW_GATEWAY_TOKEN)=' {{ app_root }}/.secrets > "$tmp"
```
This excludes `OPENCLAW_GATEWAY_TOKEN` when building `remedios-secrets`, but the repo/config uses `OPENCLAW_GATEWAY_AUTH` (`services.yml` defaults, manifests, and docs). This looks like an inconsistent variable name and could end up including/excluding the wrong parameter. Align the name (token vs auth) and adjust the `grep -Ev` to omit the correct variable.
```diff
-  grep -Ev '^(ORACLE_WALLET_PATH|ORACLE_DSN|OCI_CONFIG_PATH|OCI_API_KEY_PATH|OPENCLAW_GATEWAY_TOKEN)=' {{ app_root }}/.secrets > "$tmp"
+  grep -Ev '^(ORACLE_WALLET_PATH|ORACLE_DSN|OCI_CONFIG_PATH|OCI_API_KEY_PATH|OPENCLAW_GATEWAY_AUTH)=' {{ app_root }}/.secrets > "$tmp"
```
```markdown
- `OCI_CONFIG_PATH` y `OCI_API_KEY_PATH` no se suben al repo; se montan como secret en el pod.

### 6) Añadir el inventory

Edita `zordon/ansible/inventory.ini` con tu `master` y, si aplica, los `nodes`.

Todos estos requisitos estan pensados para encajar en la capa Oracle Always Free.
```
Spanish typo: "estan" should be "están".
```diff
-Todos estos requisitos estan pensados para encajar en la capa Oracle Always Free.
+Todos estos requisitos están pensados para encajar en la capa Oracle Always Free.
```
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: openclaw-backup
  namespace: remedios
spec:
  schedule: "${OPENCLAW_DAILY_CRON}"
  concurrencyPolicy: Forbid
  startingDeadlineSeconds: 1800
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 2
  jobTemplate:
    spec:
      backoffLimit: 3
      activeDeadlineSeconds: 3600
      ttlSecondsAfterFinished: 86400
      template:
        metadata:
          labels:
            app: openclaw-backup
        spec:
          imagePullSecrets:
            - name: ghcr-creds
          restartPolicy: Never
          affinity:
            podAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                - labelSelector:
                    matchExpressions:
                      - key: app
                        operator: In
                        values:
                          - openclaw
                  topologyKey: kubernetes.io/hostname
          containers:
            - name: openclaw-backup
              image: "${OPENCLAW_SNAPSHOT_IMAGE}"
              imagePullPolicy: Always
              command: ["python", "/app/openclaw_snapshot.py", "--kind", "daily", "backup"]
              envFrom:
                - configMapRef:
                    name: openclaw-config
                - secretRef:
                    name: remedios-secrets
              env:
                - name: OCI_CONFIG_PATH
                  value: /opt/oci/config
              volumeMounts:
                - name: openclaw-state
                  mountPath: /home/node/.openclaw
                - name: oci-config
                  mountPath: /opt/oci
                  readOnly: true
          volumes:
            - name: openclaw-state
              persistentVolumeClaim:
                claimName: openclaw-pvc
            - name: oci-config
              secret:
                secretName: oci-config
```
The Deployment already runs an `openclaw-snapshot` sidecar with `openclaw_snapshot.py loop`, and that loop also performs a `daily` backup. With this CronJob you would end up with duplicate daily backups every day (more traffic/storage). Alternatives: remove the CronJob, or adjust the loop so it only does checkpoints (or allow disabling `daily` in the loop).
```diff
-apiVersion: batch/v1
-kind: CronJob
-metadata:
-  name: openclaw-backup
-  namespace: remedios
-spec:
-  schedule: "${OPENCLAW_DAILY_CRON}"
-  concurrencyPolicy: Forbid
-  startingDeadlineSeconds: 1800
-  successfulJobsHistoryLimit: 3
-  failedJobsHistoryLimit: 2
-  jobTemplate:
-    spec:
-      backoffLimit: 3
-      activeDeadlineSeconds: 3600
-      ttlSecondsAfterFinished: 86400
-      template:
-        metadata:
-          labels:
-            app: openclaw-backup
-        spec:
-          imagePullSecrets:
-            - name: ghcr-creds
-          restartPolicy: Never
-          affinity:
-            podAffinity:
-              requiredDuringSchedulingIgnoredDuringExecution:
-                - labelSelector:
-                    matchExpressions:
-                      - key: app
-                        operator: In
-                        values:
-                          - openclaw
-                  topologyKey: kubernetes.io/hostname
-          containers:
-            - name: openclaw-backup
-              image: "${OPENCLAW_SNAPSHOT_IMAGE}"
-              imagePullPolicy: Always
-              command: ["python", "/app/openclaw_snapshot.py", "--kind", "daily", "backup"]
-              envFrom:
-                - configMapRef:
-                    name: openclaw-config
-                - secretRef:
-                    name: remedios-secrets
-              env:
-                - name: OCI_CONFIG_PATH
-                  value: /opt/oci/config
-              volumeMounts:
-                - name: openclaw-state
-                  mountPath: /home/node/.openclaw
-                - name: oci-config
-                  mountPath: /opt/oci
-                  readOnly: true
-          volumes:
-            - name: openclaw-state
-              persistentVolumeClaim:
-                claimName: openclaw-pvc
-            - name: oci-config
-              secret:
-                secretName: oci-config
+# Removed duplicate daily backup CronJob.
+# The openclaw Deployment already runs the `openclaw-snapshot` sidecar
+# with `openclaw_snapshot.py loop`, which performs the daily backup.
+# Keeping this CronJob would generate duplicate `daily` backups each day.
```
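The "allow disabling `daily` in the loop" alternative could be sketched like this (the `OPENCLAW_LOOP_DAILY` env var, the `iterations` test hook, and the loop shape are assumptions for illustration, not the PR's actual code):

```python
import os
import time

def snapshot_loop(backup, sleep=time.sleep, checkpoint_interval_s: int = 3600,
                  daily_enabled=None, iterations=None) -> None:
    """Run periodic checkpoint backups; daily backups are optional.

    With OPENCLAW_LOOP_DAILY=0 (hypothetical env var) the loop only does
    checkpoints and the daily backup is left to the CronJob, avoiding
    duplicates. `iterations=None` means run forever; a number bounds the
    loop for testing.
    """
    if daily_enabled is None:
        daily_enabled = os.environ.get("OPENCLAW_LOOP_DAILY", "1") != "0"
    last_daily_day = None
    count = 0
    while iterations is None or count < iterations:
        backup(kind="checkpoint")
        today = time.strftime("%Y-%m-%d")
        if daily_enabled and today != last_daily_day:
            backup(kind="daily")  # at most once per calendar day
            last_daily_day = today
        sleep(checkpoint_interval_s)
        count += 1
```

Either the loop or the CronJob then owns the daily backup exclusively, so storage and OCI traffic stay bounded.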



