Replies: 2 comments
Interesting setup. I've been running OpenClaw in a similar split-infra pattern (not K8s specifically, but agent + external APIs on separate hosts). A few thoughts:
1. Direct K8s API vs Grafana
Direct K8s API is better for your use case. The agent can query exactly what it needs. That said, don't give it cluster-admin. Create a ServiceAccount with a Role scoped to your test namespace.
2. Context window management
This is the real challenge. A busy namespace can produce megabytes of events and logs per hour. What works:
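As a rough illustration of the filtering idea, here is a minimal sketch that shrinks a raw event dump before it reaches the model: it keeps only Warning events, collapses repeats per object and reason, and truncates long messages. The function name `compact_events` and both limits are hypothetical; the field names (`type`, `reason`, `message`, `count`, `involvedObject`) assume the core/v1 Event shape that `kubectl get events -o json` returns under `items`.

```python
def compact_events(events, max_groups=20, max_message_chars=200):
    """Turn a list of core/v1 Event dicts into a short text summary.

    Hypothetical helper: the limits are placeholders to tune per namespace.
    """
    groups = {}
    for ev in events:
        if ev.get("type") != "Warning":
            continue  # Normal events are routine noise for this purpose
        obj = ev.get("involvedObject", {})
        key = (obj.get("kind", "?"), obj.get("name", "?"), ev.get("reason", "?"))
        entry = groups.setdefault(key, {"count": 0, "message": ""})
        # An event may already carry a repeat count; treat a missing one as 1.
        entry["count"] += int(ev.get("count") or 1)
        entry["message"] = (ev.get("message") or "")[:max_message_chars]

    # Most frequent problems first, capped so the prompt size stays bounded.
    ranked = sorted(groups.items(), key=lambda kv: kv[1]["count"], reverse=True)
    lines = [
        f"{kind}/{name} {reason} x{info['count']}: {info['message']}"
        for (kind, name, reason), info in ranked[:max_groups]
    ]
    return "\n".join(lines)
```

Feeding it `json.load(...)["items"]` from a saved `kubectl get events -n <namespace> -o json` dump yields one line per distinct problem, so the prompt grows with the number of distinct issues rather than with raw event volume.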
With Nemotron-3-120B you have a decent context window, but token cost per run adds up fast if you're scanning every 5 minutes.
3. Practical tip for the CronJob trigger
Rather than a fixed interval, consider having your CronJob check for recent warning events first.
What namespace complexity are we talking about (number of pods, typical churn rate)? That would help narrow down the filtering strategy.
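The event-gated trigger from point 3 could be sketched like this: the CronJob runs a cheap check and only calls the agent when a Warning event is recent enough. The function names, the 10-minute window, and the `min_warnings` threshold are all hypothetical; the timestamp fields assume the core/v1 Event shape (`lastTimestamp`, with `eventTime` or `metadata.creationTimestamp` as fallbacks).

```python
from datetime import datetime, timedelta, timezone


def parse_k8s_time(ts):
    """Parse a Kubernetes RFC 3339 UTC timestamp such as 2024-05-01T12:00:00Z.

    Fractional seconds (as seen in eventTime) are dropped for simplicity.
    """
    base = ts.rstrip("Z").split(".")[0]
    return datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)


def recent_warnings(events, now, window=timedelta(minutes=10)):
    """Return the Warning events whose latest timestamp falls inside the window."""
    recent = []
    for ev in events:
        if ev.get("type") != "Warning":
            continue
        ts = (
            ev.get("lastTimestamp")
            or ev.get("eventTime")
            or ev.get("metadata", {}).get("creationTimestamp")
        )
        if ts and now - parse_k8s_time(ts) <= window:
            recent.append(ev)
    return recent


def should_invoke_agent(events, now, min_warnings=1):
    """Gate for the expensive LLM run: True only when something looks off."""
    return len(recent_warnings(events, now)) >= min_warnings
```

A fixed 5-minute schedule means 288 agent runs per day. With a gate like this the schedule can stay the same, but quiet intervals end after the cheap check and cost no tokens.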
The test namespace doesn't exist yet. I'm creating it from scratch just to test Nemoclaw. My plan is to start small with dummy backend apps, test databases, etc., to validate the workflow and see how the LLM handles the context. Once the concept is proven, my main goal is to deploy it against real applications. I will probably split my applications into different namespaces and assign different roles across various VMs. I will definitely apply your filtering and RBAC tips while building the system, and I'll share my test results and findings here once the setup is up and running. Thanks again.
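For the RBAC part, a namespace-scoped, read-only setup could look roughly like the sketch below. It builds a ServiceAccount, a Role, and a RoleBinding as plain dictionaries and prints them as JSON, which `kubectl apply -f -` accepts. The names `llm-test`, `nemoclaw-reader`, and `nemoclaw-readonly` are placeholders, and the resource list is an assumption about what the agent needs to read; trim it to fit.

```python
import json

READ_ONLY_VERBS = ["get", "list", "watch"]


def readonly_rbac(namespace, account, role):
    """Build namespace-scoped, read-only RBAC objects (all names are placeholders)."""
    meta = lambda name: {"name": name, "namespace": namespace}
    service_account = {"apiVersion": "v1", "kind": "ServiceAccount", "metadata": meta(account)}
    role_obj = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": meta(role),
        "rules": [
            # Core API group: pods, their logs, and events.
            {"apiGroups": [""], "resources": ["pods", "pods/log", "events"], "verbs": READ_ONLY_VERBS},
            # Workload controllers, to give pod state some context.
            {"apiGroups": ["apps"], "resources": ["deployments", "replicasets", "statefulsets"],
             "verbs": READ_ONLY_VERBS},
        ],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": meta(f"{role}-binding"),
        "subjects": [{"kind": "ServiceAccount", "name": account, "namespace": namespace}],
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io", "kind": "Role", "name": role},
    }
    return {"apiVersion": "v1", "kind": "List", "items": [service_account, role_obj, binding]}


if __name__ == "__main__":
    print(json.dumps(readonly_rbac("llm-test", "nemoclaw-reader", "nemoclaw-readonly"), indent=2))
```

Because this is a Role and RoleBinding rather than a ClusterRole, the token issued for the ServiceAccount cannot read anything outside the one namespace, and with no write verbs the agent cannot change anything inside it. Splitting applications across namespaces later would then mean one such set per namespace.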
Hi everyone, looking for best practice advice on this architecture:
The Stack: nvidia/nemotron-3-super-120b-a12b (2x H200) + Nemoclaw on a separate VM (same DC, HTTP endpoint).
The Plan: Give Nemoclaw K8s API access (via token) to a test namespace. Trigger it via CronJob or Telegram to scan for issues and send SMTP alerts.
Important context: I already use Grafana alerts for deterministic problems. This LLM setup is strictly for rapid detection of complex, non-deterministic edge cases.
My Questions:
Is querying the K8s API directly the best practice for Nemoclaw in this scenario?
Alternatively, should I ship all namespace logs/events to my Grafana stack first and have Nemoclaw analyze them from there instead of direct K8s access?
Any quick tips on filtering the data to avoid blowing up the context window?
Thanks!