[codex] Add production Salesforce skill by furukama · Pull Request #642 · HybridAIOne/hybridclaw

furukama · 2026-04-28T15:58:52Z

Summary

expands the Salesforce bundled skill from read-only inspection into CRM reads, Opportunity stage/probability updates, and Task/Event activity logging
adds deterministic natural-language planning/execution for workflows like "Move the Acme deal to Closed Won and log a call from today"
adds a 30-scenario offline eval fixture covering read, write-update, write-activity, and compound paths
updates gateway OAuth token binding so captured Salesforce tokens bind to the returned instance_url host while remaining blocked from unrelated hosts

Acceptance Criteria

OAuth token capture and storage stay gateway-side via <secret:...> placeholders, captureResponseFields, and bearerSecretName; tokens are not returned to the helper/model
lead/contact/opportunity read paths are available through find
Opportunity StageName and Probability writes are available through update-opportunity and NL run
call/email/meeting logging creates Task or Event records on the resolved CRM target
cost-per-run accounting is documented and emitted as UsageTotals metadata in helper/eval payloads

Closes #580.

Validation

npm run format
python3 skills/skill-creator/scripts/quick_validate.py skills/salesforce
python3 skills/salesforce/scripts/salesforce_query.py --format json eval-scenarios
python3 -c "import pathlib; p=pathlib.Path('skills/salesforce/scripts/salesforce_query.py'); compile(p.read_text(), str(p), 'exec')"
/Users/bkoehler/src/hybridclaw/node_modules/.bin/vitest tests/salesforce-skill.test.ts
/Users/bkoehler/src/hybridclaw/node_modules/.bin/vitest tests/gateway-http-server.test.ts -t "binds captured OAuth bearer tokens to response instance_url host"
./node_modules/.bin/tsc --noEmit --pretty false
git diff --check

Note: the focused gateway test passes but logs an existing shared better-sqlite3 Node ABI warning during mocked runtime-config startup.

Copilot

Pull request overview

This PR upgrades the bundled Salesforce skill from schema/SOQL inspection to production CRM workflows, adding read/search, Opportunity updates, and Task/Event activity logging via the gateway proxy while tightening OAuth token host binding.

Changes:

Expanded salesforce_query.py with find, update-opportunity, log-activity, plus deterministic NL plan/run and an offline eval-scenarios suite.
Added a 30-scenario eval fixture and corresponding Vitest coverage for planning and write payload construction.
Updated gateway OAuth capture to bind bearer-token secrets to the instance_url host returned by Salesforce (instead of the login host).

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
tests/salesforce-skill.test.ts	Adds tests for new helper commands, NL planning, eval suite coverage, and write payload construction.
tests/gateway-http-server.test.ts	Verifies captured OAuth token binding follows Salesforce `instance_url` and blocks unrelated hosts.
src/gateway/gateway-http-proxy.ts	Binds captured OAuth bearer tokens to response `instance_url` host when present.
skills/salesforce/scripts/salesforce_query.py	Implements CRM search, Opportunity updates, activity logging, NL planner/runner, and eval harness.
skills/salesforce/references/metadata-and-queries.md	Updates usage docs for new read/write and NL commands.
skills/salesforce/evals/scenarios.json	Adds 30 offline NL planning scenarios across read/write/compound categories.
skills/salesforce/agents/openai.yaml	Updates agent description/prompt to reflect CRM reads and safe writes.
skills/salesforce/SKILL.md	Updates skill scope, workflow, security model, and eval documentation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

furukama · 2026-04-28T18:40:54Z

@claude Do a thorough review of this PR in terms of: KISS, YAGNI, DRY, Fail Fast, Secure by Default and performance/speed

claude · 2026-04-28T18:41:24Z

Claude encountered an error —— View job

I'll analyze this and get back to you.

furukama · 2026-04-28T21:03:28Z

@claude Do a thorough review of this PR in terms of: KISS, YAGNI, DRY, Fail Fast, Secure by Default and performance/speed

claude · 2026-04-28T21:03:46Z

Claude finished @furukama's task in 3m 8s —— View job

PR Review: KISS, YAGNI, DRY, Fail Fast, Secure by Default, Performance

Tasks:

Read all changed files
Write thorough review

Overall the implementation is solid — the gateway-proxied secret model is well-designed and the escaping fixes (separate escape_soql_like_literal) are correct. Issues below are grouped by principle. Bold items are the most consequential.

DRY

Column extraction duplicated verbatim in render_text (salesforce_query.py:1421–1430 and 1454–1463): the query/tooling-query branch and the find branch both contain identical 9-line column-collection loops. Extract to a helper, e.g. _columns_from_records(records).

child_rows list comprehension for child relationships is also copy-pasted between the describe branch (lines 1341–1349) and the relations branch (lines 1388–1395). Minor, but a named helper would make future field additions apply in one place.

normalize_activity_type called twice in log_activity — once inside build_activity_payload (line 962) and again in the return dict (line 1042). The second call on an already-validated string is redundant; save the result from build_activity_payload. Fix this →

YAGNI

usage_totals_measurement() emits schema metadata for a not-yet-wired feature. The dict lists which fields would exist in UsageTotals data, but no actual per-call cost values are captured or computed in this PR. Every response payload carries this static dict. If the real accounting plumbing isn't shipped here, strip the field (or replace it with a simple "costMeasurement": "UsageTotals" string per scenario). As-is it misleads callers into thinking cost data is present.

Eval scenarios' expected struct is only partially evaluated. Each scenario has expected.firstAction, expected.recordType, expected.stage, expected.target etc., but evaluate_scenarios() at salesforce_query.py:1245 only checks firstAction. The other expected fields are ignored — they're YAGNI decoration on the fixture right now. Either evaluate them or remove them from scenarios.json to keep the fixture honest.

ensure_positive_limit is a one-liner used in two places — not enough reuse to justify a named function. Inline it.

KISS

resolved_records cache in run_planned_actions is case-sensitive (salesforce_query.py:1197–1202). The dict is keyed on action["opportunity"] (the raw NL-parsed name, e.g. "Acme"), but the lookup at line 1201 checks if target in resolved_records. If action["target"] for the subsequent log-activity step differs in case or whitespace from the key, the cache misses silently and the downstream step retries a name-based lookup against Salesforce — correct but surprising. Key and look up by id from the resolved record instead.

SalesforceSession.request_json detects absolute URLs with a startswith check (line 129). If someone passes a relative path that begins with http:// by accident (e.g., a malformed path like http://stubs), it goes through unchanged. Use parsed.hostname or a tighter guard.

meeting_times hard-codes 9:00 AM UTC with no explanation (line 506, time(hour=9)). This will create events at the wrong time for every non-UTC org. Should be documented at minimum, or made a parameter.

_add_common_args with with_defaults=False/argparse.SUPPRESS (lines 1522–1566) is clever but opaque — the intent is to let flags appear before or after the subcommand, but the mechanism requires reading the function twice to understand. A code comment explaining why double-registration is needed would help; alternatively, argparse.ArgumentParser(parents=[...]) achieves the same outcome more readably.

Fail Fast

resolve_latest_api_version doesn't guard for non-dict items in the versions list (line 370): item.get("version", "") will raise AttributeError if Salesforce returns a non-object element. Add if is_mapping(item) inside the version_key lambda or filter beforehand. Fix this →

run_planned_actions can partially execute then crash. If action is missing a key (e.g. "openOnly" in a find-records dict), the KeyError surfaces after one or more Salesforce writes have already landed. The plan shape is owned by plan_natural_language so this is low-probability, but validation at the top of run_planned_actions (or using .get() with defaults) would prevent silent partial mutations.

main()'s else branch at line 1812 falls through to tooling-query for any unrecognised args.command. With required=True on subparsers this shouldn't fire, but an explicit raise AssertionError(f"Unreachable: {args.command}") would catch any future refactoring mistake immediately.

Secure by Default

captureOAuthResponse binds domains only for jsonPaths containing the string "token" (gateway-http-proxy.ts:427):

if (rule.jsonPath.includes('token')) {
  secrets[`${rule.secretName}${BOUND_DOMAIN_SUFFIX}`] = baseDomain;
}

This is a substring heuristic. If a future OAuth flow uses a capture rule whose jsonPath is "access" or "bearer", the domain binding is silently skipped and the token becomes usable against any host. A safer default: bind all captured secrets to the domain, not just those whose path includes "token". The instance_url capture is intentionally not bound today, but that can be handled with an explicit exclude list rather than an implicit include. Fix this →

escape_soql_literal vs escape_soql_like_literal — correctly split, Copilot's inline comment is moot. The PR already provides both helpers and uses the right one in each context. The Copilot review comment suggesting the fix is already implemented.

normalize_stage_name fallback is correct per the Copilot suggestion (known.get(normalized.lower(), normalized)) — the code already returns normalized (not .title()) for unknown stage names. No change needed; the Copilot suggestion was addressed before this PR landed.

Raw SOQL passthrough in query command is intentional but worth a SKILL.md note: any caller with gateway access can read arbitrary Salesforce data. The risk is commensurate with having stored OAuth credentials, so this is acceptable by design.

Performance / Speed

list_objects fetches all sObjects (~900 in a typical org) then filters in Python (salesforce_query.py:615). The Salesforce REST API has no server-side filter on /sobjects, so this is unavoidable, but worth noting that the response can be several MB and the full payload is held in memory before any filtering.

+5 magic number on urlopen timeout (salesforce_query.py:238): gw.timeout_ms / 1000 + 5. The extra 5 seconds is presumably a buffer so urlopen doesn't cut off a gateway that is itself about to time out the outbound request. This is reasonable but should be a named constant, e.g. GATEWAY_TIMEOUT_BUFFER_S = 5.

find_single_record always makes two SOQL round-trips on a name miss (exact-match query, then LIKE fallback). For update-opportunity and log-activity, this means 2+ sequential Salesforce calls before any write. An OR-combined single query (WHERE Id = 'x' OR Name = 'x') would halve the latency for the happy path.

Test embedding Python source as .join('\n') strings in salesforce-skill.test.ts (e.g. lines 76–101, 147–197): these inline multi-command Python scripts are maintenance liabilities. They can't be linted or type-checked by Python tooling, and a missing comma is invisible. Prefer small fixture .py scripts in tests/fixtures/ that spawnSync runs directly.

Summary

Principle	Severity	Count
DRY	Medium	3
YAGNI	Medium	3
KISS	Low-Medium	4
Fail Fast	Medium	3
Secure by Default	High	1 (domain-binding heuristic)
Performance	Low	3

The domain-binding heuristic in captureOAuthResponse is the only change I'd call blocking — everything else is code quality. The overall architecture (gateway-proxied tokens, SSRF guards, dry-run support, eval suite) is well-designed for the scope.

Benedikt Koehler and others added 2 commits April 28, 2026 17:46

feat: add production Salesforce skill

170d379

Merge branch 'main' into codex/580-salesforce-production-skill

681a7aa

furukama marked this pull request as ready for review April 28, 2026 18:29

Copilot AI review requested due to automatic review settings April 28, 2026 18:29

Copilot started reviewing on behalf of furukama April 28, 2026 18:29 View session

Copilot AI reviewed Apr 28, 2026

View reviewed changes

Comment thread skills/salesforce/scripts/salesforce_query.py Outdated

Comment thread skills/salesforce/scripts/salesforce_query.py

fix: address salesforce skill review feedback

bbb4fb8

Benedikt Koehler and others added 3 commits April 28, 2026 23:21

refactor: deduplicate salesforce text rendering

405f66e

fix: simplify salesforce planned action handling

ab54bca

Merge branch 'main' into codex/580-salesforce-production-skill

6ff11fe

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[codex] Add production Salesforce skill#642

[codex] Add production Salesforce skill#642
furukama wants to merge 6 commits intomainfrom
codex/580-salesforce-production-skill

furukama commented Apr 28, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

furukama commented Apr 28, 2026

Uh oh!

claude Bot commented Apr 28, 2026 •

edited

Loading

Uh oh!

furukama commented Apr 28, 2026

Uh oh!

claude Bot commented Apr 28, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

furukama commented Apr 28, 2026

Summary

Acceptance Criteria

Validation

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

furukama commented Apr 28, 2026

Uh oh!

claude Bot commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

furukama commented Apr 28, 2026

Uh oh!

claude Bot commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: KISS, YAGNI, DRY, Fail Fast, Secure by Default, Performance

DRY

YAGNI

KISS

Fail Fast

Secure by Default

Performance / Speed

Summary

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

claude Bot commented Apr 28, 2026 •

edited

Loading

claude Bot commented Apr 28, 2026 •

edited

Loading