Skip to content

[codex] Add production Salesforce skill#642

Open
furukama wants to merge 6 commits intomainfrom
codex/580-salesforce-production-skill
Open

[codex] Add production Salesforce skill#642
furukama wants to merge 6 commits intomainfrom
codex/580-salesforce-production-skill

Conversation

@furukama
Copy link
Copy Markdown
Contributor

Summary

  • expands the Salesforce bundled skill from read-only inspection into CRM reads, Opportunity stage/probability updates, and Task/Event activity logging
  • adds deterministic natural-language planning/execution for workflows like "Move the Acme deal to Closed Won and log a call from today"
  • adds a 30-scenario offline eval fixture covering read, write-update, write-activity, and compound paths
  • updates gateway OAuth token binding so captured Salesforce tokens bind to the returned instance_url host while remaining blocked from unrelated hosts

Acceptance Criteria

  • OAuth token capture and storage stay gateway-side via <secret:...> placeholders, captureResponseFields, and bearerSecretName; tokens are not returned to the helper/model
  • lead/contact/opportunity read paths are available through find
  • Opportunity StageName and Probability writes are available through update-opportunity and NL run
  • call/email/meeting logging creates Task or Event records on the resolved CRM target
  • cost-per-run accounting is documented and emitted as UsageTotals metadata in helper/eval payloads

Closes #580.

Validation

  • npm run format
  • python3 skills/skill-creator/scripts/quick_validate.py skills/salesforce
  • python3 skills/salesforce/scripts/salesforce_query.py --format json eval-scenarios
  • python3 -c "import pathlib; p=pathlib.Path('skills/salesforce/scripts/salesforce_query.py'); compile(p.read_text(), str(p), 'exec')"
  • /Users/bkoehler/src/hybridclaw/node_modules/.bin/vitest tests/salesforce-skill.test.ts
  • /Users/bkoehler/src/hybridclaw/node_modules/.bin/vitest tests/gateway-http-server.test.ts -t "binds captured OAuth bearer tokens to response instance_url host"
  • ./node_modules/.bin/tsc --noEmit --pretty false
  • git diff --check

Note: the focused gateway test passes but logs an existing shared better-sqlite3 Node ABI warning during mocked runtime-config startup.

@furukama furukama marked this pull request as ready for review April 28, 2026 18:29
Copilot AI review requested due to automatic review settings April 28, 2026 18:29
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR upgrades the bundled Salesforce skill from schema/SOQL inspection to production CRM workflows, adding read/search, Opportunity updates, and Task/Event activity logging via the gateway proxy while tightening OAuth token host binding.

Changes:

  • Expanded salesforce_query.py with find, update-opportunity, log-activity, plus deterministic NL plan/run and an offline eval-scenarios suite.
  • Added a 30-scenario eval fixture and corresponding Vitest coverage for planning and write payload construction.
  • Updated gateway OAuth capture to bind bearer-token secrets to the instance_url host returned by Salesforce (instead of the login host).

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/salesforce-skill.test.ts Adds tests for new helper commands, NL planning, eval suite coverage, and write payload construction.
tests/gateway-http-server.test.ts Verifies captured OAuth token binding follows Salesforce instance_url and blocks unrelated hosts.
src/gateway/gateway-http-proxy.ts Binds captured OAuth bearer tokens to response instance_url host when present.
skills/salesforce/scripts/salesforce_query.py Implements CRM search, Opportunity updates, activity logging, NL planner/runner, and eval harness.
skills/salesforce/references/metadata-and-queries.md Updates usage docs for new read/write and NL commands.
skills/salesforce/evals/scenarios.json Adds 30 offline NL planning scenarios across read/write/compound categories.
skills/salesforce/agents/openai.yaml Updates agent description/prompt to reflect CRM reads and safe writes.
skills/salesforce/SKILL.md Updates skill scope, workflow, security model, and eval documentation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread skills/salesforce/scripts/salesforce_query.py Outdated
Comment thread skills/salesforce/scripts/salesforce_query.py
@furukama
Copy link
Copy Markdown
Contributor Author

@claude Do a thorough review of this PR in terms of: KISS, YAGNI, DRY, Fail Fast, Secure by Default and performance/speed

@claude
Copy link
Copy Markdown

claude Bot commented Apr 28, 2026

Claude encountered an error —— View job


I'll analyze this and get back to you.

@furukama
Copy link
Copy Markdown
Contributor Author

@claude Do a thorough review of this PR in terms of: KISS, YAGNI, DRY, Fail Fast, Secure by Default and performance/speed

@claude
Copy link
Copy Markdown

claude Bot commented Apr 28, 2026

Claude finished @furukama's task in 3m 8s —— View job


PR Review: KISS, YAGNI, DRY, Fail Fast, Secure by Default, Performance

Tasks:

  • Read all changed files
  • Write thorough review

Overall the implementation is solid — the gateway-proxied secret model is well-designed and the escaping fixes (separate escape_soql_like_literal) are correct. Issues below are grouped by principle. Bold items are the most consequential.


DRY

Column extraction duplicated verbatim in render_text (salesforce_query.py:1421–1430 and 1454–1463): the query/tooling-query branch and the find branch both contain identical 9-line column-collection loops. Extract to a helper, e.g. _columns_from_records(records).

child_rows list comprehension for child relationships is also copy-pasted between the describe branch (lines 1341–1349) and the relations branch (lines 1388–1395). Minor, but a named helper would make future field additions apply in one place.

normalize_activity_type called twice in log_activity — once inside build_activity_payload (line 962) and again in the return dict (line 1042). The second call on an already-validated string is redundant; save the result from build_activity_payload. Fix this →


YAGNI

usage_totals_measurement() emits schema metadata for a not-yet-wired feature. The dict lists which fields would exist in UsageTotals data, but no actual per-call cost values are captured or computed in this PR. Every response payload carries this static dict. If the real accounting plumbing isn't shipped here, strip the field (or replace it with a simple "costMeasurement": "UsageTotals" string per scenario). As-is it misleads callers into thinking cost data is present.

Eval scenarios' expected struct is only partially evaluated. Each scenario has expected.firstAction, expected.recordType, expected.stage, expected.target etc., but evaluate_scenarios() at salesforce_query.py:1245 only checks firstAction. The other expected fields are ignored — they're YAGNI decoration on the fixture right now. Either evaluate them or remove them from scenarios.json to keep the fixture honest.

ensure_positive_limit is a one-liner used in two places — not enough reuse to justify a named function. Inline it.


KISS

resolved_records cache in run_planned_actions is case-sensitive (salesforce_query.py:1197–1202). The dict is keyed on action["opportunity"] (the raw NL-parsed name, e.g. "Acme"), but the lookup at line 1201 checks if target in resolved_records. If action["target"] for the subsequent log-activity step differs in case or whitespace from the key, the cache misses silently and the downstream step retries a name-based lookup against Salesforce — correct but surprising. Key and look up by id from the resolved record instead.

SalesforceSession.request_json detects absolute URLs with a startswith check (line 129). If someone passes a relative path that begins with http:// by accident (e.g., a malformed path like http://stubs), it goes through unchanged. Use parsed.hostname or a tighter guard.

meeting_times hard-codes 9:00 AM UTC with no explanation (line 506, time(hour=9)). This will create events at the wrong time for every non-UTC org. Should be documented at minimum, or made a parameter.

_add_common_args with with_defaults=False/argparse.SUPPRESS (lines 1522–1566) is clever but opaque — the intent is to let flags appear before or after the subcommand, but the mechanism requires reading the function twice to understand. A code comment explaining why double-registration is needed would help; alternatively, argparse.ArgumentParser(parents=[...]) achieves the same outcome more readably.


Fail Fast

resolve_latest_api_version doesn't guard for non-dict items in the versions list (line 370): item.get("version", "") will raise AttributeError if Salesforce returns a non-object element. Add if is_mapping(item) inside the version_key lambda or filter beforehand. Fix this →

run_planned_actions can partially execute then crash. If action is missing a key (e.g. "openOnly" in a find-records dict), the KeyError surfaces after one or more Salesforce writes have already landed. The plan shape is owned by plan_natural_language so this is low-probability, but validation at the top of run_planned_actions (or using .get() with defaults) would prevent silent partial mutations.

main()'s else branch at line 1812 falls through to tooling-query for any unrecognised args.command. With required=True on subparsers this shouldn't fire, but an explicit raise AssertionError(f"Unreachable: {args.command}") would catch any future refactoring mistake immediately.


Secure by Default

captureOAuthResponse binds domains only for jsonPaths containing the string "token" (gateway-http-proxy.ts:427):

if (rule.jsonPath.includes('token')) {
  secrets[`${rule.secretName}${BOUND_DOMAIN_SUFFIX}`] = baseDomain;
}

This is a substring heuristic. If a future OAuth flow uses a capture rule whose jsonPath is "access" or "bearer", the domain binding is silently skipped and the token becomes usable against any host. A safer default: bind all captured secrets to the domain, not just those whose path includes "token". The instance_url capture is intentionally not bound today, but that can be handled with an explicit exclude list rather than an implicit include. Fix this →

escape_soql_literal vs escape_soql_like_literal — correctly split, Copilot's inline comment is moot. The PR already provides both helpers and uses the right one in each context. The Copilot review comment suggesting the fix is already implemented.

normalize_stage_name fallback is correct per the Copilot suggestion (known.get(normalized.lower(), normalized)) — the code already returns normalized (not .title()) for unknown stage names. No change needed; the Copilot suggestion was addressed before this PR landed.

Raw SOQL passthrough in query command is intentional but worth a SKILL.md note: any caller with gateway access can read arbitrary Salesforce data. The risk is commensurate with having stored OAuth credentials, so this is acceptable by design.


Performance / Speed

list_objects fetches all sObjects (~900 in a typical org) then filters in Python (salesforce_query.py:615). The Salesforce REST API has no server-side filter on /sobjects, so this is unavoidable, but worth noting that the response can be several MB and the full payload is held in memory before any filtering.

+5 magic number on urlopen timeout (salesforce_query.py:238): gw.timeout_ms / 1000 + 5. The extra 5 seconds is presumably a buffer so urlopen doesn't cut off a gateway that is itself about to time out the outbound request. This is reasonable but should be a named constant, e.g. GATEWAY_TIMEOUT_BUFFER_S = 5.

find_single_record always makes two SOQL round-trips on a name miss (exact-match query, then LIKE fallback). For update-opportunity and log-activity, this means 2+ sequential Salesforce calls before any write. An OR-combined single query (WHERE Id = 'x' OR Name = 'x') would halve the latency for the happy path.

Test embedding Python source as .join('\n') strings in salesforce-skill.test.ts (e.g. lines 76–101, 147–197): these inline multi-command Python scripts are maintenance liabilities. They can't be linted or type-checked by Python tooling, and a missing comma is invisible. Prefer small fixture .py scripts in tests/fixtures/ that spawnSync runs directly.


Summary

Principle Severity Count
DRY Medium 3
YAGNI Medium 3
KISS Low-Medium 4
Fail Fast Medium 3
Secure by Default High 1 (domain-binding heuristic)
Performance Low 3

The domain-binding heuristic in captureOAuthResponse is the only change I'd call blocking — everything else is code quality. The overall architecture (gateway-proxied tokens, SSRF guards, dry-run support, eval suite) is well-designed for the scope.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Roadmap 21.2] Salesforce production skill

2 participants