Keystone SDK for Go

Go client for the Keystone agent evaluation and sandboxed execution platform. Ships alongside the Python and TypeScript SDKs with feature parity.

Install

go get github.com/Polarityinc/keystone-sdk-go@latest

import keystone "github.com/Polarityinc/keystone-sdk-go"

Quick start

package main

import (
    "context"
    "fmt"

    keystone "github.com/Polarityinc/keystone-sdk-go"
)

func main() {
    ctx := context.Background()
    ks := keystone.NewClient(keystone.Config{APIKey: "ks_live_..."})

    // Create an experiment and run it with three client-side scorers.
    exp, err := ks.Experiments.Create(ctx, keystone.CreateExperimentRequest{
        Name: "nightly-regression", SpecID: "spec-123",
    })
    if err != nil {
        panic(err)
    }

    results, err := ks.Experiments.RunAndWait(ctx, exp.ID, keystone.RunAndWaitOpts{
        Scores: []keystone.Scorer{
            keystone.NewFactuality(keystone.JudgeModel("paragon-fast")),
            keystone.NewExactMatch(keystone.EMExpectedKey("expected"), keystone.EMCaseSensitive(false)),
            keystone.NewFileExists("output.txt", keystone.FEGate(true)),
        },
    })
    if err != nil {
        panic(err)
    }

    fmt.Printf("pass rate: %.0f%%\n", results.Metrics.PassRate*100)

    // Stream every trace for the experiment.
    for trace := range ks.Export.Traces(ctx, keystone.TraceFilter{ExperimentID: exp.ID}, 100) {
        fmt.Println(trace["tool"], trace["cost"])
    }
}

What's in the SDK

Area	Symbols
Client services	`Sandboxes`, `Specs`, `Experiments`, `Alerts`, `Agents`, `Datasets`, `Scoring`, `Export`, `Prompts`
29 built-in scorers	Factuality · Battle · ClosedQA · Humor · Moderation · Summarization · SQLJudge · Translation · Security · ContextPrecision · ContextRecall · ContextRelevancy · ContextEntityRecall · Faithfulness · AnswerRelevancy · AnswerSimilarity · AnswerCorrectness · ExactMatch · Levenshtein · NumericDiff · JSONDiff · JSONValidity · SemanticListContains · EmbeddingSimilarity · FileExists · FileContains · CommandExits · SQLEquals · LLMJudge
Tracing	`WrapTransport(client, sandboxID, base)` · `InitTracing(sandboxID).Traced(ctx, name, fn)` · `TracedValue[T]`
OTel bridge	`RegisterOtelFlush(cb)` · `FlushOtel(ctx)` · `gen_ai.*` metadata on LLM trace events
Prompt mgmt	`ks.Prompts.Create/Get/List/Delete`, `RenderTemplate(template, vars)`, mustache-lite syntax matching Python & TS
Export	`ks.Export.Traces/Spans/Scenarios/Scores(ctx, filter, pageSize)` returning channels; `ks.Export.Experiment(ctx, id, format)` for full dumps
Pricing	`EstimateCost(model, in, out, cache)` across 82 models (synced from the shared `pricing.json` single source of truth)
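
To make the Pricing row concrete, the sketch below shows one plausible shape for a token-based cost estimate. It is not the SDK's implementation: the `price` type, the `estimateCost` signature (including the error return), the per-million-token unit, the model name, and every rate are assumptions chosen for illustration, and none of the numbers come from `pricing.json`.

```go
package main

import "fmt"

// price holds hypothetical USD rates per one million tokens.
type price struct {
	Input, Output, CachedInput float64
}

// table is placeholder data for illustration; it is not the SDK's pricing.json.
var table = map[string]price{
	"example-model": {Input: 2.00, Output: 8.00, CachedInput: 0.50},
}

// estimateCost multiplies each token count by its rate and sums the result.
// Unknown models return an error (an assumption about desirable behavior).
func estimateCost(model string, in, out, cache int) (float64, error) {
	p, ok := table[model]
	if !ok {
		return 0, fmt.Errorf("unknown model %q", model)
	}
	const perMillion = 1_000_000.0
	total := float64(in)*p.Input + float64(out)*p.Output + float64(cache)*p.CachedInput
	return total / perMillion, nil
}

func main() {
	cost, err := estimateCost("example-model", 1_000_000, 500_000, 2_000_000)
	if err != nil {
		panic(err)
	}
	fmt.Printf("$%.2f\n", cost) // prints "$7.00"
}
```

Keeping the rates in a single data table is what lets several SDKs agree on the same estimate: only the arithmetic has to be reimplemented per language.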

Custom scorers

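// Requires the "context" and "strings" imports.
// The scoring function returns true when the agent output contains "ok".
myScorer := keystone.NewScorer(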
    func(ctx context.Context, s keystone.ScenarioResult) (any, error) {
        return strings.Contains(s.AgentOutput, "ok"), nil
    },
    keystone.CustomName("contains-ok"),
    keystone.CustomWeight(0.5),
)
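
The call shape above (a function followed by `CustomName(...)` and `CustomWeight(...)`) resembles Go's functional-options pattern. The sketch below shows that pattern in isolation, assuming that is roughly how such options work. `scorer`, `option`, `withName`, `withWeight`, `newScorer`, and the default values are all hypothetical and are not the SDK's types.

```go
package main

import (
	"fmt"
	"strings"
)

// scorer and option are hypothetical types for illustration only.
type scorer struct {
	name   string
	weight float64
	fn     func(output string) bool
}

// option mutates a scorer while it is being constructed.
type option func(*scorer)

func withName(n string) option    { return func(s *scorer) { s.name = n } }
func withWeight(w float64) option { return func(s *scorer) { s.weight = w } }

// newScorer applies defaults first, then each option in order.
func newScorer(fn func(string) bool, opts ...option) *scorer {
	s := &scorer{name: "custom", weight: 1.0, fn: fn} // defaults are assumptions
	for _, opt := range opts {
		opt(s)
	}
	return s
}

func main() {
	s := newScorer(
		func(out string) bool { return strings.Contains(out, "ok") },
		withName("contains-ok"),
		withWeight(0.5),
	)
	fmt.Println(s.name, s.weight, s.fn("all ok")) // prints "contains-ok 0.5 true"
}
```

The appeal of this pattern is that new options can be added later without changing the constructor's signature.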

Feature parity

All three SDKs share the same pricing.json source of truth and a byte-identical mustache-lite renderer, so cost estimates and rendered prompts agree across the Go, Python, and TypeScript runtimes. See keystone-sdk-js and keystone-sdk-python for the sibling implementations.
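
For readers unfamiliar with mustache-style templates, here is a minimal variable-substitution sketch. It assumes `{{name}}` placeholders and leaves unknown variables untouched. Both are guesses for illustration: the mustache-lite syntax and its handling of missing variables are not specified in this README, and `render` is a hypothetical helper, not the SDK's `RenderTemplate`.

```go
package main

import (
	"fmt"
	"regexp"
)

// placeholder matches {{ name }} with optional spaces around the name.
var placeholder = regexp.MustCompile(`\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}`)

// render replaces each {{name}} with vars[name]. Names missing from vars are
// left in place unchanged (an assumption, not documented SDK behavior).
func render(template string, vars map[string]string) string {
	return placeholder.ReplaceAllStringFunc(template, func(m string) string {
		name := placeholder.FindStringSubmatch(m)[1]
		if v, ok := vars[name]; ok {
			return v
		}
		return m
	})
}

func main() {
	out := render("Hello {{ user }}, run {{task}}. {{missing}}", map[string]string{
		"user": "Ada",
		"task": "nightly-regression",
	})
	fmt.Println(out) // prints "Hello Ada, run nightly-regression. {{missing}}"
}
```

A renderer this small is easy to port line for line, which is one way separate implementations can be kept in agreement on output.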

Versioning

Semver. v0.x releases are alpha/beta; v1.0 will be cut once the API shape is frozen.

License

MIT.
