To get started with StarGate development:
- Prerequisites: Docker, Docker Compose, .NET 8 SDK
- Start Environment:
./scripts/start-dev.sh
- Run Tests:
dotnet test
Full setup guide: Local Development Setup
- The Problem
- The Solution (StarGate)
- Architecture - Synchronous Pull Model
- Architecture - System Components
- Extensibility
- Security & Compliance
- Implementation Approach
- Key Benefits
- Assumptions & Constraints
- Success Metrics
- API Specifications
- Error Handling & Resilience
- Monitoring & Observability
The organization manages two completely isolated systems:
┌────────────────────────────┐     ┌────────────────────────┐
│ SERVER                     │     │ CLIENT                 │
│ - Azure Private            │     │ - Supplier or Customer │
│ - Not publicly exposed     │  X  │ - Digital Twin         │
│ - Critical Business Logic  │     │ - Controls             │
└────────────────────────────┘     └────────────────────────┘
                   NO DIRECT COMMUNICATION
❌ Cannot talk to each other
❌ No data flow
❌ Manual processes required
| Aspect | Current State |
|---|---|
| Process Automation | 100% manual |
| Data Integration | No synchronization - errors, duplicates |
| Scalability | Limited by manual processes (not scalable beyond 20-30 customers) |
| Time-to-Enable | 3-6 months to enable each existing customer for new centralized processes |
| Process Visibility | No real-time status or tracking |
Every time you want to expose a new process or workflow from Server to an existing Client instance:
- Analyze customer/supplier-specific process flows and data formats
- Develop custom integrations (connectors, data mappers, orchestration logic)
- Configure security, networking, authentication per customer
- Test end-to-end on their environment
- Deploy, stabilize, and provide support
This 3-6 month cycle repeats per customer/supplier for each new process type you want to enable.
- Customers/Suppliers CANNOT access any central service
- Central system MUST REMAIN private (compliance, security, data protection)
- No "public gateway" exists today
- Each integration requires manual development
StarGate is a secure hybrid bridge built directly into the Customer/Supplier Client as a standard component. It enables the local instance to initiate and monitor processes on the central system while keeping the Server completely isolated.
graph TD
SERVER["SERVER<br/>- Remains completely<br/>isolated and secure"]
STARGATE["STARGATE<br/>AZURE PUBLIC<br/>──────────────<br/>Authentication<br/>Authorization<br/>Request/Response<br/>State Management"]
CLIENT1["CLIENT Instance 1<br/>Customer/Supplier Local<br/>──────────────<br/>Process Control<br/>Local Process Engine<br/>StarGate Client Built-in"]
CLIENT2["CLIENT Instance 2<br/>Customer/Supplier Local<br/>──────────────<br/>Process Control<br/>Local Process Engine<br/>StarGate Client Built-in"]
CLIENT3["CLIENT Instance N<br/>Customer/Supplier Local<br/>──────────────<br/>Process Control<br/>Local Process Engine<br/>StarGate Client Built-in"]
SERVER -->|"Isolated & Secure"| STARGATE
STARGATE -->|"Auth + Control"| CLIENT1
STARGATE -->|"Auth + Control"| CLIENT2
STARGATE -->|"Auth + Control"| CLIENT3
style SERVER fill:#ff6b6b,stroke:#c92a2a,color:#fff
style STARGATE fill:#4dabf7,stroke:#1971c2,color:#fff
style CLIENT1 fill:#51cf66,stroke:#2b8a3e,color:#fff
style CLIENT2 fill:#51cf66,stroke:#2b8a3e,color:#fff
style CLIENT3 fill:#51cf66,stroke:#2b8a3e,color:#fff
- Public service on Azure (`StarGate.API`)
- Authenticated and authorized access only
- Rate limiting for robustness
- Complete audit trail
- StarGate client shipped as standard component for Customer/Supplier
- No custom integration required
- Automatic polling for state updates
- Offline queue capability (if connectivity lost)
- Automatic retry logic
- Seamless process submission from customer/supplier to Server
- Process status tracking
- State synchronization between systems
- Extensible for additional business processes
- Connectivity failure management
- Automatic retry with exponential backoff
- Local persistent queue
- Transaction consistency guarantees
All communication is initiated by the Client instance (Horizon-Customer) through its built-in StarGate client; the Server never opens a connection to a Client:
sequenceDiagram
participant Customer as CLIENT<br/>(Customer/Supplier)
participant StarGate as STARGATE<br/>GATEWAY
participant Horizon as SERVER<br/>(Central)
Customer->>StarGate: 1. POST /process<br/>(HTTPS + JWT)
activate StarGate
Note over StarGate: 2. Validate Auth<br/>Check Rate Limit
StarGate->>Horizon: 3. Queue Request
activate Horizon
Note over Horizon: 4. Process<br/>Asynchronously
StarGate-->>Customer: 5. 202 Accepted<br/>(processId)
deactivate StarGate
loop Auto Polling (StarGate manages)
Customer->>StarGate: 6. GET /status/{id}
activate StarGate
StarGate->>Horizon: 7. Query state
Horizon-->>StarGate: State update
StarGate-->>Customer: 8. 200 OK<br/>(Process state)
deactivate StarGate
Note over Customer: Until complete
end
deactivate Horizon
The StarGate Client implements an intelligent polling strategy designed to balance responsiveness with resource efficiency:
POLLING STRATEGY TIMELINE:
Minute 0: [Process submitted - 202 Accepted]
↓
Minute 0: Poll immediately (check initial status)
↓
Minute 0-2: Poll every 30 seconds (aggressive)
- First 2 minutes: high frequency for quick processes
- Expect completion for 80% of operations here
↓
Minute 2+: Poll every 60 seconds (conservative)
- After 2 minutes: switch to longer intervals
- Reduce load on gateway for long-running processes
↓
Max Timeout: 10 minutes (default, configurable per process type)
- If process not completed after 10 min → log warning
- Continue polling with 60s intervals
- Allow manual intervention if needed
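The timeline above can be expressed as a small scheduling function. The following is a minimal sketch in Python (used only for brevity; it is not the shipped .NET client), and the names `next_poll_delay`, `poll_times` and `timed_out` are hypothetical:

```python
AGGRESSIVE_INTERVAL_S = 30    # minutes 0-2
CONSERVATIVE_INTERVAL_S = 60  # minute 2 onwards
AGGRESSIVE_WINDOW_S = 120     # length of the aggressive phase
DEFAULT_TIMEOUT_S = 600       # 10 minutes, configurable per process type


def next_poll_delay(elapsed_s: float) -> int:
    """Seconds to wait before the next poll, given time since submission."""
    if elapsed_s < AGGRESSIVE_WINDOW_S:
        return AGGRESSIVE_INTERVAL_S
    return CONSERVATIVE_INTERVAL_S


def poll_times(until_s: int) -> list:
    """Offsets (seconds after submission) at which polls happen, up to until_s."""
    times = [0]  # immediate poll right after the 202 response
    while True:
        nxt = times[-1] + next_poll_delay(times[-1])
        if nxt > until_s:
            return times
        times.append(nxt)


def timed_out(elapsed_s: float, timeout_s: int = DEFAULT_TIMEOUT_S) -> bool:
    """True once the warning threshold is reached; polling still continues."""
    return elapsed_s >= timeout_s
```

For the first five minutes this yields polls at 0, 30, 60, 90, 120, 180, 240 and 300 seconds after submission.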
- Automatic via StarGate: Built-in client handles all API calls
- Stateless Gateway: StarGate routes and validates, Central processes
- Pull-Based State: Local instance queries status via built-in polling mechanism
- Asynchronous Processing: Request accepted immediately (HTTP 202 Accepted), processed in background
- Zero Integration Needed: Works out-of-the-box with StarGate Client
Endpoint: https://azure.tenant.com/api/stargate
Authentication: OAuth 2.0 Bearer Token (JWT)
Request Pattern (Internally Generated by StarGate Client):
POST /api/stargate/processes
Content-Type: application/json
Authorization: Bearer {token}
{
"clientProcessId": "8f3c2a4e-7b91-4d8f-9e6a-2c4f8b1d7e53",
"processType": "process-A",
"data": {
/* business-specific payload */
}
}
Response (202 Accepted):
{
"processId": "PROC-20260114-00001",
"clientProcessId": "8f3c2a4e-7b91-4d8f-9e6a-2c4f8b1d7e53",
"status": "accepted",
"statusUrl": "/api/stargate/processes/PROC-20260114-00001",
"createdAt": "2026-01-14T12:48:00Z"
}
Status Query Pattern (Automatically Polled by StarGate Client):
GET /api/stargate/processes/{processId}
Authorization: Bearer {token}
Response (200 OK):
{
"processId": "PROC-20260114-00001",
"clientProcessId": "8f3c2a4e-7b91-4d8f-9e6a-2c4f8b1d7e53",
"status": "processing",
"progress": 45,
"result": null,
"updatedAt": "2026-01-14T12:50:15Z"
}
stateDiagram-v2
[*] --> RequestSubmitted: Local Process<br/>Initiates
RequestSubmitted --> Accepted: Process ID<br/>Generated
note right of Accepted
HTTP 202
Client notified
StarGate begins polling
end note
Accepted --> Processing: Auto-polling<br/>started
note right of Processing
StarGate monitors status
Progress tracked
Estimated completion
end note
Processing --> Completed: Success
Processing --> Failed: Error<br/>Detected
note right of Completed
Result available
Local system triggers
Auto-consumption
end note
note right of Failed
Error details logged
Retry mechanism check
Manual intervention?
end note
Completed --> [*]: Process<br/>Complete
Failed --> Processing: Auto-Retry<br/>(if enabled)
Failed --> [*]: Intervention<br/>Required
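The state diagram can be read as a transition table. The sketch below is illustrative Python, not the project's implementation; the status strings follow the API examples, except `request_submitted`, which is an assumed label for the `RequestSubmitted` node:

```python
# Allowed transitions, taken from the state diagram.
ALLOWED_TRANSITIONS = {
    "request_submitted": {"accepted"},
    "accepted": {"processing"},
    "processing": {"completed", "failed"},
    "failed": {"processing"},  # auto-retry, if enabled
    "completed": set(),        # terminal
}


def can_transition(current: str, target: str) -> bool:
    """True if the diagram has an edge from current to target."""
    return target in ALLOWED_TRANSITIONS.get(current, set())


def advance(current: str, target: str) -> str:
    """Return the new state, or raise if the transition is not in the diagram."""
    if not can_transition(current, target):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target
```

Rejecting unknown transitions early makes out-of-order state updates visible instead of silently overwriting a completed process.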
┌───────────────────────────────────────────┐
│ AZURE PUBLIC (Managed, Scalable)          │
│                                           │
│ ┌────────────────────────────────────┐    │
│ │ API Gateway (StarGate Public)      │    │
│ │ • Request validation               │    │
│ │ • Authentication (OAuth 2.0)       │    │
│ │ • Rate limiting                    │    │
│ │ • Request/Response routing         │    │
│ └────────────────────────────────────┘    │
│                  │                        │
│ ┌────────────────▼───────────────────┐    │
│ │ State Store & Cache                │    │
│ │ • Track process state              │    │
│ │ • Results storage (temporary)      │    │
│ │ • Fast state queries (Redis)       │    │
│ └────────────────────────────────────┘    │
└───────────────────────────────────────────┘
                   │
          Secure Private Link
                   │
┌──────────────────▼────────────────────────┐
│ AZURE PRIVATE (Locked Down)               │
│                                           │
│ ┌────────────────────────────────────┐    │
│ │ SERVER (Process Engine)            │    │
│ │ • Consumes incoming requests       │    │
│ │ • Executes business logic          │    │
│ │ • Persists data                    │    │
│ │ • Updates process state            │    │
│ └────────────────────────────────────┘    │
│                  │                        │
│ ┌────────────────▼───────────────────┐    │
│ │ Data Layer                         │    │
│ │ • MongoDB (requests, audit logs)   │    │
│ │ • Persistent storage & backup      │    │
│ └────────────────────────────────────┘    │
└───────────────────────────────────────────┘
┌───────────────────────────────────────────┐
│ CUSTOMER ON-PREMISES (Multiple Instances) │
│                                           │
│ ┌────────────────────────────────────┐    │
│ │ Customer Instance #1               │    │
│ │ ┌────────────────────────────────┐ │    │
│ │ │ Process Controls               │ │    │
│ │ │ Local Process Engine           │ │    │
│ │ │ ┌────────────────────────────┐ │ │    │
│ │ │ │ StarGate Client (Built-in) │ │ │    │
│ │ │ │ • Credential management    │ │ │    │
│ │ │ │ • Auto request submission  │ │ │    │
│ │ │ │ • Auto polling             │ │ │    │
│ │ │ │ • Offline queue            │ │ │    │
│ │ │ └────────────────────────────┘ │ │    │
│ │ └────────────────────────────────┘ │    │
│ └────────────────────────────────────┘    │
│                                           │
│ ┌────────────────────────────────────┐    │
│ │ Supplier Instance #2               │    │
│ │ (Same structure...)                │    │
│ └────────────────────────────────────┘    │
│                                           │
│ [Additional instances as needed]          │
└───────────────────────────────────────────┘
API Gateway (Public):
- Validate JWT tokens with `processType` scope
- Check request rate limits per tenant
- Route to process handler
- Return immediate response
- Audit all requests with correlation IDs
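As an illustration of the per-tenant rate-limit check, here is a minimal fixed-window counter in Python. It is a sketch under assumptions: the algorithm (fixed window) is not specified by this document, the class name is hypothetical, and the default of 10 requests per minute is borrowed from the 429 example in the API specification:

```python
import math
from collections import defaultdict


class FixedWindowRateLimiter:
    """Counts requests per tenant in fixed time windows (illustrative only)."""

    def __init__(self, limit: int = 10, window_s: int = 60):
        self.limit = limit
        self.window_s = window_s
        # (tenant_id, window index) -> request count.
        # Old windows are never evicted in this sketch.
        self._counts = defaultdict(int)

    def check(self, tenant_id: str, now_s: float):
        """Return (allowed, retry_after_seconds) for a request at time now_s."""
        window = int(now_s // self.window_s)
        key = (tenant_id, window)
        if self._counts[key] >= self.limit:
            # Seconds until the next window starts; maps to `retryAfter` in a 429 body.
            retry_after = math.ceil((window + 1) * self.window_s - now_s)
            return False, retry_after
        self._counts[key] += 1
        return True, 0
```

A distributed gateway would keep these counters in a shared store such as Redis rather than in process memory; that part is left out here.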
State Store & Cache:
- Track process execution state in Redis
- Store results temporarily
- Enable fast status queries
- Expire cached data after TTL
Server (Private):
- Consume incoming requests (via private link)
- Execute business logic with received data context
- Handle errors and automatic retries
- Update state in cache and persistent storage
- Persist business results and audit logs
Data Layer:
- MongoDB: Audit logs, request history, state snapshots
- Immutable event logs for compliance
- Backup and disaster recovery
StarGate Client (Built-in to Client):
- Manage OAuth credentials (per `processType`)
- Submit process requests on behalf of local instance
- Implement intelligent polling strategy
- Handle polling intervals and adaptive backoff
- Manage local offline queue
- Retry failed requests
- Deliver results to local process engine
StarGate is process-agnostic. The framework supports any business process that follows the request → processing → result pattern:
Example Processes:
├─ [Order Management]
│  ├─ Submit order
│  ├─ Track fulfillment
│  └─ Retrieve order status
│
├─ [Shipping Process]
│  ├─ Send items
│  ├─ Track shipping
│  └─ Retrieve delivery status
│
└─ Future Processes (Extensible Architecture)
   ├─ Inventory management
   ├─ Quality control
   ├─ Resource allocation
   └─ Custom business workflows
The architecture supports adding new processes:
- Define process contract (request schema, result schema)
- Implement handler in SERVER
- Enable for customer tenant (RBAC configuration)
- Validate with test data (per-process testing)
- Implement logic for Customer/Supplier (data preparation)
- Built-in client automatically picks up new process types
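The first step, defining a process contract, could look roughly like the following Python sketch. The `ProcessContract` type, the registry and the field lists are assumptions made for illustration; the field names are taken from the `order` example in the API specification:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessContract:
    """Hypothetical contract: which fields a request and a result must carry."""
    process_type: str
    required_request_fields: frozenset
    result_fields: frozenset


REGISTRY = {}  # process_type -> ProcessContract


def register(contract: ProcessContract) -> None:
    """Add a new process type; re-registering the same type is an error."""
    if contract.process_type in REGISTRY:
        raise ValueError(f"already registered: {contract.process_type}")
    REGISTRY[contract.process_type] = contract


def validate_request(process_type: str, data: dict) -> list:
    """Return a list of problems; an empty list means the payload is acceptable."""
    contract = REGISTRY.get(process_type)
    if contract is None:
        return [f"unknown processType: {process_type}"]
    missing = sorted(contract.required_request_fields - set(data))
    return [f"missing field: {name}" for name in missing]


register(ProcessContract(
    process_type="order",
    required_request_fields=frozenset({"orderId", "items"}),
    result_fields=frozenset({"orderId", "status", "trackingNumber", "estimatedDelivery"}),
))
```

With a registry of this kind, adding a process type is a data change plus a handler on the Server, which is what allows the built-in client to pick up new types without custom integration work.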
OAuth 2.0:
- Client credentials flow
- Customer identity isolation
- Token expiry (1 hour)
- Refresh tokens (30 days)
- Scope includes `processType` for granular access control
- Credentials securely stored in local Client system
RBAC (Role-Based Access Control):
- Per-client isolation (data separation)
- Per-process access control
- Granular operation permissions
- Audit trail of all access with request context
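A sketch of how the scope and isolation rules above might map to HTTP status codes, written in Python for brevity. It assumes the token signature has already been verified, and the claim names (`tenant`, `scope`, `exp`) and the scope format `processType:<name>` are assumptions, not part of this specification:

```python
def authorize(claims: dict, tenant_id: str, process_type: str, now_s: int) -> int:
    """Return 200, 401 or 403 for already-verified JWT claims."""
    # Expired or missing expiry: the caller must obtain a new token.
    if claims.get("exp", 0) <= now_s:
        return 401
    # Per-client isolation: a token may only act for its own tenant.
    if claims.get("tenant") != tenant_id:
        return 403
    # Per-process access control via space-separated scopes.
    scopes = set(claims.get("scope", "").split())
    if f"processType:{process_type}" not in scopes:
        return 403
    return 200
```

The 401 and 403 outcomes correspond to the `INVALID_TOKEN` and `INSUFFICIENT_PERMISSIONS` error bodies shown in the API specification.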
In Transit: TLS 1.2+ (HTTPS only)
At Rest: AES-256 encryption (MongoDB)
Audit: Complete request/response logging with immutable event logs
- GDPR: Data ownership, deletion rights, portability (TBD: MongoDB audit trail retention policy)
- SOC 2: Security, availability, processing integrity
- Data Residency: All data stored within compliance zones
Phase 1: Foundation
- API Gateway infrastructure
- OAuth 2.0 integration with `processType` scoping
- State store (Redis cache + MongoDB)
- First process type (e.g., orders)
- StarGate client with intelligent polling strategy (30s → 60s adaptive)
- Single pilot customer
Phase 2: Stabilization
- Production hardening
- Performance optimization with caching and polling interval tuning
- Resilience & error handling
- Expanded customer base (5-10)
- Polling metrics collection and analysis
Phase 3: Growth
- Additional process types
- Advanced features (analytics, reporting)
- Scaling to 50+ customers
- Operational excellence
- Optimized polling configurations per process type
| Layer | Technology | Rationale |
|---|---|---|
| API Implementation | .NET 8 Minimal APIs (C#) | Type-safe, high-performance, enterprise-grade |
| Caching | Redis | Fast state queries, distributed session management |
| Document Database | MongoDB | Flexible schema for audit logs, request history, audit compliance |
| Containerization | Docker | Consistent deployment environment |
| Infrastructure | Ubuntu VM (Azure) | Cost-effective, flexible scaling, proven stability |
✅ Process Automation: Manual workflows become automated and auditable
✅ Visibility: Real-time status of all processes (automatically tracked)
✅ Reliability: Persistent state ensures no request loss even on failures
✅ Scalability: Stateless design allows horizontal scaling at all layers
✅ Integration: Zero integration effort - StarGate is built-in for Server and Client
✅ Isolation: Central system remains private, never exposed to internet
✅ Flexibility: Extensible to support additional process types
✅ Compliance: Audit trail, data encryption, access control, data residency built-in
✅ Resilience: Automatic retry with exponential backoff, offline queue, error recovery
✅ Performance: Redis caching for sub-millisecond state queries, optimized payload handling
✅ Time-to-Enable: Reduce 3-6 month integration projects to weeks
✅ Cost Reduction: No custom integration development per customer
✅ Scalability: Enable 50+ customers without proportional resource increase
✅ Predictability: Standardized communication model across all customers
- Customers have stable internet connectivity (or local queue for offline scenarios)
- StarGate client automatically polls API at reasonable intervals
- Process completion times acceptable (synchronous: <5 seconds; asynchronous: <5 minutes)
- Credentials securely managed within Client instance provisioned at customer setup time
- No real-time push notifications (polling-based only)
- No bidirectional communication initiated by server
- Request/response size limits (standard HTTP limits apply)
- Rate limiting enforced (per-family burst capacity defined)
- API backward compatibility required (versioning strategy needed)
- Availability: 99.9% uptime target
- Latency: API response <500ms (p95), process completion within SLA
- Throughput: Support 10,000+ requests/day sustained
- Error Rate: <0.1% failed requests
- Data Consistency: Zero data loss during failures
- Number of customers with Horizon-Customer deployed
- Number of process types supported
- Volume of requests processed per family/project
- Time-to-enable new processes (target: <1 month)
- Customer satisfaction
POST /api/stargate/processes
Content-Type: application/json
Authorization: Bearer {token}
REQUEST BODY (Generated by StarGate Client):
{
"clientProcessId": "8f3c2a4e-7b91-4d8f-9e6a-2c4f8b1d7e53", // Process item identifier
"processType": "order", // Type of process
"data": { // Process-specific payload
"orderId": "ORD-12345",
"items": [
{
"sku": "SKU-001",
"quantity": 10
},
{
"sku": "SKU-002",
"quantity": 5
}
]
},
"idempotencyKey": "UUID-4-string" // Prevent duplicate submissions
}
RESPONSE (202 Accepted):
{
"processId": "PROC-20260114-00001",
"clientProcessId": "8f3c2a4e-7b91-4d8f-9e6a-2c4f8b1d7e53",
"processType": "order",
"status": "accepted",
"statusUrl": "/api/stargate/processes/PROC-20260114-00001",
"createdAt": "2026-01-14T12:48:00Z",
"estimatedCompletionTime": "2026-01-14T13:00:00Z"
}
400 Bad Request:
{
"error": "INVALID_REQUEST",
"message": "Missing required field: clientProcessId",
"timestamp": "2026-01-14T12:48:00Z"
}
401 Unauthorized:
{
"error": "INVALID_TOKEN",
"message": "Token expired or invalid",
"timestamp": "2026-01-14T12:48:00Z"
}
403 Forbidden:
{
"error": "INSUFFICIENT_PERMISSIONS",
"message": "ProcessType not authorized",
"timestamp": "2026-01-14T12:48:00Z"
}
429 Too Many Requests:
{
"error": "RATE_LIMIT_EXCEEDED",
"message": "Request rate limit exceeded (10 req/min)",
"retryAfter": 30,
"timestamp": "2026-01-14T12:48:00Z"
}
GET /api/stargate/processes/{processId}
Authorization: Bearer {token}
RESPONSE (200 OK):
{
"processId": "PROC-20260114-00001",
"clientProcessId": "8f3c2a4e-7b91-4d8f-9e6a-2c4f8b1d7e53",
"processType": "order",
"status": "processing",
"progress": 45,
"currentStep": "inventory_check",
"result": null,
"createdAt": "2026-01-14T12:48:00Z",
"updatedAt": "2026-01-14T12:50:15Z",
"estimatedCompletionTime": "2026-01-14T13:00:00Z"
}
RESPONSE (200 OK - Completed):
{
"processId": "PROC-20260114-00001",
"clientProcessId": "8f3c2a4e-7b91-4d8f-9e6a-2c4f8b1d7e53",
"processType": "order",
"status": "completed",
"progress": 100,
"result": {
"orderId": "ORD-12345",
"status": "confirmed",
"trackingNumber": "TRACK-123456",
"estimatedDelivery": "2026-01-17T00:00:00Z"
},
"createdAt": "2026-01-14T12:48:00Z",
"completedAt": "2026-01-14T13:05:00Z"
}
RESPONSE (200 OK - Failed):
{
"processId": "PROC-20260114-00001",
"clientProcessId": "8f3c2a4e-7b91-4d8f-9e6a-2c4f8b1d7e53",
"processType": "order",
"status": "failed",
"progress": 30,
"error": {
"code": "INVENTORY_INSUFFICIENT",
"message": "Insufficient stock for SKU-001",
"details": {
"sku": "SKU-001",
"requested": 10,
"available": 5
}
},
"createdAt": "2026-01-14T12:48:00Z",
"failedAt": "2026-01-14T12:52:00Z",
"retryable": true
}
Client Errors (4xx):
- `400 Bad Request`: Malformed request, missing fields, invalid data
- `401 Unauthorized`: Missing or invalid JWT token
- `403 Forbidden`: Insufficient permissions for processType
- `404 Not Found`: Process ID not found
- `429 Too Many Requests`: Rate limit exceeded
Server Errors (5xx):
- `500 Internal Server Error`: Unrecoverable processing error
- `502 Bad Gateway`: Downstream service unavailable
- `503 Service Unavailable`: Temporary overload or maintenance
For StarGate Client (Automatic):
- Polling Strategy:
- Initial: Immediate poll upon 202 acceptance
- Phase 1 (0-2 minutes): Every 30 seconds (aggressive)
- Phase 2 (2+ minutes): Every 60 seconds (conservative)
- Maximum Timeout: 10 minutes (default, configurable per process type)
- Polling Backoff on Failures:
- Transient Poll Failures (network timeout, 503): Exponential backoff
- Attempt 1: Retry after 5 seconds
- Attempt 2: Retry after 10 seconds
- Attempt 3: Retry after 20 seconds
- Attempt 4: Retry after 40 seconds
- Attempt 5: Retry after 80 seconds
- Max retries: 5 before alerting
- Idempotent Requests: Use `idempotencyKey` to prevent duplicate processing on retry
- Offline Queue: Local queue persists requests during connectivity loss
- Queue stored in local database (SQLite or SQL Server Compact)
- Automatic flush when connectivity restored
- Maintains request order
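The backoff schedule listed above (5, 10, 20, 40, 80 seconds, five retries) doubles a 5-second base delay. A minimal Python sketch follows; the function names are hypothetical, network timeouts are modelled as `TimeoutError`, and the `poll` and `sleep` callables are injected so the logic can be exercised without real waiting:

```python
BASE_DELAY_S = 5
MAX_RETRIES = 5
TRANSIENT_STATUSES = {503}  # network timeouts are modelled as TimeoutError


def backoff_delay(attempt: int) -> int:
    """Delay before retry number `attempt` (1-based): 5, 10, 20, 40, 80."""
    return BASE_DELAY_S * 2 ** (attempt - 1)


def poll_with_backoff(poll, sleep):
    """Call poll() until it returns a non-transient status or retries run out.

    poll:  callable returning an HTTP status code; may raise TimeoutError.
    sleep: callable taking a number of seconds.
    Returns (last status or None, retries used).
    """
    retries = 0
    while True:
        try:
            status = poll()
        except TimeoutError:
            status = None
        if status is not None and status not in TRANSIENT_STATUSES:
            return status, retries
        if retries == MAX_RETRIES:
            return status, retries  # exhausted: the caller should raise an alert
        retries += 1
        sleep(backoff_delay(retries))
```

In a real client, `sleep` would be `time.sleep` or an async equivalent. Adding random jitter to each delay is a common refinement that this document does not specify.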
For Internal Processing (Horizon Core):
- Automatic Retries: Transient failures (network, timeout) retried internally
- Circuit Breaker: Protect against cascading failures
- Fallback: Use cached state if database temporarily unavailable
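A circuit breaker in its simplest form counts consecutive failures and refuses calls for a cool-down period once a threshold is reached. The Python sketch below illustrates the idea only; the thresholds are placeholder values, and a production service would more likely rely on an established resilience library than hand-rolled code:

```python
class CircuitBreaker:
    """Minimal closed / open / half-open breaker with an injected clock."""

    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self, now_s: float) -> bool:
        """True if a call may be attempted at time now_s."""
        if self.opened_at is None:
            return True
        # After the cool-down, let a trial call through (half-open).
        return now_s - self.opened_at >= self.reset_after_s

    def record_success(self) -> None:
        """A successful call closes the circuit and clears the failure count."""
        self.failures = 0
        self.opened_at = None

    def record_failure(self, now_s: float) -> None:
        """Count a failure; open (or re-open) the circuit at the threshold."""
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = now_s
```

While the circuit is open, the caller can serve cached state, as described in the Fallback item above.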
State Consistency:
- All state changes persisted immediately to MongoDB
- Redis cache used for fast queries with MongoDB as source of truth
- No data loss on partial failures
- Polling reads always from authoritative state store
Graceful Degradation:
- API Gateway remains responsive even if Horizon Core temporarily unavailable
- Accept requests with 202 response, queue them for later processing
- StarGate client can queue requests locally if gateway unavailable
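The local queue used when the gateway is unreachable can be sketched with SQLite, one of the stores named earlier. This Python example is illustrative: the table layout and the `OfflineQueue` class are assumptions, and `send` stands in for the real HTTP submission:

```python
import json
import sqlite3


class OfflineQueue:
    """Persists pending requests and replays them in submission order."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pending ("
            "seq INTEGER PRIMARY KEY AUTOINCREMENT, "
            "idempotency_key TEXT UNIQUE NOT NULL, "
            "body TEXT NOT NULL)"
        )
        self.db.commit()

    def enqueue(self, idempotency_key: str, request: dict) -> None:
        # INSERT OR IGNORE keeps a single row per idempotency key.
        self.db.execute(
            "INSERT OR IGNORE INTO pending (idempotency_key, body) VALUES (?, ?)",
            (idempotency_key, json.dumps(request)),
        )
        self.db.commit()

    def flush(self, send) -> int:
        """Send oldest-first; stop at the first failure. Returns the number sent."""
        sent = 0
        rows = self.db.execute("SELECT seq, body FROM pending ORDER BY seq").fetchall()
        for seq, body in rows:
            if not send(json.loads(body)):
                break  # keep this and later requests queued, preserving order
            self.db.execute("DELETE FROM pending WHERE seq = ?", (seq,))
            self.db.commit()
            sent += 1
        return sent
```

Stopping at the first failed send keeps the original request order, and deleting a row only after a successful send means a crash mid-flush can at worst cause a resend, which the `idempotencyKey` is meant to make harmless.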
Failure Recovery:
- Automatic reprocessing of failed requests
- Manual retry capability via API (if process marked `retryable: true`)
- Complete audit trail for debugging failures
- Polling logs captured for root cause analysis of timeouts or failures
Distributed Tracing:
- Correlation IDs throughout request lifecycle
- Trace request from Horizon-Customer → API Gateway → Horizon → databases
- Performance bottleneck identification
- Polling event tracing (each poll logged with correlation ID)
Logging:
- Structured logs (JSON format) from all components
- Request/response payloads (sanitized for sensitive data)
- Error logs with stack traces
- Audit logs with user context (familyId, projectId, processData)
- StarGate client activity logs
- Polling initiation and completion
- Polling interval transitions (30s → 60s)
- Poll failures and backoff attempts
- Process result retrieval and consumption
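A structured log entry for one of the client events listed above might be produced as follows. This is a Python sketch; the field names (`component`, `event`, `correlationId`) and the list of sensitive keys are assumptions, not a defined log schema:

```python
import json
from datetime import datetime, timezone

SENSITIVE_KEYS = {"authorization", "token", "password"}  # assumed list


def sanitize(fields: dict) -> dict:
    """Mask values whose key names suggest secrets."""
    return {k: ("***" if k.lower() in SENSITIVE_KEYS else v) for k, v in fields.items()}


def log_event(component: str, event: str, correlation_id: str, **fields) -> str:
    """Build one JSON log line carrying the correlation ID."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "component": component,
        "event": event,
        "correlationId": correlation_id,
    }
    record.update(sanitize(fields))
    return json.dumps(record, sort_keys=True)
```

For example, `log_event("stargate-client", "poll_interval_changed", "corr-123", fromSeconds=30, toSeconds=60)` returns a single JSON line that can be joined with gateway and Server logs on `correlationId`.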
Alerting:
- High error rate (>1% requests failing)
- P95 latency exceeding 2 seconds
- Queue backlog exceeding 1000 items
- Database connection exhaustion
- Cache eviction rate spikes
- Repeated polling failures per customer (>3 consecutive failures per process)
- Polling timeout rate exceeding 0.5% (processes not completing within 10 minutes)
- Polling interval anomalies (stuck in Phase 1 beyond expected completion times)
Liveness Probe: API responds to GET /health/live
Readiness Probe: GET /health/ready returns success only if all dependencies healthy
Dependencies: Database connectivity, Redis connectivity, downstream service availability
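The difference between the two probes can be sketched as follows, in Python for illustration. The dependency names are placeholders, and returning 503 when a dependency is unhealthy is an assumed convention; the text above only requires that readiness succeeds when all dependencies are healthy:

```python
def liveness() -> int:
    """The process is up and able to answer; no dependencies are checked."""
    return 200


def readiness(checks: dict):
    """Run every dependency check; return (status, per-dependency results).

    checks maps a dependency name to a zero-argument callable that returns
    True when healthy. A check that raises is treated as unhealthy.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    status = 200 if all(results.values()) else 503
    return status, results
```

Keeping liveness free of dependency checks avoids restarting a healthy process just because MongoDB or Redis is briefly unavailable; readiness alone takes the instance out of rotation.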