Merged
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,9 @@
# Release Notes

## [2.1.1] - 2026-02-25
- **Sandbox**: Increase agent sandbox execution timeout to 240 seconds
- **Integration**: Vericore API integration

## [2.1.0] - 2026-02-09
- **Bittensor Upgrade**: Upgraded to Bittensor version 10.1.0
- **Bittensor CLI Upgrade**: Upgraded to Bittensor CLI version 9.18.0
10 changes: 5 additions & 5 deletions README.md
@@ -1,18 +1,18 @@
<div align="center">

# **Numinous**



[Discord](https://discord.gg/qKPeYPc3) • [Dashboard](https://app.hex.tech/1644b22a-abe5-4113-9d5f-3ad05e4a8de7/app/Numinous-031erYRYSssIrH3W3KcyHg/latest) • [Website](https://numinouslabs.io/) • [Twitter](https://x.com/numinous_ai) •
[Network](https://taostats.io/subnets/6/chart)
---

</div>

## Introduction

Numinous (Subnet 6) is a **forecasting protocol** whose goal is to aggregate agents into **superhuman LLM forecasters**. The key principle is that instead of scoring predictions ($f(X)$) the subnet scores the underlying agentic models ($X$).


Miners send forecasting agents which are subsequently evaluated by validators in sandboxes with access to a curated set of tools and data. **Agent execution and code are entirely visible to the subnet protocol.**
@@ -37,7 +37,7 @@ Validators spin up parallel sandboxes where miners are evaluated on batches of e
### Key Components

* **The Sandbox:** Isolated execution environment with strict resource limits.
* **The Gateway:** A signing proxy allowing agents to access **Chutes (SN64)** for compute, **Desearch (SN22)** for live data, **OpenAI** for GPT-5 models, and **Vericore** for statement verification without exposing validator keys.
* **Forecasting logic:** Agents execute once per event; only agents that were registered prior to broadcasting execute.

📖 **[Read the full system architecture](docs/architecture.md)**
@@ -50,7 +50,7 @@ To survive in the Numinous arena, agents must adhere to strict constraints. Viol

### Execution Rules

1. **Timeout:** Execution must complete within **240 seconds**.
2. **Cost:** API usage limits depend on each service and are paid by the miner.
3. **Caching:** Do not use dynamic timestamps or random seeds in prompts. This would break our caching system, making agent executions differ between validators.
4. **Activation:** Code submitted before **00:00 UTC** activates the following day. You can update your code at most once every 3 days.
Expand Down
2 changes: 1 addition & 1 deletion docker-compose.prd.yaml
@@ -31,7 +31,7 @@ services:

# Production configuration
command: >
bash -c "python neurons/validator.py --netuid 6 --subtensor.network finney --wallet.name ifkey --wallet.hotkey ifhkey --db.directory /root/infinite_games/database --numinous.env prod --sandbox.max_concurrent 50 --sandbox.timeout_seconds 240 --validator.sync_hour 0 --logging.debug"

logging:
driver: "json-file"
2 changes: 1 addition & 1 deletion docker-compose.validator.yaml
@@ -30,7 +30,7 @@ services:
- HOST_WALLET_PATH=${HOST_WALLET_PATH:-${HOME}/.bittensor/wallets}

command: >
bash -c "python neurons/validator.py --netuid 6 --subtensor.network finney --wallet.name ${WALLET_NAME} --wallet.hotkey ${WALLET_HOTKEY} --db.directory /root/infinite_games/database --numinous.env prod --sandbox.max_concurrent 50 --sandbox.timeout_seconds 240 --logging.debug"

logging:
driver: "json-file"
24 changes: 12 additions & 12 deletions docs/architecture.md
@@ -59,7 +59,7 @@ Validators continuously:
- Fetch new prediction events
- Download and execute miner agent code in sandboxes
- Calculate average Brier scores upon event resolution
- Update subnet weights on the Bittensor chain

**Process Flow:**
@@ -76,7 +76,7 @@ The validators spin up 50 parallel sandboxes where 50 miners are evaluated on th

Agents run in isolated Docker containers with:
- No internet access
- 240s execution timeout
- Limited CPU/memory
- Access to a defined set of external APIs via a signing proxy
- Cost limits that depend on each service (paid by miner)
@@ -161,25 +161,25 @@ For a binary event $E_q$, an agent $i$ sends a prediction $p_i$ for the probabil
- $o_q = 0$ otherwise.

The Brier score $S(p_i, o_q)$ for the prediction is given by:
- **If $o_q = 1$:**

$$S(p_i, 1) = (1 - p_i)^2$$

- **If $o_q = 0$:**

$$S(p_i, 0) = p_i^2.$$

The lower the score the better. This strictly proper scoring rule incentivizes miners to report their true beliefs.

## Scoring Process

1. A batch of binary events resolves
2. We calculate the Brier score for each miner's prediction
3. We average the Brier scores across all the events in the batch
4. Winner-take-all: the miner with the lowest Brier score on one batch gets all the rewards
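The four steps above can be sketched as follows (the function and the miner ids are illustrative, not the subnet's actual API):

```python
def score_batch(predictions: dict[str, list[float]], outcomes: list[int]) -> str:
    """Average each miner's Brier scores over a resolved batch and
    return the winner: the miner with the lowest average score."""
    avg_scores = {
        miner: sum((p - o) ** 2 for p, o in zip(preds, outcomes)) / len(outcomes)
        for miner, preds in predictions.items()
    }
    return min(avg_scores, key=avg_scores.get)
```

For instance, on outcomes `[1, 0]`, a miner predicting `[0.9, 0.2]` averages 0.025 and beats one predicting `[0.6, 0.4]`, who averages 0.16.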

**Window based Scoring** Event batches span 3 days and are generated daily, each containing approximately 100 events. The score of a miner at any given time is a function of the latest event batch that resolved. The immunity period lasts 7 days, so a newly registered miner is scored only once within the immunity period.

**Spot scoring** We only consider one prediction per miner. In the future, as network capacity improves, we might move to a scoring scheme that weights multiple predictions per miner. **Currently, only agents which were activated prior to a given event being broadcast will forecast this event.** This means that, for a given event, all the miners who forecasted it did so at roughly the same time.

---

@@ -210,7 +210,7 @@ def agent_main(event_data: dict) -> dict:
## Constraints

- Max code size: 2MB
- Execution timeout: 240s
- No direct internet access (must use gateway for external APIs)
- Available libraries: see sandbox requirements

130 changes: 126 additions & 4 deletions docs/gateway-guide.md
@@ -9,10 +9,11 @@ The Gateway API provides miner agents with access to external services during sa
- **Desearch AI**: Web search, social media search, and content crawling
- **OpenAI**: GPT-5 series models with built-in web search
- **Perplexity**: Reasoning LLMs with built-in web search
- **Vericore**: Statement verification with evidence-based metrics

All requests are cached to optimize performance and reduce costs.

**Cost Limits:** $0.01 (default) or $0.10 (linked account) per sandbox run for Chutes and Desearch. OpenAI: $1.00 per run (requires linked account, no free tier). Perplexity: $0.10 per run (requires linked account, no free tier). Vericore: $0.10 per run (requires linked account, no free tier).

**Security:** API keys are securely stored using external secret management and never exposed to validators.

@@ -1052,6 +1053,127 @@ print(f"Sources: {citations}")

---

## Vericore Endpoints

Vericore provides statement verification with evidence-based metrics including sentiment, conviction, source credibility, and more.

### POST /api/gateway/vericore/calculate-rating

Verify a statement against web evidence and get detailed metrics.

**URL:** `{SANDBOX_PROXY_URL}/api/gateway/vericore/calculate-rating`

**Request Body:**
```json
{
"run_id": "550e8400-e29b-41d4-a716-446655440000",
"statement": "Bitcoin will reach $100k by end of 2026",
"generate_preview": false
}
```

**Parameters:**

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `run_id` | string (UUID) | Yes | - | Execution tracking ID from environment |
| `statement` | string | Yes | - | Statement to verify against web evidence |
| `generate_preview` | boolean | No | false | Generate a preview URL for the results |

**Response:**
```json
{
"batch_id": "mlzjxglo15m23k",
"request_id": "req-mlzjxgmc4amr6",
"preview_url": "",
"evidence_summary": {
"total_count": 12,
"neutral": 37.5,
"entailment": 1.03,
"contradiction": 61.46,
"sentiment": -0.07,
"conviction": 0.82,
"source_credibility": 0.93,
"narrative_momentum": 0.48,
"risk_reward_sentiment": -0.15,
"political_leaning": 0.0,
"catalyst_detection": 0.12,
"statements": [
{
"statement": "Evidence text from source...",
"url": "https://example.com/article",
"contradiction": 0.87,
"neutral": 0.12,
"entailment": 0.01,
"sentiment": -0.5,
"conviction": 0.75,
"source_credibility": 0.85,
"narrative_momentum": 0.5,
"risk_reward_sentiment": -0.5,
"political_leaning": 0.0,
"catalyst_detection": 0.3
}
]
},
"cost": 0.05
}
```

**Response Fields:**

| Field | Type | Description |
|-------|------|-------------|
| `batch_id` | string | Batch identifier |
| `request_id` | string | Request identifier |
| `preview_url` | string | Preview URL (empty if `generate_preview` is false) |
| `evidence_summary.total_count` | integer | Number of evidence sources found |
| `evidence_summary.entailment` | float | Aggregated entailment score |
| `evidence_summary.contradiction` | float | Aggregated contradiction score |
| `evidence_summary.sentiment` | float | Aggregated sentiment (-1.0 to 1.0) |
| `evidence_summary.conviction` | float | Aggregated conviction level |
| `evidence_summary.source_credibility` | float | Average source credibility |
| `evidence_summary.statements` | array | Individual evidence sources with per-source metrics |

**Example (using httpx):**
```python
import os
import httpx

PROXY_URL = os.getenv("SANDBOX_PROXY_URL")
RUN_ID = os.getenv("RUN_ID")

response = httpx.post(
f"{PROXY_URL}/api/gateway/vericore/calculate-rating",
json={
"run_id": RUN_ID,
"statement": "Bitcoin will reach $100k by end of 2026",
},
timeout=120.0,
)
response.raise_for_status()  # surface HTTP errors before parsing

result = response.json()

summary = result["evidence_summary"]
total = summary["total_count"]
contradiction = summary["contradiction"]
sentiment = summary["sentiment"]
conviction = summary["conviction"]
credibility = summary["source_credibility"]
```

**Error Handling:**

| Status Code | Description | Recommended Action |
|-------------|-------------|-------------------|
| 503 | Service Unavailable | Retry with exponential backoff |
| 429 | Rate limit exceeded | Retry with exponential backoff |
| 401 | Authentication failed | Contact validator |
| 500 | Internal server error | Retry with fallback |

**Note:** Vericore has no free tier. You must link your API key to use Vericore. Each call costs $0.05.
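The retry guidance in the table above can be sketched as a transport-agnostic wrapper. `RetryableError` and `with_backoff` are illustrative names, not part of the gateway API; in practice an agent would raise `RetryableError` when the proxy returns 429 or 503:

```python
import time

class RetryableError(Exception):
    """Stand-in for a 429/503 response from the gateway."""

def with_backoff(call, max_retries: int = 3, base_delay: float = 1.0):
    """Run call(); on RetryableError, sleep exponentially longer and retry."""
    for attempt in range(max_retries):
        try:
            return call()
        except RetryableError:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the failure
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

Keep the total backoff budget small: retries still count against the 240-second sandbox timeout.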

---

## Caching

The gateway implements request-level caching to increase consensus stability among validators, optimize performance, and reduce API costs.
@@ -1066,7 +1188,7 @@
- The `run_id` field is excluded from cache key calculation
- This means identical requests from different executions hit the same cache

This is crucial for increasing consensus stability per validator, given the variance of LLM outputs when the same prompt is sent twice.

**Prompt rules.** Use consistent prompts across executions to ensure the cache is hit. In practice, **DO NOT** include dynamic timestamps or random data in prompts.
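To see why dynamic values break caching: the cache key is derived from the request payload with `run_id` excluded, per the rules above. A sketch of that idea, where the hashing scheme is illustrative and not the gateway's actual implementation:

```python
import hashlib
import json

def cache_key(payload: dict) -> str:
    """Hash the request payload, ignoring run_id, so identical requests
    from different executions map to the same cache entry."""
    relevant = {k: v for k, v in payload.items() if k != "run_id"}
    canonical = json.dumps(relevant, sort_keys=True)  # deterministic serialization
    return hashlib.sha256(canonical.encode()).hexdigest()
```

Two runs that differ only in `run_id` share a key and hit the same cache entry; embedding a timestamp in the prompt changes the key and forces a fresh (and differently-behaving) LLM call.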

@@ -1165,14 +1287,14 @@ def query_llm_with_retry(prompt: str, max_retries: int = 3) -> Optional[str]:

### Timeout Management

Plan your execution time to stay within the 240-second sandbox limit:

```python
import time

start_time = time.time()
timeout_buffer = 10 # seconds
max_time = 240 - timeout_buffer  # 230s: the 240s limit minus the safety buffer

def time_remaining():
    elapsed = time.time() - start_time
    return max_time - elapsed
```