2 changes: 1 addition & 1 deletion .gitignore
@@ -72,4 +72,4 @@ tmp/
temp/

# Documentation build
site/
site/src/swing_agent.egg-info/
77 changes: 77 additions & 0 deletions docs/adr/001-centralized-database.md
@@ -0,0 +1,77 @@
# ADR-001: Centralized Database Architecture

## Status

Accepted

## Context

The SwingAgent system generates multiple types of data:
- Trading signals with extensive metadata
- Feature vectors for pattern matching
- Evaluation results and outcomes
- Configuration and enrichment data

Previously, this data was scattered across multiple SQLite files, making:
- Data consistency challenging
- Cross-dataset queries complex
- Deployment and backup procedures error-prone
- Development setup cumbersome

## Decision

We will adopt a centralized database architecture where:

1. **Single Database Instance**: All data (signals, vectors, evaluations) stored in one database
2. **SQLAlchemy ORM**: Use SQLAlchemy for all database operations with proper schema management
3. **Multiple Backend Support**: Support SQLite (development), PostgreSQL, and MySQL via connection strings
4. **Backward Compatibility**: Automatically migrate from old multi-file approach
5. **Centralized Configuration**: Single database URL configuration via environment variables

## Implementation Details

```python
# Centralized database configuration
database_url = get_database_config().database_url  # from environment or default

# All operations use the same session factory
with get_session() as session:
    # signals, vectors, and evaluations all share one transaction
    session.add(signal_row)
    session.commit()
```

**Schema Organization**:
- `signals` table: Complete trading signals with all metadata
- `vec_store` table: Feature vectors with outcomes for ML
- Foreign key relationships where appropriate
- JSON columns for flexible metadata storage
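The schema organization above can be sketched with SQLAlchemy's declarative ORM. This is a minimal illustration, not the project's actual models: table names follow the ADR, but the column set and types are assumptions.

```python
# Minimal sketch of the two core tables; columns beyond those named in the
# ADR (e.g. "meta", "signal_id") are illustrative assumptions.
from sqlalchemy import Column, Float, JSON, String, Text, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Signal(Base):
    __tablename__ = "signals"
    id = Column(String, primary_key=True)
    symbol = Column(String, nullable=False)
    meta = Column(JSON)  # flexible metadata storage

class VecRecord(Base):
    __tablename__ = "vec_store"
    id = Column(String, primary_key=True)
    signal_id = Column(String)  # would be a ForeignKey where appropriate
    vec_json = Column(Text, nullable=False)
    realized_r = Column(Float)
    payload = Column(JSON)

# SQLite for development; swap the URL for PostgreSQL/MySQL in production
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
```

Because all tables live behind one engine, swapping backends is a one-line connection-string change.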

## Consequences

### Positive

- **Simplified Operations**: Single database to backup, migrate, and monitor
- **ACID Compliance**: All related data changes in same transaction
- **Better Performance**: Joins and complex queries possible across all data
- **Production Ready**: Easy to deploy with external databases (PostgreSQL/MySQL)
- **Development Velocity**: Simplified setup with single database file

### Negative

- **Migration Complexity**: Existing installations need data migration
- **Single Point of Failure**: Database issues affect entire system
- **Lock Contention**: High concurrency may require connection pooling

## Migration Strategy

```python
# Automatic migration in _ensure_db() functions
if old_sqlite_files_detected:
migrate_to_centralized_database()
```

## Monitoring

- Database connection health checks
- Query performance monitoring
- Storage space utilization alerts
- Backup verification procedures
103 changes: 103 additions & 0 deletions docs/adr/002-vector-store-design.md
@@ -0,0 +1,103 @@
# ADR-002: Vector Store Design for Pattern Matching

## Status

Accepted

## Context

SwingAgent uses machine learning for pattern matching by:
- Encoding market setups as feature vectors
- Finding similar historical patterns via cosine similarity
- Predicting outcomes based on historical performance

Key requirements:
- Fast similarity search over thousands of vectors
- Flexible metadata storage for filtering
- Outcome tracking for backtesting validation
- Cross-symbol pattern matching capabilities

## Decision

We will implement a custom vector store with:

1. **Compact Feature Vectors**: 16-dimensional normalized vectors for core market features
2. **Rich Metadata Payloads**: JSON storage for additional context (vol regime, MTF alignment, etc.)
3. **Cosine Similarity Search**: L2-normalized vectors for angle-based similarity
4. **Outcome Integration**: Direct linkage to realized R-multiples and exit reasons
5. **Context Filtering**: Ability to filter by market conditions (volatility, etc.)

## Vector Composition

```python
# 16-dimensional feature vector
[
ema_slope, # Momentum indicator
rsi_normalized, # Overbought/oversold
atr_pct, # Volatility measure
price_above_ema, # Trend position
prev_range_pct, # Previous bar range
gap_pct, # Gap size
fib_position, # Fibonacci position
in_golden_pocket, # Golden pocket flag
r_multiple/5.0, # Risk/reward normalized
trend_up, # Uptrend flag
trend_down, # Downtrend flag
trend_sideways, # Sideways flag
session_open_close, # Session encoding
session_mid_close, # Session encoding
llm_confidence, # LLM confidence
constant_term # Bias term
]
```
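Assembling this layout can be sketched as follows. The fixed feature order and the L2 normalization match the design above; the `build_vector` helper and its input dict are illustrative, not the system's real API.

```python
# Hypothetical helper: pack named features into the fixed 16-dim layout,
# then L2-normalize so cosine similarity reduces to a dot product.
import numpy as np

FEATURE_ORDER = [
    "ema_slope", "rsi_normalized", "atr_pct", "price_above_ema",
    "prev_range_pct", "gap_pct", "fib_position", "in_golden_pocket",
    "r_over_5", "trend_up", "trend_down", "trend_sideways",
    "session_open_close", "session_mid_close", "llm_confidence",
    "constant_term",
]

def build_vector(feats: dict) -> np.ndarray:
    v = np.array([feats[k] for k in FEATURE_ORDER], dtype=np.float32)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```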

## Storage Schema

```sql
CREATE TABLE vec_store (
id TEXT PRIMARY KEY, -- Unique vector ID
ts_utc TEXT NOT NULL, -- Timestamp
symbol TEXT NOT NULL, -- Trading symbol
timeframe TEXT NOT NULL, -- Timeframe
vec_json TEXT NOT NULL, -- Vector as JSON array
realized_r REAL, -- Actual outcome
exit_reason TEXT, -- Exit type
payload JSON -- Additional metadata
);
```
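Persisting one row of this schema looks roughly like the following. This uses the stdlib `sqlite3` driver purely for illustration (the production path goes through SQLAlchemy), and the row values are made-up sample data.

```python
# Illustration only: create the vec_store table and insert one sample row.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE vec_store (
        id TEXT PRIMARY KEY, ts_utc TEXT NOT NULL, symbol TEXT NOT NULL,
        timeframe TEXT NOT NULL, vec_json TEXT NOT NULL,
        realized_r REAL, exit_reason TEXT, payload JSON
    )
""")
row = ("v1", "2024-01-02T15:30:00Z", "SPY", "30m",
       json.dumps([0.1] * 16),           # vector stored as JSON array
       1.8, "target",                    # realized outcome
       json.dumps({"vol_regime": "low"}))
conn.execute("INSERT INTO vec_store VALUES (?, ?, ?, ?, ?, ?, ?, ?)", row)
conn.commit()
```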

## Similarity Algorithm

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity; a plain dot product once vectors are L2-normalized."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
```
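The full retrieval step is then an O(n) scan ranking every stored vector against the query. A sketch, assuming vectors have already been loaded from `vec_store` as `(id, vector)` pairs (the loading step is omitted here):

```python
# Linear top-k similarity search over in-memory (id, vector) pairs.
import numpy as np

def top_k(query: np.ndarray, store: list, k: int = 5) -> list:
    def cos(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    scored = [(vid, cos(query, vec)) for vid, vec in store]
    # Highest similarity first; truncate to the k nearest neighbors
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]
```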

## Consequences

### Positive

- **Efficient Search**: Cosine similarity scales well to thousands of vectors
- **Flexible Metadata**: JSON payloads allow rich contextual filtering
- **Cross-Symbol Learning**: Patterns work across different symbols
- **Outcome Integration**: Direct backtesting validation of predictions
- **Compact Storage**: 16 dimensions keep memory usage reasonable

### Negative

- **Vector Dimensionality**: Fixed at 16 dimensions, adding features requires migration
- **No Indexing**: Linear search may become slow with 100k+ vectors
- **Feature Engineering**: Careful normalization required for meaningful similarity

## Performance Considerations

- **Memory Usage**: ~64 bytes per vector (16 * 4 bytes float)
- **Search Time**: O(n) linear search, acceptable for <50k vectors
- **Storage**: JSON encoding adds ~2x overhead vs binary

## Future Enhancements

- Vector indexing (FAISS, Annoy) for sub-linear search times
- Dynamic vector dimensions via versioning
- Distributed vector storage for massive datasets
125 changes: 125 additions & 0 deletions docs/adr/003-llm-integration-strategy.md
@@ -0,0 +1,125 @@
# ADR-003: LLM Integration Strategy

## Status

Accepted

## Context

SwingAgent integrates Large Language Models for:
- Market condition assessment and trend validation
- Entry bias determination and confidence scoring
- Action plan generation with risk scenarios
- Qualitative analysis to complement quantitative indicators

Key challenges:
- Cost control with commercial APIs
- Reliability and error handling
- Structured output validation
- Domain-specific prompt engineering

## Decision

We will use a structured LLM integration approach:

1. **OpenAI Models Only**: Focus on GPT-4 family for consistency
2. **Pydantic Validation**: All LLM outputs validated against structured schemas
3. **Dual-Purpose Usage**:
- **Voting**: Quick market assessment with confidence scoring
- **Planning**: Detailed execution plans with scenario analysis
4. **Graceful Degradation**: System continues functioning if LLM unavailable
5. **Cost Optimization**: Default to cheaper models (gpt-4o-mini) with configuration override

## Integration Architecture

```python
from typing import List, Literal

from pydantic import BaseModel, Field

# Structured output models
class LlmVote(BaseModel):
    trend_label: Literal["strong_up", "up", "sideways", "down", "strong_down"]
    entry_bias: Literal["long", "short", "none"]
    confidence: float = Field(ge=0, le=1)
    rationale: str

class LlmActionPlan(BaseModel):
    action_plan: str
    risk_notes: str
    scenarios: List[str]
    tone: Literal["conservative", "balanced", "aggressive"]
```
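The validation step works roughly as below, assuming pydantic v2 (`model_validate_json`); the model is restated so the snippet stands alone, and the sample JSON reply is made up. A malformed or out-of-range reply raises `ValidationError` instead of silently propagating bad fields.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError

class LlmVote(BaseModel):
    trend_label: Literal["strong_up", "up", "sideways", "down", "strong_down"]
    entry_bias: Literal["long", "short", "none"]
    confidence: float = Field(ge=0, le=1)
    rationale: str

# A well-formed (illustrative) LLM reply parses cleanly
raw = ('{"trend_label": "up", "entry_bias": "long", '
       '"confidence": 0.7, "rationale": "higher lows into support"}')
vote = LlmVote.model_validate_json(raw)

# Out-of-range confidence (> 1) is rejected by the Field constraint
try:
    LlmVote.model_validate_json(raw.replace("0.7", "1.7"))
    out_of_range_rejected = False
except ValidationError:
    out_of_range_rejected = True
```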

## Prompt Engineering Strategy

### Market Analysis Prompts
- **System Role**: "Disciplined 1–2 day swing-trading co-pilot"
- **Constraints**: Return only structured JSON, no price invention
- **Context**: Provide technical indicators, market features, ML priors

### Action Plan Prompts
- **System Role**: "Execution coach for swing trades"
- **Output Style**: Checklist-based plans with invalidation levels
- **Risk Focus**: Include 2-4 scenarios with specific risk considerations

## Error Handling

```python
import logging
from typing import Optional

import openai

def safe_llm_prediction(**features) -> Optional[LlmVote]:
    """Call the LLM, degrading gracefully when the API is unavailable."""
    try:
        return llm_extra_prediction(**features)
    except openai.RateLimitError:
        logging.warning("LLM rate limit hit")
        return None  # graceful degradation
    except Exception as e:
        logging.error(f"LLM error: {e}")
        return None
```

## Configuration

```bash
# Environment variables
OPENAI_API_KEY=sk-...
SWING_LLM_MODEL=gpt-4o-mini # Cost-effective default
```
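Reading this configuration reduces to an environment lookup with a fallback; a minimal sketch (the `llm_model` helper name is an assumption, not the project's actual accessor):

```python
# SWING_LLM_MODEL overrides the cost-effective default when set.
import os

def llm_model() -> str:
    return os.getenv("SWING_LLM_MODEL", "gpt-4o-mini")

os.environ.pop("SWING_LLM_MODEL", None)
default_model = llm_model()              # falls back to the cheap default
os.environ["SWING_LLM_MODEL"] = "gpt-4o"
override_model = llm_model()             # explicit override wins
```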

## Consequences

### Positive

- **Structured Reliability**: Pydantic validation ensures consistent output format
- **Cost Control**: Cheaper models by default with selective upgrade capability
- **Domain Expertise**: Specialized prompts for trading context
- **Graceful Degradation**: System works without LLM for pure technical analysis
- **Rich Context**: LLM can synthesize complex market conditions humans might miss

### Negative

- **External Dependency**: Reliance on OpenAI API availability and pricing
- **Latency**: API calls add 1-3 seconds to signal generation
- **Cost Scaling**: Costs increase linearly with signal volume
- **Prompt Drift**: Model updates may change behavior over time

## Usage Guidelines

### When to Use LLM Voting
- Market conditions are ambiguous (sideways trends, mixed signals)
- Technical indicators give conflicting signals
- High-conviction setups need validation

### When to Use Action Plans
- Live trading execution
- Complex multi-scenario setups
- Client reporting and documentation

## Monitoring

- API response times and error rates
- Token usage and cost tracking
- Output quality validation against backtests
- Prompt effectiveness measurement

## Future Enhancements

- Local model deployment for cost reduction
- Multi-model ensemble voting
- Fine-tuning on historical SwingAgent data
- Real-time market news integration