27 changes: 27 additions & 0 deletions data/.gitignore
@@ -0,0 +1,27 @@
# Interim development files
backup_interim_files/
*_test.csv
*_test.json
*_temp.csv
*_temp.json
tick_database*.csv
tick_database*.json
vwap_*.csv
vwap_*.json
!production_vwap_final.csv
!production_vwap_final.json
large_vwap_*.csv
large_vwap_*.json
realistic_flow*.csv
realistic_flow*.json
sor_execution*.json
*_v2.csv
*_v2.json
*_fixed.py
!generate_production_vwap_fixed.py

# Production files that must always be tracked
!generate_production_data.py
!instruments.csv
!instruments.json
!TRADING_SIMULATION_ARCHITECTURE.md
101 changes: 101 additions & 0 deletions data/README_FINANCIAL_DATA.md
@@ -0,0 +1,101 @@
# Financial Test Data

This directory contains realistic financial test data for SQL-CLI development and testing.

## VWAP Execution Example

**Files:**
- `vwap_example_orders.json` - Order hierarchy showing parent and child orders
- `vwap_example_fills.json` - Execution fills with venue details

**Description:**
A complete VWAP (Volume Weighted Average Price) algo execution for:
- Client: Blackrock Asset Management
- Ticker: ASML.AS (ASML Holding - Large Cap EU Equity)
- Quantity: 100,000 shares
- Side: Buy

The data shows:
1. Parent order from client
2. Child orders generated by VWAP algo throughout the day
3. SOR (Smart Order Router) child orders split across venues
4. Fills from multiple venues (NYSE, NASDAQ, BATS, ARCA, Dark Pools, IEX)

**Key fields for analysis:**
- `parent_order_id` - Links child orders to parent
- `order_state` - Order lifecycle (Pending → Accepted → Working → Filled)
- `venue` - Execution venue
- `filled_quantity` / `remaining_quantity` - Track execution progress
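The `parent_order_id` link makes it easy to roll child fills up to their parent. A minimal sketch, assuming each order record is a dict; only `parent_order_id` and `filled_quantity` come from the field list above, while `order_id`, `quantity` and the sample records are hypothetical:

```python
from collections import defaultdict

def rollup_children(orders: list[dict]) -> dict[str, int]:
    """Sum filled_quantity of child orders per parent_order_id."""
    filled_by_parent: dict[str, int] = defaultdict(int)
    for order in orders:
        parent = order.get("parent_order_id")
        if parent:  # top-level orders have no parent
            filled_by_parent[parent] += order["filled_quantity"]
    return dict(filled_by_parent)

# Hypothetical sample: one parent with two child orders
orders = [
    {"order_id": "P1", "parent_order_id": None, "quantity": 1000, "filled_quantity": 600},
    {"order_id": "C1", "parent_order_id": "P1", "quantity": 500, "filled_quantity": 500},
    {"order_id": "C2", "parent_order_id": "P1", "quantity": 500, "filled_quantity": 100},
]
print(rollup_children(orders))  # {'P1': 600}
```

Comparing each rolled-up total with the parent's own `filled_quantity` is a quick consistency check on the generated data.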

## Instrument Reference Data

**Files:**
- `instruments.json` - 200 financial instruments (JSON format)
- `instruments.csv` - Same data in CSV format

**Asset Classes:**
- Equities (stocks, ETFs, ADRs)
- Fixed Income (government & corporate bonds)
- Derivatives (options, futures, swaps)
- Commodities (energy, metals, agriculture)
- FX (spot, forwards, NDFs)

**Key fields:**
- Identifiers: ISIN, CUSIP, SEDOL, Bloomberg ticker
- Trading info: exchange, currency, tick_size, lot_size
- Market data: last_price, bid/ask, volume
- Risk metrics: VaR, duration, Greeks (for options)

## Sample Queries

```sql
-- Find all child orders for a parent
SELECT * FROM vwap_example_orders
WHERE parent_order_id = 'ORD_1754985600_3449'

-- Analyze fills by venue
SELECT venue, COUNT(*) as fills, SUM(quantity) as total_qty
FROM vwap_example_fills
GROUP BY venue

-- Find high-value equity instruments
SELECT name, last_price, market_cap
FROM instruments
WHERE asset_class = 'Equity' AND market_cap > 100000000000

-- Get all active derivatives
SELECT * FROM instruments
WHERE asset_class = 'Derivative' AND status = 'Active'
```
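To cross-check the `GROUP BY venue` query outside SQL-CLI, the same aggregation can be sketched in plain Python. This assumes the fills file is a JSON array of objects carrying the `venue` and `quantity` fields the query uses; the function name is hypothetical:

```python
from collections import defaultdict

def fills_by_venue(fills: list[dict]) -> dict[str, tuple[int, int]]:
    """Return {venue: (fill_count, total_quantity)}, like the GROUP BY venue query."""
    stats: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for fill in fills:
        entry = stats[fill["venue"]]
        entry[0] += 1                 # COUNT(*)
        entry[1] += fill["quantity"]  # SUM(quantity)
    return {venue: (count, qty) for venue, (count, qty) in stats.items()}
```

Usage, under the same file-layout assumption: `fills_by_venue(json.load(open("vwap_example_fills.json")))`.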

## File Sizes

All files are kept small for version control:
- vwap_example_orders.json: ~125KB
- vwap_example_fills.json: ~118KB
- instruments.json: ~239KB
- instruments.csv: ~62KB

## Generation

To regenerate this data:
```bash
cd data  # from the sql-cli repository root

# VWAP execution
python3 ../scripts/generate_financial_data.py \
    --mode vwap \
    --ticker ASML.AS \
    --quantity 100000 \
    --side Buy \
    --client "Blackrock Asset Management" \
    --format json \
    --output vwap_example

# Instruments
python3 ../scripts/generate_financial_data.py \
    --mode instruments \
    --count 200 \
    --format json \
    --output instruments.json
```
244 changes: 244 additions & 0 deletions data/TRADING_SIMULATION_ARCHITECTURE.md
@@ -0,0 +1,244 @@
# Trading System Simulation Architecture

## Overview

This document describes the production-quality trading system simulation used to generate realistic test data for the SQL-CLI TUI. The simulation models a complete VWAP (Volume Weighted Average Price) execution flow, from the client order through the algo engine and smart order router (SOR) to venue execution.

## Architecture Components

### 1. Order Hierarchy

```
CLIENT_001 (Level 0) - Original client order
└── ALGO_001 (Level 1) - Algo parent order (VWAP strategy)
    └── SLICE_00001...N (Level 2) - Algo child slices
        └── SOR_000001...N (Level 3) - SOR routes to venues
```

**Key Design Decisions:**
- `client_order_id` preserved throughout entire hierarchy for traceability
- Each level maintains its own order state
- Fill propagation cascades up immediately (5ms delays between levels)
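The hierarchy and the `client_order_id` rule can be modelled as a tree of orders in which each child inherits the client ID. A minimal sketch; the `Order` class, `spawn_child` helper and field names other than `client_order_id` and `parent_order_id` are hypothetical, and the IDs mirror the diagram above:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Order:
    order_id: str
    level: int                # 0=client, 1=algo parent, 2=slice, 3=SOR route
    client_order_id: str      # carried unchanged down the whole hierarchy
    parent_order_id: Optional[str] = None
    quantity: int = 0
    filled_quantity: int = 0
    children: list["Order"] = field(default_factory=list)

    def spawn_child(self, order_id: str, quantity: int) -> "Order":
        """Create a child one level down that inherits client_order_id."""
        child = Order(
            order_id=order_id,
            level=self.level + 1,
            client_order_id=self.client_order_id,
            parent_order_id=self.order_id,
            quantity=quantity,
        )
        self.children.append(child)
        return child

client = Order("CLIENT_001", 0, client_order_id="CLIENT_001", quantity=2_000_000)
algo = client.spawn_child("ALGO_001", 2_000_000)
slice_ = algo.spawn_child("SLICE_00001", 2_000)
route = slice_.spawn_child("SOR_000001", 1_000)
assert route.client_order_id == "CLIENT_001" and route.level == 3
```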

### 2. Fill Propagation Model

Every fill follows this cascade pattern:
```
Venue Fill (T+0ms)
  → SOR Route Update (T+0ms)
    → Algo Slice Update (T+10ms)
      → Algo Parent Update (T+15ms)
        → Client Order Update (T+20ms)
```

**Rationale:** This mirrors production systems where:
- Venue sends execution report
- SOR aggregates venue fills
- Algo engine updates parent order
- FIX engine sends client update
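One way to emit the cascade is to stamp one update per level at a fixed offset from the venue fill. A minimal sketch; the offsets are copied from the cascade diagram above, while the event names and the `cascade_events` function are hypothetical:

```python
from datetime import datetime, timedelta

# Offsets (ms) after the venue fill, as drawn in the cascade diagram
CASCADE_OFFSETS_MS = [
    ("VENUE_FILL", 0),
    ("SOR_ROUTE_UPDATE", 0),
    ("ALGO_SLICE_UPDATE", 10),
    ("ALGO_PARENT_UPDATE", 15),
    ("CLIENT_ORDER_UPDATE", 20),
]

def cascade_events(fill_time: datetime, fill_qty: int) -> list[tuple[datetime, str, int]]:
    """Emit one timestamped update per level for a single venue fill."""
    return [
        (fill_time + timedelta(milliseconds=offset), event, fill_qty)
        for event, offset in CASCADE_OFFSETS_MS
    ]

for ts, event, qty in cascade_events(datetime(2024, 12, 16, 9, 0, 0), 500):
    print(ts.time(), event, qty)
```

Each level receives the same fill quantity, so cumulative filled quantities stay in step across the hierarchy.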

### 3. Participation Monitoring & Urgency

The algo engine monitors the participation rate against the VWAP schedule:

```python
def urgency_for(filled_qty: float, expected_qty: float) -> str:
    # Participation rate: filled quantity vs. VWAP-schedule expected quantity, in percent
    participation_rate = filled_qty / expected_qty * 100

    if participation_rate < 70:
        return "CRITICAL"  # Sweep all venues
    if participation_rate < 85:
        return "URGENT"    # Take liquidity aggressively
    if participation_rate < 95:
        return "NORMAL"    # Standard execution
    return "PASSIVE"       # Post liquidity
```

**Impact on Execution:**
- **CRITICAL**: Large slices, market orders, accept slippage
- **URGENT**: Medium slices, immediate-or-cancel orders
- **NORMAL**: Standard slicing, limit orders
- **PASSIVE**: Small slices, post-only orders
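The urgency-to-execution mapping above can be kept in a lookup table. A minimal sketch: the order types follow the list above and `AVG_SLICE_SIZE` matches the configuration defaults, but the slice multipliers, the `ExecutionStyle` class and `next_slice` are hypothetical:

```python
from dataclasses import dataclass

AVG_SLICE_SIZE = 2_000  # default slice size from the configuration section

@dataclass(frozen=True)
class ExecutionStyle:
    order_type: str
    slice_multiplier: float  # hypothetical scaling of AVG_SLICE_SIZE

# Order types follow the list above; the multipliers are illustrative only
STYLES = {
    "CRITICAL": ExecutionStyle("MARKET", 2.0),
    "URGENT": ExecutionStyle("IOC", 1.5),
    "NORMAL": ExecutionStyle("LIMIT", 1.0),
    "PASSIVE": ExecutionStyle("POST_ONLY", 0.5),
}

def next_slice(urgency: str, remaining_qty: int) -> tuple[str, int]:
    """Pick order type and slice size for the next child slice."""
    style = STYLES[urgency]
    size = int(AVG_SLICE_SIZE * style.slice_multiplier)
    return style.order_type, min(size, remaining_qty)

print(next_slice("URGENT", 10_000))   # ('IOC', 3000)
print(next_slice("PASSIVE", 10_000))  # ('POST_ONLY', 1000)
```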

### 4. Market Microstructure Modeling

#### Current Implementation

**Venue Responses:**
- **FILLED** (83%): Complete fill at venue
- **PARTIAL** (10%): Partial fill due to limited liquidity
- **FADE** (5%): Liquidity taken by competitor
- **REJECT** (2%): Connection issues or price protection
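The response mix can be reproduced with a weighted random draw using Python's `random.choices`. A minimal sketch, assuming each route's response is drawn independently (the document does not say how draws are correlated); `sample_response` is a hypothetical name:

```python
import random
from collections import Counter

# Outcome probabilities from the list above (they sum to 1.0)
VENUE_RESPONSES = {"FILLED": 0.83, "PARTIAL": 0.10, "FADE": 0.05, "REJECT": 0.02}

def sample_response(rng: random.Random) -> str:
    """Draw one venue response according to the configured probabilities."""
    outcomes = list(VENUE_RESPONSES)
    weights = list(VENUE_RESPONSES.values())
    return rng.choices(outcomes, weights=weights, k=1)[0]

rng = random.Random(42)  # seeded so runs are reproducible
counts = Counter(sample_response(rng) for _ in range(10_000))
print({k: counts[k] / 10_000 for k in VENUE_RESPONSES})
```

Passing a seeded `random.Random` instance keeps generated datasets reproducible between runs.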

**Price Formation:**
```python
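import random
# Urgency spread range in bps (positive = paying up, assuming a buy order)
URGENCY_SPREAD_BPS = {
    "CRITICAL": (2.0, 4.0),   # paying for immediacy
    "URGENT": (1.0, 2.0),     # crossing the spread
    "NORMAL": (-1.0, 1.0),    # around mid
    "PASSIVE": (-1.0, -1.0),  # earning the spread
}

def fill_price(base_price: float, urgency: str, random_walk: float = 0.0) -> float:
    spread_bps = random.uniform(*URGENCY_SPREAD_BPS[urgency])
    return base_price * (1 + spread_bps / 10_000) + random_walk  # base + spread + walk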
```

#### Future Enhancements (Not Yet Implemented)

1. **Internal Liquidity Matching**
- SOR checks internal crossing engine before routing external
- Risk desk provides liquidity from inventory
- Internalization rate typically 10-20% for large firms

2. **Dark Pool Aggregation**
- Multiple dark pool venues with different liquidity profiles
- Conditional orders based on minimum quantity
- Mid-point crossing logic

3. **Advanced SOR Logic**
- Spray ordering across venues based on historical fill rates
- Venue toxicity scoring (avoid venues with high fade rates)
- Dynamic routing based on real-time market data

4. **Regulatory Considerations**
- Best execution validation
- Reg NMS compliance (trade-through protection)
- MiFID II reporting fields
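Internal liquidity matching is future work; the sketch below only illustrates the intended order of preference (crossing engine, then risk desk inventory, then external venues). The function name and its inputs are hypothetical and not part of the current generator:

```python
def route_with_internalization(qty: int, internal_orders: list[int], risk_desk_qty: int) -> dict[str, int]:
    """Hypothetical routing: crossing engine first, then risk desk, then external venues."""
    crossed = 0
    for resting in internal_orders:  # opposite-side resting quantities in the crossing engine
        if crossed >= qty:
            break
        crossed += min(resting, qty - crossed)
    from_risk = min(risk_desk_qty, qty - crossed)
    return {"internal_cross": crossed, "risk_desk": from_risk, "external": qty - crossed - from_risk}

print(route_with_internalization(5_000, [800, 400], 1_000))
# {'internal_cross': 1200, 'risk_desk': 1000, 'external': 2800}
```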

## Data Generation Strategy

### File Size Management

Target: < 100K rows for TUI performance

**Approach:**
- **Full Detail**: All orders and routes (~3-5K rows for 2M shares)
- **Summary Mode**: Slice summaries + client updates (~500-1K rows)
- **Client Only**: Just client order updates (~10-50 rows)
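The three modes can be approximated as filters on the `order_level` column used in the SQL examples. A minimal sketch; treating summary mode as "levels 0 and 2" is an assumption (the real generator may aggregate slices rather than filter them), and `MODE_LEVELS`/`filter_snapshots` are hypothetical names:

```python
# Which order_level values each output mode keeps (0 = client, 2 = algo slice).
# The level set for "summary" is an assumption based on the description above.
MODE_LEVELS = {
    "full": {0, 1, 2, 3},
    "summary": {0, 2},
    "client_only": {0},
}

def filter_snapshots(rows: list[dict], mode: str) -> list[dict]:
    """Keep only the snapshot rows that belong in the requested output mode."""
    levels = MODE_LEVELS[mode]
    return [row for row in rows if row["order_level"] in levels]

rows = [{"order_level": level} for level in (0, 1, 2, 3, 3, 3)]
print(len(filter_snapshots(rows, "full")), len(filter_snapshots(rows, "summary")))  # 6 2
```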

### Realistic Volumes

**Production Benchmarks:**
- Large institutional order: 1-5M shares
- Slices: 500-5000 shares each
- Routes: 2-5 venues per slice
- Daily volume: 10K-100K orders

**Our Simulation:**
- Default: 2M shares, 2000 share slices
- Generates ~1000 slices, ~3000 routes
- Results in ~3-5K database snapshots
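The slice and route counts follow from the defaults and the 2-5 routes-per-slice benchmark:

```python
ORDER_SIZE = 2_000_000
AVG_SLICE_SIZE = 2_000
ROUTES_PER_SLICE = (2, 5)  # range quoted in the production benchmarks

slices = ORDER_SIZE // AVG_SLICE_SIZE
print(slices)  # 1000
print(slices * ROUTES_PER_SLICE[0], slices * ROUTES_PER_SLICE[1])  # 2000 5000
```

The ~3000 routes quoted above therefore correspond to an average of about 3 routes per slice.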

## Key Metrics Tracked

### Execution Quality
- Fill rate (% of order completed)
- VWAP performance (slippage in bps)
- Participation rate (actual vs planned)
- Venue performance (fill rates, fade rates)

### Microstructure Analysis
- Fade events (lost liquidity to competitors)
- Partial fills (liquidity constraints)
- Rejects (technical/connectivity issues)
- Retry attempts (recovery from failures)

## SQL Queries for Analysis

### Client Perspective
```sql
-- What the client sees
SELECT * FROM production_vwap_final
WHERE order_level = 0
ORDER BY snapshot_time
```

### Algo Performance
```sql
-- Participation tracking
SELECT snapshot_time, filled_quantity, participation_pct, urgency
FROM production_vwap_final
WHERE order_id = 'ALGO_001' AND event_type = 'ALGO_UPDATE'
```

### Microstructure Issues
```sql
-- Find problem venues
SELECT venue,
       COUNT(*) as attempts,
       SUM(CASE WHEN state = 'FADE' THEN 1 ELSE 0 END) as fades,
       SUM(CASE WHEN state = 'PARTIAL' THEN 1 ELSE 0 END) as partials
FROM production_vwap_final
WHERE order_level = 3
GROUP BY venue
```
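Turning those counts into a fade rate per venue gives a simple ranking, the kind of input the venue toxicity scoring listed under future enhancements would need. A minimal sketch, assuming level-3 rows are dicts with the `venue` and `state` columns used in the query; `venue_fade_rates` is a hypothetical name:

```python
from collections import defaultdict

def venue_fade_rates(routes: list[dict]) -> list[tuple[str, float]]:
    """Rank venues by fade rate (fades / attempts), worst first."""
    attempts: dict[str, int] = defaultdict(int)
    fades: dict[str, int] = defaultdict(int)
    for route in routes:
        attempts[route["venue"]] += 1
        if route["state"] == "FADE":
            fades[route["venue"]] += 1
    rates = [(venue, fades[venue] / attempts[venue]) for venue in attempts]
    return sorted(rates, key=lambda item: item[1], reverse=True)

routes = [
    {"venue": "BATS", "state": "FADE"},
    {"venue": "BATS", "state": "FILLED"},
    {"venue": "IEX", "state": "FILLED"},
]
print(venue_fade_rates(routes))  # [('BATS', 0.5), ('IEX', 0.0)]
```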

## Future Roadmap

### Phase 2: Internal Liquidity
- Implement crossing engine
- Add risk desk liquidity provision
- Model internalization benefits

### Phase 3: Advanced Market Models
- Multi-asset support (futures, options)
- Cross-asset hedging flows
- Market impact modeling

### Phase 4: Real-time Simulation
- WebSocket feed simulation
- Streaming position updates
- Live market data integration

### Phase 5: Machine Learning Integration
- Venue selection optimization
- Fill rate prediction
- Optimal slice sizing

## File Structure

```
data/
├── production_vwap_final.csv # Main dataset with fill propagation
├── instruments.csv # Reference data
├── generate_production_vwap_fixed.py # Generator script
└── TRADING_SIMULATION_ARCHITECTURE.md # This document
```

## Testing Recommendations

1. **Load Test**: Start with summary mode for large orders
2. **Propagation Test**: Verify every fill cascades up
3. **Urgency Test**: Check aggression increases when behind
4. **Microstructure Test**: Analyze fade/partial patterns
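The propagation test can be expressed as "every fill shows up at all four levels". A minimal sketch, assuming each cascade event row carries a `fill_id` linking it to the originating venue fill (a hypothetical column) plus `order_level`:

```python
from collections import defaultdict

REQUIRED_LEVELS = {0, 1, 2, 3}  # client, algo parent, slice, SOR route

def unpropagated_fills(events: list[dict]) -> set[str]:
    """Return fill IDs that did not produce an update at every hierarchy level."""
    levels_seen: dict[str, set[int]] = defaultdict(set)
    for event in events:
        levels_seen[event["fill_id"]].add(event["order_level"])
    return {fill_id for fill_id, seen in levels_seen.items() if REQUIRED_LEVELS - seen}

events = [{"fill_id": "F1", "order_level": lvl} for lvl in (3, 2, 1, 0)]
events += [{"fill_id": "F2", "order_level": lvl} for lvl in (3, 2)]
print(unpropagated_fills(events))  # {'F2'}
```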

## Configuration Parameters

```python
# Current defaults
ORDER_SIZE = 2_000_000      # 2M shares
AVG_SLICE_SIZE = 2_000      # 2K shares per slice
FADE_RATE = 0.05            # 5% fade probability
PARTIAL_RATE = 0.10         # 10% partial fill probability
PROPAGATION_DELAY = 5       # Delay between cascade levels, in milliseconds
```

## Validation Checklist

- [ ] Client order quantity remains constant
- [ ] Filled quantity only increases
- [ ] Every slice fill propagates to parent
- [ ] Urgency changes with participation rate
- [ ] Venue statistics are realistic
- [ ] File size < 100K rows
- [ ] All orders preserve client_order_id
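The first two checklist items can be automated over time-ordered client-order (level 0) snapshots. A minimal sketch; `snapshot_time` and `filled_quantity` appear in the SQL examples, while the `quantity` column and the sample rows are assumptions:

```python
def check_client_snapshots(snapshots: list[dict]) -> list[str]:
    """Check two checklist items on time-ordered client-order (level 0) snapshots."""
    problems = []
    for prev, curr in zip(snapshots, snapshots[1:]):
        if curr["quantity"] != prev["quantity"]:
            problems.append(f"quantity changed at {curr['snapshot_time']}")
        if curr["filled_quantity"] < prev["filled_quantity"]:
            problems.append(f"filled_quantity decreased at {curr['snapshot_time']}")
    return problems

snapshots = [
    {"snapshot_time": "09:00:00", "quantity": 2_000_000, "filled_quantity": 0},
    {"snapshot_time": "09:00:05", "quantity": 2_000_000, "filled_quantity": 1_500},
    {"snapshot_time": "09:00:09", "quantity": 2_000_000, "filled_quantity": 1_200},
]
print(check_client_snapshots(snapshots))  # ['filled_quantity decreased at 09:00:09']
```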

## Contact & Maintenance

This simulation was designed to provide realistic test data for SQL-CLI development. The modular Python architecture allows easy enhancement for additional scenarios.

Key principles:
1. **Realism over complexity** - Model what matters for testing
2. **Audit trail completeness** - Every event is captured
3. **Performance awareness** - Keep data volumes manageable
4. **Extensibility** - Easy to add new scenarios

---

*Last Updated: 2024-12-16*
*Version: 1.0 - Production VWAP with fill propagation*