diff --git a/CLAUDE.md b/CLAUDE.md index f2b71479..74e2cf76 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -4,13 +4,12 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co ## Project Overview -This is the 0L Network Explorer, a full-stack application for exploring the 0L blockchain. The repository contains: - -- **api/**: NestJS GraphQL backend with ClickHouse integration for blockchain data -- **web-app/**: React frontend with TypeScript and Vite -- **api/transformer/**: Rust binary for data transformation +The 0L Network Explorer is a full-stack blockchain explorer application with: +- **api/**: NestJS GraphQL backend with ClickHouse analytics +- **web-app/**: React TypeScript frontend with Vite +- **api/transformer/**: Rust binary for blockchain data transformation - **infra/**: Kubernetes deployment configurations -- **ol-fyi-local-infra/**: Docker Compose setup for local development +- **ol-fyi-local-infra/**: Docker Compose for local development ## Common Development Commands @@ -18,101 +17,211 @@ This is the 0L Network Explorer, a full-stack application for exploring the 0L b ```bash cd api npm install -npm run start:dev # Development with hot reload -npm run build # Build production -npm run test # Run Jest tests -npm run test:e2e # End-to-end tests -npm run lint # ESLint with auto-fix -npm run format # Prettier formatting -npx prisma generate # Generate Prisma client after schema changes -npx prisma db push # Push schema changes to database +npm run start:dev # Development with hot reload (port 3000) +npm run build # Production build +npm run test # Run Jest unit tests +npm run test:watch # Jest watch mode +npm run test:cov # Coverage report +npm run test:e2e # End-to-end tests +npm run lint # ESLint with auto-fix +npm run format # Prettier formatting +npx prisma generate # Regenerate Prisma client after schema changes +npx prisma db push # Push schema changes to database ``` ### Web App (Frontend) ```bash cd web-app npm install -npm run dev # Vite development server -npm run build # TypeScript compilation + Vite build -npm run lint # ESLint -npm run preview # Preview production build -npm run prettier-check # Check formatting -npm run prettier-fix # Fix formatting +npm run dev # Vite dev server (port 5173) +npm run build # TypeScript + Vite production build +npm run lint # ESLint +npm run preview # Preview production build +npm run prettier-fix # Auto-format code ``` ### Transformer (Rust) ```bash cd api/transformer -cargo build # Build the binary +cargo build # Build the transformer binary ``` ## Architecture Overview ### Backend Architecture (NestJS) -- **GraphQL API** with Apollo Server and subscriptions -- **Modular structure** with feature-based modules (accounts, validators, transactions, stats) -- **Database layers**: PostgreSQL (Prisma) for app data, ClickHouse for blockchain analytics -- **Background processing** with BullMQ queues and Redis -- **External integrations**: S3, Firebase, NATS messaging -- **Role-based workers** configured via ROLES environment variable - -Key modules: -- `OlModule`: Core blockchain data handling -- `StatsModule`: Analytics and metrics -- `NodeWatcherModule`: Network monitoring -- `ValidatorsModule`: Validator information -- `TransactionsModule`: Transaction processing with factory pattern + +**Module Structure** (`api/src/`): +- `app/app.module.ts`: Root module with dependency configuration +- `ol/`: Core blockchain data handling + - `accounts/`: Account processing and resolution + - `validators/`: Validator 
information and statistics + - `transactions/`: Transaction factory pattern implementation + - `movements/`: Token movement tracking + - `community-wallets/`: Community wallet management +- `stats/`: Analytics and metrics processing +- `node-watcher/`: Network monitoring +- `clickhouse/`: ClickHouse database integration +- `multi-sig/`: Multi-signature wallet operations +- `redis/`, `nats/`, `s3/`, `firebase/`: Service integrations + +**Role-Based Worker System**: +Workers are configured via the `ROLES` environment variable: +- `version-batch-processor`: Batch blockchain version processing +- `parquet-producer-processor`: Parquet file generation for analytics +- `version-processor`: Individual version processing +- `clickhouse-ingestor-processor`: Data ingestion to ClickHouse +- `expired-transactions-processor`: Clean up expired transactions +- `accounts-processor`: Account data processing + +**GraphQL Patterns**: +- Resolvers use `@Resolver()` decorator with model types +- Field resolvers use `@ResolveField()` for nested data +- Pagination uses cursor-based approach with `PaginatedX` types +- Custom scalars for Buffer and BigNumber types + +**Transaction Processing**: +Factory pattern in `api/src/ol/transactions/`: +- `TransactionsFactory.ts`: Creates transaction handlers +- `TransactionsService.ts`: Core business logic +- `TransactionsRepository.ts`: Database operations +- `OnChainTransactionsRepository.ts`: Blockchain data access ### Frontend Architecture (React) -- **React 18** with TypeScript and Vite -- **Routing** with React Router v6 -- **State management** via Apollo Client for GraphQL -- **Styling** with Styled Components and Tailwind CSS -- **Charts** using ECharts for data visualization -- **Wallet integration** with Aptos wallet adapters - -Component structure: -- `modules/core/`: App setup, routing, Apollo client -- `modules/core/routes/`: Page components (Account, Stats, Validators, etc.) -- `modules/ui/`: Reusable UI components -- `modules/interface/`: TypeScript interfaces + +**Module Structure** (`web-app/src/modules/`): +- `core/`: Application foundation + - `router.tsx`: React Router v6 configuration + - `apollo-client.ts`: GraphQL client setup with WebSocket subscriptions + - `routes/`: Page components (Account, Stats, Validators, Blocks, etc.) +- `ui/`: Reusable UI components +- `interface/`: TypeScript interfaces and types +- `aptos/`: Wallet integration components +- `ol/`, `movements/`: Domain-specific modules + +**Routing Structure**: +- `/accounts/:address`: Account details with nested routes (movements, transactions) +- `/transactions/:version`: Transaction details +- `/blocks/:blockHeight`: Block exploration +- `/validators`: Validator list and details +- `/stats`: Network statistics dashboard +- `/community-wallets`: Community wallet tracking + +**State Management**: +- Apollo Client for GraphQL state and caching +- Real-time updates via GraphQL subscriptions +- Styled Components + Tailwind CSS for styling ### Data Flow -1. Blockchain data ingested via background processors -2. Raw data transformed by Rust transformer binary -3. Processed data stored in ClickHouse and PostgreSQL -4. GraphQL resolvers query databases -5. React frontend consumes GraphQL API with real-time subscriptions +1. Blockchain data ingested via role-based processors +2. Rust transformer processes raw data (`api/transformer/`) +3. Processed data stored in ClickHouse (analytics) and PostgreSQL (app data) +4. GraphQL resolvers query both databases +5. 
React frontend consumes GraphQL API with real-time subscriptions via WebSocket ## Local Development Setup -1. **Prerequisites**: Docker, Node.js 20.11+, Rust -2. **Start databases**: `cd ol-fyi-local-infra && docker compose up -d` -3. **ClickHouse setup**: Connect and run migrations from `api/tables_local.sql` -4. **Build transformer**: `cd api/transformer && cargo build` -5. **API setup**: `cd api && npm install && cp .env.example .env` -6. **Frontend setup**: `cd web-app && npm install` -7. **Run API**: `npm run start:dev` (from api/) -8. **Run frontend**: `npm run dev` (from web-app/) - -## Key Technologies - -- **Backend**: NestJS, GraphQL, Prisma, ClickHouse, BullMQ, Redis, NATS -- **Frontend**: React, TypeScript, Apollo Client, Styled Components, ECharts -- **Infrastructure**: Docker, Kubernetes, PostgreSQL, ClickHouse -- **Blockchain**: Aptos SDK for 0L Network integration +### Prerequisites +- Docker and Docker Compose +- Node.js 22+ +- Rust and Cargo +- PostgreSQL client tools + +### Setup Steps + +1. **Start infrastructure**: + ```bash + cd ol-fyi-local-infra + docker compose up -d + ``` + +2. **Run ClickHouse migrations**: + ```bash + docker compose exec -T clickhouse clickhouse-client --database=olfyi -n < ../api/tables_local.sql + ``` + +3. **Build transformer**: + ```bash + cd api/transformer + cargo build + ``` + +4. **Setup API**: + ```bash + cd api + npm install + cp .env.example .env + npx prisma generate + npx prisma db push + npm run start:dev + ``` + +5. **Setup frontend**: + ```bash + cd web-app + npm install + npm run dev + ``` + +## Environment Configuration + +Key environment variables in `api/.env`: +- `DATABASE_URL`: PostgreSQL connection for Prisma +- `CLICKHOUSE_HOST`, `CLICKHOUSE_DATABASE`: ClickHouse connection +- `REDIS_HOST`: Redis for BullMQ queues +- `NATS_SERVERS`: NATS messaging system +- `RPC_PROVIDER_URL`: 0L blockchain RPC endpoint +- `DATA_API_HOST`: Blockchain data API +- `ROLES`: Comma-separated list of worker roles to enable +- `S3_*`: Object storage configuration for Parquet files ## Testing Strategy -- API uses Jest for unit and e2e tests -- Frontend uses ESLint for code quality -- Prisma for database schema management -- Both projects use Prettier for code formatting - -## Code Quality Standards - -When working with files in this repository: - -- **ALWAYS ensure files end with a trailing newline** - This is required for proper POSIX compliance and prevents issues with git diffs and various tools -- Follow existing code style and formatting conventions -- Use the project's linting and formatting tools before committing +### API Testing +- **Test Files**: `*.spec.ts` files co-located with source code +- **E2E Tests**: Separate `/test/` directory +- **Configuration**: Jest with ESM support via ts-jest +- **Run Tests**: `npm run test` (unit), `npm run test:e2e` (integration) + +### Code Quality +- **Prettier**: Line width 100, single quotes, trailing commas +- **ESLint**: TypeScript configuration with auto-fix +- **Pre-commit**: Run `npm run lint` and `npm run prettier-fix` + +## Key Architectural Patterns + +1. **Dependency Injection**: NestJS DI for service management +2. **Factory Pattern**: Transaction processing with pluggable handlers +3. **Repository Pattern**: Clean data access layer abstractions +4. **Module Federation**: Feature-based module organization +5. **Real-time Updates**: GraphQL subscriptions over WebSocket +6. **Multi-database**: PostgreSQL (app), ClickHouse (analytics) +7. 
**Background Processing**: BullMQ queues with Redis +8. **Role-based Workers**: Configurable processing roles + +## Database Schema + +### PostgreSQL (Prisma) +- Application configuration +- User preferences +- Wallet subscriptions +- Multi-sig wallet data + +### ClickHouse +- Blockchain transactions +- Account balances +- Validator statistics +- Network metrics +- Time-series analytics + +## Deployment + +### Docker Build +- **API**: Multi-stage build with Rust transformer and Node.js +- **Frontend**: Vite build with Nginx serving +- **Infrastructure**: Kubernetes manifests in `/infra/` + +### Production Considerations +- Enable specific worker roles per deployment +- Configure appropriate database connections +- Set up S3 for Parquet file storage +- Configure Firebase for push notifications \ No newline at end of file diff --git a/documentation/BULLMQ_DASHBOARD_SETUP.md b/documentation/BULLMQ_DASHBOARD_SETUP.md new file mode 100644 index 00000000..1ff9e0d7 --- /dev/null +++ b/documentation/BULLMQ_DASHBOARD_SETUP.md @@ -0,0 +1,295 @@ +# BullMQ Dashboard (Bull Board) Setup Guide + +## Overview + +The 0L Explorer includes a Bull Board web interface for monitoring BullMQ queues in real-time. This dashboard provides visual insights into queue processing, job statuses, and helps diagnose issues with data ingestion pipelines. + +## Location and Configuration + +- **Directory**: `/pacakges/bull-board/` +- **Main Application**: `index.js` +- **Default Port**: `8006` +- **Docker Port**: `8080` (internal) + +## Local Setup + +### Method 1: Direct Node.js + +1. **Navigate to the Bull Board directory**: +```bash +cd pacakges/bull-board +``` + +2. **Install dependencies**: +```bash +npm install +``` + +3. **Configure environment variables**: + +Create a `.env` file in the `pacakges/bull-board` directory: + +```bash +# Redis connection +REDIS_HOST=127.0.0.1 +REDIS_PORT=6379 + +# Queue names to monitor (comma-separated) +QUEUE_NAMES=ol-version,ol-version-batch,ol-clickhouse-ingestor,ol-parquet-producer,expired-transactions,accounts,validators,community-wallets,wallet-subscription,stats,node-watcher + +# Port for the dashboard (optional, defaults to 8006) +PORT=8006 +``` + +4. **Start the dashboard**: +```bash +npm start +``` + +5. **Access the dashboard**: +``` +http://localhost:8006 +``` + +### Method 2: Docker + +1. **Navigate to the Bull Board directory**: +```bash +cd pacakges/bull-board +``` + +2. **Build the Docker image**: +```bash +./build.sh +``` + +Or manually: +```bash +docker build \ + --file ./Dockerfile \ + --tag bull-board:local \ + . +``` + +3. **Run the container**: +```bash +docker run -d \ + --name bull-board \ + -p 8006:8080 \ + -e REDIS_HOST=host.docker.internal \ + -e REDIS_PORT=6379 \ + -e QUEUE_NAMES="ol-version,ol-version-batch,ol-clickhouse-ingestor,ol-parquet-producer,expired-transactions,accounts,validators,community-wallets,wallet-subscription,stats,node-watcher" \ + bull-board:local +``` + +**Note**: Use `host.docker.internal` on Mac/Windows. On Linux, use `--network host` or the actual host IP. + +4. 
**Access the dashboard**: +``` +http://localhost:8006 +``` + +## Available Queues + +The following BullMQ queues are available for monitoring: + +### Core Processing Queues +- **`ol-version`** - Individual blockchain version processing +- **`ol-version-batch`** - Batch version processing for historical data +- **`ol-clickhouse-ingestor`** - ClickHouse data ingestion from Parquet files +- **`ol-parquet-producer`** - Parquet file generation for analytics + +### Transaction and Account Queues +- **`expired-transactions`** - Cleanup of expired pending transactions +- **`accounts`** - Account data processing and updates +- **`validators`** - Validator information and statistics +- **`community-wallets`** - Community wallet tracking + +### Monitoring and Subscription Queues +- **`wallet-subscription`** - Wallet subscription notifications +- **`stats`** - Network statistics aggregation +- **`node-watcher`** - Node health monitoring + +## Dashboard Features + +### Main Overview +- **Queue List**: All configured queues with job counts +- **Status Indicators**: Visual status for each queue +- **Job Counts**: Active, waiting, completed, failed, delayed, and paused jobs + +### Queue Details +Click on any queue to see: +- **Jobs by Status**: Filtered views of jobs +- **Job Timeline**: Visual representation of job processing +- **Processing Rate**: Jobs processed per minute/hour +- **Error Rate**: Failed job statistics + +### Job Management +- **View Job Data**: Inspect job payload and results +- **Error Messages**: See failure reasons and stack traces +- **Retry Failed Jobs**: Manually retry individual or bulk jobs +- **Clean Queue**: Remove old completed/failed jobs +- **Promote Delayed Jobs**: Force delayed jobs to process immediately +- **Pause/Resume Queue**: Control queue processing + +### Individual Job View +- **Job ID and Status** +- **Creation and Processing Timestamps** +- **Attempt Count**: Number of processing attempts +- **Job Data**: Input parameters +- **Return Value**: Processing results +- **Error Details**: Failure information if applicable +- **Logs**: Processing logs (if configured) + +## Troubleshooting Stale Data + +When investigating stale transaction data, focus on these areas: + +### 1. Check `ol-version` Queue +- **Failed Jobs**: Look for repeated failures +- **Stuck Jobs**: Check "active" jobs running for too long +- **Job Data**: Inspect version numbers being processed +- **Error Messages**: Common issues: + - RPC timeout errors + - Network connectivity issues + - Invalid version numbers + +### 2. Monitor `ol-version-batch` Queue +- **Batch Processing**: Ensures historical data completeness +- **Large Jobs**: May take longer to process +- **Memory Issues**: Check for out-of-memory errors + +### 3. Verify `ol-clickhouse-ingestor` Queue +- **Ingestion Status**: Confirms data reaches ClickHouse +- **Parquet File Issues**: File format or corruption errors +- **Database Errors**: Connection or insertion failures + +### 4. Review Queue Metrics +- **Processing Rate**: Jobs/minute should be consistent +- **Queue Depth**: Growing queues indicate processing issues +- **Failure Rate**: High failure rates need investigation + +## Common Issues and Solutions + +### Issue: Queue Shows No Jobs +**Solution**: Verify the worker is enabled in `ROLES` environment variable +```bash +ROLES=api,version-processor,clickhouse-ingestor-processor,... 
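+
+# A hedged follow-up check: confirm the running process actually sees these
+# roles. The container and deployment names below are illustrative
+# assumptions, not values defined by this repository.
+#   docker exec <api-container> printenv ROLES
+#   kubectl exec deploy/<api-deployment> -- printenv ROLES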
+``` + +### Issue: All Jobs Failing +**Possible Causes**: +- Redis connection issues +- RPC provider down +- Transformer binary missing +- Database connection problems + +**Debug Steps**: +1. Check job error messages in dashboard +2. Verify Redis connectivity +3. Test RPC provider endpoint +4. Check application logs + +### Issue: Jobs Stuck in "Active" State +**Solution**: +- Jobs may have timed out +- Click on the job to see details +- Use "Retry" to reprocess +- Check worker process health + +### Issue: Growing "Delayed" Jobs +**Indicates**: Rate limiting or scheduled processing +- Check job details for delay reasons +- Use "Promote" to process immediately if needed + +## Performance Optimization + +### Queue Configuration +Monitor these metrics for optimization: +- **Concurrency**: Number of parallel jobs +- **Rate Limiting**: Jobs per time period +- **Retry Strategy**: Backoff configuration + +### Redis Connection +Ensure Redis has sufficient: +- Memory for job storage +- Connection pool size +- Network bandwidth + +### Worker Scaling +If queues are backing up: +1. Check CPU/memory usage of workers +2. Consider running multiple worker instances +3. Adjust job concurrency settings + +## Integration with Monitoring + +### Alerts to Set Up +Based on dashboard metrics, configure alerts for: +- Failed job count > threshold +- Queue depth > maximum +- Processing rate < minimum +- No jobs processed in X minutes + +### Metrics to Track +- Jobs processed per minute +- Average processing time +- Failure rate percentage +- Queue depth trends + +## Security Considerations + +### Production Deployment +1. **Authentication**: Add authentication middleware +2. **Network Access**: Restrict to internal network or VPN +3. **Read-Only Access**: Consider read-only mode for production +4. **HTTPS**: Use reverse proxy with SSL + +### Example Nginx Configuration +```nginx +server { + listen 443 ssl; + server_name bull-dashboard.example.com; + + ssl_certificate /path/to/cert.pem; + ssl_certificate_key /path/to/key.pem; + + location / { + proxy_pass http://localhost:8006; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + + # Basic authentication + auth_basic "Bull Board Dashboard"; + auth_basic_user_file /etc/nginx/.htpasswd; + } +} +``` + +## Additional Resources + +### Environment Variables Reference +- `REDIS_HOST`: Redis server hostname (default: "127.0.0.1") +- `REDIS_PORT`: Redis server port (default: 6379) +- `QUEUE_NAMES`: Comma-separated list of queue names to monitor +- `PORT`: HTTP port for dashboard (default: 8006) + +### Related Documentation +- [Bull Board GitHub](https://github.com/felixmosh/bull-board) +- [BullMQ Documentation](https://docs.bullmq.io/) +- [Redis Administration](https://redis.io/docs/manual/admin/) + +### Support Commands +```bash +# Check if Bull Board is running +curl http://localhost:8006/ + +# View Docker logs +docker logs bull-board + +# Check Redis connectivity +redis-cli ping + +# List all Bull queues in Redis +redis-cli --scan --pattern "bull:*" +``` \ No newline at end of file diff --git a/documentation/TRANSACTION_DATA_FLOW.md b/documentation/TRANSACTION_DATA_FLOW.md new file mode 100644 index 00000000..49f61fbc --- /dev/null +++ b/documentation/TRANSACTION_DATA_FLOW.md @@ -0,0 +1,324 @@ +# Transaction Data Flow in 0L Explorer + +## Overview +This document explains how transaction data flows from the blockchain to the latest transactions page, and identifies potential failure points that could cause stale data. 
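+
+As a quick first check before walking through the pipeline, compare the chain head reported by the RPC provider with the newest version already in ClickHouse. This is a hedged sketch, not part of the ingestion code: it assumes `RPC_PROVIDER_URL` is set, `jq` is installed, and ClickHouse is reachable through the local Docker Compose setup in `ol-fyi-local-infra` (adjust the client invocation for other environments).
+
+```bash
+# Latest version known to the blockchain (same endpoint used in the debug
+# commands later in this document)
+CHAIN_HEAD=$(curl -s "${RPC_PROVIDER_URL}/v1" | jq -r .ledger_version)
+
+# Latest version the explorer has ingested (run from ol-fyi-local-infra/)
+DB_HEAD=$(docker compose exec -T clickhouse clickhouse-client \
+  --database=olfyi --query "SELECT MAX(version) FROM user_transaction")
+
+echo "chain=${CHAIN_HEAD} db=${DB_HEAD} lag=$((CHAIN_HEAD - DB_HEAD))"
+```
+
+A lag that keeps growing usually points at one of the failure modes described below.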
+ +## Data Flow Pipeline + +### 1. Frontend Query +**Location:** `web-app/src/modules/core/routes/Transactions/Transactions.tsx:11-26` + +The transactions page makes a GraphQL query to fetch transaction data: +```graphql +query GetUserTransactions($limit: Int!, $offset: Int!) { + userTransactions(limit: $limit, offset: $offset, order: "DESC") { + size + items { + version + sender + moduleAddress + moduleName + functionName + timestamp + success + } + } +} +``` + +### 2. GraphQL Resolver +**Location:** `api/src/ol/user-transactions.resolver.ts:33-127` + +- Directly queries the ClickHouse `user_transaction` table +- Returns paginated transaction data ordered by version (DESC by default) +- No caching layer - queries hit ClickHouse directly + +### 3. Data Ingestion Pipeline + +#### Version Processor +**Location:** `api/src/ol/ol-version.processor.ts` + +The main ingestion flow: + +1. **Fetch Latest Version** (runs every 5 seconds): + - Calls `fetchLatestVersion()` (lines 93-96, 289-296) + - Fetches latest ledger version from RPC provider + - Creates processing jobs for the last 1,000 versions + +2. **Process Individual Versions**: + - Each version job fetches the transaction from blockchain API (lines 252-266) + - Downloads transaction data from: `${RPC_PROVIDER_URL}/v1/transactions?start={version}&limit=1` + +3. **Transform Data**: + - Writes transaction JSON to temporary file (line 307) + - Calls Rust transformer binary to convert to Parquet format (line 328) + - Transformer location: `/usr/local/bin/transformer` (prod) or `./transformer/target/debug/transformer` (dev) + +4. **Insert into ClickHouse**: + - Inserts Parquet files directly into ClickHouse tables (line 331) + - Marks version as ingested in `ingested_versions` table (lines 337-345) + +#### Missing Versions Handler +**Location:** `api/src/ol/ol-version.processor.ts:384-412` + +- Runs every 5 seconds to catch any missed versions +- Compares blockchain latest version with already ingested versions +- Creates jobs for any gaps in the sequence + +## Potential Failure Points + +### 1. RPC Provider Issues +**Impact:** No new transactions will be fetched + +**Location:** `api/src/ol/ol-version.processor.ts:354-360` + +**Symptoms:** +- `getLedgerVersion()` returns stale version numbers +- The RPC endpoint (`${RPC_PROVIDER_URL}/v1`) is down or unresponsive +- Network connectivity issues to the RPC provider + +**Debug Commands:** +```bash +# Check current ledger version from RPC +curl ${RPC_PROVIDER_URL}/v1 | jq .ledger_version + +# Check if endpoint is responsive +curl -w "\n%{http_code}\n" -o /dev/null -s ${RPC_PROVIDER_URL}/v1 +``` + +### 2. Worker Role Not Enabled +**Impact:** Version processor never runs + +**Required Environment Variable:** +```bash +ROLES="api,version-processor,..." +``` + +**Debug Commands:** +```bash +# Check if version-processor is in ROLES +echo $ROLES | grep version-processor + +# Check running processes +ps aux | grep version-processor +``` + +### 3. 
Transformer Binary Failures
+**Impact:** Transactions fetched but not transformed/ingested
+
+**Location:** `api/src/ol/transformer.service.ts:155-194`
+
+**Common Issues:**
+- Binary not found at expected path
+- Binary crashes during transformation
+- Invalid JSON structure causes transformation failure
+- Permission issues
+
+**Debug Commands:**
+```bash
+# Check if transformer binary exists
+ls -la /usr/local/bin/transformer # Production
+ls -la ./api/transformer/target/debug/transformer # Development
+
+# Check transformer logs for errors
+grep "Transformer failed with code" /path/to/logs
+grep "Transformer stderr:" /path/to/logs
+```
+
+### 4. Queue Processing Issues
+**Impact:** Jobs created but not processed
+
+**Configuration:**
+- Jobs timeout after 1 minute (line 137)
+- Failed jobs retry 15 times with 5-second delays (lines 273-276)
+- Jobs removed after 1 hour when complete (line 280)
+
+**Debug with BullMQ:**
+```bash
+# Connect to Redis
+redis-cli
+
+# Check for failed jobs (BullMQ stores failed/completed jobs in sorted sets)
+ZRANGE bull:ol-version:failed 0 -1
+
+# Check queue status
+KEYS bull:ol-version:*
+
+# Check for stalled jobs (BullMQ keeps stalled job ids in a plain set)
+SMEMBERS bull:ol-version:stalled
+```
+
+### 5. ClickHouse Ingestion Issues
+**Impact:** Data transformed but not queryable
+
+**Checks:**
+```sql
+-- Check latest ingested version
+SELECT MAX(version) FROM user_transaction;
+
+-- Check latest transaction timestamp
+SELECT MAX(timestamp), FROM_UNIXTIME(MAX(timestamp)) FROM user_transaction;
+
+-- Check ingested versions tracking
+SELECT COUNT(*) FROM ingested_versions;
+SELECT MAX(version) FROM ingested_versions;
+
+-- Check for recent ingestions (toUnixTimestamp(now()) is the ClickHouse way
+-- to get the current unix timestamp)
+SELECT COUNT(*)
+FROM user_transaction
+WHERE timestamp > (toUnixTimestamp(now()) - 3600);
+```
+
+### 6. Duplicate Version Prevention
+**Impact:** Versions marked as ingested but data missing
+
+**Location:** `api/src/ol/ol-version.processor.ts:311-326`
+
+The system checks `ingested_versions` before processing. If a version is marked as ingested but data is missing from `user_transaction`, it won't be re-processed.
+
+**Fix:**
+```sql
+-- Find and remove incorrectly marked versions
+DELETE FROM ingested_versions
+WHERE version NOT IN (
+    SELECT DISTINCT version FROM user_transaction
+);
+```
+
+## Monitoring Checklist
+
+### Real-time Monitoring
+1. **RPC Health**
+   - Monitor `${RPC_PROVIDER_URL}/v1` response times
+   - Track ledger version progression
+
+2. **Worker Health**
+   - Ensure `version-processor` is in ROLES
+   - Monitor worker process CPU/memory usage
+   - Check BullMQ queue depths
+
+3. **Database Health**
+   - Monitor ClickHouse query performance
+   - Track table sizes and growth rates
+   - Monitor ingestion rates
+
+### Key Metrics to Track
+```sql
+-- Ingestion lag (difference between blockchain and database)
+WITH latest_blockchain AS (
+    -- This would come from RPC call
+    SELECT 1000000 as version
+),
+latest_db AS (
+    SELECT MAX(version) as version FROM user_transaction
+)
+SELECT
+    lb.version - ld.version as version_lag,
+    NOW() - FROM_UNIXTIME(
+        (SELECT MAX(timestamp) FROM user_transaction)
+    ) as time_lag
+FROM latest_blockchain lb, latest_db ld;
+
+-- Ingestion rate (transactions per minute)
+SELECT
+    COUNT(*) as txn_count,
+    FROM_UNIXTIME(MIN(timestamp)) as period_start,
+    FROM_UNIXTIME(MAX(timestamp)) as period_end
+FROM user_transaction
+WHERE timestamp > (toUnixTimestamp(now()) - 300);
+```
+
+## Recovery Procedures
+
+### 1. 
Restart Stuck Workers +```bash +# Restart the API service (includes workers) +npm run start:dev # Development +pm2 restart api # Production with PM2 +kubectl rollout restart deployment/api # Kubernetes +``` + +### 2. Clear Failed Jobs +```bash +# Connect to Redis +redis-cli + +# Clear failed jobs queue +DEL bull:ol-version:failed + +# Clear completed jobs +DEL bull:ol-version:completed +``` + +### 3. Force Re-ingestion +```sql +-- Remove ingestion markers for a range +DELETE FROM ingested_versions +WHERE version BETWEEN :start_version AND :end_version; +``` + +Then restart the worker to trigger re-processing. + +### 4. Manual Version Processing +```javascript +// Trigger specific version processing via API or console +await olVersionQueue.add('version', { + version: '123456789' +}, { + jobId: `__version__123456789` +}); +``` + +## Common Scenarios + +### Scenario 1: "No new transactions for several days" +**Likely Causes:** +1. RPC provider returning stale data +2. Worker not running +3. Transformer binary issues + +**Investigation Steps:** +1. Check RPC provider ledger version +2. Verify worker is enabled in ROLES +3. Check logs for transformer errors +4. Query ClickHouse for latest data +5. Check BullMQ for failed jobs + +### Scenario 2: "Transactions appear with delay" +**Likely Causes:** +1. Queue backlog +2. Slow RPC responses +3. ClickHouse ingestion delays + +**Investigation Steps:** +1. Check queue depth in Redis +2. Monitor RPC response times +3. Check ClickHouse insert performance + +## Environment Variables + +Critical configuration for transaction processing: + +```bash +# RPC endpoint for fetching blockchain data +RPC_PROVIDER_URL=https://rpc.provider.example.com + +# Worker roles (must include version-processor) +ROLES=api,version-processor,clickhouse-ingestor-processor + +# Database connections +CLICKHOUSE_HOST=127.0.0.1 +CLICKHOUSE_DATABASE=olfyi +REDIS_HOST=127.0.0.1 + +# For batch processing (optional) +DATA_API_HOST=https://data.provider.example.com +``` + +## Contact Points + +For issues with: +- **RPC Provider**: Check provider status page or contact provider support +- **ClickHouse**: Database administrator or DevOps team +- **Application Logs**: Check application logging system (CloudWatch, Datadog, etc.) +- **Queue Issues**: Redis/BullMQ monitoring dashboard \ No newline at end of file