69 changes: 69 additions & 0 deletions sql-cli/analyze_memory.md
@@ -0,0 +1,69 @@
# Memory Usage Analysis for 100k Row CSV

## Current Data Duplication Issue

When loading a 100k-row CSV file, the same data ends up stored in up to four places (sketched after the list below):

### 1. CsvDataSource (src/data/csv_datasource.rs)
- Stores data as `Vec<serde_json::Value>`
- Each row is a JSON object, so every field name is repeated in every row

### 2. QueryResponse (src/api_client.rs)
- Contains `data: Vec<Value>` - another copy of the JSON data
- Stored in Buffer.results

### 3. Buffer.filtered_data (optional)
- When filtering: `Vec<Vec<String>>` - string representation of filtered rows

### 4. Buffer.cached_data (optional)
- Another `Vec<serde_json::Value>` for caching

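Taken together, the duplication looks roughly like this (a minimal sketch; the struct shapes are paraphrased from the descriptions above, not the exact definitions):

```rust
use serde_json::Value;

// Paraphrased shapes only: each layer holds its own full copy of the rows.
struct CsvDataSource {
    rows: Vec<Value>, // copy 1: parsed CSV, field names repeated per row
}

struct QueryResponse {
    data: Vec<Value>, // copy 2: query results
}

struct Buffer {
    results: QueryResponse,                  // owns copy 2
    filtered_data: Option<Vec<Vec<String>>>, // copy 3: stringified filtered rows
    cached_data: Option<Vec<Value>>,         // copy 4: cache
}
```
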
## Memory Overhead Calculation

For a typical trade record with 7 fields:
```
{
"id": 12345,
"symbol": "AAPL",
"price": 150.25,
"quantity": 100,
"timestamp": "2024-01-15T10:30:00Z",
"side": "BUY",
"exchange": "NASDAQ"
}
```

### JSON Object Overhead:
- Field names: ~50 bytes × 100k rows = 5MB
- serde_json::Value enum tags: 8 bytes × 7 fields × 100k = 5.6MB
- HashMap overhead: ~40 bytes × 100k = 4MB
- String allocations: Each string value has its own allocation
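
Spelling the arithmetic out (the per-item byte counts are the rough assumptions above, not measurements):

```rust
fn main() {
    const ROWS: usize = 100_000;
    const FIELDS: usize = 7;

    let field_names = 50 * ROWS;       // repeated JSON keys: ~5.0 MB
    let enum_tags = 8 * FIELDS * ROWS; // Value discriminants: ~5.6 MB
    let map_overhead = 40 * ROWS;      // per-row map bookkeeping: ~4.0 MB

    let total = field_names + enum_tags + map_overhead;
    println!("structural overhead ≈ {:.1} MB per copy", total as f64 / 1e6);
    // ≈ 14.6 MB per copy, before counting the string payloads themselves
}
```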

### Total Memory Usage:
- Raw data: ~100 bytes × 100k = 10MB
- JSON representation: ~300 bytes × 100k = 30MB
- Multiple copies: 30MB × 2-3 = 60-90MB minimum
- Plus heap fragmentation and allocator overhead

**Result: 10MB of actual data becomes 100MB+ in memory**

## Solution Options

### Short-term Fix (V46)
1. Remove duplicate storage of cached_data when not needed
2. Use indices instead of copying filtered data
3. Clear unused data after loading

### Long-term Fix (V50+)
1. Migrate to DataTable with columnar storage (see the sketch below)
2. Store data only once in efficient format
3. Use views/indices for filtering and sorting
4. Lazy loading for large datasets
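
A minimal sketch of what columnar storage buys; all type and field names here are hypothetical, not the actual V46 definitions:

```rust
// Field names live once in the schema; each column is one contiguous,
// typed allocation with no per-value enum tag and no per-row map.
enum Column {
    Integer(Vec<i64>),
    Float(Vec<f64>),
    Text(Vec<String>),
    Boolean(Vec<bool>),
}

struct DataTable {
    column_names: Vec<String>, // "id", "symbol", ... stored once, not per row
    columns: Vec<Column>,      // one typed vector per column
}

// A sort or filter becomes a view: a list of row indices into the
// table, rather than another copy of the data.
struct DataView {
    visible_rows: Vec<usize>,
}
```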

## Immediate Recommendation

For V46, we should:
1. Avoid storing `cached_data` unless actually caching
2. Use filter indices instead of `filtered_data` copies (sketched below)
3. Implement streaming for large CSV files
4. Consider compression for string columns
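
A minimal sketch of index-based filtering (the helper name is hypothetical): the original rows stay untouched, and only the indices of matching rows are stored.

```rust
use serde_json::{json, Value};

// Keep one copy of the rows; a filter result is just Vec<usize>.
fn filter_indices(rows: &[Value], predicate: impl Fn(&Value) -> bool) -> Vec<usize> {
    rows.iter()
        .enumerate()
        .filter(|(_, row)| predicate(row))
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    let rows = vec![
        json!({"symbol": "AAPL", "quantity": 100}),
        json!({"symbol": "MSFT", "quantity": 50}),
    ];
    // Costs ~8 bytes per matching row instead of a full stringified copy.
    let visible = filter_indices(&rows, |r| r["quantity"].as_i64().unwrap_or(0) > 50);
    assert_eq!(visible, vec![0]);
}
```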
43 changes: 37 additions & 6 deletions sql-cli/integration_tests/README.md
@@ -2,9 +2,18 @@

This directory contains all integration and test files for the SQL CLI project.

## Directory Structure

```
integration_tests/
├── test_scripts/ # Shell scripts for testing features
├── test_data/ # CSV and other data files for tests
└── *.rs # Rust integration test files
```

## Organization

### Shell Scripts (*.sh)
### Shell Scripts (`test_scripts/`)
- `test_all_fixes.sh` - Comprehensive test suite for all fixes
- `test_buffer_switch.sh` - Tests buffer switching functionality
- `test_column_search.sh` - Tests column search feature
@@ -61,20 +70,42 @@ These are standalone test programs that can be compiled and run individually:
- `test_history_debug.rs`, `test_history_unit.rs` - History system tests
- `test_state_init.rs` - State initialization

### Test Data (`test_data/`)
- Sample CSV files with various data types and structures
- Query result exports for regression testing
- Test fixtures for specific scenarios

## Running Tests

### Shell Scripts
```bash
cd integration_tests
./test_history_search.sh # or any other .sh file
# From project root
./integration_tests/test_scripts/test_history_search.sh

# Or for version-specific tests
./integration_tests/test_scripts/test_v46_datatable.sh
```

### Rust Test Files
```bash
cd integration_tests
rustc test_csv.rs && ./test_csv # Compile and run individual test
# Run all integration tests
cargo test --test '*'

# Run specific test
cargo test --test test_csv

# With debug output
RUST_LOG=debug cargo test --test test_name -- --nocapture
```

## Version Tests

Tests are versioned to match our DataTable migration strategy:
- **V40-V45**: Trait-based migration (✅ complete)
- **V46-V50**: DataTable introduction (🚧 in progress)
- **V51-V60**: DataView implementation (📋 planned)
- **V61-V70**: Full migration completion (📋 planned)

## Note
These tests were moved from the main project directory to keep it clean and organized.
All paths in the test files assume they're run from the integration_tests directory.
Test scripts may need path adjustments if test data locations have changed.
6 changes: 6 additions & 0 deletions sql-cli/integration_tests/test_data/test_datatable.csv
@@ -0,0 +1,6 @@
id,name,age,salary,active,joined_date
1,Alice,30,75000.50,true,2020-01-15
2,Bob,25,60000.00,false,2021-03-20
3,Charlie,35,85000.75,true,2019-06-01
4,Diana,28,70000.25,true,2022-02-10
5,Eve,32,,false,2020-11-30
36 changes: 36 additions & 0 deletions sql-cli/integration_tests/test_memory.rs
@@ -0,0 +1,36 @@
use serde_json::json;

fn main() {
    // Test memory usage of serde_json::Value
    let json_val = json!({
        "id": 12345,
        "symbol": "AAPL",
        "price": 150.25,
        "quantity": 100,
        "timestamp": "2024-01-15T10:30:00Z",
        "side": "BUY",
        "exchange": "NASDAQ"
    });

    // Note: size_of_val reports only the stack size of the Value enum
    // itself, not the heap-allocated map, keys, and strings behind it.
    println!(
        "Size of serde_json::Value: {} bytes",
        std::mem::size_of_val(&json_val)
    );

    // String version
    let str_vec = vec![
        "12345".to_string(),
        "AAPL".to_string(),
        "150.25".to_string(),
        "100".to_string(),
        "2024-01-15T10:30:00Z".to_string(),
        "BUY".to_string(),
        "NASDAQ".to_string(),
    ];

    // Likewise, this is the Vec header only, not the strings it owns.
    println!("Size of Vec<String>: {} bytes", std::mem::size_of_val(&str_vec));

    // Actual string content size
    let json_str = serde_json::to_string(&json_val).unwrap();
    println!("JSON string length: {} bytes", json_str.len());

    let total_str_len: usize = str_vec.iter().map(|s| s.len()).sum();
    println!("Total string content: {} bytes", total_str_len);
}
56 changes: 56 additions & 0 deletions sql-cli/integration_tests/test_scripts/test_v46_datatable.sh
@@ -0,0 +1,56 @@
#!/bin/bash

# Test script for V46: DataTable Introduction

echo "Testing V46: DataTable Introduction"
echo "===================================="

# Build the project
echo "Building project..."
# Filter the build output, but check cargo's own exit code: $? here would
# report grep's status, not the build's.
cargo build --release 2>&1 | grep -E "error|warning|Finished"
build_status=${PIPESTATUS[0]}

if [ "$build_status" -ne 0 ]; then
    echo "Build failed!"
    exit 1
fi

echo ""
echo "Test: DataTable Conversion Demo"
echo "--------------------------------"
echo "1. Load a CSV file"
echo "2. Press F6 to convert current results to DataTable"
echo "3. Check status message for memory comparison"
echo "4. Check debug logs for column type information"
echo ""

# Create test data
cat > test_datatable.csv << EOF
id,name,age,salary,active,joined_date
1,Alice,30,75000.50,true,2020-01-15
2,Bob,25,60000.00,false,2021-03-20
3,Charlie,35,85000.75,true,2019-06-01
4,Diana,28,70000.25,true,2022-02-10
5,Eve,32,,false,2020-11-30
EOF

echo "Test data created: test_datatable.csv"
echo ""
echo "Running tests..."

# Test DataTable conversion
cargo test --lib data::datatable::tests::test_from_query_response -- --nocapture 2>&1 | grep -E "test result|V46"

echo ""
echo "Instructions for manual testing:"
echo "1. Run: RUST_LOG=debug ./target/release/sql-cli test_datatable.csv"
echo "2. After data loads, press F6"
echo "3. Look for 'V46: DataTable created!' in status bar"
echo "4. Check debug logs (F5) for detailed column information"
echo ""
echo "Expected behavior:"
echo "- Status shows memory comparison (JSON vs DataTable)"
echo "- Debug logs show column types (Integer, String, Float, Boolean, DateTime)"
echo "- Memory usage should be lower for DataTable"
echo ""
echo "===================================="
echo "V46 DataTable Introduction Test Complete!"