@NripeshN (Owner) commented Jan 20, 2026

Note

Major CI and engine updates.

  • CI (Elo Tournament): Split into setup-macos/setup-linux with parallel match jobs; add Berserk and opening book; new MetalFish wrappers; artifact packaging per OS; cancel in-progress runs; updated defaults (games=16, time=600+0.1).
  • GPU/Engine: Remove legacy gpu/batch_ops.* and gpu/gpu_accumulator.*; add unified GPU NNUE manager usage in Eval (Apple-only toggle), update benchmark_gpu.cpp accordingly.
  • CUDA/Backend: Enhance backend with hardware capability queries, recommended batch sizing, and shutdown hooks; expand/format CUDA kernels and headers.
  • Build & CMake: Rework GPU/MCTS source lists, switch Metal shader to nnue.metal, link Accelerate, adjust tests, and tidy properties.
  • Docs & Repo: Expand README (MCTS/Hybrid details, usage, tournament), ignore results/ and network file.

Written by Cursor Bugbot for commit 0242e8f. This will update automatically on new commits.

- Updated HybridSearchBridge to utilize the Stockfish engine for move verification and search operations, improving accuracy and performance.
- Added support for synchronous search in the Engine class, allowing for real-time move evaluations.
- Enhanced fallback mechanisms to ensure functionality when the engine is not available, maintaining robustness in MCTS operations.
- Updated documentation to reflect the integration of the Stockfish engine and its impact on search capabilities.
cursor[bot]

This comment was marked as outdated.

…-to-move

- Updated score calculations in verify_with_alphabeta to correctly negate scores based on the player's perspective (white vs. black).
- Added comments for clarity on the score negation logic, improving code readability and maintainability.
- Removed redundant updates to ab_nodes in HybridSearchBridge, clarifying the statistics handling.
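
One plausible reading of the perspective fix above can be sketched as follows. It assumes the alpha-beta score arrives relative to the side to move and that the caller wants it relative to White; the thread does not state which direction the conversion goes. The `Color` enum and `to_white_perspective` are hypothetical names, not taken from the MetalFish sources.

```cpp
#include <cassert>

// Hypothetical names, for illustration only.
enum class Color { White, Black };

// Convert a side-to-move-relative score into a White-relative score.
// Assumption: positive input means "good for the side to move".
constexpr int to_white_perspective(int score_stm, Color side_to_move) {
    return side_to_move == Color::White ? score_stm : -score_stm;
}
```

With this convention, a score of +50 reported while Black is to move becomes -50 from White's point of view, and a score reported while White is to move is passed through unchanged.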
…e verification

- Modified the initialize method to accept an Engine parameter, enabling the use of Stockfish for more accurate alpha-beta verification.
- Enhanced the verify_with_alphabeta function to utilize the engine for move evaluations, providing a fallback to NNUE when the engine is unavailable.
- Updated the create_enhanced_hybrid_search factory function to pass the engine instance, ensuring seamless integration.
- Improved comments and code structure for better readability and maintainability.

- Removed EnhancedHybridSearch and its associated files, streamlining the codebase.
- Added ParallelHybridSearch, which integrates MCTS and Alpha-Beta search for improved performance.
- Updated CMakeLists.txt to include new source files and removed references to the deleted enhanced hybrid search.
- Modified UCI commands to support the new parallel hybrid search functionality.
- Adjusted CI workflow and wrapper scripts to accommodate the changes in search commands and engine configurations.

These updates enhance the search capabilities of MetalFish, optimizing for both strategic and tactical play through parallel processing.

- Introduced new tests for ABSearchResult, ABSearchConfig, and ABSearcher to validate functionality and default configurations.
- Added TacticalAnalyzer and HybridSearchBridge tests to ensure proper behavior and initialization.
- Updated test_mcts.cpp to include these new tests, enhancing coverage for the MCTS module and ensuring robustness in search capabilities.

- Removed legacy GPU components including nnue_eval, batch_ops, and persistent_pipeline to streamline the codebase.
- Introduced new Apple Silicon-specific optimizations in MCTS, including unified memory handling and SIMD-accelerated computations.
- Updated CMakeLists.txt to reflect the removal of obsolete files and added new sources for Apple Silicon optimizations.
- Enhanced the MCTS algorithms with Lc0-inspired techniques, improving performance and efficiency in search operations.
- Adjusted CI workflows to accommodate the changes in GPU and MCTS implementations, ensuring compatibility and robustness.

These updates significantly enhance the performance of MetalFish on Apple Silicon, optimizing both memory usage and computational efficiency.
…lHybridSearch

- Added an initialization check in the start_search method to prevent search execution without proper setup, ensuring stability.
- Refined the decision-making logic in make_final_decision to better weigh confidence and score differences, improving move selection accuracy.
- Updated async evaluation submission to safely capture batch count, preventing potential use-after-free issues.

These changes enhance the robustness and effectiveness of the ParallelHybridSearch component, optimizing its performance in various scenarios.

- Introduced a custom deleter for posix_memalign-allocated memory to ensure proper deallocation using free() instead of delete[].
- Updated memory handling in the MCTS implementation to utilize unique_ptr with the new AlignedDeleter on Apple platforms, enhancing memory management and compatibility.

These changes improve memory safety and performance on Apple Silicon, aligning with recent optimizations in the codebase.
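
A minimal sketch of the deleter pattern described above, assuming a POSIX platform. `AlignedDeleter` is the name used in the commit message; `AlignedArray` and `make_aligned_array` are hypothetical helpers introduced here, and the real MetalFish code may be structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <stdlib.h>  // posix_memalign (POSIX, not ISO C++)
#include <type_traits>

// Memory obtained from posix_memalign must be released with free(),
// so unique_ptr's default delete[] would be undefined behaviour here.
struct AlignedDeleter {
    void operator()(void* p) const noexcept { std::free(p); }
};

template <typename T>
using AlignedArray = std::unique_ptr<T[], AlignedDeleter>;

// Hypothetical helper: allocate raw storage for `count` objects of a trivial
// type T. `alignment` must be a power of two and a multiple of sizeof(void*).
// Returns an empty pointer if the allocation fails.
template <typename T>
AlignedArray<T> make_aligned_array(std::size_t count, std::size_t alignment) {
    static_assert(std::is_trivial<T>::value,
                  "no constructors are run on the raw storage");
    void* raw = nullptr;
    if (::posix_memalign(&raw, alignment, count * sizeof(T)) != 0) {
        return AlignedArray<T>{};
    }
    return AlignedArray<T>{static_cast<T*>(raw)};
}
```

For example, `make_aligned_array<float>(16, 64)` would yield a cache-line-aligned buffer that is freed with `free()` when the `unique_ptr` goes out of scope.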

The FPU calculation was using -parent_q instead of parent_q, which inverted
the exploration behavior. This caused unvisited nodes to be over-punished in
winning positions and incorrectly preferred in losing positions.

Fixed by changing the formula from:
  fpu = -parent_q - reduction
to:
  fpu = parent_q - reduction

This fix was applied in:
- src/mcts/hybrid_search.cpp
- src/mcts/thread_safe_mcts.cpp
- src/mcts/lc0_mcts_core.h (ComputeFpu and ComputeFpuSimple)

The correct formula ensures unvisited nodes are slightly pessimistic compared
to visited children, as intended by the FPU reduction strategy.
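
The two formulas quoted above can be compared directly. The sketch below uses hypothetical free functions rather than the real `ComputeFpu` and `ComputeFpuSimple` signatures, which are not shown in this thread, and it assumes `parent_q` is already expressed from the perspective of the player choosing among the children.

```cpp
#include <cassert>

// Corrected first-play urgency: start from the parent's Q and subtract
// the reduction, so unvisited children look slightly pessimistic.
inline float compute_fpu(float parent_q, float reduction) {
    return parent_q - reduction;
}

// The pre-fix formula, kept only to show the inverted behaviour.
inline float compute_fpu_buggy(float parent_q, float reduction) {
    return -parent_q - reduction;
}
```

With `parent_q = 0.5` and `reduction = 0.25`, the corrected formula gives 0.25 while the old one gives -0.75, which over-punishes unvisited nodes in a winning position. With `parent_q = -0.5`, the corrected formula gives -0.75 while the old one gives 0.25, which makes unvisited nodes look attractive in a losing position.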

cursor bot commented Jan 20, 2026

Bugbot Autofix resolved the bug found in the latest run.

  • ✅ Fixed: FPU formula has inverted sign causing wrong exploration
    • Changed FPU formula from -parent_q - reduction to parent_q - reduction in all four locations (hybrid_search.cpp, thread_safe_mcts.cpp, and both ComputeFpu and ComputeFpuSimple in lc0_mcts_core.h) to correctly implement pessimistic unvisited node values.

@NripeshN
Owner Author

@cursor review

…max functions

- Bug 1: Use config_.cpuct_base and config_.cpuct_factor instead of hardcoded local constants in select_child_puct
- Bug 2: Handle temperature==0 as argmax in compute_softmax_simd to prevent division by zero
- Bug 3: Handle temperature==0 as argmax in expand_node policy softmax to prevent division by zero
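
The temperature handling in Bugs 2 and 3 can be illustrated with a scalar sketch. This is not the SIMD code from `compute_softmax_simd`; the function name is hypothetical, and treating negative temperatures the same as zero is an assumption made here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Scalar softmax with temperature. A temperature of zero (or below, by
// assumption) is handled as argmax so that no division by zero occurs.
inline std::vector<float> softmax_with_temperature(
        const std::vector<float>& logits, float temperature) {
    std::vector<float> probs(logits.size(), 0.0f);
    if (logits.empty()) return probs;

    const auto max_it = std::max_element(logits.begin(), logits.end());
    if (temperature <= 0.0f) {
        // Argmax case: all probability mass on the first maximal logit.
        probs[static_cast<std::size_t>(max_it - logits.begin())] = 1.0f;
        return probs;
    }

    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        // Subtract the maximum before exponentiating for numerical stability.
        probs[i] = std::exp((logits[i] - *max_it) / temperature);
        sum += probs[i];
    }
    // sum >= 1 because the maximal logit contributes exp(0) = 1.
    for (float& p : probs) p /= sum;
    return probs;
}
```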

cursor bot commented Jan 20, 2026

Bugbot Autofix resolved all 3 of the 3 bugs found in the latest run.

  • ✅ Fixed: Config parameters cpuct_base and cpuct_factor are ignored
    • Replaced hardcoded local constants with config_.cpuct_base and config_.cpuct_factor in select_child_puct function.
  • ✅ Fixed: Division by zero when temperature is zero in softmax
    • Added temperature==0 check to handle argmax case before division in compute_softmax_simd function.
  • ✅ Fixed: Division by zero in expand_node policy softmax
    • Added temperature==0 check to handle argmax case before division in expand_node policy softmax.
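
For the first fix in the list above, the sketch below shows how `cpuct_base` and `cpuct_factor` might feed into the exploration constant. It assumes the Lc0-style logarithmic growth formula, which this thread does not confirm for `select_child_puct`. `PuctConfig` and its `cpuct_init` field are hypothetical stand-ins for `config_`.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for config_; only cpuct_base and cpuct_factor
// are named in the thread.
struct PuctConfig {
    float cpuct_init;    // assumed base exploration constant
    float cpuct_base;    // visit count scale for the growth term
    float cpuct_factor;  // weight of the growth term
};

// Exploration constant that grows slowly with the parent's visit count:
// cpuct = init + factor * ln((N + base) / base)
inline float compute_cpuct(const PuctConfig& cfg, std::uint32_t parent_visits) {
    const float n = static_cast<float>(parent_visits);
    return cfg.cpuct_init +
           cfg.cpuct_factor * std::log((n + cfg.cpuct_base) / cfg.cpuct_base);
}
```

Reading both values from the config object rather than from local constants means that changing them at runtime actually alters the search, which is the behaviour the fix restores.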

- Introduced a global cleanup function to ensure proper shutdown of GPU resources before program exit, preventing potential crashes during static destruction.
- Registered the cleanup function with atexit to guarantee execution at the end of the program.
- Implemented shutdown functions for MCTS components, GPU feature extractor, and GPU NNUE manager to streamline resource management.
- Updated the main function to explicitly handle cleanup and ensure all GPU operations are synchronized before destruction.

These changes improve the stability and reliability of the MetalFish engine, particularly in scenarios involving GPU resources.
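
The shutdown ordering described above might look roughly like this. All names in the `sketch` namespace are hypothetical placeholders for the real MCTS, feature-extractor, and NNUE-manager hooks. The idempotence guard is an assumption added here, on the grounds that the commit both calls cleanup from `main` and registers it with `atexit`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdlib>

namespace sketch {

std::atomic<bool> g_shutdown_done{false};
int g_shutdown_calls = 0;  // counts real shutdowns, for illustration only

// Placeholder for the real GPU/MCTS shutdown functions.
inline void shutdown_gpu_resources() { ++g_shutdown_calls; }

// Safe to call more than once: only the first call does any work, so an
// explicit call from main and the later atexit call cannot double-free.
inline void cleanup() {
    if (g_shutdown_done.exchange(true)) return;
    shutdown_gpu_resources();
}

// Register cleanup so it runs at normal program exit, before objects with
// static storage duration that were constructed earlier are destroyed.
inline bool register_cleanup() { return std::atexit(cleanup) == 0; }

}  // namespace sketch
```

A program would call `sketch::register_cleanup()` early in `main` and may still call `sketch::cleanup()` explicitly before returning; the later `atexit` invocation then becomes a no-op.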
…ce CPU backend shutdown handling

- Modified the GitHub Actions workflow to trigger on pull request events, allowing automatic updates of tournament results.
- Removed the manual input for PR number, streamlining the process by directly using the pull request context.
- Added no-op shutdown functions for the CPU backend to improve code clarity and maintainability, ensuring safe shutdown behavior in CPU fallback mode.

These changes enhance the automation of tournament result posting and improve the overall structure of the backend code.

…n MCTS

- Add nullptr check in AppleSiliconNodePool::allocate() to handle posix_memalign failures
- Add bounds checking when accessing GPU batch results to prevent out-of-bounds access
- Matches the defensive pattern used in hybrid_search.cpp

cursor bot commented Jan 21, 2026

Bugbot Autofix resolved 2 of the 2 bugs found in the latest run.

  • ✅ Fixed: Memory allocation failure causes undefined pointer arithmetic
    • Added a nullptr check at the start of allocate() so that it returns nullptr when posix_memalign has failed in the constructor, preventing undefined behavior from pointer arithmetic on a null pointer.
  • ✅ Fixed: Missing bounds check on GPU batch results access
    • Added bounds checking before accessing gpu_batch.psqt_scores[i] and gpu_batch.positional_scores[i], matching the defensive pattern used in hybrid_search.cpp to handle partial GPU evaluation failures.
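
The bounds check from the second fix can be sketched as below. Only the field names `psqt_scores` and `positional_scores` come from the thread. The struct layout, the helper name, and the choice to add the two scores together are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical shape of a GPU batch result.
struct GpuBatchResults {
    std::vector<std::int32_t> psqt_scores;
    std::vector<std::int32_t> positional_scores;
};

// Writes the combined score for position i into `out` and returns true,
// or returns false without touching `out` if the GPU produced no result
// for that index (for example after a partial evaluation failure).
inline bool try_gpu_score(const GpuBatchResults& batch, std::size_t i,
                          std::int32_t& out) {
    if (i >= batch.psqt_scores.size() || i >= batch.positional_scores.size()) {
        return false;
    }
    out = batch.psqt_scores[i] + batch.positional_scores[i];
    return true;
}
```

A caller that receives `false` can fall back to a CPU evaluation for that position instead of reading past the end of either array.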

- Introduced PGN parsing capabilities to extract individual game results and details, improving the tournament result reporting.
- Enhanced output formatting for game results, including color-coded results and move counts.
- Updated match summary display to include final scores and detailed game outputs.
- Added new shell scripts for MetalFish engine wrappers to facilitate hybrid and multi-threaded MCTS execution.

These changes significantly improve the usability and clarity of tournament results, while also streamlining engine interactions.

- Integrated the Berserk engine into the Elo tournament workflow, allowing it to be cloned and built alongside other engines.
- Updated the tournament logic to include Berserk as a configurable engine, with an expected Elo rating of 3550.
- Enhanced the output to check for the presence of the Berserk binary and report its status during the tournament execution.

These changes improve the tournament's engine diversity and provide additional options for performance evaluation.

- Introduced a new JSON configuration file for engine settings, allowing for dynamic loading of engine parameters such as expected Elo ratings and options.
- Updated the `elo_tournament.py` script to load engine configurations from the new `engines_config.json`, enhancing flexibility and maintainability.
- Added support for multiple engines including MetalFish variants, Patricia, Berserk, and Lc0, with their respective configurations.
- Improved the tournament setup process by integrating the new configuration loading mechanism, ensuring a more streamlined and customizable tournament experience.

These changes significantly enhance the configurability of the Elo tournament, allowing for easier adjustments and additions of new engines in the future.

- Reformatted the Python code in the GitHub Actions workflow for readability, using consistent string formatting and indentation.
- Kept the generation of macOS and Linux match matrices based on engine requirements functionally unchanged.
- Kept the extraction of match data explicit for both platforms, so that MetalFish and non-MetalFish engines both remain supported.

- Switched the GitHub Actions workflow from Qt6 to Qt5 dependencies.
- Added a configuration option to disable the GUI when building cutechess-cli, which suits headless CI environments.

- Added 'libqt5svg5-dev' and 'libqt5concurrent5' to the installation step in the GitHub Actions workflow for building cutechess-cli.
- This ensures the Qt5 components needed by the build are available.
@NripeshN requested a review from Copilot January 24, 2026 13:04

Copilot AI left a comment

Pull request overview

Copilot reviewed 52 out of 69 changed files in this pull request and generated 4 comments.

@NripeshN
Owner Author

@copilot apply changes based on the comments in this thread and commit directly to this branch

Contributor

Copilot AI commented Jan 24, 2026

@NripeshN I've opened a new pull request, #11, to work on those changes. Once the pull request is ready, I'll request review from you.

NripeshN and others added 3 commits January 24, 2026 13:10
- Added 'libqt5core5a', 'libqt5gui5', and 'libqt5widgets5' to the installation step in the GitHub Actions workflow for building cutechess-cli.
- This update ensures the Qt5 components required for a successful build are available.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is ON. A Cloud Agent has been kicked off to fix the reported issue.

- Updated the GitHub Actions workflow to install additional runtime dependencies for Qt5, ensuring successful builds for cutechess-cli.
- Introduced a new header file for Apple Silicon search optimizations, implementing hardware detection and tuning for improved performance on Apple chips.
- Key features include cache-line aligned structures, optimal thread counts, and memory prefetching tailored for unified memory architecture.

cursor bot commented Jan 24, 2026

Bugbot Autofix resolved the bug found in the latest run.

  • ✅ Fixed: Empty matrix check uses wrong JSON string format
    • Updated both if conditions to check for '{"include": []}' with a space after the colon, matching Python's json.dumps() output format.

- Updated the conditional statements in the GitHub Actions workflow for macOS and Linux match jobs to remove unnecessary string interpolation.
- This change enhances clarity and ensures proper evaluation of the matrix outputs for match execution.
@NripeshN force-pushed the hybrid-and-mcts-fix branch from f5e35d6 to cf8c1df January 24, 2026 13:35
…nalities

- Introduced new comprehensive test files for core, GPU, hybrid, and MCTS components to enhance test coverage and ensure robust functionality.
- Updated CMakeLists.txt to include these new test files in the build process.
- Enhanced the main test suite to support running both original and comprehensive tests, improving usability and flexibility for developers.
- Adjusted timeout settings in the Python testing script to accommodate longer-running tests.

- Reorganized test files by removing outdated comprehensive tests and introducing new modular test files for core, MCTS, hybrid, and GPU functionalities.
- Updated CMakeLists.txt to reflect the new test structure, ensuring proper inclusion of the updated test files in the build process.
- Enhanced the main test suite to streamline the execution of module-specific tests, improving clarity and maintainability for developers.
Repository owner deleted a comment from Copilot AI Jan 24, 2026
@NripeshN
Owner Author

@copilot Fix this issue #15 (comment)

Contributor

Copilot AI commented Jan 24, 2026

@NripeshN I've opened a new pull request, #22, to work on those changes. Once the pull request is ready, I'll request review from you.

Repository owner deleted a comment from Copilot AI Jan 24, 2026
NripeshN and others added 4 commits January 24, 2026 14:51
…ction

- Implemented GPUTuningParams::select_strategy for CPU fallback in cpu_backend.cpp.
- Updated test_tuning in test_gpu_module.cpp to handle strategy selection based on GPU availability, ensuring correct fallback behavior when GPU is not available.
- Enhanced test coverage for different batch sizes to validate strategy outcomes under varying conditions.
Co-authored-by: NripeshN <86844847+NripeshN@users.noreply.github.com>
Co-authored-by: NripeshN <86844847+NripeshN@users.noreply.github.com>
[WIP] Enhance MCTS hybrid search features

Labels

None yet

Projects

None yet

3 participants