Merged
- Add .env.example with configuration template
- Support multiple AI providers (Gemini, OpenAI, Anthropic, local LLMs)
- Flexible configuration: request_url, token, model
- Include optional settings for tokens, temperature, timeout
- Update .gitignore to ensure .env is never committed

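As a rough illustration of the configuration described above, the sketch below parses `.env`-style text into a provider config using only the standard library. The key names (`AI_REQUEST_URL`, `AI_TOKEN`, `AI_MODEL`, `AI_TEMPERATURE`, `AI_TIMEOUT`) and the `AiConfig` struct are assumptions made for this example; the actual `.env.example` may use different names, and the project itself lists a dotenv dependency rather than a hand-written parser.

```rust
use std::collections::HashMap;

/// Provider settings. Field names follow the commit text; the env keys
/// used below are hypothetical.
#[derive(Debug, PartialEq)]
struct AiConfig {
    request_url: String,
    token: String,
    model: String,
    temperature: Option<f32>,
    timeout_secs: Option<u64>,
}

/// Parse simple KEY=VALUE lines, skipping blank lines and `#` comments.
fn parse_env(text: &str) -> HashMap<String, String> {
    text.lines()
        .map(str::trim)
        .filter(|l| !l.is_empty() && !l.starts_with('#'))
        .filter_map(|l| l.split_once('='))
        .map(|(k, v)| (k.trim().to_string(), v.trim().trim_matches('"').to_string()))
        .collect()
}

/// Build a config: three required settings, the rest optional.
fn load_config(text: &str) -> Result<AiConfig, String> {
    let vars = parse_env(text);
    let required = |key: &str| {
        vars.get(key)
            .filter(|v| !v.is_empty())
            .cloned()
            .ok_or_else(|| format!("missing required setting {key}"))
    };
    Ok(AiConfig {
        request_url: required("AI_REQUEST_URL")?,
        token: required("AI_TOKEN")?,
        model: required("AI_MODEL")?,
        // Optional values that fail to parse are treated as absent.
        temperature: vars.get("AI_TEMPERATURE").and_then(|v| v.parse().ok()),
        timeout_secs: vars.get("AI_TIMEOUT").and_then(|v| v.parse().ok()),
    })
}

fn main() {
    let text = "AI_REQUEST_URL=https://example.invalid/v1\nAI_TOKEN=secret\nAI_MODEL=demo-model\n";
    println!("{:?}", load_config(text));
}
```

Keeping the provider behind three generic fields (URL, token, model) is what lets one code path serve several backends; only the values in `.env` change.
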
- Support Wayland (grim/slurp) and X11 (scrot/import)
- Auto-detect display server and available tools
- Capture full screen, regions, windows, or active window
- Interactive region selection
- Temporary file capture for AI vision
- Cross-platform screenshot utilities

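One plausible way to do the display-server auto-detection mentioned above is to inspect the usual session environment variables. This is a sketch under that assumption, not necessarily how the commit does it; `DisplayServer`, `detect_display_server`, and `candidate_tools` are hypothetical names.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum DisplayServer {
    Wayland,
    X11,
    Unknown,
}

/// Decide the display server from environment values passed in by the
/// caller (kept as parameters so the logic is easy to test).
fn detect_display_server(
    session_type: Option<&str>,
    wayland_display: Option<&str>,
    x_display: Option<&str>,
) -> DisplayServer {
    match session_type {
        Some("wayland") => return DisplayServer::Wayland,
        Some("x11") => return DisplayServer::X11,
        _ => {}
    }
    // Fall back to the socket/display variables when the session type is unset.
    if matches!(wayland_display, Some(v) if !v.is_empty()) {
        DisplayServer::Wayland
    } else if matches!(x_display, Some(v) if !v.is_empty()) {
        DisplayServer::X11
    } else {
        DisplayServer::Unknown
    }
}

/// Screenshot tools named in the commit, grouped by display server.
fn candidate_tools(server: DisplayServer) -> &'static [&'static str] {
    match server {
        DisplayServer::Wayland => &["grim", "slurp"],
        DisplayServer::X11 => &["scrot", "import"],
        DisplayServer::Unknown => &[],
    }
}

fn main() {
    let session = std::env::var("XDG_SESSION_TYPE").ok();
    let wayland = std::env::var("WAYLAND_DISPLAY").ok();
    let x = std::env::var("DISPLAY").ok();
    let server = detect_display_server(session.as_deref(), wayland.as_deref(), x.as_deref());
    println!("{server:?}: {:?}", candidate_tools(server));
}
```

A real implementation would also need to check that the chosen tool is actually installed before using it.
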
- Replace OCR with AI vision for better UI understanding
- Support Google Gemini API for image analysis
- Find UI elements by natural language description
- Analyze screenshots and suggest actions
- Return element coordinates with confidence scores
- Flexible provider system (easy to add OpenAI, Claude, etc.)
- Base64 image encoding for API requests
- Add dotenv and base64 dependencies

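To make the "Base64 image encoding for API requests" step concrete, here is a standard-library-only encoder for the standard Base64 alphabet with padding. The commit adds the `base64` crate for this job, so the hand-written `base64_encode` below is purely an illustration of what that encoding does to the screenshot bytes.

```rust
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encode bytes as standard Base64 with `=` padding.
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;

        // Emit four 6-bit symbols, replacing unused ones with padding.
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 {
            ALPHABET[(n >> 6) as usize & 63] as char
        } else {
            '='
        });
        out.push(if chunk.len() > 2 {
            ALPHABET[n as usize & 63] as char
        } else {
            '='
        });
    }
    out
}

fn main() {
    // In practice the input would be the bytes of a captured screenshot file.
    println!("{}", base64_encode(b"hello"));
}
```
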
…tions
- Add mouse clicking (left, right, middle)
- Add mouse press/release for drag operations
- Add scrolling (vertical and horizontal)
- Add keyboard key press with support for special keys
- Add key down/up for key combinations
- Add get_mouse_position for current cursor location
- Parse common key names (Enter, Escape, Arrows, etc.)
- Support modifier keys (Ctrl, Alt, Shift, Meta)

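The key-name parsing and modifier support listed above could look roughly like the following. The `Ctrl+Shift+Enter` string syntax, the accepted aliases, and the `Modifier`, `Key`, and `KeyCombo` types are assumptions for the sake of the example; the commit does not specify its input format.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Modifier {
    Ctrl,
    Alt,
    Shift,
    Meta,
}

#[derive(Debug, PartialEq, Clone)]
enum Key {
    Enter,
    Escape,
    Up,
    Down,
    Left,
    Right,
    Char(char),
}

#[derive(Debug, PartialEq)]
struct KeyCombo {
    modifiers: Vec<Modifier>,
    key: Key,
}

/// Map a lowercase key name to a `Key`; single characters pass through.
fn parse_key(name: &str) -> Result<Key, String> {
    match name {
        "enter" | "return" => Ok(Key::Enter),
        "escape" | "esc" => Ok(Key::Escape),
        "up" => Ok(Key::Up),
        "down" => Ok(Key::Down),
        "left" => Ok(Key::Left),
        "right" => Ok(Key::Right),
        other => {
            let mut chars = other.chars();
            match (chars.next(), chars.next()) {
                (Some(c), None) => Ok(Key::Char(c)),
                _ => Err(format!("unknown key name '{other}'")),
            }
        }
    }
}

/// Parse a `+`-separated combination such as "Ctrl+Shift+Enter".
fn parse_combo(spec: &str) -> Result<KeyCombo, String> {
    let mut modifiers = Vec::new();
    let mut key = None;
    for part in spec.split('+').map(str::trim) {
        match part.to_ascii_lowercase().as_str() {
            "ctrl" | "control" => modifiers.push(Modifier::Ctrl),
            "alt" => modifiers.push(Modifier::Alt),
            "shift" => modifiers.push(Modifier::Shift),
            "meta" | "super" => modifiers.push(Modifier::Meta),
            name => {
                if key.is_some() {
                    return Err(format!("more than one key in '{spec}'"));
                }
                key = Some(parse_key(name)?);
            }
        }
    }
    key.map(|key| KeyCombo { modifiers, key })
        .ok_or_else(|| format!("no key in '{spec}'"))
}

fn main() {
    println!("{:?}", parse_combo("Ctrl+Shift+Enter"));
}
```

A parsed combination maps naturally onto the key down/up primitives: press each modifier, press and release the key, then release the modifiers in reverse order.
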
- Detect if processes are running (pgrep)
- Launch applications
- Focus, maximize, minimize, close windows (wmctrl)
- Move and resize windows
- List all windows with properties
- Find windows by pattern/name
- Get active window (Wayland via gdbus, X11 via xdotool)
- Smart open-or-focus that checks if app is already running
- Support for both Wayland/Gnome and X11 environments

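The "smart open-or-focus" behavior can be sketched as a small decision on top of `pgrep` and `wmctrl`. The function names and the `Plan` enum are hypothetical, and the sketch assumes the window title contains the application name, which is what `wmctrl -a` matches against; the real implementation may locate the window differently.

```rust
use std::process::Command;

#[derive(Debug, PartialEq)]
enum Plan {
    Focus,
    Launch,
}

/// Pure decision: focus an app that is already running, otherwise launch it.
fn plan_open_or_focus(already_running: bool) -> Plan {
    if already_running {
        Plan::Focus
    } else {
        Plan::Launch
    }
}

/// `pgrep -x NAME` exits with status 0 when a process with that exact
/// name exists. A missing `pgrep` binary is treated as "not running".
fn is_running(process_name: &str) -> bool {
    Command::new("pgrep")
        .arg("-x")
        .arg(process_name)
        .output()
        .map(|out| out.status.success())
        .unwrap_or(false)
}

/// Focus the app's window if it is running, otherwise start it.
fn open_or_focus(app: &str) -> std::io::Result<Plan> {
    let plan = plan_open_or_focus(is_running(app));
    match plan {
        // `wmctrl -a` activates the first window whose title matches.
        Plan::Focus => {
            Command::new("wmctrl").args(["-a", app]).status()?;
        }
        Plan::Launch => {
            Command::new(app).spawn()?;
        }
    }
    Ok(plan)
}

fn main() {
    // Only report the decision here; call `open_or_focus` to act on it.
    let app = "firefox";
    println!("{app}: {:?}", plan_open_or_focus(is_running(app)));
    let _ = open_or_focus; // referenced so the sketch compiles without warnings
}
```
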
- Record sequences of user actions with timestamps
- Save/load action sequences as JSON files
- Action library for managing recorded sequences
- Support recording mouse, keyboard, app launch actions
- Playback recorded sequences with proper timing
- Tag and search sequences by category
- Action library stored in ~/.casper/actions/
- Foundation for learning and automation capabilities
- Add chrono dependency for timestamps

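"Playback with proper timing" presumably means reproducing the gaps between recorded timestamps. The sketch below shows that idea with millisecond offsets and the standard library only; the `Action` and `Recorded` types are invented for the example, and the project itself uses `chrono` timestamps and JSON files, which are omitted here.

```rust
use std::thread;
use std::time::Duration;

#[derive(Debug, Clone, PartialEq)]
enum Action {
    Click { x: i32, y: i32 },
    KeyPress(String),
    Launch(String),
}

/// One recorded step: when it happened (ms since some epoch) and what it was.
struct Recorded {
    at_ms: u64,
    action: Action,
}

/// Delay to wait before each step so the original spacing is preserved.
/// The first step gets a zero delay.
fn playback_delays(seq: &[Recorded]) -> Vec<Duration> {
    let mut prev = seq.first().map_or(0, |r| r.at_ms);
    seq.iter()
        .map(|r| {
            let gap = r.at_ms.saturating_sub(prev);
            prev = r.at_ms;
            Duration::from_millis(gap)
        })
        .collect()
}

/// Replay a sequence, sleeping for each gap and handing the action to `perform`.
fn play(seq: &[Recorded], mut perform: impl FnMut(&Action)) {
    for (step, delay) in seq.iter().zip(playback_delays(seq)) {
        thread::sleep(delay);
        perform(&step.action);
    }
}

fn main() {
    let seq = vec![
        Recorded { at_ms: 1_000, action: Action::Launch("demo-app".into()) },
        Recorded { at_ms: 1_020, action: Action::Click { x: 10, y: 20 } },
        Recorded { at_ms: 1_050, action: Action::KeyPress("Enter".into()) },
    ];
    play(&seq, |action| println!("{action:?}"));
}
```

Using `saturating_sub` means out-of-order timestamps produce a zero delay instead of a panic.
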
- Add all new screen control endpoints (click, scroll, keys)
- Add window management endpoints (focus, launch, list, find)
- Add action recording/playback endpoints
- Maintain daemon state for recorder, player, and library
- Support concurrent requests with proper locking
- Increase buffer size for larger payloads
- Add ping/status endpoint
- Better error handling and JSON responses
- Load action library from ~/.casper/actions on startup

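Shared daemon state with "proper locking" is commonly modelled as a mutex-guarded struct that every request handler locks briefly. The following is a minimal sketch of that pattern with threads from the standard library; the request names (`ping`, `record/start`, `record/stop`), the `DaemonState` fields, and the JSON shapes are assumptions, not the daemon's actual protocol.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// State shared by all request handlers (fields are illustrative).
#[derive(Default)]
struct DaemonState {
    recording: bool,
    recorded: Vec<String>,
}

/// Handle one request while holding the lock, returning a JSON string.
fn handle_request(state: &Mutex<DaemonState>, request: &str) -> String {
    // Recover the data even if another handler panicked while holding the lock.
    let mut s = state.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
    match request {
        "ping" => r#"{"status":"ok"}"#.to_string(),
        "record/start" => {
            s.recording = true;
            s.recorded.clear();
            r#"{"status":"ok"}"#.to_string()
        }
        "record/stop" => {
            s.recording = false;
            format!(r#"{{"status":"ok","recorded":{}}}"#, s.recorded.len())
        }
        // Any other request is treated as an action and logged while recording.
        action => {
            if s.recording {
                s.recorded.push(action.to_string());
            }
            r#"{"status":"ok"}"#.to_string()
        }
    }
}

fn main() {
    let state = Arc::new(Mutex::new(DaemonState::default()));
    handle_request(&state, "record/start");

    // Simulate several clients sending requests at the same time.
    let workers: Vec<_> = (0..4)
        .map(|_| {
            let state = Arc::clone(&state);
            thread::spawn(move || handle_request(&state, "click"))
        })
        .collect();
    for worker in workers {
        worker.join().unwrap();
    }
    println!("{}", handle_request(&state, "record/stop"));
}
```
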
- Update README with new features and v0.2.0 capabilities
- Add ARCHITECTURE.md with complete technical roadmap
- Add NEXT_STEPS.md with actionable weekly development guide
- Add Spotify Daily Mix example demonstrating full workflow
- Document all API endpoints and usage examples
- Include installation, testing, and contribution guidelines
- Provide learning resources and inspiration references
- Set clear milestones and success metrics

- Comprehensive AI vision tutorial with Gemini examples
- Real-world workflows showing screen understanding
- Quick start guide for 5-minute setup
- Troubleshooting tips and best practices
- Shell aliases for convenience
- Development mode setup instructions

- Document all new features and changes
- List breaking changes (none in this release)
- Include roadmap for future versions
- Reference git commit history
- Follow Keep a Changelog format

- Complete summary of all work accomplished
- Statistics and metrics
- Design decisions rationale
- File changes breakdown
- Project status and progress
- Next steps and recommendations

- Add tests for all new screen control features
- Add window management tests
- Add action recording workflow test
- Improve test output formatting
- Increase buffer size to 4096 for complex responses

- Enable tokio time feature for sleep in tests
- Add comprehensive test suite for all new features
- Test screen control, window management, and action recording
- Improve test output formatting with sections and emojis

- Auto-detect Hyprland via HYPRLAND_INSTANCE_SIGNATURE
- Use hyprctl for window management on Hyprland
- Parse hyprctl JSON output for window list
- Fallback to wmctrl for X11/generic Wayland
- Support for focus, list windows on Hyprland

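The Hyprland detection and `wmctrl` fallback described above amount to choosing a backend from one environment variable. Here is a sketch of that selection; `WindowBackend`, `pick_backend`, and `list_windows_argv` are hypothetical names, and parsing the JSON that `hyprctl clients -j` prints is left out.

```rust
#[derive(Debug, PartialEq)]
enum WindowBackend {
    Hyprctl,
    Wmctrl,
}

/// Hyprland sets HYPRLAND_INSTANCE_SIGNATURE for processes in its session;
/// anything else falls back to wmctrl.
fn pick_backend(hyprland_signature: Option<&str>) -> WindowBackend {
    match hyprland_signature {
        Some(sig) if !sig.is_empty() => WindowBackend::Hyprctl,
        _ => WindowBackend::Wmctrl,
    }
}

/// Command line used to list windows with each backend.
fn list_windows_argv(backend: &WindowBackend) -> Vec<&'static str> {
    match backend {
        // `-j` asks hyprctl for JSON output.
        WindowBackend::Hyprctl => vec!["hyprctl", "clients", "-j"],
        // `-l` prints one managed window per line.
        WindowBackend::Wmctrl => vec!["wmctrl", "-l"],
    }
}

fn main() {
    let signature = std::env::var("HYPRLAND_INSTANCE_SIGNATURE").ok();
    let backend = pick_backend(signature.as_deref());
    println!("{backend:?}: {}", list_windows_argv(&backend).join(" "));
}
```
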