Dev #5

Merged
IgorSemed0 merged 17 commits into main from dev on Nov 6, 2025
Conversation

@IgorSemed0
Owner

No description provided.

- Add .env.example with configuration template
- Support multiple AI providers (Gemini, OpenAI, Anthropic, local LLMs)
- Flexible configuration: request_url, token, model
- Include optional settings for tokens, temperature, timeout
- Update .gitignore to ensure .env is never committed
- Support Wayland (grim/slurp) and X11 (scrot/import)
- Auto-detect display server and available tools
- Capture full screen, regions, windows, or active window
- Interactive region selection
- Temporary file capture for AI vision
- Cross-platform screenshot utilities
- Replace OCR with AI vision for better UI understanding
- Support Google Gemini API for image analysis
- Find UI elements by natural language description
- Analyze screenshots and suggest actions
- Return element coordinates with confidence scores
- Flexible provider system (easy to add OpenAI, Claude, etc.)
- Base64 image encoding for API requests
- Add dotenv and base64 dependencies
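
The display-server auto-detection listed above could look roughly like the sketch below. It is an illustration only, not the code from this PR: the names `DisplayServer`, `detect_display_server` and `pick_screenshot_tool` are hypothetical, and the choice of environment variables (`XDG_SESSION_TYPE`, `WAYLAND_DISPLAY`, `DISPLAY`) is an assumption about how detection might be done. The environment lookup and the "is this tool installed" check are passed in as closures so the logic can be exercised without a real desktop session.

```rust
/// Display server families named in the commit notes (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DisplayServer {
    Wayland,
    X11,
    Unknown,
}

/// Guess the display server from environment variables.
/// `get_env` is injected so the logic can be tested without the real environment.
pub fn detect_display_server<F>(get_env: F) -> DisplayServer
where
    F: Fn(&str) -> Option<String>,
{
    // Treat empty values the same as unset ones.
    let non_empty = |key: &str| get_env(key).filter(|v| !v.is_empty());

    // XDG_SESSION_TYPE is the most explicit hint when it is set.
    match non_empty("XDG_SESSION_TYPE").as_deref() {
        Some("wayland") => return DisplayServer::Wayland,
        Some("x11") => return DisplayServer::X11,
        _ => {}
    }

    // Otherwise fall back to the socket/display variables.
    if non_empty("WAYLAND_DISPLAY").is_some() {
        DisplayServer::Wayland
    } else if non_empty("DISPLAY").is_some() {
        DisplayServer::X11
    } else {
        DisplayServer::Unknown
    }
}

/// Pick the first installed capture tool for the detected server.
/// `is_installed` is injected (for example, a PATH lookup).
/// Assumption: slurp is only a region selector used alongside grim,
/// so it is not listed as a capture tool here.
pub fn pick_screenshot_tool<P>(server: DisplayServer, is_installed: P) -> Option<&'static str>
where
    P: Fn(&str) -> bool,
{
    let candidates: &[&'static str] = match server {
        DisplayServer::Wayland => &["grim"],
        DisplayServer::X11 => &["scrot", "import"],
        DisplayServer::Unknown => &[],
    };
    candidates.iter().copied().find(|&tool| is_installed(tool))
}
```

In a real binary, `get_env` could be `|k| std::env::var(k).ok()`. Injecting it keeps the decision logic pure, which makes the Wayland/X11 branches easy to test in CI.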
…tions

- Add mouse clicking (left, right, middle)
- Add mouse press/release for drag operations
- Add scrolling (vertical and horizontal)
- Add keyboard key press with support for special keys
- Add key down/up for key combinations
- Add get_mouse_position for current cursor location
- Parse common key names (Enter, Escape, Arrows, etc.)
- Support modifier keys (Ctrl, Alt, Shift, Meta)
- Detect if processes are running (pgrep)
- Launch applications
- Focus, maximize, minimize, close windows (wmctrl)
- Move and resize windows
- List all windows with properties
- Find windows by pattern/name
- Get active window (Wayland via gdbus, X11 via xdotool)
- Smart open-or-focus that checks if app is already running
- Support for both Wayland/GNOME and X11 environments
- Record sequences of user actions with timestamps
- Save/load action sequences as JSON files
- Action library for managing recorded sequences
- Support recording mouse, keyboard, app launch actions
- Playback recorded sequences with proper timing
- Tag and search sequences by category
- Action library stored in ~/.casper/actions/
- Foundation for learning and automation capabilities
- Add chrono dependency for timestamps
- Add all new screen control endpoints (click, scroll, keys)
- Add window management endpoints (focus, launch, list, find)
- Add action recording/playback endpoints
- Maintain daemon state for recorder, player, and library
- Support concurrent requests with proper locking
- Increase buffer size for larger payloads
- Add ping/status endpoint
- Better error handling and JSON responses
- Load action library from ~/.casper/actions on startup
- Update README with new features and v0.2.0 capabilities
- Add ARCHITECTURE.md with complete technical roadmap
- Add NEXT_STEPS.md with actionable weekly development guide
- Add Spotify Daily Mix example demonstrating full workflow
- Document all API endpoints and usage examples
- Include installation, testing, and contribution guidelines
- Provide learning resources and inspiration references
- Set clear milestones and success metrics
- Comprehensive AI vision tutorial with Gemini examples
- Real-world workflows showing screen understanding
- Quick start guide for 5-minute setup
- Troubleshooting tips and best practices
- Shell aliases for convenience
- Development mode setup instructions
- Document all new features and changes
- List breaking changes (none in this release)
- Include roadmap for future versions
- Reference git commit history
- Follow Keep a Changelog format
- Complete summary of all work accomplished
- Statistics and metrics
- Design decisions rationale
- File changes breakdown
- Project status and progress
- Next steps and recommendations
- Add tests for all new screen control features
- Add window management tests
- Add action recording workflow test
- Improve test output formatting
- Increase buffer size to 4096 for complex responses
- Enable tokio time feature for sleep in tests
- Add comprehensive test suite for all new features
- Test screen control, window management, and action recording
- Improve test output formatting with sections and emojis
- Auto-detect Hyprland via HYPRLAND_INSTANCE_SIGNATURE
- Use hyprctl for window management on Hyprland
- Parse hyprctl JSON output for window list
- Fallback to wmctrl for X11/generic Wayland
- Support for focus, list windows on Hyprland
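
The Hyprland detection and wmctrl fallback described in the last few items can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: `WindowBackend`, `detect_window_backend`, `list_windows_command` and `focus_window_command` are hypothetical names, and matching windows by title is an assumed choice. The sketch only builds the argument vectors; running them and parsing the `hyprctl` JSON output is left out.

```rust
/// Window-management backends mentioned in the commit notes (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WindowBackend {
    Hyprland,
    Wmctrl,
}

/// Choose Hyprland when HYPRLAND_INSTANCE_SIGNATURE is set and non-empty,
/// otherwise fall back to wmctrl. `get_env` is injected for testability.
pub fn detect_window_backend<F>(get_env: F) -> WindowBackend
where
    F: Fn(&str) -> Option<String>,
{
    match get_env("HYPRLAND_INSTANCE_SIGNATURE") {
        Some(sig) if !sig.is_empty() => WindowBackend::Hyprland,
        _ => WindowBackend::Wmctrl,
    }
}

/// Build the command line that lists windows for the given backend.
pub fn list_windows_command(backend: WindowBackend) -> Vec<String> {
    let args: &[&str] = match backend {
        // `hyprctl clients -j` prints the client list as JSON.
        WindowBackend::Hyprland => &["hyprctl", "clients", "-j"],
        // `wmctrl -l` prints one window per line.
        WindowBackend::Wmctrl => &["wmctrl", "-l"],
    };
    args.iter().map(|s| s.to_string()).collect()
}

/// Build the command line that focuses a window by title.
/// Assumption: callers pass a title; Hyprland treats the part after
/// `title:` as a regex, while `wmctrl -a` matches a title substring.
pub fn focus_window_command(backend: WindowBackend, title: &str) -> Vec<String> {
    match backend {
        WindowBackend::Hyprland => vec![
            "hyprctl".to_string(),
            "dispatch".to_string(),
            "focuswindow".to_string(),
            format!("title:{}", title),
        ],
        WindowBackend::Wmctrl => vec![
            "wmctrl".to_string(),
            "-a".to_string(),
            title.to_string(),
        ],
    }
}
```

The returned vector can be handed to `std::process::Command` by using the first element as the program and the rest as arguments. Keeping command construction separate from execution lets the backend choice be tested on machines that have neither tool installed.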
IgorSemed0 merged commit 4ea3eee into main on Nov 6, 2025
1 check failed
