This repository is a proof of concept for a computer-use agent that can observe and steer an iPhone app running on a real device.
It has two core parts:
- A device adapter that captures state and executes taps, swipes, typing, and waits on the phone.
- A planner that asks OpenAI for the next safe UI action from the latest screenshot and optional UI tree.
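The split above hinges on a narrow action protocol: the planner returns one JSON action per step, and the controller refuses anything the device adapter cannot safely execute. The sketch below illustrates that validation step in TypeScript; all type and function names are hypothetical, not the project's actual API.

```typescript
// Hypothetical action shapes for the four primitives the adapter supports.
type UiAction =
  | { kind: "tap"; x: number; y: number }
  | { kind: "swipe"; fromX: number; fromY: number; toX: number; toY: number }
  | { kind: "type"; text: string }
  | { kind: "wait"; ms: number };

// Parse the planner's JSON reply, rejecting anything outside the small
// allowlist of safe actions the device adapter knows how to run.
function parsePlannerReply(raw: string): UiAction {
  const obj = JSON.parse(raw);
  switch (obj.kind) {
    case "tap":
      if (typeof obj.x === "number" && typeof obj.y === "number") return obj;
      break;
    case "swipe":
      if ([obj.fromX, obj.fromY, obj.toX, obj.toY].every((n) => typeof n === "number")) return obj;
      break;
    case "type":
      if (typeof obj.text === "string") return obj;
      break;
    case "wait":
      if (typeof obj.ms === "number" && obj.ms >= 0) return obj;
      break;
  }
  throw new Error(`Unsafe or malformed action: ${raw}`);
}

console.log(parsePlannerReply('{"kind":"tap","x":120,"y":480}'));
```

Keeping the allowlist this small is what makes the planner's output "safe": a malformed or unexpected reply fails closed instead of reaching the device.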
- Install dependencies.

  ```bash
  npm install
  ```
- Create a local env file and fill in the required values.

  ```bash
  cp .env.example .env
  ```
- Edit `session-goal.md` with the flow you want the agent to complete. If you want to keep multiple goal files, set `SESSION_GOAL_PATH` in `.env` to point at a different file.
- Create a local Appium capabilities file and fill in the required values.

  ```bash
  cp appium/capabilities.example.json appium/capabilities.json
  ```
- Install and start Appium.

  ```bash
  npm install -g appium
  appium driver install xcuitest
  appium
  ```
- Start the controller.

  ```bash
  npm start
  ```
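For reference, a filled-in `.env` might look like the fragment below. Every variable name except `SESSION_GOAL_PATH` is an assumption; `.env.example` is the authoritative list.

```ini
# Assumed variable names; check .env.example for the real ones.
OPENAI_API_KEY=sk-...
SESSION_GOAL_PATH=session-goal.md
```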
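The capabilities file tells Appium's XCUITest driver which device and app to drive. A minimal sketch for a real device is below; the placeholder values, and the exact set of keys this project expects, must come from your own device and `appium/capabilities.example.json`.

```json
{
  "platformName": "iOS",
  "appium:automationName": "XCUITest",
  "appium:udid": "<device-udid>",
  "appium:bundleId": "<app-bundle-id>"
}
```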
A concrete Benevita example lives in `examples/benevita/README.md`.
It shows the minimal command needed to run an app-specific scenario by reusing the base project setup and pointing `SESSION_GOAL_PATH` at the example goal file.
Additional Appium setup notes live in `appium/README.md`.
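Assuming the example ships its own goal file (the path below is a guess; the example README is authoritative), that invocation reduces to overriding `SESSION_GOAL_PATH` at launch:

```bash
# Hypothetical path; see examples/benevita/README.md for the actual file name.
SESSION_GOAL_PATH=examples/benevita/session-goal.md npm start
```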
Before starting Appium, make sure the iPhone is prepared for device automation:
- Enable Developer Mode on the iPhone.
- Connect the iPhone over USB to the Mac running Appium.
- Accept the trust prompt on both devices if it appears.
- Confirm the device is visible in Xcode and available for development.
- Unlock the phone before starting the session.
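One quick way to confirm the last two checks is Xcode's bundled `xctrace` tool, which lists every simulator and connected physical device available for automation:

```bash
# Run on the Mac; the connected iPhone should appear under "Devices".
xcrun xctrace list devices
```

If the phone is missing from the list, resolve trust and Developer Mode before starting Appium.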