The Agent class has been predefined in the agent/ folder, with implementations for the OpenAI interface based on
oneapi and the currently deployed GLM interface. If you need to add a base model, you need to:
- Create a new Python file under the
agent/directory, and refer toagent/model/OpenAIAgent. Implement your model call by inheriting theAgentclass. Theactfunction input is already organized according to the OpenAI message format, and the output should be a string. If the input format of the corresponding model differs from OpenAI, you can refer to theformat_historyfunction inclaude_modeland theprompt_to_messagefunction inqwen_modelfor modifications.format_historycan organize the format of historical records, and theprompt_to_messagemethod converts the prompt and image input (if any) of the current turn into the single-turn format of the current model. - Import your new class in
agent/__init__.py. - Replace the content under
agentin the config file used byeval.pywith:
agent:
name: Your Agent Module Name
args:
max_new_tokens: 512Make sure the name matches your implemented class name, and the content under args will be passed to your
class's init function.
During the process of writing a new task, it is equally important to write and use the code to determine if your code is correct through actual running results. Therefore, please follow the steps below to ensure each new task is error-free.
- Write your task. Tasks include yaml files, evaluation methods, and corresponding mobile app installation.
- The task's yaml file should refer to other existing files under
evaluation/configand must includetask_id,task,metric_type, andmetric_func.adb_queryis only used when the results need to be queried using adb commands. Althoughcategoryis not yet in use, it is strongly recommended to add it. - The evaluation method needs to inherit the
evaluation/task/SingleTaskclass. After each recorded operation, thejudgefunction will be executed, and its return value is a dict:{"judge_page": bool, "1": bool, ..., "complete": bool}. The code will record the judgment result of the last page wherejudge_pageisTrue, andcompleteshould only be set toTrueif all judgment points are correct. If it's a task that compares return values, thecheck_answermethod has already been implemented. Modifyfinal_ground_truthto the standard answer before calling this function. - Refer to other tasks, import all evaluation methods in
evaluation/app_name/__init__.pyinto thefunction_mapclass. - To ensure the model can execute the launch command correctly, add the app name and corresponding package name
in
templates/packages/apps_dict. The package name can be obtained by executingadb -s {device} shell dumpsys window | grep mCurrentFocus | awk -F '/' '{print $1}' | awk '{print $NF}'.
- The task's yaml file should refer to other existing files under
- Execute your task using at least the most advanced agent and generate evaluation results. If necessary, quickly complete the correct operation during model operation intervals to ensure that the recorded operation can capture the correct result page between two model operations to test if your code can complete the detection task.
- Use the
tools/check_result_multiprocess.pyfunction to generate screenshots of each step. Focus on checking whether the screenshots of correct model operations are indeed judged as correct.
If you want to define a mobile snapshot different from the android eval snapshot, you need to follow these steps:
- Download related docker files from the link: https://drive.google.com/file/d/1xpPEzVof5hrt5bQY6BHm_4Uoyq5mJQNb/view?usp=drive_link
- Extract the file, enter the extracted folder, and then run:
docker build -t android_eval_no_avd:latest .- Configure your AVD snapshot on an x86_64 machine (it is recommended to configure it directly using Android Studio). Note that the default installed Android AVD type is:
RUN /bin/bash -c "source /root/.bashrc && yes | sdkmanager 'platform-tools' 'emulator' 'system-images;android-33;google_apis;x86_64'"
RUN /bin/bash -c "source /root/.bashrc && yes | sdkmanager 'build-tools;33.0.0'"
RUN /bin/bash -c "source /root/.bashrc && yes | sdkmanager 'platforms;android-33'"If you want to configure the AVD for a different version, please modify the specific version number installed in the Dockerfile. Note that the version number must be strictly consistent, otherwise, the installed image will not be able to read the existing cache.
- You can use the following code to generate the AVD image used in the docker:
python tools/modify_mobile_to_docker.py
--avd_dir /Path/to/your/.android/avd
--device_name your device name
--save_dir /Path/to/your/save/avdAlternatively, you can modify it as follows:
Find your .avd folder and .ini file through Android Studio -> Virtual Devices Manager -> Right-click -> Show on Disk, and make the following modifications:
In Pixel_7_Pro_API_33.ini, modify path and path.rel to the following paths:
avd.ini.encoding=UTF-8
path=/root/.android/avd/device name.avd
path.rel=avd/device name.avd
target=android-33In Pixel_7_Pro_API_33.avd/config.ini, modify the following paths:
...
image.sysdir.1 = system-images/android-33/google_apis/x86_64/
...
skin.path = /root/.android/skins/pixel_7_pro
...Keep the other contents unchanged.
- Start an image and copy your .avd folder and .ini file into the image:
docker run -it android_eval_no_avd:latest /bin/bash
docker cp /path/to/your/device name.avd container_id:/root/.android/avd
docker cp /path/to/your/device name.ini container_id:/root/.android/avdAfter completing the above, you can execute the following in the image:
emulator -avd device name -no-window -no-audio -no-snapshot-saveVerify whether the installation is successful.