feat(android): add UIAutomator hierarchy dump, parsing, and agent tool #251

Open
mlikasam-askui wants to merge 2 commits into main from feat/android-uiautomator-hierarchy-tool

@mlikasam-askui
Contributor

Summary

Dump the current screen with uiautomator dump, parse the XML into a flat list of views (text, ids, content-desc, bounds, tap centers), and expose it as AndroidGetUIAutomatorHierarchyTool, so agents can fall back to it when screenshots alone are not reliable enough or when structured UI data is needed.
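The parsing step described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: ParsedNode and parse_window_dump are hypothetical names, and the sample XML is a hand-written stand-in for a real dump. It assumes the usual UIAutomator bounds format of "[left,top][right,bottom]" and derives the tap center from it.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# UIAutomator encodes bounds as "[left,top][right,bottom]".
_BOUNDS_RE = re.compile(r"\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]")


@dataclass(frozen=True)
class ParsedNode:  # hypothetical name, not the PR's UIElement
    text: str
    resource_id: str
    content_desc: str
    clickable: bool
    bounds: tuple[int, int, int, int]

    @property
    def center(self) -> tuple[int, int]:
        """Tap center: the midpoint of the bounding box."""
        left, top, right, bottom = self.bounds
        return ((left + right) // 2, (top + bottom) // 2)


def parse_window_dump(xml_text: str) -> list[ParsedNode]:
    """Flatten every <node> with parsable bounds into a list, in document order."""
    nodes = []
    for node in ET.fromstring(xml_text).iter("node"):
        match = _BOUNDS_RE.fullmatch(node.get("bounds", ""))
        if match is None:
            continue  # skip nodes without usable bounds
        left, top, right, bottom = (int(group) for group in match.groups())
        nodes.append(
            ParsedNode(
                text=node.get("text", ""),
                resource_id=node.get("resource-id", ""),
                content_desc=node.get("content-desc", ""),
                clickable=node.get("clickable") == "true",
                bounds=(left, top, right, bottom),
            )
        )
    return nodes


# Hand-written sample; a real dump carries many more attributes per node.
SAMPLE = """<hierarchy rotation="0">
  <node text="" resource-id="" content-desc="" clickable="false" bounds="[0,0][1080,1920]">
    <node text="OK" resource-id="com.example:id/ok" content-desc="Confirm" clickable="true" bounds="[100,200][300,400]" />
  </node>
</hierarchy>"""

elements = parse_window_dump(SAMPLE)
```

Flattening loses the parent/child relationships, which is probably acceptable when the consumer is an agent that only needs labels and tap targets.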

Notes

  • Wired via get_ui_elements() on Android AgentOs / PpAdbAgentOs and facade.
  • Includes pdm.lock updates.

Add UIElement and UIElementCollection to parse UIAutomator window-dump XML
from normalized shell output (bounds, text, resource-id, content-desc,
clickable, etc.).

Expose get_ui_elements() on Android AgentOs and implement it in the facade
and PpAdb path so callers get a flattened hierarchy string.

Register AndroidGetUIAutomatorHierarchyTool in the Android tool store for
act flows that need structure instead of screenshots.

Refresh pdm.lock for the otel dependency group and OpenTelemetry-related
package updates.
self,
x: int,
y: int,
from_agent: bool = True,
Collaborator

What does from_agent mean?

stopped and the UI has settled.
"""
self._check_if_device_is_selected()
assert self._device is not None
Collaborator

Shouldn't the assert be integrated into _check_if_device_is_selected?
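One way to act on this suggestion, sketched under assumptions: let the check return the selected device, so the type checker narrows the Optional and callers need no separate assert. All names here (AgentOsSketch, Device, NoDeviceSelectedError, _require_device) are hypothetical and not taken from the PR.

```python
from typing import Optional


class NoDeviceSelectedError(RuntimeError):  # hypothetical error type
    """Raised when an operation needs a device but none is selected."""


class Device:  # hypothetical stand-in for the real device handle
    def __init__(self, serial: str) -> None:
        self.serial = serial


class AgentOsSketch:  # hypothetical; not the PR's class
    def __init__(self) -> None:
        self._device: Optional[Device] = None

    def select(self, device: Device) -> None:
        self._device = device

    def _require_device(self) -> Device:
        """Check and return the device, so callers need no extra assert."""
        if self._device is None:
            raise NoDeviceSelectedError("No device selected")
        return self._device

    def device_serial(self) -> str:
        device = self._require_device()  # typed as Device, not Optional[Device]
        return device.serial
```

Unlike a bare assert, the explicit raise also survives running Python with -O, which strips assert statements.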

assert self._device is not None
dump_cmd = f"uiautomator dump {self._UIAUTOMATOR_DUMP_PATH}"
dump_response = self.shell(dump_cmd)
if "dumped" not in dump_response.lower():
Collaborator

What does "dumped" mean here?
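For context, and as an assumption rather than something stated in this thread: on success, uiautomator dump typically prints a confirmation line along the lines of "UI hierchary dumped to: <path>" (the misspelling comes from the tool itself), so the substring check presumably uses "dumped" as a success marker. The exact wording may differ between Android versions. Both sample strings below are illustrative, and dump_succeeded is a hypothetical helper that gives the check a name.

```python
def dump_succeeded(shell_output: str) -> bool:
    """Treat the tool's confirmation line as the success marker (assumed format)."""
    return "dumped" in shell_output.lower()


# Illustrative outputs; real devices may word these differently.
SUCCESS_OUTPUT = "UI hierchary dumped to: /sdcard/window_dump.xml"
FAILURE_OUTPUT = "ERROR: could not get idle state."
```

Naming the check, or hoisting the marker into a constant with a comment, would make the intent visible without a reader having to know the tool's output.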

dump_response = self.shell(dump_cmd)
if "dumped" not in dump_response.lower():
    msg = f"Failed to dump UI hierarchy: {dump_response}"
    raise AndroidAgentOsError(msg)
Collaborator

Do we have to terminate the agent loop, or can the agent recover from this error?
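If the failure turns out to be transient (for example, the UI not yet being idle), one recoverable design could look like the sketch below: retry the dump a few times, then hand the agent a textual error as the tool result instead of raising. This is only an illustration of the option being asked about. DumpError stands in for AndroidAgentOsError, and dump_with_retry, its parameters, and the message text are all hypothetical.

```python
import time
from typing import Callable, Optional


class DumpError(RuntimeError):  # hypothetical stand-in for AndroidAgentOsError
    """Raised by the dump callable when the hierarchy could not be dumped."""


def dump_with_retry(
    dump: Callable[[], str],
    attempts: int = 3,
    delay_s: float = 0.5,
) -> str:
    """Return the hierarchy, or an error message the agent can react to."""
    last_error: Optional[DumpError] = None
    for attempt in range(attempts):
        try:
            return dump()
        except DumpError as error:
            last_error = error
            if attempt < attempts - 1:
                time.sleep(delay_s)  # give the UI time to settle before retrying
    # Report the failure as text so the agent loop can continue.
    return f"UI hierarchy unavailable after {attempts} attempts: {last_error}"
```

Whether a string result or a dedicated recoverable exception type fits better depends on how the tool layer already reports failures to the agent.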

"""Collection of UI elements."""

def __init__(self, elements: list[UIElement]) -> None:
    self._elements = list(elements)
Collaborator

Why is this converted to a list?
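One plausible reason, which is an assumption since the author does not state it in this thread: list(elements) makes a shallow copy, so later changes to the caller's list do not leak into the collection. The sketch below demonstrates that effect with a hypothetical stand-in class and plain strings in place of UIElement.

```python
class CollectionSketch:  # hypothetical stand-in for UIElementCollection
    def __init__(self, elements: list[str]) -> None:
        # Shallow copy: the collection owns its own list object.
        self._elements = list(elements)

    def __len__(self) -> int:
        return len(self._elements)


source = ["button", "label"]
collection = CollectionSketch(source)
source.append("checkbox")  # mutates the caller's list only
```

If that is the intent, a short comment on the line would answer the question in the code itself; if it is not needed, accepting a Sequence and storing a tuple would be another option.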


2 participants