Hello World Screenshot Editor Tutorial

greglnelson edited this page Oct 8, 2014 · 4 revisions

Before we get started with the Bubble Cursor, we will first walk you through a simple Hello World application. In this tutorial, you'll write code that will interpret pixels, identify textual elements, and replace all of that text with the string "hello world".

How Prefab Works

You will be writing some code that builds on top of Prefab. Prefab is a system that helps you reverse engineer interfaces from pixels. Let's first get an overview of how Prefab works by visiting the How Prefab Works page.

When given a screenshot, Prefab outputs a tree, where each node represents an identified widget. The problem is that this tree has no semantic information. In other words, which of these identified elements are text? You will be writing code to provide that semantic information. The rest of the code that renders the output and redirects input has already been implemented for you! To recover important semantic information, you will write some Python code that operates on top of Prefab's tree structure. Your code will output a new tree that is then used by the Bubble Cursor to find its targets.
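To make the idea of adding semantic information to a tree concrete, here is a minimal sketch in Python. It does not use Prefab's real API: the Node class, the "kind" tag, and the toy_heuristic function are all hypothetical stand-ins invented for illustration. Only the "is_text" tag name comes from this tutorial.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """Hypothetical stand-in for one element in the tree (not Prefab's class)."""
    name: str
    tags: Dict[str, object] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


def tag_text(node: Node, looks_like_text: Callable[[Node], bool]) -> Node:
    """Return a new tree in which every node carries an "is_text" tag."""
    new_tags = dict(node.tags)  # copy so the input tree is left untouched
    new_tags["is_text"] = looks_like_text(node)
    new_children = [tag_text(child, looks_like_text) for child in node.children]
    return Node(node.name, new_tags, new_children)


def toy_heuristic(node: Node) -> bool:
    """Placeholder rule: a leaf whose assumed "kind" tag is "glyphs" is text."""
    return not node.children and node.tags.get("kind") == "glyphs"


# A tiny hand-built tree standing in for what a screenshot might produce.
screenshot_tree = Node("window", children=[
    Node("button", children=[Node("button_label", {"kind": "glyphs"})]),
    Node("icon", {"kind": "image"}),
])

semantic_tree = tag_text(screenshot_tree, toy_heuristic)
```

The function builds a new tree rather than editing the input in place, which mirrors the description above of code that "outputs a new tree". A real layer would replace toy_heuristic with logic based on the actual pixels.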

We've also provided some screenshots of some existing interfaces to use as test cases. To interpret the pixels of these screenshots, you will use a combination of Python and our authoring tool for interacting with the screenshots and creating annotations.

We call our authoring tool the Screenshot Editor; the repository is here, and more documentation on how to install and run it will be written. Open the Screenshot Editor on your monitor. Let's examine the Screenshot Editor first.

Screenshot Editor Overview

The Screenshot Editor has six different panels. The main panel shows the screenshot that you are working with. You can zoom in and out of the screenshot by clicking the slider or by scrolling the mouse wheel while holding down the Control key.

The panel below lets you select among the different screenshots that are loaded. Clicking on any one of the thumbnails will load that frame; Prefab will then find elements, and your code will be run to interpret that frame. There are some options in this area as well, which we talk about in our Viewing the Hello World Output video.

Viewing the Hello World Output

Click on the Render Hello World checkbox towards the bottom of the screen to view the end result of the Hello World application. This will render an image on top of the screenshot. You can see that the image is identical, but some of the text has been replaced. You can toggle this view on and off by clicking the checkbox.

Hide the Hello World output by unchecking the box, and then mouse over the screenshot. You will see a black rectangle follow your cursor. This rectangle highlights the smallest element identified by Prefab that is underneath the cursor. You can view many elements at once by clicking the Show Rectangle Overlays checkbox. Elements will be shown in green. The Match Element Tags panel on the left can be used to change which elements are visualized with these rectangles. This is done by entering a tag name and value in one of the selector text boxes. By default, there is a text box that filters elements with the tag "is_text" = true.
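The two behaviors described above, highlighting the smallest element under the cursor and filtering elements by a tag name and value, can be sketched as follows. This is an illustration under stated assumptions and not the Screenshot Editor's implementation: the Element class, its bounding-box fields, and the helper functions are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Element:
    """Hypothetical element with a bounding box in screenshot pixels."""
    left: int
    top: int
    width: int
    height: int
    tags: Dict[str, object] = field(default_factory=dict)
    children: List["Element"] = field(default_factory=list)

    def contains(self, x: int, y: int) -> bool:
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)

    def area(self) -> int:
        return self.width * self.height


def walk(element: Element) -> Iterator[Element]:
    """Yield the element and all of its descendants, depth first."""
    yield element
    for child in element.children:
        yield from walk(child)


def smallest_under_cursor(root: Element, x: int, y: int) -> Optional[Element]:
    """Return the smallest-area element containing (x, y), or None."""
    hits = [e for e in walk(root) if e.contains(x, y)]
    return min(hits, key=Element.area) if hits else None


def match_tag(root: Element, name: str, value: object) -> List[Element]:
    """Return every element whose tag `name` equals `value`."""
    return [e for e in walk(root) if e.tags.get(name) == value]


# Toy layout: a dialog containing a button, which contains a text label.
label = Element(20, 15, 40, 12, {"is_text": True})
button = Element(10, 10, 80, 30, {"is_text": False}, [label])
dialog = Element(0, 0, 200, 100, {"is_text": False}, [button])

under_cursor = smallest_under_cursor(dialog, 25, 20)  # cursor over the label
text_elements = match_tag(dialog, "is_text", True)
```

Choosing by smallest area is one simple way to prefer the most specific element when several nested elements contain the cursor; the editor may use a different rule.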

Code Explorer

The panels on the right give us information about our code and what we have interpreted. The top panel shows us the list of layers that are run when we load a screenshot. We will describe how to implement layers when we get to the programming portion of our tutorial. The panel below is the Layer Libraries panel. Layers can access annotation libraries, and you can set those here.

Interpreted Tree Explorer

We mentioned that Prefab outputs a tree of identified elements. You can view that tree here by clicking on it and expanding the nodes. Clicking on a node will also highlight the corresponding element on top of the screenshot, and properties of that node are shown in the Element Properties window below, along with a small thumbnail of the element. For example, by clicking on a node, we can see whether it is tagged with the boolean value "is_text" = true.

Adding/Removing other Annotations

It is possible to use annotations to provide semantic information about elements. We will write some scripts that try to predict which elements are text, and then group that text together. But these scripts will be imperfect, and so we can use human-provided annotations to correct them when they're wrong. For example, this icon at the top of this dialog box was incorrectly labeled as text, so we can provide that correction and our layers will update based on that data.
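The idea that human-provided annotations take precedence over a script's predictions can be sketched in a few lines. The element identifiers and the apply_annotations function below are hypothetical, chosen only to illustrate the override; the real annotation libraries may store and match data differently.

```python
from typing import Dict


def apply_annotations(predicted: Dict[str, bool],
                      annotations: Dict[str, bool]) -> Dict[str, bool]:
    """Return the predictions with any human annotation taking precedence."""
    corrected = dict(predicted)    # leave the script's output untouched
    corrected.update(annotations)  # a human correction overrides a prediction
    return corrected


# The script wrongly guessed that the dialog's icon is text.
predicted_is_text = {
    "title_label": True,
    "dialog_icon": True,
    "ok_button_label": True,
}

# A person corrects that single mistake.
human_annotations = {"dialog_icon": False}

final_is_text = apply_annotations(predicted_is_text, human_annotations)
```

Only the annotated element changes; every other prediction passes through unchanged, so a few corrections are enough to patch an otherwise useful script.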

Congratulations! You understand the basics of the Screenshot Editor. Now let's continue this tutorial by writing some code.

Beginner's API Hello World Tutorial