On the Accuracy of Automated GUI Testing for Embedded Systems
Ying-Dar Lin, Fellow, IEEE, Edward T.-H. Chu, Member, IEEE, Shang-Che Yu, and
Yuan-Cheng Lai
Abstract
Automated GUI (Graphical User Interface) testing tools are software programs used to test application user interfaces and to verify their functionalities. However, due to the uncertainty of runtime execution environments, GUI operations may not be reproduced on the DUT (Device Under Test) on time. The incorrect GUI operations then result in test failures. In this work, we design SPAG (Smart Phone Automated GUI) to avoid non-deterministic events by batching event sequences and reproducing them on the DUT directly. Furthermore, SPAG dynamically adjusts the timing of the following operation so that all event sequences can be performed on time. We conducted our experiments on an Acer Liquid smart phone and compared SPAG with Monkeyrunner. Our experiments showed that SPAG can maintain an accuracy of up to 99.5%.
Keywords: GUI, automated testing, embedded system, Android
Ying-Dar Lin and Shang-Che Yu are with the Department of Computer Science, National Chiao Tung University,
Taiwan. E-mail: [email protected], [email protected]
Edward T.-H. Chu is with the Department of Computer Science and Information Engineering, National Yunlin
University of Science and Technology, Taiwan. E-mail: [email protected]
Yuan-Cheng Lai is with the Department of Information Management, National Taiwan University of Science and
Technology, Taiwan. E-mail: [email protected]
1. Introduction
Automated GUI (Graphical User Interface) testing tools are software programs used to test application user interfaces and to verify their functionalities [1, 5-7]. In the process of testing an application, engineers first design a test case, which includes several GUI operations and a set of conditions devised to determine whether the application works correctly. The test case is converted to a script file and performed on a device under test (DUT), such as a smart phone or tablet PC. To verify the test result, the DUT captures the screen and sends it to the host PC, where an automated GUI testing tool performs a verification operation. As an example, consider a popular open-source automated GUI tool, Sikuli [2-3], an Android device-controlling tool, AndroidScreenCast, and an Android smart phone. Software engineers first write a Sikuli script to describe the timing and order of GUI operations, such as screen-scroll and key-press actions. At runtime, each operation of the Sikuli script is performed on the DUT screenshot window provided by AndroidScreenCast. These operations are interpreted into multiple motion events and key-press events, which are then transmitted to the Android smart phone, i.e., the DUT. After all received events have been performed, AndroidScreenCast captures the screen of the DUT and sends it back to the host PC, where Sikuli verifies its correctness.
However, because of the uncertainty of the runtime execution environment, such as variation in communication delay, interpreted events may not be reproduced on the DUT on time. As a result, the intervals between events may differ from those expected, and the resulting non-deterministic event sequences may lead to incorrect GUI operations. For example, an Android fling action occurs when a user scrolls a touch panel and then quickly lifts his finger; a sequence of motion events is used to represent the operation. When an automated GUI tool replays such an event sequence, each motion event should be triggered on time in order to reproduce the fling with the same scrolling speed. If not, the scrolling speed of the reproduced fling action will differ from what is expected and will thus produce an incorrect result. A commonly used method to address this issue of non-deterministic events is to use the trackball instead of the fling action. However, not every smart phone is equipped with a trackball.
An uncertain runtime execution environment may cause another problem: it may interfere with or delay the execution of an application, especially when the DUT is heavily loaded. A delayed application may fail to process an event correctly if its response to the previous event has not been completed. For example, an event may be dropped if the AUT (Application Under Test) receives the event ahead of time and is not ready to process it. An intuitive way to solve this problem is to delay the execution of the operations. However, this requires experienced engineers to set the delay for each operation properly so that the application can receive the reproduced events.
In summary, based on our observations, automated GUI testing for smart phones faces two major challenges: non-deterministic events and execution interference. In this work, we aim to design an automated GUI testing system that maximizes accuracy under the uncertainty of a runtime execution environment. The accuracy of an automated GUI testing tool is defined as the success rate of examining a bug-free application: the higher the success rate, the higher the accuracy. For this purpose, we designed a Smart Phone Automated GUI testing tool (SPAG), which is based on Sikuli [2-3]. By using the Sikuli IDE (Integrated Development Environment), we can write GUI test cases, execute the scripts, automate GUI operations on the desktop, and verify GUI elements presented in screenshots. To avoid non-deterministic events, SPAG batches each event sequence and reproduces it on the DUT. In addition, SPAG monitors the CPU usage of the target application during runtime and dynamically changes the timing of the following operation so that all event sequences and verifications can be performed on time, even when the DUT is heavily loaded. We conducted several experiments on an Acer Liquid smart phone to investigate the applicability and performance of SPAG, and compared our method with Monkeyrunner [1].
The rest of this paper is structured as follows. Section 2 reviews background and related work. Section 3 gives definitions and the problem statement. Section 4 presents the SPAG design. Section 5 provides the performance analysis, and Section 6 concludes the paper.
2. Background and Related Work
There has been much research dedicated to automated GUI testing. The most common approach is model-based testing (MBT), which models the behavior of target applications and then uses test cases generated from the models to validate the DUT. T. Takala et al. [6] adopted Monkeyrunner and window services to generate GUI events, and L. Zhifang et al. [7] utilized the concept of virtual devices to test applications. These methods rely on image-based pattern matching, which is sensitive to image quality. In contrast, SPAG uses GUI components for pattern matching in order to improve the stability and speed of the validation.
Several techniques and architectures have been developed to cope with complex application tests. MoGuT [8], a variant of the FSM-based test framework, uses image flow to describe event changes and screen responses, but it lacks flexibility. Gray-box testing adopts APIs to construct calling contexts and parameters from input files [4] and verifies testing results based on a logging mechanism. However, for complex software, it becomes difficult to describe the testing logic and calling context. Recently, C. Hu et al. developed an approach for automating the testing process of Android applications by using JUnit and the Monkey tool [7], and W. Yang et al. proposed a method to automatically extract a model of an application [10]. However, both methods use a fixed delay between consecutive GUI operations, while SPAG determines the delay dynamically by using its smart wait function. D. Amalfitano et al. designed a method to automatically generate a model of an application by dynamic crawling [9], but their method requires the source code of the applications under test. In contrast, SPAG does not need the source code.
3. Definitions and Problem Statement
3.1 Definitions
We adopt a commonly used software testing technique for embedded systems called record-replay, which includes a recording stage and a replay stage. Figure 1(a) shows the recording stage, where the screen of the DUT is first redirected to the host PC, on which the test tool runs. An engineer interacts with the DUT remotely: whenever the engineer performs a GUI operation on the host PC, such as a key press or a finger touch, the test tool sends the associated events of the GUI operation to the DUT and records them in a test case. The test case also includes verification operations, added by the engineer, to verify the responses of the DUT. Figure 1(b) shows the replay stage, where the test executor first reads GUI operations from the test case and replays them on the DUT. Finally, the test executor verifies the testing results according to the responses of the DUT.
Let T denote a test case, which includes n operations {o_1, o_2, ..., o_n}. An operation o_i can be a GUI operation or a verification operation; a GUI operation can be a key press or a finger touch, while a verification operation is used to verify the test result. The interval between o_i and o_(i+1) is given by t_i. A GUI operation consists of a sequence of events {e_1, e_2, ..., e_m}. For example, when a fling operation is performed, the Android system generates the associated move events.
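To make the notation concrete, the following minimal Python sketch shows one plausible representation of a test case; the type and field names are our own illustrative choices, not part of the paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Operation:
    """One operation o_i: a GUI operation (key press, finger touch)
    or a verification operation."""
    kind: str                 # "gui" or "verify"
    events: List[Tuple[int, str, int, int]] = field(default_factory=list)
    # Each event e_j is (offset_ms, event_type, x, y); a fling, for
    # example, is a burst of closely spaced move events.

@dataclass
class TestCase:
    """A test case T = {o_1, ..., o_n}; intervals[i] is t_i, the gap
    in milliseconds between o_i and o_(i+1)."""
    operations: List[Operation]
    intervals: List[int]
```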
Figure 1. System architecture of the record/replay method with the DUT: (a) record stage, (b) replay stage.

3.2 Problem Statement

Due to the uncertainty of a runtime execution environment, such as variation in the communication delay between the host PC and the DUT, an event e_j may not be reproduced on the DUT on time. Such non-deterministic event sequences may lead to an incorrect GUI operation that cannot pass the verification operations. Furthermore, the interval t_i between o_i and o_(i+1) may also be affected by the runtime execution environment of the DUT: the GUI application may drop the newly arriving events of o_(i+1) because the previous events of o_i have not yet been processed. The dropped events also lead to test failures. In this work, we define the accuracy of an automated GUI testing tool as the success rate of executing a bug-free application; the higher the success rate, the higher the accuracy. Given a test case and a bug-free application, we aim to design an automated GUI testing system that maximizes accuracy.
4. SPAG Design
We designed the Smart Phone Automated GUI testing tool (SPAG) to accurately reproduce GUI operations and verify test results. In the record stage, SPAG monitors GUI operations and stores these operations, together with the associated CPU time of the DUT, in a test script. An engineer also adds verification operations to the test script in order to verify the results of the GUI operations. In the replay stage, GUI and verification operations are batched and sent to the DUT so that the events can be triggered on time. Based on the CPU utilization of the DUT, SPAG dynamically adjusts the interval between two operations. The testing results are sent back to the host PC for verification.
4.1 Event Batch
In the replay stage, the application running on the DUT keeps monitoring GUI events and takes the corresponding actions. For example, a gesture such as a swipe consists of several multi-touch events; after receiving them, the application scrolls the screen up. However, some GUI operations are sensitive to the timing of their associated events. For example, the onFling GUI operation consists of many move events, and the speed of onFling is sensitive to both the displacement and the time difference between two consecutive move events. If the actual interval between two move events is longer than the interval described in the test script, the speed of the reproduced onFling is slower than expected, and the incorrect GUI operation may lead to test failure. Therefore, in the replay stage, it is crucial to trigger each event on the DUT on time to avoid possible test failures.
In our implementation, SPAG stores the associated events of each GUI operation, together with the event intervals, in the test script. In addition, a tag, such as ACTION_DOWN, ACTION_MOVE, or ACTION_UP, is attached at the end of each GUI operation in order to differentiate two consecutive GUI operations. In the replay stage, SPAG first batches all events and sends them to the DUT. A module on the DUT, rather than one on the host PC, then triggers the events, which removes the effect of communication uncertainty between the DUT and the host PC. A sketch of this idea follows.
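To make event batching concrete, the following minimal Python sketch shows one way a recorded operation could be stored with per-event timing and replayed entirely on the DUT; the record format and the inject() primitive are illustrative assumptions, not SPAG's actual code.

```python
import time

# Hypothetical record format: a GUI operation with a closing tag and a
# list of timed events (offset_ms, event_type, x, y), as in a fling.
fling = {
    "tag": "ACTION_UP",
    "events": [
        (0,  "ACTION_DOWN", 240, 700),
        (16, "ACTION_MOVE", 240, 640),
        (32, "ACTION_MOVE", 240, 520),
        (48, "ACTION_UP",   240, 460),
    ],
}

def inject(event_type, x, y):
    """Placeholder for the on-device injection primitive (e.g., writing
    to the Linux input subsystem)."""
    pass

def replay_batched(operation):
    """Trigger every event of an already-transferred operation locally,
    so host-to-DUT communication delay cannot distort the intervals."""
    start = time.monotonic()
    for offset_ms, event_type, x, y in operation["events"]:
        # Sleep until this event's recorded offset, then fire it.
        delay = start + offset_ms / 1000.0 - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        inject(event_type, x, y)
```

With this scheme, calling replay_batched(fling) reproduces the scrolling speed using only the DUT's local timer, independent of the host connection.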
4.2 Smart Wait
In the replay stage, the recorded GUI operations are sent to the associated application accordingly. However, the execution time of the application may be longer than expected if the execution environment is heavily loaded, and a delayed application may fail to process a GUI operation correctly if the operation arrives earlier than expected. For example, if the DUT receives a button-press operation ahead of time and the AUT is not ready to process it, the operation will be dropped and lead to test failure. A practical method is to have experienced engineers set the interval between every two GUI operations properly, so that the application can process GUI operations on time while the testing time remains reasonable. But the cost of manually adjusting these intervals is high.
In order to improve the efficiency of the test process, SPAG automatically adjusts the delay between two GUI operations based on the CPU time used to perform them. This function is called smart wait. Let P_r denote the process that performs the GUI operations in the record stage. When operation o_i takes place, SPAG monitors the CPU time c_i of P_r during the interval t_i between o_i and o_(i+1). This is achieved by parsing data from the Linux virtual directory /proc: from /proc/<PID>/stat, we obtain the time the process has spent in both user space and kernel space, and from /proc/stat, we obtain the time the CPU has spent in both user space and kernel space. Based on this information, SPAG calculates the CPU usage of the process during t_i. Both t_i and c_i are stored in the test script as CMD(t_i, c_i). Let P_p denote the process that performs the GUI operations in the replay stage. When o_i is executed, SPAG monitors the CPU time c_i' of P_p. If c_i' is smaller than c_i, SPAG assumes that o_i is not complete and postpones the remaining GUI operations by a proportional delay, estimating the completion time as t_i × c_i / c_i'. For example, in the record stage, if o_i used 5 ms of CPU time over a 4 s interval, then c_i is 5 ms and t_i is 4 s, and SPAG inserts the command CMD(4000 ms, 5 ms) into the test script right after o_i. In the replay stage, when o_i is replayed, SPAG first waits 4 s and then reads the associated c_i' from the DUT. If c_i' is 2 ms, SPAG assumes that o_i is not finished and estimates its completion time as 4 s × 5 ms / 2 ms = 10 s; in this case, the next operation is postponed by another 6 s. The sketch below illustrates this logic.
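As an illustration of the proportional-delay calculation, here is a minimal Python sketch under our reading of the mechanism; the tick parsing is simplified (it assumes the comm field of /proc/<pid>/stat contains no spaces) and all names are our own, not SPAG's.

```python
import time

def proc_cpu_ticks(pid):
    """CPU time (user + kernel) of a process in clock ticks, read from
    /proc/<pid>/stat (fields 14 and 15: utime and stime)."""
    with open(f"/proc/{pid}/stat") as f:
        fields = f.read().split()   # simplification: comm has no spaces
    return int(fields[13]) + int(fields[14])

def smart_wait(pid, t_ms, c_ticks):
    """Replay-side wait for one operation o_i, driven by the recorded
    CMD(t_i, c_i) entry: t_ms = t_i, c_ticks = c_i."""
    before = proc_cpu_ticks(pid)
    time.sleep(t_ms / 1000.0)                # wait the recorded t_i
    used = proc_cpu_ticks(pid) - before      # replay-time usage c_i'
    if 0 < used < c_ticks:
        # o_i seems unfinished: scale t_i by c_i / c_i' and wait out the
        # remainder, e.g. 4 s * 5 ms / 2 ms = 10 s, so 6 s more.
        total_ms = t_ms * c_ticks / used
        time.sleep((total_ms - t_ms) / 1000.0)
```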
4.3 Implementation
SPAG integrates two popular open-source tools: AndroidScreenCast and Sikuli. AndroidScreenCast is a desktop application that redirects the screen of the DUT to the host PC and allows an engineer to interact remotely with the DUT using the mouse or keyboard. Sikuli is a desktop application that automates GUI testing by means of screenshot images. In the recording stage, on the host PC, SPAG records all GUI operations performed inside the redirected screen of the DUT. An engineer uses Sikuli's IDE to insert a verification operation at the end of one or several consecutive GUI operations by selecting a region of the redirected screen; the class name and activity name of the redirected screen are also logged at that time. In the replay stage, SPAG reproduces GUI operations by sending the associated events to the DUT. Both event batch and smart wait are adopted to reduce the uncertainty of the runtime execution environment. When performing a verification operation, SPAG first checks the class name and activity name of the redirected screen. If the check fails, SPAG immediately makes an image comparison between the redirected screen and the predefined image, as sketched below. Note that the methodologies of event batch and smart wait are portable: to apply these two techniques on other platforms, one would need an equivalent of AndroidScreenCast to remotely control the DUT, integrated with Sikuli or an equivalent tool to record user interaction.
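A minimal sketch of this two-step verification might look as follows; the data class and the image_match() stub are hypothetical stand-ins (Sikuli's own matcher, or something like OpenCV template matching, would supply the comparison).

```python
from dataclasses import dataclass

@dataclass
class Verification:
    class_name: str      # window class logged at record time
    activity_name: str   # foreground activity logged at record time
    ref_image: bytes     # reference region selected in Sikuli's IDE

def image_match(screen_png: bytes, ref_png: bytes) -> bool:
    """Stub for screenshot comparison (e.g., template matching)."""
    raise NotImplementedError

def verify(class_name: str, activity_name: str,
           screen_png: bytes, v: Verification) -> bool:
    # Cheap check first: compare the logged class and activity names.
    if class_name == v.class_name and activity_name == v.activity_name:
        return True
    # Fall back to image comparison only if the name check fails.
    return image_match(screen_png, v.ref_image)
```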
5. Experimental Results
5.1 Experiment Setup
In order to investigate the accuracy of SPAG, we adopted an Acer Liquid smart phone for evaluation. We compared SPAG with Monkeyrunner, an automated testing tool included in the Android SDK. Monkeyrunner reproduces predefined operations, such as key presses, by generating the associated events and sending them from the host PC to the DUT [1]. Our test script included five commonly used scenarios: browsing a contact entry, installing an app over Wi-Fi, taking a picture, recording a video, and browsing Google Maps over Wi-Fi. As Figures 2 and 3 show, we used a busy-loop program to adjust the CPU utilization from 25% to 100% (a sketch of such a load generator is given below) and an intensive flash read/write program to simulate an I/O-burst condition. For each configuration, such as a CPU utilization of 25%, 50%, 75%, or 100%, we repeated the same experiment 40 times and took the average accuracy for comparison.
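The paper does not list its busy-loop program, but a duty-cycle loop along these lines could produce the stepped CPU workloads; the period and implementation details are our assumptions.

```python
import time

def busy_loop(utilization, period_s=0.1):
    """Keep one core busy for the given fraction of each period,
    e.g. utilization=0.25 approximates a 25% CPU workload."""
    assert 0.0 < utilization <= 1.0
    while True:
        start = time.monotonic()
        while time.monotonic() - start < utilization * period_s:
            pass                                  # spin: burn CPU
        time.sleep((1.0 - utilization) * period_s)
```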
5.2 Test Accuracy
This experiment evaluated the accuracy of SPAG and Monkeyrunner. In Figure 2, the x-axis is the workload type and the y-axis is the accuracy. We checked the accuracy of Monkeyrunner manually because it does not provide an image-comparison function good enough to verify testing results. The accuracy of Monkeyrunner dropped significantly when the CPU utilization increased or the I/O subsystem was busy. For example, it dropped to 64.5% when CPU utilization was 100% and to 26.5% when I/O bursts occurred. This was because the tested application was deferred for execution when the system was heavily loaded, and Monkeyrunner does not dynamically adjust the interval between two consecutive operations; as a result, newly incoming events were dropped or ignored, which made the Monkeyrunner tests fail. In contrast, with the smart wait function, the accuracy of SPAG decreased only slightly when CPU utilization increased or I/O bursts occurred. The accuracy of SPAG was over 90% in all configurations we tested. In particular, it stayed at 99.5% under normal conditions, in which the CPU utilization was less than 25%.
Figure 2. Testing with SPAG and Monkeyrunner. (Accuracy over Normal, 25% CPU, 50% CPU, 75% CPU, 100% CPU, and I/O busy: Monkeyrunner 88.0%, 85.5%, 77.5%, 65.5%, 64.5%, 26.5%; SPAG 99.5%, 97.5%, 98.5%, 96.5%, 96.5%, 91.0%.)
With the same experiment setup, we also adopted three popular mobile apps, Skype, Twitter, and Facebook, to evaluate the accuracy of SPAG and Monkeyrunner. The major gesture activity of Skype was tap, while that of Twitter and Facebook was fling. As Table 1 shows, SPAG still maintained a very high level of accuracy in all configurations. In contrast, Monkeyrunner performed poorly when the system was busy, especially for Twitter and Facebook, because it could not trigger the associated events of a fling on time; the reproduced fling speed was therefore inaccurate and the tests failed.
Workload    Skype               Twitter             Facebook
            SPAG  Monkeyrunner  SPAG  Monkeyrunner  SPAG  Monkeyrunner
Normal      97.5  92.5          99.5  92.5          97.5  72.5
25% CPU     97.5  99.5          99.5  92.5          97.5  65.0
50% CPU     99.5  99.5          99.5  72.5          97.5  60.0
75% CPU     99.5  99.5          99.5  40.0          92.5  60.0
100% CPU    99.5  99.5          99.5  37.5          92.5  40.0
I/O busy    99.5  72.5          95.0  20.0          92.5  40.0
Table 1. Accuracy of SPAG and Monkeyrunner, in percentage
5.3 Effects of Smart Wait and Event Batch on Accuracy
This experiment evaluated the effects of smart wait and event batch. Event batch aims to remove the communication uncertainty between the DUT and the host PC, while smart wait aims to remove the uncertainty of the DUT's runtime execution environment; they can be applied together or separately, depending on those two sources of uncertainty. As Figure 3 shows, in the case of a 100% CPU workload, the accuracy of SPAG was 77.5% with only the event batch function and 92% with only the smart wait function. The smart wait function thus contributes more than the event batch function to accuracy when the system is busy. This is because smart wait can be applied to all GUI operations, while event batch only improves the accuracy of moving GUI operations, such as scroll and flick.
Figure 3. Testing with Batch Event and Smart Wait. (Series: SPAG(All), SPAG(Smart Wait), SPAG(Batch Event), and Monkeyrunner; x-axis: additional workload; y-axis: accuracy.)
6. Conclusions
In this work, we designed SPAG to avoid non-deterministic events by batching event sequences and reproducing them on the DUT directly. In addition, SPAG monitors the CPU usage of the target application at runtime and dynamically changes the timing of the next operation so that all event sequences and verifications can be performed on time, even when the DUT is heavily loaded. Our experiments showed that SPAG can maintain a high accuracy of up to 99.5%. In our current design, any smart phone supported by AndroidScreenCast can be tested with SPAG without modification. In the future, we plan to design a fully platform-independent automated GUI testing system.
Acknowledgement
This work was supported in part by the National Science Council and the Institute for Information Industry, Taiwan.
References
[1] Monkeyrunner. Available:
http://developer.android.com/guide/developing/tools/monkeyrunner_concepts.html
[2] T. Yeh, T.-H. Chang, and R. C. Miller, "Sikuli: using GUI screenshots for search and
automation," in Proceedings of the 22nd Annual ACM Symposium on User Interface
Software and Technology, 2009.
[3] T.-H. Chang, T. Yeh, and R. C. Miller, "GUI testing using computer vision," in Proceedings
of the 28th International Conference on Human Factors in Computing Systems, 2010.
[4] V. R. Vemuri, "Testing predictive software in mobile devices," in Proceedings of the 2008
International Conference on Software Testing, Verification, and Validation, 2008.
[5] Q. Xie and A. M. Memon, "Using a pilot study to derive a GUI model for automated testing,"
ACM Transactions on Software Engineering and Methodology, vol. 18, pp. 1-35, 2008.
[6] T. Takala, M. Katara, and J. Harty, "Experiences of system-level model-based GUI Testing
of an Android application," in 2011 IEEE Fourth International Conference on Software
Testing, Verification and Validation, 2011.
[7] C. Hu and I. Neamtiu, "Automating GUI testing for Android applications," in Proceedings of
the 6th International Workshop on Automation of Software Test, 2011.
[8] O.-H. Kwon and S.-M. Hwang, "Mobile GUI testing tool based on image flow," in
Proceedings of the Seventh IEEE/ACIS International Conference on Computer and
Information Science, 2008.
[9] D. Amalfitano, A. R. Fasolino, P. Tramontana, S. D. Carmine, and A. M. Memon, "Using
GUI ripping for automated testing of Android applications," in Proceedings of the 27th
IEEE/ACM International Conference on Automated Software Engineering, 2012.
[10] W. Yang, M. R. Prasad and T. Xie, "A grey-box approach for automated GUI-model
generation of mobile applications," in Proceedings of the 16th International Conference on
Fundamental Approaches to Software Engineering, 2013.
Ying-Dar Lin is a professor in the Department of Computer Science, National Chiao Tung
University, Taiwan. His research interests include embedded systems, network protocols and
algorithms. He received a Ph.D. in CS from UCLA. He is an IEEE Fellow. Contact him at
Edward T.-H. Chu is an assistant professor in the Department of Computer Science and
Information Engineering, National Yunlin University of Science and Technology, Taiwan. His
research interest is embedded system software. He received a Ph.D. in CS from NTHU, Taiwan.
Contact him at [email protected].
Shang-Che Yu is a software engineer. He received an MS in CS from NCTU, Taiwan. Contact
him at [email protected].
Yuan-Cheng Lai is a professor in the Department of Information Management, National Taiwan
University of Science and Technology, Taiwan. His research interests include performance
analysis and wireless networks. He received a Ph.D. in CS from NCTU, Taiwan. Contact him at