On the Accuracy of Automated GUI Testing for Embedded Systems
Ying-Dar Lin, Fellow, IEEE, Edward T.-H. Chu, Member, IEEE, Shang-Che Yu, and
Yuan-Cheng Lai
Abstract
Automated GUI (Graphical User Interface) testing tools are software programs used to test application user interfaces and to verify their functionalities. However, due to the uncertainty of runtime execution environments, GUI operations may not be reproduced on the DUT (Device Under Test) on time. The incorrect GUI operations then result in test failures. In this work, we design SPAG (Smart Phone Automated GUI) to avoid non-deterministic events by batching event sequences and reproducing them on the DUT directly. Furthermore, SPAG dynamically adjusts the timing of the following operation so that all event sequences can be performed on time. We conducted our experiments on an Acer Liquid smart phone and compared SPAG with Monkeyrunner. Our experiments showed that SPAG can maintain an accuracy of up to 99.5%.
Keywords: GUI, automated testing, embedded system, Android
Ying-Dar Lin and Shang-Che Yu are with the Department of Computer Science, National Chiao Tung University,
Taiwan. E-mail: [email protected], [email protected]
Edward T.-H. Chu is with the Department of Computer Science and Information Engineering, National Yunlin
University of Science and Technology, Taiwan. E-mail: [email protected]
Yuan-Cheng Lai is with the Department of Information Management, National Taiwan University of Science and
Technology, Taiwan. E-mail: [email protected]
1. Introduction
Automated GUI (Graphical User Interface) testing tools are software programs used to test application user interfaces and to verify their functionalities [1, 5-7]. In the process of testing an application, engineers first design a test case, which includes several GUI operations and a set of conditions devised to determine whether the application works correctly. The test case is converted to a script file and performed on a device under test (DUT), such as a smart phone or tablet PC. To verify the test result, the DUT captures the screen and sends it to the host PC, where an automated GUI testing tool performs a verification operation. As an example, consider a popular open-source automated GUI tool, Sikuli [2-3], an Android device-controlling tool, AndroidScreenCast, and an Android smart phone. Software engineers first write a Sikuli script to describe the timing and order of GUI operations, such as screen-scroll and key-press actions. At runtime, each operation of the Sikuli script is performed on the DUT screenshot window provided by AndroidScreenCast. These operations are interpreted into multiple motion events and key-press events, which are then transmitted to the Android smart phone, i.e., the DUT. After all received events have been performed, AndroidScreenCast captures the screen of the DUT and sends it back to the host PC, where Sikuli verifies its correctness.
However, because of the uncertainty of the runtime execution environment, such as variation in communication delay, interpreted events may not be reproduced on the DUT on time. As a result, the intervals between events may differ from those expected, and the resulting non-deterministic event sequences may lead to incorrect GUI operations. For example, an Android fling action occurs when a user scrolls a touch panel and then quickly lifts his finger; a sequence of motion events is used to represent the operation. When an automated GUI tool replays such an event sequence, each motion event should be triggered on time in order to reproduce the fling with the same scrolling speed. If not, the scrolling speed of the reproduced fling action will differ from what is expected and will thus produce an incorrect result. A commonly used method to address this issue of non-deterministic events is to use the trackball instead of the fling action. However, not every smart phone is equipped with a trackball.
An uncertain runtime execution environment may cause another problem: it may interfere with or delay the execution of an application, especially when the DUT is heavily loaded. A delayed application may fail to process an event correctly if its response to the previous event has not been completed. For example, an event may be dropped if the AUT (Application Under Test) receives the event ahead of time and is not ready to process it. An intuitive way to solve this problem is to delay the execution of the operations. However, this requires experienced engineers to set the delay for each operation properly so that the application can receive the reproduced events.
In summary, based on our observations, automated GUI testing for smart phones faces two major challenges: non-deterministic events and execution interference. In this work, we aim to design an automated GUI testing system that maximizes accuracy under the uncertainty of a runtime execution environment. The accuracy of an automated GUI testing tool is defined as the success rate of examining a bug-free application: the higher the success rate, the higher the accuracy. For this purpose, we designed a Smart Phone Automated GUI testing tool (SPAG), which is based on Sikuli [2-3]. By using the Sikuli IDE (Integrated Development Environment), we can write GUI test cases, execute the scripts, automate GUI operations on the desktop, and verify GUI elements presented in screenshots. To avoid non-deterministic events, SPAG batches each event sequence and reproduces it on the DUT. In addition, SPAG monitors the CPU usage of the target application during runtime and dynamically changes the timing of the following operation so that all event sequences and verifications can be performed on time, even when the DUT is heavily loaded. We conducted several experiments on an Acer Liquid smart phone to investigate the applicability and performance of SPAG, and compared our method with Monkeyrunner [1].
The rest of this paper is structured as follows. Section 2 reviews background and related work. Section 3 gives definitions and the problem statement. Section 4 presents the SPAG design. Section 5 provides the performance analysis, and Section 6 concludes the paper.
2. Background and Related Work
There has been much research dedicated to automated GUI testing. The most common approach is model-based testing (MBT), which models the behavior of target applications and then uses test cases generated from the models to validate the DUT. T. Takala et al. [6] adopted Monkeyrunner and window services to generate GUI events, and L. Zhifang et al. [7] utilized the concept of virtual devices to test applications. These methods rely on image-based pattern matching, which is sensitive to image quality. In contrast, SPAG uses GUI components for pattern matching in order to improve the stability and speed of the validation.
Several techniques and architectures have been developed to cope with complex application tests. MoGuT [8], a variant of the FSM-based test framework, uses image flow to describe event changes and screen responses, but it lacks flexibility. Gray-box testing adopts APIs to construct calling contexts and parameters from input files [4] and verifies testing results based on a logging mechanism. However, for complex software, it becomes difficult to describe the testing logic and calling context. Recently, C. Hu et al. developed an approach for automating the testing process of Android applications by using JUnit and the Monkey tool [7], and W. Yang et al. proposed a method to automatically extract a model of an application [10]. However, both methods use a fixed delay between consecutive GUI operations, while SPAG determines the delay dynamically by using its smart wait function. D. Amalfitano et al. designed a method to automatically generate a model of an application by dynamic crawling [9], but their method requires the source code of the applications under test. In contrast, SPAG does not need the source code.
3. Definitions and Problem Statement
3.1 Definitions
We adopt a commonly used software testing technique for embedded systems called record-replay, which includes a recording stage and a replay stage. Figure 1(a) shows the recording stage, where the screen of the DUT is first redirected to the host PC, on which the test tool runs. An engineer interacts with the DUT remotely: whenever the engineer performs a GUI operation on the host PC, such as a key press or a finger touch, the test tool sends the associated events of the GUI operation to the DUT and records them in a test case. The test case also includes verification operations, added by the engineer, to verify the responses of the DUT. Figure 1(b) shows the replay stage, where the test executor first reads GUI operations from the test case and replays them on the DUT. Finally, the test executor verifies the testing results according to the responses of the DUT.
Let T denote a test case, which includes n operations {o_1, o_2, ..., o_n}. An operation o_i can be a GUI operation or a verification operation; a GUI operation can be a key press or a finger touch, while a verification operation is used to verify the test result. The interval between o_i and o_(i+1) is given by t_i. A GUI operation consists of a sequence of events {e_1, e_2, ..., e_m}. For example, when a fling operation is performed, the Android system generates the associated move events.
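To make the notation concrete, the following minimal Python sketch shows one plausible representation of a test case; the type and field names are our own illustrative choices, not part of the paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Operation:
    """One operation o_i: a GUI operation (key press, finger touch)
    or a verification operation."""
    kind: str                 # "gui" or "verify"
    events: List[Tuple[int, str, int, int]] = field(default_factory=list)
    # Each event e_j is (offset_ms, event_type, x, y); a fling, for
    # example, is a burst of closely spaced move events.

@dataclass
class TestCase:
    """A test case T = {o_1, ..., o_n}; intervals[i] is t_i, the gap
    in milliseconds between o_i and o_(i+1)."""
    operations: List[Operation]
    intervals: List[int]
```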
Figure 1. System architecture of the record/replay method with the DUT: (a) record stage, (b) replay stage.

3.2 Problem Statement

Due to the uncertainty of a runtime execution environment, such as variation in the communication delay between the host PC and the DUT, an event e_j may not be reproduced on the DUT on time. Such non-deterministic event sequences may lead to an incorrect GUI operation that cannot pass the verification operations. Furthermore, the interval t_i between o_i and o_(i+1) may also be affected by the runtime execution environment of the DUT: the GUI application may drop the newly arriving events of o_(i+1) because the previous events of o_i have not yet been processed. The dropped events also lead to test failures. In this work, we define the accuracy of an automated GUI testing tool as the success rate of executing a bug-free application; the higher the success rate, the higher the accuracy. Given a test case and a bug-free application, we aim to design an automated GUI testing system that maximizes accuracy.
4. SPAG Design
We designed the Smart Phone Automated GUI testing tool (SPAG) to accurately reproduce GUI operations and verify test results. In the record stage, SPAG monitors GUI operations and stores these operations, together with the associated CPU time of the DUT, in a test script. An engineer also adds verification operations to the test script in order to verify the results of the GUI operations. In the replay stage, GUI and verification operations are batched and sent to the DUT so that the events can be triggered on time. Based on the CPU utilization of the DUT, SPAG dynamically adjusts the interval between two operations. The testing results are sent back to the host PC for verification.
4.1 Event Batch
In the replay stage, the application running on the DUT keeps monitoring GUI events and takes the corresponding actions. For example, a gesture such as a swipe consists of several multi-touch events; after receiving them, the application scrolls the screen up. However, some GUI operations are sensitive to the timing of their associated events. For example, the onFling GUI operation consists of many move events, and the speed of onFling is sensitive to both the displacement and the time difference between two consecutive move events. If the actual interval between two move events is longer than the interval described in the test script, the speed of the reproduced onFling is slower than expected, and the incorrect GUI operation may lead to test failure. Therefore, in the replay stage, it is crucial to trigger each event on the DUT on time to avoid possible test failures.
In our implementation, SPAG stores the associated events of each GUI operation, together with the event intervals, in the test script. In addition, a tag, such as ACTION_DOWN, ACTION_MOVE, or ACTION_UP, is attached at the end of each GUI operation in order to differentiate two consecutive GUI operations. In the replay stage, SPAG first batches all events and sends them to the DUT. A module on the DUT, rather than one on the host PC, then triggers the events, which removes the effect of communication uncertainty between the DUT and the host PC. A sketch of this idea follows.
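To make event batching concrete, the following minimal Python sketch shows one way a recorded operation could be stored with per-event timing and replayed entirely on the DUT; the record format and the inject() primitive are illustrative assumptions, not SPAG's actual code.

```python
import time

# Hypothetical record format: a GUI operation with a closing tag and a
# list of timed events (offset_ms, event_type, x, y), as in a fling.
fling = {
    "tag": "ACTION_UP",
    "events": [
        (0,  "ACTION_DOWN", 240, 700),
        (16, "ACTION_MOVE", 240, 640),
        (32, "ACTION_MOVE", 240, 520),
        (48, "ACTION_UP",   240, 460),
    ],
}

def inject(event_type, x, y):
    """Placeholder for the on-device injection primitive (e.g., writing
    to the Linux input subsystem)."""
    pass

def replay_batched(operation):
    """Trigger every event of an already-transferred operation locally,
    so host-to-DUT communication delay cannot distort the intervals."""
    start = time.monotonic()
    for offset_ms, event_type, x, y in operation["events"]:
        # Sleep until this event's recorded offset, then fire it.
        delay = start + offset_ms / 1000.0 - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        inject(event_type, x, y)
```

With this scheme, calling replay_batched(fling) reproduces the scrolling speed using only the DUT's local timer, independent of the host connection.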
4.2 Smart Wait
In the replay stage, the recorded GUI operations are sent to the associated application accordingly. However, the execution time of the application may be longer than expected if the execution environment is heavily loaded, and a delayed application may fail to process a GUI operation correctly if the operation arrives earlier than expected. For example, if the DUT receives a button-press operation ahead of time and the AUT is not ready to process it, the operation will be dropped and lead to test failure. A practical method is to have experienced engineers set the interval between every two GUI operations properly, so that the application can process GUI operations on time while the testing time remains reasonable. But the cost of manually adjusting these intervals is high.
In order to improve the efficiency of the test process, SPAG automatically adjusts the delay between two GUI operations based on the CPU time used to perform them. This function is called smart wait. Let P_r denote the process that performs the GUI operations in the record stage. When operation o_i takes place, SPAG monitors the CPU time c_i of P_r during the interval t_i between o_i and o_(i+1). This is achieved by parsing data from the Linux virtual directory /proc: from /proc/<PID>/stat, we obtain the time the process has spent in both user space and kernel space, and from /proc/stat, we obtain the time the CPU has spent in both user space and kernel space. Based on this information, SPAG calculates the CPU usage of the process during t_i. Both t_i and c_i are stored in the test script as CMD(t_i, c_i). Let P_p denote the process that performs the GUI operations in the replay stage. When o_i is executed, SPAG monitors the CPU time c_i' of P_p. If c_i' is smaller than c_i, SPAG assumes that o_i is not complete and postpones the remaining GUI operations by a proportional delay, estimating the completion time as t_i × c_i / c_i'. For example, in the record stage, if o_i used 5 ms of CPU time over a 4 s interval, then c_i is 5 ms and t_i is 4 s, and SPAG inserts the command CMD(4000 ms, 5 ms) into the test script right after o_i. In the replay stage, when o_i is replayed, SPAG first waits 4 s and then reads the associated c_i' from the DUT. If c_i' is 2 ms, SPAG assumes that o_i is not finished and estimates its completion time as 4 s × 5 ms / 2 ms = 10 s; in this case, the next operation is postponed by another 6 s. The sketch below illustrates this logic.
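As an illustration of the proportional-delay calculation, here is a minimal Python sketch under our reading of the mechanism; the tick parsing is simplified (it assumes the comm field of /proc/<pid>/stat contains no spaces) and all names are our own, not SPAG's.

```python
import time

def proc_cpu_ticks(pid):
    """CPU time (user + kernel) of a process in clock ticks, read from
    /proc/<pid>/stat (fields 14 and 15: utime and stime)."""
    with open(f"/proc/{pid}/stat") as f:
        fields = f.read().split()   # simplification: comm has no spaces
    return int(fields[13]) + int(fields[14])

def smart_wait(pid, t_ms, c_ticks):
    """Replay-side wait for one operation o_i, driven by the recorded
    CMD(t_i, c_i) entry: t_ms = t_i, c_ticks = c_i."""
    before = proc_cpu_ticks(pid)
    time.sleep(t_ms / 1000.0)                # wait the recorded t_i
    used = proc_cpu_ticks(pid) - before      # replay-time usage c_i'
    if 0 < used < c_ticks:
        # o_i seems unfinished: scale t_i by c_i / c_i' and wait out the
        # remainder, e.g. 4 s * 5 ms / 2 ms = 10 s, so 6 s more.
        total_ms = t_ms * c_ticks / used
        time.sleep((total_ms - t_ms) / 1000.0)
```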
4.3 Implementation
SPAG integrates two popular open-source tools: AndroidScreenCast and Sikuli. AndroidScreenCast is a desktop application that redirects the screen of the DUT to the host PC and allows an engineer to interact remotely with the DUT using the mouse or keyboard. Sikuli is a desktop application that automates GUI testing by means of screenshot images. In the recording stage, on the host PC, SPAG records all GUI operations performed inside the redirected screen of the DUT. An engineer uses Sikuli's IDE to insert a verification operation at the end of one or several consecutive GUI operations by selecting a region of the redirected screen; the class name and activity name of the redirected screen are also logged at that time. In the replay stage, SPAG reproduces GUI operations by sending the associated events to the DUT. Both event batch and smart wait are adopted to reduce the uncertainty of the runtime execution environment. When performing a verification operation, SPAG first checks the class name and activity name of the redirected screen. If the check fails, SPAG immediately makes an image comparison between the redirected screen and the predefined image, as sketched below. Note that the methodologies of event batch and smart wait are portable: to apply these two techniques on other platforms, one would need an equivalent of AndroidScreenCast to remotely control the DUT, integrated with Sikuli or an equivalent tool to record user interaction.
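A minimal sketch of this two-step verification might look as follows; the data class and the image_match() stub are hypothetical stand-ins (Sikuli's own matcher, or something like OpenCV template matching, would supply the comparison).

```python
from dataclasses import dataclass

@dataclass
class Verification:
    class_name: str      # window class logged at record time
    activity_name: str   # foreground activity logged at record time
    ref_image: bytes     # reference region selected in Sikuli's IDE

def image_match(screen_png: bytes, ref_png: bytes) -> bool:
    """Stub for screenshot comparison (e.g., template matching)."""
    raise NotImplementedError

def verify(class_name: str, activity_name: str,
           screen_png: bytes, v: Verification) -> bool:
    # Cheap check first: compare the logged class and activity names.
    if class_name == v.class_name and activity_name == v.activity_name:
        return True
    # Fall back to image comparison only if the name check fails.
    return image_match(screen_png, v.ref_image)
```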
5. Experimental Results
5.1 Experiment Setup
In order to investigate the accuracy of SPAG, we adopted an Acer Liquid smart phone for evaluation. We compared SPAG with Monkeyrunner, an automated testing tool included in the Android SDK. Monkeyrunner reproduces predefined operations, such as key presses, by generating the associated events and sending them from the host PC to the DUT [1]. Our test script included five commonly used scenarios: browsing a contact entry, installing an app over Wi-Fi, taking a picture, recording a video, and browsing Google Maps over Wi-Fi. As Figures 2 and 3 show, we used a busy-loop program to adjust the CPU utilization from 25% to 100% (a sketch of such a load generator is given below) and an intensive flash read/write program to simulate an I/O-burst condition. For each configuration, such as a CPU utilization of 25%, 50%, 75%, or 100%, we repeated the same experiment 40 times and took the average accuracy for comparison.
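The paper does not list its busy-loop program, but a duty-cycle loop along these lines could produce the stepped CPU workloads; the period and implementation details are our assumptions.

```python
import time

def busy_loop(utilization, period_s=0.1):
    """Keep one core busy for the given fraction of each period,
    e.g. utilization=0.25 approximates a 25% CPU workload."""
    assert 0.0 < utilization <= 1.0
    while True:
        start = time.monotonic()
        while time.monotonic() - start < utilization * period_s:
            pass                                  # spin: burn CPU
        time.sleep((1.0 - utilization) * period_s)
```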
5.2 Test Accuracy
This experiment evaluated the accuracy of SPAG and Monkeyrunner. In Figure 2, the x-axis is the workload type and the y-axis is the accuracy. We checked the accuracy of Monkeyrunner manually because it does not provide an image-comparison function good enough to verify testing results. The accuracy of Monkeyrunner dropped significantly when the CPU utilization increased or the I/O subsystem was busy. For example, it dropped to 64.5% when CPU utilization was 100% and to 26.5% when I/O bursts occurred. This was because the tested application was deferred for execution when the system was heavily loaded, and Monkeyrunner does not dynamically adjust the interval between two consecutive operations; as a result, newly incoming events were dropped or ignored, which made the Monkeyrunner tests fail. In contrast, with the smart wait function, the accuracy of SPAG decreased only slightly when CPU utilization increased or I/O bursts occurred. The accuracy of SPAG was over 90% in all configurations we tested. In particular, it stayed at 99.5% under normal conditions, in which the CPU utilization was less than 25%.
Figure 2. Testing with SPAG and Monkeyrunner. (Accuracy over Normal, 25% CPU, 50% CPU, 75% CPU, 100% CPU, and I/O busy: Monkeyrunner 88.0%, 85.5%, 77.5%, 65.5%, 64.5%, 26.5%; SPAG 99.5%, 97.5%, 98.5%, 96.5%, 96.5%, 91.0%.)
With the same experiment setup, we also adopted three popular mobile apps, Skype, Twitter, and Facebook, to evaluate the accuracy of SPAG and Monkeyrunner. The major gesture activity of Skype was tap, while that of Twitter and Facebook was fling. As Table 1 shows, SPAG still maintained a very high level of accuracy in all configurations. In contrast, Monkeyrunner performed poorly when the system was busy, especially for Twitter and Facebook, because it could not trigger the associated events of a fling on time; the reproduced fling speed was therefore inaccurate and the tests failed.
Workload    Skype               Twitter             Facebook
            SPAG  Monkeyrunner  SPAG  Monkeyrunner  SPAG  Monkeyrunner
Normal      97.5  92.5          99.5  92.5          97.5  72.5
25% CPU     97.5  99.5          99.5  92.5          97.5  65.0
50% CPU     99.5  99.5          99.5  72.5          97.5  60.0
75% CPU     99.5  99.5          99.5  40.0          92.5  60.0
100% CPU    99.5  99.5          99.5  37.5          92.5  40.0
I/O busy    99.5  72.5          95.0  20.0          92.5  40.0
Table 1. Accuracy of SPAG and Monkeyrunner, in percentage
5.3 Effects of Smart Wait and Event Batch on Accuracy
This experiment evaluated the effects of smart wait and event batch. Event batch aims to remove the communication uncertainty between the DUT and the host PC, while smart wait aims to remove the uncertainty of the DUT's runtime execution environment; they can be applied together or separately, depending on those two sources of uncertainty. As Figure 3 shows, in the case of a 100% CPU workload, the accuracy of SPAG was 77.5% with only the event batch function and 92% with only the smart wait function. The smart wait function thus contributes more than the event batch function to accuracy when the system is busy. This is because smart wait can be applied to all GUI operations, while event batch only improves the accuracy of moving GUI operations, such as scroll and flick.
Figure 3. Testing with Batch Event and Smart Wait. (Series: SPAG(All), SPAG(Smart Wait), SPAG(Batch Event), and Monkeyrunner; x-axis: additional workload; y-axis: accuracy.)
6. Conclusions
In this work, we designed SPAG to avoid non-deterministic events by batching event sequences and reproducing them on the DUT directly. In addition, SPAG monitors the CPU usage of the target application at runtime and dynamically changes the timing of the next operation so that all event sequences and verifications can be performed on time, even when the DUT is heavily loaded. Our experiments showed that SPAG can maintain a high accuracy of up to 99.5%. In our current design, any smart phone supported by AndroidScreenCast can be tested with SPAG without modification. In the future, we plan to design a fully platform-independent automated GUI testing system.
Acknowledgement
This work was supported in part by the National Science Council and the Institute for Information Industry, Taiwan.
References
[1] Monkeyrunner. Available:
http://developer.android.com/guide/developing/tools/monkeyrunner_concepts.html
[2] T. Yeh, T.-H. Chang, and R. C. Miller, "Sikuli: using GUI screenshots for search and
automation," in Proceedings of the 22nd Annual ACM Symposium on User Interface
Software and Technology, 2009.
[3] T.-H. Chang, T. Yeh, and R. C. Miller, "GUI testing using computer vision," in Proceedings
of the 28th International Conference on Human Factors in Computing Systems, 2010.
[4] V. R. Vemuri, "Testing predictive software in mobile devices," in Proceedings of the 2008
International Conference on Software Testing, Verification, and Validation, 2008.
[5] Q. Xie and A. M. Memon, "Using a pilot study to derive a GUI model for automated testing,"
ACM Transactions on Software Engineering and Methodology, vol. 18, pp. 1-35, 2008.
[6] T. Takala, M. Katara, and J. Harty, "Experiences of system-level model-based GUI Testing
of an Android application," in 2011 IEEE Fourth International Conference on Software
Testing, Verification and Validation, 2011.
[7] C. Hu and I. Neamtiu, "Automating GUI testing for Android applications," in Proceedings of
the 6th International Workshop on Automation of Software Test, 2011.
[8] O.-H. Kwon and S.-M. Hwang, "Mobile GUI testing tool based on image flow," in
Proceedings of the Seventh IEEE/ACIS International Conference on Computer and
Information Science, 2008.
[9] D. Amalfitano, A. R. Fasolino, P. Tramontana, S. D. Carmine, and A. M. Memon, "Using
GUI ripping for automated testing of Android applications," in Proceedings of the 27th
IEEE/ACM International Conference on Automated Software Engineering, 2012.
[10] W. Yang, M. R. Prasad and T. Xie, "A grey-box approach for automated GUI-model
generation of mobile applications," in Proceedings of the 16th International Conference on
Fundamental Approaches to Software Engineering, 2013.
Ying-Dar Lin is a professor in the Department of Computer Science, National Chiao Tung
University, Taiwan. His research interests include embedded systems, network protocols and
algorithms. He received a Ph.D. in CS from UCLA. He is an IEEE Fellow. Contact him at
Edward T.-H. Chu is an assistant professor in the Department of Computer Science and
Information Engineering, National Yunlin University of Science and Technology, Taiwan. His
research interest is embedded system software. He received a Ph.D. in CS from NTHU, Taiwan.
Contact him at [email protected].
Shang-Che Yu is a software engineer. He received an MS in CS from NCTU, Taiwan. Contact
him at [email protected].
Yuan-Cheng Lai is a professor in the Department of Information Management, National Taiwan
University of Science and Technology, Taiwan. His research interests include performance
analysis and wireless networks. He received a Ph.D. in CS from NCTU, Taiwan. Contact him at