
Linköping University | Department of Computer and Information Science
Bachelor thesis, 16 ECTS | Datateknik
Spring 2020 | LIU-IDA/LITH-EX-G--20/023--SE

Visualization of machine learning data for radio networks

A case study at Ericsson

Bingyu Niu

Supervisors: Daniel Karlsson (Ericsson), Magnus Johansson (Ericsson), Zeinab Ganjei (Linköping University)
Examiner: Mikael Asplund (Linköping University)

Upphovsrätt

This document is held available on the Internet – or its possible replacement – for a period of 25 years from the date of publication, provided that no exceptional circumstances arise.

Access to the document implies permission for anyone to read, download, print single copies for personal use, and to use it unchanged for non-commercial research and for teaching. Transfer of the copyright at a later date cannot revoke this permission. All other use of the document requires the consent of the author. To guarantee authenticity, security and accessibility, there are solutions of a technical and administrative nature.

The author's moral rights include the right to be named as the author, to the extent required by good practice, when the document is used as described above, as well as protection against the document being altered or presented in a form or context that is offensive to the author's literary or artistic reputation or character.

For additional information about Linköping University Electronic Press, see the publisher's home page http://www.ep.liu.se/.

Copyright

The publishers will keep this document online on the Internet – or its possible replacement – for a period of 25 years starting from the date of publication barring exceptional circumstances.

The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© Bingyu Niu

Abstract

This thesis presents a method to develop visualization software for time-varying and geographic-based data. The machine learning team at Ericsson has collected data from their machine learning algorithms. The data set contains timestamped and geographic information. To have a better understanding of the results produced by the machine learning algorithms, it is important to understand the pattern of the data. It is hard to see the pattern of the data by only looking at the raw data set, and data visualization software will help the users to get a more intuitive view of the data. To choose a suitable GUI library, three common GUI libraries were compared. The Qt framework was chosen as the GUI library and development framework because of its wide-ranging support for user interface design. Animation is the main method used to visualize the data set. The performance evaluation of the software shows that it handles the back-end data efficiently, renders fast in the front-end and has low memory and CPU usage. The usability testing indicates that the software is easy to use. In the end, the thesis compares its method to a previous method, developed in R. The comparison shows that even though the old method is easier to develop, it has worse performance.

Acknowledgement

I would like to thank my supervisors at Ericsson, Daniel Karlsson and Magnus Johansson, for all their help and support on both practical and theoretical aspects. I also want to thank the examiner Mikael Asplund and the supervisor Zeinab Ganjei from the university, who guided the whole process of the thesis work and provided much advice and many suggestions on academic research.

Table of Contents

Upphovsrätt
Copyright
1. Introduction
   1.1 Background
   1.2 Motivation
   1.3 Aim
   1.4 Approach
   1.5 Delimitation
2. Background
   2.1 Explainable artificial intelligence
   2.2 Time-varying data visualization
   2.3 C++
   2.4 R language
   2.5 Qt software development framework
   2.6 Other GUI libraries
      2.6.1 GIMP Tool Kit: GTK
      2.6.2 wxWidgets
   2.7 Usability testing
   2.8 Related work
3. Method
   3.1 Development language and tool
   3.2 Implementation
      3.2.1 Structure of the software
      3.2.2 User interface
      3.2.3 Front-end
      3.2.4 Back-end
      3.2.5 Two-way communication
   3.3 Evaluation
      3.3.1 User Feedback
      3.3.2 Software Performance
      3.3.3 Comparison
4. Result
   4.1 User Interface
   4.2 Performance of the software
      4.2.1 Data preparation
      4.2.2 Response time for the selected time
      4.2.3 CPU and memory usage
      4.2.4 Rendering time
   4.3 Comparison
      4.3.1 Data preparation
      4.3.2 Response and rendering time
      4.3.3 CPU and memory usage
      4.3.4 Implementation efficiency
   4.4 Usability testing result
5. Discussion
   5.1 Method discussion
   5.2 Result discussion
   5.3 Ethical and societal consideration
6. Conclusion
7. References

List of figures

Figure 1.1: One Node with three cells
Figure 1.2: Relationship of the machine learning algorithms and visualization software
Figure 2.1: Signals and Slots mechanism in Qt
Figure 3.1: Structure of the software
Figure 3.2: Layout of the user interface
Figure 3.3: Structure of a QML file
Figure 3.4: Back-end modules
Figure 3.5: Workflow of the back-end
Figure 3.6: Data containers
Figure 3.7: Output from top command
Figure 4.1: User Interface
Figure 4.2: Time taken for data preparation of new software
Figure 4.3: CPU usage for the new software
Figure 4.4: Data preparation comparison
Figure 4.5: CPU usage for the old software
Figure 4.6: CPU and memory usage comparison in percentage

List of tables

Table 1.1: Input data set
Table 3.1: Comparison of GTK, Qt and wxWidgets
Table 4.1: CPU and memory usage of the old software
Table 4.2: Result of usability testing with SUS questions

1. Introduction

This is a bachelor's thesis on data visualization, written as a case study at Ericsson. In this chapter, the background, motivation, aim, approach, and delimitation will be described.

1.1 Background

Ericsson is one of the biggest networking and telecommunication companies providing Information and Communication Technology (ICT) solutions. Ericsson is a Swedish company that does business in many countries and regions around the world [1].

There is a massive amount of data collected from Ericsson's 2G, 3G, 4G and 5G radio networks, and the machine learning servers/models use this data to provide better services and solutions. The abstract nature of machine learning makes it hard to understand the data provided by machine learning devices.

This thesis was written at the machine learning team at Ericsson AB Linköping. The team consists of twelve software developers, two of whom were the supervisors for this thesis.

User equipment (UE) is the device that is used for communication by end-users. The most common UE is a mobile phone. When the connection between a UE and the core network is weak or lost, the UE needs to find another connection with better performance. In cellular telecommunications, one cell means one network coverage area. Handover is the switching of a connection from one cell to another. The team develops machine learning algorithms to get a better prediction of handover.

Ericsson has radio stations at many sites and the machine learning algorithms are connected to the stations. In this thesis each station will be called a "node". As shown in Figure 1.1, each node has several cells that have different coverage areas. Cells collect UE events such as signal strength from the coverage areas, and those events are then used to train machine learning models. There are different machine learning models for data training. The number of models in one node depends on the number of coverage areas. Each model has several training states, including not trained, not valid, valid, not outdated, outdated, in lobby, and on hold.

Figure 1.1: One Node with three cells

1.2 Motivation

The technology of artificial intelligence and machine learning has developed fast during the past decade. Machine learning models have been applied in different industries to provide better predictions. The users sometimes have difficulty understanding why the models produce a certain result. Explainable artificial intelligence is a concept intended to make the results from those machine learning models more understandable by humans [2]. To visualize the performance of different machine learning models, a Graphical User Interface (GUI) is needed to present the data in a more human-friendly way.

The relationship between the machine learning algorithms and the visualization software is shown in Figure 1.2. The changes of the models' training state are recorded as the result of the machine learning algorithms. These data will be the input data of the visualization software and include the following attributes: node name, cell id, frequency, machine learning model id, timestamp, state of data training, latitude and longitude. Table 1.1 shows what the raw data set looks like. It is hard to see the state changes for different nodes in one area by just looking at the data set. Therefore, it is desirable to have visualization software that can show the changes of each model state for each node in a continuous time flow.

Figure 1.2: Relationship of the machine learning algorithms and visualization software

Table 1.1: Input data set

Node  Cell  Frequency  Model ID   Time             State         Latitude  Longitude
N001  1     347        0x105454   5/15/2019 6:59   VALID         48.5646   1.5864
N001  1     347        0x205454   5/13/2019 20:24  NOT_OUTDATED  48.5646   1.5864
N001  3     1288       0x168402   5/11/2019 20:35  NOT_TRAINED   48.5646   1.5864
N002  7     6300       0x242956   5/12/2019 2:42   ON_HOLD       48.5874   1.5789
N002  2     347        0x245688   5/12/2019 2:42   NOT_VALID     48.5874   1.5789
N003  5     2850       0x229898   5/12/2019 2:42   VALID         48.5836   1.5584
N004  4     1288       0x158912   5/12/2019 17:48  ON_HOLD       48.5744   1.5784
...   ...   ...        ...        ...              ...           ...       ...

Here are some scenario examples that are expected from the visualization software.

Scenario one:
From the visualization, users see that the majority state of the models in one node becomes ready in a short time, then the state of this node changes very fast during the following three days, and after that, the state of this node becomes stable. It is interesting for the users to see this, so they can consider why the model state changes like this.

Scenario two:
There is one node that is never ready, and its colour is always red. The user clicks on the node and sees that there are several models applied to this node. Some of them work well and their states are ready, but some of them perform worse. In this way, it is easy to see which model has better performance.

Scenario three:
The visualization software helps to show that different areas have their own pattern of model performance. The state changes and the majority state can differ between geographical areas, and this may depend on factors such as city size, number of user equipment, etc.

Previous thesis students developed visualization software in the form of a web application written in the R language [3]. However, this software has a long response time when users interact with the user interface. Because of the long response time, this software is not in use. Thus, new visualization software with better performance is needed.

It is important and interesting to find proper methods to develop the software and evaluate its performance. This thesis develops new visualization software and then compares it to the previous software to find out the differences between the two approaches from various aspects.

1.3 Aim

The aim of this thesis is to find an approach to develop visualization software that can display the machine learning data of a radio network in an understandable and efficient way. Here are the research questions based on the motivation:

1. How can visualization software for time-varying data with geographic information be implemented, so that the data can be understood better and the software has good performance?

2. How does the performance of the new software compare to the old software, and what are the advantages and disadvantages of the two approaches?

To answer these research questions, several issues need to be examined:

• Find a programming language that executes code fast and is easy to use for software development.
• Find a GUI framework that can fulfil the visualization requirements and has good execution performance.
• Find suitable data structures and containers to store data so that it is easy to reach and use the data.
• Find a way to control the data import process so that the data input is correct, transparent, and controllable.
• Find a method to evaluate the software's performance.
• Compare the performance of the new and old software.
• Find a method to evaluate the users' experience.

1.4 Approach

The study began with reading theory about data visualization, explainable artificial intelligence, development approaches, common GUI libraries and software evaluation. A suitable development language and GUI library were chosen based on the theory study, the requirements and the development environment. The implementation idea was presented, including back-end and front-end. The front-end description focused on how to show the information in a way that is easy for humans to grasp and what kinds of tools were needed. The back-end focused on the data input, data storage and data access. The two-way communication between front-end and back-end was described. After the implementation was done, the software was evaluated from two aspects: the user experience and the software performance. Then, the old software's performance was evaluated. The performance of the two programs was compared and analysed.

1.5 Delimitation

Because of the limited time, the study will choose suitable tools based on a limited theory study and comparison. There are other ways to develop the software, but this thesis will focus on one possible way. The comparison of the new and old software will not consider code quality, algorithm choices, or other detailed issues. Since the data set was not collected by the author, the ethical issues of data collection will not be discussed in this thesis.

2. Background

This chapter presents relevant theory about data visualization, programming languages, GUI libraries and evaluation methods. This theory offers a foundation for answering the research questions. Then, related work is presented.

2.1 Explainable artificial intelligence

Machine learning algorithms have been applied in different industries to provide better predictions. Most of these algorithm structures are non-linear and users have difficulty understanding why the machine learning algorithms produce a certain result [4]. People can control the input data and will get an output from the machine learning algorithms, but how the algorithms make the decision is unknown to the users. This is the so-called "black box" in machine learning, which is not transparent to users [4]. If people do not understand how a decision was made by artificial intelligence, it will be hard to trust and explain the result. Explainable artificial intelligence, which is the opposite of the "black box", is a method that makes the results from machine learning models more understandable to humans [5, 4]. Explainable artificial intelligence aims to provide better explanation and transparency of why a result was reached by machine learning algorithms or artificial intelligence [4, 5].

There are some goals for explainable artificial intelligence, and they include trustworthiness, causality, transferability, informativeness, accessibility, and interactivity [2]. Trustworthiness suggests that the model should be trustworthy and able to act as expected. Causality means that the model can show the relationship between various variables. Transferability requires that the existing model or solution can be applied to other problems. Informativeness means that the models need to provide enough clear information so the users can understand the decision. Accessibility requires that users with different knowledge levels can get the main clues and facts of the model rapidly. Interactivity means that the users can interact with the model.

2.2 Time-varying data visualization

The input data of the visualization software is time-based because all the collected data have a timestamp. Dynamic data that changes over time is called time-varying or time-based data [6]. The characteristics of data updates can sort data into different categories, such as continuous and discontinuous, regular and irregular, noisy and significant [6]. Moreover, according to the behaviour of the data, it can be separated into three different types: regular, periodic and turbulent [7]. Regular data means that the value change has a stable tendency over time, such as rising, stable, or decreasing. Periodic data is for instance temperature, which changes between day and night. Turbulent data means that the data varies a lot in both spatial and temporal aspects [7].

Moere [6] summarized several methods to visualize time-varying data, including static state replacement, time-series plots, static state morphing, and control application. Static state replacement refers to updating a value by replacing the current data value with a new value. The time-series plot method is usually shown as a chart with curves and a timeline. Static state morphing presents data that has been filtered by a selected time interval. Control application requires that the visualization and result can be produced at any time during the execution.

From another point of view, time-varying data can be presented in two different ways: space and animation [8]. With space, the length of time or a time interval can be shown as a line in space. With animation, the visualization view changes based on the change of time.

It is popular to combine spatial information with time animation when visualizing time-dependent data. Interactivity is important for animation and the user should be able to choose the time point or filter the data [8]. The speed of the animation should be clear and slow so the user can see the development and changes clearly. Both stationary data presentation and animation are useful for data visualization, and the choice of type depends on the task requirements and data type [8, 9]. 2D is suitable for data visualization when the data set and task are not very complicated [9].

2.3 C++

In this thesis work, C++ was chosen as the development language for the back-end. This section introduces the features of C++. Seed [10] describes C++ as a general-purpose and cross-platform language which was developed based on the C language. It is an object-oriented language, but since it is an extension of C, it can also be used for structured programming. The object-oriented features offer clear structures for development and code reuse. C++ allows developers to control the resource and memory usage of the system. Direct memory access is available in C++, which improves speed and efficiency. C++ is a compiled language and the compiler translates the code to machine code which can be executed directly by the machine. Code can be compiled directly without going through a virtual machine, which contributes to the fast speed.

2.4 R language

R is the development language of the previous software and this section presents its features. R is a general-purpose programming language for statistical computing that can be used on different platforms. R is an implementation and environment of the S language, where S stands for statistics [11]. R has been widely used for data analysis and data mining. R is an interpreted language, which means that the interpreter first needs to translate the source code to a sequence of instructions and then those instructions can be translated to machine code [12].

2.5 Qt software development framework

Koranne [13] introduces Qt as an application development framework that provides a GUI library for visualization. It is written in C++ and supports many other languages including C++, Java, Python, Go, C#, Ruby, etc. As a cross-platform framework, the source code compiles on many platforms including UNIX, GNU/Linux, and embedded Linux. Qt has many features that can fulfil various needs of developers. Besides the core module, Qt 4 and later versions have many other independent modules and each module can be used independently. Some of the common modules are QtGUI, QtNetwork, QtOpenGL, QtSql, QtXML, and QtSVG. This means that Qt supports a wide range of applications and demands. Qt not only provides a GUI solution but also a wide range of application programming interfaces (APIs) including memory sharing, databases, multi-threading, and network programming. Qt has a commercial version and an open-source version. The commercial version is under a commercial license and the open-source version is under the LGPL license.

In other toolkits, the communication among objects is usually implemented with callback functions, but Qt introduces a special mechanism for this called signals and slots [14]. Each object can send signals to others and receive signals through slots. As shown in Figure 2.1, when some event occurs for an object, the object emits a signal to another object's slot.

Figure 2.1: Signals and Slots mechanism in Qt

Qt provides C++ extensions and those extensions are processed by the Meta-Object Compiler (MOC) [14]. MOC parses those extensions and produces standard C++ sources which can be compiled by a standard C++ compiler. The QObject class is a Qt C++ extension that supports object communication by signals and slots [14].

Qt Creator is a cross-platform integrated development environment (IDE) for developing Qt applications. It supports desktop, mobile and embedded platforms. Qt Creator provides tools to analyse the code performance, including CPU and memory usage [14].

QML is a programming language for developing user interfaces. The syntax of QML is similar to JSON and it also supports JavaScript expressions. QML modules provide the engines and substructures of QML [14]. One module, Qt Quick, offers several visualization components and an animation framework.

2.6 Other GUI libraries

There are other GUI libraries for visualization; in this section, two of them are introduced.

2.6.1 GIMP Tool Kit: GTK

GTK is a cross-platform, object-oriented toolkit for graphical user interfaces released under the LGPL license [13]. The toolkit was originally developed for the GNU Image Manipulation Program (GIMP), which is why it is called the GIMP Tool Kit. It is written in C and supports several programming languages including Python, C/C++, Perl, and Java. The toolkit is part of the GNU project and it is also free [15]. The user interface of GTK contains many widgets including windows, displays, buttons, menus, toolbars, etc. [15].

2.6.2 wxWidgets

The documentation of wxWidgets [16] introduces wxWidgets as a cross-platform GUI library written in C++ that can be compiled by a C++ compiler. It supports several other languages such as C#, Perl and Python. With the growth of features, it can support many toolkits and platforms including GTK and Qt. wxWidgets uses the functions from the native platforms and provides an API for coding GUI applications. Because wxWidgets uses the native API, applications get a native look. There are many GUI components that can support different types of application development. The licence of wxWidgets is the "wxWindows Library Licence", which is similar to the LGPL but has some exceptions.

2.7 Usability testing

Usability is one of the quality requirements for products that interact with users [17]. It means that a product satisfies users' demands and that the interface of the product is easy to use. Usability testing is an evaluation tool to test whether a product interface is easy for users to use or not. The purpose of this testing is to get feedback directly from the users and then improve the product based on the feedback. The testing is usually used in user-centred design and it helps to determine whether the product meets the users' expectations [17]. The testing environment should be realistic, which means that the product needs to be accessible to the real users who are going to test it. There are various methods for usability testing including A/B testing, hallway testing, and expert review [17]. A/B testing means evaluating a variable or element from opposite sides. Usually, the user gets two questions for the same issue, but these two questions A and B are against each other.

Hartson and Pyla [18] describe the System Usability Scale (SUS) as a method to measure the usability of a software or system. SUS provides a questionnaire with ten questions and a scoring system. The questions were designed based on the A/B testing idea: for each aspect, there are two questions against each other. With SUS, it is easier to know whether a system is usable or not. It is widely used nowadays to measure the usability of websites, but it is also suitable for a wide range of digital products including software applications [19].

The SUS questionnaire contains ten standard questions. The first standard question is "I think that I would like to use this system frequently". Since this software will be used by few people on special occasions, the word "frequently" was changed to "when needed". The ten questions in the SUS questionnaire are as follows [19]:

1. I think that I would like to use this system when needed.
2. I found the system unnecessarily complex.
3. I thought the system was easy to use.
4. I think that I would need the support of a technical person to be able to use this system.
5. I found the various functions in this system were well integrated.
6. I thought there was too much inconsistency in this system.
7. I would imagine that most people would learn to use this system very quickly.
8. I found the system very cumbersome to use.
9. I felt very confident using the system.
10. I needed to learn a lot of things before I could get going with this system.

The score scale for each question is from 1 to 5 points, which stand for strongly disagree, disagree, neutral, agree, and strongly agree. The calculation of the total score follows these rules [19] (a short calculation sketch is given after the score intervals below):

1. The score of an odd-numbered question = points − 1.
2. The score of an even-numbered question = 5 − points.
3. All the scores of both odd and even questions are summed together.
4. SUS score = summed score × 2.5.

The SUS score lies on a scale from 0 to 100 and indicates the usability performance. The interpretation of the score intervals is as follows [19]:

• SUS score < 50: The performance is not acceptable.
• SUS score is between 50–70: The performance is marginal.
• SUS score > 70: The performance is acceptable.
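
As a small sketch of the calculation, the scoring rules can be expressed in C++ as follows; the answers array is a made-up example response, not data from the thesis.

#include <array>
#include <iostream>

// Minimal sketch of the SUS scoring rules described above.
// `answers` holds the raw 1-5 responses to the ten questions, in order.
double susScore(const std::array<int, 10> &answers) {
    int sum = 0;
    for (int i = 0; i < 10; ++i) {
        // Questions are 1-indexed: odd-numbered questions score (points - 1),
        // even-numbered questions score (5 - points).
        if ((i + 1) % 2 == 1)
            sum += answers[i] - 1;
        else
            sum += 5 - answers[i];
    }
    return sum * 2.5;  // scale the summed score to 0-100
}

int main() {
    // Example response: "agree" (4) on the odd questions, "disagree" (2) on the even ones.
    std::array<int, 10> answers{4, 2, 4, 2, 4, 2, 4, 2, 4, 2};
    std::cout << "SUS score: " << susScore(answers) << '\n';  // prints 75
    return 0;
}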

2.8 Related work

This thesis was written as a case study; hence it is difficult to find previous studies which are closely related to this thesis. Due to the time limitation, an elaborate literature study was not performed. However, three student theses were found which were written in related areas.

Håkansson [20] evaluated Qt's abilities to create customized graphical components and how easy it is to reuse those components in different projects. Håkansson built a control system for CAN-bus signals with the Qt framework in an embedded system. A prototype of the system architecture was provided.

Anderson [21] researched tools that can quickly build data visualizations for IoT systems. Apache Zeppelin was chosen as a proper tool to visualize IoT data. The study also examined the limitations of this tool for data visualization in IoT systems. The performance of Apache Zeppelin was evaluated by a usability scale, a summed usability metric and interviews.

Karlsson [22] created visualization software to visualize log data. The purpose of the software development was to create a tool that can help the company, OptoNova, to improve their troubleshooting. The visualization software was developed in C++ with Qt. A database was designed and created to store the log data.

3. Method

This part demonstrates how the research questions are answered. First, it presents the chosen development language and tool. After that, it presents the implementation. The last part is about evaluation and testing.

3.1 Development language and tool

Due to the efficiency and execution speed of C++, it will be the development language for the back-end. The current platform is Linux, thus the GUI library should support the Linux OS. Considering the potential usage of other platforms in the future, a cross-platform GUI is a better choice because it provides wider possibilities for future development or usage. The Qt framework was chosen as the development framework because of its wide-ranging support for interface development.

There are many GUI libraries that support C++, but the libraries mentioned in sections 2.5 and 2.6 are mentioned more often than others in literature and online sources. Table 3.1 shows a simple comparison of GTK, Qt and wxWidgets from different aspects.

Table 3.1: Comparison of GTK, Qt and wxWidgets

                      GTK                     Qt                      wxWidgets
Development language  C++                     C++                     C++
Cross-platform        Yes                     Yes                     Yes
Licence               LGPL                    LGPL                    wxWindows Library Licence
Compiler              Standard C++ compiler   Standard C++ compiler   Standard C++ compiler
Libraries/Modules     Limited                 Wide range              Limited
IDE                   No special IDE          Qt Creator              No special IDE
Map plugin            Not bundled             Bundled                 Not bundled

The comparison of the different cross-platform C++ GUI libraries shows that Qt is the better choice for the software development because of its powerful functionality. Since the data visualization should be very easy for users to understand, a powerful toolkit is needed.

The goal is to locate all the nodes on a map, hence the support for a map is important for this project. Qt supports many map plugins such as OpenStreetMap, Mapbox GL, HERE, and Esri. Since Qt 5.5 all these common map plugins are already bundled with Qt. It is very simple to use the map: developers just need to use the plugin key. For example, the plugin key for OpenStreetMap is "osm", so by adding "osm" as the name in the plugin, the map will be loaded.

For GTK, there are some plugins available and the developer needs to get and install those packages before using them. For wxWidgets, there is no clear indication and support for loading a map.

Compared with the other GUI libraries, Qt provides a wider range of modules to support different demands, such as QML and Qt Widgets. To show data in a more intuitive way for humans, Qt, with its wide range of modules and advanced features for data visualization, is a proper choice for this thesis. It should be easy to add more features and functions to the software with Qt modules. Moreover, Qt Creator makes it easy to develop the application and it provides performance analysis tools.

3.2 Implementation

This section describes the structure of the software and then presents how the different parts of the software are implemented.

3.2.1 Structure of the software

The software consists of three parts: the user interface, the front-end, and the back-end. The user interface shows the data visualization and allows users to interact with it. The back-end handles data storage, data sorting, value calculation, and data updating. The front-end is responsible for creating the user interface and rendering items on the user interface. The back-end is implemented in C++ and the front-end is written in QML.

Figure 3.1 shows the structure of the software. The front-end receives instructions from users and then sends update instructions to the back-end. An update instruction can be to start/stop updating or to terminate the software. Based on the instructions, the back-end updates data. Then, the updated information is sent to the front-end. The data update information contains the new changes of the data at the back-end. When the front-end receives the data update information, it updates the user interface with the received data.

Figure 3.1: Structure of the software

3.2.2 User interface

The user interface visualizes the data and allows users to interact with it. To visualize the data, animation is a proper method to simulate events happening over time. When one event occurs, the software can calculate the new value of the majority state of a node and then update the information in the user interface. Different states can be presented in different colours. Since there is no special need for 3D visualization, 2D visualization is an appropriate choice. Because each event has a timestamp and a location, it is reasonable to have the nodes located on a map with a timeline. The user interface will consist of three main parts: an information window, a map window, and a time slider view. Figure 3.2 illustrates the layout of the user interface.

All the nodes can be located inside the map window with different colours which stand for the different states. When the state changes, the node colour changes simultaneously. A timeline can be located inside the time slider view to show time changes. Users can select a certain time on the timeline. Animation control buttons can be located in the same area. In the information window, the detailed data of a node can be presented as text. If the node information has been updated by a new event, the text should be updated at the same time.

Figure 3.2: Layout of the user interface

3.2.3 Front-end

The Qt Quick module and the QML language will be used to implement the front-end. Based on the design of the user interface, the front-end also contains three main modules: the map window, the time slider view and the information window. The map window is responsible for rendering the map and the items on the map, and for handling click events from user interaction. The time slider view shows the changes of time and handles time-selected events. This area is also responsible for animation control. The responsibility of the information window is to get information from the back-end and show detailed information for a selected node. The structure of a QML file is a hierarchy of objects and functions. A QML file needs to have one, and only one, root object. All other objects or functions stay inside the root object. An example structure of a QML file is presented in Figure 3.3. The root object is the application window and under it there are three sub-objects: the map window, the time slider view, and the information window. Under those objects, there are other sub-objects and functions.

Figure 3.3: Structure of a QML file

To implement the components of the front-end, many QML elements will be used, and here are some important QML elements for the implementation:


• QML has a Map type that allows developers to draw different map elements on the map. The MapCircle QML type with a coordinate property is a proper choice for rendering nodes.

• The time flow is an important element of the visualization, so there will be a timeline that shows the time of the animation and allows users to drag the time point on the timeline to decide which moment they want to see. The Slider QML type provides many features to implement the timeline.

• The TextArea QML type can display the information of a selected node. The ScrollView QML type allows the text area to become scrollable.

• To trigger different events and functions at the front-end, the Timer QML type can be applied.

3.2.4 Back-end

The back-end needs to import data from CSV files, store the data, and handle data updating. It also needs to send update information to the front-end. Figure 3.4 shows the main modules of the back-end.

The data preparation module is responsible for importing data from CSV files and creating data containers for data storage. It creates one data container to store the events' information and another container to store the nodes' information. The data update module is responsible for receiving update instructions from the front-end, updating node data in the back-end and sending update information to the front-end. When the data update module receives an update instruction from the front-end, it reads new events from the event data container. Then, it updates the node information in the node data container and sends the update information to the front-end. Information of a selected node can be read from the node data container.

Figure 3.4: Back-end modules
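
A minimal C++ sketch of such a CSV import step is given below. The simple comma-splitting and the column order follow Table 1.1; the assumption of plain, unquoted fields is an illustration, not a description of the actual thesis code.

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Split one CSV line into fields; assumes simple comma-separated values
// without quoted commas, matching the columns shown in Table 1.1.
std::vector<std::string> splitCsvLine(const std::string &line) {
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ','))
        fields.push_back(field);
    return fields;
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        std::cerr << "usage: demo <events.csv>\n";   // file path given on the command line
        return 1;
    }
    std::ifstream in(argv[1]);
    if (!in) {
        std::cerr << "could not open " << argv[1] << '\n';
        return 1;
    }
    std::string line;
    std::getline(in, line);                // skip the header row
    while (std::getline(in, line)) {
        auto f = splitCsvLine(line);       // node, cell, frequency, model id, time, state, lat, long
        if (f.size() == 8)
            std::cout << "event for node " << f[0] << " in state " << f[5] << '\n';
    }
    return 0;
}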

The workflow of the back-end is shown in Figure 3.5, and the function of each step is as follows (a compressed code sketch of the update loop follows the list):

1. The user runs the software from the command line. The CSV files' paths and names should be provided on the command line. The software starts.

2. The software reads the CSV files that are provided by the user.

3. The software checks whether the files are correct or not. If the files are incorrect, the software exits with instructions about how to provide correct files. If the files are correct, the process goes on to data preparation.

4. In this step, all the data is imported from the provided files and stored in data containers. Two containers are created for data storage: one to store event data and another for node information.

5. If the back-end gets a signal from the front-end, the process goes to the next step, otherwise nothing happens.

6. This step checks the signal from the front-end and updates a Boolean value for the animation. If the signal is to run the animation, the Boolean value is set to true. If the signal is to stop the animation, the Boolean value is set to false.

7. If the animation Boolean value is true, the process goes to step 8. Otherwise, the process goes back to step 5.

8. The current node information in the node data storage is updated by new events.

9. The new updates are sent to the front-end.

10. If there is no terminate instruction, the process goes back to step 7 to check the Boolean value of the animation. If a terminate signal is received, the process goes to step 11.

11. The software exits.
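
The sketch below compresses steps 5-10 into plain C++. The Event and Node types, the flag names and the notifyFrontend placeholder are all illustrative; the real software drives this loop through Qt signals and slots rather than a plain while loop.

#include <atomic>
#include <iostream>
#include <list>
#include <string>
#include <unordered_map>

struct Event { std::string node; std::string state; long long time; };
struct Node  { std::string majorityState; };

// Placeholder for sending an update to the front-end (step 9).
void notifyFrontend(const std::string &node, const Node &info) {
    std::cout << node << " -> " << info.majorityState << '\n';
}

// Flags toggled by instructions from the front-end (steps 5-6).
std::atomic<bool> animationRunning{false};
std::atomic<bool> terminateRequested{false};

void updateLoop(std::list<Event> &events, std::unordered_map<std::string, Node> &nodes) {
    auto next = events.begin();                          // events are sorted by timestamp
    while (!terminateRequested && next != events.end()) {// step 10: exit on terminate signal
        if (!animationRunning)                           // step 7: only advance while running
            continue;                                    // (simplified busy-wait)
        nodes[next->node].majorityState = next->state;   // step 8: apply the next event
        notifyFrontend(next->node, nodes[next->node]);   // step 9: push the change to the UI
        ++next;
    }
}

int main() {
    std::list<Event> events{{"N001", "VALID", 1}, {"N002", "ON_HOLD", 2}};
    std::unordered_map<std::string, Node> nodes;
    animationRunning = true;    // step 6: a "run animation" instruction arrived
    updateLoop(events, nodes);  // returns when all events have been applied
    return 0;
}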

Figure 3.5: Workflow of the back-end

The information of one event or one node can first be stored in an object and then the object can be stored in a data container. An object is like a package that contains several data fields and functions. For example, an event contains information about node name, cell ID, frequency, model ID, state, time, and location. That information can be gathered inside one event object. All the objects should be sorted and stored in a reasonable way so that data can be found and read correctly and quickly. The events need to be sorted by time, so the data sequence is important. For the node information, the data sequence is not significant. There are several data containers that can be used: a vector or a list for sequential data and an unordered map for non-sequential data. A list is a sequential container that supports fast insertion and deletion of data [23]. An unordered map is a container that stores elements as key-value pairs [23]. Event and node objects can be stored in these containers as shown in Figure 3.6.

Figure 3.6: Data containers

All the event objects are stored in a list and they are sorted by the timestamp. The C time library will be used to handle the timestamps, for instance to store the timestamp in a time type and to calculate time differences. The current information of each node can be stored in a map container with the node name as the key and the node object as the value. Through the node name, the information of the node can be accessed quickly.
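
The following C++ sketch illustrates this storage idea with a std::list of events sorted by timestamp and a std::unordered_map keyed by node name. The field names of the Event and Node structs are illustrative, chosen to match the attributes listed above, and are not copied from the thesis software.

#include <ctime>
#include <iostream>
#include <list>
#include <string>
#include <unordered_map>

struct Event {
    std::string node;      // node name, e.g. "N001"
    int         cell;
    int         frequency;
    std::string modelId;
    std::time_t time;      // timestamp handled with the C time library
    std::string state;     // e.g. "VALID", "ON_HOLD"
    double      latitude;
    double      longitude;
};

struct Node {
    std::string name;
    std::string majorityState;
    double      latitude;
    double      longitude;
};

int main() {
    // Events are kept in a sequential container and sorted by timestamp.
    std::list<Event> events;
    // ... events are filled in during data preparation ...
    events.sort([](const Event &a, const Event &b) { return a.time < b.time; });

    // Current node information is keyed by node name for fast lookup.
    std::unordered_map<std::string, Node> nodes;
    nodes["N001"] = Node{"N001", "VALID", 48.5646, 1.5864};

    // Accessing a node by its name:
    const Node &n = nodes.at("N001");
    std::cout << n.name << " majority state: " << n.majorityState << '\n';
    return 0;
}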

3.2.5 Two-way communication

Two-way communication between the front-end and the back-end can be implemented based on the QML and C++ integration. The following integration mechanisms are used for two-way communication between QML and C++:

• The QObject class can expose C++ class attributes to QML.
• C++ objects can be embedded in QML as context properties.
• The QML engine performs automatic data type conversion.

A developer needs to define a QObject class in C++ and then set an instance of this class as a context property of the QML items. The QObject class on the C++ side can then be accessed from the QML side. The functions in the QObject class can be invoked by JavaScript expressions in QML or by signal handlers. Both QObject and QML items can emit signals and receive signals via slots.
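
A minimal C++ sketch of this integration is shown below. The Backend class, its method names and the "backend" context property name are hypothetical examples, and the sketch assumes a Qt Quick project with a main.qml file and moc enabled.

#include <QGuiApplication>
#include <QObject>
#include <QQmlApplicationEngine>
#include <QQmlContext>
#include <QString>

// Hypothetical back-end class; the names are illustrative, not the thesis code.
class Backend : public QObject {
    Q_OBJECT
public:
    // Callable from QML via JavaScript expressions or signal handlers.
    Q_INVOKABLE void setAnimationRunning(bool running) {
        m_running = running;
        if (running)
            emit nodeUpdated("N001", "VALID");   // notify the front-end of a change
    }
signals:
    void nodeUpdated(const QString &node, const QString &state);
private:
    bool m_running = false;
};

int main(int argc, char *argv[]) {
    QGuiApplication app(argc, argv);
    QQmlApplicationEngine engine;

    Backend backend;
    // Embed the C++ object in QML as a context property named "backend".
    engine.rootContext()->setContextProperty("backend", &backend);

    engine.load(QUrl(QStringLiteral("qrc:/main.qml")));  // assumed QML entry file
    return app.exec();
}

#include "main.moc"  // the QObject subclass is defined in this .cpp file

On the QML side, backend.setAnimationRunning(true) could then be called from a signal handler, and a Connections element (or an onNodeUpdated handler) could react to the nodeUpdated signal.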

3.3 Evaluation

It is important that the software is useful for the users and satisfies their expectations. Therefore, the software needs to be tested and evaluated. The performance of the software will be evaluated from two different aspects: the users' feedback and the software performance. The users' feedback on their experience will help to improve the satisfaction of using this software. The software performance evaluation measures, from a technical point of view, whether the software is well implemented or not.

3.3.1 User Feedback

Usability testing is a method for measuring whether the software is easy to use or not. Usability is a significant benchmark for user experience. SUS will be used for the usability testing and the result will be analysed. In addition to the ten questions from SUS, the questionnaire will also contain an open question which asks for suggestions and feedback. The questionnaire will be sent to 4-6 people in the machine learning team at Ericsson.

3.3.2 Software Performance

There are many ways to evaluate software performance and some aspects are significant for this visualization software. Since it is software that users interact with, the response time and rendering time should be short. Besides using this software, the users may need to do other things on the computer, thus the resource usage of this software should be measured. Hence, the software performance will be evaluated from the following aspects:

1. The preparation time for reading and sorting the data set.
2. The response time after choosing a time point on the timeline.
3. The time used for rendering one item on the map.
4. Memory usage.
5. CPU usage.

Time duration measurements are made by recording a start time before a function call and an end time after the function has returned. The duration is the delta between the start and end times.
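
Such a measurement can be done, for example, with std::chrono; the sketch below is a generic illustration of the approach, with prepareData standing in as a placeholder for whichever function is being measured.

#include <chrono>
#include <iostream>

// Placeholder for the function being measured, e.g. the data preparation step.
void prepareData() { /* read CSV, fill containers, sort events ... */ }

int main() {
    // Record a start time before the call and an end time after it returns;
    // the duration is the difference between the two.
    const auto start = std::chrono::steady_clock::now();
    prepareData();
    const auto end = std::chrono::steady_clock::now();

    const auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
    std::cout << "Data preparation took " << ms.count() << " ms\n";
    return 0;
}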

The top command in Linux allows the user to see the system resource usage of all the Linux processes. Two outputs from this command are shown in Figure 3.7. The first output was produced when only the new software was running, and the second output when only the old software was running. The "COMMAND" column of the output displays the process name. "RES" is the physical memory usage of the process in KiB. The "%CPU" column presents what percentage of the CPU time the process has used since the previous output. "%MEM" shows the percentage of physical memory used by the task, i.e. the value of RES divided by the total physical memory. Solaris mode was used for the top command, which means the CPU usage percentage of each task is divided by the total number of CPUs; therefore, the total CPU usage percentage is 100%. The top command updates the output every three seconds. The "RES", "%CPU" and "%MEM" values were recorded 60 times during the running time and their average values were used for the comparison.

The new software has the name "demo", so it is very clear that the process demo is the software process. When running the old web application, three processes are relevant: R, Web Content, and Firefox. The total resource usage of the old software is the sum of the resource usage of R, Web Content and Firefox.

Figure 3.7: Output from top command

3.3.3 Comparison

The performance of the old software written in R and the new software can be compared. The comparison is made on the same data set and in the same environment. Since the implementation ideas and methods differ between the two applications, it is hard to compare details of the code or functions, but it is possible to compare the aspects presented in section 3.3.2.

Besides the performance comparison, other aspects can be compared, for instance how easy it is to develop the software or to develop a function with each of the two approaches.

4. Result

The following chapter presents the results of the software implementation, the performance evaluation and the usability testing.

4.1 User Interface

As Figure 4.1 illustrates, the user interface allows users to run and stop the animation. The users can also drag the time slider and choose a time to start from. The animation speed is also controllable, so the users can increase the speed. The node colour is updated according to the current majority state, and the information of a selected node is displayed in the information area.

Figure 4.1: User Interface

With the user interface, users can get a better understanding of the data patterns. This will help them to explain the prediction decisions made by their machine learning algorithms. The user interface of the old software has a similar layout to this user interface. Compared to the old user interface, this user interface allows the users to change the speed of the animation, which improves the usability of the software. Moreover, it also provides information about a selected node and all the cells under this node. The detailed information about each node and cell helps the users to observe the performance of each node and cell.

  • 20

    4.2 Performance of the software

    This chapter presents the new software performance evaluation from the technical aspects.

    4.2.1 Data preparation

Data preparation is the first step of the process at the back-end. Figure 4.2 shows how long it takes to read, store, and sort the data. The X-axis of the chart shows the number of events and the Y-axis indicates the time in milliseconds. The data preparation consists of several steps: read data from the CSV file, create objects to store the data, push the objects into containers, and sort all the objects. The blue line indicates the time in milliseconds for the whole data preparation, including sorting, while the orange line shows the time used only for sorting. The result shows that the total time to prepare the data is short enough, since a data set will usually not contain more than ten thousand events. One real data set has 5466 events, and the average data preparation time over 10 runs is 25 milliseconds. All time values in the chart are averages over 10 runs.

Figure 4.2: Time taken for data preparation of new software

The comparison between the blue and orange lines shows that reading the data and storing it in objects takes more time as the data set size grows. The sorting time also increases with the amount of data, but its growth rate is lower than that of the data reading.
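As an illustration of what this preparation step can look like, the following is a minimal C++ sketch that reads a CSV file, stores each row in an object, and sorts the objects by time. The Event fields, the column order, and the file format are illustrative assumptions and not the exact implementation used in the software.

    #include <algorithm>
    #include <fstream>
    #include <sstream>
    #include <string>
    #include <vector>

    // Illustrative event record; the real software stores more fields per event.
    struct Event {
        long long timestamp;   // assumed time key used for sorting
        std::string nodeId;    // assumed node identifier
        std::string state;     // assumed state value
    };

    // Read events from a CSV file, store them in a container and sort them by time.
    std::vector<Event> prepareData(const std::string& csvPath) {
        std::vector<Event> events;
        std::ifstream file(csvPath);
        std::string line;
        std::getline(file, line);                 // skip the header row
        while (std::getline(file, line)) {
            std::stringstream row(line);
            std::string time, node, state;
            if (std::getline(row, time, ',') &&
                std::getline(row, node, ',') &&
                std::getline(row, state, ',')) {
                events.push_back({std::stoll(time), node, state});
            }
        }
        std::sort(events.begin(), events.end(),
                  [](const Event& a, const Event& b) { return a.timestamp < b.timestamp; });
        return events;
    }

Timing such a function before and after the sorting step, for example with std::chrono, gives the kind of measurements that are plotted in Figure 4.2.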

    4.2.2 Response time for the selected time

The user can drag the time point anywhere on the timeline slider, and the software will update all information up to the selected time. The duration between selecting a time and finishing the update up to that time was measured. The time point was dragged to 20 different positions on the timeline and the durations were recorded. The time taken to update the data up to the selected time was between 132 and 245 milliseconds, and the average of these 20 records is 168.9 milliseconds.
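Since the events are kept sorted by timestamp (Section 4.2.1), one possible way to implement such an update is to find the end of the relevant range with a binary search and then replay the events up to it. The sketch below only illustrates this idea; the minimal Event type and the apply callback are placeholders, not the software's actual update logic.

    #include <algorithm>
    #include <functional>
    #include <vector>

    // Minimal stand-in for the back-end event type (illustrative field only).
    struct Event {
        long long timestamp;
    };

    // Replay every event whose timestamp is <= selectedTime by calling apply on it.
    // Because the events are sorted by timestamp, the end of the range can be found
    // with a binary search instead of scanning the whole container.
    void updateToSelectedTime(const std::vector<Event>& events,
                              long long selectedTime,
                              const std::function<void(const Event&)>& apply) {
        auto end = std::upper_bound(
            events.begin(), events.end(), selectedTime,
            [](long long t, const Event& e) { return t < e.timestamp; });
        for (auto it = events.begin(); it != end; ++it) {
            apply(*it);
        }
    }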

    4.2.3 CPU and memory usage

The CPU usage percentage of the software was recorded 60 times and the records are displayed in Figure 4.3. The CPU clock speed is 2.6 GHz and it has 6 cores. The data is the output of the top command, which updates every three seconds. The X-axis of the figure is the record number and the Y-axis is the CPU usage in percent.


Figure 4.3: CPU usage for the new software

From Figure 4.3, we can see that the CPU usage is between 0.5% and 2.4%. When the software has started but the animation is not running, the CPU usage is around 0.5%. When the user drags and drops the time slider, the usage reaches a peak. The average value of these 60 records is 0.96%. In general, the CPU usage is low, and running this software will not affect the performance of other processes in the system. The memory usage of the software is very stable: it is always 0.4%, and the average physical memory usage (RES) is 138794.93 kB.

    4.2.4 Rendering time

The main task of QML is to render items on the user interface. The time taken to render one circle and to change the colour of one circle was measured. To measure the time spent, console.time and console.timeEnd were used to output the elapsed time in milliseconds. Each operation was tested 10 times, and the measured time was always 0 or 1 millisecond. The result shows that creating or modifying one circle takes no longer than 1 millisecond.

    4.3 Comparison

Some performance aspects of the new and old software were compared. This section presents the results of that comparison.

    4.3.1 Data preparation

Both the new and the old software need to do data preparation. The data set is the same for both and contains 5466 events. The data preparation in the new software is written in C++, and its task is to read the data, create objects, store the objects in a container, and sort them. The preparation in the old software includes reading the data, reorganizing it, and writing it to a CSV file. The processes are not the same, so it is hard to compare the data preparation performance directly. However, the data preparation clearly takes much longer in the old software: measured 10 times, its average preparation time is 16 seconds, while the C++ preparation takes 25 milliseconds (Figure 4.4). Even though the preparation processes differ, the large difference still indicates that R has worse data-handling performance.


Figure 4.4: Data preparation comparison

    4.3.2 Response and rendering time

After the user clicks the pause button, the new software stops updating immediately. The old software responds slowly and takes 3-6 seconds to stop updating. This is because its logic is less efficient and rendering a circle takes long: the old software renders one circle in 8.3 milliseconds on average, whereas QML takes at most 1 millisecond to render one circle.

    4.3.3 CPU and memory usage

The CPU clock speed is 2.6 GHz and there are 6 cores. For the old software, the CPU usage in percent was recorded 60 times, as shown in Figure 4.5. The X-axis of this figure is the record number and the Y-axis is the CPU usage in percent. The old software is a web application, and there are three relevant processes: R, Firefox, and Web Content.

Figure 4.5: CPU usage for the old software

The average CPU usage of R was 16.52%, Web Content used 1.44% of the CPU on average, and Firefox used 1.42%. Together, the average CPU usage of the old application is 19.38%. The average CPU usage of the new software is 0.96%, which is much lower than that of the old software. The memory usage was also recorded. The memory usage percentage is stable, and the physical memory usage (RES) varies only within a very small range. As shown in Table 4.1, the R process always uses 0.6% of the memory and its average


physical memory usage (RES) is 186510.93 kB. Firefox takes 0.92%, which is 311254.67 kB, and Web Content takes 0.8% of the memory, which is 255976.53 kB on average. Together, the memory usage of the old software is 2.32% and the average physical memory usage is 753742.13 kB.

Table 4.1: CPU and memory usage of the old software

Process        RES (kB)      %MEM    %CPU
R              186510.93     0.6     16.52
Firefox        311254.67     0.92    1.42
Web Content    255976.53     0.8     1.44
SUM            753742.13     2.32    19.38

In Figure 4.6, the CPU and memory usage percentages of the two software versions are compared. It shows that the old software uses more CPU and memory than the new software.

Figure 4.6: CPU and memory usage comparison in percentage

    4.3.4 Implementation efficiency

Looking through the code, R is simpler to work with and the code is much shorter than the C++ code for the same task. For example, R recognizes the columns in a CSV file and can read the needed data directly by column name, whereas C++ has no built-in notion of CSV columns, so more code is needed to handle the data reading.
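In R this typically amounts to a read.csv call followed by access such as data$column, while a C++ program first has to locate the column index from the header row itself. The sketch below shows roughly what that looks like in C++; the helper names are made up for illustration and the parser does not handle quoted fields.

    #include <fstream>
    #include <sstream>
    #include <string>
    #include <vector>

    // Split one CSV line into its fields (no support for quoted fields).
    std::vector<std::string> splitCsvLine(const std::string& line) {
        std::vector<std::string> fields;
        std::stringstream ss(line);
        std::string field;
        while (std::getline(ss, field, ',')) fields.push_back(field);
        return fields;
    }

    // Read all values of the column named columnName from a CSV file.
    std::vector<std::string> readColumn(const std::string& csvPath,
                                        const std::string& columnName) {
        std::ifstream file(csvPath);
        std::string line;
        std::getline(file, line);                        // header row
        std::vector<std::string> header = splitCsvLine(line);
        std::size_t index = 0;
        while (index < header.size() && header[index] != columnName) ++index;

        std::vector<std::string> values;
        if (index == header.size()) return values;       // column not found
        while (std::getline(file, line)) {
            std::vector<std::string> fields = splitCsvLine(line);
            if (index < fields.size()) values.push_back(fields[index]);
        }
        return values;
    }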

    4.4 Usability testing result

Five responses to the SUS questionnaire were received, and the results are shown in Table 4.2. The highest score is 95 and the lowest is 70. The average score is 85, which is above 70, meaning that the usability of the software is acceptable.
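The scores in the last row of Table 4.2 are consistent with the standard SUS scoring rule: odd-numbered items contribute (response - 1), even-numbered items contribute (5 - response), and the sum is multiplied by 2.5. The following small C++ sketch shows the calculation, using participant P1's responses from Table 4.2 as an example.

    #include <array>
    #include <iostream>

    // Compute a SUS score from ten item responses (each 1-5) using the standard
    // rule: odd items contribute (response - 1), even items contribute
    // (5 - response), and the sum is scaled by 2.5 to a 0-100 range.
    double susScore(const std::array<int, 10>& responses) {
        double sum = 0.0;
        for (int i = 0; i < 10; ++i) {
            sum += (i % 2 == 0) ? responses[i] - 1     // items 1, 3, 5, 7, 9
                                : 5 - responses[i];    // items 2, 4, 6, 8, 10
        }
        return sum * 2.5;
    }

    int main() {
        // Responses of participant P1 taken from Table 4.2.
        std::array<int, 10> p1 = {5, 1, 4, 1, 5, 2, 5, 1, 5, 1};
        std::cout << susScore(p1) << '\n';             // prints 95
    }

For P1 the item contributions sum to 38, and 38 x 2.5 = 95, which matches the first score in the table.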

Besides the ten questions of the questionnaire, the users could give feedback in an open question. The feedback showed that there were some bugs; for instance, the software sometimes crashed and the map had a flickering problem. These problems were investigated again, and the bugs have been fixed in an improved version.


Table 4.2: Result of usability testing with SUS questions

SUS Questions                                                                                    P1   P2   P3   P4   P5   AVG
1. I think that I would like to use this system when needed.                                     5    4    5    3    4    4.2
2. I found the system unnecessarily complex.                                                     1    1    1    1    2    1.2
3. I thought the system was easy to use.                                                         4    5    5    4    4    4.4
4. I think that I would need the support of a technical person to be able to use this system.    1    1    2    1    2    1.4
5. I found the various functions in this system were well integrated.                            5    5    4    4    4    4.4
6. I thought there was too much inconsistency in this system.                                    2    2    1    3    2    2
7. I would imagine that most people would learn to use this system very quickly.                 5    5    5    3    4    4.4
8. I found the system very cumbersome to use.                                                    1    2    1    2    2    1.6
9. I felt very confident using the system.                                                       5    5    4    3    4    4.2
10. I needed to learn a lot of things before I could get going with this system.                 1    1    1    2    2    1.4
SUS score                                                                                        95   92.5 92.5 70   75   85


5. Discussion
In this chapter, the method and the results are discussed. Ethical and societal considerations are also presented.

    5.1 Method discussion

In this thesis, the data is visualized as an animation in which values change over time. There are other methods to show how data changes over time, such as a chart with time on the X-axis and value on the Y-axis. Different visualization methods could be implemented and tested by the users, and then compared and evaluated to find out which one works best.

The development language and framework were chosen based on a brief motivation and comparison. The purpose of this thesis is not to find the best language or framework; instead, the aim is to find an efficient way to develop the software and then evaluate its performance. There are many other possible approaches for developing this software, and a deep comparison of languages or frameworks is of limited value in this case.

One reason why the Qt framework was chosen is that it offers many modules and plugins. However, the frequently updated plugins and versions caused some problems. Most plugins, modules, and even some data types have version dependencies and requirements. A data type that works in an older module version may become completely unrecognized in a newer one. Therefore, developers should be cautious about version requirements when using the Qt framework.

The size of the data set is not very large, and the goal of this software is not to handle large data sets. The data preparation time increases linearly with the data set size, so a very large data set may take a long time to prepare. In this case study, the data set was limited and there was no requirement to handle large data sizes. The evaluation therefore focused on visualization, response speed, resource usage, and user experience. It is difficult to find a standard way to evaluate different software; the evaluation method always depends on the purpose and features of the software.

In this software, the front-end is not very complex, so it is hard to say whether QML performs well when rendering more complex items. The time to render one circle was measured, but this operation is simple, so the full capability of QML was not examined. To learn more about the performance and capability of QML, a more complex front-end should be used for the evaluation.

The time taken to render one circle was measured as the execution time of one piece of rendering/drawing code. This may not be the true time needed to render an item on the interface, but the difference between equivalent code in the two applications is still meaningful for comparison purposes.

To evaluate the CPU and memory usage, the top command was used. The data of the relevant processes was recorded, such as the demo process for the new software and the R process for the old one. However, other processes may also be relevant for the software.

The comparison of the old and new software is difficult because they were developed with different languages, frameworks, and logic.


    5.2 Result discussion

The results of the performance evaluation of the new software show that the visualization software built with the Qt framework performs well. The back-end is written in C++, which ensures fast data handling and code execution. QML, as the front-end language, also performs well when handling signals and rendering items. The CPU and memory usage are low, so running this software will not affect other tasks.

The comparison of the old and new software indicates that it is better to write CPU-intensive code in C++ than in R. R takes a long time to handle data; in this case it takes 16 seconds to prepare the data of 5466 events. If the waiting time is long, it is better to provide a progress bar so the user can see how much time remains. Both the CPU and memory usage of the old software are higher than those of the new software, which means it takes more resources to run.

Figure 4.3 shows that user interaction leads to increased CPU usage. If a user interacts with the user interface very often, the average value will increase. The highest CPU usage value is 2.4%, which is still much lower than the old software's average of 19.38%.

The duration and resource usage values were recorded many times, and the average values were used for the comparison and performance evaluation. Using averages increases the reliability and validity of the results.

The performance difference has several causes, but the main one is likely the performance difference between C++ and R. A compiled language usually performs better than an interpreted language. Moreover, R was developed for statistical purposes and is commonly used for data analysis; visualization software is not the same as statistical software. R also uses more CPU, possibly because all the code needs to be interpreted at run time.

From the development perspective, the Qt framework with QML is quite easy to learn and use. The wide range of plugins and modules in QML provides strong support for front-end development. Since the QML code structure is similar to JSON and JavaScript expressions are available in QML, it is easy to write QML code. However, compared to the R code, the C++ code in the back-end is usually longer and takes more time to develop. The Qt framework also has some inconsistencies between modules and data types in different versions, which may cause trouble. Considering all the aspects above, the Qt framework is a usable and efficient approach for developing visualization software. It is very simple to handle statistical data with R, but it is less efficient to develop a complete visualization software with R alone.

The result of the usability testing indicates that the software is user friendly, so the users do not need to learn much before using it. The bugs reported in the questionnaire could not be reproduced; some of them may occur due to differences between machine environments. The map flickering problem is caused by the map plugin version. A map plugin version that works with all Qt versions was found, but this version has the flickering problem. Other map plugins with better performance have Qt version and module dependencies. If the users can control the Qt version, these version and module dependencies will not be a problem.


    5.3 Ethical and societal consideration

The author has considered ethical issues during the whole research process. All data and results are described honestly. The questionnaire was anonymous, and the participants' information was protected. The author informed the questionnaire participants about the purpose of this research and how their feedback would be used. All questions in the questionnaire are objective and without bias. The author cooperated with the other participants with respect and tried to avoid all kinds of discrimination. Confidential data was also protected according to the agreement between the author and the company.

Visualization software for machine learning data can help the users get a better understanding of the machine learning algorithms and the results produced by those algorithms. Visualizing data in a more human-readable way is part of explainable artificial intelligence and contributes to the trustworthiness, causality, and informativeness of machine learning algorithms. With a better understanding of how a result is produced by the algorithms, the users can make sure that the result will not harm humans or society. Moreover, the visualization software will also help the users provide better products and services to customers.

All data should be collected and used in accordance with relevant agreements, laws, moral standards, and policies. The input data used for the software was collected by the company, and the author did not participate in that process.


6. Conclusion
This thesis aims to find an efficient way to develop visualization software. Even though the thesis is based on a case study, it still provides a possible approach for other similar cases. In conclusion, the objective of this thesis is met, and the research questions can be answered.

Question 1): How can a visualization software for time-varying data with geographic information be implemented, so that the data can be understood better and the software has good performance?

This question is answered in Sections 3.1, 3.2, 4.1, 4.2, and 4.4.

The software was developed with C++ and the Qt framework. C++ is the programming language of the back-end. As a compiled language, C++ code executes fast with low CPU and memory usage, and its object-oriented features make it easy to store information, access stored data, and reuse code. The data is packaged in objects and then stored in a list and a map. For the data input, the correctness of the data set needs to be checked to make sure that the data is controllable and transparent. The front-end was developed with QML, which is part of the Qt framework. Animation is used to show the data changes in the user interface, and the map and timeline help to visualize the geographic and time information.

In general, the Qt framework is powerful, easy to use, and performs well, but the developer needs to be careful with version conflicts and dependencies among Qt, its modules, and plugins. The performance evaluation shows that the software handles data and renders items well. The CPU and memory usage is low, so it will not affect the execution of other processes. From the user experience perspective, the software is easy to use and does not require much prior knowledge.

Question 2): Compare the performance of the new and old software and find the advantages and disadvantages of these two approaches.

This question is answered in Section 4.3. The comparison of the old and the new program indicates that the software developed in R is less efficient and costs more resources to run. Although R does not perform well in terms of execution speed and resource usage, it is simpler to develop software with R than with C++. R also provides many data types and functions that support statistical purposes.


7. References
[1] Ericsson, "Ericsson", 2020. [Online]. Available: https://www.ericsson.com/en/about-us.
[2] A. B. Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado, S. García, S. Gil-López, D. Molina, R. Benjamins, R. Chatila and F. Herrera, "Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI", Information Fusion, vol. 58, pp. 82-115, 2020. DOI: 10.1016/j.inffus.2019.12.012.
[3] V. Antonov and A. Sterner, "Methods for developing visualizations in a non-designer environment: A case study", Linköping University Electronic Press, 2019.
[4] T. Miller, "Explanation in artificial intelligence: Insights from the social sciences", Artificial Intelligence, vol. 267, pp. 1-38, 2019. DOI: 10.1016/j.artint.2018.07.007.
[5] T. Miller, "But why? Understanding explainable artificial intelligence", XRDS: Crossroads, The ACM Magazine for Students, vol. 25, no. 3, pp. 20-25, 2019. DOI: 10.1145/3313107.
[6] A. V. Moere, "Time-Varying Data Visualization Using Information Flocking Boids", IEEE Symposium on Information Visualization, Austin, 2004. DOI: 10.1109/INFVIS.2004.65.
[7] C. Wang, H. Yu and K.-L. Ma, "Importance-Driven Time-Varying Data Visualization", IEEE Transactions on Visualization and Computer Graphics, vol. 14, no. 6, pp. 1547-1554, 2008. DOI: 10.1109/TVCG.2008.140.
[8] W. Huang, Handbook of human centric visualization, Springer, 2014. DOI: 10.1007/978-1-4614-7485-2.
[9] J. Bertin, Semiology of graphics: diagrams, networks, maps, Esri Press, Redlands, 2011. ISBN: 9781589482616.
[10] G. M. Seed, An introduction to object-oriented programming in C++: with applications in computer graphics, Springer, 2001. ISBN: 1852334509.
[11] G. Sawitzki, Computational Statistics: An Introduction to R, CRC Press, Boca Raton, 2009. ISBN: 9781420086812.
[12] N. Matloff, The art of R programming: a tour of statistical software design, No Starch Press, San Francisco, 2011. ISBN: 9781593274108.
[13] S. Koranne, Handbook of Open Source Tools, Springer, 2011. ISBN: 9781441977182.
[14] M. Piccolino, Qt 5 projects: develop cross-platform applications with modern UIs using the powerful Qt framework, Packt Publishing, 2018. ISBN: 9781788295512.
[15] GTK, "What is GTK, and how can I use it?", 2020. [Online]. Available: https://www.gtk.org/.
[16] wxWidgets, "About", 2020. [Online]. Available: https://www.wxwidgets.org/about/.
[17] J. Rubin and D. Chisnell, Handbook of usability testing: how to plan, design, and conduct effective tests, Wiley, 2008. ISBN: 9780470185483.
[18] R. Hartson and P. S. Pyla, The UX book: process and guidelines for ensuring a quality user experience, Morgan Kaufmann, 2012. ISBN: 9780123852410.
[19] W. Albert and T. Tullis, Measuring the user experience: collecting, analyzing, and presenting usability metrics, Elsevier, Amsterdam, 2013. ISBN: 9780124157927.
[20] F. Håkansson, "Platform for Development of Component Based Graphical User Interfaces", Uppsala universitet, Institutionen för informationsteknologi, 2010.
[21] J. Anderson, "Visualisation of data from IoT systems: A case study of a prototyping tool for data visualisations", Linköpings universitet, Programvara och system, 2017.
[22] K. Hanna, "Visualization of Log Data from Industrial Inspection Systems", Linköpings universitet, Tekniska högskolan, 2007.
[23] S. B. Lippman, J. Lajoie and B. E. Moo, C++ primer (5th ed.), Addison-Wesley, 2013. ISBN: 9780321714114.
