Visualizations of metadata in a GDI environment (Usability Evaluation of Visualization Techniques)

Potjo Tšoene March 2004

Thesis submitted to the International Institute for Geo-information Science and Earth Observation in partial fulfilment of the requirements for the degree of Master of Science in Geoinformatics

Thesis Assessment Board:
Chairman: Prof. Dr. M.J. Kraak
Supervisors:
1st supervisor: Drs C. A. Blok
2nd supervisor: Prof. Dr. M.J. Kraak
3rd supervisor: Drs P. Ahonen-Rainio
External Examiner: Prof. Dr. F.J. Ormeling

INTERNATIONAL INSTITUTE FOR GEO-INFORMATION SCIENCE AND EARTH OBSERVATION ENSCHEDE, THE NETHERLANDS

I certify that although I may have conferred with others in preparing for this assignment, and drawn upon a range of sources cited in this work, the content of this thesis report is my original work. Signed …………………….

Disclaimer

This document describes work undertaken as part of a programme of study at the International Institute for Geo-information Science and Earth Observation. All views and opinions expressed therein remain the sole responsibility of the author, and do not necessarily represent those of the institute.


Abstract

Geographical metadata is essential for geographic data search. The approach employed in the search process should be usable in all respects of the usability definition: effectiveness, efficiency, and the satisfaction with which users achieve their search tasks. A user-oriented approach to the provision of metadata through a geographical metadata service facility is necessary to ensure successful use of distributed environments for geographic data storage and sharing (Geospatial Data Infrastructures, GDIs). Geographic metadata essentially provides three key functions: an overview of GDI data content; the ability to compare multiple geographic datasets, collections, or series; and detailed descriptions of individual data items (Gobel, 2001). The information visualization paradigm (Keim, Panse, and Sips, 2003), with visualization as an approach to data exploration, has the following attributes: provision of an overview of data items; ease of zooming into interesting details and of comparing data items by their attribute values in order to filter datasets appropriate for a given application; and viewing details of data on demand. The paradigm overlaps with the functions of geographical metadata, so it is worthwhile to study the extent to which visualizing metadata with information visualization techniques can optimize the usability of geographical metadata services. This research carried out such a study with four multivariate visualization techniques in the XmdvTool visualization system. A study of the user requirements and tasks for geographical metadata was done as the first phase of a user-oriented project. From the analysis of the results of that study and of the characteristics of geographical metadata, XmdvTool was selected as the most suitable environment for the prototype used in the user evaluation.
A focus group evaluation was done, and the data elicited from it was incorporated into the plan of a second evaluation: think-aloud user testing, followed by a questionnaire to gather subjective opinions about the usability of the prototype. The following conclusions were derived. Geographical metadata is highly multidimensional, and as such the use of multivariate visualization techniques is an intuitive approach. Not all the techniques are usable. The Parallel Coordinate Plot main display technique and the Dimension Stacking interactive brush toolbox were found to be very effective in querying the metadata where many metadata elements were involved. The Star Glyph main display technique was found useful for efficiently browsing a query result and taking a more detailed look at individual metadata items. The high dimensionality of geographical metadata requires a visualization technique that can display nominal labels. Numeric coding of nominal values is not an informative approach. The quantification and ordering of nominal values on a numeric scale is very controversial and requires careful study. A graphical visualization user interface alone cannot query nominal and numeric metadata elements equally effectively. It has to be combined with other types of interfaces in order to address the element-specific characteristics of geographical metadata.

Keywords: GDI, metadata services, Geographical metadata, Usability engineering principles, User tasks, User requirements, Visualization techniques, Multivariate visualization techniques, Metadata elements, Usability evaluation, Nominal elements.


Acknowledgements

The successful execution and completion of this work is attributed to the following:

• First and foremost, the unconditionally loving Almighty God, for reviving my inner strength and giving me comfort in my work every time I felt like giving up under the enormous pressure that came with this work.

I would especially like to express my gratitude for the constant guidance I received from my three supervisors:

• Mrs. Drs. Connie Blok, for her carefully considered advice and cautions on even the smallest details of my work.

• Prof. Kraak, for instilling structure into this work to ensure that the objectives of the study were achieved.

• Drs Paula Ahonen-Rainio, for opening my eyes, with expert suggestions, to the many research possibilities regarding the concept of geographical metadata visualization.

I would also like to acknowledge the desperately needed input from the following people:

• Drs. Etien Koua and Drs. Arko Lucieer, for providing expert advice on approaching the development of the prototype efficiently enough to complete it successfully within the fixed six-month period of the project. I also appreciate their support in sacrificing their precious time to participate in the usability evaluation sessions. I would especially like to thank Arko for letting me look into his work in order to prepare for my first usability evaluation.

• Drs Daniel van de Vlag, Drs Trias Adytia, Drs van Leeuwen, Drs. Wan Bakx, Ing. Remco Dost, Mr G. Reinink, and Dr. Andreas Wytzisk, for willingly taking the time to participate in the usability evaluations, and thereby providing very useful feedback for a discussion of the usability of the prototype.

• Mr. Ard Blenke, for all the technical support, which was always urgently needed.

A special expression of appreciation is given to:

• The XmdvTool development team, for their support and cooperative responses to my requests for technical advice on the correct steps for pre-processing data into the XmdvTool system for the prototype.

Last, but not least, my gratitude also goes to:

• My family, my girlfriend, my friends and colleagues, for their moral support and encouragement through both the difficult and the easy periods.

• The Dutch government, for being willing to give me an opportunity to expand my professional horizons at the expense of its taxpayers.

Table of Contents

ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS

1. INTRODUCTION
  1.1. MOTIVATION
  1.2. BACKGROUND AND PROBLEM STATEMENT
    1.2.1. Background
    1.2.2. Problems
  1.3. RESEARCH QUESTIONS
  1.4. OBJECTIVE(S)
  1.5. METHODOLOGY
  1.6. THESIS STRUCTURE

2. USER TASKS AND REQUIREMENTS ANALYSES
  2.1. USABILITY
    2.1.1. Usability defined
    2.1.2. Usability engineering
    2.1.3. Methods for user requirement and task analysis
  2.2. TARGET USERS
  2.3. USER TASKS AND REQUIREMENTS IN LITERATURE
  2.4. USER TASKS AND REQUIREMENTS IN THIS RESEARCH
  2.5. SUMMARY AND CONCLUSIONS

3. METADATA AND GEOGRAPHICAL METADATA VISUALIZATION
  3.1. METADATA AND GEOGRAPHICAL METADATA
    3.1.1. Metadata defined
    3.1.2. Importance of metadata in a GDI
    3.1.3. Characteristics of geographical metadata
  3.2. VISUALIZATION OF GEOGRAPHIC METADATA
    3.2.1. Status quo
    3.2.2. Visualization techniques for geographical metadata
  3.3. SELECTION OF THE APPROPRIATE TOOL AND TECHNIQUES
    3.3.1. Techniques
    3.3.2. Tools
  3.4. SUMMARY AND CONCLUSIONS

4. PROTOTYPE DEVELOPMENT
  4.1. DATA
  4.2. DATA PROCESSING
    4.2.1. Raw data
    4.2.2. Organization of the data
    4.2.3. Data formats
  4.3. FILES AND FOLDERS
  4.4. VISUALIZATION TOOL INTERFACE
    4.4.1. Working of the tool
    4.4.2. Limitations of the tool
  4.5. SUMMARY AND CONCLUSION

5. USABILITY EVALUATION OF PROTOTYPE
  5.1. INTRODUCTION
  5.2. FOCUS GROUP EVALUATION
    5.2.1. Focus group plan and overview
    5.2.2. Focus group members
    5.2.3. Focus group session
  5.3. USABILITY TESTING
    5.3.1. Test methods
    5.3.2. Test plan and procedure
    5.3.3. Test users
    5.3.4. Test laboratory set-up
    5.3.5. Test Scripts
    5.3.6. Test sessions
    5.3.7. Usability Data
  5.4. DISCUSSIONS
  5.5. SUMMARY AND CONCLUSIONS

6. CONCLUSIONS AND RECOMMENDATIONS
  6.1. CONCLUSIONS
    6.1.1. Recap on problems
    6.1.2. Recap on questions
    6.1.3. Recap on objectives
  6.2. RECOMMENDATIONS

BIBLIOGRAPHY

APPENDICES


1. Introduction

1.1. Motivation

Today’s large, distributed geographic databases require some form of cataloguing (metadata) to document the types of data held by the database, the entities and attributes, the spatial reference and location, the quality, and many other descriptive elements that distinguish a dataset or a collection of datasets from others and determine its fitness for use. The basic functions of metadata are to provide an overview of the content of a data collection, to enable comparison of multiple information items, and to provide detailed descriptions of individual items. Visualization of metadata can be very relevant during a data search process where a metadata database is the basic search tool. According to the information visualization paradigm, visualization techniques let one get an overview of data collections, zoom into interesting details and filter datasets by comparing data items on some attribute values, and finally get details on demand for a single dataset. The information visualization paradigm therefore addresses the functions of metadata. However, not all visualization techniques can be considered useful for metadata, which is why a ‘usability evaluation’ of the techniques should be done. The evaluation determines the applicability of each technique by studying its design principles and mapping them to the characteristics of geographical metadata. The tasks performed on the metadata, the explicit and implicit user requirements, and the characteristics of the target users should be taken into account in this process. It is important to employ usability evaluation methods through which the right data can be gathered, based on the usability specifications of the study.
By adopting this user-oriented approach during the development of visualization techniques and/or other metadata-based search mechanisms for geographical data, the search process can address the user requirements and tasks effectively, efficiently, and with a certain degree of satisfaction. This research work therefore suggests visualization techniques as a usable approach to searching, by means of metadata, for geographical data in a distributed geographical data repository.

1.2. Background and Problem statement

1.2.1. Background

Metadata is an indispensable component of a geographical dataset since it provides, among other things, descriptive information about the identification of the data, its quality, the types of entities and their attributes, spatial reference, distribution details, spatial location and coverage, and the data’s spatial representation. The data are described according to some standard, for example, the FGDC standard for geographical metadata. Metadata catalogue systems offer the best way to search for data, especially in the case of large datasets: “large geographic information (GI) database systems cannot function without metadata” (ETEMII, 2002, p. 16). The spatial location and distribution information is even more important if the dataset repositories are spatially distributed, as is the case in a GDI. Unfortunately, the search mechanisms on the user interfaces of many metadata services do not pay sufficient attention to the problems stated in the ‘Problems’ section below, despite the importance of metadata. This is not only the case in metadata services: Shneiderman (1997) points out that, although today’s interfaces are claimed to be good, all users (novice and expert) experience anxiety and frustration all too often.

It is probably best to start by looking at the actual ‘stages of resource discovery’ (Gobel and Jasnoch, 2001) in a collection of geographical metadata, so that each of the problems can be clearly associated with a particular stage. In a normal search scenario for metadata, the user first wants to get an overview (collection level) of the available datasets. Then he or she runs a query to select a subset of the whole collection that lies in a geographical area of interest and/or in a particular domain among those spanned by the collection. Finally, the user evaluates the selected set, for a particular application need, according to more detailed search criteria (Ahonen-Rainio and Kraak, 2003), by comparing the values of certain metadata elements/variables.

In this last stage, visualization techniques can be employed to simplify the comparison process. Spence (2001) defines visualization of data as a cognitive activity by a human being to gain insight into, and understanding of, certain aspects of data by forming a mental model of the data. In the evaluation process of metadata, visualization techniques become powerful because of the strength of the human visual sense in quickly detecting trends, correlations, and anomalies when rapidly exploring large amounts of data (Uhlenküken, 2000).
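The three search stages described above (overview, selection of a subset, comparison of element values) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the records, the field names (`topic`, `region`, `scale`, `year`) and the function names are hypothetical and are not taken from any metadata standard or from the prototype.

```python
# Hypothetical metadata records; field names and values are illustrative only.
RECORDS = [
    {"title": "Land cover 2000", "topic": "environment", "region": "Overijssel", "scale": 50000, "year": 2000},
    {"title": "Road network", "topic": "transportation", "region": "Overijssel", "scale": 10000, "year": 2002},
    {"title": "Soil map", "topic": "environment", "region": "Gelderland", "scale": 250000, "year": 1995},
    {"title": "Wetlands inventory", "topic": "environment", "region": "Overijssel", "scale": 25000, "year": 2003},
]


def overview(records):
    """Stage 1: collection-level overview, here a count of datasets per topic."""
    counts = {}
    for rec in records:
        counts[rec["topic"]] = counts.get(rec["topic"], 0) + 1
    return counts


def select(records, topic, region):
    """Stage 2: query a subset by thematic domain and geographical area."""
    return [r for r in records if r["topic"] == topic and r["region"] == region]


def compare(records, elements):
    """Stage 3: line up the values of chosen metadata elements side by side."""
    return [(r["title"],) + tuple(r[e] for e in elements) for r in records]


subset = select(RECORDS, topic="environment", region="Overijssel")
table = compare(subset, ["scale", "year"])
```

With only two elements a plain table like `table` is still readable; the argument in the text is that such side-by-side comparison stops scaling once many elements and many items are involved, which is where a visual display may help.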

1.2.2. Problems

• Most metadata are still presented in textual form on a ‘dataset-by-dataset’ basis. In the evaluation stage most people do not want to read, one item after another, tables or free-flowing text containing hundreds of metadata elements (ISO 19115, for example, suggests more than 400 metadata elements) in order to compare, say, a hundred metadata items. This process requires a lot of time, and the user can easily lose track of the attribute relationships between the browsed items when moving from one data item’s metadata text to another.

• Geographical metadata, according to existing (geographical) metadata standards, can be very complex to explore because of missing values in the metadata. The issue of missing and unknown values is of critical importance for the data searcher. If the location and patterns of these values in the metadata collection are not tracked properly, the interpretation can be mistaken and lead to wrong choices among the metadata items of interest. Also, if the metadata is still presented textually, on a ‘dataset-by-dataset’ basis, it takes too long to realize which of the selections have ‘null’ values. The search process normally starts with an overview query where items belonging to certain topic categories or keywords are selected from certain geographical areas. Thereafter a more detailed (multi-dimensional) search follows. This is where most missing data is encountered, and it can take some time to realize how many of these items have missing data in the metadata elements of interest.

• Geographical metadata use facilities exist in an age where user-driven approaches to the design and development of products are gradually becoming a priority. Geographical data has historically been produced according to the producers’ own specifications, and the user simply had to accept and live with what was available. It is not surprising, therefore, that users have complaints and suggestions for improvement for most existing metadata use facilities. It is therefore a challenge to design a geographical metadata use facility that addresses both the explicit and the implicit requirements and tasks of the target users.

• The volume of geographical datasets increases dramatically all the time. This ‘data explosion’ requires even more efficient, yet still effective, techniques for quick, reliable searches of geographical data.
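The missing-value problem raised in the second point above can be made concrete with a small Python sketch. It assumes, purely for illustration, that a missing or unknown value is stored as `None`; the item identifiers and element names (`scale`, `accuracy_m`, `update_year`) are hypothetical.

```python
# Hypothetical metadata items; None marks a missing or unknown value.
ITEMS = [
    {"id": "ds1", "scale": 50000, "accuracy_m": 5.0, "update_year": 2001},
    {"id": "ds2", "scale": None, "accuracy_m": 12.5, "update_year": None},
    {"id": "ds3", "scale": 25000, "accuracy_m": None, "update_year": 2003},
]

ELEMENTS = ["scale", "accuracy_m", "update_year"]


def missing_per_element(items, elements):
    """Count, for each metadata element, how many items lack a value."""
    return {e: sum(1 for it in items if it.get(e) is None) for e in elements}


def items_with_gaps(items, elements):
    """Return the ids of items missing at least one element of interest."""
    return [it["id"] for it in items if any(it.get(e) is None for e in elements)]


gap_counts = missing_per_element(ITEMS, ELEMENTS)
gap_items = items_with_gaps(ITEMS, ELEMENTS)
```

A summary such as `gap_counts` gives the pattern of gaps at a glance, which is the kind of information that is slow to assemble when metadata is read item by item as text.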

Based on the above problems, this study hypothesizes that the adoption of certain carefully selected visualization techniques can significantly improve the usability of geographical metadata facilities.

As part of the ITC T-GDI project, a visualization tool prototype is being developed (Ahonen-Rainio and Kraak, 2003) in which ‘multivariate’ techniques such as Parallel Coordinate Plots (PCPs), Scatterplot Matrices, Star Glyphs, and Chernoff faces are to be used interactively in a multiple-view environment to query metadata in a search for appropriate datasets from a GDI. This research is related to that project. According to the results of concept testing in the early stages of the design of that prototype (Ahonen-Rainio and Kraak, 2003), most users did not favour the use of Chernoff faces. Users had no problems with the extreme expressions of the faces, which clearly indicate the yes-or-no decision the user should reach according to his or her suitability evaluation criteria. But the users claimed that the intermediate expressions of the faces are easily confused and, in addition, that the expressions tend to arouse emotions and therefore impede a completely objective evaluation of the relations between data items. As such, the Chernoff faces were ruled out.

The above set of techniques, in an interactive multiple-view environment, still does not fully accommodate the need to evaluate a large set of dataset items (hundreds or even thousands). Therefore the interactive hierarchical displays technique (Yang, Ward and Rundensteiner, 2003) would be a proper tool for representing large numbers of datasets at different levels of detail. It should be noted that the usability of the solution proposed by Ahonen-Rainio and Kraak (2003) is, so far, based only on concept testing in the early design stages of the prototype, whereas the other solution (hierarchical displays) was tested with an existing, working prototype. However, the users in that second test were not exactly the target users of this study, and the data used was not metadata at all, let alone geographical metadata. As with many other proposed solutions, it is not safe to assume that these are really the ‘solutions’ users need until a proper usability evaluation with the target users is carried out.

Shneiderman (1997) distinguishes four major usability evaluation methods: expert reviews, usability and laboratory testing, surveys, and acceptance tests. He further identifies several methods within the usability testing domain, which include discount usability engineering, field tests, and competitive usability testing. Nielsen (1993) also distinguishes methods such as heuristic evaluation, focus group evaluation, observations, questionnaires and interviews, and the think-aloud method of usability testing. According to Faulkner (2000) the methods differ by (a) being formative or summative in nature, and/or (b) whether they support analytical or empirical evaluation. (c) Another distinction, especially important towards the end of the evaluation process where a prototype of the system is tested, is based on the type of usability data that can be obtained with the method. Faulkner (2000) recommends a method that elicits, from a user evaluation of a product or prototype, the following usability metrics (measures): the effectiveness and efficiency of the system, the user satisfaction that comes with the use of the system, its learnability and flexibility, and the attitude of users when using the system. The usability standard of the International Organization for Standardization (ISO 9241-11, 1998), on the other hand, recommends classifying these merits into just three: effectiveness, efficiency, and the degree of satisfaction with which users achieve their tasks with the system.

In this research work one intermediate user evaluation was carried out with the focus group evaluation method (Nielsen, 1993), essentially to guide the subsequent evaluation. A final usability evaluation was then done with the think-aloud usability testing method in combination with a questionnaire.
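To make the three usability measures named above more tangible, the following Python sketch shows one possible way of computing them from test-session data. The formulas are assumptions chosen for illustration (task completion rate for effectiveness, mean time on completed tasks for efficiency, mean questionnaire rating for satisfaction); they are not prescribed by the sources cited here, and the session log and ratings are invented.

```python
from statistics import mean

# Hypothetical test-session log: one entry per (participant, task).
SESSIONS = [
    {"user": "P1", "task": "T1", "completed": True, "seconds": 120},
    {"user": "P1", "task": "T2", "completed": False, "seconds": 300},
    {"user": "P2", "task": "T1", "completed": True, "seconds": 90},
    {"user": "P2", "task": "T2", "completed": True, "seconds": 210},
]

# Hypothetical questionnaire ratings on a 1-5 scale (5 = most satisfied).
RATINGS = {"P1": [4, 3, 5], "P2": [5, 4, 4]}


def effectiveness(sessions):
    """Share of attempted tasks that were completed successfully."""
    return sum(s["completed"] for s in sessions) / len(sessions)


def efficiency(sessions):
    """Mean time in seconds spent on successfully completed tasks."""
    return mean(s["seconds"] for s in sessions if s["completed"])


def satisfaction(ratings):
    """Mean questionnaire rating across all participants and questions."""
    return mean(score for scores in ratings.values() for score in scores)
```

Think-aloud sessions also yield qualitative observations that such numbers do not capture, so a summary of this kind would complement rather than replace the discussion of what participants said and did.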

1.3. Research questions

The questions that this research, given the above problem, will attempt to answer are the following:

1. What is the significance of metadata for users in a distributed setting of information provision, specifically, a GDI?
2. What are the characteristics of metadata in a GDI (i.e., characteristics of geographical metadata)? (Ahonen-Rainio and Kraak, 2003).
3. Who are the users of metadata in a GDI?
4. What tasks do the users perform when exploring geographical metadata, and therefore, what are their requirements?
5. How can visualization tools and techniques help geographical metadata users in carrying out their tasks more effectively and efficiently?
6. Which visualization techniques can be used to address the specific user tasks and requirements, and the characteristics of geographical metadata?
7. Are the techniques usable?

Can visualization in geographical metadata services offer a more usable means of achieving users’ tasks on geographical metadata than the existing means?

1.4. Objective(s)

The concept of Geospatial Data Infrastructure (GDI) is capturing the interest of many people because of the wealth of geospatial data available in distributed data repositories and the distributed geo-computing possibilities offered by the Internet. A GDI tries to bring together knowledge of the existence of geospatial data hosted in spatially distributed databases through structured catalogue descriptions of each dataset and data collection. These descriptions are referred to as metadata, and they are normally hosted in a central metadata service facility such as a clearinghouse. A technical report (ETEMII, 2002) describes metadata as an educational tool required by users and notes that there is a strong demand for a metadata service.

The users of geographic metadata range from experts in geoinformation sciences, through experts in other fields who make use of geographic information, to casual users who only need metadata once in a while. All these users (and many others) need to accomplish their tasks with metadata with ease and efficiency (among other usability attributes/merits). They can only do so if metadata services take a user-driven approach to the way they advertise metadata. Such an approach involves adopting usability engineering principles during the design and development of all components of the service, including the visualization tools and techniques employed. In other words, the tools and techniques developed for geographic metadata usage need a usability assessment so that they address the needs of the users. This study proposed the following objectives:

• The design and development of a geographical metadata visualization prototype. The prototype should attempt to address user tasks and requirements (to be determined from a literature review) in the use/exploration of geographical metadata.

• Determination of the usability of the prototype, through user evaluations with geographical data users. This is the primary objective of the whole study.

1.5. Methodology

Figure 1 gives an overview of a procedure followed to achieve the above objectives. It is explained thus:

1. Carried out a literature review on:
   • The generic user requirements and tasks when searching for and using geographic metadata in a distributed environment.
   • Visualization techniques that can be employed on metadata.

2. Having identified suitable techniques based on the user requirements and tasks, selected a suitable information visualization system to develop the prototype.

3. Pre-processed metadata and developed a prototype.

4. Evaluated the usability of the prototype with some ITC students and members of staff as subjects (overview):
   A. Development of a test plan.
      • List of tasks and/or list of questions
      • Subjective satisfaction and debriefing questions
      • Identification of number, type, and source of participants
      • Choice of participants that represent the target users.
   B. Commencement of the test by reading statements that indicate what the participants will be doing and a note that ‘it is the system being tested and not the participants.’
   C. Completion of the test by giving participants a chance to make general comments and pose questions.

   i. Focus group evaluation (evaluation of the prototype to identify the errors and early usability features of the prototype).



Figure 1.1: Methodology flowchart. [Flowchart boxes: identification of target users and determination of user requirements and tasks; selection of the appropriate visualization display techniques; selection of dynamic ‘interactive’ techniques to interact with the display techniques; selection of an appropriate visualization system; pre-processing of the metadata and development of the prototype; test user identification and invitation; 1st usability evaluation (focus group session); direct analysis and discussion of the 1st evaluation results; revision of the prototype and incorporation of revisions and user feedback into the 2nd evaluation; 2nd evaluation (think-aloud usability testing and questionnaire); analysis and discussion of user evaluation usability data and observations; conclusions and recommendations of the research.]



      a. Presentation of the prototype
      b. A supervised exercise
      c. Task execution
      d. Discussion of the usability issues and recommendations

   ii. Think-aloud method supplemented by a questionnaire (evaluation of the usability of the prototype and of the whole concept of geographical metadata visualization)
      a. Installation of recording devices
      b. Presentation of the prototype
      c. A supervised, video-taped think-aloud exercise
      d. Recorded think-aloud task execution
      e. Questionnaire answering

5. Discussed the results and observations, and derived conclusions and recommendations.

1.6. Thesis structure

Based on the methodology outlined above, the next chapter (Chapter 2) will report on existing research about the functions of geographical metadata and about the user requirements and tasks regarding the use of geographic metadata and metadata facilities to search for geographic data. To support the hypothesis of the study, literature about the use of visualization techniques in the process of data search with geographic metadata will also be reviewed, and the target users of the study will be described. Chapter 2 is essentially about ‘focusing on the user requirements’; the later development and user evaluation of the prototype will be based on those requirements. The outcome will be a user task and system response model that also addresses the visualization component in the process of geographic metadata use.

Chapter 3 will also provide a review of literature, this time on existing visualization techniques that can address the data characteristics of geographical metadata and satisfy the user requirements and tasks laid out at the end of Chapter 2.

Chapter 4 will report on the development of a prototype. The prototype will be based on a visualization system identified at the end of Chapter 3.

Chapter 5 will give an account of the final stage of the usability evaluation project for geographic metadata. It will describe the prototype evaluation with the target users identified in Chapter 2. The purpose of the evaluation will be to test the hypothesis of the study and provide recommendations for future research. The results of the hypothesis test will be given as conclusions of the study in Chapter 6, together with the recommendations.



2. User tasks and requirements analyses

Once a problem has been identified with the use of a particular product, or with the needs that a product is supposed to meet, the most logical next step is for the designer to conceptualize a solution to the problem. However, before the actual design of the product begins, and in order to avoid providing users with products that are not usable, it is imperative to clarify the needs and to model the procedure that users prefer when using the product to achieve their goals. In a usability approach, the user tasks and requirements form the basis for the target product specifications. This chapter will start with a discussion of usability issues. That will be followed by an investigation of user tasks and requirements as indicated in the literature. Finally, a model of metadata use tasks will be developed for a selected group of target users. The ultimate aim is to use the model to design and develop a prototype that satisfies the user and assists him in accomplishing his tasks with geographical metadata effectively and efficiently.

2.1. Usability

All technical products developed for users are supposed to meet the explicit and, sometimes, implicit needs of the users if the products are to be deemed usable and useful or, from an economic perspective, if both the producer and the user are to get optimal returns from their investments. The same applies to the visualization features on interfaces of geographic metadata facilities: Wang, Liu, and Meng (2001) add that there is a need to enhance the usability of geovisualization systems in order to prevent new techniques from misleading map users and to create an adaptive environment for information representation, analysis and exploration.

2.1.1. Usability defined

What is usability? Several authors have worked on usability issues, and some of them give slightly different definitions of the term. In her book Usability Engineering, Faulkner (2000) argues that the definition has evolved since as early as 1959, starting with Brian Shackel. Though the definitions may sound slightly different, they all agree on one thing: users have expectations, and such expectations have to be met. Webster’s Third New International Dictionary (1981) defines usability as the quality or state of being convenient and practicable for use. Faulkner (2000) also indicates that the International Organization for Standardization (ISO 9241-11, 1998) perceives usability as the effectiveness, efficiency, and satisfaction with which specified users can achieve specified goals in particular environments. She goes further to explain what the ISO standard implies by effectiveness, efficiency, and satisfaction. Effectiveness refers to the ability of a given system (a human-computer interaction system, in this case) to accomplish the user’s task correctly. Efficiency has to do with how quickly a certain task can be accomplished, and satisfaction is defined as the degree of comfort felt by users when using the system, or whether they prefer one system over another. Faulkner (2000, p. 12) herself defines usability as “a measure of success of a product – whether it be software, computer systems or a product”. She also identifies a number of attributes that act as measures of success. They include the ones identified by ISO above: effectiveness, efficiency, and degree of satisfaction. She also mentions another list of measures, by Shackel: learnability, effectiveness, attitude, and flexibility. Shackel’s ‘attitude’ measure overlaps very much with ISO’s ‘degree of satisfaction’. Taken together, therefore, the ISO and Shackel definitions yield five measures: effectiveness, efficiency, learnability, flexibility, and degree of satisfaction.

Following the ISO definition, since it is the more widely accepted standard, usability is defined in this study as the effectiveness, efficiency, and degree of satisfaction with which geographical metadata users can achieve their search and navigation goals in a metadata facility by utilizing visualization techniques. This means that a usability evaluation will be undertaken in which the requirements of users in accomplishing geographic metadata use tasks are gathered from literature, evaluated with an interim user-task-and-requirement assessment, and then implemented in a metadata visualization prototype. The prototype will later be tested to confirm the hypotheses of the study.

Designing for usability is an ongoing part of the design process; it is not done once and for all (Kristof and Satran, 1995). However, due to time constraints in this particular study, only two usability evaluations will be carried out: one at the early stages of the prototype design, in order to guide and focus the design process, and one once the prototype development is complete, in order to evaluate the assumptions and hypotheses of the study.
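As an illustration of how the three measures adopted in the definition above might be quantified from test sessions, the following minimal Python sketch computes effectiveness as the share of task attempts completed correctly, efficiency as correctly completed tasks per minute, and satisfaction as a mean questionnaire rating. The record layout (`TaskResult`), the 1-to-5 rating scale, and the specific formulas are assumptions made for illustration; they are not prescribed by ISO 9241-11 or by Faulkner (2000), and the data are invented.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskResult:
    """One participant's attempt at one test task (hypothetical record layout)."""
    completed_correctly: bool
    seconds_taken: float
    satisfaction_rating: int  # assumed scale: 1 (poor) to 5 (good)


def effectiveness(results):
    """Share of task attempts that were completed correctly (0.0 to 1.0)."""
    return sum(r.completed_correctly for r in results) / len(results)


def efficiency(results):
    """Correctly completed tasks per minute of total working time."""
    total_minutes = sum(r.seconds_taken for r in results) / 60.0
    return sum(r.completed_correctly for r in results) / total_minutes


def satisfaction(results):
    """Mean questionnaire rating across all task attempts."""
    return mean(r.satisfaction_rating for r in results)


# Invented example data, not results from this study.
session = [
    TaskResult(True, 90.0, 4),
    TaskResult(True, 150.0, 5),
    TaskResult(False, 240.0, 2),
    TaskResult(True, 120.0, 3),
]

print(effectiveness(session), efficiency(session), satisfaction(session))
```

Other operationalizations are equally defensible, for example mean time per completed task for efficiency; the point is only that each measure needs an explicit, measurable definition before an evaluation starts.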

2.1.2. Usability engineering

Usability engineering is a concept that grew out of the idea of creating useful and usable human-computer interaction (HCI) systems by following a user-centred design process, from the conceptualization stage of a system, throughout its whole development life cycle, until its implementation or release for consumption by users. Faulkner (2000, p. 12) defines the concept as “an approach to the development of software and human computer systems which involves user participation from the outset and guarantees the efficacy of the product through the use of a usability specification and metrics.” She also makes a distinction between usability engineering and usability evaluation, defining the latter as a process (not an approach) by which systems and products are evaluated by either the usability evaluator or the usability engineer using any of the methods available to him. According to Shneiderman (1998), the concept is growing into an accepted discipline with established practices and standards. The existence of the Usability Professionals Association, which was formed in 1991, is evidence of such growth. The association draws its membership from large corporations, several small usability firms, and a large number of individuals; it currently has over 1600 members worldwide.

The process of applying usability engineering principles throughout all the design phases of a product is termed the usability engineering lifecycle. It is a continuous process consisting of several usability evaluations of the product, where the evaluation methods may differ at different stages of the design depending on the usability data that the product developer wants to gather at a particular stage. Faulkner (2000) has come up with a usability engineering lifecycle model that identifies the important usability evaluation activities (tasks) and the associated data (information produced) derived from each activity, as shown in Table 2.1 below.



The model seeks to gather as much usability data as possible in a structured, hierarchical manner. However, due to time constraints, this study stands in a slightly different position: the task in italics will not be carried out and, furthermore, the underlined tasks will only be accomplished through the review of existing literature (no formal method will be adopted). As indicated in the background section of Chapter 1, this study extends a research priority of Ahonen-Rainio and Kraak (2003). Therefore the report by Ahonen-Rainio (2003) on the testing of their original design concept will be taken into account in the determination of user characteristics, background, tasks, requirements, and usability specifications. In addition, more literature will be gathered from other sources on the design of visualization tools and techniques, especially for geographic metadata. In a way, therefore, the procedure that the study will follow will not be a complete departure from the one suggested by Faulkner (2000).

There is no standard usability engineering plan for all designs. Almost every design requires a specific plan due to a number of factors, called determinants by Shneiderman (1998). Each plan consists of more or less the same list of tasks indicated in the above model; it associates each task or group of tasks with a particular evaluation method and describes how the method is implemented to achieve the goals of the task(s). The determinants referred to by Shneiderman (1998) are the following:

• Stage of design (early, middle, late) of the product
• Novelty of the whole product development project
• Number of expected evaluation subjects
• Criticality of the product (for example, what the costs would be if one of the usability metrics were non-optimal)
• Costs of product and finances allocated for testing
• Time available, and
• Experience of the design and evaluation team

For the current study, all of these determinants except the criticality of the product and the costs allocated for its testing will be considered. For one, most of the visualization techniques may be quite ‘novel’ to most of the target users, or just not intuitive enough in the case of metadata visualization. This calls for methods that evaluate the intuitiveness of the design and of the prototype itself. The methods employed should also not be too complicated to understand, for both the evaluation team and the analyst of the evaluation results, as that can lead to unreliable results. They should also not be too time-demanding, given that only a few weeks will be available for the evaluations and that a complicated method may require a lot of time to become familiar with. Also, the target users are busy people who may not always be available for the entire duration of individual evaluations. Considering the ‘stage of design’, only one interim evaluation (possibly in the form of a focus group) and one usability test at the end of the prototype design will be carried out.



Table 2.1: The usability engineering lifecycle model (Faulkner, 2000), tasks in descending order.

Evaluation task                     Information produced

Know the user                       • User characteristics
                                    • User background
Know the task                       • User’s current task
                                    • Task analysis
User requirements capture           • User requirements
Setting usability goals             • Usability specification
Design process                      • Design
Apply guidelines, heuristics        • Feedback for design
Prototyping                         • Prototype for user testing
Evaluation with users               • Feedback for redesign
Redesign and evaluate with users    • Finished product
Evaluate with users and report      • Feedback on product for future systems

Although no direct interaction with users will take place when determining the user requirements and tasks relevant to the use of geographical metadata by the target users, it is still important to know which methods are most relevant for that task. The reason is that each method has its own properties that determine the type of usability data that can be derived from it. That knowledge can therefore be used to assess the type of usability data found in the literature.

2.1.3. Methods for user requirement and task analysis

Various usability evaluation methods can be applied to assess what the target users require and to determine current and expected use tasks (and the hierarchies formed by sub-tasks) for a proposed product under design. However, to obtain optimal usability data, many of these methods are best applied only at particular stages in the product design lifecycle. Faulkner (2000) extends Table 2.1 above by adding a column that highlights some of the evaluation methods that are highly applicable in determining the ‘information produced’ (see Table 2.2). In that column, she indicates that interviews with users, questionnaires for users, or observations of users in their work environments are suitable approaches for determining the user characteristics, the user background, and the user’s current task. Once data on users’ current tasks have been gathered, task analysis is carried out to yield user requirements. Of course, more information in addition to user tasks can be used to establish the requirements. Faulkner (2000) says that the user requirements are dictated by the profiles of the users that emerge from studying their work environment, what their tasks are, and what performance levels the product has to meet. In addition to the above three approaches for determining user tasks, the same author introduces ethnographic interviews with users as an appropriate approach for compiling information about current user tasks in a social context.



Table 2.2: The lifecycle with techniques available at each stage (Faulkner, 2000).

Evaluation task                     Information produced                      Method

Know the user                       User characteristics                      Interviews, questionnaires, observations
                                    User background                           As above
Know the task                       User’s current task                       As above
                                    Task analysis
User requirements capture           User requirements
Setting usability goals             Usability specification
Design process                      Design
Apply guidelines, heuristics        Feedback for design                       Interviews, questionnaires
Prototyping                         Prototype for user testing
Evaluation with users               Feedback for redesign                     Interviews, questionnaires
Redesign and evaluate with users    Finished product
Evaluate with users and report      Feedback on product for future systems    Interviews, questionnaires

She describes the approach as being capable of developing knowledge about users, their normal working practices, and their operational environment. Shneiderman (1998) expresses similar views about ethnographic observations in interface design: designers gain insight into individual behaviour and the organizational context of the target user.

From the above discussion, it follows that literature on user task and requirement analysis for the use of geographic metadata that is based on ethnographic observations, questionnaire surveys, and/or interviews with users in their normal working environments will be highly reliable for the current study.

2.2. Target users

Users of geospatial metadata in digital environments include geospatial experts as well as experts in fields other than those associated with geography, students in various phases of their education, and casual users (Gluck, 1997). Among geospatial experts, there are those who specialize in geographical information sciences, including geographical information visualization experts. This study will focus on that group, on other geographical information science experts, and on geospatial science experts from the application areas of the science. Representatives of this target group can be found among staff and students of the International Institute for Geo-information Science and Earth Observation (ITC).

The criteria for selecting this target group are the users’ familiarity with and need for geographic metadata, and their level of skill in using metadata, especially on existing metadata services. Geographic information science and application experts such as those at ITC work with metadata on several occasions: for example, during the acquisition of geographic data needed for lectures and students’ practicals, during lectures and supervision of practical sessions on metadata subjects, in application analyses, and in research on the utilization of metadata. In general, therefore, they are familiar with the existence of metadata, metadata elements, standards, and even the use of metadata service facilities. Geographical information visualization experts act more like expert usability reviewers in this study because they know a lot more about the visualization of any kind of geographic information on different types of computing platforms. Though the subject of metadata visualization may be quite young in the metadata use community, most of the concepts used come from ordinary (non-metadata) data visualization and, therefore, expertise in data visualization is very relevant. It should be noted here that researchers include ITC PhD students, whose work is highly regarded in the publication work of the institute.

Having described the target users of this study, the next step is to determine the user requirements and tasks that the prototype to be developed should satisfy. ITC staff members and PhD students will be asked to participate in the usability evaluation of the prototype.

2.3. User tasks and requirements in literature

Not much research has been done on the use and usability of geographic metadata, or of metadata in general. This is evidenced by the lack of literature about user requirements and use tasks for geographical metadata. However, a few metadata initiatives, such as the Alexandria Digital Library (ADL) project and those under the umbrella of the European Territorial Management and Information Infrastructure (ETEMII, 2002), have carried out usability studies of geographic metadata users’ needs. A few individual researchers, such as Ahonen-Rainio (2003), have also produced research results on the usability of some of the visualization techniques in the use of geographic metadata. These works and a few others form a foundation for further research in the field of metadata utilization. The discussions and comments on the usability evaluations adopted in each study all indicate the importance of ‘focusing on the user’s needs’.

A technical report by ETEMII (2002) on a metadata users’ needs study carried out by Methods for Access to Data and Metadata in Europe (MADAME) contains findings gathered from a desk study and two ETEMII workshops with users of geographical metadata from metadata initiatives at regional, national and continental European level. The users are defined as working for public institutions, public utility companies, public or mixed-funded companies in charge of the management of geographical databases, and private companies whose activities relate closely to adding value to geographical databases. These two functions (managing geographical databases and adding value to them) overlap a lot with the activities of the target users of this study (ITC members of staff), who manage geographic information through the utilization of GIS and add application value to raw geographic datasets through application analyses in particular fields such as water resources management and forestry, to name but a few. The findings with respect to searching and navigating in a metadata service facility are shown in Table 2.3.

Table 2.4 shows requirements from an earlier report by MADAME (2000) on a comparative evaluation of online metadata services and user feedback. The report indicates that the requirements were gathered from a focus group session with 19 individuals from private organizations (7), academic institutions (4), and the public sector (8). It also refers to searching and navigational mechanisms.



Table 2.3: User requirements from the ETEMII desk study and workshops

• During a search session for metadata, a balance should be struck between simple data structures such as ASCII files, which give efficient access but may not cater for more complex or dynamic searches, and structures that may be less efficient but can easily handle complicated search mechanisms such as search by geographical area.

• Users do not ‘really’ care how long it takes them to find the required metadata as long as they finally find up-to-date and high-quality metadata. That is, users value quality in both metadata and search mechanisms more than quick and fancy tools that lead to out-of-date and poor-quality metadata.

• A method of search by gazetteer place name is highly regarded.

• Less enthusiasm is shown for search mechanisms that require the use of plug-ins.

• Less enthusiasm is also shown for nice but complex visualizations where the user can easily get lost.

• The use of icons to represent landmarks seems to be an idea worth considering.

Table 2.4: User requirements for geographical metadata from the MADAME studies

• Area-based searches were supported by 50% of the focus group. These could include a user-defined rectangle, polygon, or a point on the map. Another way would be a clickable map with administrative boundaries or municipalities.

• One person commented that an area-based search could be misleading if the datasets from the area of interest were only partly included in the directory.

• Area-based search should not be the only method adopted as, sometimes, it can produce a much too long listing for some areas.

• One third of the focus group considered that a keyword-based search facility would be useful.

• Predefined search criteria were suggested by one of the focus groups.

Hill et al. (2000) report on the user evaluation studies and system design of the Alexandria Digital Library. The report clearly indicates that the library’s collection and services deal with information that is geo-referenced. This means that geographical metadata is used for data search facilities on the web user interface. The library evaluated its 1996/1997 user interface with some of its user communities, which include earth scientists, information specialists such as librarians, and educators together with undergraduate physical geography students. Several evaluation methodologies have been used to test the library’s online service since the beginning of the project: online surveys, ethnographic observations, focus groups, internal evaluations of interface design, and analysis of the use of ADL in university classrooms. During the 1996/1997 phase, however, only ethnographic studies, classroom studies using the Java interface, and an online survey of users were used.

The evaluations yielded a detailed list of user requirements for the user interface, shown in Table 2.5. The analysis of the requirements indicated a need for multiple user interfaces: “multiple user interfaces is seen as the way in which the system can be adapted to the various communities” (Hill et al., 2000, p. 256).



Table 2.5: List of user requirements for the ADL user interface

Search functions

• The system shall present a unified search screen to the user (integration of gazetteer, catalog and map-based searching)

• Reduced, simplified set of search types for metadata attributes (search buckets); map to ADL and MARC schema.

• The user shall be able to select search areas on map (rather than use whole map window as search area)

• The user shall be able to search non-contiguous spatial areas

Session management

• A summary view of the session log shall be viewable by the user during the session (query history and corresponding results sets).

• Session log shall be available to ADL help desk personnel.

• User shall be able to stop a query and issue a new one.

• User shall be able to return to a previous point in the session.

• User shall be able to modify and reissue a query.

Result display

• The total number of items in the result set shall be displayed to the user.

• User shall be able to sort result set by type, date, etc.

• System shall display footprint distribution of a result set.

• Set of display formats for metadata shall be defined and implemented.

• Low-resolution browse images that can be enlarged for evaluation will be provided.

User workspace

• The system shall provide a user workspace where selected items can be collected in a set of user-named collections.

• User-saved queries may be re-issued to the system during a later session.

Holdings visualization

• The system shall present the user with an overview of the contents of collections by geographic coverage, genre and date.

User help functions

• Search examples will be added to the interface to aid user in creating query.

• Context-sensitive help will be added.

• Tutorial with FAQ-type information will be available.

Usability features

• Status of process indicator will be added.

Data distribution

• Access to data through online links, offline references, or ordering processes will be clearly displayed.

• The functionality of slicing and dicing large files for delivery will be available.
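The area-based search requirements listed above (a user-selected search area on the map, possibly made up of several non-contiguous areas, with the total number of items in the result set reported) can be illustrated with a minimal Python sketch. It assumes that each metadata record carries a rectangular footprint given as (west, south, east, north) in decimal degrees; the record layout, the function names, and the catalogue entries are hypothetical and are not taken from the ADL implementation.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Rectangular footprint in decimal degrees (assumed representation)."""
    west: float
    south: float
    east: float
    north: float


def intersects(a: BBox, b: BBox) -> bool:
    """True if two rectangles overlap or touch (no date-line handling)."""
    return (a.west <= b.east and b.west <= a.east
            and a.south <= b.north and b.south <= a.north)


def area_search(records, search_areas):
    """Return records whose footprint intersects any of the search areas.

    Passing several rectangles models a search over non-contiguous areas.
    """
    return [rec for rec in records
            if any(intersects(rec["footprint"], area) for area in search_areas)]


# Invented catalogue entries for illustration only.
catalogue = [
    {"title": "Land cover A", "footprint": BBox(5.0, 51.0, 7.0, 53.0)},
    {"title": "Soil map B", "footprint": BBox(27.0, -31.0, 30.0, -28.0)},
    {"title": "Roads C", "footprint": BBox(100.0, 10.0, 105.0, 15.0)},
]

# Two non-contiguous search rectangles drawn by the user.
hits = area_search(catalogue, [BBox(6.0, 52.0, 8.0, 54.0),
                               BBox(28.0, -30.0, 29.0, -29.0)])
print(len(hits), "items found:", [rec["title"] for rec in hits])
```

A production service would normally delegate this test to a spatial index or a spatial database and would support polygons as well as rectangles; the sketch only makes the requirement concrete.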

Ahonen-Rainio (2003) has carried out usability testing on concepts for geographic metadata visualization methods intended to help users understand the information expressed by metadata. She drafted the concepts in collaboration with Menno Jan Kraak (Ahonen-Rainio and Kraak, 2003). This study is in fact linked to Ahonen-Rainio’s work, as mentioned in the background and problem statement sections of Chapter 1.

The concept tests were carried out with users of geographic metadata from the Finnish Defence Forces. The test subjects worked with geographic metadata as application developers, data administrators, or application users, with GIS experience ranging from 2 to over 15 years. The visualization techniques tested were graphic samples of topographic data and four multivariate visualizations: scatter plot matrices, parallel coordinate plots, star glyphs, and Chernoff faces. The subjects were encouraged to think aloud during the tests. The test results are shown in Table 2.6:

Table 2.6: User requirements from concept testing of geographic metadata visualization

Overall

• Graphic samples were found to be useful for users at any skill level.
• Multivariate visualizations were considered useful for cases of more than one dataset.

Graphic samples

- Attention of users
  • An individual graphic sample does not give enough useful information to label the dataset as a good or poor dataset according to the selection criteria. Also, users’ skills vary in detecting important details. A checklist of ‘important details’ items to be considered on a graphic sample would be helpful.

- Selection of samples
  • The samples should have a large coverage of the phenomenon depicted by the data. They should also represent the location of the dataset(s), area extent, and presentation scale of the dataset(s).

- Visual design of samples
  • Consideration of visual variables in the design of the samples is of great importance to avoid misinterpretation of the graphics in the data.

Multivariate visualizations

• Users prefer easily adoptable visualization methods.
• As long as a visualization method meets the criteria for metadata visualization it should be tried and tested.
• Concept ideas of essential interactions in exploratory visualization were easily adopted. Therefore they can be tried on an existing prototype.
• The user interface must be as intuitive as it can possibly be.
• Visualized metadata should be provided along with textual metadata.

All the geographic metadata user requirements and task results from the studies described above will be used as input to the ‘needs’ of users of geographical metadata. It is also vital to look at the procedures that users currently prefer (and would prefer) to follow when using metadata to achieve their goals on metadata facilities. Before developing the user interface, it is essential to know the order of use events that the user interface of the metadata service facility should support. Such information is also available from the above study by Ahonen-Rainio (2003). In general terms:

1. The user first selects the datasets by theme.
2. Selects the metadata elements to be visualized.
3. Visualizes the elements
   a. By star plots
   b. Then by parallel coordinate plots (the exclusion of scatter plot matrices and Chernoff faces here is due to the general negative impressions that most of the subjects had about the two techniques).
4. Possibly, filters out some of the datasets, and
5. Visualizes the remaining datasets by graphic samples and textual metadata.
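The five steps above can be read as a small data pipeline. The following Python sketch is only an illustration under stated assumptions: the `Dataset` record, its field names, and the sample values are hypothetical and do not come from any metadata standard or from the study itself. Steps 3 and 5 (the actual plotting and the display of samples) are left out; the sketch only shows how the selections feed one another.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical metadata record; field names are illustrative only."""
    name: str
    theme: str
    elements: dict = field(default_factory=dict)


def select_by_theme(collection, theme):
    """Step 1: keep the datasets whose theme matches the user's choice."""
    return [d for d in collection if d.theme == theme]


def project_elements(datasets, element_names):
    """Step 2: keep only the chosen metadata elements (input for step 3).

    An element that a dataset does not record is kept as None.
    """
    return {d.name: [d.elements.get(e) for e in element_names] for d in datasets}


def filter_out(datasets, predicate):
    """Step 4: keep only the datasets that satisfy the predicate."""
    return [d for d in datasets if predicate(d)]


# Invented sample collection, for illustration only.
collection = [
    Dataset("roads_a", "transport", {"scale": 50000, "price": 120.0}),
    Dataset("roads_b", "transport", {"scale": 250000, "price": 40.0}),
    Dataset("soils_a", "soil", {"scale": 100000, "price": 80.0}),
]

themed = select_by_theme(collection, "transport")
table = project_elements(themed, ["scale", "price"])
shortlist = filter_out(themed, lambda d: d.elements["scale"] <= 100000)

print(table)
print([d.name for d in shortlist])
```

The shortlist produced by step 4 is what would be passed on to step 5, where graphic samples and textual metadata are shown for the remaining datasets.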

These steps can be considered as a user task model. The user requirements from the concept testing are associated with steps 3 to 5 in this user task model. The user task model by Ahonen-Rainio (2003) is very similar to the general ‘process of information retrieval’ (Gobel and Jasnoch, 2001), a model that is relevant for all information. That is, it does not distinguish between domain-specific user tasks. According to Gobel and Jasnoch (2001), information retrieval consists of the following (sub)tasks:

1. Query formulation, and
2. Query modification over an overview of the collection level of the metadata
3. Comparison of metadata result set, and
4. Detailed presentation with respect to visualization of the result set

2.4. User tasks and requirements in this research

Every task has a goal. The goal, in this research, is to simplify, for the geographic metadata user, the process of selecting a dataset (or datasets) that is suitable for his/her application needs, whatever the application may be, from a collection of geographic metadata in a web metadata service facility. From the above user task models and user requirements, an integrated model of geographic metadata use tasks can be established, with the tasks decomposed to give a detailed view of the process:

1. The user accesses the user interface.

2. He looks at a browse graphic to obtain an overview of the areal coverage(s) of the data collection, a gazetteer list of smaller units of the area, a list of application domains in the collection’s subject, and a list of metadata elements that can be visualized together on a multivariate visualization plot.

3. He selects (i) a data subset of the collection either according to location or areal coverage, by typing in coordinates, dragging a rectangle over the required areal extent, clicking on a smaller clickable areal unit, and/or selecting one or more of the application domains of the collection. Otherwise (ii), he starts right away by selecting the metadata elements he would like to visualize using the multivariate visualization plots.

4. i) If he opts for approach (i) in step 3 above, the interface opens a view with, possibly, multiple frames: a) a textual metadata list frame, b) a frame with a question about whether the user would like to view the result in a multivariate visualization graphic (yes) or not (no), c) the browse graphic again (zoomed in to the area of interest), along with options for areal coverage selection with a rectangle, a clickable smaller areal unit, or by typing in coordinates, d) the collection’s subject list of domains again, and e) the list of metadata elements for visualization again.

ii) If approach (ii) in step 3 is opted for instead, another view opens with all the frames indicated in approach (i) above, except that the map browser still shows the complete areal extent of the collection and the question is “which visualization technique do you prefer?”

5. Assuming the answer to the question about the use of a multivariate visualization technique is ‘yes’ when procedure (i) is followed in step 4, the interface presents a result similar to the result of approach ii) of step 4, except that the map browser is zoomed in.

6. Then one of the techniques is selected.

7. The result is a new window with the visualization graphic with a choice for metadata elements, other techniques, and with interaction tools to explore the datasets on the graphic.

8. i) The user, then, interacts with the datasets on the visualization to determine which datasets are suitable for his application depending on the datasets’ metadata elements’ values. ii) Otherwise he selects another visualization technique if, for any reason, he feels he would prefer the other technique over the current one, or simply wants to try out the other technique.

9. With the option i) from the previous step being the one chosen, the result is a pattern of attribute relationships of data items.

10. The user filters out unsuitable datasets on the visualization, and then opens another visualization technique to get a better view of the selected datasets if the user feels the other technique would be more helpful at that stage. Otherwise he continues the analysis with the same visualization technique to make sub-selections.

11. Finally the user opens the textual metadata of the selected datasets and the sample graphics.

12. In a case where the user is dissatisfied with the results of his initial areal coverage or coordinate selection, or with the results from the collection subject’s domain choices, he makes a new selection, and abandons the previous search or saves the search results for further analysis later.

13. In a case where he wants to make a new choice of metadata elements he does so on the already-open visualization window without going back to (e) in step 4.

14. Then the user would like to see the location or coverage of the selected dataset(s) in the browse graphic (Ahonen-Rainio and Kraak, 2003). He clicks on a link, from the textual metadata, which updates the browse graphic to highlight those locations and label them with place names.
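The spatial selection in step 3 of the model above (typing in coordinates or dragging a rectangle) amounts to testing each dataset’s footprint against a query rectangle. The sketch below is a minimal illustration under stated assumptions: footprints are simplified to bounding boxes in decimal degrees, the `BBox` type and the dataset names and coordinates are invented, and boxes crossing the 180° meridian are not handled.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Hypothetical bounding box in decimal degrees."""
    west: float
    south: float
    east: float
    north: float


def intersects(a: BBox, b: BBox) -> bool:
    """True when the two boxes overlap or touch (no antimeridian handling)."""
    return (a.west <= b.east and b.west <= a.east
            and a.south <= b.north and b.south <= a.north)


def select_by_extent(footprints: dict, query: BBox) -> list:
    """Names of the datasets whose footprint meets the query rectangle."""
    return sorted(name for name, box in footprints.items() if intersects(box, query))


# Invented footprints, for illustration only.
footprints = {
    "dataset_north": BBox(6.75, 52.15, 7.00, 52.30),
    "dataset_south": BBox(27.0, -30.7, 29.5, -28.5),
}

# Rectangle typed in or dragged by the user.
query = BBox(6.8, 52.2, 6.9, 52.25)
print(select_by_extent(footprints, query))
```

A clickable areal unit or a gazetteer entry could be handled in the same way by looking up the unit’s own bounding box and using it as the query rectangle.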

The diagrammatic illustration of the model is shown in Figure 2.1. It should be noted that the appearance of particular visualization techniques in the above user task model does not mean that a final selection of the appropriate techniques for testing has been made. They are relevant at this stage because Ahonen-Rainio (2003) already evaluated concepts of their usage in the context of geographic metadata use. In the next chapter, more potentially usable techniques shall be identified, from literature, in addition to the current selection. It is my feeling that a wider choice will let users express their preferences more freely than if they feel restricted to a few choices and somehow forced to make a selection even where they are not impressed by anything at all.

2.5. Summary and Conclusions

Geographical metadata use interfaces, as technical products, need to undergo a usability evaluation in order for them to have a capacity to help users in achieving their needs. In this study, the measure of usability is defined in terms of three usability merits: the effectiveness, efficiency, and satisfaction with which target users (geographical information and/or application science experts) can achieve their goals with the use of metadata facilities. Determination of the usability of geographical metadata use facilities is supposed to be an ongoing process guided by usability engineering principles, which may comprise several user evaluation methods in the design and development lifecycle of a facility. A few geographical metadata initiative projects and researchers have already embarked on research projects in order to determine the user requirements and the tasks followed and preferred by geographical metadata users when using the metadata facilities. From the results of such research a Hierarchical user Task Analysis (HTA) model has been drafted.

This chapter discusses the importance of usability awareness and the adoption of usability engineering principles in the design and development of systems for use of geographical metadata. Several definitions for the term usability exist. The same applies to usability engineering, though it is clear that all of them stress the importance of focusing on the target user’s needs in the design and development of a technical product.

Not much literature is available to determine this study’s target users’ requirements and tasks, probably due to the fact that the concept of usability is relatively new (but growing) in the geographical information use culture. There is even less literature to support discussion of the hypothesis of this study. However, with existing metadata use facilities, some research has been done which has resulted in the draft of a user task model. The model is important in focusing the design of a geographical metadata use prototype.

Figure 2.1: Task and system response model by Hierarchical Task Analysis (HTA) for metadata use from a geographical metadata service in a GDI environment.

[Figure 2.1 diagram not reproduced. Recoverable node labels, from the top of the hierarchy down: ‘Geographical metadata use from a metadata service’; ‘Access the user interface’; initial view with a complete, clickable map browser, a gazetteer, a geographical application domains list, a metadata elements list, and a coordinate type-in text box; user actions (drags a rectangle over an areal coverage, clicks on an areal coverage of interest, selects gazetteer locations of interest, defines an areal extent with coordinates, selects application domain(s) of interest, selects metadata elements of interest); ‘Step 4, frame (i)’ with a zoomed-in map browser, textual metadata, and the question ‘Visualization techniques? (Yes or No)’; ‘Step 4, frame (ii)’ with a complete map browser and the question ‘Which visualization technique?’; ‘Select a visualization technique of interest’; ‘Visualization graphic with interaction tools, and choice for metadata elements and/or another visualization technique’; ‘Explores the data with the interaction tools’; ‘Pattern of attribute relationships between data items’; ‘Filters out unsuitable data sets’; ‘Filtering result’; and feedback paths labelled ‘Dissatisfied with the technique’ and ‘Dissatisfied with the result’.]

3. Metadata and geographical metadata visualization

The user tasks and requirements as outlined in Chapter 2 call for a means of using geographic metadata with efficiency, effectiveness, and user satisfaction. It should preferably be an information retrieval model that makes optimal use of human vision. Such a model employs visualization techniques that intuitively address the characteristics of geographic metadata (as described by geographic metadata standards) that users are interested in for determining, for example, the fitness for a specific use of a particular dataset from a data collection. In this chapter, such techniques will be determined from literature.

First, a definition and a somewhat detailed description of metadata, especially geographic metadata, will be given. The description shall be put into an applied perspective by looking at the recommendations of some of the well-developed (national and international) geographic metadata standards. The discussion on standards shall highlight the characteristics of geographic metadata (from metadata elements) and, therefore, drive the study towards selection of visualization techniques that qualify for visualization of such features.

3.1. Metadata and geographical metadata

Awareness of the existence of metadata, and its use, are still not common among geographic information professionals and users. The metadata use culture is so weak that even in environments with a long tradition of data management and information services, metadata is still perceived as something ‘extra’ (di Colorno and Garibaldi, 2002), a burden and a use of resources with unproven benefits. Considerable work has to be done to raise awareness of the need to describe the identification of available data sets in databases, to describe the fitness for use of data collections and individual data sets, and to describe their spatial referencing. The more data is produced, the more important mechanisms that organize the data become, as well as information on where to find these data (Gobel and Jasnoch, 2001).

3.1.1. Metadata defined

Metadata comes from the words ‘meta’ and ‘data’, where ‘meta’ means change or transformation. Metadata means ‘data about data,’ describing the origins and changes/transformations to data. The description of these origins and changes ensures the continued use of data over a span of years because documentation of the changes in the value of data over time is maintained (FGDC-STD-001, 1998). Wilkinson (1999), on the other hand, defines metadata as facts about the setting and circumstances under which people make observations of the data. But he also agrees that, simply put, metadata is ‘data about data’. Going back to his other definition, though, it appears that data always contains metadata. This could be true since he continues to say that people’s observations are forever structured by these settings and circumstances. These settings and circumstances are used to justify the observation values. Calhoun (2002, p. 196) has adopted a more functional definition by Tom Turner (2002), which he considers a better definition of metadata than ‘data about data’: “Metadata (i) helps one find or manage information, (ii) serves ‘particular’ purposes, (iii) may be used by people and/or computers, (iv) often has structure and/or content rules, and (v) can be created by people or by computers.” This particular definition is probably helpful, as a wake-up call, to the information society, which, in the above section, is said to see metadata as ‘extra’ data, a burden, and a waste of resources.

No matter what definition is given, the current trend is towards recognition of the value of metadata as the databases continue to grow in size. ETEMII (2002), from their user evaluations on the use of geographic metadata, reinforce this statement by concluding that large geographic information databases cannot function without metadata. Other than the functions highlighted by Calhoun (2002), let us look at what other researchers perceive as the factors that make metadata a necessary part of every data set and data collection.

3.1.2. Importance of metadata in a GDI

Several publications exist on the usefulness of metadata, in general, and application-specific metadata such as geographic metadata. Some of the existing metadata initiatives and some individual researchers have carried out their own studies to determine how useful geographic metadata is. These include the MADAME (2000) project, the ETEMII (2002) project, the U.S. National Spatial Data Infrastructure (NSDI) (FGDC-STD-001, 1998), and a few researchers at institutes of higher learning: Myke Gluck (1997) and Yu Deng (2002). All the above organizations work with geographical metadata, and each of them forms part of a distributed networked infrastructure with policies, legislation, and a technical make-up for sharing geographical data among the organizations and with the general public. Therefore, they form part of a very important component of their own GDI.

From the NSDI publication of its FGDC Content Standard for Digital Geospatial Metadata (FGDC-STD-001, 1998), the following uses of metadata were identified:

• Maintenance of an organization’s internal investment in geospatial data
• Provision of information about an organization’s data holdings to data catalogues, clearinghouses and brokerages so that the general public is aware of its existence, and
• Provision of information needed to process and interpret data to be received through a transfer from an external source

From the above uses, the standard recommends metadata content under the following four headings of information:

• Information needed to determine which data exist for a particular geographic location
• Information about fitness for use of a set of data or a data item
• Information on how to acquire identified data
• Information necessary for post-processing and use of the data.

In the focus group study MADAME (2000), the focus group members gave the following reasons for using geographical metadata services:

• Checking the availability of the desired ‘types’ of data
• Description of organization’s own datasets
• Obtaining an overview of available datasets
• Checking detailed descriptions
• Checking contact information for the data suppliers
• Educational purposes
• Using it as a memo, general reference
• Looking for additional services
• Finding contact information for marketing

Yu Deng (2002) looks at the importance of metadata from a professional point of view, as a means of analyzing and structuring the underlying data.

There are many uses of metadata, and new uses shall emerge as more and more people start acquiring distributed data resources due to the ease of access to such resources because of the ubiquitous nature of the improved Internet technology (ETEMII, 2002).

In order not to lose sight of the purpose of this study, let us zoom into the character of our specific type of metadata: geographical metadata. We will discuss what makes geographical metadata different from other metadata according to existing metadata standards. Such characteristics, in addition to specific user tasks and requirements, determine the type of visualization techniques appropriate when using geographical metadata.

3.1.3. Characteristics of geographical metadata

Geographical metadata is metadata for ‘spatially indexed’ data and information. It is metadata that refers to geographic objects from the size of a room to the whole planet (Gluck, 1997), and every object whose spatial location and coverage is relevant for its use is a geographical object. Among the existing standards for metadata, the Dublin Core standard is the one that first suggested the inclusion of spatial information in the ‘coverage’ element of metadata (Gluck, 1997). But the FGDC's Spatial Data Transfer Standard is the first standard that started a discussion on metadata and the organization of metadata (Timpf, Raubal, Kuhn, 1996). Ever since then, there has been a growth in the international, regional, and national geographic metadata standards. Among the current known standards are the International Organization for Standardization (ISO)’s ISO 19115, the Dublin Core metadata standard (though it is not specific to geographical data), the FGDC’s Content Standard for Digital Geospatial Metadata, and the European Committee for Standardization (CEN) standard (CEN TC287) in Europe. Most metadata projects use one of the above standards or a modified version of the standards that suits specific purposes of the project. For example, the European Environmental Agency (EEA) web portal has adopted the ISO 19115 standard, to which it introduced only some small changes for its own particular needs.

Geographical metadata standards provide a clear procedure for the description of digital geo-spatial datasets so users will be able to determine whether the data in a holding will be of use to them and how to access the data (Kim, 1999). They are a step towards a systematic and semantic way of ensuring ‘data interoperability’ across all domains of geographical data and across different geographical data-sharing communities.
The FGDC’s Content Standard for Digital Geospatial Metadata (FGDC-STD-001, 1998) recommends seven types of so-called ‘compound’ metadata elements (groups of data elements):

• Identification information
• Data quality information
• Spatial data organization information
• Spatial reference information
• Entity and attribute information
• Distribution information, and
• Metadata reference information

These are the highest abstraction compound elements. They are further decomposed into data elements and lower abstraction compound elements, forming a total of more than 300 elements of which 200 are data elements. The lowest abstraction compound elements can be found seven hierarchical levels down from the highest abstraction level. Some, but very few, data elements can be found at the second highest abstraction level. The standard has approximately 100 spatial data elements, plus three elements for the browse graphic file to provide illustration of a data set.

The data item definitions comprise the data item type, the domain of the type, and the short form of the data item name. The types include integer numbers, real numbers, ASCII characters (for text), day of the year in the form of ‘date’ (YYMMDD), and time of the day in hours, minutes, and seconds. In addition there are latitude and longitude coordinates in decimal fractions of degrees, and network addresses and file names in Uniform Resource Locator (URL) convention, for example, http://www.itc.nl/library/digital_library.asp.

According to Ahonen-Rainio and Kraak (2003) the ISO 19115 standard recommends over 400 metadata elements (both data elements and composite/compound elements). Just like the FGDC’s standard, it also has provision for optional elements and rules for their creation for those data providers who prefer maintaining additional elements to those provided by the standard. There are only three hierarchical levels in the structure of the standard, which is a much smaller number compared to the maximum of eight levels found in the FGDC’s standard. Fewer levels make it easier to navigate the structure of the standard (Jessen and Lillethum, 2003). A ‘significant’ number of the metadata elements have their type domains as ASCII characters, either names or descriptions. These ASCII character types also include metric information such as monetary values.
The metadata elements with strictly metric value domain types form the minority of the elements. Unlike the FGDC’s standard, the ISO 19115 also includes Boolean value domains for some element types. Both standards contain more nominal data domain types than interval, ratio, or even ordinal domain types. As we shall see later, this nominal character is one of the factors that can make visualization of metadata, using general-purpose visualization techniques, a difficult process (Rosario et al, 2003).

There are many geographical standards with all sorts of recommendations for the type of information that should be documented for geographical metadata. But the most important issue is: “which metadata elements, when using geographic metadata, are the users inclined to look for?” In the study by Craglia and Evmorfopoulou (2000), the focus group members were asked to mention the criteria they follow for selecting a dataset (that is, the information they look for in geographic metadata), and these were the results:

• Information on fitness for purpose
• Pricing information
• Information about overall quality
• Information about delivery time after ordering the data set(s)
• Areal coverage of data sets
• Information content
• Up-to-dateness of data sets
• ‘Value for money’
• Compatibility
• Positional accuracy
• Format
• Accessibility & availability, and
• Homogeneity (in quality)

This is quite a profile. It can be quite a tedious job to look for that one item that fits the profile. It is, therefore, a challenge for GDI geographical metadata services to provide means of exploring such profiles in a manner that is efficient and effective, and that brings the expected degree of satisfaction to the user. For that, this study proposes the use of visualization techniques.
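The nominal character of most metadata elements, noted earlier in this section, means that values such as a data format usually have to be turned into numbers before a general-purpose plot can place them on an axis. The following Python sketch shows one simple way this could be done; it is an illustration only and not the method of any cited author. The function name and the sample values are hypothetical, and the integer codes it assigns carry no ordering or distance meaning.

```python
def encode_nominal(values):
    """Map each distinct nominal value to an integer code in first-seen order.

    A missing value (None) is kept as None so that a plot can flag it
    instead of silently placing it on the axis.
    """
    codes = {}
    encoded = []
    for value in values:
        if value is None:
            encoded.append(None)
            continue
        if value not in codes:
            codes[value] = len(codes)
        encoded.append(codes[value])
    return encoded, codes


# Invented 'format' values for five datasets; one record is missing.
formats = ["shapefile", "GeoTIFF", None, "shapefile", "DXF"]
encoded, codes = encode_nominal(formats)
print(encoded)
print(codes)
```

Because the codes are arbitrary, any apparent trend along such an axis is an artefact of the order in which values were first seen; this is one reason why nominal elements are harder to visualize than metric ones.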

3.2. Visualization of geographic metadata

It is evident from the discussion of the standards in section 3.1.3 that geographical metadata are highly multi-dimensional, highly diverse in data types, and that users create demanding profiles as criteria when searching for a suitable data set for their application purposes. Existing metadata information systems, catalogue systems, and digital libraries provide geographical metadata search and exploration services for particular use needs. From research experience of the user tasks as outlined in chapter 2 and from the profiles just discussed above, geographical metadata use requires a query refinement process consisting of a succession of dynamically built queries (Albertoni, Bertone, and De Martino, 2003). This can mean a lot of queries to, finally, find the required metadata items, and there is also no guarantee that at the end of a standard succession of queries the result will be exactly what the user expected. It follows that this process is actually a data exploration process of ‘large databases’, and as such, techniques applied in data mining, visualization, and geographical interaction could provide the solution the user needs. This study shall focus on the visualization techniques that can be useful in simplifying the query refinement process and, in general, exploration of geographical metadata. Visualization techniques are powerful because the human visual sense can rapidly explore large amounts of data and quickly detect trends, correlations, and anomalies (Uhlenküken, 2000).

3.2.1. Status quo

One of the problems with most current metadata information systems’ services is ‘data explosion’. More and more data is being injected into databases and published on the web as metadata. The more data there is, the more difficult it becomes to explore the many resulting items of metadata during the selective comparison process for the best data sets for a particular purpose in mind. Most services still present metadata to the user on a textual dataset-by-dataset basis (see Figure 3.1). As shown in the problem statement in chapter 1, in the evaluation stage of the metadata retrieval process, users are bombarded with hundreds of textual metadata lists while their aim is to select just one or a few of those items. The rest of the task, selecting the correct dataset(s), is left in the hands of these often inexperienced users. Long lists of results are not very intuitive for finding the most relevant documents in the result set (Limbach et al, 2002).

The Alexandria Digital Library web metadata service (URL 1) for spatially indexed data is one of the most developed metadata services. It is an example of a metadata facility with massive volumes of metadata items. But the only visualization technique it utilizes to simplify the metadata selection process is a map browser, as do other metadata services such as the FGDC clearinghouse (URL 2) or the Danish metadata service (URL 3). The map is undoubtedly a very intuitive approach which quickly supports searches based on geographical location, and better still, the map footprint of the selected item can be shown on the map to pin-point the item’s repository location. Decisions supported by the map browser are those related to geographic location and/or coverage. However, the map does not support multi-criteria decisions that require simultaneous exploration of many metadata elements, which is what the geographic metadata user requires. In the geo-scientific community, exploratory use of metadata has been considered only recently. Rather, the focus, in the field of metadata research and development, has primarily been on the correct logical and technical descriptions through standards (Ahonen-Rainio and Kraak, 2003).

In the light of the state of affairs in geographic metadata services, this study recommends the use of visualization techniques in the query formulation and refinement process of metadata exploration. Visualization is a cognitive process that every individual engages in (Shneiderman, 1998). It is a partly unconscious, partly active mental model creation process to structure and understand data. Therefore visualization tools and techniques can significantly help users in understanding the structure of geographical metadata.

Figure 3.1: An example of a web interface of a geographical metadata service facility (URL 3)

3.2.2. Visualization techniques for geographical metadata

Several visualization techniques exist for exploring data. As Sachinopoulou (2001) and Keim, Panse, and Sips (2003) indicate, in the past few decades several visualization techniques were developed for information visualization, and lately for visual data mining as data collections increase in volume and become more multi-dimensional. Every technique was designed for a particular type or types of data. Geographical metadata is a special case of data with its own characteristics as discussed in section 3.1.3 above. As indicated at the beginning of section 3.2, metadata consists of many different data types, has a large number of attributes, and users require the exploration of ‘many’ of these attributes from a given metadata collection before deciding on a suitable set of data items. From the study of the geographical metadata standards, it is also evident that there are several nominal attributes such as file format or data format, areal coverage, spatial repository (location) of a data set, map projection used, and contact address of the owners of the data. In addition most of these elements contain other lower hierarchy level elements, thereby further increasing the number of nominal attributes. There are also complex data types, for example, addresses, which, together with nominal types, need some pre-processing before they can be visualized on most of the visualization tools. Also, almost any data collection comes with some irregularities. An example of these irregularities common in metadata is ‘missing records of information.’ That is, some metadata items do not contain data on certain attributes. All of these characteristics of geographical metadata make it difficult to visualize with the existing visualization techniques. However some researchers have tried to overcome some of these complications. Let us first take a general look at existing visualization techniques, and the visualization tools in which the techniques can be displayed, and then relate the findings to the specific case of geographical metadata visualization.

Overview of existing visualization techniques

Keim, Panse, and Sips (2003) have come up with a classification model (Figure 3.2), based on three criteria, for information visualization techniques. Sachinopoulou (2001), in her report on existing visualization techniques and tools, follows a similar model. She attributes the model to an earlier source, Keim’s presentation at KDD’97. Any visualization technique falls somewhere in the three-dimensional space formed by the model. According to the model, a visualization technique can be:

• A standard 2D or 3D display of data (for example; X-Y/X-Y-Z plots, bar charts/histograms, line graphs, maps).

• A geometrically transformed surface (for example; scatterplot matrices [figure 3.3, figure 6, and figure 8], prosection views, hyperslice, parallel coordinate plots [figure 6 and figure 8]). These techniques are meant for visualizing the transformations and projections of the data in Cartesian and non-Cartesian geometrical spaces (Sachinopoulou, 2001). The two-dimensional space is transformed into many dimensions whereby dimensions lie next to each other as axes, in the case of parallel coordinate plots, and the dimension values are plotted along the length or height of a dimension. In the case of scatterplot matrices and hyperslices a single 2D space is divided into several 2D coordinate systems, whereby all 2D relationships in the data can be visualized on one plot. The prosection views are a variation of hyperslices where the matrix is formed by 3D spaces instead of 2D. One of the dimensions in each of the 3D spaces can be selected to define a range of values, and all the data items that fall within it are projected (prosected, thus prosection views) onto the 2D plane of the other two dimensions. The prosection can be done on any of the three 2D planes. Each prosection is differentiated from another by colour codes.

• An icon (for example; Chernoff faces [Figure 3.4], needle icons, star glyphs [Figure 3.4], stick figure icons [Figure 3.4], colour icons [Figure 3.4], tilebars). Icon techniques display each data dimension on each feature (for example eyes, mouth, etc, in the case of Chernoff faces) of the icon, and a dimension value is indicated by a variation of the behaviour of a feature.

• A matrix of dense pixels (for example; the recursive pattern technique, the circle segments technique [Figure 3.7], mosaic plots [Figure 3.7]). This is one of the techniques that allow visualization of many data items. One data item is represented by adjacently lying pixels with each pixel representing one of the data dimensions. A certain colour is given to the dimension to indicate its membership value to a given range of query values.

Figure 3.2: Classification of information visualization techniques (Keim, Panse, and Sips, 2003)

• A stacked display (for example; dimensional stacking [Figure 3.6], World Within Worlds, Treemaps, Cone Trees) of either data or dimensions of the data (in the case of multidimensional data) or both. An example of a stacked display of dimensions is the dimensional stacking technique. Two dimensions in the data with the highest ‘speeds’ form a coordinate system, and, recursively, lower-speed dimensions are mapped into the same coordinate system, virtually splitting the base system into smaller rectangular spaces. A data item belongs to one of the rectangles. A stacked display of data can be shown with tree structures such as treemaps or cone trees (Spence, 2001). All the data items individually form nodes of a tree (or smaller rectangles, in the case of treemaps), and according to the dimension values of each data item, groups of data items are formed, and recursively groups of ‘data item groups’ are formed until the whole data collection is assembled into one ‘root’ node (or one large rectangle, in the case of treemaps). The whole process can be done in reverse until the data items are separated into individual items or ‘leaf’ nodes.
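The recursive embedding behind dimensional stacking can be sketched as follows. This is only a minimal illustration and not the XmdvTool implementation: the function name `stack_position`, the bin counts, and the convention that the first dimension listed for each axis forms the outermost (coarsest) grid are all assumptions made for the example.

```python
from typing import Sequence, Tuple


def stack_position(bins: Sequence[int], cards: Sequence[int]) -> Tuple[int, int]:
    """Map one data item to a (column, row) cell in a dimensional-stacking grid.

    bins[i]  -- discretised value (0-based bin index) of dimension i
    cards[i] -- number of bins of dimension i
    Dimensions at even positions are stacked along the horizontal axis and
    dimensions at odd positions along the vertical axis (assumed convention).
    """
    if len(bins) != len(cards):
        raise ValueError("bins and cards must have the same length")
    col = row = 0
    for i, (b, c) in enumerate(zip(bins, cards)):
        if not 0 <= b < c:
            raise ValueError(f"bin {b} out of range for dimension {i}")
        if i % 2 == 0:
            col = col * c + b   # each outer cell is subdivided into c columns
        else:
            row = row * c + b   # each outer cell is subdivided into c rows
    return col, row


# Four dimensions with 3, 2, 4 and 2 bins respectively (hypothetical data).
print(stack_position([2, 1, 3, 0], [3, 2, 4, 2]))
```

Each data item lands in exactly one cell, which corresponds to the statement above that a data item belongs to one of the rectangles.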

The second criterion concerns the type of data to be visualized. A technique supports a one-dimensional view (for example; Tukey box plot, histogram), a two-dimensional view (for example; a scatter plot, box plots, histograms, maps), a three-dimensional view (for example; a 2D presentation of a 3D plot, a scatter plot matrix), or a multi-dimensional view (for example; parallel coordinate plots, glyph techniques, some pixel-oriented techniques, Mosaic plots) of the data (or all views). Others are capable of visualizing text/web documents (for example; tilebars, Kohonen maps). Others are useful for showing hierarchies or order (for example; tree structures, dimensional ordering), or network relations (for example; the Netmap technique or skitter graph) in data. The last group deals with visualization of components and/or models during the design of software, or during the design of complex algorithms.

The third criterion looks at how much interactivity, during exploration of the data, can be achieved with a given technique. Since the visual exploration process typically follows the paradigm of overview, zoom and filter, and then details (Ahonen-Rainio and Kraak, 2003; Gobel and Jasnoch, 2001; Keim, Panse, and Sips, 2003), a technique has to provide interactive functionalities in order to move between these three steps. Examples of interactive techniques are Dynamic Projection, Interactive Filtering, Zooming, Distortion, Brushing and Linking (Keim, Panse, and Sips, 2003), and Dynamic Classification (Andrienko et al, 2002).

Figure 3.3: distortion technique on a scatter plot matrix display using a fish-eye lens (Keim, Panse, and Sips, 2003)

By zooming into a subset of visualized data, a user can view hidden details (Sachinopoulou, 2001). An example of a distortion technique is shown in Figure 3.3. A portion of the scatter plot matrix is distorted with a fish-eye lens technique to explore that portion of the matrix in more detail. When brushing and linking, the user makes a selection of certain data items in one display, and, automatically, the same selection is highlighted in the other displays. An example of this technique is shown in figure 8: all data items assigned to a particular cluster in one display (either the PCP or the scatter plot matrix) are also assigned the same cluster and the same colour in the other display. A dynamic classification technique is featured by systems such as CommonGIS (URL 4), where, in a classed choropleth map, the ordinal class boundaries can be altered according to the user’s preferences (Andrienko et al, 2002). As an example of a dynamic projection technique, Keim, Panse, and Sips (2003) talk about a Grand Tour System, which allows a serial, automatic display of interesting two-dimensional projections, using scatter plots, from a multidimensional data set. Finally, interactive filtering is a technique whereby a selection of data items is done by obscuring/filtering-out the rest of the items in order to focus on the selection result. Keim, Panse, and Sips (2003) provide an example of a dynamic interactive filtering mechanism provided by a Magic Lens. As the name implies, the user plays with a lens-like view of the data. Data items lying within the view of the lens are displayed differently from the other (filtered-out) items to allow for more interaction on them.

Main challenges

Just as visualization techniques were designed for particular types of data, so the particular data characteristics of geographical metadata require specific visualization techniques.
Large volumes of metadata items

For a start, the issue of ‘data explosion’ requires a visualization tool and visualization graphic technique that can handle many data items at the same time without making it difficult for the user to discern one item from another. The ADL, on its own, has metadata for over 15000 data holdings. For example, with visualization graphics such as star glyphs, once the number of glyphs increases on the screen, each glyph becomes so small that to view it, one has to zoom in, and in the process loses sight of the total pattern represented by all the glyphs. The same thing applies to all other glyph techniques including Chernoff faces. Parallel coordinate plots (PCP) (Figure 3.4) can also suffer from the same problem of clutter of the display space if there are too many items. To solve this problem, some researchers (Yang, Ward, and Rundensteiner, 2003) suggest a ‘hierarchical display’ of the graphics (Figure 3.5). They recommend this technique, which they have tested in the multivariate visualization tool XmdvTool (URL 5) on parallel coordinate plots, star glyphs, dimensional stacking displays, and scatter plot matrices. The technique allows the display of fewer items, each representing a class/cluster of data items. A clustering algorithm, forming a hierarchical cluster tree, is used to classify the items. A meanpoint-band method is adopted to display the clusters, whereby the mean values for each dimension in the cluster are displayed on behalf of other members of the class. A coloured transparent band that spans the maximum and minimum deviation ranges (from the mean) of the dimension represents the rest of the items in the class (as shown by Figure 3.5). Displays such as those in Figure 3.4 are referred to as ‘flat’ (‘F’) displays, for example, flat star glyphs or flat parallel coordinate plots, in contrast with, for example, ‘hierarchical’ parallel coordinate plots or hierarchical scatter plot matrices (Figure 3.7).

High-dimensionality

A second limitation of existing visualization techniques is that the usability of most of these techniques is limited to only a few data attributes. Spence (1998) points out that the difficulty of designing an interactive visualization is strongly influenced by the number of variables involved. For example, a single scatter plot can only handle two attributes or a maximum of three in a 3-D coordinate view. 
But with a 3-D view, the usability of the technique may be significantly reduced since users are more accustomed to two-dimensional view graphics. With glyph techniques such as star glyphs, the pattern created by a single item becomes difficult to differentiate from other items, and keeping track of the attributes can also be quite an irksome exercise. One of the few techniques that can easily handle a large number of attributes is the Parallel Coordinate Plot (PCP). “Parallel Coordinate Plots offer a tried and tested technique which can handle a large number of variables” (Spence, 1998, p. 46). A scatter plot ‘matrix’ has also been proven to be an effective technique (Yang, Ward, and Rundensteiner, 2003), though it does not support a simultaneous holistic view of the attributes. The user’s eye has to wander all over a huge matrix in order to ascertain a pattern formed by the attribute relationships. The pattern can be difficult to follow since the position in attribute space (coordinate) of a single data item can change dramatically from one plot to another, making it painstaking to locate the data item. Another suitable technique for this challenge is a pixel-oriented technique embodied in the VisDB (URL 6) visualization software tool (Spence, 2001). In Göbel and Jasnoch (2001) the technique is implemented in a prototype for a metadata information system for residual waste. The technique evolves from the field of visual data mining: the result set from search criteria of (geographical) metadata is visually presented on the screen in a spiral pattern of pixels (Figure 3.8), whereby one pixel corresponds to one attribute value of a data item. The search criteria can consist of particular pre-defined value ranges for each data attribute. The neighbouring pixels are arranged in rectangles, and a rectangle corresponds to one data item. 
Data items that satisfy most of the search criteria are displayed with most of their pixels assigned a specific colour (‘yellow’, in Figure 3.8).
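The query-dependent colouring of pixels can be illustrated with a small sketch. It is not the VisDB implementation: the function names, the colour thresholds, and the choice to normalise the distance from the query range by the full span of the attribute are assumptions made purely for illustration. Only ‘yellow’ for values that satisfy the criterion is taken from the text.

```python
from typing import List, Tuple


def relevance(value: float, lo: float, hi: float, span: float) -> float:
    """Return 0.0 when value lies in the query range [lo, hi]; otherwise the
    distance to the nearest range limit, normalised by span and capped at 1.0."""
    if lo <= value <= hi:
        return 0.0
    distance = lo - value if value < lo else value - hi
    return min(distance / span, 1.0)


def pixel_colour(rel: float) -> str:
    """Map a relevance distance to an illustrative colour name."""
    if rel == 0.0:
        return "yellow"          # value satisfies the criterion
    if rel < 0.25:
        return "green"           # assumed colour for near misses
    if rel < 0.5:
        return "blue"            # assumed colour for moderate misses
    return "black"               # far from satisfying the criterion


def item_pixels(item: List[float],
                query: List[Tuple[float, float, float]]) -> List[str]:
    """One colour per attribute value of a data item."""
    return [pixel_colour(relevance(v, lo, hi, span))
            for v, (lo, hi, span) in zip(item, query)]


# Hypothetical query: (lower limit, upper limit, full attribute span).
query = [(1990, 2000, 50), (0, 24000, 1000000)]
print(item_pixels([1995, 100000], query))
```

An item whose values all fall inside the query ranges would be drawn entirely in yellow, so the best matches stand out as solid yellow rectangles.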



Figure 3.4: Examples of multidimensional icon and geometric visualization techniques; (a) Stick figures, (b) Star glyphs, (c) Chernoff faces, and (d) Colour icons belong to the class Icon techniques and (e) and (f) belong to the class Geometric techniques.

Figure 3.5: A cluster shown by the meanpoint-band in a 2-D plot; (a) the meanpoint of the cluster, (b) the band of the cluster (Yang, Ward, and Rundensteiner, 2003)
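The meanpoint-band summary illustrated in Figure 3.5 can be sketched in a few lines. This is a simplified reading of the method rather than the XmdvTool code: the function name `meanpoint_band` and the sample cluster are hypothetical, and the band is taken simply as the minimum and maximum value per dimension.

```python
from typing import Dict, List, Sequence


def meanpoint_band(cluster: Sequence[Sequence[float]]) -> List[Dict[str, float]]:
    """Summarise a cluster per dimension by its mean and its min/max extent."""
    if not cluster:
        raise ValueError("cluster must contain at least one item")
    summary = []
    for values in zip(*cluster):          # iterate dimension by dimension
        summary.append({
            "mean": sum(values) / len(values),   # the meanpoint
            "low": min(values),                  # lower edge of the band
            "high": max(values),                 # upper edge of the band
        })
    return summary


# Three hypothetical data items with two dimensions each.
cluster = [[1.0, 10.0], [2.0, 14.0], [3.0, 12.0]]
for dim in meanpoint_band(cluster):
    print(dim)
```

Drawing one mean line plus one band per cluster, instead of one line per item, is what reduces the clutter in the hierarchical displays.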



Figure 3.6: Stacked displays; (a) a dimensional stacking technique from Xmdv Tool (Yang, Ward, and Rundensteiner, 2003) and (b) an applet example of a treemap from Treemap 4.0 software package, showing statistics of NBA basketball players (http://www.cs.umd.edu/hcil/treemap/applet/index.shtml)

Figure 3.7: Hierarchical displays of (a) parallel coordinate plots and (b) scatter plot matrices in Xmdv Tool (Yang, Ward, and Rundensteiner, 2003)

The colour is actually given to pixels representing attribute values falling within the limits of a specified value range (in a search query) for a particular attribute. Pixels that fall outside the range are assigned other colours. A particular colour is given to a pixel falling outside the range depending on how far the pixel’s attribute value is from satisfying the criterion range (its relevance).

Diverse data types - nominal data

A third limitation of most of the visualization tools is that they do not support the visualization of nominal/categorical data type attributes. This is a very big impediment for geographical metadata visualization since, typically, it comprises tens of these types of attributes. Patro, Ward, and Rundensteiner (2003) argue that the number of different data types that can be handled by a tool is one of the factors that can affect the usefulness of a visualization tool. Experience shows that most of the general-purpose visualization tools are not well equipped to display different data types, especially numerical and nominal, on the same graphic.



Tools such as Mosaic Displays and Fourfold Displays are specifically designed for nominal data (Rosario et al, 2003). But they are ‘strictly’ for nominal data. They are also not available in common visualization software packages. Patro, Ward, and Rundensteiner (2003) have also identified tools such as TreeVis, VisDB, DeVise, Polaris, and SGI’s MineSet as database-oriented systems that can handle nominal data. They argue that the disadvantage of these particular tools is that the nominal values are simply assigned some order on the ordinal scale and then treated as ordinal types. Converting the nominal values to either ordinal or numerical values is a common pre-processing step in order to visualize nominal data with general-purpose visualization tools, since such tools handle numerical and ordinal data very well. Therefore it is reasonable to convert other types of data into numerical or ordinal data as long as the semantics of the data are not lost. This means that both order and ‘spacing’ between the numeric values (converted from the nominal values) on the attribute scale should be considered. Ignoring either of them can result in a rather artificial pattern of the relationships between data items and, as such, introduce errors in the interpretation of the data (Rosario et al, 2003).

Both Rosario et al (2003) and Patro, Ward, and Rundensteiner (2003) recommend a pre-processing step of nominal data prior to visualization. They suggest that the data should be mapped to numerical values by applying a nominal-to-numeric algorithm without losing the inherent relationships in the data attributes. Both their approaches have been implemented in the XmdvTool system. The approach by Rosario et al (2003) is specific to nominal-to-numerical conversion. They use a so-called Distance-Quantification-Classing algorithm. The nominal-to-numeric mapping was user-tested with parallel coordinate plots and the conclusion was that the quality of the display is improved. For example, the crossing of the parallel coordinate plot signatures is minimized. The data is first assigned numeric values using a so-called ‘focused correspondence analysis’ (FCA) technique, which considers the semantic relationships of a nominal attribute with other attributes in the data. Secondly, the resulting numeric values are assigned order and semantically quantized on the attribute scale to determine the spacing between individual nominal values. The other approach, by Patro, Ward, and Rundensteiner (2003), works not only on nominal data but also on other complex data types such as addresses. The data mapping algorithms used, referred to as ‘data mapping helpers’, are the substitution technique, augmentation technique, and/or the compression technique (there is no explanation as to what factors are considered in the ‘data mapping’ process). In the referred source, ‘data mapping’ is described as a direct substitution of a wide range of data types to data types that the visualization system can process. However, all the mapping techniques are said to consider the semantic relationships in the data: semantics that can be obtained from metadata such as data format, measurement units, symbols or numbers used to indicate missing values, and resolutions or scales at which data was captured (Patro, Ward, and Rundensteiner, 2003).

It would probably be easier to represent the nominal scale by graphic variables such as colour or form to denote the nominality of each metadata item’s nominal attribute value, like a specific date or address. But a problem would arise when the nominal scale has high cardinality (Rosario et al, 2003). There is a limit to the number of colours that can be used to indicate readily perceivable nominal differences between items. Nor are there enough easily discernible graphical ‘forms’ to provide the same functionality.
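The requirement that a nominal-to-numeric mapping should derive both order and spacing from the data can be illustrated with a deliberately simple stand-in. The sketch below is not the Distance-Quantification-Classing algorithm and does not perform correspondence analysis; it merely places each category at the mean of one related numeric attribute and rescales the result to [0, 1]. The function name and the sample records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def nominal_to_numeric(pairs: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Place each nominal category at the mean of a related numeric attribute,
    rescaled to [0, 1], so that order and spacing reflect the data."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for category, value in pairs:
        sums[category] += value
        counts[category] += 1
    if not sums:
        return {}
    means = {c: sums[c] / counts[c] for c in sums}
    lo, hi = min(means.values()), max(means.values())
    if hi == lo:                      # all categories coincide
        return {c: 0.0 for c in means}
    return {c: (m - lo) / (hi - lo) for c, m in means.items()}


# Hypothetical records: (spatial representation type, scale denominator).
records = [("vector", 24000), ("vector", 24000),
           ("grid", 100000), ("textTable", 1000000)]
print(nominal_to_numeric(records))
```

In contrast, assigning 1, 2, 3 in alphabetical order would place ‘grid’ first and space the three categories evenly, which is the kind of artificial pattern the cited authors warn against.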



Figure 3.8: Pixel-oriented techniques; (a) + (c) Six-attribute data items in spiralling order from the centre of the display (Sachinopoulou, 2001). (b) + (d) 9-attribute data items, recursively changing in colour from the centre of the display (URL 6). (c) and (d) The yellow attribute values satisfy the query’s criterion for that particular attribute. The other colours represent attribute values with particular relevance values (weights) to the individual attribute’s search criterion. (e) A mosaic display of hair colour attributes against eye colour attributes for a group of individuals. Each block represents the sum of individuals with a certain eye colour and a certain hair colour (Sachinopoulou, 2001).

Missing values

Finally, there is the issue of missing/null/unknown values. This is not just a visualization problem. In relational databases, for example, it is considered to be a violation of the database integrity. Assigning wrong values to missing values can mislead interpretations of geographical metadata. Edsall and Roedler (2002) propose a method, using GeoVista Studio (URL 7), for dealing with null values using parallel coordinate plots: they suggest the assignment of a significantly outlying value to all unknown values at one end of an attribute axis. Swayne (1998), using scatterplots in the XGobi system (URL 8), suggests the introduction of an additional window (‘shadow’ window) next to the window of the plot of recorded data. In the recorded-data window, the missing values are assigned an arbitrary value. In the shadow window, four binary plots are made (Figure 3.9): a ‘00’ plot for data items with missing values in both attributes, ‘01’ and ‘10’ plots for data items with missing values in one of the attributes, and a ‘11’ plot for data items with available records in both attributes. In the main window, the missing value items can be highlighted by brushing the plots in the shadow window.
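Both ways of handling missing values can be sketched briefly. The code is an illustration only and is not taken from GeoVista Studio or XGobi: the function names, the `margin` parameter that controls how far the outlying value sits below the recorded minimum, and the sample items are all assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def outlying_sentinel(values: Sequence[Optional[float]],
                      margin: float = 0.5) -> List[float]:
    """Replace None with a value placed well below the recorded minimum.

    margin is a fraction of the recorded range (an assumed choice), so the
    unknown values gather at one end of the attribute axis.
    """
    recorded = [v for v in values if v is not None]
    if not recorded:
        raise ValueError("no recorded values")
    lo, hi = min(recorded), max(recorded)
    spread = (hi - lo) or 1.0          # avoid a zero offset for constant data
    sentinel = lo - margin * spread
    return [sentinel if v is None else v for v in values]


def shadow_code(x: Optional[float], y: Optional[float]) -> str:
    """Two-character pattern for a pair of attributes:
    '1' = value recorded, '0' = value missing."""
    return f"{int(x is not None)}{int(y is not None)}"


items: List[Tuple[Optional[float], Optional[float]]] = [
    (1.0, 4.0), (None, 6.0), (3.0, None), (None, None)]
print(outlying_sentinel([x for x, _ in items]))
print([shadow_code(x, y) for x, y in items])
```

The codes ‘11’, ‘01’, ‘10’, and ‘00’ correspond to the four cases of the shadow window described above.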



Figure 3.9: Main scatter plot and shadow scatter plot window in the XGobi visualization system, showing missing values in two attributes/dimensions of the data. The four plots indicate cases for ‘00’, ‘01’, ‘10’, and ‘11’. The purple points in the main scatter plot are those that have been highlighted in the shadow window, indicated by the same colour (URL 8).

Despite criticisms of Chernoff faces and star glyphs, some researchers find them very useful and usable for multivariate data visualization. Chernoff faces were designed after noticing that a wide range of facial expressions and appearances on people are defined by numerous possibilities of values encoded in facial features such as the size of the eyes, the height of the eyebrows above the eyes, and the size and/or shape of the mouth (Spence, 1998). Lee et al (2003) indicate that some people argue that the faces give people a chance to perceive data values in parallel. Ahonen-Rainio supports this view from her user analysis of Chernoff faces, which found that users tend to interpret individual faces quickly compared to techniques like parallel coordinate plots, star glyphs, or scatter plot matrices. However, in the same analysis, the faces were found to be a rather confusing technique in determining non-extreme values. Star glyphs, also, may not be that effective for visualizing hundreds of metadata items, though that statement can be argued due to the fact that they are actually a special form of parallel coordinate plots when displayed as parallel coordinates in polar form as in (Wilkinson, 1999). But, surely, they are relevant for multivariate data visualization. Spence (1998) refers to them as long-standing means of displaying multivariate data. He also stresses their similarity with the parallel coordinate plots.

To recap on the challenges for geographical metadata visualization, Table 3.1 shows an overview of the main challenges, possible solutions (visualization techniques), and the environments (software tools) in which the solutions can be implemented.

3.3. Selection of the appropriate tool and techniques

It should be noted that all the above-mentioned techniques are multi-dimensional in nature. Therefore any of them can be used to visualize geographic metadata. But, given the information provided by Table 3.1 (a), it turns out that some techniques are better qualifiers. Following the same argument, the tools can all be used for metadata visualization. But there are other criteria that cannot be overlooked if the prototype has to be successfully built in a limited period of the order of weeks.



3.3.1. Techniques

It should also be noted that the classification of the techniques into geometric techniques and the other four classes is based on “basic visualization principles that may be combined in order to implement a certain visualization system” (Keim, Panse, and Sips, 2003). Therefore, it would be wise to select a set of techniques that, in combination, satisfy the whole classification scheme. The PCPs seem to perform best in both the geometric class and the stacked displays class. In the stacked displays class, they are ‘H’/hierarchical PCPs. They have the highest points, satisfying most of the challenges in both cases. In the Icon techniques section, all the techniques score the same points, though not on similar challenges. Therefore any of the four techniques would be appropriate. The pixel oriented technique section was not divided into any examples since the information provided by Sachinopoulou (2001) indicates that the two examples, query independent techniques and query dependent techniques, score the same points on the same challenges. Therefore any of the pixel oriented techniques would also be appropriate. The stacked display section is divided into (a) the dimensional stacking technique, which displays the stacked dimensions of the data, and (b) the rest of the techniques, which display the stacked data items. In the (b) sub-section, as the table indicates, the ‘H’ PCP is the best option, followed by the ‘H’ scatterplot matrices, then the ‘H’ star glyphs, with ‘H’ dimensional stacking taking the fourth rank, and the Treemaps and the Conetrees, respectively, fifth and sixth in ranking.

3.3.2. Tools

Looking at the overall scores by the tools, XmdvTool 6.0a system has the highest points, utilizing most of the visualization techniques featured in Table 3.1. Nine of the techniques can be developed in Xmdv Tool 6.0a while four can be developed in SGI’s data explorer, GeoVista Studio, and Xgobi. But, when the rankings are evaluated from each of the five visualization classes, the scores follow a different trend. In the Geometric techniques class, Xgobi scores the highest points followed by Xmdv Tool, GeoVista Studio, and Mineset. In the Icons class, SGI’s data explorer scores highest, followed by XmdvTool 6.0a and VisDB. In the Pixel Oriented techniques class, VisDB scores the highest points, followed by GeoVista Studio. In the Standard 2D displays – the map, in this case, GeoVista Studio, XGobi, and Mineset all have the map feature. In the Stacked display class, XmdvTool 6.0a scores the highest points, followed by Treemap 4.0 and Mineset.



Table 3.1: Potential of visualization techniques for geographical metadata visualization in the software tools in which the techniques are embedded. The information, except on missing values, about Prosection views, Hyperslices, Colour icons, Treemaps, Conetrees, SGI Data Explorer's Glyph-Maker, and Mineset (40) was obtained from (Sachinopoulou, 2001). In both tables ‘F’ means ‘flat’ and ‘H’ means ‘hierarchical’. In Table 3.1(a), ‘+’ and ‘-’ denote the relevance of a technique in visualizing data given a challenge at hand. ‘+’ means ‘relevant’, ‘++’ means ‘more relevant’, and ‘+++’ means ‘even more relevant’. ‘-’ means ‘irrelevant’. ‘0’ refers to an uncertain case. In Table 3.1(b), ‘+’, ‘-’, and ‘0’ indicate the possibility, impossibility, or uncertain cases, respectively, for a visualization technique to be developed in a given visualization software tool.

Table 3.1(a)

Challenges for visualizing geographical metadata

Visualization techniques                    Number of    Number of    Nominal      Missing
                                            data items   dimensions   data types   values

Geometric techniques
  ‘F’ Scatterplot matrices                  ++           +            +            +
  Prosection views                          +            +            -            0
  Hyperslices                               +            +            -            0
  ‘F’ Parallel Coordinates                  ++           +++          +            +

Icon techniques
  Stick figures                             -            +            +            0
  Chernoff faces                            -            +            +            -
  Colour icon                               +            ++           -            0
  ‘F’ Star glyphs                           -            +            +            -

Pixel oriented techniques
  Pixel oriented techniques                 ++           ++           -            0

Stacked displays
  ‘F’ Dimensional stacking                  +            +            +            0
  Treemaps                                  0            +            +++          -
  Conetrees                                 +            -            0            -
  ‘H’ Star glyphs in Xmdv Tool              ++           +            0            +
  ‘H’ Scatter plot matrices in Xmdv Tool    +++          +            0            +
  ‘H’ PCPs in Xmdv Tool                     ++           +++          +            +
  ‘H’ Dimensional stacking in Xmdv Tool     +            +            +            0

In addition to the four criteria (number of data items, number of dimensions, nominal data types, and missing values) for selecting visualization techniques and a tool for geographical metadata, the following criteria shall be used for selecting an environment (tool) in which the prototype can be built:

• platform independence of the tool,

• amount of input effort necessary (such as programming and/or pre-processing of the data) versus amount of time available to learn how to use the tool, and

• the amount of interactivity (such as zooming, brushing, and focusing) available from both the tool and the technique ‘in the tool’.



Table 3.1(b)

Visualization software tools

Visualization techniques                    SGI Data     Xmdv    GeoVista   XGobi    VisDB   Treemap   Mineset
                                            Explorer's   Tool    Studio     and              4.0
                                            Glyph-Maker  6.0a               XGvis

Geometric techniques
  ‘F’ Scatterplot matrices                  -            +       +          +        -       -         +
  Prosection views or 3D scatter plots      0            -       0          +        -       -         +
  Hyperslices                               0            -       0          0        -       -         0
  ‘F’ Parallel Coordinates                  -            +       +          +        +       -         -

Icon techniques
  Stick figures                             +            -       -          -        +       -         -
  Chernoff faces                            +            -       -          -        -       -         -
  Colour icon                               0            -       -          -        -       -         -
  ‘F’ Star glyphs                           +            +       -          -        -       -         -

Pixel oriented techniques
  Pixel oriented techniques                 -            -       +          -        ++      -         -

Standard 2D techniques
  Map visualizer                            0            -       +          +        -       -         +

Stacked displays
  ‘F’ Dimensional Stacking                  -            +       -          -        -       -         -
  Treemaps                                  -            +       0          -        -       +         0
  Conetrees                                 -            -       0          -        -       0         +
  ‘H’ Star glyphs in Xmdv Tool              0            +       -          -        -       -         -
  ‘H’ Scatter plot matrices in Xmdv Tool    -            +       -          0        -       -         0
  ‘H’ PCPs in Xmdv Tool                     -            +       0          0        0       -         -
  ‘H’ Dimensional stacking in Xmdv Tool     -            +       0          0        0       -         -

According to the table rankings, of all the above tools, XmdvTool 6.0a seems to be the perfect option. The tool is also compatible with both the Windows and the UNIX operating systems, and it features interactive techniques such as interactive zooming, classification (through hierarchical displays), and brushing and linking.


Geographical metadata is becoming an important element in the use of geographical data. As a result, there are several metadata standard initiatives, which propose many ways to describe geographical data, for example, for fitness of use. But the important issue is geographical metadata use/exploration. It is currently a difficult task on existing metadata facilities, due to the amount of text, made up of an abundance of metadata elements, that the user has to read through. The task is made even more complicated by the fact that to select one metadata item for one’s use purposes, not one but many items have to be simultaneously explored. It is the view of this study that with the use of visualization techniques, this task can be simplified. Many visualization techniques can be identified for visualization of the character(s) of geographical metadata. But only a few visualization tools exist with such techniques built in one system or allowing the visualization of the data types that geographical metadata typically consists of.

Chapter 3 has looked at potential visualization techniques and tools, from literature review, for the visualization of geographical metadata. In addition to the user requirements and tasks listed and modelled in Chapter 2, these form the basis for the design of a geographical metadata visualization prototype. The XmdvTool 6.0a system will be used for the design of the prototype and its testing. Basically, the following techniques will be available from the system for testing:

• parallel coordinate plots (flat and hierarchical)

• star glyphs (flat and hierarchical)

• scatter plot matrices (flat and hierarchical), and

• dimensional stacking (flat and hierarchical)



4. Prototype development

In order to gather more data on user requirements and tasks, and to avoid making the subject of metadata visualization seem to be an abstract concept, a prototype was developed for this study. The prototype seeks to mimic a hypothetical case of geographical metadata visualization with visualization techniques other than the map browser. The map browser is normally used only for area-based search queries.

Having identified preliminary user tasks and requirements from literature (Chapter 2) and the visualization techniques and best choice of the visualization tool (Chapter 3) for the prototype, the subsequent logical step was to process a sample of geographic metadata to the format required by the tool and to the form and file structure that would allow easy data access and querying.

4.1. Data

A dataset of geographic metadata was gathered from the Internet website (URL 9) of the Bureau of Land Management, Oregon and Washington, in the United States. The bureau’s metadata covers data from 14 organizations, most of which are branches of the main body. Other data come from organizations such as USGS and the University of Montana. Just over 140 geographic metadata items were downloaded from the bureau’s data holdings. The holdings’ collection covers several applications. According to the ISO 19115 geographical metadata standard, the collection comprised, in part, the following geoscientific applications: Biota data (vegetation data), Forestry data, Inland waters data (streams, wetlands, and ground water), Environmental data, Geoscientific information data (includes geological data), Climatological/Meteorological data, Boundary data (includes administrative data), Elevation data, Planning Cadastres (includes data on all types of existing and planned cadastres), Transportation data, Utilities-and-Communication (utilities network and point data), Satellite-and-Aerial-Photography data, and. The holdings consist of statewide (union of Washington or Oregon) or regional (one Oregon region and individual counties) coverages captured at a scale of 1:1000000 (or smaller) and framework data captured at a scale of 1:24000 (or larger).

The bureau’s metadata is based on the FGDC’s Content Standard for Digital Geospatial Metadata (FGDC-STD-001, 1998). The standard is organized in a hierarchy of data elements and compound elements. Only data elements were extracted for the purposes of this study. The 7 main compound elements are listed in section 3.1.3 of Chapter 3. The standard consists mainly of nominal metadata elements. There were a few numeric elements such as latitudes and longitudes for the geographic bounding polygon (the areal extent covered by a data item) and data about temporal relevance. 
There were also a few elements with special data types such as calendar dates, time of day, network addresses (URLs), and file names. Of these special types, those that were used for the prototype were manipulated in order to suit the data format requirements of the visualization tool.

The metadata extracted consisted of the following 22 (twenty-two) ISO 19115 geographical metadata standard elements: fileIdentifier, hierarchyLevel (dataset, series, or collection), contact (responsible person for metadata information), dateStamp (of the metadata), title, topicCategory, responsibleParty (organization responsible for the metadata), Abstract, Purpose, maintenanceFrequency, Status (Progress), accessConstraints (access restrictions), useConstraints (use restrictions), ExtentAreaName, spatialRepresentationType (vector, grid, text table, etc.), spatialResolutionScale, referenceSystem, Projection, ExtentBoundingPolygon (Geographical extent), ExtentTemporal (Temporal extent), resourceFormat, spatialRepresentationVector, GeometricObjectType, and TopologyLevel (degree of complexity of the spatial objects) (ISO/TC 211 DIS 19115, 2001).

More elements were considered based on the user requirements in Chapter 2 and the profile listed at the end of sub-section 3.1.3 in Chapter 3, which were the primary criteria for making a selection of the elements to be visualized. But, unfortunately, those elements were left out due to missing data and the fact that the FGDC standard does not explicitly describe some of the elements. Such elements included pricing information, information about delivery time after ordering the data set(s), and information about overall quality, positional accuracy, and homogeneity in quality. For example, pricing, according to the FGDC standard, is applied only to distribution of the data, and there are different media (therefore different prices) on which the data can be distributed. In addition, the distribution fee information was the same for all the metadata items, which would mean identical values for all the metadata items. Quantitative information about data quality such as horizontal/vertical positional accuracy values and attribute accuracy values was missing. The only quality information available was documented as long textual ‘reports’ of attribute accuracy, logical consistency, completeness (which is partly represented by the “Status” element in the list above), and horizontal-and-vertical positional accuracy.

4.2. Data processing

4.2.1. Raw data

As mentioned above, the bureau’s metadata is documented according to the specifications of the FGDC’s Content Standard for Digital Geospatial Metadata. The metadata therefore had to be translated to the ISO 19115 standard, which contains a more extensive list of metadata elements and, being an international standard, combines ideas from several standards institutions (including FGDC). ISO 19115 was chosen over the FGDC standard because, from an international perspective, it addresses more geographical metadata user needs than the FGDC standard. The translation was achieved by comparing the metadata element descriptions of both standards as outlined in (FGDC-STD-001, 1998) and (ISO/TC 211 DIS 19115, 2001). It has to be clarified that not all the metadata elements had a one-to-one mapping between the two standards. For example, some elements had to be assigned to one nominal category in the ISO standard while they clearly belonged to different categories in the FGDC standard (the converse is also true). There were also many cases of semantic differences. For example, the spatial representation type “grid” in the ISO standard was thought, with a small amount of doubt, to refer to “raster” in the FGDC standard. It was also unclear whether the FGDC’s geometric object type “pixel” should be assigned to “composite” or “complex” in the ISO standard. Those were a few of the complications encountered during the dataset-by-dataset, standard-to-standard translation process.
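The value-level part of such a translation can be pictured as a lookup table that records, for each FGDC value, the candidate ISO values and flags the cases that need a manual decision. The following is only a minimal sketch: the table keys, the function name `translate`, and the use of ISO element names on both sides are assumptions made for illustration, and only the two value pairs discussed above are taken from the text.

```python
# Hypothetical crosswalk: (element, FGDC value) -> candidate ISO 19115 values.
# Only the two entries below come from the cases discussed in the text.
CROSSWALK = {
    ("spatialRepresentationType", "raster"): ["grid"],
    ("geometricObjectType", "pixel"): ["composite", "complex"],
}


def translate(element, fgdc_value):
    """Return (candidates, needs_review) for one FGDC value.

    needs_review is True when there is no single candidate, that is,
    when the mapping is missing or ambiguous and must be decided by hand.
    """
    candidates = CROSSWALK.get((element, fgdc_value), [])
    needs_review = len(candidates) != 1
    return candidates, needs_review


raster_result = translate("spatialRepresentationType", "raster")
pixel_result = translate("geometricObjectType", "pixel")
print(raster_result)
print(pixel_result)
```

Keeping the ambiguous cases in the table, rather than forcing a single answer, mirrors the fact that some decisions in the translation were made dataset by dataset.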

4.2.2. Organization of the data

Eighteen ISO 19115 metadata elements were organized into a Microsoft Excel spreadsheet. In addition, four spreadsheets (one of which was split into two, making five spreadsheets in total) were created to document those metadata elements that have a many-to-one relationship with the metadata items and so could not be included directly as attributes in the MD-metadata table, as that would have introduced redundancy in the ‘fileIdentifier’ records of the spreadsheet. Those were: the topic category; the spatial representation vector (split into geometry object type and topology level); the geographical area name/gazetteer name; and the resource format, which describes the data format(s) of each data item and the associated native dataset environment(s).

All these data could easily have been compiled in a database system such as Microsoft Access. But the visualization tool (XmdvTool 6.0a) used for the prototype does not support one-to-many (or vice-versa) or many-to-many queries based on two or more relations of the same data item. This lack of support is a disadvantage, since it means that many user search and exploration queries on the metadata cannot be carried out with the prototype. That is especially the case for the queries relevant in the transition from the ‘overview level’ (Gobel and Jasnoch, 2001) of the data items to the level where multiple data items are compared. Sometimes such queries involve more than one metadata element (for example, geographical area name and topic category). As a result, the data was organized into groups of different topic categories, different geometry object types, different topology levels, different administrative level gazetteer names, and different resource formats of the metadata items. The groups were meant only for manual selections of metadata items, based on those five elements (or some of them), that would be made prior to the user evaluations in order to satisfy the typical use scenarios of geographical metadata. For example, the scenarios normally involve more than one topic category.
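The kind of manual selection described above (several topic categories combined with one geographical area name) amounts to a union within each group table followed by an intersection across tables. The sketch below illustrates this with invented identifiers and values; the identifiers `md001` to `md003`, the category values, and the helper names are all hypothetical and are not taken from the actual metadata.

```python
from collections import defaultdict

# Hypothetical one-to-many group tables: (fileIdentifier, value) pairs.
topic_category = [
    ("md001", "biota"),
    ("md001", "geoscientificInformation"),
    ("md002", "imageryBaseMapsEarthCover"),
    ("md003", "biota"),
]
area_name = [
    ("md001", "WashingtonState"),
    ("md002", "WashingtonState"),
    ("md003", "Lakeview county"),
]


def index_by_value(pairs):
    """Map each element value to the set of fileIdentifiers that carry it."""
    index = defaultdict(set)
    for file_id, value in pairs:
        index[value].add(file_id)
    return index


def select(index, wanted_values):
    """Identifiers that have at least one of the wanted values."""
    result = set()
    for value in wanted_values:
        result |= index.get(value, set())
    return result


topics = index_by_value(topic_category)
areas = index_by_value(area_name)

# A scenario involving two topic categories within one area.
selection = (select(topics, ["biota", "geoscientificInformation"])
             & select(areas, ["WashingtonState"]))
print(sorted(selection))
```

The resulting identifiers would then be used to pick the rows of the main table that go into one set of visualization files.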

4.2.3. Data formats

To make use of all the visualization techniques in the XmdvTool 6.0a system, the data had to be formatted into an OKC file, a CF file, and a CG file. The OKC file format is structured as follows:

    Int_N_number_of_dimensions Int_M_number_of_datapoints
    String_fieldName_dimension1
    String_fieldName_dimension2
    ...
    String_fieldName_dimensionN
    Float_minimum_dimension1 Float_maximum_dimension1 Int_cardinality_dimension1_in_dimstack
    Float_minimum_dimension2 Float_maximum_dimension2 Int_cardinality_dimension2_in_dimstack
    ...
    Float_minimum_dimensionN Float_maximum_dimensionN Int_cardinality_dimensionN_in_dimstack
    Float_value_dimension1_datapoint1 Float_value_dimension2_datapoint1 ... Float_value_dimensionN_datapoint1
    Float_value_dimension1_datapoint2 Float_value_dimension2_datapoint2 ... Float_value_dimensionN_datapoint2
    ...
    Float_value_dimension1_datapointM Float_value_dimension2_datapointM ... Float_value_dimensionN_datapointM

(URL 10)



where Int = Integer, ‘dimensionN’ refers to the last dimension in the list (if the dimensions are labelled 1, 2, …, N for the data in question), and ‘datapointM’ refers to the last data record in the list (if the records are labelled 1, 2, …, M). For example, the first line of the listing can be read as:

• type an integer number for the total count of ‘dimensions’ (or ‘metadata elements’, for the case of metadata) in the dataset,

• skip a space, and

• type an integer number for the total count of data records (that is, metadata items, in this case).

The other lines can be read by following the same syntactical interpretation. In the lines that give the minimum and maximum of each dimension, ‘cardinality’ refers to the number of discrete data values that a dimension can be broken down into if it is to be discretized. It is easy to define cardinality for geographic metadata elements since they are mostly nominal or categorical. The cardinality of a nominal/categorical dimension is the total number of different categories in the dimension. For a relatively continuous dimension such as time, there is no natural cardinality because there are no natural breaks in the scale of the dimension; it is up to the data processor to decide where the scale is broken into discrete ranges. The cardinality is used for the dimension stacking display as explained in Chapter 3, sub-section 3.2.2, hence the word ‘dimstack’ in the OKC listing above.

Once the data is in the above format, it can be visualized using flat displays (Figure 12). The XmdvTool 6.0a development team has made tools for automatically converting the OKC format into the CF and CG formats using a hierarchical clustering algorithm based on measures of proximity between the data items (Yang et al, 2003) (refer to Chapter 3 for a more detailed description of the technique). Both the CF and CG files are therefore used for hierarchical visualization of the data.

As another consequence of the OKC-CF-and-CG requirement, the data types of two of the metadata elements, ‘extentBoundingPolygon’ and ‘ExtentTemporal’, had to be modified. Each of the two elements was split into smaller elements: the begin date (ExtentTemporalBegin) and the end date (ExtentTemporalEnd) for the ExtentTemporal element, and two latitude elements (NboundLat and SBoundLat) for the northern and southern boundaries and two longitude elements (EboundLong and WboundLong) for the eastern and western boundaries of the extentBoundingPolygon.
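The OKC layout and the cardinality rule described in this sub-section can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure actually used: the function name `to_okc` is hypothetical, single spaces are assumed as separators, the cap of 10 on cardinality follows the limit noted for dimension stacking, and the cardinality of every dimension is taken as its number of distinct values (which suits nominal dimensions only).

```python
def to_okc(names, rows, max_cardinality=10):
    """Build OKC text from dimension names and numeric data rows.

    names: list of N dimension (metadata element) names.
    rows:  list of M records, each a list of N numeric values.
    """
    rows = [[float(v) for v in row] for row in rows]
    n_dims, n_points = len(names), len(rows)

    # Line 1: number of dimensions, a space, number of data points.
    lines = [f"{n_dims} {n_points}"]
    # One line per dimension name.
    lines.extend(names)
    # One line per dimension: minimum, maximum, cardinality for dimstack.
    for d in range(n_dims):
        column = [row[d] for row in rows]
        # Assumption: cardinality = number of distinct values, capped.
        card = min(len(set(column)), max_cardinality)
        lines.append(f"{min(column)} {max(column)} {card}")
    # One line per data point with its value in every dimension.
    for row in rows:
        lines.append(" ".join(str(v) for v in row))
    return "\n".join(lines) + "\n"


# Illustrative numeric codes only, not real metadata values.
okc_text = to_okc(["status", "maintenanceFreq"],
                  [[1, 2], [3, 2], [1, 5]])
print(okc_text)
```

For a continuous dimension such as a date, the cardinality would instead be chosen by the data processor, as noted above.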
A second major requirement of the XmdvTool visualization system, as of many other visualization tools, is that the data has to be in numeric form. As a result, a large majority of the metadata elements from both the MD-metadata table and the group tables were converted to numbers. In order to keep the semantic relationships between the metadata dimensions, as discussed in Rosario et al (2003), the first part of a special nominal-to-numeric conversion algorithm, the Distance Quantification Classing (DQC) approach (shown in Figure 4.1, below), was applied to the metadata using the R ("GNU S") statistical software (URL 11):

Distance Step

1. The group spreadsheets were combined into one spreadsheet with the MD-metadata spreadsheet. The ‘Purpose’ and the ‘Abstract’ elements were left out because of their data types (‘memos’). The ‘fileIdentifier’ element was used as an identifier attribute to create a combination of 977 one-to-many relationships of the 141 fileIdentifier values with the five elements mentioned above.



2. A comma delimited (csv) file was created from the result.

3. Using R ("GNU S"), the csv file was accessed.

4. Pearson’s product-moment correlation was applied to the data to create pairwise associations (Figure 4.1(a)) (Rosario et al, 2003) between the attributes/dimensions in order to determine which attributes (analysis dimensions), both nominal and otherwise, are most highly associated with a particular nominal dimension (target dimension) in the metadata.

5. Based on a preset number of dimensions, which has to be greater than 3 (4 was used in this case), the analysis dimensions most highly associated with the target dimension (that is, those with the highest correlation coefficients; the grey records in (a)) were carried forward to the next step. The rest were disregarded.

6. Then each of the four analysis dimensions was subjected to simple correspondence analysis ((a) to (b)) with the target dimension in order to assign numeric scores (independent dimensions) to nominal values of the target dimension ((c)). A by-product of the correspondence analysis was a set of four eigenvalues ((b)) for each of the analysis dimensions to indicate the measure of correspondence with the target dimension.

7. A screeplot of the eigenvalues ((c)) was made in order to reduce the number of independent dimensions to only those whose eigenvalues are plotted at or above the ‘elbow’.

Quantification Step

In this step, the remaining independent dimensions were supposed to be put under a statistical Optimal Scaling test to determine which of them is the best numeric scale, that is, the best ‘quantification’ of the target dimension on a numeric scale. This approach was not followed, however, because most of the independent dimensions ((c)) had the same numeric value assigned to different nominal values, as can be seen in the two columns at the right-hand side in (d). This was considered to be a weakness for the purposes of this thesis, since the user might be deceived while visualizing the metadata and get the impression that, for example, two metadata items have the same nominal value in one of the metadata elements while in fact the values are different. Instead, all independent dimensions were scanned for any identical numeric values. The one without such duplicates was selected as the optimal numeric scale. In cases where all the scales contained duplicates, the other scales, which fell below the screeplot elbow in sub-step 7 of the Distance step, were considered for quantification. Otherwise sub-steps 5 to 6 of the Distance step were repeated, skipping sub-step 7 and going straight to the quantification check, for the pairwise association of sub-step 4 that was fifth on the list of the most highly associated attributes. If the resulting scale satisfied the requirements of the quantification step, it was taken to be the optimal numeric scale for the target dimension.

As mentioned in Chapter 3, geographical metadata is prone to missing data (NULL values). NULL values were included, like any other nominal value, in the nominal-to-numeric mapping process. But for the purpose of a more intuitive visualization, following the technique proposed by Edsall and Roedler (2002), the numeric value of NULL was manually modified in each dimension to be the lowest value. The usability of this approach to NULL value visualization shall also be tested.
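The duplicate-value check that replaced Optimal Scaling can be sketched as follows, using the numeric mappings for the target dimension maintenanceFreq reported in Figure 4.1(d). The function names are hypothetical, and the NULL adjustment in particular is only an assumption about how the manual change could be automated: the `margin` parameter is invented for illustration.

```python
NOMINAL = ["annually", "asNeeded", "continual", "irregular",
           "notPlanned", "NULL", "unknown"]

# Candidate numeric scales from Figure 4.1(d), in the order of NOMINAL.
SCALES = {
    "spatialResolutionScale": [1.95134077, 1.17301293, 2.27566392,
                               -0.08036869, 0.07114531, 0.72195337,
                               -0.76384926],
    "useConstraints": [-0.6199654, 1.4479101, -0.6199654, -0.6199654,
                       -0.4831131, 2.8425829, -0.5655443],
    "responsibleParty": [0.87564288, -1.48415134, 0.87564288, 0.4189586,
                         0.06831717, -2.61060196, 0.59974051],
}


def has_duplicates(scores):
    """True if two different nominal values share one numeric score."""
    return len(set(scores)) < len(scores)


def pick_scale(scales):
    """Name of the first scale that keeps all nominal values distinct."""
    for name, scores in scales.items():
        if not has_duplicates(scores):
            return name
    return None  # fall back to the dimensions below the elbow


def null_to_lowest(nominal, scores, margin=1.0):
    """Move the NULL score below every other score (margin is assumed)."""
    others = [s for n, s in zip(nominal, scores) if n != "NULL"]
    lowest = min(others) - margin
    return [lowest if n == "NULL" else s for n, s in zip(nominal, scores)]


chosen = pick_scale(SCALES)
adjusted = null_to_lowest(NOMINAL, SCALES[chosen])
print(chosen)
print(dict(zip(NOMINAL, adjusted)))
```

With these values the check selects the scale derived from spatialResolutionScale, because the other two scales assign one score to more than one nominal value.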



Analysis dimension        Correlation coefficient
topicCategory             -0.2731822
AreaExtentName             0.0322815
resourceFormatName        -0.06142354
datasetEnvironment         0.2421431
topologyLevel             -0.2009904
geometryObjType           -0.2575377
hierarchyLevel             0.290806
contact                    0.03049597
title                     -0.06404668
responsibleParty           0.3579655
status                     0.1020872
accessConstraints         -0.03856884
useConstraints            -0.357318
spatialRepresentType       0.1304085
spatialResolutionScale    -0.535685
Projection                -0.38326

(a) – Correlation results

Analysis dimension        Eigen value
spatialResolutionScale    0.7616932
useConstraints            0.701888
responsibleParty          0.6672107
hierarchyLevel            0.4714529

(b) – Highly correlated dimensions

[Screeplot ‘Scree MF’: eigenvalues (axis scaled from 0 to 0.8) of spatialResolutionScale, useConstraints, responsibleParty, and hierarchyLevel]

(c) – Screeplot

                  Correspondence analysis numeric mappings
Nominal values    spatialResolutionScale    useConstraints    responsibleParty
annually           1.95134077               -0.6199654         0.87564288
asNeeded           1.17301293                1.4479101        -1.48415134
continual          2.27566392               -0.6199654         0.87564288
irregular         -0.08036869               -0.6199654         0.4189586
notPlanned         0.07114531               -0.4831131         0.06831717
NULL               0.72195337                2.8425829        -2.61060196
unknown           -0.76384926               -0.5655443         0.59974051

(d) – Independent dimensions (numeric scales)

Figure 4.1: A model of the DQC approach showing the ‘distance’ step and the ‘quantification’ step as applied in this research: (a) shows the associations of 16 of the metadata dimensions with the target dimension maintenanceFreq; (b) shows the eigenvalues of the 4 dimensions that were most highly associated with the target dimension; (c) is the screeplot of the eigenvalues; (d) shows the results of the correspondence analysis (obtained together with the eigenvalues). After looking at the three scales in (d), the scale by spatialResolutionScale was found to be optimal and was therefore used for plotting maintenanceFreq on the XmdvTool displays.

[Figure 4.1 annotations: ‘Elbow’; ‘If there is no optimal scale, check with the dimension below the elbow’; ‘If the dimension below the elbow has an optimal scale’; ‘If the dimension below the elbow does not have an optimal scale, check with the 5th highly correlated dimension’]



4.3. Files and Folders

Finally, the data was organized into four folders corresponding to the elements topic category, spatial representation vector, geographical area name, and resource format. In each folder, groups of metadata items were formed as sets of OKC, CF, and CG files corresponding to each of the values of the metadata element (for example, WashingtonState metadata items or Lakeview county metadata items for the geographical area name element). Each of these sets of files contained all the metadata elements listed above in section 4.1 (except the above four elements). Each of the folders had the main set, which contained all the metadata items (141 of them). The topicCategory folder consisted of 15 sets for all the different topics into which the metadata is divided. The geographical area name folder consisted of 14 sets: one for the combined area of the two states, one for each of the two states separately, one for a region within the states, and ten for the counties within the states (one set per county). The resourceFormat folder consisted of eight sets representing all the dataformat-and-nativeDatasetEnvironment combinations in the metadata. The spatialRepresentationVector folder consisted of two sub-folders: one for geometricObjectType, in which the metadata was divided into five sets by type of geometric object, and one for topologyLevel, which consisted of seven sets for all the topology levels identified in the metadata.

4.4. Visualization tool interface

4.4.1. Working of the tool

The main XmdvTool 6.0a user interface (Figure 4.2) consists of a colourful display of data items in the four main supported visualization display techniques: parallel coordinate plots (PCP), scatter plot matrices, star glyphs, and dimension stacking (see also Figure 4.3). The user can switch between the display techniques easily using the display mode buttons to the right of the display screen. There are eight such buttons: four for the flat display techniques and four for the associated hierarchical display techniques. However, only two techniques can be displayed simultaneously, using the main display and the auxiliary display. The displays are all dynamically linked, making it possible to brush in one display and get the same brush in all the other flat displays. As a filtering mechanism, it is also possible to save the brushed metadata items for further analysis. The geographical metadata user (as shown in the HTA model in Chapter 2) would normally have an overview of the whole metadata collection and make a first selection based, for example, on geographical areal coverage. He would then like to select a particular subset of the result based on additional criteria that consider the values of more metadata elements. The brush and the save-brush features therefore assist him in filtering the first selection and, subsequently, querying it with respect to more elements.

In addition to the ‘dynamically-linked-brush’ interactive technique, it is possible to zoom in/out, pan, and/or distort the display for a detailed view of a particular set of metadata items. The screen display can become cluttered as a result of a simultaneous display of too many metadata elements (dimensions) or too many metadata items. The zooming and distortion techniques allow a non-cluttered view of the items, and the panning technique allows quicker navigation of a zoomed display.
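Conceptually, the brush-then-save workflow described above is a range test on every brushed dimension, applied first to the whole collection and then to the saved result. The sketch below illustrates that idea only; it is not XmdvTool code, and the item identifiers, numeric codes, and function names are invented for the example.

```python
def in_brush(item, brush):
    """True if the item lies inside the brush range on every brushed dimension."""
    return all(lo <= item[dim] <= hi for dim, (lo, hi) in brush.items())


def apply_brush(items, brush):
    """Split items into those inside the brush and those outside it."""
    inside = [it for it in items if in_brush(it, brush)]
    outside = [it for it in items if not in_brush(it, brush)]
    return inside, outside


# Hypothetical metadata items with numeric codes on three dimensions.
items = [
    {"id": "md001", "NboundLat": 49.0, "status": 3, "maintenanceFreq": 10},
    {"id": "md002", "NboundLat": 46.5, "status": 1, "maintenanceFreq": 3},
    {"id": "md003", "NboundLat": 42.0, "status": 3, "maintenanceFreq": 10},
]

# First selection on geographic coverage, saved for further querying.
saved, _ = apply_brush(items, {"NboundLat": (45.0, 50.0)})

# Second brush on the saved subset, now considering more elements.
final, _ = apply_brush(saved, {"status": (3, 3), "maintenanceFreq": (9, 10)})

saved_ids = [it["id"] for it in saved]
final_ids = [it["id"] for it in final]
print(saved_ids, final_ids)
```

Saving the first result corresponds to hard filtering, whereas keeping both lists from `apply_brush` corresponds to highlighting one group and masking the other.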
A set of 4 brushing tools is available, where each brush tool (or a combination) can be assigned either the ‘highlighting’ (see the red items in Figure 4.2) or the ‘masking’ operation on the brushed metadata items. An item representing the average values on all the displayed metadata elements of the ‘highlighted’ or masked items can also be defined and displayed on the brush area. Each brush is, by default, assigned a unique colour. For example, in a situation where the user does not necessarily want to do a hard filtering (by saving the brushed data) of some of the data items, he can use two brushes: the first brush for highlighting the metadata items of interest, and the second brush for masking the rest of the items from the display.

Figure 4.2: The XmdvTool 6.0a interface showing the metadata with at least seven dimensions plotted using the Parallel Coordinate Plot (PCP) technique on the main display area. On the main display area the vertical lines represent the axes of the metadata dimensions/elements. The names of the elements are indicated at the top of each axis. The other lines, which cross the axes at various points, represent each of the 141 metadata items. Along each axis a line crosses at a numeric code (shown on the message bar), which has been assigned for its actual nominal or numeric value on each of the metadata items. For example, a value of ‘3’ along the ‘status’ axis could represent a status value of ‘complete’ for the data items. Along the maintenanceFreq axis a value of ‘10’ could represent ‘annually’ while, say, a value of ‘3’ represents ‘continual’.

Dimension Stacking

In Figure 4.3, the set braces drawn around the Dimension stacking display indicate the extent of the axis of each of the five metadata elements displayed. The dimension ‘status’ varies along the horizontal, discretized into three ranges (bounded by the vertical gray lines) from the minimum value to the maximum value. In each of the ranges falls one of the ‘status’ values: for example, ‘complete’ in the middle range and ‘onGoing’ in the range to the left. Along the vertical, another dimension is plotted against ‘status’: ‘maintenanceFreq’, which is discretized into five ranges bounded by the horizontal gray lines. Again, in each of the ranges there is a value representing one of the maintenanceFreq values: ‘annually’, ‘continual’, ‘asNeeded’, etc. Within each of the gray cells, formed by the ‘status’ and ‘maintenanceFreq’ discrete values (ranges), ‘accessConstraints’



(discretized into two values along the horizontal) and ‘spatialResolutionScale’ (discretized into four values along the vertical) are plotted, and their discrete values are bounded by the green lines. Within the green cells, the ‘Projection’ dimension varies along the horizontal, forming vertical bars, where each bar represents a 5-D coordinate for a metadata item. Since there is no sixth dimension in the display, ‘Projection’ does not have a dimension to be plotted against in order to create another mesh of smaller cells within the green ones. Therefore in this case the bounding lines are not shown. The more dimensions are displayed, the further the display space is broken down into smaller cells. Since in the OKC file (section 4.2.3) the cardinality values for all the dimensions have to be reduced to values below 10, some metadata items can be over-plotted on the same cell when there is a large number of metadata items and the dimensions naturally have a high cardinality. The disadvantage of displaying more dimensions with Dimension stacking is that the system’s computing power becomes overloaded; the system then either responds extremely slowly or crashes.

Star Glyphs

The Star Glyph technique in XmdvTool, as shown in Figure 4.3 (c) and (e), supports the display of individual metadata items and the brush toolbox. On the display of the brush toolbox, the brush area represents the items highlighted. The representation of a metadata item on one of the glyphs on the main display is the line that bounds the rays emanating from the center of the glyph. The center represents the minimum value on all dimensions in the metadata. The rays represent the displayed dimensions of the metadata. Depending on the values of each of the metadata items on each of the displayed dimensions, each ray is cut at the value of that particular item in that dimension.
Then a line that connects the points of cut around the glyph is drawn, forming a unique shape for each of the items. If individual glyphs were shown, they would appear as a mass of lines cutting across the rays of the plot at different points around the center. The glyph brush toolbox represents a case where all the items are plotted on the same multi-dimensional space, from the same center along the same axes. In this case it forms a special case of the PCPs (Spence, 2001).

Parallel Coordinate Plots and Scatterplot Matrices

On the PCP each metadata item is represented by a polyline traversing the dimension axes at points representing its metadata element values. On the scatterplot matrix each data item is represented by a point in each of the 2-D scatters, the coordinate of which is determined by its value in two metadata elements (for example spatialResolutionScale and Projection). A highlight of a data item on one 2-D scatter is dynamically linked to the other scatters in the matrix.
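The nested layout of the dimension stacking display described earlier in this sub-section can be illustrated with a small Python sketch. It assumes equal-width ranges for discretization (the actual break points are a choice of the data processor) and assumes that dimensions alternate between horizontal and vertical from the outermost pair inwards, as in the five-element example; the function names and the cardinality of 6 used for ‘Projection’ are hypothetical.

```python
def bin_index(value, lo, hi, cardinality):
    """Discretize a value into one of `cardinality` equal-width ranges."""
    if hi == lo:
        return 0
    fraction = (value - lo) / (hi - lo)
    # The maximum value belongs to the last range, not to a range beyond it.
    return min(int(fraction * cardinality), cardinality - 1)


def stack_cell(bins, cardinalities):
    """Column and row of the innermost cell for one data item.

    bins and cardinalities are ordered from the outermost dimension
    inwards; dimensions at even positions vary along the horizontal,
    dimensions at odd positions along the vertical.
    """
    x = y = 0
    for position, (b, card) in enumerate(zip(bins, cardinalities)):
        if position % 2 == 0:
            x = x * card + b
        else:
            y = y * card + b
    return x, y


# status (3), maintenanceFreq (5), accessConstraints (2),
# spatialResolutionScale (4), Projection (6, assumed for illustration).
cards = [3, 5, 2, 4, 6]
cell = stack_cell([1, 4, 0, 3, 2], cards)
print(cell)

# Two different values that fall in the same range are over-plotted.
same_range = bin_index(5.0, 0.0, 10.0, 5) == bin_index(5.9, 0.0, 10.0, 5)
print(same_range)
```

The last two lines show why a low cardinality limit leads to over-plotting: distinct values of a high-cardinality or continuous dimension end up in the same cell.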



Figure 4.3: ‘Flat’ display techniques ((a), (b), (c), (d)) on the main display area of XmdvTool 6.0a showing five elements (status, maintenanceFreq, accessConstraints, spatialResolutionScale, and Projection) of the metadata: (a) PCPs, (b) Star glyphs, (c) Scatter plot matrix, and (d) Dimension stacking. The techniques are all dynamically linked through the interaction techniques. The red data items are the brush result. They fall completely within the grey brush area (i.e., their values on each of the metadata elements fall within the ranges defined by the brush). The same brush is shown on (e) the ‘Glyph Brush Toolbox’ display, which is used for brushing the star glyph main display, and on (f) the ‘Dimension Stacking Brush Toolbox’, which is used for brushing the Dimension stacking main display.



Figure 4.4: Structure-Based Brush toolbox for hierarchical display techniques in XmdvTool 6.0a.

The hierarchical displays are also brushed, but with the ‘Structure-Based Brush’ (Figure 4.4). On the ‘Structure-Based Brush’, an outline of the hierarchical cluster tree for the metadata is shown with the tree depth value (indicating the number of hierarchy levels formed by the clustering algorithm) in the display window. The level of detail (cluster radius) from the nodes to the root along the height of the tree, the brushed metadata items, or the non-brushed metadata items can be adjusted using the sliders below the display or with the mouse on the display itself. Levels of detail in the hierarchical clustering of the geographical metadata might readily identify the degree of reliability in the similarity/proximity of the metadata items in one cluster: metadata items in a cluster closer to the node outline are more similar than those in clusters at higher levels of detail.

The extent scaling technique allows for display of the cluster bands (see also Figure 3.5, Chapter 3) around the mean of each cluster (see Figure 4.5). The user can enable extent scaling in order to see the boundary (min. and max.) values for each of the clusters on each of the displayed metadata elements as he searches for a cluster that satisfies his search criteria. By moving the sliders, the displayed visual bandwidth of the cluster can be reduced from its actual width in order to minimize visual overlap of the clusters on the main display area. Extent scaling can be enabled or disabled.

As with the four hierarchical display techniques, the hierarchical cluster tree can also be displayed on a Treemap (Figure 4.6), but without showing the bands around the cluster mean values. The advantage of the Treemap over the other four hierarchical display techniques is that at each level of detail of the tree structure each cluster is displayed as a rectangle whose size is directly proportional to the number of data items in the cluster. Larger metadata clusters at each level can therefore be easily differentiated from smaller clusters. Moreover, there is a Tukey Box Plot feature to show the data statistics of the brushed data items and the extent of the brush along each dimension.




Figure 4.5: Hierarchical display of the same metadata displayed on the ‘flat’ displays in Figures 4.2 and 4.3. Four clusters of the 141 metadata items can be seen clearly on the ‘Star Glyph’ display (c). The two clusters with the mean values shown in yellow on all the displays fall within the brush area defined on the Structure-Based Brush (e). The other two are outside the brush, but all four are on the same hierarchy level. The mean bands can be seen for one of the brushed clusters and one of the non-brushed clusters. The other two do not show any bands, meaning that either they contain metadata clusters from the next lower level that have similar values on all metadata dimensions, or each of them is actually a single metadata item.

Furthermore, the tool offers features for changing colours of the display background, the data labels and the dimension axes (for the PCP and star glyphs) and the plot and stack grids (for the scatter plot matrix and the dimension stacking).



Figure 4.6: A Treemap display of the four clusters shown in Figure 4.5. The two clusters shown in colour represent the two clusters in Figure 4.5, which fall outside the brush boundary. The treemap does not display clusters for the brushed items: They are shown as individual metadata items in grey scale.

4.4.2. Limitations of the tool

Despite all the above-mentioned features and others, as far as the purpose of the prototype is concerned, XmdvTool 6.0a has some limitations:

• The data file format allows only a maximum cardinality of ‘10’ for dimension stacking. The consequence is that some data items may, unnecessarily, be over-plotted by others in the dimension stack as a result of having been assigned the same discrete value in a high-cardinality or continuous-scale dimension.

• As mentioned already, the tool, in its current status, does not support full access to a database management system. The file formats do not allow inter-dimension querying mechanisms of the data. That makes it lack GIS functionality too: queries based on spatial location cannot be carried out.

• The nominal labels for the value of the nominal dimensions cannot be shown directly from the OKC, CF, and/or CG files. Another tool has to be used to manipulate the XmdvTool data files such that the resulting display shows the labels.

• It is not a web-based tool. Many (if not all) metadata services are web-based in order to reach a larger group of users at the same time over world scale distances. With the current status of technology, a desktop tool would respond faster to user input than a web-based tool for a given task. This drawback, if not carefully taken into account, can make the usability data about the ‘efficiency’ of the prototype unreliable.



4.5. Summary and Conclusion

As the ‘objectives’ section of Chapter 1 points out, the pre-requisite for achieving the main objective of this study is to develop a prototype for user evaluation. A prototype was developed using the XmdvTool visualization tool. Using Microsoft Excel spreadsheets and the R ("GNU S") statistical software, 141 items of geographical metadata, originally documented in the FGDC Content Standard for Digital Geospatial Metadata, were pre-processed (nominal-to-numeric conversion) so they could be formatted according to the requirements of the visualization tool. The conversion was done using a slightly modified version of the so-called DQC approach (Rosario et al, 2003), which considers the semantic relationships between the dimensions of the data. Prior to the conversion, 22 ISO 19115 metadata elements were selected, to which the associated FGDC metadata elements were translated. The metadata represented data from 14 organizations in two states in the United States. Most of the organizations are smaller branches of the central body from which the metadata was gathered. It covers 14 application areas (topic categories) including vegetation data (biota), geology data, topographic data, cadastral data, and satellite imagery and aerial photography data. A table consisting of 18 (out of the 22) metadata elements as attributes was populated with the metadata, while the other four elements were used to organize the data into four groups based on the topic category, the spatial representation vector type, the geographical area name, and the resource format of the data. Finally, for visualization, XmdvTool OKC, CF, and CG files were compiled.

XmdvTool supports mainly 4 visualization display techniques, namely parallel coordinate plots, scatterplot matrices, star glyphs, and dimension stacking. The tool also supports variations of these four ‘flat’ display techniques as ‘hierarchical’ display techniques, along with a ‘Treemap’ display of the hierarchies. In addition, there is an interactive display of the model of the hierarchical tree itself, which is used to navigate the tree. The tool also supports interactive zooming, panning, distortion, and dynamically linked views. Although the tool supports a range of visualization display and interaction techniques, some drawbacks in the case of geographical metadata use were noted. They include the lack of database management system support and the fact that the tool is not web-based (whereas most geographical metadata services are web-based).



5. Usability evaluation of prototype

5.1. Introduction

The primary objective of this work was to evaluate the usability of visualization techniques when using geographical metadata. The evaluation was done in two phases as part of a usability engineering process. The first evaluation was in the form of a focus group session where four ITC experts from the fields of geovisualization [2], geoscience [1], and remote sensing and GIS information technology (IT) support [1] sat together with a chairman of the session in order to provide usability data about the prototype of the research. The goal of the focus group evaluation was to evaluate the attitude of the target users of the study towards the usability of the visualization techniques featured by XmdvTool. The primary aim was to identify the most usable and least usable techniques at each level of the three-fold ISO 9241-11 usability definition: effectiveness, efficiency, and the degree of satisfaction with which users achieve their tasks with the test object in question. The more detailed objectives were:

• to identify the points that need improvement in the prototype before the usability testing and the strong points in order to emphasize them in the usability testing session.

• to identify more user requirements in order to supplement the literature-reviewed user requirements before the usability testing.

In the usability testing, five experts, again from ITC, from the fields of geovisualization [2], geoinformation IT support [2], and GDI infrastructures [1], each participated individually in a think-aloud usability evaluation that was videotaped in a usability laboratory. The purposes of the usability testing were:

• to determine, qualitatively, using the same usability definition as in the focus group evaluation, the extent to which the hypothesis of the study was or was not true. The hypothesis was that “the utilization of visualization techniques is a usable approach in the search for data with the use of geographical metadata from a distributed environment of geographical data repository”.

• to provide recommendations regarding the concept of geographic metadata visualization.

5.2. Focus group evaluation

Focus groups are a seemingly informal technique that can be used to evaluate user requirements and feelings about a prototype (Nielsen, 1993). They are a quick way of evaluating a prototype (Fuhrmann, 2002) and belong to the branch of qualitative research methods (Morgan, 1998). Therefore data elicited from the focus group session can be used to qualitatively discuss the strong points and weak points of a prototype, and the users’ attitude towards the interface of a prototype or towards the test object in the prototype: in this case visualization techniques for geographic metadata. Focus groups work optimally when the group members have an interest in the subject of the research (Morgan, 1998). Such members are usually a group of six to nine (Nielsen, 1993) who can openly discuss the issues put forward by the moderator about the prototype. One criterion for predicting an environment that permits full participation by all members is equality of status (Monmonier and Gluck, 1994), especially of background regarding the topic of discussion. In this research, the members were selected from expert users of geographical metadata, metadata services and/or visualizations. The most important part of a focus group is the discussion of the issues pertaining to the usability of the object under evaluation. According to Nielsen (1993) one needs a moderator to prepare a list of usability issues to be discussed after the evaluation object, which could be a prototype, has been introduced or demonstrated and, possibly, tried out by the members. During the focus group session the moderator is responsible for focusing the discussion only on the issues that he poses to the floor without distorting the free brainstorming of ideas, comments, and suggestions.

5.2.1. Focus group plan and overview

The goal of the focus group evaluation was to evaluate the attitude of the target users of the study towards the usability of the visualization techniques featured by XmdvTool. The primary aim was to identify the most usable and least usable techniques at each level of the three-fold ISO 9241-11 usability definition – effectiveness, efficiency, and the degree of satisfaction with which users achieve their tasks with the test object in question. The more detailed objectives were:

• to identify the points that need improvement in the prototype before the usability testing and the strong points in order to emphasize them in the usability testing session.

• to identify more user requirements in order to supplement the literature-reviewed user requirements before the usability testing.

The protocol plan was divided into essentially five parts.

a. An opening speech for less than 5 minutes to welcome the focus group members, read them their rights of participation and the confidentiality of their expected inputs, inform them of the plan for the session, and put the idea of geographical metadata visualization into context for the planned discussion.

b. An uninterrupted Power-Point presentation for 25 minutes.

c. A 20-minute exercise in order to get the members familiar with the prototype and its application on a geographical metadata set.

d. A task session, which was to be completed in 25 minutes. In the task session, the skills acquired from the exercise were to be put into practice to complete a typical scenario of a geographical data search with the use of geographical metadata.

e. Filling-out of a small questionnaire for less than 3 minutes by the members about their professional backgrounds.

5.2.2. Focus group members

The focus group consisted of one ITC staff member, one ITC support staff member and two ITC PhD students. The staff member came from the department of Water Resources with a background in hydrology and works as an instructor. The support staff member came from ITC’s department of Information Technology for Remote Sensing (RS) and GIS (RSG laboratory) with a background in RS and GIS IT support. The two PhD students were both based in the Geo-information Processing department; one of them specialized in the field of Internet GIS and Cartography in his PhD work, while the other specialized in Human Computer Interaction and Visualizations with a background in Computer Science. Three members had a minimum of five years’ experience in their respective fields of expertise working with geoscientific/geographical data. One member had between two and five years of experience. Their experience in working with computers for geoscientific/geographical applications varied: 3-5 years (one), 5-10 years (two) and more than 10 years (one). Three had used an on-line metadata service before, for selecting remote sensing (Aster) data and for checking the quality of the data. The support staff member from the RSG lab works with metadata in order to create and maintain a digital library, the Geodata Warehouse metadata service (URL 12), for maps, satellite imagery, and aerial photographs for ITC. The staff member from Water Resources and Hydrology is partly responsible for obtaining data for study purposes in his department, and therefore possesses experience with metadata elements for describing the datasets needed by the department. The two PhD students are both, in part or completely, working on geographic data visualizations for their projects, and they both have had some experience with on-line metadata services.

5.2.3. Focus group session

The session was held on a Friday afternoon (1340 hrs to 1540 hrs) in a closed student computer cluster. The cluster consisted of about 12 computers for the exercise and the task scenario. A trial version of XmdvTool, with the possibility of displaying nominal labels of data (see Figure 5.1), was used. Since the label display feature is still new and under test, some bugs were encountered. The task scenario metadata consisted of an extract from the collection of the 141 metadata items (Chapter 4): Lakeview county (Oregon) metadata belonging to the themes elevation, geoscientificInformation, and transportation. This set was further filtered to 20 items whose data could be obtained in ArcInfo format for AIX/UNIX platforms. The above protocol was not followed exactly as planned, for the following reasons:

1. Due to the lack of clarity in the contextualization of the presented visualizations to geographic metadata use, as some of the focus group members indicated, the members started interrupting the presentation for such clarification. As such the presentation took longer than planned.

2. The exercise involved much more assistance for the members than was anticipated. By the time the task session was supposed to commence, only about 30 minutes of the total planned time for the focus group remained.

3. By this time, one of the members, who could not stay for the whole session, had to leave.

4. Due to the then-foreseeable lack of time for the discussion, towards the last minutes of the task scenario, the group members were allowed to ask for assistance, and in the process voice their difficulties in achieving the task. The task session, therefore, proceeded with users already making comments and discussing some of the usability issues of the prototype.

The session started 10 to 15 minutes late, due to a wait for more members. The presentation lasted for approximately 30 minutes, with interruptions for clarification of some of the presented issues. It was followed by a supervised exercise, which went on for approximately 35 minutes. Then the last part of the session combined both the task scenario and the discussion, though the discussion was not exactly the one planned or, at least, did not follow protocol.

Presentation

The Power-Point presentation highlighted the following issues:

1. An overview of purposes of use of geographical metadata services from (MADAME, 2000) to indicate the importance of geographical metadata,

2. The functions of metadata and the ‘visualization paradigm’ (Keim, Panse, and Sips, 2003) in order to relate the principles of visualization to the case of geographic metadata,

3. The main interface of the XmdvTool 6.0a visualization system (Figure 4.2, Chapter 4) and the principles and the working of the visualization techniques and other tools offered by the system, and



4. The applicability of the system’s visualization techniques to geographical metadata visualization.

One of the key features presented about the displays was the identification of individual dimension axes (metadata elements), especially in the case of the star glyphs and dimension stacking (Figure 4.2, Chapter 4), which do not show the dimension names on the main display. The next important feature was the meaning of the values along the dimension scales (for example, the hierarchyLevel value can be either ‘dataset’, ‘series’, or ‘collectionSession’). Thirdly, the metadata items had to be identified on each display and mapped to their corresponding multi-dimensional values. With the issues of identification of dimension axes (metadata elements), meaning of dimension values (metadata element values), and identification of the individual metadata items brought into perspective, the brushing interaction technique was introduced in order to show the selection process for the most suitable metadata items under given search criteria for geographical data. The brush toolbox and the associated brush operation definition dialog were also explained in terms of the types of operations that can be defined on the four brushes that can be shown simultaneously on the display. In addition, the technique-specific brush toolboxes were introduced together with the auxiliary displays, since the toolboxes are also a form of auxiliary display of their corresponding main displays. The auxiliary display can be any of the four flat display techniques, the hierarchical display techniques, the Treemap, or the structure-based brush for the hierarchical displays. The hierarchical displays were introduced in the same order as their flat counterparts, where the most important features to explain were the circumstances under which hierarchical clustering of the metadata items may be necessary: for example, cases of cluttered display due to many items. Finally some of the additional features of the system were briefly touched upon, including:

• A ‘Tukey box plot’ of the metadata statistics, and the summaries of the metadata in terms of number of dimensions and number of metadata items,

• Interactive ‘Zooming’, ‘Panning’, and ‘Distortion’,

• ‘Ordered’ and ‘data-driven’ ‘Glyph placement techniques’,

• The ‘Colour Requester’ dialog for changing the displayed background colour, the dimension axis colour, the dimension names and nominal value label colours, the colours of the grid lines of the different axes in the dimension stacking display, and the background colours of the four brushes, and

• The ‘Dimension/On/Off/Reorder’ tool.

User feedback

Even though the focus group session did not entirely proceed as planned, some user feedback was elicited which addressed important issues for a subsequent user evaluation and for future research. The first feedback was gathered from the very early stages of the session, during the presentation, though the questions and suggestions put forward at this stage did not address the usability of the prototype but the weakness of the presentation of the prototype. Towards the middle of the presentation, one member indicated that, honestly, he did not have an idea what I was talking about as far as geographical metadata is concerned, and that, maybe, I should use a geographical metadata set as an example instead of an iris (flower) dataset. The comment seemed to receive consensus from the rest of the members. As a result, much more time was spent on trying to explain, in terms of geographical metadata, the meaning of the displays of the iris dataset. During the preparation of the presentation, this issue had been overlooked on the assumption that the most essential thing during the presentation would be to describe just the principles of each of the display techniques and the associated interactions offered by the visualization system. The issues pertaining to the applicability of the system and the techniques to the case of metadata were supposed to be clarified by the exercise.

User feedback – Exercise

During the exercise, some differences between members’ experiences with the individual techniques were noted. But in general, all the members had an idea of what metadata elements to look for in the metadata. The members were expected to experiment with the different visualization display techniques and interaction techniques. Most of them seemed to quickly grasp the information depicted by the main display of the parallel coordinate plots (PCP) and the brushing mechanism on them. Possible reasons are that:

• When data is loaded into the visualization system the default display technique is the PCP and as such the curious user is immediately confronted with information exploration questions about the PCP.

• Secondly, only the PCP allowed visualization of nominal values on the plot axes, and to most members, as became evident also in the task scenario, the labels on the display were an intuitive way of looking at metadata values, which are mainly nominal in nature. Some members, at first, could not differentiate between the individual data items in the PCP. But this was quickly learned after an explanation of less than a minute.

In the case of the star glyphs, the members seemed to be moderately interested. At an overview level of use of metadata, they were discouraged by the lack of labelling of and on the axes. The system offered a Glyph Key tool that could be opened as a small window next to the main display. In addition, the Glyph Brush Toolbox shows a larger view of a brushed star glyph with labels for all dimensions. Unfortunately it took some time for the majority of the users to get used to these additional tools, and users ended up playing around with the PCP since it proved to be relatively the most informative at first glance. Indeed one of the aims of the study is to examine and try to bridge the gap between the novice users and expert users of visualization techniques by incorporating into the prototype easy-to-learn techniques while at the same time offering more features and techniques for the experts and exploratory users. Therefore an important lesson in this case is that more time should be spent on explaining the principles, the working, and the applicability of the other techniques.

The users could relate to the information displayed by the scatterplot matrices, possibly due to the existence of labels on the main display. But one of the users found it hard to make sense of the meaning of the linked brushing between the individual bi-scatterplots.

The least informative display was the dimension stacking. In fact much more time during the presentation was spent on the dimension stacking technique than on the rest because of its novelty. Members seemed to find it very uninformative as far as metadata use tasks were concerned.

User feedback – Task

The last part of the session featured a geographical data search scenario using 20 metadata items. At this stage one of the PhD students had left. The major part of the first search criterion for the data, as outlined in Example scenario, part 1, could not be visualized since the dimensions involved were not part of the 18 visualized dimensions due to their many-to-one relationships with the ‘fileIdentifier’ dimension (refer to Chapter 4, sub-section 4.2.2). Therefore that task was done for the users by manually selecting the appropriate 20 metadata sets from the metadata collection of 141 items, from certain topic category groups, a certain gazetteer name group, and one resource format group. The task of the focus group members was only to select the appropriate geographic bounding box latitude and longitude ranges (SBoundLat, NBoundLat, EboundLong, and WboundLong) and the UTM1927Zone10 ‘Projection’.

The subsequent task, which the members did not complete, was to consider the temporal extent (ExtentTemporalBegin and ExtentTemporalEnd), the spatialResolutionScale, the maintenance frequency (maintenanceFreq) of the data, and the currency (dateStamp) of the metadata.

During the task it was evident that all three users were comfortable with the use of the PCP, the reason being that the ‘Projection’, the ‘spatialResolutionScale’, and the ‘maintenanceFreq’ labels could be read directly from the display. Practically no technique other than the PCP was used at this stage.

As the members struggled with interacting with the techniques to make their selections, some comments were put forward regarding their expectations and observations of the usability of the prototype:

• One of the members insisted that the default brush area shown on the PCP display when data is loaded into the visualization system would be more informative if it covered the full range between the minimum and maximum values of the dimensions, so that all the metadata items are brushed. When the data is loaded, the default brush area on the PCP display covers about 50% of the dimension axes – 25% in each direction from the middle. Depending on the dataset loaded, some items may already fall within the brush and therefore be highlighted, while in other cases no items are highlighted at all, which was the case for both the exercise and the task scenario metadata. It is true that the brush can be interactively adjusted to the required minimum and maximum values on the dimension axes of all the displays. But the issue stressed here was that if all the items were selected by default when the data is loaded, then in the process of filtering out the items that do not fit the selection criteria the user would simply have to ‘reduce’ the brush width on each dimension to the required range. The observation was that with the default brush area many suitable metadata items might be missed by mistake, since a data/metadata item on the PCP is only highlighted if it falls ‘completely’ within the brush area on all the dimension axes (see Figures 4.2, 4.3, and 5.2).

• Another issue was the ‘label placement’ along the dimension axes of the PCP: most users seemed to have no problems understanding the meaning of a dimension axis, for example ‘spatialRepresentationType’, from looking at the nominal value labels at different points along the axis. The users considered the labelling a strong point of the prototype, since the nominal nature of the metadata is easily encoded in text. However, some of the labels were thought to be too long.

• A member suggested that ‘spatialResolutionScale’ would be more easily understood as an ordinal dimension. He stated that since scale is naturally a representative fraction, a scale value of, for example, 250000 should be plotted at a lower position than a scale value of, for example, 10000.



Figure 5.1: The display of labels for nominal values on the PCP main display of the prototype.

• Another member emphasized the necessity for an alphanumeric interface, especially for numeric dimensions. The numeric dimensions have very high cardinality and as such it is sometimes difficult to define with the brush the exact minimum and maximum values for a suitable range of metadata element values. The other members agreed with the suggestion.

• Finally, a comment was raised about the pre-processing of the metadata, since the members all felt that the usability issues of the metadata pre-processing steps were as important as the use issues. They felt that the whole process of converting the data from its textual metadata format into a form displayable with the prototype should be simplified to, for example, the simple process of filling in a form. This point is important to note for further research. However, this thesis strictly focuses on the use of the metadata visualizer as a data ‘search’ tool using metadata and not as a metadata ‘entry’ tool.
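The first comment in the list above, about the default brush, can be illustrated with a small sketch in Python. It follows the behaviour described in the text: the default brush spans 25% of an axis on either side of the midpoint, and an item is highlighted only if it falls inside the brush on every dimension. It does not reproduce XmdvTool code; the function names, axis ranges, and item values are hypothetical.

```python
def default_brush(lo, hi):
    """Brush covering the middle 50% of an axis: 25% either side of the midpoint."""
    mid = (lo + hi) / 2.0
    quarter = (hi - lo) / 4.0
    return (mid - quarter, mid + quarter)


def brushed(items, brush):
    """Return the items that lie inside the brush range on every brushed dimension."""
    return [item for item in items
            if all(brush[d][0] <= item[d] <= brush[d][1] for d in brush)]


# Hypothetical axis ranges (min, max) and metadata items, for illustration only.
axes = {"SBoundLat": (40.0, 44.0), "NBoundLat": (42.0, 46.0)}
items = [
    {"id": "a", "SBoundLat": 40.5, "NBoundLat": 42.5},
    {"id": "b", "SBoundLat": 42.0, "NBoundLat": 44.0},
    {"id": "c", "SBoundLat": 43.8, "NBoundLat": 45.9},
]

default = {d: default_brush(lo, hi) for d, (lo, hi) in axes.items()}
full = dict(axes)  # the suggested alternative: brush spans each whole axis

# Starting from the full brush, the user only narrows the dimensions of interest.
narrowed = dict(full)
narrowed["SBoundLat"] = (40.0, 42.5)

ids_default = [item["id"] for item in brushed(items, default)]
ids_full = [item["id"] for item in brushed(items, full)]
ids_narrowed = [item["id"] for item in brushed(items, narrowed)]
```

With these sample values the default brush highlights only one of the three items, whereas the full-range brush starts with all of them highlighted and loses items only when the user deliberately narrows a dimension, which is the behaviour the focus group member asked for.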

In addition to the comments on usability of the prototype, some errors in the data were noted and some undocumented bugs in the system were identified. The notes shall be taken into account in the preparation of and during further usability evaluation:

System bugs:

• The labels (for nominal values) can only be properly displayed on the correct positions along the dimension axes of the PCP plot if the user does not turn off or reorder any of the dimensions.

• When the nominal values are displayed, the coordinate space of the PCP loses stability if the mouse pointer is moving over the display space.



• The message bar is not consistent in showing the correct nominal value at the position of the mouse pointer on display.

• Along some dimensions, the message bar does not indicate the values.

• The response of the system becomes considerably slower once nominal-label data is loaded.

Data errors:

• Some data items were assigned wrong nominal values.

• Some nominal labels were assigned wrong positions along their dimension axes.

• Some nominal values were wrongly included in the metadata for the task scenario.

5.3. Usability testing

A project that abides by usability engineering principles is not complete without ‘usability testing’ of the product. Here, the target users have a chance to try out the working of an iteratively designed prototype of the product and, in the process, their performance with the prototype is measured against pre-defined quantitative and/or qualitative usability specifications in order to decide whether the product is a success or not. Faulkner (2000) says that the usability information gathered from this final stage of a usability evaluation process is important for future projects. However, she also identifies the following disadvantage with usability testing:

• Tests only last for limited time periods, and the method emphasizes first-time usage. It is not easy to relate first-time use, during a limited time of trying out the prototype on a few tasks, to how the usage would be after a week, a month, or even a year of regular use on multiple tasks. That is why most developers of, for example, distributed service products have adopted a ‘logging’ method whereby they can monitor how the users interact with the service over long periods of time after the service has been in public use.

The number of test users can be as few as three for a complete usability testing study (Shneiderman, 1998), though with such a small sample it becomes statistically unreliable to quantify the results of the test. In this research only five test users were involved, and as a result only a qualitative discussion of the results was done. Several data elicitation methods exist for usability testing. In this case two methods were adopted: the ‘think-aloud’ method followed by a ‘questionnaire’. The think-aloud method is supported by several researchers including Nielsen (1993), Shneiderman (1998), van Elzakker (1999), and Faulkner (2000). As the name implies, it is a method whereby the user, working with the technical object under evaluation, is allowed to verbalize the thoughts that come into his/her mind without rationalizing or interpreting them first. By so doing, the users help the evaluator understand how they perceive the usability of the object. Also, the evaluator can easily identify the user’s misconceptions about the object (Nielsen, 1993). Nielsen (1993) further stresses that the method excels in avoiding later rationalizations of the data gathered from the test by showing explicitly what the users are doing and why they are doing it. Van Elzakker (1999) adds that, unlike with other methods such as interviews and questionnaires, the data gathered consists purely of the user’s original actions and comments – comments which come non-rationalized, unexplained and as original (without interpretation) spoken thoughts.



5.3.1. Test methods

Table 5.1: Comparison of the think-aloud method with other usability testing methods (van Elzakker, 1999)

Method: Think aloud

Advantages:

• There is hardly any disturbance of the cognitive process because talking is something that comes almost naturally during problem solving.

• No memory errors occur in documentation of gathered data since thoughts are spoken out immediately.

• Subjects do not need to give an interpretation, rationalization or explanation of their thoughts.

• Subjects are not bounded by structured interviews.

Disadvantages:

• Very time consuming.

• Subjects sometimes have difficulty translating their thoughts into words, thus resulting in incomplete gathered data.

Method: Others

Advantages:

• Not as time consuming as the think-aloud since all the questions being raised are already predefined by the researcher.

Disadvantages:

• May lead to invalid or incomplete data due to a disturbance of the cognitive process.

• Memory errors in documentation of gathered data.

• Subjects feel inclined to rationalize their problem solving strategies.

• Subjects may be steered too much into the direction predicted or favoured by the researcher in an interview.

This means the data analyzer is bound to interpret the true meaning of the comments (not just take the comments literally) by merging them with the user’s physical actions and, maybe, facial expressions as well, or by using similar data gathered with another method. In addition, Faulkner (2000) sees the method as a basis for other evaluation methods, like cooperative evaluation, in which problems with a prototype, for example, are identified for rectification. Just like all the other methods, think-aloud has its own disadvantages. Van Elzakker (1999) lists several advantages and disadvantages (Table 5.1, from Redido-Cusi, 2002) of the method against other usability testing methods such as heuristic evaluation, cooperative evaluation, observations, questionnaires and interviews.

Another disadvantage of the think-aloud method is that it is not very relevant for gathering most types of usability performance measurements (Nielsen, 1993). As a consequence, in this work, after the method was applied it was supplemented by a questionnaire in which questions about ‘subjective’ issues pertaining to the usability of the visualization techniques in the prototype were put to the user. The issues addressed in the questionnaire (Appendix C) were the usability merits recommended by ISO 9241-11 (1998) as outlined in chapter 2: effectiveness, efficiency, and the degree of satisfaction with which the test subjects achieved their test tasks. Questionnaires are old and established methods, which do not directly study the object of evaluation (here the prototype), but rather the user’s opinions about the object. The evaluator should not run the risk of interpreting the opinions literally (Nielsen, 1993), as the opinions are expressed without first being rationalized. Despite the method’s lack of directness in studying the prototype, the method is direct in studying, as mentioned above, those issues that are too subjective to be measured objectively with the think-aloud method.

5.3.2. Test plan and procedure

The test was divided into four parts, three of which were covered in the usability laboratory. The first part was a Power-Point presentation of the objective of the research and the working of the prototype. This was followed by a supervised exercise (Appendix A) for test users to familiarize themselves with the features of the visualization tool, XmdvTool 6.0a (without the possibility of displaying nominal labels). This was one of the major changes after the focus group session. The reasons were:

• During the focus group session many bugs were encountered with the trial version. Unfortunately, due to time constraints, such bugs could not be fixed before the usability testing session.

• It was also noticed that the trial version reacts annoyingly slowly to commands and that the display becomes too unstable for proper brushing once nominal text labels are displayed.

• During the focus group exercise and task executions none of the main visualization display techniques was given attention except the Parallel Coordinate Plots since, so far, the system only visualizes nominal text labels on these plots, and for metadata, which is highly nominal, the other techniques were found useless without the display of nominal labels. This was to be avoided in the subsequent evaluation so that the other visualization techniques would get a chance to be tried and tested. Therefore the nominal label display feature had to be sacrificed.

Later, XmdvTool 6.0a was also replaced by XmdvTool 5.0 after encountering technical problems with XmdvTool 6.0a. It was later discovered, by chance, that for the purposes of the evaluation, XmdvTool 5.0 had a useful extra feature over XmdvTool 6.0a: with version 5.0 the message bar, below the main display of the Star Glyphs, shows the values of all the displayed dimensions of the data item under the mouse pointer. With version 6.0a, that is only possible when the main display is the Dimension Stacking technique. The exercise ended with a small task with a data search criterion based on only one element of the metadata. The third stage consisted of execution of five tasks (Appendix B) by users, with minimal assistance from the evaluator. The first two of the tasks were based on a typical scenario for geographical data search using geographical metadata. The other three were tasks specific to testing particular visualization techniques. In the fourth stage the users were required to answer a questionnaire. The questionnaire could be answered directly in the laboratory, while XmdvTool 5.0 was still running, or it could be taken away with illustrations to guide the user in answering the questions alone, away from the laboratory. Nobody attempted the questionnaire in the laboratory.

5.3.3. Test users

Selection criteria similar to those of the focus group session were adopted. None of the members from the focus group participated in the usability testing. Two test users were geovisualization PhD students working on Remote Sensing and Spatial Data Quality, both from the GeoInformation Processing Department. Two other test users were information technology (IT) support members: one from the Remote Sensing and GIS IT support department (with responsibilities for ITC’s ‘Geodata Warehouse’ on-line metadata service) and one from the development section of the Integrated Land and Water Information System (ILWIS). One had a background in Remote Sensing satellite developments and atmospheric corrections while the other had a background in geography, cartography and digital image processing. The fifth test user was a scientific staff member with a background in geoinformatics, distributed services and simulations. Four of the test users had more than five years of experience working with geoscientific and/or geoinformation data while one had 3 to 5 years of experience. Two had over ten years of experience working with computers for geoscientific and/or geoinformation applications. The rest had between 3 to 5 years and just over five years of such experience. All but one had used an on-line metadata facility more than once before: for finding available ‘geographic data’ (mainly imagery) for scientific staff and students of ITC [1] and for oneself [1], and for checking data quality [1]. The other member’s purpose for using the services in the past was slightly different from that of the other three; with an interest in distributed services, he said that his purpose was to do functional tests of the services.

The users were invited by the evaluator on request, through the ITC intranet e-mail facility, 3-4 days before the tests. In total eight users were invited, of whom six replied positively and only five finally made it to the test. Included in the invitations were three of the members of the earlier focus group evaluation, to act as expert users relative to the others. Unfortunately two declined, and one did not respond.

In the invitation e-mails, the purpose of the test was briefly explained. The plan of the test, together with the schedule indicating the planned duration of each of the parts, was included. Also attached was a document highlighting the concepts of the research, the purposes of the prototype and the expected outcomes from its use. Finally the participants were briefed about their right to terminate participation in the test whenever they might feel like doing so.

5.3.4. Test laboratory set-up

The ITC cartography usability-testing laboratory (see Figure 5.2) is a single room on the first floor of the ITC building. It is a fairly quiet place, away from the relatively crowded areas of student computer clusters and lecture rooms, which makes it suitable for undisturbed test sessions. In a think-aloud situation like this one, the comments made by the user as he executes the tasks are recorded in a temporally synchronized mode together with his actions on the computer screen and his front view (especially the face), which is captured by an analogue camera. The camera image and the user’s actions on the computer screen are each projected on separate quadrangles of the 4-split TV monitor screen. From the VCR recording, the think-aloud session can be reviewed for detailed observation and for notes of the user’s comments. Only one of the computers was used for each of the five sessions, with the evaluator taking some notes to supplement the video recordings. Only one evaluator was involved.

5.3.5. Test Scripts

The initial plan was to record only the last part of the test session, where the user carries out the five tasks. But on the professional advice of one of the scientific staff members, the plan was changed so that both the test exercise and the tasks could be recorded. The task script (Appendix B) was prepared to cover as many prototype features as possible, namely those that were expected beforehand to be useful and usable for metadata visualization: all the visualization techniques were given a chance to be tested, and some tasks even aimed at comparing the performances of two of the visualization techniques. The exercise script (Appendix A) was a modification of the focus group exercise script. One of the most important modifications was the deletion of the exercise section on hierarchical techniques.



Figure 5.2: Floor plan view and 3D view illustrations of the components of the ITC cartography usability testing laboratory. The computer sends digital signals of the test user’s actions on the computer screen, which are then converted into analogue signals in the digital quad unit (DQU) and projected onto one of the quadrangles on a 4-split TV monitor screen. The splitting of the monitor screen is made possible by the DQU. The camera captures analogue signals of the test user’s front view, which is projected on another quadrangle of the TV monitor by the DQU. Both the signals from the test user’s computer interface and the camera are temporally synchronized and recorded by the VCR. The diagram is not to scale.

After the usability feedback from the focus group evaluation, it was decided that the hierarchical display techniques offered by XmdvTool were not relevant for the purposes of geographical metadata use, at least with the limited amount of metadata that was available for the test. Furthermore, as a result of the focus group evaluation, some sections of the exercise were dedicated specifically to practising the principles and working of the Star Glyphs and Dimension Stacking visualization techniques, which had seemed to receive little attention (or interest) in the focus group exercise and tasks. By dedicating those sections to these two techniques, the understanding of the principles and working of each of the four techniques was brought to an almost equal level before the tasks, so that no technique was given priority over the others only because the user did not understand how another technique works.

Task 1
After the scenario had been described, this task was to use only ‘numeric’ metadata elements: the east- and west-bounding longitude values of the geographic bounding box (displayed as EboundLong and WboundLong), the south- and north-bounding latitude values (displayed as SboundLat and NBoundLat) of the test area, the scale (displayed as spatialResolutionScale), and the publication date of the metadata (displayed as ExtentTemporalBegin). Any (one or more) of the visualization main display and interaction (toolbox) techniques could be used (examples in Figure 4.3, Chapter 4). The purpose of using numeric elements was to get the subjects used to seeing the actual numeric metadata element values displayed on the message bar (Figure 5.3) as they brush or move the mouse pointer over the display. In the case of nominal elements, the values displayed on the message bar are also ‘numbers’, given as codes for the nominal values. A text file with nominal-to-numeric conversion tables was provided later for tasks that required the use of nominal elements, so that the user could check the numeric value displayed on the message bar against the corresponding nominal value in the metadata (Appendix D). The second purpose of Task 1 was to evaluate the ‘effectiveness’ and ‘efficiency’ of the visualization techniques in finding the ‘correct’ data based on multidimensional search criteria.

Task 2
Here, the scenario for Task 1 was extended to include the use of the element maintenance frequency (displayed as maintenanceFreq) and other nominal elements of the metadata. One purpose of Task 2 was to test the ease with which nominal values (from ‘discrete’ nominal scales) can be brushed as compared to numeric values (from ‘continuous’ scales). Another purpose was to compare the efficiency with which several textual metadata files (example in Appendix F) can be read with the efficiency of using visualization techniques (Figure 4.3, Chapter 4) to find the same information. The third purpose was to evaluate the need for, and the ‘possible’ usability of, nominal label displays instead of numeric codes of the nominal values. The usability is only ‘possible’ because the actual labels were not displayed: that could only have been done with the nominal visualization trial version of XmdvTool 6.0 alpha (Figure 5.1). Only picture illustrations of this version were shown to the test users.

Figure 5.3: Results of Task 1 and Task 2; (a) The first result of Task 1 on the main display of the PCP, showing the west and east longitude values –121.25 and –119.20 for the geographical bounding box. (b) The display of the nominal element values, showing the highlighted maintenanceFreq value ‘asNeeded’, which has a numeric value of ‘1.17’ in the metadata.
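The multidimensional search of Task 1 and the numeric coding of nominal values in Task 2 can be illustrated with a minimal Python sketch. It is not XmdvTool code: the records, the `brush` helper, the search ranges and the second entry of the code table are hypothetical, and only the element names and the code 1.17 for ‘asNeeded’ are taken from the text and Figure 5.3.

```python
# Hypothetical metadata records; only the element names come from the prototype.
records = [
    {"id": 1, "WboundLong": -121.25, "EboundLong": -119.20,
     "spatialResolutionScale": 10000, "maintenanceFreq": 1.17},
    {"id": 2, "WboundLong": -118.00, "EboundLong": -116.50,
     "spatialResolutionScale": 250000, "maintenanceFreq": 1.17},
    {"id": 3, "WboundLong": -121.00, "EboundLong": -119.50,
     "spatialResolutionScale": 250000, "maintenanceFreq": 2.0},
]

# Nominal-to-numeric conversion table. The pair 1.17 -> 'asNeeded' is from
# Figure 5.3; the second entry is an invented placeholder.
MAINTENANCE_CODES = {1.17: "asNeeded", 2.0: "hypotheticalValue"}


def brush(items, ranges):
    """Keep the items that lie inside the (low, high) range on every brushed dimension."""
    return [
        item for item in items
        if all(low <= item[dim] <= high for dim, (low, high) in ranges.items())
    ]


# A multidimensional search criterion: a longitude window plus a scale range.
criteria = {
    "WboundLong": (-122.0, -120.0),
    "EboundLong": (-120.0, -119.0),
    "spatialResolutionScale": (5000, 300000),
}
selected = brush(records, criteria)

# Translate the numeric codes back into nominal labels, as the test users did
# by hand with the conversion tables.
labels = [MAINTENANCE_CODES[item["maintenanceFreq"]] for item in selected]
```

Under these assumptions `selected` holds the records that satisfy all three ranges at once, which is the effect of brushing several dimensions on the main display.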

Task 3
Task 3 was a challenge: to compare the two techniques whose usability evaluation results during the focus group evaluation had proven to be two opposite extremes, the parallel coordinate plots and the dimension stacking technique. The goal here was not to confirm the usability data from the focus group evaluation, but to test the usability (effectiveness and efficiency) and learnability of the dimension stacking technique through the use of its interaction toolbox (Figure 5.4). This useful feature of the dimension stacking technique, as will be discussed later, turned out to be very intuitive according to some of the test users.

Figure 5.4: Task 3 result showing one bin brushed on the main display of the Dimension Stacking (DS) technique. The highlighted bin represents 3 data items as shown on the Brushed Data Value Dialog for Flat Displays (bottom right inset). The brush was defined in the DS Brush Toolbox (top left inset).

Task 4
As explained in Chapter 4, a set of four brushes can be created and all used independently by defining their priorities of use. They can also be used in combination, utilizing the Boolean operators (Figure 5.5). An example of a combination of two brushes would be to highlight in a particular colour, mask from view, and/or display the average of the metadata items that fall completely within the overlap area of brush 1 and brush 2. For Task 4, this feature of multiple brushes was evaluated against the use of a single brush for achieving the same task.

Task 5
Both the Scatterplot matrix and the parallel coordinate plot main display techniques show the names of the metadata elements (see Figure 4.3). One of the strengths of the Scatterplot matrix, however, is supposed to be in showing a series of bi-plot correlations between metadata elements. This strength is only relevant for those metadata elements with a possible natural direct or inverse proportional relationship. It must also be pointed out that this relationship can only be analyzed on numeric metadata elements. In this case the publication date of the data (ExtentTemporalBegin) and the publication date of the metadata (dateStamp) were correlated successively with the Scatterplot matrix and the parallel coordinate plots. The purpose was to evaluate this seemingly strong point of the Scatterplot matrix against the seemingly favoured technique, the parallel coordinate plot.
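The Boolean combination of brushes evaluated in Task 4 can be sketched in a few lines of Python. This is only an illustration of the idea, not the XmdvTool implementation: the helper names (`range_brush`, `combine_or`, `combine_and`) and the four records are hypothetical, while the scale values 10000 and 250000 are the ones used in the task.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]
Brush = Callable[[Record], bool]


def range_brush(dimension: str, low: float, high: float) -> Brush:
    """A brush selects the records whose value on one dimension lies in [low, high]."""
    return lambda record: low <= record[dimension] <= high


def combine_or(first: Brush, second: Brush) -> Brush:
    """Union: a record is selected if either brush covers it."""
    return lambda record: first(record) or second(record)


def combine_and(first: Brush, second: Brush) -> Brush:
    """Intersection: a record is selected only inside the overlap of both brushes."""
    return lambda record: first(record) and second(record)


# Hypothetical metadata items with non-contiguous scale values.
records: List[Record] = [
    {"id": 1, "spatialResolutionScale": 10000},
    {"id": 2, "spatialResolutionScale": 50000},
    {"id": 3, "spatialResolutionScale": 250000},
    {"id": 4, "spatialResolutionScale": 10000},
]

brush_1 = range_brush("spatialResolutionScale", 10000, 10000)
brush_2 = range_brush("spatialResolutionScale", 250000, 250000)

# Two brushes used at once select both non-contiguous values in one step.
either = combine_or(brush_1, brush_2)
highlighted = [r["id"] for r in records if either(r)]

# Two overlapping brushes combined with AND keep only the overlap area.
wide_low = range_brush("spatialResolutionScale", 5000, 100000)
wide_high = range_brush("spatialResolutionScale", 50000, 300000)
overlap = [r["id"] for r in records if combine_and(wide_low, wide_high)(r)]
```

A single brush can only cover one contiguous range per dimension, so the same selection would need two successive brushing steps, which is the comparison made in Task 4.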

5.3.6. Test sessions

The tests took place on 5, 6 and 9 February 2004. Each of the tests was supposed to run continuously for about two hours. Because the members of the focus group had indicated that they did not have an idea of what the presentation was about, the presentation was given rather slowly in order to make sure that all the presented issues were clear to the user before the exercise. It therefore took a bit longer than expected. The following was the schedule for the test session:



Figure 5.5: Task 4; (a) The Brush Toolbox and the Define Brush Operation dialog where the second brush has been defined to work independently, but with a similar operation as the first brush. The brushes have been used simultaneously in (b), where the first brush is shown in gray and the second one in purple. One brush has highlighted the metadata items with spatialResolutionScale of 10000 (red items) while the other one has highlighted data items with spatialResolutionScale of 250000 (green items).

1. At least 25 minutes of on-screen PowerPoint presentation of the prototype.
2. At least 25 minutes of exercise, to get familiar with the metadata elements in the metadata set to be used and with the visualization techniques of the prototype.
3. Execution of five tasks with the prototype in approximately 60 minutes.
4. Filling-in of a questionnaire for the test users’ professional biography details and for data on the usability of the visualization techniques of the prototype. The test users completed the questionnaires in their own time, outside the laboratory, and then brought them back a day or two later.

Before the presentation a short briefing was given about the research hypothesis and objectives of the thesis. In addition, although this was included in the invitation e-mails, each test user was told of the level of confidentiality of the tests and of his right to quit the test at any time without prejudice.

Presentation
The presentation entailed an overview of the functions of metadata, an example interface of an on-line geographical metadata facility, the relevance of ‘visualization’ for metadata, an example of the textual metadata (FGDC standard) that was used for the prototype, and an Excel spreadsheet showing over 19 metadata elements and 141 metadata records in the ISO 19115 geographical metadata standard. The metadata records had been translated from the FGDC metadata as explained in Chapter 4. Besides the 19 elements that could be visualized, the other elements considered essential for geographical metadata in Chapter 3, section 3.1.3, were also indicated, with empty entries for the 141 records. Finally the main part of the presentation followed, with an explanation of the principles and working of the features of the prototype, focusing only on the main displays of the four ‘flat’ visualization techniques and their corresponding brush toolboxes, the ‘Zooming-In-and-Out’ and ‘Distortion’ interaction techniques, the ‘Dimension On/Off’ feature, and the ‘Colour Requester’ dialogue.



5.3.7. Usability Data

The first set of usability data was gathered in two parts:

• In real time, from the observations by the evaluator and the comments made by subjects during the exercise and the tasks, and
• Later, after the recording, by reviewing the audio-and-video recordings of the exercise and tasks.

The real-time observations and comments were later combined into a single set with the audio-and-video tape reviews. The second set of usability data came from the answers to the questionnaires about the usability of the prototype.

First set – observation data
The documentation of the data for the first set was divided into six parts, corresponding to the exercise and the five tasks. For both the exercise and the tasks, notes were made of the reactions of the subject and his comments regarding

• the usability of each of the specific visualization techniques,
• the usability of the way ‘data’ was displayed by the techniques,
• the usability of the general functionality of the interface and its visual variability.

The last few comments referred to the system bugs.

The observations and comments by individual subjects were compared to identify similarities and differences. Counts were made to find issues of general consensus among the subjects, the most common or least common opinions, and individual opinions. Merging and complementing similar comments and similar observations reduced the total number of comments and observations gathered.

Secondly, for each of the tasks in which a free choice of visualization technique was given (Task 1, Task 2, and Task 4), it was noted which techniques (Parallel Coordinate Plots (PCP), Scatterplot Matrix (SM), Dimension Stacking (DS), and Star Glyphs (SG)) were used for the ‘main display’ and which were used as ‘brush toolboxes’. In this case, the usability of each of the techniques was measured against the other techniques in terms of the attitude the users had towards the techniques, especially in terms of the informativeness of the main displays and the efficiency of the toolboxes. In the discussions, the data gathered will be related to the answers about Tasks 1, 2 and 4 in the questionnaires.

From Table 5.2 it can be seen that for the main display the PCP technique was used more often than any other technique. On the main display, the PCP (and the Scatterplot matrix) could be brushed directly without a brush toolbox. Therefore the PCP toolbox was not used. After all, there was no extra information that could be gathered from the PCP Brush Toolbox, nor any extra feature that was not available on the main display. On the main display, the Star Glyphs technique was used only in Task 2 (by two users), in order to check the nominal information (on status of the data, contact details for metadata, responsibleParty, and accessConstraints). It proved to be more effective than the PCP in displaying the brushed items ‘individually’ on all dimensions.



Table 5.2: Visualization techniques chosen by the test users for the main display and for the brush toolboxes

Test   TASK 1                       TASK 2                           TASK 4
user   Main display  Brush Toolbox  Main display      Brush Toolbox  Main display  Brush Toolbox

A      PCP           DS             PCP               None           PCP           PCP
B      PCP           None           PCP, Star glyphs  None           PCP           None
C      PCP           None           PCP               None           PCP           None
D      PCP           None           PCP               None           PCP           PCP
E      PCP           DS             PCP, Star glyphs  None           DS            DS

The DS Brush Toolbox was used most frequently (other options were the PCP Brush Toolbox and the Star Glyphs Brush Toolbox). In the focus group session the main display of the Star glyphs and the DS Brush Toolbox were hardly used.

First set – data from recordings
The times for completing each of the main tasks, and the well-defined sub-tasks of the main tasks, were recorded. That is:

1. Total time for completing each of the five main tasks, Task 1, Task 2, Task 3, Task 4, and Task 5.

This data was meant for checking qualitatively the general reliability of the results on the pace of completing the tasks.

From the results in Table 5.3, it is evident that test users A, B, and D completed their tasks at fairly similar paces, except in Tasks 2 and 5, where test user D had to rush through the task due to time constraints. Task 2 was actually flexible; some users read through more textual metadata ‘.htm’ files than the others, and therefore spent more time on the task. The flexibility was allowed because the total time was less relevant than the observation of the amount of effort spent on obtaining nominal information from textual metadata as compared to obtaining similar information with the visualization techniques. Test user C was relatively fast¹. However, he managed to get the expected outcomes from all the tasks. Test user E spent a lot of time giving comments. He stopped and tried to discuss his feelings about the usability of certain features. He finished only Task 5 rather quickly, again because of time constraints: his session ended after the normal working hours of ITC.

Table 5.3: Total time for completing each task by each of the test subjects (in minutes and seconds)

Test user   TASK 1   TASK 2   TASK 3   TASK 4   TASK 5   TOTAL TIME

A           13:55    23:34    06:25    07:04    07:40    58:38
B           12:11    24:05    07:23    07:08    08:55    58:22
C           09:40    12:52    03:35    05:34    06:58    38:39
D           14:40    10:16    05:01    07:15    04:32    41:24
E           16:32    20:20    09:50    11:56    03:22    61:30

¹ That could have been caused by the fact that he was tested on a Friday afternoon and had to catch a train just after the testing. At some point during the test he even insisted that he should do the test quickly.

2. Time for completing sub-tasks

In Task 2

• Time for completing sub-task a., i.

The purpose of noting the time for this sub-task was to check the reliability of the results between different test users in achieving a task with a ‘numeric’ scale for finding information on ‘nominal’ values such as the ‘status’ of the dataset or ‘maintenanceFreq’. The data would then be used to reliably compare the time spent on querying numeric data with the time spent on querying nominal data. Numeric data was queried in most of the tasks, including Task 1. The results among subjects for Task 2 a., i., in Table 5.4, are fairly similar, which, in terms of time, indicates the reliability that was tested. Only test user E managed to complete the task in less than a minute, because he attempted the task immediately after reading the scenario from the task script, before he read the actual task.

In Task 3

• Time for achieving the task with the Dimension Stacking (DS) technique
• Time for achieving the task with the Parallel Coordinate Plot (PCP) technique

In Task 4

• Time for finding data sets with spatialResolutionScale of 10000 using one brush
• Time for finding data sets with spatialResolutionScale of 250000 using one brush
• Time for finding data sets with both the spatialResolutionScale of 10000 and 250000 using two brushes.

The purpose of noting the times for completing each of the sub-tasks (except sub-task a., i. in Task 2) was to compare the efficiency of each of the techniques or approaches that were used to achieve a similar task. The data gathered was also related to the answers to questions from the questionnaire. In Task 3, the time for completing the task with the PCP was shorter than with the DS in all cases, and less than half the DS time for four of the five test users (test user E being the exception). The time for completing Task 4 with two brushes was shorter than with one brush for the same task in all cases.

Table 5.4: Time for completing sub-tasks within the main tasks by each of the test users (in minutes and seconds).

Subject   TASK 2    TASK 3   TASK 3   TASK 4 (one brush)               TASK 4
          a., i.    (DS)     (PCP)    (10000)   (250000)   Total time  (two brushes)

A         02:37     03:40    01:05    02:24     01:15      03:39       03:15
B         03:01     04:00    01:14    03:20     00:40      04:00       03:08
C         03:47     02:20    00:41    02:02     00:25      02:27       00:50
D         04:01     02:35    00:49    03:40     NULL       03:40       01:40
E         00:40     04:07    02:38    06:57     00:25      07:22       04:26
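The efficiency comparisons drawn from Table 5.4 can be reproduced with a short Python sketch. The times are copied from the table; the helper `to_seconds` and the use of a simple time ratio as the efficiency indicator are assumptions made for this illustration, not part of the original analysis.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'mm:ss' string from the table into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Task 3: (time with DS, time with PCP) per subject, from Table 5.4.
task_3 = {
    "A": ("03:40", "01:05"),
    "B": ("04:00", "01:14"),
    "C": ("02:20", "00:41"),
    "D": ("02:35", "00:49"),
    "E": ("04:07", "02:38"),
}

# Task 4: (total time with one brush, time with two brushes) per subject.
task_4 = {
    "A": ("03:39", "03:15"),
    "B": ("04:00", "03:08"),
    "C": ("02:27", "00:50"),
    "D": ("03:40", "01:40"),
    "E": ("07:22", "04:26"),
}

# A ratio below 1 means the second approach of the pair was faster.
pcp_to_ds = {
    subject: to_seconds(pcp) / to_seconds(ds)
    for subject, (ds, pcp) in task_3.items()
}
two_to_one_brush = {
    subject: to_seconds(two) / to_seconds(one)
    for subject, (one, two) in task_4.items()
}

for subject in task_3:
    print(subject, round(pcp_to_ds[subject], 2), round(two_to_one_brush[subject], 2))
```

With the values of Table 5.4 every ratio is below 1, and the PCP-to-DS ratio is below 0.5 for every subject except E.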

Another important set of data was the accuracy of the results that the subjects obtained at the end of each task. All tasks were completed accurately, with two exceptions: subjects D and E on Task 1, and subject E on Task 2 a., i. The expected results for the two tasks were 15 and 11 metadata items respectively. In Task 1, subject D got 23 items and subject E got 9 items. In Task 2, where the results of Task 1 were supposed to be filtered further, subject E remained with the same 9 items.

Second set
The questionnaire (Appendix C) consisted mainly of closed questions, with spaces for additional remarks and brief explanations of some of the closed answers. Each question addressed one or more of the three usability merits: the effectiveness, efficiency, and satisfaction with which the users achieved specific tasks with the prototype. Of the 13 questions only questions 11 and 13 were general questions, respectively about the usability of the prototype as a whole and the usability of the concept of ‘visualizing’ geographical metadata. Question 10, on the other hand, addressed the issue of the display of labels for nominal values. Results are summarized in Table 5.5 and Table 5.6.

Questions 1 and 2 both referred to Task 1. They both asked for the reason for selecting the technique that the subject had used for the task. They were multiple-choice questions with four options and the possibility of giving another reason. For question 2, only subject E had a reason different from the four options. The options, as shown in Appendix C, were about 1) the informativeness of the technique, 2) the amount of effort required to achieve the task, 3) comfort in using the technique, or 4) that the subject just felt he liked the technique best. The results are shown in Table 5.5. Going back to Table 5.2, one notices that, of all the toolboxes, only the DS Brush Toolbox was used in Task 1, by subjects A and E. From Table 5.5, the reason for subject A was that the DS Brush Toolbox was the most informative for completing the task. Subject E actually said that, among all the techniques (both the main display techniques and the brush toolboxes), the DS Brush Toolbox matched the way he normally thinks about selecting ranges of values on a data attribute scale. Subjects B, C, and D did not use any special brush toolbox; they only brushed on the main display of the PCP.

Comfort was one of the attributes describing the satisfaction with which a user achieved his tasks. It was the most common reason among the subjects for selecting the PCP technique for the main display. Informativeness is one of the attributes describing effectiveness. That was the reason for subject D. The amount of effort required to complete the task defines the efficiency. That was the reason for subject E. It can therefore be seen that each of the three usability merits adopted in the study was the reason for at least one user to use the PCP on the main display. In the case of the toolboxes it was, in both cases, about the effectiveness of the DS brush toolbox. Subject E chose ‘intuitiveness’, which was none of the four options. But ‘intuitiveness’ is an attribute that describes effectiveness.

The responses to question 3, for Task 2, indicate a general consensus that the use of visualization techniques for finding nominal information (on status of the data, contact details for metadata, responsibleParty, and accessConstraints) was much more usable in terms of efficiency than in terms of the effectiveness and satisfaction with which the users achieved the task. But one user remarked that he considered the visualization approach more usable on the ‘assumption’ that the labels for nominal values were displayed, as with the version of XmdvTool used in the focus group evaluation.

Table 5.5: Reasons of subjects for selecting the ‘main display’ techniques and the ‘Brush Toolboxes’ they used in Task 1.

Subject   QUESTION 1                        QUESTION 2
          Reason for the selection of the   Reason for the selection of the
          ‘main display’ technique          ‘Brush Toolbox’ technique

A         Comfort                           Informativeness
B         Comfort                           Comfort
C         Comfort                           Informativeness
D         Informativeness                   Informativeness
E         Amount of effort                  Intuitiveness
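The counting of reasons per usability merit can be made explicit with a small Python sketch. The answers are those of Table 5.5 and the reason-to-merit mapping follows the interpretation given in the text (comfort as satisfaction, informativeness and intuitiveness as effectiveness, amount of effort as efficiency); the names `REASON_TO_MERIT` and `merit_counts` are hypothetical.

```python
from collections import Counter

# Mapping of the stated reasons onto the three usability merits, following
# the interpretation given in the text.
REASON_TO_MERIT = {
    "Comfort": "satisfaction",
    "Informativeness": "effectiveness",
    "Amount of effort": "efficiency",
    "Intuitiveness": "effectiveness",
}

# Answers per subject, copied from Table 5.5.
question_1 = {"A": "Comfort", "B": "Comfort", "C": "Comfort",
              "D": "Informativeness", "E": "Amount of effort"}
question_2 = {"A": "Informativeness", "B": "Comfort", "C": "Informativeness",
              "D": "Informativeness", "E": "Intuitiveness"}


def merit_counts(answers):
    """Count how many subjects gave a reason belonging to each usability merit."""
    return Counter(REASON_TO_MERIT[reason] for reason in answers.values())


main_display_merits = merit_counts(question_1)
brush_toolbox_merits = merit_counts(question_2)
```

For question 1 this yields three answers for satisfaction and one each for effectiveness and efficiency, which is the distribution referred to in the discussion of the PCP main display.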

Responses to question 4 for Task 3 clearly show that, for the main display, the PCP was the most informative. One user said that the DS main display was just too ‘complicated’ to extract any information from. Another user said that trying to understand the DS display was ‘awful’. In terms of effectiveness (question 5), however, the DS and the PCP were given more or less the same rank. Because subject C’s results did not have rankings, it would be a little unfair to conclude that the DS technique was slightly better. But looking at subject D’s response of 5-to-4 in favour of the PCP and the responses from subjects B and E of 4-to-2 in favour of the DS brush toolbox, the DS technique is above the average effectiveness score in three cases; even if subject C had given the DS a score below average, it would only be in that one case that the DS fell below average. In terms of efficiency the DS brush toolbox beat the PCP. Only test user C disagreed. Then again, in a general-case question (question 6) combining both a technique’s main display and its brush toolbox, it seems the PCP induced a better degree of comfort for achieving the task.

In Task 4, question 7, the two brushes were in general considered more efficient than one brush for selecting element values that are ‘non-contiguous’ along a dimension axis. However, one user argued that both approaches were “too tedious for such an easy task” and that there must be a more efficient way to get to the result. In terms of effectiveness, one user felt that the two-brushes approach was not as error-prone as using one brush.



In Task 5, question 8, all the users except one agreed that checking the correlation between two geographical metadata elements is relevant in a data search where metadata is used. But all the positive replies lacked explicit examples of cases where checking the correlation might be relevant. The users were even a bit sceptical in deciding whether they saw any correlation; it is therefore surprising that three of the relevance scores are clearly high. The subject who thought that correlation checks were not relevant argued that correlation is relevant for data mining purposes, not for data searching with metadata. In general, though, the Scatterplot matrix was thought to be a slightly more informative approach than the PCP for finding the correlation.
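What the test users were asked to judge visually in Task 5 corresponds, numerically, to a correlation coefficient between the two date elements. The following Python sketch computes a Pearson coefficient for that purpose; it is an assumption of this illustration that Pearson correlation is an adequate measure, and the year values are invented, since the actual dateStamp and ExtentTemporalBegin values of the 141 records are not listed here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical publication years of the data and of the metadata.
extent_temporal_begin = [1990, 1994, 1997, 2000, 2002]
date_stamp = [1996, 1998, 2000, 2002, 2003]

r = pearson(extent_temporal_begin, date_stamp)
print(round(r, 3))
```

A coefficient close to +1 or -1 would appear as a narrow diagonal band in the corresponding cell of the Scatterplot matrix, and as mostly parallel or mostly crossing line segments between adjacent axes of the PCP.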

Table 5.6: Questionnaire results on the usability of individual techniques in each of the tasks

Questions                                Test users
                                         A    B    C²                D    E

Task 2
3. Usability of browsing through textual nominal metadata as compared to ‘visualizing’ metadata
   Effectiveness scores
      Textual metadata                   2    5    Less effective    1    4
      Prototype                          3    4    More effective    4    4
   Efficiency scores
      Textual metadata                   1    1    Less effective    1    3
      Prototype                          4    3    More effective    3    4
   Satisfaction scores
      Textual metadata                   3    2    Less effective    1    4
      Prototype                          4    3    More effective    4    4

Task 3
4. Informativeness of the DS main display as compared to that of the PCP main display
      DS main display technique          2    2    Less informative  3    2
      PCP main display technique         4    4    More informative  5    3
5. Effectiveness and/or efficiency of the DS brush toolbox as compared to the PCP toolbox
   Effectiveness scores
      DS brush toolbox                   3    4    Less effective    4    4
      PCP brush toolbox                  3    2    More effective    5    2
   Efficiency scores
      DS brush toolbox                   4    4    Less effective    3    4
      PCP brush toolbox                  4    1    More effective    2    3
6. Degree of comfort or discomfort of using the DS technique compared to the PCP technique
      DS technique                       3    3    Less effective    4    2
      PCP technique                      4    3    More effective    4    4

Task 4
7. Effectiveness and/or efficiency of using one brush two times as compared to using two brushes once
   Effectiveness scores
      One brush - two times              3    4    Less effective    3    4
      Two brushes - once                 4    4    More effective    4    4
   Efficiency scores
      One brush - two times              3    2    Less effective    2    2
      Two brushes - once                 4    3    More effective    3    4

Task 5
8. Relevance of assessing correlation between two metadata elements
      Scores                             4    4    Relevant          5    2
8. Informativeness of the Scatterplot matrix as compared to the Parallel Coordinate Plot in showing the correlation between dateStamp and ExtentTemporalBegin
      Scatterplot matrix                 3    4    More informative  4    4
      Parallel Coordinate Plot           4    4    Less informative  3    3

² Test user C was the first among the five to be given the questionnaire. It was noticed before the next test that the information from the answers to questions 3 to 10 would be incomplete without an indication of the rank, on a scale of 1 to 5, at which a technique is assigned a usability value. The ranks were then added for the other subjects. That is why the responses from test user C do not indicate the ranks in those questions.

Table 5.7: Questionnaire results about the usability of individual techniques, the prototype as a whole, and the concept of geographic metadata ‘visualization’.

Questions                                Test users
                                         A    B               C               D    E

Non-task specific
9. Effectiveness of the distortion technique
      Scores                             4    Didn’t distort  Effective       4    Didn’t distort
10. ‘Possible’ effectiveness of display of labels as compared to numeric codes for nominal values
      Scores                             5    5               More effective  5    5
11. Effectiveness, efficiency, and satisfaction with which you achieved your tasks using the prototype as compared to using existing geographical metadata facilities
      Effectiveness scores               3    3               4               4    2
      Efficiency scores                  4    2               2               4    2
      Satisfaction scores                4    2               2               4    2
12. Opinion about the rate of effectiveness, efficiency, and satisfaction of ‘visualization’ of geographical metadata
      Effectiveness scores               4    4               4               5    2
      Efficiency scores                  4    4               4               5    2
      Satisfaction scores                4    4               4               5    2
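The fully numeric answers to questions 11 and 12 in Table 5.7 lend themselves to a simple numerical summary. The Python sketch below reports the median and the mean per usability merit; the choice of these two statistics and the helper name `summarise` are assumptions of this illustration, and with only five test users the figures are indicative at best.

```python
from statistics import mean, median

# Scores of test users A to E on the 1-to-5 scale, copied from Table 5.7.
question_11 = {
    "effectiveness": [3, 3, 4, 4, 2],
    "efficiency": [4, 2, 2, 4, 2],
    "satisfaction": [4, 2, 2, 4, 2],
}
question_12 = {
    "effectiveness": [4, 4, 4, 5, 2],
    "efficiency": [4, 4, 4, 5, 2],
    "satisfaction": [4, 4, 4, 5, 2],
}


def summarise(scores_per_merit):
    """Return (median, mean rounded to one decimal) for each usability merit."""
    return {
        merit: (median(scores), round(mean(scores), 1))
        for merit, scores in scores_per_merit.items()
    }


prototype_summary = summarise(question_11)   # prototype vs existing facilities
concept_summary = summarise(question_12)     # concept of metadata 'visualization'

for merit in question_11:
    print(merit, prototype_summary[merit], concept_summary[merit])
```

The median is included because the scores are ordinal ranks, for which it is less sensitive than the mean to a single low score such as that of test user E.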



As mentioned at the beginning of the section, comments and observations were also noted and classified into four parts. Three concern the usability of 1) specific visualization techniques and other tools of the prototype, 2) the data display by the techniques, and 3) the general functionality and visual variability of the interface. The rest of the comments and observations referred to 4) the system bugs. For a detailed look at all the comments and observations, refer to Appendix 5.

5.4. Discussions³

Comparison of the usability of the techniques

In both user evaluations, when multiple metadata items were compared, the PCP proved to be the most favoured main display technique. The PCP technique is older and more established than the other techniques, and most of the users had worked with it before, so they were quite familiar with its principles. Those who were working with it for the first time also found it very easy to learn, and quickly appreciated its informativeness. This explains the PCP’s high scores in terms of the effectiveness, efficiency, and satisfaction with which the users achieved their tasks with it. For question 1 of the questionnaire, the subjects selected three different options as reasons for using it in Task 1: one of the reasons referred to effectiveness, another referred to efficiency, and the other three referred to satisfaction (comfort).

However, most users made the mistake of counting selected items from the main display of the PCP. One cannot be absolutely sure of the actual number because some items can be overplotted on others (where they have similar values in more than one dimension). Perhaps that is where some users realized the strength of the Star Glyph main display.

The Star Glyph main display technique was considered useful for cases where a search result is already available (i.e., for checking details on demand), for example, from the PCP. The reason is that on the Star Glyph main display each metadata item was plotted spatially independently of the other items. A user could therefore simply run the mouse over each of the selected items to check more details of the dataset, and the values of all the displayed elements would be shown on the message bar.

The Scatterplot matrix main display was only used where the users were explicitly asked to use it. But even then, one of the strengths of the technique, that of showing correlations between dimensions of data, turned out not to be a relevant issue in the use of geographic metadata. The DS main display was considered very non-informative.

In the case of brush toolboxes, the DS Brush Toolbox (which was hardly used during the focus group evaluation) turned out to be the most effective once it was given a chance against the PCP brush toolbox. But the fact that the DS Brush Toolbox was useful did not indicate that the design principles of the general DS technique might, after all, be relevant for geographic metadata. Both users who selected the toolbox did so simply because it displayed the individual dimension scales separately from one another, in the form of rectangular bars where, as with sliders, the selected values could be adjusted by just clicking and running the mouse pointer along the length or height of the bar (see Figure 4.3, Chapter 4). In that form, as subject E indicated, the technique became intuitive. But there was more information in the toolbox display that provided clues for understanding the principles and working of the DS technique on the main display: the vertical and horizontal arrangement of the

3 Reference will, repeatedly, be made to the Appendix section for the actual list of comments and observations gathered from the usability testing exercise and tasks.



dimensions and the order from top-to-bottom and left-to-right corresponded to the arrangement in the main display of the technique. No subject seemed to be aware of, or interested in, any of those clues.

The Star Glyphs toolbox (Figure 5.6) was easily understood during the exercise but was not used in any of the tasks. One disadvantage of both the Star Glyphs Brush Toolbox and the DS Brush Toolbox was that they lacked a zooming option, which is needed to increase the sensitivity of the message bar. That was actually one of the strong points of the PCP Brush Toolbox and the PCP main display over those two. In fact, almost all subjects completed Task 3 with the PCP (see Table 5.4) in less than half the time they took with the DS toolbox.

Figure 5.6: A star glyph toolbox (inset) with the Star glyph main display on the background. Only three metadata elements are displayed with only three metadata items brushed as can be seen on the main display
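The overplotting problem noted earlier for the PCP main display, where counting visible lines understates the number of selected items, can be illustrated with a small sketch. It assumes that two items overplot completely when they share exactly the same value on every displayed dimension; the function name `visible_vs_actual` and the sample items are hypothetical and are not taken from the test dataset.

```python
from collections import Counter


def visible_vs_actual(items, displayed_dims):
    """Return (distinct polylines, actual item count) for the displayed dimensions.

    Items that share the same value on every displayed dimension are assumed
    to be drawn on top of each other and therefore count as one visible line.
    """
    groups = Counter(tuple(item[dim] for dim in displayed_dims) for item in items)
    return len(groups), sum(groups.values())


if __name__ == "__main__":
    # Hypothetical metadata items, for illustration only.
    items = [
        {"ID": 1, "status": "complete", "spatialRepresentationType": "vector"},
        {"ID": 2, "status": "complete", "spatialRepresentationType": "vector"},
        {"ID": 3, "status": "ongoing", "spatialRepresentationType": "grid"},
    ]
    print(visible_vs_actual(items, ["status", "spatialRepresentationType"]))
    print(visible_vs_actual(items, ["ID", "status", "spatialRepresentationType"]))
```

In this sketch, displaying a unique dimension such as ID separates the overplotted items again, which is consistent with each item being drawn independently on the Star Glyph main display.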

The interactive zooming technique was very effective, much more so than the distortion technique, because the user could zoom in both the vertical and horizontal directions. Also, due to the insensitivity of the message bar and the ineffective quantification of the nominal values on the numeric scales, users had to zoom in coarsely in almost all the tasks. However, zooming into a brush area was a very ‘clumsy’ process, as one user said (refer to Appendix E for more detailed comments).

Another weakness of the prototype was that there was no possibility to undo a change in the brush coverage, a step in the distortion operation, or a colour selected from the ‘Colour Requester’ dialog. Not even with the keyboard operation ‘Ctrl’ + ‘Z’ could a user go back in a session. The test users complained that they had to be unnecessarily careful not to click anywhere on the display by mistake, as that would change the brushed coverage. A change of the brush coverage could easily result in losing the brushed items. That weakness was another cause of the clumsiness of zooming in on a brush area.

Evaluation of the usability of other tools and features

The ‘Dimension On/Off/Reorder’ toolbox was regarded as a highly efficient and effective tool since, in a task, most users were only interested in a few of the metadata items, and could re-arrange the dimension positions to focus on those dimensions that they felt had a relationship or could be correlated.

The ‘Brushed Data Value Dialog for Flat Displays’ proved to be an effective and efficient tool for quickly checking the values of numeric metadata elements. It was not effective at all for nominal elements, since only the numeric codes of the nominal values were shown in the dialog.

The ‘Colour Requester’ feature also proved to be highly usable (refer to Appendix E for justifications). Colour has a very strong effect on visual variability and, therefore, visual clarity. As such, for



visualization purposes, a tool with possibilities to adjust the metadata display to ‘many’ colours is highly usable.

The ‘Save Brushed Data’ feature was not used in any of the tasks because there was no need for it. But users indicated that it would be effective to save the brushed items for later, more detailed, querying. The disadvantage of doing so is that, unless the items were saved along with all the dimensions, the saved brush becomes another dataset with only the dimensions that it was saved with. In that sense the feature would not be effective.

The fact that the ‘Brushes’ menu had menu items with names ending in ‘Toolbox’, and that there was also a menu called ‘Tool’, often confused the users when they wanted to open one of the ‘toolboxes’ from the ‘Brushes’ menu. The names therefore made for an inefficient and annoying exercise, especially at first use of the prototype, as the subjects mistakenly opened the wrong windows.

Evaluation of the usability of the data display

This was discovered to be one of the most important issues for further research in geographical metadata visualization. Many comments on the non-usability of the prototype were a result of the non-intuitive display of nominal values. As can be seen from Appendix E, both the order and the quantification of the nominal values on the numeric scales of the prototype’s main displays were sensitive issues. The long process of nominal-to-numeric conversion of the values of the nominal elements by the DQC approach, explained in Chapter 4, was not worth spending time on. A conclusion here was that the DQC approach was only useful for data mining purposes, not at all for data search purposes with geographical metadata.

All the users had implicit and explicit expectations about the order in which the nominal values were placed on the numeric scale, for example, for the status of datasets: users claimed that the status value ‘complete’ should be given the highest numeric value. In the case of quantification of the nominal values, most users just could not understand why a particular nominal value would be plotted, for example, close to the next upper nominal value but very far away from the next lower nominal value. This was the case in all the nominal dimensions, and especially evident on responsibleParty (Figure 5.1) due to the many nominal values involved. In terms of order, one user actually thought that a higher numeric value for responsibleParty should have indicated the most ‘reliable’ of all the ‘data sources’ described by the metadata.

Coarse zooms were always required where the quantification had forced data items into a cluster on a nominal dimension scale. For example, again with responsibleParty, all the values were clustered close to the maximum on the axis of the dimension (Figure 5.1), with a big gap of empty space between those values and the ‘NULL’ values, which can be seen plotted in the middle⁴ of the axis. The approach of assigning outlier numeric values to ‘null’ values (Edsall and Roedler, 2002) was therefore also non-informative and made brushing inefficient and annoying (not satisfying) to users. The issue of ‘null’ values, which can occur in most of the metadata dimensions for a metadata set, brings this discussion to the issue of nominal label display (see Figure 5.1; this was the version of XmdvTool

4 It was simply due to typing errors during the pre-processing of the metadata that the ‘NULL’ values were plotted in the middle of the responsibleParty axis. They were actually supposed to have been at the bottom, at the same level as the ‘NULL’ values in the other dimensions (see also Figure 5.1, with the nominal labels), which, in the figure, are plotted to the right of responsibleParty.



that was used in the focus group session). Assigning a label ‘NULL’ to ‘null’ values would perhaps be a better alternative to assigning numeric outliers to them, since once a label is displayed on the screen, the user can immediately see which metadata items have no information in which dimensions. As can be seen from the responses to question 10 of the questionnaire, nominal labels would be an effective, efficient, and even satisfying feature of the displays. It is still difficult, of course, to say how the labels would be displayed on main display techniques other than the PCP, because during the evaluation it was only with the PCPs that labels could be shown.

Finally, even if nominal labels were displayed for easy querying of nominal dimensions, numeric dimensions would still be inconvenient to query due to the low sensitivity of the message bar, the amount of zooming required, and the issue of getting lost within a brush while in zoom-in mode: when the display is zoomed into a brush area, one cannot tell whether the mouse cursor, used to adjust the brush minimum or maximum, is closer to the minimum or to the maximum of the brush area. Some users therefore suggested an alphanumeric interface, like the one featured by the zooming technique, as a more usable approach in which the minimum and maximum values for the brush (on the axes of the numeric dimension) can simply be ‘typed’ using the keyboard.

A toggle interface, like those available on most on-line geographical metadata facilities including the ITC’s Geodata Warehouse (URL 12) and the Alexandria Digital Library Project (URL 1), was also supported for nominal dimensions with only a few possible values, such as the point, vector, and grid values of the spatialRepresentationType dimension.
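The effect of coding ‘null’ values as numeric outliers, as opposed to showing them as a ‘NULL’ label, can be sketched as follows. This is a minimal illustration under stated assumptions and not XmdvTool’s actual scaling code: the axis is assumed to use simple min-max scaling, and the function names and sample codes are hypothetical.

```python
def normalize(values):
    """Min-max scale numeric values to axis positions in [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def positions_with_outlier_null(values, null_code):
    """Replace missing values (None) by a numeric outlier code, then scale everything together."""
    coded = [null_code if v is None else v for v in values]
    return normalize(coded)


def positions_with_null_label(values):
    """Scale only the real values; missing values stay a separate 'NULL' label."""
    real = [v for v in values if v is not None]
    scaled = iter(normalize(real))
    return ["NULL" if v is None else next(scaled) for v in values]


if __name__ == "__main__":
    # Hypothetical numeric codes for a nominal dimension, with one missing value.
    codes = [90, None, 95, 100]
    print(positions_with_outlier_null(codes, null_code=0))
    print(positions_with_null_label(codes))
```

With the outlier code, the three real values occupy only the top tenth of the axis, which is the kind of clustering that forced coarse zooming. With a separate label they spread over the whole axis.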
System bugs’ effects on usability

The response of the system was quite slow on the hardware used for the testing compared with the way it ran on the computer on which the prototype was developed. The computer used for the evaluation was somewhat old and short of memory.

In order to select contiguous datasets using only one dimension, the ‘Shift’ key was used together with the mouse. Sometimes the ‘Shift’ key operation produced no reaction from the mouse pointer. The bug made brushing unnecessarily inefficient, because the brush area first had to be maximized on all other dimensions not involved in the query so that only one dimension could be brushed. An alternative would be to turn off all those dimensions not involved in the query. But that was actually not possible: at least two dimensions had to be displayed at any one time for the data to be displayed.

The fact that the DS main display could only be opened when at most five or six dimensions were displayed made the technique even less usable. No user wanted to keep turning dimensions on and off when he could keep them all on with another technique.

Accuracy of task results

The wrong results by subjects D and E for Task 1 were to be expected due to the low sensitivity of the message bar and the text size of the message, which was too small for most users. Users often had to lean forward, closer to the screen, to see properly whether the value displayed was the value they thought they were seeing. Interestingly, both subjects got wrong answers on ‘numeric’ dimensions. Possibly, that was because numeric dimensions had many more unique values along their respective axes than their nominal counterparts. The values would even be clustered together within certain ranges (see the four geographic bounding box elements and spatialResolutionScale in Figure 5.3), necessitating constant zooming, which was quite an irksome exercise for most subjects. Without enough



zooming, the brushing was prone to errors. Subject E did not make any mistakes in Task 2 itself to obtain the wrong result: all 9 items he had obtained from Task 1 were part of the 11 items he was supposed to obtain in Task 2. He therefore brushed correctly, but still got the same result as in the previous task.

As can be read from Appendix F, another mistake made by subjects was forgetting to maximize the brush area on dimensions that were not involved in the selection query (see the maximized brush on status, contact, accessConstraints, and responsibleParty in Figure 5.3 (b)). Subject D actually lost part of his result from the Task 2 brush by turning on dimension ID and forgetting to maximize the brush area on it. Subject E made the same mistake, but was notified by the evaluator before proceeding with the task. The purpose of ID was simply to act as an index to correctly identify the 11 selected items from the ‘Brushed Data Value Dialog for Flat Displays’ in order to open the associated textual metadata files. In short, the query was: “Of all IDs, select the ID numbers of the selected items”. But the mistake could easily have been associated with the fact that, on top of the six dimensions used in Task 1, dimension maintenanceFreq was added in Task 2 a., i., and dimensions responsibleParty, status, contact, and accessConstraints were then added in Task 2 b. For a novice user of the prototype, that could have been an overwhelming number of elements to deal with in one task.
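The mistake of not maximizing the brush on a newly displayed dimension can be illustrated with a simplified model of a brush. The sketch assumes that a brush keeps one [minimum, maximum] interval per displayed dimension and selects only the items that fall inside every interval. The function names, the sample items, and the narrow initial interval on ID are hypothetical, and XmdvTool’s internal behaviour may differ.

```python
def selected(items, brush):
    """Return the items that fall inside the brush interval on every brushed dimension."""
    return [
        item for item in items
        if all(lo <= item[dim] <= hi for dim, (lo, hi) in brush.items())
    ]


def maximize(brush, items, dim):
    """Return a copy of the brush that covers the full value range of one dimension."""
    values = [item[dim] for item in items]
    return {**brush, dim: (min(values), max(values))}


if __name__ == "__main__":
    # Hypothetical items: an ID plus a numeric code for maintenanceFreq.
    items = [{"ID": i, "maintenanceFreq": f}
             for i, f in [(1, 1), (2, 2), (3, 3), (4, 2), (5, 4)]]

    brush = {"maintenanceFreq": (2, 3)}
    print([item["ID"] for item in selected(items, brush)])

    # Turning on ID with a narrow (non-maximized) interval silently drops items.
    narrow = {**brush, "ID": (1, 2)}
    print([item["ID"] for item in selected(items, narrow)])

    # Maximizing the brush on ID restores the original selection.
    print([item["ID"] for item in selected(items, maximize(narrow, items, "ID"))])
```

Because the intervals are combined conjunctively in this model, any dimension that is displayed but not part of the query has to cover its full range, otherwise it acts as an unintended extra filter.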


The purpose of the evaluations was to elicit data pertaining to the usability merits, as defined by ISO 9241-11, of the visualization techniques in XmdvTool 6.0a for geographic metadata visualization. In total, the two evaluations involved four PhD students from the ITC’s Geoinformation Processing Department with a geoinformation visualization background, three subjects from the ITC’s IT support departments, and two scientific staff members with backgrounds in Water Resources and Distributed Services, respectively. The focus group session failed to discuss, based on a planned protocol, the usability issues regarding the performance of XmdvTool 6.0a (for nominal data visualization) with all its four basic ‘flat’ display visualization techniques and the associated four ‘hierarchical’ techniques. However, some useful comments were put forward by the members as they tackled the exercise and the tasks given to them during the session. Some of the comments were incorporated into the subsequent evaluation, the think-aloud usability testing, which was recorded in a usability laboratory.

In the second evaluation, version 5.0 of XmdvTool was used after noting some crucial bugs in the version used in the focus group evaluation. Also, the ‘hierarchical display’ techniques were judged to be of no use for geographic metadata after the focus group evaluation, and were therefore left out of the second evaluation. The second evaluation was supplemented by a questionnaire to elicit data regarding subjective usability issues that could not be measured directly by watching and/or reviewing the audio-video recordings of the subjects’ behaviour with the prototype.

In both evaluations, the Parallel Coordinate Plot (PCP) main display visualization technique was regarded as the most effective, most efficient, and most satisfying in achieving the tasks that were given to subjects. After noticing the bias towards using the PCP in the focus group evaluation, care was taken that other techniques were tried equally in the second evaluation. As a result, the Dimension Stacking (DS) interaction technique (the brush toolbox) also turned out to be a good technique for geographical data searches through querying metadata. The Star Glyph main display technique, to a certain extent, was found to be the most effective in displaying a search result for a check on ‘details on demand’. The issues of nominal labeling, nominal value ordering and quantification, multiple interfaces, and the zooming mechanism turned out to be critical issues for geographical metadata visualization.

Some problems were encountered with the evaluations:

• Not as many subjects as expected responded or turned up.
• There were significant bugs in the version of XmdvTool used in the focus group evaluation.
• There were bugs in the version that was used first in the second evaluation, which was later replaced with version 5.0. In this case, tests had to be postponed, and some time was lost.

But, in general, the evaluations provided a rich learning opportunity about the user requirements and tasks in the use of geographical metadata, and about users’ expectations of a visualization product if it is to be used for metadata visualization.



6. Conclusions and Recommendations

The research proposed ‘visualization’ of geographic metadata as a potentially usable approach to browsing through geographical metadata when searching for geographical data. The objectives were 1) to develop a geographical metadata visualization prototype and 2) to evaluate, with users of geographical metadata, the usability of the visualization techniques. The prototype was meant to address user tasks and requirements (determined from literature review) in the use of metadata for geographical data search purposes. A set of eight research questions was put forward to be answered by the end of this report.

Questions 1, 3, and 4 required a study of the relevance of metadata in a distributed setting of geographic data storage and sharing, a study of the users of geographical metadata and the tasks they perform with the metadata, identification of the target users for this study, and, essentially, what they require from the metadata in order to satisfy their data needs. Some research on these issues, by a few project initiatives and individual researchers, was uncovered, and the results were highlighted in Chapter 2 and the first part of Chapter 3. A user task and system response model was built at the end of Chapter 2 to form a basis for a user-oriented approach to satisfying the tasks of geographic metadata users.

Question 2 referred to the data characteristics of geographic metadata. The question was specifically relevant in determining how challenging it would be to visualize geographic metadata. It was therefore dealt with in more detail in Chapter 3, where it was followed by a look into existing visualization techniques and tools to assess the feasibility of developing a visualization prototype for metadata.

Question 5 required justification of the hypotheses of the research by looking at the visualization principles in relation to the tasks users perform with geographical metadata. The question was meant to be fully answered, in this study, after the user evaluation with examples of such tasks. Question 6 referred to a more reliable user evaluation, through a prototype with target users, of the principles identified by Question 5. Therefore the conclusion of the study highlighted which techniques were found to be most usable. Question 7 built on Question 6 and, together with Question 8, necessitated a usability evaluation of the techniques with the target users of the study in order to evaluate the hypothesis. Therefore, the conclusions specifically addressed Questions 6, 7 and 8.

In addition to the eight questions addressed by the research, four issues were apparent at the beginning of the research: the problems that inspired this study. Within the framework of the objectives, the four problems, and the specific questions, conclusions were derived about the usability of visualization of geographic metadata. Problems and limitations encountered in the effort to achieve the objectives were also highlighted. Finally, some recommendations were made about future research possibilities.



6.1. Conclusions

6.1.1. Recap on problems

Multi-dimensionality of geographical metadata: textual presentation on a dataset-by-dataset basis or visualization

In this research, ‘multidimensional’ visualization techniques have proven usable in the search process for geographical data if ‘many’ metadata elements are used. It is evident from the user responses to the questionnaires that dynamically linked interactive brushing of multivariate visualization display techniques can efficiently speed up the detailed (multi-dimensional) comparison of individual data items, even in the case of nominal metadata elements. However, visualizations of geographical metadata with multi-dimensional techniques cannot entirely replace the conventional approach, due to the very large number of geographical metadata elements whose values cannot be easily standardized for ease of visualization. Examples include dataset ‘Abstract’ information, data ‘Purpose’ information, logical consistency reports (for the FGDC standard), and many others.

‘NULL’ values

The only visualization technique thus far that can display nominal values is the Parallel Coordinate Plots in the version of XmdvTool mentioned above. According to the feedback from the user testing, the plotting of ‘NULL’ values as outliers in the main plot necessitates unnecessary zooming-in, which the system currently does not handle very well; in very coarse zoom-in mode, the system responds slowly to panning and scrolling. Also, in the same mode, there is a high possibility of a user getting lost in the zoom brush, as explained in the previous chapter and in the Appendices. Therefore the only apparent solution at this stage would be visualizing ‘NULL’ values by displaying the text ‘NULL’, using a nominal data visualization tool.

Data explosion

Not much can be said at this stage about the issue of very large volumes of geographical metadata (for large data collections and series), because that issue was not evaluated with users in the research. A task scenario for metadata use that required hierarchically clustered metadata items (where the clustering mechanism is not controlled by the user but is computed during the pre-processing of the metadata) could not be identified.

All that can be said is that, with the interactive zooming, panning and distortion possibilities, the Parallel Coordinate Plot and the Scatterplot matrix can easily display over a thousand items without a problem. The dataset that was used consisted of only 141 metadata items, and the data display space had a lot of empty space which could easily accommodate many more items without clutter. Maybe that is a case where the Scatterplot matrix may become usable.

Usability of prototype in general and of the concept of geographical metadata visualization against existing metadata services

This problem also addressed Question 8: Can visualization in geographical metadata services offer a more usable means of achieving users’ tasks on geographical metadata than the existing means? Without a nominal data visualization feature, the prototype is not usable for geographic metadata.



The issue of nominal value ordering and quantification on the numeric scales can be solved by a careful study of the relationships between the nominal values. This can be achieved by studying the impressions that geographical metadata users have, when they search for data, about the logical order of nominal values on a numeric scale, especially in cases where they do not start their searches with pre-defined search criteria. Without pre-defined search criteria, users have to guess carefully which items are better by comparing the degrees of relevance of each of the nominal values for the search in question. It is a study that requires careful analysis of the way users of geographic metadata think. It might even be carried out with a think-aloud approach, since that method studies the cognitive behaviour of people.

Visualization of metadata has potential in geographical information science. As explained above, due to the existence of many non-standardized elements in most geographical metadata standards, visualization of the entire element set of a geographical metadata file is not currently feasible. But some visualization techniques in a geographical metadata service, especially PCPs and the DS brush toolbox from XmdvTool 5.0, can act as catalytic components for the search process. Other interfaces such as ‘alphanumeric’ interfaces, ‘toggle’ interfaces (as suggested by users), and map browsers (which already exist in other metadata facilities such as URL 1 and URL 12) are still necessary for, respectively, ‘quick and accurate’ searches on numeric elements, on elements that have few nominal values, and on geographic area extent. Therefore, a graphic user interface of visualization displays cannot, on its own, provide a fully usable approach to using geographic metadata.
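One way to make the ordering and quantification of nominal values explicit is to derive the numeric codes from an order agreed with users, with equal spacing between neighbouring values. The sketch below is only an illustration of that idea, not the DQC approach and not a result of this study. Only the value ‘complete’ comes from the user feedback; the other status values, the chosen order, and the function names are hypothetical.

```python
def ordinal_codes(ordered_values):
    """Map nominal values to evenly spaced codes 1..n, following the given order."""
    return {value: rank for rank, value in enumerate(ordered_values, start=1)}


def encode(values, codes, null_label="NULL"):
    """Encode a column of nominal values; missing values (None) keep a text label."""
    return [null_label if value is None else codes[value] for value in values]


if __name__ == "__main__":
    # Hypothetical user-expected order for 'status', lowest to highest;
    # users expected 'complete' to receive the highest numeric value.
    status_codes = ordinal_codes(["planned", "ongoing", "complete"])
    print(status_codes)
    print(encode(["complete", None, "planned"], status_codes))
```

Equal spacing avoids the unexplained gaps between neighbouring values that users objected to, but whether a single order is meaningful for a dimension such as responsibleParty would still have to be established with users.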

6.1.2. Recap on questions

Question 6: Which visualization techniques can be used to address the specific user tasks and requirements, and the characteristics of geographical metadata?

Question 7: Are the techniques usable?

The test results from both evaluations indicated that the main display of the Parallel Coordinate Plots (PCP) is a powerful multi-dimensional technique for searching for data by using geographical metadata. The Dimension Stacking (DS) toolbox is the most intuitive interactive brushing mechanism. The Star Glyphs technique is powerful in displaying (for ‘details on demand’) the results of an earlier search carried out with another technique such as the PCP or the DS brush toolbox.

The Dimension Stacking main display is the most non-informative technique (not only for geographical metadata). It was found not to be usable for the purpose of the prototype.

The Scatterplot matrix is powerful in showing the correlations between two metadata elements. But it was not used at all, by choice, in the tasks by any of the test users. Even the test users who agreed that checking the correlation between two metadata elements is a relevant task in a search for geographical data were somewhat sceptical that it is really relevant. Therefore the Scatterplot matrix proved not to be usable as a main display technique.

A mere display of data search results may be an optimal approach with the Star Glyph main display technique, as explained above. But mere display is certainly not optimal with other techniques. For



display techniques to be more usable they need dynamic interaction. This is where the usability of the toolboxes and of dynamic zooming, panning, and distortion became evident.

The interactive brush toolbox of the Star Glyph technique was hardly used at all. It did not provide the needed satisfaction to the users. Where it was used, it was not used for the whole task; it was always abandoned for the PCP main display or the DS brush toolbox. The toolbox also had a bug, as can be read from Appendix F. It is not as intuitive as the DS toolbox and, just like the DS toolbox, it lacked an interactive zooming feature. The Star Glyph brush toolbox was therefore found not to be a useful interactive brushing toolbox for geographical metadata.

The distortion technique was not better than the interactive zooming, as long as it was not possible to distort in the ‘vertical’ direction. The vertical distortion possibility only existed with the Scatterplot matrix main display, which, unfortunately, was found not to be one of the favourite techniques for geographical metadata visualization.

Zooming, as far as visualization is concerned, cannot be replaced. Its usability was noted from the very beginning of the user evaluation, in the focus group evaluation, where it was used in almost every task. The fact that metadata items were sometimes clustered together on numeric scales necessitated coarse zooming-in in order to delineate individual items or to increase the sensitivity of the message bar when displaying the numbers.

6.1.3. Recap on objectives

First objective

The first objective was to develop a prototype that meets user requirements and tasks as determined from literature review. From the literature review, one of the logical and typical basic criteria for users is that of selecting datasets that cover certain geographical areal extents or locations. Another is to select datasets that belong to particular topic categories.

1. The first limitation of the prototype was the lack of relational database management system (DBMS) support (SQL query support) for querying datasets from different topic categories, where a single dataset could belong to several categories depending on the range of themes in the information it contains.

2. The second limitation was related to the first one. The lack of DBMS support also meant a lack of GIS support to satisfy area-based queries intuitively. Visual geospatial referencing is an intuitive approach to the use of geospatial data and, therefore, of geographic metadata. The prototype lacked that feature too.

3. However, according to Keim, Panse, and Sips (2003), the visualization system for the prototype consisted of four visualization display techniques from ‘three’ different classes of information visualization technique. The prototype was, therefore, a good choice that provided a wide range of visualization principles for a reliable user evaluation of the concept of visualizing geographic metadata. The four techniques were also supported by a highly usable technique of dynamically linked brushing.

Second Objective

1. The usability evaluation of the prototype, as a whole, was not complete because it was not running in a web environment. Users want to access a geographical metadata service from anywhere at any time. That is why most metadata services are web-based. The effects of the web environment were not considered, and therefore the usability evaluation of the whole prototype is still somewhat vague in that respect.

2. However, the strengths of the principles of individual techniques (or individual classes of techniques) for geographical metadata visualization were evaluated, with regard to the user requirements, because the techniques were all tested in the same environment.

6.2. Recommendations

1. The issue of ‘data explosion’ was not addressed in the usability testing. It would be worthwhile to study the relevance of hierarchical display techniques by devising relevant user tasks and using a larger collection of metadata for the test.

2. XmdvTool ran in a desktop environment, whereas most geographical metadata services are web-based. A few of the techniques, especially the Parallel Coordinate Plot (which already exists in many Java environments such as GeoVista Studio) and the Dimension Stacking brush toolbox, should be tested in a web-based tool. The efficiency of a technique could then be compared directly with the efficiency of conventional web-based metadata service interfaces, since both would run in a similar computing environment. The techniques could even be embedded in an existing service, with most of the conventional interfaces still available. In that case the technique would be tested not for obtaining a metadata ‘overview’ but for comparing selected metadata items. In fact, the normal overview queries on geographical metadata (using geographical area extents and topic categories) were not tested in this study.

3. Later versions of XmdvTool that include a nominal data visualization feature, run much faster, and are free of the current bugs should be tested.

4. Some multivariate visualization techniques, such as the pixel-based techniques of the VisDB visualization system, were not tested. The VisDB tool and its technique could not be tested because the tool belongs to only one class of information visualization techniques and therefore did not satisfy the selection criteria for the prototype. GeoVista Studio has a Java Bean for a pixel-based technique. It may not be exactly like the one in VisDB and may not offer as much interactive functionality, but it could also be tried.

5. The user requirements and task analysis in this study was based on literature. Methods such as interviews and questionnaires should be used to gather more reliable data on user requirements and tasks before deciding how relevant particular techniques are for achieving the tasks.

6. Finally, just as users become overwhelmed by the amount of metadata element text that they have to browse through after a query on conventional metadata services, it is also overwhelming to display an abundance of metadata elements as tens of dimension axes. The display space becomes cluttered and therefore non-informative. The default number of displayed metadata elements should therefore be reasonably small: fewer than 10. Users can add more elements to the display if they wish.



BIBLIOGRAPHY

Ahonen-Rainio, P. (2003). Concept Testing of Some Visualization Methods for Geographic Metadata. Helsinki, Helsinki University of Technology: 15.
Ahonen-Rainio, P., Kraak, M. J. (2003). "Towards multivariate visualization of metadata of geographic information." Exploring Geovisualization.
Albertoni, R., Bertone, A., De Martino, M. (2003). A Visualization-Based Approach to Explore Geographic Metadata. WSGS, Plzen, Czech Republic.
Anilkumar, P., Ward, M. O., Rundensteiner, E. A. (2003). Seamless Integration of Diverse Data Types into Exploratory Visualization Systems. Worcester, Worcester Polytechnic Institute.
Calhoun, K. (2002). "Special Section: Metadata." Library Collections, Acquisitions & Technical Services 26(3): 195-197.
Consortium, O. G. (2001). The Open GIS Abstract Specification Topic 11: Open GIS(tm) Metadata (ISO/TC 211 DIS 19115). Massachusetts, Open GIS Consortium: 1-149.
Deng, Y. (2002). The Metadata Architecture for Data Management in Web-based Choropleth Maps, University of Maryland.
Edsall, R. M., Roedler, A. J. (2002). An Enhanced GIS Environment for Multivariate Exploration: A Linked Parallel Coordinate Plot Applied to Urban Greenway Use Survey Data, Arizona State University.
ETEMII (2001). Report on Metadata User's Needs, European Territorial Management Information Infrastructure.
Faulkner, X. (2000). Usability Engineering. New York, Palgrave.
FGDC (2000). Content Standard for Digital Geospatial Metadata Workbook, U.S. National Spatial Data Infrastructure.
Fuhrmann, S. (2002). Facilitating Wayfinding in Desktop GeoVirtual Environments. Institute for Geoinformatics. Münster, University of Münster: 224.
Gluck, M. (1997). A Descriptive Study of the Usability of Geographic Metadata. Florida, Florida State University.
Gobel, S., Jasnoch, U. (2001). "Visualization techniques in metadata information systems for geospatial data." Advances in Environmental Research 5: 415-424.
Hill, L. L., Carver, L., Larsgaard, M., Dolin, R., Smith, T. R., Frew, J., Rae, M. (2000). "Alexandria Digital Library: User Evaluation Studies and System Design." Journal of the American Society for Information Science 51(3): 246-259.
Jessen, T., Lillethum, A. (2003). European Environment Agency: Metadata Standard for Geographic Information (EEA-MSGI), European Environment Agency.



Keim, D. A., Panse, C., Sips, M. (2003). Information Visualization and its Application to Geography Related Data. Konstanz, University of Konstanz: 1-14.
Kim, T. J. (1999). "Metadata for geo-spatial data sharing: A comparative analysis." The Annals of Regional Science 33(2): 171-181.
Kristof, R., Satran, A. (1995). Interactivity by Design: Creating and Communicating with New Media. California, Adobe.
Lee, M. D., Butavicius, M. A., Reilly, R. E. (2003). "Visualizations of binary data: A comparative evaluation." Human-Computer Studies 59: 569-602.
Limbach, T., Müller, F., Klein, P., Reiterer, H. (2002). Visualization of Metadata Using the Supertable + Scatterplot. ISI, Rosenberg.
MADAME (2000). Comparative Evaluation of On-line Metadata Services and User Feedback, Methods for Access to Data and Metadata in Europe (MADAME): 111.
Monmonier, M., Gluck, M. (1994). "Focus Groups for Design Improvement in Dynamic Cartography." Cartography and Geographic Information Systems 21(1): 37-47.
Morgan, D. L. (1998). The Focus Group Guidebook. California, SAGE Publications.
Nielsen, J. (1993). Usability Engineering. Boston, AP Professional.
Redido-Cusi, D. (2002). Disseminating Philippine census data through the web. Geoinformation Processing. Enschede, ITC.
Rosario, G. E., Rundensteiner, E. A., Brown, D. C., Ward, M. O. (2003). Mapping Nominal Values to Numbers for Effective Visualization. Worcester, Worcester Polytechnic Institute.
Shneiderman, B. (1997). Designing the User Interface: Strategies for Effective Human-Computer Interaction, Addison-Wesley.
Spence, R. (2001). Information Visualization, Addison-Wesley.
Swayne, D. F., Buja, A. (1998). "Missing Data in Interactive High Dimensional Data Visualization." Computational Statistics 13(1): 15-26.
Timpf, S., Raubal, M., Kuhn, W. (1996). Experiences with Metadata. 7th Int. Symposium on Spatial Data Handling, SDH'96, Delft, Netherlands.
Turner, T. (2002). "What is Metadata?" Kaleidoscope 10(7).
Uhlenküken, C., Schmidt, B., Streit, U. (2000). "Visual exploration of high-dimensional spatial data: requirements and deficits." Computers & Geosciences 26: 77-85.
Wang, Y., Liu, Y., Chen, X., Chen, Y., Meng, L. (2001). "Adaptive Geovisualization - an approach towards the design of intelligent geovisualization systems." Journal of Geographical Sciences 11: 1-8.
Webster (1981). Webster's Third New International Dictionary.



Wilkinson, L. (1999). The Grammar of Graphics. New York, Springer-Verlag.
Yang, J., Peng, W., Ward, M. O., Rundensteiner, E. A. (2003). Interactive Hierarchical Dimension Ordering, Spacing and Filtering for Exploration of High Dimensional Datasets. Worcester, Worcester Polytechnic Institute.
Yang, J., Ward, M. O., Rundensteiner, E. A. (2003). "Interactive hierarchical displays: a general framework for visualization and exploration of large multivariate data sets." Computers & Geosciences 27: 265-283.

URLs

URL 1: http://webclient.alexandria.ucsb.edu/
URL 2: http://www.fgdc.gov/clearinghouse/clearinghouse.html
URL 3: http://www.geodata-info.dk/ig-ds.asp?LA=2
URL 4: http://commongis.jrc.it/index.html
URL 5: http://davis.wpi.edu/~xmdv/
URL 6: http://www.informatik.uni-halle.de/dbs/Research/VisDB/visdb.html
URL 7: http://www.geovistastudio.psu.edu/jsp/index.jsp
URL 8: http://www.research.att.com/areas/stat/xgobi/
URL 9: http://www.or.blm.gov/gis/resources/library.asp
URL 10: http://davis.wpi.edu/~xmdv/fileformats.html
URL 11: http://www.r-project.org/
URL 12: http://geodata.itc.nl/index.html



Appendices

Appendix A

USABILITY TESTING EXERCISE (25 minutes)

Getting familiar with the interface and the functions of the tool:

1. Open the interface: start XmdvTool.exe from (D:\UsabilityTesting\Xmdv5\WithoutDB\XmdvTool5.0).

2. Maximize the interface.

3. From the Toolbar, click the yellow ‘Open File’ tool button (see figure 1) or go to File\Open on the menu bar.

In the dialog window, navigate to the folder (D:\UsabilityTesting\Exercise\) and select the .okc file ‘AllMeta’. The default display of the data is a parallel coordinate plot (PCP). However, any of the display techniques (see Display Mode Buttons) can be selected at any time.

The XmdvTool 6.0a interface

4. Familiarize yourself with the meanings of the labels (for example, status and maintenanceFreq) displayed for the metadata elements (dimensions). Also familiarize yourself with the meanings of the minimum and maximum values displayed along the axes of the dimensions. The values between the minimum and maximum become visible on the message bar (Figure 1) as you move the mouse pointer along an axis on the display.



5. Experiment with the preset display colours from the ‘Colour Requester’ (View\Colour Requester). Once the Colour Requester dialog is open, click on the ‘Preset Themes’ button and select one of the themes. You can also open the ‘Colour Requester’ from the colourful button among the three tool buttons (figure 1).

6. When you are done with step 5, close the ‘Colour Requester’ dialog and open any other display technique to get a different view of the metadata collection. Select a display technique from the ‘Display Mode Buttons’ (see Figure 1). Note that as you move the mouse pointer over the display mode buttons, the name of the technique is shown on the Message Bar at the bottom of the main window. You can also go to View\Display Mode on the menu bar to open any display technique.

You also have an option to open two display techniques simultaneously by going to View\Auxiliary Display on the menu bar. The Auxiliary Display window pops up (Figure 2). Click the ‘Select a Display’ button at the top of the window to select one of the techniques. But do not bother selecting the ‘Hierarchical’ techniques nor the Treemap nor the Structure-Based Brush.

Do the brushing in the auxiliary window and observe the dynamically linked changes in the main window (do the reverse process also, and note the reactions on the other display).

7. Experiment with all four ‘flat’ display techniques in both the main window and the auxiliary window. Again, do not bother experimenting with the Hierarchical display techniques. Note: in order to display the metadata using the dimension stacking technique, you will have to reduce the number of displayed dimensions to five or fewer.

8. From the Brushes menu on the menu bar (Figure 1), open:

• The ‘Flat Glyphs Brush Toolbox’. Select Star Glyphs as your main display technique. Also open the corresponding ‘Glyph Key’ from the Tools menu; it shows the order of the dimensions/elements in angular degree units, measured anticlockwise from ‘3 o’clock’ as zero (the starting point). Brush the Star Glyph main display by using the ‘Flat Glyphs Brush Toolbox’. Observe the dynamically linked changes between the brush toolbox display and the main display.

• Then open the ‘Flat Dimension Stacking Brush Toolbox’ with the Dimension Stack as the main display. Also open the corresponding ‘Dimension Stacking Key’ from the Tools menu to identify the horizontal and vertical dimensions/elements of the metadata.

Brush the Dimension Stacking main display by using the ‘Flat Dimension Stacking Brush Toolbox’. Here also, observe the dynamically linked changes between the brush toolbox display and the main display.

Note that a brush toolbox does not exist for the Scatterplot Matrix display technique. Anyway, just like with Parallel Coordinate Plots, you can brush the metadata on the main display of the Scatterplot Matrix. Do not bother opening the brush toolbox for the Parallel Coordinates. It works similarly with brushing on the main display.

You can even brush simultaneously both the main display (of any technique) and the auxiliary display (of any technique) with a brush toolbox of any of the techniques.
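The angular ordering shown by the ‘Glyph Key’ in step 8 can be pictured with a small sketch. It assumes that the axes of a star glyph are spaced evenly around the circle, which is the usual construction but is an assumption here rather than a documented property of XmdvTool. The function names and the choice of four dimensions are illustrative only.

```python
import math


def glyph_axis_angles(dimensions):
    """Map each dimension to an angle in degrees, anticlockwise from 3 o'clock.

    Assumes evenly spaced axes (an illustrative assumption).
    """
    step = 360.0 / len(dimensions)
    return {name: index * step for index, name in enumerate(dimensions)}


def axis_endpoint(angle_degrees, length=1.0):
    """Return the (x, y) end point of an axis of the given length."""
    radians = math.radians(angle_degrees)
    return (length * math.cos(radians), length * math.sin(radians))


dimensions = ["status", "maintenanceFreq", "accessConstraints", "useConstraints"]
angles = glyph_axis_angles(dimensions)
for name, angle in angles.items():
    x, y = axis_endpoint(angle)
    print(f"{name}: {angle:.1f} degrees, end point ({x:.2f}, {y:.2f})")
```

With four dimensions the first axis points to 3 o'clock and the second to 12 o'clock, which matches the anticlockwise reading direction described for the key.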

9. Try to use the zooming, panning, and distortion interaction techniques to focus on interesting details of the display(s). Distortion can be activated from View\Distortion\Start. Note that the message on the ‘Message bar’ disappears once you ‘start’ the ‘Distortion’ interaction technique. You also cannot brush while the distortion is active. In order to activate the message bar again and be able to brush the metadata, go to View\Distortion\Stop. To reset the display from a distorted display, go to View\Distortion\Reset.

10. Do this small task:

Highlight datasets with scale values ranging from 1:10000 to 1:25000. The dimension referring to scale is labeled ‘spatialResolutionScale’ and the scale values on the message bar are displayed as 10000.00 and 25000.00, respectively. Open the ‘Brushed Data Value Dialog for Flat Displays’ from the ‘Brushes’ menu and see how many data sets have those values for scale.

Appendix B

USABILITY TESTING USER TASKS (40 minutes)

Scenario: A geoinformation management scientist wants to study the change of cadastral data updates in the states of Oregon and Washington in the United States for the period between 1960 and 2001. He selects an area in Lakeview County in Oregon as his pilot study area. This area lies between longitudes -126.00 and -117.00 and latitudes 41.30 and 46.70, which form a geographical bounding box. He needs cadastre data, transportation data, and administrative boundaries data that cover the geographical bounding box at spatial resolution scales between 1:10000 and 1:500000.

Task 1: Put yourself in the shoes of the scientist and try to filter the datasets that are suitable for him. Use the skills just acquired in the exercise and the information you got from the PowerPoint presentation about the working of the XmdvTool 6.0a system.

The first selection of the datasets has already been done for you from the metadata collection of the Bureau of Land Management, Oregon and Washington, in the United States. It was based on the topic categories (topicCategory) transportation data, municipal boundaries data, and cadastral data, and on the geographical area name of interest (ExtentAreaName), Lakeview County. The bureau’s metadata covers data from 14 organizations, most of which are branches of the main body. The result of the selection is in (D:\UsabilityTesting\TaskScenario). This is the data you are supposed to use for your tasks. Open the *.okc file ‘AllMeta’.



Your task is to filter the selection further, using one or more visualization techniques. The filtering should be based on:

• the correct geographical bounding box latitude range:
  SboundLat = min. of 41.30 latitude units
  NboundLat = max. of 46.70 latitude units

• the correct geographical bounding box longitude range:
  WBoundLong = min. of -126.00 longitude units
  EboundLong = max. of -117.00 longitude units

• the correct time period:
  ExtentTemporalBegin: 19940000.00 to 20040000.00

• the correct value range for the scale:
  spatialResolutionScale: 10000.00 to 500000.00
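The combined effect of the filtering criteria listed above can be sketched as a conjunction of range tests, one per dimension. This is only an illustration of the logic a test user applies with brushes; it is not how XmdvTool is implemented. The sample records are invented, None stands in for a ‘NULL’ value, and reading ‘min. of’ and ‘max. of’ as lower and upper bounds is one interpretation of the task.

```python
import math

# Range per dimension as (low, high); math.inf marks an open end.
CRITERIA = {
    "SboundLat": (41.30, math.inf),
    "NboundLat": (-math.inf, 46.70),
    "WBoundLong": (-126.00, math.inf),
    "EboundLong": (-math.inf, -117.00),
    "ExtentTemporalBegin": (19940000.0, 20040000.0),
    "spatialResolutionScale": (10000.0, 500000.0),
}


def passes(record, criteria=CRITERIA):
    """Return True when every dimension has a value inside its range."""
    for dimension, (low, high) in criteria.items():
        value = record.get(dimension)
        if value is None or not (low <= value <= high):
            return False
    return True


# Hypothetical metadata records, for illustration only.
records = [
    {"id": 1, "SboundLat": 42.0, "NboundLat": 43.0, "WBoundLong": -121.0,
     "EboundLong": -119.0, "ExtentTemporalBegin": 19980000.0,
     "spatialResolutionScale": 24000.0},
    {"id": 2, "SboundLat": 42.0, "NboundLat": 43.0, "WBoundLong": -121.0,
     "EboundLong": -119.0, "ExtentTemporalBegin": 19800000.0,  # too early
     "spatialResolutionScale": 24000.0},
    {"id": 3, "SboundLat": 42.0, "NboundLat": 43.0, "WBoundLong": -121.0,
     "EboundLong": -119.0, "ExtentTemporalBegin": 19980000.0,
     "spatialResolutionScale": None},  # missing scale
]

selected_ids = [record["id"] for record in records if passes(record)]
print(selected_ids)
```

Treating a missing value as a failed test is a deliberate choice in this sketch; a user who wants to keep items with missing values would need a separate rule for them.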

The text typed in bold is the notation of the metadata dimension/element names as they appear on the displays. The underlined text is the notation of the dimension values along the axis of a dimension as they appear on the ‘Message bar’ (see EXERCISE, Figure 1).

Take note of the ‘NULL’ values (missing data) and the ‘unknown’ values (completely unavailable data).

At the end of the task, check the accuracy of your result by opening the ‘Brushed Data Value Dialog for Flat Displays’ from the ‘Brushes’ menu. How many datasets did you filter? ………………

When you are through, do not disturb your brushed area. Leave it the way it is and proceed to the next task.

Scenario: From the result above, the scientist wants to select those data items that are updated (maintenanceFreq) ‘whenever it is necessary’ (asNeeded; see the nominal labels on the Parallel Coordinate display). He then scans through each of the items for information on:

• Contact persons for the metadata - contact
• Status of completion of the data - status
• The name of the responsible organizations for the data - responsibleParty, and
• Access constraints - accessConstraints

in order to start making plans for ordering.

TASK 2:

a. If the dimensions contact, status, responsibleParty, and accessConstraints are on the display, turn them off. Otherwise just continue:

i. Using the result of TASK 1 filter the datasets that have maintenanceFreq value ‘asNeeded’. How many datasets did you filter?................

Again, do not disturb your brushed area.

ii. Using the ID values of the filtered datasets, which can be seen in the ‘Brushed Data Value Dialog for Flat Displays’ from the Brushes menu, open the corresponding textual metadata *.htm files of five of the datasets from (D:\UsabilityTesting\Textual_Metadata). Scan through each of the files for information on contact persons (under Distribution Information), status of the dataset (under Identification Information), responsibleParty (referred to as ‘Originator’ under Identification Information), accessConstraints, and useConstraints (under Identification Information).

Again, do not disturb your brushed area.

b. Turn on again the dimensions accessConstraints, responsibleParty, status, and contact:



• Make sure that the brush area on these newly switched-on dimensions covers all the metadata items in order to keep the previous selection the same.

• Look for the same information as in 1,b. above, but this time use one or more of the visualization techniques, not the *.htm files. Do not use the ‘Brushed Data Dialog for Flat Displays’ either.

Take note of ‘NULL’ and ‘unknown’ values.

Short tasks:

TASK 3

Using the main display of the Dimension Stacking technique (remember to reduce the number of displayed dimensions to five or fewer) and its brush toolbox, highlight data items lying between a minimum of 40.00 latitude units (SBoundLat) and a maximum of 42.00 latitude units (NBoundLat). You can open the ‘Dimension Stacking Key’ to help you identify, on the main display, the horizontal and vertical dimensions/elements of the metadata. How many datasets did you highlight?.................

Maximize the brush area. Use the Parallel Coordinate Plot to achieve the same task. (You can open the ‘Parallel Coordinates Brush Toolbox’ if you want, but you will notice that the toolbox works in the same way as brushing on the main display of the technique.) How many datasets did you highlight?...................

Note the efficiency and effectiveness of each of the two techniques in achieving the task. At this stage, you can change the brush area if you want. The next task does not depend on it.

TASK 4

A. Highlight, using one brush, datasets with the spatialResolutionScale values 10000.00 and 250000.00. You will have to brush for 10000.00, check the result with the ‘Brushed Data Value Dialog for Flat Displays’ (‘Brushes’ menu) and then do the same thing for 250000.00.

B. Then highlight, ‘simultaneously’, datasets with the spatialResolutionScale values 10000.00 and 250000.00 by using two brushes. Also, open the ‘Brushed Data Value Dialog for Flat Displays’.

You should get similar results for both A and B: Was that the case?..................
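Why approaches A and B are expected to agree can be shown with a short sketch: two brushes applied at the same time select the union of what each brush would select on its own. The dataset IDs and scale values below are invented for illustration, and the helper function is hypothetical rather than part of XmdvTool.

```python
# Hypothetical dataset IDs mapped to spatialResolutionScale values.
scales = {101: 10000.0, 102: 12000.0, 103: 24000.0, 104: 250000.0, 105: 10000.0}


def brush_ids(values, low, high):
    """Return the IDs whose value lies inside the brush range [low, high]."""
    return {key for key, value in values.items() if low <= value <= high}


# Approach A: one brush used twice, with the two results combined afterwards.
first_pass = brush_ids(scales, 10000.0, 10000.0)
second_pass = brush_ids(scales, 250000.0, 250000.0)
approach_a = first_pass | second_pass

# Approach B: two brushes active at once; an item is selected if any brush covers it.
brushes = [(10000.0, 10000.0), (250000.0, 250000.0)]
approach_b = {key for key, value in scales.items()
              if any(low <= value <= high for low, high in brushes)}

print(sorted(approach_a), sorted(approach_b))
```

A single brush stretched from 10000.00 to 250000.00 would instead also pick up the intermediate values, which is why non-contiguous values need either two passes or two brushes.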

TASK 5

Check visually whether there is a positive, negative, or mixed correlation between the dimensions (metadata elements) dateStamp and ExtentTemporalBegin, within the ranges 18800000.00 to 20100000.00 (dateStamp) and 19700000.00 to 20100000.00 (ExtentTemporalBegin). Do so by using the Scatterplot Matrix and the Parallel Coordinates, one after the other. Note the informativeness of each of the techniques for the task.

If the status of the dataset is ‘complete’, do you expect the ExtentTemporalBegin date to be earlier or later than the dateStamp date?………..

Note that ‘dateStamp’ refers to the publication date of the metadata, while ‘ExtentTemporalBegin’ refers to the begin date of the temporal extent of the data. The two dimensions have a similar scale (note their minimum and maximum values on the corresponding axes on the displays).

Take note of ‘NULL’ and ‘unknown’ values, since you cannot use them to check the correlation.
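The visual judgement asked for in Task 5 has a numerical counterpart, sketched here with the Pearson correlation coefficient. The test users were not asked to compute anything; the sketch only shows why items with missing values have to be left out. The value pairs are invented, with None standing in for ‘NULL’ or ‘unknown’, and the dates follow the numeric notation used on the displays.

```python
from math import sqrt


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    count = len(pairs)
    mean_x = sum(x for x, _ in pairs) / count
    mean_y = sum(y for _, y in pairs) / count
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    spread_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    spread_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical (dateStamp, ExtentTemporalBegin) pairs.
rows = [
    (19980101.0, 19900101.0),
    (20010615.0, 19950101.0),
    (None, 19800101.0),        # missing dateStamp, cannot be used
    (20030301.0, 19990101.0),
    (19990101.0, None),        # missing ExtentTemporalBegin, cannot be used
]

# Keep only the items that have a value on both dimensions.
complete = [(x, y) for x, y in rows if x is not None and y is not None]
coefficient = pearson(complete)
print(len(complete), "usable items; correlation is",
      "positive" if coefficient > 0 else "negative or zero")
```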



Appendix C

QUESTIONNAIRE FOR USABILITY EVALUATION OF VISUALIZATION TECHNIQUES OF THE PROTOTYPE FOR GEOMETADATA

1. Among the four main display techniques (see illustrations), you used one of them, or you used one of them the most to complete Task 1. Why do you think you preferred that technique to the other three?

☐ I found the technique to be the most informative for completing the task accurately

☐ I felt I needed less effort to complete the task with it than I would with the others

☐ I just felt more comfortable using it than using the others

☐ I just liked it

☐ Other reason: …………………………………………………………………………………………………… …………………………………………………………………………………………………………………………

2. Among the three dynamic brushing mechanisms/ brush toolboxes (see illustrations) including brushing on the main display of the Parallel Coordinates or the Scatterplot Matrix, you used one of them, or you used one of them the most to complete Task 1. Why do you think you preferred that technique to the others?

☐ I found the technique to be the most informative for completing the task accurately

☐ I felt I needed less effort to complete the task with it than I would with the others

☐ I just felt more comfortable using it than using the others

☐ I just liked it

☐ Other reason: …………………………………………………………………………………………………… …………………………………………………………………………………………………………………………

3. On a scale of ‘1’ to ‘5’, in Task 2, did you find the approach of scanning through the textual metadata (see illustration) effective, efficient, and/or satisfying compared with the approach of using the visualization techniques to find the same information (‘1’ refers to ‘not effective’, ‘not efficient’, or ‘not satisfying’; ‘3’ refers to ‘moderately effective’, ‘moderately efficient’, or ‘moderately satisfying’; and ‘5’ refers to ‘very effective’, ‘very efficient’, or ‘very satisfying’)?

Effectiveness:

• Textual metadata            1 2 3 4 5
• Visualization techniques    1 2 3 4 5

Efficiency:

• Textual metadata            1 2 3 4 5
• Visualization techniques    1 2 3 4 5

Satisfaction:

• Textual metadata            1 2 3 4 5
• Visualization techniques    1 2 3 4 5

Remarks ……………………………………………………………………………………………………………… …………………………………………………………………………………………………………………………

4. On a scale of ‘1’ to ‘5’, in Task 3, how informative did you find each of these two ‘main’ display techniques (‘1’ refers to ‘not informative’, ‘3’ refers to ‘moderately informative’, and ‘5’ refers to ‘very informative’)?

• Dimension Stacking main display      1 2 3 4 5
• Parallel Coordinates main display    1 2 3 4 5

5. On a scale of ‘1’ to ‘5’, in Task 3, how do you rate these two brushing mechanisms in terms of effectiveness and efficiency (‘1’ refers to ‘not effective’ or ‘not efficient’, ‘3’ refers to ‘moderately effective’ or ‘moderately efficient’, and ‘5’ refers to ‘very effective’ or ‘very efficient’)?

Effectiveness:

• Dimension Stacking Brush Toolbox    1 2 3 4 5
• Parallel Coordinates Brush Toolbox (or brushing on the main display of the Parallel Coordinates)    1 2 3 4 5

Efficiency:

• Dimension Stacking Brush Toolbox    1 2 3 4 5
• Parallel Coordinates Brush Toolbox (or brushing on the main display of the Parallel Coordinates)    1 2 3 4 5

6. On a scale of ‘1’ to ‘5’, how comfortable did you feel with each of these two techniques in achieving Task 3 (‘1’ refers to ‘not comfortable with’, ‘3’ refers to ‘moderately comfortable with’, and ‘5’ refers to ‘very comfortable with’)?

• Dimension Stacking main display & Brush Toolbox    1 2 3 4 5
• Parallel Coordinates main display & Brush Toolbox (or just the main display)    1 2 3 4 5

Please explain your answer briefly ………………………………………………………………………………… …………………………………………………………………………………………………………………………

7. In Task 4, you used two approaches (one brush used twice to get complementary results, and two brushes to get the result all at once). On a scale similar to that of question 5, how do you rate the two approaches for selecting datasets with ‘non-contiguous’ values on a dimension axis (for example, the values 10000 and 250000, where the metadata you used has other values such as 12000 and 24000 between them on the spatialResolutionScale axis)?

Effectiveness:

• One brush - two times             1 2 3 4 5
• Two brushes - at the same time    1 2 3 4 5

Efficiency:

• One brush - two times             1 2 3 4 5
• Two brushes - at the same time    1 2 3 4 5

………………………………………………………………………………………………………………………… …………………………………………………………………………………………………………………………

Can you suggest any other search criteria where a user might need to use more than one brush at the same time (or the same brush more than once) to select datasets?

Please explain briefly ……………………………………………………………………………………………… …………………………………………………………………………………………………………………………

8. On a scale of ‘1’ to ‘5’ (‘1’ refers to ‘No’, ‘3’ refers to ‘Undecided’, and ‘5’ refers to ‘Yes’), do you think checking the ‘correlation’ between some geographical metadata elements is important for geographical metadata search/exploration purposes? An example could be the correlation between spatialResolutionScale and the number of geometry objects (number of geometry primitives) in a dataset; that is, for example, whether a topographic map with a large scale has fewer built-up areas and roads than a map with a small scale, or vice versa.

1 2 3 4 5

Please explain your answer briefly ………………………………………………………………………………… …………………………………………………………………………………………………………………………

On a scale similar to that of question 4, how do you rate each of these two techniques in showing the correlation between dateStamp and spatialResolutionScale?

• Scatterplot Matrix      1 2 3 4 5
• Parallel Coordinates    1 2 3 4 5

9. Did you use the ‘Distortion’ technique?

☐ Yes ☐ No

If the answer is ‘Yes’, on a scale similar to that of question 8, did the distortion make the display more effective?

1 2 3 4 5

10. On a scale similar to that of question 8, on the Parallel Coordinates main display, do you think that visually displaying the nominal labels (see illustration) for nominal values would be more effective than displaying the numeric codes of those values, as has been done in the prototype?

1 2 3 4 5

11. On a scale similar to that of question 3, if you were to compare the prototype to existing geographical metadata services in selecting geographical data by using metadata, how would you rate it in terms of effectiveness, efficiency, and the satisfaction with which you executed your tasks?

• Effectiveness    1 2 3 4 5
• Efficiency       1 2 3 4 5
• Satisfaction     1 2 3 4 5

Remarks ……………………………………………………………………………………………………………… …………………………………………………………………………………………………………………………

12. On a scale similar to that of question 3, in general, how do you rate ‘visualization’ of geographical metadata in terms of effectiveness, efficiency, and the satisfaction with which you executed your tasks?

• Effectiveness    1 2 3 4 5
• Efficiency       1 2 3 4 5
• Satisfaction     1 2 3 4 5

Remarks ……………………………………………………………………………………………………………… …………………………………………………………………………………………………………………………

13. Please add here any additional comments about the prototype, the individual visualization techniques in the prototype, and/or the concept of visualization of geographical metadata.

………………………………………………………………………………………………………………………… …………………………………………………………………………………………………………………………

THANK YOU VERY MUCH FOR YOUR PARTICIPATION!!! If you have more comments, please let me know when I can come to collect or discuss them. Here is my e-mail; [email protected].

Appendix D

NOMINAL-TO-NUMERIC CONVERSION TABLES FOR THE SECOND EVALUATION


Each table lists the numeric value followed by the nominal value it replaces.

hierarchyLevel:
      1.4065409    series
      0.7064138    dataset
     -1.1788889    collectionSession
    -10.2000000    NULL

contact:
      8.8138960    B Gravenmier
      0.2124902    S Frazier
     -2.9044800    P Keller
     -4.1139040    J Thompson
    -10.2000000    NULL

responsibleParty:
     0.07733029    BLM
     0.33475654    UniversityOfMontana
     0.32475654    USFandWS
     0.29424045    USGSWaterResources
     0.27487058    BLM_DistrictOffices
     0.26936159    USGS
     0.24563642    BLM_ORWA
     0.22872646    BLM Lakeview_Prineville_Burns_ValeDistrict
     0.21872646    NULL
     0.20540764    BLMandUSGS
     0.18139831    DOGAMI/BLM
     0.18003582    BLM_LakeviewResourceArea
     0.11269638    RegionalEcosystemOffice
    -0.08421639    BLM_LakeviewDistrict
    -5.39581692    BLM_OR

status:
      0.9606211    onGoing
     -0.6848388    complete
     -3.6969123    planned
     -1.2000000    NULL

maintenanceFreq:
     3.00000000    unknown
     2.27566392    continual
     1.95134077    annually
     1.17301293    asNeeded
     0.07114531    notPlanned
    -0.08036869    irregular
    -8.20000000    NULL

accessConstraints:
     -0.1889403    noRestrictions
     -0.3637383    restricted
    -10.2000000    NULL

useConstraints:
      2.0628790    otherRestrictions
     -0.4351463    noRestrictions
     -0.6980919    restricted
    -10.2000000    NULL

spatialRepresentType:
      0.3589607    point
      0.3390596    vector
     -3.5496962    grid
    -13.2000000    NULL

spatialResolutionScale:
       996546.0    range
       720382.5    scaleIndependent
        24000.0    24000
       126720.0    126720
        10000.0    10000
       100000.0    100000
       500000.0    500000
        12000.0    12000
      -250000.0    250000
         4800.0    4800
       -44394.9    NULL

Projection:
      3.1781755    UTM1927Zone1&LambertConformalConic120.5
      2.0490184    ARCcoordinateSystem&UTM1927Zones10and11
      1.6681704    UTM1927_Zones10and11
      1.3708829    LambertConformalConic120.5
     -0.5504681    UTM1927_Zone10
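The use of such a conversion table can be sketched as a two-way lookup: nominal labels are encoded as numbers for display, and a number read from the message bar is decoded back to its label. The sketch takes the maintenanceFreq values from the table in this appendix; the function names and the tolerance used when matching numbers are assumptions made for illustration.

```python
# maintenanceFreq values as listed in this appendix.
MAINTENANCE_FREQ = {
    "unknown": 3.00000000,
    "continual": 2.27566392,
    "annually": 1.95134077,
    "asNeeded": 1.17301293,
    "notPlanned": 0.07114531,
    "irregular": -0.08036869,
    "NULL": -8.20000000,
}


def encode(label, table):
    """Return the numeric value that represents a nominal label."""
    return table[label]


def decode(value, table, tolerance=1e-6):
    """Return the nominal label for a numeric value, or None if nothing matches."""
    for label, code in table.items():
        if abs(code - value) <= tolerance:
            return label
    return None


print(encode("asNeeded", MAINTENANCE_FREQ))
print(decode(1.17301293, MAINTENANCE_FREQ))
```

Matching with a tolerance rather than exact equality is a precaution in this sketch, since values shown on a display may be rounded.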

Appendix E

USABILITY TESTING COMMENTS AND OBSERVATIONS

1. Specific visualization techniques and other tools
   a. The four main display techniques and the three brush toolboxes
      i. Dimension Stacking (DS)
         1. The main display was non-intuitive to all the test users.
         2. A test user indicated that the display was only understandable when just two dimensions were open.
         3. Most test users made the mistake of counting the number of selected metadata items from the main display.
         4. The brush toolbox was intuitive to most test users.
      ii. Star Glyphs
         1. The main display was non-informative to most test users.
         2. A test user indicated that the visual variable that caught his eye was the shape of the glyphs. Because he could not gather any information from the shape of a star glyph, that property of the glyphs is not usable.
         3. It was the only main display showing all metadata items (selected and non-selected) individually.
         4. Selecting a metadata item from the main display automatically selected similar items on the same display. However, it was not clear what thresholds were set on the selected item’s dimension values to determine its similarity to others.
         5. A test user thought the brush toolbox would be more usable if it showed the individual metadata items all plotted in one glyph space, as with the parallel coordinate plots (PCPs).

      iii. Scatterplot matrix
         1. Most users found the Scatterplot matrix a better correlator of metadata dimensions.
         2. Only a few users thought a check of correlation between metadata elements was useful for data searches with metadata.
      iv. Parallel coordinate plots
         1. It was considered the most usable main display of all.
         2. Test users made the mistake of counting the number of selected metadata items from the main display.
         3. It was not as good as the Scatterplot matrix for showing correlations.

2. Other tools and techniques
   i. The ‘Brushed Data Value Dialogue for Flat Displays’ was regarded as a useful feature, especially for numeric dimensions, for example, spatialResolutionScale.
   ii. The possibility to re-order the dimensions and/or turn them on/off was regarded as a highly usable feature.
   iii. For efficiency’s sake, test users suggested a possibility to turn all dimensions off or all dimensions on at once.
   iv. There is no distortion in the vertical direction of the PCP display.
   v. Zooming became ineffective when a whole dimension axis had to be zoomed while the user was interested in only a small range of values. Interactive zooming by drawing a rectangle was therefore recommended.
   vi. There were complaints about not being able to undo a brush change, a step in the distortion operation, or a colour selected in the ‘Colour Requester’ dialog. Not even the keyboard shortcut ‘Ctrl’ + ‘Z’ allowed a user to go back in a session.
   vii. Test users got lost while zoomed in on a brush area. They could not tell whether they were closer to the minimum or the maximum boundary of the brush, and so adjusted the wrong boundary, which could not be undone.
   viii. Alphanumeric zooming was very efficient for coarse zooming.

3. Data display by the techniques
   a. The order of nominal values on each of the numeric scales, as determined by correspondence analysis, was said to be non-informative.
   b. The quantification of the nominal values on the numeric scales by correspondence analysis was not at all intuitive to the users.
   c. The display of nominal labels was fully supported by everyone.
   d. The message (on the message bar) should be bigger.
   e. The sensitivity of the message bar was annoyingly too low for efficient querying.
   f. An alphanumeric interface was suggested for quick numeric value queries.
   g. A toggle interface was also suggested for nominal dimensions with only a few choices, such as accessConstraints, which had only three options in the metadata set that was used: ‘restricted’, ‘noRestrictions’, and ‘NULL’.
   h. Plotting ‘NULL’ values away from the other values, as suggested by Edsall and Roedler (2002), had the disadvantage of forcing the other values to cluster on one side of the axis, thereby necessitating zooming, panning, and/or distortion.
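The clustering effect noted in item h can be illustrated with the accessConstraints values listed in the preceding appendix. The sketch below computes what share of the axis the non-NULL values occupy once NULL is placed far away. The function name occupied_fraction is hypothetical, and the calculation assumes the axis spans exactly from the smallest to the largest plotted value.

```python
def occupied_fraction(values, null_value):
    """Share of the axis range that the non-NULL values span.

    Assumes the axis runs from min(values) to max(values).
    """
    non_null = [v for v in values if v != null_value]
    full_range = max(values) - min(values)
    return (max(non_null) - min(non_null)) / full_range


# accessConstraints: noRestrictions, restricted, NULL
access = [-0.1889403, -0.3637383, -10.2]
frac = occupied_fraction(access, null_value=-10.2)
print(f"{frac:.1%} of the axis holds the non-NULL values")
```

For this dimension the two meaningful values share less than 2% of the axis length, which is consistent with the test users having to zoom, pan, or distort the axis to tell them apart.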

4. General functionality and visual variability of the interface
   a. A few users commented that it was a drawback not to be able to save the brushed (selected) items with only a few of the metadata elements (dimensions) and still be able to attach the other metadata elements (dimensions) when the saved items are loaded again.
   b. Two users concluded that the system was not very compatible with Windows systems because the operator has to click on a menu to open it; a menu cannot be opened by just moving the mouse pointer over the menu bar.
   c. One user insisted that a menu for checking the windows open at a particular moment would be more efficient than going to the task bar.
   d. The menu name ‘Tools’ and the ‘Toolbox’ menu items in the ‘Brushes’ menu were confusing to the users. All the users got lost most of the time by opening the ‘Tools’ menu when they wanted to open one of the ‘Toolbox’ items from the ‘Brushes’ menu.
   e. The use of the ‘Shift’ key to select one metadata item or a group of metadata items in a particular value range was considered effective and a good convention.
   f. The ‘Colour Requester’ dialog was very usable. In total, three different themes were used, and each user claimed to be more comfortable with the theme he had selected. One of the users even changed the colour of the brushed metadata items to the default colour of the non-brushed items (red to green) because he thought the colour of the favoured items (brushed items) should abide by the convention ‘green’ for ‘Yes’ and not the opposite convention ‘red’. He also changed the colour of the non-brushed items from ‘red’ to the brush area’s colour so that any non-selected items falling partly within the brush area would blend into the brush area colour and thereby increase the contrast of the brushed items.



5. System Bugs
   a. The system was unable to open the DS main display when more than 5 or more than 6 dimensions were displayed.
   b. On the Star Glyphs brush toolbox, once the minimum and maximum boundaries were brought together to one dimension value, the minimum could not be changed any more unless another technique was used.
   c. Sometimes the ‘Shift’ key became inactive for selecting contiguous metadata items or single items on the PCP main display or the Scatterplot matrix main display.
   d. The response time of the system increased when the user was coarsely zoomed in and when many windows were open at the same time.

Appendix F

AN EXAMPLE OF A DOCUMENT SHOWING IDENTIFICATION INFORMATION FROM THE FGDC METADATA STANDARD

Mineral Information Layer of Oregon (Point)

Identification_Information
Data_Quality_Information
Spatial_Data_Organization_Information
Spatial_Reference_Information
Entity_and_Attribute_Information
Distribution_Information
Metadata_Reference_Information

Identification Information
Section Index

Citation:
    Citation Information:
        Originator: DOGAMI/BLM
        Publication Date: Unknown
        Publication Time:
        Title: Mineral Information Layer of Oregon (Point)
        Edition:
        Geospatial Data Presentation Form:
        Series Information:
            Series Name:
            Issue Identification:
        Publication Information:
            Publication Place:
            Publisher:
        Other Citation Details:
        Online Linkage:
        Larger Work Citation:

Description:
    Abstract: This theme shows the distribution of mines and prospects in the State of Oregon.
    Purpose: Resource Management Planning
    Supplemental Information:
        BLM (Bureau of Land Management), DOGAMI (Oregon Department of Geology and Mineral Industries), MILOC (Mineral Information Layer of Oregon Counties), USBM (United States Bureau of Mines), USFS (United States Forest Service), USGS (United States Geologic Survey), USBM MILS (U.S. Bureau of Mines Mineral Information Layer by State), USGS CRIB/MRDS (U. S. Geological Survey Claim Recordation Information Base/Mine Record Data Set), DOGAMI MLR (Mined Land Reclamation file)

Time Period of Content:
    Time Period Information:
        Range of Dates/Times:
            Beginning Date: Unknown
            Beginning Time: Unknown
            Ending Date: Unknown
            Ending Time: Unknown
    Currentness Reference: publication date

Status:
    Progress: Complete
    Maintenance and Update Frequency: Unknown

Spatial Domain:
    Bounding Coordinates:
        West Bounding Coordinate: -124.5
        East Bounding Coordinate: -116
        North Bounding Coordinate: 46.25
        South Bounding Coordinate: 42
    Data Set G-Polygon:
        Data Set G-Polygon Outer G-Ring:
            G-Ring:
        Data Set G-Polygon Exclusion G-Ring:
            G-Ring:

Keywords:
    Theme:
        Theme Keyword Thesaurus: None
        Theme Keyword: Minerals
        Theme Keyword: Mines
        Theme Keyword: Prospects
        Theme Keyword: Geology
    Place:
        Place Keyword Thesaurus: None
        Place Keyword: Oregon
    Stratum:
        Stratum Keyword Thesaurus:
        Stratum Keyword:
    Temporal:
        Temporal Keyword Thesaurus:
        Temporal Keyword:

Access Constraints: Discretionary, contains no sensitive information - generally considered releasable.
Use Constraints: None

Point of Contact:
    Contact Information:
        Contact Person Primary:
            Contact Person:
            Contact Organization:
        Contact Position:
        Contact Address:
            Address Type:
            Address:
            City:
            State or Province:
            Postal Code:
            Country:
        Contact Voice Telephone:
        Contact TDD/TTY Telephone:
        Contact Facsimile Telephone:
        Contact Electronic Mail Address:
        Hours of Service:
        Contact Instructions:

Browse Graphic:
    Browse Graphic File Name:
    Browse Graphic File Description:
    Browse Graphic File Type:

Data Set Credit:

Security Information:
    Security Classification System:
    Security Classification:
    Security Handling Description:

Native Data Set Environment: Arc/Info; AIX/UNIX
Cross Reference:
Web Published:

Data_Quality_Information….
Spatial_Data_Organization_Information….
Spatial_Reference_Information….
Entity_and_Attribute_Information…
Distribution_Information…
Metadata_Reference_Information….
