Exploration Techniques for Stranded Customer Intelligence Collateral
An MS-ISE Final Project
Michael Hay – April 2014


Table of Contents

Abstract
1. Introduction
2. Fact-Finding and Results
   Fact-Finding Process
   Fact-Finding Results
      Content History
      Content Experience
      Content Outcomes
      Other Data
   Resulting Use Case(s) and Reflections
3. Relevant Works
4. Technology Selection, Utilized Data, and Prototype System
   Technology Selection
   Utilized Data
   Prototype System
5. Testing Approach, Analysis Techniques, and Results
   Testing Approach
   Analysis Techniques
   Results
6. Conclusions and Future Work
References
Appendixes
   Appendix 1 – Generic Interview Questionnaire & Example Interview Outcome
   Appendix 2 – Usability Test Questionnaire and Task List
   Appendix 3 – Exemplary Customer Interview Materials & Related JSON Data
   Appendix 4 – Prototype Exemplary Source Code
   Appendix 5 – Survey Questions and Responses
   Appendix 6 – Raw Data with Descriptive Statistics and Key Statistical Tests


Abstract

A basic system that reveals new insights from existing customer interview material is presented. Construction of the system begins with a fact-finding process that discerns key required use cases and tasks from the target audience. A coarse prototype consisting of four alternative visual representations of a set of customer interview material is implemented. The target audience tests these alternatives, with the time to complete major test elements recorded. Measured results are compiled and compared to illustrate that not all visualizations result in optimal user performance. Finally, findings are reported that range from relatively simple discoveries (i.e., structured file naming schemas) to complex reflections (i.e., building an audio search engine with key word visualization may improve productivity).

1. Introduction

In the Information Technology (IT) markets, competitive tactics and business model changes are mandating reevaluations of how companies interact with their users. One impacted IT sub-market, computer data storage, is going through sweeping changes. An example is a movement away from the Standard Consumption Model1 to the Utility Consumption Model2. A second example, within the Standard Consumption Model, is the procurement of complete systems instead of just data storage; in fact, users are purchasing complete systems inclusive of applications, management software, computer servers, data networking, and data storage. How can an organization detect macro changes and micro details leading to relevant portfolio changes that meet market needs?

One obvious approach to discerning change at both macro and micro levels is to have Planners3 employ a very traditional fact-finding technique called a user interview [1]. These engagements target a representative set of users in an effort to understand and record their needs. From this set of recorded data Planners construct use cases, build prototypes, and define requirements that are used in the development of an Information System. However, experience has shown that Artifacts4 derived from past fact-finding efforts are often stranded and forgotten. This results in Planners re-visiting the same topics, reengaging with the same customers while unaware of past interactions, failing to spot trends over time, lacking awareness of specific geographic needs, and so on. To understand why Artifacts are forgotten, and to potentially explore how they may become un-stranded, this project began by performing its own fact-finding activity [2].

1 The IT Standard Consumption Model is defined as an approach whereby organizations purchase equipment from a vendor and deploy it within a facility they own or lease.
2 The IT Utility Consumption Model is defined as a way for organizations to procure IT functions in a manner equivalent to utilities like power and water. In this method consumers do not directly purchase equipment, but instead pay for a service on a periodic basis, for example 10 Gigabytes of data storage for $50 USD per year.
3 Planners are people who have one or more of the following archetypical roles: Product Managers, Product Marketing Managers, Product Designers, Researchers, and Engineers.
4 Artifacts include but aren’t limited to PDF documents, presentation files, audio recordings, flat text documents, etc.


Planners, the selected target audience, were interviewed to glean and document their most pressing needs. Based upon these interviews, a proper use case was derived and a prototype system constructed implementing the most pressing needs. The prototype implementation focused on the presentation and visualization aspects of a system. Ultimately, using usability testing techniques, a quantitative evaluation of user performance, measured as time to complete a task, was made by comparing four alternative visualization approaches.

This paper is organized in the following manner: processes and results of the fact-finding activity are reported in Section 2, including some hints at the analysis approach utilized. Section 3 reflects on findings from the relevant literature and suggests how they might impact the project. Section 4 details the data set, preparation of the data set, and basic construction of the prototype system. In Section 5 the usability testing approach is reported along with the results of the measurements comparing the alternatives. Lastly, conclusions and plans for future work are covered in Section 6.

2. Fact-Finding and Results

Fact-Finding Process

Is any evidence available proving the author’s assertion regarding stranded Artifacts? Further, how do Planners want to experience and interact with these Artifacts, and what do they expect to do with them? In an effort to answer these key questions, a fact-finding activity was performed with a focus on interviewing Planners. Specifically, 16 interviews were conducted from February 10, 2014 to March 14, 2014; each session lasted between 30 and 45 minutes and covered four categories. During each interview the author captured notes, including quotes and distillations of ideas, for each interviewee. (The Generic Interview Questionnaire and an Example Interview Outcome are detailed in Appendix 1.) After concluding all interviews, the author read and analyzed the results by recording answers into Microsoft Excel. Where answers to questions were easily discernable (e.g., yes or no) the results were quickly recorded, but detecting more complex concepts required deeper reading that led to a transformation of concepts into summarizing key words5. For example, where one interviewee might say, “…[if] I had the index or list of questions then I could at least see…” and another, “…if you have the experience of going to a public library you can do searches on anything…”, both attendees’ inputs were interpreted as a single concept assigned the key words “Index/Search.” With all interviews analyzed, results were compiled into a series of tables and graphs. These visual compilations were used to conclude whether the original assertion regarding stranded Artifacts was correct, and to uncover Planners’ preferred experience of the Artifacts. The latter concept, uncovering the users’ preferred experience, ultimately led to a key use case the author leveraged to construct the prototype system.

5 Whitten and Bentley don’t suggest that there are any hard and fast rules for distilling ideas into discernable requirements or use cases. Instead they imply that this ability comes both through experience as a skilled systems analyst and through techniques like Joint Requirements Planning, as well as via processes like brainstorming [1].


Fact-Finding Results

Content History

Were most Planners interviewed aware of the results of past customer interviews and the Artifacts representing them? Many (11 out of 16) answered yes or partially yes, they were aware. However, a deeper look across the body of interviewees revealed varying descriptions of at least location, type of interview, and scope. For example, participant 1 said, “I know some, but there are a bunch that haven’t been re-aggregated. So I don’t know where everything is.” While participant 3 stated, “I know there is material and I’ve looked for it and couldn’t find it.” These two examples suggest awareness of the Artifacts, but also that there are more Artifacts than the participants knew of. A summary of the findings for this category is presented in Figure 2-1, with Q.a. dealing with Artifact awareness and Q.b. dealing with the acceptability of Artifact format. While it would be easy to assume, due to the number of positive and partially positive answers, that there was complete awareness of the Artifacts, the actual interview data doesn’t support this conclusion. Instead the best conclusion is that more than a super majority, or 87.5%, of the interviewees claimed to understand that what they knew represented a subset of all available Artifacts. Therefore, the assertion that Artifacts are stranded should be considered reasonable. Similarly, 75% of those interviewed found that the format of existing customer interview Artifacts either partially met or outright failed to meet their needs. Here, one can assume that additional ways to consume, experience and explore these Artifacts are highly desirable.

              Q.a.   Q.b.
Yes             2      3
Partial Yes     9      6
Partial No      0      0
No              5      6
N/A             0      1

Figure 2-1 Summary of findings for awareness of Artifacts & desirability of customer study format (bar chart omitted; underlying counts shown above).

Content Experience

This section of the questionnaire investigated different ways Planners might want to interact with the Artifacts. Unlike the previous case, it was not possible to discern a positive, somewhat positive, somewhat negative or negative answer to a question. Instead, an understanding of what kinds of use cases and capabilities are needed came from the following steps.

1. Qualitatively assign conceptual key words to statements representing ideas,
2. Read answers to each question across all interviews, matching defined key words to related concepts and ideas,


3. Evaluate the frequency of occurrence of key words in a bar graph and associated table (see Figure 2-2), and
4. Correlate the key word to a usage pattern or capability likely needed in the prototype.

While every category could be reported on, only the most frequent are reported here, in order of frequency. The most frequent concept, counting both direct and indirect references, is the idea of reporting the number of occurrences of key terms. Specifically, the concept was directly repeated nineteen times and indirectly (note 6) another nine times, for a total of 28 instances. By both direct and indirect measures the next most important concept was moving from abstracted customer interview data to detailed Artifacts, reported at 14 and 5 (note 7) occurrences respectively, for a total of 19. Another interesting piece of information, or data access pattern, is Index/Search, which suggested that users wanted to invoke a “Google-like” approach to find the right set of Artifacts for deeper exploration. Finally, the last facet of relevant information gleaned is the type of system on which users would want to experience the prototype. Here, the highest frequency of reference was a laptop platform. All remaining key words are possibly of interest for the reader to explore, but were not considered when constructing the prototype.

6 Here “indirectly” means that the concepts “Key-Term | Time,” “Key-Term | Vertical,” and “Key-Term | Geo” can also be counted as “Key-Term” and are therefore added to the frequency of Key-Term.
7 Note that A to D & D to A represents Abstract to Detailed & Detailed to Abstract, meaning the interviewees desired both patterns for accessing the Artifacts. This suggests the interviewees would at least be partially satisfied by a pattern that starts from abstract summaries and moves to individual details or Artifacts.

Figure 2-2 Key word frequency graph relating how Planners would like to experience user interview collateral (bar chart omitted).


Content Outcomes

What do Planners want to do with customer interview materials, or the Artifacts? This section investigated exactly this point. Again, the same process described in the Content Experience section, assigning key words and counting their frequency, was employed. Figure 2-3 graphically reports the findings of this section of the questionnaire.

Findings from the body of interviews were not surprising to the author. Specifically, the most commonly occurring key words representing important concepts include usage of the collateral for requirements/planning (freq. = 12), development of a personal strategy (freq. = 11), validation of plans or requirements (freq. = 8), understanding the customer strategy (freq. = 6), and growing a repository of artifacts (freq. = 5). Given that those interviewed all had jobs that fell somewhere within the domains of product management, strategic product planning, product marketing, and engineering, the results aligned well with the kinds of functions a Planner would employ. While it is important to understand the motivation of the Planners, results from this section did not prove impactful to the prototype system.

Figure 2-3 Key words summarizing what interviewees were most likely to do with the Artifacts (bar chart omitted).


Other Data

Beyond customer interview materials, are there other data sources that should be included in Planners’ work outcomes? The final section of the questionnaire allowed the interviewees to grapple exactly with this question. Interestingly, a consistent requirement emerged from the participants: a desire to consume data and information about their competitors in the market. In many cases the need for competitive information was expressed in combination with another topic, such as competitive information and technology trends (shown in Figure 2-4 as Competitive + Tech. Trends). Additional needs were expressed to combine competitive information more generally with other information like customer trends. For example, participant 4 stated, “[Competitive information should be triangulated] with what the analysts are seeing and also [with] what our customers are thinking. To me, [with this combination] we should be so far ahead of the financial analysts.” This clearly illustrates that interviewees believed there was value in coupling data sources together. Like the previous section, results here were not reflected in the prototype system and are instead discussed as a point of future work.

Figure 2-4 Report of key words for other types of data desired beyond customer interview collateral (bar chart omitted).

Resulting Use Case(s) and Reflections

Given the time scale of the project, the author hoped the fact-finding activity would result in a small set of use cases. Fortunately, the results from fact-finding easily supported the author’s hope, revealing a core use case desirable for Planners. Therefore, by reviewing the Content Experience results and drawing on the author’s experience, it was relatively easy to construct a single use case for prototyping, see Table 2-1.

Use case name: Abstract to detailed exploration

1. User Action: User logs in.
   System Response: A screen that includes more than one alternative visual experience renders.
2. User Action: User clicks on one alternative.
   System Response: Causes the alternative to run/start; each alternative includes:
   • an abstracted visualization of the entire set of customer interview materials, and
   • relevant controls to help the user traverse from the abstracted visualization of all interviews to one or more specific customer interviews.
3. User Action: User performs a task designed to cause them to traverse from an abstracted visualization to one or more specific customer interview documents.
   System Response: Produces a summary of an individual interview including at least one URL (Uniform Resource Locator) pointing to a specific customer interview document.
4. User Action: User clicks on the URL(s).
   System Response: Causes the referenced document(s) to download.

Table 2-1 Basic use case, Abstract to detailed exploration, extrapolated from the results of the fact-finding process.

For this use case it was assumed that all alternative visualization approaches would provide participants some form of key term visualization, systematic control, and a facility to download discrete Artifacts via a URL. This implies that the User Actions and associated System Responses for steps 2, 3 and 4 are intentionally generalized. Practically speaking, use case generalization is a sound best practice, minimizing development effort while maximizing capability. In fact, in the author’s experience, use cases may be developed in an object-oriented manner so that one use case can “call” another (a minimal sketch follows). Use case interdependencies and hierarchies, expressed through object-oriented ideals, help Planners better develop applications that meet user expectations without undue design and development burdens.
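As an illustration of that object-oriented framing, the sketch below shows one use case “calling” another. The class names and wiring are hypothetical, invented for this example rather than taken from the project’s prototype.

```python
class UseCase:
    """Base class for use cases; subclasses implement run()."""
    def run(self, context: dict) -> None:
        raise NotImplementedError

class DownloadArtifact(UseCase):
    """Generalized step 4 of Table 2-1: download the referenced document."""
    def run(self, context: dict) -> None:
        print(f"Downloading {context['url']}")

class AbstractToDetailedExploration(UseCase):
    """Steps 1-3 of Table 2-1, ending in a 'call' to DownloadArtifact."""
    def run(self, context: dict) -> None:
        # Rendering the abstract visualization and resolving a specific
        # interview summary to a URL would happen here; this file name
        # follows the Table 4-3 schema.
        context["url"] = ("20070202-AMER-USA-CALIFORNIA-PALO ALTO-LEGAL-"
                          "Content Services-Heller Ehrman-Interview.txt")
        DownloadArtifact().run(context)  # one use case calls another

AbstractToDetailedExploration().run({})
```

The point is structural: the generalized download behavior is written once and reused by any use case that ends in retrieving an Artifact.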

3. Relevant Works

While processes, reports and visualizations of data from user surveys or polls are well-oiled machines today, the idea of automatically leveraging unstructured data Artifacts to design Information Systems appears not to have been properly studied. Given the success of product companies like Apple Inc., which must implement sound fact-finding and planning methods to achieve their stated corporate and financial goals, this was quite a surprising finding [3]. To be clear, the author is not asserting that domain-specific or general-purpose mining, analysis and visualization of unstructured data like text, audio, and images are not well studied. Instead, the author is stating that the application of one or more of these techniques to plan Information Systems is not readily available in the literature, and may not be well studied in detail. Therefore, the author assembled a set of seemingly unrelated reports and documents across a broad set of topics.


In their work to analyze and visualize mobile Call Detail Records (CDR), Blaas et al. provided a method to approach the problem: discern key use cases, construct a prototype system, measure the results and study the findings [2]. An important need arising from the interviews conducted in this project is key term or word frequency analysis and visualization. While the author did not employ advanced word frequency analysis techniques, a broad understanding of the topic was needed, and Baron et al. supplied that [4]. A repeated theme in the interviews is a nearly explicit requirement to speed up or save time when a Planner needs to engage the Artifacts for their objectives and activities. This suggests that manual efforts to organize the Artifacts were to be minimized. In this spirit, Tanner and Zhou from Lexis-Nexis provided insight on the idea of automatic content organization based upon key term analysis [5]. However, when generating the prototype system their structure and methods were not used. Furthermore, their approach to usability testing of their prototype implementation did not include a quantitative view of user performance. Other works were helpful in providing ideas for visualizing data, leveraging emerging Big Data visualization techniques, in ways that are comprehensible to users [6] [7].

4. Technology Selection, Utilized Data, and Prototype System

Technology Selection

Construction of an operable prototype required a rapid survey followed by a quick selection of relevant technologies. Guiding the survey were one requirement discovered during the fact-finding phase and one constraint. The requirement suggested a technology selection that could work in a laptop context. The constraint, since the timescale was relatively short, was to select any relevant technology that quickened development time. Generally, the relevant types of technologies came in two categories: toolkits that handled unstructured data processing, and technologies that visualized data sets. Due to the author’s previous knowledge of the Python programming framework, and the availability of extensions that enabled key word counting, Python was selected for the processing of the unstructured data within the utilized Artifacts. Selection of the visualization toolkit proved more challenging; yet the application of the primary selection criteria facilitated a quick decision. For this phase several visualization toolkits were reviewed, and a quick report of each follows in Table 4-1.

Tumult Hype
Description: “Tumult Hype is an HTML5 authoring tool. What is commonly referred to as ‘HTML5’ is really a platform of technologies including the latest HTML tags, CSS styles, and improved JavaScript performance. HTML5’s capabilities allow for stunning visual effects and smooth animations, but previously required difficult hand-coding. There were no designer-friendly tools for building animated HTML5 content until Tumult Hype” [8].
Comment(s): Sufficient for building rapid prototypes; yet the addition of live or semi-live data requires coding independent of the tool.

D3.js
Description: “D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG and CSS. D3’s emphasis on web standards gives you the full capabilities of modern browsers without tying yourself to a proprietary framework, combining powerful visualization components and a data-driven approach to DOM manipulation” [9].
Comment(s): Construction of any system requires deep knowledge of JavaScript development techniques.

MIT’s SIMILE Exhibit
Description: “Exhibit 3.0 is a publishing framework for large-scale data-rich interactive Web pages. Exhibit lets you easily create Web pages with advanced text search and filtering functionalities, with interactive maps, timelines, and other visualizations. The Exhibit 3.0 software has two separate modes: Scripted for building smaller in-browser Exhibits, and Staged for bigger server-based Exhibits” [10].
Comment(s): System construction requires understanding of emerging text-based data structures like JSON and slight modifications to standard HTML code.

Table 4-1 A quick description of the various toolkits reviewed.

Due to the author’s previous knowledge of Hype and Exhibit, and the ability of these technologies to display on laptops and mobile devices, the requirement and constraint were met. Therefore, deep technical evaluations of each toolkit were not performed and the technology selection was concluded. An added benefit of the Exhibit toolkit is that a backend database or search engine was not required. Instead, a JSON8-structured flat file could be produced and bundled with any visualization; when browser clients access a visualization instance, the data set is included or distributed along with the instance. With the toolkits chosen, selection of the relevant data set and prototype development proceeded.

8 JSON (JavaScript Object Notation) is a formatted text structure loosely following the ECMAScript standard, designed for structured data interchange between applications and programming languages [15].
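For illustration, an Exhibit-style JSON stanza for a single interview might look like the following. The field names and key words shown here are assumptions made for this sketch; the actual stanza produced for Heller Ehrman appears in Appendix 3.

```json
{
  "items": [
    {
      "label": "Heller Ehrman",
      "date": "20070202",
      "region": "AMER",
      "country": "USA",
      "city": "Palo Alto",
      "vertical": "Legal",
      "study": "Content Services",
      "keyWords": ["content", "records", "retention", "email", "search"]
    }
  ]
}
```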

Number of customers: 25
Number of interviews: 26
Number of countries: 6
Countries represented: United States, Spain, Finland, India, China, Singapore
Number of verticals: 11
Verticals represented: Credit Reporting, Energy, Financial Services, Government, Information and Communication Technology, Insurance, Legal, Media and Entertainment, Retail, Systems Integration, Telecommunications
Total combined pages: 229
Format: Microsoft Word Version 2011

Table 4-2 Metadata summary for included customer study materials.


Utilized Data

In an effort to make the system as believable as possible, real customer interview Artifacts were used. A total of 26 Microsoft Word documents, each embodying a single interview, were identified and included in this project. Due to the sensitivity of most of these materials, this report intentionally minimizes detail about them; instead, demographic-style data covering the overall set of 26 is reported in Table 4-2. The author hopes this provides a sense of the scale and properties of the data set analyzed for the project. However, because one of the companies included in the analyzed data set is no longer a going concern, information about it can be reported in detail, see Appendix 3. Finally, to prepare the data for detailed analysis, the files were converted to plain text and the file names were enriched with metadata. The last point, augmenting the file names with additional metadata, actually proved critically important in the implementation of the prototype system. Specifically, the file names included data like customer name, vertical, geographic location, and so on. The precise file name structure implemented is represented in Table 4-3 using the exemplary case of Heller Ehrman.

Prototype System

With the decisions about technology concluded and the data set identified, construction of the prototype system commenced. Development started by building a short program to construct a structured summary of the 26 interviews; as previously documented, Python was utilized for this effort. While the actual program is available in Appendix 4, its basic functions are described below (an illustrative sketch also follows Table 4-3).

1. START
   a. Identify each file in a supplied directory.
   b. For each file do the following steps:
      i. Extract summary metadata for the study from the file name.
      ii. Open the file and compute the top 15 most frequent key words.
      iii. Construct a JSON data stanza (see Appendix 3 for an example JSON stanza related to Heller Ehrman).
      iv. Append the JSON data stanza to the entire data structure.
      v. Generate a thumbnail image of the top 15 most frequent key words.
   c. Persist the JSON data structure as a file.
2. FINISH

Field: Date | Region | Country | State/Province | City | Vertical | Study Name | Customer | Type
Example: 20070202 | Amer | USA | California | Palo Alto | Legal | Content Services | Heller Ehrman | Interview
Derived file name: 20070202-AMER-USA-CALIFORNIA-PALO ALTO-LEGAL-Content Services-Heller Ehrman-Interview.txt

Table 4-3 File name format structure, including the Heller Ehrman example and the extrapolated file instance.
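To make steps 1.b.i through 1.b.iv and the Table 4-3 naming schema concrete, here is a minimal sketch of the kind of program described above. It is not the Appendix 4 program: the stop-word list and helper names are assumptions made for this illustration, and thumbnail generation (step 1.b.v) is omitted.

```python
import json
import re
from collections import Counter
from pathlib import Path

# Illustrative stop-word list; the real program's list may differ.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is",
              "it", "for", "on", "that", "with", "we", "you"}

FIELDS = ["date", "region", "country", "state", "city",
          "vertical", "study", "customer", "type"]

def parse_metadata(file_name: str) -> dict:
    # Step 1.b.i: file names follow the Table 4-3 schema, e.g.
    # "20070202-AMER-USA-CALIFORNIA-PALO ALTO-LEGAL-Content Services-Heller Ehrman-Interview.txt"
    return dict(zip(FIELDS, Path(file_name).stem.split("-")))

def top_key_words(text: str, n: int = 15) -> list:
    # Step 1.b.ii: naive frequency count over lower-cased words, minus stop words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [w for w, _ in counts.most_common(n)]

def build_summary(directory: str) -> dict:
    stanzas = []
    for path in sorted(Path(directory).glob("*.txt")):
        stanza = parse_metadata(path.name)                                   # 1.b.i
        stanza["keyWords"] = top_key_words(path.read_text(errors="ignore"))  # 1.b.ii
        stanzas.append(stanza)                                               # 1.b.iii-iv
    return {"items": stanzas}  # Exhibit-style wrapper; see note 8

if __name__ == "__main__":
    # Step 1.c: persist the JSON data structure as a file.
    Path("summary.json").write_text(json.dumps(build_summary("interviews"), indent=2))
```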


Once the structured summary was built, development moved on to the visual presentation of the system. For this phase basic usability principles were applied, such as aesthetic and minimalist design, but a deep usability evaluation was not performed. Development used both Tumult Hype and MIT’s Exhibit toolkit, concluding in the production of a wrapping experience and four alternative visualizations. Additionally, to capture a qualitative sense of how each participant felt about the test, a survey was created on SurveyMonkey9. Each alternative implemented some part of the Planners’ needs expressed during the fact-finding phase. A map of the visualization portion of the entire prototype system is available in Figure 4-1. In an effort to more clearly connect implementation to needs expressed during fact-finding, detail about each alternative is presented in Table 4-4.

9 Based upon the author’s training, an inverse correlation is sometimes evident between actual user performance and reported perception. If present in the results, the author hoped to point out that merely collecting survey data is insufficient to determine the effectiveness of a system. See Appendix 5 for the survey questions and the summarized responses.

Figure 4-1 A map of the visual presentation of the entire prototype system (diagram omitted).

Alternative 1: Geographic view of key words by customer.
• Visualizes each customer’s 15 most frequent key words according to their location in the world.
• Participants must click on the key word bubble to see specific information including: the 15 most frequent key words; the customer name, hyperlinked to the actual detailed interview; the date of the interview; and the vertical of the customer (e.g. Telecommunications).
• Implements features like key words by geography, summary visualizations of key words, interview data over time, and the reporting of customer verticals.

Alternative 2: Geographic view of key words by customer including filtering and search.
• Visualizes each customer’s 15 most frequent key words according to their location in the world and affords participants the option to search and filter customer names, key words, etc.
• Participants are able to perform all functions of Alternative 1 with the following additions: search and filter on key words; search and filter on customer names; search and filter on customer verticals.
• Implements all of the features in Alternative 1, adding affordances around searching and filtering to ease access to the available data. When a search, filtering, or combined search-filtering action is performed, the system updates the map to match the criteria; that is, only the customers who match the criteria remain on the map.

Alternative 3: Time view of key words by customer.
• Visualizes each customer’s 15 most frequent key words according to when the interviews were performed; affords participants the option to search and filter customer names, key words, etc.
• Participants are able to perform the same search and filtering functions of Alternative 2, in addition to the following: each interview is placed on a timeline chart according to when it was conducted, and when a customer name corresponding to an interview is clicked, it reports the same information as described in Alternative 1.
• Implements all of the search and filtering features in Alternative 2, adding affordances to more explicitly visualize the relationship between time and customer interviews.

Alternative 4: View of customer and key word data by geography and time.
• Visualizes each customer’s 15 most frequent key words according to when and where the interviews were performed; affords participants the option to search and filter customer names, key words, etc.
• Participants are able to perform the same geospatial and time views of data, with search and filtering functions, as outlined in all of the other alternatives.
• Implements all of the search and filtering features in all alternatives.

Table 4-4 Explanation of each of the implemented alternatives.

Once the implementation of each alternative was completed, a wrapping visual experience was built in Tumult Hype. Development of the wrapping experience was needed to ease interactions between the prototype system and the participants. Specifically, the goal was to simplify their movement through each alternative visualization approach and the survey. Once all of the development items were completed, the prototype was deployed into the Apache web server included with Mac OS X v10.9.2 and readied for testing.

5. Testing Approach, Analysis Techniques, and Results

Testing Approach

To execute the usability tests, a protocol was developed that intentionally caused the participants to perform the core use case, “Abstract to detailed exploration” (see Appendix 2 for the actual protocol). Since learnability has the potential to positively impact user performance, individual steps within the protocol were intentionally designed to minimize, though not eliminate, between-task learnability. Furthermore, to gain awareness of an alternative’s inherent ability to cause usability slips or errors, each task had both correct and incorrect answers. These facets, coupled with inspection of the web server logs, allowed the author to measure, for each alternative, the time to complete a task successfully, slip to success, or fail altogether. Actual testing occurred in typical Information Technology workplaces at three locations, see Table 5-1, from April 4th, 2014 to April 25th, 2014. Each test was executed on an Apple MacBook Pro 15.2 inch laptop running the OS X Mavericks operating system. No special hardware was used to measure user performance, physiological behaviors, etc. Software for the prototype system was installed into a directory space accessible to the locally running instance of the Apache web server. Prior to starting each test the system was reset to a known good state: the Apache log files emptied, web cookies and data from SurveyMonkey removed, all web browser caches emptied, and all filtering options within the prototype unset10. To initiate a test each participant was given a paper copy of the protocol, asked to read it first, and then execute the test. The participant then sat at a desk or table and performed the test, referring to the paper copy of the protocol as required.

10 Note that at least one test failed to have the web browser caches emptied and cookies for SurveyMonkey removed. This test completely failed and the data was not included. Two other tests provided partial data because the web browser cache was only partially emptied. Data from these two sessions were included.

1. 2845 Lafayette St., Santa Clara, CA, 95050, USA
2. 292 Yoshida-cho, Totsuka-ku, Yokohama, Kanagawa 244-0817, Japan
3. 300 Beach Road 28-01, The Concourse, Singapore 199555

Table 5-1 Testing locations around the world.


Overall, 30 participants were included in the study, each executing a test session that lasted approximately 10 minutes or less. Additional detail about the sample of participants is provided below to give the reader a sense of who executed the tests within the study.

• Participants matched the earlier, operationally defined role of a Planner; that is, they were engineers, product managers, product planners, product marketers, and so on.
• No incentives were provided or recruitment strategies employed to entice the participants to perform the test.
• No preference was given to race, culture, gender, age, or work experience.
• Generally the participants ranged from the mid-thirties to mid-fifties in age, having IT work experience from fifteen to thirty years.
• Additionally, due to the various locations, American, Japanese and Singaporean cultures were included in the study.

Analysis Techniques

Closing a usability testing session consisted of executing a simple script that extracted key entries from the Apache web server log, stored the results in a CSV file, emptied the web server log, and restarted the web server. With the session data stored, the time to complete the task, cause a slip, or cause an error was computed11. The results were times, in seconds, for each task per participant, see Table 5-2. Per-session time data were then consolidated into a single Microsoft Excel file for statistical analysis. For the survey, however, SurveyMonkey handled both data persistence and the computation of basic descriptive statistics. Initial analysis of the session data consisted of a χ2 test for normality to ensure that further analysis could be performed. With the χ2 test completed, a one-way ANOVA was performed to determine whether differences between the means existed, see Appendix 6. Following the ANOVA, post-hoc T-Tests were computed to detect differences between individual means. As for the survey data, tabulation of results was computed by the SurveyMonkey service. Since the primary purpose of the survey was a qualitative sense of participant perception, χ2 tests, an ANOVA, and post-hoc T-Tests were not used for it.

11 Times were computed by subtracting the entry time into a particular alternative from the timestamp of the downloaded customer interview document.

Time   URL (request path)                                       Action | Result (seconds, correct 1=y/0=n, alternative)
52734  /~mihay/mockup.html                                      < Start session
52737  /~mihay/mockup.hyperesources/iframe-htmlwidget.html
53964  /~mihay/mockup.hyperesources/iframe-bytimehtml.html      < Enter alternative
53964  /~mihay/sample_1/locationNoFilter.html
54005  /~mihay/docs/20070202-AMER-USA-CALIFORNIA-SANTA%20CLARA-RETAIL-Content%20Services-eBay%20HR-Interview.docx   < Download | Slip (41 s, 0, Alt-1)
54085  /~mihay/docs/20111130-AMER-USA-CALIFORNIA-FAIRFIELD-TELCO-CC-AT%20and%20T-Interview.docx   < Download | Correct (121 s, 1, Alt-1)
54098  /~mihay/mockup.hyperesources/iframe-bytimehtml-1.html    < Enter alternative
54098  /~mihay/sample_1/location.html
54139  /~mihay/docs/20111130-AMER-USA-CALIFORNIA-FAIRFIELD-TELCO-CC-AT%20and%20T-Interview.docx   < Download | Correct (41 s, 1, Alt-2)
54170  /~mihay/mockup.hyperesources/iframe-bytimehtml-2.html
54170  /~mihay/sample_1/time.html                               < Enter alternative
54182  /~mihay/sample_1/__history__.html?0
54243  /~mihay/docs/20111130-AMER-USA-CALIFORNIA-FAIRFIELD-TELCO-CC-AT%20and%20T-Interview.docx   < Download | Error (73 s, 0, Alt-3)
54249  /~mihay/mockup.hyperesources/iframe-bytimehtml-3.html    < Enter alternative
54249  /~mihay/sample_1/locationAndTime.html
54299  /~mihay/docs/20120905-APAC-CHINA-BEIJING-BEIJING-ICT-CC-Neusoft%20Reseller-Interview.docx   < Download | Correct (50 s, 1, Alt-4)

Table 5-2 Exemplary session data showing task times for correct, slip and error actions.
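The timing computation in note 11 can be sketched in Python as follows. This is an illustration, not the actual script from the study: the regular expression assumes Apache access-log timestamp conventions, and the URL patterns (/sample_1/ for entering an alternative, /docs/ for downloads) are taken from the session data in Table 5-2.

```python
import re
from datetime import datetime

# Matches the timestamp and request path of an Apache access-log line.
LOG = re.compile(r'\[(?P<ts>[^\]]+)\] "GET (?P<url>\S+)')

def task_times(log_lines):
    """Yield (document URL, seconds) pairs: time from entering an
    alternative visualization to downloading an interview document."""
    entered = None
    for line in log_lines:
        m = LOG.search(line)
        if not m:
            continue
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        url = m.group("url")
        if "/sample_1/" in url:            # participant entered an alternative
            entered = ts
        elif "/docs/" in url and entered:  # participant downloaded a document
            yield url, (ts - entered).total_seconds()

# Example usage against a saved access log:
# with open("access.log") as f:
#     for url, secs in task_times(f):
#         print(f"{secs:5.0f}s  {url}")
```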


Results

With the data collected, it was prepared for analysis. An initial review showed that in some cases data points were missing, either due to failures generated directly by the participant or due to the web browser’s cache not being emptied. For those missing data points the mean for each alternative was substituted. While this obviously skewed the data towards the mean, results from the χ2 tests showed that the assumption of normality could not be rejected, see Table 5-3.

Alternative   P-Val    No. Missing Data Points   H0: Assume normality
Alt-1         0.4693   3                         Accept Null
Alt-2         0.8847   3                         Accept Null
Alt-3         0.8141   6                         Accept Null
Alt-4         0.2973   3                         Accept Null

Table 5-3 Missing data points per alternative and χ2 tests of normality.

With the χ2 tests allowing for confidence in the assumption of normality, the additional tests, ANOVA and post hoc T-Tests, were performed. The ANOVA results showed significant differences between the means of the alternative visualizations (p-val = 1.04639E-12). Further post hoc T-Tests illustrated significant differences between most of the means except one pair, alternatives 2 and 4, see Table 5-4.

Comparison        P-Val          Alpha   H0: No difference in means
Alt-1 to Alt-2    0.000028293    0.05    Reject Null
Alt-1 to Alt-3    0.047058421    0.05    Reject Null
Alt-1 to Alt-4    0.000012119    0.05    Reject Null
Alt-2 to Alt-3    0.000000002    0.05    Reject Null
Alt-2 to Alt-4    0.504529797    0.05    Accept Null
Alt-3 to Alt-4    1.2485351E-9   0.05    Reject Null

Table 5-4 Post hoc T-Test comparisons between the alternatives.

With these basic tests completed, user performance amongst the alternatives could be considered and contemplated. After concluding the initial analysis, testing normality assumptions and checking for differences between the means, the results were graphed and are presented in Figure 5-1.

Figure 5-1 Per-alternative means, in seconds: Alt-1 142.67, Alt-2 83.44, Alt-3 175.50, Alt-4 76.93 (bar chart omitted). Note that no significant difference is discernable between alternatives 2 and 4.

Again, it is possible to state that differences existed between all alternatives, excluding 2 at 83.44 seconds and 4 at 76.93 seconds. This proved not surprising to the author, as their construction was similar: both included map-based visualizations and identical filtering widgets. As a result, no clear winner between the alternatives can be declared. Instead it is possible to state that the visualizations in the prototype that coupled map-based visualizations to filtering controls exhibited the best measured user performance. One potential reason for the insignificant difference between the means of alternatives 2 and 4 likely stems from between-task learnability. Because both alternatives included the same set of widgets, it is highly possible that interactions learned in alternative 2 carried over to alternative 4; in essence, learnability between the tasks likely allowed the participants to complete task 4 with higher performance. Additional detail and commentary are reported per alternative below, including a characterization via additional descriptive statistics, whether the alternative caused errors or slips, and the results of the survey self-reported by the participants.

• Alternative 1 – With a mean of 142.67 ± 62.95 seconds, participants slipped a high percentage of the time (56.7%), and error percentages were relatively low at 10%. Interestingly, only 24.14% reported this alternative as difficult to use, with the remainder reporting it as extremely easy to use (34.48%) or moderately easy to use (41.38%). Based upon the author’s training this is not surprising, as it represents bias in the participants’ responses: on many occasions participants tend to self-report a higher degree of perceived performance even though measured performance data doesn’t support the perception. An interesting qualitative observation comes from the high rate of slippage: participants who slipped generally downloaded exactly the same incorrect customer study Artifact, and then recovered to the correct answer. A potential reason for the high rate of slippage is discussed in the conclusions section.

• Alternative 2 – Resulted in a mean of 83.44 ± 33.59 seconds and low percentages of slips and errors, at 3% each. No participant reported this alternative as difficult to use; 27.59% reported it as extremely easy to use, and 72.41% reported it as moderately easy to use.

• Alternative 3 – Produced a mean performance of 175.5 ± 61.29 seconds with the highest percentage of errors at 17% and only 3% slippage. Its status as the lowest performing alternative seemed to result from the primary timeline widget either having no bound or generally being unclear. Qualitative observations showed that users struggled with the timeline widget, either by moving well beyond the final customer study Artifact or by not initially understanding that the timeline widget was active and could be controlled. Participants’ reported perceptions of ease of use for this alternative were mixed, with 20.69% reporting extremely easy to use, 48.28% moderately easy to use, and 31.03% difficult to use.



• Alternative 4 – With the lowest absolute mean at 76.93 ± 41.2 seconds and low error and slip percentages (7% and 3% respectively), participants self-reported this alternative overwhelmingly as extremely or moderately easy to use (72.41% and 20.69% respectively). In fact only 2 participants, or 6.9%, suggested the alternative was difficult to use. In this case participant perception seemed to match measured performance data.

One observation from the resulting data, including the self-reported survey data: participant perception of their performance sometimes differs from their measured performance. An exact and precise reason behind this phenomenon cannot be clearly established, leaving both the author and the reader to speculate.
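For readers who want to reproduce the statistical pipeline in this section (normality check, one-way ANOVA, post-hoc T-Tests), a minimal sketch follows. The author’s analysis was done in Excel; scipy is used here as a stand-in, the data are synthetic, and scipy’s Shapiro-Wilk test replaces the χ2 normality test used in the study.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)

# Synthetic task times (seconds) for 30 participants per alternative,
# drawn around the means/SDs reported above; NOT the study's raw data.
params = [(142.67, 62.95), (83.44, 33.59), (175.50, 61.29), (76.93, 41.20)]
alt_times = [rng.normal(m, s, size=30) for m, s in params]

# Normality check per alternative (Shapiro-Wilk in place of chi-squared).
for i, t in enumerate(alt_times, start=1):
    print(f"Alt-{i} normality p-value: {stats.shapiro(t).pvalue:.4f}")

# One-way ANOVA across all four alternatives.
print("ANOVA p-value:", stats.f_oneway(*alt_times).pvalue)

# Post-hoc pairwise t-test, e.g. Alt-2 vs Alt-4.
print("Alt-2 vs Alt-4 p-value:", stats.ttest_ind(alt_times[1], alt_times[3]).pvalue)
```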

6. Conclusions and Future Work

Overall, this work studied concepts related to traditional steps in Information Systems design and planning. Notably, within fact-finding the author applied an approach that may ultimately result in a more structured way to think about generating requirements for engineering teams. Specifically, the author hopes to apply this process to upcoming customer studies at his company to uncover insights into key use cases, desired non-functional behaviors, and interviewed customers’ overall awareness of a particular concept. While not the focus of the paper, the author imagines a relatively structured process can emerge that includes at least the following:

1. During the interview cycle, continuous debate on key concepts, ideally resulting in an evolving definition of uncovered key concepts and their associated key words.

2. Post interview cycle, a joint session with most or all interviewers to debate and validate key concepts, with an aim to structure the findings in the manner illustrated in this paper.

3. Delivery of a comprehensive report and associated visualizations representing the body of the work, providing collateral consumers an easy way to enter and see which outcomes and findings are relevant to them.

However, this process is still overly manual and likely a challenge for team members who are non-native English speakers. This is where some simple findings from this project could be applied to a slightly modified fact-finding process. Notably, the encoding of customer metadata into the file name structure, as exemplified in the Heller Ehrman case, is suggested. In fact, the majority of the prototype system was made possible by this naming structure, which allowed a Python program to extract the metadata needed to produce the alternative visualizations. Furthermore, other important processes emerged to ensure that awareness of past interview cycles remains high. Notably, one interviewee suggested that a continuous stream of information be published through status reports, face-to-face meetings, internal blog posts, and summary quotes.


This interviewee’s intention was that repeated pointers to the content might serve to slowly increase awareness of a customer interview repository. Ultimately, the author hopes that various quantitative and qualitative techniques will cause more frequent usage of a repository, effectively leading to un-stranding the Artifacts.

Beyond the application of structured debate, consideration, and active advertisement, the project also sought to review what it would take to build systems that extract meaning from a repository of Artifacts. The author’s hope is that in the long term such an approach would ease Planners’ struggles with finding meaning in the collateral. While the system dealt mostly with front-end visualizations, a repeated suggestion of saving time and excising bias when generating interview reports proved compelling: more automated approaches to extracting generic meaning could both reduce the time needed to pinpoint a subset of Artifacts and instill confidence in Planners that bias has been minimized. Further, because the workers who conduct customer interviews may have limited English proficiency, an automated system may also remove undue burden from the interviewers, shortening the cycle time required to turn around an Artifact. Ideally, a system would be able to extract meaning directly from audio recordings, limiting or eliminating outright the requirement to perform detailed transcriptions or meeting summaries. Yet alternatives might be imagined, such as using low-cost human-powered transcription services to extract the conversation word-for-word, including a rough time index.

Beyond the gathering of new interview collateral, it was clear that there is a hidden body of content available for consumption by the author’s company’s Planners, which needs to be gathered and curated into a form consumable by more than just individual personnel. Furthermore, the author imagines that even if some collateral were missed by an exhaustive search, the gathering and mining of a consistent set of collateral could prove useful for advanced text mining, search/indexing, and, as studied in this project, structured visualizations of the repository.

In particular, this project focused in detail on several alternative visualization approaches specifically for customer interview materials. What was uncovered, at a local level, were two alternative visualizations exhibiting similar user performance in terms of time to conclude a task. In addition, when these user performance data were compared to how participants perceived the difficulty of each task, there was seemingly an inverse correlation between perception and performance. While certainly a topic for deeper study, this does suggest that looking at survey data alone to judge user performance is insufficient to make an informed decision on partial or overall system construction. Instead, a more reliable way to compare the alternatives should include multiple data sources, combined into a complete picture.

Figure 6-1 Example of a usability defect which likely caused errors and slips (screenshot omitted).


At a global level, and based on actual observations of the tests, the author wonders how emerging Big Data sets are going to be visualized. Specifically, if the local findings can be extrapolated slightly, do they suggest more study is required to better understand the impact of visualization techniques on user performance for Big Data? Essentially, the author wonders whether application developers may overload dashboards and other GUI visualization techniques, resulting in usability errors, information loss, and ultimately incorrect decision-making. Given the promise of Big Data to revolutionize decision making, poor selection of visualizations may even have dire consequences. For example, an interesting finding derived from observation is that the customer detail bubble used in the prototype system was inappropriately designed for consistent user performance, see Figure 6-1. Notably, the usage of color in the visualization of the top 15 key terms may cause a user to skip over structured customer data also reported in the bubble, like the vertical the customer belonged to. What resulted was that users consistently slipped or failed the task outright; in fact, to help the users recover the author asked a generic question, “Please review the task to determine if you’ve completed it correctly by downloading a file associated to a telecommunications provider.” The author suspects that if the bubble had been designed with color throughout, the rate of errors and slips for the first task would perhaps have been reduced. Moreover, the generally poor performance on the third alternative, when coupled to observations of the tests, illustrated that some kind of scaffolding was likely required to guide the user when interacting with the timeline widget.

Optionally, another widget that bounds the span of visualized time could be employed, placing explicit restrictions on the time range. Specifically, the author observed that users, once they had experimented with and understood how the widget worked, moved well past the last data point. That is, the time widget could effectively go backwards and forwards in time without bound, see Figure 6-2. Hence the suggestion of a different widget that constrains, or puts explicit boundary conditions on, how far back in time a user could progress.

Moving back to the author’s point on Big Data visualizations: if developers implement visualizations that cause high rates of error or slippage, could the consequences be dire? Since the author is aware of Big Data visual presentations representing the likes of bullet train inter-arrival times, usability slips and errors caused by poor selection of visualizations have the potential for dire consequences. A potential path ahead to reduce slips and errors, in both the local and global contexts, follows. Included in the survey were two questions that sought to uncover the level of interest in the formal development of a proper system implied by this study. From this survey, 96.55% of the participants thought additional investment should be sought to better manage the Artifacts (note that one participant responded that they weren’t sure).

Figure 6-2 The time widget used in Alternative 3 (screenshot omitted).


Further, more than a super majority (75.86%) of participants strongly agreed that such a system should include key word or key term visualization techniques. This leads to the following question: what kinds of capabilities might a “proper system” require? Certainly the core use case discovered during the fact-finding phase is mandatory, and given the results of this study, the ability to visualize Artifacts overlaid on a map, with filtering and searching, should be highly preferred. Beyond that, Google-like search, presentation of competitive information, inclusion of industry-specific analyst data, and an ability to combine these various data all seem relevant. The overarching goal in including additional modes of visualization, information processing, and more data types is to assist Planners in uncovering insights for their various objectives. Further hints, peppered throughout the user interviews, on saving time and eliminating bias suggest some approach to better link summaries of data to direct references to the source material. For example, if it were possible to index key words from audio recordings, any reporting must directly reference the source audio recording(s), ideally including the originating time sequence. However, as new use cases, widgets, and affordances are granted to users, how can user performance be continuously and quantitatively evaluated? Certainly this study suggests a path forward through the capture of system logs; yet system log capture should be augmented with some form of continuous automated capture of detailed user session data to better track user behaviors. These data could then be fed back and forward into a design and development process that takes into account continuous measurement of user performance. Of course, more formal and regular usability tests, augmented by capturing system and session logs, should be performed systematically prior to release of the tool, assuring that the materialized features realize expressed requirements. Finally, from interviewees to participants, there was clear interest expressed in realizing a customer and market intelligence system that boosts the productivity of Planners. While the author isn’t yet certain how a real system will materialize, he is sure that this study will improve the offering planning experience at his company!


References

[1] J. L. Whitten, L. D. Bentley, and K. Dittman, Systems Analysis and Design Methods, 7th ed., B. Gordon, Ed. New York, NY, USA: McGraw-Hill, 2007.
[2] J. Blaas et al., Exploration and Analysis of Massive Mobile Phone Data: A Layered Visual Analytics Approach, Feb 15, 2013. A work product of the Orange Data for Development (D4D) challenge.
[3] Apple, Inc. (2012, Oct.) Investor Relations. [Online]. http://investor.apple.com/common/download/sec.cfm?companyid=AAPL&fid=1193125-12-444068&cik=320193
[4] A. Baron, P. Rayson, and D. Archer, Word Frequency and Key Word Statistics in Historical Corpus Linguistics, Jan 12, 2009. Used to provide a broad understanding of word and key term frequency analysis.
[5] T. Tanner and J. Zhou, Construction and Visualization of Key Term Hierarchies, Mar 11, 2002. Provided insights on the construction of systems based upon automated key term extraction.
[6] O. Buchel and E. Fischer, "Can Interactive Map-Based Visualizations Reveal Contexts of Scientific Datasets?," Faculty of Information and Media Studies, The University of Western Ontario, Ontario, 2012.
[7] A. Haubold and J. R. Kender, "Analysis and Visualization of Index Words from Audio Transcripts of Instructional Videos," Department of Computer Science, Columbia University, New York, Jun 16, 2004.
[8] Tumult. (2014) Tumult Hype Documentation - Overview. [Online]. http://tumult.com/hype/documentation/overview/
[9] M. Bostock. (2013) D3.js - Data-Driven Documents. [Online]. http://d3js.org
[10] Massachusetts Institute of Technology. (2012) MIT - SIMILE - Exhibit 3.0. [Online]. http://simile-widgets.org/exhibit3/
[11] J. M. Thompson and M. P. E. Heimdahl, "Structuring Product Family Requirements for n-Dimensional and Hierarchical Product Lines," Department of Computer Science and Engineering, University of Minnesota, Minneapolis, 2004.
[12] P. C. Prasanna, "DECIMAL: A Requirements Engineering Tool for Product Families," Computer Science, Iowa State University, Ames, 2001.
[13] S. Firer and S. M. Williams, "Intellectual capital and traditional measures of corporate performance," Journal of Intellectual Capital, vol. 4, no. 3, pp. 348-360, 2003.
[14] D. L. Parnas, "On the Design and Development of Program Families," IEEE Transactions on Software Engineering, vol. SE-2, no. 1, pp. 1-9, Mar. 1976.


Appendixes

Appendix 1 – Generic Interview Questionnaire & Example Interview Outcome

Generic Interview Questionnaire

Attendees:
Date:
Title: Interview notes and questions for determining feature set
Questions:

1. History of current content
   a. Do you know if there are any customer study materials and where you might go to find them?
   b. If you are aware of the materials, is the current format sufficient or insufficient for your needs?
2. Experiencing the content
   a. How would you like to explore the materials to get the best possible benefit?
   b. Do you imagine that some kind of visualization of the findings would be useful/helpful?
   c. Are you familiar with Word/Tag Clouds and key term visualization? If so, do you think they would be helpful?
   d. Are specific organization techniques useful/helpful, such as content/key term by geography, time, and vertical/sector? Are there others than those mentioned?
   e. Do you imagine that you want to get to the content directly, or are more summarized abstracts or key term visualizations a better place to start?
   f. What platform is the best target for such an exploration system?
3. Outcomes and consumption practices for the content
   a. What kinds of discoveries and findings do you anticipate are possible or even relevant?
   b. If so, what kind do you think are preferable or relevant?
   c. How do you typically use customer study materials in your plans?
   d. If you do not, how do you consolidate your own information to produce release, plan, or other content?
4. Are there other kinds of data to include in conjunction with customer study data/materials? If so, can you describe the data?

Example Interview Outcome

Attendees: Participant 1, Participant 2, Participant 3


Date: Feb. 10, 2014 9:00AM PST
Title: Interview notes and questions for determining feature set
Questions:

1. History of current content
   a. Do you know if there are any customer study materials and where you might go to find them?
      i. Participant 1: I know some, but there are a bunch that haven’t been re-aggregated. So I don’t know where everything is. Participant 3: YRL may have done some of the work, and there was some kind of reporting done, but I’m not sure about it. I don’t know where the materials are at CompanyX, and in fact I have a cached copy on my laptop. There were materials that were resent to CompanyX. There is information on CompanyX’s shared files, but it is only accessible by the product planning team.
      ii. Participant 2: I know there is material and I’ve looked for it and couldn’t find it. Is there a reason it isn’t in the Sharepoint site?
      iii. CONCLUSION: Even among team members who have participated or consumed content there is a distinct lack of knowledge about where the collateral is located, and a sense that there might be additional content, but no certainty.
   b. If you are aware of the materials is the current format sufficient or insufficient for your needs?
      i. Participant 1: I have had teams that have used the super detailed reports and this is what they wanted. However, from my perspective the format is insufficient. Today you have to read and aggregate it yourself. There is no easily searchable approach to get to data so that we can identify contents. There are a lot of audio files, but they are dark data. For a recent project I had my team do raw transcriptions, but I cannot aggregate the work. Transcription work takes hours from the interview team to do. So the short answer is that, no, they aren’t usable at large. A lot of the data is dark. Some of our colleagues in Japan were trained on details and specifics for particular topics, so this could be a change in skills and awareness in the planning process.
      ii. Participant 3: No. Many reports are long and raw. There isn’t really much conclusion in the reports. Also we don’t think about reusing the reports in other ways for other projects. When we start the studies there are specific questions related to current products from various parts of the organization, but it is very specific to a topic. After 6 months to 1 year we may not be able to find something on a new topic. This is because we’re missing key words on particular topics. So the content may be okay, but we also need some skill sets from our users on how we can use the content in new ways. People need to make assumptions and draw conclusions from the collateral, not have things spelled out. We may need to also change our interview style so that we talk about problems and look towards a customer re-visitation problem.
      iii. Participant 2: I haven’t looked at the materials enough to make a proper judgment. Some form of an index would be interesting.
      iv. CONCLUSION: Some way to search & discover content is critical. The declaration about there being a lot of dark data is quite interesting.
2. Experiencing the content
   a. How would you like to explore the materials to get the best possible benefit?
      i. Participant 2: I’m thinking of whether you have a broad range of topics to discuss with the customer. If I had the index or list of questions then I could at least see the areas of consideration that were thought about. It would allow me to triangulate down to the set of interviews to pay attention to. I’d also like to look at multiple interviews, multiple topics, etc. I think partially you have to put in the time. Not sure if there is an automated method to do that.
      ii. Participant 1: What I would like to do is [describe] something. So if you have the experience of going to a public library you can do searches on anything. It would basically give you indexes. It will basically give you hits to anything anywhere. Since I know the structure of the interviews they aren’t actually topical. The [raw] content exposes the customer thinking for a number of years. I’d like to include not only our data, but also Twitter, HDS Community, and I’d also like to see linkages and data that incorporate analyst and market views all in one [tool]. Essentially with key terms, which will grow over time. I’d also like to see some visualization such as Wolfram Alpha: number trending, frequencies, meaning of words. I’d start with the key term library, internal data, and our written reports, but more importantly being able to get to some of the recorded data. I’d be thrilled with that, and it doesn’t have to be perfectly automated. Just having our own stuff searchable and being able to search the audio content would be huge. Very seldom will the audio content be high fidelity. Sometimes there are follow-up discussions that are handled by another team [PM, Eng, Sales] and we don’t see the connection between these studies, yet in some cases we don’t even follow up. Also we don’t get cold calls and we miss customers that we don’t have or haven’t acquired.
      iii. Participant 3: Two things. First, for the data we gather we have to consider a way to do text mining including voice to text. This part should be automated. We’re focused more on the technology side and as a result we may need a business-to-technology key term thesaurus. This is a mutual learning process with our customers. It isn’t always that we can ask the right questions of our customers. It isn’t always a one-time visit with our customers. [Could we use the materials with our customers as well? This might suggest some anonymization.] Not all of the information is shared across multiple teams. There is some kind of a need for a single repository [and we need to work on this with one another].
      iv. CONCLUSION: Again indexing and search have come up, especially references to key terms. In particular Participant 1 also talked about mashing up data types to achieve a more complete result. Finally, perfect automation of the whole process isn’t required/mandatory and some part of the human process is essential for the “ah ha” moment.
   b. Do you imagine that some kind of visualization of the findings would be useful/helpful?
      i. Participant 3: I do.
      ii. Participant 2: Yes.
      iii. Participant 1: And I already talked about some from the previous question.
   c. Are you familiar with Word/Tag Clouds and key term visualization? If so do you think they would be helpful?
      i. Participant 1: Yes, yes. Probably need more. They are a good start as they can help people start with the beginnings of their investigations. In a company like ours they could be really dangerous. You may not be able to refine how questions are asked. If you’re a native English speaker then it can help; if you’re not then it can be very limiting and dangerous. I know that almost every team falls into this trap. Whether or not you have the content, the teams [outside of the project team] don’t trust and believe that the people gathering the data were [not] biased. I don’t know how we solve the fundamental trust and belief problem. We may not have skilled people who are capable of investigative reporting and open-ended questions. If you do all of this visualization will the teams actually believe it?
      ii. Participant 2: I think it would be useful. It isn’t specific information, it is more roadmap-ish and it helps you find the data. This is like a court case, so our goal should potentially be to open the data for folks to draw their own opinions.
      iii. Participant 3: Visualization will be very useful, but the way that you visualize is highly dependent on who created it. I don’t know how we can create things in a non-uniform way. How can we create visuals in different ways? If you want to get a specific answer from the beginning, the visualizations are biased. It also depends: if you’re non-native then you may miss the context of the input from audio content. With the way that you visualize the data, there isn’t an ultimate way to present the data. At the same time it is very convincing when you have visuals.
      iv. CONCLUSION: The theme of bias when gathering the information came up a lot. As a result better access to the raw content, potentially all the way down to an originating audio file if it exists, was hinted at.
   d. Are specific organization techniques useful/helpful such as content/key term by geography, time, and vertical/sector? Are there others than those mentioned?
      i. Participant 3: Time is important. Yet I’m not sure how this could be related to customer studies. Can we see the potential financial opportunity of a trend/task? Can we connect to other systems and data sources to help us make a decision?
      ii. Participant 1: There are an infinite number of combinations that I cannot predict. One interesting parameter is that our number and types of key terms will increase over time. Specifically, the Content Cloud study would see a high density of the term metadata and not before that time. However, the bias element should change over time. This would make it possible to have cleaner views of what customers are asking for. The most recent visits are super biased. Can we eliminate key terms that are too frequent? If you know that J&J has this type of problem, then this kind of tool can go to the EDGAR DB to find all of the companies who have similar problems. Tier data comes from IDC, SEC filings, Patent filings, etc.; it’s all public data. Investor and annual reports. Our stuff is even more siloed than the public DBs.
      iii. Participant 2: I agree with Participant 1. If you have the lens of time you’ll see something different. Vertical seems like the ringer. Can we see the perspectives, or who has written or who has conducted the engagement? Is EDGAR internationalized?
      iv. CONCLUSION: Viewing the development of summaries/key terms over time seems relevant. Also, noisy terms that are too frequent might either be removed, a potential affordance, or dampened with the lens of time. Again the idea of including external data sources came up.
   e. Do you imagine that you want to get to the content directly or are more summarized abstracts or key term visualizations a better place to start?
      i. See 2.a, Participant 1
      ii. See 2.c, Participant 3
      iii. Participant 2: I would want to get to the content directly for sure; however, the concept of summarized abstracts sounds very appealing to start my research. Starting with a list of brief summaries could be beneficial for directing my search effort. Something as simple as having an abstract that would highlight the nature of the interview and key subjects discussed.
   f. What platform is the best target for such an exploration system?
      i. Ran out of time, could not answer
3. Outcomes and consumption practices for the content
   a. Laptop was generally recommended.
   b. What kinds of discoveries and findings do you anticipate are possible or even relevant?
      i. See 2.d, Participant 3
      ii. Participant 2: Individual customer strategies and trends amongst the whole, and some number of nuggets that address specific inquiries, whether due to a coincidental response to a question or due to a specific topic Q&A.
   c. If so what kind do you think are preferable or relevant?
      i. NOTE: UNABLE TO ASK DUE TO TIME, WILL FOLLOW UP AND UPDATE
   d. How do you typically use customer study materials in your plans?
      i. Participant 2: I have not yet leveraged our archived customer studies.
   e. If you do not how do you consolidate your own information to produce release, plan, other content?
      i. Participant 2: I use analyst reports, articles, and whitepapers and catalogue the links/references in a loosely structured list of notes.
4. Are there other kinds of data to include in conjunction with customer study data/materials? If so can you describe the data?
   a. See 2.d, Participant 1
   b. See 2.d, Participant 3


Appendix 2 – Usability Test Questionnaire and Task List

Background: Given your role (e.g. marketer, product manager, product marketing, planning, engineering, etc.) you’re to perform a series of tasks across a set of alternative visual treatments. At the end of the session you’ll be asked to take a digital survey to rate and reflect on the process.

Usage Scenario: Imagine you’re planning for a major release of a new offering/product. To help you in your planning efforts you want to quickly identify thinking and sentiment from customers in our customer base. As a result you're going to use a new customer intelligence system to help find the right set of collateral for your study.

Notice: If you run into any problems/challenges please do not be afraid to ask questions.

Tasks:

1. Log into the system
2. Using Alternative-1
   a. For the new offering you think that Telecommunications companies who have talked about or referred to the term “data” are critically important.
   b. Actions:
      i. Please select Alternative-1 from the home screen
      ii. Once the system has rendered, please find the first Telecommunications company including a reference to the key word “data” and download the interview document associated with that customer.
      iii. Once the document is downloaded please return to the “Home” screen.
3. Using Alternative-2
   a. After some consideration and study you realize that you need to refine your search slightly. Specifically, you’ll want to find any Telecommunications company who talked about or referred to the term “metadata.”
      i. Actions:
         1. Please select Alternative-2 from the home screen
         2. Once the system has rendered, please find the first Telecommunications company who makes reference to metadata, and download the interview document associated with that customer.
         3. Once the document is downloaded please return to the “Home” screen.
4. Using Alternative-3
   a. During your deliberations you begin to wonder how long ago customers started talking about content, Big Data, etc.
      i. Actions:
         1. Please select Alternative-3 from the home screen
         2. Once the system has rendered, please find the oldest interview which mentions the word “content.” Once you’ve found the interview, download the document associated with that customer.
         3. Once the document is downloaded please return to the “Home” screen.
5. Using Alternative-4
   a. Finally, you want the perspective of learning about the behavior of the Chinese market with respect to Advanced Analytics.
      i. Actions:
         1. Please select Alternative-4 from the home screen
         2. Once the system has rendered, please find the first company, in China, who makes reference to the word “analytics,” and download the interview document associated with that customer.
         3. Once the document is downloaded please return to the “Home” screen.
6. Conducting the survey
   a. Please click on the survey link in the top right hand corner and complete the survey.


Appendix 3 – Exemplary Customer Interview Materials & Related JSON Data

Notice

Due to the sensitive nature of customer-centered documentation only one example can be included. There are two reasons why this exemplary information can be included:

1. This particular customer, Heller Ehrman, is no longer in business, and
2. Since the interview was conducted in 2007 there is very little sensitive information included.

Heller Ehrman – Employment Attorney

Name: Heller Ehrman Content Services Interview
Date: Friday, February 02, 2007
Goal: Discuss with Heller Ehrman how they operate (e.g. what records retention policy is used and how they plan on adhering to it) and potentially determine what kinds of requirements can be gleaned for HDDS and other Content Services offerings.
Outcome: Understandings of the key pain points were related by the individual employment attorney interviewed. Further ideas potentially leading to differentiation points for the Hitachi Storage Solutions Group were hinted at and documented in this memo.

Background

Heller Ehrman was a law firm headquartered in San Francisco, USA. Founded in 1890, and having survived the San Francisco earthquake of 1906, the company was recently dissolved due to a bad financial year in 2007 and the poor economic climate. More information on Heller Ehrman can be found via Google and at their web site http://www.hewm.com.

Customer Quotes

• Heller Ehrman (HE) quote on content and metadata: “You have to provide access to the metadata to the court.”

• HE quote on data presentation: “In theory I can get rid of all of the binders in my office, but I print them anyway. The reality is that lawyers like to see paper.”

• HE quote on backup tapes and the discovery process: “I think that’s right. I would say that is true. Sometimes it’s better to not get the data. Sometimes we curse the fact that the company has backup tapes, because tapes are notoriously difficult to recover from.”

• HE quote on avoiding backups: “It is a very complex question. … I think I’m going to beg off of that question. As there are some times when backups are required and others that they aren’t required. It is highly dependent on specific regulation, etc.”


• HE quote on communication in the workplace with respect to job performance for employees: “Too often in the workplace problems occur because problems aren’t communicated or the communication is obfuscated.”

• HE quote on not getting employee emails: “Sometimes it is not the end of the world if you aren’t able to get the communications from the employees.”

• HE quote on the amount of information required for lawsuits: “In the event of running a defense lawsuit there is a voracious appetite for information.”

• HE quote on which tools are used in class action lawsuits: “For big class action lawsuits Excel is the workhorse for managing information.”

Comments on Proposed Roadmap

• No roadmaps were disclosed to the customer.

Potential Differentiation Points

These are problem areas a user points out which may lead to features and concepts incorporated into product(s). While no feature has been suggested directly, these points should eventually map to a definable capability or a trend that maps to a series of features within a product.

• The ability to make the integrity evaluation quickly without having to hire experts would be very important.

• Building a web-based system which has varied access control mechanisms and can allow for the inclusion/exclusion of unstructured data objects based on search terms was explicitly mentioned by the lawyer.

• One can infer that, due to the regulations mentioned by the interviewee, regionally specific search islands that are unified using a search federation model might be applicable. For example, with privacy laws differing between Europe and the US, employee information may not be exported from Europe to the US. It is therefore preferable that regionally specific searches are done, implying that a federation model might be preferable.

Other Topics

• Federal Rules of Civil Procedure: He stated that the FRCP does not become active until there is a lawsuit; however, that does not imply companies should not be prepared. It essentially requires that any company must be able to track, maintain, and produce records in printed and native form, including their metadata. Data must only be retained in the event of litigation; at that point, a litigation hold is required on the relevant assets which are a part of the trial/discovery process. One final point of clarification: there might be other regionally specific regulations which clearly specify data retention rules and policies that must be adhered to; the FRCP does not countermand those rules or regulations.

o Companies must have a way of storing data related to litigation in a safe platform.

o State court will most likely follow the federal approach either in law or guidance.


o A typical case where the FRCP might be applicable is a class action lawsuit. This is most interesting because of the large-scale requirements for data gathering, etc.

• An email correspondence or trail is used to show what an actual person is doing, in other words whether they are eligible for overtime, etc. Of late there have been class action lawsuits over wages and compensation, with Electronic Arts being one example.

• Data Integrity – to the attorney, this means data has not been manipulated one way or the other. When asked how one would determine that a record is authentic, the answer is that expert witnesses are typically hired to judge the integrity of the email or document.

o Typically some of the plaintiffs suggest that the defendants are manufacturing emails. This implies that there are questions about data integrity.

• While there aren’t any specific cases related to showing the chain of custody to protect the privacy of employees’ personnel information, including who has accessed the data, the lawyer can imagine that it would be a problem. Paper documents are particularly problematic in this area, especially for HR departments.

o Most HR departments are still using paper, but there is a growing trend of HR IS systems being deployed.

• In order to show that we are doing the right things, employers need to document various performance activities of their employees. This is a burden on management, and building the long-term case is important. If there were a way to make that process go more quickly for the managers and HR teams involved, it would be a time-saving system. This applies to performance actions of both negative and positive kinds. He suggested something that provides a little flexibility yet prompts them down the path to make the process easier.

• When talking about employment lawsuits and providing access to information, the attorney being interviewed suggested some kind of web portal infrastructure that both the plaintiff and the defendant could access for information sharing.

  o One may want to export these materials in a known format for seeing the case history, or use a web portal for remote access.

  o Want to see every communication or document from the employee; however, there is a danger in creating a virtual personnel file. Perhaps it would be a temporal, shadow, managers-only file.

  o Standard emails aren’t appropriate for the personnel file.

  o The easier you make it for the defendant, the easier you make it for the plaintiff.

• The technology used for data winnowing and gathering is still fairly immature. For instance, having to manage 10,000 documents to winnow them down to the right set is not very easy to do today. While the tools are getting more and more mature, they are still somewhat arduous.

• In the event of the portal kind of system there may be a requirement to export the results to PDF and in a printable format. However, this is not the best approach.

• “For every employment matter there is no consistency in what I get.”

  o Offer letters, Performance Improvement Plans, Reviews, etc.

  o Documents that are produced by a particular employee are not usually needed unless required as evidence to prove that their work is of good or bad quality. This is done mostly in the counseling context.


• What organizational changes, post 9/11, are being created in organizations?

  o It is not 9/11 related, but there are things related to SoX, which are more corporate infrastructure in nature.

• How do you know what to keep or what not to keep, pre-litigation?

  o We tell our employers that they need to keep things for 5 years after the employee has departed. This may not apply to the typical email, like one asking to go out to dinner; there is not any obligation in the ordinary course to keep it.

• Do employees have the right to take their employee file with them when they leave a company?

  o State dependent and country dependent; there are some states that allow users to take their file or a copy of their file with them.

  o For example in California, the employee may request to look at their file but not take it with them.

• Is there anything in the realm of audio files that should be kept?

  o Yes. As voicemails are a regular part of the business process and may be needed to defend a case, retention rules are applicable to these as well.

• What employee metadata is needed when looking at information for discovery purposes?

  o Employee Name

  o Employee Serial Number

  o Employee Job category – in some cases this may be hard to identify. In one example they were going through every category in the payroll system to look for all people who are close to that job type in a given organization. In some instances this requires looking at the paper pay records.

• For lawsuit management at this particular law firm there is an extranet system that contains all of the materials for a given case. This system allows the lawyers and workers on the case to have varied access to the materials within the portal.

• Having the users from the other side able to search the content and get back results which don’t include the actual content is something that is very interesting. From there they could get the listing of results and bring the list to the attention of the opposing side, allowing the opposing counsel to gain access to their desired information.

• Key pain points from this attorney’s perspective:

  o Document collection and analysis is something that has to be done, and the new regulations make this very hard and challenging.

  o Keeping track of the time spent on a given activity. It might be possible to create a small report that tells you how long your session has lasted; this could help users.

Related JSON Data

{
    "customer": "Heller Ehrman",
    "fileName": "http://localhost/~mihay/docs/20070202-AMER-USA-CALIFORNIA-PALO%20ALTO-LEGAL-Content%20Services-Heller%20Ehrman-Interview.docx",
    "imageURL": "http://localhost/~mihay/images/Heller_Ehrman_1.png",
    "index": 2,
    "interviewDate": "2007-02-02",
    "keyterms": [
        "data", "heller", "lawsuit", "make", "points", "some", "system",
        "access", "content", "file", "may", "quote", "information",
        "employee", "when"
    ],
    "label": "Heller Ehrman Interview",
    "place": "PALO ALTO, CALIFORNIA, USA",
    "region": "AMER",
    "type": "Interview",
    "vertical": "LEGAL"
},
{
    "index": 2,
    "interviewDate": "2007-02-02",
    "label": "2",
    "type": "Interviews"
},
{
    "id": "Heller Ehrman Interview",
    "index": 2,
    "placeLatLng": "37.45542, -122.16708"
},


Appendix 4 – Prototype Exemplary Source Code

Key Word Extraction & Image Creation

import requests, collections, bs4, re, operator, os, urllib2, json
from collections import Counter
from roundup.backends.indexer_common import STOPWORDS
from pytagcloud import create_tag_image, create_html_data, make_tags, \
    LAYOUT_HORIZONTAL, LAYOUTS, LAYOUT_MIX, LAYOUT_VERTICAL, \
    LAYOUT_MOST_HORIZONTAL, LAYOUT_MOST_VERTICAL
from pytagcloud.colors import COLOR_SCHEMES
from pytagcloud.lang.counter import get_tag_counts
from os import walk

"""
Desc:   Make a list containing three dicts which will turn into JSON stanzas
input:  fileName - (string) "20120507-AMER-USA-FLORIDA-MELBOURNE-SI-CC-Northrup Grumman-Interview"
        keyTerms - (string) '["term", "term"]'
        idx      - (integer) 1
return: three dicts suitable for conversion into a JSON structure, customer
        name with idx appended
"""
def makeJSONStanza(fileName, keyTerms, idx):
    # Split the structured file name into its component fields
    myBits = re.split('\\-', fileName)
    myType = myBits[-1]
    myTypePlural = 'Interviews'
    rawCustomer = myBits[7]
    friendlyCustomer = '_'.join(re.split('\W+', rawCustomer))
    myVertical = myBits[5]
    imageLoc = 'http://localhost/~mihay/images/'
    fileURL = 'http://localhost/~mihay/docs/' + fileName + '.docx'
    myLongLat = 'UNDEFINED'
    myDate = myBits[0][0:4] + "-" + (myBits[0])[4:6] + "-" + (myBits[0])[6:8]
    myRegion = myBits[1]
    myCountry = myBits[2]
    myStateProvince = myBits[3]
    myCity = myBits[4]
    studyName = myBits[6]  # was "studyName=[6]"; corrected to index the name bits
    myPlace = myCity + ', ' + myStateProvince + ', ' + myCountry
    return [
        {'label': rawCustomer + " " + myType, 'type': myType,
         'imageURL': imageLoc + friendlyCustomer + '_' + idx + '.png',
         'interviewDate': myDate, 'vertical': myVertical,
         'customer': rawCustomer, 'index': int(idx), 'keyterms': keyTerms,
         'fileName': fileURL, 'region': myRegion, 'place': myPlace},
        {'label': idx, 'type': myTypePlural, 'interviewDate': myDate,
         'index': int(idx)},
        {'id': rawCustomer + " " + myType, 'placeLatLng': myLongLat,
         'index': int(idx)}
    ], friendlyCustomer + '_' + idx  # was str(1); corrected so image names match the JSON imageURL

"""
Desc:   Make a list of tags/terms
input:  theTags - is a list of tuples
return: an array of tags
"""
def tagsToString(theTags):
    tagArray = []
    for t in theTags:
        tagArray.append(t[0])
    return tagArray

def getFileNames(myPath):
    # Collect only the .txt transcript files in the top level of myPath
    realFiles = []
    myParse = re.compile('\.txt$')
    for (dirpath, dirnames, filenames) in walk(myPath):
        for f in filenames:
            if myParse.search(f):
                realFiles.append(f)
        break
    return realFiles

# Generate a list of specific stop words we want to avoid
myStopWords = ['THEM','O','NG','S','T','MANY','LOTS','HAVE','HAS','HAD','FROM',
               'FOR','DO','DOES','DOESN','CAN','AN','ALL','ABOUT','CAME','WOULD',
               'WAY','WANT','WHICH','YOU','YET','X','VERY','VIA','U','OUR','NO',
               'ALSO','SUCH','ALL','HDS','NEED','DIFFERENT','OTHER','OTHERS',
               'SYSTEMS','USED','WHAT']
STOPWORDS.extend(myStopWords)

myFiles = getFileNames('./docs')
allJSON = []
myIdx = 1
for theFile in myFiles:
    # Define the file name and split up the name bits into usable chunks
    myFileName = re.split('\\.', theFile)[0]
    with open("./docs/" + theFile) as file:
        myText = file.read().lower()
    # Capture the top 15 most common key terms
    counts = collections.defaultdict(int)
    for word in re.split('\W+', myText):
        if word.upper() not in STOPWORDS and len(word) > 2:
            counts[word.lower()] += 1
    words = sorted((count, word) for word, count in counts.items())
    myTags = [(word, count) for count, word in words[-15:]]
    keyTermString = tagsToString(myTags)
    (JSONStanza, friendlyCustomer) = makeJSONStanza(myFileName, keyTermString,
                                                    str(myIdx))
    allJSON.extend(JSONStanza)
    myIdx += 1
    tags = make_tags(myTags, minsize=10, maxsize=17)
    create_tag_image(tags, './images/' + friendlyCustomer + '.png',
                     size=(192, 128), background=(255, 255, 255, 255),
                     layout=2, fontname='Philosopher', rectangular=True)

JSONFileName = './keyterms.js'
JSONFile = open(JSONFileName, 'w')
JSONFile.write(json.dumps(allJSON, sort_keys=True, indent=4,
                          separators=(',', ': ')))
JSONFile.close()
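
As written, the script expects a ./docs directory of plain-text transcripts whose file names follow the structured schema shown in the makeJSONStanza docstring (date, region, country, state/province, city, vertical, study, customer, and type, joined by hyphens). It emits one tag-cloud PNG per transcript under ./images and a keyterms.js JSON feed consumed by the Exhibit-based user interfaces. It also depends on the third-party pytagcloud package and borrows its base stop-word list from the Roundup indexer.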


Web User Interface for Alternative 4 – Source & GUI

<html>
  <head>
    <title>Key Words by Location and over Time</title>
    <meta http-equiv="content-type" content="text/html;charset=UTF-8" />
    <link href="schema.js" type="application/json" rel="exhibit/data" />
    <link href="keywords.js" type="application/json" rel="exhibit/data" />
    <script type="text/javascript"
            src="http://localhost/~mihay/scripted/src/exhibit-api.js?bundle=false"></script>
    <link rel="exhibit-extension" type="text/javascript"
          href="http://localhost/~mihay/scripted/src/extensions/map/map-extension.js?service=google&bundle=false"/>
    <link rel="exhibit-extension" type="text/javascript"
          href="http://localhost/~mihay/scripted/src/extensions/time/time-extension.js?bundle=false" />
    <link rel='stylesheet' href='styles.css' type='text/css' />
  </head>
  <body>
    <div data-ex-role="collection" data-ex-item-types="Interview"></div>
    <table id="frame">
      <tr>
        <td id="sidebar">
          <h1>Filters</h1>
          <div id="exhibit-browse-panel">
            <b>Search:</b>
            <div data-ex-role="facet" data-ex-facet-class="TextSearch"></div>
            <hr/>
            <div data-ex-role="facet" data-ex-expression=".customer"
                 data-ex-facet-label="Customers" data-ex-height="10em"></div>
            <div data-ex-role="facet" data-ex-expression=".keyterms"
                 data-ex-facet-label="Key Terms" data-ex-height="10em"></div>
            <div data-ex-role="facet" data-ex-expression=".vertical"
                 data-ex-facet-label="Verticals" data-ex-height="10em"></div>
          </div>
        </td>
        <td id="content">
          <div data-ex-role="coordinator" id="Interview"></div>
          <h1>Timeline and Geography</h1>
          <div class="item" data-ex-role="lens" style="display: none;">
            <div><img data-ex-src-content=".imageURL" /></div>
            <div>Name: <a data-ex-href-content=".fileName"><span data-ex-content=".label"/></a></div>
            <div>Date: <span data-ex-content=".interviewDate"/></div>
            <div>Vertical: <span data-ex-content=".vertical"/></div>
          </div>
          <div data-ex-role="view"
               data-ex-formats="date { mode: medium; show: date }"
               data-ex-view-class="Timeline"
               data-ex-label="Key Words over Time"
               data-ex-start=".interviewDate"
               data-ex-autoposition="true"
               data-ex-bubble-width="320"
               data-ex-top-band-pixels-per-unit="400"
               data-ex-show-summary="false"
               data-ex-timeline-height="200"
               data-ex-select-coordinator="Interview">
          </div>
          <div data-ex-role="viewPanel" data-ex-initial-view="0"
               data-ex-formats="date { mode: medium; show: date }">
            <div class="map-lens" data-ex-role="lens" style="display: none;">
              <div><img data-ex-src-content=".imageURL" /></div>
              <div>Name: <a data-ex-href-content=".fileName"><span data-ex-content=".label"/></a></div>
              <div>Date: <span data-ex-content=".interviewDate"/></div>
              <div>Vertical: <span data-ex-content=".vertical"/></div>
            </div>
            <div data-ex-role="view" data-ex-view-class="Map"
                 data-ex-label="Key Words by Location"
                 data-ex-latlng=".placeLatLng"
                 data-ex-center="38.479394673276445, -115.361328125"
                 data-ex-zoom="3"
                 data-ex-bubble-width="200"
                 data-ex-icon=".imageURL"
                 data-ex-shape-width="70"
                 data-ex-shape-height="70"
                 data-ex-select-coordinator="Interview">
            </div>
          </div>
        </td>
      </tr>
    </table>
  </body>
</html>


Appendix 5 – Survey Questions and Responses

Questions

Feedback on the Customer Intelligence Prototype

1. Based upon your experience please rank the ease of use for Alternative-1.
   • Extremely easy to use
   • Moderately easy to use
   • Difficult to use
2. Based upon your experience please rank the ease of use for Alternative-2.
   • Extremely easy to use
   • Moderately easy to use
   • Difficult to use
3. Based upon your experience please rank the ease of use for Alternative-3.
   • Extremely easy to use
   • Moderately easy to use
   • Difficult to use
4. Based upon your experience please rank the ease of use for Alternative-4.
   • Extremely easy to use
   • Moderately easy to use
   • Difficult to use
5. After looking at the prototype, generally I think that using key word analysis and visualization techniques is helpful for gathering customer intelligence.
   • Strongly agree
   • Moderately agree
   • Agree
   • Moderately disagree
   • Strongly disagree
6. After experiencing the prototype I think we should invest in improvements to manage our customer intelligence materials.
   • Agree
   • Disagree
   • Not Sure


Responses


Appendix 6 – Raw Data with Descriptive Statistics and Key Statistical Tests

Raw Data with Descriptive Statistics

(Per-session completion times, sessions 1–30, recorded per alternative as completed, C, and incomplete, INC, with individual cells flagged in the legend as "= Error", "= Slip", or "= Missing".)

                   Alt-1-C   Alt-1-INC  Alt-2-C   Alt-2-INC  Alt-3-C   Alt-3-INC  Alt-4-C   Alt-4-INC
Count              27        20         27        2          24        6          27        3
Session Count      30        30         30        30         30        30         30        30
Percent Resp.      90%       67%        90%       7%         80%       20%        90%       10%
1 - Percent Resp.  10%       33%        10%       93%        20%       80%        10%       90%
Average            142.67    82.05      83.44     27.50      175.50    121.00     76.93     52.33
Variance           4419.62   6219.00    1259.03   544.50     4909.13   3965.60    1893.61   520.33
Standard Dev.      66.48     78.86      35.48     23.33      70.07     62.97      43.52     22.81

                   Alt-1     Alt-2      Alt-3     Alt-4
Errors             3.00      1.00       5.00      2.00
Slips              17.00     1.00       1.00      1.00
Percent Err.       10%       3%         17%       7%
Percent Slip       57%       3%         3%        3%


Key Statistical Tests

ANOVA

Analysis of Variance (One-Way)

Summary

Groups   Sample size   Sum           Mean        Variance
Alt-1    30            4,280.        142.66667   3,962.41379
Alt-2    30            2,503.33333   83.44444    1,128.78161
Alt-3    30            5,265.        175.5       3,893.44828
Alt-4    30            2,307.77778   76.92593    1,697.71903

ANOVA

Source of Variation   SS              df    MS             F          p-level       F crit
Between Groups        203,555.31636   3     67,851.77212   25.40703   1.04639E-12   2.68281
Within Groups         309,788.51852   116   2,670.59068
Total                 513,343.83488   119
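
For readers who want to re-run this analysis as the underlying data evolve, the following is a minimal sketch of the same one-way ANOVA in Python with SciPy; the four sample lists are hypothetical placeholders, not the study's raw timings.

# A minimal sketch, assuming SciPy is available: recompute a one-way ANOVA
# across the four alternatives. The lists are hypothetical placeholder
# completion times (seconds), not the raw data from this study.
from scipy import stats

alt1 = [120, 150, 135, 160, 142]  # placeholder
alt2 = [80, 95, 70, 88, 84]       # placeholder
alt3 = [170, 180, 165, 190, 172]  # placeholder
alt4 = [75, 82, 70, 79, 77]       # placeholder

f_stat, p_level = stats.f_oneway(alt1, alt2, alt3, alt4)
print("F = %.5f, p-level = %.3g" % (f_stat, p_level))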


T-Tests


1"&"3"" " "Comparing Means [ t-test assuming equal variances (homoscedastic) ]

Descriptive Statistics VAR Sample size Mean Variance

30" 142.66667" 3,962.41379"

30" 175.5" 3,893.44828"

" " " "Summary Degrees Of Freedom 58"

Hypothesized Mean Difference 0.E+0"

Test Statistics 2.02898" Pooled Variance 3,927.93103"

" " " "Two-tailed distribution p-level 0.04706" t Critical Value (5%) 2.00172"

" " " "" " " "1&2"

" " "Comparing Means [ t-test assuming equal variances (homoscedastic) ] Descriptive Statistics

VAR Sample size Mean Variance

30" 142.66667" 3,962.41379"

30" 83.44444" 1,128.78161"

" " " "Summary Degrees Of Freedom 58"

Hypothesized Mean Difference 0.E+0"

Test Statistics 4.54606" Pooled Variance 2,545.5977"

" " " "Two-tailed distribution p-level 0.00003" t Critical Value (5%) 2.00172"

" " " "" " " "1&4"

" " "Comparing Means [ t-test assuming equal variances (homoscedastic) ] Descriptive Statistics

VAR Sample size Mean Variance

30" 142.66667" 3,962.41379"

30" 76.92593" 1,697.71903"

" " " "Summary Degrees Of Freedom 58"

Hypothesized Mean Difference 0.E+0"

Page 48: Masters Project - FINAL - Public

48

2&3$$ $ $Comparing Means [ t-test assuming equal variances (homoscedastic) ]

Descriptive Statistics VAR Sample size Mean Variance

30$ 83.44444$ 1,128.78161$

30$ 175.5$ 3,893.44828$

$ $ $ $Summary Degrees Of Freedom 58$

Hypothesized Mean Difference 0.E+0$

Test Statistics 7.11479$ Pooled Variance 2,511.11494$

$ $ $ $Two-tailed distribution p-level 0.000000002$ t Critical Value (5%) 2.00172$

$ $ $ $$ $ $ $2&4$

$ $ $Comparing Means [ t-test assuming equal variances (homoscedastic) ] Descriptive Statistics

VAR Sample size Mean Variance

30$ 83.44444$ 1,128.78161$

30$ 76.92593$ 1,697.71903$

$ $ $ $Summary Degrees Of Freedom 58$

Hypothesized Mean Difference 0.E+0$

Test Statistics 0.67156$ Pooled Variance 1,413.25032$

$ $ $ $Two-tailed distribution p-level 0.50453$ t Critical Value (5%) 2.00172$

Page 49: Masters Project - FINAL - Public

49

3&4$$ $ $Comparing Means [ t-test assuming equal variances (homoscedastic) ]

Descriptive Statistics VAR Sample size Mean Variance

30$ 175.5$ 3,893.44828$

30$ 76.92593$ 1,697.71903$

$ $ $ $Summary

Degrees Of Freedom 58$Hypothesized Mean Difference 0.E+0$

Test Statistics 7.22058$ Pooled Variance 2,795.58365$

$ $ $ $Two-tailed distribution p-level 0.000000001$ t Critical Value (5%) 2.00172$
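
Similarly, each pairwise comparison above can be reproduced with SciPy's independent two-sample t-test; equal_var=True matches the homoscedastic assumption used here. The samples below are again hypothetical placeholders, not the study's raw timings.

# A minimal sketch, assuming SciPy: a homoscedastic two-sample t-test like
# the pairwise comparisons above. Placeholder samples only.
from scipy import stats

alt2 = [80, 95, 70, 88, 84, 91]   # placeholder completion times (s)
alt4 = [75, 82, 70, 79, 77, 85]   # placeholder completion times (s)

t_stat, p_level = stats.ttest_ind(alt2, alt4, equal_var=True)
print("t = %.5f, two-tailed p-level = %.5f" % (t_stat, p_level))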