
Bridging Text Visualization and Mining: A Task-Driven Survey

Shixia Liu, Xiting Wang, Christopher Collins, Member, IEEE, Wenwen Dou,

Fangxin Ouyang, Mennatallah El-Assady, Liu Jiang, and Daniel A. Keim

Abstract—Visual text analytics has recently emerged as one of the most prominent topics in both academic research and the commercial world. To provide an overview of the relevant techniques and analysis tasks, as well as the relationships between them, we comprehensively analyzed 263 visualization papers and 4,346 mining papers published between 1992 and 2017 in two fields: visualization and text mining. From the analysis, we derived around 300 concepts (visualization techniques, mining techniques, and analysis tasks) and built a taxonomy for each type of concept. The co-occurrence relationships between the concepts were also extracted. Our research can be used as a stepping-stone for other researchers to 1) understand a common set of concepts used in this research topic; 2) facilitate the exploration of the relationships between visualization techniques, mining techniques, and analysis tasks; 3) understand the current practice in developing visual text analytics tools; 4) seek potential research opportunities by narrowing the gulf between visualization and mining techniques based on the analysis tasks; and 5) analyze other interdisciplinary research areas in a similar way. We have also contributed a web-based visualization tool for analyzing and understanding research trends and opportunities in visual text analytics.

Index Terms—Visualization, visual text analytics, text mining


1 INTRODUCTION

The significant growth of textual data and the rapid advancement of text mining have led to the emergence and prevalence of visual text analytics [1], [2]. This research combines the advantages of interactive visualization and text mining techniques to facilitate the exploration and analysis of large-scale textual data from both a structured and unstructured perspective. Visual text analytics has recently emerged as one of the most prominent topics in both academic research and the commercial world. For example, a leading business intelligence system, Power BI, announced features that enable the exploration and analysis of textual collections in 2016 [3]. The ultimate goal of visual text analytics is to enable human understanding and reasoning about large amounts of textual information in order to derive insights and knowledge [4].

Due to the rapid expansion of research in the area of visual text analytics [4], [5], [6], [7], there is a growing need for a meta-analysis of this area to support understanding how approaches have been developed and evolved over time, and their potential to be integrated into real-world applications. There are several initial efforts to summarize the existing text visualization techniques by different aspects, such as data sources, tasks, and visual representations [7], [8], [9]. For example, Alencar and Oliveira summarized around 30 text visualization techniques published before 2013 [7]. Later, Kucher and Kerren extended this survey work by creating a comprehensive taxonomy with multiple categories and items in order to classify the techniques with fine granularity [1]. They also developed a useful Web-based survey browser to facilitate the exploration of the created taxonomy. The most recent effort focuses on analyzing scientific literature [10]. While these efforts provide an overview of text visualization techniques, they do not investigate the underlying text mining techniques. On the other hand, several comprehensive surveys of the work on text mining have been published summarizing the relevant research progress [11], [12], [13], [14], [15], [16], [17]. These surveys provide much valuable complementary information to existing literature reviews on text visualization. However, they do not establish a link between text mining and interactive visualization. The most powerful visual text analytics systems make use of advanced data mining algorithms and techniques, and we are still in need of an overview, accounting for both the user-facing visualization and the back-end data mining approaches. Our survey will provide practical knowledge that relates to building visual text analytics tools, and it will support researchers in

• S. Liu, F. Ouyang and L. Jiang are with Tsinghua University, Beijing 100084, P. R. China. E-mail: [email protected], {oyfx15, jiangl16}@mails.tsinghua.edu.cn.

• X. Wang is with Microsoft Research, Beijing 100080, China. E-mail: [email protected].

• C. Collins is with University of Ontario Institute of Technology (UOIT), Oshawa, ON L1H 7K4, Canada. E-mail: [email protected], [email protected].

• W. Dou is with University of North Carolina at Charlotte, Charlotte, NC 28223. E-mail: [email protected].

• M. El-Assady is with UOIT, Oshawa, ON L1H 7K4, Canada, and University of Konstanz, Konstanz 78464, Germany. E-mail: [email protected].

• D.A. Keim is with University of Konstanz, Konstanz 78464, Germany.

Manuscript received 1 Jan. 2018; revised 25 Apr. 2018; accepted 1 May 2018. Date of publication 8 May 2018; date of current version 27 May 2019. (Corresponding author: Shixia Liu.) Recommended for acceptance by J. van Wijk. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference the Digital Object Identifier below. Digital Object Identifier no. 10.1109/TVCG.2018.2834341

2482 IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 25, NO. 7, JULY 2019

1077-2626 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


discovering opportunities to narrow the gulf between the visualization and text mining fields. In particular, the gaps between these two fields based on analysis tasks have not been explored yet. Many available text mining techniques that may be useful for visual text analytics have not been connected to visualizations. This may hinder the further development of this research area.

Hence, our approach is to highlight associations between analysis tasks, visualization techniques, and text mining techniques through a set of taxonomies. As shown in Fig. 1, these taxonomies contribute to 1) understanding current practices in developing visual text analytics tools, the on-going research, and the principal research trends; and 2) seeking potential research opportunities by relating text visualization research with text mining research. Ultimately, these taxonomies will pave the way for identifying the relationships and gaps between analysis tasks and techniques (visualization and text mining) to explore future research directions and develop novel applications.

To this end, we analyzed over 4,600 research papers published between 1992 and 2017 in 4 journals (IEEE TVCG, ACM TOCHI, IEEE TKDE, and JMLR) and 16 conference proceedings (InfoVis, VAST, SciVis, EuroVis, PacificVis, AVI, CHI, IUI, KDD, WWW, AAAI, IJCAI, ICML, NIPS, ACL and SIGIR) in the fields of text mining and visualization. As shown in Fig. 2, a semi-automatic analysis process was designed to analyze the text visualization and mining literature. We first extracted and summarized the concepts (described in phrases) that capture visualization techniques, mining techniques, and analysis tasks related to visual text analytics. Our extraction method is based on a pattern-based concept extraction algorithm [18]. Accordingly, three concept taxonomies—visualization techniques, analysis tasks, and mining techniques—were derived. The relationships between the concepts in the taxonomies were then extracted by using the co-occurrence statistics between different types of concepts in papers (e.g., the co-occurrence of the visualization techniques and analysis tasks). Finally, a graph-based interactive visualization was developed to help understand and analyze the three taxonomies and the relationships between them.
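As an illustration of this step, co-occurrence counting between concept types can be sketched in a few lines. This is an illustrative Python sketch rather than the authors' implementation; it assumes each paper has already been annotated with its sets of visualization-technique and analysis-task concepts:

```python
from collections import Counter
from itertools import product

def cooccurrence(papers):
    """Count how often a visualization technique and an analysis
    task are mentioned in the same paper.

    `papers` is a list of dicts holding one concept set per type,
    e.g. {"vis": {"word cloud"}, "task": {"topic analysis"}}.
    """
    counts = Counter()
    for paper in papers:
        # Every (technique, task) pair in the same paper co-occurs once.
        for vis, task in product(paper["vis"], paper["task"]):
            counts[(vis, task)] += 1
    return counts

# Toy annotated corpus (invented for illustration).
papers = [
    {"vis": {"word cloud"}, "task": {"topic analysis"}},
    {"vis": {"word cloud", "timeline"}, "task": {"topic analysis"}},
]
print(cooccurrence(papers)[("word cloud", "topic analysis")])  # 2
```

The resulting pair counts are exactly the edge weights that a graph-based visualization such as the one described above can display.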

By semi-automatically refining and analyzing these concepts in an interactive and progressive process, we identified the following key features of visual text analytics. First,

Fig. 1. Task-oriented analysis: visualization and mining techniques are connected through their shared analysis tasks (here, only 80 percent of the edges are shown), to aid researchers and practitioners in understanding current practices in visual text analytics, identifying research gaps, and seeking potential research opportunities. Concepts are organized into three taxonomies which can be navigated interactively.

Fig. 2. The analysis pipeline aims at extracting, correlating, organizing, and presenting three types of concepts: mining techniques, analysis tasks, and visualization techniques. Three concepts comprise our description of the space of visual text analytics.



the most used visualization techniques are traditional ones such as chart visualizations, typographic visualizations, and graph visualizations. The popularity of these techniques is probably due to their simplicity and intuitiveness, as well as the employment of more advanced mining techniques in recent years. Second, a set of less frequently studied tasks in visual text analytics were identified, which include “information retrieval,” “network analysis,” “classification,” “outlier analysis,” and “predictive analysis.” Third, different analysis tasks are supported by different techniques, including text mining and visualization techniques, as well as their combination.

Through a task-oriented and data-driven analysis, we thoroughly investigated each of the three concept taxonomies and their connections to identify overarching research trends and under-investigated research topics within visual text analytics. Consequently, we map out future directions and research challenges to enhance the development of new visual text analytics techniques and systems. The major contributions of this work are:

• A semi-automatic analysis approach that focuses on extracting, understanding, and analyzing the major concepts in the area of visual text analytics. This approach can be easily extended to analyze other research areas.

• Three concept taxonomies and a data-driven method to extract the relationships between them, better revealing overarching research trends and missing research topics within visual text analytics.

• A web-based visualization tool that enables the analysis of the major research trends and the potential research directions in visual text analytics. This visualization tool is published at: http://visgroup.thss.tsinghua.edu.cn/textvis.

• A comprehensive survey of the literature in visual text analytics, classifying thousands of papers along our technique and task taxonomies.

2 SURVEY LANDSCAPE

To obtain an overview of visual text analytics and the relationship between the relevant visualization and mining fields, we systematically reviewed research articles from both fields. The approach we took for each type of article was different. For visualization papers, we followed an exhaustive manual review of relevant venues. For text mining papers, we followed the semi-automated approach of Sacha et al. [19], which is a combination of a manual selection and an automatic keyword-based extraction method.

2.1 Paper Selection: Visualization

There are fewer visualization papers than text mining papers. Therefore, it was possible to do an in-depth manual selection. For this group of papers we preferred precision in that we only wanted papers that deal with visualizing text data. Depending on how closely related the venue (e.g., a conference) was to visualization research, we followed two main approaches: full coverage or search-driven selection. For venues identified as being primarily about visualization, we reviewed every title from all the conference proceedings to identify candidates. We then reviewed the abstracts of candidates, and finally the full text of any papers when it was not clear from the title and abstract whether the paper contained any text visualizations. The venues for this approach were: InfoVis, VAST, Vis (later SciVis), EuroVis, AVI, PacificVis, and IUI. For higher volume venues with a larger proportion of irrelevant papers, we used two search queries (“text” AND “visualization”; “text” AND “analytics”), then reviewed titles, abstracts, and full texts to finalize the selections. This approach was applied to all available years of: CHI, KDD, and WWW. Finally, we used this same search-driven selection approach on all available years of two relevant journals: IEEE TKDE and IEEE TVCG. This resulted in a total of 263 included papers that deal with text visualization and visual text analytics. Our method is extensible, so it would be possible to add further venues to the analysis and accompanying website in the future.

2.2 Paper Selection: Text Mining

For the text mining papers, we optimize for recall as we are open to including text mining techniques that may be unknown or underutilized in the visualization field. To provide good coverage of visual text analytics and related text mining methods, we followed the approach of Sacha et al. [19] to extract relevant research papers in a semi-automated fashion. The paper collection was done in three steps.

In the first step, we performed seed paper selection manually by reviewing titles and abstracts from the most recent year of papers from leading data mining conferences and journals (AAAI, KDD, WWW, SIGIR, ICML, NIPS, IJCAI, ACL, and JMLR). 607 papers were thus identified. These were combined with the papers from our visualization paper collection to create the seed collection of papers used in the second step. A detailed description of the statistics is shown in Fig. 3.

In the second step, we performed keyword extraction based on the seed paper collection. Specifically, we manually checked the top keywords of the seed paper collection and selected 10 keywords that denote the data source.

Fig. 3. Corpus size by data source over time. The volume of mining papers is much larger than that of visualization papers.



The keywords then serve as query terms to retrieve more mining papers. The 10 extracted keywords were: “text, document, blog, news, tweet, twitter, wikipedia, book, microblog, textual.”

In the third step, we retrieved the full text of all papers from the aforementioned top data mining venues. The retrieved papers were indexed by using Lucene [20]. We then searched the Lucene index with the 10 keywords extracted from the second step, and ranked the papers based on the relevance score provided by Lucene. Papers with relevance scores larger than 0.1 were selected as the text mining papers. This cut-off threshold of 0.1 was used because we found that it can balance precision and recall. After this step, we obtained 4,346 mining papers.
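A simplified version of this filtering step can be sketched as follows. The code is not Lucene: it uses a crude token-frequency relevance as a stand-in for Lucene's scoring, and the corpus snippets are invented, but it mirrors the keyword-query-plus-threshold selection described above:

```python
# The 10 extracted query keywords from the second step.
KEYWORDS = {"text", "document", "blog", "news", "tweet", "twitter",
            "wikipedia", "book", "microblog", "textual"}

def relevance(full_text, keywords=KEYWORDS):
    """Fraction of tokens that are query keywords: a crude
    stand-in for Lucene's relevance score."""
    tokens = full_text.lower().split()
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in keywords)
    return hits / len(tokens)

def select_papers(corpus, threshold=0.1):
    """Keep papers whose relevance exceeds the cut-off threshold."""
    return [pid for pid, text in corpus.items()
            if relevance(text) > threshold]

# Invented corpus snippets for illustration.
corpus = {
    "p1": "we mine text streams and cluster document topics",
    "p2": "a hardware accelerator for sparse matrix multiplication",
}
print(select_papers(corpus))  # ['p1']
```

In the actual pipeline the score comes from Lucene's ranking model rather than this token fraction, but the selection logic, score each paper and keep those above a tuned cut-off, is the same.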

Since the number of text mining papers is an order of magnitude greater than the number of visual text analytics related articles, analyzing the two corpora jointly may lead to the results being dominated by patterns from the text mining articles. Therefore, we developed an approach for analyzing each corpus separately and connecting the two corpora by tasks and techniques.

2.3 Analysis Process

Recently, Isenberg et al. [21] developed a data-driven approach to determine major visualization topics and point out missing and overemphasized topics. To this end, they mainly focused on examining and analyzing two sets of keywords: author-assigned keywords and PCS taxonomy keywords. Inspired by their method, we first examined the 263 collected text visualization papers and performed a bottom-up analysis, mainly focusing on checking the employed techniques and the tasks supported by the techniques.

Our preliminary analysis revealed that visualization techniques, analysis tasks, and mining techniques are the key types of concepts in visual text analytics. We distilled the three types of concepts by 1) studying the definition, scope, and pipelines of visual (text) analytics; and 2) learning the key aspects of academic papers. A previous study in the field of natural language processing has shown that the application domain and technique are two key aspects of a scientific paper [18]. In the field of visual text analytics, information regarding application domains is usually captured by analysis tasks [22]. According to the pipelines of visual (text) analytics, mining techniques and visualization techniques are two essential types of techniques [4], [22]. Hence, our analysis is based on analysis tasks, mining techniques, and

visualization techniques. In particular, we focus on identifying the key concepts in each type, as well as the co-occurrence relationships between different types of concepts (Fig. 2).

To identify tasks and techniques from the surveyed research articles, we performed both manual coding and automated analysis that leverages the manual coding results to bootstrap concept identification on a much larger scale. In particular, visualization papers were gathered first; the manually labeled tasks and mining techniques from these papers were later used as seeds for extracting further concepts from the mining papers.

We developed a semi-automatic method to extract the three taxonomies from our source papers and identified the relationships between them. Fig. 2 shows the corresponding workflow to generate and analyze the three types of concepts in visual text analytics. The workflow consists of three levels: concept extraction to extract the three types of concepts by using a computational linguistics method [18]; taxonomy building to create a concept taxonomy for each type of concept manually or by the K-means clustering algorithm; and concept visualization to facilitate the understanding and analysis of the concepts and the relationships among them. In the following sections, each of the three levels will be described in more detail.

3 CONCEPT EXTRACTION

Our approach extracts concepts from both text visualization papers (total number: 263) and text mining papers (total number: 4,346). As shown in Fig. 4, concepts from the text visualization papers were annotated manually. In total, we got 92 mining techniques, 77 analysis tasks, and 25 visualization techniques. Since the number of text mining papers was large, we used a semi-automatic method to extract the corresponding analysis tasks and mining techniques. This method combines automatic concept extraction with two types of expert knowledge: 1) the mining techniques and analysis tasks extracted manually from the text visualization papers; and 2) expert labeling and refinement of the concepts. Specifically, our method consists of three steps: candidate concept extraction, expert labeling and classification, and concept refinement.

Candidate Concept Extraction. In the first step, we extractthe candidate concepts using a computational linguistics

Fig. 4. The pipeline of concept extraction.



method proposed by Gupta et al. [18]. Given a collection of research papers, this method extracts analysis tasks (e.g., speech recognition) and techniques (e.g., latent Dirichlet allocation) by matching dependency patterns between words. For example, given a sentence “we addressed this problem by using latent Dirichlet allocation,” we can extract the technique “latent Dirichlet allocation” by using pattern “using → (direct object).” A key here is to define the patterns used to extract the techniques and tasks (application domains). In this paper, we combine the seed patterns provided by Gupta et al. [18] with the patterns we extracted by using the mining techniques and tasks extracted from the visualization papers. In total, we extracted 1,772 candidate tasks and 8,805 candidate mining techniques.
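The pattern-matching idea can be illustrated with a toy matcher. The sketch below is not Gupta et al.'s implementation; it assumes a sentence has already been dependency-parsed into (head, relation, dependent) triples, and the trigger-word list is invented for illustration:

```python
def extract_candidates(dependencies, triggers=frozenset({"using", "use"})):
    """Extract candidate technique phrases: the direct object of a
    trigger word, matching the pattern  trigger -> (direct object).

    `dependencies` is a list of (head, relation, dependent) triples
    as produced by a dependency parser.
    """
    return [dep for head, rel, dep in dependencies
            if head in triggers and rel == "dobj"]

# Simplified parse of "we addressed this problem by using
# latent Dirichlet allocation".
deps = [
    ("addressed", "nsubj", "we"),
    ("addressed", "dobj", "this problem"),
    ("using", "dobj", "latent Dirichlet allocation"),
]
print(extract_candidates(deps))  # ['latent Dirichlet allocation']
```

Note that the pattern fires only on the trigger verb, so the noisy phrase “this problem” (a direct object of a non-trigger verb) is not extracted here; in general, though, such noise survives this step, which motivates the expert labeling and classification described next.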

Expert Labeling and Concept Classification. While the computational linguistics method is able to identify many candidate techniques and tasks, its ability to differentiate meaningless phrases such as “this problem” from meaningful phrases such as “text categorization” is limited. To reduce the noise, we sampled 2,000 candidate concepts, asked experts to label whether each concept was noise or not, and then used classification to identify noise in the rest of the candidate concepts. Specifically, five experts, all having more than five years of research experience in text mining and/or text visualization, were asked to label the candidate concepts. We assigned the concepts to the experts so that each concept was labeled by two experts. The labeling agreement was 81.4 percent. 1,628 concepts that were given the same label by two experts were used in the classification.

Next, we employed the support vector machine (SVM) model [254] to classify the remaining concepts. The SVM inputs were feature vectors and labels of the concepts. To calculate the feature vector of each concept, we used KNET [255], which is a deep learning framework for learning word embeddings. By using this framework, concepts with similar syntactic and semantic relationships were assigned similar feature vectors. The SVM model was trained using a five-fold cross validation with an average accuracy of 89.4 percent for analysis tasks and 91.4 percent for mining techniques. After applying the model to the remaining 8,949 candidate concepts, we found 187 analysis tasks and 718 mining techniques.
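The five-fold evaluation protocol can be sketched as follows. This sketch uses a stand-in majority-vote classifier and invented toy data instead of the SVM-plus-embedding pipeline described above, so only the cross-validation procedure itself is illustrated:

```python
def k_fold(items, k=5):
    """Yield (train, test) splits for k-fold cross validation."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

def majority_classifier(train):
    """Stand-in model: always predict the most frequent training label."""
    labels = [label for _, label in train]
    majority = max(set(labels), key=labels.count)
    return lambda features: majority

def cross_validate(data, k=5):
    """Average held-out accuracy over the k folds."""
    accuracies = []
    for train, test in k_fold(data, k):
        model = majority_classifier(train)
        correct = sum(model(x) == y for x, y in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)

# Toy (feature, label) pairs marking candidates as noise or concept.
data = [(i, "concept" if i % 3 else "noise") for i in range(20)]
print(round(cross_validate(data), 2))  # 0.65
```

In the actual pipeline the model trained on each fold is an SVM over the embedding feature vectors, and the reported 89.4 and 91.4 percent figures are the averages produced by exactly this kind of fold loop.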

Concept Refinement. When examining the concepts extracted, we observed that some of them were quite similar (e.g., “text categorization” and “text classification”). Also, the classification result of the second step was not perfect. To solve these problems, we asked two experts to manually check the results, merge similar concepts, and reduce noise. To facilitate the labeling process, we organized similar concepts into clusters by applying K-means on the word embedding feature vectors. By checking the clusters, the experts were able to detect concepts that needed to be merged or removed. Following this step, we obtained 141 refined analysis tasks and 126 refined mining techniques.
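The clustering used to group near-duplicate concepts can be sketched with a minimal K-means. This is an illustrative pure-Python version with toy two-dimensional vectors standing in for the learned word-embedding features:

```python
def kmeans(vectors, k, iters=10):
    """Minimal K-means returning one cluster index per input vector."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Deterministic seeding: start from the first vector, then
    # repeatedly add the point farthest from all chosen centers.
    centers = [vectors[0]]
    while len(centers) < k:
        centers.append(max(vectors,
                           key=lambda v: min(dist2(v, c) for c in centers)))

    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest center by squared distance.
        for i, v in enumerate(vectors):
            assign[i] = min(range(k), key=lambda c: dist2(v, centers[c]))
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [v for i, v in enumerate(vectors) if assign[i] == c]
            if members:
                centers[c] = [sum(dim) / len(members)
                              for dim in zip(*members)]
    return assign

# Toy "embedding" vectors: two near-duplicate concepts (e.g., "text
# categorization" / "text classification") and two unrelated ones.
vecs = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
labels = kmeans(vecs, k=2)
print(labels[0] == labels[1], labels[2] == labels[3])  # True True
```

Concepts that land in the same cluster are then shown together to the experts, who decide whether to merge or remove them.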

4 TAXONOMY BUILDING

Based on the three-level workflow (Fig. 2) and the analysis of a large number of text visualization papers and text mining papers, we have constructed three taxonomies: analysis tasks, visualization techniques, and mining techniques.

To build the visualization technique taxonomy, two co-authors, who are experts in the visualization field, manually constructed the taxonomy based on the 25 visualization technique concepts that were manually extracted from text visualization papers. The other co-authors then examined and refined the visualization taxonomy. The refinement was done in an iterative fashion; all co-authors participated in multiple rounds of discussions to finalize the visualization taxonomy. With this method, we generated a two-level visualization technique hierarchy with 6 internal nodes.

As more concepts were extracted for analysis tasks and mining techniques, a semi-automatic method was employed to build the corresponding taxonomies. Previous research has indicated that good taxonomies should not be too deep or too wide [256]. As a result, we strived to establish a compromise between the tree depth and width while building these two taxonomies. In particular, we first employed the popular K-means clustering algorithm to create the task taxonomy and mining taxonomy. We iteratively divided the embedding feature vectors into K clusters by using K-means and generated a four-level task hierarchy with 24 internal nodes as well as a five-level mining hierarchy with 25 internal nodes. Then two authors worked iteratively to refine and improve these two taxonomies. Next, all other authors examined and refined these two taxonomies iteratively. Finally, we consulted with 4 text mining or machine learning experts to derive the final taxonomies.

The main objective of the resulting taxonomies is to provide a framework that is useful from the researcher’s and practitioner’s standpoint. They help to match techniques to real-world problems (represented by tasks), and more importantly serve as a foundation to develop new techniques and new applications.

4.1 Taxonomy of Visualization Techniques

We manually labeled all visualization papers with the concepts in the visualization technique taxonomy. In Table 1, we present the two-level taxonomy of visualization techniques, along with a list of representative papers that use each individual visualization technique in the hierarchy (at most 30 papers for each second-level visualization technique). For each visualization technique, we reported the number of papers that use this technique.

The first-level visualization techniques include 9 concepts, such as “graph visualizations” (Fig. 5a), “timeline visualizations” (Fig. 5b), and “spatial projections” (Fig. 5c). Some first-level concepts are further organized into 2 to 4 second-level concepts. Taking “timeline visualizations” as an example, it contains two second-level concepts: “stream graph” and “flow.” As shown in Table 1, all papers that belong to a second-level visualization concept are included in the papers column.

The three first-level visualization concepts with the largest number of related papers are “typographic visualizations,” “chart visualizations,” and “graph visualizations.” The concept that captures the largest number of publications is “typographic visualizations,” which contains 125 papers. Since all of the visualization papers included in our survey are related to text data, it makes sense that many of these visualizations include a view that presents the texts in a typographic form.

2486 IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 25, NO. 7, JULY 2019


TABLE 1
Taxonomy of Visualization Techniques and the Papers Demonstrating Each Technique

typographic visualizations (127)
  text highlighting (97): [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33], [34], [35], [36], [37], [38], [39], [40], [41], [42], [43], [44], [45], [46], [47], [48], [49], [50], [51], [52]
  word cloud (56): [23], [33], [53], [54], [55], [56], [57], [58], [59], [60], [61], [62], [63], [64], [65], [66], [67], [68], [69], [70], [71], [72], [73], [74], [75], [76], [77], [78], [79], [80]
  hybrid (4): [66], [81], [82], [83]
chart visualizations (102)
  other charts (e.g., bar chart) (52): [23], [28], [29], [37], [38], [40], [57], [60], [62], [68], [70], [72], [73], [84], [85], [86], [87], [88], [89], [90], [91], [92], [93], [94], [95], [96], [97], [98], [99], [100]
  scatterplot (32): [24], [27], [29], [37], [38], [42], [50], [91], [99], [101], [102], [103], [104], [105], [106], [107], [108], [109], [110], [111], [112], [113], [114], [115], [116], [117], [118], [119], [120], [121]
  line chart (29): [55], [66], [72], [73], [100], [105], [109], [111], [119], [122], [123], [124], [125], [126], [127], [128], [129], [130], [131], [132], [133], [134], [135], [136], [137], [138], [139], [140], [141]
  table (15): [33], [71], [105], [110], [112], [134], [142], [143], [144], [145], [146], [147], [148], [149], [150]
graph visualizations (95)
  node-link (48): [25], [39], [68], [86], [94], [99], [130], [151], [152], [153], [154], [155], [156], [157], [158], [159], [160], [161], [162], [163], [164], [165], [166], [167], [168], [169], [170], [171], [172], [173]
  tree (34): [26], [27], [44], [64], [73], [100], [103], [105], [108], [111], [114], [118], [140], [166], [174], [175], [176], [177], [178], [179], [180], [181], [182], [183], [184], [185], [186], [187], [188], [189]
  matrix (26): [36], [86], [94], [112], [128], [157], [132], [134], [161], [190], [191], [192], [193], [194], [195], [196], [197], [198], [199], [200], [201], [202], [203], [204], [205]
timeline visualizations (50)
  stream graph (35): [32], [44], [45], [46], [49], [52], [80], [104], [121], [139], [154], [161], [165], [168], [178], [179], [185], [187], [206], [207], [208], [209], [210], [211], [212], [213], [214], [215], [216], [217]
  flow (28): [32], [54], [67], [73], [80], [139], [143], [144], [147], [161], [167], [168], [169], [173], [178], [179], [185], [187], [208], [213], [215], [218], [219], [220], [221], [222], [223], [224]
spatial projections (35)
  galaxies (33): [33], [34], [36], [58], [70], [74], [116], [128], [130], [142], [144], [146], [149], [153], [161], [164], [176], [187], [196], [202], [225], [226], [227], [228], [229], [230], [231], [232], [233], [234]
  voronoi (6): [144], [164], [185], [225], [235], [236]
high-dimensional visualizations (34)
  glyph (24): [23], [25], [33], [38], [65], [89], [119], [122], [123], [139], [120], [140], [142], [158], [159], [161], [164], [188], [193], [196], [198], [237], [238], [239]
  PCP (11): [66], [72], [82], [94], [104], [136], [169], [173], [238], [240], [241]
topological maps (33): [26], [34], [39], [57], [68], [73], [74], [78], [92], [100], [107], [129], [130], [153], [176], [177], [180], [206], [207], [216], [219], [222], [227], [228], [231], [242], [243], [244], [245], [246]
radial visualizations (15): [73], [100], [102], [108], [126], [144], [155], [169], [174], [176], [182], [194], [247], [248], [249]
3D visualizations (10): [85], [110], [111], [162], [196], [245], [250], [251], [252], [253]

The number in the bracket is the number of papers that use each visualization technique.

Fig. 5. Example text visualizations: (a) TopicPanorama [116] leverages graph visualization to encode topic graphs from multiple sources; (b) EventRiver [46] helps users browse, search, track, associate, and investigate events by using a timeline-based visualization; (c) in DemographicVis [33], spatial projection is employed to show user groups based on topic interests.

LIU ET AL.: BRIDGING TEXT VISUALIZATION AND MINING: A TASK-DRIVEN SURVEY 2487


The second-level techniques under “typographic visualizations” include “word cloud,” “text highlighting,” and “hybrid.” For example, 56 papers incorporate a “word cloud” visualization; note that some papers designate word clouds as a primary view to summarize texts, while others use word clouds as a facilitative visualization in addition to other visualization techniques. Another large first-level concept, “graph visualizations,” appears in 95 text visualization papers. This concept includes three second-level concepts, namely “tree,” “matrix,” and “node-link.” The “node-link” technique captures the largest number of papers (48) under this concept. The “tree” and “matrix” techniques capture 34 and 26 papers, respectively. As many papers employ tree-based visualizations as the main view, we separate this concept to emphasize its role in visual text analytics.

Overall, the taxonomy categorizes the visualization techniques presented in all the papers from the visualization publication venues. It provides both an overview of the available visualization techniques and a way for researchers and practitioners to quickly identify papers related to a particular visualization technique.

4.2 Taxonomy of Analysis Tasks

When iteratively refining and improving the automatically generated task taxonomy, we first referred to the calls for papers (CFPs) and section organization of several top-tier text-mining-related conferences and journals, including SIGIR, ACL, WWW, KDD, AAAI, ICML, NIPS, IJCAI, and JMLR. In particular, we used the topics in the CFPs to organize and refine the concept hierarchy. We further used the session name of each paper to validate the concept(s) contained in the paper. For example, “fragment detection” and “duplicate detection” were initially placed at the first level by the K-means-based taxonomy building method. After checking the CFPs and section names of several SIGIR conferences, we found a topic and similar section names in SIGIR 2009 and 2010 for some of the papers related to these two concepts. The topic was “structure analysis,” so we organized the two concepts as sub-concepts of “structure analysis” under “information retrieval.” Second, two senior PhD students, who majored in interactive machine learning and are not co-authors of this paper, worked closely with two of the co-authors to iteratively refine the taxonomy through face-to-face discussions. Finally, we also worked with two researchers who majored in text mining or machine learning, a senior researcher from Microsoft Research and a professor from the Hong Kong University of Science and Technology, to further verify and refine our taxonomy.

The final taxonomy of analysis tasks consists of three levels. The first level includes 9 concepts, ranging from “information retrieval” and “cluster/topic analysis” to several mining-related concepts such as “outlier analysis” and “network analysis.” The selection and refinement of the first-level concepts was inspired by previous studies on data mining tasks [12], [322]. In particular, we roughly divide the first-level concepts into three categories: tasks on model building (e.g., “classification” and “cluster/topic analysis”), tasks on pattern detection (e.g., “outlier analysis” and “trend analysis”), and tasks on applications (e.g., “natural language processing (NLP)” and “information retrieval”).

The largest first-level cluster in terms of the number of papers and child nodes is “information retrieval,” which contains 1,884 papers (33 visualization papers and 1,851 mining papers) and 12 children, such as “entity ranking,” “XML retrieval,” and “efficiency and scalability” (Table 2). Another first-level cluster that also has the largest number of child nodes (12 children) is “natural language processing.” The second largest cluster in terms of the number of papers is “cluster/topic analysis,” which contains 1,624 papers (54 visualization papers and 1,570 mining papers). “Exploratory analysis” and “natural language processing” are the largest and second largest clusters in terms of the number of visualization papers, containing 132 and 120 papers, respectively. An interesting fact is that “exploratory analysis” only ranks 5th in terms of the total paper number. This indicates that the most studied visual text analytics topic is not the most popular one in the text mining field. The second level contains 62 concepts. For example, the cluster “topic analysis” contains the sub-clusters “flat topic analysis,” “hierarchical topic detection,” and “topic evolution,” and the cluster “clustering” contains the sub-clusters “flat clustering” and “hierarchical clustering.” The third level consists of 62 fine-granularity concepts, such as “part-of-speech tagging,” “co-reference analysis,” “query processing,” “search log analysis,” and “indexing.” The number of second-level concepts thus happens to equal the number of third-level concepts. After examining the taxonomy, we found that only 14 of the 62 second-level concepts had third-level child nodes.

The top five first-level clusters (including their children), ranked by the number of papers in each cluster, are shown in Table 2. For each second-level node in the hierarchy, we select a few exemplar papers to show a range of analysis tasks within the corresponding visualization and mining papers. As there may be dozens of papers related to each second-level concept, we selected three visualization papers for concepts with more than three relevant papers; if a concept has fewer than three visualization papers, mining papers are additionally selected.

4.3 Taxonomy of Mining Techniques

We categorized the mining techniques based on the three major stages of the machine learning life cycle: “data processing,” “modeling,” and “model inference” [323]. In the “data processing” stage, data is gathered and preprocessed for training and testing. In the “modeling” stage, we gather knowledge about the problem domain, make assumptions based on our knowledge, and express these assumptions in a precise mathematical form. In the “model inference” stage, the model variables are computed based on the data. Techniques related to other stages (e.g., evaluation and diagnosis) were merged with the three major stages to keep the taxonomy concise and clear. After building the taxonomy, we asked three researchers who majored in data mining to help refine it. One researcher is a professor from the Hong Kong University of Science and Technology; the other two are senior PhD students with more than four years of research experience in data mining.

The first two levels of the mining technique taxonomy are presented in Table 3. As shown in the table, we divided data processing techniques based on the data types. Accordingly, we obtained four second-level concepts: “document-level” data processing (e.g., document segmentation), “sentence/paragraph-level” data processing (e.g., sentence embedding), “word/phrase/entity-level” data processing (e.g., tokenization), and “hybrid” processing (e.g., relevance calculation). The modeling techniques were summarized based on textbooks on text mining and machine learning [12], [324]. Specifically, we have models for “classification” (e.g., support vector machine), “clustering” (e.g., K-means), “dimension reduction” (e.g., multidimensional scaling), “topic” (e.g., latent Dirichlet allocation), and “regression” (e.g., convex regression). We also have “language model,” “graphical models” (e.g., hidden Markov model), “neural networks” (e.g., convolutional neural networks), and “mixture models.” The model inference techniques were categorized based on whether the method is probabilistic. For “probabilistic inference,” the taxonomy contains parametric methods such as expectation maximization, as well as non-parametric methods such as Gibbs sampling. For “non-probabilistic inference,” we have optimization techniques such as dynamic programming and convex optimization.
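As an illustration of the “probabilistic inference” branch, the sketch below runs expectation maximization on a two-component 1-D Gaussian mixture. This is a minimal, self-contained example of our own, not code from any surveyed system.

```python
import math

def em_gmm_1d(xs, iters=50):
    """Minimal EM for a two-component 1-D Gaussian mixture model.
    E-step: compute each component's responsibility for each point.
    M-step: re-estimate weights, means, and variances from soft counts."""
    mu = [min(xs), max(xs)]  # deterministic initialization at the extremes
    var = [1.0, 1.0]
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point
        resp = []
        for x in xs:
            p = [w[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                 for k in range(2)]
            s = sum(p)  # assumes each point has support under some component
            resp.append([pk / s for pk in p])
        # M-step: update weights, means, and variances from the soft counts
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(
                sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk,
                1e-6,  # floor to keep a variance from collapsing to zero
            )
    return w, mu, var

# Two clearly separated groups of points.
xs = [0.0, 0.2, -0.1, 0.1, 10.0, 9.9, 10.1, 10.2]
weights, means, variances = em_gmm_1d(xs)
```

Gibbs sampling, the non-parametric counterpart named above, would instead draw component assignments from the same posterior responsibilities rather than averaging over them.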

5 VISUALIZATION OF CONCEPT RELATIONS

To understand the relationships between the visualization techniques, analysis tasks, and mining techniques, we developed an interactive visualization that connects the three concept hierarchies and provides access to the underlying research papers (see Fig. 1). Our visualization is task-oriented, placing analysis tasks at the center as the bridge between mining and visualization techniques. The goal of the visualization is to reveal common connections between visualization and text mining, as well as to show research topics that are not well connected, thereby identifying potential opportunities for future work.

TABLE 2
Taxonomy of Analysis Tasks (Top Five Largest Clusters Ranked by the Number of Papers in Each Cluster) and Example Papers

information retrieval (1,884)
  entity ranking (6): [257], [258], [259]
  XML retrieval (5): [260], [261], [262]
  evaluation (35): [236], [263], [264]
  user activity tracking (2): [25], [265]
  search (76): [25], [229], [266]
  recommendation (27): [267], [268], [269]
  structure analysis (22): [199], [270], [271]
  query analysis (302): [175], [272], [273]
  filtering (15): [26], [274], [275]
  interactive retrieval (12): [37], [183], [276]
  unstructured information retrieval (107): [164], [250], [277]
  efficiency and scalability (361): [278], [279], [280]
cluster/topic analysis (1,624)
  community discovery (6): [281], [282], [283]
  text segmentation (26): [284], [285], [286]
  topic analysis (556): [54], [178], [240]
  contextual text mining (2): [287], [288]
  clustering (699): [289], [290], [291]
natural language processing (NLP) (1,623)
  geotagging (8): [292], [293], [294]
  data/information extraction (535): [73], [206], [293]
  domain adaptation (54): [295], [296], [297]
  data enrichment (7): [23], [47], [92]
  alignment (3): [202], [218], [298]
  event analysis (113): [46], [57], [125]
  discourse analysis (28): [70], [189], [190]
  content analysis (78): [64], [68], [299]
  sentiment analysis (467): [118], [185], [201]
  lexical/syntactical analysis (73): [118], [189], [201]
  question answering (156): [146], [300], [301]
  text summarization (526): [43], [59], [192]
classification (1,299)
  cross language text classification (2): [302], [303]
  image classification (37): [304], [305], [306]
  sub-document classification (4): [37], [90], [93]
  binary classification (64): [307], [308], [309]
  taxonomy integration (2): [169], [310]
  hierarchical classification (23): [311], [312], [313]
  query classification (23): [314], [315], [316]
  web page classification (14): [101], [317], [318]
  uncertainty tackling (4): [28], [164], [169]
  tandem learning (1): [319]
exploratory analysis (1,028)
  monitoring (113): [26], [142], [320]
  comparison (541): [33], [82], [199]
  navigation/exploration (348): [24], [25], [166]
  region of interest (6): [36], [205], [321]

The number in the bracket represents the number of papers that aim at tackling each analysis task.

TABLE 3
Taxonomy of Mining Techniques and Example Papers

data processing (1,175)
  sentence/paragraph-level (e.g., sentence embedding) (25): [325], [326], [327]
  word/phrase/entity-level (679): [328], [329], [330]
  document-level (288): [331], [332], [333]
  hybrid (387): [334], [335], [336]
model inference (1,335)
  non-probabilistic inference (267): [337], [338], [339]
  probabilistic inference (1,160): [340], [341], [342]
modeling (3,085)
  models for classification (1,636): [343], [344], [345]
  models for clustering (908): [346], [347], [348]
  models for dimension reduction (247): [349], [350], [351]
  topic models (1,089): [352], [353], [354]
  models for regression (256): [355], [356], [357]
  language model (271): [358], [359], [360]
  graphical models (187): [361], [362], [363]
  neural networks (412): [364], [365], [366]
  mixture models (128): [367], [368], [369]

The number in the bracket is the number of papers that leverage each mining technique.


Our visualization is a tripartite graph, framed around the three concept hierarchies extracted from the research papers. Each column initially contains the first-level concepts, while connections are shown between the columns (see Fig. 6, left). Level-of-detail filters are provided to reduce clutter by decreasing the number of edges (co-occurrences) based on frequency thresholds. Connections at lower levels of the hierarchy are propagated to the parent concept, so that the initial overview shows a summary of all connections in the dataset, and drill-down can be used to see the details.

We separated source concepts into ones coming from visualization papers, shown in red, and others coming from data mining papers, shown in blue (bi-colored connections between (b) and (c) in Fig. 1). The total number of occurrences of a concept across all papers is encoded in the size of the concept label. Trends over time between visualization and data mining are revealed through spark lines appearing beside the concept label. Since the total number varies widely across concepts, we normalized the spark lines for each concept so that they reveal the relative number of papers in each year. The absolute number of papers in each concept is encoded in the horizontal bar charts under the spark lines.
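The per-concept normalization described above amounts to scaling each concept's yearly counts by its own peak; a minimal sketch (illustrative only, not the tool's actual code):

```python
def normalize_sparkline(yearly_counts):
    """Scale a concept's yearly paper counts to [0, 1] so that relative
    trends are comparable across concepts with very different volumes."""
    peak = max(yearly_counts)
    if peak == 0:
        return [0.0 for _ in yearly_counts]
    return [count / peak for count in yearly_counts]

relative = normalize_sparkline([0, 2, 4])  # peak year maps to 1.0
```

Because each spark line is scaled by its own peak, shapes are comparable across concepts but absolute heights are not; the bar charts beneath the spark lines carry the absolute counts.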

Connections in the visualization are based on concept co-occurrences. The number of papers containing both concepts at the endpoints of an edge is encoded in the thickness of the edge. The color of the edge is split along its length to show the proportion of contributing papers from each research field. For example, in Fig. 6 on the left, the connection between the task “cluster/topic analysis” and the mining technique “modeling” shows that most co-occurrences of these concepts came from mining papers (the edge is mostly blue). We found a similar proportional pattern between “natural language processing” and “model inference,” but with a lower overall number of co-occurrences (thinner edge).
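The edge attributes described above reduce to simple co-occurrence counting over the labeled papers. The sketch below shows the idea; the paper records, field names, and concept labels are illustrative assumptions, not the survey's actual data.

```python
from collections import Counter

def cooccurrence_edges(papers):
    """Count, for every unordered concept pair, how many papers from each
    field mention both concepts. Edge thickness ~ total count; the color
    split along the edge ~ the per-field proportion."""
    counts = Counter()
    for paper in papers:
        for a in paper["concepts"]:
            for b in paper["concepts"]:
                if a < b:  # count each unordered pair exactly once
                    counts[(a, b, paper["field"])] += 1
    return counts

# Hypothetical labeled papers.
papers = [
    {"field": "mining", "concepts": {"cluster/topic analysis", "modeling"}},
    {"field": "mining", "concepts": {"cluster/topic analysis", "modeling"}},
    {"field": "vis", "concepts": {"cluster/topic analysis", "modeling"}},
    {"field": "mining", "concepts": {"natural language processing", "model inference"}},
]

edges = cooccurrence_edges(papers)
total = (edges[("cluster/topic analysis", "modeling", "mining")]
         + edges[("cluster/topic analysis", "modeling", "vis")])
mining_share = edges[("cluster/topic analysis", "modeling", "mining")] / total
```

Here `mining_share` is 2/3, so the corresponding edge would be drawn mostly blue, matching the proportional pattern described for Fig. 6.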

Hovering on a concept label highlights all reachable edges and concepts while fading the others, thus revealing the co-occurring concepts across the dataset (see Fig. 6, right). Hovering on a timepoint in a spark line reveals a rich tooltip with precise numerical data. Since each spark line is independently scaled to maximize the visibility of trends, the precise values can be used to compare across spark lines. Selecting a concept populates the paper panel at the right to show titles, abstracts, and metadata for papers labelled with that concept. Selecting an edge populates the paper panel with papers containing both of the associated concepts. Target concepts appearing in the abstract text are highlighted in the paper panel for quick identification. Finally, the full text of any paper can be accessed by clicking its DOI link in the paper panel.

6 RESULTS

In this section, we examine the current practices in visual text analytics by analyzing the three concept taxonomies, the connections between them, and the temporal trends revealed by our visualization tool. We also discuss potential research opportunities by comparing the trends observed in the literature of visual text analytics to the trends in text mining.

6.1 Current Practices of Visual Text Analytics

This study demonstrates that the visualization tool based on our literature analysis can help researchers and practitioners better understand the current practices in visual text analytics. In particular, we discuss trends in visualization techniques, the major analysis tasks, frequently used mining techniques, and the connections between them.

6.1.1 Current Practices of Visualization Techniques

Fig. 1a summarizes nine first-level visualization techniques, their temporal trends, and how frequently they are used. One can see that the usage patterns vary among different types of techniques. To better understand the current practices, we divided these techniques into three groups based on how frequently they were used.

The first group (Fig. 1A) contains the three most frequently used techniques: “typographic visualizations” (125 papers), “chart visualizations” (102 papers), and “graph visualizations” (95 papers). We observed that these popular techniques were the traditional ones: they were also frequently used in text visualization papers before 2000. Moreover, the proportion of papers that use these techniques has tended to increase over the past few years.

Fig. 6. Our hierarchical visualization of concept relationships. On the left, the initial overview shows relationships extracted from visualization papers in red and data mining papers in blue. On the right, a drill-down operation has been applied to investigate tasks under the high-level concept of cluster/topic analysis, and the timeline for the task topic analysis is hovered, isolating connections to this task and revealing detailed statistics in a tooltip. The paper panel is populated with papers related to this task.

To study this phenomenon in more detail, we drilled into the next level of “chart visualizations.” As shown in Fig. 7, “chart visualizations” has four children: “line chart,” “scatterplot,” “table,” and “other charts.” After we switched the temporal spark lines to display the absolute number of papers in each year, we observed an interesting “revival” of these techniques (Fig. 7). While these techniques were popular in early years, there was a period (2000 to 2005) when they were not used frequently. After this period, researchers started to use these techniques again. The trend toward using these simple yet effective techniques has become even stronger over the past five years. This phenomenon is interesting for two reasons. First, all four types of chart visualizations share similar patterns. Second, the percentage of visualization papers involved (39 percent) is large. We then checked the latest papers that used these techniques with the paper panel in the visualization tool. The paper panel reveals that in some cases, these techniques served a supportive role as part of a detail view or dashboard (e.g., in [73]), but in other cases, they were the main visualization components described in the papers (e.g., in [24]). One hypothesis regarding the revival of chart visualizations is a preference for simpler and more intuitive visualizations that reduce the learning curve for users. Another possible reason is that researchers tend to rely on more advanced learning methods instead of more complex visualizations to discover interesting patterns. For example, Berger et al. presented cite2vec [24], a visualization scheme that allows users to dynamically browse documents via how other documents use them. This usage-oriented exploration is enabled by projecting the words and documents into a common embedding space via word embedding. Here, a simple scatterplot was leveraged to visualize the embedding space. While the visualization was relatively simple, a variety of useful patterns could be found due to special properties of the embedded space.

The second group consists of four first-level techniques: “timeline visualizations” (50 papers), “spatial projections” (35 papers), “high-dimensional visualizations” (34 papers), and “topological maps” (33 papers) (Fig. 1B). These techniques are effective for specific types of data. For example, “timeline visualizations” are suitable for analyzing textual data with time stamps, and “topological maps” are intuitive choices for the joint analysis of geographical information and text. This scenario illustrates that our visualization tool helps new researchers and practitioners identify relevant visualization and mining techniques, as well as the related papers, for a certain type of data.

The last group (Fig. 1C) includes two first-level techniques that are not used very frequently: “radial visualizations” (15 papers) and “3D visualizations” (10 papers). Studying such techniques may help uncover the potential of rarely used techniques and trigger the development of novel visualizations.

6.1.2 Current Practices of Analysis Tasks

The task taxonomy consists of nine first-level concepts (Fig. 1b), which are divided into the following two groups.

The first group contains the most studied tasks in text visualization papers. Tasks in this group are “exploratory analysis,” “natural language processing,” “trend analysis,” and “cluster/topic analysis.” All these tasks have been studied in more than 50 visualization papers. Except for “trend analysis,” all the tasks were frequently studied before 2000. Their temporal trends are also diverse. For example, “cluster/topic analysis” shows an upward trend between 2006 and 2014, while “exploratory analysis” experienced a surge after 2000.

The second group contains the less frequently studied tasks in the visualization field, namely “information retrieval,” “network analysis,” “classification,” “outlier analysis,” and “predictive analysis.” We were a little surprised to find only a few text visualization papers on classification, since our collection includes several visualization papers on classification from IEEE VIS each year. After a careful examination of the relevant papers published at IEEE VIS in recent years, we found that most of them considered general data sources instead of textual data [370], [371]. Among the five tasks in this group, only “information retrieval” has shown a downward trend in the visualization field in recent years. The task concepts “classification,” “outlier analysis,” and “predictive analysis” have experienced an upward trend. However, these trend lines must be interpreted with caution due to the small sample size: fewer than 10 visualization papers exist for each concept.

Using our interactive visual interface, users may analyze and explore the relationships between the visualization techniques and tasks to understand which visualization techniques have been applied to support which tasks. For example, we selected the task “predictive analysis” for further investigation. Hovering over this task in the web-based visualization tool highlights all visualization techniques that have been used to support predictive analysis (Fig. 8). These include “typographic visualizations,” “chart visualizations,” “spatial projections,” “high-dimensional visualizations,” “topological maps,” and “radial visualizations.” Different groups of visualization techniques are applied to support various predictive analysis tasks. “Spatial projections” and “topological maps” are employed when predicting the user’s demographic information. “High-dimensional visualizations,” “topological maps,” and “chart visualizations” are applied when predicting user actions in social media.

Fig. 7. All chart visualizations began to “revive” after 2005.

Fig. 8. Visualization techniques supporting the “predictive analysis” task.

6.1.3 Current Practices of Mining Techniques

Fig. 1c illustrates the current practices of the three first-level mining techniques: “modeling,” “model inference,” and “data processing.” Most of the text visualization papers focus on “modeling” (131 papers) and “data processing” (110 papers), while fewer papers support “model inference” (17 papers). The spark lines show that the proportion of visualization papers focusing on “data processing” and “model inference” was largest between 2000 and 2005 (Figs. 1D and 1E). After that, their popularity waned.

In contrast, “modeling” has continued to attract attention since 1995. To further study the temporal pattern, we drilled into the concept “modeling.” The eight types of models used in text visualization papers appear in Fig. 9. They can be divided into two groups based on their temporal trends. The first group contains traditional models used frequently before 2008 (Fig. 9a): “clustering,” “dimension reduction,” “neural networks,” and “language model.” The second group consists of trending models that became more popular after 2008 (Fig. 9b): “classification,” “topic models,” “graphical models,” and “regression.” We further drilled into these second-level concepts to determine which specific techniques contribute to the aforementioned temporal trends. Figs. 10a and 10b show three specific models for classification and three types of topic models, respectively. The spark lines indicate that “support vector machine” and “boosting” contribute more to the trendiness of classification than “part-of-speech tagging” does. For topic models, the temporal trends of static and dynamic models are similar. Researchers and practitioners interested in text visualization can utilize our visualization tool to find more trending techniques and leverage these state-of-the-art methods in their work.

6.2 Investigating Research Opportunities

In this study, we illustrate how our visualization may help identify potential research opportunities in a data-driven manner. This is achieved by revealing research gaps between text visualization and text mining research.

6.2.1 Opportunities Learned by Comparing Analysis Tasks

To understand the gaps between the text visualization and mining fields, we compared the paper distributions from the two fields under different tasks and examined the connections between these concepts. In particular, we identified two main types of interesting tasks.

Tasks Less Frequently Studied in the Visualization Field. Some tasks are considered by many mining papers but by few text visualization papers, as shown in Fig. 1b. For example, “classification” and “information retrieval” have been less studied in text visualization papers. In contrast, these two tasks have been extensively studied in the text mining field. Based on these observations, we summarize two opportunities.

Opportunity 1: Supporting tasks proposed by mining researchers. After we drilled into the hierarchy and examined more specific tasks, we recognized that many tasks proposed by text mining researchers are complex and/or interactive in nature, and visual analytics research may be well suited to them. However, the visualization field has not yet paid much attention to them. For example, by drilling into “information retrieval” (Fig. 11a), we identified tasks such as “query ambiguity,” “federated search,” and “distributed information retrieval,” which have not been well studied in the visualization field. Among the children of “classification,” the least studied tasks in the visualization field are “taxonomy integration,” “cross language text classification,” and “hierarchical classification” (Fig. 11b). Studying these tasks may help broaden the horizon of current visual text analytics research.

Opportunity 2: Integrating human knowledge to better support text mining tasks. When we explored the task taxonomy, tasks such as “binary classification” and “recommendation” attracted our attention. These are typical tasks for a mining paper. They have been less considered by the visualization field because they can be solved using an automatic algorithm. Usually, these tasks are well defined and the performance of the solution can be automatically evaluated. For these tasks, we believe that visual analytics can help improve model performance by integrating human knowledge, especially when models do not work as expected. For example, to improve the performance of text classification, an interactive visualization can be developed to enable experts to effectively provide informative supervision at every stage of the classification pipeline. Such supervision can be performed through the identification of outliers in the training data, verification of the labels of important data samples, and better parameter settings.

Tasks with Insufficient Coverage in Both Fields. We also noticed that several tasks had not yet attracted much attention from either the visualization or the mining field. Examples are “predictive analysis” and “outlier analysis”

Fig. 9. Models used in text visualization papers: (a) traditional models used frequently before 2008; (b) models that attract more attention after 2008.

Fig. 10. Drilling into (a) models for classification; (b) topic models.

Fig. 11. Example tasks proposed by mining researchers.

2492 IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 25, NO. 7, JULY 2019


(Fig. 1b). After analyzing these tasks, we identified the following opportunity.

Opportunity 3: Supporting challenging tasks in text analysis. After drilling into “predictive analysis” and “outlier analysis,” we found several tasks that were difficult to handle, even for human experts, including “causality analysis,” “structured prediction,” and “novelty detection” (Fig. 12). Developing visual text analytics approaches that support such challenging tasks is an open research opportunity. To better support these tasks, we need to employ the full potential of both interactive visualization and text mining. One possible starting point is to study the state-of-the-art literature from both the visualization and mining fields and find a solution that tightly combines them through active learning or semi-supervised learning.
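As a concrete illustration of the active-learning starting point mentioned above, the sketch below implements uncertainty sampling, one standard way to route the documents a classifier is least sure about to a human expert. The function names and probability values are illustrative, not drawn from the survey.

```python
# A minimal sketch of uncertainty sampling for active learning.
# The document probabilities below are invented for illustration.

def uncertainty(prob_positive):
    """Uncertainty of a binary classifier's prediction:
    highest (1.0) when the model is maximally unsure (p = 0.5)."""
    return 1.0 - abs(prob_positive - 0.5) * 2.0

def select_for_labeling(doc_probs, budget):
    """Return the `budget` document ids the model is least sure about."""
    ranked = sorted(doc_probs, key=lambda d: uncertainty(doc_probs[d]),
                    reverse=True)
    return ranked[:budget]

docs = {"d1": 0.97, "d2": 0.52, "d3": 0.10, "d4": 0.45, "d5": 0.88}
print(select_for_labeling(docs, 2))  # the two most ambiguous documents
```

The selected documents would then be shown to the expert for label verification, and the model retrained on the corrected labels.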

6.2.2 Opportunities Learned by Comparing Mining Techniques

We compared the visual text analytics papers with the text mining papers in terms of the mining techniques they used. Through our analysis, we identified the following three opportunities.

Opportunity 4: Incorporating state-of-the-art mining techniques. Connections between tasks and mining techniques demonstrate that a majority of mining techniques are not supported by existing text visualization papers (as shown by the bi-colored connections between (b) and (c) in Fig. 1). This gap can be observed by comparing the lengths of the red segments with the lengths of the blue segments. For each task, our visualization allows users to find the relevant state-of-the-art mining techniques. Leveraging these techniques may help in supporting more difficult tasks and in developing better visual analytics methods. Take topic modeling as an example. To find the state-of-the-art techniques, we drilled into a relevant mining technique: “static topic models.” Examples of several static topic models are shown in Fig. 13. While the visualization field tends to use “Latent Dirichlet Allocation (LDA),” text mining researchers also use other topic models such as “spherical topic models” and the “correlated topic model.” By observing the temporal trends, we discovered a recently proposed model named “geometric Dirichlet means algorithm” [342] that is more computationally efficient than LDA and can handle larger numbers of documents. Accordingly, our analysis results can be leveraged to discover the state-of-the-art technique(s) from both the visualization and the mining fields. This can further advance the research and development of visual text analysis applications.
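For readers less familiar with the model family discussed here, the sketch below shows the generative view shared by LDA and other static topic models: each topic is a distribution over words, and each document is a mixture of topics. The tiny distributions are invented for illustration only.

```python
# A minimal sketch of the generative view behind static topic models:
# p(word | doc) marginalizes over the document's topic mixture.
# Topics and mixtures below are toy values, not learned from data.

topics = {
    "sports":   {"game": 0.5, "team": 0.4, "vote": 0.1},
    "politics": {"vote": 0.6, "game": 0.3, "team": 0.1},
}

def word_prob(word, doc_topic_mix):
    """p(word | doc) = sum over topics k of p(word | k) * p(k | doc)."""
    return sum(topics[k].get(word, 0.0) * weight
               for k, weight in doc_topic_mix.items())

doc = {"sports": 0.8, "politics": 0.2}   # a mostly-sports document
print(round(word_prob("vote", doc), 2))  # 0.8*0.1 + 0.2*0.6 = 0.2
```

Inference algorithms such as LDA's variational inference, or the geometric Dirichlet means algorithm mentioned above, differ in how they recover the topic and mixture distributions from a corpus, not in this underlying generative picture.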

Opportunity 5: Opening the black box of text mining. In addition, we recognized that the text mining field has produced a substantial number of techniques that are specialized black-box models tailored to specific tasks. Examples are “mixture models,” “neural networks,” and “graphical models.” An open research challenge that involves both text mining and visualization is to make these techniques understandable. Therefore, developing new visual analytics approaches for understanding the inner workings of these models, which can steer users to better performance, is a gap that holds a great deal of potential for innovative research.

Opportunity 6: Connecting big textual data with people. Textual data such as web pages, tweets, emails, instant messages, web click-streams, or CRM information is flooding into the business world, academic community, and relevant governmental agencies. This data deluge is a large part of big data. In our exploration, we noticed that there were already some initial efforts in the visualization and mining fields for divining actionable information from the deluge. For example, the task concept “trend/pattern analysis” under “trend analysis” contains one recent paper on visually analyzing streaming textual data [45]. We also observed a few initial efforts from the mining field. For example, Twitter, Inc. developed a large-scale topic modeling method for handling Twitter data [372] (under the task concept “cluster/topic analysis”). Despite the promising start in both fields, more research is needed on this topic, especially for shaping visual analytics research that tightly integrates interactive visualization and text mining techniques to maximize the value of both in handling large-scale textual data.

7 DISCUSSIONS AND REFLECTIONS

Our work has studied three primary concepts: visualization techniques, mining techniques, and analysis tasks. The analysis with the web-based visualization tool (Section 6) has disclosed the connection between mining and visualization techniques through analysis tasks, as well as their temporal trends. By investigating the relationships between the three types of concepts, we schematically illustrated current practices and developments in visualization techniques, mining techniques, and analysis tasks. Popular research topics and potential emerging research topics were extracted by examining the connections between the three types of concepts and the gaps between them. One unique aspect of our survey is that the research question drove us to survey two distinct research fields and to identify a bridge connecting them. Our goal is to provide an overview of the research related to text mining and visualization from the two fields and to foster more cross-pollination research. In this section, we introduce the by-product of our review work, lessons learned, and the limitations of our work.

7.1 Research By-Product

In addition to a comprehensive survey, our research has also delivered a research by-product: a visual-analytics-based literature analysis approach. The major feature of this approach is that it is based on the overall understanding and analysis of the major concepts (e.g., utilized techniques) mentioned in research papers. In this work, we mainly focused on analyzing two types of concepts: techniques and tasks. Inspired by Gupta and Manning’s method [18] for automatically extracting key concepts, we developed a semi-automatic concept annotation method. By examining

Fig. 12. Example tasks that can be better supported by tightly integrating interactive visualization with text mining techniques.


and analyzing the connections between different types of concepts, we provided a comprehensive overview of the ongoing research efforts in the area of visual text analytics, including major research topics, their temporal trends, hot research topics, as well as less studied research topics. Based on the analysis of the aforementioned data, several research opportunities were identified and highlighted for the visualization domain. As the whole process is data-driven, this approach can be easily extended to other literature review work. It is particularly useful for an interdisciplinary review. Together with this approach, a visualization tool for navigating these concepts and the connections between them was also deployed as a web-based tool (http://visgroup.thss.tsinghua.edu.cn/textvis), which allows users to navigate through the major concepts in a publication dataset, their connections, and the corresponding papers.

7.2 Lessons Learned

This taxonomy was constructed in a semi-automatic way, where we iteratively and progressively extracted concepts and built a taxonomy for each type of concept. During this process, we learned several practical lessons, which are summarized in the remainder of this section.

Combination of Data and Knowledge. To provide a comprehensive overview of visual text analytics, we analyzed over 4,600 research papers and extracted around 300 keyword-based concepts. We settled on a semi-automatic method for concept extraction that is driven by both data and knowledge for two reasons. On the one hand, with such a large number of papers and concepts involved, manual labeling would have been very difficult and time-consuming. On the other hand, the accuracy of the automatic concept extraction method is not sufficient and depends on the coverage of the seed patterns. To overcome these issues, we employed a semi-automatic concept extraction method that tightly couples human knowledge with data (Fig. 4). Initially, we manually extracted a set of techniques and analysis tasks from the visualization papers that we were familiar with (knowledge-driven approach). Next, we extracted more concepts from the mining papers based on the manually extracted concepts and Gupta and Manning’s method (data-driven approach). All the authors then worked together to verify the extracted concepts iteratively and resolve conflicts (knowledge-driven approach). The combination of the data- and knowledge-driven approaches improved both the quality of the extracted concepts and the labeling performance.
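The data-driven step can be sketched as simple pattern-based extraction in the spirit of Gupta and Manning’s method: seed patterns such as “we use X” pull candidate technique mentions out of sentences. The patterns and sentences below are illustrative, not the survey’s actual seed set.

```python
import re

# A toy sketch of seed-pattern concept extraction. Real systems
# bootstrap: extracted concepts suggest new patterns, and vice versa.

SEED_PATTERNS = [
    r"we (?:use|apply|employ) ([a-z ]+?) (?:to|for)\b",
    r"based on ([a-z ]+?)[,.]",
]

def extract_concepts(sentence):
    """Return candidate concept phrases matched by any seed pattern."""
    found = []
    for pattern in SEED_PATTERNS:
        found += [m.strip() for m in re.findall(pattern, sentence.lower())]
    return found

s = "We use latent dirichlet allocation to summarize the corpus."
print(extract_concepts(s))  # ['latent dirichlet allocation']
```

As the surrounding text notes, the accuracy of such extraction depends heavily on the coverage of the seed patterns, which is why the extracted concepts were then verified manually.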

A similar approach that combines data and human knowledge (e.g., expert feedback) was also used to build the taxonomy. K-means clustering was employed to build the initial taxonomy. Then the authors, as well as several experts from machine learning and data mining, refined and improved the taxonomy progressively. The experts preferred a balanced concept taxonomy that was neither too deep nor too wide. This is also consistent with a recent study [373]. Another useful lesson we learned is that human knowledge and experience are very useful for coding concepts and building taxonomies. Typically, coding and taxonomy building are an iterative and progressive process. This process is most efficiently handled if experienced experts build an initial taxonomy that is then enriched by others, based on their understanding.
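A minimal one-dimensional k-means, of the kind used to seed an initial grouping before expert refinement, can be sketched as follows; the data points, initial centroids, and number of clusters are illustrative, not the survey’s actual concept features.

```python
# A toy 1-D k-means: alternate between assigning points to their
# nearest centroid and moving each centroid to its group's mean.
# Initial centroids are fixed here to keep the sketch deterministic.

def kmeans_1d(points, centroids, iters=10):
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        groups = {c: [] for c in centroids}
        for p in points:
            nearest = min(centroids, key=lambda c: abs(c - p))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group.
        centroids = [sum(g) / len(g) for g in groups.values() if g]
    return sorted(centroids)

points = [1.0, 1.2, 0.8, 8.0, 8.5, 7.5]
print(kmeans_1d(points, centroids=[0.0, 10.0]))  # converges near [1.0, 8.0]
```

In the survey’s setting the points would be feature vectors for concepts rather than scalars, and the resulting clusters only a starting point that experts reshape into a balanced taxonomy.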

The data-driven approach for finding research opportunities is a good complement to the knowledge-based method. In the past, most literature reviews were carried out manually with the aim of examining the progress of a particular research topic and identifying emerging research opportunities. Typically, the breadth and depth of the identified research opportunities depend on the experts’ knowledge and their understanding of the research area. Our work contributes to this body of work by presenting the connections between different types of concepts, which allows experts to examine the overall trend of different research topics, as well as the gaps between different research fields in an interdisciplinary research area. This data-driven approach can be considered a complement to the current knowledge-based approach for identifying emerging research opportunities.

7.3 Limitations

Although the developed semi-automatic approach sheds light on the research progress and emerging research opportunities in visual text analytics, we would like to note a few limitations.

When gathering the data, we tried our best to collect all relevant papers in both research fields. However, we may have missed some papers due to the large number of available venues and articles. To compensate for this, we will further extend our visualization tool to allow users to manually add papers that they believe to be relevant. We will then batch process and verify the submitted papers and merge the results into the existing concept taxonomies.

Another limitation is related to the calculation of connections between different concepts. We chose to leverage concept co-occurrences in the full text to derive the connections between different concepts. This strategy may lead to some inaccurate connections between concepts. For example, one paper mentions using a scatterplot for an overview and a line chart for observing the temporal trend. Based on concept co-occurrences, we built all the possible connections between the four concepts, including “scatterplot” and “overview,” “scatterplot” and “trend analysis,” “line chart” and “overview,” as well as “line chart” and “trend analysis.” Here the concepts “scatterplot” and “trend analysis,” as well as “line chart” and “overview,” did not need to be connected. The accuracy of the connections can be improved by employing more advanced approaches such as relationship extraction [374]. Another solution is to utilize a crowd-sourcing platform to collect multiple annotations

Fig. 13. Examples of static topic models and their temporal trends.


for each pair of connections. Accordingly, an interesting avenue of potential future work is to leverage a crowd-sourcing model such as M3V [375] to infer the correct connection from noisy crowd-sourced labels.
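The co-occurrence strategy described in this limitation, and the spurious links it can create, can be sketched as follows; the paper and concept names reuse the example above purely for illustration.

```python
from itertools import combinations
from collections import Counter

# A toy sketch of co-occurrence-based connection extraction: every
# pair of concepts mentioned in the same paper gets a connection,
# which is how "scatterplot" and "trend analysis" end up linked
# even when the paper never relates them.

def cooccurrence(papers):
    """Count concept-pair co-occurrences across a list of papers,
    where each paper is given as a list of mentioned concepts."""
    counts = Counter()
    for concepts in papers:
        for a, b in combinations(sorted(set(concepts)), 2):
            counts[(a, b)] += 1
    return counts

paper = ["scatterplot", "overview", "line chart", "trend analysis"]
links = cooccurrence([paper])
print(len(links))                                # 6 pairs from 4 concepts
print(links[("scatterplot", "trend analysis")])  # 1, a spurious link
```

Relationship extraction or crowd-sourced verification, as discussed above, would filter such pairs down to the connections the paper actually asserts.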

8 CONCLUSIONS

In this work, we conducted a comprehensive survey based on 263 text visualization papers and 4,346 text mining papers published between 1992 and 2017. With a semi-automatic, data-driven analysis, we identified and extracted three types of concepts. Two of them, visualization techniques and mining techniques, summarize the research trends in the respective research fields, while the analysis tasks summarize the goals of such research. By statistically analyzing the relationships between the three types of concepts, we connected visualization techniques and mining techniques, with analysis tasks serving as the bridge. In addition, a web-based visualization tool has been developed to facilitate the investigation of the major research trends in the area of text visualization, including the major techniques and tasks, their development over time, as well as the gaps between the visualization and mining fields.

We believe the data-driven analysis process developed in this work can be directly used to conduct literature analysis in other interdisciplinary research areas, such as interactive machine learning [376], [377], bio-informatics visualization, or brain-inspired artificial intelligence. The key is to find an important intermediate concept that bridges the two fields. For example, for the research area of brain-inspired artificial intelligence, different types of neurons and their operating mechanisms might be a candidate intermediate concept that connects neuroscience and artificial intelligence. In this survey, we have shown that using such an intermediate concept may help to narrow the gaps between two research domains and provide useful insights into an interdisciplinary area, which can foster a better understanding of the research field and open promising avenues for future research.

ACKNOWLEDGMENTS

This research was funded by the National Key R&D Program of China (No. SQ2018YFB100002), the National Natural Science Foundation of China (Nos. 61761136020 and 61672308), Microsoft Research Asia, the Natural Sciences and Engineering Research Council of Canada (NSERC), the German Research Foundation (DFG), and the Canada Research Chairs program.

REFERENCES

[1] K. Kucher and A. Kerren, “Text visualization techniques: Taxonomy, visual survey, and community insights,” in Proc. IEEE Vis. Symp., 2015, pp. 117–121.

[2] S. Liu, M. X. Zhou, S. Pan, Y. Song, W. Qian, W. Cai, and X. Lian, “TIARA: Interactive, topic-based visual text summarization and analysis,” ACM Trans. Intell. Syst. Technol., vol. 3, no. 2, 2012, Art. no. 25.

[3] New Power BI custom visuals enable browsing and analyzing collections of text. [Online]. Available: https://powerbi.microsoft.com/en-us/blog/new-power-bi-custom-visuals-for-browsing-and-analyzing-collections-of-text/, Accessed on: May 15, 2018.

[4] W. Dou and S. Liu, “Topic- and time-oriented visual text analysis,” IEEE Comput. Graph. Appl., vol. 36, no. 4, pp. 8–13, Jul./Aug. 2016.

[5] S. Liu, W. Cui, Y. Wu, and M. Liu, “A survey on information visualization: Recent advances and challenges,” The Visual Comput., vol. 30, no. 12, pp. 1373–1393, 2014.

[6] G.-D. Sun, Y.-C. Wu, R.-H. Liang, and S.-X. Liu, “A survey of visual analytics techniques and applications: State-of-the-art research and future challenges,” J. Comput. Sci. Technol., vol. 28, no. 5, pp. 852–867, 2013.

[7] A. B. Alencar, M. C. F. de Oliveira, and F. V. Paulovich, “Seeing beyond reading: A survey on visual text analytics,” Wiley Interdisciplinary Rev.: Data Mining Knowl. Discovery, vol. 2, no. 6, pp. 476–492, 2012.

[8] Q. Gan, M. Zhu, M. Li, T. Liang, Y. Cao, and B. Zhou, “Document visualization: An overview of current research,” Wiley Interdisciplinary Rev.: Comput. Statist., vol. 6, no. 1, pp. 19–36, 2014.

[9] S. Jänicke, G. Franzini, M. Cheema, and G. Scheuermann, “Visual text analysis in digital humanities,” Comput. Graph. Forum, vol. 36, no. 6, pp. 226–250, 2017.

[10] P. Federico, F. Heimerl, S. Koch, and S. Miksch, “A survey on visual approaches for analyzing scientific literature and patents,” IEEE Trans. Vis. Comput. Graph., vol. 23, no. 9, pp. 2179–2198, Sep. 2017.

[11] C. C. Aggarwal, Machine Learning for Text. Berlin, Germany: Springer, 2018.

[12] C. C. Aggarwal and C. Zhai, Mining Text Data. Berlin, Germany: Springer, 2012.

[13] E. Cambria and B. White, “Jumping NLP curves: A review of natural language processing research,” IEEE Comput. Intell. Mag., vol. 9, no. 2, pp. 48–57, May 2014.

[14] J. Hirschberg and C. D. Manning, “Advances in natural language processing,” Sci., vol. 349, no. 6245, pp. 261–266, 2015.

[15] A. Martinez and W. Martinez, “At the interface of computational linguistics and statistics,” Wiley Interdisciplinary Rev.: Comput. Statist., vol. 7, no. 4, pp. 258–274, 2015.

[16] V. Gupta and G. S. Lehal, “A survey of text mining techniques and applications,” J. Emerging Technol. Web Intell., vol. 1, no. 1, pp. 60–76, 2009.

[17] Y. Wu, N. Cao, D. Gotz, Y.-P. Tan, and D. A. Keim, “A survey on visual analytics of social media data,” IEEE Trans. Multimedia, vol. 18, no. 11, pp. 2135–2148, Nov. 2016.

[18] S. Gupta and C. D. Manning, “Analyzing the dynamics of research by extracting key aspects of scientific papers,” in Proc. Int. Joint Conf. Natural Language Process., 2011, pp. 1–9.

[19] D. Sacha, L. Zhang, M. Sedlmair, J. A. Lee, J. Peltonen, D. Weiskopf, S. C. North, and D. A. Keim, “Visual interaction with dimensionality reduction: A structured literature analysis,” IEEE Trans. Vis. Comput. Graph., vol. 23, no. 1, pp. 241–250, Jan. 2017.

[20] Apache Lucene Core. [Online]. Available: https://lucene.apache.org/core/, Accessed on: May 15, 2018.

[21] P. Isenberg, T. Isenberg, M. Sedlmair, J. Chen, and T. Möller, “Visualization as seen through its research paper keywords,” IEEE Trans. Vis. Comput. Graph., vol. 23, no. 1, pp. 771–780, Jan. 2017.

[22] D. Keim, G. Andrienko, J.-D. Fekete, C. Görg, J. Kohlhammer, and G. Melancon, “Visual analytics: Definition, process, and challenges,” in Information Visualization, Berlin, Germany: Springer, 2008, pp. 154–175.

[23] F. Beck, S. Koch, and D. Weiskopf, “Visual analysis and dissemination of scientific literature collections with SurVis,” IEEE Trans. Vis. Comput. Graph., vol. 22, no. 1, pp. 180–189, Jan. 2016.

[24] M. Berger, K. Mcdonough, and L. M. Seversky, “cite2vec: Citation-driven document exploration via word embeddings,” IEEE Trans. Vis. Comput. Graph., vol. 23, no. 1, pp. 691–700, Jan. 2017.

[25] E. A. Bier, S. K. Card, and J. W. Bodnar, “Principles and tools for collaborative entity-based intelligence analysis,” IEEE Trans. Vis. Comput. Graph., vol. 16, no. 2, pp. 178–191, Mar./Apr. 2010.

[26] H. Bosch, D. Thom, F. Heimerl, E. Püttmann, S. Koch, R. Krüger, M. Wörner, and T. Ertl, “ScatterBlogs2: Real-time monitoring of microblog messages through user-guided filtering,” IEEE Trans. Vis. Comput. Graph., vol. 19, no. 12, pp. 2022–2031, Dec. 2013.

[27] M. Brehmer, S. Ingram, J. Stray, and T. Munzner, “Overview: The design, adoption, and analysis of a visual document mining tool for investigative journalists,” IEEE Trans. Vis. Comput. Graph., vol. 20, no. 12, pp. 2271–2280, Dec. 2014.


[28] M. Brooks, S. Amershi, B. Lee, S. M. Drucker, A. Kapoor, and P. Simard, “FeatureInsight: Visual support for error-driven feature ideation in text classification,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2015, pp. 105–112.

[29] J. Choo, C. Lee, C. K. Reddy, and H. Park, “UTOPIAN: User-driven topic modeling based on interactive nonnegative matrix factorization,” IEEE Trans. Vis. Comput. Graph., vol. 19, no. 12, pp. 1992–2001, Dec. 2013.

[30] A. Cockburn, C. Gutwin, and J. Alexander, “Faster document navigation with space-filling thumbnails,” in Proc. SIGCHI Conf. Human Factors Comput. Syst., 2006, pp. 1–10.

[31] M. Correll, M. Witmore, and M. Gleicher, “Exploring collections of tagged text for literary scholarship,” Comput. Graph. Forum, vol. 30, no. 3, pp. 731–740, 2011.

[32] W. Cui, S. Liu, L. Tan, C. Shi, Y. Song, Z. J. Gao, X. Tong, and H. Qu, “TextFlow: Towards better understanding of evolving topics in text,” IEEE Trans. Vis. Comput. Graph., vol. 17, no. 12, pp. 2412–2421, Dec. 2011.

[33] W. Dou, I. Cho, O. Eltayeby, J. Choo, X. Wang, and W. Ribarsky, “DemographicVis: Analyzing demographic information based on user generated content,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2015, pp. 57–64.

[34] D. Fried and S. G. Kobourov, “Maps of computer science,” in Proc. IEEE Vis. Symp., 2014, pp. 113–120.

[35] V. Gold, C. Rohrdantz, and M. El-Assady, “Exploratory text analysis using lexical episode plots,” in Proc. Eurographics Conf. Vis., 2015, pp. 85–89.

[36] C. Görg, Z. Liu, J. Kihm, J. Choo, H. Park, and J. Stasko, “Combining computational analyses and interactive visualization for document exploration and sensemaking in Jigsaw,” IEEE Trans. Vis. Comput. Graph., vol. 19, no. 10, pp. 1646–1663, Oct. 2013.

[37] F. Heimerl, S. Koch, H. Bosch, and T. Ertl, “Visual classifier training for text document retrieval,” IEEE Trans. Vis. Comput. Graph., vol. 18, no. 12, pp. 2839–2848, Dec. 2012.

[38] F. Heimerl, M. John, Q. Han, S. Koch, and T. Ertl, “DocuCompass: Effective exploration of document landscapes,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2016, pp. 11–20.

[39] M. S. Hossain, P. Butler, A. P. Boedihardjo, and N. Ramakrishnan, “Storytelling in entity networks to support intelligence analysts,” in Proc. 18th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2012, pp. 1375–1383.

[40] E. Isaacs, K. Damico, S. Ahern, E. Bart, and M. Singhal, “Footprints: A visual search tool that supports discovery and coverage tracking,” IEEE Trans. Vis. Comput. Graph., vol. 20, no. 12, pp. 1793–1802, Dec. 2014.

[41] I. Jusufi, M. Milrad, and X. Legaspi, “Interactive exploration of student generated content presented in blogs,” in Proc. Eurographics / IEEE VGTC Conf. Vis.: Posters, 2016, pp. 53–55.

[42] M. Kim, K. Kang, D. Park, J. Choo, and N. Elmqvist, “TopicLens: Efficient multi-level visual topic exploration of large-scale document collections,” IEEE Trans. Vis. Comput. Graph., vol. 23, no. 1, pp. 151–160, Jan. 2017.

[43] S. Koch, M. John, M. Wörner, A. Müller, and T. Ertl, “VarifocalReader–In-depth visual analysis of large text documents,” IEEE Trans. Vis. Comput. Graph., vol. 20, no. 12, pp. 1723–1732, Dec. 2014.

[44] S. Liu, Y. Chen, H. Wei, J. Yang, K. Zhou, and S. M. Drucker, “Exploring topical lead-lag across corpora,” IEEE Trans. Knowl. Data Eng., vol. 27, no. 1, pp. 115–129, Jan. 2015.

[45] S. Liu, J. Yin, X. Wang, W. Cui, K. Cao, and J. Pei, “Online visual analytics of text streams,” IEEE Trans. Vis. Comput. Graph., vol. 22, no. 11, pp. 2451–2466, Nov. 2016.

[46] D. Luo, J. Yang, M. Krstajic, W. Ribarsky, and D. A. Keim, “EventRiver: Visually exploring text collections with temporal references,” IEEE Trans. Vis. Comput. Graph., vol. 18, no. 1, pp. 93–105, Jan. 2012.

[47] N. McCurdy, J. Lein, K. Coles, and M. Meyer, “Poemage: Visualizing the sonic topology of a poem,” IEEE Trans. Vis. Comput. Graph., vol. 22, no. 1, pp. 439–448, Jan. 2016.

[48] M. Nafari and C. Weaver, “Augmenting visualization with natural language translation of interaction: A usability study,” Comput. Graph. Forum, vol. 32, no. 3, pp. 391–400, 2013.

[49] S. Pan, M. X. Zhou, Y. Song, W. Qian, F. Wang, and S. Liu, “Optimizing temporal topic segmentation for intelligent text visualization,” in Proc. Int. Conf. Intell. User Interfaces, 2013, pp. 339–350.

[50] F. V. Paulovich, L. G. Nonato, R. Minghim, and H. Levkowitz, “Least square projection: A fast high-precision multidimensional projection technique and its application to document mapping,” IEEE Trans. Vis. Comput. Graph., vol. 14, no. 3, pp. 564–575, May/Jun. 2008.

[51] P. Pirolli, E. Wollny, and B. Suh, “So you know you’re getting the best possible information: A tool that increases Wikipedia credibility,” in Proc. SIGCHI Conf. Human Factors Comput. Syst., 2009, pp. 1505–1508.

[52] J. Zhang, Y. Song, C. Zhang, and S. Liu, “Evolutionary hierarchical Dirichlet processes for multiple correlated time-varying corpora,” in Proc. 16th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2010, pp. 1079–1088.

[53] S. Afzal, R. Maciejewski, Y. Jang, N. Elmqvist, and D. S. Ebert, “Spatial text visualization using automatic typographic maps,” IEEE Trans. Vis. Comput. Graph., vol. 18, no. 12, pp. 2556–2564, Dec. 2012.

[54] T. Bergstrom and K. Karahalios, “Conversation clusters: Grouping conversation topics through human-computer dialog,” in Proc. SIGCHI Conf. Human Factors Comput. Syst., 2009, pp. 2349–2352.

[55] N. Boukhelifa, F. Chevalier, and J.-D. Fekete, “Real-time aggregation of Wikipedia data for visual analytics,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2010, pp. 147–154.

[56] Q. Castellà and C. Sutton, “Word storms: Multiples of word clouds for visual comparison of documents,” in Proc. 23rd Int. Conf. World Wide Web, 2014, pp. 665–676.

[57] J. Chae, D. Thom, H. Bosch, Y. Jang, R. Maciejewski, D. S. Ebert, and T. Ertl, “Spatiotemporal social media analytics for abnormal event detection and examination using seasonal-trend decomposition,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2012, pp. 143–152.

[58] S. Chandrasegaran, S. K. Badam, L. Kisselburgh, K. Ramani, and N. Elmqvist, “Integrating visual analytics support for grounded theory practice in qualitative text analysis,” Comput. Graph. Forum, vol. 36, no. 3, pp. 201–212, 2017.

[59] M.-T. Chi, S.-S. Lin, S.-Y. Chen, C.-H. Lin, and T.-Y. Lee, “Morphable word clouds for time-varying text data visualization,” IEEE Trans. Vis. Comput. Graph., vol. 21, no. 12, pp. 1415–1426, Dec. 2015.

[60] M. A. Correll, E. C. Alexander, and M. Gleicher, “Quantity estimation in visualizations of tagged text,” in Proc. SIGCHI Conf. Human Factors Comput. Syst., 2013, pp. 2697–2706.

[61] W. Cui, Y. Wu, S. Liu, F. Wei, M. Zhou, and H. Qu, “Context-preserving, dynamic word cloud visualization,” IEEE Comput. Graph. Appl., vol. 30, no. 6, pp. 42–53, Nov./Dec. 2010.

[62] E. Graells-Garrido, M. Lalmas, and R. Baeza-Yates, “Data portraits and intermediary topics: Encouraging exploration of politically diverse profiles,” in Proc. Int. Conf. Intell. User Interfaces, 2016, pp. 228–240.

[63] U. Hinrichs, S. Forlini, and B. Moynihan, “Speculative practices: Utilizing InfoVis to explore untapped literary collections,” IEEE Trans. Vis. Comput. Graph., vol. 22, no. 1, pp. 429–438, Jan. 2016.

[64] M. Hu, K. Wongsuphasawat, and J. Stasko, “Visualizing social media content with SentenTree,” IEEE Trans. Vis. Comput. Graph., vol. 23, no. 1, pp. 621–630, Jan. 2017.

[65] Indratmo, J. Vassileva, and C. Gutwin, “Exploring blog archives with interactive visualization,” in Proc. Work. Conf. Advanced Visual Interfaces, 2008, pp. 39–46.

[66] B. Lee, N. H. Riche, A. K. Karlson, and S. Carpendale, “SparkClouds: Visualizing trends in tag clouds,” IEEE Trans. Vis. Comput. Graph., vol. 16, no. 6, pp. 1182–1189, Nov./Dec. 2010.

[67] A. Jatowt, Y. Kawai, and K. Tanaka, “Visualizing historical content of web pages,” in Proc. Int. Conf. World Wide Web, 2008, pp. 1221–1222.

[68] S. Koch, H. Bosch, M. Giereth, and T. Ertl, “Iterative integration of visual insights during scalable patent search and analysis,” IEEE Trans. Vis. Comput. Graph., vol. 17, no. 5, pp. 557–569, May 2011.

[69] K. Koh, B. Lee, B. Kim, and J. Seo, “ManiWordle: Providing flexible control over Wordle,” IEEE Trans. Vis. Comput. Graph., vol. 16, no. 6, pp. 1190–1197, Nov./Dec. 2010.

[70] B. C. Kwon, S.-H. Kim, S. Lee, J. Choo, J. Huh, and J. S. Yi, “VisOHC: Designing visual analytics for online health communities,” IEEE Trans. Vis. Comput. Graph., vol. 22, no. 1, pp. 71–80, Jan. 2016.

[71] C. Y. Liu, M. S. Chen, and C. Y. Tseng, “IncreSTS: Towards real-time incremental short text summarization on comment streams from social network services,” IEEE Trans. Knowl. Data Eng., vol. 27, no. 11, pp. 2986–3000, Nov. 2015.


[72] Y. Lu, R. Kruger, D. Thom, F. Wang, S. Koch, T. Ertl, and R. Maciejewski, “Integrating predictive analytics and social media,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2014, pp. 193–202.

[73] Y. Lu, M. Steptoe, S. Burke, H. Wang, J.-Y. Tsai, H. Davulcu, D. Montgomery, S. R. Corman, and R. Maciejewski, “Exploring evolving media discourse through event cueing,” IEEE Trans. Vis. Comput. Graph., vol. 22, no. 1, pp. 220–229, Jan. 2016.

[74] A. M. MacEachren, A. Jaiswal, A. C. Robinson, S. Pezanowski, A. Savelyev, P. Mitra, X. Zhang, and J. Blanford, “SensePlace2: GeoTwitter analytics support for situational awareness,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2011, pp. 181–190.

[75] D. Oelke and I. Gurevych, “A study on human-generated tag structures to inform tag cloud layout,” in Proc. Work. Conf. Advanced Visual Interfaces, 2014, pp. 297–304.

[76] F. V. Paulovich, F. M. B. Toledo, G. P. Telles, R. Minghim, and L. G. Nonato, “Semantic wordification of document collections,” Comput. Graph. Forum, vol. 31, no. 3, pp. 1145–1153, 2012.

[77] H. Strobelt, M. Spicker, A. Stoffel, D. Keim, and O. Deussen, “Rolled-out Wordles: A heuristic method for overlap removal of 2D data representatives,” Comput. Graph. Forum, vol. 31, no. 3, pp. 1135–1144, 2012.

[78] J. Wood, J. Dykes, A. Slingsby, and K. Clarke, “Interactive visual exploration of a large spatio-temporal dataset: Reflections on a geovisualization mashup,” IEEE Trans. Vis. Comput. Graph., vol. 13, no. 6, pp. 1176–1183, Nov./Dec. 2007.

[79] Y. Wu, T. Provan, F. Wei, S. Liu, and K.-L. Ma, “Semantic-preserving word clouds by seam carving,” Comput. Graph. Forum, vol. 30, no. 3, pp. 741–750, 2011.

[80] X. Wang, B. Janssen, and E. Bier, “Finding business information by visualizing enterprise document activity,” in Proc. Work. Conf. Advanced Visual Interfaces, 2010, pp. 41–48.

[81] N. Cao, L. Lu, Y.-R. Lin, F. Wang, and Z. Wen, “SocialHelix: Visual analysis of sentiment divergence in social media,” J. Vis., vol. 18, no. 2, pp. 221–235, 2015.

[82] C. Collins, F. B. Viégas, and M. Wattenberg, “Parallel tag clouds to explore and analyze faceted text corpora,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2009, pp. 91–98.

[83] G. Sun, T. Tang, T.-Q. Peng, R. Liang, and Y. Wu, “Social Wave: Visual analysis of spatio-temporal diffusion of information on social media,” ACM Trans. Intell. Syst. Technol., vol. 9, no. 2, 2017, Art. no. 15.

[84] G. Carenini and L. Rizoli, “A multimedia interface for facilitating comparisons of opinions,” in Proc. Int. Conf. Intell. User Interfaces, 2009, pp. 325–334.

[85] M.-W. Chang and C. Collins, “Exploring entities in text with descriptive non-photorealistic rendering,” in Proc. IEEE Vis. Symp., 2013, pp. 9–16.

[86] F. Chen, P. Chiu, and S. Lim, “Topic modeling of document metadata for visualizing collaborations over time,” in Proc. Int. Conf. Intell. User Interfaces, 2016, pp. 108–117.

[87] F. Chevalier, S. Huot, and J.-D. Fekete, “WikipediaViz: Conveying article quality for casual Wikipedia readers,” in Proc. IEEE Vis. Symp., 2010, pp. 49–56.

[88] N. Diakopoulos, M. Naaman, and F. Kivran-Swaine, “Diamonds in the rough: Social media visual analytics for journalistic inquiry,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2010, pp. 115–122.

[89] M. El-Assady, R. Sevastjanova, F. Sperrle, D. Keim, and C. Collins, “Progressive learning of topic modeling parameters: A visual analytics framework,” IEEE Trans. Vis. Comput. Graph., vol. 24, no. 1, pp. 382–391, Jan. 2018.

[90] S. G. Esparza, M. P. O’Mahony, and B. Smyth, “CatStream: Categorising tweets for user profiling and stream filtering,” in Proc. Int. Conf. Intell. User Interfaces, 2013, pp. 25–36.

[91] C. Felix, A. V. Pandey, and E. Bertini, “TextTile: An interactive visualization tool for seamless exploratory analysis of structured data and unstructured text,” IEEE Trans. Vis. Comput. Graph., vol. 23, no. 1, pp. 161–170, Jan. 2017.

[92] T. Gao, J. R. Hullman, E. Adar, B. Hecht, and N. Diakopoulos, “NewsViews: An automated pipeline for creating custom geovisualizations for news,” in Proc. SIGCHI Conf. Human Factors Comput. Syst., 2014, pp. 3005–3014.

[93] M. Jankowska, V. Kešelj, and E. Milios, “Relative N-gram signatures: Document visualization at the level of character N-grams,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2012, pp. 103–112.

[94] H. Lee, J. Kihm, J. Choo, J. Stasko, and H. Park, “iVisClustering: An interactive visual document clustering via topic modeling,” Comput. Graph. Forum, vol. 31, no. 3, pp. 1155–1164, 2012.

[95] J. Mahmud, G. Fei, A. Xu, A. Pal, and M. Zhou, “Predicting attitude and actions of Twitter users,” in Proc. Int. Conf. Intell. User Interfaces, 2016, pp. 2–6.

[96] H. Park and J. Choi, “V-model: A new innovative model to chronologically visualize narrative clinical texts,” in Proc. SIGCHI Conf. Human Factors Comput. Syst., 2012, pp. 453–462.

[97] S. Park, S. Lee, and J. Song, “Aspect-level news browsing: Understanding news events from multiple viewpoints,” in Proc. Int. Conf. Intell. User Interfaces, 2010, pp. 41–50.

[98] P. Pirolli and S. K. Card, “Information foraging models of browsers for very large document spaces,” in Proc. Work. Conf. Advanced Visual Interfaces, 1998, pp. 83–93.

[99] D. Ren, X. Zhang, Z. Wang, J. Li, and X. Yuan, “WeiboEvents: A crowd sourcing Weibo visual analytic system,” in Proc. IEEE Vis. Symp., 2014, pp. 330–334.

[100] F. Y. Wang, A. Sallaberry, K. Klein, M. Takatsuka, and M. Roche, “SentiCompass: Interactive visualization for exploring and comparing the sentiments of time-varying Twitter data,” in Proc. IEEE Vis. Symp., 2015, pp. 129–133.

[101] S. Bostandjiev, J. O’Donovan, and T. Höllerer, “LinkedVis: Exploring social and semantic career recommendations,” in Proc. Int. Conf. Intell. User Interfaces, 2013, pp. 107–116.

[102] J. Chuang, D. Ramage, C. Manning, and J. Heer, “Interpretation and trust: Designing model-driven visualizations for text analysis,” in Proc. SIGCHI Conf. Human Factors Comput. Syst., 2012, pp. 443–452.

[103] C. Collins, G. Penn, and S. Carpendale, “Bubble sets: Revealing set relations with isocontours over existing visualizations,” IEEE Trans. Vis. Comput. Graph., vol. 15, no. 6, pp. 1009–1016, Nov./Dec. 2009.

[104] W. Dou, X. Wang, R. Chang, and W. Ribarsky, “ParallelTopics: A probabilistic approach to exploring document collections,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2011, pp. 231–240.

[105] M. Glueck, M. P. Naeini, F. Doshi-Velez, F. Chevalier, A. Khan, D. Wigdor, and M. Brudno, “PhenoLines: Phenotype comparison visualizations for disease subtyping via topic models,” IEEE Trans. Vis. Comput. Graph., vol. 24, no. 1, pp. 371–381, Jan. 2018.

[106] T. M. V. Le and H. W. Lauw, “Semantic visualization for spherical representation,” in Proc. 20th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2014, pp. 1007–1016.

[107] X. Lin, “Visualization for the document space,” in Proc. IEEE Conf. Vis., 1992, pp. 274–281.

[108] X. Liu, A. Xu, L. Gou, H. Liu, R. Akkiraju, and H.-W. Shen, “SocialBrands: Visual analysis of public perceptions of brands on social media,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2016, pp. 71–80.

[109] Y. Mao, J. Dillon, and G. Lebanon, “Sequential document visualization,” IEEE Trans. Vis. Comput. Graph., vol. 13, no. 6, pp. 1208–1215, Nov./Dec. 2007.

[110] S. Mukherjea, K. Hirata, and Y. Hara, “Visualizing the results of multimedia Web search engines,” in Proc. IEEE Symp. Inf. Vis., 1996, pp. 64–65.

[111] N. E. Miller, P. C. Wong, M. Brewster, and H. Foote, “TOPIC ISLANDS™—A wavelet-based text visualization system,” in Proc. IEEE Conf. Vis., 1998, pp. 189–196.

[112] M. Nacenta, U. Hinrichs, and S. Carpendale, “FatFonts: Combining the symbolic and visual aspects of numbers,” in Proc. Work. Conf. Advanced Visual Interfaces, 2012, pp. 407–414.

[113] F. V. Paulovich and R. Minghim, “HiPP: A novel hierarchical point placement strategy and its application to the exploration of document collections,” IEEE Trans. Vis. Comput. Graph., vol. 14, no. 6, pp. 1229–1236, Nov./Dec. 2008.

[114] A. Perer and M. A. Smith, “Contrasting portraits of email practices: Visual approaches to reflection and analysis,” in Proc. Work. Conf. Advanced Visual Interfaces, 2006, pp. 389–395.

[115] D. A. Rushall and M. R. Ilgen, “DEPICT: Documents evaluated as pictures. Visualizing information using context vectors and self-organizing maps,” in Proc. IEEE Symp. Inf. Vis., 1996, pp. 100–107.

[116] X. Wang, S. Liu, J. Liu, J. Chen, J. Zhu, and B. Guo, “TopicPanorama: A full picture of relevant topics,” IEEE Trans. Vis. Comput. Graph., vol. 22, no. 12, pp. 2508–2521, Dec. 2016.

[117] P. C. Wong, H. Foote, D. Adams, W. Cowley, and J. Thomas, “Dynamic visualization of transient data streams,” in Proc. IEEE Symp. Inf. Vis., 2003, pp. 97–104.

[118] Y. Wu, F. Wei, S. Liu, N. Au, W. Cui, H. Zhou, and H. Qu, “OpinionSeer: Interactive visualization of hotel customer feedback,” IEEE Trans. Vis. Comput. Graph., vol. 16, no. 6, pp. 1109–1118, Nov./Dec. 2010.

[119] J. Xu, Y. Tao, H. Lin, R. Zhu, and Y. Yan, “Exploring controversy via sentiment divergences of aspects in reviews,” in Proc. IEEE Vis. Symp., 2017, pp. 240–249.

[120] J. Zhao, N. Cao, Z. Wen, Y. Song, Y.-R. Lin, and C. Collins, “#FluxFlow: Visual analysis of anomalous information spreading on social media,” IEEE Trans. Vis. Comput. Graph., vol. 20, no. 12, pp. 1773–1782, Dec. 2014.

[121] J. Zhao, L. Gou, F. Wang, and M. Zhou, “PEARL: An interactive visual analytic tool for understanding personal emotion style derived from social media,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2014, pp. 203–212.

[122] F. Beck and D. Weiskopf, “Word-sized graphics for scientific texts,” IEEE Trans. Vis. Comput. Graph., vol. 23, no. 6, pp. 1576–1587, Jun. 2017.

[123] M. Bögl, P. Filzmoser, T. Gschwandtner, T. Lammarsch, R. A. Leite, S. Miksch, and A. Rind, “Cycle plot revisited: Multivariate outlier detection using a distance-based abstraction,” Comput. Graph. Forum, vol. 36, no. 3, pp. 227–238, 2017.

[124] W. Chen, T. Lao, J. Xia, X. Huang, B. Zhu, W. Hu, and H. Guan, “GameFlow: Narrative visualization of NBA basketball games,” IEEE Trans. Multimedia, vol. 18, no. 11, pp. 2247–2256, Nov. 2016.

[125] D. Fisher, A. Hoff, G. Robertson, and M. Hurst, “Narratives: A visualization to track narrative events as they develop,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2008, pp. 115–122.

[126] E. Hoque and G. Carenini, “MultiConVis: A visual text analytics system for exploring a collection of online conversations,” in Proc. Int. Conf. Intell. User Interfaces, 2016, pp. 96–107.

[127] J. Hullman, N. Diakopoulos, and E. Adar, “Contextifier: Automatic generation of annotated stock visualizations,” in Proc. SIGCHI Conf. Human Factors Comput. Syst., 2013, pp. 2707–2716.

[128] P. Isenberg, F. Heimerl, S. Koch, T. Isenberg, P. Xu, C. D. Stolper, M. Sedlmair, J. Chen, T. Möller, and J. Stasko, “Vispubdata.org: A metadata collection about IEEE Visualization (VIS) publications,” IEEE Trans. Vis. Comput. Graph., vol. 23, no. 9, pp. 2199–2206, Sep. 2017.

[129] A. Marcus, M. S. Bernstein, O. Badar, D. R. Karger, S. Madden, and R. C. Miller, “TwitInfo: Aggregating and visualizing microblogs for event exploration,” in Proc. SIGCHI Conf. Human Factors Comput. Syst., 2011, pp. 227–236.

[130] F. Morstatter, S. Kumar, H. Liu, and R. Maciejewski, “Understanding Twitter data with TweetXplorer,” in Proc. 19th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2013, pp. 1482–1485.

[131] R. Patel and W. Furr, “ReadN’Karaoke: Visualizing prosody in children’s books for expressive oral reading,” in Proc. SIGCHI Conf. Human Factors Comput. Syst., 2011, pp. 3203–3206.

[132] N. H. Riche, B. Lee, and F. Chevalier, “iChase: Supporting exploration and awareness of editing activities on Wikipedia,” in Proc. Work. Conf. Advanced Visual Interfaces, 2010, pp. 59–66.

[133] E. Schubert, M. Weiler, and H.-P. Kriegel, “SigniTrend: Scalable detection of emerging topics in textual streams by hashed significance thresholds,” in Proc. 20th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2014, pp. 871–880.

[134] E. Segel and J. Heer, “Narrative visualization: Telling stories with data,” IEEE Trans. Vis. Comput. Graph., vol. 16, no. 6, pp. 1139–1148, Nov./Dec. 2010.

[135] F. Viégas, M. Wattenberg, J. Hebert, G. Borggaard, A. Cichowlas, J. Feinberg, J. Orwant, and C. Wren, “Google+ Ripples: A native visualization of information flow,” in Proc. 22nd Int. Conf. World Wide Web, 2013, pp. 1389–1398.

[136] S. Wan and C. Paris, “Improving government services with social media feedback,” in Proc. Int. Conf. Intell. User Interfaces, 2014, pp. 27–36.

[137] J. Xia, Y. Hou, Y. V. Chen, Z. C. Qian, D. S. Ebert, and W. Chen, “Visualizing rank time series of Wikipedia top-viewed pages,” IEEE Comput. Graph. Appl., vol. 37, no. 2, pp. 42–53, Mar./Apr. 2017.

[138] C. Xie, W. Chen, X. Huang, Y. Hu, S. Barlowe, and J. Yang, “VAET: A visual analytics approach for E-transactions time-series,” IEEE Trans. Vis. Comput. Graph., vol. 20, no. 12, pp. 1743–1752, Dec. 2014.

[139] P. Xu, Y. Wu, E. Wei, T. Q. Peng, S. Liu, J. J. H. Zhu, and H. Qu, “Visual analysis of topic competition on social media,” IEEE Trans. Vis. Comput. Graph., vol. 19, no. 12, pp. 2012–2021, Dec. 2013.

[140] J. Zhang, B. Ahlbrand, A. Malik, J. Chae, Z. Min, S. Ko, and D. S. Ebert, “A visual analytics framework for microblog data analysis at multiple scales of aggregation,” Comput. Graph. Forum, vol. 35, no. 3, pp. 441–450, 2016.

[141] J. Zhao, L. Dong, J. Wu, and K. Xu, “MoodLens: An emoticon-based sentiment analysis system for Chinese tweets,” in Proc. ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2012, pp. 1528–1531.

[142] J. Alsakran, Y. Chen, Y. Zhao, J. Yang, and D. Luo, “STREAMIT: Dynamic visualization and interactive exploration of text streams,” in Proc. IEEE Vis. Symp., 2011, pp. 131–138.

[143] N. Bansal and N. Koudas, “BlogScope: Spatio-temporal analysis of the blogosphere,” in Proc. 16th Int. Conf. World Wide Web, 2007, pp. 1269–1270.

[144] S. Chen, S. Chen, Z. Wang, J. Liang, X. Yuan, N. Cao, and Y. Wu, “D-Map: Visual analysis of ego-centric information diffusion patterns in social media,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2017, pp. 41–50.

[145] M. A. Hearst and J. O. Pedersen, “Visualizing information retrieval results: A demonstration of the TileBar interface,” in Proc. SIGCHI Conf. Human Factors Comput. Syst., 1996, pp. 394–395.

[146] E. Hoque, S. Joty, L. Marquez, and G. Carenini, “CQAVis: Visual text analytics for community question answering,” in Proc. Int. Conf. Intell. User Interfaces, 2017, pp. 161–172.

[147] S. Jänicke and D. J. Wrisley, “Interactive visual alignment of medieval text versions,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2017.

[148] K. Kaugars, “Integrated multi scale text retrieval visualization,” in Proc. SIGCHI Conf. Human Factors Comput. Syst., 1998, pp. 307–308.

[149] D. A. Szafir, D. Stuffer, Y. Sohail, and M. Gleicher, “TextDNA: Visualizing word usage with configurable colorfields,” Comput. Graph. Forum, vol. 35, no. 3, pp. 421–430, 2016.

[150] F. B. Viégas, M. Wattenberg, and J. Feinberg, “Participatory visualization with Wordle,” IEEE Trans. Vis. Comput. Graph., vol. 15, no. 6, pp. 1137–1144, Nov./Dec. 2009.

[151] E. A. Bier, S. K. Card, and J. W. Bodnar, “Entity-based collaboration tools for intelligence analysis,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2008, pp. 99–106.

[152] L. Bradel, C. North, L. House, and S. Leman, “Multi-model semantic interaction for text analytics,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2014, pp. 163–172.

[153] N. Cao, J. Sun, Y. R. Lin, D. Gotz, S. Liu, and H. Qu, “FacetAtlas: Multifaceted visualization for rich text corpora,” IEEE Trans. Vis. Comput. Graph., vol. 16, no. 6, pp. 1172–1181, Nov./Dec. 2010.

[154] C. Chen, F. Ibekwe-Sanjuan, E. Sanjuan, and C. Weaver, “Visual analysis of conflicting opinions,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2006, pp. 59–66.

[155] J.-K. Chou and C.-K. Yang, “PaperVis: Literature review made easy,” Comput. Graph. Forum, vol. 30, no. 3, pp. 721–730, 2011.

[156] H. Chung, S. Yang, N. Massjouni, C. Andrews, R. Kanna, and C. North, “VizCept: Supporting synchronous collaboration for constructing visualizations in intelligence analysis,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2010, pp. 107–114.

[157] P. J. Crossno, D. M. Dunlavy, and T. M. Shead, “LSAView: A tool for visual exploration of latent semantic modeling,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2009, pp. 83–90.

[158] M. Dubinko, R. Kumar, J. Magnani, J. Novak, P. Raghavan, and A. Tomkins, “Visualizing tags over time,” ACM Trans. Web, vol. 1, no. 2, 2007, Art. no. 7.

[159] M. El-Assady, R. Sevastjanova, B. Gipp, D. Keim, and C. Collins, “NEREx: Named-entity relationship exploration in multi-party conversations,” Comput. Graph. Forum, vol. 36, no. 3, pp. 213–225, 2017.

[160] A. Endert, P. Fiaux, and C. North, “Semantic interaction for visual text analytics,” in Proc. SIGCHI Conf. Human Factors Comput. Syst., 2012, pp. 473–482.

[161] S. Fu, J. Zhao, W. Cui, and H. Qu, “Visual analysis of MOOC forums with iForum,” IEEE Trans. Vis. Comput. Graph., vol. 23, no. 1, pp. 201–210, Jan. 2017.

[162] M. Itoh, N. Yoshinaga, M. Toyoda, and M. Kitsuregawa, “Analysis and visualization of temporal changes in bloggers’ activities and interests,” in Proc. IEEE Vis. Symp., 2012, pp. 57–64.

[163] A. Kochtchi, T. von Landesberger, and C. Biemann, “Networks of names: Visual exploration and semi-automatic tagging of social networks from newspaper articles,” Comput. Graph. Forum, vol. 33, no. 3, pp. 211–220, 2014.

[164] M. Liu, S. Liu, X. Zhu, Q. Liao, F. Wei, and S. Pan, “An uncertainty-aware approach for exploratory microblog retrieval,” IEEE Trans. Vis. Comput. Graph., vol. 22, no. 1, pp. 250–259, Jan. 2016.

[165] S. J. Luo, L. T. Huang, B. Y. Chen, and H.-W. Shen, “EmailMap: Visualizing event evolution and contact interaction within email archives,” in Proc. IEEE Vis. Symp., 2014, pp. 320–324.

[166] K. Madhavan, N. Elmqvist, M. Vorvoreanu, X. Chen, Y. Wong, H. Xian, Z. Dong, and A. Johri, “DIA2: Web-based cyberinfrastructure for visual analysis of funding portfolios,” IEEE Trans. Vis. Comput. Graph., vol. 20, no. 12, pp. 1823–1832, Dec. 2014.

[167] D. Shahaf, C. Guestrin, and E. Horvitz, “Metro maps of science,” in Proc. 18th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2012, pp. 1122–1130.

[168] D. Shahaf, J. Yang, C. Suen, J. Jacobs, H. Wang, and J. Leskovec, “Information cartography: Creating zoomable, large-scale maps of information,” in Proc. 19th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2013, pp. 1097–1105.

[169] Q. Shen, T. Wu, H. Yang, Y. Wu, H. Qu, and W. Cui, “NameClarifier: A visual analytics system for author name disambiguation,” IEEE Trans. Vis. Comput. Graph., vol. 23, no. 1, pp. 141–150, Jan. 2017.

[170] J. Stasko, C. Görg, Z. Liu, and K. Singhal, “Jigsaw: Supporting investigative analysis through interactive visualization,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2007, pp. 131–138.

[171] F. van Ham, M. Wattenberg, and F. B. Viégas, “Mapping text with phrase nets,” IEEE Trans. Vis. Comput. Graph., vol. 15, no. 6, pp. 1169–1176, Nov./Dec. 2009.

[172] R. Vuillemot, T. Clement, C. Plaisant, and A. Kumar, “What’s being said near Martha? Exploring name entities in literary text collections,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2009, pp. 107–114.

[173] K. Wongsuphasawat and D. Gotz, “Exploring flow, factors, and outcomes of temporal event sequences with the outflow visualization,” IEEE Trans. Vis. Comput. Graph., vol. 18, no. 12, pp. 2659–2668, Dec. 2012.

[174] A. Abdul-Rahman, J. Lein, K. Coles, E. Maguire, M. Meyer, M. Wynne, C. R. Johnson, A. Trefethen, and M. Chen, “Rule-based visual mappings – with a case study on poetry visualization,” Comput. Graph. Forum, vol. 32, no. 3, pp. 381–390, 2013.

[175] R. Baeza-Yates, “Visualization of large answers in text databases,” in Proc. Work. Conf. Advanced Visual Interfaces, 1996, pp. 101–107.

[176] N. Cao, Y. R. Lin, X. Sun, and D. Lazer, “Whisper: Tracing the spatiotemporal process of information diffusion in real time,” IEEE Trans. Vis. Comput. Graph., vol. 18, no. 12, pp. 2649–2658, Dec. 2012.

[177] A. M. Cuadros, F. V. Paulovich, R. Minghim, and G. P. Telles, “Point placement by phylogenetic trees and its application to visual analysis of document collections,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2007, pp. 99–106.

[178] W. Cui, S. Liu, Z. Wu, and H. Wei, “How hierarchical topics evolve in large text corpora,” IEEE Trans. Vis. Comput. Graph., vol. 20, no. 12, pp. 2281–2290, Dec. 2014.

[179] W. Dou, L. Yu, X. Wang, Z. Ma, and W. Ribarsky, “HierarchicalTopics: Visually exploring large text collections using topic hierarchies,” IEEE Trans. Vis. Comput. Graph., vol. 19, no. 12, pp. 2002–2011, Dec. 2013.

[180] D. A. Keim, F. Mansmann, C. Panse, J. Schneidewind, and M. Sips, “Mail explorer—spatial and temporal exploration of electronic mail,” in Proc. Eurographics Conf. Vis., 2005, pp. 247–254.

[181] P. Riehmann, H. Gruendl, B. Froehlich, M. Potthast, M. Trenkmann, and B. Stein, “The NETSPEAK WORDGRAPH: Visualizing keywords in context,” in Proc. IEEE Vis. Symp., 2011, pp. 123–130.

[182] C. Rohrdantz, M. Hund, T. Mayer, and D. A. Keim, “The world’s languages explorer: Visual analysis of language features in genealogical and areal contexts,” Comput. Graph. Forum, vol. 31, no. 3, pp. 935–944, 2012.

[183] S. Schlechtweg, P. Schulze-Wollgast, and H. Schumann, “Interactive treemaps with detail on demand to support information search in documents,” in Proc. Eurographics Conf. Vis., 2004, pp. 121–128.

[184] M. A. Smith and A. T. Fiore, “Visualization components for persistent conversations,” in Proc. SIGCHI Conf. Human Factors Comput. Syst., 2001, pp. 136–143.

[185] X. Wang, S. Liu, Y. Chen, T. Q. Peng, J. Su, J. Yang, and B. Guo, “How ideas flow across multiple social groups,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2016, pp. 51–60.

[186] M. Wattenberg and F. B. Viégas, “The Word Tree, an interactive visual concordance,” IEEE Trans. Vis. Comput. Graph., vol. 14, no. 6, pp. 1221–1228, Nov./Dec. 2008.

[187] Y. Wu, S. Liu, K. Yan, and M. Liu, “OpinionFlow: Visual analysis of opinion diffusion on social media,” IEEE Trans. Vis. Comput. Graph., vol. 20, no. 12, pp. 1763–1772, Dec. 2014.

[188] J. Yabe, S. Takahashi, and E. Shibayama, “Automatic animation of discussions in USENET,” in Proc. Work. Conf. Advanced Visual Interfaces, 2000, pp. 84–91.

[189] J. Zhao, F. Chevalier, C. Collins, and R. Balakrishnan, “Facilitating discourse analysis with interactive visualization,” IEEE Trans. Vis. Comput. Graph., vol. 18, no. 12, pp. 2639–2648, Dec. 2012.

[190] D. Angus, A. Smith, and J. Wiles, “Conceptual Recurrence Plots: Revealing patterns in human discourse,” IEEE Trans. Vis. Comput. Graph., vol. 18, no. 6, pp. 988–997, Jun. 2012.

[191] P. Butler, P. Chakraborty, and N. Ramakrishnan, “The Deshredder: A visual analytic approach to reconstructing shredded documents,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2012, pp. 113–122.

[192] G. Carenini, R. T. Ng, and A. Pauls, “Interactive multimedia summaries of evaluative text,” in Proc. Int. Conf. Intell. User Interfaces, 2006, pp. 124–131.

[193] J. Chuang, C. D. Manning, and J. Heer, “Termite: Visualization techniques for assessing textual topic models,” in Proc. Work. Conf. Advanced Visual Interfaces, 2012, pp. 74–77.

[194] C. Collins, S. Carpendale, and G. Penn, “DocuBurst: Visualizing document content using language structure,” Comput. Graph. Forum, vol. 28, no. 3, pp. 1039–1046, 2009.

[195] S. G. Eick, J. Mauger, and A. Ratner, “Visualizing the performance of computational linguistics algorithms,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2006, pp. 151–157.

[196] B. Hetzler, P. Whitney, L. Martucci, and J. Thomas, “Multi-faceted insight through interoperable visual information analysis paradigms,” in Proc. IEEE Symp. Inf. Vis., 1998, pp. 137–144.

[197] D. A. Keim and D. Oelke, “Literature fingerprinting: A new method for visual literary analysis,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2007, pp. 115–122.

[198] D. Oelke, D. Kokkinakis, and D. A. Keim, “Fingerprint matrices: Uncovering the dynamics of social networks in prose literature,” Comput. Graph. Forum, vol. 32, no. 3, pp. 371–380, 2013.

[199] D. Oelke, D. Spretke, A. Stoffel, and D. A. Keim, “Visual readability analysis: How to make your writings easier to read,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2010, pp. 123–130.

[200] D. Oelke, P. Bak, D. A. Keim, M. Last, and G. Danon, “Visual evaluation of text features for document summarization and analysis,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2008, pp. 75–82.

[201] D. Oelke, M. Hao, C. Rohrdantz, D. A. Keim, U. Dayal, L.-E. Haug, and H. Janetzko, “Visual opinion analysis of customer feedback data,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2009, pp. 187–194.

[202] R. L. Ribler and M. Abrams, “Using visualization to detect plagiarism in computer science classes,” in Proc. IEEE Symp. Inf. Vis., 2000, pp. 173–178.

[203] F. Stoffel, W. Jentner, M. Behrisch, J. Fuchs, and D. Keim, “Interactive ambiguity resolution of named entities in fictional literature,” Comput. Graph. Forum, vol. 36, no. 3, pp. 189–200, 2017.

[204] H. Strobelt, D. Oelke, C. Rohrdantz, A. Stoffel, D. A. Keim, and O. Deussen, “Document cards: A top trumps visualization for documents,” IEEE Trans. Vis. Comput. Graph., vol. 15, no. 6, pp. 1145–1152, Nov./Dec. 2009.

[205] J. Zhao, C. Collins, F. Chevalier, and R. Balakrishnan, “Interactive exploration of implicit and explicit relations in faceted datasets,” IEEE Trans. Vis. Comput. Graph., vol. 19, no. 12, pp. 2080–2089, Dec. 2013.

[206] I. Cho, W. Dou, D. X. Wang, E. Sauda, and W. Ribarsky, “VAiRoma: A visual analytics system for making sense of places, times, and events in Roman history,” IEEE Trans. Vis. Comput. Graph., vol. 22, no. 1, pp. 210–219, Jan. 2016.

[207] W. Dou, X. Wang, D. Skau, and W. Ribarsky, “LeadLine: Interactive visual analysis of text data through event identification and exploration,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2012, pp. 93–102.

[208] S. Gad, W. Javed, S. Ghani, N. Elmqvist, T. Ewing, K. N. Hampton, and N. Ramakrishnan, “ThemeDelta: Dynamic segmentations over temporal topic models,” IEEE Trans. Vis. Comput. Graph., vol. 21, no. 5, pp. 672–685, May 2015.

[209] S. Havre, E. Hetzler, P. Whitney, and L. Nowell, “ThemeRiver: Visualizing thematic changes in large document collections,” IEEE Trans. Vis. Comput. Graph., vol. 8, no. 1, pp. 9–20, Jan.–Mar. 2002.

[210] S. Havre, B. Hetzler, and L. Nowell, “ThemeRiver: Visualizing theme changes over time,” in Proc. IEEE Symp. Inf. Vis., 2000, pp. 115–123.

[211] F. Heimerl, Q. Han, S. Koch, and T. Ertl, “CiteRivers: Visual analytics of citation patterns,” IEEE Trans. Vis. Comput. Graph., vol. 22, no. 1, pp. 190–199, Jan. 2016.

[212] M. Krstajic, E. Bertini, and D. Keim, “CloudLines: Compact display of event episodes in multiple time-series,” IEEE Trans. Vis. Comput. Graph., vol. 17, no. 12, pp. 2432–2439, Dec. 2011.

[213] S. Liu, W. Zhu, N. Xu, F. Li, X.-Q. Cheng, Y. Liu, and Y. Wang, “Co-training and visualizing sentiment evolvement for tweet events,” in Proc. Int. Conf. World Wide Web, 2013, pp. 105–106.

[214] L. Shi, F. Wei, S. Liu, L. Tan, X. Lian, and M. X. Zhou, “Understanding text corpora with multiple facets,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2010, pp. 99–106.

[215] G. Sun, Y. Wu, S. Liu, T. Q. Peng, J. J. H. Zhu, and R. Liang, “EvoRiver: Visual analysis of topic coopetition on social media,” IEEE Trans. Vis. Comput. Graph., vol. 20, no. 12, pp. 1753–1762, Dec. 2014.

[216] X. Wang, W. Dou, Z. Ma, J. Villalobos, Y. Chen, T. Kraft, and W. Ribarsky, “I-SI: Scalable architecture for analyzing latent topical-level information from social media data,” Comput. Graph. Forum, vol. 31, no. 3, pp. 1275–1284, 2012.

[217] F. Wei, S. Liu, Y. Song, S. Pan, M. X. Zhou, W. Qian, L. Shi, L. Tan, and Q. Zhang, “TIARA: A visual exploratory text analytic system,” in Proc. 16th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2010, pp. 153–162.

[218] J. Albrecht, R. Hwa, and G. E. Marai, “The Chinese room: Visualization and interaction to understand and correct ambiguous machine translation,” Comput. Graph. Forum, vol. 28, no. 3, pp. 1047–1054, 2009.

[219] B. Chan, L. Wu, J. Talbot, M. Cammarano, and P. Hanrahan, “Vispedia: Interactive visual exploration of Wikipedia data via search-based integration,” IEEE Trans. Vis. Comput. Graph., vol. 14, no. 6, pp. 1213–1220, Nov./Dec. 2008.

[220] B. C. Kwon, W. Javed, S. Ghani, N. Elmqvist, J. S. Yi, and D. S. Ebert, “Evaluating the role of time in investigative analysis of document collections,” IEEE Trans. Vis. Comput. Graph., vol. 18, no. 11, pp. 1992–2004, Nov. 2012.

[221] J. Fulda, M. Brehmer, and T. Munzner, “TimeLineCurator: Interactive authoring of visual timelines from unstructured text,” IEEE Trans. Vis. Comput. Graph., vol. 22, no. 1, pp. 300–309, Jan. 2016.

[222] S. Jänicke, J. Focht, and G. Scheuermann, “Interactive visual profiling of musicians,” IEEE Trans. Vis. Comput. Graph., vol. 22, no. 1, pp. 200–209, Jan. 2016.

[223] J. Leskovec, L. Backstrom, and J. Kleinberg, “Meme-tracking and the dynamics of the news cycle,” in Proc. 15th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2009, pp. 497–506.

[224] S. Rose, S. Butner, W. Cowley, M. Gregory, and J. Walker, “Describing story evolution from dynamic information streams,” in Proc. IEEE Conf. Visual Analytics Sci. Technol., 2009, pp. 99–106.

[225] K. Andrews, W. Kienreich, V. Sabol, J. Becker, G. Droschl, F. Kappe, M. Granitzer, P. Auer, and K. Tochtermann, “The InfoSky visual explorer: Exploiting hierarchical structure and document similarities,” Inf. Vis., vol. 1, no. 3–4, pp. 166–181, 2002.

[226] Y. Chen, L. Wang, M. Dong, and J. Hua, “Exemplar-based visualization of large document corpus,” IEEE Trans. Vis. Comput. Graph., vol. 15, no. 6, pp. 1161–1168, Nov./Dec. 2009.

[227] A. Endert, S. Fox, D. Maiti, S. Leman, and C. North, “The semantics of clustering: Analysis of user-generated spatializations of text documents,” in Proc. Work. Conf. Advanced Visual Interfaces, 2012, pp. 555–562.

[228] K. Fujimura, S. Fujimura, T. Matsubayashi, T. Yamada, and H. Okuda, “Topigraphy: Visualization for large-scale tag clouds,” in Proc. Int. Conf. World Wide Web, 2008, pp. 1087–1088.

[229] E. Gomez-Nieto, R. F. San, P. Pagliosa, W. Casaca, E. S. Helou, M. C. de Oliveira, and L. G. Nonato, “Similarity preserving snippet-based visualization of web search results,” IEEE Trans. Vis. Comput. Graph., vol. 20, no. 3, pp. 457–470, Mar. 2014.

[230] M. A. Hearst, “TileBars: Visualization of term distribution information in full text information access,” in Proc. SIGCHI Conf. Human Factors Comput. Syst., 1995, pp. 59–66.

[231] E. G. Hetzler, V. L. Crow, D. A. Payne, and A. E. Turner, “Turning the bucket of text into a pipe,” in Proc. IEEE Symp. Inf. Vis., 2005, pp. 89–94.

[232] A. Leuski and J. Allan, “Lighthouse: Showing the way to relevant information,” in Proc. IEEE Symp. Inf. Vis., 2000, pp. 125–129.

[233] T. N. Nguyen and J. Zhang, “A novel visualization model for web search results,” IEEE Trans. Vis. Comput. Graph., vol. 12, no. 5, pp. 981–988, Sep./Oct. 2006.

[234] K. Verbert, D. Parra, P. Brusilovsky, and E. Duval, “Visualizing recommendations to support exploration, transparency and controllability,” in Proc. Int. Conf. Intell. User Interfaces, 2013, pp. 351–362.

[235] A. Nocaj and U. Brandes, “Organizing search results with a reference map,” IEEE Trans. Vis. Comput. Graph., vol. 18, no. 12, pp. 2546–2555, Dec. 2012.

[236] C. Shah and W. B. Croft, “Evaluating high accuracy retrieval techniques,” in Proc. 27th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2004, pp. 2–9.

[237] D. Oelke, H. Strobelt, C. Rohrdantz, I. Gurevych, and O. Deussen, “Comparative exploration of document collections: A visual analytics approach,” Comput. Graph. Forum, vol. 33, no. 3, pp. 201–210, 2014.

[238] P. Riehmann, M. Potthast, B. Stein, and B. Froehlich, “Visual assessment of alleged plagiarism cases,” Comput. Graph. Forum, vol. 34, no. 3, pp. 61–70, 2015.

[239] A. Spoerri, “InfoCrystal: A visual tool for information retrieval & management,” in Proc. IEEE Conf. Vis., 1993, pp. 150–157.

[240] E. Alexander and M. Gleicher, “Task-driven comparison of topic models,” IEEE Trans. Vis. Comput. Graph., vol. 22, no. 1, pp. 320–329, Jan. 2016.

[241] P. Riehmann, H. Gruendl, M. Potthast, M. Trenkmann, B. Stein, and B. Froehlich, “WORDGRAPH: Keyword-in-context visualization for NETSPEAK’s wildcard search,” IEEE Trans. Vis. Comput. Graph., vol. 18, no. 9, pp. 1411–1423, Sep. 2012.

[242] M. Dörk, S. Carpendale, C. Collins, and C. Williamson, “VisGets: Coordinated visualizations for web-based information exploration and discovery,” IEEE Trans. Vis. Comput. Graph., vol. 14, no. 6, pp. 1205–1212, Nov./Dec. 2008.

[243] D. Mashima, S. Kobourov, and Y. Hu, “Visualizing dynamic data with maps,” IEEE Trans. Vis. Comput. Graph., vol. 18, no. 9, pp. 1424–1437, Sep. 2012.

[244] D. Thom, R. Krüger, and T. Ertl, “Can Twitter save lives? A broad-scale study on visual social media analytics for public safety,” IEEE Trans. Vis. Comput. Graph., vol. 22, no. 7, pp. 1816–1829, Jul. 2016.

[245] D. Thom, H. Bosch, S. Koch, M. Wörner, and T. Ertl, “Spatiotemporal anomaly detection through visual analysis of geolocated Twitter messages,” in Proc. IEEE Vis. Symp., 2012, pp. 41–48.

[246] N. UzZaman, J. P. Bigham, and J. F. Allen, “Multimodal summarization of complex sentences,” in Proc. Int. Conf. Intell. User Interfaces, 2011, pp. 43–52.

[247] M. El-Assady, V. Gold, C. Acevedo, C. Collins, and D. Keim, “ConToVi: Multi-party conversation exploration using topic-space views,” Comput. Graph. Forum, vol. 35, pp. 431–440, 2016.

[248] E. Hoque and G. Carenini, “ConVis: A visual text analytic system for exploring blog conversations,” Comput. Graph. Forum, vol. 33, no. 3, pp. 221–230, 2014.

[249] C. Zhai and J. Lafferty, “Two-stage language models for information retrieval,” in Proc. 25th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2002, pp. 49–56.

[250] M. Deller, S. Agne, A. Ebert, A. Dengel, H. Hagen, B. Klein,M. Bender, T. Bernardin, and B. Hamann, “Managing a docu-ment-based information space,” in Proc. Int. Conf. Intell. UserInterfaces, 2008, pp. 119–128.

[251] K. Hartmann, S. Schlechtweg, R. Helbing, and T. Strothotte,“Knowledge-supported graphical illustration of texts,” in Proc.Work. Conf. Advanced Visual Interfaces, 2002, pp. 300–307.

[252] P. Oesterling, G. Scheuermann, S. Teresniak, G. Heyer, S. Koch,T. Ertl, and G. H. Weber, “Two-stage framework for a topology-based projection and visualization of classified documentcollections,” in Proc. IEEE Conf. Visual Analytics Sci. Technol.,2010, pp. 91–98.

[253] R. M. Rohrer, D. S. Ebert, and J. L. Sibert, “The shape of Shake-speare: Visualizing text using implicit surfaces,” in Proc. IEEESymp. Inf. Vis., 1998, pp. 121–129.

[254] C. Cortes and V. Vapnik, “Support-vector networks,” Mach.Learn., vol. 20, no. 3, pp. 273–297, 1995.

2500 IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 25, NO. 7, JULY 2019

[255] Q. Cui, B. Gao, J. Bian, S. Qiu, H. Dai, and T.-Y. Liu, "KNET: A general framework for learning word embedding using morphological knowledge," ACM Trans. Inf. Syst., vol. 34, no. 1, pp. 4:1–4:25, Aug. 2015.

[256] S. Dezdar and A. Sulaiman, "Successful enterprise resource planning implementation: Taxonomy of critical factors," Ind. Manage. Data Syst., vol. 109, no. 8, pp. 1037–1052, 2009.

[257] S. Agrawal, K. Chakrabarti, S. Chaudhuri, V. Ganti, A. C. Konig, and D. Xin, "Exploiting web search engines to search structured databases," in Proc. 18th Int. Conf. World Wide Web, 2009, pp. 501–510.

[258] H. Bast, A. Chitea, F. Suchanek, and I. Weber, "ESTER: Efficient search on text, entities, and relations," in Proc. 30th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2007, pp. 671–678.

[259] S. Jameel, Z. Bouraoui, and S. Schockaert, "MEmbER: Max-Margin based embeddings for entity retrieval," in Proc. 40th Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2017, pp. 783–792.

[260] J. Chu-Carroll, J. Prager, K. Czuba, D. Ferrucci, and P. Duboue, "Semantic search via XML fragments: A high-precision approach to IR," in Proc. 29th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2006, pp. 445–452.

[261] C. L. Clarke, "Controlling overlap in content-oriented XML retrieval," in Proc. 28th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2005, pp. 314–321.

[262] N. Fuhr and K. Großjohann, "XIRQL: A query language for information retrieval in XML documents," in Proc. 24th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2001, pp. 172–180.

[263] B. Carterette, J. Allan, and R. Sitaraman, "Minimal test collections for retrieval evaluation," in Proc. 29th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2006, pp. 268–275.

[264] N. Ireson, F. Ciravegna, M. E. Califf, D. Freitag, N. Kushmerick, and A. Lavelli, "Evaluating machine learning for information extraction," in Proc. 22nd Int. Conf. Mach. Learn., 2005, pp. 345–352.

[265] R. Atterer, M. Wnuk, and A. Schmidt, "Knowing the user's every move: User activity tracking for website usability evaluation and implicit interaction," in Proc. 15th Int. Conf. World Wide Web, 2006, pp. 203–212.

[266] J. A. Aslam and M. Montague, "Models for metasearch," in Proc. 24th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2001, pp. 276–284.

[267] S. Abbar, S. Amer-Yahia, P. Indyk, and S. Mahabadi, "Real-time recommendation of diverse related articles," in Proc. 22nd Int. Conf. World Wide Web, 2013, pp. 1–12.

[268] A. Broder, M. Fontoura, V. Josifovski, and L. Riedel, "A semantic approach to contextual advertising," in Proc. 30th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2007, pp. 559–566.

[269] Y. Lv, T. Moon, P. Kolari, Z. Zheng, X. Wang, and Y. Chang, "Learning to model relatedness for news recommendation," in Proc. 20th Int. Conf. World Wide Web, 2011, pp. 57–66.

[270] J. Proskurnia, M.-A. Cartright, L. Garcia-Pueyo, I. Krka, J. B. Wendt, T. Kaufmann, and B. Miklos, "Template induction over unstructured email corpora," in Proc. 26th Int. Conf. World Wide Web, 2017, pp. 1521–1530.

[271] L. Ramaswamy, A. Iyengar, L. Liu, and F. Douglis, "Automatic detection of fragments in dynamically generated web pages," in Proc. 13th Int. Conf. World Wide Web, 2004, pp. 443–454.

[272] T. H. Haveliwala, "Topic-sensitive PageRank," in Proc. 11th Int. Conf. World Wide Web, 2002, pp. 517–526.

[273] S. Wang, Z. Chen, B. Liu, and S. Emery, "Identifying search keywords for finding relevant social media posts," in Proc. 30th AAAI Conf. Artif. Intell., 2016, pp. 3052–3058.

[274] Y. Du, C. Xu, and D. Tao, "Privileged matrix factorization for collaborative filtering," in Proc. 26th Int. Joint Conf. Artif. Intell., 2017, pp. 1610–1616.

[275] Y. Zhang, J. Callan, and T. Minka, "Novelty and redundancy detection in adaptive filtering," in Proc. 25th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2002, pp. 81–88.

[276] S. Havre, E. Hetzler, K. Perrine, E. Jurrus, and N. Miller, "Interactive visualization of multiple query results," in Proc. IEEE Symp. Inf. Vis., 2001, pp. 105–112.

[277] S. Yu, D. Cai, J.-R. Wen, and W.-Y. Ma, "Improving pseudo-relevance feedback in web information retrieval using web page segmentation," in Proc. 12th Int. Conf. World Wide Web, 2003, pp. 11–18.

[278] M. Shokouhi, L. Azzopardi, and P. Thomas, "Effective query expansion for federated search," in Proc. 32nd Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2009, pp. 427–434.

[279] J. Xu and J. Callan, "Effective retrieval with distributed collections," in Proc. 21st Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 1998, pp. 112–120.

[280] C. Zhang, L. Shou, K. Chen, and G. Chen, "See-to-retrieve: Efficient processing of spatio-visual keyword queries," in Proc. 35th Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2012, pp. 681–690.

[281] Y. Liu, A. Niculescu-Mizil, and W. Gryc, "Topic-link LDA: Joint models of topic and author community," in Proc. 26th Annu. Int. Conf. Mach. Learn., 2009, pp. 665–672.

[282] J. Tang, J. Sun, C. Wang, and Z. Yang, "Social influence analysis in large-scale networks," in Proc. 15th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2009, pp. 807–816.

[283] G. Zheng, J. Guo, L. Yang, S. Xu, S. Bao, Z. Su, D. Han, and Y. Yu, "Mining topics on participations for community discovery," in Proc. 34th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2011, pp. 445–454.

[284] F. Y. Choi, "Advances in domain independent linear text segmentation," in Proc. 1st North Amer. Chapter Assoc. Comput. Linguistics Conf., 2000, pp. 26–33.

[285] X. Ji and H. Zha, "Domain-independent text segmentation using anisotropic diffusion and dynamic programming," in Proc. 26th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2003, pp. 322–329.

[286] C. Wang, Y. Wang, P.-S. Huang, A. Mohamed, D. Zhou, and L. Deng, "Sequence modeling via segmentations," in Proc. 34th Annu. Int. Conf. Mach. Learn., 2017, pp. 3674–3683.

[287] Q. Mei, D. Cai, D. Zhang, and C. Zhai, "Topic modeling with network regularization," in Proc. 17th Int. Conf. World Wide Web, 2008, pp. 101–110.

[288] Q. Mei and C. Zhai, "A mixture model for contextual text mining," in Proc. 12th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2006, pp. 649–655.

[289] F. Ieva, A. M. Paganoni, and N. Tarabelloni, "Covariance-based clustering in multivariate and functional data analysis," J. Mach. Learn. Res., vol. 17, no. 1, pp. 4985–5005, 2016.

[290] B. Larsen and C. Aone, "Fast and effective text mining using linear-time document clustering," in Proc. 5th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 1999, pp. 16–22.

[291] W. Xu, X. Liu, and Y. Gong, "Document clustering based on non-negative matrix factorization," in Proc. 26th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2003, pp. 267–273.

[292] E. Amitay, N. Har'El, R. Sivan, and A. Soffer, "Web-a-where: Geotagging web content," in Proc. 27th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2004, pp. 273–280.

[293] Y. Kim, J. Han, and C. Yuan, "TOPTRAC: Topical trajectory pattern mining," in Proc. 21st ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2015, pp. 587–596.

[294] M. D. Lieberman and H. Samet, "Adaptive context features for toponym resolution in streaming news," in Proc. 35th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2012, pp. 731–740.

[295] C. Wang and S. Mahadevan, "Heterogeneous domain adaptation using manifold alignment," in Proc. 22nd Int. Joint Conf. Artif. Intell., 2011, pp. 1541–1546.

[296] F. Wu and Y. Huang, "Sentiment domain adaptation with multiple sources," in Proc. 54th Annu. Meeting Assoc. Comput. Linguistics, 2016, pp. 301–310.

[297] G. Zhou, Z. Xie, J. X. Huang, and T. He, "Bi-transferring deep neural networks for domain adaptation," in Proc. 54th Annu. Meeting Assoc. Comput. Linguistics, 2016, pp. 322–332.

[298] C. Collins, S. Carpendale, and G. Penn, "Visualization of uncertainty in lattices to support decision-making," in Proc. Eurographics Conf. Vis., 2007, pp. 51–58.

[299] F. B. Viégas, S. Golder, and J. Donath, "Visualizing email content: Portraying relationships from conversational histories," in Proc. SIGCHI Conf. Human Factors Comput. Syst., 2006, pp. 979–988.

[300] J. Prager, E. Brown, A. Coden, and D. Radev, "Question-answering by predictive annotation," in Proc. 23rd Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2000, pp. 184–191.

[301] Y. Shen, W. Rong, N. Jiang, B. Peng, J. Tang, and Z. Xiong, "Word embedding based correlation model for question/answer matching," in Proc. AAAI Conf. Artif. Intell., 2017, pp. 3511–3517.

[302] Y. Guo and M. Xiao, "Cross language text classification via subspace co-regularized multi-view learning," in Proc. 29th Annu. Int. Conf. Mach. Learn., 2012, pp. 1615–1622.

[303] M. Xiao and Y. Guo, "A novel two-step method for cross language representation learning," in Proc. Advances in Neural Information Processing Systems, 2013, pp. 1259–1267.

LIU ET AL.: BRIDGING TEXT VISUALIZATION AND MINING: A TASK-DRIVEN SURVEY 2501

[304] Z. Lu, L. Wang, and J.-R. Wen, "Direct semantic analysis for social image classification," in Proc. AAAI Conf. Artif. Intell., 2014, pp. 1258–1264.

[305] T. Maekawa, T. Hara, and S. Nishio, "Image classification for mobile web browsing," in Proc. 15th Int. Conf. World Wide Web, 2006, pp. 43–52.

[306] W. Xie, Y. Peng, and J. Xiao, "Cross-view feature learning for scalable social image analysis," in Proc. AAAI Conf. Artif. Intell., 2014, pp. 201–207.

[307] A. Mishra, K. Dey, and P. Bhattacharyya, "Learning cognitive features from gaze data for sentiment and sarcasm classification using convolutional neural network," in Proc. 55th Annu. Meeting Assoc. Comput. Linguistics, 2017, pp. 377–387.

[308] M. D. Reid and R. C. Williamson, "Information, divergence and risk for binary experiments," J. Mach. Learn. Res., vol. 12, no. Mar, pp. 731–817, 2011.

[309] U. Sapkota, T. Solorio, M. Montes-y-Gómez, and S. Bethard, "Domain adaptation for authorship attribution: Improved structural correspondence learning," in Proc. 54th Annu. Meeting Assoc. Comput. Linguistics, 2016, pp. 2226–2235.

[310] D. Zhang and W. S. Lee, "Web taxonomy integration using support vector machines," in Proc. 13th Int. Conf. World Wide Web, 2004, pp. 472–481.

[311] S. Dumais and H. Chen, "Hierarchical classification of web content," in Proc. 23rd Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2000, pp. 256–263.

[312] A. J. Perotte, F. Wood, N. Elhadad, and N. Bartlett, "Hierarchically supervised latent Dirichlet allocation," in Proc. Advances in Neural Information Processing Systems, 2011, pp. 2609–2617.

[313] Z. Ren, M.-H. Peetz, S. Liang, W. van Dolen, and M. de Rijke, "Hierarchical multi-label classification of social text streams," in Proc. 37th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2014, pp. 213–222.

[314] E. Diemert and G. Vandelle, "Unsupervised query categorization using automatically-built concept graphs," in Proc. 18th Int. Conf. World Wide Web, 2009, pp. 461–470.

[315] J. Hu, G. Wang, F. Lochovsky, J.-T. Sun, and Z. Chen, "Understanding user's query intent with Wikipedia," in Proc. 18th Int. Conf. World Wide Web, 2009, pp. 471–480.

[316] I.-H. Kang and G. Kim, "Query type classification for web document retrieval," in Proc. 26th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2003, pp. 64–71.

[317] D. Shen, Z. Chen, Q. Yang, H.-J. Zeng, B. Zhang, Y. Lu, and W.-Y. Ma, "Web-page classification through summarization," in Proc. 27th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2004, pp. 242–249.

[318] H. Wang, H. Huang, F. Nie, and C. Ding, "Cross-language web page classification via dual knowledge transfer using nonnegative matrix tri-factorization," in Proc. 34th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2011, pp. 933–942.

[319] H. Raghavan and J. Allan, "An interactive algorithm for asking and incorporating feature feedback into support vector machines," in Proc. 30th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2007, pp. 79–86.

[320] D. Spina, J. Gonzalo, and E. Amigó, "Learning similarity functions for topic detection in online reputation monitoring," in Proc. 37th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2014, pp. 527–536.

[321] S. Muthiah, B. Huang, J. Arredondo, D. Mares, L. Getoor, G. Katz, and N. Ramakrishnan, "Planned protest modeling in news and social media," in Proc. AAAI Conf. Artif. Intell., 2015, pp. 3920–3927.

[322] D. J. Hand, H. Mannila, and P. Smyth, Principles of Data Mining. Cambridge, MA, USA: MIT Press, 2001.

[323] C. Bishop, J. Winn, and T. Diethe, "Model-based machine learning," 2015. [Online]. Available: http://mbmlbook.com/, Accessed on: Dec. 31, 2017.

[324] C. M. Bishop, Pattern Recognition and Machine Learning. Berlin, Germany: Springer, 2006.

[325] D. Wang, T. Li, S. Zhu, and C. Ding, "Multi-document summarization via sentence-level semantic analysis and symmetric matrix factorization," in Proc. 31st Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2008, pp. 307–314.

[326] Y. Wang, H. Huang, C. Feng, Q. Zhou, J. Gu, and X. Gao, "CSE: Conceptual sentence embeddings based on attention model," in Proc. 54th Annu. Meeting Assoc. Comput. Linguistics, 2016, pp. 505–515.

[327] J. Wieting and K. Gimpel, "Revisiting recurrent networks for paraphrastic sentence embeddings," in Proc. 55th Annu. Meeting Assoc. Comput. Linguistics, 2017, pp. 2078–2088.

[328] M. Artetxe, G. Labaka, and E. Agirre, "Learning bilingual word embeddings with (almost) no bilingual data," in Proc. 55th Annu. Meeting Assoc. Comput. Linguistics, 2017, pp. 451–462.

[329] H. Schütze, "Word space," in Proc. Advances in Neural Information Processing Systems, 1993, pp. 895–902.

[330] P. Sen, "Collective context-aware topic models for entity disambiguation," in Proc. 21st Int. Conf. World Wide Web, 2012, pp. 729–738.

[331] R. K. Ando, "Latent semantic space: Iterative scaling improves precision of inter-document similarity measurement," in Proc. 23rd Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2000, pp. 216–223.

[332] P. Ogilvie and J. Callan, "Combining document representations for known-item search," in Proc. 26th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2003, pp. 143–150.

[333] X. Zhou, X. Wan, and J. Xiao, "Cross-lingual sentiment classification with bilingual document representation learning," in Proc. 54th Annu. Meeting Assoc. Comput. Linguistics, 2016, pp. 1403–1412.

[334] A. J. Brockmeier, T. Mu, S. Ananiadou, and J. Y. Goulermas, "Quantifying the informativeness of similarity measurements," J. Mach. Learn. Res., vol. 18, no. 76, pp. 1–61, 2017.

[335] C.-K. Hsieh, L. Yang, Y. Cui, T.-Y. Lin, S. Belongie, and D. Estrin, "Collaborative metric learning," in Proc. 26th Int. Conf. World Wide Web, 2017, pp. 193–201.

[336] C. Xiao, P. Zhang, W. A. Chaovalitwongse, J. Hu, and F. Wang, "Adverse drug reaction prediction with symbolic latent Dirichlet allocation," in Proc. AAAI Conf. Artif. Intell., 2017, pp. 1590–1596.

[337] Z.-H. Deng, H. Yu, and Y. Yang, "Identifying sentiment words using an optimization model with L1 regularization," in Proc. AAAI Conf. Artif. Intell., 2016, pp. 115–121.

[338] N. Du, Y. Liang, M.-F. Balcan, M. Gomez-Rodriguez, H. Zha, and L. Song, "Scalable influence maximization for multiple products in continuous-time diffusion networks," J. Mach. Learn. Res., vol. 18, no. 2, pp. 1–45, 2017.

[339] Y. Wang, G. Williams, E. Theodorou, and L. Song, "Variational policy for guiding point processes," in Proc. 34th Annu. Int. Conf. Mach. Learn., 2017, pp. 3684–3693.

[340] Y.-H. Kim, S.-Y. Hahn, and B.-T. Zhang, "Text filtering by boosting naive Bayes classifiers," in Proc. 23rd Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2000, pp. 168–175.

[341] J. Lee, C. Heaukulani, Z. Ghahramani, L. F. James, and S. Choi, "Bayesian inference on random simple graphs with power law degree distributions," in Proc. 34th Annu. Int. Conf. Mach. Learn., 2017, pp. 2004–2013.

[342] M. Yurochkin and X. Nguyen, "Geometric Dirichlet means algorithm for topic inference," in Proc. Advances in Neural Information Processing Systems, 2016, pp. 2505–2513.

[343] R. Campos, S. Canuto, T. Salles, C. C. de Sá, and M. A. Goncalves, "Stacking bagged and boosted forests for effective automated classification," in Proc. 40th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2017, pp. 105–114.

[344] B. Joshi, M. R. Amini, I. Partalas, F. Iutzeler, and Y. Maximov, "Aggressive sampling for multi-class to binary reduction with applications to text classification," in Proc. Advances in Neural Information Processing Systems, 2017, pp. 4159–4168.

[345] H. Lodhi, C. Saunders, J. Shawe-Taylor, N. Cristianini, and C. Watkins, "Text classification using string kernels," J. Mach. Learn. Res., vol. 2, no. Feb, pp. 419–444, 2002.

[346] J.-W. Son, J. Jeon, A. Lee, and S.-J. Kim, "Spectral clustering with brainstorming process for multi-view data," in Proc. AAAI Conf. Artif. Intell., 2017, pp. 2548–2554.

[347] Y. Song, S. Pan, S. Liu, F. Wei, M. X. Zhou, and W. Qian, "Constrained coclustering for textual documents," in Proc. AAAI Conf. Artif. Intell., 2010, pp. 581–586.

[348] H. Zha, X. He, C. Ding, M. Gu, and H. D. Simon, "Spectral relaxation for k-means clustering," in Proc. Advances in Neural Information Processing Systems, 2002, pp. 1057–1064.

[349] X. Guo, L. Gao, X. Liu, and J. Yin, "Improved deep embedded clustering with local structure preservation," in Proc. 26th Int. Joint Conf. Artif. Intell., 2017, pp. 1753–1759.

[350] M. Luo, F. Nie, X. Chang, Y. Yang, A. G. Hauptmann, and Q. Zheng, "Probabilistic non-negative matrix factorization and its robust extensions for topic modeling," in Proc. AAAI Conf. Artif. Intell., 2017, pp. 2308–2314.

[351] H. Schütze and C. Silverstein, "Projections for efficient document clustering," in Proc. 20th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 1997, pp. 74–81.

[352] D. M. Blei and J. D. Lafferty, "Dynamic topic models," in Proc. 23rd Annu. Int. Conf. Mach. Learn., 2006, pp. 113–120.

[353] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," J. Mach. Learn. Res., vol. 3, no. Jan, pp. 993–1022, 2003.

[354] J. He, Z. Hu, T. Berg-Kirkpatrick, Y. Huang, and E. P. Xing, "Efficient correlated topic modeling with topic embedding," in Proc. 23rd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2017, pp. 225–233.

[355] Z. Cao, W. Li, S. Li, and F. Wei, "Improving multi-document summarization via text classification," in Proc. AAAI Conf. Artif. Intell., 2017, pp. 3053–3059.

[356] Q. M. Hoang, T. N. Hoang, and K. H. Low, "A generalized stochastic variational Bayesian hyperparameter learning framework for sparse spectrum Gaussian process regression," in Proc. AAAI Conf. Artif. Intell., 2017, pp. 2007–2014.

[357] L. Si and J. Callan, "Using sampled data and regression to merge search engine results," in Proc. 25th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2002, pp. 19–26.

[358] Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, "A neural probabilistic language model," J. Mach. Learn. Res., vol. 3, pp. 1137–1155, 2003.

[359] Y. N. Dauphin, A. Fan, M. Auli, and D. Grangier, "Language modeling with gated convolutional networks," in Proc. 34th Annu. Int. Conf. Mach. Learn., 2017, pp. 933–941.

[360] J. Huang, J. Gao, J. Miao, X. Li, K. Wang, F. Behr, and C. L. Giles, "Exploring web scale language models for search query processing," in Proc. 19th Int. Conf. World Wide Web, 2010, pp. 451–460.

[361] D. Pinto, A. McCallum, X. Wei, and W. B. Croft, "Table extraction using conditional random fields," in Proc. 26th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2003, pp. 235–242.

[362] M. M. Rahman and H. Wang, "Hidden topic sentiment model," in Proc. 25th Int. Conf. World Wide Web, 2016, pp. 155–165.

[363] N. L. Zhang and L. K. Poon, "Latent tree analysis," in Proc. AAAI Conf. Artif. Intell., 2017, pp. 4891–4898.

[364] R. Johnson and T. Zhang, "Semi-supervised convolutional neural networks for text categorization via region embedding," in Proc. Advances in Neural Information Processing Systems, 2015, pp. 919–927.

[365] S. Lai, L. Xu, K. Liu, and J. Zhao, "Recurrent convolutional neural networks for text classification," in Proc. AAAI Conf. Artif. Intell., 2015, pp. 2267–2273.

[366] J. Ma, W. Gao, P. Mitra, S. Kwon, B. J. Jansen, K.-F. Wong, and M. Cha, "Detecting rumors from microblogs with recurrent neural networks," in Proc. 25th Int. Joint Conf. Artif. Intell., 2016, pp. 3818–3824.

[367] D. M. Blei and P. I. Frazier, "Distance dependent Chinese restaurant processes," J. Mach. Learn. Res., vol. 12, no. Aug, pp. 2461–2488, 2011.

[368] K. Ishiguro, I. Sato, and N. Ueda, "Averaged collapsed variational Bayes inference," J. Mach. Learn. Res., vol. 18, no. 1, pp. 1–29, 2017.

[369] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei, "Hierarchical Dirichlet processes," J. Amer. Statistical Assoc., vol. 101, no. 476, pp. 1566–1581, 2006.

[370] S. Liu, J. Xiao, J. Liu, X. Wang, J. Wu, and J. Zhu, "Visual diagnosis of tree boosting methods," IEEE Trans. Vis. Comput. Graph., vol. 24, no. 1, pp. 163–173, Jan. 2018.

[371] G. K. Tam, V. Kothari, and M. Chen, "An analysis of machine- and human-analytics in classification," IEEE Trans. Vis. Comput. Graph., vol. 23, no. 1, pp. 71–80, Jan. 2017.

[372] S.-H. Yang, A. Kolcz, A. Schlaikjer, and P. Gupta, "Large-scale high-precision topic modeling on Twitter," in Proc. 20th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2014, pp. 1907–1916.

[373] T. Höllt, N. Pezzotti, V. van Unen, F. Koning, B. P. F. Lelieveldt, and A. Vilanova, "CyteGuide: Visual guidance for hierarchical single-cell analysis," IEEE Trans. Vis. Comput. Graph., vol. 24, no. 1, pp. 739–748, Jan. 2018.

[374] J. R. Finkel, T. Grenager, and C. Manning, "Incorporating non-local information into information extraction systems by Gibbs sampling," in Proc. 43rd Annu. Meeting Assoc. Comput. Linguistics, 2005, pp. 363–370.

[375] M. Liu, L. Jiang, J. Liu, X. Wang, J. Zhu, and S. Liu, "Improving learning-from-crowds through expert validation," in Proc. 26th Int. Joint Conf. Artif. Intell., 2017, pp. 2329–2336.

[376] S. Liu, X. Wang, M. Liu, and J. Zhu, "Towards better analysis of machine learning models: A visual analytics perspective," Visual Inform., vol. 1, no. 1, pp. 48–56, 2017.

[377] M. Liu, J. Shi, Z. Li, C. Li, J. Zhu, and S. Liu, "Towards better analysis of deep convolutional neural networks," IEEE Trans. Vis. Comput. Graph., vol. 23, no. 1, pp. 91–100, Jan. 2017.

Shixia Liu received the BS and MS degrees from the Harbin Institute of Technology, and the PhD degree from Tsinghua University. She is an associate professor with Tsinghua University. Her research interests include visual text analytics, visual social analytics, interactive machine learning, and text mining. She worked as a research staff member with the IBM China Research Lab and a lead researcher with Microsoft Research Asia. She is an associate editor of the IEEE Transactions on Visualization and Computer Graphics and Information Visualization.

Xiting Wang received the BS and PhD degreesfrom Tsinghua University. She is now an associ-ate researcher with Microsoft Research Asia. Herresearch interests include text mining, visual textanalytics, and explainable recommendation.

Christopher Collins received the PhD degree from the University of Toronto, in 2010. He is currently the Canada Research Chair in Linguistic Information Visualization and an associate professor with the University of Ontario Institute of Technology. His research focus combines information visualization and human-computer interaction with natural language processing. He is a member of the IEEE, a past member of the executive of the IEEE VGTC, and has served in several roles on the IEEE VIS Conference Organizing Committee.

Wenwen Dou is an assistant professor with the University of North Carolina at Charlotte. Her research interests include visual analytics, text mining, and human-computer interaction. She has worked with various analytics domains to reduce information overload and provide interactive visual means for analyzing unstructured information. She has experience in turning cutting-edge research into technologies that have broad societal impacts.

Fangxin Ouyang received the BS degree from the School of Software, Tsinghua University. She is working toward the master's degree at Tsinghua University. Her research interest is visual text analytics.

Mennatallah El-Assady received the MSc degree in information engineering from the University of Konstanz, in 2015. She is a research associate with the group for Data Analysis and Visualization, University of Konstanz, Germany, and in the Visualization for Information Analysis lab with the University of Ontario Institute of Technology, Canada. Her research interests include visual text analytics, user-steerable topic modeling, and discourse/conversational text analysis.

Liu Jiang received the BS degree in electronics engineering from the University of Science and Technology of China. He is working toward the PhD degree at Tsinghua University. His research interests include learning from crowds and interactive machine learning.

Daniel A. Keim received the PhD degree in computer science from the University of Munich, Germany. He is a professor and head of the Information Visualization and Data Analysis Research Group with the Computer Science Department, University of Konstanz, Germany. He has been actively involved in data analysis and information visualization research for more than 25 years and has developed novel visual analysis techniques for large data sets, including textual data. Before joining the University of Konstanz, he was an associate professor with the University of Halle, Germany, and a technology consultant at AT&T Shannon Research Labs, NJ.

For more information on this or any other computing topic, please visit our Digital Library at www.computer.org/publications/dlib.
