InSocialNet: Interactive visual analytics for role–event ... · behavioral studies [5–8]. Among multimedia data, role–event videos are of special attention. Role–event videos,

Computational Visual Mediahttps://doi.org/10.1007/s41095-019-0157-9 Vol. 5, No. 4, December 2019, 375–390

Research Article

InSocialNet: Interactive visual analytics for role–event videos

Yaohua Pan1, Zhibin Niu1 (�), Jing Wu2, and Jiawan Zhang1

c© The Author(s) 2019.

Abstract Role–event videos are rich in informationbut challenging to be understood at the story level. Thesocial roles and behavior patterns of characters largelydepend on the interactions among characters and thebackground events. Understanding them requires analysisof the video contents for a long duration, which isbeyond the ability of current algorithms designed foranalyzing short-time dynamics. In this paper, wepropose InSocialNet, an interactive video analytics toolfor analyzing the contents of role–event videos. Itautomatically and dynamically constructs social networksfrom role–event videos making use of face and expressionrecognition, and provides a visual interface for interactiveanalysis of video contents. Together with social networkanalysis at the back end, InSocialNet supports usersto investigate characters, their relationships, social roles,factions, and events in the input video. We conduct casestudies to demonstrate the effectiveness of InSocialNet inassisting the harvest of rich information from role–eventvideos. We believe the current prototype implementationcan be extended to applications beyond movie analysis,e.g., social psychology experiments to help understandcrowd social behaviors.

Keywords visual analytics; behavioral psychology;role–event videos; social network; videoanalysis

1 Introduction

Videos provide temporal information in addition tothe visual appearance presented in images. The

1 College of Intelligence and Computing, and School of NewMedia and Communication, Tianjin University, Tianjin,300354, China. E-mail: Y. Pan, [email protected]; Z. Niu,[email protected] (�); J. Zhang, [email protected].

2 School of Computer Science & Informatics, CardiffUniversity, CF243AA, UK. E-mail: [email protected].

Manuscript received: 2019-12-08; accepted: 2019-12-24

temporal information enables analysis of dynamicchanges and has boosted a wide range of computervision research, e.g., facial expression recognition,action recognition, and tracking [1–3]. Most analysistasks in computer vision focused on analyzing thedynamics in short time duration. In social scienceand behavior psychology domains, it is interestingand important to analyze social interactions amongpeople, which also benefit from the temporalinformation in videos [4]. Mining and extractingsocial relationships require analysis of videos overa long time duration, which is a burden for usersaccessing the information. As a result, automaticvisual data mining from long-duration multimediadata has been an important tool for social andbehavioral studies [5–8]. Among multimedia data,role–event videos are of special attention.Role–event videos, as the name suggests, have

two main elements: roles and events. Differentfrom biography or documentary videos which mayemphasize one of these elements, in role–event videos,characters and events are interdependent and jointlypromote the development of the storyline. Thisinterdependence enables the analysis of characters’social roles and their relationships to assist theunderstanding or even prediction of events. Sincemany characters are involved in these videos, takingdifferent social roles in the events, the intricaterelationships constitute complex social networks,making it challenging for computers to understandthe contents of the video at different levels and fromdifferent aspects. Therefore, we believe involvingusers during the analysis would better cater to therequirements of different tasks. In this paper, inaddition to the construction of social networks fromvideos, we propose an interactive video analyticstool named InSocialNet, to assist analysis of theconstructed social networks.

375

376 Y. Pan, Z. Niu, J. Wu, et al.

InSocialNet is developed for long duration role–event videos such as movies, and with an emphasison analyzing the relationships among characters andmore importantly the evolution of these relationships.The main difference between our work and the

existing ones [5] is that we construct a social networkthat is updated dynamically with the progress of thevideo. When new characters or relationships appearin the video, nodes and edges are dynamically addedinto the social network. This enables us to observethe formation process of the social network and theevolution of relationships between characters. We usenode2vec [9] to represent each node in the constructedsocial network as a high dimensional vector whichencodes both the attributes and the social structureof the character at the node. Then t-SNE is used toreduce the high dimensional representation into twodimensions to visualize the clusters formed by nodesand enable interactive analysis of factions.Case studies on two classic role–event videos

have shown that the analysis results from usingthe developed tool are generally consistent withthe storylines in the videos, demonstrating theeffectiveness of InSocialNet. In summary, the maincontributions of our work are:• We propose a framework integrating visual

analytics, social network analysis, and videoprocessing for the purpose of understanding thecontents of role–event videos. We believe itcan provide a pathway to deep understandingof behavior and activity patterns in role–eventvideos, and has potentials to support socialpsychology experiments.

• We develop a prototype implementation of theframework, named InSocialNet that supportsusers to investigate the characters, emotions,relationships, factions, and events, and de-monstrates the effectiveness of the prototype toolusing case studies.

• We propose a visualized faction detectionmethod by combining graph embedding and t-SNE, and demonstrate that interactive visualanalysis provides complementary information toautomatic social network analysis.

The remainder of the paper is organized as follows:Section 2 reviews the related work. Section 3 presentsthe requirement analysis and Section 4 gives anoverview of the system. Sections 5 and 6 specify thedetails of the system, including both the back end and

the front end. Section 7 presents experiments andcase studies to evaluate the system. Finally, Section 8concludes the paper and provides possible avenuesfor future work.

2 Related work

Short-term activity analysis. Role–event videosusually span a long time duration which is necessaryfor analysis of social interactions. However, themajority of works on video analysis are on activitiesof short periods, such as the acquisition of humanmotion trajectory [10] and population densityestimation [11]. Although these short period videoanalysis techniques have advanced progress, theycannot effectively mine useful information hidden inlong duration videos. However, they provide supportfor our research on analysis of long time durationvideos.

Social network based video analysis. Thesocial network is a powerful tool to analyze highlycoupled relationships in videos. Networks representedby interconnected nodes and links between them aregood representations of complex typologies includingsocial relations, genetic interactions, transportation,financial systems, ecology, and Internet. The dynamicconstruction of social networks from video provides aclear and intuitive way to observe the relationshipsbetween the characters, and understand the evolutionof the storyline.Recently, some works began to make use of

social networks for analysis of relationships amongcharacters in long duration videos. StoryRoleNetwas proposed to construct an accurate andintegral network representing the relationships amongcharacters, which are represented by both theappearance and the names extracted from the subtitletexts in video [5]. The work reported in Ref. [12]utilized the technique of face recognition and builtsocial networks of many public characters appearedin news in the last decade to assist understanding theJapanese media.Other related works include: Ref. [4], which

used social networks to evaluate female roles inmovies over the past century, and Ref. [13] whichestablished social networks of vehicle trajectoriesin urban environments to prevent the occurrenceof terrorist activities. Although the above worksused social networks for analysis of the evolution of

InSocialNet: Interactive visual analytics for role–event videos 377

character relationships or networks in videos, thestudy has not risen to the social level.

Social role discovery. Characters in videosusually belong to certain communities. Recentworks on community detection were able to dividekey characters in video into factions based onsocial networks, with the aim to help social rolediscovery [14]. These community detection methodsare based on topological analysis [15, 16] and flowanalysis [17, 18]. RoleNet [19] constructed roles’social networks and determined the leading rolesand their corresponding communities automatically.The work in Ref. [20] proposed a framework thatcan automatically understand the role assignmentwithout training role labels and identify social rolesby interactions between people.The work in Ref. [21] recognized the social

relationships covering all aspects of social life fromthe perspective of social psychology. These works onsocial roles and social relations can help us betterunderstand the content and events in videos. Inthis study, we combine node2vec and t-SNE [22]instead of community detection algorithms. Thusthe result shows the relative position of each node intwo-dimensional space. It no longer gives the absolutepartition, but indirectly shows the possible clustersof nodes by coordinates. We will show comparisonsof the partition results from our method and otherpopular community detection algorithms by analyzingmovie videos.

Human behavior analysis. Analyzing videos todetect and determine temporal and spatial events hasalways been an important research topic. Most ofthe research is about feature extraction and structureidentification to support high-level applications suchas video retrieval, summarization, and navigation.The work in Ref. [23] analyzed parent–infantinteraction behaviors to extract relevant social signals,while in Ref. [24], the authors proposed a predictivefield model framework to predict the roles’ gazebehaviors in a social scene. In Ref. [25], a supervisedframework to recognize human and social behaviorswas presented. In our work, video events manifest asthe emergence of key characters and the changes ofpeople emotions.

Role sentiment analysis. It is challengingto recognize the emotions of characters in videos.However analyzing the emotional changes of charac-

ters helps understand the development of events.The work in Ref. [26] analyzed and summarized theemotional differences between eastern and westernindividuals. Based on quantified satisfaction scoresfrom customers, the work in Ref. [27] made an overallassessment of emotions in video by combining boththe audio and visual components. However, few workshave linked the emotional changes of characters to theevents in the video. Our work combines the emotionaldynamics with the first appearance time of characters,which will support the prediction of major events invideo by observing the fluctuations of emotion.

3 Task analysis

We conducted interviews with potential users ofInSocialNet and summarized their requirements,which supported our design rationales of the system.In detail, we conduct interview with two potential

users of the system. Ua is a social behavioralpsychologist in the university and Ub is a professorof mechanical engineering who is currently doingresearch on worker’s collaboration analysis underIndustry 4.0 in the industrial management sector.Ua has a strong demand for a more powerful tool forcontent analysis in role–event videos to assist researchin the psychological domain. Role–event videosplay an important role in conducting psychologicalexperiments. Many role–event videos have beenand will still be recorded for this purpose. Ua

mentioned an example, the famous Stanford PrisonExperiment�, where twenty-four male students wereselected to take on randomly assigned roles ofprisoners and guards in a mock prison to investigatethe psychological effects of perceived power. Heis interested in their different personalities, theinfluence from key roles, and the flow of interactionsbetween the characters. He advocates a tool thatcan automatically extract complicated interactions,and facilitate all the above analysis tasks. Ub isworking on intelligent manufacturing and would likea tool that can help gain in-depth understanding ofhuman factors that affect the efficiency of factoryproduction. Currently, they need to manually recordthe emotions of workers, the interactions among them,and collect information by questionnaires. A tool thatcan automate the process and assist understanding of

� https://en.wikipedia.org/wiki/Stanford prison experiment


the video contents has been considered very useful bythem. Based on the interviews above, we summarizethe analytic tasks as follows.• Task 1. Inspect key roles, their personalities, and

their social network. Role–event videos usuallyhave long duration activities involving a numberof characters. Recognizing key characters helpsnarrow down the analysis space. Identifying thesocial roles and the factions of the key character,understanding their personalities and behaviorpatterns, and quantifying their social influencehelp gain insights of the video in absence of priorknowledge.

• Task 2. Inspect how the characters are groupedinto factions. A faction refers to a group of people,especially within a political organization, whoexpresses a shared belief or opinion different frompeople who are not part of the group. Sometimes,the faction is implicit, and requires inferencefrom their social behaviors. Faction inspectionrequires to mine all possible clues to discover thegroup information of characters by how they areacting for the common interests of this faction.Sometimes, it is particularly challenging becausecharacters may hide their real intentions for asecret action such as activities of spy or agent.

• Task 3. Inspect how the roles prompt the eventand fight for their common interests. Some ofthe key characters from different factions promptthe event. Comprehensive consideration of thedynamic changes of characters and events givesinsights to understand what has happened andto learn a better strategy from the event.

The design of the system is based on the aboveanalysis. We derive the following design rationales.• Rationale 1. Social network based visual

representation and interaction-based behavioranalysis of the characters. The social networkamong people is always dynamically accumulated.All activities among people are mined based onthe social network. Following Ben Shneiderman’smantra, the social network with informationencoded should be visualized to give an overviewfirst with more fine details enabled by filters(Task 1).

• Rationale 2. Visualize the metaphors intuitively.Different from the majority of previous worksin the computer vision communities, our work

aims to describe the characters’ personalities,factions, social activities, which are all high-levelabstraction of the video contents (Task 1, Task 2,Task 3).

• Rationale 3. Support coordinated visual analysis.The system should support coordinated visualanalysis for inspection of characters, their socialroles and factions, and how their interactionsprompt events (Task 1, Task 2, Task 3).

4 System overview

Based on the task analysis, we design and implementthe visual analytics prototype system, InSocialNet(Fig. 1). Specifically, it consists of two parts.

Back end social network construction. Itsupports a sequence of operations including facedetection and recognition, attribute extraction, socialnetwork construction, network analysis, machinelearning based closeness analysis, etc. The systemtakes videos as input. For each frame, if faces aredetected, they are recognized as either existing ornew characters compared with the stored ones. Thestored characters and the extracted attributes arethen updated based on the detection or recognitionresults, and the social network will be dynamicallyupdated by the co-occurrence of characters.

Front end visual interaction. The results ofdetected characters, constructed social network, andother information are stored as log data and outputinto the visualization module. The visualization issupported by several mining techniques includingnode2vec, t-SNE, etc. It provides information atthree levels from the individual character to localand global social networks, which is presented in

Fig. 1 Our InSocialNet system architecture. The back end detectsindividual characters, builds and updates a social network along withthe progress of the video. The front end visual interactive interfaceenables users to explore the hidden information of the characters,among them and during events.


three linked views. The visualization enables usersto effectively discover the main clues from complexrelationships, and analyze behavior patterns in bothevent and temporal dimensions. Rich interactions areprovided to support analysis along with the timelineat different levels.

5 Back end: Construction of socialnetworks

In our work, social networks are constructedas undirected weighted graphs with each noderepresenting an individual character, and edgesrepresenting relationships between characters. Nodesand edges are added dynamically with the progressof the video.

Identifying characters. Identifying charactersfrom video frames is the first step to constructingsocial networks. Characters are identified by theirfaces. The identification step consists of facedetection, face recognition, and attribute extraction.Face detection is achieved using a deep learningbased method which can handle occlusions well[28]. Detection is performed at an interval of onesecond to detect faces in video frames and returncoordinates of detected areas and landmarks ofdetected faces. Images of detected faces are thencompared with characters already detected and stored(we use MongoDB in our prototype platform), makinguse of the method [29] where different face featuresare extracted to calculate the similarity. For eachdetected face, the results of face recognition areconfidence scores for the detected face to belongto every stored character. If the highest score islower than a threshold (which is not static, this is,the threshold returned by each comparison is notnecessarily the same, but the error recognition rateof each threshold is one in ten thousand), the face isrecognized as the first appearance of a new characterand is assigned a new token. Attributes includingstable ones (gender, age, and race) and dynamic ones(expression and eye status) are extracted using a CNN(Convolutional Neural Network). These attributestogether with the detected face image and time ofappearance are stored. If the highest score is abovethe threshold, the face is recognized as the storedcharacter with the highest score and is assignedthe same token. The dynamic attributes are thenextracted from the detected face and used to update

those that are stored. With the progress of the videos,more and more characters will be added to the socialnetwork.

Establishing relationships between indivi-duals. Edges in the social network represent therelationships between individuals. In this prototypeimplementation, the relationship is established as theco-occurrence of characters in video frames. Theweight of an edge is proportional to the frequencythat the two connected characters appear in thesame frames. The current approach uses a simpleassumption which works reasonably well in most cases,and better strategies will be investigated in futurework. In the implementation, when multiple faces aredetected in a frame, in addition to updating attributesor recording new individuals, edges between pairsof these individuals are added or updated withincreased weights, together with the time of co-occurrence. Characters, relationships between them,and the constructed co-occurrence social networkare updated dynamically with the progress of videoand are fed into the visualization module to allowinteractive analysis and exploration of the evolutionof relationships and events.In addition, users can manually select the time of

interest and inspect dynamic changes of individualsand relationships during the selected time span.

6 Front end: Visual interactive analysis

The design of the visualization is driven by therequirements summarized in Section 3. In role–eventvideos, the character of interest is often the startingpoint of attention and this inspires the design of thevisualization from individual characters to local andglobal views, providing links among the several viewsdeveloped. To support investigating the evolutionof events further encourages the development ofinteractive analysis along the temporal dimension.An overview of the visual interface is shown in Fig. 2.

6.1 Visualization of social networks

At the core of the interface is the visualization of theco-occurrence social network (Fig. 2(A.2)).We choose to use the algorithm [30] with a

force directed graph to draw the co-occurrencesocial network. The drawing algorithm assignsattractive/repulsive forces among edges to attractor separate nodes in order to achieve a layout with


Fig. 2 InSocialNet system. The design follows Shneiderman’s mantra. View A gives an overview of the video and the co-occurrence networkconstructed. The tool enables users to backtrack how the network is constructed by controlling the synced time slider. View B enables users torank the characters by various measures. View C is the main inspection explorer. It employs three coordinated views to support the inspectionand analysis of key roles and relationships, events and factions, and changes of emotions.

roughly equal length edges and as few as possiblecrossings.While the video is playing, the drawing algorithm

is executed continuously at an interval of one second,which dynamically updates the nodes and edgeto enable observing the development of the co-occurrence social network. A progress bar is alsoprovided for users to select a time point in the videobased on which the force directed graph is redrawn,such that users can inspect the network more closelyat that time point.In addition, the system replaces all nodes of the

force directed graph with the corresponding faceimages and adds drag and zoom functions to nodes.These assist more clear and intuitive observationof characters’ attributes and understanding of theirsocial roles from the relationships between them.

6.2 Interface design

Three views (Fig. 2) with different scopes of attentionconstitute the main part of the visualization.

View A shows the video and visualizes the co-occurrence social network according to the time inthe video. Users are able to track the evolutionof the network by controlling the time slider. The

co-occurrence social network is displayed as a forcedirected graph as mentioned above. If a character-of-interest is selected, the display focuses on theco-occurrence network that the selected characterresides in, showing all the characters and weightedconnections that are associated with the character-of-interest.

View B lists all the characters that have appearedup to the time point. The characters are rankedand ordered in the list, and a drop-down menu isprovided to enable users to rank roles based on oneof the following three scores: (1) Node degree whichmeasures “activeness”, i.e., how many interactionseach character has with others. (2) Authorityscore which measures the social “standing” of eachcharacter in the network. As a kind of networkcentrality, the authority score is popularly used tomeasure the “importance” of web pages but hasalso been related to measuring “standing” in socialnetworks [31], which is the case in our work. For anundirected graph, the authority scores of the nodesare defined as the principal eigenvector of ATA whereA is the adjacency matrix. (3) The estimated age ofthe character.In the character list, the attention is on individual


characters. Ranking the characters based on thevarious scores facilitates the identification of moreactive and authoritative persons, which helpsunderstand the main plots of the story.

View C is the inspection explorer, the main partfor users to explore the relations, emotions, theatmosphere, etc. of the story in the video. It includesthe following modules.

View C.1 is for character–event inspection. Weuse a 2D scatter plot to contrast each role’s authorityscore vs. the role’s attributes such as time of firstappearance, duration of appearance, age, etc. Thisplot enables users to find out which attributes areclosely related to the authority of roles, which willin turn help with the identification of authoritativecharacters based on their attributes.

View C.2 is for faction inspection. In a faction,characters are connected to each other and they tendto have similar network structures. As a result, forfaction analysis, the information required for eachnode is not only the attributes of the character,but also the structure of its social relations. Tomodel both pieces of information, we make use ofnode2vec [32], a graph embedding algorithm, to geta high-dimensional vector representation for eachnode. Then, t-SNE is used to project the high-dimensional representation into 2D coordinates forusers to observe potential factions in the inspectedco-occurrence social network.

View C.3 is for emotion analysis. It showsthe dynamic changes of expressions of any selectedcharacters with an emotion string breed chart wherethe emotions are presented by a breed and they arestrung together via a time string. If no characters areselected, it shows the dynamic changes of expressionsof all characters up to the current video time. Thebreed refers to detected emotions encoded withsymbolic colors—anger (red), disgust (purple), fear(black), happiness (green), neutral (grey), sadness(blue), and surprise (orange).The emotion string breed not only shows the overall

emotional changes but also provides abundance ofinformation that enables analysis of many aspectsabout the characters and the story. For example,from the emotional distributions, we can infer usefulinformation such as the inner feelings of a specificcharacter at different time [26], the dispositions ofthe characters, and the atmosphere of the story.

6.3 Interactions supported

Interactive operations are provided to users tosupport exploration of implicit patterns. Specifically,the provided interactions include:

(1) Mouse hovering. When the mouse hoversover a node in the co-occurrence social network inView A.2, the view will display the nodes that areconnected with the hovered node, together withweights of the connections, while hiding the nodesthat are not connected.The benefit of this feature is to enable a better

observation of sub-networks in a complex socialnetwork.

(2) Single clicking. When clicking a node (acharacter) in any view, an “attribute table” willdisplay the attributes of the selected character forusers to inspect the details. When a character isselected in one of the views, it will also be highlightedin other views to allow observation of different aspectsof the same character to achieve a comprehensiveanalysis.

(3) Multiple selection. The brush function isprovided in faction analysis (View C.2) for usersto select multiple characters simultaneously. Thismultiple selection allows investigating whether theclustering results are consistent with the socialnetwork structures. It also enables observation ofemotional changes within a group, comparison ofattributes among a group, etc.

7 Evaluation

In this section, we first compare the results on factiondetection using our method (Section 6.2, View C.2)and other community detection algorithms. Wethen demonstrate the usefulness of the developedtool through a representative case study. Thefollowing two role–event videos from different culturebackgrounds are used for the evaluation.

“Romance of the Three Kingdoms” is aclassic Chinese TV series. The story is set duringthe years towards the end of the Han dynasty andthe Three Kingdoms period (169–280 AD) in thehistory of China�. In the evaluation, Episode 4 isused. The episode is about the union of the eighteenfeudal princes, including the three main force leadersCao Cao, Sun Jian, and Yuan Shao, discussing how

� https://en.wikipedia.org/wiki/Romance of the Three Kingdoms


to attack Dong Zhuo, the ghostly prime minister.And Dong Zhuo decided to lead the army himselfto revenge on the union when he was acknowledgedthat his general Hua Xiong was defeated by Guan Yu.Through the back end processing, we got a total of43 characters and 43 relationships from this episode.The co-occurrence social network was dynamicallyconstructed.

“Harry Potter” is a series of movies based on theeponymous novels by British author J. K. Rowling.The series tell the magical story of the protagonistHarry Potter, who studies, lives, and fights at theHogwarts School of Witchcraft and Wizardry. Usedin evaluation is the episode of “Harry Potter and theGoblet of Fire”, which is around the conspiracy ofdeath eaters. The first half of the episode is about theTriwizard Tournament which was bustling with noiseand participants from all three schools. In the secondhalf, Harry Potter was transferred to the base ofdeath eaters to revive voldemort, the master of deatheaters. The back end processing harvested a total of67 characters and 214 relationships, from which theco-occurrence social network was constructed.

7.1 Experiments on faction detection

There are existing community detection algorithms[33, 34] that aim to identify highly connected groupsfrom social networks. We carry out experiments onthe two videos to compare the detection results fromour graph embedding plus t-SNE method with thosefrom the principal community detection algorithms,including label propagation [18], infomap [35], edgebetweenness [15], multilevel [36], leading eigenvector[16], and walktrap [17]. Detection results on keycharacters with the highest authority scores are shownin Fig. 3 for the “Three Kingdoms” video and Fig. 4for the “Harry Potter” video. Each color in the figuresrepresents a detected community/faction.An immediate observation is that different

algorithms detect communities at different fine levels.

Fig. 3 Faction detection by community detection algorithms andour method for the analysis of “Romance of the Three Kingdoms”.

Fig. 4 Faction detection by community detection algorithms andour method for the analysis of “Harry Potter”.

Label propagation algorithm is at the coarsest,dividing the key characters from both videos into twogroups only, while the algorithms of edge betweenness,multilevel, and walktrap divide the characters intomore and finer groups. Others including our methodare in between.Observing the results on “Romance of the Three

Kingdoms” more closely, we found that the mostfamous faction of Liu Bei, Guan Yu, ZhangFei was mistakenly split using the algorithms ofleading eigenvector, edge betweenness, multilevel,and walktrap. In addition, the algorithms of leadingeigenvetor, multilevel algorithm, and walktrap groupDong Zhuo, the enemy of the allied force, into thefaction of Cao Cao who is a leader of the alliedforce, which is also inconsistent with the storyline.Compared to the community detection algorithms,our method demonstrates its superiority for this video.Our method is the only one that correctly separatesthe faction of Dong Zhuo (including Dong Zhuo andLu Bu) from the faction of the allied force, andspecifically groups Liu Bei, Guan Yu, and ZhangFei into an exclusive faction.Turning our attention to the results on the

“Harry Potter” video, we first noticed that Peterand Voldemort were correctly grouped together byall the algorithms including our method. Theconstructed social network (Fig. 5) shows that thetwo characters mainly interact with each other andhave few interactions with others in this episode. Thisdemonstrates the effectiveness of all these algorithmson detecting exclusive factions.It is also observed that Harry, Ron, and Hermione,

collectively called the Harry Trio, are grouped intothe same faction but not exclusively. They belong toa faction that include other pupils and teachers. Itis especially so for algorithms of label propagation,infomap, leading eigenvector, and our method, which


Fig. 5 The constructed social network of “Harry Potter”. It showsthat voldemort and Peter are only connected with each other, but notto other characters.

group the trio and almost all the others into onefaction.The other three algorithms divide the teachers

and pupils into finer factions, however, with somerandomness for this video. The Harry Trio is part ofthe bigger faction formed by teachers and pupils.But the results here demonstrate a limitation ofcommunity detection algorithms, i.e., they cannotdetect hierarchical factions. An advantage of ourmethod over community detection is visualization.Making use of graph embedding and t-SNE, ourmethod visualizes the distribution of factions andenables the identification of smaller factions insidebigger ones by interactive visual inspection.

7.2 Case study

To demonstrate the effectiveness of the developedtool for analysis of role–event videos, we used theabove two videos as examples, and used the toolto interactively discover the key characters, theirpersonalities, the factions, and the activity patternsfrom the input video. The discoveries were thencompared with the video contents to verify theirconsistency. Due to the limited space, here wepresent the case study on the “Romance of the ThreeKingdoms”.7.2.1 Inspection of key characters and relationshipsWe first want to identify and analyze the mostimportant characters in the video. In the characterlist (Fig. 2(B)), we sorted the characters according totheir “authority” scores and found the character ofCao Cao ranked the first. The co-occurrence socialnetwork as a whole (Fig. 2(A.2)) further confirms thekey role of Cao Cao with his central position in the

network. We thus selected this character to observehis interactions with other characters and analyzetheir relationships.Figure 6(a) shows the co-occurrence network

associated with Cao Cao. We immediately noticed theexceptionally high weight of the connection betweenCao Cao and Cao Ren, indicating very frequentinteractions between the two. We also noticedthat Cao Ren was ranked with the second highest“authority” score. To further inspect the relationshipbetween the two important characters, we first turnedto the Faction View (Fig. 2(C.2)) and found that thetwo characters belonged to the same faction. Wethen switched between selecting Cao Cao and CaoRen and observed their co-occurrence social networks.We found that Cao Ren’s co-occurrence network islargely a subset of Cao Cao’s (Fig. 6(b)). Thus sofar, we can infer that Cao Cao and Cao Ren workclosely in the same faction, and Cao Cao is at a moresenior position than Cao Ren. For those who arefamiliar with the story of “The Three Kingdoms”,this inference above is easily verified, as Cao Cao isindeed one of the most important roles in the story,and Cao Ren is his cousin and right-hand man.We continued exploring the co-occurrence social

network of Cao Cao and checking in the faction view.We noticed some characters in the same faction withCao Cao had frequent co-occurrence with him butalmost no connections with others. An example isa character at the bottom of Fig. 6(a). Figure 6(c)shows the co-occurrence network of this character.We speculated from this exclusive co-occurrence thatthis character was a guard of Cao Cao. We checkedthe first appearance time of this character in the eventview (Fig. 7(a)), and watched the video from then.From the video, the soldier was following Cao Cao andCao Ren on the way to visit Liu Bei, which explainedthe frequent co-occurrence of the soldier with CaoCao and Cao Ren. The character is indeed a guardsoldier of Cao Cao. This again demonstrated theeffectiveness of InSocialNet to analyze relationshipsbetween characters.7.2.2 Inspection of eventsIn the Character–Event Inspection view (Fig. 2(C.1)),we compared the authority scores against the time offirst appearance and the total duration of occurrence.As shown in Fig. 7, we observed that the earlierthe character appears, the more likely this character


Fig. 6 (a) Cao Cao’s co-occurrence social network. Cao Cao is the most authoritative person, indicating that the number of roles associatedwith him is the largest. (b) Cao Ren’s co-occurrence social network is largely a subset of Cao Cao’s, which shows that his position in the factionis lower than Cao Cao’s. (c) Some characters in Cao Cao’s faction are only related to the important roles such as Cao Cao and Cao Ren.

Fig. 7 (a) The Character–Event Inspection view (authority score vs. first appearance time) shows there are two major events: at the beginningand in the middle of the video. Further exploration shows that the first wave is the allied forces of princes planning to attack Dong Zhuo, theghostly prime minister, while the second wave is about the resistance from Dong Zhuo. (b) The Character–Event Inspection view (authorityscore vs. length of appearance time) shows Cao Cao with his brotherhood are the main roles, and unsurprisingly, their generals and soldiers alsohave high frequency appearance.

has higher authority, which means more likely tobe an important character in the entire video. Wealso observed that most of them appeared a longerduration than others.In addition to the peak at the beginning, we

observed a small peak in the middle of the videoas well (Fig. 7(a)). Peaks are caused by intensiveappearance of key characters, and often indicateimportant events. Checking the characters in thepeaks, we further observed that the two events werehappening to different factions of characters. Theevent at the beginning was among a union of factionsincluding the faction that Cao Cao resided in. While

the later event was more focused in one faction asshown in Fig. 8.Watching the video, in the first five minutes, Cao

Cao and his allies, including Yuan Shao, Sun Jian,Liu Bei, and many other generals and guards, werediscussing the strategy of attacking Dong Zhuo. AndLiu Bei, Guan Yu, and Zhang Fei were also standingaside. Then in the middle of the video, Dong Zhuodecided to defend with his own troops, and some ofhis soldiers appeared for the first time, contributing tothe second small peak. These two events in the videocause the first appearance of many new characters,and correspond to the peaks in the event view.


Fig. 8 The back end machine learning algorithms automaticallycluster the characters into separated groups. The results of factionanalysis show the emerging and potential factions in the video, wherethe closer the coordinates are, the more likely the characters belongto the same faction. The manually annotated results validate thecorrectness.

7.2.3 Faction analysisContinuing the exploration in the Faction InspectionView (Fig. 8) and the event view (Fig. 7(a)), wefurther found that most of the characters in thefaction of Cao Cao had a high authority score andappeared in the first five minutes of the video,indicating that the faction of Cao Cao has a highstatus in the video. This is consistent with the storyof the video that Cao Cao and his allies are the maincharacters in this episode. But we also found inFig. 7(a) that three characters, Liu Bei, Guan Yu,and Zhang Fei, although they appeared early in thevideo and in the same time period as other charactersin Cao Cao’s faction, they did not belong to thefaction of Cao Cao in Fig. 8, and their authorityscores were very low. Checking their co-occurrencenetworks (Fig. 9), we further observed that thesethree characters are closely connected to each other,but not with others. This indicated that the threecharacters formed a small independent faction whichparticipated in the event at the beginning, but hadlittle interactions with others.Watching the video, it is confirmed that in this

episode, the small faction of Liu Bei, Guan Yu, ZhangFei had just appeared and was insignificant at thattime. Because of their low political positions, no

Fig. 9 Liu Bei, Guan Yu, and Zhang Fei are closely connected toeach other, but not with others. So although their first appearancesare in the same time period with other characters in the faction ofCao Cao, their authority scores are low.

authoritative characters had much interactions withthem, resulting in their low authority scores.Three more factions are also formed in Fig. 8.

Clicking on each character in these factions, thedetailed information of the character is displayed inthe info panel, helping with the identification of theirroles. Checking the video, we found the faction named“Defenders of allied army” includes the guard soldiersof Cao Cao (noted that the factions and the namesare labeled manually for a better illustration of theresults). Because they are far away from Cao Cao andother princes, but closely related to each other, theyare separately divided into a single faction. The othertwo factions, named “Dong Zhou and his generals”and “Dong Zhou’s adviser and general” are bothformed by characters closely related to Dong Zhou,which is another key role in the story. Ideally, the twofactions should merge into one. But as Dong Zhou’sadviser and the general did not appear together withDong Zhuo and others in this episode, they wereseparately divided. With the analysis extended tomore episodes, the two factions should merge together.In general, the t-SNE algorithm divides the charactersinto meaningful factions, providing complementaryinformation to the co-occurrence networks.7.2.4 Analysis of atmosphere and emotion changesWith the Emotional Evolution view (C.3), we canobserve and explore how the story atmospherechanges. From Fig. 10, we can see that the emotions


Fig. 10 Emotion Inspection view gives an overview of the atmosphereand also supports the visualization of “the shape of the personalities”of different characters. It can be seen that there are more negativeemotions in the first half of the video, and more positive emotions inthe second half.

of “neutral” and “sadness” dominate the episode. Thisis consistent with the main theme of war in this video.We also observed that with the progress of the episode,the occurrence of the emotion “fear” decreased whilethe occurrence of “happiness” increased. It is againconsistent with the story progress in this episode. Thefirst half of the video described Cao Cao and otherprinces who were attacked by Hua Xiong and twogenerals were defeated, resulting in fears and sadnessamong Cao Cao’s allies. The second half is mainlyabout Cao Cao and Liu Bei discussing the situation,and the negotiation between Yuan Shao and DongZhuo. The overall atmosphere is relatively relaxed, sopositive emotions appear more frequently among thecharacters.In addition to analyzing the change of atmospheres,

the interface enables users to explore emotion changesof individuals to better understand their personalities.Here, we analyze two main characters in the story:Guan Yu and Zhang Fei. The charts showing theiremotional changes are displayed in Figs. 11(a) and11(b) respectively. We can see that Zhang Fei’semotion changes more frequently and dramatically,covering all the seven emotions, indicating a moreemotional personality that Zhang Fei possesses. Incomparison, the change of Guan Yu’s emotion isflatter, and his emotions are dominated by neutralwith no happiness, indicating a more calm and seriouspersonality. Watching the video, we can see theiremotions change in accordance with the progress ofthe story. In the first half of the episode, their morenegative emotions are triggered by both the war andthe belittling remarks from the princes, whereas in

the second half, Cao Cao’s visit and praise result intheir more positive emotions.Analyzing the dynamics of atmosphere and

emotions enables users to better understand theprogress of the story and quickly infer thepersonalities of characters. This function, whencombined with more powerful mining algorithms inthe future, may help users make predictions, whichis especially important for the potential applicationof the technique to videos for monitoring purposes.

7.3 Informal user feedback

We conducted informal interviews with the twopotential users (also refer to Section 3) of the systemand asked them to give brief comments from theaspects of the usefulness, visual design, interactionsof the proposed system.

Usefulness. Both experts confirmed the usabilityof our system. Ua praised the rationale of thesocial network centered design, as he believes socialinteraction is an important and unique element inhuman society. He has interest to try the tool for apsychological experiment. Ub commented that withthe tool, they would be able to obtain objectiveinformation and to explore more than what canbe provided in the traditional way. Both expertshighlighted the usefulness of faction analysis. Ub

commented “this is way beyond what questionnairescan do.” However, Ub also expressed his concern onapplying the approach to real industry surveillancevideo analysis, as it seems the face recognitionapproach is not robust when the camera is high over.The face recognition accuracy can be affected by manyissues such as occlusion and lighting changes. Hesuggests that the future social network reconstructioncan be based on gait recognition as the action maybemore reliable than faces in practice.

Visual design. Both experts were impressedby the interface. They appraised its concisenessand intuitiveness. They considered the linkedviews is an effectively way to organize informationfrom overview to individual levels, revealing thehierarchical relationships. Ua especially likes thecolored visualization of characters’ emotions. Hethinks it provides an intuitive way to help understandthe characteristic personalities.

Interactions. Experts consider the interactiveoperations provided are intuitive and smooth, andmeet their requirements for information exploration.


Fig. 11 Comparing the emotion changes of Guan Yu and Zhang Fei (zoom in for better view). Guan Yu is mostly calm. Even when he is sad,he shows few extreme emotions. On the other hand, his brotherhood, Zhang Fei, with a sharply contrasting emotion shape means that he is aperson with distinct love and hate.

Ub especially commented on being able to watchthe dynamic construction of the network, whichenables him to capture critical moments for analysis.However, Ua expressed his concern on the accuracyof faction analysis. The current system supportsgenerating a relationship mapping using t-SNE butdoes not support further interactive exploration. Andthe discovery of hierarchical relationship, which iscommon in many social activities, has not beensupported in the current system as well. Ua suggeststo enhance the interaction in faction analysis to enablethe discovery of more complex social relationships.

8 Discussion and future work

In this paper, we report some preliminary workby integrating visual analytics, social networkanalysis, and video processing for role–event videocontent understanding. We provide a prototypeimplementation and report case studies to demonstratethe usefulness of the tool. Although it shows promisingresults, we note there are still some limitations andthey will be our future work.Firstly, the co-occurrence based social network

construction can sometimes be unreliable, as the co-occurrence does not guarantee a special relationship.Hidden activities for political affairs would be achallenge for the current prototype system. Linkprediction based approaches such as Ref. [37] mayprovide a possible solution to fulfill the crucial missinglinks. More cues obtained by video processing suchas emotions, verbal and body languages can help tore-weight the links.

Secondly, since the motivation of our work isto understand social roles and events from videosrecording a long-time period, cross-video informationwould help with the construction of a more completecross-time social network. Learning based interactiveexploration will be developed to support sophisticatedhierarchical relationship discovery. The completesocial network can help to build a more comprehensiveunderstanding of the roles, behavior patterns, events,and trends, and provide more accurate big pictures.Finally, we believe real-time video analysis will

have broad application domains. In the future, itis desired to have camera(s) attached to the system,and detect humans and perform subsequent analysisfrom the captured videos in real-time. It wouldbe an important feature for picking up unusualbehavior/activity patterns in time.

Acknowledgements

The research is supported by National Natural ScienceFoundation of China (No. 61802278).

References

[1] Khorrami, P.; Paine, T. L.; Brady, K.; Dagli, C.; Huang,T. S. How deep neural networks can improve emotionrecognition on video data. In: Proceedings of the IEEEInternational Conference on Image Processing, 619–623,2016.

[2] Kim, M.; Kumar, S.; Pavlovic, V.; Rowley, H. Facetracking and recognition with visual constraints in real-world videos. In: Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition, 1–8, 2008.


[3] Forczmanski, P.; Nowosielski, A. Multi-view dataaggregation for behaviour analysis in video surveillancesystems. In: Computer Vision and Graphics. LectureNotes in Computer Science, Vol. 9972. Chmielewski, L.;Datta, A.; Kozera, R.; Wojciechowski, K. Eds. SpringerCham, 462–473, 2016.

[4] Kagan, D.; Chesney, T.; Fire, M. Using data scienceto understand the film industry’s gender gap. arXivpreprint arXiv:1903.06469, 2019.

[5] Lv, J.; Wu, B.; Zhou, L. L.; Wang, H. StoryRoleNet:Social network construction of role relationship in video.IEEE Access Vol. 6, 25958–25969, 2018.

[6] Yu, C.; Zhong, Y. W.; Smith, T.; Park, I.; Huang, W.X. Visual data mining of multimedia data for socialand behavioral studies. Information Visualization Vol.8, No. 1, 56–70, 2009.

[7] Tomasi, M.; Pundlik, S.; Bowers, A. R.; Peli, E.; Luo,G. Mobile gaze tracking system for outdoor walkingbehavioral studies. Journal of Vision Vol. 16, No. 3, 27,2016.

[8] Bernstein, G. A.; Hadjiyanni, T.; Cullen, K. R.;Robinson, J. W.; Harris, E. C.; Young, A. D.;Fasching, J.; Walczak, N.; Lee, S.; Morellas, V.;Papanikolopoulos, N. Use of computer vision toolsto identify behavioral markers of pediatric Obsessive–Compulsive disorder: A pilot study. Journal of Childand Adolescent Psychopharmacology Vol. 27, No. 2,140–147, 2017.

[9] Grover, A.; Leskovec, J. node2vec: Scalable featurelearning for networks. In: Proceedings of the 22ndACM SIGKDD International Conference on KnowledgeDiscovery and Data Mining, 855–864, 2016.

[10] Jiang, Y. G.; Dai, Q.; Xue, X. Y.; Liu, W.; Ngo, C.W. Trajectory-based modeling of human actions withmotion reference points. In: Computer Vision – ECCV2012. Lecture Notes in Computer Science, Vol. 7576.Fitzgibbon, A.; Lazebnik, S.; Perona, P.; Sato, Y.;Schmid, C. Eds. Springer Berlin Heidelberg, 425–438,2012.

[11] Ren, W. H.; Kang, D.; Tang, Y. D.; Chan, A. B.Fusing crowd density maps and visual object trackersfor people tracking in crowd scenes. In: Proceedings ofthe IEEE/CVF Conference on Computer Vision andPattern Recognition, 5353–5362, 2018.

[12] Renoust, B.; Ngo, T. D.; Le, D. D.; Satoh, S. A socialnetwork analysis of face tracking in news video. In:Proceedings of the 11th International Conference on

Signal-Image Technology & Internet-Based Systems,474–481, 2015.

[13] Schmitt, D. T.; Kurkowski, S. H.; Mendenhall, M. J.Building social networks in persistent video surveillance.In: Proceedings of the IEEE International Conferenceon Intelligence and Security Informatics, 217–219, 2009.

[14] Taha, K. Disjoint community detection in networksbased on the relative association of members. IEEETransactions on Computational Social Systems Vol. 5,No. 2, 493–507, 2018.

[15] Newman, M. E. J.; Girvan, M. Finding and evaluatingcommunity structure in networks. Physical Review EVol. 69, No. 2, 026113, 2004.

[16] Newman, M. E. J. Finding community structure innetworks using the eigenvectors of matrices. PhysicalReview E Vol. 74, No. 3, 036104, 2006.

[17] Pons, P.; Latapy, M. Computing communities inlarge networks using random walks. Journal of GraphAlgorithms and Applications Vol. 10, No. 2, 191–218,2006.

[18] Raghavan, U. N.; Albert, R.; Kumara, S. Near lineartime algorithm to detect community structures in large-scale networks. Physical Review E Vol. 76, No. 3,036106, 2007.

[19] Weng, C. Y.; Chu, W. T.; Wu, J. L. RoleNet: Movieanalysis from the perspective of social networks. IEEETransactions on Multimedia Vol. 11, No. 2, 256–271,2009.

[20] Ramanathan, V.; Yao, B. P.; Li, F. F. Social rolediscovery in human events. In: Proceedings of theIEEE Conference on Computer Vision and PatternRecognition, 2475–2482, 2013.

[21] Sun, Q. R.; Schiele, B.; Fritz, M. A domain basedapproach to social relation recognition. In: Proceedingsof the IEEE Conference on Computer Vision andPattern Recognition, 435–444, 2017.

[22] Van der Maaten, L. Learning a parametric embeddingby preserving local structure. In: Proceedings of the12th International Conference on Artificial Intelligenceand Statistics, 384–391, 2009.

[23] Avril, M.; Leclere, C.; Viaux, S.; Michelet, S.; Achard,C.; Missonnier, S.; Keren, M.; Cohen, D.; Chetouani,M. Social signal processing for studying parent–infantinteraction. Frontiers in Psychology Vol. 5, 1437, 2014.

[24] Park, H. S.; Jain, E.; Sheikh, Y. Predictingprimary gaze behavior using social saliency fields. In:Proceedings of the IEEE International Conference onComputer Vision, 3503–3510, 2013.


[25] Vrigkas, M.; Nikou, C.; Kakadiaris, I. A. Identifyinghuman behaviors using synchronized audio-visual cues.IEEE Transactions on Affective Computing Vol. 8, No.1, 54–66, 2017.

[26] Jack, R. E.; Garrod, O. G. B.; Yu, H.; Caldara,R.; Schyns, P. G. Facial expressions of emotion arenot culturally universal. Proceedings of the NationalAcademy of Sciences Vol. 109, No. 19, 7241–7244, 2012.

[27] Seng, K. P.; Ang, L. M. Video analytics for customeremotion and satisfaction at contact centers. IEEETransactions on Human-Machine Systems Vol. 48, No.3, 266–278, 2018.

[28] Wang, J.; Yuan, Y.; Yu, G. Face attention network:An effective face detector for the occluded faces. arXivpreprint arXiv:1711.07246, 2017.

[29] Zhou, E.; Cao, Z.; Yin, Q. Naive-deep face recognition:Touching the limit of LFW benchmark or not? arXivpreprint arXiv:1501.04690, 2015.

[30] Fruchterman, T. M. J.; Reingold, E. M. Graph drawingby force-directed placement. Software: Practice andExperience Vol. 21, No. 11, 1129–1164, 1991.

[31] Chikhaoui, B.; Chiazzaro, M.; Wang, S. R.; Sotir,M. Detecting communities of authority and analyzingtheir influence in dynamic social networks. ACMTransactions on Intelligent Systems and TechnologyVol. 8, No. 6, Article No. 82, 2017.

[32] 1Grover, A.; Leskovec, J. node2vec: Scalable featurelearning for networks. In: Proceedings of the 22ndACM SIGKDD International Conference on KnowledgeDiscovery and Data Mining, 855–864, 2016.

[33] Ma, X. K.; Dong, D. Evolutionary nonnegative matrixfactorization algorithms for community detection indynamic networks. IEEE Transactions on Knowledgeand Data Engineering Vol. 29, No. 5, 1045–1058, 2017.

[34] Lu, Z. Q.; Sun, X.; Wen, Y. G.; Cao, G. H.; Porta, T. L.Algorithms and applications for community detectionin weighted networks. IEEE Transactions on Paralleland Distributed Systems Vol. 26, No. 11, 2916–2926,2015.

[35] Rosvall, M.; Bergstrom, C. T. Maps of information flowreveal community structure in complex networks. In:Proceedings of the National Academy of Sciences USA,1118–1123, 2007.

[36] Blondel, V. D.; Guillaume, J. L.; Lambiotte, R.;Lefebvre, E. Fast unfolding of communities in largenetworks. Journal of Statistical Mechanics: Theoryand Experiment Vol. 2008, No. 10, P10008, 2008.

[37] Xiao, Y. P.; Li, X. X.; Wang, H. H.; Xu, M.; Liu,Y. B. 3-HBP: A three-level hidden Bayesian linkprediction model in social networks. IEEE Transactionson Computational Social Systems Vol. 5, No. 2, 430–443,2018.

Yaohua Pan is a master studentat the School of New Media andCommunication, College of Intelligenceand Computing, Tianjing University. Hereceived his B.S. degree from HenanUniversity. His research interests arecomputer vision and visualization.

Zhibin Niu is an assistant professorin the College of Intelligence andComputing at Tianjin University,China. He is a Marie Skodowska-CurieFellow. He received his Ph.D., M.Sc.,and B.Sc. degrees separately fromCardiff University, Shanghai Jiao TongUniversity, and Tianjin University. His

research interests include reverse engineering, data mining,and visual analytics.

Jing Wu is a lecturer in computerscience and informatics at CardiffUniversity, UK. Her research interestsare in computer vision and graphicsincluding image-based 3D reconstruction,face recognition, machine learning, andvisual analytics. She received herB.Sc. and M.Sc. degrees from Nanjing

University, and Ph.D. degree from the University of York,UK. She serves as a PC member in CGVC, BMVC, etc., andis an active reviewer for journals including PR, CGF, etc.

Jiawan Zhang is a full professor atCollege of Intelligence and Computing atTianjin University. He received his Ph.D.degree from Dept. of Computer Science,Tianjin University in 2004. He serve(d)for academic events including the PCChair of IEEE CGIV, PC co-chair ofChina CAD&CG 2017, general co-chair

of VINCI’13, ChinaVis (2015, 2016), Pacific VAST (2015,2016). He also serve(d) as the program committee memberor reviewer for many conferences and journals includingPacificVis, EuroVis, IEEE TVCG, IEEE TIP.

Open Access This article is licensed under a Creative


Commons Attribution 4.0 International License, whichpermits use, sharing, adaptation, distribution and reproduc-tion in any medium or format, as long as you give appropriatecredit to the original author(s) and the source, provide a linkto the Creative Commons licence, and indicate if changeswere made.The images or other third party material in this article are

included in the article’s Creative Commons licence, unlessindicated otherwise in a credit line to the material. If materialis not included in the article’s Creative Commons licence and

your intended use is not permitted by statutory regulation orexceeds the permitted use, you will need to obtain permissiondirectly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are availablefree of charge from http://www.springer.com/journal/41095.To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.

Documents

InSocialNet: Interactive visual analytics for role–event ... · behavioral studies [5–8]. Among multimedia data, role–event videos are of special attention. Role–event videos,