ADeepTextAnalysisSystemBasedon OpenNLP · Outline 1. Introduc+on,to,OpenNLP.Similarity, component 2. Whatis,aDeep,TextAnalysis,and,its, Applicaons, 3. Func+onality,of,OpenNLP.Similarity:

A Deep Text Analysis System Based on OpenNLP

Boris Galitsky

Outline

1.  Introduc+on to OpenNLP.Similarity component

2.  What is a Deep Text Analysis and its Applica+ons

3.  Func+onality of OpenNLP.Similarity: search, web mining, text classifica+on, text analysis

4.  Selected problem: content/document genera+on

2

What is OpenNLP.Similarity

3

OpenNLP.Similarity entry point

A search engineer without knowledge of linguistic / parsing can easily integrate syntactic generalization into an application when relevance is important: SentencePairMatchResult matchRes = sm.assessRelevance(snippet, searchQuery);

List<List<ParseTreeChunk>> match =

matchRes.getMatchResult();

score = parseTreeChunkListScorer.

getParseTreeChunkListScore(match);

if (score > 1.5) {

// relevant

}

Deep Text Analysis is required: When coun+ng keywords is not enough When part-‐of-‐speech informa+on is not enough When syntac+c links between words informa+on is not enough

5

Motivation: from keywords to discourse structure

•  Just keywords -‐ is not enough in many cases •  Presence of verbs for communica@ve ac@ons and

mental states -‐ may help to iden+fy a few paPerns, but is s+ll insufficient

•  Parse trees -‐ specific phrases by texts in style or genre but s+ll does not help for systema+c explora+on of deep linguis+c features

•  The discourse structure including rhetoric rela+ons and communica+ve ac+ons

6

We should be able to learn

discourse trees

7

Secondary, Tertiary, Quaternary structure of language

Computa+onal biologists figured out that higher-‐level structures of protein language help! However, linguists like to talk about higher-‐level (discourse) structures but

neither computa+onally learn them nor use in applica+ons

Approach •  Build syntac+c trees •  Extract discourse informa+on

–  Coreferences –  Rhetorical rela+ons [Mann et al., 1987] –  Communica+ve ac+ons in VerbNet format

•  Build addi+onal arcs between syntac+c tree nodes of different sentences

•  Construct parse thicket graph [Galitsky et al 2013] •  Pick extended trees from the graph •  Learn on extended trees •  Learn a special case of parse thicket: communica+ve discourse

tree (CDT)

8

Problem domain in security

•  Engineering Design document is a document which contains a thorough and well-‐structured descrip+on of how to build a par+cular engineering system.

•  Not design document : a document which contains meta-‐level informa+on rela+vely to the design of engineering system, such as: ü  how to write design docs manuals, ü  standards design docs should adhere to, ü  tutorials on how to improve design documents, ü  project requirement document, ü  requirement analysis, ü  opera+onal requirements, ü  construc+on documenta+on, ü  project planning and binning, ü  technical services review, guidelines, manuals, etc.

9

Recognizing document style, not document topic

Classifica+on dimensions •  Topicality •  Polarity (sen+ment) •  Authorship •  Document style (what kind of descrip+ons is provided for

things) 10

Topic

Authorship

Style

This is necessary for document processing

system

Application domains where style recognition is necessary

•  Document type (not topic) recogni+on •  Document flow analysis systems •  Customer rela+onship management •  Security analysis •  Forensic analysis •  Document quality assessment

11

Method: Tree Kernels

12

Tree Kernels

What are they and why use them for classifica+on?

We need to measure distances between texts: •  Tradi+onally – using sta+s+c of keywords •  More accurate – using sta+s+cs of all sub-‐parse trees

13

Measuring similarity between phrases

SVM Tree Kernel learning

–  Trees instead of numeric vectors

–  Tree kernel func+on = number of all common subtrees [Collins et al 2002]

15

( ) ( )1 21 2

1 2 1 2, ,T Tn N n N

TK T T n n∈ ∈

= Δ∑ ∑

16

Convolu+on tree kernel (Collins and Duffy, 2002) defines a feature space consis+ng of all subtree types of parse trees and counts the number of common subtrees as the syntac+c similarity between two parse trees. They have found a number of applica+ons in a series of NLP tasks, including •  search (Moschil 2006); •  syntac+c parsing re-‐ranking; •  rela+on extrac+on (Zelenko et al 2003); •  named en+ty recogni+on (Cumby & Roth 2003) ; •  Seman+c Role Labeling / rela+on extrac+on (Zhang et al 2006); •  pronoun resolu+on (Yang et al., 2006); •  ques+on classifica+on and machine transla+on (Zhang and Li 2009).

Applications of Tree Kernels

Limita+on of sentence-‐level tree kernels

Phrases important for classifica+on can be distributed through different sentences

So we want to combine/merge parse trees to make sure we cover the phrase of interest.

This document describes how the back end processor can be designed. Its requirements are enumerated below.

From the first sentence, it looks like we got the design doc. To process the second sentence, we need to disambiguate ‘its’. As a result, we conclude from the second sentence that

it is a requirement doc

Rhetoric structure

Discourse tree for a paragraph of text

18

Merging parse trees, once inter-‐sentence link is found

Paragraph Level Kernel = Sum of kernels for all extended tree pairs

20

Parse Thicket

Measuring similarity

between two texts using

Parse Thickets

Evaluation 1 & 2

22

Domain:Identifying Logical Argument in Text

23

What is required to represent an argument pattern in text?

We build an RST representa+on of the arguments and observe if a discourse tree is capable of indica+ng whether a paragraph communicates both:

•  a claim, and •  an argumenta+on that backs it up. We then explore what needs to be added to a discourse tree (DT)

so that it is possible to judge if it expresses an argumenta+on paPern or not.

24

Our example

We present a controversial ar+cle published in Wall Street Journal about Theranos, a company providing healthcare services, and the company rebuPal (Theranos 2015).

“Since October [2015], the Wall Street Journal has published a series

of anonymously sourced accusaGons that inaccurately portray Theranos. Now, in its latest story (“U.S. Probes Theranos Complaints,” Dec. 20), the Journal once again is relying on anonymous sources, this Gme reporGng two undisclosed and unconfirmed complaints that allegedly were filed with the Centers for Medicare and Medicaid Services (CMS) and U.S. Food and Drug AdministraGon (FDA).”

25

“…But Theranos has struggled behind the scenes to turn the excitement over its technology into reality. At the end of 2014, the lab instrument developed as the linchpin of its strategy handled just a small fracGon of the tests then sold to consumers, according to four former employees.”

A claim and its CDT

“…But Theranos has struggled behind the scenes to turn the excitement over its technology into reality. At the end of 2014, the lab instrument developed as the linchpin of its strategy handled just a small fracGon of the tests then sold to consumers, according to four former employees.”

A claim and its CDT

When arbitrary communica+ve ac+ons are aPached to DT as labels of its terminal arcs, it becomes clear that the author is trying to bring her point across and not merely sharing a fact.

“Theranos remains acGvely engaged with its regulators, including CMS and the FDA, and no one, including the Wall Street Journal, has provided Theranos a copy of the alleged complaints to those agencies. Because Theranos has not seen these alleged complaints, it has no basis on which to evaluate the purported complaints.”

A claim and its DT

Theranos aPempts to rebuke the claim of WSJ, but without

communica+ve ac+ons it is unclear from

the DT

“It is not unusual for disgruntled and terminated employees in the heavily regulated health care industry to file complaints in an effort to retaliate against employers for terminaGon of employment. Regulatory agencies have a process for evaluaGng complaints, many of which are not substanGated. Theranos trusts its regulators to properly invesGgate any complaints.”

CDT for an aPempt by Theranos to acquit itself Communica+ve ac+ons as labels for rhetoric

rela+ons helps to iden+fy a text which contains

a heated discussion

RST + CA => iden+fica+on of logical argumenta+on

•  To show the structure of arguments, discourse rela+ons are necessary but insufficient, and communica+ve ac+ons, or speech acts are necessary but insufficient as well.

•  We need to know the discourse structure of interac+ons between agents, and what kind of interac+ons they are.

•  We don’t need to know domain of interac+on (here, health), the subjects of these interac+on (the company, the journal, the agencies), what are the en++es, but we need to take into account mental, domain-‐independent rela+ons between them.

“Theranos remains acGvely engaged with its regulators, including CMS and the FDA, and no one, including the Wall Street Journal, has provided Theranos a copy of the alleged complaints to those agencies. Because Theranos has not seen these alleged complaints, it has no basis on which to evaluate the purported complaints.”

Another CDT for an aPempt to make even stronger counter-‐claim

Extrac+ng communica+ve ac+ons from text: VerbNet entry

Let us consider the verb amuse. There is a cluster of similar verbs that have a similar structure of arguments (seman+c roles) such as amaze, anger, arouse, disturb, irritate, and other. The roles of the arguments of these communica+ve ac+ons are as follows: • Experiencer (usually, an animate en+ty) • S+mulus • Result

Extrac+ng communica+ve ac+ons from text: VerbNet Frames

NP V NP Example: "The teacher amused the children." Syntax: S+mulus V Experiencer Clause: amuse(SGmulus, E, EmoGon, Experiencer):-‐ cause(SGmulus, E), emoGonal_state(result(E), EmoGon, Experiencer). NP V ADV-‐Middle Example: "Small children amuse quickly." Syntax: Experiencer V ADV Clause: amuse(Experiencer, Prop):-‐

property(Experiencer, Prop), adv(Prop). NP V NP-‐PRO-‐ARB example "The teacher amused." syntax S+mulus V amuse(SGmulus, E, EmoGon, Experiencer):-‐ cause(SGmulus, E), emoGonal_state(result(E), EmoGon, Experiencer).

For this example, the informa+on for the class of verbs amuse is at hPp://verbs.colorado.edu/verb-‐index/vn/amuse-‐31.1.php#amuse-‐31.1

Argumenta+on scenario and its

DT

I explained that my check bounced (I wrote it aaer I made a deposit). A customer service representaGve accepted that it usually takes some Gme to process the deposit. I reminded that I was unfairly charged an overdraa fee a month ago in a similar situaGon. They denied that it was unfair because the overdraa fee was disclosed in my account informaGon. I disagreed with their fee and wanted this fee deposited back to my account. They explained that nothing can be done at this point and that I need to look into the account rules closer.

Argumenta+on scenario and its

graph representa+on

I explained that my check bounced (I wrote it aaer I made a deposit). A customer service representaGve accepted that it usually takes some Gme to process the deposit. I reminded that I was unfairly charged an overdraa fee a month ago in a similar situaGon. They denied that it was unfair because the overdraa fee was disclosed in my account informaGon. I disagreed with their fee and wanted this fee deposited back to my account. They explained that nothing can be done at this point and that I need to look into the account rules closer.

Extrac+ng communica+ve ac+ons from text: clusters

Verbs with Predica+ve Complements Appoint, characterize, dub, declare, conjecture, masquerade, orphan, captain, consider, classify. Verbs of Percep+on See, sight, peer. Psych-‐Verbs (Verbs of Psychological State) Amuse, admire, marvel, appeal. Verbs of Desire Want, long. Judgment Verbs Judgment. Verbs of Assessment Assess, es+mate. Verbs of Searching Hunt, search, stalk, invesGgate, rummage, ferret. Verbs of Social Interac+on Correspond, marry, meet, bacle. Verbs of Communica+on Transfer(message), inquire, interrogate, tell, manner(speaking), talk, chat, say, complain, advise, confess, lecture, overstate, promise. Avoid Verbs Avoid. Measure Verbs Register, cost, fit, price, bill. Aspectual Verbs Begin, complete, conGnue, stop, establish, sustain.

Sta+s+cal vs induc+ve learning To compute similarity between abstract structures, two approaches are employed: •  represent these structures in a numerical space, and

express similarity as a number. This is a sta+s+cal learning approach

•  use a structural representa+on, without numerical space, such as trees and graphs, and express similarity as a maximal common sub-‐structure. We refer to such opera+on as generalizaGon. This is an induc+ve learning approach.

To conduct feature engineering, we will compare both these approaches for text classifica+on. The representa+on machinery and learning selngs are different, but the classifica+on accuracies can be compared.

Similarity between CDTs: maximal common sub-‐CDTs with

generalized labels

Numerical similarity between CDTs: representa+on for kernel

model

Mo+va+ons Most web visionaries think that good quality content comes from: •  really passionate fans, or •  from professional journalists, knowing the topic of their wri+ng well. However, nowadays, the demand for content requires people write large amount of text, for a variety of commercial purposes: •  from search engine op+miza+on •  marke+ng •  self-‐promo+on. -‐ A wide variety of business areas require content crea+on in some form, including web content, -‐ Expecta+on of text quality are rather low and deteriorate further, as content is becoming more and more commercial -‐

So lets have a machine generate low-‐quality

content for us

Now we focus on content genera+on in detail

41

Objec+ve To build •  an efficient •  domain-‐independent •  crea+ve •  interac+ve wri+ng assistance tool which produces: a large volume of content where •  quality and •  effec+veness are not essen+al

Plan •  Goals of the project •  Technology

–  Content genera+on algorithm –  NLP: Learning parse trees –  Parse thickets

•  Applica+ons –  Facebook agent –  Various cases of content genera+on

•  How to use it –  How to integrate into your applica+on

OpenNLP.Similarity.content generation entry point

String topic = “Albert Einstein”; HitBase hits =

new RelatedSentenceFinder(). generateContentAbout(topic);

String filenameDOCX = new

WordDocBuilder.buildWordDoc(hits, topic);

Innova+on •  Use of web mining to collect pieces of

content •  Machine learning of parse trees to filter

irrelevant pieces of content from the web •  Paragraph-‐level syntac+c structures to form

content from pieces •  Document-‐level structure of sec+ons is

formed by web mining for aPributes of an en+ty which is a subject of an assay being wriPen

Mo+va+ons: why open source •  Today search engines are information

access gatekeepers •  Search engines want their automated

agent to analyze / index content created by humans

•  WE want to automatically build content good enough so that search engines do not recognize it as written by humans…

•  So that search engines would need humans to analyze content created by our machines

Got content from Walmart? •  Today, original content is created by human writers

and therefore costly, slowly produced •  Finding a way to automate content crea+on so that the

result is sa+sfactory for human content consumers and perceived as original by search engine would advance user's web presence and facilitate his professional reputa+on

Reformulate a text for the purpose of

making it original: •  well known to search engines

•  produce uninteres+ng and seman+cally non-‐cohesive

Reproduce the human wri+ng ac+vity

A typical crea+ve wri+ng ac+vity of an average author is: •  searching the web for a subject of wri+ng •  browsing the web for relevant informa+on •  finding candidate pieces •  merging them to form a text •  Inser+ng images/videos/sound recordings •  final text polishing

We intend our wri+ng algorithm to do these exact

steps… And possibly leave the last

step for an author

Technology

Difficulty = relevance Eric P.: We will e-‐mail you a bill for the amount of your money transfer, and you pay using your

bank's online bill pay service. Boris Galitsky: can YOU do this email? Eric P.: Please clarify and restate your ques+on. Boris Galitsky: can you please send me email for bill pay informa+on? Eric P.: Once you have selected WU Pay as the "Pay With" op+on and clicked the SUBMIT buPon,

the RECEIPT screen will provide instruc+ons on how to fund the Money Transfer. Boris Galitsky: ok, I will try that Boris Galitsky: are you an automated agent? Eric P.: A|er choosing WU® Pay, you'll receive a bill by email. Eric P.: Is there anything else I can help you with today? Boris Galitsky: can you please answer my previous ques+on? Eric P.: What ques+on Boris? Boris Galitsky: are you an automated agent? Eric P.: No sir Boris Galitsky: It is my right as a customer to know if I am served by an automated or human

agent, right? Eric P.: Yes it is

My conversa+on with WU agent

Content genera+on algorithm

Algorithm: select candidate sentences

Content genera+on algorithm 1

Algorithm: verify relevance


Algorithm: final acceptance of pieces of content


Content genera+on algorithm: building sec+on structure

We search for a essay topic and obtain main en++es associated with it, such as {space, atomic bomb, +me machine}

Content genera+on algorithm: building sec+on structure and filling

them We now search for a essay topic + sec+on topic:

Building sec+on structure and filling them

Now we construct the sec+on structure: Albert Einstein 1.  Albert Einstein invented Space

2.  Albert Einstein invented Bomb

Content genera+on flow 1 For sentence “Give me a break, there is no reason why you can't reGre in ten years if you had been a raGonal investor and not a crazy trader” • We form the query for search engine API: +raGonal +investor +crazy +trader • From search results we remove duplicates, including “DerivaGves: ImplicaGons for Investors | The <b>RaGonal</b> Walk”. • From the search results we show syntac+c generaliza+on results for the seed and search result: •  Syntac+c similarity: np [ [IN-‐in DT-‐a JJ-‐* ], [DT-‐a JJ-‐* JJ-‐crazy ], [JJ-‐ra+onal NN-‐* ], [DT-‐a JJ-‐crazy ]], score= 0.9 • Rejected candidate sentence: RaGonal opportuniGes in a crazy silly world.

Content genera+on flow 2 •  Syntac+c generaliza+on 2: np [ [VBN-‐* DT-‐a JJ-‐* JJ-‐ra+onal NN-‐investor ], [DT-‐a JJ-‐* JJ-‐ra+onal NN-‐investor ]] vp [ [DT-‐a ], [VBN-‐* DT-‐a JJ-‐* JJ-‐ra+onal NN-‐investor ]], score= 2.0

•  Accepted sentence: I have licle pretensions about being a so-‐called "raGonal investor”.

. The laPer sentence has significantly stronger seman+c commonality with the seed one,

compared to the former one, so it is expected to serve as a relevant part of generated

content about “raGonal investor” from the seed sentence

NLP: machine learning of parse trees

Similarity = common part among parse trees

Why Parse Thickets

•  To represent a linguis+c structure of a paragraph of text based on parse trees for each sentence of this paragraph.

•  We will refer to the sequence of parse trees extended by a number of arcs for inter-‐sentence rela+ons between nodes for words as Parse Thicket (PT).

•  A PT is a graph which includes parse trees for each sentence, as well as addi+onal arcs for inter-‐sentence rela+onship between parse tree nodes for words.

What are we going to do with Parse Thickets

•  Extend the opera+on of least general generaliza+on (unifica+on of logic formula) towards structural representa+ons of paragraph of texts

•  Define the opera+on of generalizaGon of text paragraphs to assess similarity between por+ons of text.

•  Use of generaliza+on for similarity assessment is inspired by structured approaches to machine learning versus unstructured, sta+s+cal where similarity is measured by a distance in feature space (Moschil et al)

Now we compare linguistic phrase search and regular

SOLR search SOLR is one of the most popular search frameworks used in industry lucene.apache.org/solr/‎ Linguis+c phrase is not a string phrase query “…”. There are noun phrases, verb phrases, preposi+onal phrases, and others SOLR search takes into account term frequency, inverse document frequency, and the number of words between keywords in the document

Keyword search

Phrase (natural language) search

Finding similarity between two paragraphs

"Iran refuses to accept the UN proposal to end the dispute over work on nuclear weapons", "UN nuclear watchdog passes a resolu+on condemning Iran for developing a second uranium enrichment site in secret", "A recent IAEA report presented diagrams that suggested Iran was secretly working on nuclear weapons", "Iran envoy says its nuclear development is for peaceful purpose, and the material evidence against it has been fabricated by the US" "UN passes a resolu+on condemning the work of Iran on nuclear weapons, in spite of Iran claims that its nuclear research is for peaceful purpose", "Envoy of Iran to IAEA proceeds with the dispute over its nuclear program and develops an enrichment site in secret", "Iran confirms that the evidence of its nuclear weapons program is fabricated by the US and proceeds with the second uranium enrichment site"

Original text

Text, found on the web: is it RELEVANT?

Keywords: topic with no details

Iran, UN, proposal, dispute, nuclear, weapons, passes, resolu+on, developing, enrichment, site, secret, condemning, second, uranium

Improvement: pair-wise generalization

[NN-‐work IN-‐* IN-‐on JJ-‐nuclear NNS-‐weapons ], [DT-‐the NN-‐dispute IN-‐over JJ-‐nuclear NNS-‐* ], [VBZ-‐passes DT-‐a NN-‐resolu+on ], [VBG-‐condemning NNP-‐iran IN-‐* ], [VBG-‐developing DT-‐* NN-‐enrichment NN-‐site IN-‐in NN-‐secret ]], [DT-‐* JJ-‐second NN-‐uranium NN-‐enrichment NN-‐site ]], [VBZ-‐is IN-‐for JJ-‐peaceful NN-‐purpose ], [DT-‐the NN-‐evidence IN-‐* PRP-‐it ], [VBN-‐* VBN-‐fabricated IN-‐by DT-‐the NNP-‐us ]

Pair-wise vs. Parse thicket

[NN-‐Iran VBG-‐developing DT-‐* NN-‐enrichment NN-‐site IN-‐in NN-‐secret ] [NN-‐generaliza+on-‐<UN/nuclear watchdog> * VB-‐pass NN-‐resolu+on VBG condemning NN-‐ Iran] [NN-‐generaliza+on-‐<Iran/envoy of Iran> Communica@ve_ac@on DT-‐the NN-‐dispute IN-‐over JJ-‐nuclear NNS-‐* [Communica@ve_ac@on -‐ NN-‐work IN-‐of NN-‐Iran IN-‐on JJ-‐nuclear NNS-‐weapons] [NN-‐generaliza+on <Iran/envoy to UN> Communica+ve_ac+on NN-‐Iran NN-‐nuclear NN-‐* VBZ-‐is IN-‐for JJ-‐peaceful NN-‐purpose ], [Communica+ve_ac+on -‐ NN-‐generalize <work/develop> IN-‐of NN-‐Iran IN-‐on JJ-‐nuclear NNS-‐weapons] [NN-‐generaliza+on <Iran/envoy to UN> Communica+ve_ac+on NN-‐evidence IN-‐against NN Iran NN-‐nuclear VBN-‐fabricated IN-‐by DT-‐the NNP-‐us ] condemn^proceed [enrichment site] <leads to> suggest^condemn [ work Iran nuclear weapon ]

Parse Thicket

•  Syntac+c parse trees

•  Links between words from different sentences –  Anaphora –  Rhetoric Structure Theory (RST) [Mann] –  Speech Act Theory (communica+ve ac+ons, CA) [Searle]

Applica+ons

Case Study: Agent ac+ng on behalf of a user pos+ng messages to

maintain friendship

Domain of social promo+on

•  On average, people have 200-‐300 friends or contacts on social network systems such Facebook and LinkedIn.

•  To maintain ac+ve rela+onships with this high number of friends, a few hours per week is required

•  In reality, people only maintain rela+onship with 10-‐20 most close friends, family and colleagues,

•  The rest of friends are being communicated with very rarely. These not so close friends feel that the social network rela+onship has been abandoned

CASP is about to post a message

Case study

CASP writes an essay on a topic

Content Genera+on part of CASP •  The content genera+on part of CASP can be used at

www.facebook.com/RoughDra|Essay •  Given a topic, it first mines the web to auto build a taxonomy of en++es which will

be used in the future comment or essay •  Then the system searches the web for these en++es to find pieces of content to

create respec+ve sec+ons •  The resultant document is delivered as DOCX email aPachment •  CASP can automa+cally compile texts from hundreds of sources to write an essay

on the topic •  If a user wants to share a comprehensive review, opinion on something, provide a

thorough background, then this interac+ve mode should be used.

Evalua+on of CASP We measure the failure score of an individual CASP pos+ng as the number of failures in relevance and appropriateness. The pos+ngs started with [posted by an agent on behalf of Boris]

A|er a certain number of failures, friends •  complain •  unfriend •  share nega+ve informa+on about the loss of trust with

others

•  and even encourage other friends to unfriend an friend who is enabled with CASP.

Trust vs Domain & Complexity

Values are the average numbers of failed CASP pos+ngs, for given domain and given complexity, when issues with trust arises On average, friends complain a|er 6 failures and want to unfriend CASP’s host a|er 10 failures

Trust vs Relevance

averaging between 2 and 3 for each human user pos+ng

•  For less informa+on-‐cri+cal domains like travel and shopping, tolerance to failed relevance is rela+vely high.

•  Conversely, in the domains taken more seriously, like job related, and with personal flavor, like personal life, users are more sensi+ve to CASP failures and the loss of trust in its various forms occur faster

•  For all domains, tolerance slowly decreases when the complexity of pos+ng increases.

Conclusions for Facebook applica+on •  We proposed a problem domain of social promo+on and build a conversa+onal agent CASP to act in this domain

•  CASP maintains friendship and professional rela+onships by automa+cally pos+ng messages on behalf of its human host

•  It was demonstrated that a substan+al intelligence in informa+on retrieval, reasoning, and natural language-‐based relevance assessment is required to retain trust in CASP agents by human users

Content genera+on agent can be trusted

•  We evaluated the scenarios of how trust was lost in Facebook environment.

•  We confirmed that it happens rarely enough for CASP agent to improve the social visibility and maintain more friends for a human host than being without CASP.

Although some friends lost trust in CASP, the friendship with most friends was retained by CASP

Overall CASP’s impact on social ac+vity is posi+ve.

Conclusions

Published work

• Galitsky, B., Josep Lluis de la Rosa & Gábor Dobrocsi. Building Integrated Opinion Delivery Environment FLAIRS-‐24 2011. • Galitsky, B. and Josep Lluis de la Rosa. Concept-‐based learning of human behavior for customer rela+onship management. Special Issue on Informa+on Engineering Applica+ons Based on Lalces. Informa+on Sciences. Volume 181, Issue 10, 15 May 2011, pp 2016-‐2035, 2011. • Galitsky, B., Josep-‐Lluis de la Rosa, and Boris Kovalerchuk. Assessing plausibility of explana+on and meta-‐explana+on in inter-‐human conflict. Engineering Applica+on of AI, V 24 Issue 8, pp 1472-‐1486, 2011. • Galitsky, B. Josep Lluis de la Rosa, Gábor Dobrocsi. Inferring the seman+c proper+es of sentences by mining syntac+c parse trees. Data & Knowledge Engineering. Available online 28 July 2012. • Galitsky, B. Machine Learning of Syntac+c Parse Trees for Search and Classifica+on of Text. Engineering Applica+on of Ar+ficial Intelligence, 2012 • Galitsky, B. Exhaus+ve simula+on of consecu+ve mental states of human agents. Knowledge-‐Based Systems, 2013

JGraphT

This project among open source projects

OpenNLP

StanfordNLP

Graph-‐learning Parse

Thickets Linguists can now enjoy graph-‐ learning of linguis+c structures, and graph community – new rich source of graphs

Documents

ADeepTextAnalysisSystemBasedon OpenNLP · Outline 1. Introduc+on,to,OpenNLP.Similarity, component 2. Whatis,aDeep,TextAnalysis,and,its, Applicaons, 3. Func+onality,of,OpenNLP.Similarity: