A Deep Text Analysis System Based on OpenNLP
Boris Galitsky
Outline
1. Introduction to the OpenNLP.Similarity component
2. What is deep text analysis and what are its applications
3. Functionality of OpenNLP.Similarity: search, web mining, text classification, text analysis
4. Selected problem: content/document generation
What is OpenNLP.Similarity
OpenNLP.Similarity entry point
A search engineer without knowledge of linguistics/parsing can easily integrate syntactic generalization into an application where relevance is important:

    SentencePairMatchResult matchRes = sm.assessRelevance(snippet, searchQuery);
    List<List<ParseTreeChunk>> match = matchRes.getMatchResult();
    double score = parseTreeChunkListScorer.getParseTreeChunkListScore(match);
    if (score > 1.5) {
        // relevant
    }
Deep text analysis is required:
• when counting keywords is not enough
• when part-of-speech information is not enough
• when information about syntactic links between words is not enough
Motivation: from keywords to discourse structure
• Just keywords - not enough in many cases
• Presence of verbs for communicative actions and mental states - may help to identify a few patterns, but is still insufficient
• Parse trees - capture phrases specific to texts in a style or genre, but still do not help with systematic exploration of deep linguistic features
• The discourse structure, including rhetorical relations and communicative actions
We should be able to learn discourse trees
Secondary, Tertiary, Quaternary structure of language
Computational biologists figured out that higher-level structures of the protein language help! However, linguists like to talk about higher-level (discourse) structures but neither learn them computationally nor use them in applications.
Approach
• Build syntactic trees
• Extract discourse information
  – Coreferences
  – Rhetorical relations [Mann et al., 1987]
  – Communicative actions in VerbNet format
• Build additional arcs between syntactic tree nodes of different sentences
• Construct a parse thicket graph [Galitsky et al., 2013]
• Pick extended trees from the graph
• Learn on extended trees
• Learn on a special case of a parse thicket: the communicative discourse tree (CDT)
Problem domain in security
• An Engineering Design document is a document which contains a thorough and well-structured description of how to build a particular engineering system.
• Not a design document: a document which contains meta-level information relative to the design of an engineering system, such as:
  ✓ manuals on how to write design docs
  ✓ standards design docs should adhere to
  ✓ tutorials on how to improve design documents
  ✓ project requirement documents
  ✓ requirement analysis
  ✓ operational requirements
  ✓ construction documentation
  ✓ project planning and binning
  ✓ technical services reviews, guidelines, manuals, etc.
Recognizing document style, not document topic
Classification dimensions:
• Topicality
• Polarity (sentiment)
• Authorship
• Document style (what kind of description is provided for things)
Topic, Authorship, Style: recognizing these is necessary for a document processing system.
Application domains where style recognition is necessary
• Document type (not topic) recognition
• Document flow analysis systems
• Customer relationship management
• Security analysis
• Forensic analysis
• Document quality assessment
Method: Tree Kernels
Tree Kernels
What are they and why use them for classification?
We need to measure distances between texts:
• traditionally - using statistics of keywords
• more accurately - using statistics of all sub-parse trees
Measuring similarity between phrases
SVM Tree Kernel learning
– Trees instead of numeric vectors
– Tree kernel function = number of all common subtrees [Collins et al., 2002]
$$K(T_1, T_2) = \sum_{n_1 \in N_{T_1}} \sum_{n_2 \in N_{T_2}} \Delta(n_1, n_2)$$

where N_{T_1} and N_{T_2} are the node sets of parse trees T_1 and T_2, and Δ(n_1, n_2) is the number of common subtrees rooted at n_1 and n_2.
Applications of Tree Kernels
The convolution tree kernel (Collins and Duffy, 2002) defines a feature space consisting of all subtree types of parse trees and counts the number of common subtrees as the syntactic similarity between two parse trees. Tree kernels have found applications in a series of NLP tasks, including:
• search (Moschitti, 2006);
• syntactic parsing re-ranking;
• relation extraction (Zelenko et al., 2003);
• named entity recognition (Cumby & Roth, 2003);
• semantic role labeling / relation extraction (Zhang et al., 2006);
• pronoun resolution (Yang et al., 2006);
• question classification and machine translation (Zhang and Li, 2009).
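The kernel formula above can be illustrated with a toy implementation. This is a hedged sketch, not the OpenNLP.Similarity or SVM tree-kernel code: the Node class, the production-matching rule, and the convention of counting identical leaves as 1 are assumptions made for this example.

```java
import java.util.*;

// Toy convolution tree kernel in the style of Collins and Duffy (2002):
// delta(n1, n2) counts the common subtrees rooted at a pair of nodes,
// and the kernel sums delta over all node pairs of the two trees.
public class TreeKernel {
    static class Node {
        final String label;
        final List<Node> children;
        Node(String label, Node... kids) {
            this.label = label;
            this.children = Arrays.asList(kids);
        }
        // The "production" at this node: label -> labels of its children.
        String production() {
            StringBuilder sb = new StringBuilder(label).append(" ->");
            for (Node c : children) sb.append(' ').append(c.label);
            return sb.toString();
        }
        List<Node> allNodes() {
            List<Node> out = new ArrayList<>();
            out.add(this);
            for (Node c : children) out.addAll(c.allNodes());
            return out;
        }
    }

    // Number of common subtrees rooted at n1 and n2.
    static int delta(Node n1, Node n2) {
        if (!n1.production().equals(n2.production())) return 0;
        if (n1.children.isEmpty()) return 1;   // identical leaves count once
        int prod = 1;
        for (int i = 0; i < n1.children.size(); i++)
            prod *= 1 + delta(n1.children.get(i), n2.children.get(i));
        return prod;
    }

    // K(T1, T2) = sum of delta over all node pairs.
    static int kernel(Node t1, Node t2) {
        int k = 0;
        for (Node n1 : t1.allNodes())
            for (Node n2 : t2.allNodes())
                k += delta(n1, n2);
        return k;
    }

    public static void main(String[] args) {
        // "the dog" vs "the cat": they share NP -> DT NN and DT -> the.
        Node np1 = new Node("NP", new Node("DT", new Node("the")),
                                  new Node("NN", new Node("dog")));
        Node np2 = new Node("NP", new Node("DT", new Node("the")),
                                  new Node("NN", new Node("cat")));
        System.out.println(kernel(np1, np2));   // prints 6
    }
}
```

A real system would also apply a decay factor λ on subtree size so that large subtrees do not dominate the kernel value.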
Limitation of sentence-level tree kernels
Phrases important for classification can be distributed across different sentences.
So we want to combine/merge parse trees to make sure we cover the phrase of interest.
This document describes how the back end processor can be designed. Its requirements are enumerated below.
From the first sentence, it looks like we have a design doc. To process the second sentence, we need to disambiguate 'its'. As a result, we conclude from the second sentence that it is a requirements doc.
Rhetoric structure
Discourse tree for a paragraph of text
Merging parse trees, once an inter-sentence link is found
Paragraph-level kernel = sum of kernels over all extended tree pairs
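The paragraph-level sum can be sketched directly; this is a minimal illustration assuming the sentence-level tree kernel is supplied as a function (the class name and the string-equality stub kernel are made up for this example and are not the OpenNLP.Similarity API):

```java
import java.util.*;
import java.util.function.ToDoubleBiFunction;

// Paragraph-level kernel: the sum of the sentence-level tree kernel K
// over all pairs of extended trees drawn from the two paragraphs.
public class ParagraphKernel {
    static <T> double paragraphKernel(List<T> trees1, List<T> trees2,
                                      ToDoubleBiFunction<T, T> treeKernel) {
        double sum = 0;
        for (T t1 : trees1)
            for (T t2 : trees2)
                sum += treeKernel.applyAsDouble(t1, t2);
        return sum;
    }

    public static void main(String[] args) {
        // Stub kernel: 1.0 when the "trees" (strings here) are equal.
        ToDoubleBiFunction<String, String> k = (a, b) -> a.equals(b) ? 1.0 : 0.0;
        System.out.println(paragraphKernel(
            Arrays.asList("NP1", "VP1"), Arrays.asList("NP1", "VP2"), k));  // 1.0
    }
}
```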
Parse Thicket
Measuring similarity between two texts using Parse Thickets
Evaluation 1 & 2
Domain: Identifying Logical Argument in Text
What is required to represent an argument pattern in text?
We build an RST representation of the arguments and observe whether a discourse tree is capable of indicating that a paragraph communicates both:
• a claim, and
• an argumentation that backs it up.
We then explore what needs to be added to a discourse tree (DT) so that it is possible to judge whether it expresses an argumentation pattern or not.
Our example
We present a controversial article published in the Wall Street Journal about Theranos, a company providing healthcare services, and the company's rebuttal (Theranos 2015).
"Since October [2015], the Wall Street Journal has published a series of anonymously sourced accusations that inaccurately portray Theranos. Now, in its latest story ("U.S. Probes Theranos Complaints," Dec. 20), the Journal once again is relying on anonymous sources, this time reporting two undisclosed and unconfirmed complaints that allegedly were filed with the Centers for Medicare and Medicaid Services (CMS) and U.S. Food and Drug Administration (FDA)."
"…But Theranos has struggled behind the scenes to turn the excitement over its technology into reality. At the end of 2014, the lab instrument developed as the linchpin of its strategy handled just a small fraction of the tests then sold to consumers, according to four former employees."
A claim and its CDT
When arbitrary communicative actions are attached to a DT as labels of its terminal arcs, it becomes clear that the author is trying to bring her point across and not merely sharing a fact.
"Theranos remains actively engaged with its regulators, including CMS and the FDA, and no one, including the Wall Street Journal, has provided Theranos a copy of the alleged complaints to those agencies. Because Theranos has not seen these alleged complaints, it has no basis on which to evaluate the purported complaints."
A claim and its DT
Theranos attempts to rebuke the claim of the WSJ, but without communicative actions this is unclear from the DT.
"It is not unusual for disgruntled and terminated employees in the heavily regulated health care industry to file complaints in an effort to retaliate against employers for termination of employment. Regulatory agencies have a process for evaluating complaints, many of which are not substantiated. Theranos trusts its regulators to properly investigate any complaints."
CDT for an attempt by Theranos to acquit itself. Communicative actions as labels for rhetorical relations help to identify a text which contains a heated discussion.
RST + CA => identification of logical argumentation
• To show the structure of arguments, discourse relations are necessary but insufficient, and communicative actions (speech acts) are necessary but insufficient as well.
• We need to know the discourse structure of interactions between agents, and what kind of interactions they are.
• We do not need to know the domain of the interaction (here, health), the subjects of these interactions (the company, the journal, the agencies), or what the entities are, but we do need to take into account the mental, domain-independent relations between them.
"Theranos remains actively engaged with its regulators, including CMS and the FDA, and no one, including the Wall Street Journal, has provided Theranos a copy of the alleged complaints to those agencies. Because Theranos has not seen these alleged complaints, it has no basis on which to evaluate the purported complaints."
Another CDT, for an attempt to make an even stronger counter-claim
Extracting communicative actions from text: a VerbNet entry
Let us consider the verb amuse. There is a cluster of similar verbs that have a similar structure of arguments (semantic roles), such as amaze, anger, arouse, disturb, and irritate. The roles of the arguments of these communicative actions are as follows:
• Experiencer (usually an animate entity)
• Stimulus
• Result
Extracting communicative actions from text: VerbNet frames

NP V NP
  Example: "The teacher amused the children."
  Syntax: Stimulus V Experiencer
  Clause: amuse(Stimulus, E, Emotion, Experiencer) :- cause(Stimulus, E), emotional_state(result(E), Emotion, Experiencer).

NP V ADV-Middle
  Example: "Small children amuse quickly."
  Syntax: Experiencer V ADV
  Clause: amuse(Experiencer, Prop) :- property(Experiencer, Prop), adv(Prop).

NP V NP-PRO-ARB
  Example: "The teacher amused."
  Syntax: Stimulus V
  Clause: amuse(Stimulus, E, Emotion, Experiencer) :- cause(Stimulus, E), emotional_state(result(E), Emotion, Experiencer).

For this example, the information for the verb class amuse is at http://verbs.colorado.edu/verb-index/vn/amuse-31.1.php#amuse-31.1
Argumentation scenario and its DT
I explained that my check bounced (I wrote it after I made a deposit). A customer service representative accepted that it usually takes some time to process the deposit. I reminded them that I was unfairly charged an overdraft fee a month ago in a similar situation. They denied that it was unfair because the overdraft fee was disclosed in my account information. I disagreed with their fee and wanted this fee deposited back to my account. They explained that nothing can be done at this point and that I need to look into the account rules closer.
Argumentation scenario and its graph representation
Extracting communicative actions from text: clusters
Verbs with Predicative Complements: appoint, characterize, dub, declare, conjecture, masquerade, orphan, captain, consider, classify.
Verbs of Perception: see, sight, peer.
Psych-Verbs (Verbs of Psychological State): amuse, admire, marvel, appeal.
Verbs of Desire: want, long.
Judgment Verbs: judgment.
Verbs of Assessment: assess, estimate.
Verbs of Searching: hunt, search, stalk, investigate, rummage, ferret.
Verbs of Social Interaction: correspond, marry, meet, battle.
Verbs of Communication: transfer(message), inquire, interrogate, tell, manner(speaking), talk, chat, say, complain, advise, confess, lecture, overstate, promise.
Avoid Verbs: avoid.
Measure Verbs: register, cost, fit, price, bill.
Aspectual Verbs: begin, complete, continue, stop, establish, sustain.
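A minimal sketch of how such clusters can drive extraction: look each verb of a sentence up in a cluster table. Only a tiny excerpt of the clusters above is hard-coded here, and the class and method names are illustrative, not the OpenNLP.Similarity API.

```java
import java.util.*;

// Recognize a communicative action by dictionary lookup against a small
// excerpt of the verb clusters above; a real system would load the full
// VerbNet-derived lists.
public class CommunicativeActionLexicon {
    static final Map<String, String> VERB_TO_CLUSTER = new HashMap<>();
    static {
        for (String v : new String[]{"tell", "say", "complain", "advise",
                "confess", "promise", "inquire", "interrogate"})
            VERB_TO_CLUSTER.put(v, "communication");
        for (String v : new String[]{"amuse", "admire", "marvel", "appeal"})
            VERB_TO_CLUSTER.put(v, "psych");
        for (String v : new String[]{"correspond", "marry", "meet", "battle"})
            VERB_TO_CLUSTER.put(v, "social_interaction");
    }

    // Returns "verb/cluster" for the first match, or null when no word
    // of the sentence is in this (deliberately tiny) lexicon.
    static String firstCommunicativeAction(String sentence) {
        for (String w : sentence.toLowerCase().split("[^a-z]+")) {
            String cluster = VERB_TO_CLUSTER.get(w);
            if (cluster != null) return w + "/" + cluster;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstCommunicativeAction(
            "I complain about the overdraft fee"));   // complain/communication
        System.out.println(firstCommunicativeAction(
            "They denied it and I disagreed"));       // null: excerpt too small
    }
}
```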
Statistical vs. inductive learning
To compute similarity between abstract structures, two approaches are employed:
• represent these structures in a numerical space, and express similarity as a number. This is the statistical learning approach.
• use a structural representation, without a numerical space, such as trees and graphs, and express similarity as a maximal common sub-structure. We refer to such an operation as generalization. This is the inductive learning approach.
To conduct feature engineering, we will compare both approaches for text classification. The representation machinery and learning settings are different, but the classification accuracies can be compared.
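The generalization operation can be sketched for the simplest case, a pair of aligned POS-tagged chunks. This mirrors the "JJ-rational NN-*" notation used later in this deck, but the token-by-token alignment below is a deliberate simplification of the parse-tree generalization the slides describe, and the class name is made up for this example.

```java
import java.util.*;

// Pairwise generalization of two POS-tagged chunks: keep a position when
// the POS tags match; keep the word itself only when the words also
// match, otherwise emit "*".
public class PhraseGeneralizer {
    // Each token is "POS-word", e.g. "NN-investor".
    static List<String> generalize(List<String> p1, List<String> p2) {
        List<String> result = new ArrayList<>();
        int n = Math.min(p1.size(), p2.size());
        for (int i = 0; i < n; i++) {
            String[] a = p1.get(i).split("-", 2);
            String[] b = p2.get(i).split("-", 2);
            if (a.length < 2 || b.length < 2) continue;  // malformed token
            if (!a[0].equals(b[0])) continue;            // POS differs: drop
            result.add(a[0] + "-" + (a[1].equals(b[1]) ? a[1] : "*"));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> g = generalize(
            Arrays.asList("DT-a", "JJ-rational", "NN-investor"),
            Arrays.asList("DT-a", "JJ-rational", "NN-trader"));
        System.out.println(g);   // [DT-a, JJ-rational, NN-*]
    }
}
```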
Similarity between CDTs: maximal common sub-CDTs with generalized labels
Numerical similarity between CDTs: representation for the kernel model
Motivations
Most web visionaries think that good quality content comes from:
• really passionate fans, or
• professional journalists who know the topic of their writing well.
However, nowadays the demand for content requires people to write large amounts of text for a variety of commercial purposes:
• search engine optimization
• marketing
• self-promotion.
A wide variety of business areas require content creation in some form, including web content. Expectations of text quality are rather low and deteriorate further as content becomes more and more commercial.
So let's have a machine generate low-quality content for us.
Now we focus on content generation in detail.
Objective
To build an efficient, domain-independent, creative, interactive writing-assistance tool which produces a large volume of content where quality and effectiveness are not essential.
Plan
• Goals of the project
• Technology
  – Content generation algorithm
  – NLP: learning parse trees
  – Parse thickets
• Applications
  – Facebook agent
  – Various cases of content generation
• How to use it
  – How to integrate it into your application
OpenNLP.Similarity content generation entry point

    String topic = "Albert Einstein";
    HitBase hits = new RelatedSentenceFinder().generateContentAbout(topic);
    String filenameDOCX = new WordDocBuilder().buildWordDoc(hits, topic);
Innovation
• Use of web mining to collect pieces of content
• Machine learning of parse trees to filter irrelevant pieces of content from the web
• Paragraph-level syntactic structures to form content from pieces
• Document-level structure of sections is formed by web mining for attributes of the entity which is the subject of the essay being written
Motivations: why open source
• Today, search engines are information access gatekeepers
• Search engines want their automated agents to analyze / index content created by humans
• WE want to automatically build content good enough that search engines cannot tell it was not written by humans…
• So that search engines would need humans to analyze content created by our machines
Got content from Walmart?
• Today, original content is created by human writers and is therefore costly and slowly produced
• Finding a way to automate content creation so that the result is satisfactory for human content consumers and perceived as original by search engines would advance a user's web presence and facilitate their professional reputation
Reformulating a text for the purpose of making it original:
• is well known to search engines
• produces uninteresting and semantically non-cohesive text
Reproduce the human wri+ng ac+vity
A typical creative writing activity of an average author is:
• searching the web for a subject of writing
• browsing the web for relevant information
• finding candidate pieces
• merging them to form a text
• inserting images/videos/sound recordings
• final text polishing
We intend our writing algorithm to do these exact steps… and possibly leave the last step to the author.
Technology
Difficulty = relevance
My conversation with a WU agent
Eric P.: We will e-mail you a bill for the amount of your money transfer, and you pay using your bank's online bill pay service.
Boris Galitsky: can YOU do this email?
Eric P.: Please clarify and restate your question.
Boris Galitsky: can you please send me email for bill pay information?
Eric P.: Once you have selected WU Pay as the "Pay With" option and clicked the SUBMIT button, the RECEIPT screen will provide instructions on how to fund the Money Transfer.
Boris Galitsky: ok, I will try that
Boris Galitsky: are you an automated agent?
Eric P.: After choosing WU® Pay, you'll receive a bill by email.
Eric P.: Is there anything else I can help you with today?
Boris Galitsky: can you please answer my previous question?
Eric P.: What question Boris?
Boris Galitsky: are you an automated agent?
Eric P.: No sir
Boris Galitsky: It is my right as a customer to know if I am served by an automated or human agent, right?
Eric P.: Yes it is
Content generation algorithm
Algorithm: select candidate sentences
Content generation algorithm 1
Algorithm: verify relevance
Content generation algorithm 2
Algorithm: final acceptance of pieces of content
Content generation algorithm 3
Content generation algorithm: building section structure
We search for an essay topic and obtain the main entities associated with it, such as {space, atomic bomb, time machine}.
Content generation algorithm: building section structure and filling the sections
We now search for the essay topic + a section topic:
Building section structure and filling the sections
Now we construct the section structure:
Albert Einstein
1. Albert Einstein invented Space
2. Albert Einstein invented Bomb
Content generation flow 1
For the sentence "Give me a break, there is no reason why you can't retire in ten years if you had been a rational investor and not a crazy trader":
• We form the query for the search engine API: +rational +investor +crazy +trader
• From the search results we remove duplicates, including "Derivatives: Implications for Investors | The <b>Rational</b> Walk".
• From the search results we show syntactic generalization results for the seed and a search result:
• Syntactic similarity: np [ [IN-in DT-a JJ-* ], [DT-a JJ-* JJ-crazy ], [JJ-rational NN-* ], [DT-a JJ-crazy ] ], score = 0.9
• Rejected candidate sentence: Rational opportunities in a crazy silly world.
Content generation flow 2
• Syntactic generalization 2: np [ [VBN-* DT-a JJ-* JJ-rational NN-investor ], [DT-a JJ-* JJ-rational NN-investor ] ] vp [ [DT-a ], [VBN-* DT-a JJ-* JJ-rational NN-investor ] ], score = 2.0
• Accepted sentence: I have little pretensions about being a so-called "rational investor".
The latter sentence has significantly stronger semantic commonality with the seed than the former one, so it is expected to serve as a relevant part of the generated content about "rational investor" from the seed sentence.
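The acceptance step of this flow can be sketched as a threshold test. The scoring stub below just counts shared content words, a crude stand-in for the syntactic generalization score the slides describe; the 1.5 threshold follows the assessRelevance example earlier in the deck, and all names here are illustrative.

```java
import java.util.*;

// Accept a candidate search-result sentence when its similarity to the
// seed sentence clears a threshold. crudeScore() is NOT the syntactic
// generalization of the slides; it merely counts shared content words.
public class CandidateFilter {
    static Set<String> tokens(String s) {
        Set<String> out = new HashSet<>();
        for (String w : s.toLowerCase().split("[^a-z]+"))
            if (w.length() > 3) out.add(w);   // drop short function words
        return out;
    }

    static double crudeScore(String seed, String candidate) {
        Set<String> shared = tokens(seed);
        shared.retainAll(tokens(candidate));
        return shared.size();   // each shared content word contributes 1.0
    }

    static boolean accept(String seed, String candidate) {
        return crudeScore(seed, candidate) > 1.5;
    }

    public static void main(String[] args) {
        String seed = "a rational investor and not a crazy trader";
        System.out.println(accept(seed, "a rational long-term investor"));  // true
        System.out.println(accept(seed, "the weather is nice today"));      // false
    }
}
```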
NLP: machine learning of parse trees
Similarity = common part among parse trees
Why Parse Thickets
• To represent the linguistic structure of a paragraph of text based on parse trees for each sentence of this paragraph.
• We will refer to the sequence of parse trees, extended by a number of arcs for inter-sentence relations between nodes for words, as a Parse Thicket (PT).
• A PT is a graph which includes the parse trees for each sentence, as well as additional arcs for inter-sentence relationships between parse tree nodes for words.
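The PT definition above suggests a simple graph data structure: one parse tree per sentence plus typed arcs between nodes of different sentences. The sketch below is a hedged illustration under that reading; the class, field, and enum names are assumptions, not the OpenNLP.Similarity API.

```java
import java.util.*;

// A parse thicket as a graph: word nodes grouped by sentence, plus typed
// inter-sentence arcs (coreference, rhetorical relation, communicative
// action). Parse-tree internals are elided to keep the sketch short.
public class ParseThicketSketch {
    enum ArcType { COREFERENCE, RHETORICAL, COMMUNICATIVE_ACTION }

    static class WordNode {
        final int sentence;     // index of the sentence this word is in
        final String posWord;   // e.g. "NN-Iran"
        WordNode(int sentence, String posWord) {
            this.sentence = sentence;
            this.posWord = posWord;
        }
    }

    static class Arc {
        final WordNode from, to;
        final ArcType type;
        Arc(WordNode from, WordNode to, ArcType type) {
            this.from = from; this.to = to; this.type = type;
        }
    }

    final List<Arc> interSentenceArcs = new ArrayList<>();

    // Only arcs that cross sentence boundaries belong to the thicket.
    void addArc(WordNode a, WordNode b, ArcType t) {
        if (a.sentence == b.sentence)
            throw new IllegalArgumentException("intra-sentence arc");
        interSentenceArcs.add(new Arc(a, b, t));
    }

    public static void main(String[] args) {
        ParseThicketSketch pt = new ParseThicketSketch();
        WordNode iran0 = new WordNode(0, "NN-Iran");
        WordNode iran1 = new WordNode(1, "NN-Iran");
        pt.addArc(iran0, iran1, ArcType.COREFERENCE);
        System.out.println(pt.interSentenceArcs.size());  // prints 1
    }
}
```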
What are we going to do with Parse Thickets
• Extend the operation of least general generalization (unification of logic formulas) towards structural representations of paragraphs of text
• Define the operation of generalization of text paragraphs to assess similarity between portions of text
• The use of generalization for similarity assessment is inspired by structured approaches to machine learning, versus unstructured, statistical ones where similarity is measured by a distance in feature space (Moschitti et al.)
Now we compare linguistic phrase search and regular SOLR search
SOLR is one of the most popular search frameworks used in industry: lucene.apache.org/solr/
A linguistic phrase is not a string phrase query "…"; there are noun phrases, verb phrases, prepositional phrases, and others.
SOLR search takes into account term frequency, inverse document frequency, and the number of words between keywords in the document.
Keyword search
Phrase (natural language) search
Finding similarity between two paragraphs
"Iran refuses to accept the UN proposal to end the dispute over work on nuclear weapons", "UN nuclear watchdog passes a resolution condemning Iran for developing a second uranium enrichment site in secret", "A recent IAEA report presented diagrams that suggested Iran was secretly working on nuclear weapons", "Iran envoy says its nuclear development is for peaceful purpose, and the material evidence against it has been fabricated by the US"
"UN passes a resolution condemning the work of Iran on nuclear weapons, in spite of Iran claims that its nuclear research is for peaceful purpose", "Envoy of Iran to IAEA proceeds with the dispute over its nuclear program and develops an enrichment site in secret", "Iran confirms that the evidence of its nuclear weapons program is fabricated by the US and proceeds with the second uranium enrichment site"
Original text
Text, found on the web: is it RELEVANT?
Keywords: topic with no details
Iran, UN, proposal, dispute, nuclear, weapons, passes, resolution, developing, enrichment, site, secret, condemning, second, uranium
Improvement: pair-wise generalization
[NN-work IN-* IN-on JJ-nuclear NNS-weapons ], [DT-the NN-dispute IN-over JJ-nuclear NNS-* ], [VBZ-passes DT-a NN-resolution ], [VBG-condemning NNP-iran IN-* ], [VBG-developing DT-* NN-enrichment NN-site IN-in NN-secret ], [DT-* JJ-second NN-uranium NN-enrichment NN-site ], [VBZ-is IN-for JJ-peaceful NN-purpose ], [DT-the NN-evidence IN-* PRP-it ], [VBN-* VBN-fabricated IN-by DT-the NNP-us ]
Pair-wise vs. Parse thicket
[NN-Iran VBG-developing DT-* NN-enrichment NN-site IN-in NN-secret ]
[NN-generalization-<UN/nuclear watchdog> * VB-pass NN-resolution VBG-condemning NN-Iran]
[NN-generalization-<Iran/envoy of Iran> Communicative_action DT-the NN-dispute IN-over JJ-nuclear NNS-* [Communicative_action - NN-work IN-of NN-Iran IN-on JJ-nuclear NNS-weapons]
[NN-generalization-<Iran/envoy to UN> Communicative_action NN-Iran NN-nuclear NN-* VBZ-is IN-for JJ-peaceful NN-purpose ], [Communicative_action - NN-generalize <work/develop> IN-of NN-Iran IN-on JJ-nuclear NNS-weapons]
[NN-generalization-<Iran/envoy to UN> Communicative_action NN-evidence IN-against NN-Iran NN-nuclear VBN-fabricated IN-by DT-the NNP-us ]
condemn^proceed [enrichment site] <leads to> suggest^condemn [work Iran nuclear weapon]
Parse Thicket
• Syntactic parse trees
• Links between words from different sentences
  – Anaphora
  – Rhetorical Structure Theory (RST) [Mann]
  – Speech Act Theory (communicative actions, CA) [Searle]
Applications
Case Study: an agent acting on behalf of a user, posting messages to maintain friendships
Domain of social promotion
• On average, people have 200-300 friends or contacts on social network systems such as Facebook and LinkedIn.
• To maintain active relationships with this high number of friends, a few hours per week are required.
• In reality, people only maintain relationships with their 10-20 closest friends, family members, and colleagues.
• The rest of their friends are communicated with very rarely. These not-so-close friends feel that the social network relationship has been abandoned.
CASP is about to post a message
Case study
CASP writes an essay on a topic
Content generation part of CASP
• The content generation part of CASP can be used at www.facebook.com/RoughDraftEssay
• Given a topic, it first mines the web to automatically build a taxonomy of entities which will be used in the future comment or essay
• Then the system searches the web for these entities to find pieces of content to create the respective sections
• The resultant document is delivered as a DOCX email attachment
• CASP can automatically compile texts from hundreds of sources to write an essay on the topic
• If a user wants to share a comprehensive review or opinion on something, or provide a thorough background, then this interactive mode should be used.
Evaluation of CASP
We measure the failure score of an individual CASP posting as the number of failures in relevance and appropriateness. The postings started with [posted by an agent on behalf of Boris].
After a certain number of failures, friends:
• complain
• unfriend
• share negative information about the loss of trust with others
• and even encourage other friends to unfriend a friend who is enabled with CASP.
Trust vs. Domain & Complexity
The values are the average numbers of failed CASP postings, for a given domain and complexity, before issues with trust arise. On average, friends complain after 6 failures and want to unfriend CASP's host after 10 failures.
Trust vs. Relevance
(averaging between 2 and 3 for each human user posting)
• For less information-critical domains like travel and shopping, tolerance to failed relevance is relatively high.
• Conversely, in domains taken more seriously, like job-related ones, and those with a personal flavor, like personal life, users are more sensitive to CASP failures, and the loss of trust in its various forms occurs faster.
• For all domains, tolerance slowly decreases as the complexity of postings increases.
Conclusions for the Facebook application
• We proposed the problem domain of social promotion and built a conversational agent, CASP, to act in this domain
• CASP maintains friendships and professional relationships by automatically posting messages on behalf of its human host
• It was demonstrated that substantial intelligence in information retrieval, reasoning, and natural-language-based relevance assessment is required for CASP agents to retain the trust of human users
A content generation agent can be trusted
• We evaluated the scenarios of how trust was lost in the Facebook environment.
• We confirmed that this happens rarely enough for a CASP agent to improve social visibility and maintain more friends for its human host than the host would have without CASP.
Although some friends lost trust in CASP, the friendship with most friends was retained.
Overall, CASP's impact on social activity is positive.
Conclusions
Published work
• Galitsky, B., Josep Lluis de la Rosa & Gábor Dobrocsi. Building Integrated Opinion Delivery Environment. FLAIRS-24, 2011.
• Galitsky, B. and Josep Lluis de la Rosa. Concept-based learning of human behavior for customer relationship management. Special Issue on Information Engineering Applications Based on Lattices. Information Sciences, Volume 181, Issue 10, 15 May 2011, pp. 2016-2035, 2011.
• Galitsky, B., Josep-Lluis de la Rosa, and Boris Kovalerchuk. Assessing plausibility of explanation and meta-explanation in inter-human conflict. Engineering Applications of AI, Vol. 24, Issue 8, pp. 1472-1486, 2011.
• Galitsky, B., Josep Lluis de la Rosa, Gábor Dobrocsi. Inferring the semantic properties of sentences by mining syntactic parse trees. Data & Knowledge Engineering. Available online 28 July 2012.
• Galitsky, B. Machine Learning of Syntactic Parse Trees for Search and Classification of Text. Engineering Applications of Artificial Intelligence, 2012.
• Galitsky, B. Exhaustive simulation of consecutive mental states of human agents. Knowledge-Based Systems, 2013.
This project among open source projects: JGraphT, OpenNLP, StanfordNLP.
Graph learning of Parse Thickets: linguists can now enjoy graph learning of linguistic structures, and the graph community gains a new, rich source of graphs.