DESCRIPTION
This presentation will introduce Semantic Analysis – a way in which content can be analysed and classified through its linguistic basis, rather than through its overt meaning. It will achieve this by using Lego as a metaphor for language and demonstrating that by examining the building blocks of language a deeper understanding of content can be gained.
1
SMS Management & Technology
IAs, Language and Lego™ – an Introduction to Semantic Analysis
Matthew Hodgson
Regional-lead, Web and Information Management, Canberra Australia
12 April 2008
4
IA Tools for understanding content
5
Content analysis…
6
Information: we all think differently …
We all:
• Think about information in different ways
• Write about information in different ways
7
… we all even write differently …
8
Jeffrey Veen on analysing content
“a mind-numbingly detailed odyssey through your web site...
…this process…is a relatively straightforward process of clicking through your web site and recording what you find.”
Source: http://www.adaptivepath.com/ideas/essays/archives/000040.php
9
When analysing content …
10
An extract of medical restrictions text
11
What is this content?!
• Medical restrictions text
• Free-text built in Word and hand-crafted (*grrr*)
• Unclassified
• Varied consistency within and between texts
• Highly complex sentence structures in pseudo-legalese
• Style reflects the author rather than the meaning in the communication
Content needed for re-use
• Content output was needed for reuse by others
• Multiple audiences
• Multiple purposes for re-use
Codification
• Codification by 3rd parties (after authoring) takes too long
• Need to reduce timeframes!
12
The task … analyse and codify
[Diagram labels: Concept 1, Concept 2, Concept 3, Concept 4, Concept 5, Concept 5]
13
What tools would be appropriate?
?
14
Linguistics… a whole discipline devoted to the study of language…
[Diagram labels: preposition, verb, adjective, noun, determiner, subject, object, conjunction, semantics, sentence structure]
All language has structure
15
Language is like Lego™
Building blocks
• Subject (S)
• Verb (V)
• Object (O)
Order of blocks
• Differs depending on the language
16
Language is like Lego™
SVO: English, French, Chinese, Bulgarian, Swahili
SOV: Japanese, Turkish, Korean
VSO: Classical Arabic, Celtic and Hawaiian
VOS: Fijian
OSV: Yoda’s amusing phrases
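A minimal sketch of the "order of blocks" idea, assuming a sentence can be reduced to three bricks. The function name `arrange_blocks` and the sample phrases are hypothetical, not from the presentation, and real languages reorder far more than three blocks.

```python
def arrange_blocks(subject: str, verb: str, obj: str, order: str = "SVO") -> list[str]:
    """Return the three bricks in the order named by a code such as 'SVO' or 'SOV'."""
    blocks = {"S": subject, "V": verb, "O": obj}
    code = order.upper()
    # The code must mention each of S, V and O exactly once.
    if sorted(code) != ["O", "S", "V"]:
        raise ValueError(f"order must use S, V and O exactly once, got {order!r}")
    return [blocks[letter] for letter in code]


if __name__ == "__main__":
    # Illustrative bricks only; the same bricks are reordered per word-order type.
    for order in ("SVO", "SOV", "VSO", "VOS"):
        print(order, arrange_blocks("the child", "builds", "a tower", order))
```

The bricks stay the same in every call; only the order code changes, which is the point of the Lego metaphor.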
17
Lego bricks: subjects, verbs and objects
Subject: Those Lego bricks
Verb: are
Object: [some] red Lego bricks
Sometimes, though, the SVO structure is hidden: “The Lego is red” or “Those Lego bricks are [some] red Lego bricks”?
Uncovering the hidden structure helps to differentiate between the subject and the object and identify the who and what.
19
Lego trees…
[Sentence tree]
Root: Sentence
  SUBJECT: NounPhrase: Those (Det) Lego (Adj) bricks (Noun)
  VERB: VerbPhrase: are (Verb)
  OBJECT: NounPhrase: [some] (Det) red (Adj) Lego (Adj) bricks (Noun)
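A sentence tree like this can be held as a nested data structure and read back as subject, verb and object. The sketch below is one possible encoding, not the presenter's tooling. The part-of-speech assignments are a reading of the slide's labels, and `words` and `subject_verb_object` are hypothetical helper names.

```python
# Each node is (label, children); a leaf is (part_of_speech, word).
TREE = ("Sentence", [
    ("NounPhrase", [("Det", "Those"), ("Adj", "Lego"), ("Noun", "bricks")]),
    ("VerbPhrase", [("Verb", "are")]),
    ("NounPhrase", [("Det", "[some]"), ("Adj", "red"), ("Adj", "Lego"), ("Noun", "bricks")]),
])


def words(node):
    """Collect the words under a node, left to right."""
    _label, content = node
    if isinstance(content, str):  # leaf: content is the word itself
        return [content]
    collected = []
    for child in content:
        collected.extend(words(child))
    return collected


def subject_verb_object(sentence):
    """Read S, V and O off a tree, assuming a simple SVO sentence
    of the form NounPhrase, VerbPhrase, NounPhrase."""
    _label, phrases = sentence
    labels = [label for label, _ in phrases]
    if labels != ["NounPhrase", "VerbPhrase", "NounPhrase"]:
        raise ValueError(f"not a simple SVO tree: {labels}")
    subject, verb, obj = (" ".join(words(phrase)) for phrase in phrases)
    return {"subject": subject, "verb": verb, "object": obj}


if __name__ == "__main__":
    print(subject_verb_object(TREE))
```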
20
Semantic analysis
Medical restrictions wording:
Restricted benefit: Gastro-oesophageal reflux disease; Scleroderma oesophagus;
Authority required: Peptic ulcer
21
Semantic analysis (cont.)
Actual sentence: Peptic ulcer
Implied sentence: The prescription of medicine is restricted to the initial treatment of patients with peptic ulcer
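The step from actual to implied sentence can be sketched as a template fill. This assumes every terse entry expands the same way as the peptic ulcer example, which the slide does not claim. The names `TEMPLATE` and `implied_sentence` and the default stage of "initial" are illustrative assumptions.

```python
# Template copied from the implied sentence on the slide; the two slots are assumptions.
TEMPLATE = (
    "The prescription of medicine is restricted to the "
    "{stage} treatment of patients with {condition}"
)


def implied_sentence(condition: str, stage: str = "initial") -> str:
    """Expand a terse restriction entry into its implied full sentence."""
    cleaned = condition.strip().rstrip(";").strip()
    if not cleaned:
        raise ValueError("condition must not be empty")
    # Lower-case the first letter so the entry reads naturally mid-sentence.
    # Note: this would wrongly lower-case a proper noun at the start of an entry.
    cleaned = cleaned[0].lower() + cleaned[1:]
    return TEMPLATE.format(stage=stage, condition=cleaned)


if __name__ == "__main__":
    print(implied_sentence("Peptic ulcer"))
```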
22
Semantic structure of ‘peptic ulcer’
[Sentence tree for the implied sentence]
SUBJECT: NounPhrase “the prescription” + PreposPhrase “of” + NounPhrase “medicine”
VERB: VerbPhrase “is restricted to”
OBJECT: NounPhrase “the initial treatment” + PreposPhrase “of” + NounPhrase “patients with peptic ulcer”
Word-level tags in the diagram: DET, N, P, AUX, V, ADJ
23
Semantic model for restrictions text
WHO TREATED? + WHAT CONDITION? = ACTION REQUIRED
[Diagram: the implied sentence broken into word classes. KEY: ADJ, NOUN, PREP, VERB]
Subject and verb: the prescription of medicine is restricted to the
Object: [initial | continuing] treatment of patients
WHO TREATED?
• Initial or continuing: initial, continuing
• Patient descriptors (population/group): 70 year old, mother, pregnant, female, male, other ADJ
• PBS subsidised: previously receiving PBS-subsidised bDMARD treatment; not previously receiving PBS-subsidised bDMARD treatment
• Limitation of prescribing to a specific specialist group: treated by clinical immunologist; treated by dermatologist
• Existing treatment descriptors of population: receiving cytotoxic chemotherapy; receiving a 5HT3 antagonist; receiving radiotherapy; receiving treatment 2 years; having surgery; not responding to analgesics; not responding to other anti-epileptic drugs; unable to take solid form of topiramate
WHAT CONDITION?
• Condition being treated: with nausea and vomiting; with advanced psoriasis; with peptic ulcer; with malignant tumor; with scleroderma oesophagus; with chronic pain; with seizures; with hormone dependent metastatic breast cancer
• Measures/descriptors of condition severity (ADJ, ADJ/PP): advanced, malignant, chronic, partial, incomplete resolution of, no indication of
ACTION REQUIRED
• Practical aspects: complete Authority action sheet; include whole body area diagrams; record details of doctor; record date; sign form; treat for period of time; provide previous history; prescribe number of repeats; contact Medicare; obtain Authority number
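The WHO TREATED + WHAT CONDITION = ACTION REQUIRED model can be sketched as a small record type. The class name `Restriction`, its field names and the validation rule are hypothetical, and the sentence wording is borrowed from the implied sentence earlier in the deck.

```python
from dataclasses import dataclass, asdict

STAGES = ("initial", "continuing")  # from the "Initial or continuing" label


@dataclass(frozen=True)
class Restriction:
    """Hypothetical codified form of one restriction."""
    stage: str      # initial or continuing
    who: str        # WHO TREATED, e.g. "patients"
    condition: str  # WHAT CONDITION, e.g. "peptic ulcer"
    action: str     # ACTION REQUIRED, e.g. "obtain Authority number"

    def __post_init__(self):
        if self.stage not in STAGES:
            raise ValueError(f"stage must be one of {STAGES}, got {self.stage!r}")

    def sentence(self) -> str:
        """Rebuild the implied sentence from the codified parts."""
        return (
            "The prescription of medicine is restricted to the "
            f"{self.stage} treatment of {self.who} with {self.condition}"
        )


if __name__ == "__main__":
    r = Restriction("initial", "patients", "peptic ulcer", "obtain Authority number")
    print(r.sentence())
    print(asdict(r))  # the same content as data, ready for reuse by other systems
```

Holding the parts separately is what allows reuse: the same record can be rendered as prose for one audience and exported as data for another.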
24
Semantics describing “Who Treated”
[Diagram: vocabulary for the “Who Treated” part of a restriction. Bracketed terms are placeholders for values.]
Patient Group
• Age: Over, Under, Exactly, Between, At least [NUMBER] Hours, Days, Weeks, Months, Years
• Sex: Male, Female, All
• Vocation: Veteran
• Ethnicity: [ETHNICITY]
• Entitlement: [?] Veteran?
• Pregnant, Breastfeeding, [ADJECTIVE]
• PBS subsidised, PBS non-subsidised
• Documented history
• That meet a specific definition/criteria as set out in [LIST of references]: General schedule of Lipid-lowering Drugs; [DEFINED BY]
Treatments
• Initial, Continuing, Maintenance, Initiation, Stabilisation
• Treatment with, Treatment of, Treatment for: [DRUG], [TREATMENT], Diet, Exercise, Surgery [TYPE], [THERAPY], [PROCEDURE]
• At a dose of [mg ...etc]: Hourly, Daily, Weekly, Fortnightly, Monthly, Yearly
• Co-administered with; In conjunction with, Not in conjunction with; Following, Preceding
• Within timeframe of, Over a period of
• Trials
• Effective, Ineffective, Inappropriate
• Received, Has not received; Responding, Not responding; Qualified for, Failed to qualify for; Indicated, Not indicated; Has had, Has not had; Can have, Can not have, Can not receive
• Disease progression, Disease regression
• [CLINICIAN]: Treated by, Diagnosis confirmed by; Requiring special expertise in [EXPERTISE]; Requiring no special expertise
• [SEVERITY] [CONDITION] in [DISORDER]: Evidence of; Symptoms?; Clinical findings; As measured by?; As evidenced by; [MEASURED AS]?
Diagram note (repeated on several branches): Starts new prepositional-phrase in the same text-block
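One branch of this vocabulary, the age qualifier, can be sketched as a closed set of terms rather than free text. The comparator and unit terms come from the slide, while the class names and the `matches` check are hypothetical. "Between" is left out because it needs two numbers.

```python
from dataclasses import dataclass
from enum import Enum


class Comparator(Enum):
    OVER = "over"
    UNDER = "under"
    EXACTLY = "exactly"
    AT_LEAST = "at least"


class Unit(Enum):
    HOURS = "hours"
    DAYS = "days"
    WEEKS = "weeks"
    MONTHS = "months"
    YEARS = "years"


@dataclass(frozen=True)
class AgeCriterion:
    """Hypothetical codified age qualifier built only from allowed terms."""
    comparator: Comparator
    number: int
    unit: Unit = Unit.YEARS

    def phrase(self) -> str:
        """Render the criterion as restriction wording."""
        return f"aged {self.comparator.value} {self.number} {self.unit.value}"

    def matches(self, age: int) -> bool:
        """Check an age given in the same unit as the criterion."""
        if self.comparator is Comparator.OVER:
            return age > self.number
        if self.comparator is Comparator.UNDER:
            return age < self.number
        if self.comparator is Comparator.EXACTLY:
            return age == self.number
        return age >= self.number  # Comparator.AT_LEAST


if __name__ == "__main__":
    criterion = AgeCriterion(Comparator.AT_LEAST, 70)
    print(criterion.phrase(), criterion.matches(70))
```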
25
Semantics describing “Authority Action”
[Diagram: vocabulary for the “Authority Action” part of a restriction. Bracketed terms are placeholders for values.]
• Authority Action: (allow) Maximum, (allow) Minimum
• Therapy, Supply [AMOUNT], Repeats [AMOUNT]
• Initial, Subsequent, Ongoing, Remaining
• Treatment
• [TIME]: days, weeks, months
• In writing, By telephone, Electronically
• To [AUTHORITY]: Medicare, ...etc...
• To complete; Followed by; Within timeframe of [TIME]; Where approval [TIMEFRAME]
Diagram note (repeated on several branches): Starts new prepositional-phrase in the same text-block
26
High-level semantic overview
[Diagram: the components of a restriction text, joined by + and = signs]
Components: Foreword, Definitions, Patient Group, Condition, Authority Action, Notes and Cautions
Column headings: WHO TREATED, WHAT CONDITION, HOW AUTHORISED
Labels: Age limitations; Clinical initiation or continuation criteria; Prescribing clinicians; Prescribing advice; Condition; Contact information; Grandfathering clauses; Patient groups; Prior treatments; Severity
27
Yes, it can be codified!
Medical restrictions:
• Did have structure
• Did have underlying logic
• Were based on repeatable business processes
• Could be codified
Could we make a ‘system’ to reinforce the structure at the point of authoring?
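One way such a system could reinforce structure at the point of authoring is to check a draft against the allowed terms before it is saved. This is a sketch of the idea only, not the prototype shown in the demo. The field names and the `validate_draft` function are hypothetical, and the term lists are a small sample of the vocabulary in the earlier slides.

```python
# Sample of allowed terms per field; the field names are illustrative.
ALLOWED = {
    "stage": {"initial", "continuing", "maintenance"},
    "sex": {"male", "female", "all"},
    "channel": {"in writing", "by telephone", "electronically"},
}


def validate_draft(draft: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the draft is codifiable."""
    problems = []
    for field, allowed in ALLOWED.items():
        value = draft.get(field)
        if value is None:
            problems.append(f"missing field: {field}")
        elif value.strip().lower() not in allowed:
            problems.append(f"{field}: {value!r} is not one of {sorted(allowed)}")
    return problems


if __name__ == "__main__":
    draft = {"stage": "first", "sex": "All", "channel": "By telephone"}
    for problem in validate_draft(draft):
        print(problem)
```

Catching free-text variation while the author is still writing avoids the slow third-party codification step described earlier.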
28
Demo
Putting it together in a system:
• Supporting building of content restrictions in a codified way
• Prototyping with Axure
37
The semantic analysis advantage
Content analysis identifies:
• Themes in content
vs
Semantic analysis identifies:
• Themes in content
• Work processes
• Folk taxonomies used
• ‘Things’ written about
38
What else could you use it for?
When you need to understand:
• Business processes that create content
When you want to disassemble content for:
• FAQs
• A-Z indexes
• Help files
39
How can I add this to my toolbox??!
Theory is important
• An understanding of semantics: sentence trees and grammar
• Text books by authors like Fromkin and Rodman can help through the tricky bits
Need good tools
• Connexor: http://www.connexor.eu/technology/machinese/demo/
• Big sheets of paper (and an electronic whiteboard)
• Visio (not PowerPoint!)
40
Demo
Connexor: http://www.connexor.eu/technology/machinese/demo/
41
Connexor
42
Connexor – machine tagger
43
Connexor – machine syntax
44
Why should I care about this?
• Google uses semantic analysis to index content
• Translation software uses semantic analysis to identify ‘components’ for translation
Good sentence structure equals:
• Accurate indexing
• Higher rank relevance of content
• Happy people (they find what they’re looking for)
45
Why should I care about this?
46
‘Calais’ by Reuters
47
Summing up
Content is still king!
But how can you tell if your content:
• Is of good quality?
• Matches your website’s categories?
• Accurately reflects your metadata?
• Can be found by people?
Semantic analysis can:
• Make your content audits more objective
• Inform processes to improve the quality of the content
• Inform processes to improve search engine indexing
• Inform metadata creation
• Inform choice of taxonomy
48
Take-home message
Semantic analysis can help IAs:
Infer
• How people think about, and structure, their information
Describe
• Business processes that produce content
Identify
• Where content quality is poor so it can be improved
• Critical components of the sentence for codification
Design
• Taxonomies and describe folk taxonomies
Build
• Systems to help bring some structure to content authoring
49
Fin
50
IAs, Language and Lego™
an Introduction to Semantic Analysis
51
by
Matthew Hodgson
Regional-lead, Web and Information Management
SMS Management & Technology Canberra Australia
52
by
Matthew Hodgson
Email [email protected] magia3e.wordpress.com
Slideshare www.slideshare.net/magia3e
Twitter magia3e