representing scientific discourse, or: why triples are not enough

Anita de Waard
Disruptive Technologies Director, Elsevier Labs
Casimir Researcher, Utrecht Institute of Linguistics

C-SHALS 2010: representing scientific discourse, or: why triples are not enough

DESCRIPTION

On semantic annotation for science publications

what is your problem?

insulin maintaining glucose homeostasis

When insulin secretion cannot be increased adequately (type I diabetes defect) to overcome insulin resistance in maintaining glucose homeostasis, hyperglycemia and glucose intolerance ensues.

insulin may be involved glucose homeostasis

Because PANDER is expressed by pancreatic beta-cells and in response to glucose in a similar way to those of insulin, PANDER may be involved in glucose homeostasis.

why triples are not enough (1): commercial tool

- the triples are often wrong.

- you cannot check if they are true.
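The two complaints can be made concrete with a small sketch. The `Triple` and `Claim` classes, the `find_hedge` helper and the cue list below are hypothetical names chosen for illustration, not the output format of any tool shown in the talk. The sketch only shows that a bare triple drops both the hedge and the source sentence, which are what a reader needs in order to check it.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Small illustrative cue list (an assumption); a real hedge detector needs far more.
HEDGE_CUES = ("may", "might", "could", "suggest", "suggests", "possibly")


def find_hedge(sentence: str) -> Optional[str]:
    """Return the first hedging cue found in the sentence, or None."""
    for word in re.findall(r"[A-Za-z]+", sentence.lower()):
        if word in HEDGE_CUES:
            return word
    return None


@dataclass(frozen=True)
class Triple:
    """A bare subject-predicate-object statement."""
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class Claim:
    """A triple kept together with its source sentence and any hedge."""
    triple: Triple
    source_sentence: str
    hedge: Optional[str]


def to_claim(triple: Triple, sentence: str) -> Claim:
    return Claim(triple, sentence, find_hedge(sentence))


# The second extracted triple from the slide, with its source sentence.
pander = to_claim(
    Triple("insulin", "may be involved", "glucose homeostasis"),
    "Because PANDER is expressed by pancreatic beta-cells and in response to "
    "glucose in a similar way to those of insulin, PANDER may be involved in "
    "glucose homeostasis.",
)
print(pander.hedge)
```

With the sentence attached, a reader can see that the statement is hedged and that it is about PANDER rather than insulin; the triple alone gives no way to notice either.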

why triples are not enough (2): biocreative challenge

compare:

- In Xenopus oocyte maturation, cytoplasmic polyadenylation mediated by cytoplasmic polyadenylation element binding protein (CPEB) induces the translation of maternal mRNA [5].

- In mouse testis, another novel member of the CPEB protein family (CPEB2) and a homolog of xGLD-2 (mGLD-2) have been identified [7] and [8]

to:

- TPAP was present in GSG1 immunoprecipitates (Fig. 2B). The in vivo data suggest that TPAP–GSG1 interactions occur in mammalian cells.

how do you know this is true? what is new?

Previous studies have implicated miR-34a as a tumor suppressor gene whose transcription is activated by p53.

Alteration of nm23, P53, and S100A4 expression may contribute to the development of gastric

how do you know this is true? what is new?

why triples are not enough (3): medie

why is this so difficult?

| Aristotle | Quintilian | Scientific Paper |
| --- | --- | --- |
| prooimion | Introduction / exordium: The introduction of a speech, where one announces the subject and purpose of the discourse, and where one usually employs the persuasive appeal to ethos in order to establish credibility with the audience. | Introduction: positioning |
| prothesis | Statement of Facts / narratio: The speaker here provides a narrative account of what has happened and generally explains the nature of the case. | Introduction: research question |
| | Summary / propositio: The propositio provides a brief summary of what one is about to speak on, or concisely puts forth the charges or accusation. | Summary of contents |
| pistis | Proof / confirmatio: The main body of the speech where one offers logical arguments as proof. The appeal to logos is emphasized here. | Results |
| | Refutation / refutatio: As the name connotes, this section of a speech was devoted to answering the counterarguments of one's opponent. | Related Work |
| epilogos | peroratio: Following the refutatio and concluding the classical oration, the peroratio conventionally employed appeals through pathos, and often included a summing up. | Discussion: summary, implications. |

issue #1: science is rhetoric

- goal of the paper is to be published; it uses us as a host system

- format has co-evolved as predator-prey system with reviewers

| Story Grammar | | The Story of Goldilocks and the Three Bears |
| --- | --- | --- |
| Setting | Time | Once upon a time |
| | Character | a little girl named Goldilocks |
| | Location | She went for a walk in the forest. Pretty soon, she came upon a house. |
| Theme | Goal | She knocked and, when no one answered, |
| | Attempt | she walked right in. |
| Episode | Name | At the table in the kitchen, there were three bowls of porridge. |
| | Subgoal | Goldilocks was hungry. |
| | Attempt | She tasted the porridge from the first bowl. |
| | Outcome | This porridge is too hot! she exclaimed. |
| | Attempt | So, she tasted the porridge from the second bowl. |
| | Outcome | This porridge is too cold, she said |
| | Attempt | So, she tasted the last bowl of porridge. |
| | Outcome | Ahhh, this porridge is just right, she said happily and |
| | Outcome | she ate it all up. |

issue #2: science is a story

| Paper Grammar | The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its Interaction with Gfi-1/Senseless Proteins |
| --- | --- |
| Background | The mechanisms mediating SCA1 pathogenesis are still not fully understood, but some general principles have emerged. |
| Objects of study | the Drosophila Atx-1 homolog (dAtx-1) which lacks a polyQ tract, |
| Experimental setup | studied and compared in vivo effects and interactions to those of the human protein |
| Research goal | Gain insight into how Atx-1's function contributes to SCA1 pathogenesis. How these interactions might contribute to the disease process and how they might cause toxicity in only a subset of neurons in SCA1 is not fully understood. |
| Hypothesis | Atx-1 may play a role in the regulation of gene expression |
| Name | dAtx-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in Flies |
| Subgoal | test the function of the AXH domain |
| Method | overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and Perrimon, 1993) and compared its effects to those of hAtx-1. |
| Results | Overexpression of dAtx-1 by Rhodopsin1(Rh1)-GAL4, which drives expression in the differentiated R1-R6 photoreceptor cells (Mollereau et al., 2000 and O'Tousa et al., 1985), results in neurodegeneration in the eye, as does overexpression of hAtx-1[82Q]. Although at 2 days after eclosion, overexpression of either Atx-1 does not show obvious morphological changes in the photoreceptor cells |
| Data | (data not shown), |
| Results | both genotypes show many large holes and loss of cell integrity at 28 days |
| Data | (Figures 1B-1D). |
| Results | Overexpression of dAtx-1 using the GMR-GAL4 driver also induces eye abnormalities. The external structures of the eyes that overexpress dAtx-1 show disorganized ommatidia and loss of interommatidial bristles |
| Data | (Figure 1F), |

Figure 4A shows that following RASV12 stimulation, p53 was stabilized and activated, and its target gene, p21cip1, was induced in all cases, indicating an intact p53 pathway in these cells.

a. Figure 4A shows that (Intratextual)

b. following RASV12 stimulation (Method)

c. p53 was stabilized and activated (Result)

d. and the target gene, p21cip1, was induced in all cases, (Result)

e. indicating an intact p53 pathway in these cells. (Implication)

issue #3: science happens in language (and language happens in our heads)
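The clause-by-clause reading of the Figure 4A sentence can be sketched as a list of typed segments. The segment labels (Intratextual, Method, Result, Implication) come from the slide; the `Segment` class, the `SegmentType` enum and the one-to-one pairing of labels with clauses a to e are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class SegmentType(Enum):
    INTRATEXTUAL = "Intratextual"
    METHOD = "Method"
    RESULT = "Result"
    IMPLICATION = "Implication"


@dataclass(frozen=True)
class Segment:
    label: str
    text: str
    type: SegmentType


# Assumed pairing: labels are applied to the clauses in the order shown on the slide.
SEGMENTS = [
    Segment("a", "Figure 4A shows that", SegmentType.INTRATEXTUAL),
    Segment("b", "following RASV12 stimulation,", SegmentType.METHOD),
    Segment("c", "p53 was stabilized and activated,", SegmentType.RESULT),
    Segment("d", "and its target gene, p21cip1, was induced in all cases,",
            SegmentType.RESULT),
    Segment("e", "indicating an intact p53 pathway in these cells.",
            SegmentType.IMPLICATION),
]


def rebuild(segments):
    """Join the segments back into the original sentence."""
    return " ".join(s.text for s in segments)


def of_type(segments, wanted):
    """Return the labels of all segments with the given type."""
    return [s.label for s in segments if s.type is wanted]


print(rebuild(SEGMENTS))
print(of_type(SEGMENTS, SegmentType.RESULT))
```

Keeping the segments in order means nothing of the sentence is lost, while a query such as "results only" no longer has to treat the implication in clause e as an observed fact.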

Conceptual Realm: State Present

Experimental Realm: Event Past

Argumentational Realm: Instantaneous Present

Discourse Progression Axis: Instantaneous Present

Research Progression Axis: Present Perfect

language happens in our head: tense use in biology

Facts in the eternal present

Endogenous small RNAs (miRNAs) regulate gene expression by mechanisms conserved across metazoans.

I sing of golden-throned Hera whom Rhea bare. Queen of the immortals is she, surpassing all in beauty: she is the sister and the wife of loud-thundering Zeus, --the glorious one whom all the blessed throughout high Olympus reverence and honor.

Events in the simple past

Vehicle-treated animals spent equivalent time investigating a juvenile in the first and second sessions in experiments conducted in the NAC and the striatum: T1 values were 122 ± 6 s and 114 ± 5 s.

Now the wooers turned to the dance and to gladsome song, and made them merry, and waited till evening should come; and as they made merry dark evening came upon them.

Events with embedded facts

We also generated BJ/ET cells expressing the RASV12-ERTAM chimera gene, which is only active when tamoxifen is added (De Vita et al, 2005).

And she took her mighty spear, tipped with sharp bronze, heavy and huge and strong, wherewith she vanquishes the ranks of men-of warriors, with whom she is wroth, she, the daughter of the mighty sire.

Attribution in the present perfect

miRNAs have emerged as important regulators of development and control processes such as cell fate determination and cell death (Abrahante et al., 2003, Brennecke et al., 2003, Chang et al., 2004, Chen et al., 2004, Johnston and Hobert, 2003, Lee et al., 1993, ...

In this book I have had old stories written down, as I have heard them told by intelligent people, concerning chiefs who have held dominion in the northern countries, and who spoke the Danish tongue; and also concerning some of their family branches, according to what has been told me.

Implications are hedged, and in the present tense

These results indicate that although miR-372&3 confer complete protection to oncogene-induced senescence in a manner similar to p53 inactivation, the cellular response to DNA damage remains intact

Now it is said that ever since then whenever the camel sees a place where ashes have been scattered, he wants to get revenge with his enemy the rat and stomps and rolls in the ashes hoping to get the rat

tense use in science and mythology
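The tense patterns listed on this slide can be read as a rough lookup from verb tense to segment type. The sketch below is only that lookup; the `Tense` enum and `likely_segment_type` function are hypothetical names, tense and hedging are assumed to be detected elsewhere, and mixed cases such as events with embedded facts are not covered.

```python
from enum import Enum


class Tense(Enum):
    PRESENT = "present"
    SIMPLE_PAST = "simple past"
    PRESENT_PERFECT = "present perfect"


def likely_segment_type(tense: Tense, hedged: bool) -> str:
    """Map a clause's main-verb tense to the segment type the slide associates with it."""
    if tense is Tense.PRESENT:
        # Unhedged present: fact. Hedged present: implication.
        return "implication" if hedged else "fact"
    if tense is Tense.SIMPLE_PAST:
        return "event"
    # Present perfect is what the slide lists for attribution.
    return "attribution"


# "miRNAs regulate gene expression ..."
print(likely_segment_type(Tense.PRESENT, hedged=False))
# "These results indicate that ..."
print(likely_segment_type(Tense.PRESENT, hedged=True))
```

The mapping is a heuristic, not a rule: it describes a tendency in biology papers and would need to be checked against annotated text before being relied on.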

[Diagram: Concepts (KnownFact, KnownFact); Experiment 1 and Experiment 2, each with Goal, Method, Data, Result]

issue #4: ‘A fact is a claim, agreed by a committee’

Hypothesis

To investigate the possibility that miR-372 and miR-373 suppress the expression of LATS2, we...

Implication

Therefore, these results point to LATS2 as a mediator of the miR-372 and miR-373 effects on cell proliferation and tumorigenicity,

Fact

two miRNAs, miRNA-372 and -373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006).

Raver-Shapira et al., JMolCell 2007

miR-372 and miR-373 target the Lats2 tumor suppressor (Voorhoeve et al., 2006)

Yabuta, JBioChem 2007

Voorhoeve et al., 2006
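The progression on this slide, from hypothesis to implication to a fact restated by later papers, can be sketched as statements carrying an epistemic status. The `Status` and `Statement` names are hypothetical, and the assignment of each quote to a paper follows the slide layout, which is an assumption.

```python
from dataclasses import dataclass
from enum import IntEnum


class Status(IntEnum):
    HYPOTHESIS = 1   # "To investigate the possibility that ..."
    IMPLICATION = 2  # "these results point to ..."
    CITED_FACT = 3   # restated by a later paper, with a citation


@dataclass(frozen=True)
class Statement:
    text: str
    stated_in: str
    status: Status


# Assumption: which paper each quote belongs to is inferred from the slide layout.
STATEMENTS = [
    Statement("To investigate the possibility that miR-372 and miR-373 "
              "suppress the expression of LATS2, we...",
              "Voorhoeve et al., 2006", Status.HYPOTHESIS),
    Statement("these results point to LATS2 as a mediator of the miR-372 "
              "and miR-373 effects",
              "Voorhoeve et al., 2006", Status.IMPLICATION),
    Statement("miRNA-372 and -373 function as potential novel oncogenes "
              "by inhibition of LATS2 expression",
              "Raver-Shapira et al., 2007", Status.CITED_FACT),
    Statement("miR-372 and miR-373 target the Lats2 tumor suppressor",
              "Yabuta, 2007", Status.CITED_FACT),
]


def citing_papers(statements):
    """Papers that restate the claim as a fact."""
    return [s.stated_in for s in statements if s.status is Status.CITED_FACT]


def strongest_status(statements):
    return max(s.status for s in statements)


print(citing_papers(STATEMENTS))
print(strongest_status(STATEMENTS).name)
```

The hedging drops away as the claim is cited, which is the sense in which a fact is a claim agreed on by others; a bare triple records only the last stage.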

so should we just keep reading papers?

PHC undergo Growth arrest

possible representation: hypotheses, evidence and relationships

[Diagram: Paper A and Paper B are each shown with the elements implication, results, method, goal, fact, fact, and with attached data items (data 1 to data 6); the two papers are connected by links labelled "underpinning" and "method link"]
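One way to read this representation is as a small graph in which a fact in one paper can be traced back to the data behind it in another. The sketch below assumes that data 1 to 3 belong to Paper A and that Paper A's implication is what Paper B takes over as a fact; the node names and the edge labels `underpins`, `supports` and `cited_as` are hypothetical.

```python
from collections import defaultdict

# (source, relation, target); all names are illustrative assumptions.
EDGES = [
    ("A:data 1", "underpins", "A:results"),
    ("A:data 2", "underpins", "A:results"),
    ("A:data 3", "underpins", "A:results"),
    ("A:results", "supports", "A:implication"),
    ("A:implication", "cited_as", "B:fact"),
]


def evidence_for(node, edges=EDGES):
    """Walk the edges backwards from a node and collect every data item reached."""
    incoming = defaultdict(list)
    for src, _relation, dst in edges:
        incoming[dst].append(src)

    found, seen, stack = set(), set(), [node]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for src in incoming.get(current, []):
            if ":data" in src:
                found.add(src)
            stack.append(src)
    return found


print(sorted(evidence_for("B:fact")))
```

Asking for the evidence behind `B:fact` returns the three data items of Paper A, which is the kind of question a bare triple cannot answer.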

HYPotheses, Evidence and Relationships

- Goal: Align and expand existing efforts on detection and analysis of Hypotheses, Evidence & Relationships

- Partners:

- Harvard/MGH: SWAN, ARF

- Open University: Cohere

- Oxford University: CiTO, eLearning/Rhetoric

- DERI: SALT, aTags

- University of Trento: LiquidPub

- Xerox Research: XIP hypothesis identifier

- U Tilburg: ML for Science

- Elsevier, UUtrecht: Discourse analysis of biology

Hypothesis 22: Intramembranous Aβ dimer may be toxic.

Derived from: POSTAT_CONTRIBUTION(This essay explores the possibility that a fraction of these Abeta peptides never leave the membrane lipid bilayer after they are generated, but instead exert their toxic effects by competing with and compromising the functions of intramembranous segments of membrane-bound proteins that serve many critical functions.)

W3C HCLS Sig Rhetorical Document Task

- Part of subgroup on discourse structure

- Goal: come up with a format for authors to explicitly create rhetorical/argumentational structure

- Make life of annotators easier!

- Please correct our ‘Pharma Use case’!

- http://esw.w3.org/topic/HCLSIG/SWANSIOC/Actions/RhetoricalStructure/

some things elsevier is doing

three dimensions of annotation

[Figure axes: automated / semi-automated / manual; author/editor / typesetter/EW/SD / reader/curator/data mining; document / claim / triple / entity / collection / data. Labelled examples: Automated Copy Editing, Reflect]

XMP RDF in all our PDFs: Dublin Core + PRISM
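As a rough illustration of what Dublin Core plus PRISM metadata in RDF/XML can look like, the sketch below builds a small `rdf:Description` with Python's standard library. The choice of properties, the PRISM namespace version and the values are assumptions for illustration, not a description of the metadata Elsevier actually embeds; the outer `x:xmpmeta` wrapper and packet markers of a real XMP packet are omitted.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
# Assumption: PRISM basic namespace, version 2.0.
PRISM = "http://prismstandard.org/namespaces/basic/2.0/"
XML = "http://www.w3.org/XML/1998/namespace"

for prefix, uri in (("rdf", RDF), ("dc", DC), ("prism", PRISM)):
    ET.register_namespace(prefix, uri)


def build_metadata(title, creators, publication_name):
    """Build an rdf:RDF element with a few Dublin Core and PRISM properties."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": ""})

    # XMP stores dc:title as a language alternative (rdf:Alt).
    alt = ET.SubElement(ET.SubElement(desc, f"{{{DC}}}title"), f"{{{RDF}}}Alt")
    li = ET.SubElement(alt, f"{{{RDF}}}li", {f"{{{XML}}}lang": "x-default"})
    li.text = title

    # XMP stores dc:creator as an ordered list (rdf:Seq).
    seq = ET.SubElement(ET.SubElement(desc, f"{{{DC}}}creator"), f"{{{RDF}}}Seq")
    for name in creators:
        ET.SubElement(seq, f"{{{RDF}}}li").text = name

    ET.SubElement(desc, f"{{{PRISM}}}publicationName").text = publication_name
    return root


# Hypothetical values, for illustration only.
root = build_metadata("An example article", ["A. Author"], "An Example Journal")
print(ET.tostring(root, encoding="unicode"))
```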

but we all know she was wrong that day

Linked Data for Elsevier XML

<ce:section id=#123> this section argues that ‘the moon is made of cheese’

said @anita on Feb 25 2010

immutable, $$, proprietary

dynamic, personal, task-driven, - open?
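The split on this slide, between fixed published content and annotations that change over time, can be sketched as a stand-off annotation that points at a section instead of being stored inside it. The `Annotation` class, the `ex:` property names and the example URIs are hypothetical; only the section id, the quoted claim, the author handle and the date come from the slide.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Annotation:
    """A statement about a document fragment, stored outside the document."""
    target: str   # URI of the annotated section (hypothetical URI scheme)
    body: str     # what the section is said to argue
    author: str
    created: date

    def to_triples(self, annotation_uri: str):
        """Express the annotation as simple (subject, predicate, object) tuples."""
        return [
            (annotation_uri, "ex:annotates", self.target),
            (annotation_uri, "ex:argues", self.body),
            (annotation_uri, "ex:author", self.author),
            (annotation_uri, "ex:created", self.created.isoformat()),
        ]


note = Annotation(
    target="http://example.org/article#123",
    body="the moon is made of cheese",
    author="@anita",
    created=date(2010, 2, 25),
)
for triple in note.to_triples("http://example.org/annotation/1"):
    print(triple)
```

Because the annotation carries its own author and date, a later annotation can disagree with it without touching the published XML.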

some things to mark up: EMTREE

‘Community effort to establish an open, independent registry of Researcher Identifiers’