The Narrative Structure of Research Articles, or, Why Science is Like a Fairy Tale

Anita de Waard, VP Research Data Collaborations, Research Data Management Services, Elsevier


Overview

1. Discourse Comprehension 101

2. Story grammars and the Cycle of Scientific Investigation

3. How can we help scientists read?

Discourse Comprehension 101

• Letter < syllable < word < clause < sentence < discourse:

This is how linguistics is structured.

But it is not how we understand text!

• Kintsch and Van Dijk, ‘93: we read a text at three levels:

– surface code: literal text, exact words/syntax

– text base: preserves meaning, but not exact wording

– situation model: ‘microworld’ that the text is about: constructed inferentially through interaction between the text and background knowledge

• We use knowledge about text genre to activate a schema: this allows creation of the text base and situation model

Discourse Comprehension 101

Examples of schemas:

The structure of a research paper:

Introduction: “Create a Research Space”

• Establish a research territory

• Establish a niche

• Occupy the niche

Methods and Results: “Cycles of Scientific Investigation” (see below)

Discussion:

• Statement of principal findings

• Strengths and weaknesses of the study

• Relation to other studies

• Unanswered questions and future research

Thorndyke, P.W. (1977). Cognitive Structures in Comprehension and Memory of Narrative Discourse. Cognitive Psychology 9, 77-110.

A Story Grammar: The Story of Goldilocks and the Three Bears

Setting

– Time: Once upon a time

– Character: a little girl named Goldilocks

– Location: She went for a walk in the forest. Pretty soon, she came upon a house.

Theme

– Goal: She knocked and, when no one answered,

– Attempt: she walked right in.

Episode

– Name: At the table in the kitchen, there were three bowls of porridge.

– Subgoal: Goldilocks was hungry.

– Attempt: She tasted the porridge from the first bowl.

– Outcome: This porridge is too hot! she exclaimed.

– Attempt: So, she tasted the porridge from the second bowl.

– Outcome: This porridge is too cold, she said

– Attempt: So, she tasted the last bowl of porridge.

– Outcome: Ahhh, this porridge is just right,
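The story grammar above can be read as a nested data structure. The following is a minimal sketch, assuming the three top-level slots shown on the slide (Setting, Theme, Episode); the class names `Story` and `Episode` and the helper `slot_sequence` are hypothetical and are not taken from Thorndyke's paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Episode:
    name: str
    subgoal: str
    # Each step pairs an Attempt with its Outcome, in story order.
    steps: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class Story:
    setting: Dict[str, str]  # slots on the slide: Time, Character, Location
    theme: Dict[str, str]    # slots on the slide: Goal, Attempt
    episodes: List[Episode]


goldilocks = Story(
    setting={
        "Time": "Once upon a time",
        "Character": "a little girl named Goldilocks",
        "Location": "She went for a walk in the forest. "
                    "Pretty soon, she came upon a house.",
    },
    theme={
        "Goal": "She knocked and, when no one answered,",
        "Attempt": "she walked right in.",
    },
    episodes=[
        Episode(
            name="At the table in the kitchen, there were three bowls of porridge.",
            subgoal="Goldilocks was hungry.",
            steps=[
                ("She tasted the porridge from the first bowl.",
                 "This porridge is too hot! she exclaimed."),
                ("So, she tasted the porridge from the second bowl.",
                 "This porridge is too cold, she said"),
                ("So, she tasted the last bowl of porridge.",
                 "Ahhh, this porridge is just right,"),
            ],
        )
    ],
)


def slot_sequence(story: Story) -> List[str]:
    """Flatten a story back into the ordered list of grammar slots it fills."""
    labels = [f"Setting/{slot}" for slot in story.setting]
    labels += [f"Theme/{slot}" for slot in story.theme]
    for episode in story.episodes:
        labels += ["Episode/Name", "Episode/Subgoal"]
        for _attempt, _outcome in episode.steps:
            labels += ["Episode/Attempt", "Episode/Outcome"]
    return labels
```

Calling `slot_sequence(goldilocks)` returns the slot labels in reading order, which makes the repeated Attempt/Outcome cycle inside the episode explicit.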

Story Grammar For A Science Paper:

Paper: The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its Interaction with Gfi-1/Senseless Proteins

– Background: The mechanisms mediating SCA1 pathogenesis are still not fully understood, but some general principles have emerged.

– Objects of study: the Drosophila Atx-1 homolog (dAtx-1) which lacks a polyQ tract,

– Experimental setup: studied and compared in vivo effects and interactions to those of the human protein

– Research goal: Gain insight into how Atx-1's function contributes to SCA1 pathogenesis. How these interactions might contribute to the disease process and how they might cause toxicity in only a subset of neurons in SCA1 is not fully understood.

– Hypothesis: Atx-1 may play a role in the regulation of gene expression

– Name: dAtx-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in Flies

– Subgoal: test the function of the AXH domain

– Method: overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and Perrimon, 1993) and compared its effects to those of hAtx-1.

– Results: Overexpression of dAtx-1 by Rhodopsin1(Rh1)-GAL4, which drives expression in the differentiated R1-R6 photoreceptor cells (Mollereau et al., 2000 and O'Tousa et al., 1985), results in neurodegeneration in the eye, as does overexpression of hAtx-1[82Q]. Although at 2 days after eclosion, overexpression of either Atx-1 does not show obvious morphological changes in the photoreceptor cells

– Data: (data not shown),

– Results: both genotypes show many large holes and loss of cell integrity at 28 days

Rubber hits the road in Results: Cycles of Scientific Investigation

© Gully Burns, 2011

CoSI in action:

© Gully Burns, 2011

1. Importantly, our results so far indicate that the expression of miR-372&3 did not reduce the activity of RASV12, as these cells were still growing faster than normal cells and were tumorigenic, for which RAS activity is indispensable (Hahn et al, 1999 and Kolfschoten et al, 2005).

2. To shed more light on this aspect, we examined the effect of miR-372&3 expression on p53 activation in response to oncogenic stimulation.

3. We used for this experiment BJ/ET cells containing p14ARFkd because, following RASV12 treatment, in those cells p53 is still activated but more clearly stabilized than in parental BJ/ET cells (Voorhoeve and Agami, 2003), resulting in a sensitized system for slight alterations in p53 in response to RASV12.

4. Figure 4A shows that following RASV12 stimulation, p53 was stabilized and activated, and its target gene, p21cip1, was induced in all cases, indicating an intact p53 pathway in these cells.

In defense of the clause as the unit of thought:

1. Importantly, our results so far indicate that the expression of miR-372&3 did not reduce the activity of RASV12, as these cells were still growing faster than normal cells and were tumorigenic, for which RAS activity is indispensable (Hahn et al, 1999 and Kolfschoten et al, 2005).

2. To shed more light on this aspect, we examined the effect of miR-372&3 expression on p53 activation in response to oncogenic stimulation.

3. We used for this experiment BJ/ET cells containing p14ARFkd because, following RASV12 treatment, in those cells p53 is still activated but more clearly stabilized than in parental BJ/ET cells (Voorhoeve and Agami, 2003), resulting in a sensitized system for slight alterations in p53 in response to RASV12.

4. Figure 4A shows that following RASV12 stimulation, p53 was stabilized and activated, and its target gene, p21cip1, was induced in all cases, indicating an intact p53 pathway in these cells.

Clause types: Fact, Goal, Method, Result, Implication, Regulatory clause

Both seminomas and the EC component of nonseminomas share features with ES cells. To exclude that the detection of miR-371-3 merely reflects its expression pattern in ES cells, we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004). In many of the miR-371-3 expressing seminomas and nonseminomas, miR-302a-d was undetectable (Figs S7 and S8), suggesting that miR-371-3 expression is a selective event during tumorigenesis.

Fact: Both seminomas and the EC component of nonseminomas share features with ES cells.

Goal: To exclude that

Hypothesis: the detection of miR-371-3 merely reflects its expression pattern in ES cells,

Method: we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004).

Result: In many of the miR-371-3 expressing seminomas and nonseminomas, miR-302a-d was undetectable (Figs S7 and S8),

Reg-Implication: suggesting that

Implication: miR-371-3 expression is a selective event during tumorigenesis.

Realms: Conceptual knowledge; Experimental Evidence
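A clause-level segmentation like the one on this slide can be held as a list of labelled segments. The sketch below is illustrative only: pairing each label with a segment is one reading of the slide, and the `REALM` mapping (which labels count as conceptual knowledge, experimental evidence, or regulatory text) is an assumption made for the example, as are the names `Segment`, `by_realm` and `reassemble`.

```python
from collections import namedtuple

Segment = namedtuple("Segment", ["label", "text"])

# Label-to-segment pairing is one reading of the slide, not a given.
segments = [
    Segment("Fact", "Both seminomas and the EC component of nonseminomas "
                    "share features with ES cells."),
    Segment("Goal", "To exclude that"),
    Segment("Hypothesis", "the detection of miR-371-3 merely reflects its "
                          "expression pattern in ES cells,"),
    Segment("Method", "we tested by RPA miR-302a-d, another ES cells-specific "
                      "miRNA cluster (Suh et al, 2004)."),
    Segment("Result", "In many of the miR-371-3 expressing seminomas and "
                      "nonseminomas, miR-302a-d was undetectable (Figs S7 and S8),"),
    Segment("Reg-Implication", "suggesting that"),
    Segment("Implication", "miR-371-3 expression is a selective event during "
                           "tumorigenesis."),
]

# Assumption: realm of each clause type; anything unlisted is treated as regulatory.
REALM = {
    "Fact": "conceptual",
    "Hypothesis": "conceptual",
    "Implication": "conceptual",
    "Method": "experimental",
    "Result": "experimental",
}


def by_realm(segs, realm):
    """Return the text of every segment whose clause type falls in `realm`."""
    return [s.text for s in segs if REALM.get(s.label, "regulatory") == realm]


def reassemble(segs):
    """Join the segments back into the running text of the paragraph."""
    return " ".join(s.text for s in segs)
```

Filtering with `by_realm(segments, "regulatory")` isolates the short connecting clauses ("To exclude that", "suggesting that") that steer the reader between the two realms.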

Clause, realm and tense:

Facts in the eternal present

Endogenous small RNAs (miRNAs) regulate gene expression by mechanisms conserved across metazoans.

I sing of golden-throned Hera whom Rhea bare. Queen of the immortals is she, surpassing all in beauty: she is the sister and the wife of loud-thundering Zeus, --the glorious one whom all the blessed throughout high Olympus reverence and honor.

Events in the simple past

Vehicle-treated animals spent equivalent time investigating a juvenile in the first and second sessions in experiments conducted in the NAC and the striatum: T1 values were 122 ± 6 s and 114 ± 5 s.

Now the wooers turned to the dance and to gladsome song, and made them merry, and waited till evening should come; and as they made merry dark evening came upon them.

Events with embedded facts

We also generated BJ/ET cells expressing the RASV12-ERTAM chimera gene, which is only active when tamoxifen is added (De Vita et al, 2005).

And she took her mighty spear, tipped with sharp bronze, heavy and huge and strong, wherewith she vanquishes the ranks of men-of warriors, with whom she is wroth, she, the daughter of the mighty sire.

Attribution in the present perfect

miRNAs have emerged as important regulators of development and control processes such as cell fate determination and cell death (Abrahante et al., 2003, Brennecke et al., 2003, Chang et al., 2004, Chen et al., 2004, Johnston and Hobert, 2003, Lee et al., 1993)

In this book I have had old stories written down, as I have heard them told by intelligent people, concerning chiefs who have held dominion in the northern countries, and who spoke the Danish tongue; and also concerning some of their family branches, according to what has been told me.

Implications are hedged, and in the present tense

These results indicate that although miR-372&3 confer complete protection to oncogene-induced senescence in a manner similar to p53 inactivation, the cellular response to DNA damage remains intact

Now it is said that ever since then whenever the camel sees a place where ashes have been scattered, he wants to get revenge with his enemy the rat and stomps and rolls in the ashes hoping to get the rat
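The pairing of clause type and tense in these examples can be written down as a small lookup table. This is a sketch under stated assumptions: the tense of each clause is labelled by hand (no tense detection is attempted), the clause texts are verbatim openings of the science examples above, and the names `EXPECTED_TENSE`, `mismatches` and `candidate_types` are hypothetical.

```python
# Clause type -> tense, as stated in the slide headings above.
EXPECTED_TENSE = {
    "Fact": "present",
    "Event": "simple past",
    "Attribution": "present perfect",
    "Implication": "present",
}

# (clause type, hand-labelled tense, opening words of the example clause)
clauses = [
    ("Fact", "present", "Endogenous small RNAs (miRNAs) regulate gene expression"),
    ("Event", "simple past", "Vehicle-treated animals spent equivalent time"),
    ("Event", "simple past", "We also generated BJ/ET cells"),
    ("Fact", "present", "which is only active when tamoxifen is added"),
    ("Attribution", "present perfect", "miRNAs have emerged as important regulators"),
    ("Implication", "present", "These results indicate that"),
]


def mismatches(annotated):
    """Return the clauses whose labelled tense differs from the expected one."""
    return [c for c in annotated if EXPECTED_TENSE[c[0]] != c[1]]


def candidate_types(tense):
    """Return every clause type that is expected to use the given tense."""
    return sorted(k for k, v in EXPECTED_TENSE.items() if v == tense)
```

`candidate_types("present")` returns both `Fact` and `Implication`, so tense alone cannot separate the two; on the slide it is the hedging ("These results indicate that") that marks the implication.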

Tense use in science and mythology:

Summing up:

1. Discourse Comprehension 101:

– We read gobs of text and integrate these with our knowledge networks

– We understand through schemas

2. Story grammars and the Cycle of Scientific Investigation:

– Papers are like fairytales

– Within Results, Cycles of Scientific Investigation connect data to claims

– Tense helps identify the realm of the claim (like in mythology)

3. How can we use this to help scientists read?

So how can this understanding help us help scientists read papers?

• Why do we read? To learn, i.e.: obtain the knowledge contained within the text and integrate it with what we already know.

• What do we read? Things that are ‘interesting’:

– Pertinent

– Possibly/probably true

– Novel, but in agreement with what I know

• How do we read?

Represent a paper as Collections of Noun Phrases?

• human breast cancer

• noninvasive MCF7-Ras

• antisense oligonucleotides

• high-grade malignancy

• cell viability

• retroviral vector

• miR-31

• cloned

• transiently expressed miRNA sponges

Is it pertinent? -> Possibly…

Is it true? -> ?

Is it new, but in agreement with what I know? -> ?

Represent a paper as Triples (Two Nouns and a Verb):

• miR-31 PREVENT acquisition of aggressive traits

• miR-31 INHIBIT noninvasive MCF7-Ras cells

• miR-31 ENHANCE invasion

• cell viability AFFECT inhibitor

• miR-31 expression DEPRIVE metastatic cells

Is it pertinent? -> Possibly…

Is it true? -> ?

Is it new, but in agreement with what I know? -> ?
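The triples on this slide are naturally stored as (subject, verb, object) tuples. A minimal sketch, assuming a plain in-memory list; `index_by_subject` is a hypothetical helper and is not part of any extraction tool mentioned in the talk.

```python
from collections import defaultdict

# (subject, verb, object) triples exactly as listed on the slide.
triples = [
    ("miR-31", "PREVENT", "acquisition of aggressive traits"),
    ("miR-31", "INHIBIT", "noninvasive MCF7-Ras cells"),
    ("miR-31", "ENHANCE", "invasion"),
    ("cell viability", "AFFECT", "inhibitor"),
    ("miR-31 expression", "DEPRIVE", "metastatic cells"),
]


def index_by_subject(items):
    """Group (verb, object) pairs under their subject, keeping input order."""
    index = defaultdict(list)
    for subject, verb, obj in items:
        index[subject].append((verb, obj))
    return dict(index)


by_subject = index_by_subject(triples)
verbs_for_mir31 = [verb for verb, _obj in by_subject["miR-31"]]
```

Looking up `by_subject["miR-31"]` shows why the slide answers "Is it true?" with a question mark: the triple "miR-31 ENHANCE invasion" drops the context of the source sentence, in which it is the suppression of miR-31 that enhanced invasion.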

Represent a paper’s Metadiscourse:

The preceding observations demonstrated that X expression deprives Y cells of attributes associated with Z.

We next asked whether X also prevents the acquisition of A traits by B cells.

To do so, we transiently inhibited X in C cells with either D or E.

Both approaches inhibited X function by > 4.5-fold (Figure S7A).

Suppression of X enhanced invasion by 20-fold and motility by 5-fold, but F was unaffected by either inhibitor (Figure 3A; Figure S7B).

The E sponge reduced X function by 2.5-fold, but did not affect the activity of other known Js (Figures S8A and S8B).

Collectively, these data indicated that sustained X activity is necessary to prevent the acquisition of Z traits by both K and untransformed B cells.

Is it pertinent? -> Need content

Is it true? -> Sounds likely! I know this stuff!

Is it new, but in agreement with what I know? -> Need content

Represent a Paper as a Set of Claims and Evidence:

Claim:

• sustained miR-31 activity is necessary to prevent the acquisition of aggressive traits by both tumor cells and untransformed breast epithelial cells

Evidence: Method:

• We transiently inhibited miR-31 in noninvasive MCF7-Ras cells with either antisense oligonucleotides or miRNA sponges.

Evidence: Result:

• Both approaches inhibited miR-31 function by >4.5-fold (Figure S7A).

• Suppression of miR-31 enhanced invasion by 20-fold and motility by 5-fold, but cell viability was unaffected by either inhibitor (Figure 3A; Figure S7B).

• The miR-31 sponge reduced miR-31 function by 2.5-fold, but did not affect the activity of other known antimetastatic miRNAs (Figures S8A and S8B).

Is it pertinent? -> Probably

Is it true? -> Sounds likely!

Is it new, but in agreement with what I know? -> Check/know
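A claim linked to its evidence can be sketched as a small record type. The example below is an assumption-laden illustration rather than a description of any existing tool: the classes `Claim` and `Evidence`, the helper `figure_refs`, and the regular expression for parenthesised figure references are all hypothetical, and the claim text is shortened to its opening words.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Evidence:
    kind: str  # "Method" or "Result", as on the slide
    text: str


@dataclass
class Claim:
    text: str
    evidence: List[Evidence] = field(default_factory=list)


# Matches a parenthesised reference such as "(Figure S7A)" or "(Figures S8A and S8B)".
FIGURE_REF = re.compile(r"\((Figures? [^)]*)\)")


def figure_refs(claim: Claim) -> List[str]:
    """Collect the figure references cited by a claim's evidence, in order."""
    refs: List[str] = []
    for item in claim.evidence:
        refs.extend(FIGURE_REF.findall(item.text))
    return refs


claim = Claim(
    text="sustained miR-31 activity is necessary to prevent the acquisition "
         "of aggressive traits",
    evidence=[
        Evidence("Method", "We transiently inhibited miR-31 in noninvasive MCF7-Ras "
                           "cells with either antisense oligonucleotides or miRNA sponges."),
        Evidence("Result", "Both approaches inhibited miR-31 function by >4.5-fold "
                           "(Figure S7A)."),
        Evidence("Result", "Suppression of miR-31 enhanced invasion by 20-fold and "
                           "motility by 5-fold, but cell viability was unaffected by "
                           "either inhibitor (Figure 3A; Figure S7B)."),
        Evidence("Result", "The miR-31 sponge reduced miR-31 function by 2.5-fold, but "
                           "did not affect the activity of other known antimetastatic "
                           "miRNAs (Figures S8A and S8B)."),
    ],
)
```

`figure_refs(claim)` then lists the figures a reader would need to inspect before accepting the claim, which is the link from claim to data that the slide argues for.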

Show who wrote it, and where:

Is it pertinent? -> Possibly

Is it true? -> Probably!

Is it new, but in agreement with what I know? -> Need background

So we probably need all of these:

• Surface code provides noun phrases and triples that offer pointers re. topical relevance

• Text base and situation model are created through specific metadiscourse conventions (e.g. refs at the end) that create a biological reasoning model:

• This can be expressed as a set of claims, linked to evidence, that can help represent key points in the paper

• Journal name and author’s affiliation help define schema and provide ‘willingness to be convinced’ socially/interpersonally.

Hypothesis: We next asked whether …

Goal/Method: To do so, we transiently inhibited …

Result: Suppression of X enhanced invasion …

Results: but F was unaffected … (Figure 3A). …

Implication: Collectively, these data indicated that … .

But wait: there’s a wolf in the woods!

This article has been retracted: please see Elsevier Policy on Article Withdrawal (http://www.elsevier.com/locate/withdrawalpolicy). This article has been retracted at the request of the authors.

Our study reported that miR-31 is a regulator of multiple mRNAs important for different aspects of breast cancer metastasis. We recently identified concerns with several figure panels in which original data were compiled from different replicate experiments in order to assemble the presented figure. The scope of the figure preparation issues includes compiling data from independent experiments to present them as one internally controlled experiment, statistical analyses based on technical replicates that are not reflective of the biological replicates, and comparisons of selectively chosen data points from multiple experiments. As many of the published figures are therefore not appropriate or accurate representations of the original data, we believe that the responsible course of action is to retract the paper. We apologize for any inconvenience we have caused.

In Summary:

1. Discourse Comprehension 101

2. Story grammars and the Cycle of Scientific Investigation

3. How can we help scientists read?

– Tools that ‘read’ papers and allow easy access to claims and evidence

– Tools and practices that record data (=evidence) throughout the practice of creating it

– Tools that help us make sense out of all of this networked knowledge

– Cultural habits to support these practices.

For Change to Occur, We Need Networks of Collaboration:

Force11:

– Multi-stakeholder, member-driven organisation

– Unites scholars, tool developers, librarians, publishers, funding agencies etc. etc.

– E.g.: RRID initiative just got implemented in Cell: “STAR Methods: Structured, Transparent, Accessible Reporting.”

National Data Service:

– Multi-stakeholder group, based around supercomputing centres

– Aims to be a ‘connective tissue’ between data creation, curation, storage etc. projects.

– Inviting Pilots: two or more partners who have not worked together, interested in collaborating on a data-centric project to solve a real-world need

– E.g. Datasearch, Data Linking systems

RDA:

– Coleading Data publishing, linking group

– Colead Cost Recovery group, part of RDA US Sustainability effort

– Active in Chemistry, Earth Science groups, starting IG on Data Search

– SciDataCon, Sept 11-16, Denver, CO


Anita de Waard, VP Research Data Collaborations, Research Data Management Services, Elsevier

And we all live happily ever after….

Addendum: Can Computers Help Us Read?

Noun Phrases: some issues

• Problem 1: disambiguating terms (© GoPubMed):

– Hnrpa1 = Tis = Fli-2 = nuclear ribonucleoprotein A1 = helix destabilizing protein = single-strand binding protein = hnRNP core protein A1 = HDP-1 = topoisomerase-inhibitor suppressed.

– Cellulose 1,4-beta-cellobiosidase = exoglucanase

– COLD ≠ C.O.L.D. ≠ cold (runny nose) ≠ cold (low T)

• Problem 2: disambiguating entities (© M. Martone):

– 95 antibodies were (manually!) identified in 8 articles

– 52 did not contain enough information to determine the antibody used

– Some provided details in other papers

– Failed to give species, clonality, vendor, or catalog number
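Problem 1 above, many names for one thing, is commonly handled with a synonym table that maps every variant to one canonical term. The sketch below uses only the synonyms listed on the slide; choosing the first name in each group as the canonical form is an arbitrary assumption, and `build_lookup` and `normalize` are hypothetical helper names.

```python
# Synonym groups from the slide; the key of each group is an arbitrary canonical choice.
SYNONYMS = {
    "Hnrpa1": [
        "Tis",
        "Fli-2",
        "nuclear ribonucleoprotein A1",
        "helix destabilizing protein",
        "single-strand binding protein",
        "hnRNP core protein A1",
        "HDP-1",
        "topoisomerase-inhibitor suppressed",
    ],
    "Cellulose 1,4-beta-cellobiosidase": ["exoglucanase"],
}


def build_lookup(groups):
    """Map every variant (lower-cased) to the canonical term of its group."""
    lookup = {}
    for canonical, variants in groups.items():
        for name in [canonical] + variants:
            lookup[name.lower()] = canonical
    return lookup


def normalize(term, lookup):
    """Return the canonical term for `term`, or None if it is unknown."""
    return lookup.get(term.strip().lower())


LOOKUP = build_lookup(SYNONYMS)
```

Note the limit of this approach: lower-casing would merge "COLD" and "cold", which the slide lists as different things, so a table like this only helps with synonyms and does nothing for terms whose meaning depends on context.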

Noun Phrases: some progress

• Despite these difficulties, noun phrase recall/precision is quite high, e.g. I2B2 2011 [1], [2], others: 90%-98%

• Many tools, see [3] for a list; e.g. GoPubMed:

Triples: some issues:

• Contingent on good NP & VP detection

• Hard to parse text! E.g. a commercial tool gave:

– Extracted: insulin maintaining glucose homeostasis

Source: When insulin secretion cannot be increased adequately (type I diabetes defect) to overcome insulin resistance in maintaining glucose homeostasis, hyperglycemia and glucose intolerance ensues.

– Extracted: insulin may be involved glucose homeostasis

Source: Because PANDER is expressed by pancreatic beta-cells and in response to glucose in a similar way to those of insulin, PANDER may be involved in glucose homeostasis.

Triples: some progress:

Biological Expression Language [4]:

We provide evidence that these miRNAs are potential novel oncogenes participating in the development of human testicular germ cell tumors by numbing the p53 pathway, thus allowing tumorigenic growth in the presence of wild-type p53.

Increased abundance of miR-372 decreases activity of TP53

r(MIR:miR-372) -| tscript(p(HUGO:Trp53))

Context: cancer

SET Disease = "Cancer"

Activity of TP53 decreases cell growth

tscript(p(HUGO:Trp53)) -| bp(GO:"Cell Growth")
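The BEL statements above have the shape "subject relation object", so a toy reader can split them on the relation symbol. This is a minimal sketch, not a BEL parser: it handles only the two relation shorthands `-|` (decreases, as glossed on the slide) and `->` (increases), does not validate the terms themselves, and `parse_statement` is a hypothetical name.

```python
# Relation shorthands handled by this sketch; real BEL defines more.
RELATIONS = {
    "-|": "decreases",
    "->": "increases",
}


def parse_statement(statement):
    """Split 'subject REL object' into (subject, relation name, object)."""
    for symbol, name in RELATIONS.items():
        left, separator, right = statement.partition(f" {symbol} ")
        if separator:
            return left.strip(), name, right.strip()
    raise ValueError(f"no known relation in: {statement!r}")


first = parse_statement("r(MIR:miR-372) -| tscript(p(HUGO:Trp53))")
second = parse_statement('tscript(p(HUGO:Trp53)) -| bp(GO:"Cell Growth")')

# The object of the first statement is the subject of the second,
# so the two statements chain into a two-step causal path.
chained = first[2] == second[0]
```

Because both statements share the term `tscript(p(HUGO:Trp53))`, a reader or a program can chain them: miR-372 decreases TP53 activity, and TP53 activity decreases cell growth.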

Metadiscourse: why it matters

• Voorhoeve et al., 2006: “These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor suppressor LATS2.”

• Kloosterman and Plasterk, 2006: “In a genetic screen, miR-372 and miR-373 were found to allow proliferation of primary human cells that express oncogenic RAS and active p53, possibly by inhibiting the tumor suppressor LATS2 (Voorhoeve et al., 2006).”

• Yabuta et al., 2007: “[On the other hand,] two miRNAs, miRNA-372 and -373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006).”

• Okada et al., 2011: “Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the expression of Lats2, thereby allowing tumorigenic growth in the presence of p53 (Voorhoeve et al., 2006).”

“[Y]ou can transform .. fiction into fact just by adding or subtracting references”, Bruno Latour [5]

Adding Metadiscourse To Triples

Claim: Together, Lats2 and ASPP1 shunt p53 to proapoptotic promoters and promote the death of polyploid cells [1]. (…)

ORCA Value: Value = 3, Source = N, Basis = 0

Claim: Further biochemical characterization of hMOBs showed that only hMOB1A and hMOB1B interact with both LATS1 and LATS2 in vitro and in vivo [39]. (…)

ORCA Value: Value = 3, Source = N, Basis = Data

Claim: Our findings reveal that miR-373 would be a potential oncogene and it participates in the carcinogenesis of human esophageal cancer by suppressing LATS2 expression.

ORCA Value: Value = 1 or 2 ?, Source = Author, Basis = Data

Claim: Furthermore, we demonstrated that the direct inhibition of LATS2 protein was mediated by miR-373 and manipulated the expression of miR-373 to affect esophageal cancer cells growth.

ORCA Value: Value = 2 (or 3?), Source = Author, Basis = Data
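The ORCA annotations above can be attached to each claim as three extra fields. The sketch below only records what the slide shows and assigns no meaning to the numbers; where the slide gives an uncertain Value ("1 or 2 ?", "2 (or 3?)"), both candidates are kept. The class `AnnotatedClaim`, the helpers `unambiguous` and `select`, and the shortened claim texts are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AnnotatedClaim:
    text: str                 # opening words of the claim on the slide
    values: Tuple[int, ...]   # candidate ORCA Values; two where the slide shows a "?"
    source: str               # "N" or "Author", as on the slide
    basis: str                # "0" or "Data", as on the slide


claims = [
    AnnotatedClaim("Together, Lats2 and ASPP1 shunt p53", (3,), "N", "0"),
    AnnotatedClaim("Further biochemical characterization of hMOBs", (3,), "N", "Data"),
    AnnotatedClaim("Our findings reveal that miR-373", (1, 2), "Author", "Data"),
    AnnotatedClaim("Furthermore, we demonstrated that", (2, 3), "Author", "Data"),
]


def unambiguous(items: List[AnnotatedClaim]) -> List[AnnotatedClaim]:
    """Keep only the claims whose Value annotation has a single candidate."""
    return [c for c in items if len(c.values) == 1]


def select(items: List[AnnotatedClaim],
           source: Optional[str] = None,
           basis: Optional[str] = None) -> List[AnnotatedClaim]:
    """Filter claims by Source and/or Basis; None means 'do not filter'."""
    return [
        c for c in items
        if (source is None or c.source == source)
        and (basis is None or c.basis == basis)
    ]
```

In this small sample, `select(claims, source="Author", basis="Data")` returns exactly the two claims whose Value the annotator could not settle on.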

Claims and Evidence: some issues:

• Data2Semantics [11]: linking clinical guidelines to evidence. Inconsistency within guideline and guidelines v. evidence:

– Studies have demonstrated inconsistent results regarding the use of such markers of inflammation as C-reactive protein (CRP), interleukins-6 (IL-6) and -8, and procalcitonin (PCT) in neutropenic patients with cancer [55–57].

– [55]: PCT and IL-6 are more reliable markers than CRP for predicting bacteremia in patients with febrile neutropenia

– [56]: In conclusion, daily measurement of PCT or IL-6 could help identify neutropenic patients with a stable course when the fever lasts >3 d. …, it would reduce adverse events and treatment costs.

– [57]: Our study supports the value of PCT as a reliable tool to predict clinical outcome in febrile neutropenia.

• Drug Interaction Knowledgebase [12]: how to identify evidence?

– R-citalopram_is_not_substrate_of_cyp2c19:

– At 10 µM R- or S-CT, ketoconazole reduced reaction velocity to 55-60% of control, quinidine to 80%, and omeprazole to 80-85% of control (Fig. 6).

Claims and Evidence: some progress

• Defining ‘salient knowledge components’ in text:

– Argumentative zones, CoreSC can both be found

– Blake, Claim networks (more soon!)

– Claimed Knowledge Updates (Sandor/de Waard, [13]):