Wido van Peursen, VU University Amsterdam, Faculty of Theology


Page 1: Wido van Peursen, VU University Amsterdam, Faculty of Theology

Wido van Peursen, VU University Amsterdam, Faculty of Theology

Page 2: Wido van Peursen, VU University Amsterdam, Faculty of Theology

1. The corpus: Hebrew Bible
2. The WIVU Database
3. CLARIN-project: SHEBANQ
4. NWO-project: Syntactic Diversity in BH
5. Case study: Judges 4 and 5

Page 3: Wido van Peursen, VU University Amsterdam, Faculty of Theology

• Ca. 400,000 words
• Probably composed over a period of ca. 1000 years (1200-200 BC)
• Complex transmission history
• Oldest complete MS: Codex Leningradensis, 1008/9 AD
• Various linguistic layers (e.g. vowel signs)
• No native speakers

Page 4: Wido van Peursen, VU University Amsterdam, Faculty of Theology

WIVU database of the Hebrew Bible [WIVU = Werkgroep Informatica Vrije Universiteit]
• Under development since the 1970s
• Linguistic levels: morphology (encoding rather than tagging!), words, phrases, clauses, sentences, text hierarchy

Page 5: Wido van Peursen, VU University Amsterdam, Faculty of Theology
Page 6: Wido van Peursen, VU University Amsterdam, Faculty of Theology
Page 7: Wido van Peursen, VU University Amsterdam, Faculty of Theology

1. The corpus: Hebrew Bible
2. The WIVU Database
3. CLARIN-project: SHEBANQ
4. NWO-project: Syntactic Diversity in BH
5. Case study: Judges 4 and 5

Page 8: Wido van Peursen, VU University Amsterdam, Faculty of Theology

System for HEBrew text: ANnotations for Queries and markup

Page 9: Wido van Peursen, VU University Amsterdam, Faculty of Theology

Challenges:

1. No dedicated space on the web where an authorized version of this resource is guaranteed to exist.

2. No possibility to annotate it, link to it or build (open source) tools around it.

3. Results of existing queries cannot be shown on the web.

4. EMDROS is maintained by a one-person private company.

5. Mainly used by specialists in Bible & Computer.

Page 10: Wido van Peursen, VU University Amsterdam, Faculty of Theology

Mission:
• To build a bridge between the linguistically annotated Hebrew text corpus and biblical scholars.

Three steps:
(1) make text & annotations available to scholars;
(2) demonstrate how queries can function to address research questions: repository of saved queries;
(3) give textual scholarship a more empirical basis, by creating the opportunity of unique identifiers referring to saved queries.

Page 11: Wido van Peursen, VU University Amsterdam, Faculty of Theology

Page 12: Wido van Peursen, VU University Amsterdam, Faculty of Theology

Page 13: Wido van Peursen, VU University Amsterdam, Faculty of Theology

Example: “in-his-feet”:
a. “on foot” or
b. “in his footsteps”.

Disambiguation:
1. intuitive/contextual or
2. on the basis of pattern recognition (participants/agreement)

Page 14: Wido van Peursen, VU University Amsterdam, Faculty of Theology

[she-sang <Pr>] [Deborah and Barak <Su>]

Page 15: Wido van Peursen, VU University Amsterdam, Faculty of Theology

1. The corpus: Hebrew Bible
2. The WIVU Database
3. CLARIN-project: SHEBANQ
4. NWO-project: Syntactic Diversity in BH
5. Case study: Judges 4 and 5

Page 16: Wido van Peursen, VU University Amsterdam, Faculty of Theology

Does Syntactic Variation reflect Language Change? Tracing Syntactic Diversity in Biblical Hebrew Texts

Page 17: Wido van Peursen, VU University Amsterdam, Faculty of Theology

Explanations for linguistic diversity:
• Genre
• Chronology
• Language contact (Aramaic)
• Dialects
• Textual transmission
• Oral versus written layers

Page 18: Wido van Peursen, VU University Amsterdam, Faculty of Theology

Limitations in current research:
• Focus on separate Bible books
• Methodological presuppositions
• Focus on lexical items or set phrases
• Failure to make use of methods for researching linguistic variation and change.
• Failure to incorporate insights into syntactic differences between independent / dependent clauses and between narration / direct speech.

Page 19: Wido van Peursen, VU University Amsterdam, Faculty of Theology

Our approach
• Focus on syntax in three project components: phrase level, clause level, text level.
• Synthesis: integration of congruous and contradicting tendencies.
• Extra-biblical texts used as points of comparison.

Page 20: Wido van Peursen, VU University Amsterdam, Faculty of Theology

1. The corpus: Hebrew Bible
2. The WIVU Database
3. CLARIN-project: SHEBANQ
4. NWO-project: Syntactic Diversity in BH
5. Case study: Judges 4 and 5

Page 21: Wido van Peursen, VU University Amsterdam, Faculty of Theology

These chapters deal with the battle
• of Deborah, Barak and Israelite tribes
• against the Canaanite king Jabin and his army captain Sisera.

Differences, e.g.:
• 4 is prose, 5 is poetry.
• Main figures (Jabin absent in 5).
• Tribes involved (only two in 4).

Page 22: Wido van Peursen, VU University Amsterdam, Faculty of Theology

• 4 depends on 5: Wellhausen 1878; Halpern 1983; Houston 1997; Neef 2002 and many others.
• 5 depends on 4: Bechmann 1989; Waltisberg 1999.
• Common source/tradition: Richter 1963; Younger 1991.
• Synchronous/sequential: Guest 1998; Reis 2005.

Page 23: Wido van Peursen, VU University Amsterdam, Faculty of Theology

1. Identification of ‘similar’ text segments on the basis of ‘distance’ (synopsis impossible).

2. Identification of text features that cause high similarity scores.

3. Analysis of the distribution of these features in the larger context of Judges and the Old Testament.

Page 24: Wido van Peursen, VU University Amsterdam, Faculty of Theology

Is the intuition that 4 and 5 belong together supported by textual features?

If so, where in the text can they be found?

Similarity matrices: measuring the ‘distance’ between each verse of ch. 4 and each verse of ch. 5.
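The idea of comparing every verse of one chapter with every verse of the other can be sketched as follows. This is only an illustration: the representation of a verse as a set of lexemes, the function names, and the toy data are assumptions made here, not taken from the project.

```python
from typing import Callable, List, Sequence, Set

# Assumption: each verse is reduced to the set of its lexemes.
Verse = Set[str]


def shared_count(a: Verse, b: Verse) -> int:
    """Number of lexemes that occur in both verses."""
    return len(a & b)


def similarity_matrix(
    rows: Sequence[Verse],
    cols: Sequence[Verse],
    score: Callable[[Verse, Verse], float],
) -> List[List[float]]:
    """One row per verse in `rows`, one column per verse in `cols`."""
    return [[score(r, c) for c in cols] for r in rows]


# Toy data, invented for illustration only (not real verse content).
chapter_a = [{"king", "army", "say"}, {"tent", "wife"}]
chapter_b = [{"king", "say", "sing"}, {"tent", "wife", "blessed"}, {"star"}]

matrix = similarity_matrix(chapter_a, chapter_b, shared_count)
for row in matrix:
    print(row)
```

Passing the scoring function as a parameter keeps the matrix construction independent of the measure, so a count of shared lexemes can be swapped for another distance.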

Page 25: Wido van Peursen, VU University Amsterdam, Faculty of Theology

Rows: verses of ch. 4; columns: verses of ch. 5.

4\5  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
  1  1  2  2  1  2  1  1  1  2  0  2  1  1  0  0  0  0  1  0  0  0  0  1  0  0  0  0  0  0  0  1
  2  0  1  2  1  1  0  0  0  1  1  1  0  1  0  1  1  1  0  2  1  0  0  2  0  0  2  0  1  0  1  1
  3  1  2  2  1  2  1  1  1  2  0  2  1  1  0  0  0  0  0  0  0  0  0  1  0  0  0  0  1  0  0  2
  4  1  1  1  0  1  0  2  1  1  0  1  1  0  0  1  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0
  5  2  1  1  0  2  1  2  1  1  1  2  2  0  1  1  2  1  0  0  0  0  0  1  0  0  0  1  0  0  0  0
  6  4  2  3  1  4  2  1  3  2  1  2  3  1  2  2  0  0  2  1  0  0  0  2  0  0  1  0  0  0  0  1
  7  0  0  0  0  0  0  0  0  0  0  0  0  0  1  1  0  0  0  0  1  2  0  0  0  1  2  0  2  0  1  0
  8  2  0  0  0  0  1  0  0  0  1  0  1  0  0  1  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0
  9  3  1  1  1  1  1  2  0  1  2  1  3  1  0  2  0  0  0  0  1  0  0  2  1  0  2  0  1  0  1  1
 10  2  0  0  0  0  0  1  1  0  0  0  2  0  1  3  0  0  2  0  0  0  0  0  0  0  0  1  0  0  0  0
 11  1  0  0  0  0  1  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  3  0  0  0  0  0  0  0
 12  3  0  0  0  1  1  0  0  0  0  0  3  0  0  1  0  0  0  0  1  0  0  0  0  0  1  0  1  0  1  0
 13  0  1  0  0  0  0  0  0  1  0  1  0  1  1  0  0  0  1  0  1  2  0  0  0  0  1  0  2  0  1  1
 14  4  1  1  2  3  1  2  1  1  0  2  3  2  2  2  0  0  0  0  1  0  0  2  0  1  2  0  1  0  1  2
 15  1  1  1  1  2  0  0  0  1  0  2  1  2  1  2  0  0  0  0  1  0  0  1  0  0  1  1  3  0  1  2
 16  1  0  0  0  0  0  0  0  0  0  0  1  0  1  1  0  0  0  0  1  0  0  0  0  0  1  1  2  0  1  1
 17  0  0  1  0  0  1  0  0  0  0  1  0  0  0  1  1  0  0  1  1  0  0  0  5  0  1  2  1  0  1  0
 18  1  0  0  1  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  1  2  0  1  0  1  0  1  1
 19  1  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  1  0  2  0  0  0  0  0  0
 20  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  2  1  1  0  0  1  0  0  0
 21  0  0  0  1  0  1  0  0  0  0  0  0  0  0  0  0  0  1  2  0  0  0  1  4  0  3  0  1  0  0  1
 22  2  0  0  1  0  2  0  1  0  1  0  1  0  0  1  0  0  1  1  1  0  0  2  1  0  3  1  2  0  1  1
 23  2  1  3  0  3  2  1  2  1  0  1  1  0  0  0  0  0  0  2  0  0  0  0  0  0  0  0  0  0  0  0
 24  1  1  2  0  1  2  1  1  1  1  1  1  0  0  0  0  0  0  2  0  0  0  0  0  0  1  0  0  0  0  0

Page 26: Wido van Peursen, VU University Amsterdam, Faculty of Theology

• Shared lexemes: the more shared lexemes, the smaller the distance.
• ‘Noise’, e.g. ‘and’:
• Stoplist: exclude frequent particles etc.
• Selection of content words on the basis of part of speech: only words with inflection (nouns, verbs, adjectives).
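A minimal sketch of the part-of-speech selection described above, assuming each verse is available as a list of (lexeme, part of speech) pairs. The tag names, the function names, and the two toy verses are hypothetical and chosen only for illustration; proper nouns are tagged here simply as nouns.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (lexeme, part of speech); an assumed representation

# Assumed tag names for the inflected word classes named on the slide.
CONTENT_POS = {"noun", "verb", "adjective"}


def content_lexemes(tokens: List[Token]) -> Set[str]:
    """Keep only lexemes whose part of speech counts as a content word."""
    return {lexeme for lexeme, pos in tokens if pos in CONTENT_POS}


def shared_lexemes(a: List[Token], b: List[Token]) -> Set[str]:
    """Content lexemes that occur in both verses."""
    return content_lexemes(a) & content_lexemes(b)


# Toy verses, invented for illustration only.
verse_x = [
    ("and", "conjunction"), ("say", "verb"), ("Barak", "noun"),
    ("to", "preposition"), ("Deborah", "noun"),
]
verse_y = [
    ("and", "conjunction"), ("sing", "verb"), ("Deborah", "noun"),
    ("and", "conjunction"), ("Barak", "noun"),
]

print(sorted(shared_lexemes(verse_x, verse_y)))
```

Filtering by part of speech removes particles such as ‘and’ without maintaining an explicit stoplist, which is the trade-off the slide points to.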

Page 27: Wido van Peursen, VU University Amsterdam, Faculty of Theology

Basic unit for text comparison: verse, but ‘verse’ based on traditional unit delimitation.

Differences in verse size may affect results.

Jaccard Index: the number of shared lexemes (the size of the intersection) divided by the total number of distinct lexemes in both verses (the size of the union).

Page 28: Wido van Peursen, VU University Amsterdam, Faculty of Theology

Text 1: I went home
Text 2: I went home yesterday

Intersection: shared lexemes (types): 3 (I, went, home)
Union: total number of lexemes: 4 (I, went, home, yesterday)
Jaccard Index = 3/4 = 0.75

Text 1: I went home
Text 2: After the meeting I went home yesterday

Intersection: 3 (I, went, home)
Union: 7 (I, went, home, after, the, meeting, yesterday)
Jaccard Index = 3/7 = 0.43
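The two worked examples can be reproduced with a few lines of Python. This is a sketch under simple assumptions: lexemes are approximated by lower-cased, whitespace-separated words, and the function names are chosen here for illustration.

```python
from typing import Iterable, List


def jaccard_index(a: Iterable[str], b: Iterable[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Convention chosen here: two empty texts get index 0.0.
        return 0.0
    return len(set_a & set_b) / len(union)


def lexemes(sentence: str) -> List[str]:
    """Crude stand-in for lexeme extraction: lower-case and split on spaces."""
    return sentence.lower().split()


short = lexemes("I went home")
medium = lexemes("I went home yesterday")
longer = lexemes("After the meeting I went home yesterday")

print(jaccard_index(short, medium))            # 0.75
print(round(jaccard_index(short, longer), 2))  # 0.43
```

The second result shows the effect of unit size: the same three shared lexemes yield a lower index when one of the two texts is longer.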

Page 29: Wido van Peursen, VU University Amsterdam, Faculty of Theology

• Shared lexemes: ‘feature-based’.
• Also ‘blind’ methods, based on mathematical characteristics of the digital representation of the text, e.g. Normalized Compression Distance (NCD).
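As an illustration of such a ‘blind’ measure, NCD is commonly defined as (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. The sketch below uses Python's zlib as the compressor; that choice, the function names, and the sample data are assumptions made here, and the slides do not say which compressor was used.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (maximum compression level)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Sample data, invented for illustration only.
text_a = b"the quick brown fox jumps over the lazy dog " * 20
text_b = bytes(range(256)) * 4  # shares almost no structure with text_a

print(ncd(text_a, text_a))  # close to 0: a text compared with itself
print(ncd(text_a, text_b))  # close to 1: unrelated content
```

On very short inputs such as single verses, the fixed overhead of the compressor can dominate the result, so scores for small units should be read with caution.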

Page 30: Wido van Peursen, VU University Amsterdam, Faculty of Theology

Example: verse pairs with the highest number of shared lexemes (4 or more)

Page 31: Wido van Peursen, VU University Amsterdam, Faculty of Theology

• 4:6 and 5:1: Abinoam, Barak, say, son
• 4:6 and 5:5: God, Israel, the LORD, mountain
• 4:14 and 5:1: Barak, day, Deborah, say
• 4:17 and 5:24: Heber, Jael, Kenite, tent, wife
• 4:21 and 5:24: Heber, Jael, tent, wife
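Selecting the verse pairs with the highest scores amounts to scanning the similarity matrix for cells at or above a threshold. The following sketch assumes the matrix is a list of rows with 1-based verse numbering; the function name and the small matrix are invented for illustration and are not the project's data.

```python
from typing import List, Tuple


def pairs_at_or_above(
    matrix: List[List[int]], threshold: int
) -> List[Tuple[int, int, int]]:
    """Return (row verse, column verse, score) for every cell >= threshold.

    Verse numbers are 1-based, matching the way verses are cited.
    """
    return [
        (i + 1, j + 1, value)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value >= threshold
    ]


# Toy matrix, invented for illustration only.
toy_matrix = [
    [4, 1, 0],
    [2, 0, 5],
    [0, 3, 1],
]

for row_verse, col_verse, score in pairs_at_or_above(toy_matrix, 4):
    print(f"row verse {row_verse}, column verse {col_verse}: {score} shared lexemes")
```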

Page 32: Wido van Peursen, VU University Amsterdam, Faculty of Theology

Proper nouns: ‘Barak’, ‘Israel’.

Common nouns that are part of proper noun phrases:

‘wife’ in ‘Jael the wife of Heber’; ‘son’ in ‘Barak the son of Abinoam’.

Other verbs and common nouns: ‘say’, ‘tent’, ‘day’.

Page 33: Wido van Peursen, VU University Amsterdam, Faculty of Theology
Page 34: Wido van Peursen, VU University Amsterdam, Faculty of Theology
Page 35: Wido van Peursen, VU University Amsterdam, Faculty of Theology
Page 36: Wido van Peursen, VU University Amsterdam, Faculty of Theology

High similarity scores in places that show high concentration of proper nouns.

Even within category of proper nouns considerable differences.

Shared common nouns and verbs: frequent words such as ‘day’, ‘say’. No significant concentration.

Page 37: Wido van Peursen, VU University Amsterdam, Faculty of Theology

In case of literary dependency we would expect at least some concentration of shared lexemes.

Significant number of shared lexemes only in case of proper nouns.

But proper nouns suggest shared traditions, rather than literary dependency.