COBHUNI | Universität Hamburg
Building a corpus of Islamic
embryology based on a robust and
elegant architecture
Alicia González Martínez,
Tillmann Josua, Thomas Eich
Outline
1 Advent and challenges of digital humanities
2 The COBHUNI project
3 The source texts
4 The workflow:
(a) collecting and processing source data
(b) annotation
(c) visualization
5 Conclusions and future work
The COBHUNI project
COBHUNI: Contemporary Bioethics and the History of the Unborn in Islam
The COBHUNI project aims to diversify our understanding
of how prenatal life is conceptualized in texts of
Islamic normativity.
1.1 Before the unborn
1.2 The unborn
1.3 After the unborn
Philological exegesis
Hadith criticism
Latin script
Semen and similarity / heredity
Semen as colors
Semen and coitus interruptus or contraceptives
Semen and wet dream
Sex act itself & its timing
Conception / fertilization
General / larger debate about predestination
Embryology: 40 days
Embryology: Ensoulment
Embryology: Angel visits Embryo
Embryology: expressed in a series of numbers
Embryology: Macrocosm – microcosm
Embryology: Embryo and link to resurrection & afterlife
Embryology: Link to (modern) science
Pregnancy: duration: Definition
Miscarriage / abortion and legal status of slave mother
Miscarriage / abortion and legal status of free mother
Miscarriage / abortion and legal status of the siqt
Abortion compared to killing a new-born
Menstruation
Breast-feeding
Legal status questions concerning the child after birth
1 MOTIVES
2 METAMOTIVES
3 NAMED ENTITIES → Proper name
The source texts
The COBHUNI Corpus draws on three inputs:
crawl and extract
crawl and extract
scan, OCR and post-correct
Source material                      No. tokens
altafsir.com (Quran exegesis)        12,601,880
hadith.al-islam.com                  11,482,139
Scan and OCR texts                       36,132
TOTAL                                24,120,151
The workflow
[Workflow diagram. Components: Great ideas; COBHUNI Corpus; Annotation Tool: WebAnno; Visualization Tool: Annis. Arrow labels: insert (three times), export, import (twice), query processed information.]
Collecting and processing source data
crawl → get source code (.html) → convert & filter (.json) → insert → COBHUNI Corpus
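The convert-and-filter step can be illustrated with a minimal sketch. It assumes that each crawled page arrives as an HTML string and that one JSON record per page is wanted. The names TextExtractor, html_to_record and to_json, the record fields, and the whitespace token count are hypothetical and chosen only for illustration; they are not the project's actual code or schema.

```python
import json
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the content of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_record(html, source):
    """Convert one HTML page to a record; return None if no text survives."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left over from the markup.
    text = re.sub(r"\s+", " ", " ".join(parser.parts)).strip()
    if not text:
        return None  # filter: drop pages without visible text
    # Assumption: tokens are approximated by whitespace splitting.
    return {"source": source, "text": text, "n_tokens": len(text.split())}


def to_json(records):
    """Serialize records, keeping Arabic characters readable (no \\u escapes)."""
    return json.dumps(records, ensure_ascii=False, indent=2)
```

A crawler would call html_to_record once per downloaded page, discard the None results, and pass the remaining records to to_json before inserting them into the corpus.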
Conclusions
✔ Created a reliable multisource corpus of
Islamic texts of about 24M tokens
✔ Annotated a subset of the corpus with semantic
information
✔ Developed a robust pipeline architecture for
integrating heterogeneous data, sanitising it,
enriching it and ingesting it into Annis, a powerful
tool for corpus visualization
Future work
✔ Continue manually annotating the subcorpus for
the COBHUNI project
✔ Add more texts to the OCR part
✔ Convert the data to other formats so that it
can be integrated into other projects
✔ Work on strategies to enrich the texts with
morphological information