From crowd-sourced collection to digital scholarly edition
The example of the Letters of 1916 project
Funding Bodies
Team
Susan Schreibman - Project Director and Editor in Chief
Karolina Badzmierowska - Researcher
Roman Bleier - Researcher
Emma Clarke - Researcher
Vinayak Das Gupta - Researcher
Richard Hadden - Researcher
Hannah Healy - Researcher
Shane McGarry - Software Engineer
Neale Rooney - Researcher
Linda Spinazzè - Researcher
An Foras Feasa Institute, Maynooth
Why 1916?
The Easter Rising
24-29 April 1916
“ Allowing letters from personal collections
to be read alongside official letters
and letters contributed by institutions
will add new perspectives
to the events of the period and allow us
to understand
what it was like to live an ordinary life
through what were extraordinary times ”
Susan Schreibman
Letters of 1916 - a year in the life
1 November 1915 - 31 October 1916
Letters of 1916 - some numbers (from 13 October)
Launched: 27 September 2013
Correspondence documents uploaded: 2209
Uploaded items from 42 private collections and 23 collaborating institutions
Registered users: 1159
Transcribed characters: 2308911
Diversity of Letters of 1916 correspondence data
Diversity of documents:
Single/Multi-page Letters
Postcards
Greeting cards
Telegrams
Envelopes
...
Variety of topics:
Love letters
Family life
Business
Crime
World War One
...
Crowdsourcing workflow - upload
Crowdsourcing workflow - transcription desk
Facsimile image
Bentham toolbar
Text Editor
Toolbar
About the Training
Training of transcribers - Essential part of public outreach
Leads to better quality of transcriptions
Workshop
Seminars
Secondary school history teachers,
students, and general public
Goal: Accuracy
Facing three main areas with quality issues
1. Incorrect or incomplete metadata
2. Non-TEI markup (e.g. HTML tagging…)
3. TEI tag abuse - misunderstandings
Community engagement vs standards of excellence?
Incorrect, incomplete or incoherent metadata
The field corresponds to the tag
<note type="summary"> inside the TEI header
non-TEI markup (HTML cases)
non-TEI mark-up (non XML)
Indication of location of a section of text:
NOTE IN LEFT MARGIN Give my regards to Dick when next you meet him
(front of post card)To Lady Clonbrock, Ahascragh, Co.Galway
[Handwritten notes at bottom : I Note annex II Await any application from Prof Collingwood; III Resubmit on 1st March]
Uncertainty and missing text:
James McCarthy & Family, Wm Perron. 1.50 Nick Welch, xxxxxxx Jxxx & Mrs Shields. 1.00 Alex xxx, Fred xxxx, M. Barry, L. x.
has told you ?Neeson? is in Sussex. Th? ????? ???? ?????letters? from him, but no
(samples from reliable transcriptions)
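The ad hoc markers in the samples above (`?word?` for an uncertain reading, runs of `x` or `?` for illegible text) could in principle be mapped onto the TEI elements `<unclear>` and `<gap/>`. The following is only a sketch of one possible normalisation; the slides do not say that the project does this, and the function name and the marker conventions it recognises are assumptions.

```python
import re

def normalise_uncertainty(text: str) -> str:
    """Map two ad hoc transcriber conventions onto TEI elements (hypothetical helper)."""
    # ?word? -> <unclear>word</unclear>
    text = re.sub(r"\?(\w+)\?", r"<unclear>\1</unclear>", text)
    # A standalone run of three or more x, or two or more ?, becomes <gap/>.
    # Runs attached to letters (e.g. "Jxxx") are left for a human editor.
    text = re.sub(r"(?<!\w)(?:x{3,}|\?{2,})(?!\w)", "<gap/>", text)
    return text

unclear_sample = normalise_uncertainty("has told you ?Neeson? is in Sussex.")
gap_sample = normalise_uncertainty("Nick Welch, xxxxxxx Jxxx")
print(unclear_sample)
print(gap_sample)
```

Partially legible words such as "Jxxx" or "Th?" are deliberately left untouched, since deciding how much of the word is readable is an editorial judgement.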
TEI tag abuse - misinterpretation of TEI
The transcriber uses the tags in an
attempt to recreate the layout
The Transcriber applies the tags without
comprehending the functionality
Quest for Crowdsourcing Accuracy
Quality check:
● pre-selecting the contributors
● a self-regulating community
● professional staff hired to ensure the crowdsourced content meets the required standard
The Letters of 1916 project takes a different route and applies a hybrid, semi-automated approach to proofing
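As an illustration of what the automated half of such a hybrid approach might look like, the sketch below triages page transcriptions by whether they parse as XML, so that human proofreaders only see the pages that fail. This is an assumption about how such a check could work, not the project's actual pipeline; the function names, the wrapping `<page>` element and the sample pages are all hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(transcription: str) -> bool:
    """Return True if the transcription parses as XML once wrapped in a root element."""
    try:
        ET.fromstring(f"<page>{transcription}</page>")
        return True
    except ET.ParseError:
        return False

def triage(pages: dict) -> dict:
    """Sort page ids into those that parse and those that need a human proofreader."""
    result = {"ok": [], "needs_review": []}
    for page_id, text in sorted(pages.items()):
        bucket = "ok" if is_well_formed(text) else "needs_review"
        result[bucket].append(page_id)
    return result

# Hypothetical sample pages: the second one closes an empty element incorrectly.
sample_pages = {
    27: "<p>Dear Mother</p><salute>Your loving son</salute>",
    28: "<p>I hope you are well<lb/></pb>",
}
report = triage(sample_pages)
print(report)
```

Well-formedness is a weak test: a page can parse and still misuse TEI tags, so it only narrows down where manual proofing starts.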
Borrowing a Unix Philosophy
“If you can get 90 percent of the desired effect for 10 percent of
the work, use the simpler solution.”
- Bob Scheifler and Jim Gettys, Early Principles of X-Window
Difficult Letters
Modularity in crowdsourced transcribing and editing
Crowdsourcing needs
discrete tasks to be
carried out —
otherwise, chaos!
Post-Omeka Workflow
letters: {
  302: {
    title: "Letter from Patrick Pearse to his mother",
    pages: {
      27: {
        facs: "img27.jpg",
        transcription: "<p>Dear Mother</p> [...] <salute>Your loving son</salute> Padraic."
      },
      28: {...}
    },
    other-metadata: {...}
  },
  303: {...}
}
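To make the structure above concrete, here is a minimal sketch of how the per-page transcriptions of one letter could be joined into a single text, with a `<pb/>` marking each page. The Python dictionary mirrors the keys shown on the slide; the second page's values, the `assemble_body` function and the choice to order pages by their numeric id are assumptions made for illustration.

```python
# Hypothetical data mirroring the slide's structure; page 28 is invented.
letters = {
    302: {
        "title": "Letter from Patrick Pearse to his mother",
        "pages": {
            28: {"facs": "img28.jpg", "transcription": "<p>second page</p>"},
            27: {"facs": "img27.jpg", "transcription": "<p>Dear Mother</p>"},
        },
    },
}

def assemble_body(letter: dict) -> str:
    """Concatenate page transcriptions in page-id order, one <pb/> per page."""
    parts = []
    for page_id in sorted(letter["pages"]):
        page = letter["pages"][page_id]
        parts.append(f'<pb facs="{page["facs"]}"/>')
        parts.append(page["transcription"])
    return "\n".join(parts)

body = assemble_body(letters[302])
print(body)
```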
Basic typos with tags
Some examples
Slashes in the wrong place:
</pb> → <pb/>
<address/> → </address>
Accidental angle brackets:
<<p> → <p>
Missing angle brackets:
<salute → <salute>
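The three kinds of typo listed above lend themselves to pattern-based repair. The sketch below shows one way to do it with regular expressions; it is not the project's code, and the tag lists, helper names and heuristics (for instance, treating `<address/>` as a closing tag only when an `<address>` is still open) are assumptions.

```python
import re

EMPTY = "pb|lb"                           # elements that never have content (assumed list)
CONTAINERS = "address|date|note|p|salute" # elements that wrap text (assumed list)

def _add_bracket(match):
    """Insert the missing '>' after a tag name, unless attributes seem to follow."""
    name, rest = match.group(1), match.group(2)
    if "=" in rest:
        return match.group(0)  # too ambiguous, leave for a human
    return f"<{name}>{rest}"

def _close_if_open(match):
    """Turn <name/> into </name> when an opening <name> is still unclosed."""
    name = match.group(1)
    before = match.string[: match.start()]
    opens = len(re.findall(rf"<{name}[\s>]", before))
    closes = before.count(f"</{name}>")
    return f"</{name}>" if opens > closes else match.group(0)

def fix_tag_typos(text: str) -> str:
    text = re.sub(r"<{2,}", "<", text)                   # <<p> -> <p>
    text = re.sub(rf"</({EMPTY})\s*>", r"<\1/>", text)   # </pb> -> <pb/>
    # A tag that runs into the next '<' or the end of text without a '>'.
    text = re.sub(r"<(/?[A-Za-z]\w*)([^<>]*)(?=<|\Z)", _add_bracket, text)
    text = re.sub(rf"<({CONTAINERS})\s*/>", _close_if_open, text)
    return text

print(fix_tag_typos("<<p>Dear Mother</p></pb>"))
print(fix_tag_typos("<address>Bath<address/>"))
```

Each rule is conservative on purpose: when the intended tag cannot be inferred from the text alone, the input is returned unchanged and left for manual proofing.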
Number of ‘tag-typos’ per letter (grouped by number of errors)
Nearly half the letters have at least one tag-typo we can fix like this
Finding types of correspondence
“Letter from Patrick Langford Beazley to Piaras Béaslaí, 14 Feb 1916”
“Postcard from Herbert Pim to John Sweetman, 1 October 1916”
“Deportation Order from the Secretary of State to James Gough, 17 June 1916” ??
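Since the titles follow a "Type from Sender to Recipient, Date" pattern, the type of correspondence can be read off the title. The sketch below assumes that pattern holds; the regular expression, the function name and the list of known types (taken from the document types named earlier in the slides) are illustrative assumptions, and titles with an unexpected type, such as the deportation order, are flagged rather than guessed.

```python
import re
from collections import Counter

TITLE_RE = re.compile(
    r"^(?P<type>.+?) from (?P<sender>.+?) to (?P<recipient>.+?), (?P<date>[^,]+)$"
)
# Assumed list of expected types, based on the document types named in the slides.
KNOWN_TYPES = {"letter", "postcard", "greeting card", "telegram", "envelope"}

def correspondence_type(title: str) -> str:
    """Return the type named at the start of a title, flagging unexpected ones."""
    match = TITLE_RE.match(title.strip())
    if match is None:
        return "unparsed"
    kind = match.group("type").lower()
    return kind if kind in KNOWN_TYPES else f"other: {kind}"

titles = [
    "Letter from Patrick Langford Beazley to Piaras Béaslaí, 14 Feb 1916",
    "Postcard from Herbert Pim to John Sweetman, 1 October 1916",
    "Deportation Order from the Secretary of State to James Gough, 17 June 1916",
]
counts = Counter(correspondence_type(t) for t in titles)
print(counts)
```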
Envelopes
Page 4
DUBLIN 16 APRIL<address>Diarmid Coffey <sic>Esqu</sic>,<lb/>
Mount Trenchard,<lb/> Foynes,<lb/>
Co. Limerick, Ireland</address>
Page 1
<note>you addressed <lb/><sic>yr</sic> letter to<lb/> Harcourt Terrace<sic>wh</sic> delayed it late <lb/>it came this <lb/>afternoon! <lb/>toolate to<lb/> <hi rend="underline">write</hi></note><address>Langridge,<lb/>Bath</address><date>16.10.16</date><salute>Dearest D.</salute><p> Phyllis & Basil have <lb/>written that they come <lb/> out for weekend so [...]
Envelope
Adding structural elements to letters
<address>3 Coast Hill <lb/> Queenstown </address> <date>June.19.1916 </date> <salute>My Own Dearest Jim </salute>
Wish of your loving <lb/> <salute>Mother A. Fitzgerald </salute> xxxxxxx</p>
<opener>
  <address>
    <addrLine>3 Coast Hill </addrLine>
    <addrLine>Queenstown </addrLine>
  </address>
  <dateline>
    <date>June.19.1916 </date>
  </dateline>
  <salute> My Own Dearest Jim </salute>
</opener>
<closer>
  <salute> Wish of your loving <lb/> Mother </salute>
  <signed> A. Fitzgerald </signed>
</closer>
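Part of this restructuring is mechanical and could be sketched as follows: split an `<address>` on its `<lb/>` breaks into `<addrLine>` elements, then wrap a leading run of `<address>`, `<date>` and `<salute>` in `<opener>`. This is an illustration under stated assumptions, not the project's implementation; both function names are hypothetical, and the closer is left out because separating salutation from signature needs more than pattern matching.

```python
import re

def address_to_addrlines(text: str) -> str:
    """Rewrite <address>a <lb/> b</address> as a sequence of <addrLine> elements."""
    def rebuild(match):
        lines = [part.strip() for part in re.split(r"<lb\s*/>", match.group(1))]
        inner = "".join(f"<addrLine>{line}</addrLine>" for line in lines if line)
        return f"<address>{inner}</address>"
    return re.sub(r"<address>(.*?)</address>", rebuild, text, flags=re.DOTALL)

# A leading run of address/date/salute elements, in any order.
OPENER_RE = re.compile(r"^\s*((?:<(address|date|salute)>.*?</\2>\s*)+)", re.DOTALL)

def wrap_opener(text: str) -> str:
    """Wrap the leading address/date/salute run in <opener>, the date in <dateline>."""
    match = OPENER_RE.match(text)
    if match is None:
        return text
    block = match.group(1).strip()
    block = re.sub(r"(<date>.*?</date>)", r"<dateline>\1</dateline>", block)
    return f"<opener>{block}</opener>{text[match.end():]}"

source = (
    "<address>3 Coast Hill <lb/> Queenstown </address> "
    "<date>June.19.1916 </date> <salute>My Own Dearest Jim </salute><p>text</p>"
)
structured = wrap_opener(address_to_addrlines(source))
print(structured)
```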
Adding the @when
<date>Tues oct 22 1916</date>
>>> import dateparser
>>> a = dateparser.parse('Tues oct 22 1916')
>>> a
datetime.datetime(1916, 10, 22, 0, 0)
>>> a.date().isoformat()
'1916-10-22'
<date when="1916-10-22">Tues oct 22 1916</date>
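The same step can be applied to every `<date>` element in a transcription. The sketch below swaps the third-party `dateparser` call for a deliberately narrow standard-library parser so that it runs without extra dependencies; that substitution, the list of accepted formats and the function names are assumptions. Dates it cannot interpret with confidence, such as the two-digit-year form `16.10.16`, are left unchanged for an editor.

```python
import re
from datetime import datetime

# Leading weekday words such as "Tues" or "Sunday," are dropped before parsing.
WEEKDAY_RE = re.compile(r"^(?:mon|tue|wed|thu|fri|sat|sun)[a-z]*[.,]?\s+", re.IGNORECASE)
# Assumed set of unambiguous layouts: month name plus day plus four-digit year.
FORMATS = ("%b %d %Y", "%B %d %Y", "%d %b %Y", "%d %B %Y")

def to_iso(text: str):
    """Return an ISO 8601 date string, or None if the text is not understood."""
    cleaned = WEEKDAY_RE.sub("", text.strip())
    cleaned = " ".join(re.sub(r"[.,]+", " ", cleaned).split())
    for fmt in FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    return None

def add_when(xml: str) -> str:
    """Add a when attribute to each bare <date> whose content can be parsed."""
    def repl(match):
        iso = to_iso(match.group(1))
        if iso is None:
            return match.group(0)
        return f'<date when="{iso}">{match.group(1)}</date>'
    return re.sub(r"<date>(.*?)</date>", repl, xml, flags=re.DOTALL)

tagged = add_when("<date>Tues oct 22 1916</date>")
print(tagged)
```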
Postcards
Type 2
Type 1
Templating
LetEd.
Questions to conclude
Is it worth it?
Why the trouble of TEI encoding instead of plain text?
Roman Bleier
Richard Hadden | @oculardexterity
Linda Spinazzè
We welcome suggestions, comments, questions.