Presentation on language tools, presented by Jesse de Does and Katrien Depuydt during demo session held at the BNE 5th of October 2011.
IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands.
Computer Lexica in OCR and Retrieval
Katrien Depuydt, Jesse de Does (Instituut voor Nederlandse Lexicologie, Leiden)
4 March 2009 presentation The Hague 2
Can we handle ‘de wereld’ (‘the world’)?
werreid
IMPACT <Demo Day BL, 12 July 2011> 3
OCR: ABBYY FineReader SDK with built-in standard Dutch dictionary
OCR: ABBYY FineReader SDK combining the built-in modern Dutch dictionary with the IMPACT external historical lexicon of Dutch:
werreld
werelt weerelt wereld weerelds wereldt werelden weereld werrelts waerelds weerlyt wereldts vveerelts waereld weerelden waerelden weerlt werlt werelds sweerels zwerlys swarels swerelts werelts swerrels weirelts tsweerelds werret vverelt werlts werrelt worreld werlden wareld weirelt weireld waerelt werreld werld vvereld weerelts werlde tswerels werreldts weereldt wereldje waereldje weurlt wald weëled
RETRIEVAL: key in modern WERELD and find all of these variants
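The retrieval idea can be sketched in a few lines of Python: a lexicon links each historical word form to its modern lemma, and a query on the lemma is expanded to all linked forms before matching. This is only an illustration under stated assumptions. The pair list, the `DOCS` texts and the function names are hypothetical, and the IMPACT demonstrator itself uses a database-backed lexicon and a search engine rather than an in-memory dictionary.

```python
from collections import defaultdict

# Hypothetical miniature lexicon: (historical word form, modern lemma) pairs.
# The forms come from the variant list above; the structure is an assumption.
LEXICON_PAIRS = [
    ("werelt", "wereld"),
    ("weerelt", "wereld"),
    ("waereld", "wereld"),
    ("vvereld", "wereld"),
    ("werelden", "wereld"),
]

def build_index(pairs):
    """Map each modern lemma to the set of word forms linked to it."""
    index = defaultdict(set)
    for form, lemma in pairs:
        index[lemma.lower()].add(form.lower())
    return index

def expand_query(lemma, index):
    """Return the lemma itself plus every known variant, sorted for stable output."""
    key = lemma.lower()
    return sorted({key} | index.get(key, set()))

def search(lemma, documents, index):
    """Return ids of documents containing the lemma or any of its variants."""
    terms = set(expand_query(lemma, index))
    return [doc_id for doc_id, text in documents.items()
            if terms & set(text.lower().split())]

INDEX = build_index(LEXICON_PAIRS)
# Invented example texts, for illustration only.
DOCS = {1: "de gansche waereld", 2: "een boek", 3: "in de werelt"}
HITS = search("WERELD", DOCS, INDEX)
```

Here the user types only the modern form; documents 1 and 3 are found although neither contains the string "wereld".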
IMPACT workshop, Bratislava, May 7, 2010
The long s problem: An example ….
OCR at start of project
A. De eerde was de gevaarlykflti om de verlei¬ding aan 't Hof; de tweede de ftillie en veiligde;de derde de zwaarde, daar hy byna drie millioenenharde en onbefchaafde Menfchen beftieren moest.
Results April 2010
A. De eerste was de gevaarlykste om de verlei-ding aan 't Hof; de tweede de stilste en veiligste;de derde de zwaarste, daar hy byna drie millioenenharde en onbeschaafde Menschen bestieren moest.
Workaround: “integrated postcorrection”: tell the engine that “eerfte” is OK, then postcorrect it afterwards with the lexicon.
This keeps the engine from turning the word into “eerde” (earth) instead of “eerste” (first).
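The post-correction half of this workaround can be illustrated with a small Python sketch. It assumes the OCR engine has already been allowed to output forms such as “eerfte”; the sketch then tries every way of reading an “f” as a long s and keeps the first candidate found in the lexicon. The `LEXICON` set and the function names are hypothetical, and the real integration with the OCR engine is not shown.

```python
from itertools import product

# Hypothetical lexicon of accepted word forms (assumption: a plain set of lowercase strings).
LEXICON = {"eerste", "gevaarlykste", "stilste", "menschen", "hof"}

def long_s_candidates(token):
    """Yield every string obtained by replacing any subset of 'f' characters with 's'."""
    options = [("f", "s") if ch == "f" else (ch,) for ch in token]
    for combo in product(*options):
        yield "".join(combo)

def postcorrect(token, lexicon=LEXICON):
    """Return the first candidate whose lowercase form is in the lexicon,
    or the token unchanged if no candidate is found."""
    for candidate in long_s_candidates(token):
        if candidate.lower() in lexicon:
            return candidate
    return token

CORRECTED = [postcorrect(t) for t in ["eerfte", "Menfchen", "Hof"]]
```

The number of candidates doubles with every “f” in the token, which is acceptable for single words but is one reason a production system would use a smarter lookup.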
Overview
What is a computer lexicon
Lexica in IMPACT
Tools for lexicon building and applying lexica
Some results
Searching Demonstration
What is a computer lexicon?
Computer lexicon vs electronic dictionary (1)
An electronic dictionary is:
- Digitised full text (no pictures)
- For human use
- Ideally: searchable, with explicitly coded material (XML), such as lemma, part of speech (PoS), meaning, quotes etc.
Examples: OED online, WNT online
Dictionary XML (example)
Computer Lexicon vs Electronic Dictionary (2)
A computer lexicon is:
- Always in a structured digital format (XML, relational database)
- Main purpose: computer application
- Explicitly coded information (e.g. lemma wereld, part of speech noun, morphology werelden, werelds …, syntax)
Examples of use:
- Linguistic enrichment of text material
- ‘Advanced’ searching (words with all spelling variants and inflections)
- Automatic summarization, keyword extraction…
Lexica in IMPACT
The OCR lexicon
An OCR lexicon is:
- A checked list of words in a language
- Based on a corpus (collection) of dated texts (selection!)
- Preferably with frequency information
- Preferably from the same time period or of the same text type as the texts you wish to digitize
OCR lexicon: example
1550-1750 (word, frequency): song 820, rihte 818, theire 818, manye 818, sume 815, Do 814, Whiche 811, fyrst 811, while 811, Water 810, wt 809, shalbe 808, thingis 807, again 806, sona 806, wa 805, mode 804, work 802, between 801, law 799, moder 798, mis 798, softe 798
> 1900 (word, frequency): television 418, electronic 375, video 194, hormone 176, jazz 162, eco 142, software 136, vitamin 128, movie 121, taxi 113, isotopic 108, electronics 95, radar 86, basically 71, sabotage 71, homozygote 70, psychedelic 67, phonemic 66, insulin 64, zap 64, antibody 61, fungicidal 61
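A period-specific frequency list of this kind can be derived from a dated corpus roughly as follows. This is a minimal sketch: the `CORPUS` texts are invented, tokenisation is a plain whitespace split, and the manual checking step that makes an OCR lexicon a "checked" list is not modelled.

```python
from collections import Counter

# Hypothetical dated corpus: (year, text) pairs. The texts are invented for illustration.
CORPUS = [
    (1611, "the fyrst song of the water"),
    (1650, "the song shalbe sung again"),
    (1965, "television and radar and television"),
]

def ocr_lexicon(corpus, start, end, min_freq=1):
    """Count word forms in texts dated start..end (inclusive), most frequent first."""
    counts = Counter()
    for year, text in corpus:
        if start <= year <= end:
            counts.update(text.lower().split())
    return [(word, n) for word, n in counts.most_common() if n >= min_freq]

EARLY = dict(ocr_lexicon(CORPUS, 1550, 1750))
MODERN = dict(ocr_lexicon(CORPUS, 1900, 2000))
```

Restricting the date range is what keeps a word like "television" out of the lexicon used for seventeenth-century material.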
The IR lexicon
IR lexicon: most important information categories
Word forms (lists of words) +
- frequency information
- quotes (dated sources) from corpora or electronic dictionaries
- MODERN LEMMA (comparable to a dictionary headword) linked to spelling variants and inflected forms of the same word
The modern lemma is used for searching in texts
Standard use in corpus linguistics and modern historical lexicography
<?xml version='1.0'?>
<!DOCTYPE lexicon SYSTEM 'NL_Structure.dtd'>
<lexicon>
  <lexical_entry>
    <lemma_id>219490</lemma_id>
    <modern_lemma>aantuilen</modern_lemma>
    <gloss></gloss>
    <POS>VRB</POS>
    <ne_label></ne_label>
    <language_id></language_id>
    <portmanteau_lemma_id></portmanteau_lemma_id>
    <wordform>
      <form_representation>
        <wordform_id>850026</wordform_id>
        <written_form>tuyld</written_form>
        <attestation>
          <id>92141</id>
          <token_id></token_id>
          <quote>Verhael ick (<I>t.w. een als vrouw verkleede man</I>) haer mijn min in Vrouwelijcker schynen: Sy acht het boertery, en tuyld daer weer op an, Vermits een Vrou niet op een Vrou verlieven kan,</quote>
          <derivation_id>0</derivation_id>
          <document_id>204</document_id>
          <start_pos>119</start_pos>
          <end_pos>124</end_pos>
        </attestation>
      </form_representation>
    </wordform>
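As a rough illustration of how such an entry might be consumed by a program, the Python sketch below reads lemma, part of speech and written forms with the standard library XML parser. The `SAMPLE` string is a shortened entry modelled on the excerpt; its closing tags and abbreviated quote are assumptions added so that the sample is well-formed, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# A shortened entry modelled on the excerpt above. The closing tags and the
# abbreviated quote are assumptions added to make the sample well-formed.
SAMPLE = """<lexicon>
  <lexical_entry>
    <lemma_id>219490</lemma_id>
    <modern_lemma>aantuilen</modern_lemma>
    <POS>VRB</POS>
    <wordform>
      <form_representation>
        <written_form>tuyld</written_form>
        <attestation>
          <quote>Sy acht het boertery, en tuyld daer weer op an</quote>
        </attestation>
      </form_representation>
    </wordform>
  </lexical_entry>
</lexicon>"""

def read_entries(xml_text):
    """Return one dict per lexical entry: modern lemma, PoS and written forms."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.iter("lexical_entry"):
        forms = [rep.findtext("written_form", default="")
                 for rep in entry.iter("form_representation")]
        entries.append({
            "lemma": entry.findtext("modern_lemma", default=""),
            "pos": entry.findtext("POS", default=""),
            "forms": forms,
        })
    return entries

ENTRIES = read_entries(SAMPLE)
```

The result links the historical form "tuyld" to the modern lemma "aantuilen", which is the relation the IR lexicon exists to provide.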
Tools for lexicon building and application of lexica
Types of variation (spelling, inflection…)
I (patterns to predict variation)
uytterlijcste uyterlijkste d'uyterlijke uiterlyke uyterlijcke uiterlijke uyterlijck uiterlyken uiterlijkste uiterlicke wterlicke wterlijcke ulterlijk uiterlyk uiterlijk uyterlick wterlicken d'uyterlijcke uiterlijken uiterlijks wterlijck uytterlicke uitterlijke ujterlijke uytterlijk uyterlycke uyterlicken uijterlicke d'uiterlijcke wtterlijcke wterlyke wtterlijk uuterlick uuterlic uyterlijke uyterlijcken uyterlicke d'uiterlyke wterlijke vuyterlijcke uuterlycke uuterlicke wterlijken uyterlijcksten uuyterlicke uuyterlick uuyterlycke uytterlijcke uytterlycke uytterlick vuytterlicke uiterlijker uyterlyck uterliek wterlijcken uiterlijkst uitterlijk uytterlijcken uyterlyk wterlick uutterlijck uuyterlicken uyttelijck uijterlijk uytterlijck uuterlijck uiterlick uitterlyk uuyterlic uuyterlyck uuyterlijck uiterlijck uytterlyck uterlyc wterlijk
II (a number are predictable with patterns, others need to be taken from a lexicon)
werelt weerelt wereld weerelds wereldt werelden weereld werrelts waerelds weerlyt wereldts vveerelts waereld weerelden waerelden weerlt werlt werelds sweerels zwerlys swarels swerelts werelts swerrels weirelts tsweerelds werret vverelt werlts werrelt worreld werlden wareld weirelt weireld waerelt werreld werld vvereld weerelts werlde tswerels werreldts weereldt wereldje waereldje weurlt wald weëled
Neil Fitzgerald, 7th July 2011 22
Computer lexica
- For OCR and OCR post correction
- Improving searchability of historic text material by building a lexicon with variants, using a modern lemma as a search entry
- Tools for lexicon building
- Tools for application of lexicon in search engines
- Lexicon cookbook
Tools (more specific)
- Lexicon building from corpus material and dictionaries
- Use of lexica in search engines
- Tool to extract spelling variation patterns from historical material
- Tool to relate previously unrecognised spelling variants to their standard form
- Tool to reduce previously unrecognised inflected forms to their base form
Spelling variation tools (pattern-based)
Language-independent approach:
- Supervised rule (pattern) induction from pairs (“modern” word, historical word), yielding patterns like aa/ae, s/z, …
- Pattern weights are computed from example material
- Additional approaches are possible, e.g. use of aligned data (parallel historical text and modern version)
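The induction step can be approximated with a short Python sketch. It aligns each (modern, historical) pair with `difflib.SequenceMatcher`, records every differing stretch together with one character of left context, and uses relative frequency as the pattern weight. The alignment method, the one-character context and the weighting are assumptions made for illustration, not the IMPACT tool's actual procedure, and the example pairs are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher

def extract_patterns(modern, historical):
    """Return (modern_substring, historical_substring) rewrites for one word pair,
    each widened by one character of shared left context where available."""
    patterns = []
    matcher = SequenceMatcher(None, modern, historical, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if i1 > 0 and j1 > 0:
            # The preceding block is an 'equal' block, so this character is shared.
            i1, j1 = i1 - 1, j1 - 1
        patterns.append((modern[i1:i2], historical[j1:j2]))
    return patterns

def induce_weights(pairs):
    """Weight each pattern by its relative frequency in the example material."""
    counts = Counter()
    for modern, historical in pairs:
        counts.update(extract_patterns(modern, historical))
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()}

# Hypothetical training pairs: (modern word, historical word).
PAIRS = [("waar", "waer"), ("jaar", "jaer"), ("zee", "see")]
WEIGHTS = induce_weights(PAIRS)
```

On these three pairs the sketch finds aa/ae twice and z/s once, so aa/ae receives twice the weight.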
Lemmatization
Reduction of historical word forms to modern lemma
Historical word → (1: pattern matching) → standard (“modern”) spelling → (2: lemmatizer) → lemma form
Dystels → (1) → distels → (2) → distel
When we have a perfect or near-perfect modern full form lexicon, the second step is simply lexicon lookup.
But:
1) We will not have full form information for many lemmata (especially the historical ones)
2) Even lemmata present in modern language may have historical inflected forms different from the present-day paradigm
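The two steps can be sketched in Python as follows. Step 1 generates candidate modern spellings by applying rewrite patterns; step 2 is a plain lookup in a modern full form lexicon. `PATTERNS` and `FULL_FORM_LEXICON` are hypothetical miniature resources, and the sketch ignores pattern weights.

```python
# Hypothetical resources. PATTERNS maps a historical substring to a modern one;
# FULL_FORM_LEXICON maps modern word forms to their lemma.
PATTERNS = [("y", "i"), ("ae", "aa"), ("ck", "k")]
FULL_FORM_LEXICON = {"distels": "distel", "distel": "distel", "jaar": "jaar"}

def modernise(word, patterns=PATTERNS):
    """Step 1: generate candidate modern spellings. Each pattern is either
    applied to all of its occurrences or not applied at all."""
    candidates = {word}
    for historical, modern in patterns:
        candidates |= {c.replace(historical, modern) for c in candidates}
    return candidates

def lemmatise(word, lexicon=FULL_FORM_LEXICON):
    """Step 2: look each candidate up in the modern full form lexicon."""
    hits = {lexicon[c] for c in modernise(word.lower()) if c in lexicon}
    return sorted(hits)

LEMMAS = lemmatise("Dystels")
```

A word whose candidates are all missing from the lexicon yields an empty list, which is exactly the situation the two caveats above describe.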
Lemmatization and reverse lemmatization
We also need a lemmatization process for these situations
- A typical lemmatizer assigns some standard form (infinitive, nominative, stem) to inflected forms, usually based on patterns relating the inflected form to the standard form.
- But: matching these patterns can be hard to combine with matching both spelling variation patterns and OCR errors (bok/bokken/bokkeu)
- Our solution is to build an expanded “hypothetical modern full form lexicon” containing the most plausible paradigmatic expansions of lemmata
- This construction is carried out by means of a statistical reverse lemmatizer
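To make the idea of reverse lemmatization concrete, the following Python sketch learns suffix rewrite rules from known (lemma, inflected form) pairs and applies them to a new lemma, scoring each hypothetical form by relative rule frequency. This is a deliberately simple stand-in, not the statistical model used in IMPACT; the training pairs and function names are assumptions.

```python
import os
from collections import Counter

def suffix_rule(lemma, form):
    """Return (lemma_suffix, form_suffix) after the common prefix,
    keeping one character of context."""
    k = len(os.path.commonprefix([lemma, form]))
    k = max(k - 1, 0)
    return lemma[k:], form[k:]

def train(pairs):
    """Count how often each suffix rule occurs in the training pairs."""
    return Counter(suffix_rule(lemma, form) for lemma, form in pairs)

def expand(lemma, rules):
    """Return hypothetical inflected forms with relative-frequency scores, best first."""
    matching = {rule: n for rule, n in rules.items() if lemma.endswith(rule[0])}
    total = sum(matching.values())
    forms = [(lemma[:len(lemma) - len(old)] + new, n / total)
             for (old, new), n in matching.items()]
    return sorted(forms, key=lambda item: (-item[1], item[0]))

# Hypothetical training pairs: (lemma, inflected form).
TRAINING = [("bok", "bokken"), ("rok", "rokken"), ("stok", "stokken"),
            ("boek", "boeken"), ("distel", "distels")]
RULES = train(TRAINING)
EXPANSION = expand("sok", RULES)
```

For the unseen lemma "sok" the sketch proposes "sokken" as the most plausible form and "soken" as a less likely one. Such output is hypothetical lexicon content: nothing here confirms that either form occurs in real text.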
Attestation
From hypothetical (non-witnessed) lexicon content to attested word forms in “real” text
- Automatic selection of candidate attestations
- Manual work: verification and correction
Two approaches
- Dictionary based (INL): Woordenboek der Nederlandsche Taal
- Corpus based (LMU, INL): Dutch DBNL corpus
IMPACT Dictionary Attestation Tool
Headword: work
Quotations:
• We are working on what works.
• Depart from me, ye that worke iniquity.
• She worcketh knittinge of stockings.
Variants: the forms of the headword found in the quotations (e.g. worke, worcketh)
Task: find the variants of a headword as they occur in the quotations
Lexicon building at work: Verifying attestations in historical dictionaries
IMPACT Dictionary Attestation Tool
Automatically (preprocessing)
• match literally, e.g. work → work, Work
• match using existing lexica and lists, e.g. work → works, worked, wrought
• approximate matching, e.g. work → worke
By hand (using the tool)
• correct automatic mismatches, e.g. works → words, worms
• find missed matches, e.g. work → worketh, wrowght
Task: find the variants of a headword as they occur in the quotations
Electronic historical dictionary
Database with lemmata and quotations
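The three automatic matching stages can be sketched in Python. Each token of a quotation is compared with the headword literally, then against a list of known forms, then approximately with a similarity ratio. The `KNOWN_FORMS` table, the 0.8 threshold and the use of `difflib` are assumptions made for illustration; the tool's own matching method is not described on the slide.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of known forms per headword (stands in for "existing lexica and lists").
KNOWN_FORMS = {"work": {"works", "worked", "wrought"}}

def candidate_attestations(headword, quotation, known=KNOWN_FORMS, threshold=0.8):
    """Return (token, match_kind) pairs for tokens that may attest the headword."""
    results = []
    for token in re.findall(r"[^\W\d_]+", quotation):
        low = token.lower()
        if low == headword:
            kind = "literal"
        elif low in known.get(headword, set()):
            kind = "lexicon"
        elif SequenceMatcher(None, headword, low).ratio() >= threshold:
            kind = "approximate"
        else:
            continue
        results.append((token, kind))
    return results

FOUND = candidate_attestations("work", "Depart from me, ye that worke iniquity.")
MISSED = candidate_attestations("work", "She worcketh knittinge of stockings.")
```

With these settings "worke" is found automatically, while "worcketh" falls below the threshold and is missed, which is the kind of case left for manual correction in the tool.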
IMPACT Attestation Tool
Lemma headword
Quotations
Sorted by uncertainty
Up-to-date overview of what is done and needs to be done
Done by this user so far
IMPACT Lexicon Tool
Automatically (preprocessing = apply lemmatizer)
• match literally, e.g. work → work, Work
• match using existing lexica and lists, e.g. work → works, worked, wrought
• match using the spelling variation module, e.g. uiterlijk → uyterlick
By hand (using the tool)
• assign correct lemma, e.g. was (N) → zijn (V)
• group tokens belonging together, e.g. konings zoon → koningszoon
• select attestations
Task: find and verify attestations in a historical corpus
Corpus-based lexicon building: IMPACT Lexicon Tool
General vocabulary vs. named entities
- Tools for lexicon building described so far: applicable to general lexicon
- Tools for NE recognition, classification and variant matching
  - library requirement
  - distinguish general vocabulary from NEs
  - avoid unpleasant mixups like Abimelech → apemelk! (b/p; i/e; e/0; k/ch)
Improvement of state of the art / innovation
We use existing computational linguistic approaches, but figure out how to apply them to historical language
We develop a workflow to deal with the problems posed by historical language, figuring out how all pieces fit together:
- Data selection and acquisition
- Manual work
- Computational linguistics tools
Languages in IMPACT
Dutch, German, English, Spanish, French
Polish, Czech, Slovene and Bulgarian
- Cross language perspective paper
- Parallel OCR and IR experiments
- GT datasets
- Language tools: language independent
- Except for the 3 core languages: proof of concept lexica
OCR evaluation results (preliminary!)
1. Czech
- Co jest konstituce?, čili, Krátký, prostonárodní wýklad hlawnějších zásad konstitucí ewropejských, 1848
- Ferina Lišák z Kuliferdy a na Klukově, čili, Kratičká historye zlopověstných kousků starého Reinecke, 1848
- Homerowa Iliada, 1802
- Na den narození neimocněišího, a neijasněišího cysare rímského, téz dědičného rakauského a krále ceského, Frantiska II., w Praze 12. den mesyce Unora, léta 1805, 1805
- Plody sborů učenců řeči českoslowanské prešporského, 1836
- Rozprawy o gmenách, počátkách i starožitnostech národu Slawského a geho kmeni /, 1830
- Sokol, 1872
- Základowé pitwy (Anatomie), čili, Soustawnj rozbor a popis těla lidského a gednotliwých geho částek, 1840
2. Dutch
18th and 19th century books, newspapers, parliamentary papers
Provinciale Overijsselsche en Zwolsche courant : staats-, handels-, nieuws- en advertentieblad, 1852-1852
Rechtsgeleerd advis in de zaak van den gewezen stadhouder, en over deszelfs schryven aan de gouverneurs van de Oost- en West-Indische bezittingen van den staat [...]. Ingelevert [...] op den 7 january 1796. / By B. Voorda et al, 1796-1796
Verhaal van het levensgevaar, waar in zig drie Rotterdamsche burgers [...] bevonden hebben, te Utrecht, 1784-1784
Vrijmoedige aanmerkingen, over de uitsluiting van allen die door publieke armkassen bedeeld worden, als stemgerechtigden [...] bij eene oproeping van het Nederlandsche volk tot eene Nationaale Conventie, 1795-1795
Precision: 0.8432889410216431, Recall: 0.843331934927516
English
16th-19th century material
Sources for lexicon building: OED, ECCO
French
17th century books
Conduite du jugement naturel où tous les bons esprits de l'un et l'autre sexe pourront facilement puiser la pureté de la science, par M. Jacques Forton, sieur de S. Ange,..., 1653
Dissertation de la philosophie en général, 1668
La Dialectique du sieur de Launay, contenant l'art de raisonner juste sur toute sorte de matières..., 1673
Lettre de M. Gadroys à M. de La Grange Trianon,... pour servir de réponse à celle que M. de Castelet a écrite contre les raisons de M. Descartes touchant le flux et le reflux de la mer. - Seconde lettre de M. Gadroys... [au même, sur le même sujet.], 1677
Traitez de métaphysique démontrée selon la méthode des géomètres. [Par le sieur de La Coudraye.], 1693
German
- Das Buch des heyligen Römischen Reichs unnderhalltunge, 1501
- Die Poesie ihr Wesen und ihre Formen mit Grundzügen der vergleichenden Literaturgeschichte, 1884
- Echo Deß Hochzeitlichen Te Deum Laudamus, 1722
- Ergebnisse der Erhebungen über die Beschäftigung gewerblicher Arbeiter an Sonn- und Festtagen, Bd.:1, Gruppe I bis VII der Gewerbestatistik, Berlin, 1887, 1887
- Quedlinburgisches Kreis-Tags-Memorial, 1673
- Von der Regierung der Kirche und den unterschiedlichen Würden der Geistlichkeit *(full title in comments), 1779
- Warhaffter und grundlicher Bericht uß was Ursachen Martinus du Voysin (zu Basel verburgerter Krämer) inn der Statt Surseew im Aargöw, ..., den 13. Tag Octobris deß 1608. Jars erstlich enthauptet, und volgends verbrennt worden, 1609
Polish
- Adwersaria, albo terminata sprawy wojennej, która się toczyła w wołoskiej ziemi z tureckim cesarzem, 1621
- Chorągiew Sarmacka w Wołoszech, to jest pospolite ruszenie i szczęśliwy powrót Polaków z Wołoch w roku 1621, 1621
- Diariusz wiadomości od wyjazdu króla z Wilna do Smoleńska, 1610
- Discurs o cenie pieniedzy teraznieyszey y o niektorych skutkach iey…, 1632
- Nowe Ateny, albo Akademia wszelkiey scyencyi pełna, na różne tytuły iak na classes podzielona, mądrym dla memoryału, idiotom dla nauki, politykom dla praktyki, melancholikom dla rozrywki erygowana ... . Część 3 albo Supplement., 1746
- Pasja żołnierzy obojga narodów w stolicy moskiewskiej krótko opisana, 1613
- Powodzenia niebezpiecznego ale szczęśliwego wojska j. k. m. w Multanach opisanie, 1601
- Relacja chwalebnej ekspedycji Jana Kazimierza, króla polskiego i szwedzkiego, 1650
- Wyprawa i wyjazd sułtana Amurata, cesarza tureckiego, na wojnę do Korony Polskiej, 1634
- Wyprawa i wyjazd sułtana Amurata, cesarza tureckiego, na wojnę do Korony Polskiej_BW, 1634
- Żałosne opisanie upadku króla hiszpańskiego na morzu i na lądzie, 1589
Slovene
- Genovefa, 1841
- Gosp. Krištofa Šmida korarja avgustanskiga, zgodBe S. Pisma za mlade ljud..., 1850
- Kmetijske in rokodelske novice, 1844
- Kratkozhasne uganke, 1788
- Kuharske Bukve, 1799
- Marianske Kempensar, ali Dvoje bukuvze, 1769
- Novice kmetijskih, rokodelnih in narodskih reči, 1851
- Sgodbe svetiga pisma za mlade ljudi, 1830
- Ta male katechismus, 1768
- Vezhna pratika od gospodarstva, 1789
- Zerkviza na skali, 1855
Retrieval demonstrator
Indexing and retrieval library (Java) implemented on top of the Lucene search engine
Lexicon in MySQL database
OCR with Finereader SDK and external dictionary interface of about 2000 images of the Dutch Ground Truth selection
Page XML output [in framework]
NE tagging
Indexing and retrieval while using lexicon and NE tagging