HInDoLA: A Uniﬁed Cloud-based Platform for Annotation ... · JSON schema is inspired from VIA [9] annotation scheme. B. Backend Annotation Manager ... Bhoomi and Combined datasets

HInDoLA A Unified Cloud-based Platform forAnnotation Visualization and Machine

Learning-based Layout Analysis of HistoricalManuscripts

Abhishek Trivedi Ravi Kiran SarvadevabhatlaCentre for Visual Information Technology (CVIT)

International Institute of Information Technology Hyderabad (IIIT-H)Gachibowli Hyderabad 500032 INDIA

abhishektrivediresearchravikiraniiitacin

AbstractmdashPalm-leaf manuscripts are one of the oldest mediumof inscription in many Asian countries Especially manuscriptsfrom the Indian subcontinent form an important part of theworldrsquos literary and cultural heritage Despite their significancelarge-scale datasets for layout parsing and targeted annotationsystems do not exist Addressing this we propose a web-basedlayout annotation and analytics system Our system calledHInDoLA features an intuitive annotation GUI a graphicalanalytics dashboard and interfaces with machine-learning basedintelligent modules on the backend HInDoLA has successfullyhelped us create the first ever large-scale dataset for layoutparsing of Indic palm-leaf manuscripts These manuscripts inturn have been used to train and deploy deep-learning basedmodules for fully automatic and semi-automatic instance-levellayout parsing

I INTRODUCTION

Palm leaf manuscripts written in different Indian languagesare scattered all over the country in monasteries templeslibraries museums with individuals and in several privatecollections In contrast with modern or recent era documentssuch manuscripts are considerably more fragile prone todegradation from elements of nature and tend to have a shortshelf life Moreover the domain experts who can deciphersuch content are small in number and dwindling Manuscriptsgive a lasting impression of the multicultural society thatIndia and many South-east Asian countries are and its deep-rooted knowledge system that has been passed down fromgenerations Therefore it is essential to access the contentwithin these documents before it is lost forever Many studieson Indic palm-leaf and paper-based manuscripts exist butthese are typically conducted on small and often privatecollections of documents [1] [2] No publicly available large-scale annotated dataset of historical Indic manuscripts existsto the best of our knowledge

In this paper we take a significant step to address the issueby building an annotation and analytics ecosystem called His-torical Intelligent Document Layout Analytics or HInDoLAfor short We are certainly not the first to develop such asystem Indeed many useful annotation and analysis tools exist

to facilitate progress in creation and analysis of historical doc-ument manuscripts [3]ndash[5] The system we propose containsmany of the features found in existing annotation systemsHowever some of these systems are primarily oriented towardsoffline annotation single user interaction and are unable toprovide a unified management of annotation process andmonitoring of annotation performance In contrast our web-based system addresses these aspects and provides additionalcapabilities The additional features are tailored for annotationand examining annotation analytics for documents with denseand irregular layout elements especially those found in Indicmanuscripts More generally our system is an instance ofthe recent trend of collaborative cloudweb based annotationsystems and services [6] [7] Our system is completely open-source and contains documentation for ease of installation anduseThe overall architecture of HInDoLA comprises of three majorcomponents - Annotation Tool Dashboard Analytics and MLEngines An overview of the architecture can be seen inFigure 1 Next we shall look at the components of HInDoLAindividually

II ANNOTATION TOOL

The Annotation tool enables interactive annotation ofmanuscripts with user level involvement The tool requiresannotators to perform a one-time registration This enablestracking their annotation performance at multiple levels ofgranularity It is configured as a web-based service available ata pre-specified URL However the tool is designed such thatit can operate offline as well This is especially convenient forsandboxing and extending the functionality of the tool Theoffline version of the tool is also convenient when the server-based components of HInDoLA cannot be installed due toaccess restrictions The web-based version however is moreflexible allows multiple concurrent annotation sessions andenables addition of new features without the intervention ofusers Most systems like [7] [8] have also released their toolsin web based version

Metadata

UnAnnotated Annotated

Annotation Tool

MetadataDashBoard

Viewer

Analytics

Normal ModeCorrection Mode

ML Engines

User Version image

IntelligentVersion image

Intelligent

Fig 1 The Architecture of HInDoLA

A Region Types amp Spatial Annotations

For annotating components of the manuscript we haveintroduced a free-hand drawing feature along with the usualpolygon and rectangle draw feature The free-hand regiondrawing feature enables rapidity in annotations of irregularsmall regions and very large regions in the images thus provid-ing maximum annotation flexibility Just a single mouse clickto start and smoothly moving the mouse along the componentboundary creates a region The polygon region is created byaddition of vertices through mouse clicks The ultra zoom andspanning feature enable the users to overcome the challengeof dense layout parsing and aid in pixel level accuracy ofannotations Once a region is drawn it can be resized bydragging the vertices on the layout edge Documents evenwith irregular layouts contain repetitive structural elements(eg lines) To exploit this repetitive structure annotations ofprevious regions can be copied and shifted to yet unannotatedregions of the image thus saving time

Once a region is annotated an annotation window pop-upappears for label specification for the region Keeping theIndic palm-leaf manuscripts in mind we have introduced 9different labels for regions on image - Character Line SegmentCharacter Component Hole Page Boundary Library MarkerDecorator Picture Physical Degradation and Boundary LineThe region boundary of annotated components of the imageare shown in different vibrant colors which helps in visualanalysis of quality of annotations Some annotated images andtheir intelligent mode predictions are shown in Figure 2 Forenabling text annotations in future we have also introduced atextbox along with label list

All the annotations from the HInDoLA Annotation tool areextracted in JSON format Along with the region coordinatesand labels the JSON file also stores time-stamp for each regionannotated which serves as a metric to support training ofintelligent models for automatic annotation (see Figure 3) TheJSON schema is inspired from VIA [9] annotation scheme

B Backend Annotation ManagerFor handling HInDoLA annotation activities and sessions

of different annotators we have introduced an AnnotationManager which handles the distributed parallel sessions of reg-istered annotators while managing and updating the databaseThe system is designed with a request queue which handlesthe sessions of multiple users working on the system simulta-neously and loads un-annotated images in Normal mode andannotated images in Correction mode The Correction modetoggle is useful for expert annotators to correct the annotationsand thus improves the quality of annotations The Skip buttonhelps annotator to skip the served image if the image is toocorrupted or if they are not sure about proper layout parsingof newly introduced components The Done button saves theannotation JSON file at the backend folder An interestingfeature is the ability to save multiple copies of JSON filesfor the same document image which enables the admins tosave the best version of the annotated image

For building intelligent annotation systems we configureHInDoLA to interface with Deep Network based machine-learning models One of the models is fully automatic andprovides a instance-level layout estimate for a given documentAnother model is semi-automatic The annotator providespartial information in the form of the regionrsquos bounding box

Character Line Segment Character Component Hole Page Boundary Library Marker Decorator Picture Physical Degradation Boundary Line(CLS) (CC) (H) (PB) (LM) (D) (P) (PD) (BL)

PIH 2455 521 3 262 33 60 95 35 404BHOOMI 2476 202 563 314 134 6 2 2231 4

Combined 4931 723 566 576 167 66 97 2266 408

TABLE I Counts for various annotated region types in INDISCAPES dataset The abbreviations used for region types are givenbelow each region type

Result from Indiscapes (Instance semantic segmentation Model) Paper Bounding Box Supervision Intelligent ModeDataset H CLS PD PB CC P D LM BL

PIH NA 74177690 NA 86909929 52846679 60495775 5235394 50297411 29455070Bhoomi 79297678 29077253 8727006 91099128 32506585 NA NA 38257665 NACombined 79297678 57777497 8727006 88479679 45876649 60495775 5235394 42937556 29455070

TABLE II Classwise average IoUs for Penn in Hand (PIH) Bhoomi and Combined datasets (reference Table I) The betterof IoU scores is highlighted in red for easy comparison

The model then predicts a tight region boundary estimate Both the models predict masks for different regions To storethese estimates in the standard annotation point control pointsalong the boundary are estimated using the Ramer-Douglas-Peucker curve approximation method

III HINDOLA amp INDISCAPES THE INDIC MANUSCRIPTDATASET

Using HInDoLA we have created Indiscapes the first everdetailed spatial layout annotated dataset for Indic manuscriptsThe images in our dataset are obtained from two sources Thefirst source is publicly available Indic manuscript collectionfrom University of Pennsylvaniarsquos Rare book and manuscriptlibrary [10] also referred to as Penn-in-Hand(PIH) Fromthe 2880 Indic manuscript book-sets we annotated 204manuscript images The selection was aimed at maximizing thediversity in terms of various attributes such as script languagephysical degradation and presence of non-textual elements(eg picturs tables) and number of lines The second sourceis Bhoomi a collection of 322 images sourced from multipleOriental research institutes and libraries across India The latterset of images are characterized by relatively larger widthpresence of multiple languages close and irregular spaced textlines binding holes and degradations Overall we annotated526 annotated Indic manuscripts with HInDoLA The statisticsof the dataset are provided in Table I

IV ML ENGINES

The ML Engines integration comprises of the intelligentregion boundary predictions from deep learning models toenable fully-automatic or semi-automatic annotation systemThe deep models can learn to automate annotations giveninitial annotated training dataOne of our models is the Instance Semantic Segmentationmodel which can isolate individual instances of each regionin the manuscript image We employ the Mask R-CNN [11]which has proven to be very effective at the task of object-instance segmentation in photos The network architecturecontains three stages - BackBone Region Proposal Networkand Multi-Task Branch networks The network is initialized

with weights obtained from a mask R-CNN trained on the MS-COCO [12] dataset with a ResNet-50 [13] backbone Detailsabout the model can be found in our work [14] The workflowis as follows A new image is served on Annotation toolupon request If the user selects Intelligent Mode the Instancesegmentation prediction model becomes active The modelpredicts the masks for different regions in the input imagefollowing which a sub-system generates control points on theboundaries of regions These control points are overlaid on theinput image and enable the annotator to modify the predictionas required to obtain a tight region boundaryAnother model for the intelligent system is the Boundingbox supervision model which learns the edge features andgenerates the boundary around the region Annotators selecta bounding box around a region in a newly requested imagewhich is automatically forwarded to the edge model at back-end The model predicts edge features and edge-logits (pixelvalues ranging from 0-1) of the region through a ConvolutionNeural Network and generates a concave-hull boundary fromthe most prominent features The boundary is then integratedwith control points similar to Instance segmentation modelboundary prediction which enables serving the region bound-ary annotation on the Annotation tool Annotators may thenadjust the control points to get a tight bounding box aroundthe region The details can be found in our technical reportComparison of our current two intelligent models- InstanceSemantic Segmentation and Bounding Box supervision modelon basis of IoU (Intersection over Union) of predicted anno-tation boundary masks is detailed in Table IICurrently our system can also be configured with any deeplearning based intelligent boundary prediction engines Modelsjust need to predict the masks and the system auto-generatescontrol points (vertices) on the boundaries of masks to enablethe serving of auto-annotated image on tool This functionalitysaves the time of users and experts otherwise spent on tedioustask of annotating whole manuscript image from scratch

V DASHBOARD

We introduce dashboard-style analytics keeping in mindthe non-technical nature of domain experts and the unique

Fig 2 Ground truth annotations (Normal Mode) (left) and predicted instance segmentations (right) for Intelligent Mode Notethat we use colored shading only to visualize individual region instances and not to color-code region types

layout-level challenges of Indic Manuscripts The dashboardfeatures many annotation management activities such as dis-playing graphs for the annotation stats progress report ofthe annotators An additional feature is the database viewerThis component is helpful in monitoring the live annotationsessions and viewing annotation statistics of the documents inan interactive fashion

A Annotation Analytics

This section of dashboard is responsible for displayingthe annotation statistics The manuscripts are grouped anddisplayed by annotated time languages number of regionsetc Various kind of histograms and pie-charts are shownon dashboard demonstrating number of experiments addedcompleted document annotations (snapshot in Figure 1) Thisfunctionality gives an edge over similar systems [6] [7] savingthe time of researchers otherwise spent on book-keeping ofannotation activities The progress of annotators can be alsobe tracked via the score of annotated files done

B Viewer

This section of dashboard monitors the quality of completedannotations Annotated and un-annotated document imagescan be searched and displayed on the interface with the helpof special search filters Some of the basic filters are searchingby users date of annotation language type and time taken forannotation A special feature of lsquoBookmarkrsquo is introduced toselect priority files for annotation Bookmarking of both un-annotated and annotated images is enabled moving the files

to special database tables with immediate serving on requestinto the tool from next annotation sessions

VI THE HINDOLA SYSTEM AVAILABILITYDOCUMENTATION AND HANDS-ON EXPERIENCE

The HInDoLA is designed for Palm-Leaf Manuscript doc-ument images However the system can be easily used forany other historical document collection To enable such apossibility we have setup Github repositories to access thecomponents developed for the system along with documenta-tion and usage guides

bull Annotation Tool - httpsgithubcomihdiahindola-backend

bull Metadata - httpsgithubcomihdiahindola-dbsbull DashBoard- httpsgithubcomihdiahindola-dashbull Instance semantic segmentation prediction -

httpsgithubcomihdiainstance-segmentation-v1

Hands-on experience videos are recorded for each of thesetools that clearly show how to setup and userun all of theirfunctionalities The source code of these tools and their videolinks are fully available on their respective repositories Thetool can be deployed on a single machine with the helpof scripts enabling incremental addition and modification ofany historical document image collection and database filesThe snapshots of the tools on run can be seen in Figure 1Comparison of our Hindola system with other existing his-torical document image annotations and ground-truth creationsystems is shown in Table III

OnlineDesktop Tool Dashboard Analytics Integration of ML Engines Viewer(Bookmarking) Annotations Export Format Performance Analysis SemiAuto Supervised Availability(as of July 2019)

HINDOLA OnlineDesktop 34 34 34 VIA Json Schema 34 SemiAuto (Deep network) Source-Code + Web ServiceDAB [15] Online 37 37 37 XML 34 NA Web Service

TRANSCRIPTORIUM PROJECT [16] Online 37 34 37 TEI 34 Semi (Non-deep) Web ServiceDIVADIAWI [17] Online (HTTP request) 37 34 37 Json 34 SemiAuto (Non-deep) Web Service (Restful API)

MIRADOR ANNOTATION TOOL OnlineDesktop 34 37 34 Json (stringify) 37 NA Source Code + Web ServiceALETHIA [5] Desktop 37 34 34 PAGE 34 SemiAuto (Non-deep) Executable offline

TABLE III Comparison with existing Historical document image Annotation Tools

Annotation Tool Annotated Image Json

Fig 3 Screenshots of our web-based Annotation Tool(left) and the Attributes of annotated region shown in annotated imageJson (right)

VII CONCLUSION

The presented HInDoLA open-source cloud system is acollection of tools for large scale annotation and analysis ofIndic palm-leaf manuscripts The system not only containsstate-of-art historical document layout analysis techniques (ieAnnotation Tool Dashboard Analytics) but also the Semanticinstance-level segmentation prediction We believe that theavailability of layout annotations will play a crucial role in re-ducing the overall complexity for subsequent OCR and similartasks such as word-spotting style-and-content based retrievalAll the utilities are made available for the research communitywith proper documentation and hands-on experience videosto help use the tools with ease See httpihdiaiiitacin foradditional information

REFERENCES

[1] P N Sastry T V Lakshmi N K Rao and K RamaKrishnanldquoA 3d approach for palm leaf character recognition using histogramcomputation and distance profile featuresrdquo in Proc 5th Intl Conf onFrontiers in Intelligent Computing Theory and Applications Springer2017 pp 387ndash395

[2] C Clausner A Antonacopoulos T Derrick and S Pletschacher ldquoIc-dar2017 competition on recognition of early indian printed documents-reid2017rdquo in ICDAR vol 1 IEEE 2017 pp 1411ndash1416

[3] D Doermann E Zotkina and H Li ldquoGEDI-a groundtruthing environ-ment for document imagesrdquo in Ninth IAPR Intl Workshop on DocumentAnalysis Systems 2010

[4] A Garz M Seuret F Simistira A Fischer and R Ingold ldquoCreatingground truth for historical manuscripts with document graphs andscribbling interactionrdquo in DAS IEEE 2016 pp 126ndash131

[7] M Wursch R Ingold and M Liwicki ldquoDivaservicesa restful webservice for document image analysis methodsrdquo Digital Scholarship inthe Humanities vol 32 no 1 pp 150ndash156 2016

[5] C Clausner S Pletschacher and A Antonacopoulos ldquoAletheia-an ad-vanced document layout and text ground-truthing system for productionenvironmentsrdquo in ICDAR IEEE 2011 pp 48ndash52

[6] ldquoWeb aletheiardquo [Online] Available httpsgithubcomPRImA-Research-Labprima-gwt-lib

[8] S Bukhari A Kadi M Ayman Jouneh F M Mir and A Dengelldquoanyocr An open-source ocr system for historical archivesrdquo 11 2017pp 305ndash310

[9] A Dutta A Gupta and A Zissermann ldquoVGG image annotator (VIA)rdquohttpwwwrobotsoxacuk vggsoftwarevia 2016

[10] ldquoPenn in hand Selected manuscriptsrdquo httpdlalibraryupennedudlamedrensearchhtmlfq=collection facetrdquoIndicManuscriptsrdquo

[11] K He G Gkioxari P Dollar and R B Girshick ldquoMask r-cnnrdquo ICCVpp 2980ndash2988 2017

[12] T Lin M Maire S J Belongie L D Bourdev R B GirshickJ Hays P Perona D Ramanan P Dollar and C L Zitnick ldquoMicrosoftCOCO common objects in contextrdquo CoRR vol abs14050312 2014[Online] Available httparxivorgabs14050312

[13] K He X Zhang S Ren and J Sun ldquoDeep residual learning for imagerecognitionrdquo in CVPR 2016 pp 770ndash778

[14] A Prusty S Aitha A Trivedi and R K Sarvadevabhatla ldquoIndiscapesInstance segmentation networks for layout parsing of historical indicmanuscriptsrdquo ICDAR 2019

[15] B Lamiroy and D Lopresti ldquoAn open architecture for end-to-enddocument analysis benchmarkingrdquo in 2011 International Conference onDocument Analysis and Recognition Sep 2011 pp 42ndash47

[16] B Gatos G Louloudis T Causer K Grint V Romero J ASanchez A H Toselli and E Vidal ldquoGround-truth production in thetranscriptorium projectrdquo in Proceedings of the 2013 27th BrazilianSymposium on Software Engineering ser SBES rsquo13 Washington DCUSA IEEE Computer Society 2013 pp 237ndash241 [Online] Availablehttpsdoiorg101109DAS201423

[17] M Wursch R Ingold and M Liwicki ldquoDivaServicesA RESTful webservice for Document Image Analysis methodsrdquo Digital Scholarship inthe Humanities vol 32 no suppl1 pp150minusminus156 112016

Introduction

Annotation Tool

Region Types amp Spatial Annotations

Backend Annotation Manager

HInDoLA amp Indiscapes The Indic Manuscript Dataset

ML Engines

Dashboard

Annotation Analytics

Viewer

The HInDoLA System Availability Documentation and Hands-on Experience

Conclusion

References

Metadata

UnAnnotated Annotated

Annotation Tool

MetadataDashBoard

Viewer

Analytics

Normal ModeCorrection Mode

ML Engines

User Version image

IntelligentVersion image

Intelligent

Fig 1 The Architecture of HInDoLA

A Region Types amp Spatial Annotations

For annotating components of the manuscript we haveintroduced a free-hand drawing feature along with the usualpolygon and rectangle draw feature The free-hand regiondrawing feature enables rapidity in annotations of irregularsmall regions and very large regions in the images thus provid-ing maximum annotation flexibility Just a single mouse clickto start and smoothly moving the mouse along the componentboundary creates a region The polygon region is created byaddition of vertices through mouse clicks The ultra zoom andspanning feature enable the users to overcome the challengeof dense layout parsing and aid in pixel level accuracy ofannotations Once a region is drawn it can be resized bydragging the vertices on the layout edge Documents evenwith irregular layouts contain repetitive structural elements(eg lines) To exploit this repetitive structure annotations ofprevious regions can be copied and shifted to yet unannotatedregions of the image thus saving time

Once a region is annotated an annotation window pop-upappears for label specification for the region Keeping theIndic palm-leaf manuscripts in mind we have introduced 9different labels for regions on image - Character Line SegmentCharacter Component Hole Page Boundary Library MarkerDecorator Picture Physical Degradation and Boundary LineThe region boundary of annotated components of the imageare shown in different vibrant colors which helps in visualanalysis of quality of annotations Some annotated images andtheir intelligent mode predictions are shown in Figure 2 Forenabling text annotations in future we have also introduced atextbox along with label list

All the annotations from the HInDoLA Annotation tool areextracted in JSON format Along with the region coordinatesand labels the JSON file also stores time-stamp for each regionannotated which serves as a metric to support training ofintelligent models for automatic annotation (see Figure 3) TheJSON schema is inspired from VIA [9] annotation scheme

B Backend Annotation ManagerFor handling HInDoLA annotation activities and sessions

of different annotators we have introduced an AnnotationManager which handles the distributed parallel sessions of reg-istered annotators while managing and updating the databaseThe system is designed with a request queue which handlesthe sessions of multiple users working on the system simulta-neously and loads un-annotated images in Normal mode andannotated images in Correction mode The Correction modetoggle is useful for expert annotators to correct the annotationsand thus improves the quality of annotations The Skip buttonhelps annotator to skip the served image if the image is toocorrupted or if they are not sure about proper layout parsingof newly introduced components The Done button saves theannotation JSON file at the backend folder An interestingfeature is the ability to save multiple copies of JSON filesfor the same document image which enables the admins tosave the best version of the annotated image

For building intelligent annotation systems we configureHInDoLA to interface with Deep Network based machine-learning models One of the models is fully automatic andprovides a instance-level layout estimate for a given documentAnother model is semi-automatic The annotator providespartial information in the form of the regionrsquos bounding box

PIH 2455 521 3 262 33 60 95 35 404BHOOMI 2476 202 563 314 134 6 2 2231 4

Combined 4931 723 566 576 167 66 97 2266 408

IV ML ENGINES

V DASHBOARD

B Viewer

VII CONCLUSION

REFERENCES

Introduction

Annotation Tool




ML Engines

Dashboard


Viewer


Conclusion

References

PIH 2455 521 3 262 33 60 95 35 404BHOOMI 2476 202 563 314 134 6 2 2231 4

Combined 4931 723 566 576 167 66 97 2266 408

IV ML ENGINES

V DASHBOARD

B Viewer

VII CONCLUSION

REFERENCES

Introduction

Annotation Tool




ML Engines

Dashboard


Viewer


Conclusion

References

B Viewer

VII CONCLUSION

REFERENCES

Introduction

Annotation Tool




ML Engines

Dashboard


Viewer


Conclusion

References

VII CONCLUSION

REFERENCES

Introduction

Annotation Tool




ML Engines

Dashboard


Viewer


Conclusion

References