This paper presents research on converting non-accessible web pages containing mathematical formulae into accessible versions through an OCR (Optical Character Recognition) tool. The objective of this research is twofold. First, to establish criteria for evaluating the potential accessibility of mathematical websites, i.e. the feasibility of converting non-accessible (non-MathML) math sites into accessible (MathML) ones. Second, to propose a data model and a mechanism to publish evaluation results, making them available to the educational community, who may use them as a quality measure when selecting learning material. Results show that conversion using OCR tools is not viable for math web pages, mainly for two reasons: many of these pages are designed to be interactive, which makes a correct conversion difficult, if not almost impossible; and formulae (either images or text) have been written without taking into account standards of math writing, so OCR tools do not properly recognize math symbols and expressions. In spite of these results, we think the proposed methodology to create and publish evaluation reports may be useful in other accessibility assessment scenarios.
Miquel Centelles, Mireia Ribera, Inmaculada Rodríguez
Adaptabit Group – University of Barcelona
CSUN Conference 2014
▪ The vision
▪ The problem
▪ Setting the stage
▪ Our point of view
▪ Our solution: MathML, OCR (InftyReader) and linked data
▪ Results and future work
March 2014 CSUN 2014 - Potential accessibility of mathematical webs 2
Teaching methodologies have shifted from content-based to skills-based learning.
A key task for teachers is the selection of web resources that serve as reference, extension and motivation for their students.
This selection is mainly based on content and source quality, but rarely considers accessibility criteria.
Accessibility criteria will be increasingly important.
There are still lots of non-accessible math websites. Why?
▪ Interactivity.
▪ Formulae in graphics or videos.
▪ Automatic conversion to MathML by authoring tools:
  ▪ Not fully reliable
  ▪ Often converts to images
Concerning accessibility in maths, several initiatives have been driven by publishers and libraries.
Concerning the semantic meaning of mathematical formulae, several initiatives link MathML with the semantic web:
▪ Christoph Lange: a proposal to describe generic mathematical formulae.
▪ OpenMath: a lightweight ontology to endorse the meaning of mathematical symbols.
▪ HELM: the pioneer in representing structures of mathematical knowledge in RDF.
▪ To assess the potential accessibility of non-accessible math websites.
  Definition: potential accessibility = the feasibility of converting them into accessible ones.
▪ To publish assessment results in a semantically empowered way.
How to make a math web accessible?
▪ Converting to MathML by re-writing formulae:
  <mrow>
    a &InvisibleTimes; <msup>x 2</msup> + b
    &InvisibleTimes; x + c
  </mrow>
▪ Describing formulae in alternative text:
  “a times x squared plus b times x plus c”
▪ Converting through OCR:
  <mrow>
    a &InvisibleTimes; <msup>x 2</msup> + b
    &InvisibleTimes; x + c
  </mrow>
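The markup on this slide is abbreviated: it leaves out the token elements (`<mi>`, `<mn>`, `<mo>`) that complete MathML wraps around identifiers, numbers and operators. As a minimal sketch, assuming only Python's standard library, the same formula could be generated in full as follows. The helper names `token` and `quadratic_mathml` are made up for this illustration and are not part of the presented work.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
INVISIBLE_TIMES = "\u2062"  # the character behind the &InvisibleTimes; entity


def token(parent, tag, text):
    """Append a MathML token element (mi, mn or mo) holding the given text."""
    element = ET.SubElement(parent, tag)
    element.text = text
    return element


def quadratic_mathml():
    """Return presentation MathML for a*x^2 + b*x + c as a string."""
    math = ET.Element("math", xmlns=MATHML_NS)
    row = ET.SubElement(math, "mrow")
    token(row, "mi", "a")
    token(row, "mo", INVISIBLE_TIMES)
    power = ET.SubElement(row, "msup")  # base first, exponent second
    token(power, "mi", "x")
    token(power, "mn", "2")
    token(row, "mo", "+")
    token(row, "mi", "b")
    token(row, "mo", INVISIBLE_TIMES)
    token(row, "mi", "x")
    token(row, "mo", "+")
    token(row, "mi", "c")
    return ET.tostring(math, encoding="unicode")


if __name__ == "__main__":
    print(quadratic_mathml())
```

Whichever route produces the markup (manual re-writing or OCR), the target structure is the same; only the cost of producing it differs.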
OCR is the best option:
▪ Does not require expertise
▪ OK for low resources
▪ Can be done by the student
▪ First: conceived for web pages.
▪ Later: general format for exchanging mathematics.
▪ Now: provides accessible content for people with disabilities.
MathML support still incomplete but…
Pros
▪ Recognizes math symbols and converts them into:
  ▪ LaTeX
  ▪ XHTML and MathML
Cons
▪ Strong requirements:
  ▪ Pure B&W
  ▪ High resolution (600 dpi)
  ▪ Standard ISO 80000-2:2009
The four principles of the linked data publishing model:
▪ Use URIs as names for things.
▪ Use HTTP URIs so that people can look up those names.
▪ When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
▪ Include links to other URIs, so that they can discover more things.
Opportunities for the data in accessibility reports:
▪ To customize answers to user queries.
▪ To generate data-enriched reports for managers, technicians (webmasters) and policy decision makers.
▪ To enrich search engine results with accessibility results used as website quality indicators.
TWO KEY DECISIONS
Reuse a formal vocabulary: EARL
Use an open-source CMS: Drupal.
EARL = Evaluation and Report Language
It is a simple vocabulary that describes test results, such as those generated by web accessibility evaluation tools.
It uses the RDF data model to define terms for expressing test results.
It is a W3C Working Draft.
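To make the vocabulary concrete, here is a minimal sketch of what one EARL assertion could look like when written out as Turtle from Python. The class and property names (`earl:Assertion`, `earl:assertedBy`, `earl:subject`, `earl:test`, `earl:result`, `earl:outcome`) and the outcome values come from the EARL schema; the function `earl_assertion` and every URI passed to it in the usage lines are hypothetical placeholders, not the identifiers used in the project.

```python
EARL_PREFIX = "@prefix earl: <http://www.w3.org/ns/earl#> ."

# Outcome values defined by the EARL schema.
EARL_OUTCOMES = {"passed", "failed", "cantTell", "inapplicable", "untested"}


def earl_assertion(assertor_uri, subject_uri, test_uri, outcome):
    """Return one EARL assertion serialized as a Turtle string."""
    if outcome not in EARL_OUTCOMES:
        raise ValueError(f"unknown EARL outcome: {outcome}")
    return "\n".join([
        EARL_PREFIX,
        "",
        "[] a earl:Assertion ;",
        f"    earl:assertedBy <{assertor_uri}> ;",
        f"    earl:subject <{subject_uri}> ;",
        f"    earl:test <{test_uri}> ;",
        f"    earl:result [ a earl:TestResult ; earl:outcome earl:{outcome} ] .",
    ])


if __name__ == "__main__":
    # All three URIs below are illustrative placeholders.
    print(earl_assertion(
        "http://example.org/evaluators/1",
        "http://example.org/math-site/",
        "http://example.org/criteria/C2-INFTYREADER",
        "failed",
    ))
```

Plain string formatting is used here only to keep the sketch dependency-free; an RDF library would be the more robust choice for real data.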
A new controlled vocabulary for the Test Case class:
▪ Contains 10 suitable criteria based on the requirements of both the InftyReader OCR software and ISO 80000-2:2009.
▪ Examples:
  ▪ C2-INFTYREADER: Image resolution must be equal to or greater than 600 dpi.
  ▪ C3-ISO: An explicitly defined function not depending on the context is printed in Roman (upright) type, e.g. sin, exp, ln.
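Criteria of the InftyReader kind lend themselves to mechanical checks. The sketch below tests the 600 dpi threshold of C2-INFTYREADER and, as a second illustration, the pure black-and-white requirement listed among InftyReader's constraints. The `FormulaImage` type, the function names and the identifier used for the black-and-white check are assumptions made for this example; only the C2-INFTYREADER identifier and the 600 dpi value come from the slides.

```python
from dataclasses import dataclass
from typing import Dict, Sequence

MIN_DPI = 600  # threshold stated in criterion C2-INFTYREADER


@dataclass
class FormulaImage:
    """Hypothetical description of one formula image found on a web page."""
    dpi: int
    pixels: Sequence[int]  # grey levels: 0 = black, 255 = white


def meets_c2_inftyreader(image: FormulaImage) -> bool:
    """True when the resolution is equal to or greater than 600 dpi."""
    return image.dpi >= MIN_DPI


def is_pure_black_and_white(image: FormulaImage) -> bool:
    """True when every pixel is either fully black or fully white."""
    return all(value in (0, 255) for value in image.pixels)


def evaluate(image: FormulaImage) -> Dict[str, bool]:
    """Map each criterion identifier to a pass/fail flag."""
    return {
        "C2-INFTYREADER": meets_c2_inftyreader(image),
        # Identifier below is hypothetical; the slides do not name it.
        "pure-bw (hypothetical id)": is_pure_black_and_white(image),
    }


if __name__ == "__main__":
    print(evaluate(FormulaImage(dpi=96, pixels=[0, 128, 255])))
```

Criteria of the ISO kind, such as C3-ISO, concern typographic conventions and would likely need human judgment rather than a check like this.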
A new controlled vocabulary for the Test Result class:
▪ Contains 5 categories, based on the percentage of formulae correctly converted into MathML.
▪ Examples:
  ▪ Failed conversion [0%-20%]: This web has a very low potential accessibility.
  ▪ Successful conversion [80%-100%]: This web has the maximum potential accessibility.
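The mapping from a conversion percentage to a result category can be sketched in a few lines. Only the first and last categories are named on the slide, so this example assumes five equal 20-point bands with inclusive lower bounds (exactly 20% falls in the second band) and gives the three middle bands placeholder labels; both choices are assumptions, not the published vocabulary.

```python
def conversion_rate(converted: int, total: int) -> float:
    """Percentage of formulae correctly converted into MathML."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= converted <= total:
        raise ValueError("converted must lie between 0 and total")
    return 100.0 * converted / total


def result_category(percentage: float) -> str:
    """Map a conversion percentage to one of five result categories."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    if percentage < 20:
        return "Failed conversion"  # named on the slide
    if percentage >= 80:
        return "Successful conversion"  # named on the slide
    band = int(percentage // 20) + 1  # 2, 3 or 4
    # The middle labels are placeholders; the slides do not list them.
    return f"Intermediate band {band} of 5 (hypothetical label)"


if __name__ == "__main__":
    rate = conversion_rate(converted=3, total=40)
    print(rate, result_category(rate))
```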
Drupal is the first CMS with Semantic Web services:
▪ Usable by non-experts.
▪ Drupal 7 publishes data in RDF format.
Data model in Drupal 7 → Data model in RDF
▪ Content type → RDF class
▪ Field → RDF property
▪ Node → RDF resource
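The correspondence between the two data models can be illustrated with a small sketch that turns one node into RDF triples: the content type supplies the class, each mapped field supplies a property, and the node itself becomes the resource. This does not call any Drupal API; the dictionary layout, the function name `node_to_triples` and the example URIs are assumptions made for illustration only.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def node_to_triples(node, class_uri, field_map, base_uri):
    """Turn one CMS node into a list of (subject, predicate, object) triples.

    node      -- dict with an 'nid' and a 'fields' dict (hypothetical layout)
    class_uri -- RDF class that the node's content type maps to
    field_map -- field name -> RDF property URI
    base_uri  -- site address used to mint the node's HTTP URI
    """
    subject = f"{base_uri.rstrip('/')}/node/{node['nid']}"  # node -> resource
    triples = [(subject, RDF_TYPE, class_uri)]  # content type -> class
    for field, value in node["fields"].items():
        if field in field_map:  # field -> property; unmapped fields are skipped
            triples.append((subject, field_map[field], value))
    return triples


if __name__ == "__main__":
    # Every value below is a placeholder chosen for the example.
    report = {"nid": 12, "fields": {"outcome": "failed", "internal_note": "draft"}}
    for triple in node_to_triples(
        report,
        class_uri="http://www.w3.org/ns/earl#Assertion",
        field_map={"outcome": "http://www.w3.org/ns/earl#outcome"},
        base_uri="http://example.org/",
    ):
        print(triple)
```

Skipping unmapped fields keeps internal data out of the published graph, which is one reason to make the field-to-property mapping explicit.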
Most websites are not accessible at all:
▪ Interactive
▪ Non-interactive
  ▪ Formula images
    ▪ With very poor quality
    ▪ Without alternative text
  ▪ Formulae not following standards => OCR not working
Useful methodology?
Future work:
▪ Using math ontologies
  ▪ OpenMath
  ▪ OMDoc
▪ Adapt formulae to RDFa
Questions, suggestions, [email protected]
http://www.ub.edu/adaptabit/
[email protected]
▪ Specification: analysis of data sources, URI design, publishing license.
▪ Vocabularies and ontologies
▪ Transform into RDF
▪ Publication
▪ Exploitation