The Standard Landscape: a framework ready for adoption
Foundational standards (basic interoperability and exchange): UTF-8, ISO 639, W3C XML
Standards for management and representation of LRs (foundational): FSR, TEI, LMF
Standards for linguistic representation (officially established, syntactic interoperability): MAF, SynAF, LAF
Standards for terminology management and translation technologies (market pressure from the translation industry): TMF, TBX (ETSI), XLIFF (OASIS), OAXAL (W3C, OASIS, ETSI)
On-going standardisation projects and initiatives (recently mature areas of linguistic analysis and emerging technologies): SPACE (ISO), EML (W3C), EMMA (W3C)
Building on the CLARIN Standardisation Action Plan
To become a living document updated by the community
Reference guide for LT META-SHARE
Barriers for Standards
1. Lack of (open) tools for using existing standards easily
2. Lack of (ideally open-source) reference implementations and documentation
3. Lack of developer/user education and culture for using standards
4. Lack of an organizing umbrella to monitor compliance to standards
5. Some standards (ISO) need to be paid for
6. Participation in the definition and decision making process is costly
7. Building by consensus is by necessity a slow process
From Barriers … to Recommendations
• 4.1 LRP: Standards must be open
• 4.2 PM: Need for a body organising, monitoring and coordinating standardisation efforts
• 4.6 LRP, PM: Encourage building tools that enable the use of standards, and step up the availability of sharable/exchangeable data
• 4.12 LRP: Create 'official' validators to check compliance of LRs with basic linguistic standards
Motivations for Standards
1. Same tools on different data; different tools on the same data
2. Creation of workflows: Web Service interoperability
3. Integration/interlinking of resources
4. Documentation and metadata
5. Validation of LRs
6. Evaluation campaigns
7. Mash-up
8. Collaborative creation of resources
9. Preservation
Some scenarios
From Motivations … to Recommendations
• 4.3 LRP: Semantic/content interoperability is needed
• 4.19 LRP: Define and establish a Quality Certificate or Quality Score for LRs, to be endorsed by the community
• 4.5 LRP: Closely monitor the Linked Open Data initiative, which is tightly connected to semantic interoperability
• 4.7 LRP: Use the web service model to provide platforms with NLP modules as web services for various applications
• 4.8 LRP, PM: Project results could be provided as web services. Cloud-based service architectures could also be leveraged as enablers for LT development
• 4.9 LRP, PM: Promote collaborative development of resources, also as an aid to standardisation
Conditions
Some selected Recommendations
• 4.21 PM: Testing/applying standards on a multilingual basis
• 4.22 PM: Standardisation initiatives conducted at an international level
• 4.23 LRP: Ensure that all standards are fully operational
• 4.24 PM: Ensure "meta-interoperability" among standards, which must form a coherent framework in an LR ecology
Coverage, Quality, Adequacy
“Address appropriate coverage in terms of quantity, quality and adequacy to technological purposes”
Background
With the current data-driven paradigm, innovation crucially depends on the availability of large amounts of data, of the right type and appropriate quality
Dependence on data creates disparity for under-resourced languages and domains
More data, for more languages, and more applications
More types of data, accounting for current and future language data being collected
Analysis and Recommendations
Investments in basic and long term research on the automatic production of language resources should be increased to broaden the range of languages and resources addressed
Promote (shared) gold-standard annotation projects
Promote more efficient uses of available annotated data with repurposing and merging techniques
Promote inter-disciplinary research with other scientific domains working on large volumes of data and monitor current innovative techniques
Resource Quantity
Suggested Actions
Increase the quantity of resources available to address language and application needs
Enforce shared/distributed construction of resources as a means to achieve better coverage
Support and develop BLARKs for all languages and main applications
Fully develop the BLARK concept so that it can be embodied as a standard
Allocate funding to resource production according to BLARK-like criteria
Resource Quality
Establish a European evaluation and validation body: an infrastructure for coordinated LRT evaluation & validation
Establish common and standard Language Technology evaluation procedures
Devise new methods for LR quality check
Promote evaluation and validation activities of LRs and dissemination of their outcomes
Carry out evaluation in real-world scenarios
Provide high-quality resources for all European languages
Define and establish a Quality Seal of Approval, to be endorsed by the community
An infrastructure of language resources
“…An infrastructure that supports seamless access, reuse and trust of data. In a sense, the physical and technical infrastructure becomes invisible and the data themselves become the infrastructure…”
Avoid fragmentation among multiple competing initiatives and ensure sustainability after projects’ end
Back up infrastructure with community sustainment to ensure acceptance
One of the first recommendations since the beginning!
Being implemented in META-SHARE
An infrastructure of language resources
LRP PM
Build a sustainable facility for sharing resource data and tools • •
Make LRs available, visible and easily accessible through an appropriate infrastructure; participate in building the infrastructure by providing feedback on relevant actions in this direction • •
Ensure a stable sustainable infrastructure for LR sharing and exchange; support continuous development and promotional activities • •
Establish an infrastructure that can help LR producers with legal issues •
Support the emergence of a tool-sharing infrastructure to lower the cost of R&D for new applications in new language resource domains •
Establish international hub of resources and technologies for speech and language services, by creating a mechanism for accumulating speech and language resources together with industries and communities •
Develop and propose (free) tools and more generally Web services (comparable to the Language Grid), including evaluation protocols and collaborative workbenches in the LR infrastructure •
Infrastructural matters: Recognition
To compensate for costs and effort, the entire ecosystem surrounding Language Resources should be promoted and sustained
Give greater recognition to successful LRs and their producers
Support training in production and use of LRs
Develop a standard protocol for citation of LRs
LR stakeholders should collaborate towards the definition of an International Standard for Language Resource Numbering (ISLRN)
Define a Language Resource Impact Factor along the lines followed in other fields such as Biology
International Framework
Shared effort (financial, organisational)
EC / EU Member States – Regions
Extend to Non-EU partners
Provide examples & best practices (e.g., a commonly agreed set of basic LRs) for less-resourced languages
Discuss future policies and priorities on a global scale
Synergies among initiatives at international level should be promoted
In particular among infrastructural initiatives
Intensify networking and increase support actions
An International Forum to share information, discuss strategies and declare/define common objectives
Some LR Infrastructures: World-wide Cooperation & some Priorities
A common effort towards priorities for the field, especially with respect to infrastructural, strategic, research, organisational, policy, ... issues
Even more critical now
With several initiatives of an infrastructural (meta-research) nature around LRT, we must:
• Conceive common strategies based on shared principles and objectives, leading in convergent directions
• State common recommendations to our funding agencies, to create synergies among programmes
• What we can do in the near future vs what could be the "big vision" for the future
• Which first small steps?
• See how the experience gained by some can benefit all
• See how Less-Resourced Languages can be taken on-board
From no infrastructure ...
To many infrastructures
We were complaining there was no infrastructure ...
Have we been too successful??
Now many infrastructural initiatives
Very good opportunity
But only if we are able to act in a coordinated & coherent way
Otherwise we spoil & confuse the field
FLaReNet International Cooperation for a common effort
To prove that "we are a community"
… towards a shared set of priorities
A FLaReNet lesson A plan
The Community: the Way Forward
It is important to act as a community.
Around FLaReNet, i.e. around LRs: individual subscribers, institutional members, National Contact Points
The plan:
Keep the community together with the support of ELRA (European Language Resources Association) & META-SHARE
Meeting of the National contact points at LREC in Istanbul
Planning for a large crowd of people fostering LRs & Evaluation
They represent a community
FLaReNet data: the Way Forward
Many initiatives, being community based, can go on …
Provide info on National programmes on the wiki
Insert data on the LRE Map
Contribute to the Language Library, with annotated data, starting at LREC
Contribute LRs & LTs to META-SHARE
...
Try to make the Blueprint a "living document"?
…
World-wide Collaborative Watch on LRs
FLaReNet assets will be sustained, not to lose momentum
ELRA will play a role
Impact
However, the impact of the recommendations can only be measured in the long run
If they are adopted by funding agencies, policy makers …
Recognise the Value & solve the Conflict
[Diagram labels: Data is power; Competition; Infrastructural; Can do without; No Strategic Agenda; Best way to capture the full potential; Economic benefits; Public concerns; Data policies; LRs; Openness; Sharing; Interoperability; Collaborative; Innovation; Exploitation; New Services]
The Challenge: how to unlock this value
Legal issues