



The Standard Landscape: a framework ready for adoption

Foundational standards (basic interoperability and exchange): UTF-8, ISO 639, W3C XML

Standards for management and representation of LRs (foundational): FSR, TEI, LMF

Standards for linguistic representation (officially established, syntactic interoperability): MAF, SynAF, LAF

Standards for terminology management and translation technologies (market pressure from the translation industry): TMF, TBX (ETSI), XLIFF (OASIS), OAXAL (W3C, OASIS, ETSI)

On-going standardisation projects and initiatives (recently mature areas of linguistic analysis and emerging technologies): SPACE (ISO), EML (W3C), EMMA (W3C)


Building on the CLARIN Standardisation Action Plan

To become a living document updated by the community

Reference guide for LT META-SHARE


Barriers for Standards

1. Lack of (open) tools for using existing standards easily

2. Lack of (ideally open-source) reference implementations and documentation

3. Lack of developer/user education and culture for using standards

4. Lack of an organizing umbrella to monitor compliance to standards

5. Some standards (ISO) need to be paid for

6. Participation in the definition and decision making process is costly

7. Building by consensus is by necessity a slow process


From Barriers … to Recommendations


4.1 LRP

• Standards must be open

4.2 PM

• Need of a body organising, monitoring and coordinating standardisation efforts

4.6 LRP PM

• Encourage building tools that enable the use of standards, and step up the availability of sharable/exchangeable data

4.12 LRP

• Create “official” validators to check compliance of LRs with basic linguistic standards


Motivations for Standards

Some scenarios:

1. Same tools on different data; different tools on the same data

2. Creation of workflows – Web Service interoperability

3. Integration/Interlinking of resources

4. Documentation and Metadata

5. Validation of LRs

6. Evaluation Campaigns

7. Mash-up

8. Collaborative Creation of Resources

9. Preservation


From Motivations … to Recommendations

4.3 LRP

• Semantic/content interoperability is needed

4.19 LRP

• Define and establish a Quality Certificate or Quality Score for LRs, to be endorsed by the community

4.5 LRP

• Closely monitor the Linked Open Data initiative, which is tightly connected to semantic interoperability

4.7 LRP

• Use the web service model to provide platforms with NLP modules as web services for various applications

4.8 LRP PM

• Project results could be provided as web services. Cloud-based service architectures could also be leveraged as enablers for LT development

4.9 LRP PM

• Promote collaborative development of resources, also as an aid to standardisation


Conditions


Some selected Recommendations


4.21 PM

• Testing/applying standards on a multilingual basis

4.22 PM

• Standardisation initiatives conducted at an international level

4.23 LRP

• Ensure that all standards are fully operational

4.24 PM

• Ensure “meta-interoperability” among standards, which must form a coherent framework in an LR ecology


Coverage, Quality, Adequacy

“Address appropriate coverage in terms of quantity, quality and adequacy to technological purposes”

Background

With the current data-driven paradigm, innovation crucially depends on the availability of large amounts of data, of the right type and appropriate quality

Dependence on data creates disparity for under-resourced languages and domains

More data, for more languages, and more applications

More types of data, accounting for current and future language data being collected


Analysis and Recommendations

Investments in basic and long-term research on the automatic production of language resources should be increased to broaden the range of languages and resources addressed

Promote (shared) gold-standard annotation projects

Promote more efficient uses of available annotated data with repurposing and merging techniques

Promote inter-disciplinary research with other scientific domains working on large volumes of data and monitor current innovative techniques


Resource Quantity

Suggested Actions

Increase the quantity of resources available to address language and application needs

Enforce shared/distributed construction of resources as a means to achieve better coverage

Support and develop BLARKs for all languages and main applications

Fully develop the BLARK concept so that it can be embodied as a standard

Allocate funding to resource production according to BLARK-like criteria


Resource Quality

Establish a European evaluation and validation body: an infrastructure for coordinated LRT evaluation & validation

Establish common and standard Language Technology evaluation procedures

Devise new methods for LR quality check

Promote evaluation and validation activities of LRs and dissemination of their outcomes

Carry out evaluation in real-world scenarios

Provide high-quality resources for all European languages

Define and establish a Quality Seal of Approval, to be endorsed by the community


An infrastructure of language resources

“…An infrastructure that supports seamless access, reuse and trust of data. In a sense, the physical and technical infrastructure becomes invisible and the data themselves become the infrastructure…”

Avoid fragmentation among multiple competing initiatives and ensure sustainability after projects’ end

Back up infrastructure with community sustainment to ensure acceptance


One of the first recommendations since the beginning!

Being implemented in META-SHARE


An infrastructure of language resources

LRP PM

Build a sustainable facility for sharing resource data and tools • •

Make LRs available, visible and easily accessible through an appropriate infrastructure; participate in building the infrastructure by providing feedback on relevant actions in this direction • •

Ensure a stable sustainable infrastructure for LR sharing and exchange; support continuous development and promotional activities • •

Establish an infrastructure that can help LR producers with legal issues •

Support the emergence of a tool-sharing infrastructure to lower the cost of R&D for new applications in new language resource domains •

Establish an international hub of resources and technologies for speech and language services, by creating a mechanism for accumulating speech and language resources together with industries and communities

Develop and propose (free) tools and more generally Web services (comparable to the Language Grid), including evaluation protocols and collaborative workbenches in the LR infrastructure


Infrastructural matters: Recognition

To compensate for costs and effort, the entire ecosystem surrounding Language Resources should be promoted and sustained

Give greater recognition to successful LRs and their producers

Support training in production and use of LRs

Develop a standard protocol for citation of LRs

LR stakeholders should collaborate towards the definition of an International Standard for Language Resource Numbering (ISLRN)

Define a Language Resource Impact Factor along the lines followed in other fields such as Biology


International Framework

Shared effort (financial, organisational)

EC / EU Member States – Regions

Extend to Non-EU partners

Provide examples & best practices (e.g., a commonly agreed set of basic LRs) for less-resourced languages

Discuss future policies and priorities on a global scale

Synergies among initiatives at international level should be promoted

In particular among infrastructural initiatives

Intensify networking and increase support actions

An International Forum to share information, discuss strategies and declare/define common objectives


Some LR Infrastructures:

World-wide Cooperation & some Priorities

A common effort towards priorities for the field, especially with respect to infrastructural, strategic, research, organisational, policy, ... issues

Even more critical now

With several initiatives of infrastructural (meta-research) nature around LRT, we must

Conceive common strategies based on shared principles and objectives, leading in convergent directions

State common recommendations to our funding agencies, to create synergies among programmes

What we can do in the near future vs what could be the “big vision” for the future

Which first small steps?

See how the experience gained by some can benefit all

See how Less-Resourced Languages can be taken on board


From no infrastructure ... to many infrastructures

We were complaining there was no infrastructure ...

Have we been too successful?

Now many infrastructural initiatives

Very good opportunity

But only if we are able to act in a coordinated & coherent way

Otherwise we spoil & confuse the field

FLaReNet International Cooperation for a common effort

To prove that “we are a community”

… towards a shared set of priorities


A FLaReNet lesson A plan

The Community: the Way Forward

It is important to act as a community.

Around FLaReNet, i.e. around LRs: individual subscribers, institutional members, National Contact Points

They represent a community

The plan:

Keep the community together with the support of ELRA – European Language Resources Association & META-SHARE

Meeting of the National Contact Points at LREC in Istanbul

Planning for a large crowd of people fostering LR & Ev


FLaReNet data: the Way Forward

Many initiatives, being community based, can go on …

Provide info on National programmes on the wiki

Insert data on the LRE Map

Contribute to the Language Library, with annotated data, starting at LREC

Contribute LRs & LTs to META-SHARE

...

Try to make the Blueprint a “living document”?

World-wide Collaborative Watch on LRs


FLaReNet assets will be sustained, not to lose momentum

ELRA will play a role


Impact

However, the impact of the recommendations can only be measured in the long run

If they are adopted by funding agencies, policy makers …


Recognise the Value & solve the Conflict

[Diagram labels: Data is power; Competition; Infrastructural; Can do without; No Strategic Agenda; Best way to capture the full potential; Economic benefits; Public concerns; Data policies]


[Diagram labels: LRs; Competition; Infrastructural; Openness; Sharing; Interoperability; Collaborative; Innovation; Exploitation; New Services]

The Challenge: how to unlock this value

Legal issues