4 November 2010, EM+/CNGL workshop
Shared resources, shared values? Ethical implications of sharing translation resources
Jo Drugan and Bogdan Babych, University of Leeds, UK
www.leeds.ac.uk/cts
Overview
• Practical challenges to sharing translation resources, but also ethical and legal problems
• Recent collaboration and greater openness, but focus generally on practical issues
• Good reasons for failure to broach ethics
• Yet essential to do so: huge and growing demand for translation can’t be met without sharing
• Questions users and developers should be asking, and suggested ways forward
Talk map
1. Practical problems in sharing translation resources
2. Ethical problems in sharing translation resources
3. Case studies
  • Google Translation Toolkit
  • TAUS Language Search Engine (LSE)
4. Conclusion
Sharing translation resources: Practical problems
• Exploitation of large parallel corpora to create/populate translation resources hampered by:
  – “Locked-in” data: range of tools
  – Ineffective exchange formats
    • Vashee 2010: ‘Translation tools often trap your data in a silo because the vendors WANT to lock you in and make it painful for you to leave’
  – Client reservations
Recent progress on practical problems
• Large minable multilingual corpora released online since the 1990s
  – Canadian Hansard, UN texts, Europarl corpus
  – Large-scale SMT platforms rely on such parallel corpora
• European Union TM archive, 2007
• Translation Automation User Society (TAUS), 2007
• Shared online Translation Environment Tools (TEnTs), crowdsourced/collaborative translation
Sharing translation resources and MT
• Koehn 2010: SMT is domain-dependent to a much greater degree than RBMT
  – Lower quality of out-of-domain translation
• Sharing translation resources essential for building high-quality SMT systems
  – Range of text types/subject domains
  – Requires consideration of ethical and legal issues
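One simple way to make Koehn’s point about domain dependence concrete is the out-of-vocabulary (OOV) rate: the share of words in a new text that never occur in the training data. The Python sketch below is illustrative only. The toy “parliamentary” training sentences and both test sentences are hypothetical, and the OOV rate measures vocabulary coverage, not translation quality. It is one visible reason why an SMT system trained on one domain struggles on another.

```python
import re

def tokenize(text: str) -> list[str]:
    # Naive lowercase word tokenizer (illustrative; real SMT pipelines use proper tokenizers)
    return re.findall(r"\w+", text.lower())

def oov_rate(training_sentences: list[str], test_sentence: str) -> float:
    """Fraction of test tokens that never appear in the training sentences."""
    vocab: set[str] = set()
    for sentence in training_sentences:
        vocab.update(tokenize(sentence))
    tokens = tokenize(test_sentence)
    if not tokens:
        return 0.0
    unseen = sum(1 for token in tokens if token not in vocab)
    return unseen / len(tokens)

# Hypothetical toy "parliamentary" training data, standing in for a Hansard-style corpus
TRAIN = [
    "The House adjourned the debate.",
    "The honourable member asked a question.",
    "The committee will report to the House.",
]
IN_DOMAIN = "The member asked the committee a question."
OUT_OF_DOMAIN = "Tighten the bolt on the rear brake."

if __name__ == "__main__":
    print(f"in-domain OOV rate:     {oov_rate(TRAIN, IN_DOMAIN):.2f}")
    print(f"out-of-domain OOV rate: {oov_rate(TRAIN, OUT_OF_DOMAIN):.2f}")
```

Every word of the in-domain sentence is covered by the training vocabulary, while most of the technical sentence is not. Pooling corpora from a range of text types and subject domains is what narrows that gap, which is why sharing matters for SMT.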
And ethics?
• Conspicuous by its absence: limited to issues of (informed) consent and ‘threats’ to translators
  – Improved MT quality
  – Collaborative translation
• Yet familiar issues
  – Trailblazers (Wikipedia)
  – Legal grey areas (translation as international activity par excellence)
Consequences?
• Two standard reactions:
  1. ‘Don’t ask, don’t tell’
    • Risks of burying your head in the sand
    • Legal implications, traceability
  2. Excessive caution
    • Passing up potentially valuable data
Consequences for MT?
• ‘What has ethics got to do with MT?’
• Sharing translation resources requires consideration of ethical and legal issues
  – Confidentiality of data
  – Trade, industrial, state secrets
  – Intellectual property rights (moral rights?) of translators, authors, data owners
Engaging with ethics
• Share data confidently, arguing from clearly stated values
• Draw on precedents in related fields/debates
• Essential because sharing is increasingly the norm
  – TAUS: Information Age = ‘insatiable demand for translation services that cannot be met with existing proprietary business models and the capacity of around 300 000 professional translators worldwide’
• One way in: case studies
  – Ethical questions raised by what’s actually happening
1. Google Translation Toolkit
• SMT
  – Since 2005, http://translate.google.com/
  – 58 language pairs in 2010
  – For assimilation, typically not integrated into translation workflow
• MT post-editing concerns
• Google’s move to embed MT in an online collaborative translation environment: Google Translation Toolkit
Google Translation Toolkit
• MT integrated with TM and user dictionary functionality
• TM matches/user dictionary entries have priority; where none is available, the translator post-edits MT output
• Translators collaborate, as in Google Docs
• Stored on ‘cloud’ servers but can be downloaded
• User options: no MT if preferred
• But limiting factors…
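The priority order on this slide can be pictured as a simple lookup chain. TM and user-dictionary matches come first, and MT output for post-editing is used otherwise, with MT switchable off. The Python sketch below only illustrates that order and is not the Toolkit’s implementation, which the slides do not describe. The `Suggestion` record, the `suggest` function, the 0.75 fuzzy threshold and the use of `difflib.SequenceMatcher` as a stand-in for TM fuzzy-match scoring are all hypothetical choices.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Optional

@dataclass
class Suggestion:
    target: str
    origin: str            # hypothetical labels: "tm", "dictionary", "tm-fuzzy", "mt" or "none"
    needs_post_edit: bool

def suggest(
    segment: str,
    tm: dict[str, str],
    user_dict: dict[str, str],
    mt: Optional[Callable[[str], str]],
    use_mt: bool = True,
    fuzzy_threshold: float = 0.75,  # arbitrary illustrative threshold
) -> Suggestion:
    """Hypothetical lookup order: exact TM, user dictionary, fuzzy TM, then MT."""
    # 1. Exact TM match: reuse as-is
    if segment in tm:
        return Suggestion(tm[segment], "tm", False)
    # 2. User dictionary entry covering the whole segment (e.g. a term)
    if segment in user_dict:
        return Suggestion(user_dict[segment], "dictionary", False)
    # 3. Best fuzzy TM match; difflib's ratio stands in for real fuzzy-match scoring
    best_src, best_score = None, 0.0
    for src in tm:
        score = SequenceMatcher(None, segment, src).ratio()
        if score > best_score:
            best_src, best_score = src, score
    if best_src is not None and best_score >= fuzzy_threshold:
        return Suggestion(tm[best_src], "tm-fuzzy", True)
    # 4. No match: fall back to MT output for post-editing, unless the user disabled MT
    if use_mt and mt is not None:
        return Suggestion(mt(segment), "mt", True)
    return Suggestion("", "none", True)

if __name__ == "__main__":
    tm = {"Hello world": "Bonjour le monde"}
    user_dict = {"translation memory": "mémoire de traduction"}
    fake_mt = lambda s: "[MT] " + s  # placeholder for a real MT engine
    for seg in ["Hello world", "Hello world!", "translation memory", "Good morning"]:
        print(seg, "->", suggest(seg, tm, user_dict, fake_mt))
```

Every step below an exact match is flagged for human review. This mirrors the slide’s point that MT output is a draft to be post-edited, not a finished translation.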
Limiting factors
• Ethical rather than technological
• No. 1: Confidentiality of project and resources
  – Not practical for most real-world professional projects
  – Technically possible to address translators’/clients’ concerns
  – Default settings
Other ethical issues not addressed
• Recognition, compensation of translators’ work
  – Potential legal consequences
  – Other tools support such approaches: http://mymemory.translated.net/doc/
• Ownership, attribution
• Familiar issues
• Potentially useful innovative technology falls down because it fails to take into account practical user-based scenarios, in part due to an inadequate ethical framework
2. TAUS Language Search Engine (LSE)
• Online tool for searching uploaded TMX data
  – Parallel concordances, word alignment techniques
  – Intelligent dictionary
  – User (mis)expectations
• Ethical framework is explicit, even a ‘model’
  – User consent
  – Quid pro quo
  – Data owner responsibility
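To show what “searching uploaded TMX data” with a parallel concordance involves, here is a minimal Python sketch using only the standard library. It reads translation units from a TMX document and returns the source/target pairs whose source segment contains a query word or phrase. This is not the LSE’s implementation: the function names, the sample TMX content and the primary-language-subtag matching are hypothetical, and the word alignment step is not attempted.

```python
import re
import xml.etree.ElementTree as ET
from typing import Iterator

# ElementTree exposes the xml:lang attribute under the XML namespace URI
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def read_tmx(tmx_text: str, src_lang: str, tgt_lang: str) -> Iterator[tuple[str, str]]:
    """Yield (source, target) segment pairs from a TMX document string."""
    root = ET.fromstring(tmx_text)
    src_key, tgt_key = src_lang.split("-")[0].lower(), tgt_lang.split("-")[0].lower()
    for tu in root.iter("tu"):
        segs: dict[str, str] = {}
        for tuv in tu.iter("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain "lang" attribute
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                # Key on the primary language subtag, so "EN-GB" and "en" both match "en"
                segs[lang.split("-")[0].lower()] = "".join(seg.itertext()).strip()
        src, tgt = segs.get(src_key), segs.get(tgt_key)
        if src and tgt:
            yield src, tgt

def concordance(pairs: list[tuple[str, str]], query: str) -> list[tuple[str, str]]:
    """Return pairs whose source segment contains `query` as a whole word or phrase."""
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    return [(s, t) for s, t in pairs if pattern.search(s)]

# Hypothetical sample data (no XML declaration, so ET.fromstring accepts the str directly)
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="0.1" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>The translation memory is shared.</seg></tuv>
      <tuv xml:lang="fr"><seg>La mémoire de traduction est partagée.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Confidential data must not be uploaded.</seg></tuv>
      <tuv xml:lang="fr"><seg>Les données confidentielles ne doivent pas être téléversées.</seg></tuv>
    </tu>
  </body>
</tmx>"""

if __name__ == "__main__":
    pairs = list(read_tmx(SAMPLE_TMX, "en", "fr"))
    for src, tgt in concordance(pairs, "translation memory"):
        print(f"{src}  |  {tgt}")
```

Even this toy search shows why the ethical questions arise. Every hit exposes a complete translated segment from someone’s uploaded memory, including any confidential content it contains.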
But key questions remain unaddressed
• Ethical, not technical
• Ownership and consent: broader issues
  – ‘Community of users and providers of translation technologies and services’, but all large-scale players, not end users or freelance translators
  – Informed consent?
    • NB: not legal/contractual, but broader
  – Industry codes of ethics: ‘taking credit for others’ work’
  – UNESCO 1976 Recommendation: ‘supplementary payment’?
Key questions unaddressed
• Translator choice?
• Should ultimate responsibility afford claims to ultimate ownership?
• Avoiding harm?
• Effects on future translation quality judgments?
Positively ethical
• The aims and ambitions of these two initiatives can be seen as profoundly ethical
• Relevant principles in codes:
  – Professional review, informed critiques, raise standards, improve public understanding, contribute to society and human well-being, respect human diversity, support fellow professionals, contribute to profession’s standing, enhance quality of life
• Not just defensive: allows a case to be made for action rather than inaction
Questions/Discussion