A Multi-Perspective Evaluation of the NESPOLE! Speech-to-Speech Translation System
Alon Lavie, Carnegie Mellon University
Florian Metze, University of Karlsruhe
Roldano Cattoni, ITC-irst
Erica Costantini, University of Trieste
July 8, 2002, ACL-02 S2S-Translation Workshop
Outline
• The NESPOLE! Project
• Approach and System Architecture
• Performance and Usability Challenges:
– Distributed real-time performance over internet
– Integration and use of multi-modal capabilities
– End-to-end Translation performance
• Lessons learned and conclusions
The NESPOLE! Project
• Speech-to-speech translation for E-Commerce applications
• Partners: CMU, Univ of Karlsruhe, ITC-irst, UJF-CLIPS, AETHRA, APT-Trentino
• Builds on successful collaboration within C-STAR
• Improved limited-domain speech translation
• Experiment with multimodality and with MEMT
• Showcase-1: Travel and Tourism in Trentino, completed in Nov-2001, demonstrated
• Showcase-2: expanded travel + medical service
Speech-to-speech in E-commerce
• Replace current passive web E-commerce with live interaction capabilities
• Client starts via web, can easily connect to agent for specific information
• “Thin client”: very little special hardware and software on client PC (browser, MS NetMeeting, Shared Whiteboard)
Recent Developments: Apr-02
• Improved analysis and generation grammars (using old C-STAR data)
• Improved SR engines
• Packet-loss, video, and modem connection tests
• Data Collection for Showcase 2A
• Evaluation Scheme Experiment
• Paper and Demo at HLT-02
• Paper submissions to ACL-02, ICSLP-02, ESSLLI-02
WP5: HLT Modules
• Data Collection for Showcase-2A completed in February-2002
• Status of transcriptions from all sites?
• CMU will maintain a data repository: (Alon collecting all data CDs here)
• IF discussions and development have already started (Donna)
• Development Schedule?
WP7: Evaluation
• D9: Evaluation of Showcase-1 Report: draft circulated earlier this week
• Each site should verify that most up-to-date results are being reported
• Include detailed tables in the report?
• Majority vote – finalize a common procedure
• New evaluation experiments
Majority Vote Scheme
• Issue: did all sites use the same guidelines?
• What to do when there is no majority?
– e.g. 4 graders assign P/P/K/K
• What to do when there is complete disagreement?
– e.g. 3 graders assign P/K/B
• Need to recalculate scores from previous evaluation?
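The two open cases above (a tie and complete disagreement) can be made concrete with a minimal Python sketch. It is not the project's procedure: it assumes a strict majority rule (more than half of the graders), treats the grade letters P, K, and B as opaque labels, and returns `None` wherever a common policy still has to be agreed. The function name `majority_grade` is hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_grade(grades: Sequence[str]) -> Optional[str]:
    """Return the grade given by a strict majority of graders, else None.

    Assumption: "majority" means more than half of the graders.
    None marks the unresolved cases (ties, complete disagreement).
    """
    if not grades:
        return None
    label, count = Counter(grades).most_common(1)[0]
    return label if count * 2 > len(grades) else None


print(majority_grade(["P", "P", "K"]))       # clear majority
print(majority_grade(["P", "P", "K", "K"]))  # tie between two grades
print(majority_grade(["P", "K", "B"]))       # complete disagreement
```

Returning `None` keeps the undecided items visible, so they can be counted or adjudicated once a common rule is chosen.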
New Evaluation Experiments
• We are investigating three main issues:
– Binary versus 3-way grading
– Majority vote versus averaging of scores
– Intercoder and Intracoder agreement
• Grading Experiment:
– Four groups, three graders in each group
– Each group grades two sets, two weeks apart
– Sets are different but have a common large overlap
– Groups differ in eval scheme used (binary/3-way)
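To illustrate how majority voting and averaging can give different scores on the same grades, here is a small Python sketch. Several things in it are assumptions made for illustration only: 3-way grades are collapsed to binary by counting P and K as acceptable and B as bad, the grade lists are invented, and the function names are hypothetical.

```python
# Assumption: under 3-way grading, "P" and "K" count as acceptable, "B" as bad.
ACCEPTABLE = {"P", "K"}


def to_binary(grade: str) -> int:
    """Collapse a 3-way grade to 1 (acceptable) or 0 (bad)."""
    return 1 if grade in ACCEPTABLE else 0


def majority_accuracy(items) -> float:
    """Percent of items that a strict majority of graders found acceptable."""
    accepted = 0
    for grades in items:
        votes = [to_binary(g) for g in grades]
        if sum(votes) * 2 > len(votes):
            accepted += 1
    return 100.0 * accepted / len(items)


def average_accuracy(items) -> float:
    """Percent acceptable when each item scores the mean of its graders' votes."""
    per_item = [sum(to_binary(g) for g in grades) / len(grades) for grades in items]
    return 100.0 * sum(per_item) / len(per_item)


# Hypothetical grades: four items, three graders each.
items = [["P", "K", "B"], ["B", "B", "K"], ["P", "P", "P"], ["K", "B", "B"]]
print(majority_accuracy(items))           # 50.0
print(round(average_accuracy(items), 1))  # 58.3
```

On this toy data, averaging gives partial credit to items that only one grader accepted, which majority voting discards.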
Planned Analysis of Data
• Compare results across grading schemes (binary vs. 3-way) on same set of data
• Compare majority scores with average scores
• Evaluate Intercoder agreement between graders (on same set and same scheme)
• Evaluate Intracoder agreement of same grader (on overlap data in the two sets, same grading scheme in both sessions)
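The slides do not say which agreement measure is planned. As one possible way to quantify agreement, the sketch below computes raw percent agreement and Cohen's kappa for two aligned lists of grades. The same functions cover both cases: two graders on the same set (intercoder), or one grader's two sessions on the overlap data (intracoder). The grade lists and function names are hypothetical.

```python
from collections import Counter


def percent_agreement(a, b) -> float:
    """Fraction of items on which the two grade lists agree."""
    if len(a) != len(b) or not a:
        raise ValueError("grade lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    labels = set(counts_a) | set(counts_b)
    # Chance agreement from each list's marginal label frequencies.
    p_expected = sum(counts_a[l] * counts_b[l] for l in labels) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both lists use a single identical label
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical grades for six items.
grader_1 = ["P", "K", "B", "K", "P", "B"]
grader_2 = ["P", "K", "K", "K", "B", "B"]
print(round(percent_agreement(grader_1, grader_2), 3))  # 0.667
print(round(cohen_kappa(grader_1, grader_2), 3))        # 0.5
```

Kappa is lower than raw agreement here because some matches are expected by chance given how often each grader uses each label.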
Preliminary Results
Group (procedure)      W1 Acc   W1 Bad   W2 Acc   W2 Bad
Gr1 (binary/3-way)       50.2     49.8     48.7     51.3
Gr2 (3-way/binary)       52.4     47.6     48.8     51.2
Gr3 (3-way/3-way)        53.8     46.2     54.9     45.1
Gr4 (binary/binary)      49.0     51.0     50.0     50.0
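The figures in the table can be checked and compared with a few lines of Python. This sketch assumes that W1 and W2 are the two grading sessions and that the procedure label lists the scheme used in each session, in order. It verifies that the Acc and Bad columns sum to 100 in each session and computes the change in Acc between sessions. The numbers are copied from the table; the variable names are hypothetical.

```python
# Rows copied from the table: (procedure, W1 Acc, W1 Bad, W2 Acc, W2 Bad).
results = {
    "Gr1": ("binary/3-way", 50.2, 49.8, 48.7, 51.3),
    "Gr2": ("3-way/binary", 52.4, 47.6, 48.8, 51.2),
    "Gr3": ("3-way/3-way", 53.8, 46.2, 54.9, 45.1),
    "Gr4": ("binary/binary", 49.0, 51.0, 50.0, 50.0),
}

deltas = {}
for group, (procedure, w1_acc, w1_bad, w2_acc, w2_bad) in results.items():
    # Sanity check: the two categories should cover every graded item.
    assert abs(w1_acc + w1_bad - 100.0) < 0.05
    assert abs(w2_acc + w2_bad - 100.0) < 0.05
    deltas[group] = round(w2_acc - w1_acc, 1)
    print(f"{group} ({procedure}): W2 Acc - W1 Acc = {deltas[group]:+.1f}")
```

With these figures, the two groups that kept the same scheme in both sessions (Gr3, Gr4) change by about one point, while the two that switched scheme (Gr1, Gr2) drop by 1.5 and 3.6 points.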