arle-lommel
Translation Quality Assessment: Five Easy Steps
Using Multidimensional Quality Metrics to improve quality assessment and management
Prepared by the QTLaunchPad project ([email protected])
Version 1.0 (26 April 2013)
Who does this apply to?
Requesters of translation services looking for relevant quality metrics
Language Service Providers (LSPs) delivering translation services to their clients
The following materials will apply to negotiation between requesters and providers
This description does not apply to individual translators (although they may want to be aware of the contents)
Step 1: Specifications
Basic questions about your project
E.g.,
What languages are you working in?
What is your subject field?
What sort of project is it (e.g., user interface, documentation, advertising)?
What technology are you using (MT, CAT, etc.)?
What register and style are you using?
Step 2: Select Metrics
Based on your specifications…
The MQM recommendation tool will either suggest a predefined metric used for similar projects or recommend a custom metric that applies to your project
You are free to modify the metric as needed
Create a metrics specification file that:
• defines the issues to be examined
• provides weights (descriptions of how important the issues are)
The metrics specification file can be used by an MQM-compliant tool
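A metrics specification can be pictured as a list of issue types, each with a weight. The following is a minimal sketch only: the JSON layout, the IssueSpec class, and the to_spec_file helper are hypothetical and are not the MQM file format. The issue names and numeric weights are taken from the worked example later in this presentation.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class IssueSpec:
    """One issue type in a metric (hypothetical structure, not the MQM format)."""
    name: str       # issue type to be examined
    dimension: str  # group the issue belongs to, e.g. "Fluency" or "Accuracy"
    weight: float   # how important the issue is (1.0 = full importance)


# Issue types and weights from the worked example in this presentation.
METRIC = [
    IssueSpec("Orthography", "Fluency", 1.0),
    IssueSpec("Grammar", "Fluency", 1.0),
    IssueSpec("Mistranslation", "Accuracy", 1.0),
    IssueSpec("Omission", "Accuracy", 0.2),
    IssueSpec("Untranslated", "Accuracy", 1.0),
    IssueSpec("Legal requirements", "Accuracy", 1.0),
]


def to_spec_file(metric):
    """Serialize the metric to a JSON string that a tool could load."""
    return json.dumps([asdict(issue) for issue in metric], indent=2)


print(to_spec_file(METRIC))
```

Keeping the weights in the file rather than in the tool means the same evaluation tool can be reused across projects with different priorities.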
Step 3: Evaluation Method
Three options:
1. Sampling: Examine a portion of the text to determine whether to pass or fail the entire text. Sampling can utilize quality estimation for better results
2. Full error analysis: Review the entire text (needed for critical legal or safety texts)
3. Rubric: Rate the text on a numerical scale (suitable for quick assessment of suitability)
Automated Metrics
If sampling is used, MQM’s quality estimation tools will help focus sampling on those parts of the text that need attention
Automatic metrics can be used in some cases where human evaluation is too expensive or time-consuming
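One simple way to focus a sample on the parts of the text that need attention is to review the segments with the lowest quality estimation scores first. The sketch below assumes that each segment already has a score between 0 and 1 (lower meaning worse); the select_sample function and the score values are hypothetical and do not represent an MQM tool interface.

```python
def select_sample(qe_scores, sample_size):
    """Return the indices of the segments with the lowest estimated quality.

    qe_scores: one quality estimation score per segment (assumed 0..1, lower = worse).
    sample_size: number of segments to send to human review.
    """
    # Rank segment indices from worst to best estimated quality.
    ranked = sorted(range(len(qe_scores)), key=lambda i: qe_scores[i])
    # Return the selected indices in document order for easier review.
    return sorted(ranked[:sample_size])


qe_scores = [0.91, 0.42, 0.77, 0.30, 0.85]  # hypothetical scores for five segments
print(select_sample(qe_scores, 2))  # prints [1, 3]
```

In practice a reviewer might combine this with a random sample so that the pass/fail decision is not based only on the weakest segments.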
Step 4: Evaluation
Evaluation…
Can be conducted by the requester or LSP in accordance with the agreement between the parties
Follows the method chosen in Step 3 (evaluation method)
Issues must match the metric chosen in Step 2: issues not found in the metric should not be considered errors
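The rule that only issues in the chosen metric count as errors can be sketched as a simple filter. The annotation structure, the filter_to_metric function, and the "Terminology" annotation below are hypothetical and serve only as an illustration; the metric issue types are those of the worked example later in this presentation.

```python
# Issue types of the metric chosen in Step 2 (taken from the worked example).
METRIC_ISSUE_TYPES = {
    "Orthography", "Grammar", "Mistranslation",
    "Omission", "Untranslated", "Legal requirements",
}


def filter_to_metric(annotations, metric_issue_types):
    """Split annotations into those counted as errors and those ignored."""
    counted = [a for a in annotations if a["type"] in metric_issue_types]
    ignored = [a for a in annotations if a["type"] not in metric_issue_types]
    return counted, ignored


# Hypothetical reviewer annotations; "Terminology" is not part of the metric.
annotations = [
    {"type": "Grammar", "severity": "minor"},
    {"type": "Terminology", "severity": "major"},
    {"type": "Omission", "severity": "major"},
]

counted, ignored = filter_to_metric(annotations, METRIC_ISSUE_TYPES)
print(len(counted), len(ignored))  # prints 2 1
```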
MQM provides capabilities
For human evaluation:
• Inline markup provides an audit trail: it allows independent verification of errors and helps ensure that issues are corrected
• Full reporting functions: see what types of errors are reported and understand where errors come from
For automatic evaluation:
• Integrated use of existing quality metrics to help provide evaluation
translate5
These capabilities are being integrated into an open-source editing tool, translate5 (http://www.translate5.net)
All results are free to implement in additional tools (both open source and proprietary)
Parties interested in development should contact [email protected]
The source matters
Full MQM evaluation includes the source
Source quality evaluation can help identify reasons for problems and resolve them
Translators can be rewarded for addressing source deficiencies (scores over 100% are possible!)
Step 5: Scoring
Scoring Formula
(Q = the set of issues being counted within the larger formula)
Provides consistency with LISA QA Model scoring method
Can be customized to support other legacy systems
Can be applied to individual parts of the overall formula: i.e., fluency, accuracy, grammar, etc. subscores can be derived
Weights (not shown) can be used to adjust importance of various issue types
Scores help guide decisions
Scores are given on a 100% basis
Scores can be broken down into more fine-grained reports. E.g., a score of 96% could have 100% accuracy but 92% fluency. This helps target actions for quality control.
Example
1. Specifications
Parameter | Value
Language/Locale | Source: English; Target: Japanese
Subject field/domain | Medical
Text type | Narrative
Audience | Educated readers with an interest in medicine
Purpose | Education about a new procedure for managing diabetes
Register | Moderately formal
Style | No specified style – match source if possible
Content correspondence | Literal translation
Output modality | Subtitles (speech to text)
File format | Time-coded XML for dotSub
Production technology | Human translation
2. Recommended Metric
Issue type | Weight (high, medium, low) | Notes
Fluency | |
Orthography | High |
Grammar | High |
Accuracy | |
Mistranslation | High |
Omission | Low | Due to the nature of captions, some information loss is expected. Captions should be 60% of spoken dialogue
Untranslated | High |
Legal requirements | High | Must make sure that legal claims are admissible under Japanese law
Chosen from…
Issue types are a subset of the full catalog of types
Quality Formula (1)
TQ = (Atr + At - As) + (Ft – Fs)
with respect to specifications
where:
TQ = translation quality
Atr = accuracy (transfer)
At = accuracy for the target text
As = accuracy for the source text
Ft = fluency score for the target text
Fs = fluency score for the source text
Quality Formula (2)
TQ = (Atr + At - As) + (Ft – Fs)
with respect to specifications
Definition: A quality translation demonstrates required accuracy and fluency for the audience and purpose and complies with all other negotiated specifications, taking into account end-user needs.
The gold portion = dimensions (specifications)
3. Evaluation method
In this example, portions of the text are marketing: sampling is an acceptable evaluation method for these parts
Other portions contain legal and regulatory claims: full error analysis is required for those portions
Inline markup can be used via MQM namespace (because text is in XML) to ensure corrections are made.
4. Evaluation
• Evaluation includes subsegment markup with issues in metric
• Issues stored in MQM namespace to allow audit and revision
• Users can select three severity levels:
  • critical: the issue renders the text unusable
  • major: the issue leaves the text usable, but is an obstacle to understanding
  • minor: the issue does not impact usability of the text
[Screenshot: translate5.net showing the MQM markup tool]
5. Scoring
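Issue type | Weight | Minor | Major | Critical | Penalty | Adjusted penalty | Score
Fluency | | | | | | |
Orthography | 1.0 | 8 | 2 | 1 | 28 | 28 | 97.2%
Grammar | 1.0 | 6 | 2 | 0 | 16 | 16 | 98.4%
Subtotal | | | | | | 44 | 95.6%
Accuracy | | | | | | |
Mistranslation | 1.0 | 4 | 0 | 0 | 4 | 4 | 99.6%
Omission | 0.2 | 12 | 4 | 1 | 42 | 8.4 | 99.2%
Untranslated | 1.0 | 1 | 0 | 0 | 1 | 1 | 99.9%
Legal requirements | 1.0 | 0 | 0 | 1 | 10 | 10 | 99.0%
Subtotal | | | | | | 23.4 | 97.7%
Total | | | | | | 67.4 | 93.3%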
Assumes 1000-word sample
Because Omission is considered a low priority in this case, it is given a low weight
Without weighting of Omission, the score would be 89.9%
We can see that the translator has more problems with fluency than with accuracy
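The scores in this example can be reproduced with a short calculation. The sketch below is inferred from the example figures and is not an official MQM implementation: the severity multipliers (minor = 1, major = 5, critical = 10) are an assumption that matches the penalty values shown (e.g., Orthography: 8×1 + 2×5 + 1×10 = 28), and each score is taken as 100% minus the weighted penalty per word of the 1000-word sample. All function and variable names are illustrative.

```python
# Severity multipliers inferred from the example penalties (assumption).
SEVERITY_PENALTY = {"minor": 1, "major": 5, "critical": 10}

# issue type: (dimension, weight, minor, major, critical), from the example.
ISSUES = {
    "Orthography": ("Fluency", 1.0, 8, 2, 1),
    "Grammar": ("Fluency", 1.0, 6, 2, 0),
    "Mistranslation": ("Accuracy", 1.0, 4, 0, 0),
    "Omission": ("Accuracy", 0.2, 12, 4, 1),
    "Untranslated": ("Accuracy", 1.0, 1, 0, 0),
    "Legal requirements": ("Accuracy", 1.0, 0, 0, 1),
}


def penalty(minor, major, critical):
    """Unweighted penalty points for one issue type."""
    return (minor * SEVERITY_PENALTY["minor"]
            + major * SEVERITY_PENALTY["major"]
            + critical * SEVERITY_PENALTY["critical"])


def total_penalty(issues, dimension=None, use_weights=True):
    """Sum penalties, optionally for one dimension and with or without weights."""
    total = 0.0
    for dim, weight, minor, major, critical in issues.values():
        if dimension is not None and dim != dimension:
            continue
        total += (weight if use_weights else 1.0) * penalty(minor, major, critical)
    return total


def score(penalty_points, word_count=1000):
    """Convert penalty points into a percentage score for the sample."""
    return 100.0 * (1.0 - penalty_points / word_count)


print(f"Fluency:    {score(total_penalty(ISSUES, 'Fluency')):.1f}%")           # 95.6%
print(f"Accuracy:   {score(total_penalty(ISSUES, 'Accuracy')):.1f}%")          # 97.7%
print(f"Total:      {score(total_penalty(ISSUES)):.1f}%")                      # 93.3%
print(f"Unweighted: {score(total_penalty(ISSUES, use_weights=False)):.1f}%")   # 89.9%
```

Setting use_weights=False shows the effect of the low Omission weight: the total drops from 93.3% to 89.9%.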
5. Full scoring (including source)
Issue type | Source | Target | Adjusted
Fluency | | |
Orthography | 96.1% | 97.2% | 101.1%
Grammar | 99.0% | 98.4% | 99.4%
Subtotal | 95.1% | 95.6% | 100.5%
Accuracy | | |
Mistranslation | (100%) | 99.6% | 99.6%
Omission | (100%) | 99.2% | 99.2%
Untranslated | (100%) | 99.9% | 99.9%
Legal requirements | (100%) | 99.0% | 99.0%
Subtotal | 100% | 97.7% | 97.7%
Total | 95.1% | 93.3% | 98.2%
Assumes 1000-word sample. Source accuracy set to 100% for computational purposes.
5. Scoring (including source)
In many cases, some problems in a translation are not caused by the translator.
In this case, the translator fixed problems in the source, resulting in better quality for fluency in the target. The translator should be recognized for this work.
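The adjusted figures can be read as the target score plus the quality the source was missing. The sketch below is inferred from the example figures (Orthography: 97.2% + (100% − 96.1%) = 101.1%) and is not an official MQM formula; the function name is illustrative.

```python
def adjusted_score(source_score, target_score):
    """Credit the translator for source deficiencies (scores in percent).

    Inferred from the example: adjusted = target + (100 - source), so a
    translation that improves on a flawed source can exceed 100%.
    """
    return target_score + (100.0 - source_score)


print(f"Orthography:      {adjusted_score(96.1, 97.2):.1f}%")  # 101.1%
print(f"Fluency subtotal: {adjusted_score(95.1, 95.6):.1f}%")  # 100.5%
print(f"Mistranslation:   {adjusted_score(100.0, 99.6):.1f}%") # 99.6%
```

When the source score is 100%, the adjusted score equals the target score, which is why the accuracy rows are unchanged.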