SDL BeGlobal Trainer User Guide
Administrative & End User Guide
Table of Contents
1 Getting Started
1.1 What is BeGlobal Trainer?
1.2 Recommended Usage
1.3 How to Access SDL BeGlobal Trainer Online
2 Guide for Trainers Overview
2.1 Train New Language Pair
2.2 New Language Pairs form: Optional Settings
2.3 Managing Language Pairs
2.4 Training Details
2.5 Testing a Trained Language Pair
2.6 Managing test slots
2.7 Activating a Trained Language Pair for general use
3 Guide for Account Administrators
3.1 Creating/Managing Users
3.2 Projects Overview
3.3 New Project
3.4 Managing Projects
3.5 Managing Test Slots
3.6 Activating a Trained Language Pair for general use
1 Getting Started
1.1 What is BeGlobal Trainer?
SDL BeGlobal Trainer allows you to use Translation Memory eXchange (TMX) files to create
trained Language Pairs for the purposes of Machine Translation (MT).
1.2 Recommended Usage
This guide is divided into two sections:
Guide for Trainer role users – for end users of the application who will be training and
evaluating Language Pairs.
Guide for Administrator role users – for Account Administrators who are responsible for
configuring the settings and users of your specific account.
Please refer to the appropriate section of the Guide depending on which category of User you
belong to.
1.3 How to Access SDL BeGlobal Trainer Online
The URL for accessing SDL BeGlobal Trainer Online is as follows:
https://www.sdlbeglobal.com
Figure 1 - SDL BeGlobal Trainer Online Log in screen
Tip: For more information about BeGlobal, please refer to the SDL BeGlobal Online Guide.
2 Guide for Trainers Overview
SDL BeGlobal Trainer allows you to create custom trained Language Pairs that can be used in SDL BeGlobal Online, the SDL BeGlobal API, and other BeGlobal integrations such as the FLAVIUS platform. To train a Language Pair you need parallel data, such as aligned English to Spanish sentences, grouped together in a TMX (Translation Memory eXchange) file. Details about the amount of parallel data you need to supply can be found in D4.6 (SMT Customization module through retraining), delivered at M20 (January 2012).
Training process flow:
Note: Trainer comes with a default project folder called “My Projects” so Trainer users can get started
quickly. In order to use SDL BeGlobal Trainer, you will need at least one TMX file with source and target
parallel data.
General rules to follow when using TMX files for training new Language Pairs:
Use parallel data with one sentence per line.
Remove junk characters from the data, including HTML tags.
Use 2 million words or more when training a Language Pair to get the best results.
Please note that the more words are used, the longer the training will take.
Only UTF-8 encoded files are supported.
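The size and encoding rules above can be checked locally before uploading. The following is a minimal sketch, not part of SDL BeGlobal Trainer: it assumes a TMX file with the usual tu/tuv/seg element layout, counts words by splitting on whitespace, and uses a hypothetical function name and sample data. The way the Trainer itself counts words may differ.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def count_words_by_language(tmx_bytes):
    """Return a whitespace-token word count per xml:lang value in a TMX document."""
    # Raises UnicodeDecodeError if the content is not valid UTF-8.
    tmx_bytes.decode("utf-8")
    root = ET.fromstring(tmx_bytes)
    counts = {}
    for tuv in root.iter("tuv"):
        lang = tuv.get(XML_LANG) or tuv.get("lang") or "unknown"
        seg = tuv.find("seg")
        text = "".join(seg.itertext()) if seg is not None else ""
        counts[lang] = counts.get(lang, 0) + len(text.split())
    return counts

# Illustrative sample only; real training data would be far larger.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="example" creationtoolversion="1" o-tmf="none"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>The printer is out of paper.</seg></tuv>
      <tuv xml:lang="es"><seg>La impresora no tiene papel.</seg></tuv>
    </tu>
  </body>
</tmx>"""

print(count_words_by_language(SAMPLE))
```

Comparing the per-language totals against the 2 million word guideline gives a rough idea of whether a data set is large enough before a training is submitted.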
Common TMX errors:
Wrong language is supplied – a translation is marked as one language but is actually another.
Source and Target languages are reversed.
A segment contains more than one language.
Wrong or incomplete translation – the source and target are not translations of each other.
Missing source or target translation.
Identical source and target – target and source segments are the same.
Junk characters cause errors – this includes HTML tags and programming source code for example.
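Some of the errors listed above can be detected mechanically. The sketch below is an illustration under stated assumptions, not the validation that SDL BeGlobal Trainer performs: it assumes the standard tu/tuv/seg layout, exact language codes such as "en" and "es", and a simple pattern for leftover markup. The function name and sample data are hypothetical. Wrong-language and wrong-translation errors need language identification or human review and are not covered.

```python
import re
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# Anything that looks like an HTML or XML tag left inside segment text.
TAG_PATTERN = re.compile(r"<[^>]+>")

def check_tmx(tmx_bytes, source_lang, target_lang):
    """Return a list of (unit_number, problem) tuples for a TMX document."""
    problems = []
    root = ET.fromstring(tmx_bytes)
    for number, tu in enumerate(root.iter("tu"), start=1):
        segments = {}
        for tuv in tu.iter("tuv"):
            lang = (tuv.get(XML_LANG) or "").lower()
            seg = tuv.find("seg")
            segments[lang] = "".join(seg.itertext()).strip() if seg is not None else ""
        source = segments.get(source_lang, "")
        target = segments.get(target_lang, "")
        if not source or not target:
            problems.append((number, "missing source or target"))
            continue
        if source == target:
            problems.append((number, "identical source and target"))
        if TAG_PATTERN.search(source) or TAG_PATTERN.search(target):
            problems.append((number, "markup in segment"))
    return problems

# Illustrative sample: unit 1 is clean, units 2 to 4 each show one error type.
SAMPLE = b"""<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en"><seg>Save the file.</seg></tuv><tuv xml:lang="es"><seg>Guarde el archivo.</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>OK</seg></tuv><tuv xml:lang="es"><seg>OK</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Cancel</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>&lt;b&gt;Print&lt;/b&gt;</seg></tuv><tuv xml:lang="es"><seg>&lt;b&gt;Imprimir&lt;/b&gt;</seg></tuv></tu>
</body></tmx>"""

for unit_number, problem in check_tmx(SAMPLE, "en", "es"):
    print(unit_number, problem)
```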
2.1 Train New Language Pair
The term Language Pair refers to the machine translation engine and statistical data used to carry out
translations. To train a new Language Pair, follow the instructions below.
Figure 1.1 – Language Pairs overview screen – Select the New Language Pair Button to start a new training.
1. Select the Language Pair tab to open the Language Pairs overview screen.
2. Choose a Project for the new Language Pair. (If you are not sure, use My Projects.)
3. Press the New Language Pair button to open the New Language Pair form.
Tip: The term “train” or “training” refers to the process of combining the source and target language data from the user-provided TMX file(s) into a statistical translation engine that is capable of making new translations based on the original source data. Mathematical algorithms are used to make associations between the source and target languages to make this possible. This is also known as Machine Translation (MT).
Fig 1.2 - New Language Pairs form
4. Give your new Language Pair a name.
5. Next select the source and target languages. The source is the language your new Language
Pair will be translating from and the target is the language it will be translating to. Example:
from Spanish (source) to English (target). Note that the source languages displayed are
dependent on how your account was configured, and the target language is dependent on
the source language you select, as trainings are based on the standard generic baseline
Language Pairs supported by SDL BeGlobal.
6. To upload a TMX file, press the Browse button and navigate to the Translation Memory
eXchange (TMX) file(s) you want to use. You may upload one or more TMX files containing text in
the same source and target languages that you selected in step 5 above.
Tip: You may use the .zip file format to group several TMX files together and upload them as a single set at one time. For more information on ZIP files visit: www.winzip.com
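As an alternative to a desktop ZIP tool, the grouping described in the tip above can be scripted. This is a minimal sketch using Python's standard zipfile module; the function name, folder layout and file names are hypothetical, and the demonstration works on placeholder files in a temporary folder.

```python
import tempfile
import zipfile
from pathlib import Path

def bundle_tmx_files(folder, archive_path):
    """Add every .tmx file found directly in folder to a new ZIP archive."""
    tmx_files = sorted(p for p in Path(folder).iterdir() if p.suffix.lower() == ".tmx")
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in tmx_files:
            # Store files at the top level of the archive, without folder names.
            archive.write(path, arcname=path.name)
    return [p.name for p in tmx_files]

# Demonstration in a temporary folder with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ("manuals.tmx", "website.TMX", "notes.doc"):
        (folder / name).write_text("placeholder", encoding="utf-8")
    bundled = bundle_tmx_files(folder, folder / "training_set.zip")
    with zipfile.ZipFile(folder / "training_set.zip") as archive:
        archived = sorted(archive.namelist())

print(bundled)
```

Only files with a .tmx extension are added, so other file types never reach the archive.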
7. After you upload a file you will see it in the file upload manager. If you upload a zip file with
multiple TMX files, the system will automatically unzip it and display each TMX file. All other
file types, such as .doc or .jpg, are automatically skipped by the system. If an
error is detected you will see a warning icon. Files with errors are automatically excluded
from the training even if you do not remove them from the list. If you wish to remove a
file, press the corresponding (X) button on the right.
Fig 1.3 – File upload manager showing a successful TMX file upload.
8. Press the Submit button after all files have completed uploading. During
file upload the Submit button is automatically disabled until all uploads
have completed.
Tip: Use the optional settings feature to add a Test Set and Regression Text. See next page.
2.2 New Language Pairs form: Optional Settings
If you are an advanced user and would like to add a Test Set and/or Regression Text, use the optional
settings inputs at the bottom of the form. Before pressing the Submit button, click on the Optional
Settings text to expand the form and see the optional settings.
Tip: After training, the “Test Set” is used to calculate a BLEU score (an automatically calculated measure of similarity between machine and human translations of the same text). This is useful if you want to compare BLEU scores for different Language Pairs. If you do not upload a Test Set file, 1000 random lines are extracted from the training data making it difficult to compare different Language Pairs based on the BLEU score, as different text may be used for the calculations.
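To make the idea of a BLEU score more concrete, here is a minimal sketch of the generally published BLEU definition: clipped n-gram precision up to 4-grams, combined with a brevity penalty. It assumes one reference translation per segment, whitespace tokenization and no smoothing. How SDL BeGlobal Trainer tokenizes text, scales the score or otherwise computes it is not described in this guide, so the numbers from this sketch are not expected to match the scores shown in the application.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with one reference per segment and no smoothing."""
    matched = [0] * max_n
    total = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Clipping: an n-gram is credited at most as often as it occurs in the reference.
            matched[n - 1] += sum((hyp_counts & ref_counts).values())
            total[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matched) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)

reference = ["the cat sat on the mat"]
print(corpus_bleu(["the cat sat on the mat"], reference))
print(corpus_bleu(["the cat sat on the mat today"], reference))
```

Because the score depends entirely on which reference sentences are used, two Language Pairs can only be compared fairly when both are scored against the same Test Set.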
Tip: Regression Testing text is automatically translated as soon as the Language Pair has been trained and is useful for visually comparing translations of the same source text generated by different Language Pairs.
Fig 1.4 – Optional Settings
1. Optional Settings: Click the Optional Settings text or arrow to expand the optional settings section.
2. Test Set (optional setting)
Click the Browse button to upload a test set. A test set must be in TMX file format (UTF-8 encoded). Only
one Test Set is allowed per training. A test set TMX file should contain a minimum of 100 segments of
parallel data.
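One way to obtain a fixed Test Set is to hold out a reproducible sample of your own parallel data before training and reuse it for every training. The sketch below shows the idea on a list of (source, target) pairs. The function name, the seed value and the default of 100 segments (the minimum stated above) are assumptions for illustration; writing the held-out pairs back to a TMX file is left out.

```python
import random

def split_test_set(pairs, test_size=100, seed=2012):
    """Split (source, target) pairs into training pairs and a reproducible test set."""
    if len(pairs) <= test_size:
        raise ValueError("not enough segment pairs to hold out a test set")
    # A fixed seed makes the same segments land in the test set on every run.
    rng = random.Random(seed)
    test_indexes = set(rng.sample(range(len(pairs)), test_size))
    training = [p for i, p in enumerate(pairs) if i not in test_indexes]
    test = [p for i, p in enumerate(pairs) if i in test_indexes]
    return training, test

# Placeholder data standing in for aligned sentences.
pairs = [(f"source {i}", f"target {i}") for i in range(500)]
training, test = split_test_set(pairs)
print(len(training), len(test))
```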
Test Set upload manager
After a test set is uploaded a file upload manager appears and upload progress is displayed. If no errors
are found, a checkmark will appear next to the filename and the test set file is now ready to be included
in the processing of the new Language Pair.
Fig 1.5 – Test Set upload manager showing a successful file upload.
Only one (1) test set is allowed per new Language Pair. To upload another test set, you must first delete
the one that was uploaded by clicking the “X” to the right of the progress bar. If errors are found, the
system will automatically ignore the file when submitted for training.
3. Regression Testing (optional setting)
Click the Browse button to upload a regression testing file. Regression sets must be TXT files (UTF-8
encoded). You may upload multiple files if you like. You may also upload zip files containing multiple test
files.
Regression Testing file upload manager
Fig 1.6 – Regression Testing upload manager showing a successful file upload.
After a regression file is uploaded, the file upload manager appears and the uploaded file is inspected
for compatibility. You may upload multiple regression files if you like. If the file is accepted, a checkmark
will appear next to the filename and the file is now ready for processing. If errors are found, the system
will automatically ignore that file when submitted for training. If you want to remove a file press the
corresponding (X) button on the right.
Here is what the optional settings look like after files have been uploaded:
Fig 1.7 – Optional settings showing successful upload of Test Set and Regression Testing files.
4. Press the Submit button after all files have completed uploading. During file upload the Submit button
is automatically disabled until all file uploads have completed.
Once submitted, the training is queued by the system for processing. Training typically takes between 12
and 24 hours to complete.
Tip: In general, the larger the input data, the longer the training will take, but the additional content will improve the quality of the Language Pair (as long as it is all pertinent to the type of material the trained Language Pair will be translating).
2.3 Managing Language Pairs
The Language Pairs tab allows the trainer to create, delete, view & manage trainings. The screen
is presented in a workflow tile view with four columns.
Fig 1.8 –Language Pair manager
Being Trained column
This column shows Language Pairs that are currently running or have failed. Run time varies with the
number of words in the TMX files used for training. An average-sized
training of 2 million words typically takes between 12 and 24 hours to complete, depending also on how
many other trainings are queued up or currently running.
Trained column
After the new Language Pair has finished training, it will be automatically moved into this
column. Once a Language Pair is in this column it is ready to move to the testing stage or it may
be activated in the BeGlobal environment where it can then be accessed by users.
Being Tested column
This column shows Language Pairs actively being tested for translation quality. Each Language
Pair in this column uses a “test slot”. A test slot corresponds to a server required to run
translations through a Language Pair. As the number of servers is not infinite, test slots need to
be managed, as described later in this User Guide.
Activated column
This column shows Language Pairs that have been activated and are available for use in SDL
BeGlobal.
Language Pair Tiles
Language Pair Tiles are displayed by project and are used to display important information and
provide an easy to use dashboard view.
In Progress – This is the first view of a tile that you will see. It shows high-level information about the new Language Pair, including real-time information such as processing state and estimated time to completion. For details about a Language Pair, click the “View & Manage” button.
Better result – This trained Language Pair (LP) has produced a BLEU score better than the matching SDL generic baseline Language Pair. Note: this training also shows a unique Language Pair Identification (LPID), meaning this Language Pair is using a test slot or has been activated.
Worse result – This trained Language Pair has produced a BLEU score lower than the matching SDL generic baseline Language Pair. You can still test and evaluate a Language Pair with a worse result. Note: Please keep in mind that Test Sets can affect the BLEU score.
Training Failed – There are a number of reasons why a Language Pair training can fail. The system will automatically contact our support team for inspection. Note: To see how you may be able to fix a failed training, see 2 Guide for Trainers Overview: Common TMX errors.
2.4 Training Details
The Language Pair details screen allows users to see detailed information and initiate actions:
specifically, to Test and Activate the trained Language Pair, to delete the training, and to download the
associated training files.
Fig 1.9 –Language Pair manager
1. Use the Back button to return to the previous screen.
2. You may permanently delete the trained Language Pair by pressing the “Delete Training” button. A
confirmation window will appear asking if you are sure you want to delete. Once a trained Language
Pair is deleted it cannot be recovered.
3. The training details header at the top of the screen is used to display training results and allows the
user to Test or Activate the trained Language Pair.
4. Training Information is used to show details about the trained Language Pair. This is also where the
BLEU scores are displayed which are used to determine if a trained Language Pair has a better or worse
result (Training BLEU Score) when compared to the SDL standard generic baseline Language Pair
(Baseline BLEU score).
5. Training Corpus contains a zip file containing two TMX files, one with the source and target text
extracted from all the uploaded parallel data TMX files, and one TMX with the same data after an
automated cleansing process. This data is the actual text used for training.
6. The Test Set contains the data used to calculate the BLEU score, which is either the user uploaded test
data or, if a file was not provided, the data the system automatically withheld from the uploaded
parallel data.
7. Regression Set contains the optional files uploaded for automated translation upon completion of the
training, which can be used for an initial quality check on the trained Language Pair. It’s typically used to
verify correct translation of important phrases, terms, product names, et cetera.
8. Sample Output displays sample uploaded source text and its corresponding target text as well as a
machine translation carried out with the new Language Pair. This provides a quick indication of the
quality of the trained Language Pair.
2.5 Testing a Trained Language Pair
After the new Language Pair has completed training, you should evaluate its translation performance.
BLEU scores are useful but they may not be representative of the translation quality that can be
achieved for your particular application of machine translation. To start testing, follow these
instructions:
Fig 2.0 –Language Pair manager displaying a completed training.
1. Press the View & Manage button on the completed training you want to evaluate.
Fig 2.1 –Training details screen
2. Press the Test button on the training details screen.
Fig 2.2 – Deploying a Test Slot
3. When deploying a Language Pair for the first time, it may take a few minutes to configure the test
slot. This is a one-time occurrence per Language Pair being deployed to a new test slot. Once ready, this
Language Pair may be tested by using the Instant Translation or Translate a Document features.
Fig 2.3 – Evaluate a trained Language Pair showing the Instant translation option
4. Use Instant Translation for testing a small amount of text quickly. For comparison purposes, the
“Evaluation result” comes from the new Language Pair that was created by SDL BeGlobal Trainer, while
the “Baseline result” comes from the matching SDL standard generic baseline Language pair.
Fig 2.3 – Evaluate a trained Language Pair showing the Translate a Document option
5. Use the Translate a Document option for processing larger translations in standard document
formats. Note that it is important when evaluating trained Language Pairs to use the same test
document as a control for comparing evaluation results.
If you upload a TXT file, you are offered the option to output CSV or XLIFF format files.
The CSV format file is useful if you need to send translations to language experts who are not
professional translators. The CSV file can be opened in Microsoft Excel or another office application. The
first column contains the source sentences and the second column contains the machine translated
sentences. The recipient can provide comments on the translation quality in the third column.
The XLIFF format file is useful for sending translations to professional translators who typically have
tools for managing XLIFF files.
Whether CSV or XLIFF output is selected, SDL BeGlobal Trainer generates a ZIP file that contains two
files:
(1) Source text and the translation carried out with your trained Language Pair
(2) Source text and the translation carried out with the standard generic baseline Language Pair
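Once reviewers return a CSV file with comments in the third column, the commented rows can be collected with a short script. This is a sketch based only on the three-column layout described above; the comma delimiter, the absence of a header row, the function name and the sample rows are all assumptions, so check an actual output file before relying on it.

```python
import csv
import io

def rows_with_comments(csv_text):
    """Return (source, translation, comment) rows where the reviewer left a comment."""
    flagged = []
    for row in csv.reader(io.StringIO(csv_text)):
        # Pad short rows so a missing third column reads as an empty comment.
        source, translation, comment = (row + ["", "", ""])[:3]
        if comment.strip():
            flagged.append((source, translation, comment.strip()))
    return flagged

# Illustrative reviewer feedback: only the second row carries a comment.
SAMPLE = (
    "Save the file.,Guarde el archivo.,\n"
    "Press the button.,Prensa el boton.,Wrong verb form\n"
)

for source, translation, comment in rows_with_comments(SAMPLE):
    print(source, "|", translation, "|", comment)
```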
6. There is one additional method for testing a trained Language Pair. Once it is deployed in a test slot, it
is possible to access it directly via existing SDL Trados Studio, SDL WorldServer and SDL TMS integrations
with SDL BeGlobal. Please refer to the documentation of those products for information on how to
connect them to SDL BeGlobal.
2.6 Managing test slots
Test Slots simulate deployment to the final BeGlobal production environment and allow for the
evaluation of a trained Language Pair.
Fig 2.4 – Training details screen showing all Test Slots are in use.
Each Language Pair being tested requires one test slot. When all the Test Slots allocated to your
organization have been used up, in order to deploy an additional trained Language Pair for testing,
please contact your account Administrator who will be able to release one of the Test Slots containing a
Language Pair that is no longer needed.
2.7 Activating a Trained Language Pair for general use
To make a trained Language Pair accessible for use in BeGlobal, it must be activated. Please contact your
account Administrator to activate the Language Pair.
20
3 Guide for Account Administrators
As the account Administrator for your SDL BeGlobal Trainer account, you are responsible for setting up
your Users, configuring and administering application settings such as Test Slots, and acting as the
point of contact with SDL for support issues. For more information please refer to the BeGlobal Online
User Guide.
3.1 Creating/Managing Users
Once you receive access to BeGlobal, you’ll want to begin the process of setting up the users of
your application.
User Management is performed from the Manage Users section of Account Management.
From the Manage Users tab, click on the New button to create new Users for your account.
Fig 2.9 – Manage Users
User information includes name, email address, time zone and their role (see below). The
Language Specialty information is not required for using SDL BeGlobal Trainer.
The following roles are available when an account is configured for training Language Pairs:
Administrator
Trainer
Each role has a different set of permissions, which determine the functions and features of the
application they have access to. An Administrator can perform all the training tasks that a Trainer
can perform and in addition has the ability to manage users and projects, free up Test Slots and
Activate Language Pairs.
Once set up, each user will receive an email with additional instructions on setting up their
account and password. Users may also reset their own password at any time by using the Forgot
My Password link on the BeGlobal Online log-in page.
3.2 Projects Overview
Projects are used to group trained Language Pairs together for categorization purposes. This is
useful for keeping trainings organized by client or vertical such as “computer” or “automotive”.
Only Administrators can create or delete projects.
Figure 3.1 – Manage Projects default view
Tip: SDL BeGlobal Trainer automatically comes with a default folder called “My Projects” so Trainer users can get started with creating new language pairs right away. The My Projects folder is created by the system and cannot be deleted.
3.3 New Project
Projects are used to group trained Language Pairs together for organizational purposes. This is useful for
keeping trainings categorized by client (such as Project XYZ) or vertical (such as “computer” or
“automotive”).
1. Click the New Project button under the Projects tab:
On the New Project form, give your project a name and press Submit. You may optionally add
more information about the project for end users by adding a description and a domain. The
domain is used to define the category of information, such as “computer” or “automotive”.
Figure 3.2 – New Project form
Tip: At the current time, projects cannot be renamed once they have been created, so choose the project name carefully. If a project contains no Language Pairs, you may delete it and create a new one with a different name.
3.4 Managing Projects
Figure 3.3 – Manage Projects view showing multiple projects
1. Project folder – Displays the total number of Language Pairs. Click on the folder to view the
Language Pairs associated with this project.
2. Project Details will display an overlay containing detailed project information.
3. Deleting – Deleting a project requires two steps as a safety feature. First click the checkbox
on the tile of the project you want to delete, then press the Delete button. A confirmation
window will appear asking you to confirm the deletion.
4. Search – Can be used to filter the view when there are a large number of projects.
5. New Language Pair – Selecting this button will take you to the New Language Pair form.
Tip: To delete multiple projects at one time first mark the checkboxes of the projects you want to delete, then press the delete button. A confirmation window will appear asking to confirm your deletions. Please note that projects that have Language Pairs that are being trained or have test slots being used cannot be deleted until those processes are completed.
3.5 Managing Test Slots
As an administrator you are responsible for managing the Test Slots that are used for evaluating trained
Language Pairs. Each Trainer account comes with a number of test slots which are used to manage the
number of test translations that can occur at any given time. If you wish to add more Test Slots you will
need to contact your SDL Account Manager.
The following is an example of a Trainer account where all test slots are being used:
What the Trainer user sees:
1. Trainer user sees on the Language Pair details screen that all test slots are being used in this account.
2. If a Trainer User presses the Test button they will see the message above.
What the Administrator needs to do:
3. For Administrators, the user interface will display an Un-deploy button next to the Test button
on each Language Pair that is using a Test Slot. Press the Un-deploy button to free up the Test Slot. You
will be asked to confirm the un-deployment (release) of the test slot.
4. A Confirmation window will appear for Un-deploying the test slot. Click Un-deploy to confirm the
release of this test slot. If there are any active test evaluation translations running at this time, they will
be lost when the test slot is released and cannot be recovered.
Tip: It is up to the discretion of the Administrator(s) to decide which test slots to release and when. Make sure to coordinate with your Trainer users and plan appropriately. Note that you can deploy and un-deploy a Language Pair as many times as you need to, as long as there is a Test Slot available.
3.6 Activating a Trained Language Pair for general use
To make a trained Language Pair accessible for use in BeGlobal, it must first be activated.
Note that activation is only possible when a training account is linked to a production account. Normally
this is the case, but the linkage may not be set up if your training account was set up as a trial.
Fig 3.1 – Language Pair tile and Language details screen header
1. First, navigate to the details screen of the Language Pair you want to activate.
2. Press the Activate button on the training details screen of the Language Pair you want to
activate.
Fig 3.2 – Activation window
3. A dialog window will appear allowing you to rename the Language Pair for final production use
and to confirm your choice of Language Pair activation. Renaming the Language Pair is optional
but advisable, as this is how the Language Pair will be displayed in your production account.
Press the Activate button to approve Language Pair activation.
Fig 3.3 – Activation confirmation message with newly generated LPID
4. Once the Language Pair is activated, a unique 4-digit identifier known as a Language Pair Identification (LPID) is created
and a confirmation window will appear. The activated Language Pair is now available for use in
BeGlobal. At this point activation is complete. Press the OK button to continue.
Fig 3.4 – Training details header now shows the LPID of the activated Language Pair.
5. The top of the training details screen is updated to show “In Use” and displays the Language Pair ID
(LPID) of the activated training.
Tip: The LPID is unique for each Language Pair. Use it to locate trained Language Pairs in BeGlobal. To carry out any further operations on your new trained Language Pair, you must log in to the production account.