Advanced: Preprocessing Question Adjustments

When Transcelerator generates guessed translations, it relies on statistically matching parts of previously translated questions with parts of the English question. These “parts” typically consist of words or short phrases, sometimes even partial words. Since many of the English questions are fairly short, a relatively high percentage of a question’s text is sometimes grammatical fluff (e.g., helping words such as “did” or “does”) that might not be reflected in the grammar of the target vernacular. Additionally, the basic word order of the English questions and the vernacular questions might differ. These kinds of differences can make it harder for Transcelerator to figure out which vernacular parts correspond to which English parts.

Writing effective Preprocessing Question Adjustments is not particularly intuitive, but if you can figure out a few phrase substitution rules that help align the English to the vernacular, it might speed things up, especially in the early stages of using Transcelerator. If your team does not have anyone with the skills to analyze the differences, or you can’t figure out how to convey the necessary adjustment via a rule, do not worry. Transcelerator can still be used successfully without defining any preprocessing question adjustments.

Defining Adjustments

The best indication that a question adjustment would be helpful is when you are looking at a question and see that the guessed translation is almost correct but has incorrect word order, repeated words, or extra words that don’t make sense. To make adjustments, on the Advanced menu, select Preprocessing Question Adjustments, and then in the Question Adjustments dialog box, do the following:

  1. Note that the question you had selected in the main window is displayed in the Preview Sample Question. You will probably want to refer to that when writing your rule.
  2. To begin to create a new adjustment, click in the blank line at the bottom of the list of rules.
  3. In the column Word or Phrase to Replace, type the portion of the English question that you want to adjust by reordering, deleting, adding, etc.
  4. In the Replacement column, type the words or phrase as you want them to be adjusted. Note that these should not be translations into the target vernacular. To delete the entire word or phrase you typed in step 3 (e.g., to remove an auxiliary verb), just leave this cell blank.
  5. For a more complex adjustment that needs to be defined as a regular expression, select the Regular Expression check box. See Examples below for more help.
  6. If the adjustment is case-sensitive, select the Match Case check box. (Most probably don’t need to be.)
  7. In the Preview Result column, look to see whether your new rule produced a change for the currently selected sample question. If not, then you probably made a mistake in step 3 or accidentally typed the same thing in the Replacement cell.
  8. If desired, you can now select a different question in the Preview Sample Question list to see if the rule makes any change to it. In addition to checking a couple of questions where you expect the rule to make a helpful adjustment, you should probably also check a few questions to which your adjustment is not expected to apply, to make sure that it doesn't change them.
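
For readers who find code easier to follow than a dialog box, the steps above can be sketched in Python. This is only an illustrative model, not Transcelerator's own code: the Adjustment class, its field names, and the preview function are hypothetical, and the matching behavior (case-insensitive unless Match Case is selected, plain text matched literally) is an assumption based on the columns described above.

```python
import re
from dataclasses import dataclass


@dataclass
class Adjustment:
    """One row of the rules grid (hypothetical model, not Transcelerator's own type)."""
    find: str                  # Word or Phrase to Replace
    replacement: str = ""      # Replacement; empty deletes the matched text
    is_regex: bool = False     # Regular Expression check box
    match_case: bool = False   # Match Case check box

    def apply(self, question: str) -> str:
        flags = 0 if self.match_case else re.IGNORECASE
        if self.is_regex:
            pattern = self.find
            # Turn "$1"-style group references into Python's group syntax.
            repl = re.sub(r"\$(\d+)", r"\\g<\1>", self.replacement)
        else:
            # Escape both sides so plain text is matched and inserted literally.
            pattern = re.escape(self.find)
            repl = self.replacement.replace("\\", "\\\\")
        return re.sub(pattern, repl, question, flags=flags)


def preview(rule: Adjustment, question: str):
    """Return the adjusted text and whether the rule changed anything."""
    adjusted = rule.apply(question)
    return adjusted, adjusted != question


rule = Adjustment("should not", "not should")
print(preview(rule, "What should not be done?"))  # ('What not should be done?', True)
```

The second value returned by preview plays the role of the check in step 7: if it is False for a question where you expected a change, the text to replace probably does not match, or the replacement is identical to it.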

Ordering Adjustments

If you have more than one adjustment, Transcelerator applies them in the order listed, so an earlier adjustment can change the text in such a way that a subsequent rule that would have applied no longer does. If two adjustments are completely independent, then their relative order does not matter. (Hint: Even if the adjustments are independent for the current sample question being displayed, you should consider whether they are likely to be independent for other questions to which they might both apply.) If it is important for the adjustment rules to be considered and applied in a particular order, you can move a rule up or down in the list by selecting it and clicking the green up or down arrow to the right of the grid. The Preview Result column will adjust dynamically as the order of the rules changes.
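
The effect of rule order can be seen in a small Python sketch. Both rules here are hypothetical illustrations written in Python's regular expression syntax (which uses \1 where the dialog uses $1), and the apply_rules helper is an assumption about sequential application, not Transcelerator's actual implementation.

```python
import re


def apply_rules(question, rules):
    """Apply (pattern, replacement) pairs in list order, ignoring case."""
    for pattern, replacement in rules:
        question = re.sub(pattern, replacement, question, flags=re.IGNORECASE)
    return question


# Hypothetical rules for illustration only.
drop_does = (r"\bdoes ", "")                                # delete the helping verb
reorder_mean = (r"what does (.*?) mean\b", r"what means \1")  # move the verb forward

question = 'What does the name "Abraham" mean?'

# "does" is removed first, so the reordering rule no longer finds "what does".
first = apply_rules(question, [drop_does, reorder_mean])

# The reordering rule runs first and consumes "does" itself.
second = apply_rules(question, [reorder_mean, drop_does])

print(first)   # What the name "Abraham" mean?
print(second)  # what means the name "Abraham"?
```

Neither order is wrong in itself. The point is that the two rules are not independent, so the list order decides which one takes effect.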

When You Are Done Making Adjustments

When you have finished working in the Question Adjustments dialog box, click OK to save your changes. You will probably notice a long pause while Transcelerator re-processes all the questions using the new adjustment rules you defined. Once it has analyzed everything and made sense of the adjusted questions, it will save the analysis so it can be reloaded quickly whenever you restart it.

Examples

Simple Word Order Adjustment

In English, the phrase “should not” occurs in a few dozen questions. In Spanish, the equivalent phrase is “no debe”, which translates literally to “not should”. If Spanish is the target vernacular, and Transcelerator has figured out that “should” is “debe” and “not” is “no”, it will incorrectly translate “should not” as “debe no”. It would be helpful to have Transcelerator reorder the English words “should” and “not” to match the word order in Spanish when trying to guess at the translation. This adjustment can be defined as follows:

  1. Select the Word or Phrase to Replace box in the blank row and type should not.
  2. Type not should in the Replacement box.
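
As a rough illustration, the same substitution can be tried out in Python. The adjust helper is hypothetical, the two sample questions are invented for this sketch, and case-insensitive matching is assumed (Match Case cleared).

```python
import re


def adjust(question, phrase, replacement):
    """Replace a literal phrase, ignoring case (Match Case cleared)."""
    return re.sub(re.escape(phrase), lambda m: replacement, question, flags=re.IGNORECASE)


# Invented sample questions, for illustration only.
samples = [
    "What should not be done on the Sabbath?",
    "Why should we not be afraid?",
]
adjusted = [adjust(q, "should not", "not should") for q in samples]

for before, after in zip(samples, adjusted):
    print(before, "->", after)
```

The first question is reordered to "What not should be done on the Sabbath?". The second is left unchanged, because "should" and "not" are not adjacent in it. A simple phrase rule only catches the exact sequence of words you typed.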

Treating Two Words as One

The two words in the English phrase “how long” are used in a lot of questions, but in that phrase they work together to express a very specific sense that is quite distinct from the most common sense of the individual words. In Italian, the equivalent phrase is “quanto tempo”, which translates literally to “how-much time”. But using the more common individual senses of the words, Transcelerator is likely to translate “how” as “come” and “long” as “lungo”, incorrectly rendering the phrase as “come lungo”. To avoid this, an adjustment can be defined to tell Transcelerator to treat that phrase as a single unit, which would improve its chances of discovering the correct Italian translation. This adjustment can be defined as follows:

  1. Select the Word or Phrase to Replace box in the blank row and type how long.
  2. Type how-long in the Replacement box.

Note: You might observe that “how-long” is not a “correct” English word. Not only can an adjustment transform the text into something that is grammatically incorrect, it does not even need to use real English words. In place of “how-long”, we could have just as easily typed “globbetygibberish”, but that would be harder for you to understand later when you come back to look at the rules defined for your project.
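
To see why joining the words helps, consider this Python sketch. Splitting on spaces is only a stand-in for however Transcelerator actually divides a question into parts, and the sample question is invented for illustration.

```python
import re


def adjust(question):
    # Same substitution as the rule above: "how long" becomes "how-long", ignoring case.
    return re.sub(r"how long", "how-long", question, flags=re.IGNORECASE)


question = "How long did Jesus stay there?"  # invented sample question

before = question.split()
after = adjust(question).split()

print(before)  # ['How', 'long', 'did', 'Jesus', 'stay', 'there?']
print(after)   # ['how-long', 'did', 'Jesus', 'stay', 'there?']
```

After the adjustment, “how” and “long” no longer appear as separate words, so they cannot be matched individually against their more common senses.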

Eliminating Helping Words

Although English is an SVO language, questions typically use various inflections of the helping verb “do” in the normal verb position, pushing the uninflected main verb to the end of the sentence. Although Spanish is also an SVO language, it does not use helping verbs for questions, simply using the question word as the subject and retaining the inflected verb in its normal position. So the question What does the name "Abraham" mean? translates to ¿Qué significa el nombre "Abraham"? (literally, What means the name "Abraham"?). Trying to get the order exactly correct can be challenging (see following examples), but just eliminating the helping verb “does” from consideration can improve the guessed translation for a lot of questions. To define this adjustment:

  1. Select the Word or Phrase to Replace box in the blank row and type does.
  2. Leave the Replacement box blank.
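
A deletion rule like this one can be sketched in Python as follows. The drop_word helper is hypothetical. Matching only whole words and tidying the leftover spacing are assumptions made for this sketch; the dialog itself may treat the text differently, so check the Preview Result column.

```python
import re


def drop_word(question, word):
    """Delete a whole word (case-insensitive) and tidy the leftover spacing."""
    without = re.sub(rf"\b{re.escape(word)}\b", "", question, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", without).strip()


result = drop_word('What does the name "Abraham" mean?', "does")
print(result)  # What the name "Abraham" mean?
```

The result is not good English, but it lines up more closely with the Spanish word-for-word: What / Qué, the name / el nombre, mean / significa.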

Using a Regular Expression to Reorder Words

There are many English questions in Transcelerator that begin with “What did ___ say...”. As in the previous example, Spanish does not use a helping verb here, so these questions are translated as “¿Qué dijo ___?” (literally, “What said ___?”). This adjustment requires both a different inflected form of the main verb and a reordering. Since the blank could be any of several dozen names, multiple names, a pronoun, or even a common noun phrase (e.g., “What did some of the visitors say...”), a regular expression is needed to represent the text that goes in the blank:

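  1. Select the Word or Phrase to Replace box in the blank row and type what did (.*?) say\b (with no period after the \b). The parentheses define the enclosed expression as a match group that can be referred to by number in the replacement expression. The \b at the end marks a word boundary, so the rule does not match a longer word that merely begins with say.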
  2. Type what said $1 in the Replacement box.
  3. Select the Regular Expression check box.

Note: If you have trouble remembering the expression to use for a numbered match group, don't worry. Just select the Regular Expression check box first. Then when you edit the Replacement text, a helpful little Match group box appears underneath the cell where you are typing. Select the numbered group in the drop-down list, and the correct replacement expression will be inserted into the text at the current typing location. There’s also an item to add the replacement expression that represents the entirety of the text that was matched.
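
If you want to experiment with this expression outside of Transcelerator, the following Python sketch behaves similarly. Note that Python writes the group reference as \1 where the dialog uses $1. The reorder function is a hypothetical helper, the sample questions other than the one quoted above are invented, and case-insensitive matching is assumed.

```python
import re

# Same expression as in step 1; re.IGNORECASE stands in for a cleared Match Case box.
PATTERN = re.compile(r"what did (.*?) say\b", re.IGNORECASE)


def reorder(question):
    # \1 is Python's spelling of the first match group ($1 in the dialog).
    return PATTERN.sub(r"what said \1", question)


print(reorder("What did Jesus say to the crowd?"))
# what said Jesus to the crowd?

print(reorder("What did some of the visitors say about him?"))
# what said some of the visitors about him?

print(reorder("What did the saying mean?"))
# What did the saying mean?
```

The last question is left alone because of the \b: the word “saying” begins with “say”, but there is no word boundary after those three letters.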

More Complex Example of Reordering Words

This example illustrates how to reorder words when multiple match groups are used. In addition to the “how long” questions previously mentioned, Transcelerator also has a lot of “how often” questions. Most of these are asking about the past and about half of them use the helping verb “did” along with an uninflected main verb. To reorder the words to correspond to Spanish, we can use a rule that explicitly lists some of the common English verbs:

  1. Select the Word or Phrase to Replace box in the blank row and type the following expression, ending with the closing parenthesis: How (long|often) did (.*) (last|allow|stay|fish|help|enjoy|remain|mourn|expect|wait|rain)
  2. Select the Regular Expression check box.
  3. Type How $1 $3ed $2 in the Replacement box. Note: If you are using the Match group box to insert the match group specifiers, you will notice that there are now three numbered groups to choose from: 1, 2, and 3.
  4. If you entered the adjustment rule from the second example (Treating Two Words as One), you should now move this rule above that one (as described in Ordering Adjustments) so it will be applied first. Otherwise, this rule will not match the “how long” questions, because the earlier rule will already have changed how long to how-long.
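
The following Python sketch shows both the three-group replacement and the reason for step 4. It is an approximation only: Python uses \1, \2, and \3 where the dialog uses $1, $2, and $3, the apply_rules helper is hypothetical, and the sample question is invented for illustration.

```python
import re

VERBS = "last|allow|stay|fish|help|enjoy|remain|mourn|expect|wait|rain"

# Group 1: long/often, group 2: the subject, group 3: the uninflected verb.
HOW_RULE = (rf"How (long|often) did (.*) ({VERBS})", r"How \1 \3ed \2")
UNIT_RULE = (r"how long", "how-long")  # rule from Treating Two Words as One


def apply_rules(question, rules):
    """Apply (pattern, replacement) pairs in list order, ignoring case."""
    for pattern, replacement in rules:
        question = re.sub(pattern, replacement, question, flags=re.IGNORECASE)
    return question


question = "How long did the flood last?"  # invented sample question

good = apply_rules(question, [HOW_RULE, UNIT_RULE])
bad = apply_rules(question, [UNIT_RULE, HOW_RULE])

print(good)  # how-long lasted the flood?
print(bad)   # how-long did the flood last?
```

In the second ordering, “How long” has already become “how-long” by the time the reordering rule runs, so the expression no longer matches and the helping verb stays in place.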

