ADVANTAGES AND DISADVANTAGES OF TRANSLATION MEMORY:

A COST/BENEFIT ANALYSIS

by

Lynn E. Webb

BA, San Francisco State University, 1992

Submitted in partial satisfaction of the requirements for

the Degree of

MASTER OF ARTS

in

Translation of German

Graduate Division

Monterey Institute of International Studies

Monterey, California

Copyright © 1998-1999 by Lynn E. Webb

 


ACKNOWLEDGEMENTS

1      INTRODUCTION

2      Translation Memory Defined

3      The Effects of Translation Memory on the Translation Process

3.1    THE TRANSLATION PROCESS

3.2    MANAGING THE TRANSLATION PROCESS

3.2.1     INTERNAL ATTRIBUTES

3.2.2     TERMINOLOGY DATABASES

3.2.3     ANALYSIS

4      Texts That Are Conducive To Using Translation Memory

4.1    REUSABILITY

4.1.1     UPDATES

4.1.2     REVISIONS

4.1.3     “RECYCLING” PRIOR WORK

4.2    REPETITIVE CONTENT

5      Key Considerations for Determining the Cost-Effectiveness of Translation Memory

5.1    TYPE OF PROJECT: INDIVIDUAL VS. TEAM

5.2    PERCENTAGE OF WORK IN HARD COPY VS. ELECTRONIC FORMAT

5.3    TYPE OF TEXT USUALLY TRANSLATED

5.4    TIME REQUIRED FOR CONVENTIONAL TRANSLATION PROCESS VS. TM PROCESS

5.5    COMPARING RATES

5.6    NEED TO INTEGRATE PRIOR WORK FOR PRESENT PROJECTS (ALIGNMENT)

5.7    NEED TO INTEGRATE PRESENT WORK FOR FUTURE PROJECTS

5.8    FREQUENCY OF UPDATES AND REVISIONS

6      Examples

6.1    INITIAL INVESTMENT

6.2    THE CLIENT

6.3    THE TRANSLATION AGENCY

6.4    THE FREELANCE TRANSLATOR

6.5    COMPANIES WITH IN-HOUSE TRANSLATION DIVISIONS

7      Survey/Case Studies

7.1    SURVEY

7.2    CASE STUDIES

8.     TM DATABASE OWNERSHIP

9.     Drawbacks of translation memory

10.    Translation Memory Products

10.1      STANDARD TRANSLATION MEMORY SOFTWARE

10.2      LOCALIZATION SOFTWARE WITH TM

10.3      TM/MT HYBRIDS

11.    Finding Common Ground

12.    FUTURE TRENDS

13.    Conclusion

14.    REFERENCES

15.    APPENDICES



ACKNOWLEDGEMENTS

            When I first considered writing my thesis on translation memory, I wasn’t exactly sure on what aspect I should focus. I would like to thank Chris Langewis for helping me to narrow my topics down to the one presented in this thesis. I would also like to thank him for his valuable input as my thesis advisor and as an expert in the field.

            My survey would not have been a success without responses from the members of LANTRA-L, CompuServe’s Foreign Language Education Forum (FLEFO), Interlang and from fellow translators who responded to the various personal e-mails that I sent out. Gerald Dennett, Mark Berry and Jeff Allen all deserve special thanks for contributing valuable information. I would also like to thank the various translation memory software manufacturers, especially TRADOS Corporation, for their input and responses to my questions.

            Finally, I would like to thank William Webb for reading and editing and David Sawyer and Frank Austermühl for reading and approving my final thesis. After all, if it weren’t for this audience, I wouldn’t be able to share it with the wider audience—all of you.

            The figures and tables that are not displayed in this electronic document can be found in the HTML pages (figures) and Excel spreadsheets (tables) included with this document in the archive zip file.


1       INTRODUCTION

Many articles have been written about translation memory in the last few years. Most of the material is provided by the producers of translation memory systems and only covers a specific product or lists specific features of the technology. Some articles even magnify the negative aspects of translation memory. Despite all that has been written, not much has been said about the actual costs or potential savings involved when using translation memory. It is becoming increasingly clear that translation memory is here to stay and that it is serving a useful purpose, but exactly how is this technology affecting the translation industry? Who can profit from it? Are there any “losers”? This thesis will attempt to determine the applicability of translation memory technology and illustrate the advantages and disadvantages of translation memory in the form of a cost/benefit analysis from the point of view of the end-user.

For the purpose of this thesis, the “end-user” comprises freelance translators, companies with in-house translation divisions, translation agencies and direct clients. The key considerations for determining the cost-effectiveness of translation memory and the cost/benefit analysis will be covered later in this text.

 

2       Translation Memory Defined

What is translation memory? Translation memory (TM) is defined by the Expert Advisory Group on Language Engineering Standards (EAGLES) Evaluation Working Group's document on the evaluation of natural language processing systems as “a multilingual text archive containing (segmented, aligned, parsed and classified) multilingual texts, allowing storage and retrieval of aligned multilingual text segments against various search conditions.”[1] In other words, translation memory (also known as sentence memory) consists of a database that stores source and target language pairs of text segments that can be retrieved for use with present texts and texts to be translated in the future. The translator, a different translation memory system or a machine translation system provides the target text segments that are paired with the source text segments so that the end product is a quality translation.
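The definition above can be pictured with a minimal sketch in Python. The class name `TranslationMemory`, its methods and the example sentences are illustrative assumptions only; they do not reflect the interface of any actual TM product, and the sketch handles a single language pair.

```python
class TranslationMemory:
    """Toy translation memory: stores aligned source/target segment pairs."""

    def __init__(self):
        # Maps each source segment to its stored target segment.
        self._pairs = {}

    def store(self, source, target):
        """Save an aligned source/target pair for later reuse."""
        self._pairs[source.strip()] = target.strip()

    def retrieve(self, source):
        """Return the stored translation, or None if the segment is new."""
        return self._pairs.get(source.strip())


# Invented example pair (German source, English target).
tm = TranslationMemory()
tm.store("Das Gerät ist eingeschaltet.", "The device is switched on.")

known = tm.retrieve("Das Gerät ist eingeschaltet.")
unknown = tm.retrieve("Das Gerät ist ausgeschaltet.")
```

Here `known` holds the stored translation, while `unknown` is `None` because that segment has not yet been translated and stored.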

What distinguishes TM from other computer-assisted translation (CAT) tools? There are many CAT tools available to assist the translator, such as bilingual and multilingual dictionaries, grammar and spell checkers and terminology software, but TM goes one step further by making use of these other CAT tools while at the same time matching up the original source document stored in its database with the updated or revised document through exact and fuzzy matching. Normally, the basic unit of text in a TM database is a sentence; however, the TM user can define what the unit will be. The basic unit might even be a sentence fragment or a paragraph. The translator does not have to re-translate work he or she has already completed. Figure 1 illustrates the basic translation memory process for creating a target language translation.



Fig. 1: Basic TM Process

How does TM differ from machine translation (MT)? MT creates automated translations and requires an advanced terminology database that includes all grammatical elements of a language. The MT system uses comprehensive dictionaries to translate the source text while at the same time applying the grammatical rules, or rule sets, from the database in order to produce the resulting grammatically correct target sentences. The technology sounds like an excellent solution; however, there is a catch: the source and resulting target text segments are not stored away in a database for future use. If a similar text (such as an automobile user’s manual for the same model but different year) needs to be translated, the MT system would have to start from scratch. On the other hand, a TM system is used as a translator’s aid, storing a human translator’s text in a database for future use. TM can be used in a few different ways. One way would be to have a translator or a machine translation system translate the original text, using translation memory to store the paired source and target segments. The translator could then reuse the stored texts to create the revised or updated version of the text. Only the segments of the new text that do not match the old one would have to be translated. The alternative would be to use an MT system or a different TM system to translate the original. The new TM system could then be used by a translator to translate the revision or update by aligning the texts produced by the MT system or other TM system and storing them in the TM database for present and future work. The translator could then proceed to translate only the new segments of the text, using TM as described above.

3       The Effects of Translation Memory on the Translation Process

3.1       THE TRANSLATION PROCESS

How does TM affect the conventional translation process? In order to answer this question, we must have an idea of what takes place during this process. This section will briefly touch on the general translation process and the features of TM used in this process.

Figures 2 and 3 illustrate the conventional translation process and the translation process using TM respectively. If it is possible to analyze quickly what type of text one is translating and if the text suits translation memory, there are about the same number of steps involved in both processes. One of the greatest differences, however, is that once a translation has been performed using TM, not only is a glossary of terms stored for future recall, but individual sentences are stored as well, thus cutting down considerably on the time required for a future translation, update or revision.

Figure 4 illustrates the conventional translation process for a text that is being revised. Figure 5 illustrates the same text using TM. Note that when using TM, fuzzy and exact matching are performed using the translation memory program, allowing for quick access to sections that have changed and permitting the user to focus on translating only those changed sections.

Exact matching is the process by which the TM program pairs text segments in a revised source text that match the original source text exactly; however, any text in the document that does not exactly match the original will not be translated. Fuzzy matching is the process by which the TM program pairs text segments in a revised source text with similar text segments from a previously stored translation based on the original source text. Fuzzy matching will find segments that are very similar to the original and suggest the original translation. This function can be set to different levels of sensitivity, allowing the translator to “match” source text segments that may differ only slightly or segments that vary greatly, but still have some similarities. After exact and fuzzy matching, the translator can modify the remaining segments that reflect the changes between the original and revised texts without having to retranslate the entire document (see Figure 6).


Fig. 6: Exact and Fuzzy Matching
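The distinction between exact and fuzzy matching described above can be sketched as follows. This is an illustration only: it uses the character-based similarity ratio from Python's standard `difflib` module as a stand-in for whatever similarity measure a given TM product applies, and the function name `best_match`, the `threshold` parameter (playing the role of the sensitivity setting) and the example sentences are assumptions.

```python
from difflib import SequenceMatcher


def best_match(segment, memory, threshold=0.75):
    """Return (stored_source, translation, score) for the most similar
    stored segment, or None if no stored segment reaches the threshold."""
    best = None
    for source, target in memory.items():
        score = SequenceMatcher(None, segment, source).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (source, target, score)
    return best


# Invented memory content: one previously translated segment.
memory = {
    "Press the red button to start the engine.":
        "Drücken Sie den roten Knopf, um den Motor zu starten.",
}

exact = best_match("Press the red button to start the engine.", memory)
fuzzy = best_match("Press the green button to start the engine.", memory)
no_match = best_match("Store the manual in a dry place.", memory)
```

A score of 1.0 corresponds to an exact match. The second lookup returns the old translation as a suggestion with a score below 1.0, which the translator must still adapt, and the third lookup returns `None`, so that segment has to be translated from scratch. Lowering `threshold` admits segments that differ more strongly from the stored text.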


 


In addition to matching source text segments, fuzzy matching can also be used to find terminology in the terminology database that is very similar to terminology being used for a translation. For example, if the term “communicate” is in the terminology database, the translation of “communicate” will be suggested whenever the terms “communicated” or “communication” appear in the original text. The translator can then enter the correct form of the word accordingly.
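The “communicate” example can be sketched with the `get_close_matches` function from Python's standard `difflib` module. The one-entry term database, the German equivalent, the function name `suggest_term` and the cutoff value of 0.8 are assumptions chosen for illustration; actual terminology tools may determine similarity differently.

```python
from difflib import get_close_matches

# Hypothetical terminology database with a single English-German entry.
term_db = {"communicate": "kommunizieren"}


def suggest_term(word, term_db, cutoff=0.8):
    """Suggest the stored translation of the most similar term, if any."""
    hits = get_close_matches(word.lower(), list(term_db), n=1, cutoff=cutoff)
    return term_db[hits[0]] if hits else None


suggestion_1 = suggest_term("communicated", term_db)
suggestion_2 = suggest_term("communication", term_db)
suggestion_3 = suggest_term("cabinet", term_db)
```

Both inflected or derived forms retrieve the entry for “communicate”, whereas the unrelated word yields no suggestion. As noted above, the translator still has to supply the correct form of the target term.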

Although fuzzy matching is quite useful, the user must also be aware of problems that may arise during post-editing of matched text segments. Gerald Dennett explains in his thesis entitled “Translation Memory: Concepts, products, impact and prospects”:

Take the German sentence pairs:

1        Ein Messer ist im Schrank. Er mißt Elektrizität.

2        Ein Messer ist im Schrank. Es ist sehr scharf.

 

Imagine that the translator has translated a document containing sentence pair 1 and has thus stored in his Translation Memory the two segments:

“A meter is in the cabinet.” And “It measures electricity.” The syntactical and contextual information supplied by the second sentence indicates to the translator that the word “Messer” here refers to a meter. The translator then runs a text containing sentence pair 2 through the pre-translation routine in his Translation Memory software. The Translation Memory software will recognise a 100% match in the first part of the pair, and insert “A meter is in the cabinet.” in the translation. A human translator would immediately realise from the syntactical and contextual information supplied in the second part of the pair that here in German word “Messer” is of neuter gender, and hence means “knife”. The translator must hope that he can pick up such mistranslations in his proof-reading.[2]

 

On the other hand, the likelihood that the above sentences would appear in the same document is probably quite low, especially since they would probably be used in completely different domains or in different types of text.
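Dennett's scenario can be reproduced with a small sketch of a pre-translation routine. The sentence splitter and the function names are simplifying assumptions; the point is only that a lookup performed segment by segment has no access to the context supplied by the neighboring sentence.

```python
import re

# Memory built from sentence pair 1 of Dennett's example.
memory = {
    "Ein Messer ist im Schrank.": "A meter is in the cabinet.",
    "Er mißt Elektrizität.": "It measures electricity.",
}


def split_sentences(text):
    """Naive segmentation: split after sentence-final punctuation."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def pretranslate(text, memory):
    """Pair every segment with its 100% match, or None if there is none."""
    return [(seg, memory.get(seg)) for seg in split_sentences(text)]


# Sentence pair 2: "Messer" now means "knife".
result = pretranslate("Ein Messer ist im Schrank. Es ist sehr scharf.", memory)
```

The first segment is filled in with “A meter is in the cabinet.” even though the second sentence shows that a knife is meant, while the second segment is left for the translator. The error can only be caught during proof-reading.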

The alignment tool is an example of a CAT tool that is almost indispensable when initially integrating older translations into TM. Using the alignment tool with TM can save the user time on future projects. Figure 7 illustrates the process of alignment. Alignment involves pairing each segment of an electronic source text with the corresponding segment of its electronic target text. The translator is essentially building a TM database that is identical to a normal TM database built during the translation process. This process is performed when a text was originally translated using the conventional translation process but it is clear that the source text will be revised or updated in the future. If the source or target texts are in hard copy, one should seriously consider how likely it is that the text will require future updates or revisions before performing an alignment.
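In its simplest form, alignment can be sketched as pairing segments by position. The function name `align` and the example sentences are assumptions, and the sketch presumes that source and target texts contain the same number of segments in the same order. Where a translator has merged or split sentences, this assumption fails and the pairs would have to be corrected by hand.

```python
def align(source_segments, target_segments):
    """Pair source and target segments by position to build a memory.

    Assumes a strict one-to-one correspondence between the two lists.
    """
    if len(source_segments) != len(target_segments):
        raise ValueError("Segment counts differ; manual alignment is needed.")
    return dict(zip(source_segments, target_segments))


# Invented example: an older translation available in electronic form.
old_source = ["Öffnen Sie die Abdeckung.", "Entfernen Sie die Batterie."]
old_target = ["Open the cover.", "Remove the battery."]

memory = align(old_source, old_target)
```

The resulting `memory` has the same shape as a memory built up during translation, so it can be used for exact and fuzzy matching in the same way.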

3.2       MANAGING THE TRANSLATION PROCESS

Perhaps the most intriguing aspect of translation memory is its ability to aid the user in managing projects, coordinating team efforts and building glossaries and dictionaries. Following are some additional features of TM that allow the translator or other user to manage translation projects more efficiently.

 

3.2.1   INTERNAL ATTRIBUTES

Most TM products not only store language pairs; they also store other information, called attributes, with the pairs. The most common attributes stored include the creation date, the name of the user or creator, the client, the project ID and the main domain or field (e.g., legal, technical, etc.) of the translation. Once this information is stored with the translated segments, the translator or other user can filter the text for the most important attributes. For example, the user can look for similar text segments by project, client, etc. when performing fuzzy matching, or a project manager may have more control over accountability for translated texts by filtering for creation date or the name of the creator of the translated segments. The latter is particularly useful when a number of translators are working on one large project, especially when the translators are all working with the same language pair.
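Storing attributes with each pair, and filtering on them, might look like the following sketch. The record layout `TMEntry`, the helper `filter_entries` and all sample values are hypothetical and serve only to illustrate the attributes named above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TMEntry:
    """One translated segment together with its attributes."""
    source: str
    target: str
    created: date
    creator: str
    client: str
    project_id: str
    domain: str


def filter_entries(entries, **criteria):
    """Keep the entries whose attributes equal all of the given criteria."""
    return [e for e in entries
            if all(getattr(e, key) == value for key, value in criteria.items())]


# Invented sample entries.
entries = [
    TMEntry("Öffnen Sie die Abdeckung.", "Open the cover.",
            date(1998, 3, 2), "translator_a", "Client X", "P-001", "technical"),
    TMEntry("Der Vertrag endet am 31. Mai.", "The contract ends on May 31.",
            date(1998, 4, 9), "translator_b", "Client Y", "P-002", "legal"),
    TMEntry("Schließen Sie die Abdeckung.", "Close the cover.",
            date(1998, 3, 3), "translator_b", "Client X", "P-001", "technical"),
]

for_client_x = filter_entries(entries, client="Client X")
by_translator_b = filter_entries(entries, creator="translator_b", project_id="P-001")
```

The first filter restricts matching to one client's texts; the second shows how a project manager could trace which segments of a project were created by a particular translator.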

 

3.2.2   TERMINOLOGY DATABASES

Most TM products come with a terminology database so that the translator can take full advantage of all of the features of TM. Using an integrated terminology database allows a translator to perform fuzzy matching for a specific term or to use a term in the database suggested by TM. Without a terminology database that is compatible with translation memory, the TM user cannot easily obtain suggested translations for individual words without opening a separate electronic dictionary or looking through a conventional dictionary. Naturally, the user must enter the terminology into the database before it can be useful. Once the terms are in the database, however, an individual translator or team of translators can work on a project and receive the suggested terms from the database, maintaining terminological consistency throughout the translation.

3.2.3   ANALYSIS

Estimating in advance approximately how much time a project will take is not always easy. A good translation memory system will be capable of analyzing a document for similar sentences and text repetition. It will also provide raw word counts, ignoring elements like graphics, HTML tags, software code, etc. that could influence the count. This analysis makes it easier for the translator or project manager to assess whether or not translation memory will be useful for the project and also helps him or her determine how much time may be involved in translating the document, depending on the amount of repetition, the word count, etc.

The user may also use the analysis function to compare different documents for similarities. Analysis can reveal if one document that has been translated previously and a newer document are in any way similar. Depending on how similar the two documents are, the user can estimate the time required for translation.
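A very rough version of such an analysis is sketched below. It strips HTML tags with a simple regular expression, counts the remaining words and reports the share of segments that repeat an earlier segment exactly. The function name `analyze`, the tag pattern and the sample segments are assumptions; a real analysis function would also take similar (fuzzy) segments into account.

```python
import re
from collections import Counter


def analyze(segments):
    """Return (word_count, repetition_rate) for a list of text segments.

    HTML tags are removed before counting. The repetition rate is the
    fraction of segments that exactly repeat an earlier segment.
    """
    cleaned = []
    for segment in segments:
        text = " ".join(re.sub(r"<[^>]+>", " ", segment).split())
        if text:
            cleaned.append(text)
    word_count = sum(len(text.split()) for text in cleaned)
    repeats = sum(n - 1 for n in Counter(cleaned).values())
    rate = repeats / len(cleaned) if cleaned else 0.0
    return word_count, rate


# Invented sample document, already split into segments.
segments = [
    "<b>Click</b> OK to continue.",
    "Click OK to continue.",
    "Close all <i>open</i> files.",
]
word_count, repetition_rate = analyze(segments)
```

For the sample, the tags do not contribute to the word count, and one of the three segments is a repetition of an earlier one. The higher the repetition rate, the more of the document can be filled in from memory.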

4       Texts That Are Conducive To Using Translation Memory

4.1             REUSABILITY

The most important characteristic of a text that is conducive to translation memory is that the text will be reused in one way or another. Following are examples of how texts can be reused and how translation memory becomes involved in the process.

4.1.1   UPDATES

It is not uncommon for an update of the text being translated to be made available suddenly during the translation process. An update is a change in a source text that occurs while the translation is still in progress. Receiving an updated text can cause major difficulties for the translator if the text is large and changes have been made throughout the entire document. Figures 4 and 5 illustrate the update/revision process without and with translation memory, respectively. Making updates using translation memory has the advantage over the conventional update process in that the translator does not have to physically search through the entire document for changes. Instead, the translator only has to run the updated source text through the translation memory program to identify new or changed segments and any new terminology. New terminology can be entered into the terminology database by the translator for future use.
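Identifying the new or changed segments of an updated source text can be sketched as a simple comparison against the segments that have already been translated. The function name `changed_segments` and the sample sentences are assumptions, and the sketch considers exact matches only.

```python
def changed_segments(original_segments, updated_segments):
    """Return, in document order, the updated segments that have no
    exact counterpart among the original segments."""
    known = set(original_segments)
    return [seg for seg in updated_segments if seg not in known]


# Invented example: the source text is updated while translation is in progress.
original = [
    "Open the cover.",
    "Remove the battery.",
    "Close the cover.",
]
updated = [
    "Open the cover.",
    "Remove the old battery.",
    "Insert the new battery.",
    "Close the cover.",
]

to_translate = changed_segments(original, updated)
```

Only the two segments in `to_translate` need the translator's attention; the unchanged segments can be taken over from the memory without searching through the document by hand.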

Keep in mind that in order for translation memory to be effective, all work must be done in TM and saved in TM format. Anything done outside of TM will not be stored in the memory database and therefore will not be a translation that can be manipulated in the future, unless one has access to an alignment tool. The best way to approach TM is to think about it as being an integral part of the main word processor, just like the word processor’s spell checker. If the TM system is a stand-alone product, always keep a copy of the text file that retains the TM product’s file format.

A translator can even begin the translation process before the final original document is completed. If the translator is given drafts of the original document in its early stages of development, the text can be translated and stored in the TM database. Then, as updated sections of the text are made available, the translator can perform fuzzy and exact matching, thus isolating the new parts from the parts that have already been translated or that are similar to the original. Section 6.5 is an example of this process.

 

4.1.2   REVISIONS

Many translators find that they continually receive revisions from the same clients. A revision is a new project amending a prior translation, reflecting changes made to a prior source text. Often a translator is asked by a client to revise the translation of a manual for the current product model that will be released within a short period of time. The client wants the translated manual to be available at the same time that the product is launched on the market. If the translator were to use the conventional translation process, it could take months before a very large document would be ready, and the client might not have that much patience or time. If, however, the translator uses translation memory, he or she can analyze what has changed within the document and can provide the revised translation of the manual within a shorter period of time than if he or she had used the conventional process. Section 6.4 illustrates this process.

 

4.1.3   “RECYCLING” PRIOR WORK

At times, a translator may find that he or she is translating a text very similar to one that had been translated in the past. The translator may run across words or phrases that are almost identical to words or phrases in the older document. The odds that a translator will ever translate the same sentence twice in two different texts are very low; however, the odds are higher that a translator will run across similar phrases or words in texts within the same field and/or for the same client. If the translator has an electronic copy of the target and source texts from the previous translation, then he or she can quickly access the files and perform fuzzy matching with the new source text against the old source and target texts.

4.2             REPETITIVE CONTENT

Another important factor is whether or not there is repetitive content within a text. The higher the percentage of repetitive content within a text, the more desirable it is to use translation memory. Repetitive content may include words, phrases or entire paragraphs. There are a number of different text types, but some tend to have more repetitive content than others. The majority of translatable texts fall into the following categories[3]:

-        Correspondence

-        Journalism/Communication

v     Business/Commercial

-        Marketing

-        Advertising

-        Administration

v     Legal

v     Scientific

v     Technical

-        Culture

-        Literature

The types of texts that are usually suited for translation memory are marked with the "v" symbol. Interestingly, according to the Telecom Observer, “each year 450 million pages of scientific, technical, and commercial materials are translated world-wide.”[4] Some examples of the type of texts that fall into these categories include:

-                    Patents (Legal)

-                    Contracts (Legal, Business/Commercial)