Abstract

The incidence of computers in translation ranges from the rather unsuccessful attempts to attain so-called Fully Automatic High Quality Machine Translation to the current widespread use of translation memories. Gone are the years of vast government spending on research intended to “help” computers do the translator’s job; spending now comes from individual translators who invest in expensive software in the hope of having machines help them instead. This paper argues that there are better ways than those currently available to share that spending and its benefits.

Key words

machine translation, translation memories, computer-assisted translation, TMX, peer-to-peer file sharing

Glory and Shame of Machine Translation

The idea of trying to make numbers talk like words is an old one. Thinkers like Leibniz devised mathematical systems of language representation and translation as early as the late 17th century, and even Descartes sketched out what he called a “universal language” in the form of mathematical expressions; indeed, we can go back as far as 1661 to trace one of the first fully developed attempts to work out a mathematical model for translation.

 

That year, the precocious chemist, explorer, and mathematician Johannes Becher produced a numeric system that was allegedly able to translate from Latin into German, and he postulated a generic mechanism that could be extended to all vernacular languages. It consisted of some 10,000 words, each designated by a number, and it used additional numeric values for endings and cases, together with some basic equations. By entering “word” values into the calculations, new numbers would come out that could be checked against a new list in German, eventually returning a translation of the original (Freigang 2001).

 

It could thus be said that the concept of computer-assisted translation, or even automated translation, dates back several centuries before the appearance of computers. Had Becher been able to use a computer or a calculating machine, no one would hesitate to call his invention the first attempt to develop an automated translation system. There were other similar attempts, curiously close in time, such as the one by Athanasius Kircher in 1663, or an even earlier one by Cave Beck in 1657 (Hutchins 1986, 2:1).

 

The idea of such “mechanical dictionaries” experienced a revival in the early 20th century with the “Mechanical Brain” of the French engineer Georges Artsruni and the invention of the Russian Petr Trojanskij, which were the first truly mechanical translation devices (Freigang 2001): both were “mechanical” in the literal sense, retrieving stored dictionary equivalents by means of moving parts rather than by electronic computation.

 

The heyday of the many subsequent attempts at machine translation started with a famous, or rather infamous, memorandum addressed to the Rockefeller Foundation in 1949 by Warren Weaver. His well-known mathematical model of communication, developed together with Claude Shannon, would consolidate the idea of translation as a mere question of “breaking the code” and would initiate two decades of frantic activity and huge investment, largely in the form of government research funding, aimed at attaining so-called “Fully Automated High Quality Machine Translation.”

 

The final report of the Automatic Language Processing Advisory Committee (ALPAC) is almost as famous, or rather as infamous [1], because it brought that period to an abrupt close: its publication in 1966 meant the end of government spending on machine translation research and the establishment of a certainty that lasts until today, namely that machine translation is mostly useless without human intervention in the form of editing or rewriting.

 

However, further attempts and approaches would provide new insights into the complexity of the machine translation question, such as the European Community initiative called Eurotra, which not only recovered Descartes’s original idea of developing an “interlingua,” or intermediate metalanguage, but also provided richer analytical developments while establishing the bases for current computer-assisted translation techniques.

 

The idea of developing an input-controlled translation method is very much associated with the Canadian system for bilingual weather reports, Météo, which is still working today. This approach, which also works effectively for many multinational companies in the production of their internal multilingual paperwork, memoranda and manuals, can be well summarised by outlining the features of the project called KANT, for “Knowledge-based Accurate Natural-language Translation.”

 

KANT works by carefully controlling the input quality of the source text. Developed at Carnegie Mellon University, the system monitors ambiguities in the original document, returns to the “writer” those segments considered incorrect by the machine’s internal grammar, and only when the text is considered “understandable” by the machine does automated translation take place (Nyberg and Mitamura 1992). However, what takes place first is a fully human intralingual translation, in Jakobson’s sense (Jakobson 2000 [1959]): the original is conventionally translated into another, simplified “original.” Automated translation becomes a by-product rather than a real translation.

 

But with high-brow, science-fiction automated translation projects more or less at a halt, down-to-earth translation professionals started to benefit from the advantages of computers. Computer-assisted translation is “the broadest term used to describe an area of computer technology applications that automates or assists the act of translating text from one language to another” (SDL International). The list of computer technologies that fit this definition is not short: word processors, electronic dictionaries, terminological data banks, BBS and discussion groups, optical character recognition, spell and grammar checkers, e-mail, WWW documentation, desktop publishing, speech recognition, specific localization tools, translation memories, and so on.

 

From MT to TM

I intend here to speculate about the pendulum-like movement that may articulate the relationship between translation memories and machine translation, a movement that goes beyond a simple swap of capital initials (from MT to TM), although it may very well have to do with the swapping of full, translated sentences.

 

Translation memories (TM) may be defined as a set of software applications devised to help translators in their activity by retrieving already translated terms or segments and recycling them, or by building up tentative translations from previously translated segments that share common traits. Perfectly duplicable segments are called “perfect matches”; tentative translations generated from analogous segments are called “fuzzy matches.”
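
Purely by way of illustration, the following minimal Python sketch shows one way of telling the two kinds of matches apart, using a simple character-level similarity ratio. The sample segments and the 0.75 fuzzy threshold are invented for the example; commercial systems rely on more sophisticated, word- and tag-aware scoring.

from difflib import SequenceMatcher

# A toy translation memory: source segments paired with stored translations.
# Segments and the 0.75 threshold are illustrative only.
MEMORY = {
    "Click the OK button to continue.": "Haga clic en el botón Aceptar para continuar.",
    "Save the file before closing.": "Guarde el archivo antes de cerrar.",
}

def lookup(segment, threshold=0.75):
    """Return (match type, stored translation, similarity score)."""
    best_source, best_target, best_score = None, None, 0.0
    for source, target in MEMORY.items():
        score = SequenceMatcher(None, segment, source).ratio()
        if score > best_score:
            best_source, best_target, best_score = source, target, score
    if best_score == 1.0:
        return "perfect", best_target, best_score
    if best_score >= threshold:
        return "fuzzy", best_target, best_score
    return None, None, best_score

print(lookup("Click the OK button to continue."))      # perfect match
print(lookup("Click the Cancel button to continue."))  # fuzzy match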

 

Leaving aside the particular mechanics of different software packages, there are more than a dozen different translation memory systems on the market, ranging in price from the twenty-dollar, amateurish “Alair II” to the highly professional, corporate and expensive 5,000-dollar “Alchemy Catalyst,” or suites like “Trados,” a de facto standard, and its competitors “Déjà Vu,” “SDLX” and “Transit.”

 

Translation memories are optimal tools for texts that are highly repetitive, belong to a larger corpus of specialized texts to be translated, present a wide pool of specialized terminology, and belong to multilingual localization projects. They help to guarantee a high degree of terminological consistency, ease massive revision processes, speed up productivity in large localization projects and efficiently accumulate topic-related formulaic expressions.

 

However, it is easy to anticipate that they do not deal well with “stylistically rich” originals and that they impose a segment-restricted optics instead of a whole-text approach. So-called “perfect matches” may induce disastrous context-related misinterpretations. Furthermore, there has been a traditional problem of low compatibility between different TM packages and, in most cases, they involve an expensive investment for translators who may need to face widely diverse customer requirements.

 

Let us focus on the last two problems: how can the information contained in a translation memory be shared between users of different software? Why can that be useful and when could it be desirable?

 

Large localization projects are often undertaken by teams of translators who are required to use the same software. Their already translated segments are uploaded into a common repository that subsequently provides possible perfect or fuzzy matches not only to the translator who uploaded them but also to the other members of the translation team.

 

The advantages of sharing one’s work with other project partners are clear and appealing: the commonly developed repository of paired sentences grows, and with it the overall amount of translated text that can be recycled. Yet the current practice of translation memory sharing involves a few serious drawbacks. Translators may have to put up with solutions they did not agree to, revisions and eventual changes affect other translators’ work, the search for consensus tends to slow down the process, and early starters carry a higher workload while more recycled segments are available to late participants.

 

Finally, all translators must use the same software, and even the same version. A professional may thus end up excluded from a project because it is not worthwhile for him or her to invest in a particular package needed exclusively for that project. Even considered as a long-term investment, by the time he or she needs the same software for a new project, new incompatible versions of the program may have been released.

 

Among the above problems, some are strictly workflow-related, and will not be discussed here, while others are good old translation problems. Finally, some stem from the fragmentation of software standards. For the latter, TMX provides a general solution that is becoming increasingly accepted and integrated by software makers.

 

TMX and New Paradigms in File Sharing

Translation Memory eXchange (TMX) is an SGML/XML-based markup language, which makes Internet implementation fairly easy and compatible. It is a standard established by LISA (Localization Industry Standards Association, www.lisa.org) that translation memory makers are increasingly integrating into the export/import capabilities of their latest versions. There are several levels of compliance with the TMX norm, ranging from 1 to 3, depending on how much metadata, beyond the purely textual information, the system is able to convert into TMX. TMX becomes a powerful exchange tool when combined with TBX (TermBase eXchange), its counterpart for exchanging the contents of terminological databases. Ultimately, by using TMX, translators would not have to use the same TM software in order to participate in the same localization project.

 

Essentially, TMX works as a text-only markup language into which aligned text (an original and its translation or translations) is exported from a translation memory [2]. No matter which TM software is being used, as long as it offers TMX import/export capabilities, the resulting tagged text-only file can be “read” by any other TM with the same capabilities, regardless of the particular internal codification system it uses to store the information.
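
By way of illustration, the sketch below contains a deliberately simplified TMX fragment (real files carry additional mandatory header information) and shows how any tool that “reads” TMX can recover the aligned pair from the tagged text. Python is used here only as a neutral demonstration vehicle, and the sentences are invented.

import xml.etree.ElementTree as ET

# A deliberately simplified TMX fragment: one translation unit (<tu>) holding
# one variant (<tuv>) per language. Real TMX headers carry more attributes.
TMX_SAMPLE = """<tmx version="1.4">
  <header creationtool="example" segtype="sentence"
          adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Save the file before closing.</seg></tuv>
      <tuv xml:lang="de"><seg>Speichern Sie die Datei vor dem Schliessen.</seg></tuv>
    </tu>
  </body>
</tmx>"""

# Any tool that understands the tagging can recover the aligned pair,
# whatever internal format its own memory uses.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"
root = ET.fromstring(TMX_SAMPLE)
for tu in root.iter("tu"):
    pair = {tuv.get(XML_NS + "lang"): tuv.findtext("seg") for tuv in tu.iter("tuv")}
    print(pair)  # {'en': 'Save the file before closing.', 'de': '...'}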


This is a very general picture of how far things have evolved to date. How much further they can go is still an open question, but what follows is a speculation on the potential of TMX when combined with possibilities and software already running on the Internet. What will be said from now on, however speculative, is not simple science fiction; should the technical and human means be provided, an interesting field of theoretical research and practical application may unfold before us.

 

The new paradigms of Internet file sharing must be considered here. In the late 1990s a new way of sharing information and files shook the music industry and pushed parts of it to the brink of bankruptcy. Programs like Napster, Gnutella and Kazaa allow users to share their files, including music, and to exchange them freely. Several national branches of large music companies were forced to close or to deeply restructure their business philosophies because of the economic damage inflicted by peer-to-peer music sharing. As a result, a court ruling in 2001 forced Napster to cease its activities, one of the most widely echoed direct interventions of public authorities in the actual practices of the Internet. But it is neither the music nor the financial consequences that are of interest with regard to translation memories: it is the fact that a network of independent users can share their files so easily that matters here.

 

Basically, a program like Napster works as follows: a user places a series of music files in a special “share” folder on his or her computer. The program sends the list of file names (song titles) to a central server, which indexes it. The user then sends a query for any song he or she may be interested in. Since many other users of the same software have sent their shareable file names to the server, the server locates the requested song title in its index and tells the first user on which other computer the song is stored. Both users’ computers then connect directly to one another and file transmission takes place on a one-to-one basis. The bulk of the data (the comparatively huge music file) is only transmitted in this final stage; everything that happens before is just lists of short textual units (song titles) going to and fro.
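
The following minimal sketch models that central-index step. The class and peer names are invented for illustration and do not reproduce Napster’s actual protocol; the point is simply that only short textual listings reach the server, while the files themselves travel peer to peer.

# A sketch of a Napster-style central index: peers register the names of
# their shared files, queries return which peers hold a given file, and the
# actual transfer then happens directly between the two peers (not modelled).
class CentralIndex:
    def __init__(self):
        self.catalogue = {}                      # file name -> set of peer ids

    def register(self, peer_id, filenames):
        for name in filenames:
            self.catalogue.setdefault(name, set()).add(peer_id)

    def locate(self, filename):
        return self.catalogue.get(filename, set())

index = CentralIndex()
index.register("peer-A", ["song1.mp3", "song2.mp3"])
index.register("peer-B", ["song2.mp3"])
print(index.locate("song2.mp3"))                 # {'peer-A', 'peer-B'}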

 

The Gnutella system works in a slightly different way: it is more of a “word-of-mouth” system, if such a bodily metaphor can be used when talking about computers, which is consequently slower but requires no central server. One user launches a request, which is directed to only two computers, the “closest” ones in the network of Gnutella users. The odds are that those particular computers cannot satisfy the request for that particular file name, so the next thing they do is relaunch the same query to the next two computers. After twenty such steps, more than one million computers will have received the request. Once the requested file is located, a response stating where the host is travels back along the chain and, finally, the requester and the provider get in touch directly, without a middleman this time, and the file is transmitted, again on a one-to-one basis.
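
A comparable sketch of this “word-of-mouth” mechanism is given below: each peer that cannot answer a query forwards it to its neighbours until a time-to-live counter expires. The topology, the file names and the time-to-live value are invented for the example and do not reflect the real Gnutella protocol parameters.

# A sketch of Gnutella-style query flooding: each peer that cannot answer
# forwards the query to its neighbours until the time-to-live (TTL) expires.
# Topology, file placement and TTL are invented for the example.
NEIGHBOURS = {
    "A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"],
    "D": [], "E": [], "F": [], "G": [],
}
FILES = {"F": {"report.pdf"}}                    # only peer F holds the file

def flood(start, filename, ttl=3):
    frontier, seen = [start], {start}
    for _ in range(ttl):
        next_frontier = []
        for peer in frontier:
            if filename in FILES.get(peer, set()):
                return peer                      # the answer travels back along the chain
            for neighbour in NEIGHBOURS.get(peer, []):
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return None

print(flood("A", "report.pdf"))                  # 'F'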

 

The question arising from this seems both obvious and compelling: Can a peer-to-peer exchange system be developed for translation memory sharing?

 

Using TMX as a unifying standard would provide common ground for the exchange. Once a translation project is finished, translators usually return the final version to the client while keeping the resulting translation memory as a by-product of their work. A program would convert the contents of those memories into TMX-tagged multilingual text, and an “exchanger” would expose the memories to the World Wide Web by placing them in a share area open to public access.
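
A hypothetical sketch of such an “exchanger” step might look as follows: the aligned pairs of a finished memory are written out as TMX-tagged text into a publicly shared folder. The folder name, the simplified TMX header and the sample pair are all invented for the illustration.

import xml.etree.ElementTree as ET
from pathlib import Path

SHARE_DIR = Path("tm_share")                     # hypothetical public share folder

def export_to_share(pairs, src="en", tgt="es", name="project.tmx"):
    """Write aligned (source, target) pairs as a simplified TMX file."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", creationtool="exchanger", segtype="sentence",
                  adminlang="en", srclang=src, datatype="plaintext")
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src, source), (tgt, target)):
            tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
            ET.SubElement(tuv, "seg").text = text
    SHARE_DIR.mkdir(exist_ok=True)
    ET.ElementTree(tmx).write(SHARE_DIR / name, encoding="utf-8",
                              xml_declaration=True)

export_to_share([("Save the file.", "Guarde el archivo.")])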

 

The repetition of this action by several users would create a dense and sprawling network of interconnected computers, as happens with Napster and Gnutella, which could potentially become the largest pool of aligned text (translations and their originals) ever assembled.

 

Whenever a translation project started, users would connect to the network and their “memory exchanger” would launch queries for similar segments to the mass of participants. Slowly, in a way similar to that of basic translation memories themselves, pre-translated replies would travel back to the requester, some in the form of perfect matches, most in the form of fuzzy matches. The result would be a pre-translated draft, whose production might require the computer to be left working overnight, depending on factors such as text length, the actual number of matches found and the level of requirements set by the user.
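
To make the idea concrete, the following speculative sketch assembles such a pre-translated draft from pairs notionally harvested from other peers’ shared memories. Every name in it is hypothetical, and the similarity scoring is the same simple ratio used earlier, standing in for whatever matching engine an actual implementation would use.

from difflib import SequenceMatcher

def pretranslate(segments, shared_pairs, threshold=0.75):
    """Build a draft by matching new segments against pairs received from peers."""
    draft = []
    for segment in segments:
        best_score, best_target = 0.0, None
        for source, target in shared_pairs:
            score = SequenceMatcher(None, segment, source).ratio()
            if score > best_score:
                best_score, best_target = score, target
        if best_score == 1.0:
            draft.append((segment, best_target, "perfect"))
        elif best_score >= threshold:
            draft.append((segment, best_target, "fuzzy"))
        else:
            draft.append((segment, None, "no match"))
    return draft

shared = [("Save the file before closing.", "Guarde el archivo antes de cerrar.")]
print(pretranslate(["Save the file before closing.",
                    "Save the document before closing."], shared))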

 

There are, of course, many questions arising from this, most of them far beyond the scope of this paper, and not a few immediate drawbacks. To start with, all the drawbacks of conventional, non-peer-to-peer sharing would still be there, unsolved. There would also be a higher risk of receiving wrong translations from anonymous partners: sharper scrutiny of received equivalences would be needed, making the revision process even more demanding. The obviously wider range of topics would add to the confusion, and metadata describing the thematic classification of segments would be indispensable for the machine to “trust” one potential translation over another. Bandwidth requirements are unknown. There would also be legal and copyright issues concerning translated text as such versus equivalent segments, whose ownership is determined differently under different national legislations.

 

With technical, legal and translational problems ahead, the possibility of implementing a peer-to-peer device for translation memory sharing appears both as a challenging enterprise and as a promising area of research. As a good old friend of mine says, “machines don’t have intuition, but they have memories” (Fustegueres 2001, my translation). I would add: maybe we can help them share.

 

Conclusion

The purpose of this paper has been to trace the pendulum swing from machine translation to translation memories and to speculate on how the benefits of translation technology might be shared more widely. Decades of heavily funded research failed to deliver fully automatic high quality machine translation, while translation memories have proved genuinely useful to working translators, at the price of expensive and mutually incompatible software. TMX removes the compatibility barrier by providing a common exchange format, and the peer-to-peer models pioneered by Napster and Gnutella show that independent users can pool their files without central investment. Combining the two could turn individually held translation memories into a vast shared pool of aligned text, a prospect that raises technical, legal and translational questions but also opens a promising field of research and practical application.

 

End notes

[1] See Hutchins (1996) for an enlightening description of the most frequent misinterpretations and misleading circumstances related to the ALPAC report.

 

[2] Aligned texts are the main asset of a translation memory, and companies and institutions usually devote many resources to aligning texts that had been translated before the implementation of TM software, in order to enhance subsequent translation activity.


 

References

Abaitua, Joseba. “TMX format,” 1998, http://paginaspersonales.deusto.es/abaitua/konzeptu/ta/tmx.htm

 

Brain, Marshall. “How Gnutella Works,” HowStuffWorks, http://computer.howstuffworks.com/file-sharing3.htm

 

Davis, Paul C. Stone Soup Translation: The Linked Automata Model. Doctoral dissertation. Ohio State University, 2002, http://www.ling.ohio-state.edu/~pcdavis/papers/diss.pdf

 

Freigang, Karl Heinz. “Automation of Translation: Past, Present, and Future,” Revista Tradumatica, No. 0 (2001), http://www.fti.uab.es/tradumatica/revista/num0/sumari/sumari.htm

 

Gow, Francie. Metrics for evaluating Translation Memory Software. Unpublished Thesis. University of Ottawa, 2003.

 

Hutchins, John. “The precursors and the pioneers,” Machine Translation: Past, Present and Future. New York: Halsted, 1986.

 

Hutchins, John. “ALPAC: the (in)famous report,” MT News International, Vol. 14 (June 1996), pp. 9-12. Reprinted in Readings in Machine Translation, ed. Sergei Nirenburg, Harold Somers, and Yorick Wilks (Cambridge, Mass.: The MIT Press, 2003), pp. 131-135. Also available at: http://ourworld.compuserve.com/homepages/WJHutchins/Alpac.htm

 

Jakobson, Roman. “On Linguistic Aspects of Translation.” In Baker, M., and Venuti, L., eds., The Translation Studies Reader, 113-118. London and New York: Routledge, 2000.

 

Nyberg, Eric, and Mitamura, Teruko. “The KANT system: fast, accurate, high-quality translation in practical domains,” Proceedings of COLING-92, Nantes, 1992.

 

Sanchez-Gijon, Pilar. “Cataleg de sistemes de memories de traduccio” [Catalogue of translation memory systems], Revista Tradumatica, No. 0 (2001).

 

SDL International. An Introduction to Computer-Aided Translation, http://www.sdl.com/products and http://tc.eserver.org/18490.html

 

Several authors. “CAT fight,” Proz, The Translators Workplace, http://www.proz.com/?sp=cat/compare

 

Fustegueres, Silvia. “Qui te por de les memories de traduccio?” [Who’s afraid of translation memories?], Revista Tradumatica, No. 0 (2001).

 

Zerfass, Angelika. “Evaluating Translation Memory Systems,” First International Workshop on Language Resources for Translation Work and Research, Gran Canaria, 2002.

 

 

 

Rating Table

Submission Number: oooo
Submission title: oooo

Quality Statements (rated Strongly Agree / Agree / Disagree / Strongly Disagree)

A: The manuscript deals with a significant problem. Rating: Disagree
B: The manuscript is creative or deals with the subject in a new or novel way. Rating: Agree
C: The author included the appropriate background or literature review. Rating: Agree
D: The author's writing style is appropriate, academic, and clear. Rating: Agree
E: The study is conceptually based and theoretically grounded. Rating: Agree
F: The analyses are sound and appropriate. Rating: Disagree
G: The conclusions and/or policy implications flow from the study's findings. Rating: Agree
H: Readers of AEQ will find this article of interest. Rating: Strongly Disagree

COMMENTS:  The two primary problems with this submission are: (1) the submission does not really discuss pedagogy. How could this information be useful in a classroom or for teaching? How could teachers use such software? (2) What is the argument here? You seem to want to argue that file-sharing groups can enable speedier, more effective translation research. The transition into the Napster section does not clearly enough signify that shift, nor does your introduction.
 

REVIEWER'S NAME:    kes