Abstract
The incidence of computers in translation ranges from the largely unsuccessful attempts to attain so-called Fully Automatic High Quality Machine Translation to the current widespread use of translation memories. Gone are the years of vast government spending on research aimed at “helping” computers do the translator’s job; spending now comes from individual translators who invest in expensive software in the hope that the machines will, in turn, help them. This paper argues that there are better ways than those currently available to share that spending and its benefits, and explores one of them: peer-to-peer exchange of translation memories based on the TMX standard.
Key words: machine translation, computer-assisted translation, translation memories, TMX, peer-to-peer file sharing
Glory and Shame of Machine Translation
The idea of trying to make numbers talk like words is an old one. Thinkers like Leibniz devised mathematical systems of language representation and translation as early as the late 17th century, and even Descartes sketched out what he called a “universal language” in the form of mathematical expressions. Indeed, we can go back as far as 1661 to trace one of the first fully developed attempts to work out a mathematical model for translation.
That year, the precocious chemist, explorer and mathematician Johannes Becher produced a numeric system that was allegedly able to translate from Latin into German, and he postulated a generic mechanism that could be extended to all vernacular languages. It consisted of some 10,000 words, each designated by a number, and it used additional numeric values for endings and cases, together with some basic equations. By entering “word” values into the calculations, new numbers would come out that could be checked against a corresponding list in German, eventually returning a translation of the original (Freigang 2001).
It could thus be said that the concept of computer-assisted translation, or even automated translation, dates back several centuries before the appearance of computers. Had Becher been able to use a computer or a calculating machine, no one would hesitate to call his invention the first attempt to develop an automated translation system. And there were other similar attempts, curiously enough very close in time, such as the one by Athanasius Kircher in 1663, or an even earlier one by Cave Beck in 1657 (Hutchins 1986, 2:1).
The idea of such “mechanical dictionaries” experienced a revival in the early 20th century with the “Mechanical Brain” of the French engineer Georges Artsruni and the invention of the Russian Petr Trojanskij, the first truly mechanical translation devices (Freigang 2001). Both machines, patented in the 1930s, were “mechanical” in the most literal sense: they stored dictionary entries on moving bands or perforated tape and retrieved equivalents in another language by purely mechanical means, long before electronic computation was available.
The heyday of the many subsequent attempts at mechanising translation started with a famous, or infamous, memorandum addressed to the Rockefeller Foundation in 1949 by Warren Weaver. His well-known mathematical model of communication, developed together with Claude Shannon, would consolidate the idea of translation as a mere question of “breaking the code” and would initiate two decades of frantic activity and huge investments, mostly in government-funded research, in order to attain so-called “Fully Automatic High Quality Machine Translation.”
The final report of the Automatic Language Processing Advisory Committee (ALPAC) is almost as famous, or as infamous [1]. The committee concluded that machine translation was slower, less accurate and more expensive than human translation, and that no useful fully automatic system was in prospect. Its publication in 1966 therefore meant the end of government spending on machine translation research and established a conviction that lasts until today: machine translation is mostly useless without human intervention in the form of editing or rewriting.
However, further attempts and approaches would provide new insights into the complexity of the machine translation question, such as the initiative of the former European Community called Eurotra, which not only revived Descartes’s original idea of developing an “interlingua,” or intermediate metalanguage, but also provided richer analytical developments while establishing the bases for current computer-assisted translation techniques.
The idea of developing an input-controlled translation method is very much associated with the Canadian system for bilingual weather reports, Météo, which is still in operation today. This approach, which also works effectively for many multinational companies in the production of their internal multilingual paperwork, memoranda and manuals, can be well summarised by outlining the features of the project called KANT, for “Knowledge-based Accurate Natural-language Translation.”
KANT, developed by Eric Nyberg and Teruko Mitamura (Nyberg and Mitamura 1992), works by carefully controlling the input quality of the source text. An intralingual translation, in the sense of R. Jakobson (Jakobson 2000[1959]), takes place first: the original is conventionally rewritten into another, simplified “original.” Automated translation then becomes more of a by-product than a real translation.
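Purely as an illustration of what “controlling the input” can mean in practice, here is a minimal sketch, in Python, that checks a sentence against a restricted vocabulary and a maximum sentence length before it is allowed through to any automated translation step. The word list, the limit and the function name are invented for the example and are not taken from KANT’s actual controlled grammar.

    # Illustrative controlled-input checker; vocabulary and limits are invented.
    APPROVED_WORDS = {"press", "the", "red", "button", "to", "stop", "engine"}
    MAX_WORDS_PER_SENTENCE = 20

    def check_controlled_input(sentence):
        """Return a list of problems that would force a rewrite of the 'original'."""
        problems = []
        words = sentence.lower().rstrip(".").split()
        if len(words) > MAX_WORDS_PER_SENTENCE:
            problems.append("sentence too long")
        for word in words:
            if word not in APPROVED_WORDS:
                problems.append("word outside controlled vocabulary: " + word)
        return problems

    print(check_controlled_input("Press the red button to stop the engine."))  # [] -> acceptable
    print(check_controlled_input("Kill the motor."))  # flags "kill" and "motor"

A sentence that passes such a filter is, in effect, already a simplified “original” of the kind described above.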
But with high-brow, science-fiction-like automated translation projects more or less at a halt, down-to-earth translation professionals started to benefit from the advantages of computers.
Computer-Assisted Translation is “the broadest term used to describe an area of computer technology applications that automates or assists the act of translating text from one language to another” (SDL International). The list of contributions of computer technologies that conform to this definition is not short: word processors, electronic dictionaries, terminological data banks, BBS and discussion groups, optical character recognition, spell and grammar checkers, e-mail, WWW documentation, desktop publishing, speech recognition, specific localization tools, translation memories, and so on.
From MT to TM
I intend here to speculate about the pendulum-like movement that articulates the relationship between machine translation and translation memories, a relationship that goes beyond a simple swap of capital initials (from MT to TM), although it may very well have to do with the swapping of full, translated sentences.
Translation memories (TM) may be defined as a set of software applications devised to help translators in their activity by retrieving already translated terms or segments and recycling them, or by building up tentative translations from previously translated segments that share common traits. Perfectly duplicable segments are called “perfect matches”; tentative translations generated from merely analogous segments are called “fuzzy matches.”
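As a rough illustration of the difference between the two kinds of matches, the following Python sketch scores a new segment against a small memory using a plain string-similarity ratio. Commercial TM packages use their own, usually undisclosed, scoring schemes; the 75% threshold and all names here are arbitrary examples.

    # Illustrative match lookup; the scoring and threshold are simplifications.
    from difflib import SequenceMatcher

    def best_match(new_segment, memory, threshold=0.75):
        """Look up new_segment in memory (source -> target) and label the best hit."""
        best_source, best_score = None, 0.0
        for source in memory:
            score = SequenceMatcher(None, new_segment, source).ratio()
            if score > best_score:
                best_source, best_score = source, score
        if best_source is None or best_score < threshold:
            return None  # nothing worth recycling
        kind = "perfect match" if best_score == 1.0 else "fuzzy match"
        return kind, round(best_score, 2), memory[best_source]

    memory = {"Press the red button.": "Pulse el botón rojo."}
    print(best_match("Press the red button.", memory))    # perfect match, score 1.0
    print(best_match("Press the green button.", memory))  # fuzzy match, score below 1.0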
Leaving aside the particular mechanics of each product, there are more than a dozen different translation memory packages on the market, ranging in price from the twenty-dollar, amateurish “Alair II” to the highly professional, corporate and expensive 5,000-dollar “Alchemy Catalyst,” together with suites like “Trados,” a de facto standard, and its competitors “Déjà Vu,” “SDLX” and “Transit.”
Translation memories are optimal tools for texts that are highly repetitive, belong to a larger corpus of specialized texts to be translated, draw on a wide pool of specialized terminology or form part of multilingual localization projects. They help to guarantee a high degree of terminological consistency, ease massive revision processes, speed up productivity in large localization projects and efficiently accumulate topic-related formulaic expressions.
However, it is easy to anticipate that they do not deal well with “stylistically rich” originals and that they impose a segment-restricted view of the text instead of a whole-text approach. So-called “perfect matches” may induce disastrous context-related misinterpretations. Furthermore, there has traditionally been a problem of low compatibility between different TM packages and, in most cases, they involve an expensive investment for translators who may have to face widely differing customer requirements.
Let us focus on the last two problems: how can the information contained in a translation memory be shared between users of different software? Why could that be useful, and when would it be desirable?
Large localization projects are often undertaken by teams of translators who are required to use the same software. Their already translated segments are uploaded into a common repository that subsequently provides possible perfect or fuzzy matches not only to the translator who uploaded them, but also to the other members of the translation team.
The advantages of sharing one’s work with other project partners are clear and appealing: the commonly developed repository of paired sentences grows, and with it the overall amount of translated text that can be recycled. Yet the current practice of translation memory sharing involves a few serious drawbacks. Translators may have to put up with solutions they did not agree to; revisions and later changes affect other translators’ work; the search for consensus tends to slow down the process; and there is a higher workload for early starters, while more recycled segments are available to late participants.
Finally, all translators must use the same software, and the same versions of it. A professional may thus end up being excluded from a project because it is not worthwhile for him or her to invest in a particular new package needed exclusively for that project. Even considered as a long-term investment it makes little sense: by the time he or she needs the same software for a new project, new, incompatible versions of the program may have been released.
Among the above problems, some are strictly workflow-related (and will not be discussed here) and others are good old translation problems. Others, finally, stem from the fragmentation of software standards, which keeps each memory locked into its maker’s proprietary format. For the latter, TMX provides a general solution that is becoming increasingly accepted and integrated by software makers.
TMX and New Paradigms in File Sharing
Translation Memory eXchange (TMX) is an SGML/XML-based markup language, which makes for a fairly easy and compatible Internet implementation. It is a standard established by LISA (the Localization Industry Standards Association, www.lisa.org) that is being increasingly integrated by translation memory makers into the export/import capabilities of their latest versions. There are several levels of compliance with the TMX norm, ranked from 1 to 3 depending on how much information beyond the purely textual content of the segments (formatting, inline codes and other metadata) a system is able to convert into TMX: the higher the level, the richer the information preserved in the exchange. TMX becomes a powerful exchange tool when combined with TBX (TermBase eXchange), its counterpart for exchanging terminological database contents. Ultimately, by using TMX, translators would not have to use the same TM software in order to participate in the same localization project.
Essentially, TMX works as a text-only markup language into which aligned text, an original together with its translation or translations, is exported from a translation memory [2]. No matter which TM software is being used, as long as it provides TMX import/export capabilities, the resulting tagged text-only file can be “read” by any other TM that offers the same capabilities, whatever internal encoding system it uses to store the information.
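To give an idea of how little machinery is needed to “read” such an exported file, here is a minimal Python sketch that extracts the aligned segments from a TMX document. The element names (tu, tuv, seg) follow the TMX specification; the sample memory and the helper function are invented for the illustration.

    # Illustrative TMX reader; the sample content is invented.
    import xml.etree.ElementTree as ET

    SAMPLE_TMX = """<tmx version="1.4">
      <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
              o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
      <body>
        <tu>
          <tuv xml:lang="en"><seg>Press the red button.</seg></tuv>
          <tuv xml:lang="es"><seg>Pulse el botón rojo.</seg></tuv>
        </tu>
      </body>
    </tmx>"""

    XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # how xml:lang appears to the parser

    def read_tmx(tmx_text):
        """Return a list of {language: segment} dictionaries, one per translation unit."""
        units = []
        for tu in ET.fromstring(tmx_text).iter("tu"):      # one <tu> per aligned unit
            variants = {}
            for tuv in tu.iter("tuv"):                     # one <tuv> per language
                lang = tuv.get(XML_LANG) or tuv.get("lang") or "und"
                seg = tuv.find("seg")
                if seg is not None and seg.text:
                    variants[lang] = seg.text
            if variants:
                units.append(variants)
        return units

    print(read_tmx(SAMPLE_TMX))  # [{'en': 'Press the red button.', 'es': 'Pulse el botón rojo.'}]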
Towards Peer-to-Peer Sharing of Translation Memories
This is a very general picture of how far things have evolved to date. How much further they can go is still an open question, but what follows is a speculation on the potential of TMX when combined with possibilities and software already running on the Internet. What will be said from now on, however speculative, is not simple science fiction: should the technical and human means be provided, an interesting field of theoretical research and practical application may unfold before us.
The new paradigms in Internet file sharing must be considered here. In the late 1990s a new way of sharing information and files shook the music industry and, in some cases, pushed it to the fringe of bankruptcy. Programs like Napster, Gnutella, Kazaa and others allow users to share their files, including music, and to exchange them freely. Several national branches of large music companies were forced to close or to restructure their business philosophies deeply because of the economic damage inflicted by peer-to-peer Internet music sharing. As a result, a ruling by a United States federal appeals court in 2001 forced Napster to shut down its service, one of the most widely echoed direct interventions by the authorities in the actual practices of the Internet. But it is neither the music nor the financial consequences that are of interest with regard to translation memories: what matters here is the fact that a network of independent users can share their files so easily.
Basically, a program like Napster works as follows. A user places a series of music files in a special “share” folder on his or her computer. The program sends the list of filenames (song titles) to a central server, which indexes it. The user then sends a query about any song he or she may be interested in. Since many other users of the same software have sent their shareable filenames to the server, the server locates the requested song title in its index and tells the first user on which other computer the song is stored. Both users’ computers then connect directly with one another, and file transmission takes place on a one-to-one basis. The bulk of the data (the comparatively huge music file) is only transmitted in this final stage; everything that happens before is just listings of short textual units (song titles) going to and fro.
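A toy sketch of the central-index idea may make the division of labour clearer: the server only ever handles filenames and peer addresses, never the files themselves, and the actual transfer is left to the two peers. The class and names below are invented for the illustration; a real service would of course run this index behind a network server.

    # Illustrative central index in the style of Napster; names are invented.
    from collections import defaultdict

    class PeerIndex:
        """Central directory: stores only filenames and peer addresses, never the files."""

        def __init__(self):
            self._where = defaultdict(set)  # filename -> set of peer addresses

        def register(self, peer, filenames):
            """A peer announces the titles sitting in its 'share' folder."""
            for name in filenames:
                self._where[name].add(peer)

        def lookup(self, filename):
            """Return the peers claiming to hold the file; transfer then happens peer to peer."""
            return set(self._where.get(filename, ()))

    index = PeerIndex()
    index.register("peer_a:6699", ["song1.mp3", "song2.mp3"])
    index.register("peer_b:6699", ["song2.mp3"])
    print(index.lookup("song2.mp3"))  # both peers hold a copy; download from either directly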
The Gnutella system works in a slightly different way: it is more of a “word-of-mouth” system (if such a bodily metaphor can be used when talking about computers), which is consequently slower but requires no central server. One user launches a request, which is directed to only two computers, the “closest” ones in the network of Gnutella users. The odds are that those particular computers are not able to satisfy the request for that particular filename, so the next thing they do is relaunch the same query to the next two computers. After twenty such steps, more than one million computers will have received the request. Once the requested file is located, a response stating where the host is travels back along the chain and, finally, the requestor and the provider get in touch directly, without a middleman this time, and the file is transmitted, again on a one-to-one basis.
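The “word-of-mouth” propagation can likewise be sketched in a few lines: every peer that cannot answer a query forwards it to its two neighbours until a hop limit is reached, and a positive answer travels back along the same chain. The peer objects below stand in for real networked clients and are, again, purely illustrative.

    # Illustrative Gnutella-style query flooding over a toy in-memory overlay.
    class Peer:
        def __init__(self, name, files, neighbours=None):
            self.name = name
            self.files = set(files)
            self.neighbours = neighbours or []  # the two "closest" peers in the overlay

        def query(self, filename, ttl=20, seen=None):
            """Return the name of a peer holding the file, or None if the flood dies out."""
            seen = seen if seen is not None else set()
            if self.name in seen:               # do not process the same query twice
                return None
            seen.add(self.name)
            if filename in self.files:          # the answer travels back up the chain of calls
                return self.name
            if ttl == 0:
                return None
            for peer in self.neighbours:        # relaunch the query to the next two peers
                hit = peer.query(filename, ttl - 1, seen)
                if hit:
                    return hit
            return None

    # Tiny three-peer overlay: a asks, c holds the file; the transfer would then be direct.
    c = Peer("c", ["weather_report_fr.txt"])
    b = Peer("b", [], [c])
    a = Peer("a", [], [b, c])
    print(a.query("weather_report_fr.txt"))  # -> "c"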
The
question arising from this seems both obvious and compelling: Can a peer-to-peer exchange system be
developed for translation memory sharing?
Using TMX as a unifying standard would provide common ground for the exchange. Once a translation project is finished, translators usually return the final version to their client while keeping the resulting translation memory as a by-product of their work. A program would convert the contents of those memories into TMX-tagged multilingual text, and an “exchanger” would expose the memories to the World Wide Web by placing them in a shared area open to public access.
The repetition of this action by several users would create a dense and sprawling network of interconnected computers, as happens with Napster and Gnutella, which could potentially become the largest pool of aligned text (translations and originals) ever assembled.
Whenever a translation project started, users would connect to the network and their “memory exchanger” would launch queries for similar segments to the bulk of participants. Slowly, in a way similar to that of basic translation memories themselves, pre-translated replies would travel back to the requestor, some in the form of perfect matches, most of them in the form of fuzzy matches. This would result in a pre-translated draft, whose production could perhaps require the computer to be left working overnight (depending on factors such as text length, the actual degree of matching found, the level of requirements set by the user, and so on).
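A speculative sketch of what such a “memory exchanger” query could look like is given below: each source segment is scored against a pool of shared memories (plain dictionaries standing in for remote peers answering over the network) and the best replies, perfect or fuzzy, are assembled into a pre-translated draft. The names and the matching threshold are, once more, hypothetical.

    # Speculative sketch of the draft-building step; peers are simulated locally.
    from difflib import SequenceMatcher

    def query_peers(segment, peer_memories, threshold=0.75):
        """Return (kind, score, suggested translation) from the shared memories, or None."""
        best = None
        for memory in peer_memories:            # in reality: replies from remote peers
            for source, target in memory.items():
                score = SequenceMatcher(None, segment, source).ratio()
                if score >= threshold and (best is None or score > best[1]):
                    kind = "perfect" if score == 1.0 else "fuzzy"
                    best = (kind, round(score, 2), target)
        return best

    def pretranslate(segments, peer_memories):
        """Pair every segment with its best shared match; unmatched ones go to the translator."""
        return [(seg, query_peers(seg, peer_memories)) for seg in segments]

    peers = [{"Close the valve.": "Cierre la válvula."},
             {"Open the main valve.": "Abra la válvula principal."}]
    for seg, hit in pretranslate(["Close the valve.", "Open the valve."], peers):
        print(seg, "->", hit)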
There are of course many questions arising from this, most of them far beyond the scope of this paper, and not a few immediate drawbacks. To start with, all the drawbacks of conventional, non-peer-to-peer sharing would still be there, unsolved. There would also be a higher risk of wrong translations coming from anonymous partners: sharper scrutiny of received equivalences would be needed, making the revision process even more demanding. The obviously wider range of topics would add to the confusion, and metadata describing the thematic classification of segments would be indispensable for the machine to “trust” one potential translation over another. Bandwidth requirements are unknown. There would also be legal and copyright issues concerning translated text as such versus equivalent segments, whose ownership is determined differently under different national legislations.
With technical, legal and translational problems ahead, the possibility of implementing a peer-to-peer device for translation memory sharing appears both as a challenging enterprise and as a promising area of research. As a good old friend of mine says, “machines don’t have intuition, but they have memories” (Fustegueres 2001, my translation). I would add: maybe we can help them share.
Conclusion
The history of machine translation, from Becher’s numeric dictionaries to the ALPAC report and beyond, suggests that computers serve translators best not by replacing them but by recycling their work. Translation memories embody that lesson, yet their benefits remain locked inside incompatible packages and closed project teams. TMX already provides a vendor-neutral way of exchanging aligned text, and the peer-to-peer paradigms pioneered by Napster and Gnutella show how networks of independent users can share files on a massive scale. Combining the two could turn individual translation memories into a common, ever-growing pool of aligned text. The technical, legal and professional obstacles are real, but they define a research agenda rather than a dead end: better ways of sharing the cost, and the benefits, of translation technology are within reach.
End notes
[1] See Hutchins (1996) for an enlightening description of the most frequent misinterpretations and misleading circumstances surrounding the ALPAC report.
[2] Aligned texts are the main asset of a translation memory, and companies and institutions usually devote considerable resources to aligning texts translated before the implementation of TM software, in order to enhance subsequent translation activity.
References
Abaitua, Joseba. “TMX format,” 1998,
http://paginaspersonales.deusto.es/abaitua/konzeptu/ta/tmx.htm
Brain,
Davis, Paul C. Stone Soup Translation: The Linked Automata
Model. Doctoral dissertation.
Freigang, Karl Heinz. “Automation of Translation: Past, Presence, and Future.” Revista Tradumatica, No. 0 (2001), http://www.fti.uab.es/tradumatica/revista/num0/sumari/sumari.htm
Gow, Francie. Metrics for evaluating Translation Memory Software. Unpublished Thesis.
Hutchins, John. “The precursors and the pioneers.” In Machine Translation: Past, Present, Future. Chichester: Ellis Horwood, 1986.
Hutchins, John. “ALPAC: the (in)famous report.” MT News International, no. 14 (June 1996), pp. 9-12.
Jakobson, Roman. “On Linguistic Aspects of Translation.” In Baker, M., and Venuti, L., eds., The Translation Studies Reader, 113-118. London and New York: Routledge, 2000.
Nyberg, Eric; Mitamura, Teruko. “The KANT system: fast, accurate, high-quality translation in practical domains.” Proceedings of COLING-92, 1992.
Sanchez-Gijon, Pilar. “Cataleg de sistemes de memories de traduccio,” Revista Tradumatica, No. 0 (2001).
SDL International. An Introduction to Computer-Aided Translation, http://www.sdl.com/products and http://tc.eserver.org/18490.html
Several authors. “CAT fight,” Proz, The Translators
Workplace, http://www.proz.com/?sp=cat/compare
Fustegueres, Silvia. “Qui te por de les memories de traduccio?” Revista Tradumatica, No. 0 (2001).
Zerfass, Angelika. “Evaluating Translation Memory Systems,” First International Workshop on Language
Resources for Translation Work and Research, Gran
Canaria, 2002.