The use of computers in translation *** Vague: “computers in
translation.” ***
ranges from the rather unsuccessful attempts to attain the so-called Fully
Automatic High Quality Machine Translation to the current widespread usage of
translation memories. Gone are the years of vast government spending on
research attempting to “help” computers do the translator’s job; spending now
comes from individuals who invest in expensive software in the hope of having
machines that help the translator. Are there any better ways than the ones
available to share that spending and its benefits? *** You may consider
distinguishing what type of “benefits” you mean ***
The Glory and The Shame of Machine Translation
The idea of trying to make numbers talk like words is an old one.1 While thinkers like Leibniz had already devised a mathematical system of language
representation and translation in the late 17th century *** A citation is required here. ***, and even Descartes had sketched out what he called a “universal
language” in the form of mathematical expressions *** Citation required ***, we can go back as
far as 1661 to trace one of the first fully developed
attempts to work out a mathematical model for translation.
In that year, the precocious chemist, explorer and mathematician Johannes Becher
produced a numeric system that was allegedly able to translate from Latin into German,
and postulated a generic mechanism that could be extended to all vernacular
languages. *** It is unclear as to whether Becher or the numeric system did the “postulating.” ***
It consisted of some 10,000 words, *** You may want to include some of these words as an example.
*** designated by a number, *** One number for each word? It is unclear. ***
and it used additional numeric values for endings and
cases, together with some basic equations. ***
Examples of some equations? *** By entering “word” values into the
calculations, new numbers would come out that could be checked against a new
list in German, eventually returning a translation of the original (Freigang 2001).
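The look-up-by-number principle described above can be sketched in a few lines. This is only an illustration: the words, numbers and ending codes below are invented and are not taken from Becher's actual tables, and his "basic equations" are not reproduced.

```python
# Hypothetical miniature of a numeric "mechanical dictionary".
# Every number and word pair here is invented for illustration only.

# Latin word -> (word number, ending/case code)
LATIN = {
    "equus": (317, 1),   # 1 = nominative (assumed code)
    "aquam": (101, 4),   # 4 = accusative (assumed code)
    "bibit": (205, 0),   # 0 = no case, verb form (assumed code)
}

# Word number -> German word, the "new list in German"
GERMAN = {317: "Pferd", 101: "Wasser", 205: "trinkt"}


def encode(sentence):
    """Turn a Latin sentence into a list of (number, ending code) pairs."""
    return [LATIN[word] for word in sentence.lower().split()]


def decode(codes):
    """Look each word number up in the German list, ignoring ending codes."""
    return [GERMAN[number] for number, _ending in codes]


def translate(sentence):
    """Word-for-word rendering obtained purely through the number tables."""
    return " ".join(decode(encode(sentence)))


print(encode("Equus aquam bibit"))
print(translate("Equus aquam bibit"))
```

The output is a word-for-word string with no reordering or inflection, which is roughly what a table-driven scheme of this kind can be expected to deliver.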
The practice of computer-assisted translation, or even automated translation, dates back several
centuries before the appearance of computers. Had
Becher been able to use a computer or a calculating machine, no one would
hesitate to qualify his invention as the first attempt to develop an automated
translation system. *** You would most certainly
need a citation here to back up your opinion that no one would doubt Becher.
*** Moreover, there were other similar attempts, curiously enough, very close in time, like the
one by Athanasius Kircher
in 1663, or an even earlier one by Cave Beck in 1657 *** These examples may require an endnote rather than a citation. *** (Hutchins 1986, 2:1).
Such “mechanical dictionaries” experienced a brief revival in the early 20th century, with the
“Mechanical Brain,” by the French engineer Georges Artsruni, and the
invention by the Russian Petr Trojanskij, which were the first truly mechanical translation
devices (Freigang 2001).
*** You seem to give more credit to the attempts
at translation than to the actual models. You don’t give enough credit
to the “Mechanical Brain.” Please extend this paragraph. ***
The heyday
of many other subsequent attempts started with a famous,
or infamous, memorandum addressed to the Rockefeller Foundation in
1949 by Warren Weaver. His well-known mathematical model of communication,
developed together with Claude Shannon, would consolidate *** Not sure if “consolidate” (to merge, to strengthen)
is the correct word to use, here. *** the
idea of translation as a mere *** How is the question
a “mere” one? *** question of
“breaking the code.” This model would
initiate two decades of frantic activity and huge investment in order to
attain the so-called “Fully Automated High Quality Machine Translation.”
The final
report of the Automatic Language Processing Advisory Committee (ALPAC) *** You have this in your
reference page, but this is not a proper citation. *** is almost as famous, or
as infamous [1]: the year 1966 meant the end of
government spending for research on machine translation and the establishment
of a certainty that lasts to this day: machine translation is mostly useless
without human intervention in the form of editing or rewriting. *** Are the words after the colon a quote or
paraphrasing? This paragraph is unclear and not cited properly. ***
However,
further attempts and approaches would provide new insights into the complexity of
the machine translation question, like the initiative by the
former European Community in creating Eurotra, *** Endnote? *** which
not only borrowed the
original idea by Descartes of developing what is called an “interlingua,” or
intermediate meta-language, but also
provided richer analytical developments ***
Example? *** while
establishing the bases for current computer-assisted translation techniques.
The idea of developing an input-controlled translation method is closely associated with Meteo, the current Canadian system for bilingual weather reports, which is still working
nowadays. This approach, which also works for many multinational companies in the
production of their internal multilingual paperwork, memorandums and manuals,
can be very well summarised by outlining the features of a project for “Knowledge-based Accurate Natural-language
Translation,” or KANT.
KANT
works by carefully controlling the input quality of the source text. Developed by Carnegie Mellon
University, KANT monitors
ambiguities in the original text,
returns to the “writer” those
segments considered incorrect by the machine’s internal grammar and, only when
the text is considered “understandable” by the machine, automated translation
takes place *** It is still unclear, up until
this point, what exactly is being translated. *** (Nyberg and Mitamura 1992). However,
what takes place first is a fully human intralingual translation, in the sense that R. Jakobson (113-118)
recognizes:
the original is conventionally
translated into another simplified “original.” Automated translation becomes
more of a by-product rather than a real translation.
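The gatekeeping workflow described above (check the source, return problem segments to the writer, translate only what passes) might be sketched as follows. The two rules used here, a word limit and a short list of ambiguous pronouns, are hypothetical stand-ins chosen for illustration; they are not KANT's actual grammar.

```python
# Minimal sketch of an input-controlled workflow.
# Both rules are assumptions made for this example, not KANT's real checks.
AMBIGUOUS_WORDS = {"it", "this", "that"}   # hypothetical ambiguity rule
MAX_WORDS = 12                             # hypothetical length rule


def check_segment(segment):
    """Return the list of problems found in one source segment."""
    words = segment.lower().strip(".!?").split()
    problems = []
    if len(words) > MAX_WORDS:
        problems.append("too long")
    if AMBIGUOUS_WORDS & set(words):
        problems.append("ambiguous reference")
    return problems


def gate(segments):
    """Split segments into those ready for translation and those sent back."""
    accepted, returned = [], {}
    for segment in segments:
        problems = check_segment(segment)
        if problems:
            returned[segment] = problems   # goes back to the writer
        else:
            accepted.append(segment)       # may proceed to translation
    return accepted, returned


accepted, returned = gate(["Close the cover.", "Remove it."])
print(accepted)
print(returned)
```

The point of the sketch is the order of operations: the human rewriting loop comes first, and translation only sees text that has already been simplified.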
But with
high-brow, science-fiction automated
translation projects more or less at a halt, ***
Why are they coming to a halt? *** down-to-earth translation professionals have
started benefiting from the advantages of modern-day computers. Computer Assisted Translation *** CAT? Is this not the title of your article? It should
be recognized here, now. *** is “the broadest
term used to describe an area of computer technology applications that
automates or assists the act of translating text from one language to another”
(SDL International). Computer technologies attributed to this
application include, but are not
limited to: word
processors, electronic dictionaries, terminological data banks, BBS and
discussion groups, optical character recognition, spell and grammar check, e-mail,
WWW documentation, desktop publishing, speech recognition, specific
localization tools, and translation memories. *** It seems
that the former list contains programs and systems that would be placed better
at the introduction of your document—again, adding more focus to it. ***
From MT to TM
I intend here to speculate about the
pendulum-like movement that may articulate the relationship between translation
memories and machine translation, which goes beyond a simple swap of capital
initials (from MT to TM), although it may very well have to do with the
swapping of full translated sentences. *** Also, watch your point-of-view. You suddenly jump
from third person to first person. Be consistent. ***
Translation
memories (TM) may be defined as a set of software applications devised to help
translators in their activity by retrieving already translated terms or
segments and recycling them, or by building up tentative translations from
previously translated segments that share common traits. Those perfectly duplicable
segments are called “perfect matches.” Those tentative translations generated
from analogous segments are called “fuzzy matches.”***
There should be examples or citations to support your points. *** 2
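The difference between a perfect match and a fuzzy match can be illustrated with a small sketch. It uses the character-based similarity ratio from Python's standard `difflib` module as a stand-in; commercial translation memories use their own matching algorithms, and the sample segments and the 0.75 threshold are assumptions made for this example.

```python
from difflib import SequenceMatcher

# A toy translation memory: source segment -> stored translation.
# The segments are invented for illustration.
MEMORY = {
    "Press the red button.": "Pulse el botón rojo.",
    "Close the cover.": "Cierre la tapa.",
}


def lookup(segment, memory, threshold=0.75):
    """Return (kind, matched source, stored translation, score)."""
    if segment in memory:
        # Identical segment already translated: a perfect match.
        return ("perfect", segment, memory[segment], 1.0)
    best_source, best_score = None, 0.0
    for source in memory:
        score = SequenceMatcher(None, segment, source).ratio()
        if score > best_score:
            best_source, best_score = source, score
    if best_source is not None and best_score >= threshold:
        # Similar enough to be offered as a tentative translation.
        return ("fuzzy", best_source, memory[best_source], best_score)
    return ("none", None, None, best_score)


print(lookup("Press the red button.", MEMORY))
print(lookup("Press the green button.", MEMORY))
```

In the second call the stored translation still says "rojo" (red), so a fuzzy match is a starting point that the translator must revise, not a finished translation.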
There
are more than a dozen different translation memories on
the market, ranging in price from the twenty-dollar
“Alair II” to the 5000-dollar
high-end
“Alchemy Catalyst,” or other suites like
“Trados,” a de facto standard, or its competitors “Déjà
Vu,” “SDLX” or “Transit.” *** An endnote may be required for readers who may want
to know more about the aforementioned programs. ***
TMs are optimal tools for texts that are highly
repetitive, belong to a larger corpus of specialized texts to be translated,
present a wide specialized terminology pool and belong to multilingual
localization projects. They help to guarantee a high degree of terminological
consistency, ease massive revision processes, speed up productivity in large
localization projects and efficiently accumulate topic-related formulaic expressions. *** You seem to
be losing more of your focus; keep to your topic. ***
However, it
is easy *** Why is it “easy”? *** to anticipate that they
do not deal well with “stylistically rich” originals and that they impose a
segment-restricted view instead of a whole-text approach. “Perfect matches” may induce disastrous
context-related misinterpretations. Furthermore, there has been a traditional
problem of low compatibility between different TM software and, in most cases,
they involve an expensive investment for translators who may need to face very
diverse customer requirements. *** Where are
your citations? ***
Let
us focus on the last two problems: how can the information contained in a translation memory be shared between users of
different software? How can this be useful, practically and financially? ***
You should bold face and capitalize (where appropriate) the previous lines and
make them a subheading. ***
Large
localization projects are often undertaken by teams of translators who are required to use the same software. Their
already translated segments are uploaded into a common repository that
subsequently provides possible perfect or
fuzzy matches not only to the one translator that uploaded them, but also to
the other members of the translation team. ***
Example? ***
Although
the advantages of sharing one’s work with other project partners, by increasing the size of the commonly developed
repository of paired sentences and, thus, the overall amount of translated text
that can be recycled, are clear and appealing, *** Why and how are they clear and appealing? *** there are a few serious
drawbacks to the current practice of TM sharing: translators may be faced
with non-unanimous solutions;
revisions and eventual changes affect other translators’ work;
the search for group consensus may slow
down the process; and there is a heavier workload
for early starters, while more recycled segments are
available for late participants.
Finally,
all translators would have to use the
same software type and version. Thus, a professional may end up
being excluded from a project because it may not be worthwhile for him or her to
invest in new software that may be needed exclusively for a
specific project. Even as a long-term investment, by the time
he or she needs the same software for a new project, new incompatible
versions of the program may have been released. ***
This paragraph seems to have lost some of the formality of language found in
the rest of the text. ***
Among the above problems, some are strictly workflow-related, which will not be discussed here, and some others are good old translation problems. Finally, some other problems are related to software standards. For the latter, TMX provides a general solution that is becoming increasingly accepted and integrated by software makers.
TMX and New Paradigms in File Sharing
Translation
Memory eXchange language (TMX)
provides a general solution for software
standards that is becoming increasingly accepted and integrated by software
developers. TMX is an SGML/XML *** What are these? *** -based markup language, which allows a fairly easy and compatible
Internet implementation. It is a standard established by LISA (Localization
Industry Standards Association, www.lisa.org
*** Cite this properly. ***) that is
being increasingly integrated by translation memory makers within the
export/import capabilities of their latest versions. There are several levels
of compliance with the TMX norm, ranking from 1 to 3 depending on the amount of metadata, apart
from purely textual information, which the system is able to convert into TMX.
It becomes a powerful exchange tool when combined with TermBase eXchange Language (TBX),
its counterpart for exchanging terminological database
contents. Ultimately, by using TMX, translators would not have to use the same
TM software in order to participate in
the same localization project.
Essentially,
TMX works as a text-only mark-up language into which aligned text
(the original and its translation(s)) is exported from a translation memory [2].
No matter which TM software is being used, as long as it furnishes TMX
import/export capabilities, the resulting tagged,
text-only file could be “read” by any other TM that shares
the same capabilities, no matter what particular internal codification system
it uses to store the information.
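To give an idea of what such a tagged, text-only file looks like, here is a minimal sketch that reads a tiny TMX fragment with Python's standard XML parser. The element names (`tmx`, `header`, `body`, `tu`, `tuv`, `seg`) follow the TMX format; the header values, the sample sentences and the `read_tmx` helper are made up for this example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# A tiny hand-written TMX fragment; header values are placeholders.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example-tool" creationtoolversion="0.1"
          segtype="sentence" o-tmf="example" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Save the file.</seg></tuv>
      <tuv xml:lang="es"><seg>Guarde el archivo.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx(text):
    """Return one {language: segment} dictionary per translation unit."""
    root = ET.fromstring(text)
    units = []
    for tu in root.iter("tu"):
        units.append({tuv.get(XML_LANG): tuv.findtext("seg")
                      for tuv in tu.iter("tuv")})
    return units


print(read_tmx(SAMPLE_TMX))
```

Because the file is plain tagged text, any tool able to parse XML can recover the aligned pairs, whatever program exported them.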
This is a very general picture of how far things have evolved to date. How much further they can go is still an open question, but here follows a speculation on the potential of TMX when combined with currently existing possibilities and software already running on the Internet. What will be said from now on, however speculative, is not simple science fiction and, should technical and human means be provided, an interesting field of theoretical research and practical application may unfold before us.
The potential of TMX when combined with current Internet software must be considered here. In the late 1990s a new
wave of sharing information and files shook the music
industry and pushed it to the brink of bankruptcy in some cases. Programs like
Napster, Gnutella, Kazaa and others allowed users to share their
files, including music, and to exchange them freely. Several national
branches of large music companies were forced to close or to deeply restructure
their business philosophies because of the economic breakdown inflicted by
peer-to-peer Internet music sharing. As a result, a Supreme
Court ruling
in 2001 closed Napster’s web page and all its
activities. This was one of the most widely reported direct interventions of the
administration in the actual practices that take place on the Internet. 3 But it is
neither music nor even major financial consequences that may be interesting in
regard to translation memories *** “Financial
consequences” should be a major concern, as it is a major point stated in your
abstract! And, if indeed, it is no longer a concern, then why place any focus
at all on the financial demise of such large companies as Napster? ***:
it is instead the fact that a network of independent users may share their
files so easily that is of importance here.
Basically,
a program like Napster works as follows: a user
copies a
series of music files on his computer
into a “share” folder. The program sends the list of
filenames (song titles) to the server, which indexes it. The user then sends
a query about any song he may be interested in. Since many other users of the
same software have also sent their shareable filenames to the
server via Napster, the server locates the requested song
title(s) in its indexed directory and tells the first user on
which other computer the song is stored. Then, both
users’ computers connect directly with one
another and file transmission takes place
on a
one-to-one basis. The bulk of the data (the comparably huge music file) is only
transmitted in the final stage. All that happens before that is just listings
of short
textual units *** Citation? *** (song titles) going to
and fro. 4
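The central-index arrangement just described can be sketched as follows. This is a simplified model with invented class and peer names, not Napster's actual protocol: the server only stores which peer offers which filename, and the file itself never passes through it.

```python
from collections import defaultdict


class IndexServer:
    """Toy central index: knows who shares what, holds no files itself."""

    def __init__(self):
        self._index = defaultdict(set)   # filename -> set of peer names

    def register(self, peer, filenames):
        """A peer announces the contents of its share folder."""
        for name in filenames:
            self._index[name.lower()].add(peer)

    def locate(self, title):
        """Answer a query with the peers that hold the requested title."""
        return sorted(self._index.get(title.lower(), set()))


server = IndexServer()
server.register("peer-a", ["Song One.mp3"])
server.register("peer-b", ["song one.mp3", "Song Two.mp3"])

# Only short strings travel to and from the server; the requester would
# then contact one of the listed peers directly to fetch the file.
print(server.locate("Song One.mp3"))
```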
The
Gnutella system works in a slightly different way: it is more of a “word-of-mouth”
system (if such a bodily metaphor can be used when
talking about computers) which is consequently slower but requires
no central server. One user launches a request ***
How? ***, which is directed to only two computers, the
“closest ones” in the network of Gnutella users. The odds are that those particular computers
are not able to satisfy the request for that particular filename *** Proof? ***, so the next thing they do is
to relaunch the same query, each to two further computers. After twenty rounds, more
than one million computers will have received the request. Once the requested
file is located, a response stating where the host is travels back along the chain and, finally, the requestor and the provider may connect
directly, without a “middleman” this time.
The file is transmitted, again on a one-to-one basis. *** Citations? ***
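The “more than one million” figure follows from simple doubling. The short calculation below assumes, as the paragraph does, that every computer forwards the query to exactly two new computers and that no computer is reached twice; a real network would not behave this neatly.

```python
def newly_reached(fanout, hop):
    """Computers receiving the query for the first time at a given hop."""
    return fanout ** hop


def total_reached(fanout, hops):
    """All computers reached from hop 1 up to and including `hops`."""
    return sum(fanout ** h for h in range(1, hops + 1))


print(newly_reached(2, 20))   # 1048576 computers at the twentieth hop alone
print(total_reached(2, 20))   # 2097150 computers over all twenty hops
```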
The
question arising from this seems both obvious and compelling: Can a
peer-to-peer exchange system be developed for translation memory sharing? *** This should be a bold-faced, capitalized sub-title.
***
Usage of TMX as a unifying *** Universal? *** standard would provide common ground for
the exchange. Once a translation project is finished, translators usually return their final version to their client, while holding the resulting translation memory as a
by-product of their work. *** This sentence is
not clear. *** A program would convert the contents of those
memories into TMX-tagged multilingual text and an “exchanger” would expose the
memories to the World Wide Web by placing them into a share area open to
public access. *** It is not clear if you are
finally giving the reader your “solution” to these problems, or if this is
still a continuation of the TMX definition. ***
The
repetition of this action by several users would create a dense and sprawling
network of interconnected computers, as happens with Napster and Gnutella,
which could potentially become the largest pool of aligned text (translations
and originals) ever.
Whenever a translation project starts, users would
connect to the network and their “memory exchanger” would launch queries for
similar segments to the bulk of participants. Slowly, in a way similar to that
of the basic translation memories themselves, pre-translated replies would
travel back to the requestor, some in the form of perfect matches, most of them in
the form of fuzzy matches. This would result in a pre-translated draft, whose
production could perhaps require the computer to be left working overnight
(depending on factors such as length, actual degree of matches found, level of
requirements set by the user, etc.). *** This
“solution” is the focus of your abstract, and should be the bulk of your
article, yet you give it only one paragraph? You need to lessen the majority of
your piecemeal definitions, which are scattered throughout, and give this
“solution” much more attention and research. ***
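As a purely speculative illustration of the proposal above, the following sketch shows how a “memory exchanger” might collect replies from several peers and assemble a pre-translated draft. Everything in it is an assumption: the peer names, the in-memory dictionaries standing in for shared TMX files, the `difflib` similarity ratio and the 0.75 threshold. No networking is modelled.

```python
from difflib import SequenceMatcher


def best_reply(segment, shared_memory):
    """Best (score, source, target) a single peer can offer for a segment."""
    best = (0.0, None, None)
    for source, target in shared_memory.items():
        score = SequenceMatcher(None, segment, source).ratio()
        if score > best[0]:
            best = (score, source, target)
    return best


def pretranslate(segments, peers, threshold=0.75):
    """Query every peer for every segment and keep the best suggestion."""
    draft = []
    for segment in segments:
        score, target, peer = 0.0, None, None
        for name, memory in peers.items():
            reply_score, _source, reply_target = best_reply(segment, memory)
            if reply_score > score:
                score, target, peer = reply_score, reply_target, name
        if score == 1.0:
            kind = "perfect"
        elif score >= threshold:
            kind = "fuzzy"
        else:
            kind, target, peer = "none", None, None
        draft.append({"segment": segment, "match": kind,
                      "suggestion": target, "peer": peer})
    return draft


# Hypothetical peers, each exposing a small shared memory.
peers = {
    "peer-a": {"Close the cover.": "Cierre la tapa."},
    "peer-b": {"Press the red button.": "Pulse el botón rojo."},
}
for line in pretranslate(
        ["Close the cover.", "Press the green button.", "Qwxyz?"], peers):
    print(line)
```

Recording which peer supplied each suggestion matters for the trust problem: a translator revising the draft would at least know where a doubtful segment came from.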
There are
of course many questions arising from this, most of them far beyond the scope
of this paper. *** Yet, these “questions” are
what your focus should be, what you should be researching. *** There are also more than a few immediate drawbacks. To start with,
all previous drawbacks from conventional non-peer-to-peer sharing would
still be there, unsolved. There would also be higher risks of potentially
wrong translations from anonymous partners: sharper criticism of received
equivalences would be needed, thus making the revision process even more
demanding. The obviously wider range of topics would add to the confusion,
and metadata describing the thematic ascription of segments would be
indispensable in order for the machine to “trust” one potential translation or
the other. Bandwidth requirements are unknown. There would also be legal
and copyright issues on translated text as such versus equivalent segments,
whose ownership is determined differently depending on national legislation. 5
With
technical, legal and translational problems ahead, the possibility of implementing
some peer-to-peer device for translation memory sharing appears both as a
challenging enterprise and as a promising area of research. As one good old friend of mine says, “machines don’t have
intuition, but they have memories” (Fustegueres
2001, my translation). I would add: maybe we can help them share.
End notes 6
[1] See
Hutchins (1996) for an enlightening description of the most frequent
misinterpretations and misleading circumstances related to the ALPAC report.
[2] Aligned texts are the main asset of a translation
memory, and many resources are usually devoted by companies and institutions to
aligning texts that had been translated before the implementation of TM software
in order to enhance the production of subsequent translation activity. *** There should still be a reference link with this
endnote. ***
References
7
Abaitua, Joseba. “TMX format.” 1.1 (August 1998).
http://paginaspersonales.deusto.es/abaitua/konzeptu/ta/tmx.htm (Date of
access). *** I do not see this reference
in your paper. ***
Brain. How Stuff Works.
(Date of publication.)
http://computer.howstuffworks.com/file-sharing3.htm
(Date of access). *** I do not see this reference in your paper. ***
*** This is not a user-friendly site; I can find no location for your article
title. ***
Davis, Paul C. (2002). Stone Soup Translation: The Linked Automata
Model. Doctoral dissertation. Retrieved (month date, year), from
http://www.ling.ohio-state.edu/~pcdavis/papers/diss.pdf
*** I
do not see this reference in your paper. ***
Freigang, Karl Heinz. “Automation of Translation: Past, Presence, and Future.” Revista Tradumatica No. 0 (October 2001).
http://www.fti.uab.es/tradumatica/revista/num0/sumari/sumari.htm
(Date of access).
Gow, Francie. (2003). Metrics for
Evaluating Translation Memory Software. Unpublished
master’s thesis. ***
I do not see this reference in your paper. ***
Hutchins, John. “The precursors and the
pioneers.” Machine Translation, past,
present and future.
- “ALPAC: the
(in)famous report.” MT News International Vol. 14, June 1996, pp. 9-12.
Jakobson, Roman. “On Linguistic
Aspects of Translation.” In
Baker, M., and L. Venuti,
eds. The Translation
Studies Reader. 113-118.
Nyberg, Eric, and
Teruko Mitamura. “The KANT System: Fast,
Accurate, High-Quality
Translation in Practical
Domains.” Proceedings of COLING-92.
Sanchez-Gijon, Pilar. “Cataleg de sistemes de
memories de traduccio.” Revista Tradumatica No. 0 (October 2001).
http://www.fti.uab.es/tradumatica/revista/num0/sumari/sumari.htm (Date accessed). *** I do not see this reference
in your paper. ***
SDL International. An Introduction to Computer
Aided-Translation, http://www.sdl.com/products and http://tc.eserver.org/18490.html
*** You need to separate both web site
addresses, and give each one its own line; if this is not a significant source,
but rather supplementary material, it should be cited as an end note. ***
Several authors. “CAT fight.”
1999-2005.
Proz, The Translators Workplace.
http://www.proz.com/?sp=cat/compare
(Date of access). *** I do not see this reference in your paper. ***
Fustegueres, Silvia. “Qui te
por de les memories de traduccio?” Revista Tradumatica No. 0 (October 2001).
http://www.fti.uab.es/tradumatica/revista/num0/sumari/sumari.htm
(Date accessed). *** I do not see this reference in your paper. ***
Zerfass, Angelika. “Evaluating Translation Memory Systems.” First International Workshop on Language
Resources for Translation Work and Research.
Gran Canaria, 2002. *** I do not see this reference
in your paper. ***
(Refer
to corresponding coloured numbers in text)
1. This
would have been a very interesting topic to read about. However, the bulk of
your article also falls into the trap of discussing issues that are not new.
There is no real new train of thought here, but rather a chaotic summary of
terms and definitions.
2.
All citations disappear from this point on, which happens to be the majority of
your article. It is vital to source anything that could be attributed to
someone else’s ideas and research. Pay special attention to making off-handed
remarks or making judgments (like when you indicated the “cheap” vs.
“expensive” translation memory programs).
3.
The decision to close down the Napster website was made by the Lower Courts,
and not the Supreme Court. Following are two links that may be helpful:
http://archives.cnn.com/2001/LAW/02/12/napster.decision/
4.
The wording on directions for downloading music on the Napster site is a little
awkward. The following site may be helpful: http://en.wikipedia.org/wiki/Napster
5.
Mentioning these dilemmas is just and valid, but they seem to overshadow any
solutions you may have already proposed, and may have become even more of an
obstruction than the already existing models (see KANT and TMX).
6. Endnotes / Footnotes: www.aresearchguide.com/7footnot.html.
Superscripting is a better way to indicate an endnote.
7. Referencing: you may consider purchasing the 1998 copy of MLA handbook
(www.mla.org)
if you haven’t already done so, or visit www.ccc.commnet.edu/mla/.
Your
focus is unclear from the beginning. Work on an introduction that addresses the
points you’re going to make. (John Hutchins and Harold
L. Somers, an introduction to machine translations:
http://ourworld.compuserve.com/homepages/WJHutchins/IntroMT-TOC.htm).
Your abstract seems to promise the following: old vs. newer translation
programs, government vs. personal expenditures, alternative ways to share the
wealth. Yet, in the actual article you jump from one point to another; your
bold-faced titles do not follow any logical pattern. This could potentially be
an interesting paper, but you throw too many ideas, definitions, and piece-meal
examples into the pot—which makes it look more like you should be writing a
research paper—you have too much going on to justify turning this in as just an
article.
You may want to focus on one issue you bring up (like sharing files
between differing peer-to-peer programs; why certain
governments have stopped funding research in the translation field; or
expensive vs. inexpensive translation programs and the good and not-so-good of
each), and work with that. You cannot jump into such a huge topic and not give
some background information to machine translation.
The following link may also be of interest regarding machine translation:
www.essex.ac.uk/linguistics/clmt/papers/mt/ (mostly .ps files)
MIRC and IRC are probably the grandfathers of on-line chat, which you
do not mention, but could also provide
some interesting points. Take a look at Doug Robinson: Cyborg
Translation (it deals humorously with chat and
translation from a sci-fi viewpoint).
Proxies and Firewalls may also provide an interesting look at
translation (since part of your discussion touched on file-sharing):
www.mirc.co.uk/help/proxies.html.
Are you familiar with the Janice Walker style of Internet referencing?
Whichever style you decide to use, you must just make sure you are consistent
with usage.