MACHINE TRANSLATION

This blog was created to give students and those working in translation information about the theoretical aspects of Hindi translation. Because of the growing demand for machine translation, it was considered appropriate to include that topic in this blog. Scholars and students of translation are requested to send in their views. ....... Kamble Prakash Abhimanyu (कांबले प्रकाश अभिमन्यु)

Wednesday, December 5, 2007

Related terms in homonyms

Related terms
Same pronunciation
Different pronunciation
Same spelling

Different spelling

Homonym
Homograph
Homophone
Heteronym/heterophone
Several similar linguistic concepts are related to homonymy, and some are considered sub-types of homonyms. This variety stems in part from the fact that the term 'homonym' is ambiguous, as there are a number of ways that two meanings can share the 'same name'. Related terms include:
Homography. Homographs are homonyms that share the same spelling. Homographs may be pronounced the same, in which case they are also homophones – for example, bark (the sound of a dog) and bark (the skin of a tree). Alternatively they may be pronounced differently, in which case they are also heteronyms – for example, row (argument) and row (propel with oars). ("Homograph" also has a specialised meaning in typography, where it may be used as a synonym for homoglyph.)
Homophony. Homophones are homonyms that share the same pronunciation. Homophones may be spelled the same (in which case they are also homographs) or spelled differently (in which case they are heterographs). Homographic examples include desert (to abandon) and desert (a thing deserved). Heterographic examples include to, too, two, and there, their, they’re.
Heteronymy. Heteronyms are homonyms that share the same spelling but have different pronunciations. That is, they are homographs which are not homophones. Such words include desert (to abandon) and desert (arid region). Heteronyms are also sometimes called heterophones. ("Heteronym" also has a specialized meaning in poetry; see Heteronym (literature).)
Polysemy. Polysemes are words with the same spelling and distinct but related meanings. The distinction between polysemy and homonymy is often subtle and subjective, and not all sources consider polysemous words to be homonyms. Words such as "mouth", meaning either the orifice on one's face, or the opening of a cave or river, are polysemous and may or may not be considered homonyms.
Capitonymy. Capitonyms are homonyms that share the same spelling but have different meanings when capitalized (and may or may not have different pronunciations). Such words include polish (to make shiny) and Polish (from Poland).
In derivation, homograph means "same writing", homophone means "same sound", heteronym (somewhat confusingly) means "different name", and heterophone means "different sound".
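The relationships among these terms can be sketched in a few lines of Python. This is only an illustration: the function name `classify_pair` and the rough pronunciation strings are assumptions made here, not taken from any dictionary or library, and polysemy and capitonymy are left out.

```python
def classify_pair(spelling_a, sound_a, spelling_b, sound_b):
    """Return the terms from the list above that apply to two words
    which are assumed to differ in meaning."""
    same_spelling = spelling_a == spelling_b
    same_sound = sound_a == sound_b
    labels = []
    if same_spelling:
        labels.append("homograph")
    if same_sound:
        labels.append("homophone")
    if same_spelling and not same_sound:
        labels.append("heteronym")
    if same_sound and not same_spelling:
        labels.append("heterograph")
    return labels

# Pronunciations are rough, hypothetical respellings for illustration only.
print(classify_pair("bark", "bark", "bark", "bark"))  # ['homograph', 'homophone']
print(classify_pair("row", "rou", "row", "roh"))      # ['homograph', 'heteronym']
print(classify_pair("to", "too", "two", "too"))       # ['homophone', 'heterograph']
```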

Terminological confusion
There is considerable confusion and contradiction in published sources about the distinction between homonyms, homographs, homophones and heteronyms. Significant variant interpretations include:
Chambers 21st Century Dictionary [1] defines a homonym as "a word with the same sound and spelling as another, but with a different meaning" (italics added). Merriam-Webster Online Dictionary [2] also says that a homonym is "one of two or more words spelled and pronounced alike but different in meaning" (italics added), but appears to also give homonym as a synonym for either homophone or homograph.
Cambridge Dictionary of American English [3] defines homonym as "a word that is spelled the same as another word but that does not have the same meaning" (the same as what above is called a homograph).
The entry for homonym in The Encyclopaedia Britannica (14th Edition) states that homographs are "words spelt but not sounded alike", and homophones are "words alike only in sound [i.e. not alike in spelling]" (italics and comment added).
Homographs are defined in the Oxford English Dictionary as words that are spelled and pronounced the same as another but with a different meaning, thus excluding pairs such as desert (abandon) and desert (arid region).
The Encarta dictionary [4] defines heteronym as "each of two or more words that are spelled the same, but differ in meaning and often in pronunciation" (italics added).
The "Fun with Words" website [5] says that a heteronym is "One of two (or more) words that have the same spelling, but different meaning, and sometimes different pronunciation too" (in other words, what is called a homograph above).

Further examples
A further example of a homonym which is both a homophone and a homograph is fluke. Fluke can mean:
A fish, and a flatworm.
The end parts of an anchor.
The fins on a whale's tail.
A stroke of luck.
All four are separate lexemes with separate etymologies, but share the one form, fluke [6].
Similarly, a river bank, a savings bank, a bank of switches, and a bank shot in pool share only a common spelling and pronunciation, but not meaning.
The words bow and bough are interesting because there are two meanings associated with a single pronunciation and spelling (the weapon and the knot); there are two meanings with two different pronunciations (the knot and the act of bending at the waist), and there are two distinct meanings sharing the same sound but different spellings: (bow, the act of bending at the waist, and bough, the branch of a tree). In addition, it has several related but distinct meanings - a bent line is sometimes called a 'bowed' line, reflecting its similarity to the weapon. Thus, even according to the most restrictive definitions, various pairs of sounds and meanings of bow and bough are homonyms, homographs, homophones, heterophones, heterographs, and are polysemous.
bow - to bend forward at the waist in respect (e.g. "bow down")
bow - the front of a ship (e.g. "bow and stern")
bow - the weapon which fires arrows (e.g. "bow and arrow")
bow - a kind of tied ribbon (e.g. a bow on a present, a bowtie)
bow - to bend outward at the sides (e.g. a "bow-legged" cowboy)
bough - a branch on a tree (e.g. "when the bough breaks...")

Homonymy in historical linguistics
Homonymy can lead to communicative conflicts and thus trigger lexical (onomasiological) change[1]. This is known as homonymic conflict.

External links

Information on teaching homophones including free ebook and teaching tips
Alan Cooper's Homonym List
Quiz to learn homonyms
Quiz Using Picture Clues
Homophone Translator
Etymologies

References
^ On this phenomenon see Williams, Edna R. (1944), The Conflict of Homonyms in English, [Yale Studies in English 100], New Haven: Yale University Press, Grzega, Joachim (2004), Bezeichnungswandel: Wie, Warum, Wozu? Ein Beitrag zur englischen und allgemeinen Onomasiologie, Heidelberg: Winter, p. 216ff., and Grzega, Joachim (2001d), “Über Homonymenkonflikt als Auslöser von Wortuntergang”, in: Grzega, Joachim (2001c), Sprachwissenschaft ohne Fachchinesisch: 7 aktuelle Studien für alle Sprachinteressierten, Aachen: Shaker, p. 81-98.

How to write in Hindi in your blog

How do I use the transliteration feature?
What is Transliteration?
Blogger offers an automatic transliteration option for converting Roman characters to the Devanāgarī characters used in Hindi. This lets you type Hindi words phonetically in English script and still have them appear in their correct alphabet. Note that this is not the same as translation: it is the sound of the words that is converted from one alphabet to the other, not their meaning. For example, typing "hamesha" transliterates into Hindi as: हमेशा
Enabling the Transliteration Feature
To enable this feature, go to the Settings Basics page and select "Yes" for the transliteration option. This setting will affect all blogs on your account, similar to the Compose Mode setting.
Once you've done that, go to your post editor and you'll see a new button there. A yellow label points it out for you the first time you use it. Simply click the "X" in the top right corner to make it disappear.
Typing with Transliteration
This button toggles the transliteration feature on and off. (You can also use Ctrl-G as a shortcut.) When it is on, it affects the title and labels, as well as the body of your post. The letters of a word will appear as you type them until you reach the end of the word. As soon as you type a space or a punctuation mark, the letters will be converted to Devanāgarī characters.
If you prefer to do the transliteration all at once, rather than as you go, you can type your text with the transliteration button turned off. Then select all your text and click the button. Everything selected will be transliterated at once, and you can go back and edit it as desired. (Note: This only works in the body of the post, and not in the title or labels.)
The transliteration will attempt to match the sounds of the letters as accurately as possible between the two alphabets. If you find that it is incorrect, however, you can fix it.
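The word-by-word behaviour described above can be illustrated with a toy sketch. It is not Blogger's algorithm, which runs on Google's servers and is not documented here. The syllable table, `transliterate_word` and `transliterate_text` are hypothetical names, and the table covers only the example word "hamesha".

```python
import re

# Hypothetical syllable table; it covers only the example word "hamesha".
SYLLABLES = {"ha": "ह", "me": "मे", "sha": "शा"}
MAX_KEY = max(len(key) for key in SYLLABLES)

def transliterate_word(word):
    """Greedy longest-match conversion of one Roman-script word."""
    out, i = [], 0
    while i < len(word):
        for size in range(min(MAX_KEY, len(word) - i), 0, -1):
            chunk = word[i:i + size].lower()
            if chunk in SYLLABLES:
                out.append(SYLLABLES[chunk])
                i += size
                break
        else:
            out.append(word[i])  # no table entry: keep the character as typed
            i += 1
    return "".join(out)

def transliterate_text(text):
    """Convert each run of Roman letters; spaces and punctuation end a word."""
    return re.sub(r"[A-Za-z]+", lambda m: transliterate_word(m.group()), text)

print(transliterate_text("hamesha, hamesha!"))
```

A fixed table like this cannot choose between alternative spellings, which is why a real service also needs the correction menu described in the next section.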
Correcting and Editing Words
When you find a word you want to change, just click on it once, using the left mouse button. This displays a short menu of alternate spellings, as well as an option to switch back to the original Roman characters you typed, or to edit the word further.
If you choose the "Edit..." option, you'll see the word in an Edit mode that provides on-the-fly suggestions for the next letter in the sequence. Click on the letter you want to enter next; it will be added to the word, and suggestions for the following letter will come up. You can also continue typing from your keyboard, if you prefer, and the characters will be entered according to this chart. Type a space or press the Enter key to end the word and go back to normal typing mode.
The suggestions provided in the "Edit..." option are limited to letters which could reasonably follow the ones already typed. Each button shows some English text in gray, which indicates the part of the last syllable that you have already typed. The text in bold indicates what you can type to get the Hindi letter displayed on that button. Alternatively, you can just click the button and it will add the correct letter for you. If a button is green, that means that the letter is phonetically similar to the last typed syllable, and clicking on the button will replace it.
On-Screen Keyboard
If you want complete control over the choice of letters, click the keyboard icon to the right of the word you're editing. A full on-screen keyboard comes up, and you can simply click the letters you want to insert them into your text.
Matras (accent marks) are shown with dotted circles to indicate that they can be applied to different letters. To use them, first click the letter you want to use, then click the matra you want to apply to it.
Type a space, punctuation mark or the Enter key to end the word and go back to normal typing mode, or just click the "X" icon on the keyboard to remove it.
Saving Corrected Transliterations
Whenever you type a word and the Hindi transliteration is not the one you wanted, you can correct it using the editing features described above. When you do this, the new transliteration is remembered for you. If you type the same word again in the future, it will then be transliterated correctly based on your saved preference. These corrections are stored to help us improve our service.
Installing and Viewing Hindi Fonts
Blogger uses Unicode to encode the Hindi characters in your post. Unicode is a system of representing text and symbols and is supported by all modern browsers and operating systems.
If you use Internet Explorer 6+ in Windows Vista/XP/2000, you should have no problems in viewing and editing Hindi text correctly. Mozilla Firefox requires support for complex text layout, otherwise it might display the Hindi text incorrectly. The support for complex text layout is usually turned off by default, but this Wikipedia article gives a detailed explanation on how to turn it on in various operating systems.
Notes:
The transliteration feature is only supported in Internet Explorer versions 6.0 and higher on Windows, and Firefox 1.5 and higher on Windows and Linux. It is not supported on Macs.
The transliteration button is only available in Compose Mode. All the other editing features of Compose Mode will continue to work normally with transliterated text, and you can also copy and paste text to work with into the editor.
For a complete mapping of which Roman characters will be converted into which Devanāgarī characters, please see this article. Note that this is a static mapping that only applies in Edit mode. When you are simply typing as usual, a more complex algorithm is used to determine the correct characters to display based on the sound of each overall word.
If you see a message saying that the transliteration service is unavailable, check your internet connection. This feature requires a live internet connection, as all the transliteration is done on Google's servers and sent back to your browser while you work on your post.
For further help with transliteration, please see the Blogger Help Group.

Sunday, December 2, 2007

machine translation

Machine translation, sometimes referred to by the acronym MT, is a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another. At its basic level, MT performs simple substitution of words in one natural language for words in another. Using corpus techniques, more complex translations may be attempted, allowing for better handling of differences in linguistic typology, phrase recognition, and translation of idioms, as well as the isolation of anomalies.
Current machine translation software often allows for customisation by domain or profession (such as weather reports) — improving output by limiting the scope of allowable substitutions. This technique is particularly effective in domains where formal or formulaic language is used. It follows then that machine translation of government and legal documents more readily produces usable output than conversation or less standardised text.
Improved output quality can also be achieved by human intervention: for example, some systems are able to translate more accurately if the user has unambiguously identified which words in the text are names. With the assistance of these techniques, MT has proven useful as a tool to assist human translators, and in some cases can even produce output that can be used "as is". However, current systems are unable to produce output of the same quality as a human translator, particularly where the text to be translated uses casual language.

Introduction
The translation process may be stated as:
Decoding the meaning of the source text; and
Re-encoding this meaning in the target language.
Behind this ostensibly simple procedure lies a complex cognitive operation. To decode the meaning of the source text in its entirety, the translator must interpret and analyse all the features of the text, a process that requires in-depth knowledge of the grammar, semantics, syntax, idioms, etc., of the source language, as well as the culture of its speakers. The translator needs the same in-depth knowledge to re-encode the meaning in the target language.
Therein lies the challenge in machine translation: how to program a computer that will "understand" a text as a person does, and that will "create" a new text in the target language that "sounds" as if it has been written by a person.
This problem may be approached in a number of ways.

Approaches

Pyramid showing comparative depths of intermediary representation, interlingual machine translation at the peak, followed by transfer-based, then direct translation.
Machine translation can use a method based on linguistic rules, in which words are translated according to a linguistic analysis: the most suitable words of the target language replace the ones in the source language.
It is often argued that the success of machine translation requires the problem of natural language understanding to be solved first.
Generally, rule-based methods parse a text, usually creating an intermediary, symbolic representation, from which the text in the target language is generated. According to the nature of the intermediary representation, an approach is described as interlingual machine translation or transfer-based machine translation. These methods require extensive lexicons with morphological, syntactic, and semantic information, and large sets of rules.
Given enough data, machine translation programs often work well enough for a native speaker of one language to get the approximate meaning of what is written by the other native speaker. The difficulty is getting enough data of the right kind to support the particular method. For example, the large multilingual corpus of data needed for statistical methods to work is not necessary for the grammar-based methods. But then, the grammar methods need a skilled linguist to carefully design the grammar that they use.
To translate between closely related languages, a technique referred to as shallow-transfer machine translation may be used.

Dictionary-based
Main article: Dictionary-based machine translation
Machine translation can use a method based on dictionary entries, which means that the words will be translated as a dictionary does — word by word, usually without much correlation of meaning between them.
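A minimal sketch of word-by-word substitution, assuming a tiny hypothetical English-to-Spanish glossary (`GLOSSARY` and `translate_word_by_word` are names invented for this illustration):

```python
# Hypothetical four-entry glossary; a real system would use a full lexicon.
GLOSSARY = {"the": "el", "cat": "gato", "drinks": "bebe", "milk": "leche"}

def translate_word_by_word(sentence, glossary=GLOSSARY):
    """Replace each word independently; unknown words pass through unchanged."""
    return " ".join(glossary.get(word.lower(), word) for word in sentence.split())

print(translate_word_by_word("The cat drinks milk"))  # el gato bebe leche
print(translate_word_by_word("the milk"))             # el leche
```

The second call shows the limitation described above: each word is looked up in isolation, so the article does not agree with the noun (correct Spanish is "la leche").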

Statistical
Main article: Statistical machine translation
Statistical machine translation tries to generate translations using statistical methods based on bilingual text corpora, such as the Canadian Hansard corpus (the English-French record of the Canadian parliament) and EUROPARL (the record of the European Parliament). Where such corpora are available, impressive results can be achieved translating texts of a similar kind, but such corpora are still very rare. The first statistical machine translation software was CANDIDE from IBM. Google used SYSTRAN for several years, but switched to a statistical translation method in October 2007. Recently, Google improved its translation capabilities by feeding approximately 200 billion words from United Nations materials into the training of its system, and the accuracy of the translation has improved. [1]
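To give a feel for how word correspondences can be learned from a bilingual corpus, here is a simplified sketch of the expectation-maximisation procedure commonly known as IBM Model 1 (without the usual NULL word). It is not claimed to be what CANDIDE or Google used. The three-sentence corpus and the name `train_word_table` are assumptions for illustration.

```python
from collections import defaultdict

# Tiny hypothetical parallel corpus (English, German).
CORPUS = [("the house", "das Haus"), ("the book", "das Buch"), ("a book", "ein Buch")]

def train_word_table(corpus, iterations=10):
    """Estimate t[(target_word, source_word)] by expectation-maximisation."""
    pairs = [(src.split(), tgt.split()) for src, tgt in corpus]
    t = defaultdict(lambda: 1.0)  # uniform starting point
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for src_words, tgt_words in pairs:
            for f in tgt_words:
                # Share out one unit of credit for f among the source words.
                norm = sum(t[(f, e)] for e in src_words)
                for e in src_words:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share
        # Re-estimate the table from the expected counts.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

table = train_word_table(CORPUS)
best = max(["das", "Buch", "ein"], key=lambda f: table[(f, "book")])
print(best)  # Buch
```

Even though "book" co-occurs with "das" and "ein", the repeated re-estimation shifts probability toward "Buch", the only word present in both sentences that contain "book".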

Example-based
Main article: Example-based machine translation
The example-based machine translation (EBMT) approach is often characterised by its use of a bilingual corpus as its main knowledge base at run-time. It is essentially translation by analogy and can be viewed as an implementation of the case-based reasoning approach of machine learning.
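The idea of translation by analogy can be sketched with Python's standard `difflib` module. The example base and the function name `translate_by_example` are hypothetical, and the sketch only retrieves the closest whole sentence, whereas real EBMT systems also recombine fragments of several examples.

```python
import difflib

# Hypothetical example base of previously translated sentences (English, Spanish).
EXAMPLES = {
    "good morning": "buenos días",
    "good night": "buenas noches",
    "thank you very much": "muchas gracias",
}

def translate_by_example(sentence, examples=EXAMPLES):
    """Return the stored translation of the most similar example, or None."""
    match = difflib.get_close_matches(sentence.lower(), list(examples), n=1, cutoff=0.6)
    return examples[match[0]] if match else None

print(translate_by_example("Good morning!"))  # buenos días
```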

Interlingual
Main article: Interlingual machine translation
Interlingual machine translation is one instance of rule-based machine-translation approaches. In this approach, the source text, i.e. the text to be translated, is transformed into an interlingua, i.e. a representation that is independent of both the source and the target language. The target-language text is then generated from the interlingua.

Major issues

Disambiguation
Main article: Word sense disambiguation
Word sense disambiguation concerns finding a suitable translation when a word can have more than one meaning. The problem was first raised in the 1950s by Yehoshua Bar-Hillel [2]. He pointed out that without a "universal encyclopedia", a machine would never be able to distinguish between the two meanings of a word [3]. Today there are numerous approaches designed to overcome this problem. They can be approximately divided into "shallow" approaches and "deep" approaches.
Shallow approaches assume no knowledge of the text. They simply apply statistical methods to the words surrounding the ambiguous word. Deep approaches presume a comprehensive knowledge of the word. So far, shallow approaches have been more successful.
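As a rough illustration of a shallow approach, the sketch below picks the sense whose cue words overlap most with the surrounding words. Simple overlap counting stands in for the statistical methods mentioned above. The cue lists and the name `guess_sense` are hypothetical, and a real system would learn such cues from a corpus.

```python
# Hypothetical cue words per sense of the ambiguous word "bank".
SENSE_CUES = {
    "river bank": {"river", "water", "fishing", "shore"},
    "savings bank": {"money", "account", "loan", "deposit"},
}

def guess_sense(sentence, cues=SENSE_CUES):
    """Pick the sense with the largest overlap between cues and context.
    Ties fall back to the first sense listed."""
    context = set(sentence.lower().split())
    return max(cues, key=lambda sense: len(cues[sense] & context))

print(guess_sense("She opened an account at the bank to deposit money"))  # savings bank
```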

Named entities
This issue is related to named entity recognition in information extraction.

History
Main article: History of machine translation
The history of machine translation begins in the 1950s, after World War II. The Georgetown experiment (1954) involved fully-automatic translation of over sixty Russian sentences into English. The experiment was a great success and ushered in an era of substantial funding for machine-translation research. The authors claimed that within three to five years, machine translation would be a solved problem.
Real progress was much slower, however, and after the ALPAC report (1966), which found that the ten-year-long research had failed to fulfill expectations, funding was greatly reduced. Beginning in the late 1980s, as computational power increased and became less expensive, more interest was shown in statistical models for machine translation.

Applications
There are now many software programs for translating natural language, several of them online, such as the SYSTRAN system, which has powered both Google Translate and AltaVista's Babel Fish. Although no system provides the holy grail of "fully automatic high quality machine translation" (FAHQMT), many systems produce reasonable output.
Despite their inherent limitations, MT programs are used around the world. Probably the largest institutional user is the European Commission, which employs a highly-customised version of the commercial SYSTRAN MT system to automatically translate large volumes of preliminary document drafts for internal use.
Global Translations [4], a translation agency in the USA, has been developing specialized dictionaries for machine translation of tenders for telecommunications companies. Due to the highly technical nature of these documents, which are often very large in volume, machine translation quality improves dramatically in proportion to the text corpus that is imported into the dictionaries.
A Danish translation agency, Lingtech A/S [5], has been translating patent applications from English to Danish since 1993, using a proprietary rule-based machine-translation system, PaTrans [2], working together with the commercial translation-memory-based Trados CAT tool.
The Spanish daily newspaper Periódico de Catalunya is translated from Spanish into Catalan with an MT system [6].
Toggletext uses a transfer-based system (known as Kataku) to translate between English and Indonesian.
Google has claimed that promising results were obtained using a proprietary statistical machine translation engine [7]. The statistical translation engine used in the Google language tools for Arabic <-> English and Chinese <-> English achieved an overall BLEU-4 score of 0.4281, ahead of the runner-up IBM's score of 0.3954 (Summer 2006), in tests conducted by the National Institute of Standards and Technology. [8] [9] [10] Uwe Muegge has implemented a demo website [11] that uses a controlled language in combination with the Google tool to produce fully automatic, high-quality machine translations of his English, German, and French web sites.
With the recent focus on terrorism, military sources in the United States have been investing significant amounts of money in natural language engineering. In-Q-Tel [12] (a venture capital fund, largely funded by the US Intelligence Community, to stimulate new technologies through private sector entrepreneurs) has nurtured companies like Language Weaver. Currently the military community is interested in the translation and processing of languages like Arabic, Pashto, and Dari. The Information Processing Technology Office in DARPA hosts programs like TIDES and Babylon Translator. The US Air Force has awarded a $1 million contract to develop a language translation technology. [13]

Evaluation
Main article: Evaluation of machine translation
There are various means for evaluating the performance of machine-translation systems. The oldest is the use of human judges to assess a translation's quality. More recent, automated means of evaluation include BLEU, NIST and METEOR.
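Automated metrics such as BLEU are built on counting word sequences (n-grams) that a machine translation shares with a human reference. The sketch below computes only the clipped n-gram precision for a single reference. Full BLEU additionally combines the precisions for n = 1 to 4 and applies a brevity penalty. The function names are assumptions for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_precision(candidate, reference, n=1):
    """Fraction of candidate n-grams found in the reference, where each
    n-gram is credited at most as often as it occurs in the reference."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    if not cand:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / sum(cand.values())

reference = "the cat is on the mat"
print(clipped_precision("the cat sat on the mat", reference, n=1))  # 5 of 6 unigrams match
print(clipped_precision("the cat sat on the mat", reference, n=2))  # 3 of 5 bigrams match
```

Clipping is what stops a degenerate output such as "the the the the the the the" from scoring well: only two of its seven words are credited, because "the" appears twice in the reference.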
Relying exclusively on machine translation ignores that communication in human language is context-embedded, and that it takes a human to adequately comprehend the context of the original text. Even purely human-generated translations are prone to error. Therefore, to ensure that a machine-generated translation will be of publishable quality and useful to a human, it must be reviewed and edited by a human.
It has, however, been asserted that in certain applications, e.g. product descriptions written in a controlled language, a dictionary-based machine translation system has, in a production environment, produced perfect translation results that require no human intervention. [14]

See also
Comparison of Machine translation applications
Artificial Intelligence
Computational linguistics
Computer-assisted translation
Controlled natural language
History of machine translation
Human Language Technology
Universal translator
Wiktionary:Translations
List of research laboratories for machine translation
Russian translation

Notes
^ http://blog.outer-court.com/archive/2005-05-22-n83.html
^ Milestones in machine translation - No.6: Bar-Hillel and the nonfeasibility of FAHQT by John Hutchins
^ Bar-Hillel (1960), "Automatic Translation of Languages". Available online at http://www.mt-archive.info/Bar-Hillel-1960.pdf
^ [1]
^ Lingtech.com
^ Informe sobre el sistema de traducción automática del Periódico de Catalunya (in Spanish)
^ Google Blog: The machines do the translating (by Franz Och)
^ Geer, David, "Statistical Translation Gains Respect", pp. 18 - 21, IEEE Computer, October 2005
^ Ratcliff, Evan "Me Translate Pretty One Day", Wired December 2006
^ "NIST 2006 Machine Translation Evaluation Official Results", November 1, 2006
^ This demo website uses a controlled language in combination with the Google engine
^ In-Q-Tel
^ GCN — Air force wants to build a universal translator
^ Muegge (2006), "Fully Automatic High Quality Machine Translation of Restricted Text: A Case Study," in Translating and the computer 28. Proceedings of the twenty-eighth international conference on translating and the computer, 16-17 November 2006, London, London: Aslib. ISBN 978-0-85142-483-5.

References
Hutchins, W. John; and Harold L. Somers (1992). An Introduction to Machine Translation. London: Academic Press. ISBN 0-12-362830-X.

External links
International Association for Machine Translation (IAMT)
Machine Translation, an introductory guide to MT by D.J.Arnold et al. (1994)
Machine Translation Archive by John Hutchins. An electronic repository (and bibliography) of articles, books and papers in the field of machine translation and computer-based translation technology
Machine translation (computer-based translation) — Publications by John Hutchins (includes PDFs of several books on machine translation)
NIST Machine Translation Tests - index
Machine Translation and Minority Languages
John Hutchins 1999
Retrieved from "http://en.wikipedia.org/wiki/Machine_translation"