Google Neural Machine Translation

Short description: System developed by Google to increase fluency and accuracy in Google Translate

Google Neural Machine Translation (GNMT) is a neural machine translation (NMT) system developed by Google and introduced in November 2016 that uses an artificial neural network to increase fluency and accuracy in Google Translate.[1][2][3][4] The neural network consists of two main blocks, an encoder and a decoder, each an LSTM stack of eight 1024-wide layers, connected by a simple one-layer, 1024-wide feedforward attention mechanism.[4][5] The total number of parameters has been variously described as over 160 million,[6] approximately 210 million,[7] 278 million[8] or 380 million.[9]
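As a rough sanity check on the layer sizes above, the following Python sketch counts the weights in a plain stack of LSTM layers using the standard per-layer formula (four gates, each with input weights, recurrent weights and a bias). It is a simplification under stated assumptions: every layer is taken to be unidirectional with a 1024-dimensional input, and the embeddings, the attention network and the output layer are left out, so the result is not expected to reproduce any of the totals cited above. The function names are illustrative.

```python
def lstm_layer_params(input_size: int, hidden_size: int) -> int:
    """Weights in one standard LSTM layer (no peepholes, no projection)."""
    # Four gates, each with input weights, recurrent weights and a bias vector.
    per_gate = hidden_size * (input_size + hidden_size) + hidden_size
    return 4 * per_gate


def lstm_stack_params(num_layers: int, width: int) -> int:
    """Weights in a stack of identical layers whose input width equals `width`."""
    return num_layers * lstm_layer_params(width, width)


# Assumption: 8 encoder + 8 decoder layers, all unidirectional and 1024 wide.
encoder = lstm_stack_params(8, 1024)
decoder = lstm_stack_params(8, 1024)
print(f"LSTM stacks only: {encoder + decoder:,} parameters")
```

Under these assumptions the two stacks alone hold about 134 million weights; whatever else a given source counts (embeddings, attention, output layer) comes on top of that.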

GNMT improves translation quality by applying an example-based machine translation (EBMT) method in which the system learns from millions of examples of language translation.[2] GNMT's proposed architecture of system learning was first tested on over a hundred languages supported by Google Translate.[2] With its large end-to-end framework, the system learns over time to create better, more natural translations.[1] GNMT attempts to translate whole sentences at a time, rather than piece by piece.[1] The GNMT network can undertake interlingual machine translation by encoding the semantics of the sentence, rather than by memorizing phrase-to-phrase translations.[2][10]

History

The Google Brain project was established in 2011 in the "secretive Google X research lab"[11] by Google Fellow Jeff Dean, Google Researcher Greg Corrado, and Stanford University Computer Science professor Andrew Ng.[12][13][14] Ng's work has led to some of the biggest breakthroughs at Google and Stanford.[11]

In November 2016, Google introduced the Google Neural Machine Translation system (GNMT). Google Translate then began using neural machine translation (NMT) in preference to its previous statistical machine translation (SMT) methods,[1][15][16][17] which had been in use since October 2007, when it switched to its proprietary, in-house SMT technology.[18][19]

Training GNMT was a major effort at the time. By a 2021 OpenAI estimate, it took on the order of 100 PFLOP/s-days (up to 10^22 FLOPs) of compute, which was 1.5 orders of magnitude more than the Seq2seq model of 2014[20] (but about half as much as GPT-J-6B in 2021[21]).
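The unit conversion behind these figures is simple arithmetic. The sketch below (helper names are illustrative, not taken from the cited sources) converts petaflop/s-days to total floating-point operations and turns a gap expressed in orders of magnitude into a plain ratio.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def pflops_days_to_flops(pflops_days: int) -> int:
    """Total operations for a given number of petaflop/s-days."""
    # 1 PFLOP/s = 10**15 operations per second, sustained for one day.
    return pflops_days * 10**15 * SECONDS_PER_DAY


def orders_of_magnitude_to_ratio(orders: float) -> float:
    """Convert a gap expressed in orders of magnitude to a multiplicative factor."""
    return 10.0 ** orders


gnmt_flops = pflops_days_to_flops(100)
print(f"100 PFLOP/s-days = {gnmt_flops:.2e} FLOPs")
print(f"1.5 orders of magnitude = about {orders_of_magnitude_to_ratio(1.5):.0f}x")
```

100 PFLOP/s-days comes to 8.64 × 10^21 operations, just under 10^22, and 1.5 orders of magnitude corresponds to a factor of roughly 32.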

Google Translate's NMT system uses a large artificial neural network capable of deep learning.[1][2][3] By learning from millions of examples, GNMT improves the quality of translation,[2] using broader context to deduce the most relevant translation. The result is then rearranged and adapted to approach grammatical human language.[1] GNMT did not create its own universal interlingua but rather aimed at finding the commonality between many languages using insights from psychology and linguistics.[22]

The new translation engine was first enabled in November 2016 for eight languages: to and from English and French, German, Spanish, Portuguese, Chinese, Japanese, Korean and Turkish.[23] In March 2017, three additional languages were enabled: Russian, Hindi and Vietnamese, with support for Thai added later.[24][25] Support for Hebrew and Arabic was also added in the same month, with help from the Google Translate Community.[26] In mid-April 2017, Google Netherlands announced support for Dutch and other European languages related to English.[27] At the end of April 2017, further support was added for nine Indian languages: Hindi, Bengali, Marathi, Gujarati, Punjabi, Tamil, Telugu, Malayalam and Kannada.[28]

Evaluation

The GNMT system is said to improve on the former Google Translate in that it can handle "zero-shot translation", that is, it translates directly from one language into another (for example, Japanese to Korean).[2] Previously, Google Translate first translated the source language into English and then translated the English into the target language, rather than translating directly between the two.[10]
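The difference between the two approaches can be pictured as a routing decision. The toy Python sketch below is not Google's implementation; the function name, the language codes and the `direct_pairs` set are hypothetical. It only shows which languages a sentence passes through when a system must pivot through English and when it can translate a pair directly.

```python
def translation_route(source, target, direct_pairs=frozenset(), pivot="en"):
    """Return the sequence of languages a sentence passes through."""
    if source == target:
        return [source]
    # Pairs involving the pivot language never need an intermediate step.
    if pivot in (source, target) or (source, target) in direct_pairs:
        return [source, target]
    # Otherwise translate into the pivot first, then into the target.
    return [source, pivot, target]


# Pivot-only system: Japanese to Korean goes through English.
print(translation_route("ja", "ko"))
# System that can translate the pair directly.
print(translation_route("ja", "ko", direct_pairs={("ja", "ko")}))
```

A pivot route runs two translation steps, so an error in the first step can carry into the second; a direct route has only one step.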

A July 2019 study in Annals of Internal Medicine found that "Google Translate is a viable, accurate tool for translating non–English-language trials". Only one disagreement between reviewers reading machine-translated trials was due to a translation error. Since many medical studies are excluded from systematic reviews because the reviewers do not understand the language, GNMT has the potential to reduce bias and improve accuracy in such reviews.[29]

Languages supported by GNMT

As of December 2021, all of the languages of Google Translate support GNMT, with Latin being the most recent addition.

  1. Afrikaans
  2. Albanian
  3. Amharic
  4. Arabic
  5. Armenian
  6. Azerbaijani
  7. Basque
  8. Belarusian
  9. Bengali
  10. Bosnian
  11. Bulgarian
  12. Burmese
  13. Catalan
  14. Cebuano
  15. Chewa
  16. Chinese (Simplified)
  17. Chinese (Traditional)
  18. Corsican
  19. Croatian
  20. Czech
  21. Danish
  22. Dutch
  23. English
  24. Esperanto
  25. Estonian
  26. Filipino (Tagalog)
  27. Finnish
  28. French
  29. Galician
  30. Georgian
  31. German
  32. Greek
  33. Gujarati
  34. Haitian Creole
  35. Hausa
  36. Hawaiian
  37. Hebrew
  38. Hindi
  39. Hmong
  40. Hungarian
  41. Icelandic
  42. Igbo
  43. Indonesian
  44. Irish
  45. Italian
  46. Japanese
  47. Javanese
  48. Kannada
  49. Kazakh
  50. Khmer
  51. Kinyarwanda
  52. Korean
  53. Kurdish (Kurmanji)
  54. Kyrgyz
  55. Lao
  56. Latin
  57. Latvian
  58. Lithuanian
  59. Luxembourgish
  60. Macedonian
  61. Malagasy
  62. Malay
  63. Malayalam
  64. Maltese
  65. Maori
  66. Marathi
  67. Mongolian
  68. Nepali
  69. Norwegian (Bokmål)
  70. Odia
  71. Pashto
  72. Persian
  73. Polish
  74. Portuguese
  75. Punjabi (Gurmukhi)
  76. Romanian
  77. Russian
  78. Samoan
  79. Scottish Gaelic
  80. Serbian
  81. Shona
  82. Sindhi
  83. Sinhala
  84. Slovak
  85. Slovenian
  86. Somali
  87. Sotho
  88. Spanish
  89. Sundanese
  90. Swahili
  91. Swedish
  92. Tajik
  93. Tamil
  94. Tatar
  95. Telugu
  96. Thai
  97. Turkish
  98. Turkmen
  99. Ukrainian
  100. Urdu
  101. Uyghur
  102. Uzbek
  103. Vietnamese
  104. Welsh
  105. West Frisian
  106. Xhosa
  107. Yiddish
  108. Yoruba
  109. Zulu


References

  1. Barak Turovsky (November 15, 2016), "Found in translation: More accurate, fluent sentences in Google Translate", Google Blog, https://blog.google/products/translate/found-translation-more-accurate-fluent-sentences-google-translate/, retrieved January 11, 2017
  2. Mike Schuster; Melvin Johnson; Nikhil Thorat (November 22, 2016), "Zero-Shot Translation with Google's Multilingual Neural Machine Translation System", Google Research Blog, https://research.googleblog.com/2016/11/zero-shot-translation-with-googles.html, retrieved January 11, 2017
  3. Gil Fewster (January 5, 2017), "The mind-blowing AI announcement from Google that you probably missed", freeCodeCamp, https://medium.freecodecamp.com/the-mind-blowing-ai-announcement-from-google-that-you-probably-missed-2ffd31334805#.msj1mdvbh, retrieved January 11, 2017
  4. Wu, Yonghui; Schuster, Mike; Chen, Zhifeng; Le, Quoc V.; Norouzi, Mohammad (2016). Google's neural machine translation system: Bridging the gap between human and machine translation. Bibcode: 2016arXiv160908144W.
  5. "Peeking into the neural network architecture used for Google's Neural Machine Translation". https://smerity.com/articles/2016/google_nmt_arch.html. 
  6. Qin, Minghai; Zhang, Tianyun; Sun, Fei; Chen, Yen-Kuang; Fardad, Makan; Wang, Yanzhi; Xie, Yuan (2021). "Compact Multi-level Sparse Neural Networks with Input Independent Dynamic Rerouting". arXiv:2112.10930 [cs.NE].
  7. "Compression of Google Neural Machine Translation Model – NLP Architect by Intel® AI Lab 0.5.5 documentation". https://intellabs.github.io/nlp-architect/sparse_gnmt.html. 
  8. Langroudi, Hamed F.; Karia, Vedant; Pandit, Tej; Kudithipudi, Dhireesha (2021). "TENT: Efficient Quantization of Neural Networks on the tiny Edge with Tapered FixEd PoiNT". arXiv:2104.02233 [cs.LG].
  9. "Data Augmentation | How to use Deep Learning when you have Limited Data". May 19, 2021. https://nanonets.com/blog/data-augmentation-how-to-use-deep-learning-when-you-have-limited-data-part-2. 
  10. Boitet, Christian; Blanchon, Hervé; Seligman, Mark; Bellynck, Valérie (2010). "MT on and for the Web". http://www-clips.imag.fr/geta/herve.blanchon/Pdfs/NLP-KE-10.pdf. Retrieved December 1, 2016.
  11. Robert D. Hof (August 14, 2014). "A Chinese Internet Giant Starts to Dream: Baidu is a fixture of online life in China, but it wants to become a global power. Can one of the world's leading artificial intelligence researchers help it challenge Silicon Valley's biggest companies?". Technology Review. https://www.technologyreview.com/s/530016/a-chinese-internet-giant-starts-to-dream/. Retrieved January 11, 2017.
  12. "Using large-scale brain simulations for machine learning and A.I.". June 26, 2012. http://googleblog.blogspot.com/2012/06/using-large-scale-brain-simulations-for.html. Retrieved January 26, 2015. 
  13. "Google's Large Scale Deep Neural Networks Project". https://www.youtube.com/watch?v=KELYHjq9Gbs. Retrieved October 25, 2015. 
  14. Markoff, John (June 25, 2012). "How Many Computers to Identify a Cat? 16,000". New York Times. https://www.nytimes.com/2012/06/26/technology/in-a-big-network-of-computers-evidence-of-machine-learning.html?pagewanted=all. Retrieved February 11, 2014. 
  15. Katyanna Quach (November 17, 2016), Google's neural network learns to translate languages it hasn't been trained on: First time machine translation has used true transfer learning, https://www.theregister.co.uk/2016/11/17/googles_neural_net_translates_languages_not_trained_on, retrieved January 11, 2017 
  16. Lewis-Kraus, Gideon (December 14, 2016). "The Great A.I. Awakening". The New York Times. https://www.nytimes.com/2016/12/14/magazine/the-great-ai-awakening.html. Retrieved January 11, 2017. 
  17. Le, Quoc; Schuster, Mike (September 27, 2016). "A Neural Network for Machine Translation, at Production Scale". Google Research Blog. https://research.googleblog.com/2016/09/a-neural-network-for-machine.html. Retrieved December 1, 2016. 
  18. Google Switches to its Own Translation System, October 22, 2007
  19. Barry Schwartz (October 23, 2007). "Google Translate Drops SYSTRAN for Home-Brewed Translation". Search Engine Land. http://searchengineland.com/google-translate-drops-systran-for-home-brewed-translation-12502. 
  20. "AI and compute". https://openai.com/research/ai-and-compute. 
  21. "Table of contents". https://github.com/kingoflolz/mesh-transformer-jax. 
  22. Chris McDonald (January 7, 2017), Commenting on Gil Fewster's January 5th article in the Atlantic, https://medium.com/@chrismcdonald_94568/ok-slow-down-516f93f83ac8#.l0ti3ct0b, retrieved January 11, 2017 
  23. Turovsky, Barak (November 15, 2016). "Found in translation: More accurate, fluent sentences in Google Translate". The Keyword Google Blog. https://www.blog.google/products/translate/found-translation-more-accurate-fluent-sentences-google-translate/. Retrieved December 1, 2016. 
  24. Perez, Sarah (March 6, 2017). "Google's smarter, A.I.-powered translation system expands to more languages". Oath Inc.. https://techcrunch.com/2017/03/06/googles-smarter-a-i-powered-translation-system-expands-to-more-languages/. 
  25. Turovsky, Barak (March 6, 2017). "Higher quality neural translations for a bunch more languages". https://blog.google/products/translate/higher-quality-neural-translations-bunch-more-languages/. Retrieved March 6, 2017. 
  26. Novet, Jordan (March 30, 2017). "Google now provides AI-powered translations for Arabic and Hebrew". https://venturebeat.com/2017/03/30/google-now-provides-ai-powered-translations-for-arabic-and-hebrew/. 
  27. Finge, Rachid (April 19, 2017). "Grote verbetering voor het Nederlands in Google Translate" (in Dutch). https://nederland.googleblog.com/2017/04/grote-verbetering-voor-het-nederlands.html. 
  28. Turovsky, Barak (April 25, 2017). "Making the internet more inclusive in India". https://blog.google/products/translate/making-internet-more-inclusive-india/. 
  29. Jackson, Jeffrey L; Kuriyama, Akira; Anton, Andreea; Choi, April; Fournier, Jean-Pascal; Geier, Anne-Kathrin; Jacquerioz, Frederique; Kogan, Dmitry et al. (July 30, 2019). "The Accuracy of Google Translate for Abstracting Data From Non–English-Language Trials for Systematic Reviews". Annals of Internal Medicine 171 (9): 678. doi:10.7326/M19-0891. ISSN 0570-183X. PMID 31357212. 
