
Evolved Transformer - the next big thing for NMT of Indian Languages??

By Rajkumar Rajasekaran posted Fri April 30, 2021 05:46 AM

  

Over the past decade, machine translation has seen spectacular improvements. After the wide success of convolution-based networks for Seq2Seq problems, the growing focus on architectural changes led to the full-attention Transformer network for Seq2Seq tasks. Today, Transformers are the go-to solution for neural machine translation.

But is that enough?

Not all languages are the same; each language is unique, with its own depth of morphology, grammar, structure and so on. These variations have to be built into our models and techniques if we want translation that is truly faithful. When we look at Indian languages such as Tamil, Telugu, Malayalam and Hindi, we still have a long way to go, and I feel a lot of work remains to be done on the depth of translation.

The greatest challenge in translation is the morphological richness and syntactic divergence of Indian languages, which current systems still fail to capture clearly.

The need for larger training data, with full language corpora and non-repetitive sentences, remains a big open question. Such issues have to be tackled in any case, but with the commonly used datasets we have today, have we reached our full potential? That is the question. For English-Tamil translation, for example, EnTamV2.0 (Ramasamy et al., 2012) and the OPUS dataset are commonly used in research. The notable translation model proposed by Choudhary, Rao and Rohilla (2020) in "Neural Machine Translation for Low-Resourced Indian Languages", built with multi-head attention as the base architecture, also used the same datasets.
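Purely as an illustration (not how the cited papers prepared their data), an English-Tamil parallel corpus from OPUS can be pulled with the Hugging Face datasets library; the `opus100` dataset name and the `en-ta` configuration are assumptions about what the hub exposes.

```python
# Illustrative sketch: loading an English-Tamil parallel corpus from OPUS-100.
# Assumes the Hugging Face `datasets` library is installed and that the hub's
# `opus100` dataset offers an `en-ta` configuration; the papers cited above
# used their own preprocessing of EnTamV2.0 / OPUS instead.
from datasets import load_dataset

corpus = load_dataset("opus100", "en-ta")       # train / validation / test splits
example = corpus["train"][0]["translation"]     # e.g. {'en': '...', 'ta': '...'}
print(example["en"])
print(example["ta"])
```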

Once the dataset is fixed and preprocessed, more focus has to be given to rare words and subword units. These must not be neglected in translation, especially for languages with a rich vocabulary and many variations. In "Neural Machine Translation of Rare Words with Subword Units" (2015), the authors proposed applying Byte Pair Encoding (BPE) to a corpus of words. In BPE, words are tokenised into characters, a special token is appended to mark the end of each word, and word frequencies are counted. In subsequent iterations, the co-occurrence frequency of adjacent symbols is measured and the most frequent pairs are merged, so that root words and their subword extensions end up as separate tokens that are encoded and fed into the model. This helps the model pick up the small nuanced variations in sentence formation that follow the grammar of the language.
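To make the merge procedure concrete, here is a minimal Python sketch of the BPE learning loop, closely following the algorithm described in the subword-units paper; the toy vocabulary and the number of merges are illustrative only.

```python
import re
from collections import Counter

def get_pair_stats(vocab):
    """Count how often each adjacent symbol pair occurs across the vocabulary."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[(symbols[i], symbols[i + 1])] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every occurrence of the chosen pair with a single merged symbol."""
    merged = {}
    bigram = re.escape(' '.join(pair))
    pattern = re.compile(r'(?<!\S)' + bigram + r'(?!\S)')
    for word, freq in vocab.items():
        merged[pattern.sub(''.join(pair), word)] = freq
    return merged

# Toy vocabulary: words split into characters, with '</w>' marking word ends.
vocab = {
    'l o w </w>': 5,
    'l o w e r </w>': 2,
    'n e w e s t </w>': 6,
    'w i d e s t </w>': 3,
}

for _ in range(10):                      # number of merges is a hyperparameter
    pairs = get_pair_stats(vocab)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)     # most frequent pair becomes a new token
    vocab = merge_pair(best, vocab)
    print('merged', best)
```

After a few merges, frequent endings such as "est</w>" become single tokens, while rare words remain decomposable into smaller, reusable units.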

Choudhary, Rao and Rohilla (2020) proposed a model that outperforms Google Translate on this task. The model was built on the base attention architecture, and adding BPE improved the result: against Google Translate's BLEU score of 5.67, their attention-with-BPE model scored 9.67. Here, however, the focus was more on the encodings than on the Transformer architecture itself.
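For anyone who wants to reproduce this kind of comparison, BLEU scores of the sort quoted above can be computed with the sacrebleu package; the hypothesis and reference sentences below are made up for illustration and are not from the paper's test set.

```python
# Illustrative only: scoring two candidate Tamil translations against one
# reference with sacrebleu. The sentences are invented, not from EnTamV2.0.
import sacrebleu

references = [["அவள் தினமும் பள்ளிக்கு நடந்து செல்கிறாள்"]]   # one reference stream
baseline   = ["அவள் பள்ளி போகிறாள் தினமும்"]                  # e.g. a weak baseline
bpe_model  = ["அவள் தினமும் பள்ளிக்கு நடந்து செல்கிறாள்"]      # e.g. attention + BPE

print("baseline BLEU :", sacrebleu.corpus_bleu(baseline, references).score)
print("BPE model BLEU:", sacrebleu.corpus_bleu(bpe_model, references).score)
```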

The Evolved Transformer (ET) (So, Liang and Le, 2019) came out of neural architecture search, with the aim of finding an alternative to the Transformer architecture. The search uses the tournament selection algorithm (Goldberg & Deb, 1991), following Real et al. (2019), with a search-space encoding inspired by NASNet. In this search space, each child model is expressed as [left input, left normalization, left layer, left relative output dimension, left activation, right input, right normalization, right layer, right relative output dimension, right activation, combiner function] × 14 + [number of cells] × 2, with the first 6 blocks allocated to the encoder and the latter 8 to the decoder. In addition, their Progressive Dynamic Hurdles (PDH) method is an automated way of giving more resources to the child models that perform well, granting them additional training steps, while poorly performing children do not consume many resources: when a child's fitness falls below a given hurdle, its evaluation is terminated immediately. Looking at the results, the Evolved Transformer outperforms the Transformer at all model sizes, with the biggest difference, 0.7 BLEU, at the smallest size, and at comparable performance the ET model used 37.6% fewer parameters. The ET outperformed the Transformer not only on machine translation but also on language modelling tasks.
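To give a feel for how Progressive Dynamic Hurdles allocates compute, here is a toy, self-contained Python sketch; the Child class, its random "skill" score and the fixed hurdle values are invented stand-ins, not the authors' implementation, where fitness would come from the validation performance of a real child architecture.

```python
import random

# Toy sketch of Progressive Dynamic Hurdles (PDH), not the authors' code.
# A child's "fitness" here is a random intrinsic skill scaled by how long it
# has trained; in the real search it would come from validation metrics, and
# the hurdles would be derived from the population rather than fixed here.

class Child:
    def __init__(self, name):
        self.name = name
        self.steps = 0
        self.skill = random.random()   # stand-in for architecture quality

    def train(self, steps):
        self.steps += steps            # spend a slice of compute

    def fitness(self):
        return self.skill * min(1.0, self.steps / 30_000)

def progressive_dynamic_hurdles(children, hurdles, steps_per_stage):
    """Children that clear each fitness hurdle earn more training steps;
    the rest are dropped early so they stop consuming resources."""
    survivors = list(children)
    for hurdle, steps in zip(hurdles, steps_per_stage):
        for child in survivors:
            child.train(steps)
        survivors = [c for c in survivors if c.fitness() >= hurdle]
    return survivors

population = [Child(f"child-{i}") for i in range(20)]
winners = progressive_dynamic_hurdles(population,
                                      hurdles=[0.1, 0.4],
                                      steps_per_stage=[10_000, 20_000])
print("children that cleared every hurdle:", [c.name for c in winners])
```

The design point is simply that weak children stop training after their first slice of compute, so the search budget concentrates on the architectures that keep clearing hurdles.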

Thus we have an improved base architecture over the Transformer, the Evolved Transformer, and in BPE a key method for identifying small nuanced variations and subword units. And the BPE and MBPE models have already been shown to beat Google Translate and a direct translation approach on the same dataset.

It would be worth trying to combine BPE with the search-derived architecture (ET), which outperforms the base Transformer model across the model sizes and tasks reported. After all, building a more efficient model is our only goal.

Feel free to share your thoughts.

References:

  • Sennrich, R., Haddow, B., and Birch, A. (2015). Neural Machine Translation of Rare Words with Subword Units.

  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017). Attention Is All You Need. In Advances in Neural Information Processing Systems 30, pages 5998–6008. Curran Associates, Inc.

  • Ramasamy, L., Bojar, O., and Žabokrtský, Z. (2012). Morphological Processing for English-Tamil Statistical Machine Translation. In Proceedings of the Workshop on Machine Translation and Parsing in Indian Languages (MTPIL-2012), pages 113–122.

  • So, D. R., Liang, C., and Le, Q. V. (2019). The Evolved Transformer. In Proceedings of the 36th International Conference on Machine Learning (ICML 2019).

  • Choudhary, H., Rao, S., and Rohilla, R. (2020). Neural Machine Translation for Low-Resourced Indian Languages.


#GlobalAIandDataScience
#GlobalDataScience