Facebook’s Artificial Intelligence research team has expanded and improved its Language-Agnostic Sentence Representations (LASER) toolkit to work with more than 90 languages, written in 28 different alphabets. The advance speeds up the transfer of natural language processing (NLP) applications to many more languages.
The company’s AI team is now open-sourcing the LASER toolkit, describing it as the first exploration of massively multilingual sentence representations to be shared publicly. The toolkit currently covers 93 languages and achieves its results by embedding all of them together in a single shared space. The team is also making the multilingual encoder and PyTorch code readily available, along with a multilingual test set for more than 100 languages. According to reports, the 93 languages include those with subject-verb-object (SVO) order such as English, SOV order such as Bengali and the Turkic languages, VSO order such as Tagalog and Berber, and even VOS order such as Malagasy.
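To give a feel for what embedding all languages together means in practice, here is a minimal sketch in plain Python. It does not call the LASER toolkit: the three-dimensional vectors are made-up stand-ins for real sentence embeddings, and the names `EMBEDDINGS`, `cosine` and `nearest` are hypothetical. The point is only that, once sentences from different languages sit in one vector space, a translation can be found by ordinary vector similarity.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up toy vectors (assumption): a real encoder would produce
# much higher-dimensional embeddings for each sentence.
EMBEDDINGS = {
    ("en", "The cat sleeps."): [0.90, 0.10, 0.05],
    ("fr", "Le chat dort."): [0.88, 0.12, 0.07],
    ("fr", "Il pleut."): [0.05, 0.20, 0.95],
}

def nearest(query_key, candidate_keys):
    """Return the candidate whose vector is most similar to the query."""
    query_vec = EMBEDDINGS[query_key]
    return max(candidate_keys, key=lambda k: cosine(query_vec, EMBEDDINGS[k]))

french = [k for k in EMBEDDINGS if k[0] == "fr"]
match = nearest(("en", "The cat sleeps."), french)
print(match[1])
```

With these toy values the English sentence lands closest to its French counterpart, because the two vectors were deliberately chosen to point in nearly the same direction.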
LASER’s features include zero-shot transfer of NLP models from one language, such as English, to scores of others, including languages where training data is limited. The toolkit handles low-resource languages and dialects. According to the team, it achieves state-of-the-art accuracy for 13 of the 14 languages in the XNLI corpus and delivers strong results in cross-lingual document classification on the MLDoc corpus. It is also fast, processing up to 2,000 sentences per second on GPU, and its sentence encoder is implemented in PyTorch with minimal external dependencies. LASER supports sentences that mix multiple languages, and its performance improves as new languages are added, because the system keeps learning to recognize the traits of language families.
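Zero-shot transfer can be illustrated with a small sketch, again in plain Python and without the LASER code itself. A classifier is fitted only on vectors standing in for English sentence embeddings, then applied unchanged to a vector standing in for a sentence in another language. The two-dimensional vectors, the labels and the function names are all hypothetical, and a nearest-centroid rule is used here purely for brevity; it is not the classifier the LASER team used.

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def train(labelled_vectors):
    """Build one centroid per label from (vector, label) pairs."""
    by_label = {}
    for vec, label in labelled_vectors:
        by_label.setdefault(label, []).append(vec)
    return {label: centroid(vecs) for label, vecs in by_label.items()}

def predict(model, vec):
    """Return the label whose centroid is closest to vec."""
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(vec, center))
    return min(model, key=lambda label: squared_distance(model[label]))

# Toy stand-ins (assumption) for embeddings of labelled English sentences.
ENGLISH_TRAIN = [
    ([0.9, 0.1], "sports"),
    ([0.8, 0.2], "sports"),
    ([0.1, 0.9], "finance"),
    ([0.2, 0.8], "finance"),
]

model = train(ENGLISH_TRAIN)

# Toy stand-in for the embedding of a sentence in a language with no
# training data; the classifier was never fitted on this language.
other_language_vec = [0.85, 0.15]
print(predict(model, other_language_vec))
```

The sketch works only because both languages are assumed to share one embedding space, which is the property the encoder is meant to provide; with separate per-language spaces, a classifier trained on English vectors would have nothing meaningful to say about other languages.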