Language models (LMs) often struggle with reasoning tasks like math or coding, particularly in low-resource languages. This challenge arises because LMs are primarily trained on data from a few high-resource languages, leaving low-resource languages underrepresented.
Previously, researchers have addressed this by continually training English-centric LMs on target languages. However, this method is difficult to scale across many languages because each language needs its own training data. The problem is likely more acute for specialized LMs like MetaMath and Orca 2, which have undergone domain-specific adaptation primarily in English.
Researchers at KAIST and the University of Washington have introduced ‘LANGBRIDGE’, a novel method for adapting LMs to multilingual reasoning tasks without requiring explicit multilingual training data. LANGBRIDGE combines two specialized models: one adept at understanding multiple languages (such as an mT5 encoder) and another focused on reasoning (like Orca 2). By introducing minimal trainable parameters between them, LANGBRIDGE effectively connects these models.
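The bridging idea can be illustrated with a toy sketch. It assumes the trainable parameters form a single linear projection from the encoder's hidden size to the LM's input embedding size; the article only says "minimal trainable parameters", so the layer type, the dimensions, and the names `make_linear` and `bridge` are hypothetical. Plain Python lists stand in for tensors to keep the example dependency-free.

```python
import random


def make_linear(in_dim, out_dim, seed=0):
    """Hypothetical trainable bridge: an (out_dim x in_dim) weight matrix and a bias."""
    rng = random.Random(seed)
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def bridge(encoder_states, weight, bias):
    """Project each encoder hidden state into the LM's input embedding space."""
    return [
        [sum(w * x for w, x in zip(row, state)) + b for row, b in zip(weight, bias)]
        for state in encoder_states
    ]


# Toy sizes (assumptions); real encoders and LMs use thousands of dimensions.
ENC_DIM, LM_DIM = 4, 6

# Stand-in for the frozen multilingual encoder's output on a 3-token input.
encoder_states = [[0.1 * (i + j) for j in range(ENC_DIM)] for i in range(3)]

weight, bias = make_linear(ENC_DIM, LM_DIM)
soft_prompt = bridge(encoder_states, weight, bias)

# One LM-sized vector per input token, ready to be fed to the reasoning LM.
print(len(soft_prompt), len(soft_prompt[0]))  # 3 6
```

In this sketch only `weight` and `bias` would be updated during training; the encoder and the reasoning LM would stay as they are.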
Importantly, their approach doesn’t require multilingual supervision and relies solely on English data while still generalizing to multiple languages during testing, similar to zero-shot cross-lingual transfer. They demonstrate LANGBRIDGE’s effectiveness on LMs specialized in mathematical reasoning, coding, and logical reasoning. Empirical results show significant improvements in multilingual reasoning performance.
Even though it is trained only on English data, LANGBRIDGE significantly boosts language models’ performance on low-resource languages across reasoning tasks in mathematics, coding, and logic. The method itself is inspired by the multimodal literature, and their analysis attributes its success to the language-agnostic nature of multilingual representations. For instance, applying LANGBRIDGE to MetaMath-13B using the mT5-XXL encoder boosts average accuracy on MGSM from 40.5% to 55.8%, on par with or above the 51.3% of PaLM-540B.
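To put the reported MGSM figures in perspective, the short calculation below derives the absolute gain, the relative gain, and the margin over PaLM-540B. The three accuracies come from the text above; the variable names are illustrative only.

```python
# Average MGSM accuracies (percent) as quoted in the article.
baseline = 40.5      # MetaMath-13B alone
with_bridge = 55.8   # MetaMath-13B + mT5-XXL encoder via LANGBRIDGE
palm = 51.3          # PaLM-540B

# Absolute improvement in percentage points.
absolute_gain = round(with_bridge - baseline, 1)

# Improvement relative to the baseline, in percent.
relative_gain = round(100 * (with_bridge - baseline) / baseline, 1)

# Margin over PaLM-540B in percentage points.
margin_over_palm = round(with_bridge - palm, 1)

print(absolute_gain, relative_gain, margin_over_palm)  # 15.3 37.8 4.5
```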
They hypothesize that LANGBRIDGE’s effectiveness lies in the language-agnostic nature of multilingual representations. By mapping these representations to the LMs’ input space, the LM can grasp their semantics, making the specific language of the input irrelevant. Empirical analysis using techniques like principal component analysis (PCA) and qualitative methods supports their hypothesis.
Although multilingual representations are generally language-agnostic, previous research suggests room for improvement. While LANGBRIDGE has the potential to generalize to all languages supported by the multilingual encoder, its effectiveness in enhancing the reasoning capability of a specific language depends on two main factors: the initial proficiency of the language model in that language and the proficiency of the encoder model in that language.
All credit for this research goes to the researchers of this project.
The post Researchers from KAIST and the University of Washington have introduced ‘LANGBRIDGE’: A Zero-Shot AI Approach to Adapt Language Models for Multilingual Reasoning Tasks without Multilingual Supervision appeared first on MarkTechPost.