This Paper Explores the Legal and Ethical Maze of Language Model Training: Unveiling the Risks and Remedies in Dataset Transparency and Use

As language models become increasingly advanced, concerns have arisen around the ethical and legal implications of training them on vast and diverse datasets. When training data is not properly understood or documented, sensitive information can leak between the training and test datasets, personally identifiable information (PII) can be exposed, unintended biases or behaviors can be introduced, and the resulting models can be of lower quality than expected. The lack of comprehensive information and documentation surrounding these datasets creates significant ethical and legal risks that must be addressed.

A team of researchers from various institutions, including MIT, Harvard Law School, UC Irvine, MIT Center for Constructive Communication, Inria, Univ. Lille, Contextual AI, ML Commons, Olin College, Carnegie Mellon University, Tidelift, and Cohere For AI, has demonstrated its commitment to promoting transparency and responsible use of datasets by releasing a comprehensive audit. The audit includes the Data Provenance Explorer, an interactive user interface that enables practitioners to trace and filter data provenance for widely used open-source fine-tuning data collections.

Copyright laws grant authors exclusive ownership of their work, while open-source licenses encourage collaboration in software development. However, supervised AI training data presents unique challenges that open-source licenses were not designed to manage. How copyright and licenses interact within collected datasets is yet to be settled, with legal challenges and uncertainty surrounding how the relevant laws apply to generative AI and supervised datasets. Previous work has stressed the importance of data documentation and attribution, with Datasheets and other studies highlighting the need for comprehensive documentation and curation rationale for datasets.

The study involved manual retrieval of pages and automatic extraction of licenses from HuggingFace configurations and GitHub pages. The researchers also used the Semantic Scholar public API to retrieve the release dates and citation counts of academic publications. To ensure fair treatment across languages, they measured a series of data properties in characters, such as text metrics, dialog turns, and sequence length. In addition, they conducted a landscape analysis tracing the lineage of more than 1,800 text datasets, examining their sources, creators, license conditions, properties, and subsequent use. To facilitate the audit and tracing processes, they developed tools and standards that improve dataset transparency and responsible use.
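
To make the kind of provenance filtering described above concrete, here is a minimal sketch using pandas. The table and its column names ("dataset", "license_category", "languages") are hypothetical placeholders and do not reflect the Data Provenance Explorer's actual schema.

# Minimal sketch: filtering a dataset-provenance table by license category.
# Column names are hypothetical; the real Data Provenance Explorer schema may differ.
import pandas as pd

provenance = pd.DataFrame([
    {"dataset": "dataset_a", "license_category": "commercial", "languages": ["en"]},
    {"dataset": "dataset_b", "license_category": "non-commercial", "languages": ["en", "fr"]},
    {"dataset": "dataset_c", "license_category": "unspecified", "languages": ["sw"]},
])

# Keep only datasets whose self-reported license permits commercial use.
commercial_ok = provenance[provenance["license_category"] == "commercial"]
print(commercial_ok["dataset"].tolist())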

The landscape analysis reveals stark differences in the composition and focus of commercially open versus closed datasets. Datasets that are difficult to access dominate important categories, including lower-resource languages, more creative tasks, broader topic variety, and newer, more synthetic training data. The study also highlights the problem of misattribution and incorrect use of frequently used datasets: on popular dataset hosting sites, licenses are frequently miscategorized, with license omission rates exceeding 70% and error rates of more than 50%. The study emphasizes the need for comprehensive data documentation and attribution and highlights the challenges of synthesizing documentation for models trained on multiple data sources.

The study concludes that there are significant differences in the composition and focus of commercially open and closed datasets, with hard-to-access datasets monopolizing important categories and indicating a deepening divide in the data types available under different license conditions. It also found frequent miscategorization of licenses on dataset hosting sites and high rates of license omission, which point to widespread misattribution and impede informed use of popular datasets, raising concerns about data transparency and responsible use. The researchers released their entire audit, including the Data Provenance Explorer, to contribute to ongoing improvements in dataset transparency and responsible use. The landscape analysis and the tools developed in the study aim to improve dataset transparency and understanding, addressing the legal and ethical risks of training language models on inconsistently documented datasets.

Check out the Paper and Project. All credit for this research goes to the researchers of this project.


Can Machine Learning Predict Chaos? This Paper from UT Austin Performs a Large-Scale Comparison of Modern Forecasting Methods on a Giant Dataset of 135 Chaotic Systems

The science of predicting chaotic systems lies at the intriguing intersection of physics and computer science. This field delves into understanding and forecasting the unpredictable nature of systems where small initial changes can lead to significantly divergent outcomes. It’s a realm where the butterfly effect reigns supreme, challenging the traditional notions of predictability and order.

Central to the challenge in this domain is the unpredictability inherent in chaotic systems. Forecasting these systems is complex due to their sensitive dependence on initial conditions, making long-term predictions highly challenging. Researchers strive to find methods that can accurately anticipate the future states of such systems despite the inherent unpredictability.

Prior approaches in chaotic system prediction have largely centered around domain-specific and physics-based models. These models, informed by an understanding of the underlying physical processes, have been the traditional tools for tackling the complexities of chaotic systems. However, their effectiveness is often limited by the intricate nature of the systems they attempt to predict.

Researchers from the University of Texas at Austin introduce a new class of domain-agnostic models that diverge from traditional physics-based approaches. These models leverage large-scale machine learning techniques, utilizing extensive datasets to navigate the complexities of chaotic systems without relying heavily on domain-specific knowledge.

The novel methodology employs large-scale, overparametrized statistical learning models, such as transformers and hierarchical neural networks. These models utilize their extensive scale and access to substantial time series datasets, enabling them to forecast chaotic systems effectively. The approach signifies a shift from relying on domain knowledge to using data-driven predictions.
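
To make the data-driven setup concrete, here is a minimal sketch under simplifying assumptions, far smaller than the models the paper benchmarks: simulate a chaotic Lorenz trajectory, fit a generic regressor on past-window/next-state pairs, and roll it out autoregressively. It illustrates the framing only, not the paper's methods or results.

# Minimal sketch (not the paper's models): fit a purely data-driven forecaster to a
# chaotic Lorenz trajectory and roll it out autoregressively.
import numpy as np
from scipy.integrate import solve_ivp
from sklearn.neural_network import MLPRegressor

def lorenz(t, xyz, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = xyz
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

t_eval = np.linspace(0, 60, 6000)
sol = solve_ivp(lorenz, (0, 60), [1.0, 1.0, 1.0], t_eval=t_eval)
series = sol.y.T  # shape (6000, 3)

# Supervised pairs: a short window of past states -> the next state.
window = 8
X = np.stack([series[i:i + window].ravel() for i in range(len(series) - window)])
y = series[window:]

split = int(0.8 * len(X))
model = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=200, random_state=0)
model.fit(X[:split], y[:split])

# Autoregressive rollout from the end of the training segment.
history = list(series[split:split + window])
for _ in range(200):
    nxt = model.predict(np.ravel(history[-window:]).reshape(1, -1))[0]
    history.append(nxt)

print("rollout shape:", np.array(history).shape)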

The performance of these new models is noteworthy. They consistently produce accurate predictions over extended periods, well beyond the traditional forecasting horizons. This advancement represents a significant leap in the field, demonstrating that the ability to forecast chaotic systems can extend far beyond previously established limits.

In conclusion, the paper reveals an intriguing development in forecasting chaotic systems. The transition from domain-specific models to large-scale, data-driven approaches opens new avenues in predicting the unpredictable. It highlights a growing trend where the scale and availability of data, coupled with advanced machine learning techniques, are reshaping our approach to understanding and forecasting chaotic systems.

Check out the Paper. All credit for this research goes to the researchers of this project.


This AI Paper Unveils the Cached Transformer: A Transformer Model with GRC (Gated Recurrent Cached) Attention for Enhanced Language and Vision Tasks

Transformer models are crucial in machine learning for language and vision processing tasks. Transformers, renowned for their effectiveness in sequential data handling, play a pivotal role in natural language processing and computer vision. They are designed to process input data in parallel, making them highly efficient for large datasets. However, traditional Transformer architectures struggle to manage long-term dependencies within sequences, a critical aspect of understanding context in language and images.

The central challenge addressed in this study is the efficient and effective modeling of long-term dependencies in sequential data. While adept at handling shorter sequences, traditional Transformer models struggle to capture extensive contextual relationships, primarily due to computational and memory constraints. This limitation becomes pronounced in tasks that require understanding long-range dependencies, such as complex sentence structures in language modeling or detailed image recognition in vision tasks, where the relevant context may span a wide range of the input data.

Existing methods to mitigate these limitations include various memory-based approaches and specialized attention mechanisms. However, these solutions often increase computational complexity or fail to capture sparse, long-range dependencies adequately. Techniques such as memory caching and selective attention have been employed, but they either increase the model’s complexity or fail to extend the model’s receptive field sufficiently. The existing landscape of solutions underscores the need for a more effective way to enhance Transformers’ ability to process long sequences without prohibitive computational costs.

Researchers from The Chinese University of Hong Kong, The University of Hong Kong, and Tencent Inc. propose an innovative approach called Cached Transformers, augmented with a Gated Recurrent Cache (GRC). This novel component is designed to enhance Transformers’ capability to handle long-term relationships in data. The GRC is a dynamic memory system that efficiently stores and updates token embeddings based on their relevance and historical significance. This system allows the Transformer to process the current input and draw on a rich, contextually relevant history, thereby significantly expanding its understanding of long-range dependencies.

Source: https://arxiv.org/abs/2312.12742

The GRC is a key innovation that dynamically updates a token embedding cache to represent historical data efficiently. This adaptive caching mechanism enables the Transformer model to attend to a combination of current and accumulated information, significantly extending its ability to process long-range dependencies. The GRC maintains a balance between the need to store relevant historical data and the computational efficiency, thereby addressing the traditional Transformer models’ limitations in handling long sequential data.
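
To illustrate the general idea of a gated token cache, here is a minimal PyTorch sketch. The class name, the gating scheme, and the cache-update rule are simplified illustrations under stated assumptions, not the paper's exact GRC formulation.

# Minimal sketch of a gated token cache (not the paper's exact GRC): the cache is
# blended with a summary of incoming tokens via a learned gate, and the current
# tokens attend over [cache; current tokens].
import torch
import torch.nn as nn

class GatedCacheAttention(nn.Module):
    def __init__(self, dim, cache_len=16, n_heads=4):
        super().__init__()
        self.cache_len = cache_len
        self.gate = nn.Linear(2 * dim, dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, x, cache):
        # x: (batch, seq, dim), cache: (batch, cache_len, dim)
        # Summarize the new tokens and gate them into the cache.
        summary = x.mean(dim=1, keepdim=True).expand(-1, self.cache_len, -1)
        g = torch.sigmoid(self.gate(torch.cat([cache, summary], dim=-1)))
        new_cache = g * cache + (1.0 - g) * summary
        # Current tokens attend over the cached history plus themselves.
        context = torch.cat([new_cache, x], dim=1)
        out, _ = self.attn(x, context, context)
        return out, new_cache

x = torch.randn(2, 32, 64)
cache = torch.zeros(2, 16, 64)
out, cache = GatedCacheAttention(64)(x, cache)
print(out.shape, cache.shape)  # (2, 32, 64) and (2, 16, 64)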

Integrating Cached Transformers with GRC demonstrates notable improvements in language and vision tasks. For instance, in language modeling, the enhanced Transformer models equipped with GRC outperform traditional models, achieving lower perplexity and higher accuracy in complex tasks like machine translation. This improvement is attributed to the GRC’s efficient handling of long-range dependencies, providing a more comprehensive context for each input sequence. Such advancements indicate a significant step forward in the capabilities of Transformer models.


In conclusion, the research can be summarized in the following points:

The problem of modeling long-term dependencies in sequential data is effectively tackled by Cached Transformers with GRC.

The GRC mechanism significantly enhances the Transformers’ ability to understand and process extended sequences, thus improving performance in both language and vision tasks.

This advancement represents a notable leap in machine learning, particularly in how Transformer models handle context and dependencies over long data sequences, setting a new standard for future developments in the field.

Check out the Paper. All credit for this research goes to the researchers of this project.


This AI Paper Introduces InstructVideo: A Novel AI Approach to Enhance Text-to-Video Diffusion Models Using Human Feedback and Efficient Fine-Tuning Techniques

Diffusion models have become the prevailing approach for generating videos. Yet their dependence on large-scale web data, which varies widely in quality, frequently leads to results that lack visual appeal and do not align well with the provided textual prompts. Despite recent advances, there is still room to improve the visual quality of generated videos: the variable quality of the extensive web data used in pre-training can yield models that produce content that lacks visual appeal, may be toxic, and aligns poorly with the given prompts.

A team of researchers from Zhejiang University, Alibaba Group, Tsinghua University, Singapore University of Technology and Design, S-Lab, Nanyang Technological University, CAML Lab, and the University of Cambridge introduced InstructVideo to instruct text-to-video diffusion models with human feedback by reward fine-tuning. Comprehensive experiments, encompassing qualitative and quantitative assessments, confirm the practicality and effectiveness of incorporating image reward models in InstructVideo. This approach significantly improves the visual quality of generated videos without compromising the model’s ability to generalize.

Source: https://arxiv.org/abs/2312.12490

Early efforts at video generation focused on GANs and VAEs, but generating videos from text remained a challenge. Diffusion models have emerged as the de facto method for video generation, providing diversity and fidelity. VDM extended image diffusion models to video generation, and further efforts introduced spatiotemporal conditions for more controllable generation. Understanding human preference in visual content generation is challenging; some works collect annotations and fine-tune on the annotated data to model human preferences. Learning from human feedback in visual content generation is desirable, and previous works have focused on reinforcement learning and agent alignment.


InstructVideo utilizes a reformulation of reward fine-tuning as editing, improving computational efficiency and efficacy. The method incorporates Segmental Video Reward (SegVR) and Temporally Attenuated Reward (TAR) to enable efficient reward fine-tuning using image reward models. SegVR provides reward signals based on segmental sparse sampling, while TAR mitigates temporal modeling degradation during fine-tuning. The optimization objective is rewritten to include the degree of the attenuating rate, with a default value of 1 for the coefficient. InstructVideo leverages the diffusion process to obtain the starting point for reward fine-tuning in video generation.
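
The following is a minimal sketch of the segmental, attenuated reward scoring described above. The image_reward function stands in for an image reward model such as HPSv2, and the attenuation schedule is an illustrative assumption, not the paper's exact implementation.

# Minimal sketch of segmental, temporally attenuated reward scoring for a generated
# video. `image_reward` is a placeholder for an image reward model; the decay
# schedule is illustrative only.
import numpy as np

def image_reward(frame, prompt):
    # Placeholder: a real system would call an image reward model here.
    return float(frame.mean())

def segmental_attenuated_reward(video, prompt, num_segments=4, decay=0.8):
    # Sparsely sample one frame per segment (SegVR-style) instead of scoring all frames.
    T = len(video)
    idx = np.linspace(0, T - 1, num_segments, dtype=int)
    # Later frames receive attenuated weight (TAR-style) to protect temporal modeling.
    weights = decay ** np.arange(num_segments)
    weights = weights / weights.sum()
    rewards = np.array([image_reward(video[t], prompt) for t in idx])
    return float((weights * rewards).sum())

video = np.random.rand(16, 64, 64, 3)  # 16 frames of a generated clip
print(segmental_attenuated_reward(video, "a dog surfing a wave"))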


The research provides more visualization results to exemplify the conclusions drawn, showcasing how the generated videos evolve. The efficacy of InstructVideo is demonstrated through an ablation study on SegVR and TAR, showing that their removal leads to a noticeable reduction in temporal modeling capabilities. InstructVideo consistently outperforms other methods in terms of video quality, with improvements in video quality being more pronounced than improvements in video-text alignment.

In conclusion, the InstructVideo method significantly enhances the visual quality of generated videos without compromising generalization capabilities, as validated through extensive qualitative and quantitative experiments. InstructVideo outperforms other methods in terms of video quality, with improvements in video quality being more pronounced than improvements in video-text alignment. Using image reward models, such as HPSv2, in InstructVideo proves practical and effective in enhancing the visual quality of generated videos. Incorporating SegVR and TAR in InstructVideo improves fine-tuning and mitigates temporal modeling degradation.

Check out the Paper. All credit for this research goes to the researchers of this project.


Alibaba Researchers Propose I2VGen-XL: A Cascaded Video Synthesis AI Model which is Capable of Generating High-Quality Videos from a Single Static Image

Researchers from Alibaba, Zhejiang University, and Huazhong University of Science and Technology have come together and introduced a groundbreaking video synthesis model, I2VGen-XL, addressing key challenges in semantic accuracy, clarity, and spatio-temporal continuity. Video generation is often hindered by the scarcity of well-aligned text-video data and the complex structure of videos. To overcome these obstacles, the researchers propose a cascaded approach with two stages, known as I2VGen-XL.

The I2VGen-XL overcomes the obstacle in two stages:

The base stage focuses on ensuring coherent semantics and preserving content by utilizing two hierarchical encoders. A fixed CLIP encoder extracts high-level semantics, while a learnable content encoder captures low-level details. These features are then integrated into a video diffusion model to generate videos with semantic accuracy at a lower resolution. 

The refinement stage enhances video details and resolution to 1280×720 by incorporating additional brief text guidance. The refinement model employs a distinct video diffusion model and a simple text input for high-quality video generation.

One of the main challenges in text-to-video synthesis currently is the collection of high-quality video-text pairs. To enrich the diversity and robustness of I2VGen-XL, the researchers collect a vast dataset comprising around 35 million single-shot text-video pairs and 6 billion text-image pairs, covering a wide range of daily life categories. Through extensive experiments, the researchers compare I2VGen-XL with existing top methods, demonstrating its effectiveness in enhancing semantic accuracy, continuity of details, and clarity in generated videos.

The proposed model leverages Latent Diffusion Models (LDM), a generative model class that learns a diffusion process to generate target probability distributions. In the case of video synthesis, LDM gradually recovers the target latent from Gaussian noise, preserving the visual manifold and reconstructing high-fidelity videos. I2VGen-XL adopts a 3D UNet architecture for LDM, referred to as VLDM, to achieve effective and efficient video synthesis.
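
To ground the description of recovering a latent from Gaussian noise, here is a generic DDPM-style reverse-diffusion sketch under simplifying assumptions (a linear beta schedule and a placeholder denoiser). It illustrates the general recipe only; I2VGen-XL's VLDM uses a 3D UNet conditioned on the input image and text.

# Generic reverse diffusion over a video-shaped latent; the denoiser is a stand-in.
import torch

T = 50
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def denoiser(z_t, t):
    # Placeholder epsilon-prediction network (the 3D UNet in VLDM).
    return torch.zeros_like(z_t)

z = torch.randn(1, 4, 8, 32, 32)  # (batch, channels, frames, height, width) latent
for t in reversed(range(T)):
    eps = denoiser(z, t)
    mean = (z - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
    noise = torch.randn_like(z) if t > 0 else torch.zeros_like(z)
    z = mean + torch.sqrt(betas[t]) * noise

print(z.shape)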

The refinement stage is pivotal in enhancing spatial details, refining facial and bodily features, and reducing noise within local details. The researchers analyze the working mechanism of the refinement model in the frequency domain, highlighting its effectiveness in preserving low-frequency data and improving the continuity of high-definition videos.

In experimental comparisons with top methods like Gen-2 and Pika, I2VGen-XL showcases richer and more diverse motions, emphasizing its effectiveness in video generation. The researchers also conduct qualitative analyses on a diverse range of images, including human faces, 3D cartoons, anime, Chinese paintings, and small animals, demonstrating the model’s generalization ability.

In conclusion, I2VGen-XL represents a significant advancement in video synthesis, addressing key challenges in semantic accuracy and spatio-temporal continuity. The cascaded approach, coupled with extensive data collection and utilization of Latent Diffusion Models, positions I2VGen-XL as a promising model for high-quality video generation from static images. The model has also identified limitations, including challenges in generating natural and free human body movements, limitations in generating long videos, and the need for improved user intent understanding.

Check out the Paper, Model, and Project. All credit for this research goes to the researchers of this project.


This Machine Learning Research Opens up a Mathematical Perspective on the Transformers

The release of Transformers has marked a significant advancement in the field of Artificial Intelligence (AI) and neural network topologies. Understanding the workings of these complex neural network architectures requires an understanding of transformers. What distinguishes transformers from conventional architectures is the concept of self-attention, which describes a transformer model’s capacity to focus on distinct segments of the input sequence during prediction. Self-attention greatly enhances the performance of transformers in real-world applications, including computer vision and Natural Language Processing (NLP).

In a recent study, researchers have provided a mathematical model that can be used to perceive Transformers as particle systems in interaction. The mathematical framework offers a methodical way to analyze Transformers’ internal operations. In an interacting particle system, the behavior of the individual particles influences that of the other parts, resulting in a complex network of interconnected systems.

The study explores the finding that Transformers can be thought of as flow maps on the space of probability measures. In this sense, transformers generate a mean-field interacting particle system in which every particle, called a token, follows the vector field flow defined by the empirical measure of all particles. The continuity equation governs the evolution of the empirical measure, and the long-term behavior of this system, which is typified by particle clustering, becomes an object of study.
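
In schematic form, with notation simplified from the paper, each token x_i evolves on the unit sphere under an attention-driven vector field; beta is an inverse-temperature parameter, Q, K, V are the attention matrices, and P_x denotes projection onto the tangent space at x (roughly playing the role of layer normalization). Consult the paper for the precise model.

\[
\dot{x}_i(t) \;=\; \mathrm{P}_{x_i(t)}\!\left(
  \frac{1}{Z_i(t)} \sum_{j=1}^{n} e^{\beta \langle Q x_i(t),\, K x_j(t) \rangle}\, V x_j(t)
\right),
\qquad
Z_i(t) \;=\; \sum_{k=1}^{n} e^{\beta \langle Q x_i(t),\, K x_k(t) \rangle}.
\]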

In tasks like next-token prediction, the clustering phenomenon is important because the output measure represents the probability distribution of the next token. The limiting distribution is a point mass, which is unexpected and suggests that there isn’t much diversity or unpredictability. The concept of a long-time metastable condition, which overcomes this apparent paradox, has been introduced in the study. Transformer flow shows two different time scales: tokens quickly form clusters at first, then clusters merge at a much slower pace, eventually collapsing all tokens into one point.

The primary goal of this study is to offer a generic, understandable framework for a mathematical analysis of Transformers. This includes drawing links to well-known mathematical subjects such as Wasserstein gradient flows, nonlinear transport equations, collective behavior models, and ideal point configurations on spheres. Secondly, it highlights areas for future research, with a focus on comprehending the phenomena of long-term clustering. The study involves three major sections, which are as follows.

Modeling: By interpreting discrete layer indices as a continuous time variable, an idealized model of the Transformer architecture has been defined. This model emphasizes two important transformer components: layer normalization and self-attention.

Clustering: In the large time limit, tokens have been shown to cluster according to new mathematical results. The major findings have shown that as time approaches infinity, a collection of randomly initialized particles on the unit sphere clusters to a single point in high dimensions.

Future research: Several topics for further research have been presented, such as the two-dimensional example, the model’s changes, the relationship to Kuramoto oscillators, and parameter-tuned interacting particle systems in transformer architectures.

The team has shared that one of the main conclusions of the study is that clusters form inside the Transformer architecture over extended periods of time. This suggests that the particles, i.e., the model elements, tend to self-organize into discrete groups or clusters as the system evolves over time.

In conclusion, this study emphasizes the concept of Transformers as interacting particle systems and adds a useful mathematical framework for the analysis. It offers a new way to study the theoretical foundations of Large Language Models (LLMs) and a new way to use mathematical ideas to comprehend intricate neural network structures. 

Check out the Paper. All credit for this research goes to the researchers of this project.


Microsoft Researchers Introduce InsightPilot: An LLM-Empowered Automated Data Exploration System

Data exploration is an important step in data analysis that extracts key insights using multiple steps such as filtering, sorting, grouping, etc. It helps uncover patterns in the dataset and reveal potential relationships among the variables. However, this process is generally interactive and requires the user to manually explore the data, making the process time-consuming and necessitating domain expertise. 

Although different tools exist for general data exploration, they often fail to consider user intent and dataset characteristics, leading to irrelevant insights. Additionally, LLM hallucination is an infamous issue that causes LLMs to generate unreliable content. To tackle the shortcomings of existing models, researchers at Microsoft have released InsightPilot, a system that automates the process of data exploration using LLMs. The system provides LLMs with accurate insights to avoid hallucinations and presents a compact abstraction of the dataset to reduce computational costs, which allows the LLM to answer user questions better.

InsightPilot consists of the following three components:

A UI that allows users to ask questions in natural language and also display the analysis results.

An LLM that facilitates data exploration by selecting the appropriate analysis on the basis of the context.

An insight engine that does the analysis and presents the results in natural language.

A user initially poses a query in the interface, and the insight engine generates preliminary insights. Depending on the context, the LLM identifies the most relevant insights and keeps querying the engine to get more details about them. For example, a user may ask about trends in science scores for students, and then, based on initial insights, the LLM might query the engine for further analysis, such as comparing scores or finding any outliers. As long as the exploration is not complete, the interaction between the LLM and the engine continues, and at the end of the data exploration step, the engine presents the top-K insights in the form of a coherent report, which is then displayed to the user via the interface.
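
The loop between the LLM and the insight engine can be sketched as follows. The function names (insight_engine_query, llm_select_next_step) and the stopping rule are hypothetical placeholders, not InsightPilot's actual API.

# Minimal sketch of the LLM <-> insight-engine interaction loop described above.
def insight_engine_query(dataset, question):
    # Placeholder: the real engine runs filtering/grouping/aggregation and
    # verbalizes the resulting insights.
    return [f"insight about '{question}'"]

def llm_select_next_step(user_question, insights_so_far):
    # Placeholder: the real system prompts an LLM with a compact abstraction of
    # the dataset plus the insights gathered so far.
    if len(insights_so_far) >= 3:
        return None  # exploration complete
    return f"drill down #{len(insights_so_far) + 1} on: {user_question}"

def explore(dataset, user_question, top_k=3):
    insights = insight_engine_query(dataset, user_question)
    while (follow_up := llm_select_next_step(user_question, insights)) is not None:
        insights += insight_engine_query(dataset, follow_up)
    return insights[:top_k]  # the top-K insights become the final report

print(explore("student_scores.csv", "How are science scores trending?"))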

To evaluate its performance, the researchers conducted user studies to simulate real-world use cases of InsightPilot. Four data science participants were asked to raise three questions, and the system was evaluated against metrics like relevance, completeness, and understandability. The results show that InsightPilot consistently outperformed both OpenAI Code Interpreter and Langchain Pandas Agent. 

A case study based on a car sales dataset was also conducted to assess the performance of InsightPilot. When asked about the overall trend of Toyota’s car sales, the system not only identified ‘Camry’ as the key driver of Toyota’s sales but also compared Toyota’s sales with Honda’s and provided other interesting insights.

Although InsightPilot performs better than other state-of-the-art systems, it often produces vague answers that necessitate manual evaluation. Therefore, it is crucial to test its effectiveness across different real-life datasets. Nonetheless, it is an effective method of deriving insights from a dataset using natural language inquiries and has the potential to streamline the process of exploratory data analysis and save time and effort. Further research is necessary to ensure the method can be deployed in real-world scenarios and bolster efficiency and data-driven decision-making.

Check out the Paper. All credit for this research goes to the researchers of this project.


Meet HOI-Diff: Text-Driven Synthesis of 3D Human-Object Interactions Using Diffusion Models

In response to the challenging task of generating realistic 3D human-object interactions (HOIs) guided by textual prompts, researchers from Northeastern University, Hangzhou Dianzi University, Stability AI, and Google Research have introduced an innovative solution called HOI-Diff. The intricacies of human-object interactions in computer vision and artificial intelligence have posed a significant hurdle for synthesis tasks. HOI-Diff stands out by adopting a modular design that effectively decomposes the synthesis task into three core modules: a dual-branch diffusion model (HOI-DM) for coarse 3D HOI generation, an affordance prediction diffusion model (APDM) for estimating contacting points, and an affordance-guided interaction correction mechanism for precise human-object interactions.

Traditional approaches to text-driven motion synthesis often fell short by concentrating solely on generating isolated human motions, neglecting the crucial interactions with objects. HOI-Diff addresses this limitation by introducing a dual-branch diffusion model (HOI-DM) capable of simultaneously generating human and object motions based on textual prompts. This innovative design enhances the coherence and realism of generated motions through a cross-attention communication module between the human and object motion generation branches. Additionally, the research team introduces an affordance prediction diffusion model (APDM) to predict the contacting areas between humans and objects during interactions guided by textual prompts.

Source: https://arxiv.org/abs/2312.06553

The affordance prediction diffusion model (APDM) plays a crucial role in the overall effectiveness of HOI-Diff. Operating independently of the HOI-DM results, the APDM acts as a corrective mechanism, addressing potential errors in the generated motions. Notably, the stochastic generation of contacting points by the APDM introduces diversity in the synthesized motions. The researchers further integrate the estimated contacting points into a classifier-guidance system, ensuring accurate and close contact between humans and objects, thereby forming coherent HOIs.

To experimentally validate the capabilities of HOI-Diff, the researchers annotated the BEHAVE dataset with text descriptions, providing a comprehensive training and evaluation framework. The results demonstrate the model’s ability to produce realistic HOIs encompassing various interactions and different types of objects. The modular design and affordance-guided interaction correction showcase significant improvements in generating dynamic and static interactions.

Comparative evaluations against conventional methods, which primarily focus on generating human motions in isolation, reveal the superior performance of HOI-Diff. For this purpose, the researchers adapt two baseline models, MDM and PriorMDM. Visual and quantitative results underscore the model’s effectiveness in generating realistic and accurate human-object interactions.

However, the research team acknowledges certain limitations. Existing datasets for 3D HOIs pose constraints on action and motion diversity, presenting challenges for synthesizing long-term interactions. The precision of affordance estimation remains a critical factor influencing the model’s overall performance.

In conclusion, HOI-Diff represents a novel and effective solution to the intricate problem of 3D human-object interaction synthesis. The modular design and innovative correction mechanisms position it as a promising approach for applications such as animation and virtual environment development. Addressing challenges related to dataset limitations and affordance estimation precision as the field progresses could further enhance the model’s realism and applicability across diverse domains. HOI-Diff is a testament to the continual advancements in text-driven synthesis and human-object interaction modeling.

Check out the Paper and Github. All credit for this research goes to the researchers of this project.


Meet PowerInfer: A Fast Large Language Model (LLM) on a Single Consumer-Grade GPU that Speeds up Machine Learning Model Inference By 11 Times

Generative Large Language Models (LLMs) are well known for their remarkable performance in a variety of tasks, including complex Natural Language Processing (NLP), creative writing, question answering, and code generation. In recent times, LLMs have been run on approachable local systems, including home PCs with consumer-grade GPUs for improved data privacy, customizable models, and lower inference costs. Local installations prioritize low latency over high throughput; however, LLMs are difficult to implement on consumer-grade GPUs because of high memory requirements.

These models, which are frequently autoregressive transformers, produce text token by token and, for each inference, need access to the complete model with hundreds of billions of parameters. This limitation is noticeable in local deployments because there is less space for parallel processing when handling individual requests. Two current strategies to deal with these memory problems are offloading and model compression.

In a recent study, a team of researchers presented PowerInfer, an effective LLM inference system designed for local deployments using a single consumer-grade GPU. PowerInfer reduces the requirement for expensive PCIe (Peripheral Component Interconnect Express) data transfers by preselecting and preloading hot-activated neurons onto the GPU offline and using online predictors to identify active neurons during runtime. 

The core idea behind PowerInfer’s design is to exploit the high locality inherent in LLM inference, which is characterized by a power-law distribution in neuron activation. This distribution shows that a small fraction of hot neurons consistently activate across different inputs, whereas the majority of cold neurons activate only in response to specific inputs.

The team has shared that PowerInfer is a GPU-CPU hybrid inference engine that makes use of this understanding. It preloads cold-activated neurons onto the CPU for computation and hot-activated neurons onto the GPU for instant access. By distributing the workload strategically, the GPU’s memory requirements are greatly reduced, and there are fewer data transfers between the CPU and GPU. 
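
The hot/cold split can be illustrated with a small simulation: profile how often each neuron of a feed-forward layer fires over sample inputs, pin the frequently active ("hot") neurons to the GPU, and leave the rest to the CPU. The thresholds and placement policy below are illustrative assumptions, not PowerInfer's actual implementation.

# Minimal sketch of partitioning neurons by activation frequency.
import numpy as np

rng = np.random.default_rng(0)
num_neurons, num_samples = 1024, 512

# Simulated activations for one FFN layer; a power-law-like skew means a few
# neurons fire on most inputs while most fire rarely.
fire_prob = rng.power(0.15, size=num_neurons)
activations = rng.random((num_samples, num_neurons)) < fire_prob

activation_freq = activations.mean(axis=0)  # fraction of inputs each neuron fires on
hot = np.argsort(activation_freq)[::-1][: num_neurons // 8]   # top 12.5% -> preload on GPU
cold = np.setdiff1d(np.arange(num_neurons), hot)              # remainder -> compute on CPU

covered = activations[:, hot].sum() / activations.sum()
print(f"hot neurons: {len(hot)}, share of all activations they cover: {covered:.2%}")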

PowerInfer integrates neuron-aware sparse operators and adaptive predictors to optimize performance further. Neuron-aware sparse operators directly interact with individual neurons, eliminating the need to operate on entire matrices, while adaptive predictors help identify and forecast active neurons during runtime. These optimizations enhance computational sparsity and effective neuron activation.

The team evaluated PowerInfer’s performance and reported an average token generation rate of 13.20 tokens per second and a peak of 29.08 tokens per second. These results were achieved using a single NVIDIA RTX 4090 GPU and a variety of LLMs, including the OPT-175B model. This performance falls only 18% short of the best-in-class server-grade A100 GPU, demonstrating PowerInfer’s effectiveness on mainstream hardware.

Upon evaluation, PowerInfer has also shown that it has the capability to run up to 11.69 times faster than the current llama.cpp system while retaining model fidelity. In conclusion, PowerInfer offers a significant boost in LLM inference speed, indicating its potential as a solution for advanced language model execution on desktop PCs with constrained GPU capabilities.

Check out the Paper and Github. All credit for this research goes to the researchers of this project.


Researchers from Genentech and Stanford University Develop an Iterative Perturb-seq Procedure Leveraging Machine Learning for Efficient Design of Perturbation Experiments

Basic information about gene and cell function is revealed by the expression response of a cell to a genetic perturbation. Perturb-seq is a method for pooled genetic screens that reads out the expression response to a perturbation with single-cell RNA sequencing (scRNA-seq). Perturb-seq allows for the engineering of cells to a certain state, sheds light on the gene regulation system, and aids in identifying target genes for therapeutic intervention.

The efficiency, scalability, and breadth of Perturb-Seq have all been augmented by recent technological developments. The number of tests needed to evaluate various perturbations multiplies exponentially due to the wide variety of biological contexts, cell types, states, and stimuli. This is because non-additive genetic interactions are a possibility. Executing all of the experiments directly becomes impractical when there are billions of possible configurations.

According to recent research, the results of perturbations can be predicted using machine learning models. They use pre-existing Perturb-seq datasets to train their algorithms, forecasting the expression results of unseen perturbations, individual genes, or combinations of genes. Although these models show promise, they are flawed due to a selection bias introduced by the original experiment’s design, which affected the biological circumstances and perturbations chosen for training. 

Genentech and Stanford University researchers introduce a new way of thinking about running a series of perturb-seq experiments to investigate a perturbation space. In this paradigm, the Perturb-seq assay is carried out in a wet-lab environment, and the machine learning model is implemented using an interleaving sequential optimal design approach. Data acquisition and re-training of the machine learning model occurs at each process stage. To ensure that the model can accurately forecast unprofiled perturbations, the researchers next use an optimal design technique to choose a set of perturbation experiments. To intelligently sample the perturbation space, one must consider the most informative and representative perturbations to the model while allowing for diversity. This approach allows the creation of a model that has adequately explored the perturbation space with minimal perturbation experiments done.
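
The interleaved design loop can be sketched at a high level as follows: train a predictor on the perturbations profiled so far, pick the next batch expected to be most informative, profile it, and repeat. The model, the acquisition rule (a simple diversity proxy), and the budget below are stand-ins, not the ITERPERT procedure itself.

# High-level sketch of an iterative optimal-design loop for perturbation profiling.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_perturbations, n_features, n_genes = 500, 64, 200
embeddings = rng.normal(size=(n_perturbations, n_features))   # per-perturbation features
true_effects = embeddings @ rng.normal(size=(n_features, n_genes))

def profile(idx):
    # Stand-in for running Perturb-seq on the selected perturbations.
    return true_effects[idx] + 0.1 * rng.normal(size=(len(idx), n_genes))

labeled = list(rng.choice(n_perturbations, size=20, replace=False))
responses = {i: y for i, y in zip(labeled, profile(np.array(labeled)))}

for round_ in range(3):  # a few wet-lab iterations under a budget
    model = Ridge().fit(embeddings[labeled], np.array([responses[i] for i in labeled]))
    unlabeled = np.setdiff1d(np.arange(n_perturbations), labeled)
    # Acquisition: prefer perturbations far (in feature space) from anything profiled.
    dists = np.min(np.linalg.norm(
        embeddings[unlabeled][:, None, :] - embeddings[labeled][None, :, :], axis=-1), axis=1)
    batch = unlabeled[np.argsort(dists)[::-1][:20]]
    for i, y in zip(batch, profile(batch)):
        responses[i] = y
    labeled += batch.tolist()
    print(f"round {round_}: profiled {len(labeled)} perturbations")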

Active learning is based on this principle, which has been extensively researched in machine learning. Document classification, medical imaging, and speech recognition are examples of the many areas that have put active learning into practice. The findings demonstrate that active learning methods that work require a large initial set of labeled examples—profiled perturbations in this case—along with several batches that add up to tens of thousands of labeled data points. The team also performed an economic analysis that shows such conditions are not feasible due to the time and money constraints of iterative Perturb-seq in the lab.

To address the issue of active learning under a budget for Perturb-seq data, the team proposes a novel approach termed ITERPERT (ITERative PERTurb-seq). Inspired by data-driven research, the work’s main takeaway is that it can be useful to supplement data evidence with publicly available prior knowledge sources, particularly in the early stages and when funds are tight. Examples of such prior knowledge include data on physical molecular interactions, such as protein complexes, Perturb-seq data from comparable systems, and large-scale genetic screens using other modalities, such as genome-scale optical pooling screens. This prior knowledge spans several forms of representation, including networks, text, images, and three-dimensional structures, which can be difficult to use in active learning. To get around this, the team defines reproducing kernel Hilbert spaces on all modalities and uses a kernel fusion approach to merge data from different sources.

They performed an intensive empirical investigation using a large-scale single-gene CRISPRi Perturb-seq dataset obtained in a cancer cell line (K562 cells). They benchmarked eight recent active learning methodologies to compare ITERPERT to other regularly used approaches. ITERPERT obtained accuracy levels comparable to the top active learning technique while using training data containing three times fewer perturbations. When considering batch effects throughout iterations, ITERPERT demonstrated strong performance in critical gene and genome-scale screens.

Check out the Paper and Github. All credit for this research goes to the researchers of this project.


This AI Paper from CMU Shows an in-depth Exploration of Gemini’s Language Abilities

Google’s Gemini Model has been in the talks ever since the day of its release. This recent addition to the long list of incredible language models has marked a significant milestone in the field of Artificial Intelligence (AI) and Machine Learning (ML). Gemini’s exceptional performance makes it the first to compete with the OpenAI GPT model series on a variety of tasks. The Ultra version of Gemini is said to perform better than GPT-4, and the Pro version is on par with GPT-3.5.

However, the full details of the evaluation and model projections have not been made public, which limits the capacity to replicate, closely examine, and thoroughly analyze the results, even in light of the potential relevance of these discoveries. To address this, in a recent study, a team of researchers from Carnegie Mellon University and BerriAI explored Gemini’s language production and its capabilities in depth.

The team conducted the study with two primary goals. The first was a third-party assessment of the capabilities of the Google Gemini and OpenAI GPT model classes, carried out with reproducible code and an open presentation of the results. The second was a thorough analysis of the outcomes to identify areas where one of the two model classes performs better than the other. The study also includes a brief comparison with the Mixtral model, which serves as a benchmark for the best-in-class open-source model.

Ten datasets have been included in the analysis, which thoroughly assesses different language proficiency levels. The tasks included reasoning, knowledge-based question answering, mathematical problem solving, language translation, following instructions, and code production. The evaluation datasets included WebArena for instruction-following, FLORES for language translation, and BigBenchHard for reasoning problems.

The assessment has offered a thorough comprehension of Gemini’s advantages and disadvantages in comparison to the OpenAI GPT models. The results have shown that Gemini Pro performs on all benchmarked tasks with accuracy that is nearly identical to, but marginally behind, that of the matching GPT 3.5 Turbo. The report goes beyond simply summarising the findings and explores the reasons behind some of Gemini’s performance lapses. Prominent examples include difficulties with multiple-digit numerical reasoning, sensitivity to multiple-choice response ordering, and problems with severe content filtering.
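
One of the reported weaknesses, sensitivity to multiple-choice answer ordering, can be probed with a simple check: ask the same question several times with the options shuffled and see whether the chosen answer stays consistent. The ask_model function below is a hypothetical stand-in for a model API call, not part of the paper's released code.

# Minimal sketch of an answer-ordering sensitivity probe.
import random

def ask_model(question, options):
    # Placeholder: a real probe would call the model API with a formatted prompt.
    return options[0]  # a degenerate "always pick the first option" model

def ordering_consistency(question, options, trials=10, seed=0):
    rng = random.Random(seed)
    answers = []
    for _ in range(trials):
        shuffled = options[:]
        rng.shuffle(shuffled)
        answers.append(ask_model(question, shuffled))
    return len(set(answers)) == 1  # True only if the answer ignores option order

q = "Which planet is known as the Red Planet?"
opts = ["Mars", "Venus", "Jupiter", "Mercury"]
print("order-invariant:", ordering_consistency(q, opts))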

The study has also highlighted the strengths of Gemini, including the creation of material in languages other than English and the deft management of lengthier and more intricate reasoning chains. These revelations offer a more nuanced perspective on the advantages and disadvantages of the Gemini models relative to their GPT equivalents.

Check out the Paper and Github. All credit for this research goes to the researchers of this project.


MIT Researchers Introduce a Novel Machine Learning Approach in Developing Mini-GPTs via Contextual Pruning

In recent AI advancements, optimizing large language models (LLMs) has been the most pressing issue. These advanced AI models offer unprecedented capabilities in processing and understanding natural language, yet they come with significant drawbacks. The primary challenges include their immense size, high computational demands, and substantial energy requirements. These factors make LLMs costly to operate and limit their accessibility and practical application, particularly for organizations without extensive resources. There is a growing need for methods to streamline these models, making them more efficient without sacrificing performance.

The current landscape of LLM optimization involves various techniques, with model pruning standing out as a prominent method. Model pruning focuses on reducing the size of neural networks by removing weights that are deemed non-critical. The idea is to strip down the model to its essential components, reducing its complexity and operational demands. Model pruning addresses the challenges of high costs and latency associated with running large models.

Additionally, identifying trainable subnetworks within larger models, known as ‘lottery tickets,’ offers a path to achieving comparable accuracy with a significantly reduced model footprint.

The proposed solution by the MIT researchers is a novel technique called ‘contextual pruning,’ aimed at developing efficient Mini-GPTs. This approach tailors the pruning process to specific domains, such as law, healthcare, and finance. By analyzing and selectively removing weights less critical for certain domains, the method aims to maintain or enhance the model’s performance while drastically reducing its size and resource requirements. This targeted pruning strategy represents a significant leap forward in making LLMs more versatile and sustainable.

The methodology of contextual pruning involves meticulous analysis and pruning of linear layers, activation layers, and embedding layers in LLMs. The research team conducted comprehensive studies to identify less crucial weights for maintaining performance in different domains. This process included a multi-faceted pruning approach, targeting various model components to optimize efficiency.
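
As a concrete illustration of pruning a linear layer with a domain-specific calibration batch, here is a minimal sketch: score each weight by its magnitude times the average input activation and zero out the lowest-scoring fraction. This shows the general idea only; it is not the paper's exact contextual-pruning criterion.

# Minimal sketch of activation-informed pruning of one linear layer.
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(512, 512)
calibration = torch.randn(256, 512)   # stand-in for activations from domain text (e.g., legal)

with torch.no_grad():
    input_scale = calibration.abs().mean(dim=0)         # per-input-feature importance
    scores = layer.weight.abs() * input_scale            # broadcast over output rows
    threshold = torch.quantile(scores.flatten(), 0.5)    # prune the bottom 50%
    mask = (scores >= threshold).float()
    layer.weight.mul_(mask)

print(f"remaining nonzero weights: {int(mask.sum())} / {mask.numel()}")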

The performance of Mini-GPTs post-contextual pruning was rigorously evaluated using metrics like perplexity and multiple-choice question testing. The promising results showed that the pruned models generally retained or improved their performance across various datasets after pruning and fine-tuning. These results indicate that the models preserved their core capabilities despite the reduction in size and complexity. In some instances, the pruned models even outperformed their unpruned counterparts in specific tasks, highlighting the effectiveness of contextual pruning.

In conclusion, this research marks a significant stride in optimizing LLMs for practical use. The development of Mini-GPTs through contextual pruning not only addresses the challenges of size and resource demands but also opens up new possibilities for applying LLMs in diverse domains. Future directions include refinement of pruning techniques, application to larger datasets, integration with other optimization methods, and exploration of newer model architectures. This research paves the way for more accessible, efficient, and versatile use of LLMs across various industries and applications.

Check out the Paper and Github. All credit for this research goes to the researchers of this project.


Microsoft Azure AI Widens Model Selection with Llama 2 and GPT-4 Turbo with Vision

In a recent move, Microsoft’s Azure AI platform has expanded its range by introducing two advanced AI models, Llama 2 and GPT-4 Turbo with Vision. This addition marks a significant expansion in the platform’s AI capabilities.

The team at Microsoft Azure AI recently announced the arrival of Llama 2, a set of models developed by Meta, into the Azure AI Model as a Service (MaaS). These include different versions like Llama-2-7b and Llama-2-13b, specializing in tasks such as generating text and completing conversations. Furthermore, these models are now accessible through simplified API endpoints, making it easier for businesses to use without the complexities of managing cloud execution instances.

At the same time, Microsoft unveiled OpenAI GPT-4 Turbo with Vision, an innovative multimodal model that combines language understanding and image analysis. This integration allows the model to analyze images and provide text-based responses to queries, significantly enhancing its capabilities. GPT-4 with Vision is now available for public testing on platforms like Azure OpenAI Service and Azure AI Studio, offering features such as video prompts, object identification in images using text descriptions, and improved optical character recognition (OCR).
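
For readers who want to try this, here is an illustrative call to a GPT-4 Turbo with Vision deployment on Azure OpenAI using the openai Python SDK (v1.x). The deployment name, endpoint, and API version are placeholders you would replace with your own resource's values.

# Illustrative Azure OpenAI chat call with an image input; values are placeholders.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_version="2023-12-01-preview",  # assumed preview version; check your resource
)

response = client.chat.completions.create(
    model="gpt-4-vision",  # your deployment name, not the literal model id
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe the objects visible in this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
    max_tokens=300,
)
print(response.choices[0].message.content)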

Recognizing the need for diverse AI sources, especially following recent developments at OpenAI, Microsoft reiterated its commitment to expanding its AI offerings while ensuring secure and responsible AI governance in the Microsoft cloud. The team emphasized the expansion of fine-tuning capabilities for various models, enabling developers and data scientists to customize models like GPT-35-Turbo, Babbage-002, and Davinci-002 to suit specific needs.

Azure AI’s model catalog has also been enriched with six new models, including Microsoft’s small language models, Phi-2 and Orca 2, each with billions of parameters. These are accompanied by DeciLM, DeciDiffusion, DeciCoder, and Mistral AI’s Mixtral 8x7B, catering to diverse requirements in text generation, text-to-image generation, code completion, and more.

Azure AI Studio now provides benchmark test data in the model’s test section to assist enterprise users in navigating this range of models. This data is a helpful resource, aiding users in making informed decisions when selecting models for specific applications or needs.

Introducing these models coincides with recent leadership changes within OpenAI, prompting Microsoft to stress the importance of diversifying AI model sources to mitigate risks. Despite affirming continued collaboration with OpenAI, this strategic expansion underscores Microsoft’s commitment to broadening its AI offerings and ensuring a versatile range of tools and solutions for its users.

In summary, Microsoft’s recent enhancements in Azure AI, including the addition of Llama 2, GPT-4 Turbo with Vision, expanded fine-tuning capabilities, and the introduction of diverse models, reflect a concerted effort to strengthen its AI offerings, providing users with a comprehensive suite of advanced tools and solutions.

Amazon SageMaker model parallel library now accelerates PyTorch FSDP workloads by up to 20%

Large language model (LLM) training has surged in popularity over the last year with the release of several popular models such as Llama 2, Falcon, and Mistral. Customers are now pre-training and fine-tuning LLMs ranging from 1 billion to over 175 billion parameters to optimize model performance for applications across industries, from healthcare to finance and marketing.
Training performant models at this scale can be a challenge. Highly accurate LLMs can require terabytes of training data and thousands or even millions of hours of accelerator compute time to achieve target accuracy. To complete training and launch products in a timely manner, customers rely on parallelism techniques to distribute this enormous workload across up to thousands of accelerator devices. However, these parallelism techniques can be difficult to use: different techniques and libraries are only compatible with certain workloads or restricted to certain model architectures, training performance can be highly sensitive to obscure configurations, and the state of the art is quickly evolving. As a result, machine learning practitioners must spend weeks of preparation to scale their LLM workloads to large clusters of GPUs.
In this post, we highlight new features of the Amazon SageMaker model parallel (SMP) library that simplify the large model training process and help you train LLMs faster. In particular, we cover the SMP library’s new simplified user experience that builds on open source PyTorch Fully Sharded Data Parallel (FSDP) APIs, expanded tensor parallel functionality that enables training models with hundreds of billions of parameters, and performance optimizations that reduce model training time and cost by up to 20%.
To learn more about the SageMaker model parallel library, refer to SageMaker model parallelism library v2 documentation. You can also refer to our example notebooks to get started.
New features that simplify and accelerate large model training
This post discusses the latest features included in the v2.0 release of the SageMaker model parallel library. These features improve the usability of the library, expand functionality, and accelerate training. In the following sections, we summarize the new features and discuss how you can use the library to accelerate your large model training.
Aligning SMP with open source PyTorch
Since its launch in 2020, SMP has enabled high-performance, large-scale training on SageMaker compute instances. With this latest major version release of SMP, the library simplifies the user experience by aligning its APIs with open source PyTorch.
PyTorch offers Fully Sharded Data Parallel (FSDP) as its main method for supporting large training workloads across many compute devices. As demonstrated in the following code snippet, SMP's updated APIs for techniques such as sharded data parallelism mirror those of PyTorch. You can simply run import torch.sagemaker and use it in place of torch.

## training_script.py
import torch.sagemaker as tsm
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

tsm.init()

# Set up a PyTorch model
model = ...

# Wrap the PyTorch model using the PyTorch FSDP module
model = FSDP(
    model,
    ...
)

optimizer = ...

With these updates to SMP’s APIs, you can now realize the performance benefits of SageMaker and the SMP library without overhauling your existing PyTorch FSDP training scripts. This paradigm also allows you to use the same code base when training on premises as on SageMaker, simplifying the user experience for customers who train in multiple environments.
For more information on how to enable SMP with your existing PyTorch FSDP training scripts, refer to Get started with SMP.
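For example, one way to keep a single training script portable across environments is to treat the SMP import as optional. The following is a minimal sketch (not taken from the SMP documentation; the backend choice and the fallback path are assumptions) that uses SMP when torch.sagemaker is available and continues with plain open source PyTorch otherwise:

import torch.distributed as dist

# Initialize the process group as usual (the backend choice here is an assumption)
dist.init_process_group(backend="nccl")

try:
    # Present on SageMaker when the SMP v2 library is installed
    import torch.sagemaker as tsm
    tsm.init()
except ImportError:
    # On premises or in other environments, continue with plain PyTorch FSDP
    pass

# From here on, the FSDP model-wrapping code is the same in both environments.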
Integrating tensor parallelism to enable training on massive clusters
This release of SMP also expands PyTorch FSDP’s capabilities to include tensor parallelism techniques. One problem with using sharded data parallelism alone is that you can encounter convergence problems as you scale up your cluster size. This is because sharding parameters, gradients, and the optimizer state across data parallel ranks also increases your global batch size; on large clusters, this global batch size can be pushed beyond the threshold below which the model would converge. You need to incorporate an additional parallelism technique that doesn’t require an increase in global batch size as you scale your cluster.
To mitigate this problem, SMP v2.0 introduces the ability to compose sharded data parallelism with tensor parallelism. Tensor parallelism allows the cluster size to increase without changing the global batch size or affecting model convergence. With this feature, you can safely increase training throughput by provisioning clusters with 256 nodes or more.
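To make the batch size reasoning concrete, the following back-of-the-envelope sketch (the cluster and batch sizes are assumed values, not benchmarks) shows why composing tensor parallelism with sharded data parallelism keeps the global batch size fixed as the cluster grows:

per_device_batch = 4
world_size = 2048                  # for example, 256 nodes x 8 GPUs

# Sharded data parallelism alone: every rank is a data parallel rank,
# so the global batch size grows with the cluster.
global_batch_sdp = per_device_batch * world_size                # 8,192 sequences

# Composing with tensor parallelism of degree 8: ranks are grouped into
# tensor parallel groups, leaving world_size / 8 data parallel replicas.
tensor_parallel_degree = 8
data_parallel_replicas = world_size // tensor_parallel_degree   # 256
global_batch_tp = per_device_batch * data_parallel_replicas     # 1,024 sequences

print(global_batch_sdp, global_batch_tp)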
Today, tensor parallelism with PyTorch FSDP is only available with SMP v2. SMP v2 allows you to enable this technique by changing just a few lines of code, unlocking stable training even on large clusters. SMP v2 integrates with Transformer Engine for its implementation of tensor parallelism and makes it compatible with the PyTorch FSDP APIs. You can enable PyTorch FSDP and SMP tensor parallelism simultaneously without making any changes to your PyTorch model or PyTorch FSDP configuration. The following code snippets show how to set up the SMP configuration dictionary in JSON format and how to add the SMP initialization module torch.sagemaker.init() to your training script; the configuration dictionary is passed to the backend when you start the training job.
The SMP configuration is as follows:

{
    "tensor_parallel_degree": 8,
    "tensor_parallel_seed": 0
}

In your training script, use the following code:

import torch.sagemaker as tsm
tsm.init()

# Build the Hugging Face model, then let SMP apply tensor parallelism to it
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_config(...)
model = tsm.transform(model)

To learn more about using tensor parallelism in SMP, refer to the tensor parallelism section of our documentation.
Use advanced features to accelerate model training by up to 20%
In addition to enabling distributed training on clusters with hundreds of instances, SMP also offers optimization techniques that can accelerate model training by up to 20%. In this section, we highlight a few of these optimizations. To learn more, refer to the core features section of our documentation.
Hybrid sharding
Sharded data parallelism is a memory-saving distributed training technique that splits the state of a model (model parameters, gradients, and optimizer states) across devices. This smaller memory footprint allows you to fit a larger model into your cluster or increase the batch size. However, sharded data parallelism also increases the communication requirements of your training job because the sharded model artifacts are frequently gathered from different devices during training. In this way, the degree of sharding is an important configuration that trades off memory consumption and communication overhead.
By default, PyTorch FSDP shards model artifacts across all of the accelerator devices in your cluster. Depending on your training job, this method of sharding could increase communication overhead and create a bottleneck. To help with this, the SMP library offers configurable hybrid sharded data parallelism on top of PyTorch FSDP. This feature allows you to set the degree of sharding that is optimal for your training workload. Simply specify the degree of sharding in a configuration JSON object and include it in your SMP training script.
The SMP configuration is as follows:

{ "hybrid_shard_degree": 16 }

To learn more about the advantages of hybrid sharded data parallelism, refer to Near-linear scaling of gigantic-model training on AWS. For more information on implementing hybrid sharding with your existing FSDP training script, see hybrid sharded data parallelism in our documentation.
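As a rough illustration of this trade-off (the model size and precision below are assumptions, not SMP benchmarks), the hybrid_shard_degree bounds both the size of the weight shard each device holds and the scope of the communication needed to rebuild layers:

model_params = 70e9               # assume a 70B-parameter model
bytes_per_param = 2               # bf16 weights
hybrid_shard_degree = 16          # value from the SMP configuration above

# Each device holds 1/hybrid_shard_degree of the weights...
weight_bytes_per_device = model_params * bytes_per_param / hybrid_shard_degree

# ...and gather operations stay within a 16-device sharding group instead of
# spanning the entire cluster, which reduces communication overhead.
print(f"{weight_bytes_per_device / 1e9:.2f} GB of weights per device")  # 8.75 GB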
Use the SMDDP collective communication operations optimized for AWS infrastructure
You can use the SMP library with the SageMaker distributed data parallelism (SMDDP) library to accelerate your distributed training workloads. SMDDP includes an optimized AllGather collective communication operation designed for best performance on SageMaker p4d and p4de accelerated instances. In distributed training, collective communication operations are used to synchronize information across GPU workers. AllGather is one of the core collective communication operations typically used in sharded data parallelism to materialize the layer parameters before forward and backward computation steps. For training jobs that are bottlenecked by communication, faster collective operations can reduce training time and cost with no side effects on convergence.
To use the SMDDP library, you only need to add two lines of code to your training script:

import torch.distributed as dist

# Initialize with SMDDP
import smdistributed.dataparallel.torch.torch_smddp
dist.init_process_group(backend="smddp")  # Replacing "nccl"

# Initialize with SMP
import torch.sagemaker as tsm
tsm.init()

In addition to SMP, SMDDP supports open source PyTorch FSDP and DeepSpeed. To learn more about the SMDDP library, see Run distributed training with the SageMaker distributed data parallelism library.
Activation offloading
Typically, the forward pass of model training computes activations at each layer and keeps them in GPU memory until the backward pass for the corresponding layer finishes. These stored activations can consume significant GPU memory during training. Activation offloading is a technique that instead moves these tensors to CPU memory after the forward pass and later fetches them back to GPU when they are needed. This approach can substantially reduce GPU memory usage during training.
Although PyTorch supports activation offloading, its implementation is inefficient and can cause GPUs to be idle while activations are fetched back from CPU during a backward pass. This can cause significant performance degradation when using activation offloading.
SMP v2 offers an optimized activation offloading algorithm that can improve training performance. SMP’s implementation pre-fetches activations before they are needed on the GPU, reducing idle time.
Because SMP is built on top of PyTorch’s APIs, enabling optimized activation offloading requires just a few lines of code change. Simply add the associated configurations (sm_activation_offloading and activation_loading_horizon parameters) and include them in your training script.
The SMP configuration is as follows:

{
    "activation_loading_horizon": 2,
    "sm_activation_offloading": True
}

In the training script, use the following code:

import torch.sagemaker as tsm
tsm.init()

# Native PyTorch module for activation offloading
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    apply_activation_checkpointing,
    offload_wrapper,
)

model = FSDP(...)

# Activation offloading requires activation checkpointing;
# check_fn is a user-defined policy that selects which submodules
# (for example, transformer layers) to checkpoint.
apply_activation_checkpointing(
    model,
    check_fn=checkpoint_tformer_layers_policy,
)

model = offload_wrapper(model)

To learn more about the open source PyTorch checkpoint tools for activation offloading, see the checkpoint_wrapper.py script in the PyTorch GitHub repository and Activation Checkpointing in the PyTorch blog post Scaling Multimodal Foundation Models in TorchMultimodal with Pytorch Distributed. To learn more about SMP’s optimized implementation of activation offloading, see the activation offloading section of our documentation.
Beyond hybrid sharding, SMDDP, and activation offloading, SMP offers additional optimizations that can accelerate your large model training workload. This includes optimized activation checkpointing, delayed parameter initialization, and others. To learn more, refer to the core features section of our documentation.
Conclusion
As datasets, model sizes, and training clusters continue to grow, efficient distributed training is increasingly critical for timely and affordable model and product delivery. The latest release of the SageMaker model parallel library helps you achieve this by reducing code change and aligning with PyTorch FSDP APIs, enabling training on massive clusters via tensor parallelism and optimizations that can reduce training time by up to 20%.
To get started with SMP v2, refer to our documentation and our sample notebooks.

About the Authors
Robert Van Dusen is a Senior Product Manager with Amazon SageMaker. He leads frameworks, compilers, and optimization techniques for deep learning training.
Luis Quintela is the Software Developer Manager for the AWS SageMaker model parallel library. In his spare time, he can be found riding his Harley in the SF Bay Area.
Gautam Kumar is a Software Engineer with AWS AI Deep Learning. He is passionate about building tools and systems for AI. In his spare time, he enjoys biking and reading books.
Rahul Huilgol is a Senior Software Development Engineer in Distributed Deep Learning at Amazon Web Services.

Mixtral-8x7B is now available in Amazon SageMaker JumpStart

Today, we are excited to announce that the Mixtral-8x7B large language model (LLM), developed by Mistral AI, is available for customers through Amazon SageMaker JumpStart to deploy with one click for running inference. The Mixtral-8x7B LLM is a pre-trained sparse mixture-of-experts model, based on a 7-billion-parameter backbone with eight experts per feed-forward layer. You can try out this model with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms and models so you can quickly get started with ML. In this post, we walk through how to discover and deploy the Mixtral-8x7B model.
What is Mixtral-8x7B
Mixtral-8x7B is a foundation model developed by Mistral AI, supporting English, French, German, Italian, and Spanish text, with code generation abilities. It supports a variety of use cases such as text summarization, classification, text completion, and code completion. It behaves well in chat mode. To demonstrate the straightforward customizability of the model, Mistral AI has also released a Mixtral-8x7B-instruct model for chat use cases, fine-tuned using a variety of publicly available conversation datasets. Mixtral models have a large context length of up to 32,000 tokens.
Mixtral-8x7B provides significant performance improvements over previous state-of-the-art models. Its sparse mixture-of-experts architecture enables it to achieve better results on 9 out of 12 natural language processing (NLP) benchmarks tested by Mistral AI. Mixtral matches or exceeds the performance of models up to 10 times its size. By utilizing only a fraction of its parameters per token, it achieves faster inference speeds and lower computational cost compared to dense models of equivalent size: for example, it has 46.7 billion parameters in total but uses only 12.9 billion per token. This combination of high performance, multilingual support, and computational efficiency makes Mixtral-8x7B an appealing choice for NLP applications.
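As a quick sanity check on those numbers (a simplified view, not Mistral AI's published parameter breakdown), routing each token through only two of the eight experts means most expert weights sit idle for any given token:

total_params = 46.7e9      # all weights, including all eight experts per layer
active_params = 12.9e9     # weights actually used for a single token

fraction_active = active_params / total_params
print(f"{fraction_active:.0%} of the weights are used per token")   # ~28%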
The model is made available under the permissive Apache 2.0 license, for use without restrictions.
What is SageMaker JumpStart
With SageMaker JumpStart, ML practitioners can choose from a growing list of best-performing foundation models. ML practitioners can deploy foundation models to dedicated Amazon SageMaker instances within a network isolated environment, and customize models using SageMaker for model training and deployment.
You can now discover and deploy Mixtral-8x7B with a few clicks in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK, enabling you to derive model performance and MLOps controls with SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, or container logs. The model is deployed in an AWS secure environment and under your VPC controls, helping ensure data security.
Discover models
You can access Mixtral-8x7B foundation models through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we go over how to discover the models in SageMaker Studio.
SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all ML development steps, from preparing data to building, training, and deploying your ML models. For more details on how to get started and set up SageMaker Studio, refer to Amazon SageMaker Studio.
In SageMaker Studio, you can access SageMaker JumpStart by choosing JumpStart in the navigation pane.

From the SageMaker JumpStart landing page, you can search for “Mixtral” in the search box. You will see search results showing Mixtral 8x7B and Mixtral 8x7B Instruct.

You can choose the model card to view details about the model, such as the license, the data used to train it, and how to use it. You will also find the Deploy button, which you can use to deploy the model and create an endpoint.

Deploy a model
Deployment starts when you choose Deploy. After deployment finishes, an endpoint is created. You can test the endpoint by passing a sample inference request payload or by selecting the testing option using the SDK. When you select the option to use the SDK, you will see example code that you can use in your preferred notebook editor in SageMaker Studio.
To deploy using the SDK, we start by selecting the Mixtral-8x7B model, specified by the model_id with value huggingface-llm-mixtral-8x7b. You can deploy any of the selected models on SageMaker with the following code. Similarly, you can deploy Mixtral-8x7B instruct using its own model ID:

from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="huggingface-llm-mixtral-8x7b")
predictor = model.deploy()

This deploys the model on SageMaker with default configurations, including the default instance type and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel.
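For example, a minimal sketch of overriding the defaults might look like the following (the instance type shown is only an assumption; check which instance types the model supports in your Region):

from sagemaker.jumpstart.model import JumpStartModel

# Override the default instance type (example value only)
model = JumpStartModel(
    model_id="huggingface-llm-mixtral-8x7b",
    instance_type="ml.p4d.24xlarge",
)
predictor = model.deploy()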
After it’s deployed, you can run inference against the deployed endpoint through the SageMaker predictor:

payload = {"inputs": "Hello!"}
predictor.predict(payload)

Example prompts
You can interact with a Mixtral-8x7B model like any standard text generation model, where the model processes an input sequence and outputs predicted next words in the sequence. In this section, we provide example prompts.
Code generation
Using the endpoint deployed in the preceding example, we can send code generation prompts like the following:

# Code generation
payload = {
    "inputs": "Write a program to compute factorial in python:",
    "parameters": {
        "max_new_tokens": 200,
    },
}
predictor.predict(payload)

You get the following output:

Input Text: Write a program to compute factorial in python:
Generated Text:
Factorial of a number is the product of all the integers from 1 to that number.

For example, factorial of 5 is 1*2*3*4*5 = 120.

Factorial of 0 is 1.

Factorial of a negative number is not defined.

The factorial of a number can be written as n!.

For example, 5! = 120.

## Write a program to compute factorial in python

```
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)

print(factorial(5))
```

Output:

```
120
```

## Explanation:

In the above program, we have defined a function called factorial which takes a single argument n.

If n is equal to 0, then we return 1.

Otherwise, we return n multiplied by the factorial of n-1.

We then call the factorial function with the argument 5 and print the result.

The output of the program is 120, which is the factorial of 5.

## How to compute factorial in python

In the above program, we have used a recursive function to compute the factorial of a number.

A recursive function is a function that calls itself.

In this case, the factorial function calls itself with the argument n-1.

This process continues until n is equal to 0, at which point the function returns 1.

The factorial of a number can also be computed using a loop.

For example, the following program uses a for loop to compute the factorial of a number:

```
def factorial(n):
    result = 1
    for i in range(1, n+1):
        result *= i
    return result

Sentiment analysis prompt
You can perform sentiment analysis using a prompt like the following with Mixtral 8x7B:

payload = {
    "inputs": """
Tweet: "I hate it when my phone battery dies."
Sentiment: Negative

Tweet: "My day has been :+1:"
Sentiment: Positive

Tweet: "This is the link to the article"
Sentiment: Neutral

Tweet: "This new music video was incredibile"
Sentiment:""",
    "parameters": {
        "max_new_tokens": 2,
    },
}
predictor.predict(payload)

You get the following output:

Input Text: Tweet: "I hate it when my phone battery dies."
Sentiment: Negative
###
Tweet: "My day has been :+1:"
Sentiment: Positive
###
Tweet: "This is the link to the article"
Sentiment: Neutral
###
Tweet: "This new music video was incredibile"
Sentiment:
Generated Text:  Positive

Question answering prompts
You can use a question answering prompt like the following with Mixtral-8x7B:

# Question answering
payload = {
“inputs”: “Could you remind me when was the C programming language invented?”,
“parameters”: {
“max_new_tokens”: 100,
},
}
query_endpoint(payload)

You get the following output:

Input Text: Could you remind me when was the C programming language invented?
Generated Text:

C was invented in 1972 by Dennis Ritchie at Bell Labs.

C is a general-purpose programming language. It was invented to write the UNIX operating system.

C is a structured programming language. It is a middle-level language. It is a procedural language.

C is a compiled language. It is a portable language.

C is a case-sensitive language. It is a free-form language

Mixtral-8x7B Instruct
The instruction-tuned version of Mixtral-8x7B accepts formatted instructions where conversation roles must start with a user prompt and alternate between user instruction and assistant (model answer). The instruction format must be strictly respected, otherwise the model will generate sub-optimal outputs. The template used to build a prompt for the Instruct model is defined as follows:

<s> [INST] Instruction [/INST] Model answer</s> [INST] Follow-up instruction [/INST]

Note that <s> and </s> are special tokens for beginning of string (BOS) and end of string (EOS), whereas [INST] and [/INST] are regular strings.
The following code shows how you can format the prompt in instruction format:

from typing import Dict, List

def format_instructions(instructions: List[Dict[str, str]]) -> str:
    """Format instructions where conversation roles must alternate user/assistant/user/assistant/..."""
    prompt: List[str] = []
    for user, answer in zip(instructions[::2], instructions[1::2]):
        prompt.extend(["<s>", "[INST] ", (user["content"]).strip(), " [/INST] ", (answer["content"]).strip(), "</s>"])
    prompt.extend(["<s>", "[INST] ", (instructions[-1]["content"]).strip(), " [/INST] ", "</s>"])
    return "".join(prompt)

def print_instructions(prompt: str, response: str) -> None:
    bold, unbold = '\033[1m', '\033[0m'
    print(f"{bold}> Input{unbold}\n{prompt}\n\n{bold}> Output{unbold}\n{response[0]['generated_text']}\n")

Knowledge retrieval
You can use the following code for a knowledge retrieval prompt:

instructions = [{"role": "user", "content": "what is the recipe of mayonnaise?"}]
prompt = format_instructions(instructions)
payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 315, "do_sample": True}
}
response = predictor.predict(payload)
print_instructions(prompt, response)

You get the following output:

> Input
<s>[INST] what is the recipe of mayonnaise? [/INST]</s>

> Output
Here is a simple recipe for homemade mayonnaise:

Ingredients:

* 1 egg yolk
* 1 tablespoon of mustard (Dijon is a good choice)
* 1 cup of oil (canola or vegetable oil work well)
* 1-2 tablespoons of white vinegar or lemon juice
* Salt to taste

Instructions:

1. In a medium bowl, whisk together the egg yolk and mustard. It is important that these ingredients are at room temperature.
2. Start adding the oil drop by drop, whisking continuously. This step is crucial to ensure that the mayonnaise emulsifies and does not separate.
3. Once you have added about a quarter of the oil and the mixture has thickened, you can start adding the oil in a thin stream, still whisking constantly.
4. When all the oil has been incorporated and the mayonnaise is thick, whisk in the vinegar or lemon juice and salt to taste.
5. If the mayonnaise is too thick, you can thin it with a little water.
6. Store the mayonnaise in the refrigerator and use within a few days.

Note: It is important to use pasteurized eggs or egg yolks when making homemade mayonnaise to reduce the risk of foodborne illness.

Coding
Mixtral models demonstrate strong benchmark performance on coding tasks, as shown in the following example:

instructions = [
    {
        "role": "user",
        "content": "In Bash, how do I list all text files in the current directory (excluding subdirectories) that have been modified in the last month?",
    }
]
prompt = format_instructions(instructions)
payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 256, "do_sample": True, "temperature": 0.2}
}
response = predictor.predict(payload)
print_instructions(prompt, response)

You get the following output:

> Input
<s>[INST] In Bash, how do I list all text files in the current directory (excluding subdirectories) that have been modified in the last month? [/INST]</s>
> Output
To list all text files in the current directory that have been modified in the last month, you can use a combination of the `find` and `grep` commands in Bash. Here's the command you're looking for:

```bash
find . -maxdepth 1 -type f -name "*.txt" -mtime -30
```

Let's break down this command:

- `find .` starts a search in the current directory.
- `-maxdepth 1` limits the search to the current directory only (excluding subdirectories).
- `-type f` specifies that you're looking for files.
- `-name "*.txt"` filters the results to only include files with a `.txt` extension.
- `-mtime -30` filters the results to only include files modified within the last 30 days.

This command will output the paths of all text files in the current directory that have been modified in the last month.

Mathematics and reasoning
Mixtral models also perform well on mathematical reasoning benchmarks:

instructions = [
    {
        "role": "user",
        "content": "I bought an ice cream for 6 kids. Each cone was $1.25 and I paid with a $10 bill. How many dollars did I get back? Explain first before answering.",
    }
]
prompt = format_instructions(instructions)
payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 600, "do_sample": True, "temperature": 0.2}
}
response = predictor.predict(payload)
print_instructions(prompt, response)

Mixtral works through the math step by step, as shown in the following output:

> Input
<s>[INST] I bought an ice cream for 6 kids. Each cone was $1.25 and I paid with a $10 bill. How many dollars did I get back? Explain first before answering. [/INST] </s>
> Output
First, let’s calculate the total cost of the ice cream cones. Since each cone costs $1.25 and you bought 6 cones, the total cost would be:

Total cost = Cost per cone * Number of cones
Total cost = $1.25 * 6
Total cost = $7.50

Next, subtract the total cost from the amount you paid with the $10 bill to find out how much change you got back:

Change = Amount paid – Total cost
Change = $10 – $7.50
Change = $2.50

So, you got $2.50 back.

Clean up
After you’re done running the notebook, delete all resources that you created in the process so your billing is stopped. Use the following code:

predictor.delete_model()
predictor.delete_endpoint()

Conclusion

In this post, we showed you how to get started with Mixtral-8x7B in SageMaker Studio and deploy the model for inference. Because foundation models are pre-trained, they can help lower training and infrastructure costs and enable customization for your use case. Visit SageMaker JumpStart in SageMaker Studio now to get started.
Resources

SageMaker JumpStart documentation
SageMaker JumpStart foundation models documentation
SageMaker JumpStart product detail page
SageMaker JumpStart model catalog

About the authors
Rachna Chadha is a Principal Solution Architect AI/ML in Strategic Accounts at AWS. Rachna is an optimist who believes that ethical and responsible use of AI can improve society in the future and bring economic and social prosperity. In her spare time, Rachna likes spending time with her family, hiking, and listening to music.
Dr. Kyle Ulrich is an Applied Scientist with the Amazon SageMaker built-in algorithms team. His research interests include scalable machine learning algorithms, computer vision, time series, Bayesian non-parametrics, and Gaussian processes. His PhD is from Duke University and he has published papers in NeurIPS, Cell, and Neuron.
Christopher Whitten is a software developer on the JumpStart team. He helps scale model selection and integrate models with other SageMaker services. Chris is passionate about accelerating the ubiquity of AI across a variety of business domains.
Dr. Fabio Nonato de Paula is a Senior Manager, Specialist GenAI SA, helping model providers and customers scale generative AI in AWS. Fabio has a passion for democratizing access to generative AI technology. Outside of work, you can find Fabio riding his motorcycle in the hills of Sonoma Valley or reading ComiXology.
Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker built-in algorithms and helps develop machine learning algorithms. He got his PhD from University of Illinois Urbana-Champaign. He is an active researcher in machine learning and statistical inference, and has published many papers in NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.
Karl Albertsen leads product, engineering, and science for Amazon SageMaker Algorithms and JumpStart, SageMaker’s machine learning hub. He is passionate about applying machine learning to unlock business value.