Researchers at UC San Diego Propose DrS: A Novel Machine Learning Appr …

The success of many reinforcement learning (RL) techniques relies on dense reward functions, but designing them can be difficult due to expertise requirements and trial and error. Sparse rewards, like binary task completion signals, are easier to obtain but pose challenges for RL algorithms, such as exploration. Consequently, the question emerges: Can dense reward functions be learned in a data-driven manner to address these challenges? 

Existing research on reward learning often overlooks the importance of reusing rewards for new tasks. In learning reward functions from demonstrations, known as inverse RL, methods like adversarial imitation learning (AIL) have gained traction. Inspired by GANs, AIL employs a policy network and a discriminator to generate and distinguish trajectories, respectively. However, AIL’s rewards are not reusable across tasks, limiting its ability to generalize to new tasks.

Researchers from UC San Diego present Dense reward learning from Stages (DrS), a novel approach to learning reusable rewards that uses sparse rewards as the supervision signal instead of the conventional signal of classifying demonstration versus agent trajectories. A discriminator is trained to classify success and failure trajectories based on binary sparse rewards: higher rewards are assigned to transitions in success trajectories and lower rewards to transitions in failure trajectories, ensuring consistency throughout training. Once training is completed, the rewards become reusable. Expert demonstrations can be included as success trajectories, but they are not mandatory; only sparse rewards are needed, which are often inherent in task definitions.

The DrS model consists of two phases: Reward Learning and Reward Reuse. In the Reward Learning phase, a classifier is trained to differentiate between successful and unsuccessful trajectories using sparse rewards; this classifier serves as a dense reward generator. The Reward Reuse phase applies the learned dense reward to train new RL agents on test tasks. For multi-stage tasks, stage-specific discriminators are trained to provide a dense reward for each stage, ensuring effective guidance through task progression.
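The core mechanism, training a classifier on success versus failure trajectories and reading its output as a dense reward, can be illustrated with a toy, self-contained sketch. This is not the authors' implementation: the data, the logistic-regression discriminator, and all hyperparameters below are invented for illustration.

```python
import numpy as np

# Toy "transitions": success-trajectory states cluster near +1,
# failure-trajectory states cluster near -1.
rng = np.random.default_rng(0)
success = rng.normal(loc=1.0, scale=0.5, size=(200, 2))
failure = rng.normal(loc=-1.0, scale=0.5, size=(200, 2))

# A logistic-regression "discriminator" trained to classify
# success (label 1) versus failure (label 0) transitions.
X = np.vstack([success, failure])
y = np.concatenate([np.ones(200), np.zeros(200)])
w, b = np.zeros(2), 0.0
for _ in range(500):  # plain gradient descent on the logistic loss
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.5 * (X.T @ (p - y) / len(y))
    b -= 0.5 * np.mean(p - y)

def dense_reward(state):
    """Discriminator probability of 'success', used as a dense reward."""
    return 1.0 / (1.0 + np.exp(-(state @ w + b)))

# Transitions from success trajectories receive higher reward.
print(dense_reward(np.array([1.0, 1.0])),
      dense_reward(np.array([-1.0, -1.0])))
```

Once trained, `dense_reward` no longer depends on the trajectories it was fit on, which is what makes the learned reward reusable for training new agents.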

The proposed model was evaluated on three challenging physical manipulation tasks: Pick-and-Place, Turn Faucet, and Open Cabinet Door, each containing various objects. The evaluation focused on the reusability of learned rewards, utilizing non-overlapping training and test sets for each task family. During the Reward Learning phase, rewards were learned by training agents to manipulate training objects, and then these rewards were reused to train agents on test objects in the Reward Reuse phase. The study utilized the Soft Actor-Critic (SAC) algorithm for evaluation. Results demonstrated that the learned rewards outperformed baseline rewards across all task families, sometimes rivaling human-engineered rewards. Semi-sparse rewards exhibited limited success, while other reward learning methods failed to achieve success.

In conclusion, this research presents DrS, a data-driven approach for learning dense reward functions from sparse rewards. Evaluated on robotic manipulation tasks, DrS demonstrates effective transfer across tasks with varying object geometries. This simplification of the reward design process holds promise for scaling up RL applications in diverse scenarios. However, two main limitations arise with the multi-stage version of the approach. First, the acquisition of task structure knowledge remains unexplored; this could be addressed using large language models or information-theoretic approaches. Second, relying on stage indicators may pose challenges when training RL agents directly in real-world settings, although tactile sensors or visual detection/tracking methods can obtain stage information when necessary.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter. Join our Telegram Channel, Discord Channel, and LinkedIn Group.

If you like our work, you will love our newsletter.

Don’t Forget to join our 40k+ ML SubReddit
The post Researchers at UC San Diego Propose DrS: A Novel Machine Learning Approach for Learning Reusable Dense Rewards for Multi-Stage Tasks in a Data-Driven Manner appeared first on MarkTechPost.

SEED-Bench-2-Plus: An Extensive Benchmark Specifically Designed for Ev …

Evaluating Multimodal Large Language Models (MLLMs) in text-rich scenarios is crucial, given their increasing versatility. However, current benchmarks mainly assess general visual comprehension, overlooking the nuanced challenges of text-rich content. MLLMs like GPT-4V, Gemini-Pro-Vision, and Claude-3-Opus showcase impressive capabilities but lack comprehensive evaluation in text-rich contexts. Understanding text within images requires interpreting textual and visual cues, a challenge yet to be rigorously addressed. 

SEED-Bench-2-Plus, developed by researchers from Tencent AI Lab, ARC Lab, Tencent PCG, and The Chinese University of Hong Kong, Shenzhen, is a specialized benchmark for evaluating MLLMs’ understanding of text-rich visual content. It consists of 2.3K meticulously crafted multiple-choice questions covering three broad categories: Charts, Maps, and Webs, encompassing diverse real-world scenarios. Human annotators ensure accuracy, and the evaluation involves 34 leading MLLMs, including GPT-4V, Gemini-Pro-Vision, and Claude-3-Opus.

Recent research has seen a surge in MLLMs, aiming to enhance understanding across text and images. While some studies integrate video inputs, others focus on generating images from text. However, the proficiency of these models in text-rich contexts still needs to be explored. SEED-Bench-2-Plus addresses this gap by offering a comprehensive benchmark to evaluate MLLMs’ performance in understanding text-rich visual content. Unlike existing benchmarks, SEED-Bench-2-Plus encompasses a broad spectrum of real-world scenarios and avoids biases introduced by human annotators, providing a valuable tool for objective evaluation and advancement in this domain.

SEED-Bench-2-Plus presents a comprehensive benchmark comprising 2.3K multiple-choice questions across three main categories: Charts, Maps, and Webs. Each category encompasses a variety of data types, totaling 63 in all. The dataset is meticulously curated, including charts, maps, and website screenshots rich in textual information. Questions are generated using GPT-4V and further refined by human annotators. Evaluation employs an answer-ranking strategy, assessing MLLMs’ performance based on the likelihood of generating each candidate choice and selecting the most probable one as the model’s answer. Unlike previous methods, this approach avoids dependency on models’ instruction-following capabilities and mitigates the impact of multiple-choice option order on performance.
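The answer-ranking strategy can be sketched as follows. The scoring function here is a toy stand-in: a real evaluation would sum the MLLM's token log-probabilities for each candidate answer given the image and question, then pick the highest-scoring choice.

```python
def choice_score(question, choice):
    # Toy stand-in for the model's summed token log-probabilities of
    # `choice` given the image+question context; a real implementation
    # would query the MLLM instead of this word-overlap heuristic.
    overlap = len(set(question.split()) & set(choice.split()))
    return overlap - 0.1 * len(choice)  # crude length penalty

def rank_answer(question, choices):
    """Pick the choice the model scores highest; no reliance on the
    model following multiple-choice answer-format instructions."""
    scores = [choice_score(question, c) for c in choices]
    return max(range(len(choices)), key=scores.__getitem__)

q = "What is the peak value shown in the chart"
choices = ["the peak value is 42", "a dog", "blue", "the chart has no data"]
print(rank_answer(q, choices))
```

Because the answer is read off from per-choice scores rather than free-form generation, shuffling the option order cannot change which choice wins.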

The evaluation encompassed 31 open-source MLLMs and three closed-source ones across various categories of SEED-Bench-2-Plus. GPT-4V outperformed many MLLMs, showing superior performance across most evaluation types. However, most MLLMs struggled with text-rich data, achieving an average accuracy rate of less than 40%, indicating the complexity of comprehending such data. Maps posed significant challenges due to their multidimensional nature, while performance varied across different data types within categories. These observations underscore the need for further research to enhance MLLMs’ proficiency in text-rich scenarios, ensuring adaptability across diverse data types.

In conclusion, SEED-Bench-2-Plus is a comprehensive benchmark for assessing MLLMs in text-rich contexts. With 2.3K human-annotated multiple-choice questions covering 63 data types across three broad categories, it offers a thorough evaluation platform. By examining 31 open-source and three closed-source MLLMs, valuable insights have been gleaned to guide future research. Complementing SEED-Bench-2, both the dataset and evaluation code are publicly accessible, accompanied by a leaderboard to foster advancements in text-rich visual comprehension with MLLMs.

Check out the Paper and Project page. All credit for this research goes to the researchers of this project.

The post SEED-Bench-2-Plus: An Extensive Benchmark Specifically Designed for Evaluating Multimodal Large Language Models (MLLMs) in Text-Rich Scenarios appeared first on MarkTechPost.

SynthEval: A Novel Open-Source Machine Learning Framework for Detailed …

Computer vision, machine learning, and data analysis across many fields have all seen a surge in the use of synthetic data in the past few years. Synthetic data can mimic complicated situations that would be challenging, if not impossible, to record in the real world. Tabular records at the personal level contain information about individuals, such as patients, citizens, or customers, along with their unique attributes. These records are ideal for knowledge discovery tasks and for building advanced predictive models to support decision-making and product development. The privacy implications of tabular information are substantial, though, and such data should not be openly disclosed. Data protection regulations are essential for safeguarding individuals’ rights against malicious schemes, blackmail, fraud, or discrimination in the event that sensitive data is compromised. While they may slow down scientific development, they are necessary to prevent such harm.

In theory, synthetic data improves upon conventional anonymization methods by enabling access to tabular datasets while shielding people’s identities from prying eyes. In addition to augmenting and balancing datasets and reducing data bias, synthetic data can improve downstream models. Although text and image generation have achieved remarkable success, simulating tabular data remains difficult, and the privacy and quality of synthetic data can differ greatly based on the algorithms used to create it, the parameters used for optimization, and the assessment methodology. In particular, the absence of consensus on assessment methodologies makes it difficult to compare current models and, by extension, to objectively assess the efficacy of a new algorithm.

A new study by University of Southern Denmark researchers introduces SynthEval, a novel evaluation framework delivered as a Python package. Its purpose is to facilitate the easy and consistent evaluation of synthetic tabular data. The researchers believe the SynthEval framework can significantly influence the research community and provide a much-needed answer to the evaluation landscape. SynthEval incorporates a large collection of metrics that can be used to create user-specific benchmarks. Predefined benchmarks are available as presets at the press of a button, and the provided components make it easy to construct your own unique setups. Adding custom metrics to benchmarks requires no editing of the source code.

SynthEval’s primary function is to serve as a robust shell for accessing a large library of metrics and condensing them into evaluation reports or benchmark configurations. Two building blocks accomplish this: the metrics object and the SynthEval interface object. The former specifies how the metric modules are structured and how the SynthEval workflow accesses them; the latter hosts the evaluation and benchmark modules and is the object users interact with. If non-numerical values are not specified, the SynthEval utilities determine them automatically and handle any required data preprocessing.

In principle, evaluation and benchmarking require just two lines of code: creating the SynthEval object and calling either method. SynthEval is also available through a command line interface.
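As an illustration of this two-line workflow, here is a minimal mock: construct an evaluator over the real data, then call a single method with the synthetic data and a preset. The class, method, metric, and preset names below are invented; the real SynthEval package defines its own API and a far larger metric library.

```python
# Illustrative mock of the "create object, call method" workflow described
# above; names are hypothetical, not SynthEval's actual API.
class TabularEvaluator:
    PRESETS = {"quick": ["mean_diff"], "full": ["mean_diff", "range_diff"]}

    def __init__(self, real):
        self.real = real  # the real (reference) dataset

    def evaluate(self, synthetic, preset="quick"):
        report = {}
        for metric in self.PRESETS[preset]:
            if metric == "mean_diff":   # difference of column means
                report[metric] = abs(sum(self.real) / len(self.real)
                                     - sum(synthetic) / len(synthetic))
            elif metric == "range_diff":  # difference of value ranges
                report[metric] = abs((max(self.real) - min(self.real))
                                     - (max(synthetic) - min(synthetic)))
        return report

# The "two lines" in practice: construct the evaluator, then evaluate.
evaluator = TabularEvaluator([1.0, 2.0, 3.0])
print(evaluator.evaluate([1.1, 2.1, 2.9], preset="full"))
```

The preset argument mirrors the predefined benchmark configurations described above; swapping it out for an explicit metric list would correspond to a user-specific benchmark.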

To make SynthEval as versatile as possible, the team provides multiple ways to select metrics. Three preset setups are currently available, metrics can be selected manually from the library, and bulk selection is also an option. If a file path is specified as a preset, SynthEval will try to load that file. Whenever users run a non-standard setup, a new config file is saved in JSON format for repeatability.

As an additional useful feature, SynthEval’s benchmark module permits the simultaneous evaluation of multiple synthetic renditions of the same dataset. The outcomes are combined, evaluated internally, and reported, letting users easily and thoroughly assess several datasets using various metrics. Benchmarks built with frameworks like SynthEval allow generative model capabilities to be evaluated thoroughly. One of the biggest obstacles with tabular data is maintaining consistency when dealing with varying proportions of numerical and categorical data. Earlier evaluation systems have addressed this problem in various ways, for example by limiting the metrics that may be used or the sorts of data that can be accepted. In contrast, SynthEval builds mixed-correlation-matrix equivalents, uses similarity functions instead of classical distances to account for heterogeneity, and uses empirical approximation of p-values to try to capture the complexities of real data.

The team employs a linear ranking strategy and a bespoke evaluation configuration in SynthEval’s benchmark module. The generative models appear to have a tough time competing with the baselines. The “random sample” baseline in particular stands out as a formidable opponent, ranking among the top overall and boasting privacy and utility ratings unmatched elsewhere in the benchmark. The findings make it clear that high utility does not automatically mean good privacy: the most useful datasets, produced by unoptimized BN and CART models, are also among the lowest ranked for privacy, posing unacceptable risks of identification.
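A linear ranking strategy of this kind can be sketched in a few lines: rank every dataset per metric, then sum the ranks into an overall ordering. The metric names and scores below are invented for illustration (lower is better for both), chosen so the random-sample baseline comes out on top, as in the reported benchmark.

```python
def linear_rank(scores_by_dataset):
    """scores_by_dataset: {name: {metric: score}}; lower scores are better.
    Returns dataset names ordered best-to-worst by summed per-metric rank."""
    metrics = next(iter(scores_by_dataset.values())).keys()
    totals = {name: 0 for name in scores_by_dataset}
    for m in metrics:
        # Rank 1 goes to the best (lowest) score on this metric.
        ordered = sorted(scores_by_dataset,
                         key=lambda n: scores_by_dataset[n][m])
        for rank, name in enumerate(ordered, start=1):
            totals[name] += rank
    return sorted(totals, key=totals.get)

# Hypothetical benchmark scores for three synthetic datasets.
scores = {
    "random_sample": {"utility_err": 0.2,  "privacy_risk": 0.1},
    "cart":          {"utility_err": 0.05, "privacy_risk": 0.9},
    "bn":            {"utility_err": 0.3,  "privacy_risk": 0.8},
}
print(linear_rank(scores))
```

Note how the summed ranks surface the trade-off described above: CART wins on utility but its poor privacy rank drags it below the random-sample baseline overall.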

Each of SynthEval’s metrics accounts for dataset heterogeneity in its own way, which is a limitation in itself. Preprocessing has its limits, and future metric integrations must account for how heterogeneous synthetic tabular data can be. The researchers intend to incorporate additional metrics requested or contributed by the community, and aim to keep improving the performance of the existing algorithms and framework.

Check out the Paper. All credit for this research goes to the researchers of this project.

The post SynthEval: A Novel Open-Source Machine Learning Framework for Detailed Utility and Privacy Evaluation of Tabular Synthetic Data appeared first on MarkTechPost.

Develop and train large models cost-efficiently with Metaflow and AWS …

This is a guest post co-authored with Ville Tuulos (Co-founder and CEO) and Eddie Mattia (Data Scientist) of Outerbounds.
To build a production-grade AI system today (for example, to do multilingual sentiment analysis of customer support conversations), what are the primary technical challenges? Historically, natural language processing (NLP) would be a primary research and development expense. In 2024, however, organizations are using large language models (LLMs), which require relatively little focus on NLP, shifting research and development from modeling to the infrastructure needed to support LLM workflows.
For AWS and Outerbounds customers, the goal is to build a differentiated machine learning and artificial intelligence (ML/AI) system and reliably improve it over time. This often means that using a third-party LLM API won’t do for security, control, and scale reasons. Owning the infrastructure control and know-how to run workflows that power AI systems is a requirement.
Returning to the original question, three MLOps challenges may arise:

You need high-quality data to train and fine-tune models
You need a diverse cloud infrastructure for experimentation, training, tracking, and orchestrating the production system
You need a significant amount of compute to power the system

In this post, we highlight a collaboration between Outerbounds and AWS that takes a step towards addressing the last two challenges. First, the AWS Trainium accelerator provides a high-performance, cost-effective, and readily available solution for training and fine-tuning large models. Second, open source Metaflow provides the necessary software infrastructure to build production-grade ML/AI systems in a developer-friendly manner. It provides an approachable, robust Python API for the full infrastructure stack of ML/AI, from data and compute to workflows and observability.
In the following sections, we first introduce Metaflow and the Trainium integration. We then show how to set up the infrastructure stack you need to take your own data assets and pre-train or fine-tune a state-of-the-art Llama2 model on Trainium hardware.
Metaflow overview
Metaflow was originally developed at Netflix to enable data scientists and ML engineers to build ML/AI systems quickly and deploy them on production-grade infrastructure. Netflix open sourced the framework in 2019 with integrations to AWS services like AWS Batch, AWS Step Functions (see Unbundling Data Science Workflows with Metaflow and AWS Step Functions), Kubernetes, and throughput-optimized Amazon Simple Storage Service (Amazon S3), so you can build your own Netflix-scale ML/AI environment in your AWS account.
The key motivation of Metaflow is to address the typical needs of all ML/AI projects with a straightforward, human-centric API, from prototype to production (and back). The following figure illustrates this workflow.

Metaflow’s coherent APIs simplify the process of building real-world ML/AI systems in teams. Metaflow helps scientists and engineers access, move, and manipulate data efficiently; track and version experiments and models; orchestrate and integrate workflows to surrounding systems; and scale compute to the cloud easily. Moreover, it has first-class support for teams, such as namespacing and deploying workflows in versioned production branches.
Now, with today’s announcement, you have another straightforward compute option for workflows that need to train or fine-tune demanding deep learning models: running them on Trainium.
How Metaflow integrates with Trainium
From a Metaflow developer perspective, using Trainium is similar to using other accelerators. After a Metaflow deployment is configured to access Trainium chips through the compute platform customers use with Metaflow (which we discuss later in this post), ML engineers and data scientists can operate autonomously in the land of deep learning code. Scientists can write PyTorch or Hugging Face code and use the AWS Neuron SDK along with the NeuronX Distributed SDK to target Trainium devices, while Metaflow integrates with the underlying AWS services to separate out concerns about how to actually run the code at scale.
As illustrated by the following figure, you can declare the following in a few lines of Python code:

How many nodes to launch
How many Trainium devices to use per node
How the nodes are interconnected (Elastic Fabric Adapter)
How often to check the resource utilization
What training script the torchrun process should run on each node

You can initialize the training process in the start step, which directs the next train step to run on two parallel instances (num_parallel=2). The decorators of the train step configure your desired training setup:

@torchrun – Sets up PyTorch Distributed across two instances
@batch – Configures the Trainium nodes, managed by AWS Batch
@neuron_monitor – Activates the monitoring UI that allows you to monitor the utilization of the Trainium cores

Metaflow allows you to configure all this functionality in a few lines of code. However, the main benefit is that you can embed Trainium-based training code inside a larger production system, using the scaffolding provided by Metaflow.
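Putting these pieces together, a flow skeleton might look like the following sketch (Python-style pseudocode). It assumes a configured Metaflow deployment with the Trainium extensions installed; the decorator arguments mirror those shown later in this post, and the training-script invocation is elided.

```python
from metaflow import FlowSpec, step, batch
# @torchrun and @neuron_monitor come from Metaflow's Trainium/torchrun
# extension packages; exact import paths depend on the installed extensions.

class TrainiumDemoFlow(FlowSpec):

    @step
    def start(self):
        # Fan out to two parallel training nodes.
        self.next(self.train, num_parallel=2)

    @neuron_monitor                # monitoring UI for NeuronCore utilization
    @torchrun                      # sets up PyTorch Distributed across nodes
    @batch(trainium=16, efa=8, image="YOUR_IMAGE_IN_ECR")  # AWS Batch + EFA
    @step
    def train(self):
        # Launch the training script on each node via torchrun here.
        self.next(self.join)

    @step
    def join(self, inputs):
        self.next(self.end)

    @step
    def end(self):
        pass
```

The num_parallel fan-out plus the stacked decorators is all the cluster definition there is; everything else (queues, networking, monitoring) is handled by the deployment behind them.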
Benefits of using Trainium with Metaflow
Trainium and Metaflow work together to solve problems like what we discussed earlier in this post. The Trainium devices and Neuron software stack make it straightforward for teams to access and effectively use the high-performance hardware needed for cutting-edge AI.
Trainium provides a few key benefits for building real-world AI systems:

Trainium instances can help reduce generative AI model training and fine-tuning costs by up to 50% over comparable instances on AWS
It is readily available in many AWS Regions, is often more available than GPU-based instance types, and scaling is available in the most popular Regions worldwide
The hardware and software are mature and actively developed by AWS

If you have been struggling with GPU availability and cost, you’ll surely appreciate these benefits. Using Trainium effectively can require a bit of infrastructure effort and knowledge, which is a key motivation for this integration. Through Metaflow and the deployment scripts provided in this post, you should be able to get started with Trainium with ease.
Besides easy access, using Trainium with Metaflow brings a few additional benefits:
Infrastructure accessibility
Metaflow is known for its developer-friendly APIs that allow ML/AI developers to focus on developing models and applications, and not worry about infrastructure. Metaflow helps engineers manage the infrastructure, making sure it integrates with existing systems and policies effortlessly.
Data, model, and configuration management
Metaflow provides built-in, seamless artifact persistence, tracking, and versioning, which covers the full state of the workflows, making sure you’ll follow MLOps best practices. Thanks to Metaflow’s high-throughput S3 client, you can load and save datasets and model checkpoints very quickly, without having to worry about extra infrastructure such as shared file systems. You can use artifacts to manage configuration, so everything from hyperparameters to cluster sizing can be managed in a single file, tracked alongside the results.
Metaflow comes with a convenient UI, which you can customize to observe metrics and data that matter to your use cases in real time. In the case of Trainium, we provide a custom visualization that allows you to monitor utilization of the NeuronCores inside Trainium instances, making sure that resources are used efficiently. The following screenshot shows an example of the visualization for core (top) and memory (bottom) utilization.

Multi-node compute
Finally, a huge benefit of Metaflow is that you can use it to manage advanced multi-instance training clusters, which would take a lot of involved engineering otherwise. For instance, you can train a large PyTorch model, sharded across Trainium instances, using Metaflow’s @torchrun and @batch decorators.
Behind the scenes, the decorators set up a training cluster using AWS Batch multi-node with a specified number of Trainium instances, configured to train a PyTorch model across the instances. By using the launch template we provide in this post, the setup can benefit from low-latency, high-throughput networking via Elastic Fabric Adapter (EFA) networking interfaces.
Solution overview
As a practical example, let’s set up the complete stack required to pre-train Llama2 for a few epochs on Trainium using Metaflow. The same recipe applies to the fine-tuning examples in the repository.
Deploy and configure Metaflow
If you already use a Metaflow deployment, you can skip to the next step to deploy the Trainium compute environment.
To deploy a Metaflow stack using AWS CloudFormation, complete the following steps:

Download the CloudFormation template.
On the CloudFormation console, choose Stacks in the navigation pane.
Choose Create new stack.
For Prepare template, select Template is ready.
For Template source, select Upload a template file.
Upload the template.
Choose Next.

If you are brand new to Metaflow, or are trying this recipe as a proof of concept, we suggest you change the APIBasicAuth parameter to false and leave all other default parameter settings.
Complete the stack creation process.

After you create the CloudFormation stack and configure Metaflow to use the stack resources, there is no additional setup required. For more information about the Metaflow components that AWS CloudFormation deploys, see AWS Managed with CloudFormation.
To use the stack you just deployed from your laptop or cloud workstation, complete the following steps:

Prepare a Python environment and install Metaflow in it:

pip install metaflow

Run metaflow configure aws in a terminal.

metaflow configure aws

After the CloudFormation stack deployment is complete, the Outputs on the stack details page will contain a list of resource names and their values, which you can use in the Metaflow AWS configuration prompts.
Deploy a Trainium compute environment
The default Metaflow deployment from the previous step has an AWS Batch compute environment, but it will not be able to schedule jobs to run on Amazon Elastic Compute Cloud (Amazon EC2) instances with Trainium devices. To deploy an AWS Batch compute environment for use with Trainium accelerators, you can use the following CloudFormation template. Complete the following steps:

Download the CloudFormation template.
On the CloudFormation console, choose Stacks in the navigation pane.
Choose Create new stack.
For Prepare template, select Template is ready.
For Template source, select Upload a template file.
Upload the template.
Choose Next.
Complete the stack creation process.

Take note of the name of the AWS Batch job queue that you created to use in a later step.
Prepare a base Docker image to run Metaflow tasks
Metaflow tasks run inside Docker containers when AWS Batch is used as a compute backend. To run Trainium jobs, developers need to build a custom image and specify it in the @batch decorator Metaflow developers use to declare task resources:

@batch(trainium=16, efa=8, image="YOUR_IMAGE_IN_ECR")
def train_llama2(self):
# neuron distributed training code

To make the image, complete the following steps:

Create an Amazon Elastic Container Registry (Amazon ECR) registry to store your image in.
Create and log in to an EC2 instance with sufficient memory. For this post, we used Ubuntu x86 OS on a c5.4xlarge instance.
Install Docker.
Copy the following Dockerfile to your instance.
Authenticate with the upstream base image provider:

aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin 763104351884.dkr.ecr.$

Build the image:

docker build . -t $YOUR_IMAGE_NAME:$YOUR_IMAGE_TAG

On the Amazon ECR console, navigate to the ECR registry you created, and you will find the commands needed to authenticate from the EC2 instance and push your image.

Clone the repository on your workstation
Now you’re ready to verify the infrastructure is working properly, after which you can run complex distributed training code like Llama2 training. To get started, clone the examples repository to the workstation where you configured Metaflow with AWS:

git clone

Verify the infrastructure with an allreduce example
To validate your infrastructure configuration, complete the following steps:

Navigate to the allreduce example:

cd allreduce-trn

Open the file and make sure to set the job queue and image to the name of the queue you deployed with AWS CloudFormation and the image you pushed to Amazon ECR, respectively.
To run the allreduce code, run the following Metaflow command:

python – run

You can find the logs (truncated in the following code snippet for readability) in the Metaflow UI:

Task is starting (status SUBMITTED)…
Task is starting (status RUNNABLE)… (parallel node status: [SUBMITTED:3])
Task is starting (status STARTING)… (parallel node status: [SUBMITTED:3])
Task is starting (status RUNNING)… (parallel node status: [SUBMITTED:3])
Setting up task environment.
Downloading code package…
Code package downloaded.
Task is starting.

Compiler status PASS
result OK step 0: tensor([[64., 64., 64.],
[64., 64., 64.]], device='xla:1')

result OK step 900: tensor([[64., 64., 64.],
[64., 64., 64.]], device='xla:1')
Before final rendezvous
Waiting for batch secondary tasks to finish

Configure and run any Neuron distributed code
If the allreduce test runs successfully, you are ready to move on to meaningful workloads. To complete this onboarding, complete the following steps:

Navigate to the llama2-7b-pretrain-trn directory.
Similar to the allreduce example, before using this code, you need to modify the file so that it matches the AWS Batch job queue and ECR image that you created. Open the file, find these lines, and modify them to your values:

class BatchJobConfig:
    # <snip>
    image: str = "YOUR_IMAGE"
    job_queue: str = "YOUR_QUEUE"

After modifying these values, and any others you want to experiment with, run the following command:


Then run the workflow to pre-train your own Llama2 model from scratch:

python run --config-file config.yaml

This will train the model on however many nodes you specify in the file, and will push the trained model result to Amazon S3 storage, versioned by Metaflow’s data store using the flow name and run ID.
Logs will appear like the following (truncated from a sample run of five steps for readability):

Task is starting (status SUBMITTED)…
Task is starting (status RUNNABLE)… (parallel node status: [SUBMITTED:3])
Task is starting (status STARTING)… (parallel node status: [SUBMITTED:3])
Task is starting (status RUNNING)… (parallel node status: [SUBMITTED:3])
Setting up task environment.
Downloading code package…
Code package downloaded.
Task is starting.

initializing tensor model parallel with size 8
initializing pipeline model parallel with size 1
initializing data parallel with size 16

Epoch 0 begin Fri Mar 15 21:19:10 2024

Compiler status PASS

(0, 3) step_loss : 15.4375 learning_rate : 3.00e-04 throughput : 4.38
(0, 4) step_loss : 12.1250 learning_rate : 1.50e-04 throughput : 5.47
(0, 5) step_loss : 11.8750 learning_rate : 0.00e+00 throughput : 6.44

Writing data to the provided results file: /metaflow/metaflow/metrics.json

Waiting for batch secondary tasks to finish

Clean up
To clean up resources, delete the CloudFormation stacks for your Metaflow deployment and Trainium compute environment:

aws cloudformation delete-stack --stack-name metaflow
aws cloudformation delete-stack --stack-name trn1-batch

You can get started experimenting with the solution presented in this post in your environment today. Follow the instructions in the GitHub repository to pre-train a Llama2 model on Trainium devices. Additionally, we have prepared examples for fine-tuning Llama2 and BERT models, demonstrating how you can use the Optimum Neuron package to use the integration from this post with any Hugging Face model.
We are happy to help you get started. Join the Metaflow community Slack for support, to provide feedback, and share experiences!

About the authors

Ville Tuulos is a co-founder and CEO of Outerbounds, a developer-friendly ML/AI platform. He has been developing infrastructure for ML and AI for over two decades in academia and as a leader at a number of companies. At Netflix, he led the ML infrastructure team that created Metaflow, a popular open-source, human-centric foundation for ML/AI systems. He is also the author of a book, Effective Data Science Infrastructure, published by Manning.

Eddie Mattia has a background in scientific computing and more recently has been building machine learning developer tools. He has worked as a researcher in academia, in customer-facing and engineering roles at MLOps startups, and as a product manager at Intel. Currently, Eddie is working to improve the open-source Metaflow project and is building tools for AI researchers and MLOps developers at Outerbounds.

Vidyasagar specializes in high performance computing, numerical simulations, optimization techniques and software development across industrial and academic environments. At AWS, Vidyasagar is a Senior Solutions Architect developing predictive models, generative AI and simulation technologies. Vidyasagar has a PhD from the California Institute of Technology.

Diwakar Bansal is an AWS Senior Specialist focused on business development and go-to-market for GenAI and machine learning accelerated computing services. Diwakar has led product definition, global business development, and marketing of technology products in the fields of IoT, edge computing, and autonomous driving, focusing on bringing AI and machine learning to these domains. Diwakar is passionate about public speaking and thought leadership in the cloud and GenAI space.

Sadaf Rasool is a Machine Learning Engineer with the Annapurna ML Accelerator team at AWS. As an enthusiastic and optimistic AI/ML professional, he holds firm to the belief that the ethical and responsible application of AI has the potential to enhance society in the years to come, fostering both economic growth and social well-being.

Scott Perry is a Solutions Architect on the Annapurna ML accelerator team at AWS. Based in Canada, he helps customers deploy and optimize deep learning training and inference workloads using AWS Inferentia and AWS Trainium. His interests include large language models, deep reinforcement learning, IoT, and genomics.

Cohere Command R and R+ are now available in Amazon SageMaker JumpStar …

This blog post is co-written with Pradeep Prabhakaran from Cohere. 
Today, we are excited to announce that Cohere Command R and R+ foundation models are available through Amazon SageMaker JumpStart to deploy and run inference. Command R/R+ are state-of-the-art retrieval augmented generation (RAG)-optimized models designed to tackle enterprise-grade workloads.
In this post, we walk through how to discover and deploy Cohere Command R/R+ via SageMaker JumpStart.
What are Cohere Command R and Command R+?
Cohere Command R is a family of highly scalable language models that balance high performance with strong accuracy. The Command R family, which includes the Command R and Command R+ models, is optimized for RAG-based workflows such as conversational interaction and long-context tasks, enabling companies to move beyond proof of concept and into production. These models are designed to handle complex tasks with high performance and strong accuracy, making them suitable for real-world applications.
Command R boasts high precision on RAG and tool use tasks, low latency and high throughput, a long 128,000-token context length, and strong capabilities across 10 key languages: English, French, Spanish, Italian, German, Portuguese, Japanese, Korean, Arabic, and Chinese.
Command R+ is the newest model, optimized for extremely performant conversational interaction and long-context tasks. It is recommended for workflows that lean on complex RAG functionality and multi-step tool use (agents), while Command R is well-suited for simpler RAG and single-step tool use tasks, as well as applications where price is a major consideration.
What is SageMaker JumpStart?
With SageMaker JumpStart, you can choose from a broad selection of publicly available foundation models. ML practitioners can deploy foundation models to dedicated SageMaker instances from a network-isolated environment and customize models using SageMaker for model training and deployment. You can now discover and deploy Cohere Command R/R+ models with a few choices in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK. Doing so enables you to derive model performance and machine learning operations (MLOps) controls with SageMaker features such as SageMaker Pipelines, SageMaker Debugger, or container logs.
The model is deployed in an AWS secure environment and under your virtual private cloud (VPC) controls, helping provide data security. Cohere Command R/R+ models are available today for deployment and inference in Amazon SageMaker Studio in us-east-1 (N. Virginia), us-east-2 (Ohio), us-west-1 (N. California), us-west-2 (Oregon), ca-central-1 (Canada Central), eu-central-1 (Frankfurt), eu-west-1 (Ireland), eu-west-2 (London), eu-west-3 (Paris), eu-north-1 (Stockholm), ap-southeast-1 (Singapore), ap-southeast-2 (Sydney), ap-northeast-1 (Tokyo), ap-northeast-2 (Seoul), ap-south-1 (Mumbai), and sa-east-1 (São Paulo).
Discover models
You can access the foundation models through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we go over how to discover the models in SageMaker Studio.
From the SageMaker JumpStart landing page, you can easily discover various models by browsing through different hubs, which are named after model providers. The Cohere Command R and R+ models are available in the Cohere hub. If you don’t see these models, ensure you have the latest SageMaker Studio version by shutting down and restarting Studio Classic Apps.
To find the Command R and R+ models, search for “Command R” in the search box located at the top left of the SageMaker JumpStart landing page. Each model can be deployed on Amazon Elastic Compute Cloud (Amazon EC2) P5 instances powered by NVIDIA H100 Tensor Core GPUs (ml.p5.48xlarge) and Amazon EC2 P4de instances powered by NVIDIA A100 Tensor Core GPUs (ml.p4de.24xlarge).

Deploy a model
To illustrate model deployment, we’ll deploy Cohere Command R+ on NVIDIA H100. Choose the model card to open the corresponding model detail page.
When you choose Deploy, a window appears prompting you to subscribe to the model on AWS Marketplace. Choose Subscribe, which redirects you to the AWS Marketplace listing for Cohere Command R+ (H100). Follow the on-screen instructions to complete the subscription process.

Once subscribed, return to the model detail page and choose Deploy in the window. The deployment process initiates.

Alternatively, you can choose Notebooks on the model card and open the example notebook in JupyterLab. This notebook provides end-to-end guidance on deploying the model for inference and cleaning up resources. You can also find this example notebook in the Cohere SageMaker GitHub repository. To help secure the endpoint, you can configure an AWS Key Management Service (AWS KMS) key for the SageMaker endpoint configuration.
If an endpoint has already been created, you can simply connect to it:

from cohere_aws import Client  # assuming Cohere's cohere-aws SDK

co = Client(region_name=region)
co.connect_to_endpoint(endpoint_name="cohere-command-r-plus")  # endpoint name is illustrative


Real-time inference
Once connected to the endpoint, you can perform real-time inference against it.

message = "Write a LinkedIn post about starting a career in tech:"
response =, stream=False)  # assuming the connected client from the previous snippet

Multilingual capabilities
Command R/R+ is optimized to perform well in 10 key languages, as listed in the introduction. Additionally, pre-training data have been included for the following 13 languages: Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, Persian.
The model has been trained to respond in the language of the user. Here’s an example in French:

message = "Écris une description de produit pour une voiture électrique en 50 à 75 mots"

Here’s what the response might look like:

Découvrez la voiture électrique qui va révolutionner votre façon de conduire.
Avec son design élégant, cette voiture offre une expérience de conduite unique avec une accélération puissante et une autonomie impressionnante. Sa technologie avancée vous garantit une charge rapide et une fiabilité inégalée. Avec sa conception innovante et durable, cette voiture est parfaite pour les trajets urbains et les longues distances. Profitez d’une conduite silencieuse et vivez l’expérience de la voiture électrique!

Command R/R+ can also perform cross-lingual tasks, such as translation or answering questions about content in other languages.
Chat with documents (RAG)
Command R/R+ can ground its generations. This means that it can generate responses based on a list of supplied document snippets, and it includes citations in its response indicating the source of the information.
For example, the code snippet that follows produces an answer to “How deep is the Mariana Trench” along with inline citations based on the provided documents.

message = "How deep is the Mariana Trench"
documents = [
    {
        "id": "national_geographic_everest",
        "title": "Height of Mount Everest",
        "snippet": "The height of Mount Everest is 29,035 feet",
        "url": "",
    },
    {
        "id": "national_geographic_mariana",
        "title": "Depth of the Mariana Trench",
        "snippet": "The depth of the Mariana Trench is 36,070 feet",
        "url": "",
    },
]

response =, documents=documents, stream=False)


text: "The depth of the Mariana Trench is 36,070 feet.",
citations: [
    {'start': 35, 'end': 47, 'text': '36,070 feet.',
     'document_ids': ['national_geographic_mariana']}
],
documents: [
    {'id': 'national_geographic_mariana',
     'snippet': 'The depth of the Mariana Trench is 36,070 feet',
     'title': 'Depth of the Mariana Trench'}
]
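The citation spans can be mapped back to the generated text programmatically. A minimal helper, assuming the citation format shown above (character offsets into the response text); this is illustrative glue code, not part of the Cohere SDK:

```python
def resolve_citations(text, citations, documents):
    """Attach the cited substring and its source snippets to each citation."""
    docs_by_id = {d["id"]: d for d in documents}
    resolved = []
    for c in citations:
        resolved.append({
            "cited_text": text[c["start"]:c["end"]],
            "sources": [docs_by_id[i]["snippet"] for i in c["document_ids"]],
        })
    return resolved

text = "The depth of the Mariana Trench is 36,070 feet."
citations = [{"start": 35, "end": 47, "text": "36,070 feet.",
              "document_ids": ["national_geographic_mariana"]}]
documents = [{"id": "national_geographic_mariana",
              "snippet": "The depth of the Mariana Trench is 36,070 feet"}]

print(resolve_citations(text, citations, documents))
```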

Single-Step & Multi-Step Tool Use
Command R/R+ comes with a Tool Use API that enables the language model to interact with user-defined tools to automate highly sophisticated tasks. In Tool Use mode, Command R/R+ creates API payloads (JSON objects with specific parameters) based on user interactions and conversational history. These payloads can be used to instruct any other application or tool.
For example, an application can be instructed to automatically categorize and route support tickets to the appropriate individual, change a status in customer relationship management (CRM) software, or retrieve relevant snippets from a vector database. Tool use comes in two variants: single-step and multi-step:

Single-step tool use enables a richer set of behaviors by leveraging data stored in tools, taking actions through APIs, interacting with a vector database, querying a search engine, etc.
Multi-step tool use is an extension of this basic idea and allows the model to call more than one tool in a sequence of steps, using the results from one tool call in a subsequent step. This process allows the language model to reason, perform dynamic actions, and quickly adapt based on information coming from external sources.
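Mechanically, a tool-use turn boils down to the model emitting a structured call (a tool name plus parameters) that the application dispatches to its own functions. A minimal sketch of that dispatch loop, with hypothetical tool names and payload shapes (the actual Cohere API structures differ):

```python
# Registry of user-defined tools the model is allowed to call.
def route_ticket(category, assignee):
    return f"ticket routed to {assignee} ({category})"

def update_crm_status(customer_id, status):
    return f"CRM {customer_id} set to {status}"

TOOLS = {"route_ticket": route_ticket, "update_crm_status": update_crm_status}

def dispatch(tool_calls):
    """Execute each model-emitted tool call and collect the results."""
    results = []
    for call in tool_calls:
        fn = TOOLS[call["name"]]
        results.append({"name": call["name"], "output": fn(**call["parameters"])})
    return results

# A payload of the kind the model might emit in tool-use mode (illustrative only).
calls = [{"name": "route_ticket",
          "parameters": {"category": "billing", "assignee": "tier-2"}}]
print(dispatch(calls))
```

In multi-step tool use, the application would feed each result back to the model, which then decides whether to emit further calls.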

To explore these capabilities further, you can refer to the provided Jupyter notebook and Cohere’s AWS GitHub repository, which offer additional examples showcasing various use cases and applications.
Clean up
After you’ve finished running the notebook and exploring the Cohere Command R and R+ models, it’s essential to clean up the resources you’ve created to avoid incurring unnecessary charges: delete the SageMaker endpoint, its endpoint configuration, and the model, and cancel your AWS Marketplace subscription if you no longer need it.


In this post, we explored how to leverage the powerful capabilities of Cohere’s Command R and R+ models on Amazon SageMaker JumpStart. These state-of-the-art large language models are specifically designed to excel at real-world enterprise use cases, offering unparalleled performance and scalability. With their availability on SageMaker JumpStart and AWS Marketplace, you now have seamless access to these cutting-edge models, enabling you to unlock new levels of productivity and innovation in your natural language processing projects.

About the authors

Pradeep Prabhakaran is a Customer Solutions Architect at Cohere. In his current role at Cohere, Pradeep acts as a trusted technical advisor to customers and partners, providing guidance and strategies to help them realize the full potential of Cohere’s cutting-edge Generative AI platform. Prior to joining Cohere, Pradeep was a Principal Customer Solutions Manager at Amazon Web Services, where he led Enterprise Cloud transformation programs for large enterprises. Prior to AWS, Pradeep has held various leadership positions at consulting companies such as Slalom, Deloitte, and Wipro. Pradeep holds a Bachelor’s degree in Engineering and is based in Dallas, TX.

James Yi is a Senior AI/ML Partner Solutions Architect at Amazon Web Services. He spearheads AWS’s strategic partnerships in Emerging Technologies, guiding engineering teams to design and develop cutting-edge joint solutions in GenAI. He enables field and technical teams to seamlessly deploy, operate, secure, and integrate partner solutions on AWS. James collaborates closely with business leaders to define and execute joint Go-To-Market strategies, driving cloud-based business growth. Outside of work, he enjoys playing soccer, traveling, and spending time with his family.

Revolutionizing large language model training with Arcee and AWS Train …

This is a guest post by Mark McQuade, Malikeh Ehghaghi, and Shamane Siri from Arcee.
In recent years, large language models (LLMs) have gained attention for their effectiveness, leading various industries to adapt general LLMs to their data for improved results, making efficient training and hardware availability crucial. At Arcee, we focus primarily on enhancing the domain adaptation of LLMs in a client-centric manner. Arcee’s innovative continual pre-training (CPT) and model merging techniques have brought a significant leap forward in the efficient training of LLMs, with particularly strong evaluations in the medical, legal, and financial verticals. Close collaboration with AWS Trainium has also played a major role in making the Arcee platform extremely performant, not only accelerating model training but also reducing overall costs and enforcing compliance and data integrity in the secure AWS environment. In this post, we show you how efficient we make our continual pre-training by using Trainium chips.
Understanding continual pre-training
Arcee recognizes the critical importance of continual CPT [1] in tailoring models to specific domains, as evidenced by previous studies such as PMC-LLaMA [2] and ChipNeMo [3]. These projects showcase the power of domain adaptation pre-training in enhancing model performance across diverse fields, from medical applications to industrial chip design. Inspired by these endeavors, our approach to CPT involves extending the training of base models like Llama 2 using domain-specific datasets, allowing us to fine-tune models to the nuances of specialized fields. To further amplify the efficiency of our CPT process, we collaborated with the Trainium team, using their cutting-edge technology to enhance a Llama 2 [4] model using a PubMed dataset [2] comprising 88 billion tokens. This collaboration represents a significant milestone in our quest for innovation, and through this post, we’re excited to share the transformative insights we’ve gained. Join us as we unveil the future of domain-specific model adaptation and the potential of CPT with Trainium in optimizing model performance for real-world applications.
Dataset collection
We followed the methodology outlined in the PMC-Llama paper [6] to assemble our dataset, which includes PubMed papers sourced from the Semantic Scholar API and various medical texts cited within the paper, culminating in a comprehensive collection of 88 billion tokens. For further details on the dataset, the original paper offers in-depth information.
To prepare this dataset for training, we used the Llama 2 tokenizer within an AWS Glue pipeline for efficient processing. We then organized the data so that each row contained 4,096 tokens, adhering to recommendations from the Neuron Distributed tutorials.
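The row-packing step described above can be sketched as follows. The 4,096-token row length is from this post; the stand-in token streams and the drop-the-remainder policy are assumptions, not the exact Glue pipeline:

```python
SEQ_LEN = 4096

def pack_rows(token_streams, seq_len=SEQ_LEN):
    """Concatenate per-document token lists and cut them into
    fixed-length rows; the trailing remainder is dropped."""
    flat = [tok for doc in token_streams for tok in doc]
    n_rows = len(flat) // seq_len
    return [flat[i * seq_len:(i + 1) * seq_len] for i in range(n_rows)]

# Toy example with fake token IDs instead of real Llama 2 tokenizer output.
docs = [list(range(3000)), list(range(6000))]
rows = pack_rows(docs)
print(len(rows), len(rows[0]))  # 2 4096  (808 remainder tokens dropped)
```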
Why Trainium?
Continual pre-training techniques like the ones described in this post require access to high-performance compute instances, which has become more difficult to get as more developers are using generative artificial intelligence (AI) and LLMs for their applications. Traditionally, these workloads have been deployed to GPUs; however, in recent years, the cost and availability of GPUs has stifled model building innovations. With the introduction of Trainium, we are able to unlock new techniques that enable us to continue model innovations that will allow us to build models more efficiently and most importantly, at lower costs. Trainium is the second-generation machine learning (ML) accelerator that AWS purpose built to help developers access high-performance model training accelerators to help lower training costs by up to 50% over comparable Amazon Elastic Compute Cloud (Amazon EC2) instances. With Trainium available in AWS Regions worldwide, developers don’t have to take expensive, long-term compute reservations just to get access to clusters of GPUs to build their models. Trainium instances offer developers the performance they need with the elasticity they want to optimize both for training efficiency and lowering model building costs.
Setting up the Trainium cluster
We used AWS ParallelCluster to build a High Performance Computing (HPC) compute environment that uses Trn1 compute nodes to run our distributed ML training job (see the GitHub tutorial). You can also use developer flows like Amazon SageMaker, Amazon Elastic Kubernetes Service (Amazon EKS), Ray, or others (to learn more, see Developer Flows). After the nodes were launched, we ran a training task to confirm that the nodes were working, and used slurm commands to check the job status. In this part, we used the pcluster CLI with a .yaml configuration file to generate the cluster. Our cluster consisted of 16 nodes, each a trn1n.32xlarge instance with 16 Trainium accelerators (32 GB of high-bandwidth memory per accelerator).
We set up our ParallelCluster infrastructure as shown in the following diagram (source).

As shown in the preceding figure, inside a VPC, there are two subnets, a public one and a private one. The head node resides in the public subnet, and the compute fleet (in this case, Trn1 instances) is in the private subnet. A NAT gateway is also needed in order for nodes in the private subnet to connect to clients outside the VPC. In the following section, we describe how to set up the necessary infrastructure for Trn1 ParallelCluster.
Set up the environment
To set up your environment, complete the following steps:

Install the VPC and necessary components for ParallelCluster. For instructions, see VPC setup for ParallelCluster with Trn1.
Create and launch ParallelCluster in the VPC. For instructions, see Create ParallelCluster.

Now you can launch a training job to submit a model training script as a slurm job.
Deploy to Trainium
Trainium-based EC2 Trn1 instances use the AWS Neuron SDK and support common ML frameworks like PyTorch and TensorFlow. Neuron allows for effortless distributed training and has integrations with NeMo Megatron and Neuron Distributed.
When engaging with Trainium, it’s crucial to understand several key parameters:

Tensor parallel size – This determines the level of tensor parallelization, particularly in self-attention computations within transformers, and is crucial for optimizing memory usage (not computational time efficiency) during model loading
NeuronCores – Each Trainium device has two NeuronCores, and an eight-node setup equates to a substantial 256 cores
Mini batch – This reflects the number of examples processed in each batch as determined by the data loader
World size – This is the total count of workers (one per NeuronCore) involved in the training operation

A deep understanding of these parameters is vital for anyone looking to harness the power of Trainium devices effectively.
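These quantities fit together arithmetically: the available NeuronCores are partitioned across tensor, pipeline, and data parallelism. A quick sketch using the eight-node, 256-core figure above (the tensor parallel size of 8 is an illustrative choice, not a prescribed setting):

```python
def data_parallel_degree(total_cores, tp_size, pp_size):
    """Cores are partitioned as TP x PP x DP, so DP is what remains."""
    assert total_cores % (tp_size * pp_size) == 0, "cores must divide evenly"
    return total_cores // (tp_size * pp_size)

# Eight Trn1 nodes: 16 Trainium devices per node, 2 NeuronCores per device.
cores = 8 * 16 * 2                      # 256 NeuronCores, as noted above
dp = data_parallel_degree(cores, tp_size=8, pp_size=1)
print(cores, dp)  # 256 32  -> 32 data-parallel model replicas
```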
Train the model
For this post, we train a Llama 2 7B model with tensor parallelism. For a streamlined and effective training process, we adhered to the following steps:

Download the Llama 2 full checkpoints (model weights and tokenizer) from Hugging Face.
Convert these checkpoints to a format compatible with the Neuron Distributed setup, so they can be efficiently utilized in our training infrastructure.
Determine the number of steps required per epoch, incorporating the effective batch size and dataset size to tailor the training process to our specific needs.
Launch the training job, carefully monitoring its progress and performance.
Periodically save training checkpoints. Initially, this process may be slow due to its synchronous nature, but improvements are anticipated as the NeuronX team works on enhancements.
Finally, convert the saved checkpoints back to a standard format for subsequent use, employing scripts for seamless conversion.
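The steps-per-epoch calculation in step 3 is plain arithmetic over the corpus and batch configuration. The 88-billion-token corpus and 4,096-token rows are from this post; the micro-batch size, data-parallel degree, and gradient accumulation below are assumed values for illustration:

```python
def steps_per_epoch(total_tokens, seq_len, micro_batch, dp_degree, grad_accum=1):
    """One optimizer step consumes seq_len * micro_batch * dp_degree * grad_accum tokens."""
    tokens_per_step = seq_len * micro_batch * dp_degree * grad_accum
    return total_tokens // tokens_per_step

steps = steps_per_epoch(total_tokens=88_000_000_000, seq_len=4096,
                        micro_batch=1, dp_degree=32, grad_accum=8)
print(steps)  # 83923 optimizer steps for one pass over the corpus
```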

For more details, you can find the full implementation of the training steps in the following GitHub repository.
Clean up
Don’t forget to tear down any resources you set up in this post.
Results
Our study focused on evaluating the quality of the CPT-enhanced checkpoints. We monitored the perplexity of a held-out PubMed dataset [6] across various checkpoints obtained during training, which provided valuable insights into the model’s performance improvements over time.
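The perplexity metric used here is the exponential of the mean per-token cross-entropy on the held-out set, so lower is better. A minimal sketch of the computation, assuming per-token losses (in nats) are already available from an evaluation pass:

```python
import math

def perplexity(token_losses):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(token_losses) / len(token_losses))

# Toy per-token cross-entropy losses for a held-out batch.
losses = [2.0, 2.0, 2.0, 2.0]
print(perplexity(losses))  # exp(2.0) ~= 7.389
```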
Through this journey, we’ve advanced our model’s capabilities, and hope to contribute to the broader community’s understanding of effective model adaptation strategies.
The following figure shows the perplexity of the baseline Llama 2 7B checkpoint vs. its CPT-enhanced checkpoint on the PMC test dataset. Based on these findings, continual pre-training on domain-specific raw data, specifically PubMed papers in our study, resulted in an enhancement of the Llama 2 7B checkpoint, leading to improved perplexity of the model on the PMC test set.

The following figure shows the perplexity of the CPT-enhanced checkpoints of the Llama 2 7B model across varying numbers of trained tokens. The increasing number of trained tokens correlated with enhanced model performance, as measured by the perplexity metric.

The following figure shows the perplexity comparison between the baseline Llama 2 7B model and its CPT-enhanced checkpoints, with and without data mixing. This underscores the significance of data mixing: we added 1% of general tokens to the domain-specific dataset, and the CPT-enhanced checkpoint trained with this mixture outperformed both the baseline Llama 2 7B model and the CPT-enhanced checkpoint trained solely on PubMed data.
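The data-mixing recipe (roughly 1% general-domain tokens added to the domain corpus) can be sketched as a simple row-replacement scheme. The 1% fraction is from this post; the sampling strategy below is an assumption, not Arcee's exact pipeline:

```python
import random

def mix_corpora(domain_rows, general_rows, general_fraction=0.01, seed=0):
    """Replace ~general_fraction of the training rows with general-domain rows."""
    rng = random.Random(seed)
    n_general = max(1, int(len(domain_rows) * general_fraction))
    mixed = list(domain_rows)
    for i in rng.sample(range(len(mixed)), n_general):
        mixed[i] = rng.choice(general_rows)
    return mixed

domain = [f"pubmed_{i}" for i in range(1000)]
general = [f"general_{i}" for i in range(50)]
mixed = mix_corpora(domain, general)
print(sum(r.startswith("general_") for r in mixed))  # 10 of 1000 rows are general-domain
```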

Arcee’s innovative approach to CPT and model merging, as demonstrated through our collaboration with Trainium, signifies a transformative advancement in the training of LLMs, particularly in specialized domains such as medical research. By using the extensive capabilities of Trainium, we have not only accelerated the model training process, but also significantly reduced costs, with an emphasis on security and compliance that provides data integrity within a secure AWS environment.
The results from our training experiments, as seen in the improved perplexity scores of domain-specific models, underscore the effectiveness of our method in enhancing the performance and applicability of LLMs across various fields. This is particularly evident from the direct comparisons of time-to-train metrics between Trainium and traditional GPU setups, where Trainium’s efficiency and cost-effectiveness shine.
Furthermore, our case study using PubMed data for domain-specific training highlights the potential of Arcee’s CPT strategies to fine-tune models to the nuances of highly specialized datasets, thereby creating more accurate and reliable tools for professionals in those fields.
As we continue to push the boundaries of what’s possible in LLM training, we encourage researchers, developers, and enterprises to take advantage of the scalability, efficiency, and enhanced security features of Trainium and Arcee’s methodologies. These technologies not only facilitate more effective model training, but also open up new avenues for innovation and practical application in AI-driven industries.
The integration of Trainium’s advanced ML capabilities with Arcee’s pioneering strategies in model training and adaptation is poised to revolutionize the landscape of LLM development, making it more accessible, economical, and tailored to meet the evolving demands of diverse industries.
To learn more about Arcee, visit our website or reach out to our team.
Additional resources

Arcee’s whitepaper: Case Study on How Arcee is Innovating Domain Adaptation, through Continual Pre-Training and Model Merging
Arcee’s arXiv paper on model merging: Arcee’s MergeKit: A Toolkit for Merging Large Language Models
Arcee’s Mergekit repository on GitHub


References

Gupta, Kshitij, et al. “Continual Pre-Training of Large Language Models: How to (re) warm your model?.” arXiv preprint arXiv:2308.04014 (2023).
Wu, Chaoyi, et al. “Pmc-LLaMA: Towards building open-source language models for medicine.” arXiv preprint arXiv:2305.10415 6 (2023).
Liu, Mingjie, et al. “Chipnemo: Domain-adapted llms for chip design.” arXiv preprint arXiv:2311.00176 (2023).
Touvron, Hugo, et al. “Llama 2: Open foundation and fine-tuned chat models.” arXiv preprint arXiv:2307.09288 (2023).
Wu, Chaoyi, et al. “PMC-LLaMA: Further fine-tuning LLaMA on medical papers.” arXiv preprint arXiv:2304.14454 (2023).

About the Authors

Mark McQuade is the CEO/Co-Founder at Arcee. Mark co-founded Arcee with a vision to empower enterprises with industry-specific AI solutions. This idea emerged from his time at Hugging Face, where he helped spearhead the Monetization team, collaborating with high-profile enterprises. This frontline experience exposed him to critical industry pain points: the reluctance to rely on closed source APIs and the challenges of training open source models without compromising data security.

Shamane Siri Ph.D. is the Head of Applied NLP Research at Arcee. Before joining Arcee, Shamane worked in both industry and academia, developing recommendation systems using language models to address the cold start problem, and focusing on information retrieval, multi-modal emotion recognition, and summarization. Shamane has also collaborated with the Hugging Face Transformers crew and Meta Reality Labs on cutting-edge projects. He holds a PhD from the University of Auckland, where he specialized in domain adaptation of foundational language models.

Malikeh Ehghaghi is an Applied NLP Research Engineer at Arcee. Malikeh’s research interests are NLP, domain-adaptation of LLMs, ML for healthcare, and responsible AI. She earned an MScAC degree in Computer Science from the University of Toronto. She previously collaborated with Lavita AI as a Machine Learning Consultant, developing healthcare chatbots in partnership with Dartmouth Center for Precision Health and Artificial Intelligence. She also worked as a Machine Learning Research Scientist at Cambridge Cognition Inc. and Winterlight Labs, with a focus on monitoring and detection of mental health disorders through speech and language. Malikeh has authored several publications presented at top-tier conferences such as ACL, COLING, AAAI, NAACL, IEEE-BHI, and MICCAI.

A Lightning-Fast LLM Inference Platform with Device Suppor …

In artificial intelligence, one common challenge is ensuring that language models can process information quickly and efficiently. Imagine you’re trying to use a language model to generate text or answer questions on your device, but it’s taking too long to respond. This delay can be frustrating and impractical, especially in real-time applications like chatbots or voice assistants.

Currently, some solutions are available to address this issue. Some platforms offer optimization techniques like quantization, which reduces the model’s size and speeds up inference. However, these solutions may not always be easy to implement or may not support a wide range of devices and models.

Meet a new platform designed to tackle the problem of slow language model inference head-on. It offers a variety of features to make inference faster and more efficient on different devices. It supports quantization, which reduces the memory usage of models and speeds up inference, and it provides an easy-to-use HTTP server and Python bindings, making it accessible for developers to integrate into their applications.

The platform supports a wide range of quantization levels, from 2-bit to 8-bit. This allows developers to choose the level of optimization that best suits their needs, balancing inference speed against model accuracy. It also supports device offloading, allowing certain layers of the model to be processed on specialized hardware for even faster inference.
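The core of n-bit weight quantization is mapping float weights onto a small integer grid and back. A generic sketch (not the platform's actual kernels), showing why lower bit-widths trade accuracy for memory:

```python
def quantize(weights, bits):
    """Uniform affine quantization of a float list to `bits`-bit integers."""
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1            # e.g. 3 levels at 2-bit, 255 at 8-bit
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = [round((w - lo) / scale) for w in weights]
    return q, scale, lo

def dequantize(q, scale, lo):
    return [v * scale + lo for v in q]

w = [-1.0, -0.5, 0.0, 0.5, 1.0]
for bits in (2, 4, 8):
    q, s, lo = quantize(w, bits)
    err = max(abs(a - b) for a, b in zip(w, dequantize(q, s, lo)))
    print(bits, round(err, 4))  # reconstruction error shrinks as bit-width grows
```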

Another important feature is support for various types of models, including models from Hugging Face as well as models in GGUF format. This means developers can use their preferred models without worrying about compatibility issues. Additionally, the platform supports advanced techniques like Flash Attention V2 and X-LoRA MoE, further enhancing inference speed and efficiency.

In conclusion, this is a powerful platform that addresses the challenge of slow language model inference with its wide range of features and optimizations. By supporting quantization, device offloading, and advanced model architectures, it enables developers to create fast and efficient AI applications for various use cases.
The post A Lightning-Fast LLM Inference Platform with Device Support, Quantization, and Open-AI API Compatible HTTP Server and Python Bindings appeared first on MarkTechPost.

This Machine Learning Paper from ICMC-USP, NYU, and Capital-One Introd …

In the ever-evolving field of machine learning, developing models that not only predict outcomes but also explain their reasoning is becoming increasingly crucial. As these models grow in complexity, they often become less transparent, resembling “black boxes” where the decision-making process is obscured. This opacity is problematic, particularly in sectors like healthcare and finance, where understanding the basis of decisions can be as important as the decisions themselves.

One fundamental issue with complex models is their lack of transparency, which complicates their adoption in environments where accountability is key. Traditionally, methods to increase model transparency have included various feature attribution techniques that explain predictions by assessing the importance of input variables. However, these methods often suffer from inconsistencies; for example, results may vary significantly across different runs of the same model on identical data.

Researchers have developed gradient-based attribution methods to tackle these inconsistencies, but they, too, have limitations. These methods can provide divergent explanations for the same input under different conditions, undermining their reliability and the trust users place in the models they aim to elucidate.

Researchers from the University of São Paulo (ICMC-USP), New York University, and Capital One introduced a new approach known as the T-Explainer. This framework focuses on local additive explanations based on the robust mathematical principles of Taylor expansions. It aims to maintain high accuracy and consistency in its explanations. Unlike other methods that might fluctuate in their explanatory output, the T-Explainer operates through a deterministic process that ensures stability and repeatability in its results.
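The idea of a local additive explanation from a Taylor expansion can be illustrated at first order: near a baseline x0, f(x) ≈ f(x0) + Σ_i ∂f/∂x_i · (x_i − x0_i), so each term of the sum attributes part of the prediction to one feature. This toy version uses finite differences and is not the T-Explainer's actual estimator:

```python
def first_order_attributions(f, x, x0, eps=1e-6):
    """Attribute f(x) - f(x0) to features via a first-order Taylor expansion."""
    attributions = []
    for i in range(len(x0)):
        xp = list(x0)
        xp[i] += eps
        grad_i = (f(xp) - f(x0)) / eps       # finite-difference partial derivative
        attributions.append(grad_i * (x[i] - x0[i]))
    return attributions

# For a linear model, the attributions recover each feature's exact contribution.
f = lambda v: 3.0 * v[0] - 2.0 * v[1]
attrs = first_order_attributions(f, x=[1.0, 1.0], x0=[0.0, 0.0])
print([round(a, 3) for a in attrs])
```

Because the expansion is computed deterministically from the model's local behavior, repeated runs on the same input yield identical attributions, which is the stability property the paper emphasizes.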

The T-Explainer not only pinpoints which features of a model influence predictions but does so with a precision that allows for deeper insight into the decision-making process. Through a series of benchmark tests, the T-Explainer demonstrated its superiority over established methods like SHAP and LIME in terms of stability and reliability. For instance, in comparative evaluations, T-Explainer consistently maintained explanation accuracy across multiple assessments, outperforming others in stability metrics such as Relative Input Stability (RIS) and Relative Output Stability (ROS).

The T-Explainer integrates seamlessly with existing frameworks, enhancing its utility. It has been applied effectively across various model types, showcasing flexibility that is not always present in other explanatory frameworks. Its ability to provide consistent and understandable explanations enhances the trust in AI systems and facilitates a more informed decision-making process, making it invaluable in critical applications.

In conclusion, the T-Explainer emerges as a powerful solution to the pervasive opacity issue in machine learning models. By leveraging Taylor expansions, this innovative framework offers deterministic and stable explanations that surpass existing methods like SHAP and LIME regarding consistency and reliability. The results from various benchmark tests confirm T-Explainer’s superior performance, significantly enhancing the transparency and trustworthiness of AI applications. As such, the T-Explainer addresses the critical need for clarity in AI decision-making processes and sets a new standard for explainability, paving the way for more accountable and interpretable AI systems.

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter. Join our Telegram Channel, Discord Channel, and LinkedIn Group.

If you like our work, you will love our newsletter.

Don’t Forget to join our 40k+ ML SubReddit
The post This Machine Learning Paper from ICMC-USP, NYU, and Capital-One Introduces T-Explainer: A Novel AI Framework for Consistent and Reliable Machine Learning Model Explanations appeared first on MarkTechPost.

From Lost to Found: INformation-INtensive (IN2) Training Revolutionize …

Long-context large language models (LLMs) have garnered attention, with extended context windows enabling them to process much longer inputs. However, recent studies highlight a challenge: these LLMs struggle to utilize information in the middle of the context effectively, termed the lost-in-the-middle challenge. While an LLM can comprehend information at the beginning and end of a long context, it often overlooks information in the middle. This impedes tasks like Needle-in-the-Haystack and passkey retrieval. Consequently, a pressing research question arises: how can long-context LLMs fully utilize the information in the long context?

Recent research has significantly advanced the exploration of training large models with extended context windows, particularly in long-context LLMs. This development primarily encompasses two directions: data engineering and effective training methods. Data engineering involves balancing, arrangement, instruction, data collection, and quality measurement. Effective training methods focus on optimizing training processes through techniques such as position encoding, batching strategy, parameter-efficient training, and novel model architectures. Evaluations of long-context models are conducted through real-world benchmarks like NarrativeQA and probing tasks like Needle-in-the-Haystack and passkey retrieval, providing insights into long-context utilization across various lengths and positions.

A team of researchers from IAIR, Xi’an Jiaotong University, Microsoft, and Peking University present INformation-INtensive (IN2) training to effectively utilize information throughout the context in long-context LLMs. IN2 training employs a purely data-driven approach using a synthesized long-context question-answer dataset. This dataset comprises concatenated long contexts from multiple short segments and corresponding question-answer pairs. These pairs prompt the model to recognize fine-grained information within individual segments and integrate information from various segments. The QA pairs are generated by directing GPT-4-Turbo with predefined instructions and raw segments.

IN2 training teaches the model that crucial information can exist throughout a long context, not just at its edges. The researchers construct a long-context question-answer training dataset, where answers require information from randomly placed short segments within the long context. Utilizing a natural language corpus, they generate question-answer pairs using a powerful LLM and synthesize long contexts containing relevant information from various texts. Two types of pairs are generated, focusing on fine-grained information awareness and integration of information across segments. They evenly distribute context lengths and retain some original short-context pairs to ensure balanced training. The resulting dataset comprises various types of data for different training purposes. 
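The synthesis procedure described above can be sketched roughly as follows; the function and field names are illustrative, not the authors' pipeline:

```python
import random

def make_in2_example(needle_segment, qa_pair, filler_segments, n_fillers=8, seed=0):
    """Illustrative IN2-style data synthesis: hide an information-bearing
    segment at a random position inside a long concatenated context, so the
    answer can require information from anywhere, including the middle."""
    rng = random.Random(seed)
    fillers = rng.sample(filler_segments, n_fillers)
    pos = rng.randrange(len(fillers) + 1)  # needle may land at any position
    segments = fillers[:pos] + [needle_segment] + fillers[pos:]
    return {
        "context": "\n".join(segments),
        "question": qa_pair["q"],
        "answer": qa_pair["a"],
        "needle_position": pos,
    }

example = make_in2_example(
    "The launch code is 7421.",
    {"q": "What is the launch code?", "a": "7421"},
    [f"Filler paragraph {i}." for i in range(20)],
)
```

Evenly distributing `needle_position` over many such examples is what forces the model to attend to every region of the context rather than just its edges.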

FILM-7B, trained using IN2 training, effectively addresses the lost-in-the-middle problem that long-context models encounter. Probing results demonstrate FILM-7B’s robust performance compared to the vanilla Mistral model, indicating its ability to utilize information across different positions within the context. FILM-7B achieves performance comparable to or better than GPT-4-Turbo across various tasks. Quantitative analysis using average score and min-max gap metrics on VAL Probing further validates FILM-7B’s effectiveness, particularly in document and code probing tasks. These results suggest that open-source long-context models can rival proprietary ones, closing the performance gap.
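The two summary metrics mentioned are straightforward to compute once per-position probing scores are available; the function below is a generic sketch, not the paper's evaluation code:

```python
def probing_metrics(scores_by_position):
    """Summarize probing results across needle positions: the average score,
    and the min-max gap that exposes lost-in-the-middle behavior
    (a large gap means accuracy collapses at some positions)."""
    avg = sum(scores_by_position) / len(scores_by_position)
    gap = max(scores_by_position) - min(scores_by_position)
    return avg, gap

# A model that forgets the middle: strong at the edges, weak in between.
avg, gap = probing_metrics([0.95, 0.60, 0.40, 0.65, 0.90])
```

A model that uses the whole context well should show both a high average and a small gap; a high average alone can hide a collapse at middle positions.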

In conclusion, this research introduces IN2 training, representing a significant step forward in addressing the lost-in-the-middle challenge faced by long-context LLMs. By effectively utilizing information throughout the context, FILM-7B, trained using IN2, demonstrates robust performance across various tasks, comparable to or even outperforming proprietary models like GPT-4-Turbo. These findings underscore the potential of open-source models to bridge the gap with proprietary ones, paving the way for further advancements in long-context language modeling.

The post From Lost to Found: INformation-INtensive (IN2) Training Revolutionizes Long-Context Language Understanding appeared first on MarkTechPost.

Microsoft’s GeckOpt Optimizes Large Language Models: Enhancing Compu …

Large language models (LLMs) are the backbone of numerous computational platforms, driving innovations that impact a broad spectrum of technological applications. These models are pivotal in processing and interpreting vast amounts of data, yet they are often hindered by high operational costs and inefficiencies related to system tool utilization.

Optimizing LLM performance without prohibitive computational expenses is a significant challenge in this field. Traditionally, LLMs operate under systems that engage various tools for any given task, regardless of the specific needs of each operation. This broad tool activation drains computational resources and significantly increases the costs associated with data processing tasks.

Emerging methodologies are refining the approach to tool selection in LLMs, focusing on the precision of tool deployment based on the task. By identifying the underlying intent of user commands through advanced reasoning capabilities, these systems can selectively streamline the toolset required for task execution. This strategic reduction in tool activation directly contributes to enhanced system efficiency and reduced computational overhead.

The GeckOpt system, developed by Microsoft Corporation researchers, represents a cutting-edge approach to intent-based tool selection. This methodology involves a preemptive user intent analysis, allowing for an optimized selection of API tools before the task execution begins. The system operates by narrowing down the potential tools to those most relevant to the task’s specific requirements, minimizing unnecessary activations, and focusing computational power where it is most needed.
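Conceptually, intent-based gating can be sketched as below. The intent labels, tool names, and keyword classifier are hypothetical stand-ins; GeckOpt itself performs the intent analysis with LLM reasoning:

```python
# Hypothetical sketch: map the inferred intent of a request to the
# minimal group of API tools, instead of exposing every registered tool.
TOOL_GROUPS = {
    "calendar": ["create_event", "list_events"],
    "email": ["send_mail", "search_mail"],
    "search": ["web_search"],
}

def classify_intent(request):
    # Stand-in for the LLM-based intent analysis GeckOpt performs.
    text = request.lower()
    if "meeting" in text or "schedule" in text:
        return "calendar"
    if "mail" in text:
        return "email"
    return "search"

def select_tools(request):
    """Gate tool activation on intent: fewer tool descriptions in the
    prompt means fewer tokens consumed per request."""
    return TOOL_GROUPS[classify_intent(request)]

print(select_tools("Schedule a meeting with Dana"))  # ['create_event', 'list_events']
```

Since every exposed tool description costs prompt tokens on each call, narrowing the toolset before execution is where the reported token savings come from.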

Preliminary results from implementing GeckOpt in a real-world setting, specifically on the Copilot platform with over 100 GPT-4-Turbo nodes, have shown promising outcomes. The system has substantially reduced token consumption by up to 24.6% while maintaining high operational standards. These efficiency gains are reflected in reduced system costs and improved response times without significant sacrifices in performance quality. The trials conducted have shown deviations within a negligible range of 1% in success rates, underscoring the reliability of GeckOpt under varied operational conditions.

The success of GeckOpt in streamlining LLM operations presents a robust case for the widespread adoption of intent-based tool selection methodologies. By effectively reducing the operational load and optimizing tool use, the system curtails costs and enhances the scalability of LLM applications across different platforms. Introducing such technologies is poised to transform the landscape of computational efficiency, offering a sustainable and cost-effective model for the future of large-scale AI implementations.

In conclusion, integrating intent-based tool selection through systems like GeckOpt marks a progressive step towards optimizing the infrastructure of large language models. This approach significantly mitigates the operational demands on LLM systems, promoting a cost-efficient and highly effective computational environment. As these models evolve and their applications expand, technological advancements will be crucial in harnessing AI’s potential while maintaining economic viability.

The post Microsoft’s GeckOpt Optimizes Large Language Models: Enhancing Computational Efficiency with Intent-Based Tool Selection in Machine Learning Systems appeared first on MarkTechPost.

How Scientific Machine Learning is Revolutionizing Research and Discov …

Scientific Machine Learning (SciML) is an innovative field at the crossroads of ML, data science, and computational modeling. This emerging discipline utilizes powerful algorithms to propel discoveries across various scientific domains, including biology, physics, and environmental sciences.


Expanding the Horizons of Research

Accelerated Discovery and Innovation

SciML allows for the quick processing and analysis of massive datasets, drastically reducing the time from hypothesis generation to experimental verification. This rapid cycle is pivotal in fields like pharmacology, where algorithms streamline the drug development process by analyzing vast databases of chemical compounds for potential drug efficacy and safety.

Sophisticated Predictive Models

Integrating machine learning with specific domain knowledge facilitates the creation of advanced predictive models. These models are instrumental in various applications, such as anticipating climate changes, predicting disease patterns in biomedicine, or discovering new astronomical phenomena.

Enhancement of Computational Efficiency

By automating the analysis of large datasets, SciML reduces both the time and cost associated with traditional research methods. This efficiency allows scientists to allocate more resources towards creative and complex challenges.

Diverse Applications in Scientific Fields

Biology and Medicine: In drug discovery, SciML helps identify new drug candidates more efficiently by predicting outcomes based on molecular data. This accelerates development and improves the predictability of drug interactions and side effects.

Genetics: Machine learning techniques are used in genomics to understand complex genetic information. These techniques facilitate advances in personalized medicine by predicting disease susceptibility and drug responses based on genetic variations.

Environmental Science: In climate science, machine learning models analyze vast amounts of ecological data to forecast weather patterns, assess climate change impacts, and model ecosystem responses.

Astrophysics: Machine learning aids in processing and interpreting the enormous data from telescopes and space missions, enhancing our understanding of the universe, from star classification to galaxy formation.

Material Science: AI is used to discover and design new materials with desirable properties faster than ever, revolutionizing industries from manufacturing to electronics.

Benefits and Challenges


The path forward for SciML involves enhancing collaborative efforts across disciplines to refine methodologies and expand applications. Addressing the ethical and technical challenges will ensure that SciML can fulfill its potential to push the boundaries of human knowledge and solve complex problems. SciML is poised to be a cornerstone in the next generation of scientific research, offering unprecedented tools for discovery and a deeper understanding of the complex natural world.



The post How Scientific Machine Learning is Revolutionizing Research and Discovery appeared first on MarkTechPost.

Cohere AI Open-Sources ‘Cohere Toolkit’: A Major Accelerant for Ge …

Cohere AI has made a major advancement in the field of Artificial Intelligence (AI) development by releasing the Cohere Toolkit, a comprehensive open-source repository designed to accelerate the development of AI applications. Cohere, which is a leading enterprise AI platform, has released the toolkit with future extensions to incorporate new platforms. This toolkit enables developers to make use of Cohere’s advanced models, Command, Embed, and Rerank, across several platforms, including AWS, Azure, and Cohere’s own platform. 

By providing a set of production-ready apps that can be easily deployed across cloud providers, the Cohere Toolkit enables developers to comply with strict security guidelines and interface with their desired data sources. The primary function of the Cohere Toolkit is to accelerate the development process by offering a collection of modular components and pre-built apps. 

Through the use of the Cohere Toolkit, developers can take advantage of a knowledge assistant, the toolkit’s first application that functions similarly to the well-known demo on Cohere’s platform. With its complex integration with company data, this virtual assistant boosts efficiency by enabling quick access to information, task automation, and more efficient teamwork.

The conversational skills of the Cohere-powered knowledge assistant are one of its most notable features. Cohere’s models are expertly adjusted to comprehend the intent of a conversation, remember previous exchanges, and carry out intricate corporate use cases through the RAG (Retrieval-Augmented Generation) method. It also provides carefully selected, relevant references taken straight from private databases, which improves the precision and reliability of answers.

Customizability is another appealing feature of the Cohere Toolkit. More than one hundred pre-built connectors are available for developers to integrate, enhancing the assistant’s functionality with customized data sources and tools. This adaptability lets developers create customized apps that closely match the particular needs and operations of the company.

The plug-and-play architecture of the toolkit consists of the following:

Interfaces: Pre-made UI elements that are easily incorporated with the backend programming. The knowledge assistant interface supports multi-turn chats, citation embedding, document uploads, and conversation histories.

Models: Enables users to engage with Cohere’s exclusive Command R and R+ models, which are hosted on a variety of AI platforms, to efficiently power applications.

Retrieval: Consists of the components needed to construct robust retrieval systems, which are essential to reliable RAG pipelines. This module includes integration options for popular libraries like LlamaIndex and LangChain and offers a selection of 100 free-to-use connectors to key enterprise data sources with OAuth authentication. Developers can use Cohere’s Embed model with vector databases like Pinecone, Weaviate, and OpenSearch on a variety of cloud AI services.
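The retrieval step at the heart of such a RAG pipeline reduces to nearest-neighbor search over embeddings. A minimal in-memory version, standing in for a vector database like Pinecone, Weaviate, or OpenSearch, looks like this:

```python
import numpy as np

def retrieve(query_vec, doc_vecs, doc_ids, k=2):
    """Rank documents by cosine similarity to the query embedding and
    return the top-k (id, score) pairs -- the core of any vector store."""
    q = np.asarray(query_vec, dtype=float)
    d = np.asarray(doc_vecs, dtype=float)
    q = q / np.linalg.norm(q)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    scores = d @ q
    top = np.argsort(-scores)[:k]
    return [(doc_ids[i], float(scores[i])) for i in top]

# Toy 3-d "embeddings"; a real pipeline would use Cohere's Embed model.
docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
hits = retrieve([1.0, 0.05, 0.0], docs, ["a", "b", "c"])
```

The retrieved passages are then cited back in the assistant's answer, which is how the toolkit grounds its responses in private data.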

In conclusion, the Cohere Toolkit is a major step forward in the development of AI applications. In the past, creating AI applications has been a laborious process that involves assembling many parts, including AI models, prompt templates, retrieval systems, and user interfaces, in a safe setting. Months of development time are frequently spent on this difficult undertaking as developers work through the challenges of integration and experimentation. The Cohere Toolkit greatly expedites the development lifecycle and democratizes access to powerful AI capabilities.

Check out the GitHub page. All credit for this research goes to the researchers of this project.
The post Cohere AI Open-Sources ‘Cohere Toolkit’: A Major Accelerant for Getting LLMs into Production within an Enterprise appeared first on MarkTechPost.

Databricks DBRX is now available in Amazon SageMaker JumpStart

Today, we are excited to announce that the DBRX model, an open, general-purpose large language model (LLM) developed by Databricks, is available for customers through Amazon SageMaker JumpStart to deploy with one click for running inference. The DBRX LLM employs a fine-grained mixture-of-experts (MoE) architecture, pre-trained on 12 trillion tokens of carefully curated data, with a maximum context length of 32,000 tokens.
You can try out this model with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms and models so you can quickly get started with ML. In this post, we walk through how to discover and deploy the DBRX model.
What is the DBRX model
DBRX is a sophisticated decoder-only LLM built on transformer architecture. It employs a fine-grained MoE architecture, incorporating 132 billion total parameters, with 36 billion of these parameters being active for any given input.
The model underwent pre-training on a dataset of 12 trillion tokens of text and code. In contrast to other open MoE models like Mixtral and Grok-1, DBRX takes a fine-grained approach, using a larger number of smaller experts for optimized performance: of its 16 experts, it selects 4 for each input.
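The “choose 4 of 16” gating described above can be illustrated with a minimal top-k router; this is a schematic of the general MoE mechanism, not DBRX’s actual implementation:

```python
import numpy as np

def topk_route(router_logits, k=4):
    """Top-k MoE gating: keep the k highest-scoring experts for a token
    and renormalize their weights with a softmax over just those k."""
    top = np.argsort(-router_logits)[:k]
    w = np.exp(router_logits[top] - router_logits[top].max())
    return top, w / w.sum()

logits = np.random.default_rng(0).normal(size=16)  # one router score per expert
experts, weights = topk_route(logits)
```

Only the selected experts run for a given token, which is why just 36 billion of the 132 billion total parameters are active per input.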
The model is made available under the Databricks Open Model license, for use without restrictions.
What is SageMaker JumpStart
SageMaker JumpStart is a fully managed platform that offers state-of-the-art foundation models for various use cases such as content writing, code generation, question answering, copywriting, summarization, classification, and information retrieval. It provides a collection of pre-trained models that you can deploy quickly and with ease, accelerating the development and deployment of ML applications. One of the key components of SageMaker JumpStart is the Model Hub, which offers a vast catalog of pre-trained models, such as DBRX, for a variety of tasks.
You can now discover and deploy DBRX models with a few clicks in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK, enabling you to derive model performance and MLOps controls with Amazon SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, or container logs. The model is deployed in an AWS secure environment and under your VPC controls, helping provide data security.
Discover models in SageMaker JumpStart
You can access the DBRX model through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we go over how to discover the models in SageMaker Studio.
SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all ML development steps, from preparing data to building, training, and deploying your ML models. For more details on how to get started and set up SageMaker Studio, refer to Amazon SageMaker Studio.
In SageMaker Studio, you can access SageMaker JumpStart by choosing JumpStart in the navigation pane.

From the SageMaker JumpStart landing page, you can search for “DBRX” in the search box. The search results will list DBRX Instruct and DBRX Base.

You can choose the model card to view details about the model such as license, data used to train, and how to use the model. You will also find the Deploy button to deploy the model and create an endpoint.

Deploy the model in SageMaker JumpStart
Deployment starts when you choose the Deploy button. After deployment finishes, you will see that an endpoint is created. You can test the endpoint by passing a sample inference request payload or by selecting the testing option using the SDK. When you select the option to use the SDK, you will see example code that you can use in the notebook editor of your choice in SageMaker Studio.
To deploy using the SDK, we start by selecting the DBRX Base model, specified by the model_id with value huggingface-llm-dbrx-base. You can deploy any of the selected models on SageMaker with the following code. Similarly, you can deploy DBRX Instruct using its own model ID.

from sagemaker.jumpstart.model import JumpStartModel

accept_eula = True

model = JumpStartModel(model_id="huggingface-llm-dbrx-base")
predictor = model.deploy(accept_eula=accept_eula)

This deploys the model on SageMaker with default configurations, including the default instance type and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel. The accept_eula value must be explicitly set to True in order to accept the end-user license agreement (EULA). Also make sure you have the account-level service quota for using ml.p4d.24xlarge or ml.p4de.24xlarge instances for endpoint usage. You can follow the instructions here in order to request a service quota increase.
After it’s deployed, you can run inference against the deployed endpoint through the SageMaker predictor:

payload = {
    "inputs": "Hello!",
    "parameters": {
        "max_new_tokens": 10,
    },
}
response = predictor.predict(payload)

Example prompts
You can interact with the DBRX Base model like any standard text generation model, where the model processes an input sequence and outputs predicted next words in the sequence. In this section, we provide some example prompts and sample output.
Code generation
Using the preceding example, we can use code generation prompts as follows:

payload = {
    "inputs": "Write a function to read a CSV file in Python using pandas library:",
    "parameters": {
        "max_new_tokens": 30,
    },
}
response = predictor.predict(payload)["generated_text"].strip()

The following is the output:

import pandas as pd
df = pd.read_csv("file_name.csv")
# The above code will import pandas library and then read the CSV file using read_csv

Sentiment analysis
You can perform sentiment analysis using a prompt like the following with DBRX:

payload = {
    "inputs": """
Tweet: "I am so excited for the weekend!"
Sentiment: Positive

Tweet: "Why does traffic have to be so terrible?"
Sentiment: Negative

Tweet: "Just saw a great movie, would recommend it."
Sentiment: Positive

Tweet: "According to the weather report, it will be cloudy today."
Sentiment: Neutral

Tweet: "This restaurant is absolutely terrible."
Sentiment: Negative

Tweet: "I love spending time with my family."
Sentiment:""",
    "parameters": {
        "max_new_tokens": 2,
    },
}
response = predictor.predict(payload)["generated_text"].strip()

The following is the output:


Question answering
You can use a question answering prompt like the following with DBRX:

# Question answering
payload = {
    "inputs": "Respond to the question: How did the development of transportation systems, such as railroads and steamships, impact global trade and cultural exchange?",
    "parameters": {
        "max_new_tokens": 225,
    },
}
response = predictor.predict(payload)["generated_text"].strip()

The following is the output:

The development of transportation systems, such as railroads and steamships, impacted global trade and cultural exchange in a number of ways.
The documents provided show that the development of these systems had a profound effect on the way people and goods were able to move around the world.
One of the most significant impacts of the development of transportation systems was the way it facilitated global trade.
The documents show that the development of railroads and steamships made it possible for goods to be transported more quickly and efficiently than ever before.
This allowed for a greater exchange of goods between different parts of the world, which in turn led to a greater exchange of ideas and cultures.
Another impact of the development of transportation systems was the way it facilitated cultural exchange. The documents show that the development of railroads and steamships made it possible for people to travel more easily and quickly than ever before.
This allowed for a greater exchange of ideas and cultures between different parts of the world. Overall, the development of transportation systems, such as railroads and steamships, had a profound impact on global trade and cultural exchange.

DBRX Instruct
The instruction-tuned version of DBRX accepts formatted instructions where conversation roles must start with a prompt from the user and alternate between user instructions and the assistant (DBRX-instruct). The instruction format must be strictly respected, otherwise the model will generate suboptimal outputs. The template to build a prompt for the Instruct model is defined as follows:

<|im_start|>system
{system_message} <|im_end|>
<|im_start|>user
{human_message} <|im_end|>
<|im_start|>assistant

<|im_start|> and <|im_end|> are special tokens for beginning of string (BOS) and end of string (EOS). The prompt can contain multiple conversation turns between system, user, and assistant, allowing for the incorporation of few-shot examples to enhance the model’s responses.
The following code shows how you can format the prompt in instruction format:

from typing import Dict, List

def format_instructions(instructions: List[Dict[str, str]]) -> str:
    """Format instructions where conversation roles must alternate system/user/assistant/user/assistant/..."""
    prompt: List[str] = []
    for instruction in instructions:
        if instruction["role"] == "system":
            prompt.extend(["<|im_start|>system\n", (instruction["content"]).strip(), " <|im_end|>\n"])
        elif instruction["role"] == "user":
            prompt.extend(["<|im_start|>user\n", (instruction["content"]).strip(), " <|im_end|>\n"])
        else:
            raise ValueError(f"Invalid role: {instruction['role']}. Role must be either 'user' or 'system'.")
    prompt.extend(["<|im_start|>assistant\n"])
    return "".join(prompt)

def print_instructions(prompt: str, response: Dict[str, str]) -> None:
    bold, unbold = "\033[1m", "\033[0m"
    print(f"{bold}> Input{unbold}\n{prompt}\n\n{bold}> Output{unbold}\n{response['generated_text'].strip()}\n")

Knowledge retrieval
You can use the following prompt for knowledge retrieval:

instructions = [
    {"role": "system", "content": "You are an expert in suggesting diet plans optimized for different sports or fitness activities, taking into account energy requirements, and recovery requirements."},
    {"role": "user", "content": "I am trying to build muscle mass and increase my strength. What kind of high-protein, calorie-dense diet plan would you recommend for me?"}
]
prompt = format_instructions(instructions)
payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 700, "temperature": 0.5, "do_sample": True}
}
response = predictor.predict(payload)
print_instructions(prompt, response)

The following is the output:

> Input
You are an expert in suggesting diet plans optimized for different sports or fitness activities, taking into account energy requirements, and recovery requirements. <|im_end|>
I am trying to build muscle mass and increase my strength. What kind of high-protein, calorie-dense diet plan would you recommend for me? <|im_end|>

> Output
Absolutely, I’d be happy to help you with that. Given your goal of building muscle mass and increasing strength, you’ll want a diet that’s high in protein and provides sufficient calories to support muscle growth and recovery. Here’s a basic outline of what your diet plan could look like:

1. **Breakfast:**
– 3 whole eggs
– 3 egg whites
– 1 cup of oatmeal
– 1 scoop of whey protein
– 1 banana

2. **Mid-Morning Snack:**
– 1.5 ounces of almonds
– 1 apple

3. **Lunch:**
– 8 ounces of chicken breast
– 1 cup of brown rice
– 1 cup of mixed vegetables

4. **Afternoon Snack:**
– 1.5 cups of Greek yogurt
– 1 cup of mixed berries

5. **Dinner:**
– 8 ounces of salmon or lean beef
– 1 cup of quinoa
– 1.5 cups of mixed vegetables

6. **Evening Snack:**
– 1 scoop of casein protein
– 1 tablespoon of almond butter

This is a high-protein, calorie-dense diet that includes a variety of nutrient-rich foods. It’s important to note that everyone’s nutritional needs are different, so you may need to adjust portion sizes or food choices based on your specific needs and preferences.

Also, remember to stay hydrated by drinking plenty of water throughout the day. Regular exercise, particularly strength training, is crucial to achieving your goals. It’s also important to get enough sleep to aid in muscle recovery and growth.

Before starting any new diet or exercise program, it’s a good idea to consult with a healthcare professional or a registered dietitian. They can provide personalized advice based on your health history and current lifestyle.

Code generation
DBRX models demonstrate benchmarked strengths for coding tasks. For example, see the following code:

instructions = [
    {"role": "system", "content": "You are an expert in coding and programming. Your role is to provide examples, explanations, and insights related to programming languages, algorithms, data structures, and software development practices."},
    {"role": "user", "content": "Write a Python script to read a CSV file containing stock prices and plot the closing prices over time using Matplotlib."}
]
prompt = format_instructions(instructions)
payload = {"inputs": prompt, "parameters": {"max_new_tokens": 315, "temperature": 0.3, "do_sample": True}}
response = predictor.predict(payload)
print_instructions(prompt, response)

The following is the output:

> Input
You are an expert in coding and programming. Your role is to provide examples, explanations, and insights related to programming languages, algorithms, data structures, and software development practices. <|im_end|>
Write a Python script to read a CSV file containing stock prices and plot the closing prices over time using Matplotlib. <|im_end|>

> Output
Sure, I’d be happy to help you with that. Here’s a simple Python script that uses the `pandas` library to read a CSV file and `matplotlib` to plot the closing prices over time.

import pandas as pd
import matplotlib.pyplot as plt

# Load the data from a CSV file
df = pd.read_csv('stock_prices.csv')

# Plot the closing prices over time
plt.plot(df['Date'], df['Close'])
plt.title('Stock Prices Over Time')

In this script, replace 'stock_prices.csv' with the path to your CSV file. The file should have columns named 'Date' and 'Close' for this script to work correctly. If your file uses different column names, you'll need to adjust the script accordingly.

The `pandas` library is a powerful tool for data manipulation and analysis. It can read data from a variety of sources, including CSV files, and represent the data in a tabular format. The `matplotlib` library is a popular choice for creating static, animated, and interactive visualizations in Python.

This script first loads the data from the CSV file into a pandas DataFrame. It then plots the ‘Close’ column against the ‘Date’ column using matplotlib’s `plot()` function. The `figure()` function is used to specify the size of the plot, and `show()` is used to display the plot.
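The generated script was cut off by the `max_new_tokens` limit, so the `figure()` and `show()` calls mentioned in the model's explanation never appear. A complete, runnable version (using hypothetical sample data in place of a real `stock_prices.csv`) might look like the following:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

# Hypothetical sample data standing in for a real stock_prices.csv
pd.DataFrame({
    "Date": pd.date_range("2024-01-02", periods=5, freq="D"),
    "Close": [187.2, 188.9, 186.5, 189.4, 191.0],
}).to_csv("stock_prices.csv", index=False)

# Load the data from the CSV file
df = pd.read_csv("stock_prices.csv", parse_dates=["Date"])

# Plot the closing prices over time
plt.figure(figsize=(10, 5))
plt.plot(df["Date"], df["Close"])
plt.title("Stock Prices Over Time")
plt.xlabel("Date")
plt.ylabel("Closing Price")
plt.savefig("stock_prices.png")  # use plt.show() in an interactive session
```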

Mathematics and reasoning
The DBRX models also report strengths in mathematic accuracy. For example, see the following code:

instructions = [
    {"role": "system", "content": "You are an expert in mathematics and reasoning. Your role is to provide examples, explanations, and insights related to mathematical concepts, problem-solving techniques, and logical reasoning."},
    {"role": "user", "content": "A car rental company charges $40 per day plus $0.25 per mile driven. If I rented a car for 3 days and drove 250 miles, what would be the total cost? Break down the calculation steps before giving the final answer."}
]
prompt = format_instructions(instructions)
payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 400, "temperature": 0.2, "do_sample": True}
}
response = predictor.predict(payload)
print_instructions(prompt, response)

DBRX walks through the math step by step, as shown in the following output:

> Input
You are an expert in mathematics and reasoning. Your role is to provide examples, explanations, and insights related to mathematical concepts, problem-solving techniques, and logical reasoning. <|im_end|>
A car rental company charges $40 per day plus $0.25 per mile driven. If I rented a car for 3 days and drove 250 miles, what would be the total cost? Break down the calculation steps before giving the final answer. <|im_end|>

> Output
Sure, I’d be happy to help with that! The problem you’ve presented involves both fixed costs (the daily rate) and variable costs (the cost per mile). Let’s break it down:

1. First, let’s calculate the total cost of the fixed daily rate. The car rental company charges $40 per day, and you rented the car for 3 days. So, the total cost for the daily rate is $40/day * 3 days = $120.

2. Next, let’s calculate the total cost of the variable rate, which is the cost per mile driven. The company charges $0.25 per mile, and you drove 250 miles. So, the total cost for the mileage is $0.25/mile * 250 miles = $62.50.

3. Finally, to find the total cost of renting the car, we add the total cost of the daily rate and the total cost of the mileage. So, the total cost is $120 + $62.50 = $182.50.

So, the total cost of renting the car for 3 days and driving 250 miles would be $182.50.
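The model's arithmetic checks out, which is easy to verify in a few lines of Python:

```python
# Rental cost: fixed daily rate plus a per-mile charge
daily_rate = 40.00   # dollars per day
per_mile = 0.25      # dollars per mile
days, miles = 3, 250

fixed_cost = daily_rate * days    # 40 * 3 = 120.00
variable_cost = per_mile * miles  # 0.25 * 250 = 62.50
total = fixed_cost + variable_cost

print(f"${total:.2f}")  # → $182.50
```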

Clean up
After you’re done running the notebook, make sure to delete all resources that you created in the process so your billing is stopped. For example, assuming the `predictor` object created when you deployed the model, the following standard SageMaker SDK calls remove the model and endpoint:

predictor.delete_model()
predictor.delete_endpoint()

Conclusion
In this post, we showed you how to get started with DBRX in SageMaker Studio and deploy the model for inference. Because foundation models are pre-trained, they can help lower training and infrastructure costs and enable customization for your use case. Visit SageMaker JumpStart in SageMaker Studio now to get started.

SageMaker JumpStart documentation
SageMaker JumpStart foundation models documentation
SageMaker JumpStart product detail page
SageMaker JumpStart model catalog

About the Authors
Shikhar Kwatra is an AI/ML Specialist Solutions Architect at Amazon Web Services, working with a leading Global System Integrator. He has earned the title of one of the Youngest Indian Master Inventors with over 400 patents in the AI/ML and IoT domains. He has over 8 years of industry experience from startups to large-scale enterprises, from IoT Research Engineer, Data Scientist, to Data & AI Architect. Shikhar aids in architecting, building, and maintaining cost-efficient, scalable cloud environments for organizations and supports GSI partners in building strategic industry solutions on AWS.
Niithiyn Vijeaswaran is a Solutions Architect at AWS. His area of focus is generative AI and AWS AI Accelerators. He holds a Bachelor’s degree in Computer Science and Bioinformatics. Niithiyn works closely with the Generative AI GTM team to enable AWS customers on multiple fronts and accelerate their adoption of generative AI. He’s an avid fan of the Dallas Mavericks and enjoys collecting sneakers.
Sebastian Bustillo is a Solutions Architect at AWS. He focuses on AI/ML technologies with a profound passion for generative AI and compute accelerators. At AWS, he helps customers unlock business value through generative AI. When he’s not at work, he enjoys brewing a perfect cup of specialty coffee and exploring the world with his wife.
Armando Diaz is a Solutions Architect at AWS. He focuses on generative AI, AI/ML, and data analytics. At AWS, Armando helps customers integrate cutting-edge generative AI capabilities into their systems, fostering innovation and competitive advantage. When he’s not at work, he enjoys spending time with his wife and family, hiking, and traveling the world.

Knowledge Bases in Amazon Bedrock now simplifies asking questions on a single document

At AWS re:Invent 2023, we announced the general availability of Knowledge Bases for Amazon Bedrock. With Knowledge Bases for Amazon Bedrock, you can securely connect foundation models (FMs) in Amazon Bedrock to your company data for fully managed Retrieval Augmented Generation (RAG).
In previous posts, we covered new capabilities like hybrid search support, metadata filtering to improve retrieval accuracy, and how Knowledge Bases for Amazon Bedrock manages the end-to-end RAG workflow.
Today, we’re introducing the new capability to chat with your document with zero setup in Knowledge Bases for Amazon Bedrock. With this new capability, you can securely ask questions on single documents, without the overhead of setting up a vector database or ingesting data, making it effortless for businesses to use their enterprise data. You only need to provide a relevant data file as input and choose your FM to get started.
But before we jump into the details of this feature, let’s start with the basics and understand what RAG is, its benefits, and how this new capability enables content retrieval and generation for temporal needs.
What is Retrieval Augmented Generation?
FM-powered artificial intelligence (AI) assistants have limitations, such as providing outdated information or struggling with context outside their training data. RAG addresses these issues by allowing FMs to cross-reference authoritative knowledge sources before generating responses.
With RAG, when a user asks a question, the system retrieves relevant context from a curated knowledge base, such as company documentation. It provides this context to the FM, which uses it to generate a more informed and precise response. RAG helps overcome FM limitations by augmenting its capabilities with an organization’s proprietary knowledge, enabling chatbots and AI assistants to provide up-to-date, context-specific information tailored to business needs without retraining the entire FM. At AWS, we recognize RAG’s potential and have worked to simplify its adoption through Knowledge Bases for Amazon Bedrock, providing a fully managed RAG experience.
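As a purely illustrative sketch (not how Amazon Bedrock implements retrieval internally), the core RAG loop can be reduced to two steps: retrieve the most relevant passage, then prepend it to the prompt. The toy keyword-overlap scorer below stands in for the vector-embedding retrieval a production system would use:

```python
# Toy RAG loop: score passages by keyword overlap, then augment the prompt.
# A real system would use vector embeddings and a managed retriever.
knowledge_base = [
    "Our support hours are 9am to 5pm EST, Monday through Friday.",
    "The mobile app requires iOS 15 or Android 12 and later.",
    "Refunds are processed within 5 business days of approval.",
]

def retrieve(question, passages):
    """Return the passage sharing the most words with the question."""
    q_words = set(question.lower().split())
    return max(passages, key=lambda p: len(q_words & set(p.lower().split())))

def build_prompt(question, passages):
    """Augment the user's question with the retrieved context."""
    context = retrieve(question, passages)
    return f"Use this context to answer.\nContext: {context}\nQuestion: {question}"

prompt = build_prompt("What are the requirements for the mobile app?", knowledge_base)
```

The FM then answers from the supplied context rather than from its (possibly stale) training data alone.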
Short-term and instant information needs
Although a knowledge base does all the heavy lifting and serves as a persistent large store of enterprise knowledge, you might require temporary access to data for specific tasks or analysis within isolated user sessions. Traditional RAG approaches are not optimized for these short-term, session-based data access scenarios.
Businesses incur charges for data storage and management. This may make RAG less cost-effective for organizations with highly dynamic or ephemeral information requirements, especially when data is only needed for specific, isolated tasks or analyses.
Ask questions on a single document with zero setup
This new capability to chat with your document within Knowledge Bases for Amazon Bedrock addresses the aforementioned challenges. It provides a zero-setup method to use your single document for content retrieval and generation-related tasks, along with the FMs provided by Amazon Bedrock. With this new capability, you can ask questions of your data without the overhead of setting up a vector database or ingesting data, making it effortless to use your enterprise data.
You can now interact with your documents in real time without prior data ingestion or database configuration. You don’t need to take any further data readiness steps before querying the data.
This zero-setup approach makes it straightforward to use your enterprise information assets with generative AI using Amazon Bedrock.
Use cases and benefits
Consider a recruiting firm that needs to analyze resumes and match candidates with suitable job opportunities based on their experience and skills. Previously, you would have to set up a knowledge base, invoking a data ingestion workflow to make sure only authorized recruiters can access the data. Additionally, you would need to manage cleanup when the data was no longer required for a session or candidate. In the end, you would pay more for the vector database storage and management than for the actual FM usage. This new feature in Knowledge Bases for Amazon Bedrock enables recruiters to quickly and ephemerally analyze resumes and match candidates with suitable job opportunities based on the candidate’s experience and skill set.
For another example, consider a product manager at a technology company who needs to quickly analyze customer feedback and support tickets to identify common issues and areas for improvement. With this new capability, you can simply upload a document to extract insights in no time. For example, you could ask “What are the requirements for the mobile app?” or “What are the common pain points mentioned by customers regarding our onboarding process?” This feature empowers you to rapidly synthesize this information without the hassle of data preparation or any management overhead. You can also request summaries or key takeaways, such as “What are the highlights from this requirements document?”
The benefits of this feature extend beyond cost savings and operational efficiency. By eliminating the need for vector databases and data ingestion, this new capability within Knowledge Bases for Amazon Bedrock helps secure your proprietary data, making it accessible only within the context of isolated user sessions.
Now that we’ve covered the feature benefits and the use cases it enables, let’s dive into how you can start using this new feature from Knowledge Bases for Amazon Bedrock.
Chat with your document in Knowledge Bases for Amazon Bedrock
You have multiple options to begin using this feature:

The Amazon Bedrock console
The Amazon Bedrock RetrieveAndGenerate API (SDK)

Let’s see how we can get started using the Amazon Bedrock console:

On the Amazon Bedrock console, under Orchestration in the navigation pane, choose Knowledge bases.
Choose Chat with your document.
Under Model, choose Select model.
Choose your model. For this example, we use the Claude 3 Sonnet model (only Sonnet is supported at the time of launch).
Choose Apply.
Under Data, you can upload the document you want to chat with or point to the Amazon Simple Storage Service (Amazon S3) bucket location that contains your file. For this post, we upload a document from our computer.

The supported file formats are PDF, MD (Markdown), TXT, DOCX, HTML, CSV, XLS, and XLSX. Make sure that the file size does not exceed 10 MB and that the file contains no more than 20,000 tokens. A token is a unit of text, such as a word, sub-word, number, or symbol, that is processed as a single entity. Because of the preset ingestion token limit, it is recommended to keep files under 10 MB; however, a text-heavy file much smaller than 10 MB can still exceed the token limit.
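Because the 20,000-token ceiling can bind before the 10 MB size limit does, a rough pre-check is worth running. Tokenizers vary by model; the 4-characters-per-token ratio below is a common rule of thumb, not the exact Bedrock tokenizer:

```python
import os

MAX_TOKENS = 20_000
MAX_BYTES = 10 * 1024 * 1024
CHARS_PER_TOKEN = 4  # heuristic; actual tokenization is model-specific

def fits_limits(path):
    """Rough check of a text file against the size and token limits."""
    size_ok = os.path.getsize(path) <= MAX_BYTES
    with open(path, encoding="utf-8", errors="ignore") as f:
        est_tokens = len(f.read()) / CHARS_PER_TOKEN
    return size_ok and est_tokens <= MAX_TOKENS

# Example: ~17,000 characters is well under both limits
with open("sample.txt", "w") as f:
    f.write("quarterly report " * 1000)
print(fits_limits("sample.txt"))  # → True (~4,250 estimated tokens)
```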

You’re now ready to chat with your document.

As shown in the following screenshot, you can chat with your document in real time.

To customize your prompt, enter your prompt under System prompt.

Similarly, you can use the AWS SDK through the retrieve_and_generate API in major coding languages. In the following example, we use the AWS SDK for Python (Boto3):

import boto3

bedrock_client = boto3.client(service_name='bedrock-agent-runtime')
model_id = "your_model_id_here"      # Replace with your model ID
document_uri = "your_s3_uri_here"    # Replace with your S3 URI

def retrieveAndGenerate(input_text, sourceType, model_id, document_s3_uri=None, data=None):
    region = 'us-west-2'
    model_arn = f'arn:aws:bedrock:{region}::foundation-model/{model_id}'

    if sourceType == "S3":
        return bedrock_client.retrieve_and_generate(
            input={'text': input_text},
            retrieveAndGenerateConfiguration={
                'type': 'EXTERNAL_SOURCES',
                'externalSourcesConfiguration': {
                    'modelArn': model_arn,
                    'sources': [
                        {
                            "sourceType": sourceType,
                            "s3Location": {"uri": document_s3_uri}
                        }
                    ]
                }
            }
        )
    else:
        return bedrock_client.retrieve_and_generate(
            input={'text': input_text},
            retrieveAndGenerateConfiguration={
                'type': 'EXTERNAL_SOURCES',
                'externalSourcesConfiguration': {
                    'modelArn': model_arn,
                    'sources': [
                        {
                            "sourceType": sourceType,
                            "byteContent": {
                                "identifier": "testFile.txt",
                                "contentType": "text/plain",
                                "data": data
                            }
                        }
                    ]
                }
            }
        )

response = retrieveAndGenerate(
    input_text="What is the main topic of this document?",
    sourceType="S3",
    model_id=model_id,
    document_s3_uri=document_uri
)
print(response['output']['text'])

Conclusion
In this post, we covered how Knowledge Bases for Amazon Bedrock now simplifies asking questions on a single document. We explored the core concepts behind RAG, the challenges this new feature addresses, and the various use cases it enables across different roles and industries. We also demonstrated how to configure and use this capability through the Amazon Bedrock console and the AWS SDK, showcasing the simplicity and flexibility of this feature, which provides a zero-setup solution to gather information from a single document, without setting up a vector database.
To further explore the capabilities of Knowledge Bases for Amazon Bedrock, refer to the following resources:

Knowledge bases for Amazon Bedrock
Getting started with Amazon Bedrock, RAG, and Vector database in Python
Vector Embeddings and RAG Demystified: Leveraging Amazon Bedrock, Aurora, and LangChain (Part 1 and Part 2)

Share and learn with our generative AI community at

About the authors

Suman Debnath is a Principal Developer Advocate for Machine Learning at Amazon Web Services. He frequently speaks at AI/ML conferences, events, and meetups around the world. He is passionate about large-scale distributed systems and is an avid fan of Python.

Sebastian Munera is a Software Engineer on the Amazon Bedrock Knowledge Bases team at AWS, where he focuses on building customer solutions that leverage generative AI and RAG applications. He has previously worked on building generative AI-based solutions for customers to streamline their processes and on low-code/no-code applications. In his spare time, he enjoys running, lifting, and tinkering with technology.

Revolutionizing Web Automation: AUTOCRAWLER’s Innovative Framework E …

Web automation technologies are vital in streamlining complex tasks that traditionally require human intervention. These technologies automate actions within web-based platforms, enhancing efficiency and scalability across various digital operations. Traditionally, web automation relies heavily on scripts or software, known as wrappers, to extract data from websites. While effective in consistent, unchanging environments, this method struggles with adaptability when confronted with new or updated web architectures.

The primary challenge in the field revolves around the inflexibility of existing web automation tools, which fail to adapt to dynamic and evolving web environments efficiently. Many of these tools depend on static rules or wrappers that cannot cope with the variability and unpredictability of modern web interfaces, leading to inefficiencies in web interaction and data extraction.

Researchers from Fudan University, Fudan-Aishu Cognitive Intelligence Joint Research Center, and Alibaba Holding-Aicheng Technology-Enterprise have developed AUTOCRAWLER, a sophisticated two-stage framework that significantly enhances the capability of web automation tools. This new approach exploits HTML’s hierarchical nature to better understand and interact with web pages. By combining top-down and step-back operations, AUTOCRAWLER adapts to the structure of web content, learning from previous errors to optimize future actions.

AUTOCRAWLER’s innovation lies in its ability to learn and adjust quickly. As it navigates through web pages, it refines its approach to interacting with web elements, thus minimizing errors and enhancing efficiency. The framework’s adaptability is evident in its performance across diverse web environments, showing considerable improvements over traditional methods. For instance, in tests involving multiple large language models (LLMs), AUTOCRAWLER demonstrated a success rate enhancement, with precision metrics improving significantly compared to existing tools.
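The top-down/step-back idea can be caricatured as a search over the DOM tree: descend toward a promising node, and when extraction fails at that node, back up to its parent and retry from there. The following toy sketch over a nested-dict "DOM" is purely illustrative and is not the authors' implementation:

```python
# Toy DOM: each node has a tag, optional text, and children.
dom = {
    "tag": "html", "text": "", "children": [
        {"tag": "nav", "text": "menu", "children": []},
        {"tag": "div", "text": "", "children": [
            {"tag": "span", "text": "Price: $42", "children": []},
        ]},
    ],
}

def top_down(node, predicate, path=()):
    """Descend the tree, returning the path to the first matching node."""
    if predicate(node):
        return path + (node["tag"],)
    for child in node["children"]:
        found = top_down(child, predicate, path + (node["tag"],))
        if found:
            return found
    return None

def step_back(path):
    """On failure at a node, retreat to its parent and retry from there."""
    return path[:-1] if len(path) > 1 else path

path = top_down(dom, lambda n: "Price" in n["text"])
print(path)             # → ('html', 'div', 'span')
print(step_back(path))  # → ('html', 'div')
```

In the actual framework, an LLM drives both moves, judging which subtree to descend into and when a generated extraction rule has failed.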

The framework’s experimental results showed a remarkable increase in the accuracy and efficiency of web crawlers powered by AUTOCRAWLER. Specifically, using AUTOCRAWLER with smaller LLMs achieved a correct execution rate upwards of 40%, a substantial improvement over traditional methods, which often struggled to reach such levels of precision.

In conclusion, the research presents AUTOCRAWLER, a pioneering framework that addresses the critical shortcomings of traditional web automation tools. By employing a two-stage methodology that capitalizes on the hierarchical structure of HTML, AUTOCRAWLER significantly enhances adaptability and scalability in dynamic web environments. The results from extensive testing showcase marked improvements in efficiency and performance, particularly in precision metrics across diverse web scenarios. This breakthrough signifies a major advancement in web automation, promising more robust and flexible tools for handling the complexities of modern digital landscapes.

Check out the Paper. All credit for this research goes to the researchers of this project.

The post Revolutionizing Web Automation: AUTOCRAWLER’s Innovative Framework Enhances Efficiency and Adaptability in Dynamic Web Environments appeared first on MarkTechPost.