Researchers from the University of Manchester Introduce MentalLLaMA: The First Open-Source LLM Series for Readable Mental Health Analysis with Capacity of Instruction Following

PTSD and other mental health issues have an impact on public health globally. Due to stigma, many individuals do not promptly seek psychiatric assistance, which can have catastrophic repercussions. Social media has ingrained itself into people’s daily lives due to the advancement of online technology. Social media texts are a great source for mental health analysis and possibly early intervention since many people with probable mental health disorders use sites like Twitter and Reddit to convey negative emotions and express stress. However, the exponentially rising volume of social media messages makes manual analysis of posts unfeasible. As a result, numerous studies use natural language processing (NLP) approaches to analyze social media for mental health automatically.

Prior NLP approaches for mental health generally modeled social media mental health analysis as text classification problems, where discriminative domain-specific pre-trained language models (PLMs) attained cutting-edge performance. A major drawback is that these models provide predictions in a black-box manner with little interpretability, which considerably reduces their dependability in actual usage. Recent studies assessed the effectiveness of the newest large language models (LLMs), including ChatGPT and LLaMA, in identifying various mental health conditions and providing in-depth justifications for their choices using Chain-of-Thought (CoT) approaches. They also conducted extensive human evaluations to demonstrate that ChatGPT can produce explanations for its correct classifications comparable to those provided by humans, demonstrating its potential to improve the readability of mental health analysis.

However, ChatGPT currently fails to match the performance of state-of-the-art supervised algorithms in zero-shot or few-shot settings, which restricts its application in real-world situations. A practical remedy is to align foundation LLMs with the target domain by fine-tuning them on a limited amount of data. The development of LLMs for interpretable mental health analysis faces two major obstacles. First, high-quality training data are necessary for fine-tuning LLMs. Although several datasets for the investigation of mental health on social media contain brief extracts of casual content, open-source data that offers thorough and trustworthy justifications for detection findings is still lacking. The sensitive nature of the research topic and the high cost of explanations written by subject-matter experts are the major causes of this.

Second, few open-source LLMs for interpretable mental health analysis have been made available to the general public, and prompting or fine-tuning closed-source LLMs like ChatGPT can be quite expensive. The high cost and lack of resources hamper the growth of the relevant research community. To close these gaps, they created the first multi-task and multi-source Interpretable Mental Health Instruction (IMHI) dataset, with 105K data samples, to enable instruction tuning and evaluation of LLMs. First, they gather training data from 10 existing sources covering 8 tasks, including binary mental health detection, multi-class mental health detection, mental health cause/factor detection, and mental risk and wellness factor detection.

Figure 1 shows a few examples of MentalLLaMA’s performance on various mental health analysis tasks. It also summarizes the training data and underlying models of MentalLLaMA.

The gathered data include social media posts and their associated labels. Second, every label is accompanied by a thorough explanation through annotation. Drawing inspiration from the success of self-instruct, they employ expert-written few-shot prompts and the gathered labels to query ChatGPT and elicit explanations from its replies. To further guarantee the quality of the explanations, they perform automatic assessments of all acquired data, checking the accuracy of the predictions, the correspondence between the labels and the explanations, and the overall quality of the explanations. With a carefully designed annotation scheme from subject-matter experts, they also conduct human evaluations on a subset of the gathered data.
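The post does not include the prompting code. As a rough sketch of this kind of few-shot explanation generation, assuming the OpenAI chat completion API that was available at the time and hypothetical expert-written few-shot examples:

import openai

# Hypothetical few-shot examples; the wording is illustrative, not from the IMHI annotation guidelines.
FEW_SHOT = [
    {"role": "user", "content": 'Post: "I cannot sleep and nothing feels worth doing." Label: depression. Explain why.'},
    {"role": "assistant", "content": "The post describes persistent loss of interest and sleep problems, which are consistent with depression."},
]

def generate_explanation(post: str, label: str) -> str:
    messages = [{"role": "system", "content": "You explain mental health labels assigned to social media posts."}]
    messages += FEW_SHOT
    messages.append({"role": "user", "content": f'Post: "{post}" Label: {label}. Explain why.'})
    response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages, temperature=0.0)
    return response["choices"][0]["message"]["content"]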

Third, they use a rule-based approach to convert all gathered social media posts, labels, and explanations into instruction-based query-answer pairs, which are then used to create the IMHI dataset’s training data and evaluation benchmark. Based on the IMHI dataset, researchers from the University of Manchester introduce MentalLLaMA, the first open-source LLM series for interpretable mental health analysis with instruction-following capability. The LLaMA2 foundation models serve as the basis for training MentalLLaMA. They specifically fine-tune the MentalLLaMA-7B, MentalLLaMA-chat-7B, and MentalLLaMA-chat-13B models. Figure 1 displays a few examples of MentalLLaMA’s capabilities.
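The exact templates are part of the IMHI release; the following is only a minimal sketch, with illustrative wording, of how such a rule-based template might turn one annotated sample into an instruction-based query-answer pair:

def build_instruction_pair(post: str, label: str, explanation: str, condition: str = "depression") -> dict:
    # The template wording here is illustrative, not the actual rules used to build IMHI.
    query = (
        f'Consider this post: "{post}"\n'
        f"Question: Does the poster suffer from {condition}?"
    )
    answer = f"{label}. Reasoning: {explanation}"
    return {"query": query, "answer": answer}

# Example usage with a made-up sample
pair = build_instruction_pair(
    post="I haven't left my room in days and nothing feels worth doing.",
    label="Yes, the poster shows signs of depression",
    explanation="The post describes social withdrawal and a loss of interest in daily activities.",
)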

Additionally, they thoroughly assess how well MentalLLaMA models perform against the IMHI evaluation benchmark. They assess MentalLLaMA’s predictive accuracy by comparing its classification results with state-of-the-art discriminative methods and other generative language models. According to the findings, MentalLLaMA-chat-13B performs better than or on par with the state of the art on seven out of ten test sets. They also assess the quality of the generated explanations. The results demonstrate that instruction tuning, reinforcement learning from human feedback (RLHF), and larger model sizes improve the quality of explanation generation.
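The post does not list the exact scoring code. As a rough sketch, per-test-set classification performance could be computed as follows, assuming hypothetical lists of gold labels and labels parsed from the models’ generations:

from sklearn.metrics import accuracy_score, f1_score

def score_test_set(gold_labels, predicted_labels):
    # gold_labels / predicted_labels are hypothetical lists of class names, one per test sample
    return {
        "accuracy": accuracy_score(gold_labels, predicted_labels),
        "weighted_f1": f1_score(gold_labels, predicted_labels, average="weighted"),
    }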

In summary, the contributions of this work are threefold:
• They created the first multi-task and multi-source instruction-tuning dataset for interpretable mental health analysis on social media, the Interpretable Mental Health Instruction (IMHI) dataset, with 105K samples.
• They propose MentalLLaMA, the first open-source, instruction-following large language model for interpretable mental health analysis. MentalLLaMA can perform mental health analysis on social media data and produce compelling explanations for its conclusions.
• With 19K test samples spanning 8 tasks and 10 test sets, they present the first comprehensive evaluation benchmark for interpretable mental health analysis and compare MentalLLaMA with existing methods on it. Results and analysis demonstrate MentalLLaMA’s strengths, and future work will focus on further improving LLMs for interpretable mental health analysis.

Check out the Paper and GitHub. All credit for this research goes to the researchers on this project.



A New Machine Learning Research from MIT Shows How Large Language Models (LLMs) Comprehend and Represent the Concepts of Space and Time

Large Language Models (LLMs) have shown some incredible skills in recent times. The well-known ChatGPT, which has been built on the GPT’s transformer architecture, has gained massive popularity due to its human-imitating capabilities. From question answering and text summarization to content generation and language translation, it has a number of use cases. With their excessive popularity, what these models have truly learned during their training has come into question.

According to one theory, LLMs are excellent at spotting and forecasting patterns and correlations in data but fall short in their comprehension of the fundamental mechanisms that produce the data. In principle, they resemble very competent statistical engines that may not possess genuine comprehension. Another theory states that LLMs, in learning these correlations, build more condensed, coherent, and understandable models of the generative processes underlying the training data.

Recently, two researchers from the Massachusetts Institute of Technology have studied Large Language Models to understand better how they learn. The research particularly explores whether these models actually construct a cohesive model of the underlying data-generating process, frequently referred to as a “world model,” or if they merely memorize statistical patterns.

The researchers used probing tests with the Llama-2 family of LLMs, creating six datasets that cover different spatiotemporal scales and comprise names of places, events, and the related space or time coordinates. The locations in these datasets span the entire world, the United States, and New York City, and the temporal datasets include the dates on which works of art and entertainment were first released and the dates on which news headlines were first published. They used linear regression probes on the internal activations of the LLMs’ layers to investigate whether LLMs create representations of space and time. These probes forecast the precise position or time in the real world corresponding to each dataset entry.
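The post does not include the probing code. The following is a minimal sketch of a linear probe of this kind, assuming the layer activations and spatial or temporal targets have already been extracted; the shapes, the Ridge regularizer, and the train/test split are illustrative, and the paper’s actual probe may differ:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

def fit_linear_probe(activations: np.ndarray, targets: np.ndarray, alpha: float = 1.0):
    # activations: (n_samples, hidden_dim) hidden states from one Llama-2 layer
    # targets: (n_samples, 2) latitude/longitude, or (n_samples, 1) timestamps
    X_train, X_test, y_train, y_test = train_test_split(
        activations, targets, test_size=0.2, random_state=0
    )
    probe = Ridge(alpha=alpha).fit(X_train, y_train)
    return probe, probe.score(X_test, y_test)  # R^2 on held-out entities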

The research has shown that LLMs learn linear representations of both space and time at different scales. This implies that the models learn about spatial and temporal aspects in a structured and organized manner. They grasp the relationships and patterns throughout space and time in a methodical way rather than just memorizing data items. It has also been discovered that LLMs’ representations are resilient to changes in instructions or prompts. Even when the manner in which the information is provided differs, the models consistently demonstrate a good understanding and representation of spatial and temporal information.

According to the study, the representations are not restricted to any one particular class of entities. Cities, landmarks, historical individuals, pieces of art, and news headlines are all represented uniformly by LLMs in terms of space and time, from which it can be inferred that the models develop a comprehensive understanding of these dimensions. The researchers have even identified particular LLM neurons they describe as ‘space neurons’ and ‘time neurons.’ These neurons accurately express spatial and temporal coordinates, demonstrating the existence of specialized components in the models that process and represent space and time.

In conclusion, the results of this study have reinforced the notion that contemporary LLMs go beyond rote memorizing of statistics and instead learn structured and significant information about important dimensions like space and time. It is definitely possible to say that LLMs are more than just statistical engines and can represent the underlying structure of the data-generating processes they are trained on.

Check out the Paper. All credit for this research goes to the researchers on this project.



New – No-code generative AI capabilities now available in Amazon SageMaker Canvas

Launched in 2021, Amazon SageMaker Canvas is a visual, point-and-click service that allows business analysts and citizen data scientists to use ready-to-use machine learning (ML) models and build custom ML models to generate accurate predictions without the need to write any code. Ready-to-use models enable you to derive immediate insights from text, image, and document data (such as sentiment analysis, document processing, or object detection in images). Custom models allow you to build predictive models for use cases such as demand forecasting, customer churn, and defect detection in manufacturing.
We are excited to announce that SageMaker Canvas is expanding its support of ready-to-use models to include foundation models (FMs), enabling you to use generative AI to generate and summarize content. You can use natural language with a conversational chat interface to perform tasks such as creating narratives, reports, and blog posts; answering questions; summarizing notes and articles; and explaining concepts, without writing a single line of code. Your data is not used to improve the base models, is not shared with third-party model providers, and stays entirely within your secure AWS environment.
SageMaker Canvas allows you to access a variety of FMs, including Amazon Bedrock models (such as Claude 2 from Anthropic and Jurassic-2 from AI21 Labs) and publicly available Amazon SageMaker JumpStart models (including Falcon-7B-Instruct, Falcon-40B-Instruct, and MPT-7B-Instruct). You may use a single model or up to three models to compare model responses side by side. In SageMaker Canvas, Amazon Bedrock models are always active, allowing you to use them instantly. SageMaker JumpStart models can be started and deployed in your AWS account on demand and are automatically shut down after two hours of inactivity.
Let’s explore how to use the generative AI capabilities of SageMaker Canvas. For this post, we work with a fictitious enterprise customer support use case as an example.
Prerequisites
Complete the following prerequisite steps:

Create an AWS account.
Set up SageMaker Canvas and optionally configure it to use a VPC without internet access.
Set up model access in Amazon Bedrock.
Request service quota increases for g5.12xlarge and g5.2xlarge, if required, in your Region. These instances are required to host the SageMaker JumpStart model endpoints. Other instances may be selected based on availability. (A hedged programmatic sketch of this quota request follows this list.)
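Although the console is the usual route, the quota increase can also be requested programmatically. The following is only a rough sketch using the boto3 Service Quotas client; the quota code is a placeholder that must be looked up first (for example, via the listing call shown), and the Region is illustrative:

import boto3

quotas = boto3.client("service-quotas", region_name="us-east-1")  # Region is illustrative

# Find the quota code for the instance type you need (quota names vary; this filter is approximate).
for quota in quotas.list_service_quotas(ServiceCode="sagemaker")["Quotas"]:
    if "g5.2xlarge" in quota["QuotaName"] and "endpoint" in quota["QuotaName"].lower():
        print(quota["QuotaName"], quota["QuotaCode"], quota["Value"])

# Request the increase; replace the placeholder quota code with the one printed above.
quotas.request_service_quota_increase(
    ServiceCode="sagemaker",
    QuotaCode="L-XXXXXXXX",  # placeholder quota code
    DesiredValue=1.0,
)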

Handling customer complaints
Let’s say that you’re a customer support analyst who handles complaints for a bicycle company. When receiving a customer complaint, you can use SageMaker Canvas to analyze the complaint and generate a personalized response to the customer. To do so, complete the following steps:

On the SageMaker console, choose Canvas in the navigation pane.
Choose your domain and user profile and choose Open Canvas to open the SageMaker Canvas application.

SageMaker Canvas is also accessible using single sign-on or other existing identity providers (IdPs) without having to first access the SageMaker console.

Choose Generate, extract and summarize content to open the chat console.
With the Claude 2 model selected, enter your instructions to retrieve the customer sentiment for the provided complaint and press Enter.
You may want to know the specific problems with the bicycle, especially if it’s a long complaint. So, ask for the problems with the bicycle. Note that you don’t have to repost the complaint because SageMaker Canvas stores the context for your chat.

Now that we understand the customer’s problem, you can send them a response including a link to the company’s feedback form.

In the input window, request a response to the customer complaint.
If you want to generate another response from the FM, choose the refresh icon in the response section.

The original response and all new responses are paginated within the response section. Note that the new response is different from the original response. You can choose the copy icon in the response section to copy the response to an email or document, as required.

You can also modify the model’s response by requesting specific changes. For example, let’s ask the model to add a $50 gift card offer to the email response.

Comparing model responses
You can compare the model responses from multiple models (up to three). Let’s compare two Amazon Bedrock models (Claude 2 and Jurassic-2 Ultra) with a SageMaker JumpStart model (Falcon-7B-Instruct) to evaluate and find the best model for your use case:

Choose New chat to open a chat interface.
On the model drop-down menu, choose Start up another model.
On the Foundation models page, under Amazon SageMaker JumpStart models, choose Falcon-7B-Instruct and in the right pane, choose Start up model.

The model will take around 10 minutes to start.

On the Foundation models page, confirm that the Falcon-7B-Instruct model is active before proceeding to the next step.
Choose New chat to open a chat interface.
Choose Compare to display a drop-down menu for the second model, then choose Compare again to display a drop-down menu for the third model.
Choose the Falcon-7B-Instruct model on the first drop-down menu, Claude 2 on the second drop-down menu, and Jurassic-2 Ultra on the third drop-down menu.
Enter your instructions in the chat input box and press Enter.

You will see responses from all three models.

Clean up
Any SageMaker JumpStart models started from SageMaker Canvas will be automatically shut down after 2 hours of inactivity. If you want to shut down these models sooner to save costs, follow the instructions in this section. Note that Amazon Bedrock models are not deployed in your account, so there is no need to shut these down.

To shut down the Falcon-7B-Instruct SageMaker JumpStart model, you can choose from two methods:

On the results comparison page, choose the Falcon-7B-Instruct model’s options menu (three dots), then choose Shut down model.
Alternatively, choose New chat, and on the model drop-down menu, choose Start up another model. Then, on the Foundation models page, under Amazon SageMaker JumpStart models, choose Falcon-7B-Instruct and in the right pane, choose Shut down model.

Choose Log out in the left pane to log out of the SageMaker Canvas application. This stops the consumption of SageMaker Canvas workspace instance hours and releases all resources used by the workspace instance.

Conclusion
In this post, you learned how to use SageMaker Canvas to generate text with ready-to-use models from Amazon Bedrock and SageMaker JumpStart. You used the Claude 2 model to analyze the sentiment of a customer complaint, ask questions, and generate a response without a single line of code. You also started a publicly available model and compared responses from three models.
For Amazon Bedrock models, you are charged based on the volume of input tokens and output tokens as per the Amazon Bedrock pricing page. Because SageMaker JumpStart models are deployed on SageMaker instances, you are charged for the duration of usage based on the instance type as per the Amazon SageMaker pricing page.
SageMaker Canvas continues to democratize AI with a no-code visual, interactive workspace that allows business analysts to build ML models that address a wide variety of use cases. Try out the new generative AI capabilities in SageMaker Canvas today! These capabilities are available in all Regions where Amazon Bedrock or SageMaker JumpStart are available.

About the Authors
Anand Iyer has been a Principal Solutions Architect at AWS since 2016. Anand has helped global healthcare, financial services, and telecommunications clients architect and implement enterprise software solutions using AWS and hybrid cloud technologies. He has an MS in Computer Science from Louisiana State University Baton Rouge, and an MBA from USC Marshall School of Business, Los Angeles. He is AWS certified in the areas of Security, Solutions Architecture, and DevOps Engineering.
Gavin Satur is a Principal Solutions Architect at Amazon Web Services. He works with enterprise customers to build strategic, well-architected solutions and is passionate about automation. Outside of work, he enjoys family time, tennis, cooking, and traveling.
Gunjan Jain is an AWS Solutions Architect in SoCal and primarily works with large financial services companies. He helps with cloud adoption, cloud optimization, and adopting best practices for being Well-Architected on the cloud.
Harpreet Dhanoa, a seasoned Senior Solutions Architect at AWS, has a strong background in designing and building scalable distributed systems. He is passionate about machine learning, observability, and analytics. He enjoys helping large-scale customers build their cloud enterprise strategy and transform their business in AWS. In his free time, Harpreet enjoys playing basketball with his two sons and spending time with his family.

Whisper models for automatic speech recognition now available in Amazon SageMaker JumpStart

Today, we’re excited to announce that the OpenAI Whisper foundation model is available for customers using Amazon SageMaker JumpStart. Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680 thousand hours of labelled data, Whisper models demonstrate a strong ability to generalize to many datasets and domains without the need for fine-tuning. SageMaker JumpStart is the machine learning (ML) hub of SageMaker that provides access to foundation models in addition to built-in algorithms and end-to-end solution templates to help you quickly get started with ML.
You can also perform ASR using Amazon Transcribe, a fully managed and continuously trained automatic speech recognition service.
In this post, we show you how to deploy the OpenAI Whisper model and invoke the model to transcribe and translate audio.
The OpenAI Whisper model uses the huggingface-pytorch-inference container. As a SageMaker JumpStart model hub customer, you can use ASR without having to maintain the model script outside of the SageMaker SDK. SageMaker JumpStart models also improve security posture with endpoints that enable network isolation.
Foundation models in SageMaker
SageMaker JumpStart provides access to a range of models from popular model hubs including Hugging Face, PyTorch Hub, and TensorFlow Hub, which you can use within your ML development workflow in SageMaker. Recent advances in ML have given rise to a new class of models known as foundation models, which are typically trained on billions of parameters and can be adapted to a wide category of use cases, such as text summarization, generating digital art, and language translation. Because these models are expensive to train, customers want to use existing pre-trained foundation models and fine-tune them as needed, rather than train these models themselves. SageMaker provides a curated list of models that you can choose from on the SageMaker console.
You can now find foundation models from different model providers within SageMaker JumpStart, enabling you to get started with foundation models quickly. SageMaker JumpStart offers foundation models based on different tasks or model providers, and you can easily review model characteristics and usage terms. You can also try these models using a test UI widget. When you want to use a foundation model at scale, you can do so without leaving SageMaker by using pre-built notebooks from model providers. Because the models are hosted and deployed on AWS, you trust that your data, whether used for evaluating or using the model at scale, won’t be shared with third parties.
OpenAI Whisper foundation models
Whisper is a pre-trained model for ASR and speech translation. Whisper was proposed in the paper Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford and others from OpenAI. The original code can be found in this GitHub repository.
Whisper is a Transformer-based encoder-decoder model, also referred to as a sequence-to-sequence model. It was trained on 680 thousand hours of labelled speech data annotated using large-scale weak supervision. Whisper models demonstrate a strong ability to generalize to many datasets and domains without the need for fine-tuning.
The models were trained on either English-only data or multilingual data. The English-only models were trained on the task of speech recognition. The multilingual models were trained on speech recognition and speech translation. For speech recognition, the model predicts transcriptions in the same language as the audio. For speech translation, the model predicts transcriptions in a different language from the audio.
Whisper checkpoints come in five configurations of varying model sizes. The smallest four are trained on either English-only or multilingual data. The largest checkpoints are multilingual only. All ten of the pre-trained checkpoints are available on the Hugging Face hub. The checkpoints are summarized in the following table with links to the models on the hub:

Model name          Number of parameters    Multilingual
whisper-tiny        39 M                    Yes
whisper-base        74 M                    Yes
whisper-small       244 M                   Yes
whisper-medium      769 M                   Yes
whisper-large       1550 M                  Yes
whisper-large-v2    1550 M                  Yes

Let’s explore how you can use Whisper models in SageMaker JumpStart.
OpenAI Whisper foundation models WER and latency comparison
The word error rate (WER) for different OpenAI Whisper models based on the LibriSpeech test-clean is shown in the following table.  WER is a common metric for the performance of a speech recognition or machine translation system. It measures the difference between the reference text (the ground truth or the correct transcription) and the output of an ASR system in terms of the number of errors, including substitutions, insertions, and deletions that are needed to transform the ASR output into the reference text. These numbers have been taken from the Hugging Face website.

Model               WER (percent)
whisper-tiny        7.54
whisper-base        5.08
whisper-small       3.43
whisper-medium      2.9
whisper-large       3
whisper-large-v2    3

For this post, we took the audio file below and compared the latency of speech recognition across the different Whisper models. Latency is the amount of time from the moment that a user sends a request until the time that your application indicates that the request has been completed. The numbers in the following list represent the average latency for a total of 100 requests using the same audio file with the model hosted on an ml.g5.2xlarge instance.

whisper-tiny (average latency 0.43 s): "We are living in very exciting times with machine lighting. The speed of ML model development will really actually increase. But you won’t get to that end state that we won in the next coming years. Unless we actually make these models more accessible to everybody."

whisper-base (average latency 0.49 s): "We are living in very exciting times with machine learning. The speed of ML model development will really actually increase. But you won’t get to that end state that we won in the next coming years. Unless we actually make these models more accessible to everybody."

whisper-small (average latency 0.84 s): "We are living in very exciting times with machine learning. The speed of ML model development will really actually increase. But you won’t get to that end state that we want in the next coming years unless we actually make these models more accessible to everybody."

whisper-medium (average latency 1.5 s): "We are living in very exciting times with machine learning. The speed of ML model development will really actually increase. But you won’t get to that end state that we want in the next coming years unless we actually make these models more accessible to everybody."

whisper-large (average latency 1.96 s): "We are living in very exciting times with machine learning. The speed of ML model development will really actually increase. But you won’t get to that end state that we want in the next coming years unless we actually make these models more accessible to everybody."

whisper-large-v2 (average latency 1.98 s): "We are living in very exciting times with machine learning. The speed of ML model development will really actually increase. But you won’t get to that end state that we want in the next coming years unless we actually make these models more accessible to everybody."

Solution walkthrough
You can deploy Whisper models using the Amazon SageMaker console or using an Amazon SageMaker Notebook. In this post, we demonstrate how to deploy the Whisper API using the SageMaker Studio console or a SageMaker Notebook and then use the deployed model for speech recognition and language translation. The code used in this post can be found in this GitHub notebook.
Let’s expand each step in detail.
Deploy Whisper from the console

To get started with SageMaker JumpStart, open the Amazon SageMaker Studio console and go to the launch page of SageMaker JumpStart and select Get Started with JumpStart.
To choose a Whisper model, you can either use the tabs at the top or use the search box at the top right as shown in the following screenshot. For this example, use the search box on the top right and enter Whisper, and then select the appropriate Whisper model from the dropdown menu.
After you select the Whisper model, you can use the console to deploy the model. You can select an instance for deployment or use the default.

Deploy the foundation model from a SageMaker notebook
The steps to first deploy and then use the deployed model to solve different tasks are:

Set up
Select a model
Retrieve artifacts and deploy an endpoint
Use deployed model for ASR
Use deployed model for language translation
Clean up the endpoint

Set up
This notebook was tested on an ml.t3.medium instance in SageMaker Studio with the Python 3 (data science) kernel and in an Amazon SageMaker Notebook instance with the conda_python3 kernel.

%pip install --upgrade sagemaker --quiet

Select a pre-trained model
Set up a SageMaker Session using Boto3, and then select the model ID that you want to deploy.

model_id = "huggingface-asr-whisper-large-v2"

Retrieve artifacts and deploy an endpoint
Using SageMaker, you can perform inference on the pre-trained model, even without fine-tuning it first on a new dataset. To host the pre-trained model, create an instance of sagemaker.model.Model and deploy it. The following code uses the default instance ml.g5.2xlarge for the inference endpoint of a whisper-large-v2 model. You can deploy the model on other instance types by passing instance_type in the JumpStartModel class. The deployment might take a few minutes.

# Deploy the model

from sagemaker.jumpstart.model import JumpStartModel
from sagemaker.serializers import JSONSerializer

my_model = JumpStartModel(model_id=model_id)  # the model ID selected above
predictor = my_model.deploy()

Automatic speech recognition
Next, you read the sample audio file, sample1.wav, from a SageMaker JumpStart public Amazon Simple Storage Service (Amazon S3) location and pass it to the predictor for speech recognition. You can replace this sample file with any other sample audio file, but make sure the .wav file is sampled at 16 kHz, because this is required by the automatic speech recognition models. The input audio file must be less than 30 seconds.

from scipy.io.wavfile import read
import json
import boto3
from sagemaker.jumpstart import utils

# The wav files must be sampled at 16 kHz (this is required by the automatic speech recognition models), so make sure to resample them if required. The input audio file must be less than 30 seconds.
s3_bucket = utils.get_jumpstart_content_bucket(boto3.Session().region_name)
key_prefix = "training-datasets/asr_notebook_data"
input_audio_file_name = "sample1.wav"

s3_client = boto3.client("s3")
s3_client.download_file(s3_bucket, f"{key_prefix}/{input_audio_file_name}", input_audio_file_name)

with open(input_audio_file_name, "rb") as file:
    wav_file_read = file.read()

# If you receive a client error (413), check the payload size to the endpoint. Payloads for SageMaker invoke endpoint requests are limited to about 5 MB.
response = predictor.predict(wav_file_read)
print(response["text"])

This model supports many parameters when performing inference. They include:

max_length: The model generates text up to this output length. If specified, it must be a positive integer.
language and task: Specify the output language and task here. The model supports the task of transcription or translation.
max_new_tokens: The maximum number of tokens to generate.
num_return_sequences: The number of output sequences returned. If specified, it must be a positive integer.
num_beams: The number of beams used in the beam search. If specified, it must be an integer greater than or equal to num_return_sequences.
no_repeat_ngram_size: The model ensures that no n-gram of length no_repeat_ngram_size is repeated in the output sequence. If specified, it must be a positive integer greater than 1.
temperature: This controls the randomness in the output. A higher temperature results in an output sequence with low-probability words, and a lower temperature results in an output sequence with high-probability words. If temperature approaches 0, it results in greedy decoding. If specified, it must be a positive float.
early_stopping: If True, text generation is finished when all beam hypotheses reach the end-of-sentence token. If specified, it must be a Boolean.
do_sample: If True, sample the next word according to its likelihood. If specified, it must be a Boolean.
top_k: In each step of text generation, sample from only the top_k most likely words. If specified, it must be a positive integer.
top_p: In each step of text generation, sample from the smallest possible set of words with cumulative probability top_p. If specified, it must be a float between 0 and 1.

You can specify any subset of the preceding parameters when invoking an endpoint. Next, we show you an example of how to invoke an endpoint with these arguments.
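As a quick illustration before the full translation example that follows, a request combining the audio payload with a few of these generation parameters might look like the sketch below. The parameter values are placeholders, the payload layout mirrors the translation example in the next section, and the accepted parameter set should be verified against the model documentation:

# wav_file_read and predictor come from the earlier speech recognition cells.
# Illustrative parameter values; verify the accepted keys for this model before using them.
payload = {
    "audio_input": wav_file_read.hex(),
    "max_new_tokens": 200,
    "temperature": 0.2,
    "num_return_sequences": 1,
}

predictor.serializer = JSONSerializer()
predictor.content_type = "application/json"

response = predictor.predict(payload)
print(response["text"])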
Language translation
To showcase language translation using Whisper models, use the following audio file in French and translate it to English. The file must be sampled at 16 kHz (as required by the ASR models), so make sure to resample files if required and make sure your samples don’t exceed 30 seconds.

Download sample_french1.wav from the SageMaker JumpStart public S3 location so it can be passed in the payload for translation by the Whisper model.

input_audio_file_name = "sample_french1.wav"

s3_client.download_file(s3_bucket, f"{key_prefix}/{input_audio_file_name}", input_audio_file_name)

Set the task parameter as translate and language as French to force the Whisper model to perform speech translation.

with open(input_audio_file_name, "rb") as file:
    wav_file_read = file.read()

payload = {"audio_input": wav_file_read.hex(), "language": "french", "task": "translate"}

predictor.serializer = JSONSerializer()
predictor.content_type = "application/json"

Use the predictor to translate the audio. If you receive a client error (413), check the payload size to the endpoint. Payloads for SageMaker invoke endpoint requests are limited to about 5 MB.

response = predictor.predict(payload)
print(response["text"])

The text output translated to English from the French audio file follows:

[' Welcome to JPBSystem. We have more than 150 employees and 90% of sales. We have developed about 15 patents.']

Clean up
After you’ve tested the endpoint, delete the SageMaker inference endpoint and delete the model to avoid incurring charges.
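In the notebook, a minimal cleanup sketch using the predictor object created earlier looks like this:

# Delete the endpoint and the model created by my_model.deploy() to stop incurring charges.
predictor.delete_model()
predictor.delete_endpoint()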
Conclusion
In this post, we showed you how to test and use OpenAI Whisper models to build interesting applications using Amazon SageMaker. Try out the foundation model in SageMaker today and let us know your feedback!
This guidance is for informational purposes only. You should still perform your own independent assessment and take measures to ensure that you comply with your own specific quality control practices and standards, and the local rules, laws, regulations, licenses and terms of use that apply to you, your content, and the third-party model referenced in this guidance. AWS has no control or authority over the third-party model referenced in this guidance and does not make any representations or warranties that the third-party model is secure, virus-free, operational, or compatible with your production environment and standards. AWS does not make any representations, warranties, or guarantees that any information in this guidance will result in a particular outcome or result.

About the authors
Hemant Singh is an Applied Scientist with experience in Amazon SageMaker JumpStart. He got his masters from Courant Institute of Mathematical Sciences and B.Tech from IIT Delhi. He has experience in working on a diverse range of machine learning problems within the domain of natural language processing, computer vision, and time series analysis.
Rachna Chadha is a Principal Solution Architect AI/ML in Strategic Accounts at AWS. Rachna is an optimist who believes that ethical and responsible use of AI can improve society in future and bring economical and social prosperity. In her spare time, Rachna likes spending time with her family, hiking and listening to music.
Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker built-in algorithms and helps develop machine learning algorithms. He got his PhD from University of Illinois Urbana-Champaign. He is an active researcher in machine learning and statistical inference, and has published many papers in NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.

Reinventing a cloud-native federated learning architecture on AWS

Machine learning (ML), especially deep learning, requires a large amount of data for improving model performance. Customers often need to train a model with data from different regions, organizations, or AWS accounts. It is challenging to centralize such data for ML due to privacy requirements, high cost of data transfer, or operational complexity.
Federated learning (FL) is a distributed ML approach that trains ML models on distributed datasets. The goal of FL is to improve the accuracy of ML models by using more data, while preserving the privacy and the locality of distributed datasets. FL increases the amount of data available for training ML models, especially data associated with rare and new events, resulting in a more general ML model. Existing partner open-source FL solutions on AWS include FedML and NVIDIA FLARE. These open-source packages are deployed in the cloud by running in virtual machines, without using the cloud-native services available on AWS.
In this blog, you will learn to build a cloud-native FL architecture on AWS. By using infrastructure as code (IaC) tools on AWS, you can deploy FL architectures with ease. Also, a cloud-native architecture takes full advantage of a variety of AWS services with proven security and operational excellence, thereby simplifying the development of FL.
We first discuss different approaches and challenges of FL. We then demonstrate how to build a cloud-native FL architecture on AWS. The sample code to build this architecture is available on GitHub. We use the AWS Cloud Development Kit (AWS CDK) to deploy the architecture with one-click deployment. The sample code demos a scenario where the server and all clients belong to the same organization (the same AWS account), but their datasets cannot be centralized due to data localization requirements. The sample code supports horizontal and synchronous FL for training neural network models. The ML framework used at FL clients is TensorFlow.
Overview of federated learning
FL typically involves a central FL server and a group of clients. Clients are compute nodes that perform local training. In an FL training round, the central server first sends a common global model to a group of clients. Clients train the global model with local data, then provide local models back to the server. The server aggregates the local models into a new global model, then starts a new training round. There may be tens of training rounds until the global model converges or until the number of training rounds reaches a threshold. Therefore, FL exchanges ML models between the central FL server and clients, without moving training data to a central location.
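As a schematic illustration of this round-based protocol (not the code from the sample repository), the server-side loop can be summarized as follows, where clients is a hypothetical list of objects that train locally and return their model weights:

import numpy as np

def aggregate(local_weight_lists):
    # Unweighted average of each layer across clients (a simplified form of FedAvg)
    return [np.mean(layers, axis=0) for layers in zip(*local_weight_lists)]

def run_federated_training(global_weights, clients, num_rounds=10):
    for _ in range(num_rounds):
        # Each client trains on its own local data and returns updated weights
        local_weight_lists = [client.train_locally(global_weights) for client in clients]
        # The server aggregates the local models into the next global model
        global_weights = aggregate(local_weight_lists)
    return global_weights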
There are two major categories of FL depending on the client type: cross-device and cross-silo. Cross-device FL trains a common global model by keeping all the training data locally on a large number of devices, such as mobile phones or IoT devices, with limited and unstable network connections. Therefore, the design of cross-device FL needs to consider frequent joining and dropout of FL clients.
Cross-silo FL trains a global model on datasets distributed at different organizations and geo-distributed data centers. These datasets are prohibited from moving out of organizations and data center regions due to data protection regulations, operational challenges (such as data duplication and synchronization), or high costs. In contrast with cross-device FL, cross-silo FL assumes that organizations or data centers have reliable network connections, powerful computing resources, and addressable datasets.
FL has been applied to various industries, such as finance, healthcare, medicine, and telecommunications, where privacy preservation is critical or data localization is required. FL has been used to train a global model for financial crime detection among multiple financial institutions. The global model outperforms models trained with only local datasets by 20%. In healthcare, FL has been used to predict mortality of hospitalized patients based on electronic health records from multiple hospitals. The global model predicting mortality outperforms local models at all participating hospitals. FL has also been used for brain tumor segmentation. The global models for brain tumor segmentation perform similarly to the model trained by collecting distributed datasets at a central location. In telecommunications, FL can be applied to edge computing, wireless spectrum management, and 5G core networks.
There are many other ways to classify FL:

Horizontal or vertical – Depending on the partition of features in distributed datasets, FL can be classified as horizontal or vertical. In horizontal FL, all distributed datasets have the same set of features. In vertical FL, datasets have different groups of features, requiring additional communication patterns to align samples based on overlapped features.
Synchronous or asynchronous – Depending on the aggregation strategy at an FL server, FL can be classified as synchronous or asynchronous. A synchronous FL server aggregates local models from a selected set of clients into a global model. An asynchronous FL server immediately updates the global model after a local model is received from a client, thereby reducing the waiting time and improving training efficiency.
Hub-and-spoke or peer-to-peer – The typical FL topology is hub-and-spoke, where a central FL server coordinates a set of clients. Another FL topology is peer-to-peer without any centralized FL server, where FL clients aggregate information from neighboring clients to learn a model.

Challenges in FL
You can address the following challenges using algorithms running at FL servers and clients in a common FL architecture:

Data heterogeneity – FL clients’ local data can vary (i.e., data heterogeneity) due to particular geographic locations, organizations, or time windows. Data heterogeneity impacts the accuracy of global models, leading to more training iterations and longer training time. Many solutions have been proposed to mitigate the impact of data heterogeneity, such as optimization algorithms, partial data sharing among clients, and domain adaptation.
Privacy preservation – Local and global models may leak private information via an adversarial attack. Many privacy preservation approaches have been proposed for FL. A secure aggregation approach can be used to preserve the privacy of local models exchanged between FL servers and clients. Local and global differential privacy approaches bound the privacy loss by adding noise to local or global models, which provides a controlled trade-off between privacy and model accuracy. Depending on the privacy requirements, combinations of different privacy preservation approaches can be used.
Federated analytics – Federated analytics provides statistical measurements of distributed datasets without violating privacy requirements. Federated analytics is important not only for data analysis across distributed datasets before training, but also for model monitoring at inference.

Despite these challenges of FL algorithms, it is critical to build a secure architecture that provides end-to-end FL operations. One important challenge to building such an architecture is to enable the ease of deployment. The architecture must coordinate FL servers and clients for FL model building, training, and deployment, including continuous integration and continuous delivery (CI/CD) among clients, traceability, and authentication and access control for FL servers and clients. These features are similar to centralized ML operations (ML Ops), but are more challenging to implement because more parties are involved. The architecture also needs to be flexible to implement different FL topologies and synchronous or asynchronous aggregation.
Solution overview
We propose a cloud-native FL architecture on AWS, as shown in the following diagram. The architecture includes a central FL server and two FL clients. In reality, the number of FL clients can reach hundreds for cross-silo clients. The FL server must be on the AWS Cloud because it consists of a suite of microservices offered on the cloud. The FL clients can be on AWS or on the customer premises. The FL clients host their own local dataset and have their own IT and ML system for training ML models.
During FL model training, the FL server and a group of clients exchange ML models. That is, the clients download a global ML model from the server, perform local training, and upload local models to the server. The server then downloads the local models and aggregates them into a new global model. This model exchange procedure is a single FL training round. Training rounds repeat until the global model reaches a given accuracy or the number of rounds reaches a threshold.

Figure 1 – A cloud-native FL architecture for model training between a FL server and FL clients.

Prerequisites
To implement this solution, you need an AWS account to launch the services for a central FL server and the two clients. On-premises FL clients need to install the AWS Command Line Interface (AWS CLI), which allows access to the AWS services at the FL server, including Amazon Simple Queue Service (Amazon SQS), Amazon Simple Storage Service (Amazon S3), and Amazon DynamoDB.
Federated learning steps
In this section, we walk through the proposed architecture in Figure 1. At the FL server, the AWS Step Functions state machine runs a workflow as shown in Figure 2, which executes Steps 0, 1, and 5 from Figure 1. The state machine initiates the AWS services at the server (Step 0) and iterates FL training rounds. For each training round, the state machine sends out an Amazon Simple Notification Service (Amazon SNS) notification to the topic global_model_ready, along with a task token (Step 1). The state machine then pauses and waits for a callback with the task token. There are SQS queues subscribing to the global_model_ready topic. Each SQS queue corresponds to an FL client and queues the notifications sent from the server to the client.

Figure 2 – The workflow at the Step Functions state machine.

Each client keeps pulling messages from its assigned SQS queue. When a global_model_ready notification is received, the client downloads a global model from Amazon S3 (Step 2) and starts local training (Step 3). Local training generates a local model. The client then uploads the local model to Amazon S3 and writes the local model information, along with the received task token, to the DynamoDB table (Step 4).
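Under the hood, this polling can be done with plain boto3 calls. The following is only a hedged sketch; the queue, bucket, and object names are placeholders created by the deployment, and the sample repository wraps this logic in its own client helper functions:

import boto3

sqs = boto3.client("sqs")
s3 = boto3.client("s3")

# Placeholder names; the actual queue and bucket are created by the CDK deployment.
queue_url = sqs.get_queue_url(QueueName="fl-client-1-queue")["QueueUrl"]

response = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=20)
for message in response.get("Messages", []):
    # The notification body is expected to carry the global model's S3 key and the task token.
    print("global_model_ready notification:", message["Body"])
    s3.download_file("fl-model-registry-bucket", "global_model.h5", "global_model.h5")
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])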
We implement the FL model registry using Amazon S3 and DynamoDB. We use Amazon S3 to store the global and local models. We use DynamoDB table to store local model information because local model information can be different between FL algorithms, which requires a flexible schema supported by a DynamoDB table.
We also enable a DynamoDB stream to trigger a Lambda function, so that whenever a record is written into the DynamoDB table (when a new local model is received), a Lambda function is triggered to check if required local models are collected (Step 5). If so, the Lambda function runs the aggregation function to aggregate the local models into global models. The resulting global model is written to Amazon S3. The function also sends a callback, along with the task token retrieved from the DynamoDB table, to the Step Functions state machine. The state machine then determines if the FL training should be continued with a new training round or should be stopped based on a condition, for example, the number of training rounds reaching a threshold.
Each FL client uses the following sample code to interact with the FL server. If you want to customize the local training at your FL clients, the localTraining() function can be modified as long as the returned values are local_model_name and local_model_info for uploading to the FL server. You can select any ML framework for training local models at FL clients as long as all clients use the same ML framework.

# Step 2: receive notifications and model file name from its SQS queue
client.receiveNotificationsFromServer(sqs_region, client_queue_name)

# Step 3: download a global model and train locally
local_model_name, local_model_info = client.localTraining(global_model_name, s3_fl_model_registry)

# Step 4: upload the local model and local model info to the FL server
client.uploadToFLServer(s3_fl_model_registry, local_model_name, dynamodb_table_model_info, local_model_info)

The Lambda function for running the aggregation function at the server has the following sample code. If you want to customize the aggregation algorithm, you need to modify the fedAvg() function and the output.

# Step 5: aggregate local models in the Lambda function
import json
import os
import boto3

def lambda_handler(event, context):
    # obtain task_name from the event triggered by the DynamoDB Stream
    task_name = event['Records'][0]['dynamodb']['Keys']['taskName']['S']

    # retrieve transactions from the DynamoDB table
    transactions = readFromFLServerTaskTable(os.environ['TASKS_TABLE_NAME'], task_name)

    # read local model info from required clients
    # token is a callback token from the Step Functions state machine
    local_model_info, round_id, token = receiveUpdatedModelsFromClients(transactions, task_name)

    # the fedAvg function aggregates local models into a global model and stores the global model in S3
    global_model_name, avg_train_acc, avg_test_acc, avg_train_loss, avg_test_loss = fedAvg(local_model_info, round_id)

    # output sent to the Step Functions state machine
    output = {'taskName': task_name, 'roundId': str(round_id), 'trainAcc': str(avg_train_acc), 'testAcc': str(avg_test_acc), 'trainLoss': str(avg_train_loss), 'testLoss': str(avg_test_loss), 'weightsFile': str(global_model_name)}

    # send a callback to the Step Functions state machine to report that the task identified by the token completed successfully
    step_client = boto3.client('stepfunctions')
    out_str = json.dumps(output)
    step_client.send_task_success(taskToken=token, output=out_str)
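The fedAvg() helper itself is not shown in the post. The following is a minimal sketch of sample-count-weighted federated averaging, assuming local_updates is a hypothetical list of (weights, num_samples) pairs in which weights is the list of layer arrays returned by a TensorFlow/Keras model's get_weights(); the actual fedAvg() in the sample code also loads and stores models in S3 and reports metrics:

import numpy as np

def fed_avg(local_updates):
    # local_updates: list of (weights, num_samples); weights is a list of NumPy arrays
    total_samples = sum(num_samples for _, num_samples in local_updates)
    num_layers = len(local_updates[0][0])
    new_global_weights = []
    for layer_idx in range(num_layers):
        weighted_layer = sum(
            weights[layer_idx] * (num_samples / total_samples)
            for weights, num_samples in local_updates
        )
        new_global_weights.append(weighted_layer)
    return new_global_weights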
This architecture has two innovative designs. First, the FL server uses serverless services, such as Step Functions and Lambda. Therefore, no computing instance is kept running for the FL server, which minimizes the computing cost. Second, FL clients pull messages from their assigned SQS queues and upload or download models and info to or from services at the FL server. This design avoids the FL server directly accessing resources at the clients, which is critical to provide private and flexible IT and ML environments (on premises or on the AWS Cloud) to FL clients.

Advantages of being cloud-native
This architecture is cloud-native and provides end-to-end transparency by using AWS services with proven security and operational excellence. For example, you can have cross-account clients assume roles to access the resources at the FL server. For on-premises clients, the AWS CLI and AWS SDK for Python (Boto3) at the clients automatically provide secure network connections between the FL server and clients. For clients on the AWS Cloud, you can use AWS PrivateLink and AWS services with data encryption in transit and at rest for data protection. You can use Amazon Cognito and AWS Identity and Access Management (IAM) for the authentication and access control of FL servers and clients. For deploying the trained global model, you can use ML Ops capabilities in Amazon SageMaker.
The cloud-native architecture also enables integration with customized ML frameworks and federated learning algorithms and protocols. For example, you can select an ML framework for training local models at FL clients and customize different aggregation algorithms as scripts running in Lambda functions at the server. Also, you can modify the workflows in Step Functions to accommodate different communication protocols between the server and clients.
Another advantage of the cloud-native architecture is the ease of deployment by using IaC tools offered for the cloud. You can use the AWS Cloud Development Kit (AWS CDK) and AWS CloudFormation for one-click deployment.
Conclusion
New privacy laws continue to be implemented worldwide, and technology infrastructures are rapidly expanding across multiple regions and extending to network edges. Federated learning helps cloud customers use distributed datasets to train accurate ML models in a privacy-preserving manner. Federated learning also supports data localization and potentially saves costs, because it does not require large amounts of raw data to be moved or shared.
You can start experimenting and building cloud-native federated learning architectures for your use cases. You can customize the architecture to support various ML frameworks, such as TensorFlow or PyTorch. You can also customize it to support different FL algorithms, including asynchronous federated learning, aggregation algorithms, and differential privacy algorithms. You can enable this architecture with FL Ops functionalities using ML Ops capabilities in Amazon SageMaker.

About the Authors
Qiong (Jo) Zhang, PhD, is a Senior Partner SA at AWS, specializing in AI/ML. Her current areas of interest include federated learning, distributed training, and generative AI.  She holds 30+ patents and has co-authored 100+ journal/conference papers. She is also the recipient of the Best Paper Award at IEEE NetSoft 2016, IEEE ICC 2011, ONDM 2010, and IEEE GLOBECOM 2005.
Parker Newton is an applied scientist in AWS Cryptography. He received his Ph.D. in cryptography from U.C. Riverside, specializing in lattice-based cryptography and the complexity of computational learning problems. He is currently working at AWS in secure computation and privacy, designing cryptographic protocols to enable customers to securely run workloads in the cloud while preserving the privacy of their data.
Olivia Choudhury, PhD, is a Senior Partner SA at AWS. She helps partners, in the Healthcare and Life Sciences domain, design, develop, and scale state-of-the-art solutions leveraging AWS. She has a background in genomics, healthcare analytics, federated learning, and privacy-preserving machine learning. Outside of work, she plays board games, paints landscapes, and collects manga.
Gang Fu  is a Healthcare Solution Architect at AWS. He holds a PhD in Pharmaceutical Science from the University of Mississippi and has over ten years of technology and biomedical research experience. He is passionate about technology and the impact it can make on healthcare.
Kris is a renowned leader in machine learning and generative AI, with a career spanning Goldman Sachs, consulting for major banks, and successful ventures like Foglight and SiteRock. He founded Indigo Capital Management and co-founded adaptiveARC, focusing on green energy tech. Kris also supports non-profits aiding assault victims and disadvantaged youth.
Bill Horne is a General Manager in AWS Cryptography. He leads the Cryptographic Computing Program, consisting of a team of applied scientists and engineers who are solving customer problems using emerging technologies like secure multiparty computation and homomorphic encryption. Prior to joining AWS in 2020 he was the VP and General Manager of Intertrust Secure Systems and was the Director of Security Research at Hewlett-Packard Enterprise. He is the author of 60 peer reviewed publications in the areas of security and machine learning, and holds 50 granted patents and 58 patents pending.

Revolutionizing CPR Training With CPR-Coach: Harnessing Artificial Int …

Cardiopulmonary Resuscitation (CPR) is a life-saving medical procedure designed to revive individuals who have experienced cardiac arrest, meaning the heart suddenly stops beating effectively or the person stops breathing. The procedure aims to maintain the flow of oxygenated blood to vital organs, particularly the brain, until professional medical help arrives or until the person can be transported to a healthcare facility for advanced care. Performing CPR requires endurance, but it becomes straightforward once you follow the correct movements. However, there are several actions to master, such as chest compressions, rescue breaths, and early defibrillation (when the right equipment is available). Since CPR is a vital emergency skill, it is essential to spread this fundamental expertise as widely as possible. Nevertheless, its assessment traditionally relies on physical mannequins and instructors, resulting in high training costs and limited efficiency. Furthermore, because neither instructors nor this specialized equipment are available everywhere, the traditional approach is hard to scale.

In a groundbreaking development, the research presented in this article introduced a vision-based system to enhance error action recognition and skill assessment during CPR. This innovative approach marks a significant departure from conventional training methods. Specifically, 13 distinct single-error actions and 74 composite error actions associated with external cardiac compression have been identified and categorized. This innovative CPR-based research is the first to analyze action-specific errors commonly committed during this procedure. The researchers have curated a comprehensive video dataset called CPR-Coach to facilitate this novel approach. An overview of some of the most typical errors annotated in the dataset is reported below.

https://shunli-wang.github.io/CPR-Coach/

Using CPR-Coach as their reference dataset, the authors embarked on a thorough investigation, evaluating and comparing the performance of various action recognition models that leverage different data modalities. Their objective is to address the challenge posed by the single-class training and multi-class testing problem inherent in CPR skill assessment. To tackle this issue, they introduced a pioneering framework called ImagineNet, inspired by human cognition principles. ImagineNet is designed to enhance the model’s capacity for recognizing multiple errors within the CPR context, even under the constraints of limited supervision.

An overview of ImagineNet’s workflow is presented in the figure below.

https://shunli-wang.github.io/CPR-Coach/

This research represents a significant leap forward in the assessment of CPR skills, offering the potential to reduce training costs and enhance the efficiency of CPR instruction through the innovative application of vision-based technology and advanced deep learning models. Ultimately, this approach has the potential to improve the quality of CPR training and, by extension, the outcomes for individuals experiencing cardiac emergencies.

This was a summary of CPR-Coach and ImagineNet, two AI tools designed to analyze CPR-related errors and automate CPR skill assessment. If you are interested and want to learn more, please refer to the links cited below.

Check out the Project. All Credit For This Research Goes To the Researchers on This Project.

The post Revolutionizing CPR Training With CPR-Coach: Harnessing Artificial Intelligence for Error Recognition and Assessment appeared first on MarkTechPost.

UC Berkeley and UCSF Researchers Revolutionize Neural Video Generation …

In response to the challenges faced in generating videos from text prompts, a team of researchers has introduced a new approach called LLM-grounded Video Diffusion (LVD). The core issue at hand is that existing models struggle to create videos that accurately represent complex spatiotemporal dynamics described in textual prompts.

To provide context, text-to-video generation is a complex task because it requires generating videos solely based on textual descriptions. While there have been previous attempts to address this problem, they often fall short in producing videos that align well with the given prompts in terms of spatial layouts and temporal dynamics.

LVD, however, takes a different approach. Instead of directly generating videos from text inputs, it employs Large Language Models (LLMs) to first create dynamic scene layouts (DSLs) based on the text descriptions. These DSLs essentially act as blueprints or guides for the subsequent video generation process.

What’s particularly intriguing is that the researchers found that LLMs possess a surprising capability to generate these DSLs that not only capture spatial relationships but also intricate temporal dynamics. This is crucial for generating videos that accurately reflect real-world scenarios based solely on text prompts.

To make this process more concrete, LVD introduces an algorithm that utilizes DSLs to control how object-level spatial relations and temporal dynamics are generated in video diffusion models. Importantly, this method doesn’t require extensive training; it’s a training-free approach that can be integrated into various video diffusion models capable of classifier guidance.

The results of LVD are remarkable. It significantly outperforms the base video diffusion model and other strong baselines at generating videos that faithfully adhere to the attributes and motion patterns described in the text prompts, reaching a text-video similarity score of 0.52. LVD surpasses the other models not only in text-video similarity but also in video quality.

In conclusion, LVD is a groundbreaking approach to text-to-video generation that leverages the power of LLMs to generate dynamic scene layouts, ultimately improving the quality and fidelity of videos generated from complex text prompts. This approach has the potential to unlock new possibilities in various applications, such as content creation and video generation.

Check out the Paper. All Credit For This Research Goes To the Researchers on This Project.

The post UC Berkeley and UCSF Researchers Revolutionize Neural Video Generation: Introducing LLM-Grounded Video Diffusion (LVD) for Improved Spatiotemporal Dynamics appeared first on MarkTechPost.

In a New AI Paper, CMU and Google Researchers Redefine Language Model …

Transformer-based causal language models generate tokens in rapid succession. To produce the (K + 1)th token, the model takes in the K preceding tokens and iteratively computes K intermediate vectors in each hidden layer, where each vector is the output of a module that operates on the previous layer’s output vectors. Despite the complexity of the entire procedure, one unusual restriction holds: the number of operations available to determine the next token is bounded by the number of tokens already seen.

A recent study by Carnegie Mellon University and Google investigated the strategy of adding dummy tokens to the input of a decoder-only model to postpone its output. In this work, the researchers introduce a (learnable) pause token and append one or more copies of it to the input sequence. The model’s outputs at these pause positions are simply ignored, and its answer is extracted only after the last pause token has been seen.
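As a minimal, hypothetical sketch of the mechanics (not the authors’ implementation; the model interface, tensor shapes, and names are placeholders), appending learned pause embeddings and discarding their outputs might look like this:

import torch

# Append K copies of a single learned pause embedding to the input and read the
# model's prediction only at the position of the last pause token.
def forward_with_pauses(model, token_embeddings, pause_embedding, num_pauses=10):
    batch, seq_len, dim = token_embeddings.shape
    pauses = pause_embedding.expand(batch, num_pauses, dim)   # repeat the learned vector
    padded = torch.cat([token_embeddings, pauses], dim=1)     # [batch, seq_len + K, dim]
    logits = model(padded)                                    # [batch, seq_len + K, vocab]
    # Outputs at the pause positions are ignored; the answer starts after the last pause.
    return logits[:, seq_len + num_pauses - 1, :]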

Importantly, the researchers consider inserting such delays not only at inference time but also during downstream fine-tuning and pretraining. It is not obvious in advance what effect this seemingly minor adjustment will have. The delay creates a potentially “wider” computational channel that the Transformer may exploit to its advantage. A simpler outcome would be that the model learns to ignore the delay tokens and behaves as before; after all, neither the tokens themselves nor the small number of new parameters introduced by embedding a single token can encode any additional information from the training data. Worse, these meaningless tokens might obscure useful signal and weaken the model.

The team undertook an empirical assessment to understand the effect of introducing (appended) delays in all training and inference phases. They examine pause training on 1B- and 130M-parameter decoder-only models initially trained on C4 (Raffel et al., 2019) and then fine-tuned on nine downstream tasks covering extractive question answering, reasoning, general understanding, and fact recall. Most notably, this method raises the 1B model’s exact match score by 18% on the SQuAD extractive question-answering task. Similarly, they observed an 8% increase on the general understanding task CommonSense QA and a 1% accuracy gain on the GSM8k reasoning task over the standard model’s accuracy of 7.5%.

On the other hand, when tokens are introduced only during the final fine-tuning stage (using the baseline pretrained model), improvements are seen in just a small fraction of cases. The team also conducted a series of key ablations, including:

Discovering that appending tokens is generally superior to prepending them.

Discovering that there is an optimal number of tokens for any downstream task.

Discovering that decreasing the number of inference-time tokens results in a graceful performance degradation. 

The team believes that the essential next step would be developing ways to directly make delays helpful on a normal pretrained model. They envision several new theoretical and applied research directions opening up thanks to their work expanding the paradigm of delayed next-token prediction.

Check out the Paper. All Credit For This Research Goes To the Researchers on This Project.

The post In a New AI Paper, CMU and Google Researchers Redefine Language Model Outputs: How Delaying Responses with Pause Tokens Boosts Performance on QA and Reasoning Tasks appeared first on MarkTechPost.

Mistral 7B foundation models from Mistral AI are now available in Amaz …

Today, we are excited to announce that the Mistral 7B foundation models, developed by Mistral AI, are available for customers through Amazon SageMaker JumpStart to deploy with one click for running inference. With 7 billion parameters, Mistral 7B can be easily customized and quickly deployed. You can try out this model with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms and models so you can quickly get started with ML. In this post, we walk through how to discover and deploy the Mistral 7B model.
What is Mistral 7B
Mistral 7B is a foundation model developed by Mistral AI, supporting English text and code generation abilities. It supports a variety of use cases, such as text summarization, classification, text completion, and code completion. To demonstrate the easy customizability of the model, Mistral AI has also released a Mistral 7B Instruct model for chat use cases, fine-tuned using a variety of publicly available conversation datasets.
Mistral 7B is a transformer model that uses grouped-query attention and sliding-window attention to achieve faster inference (lower latency) and handle longer sequences. Grouped-query attention combines multi-query and multi-head attention to achieve output quality close to multi-head attention with speed comparable to multi-query attention. Sliding-window attention uses the stacked layers of a transformer to attend to the past beyond the window size, increasing the effective context length. Mistral 7B has an 8,000-token context length, demonstrates low latency and high throughput, and performs strongly compared with larger model alternatives while keeping memory requirements low at the 7B model size. The model is made available under the permissive Apache 2.0 license, for use without restrictions.
What is SageMaker JumpStart
With SageMaker JumpStart, ML practitioners can choose from a growing list of best-performing foundation models. ML practitioners can deploy foundation models to dedicated Amazon SageMaker instances within a network isolated environment, and customize models using SageMaker for model training and deployment.
You can now discover and deploy Mistral 7B with a few clicks in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK, enabling you to derive model performance and MLOps controls with SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, or container logs. The model is deployed in an AWS secure environment and under your VPC controls, helping ensure data security.
Discover models
You can access Mistral 7B foundation models through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we go over how to discover the models in SageMaker Studio.
SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all ML development steps, from preparing data to building, training, and deploying your ML models. For more details on how to get started and set up SageMaker Studio, refer to Amazon SageMaker Studio.
In SageMaker Studio, you can access SageMaker JumpStart, which contains pre-trained models, notebooks, and prebuilt solutions, under Prebuilt and automated solutions.

From the SageMaker JumpStart landing page, you can browse for solutions, models, notebooks, and other resources. You can find Mistral 7B in the Foundation Models: Text Generation carousel.

You can also find other model variants by choosing Explore all Text Models or searching for “Mistral.”

You can choose the model card to view details about the model such as license, data used to train, and how to use. You will also find two buttons, Deploy and Open notebook, which will help you use the model (the following screenshot shows the Deploy option).

Deploy models
Deployment starts when you choose Deploy. Alternatively, you can deploy through the example notebook that shows up when you choose Open notebook. The example notebook provides end-to-end guidance on how to deploy the model for inference and clean up resources.
To deploy using the notebook, we start by selecting the Mistral 7B model, specified by its model_id. You can deploy the selected model on SageMaker with the following code:

from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="huggingface-llm-mistral-7b-instruct")
predictor = model.deploy()

This deploys the model on SageMaker with default configurations, including default instance type (ml.g5.2xlarge) and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel. After it’s deployed, you can run inference against the deployed endpoint through the SageMaker predictor:

payload = {"inputs": "<s>[INST] Hello! [/INST]"}
predictor.predict(payload)
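
The payload can also carry optional generation parameters supported by the TGI container, such as max_new_tokens, temperature, and do_sample; the values below are only illustrative:

payload = {
    "inputs": "<s>[INST] Write a haiku about the ocean. [/INST]",
    "parameters": {"max_new_tokens": 128, "temperature": 0.7, "do_sample": True},
}
predictor.predict(payload)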

Optimizing the deployment configuration
Mistral models use Text Generation Inference (TGI version 1.1) model serving. When deploying models with the TGI deep learning container (DLC), you can configure a variety of launcher arguments via environment variables when deploying your endpoint. To support the 8,000-token context length of Mistral 7B models, SageMaker JumpStart has configured some of these parameters by default: we set MAX_INPUT_LENGTH and MAX_TOTAL_TOKENS to 8191 and 8192, respectively. You can view the full list by inspecting your model object:

print(model.env)

By default, SageMaker JumpStart doesn’t lower the environment variable MAX_CONCURRENT_REQUESTS below the TGI default value of 128, because some users have typical workloads with small payload context lengths and want high concurrency. Note that the SageMaker TGI DLC supports multiple concurrent users through rolling batch. When deploying the endpoint for your application, consider whether to clamp MAX_TOTAL_TOKENS or MAX_CONCURRENT_REQUESTS prior to deployment to get the best performance for your workload:

model.env["MAX_CONCURRENT_REQUESTS"] = "4"

Here, we show how model performance might differ for your typical endpoint workload. In the following tables, you can observe that small-sized queries (128 input words and 128 output tokens) are quite performant under a large number of concurrent users, reaching token throughput on the order of 1,000 tokens per second. However, as the number of input words increases to 512 input words, the endpoint saturates its batching capacity—the number of concurrent requests allowed to be processed simultaneously—resulting in a throughput plateau and significant latency degradations starting around 16 concurrent users. Finally, when querying the endpoint with large input contexts (for example, 6,400 words) simultaneously by multiple concurrent users, this throughput plateau occurs relatively quickly, to the point where your SageMaker account will start encountering 60-second response timeout limits for your overloaded requests.

Throughput (tokens/s) by number of concurrent users:

| model | instance type | input words | output tokens | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| mistral-7b-instruct | ml.g5.2xlarge | 128 | 128 | 30 | 54 | 89 | 166 | 287 | 499 | 793 | 1030 |
| mistral-7b-instruct | ml.g5.2xlarge | 512 | 128 | 29 | 50 | 80 | 140 | 210 | 315 | 383 | 458 |
| mistral-7b-instruct | ml.g5.2xlarge | 6400 | 128 | 17 | 25 | 30 | 35 | | | | |

P50 latency (ms/token) by number of concurrent users:

| model | instance type | input words | output tokens | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| mistral-7b-instruct | ml.g5.2xlarge | 128 | 128 | 32 | 33 | 34 | 36 | 41 | 46 | 59 | 88 |
| mistral-7b-instruct | ml.g5.2xlarge | 512 | 128 | 34 | 36 | 39 | 43 | 54 | 71 | 112 | 213 |
| mistral-7b-instruct | ml.g5.2xlarge | 6400 | 128 | 57 | 71 | 98 | 154 | | | | |

Inference and example prompts
Mistral 7B
You can interact with a base Mistral 7B model like any standard text generation model, where the model processes an input sequence and outputs predicted next words in the sequence. The following is a simple few-shot example, where the model is provided with several examples and the final example response is generated with contextual knowledge of these previous examples:

> Input
Tweet: “I get sad when my phone battery dies.”
Sentiment: Negative
###
Tweet: “My day has been :+1:”
Sentiment: Positive
###
Tweet: “This is the link to the article”
Sentiment: Neutral
###
Tweet: “This new music video was incredibile”
Sentiment:

> Output
Positive

Mistral 7B instruct
The instruction-tuned version of Mistral accepts formatted instructions where conversation roles must start with a user prompt and alternate between user and assistant. A simple user prompt may look like the following:

<s>[INST] {user_prompt} [/INST]

A multi-turn prompt would look like the following:

<s>[INST] {user_prompt_1} [/INST] {assistant_response_1} </s><s>[INST] {user_prompt_2} [/INST]

This pattern repeats for however many turns are in the conversation.
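
For convenience, you can assemble this prompt format programmatically. The following helper is illustrative (it is not part of the SageMaker SDK) and assumes the predictor created earlier in this post:

def build_mistral_prompt(history, user_message):
    # history is a list of (user_turn, assistant_turn) pairs from earlier turns
    prompt = ""
    for user_turn, assistant_turn in history:
        prompt += f"<s>[INST] {user_turn} [/INST] {assistant_turn} </s>"
    prompt += f"<s>[INST] {user_message} [/INST]"
    return prompt

payload = {"inputs": build_mistral_prompt([("Hello!", "Hi! How can I help?")], "Tell me a joke.")}
predictor.predict(payload)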
In the following sections, we explore some examples using the Mistral 7B Instruct model.
Knowledge retrieval
The following is an example of knowledge retrieval:

> Input
<s>[INST] Which country has the most natural lakes? Answer with only the country name. [/INST]

> Output
1. Canada

Large context question answering
To demonstrate how to use this model to support large input context lengths, the following example embeds a passage, titled “Rats” by Robert Sullivan (reference), from the MCAS Grade 10 English Language Arts Reading Comprehension test into the input prompt instruction and asks the model a directed question about the text:

> Input
<s>[INST] A rat is a rodent, the most common mammal in the world. Rattus norvegicus is one of the approximately four hundred different kinds of rodents, and it is known by many names, each of which describes a trait or a perceived trait or sometimes a habitat: the earth rat, the roving rat, the barn rat, the field rat, the migratory rat, the house rat, the sewer rat, the water rat, the wharf rat, the alley rat, the gray rat, the brown rat, and the common rat. The average brown rat is large and stocky; it grows to be approximately sixteen inches long from its nose to its tail—the size of a large adult human male’s foot—and weighs about a pound, though brown rats have been measured by scientists and exterminators at twenty inches and up to two pounds. The brown rat is sometimes confused with the black rat, or Rattus rattus, which is smaller and once inhabited New York City and all of the cities of America but, since Rattus norvegicus pushed it out, is now relegated to a minor role. (The two species still survive alongside each other in some Southern coastal cities and on the West Coast, in places like Los Angeles, for example, where the black rat lives in attics and palm trees.) The black rat is always a very dark gray, almost black, and the brown rat is gray or brown, with a belly that can be light gray, yellow, or even a pure-seeming white. One spring, beneath the Brooklyn Bridge, I saw a red-haired brown rat that had been run over by a car. Both pet rats and laboratory rats are Rattus norvegicus, but they are not wild and therefore, I would emphasize, not the subject of this book. Sometimes pet rats are called fancy rats. But if anyone has picked up this book to learn about fancy rats, then they should put this book down right away; none of the rats mentioned herein are at all fancy.

Rats are nocturnal, and out in the night the brown rat’s eyes are small and black and shiny; when a flashlight shines into them in the dark, the eyes of a rat light up like the eyes of a deer. Though it forages* in darkness, the brown rat has poor eyesight. It makes up for this with, first of all, an excellent sense of smell. . . . They have an excellent sense of taste, detecting the most minute amounts of poison, down to one part per million. A brown rat has strong feet, the two front paws each equipped with four clawlike nails, the rear paws even longer and stronger. It can run and climb with squirrel-like agility. It is an excellent swimmer, surviving in rivers and bays, in sewer streams and toilet bowls.

The brown rat’s teeth are yellow, the front two incisors being especially long and sharp, like buckteeth. When the brown rat bites, its front two teeth spread apart. When it gnaws, a flap of skin plugs the space behind its incisors. Hence, when the rat gnaws on indigestible materials—concrete or steel, for example—the shavings don’t go down the rat’s throat and kill it. Its incisors grow at a rate of five inches per year. Rats always gnaw, and no one is certain why—there are few modern rat studies. It is sometimes erroneously stated that the rat gnaws solely to limit the length of its incisors, which would otherwise grow out of its head, but this is not the case: the incisors wear down naturally. In terms of hardness, the brown rat’s teeth are stronger than aluminum, copper, lead, and iron. They are comparable to steel. With the alligator-like structure of their jaws, rats can exert a biting pressure of up to seven thousand pounds per square inch. Rats, like mice, seem to be attracted to wires—to utility wires, computer wires, wires in vehicles, in addition to gas and water pipes. One rat expert theorizes that wires may be attractive to rats because of their resemblance to vines and the stalks of plants; cables are the vines of the city. By one estimate, 26 percent of all electric-cable breaks and 18 percent of all phone-cable disruptions are caused by rats. According to one study, as many as 25 percent of all fires of unknown origin are rat-caused. Rats chew electrical cables. Sitting in a nest of tattered rags and newspapers, in the floorboards of an old tenement, a rat gnaws the head of a match—the lightning in the city forest.

When it is not gnawing or feeding on trash, the brown rat digs. Anywhere there is dirt in a city, brown rats are likely to be digging—in parks, in flowerbeds, in little dirt-poor backyards. They dig holes to enter buildings and to make nests. Rat nests can be in the floorboards of apartments, in the waste-stuffed corners of subway stations, in sewers, or beneath old furniture in basements. “Cluttered and unkempt alleyways in cities provide ideal rat habitat, especially those alleyways associated with food-serving establishments,” writes Robert Corrigan in Rodent Control, a pest control manual. “Alley rats can forage safely within the shadows created by the alleyway, as well as quickly retreat to the safety of cover in these narrow channels.” Often, rats burrow under concrete sidewalk slabs. Entrance to a typical under-the-sidewalk rat’s nest is gained through a two-inch-wide hole—their skeletons collapse and they can squeeze into a hole as small as three quarters of an inch wide, the average width of their skull. This tunnel then travels about a foot down to where it widens into a nest or den. The den is lined with soft debris, often shredded plastic garbage or shopping bags, but sometimes even grasses or plants; some rat nests have been found stuffed with the gnawed shavings of the wood-based, spring-loaded snap traps that are used in attempts to kill them. The back of the den then narrows into a long tunnel that opens up on another hole back on the street. This second hole is called a bolt hole; it is an emergency exit. A bolt hole is typically covered lightly with dirt or trash—camouflage. Sometimes there are networks of burrows, which can stretch beneath a few concrete squares on a sidewalk, or a number of backyards, or even an entire city block—when Rattus norvegicus first came to Selkirk, England, in 1776, there were so many burrows that people feared the town might sink. Rats can also nest in basements, sewers, manholes, abandoned pipes of any kind, floorboards, or any hole or depression. “Often,” Robert Corrigan writes, “‘city rats’ will live unbeknownst to people right beneath their feet.”

Rats also inhabit subways, as most people in New York City and any city with a subway system are well aware. Every once in a while, there are reports of rats boarding trains, but for the most part rats stay on the tracks—subway workers I have talked to refer to rats as “track rabbits.” People tend to think that the subways are filled with rats, but in fact rats are not everywhere in the system; they live in the subways according to the supply of discarded human food and sewer leaks. Sometimes, rats use the subway purely for nesting purposes; they find ways through the walls of the subway stations leading from the tracks to the restaurants and stores on the street—the vibrations of subway trains tend to create rat-size cracks and holes. Many subway rats tend to live near stations that are themselves near fast-food restaurants. At the various subway stations near Herald Square, for example, people come down from the streets and throw the food that they have not eaten onto the tracks, along with newspapers and soda bottles and, I have noticed, thousands of no-longer-charged AA batteries, waiting to leak acid. The rats eat freely from the waste and sit at the side of the little streams of creamy brown sewery water that flows between the rails. They sip the water the way rats do, either with their front paws or by scooping it up with their incisors.

What is the most likely reason the author states, “cables are the vines of the city”? [/INST]

> Output
1. The author states that rats are attracted to wires, such as utility wires, computer wires, and vehicle wires, which may be due to their resemblance to vines and the stalks of plants.

Mathematics and reasoning
The Mistral models also report strong accuracy on mathematics benchmarks. Mistral can work through math reasoning such as the following:

> Input
<s>[INST] I bought an ice cream for 6 kids. Each cone was $1.25 and I paid with a $10 bill. How many dollars did I get back? Explain first before answering. [/INST]

> Output
1. First, let’s figure out how much the ice cream cost in total. Since each cone was $1.25 and there were 6 kids, the total cost would be:

Total cost = Number of cones * Cost per cone
Total cost = 6 * $1.25
Total cost = $7.50

2. Next, we’ll subtract the total cost from the amount paid ($10) to find out how much money was returned:

Money returned = Amount paid – Total cost
Money returned = $10 – $7.50
Money returned = $2.50

So, you got back $2.50 after buying the ice cream for the 6 kids.

Coding
The following is an example of a coding prompt:

> Input
<s>[INST] In Bash, how do I list all text files in the current directory (excluding subdirectories) that have been modified in the last month? [/INST]

> Output
1. Use the `find` command to search for text files in the current directory (excluding subdirectories) that have been modified in the last month. The `-type f` option specifies that we only want to search for files, not directories. The `-name "*.txt"` option specifies that we only want to search for files with the `.txt` extension. The `-mtime -30` option specifies that we only want to search for files that have been modified in the last 30 days (1 month).

```bash
find . -type f -name "*.txt" -mtime -30
```

This command will list all text files in the current directory (excluding subdirectories) that have been modified in the last month.

Clean up
After you’re done running the notebook, make sure to delete all the resources that you created in the process so your billing is stopped. Use the following code:

predictor.delete_model()
predictor.delete_endpoint()

Conclusion
In this post, we showed you how to get started with Mistral 7B in SageMaker Studio and deploy the model for inference. Because foundation models are pre-trained, they can help lower training and infrastructure costs and enable customization for your use case. Visit Amazon SageMaker JumpStart now to get started.
Resources

SageMaker JumpStart documentation
SageMaker JumpStart foundation models documentation
SageMaker JumpStart product detail page
SageMaker JumpStart model catalog

About the Authors
Dr. Kyle Ulrich is an Applied Scientist with the Amazon SageMaker JumpStart team. His research interests include scalable machine learning algorithms, computer vision, time series, Bayesian non-parametrics, and Gaussian processes. His PhD is from Duke University and he has published papers in NeurIPS, Cell, and Neuron.
Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker JumpStart and helps develop machine learning algorithms. He got his PhD from University of Illinois Urbana-Champaign. He is an active researcher in machine learning and statistical inference, and has published many papers in NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.
Vivek Singh is a product manager with Amazon SageMaker JumpStart. He focuses on enabling customers to onboard SageMaker JumpStart to simplify and accelerate their ML journey to build generative AI applications.
Roy Allela is a Senior AI/ML Specialist Solutions Architect at AWS based in Munich, Germany. Roy helps AWS customers—from small startups to large enterprises—train and deploy large language models efficiently on AWS. Roy is passionate about computational optimization problems and improving the performance of AI workloads.

Use no-code machine learning to derive insights from product reviews u …

According to Gartner, 85% of software buyers trust online reviews as much as personal recommendations. Customers provide feedback and reviews about products they have purchased through many channels, including review websites, vendor websites, sales calls, social media, and many others. The problem with the increasing volume of customer reviews across multiple channels is that it can be challenging for companies to process and derive meaningful insights from the data using traditional methods. Machine learning (ML) can analyze large volumes of product reviews and identify patterns, sentiments, and topics discussed. With this information, companies can gain a better understanding of customer preferences, pain points, and satisfaction levels. They can also use this information to improve products and services, identify trends, and take strategic actions that drive business growth. However, implementing ML can be a challenge for companies that lack resources such as ML practitioners, data scientists, or artificial intelligence (AI) developers. With the new Amazon SageMaker Canvas features, business analysts can now use ML to derive insights from product reviews.
SageMaker Canvas is designed for the functional needs of business analysts who want to use no-code ML on AWS for ad hoc analysis of tabular data. SageMaker Canvas is a visual, point-and-click service that allows business analysts to generate accurate ML predictions without writing a single line of code or requiring ML expertise. You can use models to make predictions interactively and for batch scoring on bulk datasets. SageMaker Canvas offers both fully managed, ready-to-use AI models and custom model solutions. For common ML use cases, you can use a ready-to-use AI model to generate predictions with your data without any model training. For ML use cases specific to your business domain, you can train an ML model with your own data for custom predictions.
In this post, we demonstrate how to use the ready-to-use sentiment analysis model and custom text analysis model to derive insights from product reviews. In this use case, we have a set of synthesized product reviews that we want to analyze for sentiments and categorize the reviews by product type, to make it easy to draw patterns and trends that can help business stakeholders make better informed decisions. First, we describe the steps to determine the sentiment of the reviews using the ready-to-use sentiment analysis model. Then, we walk you through the process to train a text analysis model to categorize the reviews by product type. Next, we explain how to review the trained model for performance. Finally, we explain how to use the trained model to perform predictions.
Sentiment analysis is a natural language processing (NLP) ready-to-use model that analyzes text for sentiments. Sentiment analysis may be run for single line or batch predictions. The predicted sentiments for each line of text are either positive, negative, mixed or neutral.
Text analysis allows you to classify text into two or more categories using custom models. In this post, we want to classify product reviews based on product type. To train a text analysis custom model, you simply provide a dataset consisting of the text and the associated categories in a CSV file. The dataset requires a minimum of two categories and 125 rows of text per category. After the model is trained, you can review the model’s performance and retrain the model if needed, before using it for predictions.
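
Before uploading your own training file, you can optionally run a quick offline check that it meets these requirements. This snippet is a sketch; the column name product_category comes from the sample training file used later in this post:

import pandas as pd

# Verify the text analysis training data has at least two categories and at
# least 125 rows of text per category.
df = pd.read_csv("sample_product_reviews_training.csv")
counts = df["product_category"].value_counts()

assert counts.size >= 2, "Need at least two categories"
assert (counts >= 125).all(), "Every category needs at least 125 rows of text"
print(counts)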
Prerequisites
Complete the following prerequisites:

Have an AWS account.
Set up SageMaker Canvas.
Download the sample product reviews datasets:

sample_product_reviews.csv – Contains 2,000 synthesized product reviews and is used for sentiment analysis and Text Analysis predictions.
sample_product_reviews_training.csv – Contains 600 synthesized product reviews and three product categories, and is for text analysis model training.

Sentiment analysis
First, you use sentiment analysis to determine the sentiments of the product reviews by completing the following steps.

On the SageMaker console, click Canvas in the navigation pane, then click Open Canvas to open the SageMaker Canvas application.
Click Ready-to-use models in the navigation pane, then click Sentiment analysis.
Click Batch prediction, then click Create dataset.
Provide a Dataset name and click Create.
Click Select files from your computer to import the sample_product_reviews.csv dataset.
Click Create dataset and review the data. The first column contains the reviews and is used for sentiment analysis. The second column contains the review ID and is used for reference only.
Click Create dataset to complete the data upload process.
In the Select dataset for predictions view, select sample_product_reviews.csv and then click Generate predictions. 
When the batch prediction is complete, click View to view the predictions.

The Sentiment and Confidence columns provide the sentiment and confidence score, respectively. A confidence score is a statistical value between 0 and 100%, that shows the probability that the sentiment is correctly predicted.

Click Download CSV to download the results to your computer.

Text analysis
In this section, we go through the steps to perform text analysis with a custom model: importing the data, training the model and then making predictions.
Import the data
First import the training dataset. Complete the following steps:

On the Ready-to-use models page, click Create a custom model.
For Model name, enter a name (for example, Product Reviews Analysis). Click Text analysis, then click Create.
On the Select tab, click Create dataset to import the sample_product_reviews_training.csv dataset.
Provide a Dataset name and click Create.
Click Create dataset and review the data. The training dataset contains a third column with the product category, which is the target column and consists of three product types: books, video, and music.
Click Create dataset to complete the data upload process.
On the Select dataset page, select sample_product_reviews_training.csv and click Select dataset.

Train the model
Next, you configure the model to begin the training process.

On the Build tab, on the Target column drop-down menu, click product_category as the training target.
Click product_review as the source.
Click Quick build to start the model training.

For more information about the differences between Quick build and Standard build, refer to Build a custom model.
When the model training is complete, you may review the performance of the model before you use it for prediction.

On the Analyze tab, the model’s confidence score will be displayed. A confidence score indicates how certain a model is that its predictions are correct. On the Overview tab, review the performance for each category.
Click Scoring to review the model accuracy insights.
Click Advanced metrics to review the confusion matrix and F1 score.

Make predictions
To make a prediction with your custom model, complete the following steps:

On the Predict tab, click Batch prediction, then click Manual.
Click the same dataset, sample_product_reviews.csv, that you used previously for the sentiment analysis, then click Generate predictions.
When the batch prediction is complete, click View to view the predictions.

For custom model prediction, it takes some time for SageMaker Canvas to deploy the model for initial use. SageMaker Canvas automatically de-provisions the model if idle for 15 minutes to save costs.
The Prediction (Category) and Confidence columns provide the predicted product categories and confidence scores, respectively.

Highlight the completed job, select the three dots and click Download to download the results to your computer.

Clean up
Click Log out in the navigation pane to log out of the SageMaker Canvas application to stop the consumption of Canvas session hours and release all resources.
Conclusion
In this post, we demonstrated how you can use Amazon SageMaker Canvas to derive insights from product reviews without ML expertise. First, you used a ready-to-use sentiment analysis model to determine the sentiments of the product reviews. Next, you used text analysis to train a custom model with the quick build process. Finally, you used the trained model to categorize the product reviews into product categories. All without writing a single line of code. We recommend that you repeat the text analysis process with the standard build process to compare the model results and prediction confidence.

About the Authors
Gavin Satur is a Principal Solutions Architect at Amazon Web Services. He works with enterprise customers to build strategic, well-architected solutions and is passionate about automation. Outside work, he enjoys family time, tennis, cooking and traveling.
Les Chan is a Sr. Solutions Architect at Amazon Web Services, based in Irvine, California. Les is passionate about working with enterprise customers on adopting and implementing technology solutions with the sole focus of driving customer business outcomes. His expertise spans application architecture, DevOps, serverless, and machine learning.
Aaqib Bickiya is a Solutions Architect at Amazon Web Services based in Southern California. He helps enterprise customers in the retail space accelerate projects and implement new technologies. Aaqib’s focus areas include machine learning, serverless, analytics, and communication services

Prepare your data for Amazon Personalize with Amazon SageMaker Data Wr …

A recommendation engine is only as good as the data used to prepare it. Transforming raw data into a format that is suitable for a model is key to getting better personalized recommendations for end-users.
In this post, we walk through how to prepare and import the MovieLens dataset, a dataset prepared by GroupLens research at the University of Minnesota, which consists of a variety of user rankings of various movies, into Amazon Personalize using Amazon SageMaker Data Wrangler. [1]
Solution overview
Amazon Personalize is a managed service whose core value proposition is its ability to learn user preferences from past behavior and quickly adjust those learned preferences to account for changing user behavior in near-real time. To develop this understanding of users, Amazon Personalize needs to train on historical user behavior so that it can find patterns that generalize to the future. Specifically, the main type of data that Amazon Personalize learns from is an interactions dataset: a tabular dataset consisting of at minimum three critical columns (userID, itemID, and timestamp), where each row represents a positive interaction between a user and an item at a specific time. Amazon Personalize users need to upload data containing their own customers’ interactions so the model can learn these behavioral trends. Although the internal algorithms within Amazon Personalize have been chosen based on Amazon’s experience in the machine learning space, a personalized model doesn’t come preloaded with any data and is trained on a customer-by-customer basis.
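For illustration, a minimal interactions dataset with toy values has the following shape:

import pandas as pd

# Toy interactions dataset: one row per positive user-item interaction,
# with the timestamp in Unix epoch seconds.
interactions = pd.DataFrame(
    {
        "userID": ["u1", "u1", "u2"],
        "itemID": ["movie_42", "movie_7", "movie_42"],
        "timestamp": [1696118400, 1696204800, 1696291200],
    }
)
print(interactions)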
The MovieLens dataset explored in this walkthrough isn’t in this format, so to prepare it for Amazon Personalize, we use SageMaker Data Wrangler, a purpose-built data aggregation and preparation tool for machine learning. It has over 300 preconfigured data transformations as well as the ability to bring in custom code to create custom transformations in PySpark, SQL, and a variety of data processing libraries, such as pandas.
Prerequisites
First, we need to have an Amazon SageMaker Studio domain set up. For details on how to set it up, refer to Onboard to Amazon SageMaker Domain using Quick setup.
Also, we need to set up the right permissions using AWS Identity and Access Management (IAM) for Amazon Personalize and Amazon SageMaker service roles so that they can access the needed functionalities.
You can create a new Amazon Personalize dataset group to use in this walkthrough or use an existing one.
Finally, we need to download and unzip the MovieLens dataset and place it in an Amazon Simple Storage Service (Amazon S3) bucket.
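If you prefer to script this step, the following sketch downloads the small MovieLens archive, extracts ratings.csv, and uploads it to Amazon S3. The dataset URL is the public GroupLens location at the time of writing, and the bucket name is a placeholder you should replace:

import io
import zipfile
import urllib.request
import boto3

MOVIELENS_URL = "https://files.grouplens.org/datasets/movielens/ml-latest-small.zip"
BUCKET = "my-personalize-data-bucket"  # replace with your own bucket

# Download the archive into memory and open it as a zip file.
with urllib.request.urlopen(MOVIELENS_URL) as resp:
    archive = zipfile.ZipFile(io.BytesIO(resp.read()))

# Upload the ratings file to S3 for use in SageMaker Data Wrangler.
with archive.open("ml-latest-small/ratings.csv") as ratings_file:
    boto3.client("s3").upload_fileobj(ratings_file, BUCKET, "movielens/ratings.csv")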
Launch SageMaker Data Wrangler from Amazon Personalize
To start with the SageMaker Data Wrangler integration with Amazon Personalize, complete the following steps:

On the Amazon Personalize console, navigate to the Overview page of your dataset group.
Choose Import interaction data, Import user data, or Import item data, depending on the dataset type (for this post, we choose Import interaction data).

For Import method, select Import data using Data Wrangler.
Choose Next.

Specify the SageMaker domain, user profile, and IAM service role that you created earlier as part of the prerequisites.
Choose Next.

Continue through the steps to launch an instance of SageMaker Data Wrangler.

Setting up the environment for the first time can take up to 5 minutes.
Import the raw data into SageMaker Data Wrangler
When using SageMaker Data Wrangler to prepare and import data, we use a data flow. A data flow defines a series of transformations and analyses on data to prepare it to create a machine learning model. Each time we add a step to our flow, SageMaker Data Wrangler takes an action on our data, such as joining it with another dataset or dropping some rows and columns.
To start, let’s import the raw data.

On the data flow page, choose Import data.

With SageMaker Data Wrangler, we can import data from over 50 supported data sources.

For Data sources¸ choose Amazon S3.

Choose the dataset you uploaded to your S3 bucket.

SageMaker Data Wrangler automatically displays a preview of the data.

Keep the default settings and choose Import.

After the data is imported, SageMaker Data Wrangler automatically validates the datasets and detects the data types for all the columns based on its sampling.

Choose Data flow at the top of the Data types page to view the main data flow before moving to the next step.

One of the main advantages of SageMaker Data Wrangler is the ability to run previews of your transformations on a small subset of data before committing to apply the transformations on the entire dataset. To run the same transformation on multiple partitioned files in Amazon S3, you can use parameters.
Transform the data
To transform data in SageMaker Data Wrangler, you add a Transform step to your data flow. SageMaker Data Wrangler includes over 300 transforms that you can use to prepare your data, including a Map columns for Amazon Personalize transform. You can use the general SageMaker Data Wrangler transforms to fix issues such as outliers, type issues, and missing values, or apply data preprocessing steps.
To use Amazon Personalize, the data you provided in the interactions dataset must match your dataset schema. For our movie recommender engine, the proposed interactions dataset schema includes:

user_id (string)
item_id (string)
event_type (string)
timestamp (in Unix epoch time format)

To learn more about Amazon Personalize datasets and schemas, refer to Datasets and schemas.
The ratings.csv file, shown in the last step of the previous section, includes movies rated from 1 to 5. We want to build a movie recommender engine based on that. To do so, we must complete the following steps:

Modify the columns data types.
Create two event types: Click and Watch.
Assign all movies rated 2 and above as Click and movies rated 4 and above as both Click and Watch.
Drop the ratings column.
Map the columns to the Amazon Personalize interactions dataset schema.
Validate that our timestamp is in Unix epoch time.

Note that Step 3 isn’t strictly needed to build a personalization model. If we want to use one of the streamlined Amazon Personalize video on demand domain recommenders, such as Top Picks for You, Click and Watch are required event types. However, if we don’t need those recommenders, we could omit the event type field (or add our own event types, such as the raw user ratings) and use a custom recipe such as User-Personalization. Regardless of which type of recommendation engine we use, the dataset must contain only representations of positive user intent. So whatever approach you choose, you need to drop all one-star ratings (and possibly two-star ratings as well).
Now let’s use SageMaker Data Wrangler to perform the preceding steps.

On the Data flow page, choose the first transform, called Data types.

Update the type for each column.
Choose Preview to reflect the changes, then choose Update.

To add a step in the data flow, choose the plus sign next to the step you want to perform the transform on, then choose Add transform.

To derive the Click events from the movie ratings, we add a Filter data step that keeps only the movies rated 2 and above (an equivalent PySpark filter is sketched after the Watch step that follows).

Add another custom transform step to add a new column, eventType, with Click as an assigned value.
Choose Preview to review your transformation to double-check the results are as intended, then choose Add.
In this case, we write some PySpark code to add a column called eventType whose value will be uniformly Click for all of our two-star through five-star movies:

from pyspark.sql.functions import lit

df = df.withColumn("eventType", lit("Click"))

For the Watch events, repeat the previous steps for movies rated 4 and above and assign the Watch value by adding the steps to the Data types step. Our PySpark code for these steps is as follows:

from pyspark.sql.functions import lit

df = df.withColumn("eventType", lit("Watch"))
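
If you prefer to express the rating thresholds in code rather than the built-in Filter data step, equivalent custom PySpark transforms would look like the following (the rating column name is assumed from ratings.csv):

# Keep only the rows that qualify for each event type; "rating" is the
# column name in the MovieLens ratings.csv file.
df_click = df.filter(df["rating"] >= 2)   # candidates for Click events
df_watch = df.filter(df["rating"] >= 4)   # candidates for Watch events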

Up to this point, the data flow should look like the following screenshot.

Concatenate datasets
Because we have two datasets for watched and clicked events, let’s walk through how to concatenate these into one interactions dataset.

On the Data flow page, choose the plus sign next to Create Watch Event and choose Concatenate.

Choose the other final step (Create Click Event), and this should automatically map (converge) both the sets into a concatenate preview.

Choose Configure to view a preview of the concatenated datasets.
Add a name to the step.
Choose Add to add the step.

The data flow now looks like the following screenshot.

Now, let’s add a Manage columns step to drop the original rating column.

Amazon Personalize has default column names for users, items, and timestamps. These default column names are user_id, item_id, and timestamp.

Let’s add a Transform for Amazon Personalize step to replace the existing column headers with the default headers.
In our case, we also use the event_type field, so let’s map that as well.

With this step, the data transformation activity is complete and the interactions dataset is ready for the next step.

Next, let’s validate our timestamps.

We can do this by adding a Custom transform step. For this post, we choose Python (User-Defined Function).
Choose timestamp as the input column and as the output, create a new column called readable_timestamp.
Choose Python as the mode for the transformation and insert the following code for the Python function:

from datetime import datetime

def custom_func(value: int) -> str:
    return datetime.utcfromtimestamp(value).strftime('%Y-%m-%d %H:%M:%S')

Choose Preview to review the changes.

In this case, we see dates in the 2000s—because MovieLens started collecting data in 1996, this aligns with what is expected. If we don’t choose Add, this transformation won’t be added to our data flow.

Because this was merely a sanity check, you can navigate back to the data flow by choosing Data flow in the upper left corner.

Finally, we add an analysis step to create a summary report about the dataset. This step performs an analysis to assess the suitability of the dataset for Amazon Personalize.

Choose the plus sign next to the final step on the data flow and choose Add analysis.
For Analysis type¸ choose Data Quality And Insights Report for Amazon Personalize.
For Dataset type¸ choose Interactions.
Choose Create.

The MovieLens dataset is quite clean, so the analysis shows no issues. If some issues were identified, you can iterate on the dataset and rerun the analysis until you can address them.

Note the analysis by default runs on a sample of 50,000 rows.
Import the dataset to Amazon Personalize
At this point, our raw data has been transformed and we are ready to import the transformed interactions dataset to Amazon Personalize. SageMaker Data Wrangler gives you the ability to export your data to a location within an S3 bucket. You can specify the location using one of the following methods:

Destination node – Where SageMaker Data Wrangler stores the data after it has processed it
Export to – Exports the data resulting from a transformation to Amazon S3
Export data – For small datasets, you can quickly export the data that you’ve transformed

With the Destination node method, to export your data, you create destination nodes and a SageMaker Data Wrangler job. Creating a SageMaker Data Wrangler job starts a SageMaker Processing job to export your flow. You can choose the destination nodes that you want to export after you’ve created them.

Choose the plus sign next to the node that represents the transformations you want to export.

Choose Export to and then choose Amazon S3 (via Jupyter Notebook).

Note we could have also chosen to export the data to Amazon Personalize via a Jupyter notebook available in SageMaker Data Wrangler.

For Dataset name, enter a name, which will be used as a folder name in the S3 bucket provided as a destination.
You can specify the file type, field delimiter, and compression method.
Optionally, specify the number of partitions and column to partition by.
Choose Add destination.

The data flow should look like the following screenshot.

Create a job to process the data flow and store the data in the destination (S3 bucket) that we configured in the previous step.
Enter a job name, then choose Configure job.

SageMaker Data Wrangler provides the ability to configure the instance type, instance count, and job configuration, and the ability to create a schedule to process the job. For guidance on how to choose an instance count, refer to Create and Use a Data Wrangler Flow.

To monitor the status of the job, navigate to the Dashboard page on the SageMaker console. The Processing section shows the number of completed and created jobs. You can drill down to get more details about the completed job.

When the job is complete, a new file of the transformed data is created in the destination specified.

Return to the Amazon Personalize console and navigate to the dataset group to import another dataset.
Choose Import interaction data.

Select Import data directly into Amazon Personalize datasets to import the transformed dataset directly from Amazon S3, then choose Next.

Define the schema. For this post, our dataset consists of the user_id (string), item_id (string), event_type (string), and timestamp (long) fields.

At this point, you can create a video on demand domain recommender or a custom solution. To do so, follow the steps in Preparing and importing data.
Conclusion
In this post, we described how to use SageMaker Data Wrangler to prepare a sample dataset for Amazon Personalize. SageMaker Data Wrangler offers over 300 transformations. These transformations and the ability to add custom user transformations can help streamline the process of creating a quality dataset to offer hyper-personalized content to end-users.
Although we only explored how to prepare an interactions dataset in this post, you can use SageMaker Data Wrangler to prepare user and item datasets as well. For more information on the types of data that can be used with Amazon Personalize, refer to Datasets and schemas.
If you’re new to Amazon Personalize or SageMaker Data Wrangler, refer to Get Started with Amazon Personalize or Get Started with SageMaker Data Wrangler, respectively. If you have any questions related to this post, please add them in the comments section.

About the Authors
Maysara Hamdan is a Partner Solutions Architect based in Atlanta, Georgia. Maysara has over 15 years of experience in building and architecting Software Applications and IoT Connected Products in Telecom and Automotive Industries. In AWS, Maysara helps partners in building their cloud practices and growing their businesses. Maysara is passionate about new technologies and is always looking for ways to help partners innovate and grow.
Eric Bolme is a Specialist Solution Architect with AWS based on the East Coast of the United States. He has 8 years of experience building out a variety of deep learning and other AI use cases and focuses on Personalization and Recommendation use cases with AWS.

References
[1] Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4, Article 19 (December 2015), 19 pages. DOI=http://dx.doi.org/10.1145/2827872

Google DeepMind Introduces Direct Reward Fine-Tuning (DRaFT): An Effective Artificial Intelligence Method for Fine-Tuning Diffusion Models to Maximize Differentiable Reward Functions

Diffusion models have revolutionized generative modeling across various data types. However, in practical applications like generating aesthetically pleasing images from text descriptions, fine-tuning is often needed. Text-to-image diffusion models employ techniques like classifier-free guidance and curated datasets such as LAION Aesthetics to improve alignment and image quality.

In their research, the authors present a straightforward and efficient method for gradient-based reward fine-tuning, which involves differentiating through the diffusion sampling process. They introduce the concept of Direct Reward Fine-Tuning (DRaFT), which essentially backpropagates through the entire sampling chain, typically represented as an unrolled computation graph with a length of 50 steps. To manage memory and computational costs effectively, they employ gradient checkpointing techniques and optimize LoRA weights instead of modifying the entire set of model parameters.

The figure above demonstrates DRaFT with human-preference reward models. The authors also introduce refinements to DRaFT that improve its efficiency and performance. First, they propose DRaFT-K, a variant that limits backpropagation to only the last K steps of sampling when computing the gradient for fine-tuning. Empirical results show that this truncated-gradient approach significantly outperforms full backpropagation given the same number of training steps, as full backpropagation can suffer from exploding gradients.
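To make the idea concrete, here is a schematic PyTorch sketch of the DRaFT-K recipe under simplifying assumptions: a toy denoiser and a toy reward stand in for Stable Diffusion and a human-preference model, the sampling update is a generic placeholder rather than the actual sampler, and the LoRA parameterization and gradient checkpointing described above are omitted.

```python
import torch
import torch.nn as nn

# Toy stand-ins: in practice the denoiser is a Stable Diffusion UNet with
# LoRA weights and the reward is, e.g., an aesthetics or preference model.
denoiser = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 16))
reward_model = nn.Linear(16, 1)
for p in reward_model.parameters():
    p.requires_grad_(False)  # the reward is fixed; only the denoiser is tuned

def sample_step(x, t):
    """One generic denoising update (schematic, not the exact sampler)."""
    return x - 0.05 * denoiser(x)

def draft_k_loss(batch_size=8, num_steps=50, k=1):
    """DRaFT-K: run the full sampling chain, but only backpropagate the
    reward gradient through the last K sampling steps."""
    x = torch.randn(batch_size, 16)

    # Early steps: no gradient tracking, so memory stays bounded.
    with torch.no_grad():
        for t in range(num_steps - k):
            x = sample_step(x, t)

    # Last K steps: keep the computation graph for backpropagation.
    x = x.detach()
    for t in range(num_steps - k, num_steps):
        x = sample_step(x, t)

    # Maximize the reward of the final sample (minimize its negative).
    return -reward_model(x).mean()

optimizer = torch.optim.Adam(denoiser.parameters(), lr=1e-4)
loss = draft_k_loss(k=1)
loss.backward()
optimizer.step()
```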

Additionally, the authors introduce DRaFT-LV, a variation of DRaFT-1 (the K = 1 case of DRaFT-K) that computes lower-variance gradient estimates by averaging over multiple noise samples, further improving the efficiency of their approach.

The authors of the study applied DRaFT to Stable Diffusion 1.4 and conducted evaluations using various reward functions and prompt sets. Their methods, which leverage gradients, demonstrated significant efficiency advantages compared to RL-based fine-tuning baselines. For instance, they achieved over a 200-fold speed improvement when maximizing scores from the LAION Aesthetics Classifier compared to RL algorithms.

DRaFT-LV, one of their proposed variations, exhibited exceptional efficiency, learning approximately twice as fast as ReFL, a prior gradient-based fine-tuning method. Furthermore, they demonstrated the versatility of DRaFT by combining or interpolating DRaFT models with pre-trained models, which can be achieved by adjusting LoRA weights through mixing or scaling. 

In conclusion, directly fine-tuning diffusion models on differentiable rewards offers a promising avenue for improving generative modeling techniques, with implications for applications spanning images, text, and more. Its efficiency, versatility, and effectiveness make it a valuable addition to the toolkit of researchers and practitioners in the field of machine learning and generative modeling.


Meet SynthIA (Synthetic Intelligent Agent) 7B-v1.3: A Mistral-7B-v0.1 Model Trained on Orca Style Datasets

SynthIA-7B-v1.3 is a seven-billion-parameter large language model (LLM). It is a Mistral-7B-v0.1 model fine-tuned on Orca-style datasets, and it is proficient at following instructions and conducting in-depth discussions. SynthIA-7B-v1.3 is completely unrestricted and can be put to many different uses, such as:

Generating text, translating between languages, producing original content, and providing insightful answers to questions.

Carefully carrying out requests and following instructions.

Answering questions thoroughly and accurately, whether they are simple or complex, common or out of the ordinary.

Generating creative text formats such as poetry, code, screenplays, music, letters, and emails.

SynthIA-7B-v1.3 is a robust and flexible LLM with many potential uses. Some instances are as follows:

Creative work: Articles, blogs, stories, and poems are just some of the written works that can be generated with SynthIA-7B-v1.3. It can also be used for creative writing and language translation.

Research: SynthIA-7B-v1.3 can facilitate researchers' studies. It can be used, for instance, for hypothesis development, paper summarization, and report writing.

Education: SynthIA-7B-v1.3 can be utilized as a teaching tool in the classroom. Its potential educational applications include the creation of tailor-made course materials, the resolution of student inquiries, and the evaluation of student work.

Commercial: SynthIA-7B-v1.3 can be utilized to improve corporate processes. Its potential applications include product and service ideation, customer-support response writing, and marketing.

The SynthIA-7B-v1.3 User’s Guide

SynthIA-7B-v1.3 is available through Hugging Face Transformers. After the model has been loaded, you can interact with it by giving it questions and instructions, and you can prompt it to perform various tasks, including creating poetry, translating text, and drafting reports.
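A minimal sketch of that workflow with the transformers library is shown below. It assumes a GPU with enough memory for the 7B weights (and the accelerate package for device_map="auto"); the SYSTEM/USER/ASSISTANT prompt layout follows the style shown on the model card and is treated here as an assumption.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "migtissera/SynthIA-7B-v1.3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit the 7B weights on a single GPU
    device_map="auto",
)

# Prompt layout assumed from the model card's SYSTEM / USER / ASSISTANT style.
prompt = (
    "SYSTEM: You are a helpful assistant that answers thoroughly and accurately.\n"
    "USER: Write a short poem about autumn in the style of a haiku.\n"
    "ASSISTANT:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
)
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```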

Key Features

With 7 billion parameters, SynthIA-7B-v1.3 is a powerful and broadly capable LLM.

It is unfiltered. Thus, it can generate writing on any subject, including controversial or sensitive ones.

It is ideally suited for writing, researching, teaching, and business-related interactions because of its emphasis on long-form conversation and instruction-following.

How to get the most out of SynthIA-7B-v1.3

Some suggestions for getting the most out of SynthIA-7B-v1.3:

Your directions and prompts should be as detailed as possible. The model's ability to understand your needs and produce the expected results improves with the level of detail you provide.

Give the model samples of what you want it to do. For instance, if you want it to generate poems in a certain style, include examples of poetry written in that style in your prompt.

Break complicated tasks into simpler ones. This improves the model's ability to complete the job.

It may take some practice to become proficient with SynthIA-7B-v1.3, but with experience you can use it to produce professional-grade writing and accomplish many goals.

Please visit this link for further information: https://huggingface.co/migtissera/SynthIA-7B-v1.3 

In conclusion, SynthIA-7B-v1.3 is a robust and flexible LLM with many potential uses. Although it is still under active development, it can already handle a wide variety of tasks and continues to improve. SynthIA-7B-v1.3 is an excellent choice if you need such a model.


Researchers from ITU Denmark Introduce Neural Developmental Programs: Bridging the Gap Between Biological Growth and Artificial Neural Networks

The human brain is an extraordinarily complex organ, often considered one of the most intricate and sophisticated systems in the known universe. The brain is hierarchically organized, with lower-level sensory processing areas sending information to higher-level cognitive and decision-making regions. This hierarchy allows for the integration of knowledge and complex behaviors. The brain processes information in parallel, with different regions and networks simultaneously working on various aspects of perception, cognition, and motor control. This parallel processing contributes to its efficiency and adaptability.

Can this hierarchical organization and parallel processing be adapted to deep learning? Artificial neural networks already draw on these principles, and researchers at the IT University of Copenhagen take the idea a step further: they present a graph-based encoding in which the growth of a policy network is controlled by another network running in each neuron. They call it a Neural Developmental Program (NDP).

Some biological processes map a compact genotype to a much larger phenotype. Inspired by this, researchers have built indirect encoding methods, in which the description of the solution is compressed. This allows information to be reused, so the final solution can contain more components than the description itself. However, such encodings, particularly those in the developmental family, must be grown: a process is needed to unfold the compact description into the final network.

The NDP architecture comprises a Multilayer Perceptron (MLP) and a Graph Neural Cellular Automaton (GNCA), which updates the node embeddings after each message-passing step during the developmental phase. In general, cellular automata are mathematical models consisting of a grid of cells, each in one of several states. These automata evolve over discrete time steps according to rules that determine how the cell states change over time.

In the NDP, the same model is applied to every node, so the number of parameters is constant with respect to the size of the graph on which it operates. This gives the NDP an advantage: it can operate on a neural network of arbitrary size or architecture. The NDP can also be trained with any black-box optimization algorithm to satisfy any objective function, allowing it to grow neural networks that solve reinforcement learning and classification tasks and exhibit particular topological properties.
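The following is a schematic PyTorch sketch of that core property under simplifying assumptions: a single shared MLP updates every node embedding from its own state and the average of its neighbors' states, so the parameter count is independent of graph size. The growth decisions (when to add nodes and edges) that make this a full developmental program are omitted.

```python
import torch
import torch.nn as nn

class SharedNodeUpdate(nn.Module):
    """One MLP shared by every node: its parameter count does not depend
    on the number of nodes in the (growing) graph."""

    def __init__(self, dim=8):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim, 32), nn.Tanh(), nn.Linear(32, dim))

    def forward(self, embeddings, adjacency):
        # One message-passing step: average the neighbors' embeddings...
        degree = adjacency.sum(dim=1, keepdim=True).clamp(min=1)
        messages = (adjacency @ embeddings) / degree
        # ...then update each node from (own state, aggregated message).
        return embeddings + self.mlp(torch.cat([embeddings, messages], dim=-1))

# Toy graph: 5 nodes arranged in a ring, 8-dimensional node embeddings.
num_nodes, dim = 5, 8
adjacency = torch.zeros(num_nodes, num_nodes)
for i in range(num_nodes):
    adjacency[i, (i + 1) % num_nodes] = 1.0
    adjacency[i, (i - 1) % num_nodes] = 1.0

update = SharedNodeUpdate(dim)
embeddings = torch.randn(num_nodes, dim)

# Developmental phase: repeatedly apply the same local rule.
for step in range(10):
    embeddings = update(embeddings, adjacency)

print(embeddings.shape)  # torch.Size([5, 8]); the same rule works for any graph size
```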

The researchers also evaluated the differentiable NDP by training and testing models with different numbers of growth steps. They observed that for most tasks, the network's performance decreased after a certain number of growth steps, presumably because the grown networks kept getting larger. An automated method for deciding when to stop growing would therefore be required, and the authors note that such automation would be an important addition to the NDP. In the future, they also want to include activity-dependent and reward-modulated growth and adaptation techniques in the NDP.


Meta AI Researchers Propose Advanced Long-Context LLMs: A Deep Dive into Upsampling, Training Techniques, and Surpassing GPT-3.5-Turbo-16k’s Performance

The emergence of Large Language Models (LLMs) in natural language processing represents a groundbreaking development. These models, trained on vast amounts of data and leveraging immense computational resources, promise to transform human interactions with the digital world. As they evolve through scaling and rapid deployment, their potential use cases become increasingly intricate and complex. They extend their capabilities to tasks such as analyzing dense, knowledge-rich documents, enhancing chatbot experiences to make them more genuine and engaging, and assisting human users in iterative creative processes like coding and design.

One crucial feature that empowers this evolution is the capacity to effectively process long-context inputs. This means that LLMs should be able to understand and generate text based on substantial amounts of preceding context, which is particularly important for tasks involving lengthy documents, multi-turn conversations, or complex problem-solving.

However, until now, LLMs with robust long-context capabilities have primarily been available through proprietary LLM APIs, leaving a gap in accessible solutions for researchers and developers. Open-source long-context models, while valuable, have often fallen short in their evaluations. Typically, they focus on language modeling loss and synthetic tasks, which, while informative, do not comprehensively showcase their effectiveness in diverse, real-world scenarios. Furthermore, many of these models overlook the need to maintain strong performance on standard short-context tasks, bypassing these evaluations or reporting subpar results.

In response to these challenges, new Meta research presents an approach to constructing long-context LLMs that outshine all existing open-source models. This methodology revolves around continual pretraining from LLAMA 2 checkpoints and utilizes an additional 400 billion tokens to form extensive training sequences. These sequences are designed to capture the essence of long-context understanding. The work offers a range of model variants, including smaller 7B/13B models trained with 32,768-token sequences and larger 34B/70B models trained with 16,384-token sequences.
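As a rough illustration of how such long training sequences can be formed, the sketch below packs tokenized documents into fixed-length chunks. This is a generic packing scheme, not necessarily the exact data pipeline used by the authors.

```python
from typing import Iterable, List

def pack_into_sequences(tokenized_docs: Iterable[List[int]],
                        seq_len: int = 32768,
                        eos_id: int = 2) -> List[List[int]]:
    """Concatenate tokenized documents, separated by an EOS token, and slice
    the stream into fixed-length training sequences (e.g., 32,768 tokens for
    the 7B/13B variants or 16,384 for the 34B/70B variants described above)."""
    buffer: List[int] = []
    sequences: List[List[int]] = []
    for doc in tokenized_docs:
        buffer.extend(doc)
        buffer.append(eos_id)  # mark the document boundary
        while len(buffer) >= seq_len:
            sequences.append(buffer[:seq_len])
            buffer = buffer[seq_len:]
    return sequences  # any leftover tokens in `buffer` are simply dropped

# Toy usage with short fake "documents" and a tiny sequence length.
docs = [[5, 6, 7, 8], [9, 10, 11], [12, 13, 14, 15, 16]]
print(pack_into_sequences(docs, seq_len=6))
```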

What sets this approach apart is the thoroughness of their evaluation process. Unlike previous studies, the team assesses the model’s performance across multiple dimensions. This includes evaluating their language modeling capabilities, performance on synthetic tasks, and, most importantly, their effectiveness in a wide range of real-world benchmarks. They cover long and short-context tasks to provide a holistic view of the models’ capabilities.

The findings show that the scaling behavior demonstrates the models’ ability to consistently benefit from more extensive contexts and highlights context length as another crucial axis of scaling for LLMs.

Compared to LLAMA 2 on research benchmarks, this method observes significant improvements in long-context tasks and modest enhancements in standard short-context tasks. These improvements are particularly notable in coding, mathematical problem-solving, and knowledge-related tasks. Moreover, the team explores a simple and cost-effective procedure for instruction fine-tuning of the continually pretrained long models, achieved without human-annotated data. The outcome is a chat model that surpasses the performance of gpt-3.5-turbo-16k on a series of long-context benchmarks.

Overall, the approach represents a significant step towards bridging the gap between proprietary and open-source long-context LLMs. It offers models with superior performance, extensive evaluation across various dimensions, and a deeper understanding of the factors that influence their capabilities. Ultimately, the team hopes to empower researchers and developers to harness the potential of long-context LLMs for a wide array of applications, ushering in a new era of natural language processing.
