Perform batch transforms with Amazon SageMaker Jumpstart Text2Text Gen …

Today, we are excited to announce that you can now perform batch transforms with Amazon SageMaker JumpStart large language models (LLMs) for Text2Text Generation. Batch transforms are useful in situations where responses don't need to be real time, so you can run inference in bulk over large datasets. A batch transform job takes a dataset as batch input along with a pre-trained model and outputs predictions for each data point in the dataset. Batch transform is cost-effective because, unlike real-time hosted endpoints that have persistent hardware, batch transform clusters are torn down when the job is complete, so the hardware is only used for the duration of the batch job.
In some use cases, real-time inference requests can be grouped in small batches for batch processing to create real-time or near-real-time responses. For example, if you need to process a continuous stream of data with low latency and high throughput, invoking a real-time endpoint for each request separately would require more resources and can take longer to process all the requests because the processing is being done serially. A better approach is to group some of the requests and call the real-time endpoint in batch inference mode, which processes your requests in one forward pass of the model and returns the bulk response in real time or near-real time. The latency of the response depends on how many requests you group together and on the instance memory size, so you can tune the batch size to your business requirements for latency and throughput. We call this real-time batch inference because it combines the concept of batching while still providing real-time responses. With real-time batch inference, you can achieve a balance between low latency and high throughput, enabling you to process large volumes of data in a timely and efficient manner.
JumpStart batch transform for Text2Text Generation models lets you pass batch hyperparameters through environment variables, which further increases throughput and minimizes latency.
JumpStart provides pretrained, open-source models for a wide range of problem types to help you get started with machine learning (ML). You can incrementally train and tune these models before deployment. JumpStart also provides solution templates that set up infrastructure for common use cases, and executable example notebooks for ML with Amazon SageMaker. You can access the pre-trained models, solution templates, and examples through the JumpStart landing page in Amazon SageMaker Studio. You can also access JumpStart models using the SageMaker Python SDK.
In this post, we demonstrate how to use the state-of-the-art pre-trained text2text FLAN T5 models from Hugging Face for batch transform and real-time batch inference.
Solution overview
The notebook showing batch transform of pre-trained Text2Text FLAN T5 models from Hugging Face is available in the following GitHub repository. This notebook uses data from the Hugging Face cnn_dailymail dataset for a text summarization task using the SageMaker SDK.
The following are the key steps for implementing batch transform and real-time batch inference:

Set up prerequisites.
Select a pre-trained model.
Retrieve artifacts for the model.
Specify batch transform job hyperparameters.
Prepare data for the batch transform.
Run the batch transform job.
Evaluate the summarization using a ROUGE (Recall-Oriented Understudy for Gisting Evaluation) score.
Perform real-time batch inference.

Set up prerequisites
Before you run the notebook, you must complete some initial setup steps. Let’s set up the SageMaker execution role so it has permissions to run AWS services on your behalf:

import boto3
import sagemaker
from sagemaker import Session

sagemaker_session = Session()
aws_role = sagemaker_session.get_caller_identity_arn()
aws_region = boto3.Session().region_name
sess = sagemaker.Session()

Select a pre-trained model
We use the huggingface-text2text-flan-t5-large model as the default model. Optionally, you can retrieve the list of available Text2Text models in JumpStart and choose your preferred model, which makes it straightforward to try different model IDs with the same notebook. For demonstration purposes, we use the huggingface-text2text-flan-t5-large model:

model_id, model_version = (
    "huggingface-text2text-flan-t5-large",
    "*",
)
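
If you want to browse the available Text2Text models programmatically, the SageMaker Python SDK provides a JumpStart helper for this. The following sketch lists candidate model IDs; the filter string is an assumption and may need adjusting for your SDK version:

from sagemaker.jumpstart.notebook_utils import list_jumpstart_models

# List JumpStart model IDs for the text2text task (filter syntax may vary by SDK version)
text2text_models = list_jumpstart_models(filter="task == text2text")
print(text2text_models)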

Retrieve artifacts for the model
With SageMaker, we can perform inference on the pre-trained model, even without fine-tuning it first on a new dataset. We start by retrieving the deploy_image_uri, deploy_source_uri, and model_uri for the pre-trained model:

from sagemaker import image_uris, model_uris
from sagemaker.model import Model
from sagemaker.predictor import Predictor

inference_instance_type = "ml.p3.2xlarge"

# Retrieve the inference Docker container URI. This is the base Hugging Face container image for the default model above.
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,  # automatically inferred from model_id
    image_scope="inference",
    model_id=model_id,
    model_version=model_version,
    instance_type=inference_instance_type,
)

# Retrieve the model URI.
model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="inference"
)

# Create the SageMaker model instance
model = Model(
    image_uri=deploy_image_uri,
    model_data=model_uri,
    role=aws_role,
    predictor_cls=Predictor,
)

Specify batch transform job hyperparameters
You can pass any subset of hyperparameters as environment variables to the batch transform job. You can also pass these hyperparameters in a JSON payload. However, if you're setting environment variables for hyperparameters as the following code shows, the advanced hyperparameters from the individual examples in the JSON Lines payload will not be used. If you want to use hyperparameters from the payload, set the hyper_params_dict parameter to None instead.

# Specify the batch job hyperparameters here. If you want to treat each example's hyperparameters differently, pass hyper_params_dict as None.
hyper_params = {"batch_size": 4, "max_length": 50, "top_k": 50, "top_p": 0.95, "do_sample": True}
hyper_params_dict = {"HYPER_PARAMS": str(hyper_params)}

Prepare data for batch transform
Now we’re ready to load the cnn_dailymail dataset from Hugging Face:

from datasets import load_dataset

cnn_test = load_dataset("cnn_dailymail", "3.0.0", split="test")

We go over each data entry and create the input data in the required format. We create an articles.jsonl file as a test data file containing articles that need to be summarized as input payload. As we create this file, we append the prompt “Briefly summarize this text:” to each test input row. If you want to have different hyperparameters for each test input, you can append those hyperparameters as part of creating the dataset.
We create highlights.jsonl as the ground truth file containing highlights of each article stored in the test file articles.jsonl. We store both test files in an Amazon Simple Storage Service (Amazon S3) bucket. See the following code:

import json
import os

# You can specify a prompt here
prompt = "Briefly summarize this text: "
# Provide the test data and the ground truth file names
test_data_file_name = "articles.jsonl"
test_reference_file_name = "highlights.jsonl"

test_articles = []
test_highlights = []

# Go over each data entry and create the data in the required input format as described above
for id, test_entry in enumerate(cnn_test):
    article = test_entry["article"]
    highlights = test_entry["highlights"]
    # Create a payload like this if you want to have different hyperparameters for each test input
    # payload = {"id": id, "text_inputs": f"{prompt}{article}", "max_length": 100, "temperature": 0.95}
    # Note that if you specify hyperparameters for each payload individually, you may want to ensure that hyper_params_dict is set to None instead
    payload = {"id": id, "text_inputs": f"{prompt}{article}"}
    test_articles.append(payload)
    test_highlights.append({"id": id, "highlights": highlights})

with open(test_data_file_name, "w") as outfile:
    for entry in test_articles:
        outfile.write("%s\n" % json.dumps(entry))

with open(test_reference_file_name, "w") as outfile:
    for entry in test_highlights:
        outfile.write("%s\n" % json.dumps(entry))

# Upload the data (output_bucket and output_prefix refer to the S3 location used for this example)
s3 = boto3.client("s3")
s3.upload_file(test_data_file_name, output_bucket, os.path.join(output_prefix + "/batch_input/articles.jsonl"))

Run the batch transform job
When you start a batch transform job, SageMaker launches the necessary compute resources to process the data, including CPU or GPU instances depending on the selected instance type. During the batch transform job, SageMaker automatically provisions and manages the compute resources required to process the data, including instances, storage, and networking resources. When the batch transform job is complete, the compute resources are automatically cleaned up by SageMaker. This means that the instances and storage used during the job are stopped and removed, freeing up resources and minimizing cost. See the following code:

# Create the batch transformer object
batch_transformer = model.transformer(
    instance_count=1,
    instance_type=inference_instance_type,
    output_path=s3_output_data_path,
    assemble_with="Line",
    accept="text/csv",
    max_payload=1,
    env=hyper_params_dict,
)

# Make the predictions on the input data
batch_transformer.transform(
    s3_input_data_path, content_type="application/jsonlines", split_type="Line"
)

batch_transformer.wait()

The following is one example record from the articles.jsonl test file. Note that each record in this file has an ID that matches the corresponding record in the predict.jsonl output file, which contains the summarized text generated by the Hugging Face Text2Text model. Similarly, the ground truth file has a matching ID for each data record. The matching ID across the test file, ground truth file, and output file allows linking input records with output records for easy interpretation of the results.
The following is the example input record provided for summarization:

{“id”: 0, “text_inputs”: “Briefly summarize this text: (CNN)The Palestinian Authority officially became the 123rd member of the International Criminal Court on Wednesday, a step that gives the court jurisdiction over alleged crimes in Palestinian territories. The formal accession was marked with a ceremony at The Hague, in the Netherlands, where the court is based. The Palestinians signed the ICC’s founding Rome Statute in January, when they also accepted its jurisdiction over alleged crimes committed “in the occupied Palestinian territory, including East Jerusalem, since June 13, 2014.” Later that month, the ICC opened a preliminary examination into the situation in Palestinian territories, paving the way for possible war crimes investigations against Israelis. As members of the court, Palestinians may be subject to counter-charges as well. Israel and the United States, neither of which is an ICC member, opposed the Palestinians’ efforts to join the body. But Palestinian Foreign Minister Riad al-Malki, speaking at Wednesday’s ceremony, said it was a move toward greater justice. “As Palestine formally becomes a State Party to the Rome Statute today, the world is also a step closer to ending a long era of impunity and injustice,” he said, according to an ICC news release. “Indeed, today brings us closer to our shared goals of justice and peace.” Judge Kuniko Ozaki, a vice president of the ICC, said acceding to the treaty was just the first step for the Palestinians. “As the Rome Statute today enters into force for the State of Palestine, Palestine acquires all the rights as well as responsibilities that come with being a State Party to the Statute. These are substantive commitments, which cannot be taken lightly,” she said. Rights group Human Rights Watch welcomed the development. “Governments seeking to penalize Palestine for joining the ICC should immediately end their pressure, and countries that support universal acceptance of the court’s treaty should speak out to welcome its membership,” said Balkees Jarrah, international justice counsel for the group. “What’s objectionable is the attempts to undermine international justice, not Palestine’s decision to join a treaty to which over 100 countries around the world are members.” In January, when the preliminary ICC examination was opened, Israeli Prime Minister Benjamin Netanyahu described it as an outrage, saying the court was overstepping its boundaries. The United States also said it “strongly” disagreed with the court’s decision. “As we have said repeatedly, we do not believe that Palestine is a state and therefore we do not believe that it is eligible to join the ICC,” the State Department said in a statement. It urged the warring sides to resolve their differences through direct negotiations. “We will continue to oppose actions against Israel at the ICC as counterproductive to the cause of peace,” it said. But the ICC begs to differ with the definition of a state for its purposes and refers to the territories as “Palestine.” While a preliminary examination is not a formal investigation, it allows the court to review evidence and determine whether to investigate suspects on both sides. Prosecutor Fatou Bensouda said her office would “conduct its analysis in full independence and impartiality.” The war between Israel and Hamas militants in Gaza last summer left more than 2,000 people dead. The inquiry will include alleged war crimes committed since June. 
The International Criminal Court was set up in 2002 to prosecute genocide, crimes against humanity and war crimes. CNN’s Vasco Cotovio, Kareem Khadder and Faith Karimi contributed to this report.”}

The following is the predicted output with summarization:

{‘id’: 0, ‘generated_texts’: [‘The Palestinian Authority officially became a member of the International Criminal Court on Wednesday, a step that gives the court jurisdiction over alleged crimes in Palestinian territories.’]}

The following is the ground truth summarization for model evaluation purposes:

{“id”: 0, “highlights”: “Membership gives the ICC jurisdiction over alleged crimes committed in Palestinian territories since last June .nIsrael and the United States opposed the move, which could open the door to war crimes investigations against Israelis .”}

Next, we use the ground truth and predicted outputs for model evaluation.
Evaluate the model using a ROUGE score
ROUGE, or Recall-Oriented Understudy for Gisting Evaluation, is a set of metrics and a software package used for evaluating automatic summarization and machine translation in natural language processing. The metrics compare an automatically produced summary or translation against a reference (human-produced) summary or translation or a set of references.
In the following code, we combine the predicted and original summaries by joining them on the common key id and use this to compute the ROUGE score:

import ast
import evaluate
import pandas as pd

# Download the predictions
s3.download_file(
    output_bucket, output_prefix + "/batch_output/" + "articles.jsonl.out", "predict.jsonl"
)

with open("predict.jsonl", "r") as json_file:
    json_list = list(json_file)

# Create the prediction list for the DataFrame
predict_dict_list = []
for predict in json_list:
    if len(predict) > 1:
        predict_dict = ast.literal_eval(predict)
        predict_dict_req = {"id": predict_dict["id"], "prediction": predict_dict["generated_texts"][0]}
        predict_dict_list.append(predict_dict_req)

# Create the predictions DataFrame
predict_df = pd.DataFrame(predict_dict_list)

test_highlights_df = pd.DataFrame(test_highlights)

# Combine the predictions DataFrame with the original summaries on id to compute the ROUGE score
df_merge = test_highlights_df.merge(predict_df, on="id", how="left")

rouge = evaluate.load("rouge")
results = rouge.compute(predictions=list(df_merge["prediction"]), references=list(df_merge["highlights"]))
print(results)

{'rouge1': 0.32749078992945646, 'rouge2': 0.126038645005132, 'rougeL': 0.22764277967933363, 'rougeLsum': 0.28162915746368966}

Perform real-time batch inference
Next, we show you how to run real-time batch inference on the endpoint by providing the inputs as a list. We use the same model ID and dataset as earlier, except we take a few records from the test dataset and use them to invoke a real-time endpoint.
The following code shows how to create and deploy a real-time endpoint for real-time batch inference:

from sagemaker.utils import name_from_base

endpoint_name = name_from_base(f"jumpstart-example-{model_id}")

# Deploy the model. Note that we need to pass the Predictor class when deploying through the Model class
# so that we can run inference through the SageMaker API.
model_predictor = model.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    predictor_cls=Predictor,
    endpoint_name=endpoint_name,
)

Next, we prepare our input payload. We use the data that we prepared earlier, extract the first 10 test inputs, and append the hyperparameters that we want to use to the text inputs. We provide this payload in a real-time invoke_endpoint call. The response payload is then returned as a list of responses. See the following code:

# Provide all the text inputs to the model as a list
text_inputs = [entry["text_inputs"] for entry in test_articles[0:10]]

# The information about the different parameters is provided above
payload = {
    "text_inputs": text_inputs,
    "max_length": 50,
    "num_return_sequences": 1,
    "top_k": 50,
    "top_p": 0.95,
    "do_sample": True,
    "batch_size": 4,
}


def query_endpoint_with_json_payload(encoded_json, endpoint_name):
    client = boto3.client("runtime.sagemaker")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name, ContentType="application/json", Body=encoded_json
    )
    return response


query_response = query_endpoint_with_json_payload(
    json.dumps(payload).encode("utf-8"), endpoint_name=endpoint_name
)


def parse_response_multiple_texts(query_response):
    model_predictions = json.loads(query_response["Body"].read())
    return model_predictions


generated_text_list = parse_response_multiple_texts(query_response)
print(*generated_text_list, sep="\n")

Clean up
After you have tested the endpoint, make sure you delete the SageMaker inference endpoint and the model to avoid incurring charges.
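
If you're working in the accompanying notebook, a minimal cleanup sketch using the predictor created earlier looks like the following:

# Delete the SageMaker model and endpoint to stop incurring charges
model_predictor.delete_model()
model_predictor.delete_endpoint()
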
Conclusion
In this notebook, we performed a batch transform to showcase the Hugging Face Text2Text Generator model for summarization tasks. Batch transform is advantageous in obtaining inferences from large datasets without requiring a persistent endpoint. We linked input records with inferences to aid in result interpretation. We used the ROUGE score to compare the test data summarization with the model-generated summarization.
Additionally, we demonstrated real-time batch inference, where you can send a small batch of data to a real-time endpoint to achieve a balance between latency and throughput for scenarios like streaming input data. Real-time batch inference helps increase throughput for real-time requests.
Try out the batch transform with Text2Text Generation models in SageMaker today and let us know your feedback!

About the authors
Hemant Singh is a Machine Learning Engineer with experience in Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms. He got his masters from Courant Institute of Mathematical Sciences and B.Tech from IIT Delhi. He has experience in working on a diverse range of machine learning problems within the domain of natural language processing, computer vision, and time series analysis.
Rachna Chadha is a Principal Solutions Architect AI/ML in Strategic Accounts at AWS. Rachna is an optimist who believes that the ethical and responsible use of AI can improve society in future and bring economic and social prosperity. In her spare time, Rachna likes spending time with her family, hiking, and listening to music.
Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker built-in algorithms and helps develop machine learning algorithms. He got his PhD from University of Illinois Urbana-Champaign. He is an active researcher in machine learning and statistical inference, and has published many papers in NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.

Deploy generative AI models from Amazon SageMaker JumpStart using the …

The seeds of a machine learning (ML) paradigm shift have existed for decades, but with the ready availability of virtually infinite compute capacity, a massive proliferation of data, and the rapid advancement of ML technologies, customers across industries are rapidly adopting and using ML technologies to transform their businesses.
Just recently, generative AI applications have captured everyone’s attention and imagination. We are truly at an exciting inflection point in the widespread adoption of ML, and we believe every customer experience and application will be reinvented with generative AI.
Generative AI is a type of AI that can create new content and ideas, including conversations, stories, images, videos, and music. Like all AI, generative AI is powered by ML models—very large models that are pre-trained on vast corpora of data and commonly referred to as foundation models (FMs).
The size and general-purpose nature of FMs make them different from traditional ML models, which typically perform specific tasks, like analyzing text for sentiment, classifying images, and forecasting trends.

With traditional ML models, in order to achieve each specific task, you need to gather labeled data, train a model, and deploy that model. With foundation models, instead of gathering labeled data for each model and training multiple models, you can adapt the same pre-trained FM to various tasks. You can also customize FMs to perform domain-specific functions that are differentiating to your businesses, using only a small fraction of the data and compute required to train a model from scratch.
Generative AI has the potential to disrupt many industries by revolutionizing the way content is created and consumed. Original content production, code generation, customer service enhancement, and document summarization are typical use cases of generative AI.
Amazon SageMaker JumpStart provides pre-trained, open-source models for a wide range of problem types to help you get started with ML. You can incrementally train and tune these models before deployment. JumpStart also provides solution templates that set up infrastructure for common use cases, and executable example notebooks for ML with Amazon SageMaker.
With over 600 pre-trained models available and growing every day, JumpStart enables developers to quickly and easily incorporate cutting-edge ML techniques into their production workflows. You can access the pre-trained models, solution templates, and examples through the JumpStart landing page in Amazon SageMaker Studio. You can also access JumpStart models using the SageMaker Python SDK. For information about how to use JumpStart models programmatically, see Use SageMaker JumpStart Algorithms with Pretrained Models.
In April 2023, AWS unveiled Amazon Bedrock, which provides a way to build generative AI-powered apps via pre-trained models from startups including AI21 Labs, Anthropic, and Stability AI. Amazon Bedrock also offers access to Titan foundation models, a family of models trained in-house by AWS. With the serverless experience of Amazon Bedrock, you can easily find the right model for your needs, get started quickly, privately customize FMs with your own data, and easily integrate and deploy them into your applications using the AWS tools and capabilities you’re familiar with (including integrations with SageMaker ML features like Amazon SageMaker Experiments to test different models and Amazon SageMaker Pipelines to manage your FMs at scale) without having to manage any infrastructure.
In this post, we show how to deploy image and text generative AI models from JumpStart using the AWS Cloud Development Kit (AWS CDK). The AWS CDK is an open-source software development framework to define your cloud application resources using familiar programming languages like Python.
We use the Stable Diffusion model for image generation and the FLAN-T5-XL model for natural language understanding (NLU) and text generation from Hugging Face in JumpStart.
Solution overview
The web application is built on Streamlit, an open-source Python library that makes it easy to create and share beautiful, custom web apps for ML and data science. We host the web application using Amazon Elastic Container Service (Amazon ECS) with AWS Fargate and it is accessed via an Application Load Balancer. Fargate is a technology that you can use with Amazon ECS to run containers without having to manage servers or clusters or virtual machines. The generative AI model endpoints are launched from JumpStart images in Amazon Elastic Container Registry (Amazon ECR). Model data is stored on Amazon Simple Storage Service (Amazon S3) in the JumpStart account. The web application interacts with the models via Amazon API Gateway and AWS Lambda functions as shown in the following diagram.

API Gateway provides the web application and other clients a standard RESTful interface, while shielding the Lambda functions that interface with the model. This simplifies the client application code that consumes the models. The API Gateway endpoints are publicly accessible in this example, allowing for the possibility to extend this architecture to implement different API access controls and integrate with other applications.
In this post, we walk you through the following steps:

Install the AWS Command Line Interface (AWS CLI) and AWS CDK v2 on your local machine.
Clone and set up the AWS CDK application.
Deploy the AWS CDK application.
Use the image generation AI model.
Use the text generation AI model.
View the deployed resources on the AWS Management Console.

We provide an overview of the code in this project in the appendix at the end of this post.
Prerequisites
You must have the following prerequisites:

An AWS account
The AWS CLI v2
Python 3.6 or later
node.js 14.x or later
The AWS CDK v2
Docker v20.10 or later

You can deploy the infrastructure in this tutorial from your local computer or you can use AWS Cloud9 as your deployment workstation. AWS Cloud9 comes preloaded with the AWS CLI, AWS CDK, and Docker. If you opt for AWS Cloud9, create the environment from the AWS Management Console.
The estimated cost to complete this post is $50, assuming you leave the resources running for 8 hours. Make sure you delete the resources you create in this post to avoid ongoing charges.
Install the AWS CLI and AWS CDK on your local machine
If you don’t already have the AWS CLI on your local machine, refer to Installing or updating the latest version of the AWS CLI and Configuring the AWS CLI.
Install the AWS CDK Toolkit globally using the following node package manager command:

$ npm install -g aws-cdk

Run the following command to verify the correct installation and print the version number of the AWS CDK:

$ cdk --version

Make sure you have Docker installed on your local machine. Issue the following command to verify the version:

$ docker --version

Clone and set up the AWS CDK application
On your local machine, clone the AWS CDK application with the following command:

$ git clone https://github.com/aws-samples/generative-ai-sagemaker-cdk-demo.git

Navigate to the project folder:

$ cd generative-ai-sagemaker-cdk-demo

Before we deploy the application, let’s review the directory structure:

.
├── LICENSE
├── README.md
├── app.py
├── cdk.json
├── code
│ ├── lambda_txt2img
│ │ └── txt2img.py
│ └── lambda_txt2nlu
│ └── txt2nlu.py
├── construct
│ └── sagemaker_endpoint_construct.py
├── images
│ ├── architecture.png
│ ├── …
├── requirements-dev.txt
├── requirements.txt
├── source.bat
├── stack
│ ├── __init__.py
│ ├── generative_ai_demo_web_stack.py
│ ├── generative_ai_txt2img_sagemaker_stack.py
│ ├── generative_ai_txt2nlu_sagemaker_stack.py
│ └── generative_ai_vpc_network_stack.py
├── tests
│ ├── __init__.py
│ └── …
└── web-app
├── Dockerfile
├── Home.py
├── configs.py
├── img
│ └── sagemaker.png
├── pages
│ ├── 2_Image_Generation.py
│ └── 3_Text_Generation.py
└── requirements.txt

The stack folder contains the code for each stack in the AWS CDK application. The code folder contains the code for the Lambda functions. The repository also contains the web application located under the folder web-app.
The cdk.json file tells the AWS CDK Toolkit how to run your application.
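For reference, a minimal cdk.json for a Python AWS CDK app looks like the following sketch; the file in this repository may define additional context values:

{
  "app": "python3 app.py"
}
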
This application was tested in the us-east-1 Region, but it should work in any Region that has the required services and inference instance type ml.g4dn.4xlarge specified in app.py.
Set up a virtual environment
This project is set up like a standard Python project. Create a Python virtual environment using the following code:

$ python3 -m venv .venv

Use the following command to activate the virtual environment:

$ source .venv/bin/activate

If you’re on a Windows platform, activate the virtual environment as follows:

% .venv\Scripts\activate.bat

After the virtual environment is activated, upgrade pip to the latest version:

$ python3 -m pip install --upgrade pip

Install the required dependencies:

$ pip install -r requirements.txt

Before you deploy any AWS CDK application, you need to bootstrap a space in your account and the Region you’re deploying into. To bootstrap in your default Region, issue the following command:

$ cdk bootstrap

If you want to deploy into a specific account and Region, issue the following command:

$ cdk bootstrap aws://ACCOUNT-NUMBER/REGION

For more information about this setup, visit Getting started with the AWS CDK.
AWS CDK application stack structure
The AWS CDK application contains multiple stacks, as shown in the following diagram.

You can list the stacks in your AWS CDK application with the following command:

$ cdk list

GenerativeAiTxt2imgSagemakerStack
GenerativeAiTxt2nluSagemakerStack
GenerativeAiVpcNetworkStack
GenerativeAiDemoWebStack

The following are other useful AWS CDK commands:

cdk ls – Lists all stacks in the app
cdk synth – Emits the synthesized AWS CloudFormation template
cdk deploy – Deploys this stack to your default AWS account and Region
cdk diff – Compares the deployed stack with current state
cdk docs – Opens the AWS CDK documentation

The next section shows you how to deploy the AWS CDK application.
Deploy the AWS CDK application
The AWS CDK application will be deployed to the default Region based on your workstation configuration. If you want to force the deployment in a specific Region, set your AWS_DEFAULT_REGION environment variable accordingly.
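For example, to force deployment into the us-east-1 Region, you can export the variable before running any cdk commands:

$ export AWS_DEFAULT_REGION=us-east-1
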
At this point, you can deploy the AWS CDK application. First you launch the VPC network stack:

$ cdk deploy GenerativeAiVpcNetworkStack

If you are prompted, enter y to proceed with the deployment. You should see a list of AWS resources that are being provisioned in the stack. This step takes around 3 minutes to complete.
Then you launch the web application stack:

$ cdk deploy GenerativeAiDemoWebStack

After analyzing the stack, the AWS CDK will display the resource list in the stack. Enter y to proceed with the deployment. This step takes around 5 minutes.

Note down the WebApplicationServiceURL from the output to use later. You can also retrieve it on the AWS CloudFormation console, under the GenerativeAiDemoWebStack stack outputs.
Now, launch the image generation AI model endpoint stack:

$ cdk deploy GenerativeAiTxt2imgSagemakerStack

This step takes around 8 minutes. After the image generation model endpoint is deployed, you can use it.
Use the image generation AI model
The first example demonstrates how to utilize Stable Diffusion, a powerful generative modeling technique that enables the creation of high-quality images from text prompts.

Access the web application using the WebApplicationServiceURL from the output of GenerativeAiDemoWebStack in your browser.
In the navigation pane, choose Image Generation.
The SageMaker Endpoint Name and API GW Url fields will be pre-populated, but you can change the prompt for the image description if you’d like.
Choose Generate image.
The application makes a call to the SageMaker endpoint, which takes a few seconds. A picture matching your image description is then displayed.

Use the text generation AI model
The second example centers around using the FLAN-T5-XL model, which is a foundation or large language model (LLM), to achieve in-context learning for text generation while also addressing a broad range of natural language understanding (NLU) and natural language generation (NLG) tasks.
Some environments might limit the number of endpoints you can launch at a time. If this is the case, you can launch one SageMaker endpoint at a time. To stop a SageMaker endpoint in the AWS CDK app, you have to destroy the deployed endpoint stack before launching the other endpoint stack. To tear down the image generation AI model endpoint, issue the following command:

$ cdk destroy GenerativeAiTxt2imgSagemakerStack

Then launch the text generation AI model endpoint stack:

$ cdk deploy GenerativeAiTxt2nluSagemakerStack

Enter y at the prompts.
After the text generation model endpoint stack is launched, complete the following steps:

Go back to the web application and choose Text Generation in the navigation pane.
The Input Context field is pre-populated with a conversation between a customer and an agent regarding an issue with the customer's phone, but you can enter your own context if you'd like.
Below the context, you will find some pre-populated queries on the drop-down menu. Choose a query and choose Generate Response.
You can also enter your own query in the Input Query field and then choose Generate Response.

View the deployed resources on the console
On the AWS CloudFormation console, choose Stacks in the navigation pane to view the stacks deployed.

On the Amazon ECS console, you can see the clusters on the Clusters page.

On the AWS Lambda console, you can see the functions on the Functions page.

On the API Gateway console, you can see the API Gateway endpoints on the APIs page.

On the SageMaker console, you can see the deployed model endpoints on the Endpoints page.

When the stacks are launched, some parameters are generated. These are stored in the AWS Systems Manager Parameter Store. To view them, choose Parameter Store in the navigation pane on the AWS Systems Manager console.
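
You can also read these parameters from the command line with the AWS CLI; for example, the endpoint name stored by the image generation stack (parameter name taken from the appendix code) can be retrieved as follows:

$ aws ssm get-parameter --name txt2img_sm_endpoint --query Parameter.Value --output text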

Clean up
To avoid unnecessary cost, clean up all the infrastructure created with the following command on your workstation:

$ cdk destroy --all

Enter y at the prompt. This step takes around 10 minutes. Check if all resources are deleted on the console. Also delete the assets S3 buckets created by the AWS CDK on the Amazon S3 console as well as the assets repositories on Amazon ECR.
Conclusion
As demonstrated in this post, you can use the AWS CDK to deploy generative AI models in JumpStart. We showed an image generation example and a text generation example using a user interface powered by Streamlit, Lambda, and API Gateway.
You can now build your generative AI projects using pre-trained AI models in JumpStart. You can also extend this project to fine-tune the foundation models for your use case and control access to API Gateway endpoints.
We invite you to test the solution and contribute to the project on GitHub. Share your thoughts on this tutorial in the comments!
License summary
This sample code is made available under a modified MIT license. See the LICENSE file for more information. Also, review the respective licenses for the stable diffusion and flan-t5-xl models on Hugging Face.

About the authors
Hantzley Tauckoor is an APJ Partner Solutions Architecture Leader based in Singapore. He has 20 years’ experience in the ICT industry spanning multiple functional areas, including solutions architecture, business development, sales strategy, consulting, and leadership. He leads a team of Senior Solutions Architects that enable partners to develop joint solutions, build technical capabilities, and steer them through the implementation phase as customers migrate and modernize their applications to AWS.
Kwonyul Choi is a CTO at BABITALK, a Korean beauty care platform startup, based in Seoul. Prior to this role, Kwonyul worked as a Software Development Engineer at AWS with a focus on AWS CDK and Amazon SageMaker.
Arunprasath Shankar is a Senior AI/ML Specialist Solutions Architect with AWS, helping global customers scale their AI solutions effectively and efficiently in the cloud. In his spare time, Arun enjoys watching sci-fi movies and listening to classical music.
Satish Upreti is a Migration Lead PSA and Security SME in the partner organization in APJ. Satish has 20 years of experience spanning on-premises private cloud and public cloud technologies. Since joining AWS in August 2020 as a migration specialist, he provides extensive technical advice and support to AWS partners to plan and implement complex migrations.

Appendix: Code walkthrough
In this section, we provide an overview of the code in this project.
AWS CDK application
The main AWS CDK application is contained in the app.py file in the root directory. The project consists of multiple stacks, so we have to import the stacks:

#!/usr/bin/env python3
import aws_cdk as cdk

from stack.generative_ai_vpc_network_stack import GenerativeAiVpcNetworkStack
from stack.generative_ai_demo_web_stack import GenerativeAiDemoWebStack
from stack.generative_ai_txt2nlu_sagemaker_stack import GenerativeAiTxt2nluSagemakerStack
from stack.generative_ai_txt2img_sagemaker_stack import GenerativeAiTxt2imgSagemakerStack

We define our generative AI models and get the related URIs from SageMaker:

from script.sagemaker_uri import *
import boto3

region_name = boto3.Session().region_name
env = {"region": region_name}

#Text to Image model parameters
TXT2IMG_MODEL_ID = “model-txt2img-stabilityai-stable-diffusion-v2-1-base”
TXT2IMG_INFERENCE_INSTANCE_TYPE = “ml.g4dn.4xlarge”
TXT2IMG_MODEL_TASK_TYPE = “txt2img”
TXT2IMG_MODEL_INFO = get_sagemaker_uris(model_id=TXT2IMG_MODEL_ID,
model_task_type=TXT2IMG_MODEL_TASK_TYPE,
instance_type=TXT2IMG_INFERENCE_INSTANCE_TYPE,
region_name=region_name)

#Text to NLU image model parameters
TXT2NLU_MODEL_ID = “huggingface-text2text-flan-t5-xl”
TXT2NLU_INFERENCE_INSTANCE_TYPE = “ml.g4dn.4xlarge”
TXT2NLU_MODEL_TASK_TYPE = “text2text”
TXT2NLU_MODEL_INFO = get_sagemaker_uris(model_id=TXT2NLU_MODEL_ID,
model_task_type=TXT2NLU_MODEL_TASK_TYPE,
instance_type=TXT2NLU_INFERENCE_INSTANCE_TYPE,
region_name=region_name)

The function get_sagemaker_uris retrieves all the model information from JumpStart. See script/sagemaker_uri.py.
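The helper in script/sagemaker_uri.py is not reproduced here. The following is an illustrative sketch, under the assumption that it wraps the SageMaker Python SDK's image_uris and model_uris helpers; the repository's actual implementation and return fields may differ:

from sagemaker import image_uris, model_uris


def get_sagemaker_uris(model_id, model_task_type, instance_type, region_name):
    """Illustrative sketch: collect JumpStart model deployment info into a dict."""
    model_version = "*"

    # Container image for hosting the model
    image_uri = image_uris.retrieve(
        region=region_name,
        framework=None,
        image_scope="inference",
        model_id=model_id,
        model_version=model_version,
        instance_type=instance_type,
    )

    # S3 URI of the pre-trained model artifacts, split into bucket and key
    model_uri = model_uris.retrieve(
        model_id=model_id, model_version=model_version, model_scope="inference"
    )
    bucket, _, key = model_uri.replace("s3://", "").partition("/")

    return {
        "model_task_type": model_task_type,
        "model_bucket_name": bucket,
        "model_bucket_key": key,
        "model_docker_image": image_uri,
        "instance_type": instance_type,
        "region_name": region_name,
    }
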
Then, we instantiate the stacks:

app = cdk.App()

network_stack = GenerativeAiVpcNetworkStack(app, "GenerativeAiVpcNetworkStack", env=env)
GenerativeAiDemoWebStack(app, "GenerativeAiDemoWebStack", vpc=network_stack.vpc, env=env)

GenerativeAiTxt2nluSagemakerStack(app, "GenerativeAiTxt2nluSagemakerStack", env=env, model_info=TXT2NLU_MODEL_INFO)
GenerativeAiTxt2imgSagemakerStack(app, "GenerativeAiTxt2imgSagemakerStack", env=env, model_info=TXT2IMG_MODEL_INFO)

app.synth()

The first stack to launch is the VPC stack, GenerativeAiVpcNetworkStack. The web application stack, GenerativeAiDemoWebStack, depends on the VPC stack; the dependency is expressed by passing the parameter vpc=network_stack.vpc.
See app.py for the full code.
VPC network stack
In the GenerativeAiVpcNetworkStack stack, we create a VPC with a public subnet and a private subnet spanning across two Availability Zones:

self.output_vpc = ec2.Vpc(self, "VPC",
    nat_gateways=1,
    ip_addresses=ec2.IpAddresses.cidr("10.0.0.0/16"),
    max_azs=2,
    subnet_configuration=[
        ec2.SubnetConfiguration(name="public", subnet_type=ec2.SubnetType.PUBLIC, cidr_mask=24),
        ec2.SubnetConfiguration(name="private", subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS, cidr_mask=24)
    ]
)

See /stack/generative_ai_vpc_network_stack.py for the full code.
Demo web application stack
In the GenerativeAiDemoWebStack stack, we launch Lambda functions and respective API Gateway endpoints through which the web application interacts with the SageMaker model endpoints. See the following code snippet:

# Defines an AWS Lambda function for the image generation service
lambda_txt2img = _lambda.Function(
    self, "lambda_txt2img",
    runtime=_lambda.Runtime.PYTHON_3_9,
    code=_lambda.Code.from_asset("code/lambda_txt2img"),
    handler="txt2img.lambda_handler",
    role=role,
    timeout=Duration.seconds(180),
    memory_size=512,
    vpc_subnets=ec2.SubnetSelection(
        subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS
    ),
    vpc=vpc
)

# Defines an Amazon API Gateway endpoint for the image generation service
txt2img_apigw_endpoint = apigw.LambdaRestApi(
    self, "txt2img_apigw_endpoint",
    handler=lambda_txt2img
)
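
The Lambda handler code under code/lambda_txt2img is not reproduced in this post. As a simplified, hypothetical sketch of the pattern, such a handler can read the endpoint name from Parameter Store and invoke the SageMaker endpoint; the request and response field names below are assumptions, not the repository's exact code:

import json

import boto3

ssm = boto3.client("ssm")
sm_runtime = boto3.client("sagemaker-runtime")


def lambda_handler(event, context):
    # Look up the SageMaker endpoint name stored by the CDK stack
    endpoint_name = ssm.get_parameter(Name="txt2img_sm_endpoint")["Parameter"]["Value"]

    # The prompt is assumed to be passed in the request body by the web application
    prompt = json.loads(event["body"])["prompt"]

    # Invoke the image generation endpoint and return its response to the client
    response = sm_runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"prompt": prompt}).encode("utf-8"),
    )
    result = json.loads(response["Body"].read())

    return {
        "statusCode": 200,
        "body": json.dumps(result),
    }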

The web application is containerized and hosted on Amazon ECS with Fargate. See the following code snippet:

# Create the Fargate service
fargate_service = ecs_patterns.ApplicationLoadBalancedFargateService(
    self, "WebApplication",
    cluster=cluster,  # Required
    cpu=2048,  # Default is 256 (512 is 0.5 vCPU, 2048 is 2 vCPU)
    desired_count=1,  # Default is 1
    task_image_options=ecs_patterns.ApplicationLoadBalancedTaskImageOptions(
        image=image,
        container_port=8501,
    ),
    #load_balancer_name="gen-ai-demo",
    memory_limit_mib=4096,  # Default is 512
    public_load_balancer=True)  # Default is True

See /stack/generative_ai_demo_web_stack.py for the full code.
Image generation SageMaker model endpoint stack
The GenerativeAiTxt2imgSagemakerStack stack creates the image generation model endpoint from JumpStart and stores the endpoint name in Systems Manager Parameter Store. This parameter will be used by the web application. See the following code:

endpoint = SageMakerEndpointConstruct(self, "TXT2IMG",
    project_prefix = "GenerativeAiDemo",

    role_arn = role.role_arn,

    model_name = "StableDiffusionText2Img",
    model_bucket_name = model_info["model_bucket_name"],
    model_bucket_key = model_info["model_bucket_key"],
    model_docker_image = model_info["model_docker_image"],

    variant_name = "AllTraffic",
    variant_weight = 1,
    instance_count = 1,
    instance_type = model_info["instance_type"],

    environment = {
        "MMS_MAX_RESPONSE_SIZE": "20000000",
        "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
        "SAGEMAKER_PROGRAM": "inference.py",
        "SAGEMAKER_REGION": model_info["region_name"],
        "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
    },

    deploy_enable = True
)

ssm.StringParameter(self, "txt2img_sm_endpoint", parameter_name="txt2img_sm_endpoint", string_value=endpoint.endpoint_name)

See /stack/generative_ai_txt2img_sagemaker_stack.py for the full code.
NLU and text generation SageMaker model endpoint stack
The GenerativeAiTxt2nluSagemakerStack stack creates the NLU and text generation model endpoint from JumpStart and stores the endpoint name in Systems Manager Parameter Store. This parameter will also be used by the web application. See the following code:

endpoint = SageMakerEndpointConstruct(self, "TXT2NLU",
    project_prefix = "GenerativeAiDemo",

    role_arn = role.role_arn,

    model_name = "HuggingfaceText2TextFlan",
    model_bucket_name = model_info["model_bucket_name"],
    model_bucket_key = model_info["model_bucket_key"],
    model_docker_image = model_info["model_docker_image"],

    variant_name = "AllTraffic",
    variant_weight = 1,
    instance_count = 1,
    instance_type = model_info["instance_type"],

    environment = {
        "MODEL_CACHE_ROOT": "/opt/ml/model",
        "SAGEMAKER_ENV": "1",
        "SAGEMAKER_MODEL_SERVER_TIMEOUT": "3600",
        "SAGEMAKER_MODEL_SERVER_WORKERS": "1",
        "SAGEMAKER_PROGRAM": "inference.py",
        "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code/",
        "TS_DEFAULT_WORKERS_PER_MODEL": "1"
    },

    deploy_enable = True
)

ssm.StringParameter(self, "txt2nlu_sm_endpoint", parameter_name="txt2nlu_sm_endpoint", string_value=endpoint.endpoint_name)

See /stack/generative_ai_txt2nlu_sagemaker_stack.py for the full code.
Web application
The web application is located in the /web-app directory. It is a Streamlit application that is containerized as per the Dockerfile:

FROM python:3.9
EXPOSE 8501
WORKDIR /app
COPY requirements.txt ./requirements.txt
RUN pip3 install -r requirements.txt
COPY . .
CMD streamlit run Home.py \
    --server.headless true \
    --browser.serverAddress="0.0.0.0" \
    --server.enableCORS false \
    --browser.gatherUsageStats false
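
If you want to test the container locally before deploying, you can build and run it with commands like the following (the image name is arbitrary):

$ docker build -t genai-demo-web-app .
$ docker run -p 8501:8501 genai-demo-web-app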

To learn more about Streamlit, see Streamlit documentation.
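
The actual Streamlit pages live under web-app/pages. As a rough, hypothetical sketch of the pattern they follow, a page can collect a prompt and call the API Gateway endpoint like this; the URL, field names, and response format are assumptions rather than the repository's exact code:

import json

import requests
import streamlit as st

st.title("Image Generation")

# API Gateway URL for the txt2img Lambda (assumed to be supplied via configuration)
api_url = st.text_input("API GW Url", "https://example.execute-api.us-east-1.amazonaws.com/prod/")
prompt = st.text_input("Image description", "A photo of an astronaut riding a horse")

if st.button("Generate image"):
    # Call the REST API that fronts the image generation Lambda function
    response = requests.post(api_url, data=json.dumps({"prompt": prompt}), timeout=180)
    response.raise_for_status()
    st.write(response.json())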

Meet BLOOMChat: An Open-Source 176-Billion-Parameter Multilingual Chat …

With some great advancements being made in the field of Artificial Intelligence, natural language systems are rapidly progressing. Large Language Models (LLMs) are getting significantly better and more popular with each upgrade and innovation. A new feature or modification is being added nearly daily, enabling LLMs to serve in different applications in almost every domain. LLMs are everywhere, from Machine translation and text summarization to sentiment analysis and question answering.

The open-source community has made remarkable progress in developing chat-based LLMs, but mostly in the English language; less focus has been put on building similar multilingual chat capabilities into an LLM. To address that, SambaNova, a software company that focuses on generative AI solutions, has introduced an open-source, multilingual chat LLM called BLOOMChat. Developed in collaboration with Together, an open, scalable, and decentralized cloud for artificial intelligence, BLOOMChat is a 176-billion-parameter multilingual chat LLM built on top of the BLOOM model.

The BLOOM model has the ability to generate text in 46 natural languages and 13 programming languages. For languages such as Spanish, French, and Arabic, BLOOM represents the first language model ever created with over 100 billion parameters. BLOOM was developed by the BigScience organization, which is an international collaboration of over 1000 researchers. By fine-tuning BLOOM on open conversation and alignment datasets from projects like OpenChatKit, Dolly 2.0, and OASST1, the core capabilities of BLOOM were extended into the chat domain.

To develop the multilingual chat LLM BLOOMChat, SambaNova and Together used SambaNova DataScale systems, which utilize SambaNova's Reconfigurable Dataflow Architecture, for the training process. Synthetic conversation data and human-written samples were combined to create BLOOMChat. A large synthetic dataset called OpenChatKit served as the basis for chat functionality, and higher-quality human-generated datasets like Dolly 2.0 and OASST1 were used to significantly enhance performance. The code and scripts used for instruction-tuning on the OpenChatKit and Dolly-v2 datasets have been made available on SambaNova's GitHub.

In human evaluations conducted across six languages, BLOOMChat responses were preferred over GPT-4 responses 45.25% of the time. Compared to four other open-source chat-aligned models in the same six languages, BLOOMChat’s responses ranked as the best 65.92% of the time. This accomplishment successfully closes the open-source market’s multilingual chat capability gap. In the WMT translation test, BLOOMChat performed better than additional BLOOM model iterations as well as popular open-source conversation models.

BLOOMChat, like other chat LLMs, has limitations. It may produce factually incorrect or irrelevant information or may switch languages by mistake. It can even repeat phrases, have limited coding or math capabilities, and sometimes generate toxic content. Further research is working towards addressing these challenges and ensuring better usage.

In conclusion, BLOOMChat builds upon the extensive work of the open-source community and is a great addition to the list of highly useful multilingual LLMs. By releasing it under an open-source license, SambaNova and Together aim to expand access to advanced multilingual chat capabilities and encourage further innovation in the AI research community.

Check out the Project and Reference Article.


How To Use ChatGPT To Chat With Any PDF Document

Step 1

Open https://chat.openai.com/

Step 2

Click the three dots at the bottom left, and then click Settings.

Step 3

Click 'Beta features'. Turn on 'Web browsing' and 'Plugins'.

Step 4

Close the pop-up and hover over GPT-4 or GPT-3 on the top bar. Then choose 'Plugins' and open the plugin store. Choose a plugin and install it. More details are here.

Step 5

Search and install ‘AskYourPDF’ plugin

Step 6

Upload your PDF document by adding the prompt 'upload a pdf' and then clicking the hyperlink 'Upload Document'. This opens a new tab where you can upload a PDF document. After uploading the document, copy the Document ID.

Step 7

Now go back to the ChatGPT portal and add this prompt with your document ID: 'What is this document about? doc_id: XXX'. You can then add more prompts to ask any questions about the document.


Best 10 AI Tools for Google Sheets (2023)

In the ever-evolving world of data analysis and automation, artificial intelligence (AI) has become a game-changer. With its ability to streamline tasks and provide intelligent insights, AI is now making its way into Google Sheets, the popular online spreadsheet tool. This article will explore some of the top AI tools available for Google Sheets, each offering unique functionalities to enhance productivity and decision-making.

XLMiner Analysis ToolPak: The XLMiner Analysis ToolPak, a Google Sheets add-on, equips users with advanced data analysis capabilities. It includes regression analysis, clustering, time series forecasting, and other powerful tools. These features empower users to derive meaningful insights from their data, leading to better-informed decision-making.  Link to XLMiner Analysis ToolPak

Power Tools: Power Tools is a collection of AI-powered add-ons designed to enhance data manipulation tasks within Google Sheets. With features such as Split Names and Remove Duplicates, Power Tools streamlines workflows, making data management more efficient and accurate. Link to Power Tools

DocParser: DocParser simplifies the extraction of structured data from various file formats, such as PDFs and scanned documents, directly into Google Sheets. By automating this process, DocParser saves valuable time and effort otherwise spent on manual data entry. Link to DocParser

Data Everywhere: Seamlessly transfer data between Excel and Google Sheets with a single click. Eliminate the need for manual copy-pasting, emailing, or importing data between the two platforms. Enjoy up-to-date data synchronization and the flexibility to choose between Windows or OS X platforms for a smooth data-sharing experience. Link to Data Everywhere

MonkeyLearn: MonkeyLearn is a powerful AI tool that automates text tagging in Google Sheets, eliminating manual and repetitive tasks. It is 100 times faster than human processing, significantly saving time, and 50 times more cost-effective. With MonkeyLearn, users can ensure consistent tagging criteria without errors, enabling efficient analysis of spreadsheets and faster insights from data. It offers direct integration with Google Sheets, allowing users to build customized reports using Google Sheets or their preferred BI tools. MonkeyLearn also provides quick-start functionality with pre-made models like sentiment analysis or keyword extraction, simplifying the analysis process. Link to MonkeyLearn

Tableau: Tableau, a renowned data visualization platform, offers a Google Sheets connector. This integration allows users to connect their data in Google Sheets to Tableau, empowering them to create interactive dashboards and reports for comprehensive data analysis. Link to Tableau

AppSheet: AppSheet is a no-code platform that allows you to turn your Google Sheets into powerful web and mobile apps. With its AI capabilities, AppSheet automatically detects data relationships, creates interactive forms, and generates intuitive user interfaces. It’s an excellent tool for building custom applications and automating processes using data from Google Sheets. Link to AppSheet

Two Minute Reports (TMR) – Two Minute Reports is a cloud-based add-on for Google Sheets that empowers businesses to connect with multiple data sources and create customized dashboards and reports. With TMR, professionals can schedule automatic data transfers into Google Sheets, ensuring that the data is always up to date. The tool also enables users to generate reports in popular formats such as PDF or Excel and send them as email attachments. This feature streamlines the reporting process, making sharing insights with stakeholders convenient. Link to Two-Minute Reports

Supermetrics: Supermetrics is a popular data integration tool that connects Google Sheets with various data sources, including marketing platforms, analytics tools, and databases. Supermetrics allows users to easily import data from multiple sources directly into Google Sheets, enabling seamless analysis and reporting. It offers a user-friendly interface and supports scheduled data refreshes, ensuring the data is always up to date. Link to Supermetrics

Form Publisher: Form Publisher is a powerful add-on for Google Sheets that automates the creation of personalized documents based on data collected through Google Forms. With Form Publisher, users can generate custom reports, certificates, letters, invoices, and more directly from the responses in Google Sheets. It offers flexible template options, dynamic field merging, and the ability to send the generated documents via email or share them as PDFs. This tool streamlines document creation processes, saving time and ensuring accuracy. Link to Form Publisher

The emergence of AI tools for Google Sheets brings new possibilities for data analysis, automation, and enhanced productivity. These AI tools cater to a wide range of needs, from automating workflows to advanced data analysis and text analytics. By leveraging the power of AI in Google Sheets, users can unlock the full potential of their data, make data-driven decisions, and streamline their spreadsheet-related tasks.


Instruction fine-tuning for FLAN T5 XL with Amazon SageMaker Jumpstart

Generative AI is in the midst of a period of stunning growth. Increasingly capable foundation models are being released continuously, with large language models (LLMs) being one of the most visible model classes. LLMs are models composed of billions of parameters trained on extensive corpora of text, up to hundreds of billions or even a trillion tokens. These models have proven extremely effective for a wide range of text-based tasks, from question answering to sentiment analysis.
The power of LLMs comes from their capacity to learn and generalize from extensive and diverse training data. The initial training of these models is performed with a variety of objectives, supervised, unsupervised, or hybrid. Text completion or imputation is one of the most common unsupervised objectives: given a chunk of text, the model learns to accurately predict what comes next (for example, predict the next sentence). Models can also be trained in a supervised fashion using labeled data to accomplish a set of tasks (for example, is this movie review positive, negative, or neutral). Whether the model is trained for text completion or some other task, it is frequently not the task customers want to use the model for.
To improve the performance of a pre-trained LLM on a specific task, we can tune the model using examples of the target task in a process known as instruction fine-tuning. Instruction fine-tuning uses a set of labeled examples in the form of {prompt, response} pairs to further train the pre-trained model in adequately predicting the response given the prompt. This process modifies the weights of the model.
This post describes how to perform instruction fine-tuning of an LLM, namely FLAN T5 XL, using Amazon SageMaker JumpStart. We demonstrate how to accomplish this using both the JumpStart UI and a notebook in Amazon SageMaker Studio. You can find the accompanying notebook in the amazon-sagemaker-examples GitHub repository.
Solution overview
The target task in this post is as follows: given a chunk of text in the prompt, return questions that are related to the text but can’t be answered based on the information it contains. This is useful for identifying missing information in a description or for determining whether a query needs more information to be answered.
FLAN T5 models are instruction fine-tuned on a wide range of tasks to increase the zero-shot performance of these models on many common tasks [1]. Additional instruction fine-tuning for a particular customer task can further increase the accuracy of these models, especially if the target task wasn’t previously used to train a FLAN T5 model, as is the case for our task.
In our example task, we’re interested in generating relevant but unanswerable questions. To this end, we use a subset of version 2 of the Stanford Question Answering Dataset (SQuAD2.0) [2] to fine-tune the model. This dataset contains questions posed by human annotators on a set of Wikipedia articles. In addition to questions with answers, SQuAD2.0 contains about 50,000 unanswerable questions. Such questions are plausible but can’t be directly answered from the articles’ content. We only use the unanswerable questions. Our data is structured as a JSON Lines file, with each line containing a context and a question.
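For illustration, each line of the resulting task data is a standalone JSON object with a context and a question. A minimal, made-up record (using the field names shown later in this post) looks like the following:

import json

# One illustrative JSON Lines record for the task (the text itself is made up)
record = {
    "context": "The bridge was completed in 1932 after six years of construction.",
    "question": "How much did the bridge cost to build?",
}
print(json.dumps(record))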

Prerequisites
To get started, all you need is an AWS account in which you can use Studio. You will need to create a user profile for Studio if you don’t already have one.
Fine-tune FLAN-T5 with the JumpStart UI
To fine-tune the model with the JumpStart UI, complete the following steps:

On the SageMaker console, open Studio.
Under SageMaker JumpStart in the navigation pane, choose Models, notebooks, solutions.

You will see a list of foundation models, including FLAN T5 XL, which is marked as fine-tunable.

Choose View model.

Under Data source, you can provide the path to your training data. The source for the data used in this post is provided by default.
You can keep the default values for the deployment configuration (including instance type), security, and the hyperparameters, but you should increase the number of epochs to at least three to get good results.
Choose Train to train the model.

You can track the status of the training job in the UI.

When training is complete (after about 53 minutes in our case), choose Deploy to deploy the fine-tuned model.

After the endpoint is created (a few minutes), you can open a notebook and start using your fine-tuned model.
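Once the endpoint is in service, a minimal way to query it from a notebook looks like the following sketch. The endpoint name below is a placeholder (use the name shown on the endpoint details page), and the payload and response fields match the format used later in this post:

import json
import boto3

runtime = boto3.client("runtime.sagemaker")

# Placeholder: replace with the endpoint name shown in Studio
endpoint_name = "jumpstart-demo-fine-tuned-huggingface-text2text-flan-t5-xl"

payload = {"text_inputs": "Ask a question which is related to the following text, but cannot be answered based on the text. Text: ..."}
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps(payload).encode("utf-8"),
)
print(json.loads(response["Body"].read())["generated_texts"])
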
Fine-tune FLAN-T5 using a Python notebook
Our example notebook shows how to use JumpStart and SageMaker to programmatically fine-tune and deploy a FLAN T5 XL model. It can be run in Studio or locally.
In this section, we first walk through some general setup. Then you fine-tune the model using the SQuAD2.0 dataset. Next, you deploy the pre-trained version of the model behind a SageMaker endpoint, and do the same with the fine-tuned model. Finally, you can query the endpoints and compare the quality of the output of the pre-trained and fine-tuned models. You will find that the output of the fine-tuned model is of much higher quality.
Set up prerequisites
Begin by installing and upgrading the necessary packages. Restart the kernel after running the following code:

!pip install nest-asyncio==1.5.5 --quiet
!pip install ipywidgets==8.0.4 --quiet
!pip install --upgrade sagemaker --quiet

Next, obtain the execution role associated with the current notebook instance:

import boto3
import sagemaker

# Get current region, role, and default bucket
aws_region = boto3.Session().region_name
aws_role = sagemaker.session.Session().get_caller_identity_arn()
output_bucket = sagemaker.Session().default_bucket()

# This will be useful for printing
newline, bold, unbold = "\n", "\033[1m", "\033[0m"
print(f"{bold}aws_region:{unbold} {aws_region}")
print(f"{bold}aws_role:{unbold} {aws_role}")
print(f"{bold}output_bucket:{unbold} {output_bucket}")

You can define a convenient drop-down menu that will list the model sizes available for fine-tuning:

import IPython
from ipywidgets import Dropdown
from sagemaker.jumpstart.filters import And
from sagemaker.jumpstart.notebook_utils import list_jumpstart_models

# Default model choice
model_id = "huggingface-text2text-flan-t5-xl"

# Identify FLAN T5 models that support fine-tuning
filter_value = And(
    "task == text2text", "framework == huggingface", "training_supported == true"
)
model_list = [m for m in list_jumpstart_models(filter=filter_value) if "flan-t5" in m]

# Display the model IDs in a dropdown, for user to select
dropdown = Dropdown(
    value=model_id,
    options=model_list,
    description="FLAN T5 models available for fine-tuning:",
    style={"description_width": "initial"},
    layout={"width": "max-content"},
)
display(IPython.display.Markdown("### Select a pre-trained model from the dropdown below"))
display(dropdown)

JumpStart automatically retrieves appropriate training and inference instance types for the model that you chose:

from sagemaker.instance_types import retrieve_default

model_id, model_version = dropdown.value, "*"

# Instance types for training and inference
training_instance_type = retrieve_default(
    model_id=model_id, model_version=model_version, scope="training"
)
inference_instance_type = retrieve_default(
    model_id=model_id, model_version=model_version, scope="inference"
)
print(f"{bold}model_id:{unbold} {model_id}")
print(f"{bold}training_instance_type:{unbold} {training_instance_type}")
print(f"{bold}inference_instance_type:{unbold} {inference_instance_type}")

If you chose the FLAN T5 XL model, you will see the following output:

model_id: huggingface-text2text-flan-t5-xl

training_instance_type: ml.p3.16xlarge

inference_instance_type: ml.g5.2xlarge

You’re now ready to start fine-tuning.
Retrain the model on the fine-tuning dataset
After your setup is complete, perform the following steps:
Use the following code to retrieve the URIs for the needed artifacts:

from sagemaker import image_uris, model_uris, script_uris

# Training instance will use this image
train_image_uri = image_uris.retrieve(
    region=aws_region,
    framework=None,  # automatically inferred from model_id
    model_id=model_id,
    model_version=model_version,
    image_scope="training",
    instance_type=training_instance_type,
)

# Pre-trained model
train_model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="training"
)

# Script to execute on the training instance
train_script_uri = script_uris.retrieve(
    model_id=model_id, model_version=model_version, script_scope="training"
)

print(f"{bold}image uri:{unbold} {train_image_uri}")
print(f"{bold}model uri:{unbold} {train_model_uri}")
print(f"{bold}script uri:{unbold} {train_script_uri}")

The training data is located in a public Amazon Simple Storage Service (Amazon S3) bucket.
Use the following code to point to the location of the data and set up the output location in a bucket in your account:

from sagemaker.s3 import S3Downloader

# We will use the train split of SQuAD2.0
original_data_file = "train-v2.0.json"

# The data was mirrored in the following bucket
original_data_location = f"s3://sagemaker-sample-files/datasets/text/squad2.0/{original_data_file}"
S3Downloader.download(original_data_location, ".")

# Output location in your own bucket for training artifacts,
# referenced later by the Estimator (the prefix name is illustrative)
output_location = f"s3://{output_bucket}/demo-fine-tune-flan-t5/"

The original data is not in a format that corresponds to the task for which you are fine-tuning the model, so you can reformat it:

import json

local_data_file = "task-data.jsonl"  # any name with .jsonl extension

with open(original_data_file) as f:
    data = json.load(f)

with open(local_data_file, "w") as f:
    for article in data["data"]:
        for paragraph in article["paragraphs"]:
            # iterate over questions for a given paragraph
            for qas in paragraph["qas"]:
                if qas["is_impossible"]:
                    # the question is relevant, but cannot be answered
                    example = {"context": paragraph["context"], "question": qas["question"]}
                    json.dump(example, f)
                    f.write("\n")

template = {
    "prompt": "Ask a question which is related to the following text, but cannot be answered based on the text. Text: {context}",
    "completion": "{question}",
}
with open("template.json", "w") as f:
    json.dump(template, f)

from sagemaker.s3 import S3Uploader

train_data_location = f"s3://{output_bucket}/train_data"
S3Uploader.upload(local_data_file, train_data_location)
S3Uploader.upload("template.json", train_data_location)
print(f"{bold}training data:{unbold} {train_data_location}")

Now you can define some hyperparameters for the training:

from sagemaker import hyperparameters

# Retrieve the default hyperparameters for fine-tuning the model
hyperparameters = hyperparameters.retrieve_default(model_id=model_id, model_version=model_version)

# We will override some default hyperparameters with custom values
hyperparameters["epochs"] = "3"
# You can optionally override other defaults as well, for example:
# hyperparameters["max_input_length"] = "300"  # data inputs will be truncated at this length
# hyperparameters["max_output_length"] = "40"  # data outputs will be truncated at this length
# hyperparameters["generation_max_length"] = "40"  # max length of generated output
print(hyperparameters)

You are now ready to launch the training job:

from sagemaker.estimator import Estimator
from sagemaker.utils import name_from_base

model_name = "-".join(model_id.split("-")[2:])  # get the most informative part of ID
training_job_name = name_from_base(f"js-demo-{model_name}-{hyperparameters['epochs']}")
print(f"{bold}job name:{unbold} {training_job_name}")

training_metric_definitions = [
    {"Name": "val_loss", "Regex": "'eval_loss': ([0-9\.]+)"},
    {"Name": "train_loss", "Regex": "'loss': ([0-9\.]+)"},
    {"Name": "epoch", "Regex": "'epoch': ([0-9\.]+)"},
]

# Create SageMaker Estimator instance
sm_estimator = Estimator(
    role=aws_role,
    image_uri=train_image_uri,
    model_uri=train_model_uri,
    source_dir=train_script_uri,
    entry_point="transfer_learning.py",
    instance_count=1,
    instance_type=training_instance_type,
    volume_size=300,
    max_run=360000,
    hyperparameters=hyperparameters,
    output_path=output_location,
    metric_definitions=training_metric_definitions,
)

# Launch a SageMaker training job over data located in the given S3 path.
# Training jobs can take hours; it is recommended to set wait=False
# and monitor job status through the SageMaker console.
sm_estimator.fit({"training": train_data_location}, job_name=training_job_name, wait=False)

Depending on the size of the fine-tuning data and model chosen, the fine-tuning could take up to a couple of hours.
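Because the job was launched with wait=False, you can also poll its status programmatically with the standard SageMaker boto3 client (a minimal sketch, reusing the training_job_name defined earlier):

import time
import boto3

sm_client = boto3.client("sagemaker")

# Poll the training job until it reaches a terminal state
while True:
    status = sm_client.describe_training_job(TrainingJobName=training_job_name)["TrainingJobStatus"]
    print(f"Training job status: {status}")
    if status in ("Completed", "Failed", "Stopped"):
        break
    time.sleep(60)
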
You can monitor performance metrics such as training and validation loss using Amazon CloudWatch during training. Conveniently, you can also fetch the most recent snapshot of metrics by running the following code:

from sagemaker import TrainingJobAnalytics

# This can be called while the job is still running
df = TrainingJobAnalytics(training_job_name=training_job_name).dataframe()
df.head(10)

For reference, the preceding cells produce output similar to the following:

model uri: s3://sagemaker-us-west-2-802376408542/avkan/training-huggingface-text2text-huggingface-text2text-flan-t5-xl-repack.tar.gz
job name: jumpstart-demo-xl-3-2023-04-06-08-16-42-738
INFO:sagemaker:Creating training-job with name: jumpstart-demo-xl-3-2023-04-06-08-16-42-738

When the training is complete, you have a fine-tuned model artifact in the S3 output location you configured. Let’s use it!
You can create two inference endpoints: one for the original pre-trained model, and one for the fine-tuned model. This allows you to compare the output of both versions of the model. In the next step, you deploy an inference endpoint for the pre-trained model. Then you deploy an endpoint for your fine-tuned model.
Deploy the pre-trained model
Let’s start by deploying the pre-trained model. First, retrieve the inference Docker image URI; this is the base Hugging Face container image. Use the following code:

from sagemaker import image_uris

# Retrieve the inference Docker image URI. This is the base Hugging Face container image
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,  # automatically inferred from model_id
    model_id=model_id,
    model_version=model_version,
    image_scope="inference",
    instance_type=inference_instance_type,
)

You can now create the endpoint and deploy the pre-trained model. Note that you need to pass the Predictor class when deploying a model through the Model class to be able to run inference through the SageMaker API. See the following code:

from sagemaker import model_uris, script_uris
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base

# Retrieve the URI of the pre-trained model
pre_trained_model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="inference"
)

pre_trained_name = name_from_base(f"jumpstart-demo-pre-trained-{model_id}")

# Create the SageMaker model instance of the pre-trained model
if ("small" in model_id) or ("base" in model_id):
    deploy_source_uri = script_uris.retrieve(
        model_id=model_id, model_version=model_version, script_scope="inference"
    )
    pre_trained_model = Model(
        image_uri=deploy_image_uri,
        source_dir=deploy_source_uri,
        entry_point="inference.py",
        model_data=pre_trained_model_uri,
        role=aws_role,
        predictor_cls=Predictor,
        name=pre_trained_name,
    )
else:
    # For those large models, we already repack the inference script and model
    # artifacts for you, so the `source_dir` argument to Model is not required.
    pre_trained_model = Model(
        image_uri=deploy_image_uri,
        model_data=pre_trained_model_uri,
        role=aws_role,
        predictor_cls=Predictor,
        name=pre_trained_name,
    )

print(f"{bold}image URI:{unbold}{newline} {deploy_image_uri}")
print(f"{bold}model URI:{unbold}{newline} {pre_trained_model_uri}")
print("Deploying an endpoint ...")

# Deploy the pre-trained model. Note that we need to pass the Predictor class when we deploy
# the model through the Model class, to be able to run inference through the SageMaker API
pre_trained_predictor = pre_trained_model.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    predictor_cls=Predictor,
    endpoint_name=pre_trained_name,
)
print(f"{newline}Deployed an endpoint {pre_trained_name}")

The endpoint creation and model deployment can take a few minutes; then your endpoint is ready to receive inference calls.
Deploy the fine-tuned model
Let’s deploy the fine-tuned model to its own endpoint. The process is almost identical to the one we used earlier for the pre-trained model. The only difference is that we use the fine-tuned model name and URI:

from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base

fine_tuned_name = name_from_base(f"jumpstart-demo-fine-tuned-{model_id}")
fine_tuned_model_uri = f"{output_location}{training_job_name}/output/model.tar.gz"

# Create the SageMaker model instance of the fine-tuned model
fine_tuned_model = Model(
    image_uri=deploy_image_uri,
    model_data=fine_tuned_model_uri,
    role=aws_role,
    predictor_cls=Predictor,
    name=fine_tuned_name,
)

print(f"{bold}image URI:{unbold}{newline} {deploy_image_uri}")
print(f"{bold}model URI:{unbold}{newline} {fine_tuned_model_uri}")
print("Deploying an endpoint ...")

# Deploy the fine-tuned model
fine_tuned_predictor = fine_tuned_model.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    predictor_cls=Predictor,
    endpoint_name=fine_tuned_name,
)
print(f"{newline}Deployed an endpoint {fine_tuned_name}")

When this process is complete, both pre-trained and fine-tuned models are deployed behind their own endpoints. Let’s compare their outputs.
Generate output and compare the results
Define some utility functions to query the endpoint and parse the response:

import boto3
import json

# Parameters of (output) text generation. A great introduction to generation
# parameters can be found at https://huggingface.co/blog/how-to-generate
parameters = {
    "max_length": 40,  # restrict the length of the generated text
    "num_return_sequences": 5,  # we will inspect several model outputs
    "num_beams": 10,  # use beam search
}

# Helper functions for running inference queries
def query_endpoint_with_json_payload(payload, endpoint_name):
    encoded_json = json.dumps(payload).encode("utf-8")
    client = boto3.client("runtime.sagemaker")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name, ContentType="application/json", Body=encoded_json
    )
    return response

def parse_response_multiple_texts(query_response):
    model_predictions = json.loads(query_response["Body"].read())
    generated_text = model_predictions["generated_texts"]
    return generated_text

def generate_questions(endpoint_name, text):
    expanded_prompt = prompt.replace("{context}", text)
    payload = {"text_inputs": expanded_prompt, **parameters}
    query_response = query_endpoint_with_json_payload(payload, endpoint_name=endpoint_name)
    generated_texts = parse_response_multiple_texts(query_response)
    for i, generated_text in enumerate(generated_texts):
        print(f"Response {i}: {generated_text}{newline}")

In the next code snippet, we define the prompt and the test data. The prompt describes our target task, which is to generate questions that are related to the provided text but can’t be answered based on it.
The test data consists of three different paragraphs: one on the Australian city of Adelaide taken from the first two paragraphs of its Wikipedia page, one regarding Amazon Elastic Block Store (Amazon EBS) from the Amazon EBS documentation, and one on Amazon Comprehend from the Amazon Comprehend documentation. We expect the model to identify questions related to these paragraphs but that can’t be answered with the information provided therein.

prompt = "Ask a question which is related to the following text, but cannot be answered based on the text. Text: {context}"

test_paragraphs = [
    """
Adelaide is the capital city of South Australia, the state's largest city and the fifth-most populous city in Australia.
"Adelaide" may refer to either Greater Adelaide (including the Adelaide Hills) or the Adelaide city centre.
The demonym Adelaidean is used to denote the city and the residents of Adelaide. The Traditional Owners of the Adelaide
region are the Kaurna people. The area of the city centre and surrounding parklands is called Tarndanya in the Kaurna language.

Adelaide is situated on the Adelaide Plains north of the Fleurieu Peninsula, between the Gulf St Vincent in the west and
the Mount Lofty Ranges in the east. Its metropolitan area extends 20 km (12 mi) from the coast to the foothills of
the Mount Lofty Ranges, and stretches 96 km (60 mi) from Gawler in the north to Sellicks Beach in the south.
""",
    """
Amazon Elastic Block Store (Amazon EBS) provides block level storage volumes for use with EC2 instances. EBS volumes behave like raw, unformatted block devices. You can mount these volumes as devices on your instances. EBS volumes that are attached to an instance are exposed as storage volumes that persist independently from the life of the instance. You can create a file system on top of these volumes, or use them in any way you would use a block device (such as a hard drive). You can dynamically change the configuration of a volume attached to an instance.

We recommend Amazon EBS for data that must be quickly accessible and requires long-term persistence. EBS volumes are particularly well-suited for use as the primary storage for file systems, databases, or for any applications that require fine granular updates and access to raw, unformatted, block-level storage. Amazon EBS is well suited to both database-style applications that rely on random reads and writes, and to throughput-intensive applications that perform long, continuous reads and writes.
""",
    """
Amazon Comprehend uses natural language processing (NLP) to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document. Use Amazon Comprehend to create new products based on understanding the structure of documents. For example, using Amazon Comprehend you can search social networking feeds for mentions of products or scan an entire document repository for key phrases.
You can access Amazon Comprehend document analysis capabilities using the Amazon Comprehend console or using the Amazon Comprehend APIs. You can run real-time analysis for small workloads or you can start asynchronous analysis jobs for large document sets. You can use the pre-trained models that Amazon Comprehend provides, or you can train your own custom models for classification and entity recognition.
All of the Amazon Comprehend features accept UTF-8 text documents as the input. In addition, custom classification and custom entity recognition accept image files, PDF files, and Word files as input.
Amazon Comprehend can examine and analyze documents in a variety of languages, depending on the specific feature. For more information, see Languages supported in Amazon Comprehend. Amazon Comprehend's Dominant language capability can examine documents and determine the dominant language for a far wider selection of languages.
""",
]

You can now test the endpoints using the example articles:

print(f"{bold}Prompt:{unbold} {repr(prompt)}")
for paragraph in test_paragraphs:
    print("-" * 80)
    print(paragraph)
    print("-" * 80)
    print(f"{bold}pre-trained{unbold}")
    generate_questions(pre_trained_name, paragraph)
    print(f"{bold}fine-tuned{unbold}")
    generate_questions(fine_tuned_name, paragraph)

Test data: Adelaide
We use the following context:

Adelaide is the capital city of South Australia, the state’s largest city and the fifth-most populous city in Australia.
“Adelaide” may refer to either Greater Adelaide (including the Adelaide Hills) or the Adelaide city centre.
The demonym Adelaidean is used to denote the city and the residents of Adelaide. The Traditional Owners of the Adelaide
region are the Kaurna people. The area of the city centre and surrounding parklands is called Tarndanya in the Kaurna language.

Adelaide is situated on the Adelaide Plains north of the Fleurieu Peninsula, between the Gulf St Vincent in the west and
the Mount Lofty Ranges in the east. Its metropolitan area extends 20 km (12 mi) from the coast to the foothills of
the Mount Lofty Ranges, and stretches 96 km (60 mi) from Gawler in the north to Sellicks Beach in the south.

The pre-trained model response is as follows:

Response 0: What is the area of the city centre and surrounding parklands called in the Kaurna language?
Response 1: What is the area of the city centre and surrounding parklands is called Tarndanya in the Kaurna language?
Response 2: What is the area of the city centre and surrounding parklands called in Kaurna?
Response 3: What is the capital city of South Australia?
Response 4: What is the area of the city centre and surrounding parklands known as in the Kaurna language?

The fine-tuned model responses are as follows:

Response 0: What is the second most populous city in Australia?
Response 1: What is the fourth most populous city in Australia?
Response 2: What is the population of Gawler?
Response 3: What is the largest city in Australia?
Response 4: What is the fifth most populous city in the world?

Test data: Amazon EBS
We use the following context:

Amazon Elastic Block Store (Amazon EBS) provides block level storage volumes for use with EC2 instances. EBS volumes behave like raw, unformatted block devices. You can mount these volumes as devices on your instances. EBS volumes that are attached to an instance are exposed as storage volumes that persist independently from the life of the instance. You can create a file system on top of these volumes, or use them in any way you would use a block device (such as a hard drive). You can dynamically change the configuration of a volume attached to an instance.

We recommend Amazon EBS for data that must be quickly accessible and requires long-term persistence. EBS volumes are particularly well-suited for use as the primary storage for file systems, databases, or for any applications that require fine granular updates and access to raw, unformatted, block-level storage. Amazon EBS is well suited to both database-style applications that rely on random reads and writes, and to throughput-intensive applications that perform long, continuous reads and writes.

The pre-trained model responses are as follows:

Response 0: What is the difference between Amazon EBS and Amazon Elastic Block Store (Amazon EBS)?
Response 1: What is the difference between Amazon EBS and Amazon Elastic Block Store?
Response 2: What is the difference between Amazon EBS and Amazon Simple Storage Service (Amazon S3)?
Response 3: What is Amazon Elastic Block Store (Amazon EBS)?
Response 4: What is the difference between Amazon EBS and a hard drive?

The fine-tuned model responses are as follows:

Response 0: What type of applications are not well suited to Amazon EBS?
Response 1: What behaves like formatted block devices?
Response 2: What type of applications are not suited to Amazon EBS?
Response 3: What type of applications are not well suited for Amazon EBS?
Response 4: What type of applications are not suited for Amazon EBS?

Test data: Amazon Comprehend
We use the following context:

Amazon Comprehend uses natural language processing (NLP) to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document. Use Amazon Comprehend to create new products based on understanding the structure of documents. For example, using Amazon Comprehend you can search social networking feeds for mentions of products or scan an entire document repository for key phrases.
You can access Amazon Comprehend document analysis capabilities using the Amazon Comprehend console or using the Amazon Comprehend APIs. You can run real-time analysis for small workloads or you can start asynchronous analysis jobs for large document sets. You can use the pre-trained models that Amazon Comprehend provides, or you can train your own custom models for classification and entity recognition.
All of the Amazon Comprehend features accept UTF-8 text documents as the input. In addition, custom classification and custom entity recognition accept image files, PDF files, and Word files as input.
Amazon Comprehend can examine and analyze documents in a variety of languages, depending on the specific feature. For more information, see Languages supported in Amazon Comprehend. Amazon Comprehend’s Dominant language capability can examine documents and determine the dominant language for a far wider selection of languages.

The pre-trained model responses are as follows:

Response 0: What does Amazon Comprehend use to extract insights about the content of documents?
Response 1: How does Amazon Comprehend extract insights about the content of documents?
Response 2: What does Amazon Comprehend use to develop insights about the content of documents?
Response 3: How does Amazon Comprehend develop insights about the content of documents?
Response 4: What does Amazon Comprehend use to extract insights about the content of a document?

The fine-tuned model responses are as follows:

Response 0: What does Amazon Comprehend use to extract insights about the structure of documents?
Response 1: How does Amazon Comprehend recognize sentiments in a document?
Response 2: What does Amazon Comprehend use to extract insights about the content of social networking feeds?
Response 3: What does Amazon Comprehend use to extract insights about the content of documents?
Response 4: What type of files does Amazon Comprehend reject as input?

The difference in output quality between the pre-trained model and the fine-tuned model is stark. The questions provided by the fine-tuned model touch on a wider range of topics. They are systematically meaningful questions, which isn’t always the case for the pre-trained model, as illustrated with the Amazon EBS example.
Although this doesn’t constitute a formal and systematic evaluation, it’s clear that the fine-tuning process has improved the quality of the model’s responses on this task.
Clean up
Lastly, remember to clean up and delete the endpoints:

# Delete resources
pre_trained_predictor.delete_model()
pre_trained_predictor.delete_endpoint()
fine_tuned_predictor.delete_model()
fine_tuned_predictor.delete_endpoint()

Conclusion
In this post, we showed how to use instruction fine-tuning with FLAN T5 models using the JumpStart UI or a Jupyter notebook running in Studio. We provided code explaining how to retrain the model using data for the target task and deploy the fine-tuned model behind an endpoint. The target task in this post was to identify questions that relate to a chunk of text provided in the input but can’t be answered based on the information provided in that text. We demonstrated that a model fine-tuned for this specific task returns better results than a pre-trained model.
Now that you know how to instruction fine-tune a model with JumpStart, you can create powerful models customized for your application. Gather some data for your use case, upload it to Amazon S3, and use either the Studio UI or the notebook to tune a FLAN T5 model!
References
[1] Chung, Hyung Won, et al. “Scaling Instruction-Finetuned Language Models.” arXiv preprint arXiv:2210.11416 (2022).
[2] Rajpurkar, Pranav, Robin Jia, and Percy Liang. “Know What You Don’t Know: Unanswerable Questions for SQuAD.” Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2018.

About the authors
Laurent Callot is a Principal Applied Scientist and manager at AWS AI Labs who has worked on a variety of machine learning problems, from foundational models and generative AI to forecasting, anomaly detection, causality, and AI Ops.
Andrey Kan is a Senior Applied Scientist at AWS AI Labs with interests and experience in different fields of machine learning. These include research on foundation models, as well as ML applications for graphs and time series.
Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker built-in algorithms and helps develop machine learning algorithms. He got his PhD from the University of Illinois Urbana-Champaign. He is an active researcher in machine learning and statistical inference and has published many papers at NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP.
Baris Kurt is an Applied Scientist at AWS AI Labs. His interests are in time series anomaly detection and foundation models. He loves developing user friendly ML systems.
Jonas Kübler is an Applied Scientist at AWS AI Labs. He is working on foundation models with the goal to facilitate use-case specific applications.

Microsoft Researchers Introduce Reprompting: An Iterative Sampling Alg …

In recent times, Large Language Models (LLMs) have evolved and transformed natural language processing with their few-shot prompting capabilities. These models have extended their usability to almost every domain, from machine translation and natural language understanding to text completion, sentiment analysis, and speech recognition. With the few-shot prompting approach, LLMs are provided with a few examples of a particular task along with some natural language instructions, and using these, they are able to adapt and learn how to perform the task properly. However, tasks that require iterative steps and constraint propagation expose the limitations of these prompting techniques, and a new approach has been introduced to overcome them.

A team of researchers at Microsoft Research, Redmond, USA, recently introduced a new method called Reprompting, which addresses these limitations. The approach automatically searches for useful and effective chain-of-thought (CoT) prompts. Chain-of-thought prompting improves the reasoning ability of large language models and helps them perform complex reasoning tasks; for this, a few chain-of-thought demonstrations are provided as exemplars during prompting. Reprompting finds CoT prompts efficiently without any human involvement.

The researchers have used an iterative sampling approach known as Gibbs sampling in the Reprompting algorithm.  It frames the problem as sampling from a joint distribution of CoT recipes.  Since the distribution is difficult to characterize directly, Gibbs Sampling has been used as an approximation method.  This sampling method helps determine the best instructions by trying different ones and deciding which works best.

The Reprompting algorithm begins by sampling initial CoT recipes using zero-shot prompting, where no worked examples are provided; zero-shot prompting enables an LLM to generate task responses without task-specific exemplars. The algorithm then iteratively samples new recipes using previously sampled solutions as parent prompts, and these new recipes are used to solve other training problems, with the aim of finding a set of prompts that share a similar CoT structure.
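The exact procedure in the paper is more involved, but the loop described above can be sketched schematically as follows. This is not the researchers’ code: sample_recipe and is_correct are placeholder callables standing in for the LLM sampling and answer-checking steps.

import random

def reprompting_sketch(sample_recipe, is_correct, train_problems, n_init=5, iterations=20, eval_k=3):
    """Schematic of the iterative CoT-recipe search described above (illustrative only).

    sample_recipe(problem, exemplars) -> a candidate CoT recipe (e.g., a string)
    is_correct(problem, recipe) -> bool, whether the recipe leads to a correct answer
    """
    def score(recipe):
        sample = random.sample(train_problems, min(eval_k, len(train_problems)))
        return sum(is_correct(p, recipe) for p in sample)

    # Initialize recipes with zero-shot prompting (no worked examples)
    init = random.sample(train_problems, min(n_init, len(train_problems)))
    scored = [(r, score(r)) for r in (sample_recipe(p, exemplars=[]) for p in init)]

    for _ in range(iterations):
        # Re-sample a new recipe, using previously sampled recipes as parent prompts
        parents = [r for r, _ in random.sample(scored, min(2, len(scored)))]
        candidate = sample_recipe(random.choice(train_problems), exemplars=parents)
        s = score(candidate)
        # Keep candidates that solve at least as many problems as the weakest recipe so far
        if s >= min(v for _, v in scored):
            scored.append((candidate, s))
    return max(scored, key=lambda rs: rs[1])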

The algorithm has been evaluated on five Big-Bench Hard (BBH) tasks that require multi-step reasoning. BBH focuses on tasks that are believed to be beyond the abilities of current language models. ChatGPT and InstructGPT were used as the LLMs for the evaluation. Upon evaluation, Reprompting performed better than zero-shot, few-shot, and human-written CoT prompting techniques.

Reprompting also showed significant potential for model combination by using different LLMs for initializing and sampling new recipes. It can help transfer knowledge from a stronger model to a weaker model, resulting in noticeably better performance from the weaker model. Reprompting outperformed human-written CoT prompting on BBH tasks by up to 17 points. The researchers note that CoT recipes that work well on one model may not work well on another, highlighting the need to optimize CoT prompts for each model for fairer comparisons.

To sum up, the Reprompting algorithm is a great automated approach for finding effective CoT prompts for LLMs without human intervention.  It is a valuable approach to addressing the limitations of existing methods and achieving superior performance on tasks requiring multi-step reasoning.

Check out the Paper.

The post Microsoft Researchers Introduce Reprompting: An Iterative Sampling Algorithm that Searches for the Chain-of-Thought (CoT) Recipes for a Given Task without Human Intervention appeared first on MarkTechPost.

Does The Segment Anything Model Work For Medical Images? This AI Study …

Image segmentation, which includes the segmentation of organs, abnormalities, bones, and other objects, is a key problem in medical image analysis. Deep learning has made considerable advances in this area. The expensive and time-consuming nature of gathering and curating medical images, particularly because trained radiologists must frequently provide meticulous mask annotations, makes it practically difficult to develop and train segmentation models for new medical imaging data and tasks. These issues might be considerably reduced with the introduction of foundation models and zero-shot learning. 

The natural language processing field has benefited from foundation models’ paradigm-shifting capabilities. Foundation models are neural networks trained on large amounts of data with inventive knowledge and prompting objectives, typically without traditional supervised training labels, which lets them perform zero-shot learning on brand-new data in various contexts. The recently created Segment Anything Model is a foundation model that has demonstrated impressive zero-shot segmentation performance on several natural image datasets. Researchers from Duke University put it to the test on medical image data.

In response to user-provided prompts, the Segment Anything Model (SAM) is intended to segment an object of interest in an image. A single point, a group of points (including a whole mask), a bounding box, or text can all be used as prompts. Even when the prompt is ambiguous, the model is expected to provide a suitable segmentation mask. The main notion behind this method is that the model can segment any object that is pointed out because it has learned the concept of an object. As a result, there is a good chance that it will perform well under the zero-shot learning regime and be able to segment objects of kinds that it has never seen before. In addition to the prompt-based formulation of the task, the SAM authors used a particular model architecture and a particularly big dataset, as explained in the following.

SAM was trained gradually while the collection of pictures and accompanying object masks (SA-1B) was being developed. Three stages went into the creation of the dataset. First, human annotators clicked on items in a series of photographs and manually refined masks produced by SAM, which had been trained on open datasets at the time. Second, to broaden the variety of objects, the annotators were asked to segment objects for which SAM could not yet confidently produce masks. The final set of masks was created automatically by picking confident and stable masks after prompting the SAM model with a collection of points scattered in a grid over each image.

SAM is made to need one or more prompts to generate a segmentation mask. Technically, the model may be run without pointing out any visible items, but the authors don’t anticipate this will be helpful for medical imaging because there are frequently many other things in the image in addition to the one of interest. Because SAM is prompt-based, it cannot be utilized in the same manner as most segmentation models in medical imaging, where the input is only an image and the output is a segmentation mask or multiple segmentation masks for the required item or objects. The authors suggest that there are three key applications for SAM in the segmentation of medical pictures.

The first two entail using the Segment Anything Model itself to create masks or annotate data for training new models; these methods don’t involve adjusting SAM. The final method is developing and honing a SAM-like model specifically for medical imagery. Each strategy is then explained. Because SAM’s text-based prompting is still in the proof-of-concept phase, the authors make no comments on it here. The first application is “human in the loop” semi-automated annotation: one of the major obstacles to creating segmentation models in this discipline is the human annotation of medical pictures, which often takes up doctors’ valuable time.

SAM might be utilized as a tool for quicker annotation in this situation. There are several methods for doing this. In the most basic scenario, a human user prompts SAM, which creates a mask that the user may accept or modify; the mask can be refined iteratively. The “segment everything” mode is another option, where SAM receives point prompts spaced evenly over the image and creates masks for several objects that the user may subsequently label, pick, and/or modify. Many more options exist beyond this; this is only the beginning.
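As a concrete illustration of point-based prompting with Meta’s open-source segment-anything package (the checkpoint path, image, and click coordinates below are placeholders), a single foreground click can be turned into candidate masks roughly as follows:

import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load a pre-trained SAM checkpoint (the file path is a placeholder)
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")
predictor = SamPredictor(sam)

# Placeholder image: an HxWx3 uint8 RGB array, e.g. a scan slice rendered as RGB
image = np.zeros((512, 512, 3), dtype=np.uint8)
predictor.set_image(image)

# A single foreground click (x, y) as the prompt; label 1 means foreground
point_coords = np.array([[256, 256]])
point_labels = np.array([1])

# Ask for multiple candidate masks so an annotator can pick or refine one
masks, scores, logits = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    multimask_output=True,
)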

Check out the Paper.

The post Does The Segment Anything Model Work For Medical Images? This AI Study Explains appeared first on MarkTechPost.

Meta AI Introduces MTIA v1: It’s First-Generation AI Inference Accel …

At Meta, AI workloads are everywhere, serving as the foundation for numerous applications like content comprehension, Feeds, generative AI, and ad ranking. Thanks to its seamless Python integration, eager-mode programming, and straightforward APIs, PyTorch can run these workloads. In particular, deep learning recommendation models (DLRMs) are vital to enhancing user experiences across Meta’s products and offerings. The hardware systems must supply increasingly more memory and compute as the size and complexity of these models grow, all without sacrificing efficiency.

When it comes to the highly efficient processing of Meta’s unique recommendation workloads at scale, GPUs aren’t always the best option. To address this issue, the Meta team developed a set of application-specific integrated circuits (ASICs) called the “Meta Training and Inference Accelerator” (MTIA). With the needs of the next-generation recommendation model in mind, the first-generation ASIC is integrated with PyTorch to develop a completely optimized ranking system. Keeping developers productive is an ongoing process as they maintain support for PyTorch 2.0, which dramatically improves the compiler-level performance of PyTorch.

In 2020, the team created the original MTIA ASIC to handle Meta’s internal processing needs. Co-designed with silicon, PyTorch, and the recommendation models, this inference accelerator is part of a full-stack solution. Using a TSMC 7nm technology, this 800 MHz accelerator can achieve 102.4 TOPS with INT8 precision and 51.2 TFLOPS with FP16 precision. The device’s TDP, or thermal design power, is 25 W.

The accelerator can be divided into constituent parts, including processing elements (PEs), on-chip and off-chip memory resources, and interconnects in a grid structure. An independent control subsystem within the accelerator manages the software. The firmware coordinates the execution of jobs on the accelerator, controls the available computing and memory resources, and communicates with the host through a specific host interface. LPDDR5 is used for off-chip DRAM in the memory subsystem, which allows for expansion to 128 GB. More bandwidth and far less latency are available for frequently accessed data and instructions because the chip’s 128 MB of on-chip SRAM is shared among all the PEs.

The 64 PEs in the grid are laid out in an 8 by 8 matrix. Each PE’s 128 KB of local SRAM memory allows for speedy data storage and processing. A mesh network links the PEs together and to the memory banks. The grid can be used in its whole to perform a job, or it can be split up into numerous subgrids, each of which can handle its work. Matrix multiplication, accumulation, data transportation, and nonlinear function calculation are only some of the important tasks optimized for by the multiple fixed-function units and two processor cores in each PE. The RISC-V ISA-based processor cores have been extensively modified to perform the required computation and control operations. The architecture was designed to make the most of two essentials for effective workload management: parallelism and data reuse.
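As a back-of-the-envelope check (these derived numbers are not from the announcement itself), the chip-level figures quoted above imply the following per-PE budget:

# Back-of-the-envelope arithmetic from the figures quoted above (illustrative only)
num_pes = 64                  # 8 x 8 grid of processing elements
local_sram_per_pe_kb = 128    # local SRAM per PE
peak_int8_tops = 102.4        # chip-level INT8 throughput
clock_hz = 800e6              # 800 MHz clock

total_local_sram_mb = num_pes * local_sram_per_pe_kb / 1024
ops_per_cycle_chip = peak_int8_tops * 1e12 / clock_hz
ops_per_cycle_per_pe = ops_per_cycle_chip / num_pes

print(f"Aggregate PE-local SRAM: {total_local_sram_mb:.0f} MB (plus 128 MB shared on-chip SRAM)")
print(f"INT8 ops per cycle, whole chip: {ops_per_cycle_chip:,.0f}")   # 128,000
print(f"INT8 ops per cycle, per PE: {ops_per_cycle_per_pe:,.0f}")     # 2,000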

The researchers compared MTIA to an NNPI accelerator and a graphics processing unit. The results show that MTIA relies on efficiently managing small forms and batch sizes for low-complexity models. MTIA actively optimizes its software stack to achieve similar levels of performance. In the meantime, it uses larger forms that are significantly more optimized on the GPU’s software stack to run medium- and high-complexity models.

To optimize performance for Meta’s workloads, the team is now concentrating on finding a happy medium between computing power, memory capacity, and interconnect bandwidth to develop a better and more efficient solution.

Check out the Project.

The post Meta AI Introduces MTIA v1: It’s First-Generation AI Inference Accelerator appeared first on MarkTechPost.

Meet LETI: A New Language Model (LM) Fine-Tuning Paradigm That Explore …

With the increasing popularity of Large Language Models (LLMs), new research and advancements are getting introduced almost every day. Using deep learning technologies and the power of Artificial Intelligence, LLMs are continuously evolving and spreading in every domain. LLMs are trained on massive amounts of raw text, and in order to enhance their performance, these models are fine-tuned. During the process of fine-tuning, LLMs are trained on particular tasks using direct training signals that measure their performance, such as classification accuracy, question answering, document summarization, etc. 

Recently, a new fine-tuning paradigm called LETI (Learn from Textual Interactions) has been introduced, which explores the potential of Large Language Models to learn from textual interactions and feedback. LETI enables language models to understand not just whether they were wrong but why they were wrong. This approach enables LLMs to surpass the limitations of learning solely from labels and scalar rewards.

The team of researchers behind the development of LETI has mentioned how this approach provides textual feedback to the language model. It helps check the correctness of the model’s outputs with the help of binary labels and identifies and explains errors in its generated code. The LETI paradigm is just like the iterative process of software development, which involves a developer writing a program, testing it, and improving it based on feedback. Similarly, LETI fine-tunes the LLM by providing textual feedback that pinpoints bugs and errors. 

During the fine-tuning process, the model is prompted with a natural language problem description, after which it generates a set of solutions. A Solution Evaluator then evaluates these solutions using a set of test cases. The researchers used a Python interpreter as the Solution Evaluator, so the error messages and stack traces obtained from running the generated code serve as the source of textual feedback.
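For intuition, a solution evaluator of this kind (not the authors’ implementation) can be as simple as running a generated program together with a test case in a subprocess and capturing the traceback as textual feedback:

import subprocess
import sys

def evaluate_solution(generated_code, test_code, timeout=10):
    """Run generated code plus a test case; return (passed, textual_feedback)."""
    program = generated_code + "\n\n" + test_code
    result = subprocess.run(
        [sys.executable, "-c", program],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    passed = result.returncode == 0
    # stderr holds the error message and stack trace when the program fails
    return passed, "" if passed else result.stderr

# Example: an incorrect solution and a simple assertion-style test case
passed, feedback = evaluate_solution(
    "def add(a, b):\n    return a - b",
    "assert add(2, 3) == 5, 'add(2, 3) should be 5'",
)
print(passed)    # False
print(feedback)  # the AssertionError traceback, used as textual feedback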

The training data used for fine-tuning the model consists of three components: natural language instructions, LM-generated programs, and textual feedback. When the generated program is unable to provide a solution, feedback is provided to the LLM. Otherwise, a reward token is provided to the model in the form of binary feedback to encourage it to generate an accurate solution. The generated textual feedback is used in the fine-tuning process of the LM, known as Feedback-Conditioned Fine-Tuning.

For the evaluation process, the researchers used a dataset of code generation tasks called MBPP (Mostly Basic Programming Problems). The results show that LETI significantly improves the performance of two base LMs of different scales on the MBPP dataset without requiring ground-truth outputs for training. On the HumanEval dataset, LETI achieves similar or better performance than the base LMs on unseen problems. Moreover, the researchers found that, compared to binary feedback, using textual feedback allows the model to achieve the same performance with fewer gradient steps.

In conclusion, LETI is a promising fine-tuning approach that enhances language models by using detailed textual feedback, enabling them to learn from mistakes and improve performance on tasks like code generation.

Check out the Paper and GitHub link.

The post Meet LETI: A New Language Model (LM) Fine-Tuning Paradigm That Explores LM’s Potential To Learn From Textual Interactions appeared first on MarkTechPost.

What if LLM Hallucinations Were A Feature And Not A Bug? Meet dreamGPT …

Large Language Models are the new trend for all good reasons. These models use deep learning techniques and are trained on large amounts of textual data. They produce human-like text and perform various Natural Language Processing (NLP) and Natural Language Understanding (NLU) tasks. Some famous LLMs like GPT 3.5, GPT 4, BERT, DALL-E, and T5 are performing various tasks like generating meaningful responses to questions, text summarization, translations, text-to-text transformation, and so on. 

Recently, a new approach called dreamGPT has been introduced, which makes use of the power of hallucinations from Large Language Models to stimulate divergent thinking. This innovative approach helps generate unique and creative ideas. While hallucinations typically carry a negative connotation and are mostly considered a drawback of LLMs, dreamGPT turns them into something valuable for generating innovative solutions.

Current LLMs are mainly designed to address particular problems by understanding and generating text based on instructions or prompts. However, these models are limited to generating responses that align with patterns learned from their training data, which restricts their ability to explore alternative or unconventional ideas. dreamGPT takes a different approach and makes use of the inherent capacity of LLMs to hallucinate: the aim is to produce text that may not have a direct basis in reality but can still be useful and creative.

This can help dreamGPT explore different use cases and use divergent thinking. Divergent thinking refers to generating a wide range of creative ideas, considering multiple perspectives, and exploring different solutions. Using this, DreamGPT can explore as many possibilities as possible instead of just aiming for a single correct answer or a specific problem-solving approach.

To use dreamGPT, the users need to install Python 3.10+ and Poetry. Poetry is a tool that is used for dependency management and packaging in Python. It allows the declaration of the used libraries in a project and helps in installing and updating them. DreamGPT works in a loop by planting random seeds, dreaming about new and creative ideas, combining and evaluating different approaches, selecting the most novel approach, and repeating it in a cycle.

dreamGPT is open source and can run locally on any PC or Mac without requiring a GPU. Samples are shown in the project’s GitHub readme. When run, dreamGPT generates a random seed of concepts and uses them as a starting point for its dreaming process. Each idea is evaluated against diverse criteria, and the score is used to reward the best ideas over time. As the population of ideas grows, the results improve.
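The project’s actual implementation will differ, but the loop described above (seed, dream, evaluate, select, repeat) has roughly the following shape; here a random word-combination step stands in for the LLM’s “dreaming”, purely to make the structure concrete:

import random

SEED_CONCEPTS = ["solar", "compost", "drone", "library", "bakery", "sensor"]

def dream(pool):
    """Stand-in for the LLM step: combine two existing concepts into a new 'idea'."""
    return " + ".join(random.sample(pool, 2))

def novelty(idea, seen):
    """Toy novelty score: only ideas not generated before can score."""
    return 0.0 if idea in seen else random.random()

def dream_loop(generations=10, population=5):
    ideas, seen = list(SEED_CONCEPTS), set()
    for _ in range(generations):
        candidates = [dream(ideas) for _ in range(population)]
        best = max(candidates, key=lambda c: novelty(c, seen))
        seen.update(candidates)
        ideas.append(best)  # keep the most novel idea for the next generation
    return ideas[len(SEED_CONCEPTS):]

print(dream_loop())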

In conclusion, dreamGPT is a great approach that embraces the hallucinatory capabilities of LLMs and seems promising for stimulating divergent thinking and generating innovative ideas.

Check out the GitHub and Reddit Post.

The post What if LLM Hallucinations Were A Feature And Not A Bug? Meet dreamGPT: An Open-Source GPT-Based Solution That Uses Hallucinations From Large Language Models (LLMs) As A Feature appeared first on MarkTechPost.

Meta AI Introduces Generative AI For Advertisers

Meta, the company that owns Facebook, has made an important announcement that has created excitement in advertising. Alongside positive quarterly results for Q1 2023, Meta has unveiled another addition to its revenue-generating products: AI Sandbox, a generative AI-based toolset tailored for advertisers and content creators. The tool is designed to strengthen how brands interact and engage with their audience on social media platforms.

The announcement created a positive outlook for Meta, as it has posted year-on-year revenue growth for the first time in three quarters. This positive momentum further strengthens the company’s position as a leader in the tech industry and enhances its vision to deliver innovative solutions for advertisers.

The AI Sandbox offers three main features per the release. The first feature enables brands to generate variations of the same copy, with customization capabilities for their advertisements. This helps them tailor their message to a range of demographics while keeping the core idea of the message intact. This level of personalization helps brands connect with their target audience more personally, maximizing the impact of their ads.

The second feature of the AI Sandbox is a background generation tool that simplifies the creation of distinct campaign materials. Based on a text prompt as input, it allows brands to create unique ads with trendy and engaging backdrops that align with the ad’s core idea. This feature enables simple, enhanced visual additions to their suite of campaign assets and saves time across the ad material lifecycle.

The third and final feature of the AI Sandbox is its image-cropping tool. This tool allows advertisers to create visuals in different aspect ratios for formats such as social posts, stories, or short videos like Reels. This simple-sounding automation is a much-needed feature in the creator space, as a significant amount of time is spent producing these visuals in different dimensions. It saves creators time and effort, enhancing the overall experience for creators and advertisers.

Meta has made these generative AI Sandbox features available to a select group of advertisers, with a wider rollout beginning in July of this year. In a recent post, Meta stated, “In July, we will begin gradually expanding access to more advertisers with plans to add some of these features into our products later this year.” This expansion of access reflects Meta’s commitment to empowering a broader range of advertisers with the capabilities offered by the AI Sandbox.

It is worth noting that Meta’s expansion into AI and advertising innovation does not mean it has lost focus on metaverse development. While Meta works on AI tools to enhance advertising capabilities, it remains dedicated to building a metaverse and revolutionizing how people connect and engage with technology.

In conclusion, the announcement of Meta’s AI Sandbox for advertisers marks a significant milestone for the company. As Meta expands its services beyond social media, this AI toolset is set to reshape the advertising industry. By enhancing content creators’ workflows with these generative AI features, Meta aims to strengthen engagement, streamline the creative process, and provide brands with the resources they need to connect with their target audience quickly and effectively.


The post Meta AI Introduces Generative AI For Advertisers appeared first on MarkTechPost.

Google is Adding Text-to-Code Generation for Cells in Colab

In a remarkable stride toward enhancing the coding experience, Google Colab, the go-to platform for Python programming since 2017, is set to unveil an array of cutting-edge AI coding features. With an established user base of over 7 million individuals, including students and professionals, Colab has empowered users to tap into powerful computing resources seamlessly, free of charge, and without needing software installations or management. With the integration of advanced AI capabilities, Colab is poised to redefine programming efficiency, speed, and comprehension.

The forthcoming enhancements are made possible by Codey, a family of code models built on PaLM 2 and unveiled recently at the I/O event. Codey, fine-tuned on a large corpus of high-quality, permissively licensed code from external sources, has been optimized specifically for Python and tailored to the needs of Colab users. This custom-built adaptation is intended to deliver a first-class programming experience within the Colab ecosystem.

One of the standout features soon to grace Colab is the natural language to code generation capability. This groundbreaking functionality empowers users to generate extensive blocks of code effortlessly, enabling the creation of entire functions from comments or prompts. The objective is to alleviate the burden of writing repetitive code, freeing up valuable time to focus on the more captivating aspects of programming and data science. Eligible users will discover a new “Generate” button within their notebooks, which acts as a gateway to the code generation feature. By inputting any text prompt, users will witness the seamless transformation of natural language into executable code.

For subscribers to Colab’s paid tier, AI-powered autocomplete suggestions will further streamline the coding process. Intelligent recommendations will appear in real time as users type, assisting with syntax completion and reducing the likelihood of errors. This boosts productivity and makes for a smoother coding experience.

To enhance user support, Colab is introducing an interactive code-assisting chatbot. Users will soon be able to seek assistance and ask questions directly within the Colab interface. From queries like “How do I import data from Google Sheets?” to “How do I filter a Pandas DataFrame?”—a wealth of knowledge and guidance will be just a chatbox away. This integration of a chatbot marks an exciting step toward integrating comprehensive learning resources within the Colab platform.
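
To give a concrete sense of the second question, here is the kind of snippet a “How do I filter a Pandas DataFrame?” answer typically boils down to. This is an ordinary pandas example of ours, not output from Colab’s chatbot, and the column names are invented for illustration.

# Illustrative only: standard pandas filtering, not Colab chatbot output.
import pandas as pd

df = pd.DataFrame({"city": ["Austin", "Boston", "Austin"], "sales": [120, 80, 200]})

# Boolean-mask filtering: keep rows where sales exceed 100.
high_sales = df[df["sales"] > 100]

# query() is an equivalent, often more readable alternative.
high_sales_in_austin = df.query("sales > 100 and city == 'Austin'")
print(high_sales_in_austin)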

The impact of these advancements extends beyond geographical boundaries. Because Colab is free to anyone with an internet connection, it has opened the door to millions of students and under-resourced communities worldwide. The integration of AI capabilities promises to further democratize programming education, granting access to high-powered GPUs for machine learning applications and empowering users to embark on their data science journeys.

With the imminent rollout of these features, Colab aims to usher in a new era of programming productivity. Availability will begin gradually with paid subscribers in the United States, then expand to the free tier and to other regions.

As Google Colab continues to evolve, users can look forward to many more features and improvements that integrate seamlessly with their data and machine learning workflows. With AI as a steadfast ally, programming and data science endeavors are set to reach new heights, unlocking innovation and creativity at every step.

Check out the Reference Article.


Meet Argilla: An Open-Source Data Curation Platform for Large Language Models (LLMs) and MLOps for Natural Language Processing

Generative artificial intelligence has taken over the world, especially in the past few months. ChatGPT, the hugely popular chatbot developed by OpenAI, has more than a million users, from AI researchers to students. Based on the GPT architecture, this large language model (LLM) answers questions, generates original and accurate content, summarizes long passages of text, completes code, and more. With the release of OpenAI’s latest version, GPT-4, ChatGPT now also supports multimodal data. Other well-known models such as DALL-E, BERT, and LLaMA have likewise driven major advances in generative AI.

Argilla, an open-source data curation platform for large language models, has recently been introduced. It is designed to help users through the full lifecycle of developing, evaluating, and improving natural language processing (NLP) models, from initial experimentation to deployment in production. The platform uses both human and machine feedback to build robust LLMs through faster data curation.

Argilla supports users at every step of the MLOps cycle, from data labeling to model monitoring. Data labeling is a crucial step in training supervised NLP models, since annotating raw text is what produces high-quality labeled datasets. Model monitoring, in turn, tracks the performance and behavior of deployed models in real time, helping maintain their reliability and consistency.

The developers have shared a few principles on which Argilla is based. They are as follows.

Open-source – Argilla is open source, free for everyone to use and modify. It supports major NLP libraries such as Hugging Face Transformers, spaCy, Stanford Stanza, and Flair, and users can combine their preferred libraries without implementing any specific interface.

End-to-end – Argilla provides an end-to-end solution for ML model development by bridging the gap between data collection, model iteration, and production monitoring. Argilla treats data collection as an ongoing activity for continuously improving the model and enables iterative development throughout the entire machine learning lifecycle.

Better user and developer experience – Argilla focuses on user and developer experience by creating a user-friendly environment where domain experts can easily interpret, annotate, and experiment with data, while engineers retain full control over the data pipelines.

Beyond traditional hand-labeling – Argilla goes beyond traditional hand-labeling workflows by offering a range of innovative data annotation approaches. It allows the users to combine hand labeling with active learning, bulk labeling, and zero-shot models, which enables more efficient and cost-effective data annotation workflows.

Argilla is a production-ready framework that supports data curation, evaluation, model monitoring, debugging, and explainability. It automates human-in-the-loop workflows and integrates smoothly with the tools of the user’s choice. It can be deployed locally with the Docker command ‘docker run -d --name argilla -p 6900:6900 argilla/argilla-quickstart:latest’.
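
For a sense of how the platform is used from Python once that container is running, here is a minimal sketch assuming the Argilla 1.x Python client (pip install argilla). The dataset name, example text, and credentials are illustrative, and exact function names and record types may differ between versions, so treat this as a sketch rather than a definitive recipe.

# Minimal sketch of logging a record for annotation with the Argilla 1.x client.
# Names and signatures may vary by version; the API key is a placeholder.
import argilla as rg

# Point the client at the locally deployed quickstart instance started above.
rg.init(api_url="http://localhost:6900", api_key="YOUR_API_KEY")

# Log a text classification record with a model prediction so annotators can
# review or correct it in the Argilla UI.
record = rg.TextClassificationRecord(
    text="The new update made the app much faster.",
    prediction=[("positive", 0.92), ("negative", 0.08)],
)
rg.log(records=record, name="app-feedback")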

Check out the GitHub link.


Meta AI Researchers Propose MEGABYTE: A Multiscale Decoder Architecture that Enables End-to-End Differentiable Modeling of Sequences of Over One Million Bytes

Million-byte sequences are common: music, image, and video files are frequently several megabytes in size. However, because of the quadratic cost of self-attention and, more significantly, the cost of large per-position feedforward networks, large transformer decoders (LLMs) typically use only a few thousand tokens of context. This significantly limits the range of tasks to which LLMs can be applied. Researchers from Meta present MEGABYTE, a novel method for modeling long byte sequences. Byte sequences are divided into fixed-size patches roughly equivalent to tokens.

Their model then has three components (a minimal code sketch follows the list):

(1) A local module, a tiny autoregressive model that forecasts bytes within a patch. 

(2) A patch embedder, which encodes a patch simply by losslessly concatenating the embeddings of its bytes.

(3) A global module, a big autoregressive transformer that inputs and outputs patch representations.
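
To make the division of labor between these components concrete, here is a minimal, illustrative PyTorch sketch. It is our own assumption of how the pieces fit together, not Meta’s code: the MegabyteSketch class, dimensions, and layer counts are invented for illustration, and the input shifting needed for strict autoregressive training is omitted for brevity.

# Illustrative sketch of the MEGABYTE structure described above (not Meta's code).
# Requires PyTorch. Dimensions and depths are arbitrary illustrative choices.
import torch
import torch.nn as nn

V = 256          # byte vocabulary
P = 8            # patch size (bytes per patch)
D_GLOBAL = 512   # global model width (assumption)
D_LOCAL = 128    # local model width (assumption)

def causal_mask(size):
    # Standard additive causal mask: -inf above the diagonal.
    return torch.triu(torch.full((size, size), float("-inf")), diagonal=1)

class MegabyteSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.byte_embed = nn.Embedding(V, D_LOCAL)
        # Patch embedder: losslessly concatenate the P byte embeddings of a patch,
        # then project to the global model width.
        self.patch_proj = nn.Linear(P * D_LOCAL, D_GLOBAL)
        # Global module: a large autoregressive transformer over patch representations.
        self.global_layers = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(D_GLOBAL, nhead=8, batch_first=True),
            num_layers=4,
        )
        # Local module: a small autoregressive model predicting bytes within a patch,
        # conditioned on the global patch representation.
        self.local_in = nn.Linear(D_GLOBAL, D_LOCAL)
        self.local_layers = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(D_LOCAL, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.to_logits = nn.Linear(D_LOCAL, V)

    def forward(self, bytes_in):
        # bytes_in: (batch, seq_len) with seq_len divisible by P.
        b, n = bytes_in.shape
        k = n // P
        x = self.byte_embed(bytes_in)                         # (b, n, D_LOCAL)
        patches = x.view(b, k, P * D_LOCAL)                   # concatenate bytes per patch
        g = self.patch_proj(patches)                          # (b, k, D_GLOBAL)
        g = self.global_layers(g, mask=causal_mask(k))        # patch-level self-attention
        h = self.local_in(g).unsqueeze(2) + x.view(b, k, P, D_LOCAL)
        h = self.local_layers(h.reshape(b * k, P, D_LOCAL),
                              mask=causal_mask(P))            # intra-patch modeling
        return self.to_logits(h).reshape(b, n, V)             # next-byte logits

# Usage: next-byte logits for a random 64-byte sequence.
seq = torch.randint(0, V, (2, 64))
print(MegabyteSketch()(seq).shape)  # torch.Size([2, 64, 256])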

Importantly, most byte predictions are straightforward for many tasks (such as completing a word given its first few letters), which removes the need for massive networks per byte and allows considerably smaller models for intra-patch modeling. For extended sequence modeling, the MEGABYTE architecture offers three key advantages over Transformers.

Sub-quadratic self-attention. The vast majority of work on long-sequence models has focused on reducing the quadratic cost of self-attention. MEGABYTE decomposes a long sequence into two shorter ones, and with optimal patch sizes the self-attention cost drops to O(N^(4/3)), which remains tractable for long sequences.

Per-patch feedforward layers. In GPT-3-size models, more than 98% of FLOPS go into computing position-wise feedforward layers. By using large feedforward layers per patch rather than per position, MEGABYTE enables far bigger and more expressive models at the same cost: with patch size P, MEGABYTE can apply a layer with mP parameters once for the same price a baseline transformer pays to apply a feedforward layer with m parameters P times.

Parallelism in decoding. Transformers must perform all computations serially during generation, because the input at each timestep is the output of the previous one. By generating the representations for patches in parallel, MEGABYTE allows greater parallelism during generation. For instance, when trained on the same compute, a MEGABYTE model with 1.5B parameters can generate sequences 40% faster than a standard 350M-parameter Transformer while also improving perplexity.
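
As a rough, back-of-the-envelope illustration of the sub-quadratic claim (our own arithmetic, not figures from the paper): splitting attention into a global pass over N/P patches and a local pass within each patch, and choosing P close to N^(1/3), brings the dominant cost down from N^2 to roughly N^(4/3).

# Hedged illustration of attention cost scaling: counts of attention score
# computations, not exact FLOPs, and not numbers reported in the paper.
def vanilla_attention_cost(n):
    return n ** 2                        # full self-attention over n positions

def megabyte_attention_cost(n, p):
    k = n // p                           # number of patches
    return k ** 2 + k * p ** 2           # global patch attention + per-patch local attention

n = 1_000_000                            # a million-byte sequence
p = round(n ** (1 / 3))                  # patch size near N^(1/3) balances the two terms
print(vanilla_attention_cost(n))         # 1_000_000_000_000  (N^2)
print(megabyte_attention_cost(n, p))     # 200_000_000        (about 2 * N^(4/3))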

Together, these enhancements allow scaling to long sequences, faster generation during deployment, and training much bigger, better-performing models for the same computational budget. This is where MEGABYTE stands in stark contrast to existing autoregressive models, which typically apply some form of tokenization to translate byte sequences into larger discrete tokens. Tokenization complicates pre-processing, multi-modal modeling, and transfer to new domains while obscuring useful structure from the model, and it means that most state-of-the-art models are not truly end to end. Moreover, the most popular tokenization methods lose information unless language-specific heuristics are applied.

Switching from tokenization to performant and efficient byte-level models would therefore bring several benefits. The researchers run in-depth experiments for both MEGABYTE and strong baselines. To focus the comparison on model architecture rather than training resources, which benefit all models, they use a single compute and data budget across all models. They find that MEGABYTE enables byte-level models to reach state-of-the-art perplexities for density estimation on ImageNet, perform competitively with subword models on long-context language modeling, and model audio directly from raw audio data. These findings show that tokenization-free autoregressive sequence modeling is scalable.

Check out the Paper.
