Build a serverless voice-based contextual chatbot for people with disabilities

At Amazon and AWS, we are always finding innovative ways to build inclusive technology. With voice assistants like Amazon Alexa, we are enabling more people to ask questions and get answers on the spot without having to type. Whether you’re a person with a motor disability, juggling multiple tasks, or simply away from your computer, getting search results without typing is a valuable feature. With modern voice assistants, you can now ask your questions conversationally and get verbal answers instantly.
In this post, we discuss voice-guided applications, specifically chatbots. Chatbots are no longer a niche technology; they are now ubiquitous on customer service websites, providing around-the-clock automated assistance. Although AI chatbots have been around for years, recent advances in generative AI and large language models (LLMs) have enabled more natural conversations. Chatbots are proving useful across industries, handling both general and industry-specific questions. Voice-based assistants like Alexa demonstrate how we are entering an era of conversational interfaces. Typing questions already feels cumbersome to many who prefer the simplicity and ease of speaking with their devices.
We explore how to build a fully serverless, voice-based contextual chatbot tailored for individuals who need it. We also provide a sample chatbot application, which is available in the accompanying GitHub repository. We create an intelligent conversational assistant that can understand and respond to voice inputs in a contextually relevant manner. The AI assistant is powered by Amazon Bedrock. This chatbot is designed to assist users with various tasks, provide information, and offer personalized support based on their unique requirements. For our LLM, we use Anthropic Claude on Amazon Bedrock.
We demonstrate the process of integrating Anthropic Claude’s advanced natural language processing capabilities with the serverless architecture of Amazon Bedrock, enabling the deployment of a highly scalable and cost-effective solution. Additionally, we discuss techniques for enhancing the chatbot’s accessibility and usability for people with motor disabilities. The aim of this post is to provide a comprehensive understanding of how to build a voice-based, contextual chatbot that uses the latest advancements in AI and serverless computing.
We hope that this solution can help people with certain mobility disabilities. A limited level of interaction is still required: the user needs a way to indicate when to start and stop talking. In our sample application, we address this with a dedicated Talk button that runs the transcription process while it is pressed.
For people with significant motor disabilities, the same operation can be implemented with a dedicated physical button that can be pressed by a single finger or another body part. Alternatively, a special keyword can be said to indicate the beginning of the command. This approach is used when you communicate with Alexa. The user always starts the conversation with “Alexa.”
Solution overview
The following diagram illustrates the architecture of the solution.

To deploy this architecture, we need managed compute that can host the web application, authentication mechanisms, and relevant permissions. We discuss this later in the post.
All the services that we use are serverless and fully managed by AWS. You don’t need to provision the compute resources. You only consume the services through their API. All the calls to the services are made directly from the client application.
The application is a simple React application that we create using the Vite build tool. We use the AWS SDK for JavaScript to call the services. The solution uses the following major services:

Amazon Polly is a service that turns text into lifelike speech.
Amazon Transcribe is an AWS AI service that makes it straightforward to convert speech to text.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) along with a broad set of capabilities that you need to build generative AI applications.
Amazon Cognito is an identity service for web and mobile apps. It’s a user directory, an authentication server, and an authorization service for OAuth 2.0 access tokens and AWS credentials.

To consume AWS services, the user needs to obtain temporary credentials from AWS Identity and Access Management (IAM). This is possible due to the Amazon Cognito identity pool, which acts as a mediator between your application user and IAM services. The identity pool holds the information about the IAM roles with all permissions necessary to run the solution.
Amazon Polly and Amazon Transcribe don’t require additional setup from the client aside from what we have described. However, Amazon Bedrock requires named user authentication. This means that having an Amazon Cognito identity pool is not enough; you also need an Amazon Cognito user pool, which allows you to define users and bind them to the identity pool. To better understand how Amazon Cognito allows external applications to invoke AWS services, refer to Secure API Access with Amazon Cognito Federated Identities, Amazon Cognito User Pools, and Amazon API Gateway.
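To make this flow concrete, the following is a minimal Python (boto3) sketch of the credential exchange. The sample application performs the equivalent calls in the browser with the AWS SDK for JavaScript; the region, pool IDs, and token shown here are placeholders.

import boto3

# All values below are placeholders; substitute the ones created by Amplify
region = "us-east-1"
user_pool_id = "<your-user-pool-id>"
identity_pool_id = "<your-identity-pool-id>"
id_token = "<ID token returned by the user pool after sign-in>"

identity = boto3.client("cognito-identity", region_name=region)
logins = {f"cognito-idp.{region}.amazonaws.com/{user_pool_id}": id_token}

# Exchange the user pool token for an identity ID, then for temporary IAM credentials
identity_id = identity.get_id(IdentityPoolId=identity_pool_id, Logins=logins)["IdentityId"]
creds = identity.get_credentials_for_identity(IdentityId=identity_id, Logins=logins)["Credentials"]

# The temporary credentials carry the permissions of the identity pool's authenticated role
polly = boto3.client(
    "polly",
    region_name=region,
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretKey"],
    aws_session_token=creds["SessionToken"],
)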
The heavy lifting of provisioning the Amazon Cognito user pool and identity pool, including generating the sign-in UI for the React application, is done by AWS Amplify. Amplify consists of a set of tools (open source framework, visual development environment, console) and services (web application and static website hosting) to accelerate the development of mobile and web applications on AWS. We cover the steps of setting up Amplify in the next sections.
Prerequisites
Before you begin, complete the following prerequisites:

Make sure you have the following installed:

Node.js
npm
git

Create an IAM role to use in the Amazon Cognito identity pool. Follow the principle of least privilege and grant only the minimum set of permissions needed to run the application.

To invoke Amazon Bedrock, use the following code:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": "*"
        }
    ]
}

To invoke Amazon Polly, use the following code:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor2",
            "Effect": "Allow",
            "Action": "polly:SynthesizeSpeech",
            "Resource": "*"
        }
    ]
}

To invoke Amazon Transcribe, use the following code:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor3",
            "Effect": "Allow",
            "Action": "transcribe:StartStreamTranscriptionWebSocket",
            "Resource": "*"
        }
    ]
}

The full policy JSON should look as follows:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor2",
            "Effect": "Allow",
            "Action": "polly:SynthesizeSpeech",
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor3",
            "Effect": "Allow",
            "Action": "transcribe:StartStreamTranscriptionWebSocket",
            "Resource": "*"
        }
    ]
}

Run the following command to clone the GitHub repository:

git clone https://github.com/aws-samples/serverless-conversational-chatbot.git

To use Amplify, refer to Set up Amplify CLI to complete the initial setup.
To be consistent with the values that you use later in the instructions, call your AWS profile amplify when you see the following prompt.
Create the role amplifyconsole-backend-role with the AdministratorAccess-Amplify managed policy, which allows Amplify to create the necessary resources.
For this post, we use the Anthropic Claude 3 Haiku LLM. To enable the LLM in Amazon Bedrock, refer to Access Amazon Bedrock foundation models.

Deploy the solution
There are two options to deploy the solution:

Use Amplify to deploy the application automatically
Deploy the application manually

We provide the steps for both options in this section.
Deploy the application automatically using Amplify
Amplify can deploy the application automatically if it’s stored in GitHub, Bitbucket, GitLab, or AWS CodeCommit. Upload the application that you downloaded earlier to your preferred repository (from the aforementioned options). For instructions, see Getting started with deploying an app to Amplify Hosting.
You can now continue to the next section of this post to set up IAM permissions.
Deploy the application manually
If you don’t have access to one of the storage options that we mentioned, you can deploy the application manually. This can also be useful if you want to modify the application to better fit your use case.
We tested the deployment on AWS Cloud9, a cloud integrated development environment (IDE) for writing, running, and debugging code, with Ubuntu Server 22.04 and Amazon Linux 2023.
We use the Visual Studio Code IDE and run all the following commands directly in the terminal window inside the IDE, but you can also run the commands in the terminal of your choice.

From the directory where you checked out the application on GitHub, run the following command:

cd serverless-conversational-chatbot

Run the following commands:

npm i

amplify init

Follow the prompts as shown in the following screenshot.

For authentication, choose the AWS profile amplify that you created as part of the prerequisite steps.
Two new files will appear in the project under the src folder:

amplifyconfiguration.json
aws-exports.js

Next, run the following command:

amplify configure project

Then select Project information.

Enter the following information:

Which setting do you want to configure? Project information

Enter a name for the project: servrlsconvchat

Choose your default editor: Visual Studio Code

Choose the type of app that you’re building: javascript

What javascript framework are you using: react

Source Directory Path: src

Distribution Directory Path: dist

Build Command: npm run-script build

Start Command: npm run-script start

You can use an existing Amazon Cognito identity pool and user pool or create new objects.

For our application, run the following command:

amplify add auth

If you get the following message, you can ignore it:

Auth has already been added to this project. To update run amplify update auth

Choose Default configuration.
Accept all options proposed by the prompt.
Run the following command:

amplify add hosting

Choose your hosting option.

You have two options to host the application. The application can be hosted to the Amplify console or to Amazon Simple Storage Service (Amazon S3) and then exposed through Amazon CloudFront.
Hosting with the Amplify console differs from CloudFront and Amazon S3. The Amplify console is a managed service providing continuous integration and delivery (CI/CD) and SSL certificates, prioritizing swift deployment of serverless web applications and backend APIs. In contrast, CloudFront and Amazon S3 offer greater flexibility and customization options, particularly for hosting static websites and assets with features like caching and distribution. CloudFront and Amazon S3 are preferable for intricate, high-traffic web applications with specific performance and security needs.
For this post, we use the Amplify console. To learn more about deployment with Amazon S3 and Amazon CloudFront, refer to the documentation.
Now you’re ready to publish the application. There is an option to publish the application to GitHub to support CI/CD pipelines. Amplify has built-in integration with GitHub and can redeploy the application automatically when you push the changes. For simplicity, we use manual deployment.

Choose Manual deployment.
Run the following command:

amplify publish

After the application is published, you will see the following output. Note down this URL to use in a later step.

Log in to the Amplify console, navigate to the servrlsconvchat application, and choose General under App settings in the navigation pane.
Edit the app settings and enter amplifyconsole-backend-role for Service role (you created this role in the prerequisites section).

Now you can proceed to the next section to set up IAM permissions.
Configure IAM permissions
As part of the publishing method you completed, you provisioned a new identity pool. You can view this on the Amazon Cognito console, along with a new user pool. The names will be different from those presented in this post.
As we explained earlier, you need to attach policies to this role to allow interaction with Amazon Bedrock, Amazon Polly, and Amazon Transcribe. To set up IAM permissions, complete the following steps:

On the Amazon Cognito console, choose Identity pools in the navigation pane.
Navigate to your identity pool.
On the User access tab, choose the link for the authenticated role.
Attach the policies that you defined in the prerequisites section.

Amazon Bedrock can only be used with a named user, so we create a sample user in the Amazon Cognito user pool that was provisioned as part of the application publishing process.

On the user pool details page, on the Users tab, choose Create user.
Provide your user information.
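If you prefer to script this step, the following is a minimal sketch using the AWS SDK for Python (Boto3); the user pool ID, user name, and temporary password are placeholders.

import boto3

# The user pool ID, user name, and temporary password are placeholders
idp = boto3.client("cognito-idp")
idp.admin_create_user(
    UserPoolId="<your-user-pool-id>",
    Username="chatbot-user",
    TemporaryPassword="ChangeMe123!",  # the user is prompted to change it at first sign-in
)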

You’re now ready to run the application.
Use the sample serverless application
To access the application, navigate to the URL you saved from the output at the end of the application publishing process. Sign in to the application with the user you created in the previous step. You might be asked to change the password the first time you sign in.
Use the Talk button and hold it while you’re asking the question. (We use this approach for the simplicity of demonstrating the abilities of the tool. For people with motor disabilities, we propose using a dedicated button that can be operated with different body parts, or a special keyword to initiate the conversation.)
When you release the button, the application sends your voice to Amazon Transcribe and returns the transcription text. This text is used as an input for an Amazon Bedrock LLM. For this example, we use Anthropic Claude 3 Haiku, but you can modify the code and use another model.
The response from Amazon Bedrock is displayed as text and is also spoken by Amazon Polly.
The conversation history is also stored. This means that you can ask follow-up questions, and the context of the conversation is preserved. For example, we asked, “What is the most famous tower there?” without specifying the location, and our chatbot was able to understand that the context of the question is Paris based on our previous question.
We store the conversation history inside a JavaScript variable, which means that if you refresh the page, the context will be lost. We discuss how to preserve the conversation context in a persistent way later in this post.
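To illustrate the pattern, here is a minimal Python (boto3) sketch of context-preserving invocation. The sample application implements the same logic in JavaScript in the browser; the model ID reflects the Anthropic Claude 3 Haiku model used in this post.

import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Conversation history, mirroring the JavaScript variable used by the sample app
messages = []

def ask(question):
    messages.append({"role": "user", "content": question})
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": messages,  # the full history is sent on every call
    })
    response = bedrock_runtime.invoke_model(
        modelId="anthropic.claude-3-haiku-20240307-v1:0", body=body
    )
    answer = json.loads(response["body"].read())["content"][0]["text"]
    messages.append({"role": "assistant", "content": answer})
    return answer

ask("Tell me about Paris.")
ask("What is the most famous tower there?")  # "there" resolves to Paris via the history

Because the full messages list is sent on every call, the model can resolve references like “there” against earlier turns.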
To identify that the transcription process is happening, choose and hold the Talk button. The color of the button changes and a microphone icon appears.

Clean up
To clean up your resources, run the following command from the same directory where you ran the Amplify commands:

amplify delete

This command removes the Amplify settings from the React application, the Amplify resources, and all Amazon Cognito objects, including the IAM role and the user in the Amazon Cognito user pool.
Conclusion
In this post, we presented how to create a fully serverless voice-based contextual chatbot using Amazon Bedrock with Anthropic Claude.
This serves as a starting point for a serverless and cost-effective solution. For example, you could extend the solution to persist conversational memory for your chats in a data store such as Amazon DynamoDB. If you want to use a Retrieval Augmented Generation (RAG) approach, you can use Amazon Bedrock Knowledge Bases to securely connect FMs in Amazon Bedrock to your company data.
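As a minimal sketch of the DynamoDB extension, assuming a table named ChatHistory with a SessionId partition key (both placeholders), you could persist and reload the history as follows:

import boto3

# "ChatHistory" and its "SessionId" partition key are assumptions for illustration
table = boto3.resource("dynamodb").Table("ChatHistory")

def save_history(session_id, messages):
    table.put_item(Item={"SessionId": session_id, "Messages": messages})

def load_history(session_id):
    item = table.get_item(Key={"SessionId": session_id}).get("Item")
    return item["Messages"] if item else []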
Another approach is to customize the model you use in Amazon Bedrock with your own data using fine-tuning or continued pre-training to build applications that are specific to your domain, organization, and use case. With custom models, you can create unique user experiences that reflect your company’s style, voice, and services.
For additional resources, refer to the following:

Building a serverless document chat with AWS Lambda and Amazon Bedrock
Knowledge Bases now delivers fully managed RAG experience in Amazon Bedrock
Customize models in Amazon Bedrock with your own data using fine-tuning and continued pre-training

About the Author
Michael Shapira is a Senior Solution Architect covering general topics in AWS and part of the AWS Machine Learning community. He has 16 years’ experience in Software Development. He finds it fascinating to work with cloud technologies and help others on their cloud journey.
Eitan Sela is a Machine Learning Specialist Solutions Architect with Amazon Web Services. He works with AWS customers to provide guidance and technical assistance, helping them build and operate machine learning solutions on AWS. In his spare time, Eitan enjoys jogging and reading the latest machine learning articles.

Maintain access and consider alternatives for Amazon Monitron

Amazon Monitron, the Amazon Web Services (AWS) machine learning (ML) service for industrial equipment condition monitoring, will no longer be available to new customers effective October 31, 2024. Existing customers of Amazon Monitron will be able to purchase devices and use the service as normal. We will continue to sell devices until July 2025 and will honor the 5-year device warranty, including service support. AWS continues to invest in security, availability, and performance improvements for Amazon Monitron, but we do not plan to introduce new features to Amazon Monitron.
This post discusses how customers can maintain access to Amazon Monitron after it is closed to new customers and what some alternatives are to Amazon Monitron.
Maintaining access to Amazon Monitron
Customers are considered existing customers if they have commissioned an Amazon Monitron sensor through a project at any time in the 30 days prior to October 31, 2024. To maintain access to the service after October 31, 2024, customers should create a project and commission at least one sensor.
For any questions or support, you can contact your assigned account manager or solutions architect, or create a case from the AWS Management Console.
Ordering Amazon Monitron hardware
For existing Amazon Business customers, we will allowlist your account for ordering the existing Amazon Monitron devices. For existing Amazon.com retail customers, the Amazon Monitron team will provide specific ordering instructions upon individual request.
Alternatives to Amazon Monitron
For customers interested in an alternative for their condition monitoring needs, we recommend exploring the solutions provided by our AWS Partners: Tactical Edge, IndustrAI, and Factory AI.
Summary
While new customers will no longer have access to Amazon Monitron after October 31, 2024, AWS offers a range of partner solutions through the AWS Partner Network finder. Customers should explore these options to understand what works best for their specific needs.
More details can be found through the AWS Partner Network finder.

About the author
Stuart Gillen is a Sr. Product Manager for Monitron, at AWS. Stuart has held a variety of roles in engineering management, business development, product management, and consulting. Most of his career has been focused on industrial applications specifically in reliability practices, maintenance systems, and manufacturing.

Import a question answering fine-tuned model into Amazon Bedrock as a custom model

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.
Common generative AI use cases, including but not limited to chatbots, virtual assistants, conversational search, and agent assistants, use FMs to provide responses. Retrieval Augmented Generation (RAG) is a technique to optimize the output of FMs by providing context around the questions for these use cases. Fine-tuning the FM is recommended to further optimize the output to follow the brand and industry voice or vocabulary.
Custom Model Import for Amazon Bedrock, in preview now, allows you to import customized FMs created in other environments, such as Amazon SageMaker, Amazon Elastic Compute Cloud (Amazon EC2) instances, and on premises, into Amazon Bedrock. This post is part of a series that demonstrates various architecture patterns for importing fine-tuned FMs into Amazon Bedrock.
In this post, we provide a step-by-step approach to fine-tuning a Mistral model using SageMaker and importing it into Amazon Bedrock using the Custom Model Import feature. We use the OpenOrca dataset to fine-tune the Mistral model and use the SageMaker FMEval library to evaluate the fine-tuned model imported into Amazon Bedrock.
Key Features
Some of the key features of Custom Model Import for Amazon Bedrock are:

This feature allows you to bring your fine-tuned models and leverage the fully managed serverless capabilities of Amazon Bedrock
Currently, the feature supports the Llama 2, Llama 3, Flan, and Mistral model architectures, with precisions of FP32, FP16, and BF16, and further quantizations coming soon
To leverage this feature, you run the import process (covered later in this post) with your model weights stored in Amazon Simple Storage Service (Amazon S3)
You can even leverage models created using Amazon SageMaker by referencing the SageMaker model Amazon Resource Name (ARN), which provides seamless integration with SageMaker
Amazon Bedrock automatically scales your model as your traffic increases and, when not in use, scales it down to zero, reducing your costs

Let us dive into a use case and see how easy it is to use this feature.
Solution overview
At the time of writing, the Custom Model Import feature in Amazon Bedrock supports models following the architectures and patterns in the following figure.

In this post, we walk through the following high-level steps:

Fine-tune the model using SageMaker.
Import the fine-tuned model into Amazon Bedrock.
Test the imported model.
Evaluate the imported model using the FMEval library.

The following diagram illustrates the solution architecture.

The process includes the following steps:

We use a SageMaker training job to fine-tune the model using a SageMaker JupyterLab notebook. This training job reads the dataset from Amazon Simple Storage Service (Amazon S3) and writes the model back into Amazon S3. This model will then be imported into Amazon Bedrock.
To import the fine-tuned model, you can use the Amazon Bedrock console, the Boto3 library, or APIs.
An import job orchestrates the process to import the model and make the model available from the customer account.

The import job copies all the model artifacts from the user’s account into an AWS managed S3 bucket.

When the import job is complete, the fine-tuned model is made available for invocation from your AWS account.
We use the SageMaker FMEval library in a SageMaker notebook to evaluate the imported model.

The copied model artifacts will remain in the Amazon Bedrock account until the custom imported model is deleted from Amazon Bedrock. Deleting model artifacts in your AWS account S3 bucket doesn’t delete the model or the related artifacts in the Amazon Bedrock managed account. You can delete an imported model from Amazon Bedrock along with all the copied artifacts using either the Amazon Bedrock console, Boto3 library, or APIs.
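For example, the following is a minimal Boto3 sketch of listing and deleting imported models; the region and model identifier are placeholders.

import boto3

bedrock = boto3.client("bedrock", region_name="<aws-region-name>")

# List the custom models imported into Amazon Bedrock
for model in bedrock.list_imported_models()["modelSummaries"]:
    print(model["modelName"], model["modelArn"])

# Delete an imported model along with its copied artifacts
bedrock.delete_imported_model(modelIdentifier="<imported model name or ARN>")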
Additionally, all data (including the model) remains within the selected AWS Region. The model artifacts are imported into the AWS operated deployment account using a virtual private cloud (VPC) endpoint, and you can encrypt your model data using an AWS Key Management Service (AWS KMS) customer managed key.
In the following sections, we dive deep into each of these steps to deploy, test, and evaluate the model.
Prerequisites
We use Mistral-7B-v0.3 in this post because it uses an extended vocabulary compared to its prior version produced by Mistral AI. This model is straightforward to fine-tune, and Mistral AI has provided example fine-tuned models. We use Mistral for this use case because this model supports a 32,000-token context capacity and is fluent in English, French, Italian, German, Spanish, and coding languages, making it well suited for customer support use cases.
Mistral-7B-v0.3 is a gated model on the Hugging Face model repository. You need to review the terms and conditions and request access to the model by submitting your details.

We use Amazon SageMaker Studio to preprocess the data and fine-tune the Mistral model using a SageMaker training job. To set up SageMaker Studio, refer to Launch Amazon SageMaker Studio. Refer to the SageMaker JupyterLab documentation to set up and launch a JupyterLab notebook. You will submit a SageMaker training job to fine-tune the Mistral model from the SageMaker JupyterLab notebook, which can be found in the GitHub repo.
Fine-tune the model using QLoRA
To fine-tune the Mistral model, we apply QLoRA and Parameter-Efficient Fine-Tuning (PEFT) optimization techniques. In the provided notebook, you use the Fully Sharded Data Parallel (FSDP) PyTorch API to perform distributed model tuning. You use supervised fine-tuning (SFT) to fine-tune the Mistral model.
Prepare the dataset
The first step in the fine-tuning process is to prepare and format the dataset. After you transform the dataset into the Mistral Default Instruct format, you upload it as a JSONL file into the S3 bucket used by the SageMaker session, as shown in the following code:

# Load dataset from the hub
dataset = load_dataset("Open-Orca/OpenOrca")
flan_dataset = dataset.filter(lambda example, indice: "flan" in example["id"], with_indices=True)
flan_dataset = flan_dataset["train"].train_test_split(test_size=0.01, train_size=0.035)

columns_to_remove = list(dataset["train"].features)
flan_dataset = flan_dataset.map(create_conversation, remove_columns=columns_to_remove, batched=False)

# save datasets to s3
flan_dataset["train"].to_json(f"{training_input_path}/train_dataset.json", orient="records", force_ascii=False)
flan_dataset["test"].to_json(f"{training_input_path}/test_dataset.json", orient="records", force_ascii=False)

You transform the dataset into Mistral Default Instruct format within the SageMaker training job as instructed in the training script (run_fsdp_qlora.py):

################
# Dataset
################

train_dataset = load_dataset(
    "json",
    data_files=os.path.join(script_args.dataset_path, "train_dataset.json"),
    split="train",
)
test_dataset = load_dataset(
    "json",
    data_files=os.path.join(script_args.dataset_path, "test_dataset.json"),
    split="train",
)

################
# Model & Tokenizer
################

# Tokenizer
tokenizer = AutoTokenizer.from_pretrained(script_args.model_id, use_fast=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.chat_template = MISTRAL_CHAT_TEMPLATE

# template dataset
def template_dataset(examples):
    return {"text": tokenizer.apply_chat_template(examples["messages"], tokenize=False)}

train_dataset = train_dataset.map(template_dataset, remove_columns=["messages"])
test_dataset = test_dataset.map(template_dataset, remove_columns=["messages"])

Optimize fine-tuning using QLoRA
You optimize your fine-tuning using QLoRA, with the precision provided as an input to the training script through SageMaker training job parameters. QLoRA is an efficient fine-tuning approach that reduces memory usage enough to fine-tune a 65-billion-parameter model on a single 48 GB GPU while preserving the full 16-bit fine-tuning task performance. In this notebook, you use the bitsandbytes library to set up the quantization configuration, as shown in the following code:

# Model
torch_dtype = torch.bfloat16 if training_args.bf16 else torch.float32
quant_storage_dtype = torch.bfloat16

if script_args.use_qlora:
    print(f"Using QLoRA - {torch_dtype}")
    quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch_dtype,
        bnb_4bit_quant_storage=quant_storage_dtype,
    )
else:
    quantization_config = None

You use the LoRA config based on the QLoRA paper and the Sebastian Raschka experiment, as shown in the following code. Two key points to consider from the Raschka experiment are that QLoRA offers 33% memory savings at the cost of a 39% increase in runtime, and that LoRA should be applied to all layers to maximize model performance.

################
# PEFT
################
# LoRA config based on QLoRA paper & Sebastian Raschka experiment
peft_config = LoraConfig(
    lora_alpha=8,
    lora_dropout=0.05,
    r=16,
    bias="none",
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

You use SFTTrainer to fine-tune the Mistral model:

################
# Training
################
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    dataset_text_field="text",
    eval_dataset=test_dataset,
    peft_config=peft_config,
    max_seq_length=script_args.max_seq_length,
    tokenizer=tokenizer,
    packing=True,
    dataset_kwargs={
        "add_special_tokens": False,  # We template with special tokens
        "append_concat_token": False,  # No need to add additional separator token
    },
)

At the time of writing, only merged adapters are supported using the Custom Model Import feature for Amazon Bedrock. Let’s look at how to merge the adapter with the base model next.
Merge the adapters
Adapters are new modules added between layers of a pre-trained network. Creation of these new modules is possible by back-propagating gradients through a frozen, 4-bit quantized pre-trained language model into low-rank adapters in the fine-tuning process. To import the Mistral model into Amazon Bedrock, the adapters need to be merged with the base model and saved in Safetensors format. Use the following code to merge the model adapters and save them in Safetensors format:

# load PEFT model in fp16
model = AutoPeftModelForCausalLM.from_pretrained(
    training_args.output_dir,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
)
# Merge LoRA and base model and save
model = model.merge_and_unload()
model.save_pretrained(
    sagemaker_save_dir, safe_serialization=True, max_shard_size="2GB"
)

To import the Mistral model into Amazon Bedrock, the model needs to be in an uncompressed directory within an S3 bucket accessible by the Amazon Bedrock service role used in the import job.
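As a minimal sketch, assuming the merged artifacts were saved locally in sagemaker_save_dir as in the previous step, you could upload them uncompressed to the SageMaker default bucket as follows (the key prefix is a placeholder):

import sagemaker

# sagemaker_save_dir is the local directory produced by the merge step above;
# the key prefix is a placeholder
sess = sagemaker.Session()
model_s3_path = sess.upload_data(
    path=sagemaker_save_dir,
    bucket=sess.default_bucket(),
    key_prefix="mistral-finetuned-model",
)
print(model_s3_path)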
Import the fine-tuned model into Amazon Bedrock
Now that you have fine-tuned the model, you can import the model into Amazon Bedrock. In this section, we demonstrate how to import the model using the Amazon Bedrock console or the SDK.
Import the model using the Amazon Bedrock console
To import the model using the Amazon Bedrock console, see Import a model with Custom Model Import. You use the Import model page as shown in the following screenshot to import the model from the S3 bucket.

After you successfully import the fine-tuned model, you can see the model listed on the Amazon Bedrock console.

Import the model using the SDK
The AWS Boto3 library supports importing custom models into Amazon Bedrock. You can use the following code to import a fine-tuned model from within the notebook into Amazon Bedrock. This is an asynchronous method.

import boto3
import datetime

br_client = boto3.client('bedrock', region_name='<aws-region-name>')
pt_model_nm = "<bedrock-custom-model-name>"
pt_imp_jb_nm = f"{pt_model_nm}-{datetime.datetime.now().strftime('%Y%m%d%H%M%S')}"
role_arn = "<<bedrock_role_with_custom_model_import_policy>>"
pt_model_src = {"s3DataSource": {"s3Uri": f"{pt_pubmed_model_s3_path}"}}
resp = br_client.create_model_import_job(jobName=pt_imp_jb_nm,
                                         importedModelName=pt_model_nm,
                                         roleArn=role_arn,
                                         modelDataSource=pt_model_src)
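Because create_model_import_job is asynchronous, you typically poll the job status before invoking the model. The following is a minimal sketch using the job ARN returned by the previous call:

import time

# Poll until the import job finishes; resp is the response from create_model_import_job
job_arn = resp["jobArn"]
while True:
    job = br_client.get_model_import_job(jobIdentifier=job_arn)
    print(f"Import job status: {job['status']}")
    if job["status"] in ("Completed", "Failed"):
        break
    time.sleep(60)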

Test the imported model
Now that you have imported the fine-tuned model into Amazon Bedrock, you can test the model. In this section, we demonstrate how to test the model using the Amazon Bedrock console or the SDK.
Test the model on the Amazon Bedrock console
You can test the imported model using an Amazon Bedrock playground, as illustrated in the following screenshot.

Test the model using the SDK
You can also use the Amazon Bedrock Invoke Model API to run the fine-tuned imported model, as shown in the following code:

import json
import boto3
from botocore.exceptions import ClientError

client = boto3.client("bedrock-runtime", region_name="us-west-2")
model_id = "<<replace with the imported bedrock model arn>>"

def call_invoke_model_and_print(native_request):
    request = json.dumps(native_request)

    try:
        # Invoke the model with the request.
        response = client.invoke_model(modelId=model_id, body=request)
        model_response = json.loads(response["body"].read())

        response_text = model_response["outputs"][0]["text"]
        print(response_text)
    except (ClientError, Exception) as e:
        print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
        exit(1)

prompt = "will there be a season 5 of shadowhunters"
formatted_prompt = f"[INST] {prompt} [/INST]</s>"
native_request = {
    "prompt": formatted_prompt,
    "max_tokens": 64,
    "top_p": 0.9,
    "temperature": 0.91,
}
call_invoke_model_and_print(native_request)

The custom Mistral model that you imported using Amazon Bedrock supports temperature, top_p, and max_gen_len parameters when invoking the model for inferencing. The inference parameters top_k, max_seq_len, max_batch_size, and max_new_tokens are not supported for a custom Mistral fine-tuned model.
Evaluate the imported model
Now that you have imported and tested the model, let’s evaluate the imported model using the SageMaker FMEval library. For more details, refer to Evaluate Bedrock Imported Models. To evaluate the question answering task, we use the metrics F1 Score, Exact Match Score, Quasi Exact Match Score, Precision Over Words, and Recall Over Words. The key metrics for the question answering tasks are Exact Match, Quasi-Exact Match, and F1 over words evaluated by comparing the model predicted answers against the ground truth answers. The FMEval library supports out-of-the-box evaluation algorithms for metrics such as accuracy, QA Accuracy, and others detailed in the FMEval documentation. Because you fine-tuned the Mistral model for question answering, you can use the QA Accuracy algorithm, as shown in the following code. The FMEval library supports these metrics for the QA Accuracy algorithm.

from fmeval.constants import MIME_TYPE_JSONLINES
from fmeval.data_loaders.data_config import DataConfig
from fmeval.eval_algorithms.qa_accuracy import QAAccuracy
from fmeval.model_runners.bedrock_model_runner import BedrockModelRunner

config = DataConfig(
    dataset_name="trex_sample",
    dataset_uri="data/test_dataset.json",
    dataset_mime_type=MIME_TYPE_JSONLINES,
    model_input_location="question",
    target_output_location="answer",
)
bedrock_model_runner = BedrockModelRunner(
    model_id=model_id,
    output='outputs[0].text',
    content_template='{"prompt": $prompt, "max_tokens": 500}',
)

eval_algo = QAAccuracy()
eval_output = eval_algo.evaluate(model=bedrock_model_runner, dataset_config=config,
                                 prompt_template="[INST]$model_input[/INST]", save=True)

You can get the consolidated metrics for the imported model as follows:

for op in eval_output:
    print(f"Eval Name: {op.eval_name}")
    for score in op.dataset_scores:
        print(f"{score.name} : {score.value}")

Clean up
To delete the imported model from Amazon Bedrock, navigate to the model on the Amazon Bedrock console. On the options menu (three dots), choose Delete.

To delete the SageMaker domain along with the SageMaker JupyterLab space, refer to Delete an Amazon SageMaker domain. You may also want to delete the S3 buckets where the data and model are stored. For instructions, see Deleting a bucket.
Conclusion
In this post, we explained the different aspects of fine-tuning a Mistral model using SageMaker, importing the model into Amazon Bedrock, invoking the model using both an Amazon Bedrock playground and Boto3, and then evaluating the imported model using the FMEval library. You can use this feature to import base FMs or FMs fine-tuned either on premises, on SageMaker, or on Amazon EC2 into Amazon Bedrock and use the models without any heavy lifting in your generative AI applications. Explore the Custom Model Import feature for Amazon Bedrock to deploy FMs fine-tuned for code generation tasks in a secure and scalable manner. Visit our GitHub repository to explore samples prepared for fine-tuning and importing models from various families.

About the Authors
Jay Pillai is a Principal Solutions Architect at Amazon Web Services. In this role, he functions as the Lead Architect, helping partners ideate, build, and launch Partner Solutions. As an Information Technology Leader, Jay specializes in artificial intelligence, generative AI, data integration, business intelligence, and user interface domains. He holds 23 years of extensive experience working with several clients across supply chain, legal technologies, real estate, financial services, insurance, payments, and market research business domains.
Rupinder Grewal is a Senior AI/ML Specialist Solutions Architect with AWS. He currently focuses on serving of models and MLOps on Amazon SageMaker. Prior to this role, he worked as a Machine Learning Engineer building and hosting models. Outside of work, he enjoys playing tennis and biking on mountain trails.
Evandro Franco is a Sr. AI/ML Specialist Solutions Architect at Amazon Web Services. He helps AWS customers overcome business challenges related to AI/ML on top of AWS. He has more than 18 years of experience working with technology, from software development, infrastructure, serverless, to machine learning.
Felipe Lopez is a Senior AI/ML Specialist Solutions Architect at AWS. Prior to joining AWS, Felipe worked with GE Digital and SLB, where he focused on modeling and optimization products for industrial applications.
Sandeep Singh is a Senior Generative AI Data Scientist at Amazon Web Services, helping businesses innovate with generative AI. He specializes in generative AI, artificial intelligence, machine learning, and system design. He is passionate about developing state-of-the-art AI/ML-powered solutions to solve complex business problems for diverse industries, optimizing efficiency and scalability.
Ragha Prasad is a Principal Engineer and a founding member of Amazon Bedrock, where he has had the privilege to listen to customer needs first-hand and understands what it takes to build and launch scalable and secure Gen AI products. Prior to Bedrock, he worked on numerous products in Amazon, ranging from devices to Ads to Robotics.
Paras Mehra is a Senior Product Manager at AWS. He is focused on helping build Amazon SageMaker Training and Processing. In his spare time, Paras enjoys spending time with his family and road biking around the Bay Area.

Using task-specific models from AI21 Labs on AWS

In this blog post, we will show you how to leverage AI21 Labs’ Task-Specific Models (TSMs) on AWS to enhance your business operations. You will learn the steps to subscribe to AI21 Labs in the AWS Marketplace, set up a domain in Amazon SageMaker, and utilize AI21 TSMs via SageMaker JumpStart.
AI21 Labs is a foundation model (FM) provider focusing on building state-of-the-art language models. AI21 Task-Specific Models (TSMs) are built for tasks such as answering questions, summarizing, and condensing lengthy texts. AI21 TSMs are available in Amazon SageMaker JumpStart.
Here are the AI21 TSMs that can be accessed and customized in SageMaker JumpStart: AI21 Contextual Answers, AI21 Summarize, AI21 Paraphrase, and AI21 Grammatical Error Correction.
AI21 FMs (Jamba-Instruct, AI21 Jurassic-2 Ultra, AI21 Jurassic-2 Mid) are available in Amazon Bedrock and can be used for large language model (LLM) use cases. We used the AI21 TSMs available in SageMaker JumpStart for this post. SageMaker JumpStart enables you to select, compare, and evaluate the available AI21 TSMs.
AI21’s TSMs
Foundation models can solve many tasks, but not every task is unique. Some commercial tasks are common across many applications. AI21 Labs’ TSMs are specialized models built to solve a particular problem. They’re built to deliver out-of-the-box value, cost effectiveness, and higher accuracy for the common tasks behind many commercial use cases. In this post, we will explore three of AI21 Labs’ TSMs and their unique capabilities.
Foundation models are built and trained on massive datasets to perform a variety of tasks. Unlike FMs, TSMs are trained to perform unique tasks.
When your use case is supported by a TSM, you quickly realize benefits such as improved refusal rates when you don’t want the model to provide answers unless they’re grounded in actual document content.

Paraphrase: This model is used to enhance content creation and communication by generating varied versions of text while maintaining a consistent tone and style. This model is ideal for creating multiple product descriptions, marketing materials, and customer support responses, improving clarity and engagement. It also simplifies complex documents, making information more accessible.
Summarize: This model is used to condense lengthy texts into concise summaries while preserving the original meaning. This model is particularly useful for processing large documents, such as financial reports, legal documents, and technical papers, making critical information more accessible and comprehensible.
Contextual answers: This model is used to significantly enhance information retrieval and customer support processes. This model excels at providing accurate and relevant answers based on specific document contexts, making it particularly useful in customer service, legal, finance, and educational sectors. It streamlines workflows by quickly accessing relevant information from extensive databases, reducing response times and improving customer satisfaction.

Prerequisites
To follow the steps in this post, you must have the following prerequisites in place:
AWS account setup
Completing the labs in this post requires an AWS account and SageMaker environments set up. If you don’t have an AWS account, see Complete your AWS registration for the steps to create one.
AWS Marketplace opt-in
AI21 TSMs can also be accessed through AWS Marketplace for subscription. Using AWS Marketplace, you can subscribe to AI21 TSMs and deploy SageMaker endpoints.

To do these exercises, you must subscribe to the following offerings in AWS Marketplace:

AI21 Summarize
AI21 Contextual Answers
AI21 Paraphrase

Service quota limits
To use some of the GPU instances required to run AI21’s task-specific models, you must have the required service quota limits. You can request a service quota limit increase in the AWS Management Console. Limits are account and resource specific.
To create a service request, search for service quotas in the console search bar. Select the service to go to its dashboard and enter the name of the instance type (for example, ml.g5.48xlarge). Make sure the quota is for endpoint usage.
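The increase can also be requested programmatically. The following is a minimal sketch using the Service Quotas API; the quota name is an assumption based on the console naming pattern, so verify it in your account.

import boto3

quotas = boto3.client("service-quotas")

# Find the endpoint-usage quota for the instance type; the quota name is an
# assumption based on the console naming pattern, so verify it in your account
quota_code = None
paginator = quotas.get_paginator("list_service_quotas")
for page in paginator.paginate(ServiceCode="sagemaker"):
    for quota in page["Quotas"]:
        if quota["QuotaName"] == "ml.g5.48xlarge for endpoint usage":
            quota_code = quota["QuotaCode"]

if quota_code:
    # Request an increase to one endpoint instance
    quotas.request_service_quota_increase(
        ServiceCode="sagemaker", QuotaCode=quota_code, DesiredValue=1.0
    )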
Estimated cost
The following is the estimated cost to walk through the solution in this post.
Contextual answers:

We used an ml.g5.48xlarge instance.

By default, AWS accounts don’t have access to this GPU. You must request a service quota limit increase (see the previous section: Service Quota Limits).

The notebook runtime was approximately 15 minutes.
The cost was $20.41 (billed on an hourly basis).

Summarize notebook

We used an ml.g4dn.12xlarge GPU.

You must request a service quota limit increase (see the previous section: Service Quota Limits).

The notebook runtime was approximately 10 minutes.
The cost was $4.94 (billed on an hourly basis).

Paraphrase notebook

We used the ml.g4dn.12xlarge GPU.

You must request a service quota limit increase (see the previous section: Service Quota Limits).

The notebook runtime was approximately 10 minutes.
The cost was $4.94 (billed on an hourly basis).

Total cost: $30.29 (1 hour charge for each deployed endpoint)
Using AI21 models on AWS
Getting started
In this section, you will access AI21 TSMs in SageMaker JumpStart. These interactive notebooks contain code to deploy TSM endpoints and also provide example code blocks to run inference. The first few steps are prerequisites to deploying the sample notebooks. If you already have a SageMaker domain and username set up, you may skip to Step 7.

Use the search bar in the AWS Management Console to navigate to Amazon SageMaker, as shown in the following figure.

If you don’t already have one set up, you must create a SageMaker domain. A domain consists of an associated Amazon Elastic File System (Amazon EFS) volume, a list of authorized users, and a variety of security, application, policy, and Amazon Virtual Private Cloud (Amazon VPC) configurations.
Users within a domain can share notebook files and other artifacts with each other. For more information, see Learn about Amazon SageMaker domain entities and statuses. For today’s exercises, you will use Quick Set-Up to deploy an environment.

Choose Create a SageMaker domain as shown in the following figure.
Select Quick setup. After you choose Set up, domain creation begins.
After a moment, your domain will be created.
Choose Add user.
You can keep the default user profile values.
Launch Studio by choosing the Launch button and then selecting Studio.
Choose JumpStart in the navigation pane as shown in the following figure.

Here you can see the model providers for the JumpStart notebooks.

Select AI21 Labs to see their available models.

Each of AI21’s models has an associated model card. A model card provides key information about the model such as its intended use cases, training, and evaluation details. For this example, you will use the Summarize, Paraphrase, and Contextual Answers TSMs.

Start with Contextual Answers. Select the AI21 Contextual Answers model card.

A sample notebook is included as part of the model. Jupyter Notebooks are a popular way to interact with code and LLMs.

Choose Notebooks to explore the notebook.
To run the notebook’s code blocks, choose Open in JupyterLab.
If you do not already have an existing space, choose Create new space and enter an appropriate name. When ready, choose Create space and open notebook.

It can take up to 5 minutes to open your notebook. SageMaker Spaces are used to manage the storage and resource needs of some SageMaker Studio applications. Each space has a 1:1 relationship with an instance of an application.

After the notebook opens, you will be prompted to select a kernel. Make sure Python 3 is selected and choose Select.

Navigating the notebook exercises
Repeat the preceding process to import the remaining notebooks.
Each AI21 notebook demonstrates required code imports, version checks, model selection, endpoint creation, and inference examples showcasing the TSM’s unique strengths through code blocks and example prompts.
Each notebook has a cleanup step at the end to delete your deployed endpoints. It’s important to terminate any running endpoints to avoid additional costs.
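For orientation, the notebooks follow the same general flow, sketched below with the SageMaker Python SDK; the model ID is a placeholder, so use the exact ID shown in the notebook you opened.

from sagemaker.jumpstart.model import JumpStartModel

# Deploy the selected TSM to a real-time endpoint (model_id is a placeholder;
# use the ID shown in the notebook you opened)
model = JumpStartModel(model_id="<ai21-tsm-model-id>")
predictor = model.deploy()

# ...run inference with the TSM-specific payload shown in the notebook...

# Clean up to stop endpoint charges
predictor.delete_model()
predictor.delete_endpoint()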
Contextual Answers JumpStart Notebook
AWS customers and partners can use AI21 Labs’ Contextual Answers model to significantly enhance their information retrieval and customer support processes. This model excels at providing accurate and relevant answers based on specific context, making it useful in the customer service, legal, finance, and educational sectors.
The following are code snippets from AI21’s Contextual Answers TSM through JumpStart. Notice that there is no prompt engineering required. The only input is the question and the context provided.
Input:

financial_context = """In 2020 and 2021, enormous QE — approximately $4.4 trillion, or 18%, of 2021 gross domestic product (GDP) — and enormous fiscal stimulus (which has been and always will be inflationary) — approximately $5 trillion, or 21%, of 2021 GDP — stabilized markets and allowed companies to raise enormous amounts of capital. In addition, this infusion of capital saved many small businesses and put more than $2.5 trillion in the hands of consumers and almost $1 trillion into state and local coffers. These actions led to a rapid decline in unemployment, dropping from 15% to under 4% in 20 months — the magnitude and speed of which were both unprecedented. Additionally, the economy grew 7% in 2021 despite the arrival of the Delta and Omicron variants and the global supply chain shortages, which were largely fueled by the dramatic upswing in consumer spending and the shift in that spend from services to goods. Fortunately, during these two years, vaccines for COVID-19 were also rapidly developed and distributed.
In today’s economy, the consumer is in excellent financial shape (on average), with leverage among the lowest on record, excellent mortgage underwriting (even though we’ve had home price appreciation), plentiful jobs with wage increases and more than $2 trillion in excess savings, mostly due to government stimulus. Most consumers and companies (and states) are still flush with the money generated in 2020 and 2021, with consumer spending over the last several months 12% above pre-COVID-19 levels. (But we must recognize that the account balances in lower-income households, smaller to begin with, are going down faster and that income for those households is not keeping pace with rising inflation.)
Today’s economic landscape is completely different from the 2008 financial crisis when the consumer was extraordinarily overleveraged, as was the financial system as a whole — from banks and investment banks to shadow banks, hedge funds, private equity, Fannie Mae and many other entities. In addition, home price appreciation, fed by bad underwriting and leverage in the mortgage system, led to excessive speculation, which was missed by virtually everyone — eventually leading to nearly $1 trillion in actual losses.
"""
question = "Did the economy shrink after the Omicron variant arrived?"
response = client.answer.create(
    context=financial_context,
    question=question,
)

print(response.answer)

Output:

No, the economy did not shrink after the Omicron variant arrived. In fact, the economy grew 7% in 2021, despite the arrival of the Delta and Omicron variants and the global supply chain shortages, which were largely fueled by the dramatic upswing in consumer spending and the shift in that spend from services to goods.

As mentioned in our introduction, AI21’s Contextual Answers model does not provide answers to questions outside of the context provided. If the prompt includes a question unrelated to 2020/2021 economy, you will get a response as shown in the following example.
Input:

irrelevant_question = "How did COVID-19 affect the financial crisis of 2008?"

response = client.answer.create(
    context=financial_context,
    question=irrelevant_question,
)

print(response.answer)

Output:
None
When finished, you can delete your deployed endpoint by running the final two cells of the notebook.

model.sagemaker_session.delete_endpoint(endpoint_name)
model.sagemaker_session.delete_endpoint_config(endpoint_name)
model.delete_model()

You can import the other notebooks by navigating to SageMaker JumpStart and repeating the same process you used to import this first notebook.
Summarize JumpStart Notebook
AWS customers and partners can use AI21 Labs’ Summarize model to condense lengthy texts into concise summaries while preserving the original meaning. This model is particularly useful for processing large documents, such as financial reports, legal documents, and technical papers, making critical information more accessible and comprehensible.
The following are highlighted code snippets from AI21’s Summarize TSM using JumpStart. Notice that the input must include the full text that the user wants to summarize.
Input:

text = """The error affected a number of international flights leaving the terminal on Wednesday, with some airlines urging passengers to travel only with hand luggage.
Virgin Atlantic said all airlines flying out of the terminal had been affected.
Passengers have been warned it may be days before they are reunited with luggage.
An airport spokesperson apologised and said the fault had now been fixed.
Virgin Atlantic said it would ensure all bags were sent out as soon as possible.
It added customers should retain receipts for anything they had bought and make a claim to be reimbursed.
Passengers, who were informed by e-mail of the problem, took to social media to vent their frustrations.
One branded the situation “ludicrous” and said he was only told 12 hours before his flight.
The airport said it could not confirm what the problem was, what had caused it or how many people had been affected."""

response = client.summarize.create(
    source=text,
    source_type=DocumentType.TEXT,
)

print("Original text:")
print(text)
print("================")
print("Summary:")
print(response.summary)

Output:

Original text:
The error affected a number of international flights leaving the terminal on Wednesday, with some airlines urging passengers to travel only with hand luggage.
Virgin Atlantic said all airlines flying out of the terminal had been affected.
Passengers have been warned it may be days before they are reunited with luggage.
An airport spokesperson apologised and said the fault had now been fixed.
Virgin Atlantic said it would ensure all bags were sent out as soon as possible.
It added customers should retain receipts for anything they had bought and make a claim to be reimbursed.
Passengers, who were informed by e-mail of the problem, took to social media to vent their frustrations.
One branded the situation “ludicrous” and said he was only told 12 hours before his flight.
The airport said it could not confirm what the problem was, what had caused it or how many people had been affected.
================
Summary:
A number of international flights leaving the terminal were affected by the error on Wednesday, with some airlines urging passengers to travel only with hand luggage. Passengers were warned it may be days before they are reunited with their luggage.

Paraphrase JumpStart Notebook
AWS customers and partners can use AI21 Labs’ Paraphrase TSM through JumpStart to enhance content creation and communication by generating varied versions of text.
The following are highlighted code snippets from AI21’s Paraphrase TSM using JumpStart. Notice that no extensive prompt engineering is required. The only input required is the full text that the user wants to paraphrase and a chosen style, for example, casual or formal.
Input:

text = "Throughout this page, we will explore the advantages and features of the Paraphrase model."

response = client.paraphrase.create(
    text=text,
    style="formal",
)

print(response.suggestions)

Output:

[Suggestion(text='We will examine the advantages and features of the Paraphrase model throughout this page.'), Suggestion(text='The purpose of this page is to examine the advantages and features of the Paraphrase model.'), Suggestion(text='On this page, we will discuss the advantages and features of the Paraphrase model.'), Suggestion(text='This page will provide an overview of the advantages and features of the Paraphrase model.'), Suggestion(text='In this article, we will examine the advantages and features of the Paraphrase model.'), Suggestion(text='Here we will explore the advantages and features of the Paraphrase model.'), Suggestion(text='The purpose of this page is to describe the advantages and features of the Paraphrase model.'), Suggestion(text='In this page, we will examine the advantages and features of the Paraphrase model.'), Suggestion(text='The Paraphrase model will be reviewed on this page with an emphasis on its advantages and features.'), Suggestion(text='Our goal on this page will be to explore the benefits and features of the Paraphrase model.')]

Input:

print("Original sentence:")
print(text)
print("============================")
print("Suggestions:")
print("\n".join(["- " + x.text for x in response.suggestions]))

Output:

Original sentence:
Throughout this page, we will explore the advantages and features of the Paraphrase model.
============================
Suggestions:
- We will examine the advantages and features of the Paraphrase model throughout this page.
- The purpose of this page is to examine the advantages and features of the Paraphrase model.
- On this page, we will discuss the advantages and features of the Paraphrase model.
- This page will provide an overview of the advantages and features of the Paraphrase model.
- In this article, we will examine the advantages and features of the Paraphrase model.
- Here we will explore the advantages and features of the Paraphrase model.
- The purpose of this page is to describe the advantages and features of the Paraphrase model.
- In this page, we will examine the advantages and features of the Paraphrase model.
- The Paraphrase model will be reviewed on this page with an emphasis on its advantages and features.
- Our goal on this page will be to explore the benefits and features of the Paraphrase model.

Less prompt engineering
A key advantage of AI21’s task-specific models is the reduced need for complex prompt engineering compared to foundation models. Let’s consider how you might approach a summarization task using a foundation model compared to using AI21’s specialized Summarize TSM.
For a foundation model, you might need to craft an elaborate prompt template with detailed instructions:

prompt_template = """You are a highly capable summarization assistant. Concisely summarize the given text while preserving key details and overall meaning. Use clear language tailored for human readers.

Text:
[INPUT_TEXT]

Summary:"""

To summarize text with this foundation model, you’d populate the template and pass the full prompt:

input_text = "Insert text to summarize here..."
prompt = prompt_template.replace("[INPUT_TEXT]", input_text)
summary = model(prompt)

In contrast, using AI21’s Summarize TSM is more straightforward:

input_text = "Insert text to summarize here..."
summary = summarize_model(input_text)

That’s it! With the Summarize TSM, you pass the input text directly to the model; there’s no need for an intricate prompt template.
Lower cost and higher accuracy
By using TSMs, you can achieve lower costs and higher accuracy. As demonstrated previously in the Contextual Answers notebook, TSMs have a higher refusal rate than most mainstream models, which can lead to higher accuracy. This characteristic of TSMs is beneficial in use cases where wrong answers are less acceptable.
Conclusion
In this post, we explored AI21 Labs’ approach to generative AI using task-specific models (TSMs). Through guided exercises, you walked through the process of setting up a SageMaker domain and importing sample JumpStart notebooks to experiment with AI21’s TSMs, including Contextual Answers, Paraphrase, and Summarize.
Throughout the exercises, you saw the potential benefits of task-specific models compared to foundation models. When asked questions outside the context of the intended use case, the AI21 TSMs refused to answer, making them less prone to hallucinating or generating nonsensical outputs beyond their intended domain, a critical factor for applications that require precision and safety. Lastly, we highlighted how task-specific models are designed from the outset to excel at specific tasks, streamlining development and reducing the need for extensive prompt engineering and fine-tuning, which can make them a more cost-effective solution.
Whether you’re a data scientist, machine learning practitioner, or someone curious about AI advancements, we hope this post has provided valuable insights into the advantages of AI21 Labs’ task-specific approach. As the field of generative AI continues to evolve rapidly, we encourage you to stay curious, experiment with various approaches, and ultimately choose the one that best aligns with your project’s unique requirements and goals. Visit AWS GitHub for other example use cases and code to experiment with in your own environment.
Additional resources

AI21 Labs
Transform your business with generative AI
Amazon SageMaker
Getting started with Amazon SageMaker JumpStart
Amazon Bedrock
Task-Specific Models overview

About the Authors
Joe Wilson is a Solutions Architect at Amazon Web Services supporting nonprofit organizations. He has core competencies in data analytics, AI/ML, and generative AI. Joe’s background is in data science and international development. He is passionate about leveraging data and technology for social good.
Pat Wilson is a Solutions Architect at Amazon Web Services with a focus on AI/ML workloads and security. He currently supports Federal Partners. Outside of work Pat enjoys learning, working out, and spending time with family/friends.
Josh Famestad is a Solutions Architect at Amazon Web Services. Josh works with public sector customers to build and execute cloud based approaches to deliver on business priorities.